Paxos shouldn't be the first protocol you learn -- I think it gives consensus th...

ahelwer · on Oct 27, 2020

All the supposed "simpler" consensus protocols I've seen just hide all the complexity somewhere (like raft with its leader election). Where does streamlet hide its complexity?

the-smug-one · on Oct 27, 2020

It hides the complexity in the first bullet point: "The protocol proceeds in synchronized epochs."

That is, we're now solving consensus for a synchronous system.

So for example the Paxos assumption:

>Messages are sent asynchronously and may take arbitrarily long to deliver.

Does no longer hold AFAIK.

Edit: Good article, they even say this:

>Partially Synchronous Network: The network can at times be unreliable, or adversarial; the protocol must not lose consistency in this case. However, when network conditions are good, i.e., when honest players can communicate with each other within 1 second, the protocol must make progress.

Vervious · on Oct 27, 2020

No complexity is hidden here. Streamlet is (essentially) in the same setting as paxos. Just like in paxos, messages can take arbitrarily long to deliver.

The only assumption is that clocks are synchronized, which is much, much weaker. (We can actually remove even this assumption. But it's easier to think about this way.)

i.e. I think it's monday at the same time you think it's monday, or I think it's epoch 1 at the same time you think it's epoch 1. or my quartz crystal oscillates at the same rate as your quartz crystal. Importantly, a message sent during epoch 1 might arrive in epoch 3, or never at all.

Paxos is also in the partially synchronous setting - it requires some network synchrony in order to terminate, just like streamlet (For truly asynchronous protocols, there's a whole, complicated literature).

the-smug-one · on Oct 27, 2020

>Paxos is also in the partially synchronous setting - it requires some network synchrony in order to terminate, just like streamlet

As far as I've understood it Paxos is fully asynchronous, but you can either guarantee correctness or termination and never both for those systems. Paxos decided on correctness. Am I wrong in this?

Vervious · on Oct 27, 2020

Right, when run in a fully asynchronous setting, where the network adversary can always deliver messages in the order of their choice, the FLP impossibility result says that Paxos cannot terminate.

Thus, for Paxos to make progress, the network needs to eventually "heal" and deliver messages in a timely manner -- that's what we mean by partial synchrony. Streamlet, likewise, tolerates arbitrary network delays (in the sense that no two players will agree on different things even if Comcast maliciously scheduled every message), but also requires partial synchrony to make progress (i.e. once in a while Comcast will deliver messages in timely fashion and in order, like you expect).

(Partial synchrony is arguably the most practical of the theoretical synchrony models for consensus. There are other models, i.e. "periods of synchrony", but they all end up capturing the same idea).

FLP, however, is a fairly weak statement. It only rules out termination for deterministic protocols. We can in fact design randomized asynchronous protocols that make progress (with good probability), regardless of the ordering in which messages are delivered. I.e. as long as Comcast eventually delivers every message -- and letting it choose delays arbitrarily -- a randomized protocol could make progress. These are the "true" asynchronous protocols. (Paxos is not one of them)

I guess the original asynchronous consensus protocol is Ben-Or's protocol (which takes 2^n time in expectation IIRC, where n is the number of players). We can reduce the running time using some (fairly sophisticated) cryptography -- the most famous is the Canetti-Rabin protocol, and Cachin and many others have worked on it since. Most of the complexity (for polynomial time protocols) is in building a "global common coin" -- i.e. a global coin that everyone agrees on the result of; once you have that we can use Mostefaoui's protocol for instance. Building this coin is incredibly complicated (much more so than Paxos) unless we assume some strong cryptographic trusted setup (i.e. a nice threshold signature scheme). How practical any of these schemes are is an open question.

Vervious · on Oct 27, 2020

(For more, Ittai Abraham has a series of nice posts: https://decentralizedthoughts.github.io/2019-06-01-2019-5-31...)

Vervious · on Oct 27, 2020

I don't think it hides any complexity, but I guess that's for the reader to decide.

Raft on the other hand... I've implemented versions of both paxos and raft, and by golly was the latter more difficult to get right.

judofyr · on Oct 27, 2020

> Raft on the other hand... I've implemented versions of both paxos and raft, and by golly was the latter more difficult to get right.

Could you clarify what you mean? "Paxos" as defined in the literature (and as presented in this article) is a single-shot consensus algorithm. Raft is a leader-based replicated state machine. They are not comparable at all and it's not surprising if you find Raft more difficult.

However, in practice, when people say "Paxos" they usually mean some variant of MultiPaxos and "Raft vs Paxos" is typically interpreted as "Raft vs MultiPaxos". It's also well-established that MultiPaxos and Raft have roughly the same complexity.

Vervious · on Oct 27, 2020

(In the context of embedding paxos in some sort of proof-of-concept replication system. Something like Multi-paxos.)

You're right, I guess we generally talk about state machine replication instead of one-shot systems.

RabbitmqGuy · on Oct 27, 2020

Looks like streamlet requires synchronized time. That can be hard to achieve.

From the link; "The protocol runs sequentially in synchronized ‘epochs’ that are 2 seconds long. Every player starts in epoch 0 at the same time...."

pkulak · on Oct 27, 2020

Yeah, but synchronizing time gets easier the bigger the accuracy window is, and approaches impossible the smaller. 2 seconds in well within NTP's wheelhouse on all but the most jittery and unusable networks.

Vervious · on Oct 27, 2020

We could actually run a generic clock synchronization protocol to synchronize clocks over a partially synchronous network. I guess paxos kind of builds that in, which adds to its complexity.

Safety in particular doesn't require any timing assumption.

(Note that messages can still be delayed arbitrarily in both protocols)

jen20 · on Oct 27, 2020

I really wish I could remember who to correctly attribute this quote to, but I heard it from Randy Shoup - paraphrased:

Consensus protocols are either Paxos, Paxos with unnecesary additional cruft, or broken.

tkhattra · on Oct 27, 2020

I think it was Mike Burrows, but I can't remember where I heard or read that.

hinkley · on Oct 27, 2020

One of the Raft introduction videos claims that there are approximately 12 people on the planet who fully understand Paxos.

This is contrasted with a Leslie Lamport quote insisting that it is so simple everyone could understand it.

I suspect this means that Mr Lamport would make a terrible professor.

SamReidHughes · on Oct 27, 2020

I think Lamport's Paxos paper was, well, kind of a bad way to teach people about Paxos, but a good thing to have as part of Paxos teaching.

IMO, Paxos is much easier to "fully" understand than Raft. The question really becomes, define "fully." What set of performance scenarios, with member connects/disconnects and other shenanigans, do we need to consider in order to "fully" understand Paxos (or basic Multi-Paxos)?

What I really like about Paxos is that it's pretty easy to re-invent the exact algorithm, or come up with your own Multi-Paxos definition, after you've forgotten what the original exactly was, once you've grokked the general principles of how it works.

ahelwer · on Oct 27, 2020

I was at the talk where that assertion was made. It ticked me off. This sort of thing is learned helplessness, no different from people saying "they aren't very good with computers".

hinkley · on Oct 27, 2020

As someone who was taught to drive stick shift, I am well aware that there are things a person can do which may actually improve their quality of life for having done them. Within that group of things is a large subset which we shouldn't force people to have to think about. Particularly if those people are tasked with also keeping track of other important things.

Yes, there are plenty of people who can't be bothered to learn something because they are lazy, intimidated, or stupid. But it's not even most, and that presumption, of all of the egotistical self-centered bullshit we pull, ticks me off most of all.

We only have time in our lives to become experts in a handful of things (some people think that number is one, or two, which is a shame), and lots of people have other shit they'd rather be doing, possibly (hopefully!) including things that you can't be bothered to learn about. If we all worry about the same things then we don't get where we're going.

geocar · on Oct 28, 2020

There's a difference between being able to drive stick shift, and "fully understand"-ing how to drive stick shift.

I would absolutely say that I can drive stick shift because I've proven it to examiners in multiple countries (not every country permits a license exchange sadly), but to "fully understand" something is (well) something of a higher bar so I would ask how high? to make sure I understand what it is they expect me to understand. There might be a nuance that you (or I!) hadn't considered, for example driving stick shift with an unsynchronised manual transmission, or shifting without clutching, or something else entirely, and the point of the "fully understand" qualifier might be to call attention to something like this.

For example: Say I can implement paxos in my sleep, debug someone elses implementation, and I can even design and implement a new consensus algorithm based on paxos where I may satisfy requirements of parts in slightly different ways that are extremely domain-specific but may be beneficial from a performance/time/space/cost perspective. This is what someone might mean by "fully understand"-ing paxos, but making some changes to paxos may take time for me to convince myself it is correct (or incorrect!), or there might be parts I think are essential/important that aren't (in some situations), and so I would still be interested in learning something new. Who wouldn't?

However, I don't have good experiences with people who say "there are only X people who fully understand Y so you should use Z" -- it's usually means they don't understand Y as well as they understand Z, and that's not what I want from an expert.

ahelwer · on Oct 27, 2020

Nothing wrong with having different learning priorities. Just be honest about it instead of putting yourself down & normalizing learned helplessness.