Paxos shouldn't be the first protocol you learn -- I think it gives consensus this mistaken reputation for being "difficult to understand".
Consensus protocols (and the related problems of state machine replication, permissioned blockchains) have really come a long way over the years. They've become more performant and much simpler (in my opinion).
All the supposed "simpler" consensus protocols I've seen just hide all the complexity somewhere (like raft with its leader election). Where does streamlet hide its complexity?
It hides the complexity in the first bullet point: "The protocol proceeds in synchronized epochs."
That is, we're now solving consensus for a synchronous system.
So for example the Paxos assumption:
>Messages are sent asynchronously and may take arbitrarily long to deliver.
Does no longer hold AFAIK.
Edit:
Good article, they even say this:
>Partially Synchronous Network: The network can at times be unreliable, or adversarial; the protocol must not lose consistency in this case. However, when network conditions are good, i.e., when honest players can communicate with each other within 1 second, the protocol must make progress.
No complexity is hidden here. Streamlet is (essentially) in the same setting as paxos. Just like in paxos, messages can take arbitrarily long to deliver.
The only assumption is that clocks are synchronized, which is much, much weaker. (We can actually remove even this assumption. But it's easier to think about this way.)
i.e. I think it's monday at the same time you think it's monday, or I think it's epoch 1 at the same time you think it's epoch 1. or my quartz crystal oscillates at the same rate as your quartz crystal. Importantly, a message sent during epoch 1 might arrive in epoch 3, or never at all.
Paxos is also in the partially synchronous setting - it requires some network synchrony in order to terminate, just like streamlet (For truly asynchronous protocols, there's a whole, complicated literature).
>Paxos is also in the partially synchronous setting - it requires some network synchrony in order to terminate, just like streamlet
As far as I've understood it Paxos is fully asynchronous, but you can either guarantee correctness or termination and never both for those systems. Paxos decided on correctness. Am I wrong in this?
Right, when run in a fully asynchronous setting, where the network adversary can always deliver messages in the order of their choice, the FLP impossibility result says that Paxos cannot terminate.
Thus, for Paxos to make progress, the network needs to eventually "heal" and deliver messages in a timely manner -- that's what we mean by partial synchrony. Streamlet, likewise, tolerates arbitrary network delays (in the sense that no two players will agree on different things even if Comcast maliciously scheduled every message), but also requires partial synchrony to make progress (i.e. once in a while Comcast will deliver messages in timely fashion and in order, like you expect).
(Partial synchrony is arguably the most practical of the theoretical synchrony models for consensus. There are other models, i.e. "periods of synchrony", but they all end up capturing the same idea).
FLP, however, is a fairly weak statement. It only rules out termination for deterministic protocols. We can in fact design randomized asynchronous protocols that make progress (with good probability), regardless of the ordering in which messages are delivered. I.e. as long as Comcast eventually delivers every message -- and letting it choose delays arbitrarily -- a randomized protocol could make progress. These are the "true" asynchronous protocols. (Paxos is not one of them)
I guess the original asynchronous consensus protocol is Ben-Or's protocol (which takes 2^n time in expectation IIRC, where n is the number of players). We can reduce the running time using some (fairly sophisticated) cryptography -- the most famous is the Canetti-Rabin protocol, and Cachin and many others have worked on it since. Most of the complexity (for polynomial time protocols) is in building a "global common coin" -- i.e. a global coin that everyone agrees on the result of; once you have that we can use Mostefaoui's protocol for instance. Building this coin is incredibly complicated (much more so than Paxos) unless we assume some strong cryptographic trusted setup (i.e. a nice threshold signature scheme). How practical any of these schemes are is an open question.
> Raft on the other hand... I've implemented versions of both paxos and raft, and by golly was the latter more difficult to get right.
Could you clarify what you mean? "Paxos" as defined in the literature (and as presented in this article) is a single-shot consensus algorithm. Raft is a leader-based replicated state machine. They are not comparable at all and it's not surprising if you find Raft more difficult.
However, in practice, when people say "Paxos" they usually mean some variant of MultiPaxos and "Raft vs Paxos" is typically interpreted as "Raft vs MultiPaxos". It's also well-established that MultiPaxos and Raft have roughly the same complexity.
Yeah, but synchronizing time gets easier the bigger the accuracy window is, and approaches impossible the smaller. 2 seconds in well within NTP's wheelhouse on all but the most jittery and unusable networks.
We could actually run a generic clock synchronization protocol to synchronize clocks over a partially synchronous network. I guess paxos kind of builds that in, which adds to its complexity.
Safety in particular doesn't require any timing assumption.
(Note that messages can still be delayed arbitrarily in both protocols)
I think Lamport's Paxos paper was, well, kind of a bad way to teach people about Paxos, but a good thing to have as part of Paxos teaching.
IMO, Paxos is much easier to "fully" understand than Raft. The question really becomes, define "fully." What set of performance scenarios, with member connects/disconnects and other shenanigans, do we need to consider in order to "fully" understand Paxos (or basic Multi-Paxos)?
What I really like about Paxos is that it's pretty easy to re-invent the exact algorithm, or come up with your own Multi-Paxos definition, after you've forgotten what the original exactly was, once you've grokked the general principles of how it works.
I was at the talk where that assertion was made. It ticked me off. This sort of thing is learned helplessness, no different from people saying "they aren't very good with computers".
As someone who was taught to drive stick shift, I am well aware that there are things a person can do which may actually improve their quality of life for having done them. Within that group of things is a large subset which we shouldn't force people to have to think about. Particularly if those people are tasked with also keeping track of other important things.
Yes, there are plenty of people who can't be bothered to learn something because they are lazy, intimidated, or stupid. But it's not even most, and that presumption, of all of the egotistical self-centered bullshit we pull, ticks me off most of all.
We only have time in our lives to become experts in a handful of things (some people think that number is one, or two, which is a shame), and lots of people have other shit they'd rather be doing, possibly (hopefully!) including things that you can't be bothered to learn about. If we all worry about the same things then we don't get where we're going.
There's a difference between being able to drive stick shift, and "fully understand"-ing how to drive stick shift.
I would absolutely say that I can drive stick shift because I've proven it to examiners in multiple countries (not every country permits a license exchange sadly), but to "fully understand" something is (well) something of a higher bar so I would ask how high? to make sure I understand what it is they expect me to understand. There might be a nuance that you (or I!) hadn't considered, for example driving stick shift with an unsynchronised manual transmission, or shifting without clutching, or something else entirely, and the point of the "fully understand" qualifier might be to call attention to something like this.
For example: Say I can implement paxos in my sleep, debug someone elses implementation, and I can even design and implement a new consensus algorithm based on paxos where I may satisfy requirements of parts in slightly different ways that are extremely domain-specific but may be beneficial from a performance/time/space/cost perspective. This is what someone might mean by "fully understand"-ing paxos, but making some changes to paxos may take time for me to convince myself it is correct (or incorrect!), or there might be parts I think are essential/important that aren't (in some situations), and so I would still be interested in learning something new. Who wouldn't?
However, I don't have good experiences with people who say "there are only X people who fully understand Y so you should use Z" -- it's
usually means they don't understand Y as well as they understand Z, and that's not what I want from an expert.
Consensus protocols (and the related problems of state machine replication, permissioned blockchains) have really come a long way over the years. They've become more performant and much simpler (in my opinion).
(I'll take the opportunity to pub streamlet: it's simpler, has only two message types "propose" & "vote", and it tolerates malicious faults. https://decentralizedthoughts.github.io/2020-05-14-streamlet...)