Paxos livelocks (doesn’t work). So Lamport suggests electing a coordinator via t...

ahelwer · on Sept 1, 2019

Paxos livelocks in a very unstable state which requires multiple independent nodes to sent specific messages at specific times. If you add random intervals between retries, the probability of staying in a livelocked state quickly grows astonishingly low.

vyodaiken · on Sept 1, 2019

I wrote a little simulation. Paxos rarely gets stuck with a small number of proposers/acceptors, but jams up solid once you get to, I forget, 10 or 20. Here it is:

https://www.yodaiken.com/wp-content/uploads/2016/10/paxos.pt...

Embarrassingly badly written Lua code for the simulator https://github.com/yodaikenv/consensus

The key to me is that once you accept the need for leader election, you have much simpler alternatives and can toss away the rest of paxos.

ahelwer · on Sept 1, 2019

This is authentically a very interesting simulation and I would be interesting to see a writeup if you have one. I will say that you only have multiple proposers if the proposers cannot talk to one another, as otherwise they would be silenced by any simple leader election protocol (heartbeats + node rank, etc.) It's true that the leader election protocol is not part of paxos per se but all implementations will have one, and the protocol livelocks in the unusual case where there's an enduring network partition and the multiple proposers are sending messages with specific timing.

Edit: I took a look at your post[0] and it is interesting. I'll take a look at CM.

[0] https://www.yodaiken.com/papers-and-talks/distributed-consen...

grogers · on Sept 1, 2019

What is the simpler and more efficient solution than paxos for consensus that uses timeouts?

vyodaiken · on Sept 1, 2019

stage 1: elect a single coordinator (which you have to do with paxos anyways) stage 2: the coordinator sends each message over and over until it has accepts from 51% of acceptors or it times out and starts a reformulation of the acceptor set

If an acceptor does not receive a message from the coordinator over a sufficient period,it times out and tries to trigger a new coordinator election

Coordinator election: try for t time units to get a majority of acceptors to accept your proposal to be coordinator. give up on timeout or notification that there is a running coordinator.

This is the simplest method and not the most efficienct, but it is both live and safe.

Chang/Maxemchuk propose to make stage 2 more efficient that acceptors be arranged in a cycle and messages from the coordinator be numbered sequentially. Then each acceptor takes a turn to broadcast its accept but only does it if it has seen all prior (lower numbered) messages and their accept message responses. on each cycle completion, the coordinator can commit the first message in the cycle.

grogers · on Sept 1, 2019

If the leader fails, the next leader needs to know the values the acceptors have already accepted in order to propagate the previously chosen value, if any. I assume this happens in your election phase. Since we only want to talk to a majority of acceptors during the election, we need a way to order the values proposed by different leaders (ballot number). Without that, we'd need to receive responses from possibly all acceptors, which would break liveness. Once the leader has proposed a value at a ballot number, acceptors need to not accept values from older ballots, otherwise the old leader, having woken up from a nap, could just overwrite the chosen value. So in filling in the not specified portions of your algorithm in the simplest way possible to make it work, we've just described paxos.

I will have to read the Chang-Maxemchuk paper, I have not heard of that before.

vyodaiken · on Sept 1, 2019

You don't need Paxos at all. In fact, you should read Liskov's paper https://www.yodaiken.com/2018/07/16/practical-uses-of-synchr...

The key is to decide whether timeouts work in your network. If timeouts don't work and you don't have some reliable message protocol, then nothing works. If timeouts do work, then a completely simple obvious brute force algorithm works. In this algorithm, we have leader/coordinator election. The leader then polls the acceptors for the highest sequence number of a committed message and recovers all missing messages. Then the leader starts sending new messages - increasing the same sequence number. The old leader, if it recovers should know that its lease as leader has timed out, but if it does not it cannot get acks from a majority of acceptors for a new message because a majority have signed on to the new leader. There is no "value" competition - only a competition for the id of the new leader during leader election. We don't need Paxos at all.