The Reddit guys think differently: "[...] (rabbitmq) died, which added about an hour to the downtime. It dies like this pretty often at 2am or at other especially bad times. Usually it doesn't cause any data-loss, just sleep-loss (its queues are persisted and the apps just build up their own queues until it comes back up), but in this case it decided to crash in a way that corrupted its database of persisted queues beyond repair. rabbitmq accounts for the only unrecoverable data-loss incurred, which was about 400 votes. [...] Coincidentally, rabbitmq crashed twice more that day and a few more times into the weekend. [...] Things have improved thus far, but replacing rabbitmq is at the top end of our extremely long list of things to do."
"Crashed"... I'm glad they're using such specific terms. I cut them a lot of slack because they run that shop with a skeleton crew, but they sure do run into a lot of issues with perfectly good software, have Twitter levels of performance & availability, and make some very odd technical decisions.