>Issue #1 is that Kafka’s server.properties file has the line log.flush.interval.messages=1, which forces Kafka to fsync on each message batch. So all tests, even those where this is not configured in the workload file, will get this fsync behavior. I have previously blogged about how Kafka uses recovery instead of fsync for safety.
Respect to the Kafka team as Kafka is an incredible piece of software, but the Mongo guys got torched for eternity for pulling the same shenanigans.
I don't think Kafka eschewing fsyncs is a bad thing; I'm aware of the risks. What I'm pointing out, and what got Mongo killed in the court of public opinion, was saying "our database is blazing fast because we turned off fsyncs".
Benchmarking a system that fsyncs every write against one that doesn't isn't an apples-to-apples comparison. You are free to make the argument that you might not need them, but if you are benchmarking systems and one of them fsyncs by default, that is the level of durability I'm going to expect. Otherwise I can assume the other guy will be just as fast if he turns off fsyncs as well.
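For anyone who wants to see the size of that gap on their own hardware, here is a rough sketch in Python. It is not how Kafka or any database writes its log; `timed_writes` and `compare` are hypothetical names made up for illustration, and the record count and payload size are arbitrary assumptions. It only shows that "fsync after every write" and "let the OS flush whenever" are different workloads.

```python
import os
import tempfile
import time


def timed_writes(path, n_records, fsync_each, payload=b"x" * 128):
    """Write n_records payloads to path, optionally fsyncing after each one.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_records):
            f.write(payload)
            if fsync_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the device
    return time.perf_counter() - start


def compare(n_records=200):
    """Run the same write loop with and without per-write fsync."""
    with tempfile.TemporaryDirectory() as d:
        no_sync = timed_writes(os.path.join(d, "nosync.log"), n_records, False)
        sync = timed_writes(os.path.join(d, "sync.log"), n_records, True)
    return {"no_fsync_s": no_sync, "fsync_each_s": sync}


if __name__ == "__main__":
    print(compare())
```

The numbers depend heavily on the disk, filesystem, and OS, so treat any single run as anecdotal. The point is only that both systems in a benchmark should be run in the same mode.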
Last I heard of MongoDB it was getting utterly buried by the Jepsen guy, and for anyone that follows distributed systems at some technical level, that is damning. He finds stuff wrong with everything, but that report was particularly bad.
MongoDB has always seemed to place write consistency secondary to other priorities (mostly sales / read / features) which is frankly a crap way to do a database, much less a distributed one. And I am so sick of MongoDB basically saying "no it's fixed in the new version" which is always a major red flag.
Right now it's getting its lunch eaten by Postgres's document interface from what I can tell.
a) Every distributed database has had serious issues with Jepsen.
b) MongoDB has been growing revenue ~40% year on year for the last few years.
c) PostgreSQL is only a serious competitor for MongoDB if you have small datasets. After all these years PostgreSQL is still ridiculously poor when it comes to clustering, replication etc. Everyone's solution of "just buy a bigger instance" is just laughable.
Jepsen does find stuff with everything. So you have to judge whether what is being discussed is serious and blatantly bad, or just the usual "wow distributed is hard".
"MongoDB’s default level of write concern was (and remains) acknowledgement by a single node, which means MongoDB may lose data by default."
Cassandra doesn't do that; consistency level is fundamental to its documentation and user guide. That is AWFUL.
"Curiously, MongoDB omitted any mention of these findings in their MongoDB and Jepsen page. Instead, that page discusses only passing results, makes no mention of read or write concern, buries the actual report in a footnote, and goes on to claim:
MongoDB offers among the strongest data consistency, correctness, and safety guarantees of any database available today."
That is fraud. That is a clown show. Enjoy your increasing revenue.
Although it was some time ago and I may be misremembering, I seem to recall reading the Jepsen article on Redpanda and thinking that it (and PostgreSQL) were among the better reports.
Certainly, not all Jepsen reports are all that bad, and tbh I'm at least as interested in the way the vendors respond (some of which have been terrible).
With the usual recommended settings, XFS filesystem, 3 replicas, 2 "in-sync" replicas, etc., it is rather safe. You can also tune background flush to your liking.
The above tradeoffs are very reasonable and Kafka runs very fast on slow disks (magnetic or in cloud), and even faster on SSD/NVMe disks.
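To make the "3 replicas, 2 in-sync replicas" setup concrete, here is a deliberately simplified model in Python of the acknowledgement rule as I understand it for a producer using `acks=all` with `min.insync.replicas=2`. This is not Kafka's code; `produce_accepted` and `tolerated_broker_failures` are hypothetical helpers, and the model ignores leader election, flush timing, and everything else that matters in a real outage.

```python
def produce_accepted(isr_count, min_insync_replicas=2):
    """Simplified rule: with acks=all, a write is accepted only while the
    in-sync replica set (leader included) is at least min.insync.replicas.
    """
    return isr_count >= min_insync_replicas


def tolerated_broker_failures(replication_factor=3, min_insync_replicas=2):
    """Replicas that can drop out while writes are still accepted,
    under the same simplified model.
    """
    return max(replication_factor - min_insync_replicas, 0)


if __name__ == "__main__":
    for isr in (3, 2, 1):
        print(f"ISR={isr} accepted={produce_accepted(isr)}")
    print("tolerated failures:", tolerated_broker_failures())
```

Under this model an acknowledged write exists on at least two brokers, which is the argument for relying on replication rather than a per-message fsync. Whether that is enough for a given workload is a separate call.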
MongoDB has been doing fsync by default for over a decade now.
And those who had actually tried it were aware that every client enabled fsync out of the box. So in fact the entire situation was seriously overblown.
But sure, let irrational ideology affect your technology decisions. That will work out well.
Avoiding a database that has a proven historical record of disregarding data consistency and resorting to marketing gimmicks is "irrational ideology"?
Not everyone has time to review every single line of code in their tech stacks. Past reputation is important, and your replies here don't seem to be of much help to MongoDB's reputation as far as I can tell.