>*Issue #1 is that in Kafka’s server.properties file has the line log.flush.inte...

haggy · on May 15, 2023

Kafka, unlike MongoDB, relies on recovery/replication instead of fsync:

https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-...

Kafka has never tried to hide that fact and it does not, in any way, make Kafka unsafe.

nemothekid · on May 15, 2023

I don't think Kafka eschewing fsyncs is a bad thing; I'm aware of the risks. What I'm pointing out, and what got Mongo killed in the court of public opinion, was saying "our database is blazing fast because we turned off fsyncs".

Benchmarking a system that fsyncs every write against one that doesn't isn't an apples-to-apples comparison. You are free to argue that you might not need fsyncs, but if you are benchmarking systems and one of them fsyncs by default, that is the level of durability I'm going to expect. Otherwise I can assume the other guy will be just as fast if he turns off fsyncs as well.
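To make the apples-to-apples point concrete, here is a minimal Python sketch (not from the thread) that runs the same append workload with and without an `os.fsync` after every write. The helper name `timed_appends` and the record sizes are illustrative assumptions, and the absolute timings depend heavily on the disk and filesystem.

```python
import os
import tempfile
import time


def timed_appends(path, records, fsync_each):
    """Append records to path, optionally fsyncing after every write.

    Returns elapsed wall-clock seconds. Hypothetical helper for illustration.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)
            if fsync_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to persist to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    records = [b"x" * 100 for _ in range(500)]
    with tempfile.TemporaryDirectory() as d:
        t_sync = timed_appends(os.path.join(d, "sync.log"), records, True)
        t_nosync = timed_appends(os.path.join(d, "nosync.log"), records, False)
    print(f"fsync per write: {t_sync:.4f}s, no fsync: {t_nosync:.4f}s")
```

Both runs produce identical files; only the durability guarantee at the moment each write returns differs, which is why comparing their throughput directly is misleading.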

fulafel · on May 15, 2023

Is durability preserved when you lose replica connectivity around the same time as power to your CPU? As tends to happen.

skyde · on May 15, 2023

Exactly. I will never ever try MongoDB because of that. A database that does not fsync should not be called a database.

uberduper · on May 15, 2023

MongoDB moved on from mmap at version ~3.6. WiredTiger can be configured to fsync every commit. Enjoy trying MongoDB!

PS: I really miss working with mongodb. It's been almost 7 years since I last used it. I'm surprised I don't see it mentioned very often anymore.

AtlasBarfed · on May 15, 2023

Last I heard of MongoDB it was getting utterly buried by the Jepsen guy, and for anyone that follows distributed systems at some technical level, that is damning. He finds stuff wrong with everything, but that one was particularly damning.

MongoDB has always seemed to place write consistency secondary to other priorities (mostly sales / read / features) which is frankly a crap way to do a database, much less a distributed one. And I am so sick of MongoDB basically saying "no it's fixed in the new version" which is always a major red flag.

Right now it's getting its lunch eaten by Postgres's document interface from what I can tell.

threeseed · on May 15, 2023

a) Every distributed database has had serious issues with Jepsen.

b) MongoDB has been growing revenue ~40% year on year for the last few years.

c) PostgreSQL is only a serious competitor for MongoDB if you have small datasets. After all these years PostgreSQL is still ridiculously poor when it comes to clustering, replication, etc. Everyone's solution of "just buy a bigger instance" is just laughable.

KptMarchewa · on May 15, 2023

Growing revenue of the owning company as an argument for a database? We have an Oracle fan here.

AtlasBarfed · on May 16, 2023

Jepsen does find stuff with everything. Thus you have to know whether what is being discussed is serious and blatantly bad, or just the usual "wow distributed is hard".

Which is why his papers are so great.

But the MongoDB one was "wow this is bad".

threeseed · on May 16, 2023

Every distributed database has been "wow this is bad".

I assume you have an example of one that wasn't?

AtlasBarfed · on May 16, 2023

"MongoDB’s default level of write concern was (and remains) acknowledgement by a single node, which means MongoDB may lose data by default."

Cassandra doesn't do that, consistency level is fundamental to the documentation and user guide. That is AWFUL.

"Curiously, MongoDB omitted any mention of these findings in their MongoDB and Jepsen page. Instead, that page discusses only passing results, makes no mention of read or write concern, buries the actual report in a footnote, and goes on to claim:

    MongoDB offers among the strongest data consistency, correctness, and safety guarantees of any database available today.

"

That is fraud. That is clownshow. Enjoy your increasing revenue.

threeseed · on May 16, 2023

The default write concern for the last 2 years has been majority.

And single node is a perfectly fine default for most use cases.

After all, Cassandra's default consistency level is ONE.

dorlaor · on May 16, 2023

Most users of ScyllaDB/Cassandra use quorum or local_quorum

threeseed · on May 16, 2023

And most users of MongoDB use majority write concern.

Hence my point there is no difference in defaults between MongoDB and Cassandra.
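The trade-off being argued over (single-node acknowledgement versus majority/quorum) can be sketched with standard quorum arithmetic. This is a simplified model under stated assumptions, not either database's implementation; the function names are hypothetical.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def read_overlaps_write(n, w, r):
    """Classic quorum rule: every read set of size r intersects every
    write set of size w among n replicas exactly when r + w > n."""
    return r + w > n


if __name__ == "__main__":
    n = 3
    q = majority(n)
    print("majority of 3:", q)
    print("w=1, r=1 overlap:", read_overlaps_write(n, 1, 1))
    print("w=majority, r=majority overlap:", read_overlaps_write(n, q, q))
```

With three replicas, acknowledging on a single node and reading from a single node gives no overlap guarantee, while majority writes paired with majority reads always share at least one replica.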

doctor_eval · on May 16, 2023

Although it was some time ago and I may be misremembering, I seem to recall reading the Jepsen article on Redpanda and thinking that it (and PostgreSQL) were among the better reports.

Certainly, not all Jepsen reports are all that bad, and tbh I'm at least as interested in the way the vendors respond (some of which have been terrible).

nezirus · on May 15, 2023

Kafka doesn't do any stupid tricks, but uses the underlying platform to its full potential: https://kafka.apache.org/documentation/#linuxflush

With the usual recommended settings, XFS filesystem, 3 replicas, 2 "in-sync" replicas, etc., it is rather safe. You can also tune background flush to your liking.

The above tradeoffs are very reasonable and Kafka runs very fast on slow disks (magnetic or in the cloud), and even faster on SSD/NVMe disks.
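A rough Python model of the "3 replicas, 2 in-sync replicas" setup mentioned above, assuming producers use `acks=all` (an assumption, since the comment does not state it). With that setting the partition leader rejects writes when the in-sync replica set is smaller than `min.insync.replicas`. The function names are hypothetical, and the model ignores leader election and rack placement.

```python
def write_accepted(isr_size, min_insync_replicas=2):
    """Model of acks=all: a write is accepted only while the in-sync
    replica set has at least min.insync.replicas members."""
    return isr_size >= min_insync_replicas


def tolerated_broker_losses(replication_factor=3, min_insync_replicas=2):
    """Replicas that can drop out of sync while acks=all writes are
    still accepted, assuming all replicas start in sync."""
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    for isr in (3, 2, 1):
        print(f"ISR size {isr}: write accepted = {write_accepted(isr)}")
    print("tolerated losses:", tolerated_broker_losses())
```

Under these assumptions an acknowledged write exists on at least two brokers, so losing unflushed page cache on one of them does not by itself lose the record.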

datadeft · on May 15, 2023

Kafka is not a database....

postalrat · on May 15, 2023

Maybe you could say that if it acted like redis pub/sub and nothing was stored.

threeseed · on May 15, 2023

MongoDB has been doing fsync by default for over a decade now.

And those who had actually tried it were aware that every client enabled fsync out of the box. So in fact the entire situation was seriously overblown.

But sure let irrational ideology affect your technology decisions. That will work out well.

hnfong · on May 16, 2023

Avoiding a database that has a proven historical record of disregarding data consistency and resorting to marketing gimmicks is "irrational ideology"?

Not everyone has time to review every single line of code in their tech stacks. Past reputation is important, and your replies here don't seem to be of much help to MongoDB's reputation as far as I can tell.