Hacker News

Anyone else disappointed the post didn't go into all that much detail? Scaling databases is hard if you're not privileged enough to have access to large pools of money to hire good DBAs.

I would love to see companies like GitHub open up their database schemas to the public with mock data. Scaling is one aspect, but the best thing you can do in the beginning is to create a solid schema (normalise, denormalise...), so it would be interesting to see what GitHub uses and why. Still, it's awesome to see MySQL being the choice of large companies like GitHub in the face of new and untested NoSQL databases like MongoDB.



Or better, more solid relational options like Postgres...

Honestly, is there any reason to use MySQL over Postgres at this point? Or is it sort of six of one, half a dozen of the other, as long as the data model is decent?


INSERT IGNORE and REPLACE are two pretty good reasons, in my opinion. Postgres also doesn't have real table partitioning. Yes, you can sorta kinda hack around the first with stored procs. And yes, you can do something that looks a lot like partitioned tables using table inheritance. And yes, Postgres now has replication support, if you don't mind using only row-based replication (MySQL lets you choose between row-based and statement-based replication) among other tradeoffs.
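For readers who haven't used them: MySQL's INSERT IGNORE silently skips a row whose key already exists, while REPLACE deletes the conflicting row and inserts the new one. The minimal sketch below illustrates those semantics with Python's built-in sqlite3 module, using SQLite's INSERT OR IGNORE and INSERT OR REPLACE as stand-ins because they behave similarly; it is not MySQL itself, and the users table is a hypothetical example. A plain INSERT on a duplicate key fails, which is the case a stored-proc workaround has to handle.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL's semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

# A plain INSERT with a duplicate key raises an error.
try:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'bob')")
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

# Like MySQL's INSERT IGNORE: the duplicate row is skipped, 'alice' stays.
conn.execute("INSERT OR IGNORE INTO users (id, name) VALUES (1, 'bob')")
after_ignore = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]

# Like MySQL's REPLACE: the old row is deleted and the new one inserted.
conn.execute("INSERT OR REPLACE INTO users (id, name) VALUES (1, 'carol')")
after_replace = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]

print(plain_insert_failed, after_ignore, after_replace)  # True alice carol
```

Because REPLACE-style semantics delete and re-insert rather than update, any column omitted from the new row falls back to its default instead of keeping its old value.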

So yeah, Postgres is a better relational database than MySQL, if you ignore all the things MySQL does better. Another great thing about Postgres is that whenever a site like Hacker News gets a thread about MySQL, you get a bunch of people asking why you aren't using Postgres instead, and then whenever you try and answer the question you get a bunch of Postgres users to tell you you're wrong or that the features you care about don't matter (or that they're hard to implement, which... this is my problem why?) or that Postgres really has replication that's as good as MySQL's this time we pinky swear. So most MySQL users get a first impression of Postgres' user community that is quite frankly rather unfavorable.

Oh, and MySQL has a lot more third-party documentation, tooling and support available.


Perhaps it has something to do with having worked in the pg codebase for a time. It's clean as a whistle, I respect that a lot. But that's certainly not the whole picture.


I was surprised (and a little disappointed) that they were not using MariaDB, as it's made by the creator of MySQL, has more features, and is generally considered "upstream" of MySQL these days. It seems the MariaDB project would be more receptive to making changes and working with the GitHub team in order to scale.


I really don't want to get into a debate over the merits of MariaDB versus MySQL, as a lot of it depends on your workload. I investigated moving a MySQL application to MariaDB and some of the features we were using just weren't there yet, and I don't think all of them made it into the most recent release (but don't quote me on that -- I don't maintain that code anymore, so my memory could be hazy on it.)

But the idea that MariaDB is "upstream" of Oracle MySQL is silly. Is Oracle even merging code from MariaDB?


Well, there are two versions of MariaDB: the 5.x tree (which lags behind the 5.x tree of MySQL and merges changes from MySQL as they are released), and the 10.x tree, which is where MariaDB introduces new features that are not yet in MySQL; MySQL has merged some of those changes into their project. So both projects sort of merge from each other at times...

I'm no DBA, but I'd think Monty (the creator of MySQL) and his smaller crew at his consulting firm (SkySQL and MariaDB Consulting), which makes MariaDB, would be more open and flexible about working directly with GitHub's teams and needs than Oracle's bureaucracy would be.


So neither is upstream of the other at this point; they're forks. And contrary to what people are saying in this thread, as of 10.0 MariaDB is no longer a drop-in replacement for the most recent version of Oracle MySQL, and MariaDB's developers are no longer committing to porting all Oracle MySQL features. So MariaDB is no longer a Percona Server-like "MySQL plus goodies" upgrade proposition (and it really hasn't been for a long time; the 5.x series is still at 5.5). MariaDB will actually tell you which MySQL 5.6 features they support:

https://mariadb.com/blog/mysql-56-vs-mariadb-100

So if you need, for instance, MySQL 5.6's partitioning improvements:

https://blogs.oracle.com/MySQL/entry/mysql_5_6_is_a

You're better off on Oracle MySQL. There's other tradeoffs, depending on what you use. What bugs the heck out of me is how a fair number of MariaDB advocates spread FUD about Oracle (if you judge them by their track record, they've committed to improving and maintaining MySQL -- they're not perfect, but that's no excuse to harp about stuff they COULD do when there's no evidence they WANT to sabotage MySQL to force you to switch to Oracle), and want to turn the debate into a holy war rather than focusing on letting everyone pick the best tool for the task.


I think you should research MariaDB some more. There are a lot of reasons to use it, and a lot of companies are switching. In fact, just today we upgraded our Zimbra cluster and were surprised to see they had made the switch from MySQL to MariaDB. This isn't "FUD", as you put it, but rather a better product for a lot of reasons.

> You're better off on Oracle MySQL.

Hardly true, given the two DBs are mostly the same, except that the creator is now making newer and better changes in the 10.x branch of MariaDB (Monty left Oracle, just like most Sun employees, due to the internal politics and fighting that are regular at Oracle).


MySQL historically had more support for HA/clustering than Postgres. Recently, there's been a lot of progress on integrating Postgres clustering into the core, to the point where it's mature, but perhaps not as battle-tested. Not a reason to choose MySQL for a startup, I think, but if you've got a cluster working on MySQL and a clear understanding of its pitfalls, there's no real reason to switch to Postgres.


That makes sense, thanks. I guess as long as you have a reasonably optimized relational database of a given class, you're going to be about on the same order of magnitude of performance.


We chose MySQL where I work only because it's easier to hire people with a lot of MySQL knowledge. Technical reasons aren't the only ones when deciding what software/framework/libraries to use.


That can work both ways, though. Sometimes choosing a less popular option makes it easier to find outstanding candidates.


1. Monitoring and administration tools for MySQL are more polished (better?).
2. It's WAY easier to find MySQL DBAs than PG DBAs.
3. There are more resources in general around MySQL. Whatever problem or issue you have, it's already out there.


There are a lot of reasons IMO. MySQL has a proven track record for stability and performance powering huge sites. I would say MySQL has a nicer replication story as well.

In general MySQL is a lot more widely used with a greater pool of knowledge out there.


MariaDB is a drop-in replacement for MySQL, is made by the creator of MySQL (after he left Oracle post-Sun acquisition), has more features and is now generally considered the upstream of MySQL. I also bet Monty is more receptive to working directly with your teams to scale the product or make changes as necessary.


Nicer replication story? Try recent Postgres with WAL shipping + WAL streaming ;)


Tooling. While I agree psql tends to be more performant, when you're dealing with groups of people having good tools (and good documentation for those tools) trumps many considerations.

As an example, the organization I work at is considering a move to psql, but our main barrier is a lack of good DB clients that are accessible to people who aren't software developers. The best we're aware of as far as DB GUIs for Postgres is pgAdmin, whereas if we're talking MySQL you have things like MySQL workbench, SQLPro, and a myriad of other applications available for whatever your operating system of choice is.


After Postgres got replication out of the box I'm pretty sure it's just inertia and the fact that it's installed everywhere keeping mysql usage up. Postgres is so much nicer when dealing with day-to-day things.

E.g., disabling connections per database using SQL and having built-in query stats (http://www.postgresql.org/docs/9.4/static/pgstatstatements.h...) just made me smile.


The built in query stats is similar to Performance Schema in MySQL. The statistics are quite fine grained, and with a set of views on top (https://github.com/MarkLeith/mysql-sys) are very useful for observability.


Thanks, wasn't aware of that, quit mysql habits pre 5.6 :)


Technically, it was introduced in 5.5, but off by default :)

5.7 is going to be amazing for observability. Memory, transactions, stored procedures, replication, meta data locking and prepared statements are all instrumented in P_S.


MySQL is like PHP: it's installed almost everywhere and requires less setup. I did have one problem recently where MySQL turned out to be the better solution, as it has a native bitcount operation and a larger set of numerical types.
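For context, MySQL's BIT_COUNT() returns the number of bits set in an integer; a common use is comparing fixed-width hashes with BIT_COUNT(a ^ b). A minimal Python sketch of the same computation, assuming non-negative integers (the function names are hypothetical):

```python
def bit_count(n: int) -> int:
    """Number of set bits in a non-negative integer, like MySQL's BIT_COUNT()."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    return bin(n).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes; in MySQL: BIT_COUNT(a ^ b)."""
    return bit_count(a ^ b)


print(bit_count(0b1011))                  # 3
print(hamming_distance(0b1010, 0b0110))   # 2
```

On Python 3.10+ the built-in int.bit_count() does the same thing; bin(n).count("1") is used here so the sketch runs on older versions too.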


I thought it had more to do with GitHub being based on Rails, where MySQL is the default choice.

Yes, I really wish DHH and the Basecamp team would move to Postgres, thereby moving the community to it.


SQLite is the Rails default, in fact.

In October 2013, NewRelic reported that 53% of their customers are using PostgreSQL: http://blog.newrelic.com/2013/10/10/infographic-state-stack-...

In July 2014, Planet Argon's Rails survey showed a surge in preference for PostgreSQL since 2012: http://rails-hosting.com/Results/2014/index.html#Database

Neither captures the distribution of usage from small- to large-scale apps, but it does look like the community has already moved both in mindshare and number of production deployments. I'd wager that Heroku's PostgreSQL default is responsible for much of that.


I find it quite patronizing that you assume the only reason anyone would use MySQL is that it is the default choice.


>I find it quite patronizing that you assume the only reason anyone would use MySQL is that it is the default choice.

Really? Defaults are supposed to be sane and the best-fit-for-most-common-scenarios... so it's a little patronizing to automatically assume your project is just so special that defaults are not good enough.

Defaults should be good enough until they have been proven to not be good enough. Don't over-engineer your project.


Given the needs of most applications (which don't press either database's capabilities whatsoever), that really is probably the main reason.


I think there is some truth in it though; that and the inertia it currently has.

If you're picking up a framework, do you use the default database or try to do something non-default on top of what you're already learning?


One minor reason I noticed recently is that you can't change Postgres' listen(2) backlog. This can have a significant impact on responsiveness.


I guess using a connection queue is preferable? (Although it complicates the software stack a bit.)
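A connection queue here would mean something like a pooler in front of the database: clients wait in an application-level queue for one of a fixed number of connections instead of piling up in the kernel's listen backlog. A minimal sketch using Python's standard queue and threading modules, assuming a hypothetical FakeConnection in place of a real database driver:

```python
import queue
import threading
import time
from typing import Optional


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self, ident: int) -> None:
        self.ident = ident


class ConnectionQueue:
    """Hands out at most `size` connections; extra callers block until one is returned."""

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue(maxsize=size)
        for i in range(size):
            self._idle.put(FakeConnection(i))

    def acquire(self, timeout: Optional[float] = None) -> FakeConnection:
        # Blocks until a connection is free; raises queue.Empty if the timeout expires.
        return self._idle.get(timeout=timeout)

    def release(self, conn: FakeConnection) -> None:
        self._idle.put(conn)


pool = ConnectionQueue(size=2)
stats = {"in_use": 0, "peak": 0}
stats_lock = threading.Lock()


def worker() -> None:
    conn = pool.acquire()
    try:
        with stats_lock:
            stats["in_use"] += 1
            stats["peak"] = max(stats["peak"], stats["in_use"])
        time.sleep(0.01)  # pretend to run a query
    finally:
        with stats_lock:
            stats["in_use"] -= 1
        pool.release(conn)


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent connections:", stats["peak"])  # never more than 2
```

In practice this role is usually played by a standalone pooler process rather than in-process code, at the cost of one more component in the stack, but the queueing idea is the same.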


I was planning to move to Postgres, until TokuDB went open source.


The problem with going into too much detail is that it can become a list of recommendations that might not even apply to different people's infrastructure.

Most of the time MySQL optimization is workload specific.


Scaling databases is EASY if you pick a database designed for it. I've personally managed Cassandra, HBase and Riak on 30+ node clusters for companies. Almost no issues.

It shows how little you know about real-world scalability when you actually suggest MySQL over something like Cassandra. MySQL is a nightmare to scale unless you are sharding in your application layer.
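Sharding in the application layer means the application, not the database, decides which server holds a given row. A minimal sketch of hash-based routing in Python, assuming hypothetical shard URLs and string keys such as "user:42":

```python
import hashlib
from typing import Sequence

# Hypothetical shard locations; each would be a separate MySQL server.
SHARDS = [
    "mysql://db-shard-0/app",
    "mysql://db-shard-1/app",
    "mysql://db-shard-2/app",
    "mysql://db-shard-3/app",
]


def shard_for(key: str, shards: Sequence[str]) -> str:
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(shard_for("user:42", SHARDS))
```

A stable hash such as SHA-1 is used because Python's built-in hash() for strings is randomized per process. Plain modulo routing also remaps most keys whenever a shard is added, which is why real deployments often use consistent hashing or a lookup directory instead.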


It can be easier, but saying it's ever easy belies the bevy of issues that can (and will) always crop up.


> new and untested NoSQL databases

MongoDB is new and untested? What rock do you live under?

It is new compared to many of the veteran Relational databases like SQL Server (and I don't mean that in a bad way), but it is a proven technology used by many. See http://www.mongodb.com/who-uses-mongodb


Unfortunately, many of those who used MongoDB regretted the decision.

MongoDB is a good DB, but it is not good for many of the use cases people are using it for. A lot of companies have realized the mistake and are actively migrating off it. We are planning an "Off-Mongo party" with another company once we both manage to migrate off.


I think you're being too kind.

Many of us who used MongoDB have concluded that there is no task at which it is in any way better than all other existing alternatives.


That's a pretty bold statement. Care to enlighten?


I agree that document-oriented databases don't deserve a lot of the hype they get (use the right tool for the job and all that), but there are definitely some use cases where they make sense.

At my company most of our systems are SQL Server powered, but one of the newer systems that stores large blobs of metadata for products is using Mongo, and it is working quite well.


You are either being disingenuous or you're ignorant.

Many of the companies that switched off MongoDB were growing and ended up moving up to databases like Cassandra. MongoDB is a great database from when you're starting to when you're mid sized.

Cassandra destroys PostgreSQL in scalability but we don't say PostgreSQL is a crap database because of it.


What makes Cassandra unsuitable for starting to mid sized where MongoDB is better?


Calling it new and untested is a bit too much. It's probably more accurate to describe it as being tested and found unworthy for most use cases.



That has so much win. Thank you.



