I haven't read the whole thing, but I really enjoyed the introduction and agree completely. The RDBMS I've spent the most time with over the years is Microsoft SQL Server (from the moment 2000 was released), and over the years I've been surprised by two things: (a) how many developers work with databases every day yet haven't the foggiest idea how to do anything the GUI tooling doesn't handle "easily", and (b) how quickly a developer can go from zero to "able to keep things performing well for all but the ugliest scenarios" knowledge-wise.
I've watched that last part unfold on many occasions. Usually I'd be brought in because the developers had done everything up to and including "copying the data to denormalized tables[0]" to try to sort out slowness. Most of the time, just blindly investigating the schema will turn up something frightening that the ORM did, or a mess of inappropriate indexes. In one case, adding two indexes to a table used by nearly every query in the database turned 30-second requests into sub-second responses[1]. I'm never willing to walk in and promise that, but I can't think of a time it hasn't happened when similar circumstances were presented to me.
So if you're dragging your feet about learning SQL, take this as my encouragement: it's one of those things where the rewards come quickly and the effort is far less than you probably expect.
[0] ... with a broken sync process that has to be run carefully b/c it hammers the already over-sized database every time it fires.
[1] If memory serves, it was an account table ... the app and the database passed a GUID back and forth, but the primary key was an auto-incrementing integer field which was used as the FK in other tables. All I remember is adding a unique index to the GUID field and including the e-mail address/name columns that appeared in every query. Ran the fix in production and it felt like "the dam broke".
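For the curious, here's a hypothetical reconstruction of that kind of fix, using SQLite so it's self-contained (the original was SQL Server; the table and column names here are invented):

```python
import sqlite3

# The situation described in [1]: the app looks rows up by GUID, but only
# the integer PK is indexed, so every lookup walks the whole table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- FK target for other tables
        guid  TEXT NOT NULL,                      -- what the app queries by
        email TEXT,
        name  TEXT
    )
""")

query = "SELECT email, name FROM account WHERE guid = ?"

# Before the fix: the planner has no choice but a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query, ("abc",)).fetchall()
assert any("SCAN" in row[3] for row in before)

# The fix: a unique index on guid that also carries the columns every
# query selects, so lookups never touch the base table. SQLite lacks SQL
# Server's INCLUDE clause, so the columns go into the index key instead
# (note: that makes the whole tuple unique, not guid alone).
conn.execute("CREATE UNIQUE INDEX ix_account_guid ON account (guid, email, name)")

after = conn.execute("EXPLAIN QUERY PLAN " + query, ("abc",)).fetchall()
assert any("COVERING INDEX" in row[3] for row in after)
```

"Covering index" is the search term if you want the general technique: the index alone answers the query, so the 30-seconds-to-sub-second jump isn't as magical as it sounds.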
1. Trace your application, so you know which query is running slowly in production.
2. Using this exact query, run `EXPLAIN ANALYZE` or whatever your RDBMS equivalent is.
3. Read the output and see which step is taking the longest. Use Google to help.
4. Google-fu until you find out which index might help you.
Over time, (3) and (4) require less and less Google, because there are really only a few common cases that give you 100% of the speedup in 80% of cases.
Once you know this path to improvement exists, it's trivial to progress down it habitually. The `EXPLAIN ANALYZE` output looks like Greek at first, but quickly becomes as familiar to parse as a compiler error, etc.
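The loop above can be sketched concretely. Here's a toy version, with SQLite's `EXPLAIN QUERY PLAN` standing in for `EXPLAIN ANALYZE` and an invented table and query:

```python
import sqlite3

# Steps 2-4 in miniature: run the planner on the exact slow query, spot
# the full scan, add the index that fixes it. (SQLite stands in for your
# real RDBMS; table and column names are made up.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

slow_query = "SELECT total FROM orders WHERE customer_id = ?"

# Step 3: read the plan. A detail line containing "SCAN" means every row
# in the table gets visited for each request.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + slow_query, (42,)).fetchone()[3]
print(plan_before)

# Step 4: an index on the filtered column turns the scan into a search.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + slow_query, (42,)).fetchone()[3]
print(plan_after)
```

The plan output really is that short for simple queries, which is why the "Greek" phase passes quickly.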
DB 'expert' here - agree completely. It's about profiling then banging your head against it and learning more from books/the docs/web. That's really it.
If you're really starting out from square 1, I might read this book first: https://www.amazon.com/Database-Design-Mere-Mortals-Annivers... (although I read an earlier edition). The Art of SQL is a great book, but I wish I'd read something a bit...easier...first, lol. I struggled with it.
Yeah well, I'm old school (or at least old), so I already had a good grasp of SQL, but I think the book is good because it explains the overarching concepts very well.
The details depend heavily on what specific database you’re working with, but in broad strokes, you’ll want to know: indices, query plans, when and where to cache expensive computations, every f—- word of your DB manual’s chapter on concurrency control, and the fastest method(s) for yeeting data in and out from your language/toolkit of choice.
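On that last point, the single biggest "yeeting data in" win is usually batching. A hedged sketch, again with SQLite (on Postgres you'd reach for COPY, on SQL Server for BCP/bulk insert; the table name here is invented):

```python
import sqlite3

# Bulk load: one executemany inside a single transaction, instead of an
# auto-committed INSERT per row. Per-row commits force a sync/round-trip
# each time, which is what usually kills naive import scripts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

rows = [(i, f"event-{i}") for i in range(10_000)]
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 10000
```

Whatever your toolkit calls it (executemany, batch statements, bulk copy), it's worth finding before you write a loop of single-row inserts.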
The last few years I've been working on a fairly big django project written by someone learning the problem domain and python/django at the same time.
I have a lot of similar stories, and you are absolutely right to encourage people to learn the basics, at least (looking at you, indexes and perf tools).
We're talking about our CPU hitting 80-90% and the internal temperature rising to 70-80°C, while executing a task that's normally performed tens or hundreds of times during the workday.
A couple of carefully planned indexes and a bit of SQL shuffling is all it takes sometimes.
The upside of working with cheap bare metal servers is that you catch these things early on (5 concurrent users and your server is toasted). Fun times.