Ask HN: Tell me about modern databases
25 points by thom on Sept 10, 2016 | 16 comments
Day after day, I sit here crafting data for some vague analytics with Postgres, close to despair. All I want is for the query planner to take a view, go away for five or ten minutes, and then run it optimally, or at least suggest some indexes, but this isn't the way it seems to want to allocate resources.

I suppose I've always been slightly suspicious of modern database technology beyond stodgy stuff like Oracle, SQL Server, MySQL and PostgreSQL, but I'm at the point where I would like to know what my options are.

I don't think my workload is particularly outlandish - 35m events with around 100 attributes, timestamped, some GIS data. Some typical workloads include things like sessionization, grouping events together, and asking questions like "how long after events of type X did an event of type Y happen?", "what was the last event in session Z?", "how long is the path from event A to B to C?" On top of that, all sorts of aggregates get computed over events. Training data gets extracted and predictions fed back in via bits of Clojure or R code.
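To make that concrete, the "how long after X did Y happen" questions usually end up as window queries along these lines (the events table and column names here are just illustrative):

    -- time from each type-X event to the next type-Y event in the same session;
    -- the MIN ... FILTER over a forward-looking frame picks out the next Y
    SELECT session_id, event_time, next_y_time - event_time AS time_to_y
    FROM (
        SELECT session_id, event_time, event_type,
               MIN(event_time) FILTER (WHERE event_type = 'Y') OVER (
                   PARTITION BY session_id
                   ORDER BY event_time
                   ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
               ) AS next_y_time
        FROM events
        WHERE event_type IN ('X', 'Y')
    ) t
    WHERE event_type = 'X';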

I am reasonably au fait with modern SQL, I write a lot of window functions, I avoid some common pitfalls, I can generally index my way out of most performance bottlenecks, but my word it feels like a slog some days. So, I suppose my question is this: what am I missing? What fancy new technology exists that can take complex queries, optimise them and give me answers? I don't care about scaling to millions of requests, I don't care about latency, I care about how much time I'm losing looking through EXPLAIN ANALYZE outputs instead of doing real work.



Dell Software's Toad Query Optimizer (for Oracle and SQL Server) has an engine that does exactly that. It takes your SQL, tries rewriting it a bunch of different ways, throws in different compiler hints, and measures the response time/reads/CPU/etc for each execution, then gives you a set of graphs and recommendations.

I wanna say it was around $1500/seat last time I looked at it. It's worth every penny if you have to write a lot of complex queries. I've bought it for just one senior member of a dev team, and everybody brings their worst queries to that person for tuning.

Here's the trick, though: after it optimizes a query for you, keep your original query around. You're not going to want to try to edit the query that Toad produces - it can be horrifically complex. Instead, edit your original query, then run it through Toad again.


I will look into it, sounds like the kind of magic I'm after, and I'm not against hopping between platforms to save future pain. Thanks!


Sounds like a mistake to only have one member of the team have it. Why does this member get special treatment?


It sounds to me that while one license could be totally worth it, getting it for every developer would not be.


Why does it have to be a special treatment?


By definition, this person gets special treatment. I don't know why it has to be.


It seems like the first question to ask is, where is the query planner falling down? What are you finding in those EXPLAIN ANALYZEs that is wrong? When you say "that isn't the way it seems to want to allocate resources," what does that mean - do you mean you want it to auto-create indexes? SQL Server used to have a tool to suggest indexes based on a query workload, is that what you mean? Have you used https://explain.depesz.com/ ?
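(And if you're pasting plans into that site, it's worth including buffer stats as well as timings — something along these lines, where the query itself is just a placeholder for whatever you're tuning:)

    -- per-node timings plus buffer hit/read counts
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM events WHERE event_type = 'X';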


Yes, I spend a large part of my day looking at EXPLAIN output, often on that site. The SQL Server Database Engine Tuning Advisor is reasonably good, and something like that for Postgres would be great, but ultimately it doesn't seem to look at a query and come up with a semantically identical but structurally better way of phrasing it - it will never fix unnecessary big-O performance issues (in the way that, for example, posting a single query on Stackoverflow might). But even if such a tool exists for Postgres, I'd be running it _all the time_. I'm just naively wondering if technology has moved on. I have basically no writes outside an ETL process, much of the time this is effectively single-user ad-hoc stuff. I realise I might be being hopelessly naive, but it feels like I can clearly specify a query, and I have complete control over my schema. Surely something must exist that lets me pay more money but care less about endless performance tweaks just to create a view that is fast?
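To be clear about what I mean by "structurally better": rewrites like the one below (schema made up for the example), where a human or a Stackoverflow answer swaps a per-row subquery for a single sorted pass, but the planner never will:

    -- naive: one correlated subquery probe per session
    SELECT s.session_id,
           (SELECT e.event_id
            FROM events e
            WHERE e.session_id = s.session_id
            ORDER BY e.event_time DESC
            LIMIT 1) AS last_event
    FROM sessions s;

    -- same answer (assuming every session has events), but one pass over events
    SELECT DISTINCT ON (session_id) session_id, event_id AS last_event
    FROM events
    ORDER BY session_id, event_time DESC;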


So your queries are very diverse in their nature in an unpredictable way? I mean I'd assume you'd more or less know the big-Os for the stuff you're doing and have a vague idea if there's really room for improvement? At just 35m records I'd assume you're purely CPU-bound with vanilla high-order polynomial queries. Maybe what you really want is to rethink what you want the db to do. Perhaps you can take a bit more of a soft computing approach and settle for imprecise answers.
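For instance, if approximate aggregates are acceptable, sampling a fraction of the table and scaling up may already be good enough (Postgres 9.5+; the numbers are illustrative):

    -- rough count of type-X events from a 1% block sample, scaled back up
    SELECT count(*) * 100 AS approx_x_events
    FROM events TABLESAMPLE SYSTEM (1)
    WHERE event_type = 'X';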


If you're doing sufficiently focused sorts of queries - the question mentioned "find the path from A to B to C" - then you might get value out of a DB specifically focused on that question (a graph DB, or a time-series one). The Postgres FDW stuff may help you make integrating one of those less painful.
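The wiring for that generally looks like the sketch below — shown with postgres_fdw as a stand-in, since a graph or time-series engine would need its own FDW extension, and all the names here are invented:

    -- make a remote data source queryable as local tables
    CREATE EXTENSION postgres_fdw;
    CREATE SERVER events_remote FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'remote-host', dbname 'analytics');
    CREATE USER MAPPING FOR CURRENT_USER SERVER events_remote
        OPTIONS (user 'remote_user', password 'secret');
    CREATE SCHEMA remote;
    IMPORT FOREIGN SCHEMA public FROM SERVER events_remote INTO remote;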

Before you do that, though: have you thrown RAM at the thing? 35m events sounds like a lot to humans, but not when you can rent machines with 1.2TB of RAM by the hour...
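Even before renting the big box, it's worth checking the obvious memory knobs are sized for the machine you already have — the values below are purely illustrative, not recommendations:

    -- per-session bump for a big analytical query (memory per sort/hash node)
    SET work_mem = '1GB';
    -- instance-wide; shared_buffers needs a restart, effective_cache_size a reload
    ALTER SYSTEM SET shared_buffers = '64GB';
    ALTER SYSTEM SET effective_cache_size = '200GB';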


I was more thinking of spatial stuff when I talked about paths, and PostGIS is a pretty productive platform for me so far. Example calculations would be dividing the geodesic distance moved across a series of events by the total length of the path to calculate a score for 'directness'. PostGIS has aggregates to create lines from points, measure the total length, etc.
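Roughly this shape, in case it helps picture the workload (geom is a point column; the names are illustrative):

    -- 'directness' per session: geodesic distance from first to last point
    -- divided by the geodesic length of the full path (1.0 = perfectly direct)
    WITH paths AS (
        SELECT session_id,
               ST_MakeLine(geom ORDER BY event_time) AS path
        FROM events
        GROUP BY session_id
    )
    SELECT session_id,
           ST_Distance(ST_StartPoint(path)::geography,
                       ST_EndPoint(path)::geography)
             / NULLIF(ST_Length(path::geography), 0) AS directness
    FROM paths;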


I'd suggest the little red book (Stonebraker's, not Mao's):

http://www.redbook.io/


You're missing something like this: https://www.monetdb.org/


I seem to be the exact target market for this sort of thing, thank you! The SQL implementation seems reasonably modern, and while the GIS features are a little rudimentary there's at least something to work with.

My worry about column stores is that I can't guarantee my workload is optimised for that paradigm either - imagine a window function with multiple partition criteria etc etc, but I suppose it's worth testing.
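The kind of thing I'd want to benchmark first would be queries along these lines (columns made up), to see how a column store copes with several windows over one scan:

    -- two windows with different partition/order criteria over the same scan
    SELECT event_id,
           ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY event_time) AS step_in_session,
           AVG(metric_value) OVER (PARTITION BY user_id, event_type
                                   ORDER BY event_time
                                   ROWS BETWEEN 99 PRECEDING AND CURRENT ROW) AS rolling_avg
    FROM events;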


SQL is horrible for many use cases. Have a look at http://datomic.com


Is Datalog naturally fast for certain types of queries, or do performance gains for Datomic over RDBMSs come from being able to scale up a cluster? I don't have a feel for what Datomic's sweet-spot is, despite having much love for Clojure and its ecosystem.



