> Although, not every integration is customer-facing and does not have to be infinitely scalable. Scalability = cost, often a heavy cost.
You're either putting that cost on the client or paying to replicate their DB yourself. If that's too expensive, you shouldn't be using this approach in the first place! Running arbitrary queries on your customer's database primary is simply never safe, nor can it be made safe.
> If you query the DB once a day, the customer can come back to you and say: our db's are a little hot at 3AM and you do your queries then - can you add some indexes to the view or run a DESCRIBE query and optimize this a bit?
It's not a question of the DB running a little hot. It's accidentally triggering one or more full table scans on a table with millions of records. You have no idea what time your customer experiences load, or when their own batch jobs run. Or whether the index you built against with EXPLAIN six months ago got dropped because there's no queries in your customer's codebase that could possibly reference it.
And it's not just about their system staying up. What if you trigger so many IOPS that your customer has a massive bill? Or if you run a query that needs more space to prepare the results, which scales up the instance automatically, resulting in unexpected costs? You're gonna pay for that, right?
You also might hit a performance cliff: the query planner might choose a bad plan for a query that's otherwise satisfiable with an index only, because the stats suggest (for instance) that the visibility map is stale. There's nothing you can do here to fix this short of tuning settings on the DB itself.
> And let's agree that the above point does not change whether you use an API layer or query directly.
It does! Your customer (hopefully) knows their own DB. You don't. Have your customer build the API and they'll support your integration. You can't possibly have a better coupling with the data in your customer's database than your customer does.
> Running arbitrary queries on your customer's database primary is simply never safe, nor can it be made safe
Look, the only thing that's never safe is speaking in absolutes.
> It's not a question of the DB running a little hot. It's accidentally triggering one or more full table scans on a table with millions of records.
Does the table have a million records? And I mean come on, a million records and a FTS once a day on modern machine - for the sake of inserts I actually would go with it... Besides may I remind you: materialised views.
> It does! Your customer (hopefully) knows their own DB. You don't. Have your customer build the API and they'll support your integration.
I envy you. You live in a fantasy land where your customers don't make mistakes and are full pros. I mean that, or you don't care...
You're either putting that cost on the client or paying to replicate their DB yourself. If that's too expensive, you shouldn't be using this approach in the first place! Running arbitrary queries on your customer's database primary is simply never safe, nor can it be made safe.
> If you query the DB once a day, the customer can come back to you and say: our db's are a little hot at 3AM and you do your queries then - can you add some indexes to the view or run a DESCRIBE query and optimize this a bit?
It's not a question of the DB running a little hot. It's accidentally triggering one or more full table scans on a table with millions of records. You have no idea what time your customer experiences load, or when their own batch jobs run. Or whether the index you built against with EXPLAIN six months ago got dropped because there's no queries in your customer's codebase that could possibly reference it.
And it's not just about their system staying up. What if you trigger so many IOPS that your customer has a massive bill? Or if you run a query that needs more space to prepare the results, which scales up the instance automatically, resulting in unexpected costs? You're gonna pay for that, right?
You also might hit a performance cliff: the query planner might choose a bad plan for a query that's otherwise satisfiable with an index only, because the stats suggest (for instance) that the visibility map is stale. There's nothing you can do here to fix this short of tuning settings on the DB itself.
> And let's agree that the above point does not change whether you use an API layer or query directly.
It does! Your customer (hopefully) knows their own DB. You don't. Have your customer build the API and they'll support your integration. You can't possibly have a better coupling with the data in your customer's database than your customer does.