
People who have experience with Aurora and RDS Postgres: what's your experience in terms of performance? If you don't need multi-AZ and quick failover, can you achieve better performance with RDS and e.g. gp3 at 64,000 IOPS and 3,125 MBps throughput (assuming everything else can deliver that and CPU/memory isn't the bottleneck)? Aurora seems to be especially slow for inserts and also quite expensive compared to what I get with RDS when I estimate things in the calculator. And what's the story on read performance for Aurora vs RDS? There's an abundance of benchmarks showing Aurora is better in terms of performance but they leave out so much about their RDS config that I'm having a hard time believing them.


We've seen better results and lower costs in a 1-writer, 1-2-reader setup on Aurora PG 14. The main advantages are 1) you don't re-pay for storage for each instance--you pay for cluster storage instead of per-instance storage--and 2) you no longer need to provision IOPS; the cluster provides ~80k IOPS.

If you have a PG cluster with 1 writer, 2 readers, 10 TiB of storage, and 16k provisioned IOPS (io1/io2 has better latency than gp3), you pay for 30 TiB and 48k PIOPS without redundancy, or 60 TiB and 96k PIOPS with multi-AZ.

With the same Aurora setup you pay for 10 TiB once and get multi-AZ for free (assuming the same cluster layout and that you've put the instances in different AZs).

I don't want to work out the exact numbers, but IIRC, if you have enough storage--especially io1/io2--you can end up saving money and getting better performance. For smaller amounts of storage, the numbers don't necessarily work out.
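The storage math above works out roughly like this sketch. The per-GB and per-PIOPS prices here are illustrative assumptions in the style of us-east-1 list prices, not current AWS rates; check the pricing calculator for real numbers.

```python
# Rough sketch of the RDS-vs-Aurora storage cost comparison above.
# Prices are assumptions for illustration only:
#   io1 storage:  $0.125 per GB-month
#   io1 PIOPS:    $0.065 per PIOPS-month
#   Aurora:       $0.10  per GB-month (standard billing; I/O billed separately)

TIB_GB = 1024
storage_tib, piops, instances = 10, 16_000, 3  # 1 writer + 2 readers

# RDS: each instance carries its own copy of storage and provisioned IOPS
rds_single_az = instances * (storage_tib * TIB_GB * 0.125 + piops * 0.065)
rds_multi_az = 2 * rds_single_az  # multi-AZ doubles the storage/PIOPS charges

# Aurora: one cluster volume shared by all instances, multi-AZ included
aurora_storage = storage_tib * TIB_GB * 0.10

print(f"RDS single-AZ storage+PIOPS: ${rds_single_az:,.0f}/mo")
print(f"RDS multi-AZ storage+PIOPS:  ${rds_multi_az:,.0f}/mo")
print(f"Aurora cluster storage:      ${aurora_storage:,.0f}/mo (plus I/O billing)")
```

With these made-up rates the Aurora cluster volume is an order of magnitude cheaper than the multi-AZ RDS equivalent, which matches the parent's point that the savings grow with storage size.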

There are also two I/O billing modes to be aware of. The default is pay-per-I/O, which is really only helpful for extreme spikes and generally low I/O usage. The other mode is "I/O-Optimized", where you pay a flat 30% of the instance cost (on top of the instance cost) for unlimited I/O--you can get a lot more I/O and end up cheaper in this mode if you had an I/O-heavy workload before.
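A quick break-even sketch of the two modes. All prices here are assumptions for illustration (loosely modeled on us-east-1 list prices: ~$0.20 per million I/O requests and $0.10/GB-mo storage in standard mode, vs no per-I/O charge, +30% on instances, and pricier storage in I/O-Optimized); the crossover point depends entirely on your actual rates.

```python
# Hypothetical break-even between Aurora's standard and I/O-Optimized billing.
# Assumed rates (NOT authoritative):
#   standard:      $0.20 per million I/O requests, storage $0.10/GB-mo
#   I/O-Optimized: no per-I/O charge, instances +30%, storage $0.225/GB-mo

instance_cost = 3000.0    # $/month across the cluster (hypothetical)
storage_gb = 10 * 1024    # 10 TiB

def monthly_cost(millions_of_ios):
    standard = instance_cost + storage_gb * 0.10 + millions_of_ios * 0.20
    optimized = instance_cost * 1.30 + storage_gb * 0.225
    return standard, optimized

# Scan a few I/O volumes to see roughly where I/O-Optimized wins
for m_ios in (1_000, 5_000, 11_000, 20_000):
    std, opt = monthly_cost(m_ios)
    print(f"{m_ios:>6}M I/Os: standard ${std:,.0f} vs I/O-Optimized ${opt:,.0f}")
```

Under these assumed rates I/O-Optimized only pulls ahead around ~11 billion requests/month, i.e. it really is aimed at I/O-heavy workloads as the parent says.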

I'd also say Serverless is almost never worth it. IIRC, provisioned instances were ~17% of the cost of Serverless. Serverless only works out if you have roughly <4 hours of heavy usage followed by almost total idle. Using fixed instance sizes, you can add instances fairly quickly and fail over with minimal downtime (of course barring the bug the article describes...) to handle workload spikes without Serverless.
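Taking the parent's ~17% figure at face value, the break-even lands right around that "<4 hours of heavy usage" mark. Everything in this sketch is a hypothetical normalization (including the idle-baseline fraction, since Serverless v2 still bills some minimum ACUs when idle):

```python
# Serverless vs provisioned break-even, using the parent's ~17% estimate.
# All numbers are normalized hypotheticals, not AWS prices.

serverless_rate = 1.0    # cost per hour at full load (normalized to 1)
provisioned_rate = 0.17  # same capacity provisioned, billed 24/7
idle_fraction = 0.05     # assumed Serverless idle baseline (minimum ACUs)

def daily_cost_serverless(busy_hours):
    idle_hours = 24 - busy_hours
    return busy_hours * serverless_rate + idle_hours * serverless_rate * idle_fraction

daily_cost_provisioned = 24 * provisioned_rate  # flat, regardless of load

for h in (1, 2, 3, 4, 6):
    s = daily_cost_serverless(h)
    cheaper = "serverless" if s < daily_cost_provisioned else "provisioned"
    print(f"{h}h busy/day: serverless {s:.2f} vs provisioned {daily_cost_provisioned:.2f} -> {cheaper}")
```

With these assumptions, Serverless loses as soon as heavy usage exceeds roughly 3 hours a day, consistent with the parent's rule of thumb.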


Have you benchmarked your load on RDS? [0] says that reported IOPS on Aurora are vastly different from actual IOPS. We have just one writer instance and mostly write hundreds of GB in bulk.

[0] https://dev.to/aws-heroes/100k-write-iops-in-aurora-t3medium...


We didn't benchmark--we used APM data in Datadog to compare setups before and after the migration.

I believe the article is talking about aggregate I/O operations vs average I/O per second. I'm talking strictly about the "average per second" variety. The former is really only relevant for billing in the standard billing mode.

Actually, a big motivator for the migration was batch writes (we generate tables in Snowflake, export to S3, then import from S3 using the AWS RDS extension), and Aurora's ability to handle big spikes helped us a lot. We'd see query latency (as reported by APM) increase a decent amount during these bulk imports, and it was much less impactful with Aurora.

IIRC, with RDS PG, some common queries went from something like 4-5ms normally to 10-12ms during imports; on Aurora it was more like 6-7ms during imports (mainly because we were exhausting I/O during imports before).
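For reference, the S3-to-Postgres import step described above uses the `aws_s3` extension that RDS/Aurora PostgreSQL ship. This sketch just builds the SQL for such an import; the table, bucket, and key names are hypothetical, and actually executing it (via psycopg2 or similar, on an instance with the extension installed and an IAM role granting S3 access) is omitted.

```python
# Sketch of the S3 bulk-import call described above, via the aws_s3
# extension. Names below are placeholders, not the poster's setup.

def build_s3_import_sql(table, bucket, key, region):
    # aws_s3.table_import_from_s3 loads a file from S3 straight into a table;
    # the empty string means "all columns", and the options string is COPY syntax.
    return (
        "SELECT aws_s3.table_import_from_s3("
        f"'{table}', '', '(format csv)', "
        f"aws_commons.create_s3_uri('{bucket}', '{key}', '{region}'))"
    )

sql = build_s3_import_sql("events", "my-export-bucket",
                          "snowflake/events.csv", "us-east-1")
print(sql)
```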


For me, the big miss with Aurora Postgres was cost. We had some queries that did a fair amount of I/O in a way that would not normally be a problem, but in Aurora that I/O was crazy expensive. A couple of fuzzy queries blew costs up to over $3,000/month for a database that should have cost maybe $50-$100/month. And this was for a dataset of only about 15 million rows without anything crazy in them.


Sounds like you need the I/O-Optimized storage billing mode.


We were burned by Aurora. Costs, performance, latency, all were poor and affected our product. Having good systems admins on staff, we ended up moving PostgreSQL on-prem.


> There's an abundance of benchmarks showing Aurora is better in terms of performance but they leave out so much about their RDS config that I'm having a hard time believing them.

Do you have a problem believing these claims on equivalent hardware? https://pages.cs.wisc.edu/~yxy/cs764-f20/papers/aurora-sigmo...

Or do your own performance assessments, following the published documents and templates, so you can establish the facts on your own:

For Aurora MySQL:

"Amazon Aurora Performance Assessment Technical Guide" - https://d1.awsstatic.com/product-marketing/Aurora/RDS_Aurora...

For Aurora Postgres:

"...Steps to benchmark the performance of the PostgreSQL-compatible edition of Amazon Aurora using the pgbench and sysbench benchmarking tools..." - https://d1.awsstatic.com/product-marketing/Aurora/RDS_Aurora...

"Automate benchmark tests for Amazon Aurora PostgreSQL" - https://aws.amazon.com/blogs/database/automate-benchmark-tes...

"Benchmarking Amazon Aurora Limitless with pgbench" - https://aws.amazon.com/blogs/database/benchmarking-amazon-au...
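The guides above boil down to running pgbench against each endpoint and comparing tps/latency. A minimal version of that, expressed as argv lists (the host, user, and database names are placeholders, and the scale/client counts are just starting points you'd tune to your workload):

```python
# Minimal pgbench invocations in the spirit of the AWS guides above.
# Endpoint and credentials are placeholders; sizes are illustrative.

scale = 1000   # pgbench scale factor; size the dataset to exceed RAM
               # so the run actually exercises storage I/O

# One-time initialization: creates and populates the pgbench tables
init_cmd = ["pgbench", "-i", "-s", str(scale),
            "-h", "cluster.example.amazonaws.com", "-U", "bench", "benchdb"]

# The benchmark itself: run identically against Aurora and RDS endpoints
run_cmd = ["pgbench",
           "-c", "64",   # concurrent client connections
           "-j", "8",    # worker threads
           "-T", "600",  # run for 10 minutes
           "-P", "30",   # print a progress line every 30 seconds
           "-h", "cluster.example.amazonaws.com", "-U", "bench", "benchdb"]

print(" ".join(init_cmd))
print(" ".join(run_cmd))
```

Keeping everything identical except the endpoint (same client instance, same AZ, same scale) is what the original question says most published benchmarks fail to demonstrate.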


My experience is with Aurora MySQL, not Postgres, but my understanding is that the storage layer works much the same way.

We have some clusters with very high write IOPS on Aurora.

When looking at costs, we modelled running MySQL ourselves and regular RDS MySQL.

We found for the IOPS capacity of Aurora we wouldn't be able to match it on AWS without paying a stupid amount more.


Aurora doesn't use EBS under the hood. There's no option to choose a storage type or I/O latency tier--only a billing choice between pay-per-I/O and fixed-price I/O.


Precisely! That's why RDS sounds so interesting. I get a lot more knobs to tweak for performance, but I'm curious whether a maxed-out gp3 with instances that support it will fare any better than Aurora.


I've had better results managing my own clusters on metal instances. You get much better performance with e.g. NVMe drives in RAID 0+1 (~a million IOPS in a pure RAID 0 with 7 drives), and I'm comfortable running my own instances and clusters. I don't care for the way RDS limits your options on extensions and configuration, and I haven't had a good time with its internal high-availability failovers; I'd rather run my own 3 instances in a cluster, with 3 clusters in different AZs.

Blatant plug time:

I'm actually working for a company right now ( https://pgdog.dev/ ) that is building proper sharding and failovers at the connection-pooler level. We handle failovers like this by pausing write traffic at the pooler (for up to 60 seconds by default) and swapping which backend instance gets the traffic.


> 3125 throughput

Max throughput on gp3 was recently increased to 2 GB/s. Is there some way I don't know about to get 3.125?


This is super confusing. Check out the RDS Postgres calculator with gp3:

> General Purpose SSD (gp3) - Throughput > gp3 supports a max of 4000 MiBps per volume

But the docs say 2,000. Then there's IOPS... The calculator allows up to 64,000, but on [0], if you expand "Higher performance and throughput", it says

> Customers looking for higher performance can scale up to 80,000 IOPS and 2,000 MiBps for an additional fee.

[0] https://aws.amazon.com/ebs/general-purpose/


RDS PG stripes multiple gp3 volumes, so RDS throughput is higher than a single gp3 volume's.

I think 80k IOPS on gp3 is a newer release, so presumably AWS hasn't updated RDS from the old max of 64k. IIRC it took a while before gp3 and io2 were even available for RDS after they were released as EBS options.

Edit: Presumably it takes some time to do the testing/optimization needed to make sure their RDS config can achieve the same performance as EBS. Sometimes there are limitations with instance generations/types that also affect whether you can hit the maximum advertised throughput.
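If you assume RDS stripes four gp3 volumes at the older per-volume limits (16k IOPS, 1,000 MiBps each, before the recent 80k/2,000 bump), the calculator's odd-looking maximums fall out directly:

```python
# Why the RDS calculator shows 64,000 IOPS and 4,000 MiBps for gp3,
# assuming a 4-way stripe at the older per-volume gp3 limits.
per_volume_iops = 16_000    # gp3 per-volume max before the 80k increase
per_volume_mibps = 1_000    # gp3 per-volume max before the 2,000 increase
stripes = 4                 # assumed stripe width for large RDS volumes

print(f"IOPS:       {stripes * per_volume_iops:,}")          # matches 64,000
print(f"Throughput: {stripes * per_volume_mibps:,} MiBps")   # matches 4,000
```

The stripe width is an assumption here, but it would explain both numbers at once, including why the calculator exceeds the single-volume 2,000 MiBps the docs quote.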


Only if you allocate (and pay for) more than 400 GB. And if you have high traffic 24/7, beware of "EBS-optimized" instances, which fall back to baseline rates after a certain time. I use vantage.sh/rds (not affiliated) to get an overview of the tons of instance details stretched across several tables in the AWS docs.


RDS stripes multiple gp3 volumes. The docs say 4 GiB/s per instance is the max for gp3, if I'm looking at the right table.



