Sadly, it's not the first time I have noticed unexpected and odd behavior from the Aurora PostgreSQL offering.
I noticed another interesting (and still unconfirmed) bug with Aurora PostgreSQL around their Zero Downtime Patching (ZDP).
During an Aurora minor version upgrade, Aurora preserves sessions across the engine restart, but it appears to also preserve stale per-session execution state (including the internal statement timer). After ZDP, I’ve seen very simple queries (e.g. a single-row lookup via Rails/ActiveRecord) fail with `PG::QueryCanceled: ERROR: canceling statement due to statement timeout` in far less than the configured statement_timeout (GUC), and only in the brief window right after ZDP completes.
My working theory is that when the client reconnects (e.g. via PG::Connection#reset), Aurora routes the new TCP connection back to a preserved session whose “statement start time” wasn’t properly reset, so the new query inherits an old timer and gets canceled almost immediately even though it’s not long-running at all.
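If that theory is right, the workaround in that window is to throw the connection away entirely rather than reset() it. Here's a minimal sketch of that idea in Python/psycopg2 (connection details are placeholders; the Rails case above would rescue `ActiveRecord::QueryCanceled` instead):

```python
import psycopg2
from psycopg2 import errors

# Placeholder DSN; real values depend on your cluster.
DSN = "host=my-aurora-cluster.example.com dbname=app user=app"

def run_discarding_on_cancel(sql, params=None, retries=1):
    """Execute a simple query; if it is cancelled (SQLSTATE 57014) suspiciously
    fast -- e.g. right after ZDP -- discard the connection and retry on a
    brand-new one instead of reusing/resetting the old socket."""
    conn = psycopg2.connect(DSN)
    try:
        for attempt in range(retries + 1):
            try:
                with conn, conn.cursor() as cur:
                    cur.execute(sql, params)
                    return cur.fetchall()
            except errors.QueryCanceled:
                conn.close()  # don't reuse a session that may carry a stale statement timer
                if attempt == retries:
                    raise
                conn = psycopg2.connect(DSN)
    finally:
        if not conn.closed:
            conn.close()
```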
I was kinda surprised by the lack of CAS on a per-endpoint plan version, or of rejecting stale writes via patterns like 2PC or a single-writer lease per endpoint.
Definitely a painful one with good learnings and kudos to AWS for being so transparent and detailed :hugops:
See https://news.ycombinator.com/item?id=45681136. The actual DNS mutation API does, effectively, CAS. They had multiple unsynchronized writers who raced, with no logical constraints or ordering on the changes. Without thinking much about it, they _might_ have been able to implement something like a version vector, either by updating the zone serial or via another "sentinel record" that was always touched by any ChangeRRSets affecting that label/zone; e.g. a TXT record containing a serialized change set number or a "checksum" of the old + new state.
I'm guessing the "plans" aspect skipped that and they were just applying intended state without trying to serialize the changes. And last-write-wins works, until it doesn't.
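FWIW, here's a rough sketch of the sentinel-record idea with boto3 (zone ID and record names are made up). It leans on two documented Route 53 behaviors: a change batch is applied all-or-nothing, and a DELETE has to match the existing record's value exactly, so a writer holding a stale sentinel value fails with InvalidChangeBatch instead of silently winning:

```python
import boto3

r53 = boto3.client("route53")
ZONE_ID = "Z0000000EXAMPLE"                       # hypothetical hosted zone
SENTINEL = "_plan-serial.dynamodb.example.com."   # hypothetical sentinel TXT record

def read_serial():
    """Read the current plan serial from the sentinel TXT record."""
    rrsets = r53.list_resource_record_sets(
        HostedZoneId=ZONE_ID, StartRecordName=SENTINEL,
        StartRecordType="TXT", MaxItems="1")["ResourceRecordSets"]
    rec = rrsets[0]  # assumes the sentinel already exists
    return int(rec["ResourceRecords"][0]["Value"].strip('"')), rec

def apply_plan_with_cas(plan_changes):
    serial, current = read_serial()
    txt = lambda n: {"Name": SENTINEL, "Type": "TXT", "TTL": 60,
                     "ResourceRecords": [{"Value": f'"{n}"'}]}
    # The DELETE of the old sentinel value acts as the compare step; the whole
    # batch is atomic, so a stale writer fails instead of racing.
    changes = [{"Action": "DELETE", "ResourceRecordSet": current},
               {"Action": "CREATE", "ResourceRecordSet": txt(serial + 1)},
               *plan_changes]
    r53.change_resource_record_sets(
        HostedZoneId=ZONE_ID,
        ChangeBatch={"Comment": f"plan serial {serial + 1}", "Changes": changes})
```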
Oh, I can see it from here. AWS internally has a problem with things like task orchestration. I bet that the enactor can be rewritten as a goroutine/thread in the planner, with proper locking and ordering.
But that's too complicated and results in more code. So they likely just used an SQS queue with consumers reading from it.
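Something like this, as a toy sketch (Python threads standing in for the goroutine, names invented): a single enactor drains an ordered queue and drops any plan older than what it already applied, so out-of-order or duplicate delivery can't roll the DNS state backwards:

```python
import queue
import threading
from dataclasses import dataclass

@dataclass
class Plan:
    version: int      # monotonically increasing, assigned by the planner
    endpoints: dict   # intended DNS state for this plan

plan_queue: "queue.Queue[Plan]" = queue.Queue()
applied_version = 0

def apply_to_dns(plan: Plan):
    print(f"applying plan v{plan.version}")  # stands in for the ChangeRRSets call

def enactor():
    """Single consumer: applies plans strictly in version order and discards
    stale ones, instead of last-write-wins across concurrent enactors."""
    global applied_version
    while True:
        plan = plan_queue.get()
        if plan.version <= applied_version:
            continue            # stale plan from a slow producer: drop it
        apply_to_dns(plan)
        applied_version = plan.version

threading.Thread(target=enactor, daemon=True).start()
```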
I think ideally you could map retention of cold data to the file objects themselves and, using a key-space naming strategy plus lifecycle rules, expire the data that is no longer needed, saving on storage costs (as much as possible, hopefully).
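Roughly what I mean, as a boto3 sketch (bucket name and prefix are hypothetical): anything written under a `cold/` prefix simply expires after the retention window:

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-analytics-bucket",             # hypothetical bucket
    LifecycleConfiguration={"Rules": [{
        "ID": "expire-cold-data",
        "Filter": {"Prefix": "cold/"},        # retention keyed off the key space
        "Status": "Enabled",
        "Expiration": {"Days": 365},          # drop cold objects after a year
    }]},
)
```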
Yeah, I phrased that poorly - I meant in an ideal situation with the benefits of a Parquet-like columnar file structure. I very much understand that it's not possible with Parquet today, for the reasons you mentioned and others.
> Increased API Error Rates in Region: eu-west-1 where s3 Service is affected.