This was a major issue, but it wasn't a total failure of the region.
Our stuff is all in us-east-1. Ops was a total shitshow today (mostly because many third-party services besides AWS were also down or slow), but our prod service was largely "ok": fewer than 5% of customers were significantly impacted, because existing instances got to keep running.
I think we got a bit lucky, but no actual SLAs were violated. I tagged the postmortem as Low impact despite the stress this caused internally.
We definitely learnt something here about both our own software and our third-party dependencies.