
> because this entire "region" is actually housed in a single datacenter

From the incident report you linked:

"a cooling system water pipe leak occurred in one of the data centers in the europe-west9 region [...] Europe-west9 contains three buildings with independent cooling, power, and networking"

Also,

> a global GCP console/API outage because GCP's single global control plane couldn't reach europe-west9.

Again, from that page:

"A small number of methods within the GCE control plane API must collect information from multiple regions or zones by making requests to each regional control plane (called fanout requests). Google Cloud services including Cloud Console depend on these methods. When the GCE control plane for the europe-west9 region and zones went offline, some of these fanout methods did not operate correctly. During the outage, this led to global unavailability for some pages and control plane operations within Cloud Console"

It's certainly not great that a regional problem had global impact, but there is some nuance. For example, that paragraph refers to "each" regional control plane, which means there are per-region control planes in addition to a global one, rather than a single global control plane.
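To make the fanout failure mode concrete, here is a toy sketch of how one unreachable regional control plane can break a method that queries every region. This is not how GCE is implemented: the region list, `list_instances`, `strict_fanout`, and `tolerant_fanout` are all hypothetical names made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical regional control planes; europe-west9 is simulated as offline.
REGIONS = ["us-central1", "europe-west1", "europe-west9"]
OFFLINE = {"europe-west9"}


def list_instances(region):
    """Stand-in for a request to one regional control plane."""
    if region in OFFLINE:
        raise ConnectionError(f"{region} control plane unreachable")
    return [f"{region}/vm-1"]


def strict_fanout(regions):
    """Fails as a whole if any single region fails."""
    with ThreadPoolExecutor() as pool:
        # Iterating the map results re-raises the first regional error.
        results = list(pool.map(list_instances, regions))
    return [vm for region_vms in results for vm in region_vms]


def tolerant_fanout(regions):
    """Returns partial results plus the regions that could not be reached."""
    instances, unreachable = [], []
    with ThreadPoolExecutor() as pool:
        futures = {region: pool.submit(list_instances, region) for region in regions}
    # All futures are complete once the executor has shut down.
    for region, future in futures.items():
        try:
            instances.extend(future.result())
        except ConnectionError:
            unreachable.append(region)
    return instances, unreachable
```

In this sketch `strict_fanout(REGIONS)` raises `ConnectionError` because of the one offline region, while `tolerant_fanout(REGIONS)` returns the instances from the healthy regions and reports `europe-west9` as unreachable. Which behavior is right depends on whether callers can act on partial data.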

I don't mean to belittle the importance of the incident. Suffice it to say lessons were learned and follow-up changes were made.

As for Nitro - take a look at C3. https://cloud.google.com/blog/products/compute/introducing-c...

Disclosure: I work on GCE.



bigquery or table was down for a week



