Hacker News

I have about 30 years as a Linux eng, starting with OpenBSD, and I spent a LOT of time with hardware building webhosts and CDNs until about 2020; since then my last few roles have been 100% AWS/gcloud/Heroku.

I love building the cool edge-network stuff with expensive bleeding-edge hardware (SmartNICs, NVMe-oF, etc.), but it's infinitely more complicated and stressful than Terraforming an AWS infra. For every cluster I set up I had to interact with multiple teams: networking, security, storage, sometimes maintenance/electrical. You've got some random tech you have to rely on across the country in one of your POPs with a blown server. Every hardware infra person who's been in long enough has had a NOC tech kick or unplug a server at least once.

And then once the hardware arrives, you sometimes have different people doing different parts of the setup: the NOC does the boot, maybe bootstraps the hardware with something that works over SSH before an agent is installed (Ansible, etc.), then your Linux eng invokes their magic with a ton of bash or Perl, then your k8s person sets up the k8s clusters, usually with something like Terraform/Puppet/Chef/Salt calling Helm charts. Then your monitoring person gets it into OTel/Grafana, etc. This all organically becomes more automated as time goes on, but I've seen it many times starting from a brand-new infra with no automation at all.
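That hand-off chain looks roughly like this in practice. A hedged sketch only — the file names (`inventory.ini`, `bootstrap.yml`), host group, and local script are made up for illustration; every shop's layout differs:

```shell
# 1. NOC has racked the box and it answers on SSH: agentless bootstrap with Ansible
ansible -i inventory.ini new_pop_hosts -m ping     # verify SSH reachability
ansible-playbook -i inventory.ini bootstrap.yml    # base users, keys, sysctls, agent install

# 2. Linux eng's "magic" layer: whatever bash/perl the shop has accreted over the years
./provision-base.sh pop-dfw-07                     # hypothetical in-house script

# 3. k8s person: IaC stands up the cluster, usually ending in Helm charts
terraform -chdir=clusters/pop-dfw apply

# 4. Monitoring hand-off: get the cluster's metrics into Grafana
helm upgrade --install monitoring grafana/grafana
```

Four tools, four owners, and the seams between them are exactly where the tedium lives.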

Now you're automating 90% of this via scripts, IaC, etc., but you're still doing a lot of tedious work.

You also have a much harder time hiring good engineers. The market's gone so heavily AWS (I'm no help) that it's rare I come across an ops resume that's ever touched hardware, especially not at the CDN/distributed-systems level.

So... AWS is the chill infra that stays online and that you can basically rely on to 99.99-something percent. Get some Terraform blueprints going and your own developers can self-serve. No hardware or ops involvement needed.
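The self-serve flow is the whole point: developers consume a pre-approved blueprint instead of filing a ticket. A sketch, assuming a hypothetical internal blueprints repo and variable name:

```shell
# Developer self-serve against a platform team's blueprint (repo/paths are illustrative)
git clone git@internal:platform/terraform-blueprints
cd terraform-blueprints/examples/web-service
terraform init
terraform plan -var service_name=my-api    # review what the blueprint will create
terraform apply -var service_name=my-api   # no ops ticket, no rack-and-stack
```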

And none of this even gets into supporting the clusters: failing clusters, dealing with maintenance, zero-downtime kernel upgrades, rollbacks, yadda yadda.
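"Zero-downtime kernel upgrade" usually means kernel live patching on the fleet you own. A minimal sketch with kpatch (Ksplice and kGraft are the same idea); the patch module name here is hypothetical:

```shell
# Apply a kernel fix without rebooting the box (kpatch as one example)
kpatch list                          # show patches currently loaded/installed
kpatch load kpatch-fix-example.ko    # hypothetical patch module, applied live
```

On AWS this whole class of work simply isn't yours anymore.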



This, 1000%. There are so many cool networking/virtualization/hardware things I love dealing with, but the stress of doing Ceph upgrades usually isn't the right trade-off.
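For what it's worth, cephadm has made those upgrades less white-knuckle than the old manual rolling-restart days — it walks the mons/mgrs/OSDs for you. A sketch (the version number is illustrative):

```shell
ceph orch upgrade start --ceph-version 18.2.4   # orchestrated rolling upgrade
ceph orch upgrade status                        # watch progress daemon by daemon
ceph -s                                         # confirm HEALTH_OK afterwards
```

Less stressful, but you're still the one holding the pager while it rolls.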



