They aren't going to be using fp32 for inferencing, so those FP numbers are meaningless.
Memory and memory bandwidth matter most for inferencing. 819.2 GB/s for the M2 Ultra is less than half that of an A100, but having 192GB of RAM instead of 80GB means they can run inference on models that would otherwise require THREE of those A100s, and the only real cost is that it takes longer for the AI to respond.
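To put rough numbers on that, here is a back-of-the-envelope sketch. The A100 bandwidth (2039 GB/s, the 80GB SXM spec-sheet figure) and the 180GB model size are my assumptions, not figures from this thread, and the tokens-per-second value is only a crude ceiling that assumes every weight is read once per generated token.

```python
import math

# Figures quoted in the comment above
M2_ULTRA_BW_GBS = 819.2
M2_ULTRA_MEM_GB = 192
A100_MEM_GB = 80

# Assumptions, NOT from the thread
A100_BW_GBS = 2039.0  # assumed: A100 80GB SXM spec-sheet bandwidth
MODEL_GB = 180.0      # hypothetical weight footprint of the model


def devices_needed(model_gb, device_mem_gb):
    """Minimum devices whose combined memory holds the weights.

    Ignores KV cache, activations and runtime overhead.
    """
    return math.ceil(model_gb / device_mem_gb)


def decode_tokens_per_sec_ceiling(bandwidth_gbs, model_gb):
    """Crude upper bound for single-stream decoding.

    Assumes generation is memory-bound and all weights are read once per token.
    """
    return bandwidth_gbs / model_gb


print("A100s needed:", devices_needed(MODEL_GB, A100_MEM_GB))
print("M2 Ultras needed:", devices_needed(MODEL_GB, M2_ULTRA_MEM_GB))
print("M2 Ultra tokens/s ceiling:",
      round(decode_tokens_per_sec_ceiling(M2_ULTRA_BW_GBS, MODEL_GB), 2))
print("M2 Ultra / A100 bandwidth ratio:",
      round(M2_ULTRA_BW_GBS / A100_BW_GBS, 2))
```

Real throughput will be lower than the ceiling, and quantization changes the weight footprint, so treat this as an order-of-magnitude check only.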
3 A100s at $5300/mo each for the past 2 years is over $380,000. Given that it worked for them, I'd call it a massive success.
From another perspective though, they could have bought 72 of those Ultra machines for that much money and had most devs on their own private instance.
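The arithmetic behind those two figures, as a quick sketch. The per-machine price is not stated anywhere above; it is just what the "72 machines" figure implies when you divide the rental total by 72.

```python
# Figures from the comment above
A100_COUNT = 3
A100_RENT_PER_MONTH = 5300  # USD per GPU per month
MONTHS = 24                 # "the past 2 years"
ULTRA_MACHINES = 72

# Total rental cost over the period
rental_total = A100_COUNT * A100_RENT_PER_MONTH * MONTHS

# Implied price per Ultra machine (derived, not quoted in the thread)
implied_price_per_machine = rental_total / ULTRA_MACHINES

print(f"Rental total: ${rental_total:,}")
print(f"Implied price per machine: ${implied_price_per_machine:,.0f}")
```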
The simple fact is that Nvidia GPUs are massively overpriced. Nvidia should worry a LOT that Apple's private AI cloud is going to eat their lunch.
According to a Reddit post, an M2 is around 27 TFLOPS.
So less than 1/10 the performance on computation alone, let alone the memory.
What workflow would use something like this?