> It's all the same trade-off as a traditional compiler: you get a lot more throughput than hiring a specialist performance programmer, but the latter will typically outperform, possibly by orders of magnitude.

That throughput is the point, though. You can't have a performance specialist on every single ML workload. It's still significantly better than not having these kinds of optimizations at all.