
More likely we will just use local LLMs period.

In just a couple of years, we went from a single closed-source LLM outputting tokens slower than one can read, to dozens of open-source models, some specialized, able to run on a mobile device and output tokens faster than one can read.
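As a rough illustration of what "running locally" looks like today (not something the comment itself spells out), here is a minimal sketch using llama-cpp-python with a hypothetical quantized GGUF file; the model filename and sampling parameters are placeholders, not a recommendation:

    # Minimal local-inference sketch (pip install llama-cpp-python).
    # "small-model-q4.gguf" is a placeholder for any small quantized open-weight model.
    from llama_cpp import Llama

    llm = Llama(model_path="small-model-q4.gguf", n_ctx=2048)  # runs entirely on the local machine

    # Stream tokens as they are generated, much like a hosted API would.
    for chunk in llm("Explain edge inference in one sentence.", max_tokens=64, stream=True):
        print(chunk["choices"][0]["text"], end="", flush=True)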

The gap between inference providers and running on the edge will always exist, but it will become less and less relevant.

What OpenAI did is like offering cloud access to GPU-accelerated 3D gaming before anyone could set up that hardware at home.

Are we now buying a better gaming experience by renting cloud GPUs? I recall some companies, including Google, offered exactly that. It took a few years for investors to figure out that people would rather just run games locally.

We aren't dealing with gamers here, but I think the analogy is valid.


