Previously, choosing a top-tier AI model tied you to whatever that provider wanted to do with hosting the model long term and whatever pricing they wanted to charge for it. Now you can get the same model anywhere with GPUs, hosted or not, for minimal cost overhead beyond what it takes to run the model itself. You're also free to tune, retrain, or otherwise mess with the model as you see fit without needing approval.
The excitement is probably a bit much, but it's not just about the eval results themselves; it's also about the baggage attached to them.
For me the excitement is that around the o3 announcement I had a feeling that we were heading toward an OpenAI / Sam Altman controlled dystopia. This resets that: you can run the model yourself, you can modify it yourself, it's essentially on par with the best public models, and it gives hope that the smaller players have a fighting chance going forward. They also published their innovations, bringing back some of the feeling of open science that used to be in ML research but has mostly gone away.
Google models are already in the lead in many areas in capability and cost, so I never felt like OpenAI was dominant. OpenAI was first to make a splash, but ChatGPT is in a ~5 way tie in terms of what it can do.
Which models, at what cost? IMO DeepSeek's web search potential to challenge Google's search moat also makes Google particularly vulnerable, because it dramatically erodes the advantage of hundreds of billions in hardware. Not to imply Google doesn't retain advantages, but the gap just went from insurmountable to one where many actors can potentially build AI search to rival Google on a shoestring budget. Certainly on a sovereign budget.
It's going to be an increasingly irrelevant game once models make regional-scale (i.e. country/sovereign-scale) inference attainable. Countries that couldn't even roll out domestic search before accessible models came along will likely soon be able to field models that displace search.
AFAIK o1 is hidden behind an expensive subscription (IIRC $20/mo, and still rate-limited), so it might as well not exist for most users, since R1 is free (subject to service availability).
Also, R1 (and its distilled models) expose their CoT, and the web interface has a web search option too.
With the 14B distilled models, I found multiple math-related prompts where it gives the right answer almost immediately but then wastes 10 minutes making self-verification mistakes (e.g. "Write Python3 code that computes the modular inverse of a mod 2^32").
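For reference, here is a minimal sketch of what a correct answer to that prompt could look like. It is not the model's output; the function name `modinv_pow2` and the choice of Newton (Hensel) iteration are my own assumptions for illustration. Only odd `a` has an inverse mod 2^32, and on Python 3.8+ the built-in `pow(a, -1, 1 << 32)` does the same job.

```python
MOD_BITS = 32
MASK = (1 << MOD_BITS) - 1  # reducing with & MASK is the same as % 2**32


def modinv_pow2(a: int) -> int:
    """Return x with (a * x) % 2**32 == 1. Hypothetical helper name."""
    a &= MASK
    if a % 2 == 0:
        # Even numbers share a factor with 2**32, so no inverse exists.
        raise ValueError("a must be odd to be invertible modulo 2**32")
    # For odd a, a*a == 1 (mod 8), so x = a is already correct to 3 bits.
    x = a
    # Each Newton step doubles the number of correct low bits:
    # 3 -> 6 -> 12 -> 24 -> 48, so four steps cover 32 bits.
    for _ in range(4):
        x = (x * (2 - a * x)) & MASK
    return x


if __name__ == "__main__":
    inv = modinv_pow2(3)
    print(hex(inv))           # 0xaaaaaaab
    print((3 * inv) & MASK)   # 1
```

The whole thing is four multiplications, which is part of why the 10 minutes of second-guessing is so striking.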
I'm trying to square the excitement over DeepSeek with its good, but not dominant, performance in evals.