I am neither a mathematician nor an LLM creator, but I do know how to evaluate interesting tech claims.
The absolute best-case scenario for a new technology is when it seems like a toy for nerds and doesn't outperform anything we have today, but the scaling path is clear.
Its problems just won't matter if it does that one thing with scaling. The web is a pretty good hypermedia platform, but a disastrously bad platform for most other computer applications. Nevertheless, the scaling of URIs and internet protocols has caused us to reorganize our lives around it. And if there really are unsolvable problems with the platform, they just get offloaded onto users. Passwords? Privacy? Your problem now. Surely you know to use a password manager?
I think this new wave of AI is going to be like that. If they never solve the hallucination/confabulation issue, it's just going to become your problem. If they never really gain insight, it's going to become your problem to instruct them carefully. Your peers will chide you for not using a robust AI-guardrail tool or for not learning the basics of prompt engineering, like all the kids do instinctively these days.
How on earth could you evaluate the scaling path with so little information? That's my point. You can't possibly know that a technology can solve a given kind of problem if, so far, it can only solve a completely different and largely unrelated kind of problem!
Saying that performance on grade-school problems is predictive of performance on complex reasoning tasks (including theorem proving) is like saying that a new kind of mechanical engine that hits 90% efficiency on the bench can therefore be scaled up 10x.
These kinds of scaling claims drive investment, I get it. But to someone who understands (and is actually working on) the actual problem that needs solving, this kind of claim is perfectly transparent!
Any claim of objective, quantitative measurement of "scaling" in LLMs is voodoo snake oil when it is measured against benchmarks consisting of "which questions does it answer correctly". Any machine learning PhD will admit this, albeit only in a quiet corner of a noisy bar, after a few more drinks than is advisable, when they're earning money from companies that claim scaling wins on such benchmarks.
For the current generative AI wave, this is how I understand it:
1. The scaling path is decreased val/test loss during training.
2. We have seen multiple times that large decreases in this loss have resulted in very impressive improvements in model capability across a diverse set of tasks (e.g. GPT-1 through GPT-4, and many other examples).
3. By now, there is tons of robust data demonstrating really nice relationships between model size, quantity of data, length of training, quality of data, etc., and decreased loss. Evidence keeps building that most multi-billion-parameter LLMs are probably undertrained, perhaps significantly so.
4. Ergo, we should expect continued capability improvement with continued scaling. Make a bigger model, get more data, get higher data quality, and/or train for longer, and we will see improved capabilities. The graphs demand that it is so (a toy sketch of this kind of curve fit follows below).
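To make "extend the lines" concrete, here is a minimal sketch that fits a power-law scaling curve of the form loss ≈ E + A·C^(−α) to a handful of (compute, loss) points and extrapolates it to larger compute budgets. The functional form mirrors the shape of published scaling-law fits, but every number below is invented purely for illustration and does not come from any real training run.

```python
# Toy "extend the lines" exercise: fit loss(C) = E + A * C**(-alpha)
# to a few (compute, loss) points, then extrapolate to bigger budgets.
# All data points here are made up for illustration only.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (training compute, validation loss) pairs.
compute = np.array([1.0, 10.0, 100.0, 1_000.0, 10_000.0])
loss = np.array([4.2, 3.5, 3.0, 2.6, 2.3])

def scaling_law(c, E, A, alpha):
    # Irreducible loss E plus a power-law term that shrinks with compute.
    return E + A * c ** (-alpha)

params, _ = curve_fit(scaling_law, compute, loss, p0=(1.5, 3.0, 0.1))
E, A, alpha = params
print(f"fit: loss ~ {E:.2f} + {A:.2f} * C^(-{alpha:.3f})")

# "Extend the lines": predict loss at 10x and 100x more compute.
for c in (1e5, 1e6):
    print(f"predicted loss at C={c:.0e}: {scaling_law(c, *params):.2f}")
```

The point is only that once a few points sit neatly on such a curve, extrapolating it is trivial; whether downstream capabilities keep tracking the loss curve is exactly what the comments above are arguing about.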
---
This is the fundamental scaling hypothesis that labs like OpenAI and Anthropic have been operating off of for the past 5+ years. They looked at the early versions of the curves mentioned above, extended the lines, and said, "Huh... These lines are so sharp. Why wouldn't it keep going? It seems like it would."
And they were right. The scaling curves may break at some point. But they don't show indications of that yet.
Lastly, all of this is largely just taking existing model architectures and scaling them up. Neural nets are a very young technology. There will be better architectures in the future.
I don't think they will go anywhere. Europe doesn't have the ruthlessness required to compete in such an arena; it would need far more unification before that could happen. And it seems we're only drifting further apart.