As someone experienced with operations, technical debt, and weird company-specific nonsense (Platform Engineer): no, you have to solve nuclear fusion at <insert-my-company>. You have to do it over and over again. If it were that simple we wouldn't have even needed AI; we would have hand-written a few things, and then everything would have been Legos, and Legos of Legos. But it takes a LONG time to find new true Legos.
Yeah you’re right, all businesses are made of identical, interchangeable parts that we can swap out at our leisure.
This is why enterprises change ERP systems frictionlessly, and why the field of software engineering is no longer required. In fact, given that all business is apparently solved, we can probably just template them all out, call it a day, and all go home.
Yeah, but that's not a Lego. A Lego is something that fits everywhere else, not just into previous work. There's a lot of previous work. There are very few true Legos.
AlphaFold simulated the structure of over 200 million proteins. Among those, there could be revolutionary ones that change the medical and scientific fields forever, or they could all be useless. The reasoning is sound, but that's as far as any such tool can get, and you won't know until you attempt to implement the results in real life. As long as those models are unable to perfectly recreate the laws of the universe at the maximum resolution imaginable and follow them, you won't see an AI model, let alone an LLM, provide anything of the sort.
With these methods the issue is the sheer scale of compute. Let's say you ask a model to solve fusion. It may be able to, but the problem is that it's unverifiable WHICH answer was correct.

So it may generate 10 billion answers to fusion, and only 1-10 of them are correct.

There would be no way to know which one is correct without first knowing the answer to the question.

This is my main issue with these methods. They guess at the future via RL, and when the model gets it right they reward that.

We should really be looking at methods that measure how often it was wrong, rather than whether it was right a single time.
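The numbers above can be turned into a back-of-the-envelope sketch. This is a toy calculation, not anyone's actual training setup: the candidate counts are the hypothetical ones from the comment (10 billion answers, 1-10 correct), and it assumes that with no verifier, picking an answer is a blind uniform draw from the pool.

```python
# Toy sketch of the "10 billion answers, only 1-10 correct" problem.
# Hypothetical numbers taken from the comment above.
N_TOTAL = 10_000_000_000   # candidate solutions generated
N_CORRECT = 10             # correct ones hidden among them

# With no verifier, selecting an answer is a blind uniform draw:
p_single = N_CORRECT / N_TOTAL
print(p_single)  # 1e-09

# Even if you draw k answers, P(at least one correct) = 1 - (1 - p)^k,
# and you STILL can't tell which draw (if any) was the correct one
# without already knowing the answer.
k = 1_000_000
p_at_least_one = 1 - (1 - p_single) ** k
print(f"{p_at_least_one:.4f}")  # still roughly 0.001

# The comment's point: the honest signal is the error rate
# (fraction wrong), not the single lucky hit that RL rewards.
error_rate = 1 - p_single
print(f"{error_rate:.9f}")
```

The takeaway from the sketch is that generating more candidates barely moves the odds, and no amount of sampling tells you which candidate to trust; only an independent verifier (or ground truth) can do that.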
Which is why it is incredibly depressing that OpenAI will not publish the raw chain of thought.
“Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.”
Maybe they will enable showing the CoT for limited use, like 5 prompts a day for Premium users, or for Enterprise users with an agreement not to steal the CoT, or something like that.
If OpenAI sees this: please allow users to see the CoT for a few prompts per day, or add it to Azure OpenAI for Enterprise customers with legal clauses against stealing the CoT.
Imagine if this tech had been available in the Middle Ages and was asked to 'solve' alchemy or perpetual motion, and it responded that the problem was impossible... I suspect people would (irrationally, from our perspective) go Luddite on it. Now apply that to the 'fusion power' problem.
The new model that can do more at the "ceiling" price doesn't remove your ability to keep using the 100x-cheaper tokens for the things that were already doable on the old version.
That exact pattern holds for every technological advance, even under a pretty broad definition of technology. I'm not sure it's perfectly described by the name "induced demand", but it's basically the same thing.