
This should also be good news for open-weight models, right? OpenAI is basically saying "you can get very far with good prompts and some feedback loops".


No. It's bad news, because you can't see the rationale/search process that led to the final answer, just the final answer, and if training on final answers alone were really adequate, we wouldn't be here.

It's also probably massively expensive compute-wise, much more so than simple supervised training on a corpus of question/answer pairs (because you have to generate the corpus by search first).

It's also bad news because reinforcement learning tends to be highly finicky and requires you to sweat the details and act like a professional, while open-weight stuff tends to be produced by people for whom the phrase 'like herding cats' was coined, so open-source RL tooling is usually flakier than proprietary solutions (where it exists at all). The community can pull it off for a few passion projects shared by many nerds, like chess or Go, but it takes a long time.
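To make the "generate the corpus by search first" point concrete, here's a toy sketch of that loop. Everything here is a stand-in (the names, the model, the verifier); it's the shape of the cost, not anyone's actual pipeline:

    import random

    def sample_answer(question: str) -> str:
        # Stand-in for an expensive model forward pass.
        return f"{question} -> guess {random.randint(0, 99)}"

    def verifier(question: str, answer: str) -> bool:
        # Stand-in for an automatic checker (unit tests, exact match, ...).
        return answer.endswith("7")

    def build_corpus(questions, n_samples=64):
        corpus = []
        for q in questions:
            # n_samples forward passes per question *before* any training
            # happens; this search is where the extra compute goes.
            candidates = [sample_answer(q) for _ in range(n_samples)]
            winners = [a for a in candidates if verifier(q, a)]
            if winners:
                corpus.append((q, winners[0]))
        return corpus

    questions = [f"q{i}" for i in range(100)]
    corpus = build_corpus(questions)
    print(f"spent {len(questions) * 64} samples to keep {len(corpus)} pairs")
    # fine_tune(model, corpus)  # only now does the ordinary training pass run

Even in this toy you pay 64 model calls per question just to manufacture the training pairs, and only then do you pay the usual training cost on top.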


> It's also probably massively expensive compute-wise, much more so than simple supervised training on a corpus of question/answer pairs (because you have to generate the corpus by search first).

What do you mean? That sounds interesting.



