I feel very comfortable saying that while the ability to solve grade-school maths is not a predictor of ability at a research level, the advances needed to solve 1 and 2 will mean improving results across the board unless you take shortcuts (e.g. adding an "add" instruction, as proposed elsewhere). If you actually dig into prompting an LLM to follow steps for arithmetic, what you quickly see is that the problem has not been the ability to reason on the whole (which is not to suggest the reasoning is good enough), but the ability to consistently and precisely follow steps a sufficient number of times.
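To make "follow steps" concrete: the procedure in question is nothing fancier than primary-school long addition. Here is a rough Python sketch (mine, purely illustrative) of the steps you'd spell out in the prompt, one digit and one carry at a time:

    def long_addition(a: str, b: str) -> str:
        # Pad to equal width, then add digit by digit from the right,
        # narrating each step the way you'd ask the model to.
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            carry_in = carry
            carry, digit = divmod(int(da) + int(db) + carry_in, 10)
            digits.append(str(digit))
            print(f"{da} + {db} + carry {carry_in}: write {digit}, carry {carry}")
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    assert long_addition("987654321", "123456789") == str(987654321 + 123456789)

Every individual step is trivial; what the models have historically fumbled is executing dozens of such steps in a row without drifting.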
It acts like a bored child who hasn't had following the steps and verifying the results drilled into it repetitively in primary school. That is not to say its ability to reason is sufficient for an advanced level yet, but so far much of what has hampered it has been far more basic.
Ironically, GPT-4 is prone to taking shortcuts and using the tooling enabled for it to paper over its weaknesses, but when I pushed it until it actually did arithmetic on large numbers step by step, it did significantly better than it used to at systematically and repetitively following the methods it knows, and at applying "manual" sanity checks to its results afterwards.
As for lemma conjecturing, there is ongoing research, and while it's by no means solved, it's also not nearly as dire as you suggest; see e.g. [1].
That's not to suggest its reasoning abilities are sufficient, but I also don't think we've seen anything to suggest we're anywhere close to the ceiling of what current models can be taught to do, even before considering advances in the tooling around them, such as giving them "methods" to work to and a loop with injected feedback, access to tools, and working memory.
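By "a loop with injected feedback" I mean something like the outline below. This is only a sketch; ask_model stands in for whatever completion API you use, and check for any external verifier (a parser, a calculator, a proof checker):

    def solve_with_feedback(problem, ask_model, check, max_rounds=5):
        # The model proposes, an external checker verifies, and any
        # failure is fed back into the context for another attempt.
        transcript = [f"Problem: {problem}\nShow every step of your working."]
        for _ in range(max_rounds):
            answer = ask_model("\n".join(transcript))
            ok, feedback = check(answer)
            if ok:
                return answer
            transcript.append(f"Answer: {answer}")
            transcript.append(f"That was wrong: {feedback}. Redo it step by step.")
        return None  # give up after max_rounds attempts

The point is that none of this requires a smarter model, just scaffolding that makes it verify and retry the way a schoolchild is drilled to.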
[1] https://research.chalmers.se/en/publication/537034