No, I'm not. That's what this whole sub-thread is about: how bad LLMs are at the stuff that's described in the OP.
For context, this is the grandparent comment within which my original reply was scoped:
I feel very comfortable saying, as a mathematician, that the ability to solve grade-school maths problems would not be at all a predictor of the ability to solve real mathematical problems at a research level.
The reason LLMs fail at solving mathematical problems is that:
1) they are terrible at arithmetic;
2) they are terrible at algebra;
3) most importantly, they are terrible at complex reasoning (more specifically, they mix up quantifiers and don't really understand the complex logical structure of many arguments);
4) they (current LLMs) cannot backtrack when they find that what they have already written does not lead to a solution, and it is too expensive to give them the thousands of restarts they would need to randomly guess their way through the problem even if you did give them that facility.
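A standard textbook illustration of point 3 (my addition, not part of the quoted comment): pointwise and uniform continuity of a function f on the reals differ only in where the existential quantifier on delta sits, so scrambling that single quantifier silently turns one claim into a strictly stronger one.

```latex
% Pointwise continuity of f on R: delta may depend on the point x.
\forall \varepsilon > 0 \;\forall x \;\exists \delta > 0 \;\forall y \;
  \bigl( |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon \bigr)

% Uniform continuity: a single delta must work for every x.
\forall \varepsilon > 0 \;\exists \delta > 0 \;\forall x \;\forall y \;
  \bigl( |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon \bigr)
```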
Solving grade-school problems might mean progress on 1 and 2, but that is not at all impressive, as there are perfectly good tools out there that already solve those problems, and old-style AI researchers have built perfectly good tools for 3. The hard problem is problem 4, and that is something you teach people how to do at university level.
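To make the "perfectly good tools" claim concrete, here is a purely illustrative sketch (my example, not something from the thread): a computer algebra system such as SymPy covers 1 and 2, and an SMT solver such as Z3 handles the kind of quantified reasoning in 3, assuming both libraries are installed.

```python
# Illustrative sketch only: existing tools already handle exact arithmetic,
# symbolic algebra, and quantified logical reasoning.
# Assumes `pip install sympy z3-solver`.
from sympy import Eq, Rational, solve, symbols
from z3 import Exists, ForAll, Ints, Solver, unsat

# 1) Exact arithmetic -- no digit-by-digit guessing.
assert Rational(1, 3) + Rational(1, 6) == Rational(1, 2)

# 2) Symbolic algebra -- solve a quadratic exactly.
x = symbols("x")
assert set(solve(Eq(x**2 - 5 * x + 6, 0), x)) == {2, 3}

# 3) Quantified reasoning -- Z3 proves "every integer has a strictly larger
#    integer" by showing the negation (some a bounds every b) is unsatisfiable.
a, b = Ints("a b")
s = Solver()
s.add(Exists([a], ForAll([b], b <= a)))
assert s.check() == unsat

print("all three checks passed")
```

None of this touches 4, which is the whole point of the quoted comment.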
(I should add that another important problem is what is known as premise selection. I didn't list it because LLMs have actually been shown to manage it reasonably well in about 70% of theorems, which roughly matches the records set by other machine-learning techniques.)
(Real mathematical research also involves what is known as lemma conjecturing. I have never once observed an LLM do it, and I suspect they cannot. Basically, either the portion of the LLM's parameters dedicated to mathematical reasoning is large enough to model the entire solution end to end, or the LLM is likely to fail at the problem completely.)
I personally think this entire article is likely complete bunk.
Edit: after reading the replies, I realise I should have pointed out that humans do not simply backtrack; they learn from failed attempts in ways that LLMs do not seem to. The material LLMs are trained on surely contributes to this problem.