But the point that it "improves" at slightly larger numbers yet still fails at really big ones shows that it isn't actually "reasoning" about the math in a logical way - that's the point I was getting at.
For example, once you teach a grade schooler the basic procedure for addition, they can add two 30-digit numbers correctly fairly easily (whether they want to or not is a different story). The fact that LLMs still make errors on larger numbers points to them not really "learning" the rules of arithmetic.
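For concreteness, the grade-school procedure I mean is just the column-by-column carry rule, and it works identically regardless of how many digits the inputs have. A rough illustrative sketch (plain Python, nothing an LLM actually runs):

```python
def grade_school_add(a: str, b: str) -> str:
    """Add two non-negative decimal strings column by column, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry  # add the column plus any carry
        digits.append(str(total % 10))     # keep the ones digit
        carry = total // 10                # carry the rest to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# The same fixed rule scales to arbitrarily long numbers:
x = "123456789012345678901234567890"
y = "987654321098765432109876543210"
assert grade_school_add(x, y) == str(int(x) + int(y))
```

The point is that the rule itself never changes with input length, which is why a human who has learned it doesn't suddenly break down at 30 digits the way an LLM does.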
Of course it isn't. It approximates. I bet you'd get better results by increasing the depth of the network, since each extra layer gives a more accurate approximation. I have an idea for achieving this without significantly increasing the number of layers, and I'm currently working on it as a side project. The idea might prove useless after all, though, since it requires training the model from scratch with a lot of synthetic data mixed in. Experiments on small models look promising, but they're too small to mean much, and I can't afford to train a larger model from scratch for a side project.
Isn't it actually just impossible for it to do this well on arbitrarily large inputs, even from a computational complexity point of view, if it doesn't know it's allowed to do step-by-step multiplication (addition is maybe OK)? I'm not sure it's a criticism of its ability to reason. It's like asking someone to do addition in 5 seconds with no paper: of course at some point they won't be able to do it for a large enough number. BTW, I strongly disagree that the average grade schooler will be able to add two 30-digit numbers, even with paper, without making a mistake.
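To make the "no paper" point concrete: in the worst case a single carry has to ripple through every column, so the leftmost digit of the answer depends on every digit of the input, and you can't shortcut it in one glance. A quick illustration (my own toy example in Python, not anything from the thread):

```python
# Worst-case carry propagation: one change at the far right flips every column.
n = 30
a = "9" * n          # 999...9 (30 nines)
b = "1".zfill(n)     # 000...1
print(int(a) + int(b))   # 10^30 -> every digit of the result changed
```

That's the intuition behind asking whether a fixed amount of "in-the-head" computation can ever cover arbitrarily long inputs without intermediate steps.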