If you want to solve grade school math problems, why not use an 'add' instruction? It's been around since the 50s, runs a billion times faster than an LLM, every assembly-language programmer knows how to use it, every high-level language has a one-token equivalent, and it doesn't hallucinate answers (other than integer overflow).
We also know how to solve complex reasoning chains that require backtracking. Prolog has been around since 1972. It's not used that much because that's not the programming problem that most people are solving.
Why not use a tool for what it's good for and pick different tools for other problems they are better for? LLMs are good for summarization, autocompletion, and as an input to many other language problems like spelling and bigrams. They're not good at math. Computers are really good at math.
There's a theorem that an LLM can compute any computable function. That's true, but so can lambda calculus. We don't program in raw lambda calculus because it's terribly inefficient. Same with LLMs for arithmetic problems.
There is a general result in machine learning known as "the bitter lesson"[1]: methods built on specialist knowledge tend to be beaten in the long run by methods that rely on brute-force computation, because of Moore's law and the ability to scale things with distributed computing. So the reason people don't use the "add instruction"[2], for example, is that over the last 70 years of attempting to build systems that do exactly what you are proposing, they have found it not to work very well, whereas sacrificing what you are calling "efficiency" (which they would think of as special-purpose, domain-specific knowledge) turns out to give you a lot in terms of generality. And they can make up the lost efficiency by throwing more hardware at the problem.
As someone with a CS background myself, I don't think this is what GP was talking about.
Let's forget for a moment that stuff has to run on an actual machine. If you had to represent a quadratic equation, would you rather write:
(a) x^2 + 5x + 4 = 0
(b) the square of the variable plus five times the variable plus four equals zero
When you are trying to solve problems with a level of sophistication beyond the toy stuff you usually see in these threads, formal language is an aid rather than an impediment. The trajectory of every scientific field (math, physics, computer science, chemistry, even economics!) is away from natural language and towards formal language, even before computers, precisely for that reason.
We have lots of formal languages (general-purpose programming languages, logical languages like Prolog/Datalog/SQL, "regular" expressions, configuration languages, all kinds of DSLs...) because we have lots of problems, and we choose the representation of the problem that most suits our needs.
Unless you are assuming you have some kind of superintelligence that can automagically take care of everything you throw at it, natural language breaks down when your problem becomes wide enough or deep enough. In a way this is like people making Rube-Goldberg contraptions with Excel. 50% of my job is cleaning up that stuff.
I quite agree, and so would Wittgenstein, who (as I understand it) argued that precise language is essential to thought and reasoning[1]. I think one of the key things here is that what we think of as reasoning often boils down to taking a problem in the real world and building a model of it in some precise language, to which we can then apply a set of known tools. Your quadratic example is perfect: seeing (a), I know right away that it's an upward-facing parabola with a line of symmetry at x = -5/2 and roots at -4 and -1, whereas if I saw (b) I would first have to write it down to get it into a form I could reason about.
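Spelled out formally, the whole check is two lines (nothing beyond what's already stated above):

    x^2 + 5x + 4 = (x + 4)(x + 1) = 0 \implies x \in \{-4, -1\}
    a = 1 > 0 \text{ (opens upward)}, \qquad x_{\text{vertex}} = -\tfrac{b}{2a} = -\tfrac{5}{2}

Try doing that factorisation while the problem is still in form (b).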
I think this is a fundamental problem with the "chat" style of interaction with many of these models: the language interface isn't the best way of representing any specific problem, even if it's quite a useful compromise for problems in general. I think an intrinsic problem with this class of model is that they only have text generation to "hang computation off", meaning the "cognitive ability" (if we can call it that) is very strongly tied to how much text the model generates for a given problem, which is why chain-of-thought prompting produces much better results for many problems[2].
[1] Hence the famous payoff line: "Whereof one cannot speak, thereof one must be silent."
[2] And I suspect why GPT-4 in general seems to have got a lot more verbose. It seems to be doing a lot of thinking out loud in my use, which gives better answers than asking it to be terse and just give the answer, or to give the answer first and then the reasoning; both generally produce inferior answers in my experience and in the research, e.g. https://arxiv.org/abs/2201.11903
It depends on whether you ask him before or after he went camping -- but yeah, I was going for an early-Wittgenstein-esque "natural language makes it way too easy to say stuff that doesn't actually mean anything" (although my argument is much more limited).
> I think this is a fundamental problem with the "chat" style of interaction
The continuation of my argument would be that if the problem is effectively expressible in a formal language, then you likely have way better tools than LLMs to solve it. Tools that solve it every time, with perfect accuracy and near-optimal running time, and critically, tools that allow solutions to be composed arbitrarily.
AlphaGo, and NNUE for computer chess, which are often cited for some reason as examples of this brave new science, would be completely worthless without "classical" tree-search techniques straight out of Russell & Norvig.
Hence my conclusion, contra what seems to be the popular opinion, is that these tools are potentially useful for some specific tasks, but make for very bad "universal" tools.
There are some domains that sit in the twilight zone between language and deductive, formal reasoning. I got into genealogy last year. It's very often deductive "detective work": say there are four women in a census with the same name and place as the mother listed on a birth certificate you're investigating. Which of them is it? You may rule one out on hard evidence (the census suggests she would have been 70 when the birth happened), one on linked evidence (this one is the right age, but it's definitively the same woman who died 5 years later, and we know the child's mother didn't), one on combined softer evidence (she was in a fringe denomination and at the upper end of the age range), and then you're left with one, etc.
Then as you collect more evidence you find that the age listed on the first one in the census was wildly off due to a transcription error and it's actually her.
You'd think some sort of rule-based system and database might help with these sorts of things. But the historical experience with expert systems is that you often end up automating the easy bits at the cost of demanding even more tedious data entry. And you can't divorce data entry from deduction either, because without context, good luck reading a rare last name out of the faded ink of some priest's messy gothic handwriting.
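To make that concrete, here's a minimal sketch (Python, with made-up records and made-up field names) of the kind of rule-based elimination such a tool would automate. It also shows exactly where the pain is: every one of these fields has to be transcribed from that faded handwriting before the "easy" deduction can run.

    # Hypothetical candidates transcribed from a census (names/fields invented).
    candidates = [
        {"name": "Anna Berg", "born": 1805, "denomination": "Lutheran",    "died": None},
        {"name": "Anna Berg", "born": 1851, "denomination": "Lutheran",    "died": None},
        {"name": "Anna Berg", "born": 1848, "denomination": "Lutheran",    "died": 1870},
        {"name": "Anna Berg", "born": 1849, "denomination": "fringe sect", "died": None},
    ]
    child_born = 1875  # year on the birth certificate being investigated

    def plausible_mother(c):
        age_at_birth = child_born - c["born"]
        if not 15 <= age_at_birth <= 45:            # hard evidence: implausible age
            return False
        if c["died"] and c["died"] < child_born:    # linked evidence: already dead
            return False
        return True

    remaining = [c for c in candidates if plausible_mother(c)]
    print(remaining)  # softer evidence (denomination, age range) is still left to human judgement

And of course the whole exercise is undone the moment a transcription error turns 1805 into 1850.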
It feels like language models should be able to help. But they can't, yet. And it fundamentally isn't because they suck at grade school math.
Even linguistics - not something I know much about, but another discipline where you try to make deductions from tons and tons of soft and vague evidence - you'd think language models, able to produce fluent text in more languages than any human, might be of use there. But no, it's the same thing: they can't actually combine common-sense soft reasoning and formal rule-oriented reasoning very well.
It does. This is the plugins methodology described in the Toolformer paper, which I've linked elsewhere[1]. The model learns that for certain types of problems, certain specific "tools" are the best way to solve them. The problem, of course, is that it's then simple to argue that the LLM has merely learned to use the tool(s) and can't reason about the underlying problem itself. The question boils down to whether you're more interested in machines which can think (whatever that means) or in having a super-powered co-pilot which can help with a wide variety of tasks. I'm quite biased towards the second, so I have the Wolfram Alpha plugin enabled in my ChatGPT. I can't say it solves all the math-related hallucinations I see, but I might not be using it right.
GPT-4 does even without explicitly enabling plugins now, by constructing Python. If you want it to actually reason through a problem, you now need to ask it, sometimes fairly forcefully/in detail, before it will indulge you and not omit steps. E.g. see [1] for the problem given above.
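For anyone who hasn't watched it do this: for a word problem along the lines of "a train travels 60 km/h for 2.5 hours, how far does it go?" (my made-up example, not the one in [1]), the throwaway script it writes and runs looks roughly like this:

    # Sketch of the kind of code the code tool tends to emit for a word problem.
    speed_kmh = 60        # given speed
    hours = 2.5           # given duration
    distance_km = speed_kmh * hours
    print(distance_km)    # 150.0

Which is exactly the shortcut behaviour that gets in the way when what you actually want to probe is the step-by-step reasoning itself.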
But as I noted elsewhere, training its ability to do it from scratch matters not for the ability to do it from scratch, but for the transferability of the reasoning ability. And so I think that while it's a good choice for OpenAI to make it automatically pick more effective strategies to give the answer it's asked for, there is good reason for us to still dig into its ability to solve these problems "from scratch".
Ideally we'd have both worlds -- but if we're aiming for AGI and we have to choose, using a language that lets you encode everything seems preferable to one that only lets you talk about, say, constrained maximization problems.
The ML method doesn't require you to know how to solve the problem at all, and could someday presumably develop novel solutions, not just high-efficiency symbolic graph search.
The bitter lesson isn't a "general result". It's an empirical observation (and extrapolation therefrom) akin to Moore's law itself. As with Moore's law there are potential limiting factors: physical limits for Moore's law and availability and cost of quality training data for the bitter lesson.
Surely the "efficiency" is just being transferred from software to hardware, e.g. the hardware designers are having to come up with more efficient designs, shrink die sizes etc. to cope with the inefficiency of the software engineers? We're starting to run into the limits of Moore's law in this regard when it comes to processors, although it looks like another race might be about to kick off for AI, but with RAM instead. When you've hit the physical limits of both, is there anywhere else to go other than making the software more efficient?
When you say "a general result", what does that mean? In my world, a general result is something which is rigorously proved, e.g., the fundamental theorem of algebra. But this seems to be more along the lines of "we have lots of examples of this happening".
I'm certainly no expert, but it seems to me that Wolfram Alpha provides a counterexample to some extent, since they claim to fuse expert knowledge and "AI" (not sure what that means exactly). Wolfram Alpha certainly seems to do much better at solving math problems than an LLM.
I would mention that, while yes, you can just throw computational power at the problem, the need for human expertise didn't disappear. It moved from creating an add instruction to coming up with a new neural-net architecture, and we've seen a lot of those ideas be super useful and push the boundaries.
> Certainly the objective is not for the AI to do research-level mathematics.
The problem is that there are different groups of people with different ideas about AI, and when talking about AI it's easy to end up tackling the ideas of a specific group but forgetting about the existence of the others. In this specific example, surely there are AI enthusiasts who see no limits to the applications of AI, including research-level mathematics.
This is so profoundly obvious you have to wonder about the degree of motivated reasoning behind people's attempts to cast this as "omg it can add… but so can my pocket calculator!"
There's no value in an LLM doing arithmetic for the sake of doing arithmetic with the LLM. There's value in testing an LLM's ability to follow the rules for doing arithmetic that it already knows, because the ability to recognise that a problem matches a set of rules it already knows, in part or in whole, and then to apply those rules with precision is likely to generalise to far better overall problem-solving ability.
By all means, we should give LLMs lots and lots of specialised tools to let them take shortcuts, but that does not remove the reasons for understanding how to strengthen the reasoning abilities that would also make them good at maths.
EDIT: Having just coerced the current GPT-4 to do arithmetic manually: it appears to have drastically improved in its ability to systematically follow the required method, while ironically being far less willing to do so (it took multiple attempts before I got it to stop taking shortcuts that appeared to involve recognising this was a calculation it could use tooling to carry out, or ignoring my instructions to do it step by step and just doing it "in its head" the way a recalcitrant student might). It's been a while since I tested this, but this is definitely new-ish.
Gaslighting LLMs does wonders.
In this case, e.g., priming it by convincing it the tool is either inaccessible/overloaded/laggy, or, here perhaps, telling it the Python tool computed wrong and thus can't be trusted.
Why would we teach kids maths then, when they can use a calculator? It's much easier and faster for them.
I believe it's because having a foundational understanding of maths and logic is important when solving other problems, and if you are looking to create an AI that can generally solve all problems it should probably have some intuitive understanding of maths too.
i.e. if we want an LLM to be able to prove currently unsolved theorems in the future, this requires a level of understanding of maths that is more than 'teach it to use a calculator'.
More broadly, I can imagine a world where LLM training is a bit more 'interactive'. Right now, if you ask it to play a game of chess with you, it fails; it has only ever read about chess and past games and guesses the next token based on that. What if it could actually play a game of chess? Would it get a deeper appreciation for the game? How would this change its internal model for other questions (e.g. would it answer questions about other games, or even game theory, better)?
It's also fun to use your brain I guess, I think we've truly forgotten that life should be about fun.
Watching my kids grow up, they just have fun doing things like trying to crawl, walk or drink. It's not about being the best at it, or the most efficient, it's just about the experience.
Maths is taught in a boring way now, but knowing it can help us lead more enjoyable lives. When maths is taught in an enjoyable way AND people get results out of it, well, that's glorious.
> Why would we teach kids maths then, when they can use a calculator? It's much easier and faster for them.
I am five years older than my brother, and we happened to land just on opposite sides of when children were still being taught mental arithmetic and when it was assumed they would, in fact, have calculators in their pockets.
It drives him crazy that I can do basic day-to-day arithmetic in my head faster than he can get out his calculator to do it. He feels like he really did get cheated out of something useful because of the proliferation of technology.
Even if that were true, I can count on one hand the number of times I've needed to use anything more than basic algebra (which is basically arithmetic with a placeholder) in my adult life. I think I'd genuinely rather keep arithmetic in my head than calculator use.
Is this intuition scientifically supported? I've read that people who remember every detail of their lives tend not to have spectacular intelligence, but outside of that extreme I'm unaware of having seen the tradeoff actually bite. And there are certainly complementarities in knowledge -- knowing physics helps with chemistry, knowing math and drama both help with music, etc.
Chimps have a much better working memory than humans. They can also count 100 times faster than humans.
However, the area of their brain responsible for this faculty is used for language in humans... The theory is that the prior working memory and counting ability may have been optimized out at some point to make physical room, assuming human ancestors could do it too.
Look up the chimp test; the videos of the best chimp are really quite incredible.
There is also the measured enlargement of the map-traversing parts of the brain in pro Tetris players and taxi drivers. I vaguely remember an explanation involving atrophy in nearby areas of the brain, potentially to make room.
Judging by some YouTube videos I’ve seen, ChatGPT with GPT-4 can get pretty far through a game of chess. (Certainly much farther than GPT-3.5.) For that duration it makes reasonably strategic moves, though eventually it seems to inevitably lose track of the board state and start making illegal moves. I don’t know if that counts as being able to “actually play a game”, but it does have some ability, and that may have already influenced its answers about the other topics you mentioned.
What if you encoded the whole game state into a one-shot completion that fits into the context window every turn? It would likely not make those illegal moves. I suspect it's an artifact of the context window management that is designed to summarize lengthy chat conversations, rather than an actual limitation of GPT4's internal model of chess.
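A minimal sketch of that idea (assuming the python-chess package is available; the actual LLM call and reply-parsing loop are left out, since they depend on whichever API you use):

    import chess  # pip install python-chess (assumed)

    def prompt_for_move(board: chess.Board) -> str:
        # Re-encode the full game state every turn instead of relying on chat history.
        color = "White" if board.turn == chess.WHITE else "Black"
        legal = ", ".join(board.san(m) for m in board.legal_moves)
        return (
            f"You are playing chess as {color}.\n"
            f"Current position (FEN): {board.fen()}\n"
            f"Legal moves: {legal}\n"
            "Reply with exactly one legal move in SAN notation."
        )

    board = chess.Board()
    board.push_san("e4")
    print(prompt_for_move(board))
    # The model's reply would be parsed and applied with board.push_san(...),
    # then the next prompt rebuilt from the updated board.

Listing the legal moves is arguably cheating, but even just the FEN line removes the need for the model to reconstruct the position from a long transcript.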
Having an internal model of chess and maintaining an internal model of the game state of a specific given game when it's unable to see the board are two very different things.
EDIT: On re-reading I think I misunderstood you. No, I don't think it's a bold assumption to think it has an internal model of it at all. It may not be a sophisticated model, but it's fairly clear that LLM training builds world models.
We know with reasonable certainty that an LLM fed on enough chess games will eventually develop an internal chess model. The only question is whether GPT4 got that far.
So can humans. And nothing stops probabilities in a probabilistic model from approaching or reaching 0 or 1 unless your architecture explicitly prevents that.
Or, given https://thegradient.pub/othello/, why wouldn't it have an internal model of chess? It probably saw more than enough example games and quite a few chess books during training.
I think the answer is money, money, money. Sure, it is 1000000000x more expensive in compute power, and error-prone on top of that, to let an LLM solve an easy problem. But the monopolies generate a lot of hype around it to get more money from investors. Same as the self-driving car hype. Or the real-time raytracing insanity in computer graphics. If one hype dies, they artificially generate a new one. For me, I just watch all the ships sink. It is gold-level comedy. Btw, AGI is coming, haha, sure: we developers will be replaced by a script which cannot put B, A, C in a logical sequence, and which already needs town-sized data centers to train.
> If one hype dies they artificially generate a new one
They have a pipeline of hypes ready to be deployed at a moment's notice. The next one is quantum, it's already gathering in the background. Give it a couple of years.
Can LLMs compute any computable function? I thought that an LLM can approximate any computable function, if the function is within the distribution it is trained on. I think it's jolly interesting to think about different axiomatizations in this context.
Also, we know that LLMs can't do a few things: arithmetic, inference and planning are among them. They look like they can, because they retrieve discussions from the internet that contain the problems, but when they are tested out of distribution they suddenly fail. However, some other NNs can do these things, because they have the architecture and infrastructure and training that enables it.
There is a question for some of these as to whether we want to make NNs do these tasks or just provide calculators, like for grade-school students, but on the other hand something like AlphaZero looks like it could find new ways of doing some problems in planning. The challenge is to find architectures that integrate the different capabilities we can implement in a useful and synergistic way. Lots of people have drawn diagrams about how this can be done, then presented them with lots of hand-waving at big conferences. What I love is that John Laird has been building this sort of thing for, like, forty years, and is roundly ignored by NN people for some reason.
Maybe because he keeps saying it's really hard and then producing lots of reasons to believe him?
Many of the "specialist" parts of the brain are still made from cortical columns, though. Also, they are in many cases partly interchangeable, with some reduction in efficiency.
Transformers may be like that, in that they can do generalized learning from different types of input, with only minor modifications needed to optimize for different input (or output) modes.
Cortical columns are one part of much more complex systems of neural compute that at a minimum includes recursive connections with thalamus, hypothalamus, midbrain, brainstem nuclei, cerebellum, basal forebrain, — and the list goes on.
So it really does look like a society of networks, all working in functional synchrony (parasynchrony might be a better word), with some forms of "consciousness" updated in time slabs of about 200-300 milliseconds.
LLMs are probably equivalent now to Wernicke's and Broca's areas, but much more is needed "on top" and "on bottom": motivation, affect, short- and long-term memory, plasticity of synaptic weighting and dynamics, and perhaps most important, a self-steering attentional supervisor or conductor. That attentional driver system is what we probably mean by consciousness.
> That attentional driver system is what we probably mean by consciousness.
You may know much more about this than me, but how sure are you about this? To me it seems like a better fit that the "self-steering attentional supervisor" is associated with what we mentally model (and oversimplify) as "free will", while "consciousness" seems to be downstream from the attention itself, and has more to do with organizing and rationalizing experiences than with directly controlling behavior.
This processed information then seems to become ONE input to the executive function in following cycles, but with a lag of at least 1 second, and often much more.
> one part of much more complex systems of neural compute
As for your main objection, you're obviously right. But I wonder how much of the computation that is relevant for intelligence is actually in those other areas. It seems to me that recent developments indicate that Transformer-type models are able to self-organize into several different types of microstructures, even within present-day transformer-based models [1].
Not sure at all. There are also some ambiguities in definitions. Above I mean "consciousness" of the type many would be willing to assume operates in a cat, dog, or mouse: attentional and, occasionally, also intentional.
I agree that this is downstream of pure attention. Attention needs to be steered and modulated. The combination of the two levels working together recursively is what I had in mind.
“Free will” gets us into more than that. I’ve been reading Daniel Dennett on levels of “intention” this week. This higher domain of an intentional stance (nice Wiki article) might get labeled “self-consciousness”.
Most humans seem to accept this as a cognitive and mainly linguistic domain, the internal discussions we have with ourselves, although I think we also accept that there are major non-linguistic drivers. Language is an amazingly powerful tool for recursive attentional and semantic control.
My take on "free will" is definitely partly based on Dennett's work.
As for "consciousness", it seems to me that most if not all actions we take are decided BEFORE they hit our consciousness. For actions that are not executed immediately, the processing that we experience as "consciousness" may then raise some warning flags if the action our pre-conscious mind has decided on is likely to cause some bad consequences. This MAY cause the decision-making part (executive function) of the brain to modify the decision, but not because consciousness can override the decision directly.
Instead, when this happens, it seems to be that our consciousness extrapolates our story into the future in a way that creates fear, desire or similar more primal motivations that have more direct influence over the executive function.
One can test this by, for instance, standing near the top of a cliff (don't do this if you're suicidal): try to imagine that you have decided to jump off the cliff. Now imagine the fall from the cliff and you hitting the rocks below. Even if (and maybe especially if) you managed to convince yourself that you were going to jump, this is likely to trigger a fear response strong enough to ensure you will not jump (unless you're truly suicidal).
Or, for a less synthetic situation: let's say you're a married man, but in a situation where you have an opportunity to have sex with a beautiful woman. The executive part of the brain may already have decided that you will. But if your consciousness predicts that your wife is likely to find out, and starts to spin a narrative about divorce, losing access to your children and so on, this MAY cause your executive function to alter the decision.
Often in situations like this, though, people tend to proceed with what the preconscious executive function had already decided. Afterwards, they may have some mental crisis because they ended up doing something their consciousness seemed to protest against. They may feel they did it against their own will.
This is why I think that the executive function, even the "free will", is not "inside" consciousness, but is separate from it. And while it may be influenced by the narratives that our consciousness spins up, it also takes many other inputs that we may or may not be conscious of.
The reason I still call this "free" will is based on Dennett's model, though. And in fact, "free" doesn't mean what we tend to think it means. Rather, the "free" part means that there is a degree of freedom (like in a vector space) that is sensitive to the kinds of incentives the people around you may provide for your actions.
For instance stealing something can be seen as a "free will" decision if you would NOT do it if you knew with 100% certainty that you would be caught and punished for it. In other words, "free will" actions are those that, ironically, other people can influence to the point where they can almost force you to take them, by providing strong enough incentives.
Afaik some are similar, yes. But we also have different types of neurons etc. Maybe we'll get there with a generalist approach, but imho the first step is a patchwork of specialists.
In a single run, obviously not any, because its context window is very limited. With a loop and access to an "API" (or a willing conversation partner agreeing to act as one) to operate a Turing tape mechanism? It becomes a question of the ability to coax it into complying. It trivially has the ability to carry out every step, and your main challenge becomes getting it to stick to it over and over.
One step "up", you can trivially get GPT-4 to symbolically execute fairly complex runs of instructions in languages it can never have seen before, if you specify a grammar and then give it a program, with the only real limitation again being getting it to continue to adhere to the instructions for long enough before it starts wanting to take shortcuts.
In other words: it can compute any computable function about as well as a reasonably easily distractible/bored human.
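For the curious, the "tape API" in question doesn't need to be anything fancier than this (a sketch only; the loop that relays commands to and from the model is left out):

    # Minimal Turing-style tape a conversation partner (or a loop around an LLM API)
    # could operate on the model's behalf. The model only ever emits one command at a
    # time: "LEFT", "RIGHT", "READ", or "WRITE <symbol>".
    from collections import defaultdict

    class Tape:
        def __init__(self):
            self.cells = defaultdict(lambda: "_")  # "_" is the blank symbol
            self.head = 0

        def execute(self, command: str) -> str:
            op, _, arg = command.strip().partition(" ")
            if op == "LEFT":
                self.head -= 1
            elif op == "RIGHT":
                self.head += 1
            elif op == "WRITE":
                self.cells[self.head] = arg
            elif op == "READ":
                return self.cells[self.head]
            return "OK"

    tape = Tape()
    print(tape.execute("WRITE 1"))   # OK
    print(tape.execute("READ"))      # 1

The hard part, as above, isn't the mechanism; it's keeping the model emitting one well-formed command per turn for thousands of turns.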
What exactly is it you think it can't do? It can explain and apply a number of methods for calculating sin. It knows the symmetry and periodicity of sin, and so will treat requests for sin of larger values accordingly. Convincing it to keep writing out the numbers for an arbitrarily large number of values without emitting "... continue like this" or a similar shortcut (which a human told to do annoyingly pointless repetitive work would also be prone to prefer) is indeed tricky, but there's nothing to suggest it can't do it.
You're missing the point: who's using the 'add' instruction? You. We want 'something' to think about using the 'add' instruction to solve a problem.
We want to remove the human from the solution design. It would help us tremendously, tbh, just like, I don't know, Google Maps helped me never have to look for directions ever again.
Interesting, how do you use this idea? Say you prompt the LLM with "create a Python add function Foo to add a number to another number", then "using Foo add 1 and 2", or some such: what's to stop it hallucinating and saying "Sure, let me do that for you, Foo of 1 and 2 is 347. Please let me know if you need anything else."?
Nothing stops it from writing a recipe for soup for every request, but it does tend to do what it's told. When asked to do mathsy things and told it's got a tool for doing them, it tends to lean into that, if it's a good LLM.
It writes a function, and then you hand that function to an interpreter, which does the calculation; GPT then proceeds with the rest based on the interpreter's output.
That’s how langchain works, chatgpt plugins and gpt function calling. It has proven to be pretty robust - that is, gpt4 realising when it needs to use a tool/write code for calculations when needed and then using the output.
What you’re proposing is equivalent to training a monkey (or a child for that matter) to punch buttons that correspond to the symbols it sees without actually teaching it what any of the symbols mean.
That's not the aim here. Very obviously what we are talking about here is _complementing_ AI language models with improved mathematical abilities, and whether that leads to anything interesting. Surely you understand that? Aren't you one of the highest rated commenters on this site?