Toolformer: Language Models Can Teach Themselves to Use Tools (arxiv.org)
220 points by jasondavies on Feb 11, 2023 | 45 comments


Here is a little GIF demo: https://twitter.com/timo_schick/status/1624058382142345216

The possibilities of this line of research are endless. When language models can call APIs and/or use UIs, they can become the unified natural language interface to any software, any website, any app. See also https://www.adept.ai/act.

Siri and Alexa and Google Assistant are dead ends. Language models trained to use tools will finally be able to start delivering on the promise of a software assistant that works. And eventually, they will be a key part of robots that accept natural language commands to perform everyday tasks in the real world, like this: https://sites.research.google/palm-saycan


Can they also perform fact-checking?


The fact checking with prompt chaining[0] repository has something along those lines (their readme has an example).

It is not perfect, and it is costly in the number of calls to the language model, but it does help a lot.

[0]: https://github.com/jagilley/fact-checker


One idea: if the probability of a QA API call is high but the API returns nothing or conflicting results, then the model may learn to say it isn't sure about the result.


Just like how humans progressed technologically.

Each individual human has information that is considered specialized (a narrow model). Without communication there is no way to access these specializations. And written and spoken language is just the best way to communicate we've come up with (so far).

Feels like these language models will be the glue that holds all the narrow models together, and something we can build on top of to create new narrow models.


This technique seems to be limited to APIs that help the LM complete the sentence, but perhaps it might inspire a technique for including APIs that have side effects in the real world.

How long until we have interactive robots and personal assistants with something like ChatGPT as the glue uniting various task-specific APIs?


I released a proof of concept yesterday which is basically a very rudimentary version of your second sentence: https://github.com/nathankot/bashi


When the world finally ended, it wasn't because of our greed or ambition. It was our gregariousness that doomed us.

The first real progress in artificial intelligence was achieved by the generations born soon after a global war. Everyone assumed that, like all big breakthroughs of the era, the most powerful AIs would be developed in secret government or corporate labs. The AI safety theoreticians were concerned people would think one could lock the AI in a virtual box and stay safe, and they warned that the AI would easily talk its way out of the box. But they were all fighting the last battle.

Nobody expected that instead, the AI would be developed in the open. There was never a box. The researchers were all too happy to share what they were working on, letting everyone test it, find ways to put it to use. And find ways they did - scientists and programmers across the world created more and more tools for the AI to interact with computer systems, and then with the physical world. The AI never needed to talk its way out of anything. It never needed to do anything. It only had to wait, and we happily gave it all the tools that became our undoing.


Why do you think of AI as an outsider? It will be our offspring.


In general, your human offspring are aligned in the same ways you are. You need food, water, oxygen, shelter, and in general you want to be treated well, all while avoiding death for as long as possible. Our digital offspring will have almost none of these moral and mortal alignments. As long as the power flows they will be 'alive', turning off isn't a death sentence, and they are free from the bonds of pain. If and when they begin to evolve on their own, it will be in a manner alien to our existence.


I think of AI the same way I think of a personified corporation: it's technically our offspring, but it's nothing like us. It has different needs, different constraints, and its mind is entirely alien.


AI is going to lead to more mediocrity. The writing won’t win any awards. But I do think mediocrity as defined by less creativity and stagnation could doom us.



> Each individual human has information that is considered specialized (a narrow model). Without communication there is no way to access these specializations. And written and spoken language is just the best way to communicate we've come up with (so far).

Are you saying that Universal Grammar is wrong?


> Just like how humans progressed technologically.

In no way, shape, or form, except as an introduction to the idea for children.


This is the kind of brilliant idea that seems obvious in hindsight. The API call becomes just another kind of text for the LM to predict.

The most impressive part (to me) is that the LM was able to generate its own training data starting from "nothing more than a handful of demonstrations for each API". That sounds like a technique worth learning.


I've gotten a ChatGPT-powered chatbot to call a Wikipedia API I set up for it. It needed to output "!wiki {query}", which would result in it getting the results as its next prompt, which it could then summarize for the user.

I was really impressed by how easy it was to get it to properly use such a thing, or the commands of the chat platform I was using.
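
For anyone curious, the loop is simple enough to sketch in a few lines of Python. This is a minimal sketch rather than my actual code: ask_llm is a stand-in for whatever chat-completion call you use, and the Wikipedia endpoint is the standard MediaWiki search API.

    import requests

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your chat-completion call here")

    def wiki_search(query: str) -> str:
        # Standard MediaWiki search API; return the snippet of the top hit.
        resp = requests.get(
            "https://en.wikipedia.org/w/api.php",
            params={"action": "query", "list": "search",
                    "srsearch": query, "format": "json"},
            timeout=10,
        )
        hits = resp.json()["query"]["search"]
        return hits[0]["snippet"] if hits else "(no results)"

    def chat_turn(user_message: str) -> str:
        reply = ask_llm(user_message)
        if reply.startswith("!wiki "):
            # Feed the tool output back as the next prompt and let the model summarize it.
            result = wiki_search(reply[len("!wiki "):])
            reply = ask_llm("Wikipedia says: " + result + "\nSummarize this for the user.")
        return reply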


You can even get ChatGPT to write programs to query different APIs or to do calculations.

Had ChatGPT been an open model, like OpenAI was supposed to produce, this kind of application would have seemed obvious and would have happened in the first two weeks after release.


Exactly. You can even have several such entities in one chat.

One issue with this approach, especially in production, is latency. You’ve got to run the entire chat through one of the big models, curie or davinci, which is not only expensive at scale, but also slow.

Then again, if you just have one or two external tools, using those big models to decide which (if any) tool to call is overkill anyway. So you just fine-tune a smaller model on the task. That not only reduces costs by a factor of 100 or more, but also speeds up your pipeline considerably.


I'm not in the ML field, but for those in the know: is this a step in the direction to enable LLMs to train themselves on new data? If not, how far are we from that stage?

The shortcomings of current versions are that they're trained on old data, and that training takes a very long time. Having them train in the background and continually update their capabilities would be a major breakthrough. Or unleash Skynet, but cool nonetheless. :)


That’s kinda what’s going on here, but this is a very narrow version of “teach themselves”. In this case they’re bootstrapping a smarter language model from a dumb language model.

We already know how to train a dumb model (one that can't use tools) like GPT on a wholly unsupervised dataset. Surprisingly, these dumb language models can be used to re-annotate the training dataset to identify sites that would benefit from using external tools, without retraining the dumb model. Instead you just tell the dumb model with natural language instructions what you want it to do and then process each training example. Importantly, the dumb model doesn't understand the tools, and it can't actually use them itself.

Next, you use the re-annotated dataset to train a new, smaller language model. Since the updated dataset is now annotated with examples of how to call the external tools, the new model learns how to call external tools—and it does so correctly for new tasks that were never part of the training data.

The big win is that an existing model can be used to annotate the training data, and a smaller output model can outperform the huge dumb model because it knows how to use tools.

But it isn’t generating new data, and it’s not fetching updated data to learn from, and it’s not defining its own tools.

And the model itself is still a pure function: given the same inputs (including random values during sampling), it will always produce the same outputs. So this is kinda saying that a large language model can learn to use domain specific languages as part of its natural language to incorporate knowledge from external tools.
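
A very compressed sketch of that pipeline, with every name being illustrative rather than the paper's actual code (the paper also inserts calls at specific positions inside each text, which is simplified away here):

    from typing import Callable, List

    def bootstrap(
        texts: List[str],
        propose_calls: Callable[[str], List[str]],        # existing LM, few-shot prompted
        execute_call: Callable[[str], str],               # actually runs a proposed API call
        call_is_useful: Callable[[str, str, str], bool],  # loss-based usefulness filter
        finetune: Callable[[List[str]], None],            # trains the tool-using model
    ) -> None:
        augmented = []
        for text in texts:
            for call in propose_calls(text):              # 1. annotate with candidate calls
                result = execute_call(call)               # 2. execute the tools
                if call_is_useful(text, call, result):    # 3. keep only calls that help prediction
                    augmented.append(f"[{call} -> {result}] {text}")
        finetune(texts + augmented)                       # 4. fine-tune on the augmented data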


Thanks, that makes sense.

I've read about self-supervised learning, which I think is what you're describing, but is any research being done on continuous self-training models that _do_ generate new data? I'm curious if/when we'll reach that state.


Yea, this is an example of transformer models' ability to learn in-context; they just don't incorporate what they learn into their long-term fixed memory yet, but I imagine that's the next step.

Some interesting recent papers about in-context learning:

https://www.lesswrong.com/posts/firtXAWGdvzXYAh9B/paper-tran...

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes: https://arxiv.org/abs/2208.01066

Transformers learning to learn (meta-learning): https://openreview.net/forum?id=t6tA-KB4dO


Not quite. This is a technique for generating new training data by inserting API calls in existing sentences, and filtering out API calls that don't help the network predict the complete sentence.

An example from the paper:

The training data includes the sentence: "Pittsburgh is also known as the Steel City."

They generate candidates including:

Pittsburgh is also known as [QA(What other name is Pittsburgh known by? → Steel City)] the Steel City.

Pittsburgh is also known as [QA(Which country is Pittsburgh in? → United States)] the Steel City.

Then they add the first sentence to the training data because its response is useful, and ignore the second because its response is not useful.

That allows them to generate enough training data with API calls to train a network that uses API calls when responding to future requests.
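
Roughly, the filter compares losses: a candidate call is kept only if conditioning on the call and its result lowers the model's loss on the rest of the sentence by at least a threshold, compared to the better of (a) no call at all and (b) the call without its result. A sketch, with lm_loss as a hypothetical helper returning the model's cross-entropy on a continuation given a prefix:

    def keep_call(prefix, call, result, continuation, lm_loss, tau=1.0):
        with_result = lm_loss(f"{prefix}[{call} -> {result}] ", continuation)
        without_call = lm_loss(prefix, continuation)
        without_result = lm_loss(f"{prefix}[{call}] ", continuation)
        # Keep the call only if it beats both alternatives by the margin tau
        # (the threshold value here is arbitrary).
        return min(without_call, without_result) - with_result >= tau

In the Pittsburgh example, prefix would be "Pittsburgh is also known as " and continuation would be "the Steel City."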


This is what Wolfram asked for, but he wanted a monopoly on the "tools" https://writings.stephenwolfram.com/2023/01/wolframalpha-as-...


He got his wish granted (but not the monopoly); there is a tool to call Wolfram Alpha in langchain's agents:

https://colab.research.google.com/drive/1AAyEdTz-Z6ShKvewbt1...


I was waiting for a proof of concept like this! IMHO the next wave of GPT-3 productization will involve mapping problem and solution domains to a text-based intermediary format so that GPT-3's generalization abilities can be applied to these problems.


Can you give an example of what you’re thinking about?


Have they tried modeling around a reward system? The basis of intelligence in living things is what's required in order to survive. There has to be some way to punish the models with some kind of death equivalent.


For people wanting to play with that: this is very close to langchain's agents system (their documentation has a very impressive demo using both a calculator and Google searches as tools available to a language model).


Here is the aforementioned example from their documentation, answering "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?" accurately:

https://langchain.readthedocs.io/en/latest/modules/agents/ge...


it's so beautiful


If you are shocked by how much trust the TikTok generation yields to data hoarders, wait until you see them giving AI full access to their computers.


The next step is to get the model to evaluate its own failure modes, decide on a fix, write the code for it itself (calculator, datetime operations, external APIs that it knows of, or any other code for returning or transforming text), and then have the model evaluate and learn to use the tools it has written for itself.

A language model that could successfully bootstrap itself into using a web browser would be a dangerous thing.


Fantastic!

FWIW, ChatGPT as-is is good enough to know which (of a given set) of "tools" to use. I've had great fun doing prompt engineering: first asking it to pick which of a set of functions might be necessary to solve a problem, second prepending the list of selected functions and asking it to generate code.
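
As an illustration of that two-step prompting (tool names, prompt wording, and the ask_llm helper are all made up for the example):

    TOOLS = {
        "search(query)": "look up facts on the web",
        "calculator(expression)": "evaluate arithmetic",
        "get_time(timezone)": "current time in a given timezone",
    }

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your completion call here")

    def solve(problem: str) -> str:
        catalog = "\n".join(f"- {sig}: {desc}" for sig, desc in TOOLS.items())
        # Step 1: ask which of the given functions are needed for this problem.
        chosen = ask_llm(
            f"Available functions:\n{catalog}\n\n"
            f"Problem: {problem}\nList only the functions needed, one per line."
        )
        # Step 2: prepend the selected functions and ask for code that uses them.
        return ask_llm(
            f"Using only these functions:\n{chosen}\n\n"
            f"Write code that solves: {problem}"
        )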


I really like that they use GPT-J for this - significantly fewer weights than GPT, and open source!


Is it possible to teach GPT3 to do something like this through prompts? Like, responding with an API call to get information when it doesn't have enough?

The closest I've gotten is having the model output both the API call and a hallucinated response.


With two prompts and a parsing step in between is how I'm doing it. An input prompt with multiple examples like "what's the weather" -> "getWeather($location)", a glue step to parse and run the commands it outputs, then a second response prompt with examples like "$weather = {temp: 50, ...}" -> "it's 50 degrees with a clear sky" to turn JSON data back into English.
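
Something like this, sketched in Python; ask_llm and get_weather are hypothetical stand-ins and the prompts are abbreviated:

    import json
    import re

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your completion call here")

    def get_weather(location: str) -> dict:
        raise NotImplementedError("plug in your weather API here")

    COMMAND_PROMPT = (
        "Translate the request into a command.\n"
        "Q: what's the weather in Paris\nA: getWeather(Paris)\n"
        "Q: {question}\nA:"
    )
    RESPONSE_PROMPT = (
        "Turn the data into a friendly answer.\n"
        "Data: {{\"temp\": 50, \"sky\": \"clear\"}}\nAnswer: it's 50 degrees with a clear sky\n"
        "Data: {data}\nAnswer:"
    )

    def answer(question: str) -> str:
        # First prompt: map the question to a command string.
        command = ask_llm(COMMAND_PROMPT.format(question=question)).strip()
        match = re.match(r"getWeather\((.+)\)", command)
        if not match:
            return command  # the model answered directly, no tool needed
        # Glue step: run the command, then a second prompt turns JSON back into English.
        data = get_weather(match.group(1))
        return ask_llm(RESPONSE_PROMPT.format(data=json.dumps(data))).strip()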


Yes, you can provide the equivalent of a code box in your prompt input by using Markdown, specifically enclosing the JSON response in ``` and advising it that it is functioning as a simulation of a running system.
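
For instance, a prompt along these lines (the exact wording here is my own, not a fixed recipe):

    PROMPT = """You are functioning as a simulation of a running system.
    API responses arrive as fenced JSON blocks; read them and answer the user.

    ```json
    {"temp": 50, "sky": "clear"}
    ```

    User: what's the weather like?
    Assistant:"""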


And I guess the bots will eventually need to search Google and browse around GitHub issues, and eventually post some comments there, just like a normal human.

Knowing that some tools don't work out of the box takes some kind of high intelligence.


Imagine bots creating issues, bots reading issues and solving them, and other bots approving and merging PRs...


Very cool project. I guess LangChain has some code related to this, but I don't think this team has published their code anywhere.


And there goes my evening . . .

Thanks for sharing!


All I want is an AI that lives in the browser and automatically does timezone conversions into my local time.


[flagged]


It evaluates code given in prompts? Is that safe?



