
I'm not in the ML field, but for those in the know: is this a step in the direction to enable LLMs to train themselves on new data? If not, how far are we from that stage?

The shortcomings of current versions are that they're trained on old data and that training takes a very long time. Having them train in the background and continually update their capability would be a major breakthrough. Or unleash Skynet, but cool nonetheless. :)



That’s kinda what’s going on here, but this is a very narrow version of “teach themselves”. In this case they’re bootstrapping a smarter language model from a dumb language model.

We already know how to train a dumb model (one that can’t use tools) like GPT on a wholly unsupervised dataset. Surprisingly, these dumb language models can be used to re-annotate the training dataset, identifying spots in the text that would benefit from calling an external tool, without retraining the dumb model. Instead, you just tell the dumb model in natural-language instructions what you want it to do and then process each training example through it. Importantly, the dumb model doesn’t understand the tools, and it can’t actually use them itself.
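A minimal sketch of that annotation step, assuming a Hugging Face causal LM. The model name, the prompt wording, and the annotate() helper are all illustrative stand-ins, not the paper's exact setup:

    # The existing ("dumb") model is only *prompted* to propose API calls;
    # its weights are never updated during annotation.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "gpt2"  # stand-in; the paper uses a much larger model
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    ANNOTATION_PROMPT = (
        "Rewrite the text, inserting [QA(question)] calls wherever an external "
        "question-answering tool could help predict the next words.\n"
        "Text: {text}\nRewritten:"
    )

    def annotate(text: str, max_new_tokens: int = 64) -> str:
        """Ask the frozen model to propose API-call annotations for one example."""
        inputs = tokenizer(ANNOTATION_PROMPT.format(text=text), return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    print(annotate("Pittsburgh is also known as the Steel City."))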

Next, you use the re-annotated dataset to train a new, smaller language model. Since the updated dataset is now annotated with examples of how to call the external tools, the new model learns how to call external tools—and it does so correctly for new tasks that were never part of the training data.
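A fine-tuning sketch for that second stage, assuming the annotated examples are stored as plain strings; the model choice, learning rate, and single-pass loop are toy stand-ins:

    # Standard causal-LM fine-tuning on the re-annotated text; the API-call
    # syntax is just more tokens as far as the model is concerned.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    student = AutoModelForCausalLM.from_pretrained("gpt2")  # the new model

    ANNOTATED = [
        "Pittsburgh is also known as "
        "[QA(What other name is Pittsburgh known by? → Steel City)] "
        "the Steel City.",
    ]

    optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
    for text in ANNOTATED:  # toy loop: one pass, batch size 1
        batch = tokenizer(text, return_tensors="pt")
        loss = student(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()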

The big win is that an existing model can be used to annotate the training data, and a smaller output model can outperform the huge dumb model because it knows how to use tools.

But it isn’t generating new data, and it’s not fetching updated data to learn from, and it’s not defining its own tools.

And the model itself is still a pure function: given the same inputs (including the random values used during sampling), it will always produce the same outputs. So this is kinda saying that a large language model can learn to use domain-specific languages as part of its natural language in order to incorporate knowledge from external tools.
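You can see the pure-function point directly by pinning the sampling randomness; the model and prompt here are arbitrary stand-ins:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tokenizer("Pittsburgh is also known as", return_tensors="pt")

    def sample(seed: int) -> str:
        torch.manual_seed(seed)  # fix the only source of randomness
        out = model.generate(**inputs, do_sample=True, max_new_tokens=20)
        return tokenizer.decode(out[0])

    # Same inputs (including the random values) -> same outputs.
    assert sample(0) == sample(0)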


Thanks, that makes sense.

I've read about self-supervised learning, which I think is what you're describing, but is any research being done on continuous self-training models that _do_ generate new data? I'm curious if/when we'll reach that state.


Yea, this is an example of the transformer models' ability to learn in-context; they just don't incorporate what they learn in context into their long-term fixed memory yet, but I imagine that's the next step. (There's a toy illustration of in-context learning after the links below.)

Some interesting recent papers about in-context learning:

https://www.lesswrong.com/posts/firtXAWGdvzXYAh9B/paper-tran...

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes: https://arxiv.org/abs/2208.01066

Transformers learning to learn (meta-learning): https://openreview.net/forum?id=t6tA-KB4dO
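A toy illustration of in-context learning, with an arbitrary stand-in model (a model this small may not actually complete the pattern; the point is that the task is specified entirely in the prompt, with no weight updates):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # The "training data" for the task lives only in the context window.
    FEW_SHOT = (
        "English: cat -> French: chat\n"
        "English: dog -> French: chien\n"
        "English: house -> French:"
    )
    inputs = tokenizer(FEW_SHOT, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=3)
    print(tokenizer.decode(out[0]))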


Not quite. This is a technique for generating new training data by inserting API calls into existing sentences and filtering out the API calls whose responses don't help the network predict the rest of the sentence.

An example from the paper:

The training data includes the sentence: "Pittsburgh is also known as the Steel City."

They generate candidates including:

Pittsburgh is also known as [QA(What other name is Pittsburgh known by? → Steel City)] the Steel City.

Pittsburgh is also known as [QA(Which country is Pittsburgh in? → United States)] the Steel City.

Then they add the first candidate to the training data because its API response is useful for predicting the rest of the sentence, and discard the second because its response is not.

That allows them to generate enough training data with API calls to train a network that uses API calls when responding to future requests.
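A hedged sketch of that usefulness filter: keep a candidate call only if prefixing its result lowers the model's loss on the continuation. The margin value and the loss helper are simplified and illustrative, and token-boundary effects are ignored:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def continuation_loss(prefix: str, continuation: str) -> float:
        """Cross-entropy of `continuation` given `prefix` (prefix tokens masked)."""
        ids = tokenizer(prefix + continuation, return_tensors="pt")["input_ids"]
        n_prefix = tokenizer(prefix, return_tensors="pt")["input_ids"].shape[1]
        labels = ids.clone()
        labels[:, :n_prefix] = -100  # -100 = ignored by the loss
        with torch.no_grad():
            return model(ids, labels=labels).loss.item()

    prefix = "Pittsburgh is also known as "
    cont = "the Steel City."
    call = "[QA(What other name is Pittsburgh known by? → Steel City)] "

    loss_without = continuation_loss(prefix, cont)
    loss_with = continuation_loss(prefix + call, cont)
    keep = loss_with + 0.1 < loss_without  # illustrative margin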



