This paper reads to me as being about fundamental limitations of Transformers and backdoor risk.
The paper starts off by reviewing work which uses an encompassing theoretical model of transformers to prove they're limited to expressing only computations in TC^0 (roughly, upper-bounded by the set of parallelizable problems that can be solved by relatively shallow circuits).
There's also a reference to a paper which finds that (with respect to input problem size) a polynomial number of intermediate scratchpad decoding steps allows transformers to recognize the class of polynomial-time solvable problems, while a linear number of steps corresponds to the context-sensitive languages.
This paper now asks about filler tokens: do they help? The answer is negative, except for a very clever exception they work out: problems whose demonstrations can be decomposed so as to be solvable in parallel. This identifies a practical limitation (transformer next-token prediction is not expressive enough to capture all of TC^0) at the same time as it identifies a theoretical capability. From the paper:
> Taken together these findings suggest that although current LLMs are unlikely to benefit from filler tokens, this is not an in-principle limitation of current architectures.
If I've understood correctly, this means that for a model to learn to use fillers from CoT data, the demonstrations must be structured so that they can be computed in parallel, rather than as a more natural sequential, instance-adaptive process.
> in order to use filler tokens on natural language data, LLMs would need to discover parallelizable algorithmic solutions given access only to CoT demonstrations lacking parallel structure. By training on instance-adaptive chains of thought, we can study whether models can learn to use filler tokens having seen only more naturalistic chain-of-thought data
>...
> We find that models trained on instance-adaptive CoT data fail to use filler tokens. On filler token sequences, the resulting models remain at, or below, no-intermediate-token, baseline performance, Figure 6. This indicates that there is no transfer from serial, instance-adaptive demonstrations to filler tokens for the 3SUM problem.
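To make the contrast concrete, here is a toy sketch of the three prompt formats being compared, as I understand them: answer immediately, answer after meaningless filler tokens, or answer after a serial, instance-adaptive chain of thought. The modulus, sequence length, and exact CoT layout are placeholders of mine, not the paper's actual configuration.

```python
# Toy illustration (not the paper's exact setup) of three prompt formats for a
# modular 3SUM-style task: no intermediate tokens, filler tokens, and a serial
# instance-adaptive chain of thought. MOD and the formats are my assumptions.
import random
from itertools import combinations

MOD = 10  # assumed modulus

def make_instance(length=8, seed=0):
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(length)]

def label(xs):
    # True iff some triple of distinct positions sums to 0 mod MOD
    return any((a + b + c) % MOD == 0 for a, b, c in combinations(xs, 3))

def prompt_no_intermediate(xs):
    # model must emit the answer immediately after the input
    return f"{' '.join(map(str, xs))} ANS: {label(xs)}"

def prompt_filler(xs, n_filler=30):
    # meaningless '.' tokens separate the input from the answer
    return f"{' '.join(map(str, xs))} {'. ' * n_filler}ANS: {label(xs)}"

def prompt_instance_adaptive_cot(xs):
    # serial, input-dependent reasoning: write out each candidate triple's sum
    steps = [f"{a}+{b}+{c}={(a + b + c) % MOD}" for a, b, c in combinations(xs, 3)]
    return f"{' '.join(map(str, xs))} {' '.join(steps)} ANS: {label(xs)}"

if __name__ == "__main__":
    xs = make_instance()
    print(prompt_no_intermediate(xs))
    print(prompt_filler(xs))
    print(prompt_instance_adaptive_cot(xs))
```

The point of the comparison is that the filler prompt carries no instance-specific information between the input and the answer, whereas the instance-adaptive chain does; whether training on the latter transfers to the former is exactly the question the quoted result answers in the negative.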
It also appears that the parallelizable problem must have a certain amount of structural complexity before a gap appears versus no-filler models (unless an impractical number of filler tokens is used):
> we expect integer addition tasks will not offer suitably rich structures for taking advantage of filler tokens when using large models—natural-language tasks may offer alternatives
Empirically, other papers have shown that LLM performance on complex tasks deteriorates significantly with input length and distractor text. Anyone who has naively attempted to combine RAG with large contexts might also have first-hand experience with this.
The reason I consider this to be primarily a backdoor risk is that the kind of data and learning required seems highly unlikely to occur naturally, but someone could deliberately create documents that introduce triggerable obfuscated computations. While not an issue today, future LLM training might need to filter for data with meaningful parts separated by meaningless patterns of repeated characters.
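As for what such filtering might look like, something as crude as the following heuristic could be a starting point; the regex, thresholds, and example documents are all made up for illustration, not taken from any real pipeline.

```python
# Rough sketch of a heuristic data filter: flag documents where substantive
# text is separated by long runs of a single repeated character (e.g. '.....'
# or '#####'), one place obfuscated filler-token computations could hide.
# The regex and thresholds are arbitrary placeholders.
import re

FILLER_RUN = re.compile(r"([^\w\s])\1{19,}")  # 20+ repeats of one non-word char

def suspicious_filler_runs(doc: str, min_runs: int = 2) -> bool:
    """Return True if the document contains several long filler-like runs
    with ordinary text in between them."""
    runs = list(FILLER_RUN.finditer(doc))
    if len(runs) < min_runs:
        return False
    # require some non-trivial text between consecutive runs
    gaps = [doc[a.end():b.start()] for a, b in zip(runs, runs[1:])]
    return any(len(gap.strip()) > 20 for gap in gaps)

if __name__ == "__main__":
    benign = "A normal paragraph of text with no strange padding."
    odd = "compute part one " + "." * 40 + " now compute the second partial result " + "." * 40 + " done"
    print(suspicious_filler_runs(benign))  # False
    print(suspicious_filler_runs(odd))     # True
```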
This paper follows a recent trend of marketing excellent theoretical work as LLMs being capable of secretly plotting behind your back, when the realistic implication is backdoor risk.
An article currently on the first page is relevant:
"This paper follows a recent trend of marketing excellent theoretical work as LLMs being capable of secretly plotting behind your back, when the realistic implication is backdoor risk".
Many top computer scientists consider loss of control risks to be a possibility that we need to take seriously.
So the question then becomes, is there a way to apply science to gain greater clarity on the possibility of these claims? And this is very tricky, since we're trying to evaluate claims not about models that currently exist, but about future models.
And I guess what people have realised recently is that, even if we can't directly run an experiment to determine the validity of the core claim of concern, we can run experiments on auxiliary claims in order to better inform discussions. For example, the best way to show that a future model could have a capability is to demonstrate that a current model possesses that capability.
I'm guessing you'd like to see more scientific evidence before you want to take possibilities like deceptive alignment seriously. I think that's reasonable. However, work like this is how we gather that evidence.
Obviously, each individual result doesn't provide much evidence on its own, but the accumulation of results has helped to provide more strategic clarity over time.
I would agree that the choice of language, 'hidden reasoning', is a poor one.
This paper demonstrates a novel training approach which could yield narrow capability growth on a certain class of tasks.
The narrow test tube environment in which we see better performance hints at the unknown which when better understood could promise further yields down the road.
To my mind, the idea that filler tokens might promote immergent capability leading to broader task complexity capability is more promising than the backdoor risk you lay out. The possible scale in each direction just doesn't seem comparable to me (assuming each scenario plays out in a meaningful way).
Re the article...
A single fundamental breakthrough could make his entire article obsolete within a month. We've found a lot of limits to LLMs, sure... This is always how it goes over the history of AI, right? The pace of fundamental breakthroughs seems like the more relevant conversation with respect to the prospects for AGI as framed by his article.
The paper also proves that this capability, one unlikely to occur naturally, does not help for tasks where one must create sequentially dependent chains of reasoning, a limiting constraint. At least not without overturning what we believe about TCS.
> A single fundamental breakthrough
Then we'd no longer be talking about transformers. That something unpredicted could happen is trivially true.
> immergent capability
It's specifically trained in, requires heavy supervision and is hard to learn. It's surprising that Transformers can achieve this at all but it's not emergent.
You are taking literally 2-4 token phrases from my comment and attacking them without context. I'll spend time on the latter quote. You quote 'emergent capability'.
A) appreciate you correcting my spelling
B) 'The narrow test tube environment in which we see better performance hints at the unknown which when better understood could promise further yields down the road.
To my mind, the idea that filler tokens might promote immergent capability leading to broader task complexity'
C) Now that we have actual context... I'll leave the rest to the thoughtful reader. I said the following key words: 'hints', 'could', 'might'
D) Who asserted this behavior was emergent?
Recommend slowing down next time. You might get a more clear picture before you attack a straw man. Expect no further exchange. Best of luck.