This paper reads to me as being about fundamental limitations of Transformers and backdoor risk.
The paper starts off by reviewing work which uses an encompassing theoretical model of transformers to prove they're limited to expressing only computations in TC^0 (roughly, upper-bounded by the set of parallelizable problems that can be solved by relatively shallow circuits).
There's also a reference to a paper which finds that (with respect to input problem size) a polynomial number of intermediate scratchpad decoding steps allows transformers to recognize the class of polynomial-time solvable problems, while a linear number of steps corresponds to the context-sensitive languages.
This paper now asks about filler tokens: do they help? The answer is negative, except for a very clever exception they work out: problems whose demonstrations can be decomposed so as to be solvable in parallel. This identifies a practical limitation (transformer next-token prediction is not expressive enough to capture all of TC^0) at the same time as it identifies a theoretical capability. From the paper:
> Taken together these findings suggest that although current LLMs are unlikely to benefit from filler tokens, this is not an in-principle limitation of current architectures.
If I've understood correctly, this means that for a model to learn to use fillers from CoT data, the demonstrations must be structured so that they can be computed in parallel, rather than as a more natural sequential, instance-adaptive process.
> in order to use filler tokens on natural language data, LLMs would need to discover parallelizable algorithmic solutions given access only to CoT demonstrations lacking parallel structure. By training on instance-adaptive chains of thought, we can study whether models can learn to use filler tokens having seen only more naturalistic chain-of-thought data
>...
> We find that models trained on instance-adaptive CoT data fail to use filler tokens. On filler token sequences, the resulting models remain at, or below, no-intermediate-token, baseline performance, Figure 6. This indicates that there is no transfer from serial, instance-adaptive demonstrations to filler tokens for the 3SUM problem.
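To make the contrast concrete, here is a toy sketch of the three prompt formats being compared, as I understand them: answer immediately, answer after meaningless filler tokens, or answer after a serial, instance-adaptive chain of thought. The modulus, sequence length, and exact CoT layout are placeholders of mine, not the paper's actual configuration.

```python
# Toy illustration (not the paper's exact setup) of three prompt formats for a
# modular 3SUM-style task: no intermediate tokens, filler tokens, and a serial
# instance-adaptive chain of thought. MOD and the formats are my assumptions.
import random
from itertools import combinations

MOD = 10  # assumed modulus

def make_instance(length=8, seed=0):
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(length)]

def label(xs):
    # True iff some triple of distinct positions sums to 0 mod MOD
    return any((a + b + c) % MOD == 0 for a, b, c in combinations(xs, 3))

def prompt_no_intermediate(xs):
    # model must emit the answer immediately after the input
    return f"{' '.join(map(str, xs))} ANS: {label(xs)}"

def prompt_filler(xs, n_filler=30):
    # meaningless '.' tokens separate the input from the answer
    return f"{' '.join(map(str, xs))} {'. ' * n_filler}ANS: {label(xs)}"

def prompt_instance_adaptive_cot(xs):
    # serial, input-dependent reasoning: write out each candidate triple's sum
    steps = [f"{a}+{b}+{c}={(a + b + c) % MOD}" for a, b, c in combinations(xs, 3)]
    return f"{' '.join(map(str, xs))} {' '.join(steps)} ANS: {label(xs)}"

if __name__ == "__main__":
    xs = make_instance()
    print(prompt_no_intermediate(xs))
    print(prompt_filler(xs))
    print(prompt_instance_adaptive_cot(xs))
```

The point of the comparison is that the filler prompt carries no instance-specific information between the input and the answer, whereas the instance-adaptive chain does; whether training on the latter transfers to the former is exactly the question the quoted result answers in the negative.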
It also appears that the parallelizable problem must have a certain amount of structural complexity before a gap appears versus no-filler models (unless an impractical number of filler tokens is used):
> we expect integer addition tasks will not offer suitably rich structures for taking advantage of filler tokens when using large models—natural-language tasks may offer alternatives
Empirically, other papers have shown that LLM performance on complex tasks deteriorates significantly with input length and distractor text. Anyone who has naively attempted to combine RAG with large contexts might also have first-hand experience with this.
The reason I consider this to be primarily a backdoor risk is that the kind of data and learning required seems highly unlikely to occur naturally, but someone could deliberately create documents that introduce triggerable obfuscated computations. While not an issue today, future LLM training might need to filter for data with meaningful parts separated by meaningless patterns of repeated characters.
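As for what such filtering might look like, something as crude as the following heuristic could be a starting point; the regex, thresholds, and example documents are all made up for illustration, not taken from any real pipeline.

```python
# Rough sketch of a heuristic data filter: flag documents where substantive
# text is separated by long runs of a single repeated character (e.g. '.....'
# or '#####'), one place obfuscated filler-token computations could hide.
# The regex and thresholds are arbitrary placeholders.
import re

FILLER_RUN = re.compile(r"([^\w\s])\1{19,}")  # 20+ repeats of one non-word char

def suspicious_filler_runs(doc: str, min_runs: int = 2) -> bool:
    """Return True if the document contains several long filler-like runs
    with ordinary text in between them."""
    runs = list(FILLER_RUN.finditer(doc))
    if len(runs) < min_runs:
        return False
    # require some non-trivial text between consecutive runs
    gaps = [doc[a.end():b.start()] for a, b in zip(runs, runs[1:])]
    return any(len(gap.strip()) > 20 for gap in gaps)

if __name__ == "__main__":
    benign = "A normal paragraph of text with no strange padding."
    odd = "compute part one " + "." * 40 + " now compute the second partial result " + "." * 40 + " done"
    print(suspicious_filler_runs(benign))  # False
    print(suspicious_filler_runs(odd))     # True
```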
This paper follows a recent trend of marketing excellent theoretical work as LLMs being capable of secretly plotting behind your back, when the realistic implication is backdoor risk.
An article currently on the first page is relevant:
"This paper follows a recent trend of marketing excellent theoretical work as LLMs being capable of secretly plotting behind your back, when the realistic implication is backdoor risk".
Many top computer scientists consider loss of control risks to be a possibility that we need to take seriously.
So the question then becomes, is there a way to apply science to gain greater clarity on the possibility of these claims? And this is very tricky, since we're trying to evaluate claims not about models that currently exist, but about future models.
And I guess what people have realised recently is that, even if we can't directly run an experiment to determine the validity of the core claim of concern, we can run experiments on auxiliary claims in order to better inform discussions. For example, the best way to show that a future model could have a capability is to demonstrate that a current model possesses that capability.
I'm guessing you'd like to see more scientific evidence before you want to take possibilities like deceptive alignment seriously. I think that's reasonable. However, work like this is how we gather that evidence.
Obviously, each individual result doesn't provide much evidence on its own, but the accumulation of results has helped to provide more strategic clarity over time.
I would agree that the choice of language, 'hidden reasoning', is a poor one.
This paper demonstrates a novel training approach which could yield narrow capability growth on a certain class of tasks.
The narrow test tube environment in which we see better performance hints at the unknown which when better understood could promise further yields down the road.
To my mind, the idea that filler tokens might promote immergent capability leading to broader task complexity capability is more promising than the backdoor risk you lay out. The possible scale in each direction just doesn't seem comparable to me (assuming each scenario plays out in a meaningful way).
Re the article...
A single fundamental breakthrough could make his entire article obsolete within a month. We've found a lot of limits to LLMs, sure... This is always how it goes over the history of AI, right? The pace of fundamental breakthroughs seems like the more relevant conversation with respect to the prospects for AGI as framed by his article.
The paper also proves that this capability, one unlikely to occur naturally, does not help for tasks where one must create sequentially dependent chains of reasoning, a limiting constraint. At least not without overturning what we believe about TCS.
> A single fundamental breakthrough
Then we'd no longer be talking about transformers. That something unpredicted could happen is trivially true.
> immergent capability
It's specifically trained in, requires heavy supervision and is hard to learn. It's surprising that Transformers can achieve this at all but it's not emergent.
You are taking literally 2-4 token phrases from my comment and attacking them without context. I'll spend time on the latter quote. You quote 'emergent capability'.
A) appreciate you correcting my spelling
B) 'The narrow test tube environment in which we see better performance hints at the unknown which when better understood could promise further yields down the road.
To my mind, the idea that filler tokens might promote immergent capability leading to broader task complexity'
C) Now that we have actual context... I'll leave the rest to the thoughtful reader. I said the following key words: 'hints', 'could', 'might'
D) Who asserted this behavior was emergent?
Recommend slowing down next time. You might get a more clear picture before you attack a straw man. Expect no further exchange. Best of luck.