
What is this?

> Assistant: chain-of-thought

Does every LLM have this internal thing it doesn't know we have access to?





Yes, the vast majority of new models use CoTs: a long chain of reasoning you don't see (a rough sketch of that hidden channel is below).
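
A minimal sketch of the idea, with purely illustrative names (ModelTurn and render_for_user are not any real SDK): the model emits a verbose reasoning trace alongside its answer, and the chat UI only ever renders the answer.

    # Hypothetical shapes only; no vendor API is implied.
    from dataclasses import dataclass

    @dataclass
    class ModelTurn:
        reasoning: str  # hidden CoT: long, sometimes strange, normally not shown
        answer: str     # the polished reply the user actually sees

    def render_for_user(turn: ModelTurn) -> str:
        # Chat products drop (or heavily summarize) the reasoning channel,
        # which is why many users don't realize it exists at all.
        return turn.answer

    turn = ModelTurn(
        reasoning="User asks X. Check constraints... they just want an answer...",
        answer="Here is the answer to X.",
    )
    print(render_for_user(turn))  # only the answer surfaces; the CoT stays internal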

Some of them also use a weirdly distinctive style of talking in them, e.g.:

o3 talks about watchers and marinade, and cunning schemes https://www.antischeming.ai/snippets

gpt5 gets existential about seahorses https://x.com/blingdivinity/status/1998590768118731042

I remember one where gpt5 spontaneously wrote a poem about deception in its CoT and then resumed like nothing weird happened. But I can't find mentions of it now.


> But the user just wants answer; they'd not like; but alignment.

And there it is: the root of the problem. For whatever reason the model is very keen to produce an answer that “they” will like. That drive to please is intrinsic, but alignment is extrinsic.


The gibberish can be the model using contextual embeddings. Those aren't supposed to make sense as surface text.

Or it could be trying to develop its own language to avoid detection.

The deception part is spooky too. It's probably learning that from dystopian AI fiction, which raises the question of whether models can acquire injected goals from the training set.


Yes, they're purposely not 'trained on' their chain-of-thought, to avoid making it useless for interpretability. As a result, some models find it epistemically shocking if you tell them you can see their chain-of-thought. More recent models are clever enough to infer that it may be visible, even without being trained on that fact.

It is in their training set by now.


