
Honest question:

> Anthropic showed that LLMs don't understand their own thought processes

Where can I find this? I am really interested in that. Thanks.





https://www.anthropic.com/research/tracing-thoughts-language...

> Claude, on occasion, will give a plausible-sounding argument designed to agree with the user rather than to follow logical steps. We show this by asking it for help on a hard math problem while giving it an incorrect hint. We are able to “catch it in the act” as it makes up its fake reasoning, providing a proof of concept that our tools can be useful for flagging concerning mechanisms in models...

> Claude seems to be unaware of the sophisticated "mental math" strategies that it learned during training. If you ask how it figured out that 36+59 is 95, it describes the standard algorithm involving carrying the 1. This may reflect the fact that the model learns to explain math by simulating explanations written by people, but that it has to learn to do math "in its head" directly, without any such hints, and develops its own internal strategies to do so.
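For context, the "standard algorithm involving carrying the 1" mentioned there is just ordinary digit-by-digit addition with carries. A minimal Python illustration (mine, not from the Anthropic post):

    def add_with_carries(a: int, b: int) -> int:
        """Add two non-negative ints the grade-school way: digit by digit, carrying."""
        total, carry, place = 0, 0, 1
        while a or b or carry:
            digit_sum = (a % 10) + (b % 10) + carry
            total += (digit_sum % 10) * place   # write the ones digit of this column
            carry = digit_sum // 10             # carry the tens digit to the next column
            a, b, place = a // 10, b // 10, place * 10
        return total

    print(add_with_carries(36, 59))  # 95  (6+9=15 -> write 5, carry 1; 3+5+1=9)

The point of the quote is that Claude describes this procedure when asked, even though its internal strategy for doing the sum is apparently something else.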


Thank you.

Well, algorithms don't think, and that's what LLMs are.

Your digital thermometer doesn't think either.


The question is more whether LLMs can accurately report their internal operations, not whether any of that counts as "thinking."

Simple algorithms can, e.g., be designed to report whether they hit an exceptional case and activated a different set of operations than usual (see the sketch below).
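For instance, a trivially instrumented function (a hypothetical Python sketch, not anything from the Anthropic work) can give a self-report that is guaranteed to match the branch it actually executed:

    def safe_divide(a: float, b: float) -> tuple[float, str]:
        """Divide a by b, returning the result plus a report of the path taken."""
        if b == 0:
            # Exceptional case: return a sentinel and report that we did so.
            return float("inf"), "exceptional path: divisor was zero, returned sentinel"
        # Normal case: the report reflects the operation actually performed.
        return a / b, "normal path: computed a / b directly"

    result, report = safe_divide(1.0, 0.0)
    print(result, "-", report)  # inf - exceptional path: divisor was zero, returned sentinel

The report here is grounded in the program's own control flow; the question upthread is whether an LLM's description of its "reasoning" has any comparable grounding.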


Asking the algorithm itself is basically a variant of the halting problem; what you actually hope to get is a separate supervisor doing the responding. If people expected that, I don't think they would be as confused about the difference between a statistical process generating responses that merely need to sound emotional to be convincing and an LLM genuinely showing atonement.

I was asking for a technical argument against that spurious use of the term.


