
They really lie.

Not on purpose, but because they are trained on rewards that favor lying as a strategy.

Othello-GPT is a good example for understanding this. Without being explicitly trained to do so, just from the task of predicting moves on an Othello board, Othello-GPT spontaneously developed the strategy of simulating the entire board internally. Lying is a similarly emergent, very effective strategy for earning reward.
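For context, that "simulates the board internally" claim comes from probing experiments: train a small classifier on the model's hidden activations and check whether the board state can be read off them. A minimal sketch with made-up data (the names, shapes, and sklearn here are my stand-ins, not the original setup):

  # Sketch of a linear probe: can the board state be read off the activations?
  # The real work probed the transformer's residual stream; this uses random arrays.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  n_positions, d_model = 10_000, 512
  hidden = np.random.randn(n_positions, d_model)       # stand-in for activations
  square_state = np.random.randint(0, 3, n_positions)  # 0=empty, 1=black, 2=white

  probe = LogisticRegression(max_iter=1000)
  probe.fit(hidden[:8000], square_state[:8000])
  print("probe accuracy:", probe.score(hidden[8000:], square_state[8000:]))
  # ~chance here on random data; high accuracy on real activations is the
  # evidence that the model tracks the board internally.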





> They really lie. Not on purpose

You can't lie by accident. You can tell a falsehood, however.

But where LLMs are concerned, they don't tell truths or falsehoods either, as "telling" also requires intent. Moreover, LLMs don't actually contain propositional content.


I think you’re saying this with unwarranted confidence.

Reference: https://www.science.org/content/article/ai-hallucinates-beca...

If you don't know the answer, and are only rewarded for correct answers, guessing, rather than saying "I don't know", is the optimal approach.
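The expected-value arithmetic behind that, as a toy sketch (the reward numbers are hypothetical, not from the article):

  # Hypothetical grading scheme: 1 point for a correct answer, 0 otherwise,
  # including 0 for "I don't know".
  def expected_reward(p_correct, r_correct=1.0, r_wrong=0.0, r_idk=0.0):
      guess = p_correct * r_correct + (1 - p_correct) * r_wrong
      return guess, r_idk

  for p in (0.05, 0.25, 0.5):
      guess, abstain = expected_reward(p)
      print(f"p(correct)={p}: guess={guess:.2f} vs 'I don't know'={abstain:.2f}")
  # Even a 5% shot at being right beats abstaining, unless wrong answers are
  # explicitly penalized (r_wrong < 0).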


It's more than just that, but thanks for that link; I've been meaning to dig it up and revisit it. Beyond hallucinations, there are also deceptive behaviors like hiding uncertainty, omitting caveats, or doubling down on previous statements even when weaknesses are pointed out. Plus there will necessarily be lies in the training data as well, sometimes enough of them to skew the pretrained/unaligned model itself.

Not sure if that counts as lying, but I've heard that an ML model (way before all this GPT LLM stuff) learned to classify images based on the text written in them. For an obfuscated example, it learned to read "stop", "arrêt", "alto", etc. on a stop sign instead of recognizing the red octagon with white letters, which naturally does not work when the actual dataset has different text.
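That failure mode is usually called shortcut learning. A toy sketch with entirely made-up features (nothing here is from the original study):

  # Toy shortcut: in training, English "STOP" text always co-occurs with stop
  # signs, so the classifier can lean on it and ignore the (noisier) shape.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  n = 2000
  is_stop_sign = rng.integers(0, 2, n)                 # ground truth
  has_english_text = is_stop_sign.astype(float)        # spurious, perfectly correlated
  shape_score = is_stop_sign + rng.normal(0, 2.0, n)   # real signal, but noisy

  X_train = np.column_stack([has_english_text, shape_score])
  clf = LogisticRegression().fit(X_train, is_stop_sign)

  # Same signs, but labelled "arrêt"/"alto": the English-text feature vanishes.
  X_test = np.column_stack([np.zeros(n), shape_score])
  print("train accuracy:", clf.score(X_train, is_stop_sign))
  print("foreign-text accuracy:", clf.score(X_test, is_stop_sign))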

Typographic attacks against vision-language models are still a thing with more recent models like GPT-4V: https://arxiv.org/abs/2402.00626

That does feel a little more like over-fitting, but you might be able to argue that there's some philosophical proximity to lying.

I think, largely, the

  Pre-training -> Post-training -> Safety/Alignment training
pipeline would obviously produce 'lying'. The training stages are in a sort of mutual dissonance.


