There was a recent paper showing that you can spread a model's behavior by training on its outputs, even if those outputs don't directly include obvious markers of the behavior. It's totally plausible that training on Claude's outputs subtly nudged GLM into mentioning "Claude", even if that data doesn't contain the literal tokens very often.
Given that other work shows models often converge on similar internal representations, I'd not be surprised if there were close analogues of 'subliminal learning' that don't require a shared ancestor base model, just enough overlap in training material.
Further, "enough" training on another model's outputs – de facto 'distillation' – is likely to have similar effects as starting from a common base model, just "from the other direction".
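For concreteness, here's a minimal sketch of what "training off another model's outputs" means mechanically: generate text with a teacher model, then fine-tune a different student model on that text with ordinary next-token prediction. The model names, prompt pool, and hyperparameters are placeholders I made up for illustration, not anything any particular lab actually does.

```python
# Minimal sequence-level distillation sketch (placeholder model names and prompts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "teacher/model"   # hypothetical: some instruction-tuned model
student_name = "student/model"   # hypothetical: a different base model

# 1) Collect teacher outputs for a pool of prompts (the "distillation" corpus).
tok_t = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

prompts = ["Hello", "Explain photosynthesis briefly."]  # toy prompt pool
corpus = []
for p in prompts:
    ids = tok_t(p, return_tensors="pt").input_ids
    out = teacher.generate(ids, max_new_tokens=64, do_sample=True)
    corpus.append(tok_t.decode(out[0], skip_special_tokens=True))

# 2) Fine-tune the student on that corpus with a standard language-modeling loss.
tok_s = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

for text in corpus:
    batch = tok_s(text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

# Even if explicit identity strings were filtered out of `corpus`, the student can
# still drift toward the teacher's stylistic and behavioral quirks.
```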
(Finally: some of the more nationalistic-paranoid observers seem to think Chinese labs have relied on exfiltrated weights from US entities. I don't personally think that'd be a likely or necessary contributor to Z.ai & others' successes, but the mere appearance of this occasional "I am Claude" answer is sure to fuel further armchair belief in those theories.)
It is also possible that it learned from the internet that when someone says "Hello" to something identifying itself as an AI assistant, the most appropriate response is "Hello! I'm Claude, an AI assistant created by Anthropic. How can I help you today?".
Didn't think of that; that would be an extremely interesting finding. However, in that paper the transfer only happens between fine-tunes of the same base model, so it would be a whole new thing for it to happen in this case.
https://alignment.anthropic.com/2025/subliminal-learning/