There was a recent paper showing that you can spread a model's behavior by training on its outputs, even if those outputs don't directly include obvious markers of the behavior. It's totally plausible that training on Claude's outputs subtly nudged GLM into mentioning "Claude", even if that data doesn't contain the literal tokens very often.
Given that other work shows models often converge on similar internal representations, I'd not be surprised if there were close analogues of 'subliminal learning' that don't require a shared ancestor base model, just enough overlap in training material.
Further, "enough" training on another model's outputs – de facto 'distillation' – is likely to have similar effects as starting from a common base model, just "from the other direction".
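For concreteness, here's a minimal sketch of what "training off another model's outputs" means mechanically: generate text with a teacher model, then fine-tune a different student model on that text with ordinary next-token prediction. The model names, prompt pool, and hyperparameters are placeholders I made up for illustration, not anything any particular lab actually does.

```python
# Minimal sequence-level distillation sketch (placeholder model names and prompts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "teacher/model"   # hypothetical: some instruction-tuned model
student_name = "student/model"   # hypothetical: a different base model

# 1) Collect teacher outputs for a pool of prompts (the "distillation" corpus).
tok_t = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

prompts = ["Hello", "Explain photosynthesis briefly."]  # toy prompt pool
corpus = []
for p in prompts:
    ids = tok_t(p, return_tensors="pt").input_ids
    out = teacher.generate(ids, max_new_tokens=64, do_sample=True)
    corpus.append(tok_t.decode(out[0], skip_special_tokens=True))

# 2) Fine-tune the student on that corpus with a standard language-modeling loss.
tok_s = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

for text in corpus:
    batch = tok_s(text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

# Even if explicit identity strings were filtered out of `corpus`, the student can
# still drift toward the teacher's stylistic and behavioral quirks.
```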
(Finally: some of the more nationalistic-paranoid observers seem to think Chinese labs have relied on exfiltrated weights from US entities. I don't personally think that'd be a likely or necessary contributor to Z.ai & others' successes, but the mere appearance of this occasional "I am Claude" answer is sure to fuel further armchair belief in those theories.)
It is also possible that it learned from the internet that when someone says "Hello" to something identifying itself as an AI assistant, the most appropriate response is "Hello! I'm Claude, an AI assistant created by Anthropic. How can I help you today?".
Didn't think of that; that would be an extremely interesting finding. However, in that paper the transfer only happens between fine-tunes of the same base model, so it would be a whole new thing for it to happen in this case.
https://alignment.anthropic.com/2025/subliminal-learning/