I hope you are aware that LLMs do not have direct access to the stream of words/characters; they only ever see opaque token IDs. It is one of the most basic things to know about their implementation.
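To make that concrete, here is a toy sketch (a made-up greedy longest-match over an invented vocabulary, not any real tokenizer) showing that the units a model receives need not line up with characters or words:

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer over a toy vocabulary.

    Falls back to single characters when no vocab entry matches,
    purely for illustration -- real BPE tokenizers work differently.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character, emit as-is
            i += 1
    return tokens

# With a hypothetical vocabulary, "strawberry" arrives as two opaque
# tokens, so the model never observes its ten individual characters.
toy_vocab = {"straw", "berry", "count", "ing"}
print(tokenize("strawberry", toy_vocab))  # → ['straw', 'berry']
```

Counting letters inside a token would then require the model to have memorized each token's spelling, which is exactly the kind of indirection being discussed.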
Yes, but it could learn to associate tokens with word counts, just as it learns to associate them with meanings.
Even so, if you ask it for a token count it would still fail. My point is that it can't count; the circuitry required to do so seems to be absent in these models.