I think you may need to read my comment again. It's even mentioned in the summary:
> In summary, our results show that additional tokens can provide computational benefits independent of token choice. The fact that intermediate tokens can act as filler tokens raises concerns about large language models engaging in unauditable, hidden computations that are increasingly detached from the observed chain-of-thought tokens.
The paper discusses an unexplained benefit of additional computation regardless of which token is selected, be it symbols or Lorem Ipsum.
I didn't mention anything about training; I'm talking about how the Transformer architecture itself is designed.
The "unauditable" computation is simply a consequence of how machine learning models work, and the extra computation those tokens make available is what I described in my explanation.
Keen to hear your thoughts on it though.