
This is something people have toyed with to improve the quality of LLM responses. Instructing the LLM to "think about" a problem before giving the answer often improves the response considerably. For example, if you ask it how many letters are in the correctly spelled version of a misspelled word, it will first give the correct spelling and then the number (which is often correct). But if you instruct it to give only the number, accuracy drops sharply.
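If anyone wants to try the comparison, here's a minimal sketch of the two prompting styles using the openai Python client. The model name, the example word, and the exact prompt wording are placeholders I picked for illustration, not anything from the comment above.

```python
# Minimal sketch: "answer only" vs. "think first" prompting.
# Uses the openai Python client (pip install openai); model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works the same way
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

word = "recieve"  # deliberately misspelled

# Answer-only prompt: the model must produce the count directly.
direct = ask(
    f"How many letters are in the correctly spelled version of '{word}'? "
    "Reply with only the number."
)

# "Think first" prompt: spell the word out, then count its letters.
thought_first = ask(
    f"First write the correct spelling of '{word}', then count its letters "
    "and state the number."
)

print("answer only:     ", direct)
print("spell then count:", thought_first)
```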

I like the idea too that they turbocharged it by taking the limits off during the "thinking" state -- so if an LLM wants to think about horrible racist things, or how to build bombs, or other things that RLHF filters out, that's fine so long as it isn't reflected in the final answer.



> I like the idea too that they turbocharged it by taking the limits off during the "thinking" state

They also specifically trained the model to do that thinking out loud.



