
Can you explain what you mean by this?


You can see an example of the chain of thought in the post; it's quite extensive. Presumably they don't want to release it so that it stays raw and unfiltered and they can better monitor for manipulation or deviation from training. What GP is also referring to is stated explicitly in the post: they also aren't releasing the CoT for competitive reasons, presumably so that competitors like Anthropic can't use it to train their own frontier models.


> Presumably they don't want to release it so that it stays raw and unfiltered and they can better monitor for manipulation or deviation from training.

My take was:

1. A genuine, un-RLHF'd "chain of thought" might contain things that shouldn't be told to the user. E.g., it might at some point think to itself, "One way to make an explosive would be to mix $X and $Y" or "It seems like they might be able to poison the person".

2. They want the chain of thought to reflect, as much as possible, the actual reasoning the model is using, in part so that they can understand what the model is really thinking. They fear that if they RLHF the chain of thought, the model will self-censor in a way that undermines their ability to see what it's actually thinking.

3. So they RLHF only the final output, not the CoT, letting the CoT be as frank within itself as any human would be, and post-filter the CoT for the user (rough sketch below).
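
A minimal sketch of that split, in Python. All names here are hypothetical illustrations, not OpenAI's actual pipeline: the reward model only ever scores the final answer, and a separate filter decides what, if anything, of the CoT reaches the user.

    # Hypothetical illustration of "RLHF the answer, not the CoT, then post-filter".
    # None of these names are OpenAI's; this only sketches the split described above.
    from dataclasses import dataclass


    @dataclass
    class Completion:
        chain_of_thought: str  # raw, unaligned reasoning tokens (kept internal)
        final_answer: str      # the part users actually see


    def rlhf_reward(completion: Completion, reward_model) -> float:
        # Only the final answer is scored, so there is no training pressure
        # on the CoT and therefore no incentive for it to self-censor.
        return reward_model.score(completion.final_answer)


    def render_for_user(completion: Completion, cot_filter) -> str:
        # The raw CoT stays internal (e.g. for safety monitoring); the user
        # gets the answer plus, at most, a filtered/summarized view of it.
        summary = cot_filter.summarize(completion.chain_of_thought)
        return f"{summary}\n\n{completion.final_answer}"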


RLHF is one thing, but now that the training is done it has no bearing on whether or not you can show the chain of thought to the user.


This is a direct quote from the article:

> Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users


At least they're open about not being open. Very meta OpenAI.


I think they mean that you won't be able to see the "thinking"/"reasoning" part of the model's output, even though you pay for it. If you could see it, you might be able to better infer how these models reason and replicate that as a competitor.
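
For what it's worth, the hidden reasoning is still billed: the API reports a separate reasoning-token count in the usage object even though those tokens never come back as content. A rough sketch, assuming the current openai Python SDK and an o1-style model (exact field names may differ by SDK version):

    # Sketch only: assumes the openai Python SDK and access to an o1-style model;
    # the point is that reasoning tokens show up in usage but not in the content.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": "How many primes are below 100?"}],
    )

    print(resp.choices[0].message.content)  # final answer only, no raw CoT
    details = resp.usage.completion_tokens_details
    print("reasoning tokens billed:", details.reasoning_tokens)  # paid for, never shown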


Including the chain of thought would provide competitors with training data.



