Care to share these examples, in a scientific (n > 30) manner that can’t just be attributed to model nondeterminism? I don’t follow these threads religiously but in the ones I’ve seen no one has been able to provide any sort of convincing evidence. I’m not some sort of OpenAI apologist, so if there is actual good provable evidence here I will easily change my mind about it
I don't see how anyone could provide what you are asking for. I can go through my chat history and find a prompt that got a better answer 3 months ago than I get now, but you can always just say it's nondeterminism.
Without access to the old model, I can't collect samples with n > 1