I read this all the time, and yet no one seems able to come up with even a few questions from several months ago that ChatGPT has become “worse” at. You would think that if this were happening it would be very easy to produce such evidence, since the chat history of every conversation is stored by default.
It's probably just subjective bias. Once the novelty wears off, you learn not to rely on it as much, because sometimes it's very difficult to get exactly what you want. In my personal experience I ended up using it less and less to avoid butting heads with it, to the point that I cancelled my subscription altogether. YMMV, of course.
Care to share these examples, in a scientific (n > 30) manner that can’t just be attributed to model nondeterminism? I don’t follow these threads religiously, but in the ones I’ve seen no one has been able to provide any convincing evidence. I’m not some sort of OpenAI apologist, so if there is actual, provable evidence here I will happily change my mind about it.
I don't see how anyone could provide what you are asking for. I can go through my chat history and find a prompt that got a better answer 3 months ago than I get now, but you can always just say it's nondeterminism.
Without access to the old model, I can't collect samples with n > 1.
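The closest thing I can think of is the API rather than the ChatGPT UI: dated snapshots like gpt-4-0314 and gpt-4-0613 were selectable there, at least for a while, so if you have API access and those snapshots are still available to your account, you could get something like an n > 30 comparison. A rough sketch, assuming the openai Python package (v1.x), an OPENAI_API_KEY in the environment, and placeholder model names and prompts:

    # Collect n samples of the same prompts from two dated snapshots, so a
    # difference can't be written off as single-sample nondeterminism.
    # Model names and prompts are placeholders; adjust to what your account offers.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    MODELS = ["gpt-4-0314", "gpt-4-0613"]  # older vs. newer snapshot (assumed available)
    PROMPTS = ["Summarise chapter 1 of <some book> in 200 words."]  # your saved prompts
    N = 30  # samples per (model, prompt) pair

    results = []
    for model in MODELS:
        for prompt in PROMPTS:
            for i in range(N):
                resp = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                    temperature=1.0,  # keep the sampling behaviour the product actually uses
                )
                results.append({
                    "model": model,
                    "prompt": prompt,
                    "sample": i,
                    "answer": resp.choices[0].message.content,
                })

    # Dump everything for blind, side-by-side human rating afterwards.
    with open("samples.json", "w") as f:
        json.dump(results, f, indent=2)

Even then, this only tells you about the API snapshots, not whatever is actually deployed behind the ChatGPT UI, which is the thing people are complaining about.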
Here is one.
I ask it to write some code, 4-5 pages long. With some back and forth, it does. Then I ask it to "change lines 50-65 from blue to red", and it does (change #1). I ask it to show me the full code. Then I ask it to "change lines 100-120 from yellow to green". Aaaand it makes change #2 and reverts change #1. Oh, the number of times this has happened... So now, when I ask it to make a change, I do it by 'paragraph' and copy and paste the new paragraph back in. It's annoying, but it still makes things faster.
OpenAI regularly changes the model, and they admit the new models are more restricted, in the sense that they prevent tricky prompts from producing naughty words, etc.
It should be their responsibility to prove that it's just as capable.
Whoever makes the claim bears the burden of proof. Did OpenAI claim that their models didn’t regress while putting these new safeguards into place? If not, it feels like the burden of proof lies with whoever claims that they did.
To be specific, the claim we are talking about here is “ChatGPT gives generally worse answers to the exact same questions than it gave X months ago”. Perhaps for the subset of the knowledge space you’re referring to, the one those updates were pushed to, that is fairly easy to prove, but I’m more interested in the general case.
In other words, you can pretty easily claim that ChatGPT got worse at telling me how to make a weapon than it was 3 months ago. I could easily believe that, and also accept that it was probably intentional. While we can debate whether that was a good idea or not, I’m more interested in whether ChatGPT got worse at summarizing some famous novel, or at helping write a presentation, than it was 3 months ago.
Well, sure, but shouldn’t some pedant have the time to dig up their ChatGPT history from 4 months ago to disprove the claim? It seems like it would be pretty easy to do, and there are plenty of pedants on the internet, but I don’t see the blogosphere awash in side-by-side comparisons showing how much worse it got.
One example: it now refuses to summarise books that it was trained on. Soon after trying GPT-4 I could get it to summarise Evans’ DDD chapter by chapter. Not anymore.
Pointing out a specific bug in functionality is not the same as saying “in general, the quality of GPT answers has decreased over X months”, especially when that bug is in a realm that LLMs have already been provably bad at.