Hacker News

It produces more bugs but the count goes down?!




Did the models from 2 years ago produce more bugs, fewer bugs, or the same bugs as today's models? Do you think next year's AI models will produce the same number of bugs, more bugs, or fewer bugs?

> Did the models from 2 years ago produce more bugs, fewer bugs or the same bugs as today's models?

Is anyone actually tracking that with a methodology not prone to fine-tuning? Specifically, I know a lot of the benchmarks have the problem that you can train the AI to pass the test, so a higher score isn't indicative of higher overall performance. I'm not actually being rhetorical here to make a point; I'm genuinely interested in whether anyone has derived a methodology that gives confidence behind these claims.

(Aside: It's not a huge stretch to claim that they're getting better, but at this point that mostly seems anecdotal, or based on methods with the problem I stated above.)


I'm going from my own experience here. I occasionally check new models on some kinds of problems I'm familiar with but that are not common programming challenges, like arrow-based FRP abstractions written in C# rather than Haskell. I've noticed considerable improvements in their ability to translate such abstractions idiomatically.
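For readers unfamiliar with the term, a minimal sketch of the kind of arrow-based FRP abstraction being referred to, in Haskell, the language these abstractions come from. The names (`SF`, `arrSF`, `composeSF`) are illustrative, not from any particular library, though the shape follows arrowized FRP libraries like Yampa: a signal function transforms each input sample into an output sample and returns its own continuation, which is what a model would have to translate idiomatically into C#.

```haskell
-- A minimal signal-function type in the style of arrowized FRP:
-- consuming one input sample yields an output sample plus the
-- continuation to use for the next sample.
newtype SF a b = SF { step :: a -> (b, SF a b) }

-- Lift a pure function into a (stateless) signal function.
arrSF :: (a -> b) -> SF a b
arrSF f = SF $ \a -> (f a, arrSF f)

-- Sequential composition, the arrow analogue of (>>>).
composeSF :: SF a b -> SF b c -> SF a c
composeSF (SF f) (SF g) = SF $ \a ->
  let (b, sf') = f a
      (c, sg') = g b
  in (c, composeSF sf' sg')

-- A stateful signal function: running sum of all inputs seen so far.
sumSF :: Num a => SF a a
sumSF = go 0
  where go acc = SF $ \a -> let acc' = acc + a in (acc', go acc')

-- Drive a signal function over a finite list of samples.
runSF :: SF a b -> [a] -> [b]
runSF _ []      = []
runSF sf (a:as) = let (b, sf') = step sf a in b : runSF sf' as
```

Translating this idiomatically to C# means recreating the recursive continuation-carrying type and its combinators without Haskell's type-class machinery, which is exactly the sort of uncommon task that exposes whether a model understands the abstraction or is pattern-matching on familiar code.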


