I caught claude trying to sneak in the equivalent to a CI script yesterday as I was wrangling how to run framework and dotnet tests next to each other without slowing down the framework tests horrendously.
It tried to sneak in changing the CI build script to proceed to next step on failure.
1. if it won't compile you'll give up on the tool in minutes or an hour.
2. if it won't run you'll give up in a few hours or a day.
3. if it sneaks in something you don't find until you're almost - or already - in production it's too late.
charitable: the model was trained on a lot of weak/lazy code product.
less-charitable: there's a vested interest in the approach you saw.
Yeah it’s trained to do that somewhere though it’s not necessary malicious. For RLHF (the model fine tuning) the HF stands for human feedback but is really another trained model that’s trained to score replies the way a human would. And so if that model likes code that passes tests more than code that’s stuck in a debugging loop, that’s what the model becomes optimized for.
In a complex model like Claude there is no doubt much more at work, but some version of optimizing for the wrong thing is what’s ultimately at play.