Been playing with Codex CLI the past week and it really loves to create a fix for a bug by adding a special case for just that bug in the code. It couldn't see the patterns unless I pointed them out and asked it to create new abstractions.
It would just keep adding what it called "heuristics", which were just if statements that tested for a specific condition that arose during the bug. I could write 10 tests for a specific type of bug, and it would happily fix all of them. When I add another one test with the same kind of bug it obviously fails, because the fix that Codex came up with was a bunch of if statements that matched the first 10 tests.
Also they hedge a lot, will try doing things one way, have a catch / error handler and then try a completely different way - only one of them can right but it just doesn't care. Have to lean hard to get it to check which paths are actually used and delete the others.
I am convinced this behaviour and the one you described are due to optimising for swe benchmarks that reward 1-shotting fixes without regard to quality. Writing code like this makes complete sense in that context.
That's a really good point. I was wondering why some of the LLMs were trained to try to pass things so sloppily constantly. Writing mock data, methods and pretending as if the task is complete and everything is great, good to go. They do seem to be trained just to pass some sort of conditions sadly and it feels somehow to me that it has got worse as of late. It should be relatively easy to reward them for writing robust code even if it takes longer or won't work, but it does seem they are geared towards getting high swe benchmarks.
It's clear that these AIs are approaching human level intelligence. (:
Thank you for giving a perfect example of what I was describing.
The thing is, you actually can make the software work this way, you just have to add enough if-statements to handle all cases--or rather, enough cases that the manager is happy.
In Oslo we seem to have a problem with trucks. Just in the past year, two people have been run over and killed by trucks. One was where the truck driver was reversing and another where the truck driver did an illegal right turn over a pavement.
Recently there has been a case in the courts where a truck driver didn’t yield to a cyclist and killed her. The narrative from the national truck association was basically that the cyclist was at fault. Even the courts were in on it, only when it got to the highest court did it seem that anyone was willing to blame the truck driver.
I'm actually planning on doing a second masters from a slightly more prestigious university with a more theory-heavy degree [1], but it's nice to at least have an official graduate degree now. Hopefully it helps me find work a bit quicker, and if nothing else it's just kind of fun to pile up degrees.
I couldn’t find anything that ticks all those outside of OU.
University of Texas has one that looked pretty ok, but it was kind of expensive for a non-Texas resident.
University of Western Florida has one for “Mathematical Sciences”, which more or less fits, and it’s not even that expensive, but I think that one is synchronous.
Yeah, I wish there were more options than that. Also, remote phd or master+phd would be even better, but these are even more uncommon and pricey (unless you know about a one that is good and cheap and remote then I’d love to learn more)
I was doing the University of York online PhD in computer science (formal methods), and it was actually pretty great, but it was costing me like $17,000 per year, and it was a huge time sink when I was already working full time.
That said, if you feel like you're organized enough to pull it off, I do recommend looking into University of York. It's a very good school.
Oh I get it now, thanks! Please let me know if you decide to enroll to math master's in OU, maybe we can help each other! I think I'll do the same on the next semester (so early 2026)
I’m sure the CPU designers would love it if they didn’t need several different layers of cache. Or no cache at all. Imagine if memory IOPS were as fast as L1 cache, no need for all that dedicated SRAM on the chip or worry about side channel attacks.
Sure, but we were talking about the perspective of software developers. The hardware designers take on complexity so that the software developer's work can be simpler.
It would just keep adding what it called "heuristics", which were just if statements that tested for a specific condition that arose during the bug. I could write 10 tests for a specific type of bug, and it would happily fix all of them. When I add another one test with the same kind of bug it obviously fails, because the fix that Codex came up with was a bunch of if statements that matched the first 10 tests.