1. Open source tools solve the problem of "critical functions of the application changing without notice, or being signed up for disruptive testing without opt-in".
2. This makes me afraid that it is absolutely impossible for open source tools to ever reach the level of proprietary tools like Claude Code, precisely because they cannot run A/B tests like this. That means their design decisions are usually informed by intuition and personal experience rather than by hard data collected at scale.
Regarding point 1 specifically, there were so many people seriously miffed at the “man after midnight”[0] time-based easter egg that I would be careful with that reasoning.
Open source doesn’t always mean reproducible.
People don’t enjoy the thought of auditing code… someone else will do it; and it’s made somewhat worse by our penchant for pulling in half the universe as dependencies (Rust, Go and JavaScript lean in this direction to varying extremes). But auditing would be necessary for your first point here to be as valid as you present it.
> People don’t enjoy the thought of auditing code… someone else will do it
I think that with modern LLMs auditing a big project personally, instead of relying on someone else to do it, actually became more realistic.
You can ask an LLM to walk you through the code, highlight parts that seem unusual or suspicious, etc.
On the other hand, LLMs have also made producing code cheaper than ever, so you could argue that big projects will just become even bigger, which will put them out of reach even for a reviewer who is also armed with an LLM.
fuck man, I'm either seriously stupid or y'all are taking crazy pills.
LLMs are auto-complete on steroids; I've lived through enough iterations of Markov chains giving semi-sensible output (that we give meaning to), and of neural networks presenting the illusion of intelligence, to see these LLMs for what they are: a fuckload of compute designed to find "the next most common word" given the preceding 10,000 or more words.
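A toy sketch of that "next most common word" framing: a bigram Markov chain that greedily emits the most frequent follower of the previous word. This is a deliberately crude caricature (real LLMs learn representations and attend over thousands of tokens), but the sampling loop has the same shape.

```python
# Toy bigram Markov chain: counts which word follows which, then
# greedily emits the most common follower at each step.
from collections import Counter, defaultdict

def train_bigrams(text):
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, n=5):
    out = [start]
    for _ in range(n):
        nxt = follows.get(out[-1])
        if not nxt:
            break  # dead end: this word was never followed by anything
        out.append(nxt.most_common(1)[0][0])  # "the next most common word"
    return " ".join(out)

corpus = "the cat sat on the mat the cat ran on the grass"
model = train_bigrams(corpus)
print(generate(model, "the"))  # -> "the cat sat on the cat"
```

Greedy selection quickly loops back on itself, which is part of why the output merely looks sensible locally rather than being reasoned.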
In such a case, the idea of it actually auditing anything is hilarious. You're looking at maybe a 1-in-100 chance of it actually finding anything useful. It will find "issues" in things that aren't issues (because they're covered by other cases), or skip over issues that people have historically had a hard time identifying themselves.
It's not running code in a sandbox and watching memory, it's not making logical maps of code paths in its mind, it's not reasoning at all. It's fucking autocomplete. Stop treating it as if it can think, it fucking can't.
I'm so tired of this hype. It's very easy to convince midwits that something is intelligent, I'm absolutely not surprised at how salesmen and con-men operate now that I've seen this first hand.
I wish people would move on from this mindset. "Agentic" workflows such as those implemented by Claude Code or Cursor can definitely reason about code, and I've used them to successfully debug small issues occurring in my codebase.
We could argue about how they only "predict the next word", but there's also other stuff going on in the other layers of their NNs which do facilitate some sort of reasoning in the latent space.
> I've used them to successfully debug small issues occurring in my codebase.
Great! The pattern-recognition machine successfully identified a pattern.
But how do you know it won't still flag the repaired pattern even though you've added a guard to prevent the behaviour (e.g. an invalid/out-of-bounds memory access guarded by a heavy assert on a sized object before the function is even entered)?
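To make that guard shape concrete, here is a hypothetical sketch (all names invented for illustration): the bounds check lives at the boundary, so the inner function deliberately has no check, and a pattern-matcher looking only at the inner function may flag an access that is in fact always guarded.

```python
# Hypothetical guard pattern: validate once at the boundary so the
# inner function can skip the check. A reviewer (or LLM) that only
# sees read_byte() may flag the "unchecked" access even though every
# call site goes through the guard.
def read_byte(buf, i):
    # No bounds check here by design; callers must validate i first.
    return buf[i]

def process(buf, i):
    # Heavy assert on a sized object before entering the function itself.
    assert 0 <= i < len(buf), "index out of bounds"
    return read_byte(buf, i)
```

The guard makes the code safe, but nothing in `read_byte` itself signals that, which is exactly the kind of context a pattern-matcher misses.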
What about patterns that aren't in the training data because humans have a hard time identifying the bad pattern reliably?
The point I'm making is that it's autocomplete: if your case is well covered in the training data, it will show up whether you have guards or not (so: noise), and it will totally miss anything that humans haven't identified before.
It works, absolutely, but there's no reliability, and that's sort of inherent in the design.
For security auditing specifically, an unreliable tool isn't just unhelpful: it's actively dangerous, because false confidence is worse than understood ignorance.
If you seek to audit code through the use of LLMs then you have inherently misunderstood the capabilities of the technology and will be left disappointed.