> Engineers need to really lean in to the change in my opinion.
I tried leaning in. I really tried. I'm not a web or game developer (more robotics and embedded systems), but I tried vibe coding web apps and games. They were pretty boring. I got frustrated that I couldn't change little things. I remember my game character kept getting stuck on imaginary walls; I kept asking Cursor to fix it and it just made more and more of a mess. I remember building a simple front-end/back-end app with a database to analyze thousands of pull request comments, and it got massively slow and I didn't know why. Cursor wasn't very helpful in fixing it. I felt dumber after the whole process.
The next time I made a web app I just taught myself Flask and some basic JS and I found myself moving way more quickly. Not in the initial development, but later on when I had to tweak things.
The AI helped me a ton with looking things up: documentation, error messages, etc. It's essentially a supercharged Google search and Stack Overflow replacement, but I did not find it useful letting it take the wheel.
Posts like the one the OP made are why I'm losing my mind.
Like, is there truly an agentic way to go 10x or is there some catch? At this point while I'm not thrilled about the idea of just "vibe coding" all the time, I'm fine with facing reality.
But I keep having the same experience as you, or rather I keep leaning on that supercharged Google/SO replacement,
or just a "can you quickly make this boring func here that does xyz" "also add this" or for bash scripts etc.
And that's only when I've done most of the plumbing myself.
EVERY DX survey that comes out (surveying over 20k developers) says the exact same thing.
Staff engineers get the most time savings out of AI tools, and their weekly time savings is 4.4 hours for heavy AI users. On a 40-hour week that's a little more than a 10% productivity gain, so not anywhere close to 10x.
What's more telling about the survey results is that they are also consistent between heavy and light users of AI. Staff engineers who are heavy AI users save 4.4 hours a week, while staff engineers who are light AI users save 3.3 hours a week. To put it another way, the DX survey is pretty clear that the difference in time savings between heavy and light AI users is minimal.
Yes, surveys are all flawed in different ways, but an N of 20k is nothing to sneeze at. Every study with actual data shows that code generation is not a significant time saving, and zero studies show the opposite. All the productivity gains DX reports come from debugging and investigation/codebase-spelunking help.
In my experience the productivity measured in created merge requests increased massively.
More merge requests because the same senior developers are now creating more bugs: 4x compared to 2025. Same developers, same codebase, but now with Cursor!
Past survey results are hidden in some presentations I've seen, and I only have full access to the latest survey because my company pays for it. So I'm not sure it's legal for me to reproduce it.
I think there is going to be a 2-3 year lag in understanding how LLMs actually impact developer productivity. There are way too many balls in the air, and anyone claiming specific numbers on productivity increases is likely very, very wrong.
For example, citing staff engineers introduces a bias: they have years of traditional training and are obviously not representative of software engineers in general.
The catch is that to go 10x you either have to do a lot of work of the variety that AI excels at, mainly boilerplate and logical-but-tedious modifications. There's a lot of code I can write where I'd need to check the syntax and implementations of 10 or more functions/methods, but I know what they are and how I want the code to flow. AI never really nails it, but it gets close enough that I can fix it with considerable time savings. The major requirement here is that I, for the most part, already knew almost exactly what I wanted to do. This is the really fancy autocomplete that is actually a pretty reasonable assistant.
The other way is that you have to start from a position of 0.1x (or less) and go to ~1x.
There are a tremendous number of people employed in tech roles outside of actual tech companies who have very, very low throughput.
I recently worked at a very large non-tech firm, one that is part of a major duopoly and is for the most part a household name worldwide. It employs thousands of software developers whose primary function is to have a vague idea of whom they should email about any question or change. The ratio of emails to lines of code is probably 25:1.
The idea that you could simply ask an AI to modify code, and it might do it correctly, in only a day is completely mind blowing to people whose primary development experience is from within one of these organizations.
> Like, is there truly an agentic way to go 10x or is there some catch?
Yes. I think it’s practice. I know this sounds ridiculous, but I feel like I have reached a kind of mind meld state with my AI tooling, specifically Claude Code. I am not really consciously aware of having learned anything related to these processes, but I have been all in on this since ChatGPT, and I honestly think my brain has been rewired in a way that I don’t truly perceive except in terms of the rate of software production.
There was a period of several months a while ago where I felt exhausted all the time. I was getting a lot done, but there was something about the experience that was incredibly draining. Now I am past that and I have gone to this new plateau of ridiculous productivity, and a kind of addictive joy in the work. A marvellous pleasure at the orchestration of complex tasks and seeing the results play out. It’s pure magic.
Yes, I know this sounds ridiculous and over-the-top. But I haven’t had this much fun writing software since my 20s.
> Yes, I know this sounds ridiculous and over-the-top.
in that case you should come with more data. tell us how you measured your productivity improvement. all you've said here is that it makes you feel good
Work that would have taken me 1-2 weeks to complete, I can now get done in 2-3 hours. That's not an exaggeration. I have another friend who is as all-in on this as me and he works in a company (I work for myself, as a solo contractor for clients), and he told me that he moved on to Q1 2026 projects because he'd completed all the work slated for 2025, weeks ahead of schedule. Meanwhile his colleagues are still wading through scrum meetings.
I realize that this all sounds kind of religious: you don't know what you're missing until you actually accept Jesus's love, or something along those lines. But you do have to kinda just go all-in to have this experience. I don't know what else to say about it.
If your work maps exceedingly well to the technology it is true, it goes much faster. Doubly so when you have enough experience and understanding of things to find its errors or suboptimal approaches and adjust it that much faster.
The second you get to a place where the mapping isn’t there though, it goes off rails quickly.
Not everyone programs in such a way that they may ever experience this but I have, as a Staff engineer at a large firm, run into this again and again.
It’s great for greenfield projects that follow CRUD patterns though.
this is just not a very interesting way to talk about technology. I'm glad it feels like a religious experience to you, I don't care about that. I care about reality
it seems to me if these things were real and repeatable there would be published traces that show the exact interactions that led to a specific output and the cost in time and money to get there.
My sympathies go out to the friend's coworkers. They are probably wading through a bunch of stuff right now, but given the context you have given us, it's probably not "scrum meetings".
I don't even care about the LLM, I just want the confidence you have to assess that any given thing will take N weeks. You say 1-2 weeks... that's a big range! Something that "would" take 1 week takes ~2 hours, and something that "would" take 2 weeks also takes ~2 hours. How does that even make sense? I wonder how long something that would have taken three weeks would take?
> They are probably wading through a bunch of stuff right now, but given the context you have given us, it's probably not "scrum meetings".
This made me laugh. Fair enough. ;)
In terms of the time estimations: if your point is that I don't have hard data to back up my assertions, you're absolutely correct. I was always terrible at estimating how long something would take. I'm still terrible at it. But I agree with the OP. I think the labour required is down 90%.
It does feel to me that we're getting into religious believer territory. There are those who have firsthand experience and are all-in (the believers), there are those who have firsthand experience and don't get it (the faithless), and there are those who haven't tried it (the atheists). It's hard to communicate across those divides, and each group's view of the others is essentially, "I don't understand you".
Religions are about faith, faith is belief in the absence of evidence. Engineering output is tangible and measurable, objectively verifiable and readily quantifiable (both locally and in terms of profits). Full evidence, testable assertions, no faith required.
Here we have claims of objective results, but also admissions we’re not even tracking estimations and are terrible at making them when we do. People are notoriously bad at estimating actual time spent versus output, particularly when dealing with unwanted work. We’re missing the fundamental criteria of assessment, and there are known biases unaccounted for.
Output in LOC has never been the issue, copy and paste handles that just fine. TCO and holistic velocity after a few years is a separate matter. Masterful orchestration of agents could include estimation and tracking tasks with minimal overhead. That’s not what we’re seeing though…
Someone who has even a 20% better method for deck construction is gonna show me some timetables, some billed projects, and a very fancy new car. If accepting Mothra as my lord and saviour is a prerequisite to pierce an otherwise impenetrable veil of ontological obfuscation in order to see the unseeable? That deck might not be as cheap as it sounds, one way or the other.
I’m getting a nice learning and productivity bump from LLMs, there are incredible capabilities available. But premature optimization is still premature, and claims of silver bullets are yet to be demonstrated.
Here's an example from this morning. At 10:00 am, a colleague created a ticket with an idea for the music plugin I'm working on: wouldn't it be cool if we could use nod detection (head tracking) to trigger recording? That way, musicians who use our app wouldn't need a foot switch (as a musician, you often have your hands occupied).
Yes, that would be cool. An hour later, I shipped a release build with that feature fully functional, including permissions plus a calibration UI that shows if your face is detected and lets you adjust sensitivity, and visually displays when a nod is detected. Most of that work got done while I was in the shower. That is the second feature in this app that got built today.
This morning I also created and deployed a bug fix release for analytics on one platform, and a brand-new report (fairly easy to put together because it followed the pattern of other reports) for a different platform.
I also worked out, argued with random people on HN and walked to work. Not bad for five hours! Do I know how long it would have taken to, for example, integrate face detection and tracking into a C++ audio plugin without assistance from AI? Especially given that I have never done that before? No, I do not. I am bad at estimating. Would it have been longer than 30 minutes? I mean...probably?
I would love to see that pull request, and how readable and maintainable the code is. And do you understand the code yourself, since you've never done this before?
Just having a 'count-in' type feature for recording would be much much more useful. Head nodding is something I do all the time anyway as a musician :).
I don't know what your user makeup is like, but shipping a CV feature the same day sounds so potentially disastrous. There are so many things I would think you'd at least want to test, or even just consider, with the kind of user empathy we all should practice.
I think you have to make a distinction between individual experience and claims about general truths.
If I know someone as an honest and serious professional, and they tell me that some tool has made them 5x or 10x more productive, then I'm willing to believe that the tool really did make a big difference for them and their specific work. I would be far more sceptical if they told me that a tool has made them 10% more productive.
I might have some questions about how much technical debt was accumulated in the process and how much learning did not happen that might be needed down the road. How much of that productivity gain was borrowed from the future?
But I wouldn't dismiss the immediate claims out of hand. I think this experience is relevant as a starting point for the science that's needed to make more general claims.
Also, let's not forget that almost none of the choices we make as software engineers are based on solid empirical science. I have looked at quite a few studies about productivity and defect rates in software engineering projects. The methodology is almost always dodgy and the conclusions seem anything but robust to me.
> It does feel to me that we're getting into religious believer territory. There are those who have firsthand experience and are all-in (the believers), there are those who have firsthand experience and don't get it (the faithless), and there are those who haven't tried it (the atheists). It's hard to communicate across those divides, and each group's view of the others is essentially, "I don't understand you".
What a total crock. Your prose reminds me of the ridiculously funny Mike Myers in "The Love Guru".
But then does this not give you pause, that it "feels religious"? Is there not some morsel of critical/rational interrogation on this? Aren't you worried about becoming perhaps too fundamentalist in your belief?
To extend the analogy: why charge clients for your labor anymore, which Claude can supposedly do in a fraction of the time? Why not just ask if they have heard the good word, so to speak?
Nobody has a robust, empirical metric of programmer productivity. Nobody. Ticket count, function points, LoC, and the rest tell you nothing about the fitness of the product. It's all feels.
ok, but there's a spectrum between fully reproducible empirical evidence and divine revelation. I'm not convinced it's impossible to measure productivity in a meaningful way, even if it isn't perfect. it at least seems better to try than... whatever this is
What's worked best with Gemini, such that I made a DSL that transpiles to C with CUDA support to train small models in about 3 hours (all programs must run against an image data set and must only generate embeddings):
Do not: vibe code from the top down (ex. Make me a UI with React, with these buttons and these behaviors for each button).
Do not: chat casually with it (ex. I think it would look better if the button was green).
Do: constrain phrasing to the next data-transform goal (ex. You must add a function to change all words that start with lowercase to start with uppercase).
Do: vibe code from the bottom up; see the sketch after this list (ex. You must generate a file with a function to open a plaintext file, and appropriate tests; now you must add a function to count all words that begin with "f").
Do: stick to must/should/may (ex. You must extend the code with this next function).
Do: constrain it to mathematical abstractions (ex. sys prompt: You must not use loops; you must only use recursion and functional paradigms. You must not make up abstractions; stick to mathematical objects and known algorithms).
Do: constrain it to one file per type and function. This makes it quick to review and to regenerate only what needs to change.
Using those patterns, Gemini 2.5 and 3 have cranked out banging code with little wandering off in the weeds and hallucinating.
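To make the bottom-up pattern concrete, here is roughly the shape of file it aims at (my own illustrative sketch in Python, not Gemini's actual output; the function and task are hypothetical): one file, one pure function, a test alongside, no explicit loops.

    # words.py - hypothetical result of the bottom-up "must" prompts above:
    # one file, one function, functional style, plus a test.

    def count_words_starting_with(text: str, prefix: str) -> int:
        """Count whitespace-separated words that begin with prefix."""
        return sum(1 for word in text.split() if word.startswith(prefix))

    def test_count_words_starting_with():
        assert count_words_starting_with("foo bar fizz", "f") == 2
        assert count_words_starting_with("", "f") == 0

Because each function lives in its own small file with its own test, a bad generation can be thrown away and regenerated without touching anything else.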
Programming has been mired in the made-up semantics of individual coders, for the lulz, to create mystique and obfuscate the truth to ensure job security; at the end of the day it's matrix math and state sync between memory and display.
Just as an aside, I also think I'm way more productive now, but a really convincing data point would be someone who does project work and now charges 5x the hourly rate they had last year. If there are not plenty of people like this, it cannot be 10x.
That's not a very convincing argument. Even if you can do 10x the work, that doesn't necessarily mean you can easily find customers ready to pay 5x the hourly rate.
> Yes, I know this sounds ridiculous and over-the-top. But I haven’t had this much fun writing software since my 20s.
But... you're not writing it. The culmination of many sites, many people, Stack Overflow, etc. wrote it, through the filtering mechanism we're calling AI.
Lol that's like saying that because you found the solution on stack overflow you didn't write the program
News flash buddy: YOU never wrote any code yourself either. Literally every single line of code you've ever committed to a repo was first written by someone else and you just copied it and modified it a little.
Currently three main projects. Two are Rails back-ends with React front-ends, so they're all Ruby, TypeScript, Tailwind, etc. The third is more recent: an audio plugin built with the JUCE framework, all C++. This is the one that has been blowing my mind the most, because I'm an expert web developer, but the last time I wrote a line of C++ was 20 years ago, and I have zero DSP or math skills. What blows my mind is that it works great: it's thread-safe and performant.
In terms of workflow, I have a bunch of custom commands for tasks that I do frequently (e.g. "perform code review"), but I'm very much in the loop all the time. The whole "agent can code for hours at a time" thing is not something I personally believe. It depends on the task how involved I get, however. Sometimes I'm happy to just let it do work and then review afterwards. Other times, I will watch it code and interrupt it if I am unhappy with the direction. So yes, I am constantly stepping in manually. This is what I meant about "mind meld". The agent is not doing the work, I am not doing the work, WE are doing the work.
I maintain a few rails apps and Claude Code has written 95% of the code for the last 4 months. I deploy regularly.
I make my own PRs then have Copilot review them. Sometimes it finds criticisms, and I copy and paste that chunk of critique into Claude Code, and it fixes it.
Treat the LLMs like junior devs that can lookup answers supernaturally fast. You still need to be mindful of their work. Doubtful even. Test, test, test.
There's extensive Tailwind training data in the models. Sure, there's something more efficient out there, but it's just safer to let the model leverage what it was trained on.
In my experience the LLMs work better with frameworks that have more rigid guidance. Something like Tailwind has a body of examples that work together, language to reason about the behavior needed, higher levels of abstraction (potentially), etc. This seems to be helpful.
The LLMs can certainly use raw CSS, and it works well; the challenge is when you need consistent framing across many pages with mounting special cases, where the LLMs may extrapolate small inconsistencies further. If you stick within a rigid framework, the inconsistencies should be fewer across a larger project (in theory, at least).
Start by having the agent ask you questions until it has enough information to create a plan.
Use the agent to create the plan.
Follow the plan.
When I started, I had to look at the code pretty frequently. Rather than fix it myself, I spent time thinking about what I could change in my prompts or workflow.
Everyone keeps telling me that it's good for bash scripts but I've never had real success.
Here's an example from today. I wanted to write a small script to grab my Google Scholar citations, and I'm terrible with web stuff, so I ask it for the best way to parse the curl output. First off, it suggests I use a Python package (seriously? For one line of code? No thanks!), but then it gets the grep wrong. So I pull up the page source, copy-paste some of it in, and try to parse it myself. I already have a better grep command, and for the second time it's telling me to use Perl regex (why does it love -P as much as it loves delve?). Then I'm pasting in my new command and showing it my output, asking for the awk and sed parts while googling the awk I always forget. It messes up the sed part, so I fix it, which means editing the awk part slightly, but I already had the SO post open that I needed anyway. So I saved maybe one minute total?
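(For what it's worth, the no-package version I was fishing for is only a few lines even in Python's stdlib. A sketch, with the caveat that the gsc_rsb_std class name is my guess at Scholar's current markup and the user ID is a placeholder:)

    # sketch: pull citation counts from a Google Scholar profile page
    import re
    import urllib.request

    url = "https://scholar.google.com/citations?user=YOUR_ID"  # placeholder ID
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urllib.request.urlopen(req).read().decode("utf-8")
    # assumes the stats table cells still carry class "gsc_rsb_std"; verify first
    print(re.findall(r'class="gsc_rsb_std">(\d+)', html))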
Then I give it a skeleton of a script file with the variables I wanted, fully expecting a simple cleanup. No. The result is definitely below average; I mean, I've never seen an LLM produce bash functions without being explicitly told to (not that the same isn't also true of the average person). But hey, it saved me the while loop for the args, so that was nice. Net, it cost as much time as it gave back.
Don't get me wrong, I find LLMs useful but they're nowhere near game changing like everyone says they are. I'm maybe 10% more productive? But I'm not convinced that's even true. And sure, I might have been able to do less handholding with agents and having it build test cases but for a script that took 15 minutes to write? Feels like serious overkill. And this is my average experience with them.
Is everyone just saying it's so good at bash because no one is taking the time to learn bash? It's a really simple language that every Linux user should know the basics of...
I did find some benefit in lowering the cost of exploratory work, but that's it—certainly worth 20€/month, but not the price of any of the "ultimate" plans.
For example, today I had to write a simple state machine (for a parser I was rewriting, so I had all the test cases already). I asked Claude Code to write the state machine for me and stopped it before it tried compiling and testing.
Some of the code (of course including all the boilerplate) worked, some made no sense. It saved a few minutes and overall the code it produced was a decent first approximation, but waiting for it to "reason" through the fixes would have made no sense, at least to me. The time savings mostly came from avoiding the initial "type the boilerplate and make it compile" part.
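For context, a state machine like this is mostly table boilerplate, which is exactly the part worth delegating (a toy sketch in Python, nothing to do with my actual parser):

    # toy sketch of a table-driven state machine for a scanner
    from enum import Enum, auto

    class State(Enum):
        START = auto()
        IN_WORD = auto()
        IN_NUMBER = auto()

    def classify(ch: str) -> str:
        return "alpha" if ch.isalpha() else "digit" if ch.isdigit() else "other"

    # (current state, input class) -> next state; anything else resets
    TRANSITIONS = {
        (State.START, "alpha"): State.IN_WORD,
        (State.START, "digit"): State.IN_NUMBER,
        (State.IN_WORD, "alpha"): State.IN_WORD,
        (State.IN_NUMBER, "digit"): State.IN_NUMBER,
    }

    def run(text: str) -> State:
        state = State.START
        for ch in text:
            state = TRANSITIONS.get((state, classify(ch)), State.START)
        return state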
When completing the refactoring there were a few other steps where using AI was useful. But overall the LLM did maybe 10% of the work and saved, optimistically, 20-30 minutes over a morning.
Assuming I have similar savings once a week, which is again very optimistic... That's a 2% reduction or less.
> or just a "can you quickly make this boring func here that does xyz" "also add this" or for bash scripts etc.
I still write most of the interesting code myself, but when it comes to boring, tedious work (that's usually fairly repetitive, but can't be well abstracted any more), that's when I've found gen AI to be a huge win.
It's not 10x, because a lot of the time, I'm still writing code normally. For very specific, boring things (that also are usually my least favorite parts of code to write), it's fantastic and it really is a 10x. If you amortize that 10x over all the time, it's more like a 1.5x to 3x in my experience, but it saves my sanity.
Things like implementing very boring CRUD endpoints that have enough custom logic that I can't use a good abstraction and writing the associated tests.
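For the avoidance of doubt, this is the kind of endpoint I mean (a hedged Flask sketch; the resource, fields, and validation rule are all made up):

    # "boring CRUD with a dab of custom logic": the shape I hand to the LLM
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    ITEMS = {}  # stand-in for a real table

    @app.post("/items")
    def create_item():
        data = request.get_json()
        if not data.get("name"):  # the small custom rule that breaks abstraction
            return jsonify(error="name required"), 400
        item_id = len(ITEMS) + 1
        ITEMS[item_id] = {"id": item_id, "name": data["name"]}
        return jsonify(ITEMS[item_id]), 201

    @app.get("/items/<int:item_id>")
    def read_item(item_id):
        item = ITEMS.get(item_id)
        return (jsonify(item), 200) if item else (jsonify(error="not found"), 404)

Multiply that by a dozen resources, each with its own slightly different rule, and you have exactly the repetitive-but-not-abstractable work I'm describing.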
I would dread doing work like that because it was just so mind numbing. Now, I've written a bunch of Cursor rules (that was actually pretty fun) so I can now drop in a Linear ticket description and have it get somewhere around 95% done all at once.
Now, if I'm writing something that is interesting, I probably want to work on it myself purely because it's fun, but also because the LLM may suck at it (although they're getting pretty damn good).
I tried Claude Code to write a very simple app for me: basically a Golang mock server that dumps requests to the console. I'd write this kind of app in an hour. I spent around 1.5 hours with Claude Code, and in the end I had code I liked, almost the same code I'd have written myself. It's not vibe coding; I carefully instructed it to write code the way I prefer, one small step after another.
So for me it's pretty obvious that with better training I'd be able to achieve speed-ups with the same end result. Not 10x, but 2x is possible. My very first attempt at using AI took almost the same time as writing the code myself, and I have a lot to improve.
That said, I have a huge problem with this approach: it's not fun to work like that. I started programming 25 years ago because it was fun for me. It's still fun for me today. I love writing all these loops and ifs. I can accept minimal automation like static autocomplete, but that's about it.
does anyone remember that episode of star trek tng where the kid is given a little laser engraver that carves a dolphin from a block of wood? and the kid is like "i didn't make this" and the teacher (who abducted him, ew) is like "yeah but it's what you wanted to make, the tool just guided you"
so in 2026 we're going to get in trouble doing code "the old way", the pleasurable way, the way an artist connects with the work. we're not chefs any longer; we're plumbers now, pouring food from a faucet.
we're annoyed because our output can suddenly be measured by the time unit. the jig is up. our secret clubhouse has a lightbulb the landlord controls.
some of us were already doing good work, saving money, making the right decisions. we'll be fine.
some of us don't know how to do those things - or won't do those things - and our options are funneled down. we're thrashing at this, like dogs being led to the pound.
there's before, there's during, and there's after; the during is a thing we so seldom experience, and we're in it, and 2024 felt like nothing, 2025 feels like the struggle, and 2026 will be the reconciliation.
change sucks. but it's how we continue. we continue differently or we don't exist.
I sure do. I believe it's the first-season episode "When the Bough Breaks" (S01E16). That show tackled so many heavy topics right out of the gate... I respect the hell out of the courage to try, even if it produced some pretty epic whiffs along with the home runs and standing doubles.
Feeling the same. I’m guessing the folks getting good results are literally writing extremely detailed pseudocode by hand?! Like:
Write a class Person who has members (int) age, (string) first name, (string) last name…
But if you can write it in that much detail... don't you already know the code you want to write and how you should write it? Writing plain pseudocode feels more verbose.
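That's the crux: the prompt is nearly isomorphic to the code it produces. In Python, for instance, the generated class is barely longer than the request (illustrative sketch):

    # "Write a class Person who has members (int) age, (string) first name,
    # (string) last name" comes back as roughly:
    from dataclasses import dataclass

    @dataclass
    class Person:
        age: int
        first_name: str
        last_name: str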
But the AI coding agent can then ask you follow up questions, consider angles you may not have, and generate other artifacts like documentation, data generation and migration scripts, tests, CRUD APIs, all in context. If you can reliably do all that from plain pseudo code, that's way less verbose than having to write out every different representation of the same underlying concept, by hand.
Sure, some of that, like CRUD APIs, you can generate via templates as well. Heck, you can even have the coding agent generate the templates and the code that will process/compile them, or generate the code that generates the templates given a set of parameters.
It's been my experience that reaching for an LLM is a significant context switch that breaks flow state. Comparable to a monkey entering your office and banging cymbals together for a minute, returning to programming after writing up instructions for an LLM requires a refocusing process to reestablish the immersion you just forfeited. This can be a worthwhile trade with particularly tedious or annoying tasks, but not always.
I suspect this explains the current bifurcation of LLM usage: individuals either use LLMs for everything or use them minimally, with the in-between space shrinking by the day.
> Like, is there truly an agentic way to go 10x or is there some catch? At this point while I'm not thrilled about the idea of just "vibe coding" all the time, I'm fine with facing reality.
Below is based on my experience using (currently) mostly GPT-5 with open source code assistants.
For a new project with straightforward functionality? I think you (and "you" being "basically anybody who can code at all") can probably manage to go 10x the pace of a junior engineer of yesteryear.
Things get a lot trickier when you have complex business logic to express and backwards compatibility to maintain in an existing codebase. Writing out these kinds of requirements in natural language is its own skillset (which can be developed), and this process takes time in and of itself.
The more confusing the requirements, the more error prone the process becomes though. The model can do things "correctly" but oops maybe you forgot something in your description, and now the whole thing will be wrong. And the fact that you didn't write the code means that you missed out on your opportunity to fix / think about stuff in the first pass of implementation (i.e. you need to seriously review stuff, which also slow you down).
Sometimes iterating over English instructions will take longer than just writing/expressing things in code from the start. But sometimes it will be a lot faster too.
Basically the easy stuff is way easier but the more complex stuff is still going to require a lot of hand holding and a lot of manual review.
I have a feeling that people who are genuinely impressed by long term vibe coding on a single project are only impressed because they don't know any better.
Take writing a book, or blog post; writing a good blog post, or a chapter of a book, takes lots of skill and practice. The results are very satisfying and usually add value to both the writer's life as well as the reader's. When someone who has done that uses AI and sees the slop it generates, he's not impressed, probably even frustrated.
However, someone who can barely write a couple of coherent sentences would be baffled at how well AIs can put together sentences and paragraphs and keep a somewhat coherent train of thought through an entire text. People who struggled in school with writing an introduction and a conclusion will be amazed at AI writing. They might even assume that "those paragraphs actually add no meaning and are purely fluff" is a totally normal part of writing and not an AI artifact.
I’m impressed by getting the output of at least a mediocre developer at less than 1% of the cost. Brute force is an underrated strategy. I’ve been having a great experience.
That developers in the Hacker News comment bin report experiences that align with their personal financial interests doesn’t really dissuade me.
How many hours have you spent writing code? Thousands? Tens of thousands? Were you able to achieve good results in the first hundred hours?
Now, compare it to how much time you've spent working with agents. Did you dedicate considerable time to figuring out how to use them? Do you stop using the agent and do things manually when it isn't going right, or do you spend time figuring out how to get the agent to do it?
You can't really compare those two. Agents are non-deterministic. I can tell Clod to go update my unit test coverage and it will choke itself, burn 200k tokens, and then loudly proclaim "Great! I've updated unit test coverage".
I'll kill that terminal, open it again and run the exact same command. 30k tokens, actually adds new tests.
It's hard to "learn" when the feedback cycle can take 30 minutes and result in the agent sitting in the corner touching itself and crooning about what a good boy it is. It's hard to _want_ to learn when you can't trust the damn thing with the same prompt twice.
And then all the heuristics you've learnt change under you and you're stuck doing 100-1000 more hours of learning with a drop in quality during that time.
Agentic AI is a pretty huge scam. Every organization worth its salt has so many IAM protections in place that an AI developer is useless, because you can't give it SSO and OIDC credentials to access company resources. And no IT team will ever let it. So all these folks trying to convince us that AI will ever deploy anything useful are lying their butts off; IT won't even let their own developers deploy anything, so why would an AI be treated any differently?
That's my finding as well. The smaller the chunk, the better, and it saves me 5m here and an hour there. These really add up.
This is cool. It's extra cool on annoying things like "fix my types" or "find the syntax error" or "give me the flags for ffmpeg to do exactly this."
If I ever meet someone who drank the Kool-Aid and wants to show me their process, I'm happy to see it. But I've tried enough to believe my own eyes, and when I see open source contributors I respect demo their methods, they spend so much time and energy waiting on the machine and trying to keep it on the rails that, yes, this is harder, but it does not appear to be faster.
It seems to very heavily depend on your exact project and how well it's represented in the training set.
For instance, AI is great at React Native bullshit that I can't be bothered with. It absolutely cannot handle embedded development, particularly if you're not using the Arduino framework on an ATmega328. I'm presently doing bare-metal AVR on a new chip and none of the AI agents have a single clue what they're doing. Even when fed the datasheet and an entire codebase of manually written code for this thing, AI just produces hot wet garbage.
If you're on the 1% happy path AI is great. If you diverge even slightly from the top 10 most common languages and frameworks it's basically useless.
The weird thing is if you go in reverse it works great. I can feed bits of AVR assembly in and the AI can parse it perfectly. Not sure how that works, I suspect it's a fundamentally different type of transformation that these models are really good at
I have been building a game (preview here: https://qpingpong.codeinput.com) as an exercise in "vibe coding". There is only one rule: I am not allowed to write a single line of code, but I can prompt as much as I want.
So far I am hitting a "hard block" on getting the AI to make changes once the code base gets large. One "unblocker" was to restructure all the elements as their own components, which makes it easier for the LLM (and you?) to reason about each React component in isolation.
Still, even at this "small/simple game" stage, it is not only hard for the LLM to get any change done but very easy for it to break things. The only way I can see around it is to structure very thorough tests (including E2E tests) so that any change by the LLM is thoroughly checked for regressions.
I've been working on this for a month or so. I could have coded it faster by hand except for the design part.
I have a hobby project on the side involving radio digital signal processing in Rust that I've been pure vibe coding, just out of curiosity to see how far I can get. On more than one occasion the hobby project has gotten bogged down in a bug that is immensely challenging to resolve. And since the project isn't in an area I have experience with, and since I don't have a solid "theory of the program", since it's a gray box because I've been vibe coding it, I've definitely seen CC get stuck and introduce regressions in tricky issues we previously worked through.
The use of Claude Code with my day job has been quite different. In my day job, I understand the code and review it carefully, and CC has been a big help.
You can go faster once you understand the domain well enough that you could have written it yourself. That allows you to write better designs and steer LLMs in the right direction.
"Vibe coding" though is moving an ever growing pile of nonunderstanding and complexity in front of you, until you get stuck. (But it does work until you've amassed a big enough pile, so it's good for smaller tasks - and then suddenly extremely frustrating once you reach that threshold)
Can you go 10x? Depends. I haven't tried any really large project yet, but I can compress fairly large things that would've taken a week or two pre-LLM into a single lazy Sunday.
For larger projects, it's definitely useful for some tasks. ("Ingest the last 10k commits, tell me which ones are most likely to have broken this particular feature") - the trick is finding tasks where the win from the right answer is large, and the loss from the wrong one is small. It's more like running algorithmic trading on a decent edge than it is like coding :)
It definitely struggles to do fully agentic work successfully on very large code bases. But... I've also not tried too much in that space yet, so take that with a grain of salt.
If you have not started working on a new codebase while adopting AI, it may be harder to realize the gains.
I switched jobs somewhat recently. At my previous job, where I was on the codebase for years, I knew where the changes should be and what they should look like. So I tried to jump directly to implementation with the AI because I didn't need much help planning and the AI got confused and did an awful job.
In a new codebase, where I had no idea how things are structured, I started the process by using AI to understand where the relevant code is, the call hierarchies and side effects, etc.
I have found that by using the AI to conduct the initial investigation, it was then very easy to get the AI to generate an effective spec, and then relatively easy to get the AI to generate the code to that spec. That flow works much better than trying to one-shot the implementation.
It sounded like he was trying to one-shot things when he mentioned he would ask it to fix problems with no luck. It's an approach I've tried before with similar results, so I was sharing an alternative that worked for me. Apologies if it came across as dismissive.
GP said they were doing vibe coding and trying to get the AI to do one-shots. That's the worst way to use these tools. AI coding agents work best when you generally know what you want the output to look like but don't want to waste time writing that output.
I don’t vibe code yet but it has sped me up a lot when working with large frameworks that have a lot of magic behind the scenes (Spring Boot). I am doing a very large refactor, major version spring boot upgrade, at the moment.
When given focused questions about parts of the code, it will give me 2-4 different approaches extending/implementing different bean overrides. I go through a cycle of back and forth, having it give me sample implementations. I often ask what is considered the more modern or desirable approach, and for things like a pros-and-cons list of the different approaches. For the one I like best, I then go look up the specific docs to fact-check a bit.
For this type of work it is easily a 2-3x. Spring specifically is really tough to search for, due to its long history and large changes between major versions. More often than not it lands me on the most modern approach for my Spring Boot version, and while the code it produces is not bad, it isn't great either. So I rewrite it.
Also it does a pretty good job of writing integration tests. I have it give me the boilerplate for the test and then I can modify it for all my different scenarios. Then I run those against the unmodified and refactored code as validation suite that the refactor didn’t introduce issues.
When I am working in Golang I don't get this level of speed-up, but I also don't need to look up as much. The number of ways to do things is far lower and there is no real magic behind the scenes. This might be one reason experiences differ so radically.
The thing is, using an agent or AI to code for you is a learned skill. It doesn’t come naturally to most people. For you to be successful at it, you’ve got to adopt a mentor / lead mindset - directing vs doing. In other words, you have to be an expert at explaining yourself - communicating clearly to get great results.
Someone who hasn’t got any experience coding, or leading in any capacity, anywhere in life (or mentoring) will have a hard time with agentic development.
I’ll elaborate a bit more - the ideal mindset requires fighting that itch to “do it yourself” and sticking to the prompts for any changes. This habit will force you to get better at communicating effectively to others (including agents).
How are you guys using LLMs? I've done a couple of applications for my own use, including a "Mexican Train Dominoes" online multiplayer game, using LLMs, and it never stops amazing me. Gemini 3 is crazy good at finding bugs at work, and every week there are very interesting advances in arXiv articles.
I'm 45 years old, have been programming since I was 9, and this is the most amazing time to be building stuff.
> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool. This would take me, or many developers I know and respect, days to write by hand.
I have no problem believing that Claude generated 300 passing tests. I have a very hard time believing those tests were all well thought out, concise, and actually testing the desired behavior while communicating to the next person or agent how the system under test is supposed to work. I'd give very good odds that at least some of those tests are subtly testing themselves (e.g. mocking a function, calling said function, then asserting the mock was called). Many of them are probably also testing implementation details that were never intended to be part of the contract.
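That self-testing-mock failure mode, concretely (a made-up Python example, not from the tool in question):

    # the test exercises the mock, not the code under test
    from unittest import mock

    def apply_discount(order_total: int, pricing) -> int:
        return pricing.discounted_total(order_total)  # hypothetical logic

    def test_apply_discount():
        pricing = mock.Mock()
        pricing.discounted_total.return_value = 90
        assert apply_discount(100, pricing) == 90      # asserts the mock's value
        pricing.discounted_total.assert_called_once()  # green, proves nothing

Every assertion here passes no matter what the real pricing code does, which is exactly why suites like this stay green while the app is broken.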
I'm not anti-AI, I use it regularly, but all of these articles about how crazy productive it is skip over the crazy amount of supervision it needs. Yes, it can spit out code fast, but unless you're prepared to spend a significant chunk of that "saved" time CAREFULLY (more carefully than with a human) reviewing code, you've accepted a big drop in quality.
The benefit of having a team of QA engineers create tests is their differing perspectives, so with LLMs being trained to act like affirmation engines, you have to wonder how that impacts the test cases they create. It's the problem of LLMs being miserable at critique, manifesting itself in a different way.
However, in saying that, I am by no means an AI hater, but rather I just want models to be better than they currently are. I am tired of the tech demos and benchmark stats that don't really mean much aside from impressing someone who's not in a critical thinking mindset.
Very similar experience here. I have not once managed to get an LLM to generate good tests, even for very simple code. It generally writes tautologies that will pass with high confidence.
I had CC write a bunch of tests to make sure some refactoring didn't break anything, then I ran the app and it crashed out of the gate. Why? Because despite the verbosity of the tests, it turned out it had mocked the most important parts to test, so the _actual_ connections weren't being exercised, and while CC was happy to claim victory with all tests green, the app was broken.
Anecdotes etc. etc., but the AI tests I've been sent to review have been absolute shit. Stuff like tests that only check that calling a function doesn't crash the program: no assertions other than "end of test method reached".
Yes, sometimes those tests are necessary, but it seemed to write them everywhere because they made the code coverage percentage go up, even though they were useless.
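The shape in question, for anyone who hasn't seen it (made-up example):

    # "end of test method reached": coverage goes up, nothing is verified
    def generate_report(rows):  # stand-in for the real code under test
        return "\n".join(str(r) for r in rows)

    def test_generate_report():
        generate_report([1, 2, 3])  # runs to completion, asserts nothing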
I have also had great experiences with AI cranking out straightforward boilerplate or asking C++ template metaprogramming questions. It's not all negative. But net-net it feels like it takes more work in total to use AI as you have to learn to recognize when it just won't handle the task, which can happen a lot. And you need to keep up with what it did enough to be able to take over. And reading code is harder than writing it.
I’ve seen agents produce plenty of those tests, but recently I’ve seen them generate some actually decent unit tests that I wouldn’t have thought of myself. It’s a bit of a crapshoot
So you're openly saying you're fine with quantity over quality... in software engineering? That's fine for an MVP, maybe, but nothing beyond that IMHO, unless they're throwaway scripts.
There is exactly one "best" programmer in the world, and at this moment he/she is working on at most one project. Every other project in the world is accepting less than the "best" possible quality. Yes... in software engineering.
As soon as you sat down at the keyboard this morning, your employer accepted a sacrifice in quality for the sake of quantity. So did mine. Because neither one of us is the best. They could have hired someone better but they hired you and they're fine with that. They'd rather have the code you produce today than not have it.
It's the same for an AI. It could produce some code for you, right now, for nearly free. Would you rather have that code or not have it? It depends on the situation, yeah not always but sometimes it's worth having.
Here is the thing, most software engineers are not designing rockets, they are making basic CRUD apps. If there is a minor defect it can be caught and corrected without much issue. Our jobs are a lot less "critical infrastructure" than a lot of software engineers will allow their egos to accept.
Sure, if you are making some medical surgery robot, do it right; but if you are making a website that recommends wine pairings, who cares if one of the buttons has a weird animation bug that doesn't even get noticed for a couple of years?
I think I'm one of those "most" engineers, and I haven't ever worked on something that was "just" a CRUD app. Having a DB behind your web app doesn't make it "just" CRUD.
It's really overestimated how many simple apps exist.
Regular SaaS products of different kinds, cloud software, hosting software, etc. Really representative of most of the Web-enabled software out there.
For every one of them there has been an almost negligible amount of CRUD code; the meat of every one of those apps was very specific business logic. Some were also heavy on the frontend, with an equal amount of complexity on the backend. As a senior/staff-level engineer you also have to dive into other things like platform enablement, internal tooling, background jobs and data wrangling, distributed architectures, etc., which are even farther from CRUD.
Not to call you out but this is exactly what I meant when I said software engineers have egos that will not let them accept that they are not designing critical stuff.
Comparing your cloud-based CRUD app to a missile is a perfect illustration. There is no dishonor in admitting that our stuff isn't going to kill anyone if there is a bug. Don't write bad code, but sometimes just getting something out the door is much better than perfect quality (bird in the hand and all that).
Not to call you out either, but it seems you have really no idea what a basic CRUD app is. Which is fine; I guess not everyone likes to read the base definitions of these things. It's clear I replied to the wrong person, as we don't have a shared understanding of complexity.
Banking software is critical, but guess what, most software engineers are not writing banking software. I never said no software engineers write critical code. Heck, I'd argue most will, at some point in their careers, write something that needs to be as bug-free as possible.
My point is that for most software engineering, getting a product out is more important than a super high quality bar that slows everything down.
If you are writing banking software or flight control systems please do it with care, if you are making some React based recipe website or something I don't really care (99% of software engineering falls into this latter category in my opinion).
Software engineers need to get over themselves a bit, AI really exposed how many were just getting by making repetitive junk and thinking they were special.
> most software engineers are not writing banking software
Many software engineers write software for people who won't like the idea that their request/case can be ignored/failed/lost, when expressed openly on the front page of your business offering. Are bookings important enough? Are gifts for significant events important? Maybe you're okay with losing my code commits every once in a while, I don't know. And I'm not sure why you think it's okay to spread this bad management idea of "not valuable or critical enough" among engineers who should know better and who should keep sources of bad ideas at bay when it comes to software quality in general.
The main benefit of writing tests is that it forces the developer to think about what they just wrote and what it is supposed to do. I often find bugs while writing tests.
I've worked on projects with 2,000+ unit tests that are essentially useless, often fail when nothing is wrong, and rarely detect actual bugs. It is absolutely worse than having 0 tests. This is common when developers write tests to satisfy code coverage metrics, instead of in an effort to make sure their code works properly.
Hundreds of tests that were written basically for free in a few minutes even though a lot of them are kind of dumb?
Or hundreds of tests that were written for a five figure sum that took weeks or months, and only some of them are kind of dumb?
If you're just thinking of code as the end in and of itself, then of course, the handcrafted artisanal product is better. If you think of code like an owner, an incidental expense towards solving a problem that has value, then cheap and disposable wins every time. We can throw our hands up about "quality" and all that, but that baby was thrown out with the bathwater a very, very long time ago. The modern Web is slower than the older web. Desktop applications are just web browsers. Enterprise software barely works. Windows 11 happened. I don't think anybody even bothers to scrutinize their dependency chains except for, I don't know, like maybe missile guidance or something. And I just want to say Claude is not responsible for any of this. You humans are.
Neither. Tests should be written by developers only when it saves them time. The cost of writing them should be negative.
Instead of writing hundreds of useless tests so that the code coverage report shows high numbers, it is better to write a couple dozen tests based on business needs and code complexity.
Having used Bentley software products, I can tell you with complete certainty that professional software developers have extremely bad judgment when it comes to the need to test software and verify its functionality. Developers just think they know what they're doing, because there's typically not a strong feedback mechanism that inflicts serious career damage when they do things that are extremely lazy or stupid or unethical. How many people lost their job, or had to change their name and live out the rest of their days in Juárez, Mexico, over AWS's incomprehensible configuration causing an internet brownout? Anyone? A teenager serves cold onion rings at a burger joint and he's on the street. Some lazy dweeb at Amazon blows up the internet and - come on, isn't it about the friends we made along the way? It's obscene, and the lack of professionalism and accountability is a total disgrace.
If you can reduce a problem to a point where it can be solved by simple code you can get the rest of the solution very quickly.
Reducing a problem to a point where it can be solved with simple code takes a lot of skill and experience and is generally still quite a time-consuming process.
Most of software work is maintaining "legacy" code, that is older systems that have been around for a long time and get a lot of use. I find Claude Code in particular is great at grokking old code bases and making changes to them. I work on one of those old code bases, and my productivity increased 10x, mostly due to Claude Code's ability to research large code bases, make sense of them, answer questions, and make careful surgical changes. It also helps with testing and debugging, which is a huge productivity boost. It's not about its ability to churn out lots of code quickly: it's an extra set of eyes/brain that works much faster than a human developer.
I have the opposite experience. Claude can't get it all in the context window and makes changes that completely break something on the other side of the program.
Granted that's because the program is incredibly poorly written, but still, the context window will stay a huge barrier for quite some time.
Between yours and GP's comments, I find echoes of my experience:
> Most of software work is maintaining "legacy" code, that is older systems that have been around for a long time and get a lot of use.
> Granted that's because the program is incredibly poorly written
LLMs can't fix big, shitty legacy codebases. That is where most maintenance work (in terms of hours) is, and where it will remain.
I would take it one step further and argue that LLMs and vibe-coding will compound into more big, shitty legacy codebases over time, and therefore, in the long arc, nothing will really change.
It has ever been thus. There are multi-million dollar businesses propped up by .NET applications on a foundation of shunted-around files, and at best, SQL used as APIs/queues. "Working" code is, in the long run, a liability outside the hands of those doing real engineering.
I want to voice the same bad experience; I tried Claude and several others, actually. I could get the AI to understand some things, but it quickly went off the rails trying to comprehend larger complexities, and its suggested changes would have ranged from worse to detrimental had I allowed them to be committed.
Can it though? I thought it was most useful for writing new code, but I have so far never had it correctly refactor existing code. Its refactoring attempts usually change behavior/logic, and sometimes even leave the code in a state where it's harder to read.
I've found this as well. In some cases we aren't fully authorised to use the AI tools for actual coding but even just asking "how would you make this change" or "where would you look to resolve this bug" or "give me an overview of how this process works" is amazingly helpful.
> In some cases we aren't fully authorised to use the AI tools for actual coding but even just asking "how would you make this change" [...]
Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
Nitpicks aside, I agree that contemporary AIs can be great for quickly getting up to speed with a code base. Both a new library or language you want to be using, and your own organisation's legacy code.
One of the biggest advantages of using an established ecosystem was that Stack Overflow had a robust repository of already-answered questions (and you could also buy books on it). With AI you can immediately cook up your own Stack Overflow community equivalent that provides answers promptly instead of closing your question as off-topic.
And I pick Stack Overflow deliberately: it's a great resource, but not reliable enough to use blindly. I feel we are in a similar situation with AI at the moment. This will change gradually as the models become better. Just like Stack Overflow required less expertise to use than attending a university course. (And a university course requires less expertise than coming up with QuickSort in the first place.)
> Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
Not in my case (I never used SO like that, anyway). I use it almost exactly like SO, except much more quickly and interactively (and without the implication that I'm "lazy" or "stupid" for not already knowing the answer).
I have found that ChatGPT gives me better code than Claude (I write Swift); even learning my coding and documentation style.
I still need to review all the code it gives me, and I have yet to use it verbatim, but it’s getting close.
The most valuable thing is that when I get an error, I can ask it, “Here are the symptoms and the code. What do you think is going on?” It usually gives me a good starting point.
I could definitely figure it out on my own, but it might take half an hour. ChatGPT will give me a solid lead in about half a minute.
The problem is most likely not writing the actual code, but rather understanding an old, fairly large codebase and how it’s stitched together.
SO is (was?) great when you were thinking about how nicely a recursive reduce function could replace the mess you’d just cobbled together, but language X just didn’t yet flow naturally for you.
>Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
when AI works well it is superior to Stack Overflow, because what it replaces is not "look up answer on SO, copy, paste." It replaces looking up several different things on SO that relate to the problem you are trying to solve, for which no exact, definite solution is posted anywhere, and copying those pieces together into a bit of code that you then refactor, in less time than doing all the SO lookups yourself. When it works, it can turn 2 hours of research into 2 minutes.
The problems are:
1. AI sometimes replicates the following process: a dev who doesn't understand all parts of the solution or the requirements copies bits of code together from various answers, making something that sort of works but is inefficient and has underlying problems.
2. Even with a correctly working solution, your developer does not get in those 2 minutes what they used to get in the two hours: an understanding of the problem space and how the parts of the solution hang together. This is why it is more useful for seniors than juniors, because part of looking through SO for what you want is light education.
>Isn't the logical endpoint of this equivalent to printing out a Stackoverflow answer and manually typing it into your computer instead of copy-and-pasting?
Isn't the answer on SO the result of a human intelligence writing it in the first place, and then voted to top place by several human intelligences? If an LLM were merely an automated "equivalent" of that, that would already be a good thing!
But in general, the LLM answer you appear to dismiss amounts to a lot more:
- having a close-to-good-human-level programmer
- understand your existing codebase
- answer questions about your existing codebase
- answer questions about changes you want to make
- on demand (not confined to copying SO answers)
- interactively
- and even being able to go in and make the changes
That amounts to "manually typing an SO answer" about as much as a pickup truck amounts to a horse carriage.
Or, to put it another way, isn't "the logical endpoint" of hiring another programmer and asking them to fix X "equivalent to printing out a Stackoverflow answer and manually typing it into their computer"?
>And I pick Stack Overflow deliberately: it's a great resource, but not reliable enough to use blindly. I feel we are in a similar situation with AI at the moment.
Well, we shouldn't be using either blindly anyway. Not even the input of another human programmer (that's why we do PR reviews).
> Isn't the answer on SO the result of a human intelligence writing it in the first place, and then voted to top place by several human intelligences? If an LLM were merely an automated "equivalent" of that, that would already be a good thing!
The word "merely" is doing all of the heavy lifting here. Having human intelligence in the loop providing and evaluating answers is what made it valuable. Without that intelligence you just have a machine that mimics the process yet produces garbage.
I've been building things with Claude while looking at say less than 5% of the code it produces. What I've built are tools I want to use myself and... well they work. So somebody can say that I can't do it, but on the other hand I've wanted to build several kinds of ducks and what I've built look like ducks and quack like ducks so...
I've found it's a lot better at evaluating code than producing it, so what you do is tell it to write some code, then tell it to give you the top 10 things wrong with the code, then tell it to fix the five of them that are valid and important. That is a much different flow than going on an expedition to find an SO solution to an obscure problem.
A good quality metric for your code is to ask an LLM to find the ten worst things about it; if all of those are stupid, then your code is pretty good. I did this recently on a codebase, and its number-one complaint was that the name I had chosen was stupid and confusing (which it was; I'm not explaining the joke to a computer). That was my sign that it was done finding problems and time to move on.
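For concreteness, here is a minimal sketch of that review-pass workflow using the Anthropic Python SDK. The model id, file name, and prompt wording are my own illustrative assumptions, not the commenter's actual setup:
```
import pathlib

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical file under review; any code you want critiqued works here.
code = pathlib.Path("payments.py").read_text()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": "List the ten worst things about the following code, "
                   "ordered by severity. Do not fix anything yet.\n\n" + code,
    }],
)

# A human reads the list, decides which complaints are real, and only then
# sends a follow-up prompt like "fix items 1, 3, and 7".
print(response.content[0].text)
```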
>then tell it to give you the top 10 things wrong with the code, then tell it to fix the five of them that are valid and important.
I would be cautious of this. I've tried it multiple times, and it often produces very subtle bugs. Sometimes the code is not bad enough to have 5 defects, but the model will comply anyway and change things that don't need changing. You will find out in prod at some point.
To be clear, I'm instructing it to generate a list of issues for me. I then decide if anything on that list is worth fixing (or is an issue at all, etc.)
Do you think you will be able to capture any of this extra value? I think I'm faster at coding, but the overall corporate project timeline feels about the same. I feel more relaxed and confident that the work can be done. Not sure how to get a raise out of this.
For me, as a remote developer, it means I'm able to finish my work in 1 hour instead of 8 hours. So I'm able to capture "extra value" in the form of time. In our team everyone uses GitHub Copilot and I use Claude Code. My teammates' productivity increased slightly but my productivity increased a lot. This is because 1. Claude Code is just a better coding agent 2. I invested time to get good at agentic coding. Eventually Copilot will catch up and management will realize that now 1 developer can do what previously would take a whole team.
I'm really curious what your role is and which industry you're in. I'm awed by the productivity gains others report, but I feel like AI helps in such a small part of my job (implementing specific changes as I direct).
Agentic workflows for me result in bloated code, which is fine when I'm willing to hand over a subsystem to the agent, such as the frontend on a side project, and have it vibe code the entire thing. Trying to get clean code erases all or most of my productivity gains, and doesn't spark joy. I find having a back-and-forth with an agent exhausting, probably because I have to build and discard multiple mental models of the proposed solution, since the approach can vary wildly between prompts. An agent can easily switch between using Newton-Raphson and bisection when asked to refactor unrelated arguments, which a human colleague wouldn't do after a code review.
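For readers outside numerics: Newton-Raphson and bisection both find a root of the same function, but with very different convergence and failure behavior, which is why a silent swap during an unrelated refactor is exactly the kind of change a reviewer should flag. A minimal sketch (illustrative only, not the commenter's code):
```
# Two root-finding methods that return "the same" answer but behave very
# differently: Newton-Raphson converges fast but needs a derivative and can
# diverge; bisection is slow but guaranteed once a sign change is bracketed.

def newton(f, df, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)          # requires df(x) != 0; can diverge
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

def bisect(f, lo, hi, tol=1e-10):
    assert f(lo) * f(hi) < 0, "root must be bracketed by a sign change"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:      # root lies in the lower half
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Both find sqrt(2) as the root of x^2 - 2, but with different guarantees;
# exactly the kind of semantic difference a refactor shouldn't silently introduce.
f = lambda x: x * x - 2
print(newton(f, lambda x: 2 * x, x0=1.0))
print(bisect(f, 0.0, 2.0))
```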
I've come to the same conclusion: If you just want a huge volume of code written as fast as possible, and don't care about 1. how big it is, 2. how fast it runs, 3. how buggy it is, 4. how maintainable or understandable it is, or 5. the overall craftsmanship and artistry of it, then you're probably seeing huge productivity gains! And this is fine for a lot of people and for a lot of companies: Quality really doesn't matter. They just care about shitting out mediocre code as fast as possible.
If you do care about these things, it will take you overall longer to write the code with an LLM than it would by hand-crafting it. I started playing around with Claude on my hobby projects, and found it requires an enormous amount of exhausting handholding and post-processing to get the code to the point where I am really happy with it as a consistent, complete, expressive work of art that I would be willing to sign my name to.
It does matter, but it's one requirement among many. Engineers think quality metrics like the ones you listed are the most important requirements, but that's not typically true.
This really is what businesses want and always have wanted. Before AI, I saw countless broken systems spitting out wrong info that was actively used by the businesses in my career. They literally did not want them fixed when I brought it up, because dealing with the errors had become part of the process, in pretty much all cases. I don't even try anymore unless I'm specifically brought on to fix a legacy system.
>that I would be willing to sign my name to.
This right here is what mgmt thinks is the big "problem" that AI solves. They have always wanted us to magically know what parts are "good enough" and what parts can slide, but for us to bear the burden of blame. The real problem is the same as always: bad specs. AI won't solve that, but in their eyes it will remove a layer of their poor communication. Obviously no SWE is going to build a system that spits out wrong info and just say "hire people to always double-check the work," or add it to so-and-so's job duties to check, but that really is the solution most places seem to go with, by lack of decision.
Perhaps there is some sort of failure of SWEs to understand that businesses don't care. Accounting will catch the expensive errors anyway. Then execs will bullwhip middle managers and it will go away.
The adversarial tension was all that ever made any of it work.
The "Perfectionist Engineer" without a "Pragmatic Executive" to press them into delivering something good enough would of course still been in their workshop, tinkering away, when the market had already closed.
But the "Pragmatic Executive" without the "Perfectionist Engineer" around to temper their naive optimism would just as soon find themselves chased from the market for selling gilded junk.
You're right that there do seem to be some execs, in the naive optimism that defines them, eager to see if this technology finally lets them bring their vision to market without the engineer to balance them.
That's a nice, balanced, wholesome take; the only problem is that the "Pragmatic Executive" is more like a "career-driven, frenzied, 'ship it today at all costs' psychopath executive".
You are describing a push-and-pull, tug-of-war, balanced relationship. In reality it is almost never balanced: the engineer has 1% of the say, and the other 99% goes to the executive.
I so wish your take were universally applicable. In my 24 years of career, it was not.
> Perhaps there is some sort of failure of SWEs to understand that businesses don't care
I think it's an engineer's nature to want to improve things and make them better, but then we naively assume that everybody else also wants to improve things.
I know I personally went through a pretty rough disillusionment phase where I realised most of the work I was asked to do wasn't actually to make anything better, but rather to achieve some very specific metrics that actually made everything but that metric worse.
Thanks to the human tendency to fixate on narratives, we can (for a while) trick ourselves into believing a nice story about what we're doing even if it's complete bunk. I think that false narrative is at the core of mission statements and why they intuitively feel fake (mission statement is often more gaslighting than guideline - it's the identity a company wants to present, not the reality it does present).
AI is eager to please and doesn't have to deal with that cognitive dissonance, so it's a metric chaser's dream.
<< They have always wanted us to magically know what parts are "good enough" and what parts can slide but for us to bear the burden of blame.
Well, that part is bound to add a level of tension to the process. Our leadership runs AI training where the user is responsible for checking the AI's output, but the same leadership also outright stated that it now sees an individual user of AI as having 7 employees under them (so they should be 7x more productive). Honestly, it's maddening. None of it is how any of this works at all.
> This really is what businesses want and always have wanted.
There's a difference between what they really want and executives knowing what they want. You make it sound like every business makes optimal decisions to get optimal earnings.
> They literally did not want it fixed when I brought it up because
Because they thought they knew what earns them profits. The key here is that they thought they knew.
The real problem behind the scenes is a lot of management is short term. Of course they don't care. They roll out their shiny features, get their promotions and leave. The issues after that are not theirs. It is THE business' problem.
Senior Software Engineer. The system is niche business software for a specific industry. It doesn't do any fancy math; it's all straightforward business logic.
> Trying to get clean code erases all or most of my productivity gains, and doesn't spark joy. I find having a back-and-forth with an agent exhausting, probably because I have to build and discard multiple mental models of the proposed solution, since the approach can vary wildly between prompts
You probably work on something that requires very unique and creative solutions. I work on dumb business software. Claude Code is generally good at following existing code patterns. As for the back-and-forth with Claude Code being exhausting, I have a few tips on how to minimize the number of shots required to get a good solution from CC:
1. Start by exploring relevant code by asking CC questions.
2. Then use Plan Mode for anything more than a trivial change. Using Plan Mode is essential. You need to make sure you and CC are on the same page BEFORE it starts writing code.
3. If you see CC making the same mistake over and over, add instructions to your CLAUDE.md to avoid it in the future (a hypothetical example follows below). This way your CC setup improves over time, like a coworker who learns.
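To illustrate point 3, a hypothetical CLAUDE.md excerpt; the rules are invented examples of the kind of accumulated instructions meant here, not the commenter's actual file:
```
# CLAUDE.md (hypothetical excerpt)
- Follow the existing repository patterns; do not introduce new dependencies
  without asking first.
- All database access goes through the repository classes in src/db/;
  never write raw SQL in request handlers.
- Run the existing test suite after every change and report failures;
  never "fix" a failing test by deleting or skipping it.
```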
Thank you for the actionable ideas. I'll experiment with closer supervision during the planning stage, hopefully finer-grained implementation details will reduce unnecessarily large refactors during review.
Claims about agentic workflows are the new version of "works on my machine" and should be treated with skepticism if they cannot be committed to a repository and used by other people.
Maybe parent is a galaxy-brained genius, or... maybe they are just leaving work early and creating a huge mess for coworkers who now must stay late. Hard to say. But someone who isn't interested in automating/encoding processes for their idiosyncratic workflows is a bad engineer, right? And someone who isn't interested in sharing productivity gains with coworkers is basically engaged in sabotage.
> And someone who isn't interested in sharing productivity gains with coworkers is basically engaged in sabotage.
Who says they aren't interested in sharing? To give a less emotionally charged example: I think my specific use pattern of Git makes me (a bit) more productive. And I'm happy to chew anyone's ear off about it who's willing to listen.
But the willingness and ability of my coworkers to engage in git-related lectures, while greater than zero, is very definitely finite.
Something that is advertised as 10x improvement in productivity isn't like your personal preferences for git or a few dinky bash aliases or whatever. It's more like a secret personal project test-suite, or a whole data pipeline you're keeping private while everyone else is laboriously doing things manually.
Assuming 10x is real, then again the question: why would anyone do that? The only answers I can come up with are that they cannot share it (incompetence) or that they don't want to (sabotage). You're saying the third option is... people just like working 8 hours while this guy works 1? Seems unlikely. Even if that's not sabotaging coworkers, it's still sabotaging the business.
The reason is that we are a Microsoft shop and our company doesn't have a Claude account. I'm using my personal Claude Max account. My manager knows that I use Claude Code, and I asked the person responsible for AI tooling in our company about using Claude Code, but he just said that management has already decided to go with GitHub Copilot. He thinks that using the Claude model in Copilot is the same as using Claude Code. Another issue is that I run Claude Code through WSL, and I'm the only person on our team with Linux skills.
There are methods of connecting the Claude Code CLI tools to Copilot's API; look at litellm or something along those lines. It's a pip package that translates the calls Claude Code makes.
Business and Enterprise plans have a no-training-on-your-data clause.
I’m not sure personal Claude has that. My account has the typical bullshit verbiage with opt-outs where nobody can really know whether they’re enforceable.
Using a personal account is akin to sharing the company code and could get one in serious trouble IMO.
You can opt out of having your code trained on. When Claude Code first came out, Anthropic wasn't using CC sessions for training. They started training on them with Claude Code 2, which came out with Sonnet 4.5. The user is asked on first use whether to opt in or out of training.
> You're saying the third option is... people just like working 8 hours while this guy works 1?
Nope, I don't say that at all.
I am saying that certain accommodations might feel like 10x to the person making them, but that doesn't mean they are portable.
Another personal example: I can claim with a straight face that using a standing desk and a Dvorak keyboard make me 10x more productive than otherwise. But that doesn't necessarily mean that other people will benefit from copying me, even if I'm happy to explain to anyone how to buy a standing desk from Ikea (or how to work company procurement to get one, in case you are working not-from-home).
In any case, the original commenter replied with a better explanation than our speculations here.
> And someone who isn't interested in sharing productivity gains with coworkers is basically engaged in sabotage.
I'll have to vigorously dissent on this notion: we sell our labor to employers - not our souls. Our individual labor, contracts and remuneration are personalized. Our labor. Not some promise to maximize productivity - that's a job for middle and upper management.
Your employer sure as hell won't directly share 8x productivity gains with employees. The best they can offer is a one-off, 3-15% annual bonus (based on your subjective performance, not the aggregate), or, if you have RSUs/options, gains on your minuscule ownership fraction.
It seems to me that the devs that managed to become sergeants of a small platoon of LLM agents to a crushing success deem their setup a competitive advantage and as such will never share it.
But them being humans, they do want to brag about it.
I'm teaching a course in how to do this to one of my clients this week.
Also, I used this same process to address a bug that is many years old in a very popular library this week. Admittedly, the first solution was a little wordy and required some back and forth, but I was able to get to a clean tested solution with little pain.
This has been my experience too. At the end of each session, I'm left mentally exhausted, without a full understanding of what I just did, so I have to review it again.
Coding this way requires effort equal to designing, coding, and reviewing combined, except the code I review isn't mine. Strange situation.
Well for me, all of my actual implementation work has been green field from “git init” and mostly coding around the AWS SDK in the target language and infrastructure as code since AI coding has gotten decent.
I haven’t had to write a line of code in a year. First ChatGPT and more recently Claude Code.
I don’t do “agentic coding”. I keep my hands on the steering wheel and build my abstractions and modules up step by step. I make sure every line of code looks like something I would write.
I’m a staff consultant (cloud + app dev) and always lead projects, discovery and design and depending on the size of the project, do all of the actual hands on work myself.
I would have had to staff at least one, maybe two, less senior consultants to do the actual hands-on work before. It's actually easier for me to do the work than to write really detailed requirements and coordinate the work (the whole “Mythical Man Month” thing).
FWIW: before the pearl clutching starts, I started coding in assembly in 1986 on an Apple //e and have been delivering production level code since 1996.
I have tech adjacent people on my team vibing out internal tools that are super useful, and take a load off of engineering. Most internal software is rehashing existing software with different/specific requirements.
Exactly what I experience. I don't need AI to generate a complex algorithm; I need, e.g., a lot of code for a UI library that is clean and maintainable. But it can't ever generate such code, and it can't be prompted into it, because the training data has much less excellent code than good and OK code. Therefore I can't use AI for high-level design tasks, only low-level code, which I then have to check and clean line by line, and that isn't enjoyable work.
I don't need LLMs, I need some kind of mind-reading device :D
Not the OP, but we use LLMs to build a restaurant POS system with reservations, loyalty, a webshop, etc. Almost at feature parity with bigwigs like Lightspeed/Toast.
> I find having a back-and-forth with an agent exhausting, probably because I have to build and discard multiple mental models of the proposed solution, since the approach can vary wildly between prompts
Just now I had it improve QR payments on the POS. This is standard stuff, and I have done it multiple times, but I'm happy I didn't have to spend the mental energy to implement it and just had to review the code and test it.
```
Perfect! I've successfully implemented comprehensive network recovery strategies for the OnlinePaymentModal.tsx file. Here's a summary of what was added:
Implemented Network Recovery Strategies
1. Exponential Backoff for Polling (lines 187-191)
2. Network Status Detection (lines 223-246, 248-251)
3. Transaction Timeout Handling (lines 110-119)
4. Retry Logic for Initial Transaction (lines 44-105)
5. AbortController for Request Cancellation (lines 134-139, 216-220)
6. Better Error Messaging (lines 85-102, 193-196)
7. Circuit Breaker Pattern (lines 126-132)
All strategies work together to provide a robust, user-friendly payment
experience that gracefully handles network issues and automatically
recovers when connectivity is restored.
```
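For readers unfamiliar with the strategies named above, here is a generic sketch of two of them, exponential backoff with jitter and a circuit breaker, in Python. The original is a React/TypeScript component, so this only illustrates the patterns, not the actual code:
```
import random
import time

def poll_with_backoff(check_status, base=1.0, cap=30.0, max_attempts=10):
    """Poll a status endpoint, doubling the wait (plus jitter) after each
    failure so a flaky network isn't hammered. Generic illustration only."""
    for attempt in range(max_attempts):
        try:
            return check_status()
        except ConnectionError:
            delay = min(cap, base * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 2))  # jitter
    raise TimeoutError("gave up polling after max_attempts")

class CircuitBreaker:
    """After `threshold` consecutive failures, stop calling the backend for
    `cooldown` seconds instead of queueing more doomed requests."""
    def __init__(self, threshold=5, cooldown=60.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: backend presumed down")
            self.opened_at = None          # cooldown elapsed: try again
            self.failures = 0
        try:
            result = fn()
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                  # any success resets the counter
        return result
```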
> An agent can easily switch between using Newton-Raphson and bisection when asked to refactor unrelated arguments, which a human colleague wouldn't do after a code review.
Can you share what domain your work is in? Is it deep tech? Maybe coding agents right now work better for transactional/e-commerce systems?
I don't know if that example is real, but if it is, that's exactly the reason I find AI tools irritating. You do not need six different ways to handle the connection being down, and if you do, you should really factor that out into a connection management layer.
One of my big issues with LLM coding assistants is that they make it easy to write lots & lots of code. Meanwhile, code is a liability, and you should want less of it.
You are talking about something like network layers in GraphQL. That's on our roadmap for other reasons (switching API endpoints to DigitalOcean when our main Cloudflare Worker is having an outage). However, even with that you'll need some custom logic, since this flow makes at least two API calls in succession, and that's not easy to abstract via a transaction abstraction in a network layer (you'd have to handle it durably in the network layer, like Temporal does).
Despite the obvious downsides, we actually moved it from a durable workflow (CF's take on Temporal) on the server side to the client, since on Workflows it had horrible and variable latencies (sometimes 9s vs. a consistent <3s with this approach). It's not ideal, but it makes more sense business-wise. I think people often miss that completely.
I think it just boils down to what you are aiming for. AI is great for shipping bugfixes and features fast. At a company level I think it also shows in product velocity. However, I'm sure our competitors will catch up very soon, once AI skepticism falters.
> Most of software work is maintaining "legacy" code, that is older systems that have been around for a long time and get a lot of use.
That's not the definition of legacy. Being there for a long time and getting lots of use is not what makes a legacy project "legacy".
Legacy projects are characterized by not being maintained and having little to no test coverage. The term "legacy" means "I'm afraid to touch it because it might break and I doubt I can put it back together". Legacy means resistance to change.
You can and do have legacy projects created a year or two ago. Most vibecoded apps fit the definition of legacy code.
That is why legacy projects are a challenge for agentic coding. Agents already output huge volumes of code changes that developers struggle to review, let alone assert their correctness. On legacy projects that are in production, this is disastrous.
What you list are common characteristics encountered in legacy systems, but what makes a system legacy is the business decision to declare it obsolete and in maintenance mode, so that no money or time is invested in it. Old systems that continue to evolve are not legacy, like, say, Linux; and yes, as you say, a project that is only a year or two old can be declared legacy. Resistance to change is only an economic variable that drives the decision. Vibecoded apps fit the definition because the developer is unlikely to want to invest more time in them, for various reasons.
> What you list are common characteristics encountered in legacy systems, but what makes a system legacy is the business decision to declare it obsolete and in maintenance mode, so that no money or time is invested in it.
No, not necessarily. Business decisions are one of many factors in creating legacy code, but they are by no means the single cause. A bigger factor is developers mismanaging a project to the point where it becomes an unmaintainable mess. I personally went through a couple of projects that were legacy code the minute they hit production. One of them was even a proof of concept that a principal engineer, one of those infamous 10x types, decided to push as-is to production, leaving two teams of engineers to sort out the mess in his wake.
I recommend reading "Working Effectively with Legacy Code" by Michael Feathers. It will certainly be eye-opening to some, as it dispels the myth that legacy code is a function of age.
So a project that's still using Java 1.6 and has perfect test coverage and some poor developer is paid to maintain it (but NOT upgrade it!) is not "legacy" in your book?
Then we disagree on the definition.
"Legacy" projects to me are those that should've went through at least two generational refactorings but haven't because of some unfathomable reason. These are the ones that eventually end up being rewritten from scratch because it's faster than trying to upgrade the 25 year old turd.
I've predominantly worked in two industries, healthcare/public health and insurance, where policy terms are measured in decades. The software for both ranges from 20 to 40 years old, and it hasn't been upgraded because doing so poses an existential risk to the business or, in the case of healthcare, to human life. Upgrades are measured in human generations because of said risk, but I wouldn't call these systems legacy just for not moving beyond Java 1.6.
Claude is insanely good at grunt-work maintenance coding, which is a fairly formulaic exercise that mostly requires RTFM and simple code changes that look a lot like the surrounding code. Designing new things from scratch based on human specs is something Claude still struggles with.
The problem is that it often doesn't get it right the first time. You have to sort of have a conversation and it eventually gets there but if you have no idea what the destination should be like, you can't guide it there.
Although many tools exist, there still seems to be a large context gap here: we need better tools to orient ourselves in and navigate large (legacy) codebases. While not strictly a source graph or the like, I do think an Enso-like interface may prove successful here [0].
> It's not about its ability to churn out lots of code quickly: it's an extra set of eyes/brain that works much faster than a human developer.
This is the key take right here. LLMs excel at parsing existing content, summarizing it, and using it to explore scenarios and hypotheticals.
Even the best coding agents out there such as Claude Code or Gemini often fail to generate at the first try code that actually compiles, let alone does what it is expected to do.
Apologists come up with excuses, such as that the legacy software is not architected well enough to be easily parseable by LLMs, but that is a cheap excuse. The same reference LLMs often output utter crap in greenfield projects they themselves generated, and do so after a handful of prompts. The state of a project is not the issue.
The world is coming to the realization that the AI hype is not delivering. Talk of an AI bubble is already mainstream. But like IDEs with autocomplete, agents might not solve every problem, yet they are nevertheless useful and here to stay. They are more akin to a search engine where the user doesn't need to copy/paste code snippets to apply a code change.
I honestly don't know what codebases you guys are working with. I tried it on a large quant library (C++ 97) in an effort to modernize it, and so far it's been nothing but a waste of time. Similarly for a medium-sized Python quant codebase (3.6) that I'm trying to port to 3.12; it's also been a headache.
Completely agree. In the past 12 months, I've had five or six use cases that I would not have bothered scripting or automating before, but with AI I've cranked out scripts or even small web services that get the job done in under an hour. It has really revolutionized the super small, bite-sized issues.
exactly this!
You can do things with one prompt that would have taken weeks of tinkering before; Claude especially is very good with "give a detailed first prompt and some context source files and create a working example on the first shot".
E.g., I have to deal with a lot of reports, and those are usually "never fully developed" because "we can add this one row/feature later" because "management wants us to ship early".
Now I can enhance our reports by whatever metric just by handing over the current XLSX export code and telling the LLM: "now I additionally want XY here...."
Well said. The cost of building a CRUD has dropped 90%.
The open question is why people needed fancy AI tools like Claude to write CRUDs in the first place. These kinds of tasks ought to have been automated a long time ago.
> These kinds of tasks ought to have been automated a long time ago
They have been, repeatedly, since the 70s. See dBase, Clipper, Microsoft Access, Hypercard, Ruby on Rails, stretching Wordpress to within an inch of its life, all manner of "no-code" things...
And, honestly, Excel. People do all manner of terrifying things with Excel, and it is unquestionably the most successful, and arguably the _only_ successful, "we can do this thing instead of employing a programmer" tool.
Generally, one of two things has happened. Either (a) the products of such automation become unmaintainable nightmares (common for the more automated approaches like MS Access) or (b) they become complex enough that they tend towards 'normal' programming (common with, say, Rails, where you could get a simple CRUD with basically just DSL, but realistically eventually you're gonna be writing lots of Ruby).
I feel like LLM-produced stuff is probably going to fall into column A.
Excel and Google Sheets are indeed where most non-programmers frequently come the closest to programming and actually create useful apps for themselves.
So what’s interesting is that Copilot is basically useless for this task, as is Gemini. How is Microsoft messing up this badly?
> These kinds of tasks ought to have been automated a long time ago.
It’s much easier to write business logic in code. The entire value of CRUD apps is in their business logic. Therefore, it makes sense to write CRUD apps in code and not some app builder.
And coding assistants can finally help with writing that business logic, in a way that frameworks cannot.
CRUD as a concept is flawed. It is more or less any computational system with input -> process -> output. Just as this abstract system can have any complexity, the same is true for any CRUD app.
You don't need Claude to write it. But by hand you cannot generate solid web forms at the same speed. What would usually have taken you a few hours is now solved in much less time.
I doubt software will get cheaper though, requirements will adapt.
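As a concrete idea of the kind of CRUD plumbing being discussed, here is a minimal sketch of one resource with create/read/update/delete endpoints. Flask and SQLite are my own choices for brevity, and all names are invented for illustration:
```
import sqlite3

from flask import Flask, jsonify, request

app = Flask(__name__)

def db():
    # One tiny table; CREATE TABLE IF NOT EXISTS keeps the sketch self-contained.
    conn = sqlite3.connect("items.db")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

@app.post("/items")
def create_item():
    with db() as conn:  # the context manager commits on success
        cur = conn.execute("INSERT INTO items (name) VALUES (?)",
                           (request.json["name"],))
        return jsonify(id=cur.lastrowid), 201

@app.get("/items/<int:item_id>")
def read_item(item_id):
    row = db().execute("SELECT * FROM items WHERE id = ?", (item_id,)).fetchone()
    return (jsonify(dict(row)), 200) if row else ("", 404)

@app.put("/items/<int:item_id>")
def update_item(item_id):
    with db() as conn:
        conn.execute("UPDATE items SET name = ? WHERE id = ?",
                     (request.json["name"], item_id))
    return "", 204

@app.delete("/items/<int:item_id>")
def delete_item(item_id):
    with db() as conn:
        conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    return "", 204
```
None of this is hard, just tedious and repetitive, which is exactly why this is the kind of work the thread says has gotten cheap.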
> These kinds of tasks ought to have been automated a long time ago.
People have been trying for literally decades. The problem is that there is just enough uniqueness to every CRUD app that you can't really have "the CRUD app".
I guess it's the sweet spot for AI at the moment because they're 95% all the same but with some fairly simple unique aspects.
Most code is simple; the fact that large complex systems are layers of simple code on top of itself, like garbage heaps at the dump, is what makes them complex. Sticking with the garbage analogy, the LLM is like upgrading from one shovel to a crew of 10 people with excavators to look for a lost Bitcoin hard drive.
Your project is still going to fail, but it will fail faster with the 10 excavators.
The analogy I use is going from hand-farming or farm animals to large combines overnight. You still need all the knowledge about farming, but it amplifies and multiplies your ability.
LLMs work great at identifying libraries I'd never have otherwise found and use them, as long as you ask them for solutions instead of micromanage how they should get things done.
Aren't we already having major issues with there being too many small libraries and dependency chains that grow exponentially? I would have thought LLMs will actually benefit us a lot here, by not having to use a lib for every little thing (leftpad, etc.).
That's primarily a culture problem, mostly with Javascript (you don't really see the same issue in most language ecosystems). Having lots of tiny libraries is bad, but writing things covered by libraries instead of using _sensible_ libraries is also bad.
(IMO Javascript desperately needs an equivalent to Boost, or at the very least something like Apache Commons.)
That was probably a Node/npm thing; because there was no stdlib, it was quite common to have many small libraries.
I consider it an absolute golden rule for coding: don't write unnecessary code, and don't write collections.
I still see a lot of C that ought not to have been written.
I'm a greybeard, and I don't fear for my job. But not relying on AI when it's faster is as silly as refusing a correct autocomplete and typing it out by hand. The bytes don't come out better.
And not everyone wants to use a cloud AI either. Remember that when tons of cash is on the table, things like license agreements become less enforceable and more of a "don't get caught with your hand in the cookie jar" thing. All it would take is something similar to what's going on with book authors/publishers, a major AI provider exposed as using other firms' proprietary code without even considering getting a license, to totally blow up the "safety" of cloud-based coding agents.
Local models are becoming more and more capable but the tooling still needs to get better for those.
While I would love for this to be true for financial and egotistical reasons, I have a growing feeling that this might not be true for long unless progress really starts to stall.
I've actually gone in the other direction. A year ago, I had that feeling, but since then I've gotten more certain that LLMs are never going to be able to handle complexity. And complexity is still the real problem of developing software.
We keep getting more cool features in the tools, but I don't see any indication that the models are getting any better at understanding or managing complexity. They still make dumb mistakes. They still write terrible code if you don't give them lots of guardrails. They still "fix" things by removing functionality or adding a ts-ignore comment. If they were making progress, I might be convinced that eventually they'll get there, but they're not.
Yeah, but on the other hand there are plenty of human programmers who are bad at understanding complexity, make dumb mistakes, and write terrible code. Is there something fundamentally different about their brains compared to mine? I don't think so. They just aren't as good: not enough experience, or not enough neurons in the right places, or whatever it is that makes some humans better at things than others.
So maybe there isn't any fundamental change needed to LLMs to take it from junior to senior dev.
> They still "fix" things by removing functionality or adding a ts-ignore comment.
I've worked with many many people who "fix" things like that. Hell just this week, one of my colleagues "fixed" a failing test by adding delays.
I still think current AI is pretty crap at programming anything non-trivial, but I don't think it necessarily requires fundamental changes to improve.
This whole analogy is so tired. "LLMs are stupid, but some humans are stupid too, therefore LLMs can be smart as well." Let's put aside the obvious bad logic and think for one second about WHY some people are better than others at certain tasks. It is always because they have lots of practice and learned from their experiences, something an LLM categorically cannot do.
> LLMs are stupid, but some humans are stupid too, therefore LLMs can be smart as well
Not what I said. The correct logic is "LLMs are stupid, but that doesn't prove that they MUST ALWAYS be stupid, in the same way that the existence of stupid people doesn't prove that ALL people are stupid".
> let's put aside the obvious bad logic
Please.
> WHY some people are better than others at certain tasks. it is always because they have lots of practice and learned from their experiences.
What? No it isn't. It's partly because they have lots of practice and learned from experience. But it's also partly natural talent.
> something LLM categorically cannot do
There's literally a step called "training". What do you think that is?
The difference is that LLMs have a distinct off-line training step and can't learn after that. Kind of like the Memento guy. Does that completely rule out smart LLMs? Too early to tell I think.
I wouldn't say that the distinction is so much about code being "simple", but about code being made of patterns common enough in online examples. Claude Code and similar can write even very complex code, as long as it's something they have been trained on.
That is a good approach: bottom-up, managing complexity. But the general picture is that you set the direction and hold the model responsible; it does the actual work. Think of your work as the negative of the AI's work: it writes the code, you ensure it tests that code. The better the test harness you create, the better the AI works. The real task is to constrain the AI into a narrow channel of valid work.
At the leaves of the branches I'm comfortable just generating code (e.g., a popup dialog). But I want to have a good grasp of the code that is central to the application.
Yes, but for experienced engineers that is still a huge, huge change.
Even 12 months ago, simplifying tasks alone was insufficient; you still needed a large group of engineers to actually write, review, and maintain a typical product for a solid startup offering. This came with the associated overhead of hiring and running mid-sized teams.
A lot of skilled people of (y)our age/experience were forced into people-management roles because there was no other way to deliver a product that scales (in team and complexity, not DAU).
A CTO of a mid-stage startup had to be a good architect, a decent engineering manager, be deeply involved in product, and also communicate effectively with internal and external customers.
Now, startups setting up anew can defer the engineering-manager and people complexity a lot later than they could before. You can have a very senior but small team that is truly 10x-level and more productive, without the overhead of communication, alignment, and management that comes with large teams.
----
tldr; Skilled engineers can generate outsized returns for orgs that set them up to be successful (far more than before). I can't say whether compensation reflects this yet; if not, it soon will.
The funny thing is a lot of that was never really necessary per se. There are tons of stories about great projects coming out of tiny teams. They're not likely geniuses; they just had focus, clarity, and a drive to GSD without excessive unproductive activity.
I've long been a proponent of offshore developers for cost savings. You have to manage the process and people differently, but the output per dollar (pre-AI) was phenomenal, and when managing them I could put my brand of low-touch management in place: usually one weekly 1-hour meeting for everyone, emphasizing that nobody spins their wheels for more than an hour during the week without asking for help, and making sure everyone was crystal clear on what we were working on and the priorities.
I've never been a fan of sprints, or really of any time block as a milestone, because I don't think it incentivizes people to finish early. I'm also not a perfectionist. If it's spaghetti code and it works, great, we can clean it up on the next pass (within reason, of course; the spirit is build, test, operationalize, and then, if it's useful and has staying power, refactor later).
For all this, hiring cheap labor overseas has always made much more sense than hiring locally (in the US), based on cost but also on working style/culture. Somehow, as labor rates shot up here over the last couple of decades, people found excuses not to offshore. Some of them are valid if you can't manage the project correctly, as it is different, but for me the solution has been to adapt my management style rather than crying about it being difficult and lazily hiring locally. It always struck me as odd that startups and investors hadn't leveraged the labor-rate arbitrage opportunity.
I have noticed lately that getting into a US company as a foreigner has become very difficult. I get a lot of praise on culture fit and tech assignments, and then get turned away with something very similar to "can't get compliance to agree to work with Bulgaria".
Sigh.
But I get what you mean. I started using LLMs and that gave me a perspective what it is to be an engineering manager.
I fed Claude Pro a REST API spec and told it to spit out a PowerShell module, and well... so far those 27k lines of code largely check out (minus the undocumented stuff I knew about).
Getting it to write the Pester scripts was a very different matter...
Had the cost of building custom software dropped 90%, we would be seeing a flurry of low-cost, decent-quality SaaS offerings all over the marketplace, possibly undercutting some established players.
From where I sit, right now, this does not seem to be the case.
It is almost as if writing down the code is not the biggest problem, or the biggest time sink, of building software.
The keyword is "building". Yes, costs may have dropped 90% just to build software. But there are 1000 other things that come after that to run successful software for months, let alone years:
- Maintenance, Security
- Upgrades and patches
- Hosting and ability to maintain uptime with traffic
- Support and dealing with customer complexities
- New requirements/features
- Most importantly, ability to blame someone else (at least for management). Politics plays a part. If you build a tool in-house and it fails, you are on the chopping block. If you buy, you at least can say "Hey everyone else bought it too and I shouldn't be fired for that".
Customers pay for all of the above when they buy a SaaS subscription. AI may come for most of the above at some point, but not yet. I say give it 3-5 years to see how it all pans out.
Good points, but this list is missing the most critical problem, which AI does not solve: exposure.
What you've listed are the easy parts that are within people's control. You didn't list the most critical part, the actual bottleneck which is not within people's control.
The market is now essentially controlled by algorithms. I predict there will be amazing software... Which will end up ignored by the markets completely until their features are copied by big tech and nobody will know where the idea originated.
Building is absolutely worthless in the context of a monopolized marketplace.
> Good points, but this list is missing the most critical problem, which AI does not solve: exposure.
There are SO FUCKING MANY tools for marketing your shitty SaaS all over subreddits dedicated for people to advertise their new services and applications.
I had to unsubscribe from all of them because about a year ago they went from semi-interesting to 100 different "my SaaS AI tool will automatically advertise your AI SaaS tool on social media" solutions every week.
This is assuming the marketplace works perfectly, which is an incorrect assumption. The reality is that the marketplace is highly controlled by algorithms. New platforms will struggle to get exposure. No exposure, no credibility, no word of mouth, no users: catch-22. You think the big players will allow small SaaS projects to gain traction on their platforms? Have you seen how centralized the Internet is these days? Have you seen how afraid people are of betting on no-name platforms? If they choose the wrong no-name platforms and tools, they will lose their (increasingly precious) jobs. As the saying goes, "Nobody ever got fired for choosing IBM." As for B2C: it's dead; consumers don't have money and will have less of it in the future. The mass-market game is over.
My bet is if there were a lot of great apps being built, even excellent quality, nobody would even hear about them. The big players would copy them before anyone even found out about them.
IMO, the market is not even a playing field anymore, this is why everyone is getting into politics now, though politics is also somewhat monopolized, there is still more potential for success because there is such an abundance of dissatisfied people willing to look outside of mainstream channels. It's much easier to sell political ideologies than to sell products.
It's not the same because who controls the algorithms matters here. The algorithms work for some entities and against other entities. They are not neutral at all. They are aligned through shared monetary incentives, so well aligned that they would probably be less aligned if it was a literal conspiracy.
TBH, I'm kind of shocked I still have to explain this. When you get on the wrong side of the algorithms you will understand, and you will understand viscerally. And I do mean 'when', not 'if'.
Maybe the algorithms have been working for you so far and you're not feeling them but just give it a few years. Unfortunately, once you understand, you won't have a voice anymore and those still in the game won't have enough empathy to help you.
Do you mean search or recommendation algorithms or something like that?
To me an algorithm is just something used to compute a result based on some rules - but apparently it has some different meaning for you that you just take for granted
To be fair, writing SaaS software is an order, perhaps two orders, of magnitude more effort than writing software that runs on a computer and does the thing you want. There's a ton of stuff SaaS is used for now that's basically trivial, where literally all the "engineering" effort is spent on ensuring vendor lock-in and retaining control of the software so that you can force people to keep paying you.
You might not be looking hard enough. There are a few sources you could look at, one is the GitHub Awesome YouTube channel. I am seeing a lot of several-hundred-stars open source projects with unreasonably large codebases starting to gain traction. This is the frontier of adoption, and my guess is this will start cascading outward.
I think you underestimate just how hard visibility is. If something is free or super low cost, then there won't be any marketing budget for you to hear about it in the first place, because it would be unprofitable...
One thing I've come to realize is that if something is cheap enough, people won't even want to promote it, because any commission on it won't be worth their time. So in some cases they are better off recommending a much higher-priced competitor. Just Google around for some type of software (something competitive and commercial, like CRMs) and you'll notice that for commercial projects nobody recommends free or really cheap solutions, because it's not in anybody's interest.
Why? I don't want to bother making all the software that the AI wrote for me work on someone else's machine. The difference between software that solves my problem and that solves a problem many people have is also often like an order of magnitude of effort.
And why would this happen? Local to what? Every SaaS product I use is available on my Mac, Windows, iPhone, iPad, and the web. Some are web-only and some are web plus apps.
Who is going to maintain the local software? Who is going to maintain the servers for self hosted or the client software?
This. I have a massive amount of custom software running locally to solve all sorts of problems for me now.
But it's for me and tailor made to solve my precise use cases. Publishing it would just add headaches and endless feature requests and bug reports for zero benefit to me.
Also also, we should reach the point where you have decent quality source code for a local application, and you can tell GPT "SaaS this", and it works.
With a SaaS, you have one platform that you fully control. Broken dependency? Need to update/rollback? It's all in your hands.
Local software has to target multiple OSes, multiple versions of those OSes, and then a million different combinations of environments that you as a developer have no control over. Windows update whatever broke your app, but the next one fixed it? Good luck getting your user base to update instead of being pissed at you
A single Go binary can cross-compile to multiple OS-versions with a simple Github Action.
And if it's a free open source application, why would I care if someone can't run it on their specific brand of OS? I'm open to PRs.
If the "user base" wants to update, they can come to the github page and download the latest binary. I'm not building an autoupdater for a free application.
But you're talking about a free open source application without guarantees; that's not comparable in model to SaaS vs. self-hosted "paid" software.
And even in the cases where it is, even with a modern language like Go that makes it easy, you still have tons of OS-specific complexity: service definitions, filesystem operations, signal handling, autoupdates if you want them, etc.
It has dropped by maybe MORE than 90%. My son's school recently asked me to build some tools for them; I did this over a decade ago for them, for free. I did it again using AI tools (a different problem, though) and had it mostly done in 30 minutes (after I got the credentials set up properly; that took more time than the main coding part). This would have been several days of work for me in the past.
But in the past, you knew the codebase very well, and it was trivial to implement a fix and upgrade the software. Can the same be done with LLMs? From what I see, it depends on your luck. If the LLMs can't help you, then you have to read a whole codebase you've never read before, and you quickly lose the initial benefits. I don't doubt we'll get there someday, though.
I've hit this in little bursts, but one thing I've found is that LLMs are really good at reasoning about their own code and helping me understand how to diagnose and make fixes.
I recently found some assembly source for some old C64 games and used an LLM to walk me through it (purely recreational). It was so good at it. If I was teaching a software engineering class, I'd have students use LLMs to do analysis of large code bases. One of the things we did in grad school was to go through gcc and contribute something to it. Man, that code was so complex and compilers are one of my specialties (at the time). I think having an LLM with me would have made the task 100x easier.
Does that mean you don't think you learned anything valuable through the experience of working through this complexity yourself?
I'm not advocating for everyone to do all of their math on paper or something, but when I look back on the times I learned the most, it involved a level of focus and dedication that LLMs simply do not require. In fact, I think their default settings may unfortunately lead you toward shallow patterns of thought.
I wouldn't say there is no value in it, but I do feel I learned more using LLMs as a companion than trying to figure everything out myself. And note, using an LLM doesn't mean I don't think. It provides context and information that would often be time-consuming to figure out, and I'm not sure the time spent is proportional to the learning I'd get from it. Seeing that these memory locations map to sprites, which in turn map to the video display, is an example of something that might take a while to explore and learn, but the LLM can tell me instantly.
I think the difficulty I have is that I don't think it's all that straightforward to assess how it is exactly that I came not just to _learn_, but to _understand_ things. As a result, I have low confidence in knowing which parts of my understanding were the result of different kinds of learning.
Learning things the hardest way possible isn't always the best way to learn.
In a language context: Immersion learning where you "live" the language, all media you consume is in that language and you just "get" it at some point, you get a feel for how the language flows and can interact using it.
vs. sitting in a class, going through all the weird ways French words conjugate and their completely bonkers number system. Then you get tested if you know the specific rule on how future tenses work.
Both will end up in the same place, but which one is better depends a lot on the end goal. Do you want to be able to manage day-to-day things in French or know the rules of the language and maybe speak it a bit?
I'd say this is similar to working with assembly vs c++ vs python. Programming in python you learn less about low level architecture trivia than in assembly, but you learn way more in terms of high level understanding of issues.
When I had to deal with/patch complex c/c++ code, I rarely ever got a deep understanding of what the code did exactly - just barely enough to patch what was needed and move on. With help of LLMs it's easier to understand what the whole codebase is about.
The most brilliant programmer I know is me three years ago. I look at code I wrote and I'm literally wondering "how did I figure out how to do that -- that makes no sense, but exactly what is needed!"
Turns out, that is also past me. In fact, often the incredible code that brilliant me wrote, which I don't understand now, is also the code that reckless me wrote that I now need to fix/add to -- and I have no idea where to start.
"Building software" is a bit too general, though. I believe "Building little web apps for my son's school" has gotten at least 10x easier. But the needle has not moved much on building something like Notion, or Superhuman, or Vercel, or <insert name of any non-trivial project with more than 1000 man-hours of dev work>.
Even with perfect prompt engineering, context rot catches up to you eventually. Maybe a fundamental architecture breakthrough will change this, but I'm not holding my breath.
Yeah, that's not comparable to the kinds of highly complex internal systems I worked with at Fortune 1xx companies, particularly the regulated ones (healthcare). The whole "my son's school" thing is very nice, and it's cool you can knock that out so fast, but it's nothing at all like the environments I worked in, particularly the politics.
Well, because no self-interested decision maker in any company of size is ever going to trust their business to an unknown company run as a one-person operation.
And why would the benefits of being able to code faster accrue to a small independent developer over a large company that already has an established reputation and a customer base?
“No one ever got fired for buying Salesforce”.
I once had influence over the buying decision to support an implementation I was leading. I found this perfect SaaS product by a one man shop who was local.
Working with my CTO and lawyers, we made a proposal to the founder. We would sign with him and be 70% of his post signing revenue if he agreed to give us our own self hosted instance and put his latest code in escrow with a third party (Green Mountain) and we would have non exclusive rights to use the code (but not distribute it) under certain circumstances.
He never said that companies will trust a one man shop. His point was clearly that people and companies will make products designed for themselves, using LLMs.
Why pay for a piece of software when you really only use 5% of its features, and may still need customizations on top, versus just having somebody internally code a custom solution for your company?
The only benefit of an outside solution is that you can blame an outsider. An internal solution used to be bad because if the person with knowledge of the codebase left, you ended up screwed. But with LLMs and "vibe" coding, there is a disconnect between the code and whoever wrote it anyway, making it easier to later make modifications to that same codebase, using ... LLMs.
We have seen this before with home grown VB apps, excel spreadsheets with VBScript, FoxPro etc. How has that turned out every single time as requirements changed and the number of people dependent on it grew?
I think in a couple of years we are going to see the same type of mess. We are already seeing a bunch of shitty AI companies getting funded with no technical cofounders. Look at a few of the YC companies.
It is happening though internally in businesses I've worked with. A few of them are starting to replace SaaS tools with custom built internal tooling. I suspect this pattern is happening everywhere to a varying level.
Often these SaaS tools are expensive, aren't actually that complicated (or if they are complicated, the bit they need isn't) and have limitations.
For example, a company I know recently got told that the v1 API they relied on from some back-office SaaS tool was being deprecated. V2 of the API didn't have the same features.
Result = a dev spends a week or two rebuilding that tool. It's shipped and in production now. It would have taken a similar amount of time just to work around the API deprecation.
We were paying for Salesforce, then built the features we needed to do the same tracking into our internal tool and got rid of Salesforce to save money and simplify the data internally across departments.
And now you have to spend money on developers for a system that “doesn’t make the beer taste better”. Does it give you a competitive advantage in the market?
We did the same. We replaced a proprietary build system with our own. The SaaS product we used was super expensive, had a very gougy licensing scheme, had a bunch of features that either didn't work for us, or were so overcomplicated, that we ended up not using them. Before the rewrite, we bypassed like 90% of the internal features, and relied on custom scripts to do everything.
Every SaaS feature in my experience ends up being a mess due to having to support a billion use cases; figuring it out is more trouble than it's worth, it might not be able to do what you want, and it might be buggy.
But even if you do all that stuff, you end up with a mess that can be replaced with 5 lines of shell script. And many more people know shell scripting than figuring out the arcane BS that goes on inside that tool.
It's the eternal lowcode story.
> 'doesn’t make the beer taste better'
I'd say it did. Having a CI/CD pipeline where you don't have to wait for other people's builds, the build logic is identical to what's running on dev PCs, and everything is all-around faster, and more understandable (you can read the whole source) makes testing easier, and surprises less frequent.
All in all, turning an hour-long CI/CD turnaround time into 5 minutes or less has been an incredible productivity boost.
We already had Developers and the system in place this was a tiny feature in the scheme of things.
Internally it gives us a competitive advantage of the data being in our system from the beginning of the pipeline through the rest of the system where the data would be needed anyway.
Saved money in the short term. But maintenance costs money. Amazon has all of the money in the world and could easily duplicate everything Salesforce does. Yet they use Salesforce internally.
All the money in the world would not be sufficient to cover the cost of seeing human developers duplicate Salesforce on any reasonable time scale. There are simply not enough developers in existence to see that happen, driving the cost towards infinity.
The idea here, however, is that machine developers are changing the calculus. If you need more machine developers it takes, what, a few days to produce the necessary hardware? Instead of 20+ years to produce the legacy human hardware. Meaning, for all intents and purposes, there is no observable limit to how much software machine can create, driving the cost towards zero.
Yeah, sure, the tech still isn't anywhere near capable enough to reproduce something like Salesforce in its entirety. But it is claimed that it is already there for the most trivial of services. Not all SaaS services are Salesforce-like behemoths. Think something more like patio11's bingo card creator. It is conceivable, however, that technology advancement will continue such that someday even Salesforce becomes equally trivial to reproduce.
Maintenance is not a meaningful cost unless you also want to continually have the software do more and more. That could tip the favour towards SaaS — but only if the SaaS service is in alignment with the same future you wish for. If you have to start paying them for bespoke modifications... Have fun with that. You'll be wishing you were paying for maintenance of your own product instead. Especially when said machines drive the cost of that maintenance to near-zero all the same.
I like your analysis but it seems to imply that at one point we can produce near-infinite amount of software and that this will be welcome.
It will not be. Even in the fairly broken state of affairs we are currently in, most non-technical people I speak to already say that they have too many apps and too many machines with "intelligent" features.
And IMO when we have machines that can crank out a complete-but-better Salesforce, our civilization and race would be in an entirely another level and we would see such things as toys. Who needs that antiquated procurement and tracking expenses software, where's our 174th fusion reactor? What is even that in fact? Oh you mean that nail-sized addon we put on our main processing unit? Yeah we're not interested in ancient software history now. We need more juice to capture those gases around Jupiter for the wireless beaming of energy project! Our DAG-based workflow solver and the 5 AIs around it all said we can't do without it.
...So of course nobody wants to pay programmers. We've been viewed as expensive and unnecessary since the dawn of time. A necessary evil, more or less. But your last paragraph captures why many companies need them -- bespoke solutions. You can only add so many cloud services before your normal staff starts making mistakes on an hourly basis because they have to reconcile data between multiple systems whose vendors will always refuse to make integrations.
And even if many try to have their cake and eat it too -- i.e. have an IT friend they call only for those bespoke enhancements but only pay them during that time and not every month -- then this service will simply become more boutique and expensive, mostly compensating for the lack of salary. You'd do multiple stints for the year that would cover all your expenses and normal lifestyle, it would just not be through a monthly paycheck. Why? Because I think a lot of people will exit programming. So the law of supply and demand will ultimately triumph.
...Or we get a true general AI and it makes all of this redundant in 5 years.
> I like your analysis but it seems to imply that at one point we can produce near-infinite amount of software and that this will be welcome.
It implies that there will be no need to share libraries (which is to say, including things like networked SaaS services). You can have your legions of machine developers create all the code you need.
Let's face it, sharing code sucks for a long list of reasons. We accept it because it is a significantly better value proposition than putting human labor into duplicating efforts, but if that effort diminishes to almost nothing, things start to change in a lot of cases. There are still obvious exceptions, of course. You probably couldn't throw your machine developers at building a Stripe clone. It's far more about human relationships than code. But bingo card creator?
It says nothing about creating software nobody wants or needs.
I know of at least two multi-billion corps that are moving to internal ETL tools instead of Fivetran now, because the cost to maintain internally is much lower and you can customize for cheap. SaaS as a model is at risk without something tying someone down.
The greed/“capture all of the value” mindset of SaaS kills it, because you can infer the cost of delivery in many cases and beat it.
For anything that is billed by the human, O365 is the benchmark. I’m not paying some stupid company $30/mo for some basic process, I use our scale to justify hiring a couple of contractors to build 80% of what they do for $400-600k in a few months. Half the time I can have them build on powerapps and have zero new opex.
Yeah true, the downfall of most SaaS services I used was that they were too careful trying to build too much moat and sabotage any competing efforts.
If they were a little more chill then I'd think they could make much more money. I personally would pay a few services, even as an individual, right now, if I knew I could always get a good database / JSON dump of everything at a 5-minute notice, and build my own thing on top of it.
> It is happening though internally in businesses I've worked with
How many samples do you have?
Which industries are they from?
Which SaaS products were they using, exactly and which features?
> ...a company I know recently got told their v1 API they relied on on some back office SaaS tool was being deprecated. V2 of the API didn't have the same features ... dev spends a week or two rebuilding that tool
Was that SaaS the equivalent of the left-pad Node.js module?
We've got a backend pipeline that does image processing. At every step of the pipeline, it would make copies of small (less than 10MB) files from an S3 storage source, do a task, then copy the results back up to the storage source.
Originally it was using AWS, but years ago it was decided that AWS was not cost effective, so we turned to other partners, OVH and Backblaze.
Unfortunately, the reliability and throughput of both of them isn't as consistent as AWS and this has been a constant headache.
We were going to go back to AWS or find a new partner, but I suggested we just use NFS. So we build nothing, pay nothing, get POSIX semantics back, and speed has gone up 3x. At peak we only copy 40GB of files per day, so it was never really necessary to use S3, except that our servers were distributed and that was the only way anyone previously could think of to give each server the same storage source.
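To make that concrete, here's roughly what a pipeline step shrinks to once the shared storage is just a POSIX mount. This is a sketch: the paths and the processing function are hypothetical stand-ins, not the actual system.

    # Each worker sees the same NFS mount, so a pipeline step is plain file I/O:
    # no S3 client, no retries around flaky object-storage partners.
    from pathlib import Path

    SHARED = Path("/mnt/pipeline")  # hypothetical NFS mount visible to every worker

    def run_image_task(data: bytes) -> bytes:
        # placeholder for the real image-processing step
        return data

    def process(job_id: str) -> None:
        src = SHARED / "incoming" / f"{job_id}.tif"
        dst = SHARED / "processed" / f"{job_id}.tif"
        dst.write_bytes(run_image_task(src.read_bytes()))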
While this isn't exactly what the OP and you are talking about, I think it illustrates a fact: SaaS software was seen as the hammer to all nails, giving you solutions and externalizing problems and accountability.
Now that either the industry has matured, building in-house has gotten easier, or cost centers need to be reduced, SaaS is going to be re-evaluated under the question of 'do we really need it?'
I think the answer to many people is going to be no, you don't need enterprise level solutions at all levels of your company, especially if you're not anywhere near the Fortune 1000.
I ran a shared services org in a Fortune 50. Enterprise costs don’t scale down well, and things that are absolutely essential to supporting 100k people sound insane for 100 people. Our senior leaders would sometimes demand we try and the CFO and I would just eyeroll.
Nobody would hire the JP Morgan IT team to run a dentist practice IT workload. Likewise, AWS can save you money at scale, but if your business can run on 3 2U servers, it should.
Lots of companies make good money selling the equivalent of leftpad for confluence or jira. Anecdotally, that's exactly the kind of stuff that gets replaced with homegrown AI-built solutions at our company
I'm a consultant so I see lots of businesses, it's happening in all of them. I'm not seeing people rip out tools for custom builds to be clear, I just see people solving today problems with custom apps.
I helped a company that is build-averse move off of Fivetran to Debezium and some of their own internal tooling for the same workload; they are paying $40k less a month (yeah, Fivetran just raised their prices again).
Now, that's not exactly the same thing, but their paucity of skills made them terrified to do something like this before, they had little confidence they could pull it off and their exec team would just scoff and tell them to work on other revenue generating activities.
Now the confidence of Claude is hard to shake off of them, which is not exactly the way I wanted the pendulum to swing, but it's almost $500k yearly back in their pockets.
from my perspective, this is exactly where we are. Have you ever watched starter story? There's hundreds of people with 5,6+ apps making 100k+ MRR. The VC model is broken because of it.
Something weird happened to software after the 90s or so.
You had all these small-by-modern-standards teams (though sometimes in large companies) putting out desktop applications, sometimes on multiple platforms, with shitloads of features. On fairly tight schedules. To address markets that are itty-bitty by modern standards.
Now people are like “We’ll need (3x the personnel) and (2x the time) and you can forget about native, it’s webshit or else you can double those figures… for one platform. What’s that? Your TAM is only (the size of the entire home PC market circa 1995)? Oh forget about it then, you’ll never get funded”
It seems like we’ve gotten far less efficient.
I’m skeptical this problem has to do with code-writing, and so am skeptical that LLMs are going to even get us back to our former baseline.
1. Personally I find writing software for the web far more difficult/tedious than desktop. We sure settled on the lowest common denominator
1a. Perhaps part of that is that the web doesn't really afford the same level of WYSIWYG?
2. Is it perhaps more difficult (superlinear) to write one cloud SaaS product that can scale to the whole world, rather than apps for which each installation only needed to scale to one client? Oh and make sure to retain perfect separation between clients
2a. To make everything scale, it's super distributed, but having everything so distributed has a huge cost
3. Some level of DLL hell, but something different (update hell?) I barely do any programming in my free time anymore because I would end up spending almost the whole time needing to deal with some barrage of updates, to the IDE, to the framework, to the libraries, to the build infrastructure
3a. There's always a cost to shipping, to the development team and/or the users. With releases so frequent, that cost is paid constantly and/or unpredictably (from the development or user perspective)
3b. Is there any mental sense of completion/accomplishment anymore or just a never-ending always-accelerating treadmill?
3c. I wish I could find the source but there was some joke that said "software developers are arrogant or naïve enough to think that if you take a marathon and just break it up into smaller parts you can 'sprint' the whole way"
> To make everything scale, it's super distributed, but having everything so distributed has a huge cost
It's more than a huge cost, it's often insane... We are not talking 10x but easily 100x to 1000x. It's like when I see some well-known database makers that scale write about how they can do a million writes per second, ignoring that they rented 1,000 servers for that, each costing $500 per month. That same software is also 10x to 100x slower than a single PostgreSQL database in reads.
So you ask yourself, how many companies do a million writes per second? Few... And how many of those writes could have been reduced by smarter caching / batching? Probably a factor of 10 to 100x...
The thing I like about scalable solutions is that it's way easier to just add a few nodes, versus needing to deal with Postgres replication / master setup when the master node gets moved or needs to be upgraded.
For fun, I wrote my own ART database, and an LSM database, using LLMs... These things do 400 to 500k inserts/second on basic cheap hardware. So wait, why are some companies advertising that they do a million inserts/s on $500k/month hardware? Some companies may need this ability to scale, as they will not run 1,000 servers but maybe 10,000, or more. But 99% of companies will never get anywhere close to 100k inserts/second, let alone a million.
People forget that network latency is a huge thing: the moment you want consistency and need something like Raft, you are paying not 1x the network latency per write but 4x (send write, verify receive, send commit, verify commit, confirm).
Even something as basic as SQLite vs Postgres on the same server can mean a 3x performance difference, simply because of network overhead versus an in-process call. And that network overhead is just local, on the same machine.
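The batching point above is easy to sanity-check with Python's stdlib sqlite3. A sketch (numbers vary wildly by machine, and an on-disk database widens the gap further, since every commit is an fsync):

    # Rough illustration of why batching writes matters: one transaction per
    # row vs one transaction for all rows, in-memory SQLite.
    import sqlite3
    import time

    N = 50_000
    rows = [(i, f"value-{i}") for i in range(N)]

    def bench(label, fn):
        t0 = time.perf_counter()
        fn()
        dt = time.perf_counter() - t0
        print(f"{label}: {N / dt:,.0f} inserts/s")

    def per_row_commits():
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE kv (k INTEGER, v TEXT)")
        for r in rows:
            db.execute("INSERT INTO kv VALUES (?, ?)", r)
            db.commit()  # one transaction per write

    def single_batched_commit():
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE kv (k INTEGER, v TEXT)")
        db.executemany("INSERT INTO kv VALUES (?, ?)", rows)
        db.commit()  # one transaction for everything

    bench("per-row commits", per_row_commits)
    bench("single batched commit", single_batched_commit)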
Part of this is the huge ZIRP-driven salary bubble in the US. If good software engineers were as cheap as good structural engineers, you'd be able to stick three of them in a room for $500k a year and add a part-time manager and they'd churn out line-of-business software for some weird little niche that saves 50 skilled-employee-years per year and costs $20k a seat.
The bubble means that a) the salaries are higher, b) the total addressable market has to justify those salaries, c) everyone cargo cults the success stories, and so d) the best practices are all based on the idea that you're going to hyperscale and therefore need a bazillion microservices hooked up to multiple distributed databases that either use atomic clocks or are only eventually consistent, distributed queues and logs for everything, four separate UIs that work on web/iOS/android/desktop, an entire hadoop cluster, some kind of k8s/mesos/ECS abomination, etc.
The rest of the world, and apparently even the rest of the US, has engineering that looks a little more like this, but it's still influenced by hyperscaler best practices.
Circa 2005 my boss and I would pitch that the relational database + HTML forms paradigm was a breakthrough that put custom software within reach of more customers. For one thing, you could just delete all the InstallShield engineers. Memory safety was also a big problem in the Win95 era: not so much about being hacked, more that application state would get corrupted over time, so you just expected Word to crash once an hour or so.
Yep. Software construction was branded a team sport. Hence, social coding, tool quality being considered more important (good thing for sure), and, arguably, less emphasis on individual skill and agency.
This was in service of a time when tech was the great equalizer, powered by ZIRP. It also dovetailed perfectly with middle managers needing more reports in fast growing tech companies. Perhaps the pendulum is swinging back from the overly collective focus we had during the 2010s.
I would also make the case that software underwent a demographic shift as demand skyrocketed and the barriers to entering the profession dropped, thanks to easier languages and tooling.
80's/90's dev teams were more weird nerds with very high dedication to their craft. Today devs are much more regular people, but there are a lot more of them.
> Something weird happened to software after the 90s or so.
Counterpoint: What might have happened is that we expect software to do a lot more than we did in the 90s, and we really don't expect our software features to be static after purchase.
I agree that we sometimes make things incredibly complex for no purpose in SE, but also think that we do a rose-colored thing where we forget how shitty things were in the 1990s.
> Counterpoint: What might have happened is that we expect software to do a lot more than we did in the 90s, and we really don't expect our software features to be static after purchase.
Outside the specific case of Apple's "magical" cross-device interoperability, I can't think of many areas where this is true. When I step outside the Apple ecosystem, stuff feels pretty much the same as it did in 2005 or so, except it's all using 5-20x the resources (and is a fully enshittified ad-filled disjointed mess of an OS in Windows' case)...
> I agree that we sometimes make things incredibly complex for no purpose in SE, but also think that we do a rose-colored thing where we forget how shitty things were in the 1990s.
... aside from that everything crashes way, way less now than in the '90s, but a ton of that's down to OS and driver improvements. Our tools are supposed to be handling most of the rest. If that improved stability is imposing high costs on development of user-facing software, something's gone very wrong.
You're right that all the instability used to be truly awful, but I'm not sure it's better now because software delivery slowed way down (in general—maybe for operating systems and drivers)
This is exactly what I mean with the rose coloured glasses.
Categorically, I cannot think of a single current software product that existed then, that I would rather be using. 90s browsers sucked, famously. 90s Photoshop is barely useable compared to modern Photoshop. Text editors and word processors are so much better (when was the last time you heard of someone losing all of their work? Now we don't even bother with frequent saving and a second floppy for safety). I can remember buying software in the 1990s and it just didn't install or work, at all, despite meeting all of the minimum specs.
Seriously, go use a computer and software from the 1990s or 2000s, you are forgetting. I'm also not convinced on your assertion that software delivery has slowed down. I get weekly updates on most of my programs. Most software in the 1990s was lucky to get yearly updates...
This needs to be said more. Software used to be so much better, and so was tooling.
While it wasn't perfect, I'd argue software got much worse, and I blame SaaSification and the push for web-based centralization.
Take, for example, Docker. Linux had a problem: hosting servers in a standardized and isolated manner.
The kernel already had all those features to make it work, all we needed was a nice userland to take advantage of it.
What was needed was a small loader program that set up the sandbox and then started executing the target software.
What we got was Docker, which somehow came with its own daemon (what about cron, systemd etc), way of doing IPC (we had that), package distribution format (why not leave stuff on the disk), weird shit (layers wtf), repository (we had that as well), and CLI (why).
All this stuff was wrapped into a nice package you have to pay monthly subscription fees for.
Duplicating and enshittifying standard system functions, what a way to go.
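For what it's worth, that "small loader" more or less exists in util-linux already. A minimal sketch (needs root, and /srv/rootfs is a hypothetical prepared root filesystem; real isolation would need more, like cgroups and mount setup):

    # Fresh namespaces plus a chroot: no daemon, no image format, no layers.
    import subprocess

    subprocess.run(
        [
            "unshare",
            "--mount", "--uts", "--ipc", "--net", "--pid", "--fork",
            "chroot", "/srv/rootfs", "/bin/sh",
        ],
        check=True,
    )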
I have seen it commonly cited (although I haven't bothered to check the actual sources, mostly because I believe they should be taken with a grain of salt) that developers spend somewhere in the ballpark of 50-60% of their time doing "coding work": writing the code, thinking about the solution in technical terms, reading code reviews. The rest is meetings, coordination, administrative tasks, being blocked, or whatever else you can think of. Even if, wishfully thinking, the cost of "coding work" fell by 90%, "software engineering" as a whole would still cost roughly half of what it currently does: the 40-50% of non-coding work remains, plus a residual 5-6% of coding. Headlines like these are alarmist and have little substance.
Edit: I also feel stumped as to why so many people give in to the hype that LLMs are good at coding when they can't even do the seemingly simple task of plain-English summarization accurately, as evidenced in https://www.youtube.com/watch?v=MrwJgDHJJoE. If the AI summarizes the code in its own context incorrectly, then it will not be able to write it correctly either.
> Had the cost of building custom software dropped 90%
It definitely has for me. I'm easily creating tools and utilities every week that I never would've attempted in the past.
> This is as if writing down the code is not the biggest problem, or the biggest time sink, of building software.
Lots of people can think logically and organize a process flow, but don't know all the ridiculous code incantations (and worse development and hosting environment details) to turn their plans into tools.
It's trivial to one-shot all kinds of impressive toys in Gemini now, but it's going to be an even bigger deal when Google adds some type of persistent data storage. It will be like the rebirth of a fully modern Microsoft Access.
> Had the cost of building custom software dropped 90%, we would be seeing a flurry of low-cost, decent-quality SaaS offering all over the marketplace, possibly undercutting some established players.
Aha. Are developers finally realizing that just writing code doesn't make a business? We actually have a ton of SaaS companies being born right now but they're not making headway, because functionality and good code don't necessarily mean good businesses. Building a business is hard.
Most SaaS used to be killed by bespoke software engineers that would build some custom thing, and it was integrated perfectly into the legacy system.
Then all those people decided to be managers and go on "i dont care" autopilot mode and hired a bunch of teens that still do care, to some extent. But those teens suck at it, and the old guys just don't really care anymore.
Now with agentic code, instead of "buy splunk" or "buy jira" or whatever thing they are trying to do, they have one of those "teens now in their mid twenties" that are SUPER excited about Agentic flows, either write an agentic tool or simply use an agentic tool to code up the 300 lines of code that would replace their need for a Jira or a Splunk or whatever. Since most people only use 5% of the total features of any product, there's no reason to buy tools anymore, just build it for a fraction of the cost.
I don't know if the above is where we're at right now, but it's coming.
Creating code sprawl, weird ball of twine systems etc until someone says, enough, we will just buy this SAAS solution which integrates it all. Rinse, repeat.
> Had the cost of building custom software dropped 90%, we would be seeing a flurry of low-cost, decent-quality SaaS offering all over the marketplace, possibly undercutting some established players.
Don't forget the second-order effect of clients deciding they could do it in-house.
In fact that is where AI could win. An in-house system only needs to serve the needs of one customer, whereas the SaaS has to be built for the imagined needs of many customers -- when you're lucky you can "build one to throw away" and not throw it away.
You are saying there aren't more low-cost alternatives coming out.
You also say writing code isn't the big problem (which I agree with).
But both can be true, and in fact the reason the first is true is because the second is true! You aren't seeing the alternatives because marketing is hard. People generally don't care about new products and aren't willing to save a little bit of money by risking their time on something new.
I mean, we have had the tech to crank out some little app for a long time. The point of the SaaS used to be that you had a neck to strangle when things went south. I guess these days that's just impossible anyhow and the prices aren't worth it, so we're rediscovering that software can be made instead of bought?
There have been a lot of little blogs about "home cooking" style apps that you make for yourself. Maybe AI is the microwave meal version.
"We use AI to build the tools because we use them in cursor or Visual Studio or code or wherever else people are making our stuff. I use AI a bunch." https://37signals.com/podcast/listener-questions/
"Today we’re introducing Fizzy. Kanban as it should be, not as it has been.
[...] we’ll host your account for just $20/month for unlimited cards and unlimited users. [...] And here’s a surprise... Fizzy is open source! If you’d prefer not to pay us, or you want to customize Fizzy for your own use, you can run it yourself for free forever." https://x.com/jasonfried/status/1995886683028685291
People vibe one-off solutions for themselves all the time. They just don't have the desire to productionalize them. Frankly, product knowledge is something LLMs are not that good at
Same. I hate doing mobile coding, but just in the last few months I AI-coded 3 apps specifically for my needs. They'll never get released publicly, because they'd need polish and features that I don't care about personally. They potentially replace some SaaS too.
It can. It's just not needed for my own apps. It would be needed for a public release and I'm just... not interested in that enough. It would cost me time and likely never get enough return.
Very specific training app for guitar with spaced repetition, automated message forwarder (all good ones demand subscription), and something very specific to me.
With low-code solutions like PowerApps I bang out stuff like this all the time. If your use case is limited enough, it makes lazy developers very productive.
The crap I build _replaces_ someone else's SaaS (or free open source) product.
They solve my exact problem and nothing else and they follow the ways I like to use my software, with no fancy Dockerised WebUIs etc.
I have exactly zero intention of putting any of that shit out there as any kind of service with user accounts and billing and all of the associated stress. A few of them might be something I could sell as a SaaS offering, but I'm not interested in it at all.
Most of them are on my Github though for anyone to get and use as they see fit, but then it's up to them if the vibe coded program does something it shouldn't :)
Astute observation. From where I sit, the market (at least for business software; I am not very familiar with the consumer market) seems to be wide open, and businesses in the 5 - 200 employee range seem to be particularly underserved.
The marketplace for software for single-owner shops or 1-5 employee size places does seem to be quite strong, and then there's enterprise software, but small business seems to have a software marketplace that is atrociously bad. Here is the typical thing a prospective customer asks me to fix for them:
- They are using some piece of software that is essential to their business.
- There really isn't much good competition for that software, and it would be a large cost to convert to another platform that also has all the same downsides below.
- The software vendor used to be great, but seems to have been sold several times.
- The vendor has recently switched to a subscription-only model and keeps on raising subscription prices in the 12% or so range every year, and the cost of this has started to become noticeable in their budget.
- They were accustomed to software being a capital investment with a modest ongoing cost for support, but now it's becoming just an expense.
- Quality has taken a nosedive and in particular new features are buggy. Promised integrations seem quite lacking and new features/integrations feel bolted on.
- Support is difficult to get ahold of, and the formerly good telephone support then got replaced by being asked to open tickets/emails and now has been replaced by an AI chatbot frontend before they can even open a ticket. Most issues go unresolved.
There are literally millions of software packages in existence, and the bulk of them by numbers are niche products used by small businesses. (Think of a software package which solely exists to help you write custom enhancements for another software package which is used by a specific sector of the furniture-manufacturing business, to get an example.) The quality of this sector is not improving.
This is a field that is absolutely ripe for improvement. If the cost of building software really were dropping 90%, this would be a very easy field to move into and simply start offering for $6,000 a year the product that your competition is charging $12,000 a year for, for an inferior product. Before you bring up things like vendor lock-in or the pain of migration... why can't you write software to solve those problems, too? After all, the cost of writing a migration tool should be 90% cheaper now, too, right?
barrier to entry is more problematic than anything else
make something decent in the same space as an existing mega-corporation's tool?
prepare to get sued and they also steal your good ideas and implement them themselves because you don't have the money to fight them in court
> Had the cost of building custom software dropped 90%, we would be seeing a flurry of low-cost, decent-quality SaaS offering all over the marketplace, possibly undercutting some established players.
NODS HEAD VIGOROUSLY
Last 12 months: Docusign down 37%, Adobe down 38%, Atlassian down 41%, Asana down 41%, Monday.com down 44%, Hubspot down 49%. Eventbrite being bought for pennies.
They are being replaced by newer, smaller, cheaper, sometimes internal solutions.
Your point was that "they are being replaced by..." not "The market expects them to be replaced by...". The former would only be supported if their businesses were already actively being displaced (which may very well be the case), but the stock market only supports the latter.
All of that being said, I do think what you're describing is happening, or at least will happen. I just don't think people placing bets on it happening counts as evidence.
The former does far less to support your point, because it's only indicative of what people expect to happen. It is not actually evidence that their predictions will come true.
I think the 90/90 rule comes into play. We all know Tom Cargill's quote (even if we've never seen it attributed):
The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
It feels like a gigantic win when it carves through that first 90%… like, “wow, I’m almost done and I just started!” And it ‘is’ a genuine win! But for me it’s dramatically less useful after that. The things that trip up experienced developers really trip up LLMs and sometimes trying to break the task down into teeny weeny pieces and cajole it into doing the thing is worse than not having it.
So great with the backhoe tasks but mediocre-to-counterproductive with the shovel tasks. I have a feeling a lot of the impressiveness depends on which kind of tasks take up most of your dev time.
If your job is pumping out low-effort websites that are essentially marketing tools for small businesses, it must feel like magic. I think the more magical it feels for your use case, the less likely your use case will be earning you a living 2 years from now.
Yeah, I think the more your job demands correctness in novel scenarios the less impressed you are with these shiny demos. I encourage anyone to pause the demo once the thing is generated and stare at what it did. Is it genuinely correct and impressive? Are you impressed because it made a thing generally shaped like what you expected, or because it would be genuinely impressive (or even adequate) if a person did it?
You are right, but the inverse doesn't have to be true. There is a factor that could make the work so cheap that people stop caring about the quality anymore.
Fully agreed, aligns perfectly with my experience hitting the 95% done wall on a solo contract project recently. I still do the majority of work using agentic tools but the multiplier effect feeling evaporated at a certain point as the accumulated tech debt, complexity and scope creep enabled by how “easy” features felt with Claude Code/Codex early on finally caught up to me.
I (probably) still would have used CC heavily with the benefit of hindsight, but with the view that every seemingly "trivial" feature CC adds in the greenfield stage is radioactive tech debt as the pile grows over time, until reaching the point where CC starts being unable to comprehend its own work and I have to plan out tedious large-scale refactors to get the codebase into a state approaching long-term maintainability.
It’s always tempting to start writing code before you really know what you’re going to build because it’s so satisfying and exciting to see an idea take shape. I know I’ve had more than one or two projects where I started writing before I understood the shape of the problem I was solving and ended up a few hours into the project with a useless pile of stupid. It seems like LLMs can lead you much further down that road because it just seems so magically productive.
One thing I've noticed is that Claude is so good at doing things that I've asked for, that later on I realise that I shouldn't have even been doing them because they're stupid or unnecessary but Claude was just cheerleading me on and emoji spamming tick marks so that I didn't realise there was very little purpose to the feature.
Yeah once you start working at the feature level, you’re into product design, and that’s an entirely different realm of working that, IMO, shouldn’t even involve code. Even higher-level software design —e.g. broad stroke architecture like figuring out your data model and how it will be accessed— is better off being done before any significant amount of code gets written. Claude, et al will happily walk you straight off a cliff if you ask it to, and not having that stuff sketched out ahead of time is a most efficient way of accidentally doing that.
> I'm sure every organisation has hundreds if not thousands of Excel sheets tracking important business processes that would be far better off as a SaaS app.
Far better off for who? People constantly dismiss spreadsheets, but in many cases, they are more powerful, more easily used by the people who have the domain knowledge required to properly implement calculations or workflow, and are more or less universally accessible.
Spreadsheets are an incredible tool. They were a key innovation in the history of applications. I love them and use them.
But it's very hard to have a large conventional cell-formula spreadsheet that is correct. The programming model / UI are closely coupled, so it's hard to see what's going on once your sheet is above some fairly low complexity. And many workplaces have monstrous sheets that run important things, curated lovingly (?) for many years. I bet many or most of them have significant errors.
It's astounding how useful and intuitive they are, but my biggest gripe is how easy it is for anyone to mess up calculations, say SUM(<RANGE>), by simply adding one row/column/cell.
I use Google Sheets frequently to track new things that fit into lists/tables, and giving someone else editor access without them knowing a few spreadsheet nuances means I have to recheck and correct things every month or two.
This happened not so many years ago, in a certain small European nation, where official government housing valuation numbers were incorrect for some years due to a flaw in a spreadsheet.
I remember my apartment got a ~10% bump in value one year due to this flaw being fixed (fix didn't apply to all housing, just those who were on floors 5 or above).
I don't think though that a SaaS would have solved anything here.
Counterpoint: if a small part of the process is getting tweaked, how responsive can the team responsible for these apps be? That’s the killer feature of spreadsheets for business processes: the accountants can change the accounting spreadsheets, the shipping and receiving people can change theirs, and there’s no team in the way to act as a bottleneck.
That’s also the reason that so-called “Shadow IT” exists. Teams will do whatever they need to do to get their jobs done, whether or not IT is going to be helpful in that effort.
I've seen many attempts to turn a widely used spreadsheet into a webapp. Eventually, it becomes an attempt to re-implement spreadsheets. The first time something changes and the user says "well in Excel I would just do this..." the dev team is off chasing existing features of Excel for eternity, and the users are pissed because it takes so long and is buggy. Meanwhile, Excel is right there, ready and waiting.
I always see this point mentioned in "App vs Spreadsheet" debates, but no one gives a concrete example. The whole point of using a purpose-built app is to give some structure and consistency to the problem. If people are replicating spreadsheet features, then they needed Excel to begin with, since that is a purpose-built tool for generalizing a lot of problems. It's like saying: well, my notebook and pen are already in front of me, why would I ever bother opening an app? Well, because the app provides some additional value.
It's when the users start taking care of IT issues themselves. Maybe the name comes from the Shadow Cabinet in England?
Where it might not be obvious is that IT in this context is not just pulling wires and approving tickets, but is "information technology" in the broader sense of using computers to solve problems. This could mean creating custom apps, databases, etc. A huge amount of this goes on in most businesses. Solutions can range from trivial to massive and mission-critical.
I think the term is mainly just because it tends not to be very visible/legible to the organization as a whole (and that's probably the main risk of it: either someone leaves and a whole section of the IT infrastructure collapses, or someone sets up something horrifically insecure and the company gets pwned). Especially because most IT departments hate it so there's a strong incentive to keep it quiet (I personally think IT organizations should consider shadow IT a failing of themselves and seek out ways to collaborate with those setting it up or figure out what is lacking in the service they provide to the rest of the company that means they get passed over).
That's quite possible. I've done a certain amount of it myself. A couple of programs that I wrote for the factory 15+ years ago are being used continually for critical adjustment and testing of subassemblies. All told it's a few thousand lines of Visual Basic. Not "clean code" but carefully documented with a complete theory of operation that could be used as a spec for a professionally written version.
My view is that it's not a failing, any more than "software development can't be estimated" is, but a fact of life. Every complex organization faces the dilemma of big versus little projects, and ends up having to draw the line somewhere. It makes the most sense for the business, and for developer careers, to focus on the biggest, most visible projects.
The little projects get conducted in shadow mode. Perhaps a benefit of Excel is a kind of social compromise, where it signals that you're not trying to do IT work, and IT accepts that it's not threatening.
There's a risk, but I think it's minimal. Risk is probability times impact, measured in dollars. The biggest risks come from the biggest projects, just because the potential impact is big. Virtually all of the project failures that threaten businesses come from big projects that are carried out by good engineers using all of the proper methods.
It's where you have processes etc. set up to manage your IT infra, but these very processes often make it impossible or too time-consuming to actually use anything.
The team that needs it ends up managing things itself without central IT support (or visibility, or security etc..)
Think being given a locked-down laptop and no admin access. Either get IT to give you admin access, or buy another laptop that isn't visible to IT and lets you install whatever you need to get your job done.
It's rare that a third-party SaaS can approximate one of these "core sheets", and most of the exceptions have already been explored over the last several decades.
You have to remember that a SaaS, just like shrink-wrap software, reflects someone else's model of a process or workflow, and the model and implementation evolve per the timeline/agenda of its publisher.
For certain parts of certain workflows, where there's a highly normative and robust industry standard, like invoicing or accounting or inventory tracking, that compromise is worthwhile, and we've had both shrink-wrap and SaaS products servicing those needs for a very long time. We see churn in which application is most popular and what its interface and pricing look like, but the domains being served have mostly been constant (mostly only growing as new business lines/fashions emerge and mature).
Most of the stuff that remains in a "core sheet" could benefit from the attention of a practiced engineer who could make it more reliable and robust, but almost always reflects that the represented business process is somehow peculiar to the organization. As Access and FoxPro and VBA and Zapier and so many tools have done before, LLM coding assistants and software building tools offer some promise in shaking some of these up by letting orgs convert their "core sheets" to "internal applications".
But that's not an opportunity for SaaS entrepreneurs. It's an opportunity for LLM experts to try to come in and pitch private, bespoke software solutions for a better deal than whatever the Access guy had promised 20 years ago. Because of the long-term maintenance challenges that still plague code that's too LLM-colored, I wouldn't want to be that expert pitching that work, but it's an opportunity for some ambitious folks for sure.
> a lot of core sheets I see in businesses need more structure round them
We had this decades ago; it was called dBase, and FoxPro (pre-Microsoft) was great too. Visual FoxPro and MS Access were a brutal downgrade of every good aspect of it.
Imagine if today some startup offered a full-stack(TM) platform that included an IDE, a language with SQL-like features, a visual UI designer, and a database; generated small standalone binaries; was performant; and was smaller than most web homepages.
There are modern options, like Servoy or Lianja, but they're too "cloudy" to be considered equivalents.
Edit: seems like there's OpenXava too, but that is Java-based, too hardcore for non-professional programmers IMO. The beauty of xBase was that even a highschooler could whip out a decent business application if the requirements were modest.
Programming in a spreadsheet is an anti-pattern. Does anyone review your workflow? Write tests for it? Use a real programming language; a notebook at least.
Streamlit apps or similar are doing a great job at this where I'm at.
As simple to build and deploy as Excel, but with the right data types, the right UI, the right access and version control, the right programming language that LLMs understand, the right SW ecosystem and packages, etc.
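A minimal sketch of what one of these looks like (assumes streamlit and pandas are installed; the CSV schema and column names are made up for illustration):

    # app.py -- run with: streamlit run app.py
    # Spreadsheet-replacement sketch: typed columns, and the calculation lives
    # in one reviewable place instead of being scattered across cells.
    import pandas as pd
    import streamlit as st

    st.title("Cost tracker")

    uploaded = st.file_uploader("Upload a cost CSV", type="csv")
    if uploaded is not None:
        df = pd.read_csv(uploaded)
        df["total"] = df["unit_cost"] * df["quantity"]  # hypothetical columns
        st.dataframe(df)
        st.metric("Grand total", f"{df['total'].sum():,.2f}")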
Spreadsheets are powerful, but often abused. They are great for economics but horrible for logic.
Most medium to large complex spreadsheets are better implemented in a high level programming language.
Spreadsheets seem useful for people who are scared of programming syntax, but they quickly become so unmaintainable and janky that I believe it's almost always easier to just start by learning to program already.
Spreadsheets are absolutely the right solution for a great many problems. The important thing to recognize is when a problem has outgrown a spreadsheet solution. That’s usually when you start to use a spreadsheet as a database, or when it has more than a handful of users.
It’s a rare spreadsheet that survives its original creator.
Let's not forget: it's pretty unlikely that two orgs come up with the same administration/data-analysis needs for which they use those spreadsheets, so most of those proposed SaaS applications would have just one customer.
I disagree. I have seen many such Excel applications in many companies, particularly in the finance and controlling departments, that were delivering key numbers to the C-suite, which would never have allowed the sheets, applications or calculations out of the house, let alone to a cloud service or an out-of-company consultant/programmer. There were always backups managed by the department themselves, and in one case even entry to the room with the relevant machine was restricted to authorised personnel only. If I have learned anything in my 30 years of consulting, it's that you always take Finance at its word when they tell you No to your SaaS or ERP solution.
And often they are unmaintainable because the original author left the company and the users don't really know what the spreadsheet does, which leads to unrecognized bugs and errors, especially in spreadsheets with lots of data.
Articles like this seem to keep highlighting a fundamental disconnect between what software teams really do vs what the people "managing" software teams a couple of layers above think those teams actually do.
The people up in the clouds think they have a full understanding of what the software is supposed to be; that they "own" the entire intent and specification in a few ambiguously worded requirements, some loose constraints and, being generous, a very incomplete understanding of the system dependencies. They see software teams as an expensive cost center, not as the true source of all their wealth and power.
The art of turning that into an actual software product is what good software teams do; I haven't yet seen anything that can automate that process away or even help all that much.
These kind of future prediction posts keep coming, and I'm tired of them. Reality is always more boring, less extreme, and slower at changing, because there are too many factors involved, and the authors never account for everything.
Maybe we should collect all of these predictions, then go back in 5-10 years and see if anyone was actually right.
Despite a couple forward-looking statements, I didn’t read this as a prediction. It seems more of a subjective/anecdotal assessment of where things are in December 2025. (Yes, with some conjecture about the implications for next year.)
Overall, it echoes my experience with Claude Opus 4.5 in particular. We've passed a threshold (one of several, no doubt).
Just to test out the OP article's theory, I was about to write some unit tests, so I decided to let Opus 4.5 have a go. It did a pretty good job, but I spent probably as much time parsing what it had done as I would have spent writing the code from scratch. I still needed to clean it up, and of course, unsurprisingly, it had made a few tests that only really exercised the mocking it had set up. The kind of mistake I wouldn't be caught dead sending in for peer review.
I'm glad the OP feels fine just letting Opus do whatever it wants without a pause to look under the covers, and perhaps we all have to learn to stop worrying and love the LLM? But I think really, here and now, we're witness to just another hype article written by a professional blogger and speaker, who's highly motivated to write engagement bait like this.
That is the thing... how long ago did we get agent mode? In Copilot, that thing is only 7 months old.
Things evolve faster than people realize... Agent mode, then came MCP servers, sub-agents, now it's RAG databases allowing the LLMs to get data directly.
The development of LLMs looks slow, but with each iteration things get improved. Ask yourself: what would the result of those same tests have been 21 months ago, with Claude 3.0? How about Claude 4.0, which is only 8 months old?
Right now Opus 4.5 is darn functional. The issue is more often not the code that it writes; more often it gets stuck on "it's too complex, let me simplify it", with the biggest issue often being context capacity.
LLMs are still bad at deeper tasks, but compared to the last LLMs, the jumps have been enormous. What about a year from now? Two years? I have a hard time believing that Claude 3 was not even 2 years but just 21 months ago. And we considered that a massive jump up, useful for working on a single file... Now we are throwing entire codebases at it, and it is darn good at debugging, editing, etc.
Do i like the results? No, there are lots of times that the results are not what "i wanted", but that is often a result of my own prompting being too generic.
LLMs are never going to really replace experienced programmers, but boy is the progress scary.
I can't say my opinion has changed. It didn't give me results that were more exciting or useful than Sonnet's. Is it worth 3x the price per token? I'm not so sure.
(It wasn't clear in my comment, but I already use agents for my code. I just think the OP's claims are overblown.)
This is only true if the code it wrote is something you can just sit down and write without any reference.
Now do something like I did: An application that can get your IMDB/Letterboxd/Goodreads/Steam libraries and store them locally (own your data). Also use OMDB/TMDB to enrich the movie and TV show data.
If you can write all that code faster than read what Claude did, I salute you and will subscribe to your Substack and Youtube channels :)
Oh btw, neither Goodreads, IMDB nor Letterboxd has a proper export API, so you need Playwright-style browser automation to do it. Just debugging that mess when writing all the code yourself is going to take hours and hours.
The Steam API access Claude one-shotted (with Sonnet 3.7, this was a long time ago) as well as enriching the input data from different sources.
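For reference, the Steam part really is the easy bit; the public Web API has a documented GetOwnedGames endpoint. A rough sketch (requests assumed installed; the key and Steam ID are placeholders you'd supply):

    # Fetch an owned-games library via Steam's Web API and print top titles.
    import requests

    resp = requests.get(
        "https://api.steampowered.com/IPlayerService/GetOwnedGames/v1/",
        params={
            "key": "STEAM_KEY",      # placeholder: your Web API key
            "steamid": "STEAM_ID",   # placeholder: 64-bit Steam ID
            "include_appinfo": 1,    # include game names, not just app IDs
            "format": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    games = resp.json()["response"].get("games", [])
    for g in sorted(games, key=lambda g: -g.get("playtime_forever", 0))[:10]:
        print(g["name"], g["playtime_forever"] // 60, "hours")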
> If you can write all that code faster than read what Claude did
I think you need to parse my comment a little more keenly ;)
> The Steam API access Claude one-shotted (with Sonnet 3.7, this was a long time ago) as well as enriching the input data from different sources.
This story isn't different to the usual "I made a throw-away thing with an LLM and it was super useful and it took me no time at all". It's very different to the OP stating you can throw this at mature or legacy codebases and reduce labour inputs by 90%. If the correctness of the code matters (which it will as the codebase grows), you still need to read the code, and that still takes human eyes.
People posting stuff like this are clearly not doing it; they’re reading LinkedIn posts, toying with the tech and projecting what it looks like at scale.
That’s fair; but it’s also misguided.
Either try it yourself, or go and watch people (eg. @ArminRonacher) doing this at a reasonable scale and you can ground yourself in reality, instead of hype.
The tldr is: currently it doesn’t scale.
Not personally. Not at Microsoft. Not at $AI company.
Currently, as the “don’t change existing behaviour” constraint list goes up, the unsupervised agent capability goes down, and since most companies / individual devs don’t appreciate “help” that does something while breaking something else, this causes a significant hole in the “everyone can 10x” bed time story.
As mentioned in other threads; the cost and effort to produce new code is down, but the cost of producing usable code is, I guess, moderately on par with existing scaffolding tools.
Some domains where the constraints are more relaxed like image generation (no hands? Who cares?) and front end code (wrong styles? Not consistent? Who cares?) are genuinely experiencing a revolution like the OP was talking about.
…but generalising it to “all coding” appears to be like the self driving car problem.
Solvable? Probably.
…but a bit harder than the people who don't understand it, or haven't tried to solve it themselves, thought or blogged or speculated.
Probably, there’s a much smaller set of problems that are much easier to solve… but it’s not happening in 2026; certainly not at the scale and rate the OP was describing.
You’ll notice, neither of us have self driving cars yet.
(Despite some companies providing cars that do drive themselves into things from time to time, but that’s always “user error” as I understand it…)
In your heart you either believe something or you don’t. I am happy to live in a world where so many people follow the courage of their convictions, even if they sound insane or uncomfortable.
Yeah yeah, there is this guy with a weird moustache with some crazy ideas that we are being held down by this other group of people. We should definitely follow him. He sounds crazy but he seems so convincing. And look at the cool insignia and symbols! Did you know this salute goes back to the Romans? - You, circa 1920.
I contracted briefly on a post-LLM-boom Excel modernization project (which ended up being consulting mainly, because I had to spend all my time explaining key considerations for a long-running software project that would fit their domain).
The company had already tried to push 2 poor data analysts who kind of knew Python into the role of vibe coding a Python desktop application that they would then distribute to users. In the best case scenario, these people would have vibe coded an application where the state was held in the UI, with no concept of architectural separation and no prospect of understanding what the code was doing a couple months from inception (except through the lens of AI sycophancy), all packaged as a desktop application that would generate Excel spreadsheets they would then send to each other via email (for some reason, this is what they wanted - probably because it is what they know).
You can't blame the business for this, because there are no technical people in these orgs. They were very smart people in this case, doing high-end consultancy work themselves, but they are not technical. If I tried to do vibe chemistry, I'm sure it would be equally disastrous.
The only thing vibe coding unlocks for these orgs by themselves is to run headfirst into an application which does horrendous things with customer data. It doesn't free up time for me as the experienced dev to bring the cost down, because again, there is so much work needed to bring these orgs to the point where they can actually run and own an internal piece of software that I'm not doing much coding anyway.
I love the hand drawn chart. Apparently "Open Source" was invented around 2005, which significantly reduced development cost, then AWS was invented in 2011 or so and made development even cheaper, but then, oh no, in 2018 "complexity" happened and development became harder!
I don't read this as when open source was invented, but when it happened for the corporate world. In 2002 it was a very reasonable choice for $BIG_COMPANY to use a proprietary web server, e.g. IIS. In 2008 that would have been really weird.
But why did that make development cheaper? An enterprise copy of Windows with IIS cost maybe a thousand bucks, right? Maybe there were more costs, my knowledge is, y'know, 23 years out of date.
You decide you need a web server. Ask management chain for approval. Ask IT dept for approval. Ask finance for approval for the expense. Contact Microsoft sales. Buy it.
Now you can start developing on it…
With open source it’s not just the cost of software you save, but also potentially all the other bureaucracy that you save due to not having to pay money to do something. You also get a lot of transparency on the technical side about the products you may choose to use.
If MySQL and PostgreSQL had been acceptable choices 25 years ago, our company at the time would've saved SO MUCH money that instead went to fund Larry Ellison's yacht(s).
Both existed, but not in a way anyone could sell to a) customers b) C-staff making the final call.
> written an entire unit/integration test suite in a few hours
It’s often hard to ground how “good” blog writers are, but tidbits like this make it easy to disregard the author’s opinions. I’ve worked in many codebases where the test writers shared the author’s sentiment. They are awful, and the tests are at best useless and often harmful.
Getting to this point in your career without understanding how to write effective tests is a major red flag.
I've used LLMs to help me write large sets of test cases, but it requires a lot of iteration, and the mistakes they make are both very common and insidious.
Stuff like reimplementing large amounts of the code inside the tests because testing the actual code is "too hard", spending inordinate amounts of time covering every single edge case of some tiny bit of input processing unrelated to the main business logic, mocking out the code under test, changing failing tests to match obviously incorrect behavior... basically all the mistakes you'd expect from totally green devs who don't understand the purpose of tests.
It saves a shitload of time setting up all the scaffolding and whatnot, but unless they very carefully reviewed and either manually edited or iterated a lot with the LLM I would be almost certain the tests were garbage given my experiences.
(This is with fairly current models too, btw - mostly Sonnet 4 and 4.5. Also, in fairness to the LLM, a shocking proportion of tests written by real people that I've read are also unhelpful garbage; I can't imagine the training data is of great quality.)
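To make the most insidious of those concrete, here is an invented pytest sketch (the function and values are made up for illustration) of a test that re-derives its expected value with the same logic as the code under test, so it can never fail, next to one that actually pins behavior:

    # Invented illustration: the "mirrored logic" test antipattern.
    def apply_discount(price: float, loyal: bool) -> float:
        rate = 0.1 if loyal else 0.0  # imagine this rate is subtly wrong
        return round(price * (1 - rate), 2)

    def test_apply_discount_mirrored():
        # Bad: the expectation is computed with the same (possibly buggy)
        # formula, so any bug in apply_discount is invisibly mirrored here.
        price, loyal = 100.0, True
        expected = round(price * (1 - (0.1 if loyal else 0.0)), 2)
        assert apply_discount(price, loyal) == expected

    def test_apply_discount_known_values():
        # Better: hard-coded expectations a human actually verified.
        assert apply_discount(100.0, loyal=True) == 90.0
        assert apply_discount(100.0, loyal=False) == 100.0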
Good write-up. I don't disagree with any of his points, but does anybody here have practical suggestions on how to move forward and think about one's career? I've been a frontend developer (with a little full stack) for a few years now, and much of the modern landscape concerns me, specifically with how I should be positioning myself.
I hear vague suggestions like "get better at the business domain" and other things like that. I'm not discounting any of that, but what does this actually mean or look like in your day-to-day life? I'm working at a mid-sized company right now. I use Cursor and some other tools, but I can't help but wonder if I'm still falling behind or doing something wrong.
Does anybody have any thoughts or suggestions on this? The landscape and horizon just seems so foggy to me right now.
I think it's about looking at what you're building and proactively suggesting/prototyping what else could be useful for the business. This does get tricky in large corps where things are often quite siloed, but can you think "one step ahead" of the product requirements and build that as well?
I think regardless if you build it, it's a good exercise to run on any project - what would you think to build next, and what does the business actually want. If you are getting closer on those requests in your head then I think it's a positive sign you are understanding the domain.
I think you're right about trying to stay one step ahead of product requirements. Maybe my issue here is that I'm looking for another "path" where one might not exist, at least not a concretely defined one. From childhood to now, things were set in front of me and I just sort of did them, but now it feels like we're entering a real fog of war.
It would be helpful, as you suggest, to start shifting away from "I code based on concrete specs" to "I discover solutions for the business."
Thanks for the reply (and for the original essay). It has given me a lot to chew on.
1. Use the tools to their fullest extent, push boundaries and figure out what works and what doesn't
2. Be more than your tools
As long as you + LLM is significantly more valuable than just an LLM, you'll be employed. I don't know how "practical" this advice is, because it's basically what you're already doing, but it's how I'm thinking about it.
Let's say LLMs add 50 "skill points" to your output. Developer A is at 60 skill points in terms of coding ability, developer B is at 40. The differential between them looks large. Now add LLMs. Developer A is at 110 skill points, developer B is at 90. Same difference, but now it doesn't look as large.
The (perceived, alleged) augmentation by LLMs makes individual differences in developer skill seem less important. From the business's perspective, you are not getting much less by hiring a less skilled developer vs. hiring a more skilled one, even if both of them would be using LLMs on the job.
Obviously, real life is more complicated than this, but that's a rough idea of what the CEO and the shareholders are grappling with from a talent acquisition standpoint.
Don't chase specific technologies, especially not ones driven by for-profit companies. Chase ideas, become great in one slice of the industry, and the very least you can always fall back on that. Once established within a domain, you can always try to branch out, and feel a lot more comfortable doing so.
Ultimately, software is for doing something, and that something can be a whole range of things. If you become really good at just a slice of that, things get a lot easier regardless of the general state of the industry.
Thanks for the response. When you say "one slice of the industry", is the suggestion to understand the core business of whatever I'm building instead of being the "specs to code" person? I guess this is where the advice starts to become fuzzy and vague for me.
It's always been foggy. Even without AI, you were always at risk of having your field disrupted by some tech you didn't see coming.
AI will probably replace the bottom ~30-70% (depends who you ask) of dev jobs. Don't get caught in the dead zone when the bottom falls out.
Exactly how we'll train good devs in the future, if we don't give them a financially stable environment to learn in while they're bad, is an open question.
Use the best tools, the lowest tier of Claude Code is perfect for the stuff you do at home in the evenings and weekends. It's also by far the best at being a "pair coder" as it's chatty and tells you what it's doing and doesn't get confused if you hit ESC and tell it to do something else.
Build your own tools: need a small utility? Use an LLM to create it with you.
Create LLM-focused tools and adjust your workflows to be LLM-friendly.
I personally have a Taskfile setup that follows the same formula regardless of language. "task build" runs lint+test+build. Test and lint are kinda self-evident. All output is set to minimum, only errors are verbose (don't waste context on fancy output).
I also have tools for LLMs to use to find large code files, large and overly complex functions etc.
All project documentation lives in docs/ as markdown files with Mermaid charts.
This way I can just have the general "how to use a taskfile" instructions in my global WHATEVER.md and it'll work in every project.
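As a rough illustration, a Taskfile along those lines might look like the sketch below (go-task syntax; the specific lint/test/build commands are per-project assumptions, not part of my actual setup):

    # Sketch of a minimal Taskfile.yml; commands are illustrative only.
    version: '3'

    tasks:
      lint:
        cmds:
          - ruff check --quiet .    # errors only, keep output terse
      test:
        cmds:
          - pytest -q
      build:
        deps: [lint, test]          # "task build" runs lint+test first
        cmds:
          - python -m build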
Learn project management. Working with LLMs is exactly like project-managing a bunch of smart and overeager junior coders who want to use every trick and pattern they learned at school for every tiny shell script.
Do a few test projects where you just pretend you're a non-technical project lead who knows WHAT you want but not HOW you want it done. Plan the project, split it into tasks (GitHub tasks or beads[0] both work pretty well). Then have the LLM(s) tackle the tasks one by one and test the end result like a non-technical PM would do in a demo. Comment, critique and ask them to change stuff that doesn't work.
If you can afford it, bring in an outside consultant (Codex or Gemini), both of which are _really_ good at evaluating large codebases for duplication, test coverage, repetition, bad patterns etc. Give their responses verbatim to Claude and ask what it thinks about them.
Working with LLMs is a skill; you just need to use it to get a feel for it. It's not a science, more like an art. For example, I can "feel" when Claude is doing its thing and being either overeager or trying to complete a task while ignoring the burning pile of unit tests it leaves behind, and interrupt it before it gets too far.
Another thing I'd suggest: look into and use non-coding AI tools that improve productivity. For example:
Zoom meeting transcriptions and summaries or Granola. A lot of context is lost when you take manual notes in meetings. If you use a tool that turns a meeting into notes automatically, you can use those notes to bootstrap a prompt/plan for agents.
My suggestion would be to move to a higher level of abstraction, change the way which you view the system.
Maybe becoming full stack? Maybe understanding the industry a little deeper? Maybe analyzing your company's competitors better? That would increase your value for the business (a bit of overlap with product management though). Assuming you can now deliver the expected tech part more easily, that's what I'd do.
As for me, I've moved to a permanent product management position.
My $.02: show you can tackle harder problems. That includes knowing which problems matter. That happens by learning a "domain", versus just learning a tool (e.g. web development) within a domain.
Change is scary, but that's because most aren't willing to change. Part of the "scare" is the fear of lost investment (e.g. picking the wrong major or career). I can appreciate that, but with a little flexibility, that investment can be repurposed quicker today than pre-2022, thanks to AI.
AI is just another tool, treat it like a partner not a replacement. That can also include learning a domain. Ask AI how a given process works, its history, regulations, etc. Go confirm what it says. Have it break it down. We now can learn faster than ever before. Trust but verify.
You are using Cursor, that shows a willingness to try new things. Now try to move faster than before, go deeper into the challenges. That is always going to be valued.
Also blind leading the blind here but I see two paths.
1) Specialize in product engineering, which means taking on more business responsibility. Maybe it means building your own products, or maybe it means trying to get yourself into a more customer-facing or managerial role? I'm not very sure. Probably do this if you think AI will be replacing most programmers.
2) Specialize in hard programming problems that AI can't do. Frontend is probably most at risk, low level systems programming least at risk. Learn Rust or C/C++, or maybe backend (C#/Java/Go) if you don't want to transition all the way to low level systems stuff.
That being said I don't think AI is really going to replace us anytime soon.
> but wonder if I'm still falling behind or doing something wrong.
This is normal with all that is going on in the industry and the AI/ML hype. But, one should not allow that to lead to "analysis paralysis".
> specifically with how I should be positioning myself. ... Does anybody have any thoughts or suggestions on this?
You have a stable job; hence your entire focus (for now) should be to "grow" in your job/organization. This means taking on more responsibilities, both technical and non-technical, and demonstrating your long-term commitment to management. On the technical side, start with "full stack development", both frontend and backend, so you can contribute end-to-end across the entire product line. Learn and use all available tools (AI and otherwise) to demonstrate independent initiative. Step up for any tasks which might not have an owner (e.g. CI/CD etc.). Keep your boss/higher-ups informed so as to maintain visibility throughout the organization. Learn more about the problem domain, and interact more with Marketing/Sales so as to become the liaison between Engineering and the rest of the organization/clients.
Generally, all higher management looks for initiative and independent drive, so that they can delegate work with the assurance that it will be taken care of, and that is what you need to provide.
In teams of high performers who have built a lot of mutual trust, code reviews are mostly a formality and a stop gap against the big, obvious accidental blunders. "LGTM!"
I do not know or trust the agents that are putting out all this code, and the code review process is very different.
Watching the Copilot code review plugin complain about Agent code on top of it all has been quite an experience.
No. But most software products are nowhere near that sensitive and very few of them are developed with the level of caution and rigor appropriate for a safety-critical component.
I happily got rid of a legacy application (lost the pitch, another agency now must deal with the shit) I inherited as a somewhat technically savvy person about a year ago.
It was built by real people. Not a single line of AI slop in it. It was the most fragile crap I ever had the misfortune to witness. Even in my wildest vibe-coding-a-prototype moments I was not able to get the AI to produce that amount of anti-patterns, bad shit and code that would have had Hitchcock running.
I think we would be shocked to see what kind of human slop is out there running in production. The scale might change, but at least in this example, if I had rebuilt the app purely by vibe coding, the code quality and the security of the code would actually have improved. Even with the lowest vibe coding effort thinkable.
I am not in any way condoning (is this the right word?) bad practices, or shipping vibe code into prod without very, very thorough review. Far from it. I am just trying to provide a counterpoint to the narrative: at least in the medium-sized businesses I got to know in my time consulting/working in agencies, I have seen quite a metric ton of slop that would make coding agents shiver.
DigitalOcean version 1 was a duct-taped-together mash of bash, cron jobs and perl; 2 people out of 12 understood it, 1 knew how to operate it. It worked, but it was insane, like really, really insane. 0% chance the original ChatGPT would have written something as bad as DO v1.
To me, built and written are not the same. Built: OK, maybe that's an exaggeration. But could an early "this is pretty good at code" LLM have written DigitalOcean v1? I think it could, yes (no offense, Jeff). In terms of volume of code and size of architecture, yeah, it was big and complex, but it was literally a bunch of relatively simple cron, bash and perl, and the whole thing was very... sloppy (because we were moving very quickly). DigitalOcean as I last knew it (a very long time ago) transformed into a very well-written modern Go shop. (Source: I am part of the "founding team" or whatever.)
AI doesn't overcome the limits of the one giving the input; like with pre-AI-era software, if the input sucks, the output sucks.
What changed is the speed: AI and vibe coding just gave a turboboost to all you described. The amount of code will go parabolic (maybe it already is) and, in the mid-term, we will need even more SWE/SRE/DevOps/security/etc. people to keep up.
I feel like people who write articles like this have never worked at big companies.
My wife works at Shutterstock, first as a SWE, now as a product manager. Most of their tasks involve small changes in 5 different systems. Sometimes in places like Salesforce. A simple ask can be profoundly complicated.
AI has certainly made grokking code and making changes easier. But the real cost of building has not been reduced by 90%. Not even close.
The author "teaches workshops on AI development for engineering teams". This is nothing but a selling post for companies. I don't know what to discuss here honestly, this is more primitive bait than an average video preview picture on YouTube.
This article mentions cost to ship, but ignores that the largest cost of any software project isn't consumed by how long it takes to get to market, but by maintenance and addition of new features. How is agentic coding doing there? I've only seen huge, unmaintainable messes so far.
While this is true, I think some fields like game development may not always have this problem. If your goal is to release a non-upgradable game - an FPS, arcade or single-player title - maintenance may be much less important than shipping.
I'm trying to understand where this kind of thinking comes from. I'm not trying to belittle you, I sincerely want to know: Are you aware that everyone writing software has the goal of releasing software so perfect it never needs an upgrade? Are you aware that we've all learned that that's impossible?
This was basically true until consoles started getting an online element. The up-front testing was more serious relative to the complexity of the games. There were still bugs, but there was no way to upgrade short of a recall.
I'm not saying that this model is profitable in the current environment, but it did exist in a real world environment at one point, making the point that certain processes are compatible with useful products, but maybe not leading edge competitive products that need to make a profit currently.
Agreed. I think a core problem is many developers (on HN) don't realise how "bad" so much human written code is.
I've seen unbelievably complex logistics logic coded in... WordPress templates and plugins to take a random example. Actually virtually impossible to figure out - but AI can actually extract all the logic pretty well now.
Finally, the right question! I would upvote you 1,000 times if I could!
This is why they need a senior/seasoned developer behind them. For things that can simply be learned directly (e.g. from man pages/docs) it rocks, without guidance. For other things it needs guidance.
There are many millions of people writing code… that's way too many to get consistently good quality. You might get lucky and get involved with a codebase which does not make you dizzy (or outright sick), but most of us are not that lucky.
Certainly not more of it now; we have decades and decades of human-written code, if I am understanding the question correctly.
All I am saying is that the "anti-AI" HN crowd literally glorifies human-written code every second of every day here: "AI slop this, AI code unmaintainable that..." I have been a contractor for many years now, usually brought on to fix shit, and human-written code is in the vast majority of cases much worse than AI-generated code. The sample size of the latter is smaller, but my general argument remains. I think people who write these "AI slop" comments should pick their favorite language/framework/... and then go to GitHub and browse through codebases written by humans (ignore commits before xxxxxx) and see if they like what they see :)
I am shocked by the number of people who are dismissive of AI, or have stuck to the whole copy and paste into a chatbot approach to development.
I'm finding this stuff, when given proper guidance, can reproduce experiments I've run incredibly fast. Weeks of typing done in minutes of talking to Claude Code.
In the working world, a lot of the time what matters is getting results, not writing 'perfect' code the way software engineers would like to.
concerns:
- security bugs
- business logic errors that seem correct but are wrong
As long as you have domain experts, I suspect these will gradually go away. Hopefully LLMs can be trained not to do insecure things in code.
> In the working world, a lot of the time what matters is getting results, not writing 'perfect' code the way software engineers would like to.
But you do recognize that one's ability to speedily implement features is dependent on the present quality of a codebase, right? Being a serious practitioner here means balancing active feature development with active tending to the codebase to keep it in a reasonable state, since its quality will tend ever downward otherwise.
In your experiments, do you find agents readily find this balance? I ask genuinely, I have only minimal experience with Cursor.
To be blunt and a bit nihilistic: I get paid to ship features.
Client wants a feature EoW, they get it EoW, they're not paying for a week of extra work for the "quality codebase" feature.
But the good thing is that we've long had objective, automated tooling for code quality checks. We used to use it on humans; now we apply the same tools to AI.
Good unit testing practices, exhaustive linters, .editorconfig etc. force humans AND LLMs to produce code within specific parameters.
If your project doesn't have automated tests and linters, now is the time to add them. Maybe use an LLM to help you with it :)
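As one hedged example, a bare-bones gate in GitHub Actions with Python tooling might look like the sketch below; substitute whatever linter and test runner your stack already uses:

    # Illustrative CI gate: lint and test on every push and pull request.
    name: checks
    on: [push, pull_request]
    jobs:
      checks:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: '3.12'
          - run: pip install ruff pytest
          - run: ruff check .
          - run: pytest -q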
Maintaining a quality code base is more than just having tests and linters. It's about organization, right-sized abstractions, architecture and choosing the right patterns. There is no real way to automate the verification of these things. If an agent farts out a 2,000 LOC feature in a day but bifurcates the code base, duplicates functions or makes awful abstractions, it WILL eventually turn into a big ball of mud.
All that being said, if wielded correctly an LLM can contribute to a healthy repository, but it requires much of the same thought and planning that development pre-LLMs did. I promise you, if you stick with the same code base long enough using your approach and little consideration to its health, it will become a hellish place to build in.
I suspect you haven't worked with agents enough. Start trying! You'll see...
In the age of agents.md files, you direct the agent's style, organization and architectural choices. If you thought you were a coder, and a good one, your skill is useless. You now need to be an architect and a manager.
I'm not dismissive of AI, but I still do the whole "copy and paste" into a chatbot approach, simply because I use it as a boilerplate or research tool where the intent or workflows are already established and targeted, so it doesn't really matter how it writes, since I can "parse" its output quickly - kind of like an advanced version of VS Code saved snippet templates. I never use it to do software design for me, since that actually requires understanding the problem, but I can still use it to research existing stuff, which is pretty cool.
> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool. This would take me, or many developers I know and respect, days to write by hand.
I'm not sure about this. The tests I've gotten out in a few hours are the kind I'd approve if another dev sent them, but they haven't really ended up finding meaningful issues.
Just to be clear, they weren't stupid 'is 1+1=2' type tests.
I had the agent scan the UX of the app being built, find all the common flows and save them to a markdown file.
I then asked the agent to find edge cases for them and come up with tests for those scenarios. I then set off parallel subagents to develop the test suite.
It found some really interesting edge cases running them - so even if they never failed again there is value there.
I do realise in hindsight it makes it sound like the tests were just a load of nonsense. I was blown away with how well Claude Code + Opus 4.5 + 6 parallel subagents handled this.
I keep seeing posts like this so I decided to video record all my LLM coding sessions and post them on YouTube. Early days, I only had the idea on Saturday.
I think the whole software industry has tried to obscure the fact that most companies who hire software engineers are writing exactly the same code as every other company. How many people here have written the same goddamn webapp at the last 3 companies they've been to? Anyone ever wonder why nobody just publishes blueprints to software and licenses that blueprint to a single engineer to customize? Because there's a lot less money in doing that, versus selling a lot more software add-ons/SaaS/etc.
There is no value-add to hiring software engineers to build basic apps. That's what AI will be good for: repeating what has already been written and published to the web somewhere. The commoditized software that we shouldn't have been paying to write to begin with.
But AI won't help you with all the rest of the cost. The maintenance, which is 80% of your cost anyway. The infrastructure, monitoring, logging, metrics, VCS hosting, security scanning, QA, product manager, designer, data scientist, sales/marketing, customer support. All of that is part of the cost of building and running the software. The software engineers that churn out the initial app is a smaller cost than it seems. And we're still gonna need skilled engineers to use the AI, because AI is an idiot savant.
Personally I think 50% cost reduction in human engineers is the best you can expect. That's not nothing, but that's probably like a 10% savings on total revenue expenditure.
That. I was expecting some overview of the last couple of decades in a "There's no Silver Bullet" fashion.
Instead it's some guy that claims it takes a team to make CI/CD for something he can vibe-code in a day, and that agentic code totally solves the complexity problems caused by too much React.
Even if that were true, so we've made development needlessly more complicated, only to gain back the time lost by running numberwang across enough datacenters to fill a small country? You haven't abstracted anything away, Morty, just created another layer of shit.
Oh, if LLMs did actually solve the problem of too much React, that would by itself drop the cost of doing the software the author cares about by close to 90%.
And yes, it's a completely self-imposed problem that many people don't have at all. But LLMs make it worse, and the author is celebrating that it's worse now.
If AI is such a competitive advantage, why are AI companies even trying to sell it? Wouldn't it bring more money to use a bleeding edge internal model and just vibe a couple of facebooks at the fraction of the cost and profit like crazy?
It seems the people who think they can just tell computers to write code for them, also are the people who are inclined to tell other people to build apps for them.
We are hurtling towards a brave new world where only 10% of humans have to work, and the other 90% form the bureaucracy on top.
This is a bit like saying RoR or Django dropped the cost of building web apps by 90% because you could write a blog app with 5 lines of code (IIRC early Rails claim).
So yes, the cost of certain tasks may drop by 90% (though I think that's a high number still), certainly the cost of developing software overall has not dropped by 90%.
I might be able to whip up a script in 30 seconds instead of 30 minutes, but I still have to think of whether I need the script, what exactly it should do, what am I trying to build and how and why, how does it fit with all the requirements, etc. That part isn't being reduced by 90%.
I strongly agree. It may be even more than 90%. For example, yesterday I was able to use Lovable (and Claude Code web) on my phone to build out an almost 1:1 replacement (for my purposes) for an expensive subscription-based workout app: https://strengthquest.lovable.app/
This is simply an unimaginable level of productivity: in one day, on my phone, I can essentially build and replace expensive software. Unreal days we are living in.
I must be holding it wrong then, because I do use Claude Code all the time and I do think it's quite impressive… still, I can't see where the productivity gains go, nor am I even sure they exist (they might, I just can't tell for sure!)
Sure. But am I supposed to still understand that code at some point? Am I supposed to ask other team members to review and approve that code as if I had written it?
I'm still trying to ship quality work by the same standards I had 3 or 5 years ago.
No, not worse code. Wrong code. Code filled with bugs. Code filled with lawsuits too.
Code that makes you look productive this month while you prepare to leave the company, and turns out to be absolute pooopoo the day after you leave.
I think there might be something here! A core of truth about what the future might hold. I can't take this approach right now though. It's not a good approach today.
I am a believer in the new agentic coding tools (I wasn't 6 months ago) but the delays and time it takes to build something haven't really changed even though everyone I know is using them. What I see is what has always been there:
Product doesn't understand the product, because if it were easy to understand then someone else would have solved the problem already and we wouldn't have jobs. This means you need to iterate and discuss and figure things out, just like always. The iterations can be bolder, bigger, and maybe a bit faster, but ultimately software doesn't scale linearly, so a 10x improvement in -individual- capability doesn't translate into a 10x improvement in -organizational- capability.
Let me put it another way. If your problem was so simple you could write a 200 word prompt to fully articulate it then you probably don't have much of a moat and aren't providing enough value to be competitive.
Ya, completely agree. These companies will eventually push these costs to the consumer - might be in 1-2 yrs, but it will eventually happen - and through regulatory capture they'll make it harder and harder to run local AI models, because of "security" reasons.
The author teaches AI workshops. Nothing wrong with that, but I think it should be disclosed here. A lot of money is riding on LLMs being financially successful which explains a lot of the hype.
> Software engineering has got - in my opinion, often needlessly - complicated, with people rushing to very labour intensive patterns such as TDD, microservices, super complex React frontends and Kubernetes.
TDD as defined by Kent Beck (https://tidyfirst.substack.com/p/canon-tdd) doesn't belong in that list. Beck's TDD is a way to order work you'd do anyway: slice the requirement, automate checks to confirm behavior and catch regressions, and refactor to keep the code healthy. It's not a bloated workflow, and it generalizes well to practices like property-based testing and design-by-contract.
This wouldn't be the first time that the cost of software radically dropped. It happened back during the early 1960s for the first time when IBM introduced the System 360, which included backward compatibility for the 1401. Prior to this point, the maximum lifespan of software was tied to that of the computer in question. The software would be re-written for the next architecture, every time a new computer was purchased.
The advent of the PC, and the appearance of Turbo Pascal, Visual Basic, and spreadsheets that could be automated made it possible for almost anyone to write useful applications.
If it gets cheaper to write code, we'll just find more uses for it.
If the cost of building software had dropped 90%, the author wouldn't need to write a blog post. Just undercut the competition by 80% (they can keep 10% for themself).
I keep seeing articles like these pop up. I am in the industry but not in the “AI” industry.
What I have no concept of is: is the current subsidized, VC-funded pricing anywhere close to what the final product will cost?
I always fall back to the Uber paradox. Yes, it was great at first; now it's 3x what it cost and has only given cabs pricing power. This was good for consumers to start, but now it's just another part of the K-shaped economy.
So is that ultimately where AI goes? The top percent can afford a high monthly subscription and the not-so-fortunate get their free 5 minutes per month.
But even if that did happen, the open source models are excellent and cost virtually nothing?
Like I prefer Opus 4.5 and Gemini 3 to the open weights models, but if Anthropic or Google upped the pricing 10x then everyone would switch to the open weights models.
Arguably you could say that the Chinese labs may stop releasing them, true, but even if all model development stopped today then they'd still be extremely useful and a decent competitor.
Again I’m not in the “AI” industry so I don’t fully understand the economics and don’t run open models locally.
What’s the cost to run this stuff locally, and what type of hardware is required? When you say virtually nothing, do you mean that’s because you already have a $2k laptop or GPU?
Again I am only asking because I don’t know. Would these local models run OK on my 2016 Mac Pro intel or do I need to upgrade to the latest M4 chip with 32GB memory for it to work correctly?
The large open-weights models aren't really usable for local running (even with current hardware), but multiple providers compete on running inference for you, so it's reasonable to assume that there is and will be a functioning marketplace.
Basically yes, the useful models need a modernish GPU to get inference running at a usable speed. You can get smaller-parameter models (3B/7B) running on older laptops; they just won’t produce output at a useful speed.
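For a concrete sense of what "running locally" involves, here is a minimal sketch using the ollama Python client; it assumes the local Ollama server is installed and running, and that a small model has already been pulled (e.g. with `ollama pull llama3.2:3b`):

    # Minimal sketch: chat with a small local open-weights model via Ollama.
    # Assumes the Ollama server is running and the model has been pulled.
    import ollama

    response = ollama.chat(
        model="llama3.2:3b",  # a ~3B model; runs on CPU, slowly, on older machines
        messages=[{"role": "user", "content": "Summarize what a Makefile does."}],
    )
    print(response["message"]["content"])

On a 2016 Intel Mac this should run, just painfully slowly; on a recent M-series Mac with plenty of RAM, small models like this are generally usable interactively.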
> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool.
And what do you have then? 300 tests that test the behavior exposed by the implementations of the API. Are they useful? Probably some are, probably some are not. The ones that are not will just be clutter and maintenance overhead.
Plus, there will be lots of use-cases for which you need to look a little deeper than just the api implementation, which are now not covered. And those kind of tests, tests that test real business use cases, are by far the most useful ones if you want to catch regressions.
So if your goal is to display some nice test coverage metrics on SonarQube or whatever, making your CTO happy, yes AI will help you enormously. But if your goal is to speed up development of useful test cases, less so. You will still gain from AI, but nowhere near 90%.
> I'd rather inherit a repo written with an agent and a good engineer in the loop than one written by a questionable quality contractor who left three years ago, with no tests, and a spaghetti mess of classes and methods.
To reinforce that point: we've got the world's most prominent AI promoting company (MSFT), that has finally realized that Windows Explorer is too slow to start.
And this company, with all the formidable powers of AI behind it, can find no way to optimize that other than preloading the app in memory. And that's for an app that's basically a GUI for `ls`.
I think this reflects one of the biggest fallacies behind LLM adoption; the idea that reducing costs for producers improves the state of affairs for consumers too. I've seen someone compare it to the steam engine.
With the steam engine, though, consumers made a trade-off: You pay less, and get (in most cases, I presume) a worse product. With LLMs and other machine learning technologies, maybe if you're paying for the software there's a trade-off (if the software is actually cheaper anyway), but otherwise it doesn't exist. It costs the same amount of money for you to read an LLM-generated article as to read a real one; your internet bill doesn't go down. Likewise for gratis software. It's just worse, with no benefit.
Hacker News is full of producers, in this sense, who often benefit from cutting corners, and LLMs allow them to cut corners, so obviously there are plenty of evangelists here. I saw someone else in this comment section mention that gamers who are not in the tech industry don't like "AI". That's to be expected; they're not the producers, so they're not the ones who benefit.
Maybe not 90%, but everyone can see the signs on the horizon.
The tale goes like this: one day the visual arts got commoditised to the point that any given visual artwork could be obtained digitally for virtually free. This has been the case for centuries. The aural arts (see records of spoken poetry, podcasts, music) have been commoditised for a long time too. Full commoditisation might never happen (i.e. you can still work in the field), but it is undeniable that it has had a massive impact on the respective fields. Getting a Picasso-like painting might not be quite possible, but we are getting there; same with music.
The same is coming for devs.
Devving is still far from that point, but it doesn't really take much to produce a significant impact on the field. The percentage of devs who will be able to command late-2010s salaries will gradually diminish over time. This is what early-stage commoditisation looks like.
I think there's a huge section of tasks you once had to pay high salaries for that are now gone.
Just thinking from the finance world: in 2010, no one on the desk knew how to program and no one knew SQL, and even if they did, the institutional knowledge to use the dev systems was not worth their time. So you had multiple layers of meetings and managers to communicate programs. As a result, anything small just didn't get done, and everything took time.
By 2020 most junior guys knew enough Python to pick up some of the small stuff.
In 2025, AI tools are good enough that they're picking up things that would legitimately have taken weeks to do in 2010 (because of the processes around them, not the difficulty) and doing them in hours. A task that takes an hour to do used to take multiple meetings to properly outline to someone without finance knowledge, and now they can do it themselves in less time than it took to describe to a fresh CS grad.
Those tasks that junior traders/strats are able to do now, which would have taken weeks or months to get into prod going through an IT department, are where I'm seeing cost drop 90% every day right now. Which is good: it lets tech focus on tech and not on learning the minutiae of options trading in some random country.
To nitpick, it is more like an 800% gain in productivity. Cost actually increased.
I refer here to my experience as a solo developer.
With AI assistance I don't spend fewer hours coding, but more.
There is the thrill of shipping, quicker, relevant features that were sleeping in my drawers for ages. Each hour of coding delivers 8x more features and bug fixes.
Also, whereas I spent a few dozen dollars per month on server costs, I now also spend an equivalent amount on subscriptions and API calls to LLM services for this AI assisted coding. Worth every penny.
So while productivity increased manifold, absolute cost actually increased as well for me.
It's fascinating to read these comments - I believe everyone. Some are getting huge productivity gains and others very little - so perhaps we are not in the same business. I know that I've ranged over various work - all called software development and the variety of work was quite different - some I wouldn't call challenging but still needed a lot of manual labor - perhaps this is the type of work that finds easy wins from AI automation. Still other work was much more challenging but I've never really attempted to use AI in my work because it was forbidden by policy. I've used AI at home for fun projects and it has helped me with languages I've never used before but I've never come close to 90% productivity boost. Anyway, fascinating!
I agree with your observations, in my own job I cover a great deal of the aspects of all software development practices for a few clients. Probably something you'd normally have a bunch of different roles do. Not because of AI, I have been in this role since before the AI boom, this is just how agency work is sometimes.
My observation is that there is perhaps 15% of my job that has been boosted by AI quite a lot, and the rest it hasn't touched much at all. Most of the job just isn't coding, basically. The code generation aspect is a bit flawed too, because to get good results I often spend more time collating requirements and engineering the prompt than if I had just done it myself.
There is a sweet spot in there where the requirements were easy to write out, and the code was simple enough but there is a lot to write, that it's nice to not have to write it myself. But even then I am finding that AI is often not successful, and if it takes three tries to get it to do the work properly then there is no productivity gain. Often enough time is lost to the failed attempts.
Usually there isn't that much code to write, but it's fairly complex and needs to be correct, which is where I find LLMs have too many failed attempts and waste time.
(I am an 18+ year "everything" developer, my experiences are from using Claude Code)
LLMs are calculating probable text/code. A lot of it in very short time.
Probable text/code is not the same as correct/proper text/code.
It is a huge mass of probable and maybe correct/proper text/code.
That is very dangerous as it looks correct but maybe it is not.
It is likely that the cost of software will increase because of the unmanageable mess that this creates.
As an example, I wanted a plugin for Visual Studio. In the past I would have spent hours on it or just not bothered, but I used Claude Code to write it. It isn't beautiful or interesting code, it lacks tests, but it works and saves me time. It isn't worth anything, won't ever be deployed into production, I'll likely share it but won't try to monetise it; it is boring, ugly code but more than good enough for its purpose.
Writing little utility apps has never been simpler, and these are probably 90% cheaper.
A plugin that does what, exactly? A lot of comments here and under other posts just declare things with the following template: "I wanted to do X; before, it would have taken me N hours, but now with LLM tool L it has taken way less time. I can't share anything about X, but LLM tool L is very useful. Just trust me, bro."
My favorite is this advert I keep getting that says "Imagine being able to build an app with your name on it!" I'm like... if you're struggling with the part where you put your name on it... and that's the priority.. I don't know what to tell you.
I think AI can be really powerful tool. I am more productive with it than not, but a lot of my time interacting with AI is reviewing its code, finding problems with it (I always find some issues with it), and telling it what to do differently multiple times, and eventually giving up, and fixing up the code by hand. But it definitely has reduced average time it takes me to implement features. But I also worry that not everyone would be responsible and check/fix AI generated code.
Pretty decent article - but what it misses is that most of these agents are trained on bad code - which is open source.
So what does this mean in practice? For people working on proprietary systems (cost will never go down), the code is not on GitHub; maybe it's hosted on an internal VCS - Bitbucket etc. The agents were never trained on that code. Yeah, they might help with docs (but are they using the latest docs?).
For others, the agents spit out bad code, make assumptions that don't hold, and call APIs that don't exist or have been deprecated.
Each of those needs an experienced builder who has 1. technical know-how and 2. domain expertise. So has the cost of experienced builder(s) gone down? I don't think so - I think it has gone up.
What people are vibecoding out there is mostly tools/apps that deal in closed systems (never really interact with the outside world), scripts where AI can infer based on what was done before, etc. But are these people building anything new?
I have also noticed there's a huge conflation of cost and complexity. ZIRP drove people to build software on very complex abstractions, e.g. Kubernetes, Next.js, microservices - hence people thought they needed huge armies of people. However, we also know the inverse is true: most software can be built by teams of 1-3 people. We have countless proof of this.
So people think the way to reduce cost is to use AI agents, instead of addressing the problem head-on: building software in a simpler manner. Will AI help? Yeah, but not to the extent of what is being sold or written about daily.
> these agents are trained on bad code - which is open source.
This is doubtful and not what I've seen in over 30 years in the industry. People who are ashamed of their source code don't make it Open Source. In general, Open Source will be higher quality than closed source.
Sure, these days you will need to avoid github repositories made by students for their homework assignments. I don't think that's a problem.
The idea that LLMs were trained on miscellaneous scraped low quality code may have been true a year ago, but I suspect it is no longer true today
All of the major model vendors are competing on how well their models can code. The key to getting better code out of the model is improving the quality of the code that it is trained on.
Filtering training data for high quality code is easier than filtering for high quality data of other types.
My strong hunch is that the quality of code being used to train current frontier models is way higher than it was a year ago.
> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool. This would take me, or many developers I know and respect, days to write by hand.
What was the value-add of those tests? When I tried this, AI would often rewrite the code to match its poorly written tests.
> The agentic coding tools have got extremely good at converting business logic specifications into pretty well written APIs and services.
I haven’t experienced this at all. They can do okay with greenfield services (as the author mentioned). However it’s often not “extremely good”. It’s usually “passable” at best. It doesn’t save me any time either. I have to read and audit every line and change it anyway.
> Previously, you'd have a small team of people working on setting up CI/CD... Nearly all of this can be done in a few hours with an agentic coding CLI
I've had a couple of contracts now, where I get to fix everything for teams who vibe-coded their infrastructure. I'm not saying it isn't a speed-up for teams who already have a wealth of infra experience - but it's not a substitute for the years of infra experience such a team already has.
Betteridge's law proven correct once again. The answer to the headline is: no. Perhaps it will be true in the future, nobody knows.
I'm skeptical of the extent to which people publishing articles like this use AI to build non-trivial software, and by non-trivial I mean _imperfect_ codebases that have existed for a few years, battle-tested, with scars from hotfixes to deal with fires, compromises to handle weird edge cases/workarounds, and especially a codebase many developers have contributed to over time.
Just this morning I was using Gemini 3 Pro working on some trivial feature. I asked it how to go about solving an issue and it completely hallucinated a solution, suggesting a non-existent function that was supposedly exposed by a library. This situation has been the norm in my experience for years now and, while it has improved over time, it's still a very, very common occurrence. If it can't get these use cases down to an acceptable success rate, I just don't see how I can trust it to take the reins and do it all with an agentic approach.
And this is just a pure usability perspective. If we consider the economics aspect, none of the AI services are profitable, they are all heavily subsidized by investor cash. Is it sustainable long term? Today it seems as if there is an infinite amount of cash but my bet is that this will give in before the cost of building software drops by 90%.
>I asked it about how to go about solving an issue and it completely hallucinated a solution suggesting to use a non-existing function that was supposedly exposed by a library.
Yeah, that's a huge pain point in LLMs. Personally, I'm way less impacted by them because my codebase is only minimally dependent on library stuff (by surface area) so if something doesn't exist or whatever, I can just tell the LLM to also implement the thing it hallucinated :P
These hallucinations are usually a good sign of "this logically should exist but it doesn't exist yet" as opposed to pure bs.
Can someone help me figure out how to get started with this kind of coding setup?
I haven't written production code for the last 8 years, but I have prior development experience of about 17 years (ranging from C++, full stack, .NET, PHP and a bunch of other stuff).
I have used AI at a personal level and know the basics: used Claude/GitHub to help me fix and write some pieces of code in languages I wasn't familiar with. But it seems like people are talking about and deploying large real-world projects in short-"er" amounts of time. An old colleague of mine whom I trust mentioned his startup is developing code 3x faster than we used to develop software.
Is there resource that explains the current best practices (presumably it's all new)? Where do I even start?
"I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool"
Did you catch what the author didn't mention? Are the tests any good? Are they even tests? I'm playing with Opus now (the best entertainment for a coder); it is excellent at writing fake code and fabricating results. It wrote me a test that validates an extremely complex utility, and the test passed!
What was the test? Call the utility with invalid parameters and check that there is an error.
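In pytest terms the pattern is something like this (the utility here is invented for illustration): the generated test proves only that an error path exists, while the actual logic goes unexercised:

    # Invented illustration of a "passing but vacuous" generated test.
    import pytest

    def normalize_scores(values):
        # Stand-in for the "extremely complex utility"; real logic elided.
        if not values:
            raise ValueError("empty input")
        total = sum(values)
        return [v / total for v in values]

    def test_rejects_invalid_input():
        # Passes, but says nothing about whether the core logic is right.
        with pytest.raises(ValueError):
            normalize_scores([])

    def test_known_case():
        # The kind of assertion a useful test actually needs to make.
        assert normalize_scores([1, 1, 2]) == [0.25, 0.25, 0.5]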
> This takes a fairly large mindset shift, but the hard work is the conceptual thinking, not the typing.
But the hard work always was the conceptual thinking? At least at and beyond the Senior level, for me it was always the thinking that's the hard work, not converting the thoughts into code.
We are in an era where most people don't know enough about software quality to have an influence, and that annihilates the whole market. This makes me think of the market-for-lemons thing: if people are unable to tell a quality product from a bad one, then everything collapses.
The cost of building the first version of FB has dropped 90%
The cost of building the next FB stays the same
More sophisticated tools mean more refined products.
If an easier and cheaper method for working carbon fiber becomes broadly available, it won't mean you get less money; it means you'll now be cramming carbon fiber in the silverware, in the shoes, in baby strollers, EVERYWHERE. The cost of a carbon fiber bike will drop 90%, but teams will be doing a LOT more.
You could say the cost per line of code has dropped 90%, but the number of lines of code written will 100x.
How would we design a rigorous study that measures total cost of ownership when teams integrate AI assistance, including later maintenance and defect rates, rather than just initial output speed?
> I believe the AI agentic coders will threat tech giants more than it - collectively - threats software engineers.
Currently, I don't think so. Coding agents' performance generally depends on the quality of the model behind them. Running a powerful model is assets-dependent: not everyone has the hardware and power to run Sonnet 4.5 or Gemini 3 even if they were open source. So, before top-notch models can be deployed on personal computing devices, I would not say coding agents threaten any organization.
We should find out pretty quickly - I'd suggest we already should have by now. 10x is an absolutely massive productivity increase. That would essentially mean all the software development my team did by November could instead have been finished by the end of January.
Building the software is only a small part of the overall cost. For any significant production deployment, the cost goes into change management and support.
Only a minuscule part of the work is green-field development. Everything else is managing a mess.
If software actually is 90% cheaper to build in 2026 there will be 10x the simple apps and abandonware to follow. Throwaway software like throwaway phones. It’ll be weird.
But you need a staff-level engineer to guide it, plus great standardization and testing best practices. And yes, in that situation you can go 10-50x faster. Many teams/products are not in that environment though.
I work on a big ball of open source spaghetti and AI has become invaluable in helping me navigate my way through it. Even when it's wrong - it gives me valuable clues.
The cost of writing software had already dropped by 90% since outsourcing was invented and all the software jobs moved to India, or so I was told 15 years ago.
Software Development is much more than writing code. Writing code may have become 90% easier, but a lot of the other development tasks haven't appreciably changed due to AI, although that might come. So, for now at least the answer to the question posed in the headline is no.
An exception might be building something that is well specified in advance, maybe because it's a direct copy of existing software.
A little anecdote:
I used Gemini CLI for a large feature implementation in a C++ API.
Gemini did a huge amount of work I would otherwise have had to write myself.
The problem? Hidden somewhere in all that great work was a memory bug, and there was no error message you could just feed back to the CLI and call it a day. After four days of debugging I found the bug. Needless to say, Gemini never once came close to where the bug actually was in its guessing game...
Will this change in the future? We'll see...
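For anyone who hasn't been bitten by this class of bug, here is a minimal, purely hypothetical C++ sketch (not the commenter's actual code) of the kind of memory bug that compiles cleanly, often passes casual testing, and produces no error message you can hand back to an agent:

    #include <iostream>
    #include <string>
    #include <string_view>

    // Returns a view into a local std::string that is destroyed when the
    // function returns: undefined behavior, but it often "works" in practice.
    std::string_view make_label(int id) {
        std::string label = "item-" + std::to_string(id);
        return label;  // dangling view: 'label' dies at the end of this scope
    }

    int main() {
        std::string_view v = make_label(42);
        std::cout << v << "\n";  // may print garbage, may crash, may look fine
    }

Nothing here throws or logs; in practice a sanitizer run tends to find this kind of thing, not re-prompting the model with symptoms.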
I don’t know if it’s 90%, but I’m shipping in 2 days things that took 2-4 weeks before.
Opus 4.5 in particular has been a profound shift. I’m not sure how software dev as a career survives this. I have nearly 0 reason to hire a developer for my company because I just write a spec and Claude does it in one shot.
It’s honestly scary, and I hope my company doesn’t fail because as a developer I’m fucked. But… statistically my business will fail.
I think in a few years there will only be a handful of software companies—the ones who already have control of distribution. Products can be cloned in a few weeks now; not long until it’s a few minutes. I used to see a new competitor once every six months. Now I see a new competitor every few hours.
Agreed. Opus 4.5 does feel like a real shift, and I have had exactly your experience. I've shipped stuff in days that would have taken me weeks, and to a much higher quality standard: the test suites would have been far smaller if I'd built it all manually, and the styling would probably have been whatever MVP Bootstrap CSS gave me.
I have no idea how you could debug something in two days that is sufficient to ship. I certainly think that an LLM could write a few thousand lines, but who could read them?
Are you shipping things you haven't reviewed at all, and pronouncing them high quality?
I find these threads baffling. I haven't seen a glut of new software anywhere. I certainly haven't seen a bunch of companies fixing the same bugs that have been sitting in their trackers for years. People keep telling me there's this deluge of LLM code happening, but it (the actual code) is all a secret and behind closed doors. Why in the world would you keep it a secret? Why would any multibillion dollar company that ships AI features have any known bugs left in their flagship products?
I haven't seen a difference anywhere when looking outwards. I personally find it useful, but I have to constantly force refactors and rearchitecting to make the code intelligible. When I add features, bugs get reinserted, refactors get reverted, and instrumentation silently disappears. If I don't do the constant refactors, I wouldn't even notice this was happening half the time.
It has for me. I'm probably paying less than 10% of what I used to: saving on SaaS subscriptions, occasional contract fees for custom integrations, and the Zapier fees linking them together.
I've no idea what's going on in the enterprise space, but in the small 1-10 employee space, absolutely
I totally agree with you. I am working on a new platform right now for a niche industry. Maybe there's $10m ARR to make in the entire industry. Last year, it wouldn't have been worth the effort to raise, hire a PM, a few devs, QA, etc. But for a solo dev like myself with AI, it definitely is worth it now.
Can we also take into account the mental cost associated with building software? Because the way I see it, managing output from agents is far more exhausting than doing the work ourselves.
And obviously the cost of not upskilling in intricate technical details as much as before (i.e., staying at the high-level perspective) will have to be paid at some point.
It is pretty hard work, huh! I was surprised. In my case it was a personal project, and although the result was successful, I felt a little fried by the end.
I love how LLMs have made everyone forget how hard it is to verify software correctness and how hard it is to maintain existing software. There is endless gushing about how quickly LLMs can write code. Whenever I point out the LLMs make a lot of mistakes people just wave their hands and say software is easy to validate. The huge QA departments at all software shops would beg to disagree, along with the CVE database, the zero day brokers, etc. But you know, whatever, they're just boomers right?
Feels almost too on-the-nose to write "Betteridge's Law of Headlines" but the answer is obviously no. Look no further than the farce of their made-up "graph" of cost over time with no units or evidence.
In context of B2B SaaS products that require a high degree of customization per client, I think there could be an argument for this figure.
The biggest bottleneck I have seen is converting the requirements into code fast enough to prove to the customer that they didn't give us the right/sufficient requirements. Up until recently, you had to avoid spending time on code if you thought the requirements were bad. Throwing away 2+ weeks of work on ambiguity is a terrible time.
Today, you could hypothetically get lucky on a single prompt and be ~99% of the way there in one shot. Even if that other 1% sucks to clean up, imagine if it was enough to get the final polished requirements out of the customer. You could crap out an 80% prototype in the time it takes you to complete one daily standup call. Is the fact that it's only 80% there bad? I don't think so in this context. Handing a customer something that almost works is much more productive than fucking around with design documents and ensuring requirements are perfectly polished to developer preferences. A slightly wrong thing gets you the exact answer a lot faster than nothing at all.
This is a fascinating perspective — and honestly, it feels like one of those shifts we’ll only fully recognize in hindsight. The idea that software development could transition from months of coordination and engineering overhead to rapid iteration with small, high-leverage teams is both exciting and a little uncomfortable.
The 90% cost reduction isn’t just about efficiency — it’s about access. If the barrier to shipping software drops this dramatically, we’re likely standing at the edge of a new wave of innovation driven not just by engineers, but by domain specialists who previously couldn’t justify the investment.
The most interesting takeaway here is that technical mastery may become less of the moat, while contextual and domain intelligence becomes the real differentiator. That flips the traditional power structure in tech.
2026 might really be the year where “build fast, throw away, rebuild smarter” becomes normal instead of reckless.
Curious to see how fast organizations adapt — and who gets left behind simply because they assumed disruption would arrive slower.
The primary example that stuck out to me from the article, writing a giant unit test suite, really doesn't lend much credence to the question.
And yet the conclusion reads as if the answer is yes?
Until AI can work organizationally as opposed to individually, it'll necessarily be restricted in its ability to produce gains beyond relatively marginal improvements (saved 20 hours of developer time on unit tests) for a project that took X weeks/months/years to work its way through Y people.
So sure, simple projects, simple asks, unit tests, projects handled by small teams of close knit coworkers who know the system in and out and already have the experience to differentiate between good code and bad? I could see that being reduced by 90%.
But it doesn't seem to have done much for organizational efficiency here at BigCo, and unit tests are pretty much the very tip of a project's iceberg here. I know a lot of people who are using the AI agents, and a lot of people who aren't, and I worry for the younger engineers, who I'm not sure have the chops to distinguish between good, bad, and irrelevant, and thus leave clearly extraneous code in their changes and extraneous paragraphs in their documents. As for the senior engineers with the chops, they seem to do okay with it, although I can certainly tell you they're not doing ten times more than they were four years ago.
I kinda rambled at the end there, all that to say... organizational efficiency is the bug to solve.
(It's very difficult, I believe the 2D interfaces we've had for the last 40 years or whatever are not truly meeting the needs of the vast cathedrals of code we're working in, same thing for our organizations, our code reviews, everything man)
When the LLM codebases are too complex for the humans on deck to understand and debug… that sounds like the turning point when companies go back to real developers, IMO. Any serious mission-critical code needs knowledgeable humans on deck who can leap in when the s** hits the fan, put out fires, and patch critical bugs.
this probably levels the playing field for a while, and then dramatically raises the bar over a longer period of time
as better engineers and better designers get more leverage with lower nuisance in the form of meetings and other people, they will be able to build better software with a level of taste and sophistication that wouldn't make sense if you had to hand type everything
Maybe I'm holding it wrong, but I don't actually see the huge productivity gains from LLM-assisted software development. Work is leaning on us to use AI—not requiring it yet, but we're at DEFCON 3, borderline 2 (DEFCON 1 being a Shopify situation). My team's experience is that it needs LOTS of handholding and manual fixing to produce even something basic that's remotely fit for production use.
I closed a comment from ~2.5y ago (https://news.ycombinator.com/item?id=36594800) with this sentence: "I'm not sure that incorporating LLMs into programming is (yet) not just an infinite generator of messes for humans to clean up." My experience with it is convincing me that that's just what it is. When the bills come due, the VC money dries up, and the AI providers start jacking up their prices... there's probably going to be a boom market for humans to clean up AI messes.
The cost of writing code has gone down - I don't think by 90%. Maybe by 30%, with a big asterisk that says something like "as long as someone has put some similar code somewhere or is a very mechanical refactor".
The thing is, writing code is just the first step on building software. You are reviewing what your AI generates, right? You will still be held responsible when it doesn't work. And you will have to maintain and support that code. That is, in my mind, also "building software".
This reminds me of the (amazing) Vim experts who zip around a codebase with their arcane keystrokes. I'm a Vim user myself and I can't mimic a fraction of their power. It's mesmerizing to watch them edit files; it's as if their thoughts get translated into words on the screen.
I also know that editing is just the first step. If you skip the rest, you are being misled by an industry with vested interests.
> One objection I hear a lot is that LLMs are only good at greenfield projects. I'd push back hard on this. I've spent plenty of time trying to understand 3-year-old+ codebases where everyone who wrote it has left.
Where I am, 3 years old is greenfield, and old and large is 20 years old with 8 million lines of nasty C++.
I’ll have to wait a bit more I think …
Honestly, my usage of AI is constantly evolving, and some days I don't use it at all. But it allows me to do things I could never make myself do, like realigning icons for 400+ models, and it lets me understand just enough of a subject to leave it behind and continue on my own.
AI has also probably saved me 100 hours of repetitive work at this point and completely eliminated the need to rely on other people for time-consuming configuration tasks and back-and-forth that used to stall my work, since I'm the kind of person who will work for 20 hours straight until something is finished without losing much productivity.
AI saves me like an hour per month, tops. I still don't understand the hype. It's a solution in search of a problem. It can't solve the hard coding problems, and it doesn't say so when it can't solve the easy ones either. It's less valuable than ReSharper. So the business value is maybe $10 a month. That can't finance this industry.
I read these sort of comments every so often and I do not understand them. You are in a sea of people telling you that they are developing software much quicker which ticks the required boxes. I understand that for some reason this isn't the case for your work flow, but obviously it has a lot more value for others.
If you are a chairmaker and everyone gains access to a machine that can spit out all the chair components but sometimes only spits out 3 legs or makes a mistake on the backs, you might find it pointless. Maybe it can't do all the nice artisan styles you can do. But you can be confident others will take advantage of this chair machine, work around the issues and drive the price down from $20 per chair to $2 per chair. In 24 months, you won't be able to sell enough of your chairs any more.
Maybe, or maybe the size of the chair market grows because with $2 chairs more buyers enter. The high end is roughly unaffected because they were never going to buy a low end chair.
> You are in a sea of people telling you that they are developing software much quicker which ticks the required boxes
But that's exactly not the case. Everyone is wondering what tf this is supposed to be for. People are vehemently against this tech, and yet it gets shoved down our throats although it's prohibitively expensive.
Coding should be among the easiest problems to tackle, yet none of the big models can write basic "real" code. They break when things get more complex than pong. And they can't even write a single proper function with modern c++ templating stuff for example.
They can, actually. I thought they couldn't, but the latest ones can, much to my surprise.
I changed my mind after playing with Cursor 2 (Cursor 1 had lasted all of 10 minutes), which actually wrote a full-blown app with documentation, tests, coverage, CI/CD, etc. I was able to have it find a bug I encountered when using the app: it literally ran the code, inserted extra logs, grepped the logs, found the bug, and fixed it.
> And they can't even write a single proper function with modern c++ templating stuff for example.
That's just not true. ChatGPT 4 could explain template concepts lucidly but would always bungle the implementation. Recent models are generally very strong at generating templated code, even if it's fairly complex.
If you really get out into the weeds with things like ADL edge cases or static initialization issues they'll still go off the rails and start suggesting nonsense though.
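For concreteness, here's a small hypothetical example of the kind of "modern templating" being discussed, a C++20 concept-constrained function template of the sort recent models do tend to get right (names and constraints are mine, purely illustrative):

    #include <concepts>
    #include <iostream>
    #include <ranges>
    #include <vector>

    // Sums any input range whose value type is an integer or float,
    // rejecting everything else at compile time via the requires-clause.
    template <std::ranges::input_range R>
        requires std::integral<std::ranges::range_value_t<R>> ||
                 std::floating_point<std::ranges::range_value_t<R>>
    auto sum(R&& r) {
        std::ranges::range_value_t<R> total{};
        for (auto&& x : r) total += x;
        return total;
    }

    int main() {
        std::vector<int> v{1, 2, 3};
        std::cout << sum(v) << "\n";  // prints 6
    }

The ADL and static-initialization edge cases mentioned above live well beyond this level of difficulty, which matches the "off the rails" observation.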
> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests)
I'd love to see someone do this, or a similar task, live on stream. I always feel like an idiot when I read things like this because despite using Claude Code a lot I've never been able to get anything of that magnitude out of it that wasn't slop/completely unusable, to the point where I started to question if I hadn't been faster writing everything by hand.
Claiming that software is now 90% cheaper feels absurd to me and I'd love to understand better where this completely different worldview comes from. Am I using the tools incorrectly? Different domains/languages/ecosystems?
100% agreed. I use Claude Code to write 90% of my code at this point, but I have found that it is genuinely worse than a junior at writing meaningful test cases. Most of the time it will make up interfaces or mock things incorrectly, to the point where I just give up and write them myself. The bulk of the "tests" it writes check things that are meaningless (does the interface exist, etc.). This is with TypeScript + Vitest and Opus 4.5.
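The commenter's stack is TypeScript + Vitest, but the pattern is language-agnostic. Here's a hypothetical C++ sketch (toy function and tests are mine) of the difference between a test that only proves the code compiles and one that pins down behavior:

    #include <cassert>
    #include <cstdlib>

    // Toy function under test (a stand-in for real application code).
    int parse_port(const char* s) {
        char* end = nullptr;
        long v = std::strtol(s, &end, 10);
        if (end == s || *end != '\0' || v < 1 || v > 65535) return -1;
        return static_cast<int>(v);
    }

    // The "meaningless" kind: passes as long as the symbol exists and links.
    void test_interface_exists() {
        assert(&parse_port != nullptr);
    }

    // The meaningful kind: pins down the happy path and the error convention.
    void test_behavior() {
        assert(parse_port("8080") == 8080);
        assert(parse_port("not-a-port") == -1);
        assert(parse_port("70000") == -1);  // out of range
    }

    int main() {
        test_interface_exists();
        test_behavior();
        return 0;
    }

Coverage tools count both kinds the same, which is part of why "300+ tests in a few hours" says so little on its own.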
This is the kind of piece that becomes popular around the top of the hype cycle, when people are trying to keep it going but can sense that perhaps Wile E. Coyote has run off the cliff and is suspended in mid-air. Obviously, by any possible objective indicator, the cost of software development has barely budged, and "AGI" is nowhere in the offing, while luminary scientists appear to be drawn to whatever the next big thing is, having seen the limits of their (admittedly impressive) creation.
I'm sure that AI tools will be here to stay and will become more integrated and better. I wonder what the final result will be, -20% productivity as in the METR study? +20%? Anything like 90% is the kind of sensationalism reserved for r/WallStreetBets
I am on the free tier of Gemini 3. With some intervention on my part, I got it to build, in Emacs Lisp, a primitive-recursive function for determining if a number is prime (by primitive-recursive I mean a function built only from the building blocks of constant, successor, and projection functions, plus the primitive-recursion and composition operators/macros). I was impressed, as previous models (including Anthropic's and OpenAI's) could not do this.
For the past few days I have been asking it to build a mu-recursive Ackermann function in Emacs Lisp (built on the same primitive-recursive functions/operators, plus one extra operator: minimization). I said that the prime-detector function it already built should be able to use the same functions/operators, and that it should rewrite code if necessary.
So far it has been unable to do this. If I thought it could do it but was stumbling over Emacs Lisp, I might ask it to try Scheme or Common Lisp or some other language. It's possible I'll get it to work in the time allotted by my daily free tier, but I have had no success so far. I am also starting with Ackermann inputs of (0,0), (0,1), (1,0), (1,1) so as not to overburden the system, but it can't even handle (0,0). It also tries to redefine the Emacs Lisp built-in "and", which Emacs hiccups on.
A year ago LLMs were stumbling over Leetcode and Project Euler functions I asked them to write. They seem to have gotten a little better, and I'm impressed Gemini 3 can handle, with help, primitive recursive functions in Emacs Lisp. It doesn't seem to be able to handle mu-recursive functions with minimization yet, though, even for trivial, toy implementations.
So it's a helpful helper and tool, but definitely not ready to hand things over to. As the saying goes, the first 90% of the code takes 90% of the time, and the last 10% takes the other 90%. Or the other saying: debugging is twice as hard as writing code, so if you code at the peak of your cleverness, you're by definition not smart enough to debug it. It does have its uses, though, and it has been getting better.
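For readers unfamiliar with the construction, here's a rough sketch, in C++ rather than Emacs Lisp, of the building blocks the commenter is asking the model to work with (naming and encoding are mine, purely illustrative):

    #include <functional>
    #include <iostream>
    #include <vector>

    // Functions from a vector of naturals to a natural, built only from
    // zero, successor, projection, composition, and primitive recursion.
    using Nat = unsigned long;
    using Fn  = std::function<Nat(const std::vector<Nat>&)>;

    Fn zero() { return [](const std::vector<Nat>&) { return Nat{0}; }; }
    Fn succ() { return [](const std::vector<Nat>& a) { return a[0] + 1; }; }
    Fn proj(std::size_t i) { return [i](const std::vector<Nat>& a) { return a[i]; }; }

    // Composition: f(g1(args), ..., gk(args)).
    Fn compose(Fn f, std::vector<Fn> gs) {
        return [f, gs](const std::vector<Nat>& a) {
            std::vector<Nat> inner;
            for (const auto& g : gs) inner.push_back(g(a));
            return f(inner);
        };
    }

    // Primitive recursion on the first argument:
    //   h(0, xs)   = base(xs)
    //   h(n+1, xs) = step(n, h(n, xs), xs)
    Fn prim_rec(Fn base, Fn step) {
        return [base, step](const std::vector<Nat>& a) {
            std::vector<Nat> xs(a.begin() + 1, a.end());
            Nat acc = base(xs);
            for (Nat n = 0; n < a[0]; ++n) {
                std::vector<Nat> sa{n, acc};
                sa.insert(sa.end(), xs.begin(), xs.end());
                acc = step(sa);
            }
            return acc;
        };
    }

    int main() {
        // add(n, m): the base case is m itself; the step applies succ to
        // the accumulated value (argument 1 of the step call).
        Fn add = prim_rec(proj(0), compose(succ(), {proj(1)}));
        std::cout << add({3, 4}) << "\n";  // prints 7
    }

Minimization (the mu operator) adds an unbounded search on top of these, which is exactly what the Ackermann request needs and reportedly where the model falls over.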
I think the author underestimates the forces that introduce coordination overhead.
"Good AI developers" are a mystery being (not really, but for corporate they are). Right now, companies are trying to measure them to understand what makes them tick.
Once that is measured, I can assure you that the next step is trying to control their output, which will inevitably kill their productivity.
> This then allows developers who really master this technology to be hugely effective at solving business problems.
See what I mean?
"If only we could make them work to solve our problems..."
You can! But that implies additional coordination overhead, which means they'll not be as productive as they were.
> Your job is going to change
My job changes all the time. Developers are ready for this. They were born of change, molded by it. You know what hasn't caught up with the changes?
However, the cost of software maintenance went up by 1000%. Let's hope you never need to add a new business rule or user interface to your vibe-coded software.
My tell-tale sign that AI is moving the needle is the disappearance of the concept of leetcode. If you've done an interview lately, you'll know AI hasn't moved any needles yet.
By making up numbers and not supplying any evidence, you can come to any conclusion you like! Then you get to draw a graph with no units on it. Finally, you can say things that are objectively false like "These assertions are rapidly becoming completely false".
I don't really build software any more and have moved into other parts of the business. But I'm still a huge user of software and I'd just echo all the other comments asking if it's so easy to get all these great tools built and shipped, where are they? I can see that YouTube is flooded with auto-generated content. I can see that blogspam has skyrocketed beyond belief. I can see that the number of phishing texts and voicemails I get every day has gone through the roof. I don't see any flood of new CNCF incubating projects. I don't see that holy grail entire OS comparable to Linux but written in Rust. I don't see the other holy grail new web browser that can compete with Firefox, Chrome, and Safari. It's possible people are shipping more of the stripped down Jira clones designed for a team of ten that gets 60 customers and stops receiving updates after 2 years but that's not the kind of software that would be visible to me.
If you're replacing spreadsheets with a single-purpose web UI with proper access control and concurrent editing that doesn't need Sharepoint or Google Workspaces, fine, but if you're telling me that's going to revolutionize the entire industry and economy and justify trillions of dollars in new data centers, I don't think so. I think you need to actually compete with Sharepoint and Google Workspaces. Supposedly, Google and Microsoft claim to be using LLMs internally more than ever, but they're publicly traded companies. If it's having some huge impact, surely we'll see their margins skyrocket when they have no more labor costs, right?
Sure but that's the good of it. Lower labor cost = more productivity. The customer wins in the end because the equivalent product is cheaper or a better product costs the same. Businesses and employees still have to compete against each other so things won't get easier for them in the long term.
The customer only wins if the customer is the one using the tools directly; otherwise it leaves all the power in the hands of businesses whose only real goal is maximum profit. And without already possessing the domain knowledge to guide, judge, and correct AI along the way, its existence will be of limited use to consumers, and business will not feel much pressure to make anything cheaper; it just leaves more margin to funnel to the top.
Except this is capitalism, so any improvements will go disproportionately to the owners. This narrative of improvements for customers has been repeated for decades and it keeps being wrong.
- Demirci, O., Hannane, J., & Zhu, X., "Who Is AI Replacing? The Impact of Generative AI on Online Freelancing Platforms," Management Science (2024) - https://pubsonline.informs.org/doi/10.1287/mnsc.2024.05420
- Indeed Hiring Lab, "AI at Work Report 2025: How GenAI is Rewiring the DNA of Jobs" (September 2025) - https://www.hiringlab.org/wp-content/uploads/2025/09/Indeed-Hiring-Lab-AI-at-Work-Report-2025.pdf
>Software engineering has got - in my opinion, often needlessly - complicated, with people rushing to very labour intensive patterns such as TDD, microservices, super complex React frontends and Kubernetes.
Let's say it is complicated. But what is the better alternative when dealing with large software? To what point can we simplify it without losing anything important?
For me, the cost of motivating myself dropped significantly. I now feel like working on little things that have been pending tasks for ages. A DB sync script here, an unearthed project from 12 years ago there, migrating project package versions, finding and fixing incomplete/missing data, refactoring legacy code to be suitable for unit testing, installing a bunch of cron jobs: all in a day's work.
> Jevons Paradox says that when something becomes cheaper to produce, we don't just do the same amount for less money. Take electric lighting for example; while sales of candles and gas lamps fell, overall far more artificial light was generated.
We might actually get all the software we actually need. We won’t have to listen to antiquated DMV/IRS/health systems not being updated because the projects designed to replace them failed.
It will be interesting to see how this goes moving forward. Agents learn from massive scraping, but for the newest tools and frameworks there is nothing to scrape except the documentation and a few initial examples. And now that agent output is flooding everything, you can expect a lot of feedback loops as that output gets scraped back into training early in a tool's development cycle.
Lots of applications have a simple structure of collecting and operating data with fairly well documented business logic tying everything together. Coding outside of that is going to be more tricky.
And if agentic coding is so great, then why are there still so many awful spreadsheets that can't compete with Excel? Something isn't adding up quite as well as some seem to expect.
Perhaps the cost will drop over time, but it will be because writing code is becoming more accessible. It's not just because of AI, but the natural progress of education and literacy on the topic that would have happened anyway.
What I see are salaries stagnating and opportunity for new niche roles or roles being redefined to have more technical responsibility. Is this not the future we all expected before AI hype anyway? People need to relax and refocus on what matters.
This article was more of an advertisement for...something than any meaningful commentary.
How good are tests written by AI, really? The junk "coverage" unit tests, sure, but well-thought-out integration tests? No way. Testing code is difficult; some AI slop isn't going to make that easier, because someone has to know the code and the infrastructure it's going into and reason about all of it.
I've only been working with AI for a couple of months, but IMHO it's over. The Internet Age which ran 30 years from roughly 1995-2025 has ended and we've entered the AI Age (maybe the last age).
I know people with little programming experience who have already passed me in productivity, and I've been doing this since the 80s. And that trend is only going to accelerate and intensify.
The main point that people are having a hard time seeing, probably due to denial, is that once problem solving is solved at any level with AI, then it's solved at all levels. We're lost in the details of LLMs, NNs, etc, but not seeing the big picture. That if AI can work through a todo list, then it can write a todo list. It can check if a todo list is done. It can work recursively at any level of the problem solving hierarchy and in parallel. It can come up with new ideas creatively with stable diffusion. It can learn and it can teach. And most importantly, it can evolve.
Based on the context I have before me, I predict that at the end of 2026 (coinciding with the election) America and probably the world will enter a massive recession, likely bigger than the Housing Bubble popping. Definitely bigger than the Dot Bomb. Where too many bad decisions compounded for too many decades converge to throw away most of the quality of life gains that humanity has made since WWII, forcing us to start over. I'll just call it the Great Dumbpression.
If something like UBI is the eventual goal for humankind, or soft versions of that such as democratic socialism, it's on the other side of a bottleneck. One where 1000 billionaires and a few trillionaires effectively own the world, while everyone else scratches out a subsistence income under neofeudalism. One where as much food gets thrown away as what the world consumes, and a billion people go hungry. One where some people have more than they could use in countless lifetimes, including the option to cheat death, while everyone else faces their own mortality.
"AI was the answer to Earth's problems" could be the opening line of a novel. But I've heard this story too many times. In those stories, the next 10 years don't go as planned. Once we enter the Singularity and the rate of technological progress goes exponential, it becomes impossible to predict the future. Meaning that a lot of fringe and unthinkable timelines become highly likely. It's basically the Great Filter in the Drake equation and Fermi paradox.
This is a little hard for me to come to terms with after a lifetime of little or no progress in the areas of tech that I care about. I remember in the late 90s when people were talking about AI and couldn't find a use for it, so it had no funding. The best they could come up with was predicting the stock market, auditing, genetics, stuff like that. Who knew that AI would take off because of self-help, adult material and parody? But I guess we should have known. Every other form of information technology followed those trends.
Because of that lack of real labor-saving tech to help us get real work done, there's been an explosion of phantom tech that increases our burden through distraction and makes our work/life balance even less healthy, alongside underemployment. This is why AI will inevitably be recruited to demand more productivity from us for the same income, not to decrease our share of the workload.
What keeps me going is that I've always been wrong about the future. Maybe one of those timelines sees a great democratization of tech, where even the poorest people have access to free problem-solving tech that lets them build assistants that increase their leverage enough to escape poverty without money, in effect making (late-stage) capitalism irrelevant.
If the rate of increasing equity is faster than the rate of increasing excess, then we have a small window of time to catch up before we enter a Long Now of suffering, where wealth inequality approaches an asymptote making life performative, pageantry for the masses who must please an emperor with no clothes.
In a recent interview with Mel Robbins in episode 715 of Real Time, Bill Maher said "my book would be called: It's Not Gonna Be That" about the future not being what we think it is. I can't find a video, but he describes it starting around the 19:00 mark:
> I've had Claude Code write an entire unit/integration test suite in a few hours (300+ tests) for a fairly complex internal tool. This would take me, or many developers I know and respect, days to write by hand.
I should have stopped reading here. People who think that the time it takes to write some code is the only metric that matters are only marginally better than people who rank employees by lines of code.