We tried this as a bit of an experiment - not quite greenfield but a fork and modification of an existing project.
I was a solo dev going brrrr with no code review, using Cursor to make pretty sweeping changes and implement full features. I think because it was a fork I was able to go really fast because I could point the coding agent at existing patterns and say, "do this but different".
Occasionally I would test changes in staging, but mostly I would just promote them straight to production. I went on like that for ~3 months and the product ended up getting ~75k (daily!) users in that time with *no* outages or downtime (!!!).
It worked really well! Until I hit my scaling limit and we added more engineers to the team. A new hire in their second week on the job shipped a change that was also written with AI and caused a 7-hour outage before we realized something was wrong. I even *thoroughly reviewed the code* because I knew it was generated, and I made a comment questioning the very lines that caused the outage, but after we talked it over we decided it was fine since it was only one tiny bit of a much larger change.
So I guess AI will ship bugs, but so will peer-reviewed code.
I would say it's a success, yes - it was great for quickly scaling up a theoretical product with a small engineering staff (just me lol).
Now that the product has matured, we've had to slow down significantly since we have a ton of paying users who expect SLAs and fewer bugs. This means more engineers, which means more testing and review, which means more process around what changes we're making.