I think most of this trial and error "You are an experienced engineer" stuff probably hurts model performance. No one ever does comprehensive testing so eh, yolo.
There are papers showing that models follow instructions less the more instructions they're given. Now think about how many instructions are embedded in that MD file, plus the system prompt, plus likely a local AGENTS.md, and in the end there is probably very little here that matters.
When you have a hammer, everything looks like a nail. Ad nauseam.
AI has made it possible for me to build several one-off personal tools in a matter of a couple of hours and has improved my non-tech life as a result. Before, I wouldn't even have considered such small projects because of the effort needed. It's been a relief not to have to even look at code, assuming you can describe your needs in a good prompt. On the other hand, I've seen vibe-coded codebases with excessive layers of abstraction and performance issues that came from a possibly lax engineering culture of not doing enough design work upfront before jumping into implementation. It's a classic mistake that AI amplifies.
Yes, average code itself has become cheap, but good code still costs, and with amazing code, well, you might still have an edge there for now. Eventually, though, accept that you will have to move up the abstraction stack to remain valuable when pitted against an AI.
What does this mean? Focus on core software engineering principles, design patterns, and understanding what the computer is doing at a low level. Just because you're writing TypeScript doesn't mean you shouldn't know what's happening at the CPU level.
I predict the rise of AI slop cleanup consultancies, but they'll be competing with smarter AIs that will clean up after themselves.
https://amirmalik.net - I haven't blogged in a while, but have been experimenting with single-file build-step-free HTML tools (inspired by simonw's tool catalog) at https://amirmalik.net/tools -- I'm hoping to add more "bring your own API key" local-first mini tools that store their data in IndexedDB or OPFS and sync. I should probably write a post about it :)
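As a taste of the approach, here's a minimal sketch of the storage side using the standard OPFS API (navigator.storage.getDirectory); the saveNote/loadNote helpers and file names are just illustrative, not from my actual tools:

```typescript
// Sketch: persist a small JSON document to the origin-private file system (OPFS).
// Works in the main thread on Chromium/Firefox; Safari may require a worker.
async function saveNote(name: string, text: string): Promise<void> {
  const root = await navigator.storage.getDirectory();
  const handle = await root.getFileHandle(`${name}.json`, { create: true });
  const writable = await handle.createWritable();
  await writable.write(JSON.stringify({ text, updatedAt: Date.now() }));
  await writable.close();
}

async function loadNote(name: string): Promise<string | null> {
  const root = await navigator.storage.getDirectory();
  try {
    const file = await (await root.getFileHandle(`${name}.json`)).getFile();
    return JSON.parse(await file.text()).text;
  } catch {
    return null; // nothing saved yet
  }
}
```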
I've built the same thing twice, first with Firecracker microVMs, and the second time using containers (gVisor).
While the microVM route is more secure, it's more complicated and the ops are tricky, but you can do some cool things to optimize startup time. For example, when I was working on a function-as-a-service platform, to reduce TTFB I trapped the `listen()` call, sent a VSOCK message to the VMM to trigger a freeze, snapshotted the VM, and saved it as a "template". Then for every request, the snapshot was cloned (with some file system tricks like CoW) and resumed to handle the request. It "just" worked, but the orchestration was kludgy.
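For flavor, here's a minimal sketch of the VMM side against Firecracker's HTTP API over its Unix socket; the VSOCK `listen()` trap, the paths, and the CoW disk cloning are all elided or invented for illustration:

```typescript
import http from "node:http";

// Tiny helper for Firecracker's API, which listens on a Unix socket.
function fcRequest(socketPath: string, method: string, path: string, body?: unknown): Promise<void> {
  return new Promise((resolve, reject) => {
    const req = http.request(
      { socketPath, method, path, headers: { "Content-Type": "application/json" } },
      (res) => {
        res.resume();
        res.statusCode && res.statusCode < 300
          ? resolve()
          : reject(new Error(`Firecracker API returned ${res.statusCode}`));
      }
    );
    req.on("error", reject);
    if (body !== undefined) req.write(JSON.stringify(body));
    req.end();
  });
}

// On the "listen() reached" VSOCK signal: pause the VM, snapshot it as a template.
async function freezeAsTemplate(sock: string): Promise<void> {
  await fcRequest(sock, "PATCH", "/vm", { state: "Paused" });
  await fcRequest(sock, "PUT", "/snapshot/create", {
    snapshot_type: "Full",
    snapshot_path: "/templates/fn.snap", // illustrative paths
    mem_file_path: "/templates/fn.mem",
  });
}

// Per request: boot a fresh Firecracker process, resume from a cloned snapshot.
async function resumeClone(sock: string, dir: string): Promise<void> {
  await fcRequest(sock, "PUT", "/snapshot/load", {
    snapshot_path: `${dir}/fn.snap`,
    mem_backend: { backend_path: `${dir}/fn.mem`, backend_type: "File" },
    resume_vm: true,
  });
}
```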
In the second incarnation of this, I decided to use Linux containers with the gVisor sandbox. You can take a look at my project https://github.com/ammmir/sandboxer which uses Podman and gVisor underneath; it's good enough for a prototype. Later on, you can swap it out for Firecracker microVMs if necessary. In fact, I'm thinking of adding microVM support to sandboxer itself. If you wanted to do it yourself, you'd swap out ContainerEngine() with a new implementation that calls out to Firecracker, roughly as sketched below. You'll need some way to do disk volume management (grow, clone, share, cross-machine? good luck!), snapshots, etc.
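To be concrete about the swap (this interface shape is my own hypothetical, not sandboxer's actual API; the point is that only the engine implementation changes):

```typescript
// Hypothetical engine abstraction (not sandboxer's real interface).
interface Engine {
  create(image: string): Promise<string>; // returns a sandbox id
  exec(id: string, cmd: string[]): Promise<{ stdout: string; exitCode: number }>;
  fork(id: string): Promise<string>;      // snapshot + CoW clone
  destroy(id: string): Promise<void>;
}

// Stubbed Firecracker-backed implementation; each method would shell out
// to, or speak HTTP with, a firecracker process.
class FirecrackerEngine implements Engine {
  async create(image: string): Promise<string> {
    // build a rootfs from `image`, boot a microVM, return its id
    return "vm-1";
  }
  async exec(id: string, cmd: string[]): Promise<{ stdout: string; exitCode: number }> {
    // run the command inside the VM, e.g. via a VSOCK guest agent
    return { stdout: "", exitCode: 0 };
  }
  async fork(id: string): Promise<string> {
    // pause, snapshot, CoW-clone disk + memory files, resume both VMs
    return "vm-2";
  }
  async destroy(id: string): Promise<void> {
    // kill the firecracker process and reap its volumes
  }
}
```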
Thank you for your guidance! We were thinking about using Docker and eventually settled on Firecracker.
Also, that's an interesting project you've got there. If you're interested, would it be possible to invite you over to our project Discord? Would love to hear more about your experience.
SCSI had a reputation for being very stable and yet very finicky. Stable in the sense that not using the CPU for transfers yielded good performance and reliability. The finicky part was the quality of the equipment (connectors, adapters, cables, and terminators), something that led to users having to figure out the order of connecting their devices in a chain that actually worked: “Hard drive into burner, and always the scanner last.”
SEEKING WORK | Bangkok, Thailand | REMOTE (APAC timezone)
Hi, I'm a seasoned software professional with 15+ years of experience across the stack, from low-level systems and protocols to web and mobile apps to DevOps CI/CD pipeline engineering to modern AI/LLM/agentic workflows. I like solving real business problems using stable and proven tools, as well as prototyping ideas, so whether you're looking to build a v1 of your product, a DevOps engineer, or looking for a CTO for a more established org, please reach out!
Hey HN! I'm a seasoned software professional with 15+ years of experience across the stack, from low-level systems and protocols to web and mobile apps to modern AI/LLM/agentic workflows. I like solving real business problems using stable and proven tools, as well as prototyping ideas, so whether you're looking to build a v1 of your product or looking for a CTO for a more established org, please reach out!
I was using Zed up until a few months ago. I got fed up with the entire AI panel being an editable area, so sometimes I ended up clobbering it. I switched to Cursor, but now I don't "trust" the editor and its undo stack; I've lost code as a result of it, particularly when you're mid-review of an agentic edit but decide to edit the edit. The undo/redo gets difficult to track; I wish there were some hierarchical tree view of history.
The restore checkpoint/redo is too linear for my lizard brain. Am I wrong to want a tree-based agentic IDE? Why has nobody built it?
Interesting. I actually like the editable format of the chat interface because it allows fixing small stuff on the fly (instead of having to talk about it with the model) and de-cluttering the chat after a few back-and-forths have made it a mess (instead of having to start anew), which makes the context window smaller and less confusing to the model, especially for local ones. Also, the editable form makes more sense to me, and it feels more natural and simpler to interact with an LLM assistant that way.
Yes! Editing the whole buffer is a major feature because the more failed attempts and trash you keep around, the dumber the model gets (and the more expensive).
If you're working on stuff like marketing websites that are well represented in the model's dataset then things will just fly, but if you're building something more niche it can be super important to tune the context -- in some cases this is the difference between being able to use AI assistance at all and not (otherwise the failure rate just goes to 100%).
> I actually like the editable format of the chat interface because it allows fixing small stuff on the fly
Fully agreed. This was the killer feature of Zed (and locally-hosted LLMs). Delete all tokens after the first mistake spotted in generated code. Then correct the mistake and re-run the model. This greatly improved code generation in my experience. I am not sure if cloud-based LLMs even allow modifying assistant output (I would assume not since it becomes a trivial way to bypass safety mechanisms).
The only issue I would imagine is not being able to use prompt caching, which can increase the cost of API calls, but I am not sure if prompt caching is even used in such a context in the first place. Otherwise, you just send the "history" as JSON; there is nothing mystical about LLM chats, really. If you use an API, you can send whatever you want for the model to complete.
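For the curious, here's a minimal sketch of what "editing assistant output" amounts to at the API level. This assumes an OpenAI-compatible local server; whether the model actually continues the trailing assistant message (prefill) or opens a fresh turn varies by server, so treat the endpoint and model name as placeholders:

```typescript
// The "chat" is just an array you build yourself, so you can truncate the
// assistant's reply at the first mistake and ask the model to continue.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "local-model", // illustrative name
    messages: [
      { role: "user", content: "Write a binary search in TypeScript." },
      // Hand-edited assistant output, cut right before the first mistake:
      {
        role: "assistant",
        content: "function bsearch(xs: number[], x: number): number {\n  let lo = 0, hi = xs.length - 1;",
      },
    ],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```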
Ah, that's a bummer. You can still add threads as context, but you cannot use slash commands there, so the only way to add them or other stuff to the context is to click buttons with the mouse. It would be nice if at least slash commands worked there.
edit: actually it is still possible to include text threads in there
It actually seems to work for me. I have an active text thread and it was added automatically to my inline prompt in the file. There was a box at the bottom of the inline text box; I think I had to click it the first time to include the context, but on subsequent times it was included by default.
Yeah, it was great because you were in control of where and when the edits happened.
So you could manage the context with great care, then go over to the editor and select specific regions and then "pull in" the changes that were discussed.
I guess it was silly that I was always typing "use the new code" in every inline assist message.
A hotkey to "pull new code" into a selected region would have been sweet.
I don't really want to "set it and forget it" and then come back to some mega diff that is like 30% wrong. Especially right now where it keeps getting stuck and doing nothing for 30m.
Been using cline and their snapshot/rewind/remove context (even out-of-order) features are really shining especially with larger projects and larger features+changes becoming more commonplace with stronger LLMs.
I would recommend you check it out if you've been frustrated by the other options out there - I've been very happy with it. I'm fairly sure you can't have git-like DAG histories, nor do I think that would be particularly useful for an AI-based workflow - you'd have to delegate rebasing and merge conflict resolution to the agent itself... lots of potential for disaster there, at least for now.
omg. "the entire AI panel being an editable area" is the KILLER feature for me!
I have complete control, use my vim keys, switch models at will and life is awesome.
What I don't like in the last update is that they removed the multi-tabs in the assistant. Previously I could have multiple conversations going and switch easily, but now I can only do one thing at a time :(
Haven't tried the assistant2 much, mostly because I'm so comfy with my current setup
You will not catch me using the words "agentic IDE" to describe what I'm doing because its primary purpose isn't to be used by AI any more than the primary purpose of a car is to drive itself.
But yes, what I am doing is creating an IDE where the primary integration surface for humans, scripts, and AIs is not the 2D text buffer but the embedded tree structure of the code. Zed almost gets there, and it's maddening to me that they don't embrace it fully. I think once I show them what the stakes of the game are, they have the engineering talent to catch up.
The main reason it hasn't been done is that we're still all basically writing code on paper. All of the most modern tools that people are using, they're still basically just digitizations of punchcard programming. If you dig down through all the layers of abstractions at the very bottom is line and column, that telltale hint of paper's two-dimensionality. And because line and column get baked into every integration surface, the limitations of IDEs are the limitations of paper. When you frame the task of programming as "write a huge amount of text out on paper" it's no wonder that people turn to LLMs to do it.
With the tree as the primary integration layer, you get to stop worrying about a valid tree representation blinking into and out of existence constantly, which is conceptually what happens when someone types code syntax left to right. They put an opening brace in, then later a closing brace; in between, a valid tree representation has ceased to exist.
Representing undo/redo history as a tree is quite different from representing the code structure as a tree. On the one hand I'm surprised no one seems to care that a response has nothing to do with the question... on the other hand, these AI tooling threads are always full of people talking right past each other and being very excited about it, so I guess it fits.
They certainly can be quite different things and in all current systems I know of the two are unrelated, but in my system they are one and the same.
That's possible because the source of truth for the IDE's state is an immutable concrete syntax tree. It can be immutable without ruining our costs because it has btree amortization built into it. So basically you can always construct a new tree with some changes by reusing most of the nodes from an old tree. A version history would simply be a stack of these tree references.
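A toy sketch of that structural sharing (the node shape is invented for illustration; the real thing would be a btree-chunked concrete syntax tree):

```typescript
// Immutable tree node; an edit rebuilds only the path from the change to the
// root and shares every other node with the previous version.
type Node = { kind: string; text?: string; children: readonly Node[] };

function replaceAt(root: Node, path: number[], replacement: Node): Node {
  if (path.length === 0) return replacement;
  const [i, ...rest] = path;
  const children = root.children.slice();
  children[i] = replaceAt(root.children[i], rest, replacement);
  return { ...root, children };
}

// Version history is then literally a stack of root references.
const history: Node[] = [];
function commit(root: Node): void {
  history.push(root);
}
```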
I'm very interested in this, and completely agree we are still trying to evolve the horse carriage without realizing we can move away from it.
How can I follow up on what you're building? Would you be open to having a chat? I've found your GitHub, but let me know if there's a better way to contact you.
Two months ago, I started exploring how LLMs can securely run arbitrary code. Since then, we've seen Manus and others build code inside sandboxes and I believe there are some YC startups in this space, too! I wrote a blog post [1] about building a simplistic version of this using Jupyter Notebook, but since then I've built a fully open source sandboxing server with more ergonomic HTTP endpoints (MCP should be next I guess?) and a half-decent UI for humans (see the demo video in the README).
A novel concept that I haven't seen implemented properly yet, perhaps useful for AI coding agents, is that a sandbox can be forked at any point. Similar to how you can fork a PostgreSQL database, you can fork a sandbox, which creates an independent sandbox with all of the changes in it. Technically, I first tried to implement this with checkpoint/restore using CRIU, but ran into issues with nesting beyond 2 levels deep and with custom user namespaces for security. It was also difficult to get CRIU to work with Linux programs that use shared memory segments and other Unixy things. I ended up switching to file system diffs and using reflinks on XFS to get copy-on-write semantics.
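The reflink part is pleasantly boring. A sketch, assuming an XFS volume created with reflink support (paths and the forkVolume helper are invented):

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";
const run = promisify(execFile);

// Clone a sandbox's volume directory with copy-on-write extents. On XFS
// (mkfs.xfs -m reflink=1) or Btrfs this copies only metadata; data blocks
// are shared until either side writes.
async function forkVolume(src: string, dst: string): Promise<void> {
  await run("cp", ["--reflink=always", "-a", src, dst]);
}
```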
Features:
* Automatic HTTPS with unique URL per sandbox (no need to deal with ingresses or exposing ports)
* Static token auth or GitHub app auth
* Built-in UI
* Multi-tenant ready: each user gets their own network
* List, download, and upload files into sandboxes
* Fork sandboxes to create arbitrary depths of clones
It's still in early stages, but it should be usable. I'd love your feedback and ideas on where to take this :) Personally, I want to use this as a code execution backend for local AI agents.
Hey HN! I'm a seasoned software professional with 15+ years of experience across the stack, from low-level systems and protocols to web and mobile apps to modern AI/LLM/agentic workflows. I like solving real business problems using the latest tools, without introducing too many shiny new toys.
I'm looking for short-term consulting gigs (open to long-term), so whether you're looking to prototype something new in order to catch the AI hype train, or something more traditional, please reach out ASAP!
How much context is eaten up by skills that rehash what a SOTA model should already know?
Maybe token-wise it's a wash: Elixir/OTP does a lot without third-party libs, where achieving the same thing in the npm world would require massive dependencies.