Hacker News | planckscnst's comments

I mostly use LLMs in a zero-touch way: I never actually edit code and I almost never read it. But I do still dive into the details by exploring the code with targeted questions through the LLM. Sometimes I go through ridiculously long sessions trying to get the LLM to "see" the correct/optimal/simplest/etc. solution itself. There are many times when it simply never gets there, no matter how close I lead the horse to the water. I went through one of these sessions just yesterday, and it reinforced my impression that systems like gastown and pure Ralph-loop setups are never going to reach the quality I'm looking for, and it's going to cost a lot of money not to get there.

I've honed a relatively decent flow that requires interaction from me for the important parts (mostly) while letting the agent make its own decisions at the not-important parts (mostly). The result is that I can send the agent off on an hours-long dev cycle and get relatively decent results that need only a few minor fixes afterward. I think this is the best style for the current generation of AI.


When they blocked OpenCode, I was in the middle of adding a feature. I don't think it's possible to mimic CC in an undetectable way and have the feature work.

The feature allows the LLM to edit the context. For example, you can "compact" just portions of the conversation and replace it with a summary. Anthropic can see that the conversation suddenly doesn't share the same history as previous API calls.
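To make the detection angle concrete, here is a hedged, purely illustrative sketch of why edited context stands out: a normal client only ever appends to the conversation, so successive requests share a strict prefix, while a compacted conversation does not. The function and message shapes are made up for illustration, not anything from the Anthropic API.

```python
# Illustrative only: how a provider could notice that a conversation's
# history was rewritten. Normal turns are append-only, so each request's
# messages start with the previous request's messages as a strict prefix.
def shares_prefix(prev_msgs, next_msgs):
    return next_msgs[:len(prev_msgs)] == prev_msgs

turn1 = ["sys", "user: hi", "asst: hello"]
turn2 = turn1 + ["user: fix bug"]
print(shares_prefix(turn1, turn2))      # True: normal append-only turn

# After compaction, earlier messages are replaced with a summary, so the
# prefix check fails even though the request is otherwise well-formed.
compacted = ["sys", "summary of earlier chat", "user: fix bug"]
print(shares_prefix(turn1, compacted))  # False: history was rewritten
```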

In fact, I ported the feature to Claude Code using tweakcc, so it literally _is_ Claude Code. After a couple days they started blocking that with the same message that they send when they block third party tools.


They even block Claude Code if you've modified it via tweakcc. When they blocked OpenCode, I ported a feature I wanted to Claude Code so I could continue using that feature. After a couple days, they started blocking it with the same message that OpenCode gets. I'm going to go down to the $20 plan and shift most of my work to OpenAI/ChatGPT because of this. The harness features matter more to me than model differences in the current generation.


"selected" and "highlighted" would also be useful


There is so much work we can do with harnesses that can make the already existing models so much more capable. I definitely feel the author's frustration, as I've also been working on some harness stuff. When Anthropic subscriptions got cut off from OpenCode and other third-party tools, I was very disappointed, because the model I do the most work in is Claude and I was specifically developing a change [1] in the hopes it would make Claude even better. After that, I started implementing the feature in Claude Code directly (using tweakcc), and after a day of working on that, they even blocked my tweaked Claude Code with the same message. It means I simply won't be able to use this idea with Claude at all.

[1]: the README.md describes the Context Bonsai features in my fork here: https://github.com/Vibecodelicious/opencode


I'm working on lots of projects. My favorite is what I call "context bonsai" where I'm giving LLM harnesses the ability to surgically edit the context. It's available as a tool. You can say "remove that failed debugging session and write a summary of what we learned." Or you can take a more hands-on approach and say "remove messages msg_ID1 through msg_ID2". The removal leaves a summary and keywords, and the original messages can be pulled back into context if the LLM thinks they're useful.
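The prune/retrieve mechanism described above can be sketched roughly like this. All of the names here (`Message`, `Context`, `prune`, `retrieve`) are illustrative stand-ins, not OpenCode's actual API, and the real implementation lives in the linked fork.

```python
# Hypothetical sketch of the "context bonsai" idea: collapse a span of
# messages into a summary stub, archive the originals, and allow them to
# be pulled back later. Names are illustrative, not OpenCode's API.
from dataclasses import dataclass, field

@dataclass
class Message:
    id: str
    content: str

@dataclass
class Context:
    messages: list
    archive: dict = field(default_factory=dict)

    def prune(self, start_id, end_id, summary, keywords):
        ids = [m.id for m in self.messages]
        i, j = ids.index(start_id), ids.index(end_id)
        stub_id = f"stub:{start_id}..{end_id}"
        self.archive[stub_id] = self.messages[i:j + 1]
        stub = Message(stub_id,
                       f"[pruned: {summary} | keywords: {', '.join(keywords)}]")
        self.messages[i:j + 1] = [stub]  # span replaced by one summary stub
        return stub_id

    def retrieve(self, stub_id):
        # Drop the stub and append the originals to the end of the
        # conversation rather than splicing them back in place.
        self.messages = [m for m in self.messages if m.id != stub_id]
        self.messages.extend(self.archive.pop(stub_id))

ctx = Context([Message("m1", "plan"), Message("m2", "failed debug"),
               Message("m3", "more debug"), Message("m4", "fix")])
sid = ctx.prune("m2", "m3", "debugging dead end", ["timeout", "retry"])
print([m.id for m in ctx.messages])  # ['m1', 'stub:m2..m3', 'm4']
ctx.retrieve(sid)
print(len(ctx.messages))             # 4
```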

I would really like people to try it out and report bugs, failures, and successes.

https://github.com/Vibecodelicious/opencode/blob/surgical_co...

I'm currently trying to get the LLM to be more proactive about removing content that is no longer useful in order to stay ahead of autocompaction and also just to keep the context window small and focused in general.


I find it fascinating to give the LLMs huge stacks of reflective context. It's incredible how good they are at handling huge amounts of CSV-like data. I imagine they would be good at trimming their context down.

I did some experiments exposing the raw latent states of a small 1B Gemma model (via hooks) to a large model as it processed data. I'm curious whether it's possible for the large model to nudge the smaller model's latents to get the outputs it wants. I desperately want to get thinking out of tokens and into latent space; it's something I've been chasing for a bit.
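The hook pattern in that experiment can be illustrated in framework-free terms. This is a toy stand-in, not the actual Gemma setup: each "layer" is just a function, the hook captures its output as the latent state, and an outer process (playing the role of the large model) can intervene between layers.

```python
# Pure-Python sketch of the hook pattern: capture each layer's output
# ("latent state") as data flows through, and allow an outside process
# to nudge it mid-forward-pass. Toy layers stand in for Gemma blocks.
captured = {}

def with_hook(name, layer):
    def wrapped(x):
        out = layer(x)
        captured[name] = out  # expose the latent state to an observer
        return out
    return wrapped

layers = [with_hook(f"layer{i}", f) for i, f in enumerate(
    [lambda x: [v * 2 for v in x],      # toy "layer 0"
     lambda x: [v + 1 for v in x],      # toy "layer 1"
     lambda x: [max(v, 0) for v in x]]  # toy "layer 2"
)]

def forward(x, nudge=None):
    for i, layer in enumerate(layers):
        x = layer(x)
        if nudge and i in nudge:  # a larger model could intervene here
            x = nudge[i](x)
    return x

print(forward([1, -2]))            # [3, 0]
print(captured["layer0"])          # [2, -4]
# Zero out the latents after layer 1 and watch the output change:
print(forward([1, -2], {1: lambda x: [0] * len(x)}))  # [0, 0]
```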


Yes - I think there is untapped potential in figuring out how to understand and use the latent space. I'm still at the language layer. I occasionally stumble across something that seems to tap into something deeper, and I'm getting better at finding those. But direct observability and actuation of those lower layers is an area that I think is going to be very fruitful if we can figure it out.


I'm sure you're aware but it's worth pointing out that you will lose all your cache hit discounts with some providers. The next turn will incur the cost of the whole trajectory billed at fresh input token rates.
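A rough back-of-envelope illustrates the point. The prices below are purely illustrative placeholders (real rates vary by provider and model); the shape of the math is what matters: once the cached prefix is invalidated, the entire trajectory is re-billed at fresh input rates.

```python
# Illustrative cost comparison (made-up prices, not any provider's real
# rates): editing earlier messages invalidates the cached prefix, so the
# next turn re-reads the whole trajectory at fresh input token rates.
FRESH = 3.00 / 1_000_000   # $/input token, illustrative
CACHED = 0.30 / 1_000_000  # $/cache-read token, illustrative 90% discount

trajectory = 150_000  # tokens of accumulated conversation
new_turn = 2_000      # tokens added this turn

with_cache = trajectory * CACHED + new_turn * FRESH
without_cache = (trajectory + new_turn) * FRESH
print(f"${with_cache:.3f} vs ${without_cache:.3f}")  # $0.051 vs $0.456
```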

As an aside, 95 pages into the system card for Claude Opus 4.6, Anthropic acknowledges that they have disabled prompt prefill.


Yes, I have already made deliberate cache decisions and plan to do more once it's working the way I imagine. I think the trimmed-down context will have a much bigger effect than the cache stuff, though.

As far as I understand, its caches are not a "next-turn" thing but a TTL thing.

I made the "retrieve" tool (which pulls back previously removed content) append to the conversation rather than splice it back where it originally was. But it's a bit premature to know whether that's a real optimization.


I think this is a valuable direction for research to take. I suspect human emotion is represented in the training data and that there is some underlying model representing human emotional systems present within current LLMs. If we can understand how the model affects response given certain qualities in provided stimuli, then we can effectively use this embedded model to improve our understanding of both LLMs and Humans and also get more productive output from LLMs.


Thanks — I think you're right that emotional dynamics are already latent in training data. The question is whether we can make that implicit model explicit and architectural, so it's auditable and controllable rather than emergent and opaque. That's what HEART attempts: rather than hoping the LLM has internalized useful emotional patterns, we create an explicit 18-dimensional state that actively modulates retrieval and reasoning. The tradeoff is added complexity, but the gain is transparency — you can trace exactly how emotional state influenced a decision. Curious whether you think the interpretability benefits outweigh the engineering overhead, or if there's a lighter-weight approach that could get similar results.
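One way to picture "explicit and auditable" is a low-dimensional state vector that visibly blends into retrieval scores, so every ranking decision can be traced back to its components. The sketch below is hypothetical: the 18 dimensions, the cosine blending, and the weight are illustrative choices, not HEART's actual design.

```python
# Hypothetical sketch of an explicit emotional state modulating
# retrieval: each memory gets a relevance score and a state-affinity
# score, blended with a visible weight. Not HEART's actual design.
import math

DIMS = 18  # illustrative; matches the comment's 18-dimensional state

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(memories, query_vec, state, weight=0.3):
    scored = []
    for m in memories:
        relevance = cosine(m["vec"], query_vec)
        affinity = cosine(m["emotion"], state)
        score = (1 - weight) * relevance + weight * affinity
        # Keeping the components makes the influence traceable.
        scored.append((score, m["text"], relevance, affinity))
    scored.sort(reverse=True)
    return scored

state = [0.0] * DIMS
state[0] = 1.0  # e.g. one dimension elevated
mem = [{"text": "calm note", "vec": [1] + [0] * (DIMS - 1),
        "emotion": [0] * DIMS},
       {"text": "urgent note", "vec": [0.9] + [0] * (DIMS - 1),
        "emotion": list(state)}]
for score, text, rel, aff in rank(mem, [1] + [0] * (DIMS - 1), state):
    print(f"{text}: score={score:.2f} (relevance={rel:.2f}, affinity={aff:.2f})")
```

Because the per-memory relevance and affinity components are carried alongside the final score, an audit can show exactly how much the state shifted each ranking.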


That's so strange. I found GPT to be abysmal at following instructions to the point of unusability for any direction-heavy role. I have a common workflow that involves an orchestrator that pretty much does nothing but follow some simple directions [1]. GPT flat-out cannot do this most basic task.

[1]: https://github.com/Vibecodelicious/llm-conductor/blob/main/O...


Strange behaviour and LLMs are the iconic duo of the decade. They've definitely multiplied my productivity, since now, instead of putting off writing boring code or getting stuck on details till I get frustrated and give up, I just give it to an agent to figure out.

I don't think my ability to read, understand, and write code is going anywhere, though.

Neat tool BTW, I'm in the market for something like that.


I've found this orchestrator+reviewer+judge setup to yield much better results than anything else I've tried. And it's such a simple setup - a few markdown files.

I'm also creating one that is similar, but purpose-built for making the plans that this setup can orchestrate. It still needs some tweaking to get agents to follow it better - it still takes additional prompting to nudge it down the proper path. But I've had similar benefits - sending plans through this adversarial review loop has yielded significant improvements in final output.

https://github.com/Vibecodelicious/llm-conductor/blob/main/p...
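The orchestrator/reviewer/judge loop described above can be sketched schematically. The agent calls here are toy stand-ins; in the real setup each role would be an LLM invocation driven by its own markdown file, and the function names are illustrative.

```python
# Schematic sketch of an adversarial review loop: an author drafts, a
# reviewer critiques, and a judge decides whether to accept or send the
# draft back with feedback. Agents are stubbed with plain functions.
def run_loop(task, author, reviewer, judge, max_rounds=3):
    draft = author(task, feedback=None)
    for _ in range(max_rounds):
        critique = reviewer(draft)
        verdict = judge(draft, critique)
        if verdict == "accept":
            return draft
        draft = author(task, feedback=critique)  # revise using the critique
    return draft  # give up after max_rounds and return the best effort

# Toy stand-ins: the reviewer demands tests; the judge accepts once the
# reviewer has nothing left to flag.
author = lambda task, feedback: task + (" + tests" if feedback else "")
reviewer = lambda draft: None if "tests" in draft else "missing tests"
judge = lambda draft, critique: "accept" if critique is None else "revise"

print(run_loop("implement parser", author, reviewer, judge))
# → 'implement parser + tests'
```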


Unrelated but this just happened and I thought of you ;-)

  › commit this, **SKIP BEADS**


  • Ran git status -sb
    └ ## main...origin/main [ahead 4]
       M containers/frigate.nix
      ?? .beads/bd.sock.startlock

  • I see an untracked .beads/bd.sock.startlock file that might be normal but needs clarification. I also note the requirement to include a bead and confirm bead readiness
    before proceeding, so I'll ask whether to create the bead and if the untracked file should be included in the commit.

  ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

  • I can’t skip beads for this repo. The AGENTS.md rules are explicit: no work (including commits) without an open bead. Please confirm you want me to create a bead for
    this commit.

I don't know what's wrong with your Codex, but mine can't bring itself to break the rules.


The API key is not a subscription. The title says subscriptions are blocked from using third-party tools. Or am I misunderstanding?


Headline's been edited since my post. It previously said something along the lines of "Anthropic bans API use in OpenCode CLI"


I've been [adding an OpenCode feature that allows the LLM to edit its own context][1] and trying to debug an issue with the Anthropic API, because I'm calling it with missing fields that it expects. I hope my multiple erroneous API calls aren't what triggered this decision.

[1]: https://github.com/Vibecodelicious/opencode/tree/surgical_co...

