AlexCoventry's comments

If it had been about taking out dictators, they were kind of spoiled for choice in that regard. They could have picked an easier one, or at least one which made strategic sense in some way.

https://chatgpt.com/share/695a2613-97e8-800e-b2e4-28fc7707f2...


I like the idea, but this [1]:

    # Check for POSITIVE patterns (new in v3)
    elif echo "$PROMPT" | grep -qiE "perfect!|exactly right|that's exactly|that's what I wanted|great approach|keep doing this|love it|excellent|nailed it"; then
is fanciful.

[1] https://github.com/BayramAnnakov/claude-reflect/blob/main/sc...


Sorry, I will fix that.

Created an issue to track it; will fix tomorrow: https://github.com/BayramAnnakov/claude-reflect/issues/2

No, there's no training going on here, as far as I can tell. E.g., they use GPT-5 as their base model. Also, AFAICT from a quick skim/search, there's no mention of loss functions or derivatives, FWIW.

The derivative being a grad(ient) student sampling scaffolds against evals plus qualitative observations: that's most prompt-based LLM papers.

I think most of the progress comes from training by reinforcement learning on automated assessments of the code produced, so data is not really an issue.

Explosive ignition of a fire.

The PRC has nothing remotely corresponding to the Fourth Amendment, as far as I know.


This is probably a bit different. An LLM outputs one token at a time ("autoregressively") by sampling from a per-position token probability distribution that depends on all of the prior context. While the post doesn't describe OpenRouter's approach, most structured LLM output works by putting a mask over that distribution, so that any token which would break the intended structure gets probability zero and cannot be sampled. For instance, in the broken example from the post,

    {"name": "Alice", "age": 30
a standard LLM would have stopped there because it emitted an end-of-sequence (EOS) token. But since stopping there would leave syntactically invalid JSON, the EOS token gets probability zero, and the model is forced to either extend the number "30", add more entries to the object, or close it with "}".
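
As a concrete toy, here's roughly what that masking looks like at the sampling step. This is my own sketch, not OpenRouter's code (the post doesn't say how they implement it), and the allowed() check below is a hand-rolled stand-in for a real JSON grammar:

    # Toy sketch of constrained decoding (illustration only; the post doesn't
    # say how OpenRouter implements it). A real implementation masks the
    # model's full logits with a JSON grammar; here a hand-rolled allowed()
    # check stands in for that grammar over a handful of candidate tokens.
    import json
    import random

    def allowed(prefix: str, token: str) -> bool:
        if token == "<EOS>":
            # EOS is only legal once the accumulated text parses as complete JSON.
            try:
                json.loads(prefix)
                return True
            except json.JSONDecodeError:
                return False
        return True  # a real grammar would also prune structurally impossible tokens

    def sample(probs: dict[str, float], prefix: str) -> str:
        # Zero out forbidden tokens, renormalize, then sample as usual.
        masked = {t: p for t, p in probs.items() if allowed(prefix, t)}
        r, acc = random.random() * sum(masked.values()), 0.0
        for token, p in masked.items():
            acc += p
            if acc >= r:
                return token
        return token

    # After emitting '{"name": "Alice", "age": 30', the raw model puts most of
    # its probability on stopping; the mask forces it to keep going instead.
    prefix = '{"name": "Alice", "age": 30'
    model_probs = {"<EOS>": 0.90, "}": 0.05, ".5": 0.05}
    print(sample(model_probs, prefix))  # always '}' or '.5', never <EOS>

In a real decoder the same mask is applied to the full logits vector over the whole vocabulary at every step, which is why structurally invalid tokens simply can't be sampled.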

I haven't played much with structured output, but I imagine the biggest risk is that you may force the model to work with contexts outside its training data, leading it to produce garbage, though hopefully syntactically-correct garbage.

I don't understand, though, why the probability of incorrect JSON wouldn't go to zero under this framework (unless you hit the max sequence length before the JSON ended). The post implies that JSON errors still happen, so it's possible they're doing something else.


What do you find interesting about it, and how does it compare to commercial offerings?


It's rare to find a local model that's capable of running tools in a loop well enough to power a coding agent.

I don't think gpt-oss:20b is strong enough, to be honest, but 120b can do an OK job.

Nowhere NEAR as good as the big hosted models though.
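
For reference, the loop itself is not complicated; the hard part is a model that can drive it well. A minimal sketch, assuming a local OpenAI-compatible endpoint (Ollama's default at http://localhost:11434/v1) and a single hypothetical run_shell tool:

    # Minimal "tools in a loop" coding-agent sketch (illustration only, not any
    # particular agent's code), assuming gpt-oss:120b served by Ollama.
    import json
    import subprocess
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "run_shell",  # hypothetical tool; a real agent would sandbox this
            "description": "Run a shell command and return its combined output.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    }]

    def run_shell(command: str) -> str:
        proc = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=60)
        return (proc.stdout + proc.stderr)[-4000:]  # truncate to keep context small

    messages = [{"role": "user",
                 "content": "Run the test suite and summarize any failures."}]
    for _ in range(20):  # cap the rounds so a weak model can't loop forever
        resp = client.chat.completions.create(model="gpt-oss:120b",
                                              messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:  # no more tool requests: the model is done
            print(msg.content)
            break
        for call in msg.tool_calls:  # run each requested tool, feed the result back
            result = run_shell(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result})

Everything above is boilerplate; the point of the comment is whether the model can drive that loop sensibly, emitting well-formed tool calls round after round and knowing when to stop.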


Think of it as the early years of UNIX and the PC. Running inference and tools locally and offline opens doors to new industries. We might not even need the client/server paradigm locally; an LLM is just a probabilistic library we can call.


Thanks.


With the massive dependency trees we tolerate these days, the risk of supply-chain attacks has been enormous for years, so I was already in the habit of doing all my development in a VM, except for throwaway scripts with no dependencies. It amazes me that people don't do that.


The fundamental ideas in the paper aren't particularly novel. They will probably work as advertised.

