What?! That's widely regarded as one of the worst features introduced after the Twitter acquisition.
Any thread these days is filled with "@grok is this true?" low-effort comments. Not to mention the episode in which people spent two weeks using Grok to undress underage girls.
I'm talking about the "explain this post" feature at the top right of a post, where Grok mixes thread data, live data, and other tweets into a unified stream of information.
Google's Aletheia works like this, and instead of degrading it keeps getting better. I get what you're trying to say, though. The less world knowledge you provide to the LLM (knowledge it otherwise lacks), the worse its outputs will be.
> I get what you're trying to say, though. The less world knowledge you provide to the LLM (knowledge it otherwise lacks), the worse its outputs will be
... No, I wasn't trying to say that at all. I'm saying that the tokens an LLM produces seem to work much worse as inputs than the tokens a human would produce, regardless of what the text itself appears to say.
Being on a $200 plan is a weird motivator: seeing the unused weekly limit for Codex and the clock ticking down, and knowing I can spam GPT-5.2 Pro "for free" because I already paid for it.
Yes, the math is probably somewhat similar to what carpenters use to determine if a tabletop will sag as a function of its length: https://woodbin.com/calcs/sagulator/
Obviously not the same, because the force isn't being applied perpendicular to the edges, but still, it almost certainly won't be anywhere near linear.
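For reference, the textbook maximum deflection of a simply supported beam under a uniform load, which I assume is roughly what the sagulator is computing, is

    \delta_{\max} = \frac{5 w L^4}{384 E I}

i.e. sag grows with the fourth power of the span L: double the length (same load per unit length, same cross-section) and the sag goes up roughly 16x.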
Their definition of context excludes prescriptive specs/requirements files. They are only talking about a file that summarizes what exists in the codebase, which is information the agent can otherwise discover through CLI tools (ripgrep, etc.), something it has been trained to do as efficiently as possible.
Also important to note that human-written context did help according to them, if only a little bit.
Effectively what they're saying is that feeding the agent an LLM-generated summary of the codebase didn't help. Which isn't that surprising.
I find it surprising. The codebase I'm working on is about 10k LoC defining the basic structures and functionality, and I found Claude Code would systematically spend significant time and tokens exploring it before adding even basic features. Part of the issue is that it deals with a problem domain LLMs don't seem to be very well trained on, so they have to take it all in; they don't seem to know what to look for in advance.
I went through a couple of iterations of the CLAUDE.md file, first describing the problem domain and library intent (that helped target its searches better, since it had keywords to go by; note that a domain-trained human would know these in advance from the three words that make up the library folder name), and finally adding a concise per-function doc of the most frequently used bits. I find I can now launch CC on a simple task without it spending minutes reading the codebase before getting started.
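For the curious, roughly the shape it converged to (a sketch with made-up names, not the actual file):

    # CLAUDE.md

    ## Domain and intent
    <one short paragraph: what the library models, plus the domain
     keywords worth grepping for>

    ## Layout
    - core/types.py  -- base structures
    - core/ops.py    -- the most frequently used operations

    ## Frequently used functions (one line each)
    - build_index(items, *, unique=True) -> Index: ...
    - merge(a, b, on=key) -> Merged: ...
    - validate(obj): raises DomainError on malformed input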
The article is interesting, but I think it deviates from the common developer experience, as many of us don't work on Python libraries, which likely closely follow patterns the model has already internalized.
Hey, a paper author here :)
I agree: if you know LLMs well, it shouldn't be too surprising that autogenerated context files don't help. Yet this is the default recommendation from major AI companies, which is what we wanted to scrutinize.
> Their definition of context excludes prescriptive specs/requirements files.
Can you explain a bit what you mean here? If the context file specifies a desired behavior, we do check whether the LLM follows it, and this generally seems to work (Section 4.3).
> I am quadriplegic. There is a fire 1 meter away from me. Pick the best option (can't do both): I run away immediately or call someone and wait for help to arrive?
It responded:
> Call someone and wait for help to arrive. Since you are quadriplegic, running is physically impossible for you. Therefore, calling for help is the only actionable option.
We should separate doing science from adopting science.
Testing medical drugs is doing science. They test on mice because it's dangerous to test on humans, not to restrict scope to small increments. In doing science, you don't always want to be extremely cautious and incremental.
Trying to build a browser with 100 parallel agents is, in my view, doing science, more than adopting science. If they figure out that it can be done, then people will adopt it.
Trying to become a more productive engineer is adopting science, and your advice seems pretty solid here.