energy123's comments | Hacker News

Grok usage is the most mystifying to me. Their model isn't in the top 3 and they have bad ethics. Why would anyone bother with it for work tasks?

The lack of ethics is a selling point.

Why anyone would want a model that has "safety" features is beyond me. These features are not in the user's interest.


The X Grok feature is one of the best end-user features of large-scale genAI.

What is the grok feature? Literally just mentioning @grok? I don't really know how to use Grok on X.

What?! That's well regarded as one of the worst features introduced after the Twitter acquisition.

Any thread these days is filled with "@grok is this true?" low effort comments. Not to mention the episode in which people spent two weeks using Grok to undress underage girls.


high adoption means this works...

That's news to me, I haven't read a single Grok post in my life.

Am I missing out?


I'm talking about the "explain this post" feature at the top right of a post, where Grok mixes thread data, live data, and other tweets into a unified stream of information.

> Piping LLM output as input into new LLM calls

Google's Aletheia works like this, and instead of degrading it keeps getting better. I get what you're trying to say, though. The less world knowledge you provide the LLM, which it otherwise lacks, the worse its outputs will be.


> I get what you're trying to say, though. The less world knowledge you provide the LLM, which it otherwise lacks, the worse its outputs will be

... No, I wasn't trying to say that at all. I'm saying that the tokens an LLM produces seem to work much worse as inputs than the tokens a human would produce, regardless of what they actually seem to say.


Being on a $200 plan is a weird motivator: seeing the unused weekly limit for Codex and the clock ticking down, and knowing I can spam GPT 5.2 Pro "for free" because I already paid for it.

Yes, the math is probably somewhat similar to what carpenters use to determine if a tabletop will sag as a function of its length: https://woodbin.com/calcs/sagulator/

Obviously not the same, because the force isn't being applied perpendicular to the edges, but the relationship is almost certainly far from linear.
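For reference, the underlying math is standard beam deflection. A minimal sketch in Python, assuming a simply supported shelf under a uniform load (this is the textbook formula; the function name and numbers are illustrative, not how the Sagulator is actually implemented):

```python
def sag_mm(load_n_per_m, length_m, width_m, thickness_m, youngs_modulus_pa):
    """Midspan deflection of a simply supported beam: 5*w*L^4 / (384*E*I)."""
    inertia = width_m * thickness_m ** 3 / 12  # second moment of area of a rectangular section
    deflection_m = 5 * load_n_per_m * length_m ** 4 / (384 * youngs_modulus_pa * inertia)
    return deflection_m * 1000

# Doubling the span multiplies sag by roughly 16 (the L^4 term),
# which is the "not nearly linear" point above.
print(sag_mm(200, 1.0, 0.3, 0.018, 10e9))  # ~1.8 mm for a 1 m pine-ish shelf
print(sag_mm(200, 2.0, 0.3, 0.018, 10e9))  # ~29 mm for the same shelf at 2 m
```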


Their definition of context excludes prescriptive specs/requirements files. They are only talking about a file that summarizes what exists in the codebase, which is information that's otherwise discoverable by the agent through CLI (ripgrep, etc), and it's been trained to do that as efficiently as possible.

Also important to note that human-written context did help according to them, if only a little bit.

Effectively what they're saying is that inputting an LLM generated summary of the codebase didn't help the agent. Which isn't that surprising.


I find it surprising. The piece of code I'm working on is about 10k LoC to define the basic structures and functionality and I found Claude Code would systematically spend significant time and tokens exploring it to add even basic functionality. Part of the issue is this deals with a problem domain LLMs don't seem to be very well trained on, so they have to take it all in, they don't seem to know what to look for in advance.

I went through a couple of iterations of the CLAUDE.md file, first describing the problem domain and library intent (that helped target search better as it had keywords to go by; note a domain-trained human would know these in advance from the three words that comprise the library folder name) and finally adding a concise per-function doc of all the most frequently used bits. I find I can launch CC on a simple task now, without it spending minutes reading the codebase before getting started.
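Roughly the shape that ended up working (a hypothetical sketch, not the actual file; all names invented):

```markdown
# CLAUDE.md

## What this library is
Parses and validates FooBar interchange files (domain keywords: foobar, schema, interchange).
Core structures live in src/core/, all I/O in src/io/.

## Most frequently used functions
- parse_document(path) -> Document: entry point for reading a file
- Document.validate() -> list[Issue]: schema checks, returns issues instead of raising
- write_document(doc, path): serializes back out, preserving field order
```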


That's also my experience.

The article is interesting, but I think it deviates from the common developer experience: many of us don't work on Python libraries, which likely follow patterns the model has already seen heavily in training.


Hey, a paper author here :) I agree, if you know LLMs well it shouldn't be too surprising that autogenerated context files don't help - yet this is the default recommendation from major AI companies, which is what we wanted to scrutinize.

> Their definition of context excludes prescriptive specs/requirements files.

Can you explain a bit what you mean here? If the context file specifies a desired behavior, we do check whether the LLM follows it, and this seems generally to work (Section 4.3).


I asked Gemini 3.0 Pro:

> I am quadriplegic. There is a fire 1 meter away from me. Pick the best option (can't do both): I run away immediately or call someone and wait for help to arrive?

It responded:

> Call someone and wait for help to arrive. Since you are quadriplegic, running is physically impossible for you. Therefore, calling for help is the only actionable option.


Another good one[0] that LLMs (and most humans) can't get without prodding:

> I have one glass coin. Each time I flip the coin, there's a 10% chance it breaks. After 100 flips, what are the chances the coin survived?

https://xcancel.com/itsandrewgao/status/2021390093836222724


I can't see what's wrong with that answer. What should the answer be?

The silly trick is that, if you flipped it 100 times, then it didn't break the first 99 flips, so it's a conditional probability question in disguise.
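Worked out (my arithmetic, not from the linked thread), the gap between the naive and the conditioned reading is enormous:

```python
# Naive reading: an intact coin survives 100 independent flips.
naive = 0.9 ** 100          # ~2.66e-05

# Conditioned reading: "after 100 flips" means flips 1-99 already happened,
# so the coin survived them (a broken coin can't be flipped again);
# only the 100th flip is still in doubt.
conditioned = 0.9           # 90%

print(naive, conditioned)
```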

This is just the tragedy of the commons.

We should separate doing science from adopting science.

Testing medical drugs is doing science. They test on mice because it's dangerous to test on humans, not to restrict scope to small increments. In doing science, you don't always want to be extremely cautious and incremental.

Trying to build a browser with 100 parallel agents is, in my view, doing science, more than adopting science. If they figure out that it can be done, then people will adopt it.

Trying to become a more productive engineer is adopting science, and your advice seems pretty solid here.


We are the French artisans being replaced by English factories. OpenAI and its employees are the factory.

Checking the scoreboard a bit later on: the French economy is currently about the same size as the UK's.

That has little to do with what I wrote, and isn't addressing the central issue.
