
Generating more "think out loud" tokens and hiding them from the user...

Idk if I'm "feeling the AGI" if I'm being honest.

Also... telling that they chose to benchmark against CodeForces rather than SWE-bench.



> Also... telling that they chose to benchmark against CodeForces rather than SWE-bench.

They also worked with Devin to benchmark it on Devin's internal benchmarks, where it's twice as good as GPT-4o: https://x.com/cognition_labs/status/1834292718174077014 https://www.cognition.ai/blog/evaluating-coding-agents


They’re running a business. They don’t owe you their trade secrets.


Why not? Isn't that basically what humans do? Sit there and think for a while before answering, going down different branches/chains of thought?


This new approach shows one of two things:

1) The "bitter lesson" may not be true, and there is a fundamental limit to transformer intelligence.

2) The "bitter lesson" is true, and there just isn't enough data/compute/energy to train AGI.

All the cognition should be happening inside the transformer. Attention is all you need. The possible cognition and reasoning occurring "inside" in high dimensions is much more advanced than any possible cognition that you output into text tokens.

This feels like a sidequest/hack on what was otherwise a promising path to AGI.


On the contrary, this suggests that the bitter lesson is alive and kicking. The bitter lesson doesn't say "compute is all you need"; it says "only those methods which allow you to make better use of hardware as hardware itself scales are relevant".

This chain of thought / reflection method allows you to make better use of the hardware as the hardware itself scales. If a given transformer is N billion parameters, and to solve a harder problem we estimate we need 10N billion parameters, one way to do it is to build a GPU cluster 10x larger.

This method shows that there might be another way: instead, train the N-billion-parameter model differently so that we can spend 10x the compute on it at inference time. Say hardware gets 2x better in 2 years -- then this method will be 20x better than it is now!
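To make the arithmetic concrete, here's a back-of-the-envelope sketch in Python. The numbers are just the ones in the paragraph above (10x inference compute, 2x hardware), treated as illustrative assumptions rather than anything measured:

  # Back-of-the-envelope version of the argument above.
  # All numbers are illustrative assumptions, not measurements.

  baseline_model = 1          # an N-billion-parameter model, normalized to 1
                              # (route A would be training a ~10x larger model instead)

  inference_multiplier = 10   # route B: spend ~10x compute per query on chain of thought
  hardware_gain_2yr = 2       # assume hardware gets ~2x better in two years

  route_b_now = baseline_model * inference_multiplier
  route_b_in_2yr = route_b_now * hardware_gain_2yr

  print(route_b_now)      # 10 -> roughly the "10N" estimate
  print(route_b_in_2yr)   # 20 -> the "20x better than now" claim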


I'd be shocked if we don't see diminishing returns in the inference compute scaling laws. We already didn't deserve how clean and predictive the pre-training scaling laws were; there's no way the universe grants us another boon of that magnitude.


Does that mean human intelligence is cheapened when you talk out a problem to yourself? Or when you write down steps solving a problem?

It's the exact same thing here.


The similarity is only cosmetic. It's used because it's easy to leverage existing work on LLMs, and scaling (although not cheap) is an obvious approach.


> Does that mean human intelligence is cheapened when you talk out a problem to yourself?

In a sense, maybe yeah. Of course, if one were to really be absolute about that statement it would be absurd; it would greatly overfit reality.

But it is interesting to assume this statement as true. Oftentimes when we think of ideas "off the top of our heads" they are not as profound as ideas that "come to us" in the shower. The subconscious may be doing 'more' 'computation' in a sense. Lakoff said the subconscious was 98% of the brain, and that the conscious mind is the tip of the iceberg of thought.


lol come on, it’s not the exact same thing. At best this is like gagging yourself while you talk through the problem, then ungagging yourself when you say the answer. And that's presupposing LLMs are thinking in, your words, exactly the same way as humans.

At best it maybe vaguely resembles thinking


> "lol come on"

I've never found this sort of argument convincing. It's very Chalmers.


Admittedly not my most articulate; my exasperation showed through. To some extent it seems warranted, as it tends to be the most effective tactic against hyperbole. Still trying to find a better solution.


Karpathy himself believes that neural networks are perfectly plausible as a key component of AGI. He has said that they don't need to be superseded by something better; it's just that everything else around them (especially infrastructure) needs to improve. Since his is one of the most valuable opinions in the entire world on the subject, I tend to trust what he says.

source: https://youtu.be/hM_h0UA7upI?t=973


I think it's too soon to tell. Training the next generation of models means building out entire datacenters. So while they wait, they have engineers build these sidequests/hacks.


Attention is about similarity/statistical correlation, which is fundamentally stochastic, while reasoning needs to be truthful and exact to be successful.


Imagine instead that the bitter lesson says: we can expand a circle outwards in many dimensions, across the space of ways to continuously, mathematically manipulate data to adjust outputs.

Even the attention-token approach is, in the grand scheme of things, a single line outwards from the centre. We have not even explored around the centre (with the same compute spend) for things like non-token generation; different layers, activation functions and norming; the query/key/value setup (why do we only use the 3 matrices inherent to contextualising tokens, why not add a 4th matrix for something else?); character-, sentence-, whole-thought- or paragraph-level one-shot generation; or positional embeddings that could work differently.

The bitter lesson says there is a world almost completely untouched by our findings left for us to explore. The temporary work of non-data approaches can piggyback off a point on the line; it cannot expand it the way we can as we push outwards from the circle.
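For reference, here's a minimal NumPy sketch of the standard single-head query/key/value attention the comment is pointing at (the function and variable names are just illustrative; a "4th matrix" would be a speculative extension, not something that exists today):

  import numpy as np

  def attention(x, W_q, W_k, W_v):
      """Standard single-head scaled dot-product attention: three learned projections."""
      q, k, v = x @ W_q, x @ W_k, x @ W_v
      scores = q @ k.T / np.sqrt(k.shape[-1])
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)    # softmax over positions
      return weights @ v

  # Toy usage: 5 tokens with 8-dimensional embeddings.
  rng = np.random.default_rng(0)
  x = rng.normal(size=(5, 8))
  W_q, W_k, W_v = rng.normal(size=(3, 8, 8))
  out = attention(x, W_q, W_k, W_v)
  print(out.shape)   # (5, 8)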


Sure, but if I want a human, I can hire a human. Humans also do many other things I don't want my LLM to do.


Well, it could be a lot cheaper to hire the AI model instead of a human?


This kind of short-sighted, simplistic reasoning / behaviour is what I worry about the most in terms of where our society is going. I always wonder - who will be the people buying or using your software (built very cheaply and efficiently with AI) once they can do the same, or get replaced by AI, or bankrupt themselves?

Everybody seems to be so focused on how to get ahead in the race to profitability that they don't consider the shortcut they are taking might be leading to a cliff.


Except that these aren't thoughts. These techniques are improvements to how the model breaks down input data, and how it evaluates its responses to arrive at a result that most closely approximates patterns it was previously rewarded for. Calling this "thinking" is anthropomorphizing what's really happening. "AI" companies love to throw these phrases around, since it obviously creates hype and pumps up their valuation.

Human thinking is much more nuanced than this mechanical process. We rely on actually understanding the meaning of what the text represents. We use deduction, intuition and reasoning that involves semantic relationships between ideas. Our understanding of the world doesn't require "reinforcement learning" and being trained on all the text that's ever been written.

Of course, this isn't to say that machine learning methods can't be useful, or that we can't keep improving them to yield better results. But these are still methods that mimic human intelligence, and I think it's disingenuous to label them as the real thing.


It becomes thinking when you reinforcement learn on those Chain-of-Thought generations. The LLM is just a very good initialization.
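For intuition, here's a toy REINFORCE-style sketch of what "reinforcement learning on sampled generations" looks like in the abstract. A tiny tabular policy stands in for the LLM, the "chain" is just a few toy tokens, and the reward only checks the final answer; this is a guess at the general shape of the idea, not OpenAI's actual recipe:

  import numpy as np

  rng = np.random.default_rng(0)

  vocab, chain_len, target_answer = 5, 4, 3   # toy "tokens", short "chain", right answer
  logits = np.zeros((chain_len, vocab))       # toy per-step policy (the "good initialization")
  lr, baseline = 0.5, 0.0

  def softmax(z):
      e = np.exp(z - z.max())
      return e / e.sum()

  for _ in range(500):
      # Sample a chain of tokens from the current policy; reward only the final answer.
      chain = [rng.choice(vocab, p=softmax(logits[t])) for t in range(chain_len)]
      reward = 1.0 if chain[-1] == target_answer else 0.0
      baseline = 0.9 * baseline + 0.1 * reward        # running-average reward baseline
      advantage = reward - baseline
      for t, tok in enumerate(chain):
          grad = -softmax(logits[t])                  # grad of log p(tok) = onehot - softmax
          grad[tok] += 1.0
          logits[t] += lr * advantage * grad          # REINFORCE update

  print(np.round(softmax(logits[-1]), 2))   # mass concentrates on the rewarded final token

The point is just that the gradient flows through whatever tokens were sampled along the way, so the intermediate "thinking" steps get shaped by the final reward too.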


Yes but with concepts instead of tokens spelling out the written representation of those concepts.


Without a world model, not really.


The whole thing is a world model: accurately predicting text that describes things happening in a world can only be done by modeling the world.


Is it?


Exploring different approaches and stumbling on AGI eventually through a combination of random discoveries will be the way to go.

Same as Bitcoin being the right combination of things that already existed.


Crypto being used as an example of how we have moved forward successfully as a species is backward toilet sitting behaviour.



