

morningsam on Sept 12, 2024 | on: Learning to Reason with LLMs

What sticks out to me is the 60% win rate vs GPT-4o when it comes to actual usage by humans for programming tasks. So in reality it's barely better than GPT-4o. That the figure is higher for mathematical calculation isn't surprising because LLMs were much worse at that than at programming to begin with.

quirino on Sept 12, 2024

I'm not sure that's the right way to interpret it.

If some tasks are too easy, both models might give satisfactory answers, in which case the human preference might as well be a coin toss.

I don't know the specifics of their methodology though.
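
To put rough numbers on the coin-toss point, here is a minimal sketch. It assumes a simple mixture model that is not taken from the actual evaluation: some fraction of tasks are "easy" and judged by a fair coin toss, and the rest are judged on real quality differences. The function names and the tie fractions are hypothetical, chosen only for illustration.

```python
def overall_win_rate(tie_fraction: float, win_rate_when_different: float) -> float:
    """Expected overall preference rate under the assumed mixture model.

    tie_fraction: share of tasks where both answers are satisfactory,
                  so the human preference is a fair coin toss (0.5).
    win_rate_when_different: preference rate on the remaining tasks.
    """
    return 0.5 * tie_fraction + win_rate_when_different * (1.0 - tie_fraction)


def implied_win_rate(overall: float, tie_fraction: float) -> float:
    """Invert the mixture: the win rate on non-tied tasks needed to
    produce a given overall win rate."""
    if not 0.0 <= tie_fraction < 1.0:
        raise ValueError("tie_fraction must be in [0, 1)")
    return (overall - 0.5 * tie_fraction) / (1.0 - tie_fraction)


if __name__ == "__main__":
    # Hypothetical tie fractions; the real share of "easy" tasks is unknown.
    for tie_fraction in (0.0, 0.5, 0.75):
        rate = implied_win_rate(0.60, tie_fraction)
        print(f"ties={tie_fraction:.0%}  implied win rate on the rest={rate:.0%}")
```

Under these assumptions, the same 60% headline figure would correspond to a 70% win rate on the non-trivial tasks if half the tasks were coin tosses, and 90% if three quarters were. Whether anything like this applies depends on the methodology, which isn't known here.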
