Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Thank you for providing this.

Bing Chat runs on GPT-4, however [1]. And Bing gets this wrong in all 3 of its modes (Creative, Balanced, and Precise) as of time of writing.

Given this experiment and similar others presented around here, it stands to reason that GPTs(**1) often identify(**2) the problem as a "wolf, goat, and cabbage" problem and then merely guess which node of the problem is the middle node (inner node of the "danger to" graph), yielding a 1/3 chance of getting it right by pure luck, resulting in diverse reports here.

(**2) That does not always yield an adequate response beyond the mere permutation of nodes, however. I've been getting the following variants for step 1. from Bing in Precise in response to marginally slightly different rewordings of the same:

- The general escorts the ambassador of Costaguana through the tunnel first. This leaves the ambassador of Atlantis and the ambassador of Buranda in the bunker, but they are not alone because the general is still there.

- The general escorts the ambassador of Costaguana through the tunnel first. This leaves the ambassador of Atlantis and the ambassador of Buranda in the bunker, but they are not alone because they have each other.

and so on.

(**1) I also tried Bard and Llama 2 with even more disastrous results full of nonsense of (**2) kind. The earlier posted response of ChatGPT-3.5 is also prime with these as well.

Re

> By the way, as soon as these systems are able to check their reasoning (i don't think it'll be a huge leap) it's enough to solve reasoning problems with probability >0.1% for example. Because you can just have it do rollouts in its head until it's correct [2]

Mistakes of type (**2) don't seem to be fitting the target of the cyclic refinement you are proposing, as far as I can understand it. These errors aren't getting the logic wrong, but completely butcher the basic relationships of actors, like what it means to be alone, or spatial relationships between the actors and their environment.

[1] https://blogs.bing.com/search/march_2023/Confirmed-the-new-B...

[2] https://news.ycombinator.com/item?id=38389222



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: