
Yes. Though I’d say that example is a bit mean (it’s a trick question), since the expected answer has type <time> but the actual answer is something like “don’t be stupid; they’re not even on the same track”. It’s like asking “if I add three apples to two apples, how many pears do I get?” and being surprised when the LLM says “5 pears”. It’s the sort of thing that’s avoidable if the question is written properly.


This comes up time and time again. People claim these models are mind-blowing. But then someone will post something where the model falls flat on its face, and the replies are just a bunch of “that’s too complex” or “that’s the wrong type of thing to ask”.

So it ends up that these models are awesome, as long as you ask them questions from a narrow set of things and assume that whatever they respond with is correct.


Well, at least in this subthread, the model is only failing at the same things humans fail at too. To see the mind-blowing part, stop treating GPT-4 like the Oracle at Delphi, and start treating it as the “first thing that comes to mind” answer (aka. the inner voice) - and then notice the failure modes are pretty much the same as with humans. For example, coercing a trick question into a similarly-sounding straight question, and answering it before realizing the person asking is an asshole.


I was originally making the point that these models struggle with even basic mathematics (of the true kind, not arithmetic, though of course they struggle with that too). My point here was to play devil’s advocate and be slightly forgiving of the model, since I as a human am likely to be tripped up by similar trick questions. Since we don’t really know ‘how these models think’ (we have little idea of the emergent world model they build), we are stuck in constant debate about whether they’re really quite amazing or absolutely pathetic.


GPT-4 Turbo:

When is 2+5=10?

The equation 2+5=10 is never true in standard arithmetic, as 2 plus 5 always equals 7. However, in a different context or under a specific set of rules (such as a puzzle or a riddle where the symbols have different meanings), it could be made to appear true.

GPT-4 (Phind):

The equation 2 + 3 = 10 is true in a base-5 number system.

In base-5 (also known as quinary), the digits range from 0 to 4. When you add 2 and 3 in base-5, the result is 10. This is because in base-5, the number 5 is represented as 10.

Here's a simple breakdown of how this works:

    In base-5, the rightmost digit represents 5^0 (which is 1), the next digit to the left represents 5^1 (which is 5), the next digit represents 5^2 (which is 25), and so on.
    So when you add 2 and 3, you get 5. But since 5 is represented as 10 in base-5 (1 digit of 5, and 0 digits of 1), the result is 10.
Therefore, in base-5, 2 + 3 equals 10.
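
(Aside: the base trick does generalize, just not quite as Phind applied it. 2 + 3 = 5, which is written “10” in base 5, while 2 + 5 = 7, which is written “10” in base 7, not base 5. A minimal Python sketch to check the arithmetic; to_base is a hypothetical helper written for illustration, not something from the quoted outputs:

    # Sanity-check of the base arithmetic quoted above.
    # to_base is an illustrative helper, not part of any quoted output.
    def to_base(n: int, base: int) -> str:
        """Render a non-negative integer as a string in the given base (2-10)."""
        if n == 0:
            return "0"
        digits = []
        while n:
            digits.append(str(n % base))
            n //= base
        return "".join(reversed(digits))

    print(to_base(2 + 3, 5))  # "10" -> so 2 + 3 = 10 does hold in base 5
    print(to_base(2 + 5, 7))  # "10" -> 2 + 5 = 10 would need base 7
    print(to_base(2 + 5, 5))  # "12" -> in base 5, 2 + 5 is written 12, not 10

So GPT-4 Turbo’s “never true in standard arithmetic, but possible under other rules” is the more defensible answer here.)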


I think it's completely reasonable to ask trick questions of an AI that people want to consider intelligent. If it's really that smart and capable of reasoning, it should identify the trick.

Some humans will be fooled by the question, sure. But an AI should be smarter than humans, or at least as smart as an above-average human.


I agree. But you could ask which is more intelligent: recognising a trick question and balking, or recognising that the question as posed doesn’t quite make sense and offering a reformulation together with its answer. It’s not always clear whether something’s a trick, a mistake or a strangely worded (but nonetheless intentionally weird) question. So I think it would be very hard to get it to never fall for any tricks.


I think they've fixed it now. It does seem to recognize popular trick questions, like "what weighs more, a ton of feathers or a ton of bricks?", and it answers with the typical explanation about density not mattering, etc.

But, it used to fail on "what weighs more, 3 tons of feathers or 2 tons of bricks?".

So, it seems less about what's a trick, and more about what's a common question --> answer pattern.


It's the same with humans. I don't fail on this question (in an on-the-spot response) because I fell for it as a kid, then learned the trick, then learned to be suspicious of this trick in similarly-worded questions.


If we're going to call these things "AI" (which I absolutely oppose), I think it's not unreasonable to expect them to get this right. A 5-year-old would understand that you don't get pears by adding apples together.



