I have the same feeling. I asked to find duplicates in a list of 6k items and it basically hallucinated the entire answer multiple times. Some times it finds some, but it interlaces the duplicates with other hallucinated items. I wasn't expecting it to get it right, cause I think this task is challenging with a fixed amount of attention heads. However, the answer seems much worse than Claude Opus or GPT-4.