Exactly. We don't make claims about humans. But there is room for improvement in current LLMs, and for researchers to be able to improve LLMs, we first need to know how to evaluate them. We can only improve what we can measure, so we studied how to measure them :)