LLMs have room for improvement (we show that their scores are medium-low on several dimensions).
Maybe the average human also has lots of room for improvement. One thing does not necessarily depend on the other.
In the same way, we can say that LLMs still have room for improvement on a specific task (let's say mathematics) even though the average human is also bad at mathematics...
We don't make any claims about human therapists. Just that LLMs have room for improvement on several dimensions if we want them to be good at therapy. Showing this is the first step to improving them.
But you chose the word "struggle". And now you say:
> Just that LLMs have room for improvement on several dimensions if we want them to be good at therapy.
That implies they're not currently good at therapy. But you haven't shown that, have you? How are you defining that a score of 4 isn't already "good"? How do you know that isn't already correlated with meaningfully improved outcomes, and therefore already "good"?
Everybody has room for improvement if you say 6 is perfection and something isn't reaching 6 on average. But that doesn't mean everybody's struggling.
I take no issue with your methodology. But your broader framing, and title, don't seem justified or objective.