This is a good point. We have not tested the clinicians but I believe they would... | Hacker News


RicardoRei 5 days ago | on: New benchmark shows top LLMs struggle in real ment...

This is a good point. We have not tested the clinicians, but I believe they would not score each other perfectly. We also observed some disagreement between the scores, which reflects differing opinions among clinicians.

megaman821 5 days ago

It is nice to have an accurate measure of things, and a human baseline would be helpful too.

Many things can be useful before they reach the level of the world's best. With AI, though, non-intuitive failure modes must also be taken into account.
