

hoodsen 5 days ago | on: New benchmark shows top LLMs struggle in real ment...

Do you have plans to improve the quality of the LLM as judge, in order to achieve better parity with human clinician annotators? For example, by fine-tuning models? I'm thinking that the comparative clinician judgements themselves would make useful fine-tuning material.

RicardoRei 5 days ago

Yep. It's something we have to study, and it's likely we can improve the LLM as a Judge further.

Same thing for the patient LLM. We can probably fine-tune an LLM to do a better job at simulating patients.

Those two components of our framework have room for improvement.
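
As a rough illustration of the idea raised in this thread (not the benchmark authors' actual pipeline), here is a minimal sketch of how pairwise clinician judgements could be used twice: once to measure how often the LLM judge agrees with the clinician, and once as preference pairs for fine-tuning. The `Comparison` record, its field names, and the example strings are all hypothetical; the `prompt`/`chosen`/`rejected` layout is just one common convention for preference-tuning data.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One pairwise judgement (hypothetical schema, not from the benchmark)."""
    conversation: str    # context shown to both the clinician and the judge
    response_a: str
    response_b: str
    clinician_pick: str  # "a" or "b"
    judge_pick: str      # "a" or "b"


def agreement_rate(comparisons):
    """Fraction of comparisons where the LLM judge picked the clinician's choice."""
    if not comparisons:
        return 0.0
    matches = sum(c.clinician_pick == c.judge_pick for c in comparisons)
    return matches / len(comparisons)


def to_preference_pairs(comparisons):
    """Turn clinician picks into prompt/chosen/rejected records."""
    pairs = []
    for c in comparisons:
        if c.clinician_pick == "a":
            chosen, rejected = c.response_a, c.response_b
        else:
            chosen, rejected = c.response_b, c.response_a
        pairs.append(
            {"prompt": c.conversation, "chosen": chosen, "rejected": rejected}
        )
    return pairs


# Invented placeholder data, purely to show the shapes involved.
example = [
    Comparison("patient turn 1", "reply A1", "reply B1", "a", "a"),
    Comparison("patient turn 2", "reply A2", "reply B2", "b", "a"),
]

print(agreement_rate(example))
print(to_preference_pairs(example)[1])
```

Raw agreement is the simplest parity measure; a chance-corrected statistic such as Cohen's kappa would be a stricter choice if the picks are imbalanced.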
