
nmca on Sept 12, 2024 | on: Learning to Reason with LLMs

I have spent some time doing this for these benchmarks; the model still does make mistakes. Of the questions I can understand (roughly half in this case), about half were real errors and half were broken questions.