I think you're restating (in a longer and more accurate way) what I understood the original criticism to be: that this grading test isn't testing what it's supposed to, partly because a grade is too few tokens.

The model could "assess" the code in qualitatively the same way each time and still give slightly different letter grades.




