
This looks like exactly the type of crunch work ML would do, but have you considered using brute-force converters like LaTeXML or pandoc where appropriate?


Yup, we've tried a lot of different tools in combination. All of them have their own trade-offs and extraction errors.

This system uses GROBID and some extraction techniques of our own. We're working on a GROBID replacement too, which should help us make things better.



