
Yes, the huge repository of raw materials is likely the hardest part. You can try crowdsourced collections ( https://tatoeba.org , https://datacollective.mozillafoundation.org/datasets?q=comm... , https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtit... ) but you'll quickly run into data quality issues. My personal solution is to do manual data curation on the fly, but I think an app that occasionally throws up garbage and asks its users to pick out the good parts is unlikely to get popular.


Maybe the free version of the app could do the collaborative filtering part, and the paid version would get only the high-quality content.


That lets you turn the problem of figuring out which part of your content is badly machine-translated into the problem of figuring out which of your users have enough attention to detail to spot badly machine-translated content in a language they themselves are still learning. Though I guess if you show paying users only content that lots of people think is good, at least it reduces the chances that a paying user notices an issue and complains, so it could still work in that sense.
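For what it's worth, the "only show paying users content that lots of people think is good" idea could be sketched roughly like this. Everything here is hypothetical: the `Sentence` record, the `paid_tier` function, and the `min_votes` / `min_approval` thresholds are made up for illustration, and it does nothing about the harder problem of whether learners can reliably judge the content in the first place.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    # Hypothetical record: one piece of content plus free-tier votes on it.
    text: str
    good_votes: int = 0
    bad_votes: int = 0


def paid_tier(sentences, min_votes=5, min_approval=0.8):
    """Keep only sentences that enough users rated, and mostly rated as good.

    The thresholds are arbitrary assumptions, not tuned values.
    """
    kept = []
    for s in sentences:
        total = s.good_votes + s.bad_votes
        # Require a minimum number of votes so one or two raters cannot
        # promote a sentence on their own.
        if total >= min_votes and s.good_votes / total >= min_approval:
            kept.append(s)
    return kept


corpus = [
    Sentence("Where is the train station?", good_votes=9, bad_votes=1),
    Sentence("The cat is on the the table.", good_votes=2, bad_votes=6),
    Sentence("I would like a coffee.", good_votes=2, bad_votes=0),
]

print([s.text for s in paid_tier(corpus)])
```

With these made-up votes only the first sentence passes: the second has too many "bad" votes, and the third has too few votes to trust either way.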



