
Cool!

There do seem to be some text encoding issues though. For example: https://search.marginalia.nu/search?query=tim+visee



Yeah, I think the charset detection needs work.

It understands the "Content-type: text/html;charset=utf-8" header and <meta charset="UTF-8">

but not

<meta http-equiv="content-type" content="text/html; charset=utf-8">

It turns out HTML has a lot of corner cases. I'm constantly marveling at how web browsers hold together as well as they do.
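All three declaration styles can be checked in one place. Below is a minimal sketch in Python. It assumes you have the raw response bytes and the Content-Type header value. `detect_charset` and its helpers are hypothetical names, not Marginalia's actual code. The 1024-byte window is borrowed loosely from the WHATWG "prescan" rule; this is not a full implementation of that algorithm, and BOM sniffing is omitted.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Pulls the value out of "...; charset=utf-8" (quotes optional).
_CHARSET_PARAM = re.compile(r"""charset\s*=\s*["']?\s*([^\s"';]+)""", re.IGNORECASE)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    if not value:
        return None
    match = _CHARSET_PARAM.search(value)
    return match.group(1).lower() if match else None


class _MetaCharsetFinder(HTMLParser):
    """Records the first charset declared by either form of <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        # HTMLParser lowercases attribute names; valueless attributes come back as None.
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("charset"):
            # HTML5 form: <meta charset="UTF-8">
            self.charset = attributes["charset"].strip().lower()
        elif attributes.get("http-equiv", "").lower() == "content-type":
            # Legacy form: <meta http-equiv="content-type" content="text/html; charset=utf-8">
            self.charset = charset_from_content_type(attributes.get("content"))


def detect_charset(header: Optional[str], body: bytes) -> Optional[str]:
    """Prefer the HTTP header, then fall back to <meta> tags near the top of the page."""
    charset = charset_from_content_type(header)
    if charset:
        return charset
    finder = _MetaCharsetFinder()
    # latin-1 maps every byte to a code point, so this decode never fails;
    # it is only used to locate ASCII markup, not to read the page text.
    finder.feed(body[:1024].decode("latin-1"))
    return finder.charset
```

Checking the header first follows the usual precedence, where the transport-level declaration wins over in-document ones. Parsing real tags rather than grepping for `charset=` avoids false hits, such as a `<meta name="description">` whose text happens to contain that string.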


Thanks for your response! Hope you can implement this as well without too much trouble.

I wonder if you could just assume UTF-8 as the default these days. I imagine that would fix many other cases as well.


Haha! I did actually assume UTF-8 at first, but since a search engine sees a lot of older websites, I sadly got a lot of encoding errors doing that too.
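There is a middle ground between "always UTF-8" and trusting every declaration. Strict UTF-8 decoding usually fails on legacy single-byte text that contains non-ASCII characters, so a failed decode can serve as a signal to try something else. The sketch below works under that assumption. `decode_with_fallback` is a hypothetical name. Using windows-1252 as the second choice is also an assumption, based on it being a common encoding for older Western European pages; the thread does not specify a fallback.

```python
from typing import Optional


def decode_with_fallback(body: bytes, declared: Optional[str] = None) -> str:
    """Try the declared charset, then strict UTF-8, then windows-1252, then latin-1."""
    candidates = [declared] if declared else []
    candidates += ["utf-8", "cp1252"]
    for encoding in candidates:
        try:
            return body.decode(encoding)  # strict by default: raises on bad bytes
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset label, or bytes that don't fit this encoding
    # Last resort: latin-1 maps every byte to a code point, so this never raises.
    return body.decode("latin-1")
```

A page declared as ISO-8859-1 decodes as declared. An undeclared page full of `\xe9` bytes fails strict UTF-8 and then falls through to windows-1252 instead of turning into replacement characters.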


Maybe, just as you punish JS-heavy sites, also punish sites with non-conforming encodings.



