
Likewise, I co-maintain the only "fan" site on one of my all-time favourite composers/performers, and gave the engine a shot with a unique string query. While my text-heavy WP-driven site didn't seem to make the cut, the results were highly relevant in that they were links to former band members and collaborators - a couple of which I didn't realize existed. That being said, there were a few sites (including my own) I expected to be returned, but no dice. Still, a fascinating experiment that many at HN have been clamouring for.


The search engine doesn't actually do full text search, so maybe your query was too... unique.

But first of all, do verify that you haven't been hacked. There are about a quarter of a million domains I've flagged that, besides their WordPress content, also host a ton of link spam crap off in some hidden folder. This reflects extremely negatively on the quality rating, to the point where you may not have been indexed at all.

Secondly, are you behind cloudflare or some other big-name CDN? Because, as I mentioned in another comment, I can't crawl their pages without getting captchad until they approve of my humble request to be classified as a good bot.
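If you're not sure whether a site sits behind Cloudflare, the response headers usually give it away. Here is a rough heuristic sketch, assuming the `Server: cloudflare` and `CF-RAY` headers Cloudflare normally adds; the function name is hypothetical, and other CDNs would need their own checks:

```python
def looks_like_cloudflare(headers: dict) -> bool:
    """Heuristic only: check response headers for Cloudflare's usual markers."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): str(value).lower() for name, value in headers.items()}
    return lowered.get("server", "") == "cloudflare" or "cf-ray" in lowered
```

Feed it the headers of any response from the site, e.g. the output of `curl -I` parsed into a dict.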

There are some other hosting providers I flat out block on a subnet level because they host a large amount of link farms. This is currently Alibaba, Psychz, eSited, Cloud Yuqu and 1Blu.
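To illustrate what a subnet-level block can look like, here is a minimal sketch. It is not the engine's actual code; `BLOCKED_SUBNETS` and `is_blocked` are made-up names, and the CIDR ranges are reserved documentation ranges standing in for real provider ranges, which aren't listed here:

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT real provider subnets.
BLOCKED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked subnet."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_SUBNETS)
```

A crawler would resolve the domain first and skip the fetch when `is_blocked` returns True for the resolved address.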


It’d be nice if you had a page to get the current index status for a domain.


Try a query of the form site:www.example.com ;-)


Would it be possible to have a link to a page with operators?


> site:www.washingtonpost.com

> Blacklisted false

> site:www.wsj.com

> Blacklisted false

> site:www.rt.com

> Blacklisted false

> site:www.nytimes.com

> Blacklisted true

?


Hmm, not sure what caused it to end up there, but I removed it from the blacklist. It still doesn't seem to want to index the domain however, probably CDN-related.


Thanks for the advice; not hacked, but I have "resurrected" many WP sites that have been (including my wife's non-profit). Just running on an EC2 micro instance, but I tried adding "site:" and received "No such domain". Actually, I think it's because I haven't enabled "HTTPS" yet! That's on my to-do along with migrating off EC2-Classic to VPC...


Vanilla HTTP should be fine. I think 80% of the urls are HTTP.

If you're getting no such domain, it's either blocked because it looks too much like a spam domain, or it simply hasn't been discovered yet.

What's the TLD? I severely restrict some cheaper TLDs because they gave so much spam.

cr.yp.to, for example, is a baby I know I've definitely thrown out with the bathwater.


It's a good ol' .com with no ads and minimal JS, originally launched in 2011. Thanks again for your insights; I've bookmarked your site and will check back every so often to see if my site's been indexed.


www.ft.com gets 'no such domain'


I added it now, but it turns out it's behind a CDN so I still can't crawl it.


Thanks for responding and especially thanks for the search engine. What a breath of fresh air, and access, it feels like, to real people.


Yeah, that's a large part of what I'm trying to accomplish. Great to see others understand as well.


Exactly this. A couple of results returned references to obscure, now-defunct newsletters and clubs, and to people I know were historically important for past researchers, but I would only have known this because it was my research focus for so long.



