
This doesn't seem to be an uncommon practice. If you still use AOL Instant Messenger, it does the same thing.


Within 15 minutes of setting up an HTTPS CI environment, complete with a robots.txt, Googlebot was hitting the DNS name, even though it wasn't public, previously used, or easily guessed.

The whole team used Gmail, and one member was using Chrome.
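For reference, a deny-all robots.txt like the one described would look something like this (this is the standard robots.txt convention, though as noted downthread, crawlers are not obligated to honor it):

```
User-agent: *
Disallow: /
```

Note that robots.txt only asks crawlers not to fetch pages; it does nothing to keep the hostname itself from being discovered.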


Isn't that part of the TOS for Chrome? They use your browsing activity to improve the search results.

I remember a storm in a teacup about MS doing this via their browser toolbar.


Google gets a lot of leeway from people. If you have done SEO, you learn that Googlebot doesn't always respect robots.txt. Requesting to de-index a page may take weeks or even months. The quickest way is to file a DMCA complaint against the link to your own site.

Recently, they started tracking all downloads made in Chrome (to check for malware), including the filename, URL, IP, and timestamp. It sucks, since I love Chrome and the only way to disable it is to disable the malware checker entirely (which only sends partial hashes anyway).
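The "partial hashes" point refers to the hash-prefix idea used by Safe Browsing-style lookups: the client hashes an identifier and sends only a short prefix, so the server sees which prefix bucket you hit rather than the full URL. A minimal sketch of the idea (this is a simplification for illustration, not Google's actual protocol, whose canonicalization and prefix rules differ):

```python
import hashlib

def url_hash_prefix(url: str, prefix_len: int = 4) -> bytes:
    """SHA-256 of the URL, truncated to a short prefix.

    In a hash-prefix lookup scheme, only this prefix is sent to the
    server; the full URL (and full hash) stay on the client.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return digest[:prefix_len]

# Many distinct URLs share each 4-byte prefix, which is what provides
# the (partial) privacy: the server learns the bucket, not the URL.
prefix = url_hash_prefix("https://example.com/download/setup.exe")
print(prefix.hex())
```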


Another possibility is that the hostnames leaked via the SSL certificate. I've seen evidence of spiders, including Google's, using this for discovery. Your best protection in that case is a wildcard certificate, if you still want it to validate.
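The reason a wildcard helps: a certificate's Subject Alternative Name entries are visible to anyone who connects (and are published in Certificate Transparency logs), so an explicit entry like `secret-ci.example.com` discloses the hostname, while `*.example.com` covers it without naming it. A toy illustration of RFC 6125-style matching (simplified; the wildcard covers exactly one leftmost label):

```python
def san_matches(entry: str, hostname: str) -> bool:
    """Simplified SAN match: '*.' covers one leftmost DNS label."""
    entry, hostname = entry.lower(), hostname.lower()
    if entry.startswith("*."):
        _head, sep, rest = hostname.partition(".")
        return bool(sep) and rest == entry[2:]
    return entry == hostname

# An explicit SAN validates, but publishes the hostname in the cert:
print(san_matches("secret-ci.example.com", "secret-ci.example.com"))  # True
# A wildcard also validates, without disclosing the specific name:
print(san_matches("*.example.com", "secret-ci.example.com"))  # True
```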


And people wonder why I run my own email and XMPP servers.


Nobody wonders why. They just question if you're getting value for your time and money.

For a lot of people spending >$100/year and many hours maintaining it isn't worth it.


I also thought this was common practice. Facebook, Gchat, and Skype all do it. If you have sensitive links, you shouldn't post them in chats or emails.



