Trouble In the House of Google (codinghorror.com)
461 points by ZeroMinx on Jan 3, 2011 | hide | past | favorite | 162 comments


It's that old issue: if you are paying, you are the customer; if you aren't paying, you're the product.

With Google the customer is the person placing the ads and the product is you.

The content farms are the middlemen: they try to place you (the product) onto a paying page (the customer) and stop you from going to a non-paying page (one that doesn't pay per click).

Google has 2 business models:

* I search for an advert and Google sells me directly to the customer

* I search for something and Google takes me to a middle man who sells me to a customer

The first business model works great and I often search Google for an advert.

The second business model is broken, because I (the user) want a search engine that takes me to my destination; if something that triggers a purchase happens along the way, fine.

1/3 of the web now consists of Google's Middlemen selling Google's ads for Google.

When my (then) colleague Dale had 500,000 page views from his HTML5 Pac-Man (http://news.ycombinator.com/item?id=1549056), he didn't put Google Ads on it because 'Google Ads are Cheap'.

But when my (non-technical) customer Tim specifically said he wouldn't put Google ads on http://cyclingbibliography.org/, which is designed to make him income, I thought, 'Oh!'

At Xmas my 12-year-old was moaning about Google while looking for something.

It has now reached the point where a page ranking algorithm which penalises sites with Google Ads would be welcomed by many people.

Google's problem is that the only way out is to reduce its income, when it has been tweaking its software to increase its yield.


Even when you're the one paying Google real money, the customer service is often lousy and Google's methods are still opaque.


I had an AdWords representative tell me that "traffic school" (and various other keywords) was not a relevant keyword for an online traffic school for traffic tickets. One can only hope that Google Search is more sane.


I don't have numbers, but I'd imagine that Google gets paid more when you do a search and click on AdWords than when you do a search, click on a content farm, and then click on AdSense. In the latter case, they have to split the income with the content farm.
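To put rough numbers on that split (the click value is made up; the 68% publisher share is the AdSense-for-content figure Google disclosed in 2010, so Google's cut of a farm-mediated click would be around a third):

    # Back-of-the-envelope only: a hypothetical $1.00 click, using the 68%
    # publisher share Google disclosed for AdSense for content in 2010.
    click_value = 1.00
    adwords_revenue = click_value         # direct AdWords click: Google keeps it all
    adsense_revenue = click_value * 0.32  # content-farm click: ~68% goes to the farm
    print(adwords_revenue, adsense_revenue)  # -> 1.0 0.32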


To generate income, Google prefers visitors that click on AdWords links. Do links to content farms cannibalize AdWords links? I don't think so.

Content farms are producing profitable responses to search queries in the present.

The negative impact of content-farm links on Google's bottom line comes not from visitor behavior during the current search but from the reduced likelihood of that visitor using Google to search again. It's possible that the average visitor is more satisfied with the results with content farms included than without them.


"Google's problem is that only way out is to reduce its income - when it has been tweaking its software to increase its yield."

Sounds like a case of Innovator's Dilemma.


Exactly Google's problem. As Christensen says in The Innovator's Dilemma, managers don't allocate resources; customers do.

That's one of the reasons why all of Google's products except search are commercial failures.


Google's problem is that the only way out is to reduce its income, when it has been tweaking its software to increase its yield.

Interesting. Considering Google's focus on A/B testing everything, perhaps they A/B tested themselves into this. Can you A/B test your way out of a slowly degrading local optimum?


It's certain that annoying gadgets such as "Instant" reflect poorly on Google's priorities.

The mission of Google is to help users find stuff, not generate the maximum possible number of ad impressions per query (which would be quite short-sighted).

But, is it really getting worse? It's never been possible to use Google effectively to research dishwashers. Never. I remember using a Firefox extension to block specific domains from Google search for a long time (it's now called "OptimizeGoogle" but had another name before that).

Dishwashers aside, I still find Google pretty effective.

Jeff's post starts with a chart that shows that 88.2% of SO's traffic comes from Google; if Google was that bad, wouldn't users start to use something else? Where is the increase in traffic from Bing (0.9% from the same chart!)? Where's the nascent but so powerful traffic from blekko...??!?


Jeff has more reason than most to be irritated by Google results (really, I doubt you could find ten people who would bemoan the loss of (e.g.) Developer IT if they smacked a big penalty on it - if you need a pretext to do so, then using link rel="canonical" to point to your own page full of scraped content would appear to suffice...).

That said, some of the complaints I hear about Google being unable to help with shopping queries baffle me. When I search for dishwashers all the results (bar the EnergyStar one, arguably) look helpful. They won't give me an objective recommendation for the best dishwasher for my personal needs, but how the hell is a search engine supposed to manage that?

Likewise, I'm admittedly probably not being served the exact same results as Jeff (I get a couple of local UK ones for a start), but I can't really see how he found it so difficult to find iPhone cases for his wife using a search for "iPhone 4 case".

The top result is iphone4case.com, which is rather nicely designed even if the purchase links do go via another merchant. The next four results appear to be legit merchant sites - sandwiched in the middle is Google's suggestion you try a shopping search with its own custom price comparison engine.

Only one result on the first page is a splog, and whilst four of the sites have "buy" links that redirect you to the purchase page of a separate merchant, the average internet consumer isn't going to mind, and certainly isn't going to find the results "useless" in their quest to buy a case online. It's not that big a flaw in the usability of Google, given that the average user really couldn't care less if the helpful site with the big pictures of phones is simply deep affiliate linking to Casemate.


Proportion is different from volume. The author states that the majority of their traffic comes from Google searches. That chart does not include "lost" traffic that was routed to sites that benefit from the advertising revenue without contributing to the value of the site.


You're right, but my point is that, if Google is bad, it should begin to lose market share. If it does not in fact lose market share while there are good alternatives, it must mean users are satisfied with what they're getting (or not unhappy enough to switch).


Saying something is wrong does not necessarily imply that there is a better solution currently out there.


Comcast is my only ISP option. You think I wouldn't rather have FioS?


The choice of a search engine does not rest on your geographic location; you may be forced to go with Comcast but you're free to use Bing, no?


This is really just a rehash of other posts from the last month (linked in article).

This post basically complains about two things: the finer points of SEO and content farms.

Content farms are an easy one. They're the Web equivalent of spam; I'm talking about the likes of Associated Content and Demand Media. They're a relatively new (last few years) phenomenon.

My personal view is that no one is better placed to deal with this new threat than Google. Email spam is basically a solved problem on Gmail. That's not to say there aren't false positives and negatives, but it's nothing like it used to be or could be. It'll take time, but I believe that content farms are a transitory and doomed business model.

As for product searches, this encompasses many things. Anecdotally I recently searched for "<camera make and model> review" and found what I wanted no problem. Prices I found on pricegrabber (they have an iPad app).

SEO is a trickier beast. For one, it's a constantly moving target. A combination of suboptimal source SEO and content-farm SEO gaming allows the scrapers to survive. I can't say that keyword position matters all that much. Anecdotally Jeff claims it does, but many factors are at play, so it's always best to be careful about making absolute claims.

Jeff claims not to want to be acquired. I'm reminded of a story I heard. Basically: if you want money (from angels), ask for advice; if you want advice, ask for money (IIRC this story came from either Mark Suster or Jason Calacanis, can't remember).

So, if you want to be acquired, say you don't?

Lastly, I'll reiterate my own opinion that social search isn't the answer in the general case (ie it will have specific use cases).

Content curation is a mixed bag. I believe there will (for at least a very long time) be a place for niche verticals. For example, dpreview is a vertical for cameras. General purpose models like Mahalo I think are doomed for much the same reason that Jeff and Joel have contended that general Q&A sites are doomed.


Email spam is basically a solved problem on Gmail

I don't want to move the discussion away from web spam, but I disagree with your statement. I have email for several domains hosted on Google (as well as a regular Gmail address) and they all suffer from legitimate emails regularly going into the spam folder. A quick search has shown that I'm not the only one suffering from this problem.


I concur. Spam is only a "solved" problem on services like Google Mail if you consider a high rate of false positive detections to be a solution. I suspect this is one reason that many of my friends now use Facebook as their default messaging system -- which is a pain for those of us who don't want to use Facebook, and even more so if we send our e-mails from a personal domain that tends to get arbitrarily flagged as junk by services like Google Mail and Hotmail the first time we write to each friend.


If you think gmail has a high rate of false positives, I encourage you to try Hotmail or (especially) Yahoo.


Yahoo works pretty well for me. Occasional false positives and negatives, but it's generally about right.


My experience w/Yahoo! mail is surreal. It even marks my own mail as spam. And misses lots of real spam.

(I use Gmail primarily, but have an old Yahoo! address for bacn)


Really? Hmm, maybe it's because it's a very old address, but my Yahoo spam filtering is terrible.

For example, I get a few very similar Canadian pharmacy junk messages every single day. Even though I usually mark them as spam, they continue to appear in my inbox. It never learns. Additionally about 1 in 4 order confirmation emails ends up in the Junk folder.


> Hmm, maybe it's because it's a very old address, but my Yahoo spam filtering is terrible.

My understanding of Bayesian spam filtering is that it's supposed to get better over time, not worse.
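A minimal sketch of why that's the expectation, using the textbook naive-Bayes approach (illustrative only, not Gmail's or any particular filter's implementation): every message a user flags adds token counts, so the estimates sharpen with continued use.

    import math
    from collections import defaultdict

    # Toy naive-Bayes filter: every message a user flags adds token counts,
    # which is why accuracy is supposed to improve with continued training.
    spam_counts, ham_counts = defaultdict(int), defaultdict(int)
    n_spam, n_ham = 0, 0

    def train(tokens, is_spam):
        global n_spam, n_ham
        counts = spam_counts if is_spam else ham_counts
        for t in tokens:
            counts[t] += 1
        n_spam += is_spam
        n_ham += not is_spam

    def spam_score(tokens):
        # Log-odds that the message is spam, with Laplace smoothing; > 0 leans spam.
        score = math.log((n_spam + 1) / (n_ham + 1))
        for t in tokens:
            p_spam = (spam_counts[t] + 1) / (n_spam + 2)
            p_ham = (ham_counts[t] + 1) / (n_ham + 2)
            score += math.log(p_spam / p_ham)
        return score

    train("cheap pharmacy pills".split(), True)
    train("order confirmation invoice".split(), False)
    print(spam_score("cheap pills".split()) > 0)  # -> True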


Supposed to, yes. Until my ISP moved to GMail and hugely upped the spam filtering without telling me (not impressed), I had much the same observation with Thunderbird - its junk filtering was getting less accurate, not more.


I use Thunderbird and find that spam comes in waves. I'll get a smattering of spam emails in my inbox, flag them as such, and the number tapers toward zero. Then, a month or two later, another smattering will appear, after the spammers realize the filters have adapted and change the composition of their spam emails.


If you get any false positives, then it isn't anywhere near right.


I think most people would agree that the Yahoo spam filtering is vastly inferior to Google's.


I wouldn't agree, but I find Gmail to be awful to use. So I'm already biased.


I have the same problem, and I'm at a loss as to what to do about it.


I had the confirmation of an order from an online store go directly into the spam folder recently; I guess too many people dislike their newsletter or something.

It took me a while to figure it out, and I wasn't sure if my order was properly registered, if I had mistyped my email, or if it was a problem on their side. Lesson learned: now I systematically scan and empty the spam folder.


Whenever I find an efreedom article linking back to Stackoverflow, it's on the first page.

The original stackoverflow article on the other hand is on page 3 or 5 or greater.

I'm not sure what Stackoverflow is doing, but in this case efreedom is providing me a service by making this content easily found in a search engine. Sure, it's one more click, but in my mind the alternative is never being able to find the SO article at all.

My workflow involves looking at multiple articles, so I can't simply put 'site:stackoverflow.com' into my search.


Can you give me some concrete search terms that have this ranking discrepancy? I want to investigate and just haven't seen it yet myself.


Something has changed here. Aging or a ranking tweak or Phase Of The Moon, something. A search from a couple of days ago is now returning SO above EF, where that wasn't the case a few days ago.

Ah, here's one from the search history with EF above SO:

retaincount instruments

There are several eFreedom hits on the first page, and SO has one hit on page 2 and one hit on page 3.

I have screenshots of the first three pages.


Thanks for giving a concrete example. I'd gotten a couple examples when talking to Jeff Atwood earlier and passed that on to the appropriate team, but more concrete examples always help when we're trying to rank the sites in the best order.


By the way, if people want to mention more examples of other sites outranking Stack Overflow by replying here, I'll try to circle back to check for them, then use them to poke the relevant team to make sure we're doing everything we can.


I personally would prefer a way to just blacklist sites like this. It's very easy to visually spot a "cloner", and once I found such a site I don't care to see it again. I know this feature has been requested by many others; just adding one more voice to the pile.


I'd like them to just have a site to report SEO scammers, where you could just copy and paste the URL. This would make FF/Chrome extensions trivial. Google could rank each submitter in a similar way to how they rank reCaptcha submissions as a way of discouraging abuse.

Still, I think it's becoming apparent that SEO has broken PageRank, and we really are in need of something else. Whether that something else is algorithmic, social, or, most likely, a combination of the two, remains to be seen. Still, I don't think we'll ever have a system that can't be gamed.


Funnily enough, there WAS a way to blacklist/remove sites in the SearchWiki feature! Could you bring this back, Google? Pretty pleeeeeeese :))


Here's the only example I have: http://www.google.com/search?q=facebox+value+text&hl=en

Based on my search history, I've clicked efreedom links several times, but currently the stackoverflow link is the top one for those searches. I would be surprised if I had clicked efreedom if stackoverflow had been at the top.


Heh I get this spammy piece of crap at #1 for that search:

http://pinoytech.org/question/1699545/unable-to-get-value-fr...


I'd love to get more insight into how Google fixes problems like this; hopefully you don't just hard-code a multiplier for stackoverflow.com, or a negative one for the spam sites?


HN's "voting based on commenter" at work? This is probably single most useful comment in the entire thread, and Matt's "thank you" note had higher votes than the content? Sorry about the derail.

I've personally not seen efreedom ranking above stackoverflow. I've seen it 2 places down or one down. So this is a very valuable comment. IMHO. YMMV.


First, searched for: ppa in pbuilder

I received an eFreedom result on the second page of results, and no corresponding Stack Overflow article. The title of the eFreedom page is "DEBIAN - pbuilder storing dependencies - efreedom"

I then searched for: ppa in pbuilder site:stackoverflow.com

This gives one response - not the same article that was scraped by eFreedom. So that seems a little off - as a user I'd expect the Stack Overflow article that was scraped to meet the query if eFreedom does.

Finally, based on the title of the Stack Overflow article that was scraped in my first query, I searched for: pbuilder storing dependencies

Here the original Stack Overflow article is listed first, eFreedom second, so that's good. Unfortunately when you scan a little further down the page, the article "UBUNTU - pbuilder create fail - efreedom" is above "pbuilder create fail - Stack Overflow".

As a side point, my search is not made more valuable by the scraped versions appearing in the results. Making Stack Overflow appear before eFreedom is one thing, but why should I see eFreedom at all? I'd rather the other results be different content, that way I don't have to mentally parse everything to work out if it's the same content with a different skin.


Apologies for nitpicking, but your post was quite hard to read due to typos. Most of the time my brain auto-translates typos without me realizing it, but the ones in your post really interrupted my reading flow. YMMV.

The content was interesting, thank you. Good point about niche sites like dpreview.


I was thinking the same thing, and couldn't even work out how he'd achieved them — commonly mishit combinations seem easy for me to translate on the fly. Dvorak?


iPad I think... see OP's response to another reply.


Email spam has no monetary value for Google, whereas web spam may (I don't know whether they benefit from ad revenue from link farms and co, but that seems at least plausible).

I wonder if the search quality issue is not also linked to a vast increase in the size of the internet - searches are the same throughout the years (a couple of keywords), but there are more and more websites. Maybe vertical search helps there just because it reduces the amount of data to look through?

As noticed by some other people, I have used Google all my internet life (since 2000 or so), and Google has never ever been useful for looking for things like hotels, off-the-shelf software information, etc... I did notice a decrease in quality in technical searches (APIs, etc., where Google has always been effective up to now); I wonder if something changed in the last few months.


I find product searches to be far more efficient on Amazon. In fact, I think Amazon is the real threat to Google. I'll try to browse Google (but it's harder to do now for the reasons mentioned), but when I am ready to buy, I always check Amazon.

OT: did you type your comment on an iPad?


Google is indeed getting worse for product searches. However, Amazon is not the worldwide answer for product comparison, since for the vast majority of their non-book products they refuse to ship anything outside of the US, thus making their pricing pretty much useless. Google definitely has more usefulness worldwide.


I'm glad you brought up this interesting point. I didn't know that their global presence was lacking.


Amazon is an example of a niche vertical. A pretty big one, mind you, but nonetheless they specialize in the products they sell.

I can't seem to remember Amazon coming up on product searches I've done. I have to wonder:

- Is this because I went straight to Amazon?

- Am I misremembering?

- Is there something wrong with Amazon's SEO?

- Is Amazon restricting their content from being indexed (I can't imagine this is true)?

- Is Google actually failing here?

OT: Yeah, my typo rate on my iPad is pretty terrible. I'm too used to touch typing on a keyboard. Still, I find myself reading the Web on my iPad instead of a laptop/desktop these days, so I need to put up with the typos (and get better with any luck) rather than stop commenting or move to a computer to comment (not going to happen).


They show up in SERPs all the time.

http://www.google.com/search?sourceid=chrome&ie=UTF-8...

http://www.google.com/search?hl=en&safe=off&q=+Asus+...

http://www.google.com/search?sourceid=chrome&ie=UTF-8...

Etc etc. I see them all the time in Google, despite also searching Amazon directly quite a bit.


I tend to go straight to Amazon when I'm price shopping now. If it's an "Amazon product" (books, electronics, etc.) they usually have a pretty good price, and if they don't, they give me the comparison. The big difference is: with Google, I'll research and occasionally get my wallet after finding something. With Amazon, I get my wallet before I sit down.

As someone who pays for advertising this is a distinction I pay attention to, especially if it generalizes to more people than just me: people going to Amazon are ready to buy and not just doing research.

If Amazon can make that stick I think they pose a significant threat to Google. Why pay as much for research clicks when you can get clicks that are closer to a sale?


Agreed. Amazon reviews are also my first product research tool.


This is really just a rehash of other posts from the last month (linked in article). This post basically complains about two things: the finer points of SEO and content farms.

Yes indeed, on the money.

Further, the attitude of these various whiners is pathetic.

The war on spam in the general sense is never going to be won or lost. The war for attention simply, inherently, has too many dimensions to be winnable or losable. As long as there's benefit in merely getting people's attention, there will be people who successfully put out low-content, attention-getting information in one form or another. Human beings themselves are often satisfied with less-than-stellar content; how, then, could a given algorithm be expected to filter for stellar content?

Google's and other filtering entities' ability to provide us with stuff we're satisfied with will fluctuate. They won't ever be done. The "good" providers won't ever be fully satisfied and the "bad" providers won't ever completely go away. Any "new Google" will be "the same as the old Google".

Let's move on.


"<camera make and model> review"

Which is fine if you already have narrowed down your camera options to a top-5 or so, but the issue is getting to that stage.


It looks like the biggest thing that efreedom.com (the most prolific stackoverflow mirror) does to rank higher in google searches is put the category as the first word in the title. What stackoverflow titles as "How do I use MediaRecorder to record video without causing a ..." efreedom titles as "Android: How do I use MediaRecorder to record video without causing a ...". So when I search for "android mediarecorder segmentation fault" all other things being equal efreedom wins.


Which implies that they're actually taking the SO dumps and adding legitimate value to them.


Is that legitimate value -- the only thing it changes is making the page easier to find? It seems to me that the site is ugly, gives no way for the user to become part of the discussion, and is full of 'share with your friends' links and adverts...


Incredibly minor legitimate value while introducing significant noise and complication, though.


It's not a very difficult problem to solve. 95% of the content-farm spam comes from a few domains. In the same way that spam-blacklists have proved to be the most-effective way to combat e-mail spam, Google just needs to decide to shut these content-farms out. They don't need to do anything sophisticated like tweak their algorithm... just shut them out. The fact that it hasn't been done yet suggests to me that Google doesn't want to.


But if Google shuts those domains out, then the content farmers can just pick up new ones. Furthermore, Google would need to have some sort of process for determining which domains are content-farm spam before shutting them out.

As with spam, the bad guys can be far more agile than the good guys. The point of having algorithms is that once you have a good algorithm, your search-quality staff doesn’t have to scale linearly with the number of pages on the Web.

(Disclaimer: I work in the search division of Nokia, i.e., I work for one of Google’s competitors.)


Google has 23,000 employees and ~$10B in yearly operating income. I'm pretty sure they could stay ahead of the bad guys if they wanted to.

EDIT: Additionally, it is hard to game pagerank rapidly because the rest of the web needs to link to your site. So, even if you switch your knock-off wikipedia site to a new domain, it would take weeks/months to rise through the rankings. I'm pretty sure it would only take a few employees (at most) to stay ahead of these huge spam sites.


Compared to how many millions of people making money on the web, all of whom have an incentive to boost their own traffic regardless of whether that's the best thing for users?

Joy's Law comes into play here: "No matter who you are, most of the smartest people work for someone else." 23,000 employees sounds huge, until you compare it with the million+ people who make their living as full-time eBay sellers, and the however many million people who make their living off AdSense.


Google has 23,000 employees ... . I'm pretty sure they could stay ahead of the bad guys if they wanted to.

In the past there were people within Google who had the skill, knowledge, organizational connections and authority to quickly and gracefully reduce the problem of link spam without harming the bottom line too much.

I think we all loved their early work, and our love helped propel their little company to multibillions-per-year.

I don't think that kind of unique early experience and subsequent problem-solving effectiveness is snap-in interchangeable.

Some of those "rock stars" work at Facebook now. Google might not be capable of fixing the problem gracefully without them.


That's depressing... I hope you're wrong.


it is hard to game pagerank rapidly because the rest of the web needs to link to your site

Not really. Some people own different websites, purchased through different accounts and hosted on different servers. They then link between those sites.
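A toy power-iteration sketch of PageRank shows why that works; the graph, damping factor, and site names here are invented for illustration, and this is the textbook recurrence rather than Google's production ranking:

    # Toy PageRank: two commonly owned sites link to each other and to a
    # "money" page, inflating its rank relative to a single organic link.
    def pagerank(links, iters=50, d=0.85):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new = {p: (1 - d) / n for p in pages}
            for p, outs in links.items():
                share = rank[p] / (len(outs) or n)
                for q in (outs or pages):   # dangling pages spread evenly
                    new[q] += d * share
            rank = new
        return rank

    links = {
        "blog":  ["money"],           # one organic inbound link
        "farm1": ["farm2", "money"],  # the owned sites boost each other...
        "farm2": ["farm1", "money"],  # ...and both point at the target
        "money": [],
    }
    ranks = pagerank(links)
    print(max(ranks, key=ranks.get))  # -> money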


GMail doesn't rely solely on algorithms, but also on learning from users' flagging of spam. The only problem is that sometimes it has false positives.

I would be happy if Google provided an "is spam" button that filtered the content only for me (i.e. without consequences to other users).


Well, yeah, that's how the algorithms work: by using a training dataset where a human actually says "this is spam."

And as for the "is spam" button: a given email isn't automatically added to the spam training dataset the minute one user hits the spam button. It goes automatically into YOUR spam folder, but not automatically into the training dataset. That takes many "votes" from many users.
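Something like the following sketch, in other words; the threshold and all the names are invented, since Google doesn't publish the real mechanics:

    # One user's "is spam" click affects only their own mailbox; the message
    # enters the shared training set only after enough independent votes.
    # The threshold of 50 and the names here are purely illustrative.
    GLOBAL_VOTE_THRESHOLD = 50

    spam_votes = {}           # message_id -> set of users who flagged it
    global_training_set = []  # (tokens, is_spam) pairs for the shared filter

    def flag_as_spam(user_id, message_id, tokens, mailbox):
        mailbox.move_to_spam(message_id)           # immediate, per-user effect
        voters = spam_votes.setdefault(message_id, set())
        voters.add(user_id)
        if len(voters) == GLOBAL_VOTE_THRESHOLD:   # enough independent votes
            global_training_set.append((tokens, True))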


That's why I would be happy with a personal spam filter: training by users' votes can be gamed, and producing false positives can take legitimate websites out of business.

I see there are some browser plugins floating around: I'm not happy with those because I use multiple workstations, and because hacking around a product's deficiencies is not "voting with my wallet".


Don't Google engineers actually use Google? They shouldn't really need flags to become aware of this sort of thing.


In my experience, the path from “engineer recognizes that queries X, Y, and Z return crap search results” to “search engine improves its performance with queries X, Y, and Z without creating more crap somewhere else” is more difficult than a lot of people realize.


Still, a power law applies and you can get huge gains by punishing the biggest offenders and moving down from there. And yes, spammers can always restart, but the sandbox helps with that.


Without wishing to be rude, does Nokia's search division really compete with Google?


High-end Nokia phones have a local-search client that competes with Google Maps. The front end for people using regular browsers is here: http://maps.ovi.com/


Fair enough, although that's not really competing in search, is it? It doesn't work on an iPad, so I can't see it right now.


I suppose it does in the mobile space, at least.


It reminds me of the reddit IAMA a while back by an affiliate marketer [1]. There were lots of responses by people claiming he was the scum of the earth, poisoning the web, not adding value, but he just calmly stated his position that affiliate marketers are a significant source of income for Google, and "Google == value", therefore affiliate marketing has value.

Same might be said for content farms, so I reckon your last sentence is on the money.

[1] http://www.reddit.com/r/IAmA/comments/azcni/i_made_62232296_...


Hacker mentality is largely gaming systems, so Google should indeed change the rules.


I wish they would. I find content farms irksome.


Perhaps because of legal issues, particularly when Google is being sued for tweaking its ranking algorithm to boost its own pages higher.

I don't understand the legalities of that lawsuit, though. A ranking is an opinion. It's not as if it's some fundamental physical constant or a formally well-defined quantity. Can I be sued for my subjective judgment?

OT: I'd appreciate it if you let me know why you downvoted.


I don't understand those lawsuits either. Isn't Google a private company? Aren't they entitled to do what they want with their product? I see no reason why their results must be "fair", legally.


Well, if they're actually a monopoly, then it does change what they can do, legally.


IANAL, but proving that a company has a monopoly and used that monopoly in an abusive way is very hard.

Consider that when discussing whether a company has a monopoly on a market, alternatives and the cost of switching also come up (besides market share); so it will be even harder than in the case of Microsoft.

Then, even if Google is discovered to have a monopoly, shutting down a website ... it can be argued that it was in the best interests of its users and faithful to the product's original mission: i.e. nobody can sue Microsoft for improving Windows / not bundling third-party software (they got sued for extending Windows with new functionality that destroyed competition in an existing market).


can be argued that it was in the best interests of its users and faithful to the product's original mission

If you're willing to pay the tens of millions of dollars and take the PR hit that is a government anti-trust lawsuit, you may eventually have the privilege of making that argument before a federal judge, who probably isn't technology literate enough to comprehend it. Let's face facts here, this conversation goes over the heads of 80% of internet users. The judge is going to see "the federal government accuses them of restraining this site's trade by removing them from the search index, and Google admits to it" and then that's the ballgame.

Or maybe the government never gets around to suing Google. Or maybe they get a judge who knows this sort of stuff already. But that's still a scenario keeping Google's Legal Dept. up at night.


Yeah, but that would be like suing Microsoft for making Windows more secure because that can "restrain the trade of companies producing Antiviruses".

That would be a pretty stupid argument, even in the face of a non-technical jury, wouldn't it?

Of course, the negative image would hurt, but let's be honest: the suit against Microsoft accomplished basically nothing it couldn't handle, while wasting taxpayers' money. Are these antitrust suits getting started so easily?


Or like suing Microsoft for making the computers that ship with its OS more useful by including Office cheaply...


I would be surprised to see them suffer for optimising the algorithm to try to trace the original source of content, though. With the sort of sites we're talking about, by definition that original source will have more up-to-date content, which the user could reasonably expect to be presented preferentially. At which point - you're welcome to make your business from rehosting public content that originates elsewhere, but if you expect us to primarily direct people to your copy rather than the original then I think you're onto a loser.

Let's put it another way. I could conceivably build a valuable service by taking content from StackOverflow and Wikipedia (or wherever) and automatically linking the two to enable people to get some more context around some questions, maybe reformatting pages to enable side-by-side content or something similar. It wouldn't be a trivial service but it wouldn't be impossible in the least and it could plausibly add value. As such it wouldn't be unreasonable to preferentially direct users in some cases to that source rather than the original, as algorithmically optimised content - the preferential ranking would be a result of the value of the linking algorithm. Without this though, by what measure am I conceivably superior to the original source by having an out-of-date copy with fewer legitimate inbound links and more irrelevant content (adverts)?

Being a monopoly isn't illegal. Using your monopoly power in a way that might harm another company isn't illegal. What's illegal is doing that unreasonably and capriciously, as Microsoft were with Netscape in that trial, or DR and Lotus in previous trials.


No law, as far as I am aware, forbids a monopoly from producing a crappy product. Antitrust law comes into play when a competitor offers a less-crappy product and the monopoly tries to drive the competitor out of business.


No, but a company that has a monopoly cannot manipulate a market by making a product selectively crappy, like a search engine that fails to find its competitors or a program that fails to run on a competing, compatible OS (early win3 on DR-DOS).


Yes, if you make your product selectively crappy in order to undermine competition, it’s an antitrust problem. Microsoft making Windows not run on DR-DOS is an antitrust problem. Microsoft making Windows fail at multiuser security is not an antitrust problem.


> Microsoft making Windows fail at multiuser security is not an antitrust problem.

No. That was incompetence.


It's a publicly traded company, but yes, "private" if you mean "not controlled by a government"


Two points:

1. What makes you claim blacklists are the "most-effective" form of filtering? My understanding is that other techniques like Bayesian filtering are.

Background: http://www.paulgraham.com/falsepositives.html

2. Saying that it only comes from a few sources is dangerous thinking. This type of thinking was used by politicians to get a ban on adult services from Craigslist ("get rid of it there and it goes away!").


A good email blacklist (such as the Spamhaus Zen list) will detect 90% of spam with no false positives and negligible load.

There is a lot of evidence that there are not many botnets and each one has little diversity of control. This observation does not generalize to other kinds of spam such as 419s.
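Mechanically, a DNSBL check against a list like Zen is just a DNS lookup on the reversed IP, which is why the load is negligible. A minimal sketch:

    import socket

    def is_listed(ipv4, zone="zen.spamhaus.org"):
        # Reverse the octets, append the zone, and resolve: an answer in
        # 127.0.0.x means "listed"; a failed lookup (NXDOMAIN) means clean.
        query = ".".join(reversed(ipv4.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)
            return True
        except socket.gaierror:
            return False

    # 127.0.0.2 is the conventional DNSBL test entry and should be listed.
    print(is_listed("127.0.0.2"))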


Do these content farms run Google ads?


I don't understand what people are complaining about when they use Google for generic product searches like this.

What do you expect "iPhone 4 cases" to return?

Links to reviews? Links to Apple's online stores? Links to other retailers' stores? Links to information about what the cases are manufactured from?

I don't understand what a search engine is supposed to do in this use case. How can it divine which of the many things related to iPhone cases you're interested in? This generic search could go in many different directions.

Personally, I would never think to search Google directly for a product review like this. Amazon is the best-known place to find reviews from fellow general consumers.

When you use a search engine, I think the key to efficiency is having a firm idea of what type of results you'd like it to return before you press the "Search" button.


We want the whole world to teach each other and learn from the questions and answers posted on our sites. Remix, reuse, share – and teach your peers! That's our mission. That's why I get up in the morning.

However, implicit in this strategy was the assumption that we, as the canonical source for the original questions and answers, would always rank first.

Translation:

We thought syndicating content would give us Google juice but it backfired ...


when was the last time you clicked through to a page that was nothing more than a legally copied, properly attributed Wikipedia entry encrusted in advertisements? Never, right?

Jeff gets it wrong yet again. Has he never heard of (or clicked a search result that led to) answers.com?


Those sites exist, and no-one's disputing that. But the point is that the Wikipedia result is in the Top 5 (and often the top result), and answers.com is further down the list if it's even on the first page.


That is because wikipedia has more pagerank than god, not necessarily because google is favouring the "original" source, which seems to be the main concern of Jeff.


It could also be because Wikipedia tends to write about really broad topics that are not something you usually purchase. Hence, Wikipedia's high PageRank and difference in types of queries (e.g. "War of 1812" vs. "Dishwasher") probably make scraping worthless. Unless you were selling history textbooks.


He seems to think this has never happened before, but I can remember Google search quality apparently declining repeatedly in the past... sometimes it seemed to return all the way to where it had been, and sometimes part way, but it isn't as though this is unprecedented. Additionally:

when was the last time you clicked through to a page that was nothing more than a legally copied, properly attributed Wikipedia entry encrusted in advertisements? Never, right?

It's not too common, but it's not like it never happens. Again, at times in the past, this has happened regularly for a while, to the point where you have to add "wikipedia" as a search term, but it has always returned to normality after a few days or so.

Since this happens from time to time for me, I'm wondering now if Jeff has been doing something right that I'm failing to do when searching.


It is almost as if a dam has cracked and we are seeing the first trickles of "Google sucks lately" stories. It is increasingly becoming an arms race - Google tweaks its algorithms to defeat SEO, Spam and other Gamers and the gamers tweak their tactics to outwit Google's tweaks. Anybody else see an opportunity in this phenomenon to supplant algorithmic search with curated search?


The problem with curated search the first time around was that it was too hard to keep up. Your curated index invariably fell behind. Yahoo started out as a curated index of the internet and eventually went to algorithmic search because they couldn't keep up, and that was in the 90s. I don't see how anyone could even conceive of keeping up now.

Perhaps you mean something more like the zero-click search that DuckDuckGo has?


No, I did mean curated. You bring up one of the key objections: with the web growing at the rate it is, how do you keep up with curation? I do have several ideas (but no single pat answer):

- Google has effectively a cache of the web

- These are the people who have figured out how to run rings around large data sets - MapReduce, BigTable, Colossus, etc

- There are aspects of information provenance that can help fix some problems (and having a snapshot of the web should help)

- Crowdsource some of the curation to improve quality

- INVEST MORE in SEARCH and drop some of the other distractions

- Resolve the basic conflict of interest with needing Ads to feed the bottom line but needing web users to use Google to get higher quality ad-free results

- A few other, more focused ideas (really need to punch up a blog post soon :))


Curation predates algorithmic search, but things like web directories don't get that much traffic - although Wikipedia is an example of a successful curated site.


What do you mean "lately"? This has been going on for a long time; just search Hacker News for entries related to sluggish performance on YouTube and Gmail for example.

Their search being criticized for reasons different from those that made people switch to DuckDuckGo is, however, somewhat novel.


In evolutionary terms, Google are gaining a very solid advantage every day. If Bing were to start growing suddenly, their tools for beating black-SEO and spam would be more primitive due to the lack of natural "predatory pressure". Bing's lack of immunity against some attacks would then set them back.


The issue is not technical, it is a structural feature of Google's business model. Google has a vested interest in ranking websites filled with advertising high in the search results...eyeballs on ads is ultimately the basis for their primary sources of revenue.

If simple searches yield crappy results, then the user spends more time on the search page revising the search terms and therefore looking at keyword based ads, clicks through a sponsored link, or clicks through to a site filled with Google served ads.

Bing does not face similar structural pressure because it is not Microsoft's primary source of revenue.


> If simple searches yield crappy results, then the user spends more time on the search page revising the search terms and therefore looking at keyword based ads, clicks through a sponsored link, or clicks through to a site filled with Google served ads.

But if that user has to click on multiple ads to go where he wants, this is bad for the advertisers, and should reflect back on Google indirectly.


Google is filled with smart people, and I suspect that they work very hard to find the optimum mix of advertising-driven results and pure search results. Given their market dominance, I suspect that most internet users have a high tolerance for advertising-driven results.


In other words, Bing can afford to make a better search engine because it's not weighed down by a need to make a profit? I'm sure Microsoft wants in the end to make money from search, much in the same way as Google is. Although they may need to find a few niches first rather than doing a billion-dollar frontal assault.


Microsoft can generate a profit even if Bing itself doesn't make money, and it can still benefit greatly from Bing without it generating revenue directly, because Microsoft can monetize search technology without ad revenue by incorporating better search into their primary product lines.

The billions of dollars attributed to Bing are just internal accounting; Microsoft doesn't backcharge other departments such as MSDN for its use on their websites. Bing is the result of a research project which investigates search algorithms, large data sets, and collects data on web users - keep in mind that 3% of all web search probably provides enough data for most research projects.

Bing might be seen as similar to TerraServer: a test bed for developing tools which work on a much larger scale than most enterprises currently require.


I just don't understand the problem that Google is having. Why can't they simply penalise sites/domains that are full of rubbish? Or manually boost domains and sites that aren't.

The lack of innovation in search worries me - there are big commercial incentives for Google's results to be poor. Though the emergence of viable alternatives will change this.

I'm sure I read that the average revenue per search was $0.08 or something around that mark. At that level it's worth having some human intervention. Perhaps Yahoo had something after all!


From a cynical point of view it seems to me that Google doesn't mind one bit if you have to click through 5 pages of links or modify your search query a few times. It's all page hits/ad impressions for them. Since there's very little competition in search they don't have to worry much about people going elsewhere.


>"Since there's very little competition in search they don't have to worry much about people going elsewhere."

Google search has become increasingly biased over the years, to the point where identical searches by two different people can yield significantly different results depending on each person's geographic location, previous search history, and previous web activity. Google may not be evil, but they often appear to be creepy.

That's why over the long haul, the billions spent on Bing probably make sense for Microsoft. They can run it and scale it without total dependence on ad revenue and the inevitable pressure to skew search results to create impressions for their ad network because it is not their primary revenue source.

Unlike Google search, Bing is not primarily a business model, it is another technology in Microsoft's portfolio. Ad sales are secondary to the added value of search when incorporated into other Microsoft products.


With the amount of cash they have in the bank, it would behoove them to spend a little on human oversight to protect their reputation as the go-to search site.


That's why they're called "search engines" and not "find engines".


The search "iphone 4 case" seems to be particularly susceptible to crap results. Even DDG (https://duckduckgo.com/?q=iphone+4+case) and Bing (http://www.bing.com/search?q=iphone+4+case) give shady results.


What happens if other sites are scraping content faster than Google can crawl it? In these cases, will Google really be able to guess which site is the original? For all they know, SO is scraping a lot of its content from other sites.

If this kind of uncertain origin is any part of the problem, one solution might be for Jeff to temporarily block robots other than Google/Bing/etc. from retrieving new content until, say, ten minutes later. This gives the search engine a chance to figure out who the original is, while still (I think) remaining within the spirit of CC-SA. A Google API call ("I'm high reputation, please crawl this new page now!") might be even better.
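A rough sketch of that embargo idea; the crawler names and the user-agent check are illustrative (user-agent strings are trivially spoofable, so a real version would verify crawlers by reverse DNS):

    import time

    TRUSTED_CRAWLERS = ("googlebot", "bingbot")  # illustrative allow-list
    EMBARGO_SECONDS = 10 * 60                    # the "ten minutes later"

    def may_fetch(user_agent, published_at, is_bot):
        # Humans and old pages are always served; bots see brand-new pages
        # only if they're on the allow-list, giving search engines first crawl.
        if not is_bot or time.time() - published_at > EMBARGO_SECONDS:
            return True
        return any(name in user_agent.lower() for name in TRUSTED_CRAWLERS)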

edit: clarified API suggestion.


That's assuming the scrapers will respect robots.txt. Some of them would, and that would help.

Assuming Google does take into account who was first, a similar solution is for Jeff to submit his content directly to Google for indexing immediately after it's published.

EDIT I'm wrong; google only accepts top-level URLs for indexing, not new content: http://www.google.com/addurl/?continue=/addurl


You can specify specific urls by submitting a sitemap (http://www.google.com/support/webmasters/bin/answer.py?hl=en...).
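For example (example.com is a hypothetical stand-in; the ping URL is the sitemap-submission endpoint Google documented in Webmaster Tools):

    import textwrap
    import urllib.parse
    import urllib.request

    # Minimal sitemap listing one freshly published page; example.com is a
    # hypothetical stand-in for the real site.
    sitemap = textwrap.dedent("""\
        <?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
          <url>
            <loc>http://example.com/questions/12345</loc>
            <lastmod>2011-01-03</lastmod>
          </url>
        </urlset>""")
    # ... serve `sitemap` at http://example.com/sitemap.xml, then ping Google
    # so the updated sitemap (and the new URL) gets crawled sooner:
    ping = ("http://www.google.com/ping?sitemap=" +
            urllib.parse.quote("http://example.com/sitemap.xml", safe=""))
    urllib.request.urlopen(ping)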


Well, you could ban misbehaving robots by IP address. That seems like a reasonable step for bad behavior and not in conflict with Jeff's general "sharing"/"openness" values.


The only way of solving this I can think of is sending new-URL notifications to Google - if not sending the entire HTML content as well. Without the content it'd be open to abuse, and I can't see any way to scale it - but then I'm not a Google engineer ;)


That seems good -- but I can imagine Google preferring to crawl the page, rather than receive it by API (so it's more likely to be what the user's going to see).

I think Google can handle the scaling problem; one not-great solution: ignore notifications except for those from people who are being scraped and need it.

Still, it's kind of a shame that webmasters have to worry about any of this.


That seems good -- but I can imagine Google preferring to crawl the page, rather than receive it by API

I'd agree with you, but can think of edge cases where naughty sites A, B and C submit new URLs within an arbitrary amount of time of the new content being published. In that case there'd be no way for Google to tell who published the content first other than to have a big list of original content publishers - and I think that list'd get messy fast.


You're right, there's an exploit there for the re-publishers. Great point.


If you, as a webmaster or developer, need to inform search engines when your content has changed, you just became part of the search engine.

If that is what is required, and everyone on the internet needs to build Google hooks into their server-side website logic to avoid being SEO'd to death, I think it's safe to say that Google just stopped being useful.

I hope that in 2011 people will start seeing beyond the Google RDF, which has gotten increasingly annoying throughout the last year: we need more competition in this space. Pronto.

Making your website part of the google search-engine will not accelerate this. It's not a "good" solution. It's barely a solution at all.


It's not a "good" solution. It's barely a solution at all.

I couldn't agree with you more; it's a stupid problem to have and one that shouldn't exist. It seems like the time is ripe for either a Google-killer to emerge in the next couple of years, or for Google to improve themselves in the same amount of time. Personally I believe that decent competition would be the healthiest way, but let's not kid ourselves; as soon as one search engine starts being used by a decent proportion of users, the SEO blackhats are going to try and game it just as much as they do with Google.


An easy way to get better results when searching for reviews, as well as getting up-to-date content, has become my personal goal for unscatter.com. My first piece is up, using the blekko API. I will be adding more search filters powered by different APIs in the future. At the moment it's basically a wrapper around the blekko API, I admit, but it's already useful for searching for iPhone 4 cases, I think. http://www.unscatter.com/search/?q=Iphone%204%20case&f=r...


Google has always had "bad neighborhoods" -- places where results weren't so good. What folks are finding is that the bad neighborhoods are on the rise, at least when it comes to short, popular searches. Now it appears the screen scrapers are busy at work targeting tech questions. In the last couple of months, when I had a technical question I got total junk for an answer -- lists of questions that took me to landing pages, re-dos of Stack Overflow pages, and random questions that didn't even have answers.

I use Google extensively for search. About once a month or so, I'll be looking for something in a bad neighborhood. It's not a pleasant experience. It's a shame to see tech questions end up like this.

But the problem, as another poster pointed out, is that nothing is free. You are either paying money, in which case you are the customer, or you are the product. There's no "in-between". In Google's business model, you are the product.

I think the business model can continue for a good, long time, but there are always going to be crossed incentives between people who want free stuff and providers who have to pay money to provide that stuff. Not everybody can be a Wikipedia and raise money with pictures of Jimmy Wales. They are an outlier.

My conclusion is that these are browser problems. After all, it's none of my business what people put on the web, and aside from liking Google and wishing them well, I really don't have a dog in the fight for their struggle. In fact, it's better for me to have a dozen search companies all using different algorithms -- makes it harder to game the system.

So what I want is a browser. A browser that uses multiple search engines automatically and completely eliminates any "fluff" from rendered pages -- perhaps even combining various pages into much simpler displays.

I'd pay for that, and that would make me the customer. Then I would have whatever web experience I desired, instead of the one that I get for free. I'd much rather be in the position of writing a check to the best browser provider that condensed and filtered information than the situation we have now.

(By the way, if anybody is interested in this browser project, please contact me, as it's been a pet project of mine for some time)


Hold on a tic. You're complaining about clutter in the search results, but didn't you just say at http://www.whattofix.com/blog/archives/2011/01/confession-i-... that you make several sites like facebook-login-help.com to target specific phrases? So you're complaining about junk on the web, but you also own domains like buy-fresh-blue-cheese.info, and at the same time don't want such sites to clutter up search results? My head hurts.


Matt,

I believe that there is no single metric for search results -- that there is no universal answer to the question of "iPad games".

So I try to provide original content in niches that don't exist.

I apologize if you don't like the quality of what I am doing. I will gladly delete the sites if they hurt people. I am simply trying to learn how to provide content that people want.

You are welcome to contact me offline and I'm happy to do whatever is necessary to prove that I mean well. I don't know what else I could offer to prove my good intentions.


Daniel, it wasn't my intent to pick on you personally--sorry if it came across that way. It was just the juxtaposition that caught me off-guard. :)


Please don't apologize. It's pretty clear hypocrisy.

And please do something about the StackOverflow problem. This is clearly an issue with Google, which is highly visible to ALL programmers. I couldn't imagine a worse problem to have for a company that is interested in hiring the best engineers.


I did ping the right people about SO and added a new example that someone mentioned. The right team (and their manager) are actively discussing what steps are doable to improve these searches.


Awesome! The efreedom stuff was really starting to bug me. I use StackOverflow a lot.

Google is still hands down the best search engine.


I'm coming at this from the other side, so perhaps sometime we can compare notes. I'd like to learn.

I'm a writer -- I love writing. I have a blog that I've been writing on for several years now. I think I'd write even if nobody wanted to read what I wrote.

A year or so ago, it occurred to me that, instead of just writing whatever I feel like, perhaps I should try to write something that people want. So I use tools to see what people search for, and what kinds of content already exist.

I've written some really corny sites, as you point out. But I hope that each site gets a little better, and I know from the emails and return traffic I get to some of my sites that people are getting value from them. Some -- probably not. That's okay because I'm learning.

If I'm doing something wrong, please just let me know. I'm happy to change things. In this field we have the big companies, the scammers, and the folks somewhere in between. Most guys who make money off such sites probably aren't honest (stupid?) enough to blog about it. I know it's been very difficult for me to find guidelines on what works, what to do, and what not to do.

So I've created my own guidelines for now: no rewrites, no link-spam, no tricking the user, no link-bait. Perhaps I need more. Don't know. That's all I have so far.

Like I said, I'm not putting myself out there as an example of what to do or how to act. Beats me. This is a complicated problem and there are multiple valid points of view.

Hey -- the first thing I install on my browser is AdBlock. I hate all forms of advertising, on any medium. I suggest others do the same. That doesn't mean I have to give up writing about stuff and providing content. And it doesn't mean I can't put ads on the content I provide. Not crazy about a lot of stuff I participate in.

Head-spinning, perhaps. Sorry about that :)


> I've created my own guidelines for now: no rewrites, no link-spam, no tricking the user, no link-bait. Perhaps I need more. Don't know. That's all I have so far.

Google Webmaster Guidelines: http://www.google.com/support/webmasters/bin/answer.py?hl=en...

"We strongly encourage you to pay very close attention to the 'Quality Guidelines,' which outline some of the illicit practices that may lead to a site being removed entirely from the Google index or otherwise penalized. "

http://www.whattofix.com/ looks a little dodgy through the eyes of a SEO link farm hunter when it links to http://www.hamburger-casserole-recipes.com/ and http://neuropathyinfeet.us/ and http://paycheck-stub.com/ and http://facebook-login-help.com/ What sets it apart is obviously those sites really DO provide original information on those topics. But the layout of something like http://www.hamburger-casserole-recipes.com looks like http://www.spamrecipes.net/ which just copies other people's recipes and sticks a juicy ad block under them. It's hard to be honest in a world of scammers.


Thank you. I'll give them a look.

The http://www.hamburger-casserole-recipes.com/ site gets a lot of traffic, and a lot of it is return traffic. But, since recipes are fairly standard, it was very difficult not to in some way re-fashion what was already out there.

When I was doing the research on the recipes site, I found a lot of what I consider spam -- sites full of recipes with the intention of tricking users into downloading spyware, buying SMS subscription services, and other such stuff. I felt like a simple, easy-to-read, targeted site based on just a few recipes people want might have value. It's not a general purpose recipe site, and the content is presented as simply and openly as possible. (And it was all entered by my wife)

It's interesting that it would stand out as being dubious, as it seems to me to be the most straightforward trade of content for eyeballs of the sites I have.

As an aside, it doesn't make much money, perhaps because nobody advertises about casseroles or cookbooks? Beats me. But I plan to keep it because of the positive response it gets, including people writing to ask for new recipes.

I didn't have a problem listing my other sites on my blog, since, well, they are my other sites. It's not like I'm trying to keep them secret or anything.

I appreciate the help.


I don't see any contradiction, even if sites like 'facebook-login-help.com' are exactly the kind of clutter Markham would prefer did not rank at all.

One complaint is wishing that the search results were better. From the perspective of one person, that might or might not ever happen – one person outside Google has very little influence on overall search quality trends.

The other action is living in the world as it is. If cynically-optimized results are going to dominate the top spots, and being in the top spots is a way to make a living, even people who would prefer cynical-optimization not be rewarded have to engage in it. (I'm sure there are lots of good people creating awful content for Demand Media and other mills; they're just responding to bad incentives.)

It's the same for anyone who's had to twist their web writing/design/structure in ways directed solely at search engines rather than at actual users. Yes, lots of SEO advice does make the web more usable for people as well. But by definition, not all of it does.


Note that the area StackOverflow is in has been a "bad neighborhood" for a long time... StackOverflow has played a big role in cleaning up.

About six months ago, just about any question about "doing X" with Windows or .NET would turn up some page on ExpertsExchange that promised to give me the answer... if I paid. Yeah right.

It's no wonder the spammers in this space are fighting back against S.O.


> So what I want is a browser. A browser that uses multiple search engines automatically and completely eliminates any "fluff" from rendered pages -- perhaps even combining various pages into much simpler displays.

Aren't you describing a "metasearch engine"? They have been around for quite some time (which does not mean they couldn't be much improved, of course).
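
Just to make the idea concrete, here's a toy Python sketch of the merging step; the engine names, URLs, and round-robin policy are all my own invention, and actually obtaining the ranked lists is the hard part (see the TOS point in the reply below):

    # Toy metasearch merge: round-robin the ranked results from several
    # engines, de-duplicating by URL. All inputs here are invented.
    from itertools import zip_longest

    def merge_results(*ranked_lists):
        seen, merged = set(), []
        for tier in zip_longest(*ranked_lists):
            for url in tier:
                if url is not None and url not in seen:
                    seen.add(url)
                    merged.append(url)
        return merged

    engine_a = ["http://a.example/1", "http://b.example/2"]
    engine_b = ["http://b.example/2", "http://c.example/3"]
    print(merge_results(engine_a, engine_b))
    # ['http://a.example/1', 'http://b.example/2', 'http://c.example/3']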


Most of the good search engines have a TOS that disallows you from creating a meta search engine from their results (API or otherwise). And they WILL cut off your access if they catch you. Learned that the hard way.


The browser would solve the cross-purposes problem which exists not only in search but everywhere else.

So, for instance, Facebook provides you this cool platform for chatting, playing games, and keeping track of your friends. But, of course, it's not really free. You'll see ads, get pitched products by your friends, and all kinds of other stuff that has nothing to do with (perhaps) your primary purpose for being there.

Since this "Hey! We're free! (Except not really)" problem is going to continue to exist on the web in various formats and places, fixing the browser is the only way to keep control over your experience.

We're already seeing this type of work from developers who build add-ins to control the user experience. Hell, I'd pay for a subscription that kept my browser updated with all the appropriate plugins, but I think we can do a lot better than that. For instance, if I want to send a friend a message, I couldn't care less whether they are on LinkedIn, Facebook, MySpace, or just somebody from my address list.

The purpose of HTML is to separate the data from the presentation. It's the only way you can make the web work. But the web as implemented is full of walled gardens, addictive-play websites, and sites that use your own friends against you. We need to get back to basics.


I think this contributes to an ongoing trend and an even bigger threat for Google. The way we access the content of the web isn't the same as it was back in 2000. Back then, a search engine was your only starting point for the web. Now a growing share of referrals comes through curated (mostly social) channels. Poor search results will only accelerate this trend.


I don't understand why this is such a hard problem to solve.

I assume that every business that manages to farm content and SEO it up to the first page must be making a decent investment in time and resources to achieve this. It doesn't happen overnight.

So wouldn't it be easy enough to maintain a blacklist or at least a de-value list that would bring the return below the investment? Shouldn't there be a streamlined process for assembling this blacklist? They must already be doing something along these lines and no doubt quite a bit more involved than what I'm describing here.

Could they add a crowdsourced "flag" link next to every search result? It wouldn't blacklist anything automatically, obviously, but it would help identify which results should be investigated further.

Why is it still an issue? Is it a legal problem? Can they be sued for maintaining a blacklist?

I'm not trying to say I know better, so I must be missing something. Maybe someone can shed some light on my ignorance?
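
For what it's worth, here's roughly what I have in mind for the de-value list, as a minimal Python sketch; the domains, scores, and the penalty factor are all made up for illustration:

    # Hypothetical ranking pass that demotes flagged domains instead of
    # delisting them. Domains, scores, and the 0.1 factor are invented.
    from urllib.parse import urlparse

    DEVALUED = {"efreedom.com", "some-scraper.example"}

    def adjusted_score(url, raw_score, penalty=0.1):
        domain = urlparse(url).netloc.lower()
        if any(domain == d or domain.endswith("." + d) for d in DEVALUED):
            return raw_score * penalty  # pushed down the page, not removed
        return raw_score

    print(adjusted_score("http://www.efreedom.com/Question/1", 0.92))  # ~0.092

The point is that the return drops below the investment without Google having to defend outright removal.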


Lately, I've been exploring the theory that Google simply has too many employees:

Thousands of highly motivated employees attempt to expand their resumes + make an impact -> blind expansion of site features + sources of ad revenue -> loss of company character + restraint

Once the profit appears, no one dares to backtrack.

Does that make sense, or am I just speculating?


It might make sense if they were all working on the same thing.

They're not.


It seems like the obvious solution is a crawl-on-demand service provided by Google, so that when you publish new content, or your content is updated, you can get Google to index it and associate it as original content based on first appearance.

Then, it would be up to Google to prioritize content originators over farmers.
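
Something close to the first half of this already exists: you can ping Google when a sitemap changes so new pages get picked up sooner. A rough sketch; treat the exact endpoint URL as my assumption and check the current Webmaster docs:

    # Notify Google that a sitemap has changed so new content gets
    # recrawled sooner. example.com and the endpoint are illustrative.
    from urllib.parse import quote
    from urllib.request import urlopen

    def ping_google(sitemap_url):
        ping = ("http://www.google.com/webmasters/tools/ping?sitemap="
                + quote(sitemap_url, safe=""))
        with urlopen(ping) as resp:
            return resp.status == 200  # 200 means the ping was accepted

    ping_google("http://example.com/sitemap.xml")

The "associate it as original based on first appearance" half is the part nobody offers.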


This already exists in a few flavors, and Google checks bigger sites more often than smaller ones.


I suspected the search results I was getting over the past few months were of lesser quality, but I thought it was just an aberration.


The current search situation, as described by posts like this one, seems analogous to the way search quality deteriorated during the emergence of blogging. Google stepped in and cleaned that up rather well. I'd trust them to do the same with whatever tricks the rehash sites are using.


Extrapolation is such a tiring business. Google is constantly changing and developing. How can you make generalizing comments about the future without knowing what they're working on?

For future reference, replace Google by pretty much anything.


There should be an attribute which you can add to HTML elements to state that this is the original content source. Then, if Google comes across a large website with a tonne of "original source" content that lots of other sites are also claiming "original source" for, it can automatically identify it as a scraper site and penalise it or flag it for manual checking. Something like this, but more extensible:

    <p original-source="true">
      This is some content which was generated on this website
    </p>


Why wouldn't everyone add that flag then?


Mike makes it clear why he thinks everyone wouldn't add the flag:

> If Google comes across a large website with a tonne of "original source" content that lots of other sites are also claiming "original source" for, it can automatically identify it as a scraper site and penalise it or flag it for manual checking.

(I disagree with him that this would work, but that's clearly his line of reasoning. The most obvious failure mode is if a site mirrors all of SO's content, but nothing else, and puts the flag on it, in which case the flag doesn't help you distinguish which site is the source.)


50 blogs all write original content, and all put "original source" attributes on their content. 1 website comes along and copies all of the content from these 50 websites.

Google knows where the original content came from and can rank accordingly.

Now the site which copied the content notices it is dropping down the list, so it lies and states that it is the original source instead. Google can automatically detect this, because the same content exists on lots of other sites with conflicting "original source" data. Google now de-lists the scraper site, or penalises it further.

Also, it becomes impossible for scraper sites to claim that what they're doing is OK and good for users if they're suddenly publishing lies that they are the original source for content they didn't write.
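
A toy version of that resolution step, assuming the crawler records when it first saw each piece of content; the fingerprints, site names, and the 50% threshold are all invented:

    # Resolve conflicting "original source" claims by first-crawl time:
    # whichever site the crawler saw publish the content first wins.
    first_seen = {}  # content fingerprint -> (site, crawl_timestamp)

    def record_claim(fingerprint, site, crawled_at):
        prior = first_seen.get(fingerprint)
        if prior is None or crawled_at < prior[1]:
            first_seen[fingerprint] = (site, crawled_at)

    def is_scraper(site, claimed_fingerprints):
        # A site whose claims mostly lose on first appearance gets
        # flagged for manual review rather than delisted outright.
        lost = sum(1 for f in claimed_fingerprints
                   if f in first_seen and first_seen[f][0] != site)
        return lost / max(len(claimed_fingerprints), 1) > 0.5

    record_claim("h1", "blog-a.example", crawled_at=100)
    record_claim("h1", "scraper.example", crawled_at=250)  # later copy loses
    print(is_scraper("scraper.example", ["h1"]))  # True
    print(is_scraper("blog-a.example", ["h1"]))   # False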


The scrapers are probably doing a lot of SEO. It's time for Stack Overflow to hire some SEO services. Wikipedia is not monetizing in any way other than donations, whereas Stack Overflow does display ads of its own, so why not hire someone to do SEO and stay on top?


That's why I use DDG with !so.
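
For anyone who hasn't seen it: DuckDuckGo's "bang" syntax hands the query to another site's own search, so something like

    !so iterate over a dictionary in python

goes straight to Stack Overflow's search and skips the scraper copies entirely.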


> spammers, scrapers, and SEO'ed-to-the-hilt content farms are winning

Spammers, scrapers: sure, they're a problem.

SEO'd sites: there is nothing wrong with optimizing your site for search engines. And a site that's optimized ought to win.



