Explanation for those unfamiliar with Russophone online culture: in the Russian-speaking internet, Russia's gopniks (approximate equivalent of England's chavs) have for a long time been occasionally referred to as orcs or goblins. In the context of the Russian/Ukrainian conflict, this has expanded to anti-Putinist opposition and pro-Ukrainian trolls labeling pro-Putin Russians as orcs, and Russia itself as Mordor, land of the orcs. (Examples of usage: http://spektr.press/chto-eto-s-nimi/ and http://grani.ru/opinion/portnikov/m.235880.html)
This also ties into an old Russian geek tradition of identifying fantasy-genre elves with America and the West (which is very obvious in some fictional universes; e.g. in Tolkien's works, Valinor geographically corresponds to North America), with the natural implication that orcs and goblins (the opponents of the elves) are Russians, and therefore basically good guys but tragically misunderstood by the world. See https://en.wikipedia.org/wiki/The_Last_Ringbearer for this viewpoint expanded into a novel.
So either Google's algorithms had identified a sufficient number of humorous references to Russia as Mordor, or some intrepid band of trolls decided to Google-bomb the algorithm into pushing this translation.
As a person native to Russophone online culture: this "explanation" is your fantasy.
It's true that Ukrainian comments on forums can call Russia Mordor, but it's not an extension of any previous background. And IMHO it doesn't require explanation. BTW, I have never heard about America = Valinor.
Every meme starts from somewhere. For example, the vatnik meme did not appear fully formed out of the vacuum like Venus from ocean foam; it was invented by a specific person, who was (according to interviews) influenced by SpongeBob SquarePants, hence the shape.
I don't know who invented the Russia/Mordor theme, but I can try to guess where, in the online cultural background that existed in 2013, that person could have found their inspiration.
> And IMHO it doesn't require explanation.
IMHO it does require explanation. Why Mordor and not Nilfgaard or Fire Nation? Why Sauron and not Voldemort or Tywin Lannister?
I wonder if these "algorithmic errors" are the result of people "maliciously" training Google Translate, since there's the option to correct and give a better translation. I wouldn't be surprised if it were programmed to automatically assume that a correction is good if multiple people correct the same thing, which would then lead to these gaffes as communities like 4chan figured it out and "exploited" it.
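If corrections really were accepted just because several people submitted the same one, the gameable part is easy to picture. A minimal sketch, assuming a naive agreement threshold; the function name and cutoff are invented for illustration, not anything Google has documented:

    # Hypothetical sketch of crowd-sourced correction handling. The threshold
    # and function are invented; this is not Google's actual logic.
    from collections import Counter

    MIN_AGREEING_USERS = 50  # made-up cutoff

    def accept_correction(suggestions):
        """suggestions: list of (user_id, corrected_text) for one source phrase."""
        votes = Counter(text for _, text in suggestions)
        best_text, _ = votes.most_common(1)[0]
        distinct_users = len({uid for uid, text in suggestions if text == best_text})
        # A coordinated raid of throwaway accounts passes this check just as
        # easily as genuinely independent users would.
        return best_text if distinct_users >= MIN_AGREEING_USERS else None

Anything that simple would be trivial for a community to game once it noticed, which is exactly the scenario described above.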
That was my first thought, but the article has this paragraph near the bottom:
"Although translations are managed automatically, it is possible for users to suggest alternative translations manually.
However, the BBC understands that this was not how the errors were introduced."
As I understand it, they're doing statistical translation: they compare versions of the same text in multiple languages and use them to automatically build a model of what things translate to. So if Ukrainians are using the word "Mordor" instead of "Russia" when translating Russian text, then as far as Google Translate is concerned, that's the correct translation of Russia.
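A toy version of that counting makes the failure mode concrete: once the joke rendering outnumbers the straight one in whatever parallel text the system sees, the joke simply wins on frequency. This is a deliberately simplified sketch with invented data and word-level counts only, nothing like the real pipeline:

    # Toy "statistical translation": count which target word co-occurs with a
    # source word across parallel pairs, then pick the most frequent one.
    from collections import Counter, defaultdict

    parallel_pairs = [
        ("Россия", "Russia"),
        ("Россия", "Russia"),
        ("Россия", "Mordor"),  # humorous/troll renderings in the corpus
        ("Россия", "Mordor"),
        ("Россия", "Mordor"),
    ]

    counts = defaultdict(Counter)
    for source, target in parallel_pairs:
        counts[source][target] += 1

    def translate(word):
        return counts[word].most_common(1)[0][0]

    print(translate("Россия"))  # -> "Mordor" once the joke outweighs the real data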
No, seriously, I've seen many cases like that with the BBC, like when they conducted an interview about the political situation with a person pretending to have telepathic abilities.
In this particular case, Google Translate learns from texts that exist in both English and Russian (for example, newspapers which are published in two languages, etc.).
I doubt there are newspapers or other texts where a Russian article saying "Russia" is translated into English with "Mordor".
It's much more likely a kind of flashmob by Ukrainian users who submitted manual corrections to Google Translate.
4chan also exploited reCAPTCHA back when it had just been bought by Google and was being used to OCR books. reCAPTCHA presents you with two words, one of which is much easier to recognize than the other. So 4chan users created a simple website with lots of reCAPTCHAs, typed the easy word correctly to pass the test, and typed a racial slur instead of the second word. Because many users raided at the same time, Google ended up OCRing the hard-to-recognize words as that slur.
I always do the same when I see a reCAPTCHA or other captcha from Google: type the known word properly, and for the other word type something slightly off that looks plausible to naive OCR but is still wrong.
On the ideological side, as long as Google profits from the work of users thanks to its monopoly but doesn't provide the result as an open dataset under a noncommercial license, I don't wish to support reCAPTCHA.
The word you type is not going to be accepted just because it is similar to what it should be. It is double-checked multiple times anyway [0]. People like you just make the OCRing slower, just as if you had never typed the reCAPTCHA at all.
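Roughly the kind of agreement check that has been described for reCAPTCHA's digitization side, reduced to a sketch; the threshold and data shapes here are my guesses, not the real pipeline:

    # Sketch: an unknown-word answer only counts if the control word was typed
    # correctly, and a transcription is only accepted once enough independent
    # answers agree. The threshold is invented.
    from collections import Counter

    def record_answer(control_answer, control_truth, unknown_answer, unknown_votes,
                      required_agreement=3):
        if control_answer.strip().lower() != control_truth.lower():
            return None  # failed the known word, so the unknown answer is ignored
        unknown_votes[unknown_answer.strip().lower()] += 1
        word, count = unknown_votes.most_common(1)[0]
        return word if count >= required_agreement else None

    votes = Counter()
    for answer in ["morning", "morning", "mourning", "morning"]:
        accepted = record_answer("overlooks", "overlooks", answer, votes)
    print(accepted)  # -> "morning" once three answers agree

Which also means a raid only sticks if it is big enough to outvote everyone else who gets shown the same scanned word.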
Why the quotation marks? If you were working on Google Translate, then dealing with this sort of thing would be on your TODO list as an exploit, not an "exploit". As a user of Google Translate it's not helpful, either (that a user thinks that Nazi Germany and Soviet Russia are both evil states does not mean that the user is helped by reading about a war of Mordor against Mordor). It's funny and all, but it's definitely an exploit.
I study Norwegian, and I've seen some odd things like that in Google Translate. For instance, one time I translated a sentence containing the name of a prominent Norwegian university, and it got translated to Princeton in English. I'm inclined to think this wasn't malicious or a joke, but simply a deficiency in the automation. I've read a little about how Google Translate's automated system works, but not enough to feel confident suggesting how or why these kinds of errors happen.
My favorite example (since fixed) was the word "amistad" translated from Spanish to English. It translated to "friendship," of course. You could add exclamation marks to it, like "amistad!" and it would translate to "friendship!" You could add more, and it would add more. But if you translated "amistad!!!!!" with precisely five exclamation marks, not four nor six but exactly five, it would instead translate to "murder!"
This sort of thing was mentioned in an early talk given by someone at Google Translate. (Sorry, I have neither the link nor the name of the person handy.)
As it makes use of parallel corpora, there were examples of documents mentioning (say) a famous person in the United States, but when the document was manually translated into French, the human translator selected a different famous person more well-known in France. It made perfect sense in the context of the document, but it threw a wrench into the automated translation based on those documents.
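A crude co-occurrence "alignment" shows why that substitution becomes a bogus dictionary entry: the two names occupy the same slot in otherwise matching sentences, so they pair up just as cleanly as real translations do. A toy sketch with invented sentences, using universities to echo the Princeton anecdote above:

    # Toy alignment: count which target-language words appear in translations of
    # sentences containing a given source word. The sentences are invented.
    from collections import Counter

    pairs = [
        ("students at Princeton published the study",
         "des étudiants de la Sorbonne ont publié l'étude"),
        ("the committee praised Princeton for its research",
         "le comité a félicité la Sorbonne pour ses recherches"),
    ]

    cooc = Counter()
    for en, fr in pairs:
        if "Princeton" in en.split():
            cooc.update(fr.split())

    # After common function words (which co-occur with everything) are discounted,
    # "Sorbonne" is what's left lining up with "Princeton"; a frequency-based
    # aligner has no way to know the translator swapped in a different university.
    print(cooc.most_common(3))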
Both this and the article remind me of a strange Google Translate problem where the translation of "Austria" varied, depending on how many exclamation marks it was followed by.
I would hazard a wild guess that it tries to replace phrases with ones of equivalent meaning, so that the gist of a sentence is translated.
Things like "raining cats and dogs" probably don't translate well literally, for instance. It probably failed to categorise these universities as things that should not be translated in this manner.
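That guess can be made concrete. If lookup is greedy, longest known phrase first, then an idiom gets a sensible non-literal mapping, but any entry that sneaks into the table gets applied with exactly the same confidence, whether it's a university name or "amistad" plus five exclamation marks. A hypothetical Spanish-to-English phrase table and matcher, purely for illustration:

    # Hypothetical phrase table with greedy longest-match lookup. Entries are
    # invented; this only illustrates why idiom handling and junk entries are
    # two sides of the same mechanism.
    phrase_table = {
        ("llueve", "a", "cántaros"): "it's raining cats and dogs",
        ("amistad", "!", "!", "!", "!", "!"): "murder !",  # junk entry learned from bad data
        ("amistad",): "friendship",
    }

    def translate(tokens):
        out, i = [], 0
        while i < len(tokens):
            for length in range(len(tokens) - i, 0, -1):  # longest phrase first
                chunk = tuple(tokens[i:i + length])
                if chunk in phrase_table:
                    out.append(phrase_table[chunk])
                    i += length
                    break
            else:
                out.append(tokens[i])  # unknown tokens pass through unchanged
                i += 1
        return " ".join(out)

    print(translate("llueve a cántaros".split()))   # idiom handled non-literally
    print(translate("amistad ! ! ! ! !".split()))   # junk entry outranks "friendship"

With fewer exclamation marks the long match fails and you fall back to "friendship" plus the leftover punctuation, the same kind of knife-edge behaviour described a few comments up.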
Tangentially reminds me of the recent Ludum Dare "best innovation" winning game entry that uses Google auto-correction (not translation):
Infinity Monkey Autocorrect
"This funk-filled game explores what would happen to the Infinity Monkey/Typewriter Theorem if it had a commercially-biased autocorrect. The game submits to a growing body of monkey-submitted literature."
This is superb! I've often wondered how autocorrect would finish some of my messages on my phone. It gives some really weird suggestions after certain words; perhaps it's worth investigating further.
The way Google Translate works, which is actually explained pretty nicely by the Google statement quoted in this article, makes this a non-story in my opinion. Whether by an intentional attack (somehow flooding the Google corpus [the whole indexed web in a given language?] with biased texts) or a statistical mishap, this is ultimately fairly uninteresting once you realize that no, no one at Google "snuck" this in there.
In the attack scenario, if there were details as to how someone pulled off a Google-bomb style attack, that would be kind of fun and interesting. Otherwise, there's not much to say.
On the other hand, asking which specific (US) agency plays the exact same role as the Ministry of Truth in the novel is kind of pointless. The short answer is none, but that doesn't mean the same kind of propaganda is not happening in a more nuanced/subtle form nowadays.
Lorem Ipsum is a passage quoted from a Latin book "De finibus bonorum et malorum" and has been used by typesetters for seventy or eighty years. So the fact it can be translated shouldn't be a surprise.
Lorem Ipsum is based on that passage, but has been mangled significantly in the adaptation. "Lorem" is half of the word "dolorem", for instance. So it can't actually be translated in any real sense.
Instead, if you try translating that passage from Latin, Google Translate goes nuts. Since "Lorem Ipsum" is used as filler text in a bunch of web sites, it finds apparently parallel texts all over the place and ends up cobbling together a "translation" out of random phrases. IIRC, "lorem ipsum" alone used to translate to something like "click here"? (It's passed through unchanged now.)
This is bizarre for me, since I was messing around with the XKCD-substitutions plugin a few weeks ago, and replacing "Russia" with "Mordor" was one of the changes I made.
Apparently the BBC use exact-quotes instead of scare-quotes. More details in this previous thread about an article in The Guardian: https://news.ycombinator.com/item?id=6446811
Wild guess: It probably took 'Mir', meaning world, kingdom, etc. in Russian, and made a connection to "Mordor", or perhaps "Mordor" is even spelled as "Mirdor" in Russian.
Mir can mean numerous things. And how is it not how Google Translate works? Do you have inside information? Because the only information we have is that it scours the web, makes connections, and forms a graph. You can feed it some "corrective" data as well.
But you can simply bicycle out (as the stream of Syrian refugees entering Norway from Russia illustrates). Ergo, you cannot simply walk out of Russia (to Norway) either.