Google translated Russia to 'Mordor' in 'automated' error (bbc.co.uk)
245 points by dan1234 on Jan 7, 2016 | 68 comments


Explanation for those unfamiliar with Russophone online culture: in the Russian-speaking internet, Russia's gopniks (approximate equivalent of England's chavs) have for a long time been occasionally referred to as orcs or goblins. In the context of the Russian/Ukrainian conflict, this has expanded to anti-Putinist opposition and pro-Ukrainian trolls labeling pro-Putin Russians as orcs, and Russia itself as Mordor, land of the orcs. (Examples of usage: http://spektr.press/chto-eto-s-nimi/ http://grani.ru/opinion/portnikov/m.235880.html)

This also ties into an old Russian geek tradition of identifying fantasy-genre elves with America and the West (which is very obvious in some fictional universes; e.g. in Tolkien's works, Valinor geographically corresponds to North America), with the natural implication that orcs and goblins (the opponents of the elves) are Russians, and therefore basically good guys but tragically misunderstood by the world. See https://en.wikipedia.org/wiki/The_Last_Ringbearer for this viewpoint expanded into a novel.

So either Google's algorithms had identified a sufficient number of humorous references to Russia as Mordor, or some intrepid band of trolls decided to Google-bomb the algorithm to push this translation.


As a person native to russophone online culture: this "explanation" is your fantasy.

It's true that Ukrainian comments in forums can call Russia Mordor, but it's not an extension of any previous background. And IMHO it doesn't require explanation. BTW, I have never heard of America = Valinor.


> this "explanation" is your fantasy.

My educated guess :)

Every meme starts from somewhere. For example, the vatnik meme did not appear fully formed from the vacuum like Venus from ocean foam; it was invented by a specific person, who was (according to interviews) influenced by Spongebob Squarepants - hence the shape.

I don't know who invented the Russia/Mordor theme, but I can try to guess where in the online cultural background that existed in 2013 that person could have found their inspiration.

> And IMHO it doesn't require explanation.

IMHO it does require explanation. Why Mordor and not Nilfgaard or Fire Nation? Why Sauron and not Voldemort or Tywin Lannister?


I have never heard a Ukrainian call Russia Mordor, nor a Russian or a Ukrainian call the USA Valinor.

Hell, most Russian speakers probably wouldn't know what Valinor means in the first place.

Yeah, some people do call it that, but they are marginal. I agree with you, this is a specific attack against Google's algorithm.


Valinor I have not heard about, but calling Russia Mordor is quite popular on Ukrainian anti-Russian forums.


We are not talking about most Russian speakers, but an Internet subculture of Russian speakers.


Just because you are not familiar with the subculture that generated a pop term doesn't mean the etymology is wrong.


No, it's just that you are unfamiliar with the offline culture that really gave birth to this metaphor.


I'm not sure about your explanation, but either way the fact is that Russia is somewhat frequently compared to Mordor.

This also reminded me that there was some drama in late 2014 about lighting the Eye of Sauron above a skyscraper in Moscow (http://www.bbc.com/news/blogs-news-from-elsewhere-30414032) ;)


> pro-Ukrainian trolls

Not only trolls; ordinary Ukrainians have also used "Mordor" instead of "Russia" quite often in the last few years.


Fantastic—this should've been in the article!


I wonder if these "algorithmic errors" are the result of people "maliciously" training Google Translate, since there's the option to correct and give a better translation. I wouldn't be surprised if it were programmed to automatically assume that a correction is good if multiple people correct the same thing, which would then lead to these gaffes as communities like 4chan figured it out and "exploited" it.
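
To make the guess above concrete, here is a minimal sketch of a naive "accept a correction once enough people submit it" rule. Everything in it is hypothetical: the `CorrectionStore` class, the threshold of 3, and the transliterated sample words are illustrative assumptions, not how Google Translate actually handles suggestions.

```python
from collections import Counter, defaultdict


class CorrectionStore:
    """Toy model of a translator that trusts repeated user corrections."""

    def __init__(self, base, threshold=3):
        self.base = dict(base)          # machine-generated translations
        self.threshold = threshold      # votes needed to override (assumed value)
        self.votes = defaultdict(Counter)
        self.voters = defaultdict(set)  # users already counted per source phrase

    def suggest(self, user, source, correction):
        # Count at most one vote per user per source phrase.
        if user in self.voters[source]:
            return
        self.voters[source].add(user)
        self.votes[source][correction] += 1

    def translate(self, source):
        # Use the most popular correction once it reaches the threshold.
        if self.votes[source]:
            best, count = self.votes[source].most_common(1)[0]
            if count >= self.threshold:
                return best
        return self.base.get(source, source)


if __name__ == "__main__":
    store = CorrectionStore({"Rosiia": "Rossiya"})
    for user in ("u1", "u2", "u3"):
        store.suggest(user, "Rosiia", "Mordor")
    print(store.translate("Rosiia"))
```

With a rule like this, three coordinated accounts are enough to flip the output, which is the kind of weakness a raid could exploit if the real system were this simple.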


That was my first thought, but the article has this paragraph near the bottom:

  "Although translations are managed automatically, it is possible for users to suggest alternative translations manually.
  However, the BBC understands that this was not how the errors were introduced."


Substituting "Mordor" for "Russia" is oddly specific if this algorithm issue arose without direct human input, though.


As I understand it, they're doing statistical translation: they compare versions of the same text in multiple languages and use them to automatically create a model of what things translate to. So if Ukrainians are using the word "Mordor" instead of Russia when translating Russian text, then as far as Google Translate is concerned that's the correct translation of Russia.
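
A minimal sketch of that idea, assuming word-level alignments have already been extracted from parallel texts (a big simplification of real statistical machine translation). The function names and the tiny transliterated corpus are hypothetical and only illustrate how a skewed corpus can win the vote.

```python
from collections import Counter, defaultdict


def train(aligned_pairs):
    """Build a source-word -> Counter of target words from aligned pairs."""
    table = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        table[source_word][target_word] += 1
    return table


def translate_word(table, source_word):
    """Return the most frequently observed target, or the word itself."""
    if source_word not in table:
        return source_word
    return table[source_word].most_common(1)[0][0]


# Hypothetical corpus: two conventional pairs, three sarcastic ones.
corpus = [("Rosiia", "Rossiya")] * 2 + [("Rosiia", "Mordor")] * 3
model = train(corpus)
print(translate_word(model, "Rosiia"))  # prints "Mordor"
```

Nothing in such a model knows what a country is; it simply reports whichever pairing it saw most often.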


Both ways are "automatic". One gets its source from thousands of translations, the other from thousands of user corrections.

You can only say something isn't automatic if someone with authority over the system executed the manual step.


[flagged]


No, seriously, I have seen many such cases with the BBC, like when they conducted an interview about the political situation with a person pretending to have telepathic abilities.

In this particular case, Google Translate learns from texts existing in both English and Russian (for example, newspapers which are published in two languages, etc.).

I doubt there are newspapers or other texts where a Russian article saying "Russia" is translated into English as "Mordor".

It's much more likely a kind of flash mob by Ukrainian users who submitted manual corrections to Google Translate.


BBC reporter breaks 'world's safest drone' http://www.bbc.com/news/technology-35240062


FYI in this case the error was in Ukrainian-to-Russian translation, not Russian-to-English.


Haha


4chan also exploited reCAPTCHA when it was just bought by google and they used it to OCR books. reCAPTCHA provides you with two words, and one is much easier to recognize than the other. So 4chan users created a simple website with lots of reCAPTCHAs, typed in the easy word correctly to pass the test and typed in "nigger" instead of the second word. As many users raided at the same time, Google OCRed hard to recognize words as "nigger".

Edit: found related link https://www.reddit.com/r/pics/comments/cygfx/4chan_is_using_... https://i.imgur.com/oCa4d.png


I always do the same when seeing a reCAPTCHA or other captcha from Google: Type the correct word properly, and a slightly off word for the other word, which seems plausible to naive OCR, but still is wrong.

From the ideological side, as long as Google profits from the work of users due to their monopoly, but doesn’t provide it as open dataset under a noncommercial license, I don’t wish to support reCAPTCHA.


The word you type is not going to be accepted just because it is similar to what it should be. It is double-checked multiple times anyway [0]. People like you just make OCRing slower, just as if you never typed reCAPTCHA at all.

[0] https://www.reddit.com/r/pics/comments/cygfx/4chan_is_using_...
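
A rough sketch of the two-word scheme described in this subthread, under stated assumptions: the control word alone decides whether the user passes, and a transcription of the unknown word is accepted only once enough independent guesses agree. The function names and the `min_votes` and `min_share` values are hypothetical, not reCAPTCHA's real parameters.

```python
from collections import Counter


def passes_captcha(control_answer, typed_control):
    """Only the known (control) word decides whether the user passes."""
    return control_answer.strip().lower() == typed_control.strip().lower()


def accepted_transcription(guesses, min_votes=3, min_share=0.75):
    """Return a transcription for the unknown word, or None if no consensus.

    min_votes and min_share are assumed values for illustration.
    """
    normalized = [g.strip().lower() for g in guesses if g.strip()]
    if not normalized:
        return None
    word, count = Counter(normalized).most_common(1)[0]
    if count >= min_votes and count / len(normalized) >= min_share:
        return word
    return None
```

Under a rule like this, a single slightly-off guess only delays consensus, while many users submitting the same wrong word at once could still push it over the threshold.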


That’s exactly the goal. To not give Google work for free unless they also share it with others.

If Google wants actually meaningful manual OCR from me, they can have it – under GPL license, of course.


That seems to be his goal. To not help.


Indeed. Google constantly takes work from volunteers, modifies it, keeps the modification under closed source, and heavily profits from them.

Helping a company that uses such a business model is not in my interest, as it means less volunteer power is available for actually open projects.


Pro tip: the other word can be left out altogether.

I used to type aa before noticing that and this also worked like a charm.


Why the quotation marks? If you were working on Google Translate, then dealing with this sort of thing would be on your TODO list as an exploit, not an "exploit." As a user of Google Translate it's not helpful, either (that a user thinks that Nazi Germany and Soviet Russia are both evil states does not mean that the user is helped by reading about a war of Mordor against Mordor.) It's funny and all, but it's definitely an exploit.


People have been doing "google bombs" like this for about a decade, haven't they?


Yes. I believe that this has happened on more than one occasion before. Maybe 4chan could be behind this.


I study Norwegian, and I've seen some odd things like that in Google Translate. For instance, one time I translated a sentence containing a prominent Norwegian university that got translated to Princeton in English. I'm inclined to think that this wasn't malicious or a joke, but simply a deficiency in the automation. I've read a little bit about how Google Translate's automated stuff works, but not enough to feel confident to suggest how or why these kinds of errors happen.


My favorite example (since fixed) was the word "amistad" translated from Spanish to English. It translated to "friendship," of course. You could add exclamation marks to it, like "amistad!" and it would translate to "friendship!" You could add more, and it would add more. But if you translated "amistad!!!!!" with precisely five exclamation marks, not four nor six but exactly five, it would instead translate to "murder!"


This sort of thing was mentioned in an early talk given by someone at Google Translate. (Sorry, I have neither the link nor the name of the person handy.)

As it makes use of parallel corpora, there were examples of documents mentioning (say) a famous person in the United States, but when the document was manually translated into French, the human translator selected a different famous person more well-known in France. It made perfect sense in the context of the document, but it threw a wrench into the automated translation based on those documents.


Both this and the article remind me of a strange Google Translate problem where the translation of "Austria" varied, depending on how many exclamation marks it was followed by:

http://itre.cis.upenn.edu/~myl/languagelog/archives/005485.h...


I would hazard a wild guess that it tries to replace phrases with ones of equivalent meaning, so that the gist of a sentence is translated.

Things like "raining cats and dogs" probably don't translate well literally, for instance. It probably failed to categorise these universities as things that should not be translated in this manner.


I've seen this with measurements and currency.

Danish "... 15 meters ... 100kr" becomes "... 15 feet ... $100" which isn't helpful. (45 feet would not be helpful either.)


Tangentially reminds me of the recent Ludum Dare "best innovation" winning game entry that uses Google auto-correction (not translation):

Infinity Monkey Autocorrect

"This funk-filled game explores what would happen to the Infinity Monkey/Typewriter Theorem if it had a commercially-biased autocorrect. The game submits to a growing body of monkey-submitted literature."

http://ludumdare.com/compo/ludum-dare-34/?action=preview&uid...

View the growing body of literature submitted through the game by monkeys (players): http://ld34.idumpling.com/manuscripts.txt

Extract:

"the difference, quickbooks online baby eats frog laid. the university favoured.. Dkdkbi ignored.. B bet but, the quotes"

Fantastic :)


This is superb! I've often wondered how autocorrect would finish some of my messages on my phone. It gives some really weird suggestions after certain words; perhaps it's worth investigating further.


Related algorithmic gaffe, "Google Mistakenly Tags Black People as ‘Gorillas,’": http://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tag...


The way Google Translate works, which is actually explained pretty nicely by the Google statement quoted in this article, makes this a non-story in my opinion. Whether by an intentional attack (somehow flooding the Google corpus [the whole indexed web in a given language?] with biased texts) or a statistical mishap, this is ultimately fairly uninteresting when you realize that: no, no one at Google "snuck" this in there.

In the attack scenario, if there were details as to how someone pulled off a Google-bomb style attack, that would be kind of fun and interesting. Otherwise, there's not much to say.


Looks like it's too late for me to edit parent comment, but see tetromino_'s comment for cool details behind why this may have happened.

https://news.ycombinator.com/item?id=10858904


How likely is it that the Ministry of Truth has asked Google to make a "mistake"?


Who do you mean by "the Ministry of Truth"?


If you truly did not get it, go read 1984, now.

On the other hand, asking which specific (US) agency plays the exact same role as the Ministry of Truth in the novel is kind of pointless. The short answer is none, but that doesn't mean the same kind of propaganda is not happening in a more nuanced/subtle form nowadays.



What the hell? How...


Ukraine is awesome!

They also had Darth Vader running for mayor.

'He joins Peking Duck and a man calling himself Putin in the leadership race.'

http://www.euronews.com/2015/10/23/ukraine-darth-vader-runs-...


Mordo(r) territory is pictured on old maps to the north-east of the Mordva people. Rus' captured it and then Russia emerged there, so Google is right (a bit).


Also related, Google Translate can "learn" and "translate" lorem ipsum: http://tinyurl.com/hue43an

Not sure if this is a bug or a feature.


Lorem Ipsum is a passage quoted from a Latin book "De finibus bonorum et malorum" and has been used by typesetters for seventy or eighty years. So the fact it can be translated shouldn't be a surprise.


Lorem Ipsum is based on that passage, but has been mangled significantly in the adaptation. "Lorem" is half of the word "dolorem", for instance. So it can't actually be translated in any real sense.

Instead, if you try translating that passage from Latin, Google Translate goes nuts. Since "Lorem Ipsum" is used as filler text in a bunch of web sites, it finds apparently parallel texts all over the place and ends up cobbling together a "translation" out of random phrases. IIRC, "lorem ipsum" alone used to translate to something like "click here"? (It's passed through unchanged now.)


Might just be a lucky overlap between "lorem ipsum" and the Latin texts it is inspired by.


The article's picture of Sergey Lavrov and its caption are comical and clever.


This is bizarre to me, since I was messing around with the XKCD-substitutions plugin a few weeks ago, and replacing "Russia" with "Mordor" was one of the changes that I made.

https://chrome.google.com/webstore/detail/xkcd-substitutions...


Funny. It reminds me of the time I saw a picture of the Kremlin and thought "Mordor meets Candyland."


Is there a Ukrainian meme "One does not simply walk into Russia"?


I think this error is related to issues with the "suggest translation" feature.


And can you guess whom they translated as Sauron or Morgoth?

And it's not like it's the first time ;) http://www.kulichki.com/tolkien/podshivka/970121.htm


Deservingly.


The quotation marks around 'automated' made me think this was an intentional error made by Google.


Apparently the BBC use exact-quotes instead of scare-quotes. More details in this previous thread about an article in The Guardian: https://news.ycombinator.com/item?id=6446811


This was likely an issue of machine learning. The machine learning picked up on patterns people were using.

Note, I have no inside knowledge. But such is the usefulness of machine learning for things like this.


Wild guess: It probably took 'Mir', meaning world, kingdom, etc. in Russian, and made a connection to "Mordor", or perhaps "Mordor" is even spelled as "Mirdor" in Russian.


In modern usage, Mir means world, not kingdom, and Mordor is not spelled Mirdor. Besides, that's not how Google's translation system works.


Mir can mean numerous things. And how is it not how Google Translate works? Do you have inside information? Because the only information we have is that it scours the web, makes connections, and forms a graph. You can feed it some "corrective" data as well.


OK, then I suppose that you can't simply walk into Russia... OK, sorry for the bad joke.


But you can simply bicycle out (as the stream of Syrian refugees entering Norway from Russia illustrates). Ergo, you cannot simply walk out of Russia (to Norway) either.


This was happening at the Finnish border, too, but just recently Finnish authorities banned crossing it with a bicycle, officially for safety reasons.



