>I often wonder how much of a head start the isolating nature of English gave fo...

bonoboTP · on Nov 20, 2019

> you assume there was a head start, when, in fact, the world's first commercial computer was German

Did the Z4 do a lot of German language text generation, or German language input parsing? In any case, German is not agglutinative either, though it does have complexities like gendered declension of articles and adjectives.

> And until recently, natural languages had a near-zero effect on computing.

Seems like we're talking past each other, and I packed multiple things into my comment. I meant user-facing messages there. I did some software internationalization (translation) work some years ago, and in many cases the format was just templates: you were expected to translate templates with pluggable strings. What you would actually need is a function that looks at the word you want to plug in, extracts the vowels, categorizes them with some branching logic, looks at the last consonant, decides whether a linking vowel is needed, picks the vowel harmony based on the vowels, checks whether the word is an exception, and then applies the suffix.
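To make that concrete, here is a rough Python sketch of such a function for the Hungarian "-val/-vel" ("with") suffix. The names (`with_instrumental`, `EXCEPTIONS`) are made up for illustration, the harmony rule is a deliberate simplification (any back vowel selects the back suffix), and the exception table is a tiny stand-in for what a real system would need. It is not how any particular product or library does it.

```python
# Simplified sketch of Hungarian "-val/-vel" suffixation. Assumptions:
# - harmony: a word containing any back vowel takes the back suffix "-val"
# - words that break this rule are listed by hand in EXCEPTIONS
# - stems already ending in a doubled consonant are not handled

BACK_VOWELS = set("aáoóuú")
FRONT_VOWELS = set("eéiíöőüű")
VOWELS = BACK_VOWELS | FRONT_VOWELS

# Multi-letter consonants, longest first; doubling repeats only the
# first letter in writing (sz -> ssz, zs -> zzs, gy -> ggy).
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")

# Hand-maintained exceptions (illustrative, far from complete).
EXCEPTIONS = {
    "józsef": "Józseffel",  # contains a back vowel but takes the front suffix
    "híd": "híddal",        # only a front vowel but takes the back suffix
}


def with_instrumental(word: str) -> str:
    """Return `word` with the "-val/-vel" suffix attached."""
    lower = word.lower()
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]

    # Vowel harmony: pick "a" (back) or "e" (front) for the suffix vowel.
    suffix_vowel = "a" if any(ch in BACK_VOWELS for ch in lower) else "e"

    last = lower[-1]
    if last in VOWELS:
        # After a vowel the "v" is kept; a final short a/e lengthens.
        stem = word
        if last == "a":
            stem = word[:-1] + "á"
        elif last == "e":
            stem = word[:-1] + "é"
        return stem + "v" + suffix_vowel + "l"

    # After a consonant the "v" assimilates, doubling the final consonant.
    for mg in MULTIGRAPHS:
        if lower.endswith(mg):
            return word[: -len(mg)] + word[-len(mg)] + word[-len(mg):] + suffix_vowel + "l"
    return word + word[-1] + suffix_vowel + "l"


for name in ["Péter", "Anna", "Ádám", "Balázs", "György", "József"]:
    print(name, "->", with_instrumental(name))
```

Even this toy version needs branching on vowels, the final consonant, and an exception list, none of which fits into a "%s" placeholder.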

In English you can generate the message "Added %s to the %s." These are usually translated to Hungarian as if it were "%s has been added to the following: %s". Or instead of "with %s" they must write "with the following: %s", because applying "with" to a word or personal name requires non-trivial logic. Whenever the translators resort to "... the following: %s", you can tell they weren't able to fit the word into the sentence with proper grammar, because the string-interpolation-based internationalization was too primitive.

Until recently, Facebook was not able to apply declension to people's names, as it is quite complicated. Normally "$person_name likes this post." would require putting $person_name into dative case, requiring determination of vowel harmony. To avoid it, they picked a rarer verb form which doesn't need the dative case but doesn't sound as natural. They've only transitioned to the dative case in the last year or so.

A lot of this stuff is just not even on the mind of English-speaking devs, because template-based string interpolation is a good enough solution in English for the vast majority of cases. The only exceptions that need a little bit of branching logic are choosing "a" or "an" before a word and pluralization, but these don't come up too often.

Again, my point was that dynamically generating user-facing messages and UI elements is so easy in English, while doing it properly in other languages is much harder.

> Would have? NLP has only started to matter recently, at a time when it has to work in all languages from the get-go. The current evolution includes contributions of people from many languages and cultures.

Most of the research outside of explicit machine translation research is still based on English. How many papers are out there, e.g., on visual question answering (VQA) systems in Polish or Finnish? In many cases I feel less impressed by such systems because I feel like English is too easy. The word order is very predictable, the words are easily separable, the whole thing is much more machine-processable. Maybe it isn't so; it would be interesting to see empirical results.

romwell · on Nov 20, 2019

Ah. On that note, I guess my point was that language was never an impediment to UI.

Sure, some things will be easier in English. In other languages, the programmers would just roll with whatever is easier to code; the users would gobble it up as long as it's usable.

Back in the '90s, I saw pirated software "internationalized" by running the UI keywords through machine translation into Russian. Knowing English was an advantage: if you translated the UI back into English, you could figure out what some of those things did. Still, it existed.

The complexity of language wasn't an impediment; it just lowered expectations for the quality of user interfaces.