This gives me nightmares about the time I was hired on contract to update a bunch of Word 95 macros that managed the ISO 9001 quality handbook for an international company. Thing is, Word 95 could (or always did?) localize the BASIC keywords, at least for some countries (no idea when or if that stopped - I have thankfully not needed that particular skill again), and it stored the scripts tokenized. That was usually not a big problem - few people transferred scripts between localized versions.
Except these macros had been transferred from the Danish version of Word to the Norwegian version. A reasonable percentage of the code had been moved over in a way that made Word treat it as text instead of tokenized BASIC, so the translation never happened. As a result, they had a ton of Danish keywords in the Norwegian codebase.
Yay. Now, it isn't the worst combination - the two languages are very similar. On the other hand, this meant that a lot of things looked plausible to me as Norwegian yet still broke horribly, because the two versions had sometimes chosen different words that were both valid in either language...
Bad idea. It's bad for collaboration, and it looks comical (especially when you have to use words like "HTTP" or "POST" next to local-language words). In Russia we already have a good example of software written in a "local language": 1C.
For developers in Russia it's the reference example of a code smell.
Yep. And programming languages, while they use English keywords, are not English. They are languages in their own right, and "translating" words is not a good idea, because that's creating a new language.
Still, a language monoculture is more convenient for most programmers. As an English-speaking programmer, I'd hate to have to maintain, say, Chinese code.
I think the language constructs/keywords are the "easy" part. But how do you translate identifiers, comments and other "textual" content automatically?
Nice work! I could certainly see how it could lower the barrier to entry for non-English speakers learning to program. My brother-in-law, for example, has been having a hard time with English but could probably pick up JavaScript faster if he could use something like this.
Something else that would be useful for him is a simplified translation dictionary that contained only jQuery functions. Something like http://www.jsrosettastone.com/ but for other spoken languages.
The use case would be quick reference: he sees node.append(...) somewhere, doesn't know what "append" means, and quickly looks it up.
Edit: it's easy to find ways in which something won't work. In the spirit of the recent discussions regarding being positive on HN, let's look for ways things like this can be beneficial.
This should really be an editor feature: something that automatically replaces localized words with English words and later shows the localized version when the English version is moused over (or something like that).
Writing the entire codebase with localized words hinders collaboration and code portability.
But an editor feature that assists with the (human) language would keep the localization confined to a single user's editor rather than spreading it through the entire codebase. Additionally, it could display the English version as well, which could help the non-native user learn by osmosis.
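A rough sketch of the mouse-over half of that idea, assuming the file on disk keeps English keywords and the editor holds a per-user hint table. The Norwegian table and the helper names (wordAt, hoverHint) are hypothetical, chosen only for illustration:

```javascript
// Hypothetical per-user hint table: English keyword -> Norwegian hint.
// The source file itself only ever contains the English keywords.
const hints = new Map([
  ["function", "funksjon"],
  ["return", "returner"],
  ["while", "mens"],
]);

// A "word" character: Unicode letter, combining mark, digit, "_" or "$".
const isWordChar = (ch) => /[\p{L}\p{M}\p{N}_$]/u.test(ch);

// Return the word containing position `offset` in `text`,
// or null if the offset is not inside a word.
function wordAt(text, offset) {
  if (offset < 0 || offset >= text.length || !isWordChar(text[offset])) {
    return null;
  }
  let start = offset;
  let end = offset;
  while (start > 0 && isWordChar(text[start - 1])) start--;
  while (end < text.length && isWordChar(text[end])) end++;
  return text.slice(start, end);
}

// Tooltip text for a hover at `offset`, or null when there is no hint.
function hoverHint(text, offset, table) {
  const word = wordAt(text, offset);
  return word !== null && table.has(word) ? `${word}: ${table.get(word)}` : null;
}

const line = "while (n > 0) n--;";
console.log(hoverHint(line, 2, hints)); // while: mens
console.log(hoverHint(line, 7, hints)); // null
```

Because the translation lives only in the editor's lookup table, nothing localized is ever written into the shared code.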
I think this could actually be very interesting, especially for teaching children programming in non-English-speaking countries. Wouldn't it be cool if Khan Academy adopted this for its computer science material so teachers could use it in elementary school? Any thoughts about making this require-compliant?
For what it's worth, you can make the Arabic translation "more native" by adding a "dir=rtl" attribute to the text area, though it's still not quite right. Or you can leave it in LTR mode and put an LRM (in JS, "\u200e") after each word to make it look just like the English with all the words swapped out.
Both point at the underlying problem: translation is not just a matter of swapping out words. For example, the mathematical notation that most programming languages rely on is fundamentally left-to-right (in much the same way that the keywords are English).
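A minimal sketch of the LRM trick, assuming a "word" is a run of Unicode letters, combining marks, digits, "_" or "$" (the helper names addLrmAfterWords and stripLrm are hypothetical). U+200E is a strong left-to-right character, so one placed after each word breaks a run of Arabic words into separate runs, and in an LTR text area they then appear in the same order as the English tokens they replace. The marks are for display only: U+200E is not whitespace in JavaScript source, so they would have to be stripped before the text is evaluated.

```javascript
// U+200E LEFT-TO-RIGHT MARK, written as an escape so it is visible here.
const LRM = "\u200e";

// Append an LRM after every word. Combining marks (\p{M}) are included so
// Arabic diacritics do not split a word in two.
function addLrmAfterWords(text) {
  return text.replace(/[\p{L}\p{M}\p{N}_$]+/gu, (word) => word + LRM);
}

// Remove the marks again before handing the text to a JavaScript parser.
function stripLrm(text) {
  return text.replace(/\u200e/g, "");
}

const sample = "if (x) return x";
const marked = addLrmAfterWords(sample);
console.log(marked.length - sample.length); // 4 (after "if", "x", "return", "x")
```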
Those are actually part of the "future reserved words" list from an older version of the ECMAScript spec. Safari 1 used to disallow a few of them, but they never ended up being used, and they have since been removed in more recent versions.
I am not a native English speaker myself but I've always felt that translations into my language of just about anything (computer lingo, science etc) made difficult concepts harder to understand, not easier.
Search engines could be localization-aware. Google already makes quite complex transformations to the queries, I don't think this would be insurmountable.
My understanding is that this is all about reserved words: it maps the non-English translations back to the (English) JavaScript reserved words.
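A minimal sketch of that reverse mapping, assuming a small Spanish keyword table (hypothetical, not js-i18n's actual dictionary) and treating a "word" as a run of Unicode letters, combining marks, digits, "_" and "$". Only whole words found in the table are replaced; everything else passes through unchanged:

```javascript
// Hypothetical Spanish -> JavaScript keyword table, for illustration only.
const toEnglish = new Map([
  ["función", "function"],
  ["si", "if"],
  ["sino", "else"],
  ["devolver", "return"],
  ["mientras", "while"],
]);

// Replace each whole word that is a translated keyword with its English
// counterpart. Limitation of this naive sketch: it does not skip string
// literals or comments, so a "si" inside a string is rewritten too.
function translateTokens(source, table) {
  return source.replace(/[\p{L}_$][\p{L}\p{M}\p{N}_$]*/gu, (word) =>
    table.has(word) ? table.get(word) : word
  );
}

const spanish = "función signo(x) { si (x < 0) { devolver -1; } sino { devolver 1; } }";
console.log(translateTokens(spanish, toEnglish));
// function signo(x) { if (x < 0) { return -1; } else { return 1; } }
```

Matching whole words rather than substrings is what keeps "sino" from being read as "si" followed by "no". A real implementation would run this over a proper tokenizer's output so strings and comments are left alone.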
I did a quick Finnish translation as a lark, then found the BabylScript project and added their translations to the page.
The main difference between the approaches is that BabylScript does things more the Right Way (it handles punctuation, number parsing, and localised object property names), whereas js-i18n only maps translated tokens to JavaScript tokens. Token mapping is easy to implement and has no effect on the generated code (though you need to watch out for namespace collisions, as different words in language X may map to one word in language Y). From my reading of their paper, BabylScript needs some runtime dispatching to handle translated property names.
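To illustrate the collision pitfall: a token mapper needs a reverse table (translated word to English keyword), and that table is only well defined if no two English keywords share a translation. A minimal check, assuming a hypothetical forward table in which both "let" and "const" are rendered as the Spanish "sea" (buildReverseTable is an illustrative name, not part of js-i18n):

```javascript
// Build the reverse (translated -> English) table a token mapper needs,
// and refuse if two English keywords share the same translation.
function buildReverseTable(forward) {
  const reverse = new Map();
  const collisions = [];
  for (const [english, translated] of Object.entries(forward)) {
    if (reverse.has(translated)) {
      collisions.push(`"${translated}" -> ${reverse.get(translated)} / ${english}`);
    } else {
      reverse.set(translated, english);
    }
  }
  if (collisions.length > 0) {
    throw new Error(`ambiguous translations: ${collisions.join(", ")}`);
  }
  return reverse;
}

// Hypothetical table with a deliberate collision on "sea".
const forward = { if: "si", else: "sino", let: "sea", const: "sea" };
try {
  buildReverseTable(forward);
} catch (e) {
  console.log(e.message); // ambiguous translations: "sea" -> let / const
}
```

Failing loudly when the table is built seems preferable to a plain Map built from the reversed entries, which would silently let the last entry win.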