Python Statements/Keyword Localization

Emanuele D'Arrigo

Greetings everybody,

Some time ago I saw a paper that used an XSLT stylesheet to transform
(if I remember correctly) a Chinese XML file (including Chinese-script
XML tags) into an XHTML file.

More recently you may all have heard that ICANN has opened the way
for non-Latin characters in domain names, so that we'll soon start
seeing URLs using Russian, Asian and Arabic characters.

In this context, I was wondering whether there has ever been much
thought about a mechanism to allow the localization not only of the
strings handled by Python but also of its built-in keywords, such as
"if", "for", "while", "class" and so on. For example, the following
English-based piece of code:

class MyClass(object):
    def myMethod(self, aVariable):
        if aVariable == True:
            print "It's True!"
        else:
            print "It's False!"

would become (in Italian):

classe LaMiaClasse(oggetto):
    def ilMioMetodo(io, unaVariabile):
        se unaVariabile == Vero:
            stampa "E' Vero!"
        altrimenti:
            stampa "E' Falso!"

I can imagine how a translation script going through the source code
could do a 1:1 keyword translation to English fairly quickly, but this
would mean that the runtime code is still in English and any error
message would be in English. I can also imagine that it should be
possible to "simply" recompile Python to use different keywords, but
then all libraries using the English keywords would become
incompatible, wouldn't they?

In this context, it seems that the executable would have to be able
to optionally accept *a list* of dictionaries to internally translate
the keywords found in the input code into English, and at most *one*
dictionary to internally translate output messages, such as a stack
trace, from English.
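
A minimal sketch of the 1:1 translation script described above, assuming Python 3 (so `stampa` maps to the `print()` function rather than a statement). The table `ITALIAN_KEYWORDS` and the helper `translate_keywords` are hypothetical, not part of any Python release. It works on tokens from the standard `tokenize` module rather than on raw text, so string literals and comments are left alone; as the post anticipates, any traceback raised by the translated code is still in English.

```python
import io
import tokenize

# Hypothetical Italian-to-English table built from the example above.
# Note that it also maps non-keywords such as "oggetto" and "stampa".
ITALIAN_KEYWORDS = {
    "classe": "class",
    "se": "if",
    "altrimenti": "else",
    "Vero": "True",
    "Falso": "False",
    "oggetto": "object",
    "stampa": "print",
}


def translate_keywords(source, table):
    """Return `source` with every NAME token found in `table` replaced.

    STRING and COMMENT tokens are never looked up, so "E' Vero!" survives.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in table:
            tok = tok._replace(string=table[tok.string])
        tokens.append(tok)
    # untokenize() lays tokens out using their original positions.
    return tokenize.untokenize(tokens)


if __name__ == "__main__":
    italian = (
        "classe LaMiaClasse(oggetto):\n"
        "    def ilMioMetodo(io, unaVariabile):\n"
        "        se unaVariabile == Vero:\n"
        "            stampa(\"E' Vero!\")\n"
        "        altrimenti:\n"
        "            stampa(\"E' Falso!\")\n"
    )
    english = translate_keywords(italian, ITALIAN_KEYWORDS)
    print(english)
    namespace = {}
    exec(english, namespace)
    namespace["LaMiaClasse"]().ilMioMetodo(True)  # prints: E' Vero!
```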

What do you guys think?

Manu
 
MRAB

Emanuele said:
> In this context it seems to be the case that the executable would have
> to be able to optionally accept -a list- of dictionaries to internally
> translate to English the keywords found in the input code and at most
> -one- dictionary to internally translate from English output messages
> such as a stack trace.
>
> What do you guys think?

It might be necessary to work in tokens, where a token is a word or a
string (or maybe also a comment). Your example would be encoded to:

«1» «2»(«3»):
    «4» «5»(«6», «7»):
        «8» «7» == «9»:
            «10» «11»
        «12»:
            «10» «13»

with either English:

«1» class
«2» MyClass
«3» object
«4» def
«5» myMethod
«6» self
«7» aVariable
«8» if
«9» True
«10» print
«11» "It's True!"
«12» else
«13» "It's False!"

or Italian:

«1» classe
«2» LaMiaClasse
«3» oggetto
«4» def
«5» ilMioMetodo
«6» io
«7» unaVariabile
«8» se
«9» Vero
«10» stampa
«11» "È Vero!"
«12» altrimenti
«13» "È Falso!"

Any messages produced by, or format strings used by, the runtime would
also be tokens.

Python currently does lexical analysis on the source code to identify
names, strings, etc.; a new tokenised file format would partially bypass
that step, because the names and strings (and comments?) would already
have been identified.
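
A rough sketch of the tokenised format described above, assuming Python 3 source. The helpers `encode` and `render` are hypothetical: NAME and STRING tokens become numbered «n» placeholders (identical text shares a number, as «7» does for aVariable), while operators and, in this sketch, comments stay in the skeleton. A per-language table then only needs different text for the same numbers.

```python
import io
import re
import tokenize


def encode(source):
    """Split `source` into a skeleton of «n» placeholders and a table {n: text}."""
    numbers = {}  # token text -> placeholder number, in order of first appearance
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NAME, tokenize.STRING):
            n = numbers.setdefault(tok.string, len(numbers) + 1)
            tok = tok._replace(string=f"«{n}»")
        tokens.append(tok)
    table = {n: text for text, n in numbers.items()}
    return tokenize.untokenize(tokens), table


def render(skeleton, table):
    """Replace every «n» placeholder with its entry from a language table."""
    return re.sub(r"«(\d+)»", lambda m: table[int(m.group(1))], skeleton)


if __name__ == "__main__":
    english = (
        "class MyClass(object):\n"
        "    def myMethod(self, aVariable):\n"
        "        if aVariable == True:\n"
        "            print(\"It's True!\")\n"
        "        else:\n"
        "            print(\"It's False!\")\n"
    )
    skeleton, table = encode(english)
    print(skeleton)
    print(table)
    # Rendering with the English table reproduces the original source.
    assert render(skeleton, table) == english
```

Rendering an Italian version would mean swapping entries such as 1 → "classe" while reusing the same skeleton.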
 
garabik-news-2005-05

Emanuele D'Arrigo said:
> More recently you might have all heard how the ICANN has opened up the
> way for non-latin characters in domain names, so that we'll soon start
> seeing URLs using Russian, Asian and Arabic characters.

Non-Latin characters in domain names have already been possible (and
usable, and actually used) for some years now. Google for "punycode".

ICANN was talking about TLDs (top-level domains).
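
For readers unfamiliar with punycode: a non-Latin label travels through DNS as an ASCII "xn--" string, and Python's standard `punycode` and `idna` codecs (the latter implementing the older IDNA 2003 rules) perform that conversion. The labels below are arbitrary examples.

```python
# Python's standard codecs convert between Unicode labels and their ASCII form.
print("bücher".encode("punycode"))              # b'bcher-kva'
print("bücher.example".encode("idna"))          # b'xn--bcher-kva.example'
print(b"xn--bcher-kva.example".decode("idna"))  # bücher.example

# The same mechanism covers a non-Latin top-level domain label.
cyrillic_tld = "рф"
print(cyrillic_tld.encode("idna"))
```
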

> I can imagine how a translation script going through the source code
> could do a 1:1 keyword translation to English fairly quickly

...and get conflicts because I named my variable (in English Python),
say, stampa.
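
A small sketch of that conflict, assuming Python 3; the helper `naive_translate` and its one-entry table are hypothetical. Because a token-level translator only sees spellings, it also rewrites an ordinary variable that happens to be called `stampa`, and the translated program fails where the original ran fine.

```python
import io
import tokenize

# Hypothetical one-entry Italian-to-English table, as a naive translator might use.
TABLE = {"stampa": "print"}


def naive_translate(source, table):
    """Replace NAME tokens by spelling alone, with no notion of scope or meaning."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in table:
            tok = tok._replace(string=table[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


# Ordinary English Python that happens to use "stampa" as a variable name.
english_code = 'stampa = "a press"\nprint(stampa)\n'
exec(english_code, {})  # prints: a press

translated = naive_translate(english_code, TABLE)
print(translated)  # both occurrences of stampa are now "print"
try:
    exec(translated, {})
except TypeError as exc:
    print("translated code broke:", exc)  # 'str' object is not callable
```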

Anyway, see http://www.chinesepython.org

--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
 
Terry Reedy

Emanuele said:
> In this context I was wondering if there has ever been much thought
> about a mechanism to allow the localization not only of the strings
> handled by python but also of its built-in keywords, such as "if",
> "for", "while", "class" and so on.

There have been various debates and discussions on the topic. There has
been slow movement away from ASCII-only in user code. (But not in the
stdlib, nor will there be.)
1. Unicode data type.
2. Unicode allowed in comments and string literals.
This required input decoding and the coding cookie. This led, I
believe somewhat accidentally, to
3. Extended ASCII (high bit set, for other European chars in various
encodings) for identifiers.
4. (In 3.0) Unicode allowed for identifiers.
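
Items 2 and 4 can be checked from a Python 3 interpreter: non-ASCII letters are accepted in identifiers (PEP 3131), while the keyword list itself stays English. The Italian names are only illustrative.

```python
import keyword

# PEP 3131 (Python 3.0): identifiers may contain non-ASCII letters...
print("unaVariabilità".isidentifier())    # True
exec("città = 'Roma'\nprint(città)", {})  # prints: Roma

# ...but the keywords themselves are still a fixed, English list.
print(keyword.iskeyword("se"), keyword.iskeyword("if"))  # False True
print(len(keyword.kwlist), "keywords in this interpreter")

try:
    compile("se città:\n    pass\n", "<italiano>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```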

Here is a version of the anti-customized-keyword position. Python is
designed to be read by people. Currently, any programmer in the world
can potentially read any Python program. The developers, especially
Guido, like this. Fixed keywords are not an undue burden, because any
educated person should learn to read the Latin characters a-z and 0-9,
and Python has an intentionally short keyword list that the developers
are loath to lengthen.

Change 4 above inhibits universal readability. But once 3 had happened
and str had become Unicode in 3.0, it was hard to say no to this.

A 'pro' argument: Python was designed for learning, is good for that,
and *is* used in schools down to the elementary level. But kids cannot
be expected to know foreign alphabets and words while still learning
their own.

> I can imagine how a translation script going through the source code
> could do a 1:1 keyword translation to English fairly quickly but this
> would mean that the runtime code still is in English and any error
> message would be in English.

This is currently seen as a reason not to have other keywords: it will
do no good anyway. A Python programmer must know some minimal English,
and the keywords are the least of the problem.

I can imagine that there could be a mechanism for extracting error
messages and replacing them with translations, like there is for
Python code, but I do not know whether it will ever happen with
haphazard volunteer work or whether it will require grant sponsorship.

> I can also imagine that it should be
> possible to "simply" recompile python to use different keywords, but
> then all libraries using the English keywords would become
> incompatible, wouldn't they?
>
> In this context it seems to be the case that the executable would have
> to be able to optionally accept -a list- of dictionaries to internally
> translate to English the keywords found in the input code and at most
> -one- dictionary to internally translate from English output messages
> such as a stack trace.
>
> What do you guys think?

I would like anyone in the world to be able to use Python, and I would
like Python programmers to potentially be able to read any Python code,
without the community being severely balkanized. To me, this would
eventually mean both native keywords and transliteration from other
alphabets and scripts to Latin characters. Not an easy project.

Terry Jan Reedy
 
Marco Mariani

Emanuele said:
> In this context it seems to be the case that the executable would have
> to be able to optionally accept -a list- of dictionaries to internally
> translate to English the keywords found in the input code and at most
> -one- dictionary to internally translate from English output messages
> such as a stack trace.
>
> What do you guys think?

Microsoft did that twenty years ago and we're still mocking them.
 
Marco Mariani

Terry said:
> A 'pro' argument: Python was designed for learning and is good for that
> and *is* used in schools down to the elementary level. But kids cannot
> be expected to know foreign alphabets and words while still learning
> their own.

I taught myself BASIC at 9 by reading magazines, but had my first
English lessons five years later.
Knowing English would have helped me understand the operating/language
manuals, not the language keywords themselves.

I suppose if you were to try what the OP suggested, you would need to
translate a lot of the standard library, parameter names, and such...
nonsense.
 
Emanuele D'Arrigo

Thank you all for the insights. I particularly like the broad spread
of opinions on the subject.

Indeed, when I wrote the original post my thoughts were with those
young students in non-English-speaking countries who start learning
to program before they learn English. My case is almost one of those:
I started at home when I was 13, toying around with BASIC, and at the
time not only did I not know English, but for a few more years I would
be learning only French. Later I did start learning English, but I
still found that, while learning to program in Pascal at school, its
English keywords were somehow an interruption of my mental flow. At
the time (20 years ago) localization wasn't a particularly big thing
and this issue would have been a bit of a lost cause. But as the world
becomes more and more interconnected, I think it is important that we
all make an effort to respect the cultural needs and sensitivities of
non-Western adults and youngsters alike.

Ultimately I certainly appreciate the ubiquity of English, even though
in the interest of fairness and efficiency I'd prefer the role of
common language to be given to a constructed language, such as Ido.
But it doesn't take a particularly religious person to see that "do to
others as you would want them to do to you" tends to be a valid
principle. Just as the world would be at a loss if an Indian
university came up with a wonderful programming language available
only in Sanskrit, the world is at a loss by not having a beautiful
language such as Python natively available in other scripts.

Again, thank you all!

Manu
 
Terry Reedy

Ben said:
> I prefer Lojban <URL:http://www.lojban.org/> as being logically robust
> while fully expressive, and sharing the Ido goal of avoiding
> disadvantage to native speakers of any particular existing language.

Even though I am a native English speaker, and therefore advantaged by
English becoming ubiquitous, I would not mind learning *one* artificial
'natural' language. I even considered Esperanto at one point. But the
advocates of such a lingua franca cannot seem to agree on which ;-).
(Ido? Never heard of it before -- will check Wikipedia.)

tjr
 
