PEP 3131: Supporting Non-ASCII Identifiers

Martin v. Löwis · May 15, 2007

Paul said:
Could you name a few? Thanks.

The GNU assembler also supports non-ASCII symbol names on object file
formats that support it; this includes at least ELF (not sure about
PE32). Higher-level programming languages can use that to encode symbols
in UTF-8.

Regards,
Martin

Hendrik van Rooyen · May 15, 2007

"Anders J. Munch" wrote:

Macros create dialects that are understood only by the three people in your
project group. It's unreasonable to compare that to a "dialect" such as
Mandarin, which is exclusive to a tiny little clique of one billion people.

A bit out of context here - I was trying to address the dichotomy of
reserved words and native identifiers - so if you want Mandarin or
Russian or Cantonese or Afrikaans or Flemish or German or Hebrew
identifiers, will you really be happy with the hassle of "for", "while",
"in" - as plain old English ASCII? How are you going to allow the native
speaker to get those into his mother tongue without something like macros?

Are you suggesting separate parsers for each language?
Table driven parsers?

And if you go the macro route, how are you going to stop it being
abused for purposes that have nothing to do with language translation?

Do I have to draw a picture?

- Hendrik

Stefan Behnel · May 15, 2007

Paul said:
There is no feature that has ever been proposed for Python, that cannot
be supported with this argument. If you don't like having a "go to"
statement added to Python, where's the problem? Just don't use it in
your particular project.

"go to" is not meant for clarity, nor does it encourage code readability.

But that's what this PEP is about.

Stefan

Eric Brunel · May 15, 2007

On Mon said:
Can a discussion about support for non-English identifiers (1)
conducted in a group where 99.9% of the posters are fluent
speakers of English (2), have any chance of being objective
or fair?
Agreed.

Although probably not sufficient to overcome this built-in
bias, it would be interesting if some bilingual readers would
raise this issue in some non-English Python discussion
groups to see if the opposition to this idea is as strong
there as it is here.

Done on the French Python newsgroup.

Eric Brunel · May 15, 2007

^^^
this. is. cool.

Yeah, right... The problems begin...

Joke aside, this just means that I won't ever be able to program math in
Ada, because I have absolutely no idea how to type a 'pi' character on my
keyboard.

Still -1 for the PEP...

Antoon Pardon · May 15, 2007

There is no feature that has ever been proposed for Python, that cannot
be supported with this argument. If you don't like having a "go to"
statement added to Python, where's the problem? Just don't use it in
your particular project.

There is no feature that has ever been proposed that cannot be rejected
by the opposite argument: I don't want to be bothered with something
like this, and if it is introduced, sooner or later I will be.

And in my experience this argument is used a lot more than the first.

Stefan Behnel · May 15, 2007

Eric said:
Yeah, right... The problems begin...

Joke aside, this just means that I won't ever be able to program math in
Ada, because I have absolutely no idea how to type a 'pi' character on
my keyboard.

Ah, you'll learn.

Stefan

Duncan Booth · May 15, 2007

Stefan Behnel a écrit :

I've never met anyone *serious* about programming and yet unable to
read and write CS-oriented technical English.

I don't believe that Python should be restricted to people *serious* about
programming.

Recently there has been quite a bit of publicity about the One Laptop Per
Child project. The XO laptop is just beginning rollout to children and
provides two main programming environments: Squeak and Python. It is an
exciting thought that soon there will be millions of children in
countries such as Nigeria, Brazil, Uruguay or Nepal[*] who have the
potential to learn to program, but tragic if the Python community is too
arrogant to consider it acceptable to use anything but English and ASCII.

Yes, any sensible widespread project is going to mandate a particular
language for variable names and comments, but I see no reason at all why
they all have to use English.

[*] BTW, I see OLPC Nepal is looking for volunteer Python programmers this
Summer: if anyone fancies spending 6+ weeks in Nepal this Summer for no
pay, see http://www.mail-archive.com/[email protected]/msg04109.html

Eric Brunel · May 15, 2007

Recently there has been quite a bit of publicity about the One Laptop Per
Child project. The XO laptop is just beginning rollout to children and
provides two main programming environments: Squeak and Python. It is an
exciting thought that soon there will be millions of children in
countries such as Nigeria, Brazil, Uruguay or Nepal[*] who have the
potential to learn to program, but tragic if the Python community is too
arrogant to consider it acceptable to use anything but English and ASCII.

You could say the same about Python standard library and keywords then.
Shouldn't these also have to be translated? One can even push things a
little further: I don't know about the languages used in the countries you
mention, but for example, a simple construction like 'if <condition> <do
something>' will look weird to a Japanese (the Japanese language has a
"post-fix" feel: the equivalent of the 'if' is put after the condition).
So why enforce an English-like sentence structure?

Yes, any sensible widespread project is going to mandate a particular
language for variable names and comments, but I see no reason at all why
they all have to use English.

Because that's what already happens? We definitely are in a globalized
world, and the only candidate language having a chance to allow people to
communicate with each other is English. Period. And believe me, I don't
like that (I'm French, if that can give you an idea about how much I
don't...). But that's a fact. Even people knowing the same language
sometimes communicate in English just in case they have to widen the
discussion to somebody else. To give you a perfect example, just yesterday
I had to discuss a reply we owed to a Belgian guy who speaks
French without any problem. His mail was written in English, and we
answered in English.

Anyway:

I don't believe that Python should be restricted to people *serious*
about programming.

You have a point here. When learning to program, or when programming for
fun without any intention to do something serious, it may be better to
have a language supporting "native" characters in identifiers. My problem
is: if you allow these, how can you prevent them from going public someday?

Stefan Behnel · May 15, 2007

Eric said:
You have a point here. When learning to program, or when programming for
fun without any intention to do something serious, it may be better to
have a language supporting "native" characters in identifiers. My
problem is: if you allow these, how can you prevent them from going
public someday?

My personal take on this is: search-and-replace is easier if you used
well-chosen identifiers. Which is easier if you used your native language for
them, which in turn is easier if you can use the proper spellings. So I don't
see this problem getting any worse compared to the case where you use a
transliteration or even badly chosen English-looking identifiers from a small
vocabulary that is foreign to you.

For example, how many German names for a counter variable could you come up
with? Or English names for a function that does domain-specific stuff and that
was specified in your native language using natively named concepts? Are you
sure you always know the correct English translations?

I think native identifiers can help here. Using them will enable you to name
things just right and with sufficient variation to make a search-and-replace
with English words easier in case it ever really becomes a requirement.

Stefan

Anton Vredegoor · May 15, 2007

Duncan said:
Recently there has been quite a bit of publicity about the One Laptop Per
Child project. The XO laptop is just beginning rollout to children and
provides two main programming environments: Squeak and Python. It is an
exciting thought that soon there will be millions of children in
countries such as Nigeria, Brazil, Uruguay or Nepal[*] who have the
potential to learn to program, but tragic if the Python community is too
arrogant to consider it acceptable to use anything but English and ASCII.

Please don't be too quick with assuming arrogance. I have studied social
psychology for eleven years and my thesis was just about such a subject.
I even held a degree in social psychology for some time before my
government in their infinite wisdom decided to 'upgrade' the system so
that only people holding *working* positions at a university would be
able to convert their degrees to the new system. I suspect discerning
people can still sense a twinge of disagreement with that in my
professional attitude. However I still think the results of my research
were valid.

The idea was to try and measure whether it would be better for foreign
students visiting the Netherlands to be kept in their own separate
groups being able to speak their native language and to hold on to their
own culture versus directly integrating them with the main culture by
mixing them up with Dutch student groups (in this case the main culture
was Dutch).

I think my research data supported the idea that it is best even for
the foreigners themselves to adapt as quickly as possible to the main
culture and start to interact with it by socializing with 'main culture'
persons.

My research at that time didn't fit in at all with the political climate
of the time and subsequently it was impossible for me to find a job.
That didn't mean that I forgot about it. I think a lot of the same ideas
would help the OLPC project so that they will not make the same mistake
of creating separate student populations.

I believe -but that is a personal belief which I haven't been able to
prove yet by doing research- that those people currently holding
positions of power in the main culture actively *prevent* new groups from
integrating because it would threaten their positions of power.

So instead of having a favorable view of teachers who are 'adapting' to
their students culture I have in fact quite the opposite view: Those
teachers are actually harming the future prospects of their students.
I'm not sure either whether they do it because they're trying to protect
their own positions or are merely complicit to higher up political forces.

Whatever you make of my position I would appreciate if you'd not
directly conclude that I'm just being arrogant or haven't thought about
the matter if I am of a different opinion than you.

Yes, any sensible widespread project is going to mandate a particular
language for variable names and comments, but I see no reason at all why
they all have to use English.

Well I clearly do see a reason why it would be in their very best
interest to immediately start to use English and to interact with the
main Python community.

[*] BTW, I see OLPC Nepal is looking for volunteer Python programmers this
Summer: if anyone fancies spending 6+ weeks in Nepal this Summer for no
pay, see http://www.mail-archive.com/[email protected]/msg04109.html

Thanks. I'll think about it. The main problem I see for my participation
is that I have absolutely *no* personal funds to contribute to this
project, not even to pay for my trip to that country or to pay my rent
while I'm abroad.

A.

Steven D'Aprano · May 15, 2007

No need -- a separate PEP (also by Martin) makes UTF-8 the default
encoding, and UTF-8 can encode any Unicode character you like.

Ah, that puts a slightly different perspective on the issue.

Steven D'Aprano · May 15, 2007

I concur, Aldo. Indeed, if I _can't_ be sure I understand a patch, I
don't accept it -- I ask the submitter to make it clearer.

Yes, but there is a huge gulf between what Aldo originally said he does
("visual inspection") and *reading and understanding the code*.

If somebody submits a piece of code where all the variable names,
functions, classes etc. are like a958323094, a498307913, etc. you're
going to have a massive problem following the code despite being in
ASCII. You would be sensible to reject the code. If you don't read
Chinese, and somebody submits a patch in Chinese, you would be sensible
to reject it, or at least have it vetted by somebody who does read
Chinese.

But is it really likely that somebody is going to submit a Chinese patch
to your English or Italian project? I don't think so.

Homoglyphs would ensure I could _never_ be sure I understand a patch,
without at least running it through some transliteration tool. I don't
think the world of open source needs this extra hurdle in its path.

If I've understood Martin's post, the PEP states that identifiers are
converted to normal form. If two identifiers look the same, they will be
the same.

Except, probably, identifiers using ASCII O and 0, or I l and 1, or rn
and m. Depending on your eyesight and your font, they look the same. The
solution to that isn't to prohibit O and 0 in identifiers, but to use a
font that makes them look different.
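
A quick way to see what normalization does and does not cover is sketched below. PEP 3131 specifies NFKC as the normal form for identifiers; the helper name `normalize_identifier` is made up for illustration and is not part of any Python API. The sketch suggests that compatibility characters (such as the single-character "fi" ligature) fold into their plain spelling, while look-alike letters from different scripts remain distinct identifiers.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Apply NFKC, the normal form PEP 3131 specifies for identifiers.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFKC", name)


# A compatibility character folds into its plain spelling.
ligature = "\ufb01le"   # U+FB01 LATIN SMALL LIGATURE FI, followed by "le"
print(normalize_identifier(ligature) == "file")

# Look-alikes from different scripts are NOT unified by normalization.
latin_a = "a"           # U+0061 LATIN SMALL LETTER A
cyrillic_a = "\u0430"   # U+0430 CYRILLIC SMALL LETTER A
print(normalize_identifier(latin_a) == normalize_identifier(cyrillic_a))
```

So normalization handles alternative encodings of the same character, but cross-script homoglyphs are in the same position as O and 0: they need a font, or a tool, that tells them apart.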

But even if homoglyphs were a problem, as hurdles go, it's hardly a
big one. No doubt you already use automated tools for patch management,
revision control, bug tracking, unit-testing, maybe even spell checking.
Adding a transliteration tool to your arsenal is not really a disaster.
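
As a rough idea of how small such a tool could be, here is a minimal sketch that lists every identifier containing non-ASCII characters, using the standard `tokenize` module. The function name `non_ascii_names` and the sample source are assumptions for illustration; a real review tool would presumably work on diffs and handle tokenization errors.

```python
import io
import tokenize


def non_ascii_names(source: str):
    """Yield (line, column, name) for each identifier with non-ASCII characters.

    Hypothetical helper for illustration only.
    """
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == tokenize.NAME and any(ord(ch) > 127 for ch in tok.string):
            yield tok.start[0], tok.start[1], tok.string


# Sample input: the second line uses a German identifier with an umlaut.
sample = "total = 1\nz\u00e4hler = total + 1\n"
for line, col, name in non_ascii_names(sample):
    # Show the name alongside an escaped, pure-ASCII rendering.
    escaped = name.encode("ascii", "backslashreplace").decode("ascii")
    print(f"{line}:{col}: {name!r} -> {escaped}")
```

The escaped rendering makes any non-ASCII character visible to a reviewer regardless of font.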

Thorsten Kampe · May 15, 2007

* Eric Brunel (Tue, 15 May 2007 10:52:21 +0200)

Recently there has been quite a bit of publicity about the One Laptop Per
Child project. The XO laptop is just beginning rollout to children and
provides two main programming environments: Squeak and Python. It is an
exciting thought that soon there will be millions of children in
countries such as Nigeria, Brazil, Uruguay or Nepal[*] who have the
potential to learn to program, but tragic if the Python community is too
arrogant to consider it acceptable to use anything but English and ASCII.

You could say the same about Python standard library and keywords then.

You're mixing apples and peaches: identifiers (variable names) are
part of the user interface for the programmer and are at his free
disposal.

Thorsten

Steven D'Aprano · May 15, 2007

How do you know that? What steps did you take to ascertain it?

Why would I care? I don't bother to check it is ASCII because it makes no
difference whether it is ASCII or not. Allowing non-ASCII chars adds no
new vulnerability. Here's your example again, modified to show what I
mean:

    if user_entered_password != stored_password_from_database:
        password_is_correct = False
    # much code goes here...
    password_is_correct = True # sneaky backdoor inserted by Black Hat
    # much code goes here...
    if password_is_correct:
        log_user_in()

Your example was poor security in the first place, but the vulnerability
doesn't come from the name of the identifier. It comes from the algorithm
you used.

Duncan Booth · May 15, 2007

Eric Brunel said:
Recently there has been quite a bit of publicity about the One Laptop
Per Child project. The XO laptop is just beginning rollout to
children and provides two main programming environments: Squeak and
Python. It is an exciting thought that soon there will be
millions of children in countries such as Nigeria, Brazil, Uruguay or
Nepal[*] who have the potential to learn to program, but tragic if
the Python community is too arrogant to consider it acceptable to use
anything but English and ASCII.

You could say the same about Python standard library and keywords
then. Shouldn't these also have to be translated? One can even push
things a little further: I don't know about the languages used in the
countries you mention, but for example, a simple construction like
'if <condition> <do something>' will look weird to a Japanese (the
Japanese language has a "post-fix" feel: the equivalent of the 'if'
is put after the condition). So why enforce an English-like sentence
structure?

Yes, non-English speakers have to learn a set of technical words which are
superficially in English, but even English native speakers have to learn
non-obvious meanings, or non-English words 'str', 'isalnum', 'ljust'.
That is an unavoidable barrier, but it is a limited vocabulary and a
limited set of syntax rules. What I'm trying to say is that we shouldn't
raise the entry bar any higher than it has to be.

The languages in the countries I mentioned, BTW: in Nigeria all school
children must study both their indigenous language and English, Brazil uses
Portuguese, Uruguay uses Spanish, and Nepali is the official language of Nepal.

Duncan Booth · May 15, 2007

Anton Vredegoor said:
Whatever you make of my position I would appreciate if you'd not
directly conclude that I'm just being arrogant or haven't thought
about the matter if I am of a different opinion than you.

Sorry, I do apologise if that came across as a personal attack on you. It
certainly wasn't intended as such.

I was writing about the community as a whole: I think it would be arrogant
if the Python community were to decide not to support non-ASCII identifiers
purely because the active community of experienced users doesn't want them
used in OSS software. OTOH, it may just be my own arrogance thinking such a
thing.

Well I clearly do see a reason why it would be in their very best
interest to immediately start to use English and to interact with the
main Python community.

I think the 'main Python community' is probably a very small subset of all
Python developers. To be honest I expect that only a tiny percentage of
OLPC users will ever do any programming, and a minuscule fraction of those
will go beyond simple scripts (but I'd love to be proved wrong and in a few
years be facing 50 million new Python programmers). Most of the programming
which is likely to happen on these devices is not going to require input
from the wider community.

[*] BTW, I see OLPC Nepal is looking for volunteer Python programmers
this Summer: if anyone fancies spending 6+ weeks in Nepal this Summer
for no pay, see
http://www.mail-archive.com/[email protected]/msg04109.html

Thanks. I'll think about it. The main problem I see for my
participation is that I have absolutely *no* personal funds to
contribute to this project, not even to pay for my trip to that
country or to pay my rent while I'm abroad.

I think accommodation was included for the first 4 volunteers. The tricky
bit would be the air fare: I've no idea how much, but I suspect flights to
Nepal aren't cheap.

Eric Brunel · May 15, 2007

* Eric Brunel (Tue, 15 May 2007 10:52:21 +0200)

Recently there has been quite a bit of publicity about the One Laptop Per
Child project. The XO laptop is just beginning rollout to children and
provides two main programming environments: Squeak and Python. It is an
exciting thought that soon there will be millions of children in
countries such as Nigeria, Brazil, Uruguay or Nepal[*] who have the
potential to learn to program, but tragic if the Python community is too
arrogant to consider it acceptable to use anything but English and ASCII.

You could say the same about Python standard library and keywords then.

You're mixing apples and peaches: identifiers (variable names) are
part of the user interface for the programmer and are at his free
disposal.

So what? Does it mean that it's acceptable for the standard library and
keywords to be in English only, but the very same restriction on
user-defined identifiers is out of the question? Why? If I can use my own
language in my identifiers, why can't I write:

    classe MaClasse:
        définir __init__(moi_même, maListe):
            moi_même.monDictionnaire = {}
            pour i dans maListe:
                moi_même.monDictionnaire[i] = Rien

For a French-speaking person, this is far more readable than:

    class MaClasse:
        def __init__(self, maListe):
            self.monDictionnaire = {}
            for i in maListe:
                self.monDictionnaire[i] = None

Now, *this* is mixing apples and peaches... And this would look even
weirder with a non-indo-european language...
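
For reference, the split between fixed keywords and free identifiers can be checked directly in Python 3, where PEP 3131 was eventually implemented. The sketch below uses only `keyword.iskeyword` and `str.isidentifier`; the helper name `classify` is made up for illustration. It suggests that words like `définir` or `moi_même` are accepted as ordinary identifiers, while the keywords themselves stay English.

```python
import keyword


def classify(word: str) -> str:
    """Report how Python 3 treats a word: keyword, identifier, or invalid.

    Hypothetical helper for illustration only.
    """
    if keyword.iskeyword(word):
        return "keyword"
    if word.isidentifier():
        return "identifier"
    return "invalid"


for word in ("class", "classe", "def", "définir", "moi_même"):
    print(word, "->", classify(word))
```

In other words, the French words above would be legal names, but writing `classe` or `définir` in place of `class` and `def` would still be a syntax error.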

Guest · May 15, 2007

Stefan said:
I agree that code posted to comp.lang.python should use English identifiers
and that it is worth considering to use English identifiers in open source
code that is posted to a public OS project site. Note that I didn't say "ASCII
identifiers" but plain English identifiers. All other code should use the
language and encoding that fits its environment best.

Unless you are 150% sure that there will *never* be the need for a
person who does not know your language of choice to be able to read or
modify your code, the language that "fits the environment best" is English.

I simply doubt that the problem which this PEP wants to solve actually
exists. If you know so little English that you really need non-ASCII
identifiers, you will have a very hard time programming Python anyway.

My native language is German, and in code I cooperate on with other
Germans, I still use English identifiers, even if I am "quite sure" that
no non-German will ever have to read the code. It also makes it easier
and more beautiful for me if my code uses the same natural language as
the stdlib and third party modules do. I do not need non-ASCII
characters in Python identifiers.
