PEP 3131: Supporting Non-ASCII Identifiers

Guest · May 15, 2007

Marc said:
You find it in the sources by the line number from the traceback and the
letters can be copy'n'pasted if you don't know how to input them with your
keymap or keyboard layout.

Typing them is not the only problem. They might not even *display*
correctly if you don't happen to use a font that supports them.

Stefan Behnel · May 15, 2007

René Fleschenberg said:
Typing them is not the only problem. They might not even *display*
correctly if you don't happen to use a font that supports them.

Sounds like high time for an editor that supports the project team in their
work, don't you think?

Stefan

Guest · May 15, 2007

Steven said:
How is that different from misreading "disk_burnt = True" as "disk_bumt =
True"? In the right (or perhaps wrong) font, like the ever-popular Arial,
the two can be visually indistinguishable. Or "call" versus "cal1"?

That is the wrong question. The right question is: Why do you want to
introduce *more* possibilities to make such mistakes? Does this PEP solve
an actual problem, and if so, is that problem big enough to be worth the
introduction of these new risks and problems?

I think it is not. I think that the problem only really applies to very
isolated use-cases. So isolated that they do not justify a change to
mainline Python. If someone thinks that non-ASCII identifiers are really
needed, he could maintain a special Python branch that supports them. I
doubt that there would be a lot of demand for it.

Guest · May 15, 2007

Anders said:
There's any number of things to be done about that.
1. # -*- encoding: ascii -*-

This would limit comments and string literals to ASCII, too. I use "-*-
coding: utf-8 -*-" in all of my code and I am still against this PEP.

It is useful to be able to spell my own name correctly in a comment. It
is not useful to do so in a Python identifier.
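The reply above objects that an ASCII `coding` declaration is too blunt, because it restricts comments and string literals along with identifiers. A project that wanted to allow non-ASCII text in comments and strings while rejecting it in names could run a small check built on the standard `tokenize` module. This is a minimal sketch, assuming Python 3.7 or later (for `str.isascii()`) and a tokenizer that already accepts non-ASCII names; `non_ascii_identifiers` is a hypothetical helper, not part of any tool mentioned in this thread:

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Yield (line, column, name) for each identifier token containing non-ASCII characters.

    Comments and string literals are separate token types (COMMENT, STRING),
    so non-ASCII text inside them is never reported.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            yield tok.start[0], tok.start[1], tok.string


sample = '# René wrote this comment\ngruss = "Grüße"\nzähler = 1\n'
print(list(non_ascii_identifiers(sample)))  # [(3, 0, 'zähler')]
```

Only the identifier on line 3 is reported; the accented comment and the string literal pass.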

Guest · May 15, 2007

Stefan said:
Ok, but then maybe that code just will not become Open Source. There's a
million reasons code cannot be made Open Source, licensing being one, lack of
resources being another, bad implementation and lack of documentation being
important also.

But that won't change by keeping Unicode characters out of source code.

Allowing non-ASCII identifiers will not change existing hindrances for
code-sharing, but it might add a new one.

IMO, the burden of proof is on you. If this PEP has the potential to
introduce another hindrance for code-sharing, the supporters of this PEP
should be required to provide a "damn good reason" for doing so. So far,
you have failed to do that, in my opinion. All you have presented are
vague notions of rare and isolated use-cases.

I'm only saying that this shouldn't be a language restriction, as there
definitely *are* projects (I know some for my part) that can benefit from the
clarity of native language identifiers (just like English speaking projects
benefit from the English language). And yes, this includes spelling native
language identifiers in the native way to make them easy to read and fast to
grasp for those who maintain the code.

If a maintenance programmer does not understand enough English to be
able to easily cope with ASCII-only identifiers, he will have a problem
anyway, since it will be very hard to use the standard library, the
documentation, and so on.

Aldo Cortesi · May 15, 2007

Thus spake Steven D'Aprano:

Yes, but there is a huge gulf between what Aldo originally said he does
("visual inspection") and *reading and understanding the code*.

Let's set aside the fact that you're guilty of sloppy quoting here, since the
phrase "visual inspection" is yours, not mine. Regardless, your interpretation
of my words is just plain dumb. My phrasing was intended to draw attention to
the fact that one needs to READ code in order to understand it. You know - with
one's eyes. VISUALLY. And VISUAL INSPECTION of code becomes unreliable if this
PEP passes.

If I've understood Martin's post, the PEP states that identifiers are
converted to normal form. If two identifiers look the same, they will be the
same.

I'm sorry to have to tell you, but you understood Martin's post no better than
you did mine. There is no general way to detect homoglyphs and "convert them to
a normal form". Observe:

import unicodedata
print repr(unicodedata.normalize("NFC", u"\u2160"))
print u"\u2160"
print "I"

So, a round 0 for reading comprehension this lesson, I'm afraid. Better luck
next time.

Regards,

Aldo
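Aldo's snippet tests NFC. PEP 3131 as finally accepted for Python 3 specifies NFKC instead, which does fold U+2160 ROMAN NUMERAL ONE into a plain "I"; but no normalization form maps a Cyrillic "а" onto a Latin "a". A short Python 3 sketch of both effects (the variable names are illustrative only):

```python
import unicodedata

roman_one = "\u2160"   # ROMAN NUMERAL ONE, renders much like Latin "I"
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A, renders much like Latin "a"

# NFC, the form Aldo tested, leaves the Roman numeral alone.
print(unicodedata.normalize("NFC", roman_one) == "I")   # False
# NFKC folds this compatibility character to plain ASCII "I".
print(unicodedata.normalize("NFKC", roman_one) == "I")  # True

# No normalization form turns the Cyrillic letter into the Latin one.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, cyrillic_a) == "a")  # False for every form

# Python 3 applies NFKC to identifiers, so these two spellings name one variable...
ns = {}
exec("\u2160 = 5", ns)
print(ns.get("I"))  # 5
# ...while the Cyrillic spelling creates a separate, look-alike name.
exec("\u0430 = 1", ns)
print("a" in ns)    # False
```

So NFKC removes some homoglyphs but not cross-script ones, which is the case Aldo's argument turns on.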

HYRY · May 15, 2007

- should non-ASCII identifiers be supported? why?
Yes. I have wanted this for years. I am Chinese, and I teach programming
to some 12-year-old children. The biggest problem is that we cannot
use Chinese words for identifiers. As the program source becomes
longer, they always lose track of the program logic.

English keywords and libraries are not the problem, because we only use
about 30-50 of these words for teaching programming ideas. They can
remember these words in one week. But for the names of variables and
functions, it is difficult to remember all the English words. For
example, when we are doing numbers, we might need [odd, even, prime,
minus, ...]; when we are programming for drawing: [line, circle,
pixel, ...]; when it's for GUI: [button, event, menu, ...]. There are
so many words that they cannot just remember and use them to explain
their ideas.

Eventually, when these children go to high school and have studied
enough English, I think they can use English words for programming.
But as a first step, it is difficult to learn both programming and
English at once.

So I wrote a little program that replaces all the Chinese words in a
program with sequential identifiers such as [a1, a2, a3, ...]. That way
we can program in Chinese, and Python can still run it.
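HYRY's tool itself is not shown. Below is a minimal sketch of the same renaming idea built on the standard `tokenize` module, which leaves comments and string literals untouched. It assumes Python 3.7 or later, whose tokenizer already accepts Chinese identifiers, so in Python 3 a pass like this is only useful when ASCII-only names are wanted, for example for tools that cannot display them. The function name `asciify_identifiers` and the `a1, a2, ...` scheme are illustrative:

```python
import io
import tokenize


def asciify_identifiers(source):
    """Rename every non-ASCII identifier to a sequential ASCII name (a1, a2, ...).

    Returns the rewritten source and the mapping {original name: ASCII name}.
    Comments and string literals are different token types, so they are kept as-is.
    """
    mapping = {}
    new_tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            if tok.string not in mapping:
                mapping[tok.string] = "a%d" % (len(mapping) + 1)
            tok = tok._replace(string=mapping[tok.string])
        new_tokens.append(tok)
    # untokenize() with full 5-tuples lays tokens out by their original positions,
    # so the source's spacing is kept even though the names change length.
    return tokenize.untokenize(new_tokens), mapping


chinese_source = '半径 = 3\n面积 = 3.14 * 半径 * 半径\nprint("面积:", 面积)\n'
ascii_source, names = asciify_identifiers(chinese_source)
print(ascii_source)
print(names)
```

The mapping is returned so that error messages mentioning `a1` or `a2` can be translated back to the names the children actually wrote.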

If non-ASCII identifiers become reality, I think it will be the best
gift for children who do not know English.

Stefan Behnel · May 15, 2007

René Fleschenberg said:
you have failed to do that, in my opinion. All you have presented are
vague notions of rare and isolated use-cases.

I don't think software development at one of the biggest banks in Germany can
be considered a "rare and isolated use case".

Admittedly, it's done in Java, but why should Python fail to support Unicode
identifiers in the way Java does?

Stefan

Guest · May 15, 2007

Stefan said:
My personal take on this is: search-and-replace is easier if you used well
chosen identifiers. Which is easier if you used your native language for them,
which in turn is easier if you can use the proper spellings.

I strongly disagree with this. My native language is German, and I do
*not* find it easier to find well chosen identifiers using German. Not
at all. Quite the opposite.

Programming is such an English-dominated culture that I even "think" in
English about it. When I want to explain something related to computers
to German "non-techies", I often have to think about an appropriate
German word for what I have in mind first. Using the familiar English
term would be a lot easier.

My experience is: If you know so little "technical" English that you
cannot come up with well chosen English identifiers, you need to learn
it. If you don't, you will not be able to write decent programs anyway.
All the keywords are in English, all of the standard library is in
English, most of the documentation is only available in English, almost
all third party modules' interfaces are in English.

Any program that uses non-English identifiers in Python is bound to
become gibberish, since it *will* be cluttered with English identifiers
all over the place anyway, whether you like it or not.

The point is: Supporting non-ASCII identifiers does *not* allow people
to write programs "using their native natural language". For that, you
would also have to translate the standard library and so on.

meincsv = csv.reader(open("meinedatei.csv"))
for zeile in meincsv:
    for eintrag in zeile:
        print eintrag.upper()

Even in that little trivial code snippet, you need to understand stuff
like "reader", "open", "for", "in", "print" and "upper". Mixing in the
German identifiers is both ugly and unnecessary.

For example, how many German names for a counter variable could you come up
with? Or English names for a function that does domain specific stuff and that
was specified in your native language using natively named concepts? Are you
sure you always know the correct English translations?

I don't need to know the perfect English translation, just one that is
understood by anyone who knows enough "Programming English", which is
just about any programmer in the world.

Marco Colombo · May 15, 2007

- should non-ASCII identifiers be supported? why?

Yes. For the same reason non-ASCII source files are supported.

- would you use them if it was possible to do so? in what cases?

Yes. In the same cases I'd use:
1) non-English comments;
2) non-English string literals;
3) a source file that is already non-ASCII.

Not that I usually do that. I speak Italian natively, but write
programs with English identifiers, English comments, English output
most of the time and usually the encoding is ASCII. Yet we support
other encodings for the source file, that is, non-ASCII comments,
literals, and so on. And non-ASCII means non-English.

There may be very few reasons to use non-ASCII, non-English words in a
Python program, _anywhere_, but that's not the issue. We already
support people writing programs with, say, Italian comments, Italian
words in string literals (natively utf-8 encoded, not with escape
sequences), no matter if they are right or wrong doing so. The
following is a valid source file:

# coding=utf-8
# stampa il nome della città
citta = "Milano"
print "Città:", citta

We support people doing that. While we're at it, why not support this
one as well:

# coding=utf-8
# stampa il nome della città
città = "Milano"
print "Città:", città

(for people unable to see it: there's a "LATIN SMALL LETTER A WITH
GRAVE" at the end of "citta" instead of a plain "LATIN SMALL LETTER A"
- in the second program, that's true for the identifier as well).
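For comparison: PEP 3131 was accepted for Python 3.0, which also made UTF-8 the default source encoding (PEP 3120), so Marco's second program becomes legal there without a `coding` line once `print` is written as a function. A minimal sketch, assuming Python 3:

```python
# Python 3 version of Marco's second example. UTF-8 is the default
# source encoding, so no coding declaration is needed.
# stampa il nome della città  (prints the name of the city)
città = "Milano"
print("Città:", città)

# "città" is a valid identifier, and a different name from ASCII "citta".
print("città".isidentifier(), "città" == "citta")  # True False
```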

Question is: if we support the former (right or wrong, we do), what do
we lose in supporting the latter? Most arguments against this PEP I
saw here apply to both programs, I'd like to see one that applies only
to the second one.

..TM.

Guest · May 15, 2007

Steven said:
That's because you can't read it, not because it uses Unicode. It could
be written entirely in ASCII, and still be unreadable and impossible to
understand.

That is a reason to actively encourage people to write their code in
English wherever possible, not one to allow non-ASCII identifiers,
which might even do the opposite.

That's no different from typos in ASCII. There's no doubt that we'll give
the same answer we've always given for this problem: unit tests, pylint
and pychecker.

Maybe it is no different (actually, I think it is: With ASCII, at least
my terminal font can display all the identifiers in a traceback), but
why do you want to create *more* possibilities to make mistakes?
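Steven's answer to look-alike names is tooling. As one illustration of what a lint-style check could do, here is a heuristic sketch (not taken from pylint, pychecker, or any other existing tool) that flags identifiers whose letters come from more than one script. It treats the first word of each letter's Unicode character name as its script, which approximates the real Unicode Script property rather than implementing it; `scripts_in` and `looks_mixed` are hypothetical names:

```python
import unicodedata


def scripts_in(name):
    """Approximate the set of scripts used by the letters of an identifier.

    Heuristic: the first word of a letter's Unicode name ("LATIN", "CYRILLIC",
    "GREEK", "CJK", ...) stands in for its script. Digits and "_" are skipped.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in name
        if ch.isalpha()
    }


def looks_mixed(name):
    """True if the identifier mixes letters from more than one (approximate) script."""
    return len(scripts_in(name)) > 1


for ident in ["disk_burnt", "città", "p\u0430ssword", "zähler"]:
    print(ident, looks_mixed(ident))
```

Only the third name, which hides a Cyrillic "а" among Latin letters, is flagged. A check like this cannot catch same-script confusions such as "disk_burnt"/"disk_bumt" or "call"/"cal1" from Steven's examples, and an identifier written entirely in a look-alike script also passes.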

Guest · May 15, 2007

Stefan said:
"go to" is not meant for clarity, nor does it encourage code readability.

Some people would argue that position.

But that's what this PEP is about.

IMHO, this PEP does not encourage clarity and readability, it
discourages it. Identifiers which my terminal cannot even display surely
are not very readable.

Stefan Behnel · May 15, 2007

René Fleschenberg said:
Programming is such an English-dominated culture that I even "think" in
English about it.

That's sad.

My experience is: If you know so little "technical" English that you
cannot come up with well chosen English identifiers, you need to learn
it.

This is not about "technical" English, this is about domain specific
English. How big is your knowledge about, say, biological terms or banking
terms in English? Would you say you're capable of modelling an application
from the domain of biology, well specified in a large German document, in
perfect English terms?

And: why would you want to do that?

Stefan

Guest · May 15, 2007

Stefan said:
Sounds like high time for an editor that supports the project team in their
work, don't you think?

I think your argument about "isolated projects" is flawed. It is not at
all unusual for code that was never intended to be public, whose authors
would have sworn that it would never ever need to be read by anyone
except themselves, to surprisingly go public at some point in the future.

Moreover, whether it uses ASCII-only identifiers or not might actually be
a factor in deciding whether it can then become useful for the community
as a whole or not.

If only some few projects that are so very very isolated from the rest
of the world profit from the change this PEP proposes, then it should
IMHO be feasible to require them to use a special Python branch. That
would keep the burden of all the possible problems away from the rest of
the Python community.

Guest · May 15, 2007

Stefan said:
I don't think software development at one of the biggest banks in Germany can
be considered a "rare and isolated use case".

And that software development at that bank is not done in Python because
Python does not support non-ASCII identifiers? Can you provide a source
for that?

Admittedly, it's done in Java, but why should Python fail to support Unicode
identifiers in the way Java does?

Your example does not prove much. The fact that some people use
non-ASCII identifiers when they can does not at all prove that it would
be a serious problem for them if they could not.

Stefan Behnel · May 15, 2007

René Fleschenberg said:
And that software development at that bank is not done in Python because
Python does not support non-ASCII identifiers? Can you provide a source
for that?

Your example does not prove much. The fact that some people use
non-ASCII identifiers when they can does not at all prove that it would
be a serious problem for them if they could not.

Are we trying to prove that?

And, would we have serious problems and people running from Python if Python
2.5 did not integrate the "with" statement?

Stefan

Stefan Behnel · May 15, 2007

René Fleschenberg said:
I think your argument about "isolated projects" is flawed. It is not at
all unusual for code that was never intended to be public, whose authors
would have sworn that it would never ever need to be read by anyone
except themselves, to surprisingly go public at some point in the future.

Ok, but how is "using non-ASCII identifiers" different from "using ASCII
identifiers transliterated from Mandarin" in that case?

Please try to understand what the point of this PEP is.

Stefan

Thorsten Kampe · May 15, 2007

* Eric Brunel (Tue, 15 May 2007 11:51:20 +0200)

* Eric Brunel (Tue, 15 May 2007 10:52:21 +0200)

On Tue, 15 May 2007 09:38:38 +0200, Duncan Booth wrote:
Recently there has been quite a bit of publicity about the One Laptop
Per Child project. The XO laptop is just beginning rollout to children
and provides two main programming environments: Squeak and Python. It
is an exciting thought that soon there will be millions of children in
countries such as Nigeria, Brazil, Uruguay or Nepal[*] who have the
potential to learn to program, but tragic if the Python community is
too arrogant to consider it acceptable to use anything but English and
ASCII.

You could say the same about Python standard library and keywords then.

You're mixing apples and peaches: identifiers (variable names) are
part of the user interface for the programmer and at his
disposition.

So what? Does it mean that it's acceptable for the standard library and
keywords to be in English only, but the very same restriction on
user-defined identifiers is out of the question?
Yes.

Why? If I can use my own
language in my identifiers, why can't I write:

classe MaClasse:
    définir __init__(moi_meme, maListe):
        moi_meme.monDictionnaire = {}
        pour i dans maListe:
            moi_meme.monDictionnaire = Rien

For a French-speaking person, this is far more readable than:

Because keywords are not meant to be extended or manipulated in any
similar way by programmers. Keywords are well known and only
a limited set of words. That's why you can't use keywords as
identifiers.
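Thorsten's distinction corresponds to two separate checks in Python 3: whether a string is syntactically a valid identifier (`str.isidentifier()`, which accepts non-ASCII letters now that PEP 3131 is in place) and whether it is a reserved keyword (`keyword.iskeyword()`). A minimal sketch; `usable_as_name` is an illustrative helper, not a standard function:

```python
import keyword


def usable_as_name(word):
    """True if word could name a variable: valid identifier syntax and not a keyword."""
    return word.isidentifier() and not keyword.iskeyword(word)


# Eric's French "keywords" are ordinary identifiers to Python; the English ones are reserved.
for word in ["classe", "définir", "pour", "dans", "class", "def", "for", "in"]:
    print(word, usable_as_name(word))
```

The French words all come out `True` and the English keywords `False`: the keyword set is fixed by the language, while names remain the programmer's choice.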

Identifiers, on the other hand, are at the user's disposition. The
convention for naming them is: give them the name that makes the most
sense in relation to the code. In a lot of cases this will mean
English names and the ASCII charset. And in some restricted environments
this means naming identifiers with terms from the native language. And
in this case it makes no sense at all to restrict these people to use
ASCII characters to write words in their own language.

There really is no difference between this and allowing strings or
comments in non-English languages with non-ASCII characters.

Thorsten

Guest · May 15, 2007

Stefan said:
That's sad.

I don't think so. It enables me to communicate about that topic with a
very broad range of other people, which is A Good Thing.

This is not about "technical" English, this is about domain specific
English. How big is your knowledge about, say, biological terms or banking
terms in English? Would you say you're capable of modelling an application
from the domain of biology, well specified in a large German document, in
perfect English terms?

As I have said, I don't need to be able to do that (model the
application in perfect English terms). It is better to model it in
non-perfect English terms than to model it in perfect German terms. Yes,
I do sometimes use a dictionary to look up the correct English term for
a domain-specific German word when programming. It is rarely necessary,
but when it is, I usually prefer to take that effort over writing a
mixture of German and English.

And: why would you want to do that?

1) To get the broadest possible range of coworkers and maintenance
programmers.

2) To be consistent. The code is more beautiful if it does not
continuously jump from one language to another. And the only way to
achieve that is to write it all in English, since the standard library
and a lot of other stuff is in English.

Guest · May 15, 2007

Stefan said:
Are we trying to prove that?

IMO, if you cannot prove it, the PEP should be rejected, since that
would mean it introduces new problems without any proven substantial
benefits.

And, would we have serious problems and people running from Python if Python
2.5 did not integrate the "with" statement?

1) Which additional potential for bugs and which hindrances for
code-sharing do you see with the with-statement?

2) The with-statement does have proven substantial benefits, IMO.
