PEP 3131: Supporting Non-ASCII Identifiers

Grant Edwards · May 16, 2007

I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest.

That's what it means to me (I'm also from the upper midwest).
One also hears the phrase "eyeball it" in the same context:
"You don't need to measure that, just eyeball it."

Aldo Cortesi · May 16, 2007

Thus spake Alex Martelli:

I can't find any reference for Steven's alleged idiomatic use of "by
eye", either -- _however_, my wife Anna (an American from Minnesota)
came up with exactly the same meaning when I asked her if "by eye" had
any idiomatic connotations, so I suspect it is indeed there, at least in
the Midwest. Funniest, of course, is that the literal translation into
Italian, "a occhio", has a similar idiomatic meaning to _any_ native
speaker of Italian -- and THAT one is even in the Italian wikipedia!-)

I'll be the first to admit that this issue has nothing to do with the
substance of the argument (on which my wife, also my co-author of the
2nd ed of the Python Cookbook and a fellow PSF member, deeply agrees
with you, Aldo, and me), but natural language nuances and curios are my
third-from-the-top most consuming interest (after programming and...
Anna herself!-).

I must admit to a fascination with language myself - I even have a degree in
English literature to prove it! To be fair to Steven, I've asked some of my
colleagues here in Sydney about their reactions to the phrase "by eye", and
none of them have yet come up with anything that has the strong pejorative
taint Steven gave it. At any rate, it's clear that the phrase is not well
defined anywhere (not even in the OED), and I'm sure there are substantial
regional variations in interpretation.

In cases like these, however, context is paramount, so I will quote the sentences
that started this petty bickering:

The security implications have not been sufficiently explored. I don't want
to be in a situation where I need to mechanically "clean" code (say, from a
submitted patch) with a tool because I can't reliably verify it by eye.

Surely, in context, the meaning is clear? "By eye" here means nothing more nor
less than a literal reading suggests. Taking these sentences to be an argument
for a slip-shod, careless approach to code, as Steven did, is surely perverse.

Regards,

Aldo

rurpy · May 16, 2007

We all know what the PEP is about (we can read). The point is: If we do
not *need* non-English/ASCII identifiers, we do not need the PEP. If the
PEP does not solve an actual *problem* and still introduces some
potential for *new* problems, it should be rejected. So far, the
"problem" seems to just not exist. The burden of proof is on those who
support the PEP.

I'm not sure how you conclude that no problem exists.
- Meaningful identifiers are critical in creating good code.
- Non-English speakers cannot create or understand
  English identifiers, hence can't create good code nor
  easily grok existing code.
Considering the vastly greater number of non-English
speakers in the world who are thus unable to use
Python effectively, this seems like a problem to me.

That all programmers know enough English to create and
understand English identifiers is currently speculation or
based on tiny, personally observed samples.

I will add my own personal observation supporting the
opposite. A Japanese programmer friend was working
on a project last fall for a large Japanese company in
Japan. A lot of their programming was outsourced to
Korea. While the liaison people on both sides communicated
in a mixture of English and Japanese, my understanding
was that almost all the programmers spoke almost
no English. The language used was Java. I don't know
how they handled identifiers but I have no reason to
believe they were English (though they may have been
transliterated Japanese).

Now that too is a tiny, personally observed sample,
so it carries no more weight than the others. But it
is enough to make me question the original assertion
that all programmers know English.

It's a big world and there are a lot of people out there.
Drawing conclusions based on 5 or 50 or 500 personal
contacts is pretty risky, particularly when being wrong
means putting up major barriers to Python use for
huge numbers of people.

Terry Reedy · May 16, 2007

| I must admit to a fascination with language myself - I even have a degree in
| English literature to prove it! To be fair to Steven, I've asked some of my
| colleagues here in Sydney about their reactions to the phrase "by eye", and
| none of them have yet come up with anything that has the strong pejorative
| taint Steven gave it. At any rate, it's clear that the phrase is not well
| defined anywhere (not even in the OED), and I'm sure there are substantial
| regional variations in interpretation.

As a native American, yes, 'by eye' is sometimes, maybe even often, used
with a pejorative intent.

| In cases like these, however, context is paramount, so I will quote sentences
| that started this petty bickering:

However, in this context
|
| > The security implications have not been sufficiently explored. I don't want
| > to be in a situation where I need to mechanically "clean" code (say, from a
| > submitted patch) with a tool because I can't reliably verify it by eye.

I read it just as Aldo claims.

| Surely, in context, the meaning is clear? "By eye" here means nothing more nor
| less than a literal reading suggests. Taking these sentences to be an argument
| for a slip-shod, careless approach to code, as Steven did, is surely perverse.

Perhaps because in this context, it is not at all clear what the 'more
exact' method would be.

Terry Jan Reedy

Pierre Hanser · May 16, 2007

(e-mail address removed) wrote:

It *does* solve a huge problem: I have to use degenerate French, with
spelling mistakes, or choose from a small subset of words, in order to use
only ASCII. I'm limited in my expression, and I resent this
every day!

This is true, even if commercial French programmers don't object to
the PEP because they have to use English in their own work. This
is something I really cannot understand.

It's an everyday problem, for millions of people!

And yes, sometimes I publish code (rarely), even if it uses French
identifiers, because someone looking for a real solution *does*
prefer an existing solution to nothing.

Hendrik van Rooyen · May 16, 2007

Hi!

- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Yes.

JScript can use letters with accents in identifiers
XML (1.1) can use letters with accents in tags
C# can use letters with accents in variables
SQL: MySQL/MS-SQL/Oracle/etc. can use accents in fields or requests
etc.
etc.

Python MUST make up for its lost time.

All those lemmings are jumping over a cliff!
I must hurry to keep up!

- Hendrik

Hendrik van Rooyen · May 16, 2007

Eric Brunel said:
So what? Does it mean that it's acceptable for the standard library and
keywords to be in English only, but the very same restriction on
user-defined identifiers is out of the question? Why? If I can use my own
language in my identifiers, why can't I write:

    classe MaClasse:
        définir __init__(moi_même, maListe):
            moi_même.monDictionnaire = {}
            pour i dans maListe:
                moi_même.monDictionnaire[i] = Rien

For a French-speaking person, this is far more readable than:

    class MaClasse:
        def __init__(self, maListe):
            self.monDictionnaire = {}
            for i in maListe:
                self.monDictionnaire[i] = None

Now, *this* is mixing apples and peaches... And this would look even
weirder with a non-indo-european language...

I don't have any French, but I support this point absolutely - having
native identifiers is NFG if you can't also have native reserved words.

You may be stuck with English sentence construction though. - Would
be hard, I would imagine, to let the programmer change the word order,
or to incorporate something as weird as the standard double negative
in Afrikaans...

We say things that translate literally to: "I am not a big man not.", and it
is completely natural, so the if statements should follow the pattern.

- Hendrik

Guest · May 16, 2007

Steven said:
But they aren't new risks and problems, that's the point. So far, every
single objection raised ALREADY EXISTS in some form or another.

No. The problem "The traceback shows function names having characters
that do not display on most systems' screens" for example does not exist
today, to the best of my knowledge. And "in some form or another"
basically means that the PEP would create more possibilities for things
to go wrong. That things can already go wrong today does not mean that
it does not matter if we create more occasions where things can go wrong
even worse.

There's all this hysteria about the problems the proposed change will cause, but
those problems already exist. When was the last time a Black Hat tried to
smuggle in bad code by changing an identifier from xyz0 to xyzO?

Agreed, I don't think intended malicious use of the proposed feature
would be a big problem.
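
The xyz0 / xyzO comparison can be made concrete. What follows is only a minimal sketch, assuming Python 3 (where non-ASCII identifiers are accepted); the helper name `describe` and the Cyrillic variant are made up for illustration and are not taken from the PEP. It shows that identifiers which look alike on screen are still different strings, and that the difference can be read off mechanically from the Unicode character names:

```python
import unicodedata


def describe(identifier):
    """Return the Unicode name of each character in an identifier."""
    return [unicodedata.name(ch) for ch in identifier]


latin = "xyzO"          # ends with LATIN CAPITAL LETTER O
digit = "xyz0"          # ends with DIGIT ZERO
cyrillic = "xyz\u041e"  # ends with CYRILLIC CAPITAL LETTER O (illustrative only)

for name in (latin, digit, cyrillic):
    # Print the identifier, whether Python accepts it, and its last character's name.
    print(repr(name), name.isidentifier(), describe(name)[-1])
```

Whether this kind of look-alike is a realistic attack is exactly what is being argued here; the sketch only shows that the ASCII case and the non-ASCII case can be told apart by a tool even when the eye cannot.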

Like the 5.5 billion people who speak no English.

No. The X people who speak "no English" and program in Python. I think X
actually is very low (close to zero), because programming in Python
virtually does require you to know some English, whether you can use
non-ASCII characters in identifiers or not. It is naive to believe that
you can program in Python without understanding any English once you can
use your native characters in identifiers. That will not happen. Please
understand that: You basically *must* know some English to program in
Python, and the reason for that is not that you cannot use non-ASCII
identifiers.

I admit that there may be occasions where you have domain-specific terms
that are hard to translate into English for a programmer. But is it
really not feasible to use an ASCII transliteration in these cases? This
does not seem to have been such a big problem so far, or else we would
have seen more discussions about it, I think.

Maybe so. But I guarantee without a shadow of a doubt that if the change
were introduced, people would use it -- even if right now they say they
don't want it.

Well, that is exactly what I would like to avoid ;-)

Hendrik van Rooyen · May 16, 2007

If non-ASCII identifiers become a reality, I think it will be the best
gift for children who do not know English.

How do you feel about the mix of English keywords and Chinese?
How do the English-like "sentences" look to a Chinese speaker?

Would you support the extension of this PEP to include Chinese
Keywords?

Would that be a lesser or greater gift?

- Hendrik

Guest · May 16, 2007

Steven said:
It won't be gibberish to the people who speak the language.

Hmmm, did you read my posting? By my experience, it will. I wonder: is
English an acquired language for you?

Guest · May 16, 2007

Gregor said:
If comments are allowed to be non-English, then why are identifiers not?

I don't need to be able to type in the exact characters of a comment in
order to properly change the code, and if a comment does not display on
my screen correctly, I am not fscked as badly as when an identifier
does not display (e.g. in a traceback).

Gregor Horvath · May 16, 2007

René Fleschenberg said:
today, to the best of my knowledge. And "in some form or another"
basically means that the PEP would create more possibilities for things
to go wrong. That things can already go wrong today does not mean that
it does not matter if we create more occasions where things can go wrong
even worse.

Following this logic we should not add any new features at all, because
all of them can go wrong and can be used the wrong way.

I love Python because it does not dictate how to do things.
I do not need an ASCII-Dictator, I can judge myself when to use this
feature and when to avoid it, like any other feature.

Gregor

Hendrik van Rooyen · May 16, 2007

..

This is not about "technical" English, this is about domain specific
English. How big is your knowledge about, say, biological terms or banking
terms in English? Would you say you're capable of modelling an application
from the domain of biology, well specified in a large German document, in
perfect English terms?

And: why would you want to do that?

Possibly because it looks better and reads easier than
a dog ugly mix of perfectly good German words
all mixed up with English keywords in an English
style of sentence construction?

- Hendrik

Guest · May 16, 2007

I'm not sure how you conclude that no problem exists.
- Meaningful identifiers are critical in creating good code.

I agree.

- Non-English speakers cannot create or understand
  English identifiers, hence can't create good code nor
  easily grok existing code.

I agree that this is a problem, but please understand that this problem is
_not_ solved by allowing non-ASCII identifiers!

Considering the vastly greater number of non-English
speakers in the world who are thus unable to use
Python effectively, this seems like a problem to me.

Yes, but this problem is not really addressed by the PEP. If you want to
do something about this:
1) Translate documentation.
2) Create a way to internationalize the standard library (and possibly
the language keywords, too). Ideally, create a general standardized way
to internationalize code, possibly similar to how people
internationalize strings today.

When that is done, non-ASCII identifiers could become useful. But of
course, doing that might create a host of other problems.

That all programmers know enough English to create and
understand English identifiers is currently speculation or
based on tiny, personally observed samples.

It is based on a look at the current Python environment. You do *at
least* have the problem that the standard library uses English names.
This assumes that there is documentation in the native language that is
good enough (i.e. almost as good as the official one), which I can tell
is not the case for German.

Raffaele Salmaso · May 16, 2007

After reading the whole thread, and based on my experience (I'm Italian;
English is not my native language):

- should non-ASCII identifiers be supported? yes
- why?

Years ago I read C code written by a Turkish guy, and all the identifiers
were transliterations of Arabic (Persian? I don't know) words.
What did I understand of this code? Nothing. 0 (zero). Not a word.
Would it have been different if it had used Unicode identifiers? Not at all.

- would you use them if it was possible to do so?

yes

--
()_() | NN KAPISCO XK' CELLHAVETE T'ANNTO CN ME SL | +----
(o.o) | XK' SKRIVO 1 P'HO VELLOCE MA HALL'ORA DITTELO | +---+
'm m' | KE SIETE VOI K CI HAVVETE PROBBLEMI NO PENSATECI | O |
(___) | HE SENZA RANKORI CIAOOOO |
raffaele punto salmaso at gmail punto com

Guest · May 16, 2007

Gregor said:
Following this logic we should not add any new features at all, because
all of them can go wrong and can be used the wrong way.

No, that does not follow from my logic. What I say is: When thinking
about whether to add a new feature, the potential benefits should be
weighed against the potential problems. I see some potential problems
with this PEP and very few potential benefits.

I love Python because it does not dictate how to do things.
I do not need an ASCII-Dictator, I can judge myself when to use this
feature and when to avoid it, like any other feature.

*That* logic can be used to justify the introduction of *any* feature.

Eric Brunel · May 16, 2007

Ok, so I'm an Open-Source guy who happens to work in-house. And I'm a
supporter of PEP 3131. I admit that I was simplifying in my round-up.

But I would say that "irresponsible" is a pretty self-centered word in this
context. Can't you imagine that those who take the "irresponsible" decisions
of working on (and starting) projects in "another language than English" are
maybe as responsible as you are when you take the decision of starting a
project in English, but in a different context? It all depends on the specific
constraints of the project, i.e. environment, developer skills, domain, ...

The more complex an application domain, the more important is clear and
correct domain terminology. And software developers just don't have that. They
know their own domain (software development with all those concepts, languages
and keywords), but there is a reason why they develop software for those who
know the complex professional domain in detail but do not know how to develop
software. And it's a good idea to name things in a way that is consistent with
those who know the professional domain.

That's why keywords are taken from the domain of software development and
identifiers are taken (mostly) from the application domain. And that's why I
support PEP 3131.

You keep eluding the question: even if the decisions made at the project
start seem quite sensible *at that time*, if the project ends up
maintained in Korea, you *will have* to translate all your identifiers to
something displayable, understandable and typable by (almost) anyone,
a.k.a ASCII-English... Since - as I already said - I'm quite convinced
that any application bigger than the average quick-n-dirty throwaway
script is highly likely to end up in a different country than its original
coders', you'll end up losing the time you appeared to have gained in the
beginning. That's what I called "irresponsible" (even if I admit that the
word was a bit strong...).

Anyway, concerning the PEP, I've finally "put some water in my wine" as we
say in French, and I'm not so strongly against it now... Not for the
reasons you give (so we can continue our flame war on this ;-) ), but
mainly considering Python's usage in a learning context: this is a valid
reason why non-ASCII identifiers should be supported. I just hope I'll get
a '--ascii-only' switch on my Python interpreter (or any other means to
forbid non-ASCII identifiers and/or strings and/or comments).
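
Short of an interpreter switch, a check like that can be approximated outside the interpreter. The following is only a sketch under stated assumptions: it targets Python 3.7+ (for `str.isascii`), the function name `non_ascii_identifiers` is hypothetical, and no '--ascii-only' option is assumed to exist. It uses the standard `tokenize` module to list identifiers that contain non-ASCII characters, while leaving strings and comments alone:

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (name, line_number) pairs for identifiers with non-ASCII characters."""
    found = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # NAME tokens cover identifiers and keywords; keywords are always ASCII.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            found.append((tok.string, tok.start[0]))
    return found


sample = "compteur = 1\n\u00e9tat = compteur + 1  # commentaire accentu\u00e9\n"
for name, line in non_ascii_identifiers(sample):
    print("line %d: non-ASCII identifier %r" % (line, name))
```

A project that wants ASCII-only source could run something like this as a commit hook or as part of its test suite; extending it to strings or comments would mean also inspecting the STRING and COMMENT token types.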

Stefan Behnel · May 16, 2007

René Fleschenberg said:
I agree.

I agree that this is a problem, but please understand that this problem is
_not_ solved by allowing non-ASCII identifiers!

Well, as I said before, there are three major differences between the stdlib
and keywords on one hand and identifiers on the other hand. Ignoring arguments
does not make them any less true.

So, the problem is partly tackled by the people who face it by writing
degenerated transliterations and language mix in identifiers, but it would be
*solved* by means of the language if Unicode identifiers were available.

Stefan

Marc 'BlackJack' Rintsch · May 16, 2007

The main problem here seems to be proving the need of something to people who
do not need it themselves. So, if a simple "but I need it because a, b, c" is
not enough, what good is any further proof?

Maybe all the (potential) programmers that can't understand English and
would benefit from the ability to use non-ASCII characters in identifiers
could step up and take part in this debate. In an English speaking
newsgroup… =)

There are potential users of Python who don't know much English or no
English at all. This includes kids, old people, people from countries
that have "letters" that are not that easy to transliterate like European
languages, people who just want to learn Python for fun or to customize
their applications like office suites or GIS software with a Python
scripting option.

Some people here seem to think the user base is or should be only from the
computer science domain. Yes, if you are a programming professional it
may be mandatory to be able to write English identifiers, comments and
documentation, but there are not just programming professionals out there.

Ciao,
Marc 'BlackJack' Rintsch

Stefan Behnel · May 16, 2007

René Fleschenberg said:
I don't need to be able to type in the exact characters of a comment in
order to properly change the code, and if a comment does not display on
my screen correctly, I am not fscked as badly as when an identifier
does not display (e.g. in a traceback).

Then get tools that match your working environment.

Stefan
