PEP 3131: Supporting Non-ASCII Identifiers

rurpy · May 16, 2007

Hendrik van Rooyen said:
I think this cuts right down to why I oppose the PEP.
It is not so much for technical reasons as for aesthetic
ones - I find reading a mix of languages horrible, and I am
kind of surprised by the strength of my own reaction.

But to reiterate, most public code will remain english
because that is the only practical way of managing an
international project.

I don't understand this almost pathological fear that
if the PEP is adopted, the world will be deluged by
a torrent of non-english programs. 99.9% of such programs
will be born and die in an environment where only speakers
of those languages will touch them.

The few that leak into the wider world will have to
be internationalized before most people will consider
adopting them, volunteering to maintain them, etc.

And as has already been pointed out, this is already the
case. How can you maintain a python program written
with only ascii identifiers but transliterated from
a non-english language and with documentation, comments,
prompts and messages in that language?

This situation exists right now and it hasn't caused
the end of python-programming-as-we-know-it.

If I try to analyse my feelings, I think that really the PEP
does not go far enough, in a sense, and from memory
it seems to me that only E Brunel, R Fleschenberg and
to a lesser extent the Martellibot seem to somehow think
in a similar way as I do, but I seem to have an extreme
case of the disease...

And the summaries of reasons for and against have left
out objections based on this feeling of ugliness of mixed
language.

Interestingly, the people who seem to think a bit like that all
seem to be non native English speakers who are fluent in
English.

I have read that people who move to, or become citizens
of a new country often become far more patriotic and
defensive of their new country, than their native-born
compatriots.

While the support seems to come from people whose English
is perfectly adequate, but who are unsure to the extent that they
apologise for their "bad" English.

Is this a pattern that you have identified? - I don't know.

I still don't like the thought of the horrible mix of "foreign"
identifiers and English keywords, coupled with the English
sentence construction. And that, in a nutshell, is the main
reason for my rather vehement opposition to this PEP.

The other stuff about sharing and my inability to even type
the OP's name correctly with the umlaut is kind of secondary
to this feeling of revulsion.

Interesting explanation, thanks. I personally feel a
lot of the reaction against the PEP involves psychological
drivers like loss of control and loss of status but am
not a psychologist so it would be too much work for me
to try and defend, so I won't try to.

I'll just say I think that making Python (significantly!!)
more accessible to non-English speakers is far too important,
both to those potential new users and to Python itself,
for it to be decided by "feelings".

"Beautiful is better than ugly"

"Beauty is in the eye of the beholder"

sjdevnull · May 16, 2007

So the solution is to forbid Chinese XP?

It's one solution, depending on your support needs.

Independent of Python, several companies I've worked at in Ecuador
(entirely composed of native Spanish-speaking Ecuadoreans) use the
English-language OS/application installations--they of course have the
Spanish dictionaries and use Spanish in their documents, but for them,
having localized application menus generates a lot more problems than
it solves.

Matthew Woodcraft · May 16, 2007

Eric Brunel said:
Joke aside, this just means that I won't ever be able to program math in
ADA, because I have absolutely no idea on how to do a 'pi' character on my
keyboard.

Just in case it wasn't clear: you could of course continue to use the
old name 'Pi' instead.

-M-

rurpy · May 16, 2007

I agree.

I agree that this is a problem, but please understand that this problem is
_not_ solved by allowing non-ASCII identifiers!

Yes, but this problem is not really addressed by the PEP.

I agree that the PEP does not provide a perfect solution
(whatever that is) to the difficulties faced by non-english
speaking Python users, but it provides a big and useful
improvement.

If you want to
do something about this:
1) Translate documentation.

Done. (In some cases.) For example here are the Standard
Library and Python Tutorial in Japanese:

http://www.python.jp/doc/release/lib/lib.html
http://www.python.jp/doc/release/tut/tut.html

(I mentioned this yesterday in
http://groups.google.com/group/comp.lang.python/msg/6ca67e21e9dc5358?hl=en&
but I can't criticize anyone for missing messages in this
humongous discussion.)

2) Create a way to internationalize the standard library (and possibly
the language keywords, too). Ideally, create a general standardized way
to internationalize code, possibly similar to how people
internationalize strings today.

Why? Or more accurately, why before adopting the PEP?
The library is very usable by non-english speakers as long as
there is documentation in their native language. It would be
nice to have an internationalized standard library but there is
no reason why this should be a prerequisite to the PEP.

When that is done, non-ASCII identifiers could become useful. But of
course, doing that might create a hog of other problems.

I disagree, non-ascii identifiers are an immensely useful
change, right now. Python is somewhat useable by non-english
speakers now, but the identifier issue is a significant barrier.
If I can't write code with identifiers that are meaningful to
me (and my non-fluent-in-english colleagues) then I either
write code that is hard to understand by anyone (using ascii
transliterations) or write code understandable to you but
not me (using english). Neither option makes sense and
in practice I just use some language other than Python.

It is based on a look at the current Python environment. You do *at
least* have the problem that the standard library uses English names.

I don't know every nook and corner of the standard library.
Even as an english speaker, I only look up names for things
I already know. Those same things I recognise in code
because I use them. Otherwise I look in the index (which
is in my native language if I am not fluent in english). True,
I can't use docstrings effectively. And true, I can't guess at
the use of an unfamiliar name (but I have documentation).
Neither of those prevents my effective use of Python, nor
negates the immense value to me of being able to write code
that I and my colleagues can maintain.

So I see the use of english names in the standard lib as
a small problem, certainly not a reason to reject the PEP.

This assumes that there is documentation in the native language that is
good enough (i.e. almost as good as the official one), which I can tell
is not the case for German.

There is a chicken-and-egg problem here. Why would
many non-english speaking people want to adopt Python
if they cannot write maintainable (for them) programs in it?
If there aren't many non-english speaking Python users,
why would anyone want to put the effort into translating
docs for them? This is particularly true for people that
use scripts not based on latin letters.

rurpy · May 16, 2007

Christophe wrote: .....snip...

Thanks but no--I work with a _lot_ of code I didn't write, and looking
through stack traces from 3rd party packages is not uncommon.

Are you worried that some 3rd-party package you have
included in your software will have some non-ascii identifiers
buried in it somewhere? Surely that is easy to check for?
Far easier than checking that it doesn't have some trojan
code in it, it seems to me.
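
The check described above can be sketched with the standard tokenize module. This is only a minimal illustration, not something proposed in the thread: the function name non_ascii_identifiers and the sample source are made up for the example, and it assumes a Python 3 interpreter, where such identifiers are reported as NAME tokens. Comments and string literals are deliberately ignored.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (line, column, name) for every identifier with a non-ASCII character."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME covers identifiers and keywords; keywords are always ASCII.
        if tok.type == tokenize.NAME and any(ord(ch) > 127 for ch in tok.string):
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


# Hypothetical third-party snippet: one non-ASCII identifier used twice,
# plus a comment and a string literal that should not be reported.
sample = "größe = 3\nprint(größe)  # Kommentar mit ü\ntext = 'ä'\n"
found = non_ascii_identifiers(sample)
for line, column, name in found:
    print("line %d, column %d: %s" % (line, column, name))
```

Running the same function over every .py file of a package before including it would give the kind of audit rurpy has in mind.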

And I'm often not creating a stack trace procedure, I'm using the
built-in python procedure.

And I'm often dealing with mailing lists, Usenet, etc where I don't
know ahead of time what the other end's display capabilities are, how
to fix them if they don't display what I'm trying to send, whether
intervening systems will mangle things, etc.

I think we all are in this position. I always send plain
text mail to mailing lists, people I don't know etc. But
that doesn't mean that email software should be constrained
to only 7-bit plain text, no attachments! I frequently use
such capabilities when they are appropriate.

If your response is, "yes, but look at the problems html
email, virus-infected attachments, etc. cause", the situation
is not the same. You have little control over what kind of
email people send you but you do have control over what
code, libraries, patches, you choose to use in your
software.

If you want to use ascii-only, do it! Nobody is making
you deal with non-ascii code if you don't want to.

Gregor Horvath · May 16, 2007

Why? Or more accurately, why before adopting the PEP?
The library is very usable by non-english speakers as long as
there is documentation in their native language. It would be

Microsoft once translated their VBA to foreign languages.
I didn't use it because I was used to "English" code.
If I program in mixed cultural contexts I have to use the lowest
common denominator. Mixing the symbols of the programming language is confusing.

A long time ago, at the age of 12, I learned programming using English
computer books. Back then there were no German books at all. It was not easy.
It would have been completely impossible if our school system had not
been wise enough to teach us English early.

I think millions of people are handicapped because of this.
Any step to improve this is a good step for all of us. No doubt
a lot of talent is wasted because of this wall.

Gregor

Gregor Horvath · May 16, 2007

It's one solution, depending on your support needs.

That would be a rather arrogant solution.
You would consider dropping the language and culture of millions of
users because a small number of support team staff does not understand
it? I would recommend to drop the support team and the management that
even considers this.

This PEP is not a technical question.
Technically it would not change much.

The underlying question is a philosophical one.
Should computer programming only be easily accessible to a small fraction
of privileged individuals who had the luck to be born in the correct
countries?

Should the unfounded and maybe xenophobic fear of losing power and
control of a small number of those already privileged be a guide for
development?

Gregor

Gabriel Genellina · May 17, 2007

En Mon said:
Although probably not-sufficient to overcome this built-in
bias, it would be interesting if some bi-lingual readers would
raise this issue in some non-english Python discussion
groups to see if the opposition to this idea is as strong
there as it is here.

Survey results from a Spanish-speaking group and a local group from
Argentina:
Yes: 6
No: 3
Total: 9

Comments summary:

- Spanish requires few additional characters in addition to ASCII letters:
ñáéíóúü, so there is no great need of Unicode identifiers by Spanish
developers.

- Python can be embedded and extended using libraries - in those cases,
what matters mostly is the domain specific usage. Letting the final users
write their scripts/tasklets/etc using domain-specific and
language-specific names would be a great thing.

- Would be nice if class attribute names could correspond to table column
names directly; would be nice to use the Pi greek symbol, for example, in
math formulae.

- Others raised already seen concerns: about source code legibility; being
unable to type identifiers; risk of keywords being translated; that you
can't know in advance whether your code will become widely published so
best to use English identifiers from start.

- Someone proposed using escape sequences of some kind, supported by
editor plugins, so there is no need to modify the parser.

- Refactoring tools should let you rename foreign identifiers into ASCII
only.
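
Two of the wishes in this summary (the Pi symbol in formulae, and attribute names matching table column names) might look like the sketch below under PEP 3131. All names here are hypothetical and chosen only for illustration; the code assumes a Python 3 interpreter and a UTF-8 source file.

```python
import math

# Hypothetical names for illustration only.
π = math.pi


def área_círculo(radio):
    # Area of a circle with the given radius.
    return π * radio ** 2


class Artículo:
    # Attribute names mirror hypothetical table columns "descripción" and "año".
    def __init__(self, descripción, año):
        self.descripción = descripción
        self.año = año


libro = Artículo("Cien años de soledad", 1967)
print(área_círculo(2.0), libro.año)
```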

rurpy · May 17, 2007

It's one solution, depending on your support needs.

Independent of Python, several companies I've worked at in Ecuador
(entirely composed of native Spanish-speaking Ecuadoreans) use the
English-language OS/application installations--they of course have the
Spanish dictionaries and use Spanish in their documents, but for them,
having localized application menus generates a lot more problems than
it solves.

Isn't the point of PEP-3131 free choice? How would
Ecuadoreans feel if their government mandated all
computers must use English?

Hendrik van Rooyen · May 17, 2007

This is a matter of taste.

I agree - and about perceptions of quality. Of what is good,
and not good. - If you haven't yet, read Robert Pirsig's book:
"Zen and the Art of Motorcycle Maintenance"

In some programs I use German identifiers (not unicode). I and others
like the mix. My customers can understand the code better. (They are
only reading it)

I can sympathise a little bit with a customer who tries to read code.
Why that should be necessary, I cannot understand - does the stuff
not work to the extent that the customer feels he has to help you?
You do not talk as if you are incompetent, so I see no reason why
the customer should want to meddle in what you have written, unless
he is paying you to train him to program, and as Eric Brunel has
pointed out, this mixing of languages is all right in a training environment.

Correct.
But why do you think you should enforce your taste to all of us?

You misjudge me - the OP asked if I would use the feature, and I am
speaking for myself when I explain why I would not use it.

With this logic you should all drive Alfa Romeos!

Actually no - this is not about logic - my post clearly stated
that I was talking about feelings. And the only logic that applies
to feelings is the incontrovertible fact that they exist, and that it
makes good logical sense to acknowledge them, and to take that
into account in one's actions.

And as far as Alfas go - we have found here that they are rather
soft - our dirt roads destroy them in no time. : - (

- Hendrik

Hendrik van Rooyen · May 17, 2007

How do you think you'd feel if Python had less in the way of
(conventionally used) English keywords/builtins. Like, say, Perl?

Would not like it at all, for the same reason I don't like re's -
It looks like random samples out of alphabet soup to me.

- Hendrik

Duncan Booth · May 17, 2007

Gabriel Genellina said:
- Someone proposed using escape sequences of some kind, supported by
editor plugins, so there is no need to modify the parser.

I'm not sure whether my suggestion below is the same as or a variation
on this.

- Refactoring tools should let you rename foreign identifiers into
ASCII only.

A possible modification to the PEP would be to permit identifiers to
also include \uxxxx and \Uxxxxxxxx escape sequences (as some other
languages already do). Then you could have a script easily (and
reversibly) convert all identifiers to ascii or indeed any other
encoding or subset of unicode using escapes only for the unrepresentable
characters.

I think this would remove several of the objections: such as being
unable to tell at a glance whether someone is trying to spoof your
variable names, or being unable to do minor maintenance on code using
character sets which your editor doesn't support: you just run the
script which would be included with every copy of Python to restrict the
character set of the source files to whatever character set you feel
happy with. The script should also be able to convert unrepresentable
characters in strings and comments (although that last operation
wouldn't be guaranteed reversible).
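
The reversible part of this suggestion can be sketched at the level of a single identifier. This is a minimal illustration under stated assumptions: the helper names escape_name and unescape_name are invented here, and the escaped form is only a transport notation, since Python itself does not accept \u escapes inside identifiers. The round trip is unambiguous because a valid identifier never contains a backslash.

```python
import re


def escape_name(name):
    # Replace each non-ASCII character with a \uXXXX escape
    # (or \UXXXXXXXX for code points above U+FFFF).
    out = []
    for ch in name:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            out.append("\\U%08x" % cp)
    return "".join(out)


_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def unescape_name(escaped):
    # Inverse of escape_name: turn each escape back into its character.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), escaped)


name = "größe"
escaped = escape_name(name)
restored = unescape_name(escaped)
print(escaped, restored == name)
```

A full script as described would apply this only to identifier tokens, so that escapes already present in string literals are left alone.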

Of course it doesn't do anything for the objection about such
identifiers being ugly, but you can't have everything.

Gregor Horvath · May 17, 2007

Hendrik said:
I can sympathise a little bit with a customer who tries to read code.
Why that should be necessary, I cannot understand - does the stuff
not work to the extent that the customer feels he has to help you?
You do not talk as if you are incompetent, so I see no reason why
the customer should want to meddle in what you have written, unless
he is paying you to train him to program, and as Eric Brunel has
pointed out, this mixing of languages is all right in a training environment.

That is highly domain- and customer-specific logic that the
customer knows best. (For example, the variation logic of window and door
manufacturers.)
He has to understand the code, so that he can verify it's correct.
We are in fact developing it together.
Some customers even code this logic themselves. Some of them are
not fluent in English, especially not in the computer domain.

Translating the logic into documentation is a waste of time if the
code is self documenting and easy to grasp. (As python usually is) But
the code can only be self documenting if it is written in the domain
specific language of the customer. Sometimes these are words that are
not even used in general German. Even in German different customers are
naming the same thing with different words. Talking and coding in the
language of the customer is a huge benefit.

Gregor

Martin v. Löwis · May 17, 2007

PEP 3131 uses a similar definition to C# except that PEP 3131
disallows formatting characters (category Cf). See section 9.4.2 of
http://www.ecma-international.org/publications/standards/Ecma-334.htm

UAX#31 discusses formatting characters in 2.2, and recognizes that
there might be good reasons to allow (and ignore) them; however,
it recommends against doing so except in special cases.

So I decided to disallow them.
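
To make "category Cf" concrete, the standard unicodedata module can report which characters of a candidate name are formatting characters. This is only a sketch: the function name format_characters and the sample names are assumptions made for the example, and it shows the category test alone, not the PEP's full identifier definition.

```python
import unicodedata


def format_characters(name):
    """Return (index, code point) for each character of general category Cf."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(name)
        if unicodedata.category(ch) == "Cf"
    ]


# "\u200d" is ZERO WIDTH JOINER, an invisible formatting character.
hits = format_characters("ab\u200dcd")
print(hits)
```

Because such characters are invisible, two names that differ only by one of them would look identical on screen.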

Regards,
Martin

Martin v. Löwis · May 17, 2007

Now look me in the eye and tell me that you find
the mix of proper German and English keywords
beautiful.

I can't admit that, but I find that using German
class and method names is beautiful. The rest around
it (keywords and names from the standard library)
are not English - they are Python.

(look me in the eye and tell me that "def" is
an English word, or that "getattr" is one)

Regards,
Martin

Martin v. Löwis · May 17, 2007

A possible modification to the PEP would be to permit identifiers to
also include \uxxxx and \Uxxxxxxxx escape sequences (as some other
languages already do).

Several languages do that (e.g. C and C++), but I deliberately left
this out, as I cannot see this working in a practical way. Also,
it could be added later as another extension if there is an actual
need.

I think this would remove several of the objections: such as being
unable to tell at a glance whether someone is trying to spoof your
variable names,

If you are willing to run a script on the patch you receive, you
can perform that check even without having support for the \u
syntax in the language - either you convert to the \u notation,
and then check manually (converting back if all is fine), or you
have an automated check (e.g. at commit time) that checks for
conformance to the style guide.

or being unable to do minor maintenance on code using
character sets which your editor doesn't support: you just run the
script which would be included with every copy of Python to restrict the
character set of the source files to whatever character set you feel
happy with. The script should also be able to convert unrepresentable
characters in strings and comments (although that last operation
wouldn't be guaranteed reversible).

Again, if it's reversible, you don't need support for it in the
language. You convert to your editor's supported Unicode subset,
edit, then convert back.

However, I somewhat doubt that this case "my editor cannot display
my source code" is likely to occur: if the editor cannot display
it, you likely have a ban on those characters, anyway.

Regards,
Martin

Bjoern Schliessmann · May 17, 2007

Martin v. Löwis said:
I can't admit that, but I find that using German
class and method names is beautiful. The rest around
it (keywords and names from the standard library)
are not English - they are Python.

(look me in the eye and tell me that "def" is
an English word, or that "getattr" is one)

He's got a point (a small one though). For example:

- self (can be changed though)
- is
- with
- isinstance
- try

Regards,

Björn

Martin v. Löwis · May 17, 2007

Consequently, Python's keywords and even the standard library can
exist with names being "just symbols" for many people.

I already said this on the py3k list: Until a week ago, I didn't know
why "pass" was chosen for the "no action" statement - with all my
English knowledge, I still could not understand why the opposite
of "fail" should mean "no action".

Still, I have been using "pass" for more than 10 years now, without
ever questioning what it means in English, and I've successfully
used it as a token. Except for the first draft of Das Python-Buch,
where I, from memory, thought the statement should be "skip";
I remembered it had four letters, and meant "go to the next line".

Now I understand it is meaning 12 in Merriam-Webster's dictionary,
a) "to decline to bid, double, or redouble in a card game", or b)
"to let something go by without accepting or taking
advantage of it".

Regards,
Martin

Guest · May 17, 2007

IMO, the burden of proof is on you. If this PEP has the potential to
introduce another hindrance for code-sharing, the supporters of this PEP
should be required to provide a "damn good reason" for doing so. So far,
you have failed to do that, in my opinion. All you have presented are
vague notions of rare and isolated use-cases.

The PEP explicitly states what the damn good reason is: "Such developers
often desire to define classes and functions with names in their native
languages, rather than having to come up with an (often incorrect)
English translation of the concept they want to name."

So the reason is that with this PEP, code clarity and readability will
become better. It's the same reason as for many other features
introduced into Python recently, e.g. the with statement.

If you doubt the claim, please indicate which of these three aspects
you doubt:
1. there are programmers who desire to define classes and functions
with names in their native language.
2. those developers find the code clearer and more maintainable than
if they had to use English names.
3. code clarity and maintainability is important.

Regards,
Martin

Guest · May 17, 2007

You could say the same about Python standard library and keywords then.
Shouldn't these also have to be translated? One can even push things a
little further: I don't know about the languages used in the countries
you mention, but for example, a simple construction like 'if <condition>
<do something>' will look weird to a Japanese (the Japanese language has
a "post-fix" feel: the equivalent of the 'if' is put after the
condition). So why enforce an English-like sentence structure?

The Python syntax does not use an English-like sentence structure.
In English, a statement follows the pretty strict sequence of subject,
predicate, object (SPO). In Python, statements don't have a subject;
some don't even have a verb (e.g. assignments).

Regardless, this PEP does not propose to change the syntax of the
language, because doing so would cause technical problems - unlike
the proposed PEP, which does not cause any technical problems to
the language implementation whatsoever (and only slight technical
problems to editors, which aren't worse than the ones caused by
PEP 263).

You have a point here. When learning to program, or when programming for
fun without any intention to do something serious, it may be better to
have a language supporting "native" characters in identifiers. My
problem is: if you allow these, how can you prevent them from going
public someday?

You can't, and you shouldn't. What you can prevent is that the code
enters *your* project. I cannot see why you want to censor what code
other people publish.

Regards,
Martin
