PEP 3131: Supporting Non-ASCII Identifiers

Stefan Behnel

Bruno said:
but CS is english-speaking, period.

That's a wrong assumption. I understand that people can get this impression
when they deal a lot with Open Source code, but I've seen a lot of places
where code was produced that was never written to become publicly available (and
believe me, it *never* will become Open Source). Those projects made heavy use of
identifiers with domain-specific names, and such names are best expressed in the
language your client knows and expresses concepts in. That is definitely not the
language you claim to be the only language in CS.

Stefan
 
Anton Vredegoor

Neil Hodgson said (replying to Martin v. Löwis):
I support this to ease integration with other languages and
platforms that allow non-ASCII letters to be used in identifiers. Python
has a strong heritage as a glue language and this has been enabled by
adapting to the features of various environments rather than trying to
assert a Pythonic view of how things should work.

Neil
Ouch! Now I seem to be disagreeing with the one who writes my editor.
What will become of me now?

A.
 
Nick Craig-Wood

Martin v. Löwis said:
So, please provide feedback, e.g. perhaps by answering these
questions:

Firstly on the PEP itself:

It defines the characters that would be allowed. However, not being up to
speed on Unicode jargon, I don't have a clear idea which characters those
are. A page with some examples, or even all possible allowed characters,
would be great, plus some examples of disallowed characters.
- should non-ASCII identifiers be supported? why?

Only if PEP 8 were amended to state that only ASCII characters should
be used for publicly released / library code. I'm quite happy with
Unicode in comments / docstrings (but that is supported already).
- would you use them if it was possible to do so? in what cases?

My initial reaction is that it would be cool to use all those great
symbols. A variable called OHM etc! However, on reflection, I think it
would be a step back for the easy-to-read nature of python.

My worries are :-

a) English speaking people would invent their own dialects of python
which looked like APL with all those nice Unicode mathematical
operators / Greek letters you could use as variable/function names. I
like the symbol free nature of python which makes for easy
comprehension of code and don't want to see it degenerate.

b) Unicode characters would creep into the public interface of public
libraries. I think this would be a step back for the homogeneous
nature of the python community.

c) the python keywords are in ASCII/English. I hope you weren't
thinking of changing them?

....

In summary, I'm not particularly keen on the idea; though it might be
all right in private. Unicode identifiers are allowed in java though,
so maybe I'm worrying too much ;-)
 
Eric Brunel

Ok, so we're back to my original example: the problem here is not the
non-ASCII encoding but the non-English identifiers.

As I said in the rest of my post, I do recognize that there is a problem
with non-english identifiers. I only think that allowing these identifiers
to use a non-ASCII encoding will make things worse, and so should be
avoided.
If we move the problem to a pure unicode naming problem:

How likely is it that it's *you* (lacking a native, say, kanji keyboard) who
ends up with code that uses identifiers written in kanji? And that you are the
only person who is now left to do the switch to an ASCII transliteration?

Any chance there are still kanji-enabled programmers around that were not hit
by the bomb in this scenario? They might still be able to help you get the
code "public".

Contrary to what one might think given the great achievements of
open-source software, people willing to maintain public code and/or make
it evolve seem to be quite rare. If you add burdens on such people - such
as having to read and write the language of the original code writer,
or having to request a translation or transliteration from someone
else - chances are that they will become even rarer...
 
Stefan Behnel

Eric said:
Contrary to what one might think given the great achievements of
open-source software, people willing to maintain public code and/or make
it evolve seem to be quite rare. If you add burdens on such people -
such as having to read and write the language of the original code
writer, or having to request a translation or transliteration from
someone else - chances are that they will become even rarer...

Ok, but then maybe that code just will not become Open Source. There are a
million reasons code cannot be made Open Source, licensing being one, lack of
resources being another, bad implementation and lack of documentation being
important also.

But that won't change by keeping Unicode characters out of source code.

Now that we're at it, badly named English identifiers chosen by non-English
native speakers, for example, are a sure way to keep people from understanding
the code and thus from being able to contribute resources.

I'm far from saying that all code should start using non-ASCII characters.
There are *very* good reasons why a lot of projects are well off with ASCII
and should obey the good advice of sticking to plain ASCII. But those are
mainly projects that are developed in English and use English documentation,
so there is not much of a risk to stumble into problems anyway.

I'm only saying that this shouldn't be a language restriction, as there
definitely *are* projects (I know some for my part) that can benefit from the
clarity of native language identifiers (just like English speaking projects
benefit from the English language). And yes, this includes spelling native
language identifiers in the native way to make them easy to read and fast to
grasp for those who maintain the code.

It should at least be an available option to use this feature.

Stefan
 
Neil Hodgson

Anton Vredegoor:
Ouch! Now I seem to be disagreeing with the one who writes my editor.
What will become of me now?

It should be OK. I try to keep my anger under control and not cut
off the pixel supply at the first stirrings of dissent.

It may be an idea to provide some more help for multilingual text
such as allowing ranges of characters to be represented as hex escapes
or character names automatically. Then someone who only normally uses
ASCII can more easily audit patches that could contain non-ASCII characters.

Neil
 
Marc 'BlackJack' Rintsch

Nick Craig-Wood said:
My initial reaction is that it would be cool to use all those great
symbols. A variable called OHM etc!

This is a nice candidate for homoglyph confusion. There's the Greek
letter omega (U+03A9) Ω and the SI unit symbol (U+2126) Ω, and I think
there are some more omegas in the mathematical symbols area too.
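
For illustration, here is a minimal Python 3 sketch with the standard
unicodedata module (the accepted version of PEP 3131 normalizes identifiers
with NFKC, which folds the ohm sign onto the Greek letter):

import unicodedata

ohm_sign = "\u2126"   # OHM SIGN (the SI unit symbol)
omega = "\u03a9"      # GREEK CAPITAL LETTER OMEGA

print(ohm_sign == omega)                                  # False: two distinct code points that look identical
print(unicodedata.normalize("NFKC", ohm_sign) == omega)   # True: NFKC maps the ohm sign to the Greek letter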

Ciao,
Marc 'BlackJack' Rintsch
 
Marco Colombo

I suggest we keep focused on the main issue here, which is "should non-
ascii identifiers be allowed, given that we already allow non-ascii
string literals and comments?"

Most arguments against this proposal really fall into the category
"ascii-only source files". If you want to promote code-sharing, then
you should enforce quite restrictive policies:
- 7-bit-only source files, so that everyone is able to correctly
display and _print_ them (somehow I feel that printing foreign glyphs
can be harder than displaying them);
- English-only, readable comments _and_ identifiers (if you think of
it, it's really the same issue, readability... I know of no Coding Style
that requires good commenting but allows meaningless identifiers).

Now, why should one be allowed to violate those policies in the first
place? One reason is freedom. Let me write my code the way I like
it, and don't force me to write it the way you like it (unless it's
supposed to be part of _your_ project, in which case have me follow
_your_ style).

Another reason is that readability is quite a relative term... A
comment that won't make any sense in a real-world program may be
appropriate in a 'getting started with' guide example:

# this is another way to increment variable 'a'
a += 1

We know a comment like that is totally useless (and thus harmful) to
any programmer (it makes me think "thanks, but I knew that already"), but
it's perfectly appropriate if you're introducing that += operator to a
newbie for the first time.

You could even say that most string literals are best made English-
only:

print "Ciao Mondo!"

it's better written:

print _("Hello World!")

or with any other means of allowing the i18n of the output. The Italian
version should then be implemented with a .po file or whatever.
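
For instance, a minimal sketch of that gettext approach (the "messages"
domain and "locale" directory below are just placeholder assumptions):

import gettext

# Install _() as a builtin; translations are looked up in
# locale/<lang>/LC_MESSAGES/messages.mo
gettext.install("messages", localedir="locale")

greeting = _("Hello World!")   # an Italian catalog would supply "Ciao Mondo!"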

Yet, we support non-ascii encodings for source files. That's in order
to give authors more freedom. And freedom comes at a price, of course,
as non-ascii string literals, comments and identifiers are all harmful
to some extent and in some contexts.

What I fail to see is a context in which it makes sense to allow non-
ascii literals and non-ascii comments but _not_ non-ascii identifiers.
Or a context in which it makes sense to rule out non-ascii identifiers
but not string literals and comments. E.g. would you accept a patch
with comments you don't understand (or even that you are not able to
display correctly)? How can you make sure the patch is correct, if you
can't read and understand the string literals it adds?

My point is that most public open source projects already have
plenty of good reasons to enforce an English-only, ascii-only policy
on source files. I don't think that allowing non-ascii identifiers at
the language level would hinder their ability to enforce such a policy
any more than allowing non-ascii comments or literals did.

OTOH, I won't be able to contribute much to a project that already
uses, say, Chinese for comments and strings. Even if I manage to
display the source code correctly here, still I won't understand much
of it. So I'm not losing much by allowing them to use Chinese for
identifiers too.
And whether it was a mistake on their part not to choose an "English
only, ascii only" policy is their call, not ours, IMHO.

..TM.
 
Alexander Schmolck

Neil Hodgson said:
C#, Java, Ecmascript, Visual Basic.

(i.e. everything that isn't a legacy or niche language)

scheme (major implementations such as PLT and the upcoming standard), the most
popular common lisp implementations, haskell[1], fortress[2], perl 6 and I should
imagine (but haven't checked) all new java or .NET based languages (F#,
IronPython, JavaFX, Groovy, etc.) as well -- the same goes for XML-based
languages.

(i.e. everything that's up and coming, too)

So as Neil said, I don't think keeping python ASCII-only and interoperable is an
option. I don't find the anti-unicode arguments that have been
advanced so far terribly convincing[3], but even if they were it
wouldn't matter much -- the ability to function as a painless glue language
has always been absolutely vital for python.

cheers

'as

Footnotes:
[1] <http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource>

[2] <http://research.sun.com/projects/plrg/fortress.pdf>

[3] Although I do agree that mechanisms to avoid spoofing and similar
problems (what normalization scheme and constraints unicode identifiers
should be subjected to) merit careful discussion.
 
Laurent Pointal

Martin v. Löwis wrote:
PEP 1 specifies that PEP authors need to collect feedback from the
community. As the author of PEP 3131, I'd like to encourage comments
to the PEP included below, either here (comp.lang.python), or to
(e-mail address removed)

In summary, this PEP proposes to allow non-ASCII letters as
identifiers in Python. If the PEP is accepted, the following
identifiers would also become valid as class, function, or
variable names: Löffelstiel, changé, ошибка, or 売り場
(hoping that the latter one means "counter").

I believe this PEP differs from other Py3k PEPs in that it really
requires feedback from people with different cultural background
to evaluate it fully - most other PEPs are culture-neutral.

So, please provide feedback, e.g. perhaps by answering these
questions:
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

I strongly prefer to stay with the current standard of ASCII-only
identifiers.

Ideally, it would be agreeable to have variables named with Greek letters for
some scientific vars, or for French people to use éèçà in names...

But... (I share the common objections):

* Where are they on my keyboard? How can I type them?
(I can see the French éèçà, but a US-layout keyboard doesn't have them; imagine
kanji or Greek.)

* How do I spell this Cyrillic/kanji char?

* When there are very similar chars, how can I distinguish them?
(without having to deal with identically rendered chars that have different
Unicode names)

* Are the variables "amédé" and "amede" the same?

* It's an anti-KISS rule.

* I don't only write code, I read it too, and allowing such variation
in names makes code really much less readable.
(unless I learn other scripts' representations - maybe not a bad thing in
itself, but that's not the objective here).

* I've read: "Restricting the language to ASCII-only identifiers does
not enforce comments and documentation to be English, or the identifiers
actually to be English words, so an additional policy is necessary,
anyway."
But even with comments in German or Spanish or Japanese, I can still guess
what a (well written) piece of code is doing with its data. That would be
very difficult with identifiers spanning all of Unicode.


==> I wouldn't use them.


So, keep ASCII only.
Basic ASCII is the lowest common denominator, known and available
everywhere; it's known by all developers, and they can identify its chars
correctly (though 1 vs I or O vs 0 can cause problems with poor fonts).


Maybe make the default file encoding UTF-8 and make strings Unicode by
default (with s"" for plain strings, for example), but that is another
issue.


L.Pointal.
 
Stefan Behnel

Marco said:
I suggest we keep focused on the main issue here, which is "should non-
ascii identifiers be allowed, given that we already allow non-ascii
string literals and comments?"

[...]

And whether it was a mistake on their part not to choose an "English
only, ascii only" policy is their call, not ours, IMHO.

Very well written.

+1

Stefan
 
Duncan Booth

Alexander Schmolck said:
scheme (major implementations such as PLT and the upcoming standard),
the most popular common lisp implementations, haskell[1], fortress[2],
perl 6 and I should imagine (but haven't checked) all new java or .NET
based languages (F#, IronPython, JavaFX, Groovy, etc.) as well -- the
same goes for XML-based languages.

Just to confirm that: IronPython does accept non-ASCII identifiers. From
"Differences between IronPython and CPython":

IronPython will compile files whose identifiers use non-ASCII
characters if the file has an encoding comment such as "# -*- coding:
utf-8 -*-". CPython will not compile such a file in any case.
 
Stefan Behnel

Duncan said:
Alexander Schmolck said:
scheme (major implementations such as PLT and the upcoming standard),
the most popular common lisp implementations, haskell[1], fortress[2],
perl 6 and I should imagine (but haven't checked) all new java or .NET
based languages (F#, IronPython, JavaFX, Groovy, etc.) as well -- the
same goes for XML-based languages.

Just to confirm that: IronPython does accept non-ascii identifiers. From
"Differences between IronPython and CPython":
IronPython will compile files whose identifiers use non-ASCII
characters if the file has an encoding comment such as "# -*- coding:
utf-8 -*-". CPython will not compile such a file in any case.

Sounds like CPython would do well to follow IronPython here.
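
For illustration, a minimal file of the kind that documentation describes
might look like this (a sketch; the identifier is hypothetical):

# -*- coding: utf-8 -*-
größe = 42   # IronPython compiles this; CPython (before PEP 3131) raises a SyntaxError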

Stefan
 
Paul McGuire

A variable called OHM etc!

Then can 'lambda' -> 'λ' be far behind? (I know this is a keyword
issue, not covered by this PEP, but I also sense that the 'lambda'
keyword has always rankled.)

In my own personal English-only experience, I've thought that it would
be helpful to the adoption of pyparsing if I could distribute class
name translations, since so much of my design goal of pyparsing is
that it be somewhat readable as in:

integer = Word(nums)

is 'an integer is a word composed of numeric digits'.

By distributing a translation file, such as:

Palabra = Word
Grupo = Group
etc.

a Spanish-speaker could write their own parser using:

numero = Palabra(nums)

and this would still pass the "fairly easy-to-read" test, for that
user. While my examples don't use any non-ASCII characters, I'm sure
the issue would come up fairly quickly.
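
A sketch of what such a translation file might look like (the module name and
Spanish aliases here are hypothetical; Word, Group and nums are real pyparsing
names):

# es_pyparsing.py - Spanish aliases for a few pyparsing names
from pyparsing import Word, Group, nums

Palabra = Word   # "word"
Grupo = Group    # "group"

# A Spanish-speaking user's parser then reads naturally:
numero = Palabra(nums)   # "an integer is a word composed of numeric digits"

With PEP 3131 accepted, the last identifier could even be spelled "número".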

As to the responder who suggested not mixing ASCII/Latin with, say,
Hebrew in any given identifier, this is not always possible. On a
business trip to Israel, I learned that there are many terms that do
not have Hebrew equivalents, and so Hebrew technical literature is
sprinkled with English terms in Latin characters. This is especially
interesting to watch being typed on a terminal, as the Hebrew
characters are written on the screen right-to-left, and then an
English word is typed by switching the editor to left-to-right mode.
The cursor remains in the same position and the typed Latin characters
push out to the left as they are typed. Then typing in right-to-left
mode is resumed, just to the left of the Latin characters just
entered.

-- Paul
 
Duncan Booth

Stefan Behnel said:
Sounds like CPython would better follow IronPython here.

I cannot find any documentation which says exactly which non-ASCII
characters IronPython will accept.
I would guess that it probably follows C# in general, but it doesn't
follow C# identifier syntax exactly (in particular the leading @ to
quote keywords is not supported).

The C# identifier syntax is given at http://msdn2.microsoft.com/en-us/library/aa664670(VS.71).aspx;
I think it differs from the PEP only in also allowing the Cf class of characters:

identifier:
    available-identifier
    @ identifier-or-keyword
available-identifier:
    An identifier-or-keyword that is not a keyword
identifier-or-keyword:
    identifier-start-character identifier-part-characters(opt)
identifier-start-character:
    letter-character
    _ (the underscore character U+005F)
identifier-part-characters:
    identifier-part-character
    identifier-part-characters identifier-part-character
identifier-part-character:
    letter-character
    decimal-digit-character
    connecting-character
    combining-character
    formatting-character
letter-character:
    A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
    A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
combining-character:
    A Unicode character of classes Mn or Mc
    A unicode-escape-sequence representing a character of classes Mn or Mc
decimal-digit-character:
    A Unicode character of the class Nd
    A unicode-escape-sequence representing a character of the class Nd
connecting-character:
    A Unicode character of the class Pc
    A unicode-escape-sequence representing a character of the class Pc
formatting-character:
    A Unicode character of the class Cf
    A unicode-escape-sequence representing a character of the class Cf

For information on the Unicode character classes mentioned above, see
The Unicode Standard, Version 3.0, section 4.5.
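
A rough Python 3 sketch of that category test using the standard unicodedata
module (a hypothetical helper; it ignores the unicode-escape-sequence and
keyword rules above):

import unicodedata as ud

START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
PART_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc", "Cf"}  # Cf is the extra class C# allows

def is_csharp_style_identifier(name):
    # First character: a letter-character or the underscore; the rest: any identifier-part-character.
    if not name:
        return False
    if name[0] != "_" and ud.category(name[0]) not in START_CATEGORIES:
        return False
    return all(ud.category(ch) in PART_CATEGORIES for ch in name[1:])

print(is_csharp_style_identifier("Löffelstiel"))   # True
print(is_csharp_style_identifier("2fast"))         # False: leading decimal digit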
 
gatti

In summary, this PEP proposes to allow non-ASCII letters as
identifiers in Python. If the PEP is accepted, the following
identifiers would also become valid as class, function, or
variable names: Löffelstiel, changé, ошибка, or 売り場
(hoping that the latter one means "counter").

I am strongly against this PEP. The serious problems and huge costs
already explained by others are not balanced by the possibility of
using non-butchered identifiers in non-ASCII alphabets, especially
considering that one can write any language, in its full Unicode
glory, in the strings and comments of suitably encoded source files.
The diatribe about cross-language understanding of Python code is IMHO
off topic; if one doesn't care about international readers, using
annoying alphabets for identifiers has only a marginal impact. It's
the same situation as IRIs (a bad idea) versus HTML text (happily
Unicode).
- should non-ASCII identifiers be supported? why?
No, they are useless.
- would you use them if it was possible to do so? in what cases?
No, never. Being Italian, I'm sometimes tempted to use accented vowels in my
code, but I restrain myself because of the possibility of annoying
foreign readers and the difficulty of convincing every text editor I
use to preserve them.
Python code is written by many people in the world who are not familiar
with the English language, or even well-acquainted with the Latin
writing system. Such developers often desire to define classes and
functions with names in their native languages, rather than having to
come up with an (often incorrect) English translation of the concept
they want to name.

The described set of users includes linguistically intolerant people
who don't accept the use of a suitable language instead of their own,
or of a compromised but readable spelling instead of the one they
prefer.
Most "people in the world who are not familiar with the English
language" are much more mature than that, even when they don't write
for international readers.
The syntax of identifiers in Python will be based on the Unicode
standard annex UAX-31 [1]_, with elaboration and changes as defined
below.

Not providing an explicit listing of allowed characters is inexcusable
sloppiness.
The XML standard is an example of how listings of large parts of the
Unicode character set can be provided clearly, exactly and (almost)
concisely.
``ID_Start`` is defined as all characters having one of the general
categories uppercase letters (Lu), lowercase letters (Ll), titlecase
letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers
(Nl), plus the underscore (XXX what are "stability extensions" listed in
UAX 31).

``ID_Continue`` is defined as all characters in ``ID_Start``, plus
nonspacing marks (Mn), spacing combining marks (Mc), decimal number
(Nd), and connector punctuations (Pc).

Am I the first to notice how unsuitable these characters are? Many of
them would be utterly invisible ("variation selectors" are Mn), or
displayed out of sequence (overlays are Mn), or normalized away
(combining accents are Mn), or absurdly strange and ambiguous (Roman
numerals are Nl, for instance).
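
A quick check with Python 3's unicodedata module illustrates the point
(example characters picked for illustration, not taken from the PEP):

import unicodedata as ud

for ch in ("\ufe00",    # VARIATION SELECTOR-1: invisible
           "\u0301",    # COMBINING ACUTE ACCENT: normalized into the preceding letter
           "\u2160"):   # ROMAN NUMERAL ONE: looks exactly like a capital I
    print(hex(ord(ch)), ud.category(ch), ud.name(ch))

# Output: categories Mn, Mn and Nl respectively, all allowed in ID_Continue, the last even in ID_Start.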

Lorenzo Gatti
 
Bruno Desthuilliers

Stefan Behnel wrote:
Ok, but then maybe that code just will not become Open Source. There are a
million reasons code cannot be made Open Source, licensing being one, lack of
resources being another, bad implementation and lack of documentation being
important also.

But that won't change by keeping Unicode characters out of source code.

Nope, but adding support for Unicode glyphs in identifiers will only make
things worse, and we (free software authors/users/supporters)
definitely *don't* need it.
Now that we're at it, badly named english identifiers chosen by non-english
native speakers, for example, are a sure way to keep people from understanding
the code and thus from being able to contribute resources.

Broken English is certainly better than German or French or Italian when
it comes to sharing code.
I'm far from saying that all code should start using non-ASCII characters.
There are *very* good reasons why a lot of projects are well off with ASCII
and should obey the good advice of sticking to plain ASCII. But those are
mainly projects that are developed in English and use English documentation,
so there is not much of a risk to stumble into problems anyway.

I'm only saying that this shouldn't be a language restriction, as there
definitely *are* projects (I know some for my part) that can benefit from the
clarity of native language identifiers (just like English speaking projects
benefit from the English language).

As far as I'm concerned, I find "frenglish" source code (code with
identifiers in French) a total abomination. The fact is that the whole
language (keywords, builtins, stdlib) *is* in English. Unless you
address that fact, your PEP is worthless (and even if you really plan to
do something about it, I still find it a very bad idea for the reasons
already given).

The fact is also that anyone at least half-serious about CS will learn
technical English anyway. And, as others have already pointed out, learning
technical English is certainly not the most difficult part of learning
to program.
And yes, this includes spelling native
language identifiers in the native way to make them easy to read and fast to
grasp for those who maintain the code.

Yes, fine. So we end up with code that's a mix of English (keywords,
builtins, stdlib, almost if not all third-party libs) and the native
language. So, while native speakers will still have to deal with
English, non-native speakers won't be able to understand anything. Talk
about a great idea...
 
Bruno Desthuilliers

Stefan Behnel wrote:
That's a wrong assumption.

I've never met anyone *serious* about programming and yet unable to read
and write CS-oriented technical English.
I understand that people can have this impression
when they deal a lot with Open Source code, but I've seen a lot of places
where code was produced that was not written to become publicly available (and
believe me, it *never* will become Open Source).

Yeah, fine. But this doesn't mean that everyone who may have to
work on this code is a native speaker of the language used - or even
fluent enough in it.
 
Eric Brunel

Ok, but then maybe that code just will not become Open Source. There are a
million reasons code cannot be made Open Source, licensing being one, lack of
resources being another, bad implementation and lack of documentation being
important also.

But that won't change by keeping Unicode characters out of source code.

Maybe; maybe not. This is one more reason preventing code from
becoming open-source. IMHO, there are already plenty of such reasons, and
I don't think we need a new one...
Now that we're at it, badly named English identifiers chosen by non-English
native speakers, for example, are a sure way to keep people from understanding
the code and thus from being able to contribute resources.

I wish we could have an option forbidding these also ;-) But then, maybe
some of my own code would no longer execute when it's turned on...
I'm far from saying that all code should start using non-ASCII characters.
There are *very* good reasons why a lot of projects are well off with ASCII
and should obey the good advice of sticking to plain ASCII. But those are
mainly projects that are developed in English and use English documentation,
so there is not much of a risk to stumble into problems anyway.

I'm only saying that this shouldn't be a language restriction, as there
definitely *are* projects (I know some for my part) that can benefit from the
clarity of native language identifiers (just like English speaking projects
benefit from the English language). And yes, this includes spelling native
language identifiers in the native way to make them easy to read and fast to
grasp for those who maintain the code.

My point is only that I don't think you can tell right from the start that
a project you're working on will stay private forever. See Java for
instance: Sun said for quite a long time that it wasn't a good idea to
release Java as open-source and that it was highly unlikely to happen. But
it finally did...

You could say that the rule should be: if the project has the
slightest chance of becoming open-source, or of being shared with people not
speaking the same language as the original coders, one should not use
non-ASCII identifiers. I'm personally convinced that *any* industrial
project falls into this category. So accepting non-ASCII identifiers is
just introducing a disaster waiting to happen.

But then, I have the same feeling about non-ASCII strings, and I - as a
project leader - won't ever accept a source file whose "-*- coding -*-"
line specifies anything other than ASCII... So even if I usually
don't buy the "we're already half-dirty, so why can't we be the dirtiest
possible" argument, I'd understand if this feature went into the language.
But I personally won't ever use it, and will forbid it whenever
I'm in a position to.
It should at least be an available option to use this feature.

If it's actually an option to the interpreter, I guess I'll just have to
alias python to 'python --ascii-only-please'...
 
Michel Claveau

Hi !
- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Yes.
And, more: yes yes yes

Because:

1) When I connect Python to J(ava)Script, if the "connected" pages
contain objects with non-ASCII characters, I can't use them; sniff...

2) When I connect Python to databases, if there are fields (columns)
with accented letters, I can't use class properties to drive these
fields. Examples:
"cité" (French for "city")
"téléphone" (for "phone")



And, because non-ASCII characters would be possible but not obligatory,
the guys (snobs?) who want to stay in the pure-ASCII dimension still can.

* sorry for my bad english *
 
