PEP 3131: Supporting Non-ASCII Identifiers

Anders J. Munch · May 14, 2007

Eric said:
You could tell that the rule should be that if the project has the
slightest chance of becoming open-source, or shared with people not
speaking the same language as the original coders, one should not use
non-ASCII identifiers. I'm personally convinced that *any* industrial
project falls into this category. So accepting non-ASCII identifiers is
just introducing a disaster waiting to happen.

Not at all. If the need arises, you just translate the whole thing. Contrary
to popular belief, this is a quick and easy thing to do.

So YAGNI applies, and even if you find that you do need it, you may still have
won on balance, as the time saved by using your native language just might
outweigh the time spent translating.

- Anders

Anders J. Munch · May 14, 2007

Hendrik said:
And we have been through the Macro thingy here, and the consensus
seemed to be that we don't want people to write their own dialects.

Macros create dialects that are understood only by the three people in your
project group. It's unreasonable to compare that to a "dialect" such as
Mandarin, which is exclusive to a tiny little clique of one billion people.

- Anders

rurpy · May 14, 2007

Yes.
And, more: yes yes yes

Because:

1) When I connect Python to J(ava)Script, if the "connected" pages
contain objects with non-ASCII characters, I can't use them; sniff...

2) When I connect Python to databases, if there are fields (columns)
with accented letters, I can't use class properties to drive these
fields. Examples:
"cité" (French for "city")
"téléphone" (for phone)

And because non-ASCII characters would only be possible, not obligatory,
the guys (snobs?) who want to stay in the pure-ASCII dimension still
can.

* sorry for my bad English *
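Python 3 did end up accepting such names (PEP 3131 was implemented in Python 3.0). A minimal sketch of the database case, assuming a hypothetical `Contact` record whose fields mirror the accented column names above; the class, the helper `from_row` and the sample values are illustrative only and not part of any database API:

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Contact:
    # Hypothetical record: field names mirror accented column names.
    cité: str
    téléphone: str


def from_row(row: dict) -> Contact:
    """Build a Contact from a column-name -> value mapping (hypothetical helper).

    The parser NFKC-normalizes identifiers, but dictionary keys are plain
    strings, so normalize the column names the same way before passing
    them as keyword arguments.
    """
    normalized = {unicodedata.normalize("NFKC", key): value for key, value in row.items()}
    return Contact(**normalized)


# Keys spelled with combining accents ("e" + U+0301) still match the fields.
contact = from_row({"cit\u00e9": "Lyon", "te\u0301le\u0301phone": "0000"})
print(contact.cité, contact.téléphone)
```

Normalizing the keys is a deliberate choice here: without it, a column name that arrives in decomposed form would raise `TypeError` (unexpected keyword argument) even though it looks identical on screen.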

Can a discussion about support for non-English identifiers (1),
conducted in a group where 99.9% of the posters are fluent
speakers of English (2), have any chance of being objective
or fair?

Although probably not sufficient to overcome this built-in
bias, it would be interesting if some bilingual readers would
raise this issue in some non-English Python discussion
groups to see if the opposition to this idea is as strong
there as it is here.

(1) No quibbles about the distinction between non-English
and non-ASCII please.
(2) Several posters have claimed non-native English speaker
status to bolster their position, but since they are clearly at
or near native-speaker levels of fluency, the fact that English is not
their native language is really irrelevant.

Anton Vredegoor · May 14, 2007

Neil said:
Anton Vredegoor:

It should be OK. I try to keep my anger under control and not cut
off the pixel supply at the first stirrings of dissent.

Thanks! I guess I won't have to make the obligatory Soviet Russia joke
now.

It may be an idea to provide some more help for multilingual text,
such as automatically representing ranges of characters as hex escapes
or character names. Then someone who normally uses only ASCII can more
easily audit patches that could contain non-ASCII characters.
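Current Python 3 already ships the pieces for the kind of audit view suggested here: the `namereplace` and `backslashreplace` error handlers of `str.encode` turn every non-ASCII character into a `\N{...}` name or a hex escape. A minimal sketch, with `audit_view` and `non_ascii_lines` as hypothetical helper names (`namereplace` needs Python 3.5+, `str.isascii` needs 3.7+):

```python
def audit_view(source: str, use_names: bool = True) -> str:
    r"""Return an ASCII-only rendering of source for review.

    "namereplace" emits \N{NAME} escapes; "backslashreplace" emits
    \xNN, \uNNNN or \UNNNNNNNN escapes instead.
    """
    handler = "namereplace" if use_names else "backslashreplace"
    return source.encode("ascii", errors=handler).decode("ascii")


def non_ascii_lines(source: str):
    """Yield (line_number, escaped_line) for each line containing non-ASCII text."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.isascii():
            yield lineno, audit_view(line)


patch = "x = 1\ncité = 2\n"
for lineno, text in non_ascii_lines(patch):
    print(lineno, text)
# prints: 2 cit\N{LATIN SMALL LETTER E WITH ACUTE} = 2
```

Character names make homoglyphs obvious to a reviewer; passing `use_names=False` gives shorter hex escapes when only the presence of non-ASCII text matters.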

Now that I read about IronPython already supporting some larger
character set I feel like I'm somewhat caught in a side effect of an
embrace and extend scheme.

A.

Stefan Behnel · May 14, 2007

Jarek said:
Stefan Behnel wrote:

OK, then. As a code obfuscation measure this would fit perfectly.

I actually meant it as a measure for clarity and readability for those who are
actually meant to *read* the code.

Stefan

Jakub Stolarski · May 14, 2007

- should non-ASCII identifiers be supported? why?

No. It's a good convention to stick with English. And if we stick with
English, why would we need non-ASCII characters? Any non-ASCII
character makes code less readable. We never know if our code will become
public.

- would you use them if it was possible to do so? in what cases?

No. I don't see any uses. I'm Polish. A Polish-English mix looks funny.

Pierre Hanser · May 14, 2007

This PEP is not technical, or at least not only. It has
larger implications for the model of society we want.

Let me explain with an analogy:
let's compare 'ASCII English' to Coca-Cola.

It's available nearly everywhere.

It does not taste good at first try, and is especially
repulsive to young children.

It's cheap and you don't expect much of it.

You know you can drink some in case of real need.

Its imperialist connotation is widely accepted(?)

But it's not as good as your favorite beverage: beer, wine, ...

The world is full of other possibilities. Think: in case
of necessity you might even have to drink tea with yak
butter in the Himalayas! In normal circumstances you should
never see any, but in an extreme situation you may have to!

Where is the freedom in a world where you could only drink Coca?

I DON'T WANT TO HAVE TO DRINK COCA AT HOME ALL THE TIME.

And this PEP is a glorious opportunity to break free from it.

[disclaimer: coca is used here as the generic name it became,
and no real offense is intended]

Terry Reedy · May 14, 2007

| Sounds like CPython would better follow IronPython here.

One could also turn the argument around and say that there is no need to
follow IronPython; people who want non-ASCII identifiers can just use
IronPython.

Michel Claveau · May 14, 2007

And Il1 O0 ?

Marc 'BlackJack' Rintsch · May 14, 2007

Michel Claveau said:
And Il1 O0 ?

Hm, we should ban digits from identifier names.

Ciao,
Marc 'BlackJack' Rintsch

Stefan Behnel · May 14, 2007

Marc said:
Hm, we should ban digits from identifier names.

Ah, good idea - and capital letters also. After all, they are rare enough in
English to just plain ignore their existence.

Stefan

Grant Edwards · May 14, 2007

Ah, good idea - and capital letters also. After all, they are
rare enough in English to just plain ignore their existence.

And I don't really see any need for using more than two
characters. With just two letters (ignoring case, of course),
you can create 676 identifiers in any namespace. That's
certainly got to be enough. If not, adding a special character
suffix (e.g. $, %, #) to denote the data type should sufficiently
expand the namespace.

So, let's just silently ignore anything past the first two.
That way we'd be compatible with Commodore PET BASIC.

[You don't want to know how long it took me to find all of the
name-collision bugs after porting a BASIC program from a CP/M
system which had a fairly sophisticated Basic compiler (no line
numbers, all the normal structured programming flow control
constructs) to a Commodore PET which had a really crappy BASIC
interpreter.]
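The arithmetic and the porting trap are both easy to check. A small sketch, assuming (as in the post) that only letters count and only the first two characters are significant; real BASIC dialects had their own rules, and `truncation_collisions` is a hypothetical helper:

```python
from collections import defaultdict
from itertools import product
from string import ascii_uppercase

# Case-insensitive two-letter names: 26 * 26 = 676.
TWO_LETTER_NAMES = ["".join(pair) for pair in product(ascii_uppercase, repeat=2)]
print(len(TWO_LETTER_NAMES))  # 676


def truncation_collisions(names, significant=2):
    """Group names that become the same variable when only the first
    `significant` characters count (case-insensitively)."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:significant].upper()].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


print(truncation_collisions(["TOTAL", "TOTAL2", "COUNT", "COST", "X"]))
# {'TO': ['TOTAL', 'TOTAL2'], 'CO': ['COUNT', 'COST']}
```

Every group in the result is a pair of names that a long-name compiler keeps apart but a two-character interpreter silently merges into one variable.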

Guest · May 14, 2007

Hi!

- should non-ASCII identifiers be supported? why?
- would you use them if it was possible to do so? in what cases?

Yes.

JScript can use letters with accents in identifiers
XML (1.1) can use letters with accents in tags
C# can use letters with accents in variables
SQL: MySQL/MS-SQL/Oracle/etc. can use accents in field names or queries
etc.
etc.

Python MUST make up for its lost time.

MCI

Jakub Stolarski · May 14, 2007


And generally nobody uses it.
It sounds like "art for art's sake".

But OK. Maybe it'll be some impulse to learn some new languages.

+1 for this PEP

Guest · May 14, 2007

wrote:

This is a nice candidate for homoglyph confusion. There's the Greek
letter omega (U+03A9) Ω and the SI unit symbol (U+2126) Ω, and I think
some omegas in the mathematical symbols area too.

Under the PEP, identifiers are converted to normal form NFC, and
we have

py> unicodedata.normalize("NFC", u"\u2126")
u'\u03a9'

So, OHM SIGN compares equal to GREEK CAPITAL LETTER OMEGA. It can't
be confused with it - it is equal to it by the proposed language
semantics.
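The same check can be run on a current interpreter. Note that Python 3 as released normalizes identifiers to NFKC rather than the NFC mentioned above; for OHM SIGN the result is the same, because its decomposition to GREEK CAPITAL LETTER OMEGA is canonical. A small sketch (`namespace` is just a scratch dictionary for `exec`):

```python
import unicodedata

OHM = "\u2126"    # OHM SIGN
OMEGA = "\u03a9"  # GREEK CAPITAL LETTER OMEGA

# OHM SIGN has a canonical singleton decomposition, so NFC and NFKC agree.
assert unicodedata.normalize("NFC", OHM) == OMEGA
assert unicodedata.normalize("NFKC", OHM) == OMEGA

# The parser normalizes identifiers, so both spellings name one variable.
namespace = {}
exec(f"{OHM} = 50\nresult = {OMEGA} * 2", namespace)
print(namespace["result"])  # 100
```

After the `exec`, the namespace holds the name under the normalized spelling (U+03A9) only; the OHM SIGN spelling never appears as a key.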

Regards,
Martin

Guest · May 14, 2007

Not providing an explicit listing of allowed characters is inexcusable
sloppiness.

That is a deliberate part of the specification. It is intentional that
it does *not* specify a precise list, but instead defers that list
to the version of the Unicode standard used (in the unicodedata
module).

The XML standard is an example of how listings of large parts of the
Unicode character set can be provided clearly, exactly and (almost)
concisely.

And, indeed, this is now recognized as one of the bigger mistakes
of the XML recommendation: they provide an explicit list, and fail
to consider characters that are unassigned. In XML 1.1, they try
to address this issue, by now allowing unassigned characters in
XML names even though it's not certain yet what those characters
mean (until they are assigned).
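In Python 3 as implemented, this deferral is visible directly: which characters may appear in identifiers is decided by the Unicode database bundled with the interpreter (reported by `unicodedata.unidata_version`), and `str.isidentifier()` applies the language's lexical definition. A sketch, with `describe_identifier` as a hypothetical helper; U+FFFF is used because it is a permanently unassigned noncharacter (general category `Cn`):

```python
import unicodedata


def describe_identifier(name: str) -> dict:
    """Report how this interpreter's Unicode database classifies a candidate name."""
    return {
        # Version of the Unicode Character Database bundled with this Python.
        "unicode_version": unicodedata.unidata_version,
        # Lexical check only: keywords such as "class" also return True here.
        "is_identifier": name.isidentifier(),
        # General category per character; "Cn" means unassigned.
        "categories": [unicodedata.category(ch) for ch in name],
    }


for candidate in ["cité", "téléphone", "a-b", "x\uffff"]:
    print(repr(candidate), describe_identifier(candidate))
```

Because the table comes from the bundled database rather than a fixed list, a name using characters assigned in a newer Unicode version can be rejected by an older interpreter and accepted by a newer one.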

Am I the first to notice how unsuitable these characters are?

Probably. Nobody in the Unicode consortium noticed, but what
do they know about suitability of Unicode characters...

Regards,
Martin

Martin v. Löwis · May 14, 2007

Neil said:
C#, Java, ECMAScript, Visual Basic.

Specification-wise, C99 and C++98 also support Unicode identifiers,
although many compilers still don't.

For dynamic languages, Groovy also supports it.

Regards,
Martin

Neil Hodgson · May 14, 2007

Martin v. Löwis:

Specification-wise, C99 and C++98 also support Unicode identifiers,
although many compilers still don't.

Ada 2005 allows Unicode identifiers and even includes the constant
'π' in Ada.Numerics.

Neil

Paul Rubin · May 14, 2007

Stefan Behnel said:
But then, where's the problem? Just stick to accepting only patches that are
plain ASCII *for your particular project*.

There is no feature that has ever been proposed for Python, that cannot
be supported with this argument. If you don't like having a "go to"
statement added to Python, where's the problem? Just don't use it in
your particular project.

ZeD · May 15, 2007

Neil said:
Ada 2005 allows Unicode identifiers and even includes the constant
'π' in Ada.Numerics.

this. is. cool.

(oh, and +1 for the pep)
