PEP 3131: Supporting Non-ASCII Identifiers

Guest

Steven said:
Just a touch of hyperbole perhaps?

You know, it may come as a surprise to some people that English is not
the only common language. In fact, it only ranks third, behind Mandarin
and Spanish, and just above Arabic. Although the exact number of speakers
varies according to the source you consult, the rankings are quite stable:
Mandarin, Spanish, then English. Any of those languages could equally
have claim to be the world's lingua franca.

For a language to be a (or the) lingua franca, the sheer number of
people who speak it is actually not as important as you seem to think.
Its use as an international exchange language is the decisive criterion
-- definitely not true for Mandarin, and for Spanish not nearly as much
as for English.

Also, there can be different "linguae francae" for different fields.
English definitely is the lingua franca of programming. But that is
actually off topic. Programming languages are not the same as natural
languages. I was talking about program code, not about works of literature.
 
Christophe

(e-mail address removed) wrote:
Well, that's part of the point isn't it? It seems incredibly naive to
me to think that you could use whatever symbol was intended and have
it show up, and the "well fix your machine!" argument doesn't fly. A
lot of the time programmers have to look at stack traces on end users'
machines (whatever they may be) to help debug. They have to look at
code on the (GUI-less) production servers over a terminal link. They
have to use all kinds of environments where they can't install the
latest and greatest fonts. Promoting code that becomes very hard to
read and debug in real situations seems like a sound negative to me.

Who displays stack frames? Your code. Whose code includes unicode
identifiers? Your code. Whose fault is it to create a stack trace
display procedure that cannot handle unicode? You. Even if you don't
make use of them, you still have to fix the stack trace display
procedure because the exception error message can include unicode text
*today*
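
A minimal sketch of that point, assuming Python 3 (where this PEP applies): both the exception message and the function name in a traceback can be non-ASCII, and a display routine can escape them for terminals that lack the glyphs. The names `größe_prüfen` and `format_trace_ascii_safe` are made up for illustration and are not from any library.

```python
import traceback


def größe_prüfen(wert):
    # Hypothetical function with a non-ASCII name and a non-ASCII message.
    if wert < 0:
        raise ValueError(f"ungültige Größe: {wert}")
    return wert


def format_trace_ascii_safe() -> str:
    """Format the active exception so it prints on an ASCII-only terminal."""
    text = traceback.format_exc()
    # Characters outside ASCII are replaced by backslash escapes.
    return text.encode("ascii", "backslashreplace").decode("ascii")


try:
    größe_prüfen(-1)
except ValueError:
    report = format_trace_ascii_safe()
    print(report)
```

The escaped output is harder to read than the original, but nothing is lost and nothing fails to display.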

You should know that displaying and editing UTF-8 text as if it was
latin-1 works very very well.

Also, terminals have support for UTF-8 encodings already. Or you could
always use kate+fish to edit your script on the distant server without
such problems (fish is a KDE protocol used to access a computer with ssh
as if it was a hard disk and kate is the standard text/code editor). It's
a matter of tools.
 
Stefan Behnel

Hendrik said:
"Beautiful is better than ugly"

Good point. Today's transliteration of German words into ASCII identifiers
definitely looks ugly. Time for this PEP to be accepted.

Stefan
 
Gregor Horvath

code on the (GUI-less) production servers over a terminal link. They
have to use all kinds of environments where they can't install the
latest and greatest fonts. Promoting code that becomes very hard to
read and debug in real situations seems like a sound negative to me.

If someone wants to debug a Chinese program, he has in almost all cases
obviously already installed the correct fonts and his machine can handle
unicode.

Maybe yours and mine cannot, but I doubt that we are going to debug a
Chinese program.

I have debugged German programs (not Python) with unicode characters in
them for years and had no problem at all, because all my customers and I
obviously have German machines.

Gregor
 
Guest

Gregor said:
No. That logic can only be used to justify the introduction of a feature
that brings freedom.

That is any feature that you are not forced to use. So let's get gotos
and the like. Every programming language dictates some things. This is
not a bad thing.
 
Stefan Behnel

Sure, if the feature isn't going to be used then it won't present
problems.

Thing is, this feature *is* going to be used. Just not by projects that you
are likely to stumble into. Most OpenSource projects will continue to stick to
English-only, and posts to English-speaking newsgroups will also stick to
English. But Closed-Source programs and posts to non-English newsgroups *can*
use this feature if their developers want. And you still wouldn't even notice.

Stefan
 
Gregor Horvath

Hendrik said:
It is not so much for technical reasons as for aesthetic
ones - I find reading a mix of languages horrible, and I am
kind of surprised by the strength of my own reaction.

This is a matter of taste.
In some programs I use German identifiers (not unicode). I and others
like the mix. My customers can understand the code better. (They are
only reading it.)

"Beautiful is better than ugly"

Correct.
But why do you think you should impose your taste on all of us?

With this logic you should all drive Alfa Romeos!

Gregor
 
HYRY

How do you feel about the mix of English keywords and Chinese?
How do the English-like "sentences" look to a Chinese reader?

Would you support the extension of this PEP to include Chinese
Keywords?

Would that be a lesser or greater gift?

Because the students can remember some English words, mixing
characters is not a problem. But it's difficult for them to express their own
thoughts or logic in English or Pinyin (which only marks the pronunciation of
the Chinese characters).
In my experience, mixing identifiers in Chinese characters with
keywords in English is very easy to read.
Because of the large difference between Chinese characters and ASCII
characters, I can quickly distinguish my identifiers from keywords and
library words.
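
To make the visual contrast concrete, here is a small sketch of what such code could look like under Python 3, where this PEP applies. The identifiers are my own illustration (圆面积 = "circle area", 半径 = "radius"), not taken from any student program.

```python
import math


def 圆面积(半径):
    # Keywords (def, return, for, in) and library names (math.pi, print)
    # stay in ASCII; only the identifiers chosen by the author are Chinese.
    return math.pi * 半径 ** 2


for 半径 in (1, 2):
    print(半径, 圆面积(半径))
```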
 
Neil Hodgson

Eric Brunel:
... there is no
keyboard *on Earth* allowing to type *all* characters in the whole
Unicode set.

My keyboard in conjunction with the operating system (US English
keyboard on a Windows XP system) allows me to type characters from any
language. I haven't learned how to type these all quickly but I can get
through a session of testing Japanese input by myself. It's a matter of
turning on different keyboard layouts through the "Text Services and
Input Languages" control panel. Then there are small windows called
Input Method Editors that provide a mapping from your input to the
target language. Other platforms provide similar services.

Neil
 
Hendrik van Rooyen

Good point. Today's transliteration of German words into ASCII identifiers
definitely looks ugly. Time for this PEP to be accepted.

Nice out of context quote. :)

Now look me in the eye and tell me that you find
the mix of proper German and English keywords
beautiful.

And I will call you a liar.

- Hendrik
 
Guest

Stefan said:
Ok, let me put it differently.

You *do not* design Python's keywords. You *do not* design the stdlib. You *do
not* design the concepts behind all that. You *use* them as they are. So you
can simply take the identifiers they define and use them the way the docs say.
You do not have to understand these names, they don't have to be words, they
don't have to mean anything to you. They are just tools. Even if you do not
understand English, they will not get in your way. You just learn them.

I claim that this is *completely unrealistic*. When learning Python, you
*do* learn the actual meanings of English terms like "open",
"exception", "if" and so on if you did not know them before. It would be
extremely foolish not to do so. You do care about these names and you do
want to know their meaning if you want to write anything more in your
life than a 10-line throw-away script.

But you *do* design your own software. You *do* design its concepts. You *do*
design its APIs. You *do* choose its identifiers. And you want them to be
clear and telling. You want them to match your (or your clients') view of the
application. You do not care about the naming of the tools you use inside. But
you do care about clarity and readability in *your own software*.

I do care about the naming of my tools. I care a lot. Part of why I like
Python is that it resisted the temptation to clutter the syntax up with
strange symbols like Perl. And I do dislike the decorator syntax, for
example.

Also, your distinction between "inside" and "your own" is nonsense,
because the "inside" does heavily leak into the "own". It is impossible
to write "your own software" with clarity and readability by your
definition (i.e. in your native language). Any real Python program is a
mix of identifiers you designed yourself and identifiers you did not
design yourself. And I think the ones chosen by yourself are even often
in the minority. It is not feasible in practice to just learn what the
"other" identifiers do without understanding their names. Not for
general programming. The standard library is already too big for that,
and most real programs use not only the standard library, but also third
party libraries that have English APIs.
 
Guest

Christophe said:
Who displays stack frames? Your code.

Wrong.

Whose code includes unicode
identifiers? Your code.

Wrong.

Whose fault is it to create a stack trace
display procedure that cannot handle unicode? You.

Wrong. If you never have to deal with other people's code,
congratulations to you. Many other people have to. And no, I usually
cannot just tell the person to fix his code. I need to deal with it.

Even if you don't
make use of them, you still have to fix the stack trace display
procedure because the exception error message can include unicode text
*today*

The error message can, but at least the function names and other
identifiers cannot.

You should know that displaying and editing UTF-8 text as if it was
latin-1 works very very well.

No, this only works for those characters that are in the ASCII range.
For all the other characters it does not work well at all.

Also, Terminals have support for UTF-8 encodings already.

Some have, some have not. And you not only need a terminal that can
handle UTF-8 data, you also need a font that has a glyph for all the
characters you need to handle, and you may also need a way to actually
enter those characters with your keyboard.
 
Christophe

René Fleschenberg wrote:
No, this only works for those characters that are in the ASCII range.
For all the other characters it does not work well at all.

This alone shows you don't know enough about UTF-8 to talk about it.
UTF-8 will NEVER use bytes < 128 to encode multibyte chars. When you
parse a UTF-8 file, each space is a space, each \n is an end of line and
each 'Z' is a 'Z'.

Some have, some have not. And you not only need a terminal that can
handle UTF-8 data, you also need a font that has a glyph for all the
characters you need to handle, and you may also need a way to actually
enter those characters with your keyboard.

Ever heard of the famous "cut/paste"? I use it all the time, even when
handling standard ASCII English code. It greatly cuts down my chances of
making typos while writing code.
 
Guest

Christophe said:
René Fleschenberg wrote:

This alone shows you don't know enough about UTF-8 to talk about it.
UTF-8 will NEVER use bytes < 128 to encode multibyte chars. When you
parse a UTF-8 file, each space is a space, each \n is an end of line and
each 'Z' is a 'Z'.

So? Does that mean that you can just display UTF-8 "as if it was
Latin-1"? No, it does not. It means you can do that for exactly those
characters that are in the ASCII range. For all the others, you can not.
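
Both halves of this exchange can be checked with a short sketch. The helper `view_as_latin1` is a made-up name; it simulates a viewer that treats UTF-8 bytes as Latin-1.

```python
def view_as_latin1(text: str) -> str:
    """Return what a Latin-1 viewer would show for UTF-8 encoded text."""
    # Latin-1 maps every byte 0-255 to a character, so this decode never fails.
    return text.encode("utf-8").decode("latin-1")


ascii_line = "if x == 'Z':\n"
german_line = "größe = 1\n"

# Pure ASCII survives unchanged: UTF-8 encodes it as the same single bytes.
print(view_as_latin1(ascii_line) == ascii_line)

# Each non-ASCII character shows up as two or more unrelated characters.
shown = view_as_latin1(german_line)
print(len(german_line), len(shown))

# The bytes are intact, though: re-encoding as Latin-1 recovers the original.
print(shown.encode("latin-1").decode("utf-8") == german_line)
```

So the structure of the file (spaces, newlines, ASCII keywords) stays readable and editable, while the non-ASCII identifiers themselves are displayed as garbage even though no data is lost.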
 
Stefan Behnel

René Fleschenberg said:
I claim that this is *completely unrealistic*. When learning Python, you
*do* learn the actual meanings of English terms like "open",

Fine, then go ahead and learn their actual meaning in two languages (Python
and English). My point is: you don't have to. You only need to understand
their meaning in Python. Whether or not English can help here or can be useful
in your later life is completely off-topic.

I do care about the naming of my tools. I care a lot. Part of why I like
Python is that it resisted the temptation to clutter the syntax up with
strange symbols like Perl. And I do dislike the decorator syntax, for
example.

Also, your distinction between "inside" and "your own" is nonsense,
because the "inside" does heavily leak into the "own". It is impossible
to write "your own software" with clarity and readability by your
definition (i.e. in your native language). Any real Python program is a
mix of identifiers you designed yourself and identifiers you did not
design yourself. And I think the ones chosen by yourself are even often
in the minority. It is not feasible in practice to just learn what the
"other" identifiers do without understanding their names. Not for
general programming. The standard library is already too big for that,
and most real programs use not only the standard library, but also third
party libraries that have English APIs.

Ok, I think the difference here is that I have practical experience with
developing that way and I am missing native identifiers in my daily work. You
don't have that experience and therefore do not feel that need. And you know
what? That's perfectly fine. I'm not criticising that at all. All I'm
criticising is that people without need for this feature are trying to prevent
those who need it and want to use it *where it is appropriate* from actually
getting this feature into the language.

Stefan
 
Guest

"Years ago", I wrote RUR-PLE (a Python learning environment based on
Karel the Robot).
Someone mentioned using RUR-PLE to teach programming in Chinese to
kids. Here's a little text extracted from the English lessons (and an
even smaller one from the Turkish one). I believe that this is
relevant to this discussion.
==========
While the creators of Reeborg designed him so that he obeys
instructions in English, they realised that not everyone understands
English. So, they gave him the ability to easily learn a second
language. For example, if we want to tell someone to "move forward" in
French, we would say "avance". We can tell Reeborg that "avance" is a
synonym of "move" simply by writing
avance = move.
The order here is important; the known command has to be on the right,
and the new one has to be on the left. Note that we don't have any
parentheses "()" appearing since the parentheses would tell Reeborg
that we want him to obey an instruction; here, we are simply teaching
him a new word. When we want Reeborg to follow the new instruction, we
will use avance().
[snip]

If you want, you can also teach Reeborg a synonym for turn_off. Or,
you may give synonyms in a language other than French if you prefer,
even creating your own language. Then, watch Reeborg as he obeys
instructions written in your language.
[snip]
Note that, if English is not your favourite language, you can always
create a synonym in your language, as long as you define it first,
before using it. However, the synonym you introduce must use the
English alphabet (letters without any accents). For example, in
French, one might define vire_a_gauche = turn_left and use
vire_a_gauche() to instruct the robot to turn left.

----------(this last paragraph, now translated into Turkish)

Eğer İngilizce sizin favori diliniz değilse komutları her zaman kendi
dilinizde de tanımlayabilirsiniz, ancak kendi dilinizde tanımladığınız
komutları oluştururken yalnızca İngiliz alfabesindeki 26 harfi
kullanabilirsiniz. Örneğin Türkçede sola dönüş için sola_don =
turn_left kullanılmalıdır (ö yerine o kullanılmış dikkat ediniz). Bu
tanımlamayı yaptıktan sonra Reeborg'u sola döndürmek için sola_don()
komutunu kullanabilirsiniz.
=================
I don't read Turkish, but I notice the number 26 there (plus many
accented letters in the text), and I suspect it refers to the small English
alphabet. It always bugged me that I could not have proper robot
commands in French.
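
For illustration, the synonym mechanism from the lessons is ordinary name binding, and under Python 3 (where this PEP applies) the accented spellings become legal. The `move` and `turn_left` functions below are stand-ins I made up for this sketch, not the real RUR-PLE commands.

```python
def move():
    # Stand-in for the robot command; returns its name so it can be checked.
    return "move"


def turn_left():
    return "turn_left"


# Known command on the right, new word on the left, no parentheses.
avance = move

# With non-ASCII identifiers, the proper spellings can be used directly.
vire_à_gauche = turn_left
sola_dön = turn_left

print(avance(), vire_à_gauche(), sola_dön())
```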


While I would not use any non-ascii characters in my coding project
(because I like to be able to get bug reports [and patch!] from
others), I would love to be able to rewrite the lessons for RUR-PLE
using commands in proper French, rather than the bastardized purely
ascii based version. And I suspect it would be even more important in
Chinese...

André
 
Eric Brunel

Eric Brunel:


My keyboard in conjunction with the operating system (US English
keyboard on a Windows XP system) allows me to type characters from any
language. I haven't learned how to type these all quickly but I can get
through a session of testing Japanese input by myself. It's a matter of
turning on different keyboard layouts through the "Text Services and
Input Languages" control panel. Then there are small windows called
Input Method Editors that provide a mapping from your input to the
target language. Other platforms provide similar services.

Funny you talk about Japanese, a language I'm a bit familiar with and for
which I actually know some input methods. The thing is, these only work if
you know the transcription into the Latin alphabet of the word you want to
type, which closely matches its pronunciation. So if you don't know that
売り場 is pronounced "uriba" for example, you have absolutely no way of
entering the word. Even if you could choose among a list of characters,
are you aware that there are almost 2000 "basic" Chinese characters used
in the Japanese language? And if I'm not mistaken, there are several tens
of thousands of characters in the Chinese language itself. This makes typing
them virtually impossible if you don't know the language and/or have the
correct keyboard.
 
Carsten Haese

The X people who speak "no English" and program in Python. I
think X actually is very low (close to zero), because programming in
Python virtually does require you to know some English, whether you
can use non-ASCII characters in identifiers or not. It is naive to
believe that you can program in Python without understanding any
English once you can use your native characters in identifiers. That
will not happen. Please understand that: You basically *must* know
some English to program in Python, and the reason for that is not
that you cannot use non-ASCII identifiers.

There is evidence against your assertions that knowing some English is a
prerequisite for programming in Python and that people won't use non-ASCII
identifiers if they could. Go read the posts by "HYRY" on this thread, a
teacher from China, who teaches his students programming in Python, and they
don't know any English. They *do* use non-ASCII identifiers, and then they use
a cleanup script the teacher wrote to replace the identifiers with ASCII
identifiers so that they can actually run their programs. This disproves your
assertion on both counts.
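
I have not seen HYRY's script, so the following is only a guess at what such a cleanup step might look like, written against Python 3's `tokenize` module. The function name `asciify_identifiers` and the generated `ident_N` names are my own assumptions.

```python
import io
import tokenize


def asciify_identifiers(source: str) -> str:
    """Replace each non-ASCII identifier with a generated ASCII name."""
    mapping = {}
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME and not text.isascii():
            # Reuse the same replacement for every occurrence of a name.
            text = mapping.setdefault(text, f"ident_{len(mapping)}")
        result.append((tok.type, text))
    # With (type, string) pairs, untokenize emits valid code but does not
    # preserve the original spacing.
    return tokenize.untokenize(result)


source = "距离 = 3\n结果 = 距离 + 1\n"
cleaned = asciify_identifiers(source)
print(cleaned)
```

A real tool would also have to avoid clashes with names that already exist in the program, and would probably keep a readable mapping so error messages can be translated back.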

-Carsten
 
Ross Ridge

Ross said:
non-ASCII identifiers. While it's easy to find code where comments use
non-ASCII characters, I was never able to find a non-made up example
that used them in identifiers.

Gregor Horvath said:
If comments are allowed to be non-English, then why are identifiers not?

In the code I was looking at, identifiers were allowed to use non-ASCII
characters. For whatever reason, the programmers chose not to use non-ASCII
identifiers even though they had no problem using non-ASCII characters
in comments.

Ross Ridge
 
Guest

You have misread my statements.

Carsten said:
There is evidence against your assertions that knowing some English is a
prerequisite for programming

I think it is a prerequisite for "real" programming. Yes, I can imagine
that if you use Python as a teaching tool for Chinese 12-year-olds, then
it might be nice to be able to spell identifiers with Chinese
characters. However, IMO this is such a special use case that it is
justified to require the people who need this to explicitly enable it,
by using a patched interpreter or by enabling an interpreter option for
example.

in Python and that people won't use non-ASCII
identifiers if they could.

I did not assert that at all; where did you get the impression that I
do? If I were convinced that no one would use it, I would not have such a
big problem with it. I fear that it *will* be used "in the wild" if the
PEP in its current form is accepted and that I personally *will* have to
deal with such code.
 
