unicode as valid naming symbols

Mark H Harris · Mar 25, 2014

greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)

however this works:

The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?

marcus

wxjmfauth · Mar 25, 2014

On Tuesday, March 25, 2014 19:30:34 UTC+1, Mark H. Harris wrote:

greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)

On my keyboard mapping the "problem" character is alt-v, which produces
the radical symbol. Trying to set the symbol as a name within the
namespace gives a syntax error:

SyntaxError: invalid character in identifier

however this works:

The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?

marcus

S.isidentifier() -> bool

Return True if S is a valid identifier according
to the language definition.
cf "unicode.org" doc

jmf
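A quick check of the two characters from the question with the `str.isidentifier()` method quoted above (Python 3); the other sample strings are just illustrative picks:

```python
# str.isidentifier() checks a string against Python's identifier rules.
# Note: it does not reject keywords such as "in" or "is".
for name in ["λ", "√", "sqrt", "_λ2", "2λ"]:
    print(repr(name), name.isidentifier())
```

Only `'√'` and `'2λ'` (which starts with a digit) come back `False`.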

Mark H Harris · Mar 25, 2014

S.isidentifier() -> bool

Return True if S is a valid identifier according
to the language definition.

cf "unicode.org" doc

Excellent, thanks!

marcus

MRAB · Mar 25, 2014

greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)

however this works:

The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?

It's explained in PEP 3131.

Basically, a name should start with a letter (this has been extended
to include Chinese characters, etc.) or an underscore.

λ is classified as Lowercase_Letter.

√ is classified as Math_Symbol.
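The classifications above can be looked up with the standard `unicodedata` module, which reports the two-letter abbreviation of the general category (Ll = Lowercase_Letter, Sm = Math_Symbol):

```python
import unicodedata

# Print the general category and official Unicode name of each character.
for ch in ["λ", "√"]:
    print(ch, unicodedata.category(ch), unicodedata.name(ch))
# λ Ll GREEK SMALL LETTER LAMDA
# √ Sm SQUARE ROOT
```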

Mark H Harris · Mar 25, 2014

It's explained in PEP 3131.

Basically, a name should start with a letter (this has been extended
to include Chinese characters, etc.) or an underscore.

λ is classified as Lowercase_Letter.

√ is classified as Math_Symbol.

Thanks much! I'll note that for improvements. Any unicode symbol
(that is not a number) should be allowed as an identifier.

marcus

Dave Angel · Mar 25, 2014

Mark H Harris said:
greetings, I would like to create a lambda as follows:

√ = lambda n: sqrt(n)

however this works:

The question is which unicode(s) are capable of being proper name
characters, and which ones are off-limits, and why?

See the official docs

http://docs.python.org/3/reference/lexical_analysis.html#identifiers

There's also a method on str that'll tell you: isidentifier().
To see such methods, use dir("").

As for why, you can get a pretty good idea from the reference
above, as it lists the 12 Unicode categories and properties that
can be used. You can also look at PEP 3131 and at Potsdam's site.
Both links are on the above page. Letters, marks, connectors, and
numbers, but not punctuation.
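A rough sketch of those rules in terms of general categories: the first character must be a letter (Lu, Ll, Lt, Lm, Lo), a letter number (Nl) or an underscore, and later characters may also be marks (Mn, Mc), decimal digits (Nd) or connector punctuation (Pc). This is an approximation for illustration only: the hypothetical `looks_like_identifier` helper ignores the Other_ID_Start/Other_ID_Continue properties and normalization, so `str.isidentifier()` remains the real test.

```python
import unicodedata

# Categories allowed for the first character (plus "_").
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Categories allowed for subsequent characters ("_" is Pc).
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def looks_like_identifier(s: str) -> bool:
    """Approximate str.isidentifier() using general categories only."""
    if not s:
        return False
    first, rest = s[0], s[1:]
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in rest)


for s in ["λ", "√", "x1", "1x", "名前", "x+y", "a_b"]:
    print(repr(s), looks_like_identifier(s), s.isidentifier())
```

For these samples the approximation and `isidentifier()` agree; "√" and "+" fail because they are math symbols (Sm), not letters or digits.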

Marko Rauhamaa · Mar 25, 2014

Mark H Harris said:
Thanks much! I'll note that for improvements. Any unicode symbol
(that is not a number) should be allowed as an identifier.

I don't know if that's a good idea, but that's how it is in lisp/scheme.

Thus, "*" and "1+" are normal identifiers in lisp and scheme.

Marko

Ian Kelly · Mar 25, 2014

Thanks much! I'll note that for improvements. Any unicode symbol (that
is not a number) should be allowed as an identifier.

√ cannot be used in identifiers for the same reasons that + and ~
cannot: identifiers are intended to be alphanumeric. √ is not
currently the name of an operator, but who knows what may happen in
the future?
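One way to see that √ is rejected at the same stage as + and ~ is to hand small snippets to the built-in `compile()` and catch `SyntaxError`; the helper name `compiles` is just for illustration, and the exact error message differs between Python versions:

```python
def compiles(source: str) -> bool:
    """Return True if source compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles("λ = lambda n: n ** 0.5"))  # True: λ is a letter
print(compiles("√ = lambda n: n ** 0.5"))  # False: √ is a math symbol
print(compiles("~ = 1"))                   # False: ~ is an operator
```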

Python generally follows Annex 31 of the Unicode standard in this regard:

http://www.unicode.org/reports/tr31/

Skip Montanaro · Mar 25, 2014

I don't know if that's a good idea, but that's how it is in lisp/scheme.

Thus, "*" and "1+" are normal identifiers in lisp and scheme.

But parsing Lisp is pretty trivial.

Skip

Tim Chase · Mar 25, 2014

Thanks much! I'll note that for improvements. Any unicode
symbol (that is not a number) should be allowed as an identifier.

It's not just about number'ness:
True

-tkc

Cameron Simpson · Mar 25, 2014

I don't know if that's a good idea, but that's how it is in lisp/scheme.

I think it is a terrible idea. Doing that preemptively prevents
allowing them for any other purpose in the future, ever.

Identifiers are easy if you stick to the corresponding Unicode class.

Sucking in every other symbol prevents other uses later. Such as using the
square root symbol as a prefix operator. Etc.

Don't be too grabby with syntax; it leaves no room later for better syntax.

Cheers,

Ethan Furman · Mar 25, 2014

Thanks much! I'll note that for improvements. Any unicode symbol (that is not a number) should be allowed as an
identifier.

No, it shouldn't. Doing so would mean we could not use √ as the square root operator in the future.

Identifiers are made up of letters, numbers, and the underscore. Considering all the unicode letters and unicode
numbers out there, you shouldn't be lacking for names.

Steven D'Aprano · Mar 25, 2014

Thanks much! I'll note that for improvements. Any unicode symbol
(that is not a number) should be allowed as an identifier.

To quote a great Spaniard:

“You keep using that word, I do not think it means what you
think it means.”

Do you think that the ability to write this would be an improvement?

import ⌺
⌚ = ⌺.╩░
⑥ = 5*⌺.⋨⋩
❹ = ⑥ - 1
♅⚕⚛ = [⌺.✱✳**⌺.❇*❹{⠪|⌚.∣} for ⠪ in ⌺.⣚]
⌺.˘˜¨´՛՜(♅⚕⚛)

Of course, it's not even necessary to be that exotic. "Any unicode symbol
that is not a number"... that means things like these:

x+y
spam.eggs
cheese["toast"]

would count as identifiers, which could lead to a little bit of parsing
ambiguity... *wink*
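Today that ambiguity doesn't arise because the tokenizer splits at any character that can't continue a name. A small sketch with the standard `tokenize` module shows the three examples above being broken into NAME, OP and STRING tokens (the `show_tokens` helper is hypothetical):

```python
import io
import tokenize


def show_tokens(source: str) -> list:
    """Return (token type name, token text) pairs, skipping layout tokens."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NAME, tokenize.OP, tokenize.STRING)
    ]


print(show_tokens("x+y\n"))
print(show_tokens("spam.eggs\n"))
print(show_tokens('cheese["toast"]\n'))
```

If "+" and "." were identifier characters, each of these lines would instead be a single NAME token.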

There are languages that can allow arbitrary symbols as identifiers, like
Lisp and Forth. You will note that they have a certain reputation for
being, um, different, and although both went through periods of
considerable popularity, both have faded in popularity since. While they
have their strengths, and their defenders, nobody argues that they are
readily accessible to the average programmer.

Rustom Mody · Mar 26, 2014

On Tuesday, March 25, 2014 19:30:34 UTC+1, Mark H. Harris wrote:
S.isidentifier() -> bool

Return True if S is a valid identifier according
to the language definition.

Thanks jmf!
You obviously have more unicode knowledge than many (most?) of us here.
And when you contribute that knowledge in short-n-sweet form as above
it is helpful to all.

cf "unicode.org" doc

Ummm...
Less helpful here.
What/where do you expect someone to start reading?
If a python beginner asks some basic question and someone here were to say
"Go read up on http://python.org"
who is helped?

Terry Reedy · Mar 26, 2014

greetings, I would like to create a lambda as follows:

A lambda is a function lacking a proper name.

√ = lambda n: sqrt(n)

This is discouraged in PEP 8. If the following worked,

def √(n): return sqrt(n)

then the function would have √ as its __name__ attribute.
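The difference can be seen with a legal name. In this sketch `λ` stands in for √, and `root` is an arbitrary illustrative name: the lambda keeps the generic name `<lambda>` even after assignment, while `def` records the name it was defined with (which is what shows up in tracebacks and reprs).

```python
from math import sqrt

# A lambda bound to a name still reports the generic name "<lambda>".
λ = lambda n: sqrt(n)
print(λ.__name__)  # <lambda>


# A def statement stores the name it was defined with.
def root(n):
    return sqrt(n)


print(root.__name__)  # root
```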

MRAB · Mar 26, 2014

No, it shouldn't. Doing so would mean we could not use √ as the square root operator in the future.

Or as a root operator, e.g. 3 √ x (the cube root of x).

Antoon Pardon · Mar 26, 2014

No, it shouldn't. Doing so would mean we could not use √ as the
square root operator in the future.

And what advantage would that bring over just using it as a function?
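For comparison, the "n √ x" operation discussed here is a one-line function in current Python. A minimal sketch, where the name `nth_root` is hypothetical and the computation is plain exponentiation `x ** (1 / n)`:

```python
def nth_root(n: float, x: float) -> float:
    """Return the n-th root of x, i.e. what "n √ x" would denote.

    Uses x ** (1 / n); for negative x Python 3 returns a complex
    number here rather than a real odd root.
    """
    return x ** (1 / n)


print(nth_root(2, 16))     # 4.0
print(nth_root(3, 27))     # approximately 3.0 (floating point)
print(nth_root(2.5, 8.2))  # a non-integer index works too
```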

Antoon Pardon · Mar 26, 2014

Or as a root operator, e.g. 3 √ x (the cube root of x).

Personally I would think such an operator is too limited to include in a programming language.
This kind of notation is only used with a constant to indicate what kind of root is taken and
only with positive integers. I have never seen anything like the equivalent of the following:

t = 2.5
x = 8.2
y = t √ x

Of course we don't have to follow mathematical convention with python. However, allowing any
unicode symbol as an identifier doesn't prohibit using √ as an operator. We do have
"in" and "is" as operators now, even if they would otherwise be acceptable identifiers.
So I wonder, would you consider introducing log as an operator? 2 log x seems an interesting
operation for a programmer.

Ian Kelly · Mar 26, 2014

Personally I would think such an operator is too limited to include in a programming language.
This kind of notation is only used with a constant to indicate what kind of root is taken and
only with positive integers. I have never seen anything like the equivalent of the following:

t = 2.5
x = 8.2
y = t √ x

An example is taking the geometric mean of an arbitrary number of values:

product = functools.reduce(operator.mul, values, 1)
n = len(values)
geometric_mean = n √ product

I might argue though for the inverted syntax (product √ n) to more
closely parallel division.
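Without a √ operator, the same geometric mean can be written today by raising the product to the power 1/n. A minimal sketch that keeps the `functools.reduce(operator.mul, ...)` line from the example above (the function name is illustrative; the standard library also has `statistics.geometric_mean` since Python 3.8):

```python
import functools
import operator


def geometric_mean(values):
    """n-th root of the product of n values ("n √ product").

    Raises ZeroDivisionError for an empty sequence, since n == 0.
    """
    product = functools.reduce(operator.mul, values, 1)
    n = len(values)
    return product ** (1 / n)


print(geometric_mean([2, 8]))     # 4.0
print(geometric_mean([1, 3, 9]))  # approximately 3.0
```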

Of course we don't have to follow mathematical convention with python. However, allowing any
unicode symbol as an identifier doesn't prohibit using √ as an operator. We do have
"in" and "is" as operators now, even if they would otherwise be acceptable identifiers.
So I wonder, would you consider introducing log as an operator? 2 log x seems an interesting
operation for a programmer.

If it's going to become an operator, then it has to be a keyword.
Changing a token that is currently allowed to be an identifier into a
keyword is generally avoided as much as possible, because it breaks
backward compatibility. "in" and "is" have both been keywords for a
very long time, perhaps since the initial release of Python.
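The keyword point is visible from the standard library: "in" and "is" pass `str.isidentifier()` but are listed by `keyword.iskeyword()`, so checking whether a string can be used as a variable name takes both tests. A small sketch (the helper name `usable_as_name` is hypothetical):

```python
import keyword


def usable_as_name(s: str) -> bool:
    """True if s could appear on the left of an assignment like `s = 1`."""
    # isidentifier() alone accepts keywords such as "in" and "is".
    return s.isidentifier() and not keyword.iskeyword(s)


for s in ["in", "is", "log", "λ", "√"]:
    print(repr(s), s.isidentifier(), keyword.iskeyword(s), usable_as_name(s))
```

Turning "log" into an operator would move it into the keyword column, which is exactly the backward-compatibility break described above.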

