Proposal for adding symbols within Python

  • Thread starter Pierre Barbier de Reuille

Ben Finney

Steven D'Aprano said:
The only advantage would be if you want to do something like this:

MO, TU, WE, TH, FR, SA, SU = Symbols()

I believe that's exactly what Pierre doesn't want to do. He wants to
simply use names (marked special in some way) and have Python
automatically determine a unique value for each name, with nary an
assignment in sight.

To me, that's a net loss. It makes names more complicated, it loses
"explicit is better than implicit", and it loses the checks Python
could do against using a name that hasn't been assigned a value
(caused by e.g. a misspelled name).
 

Erik Max Francis

Ben said:
I believe that's exactly what Pierre doesn't want to do. He wants to
simply use names (marked special in some way) and have Python
automatically determine a unique value for each name, with nary an
assignment in sight.

To me, that's a net loss. It makes names more complicated, it loses
"explicit is better than implicit", and it loses the checks Python
could do against using a name that hasn't been assigned a value
(caused by e.g. a misspelled name).

I agree. And, when done explicitly, it's already easy enough to do this
within the language, by just assigning it a value, even if it's an
integer from range/xrange or a new sentinel like object().
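A minimal sketch of the two explicit alternatives mentioned here, written for Python 3 (where `range` replaces `xrange`). The weekday names come from the earlier example; the `MISSING` sentinel and the `lookup` helper are hypothetical names used only for illustration.

```python
# Explicit alternative 1: one integer per name, assigned in a single statement.
MO, TU, WE, TH, FR, SA, SU = range(7)

# Explicit alternative 2: a unique sentinel. A bare object() is equal only to
# itself, so it cannot collide with any real value.
MISSING = object()  # hypothetical name

def lookup(table, key):
    """Return table[key], or the MISSING sentinel when the key is absent."""
    return table.get(key, MISSING)

print(WE)                                   # the integer bound to WE
print(lookup({"a": None}, "b") is MISSING)  # absent key yields the sentinel
print(lookup({"a": None}, "a") is MISSING)  # a stored None is not the sentinel
```

A misspelled name such as `WD` fails with a NameError, which is the check that the post above says implicit symbols would lose.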
 

Ben Finney

Michael said:
If you have a name, you can redefine a name, therefore the value a
name refers to is mutable.

Since there are mutable and immutable values, it might be clearer to
say "the binding of a name to a value can be changed". Yes?

In that case, I don't see why someone who wants such a binding to be
unchanging can't simply avoid changing it. Where's the case for having
Python enforce this?
Conversely consider "NAME" to be a symbol. I can't modify "NAME". It
always means the same as "NAME" and "NAME", but is never the same as
"FRED". What's tricky is I can't have namespaceOne."NAME" [1] and
namespaceTwo."NAME" as different "NAME"s even though logically
there's no reason I couldn't treat "NAME" differently inside each.

So you want to mark such objects as being in a namespace, where they
compare the same within that namespace but not outside. Why is
separate syntax necessary for this? A data type that is informed of
its "space", and refuses comparison with values from other spaces,
would suffice.

class Xenophobe(object):
    def __init__(self, space, value):
        self.__space = space
        self.__value = value

    def __str__(self):
        return str(self.__value)

    def __cmp__(self, other):
        if not isinstance(other, Xenophobe):
            raise AssertionError, \
                "Can only compare Xenophobes to each other"
        if not other.__space == self.__space:
            raise AssertionError, \
                "Can only compare Xenophobes from the same space"
        return cmp(self.__value, other.__value)

With the bonus that you could pass such values around between
different names, and they'd *still* compare, or not, as you choose
when you first create them.

Replace the AssertionError with some appropriate return value, if you
want such comparisons to succeed.
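The class above is Python 2 code (`__cmp__`, `cmp` and the `raise X, "msg"` form no longer exist in Python 3). A rough Python 3 sketch of the same idea follows. It is an adaptation, not the poster's code: it raises TypeError instead of AssertionError, uses `functools.total_ordering` to derive the remaining comparisons, and adds a `__hash__` so instances stay usable as dictionary keys.

```python
from functools import total_ordering

@total_ordering
class Xenophobe:
    """A value that only compares with values created in the same space."""

    def __init__(self, space, value):
        self._space = space
        self._value = value

    def __str__(self):
        return str(self._value)

    def _check(self, other):
        # Refuse comparison with foreign types and foreign spaces.
        if not isinstance(other, Xenophobe):
            raise TypeError("Can only compare Xenophobes to each other")
        if other._space != self._space:
            raise TypeError("Can only compare Xenophobes from the same space")

    def __eq__(self, other):
        self._check(other)
        return self._value == other._value

    def __lt__(self, other):
        self._check(other)
        return self._value < other._value

    def __hash__(self):
        return hash((self._space, self._value))

opened = Xenophobe("file_state", "opened")
also_opened = Xenophobe("file_state", "opened")
print(opened == also_opened)   # same space, same value
try:
    opened == Xenophobe("door_state", "opened")
except TypeError as exc:
    print(exc)                 # different space: comparison is refused
```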
However it might be useful to note that these two values (or
symbols) are actually different, even if you remove their
namespaces.

The above Xenophobe implementation creates objects that know their
"space" forever.
To me, the use of a symbol implies a desire for a constant, and then
to only use that constant rather than the value. In certain
situations it's the fact that constant A is not the same as constant
B that's important (eg modelling state machines).

Since the actual value can't be easily accessed, the only purpose of a
Xenophobe is to be created and compared to others.
Often you can use strings for that sort of thing, but unfortunately
even python's strings can't be used as symbols that are always the
same thing in all ways. For example, we can force the id of
identical strings to be different:

Hold up -- what in your stated use case requires *identity* to be the
same? You said you just wanted to compare them to each other.

Besides, Python *does* preserve identity for short strings...
(135049832, 135059864)

... which is sufficient for unique values, if you're not actively
fighting the system as above. If the use case is for brief, identical
values, we have those: short strings.
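The difference between equality and identity for strings can be sketched as follows (Python 3). Whether two equal strings built at run time share an identity is an implementation detail, so the sketch makes no claim about `a is b`; `sys.intern` is the standard way to request a single shared object.

```python
import sys

a = "closed"
b = "".join(["clo", "sed"])  # built at run time; may or may not be the same object as a

# Equality compares contents and holds on every implementation.
print(a == b)

# Interning maps equal strings to one shared object, so identity then holds too.
print(sys.intern(a) is sys.intern(b))
```

Both lines print True. Code that compares states with `==` never needs to care about the second property.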
As a result I can see that *IF* you really want this kind of symbol,
rather than the various other kinds people have discussed in the
thread, that some special syntax (like u'hello' for unicode 'hello')
could be useful.

I can see the case for a new type. I disagree that syntax changes are
necessary.
However, I'd be more interested in some real world usecases where
this would be beneficial, and then seeing what sort of syntax would
be nice/useful (Especially since I can think of some uses where it
would be nice).

Real use cases would interest me, too, but *only* if they can't be
satisfied with a new type that knows things about its creation state,
such as Xenophobe.
The reason I'm more interested in seeing usecases, is because I'd
rather see where the existing approaches people use/define symbols
has caused the OP problems to the extent he feels the language needs
to change to fix these real world problems.

Ditto.
 

Pierre Barbier de Reuille

Ben Finney wrote:
Since there are mutable and immutable values, it might be clearer to
say "the binding of a name to a value can be changed". Yes?

In that case, I don't see why someone who wants such a binding to be
unchanging can't simply avoid changing it. Where's the case for having
Python enforce this?

The problem is not about having something constant!
The main point with symbols is to get human-readable values.
Let's say you have a symbol "opened" and a symbol "closed". The state of a
file may be one of the two.

If you have these symbols, you can ask for the state at any point and
get something readable. If you use constants whose values are, typically,
integers, the state of your file will be 0 or 1, which does not mean
anything.

Now, if you're using an object with more than two states, and moreover
if the number of states is likely to increase during development, it's
much more convenient to directly get the *meaning* of the value rather
than the value itself (which does not mean anything).

The key point that, I think, you misunderstand is that symbols are not
*variables* they are *values*.
Conversely consider "NAME" to be a symbol. I can't modify "NAME". It
always means the same as "NAME" and "NAME", but is never the same as
"FRED". What's tricky is I can't have namespaceOne."NAME" [1] and
namespaceTwo."NAME" as different "NAME"s even though logically
there's no reason I couldn't treat "NAME" differently inside each.


So you want to mark such objects as being in a namespace, where they
compare the same within that namespace but not outside. Why is
separate syntax necessary for this? A data type that is informed of
its "space", and refuses comparison with values from other spaces,
would suffice.

class Xenophobe(object):
    def __init__(self, space, value):
        self.__space = space
        self.__value = value

    def __str__(self):
        return str(self.__value)

    def __cmp__(self, other):
        if not isinstance(other, Xenophobe):
            raise AssertionError, \
                "Can only compare Xenophobes to each other"
        if not other.__space == self.__space:
            raise AssertionError, \
                "Can only compare Xenophobes from the same space"
        return cmp(self.__value, other.__value)

With the bonus that you could pass such values around between
different names, and they'd *still* compare, or not, as you choose
when you first create them.

Replace the AssertionError with some appropriate return value, if you
want such comparisons to succeed.

Well, I think a new syntax will promote the use of symbols. And as I
think they are good practice (much better than meaningless constants),
they should be promoted. Needless to say, every language I know that
implements symbols (or something close to symbols) has an
easy-to-use syntax for them.
The above Xenophobe implementation creates objects that know their
"space" forever.

Since the actual value can't be easily accessed, the only purpose of a
Xenophobe is to be created and compared to others.

Hold up -- what in your stated use case requires *identity* to be the
same? You said you just wanted to compare them to each other.

Besides, Python *does* preserve identity for short strings...

... which is sufficient for unique values, if you're not actively
fighting the system as above. If the use case is for brief, identical
values, we have those: short strings.

Well, one *big* difference between short strings and symbols is that the
identity of short strings is implementation-dependent, while
for symbols it has to hold in all implementations, as you will rely on
this identity. Then, once more, strings are just one possible
implementation for symbols, and I wouldn't like to tie symbols that
closely to strings.
I can see the case for a new type. I disagree that syntax changes are
necessary.

Well, syntactic sugar is all about what you want to promote or
discourage... it is never necessary, even if it can be really useful.
Real use cases would interest me, too, but *only* if they can't be
satisfied with a new type that knows things about its creation state,
such as Xenophobe.

Well, once more, new syntax for new objects never solves new problems;
it just makes them easier to write.

Pierre
 

Rocco Moretti

Pierre said:
Please note that I am entirely open on every point of this proposal
(which I do not yet dare to call a PEP).

I still don't see why you can't just use strings. The only two issues I
see you might have with them are a) two identical strings might not be
identical by id(), b) they aren't local in scope.

The objection a) is minor. One, all of your examples use equality for
testing already, and two, short strings are interned and identical in
most cases anyway (they only differ if you go to lengths to create
them, or they aren't sufficiently "variable like") - at most you would
have to standardize the rules.

The objection b) is a little harder to dismiss. But I'm not sure if
you've completely thought what it means for a symbol to be "local to a
module". What happens when you assign a variable containing a symbol to
a variable in another module? For that matter, what does it mean to be
"in a module"? Which module is a class instance (and associated symbols)
"in" if the class is defined in one module, instantiated in another, and
then passed as a return value to a third? What about from ... imports?
If you need a symbol "from another class" what's the mechanism of
obtaining it? Can you import symbols? Since you advocate storing symbols
internally as integers, I suppose you would have a program-global table
to keep symbols from different modules from having the same internal
representation. How do you pickle a symbol and have it go to a different
Python program, which may have a massive symbol table of its own?


It's been said before, and I'll say it again - the key to successful
Python language changes is compelling use cases. Find an existing Python
program or library (the stdlib is best) which would be markedly improved
by your language change. Not only will Guido be more likely to be
convinced, but what you're proposing will likely be clearer to everyone
else, if it's grounded in practical use cases.
 

Ben Finney

Pierre Barbier de Reuille said:
The problem is not about having something constant!
The main point with symbols is to get human-readable values.
Let's say you have a symbol "opened" and a symbol "closed". The state
of a file may be one of the two.

from some_enum_module import Enum

FileState = Enum('open', 'closed')

input_file.state = FileState.closed
If you have these symbols, you can ask for the state at any point
and get something readable. If you use constants whose values are,
typically, integers, the state of your file will be 0 or 1, which does
not mean anything.

str(input_file.state) # -> 'closed'
Now, if you're using an object with more than two states, and
moreover if the number of states is likely to increase during
development, it's much more convenient to directly get the
*meaning* of the value rather than the value itself (which does not
mean anything).

PixelColour = Enum('red', 'green', 'blue', 'black')
The key point that, I think, you misunderstand is that symbols are
not *variables* they are *values*.

So far, I see nothing that requires anything but a special object type
with the behaviour you describe. Which most of the enumerated-type
implementations do quite neatly.
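The thread predates it, but Python's standard library has shipped an `enum` module since version 3.4, and it can stand in for the unnamed `some_enum_module` above. The sketch below uses its functional API; the `InputFile` class is a hypothetical stand-in for whatever object carries the state.

```python
from enum import Enum

# Functional API: type name first, then the member names.
FileState = Enum("FileState", ["open", "closed"])

class InputFile:
    """Hypothetical object carrying a state attribute."""
    def __init__(self):
        self.state = FileState.closed

input_file = InputFile()
print(input_file.state.name)               # the readable name of the state
print(input_file.state is FileState.closed)

try:
    FileState.colsed                       # a misspelled member fails loudly
except AttributeError as exc:
    print("AttributeError:", exc)
```

Unlike the hypothetical module in the post, the standard `str()` of a member includes the type name, so `.name` is used here to get the bare label.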
Well, once more, new syntax for new objects never solves new
problems; it just makes them easier to write.

If you want to promote something, it would be best to implement it and
demonstrate some problems that it solves. You don't seem to be arguing
against a new object type, so perhaps it would be best to simply start
using that type to solve some actual problems.

Since "promotion" is the only argument you've given for new syntax
for this concept, I don't see what is served talking about creating
syntax for something that does not yet exist to be promoted. Once an
implementation exists for examination and is actually useful to some
amount of users for solving actual problems, that's the time to talk
about promoting it.
 

Steven D'Aprano

The problem is not about having something constant!
The main point with symbols is to get human-readable values.
Let's say you have a symbol "opened" and a symbol "closed". The state of a
file may be one of the two.

If you have these symbols, you can ask for the state at any point and
get something readable. If you use constants whose values are, typically,
integers, the state of your file will be 0 or 1, which does not mean
anything.

???

Why does the byte string "\x6f\x70\x65\x6e\x65\x64" have intrinsic meaning
when the int 0 doesn't? It certainly doesn't mean anything to non-English
speakers.

If all you want is human readable byte strings, then just use them:

class MyFile:
    def open(self):
        self.state = "opened"
    def close(self):
        self.state = "closed"


You don't need special syntax to use strings as symbols, you get them for
free without all the overhead you are proposing.

Now, if you're using an object with more than two states, and moreover
if the number of states is likely to increase during development, it's
much more convenient to directly get the *meaning* of the value rather
than the value itself (which does not mean anything).

How do you expect this to work in practice? You have an object which
has states:

obj = SomeThingComplex()

Now you want to operate on it according to the state. What do you do?

if obj.state is $closed$:
    obj.open()
elif obj.state is $opened$:
    obj.close()
elif obj.state is $full$:
    obj.make_empty()
elif obj.state is $empty$:
    obj.make_full()
else:
    # some other symbol
    raise ValueError("Unexpected state %s" % obj.state)

Replace "is" with "==" and $ with " and you have strings. You still need
to know what the object state is, and the way you do that is by comparing
it to something. Whether you call that something a symbol, an enum, a
string, an int, a class, whatever, the comparison still needs to be done.

The key point that, I think, you misunderstand is that symbols are not
*variables* they are *values*.

Python doesn't have variables. It has names and objects.

Well, I think a new syntax will promote the use of symbols. And as I
think they are good practice (much better than meaningless constants),
they should be promoted. Needless to say, every language I know that
implements symbols (or something close to symbols) has an
easy-to-use syntax for them.

Why is $closed$ better practice than "closed"?

Why is "closed" a meaningless constant and $closed$ a good symbol?

Well, one *big* difference between short strings and symbols is that the
identity of short strings is implementation-dependent,

Then don't use identity. Who cares whether the state you are testing
against points to the same chunk of memory or not? What possible
difference will that make, except some unnecessary optimization
_possibly_ saving you one millionth of a second at runtime?
while
for symbols it has to hold in all implementations, as you will rely on
this identity. Then, once more, strings are just one possible
implementation for symbols, and I wouldn't like to tie symbols that
closely to strings.

And ints are another possible implementation for symbols, or classes, or
enums.

obj.state = 42 is not an ideal implementation, because it is not
self-documenting, and self-documenting code is good code. But in some
contexts, it may be the right thing to do:

class MutablePolygon:
    """Define a polygon object that can grow or lose sides."""
    def __init__(self, n):
        """Create a new polygon with n sides."""
        self.state = n
    def grow_side(self):
        self.state += 1
    def lose_side(self):
        self.state -= 1

Compare that with something like this:

class MutablePolygon:
    """Define a polygon object that can grow or lose sides."""
    def __init__(self, n):
        """Create a new polygon with n sides."""
        if n == 1:
            self.state = $one$
        elif n == 2:
            self.state = $two$
        elif n == 3:
            self.state = $three$
        elif n ...
 

Grant Edwards

I still don't see why you can't just use strings.

Same here. In the situations described, I always use strings
and have never felt the need for something else:

file.state = 'closed'

...

if file.state == 'open':
    whatever
elif file.state == 'error':
    something_else

The only two issues I see you might have with them are a) two
identical strings might not be identical by id(), b) they
aren't local in scope.

The objection a) is minor. [...]

The objection b) is a little harder to dismiss. But I'm not
sure if you've completely thought what it means for a symbol
to be "local to a module".

I don't think I even understand what the objection is. What is
needed is a code fragment that shows how the use of strings is
untenable.
 

Guest

Steven D'Aprano said:
Why does the byte string "\x6f\x70\x65\x6e\x65\x64" have intrinsic
meaning when the int 0 doesn't? It certainly doesn't mean anything to
non-English speakers.

If all you want is human readable byte strings, then just use them:

class MyFile:
    def open(self):
        self.state = "opened"
    def close(self):
        self.state = "closed"

So, I guess no one read my explanation of why this is an issue about more
than implementing enums (which is fairly trivial, as we have seen).
 

Steven D'Aprano

Björn Lindström said:
So, I guess no one read my explanation of why this is an issue about more
than implementing enums (which is fairly trivial, as we have seen).


I read it. I don't see that it is an issue, and I
especially don't see why it is relevant to Pierre's
usage of symbols.

In your earlier post, you say:

"The problem with that is that you can't pass around
the names of objects that are used for other things."

That's demonstrably not true. If you know that the name
of something is Parrot, then you can pass the string
"Parrot" and use it in many ways:

print obj.__getattribute__("Parrot")
instance.__dict__["Parrot"] = 42

I'm not aware of anything that you can do to a
name/object binding that can't also be done by a
string. Perhaps I've missed something -- anyone?

If you don't know the name, well, how did it get into
your program in the first case? Where did it come from?
If it came from user input, then surely that is a
string, yes?

You also suggested:

"Being able to do that precludes the need for
converting going back and forth between strings and
method names when you need to do things like keeping a
list of function names, even when you need to be able
to change what those function names point to."

I'm not convinced. Instead of keeping a list of
function names, just keep a list of functions --
functions are first-class objects in Python.

If you need to change what the function names point to,
simply rebind the list item to another function.
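As a sketch of that suggestion (Python 3): the list holds the function objects themselves, and rebinding a list item changes what gets called without any conversion between names and strings. The `pipeline` list and `run` helper are hypothetical names for illustration.

```python
import math

# A list of functions, not a list of function names.
pipeline = [math.floor, abs]

def run(value):
    """Apply each function in the pipeline in order."""
    for func in pipeline:
        value = func(value)
    return value

print(run(-2.5))         # floor(-2.5) is -3, then abs gives 3

# Rebind one list item to another function; no name lookup is involved.
pipeline[0] = math.ceil
print(run(-2.5))         # ceil(-2.5) is -2, then abs gives 2
```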

The only benefit I can see for being able to refer to
functions by name would be if you are writing a formula
evaluator, it might be useful to have "sin"(0) evaluate
directly. But I don't like the idea of making strings
aliases to executables, except through a single
well-understood mechanism. I'd much rather accept one
intermediate layer than create a second mechanism of
function execution:

import math

table = {"sin": math.sin, "cos": math.cos}
# easy to modify
table["sin"] = my_better_sine_function
result = table["sin"](0)


If I have missed a usage case, perhaps you should give
a specific example.
 

Ben Finney

Steven D'Aprano said:
Python doesn't have variables. It has names and objects.

That seems to be what Pierre wants to change.

What he hasn't yet made clear is what benefit this brings, over simply
using existing basic types (with as much intrinsic meaning -- i.e.
none -- as the new object he's proposing).
 

Ben Finney

Björn Lindström said:
So, I guess no one read my explanation of why this is an issue about
more than implementing enums (which is fairly trivial, as we have
seen).

I read it. I see that something more than enums is being asked for.
What I don't see is a use case where this is a benefit over just using
an object type, such as enum or a string or something else.
 

Ben Sizer

Grant said:
In the situations described, I always use strings
and have never felt the need for something else:
....

I don't think I even understand what the objection is. What is
needed is a code fragment that shows how the use of strings is
untenable.

myObject.value = 'value1'

#... 100 lines of code elided...

if myObject.value == 'Value1':
    do_right_thing()
else:
    do_wrong_thing()


I don't actually think string use is 'untenable', but it is definitely
more error-prone. With some sort of named object on the right hand side
you will at least get a helpful NameError.
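The contrast can be made concrete with a small Python 3 sketch. The `Thing` class and the two check functions are hypothetical; the misspelling `Value1` is deliberate in both.

```python
VALUE1 = "value1"

class Thing:
    """Hypothetical object whose state was set with the named constant."""
    value = VALUE1

def check_with_string(obj):
    # Typo inside a string literal: still valid code, silently wrong branch.
    return "right" if obj.value == "Value1" else "wrong"

def check_with_name(obj):
    # Typo in a name: raises NameError when this line runs.
    return "right" if obj.value == Value1 else "wrong"

print(check_with_string(Thing()))
try:
    check_with_name(Thing())
except NameError as exc:
    print("NameError:", exc)
```

The NameError only appears when the line is executed, not when the module is compiled, so the check is helpful but not a compile-time guarantee.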
 

Steven D'Aprano

myObject.value = 'value1'

#... 100 lines of code elided...

if myObject.value == 'Value1':
    do_right_thing()
else:
    do_wrong_thing()


I don't actually think string use is 'untenable', but it is definitely
more error-prone. With some sort of named object on the right hand side
you will at least get a helpful NameError.

It is moments like this that I'm not too proud to admit I learnt some good
techniques from Pascal:

# define some pseudo-constants
RIGHT_THING = 'value1'
WRONG_THING = 'some other value'
INCHES_TO_FEET = 12
CM_TO_METRES = 100

# ...

myObject.value = RIGHT_THING

#... 100 lines of code elided...

if myObject.value == RIGHT_THING:
    do_right_thing()
else:
    do_wrong_thing()


It isn't always appropriate or necessary to define "constants" (and I
sometimes wish that Python would enforce assign-once names), but they can
help avoid some silly mistakes.
 

Grant Edwards

myObject.value = 'value1'

#... 100 lines of code elided...

if myObject.value == 'Value1':
    do_right_thing()
else:
    do_wrong_thing()

I don't actually think string use is 'untenable', but it is
definitely more error-prone. With some sort of named object on
the right hand side you will at least get a helpful NameError.

I don't see how that's an argument in favor of the proposal
being discussed. Aren't $Value1 and $value1 both legal and
distinct symbols in the proposed syntax? Won't you have the
exact same issue that you do with mis-typing strings?
 

Rocco Moretti

Björn Lindström said:
So, I guess no one read my explanation of why this is an issue about more
than implementing enums (which is fairly trivial, as we have seen).

I did, but I still don't see why it is an argument against using
strings. The point you may not appreciate is that (C)Python already uses
strings to represent names, as an important part of its introspective
abilities.

##########################################
>>> import dis
>>> def f():
...     module.klass.method()
...
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (module)
              3 LOAD_ATTR                1 (klass)
              6 LOAD_ATTR                2 (method)
              9 CALL_FUNCTION            0
             12 POP_TOP
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE
>>> f.func_code.co_names
('module', 'klass', 'method')
>>> type(f.func_code.co_names[1]) is type('a')
True
##############################################

I'll let you dig through the interpreter source to convince yourself
that, indeed, the names module, klass, and method are stored internally
as true python strings. The same holds for other namespaces - the names
are stored as real python strings, in a real python dictionary.

############################################
>>> class c:
...     def foo(self):
...         pass
...     def bar(self):
...         pass
...     def baz(self):
...         pass
...
>>> type(c.__dict__) is type({})
True
>>> c.__dict__.keys()
['baz', '__module__', 'foo', 'bar', '__doc__']
>>> type(c.__dict__.keys()[0]) is type('a')
True
##############################################

P.S. This may change for other implementations of Python, but the fact
remains - there is less difference between names and strings than you
may first think.
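The sessions above are Python 2 (`func_code`, list-returning `keys()`). A rough Python 3 equivalent of the same two checks is sketched below; `__code__` replaces `func_code`, and `vars(C)` returns a read-only view of the class namespace rather than a plain dict. The names `f` and `C` are illustrative.

```python
def f():
    # Never called; only its compiled code object is inspected.
    module.klass.method()

class C:
    def foo(self):
        pass
    def bar(self):
        pass
    def baz(self):
        pass

# Names referenced by the function body are stored as ordinary strings.
print(f.__code__.co_names)
print(all(type(name) is str for name in f.__code__.co_names))

# The class namespace is keyed by ordinary strings as well.
print(all(type(key) is str for key in vars(C)))
```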
 

Bengt Richter

It isn't always appropriate or necessary to define "constants" (and I
sometimes wish that Python would enforce assign-once names), but they can
help avoid some silly mistakes.
(As I'm sure you know) you can have "assign-once" names
if you are willing to spell them with a dot ;-)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: can't set attribute
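The code that produced this traceback did not survive in the post. As one possible illustration of "assign-once names spelled with a dot" (an assumption, not the poster's code), a Python 3 sketch using `__setattr__` follows; the `Constants` class and the `const` instance are hypothetical names.

```python
class Constants:
    """Namespace whose attributes can be bound once and never rebound."""

    def __setattr__(self, name, value):
        # Reject a second assignment to an already-bound attribute.
        if name in self.__dict__:
            raise AttributeError("can't rebind %r" % name)
        super().__setattr__(name, value)

const = Constants()
const.RIGHT_THING = "value1"      # first binding succeeds

try:
    const.RIGHT_THING = "other"   # second binding is refused
except AttributeError as exc:
    print("AttributeError:", exc)

print(const.RIGHT_THING)
```

This guards only against rebinding through normal attribute assignment; writing to `const.__dict__` directly still bypasses it.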

One could also write a more efficient write-once guarantee for function scope
using a decorator that munges byte code to guarantee it.

Or one could write a custom import that guarantees it for module scope.

Or one could change the language ;-)

Regards,
Bengt Richter
 

Pierre Barbier de Reuille

Rocco Moretti wrote:
[...]
I did, but I still don't see why it is an argument against using
strings. The point you may not appreciate is that (C)Python already uses
strings to represent names, as an important part of its introspective
abilities.

Well, I'm well aware of that, but I'm also well aware that it is (as you
said yourself) specific to CPython, so one just *cannot* rely on
strings being used as symbols in the language. What I would like to see
in Python is "names" (or "symbols", as you prefer) defined within the
language, so that you get something similar in whatever Python
implementation you use.

Then, in CPython, names may well be just strings, as they are already
implemented to be efficient as such, but other implementations may
choose something completely different.

The point is, why not let the programmer express just what he
needs (that is, some symbolic value like "opened", "blocked", ...) and
let the interpreter use whatever it thinks is most efficient?

That's the whole point of "names": being able to handle symbolic
values within the language. That's what made LISP so successful. That's
what makes dynamic languages possible! But why say a name is a
*string* when that is just an implementation detail? Isn't Python
mainly about allowing the programmer to concentrate on the important stuff?

Pierre
 

Ben Sizer

Grant said:
I don't see how that's an argument in favor of the proposal
being discussed. Aren't $Value1 and $value1 both legal and
distinct symbols in the proposed syntax? Won't you have the
exact same issue that you do with mis-typing strings?

I think the idea is that if the symbol hasn't been instantiated locally
in an assignment operation, then it will not exist, and "if foo ==
$symbolName" will either raise a NameError or flag some error during
compilation. It cannot do this with string comparisons.

I expect this would require a 2-pass compilation process, the first
pass spotting all references to symbols and instantiating them
appropriately, the second pass resolving these references and noting
any that did not match up.
 
