"/a" is not "/a" ?

  • Thread starter Emanuele D'Arrigo

Emanuele D'Arrigo

Hi everybody,

while testing a module today I stumbled on something that I can work
around but I don't quite understand.
False # eeeeek!

Why do c and d point to two different objects with identical string
content rather than to the same object?
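
A minimal sketch of the distinction at issue, written for Python 3 (the thread predates it). The strings are built at run time so the interpreter has no compile-time constant to share. The names `c` and `d` follow the question; whether `c is d` holds is an implementation detail, so the snippet prints it rather than assuming a result.

```python
# Build two equal strings at run time, so neither is a compile-time literal.
c = "".join(["/", "a"])
d = "".join(["/", "a"])

print(c == d)         # equality compares contents: True
print(c is d)         # identity: implementation-dependent, do not rely on it
print(id(c), id(d))   # the object identities that "is" compares
```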

Manu
 

Gary Herron

Emanuele said:
Hi everybody,

while testing a module today I stumbled on something that I can work
around but I don't quite understand.

*Do NOT use "is" to compare immutable types.* **Ever!**

It is an implementation choice (usually driven by efficiency considerations) to choose when two strings with the same value are stored in memory once or twice. In order for Python to recognize when a newly created string has the same value as an already existing string, and so use the already existing value, it would need to search *every* existing string whenever a new string is created. Clearly that's not going to be efficient. However, the C implementation of Python does a limited version of such a thing -- at least with strings of length 1.

Gary Herron
 

Robert Kern

*Do NOT use "is" to compare immutable types.* **Ever!**

Well, "foo is None" is actually recommended practice....

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Gary Herron

Robert said:
Well, "foo is None" is actually recommended practice....

But since newbies are always falling into this trap, it is still a good
rule to say:

Newbies: Never use "is" to compare immutable types.

and then later point out, for those who have absorbed the first rule:

Experts: Singleton immutable types *may* be compared with "is",
although normal equality with == works just as well.

Gary Herron
 

Robert Kern

But since newbies are always falling into this trap, it is still a good
rule to say:

Newbies: Never use "is" to compare immutable types.

and then later point out, for those who have absorbed the first rule:

Experts: Singleton immutable types *may* be compared with "is",
although normal equality with == works just as well.

That's not really true. If my object overrides __eq__ in a funny way, "is None"
is much safer.

Use "is" when you really need to compare by object identity and not value.
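
A minimal sketch of the "funny `__eq__`" case, in Python 3. The class `AlwaysEqual` is hypothetical and exists only to show how an equality test against None can be fooled while an identity test cannot.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


foo = AlwaysEqual()

print(foo == None)   # True: __eq__ was consulted and said "yes"
print(foo is None)   # False: foo is not the None object
```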

--
Robert Kern
 

Steven D'Aprano

Gary said:
*Do NOT use "is" to compare immutable types.* **Ever!**

Huh? How am I supposed to compare immutable types for identity then? Your
bizarre instruction would prohibit:

if something is None

which is the recommended way to compare to None, which is immutable. The
standard library has *many* identity tests to None.

I would say, *always* use "is" to compare any type whenever you intend to
compare by *identity* instead of equality. That's what it's for. If you use
it to test for equality, you're doing it wrong. But in the very rare cases
where you care about identity (and you almost never do), "is" is the
correct tool to use.

It is an implementation choice (usually driven by efficiency
considerations) to choose when two strings with the same value are stored
in memory once or twice. In order for Python to recognize when a newly
created string has the same value as an already existing string, and so
use the already existing value, it would need to search *every* existing
string whenever a new string is created.

Not at all. It's quite easy, and efficient. Here's a pure Python string
constructor that caches strings.

class CachedString(str):
    _cache = {}
    def __new__(cls, value):
        s = cls._cache.setdefault(value, value)
        return s
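
A short usage sketch for the class above, in Python 3. The class is restated so the snippet runs on its own; the names `first` and `second` are illustrative. Note that the constructor hands back plain `str` objects from the cache, not `CachedString` instances.

```python
class CachedString(str):
    # One shared dict maps each value to the first string object
    # seen with that value.
    _cache = {}

    def __new__(cls, value):
        return cls._cache.setdefault(value, value)


# Two equal strings, each built at run time.
first = CachedString("".join(["/", "a"]))
second = CachedString("".join(["/", "a"]))

print(first == second)  # True
print(first is second)  # True: the cache handed back the first object
```

The lookup is a single dictionary access, so no scan over every existing string is needed.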

Python even includes a built-in function to do this: intern(), although I
believe it has been moved out of the built-ins in Python 3.0 (to sys.intern()).
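
In Python 3 the function is available as `sys.intern()`. A minimal sketch, with the strings built at run time so that any sharing comes from the explicit interning call:

```python
import sys

a = "".join(["/", "a"])
b = "".join(["/", "a"])

# Replace each name's string with the canonical interned copy.
a = sys.intern(a)
b = sys.intern(b)

print(a is b)  # True: both names now refer to the interned copy
```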

Clearly that's not going to be efficient.

Only if you do it the inefficient way.

However, the C implementation of Python does a limited version
of such a thing -- at least with strings of length 1.

No, that's not right. The identity test fails for some strings of length
one.
False


Clearly, Python doesn't intern all strings of length one. What Python
actually interns are strings that look like, or could be, identifiers:
True

It also does a similar thing for small integers, currently something
like -5 through to 256 I believe, although this is an implementation
detail subject to change.
 

Steven D'Aprano

Emanuele said:
Hi everybody,

while testing a module today I stumbled on something that I can work
around but I don't quite understand.

Why do you have to work around it?

What are you trying to do that requires that two strings should occupy the
same memory location rather than merely being equal?

Why do c and d point to two different objects with identical string
content rather than to the same object?

Why shouldn't they?
 

Gary Herron

Robert said:
That's not really true. If my object overrides __eq__ in a funny way,
"is None" is much safer.

Use "is" when you really need to compare by object identity and not
value.

But that definition is the *source* of the trouble. It is *completely*
meaningless to newbies. Until one has experience in programming in
general and experience in Python in particular, the difference between
"object identity" and "value" is a mystery.

So in order to lead newbies away from this *very* common trap they often
fall into, it is still a valid rule to say

Newbies: Never use "is" to compare immutable types.

or even better

Newbies: Never use "is" to compare anything.

This will help them avoid traps, and won't hurt their use of the
language. If they get to a point that they need to contemplate using
"is", then almost by definition, they are not a newbie anymore, and the
rule is still valid.

Gary Herron
 

Gary Herron

Steven said:
Gary Herron wrote:



Huh? How am I supposed to compare immutable types for identity then? Your
bizarre instruction would prohibit:

if something is None

Just use:

if something == None

It does *exactly* the same thing.


But... I'm not (repeat NOT) saying *you* should do it this way.

I am saying that since newbies continually trip over incorrect uses of
"is", they should be warned against using "is" in any situation until
they understand the subtle nature of "is".

If they use a couple of "something==None" instead of "something is None"
in their code while learning Python, it won't hurt, and they can change
their style when they understand the difference. And meanwhile they
will skip traps newbies fall into when they don't understand these
things yet.

Gary Herron
 

Steven D'Aprano

Gary said:
Robert Kern wrote: ....

But that definition is the *source* of the trouble. It is *completely*
meaningless to newbies. Until one has experience in programming in
general and experience in Python in particular, the difference between
"object identity" and "value" is a mystery.

Then teach them the difference, rather than give them bogus advice.

So in order to lead newbies away from this *very* common trap they often
fall into, it is still a valid rule to say

Newbies: Never use "is" to compare immutable types.

Look in the standard library, and you will see dozens of cases of
first-quality code breaking your "valid" rule.

Your rule is not valid. A better rule might be:

Never use "is" to compare equality.

Or even:

Never use "is" unless you know the difference between identity and equality.

Or even:

Only use "is" on Tuesdays.

At least that last rule is occasionally right (in the same way a stopped
clock is right twice a day), while your rule is *always* wrong. It is never
correct to avoid using "is" when you need to compare for identity.

or even better

Newbies: Never use "is" to compare anything.

Worse and worse! Now you're actively teaching newbies to write buggy code!
 

Steven D'Aprano

Gary said:
Just use:

if something == None

It does *exactly* the same thing.

Wrong.

"something is None" is a pointer comparison. It's blindingly fast, and it
will only return True if something is the same object as None. Any other
object *must* return False.

"something == None" calls something.__eq__(None), which is a method of
arbitrary complexity, which may cause arbitrary side-effects. It can have
false positives, where objects with unexpected __eq__ methods may return
True, which is almost certainly not the intention of the function author
and therefore a bug.
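
A small Python 3 sketch of the difference in dispatch. The class `Noisy` and its `eq_calls` counter are hypothetical, introduced only to make the method call visible: `is` never reaches `__eq__`, while `==` does.

```python
class Noisy:
    """Hypothetical class that counts how often __eq__ is invoked."""

    def __init__(self):
        self.eq_calls = 0

    def __eq__(self, other):
        self.eq_calls += 1       # a visible side effect of comparing
        return self is other


something = Noisy()

result_is = something is None    # pure identity check, no method call
calls_after_is = something.eq_calls

result_eq = something == None    # dispatches to Noisy.__eq__
calls_after_eq = something.eq_calls

print(result_is, calls_after_is)   # False 0
print(result_eq, calls_after_eq)   # False 1
```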

[...]
If they use a couple of "something==None" instead of "something is None"
in their code while learning Python, it won't hurt,

Apart from the subtle bugs they introduce into their code.

and they can change
their style when they understand the difference. And meanwhile they
will skip traps newbies fall into when they don't understand these
things yet.

How about teaching them the right reasons for using "is" instead of giving
them false information by telling them they should never use it?
 

Emanuele D'Arrigo

Thank you everybody for the contributions and sorry if I reawoke the
recurring "is vs ==" issue. I -think- I understand how Python's
object model works, but clearly I'm still missing something. Let me
reiterate my original example without the distracting aspect of the
"==" comparisons and the four variables:
False

So, it appears that in the first case a and b are names to the same
string object, while in the second case they are to two separate
objects. Why? What's so special about the forward slash that causes the
two "/a" strings to create two separate objects? Is this an
implementation-specific issue?

Manu
 

Emanuele D'Arrigo

It is an implementation choice (usually driven by efficiency considerations) to choose when two strings with the same value are stored in memory once or twice.  In order for Python to recognize when a newly created string has the same value as an already existing string, and so use the already existing value, it would need to search *every* existing string whenever a new string is created.  Clearly that's not going to be efficient.  However, the C implementation of Python does a limited version of such a thing -- at least with strings of length 1.

Gary, thanks for your reply: your explanation does pretty much answer
my question. One thing I can add however is that it really seems that
non-alphanumeric characters such as the forward slash make the
difference, not just the number of characters. I.e.
False

I just find it peculiar more than a nuisance, but I'll go to the
blackboard and write 100 times "never compare the identities of two
immutables". Thank you all!

Manu
 

skip

Gary> *Do NOT use "is" to compare immutable types.* **Ever!**

The obvious followup question is then, "when is it ok to use 'is'?"

Robert> Well, "foo is None" is actually recommended practice....

Indeed. It does have some (generally small) performance ramifications as
well. Two trivial one-line examples:

% python -m timeit -s 'x = None' 'x is None'
10000000 loops, best of 3: 0.065 usec per loop
% python -m timeit -s 'x = None' 'x == None'
10000000 loops, best of 3: 0.121 usec per loop
% python -m timeit -s 'x = object(); y = object()' 'x == y'
10000000 loops, best of 3: 0.154 usec per loop
% python -m timeit -s 'x = object(); y = object()' 'x is y'
10000000 loops, best of 3: 0.0646 usec per loop

I imagine the distinction grows if you implement a class with __eq__ or
__cmp__ methods, but that would make the examples greater than one line
long. Of course, the more complex the objects you are comparing the
stronger the recommendation against using 'is' to compare two objects.
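
The same kind of measurement can be made from inside a script with the standard `timeit` module. This is a sketch in Python 3; the iteration count is an arbitrary choice, and the absolute numbers will differ by machine and interpreter version.

```python
import timeit

# Each statement runs in its own namespace prepared by the setup string.
t_is = timeit.timeit("x is None", setup="x = None", number=100_000)
t_eq = timeit.timeit("x == None", setup="x = None", number=100_000)

print(f"x is None: {t_is:.4f} s for 100,000 evaluations")
print(f"x == None: {t_eq:.4f} s for 100,000 evaluations")
```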

Skip
 

Gary Herron

Steven said:
Gary Herron wrote:



Then teach them the difference, rather than give them bogus advice.




Look in the standard library, and you will see dozens of cases of
first-quality code breaking your "valid" rule.

Your rule is not valid. A better rule might be:

Never use "is" to compare equality.

Or even:

Never use "is" unless you know the difference between identity and equality.

Or even:

Only use "is" on Tuesdays.

At least that last rule is occasionally right (in the same way a stopped
clock is right twice a day), while your rule is *always* wrong. It is never
correct to avoid using "is" when you need to compare for identity.



Worse and worse! Now you're actively teaching newbies to write buggy code!

Nonsense. Show me "newbie" level code that's buggy with "==" but
correct with "is".

However, I do like your restatement of the rule this way:

Never use "is" unless you know the difference between identity and
equality.

That warns newbies away from the usual pitfall, and (perhaps) won't
offend those who seem to forget what "newbie" means.

Gary Herron
 

Martin v. Löwis

So, it appears that in the first case a and b are names to the same
string object, while in the second case they are to two separate
objects. Why?

This question is ambiguous:
a) Why does the Python interpreter behave this way?
(i.e. what specific algorithm produces this result?)
or
b) Why was the interpreter written to behave this way?
(i.e. what is the rationale for that algorithm?)

For a), the answer is in Objects/codeobject.c:

/* Intern selected string constants */
for (i = PyTuple_Size(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!PyString_Check(v))
        continue;
    if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
        continue;
    PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}

So it interns all strings which only consist of name
characters.
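
A rough Python 3 rendering of what that check amounts to, as a sketch rather than the interpreter's actual code. It assumes "name characters" means ASCII letters, digits and the underscore; the Python function below merely borrows the name of the C helper.

```python
import string

# Assumption: name characters are ASCII letters, digits and underscore.
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def all_name_chars(s):
    """Return True if every character of s could appear in an identifier."""
    return all(ch in NAME_CHARS for ch in s)


print(all_name_chars("a"))    # True: a candidate for interning
print(all_name_chars("/a"))   # False: the slash disqualifies it
```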

For b), the rationale is that such string literals
in source code are often used to denote names, e.g.
for getattr() calls and the like. As all names are interned,
name-like strings get interned also.

What's so special about the forward slash that causes the
two "/a" strings to create two separate objects?

See above.

Is this an implementation-specific issue?

Yes, see above.

Martin
 

Gary Herron

Emanuele said:
Gary, thanks for your reply: your explanation does pretty much answer
my question. One thing I can add however is that it really seems that
non-alphanumeric characters such as the forward slash make the
difference, not just the number of characters. I.e.


False

I just find it peculiar more than a nuisance, but I'll go to the
blackboard and write 100 times "never compare the identities of two
immutables". Thank you all!

Unless you are *trying* to discern something about the implementation
and its attempt at efficiencies. Here are several more interesting examples:
False


Gary Herron
 

Gabriel Genellina

False

So, it appears that in the first case a and b are names to the same
string object, while in the second case they are to two separate
objects. Why? What's so special about the forward slash that causes the
two "/a" strings to create two separate objects? Is this an
implementation-specific issue?

With all the answers you got, I hope you now understand that you put the
question backwards: it's not "why aren't a and b the very same object in
the second case?" but "why are they the same object in the first case?".

Two separate expressions, involving two separate literals, don't *have* to
evaluate as the same object. Only because strings are immutable the
interpreter *may* choose to re-use the same string. But Python would still
be Python even if all those strings were separate objects (although it
would perform a lot slower!)
 
