Python string comparison oddity


Faheem Mitha

Hi everybody,

I was wondering if anyone can explain this. My understanding is that 'is'
checks if the object is the same. However, in that case, why this
inconsistency for short strings? I would expect a 'False' for all three
comparisons. This is reproducible across two different machines, so it is
not just a local quirk. I'm running Debian etch with Python 2.4.4 (the
default).
Thanks, Faheem.

In [1]: a = '--'

In [2]: a is '--'
Out[2]: False

In [4]: a = '-'

In [5]: a is '-'
Out[5]: True

In [6]: a = 'foo'

In [7]: a is 'foo'
Out[7]: True
 

Lie

Yes, this happens because of small-object caching. When small
integers or short strings are created, there is a possibility that they
refer to the same object behind the scenes. Don't rely on this
behavior.
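
A minimal sketch of the practical upshot, written in Python 3 syntax rather than the 2.4.4 from the question: compare strings with `==`, and treat whatever `is` reports as an implementation detail. The helper name `make_dashes` is hypothetical and only there to build the strings at run time.

```python
def make_dashes(n):
    """Build a string of n dashes at run time (hypothetical helper)."""
    return "-" * n

a = make_dashes(2)
b = make_dashes(2)

# Value equality is what the language guarantees for equal strings.
print("equal:", a == b)

# Identity depends on the implementation's caching, so only report it.
print("same object:", a is b)
```

The second line of output may differ between interpreters and versions, which is exactly why it should not be relied on.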
 

Faheem Mitha

Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
the basis of the choice is?
Faheem.
 

Robert Kern

Faheem said:
Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
the basis of the choice is?

Shortish Python identifiers and operators, I think. Plus a handful like '\x00'.
The source would know for sure, but alas, I am lazy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Lie

Faheem Mitha said:
Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
the basis of the choice is?

Yes, but we've already been warned not to rely on it, since the basis
for what gets cached and what doesn't may be arbitrary. Personally, I
wouldn't delve too deeply into it; it isn't reliable behavior.
 

Hrvoje Niksic

Faheem Mitha said:
Yes, but why is '-' and 'foo' cached, and not '--'? Do you know what
the basis of the choice is?

Caches such as the intern dictionary/set and the one-character cache are
specific to the implementation (and also to its version, etc.). In this
case '-' is a 1-character string, all of which are cached. Python also
interns strings that show up in Python source as literals that can be
interpreted as identifiers. It also reuses string literals within a
single expression. None of this should be relied on, but it's
interesting to get insight into the implementation by examining the
different cases:
True # string repeated within an expression is simply reused
False # not cached
True # all 1-character strings are cached
True # flobozz is a valid identifier, so it's cached
False
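
For anyone who wants to poke at this on a current interpreter, here is a rough sketch in Python 3 syntax (the thread itself is about 2.4.4, where `intern` was a builtin rather than `sys.intern`). The helpers `looks_like_identifier` and `build` are hypothetical names, and `str.isidentifier()` is only an approximation of whatever check a given CPython version applies to literals. The one dependable part is explicit interning.

```python
import sys

def looks_like_identifier(text):
    """Rough stand-in for an 'identifier-like literal' test (an assumption)."""
    return text.isidentifier()

def build(parts):
    """Join pieces at run time so the result is not a compile-time constant."""
    return "".join(parts)

# The three literals from the question, plus the example word from this post.
for sample in ["--", "-", "foo", "flobozz"]:
    print(repr(sample), "identifier-like:", looks_like_identifier(sample))

# Two equal strings built separately; whether they share an object is unspecified.
a = build(["flo", "-", "bozz"])
b = build(["flo", "-", "bozz"])
print("equal:", a == b)

# Explicit interning maps equal strings to one shared object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print("interned copies identical:", a_interned is b_interned)
```

Note that '-' is not identifier-like by this test; per the explanation above, it is shared for a different reason, namely the one-character cache.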
 
