String Identity Test


Avetis KAZARIAN

After reading the earlier discussion on the same subject (From: "Thomas
Moore" <[email protected]>, Date: Tue, 1 Nov 2005 21:45:56
+0800), I ran some tests of my own and got some confusing results. (I'm
a beginner with Python; I'm coming from PHP.)



# 1. Short alpha-numeric String without space

>>> a = "b747"
>>> b = "b747"
>>> a is b
True



# 2. Long alpha-numeric String without space

>>> a = "averylongstringbutreallyaveryveryverylongstringwithabout68characters"
>>> b = "averylongstringbutreallyaveryveryverylongstringwithabout68characters"
>>> a is b
True



# 3. Short alpha-numeric String with space

>>> a = "x y"
>>> b = "x y"
>>> a is b
False



# 4. Long alpha-numeric String with space

>>> a = "I love Python it s so much better than PHP but sometimes confusing"
>>> b = "I love Python it s so much better than PHP but sometimes confusing"
>>> a is b
False



# 5. Empty String

>>> a = ""
>>> b = ""
>>> a is b
True


# 6. Whitespace String: space

>>> a = " "
>>> b = " "
>>> a is b
False



# 7. Whitespace String: new line

>>> a = "\n"
>>> b = "\n"
>>> a is b
False



# 8. Non-ASCII without space

>>> a = "é"
>>> b = "é"
>>> a is b
False



# 9. Non-ASCII with space

>>> a = "é à"
>>> b = "é à"
>>> a is b
False



It seems that any strictly ASCII alphanumeric string is instantiated as
a unique object, like a "singleton" (a = "x" and b = "x" => a is b),
and that any string that is not strictly ASCII alphanumeric is
instantiated as a new object every time, with a new id.

Conclusion:

How does Python manage strings as objects?
 

Gary Herron

Avetis said:
How does Python manage strings as objects?

However the implementors want.

That may seem a flippant answer, but it's actually accurate. The choice
of whether a new string reuses an existing string or creates a new one
is *not* a Python question, but rather a question of implementation.
It's a matter of efficiency, and as such each implementation/version of
Python may make its own choices. Writing a program that depends on the
string identity policy would be considered an erroneous program, and
should be avoided.
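
A minimal sketch of that point, assuming nothing beyond the standard
operators: "==" compares values and is the test a program should rely
on, while the result of "is" for two equal strings depends on the
interpreter. The variable names are just for illustration.

```python
# Build one of the strings at runtime so the interpreter is free to
# create a separate object for it (or not; that is its own business).
a = "x y"
b = " ".join(["x", "y"])

# Value comparison: defined by the language, True on any implementation.
print(a == b)

# Identity comparison: implementation-dependent for equal strings,
# so the printed result must not be relied upon.
print(a is b)
```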

The question now is: Why do you care? The properties of strings do
not depend on the implementation's choice, so you shouldn't care because
of programming considerations. Perhaps it's just a matter of curiosity
on your part.


Gary Herron
 

Terry Reedy

Avetis said:
After reading the discussion about the same subject ( From: "Thomas
Moore" <[email protected]> Date: Tue, 1 Nov 2005 21:45:56
+0800 ), I tried myself some tests with some confusing results (I'm a
beginner with Python, I'm coming from PHP)

For immutable objects, identity is essentially irrelevant. Whether an
implementation conserves space by reusing immutable objects with a given
value, and if so, how, depends on the particular version of a
particular implementation. Unless one is interested in interpreter
implementation, I advise against paying too much attention to the issue.
It seems to generate more confusion than enlightenment.

Avetis said:
How does Python manage strings as objects?

Python the language does not 'manage' objects. Particular interpreters
do what they do. The CPython sources are decently readable.

tjr
 

Avetis KAZARIAN

Gary said:
The question now is: Why do you care? The properties of strings do
not depend on the implementation's choice, so you shouldn't care because
of programming considerations. Perhaps it's just a matter of curiosity
on your part.

Gary Herron

Well, it's not about curiosity, it's more about performance.

I'll give a PHP example (a really simple one).

PHP :

Stat 1 : $aVeryLongString == $anOtherVeryLongString
Stat 2 : $aVeryLongString === $anOtherVeryLongString

Stat 2 is really faster than Stat 1 (due to the binary comparison)

As I said, I'm coming from PHP, so I was wondering if there was such a
difference in Python.

That's why I was trying to use "is" the way I would use "===".
 

Peter Otten

Avetis said:
Well, it's not about curiosity, it's more about performance.

I will make a PHP example (a really quite simple )

PHP :

Stat 1 : $aVeryLongString == $anOtherVeryLongString
Stat 2 : $aVeryLongString === $anOtherVeryLongString

Stat 2 is really faster than Stat 1 (due to the binary comparison)

As I said, I'm coming from PHP, so I was wondering if there was such a
difference in Python.

Because I was trying to use "is" as for "===".

So you have two very long strings that may be equal. How did you get them?
If you read them from a file, that took much more time than the comparison.

If they are sufficiently likely to be unequal, just read them in smaller
chunks and compare these. If you want to compare multiple combinations,
use hashes.

If 'a is b' worked like 'a == b' for arbitrary strings, that would mean
that the Python implementation had done a lot of unnecessary 'a == b'
comparisons behind the scenes, or at least calculated a lot of hash
values, i.e. the ability to use the fast operation would in effect slow
down your program.
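
A rough sketch of both suggestions, assuming the strings come from
regular files on disk. The helper names files_equal and content_digest
and the chunk size are made up for illustration; the standard library's
filecmp module offers a ready-made file comparison as well.

```python
import hashlib


def files_equal(path_a, path_b, chunk_size=64 * 1024):
    """Compare two regular files chunk by chunk, stopping at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # differing chunk (or one file ended early)
            if not chunk_a:
                return True   # both files ended at the same point


def content_digest(path, chunk_size=64 * 1024):
    """Hash a file's contents so many files can be compared via short digests."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With many files, computing content_digest once per file and comparing
the digests avoids re-reading each pair.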

Peter
 

Steve Holden

Avetis said:
Well, it's not about curiosity, it's more about performance.

I will make a PHP example (a really quite simple )

PHP :

Stat 1 : $aVeryLongString == $anOtherVeryLongString
Stat 2 : $aVeryLongString === $anOtherVeryLongString

Stat 2 is really faster than Stat 1 (due to the binary comparison)

As I said, I'm coming from PHP, so I was wondering if there was such a
difference in Python.

Because I was trying to use "is" as for "===".

Suppose you write

a = b

Thereafter, unless some further assignment is made to either a or b, you
are guaranteed that "a is b" returns True.

This is pretty much the only guarantee you have. There is no guarantee
(across all implementations) that

a = some-expression

b = some-equivalent-expression

will leave "a is b" True.

Does PHP really keep only one copy of every string? Sounds like that
could slow string creation down a little. Essentially it's keeping all
strings in a set. Of course you could do that in Python if you wanted,
but it would certainly slow things down.
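
As a hedged illustration of "keeping all strings in a set": Python 3
exposes this as sys.intern (the built-in intern in Python 2). The sketch
below assumes Python 3; the sample strings are arbitrary.

```python
import sys

# Build two equal strings at runtime, independently of each other.
parts = ["I love", "Python"]
a = " ".join(parts)
b = " ".join(parts)

print(a == b)    # True: equal values

x = a            # plain assignment binds a second name to the same object
print(x is a)    # True: the one identity guarantee described above

# sys.intern returns the single stored copy for a given value, so two
# interned strings with equal values are the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```

Each call to sys.intern costs a table look-up, which is the slowdown
mentioned above if it were applied to every string.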

Anyway, thanks for looking at Python. I hope you continue to enjoy it!

regards
Steve
 

Gabriel Genellina

Avetis said:
Well, it's not about curiosity, it's more about performance.

I will make a PHP example (a really quite simple )

PHP :

Stat 1 : $aVeryLongString == $anOtherVeryLongString
Stat 2 : $aVeryLongString === $anOtherVeryLongString

Stat 2 is really faster than Stat 1 (due to the binary comparison)

As I said, I'm coming from PHP, so I was wondering if there was such a
difference in Python.

Because I was trying to use "is" as for "===".

PHP '==' has no direct correspondence in Python. '===' in PHP is more like
'==' in Python (but not exactly the same).
In PHP, $x === $y is true if both variables are of the same type *and*
both have the same value. $x == $y checks only the values, doing type
conversions as needed, even string -> number; there is no equivalent
operator in Python. PHP === is called "identity" but isn't related to the
"is" operator in Python; there is no identity test in PHP with the Python
semantics.

PHP:
1 == 1
TRUE

1 == 1.0
TRUE

1 == "1"
TRUE

1 == "1.0"
TRUE

1 === 1
TRUE

1 === 1.0
FALSE

1 === "1"
FALSE

1 === "1.0"
FALSE

array(1,2,3) == array(1,2,3)
TRUE

array(1,2,3) === array(1,2,3)
TRUE


Python:
1 == 1
True

1 == 1.0
True

1 == "1"
False

1 == "1.0"
False

[1,2,3] == [1,2,3]
True

[1,2,3] is [1,2,3]
False
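
To make the "not exactly the same" part concrete, here is a rough
sketch of what a type-and-value check would look like in Python. The
helper strict_equal is hypothetical, not a built-in, and it only
approximates PHP's === for simple values.

```python
def strict_equal(x, y):
    """Hypothetical helper: same type and equal value (rough analogue of PHP ===)."""
    return type(x) is type(y) and x == y


print(strict_equal(1, 1))                  # True
print(strict_equal(1, 1.0))                # False, although 1 == 1.0 is True
print(strict_equal([1, 2, 3], [1, 2, 3]))  # True, although the lists are distinct objects
```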


So, don't try to translate concepts from one language to another. (OK,
it's natural to try to do that if you know PHP, but it doesn't work. You
have to know the differences.)
 

Avetis KAZARIAN

Steve said:
Does PHP really keep only one copy of every string?

Not at all.

I might have said something confusing if you understood that...

Gabriel said:
So, don't try to translate concepts from one language to another.

I'll try ;]
 

S Arrowsmith

Avetis KAZARIAN said:
It seems that any strict ASCII alpha-numeric string is instantiated as
an unique object, like a "singleton" ( a = "x" and b = "x" => a is b )
and that any non strict ASCII alpha-numeric string is instantiated as
a new object every time with a new id.

What no-one appears to have mentioned so far is that the purpose
of this implementation detail is to ensure that there is a single
instance of strings which are valid identifiers, so that you don't
go around creating and destroying string instances just to do an
attribute look-up on an object. A few strings which are not valid
as identifiers get swept up into this system:
True

"Small" integers get a similar treatment:
False

But as has hopefully been made clear, all this is completely an
implementation detail. (Indeed, the range of "interned" integers
changed from 0--99 to -5--256 a few versions ago.) So don't,
under any circumstances, rely on it, even when you understand
what's going on.
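
A small sketch for poking at the integer cache, assuming CPython. The
helper same_object is made up for illustration, and the results depend
on the interpreter and version, so none are shown here.

```python
def same_object(n):
    """Create the value n twice at runtime and report whether both are one object."""
    # Going through str() and int() keeps the compiler from merging constants.
    a = int(str(n))
    b = int(str(n))
    return a is b


# Values on either side of the range quoted above; output is implementation-specific.
for n in (-6, -5, 0, 99, 100, 256, 257):
    print(n, same_object(n))
```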
 

Hendrik van Rooyen

S Arrowsmith said:
"Small" integers get a similar treatment:

False

This is weird - I would have thought that the limit
of "small" would be at 255 - the biggest number to
fit in a byte. 256 takes two bytes, so it must be
an arbitrary limit - could have been set at 300,
or 30 000...

- Hendrik
 

Bruno Desthuilliers

Hendrik van Rooyen wrote:
This is weird - I would have thought that the limit
of "small" would be at 255 - the biggest number to
fit in a byte. 256 takes two bytes, so it must be
an arbitrary limit

It is, and has changed from version to version.
 

Bruno Desthuilliers

Avetis KAZARIAN wrote:

Well, it's not about curiosity, it's more about performance.
Steve Holden wrote: (snip)
So, don't try to translate concepts from one language to another.

I'll try ;]

Also and FWIW:

1/ Python has some very handy tools when it comes to performance, such
as a couple of profilers (to identify bottlenecks) and the timeit module
(for quick benchmarks).

2/ Most "best practice" idioms are frequently discussed here.

3/ If you have performance problems related to the wrong algorithm or
data structure, some of us here _really_ enjoy helping !-)
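
For instance, a quick timeit benchmark of the two operators from this
thread could look like the sketch below. The string length and the
repeat count are arbitrary choices, and the timings will differ from
machine to machine.

```python
import timeit

# Two long strings with equal values, built separately in the setup code.
setup = (
    "a = 'x' * 100000\n"
    "b = 'x' * 99999 + 'x'\n"
)

eq_time = timeit.timeit("a == b", setup=setup, number=10000)
is_time = timeit.timeit("a is b", setup=setup, number=10000)

print(f"a == b: {eq_time:.4f} s for 10000 runs")
print(f"a is b: {is_time:.4f} s for 10000 runs")
```

Keep in mind that the two expressions answer different questions, so a
faster "is" is not a replacement for "==".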

Welcome onboard.
 

Terry Reedy

Hendrik said:
This is weird - I would have thought that the limit
of "small" would be at 255 - the biggest number to
fit in a byte. 256 takes two bytes, so it must be
an arbitrary limit - could have been set at 300,
or 30 000...

'Small' also goes to -10 or so. 256 was included, at minuscule cost,
because it is a relatively common number, being the number of bytes.
 

Terry Reedy

Ints take at least 4 bytes. It is commonness of usage that determined
caching. The range was expanded a few years ago in anticipation of the
new bytes type, whose contents are ints, not chars.
'Small' also goes to -10 or so. 256 was included, at minuscule cost,
because it is a relatively common number, being the number of bytes.

In fact, 3.0.1 starts with 36 internal references to the cached int 256!
38 # -2 for the function call
2
>>> [sys.getrefcount(i)-2 for i in range(258)]

shows that only 15 cached ints start with more references. 0 has the
most with 724 (and that small actually goes to -5).
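
A sketch for repeating this kind of measurement, assuming CPython
(sys.getrefcount is specific to it). The numbers vary by version and
build, and newer releases may report a fixed large value for these
cached objects, so no output is shown.

```python
import sys

# getrefcount() reports extra references held by the call itself,
# which is why the figures above are adjusted downwards.
counts = {n: sys.getrefcount(n) for n in (0, 1, 255, 256)}

for n, count in counts.items():
    print(n, count)
```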

tjr
 
