How does python know?

Tobiah

I do this:

a = 'lasdfjlasdjflaksdjfl;akjsdf;kljasdl;kfjasl'
b = 'lasdfjlasdjflaksdjfl;akjsdf;kljasdl;kfjasl'

print a is b
print id(a)
print id(b)


And get this:

True
140329184721376
140329184721376


This works for longer strings. Does python
compare a new string to every other string
I've made in order to determine whether it
needs to create a new object?

Thanks,

Tobiah
 
Tobiah

Weird as well, is that in the interpreter,
the introduction of punctuation appears to
defeat the reuse of the object:
True

Tobiah
 
Chris Angelico

This works for longer strings. Does python
compare a new string to every other string
I've made in order to determine whether it
needs to create a new object?

No, it doesn't; but when you compile a module (including a simple
script like that), Python checks for repeated literals. It's only good
for literals, though.

If you specifically need this behaviour, it's called 'interning'. You
can ask Python to do this, or you can do it manually. But most of the
time, you can just ignore id() and simply let two strings be equal
based on their contents; the fact that constants are shared is a neat
optimization, nothing more.

ChrisA
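
To make the difference between shared literals and explicit interning concrete, here is a minimal sketch. It is written for Python 3 (the snippets in this thread use Python 2 print statements, where the same function is the builtin intern() rather than sys.intern()). It assumes CPython; the helper name literal_pair and the strings built with join are illustrative only. The identity results for literals and run-time strings are implementation details, so they are printed rather than promised; only the sys.intern() result is documented behaviour.

```python
import sys


def literal_pair():
    # Two identical literals inside one compiled code object. CPython's
    # compiler usually stores such a constant once, but nothing in the
    # language requires that.
    a = 'lasdfjlasdjflaksdjfl;akjsdf;kljasdl;kfjasl'
    b = 'lasdfjlasdjflaksdjfl;akjsdf;kljasdl;kfjasl'
    return a, b


a, b = literal_pair()
print(a == b)  # equality by content: this is what code should rely on
print(a is b)  # identity: implementation specific

# Strings built at run time are not literals, so the compiler cannot
# merge them; whether they end up shared is again implementation specific.
parts = ['lasdfjlasdjflaksdjfl', 'akjsdf', 'kljasdl', 'kfjasl']
c = ';'.join(parts)
d = ';'.join(parts)
print(c == d)
print(c is d)

# Asking for interning explicitly: equal strings passed through
# sys.intern() come back as one and the same object.
ci = sys.intern(c)
di = sys.intern(d)
print(ci is di)
```

As the post says, comparing with == is almost always what you want; explicit interning is mainly useful when many equal strings are kept alive and compared or used as dict keys repeatedly.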
 
Gary Herron

This works for longer strings. Does python
compare a new string to every other string
I've made in order to determine whether it
needs to create a new object?


Yes.

Kind of: It's a hash calculation not a direct comparison, and it's
applied to strings of limited length. Details are implementation (and
perhaps version) specific. The process is called "string interning".
Google and wikipedia have lots to say about it.

Gary Herron
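
The "hash calculation, not a direct comparison" point can be sketched with an ordinary dict. This is not CPython's actual implementation; InternTable and its intern method are hypothetical names used only to show why finding an existing equal string does not require scanning every string created so far.

```python
class InternTable:
    """Hypothetical intern table backed by a dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def intern(self, s):
        # The dict hashes s and checks only the matching bucket. If an
        # equal string is already stored, that earlier object is returned;
        # otherwise s itself is stored and returned.
        return self._table.setdefault(s, s)


table = InternTable()
x = table.intern(''.join(['spam', ';', 'eggs']))
y = table.intern(''.join(['spam', ';', 'eggs']))
print(x == y)  # True
print(x is y)  # True: both names refer to the first stored object
```

The lookup cost depends on the length of the string being hashed, not on how many strings are already in the table.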
 
Dave Angel

Tobiah said:
Weird as well, is that in the interpreter,
the introduction of punctuation appears to
defeat the reuse of the object:


As others have said, interning is implementation specific, so you
should never rely on it.

I think the current CPython algorithm is designed to save both
memory and time in the storage and lookup of symbol names. In a
typical program, those are the most likely to be duplicated. So
if your literal is of reasonable size and doesn’t contain invalid
symbol characters (such as space, punctuation, etc.) then it just
might be added to the interned dictionary.
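
The "looks like a symbol name" rule described above can be sketched as a simple character check. This is an assumption for illustration, not CPython's real code: looks_like_symbol is a hypothetical helper, and the exact rule (and any length limit) varies between implementations and versions. The sketch uses Python 3 syntax.

```python
import re


def looks_like_symbol(s):
    """Hypothetical check: only ASCII letters, digits and underscores."""
    return re.fullmatch(r'[A-Za-z0-9_]*', s) is not None


samples = [
    'lasdfjlasdjflaksdjflakjsdf',   # name-like characters only
    'lasdfjlasdjflaksdjfl;akjsdf',  # contains punctuation
    'two words',                    # contains a space
]
for text in samples:
    print(repr(text), looks_like_symbol(text))
```

Under this assumed rule, the semicolons in the original poster's string would be enough to keep a literal from being treated as a symbol name.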
 
Roy Smith

Tobiah said:
This works for longer strings. Does python
compare a new string to every other string
I've made in order to determine whether it
needs to create a new object?

Yes[*]. It's called interning. See
https://en.wikipedia.org/wiki/Intern_(computer_science).

[*] Well, nothing requires Python to do that. Some implementations do.
Some don't. Some do it for certain types of strings. Your mileage may
vary.
 
