address of object with id()

Fabian von Romberg

Hi,

I have a single question regarding the id() built-in function.

example 1:

var1 = "some string"
var2 = "some string"

If I use the id() function on both, it returns exactly the same address.


example 2:

data = "some string"
var1 = data
var2 = data

If I use the id() function on var1 and var2, it returns exactly the same address.


Can anyone please explain why this happens? Is this correct?


Thanks in advance and regards,
Fabian
 
Roy Smith

Fabian von Romberg said:
Hi,

I have a single question regarding the id() built-in function.

example 1:

var1 = "some string"
var2 = "some string"

If I use the id() function on both, it returns exactly the same address.

Yup. This is because (in some implementations, but not guaranteed),
Python interns strings. That means, when you create a string literal
(i.e. something in quotes), the system looks to see if it's seen that
exact same string before and if so, gives you a reference to the same
string in memory, instead of creating a new one.

Also, be careful about saying things like, "the id() function [...]
returns [an] address". It's only in some implementations (and, again,
not guaranteed), that id() returns the address of an object. All the
docs say is, "an integer (or long integer) which is guaranteed to be
unique and constant for this object during its lifetime". An
implementation is free to number objects consecutively from 1, or from
23000, or pick random numbers, anything else it wants, as long as it
meets that requirement.

BTW, I tried this:
3810944

which is, of course, the result I expected. Then I tried:
3810944

which actually surprised me. I had thought interning only affected
string literals, but apparently it works for all strings! This works
too:
3810152

but, again, none of this is guaranteed.
 
Chris Angelico

I had thought interning only affected
string literals, but apparently it works for all strings! This works
too:

3810152

but, again, none of this is guaranteed.

No, what that proves is that concatenation of literals is treated as a literal.
14870112

However, you can of course force the matter.
(14870112, 14870112, 14870112)

(In Python 2, use intern() rather than sys.intern().)
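
A minimal sketch of forcing the matter, assuming Python 3 and sys.intern(); the variable names are made up for illustration. Two equal strings are built separately at run time and then passed through sys.intern(), which hands back one shared object for both:

```python
import sys

# Build the same text twice at run time so the compiler cannot share a constant.
parts = ["some", "string"]
first = " ".join(parts)
second = " ".join(parts)

print(first == second)   # True: the values are equal
print(first is second)   # identity of the uninterned copies is an implementation detail

# sys.intern() returns the canonical copy from the interpreter's intern table.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
print(first_interned is second_interned)   # True
```

Only the interned results are promised to be the same object; nothing is promised about first and second themselves.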

It seems that in CPython 3.3.0 on Windows 32-bit (and yes, I do need
to be that specific, though this probably applies to a broad range of
CPython versions), string literals are interned. But that's an
implementation detail.

ChrisA
 
Steven D'Aprano

Hi,

I have a single question regarding the id() built-in function.

example 1:

var1 = "some string"
var2 = "some string"

If I use the id() function on both, it returns exactly the same address.

The id() function does not return an address. Scrub that from your mind.
See below for further details.

In any case, when I try the above, I do *not* get the same ID.

py> var1 = "some string"
py> var2 = "some string"
py> id(var1), id(var2)
(3082752384L, 3082752960L)

I get two distinct IDs.


example 2:

data = "some string"
var1 = data
var2 = data

In this case, id(var1) and id(var2) are *guaranteed* to return the same
value, because both names are bound to the same object.

The first line creates a string object and binds it to the name "data".
The second line binds the same object to the name "var1", and the third
line does the same to the name "var2". So now you have three names all
referring to the same object. Since all three names refer to the same
object, it doesn't matter whether you call id(data), id(var1) or
id(var2), they return the same ID.
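
The binding behaviour described above can be checked directly. This is only a sketch that reuses the names from example 2 and compares identity with the `is` operator:

```python
data = "some string"
var1 = data   # binds a second name to the same object
var2 = data   # binds a third name to the same object

print(var1 is data and var2 is data)      # True
print(id(data) == id(var1) == id(var2))   # True
```

Assignment never copies the string here, so this holds on any Python implementation.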

Objects in Python are like people -- they can have more than one name, or
no name at all. Depending on who is doing the talking, all of the
following could refer to the same person:

"Barack"
"Obama"
"Dad"
"Son"
"Mr. President"
"POTUS"
"Renegade"


If I use the id() function on var1 and var2, it returns exactly the same
address.


The id() function does not return an address. It returns an arbitrary
identification number, an ID. In Jython, IDs start at 1, and are never re-
used. In CPython, IDs look like addresses[1], and may be re-used.

The only promises that Python makes about IDs are:

- they are integers;

- they are constant for the life-time of the object;

- they are unique for the life-time of the object.

That means that they *may* or *may not* be re-used. Jython does not re-
use them, CPython does.
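
A small sketch of those promises, using throwaway object() instances (the names are illustrative only). The first two checks follow from the guarantees above; the last one is deliberately left without an expected result because re-use is up to the implementation:

```python
first = object()
second = object()

# Both objects are alive at the same time, so their IDs must differ.
print(id(first) != id(second))   # True

# The ID stays the same for as long as the object exists.
id_before = id(first)
names = [first, first]           # more references, still the same object
id_after = id(names[1])
print(id_before == id_after)     # True

# After an object is destroyed, its ID may or may not be handed out again.
old_id = id(second)
del second
replacement = object()
print(old_id == id(replacement)) # implementation-dependent
```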

Python does not make many promises about object identity. Creating a new
object using a literal expression may or may not create a new object.
Python makes no promises either way. For example:

py> a = 42
py> b = 42
py> id(a) == id(b) # a and b are the same object
True


but:

py> a = 420000
py> b = 420000
py> id(a) == id(b) # a and b are different objects
False


but:


py> x = id(420000)
py> y = id(420000)
py> x == y # x and y get the same ID
True


However all of these examples are implementation dependent and may vary
according to the Python implementation and version. In particular, the
third example is specific to CPython. In Jython, for example, the IDs
will be unequal.

id() is one of the most pointless functions in Python. There is almost
never any need to care about the id of an object. I wish that it were
moved into the inspect module instead of a built-in function, because in
practice id() is only useful for confusing beginners.






[1] They look like addresses because the CPython implementation uses the
address of the object, as returned by the C compiler, as the ID. But
since there is no way to dereference an id, or lookup the object found at
an id, it is not an address. It merely has the same numeric value as what
the C compiler sees as the address.
 
Steven D'Aprano


Nope. Did you actually try it?


As far as I know, there is no Python implementation that automatically
interns strings which are not valid identifiers. "some string" is not a
valid identifier, due to the space.


[...]
which is, of course, the result I expected. Then I tried:

3810944

which actually surprised me. I had thought interning only affected
string literals, but apparently it works for all strings! This works
too:

3810152

You're not actually testing what you think you're testing. In CPython,
which I assume you are using, there is a peephole optimizer that runs when
code is compiled to byte-code, and it folds some constant expressions
like `a = "b" + "ar"` to `a = "bar"`. To defeat the peephole optimizer,
you need something like this:

a = "bar"
b = "".join(["b", "a", "r"])

although of course a sufficiently smart peephole optimizer could recognise
that as a constant as well. This, on the other hand, will defeat it:

b = str("ba") + str("r")

because the peephole optimizer cannot tell whether str() is still the
built-in function or has been replaced by something else.
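
One way to see the compile-time folding for yourself is to compile the statement and look at the constants stored in the resulting code object. This is a rough sketch: co_consts is a CPython-flavoured detail, the "<example>" filename is arbitrary, and what the tuple contains can differ between versions, so it is printed without a stated expectation.

```python
# Compile the statement without running it, then inspect its constants.
code = compile('a = "b" + "ar"', "<example>", "exec")
print(code.co_consts)   # on CPython this usually shows the already-folded string

# Run the compiled code and fetch the resulting value.
namespace = {}
exec(code, namespace)
folded = namespace["a"]

# Built from function calls at run time, so it cannot be folded in advance.
runtime = str("ba") + str("r")

print(folded == runtime)   # True: equal values
print(folded is runtime)   # identity is an implementation detail
```

Whatever the identity check prints, comparing the strings with == is the only result you can rely on.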

but, again, none of this is guaranteed.

Correct.
 
Roy Smith

Steven D'Aprano said:
As far as I know, there is no Python implementation that automatically
interns strings which are not valid identifiers. "some string" is not a
valid identifier, due to the space.

I stand corrected. And somewhat humbled.
 
Nobody

I have a single question regarding the id() built-in function.

example 1:

var1 = "some string"
var2 = "some string"

If I use the id() function on both, it returns exactly the same address.

I'm assuming that you used something other than "some string" for your
tests (e.g. something without a space).

Python *may* intern strings. The CPython implementation *does* intern
strings which are valid identifiers (a sequence of letters, digits and
underscores not starting with a digit), in order to optimise attribute
lookups.

FWIW, it also does this for integers in the range -5 <= n <= 256:

>>> var1 = 256
>>> var2 = 255 + 1
>>> id(var1)
21499520
>>> id(var2)
21499520
>>> var3 = var1 + 1
>>> var4 = 257
>>> id(var3)
21541752
>>> id(var4)
21541584

In this case, it just saves on memory.

More generally, an implementation *may* intern any immutable value,
although it's not guaranteed to do so for anything except (IIRC) False,
True and None.
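
To make the distinction concrete, here is a short sketch (names are illustrative) that separates what is promised from what is merely common. True, False and None are single objects, so `is` is safe for them; for integers only value comparison is reliable:

```python
flag = bool("non-empty")      # computed at run time
nothing = [3, 1, 2].sort()    # list.sort() sorts in place and returns None

print(flag is True)           # True
print(nothing is None)        # True

# Small integers: compare values with ==; identity is up to the implementation.
small_a = int("256")
small_b = int("256")
print(small_a == small_b)     # True
print(small_a is small_b)     # not guaranteed either way
```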
 
Peter Otten

Steven said:
Nope. Did you actually try it?


As far as I know, there is no Python implementation that automatically
interns strings which are not valid identifiers. "some string" is not a
valid identifier, due to the space.


I don't know about other implementations, but in CPython two equal strings
in the *same* *compilation* will end up with the same id as a result of
constant folding. In the interpreter:
True

In a script:

$ cat tmp.py
a = "some string"
b = "some string"
print a is b
$ python tmp.py
True
 
Steven D'Aprano

More generally, an implementation *may* intern any immutable value,
although it's not guaranteed to do so for anything except (IIRC) False,
True and None.

I believe the same also applies to NotImplemented and Ellipsis, although
I'm too lazy to look them up to check for sure.
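
For what it is worth, a quick check along these lines, assuming Python 3 (where `...` is a valid expression anywhere), suggests both behave as single objects:

```python
# int.__eq__ returns NotImplemented when it cannot compare against the other type.
result = int.__eq__(1, "one")
print(result is NotImplemented)   # True

# The literal ... evaluates to the Ellipsis object.
dots = ...
print(dots is Ellipsis)           # True
```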
 
