id functions of ints, floats and strings

zillow10

Hi all,

I've been playing around with the identity function id() for different
types of objects, and I think I understand its behaviour when it comes
to objects like lists and tuples in which case an assignment r2 = r1
(r1 refers to an existing object) creates an alias r2 that refers to
the same object as r1. In this case id(r1) == id(r2) (or, if you
like: r1 is r2). However for r1, r2 assigned as follows: r1 = [1, 2,
3] and r2 = [1, 2, 3], (r1 is r2) is False, even if r1==r2,
etc. ...this is all very well. Therefore, it seems that id(r) can be
interpreted as the address of the object that 'r' refers to.

My observations of its behaviour when comparing ints, floats and
strings have raised some questions in my mind, though. Consider the
following examples:

#########################################################################

# (1) turns out to be true
a = 10
b = 10
print a is b

# (2) turns out to be false
f = 10.0
g = 10.0
print f is g

# behaviour when a list or tuple contains the same elements ("same"
# meaning same type and value):

# define the following function, that checks if all the elements in an
# iterable object are equal:

def areAllElementsEqual(iterable):
    return reduce(lambda x, y: x == y and x, iterable) != False

# (3) checking if ids of all list elements are the same for different
# cases:

a = 3*[1]; areAllElementsEqual([id(i) for i in a]) # True
b = [1, 1, 1]; areAllElementsEqual([id(i) for i in b]) # True
f = 3*[1.0]; areAllElementsEqual([id(i) for i in f]) # True
g = [1.0, 1.0, 1.0]; areAllElementsEqual([id(i) for i in g]) # True
g1 = [1.0, 1.0, 0.5+0.5]; areAllElementsEqual([id(i) for i in g1]) # False

# (4) two equal floats defined inside a function body behave
# differently than case (2):

def func():
    f = 10.0
    g = 10.0
    return f is g

print func() # True

######################################################

I didn't mention any examples with strings; they behaved like ints
with respect to their id properties for all the cases I tried.
While I have no particular qualms about the behaviour, I have the
following questions:

1) Which of the above behaviours are reliable? For example, does a1 =
a2 for ints and strings always imply that a1 is a2?
2) From the programmer's perspective, are ids of ints, floats and
string of any practical significance at all (since these types are
immutable)?
3) Does the behaviour of ids for lists and tuples of the same element
(of type int, string and sometimes even float), imply that the tuple a
= (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
about a list, where elements can be changed at will?)

Would appreciate your responses...

AK
 
zillow10

Question 1 should read "For example, does a1 == a2 for ints ..."
 
George Sakkis

1) Which of the above behaviours are reliable? For example, does a1 =
a2 for ints and strings always imply that a1 is a2?
No.

2) From the programmer's perspective, are ids of ints, floats and
string of any practical significance at all (since these types are
immutable)?
No.

3) Does the behaviour of ids for lists and tuples of the same element
(of type int, string and sometimes even float), imply that the tuple a
= (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
about a list, where elements can be changed at will?)

No.

Regards,
George
 
Daniel Fetchinson

Small integers and short strings are cached and reused and for these (
r1 == r2 ) implies ( r1 is r2 ). For longer strings or larger integers
this does not happen and so in general ( r1 == r2 ) does not imply (
r1 is r2 ). The caching and reuse is for performance gains and is an
implementation detail which should not be relied upon.
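
A minimal sketch of that distinction, written in Python 3 syntax (the thread itself uses the Python 2 `print` statement). The variable names are illustrative only. Values are built at run time with `int()` so that no compile-time sharing is involved, and the identity results are only printed, never relied upon, since they depend on the implementation and version.

```python
# Equality compares values; identity compares objects.
big_a = int("10000")   # built at run time
big_b = int("10000")
small_a = int("10")
small_b = int("10")

# This part is guaranteed: equal values compare equal.
values_equal = (big_a == big_b) and (small_a == small_b)

# Identity of separately created equal ints is an implementation detail.
# CPython currently caches a range of small ints, so the second check is
# often True there, but neither result should be relied upon.
print("big ints share an object:  ", big_a is big_b)
print("small ints share an object:", small_a is small_b)

# The reliable way to compare numbers and strings is ==.
print("values equal:", values_equal)
```

The design point is simply to use `==` whenever the question is "do these have the same value?".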
1) Which of the above behaviours are reliable? For example, does a1 =
a2 for ints and strings always imply that a1 is a2?

No, see above.
2) From the programmer's perspective, are ids of ints, floats and
string of any practical significance at all (since these types are
immutable)?

You can check identity (as opposed to equality) with them. Whenever you
need that, they are practically useful; if all you need is equality, they
are useless.
3) Does the behaviour of ids for lists and tuples of the same element
(of type int, string and sometimes even float), imply that the tuple a
= (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
about a list, where elements can be changed at will?)

I'm not sure about tuples but for lists the storage space needed for
10000*(1,) is roughly 10000 times more than for (1,).

HTH,
Daniel
 
Gabriel Genellina

# (1) turns out to be true
a = 10
b = 10
print a is b

...only because CPython happens to cache small integers and always returns
the same object. Try again with 10000. This is just an optimization, and
the actual range of cached integers, or whether they are cached at all, is
implementation (and version) dependent.
(As integers are immutable, the optimization *can* be done, but that
doesn't mean that all immutable objects are always shared.)
# (2) turns out to be false
f = 10.0
g = 10.0
print f is g

Because the above optimization isn't used for floats.
The `is` operator checks object identity: whether both operands are the
very same object (*not* a copy, and not merely an equal object: the *same*
object). "Identity" is a primitive concept.
The only way to guarantee that you are talking of the same object, is
using a reference to a previously created object. That is:

a = some_arbitrary_object
b = a
assert a is b

The name `b` now refers to the same object as name `a`; the assertion
holds for whatever object it is.
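
As a small sketch of that guarantee (Python 3 syntax; the `Thing` class and all variable names are made up for illustration), the following contrasts aliasing with building two equal but separate objects:

```python
class Thing:
    """Arbitrary user-defined object, used only for this sketch."""

original = Thing()
alias = original            # binds a second name to the same object
assert alias is original
assert id(alias) == id(original)

first = [1, 2, 3]
second = [1, 2, 3]          # a second list display always builds a new list
same_value = first == second    # equal contents
same_object = first is second   # two distinct mutable objects

alias_list = first
alias_list.append(4)        # visible through both names, since they alias
print(first)                # [1, 2, 3, 4]
print(same_value, same_object)  # True False
```

For mutable objects such as lists the language does guarantee that each display creates a new object; it is only for immutable objects that sharing is left open.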

In other cases, like (1) and (2) above, the literals are just handy
constructors for int and float objects. You have two objects constructed
(a and b, f and g). Whether they are identical or not is not defined; they
might be the same, or not, depending on unknown factors that might include
the moon phase; both alternatives are valid Python.
# (3) checking if ids of all list elements are the same for different
# cases:

a = 3*[1]; areAllElementsEqual([id(i) for i in a]) # True
b = [1, 1, 1]; areAllElementsEqual([id(i) for i in b]) # True
f = 3*[1.0]; areAllElementsEqual([id(i) for i in f]) # True
g = [1.0, 1.0, 1.0]; areAllElementsEqual([id(i) for i in g]) # True
g1 = [1.0, 1.0, 0.5+0.5]; areAllElementsEqual([id(i) for i in g1]) # False

Again, this is implementation dependent. If you try with a different
Python version or a different implementation you may get other results -
and that doesn't mean that any of them is broken.
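
To separate what is guaranteed in case (3) from what is not, here is a hedged sketch in Python 3. The helper `all_same_object` is a hypothetical stand-in for the `reduce`-based function in the question, not a fix proposed by anyone in the thread. Sequence repetition copies references, so `3 * [x]` always holds three references to one object; separately evaluated expressions carry no such promise.

```python
def all_same_object(items):
    """Return True if every element of `items` is the very same object."""
    return len({id(item) for item in items}) <= 1

x = 1.0
repeated = 3 * [x]          # repetition copies references, not objects
assert all_same_object(repeated)    # three references to x

# Separately evaluated expressions: sharing is up to the implementation,
# so the first result below may be True or False.
mixed = [1.0, 1.0, 0.5 + 0.5]
print(all_same_object(mixed))
print(all(v == 1.0 for v in mixed))  # the values are equal regardless
```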
# (4) two equal floats defined inside a function body behave
# differently than case (2):

def func():
    f = 10.0
    g = 10.0
    return f is g

print func() # True

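Another implementation detail, related to co_consts: in CPython, equal
constants within a single code object are stored only once in its
co_consts tuple, so both names end up bound to the same float object. You
shouldn't rely on it.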
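
The constants can be inspected directly. This is a sketch in Python 3, where the code object is reached through `func.__code__`; what the tuple contains is a CPython detail and may differ between versions and implementations.

```python
def func():
    f = 10.0
    g = 10.0
    return f is g

# The constants the compiler stored for this one code object.
# On CPython, 10.0 typically appears only once here, which is why
# both names end up bound to the same float object.
print(func.__code__.co_consts)

# Commonly True on CPython, but not guaranteed by the language.
print(func())
```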
I didn't mention any examples with strings; they behaved like ints
with respect to their id properties for all the cases I tried.

You didn't try hard enough :)

py> x = "abc"
py> y = ''.join(x)
py> x == y
True
py> x is y
False

Long strings behave like big integers: they aren't cached:

py> x = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
py> y = "a rather long string, full of garbage. No, this isn't garbage, just nonsense text to fill space."
py> x == y
True
py> x is y
False

As always: you have two statements constructing two objects. Whether they
return the same object or not is not defined.
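
If sharing of equal strings is actually wanted, it has to be requested explicitly. The sketch below (Python 3) uses `sys.intern`, which is not mentioned elsewhere in this thread and is shown only as an illustration: it returns the canonical copy of a string from the interpreter's intern table.

```python
import sys

x = "abc"
y = "".join(["a", "b", "c"])    # equal text, assembled at run time

print(x == y)    # True: same characters
print(x is y)    # implementation dependent; do not rely on it

# sys.intern returns the canonical copy from the intern table, so two
# interned equal strings are the same object.
xi = sys.intern(x)
yi = sys.intern(y)
assert xi == yi
assert xi is yi
```

Even then, `==` remains the normal way to compare strings; interning is mainly a memory and speed optimization.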
While I have no particular qualms about the behaviour, I have the
following questions:

1) Which of the above behaviours are reliable? For example, does a1 =
a2 for ints and strings always imply that a1 is a2?

If you mean:

a1 = something
a2 = a1
a1 is a2

then, from my comments above, you should be able to answer: yes, always,
not restricted to ints and strings.

If you mean:

a1 = someliteral
a2 = someliteral
a1 is a2

then: no, it isn't guaranteed at all, not even for small integers or
strings.
2) From the programmer's perspective, are ids of ints, floats and
string of any practical significance at all (since these types are
immutable)?

The same significance as id() of any other object... mostly, none, except
for debugging purposes.
3) Does the behaviour of ids for lists and tuples of the same element
(of type int, string and sometimes even float), imply that the tuple a
= (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
about a list, where elements can be changed at will?)

That's a different thing. A tuple maintains only references to its
elements (as any other object in Python). The memory required for a tuple
(I'm talking of CPython exclusively) is: (a small header) + n *
sizeof(pointer). So the expression 10000*(anything,) will take more space
than the singleton (anything,) because the former requires space for 10000
pointers and the latter just one.

You have to take into account the memory for the elements themselves, but
in both cases there is a *single* object referenced, so it makes no
difference. Note that it doesn't matter whether that single element is an
integer, a string, or a mutable or immutable object: it's always the same
object, already existing, and creating that 10000-tuple just increments
its reference count by 10000.

The situation is similar for lists, except that being mutable containers,
they're over-allocated (to have room for future expansion). So the list
[anything]*10000 has a size somewhat larger than 10000*sizeof(pointer);
its (only) element increments its reference count by 10000.
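
A rough way to observe this on CPython, sketched in Python 3 with illustrative variable names. `sys.getsizeof` reports the size of the container only, not of the objects it refers to, and `sys.getrefcount` is specific to reference-counting implementations, so the exact numbers are assumptions about CPython rather than language guarantees.

```python
import sys

element = object()
before = sys.getrefcount(element)

one = (element,)
many = (element,) * 10000

# References added by the two tuples together (CPython-specific).
ref_delta = sys.getrefcount(element) - before

# Container sizes only, in bytes; the element itself is not included.
print(sys.getsizeof(one), sys.getsizeof(many))
print(ref_delta)

# Every slot refers to the one existing object; nothing was copied.
assert all(item is element for item in many)
assert sys.getsizeof(many) > sys.getsizeof(one)
```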
 
Steve Holden

In fact all you can in truth say is that

a is b --> a == b

The converse is definitely not true.
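
A short sketch of both directions, in Python 3. One caveat worth noting is that the implication holds for well-behaved types but a type may define `==` so that an object is not equal to itself; the float NaN is the standard example.

```python
# Equal but distinct objects: == holds, identity does not.
a = [1, 2, 3]
b = list(a)
print(a == b, a is b)   # True False

# The usual exception to "a is b implies a == b":
nan = float("nan")
print(nan is nan)   # True: one object compared with itself
print(nan == nan)   # False: IEEE 754 NaN is not equal to anything
```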

regards
Steve
 
