cPickle.dumps differs from pickle.dumps; looks like a bug.

Victor Kryukov

Hello list,

I've found the following strange behavior of cPickle. Do you think
it's a bug, or is it by design?

Best regards,
Victor.

from pickle import dumps
from cPickle import dumps as cdumps

print dumps('1001799')==dumps(str(1001799))
print cdumps('1001799')==cdumps(str(1001799))

outputs

True
False


vicbook:~ victor$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

vicbook:~ victor$ uname -a
Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386
 
Nick Vatamaniuc

If you unpickle though will the results be the same? I suspect they
will be. That should matter most of all (unless you plan to compare
objects' identity based on their pickled version.)
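
The round-trip check suggested above can be sketched as follows. This assumes Python 3, where cPickle no longer exists as a separate module, so it illustrates only the test itself and does not reproduce the byte-level difference seen under Python 2.5:

```python
import pickle

a = '1001799'
b = str(1001799)

# Protocol 0 is the ASCII protocol this thread is about.
pickled_a = pickle.dumps(a, protocol=0)
pickled_b = pickle.dumps(b, protocol=0)

# What matters for correctness: both pickles load back to equal objects.
restored_a = pickle.loads(pickled_a)
restored_b = pickle.loads(pickled_b)
print(restored_a == restored_b == '1001799')
```

If the restored values compare equal, a difference in the pickled bytes affects only code that compares or hashes the pickles themselves.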

Remember that by default pickle and cPickle create a longer ASCII
representation (protocol 0); for a binary representation, use a higher
pickle protocol, for example 2.
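
A small sketch of the size difference between the ASCII and binary protocols, assuming Python 3 (the `data` list is only an illustrative payload, not something from the original post):

```python
import pickle

data = list(range(100))  # hypothetical sample payload

ascii_pickle = pickle.dumps(data, protocol=0)   # text-based protocol
binary_pickle = pickle.dumps(data, protocol=2)  # binary protocol

# Compare the sizes and confirm both load back to the same value.
print(len(ascii_pickle), len(binary_pickle))
print(pickle.loads(ascii_pickle) == pickle.loads(binary_pickle))
```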

Hope that helps,
-Nick Vatamaniuc
 
Daniel Nogradi

Nick Vatamaniuc said:
If you unpickle though will the results be the same? I suspect they
will be. That should matter most of all (unless you plan to compare
objects' identity based on their pickled version.)

The OP was not comparing identity but equality. So it looks like a
real bug, I think the following should be True for any function f:

if a == b: f(a) == f(b)

or not?

Daniel
 
Nick Vatamaniuc

I might have found the culprit: see http://svn.python.org/projects/python/trunk/Modules/cPickle.c
The function static int put2(...) has the following code block in it:

---------cPickle.c-----------
int p;
....
if ((p = PyDict_Size(self->memo)) < 0) goto finally;
/* Make sure memo keys are positive! */
/* XXX Why?
 * XXX And does "positive" really mean non-negative?
 * XXX pickle.py starts with PUT index 0, not 1. This makes for
 * XXX gratuitous differences between the pickling modules.
 */
p++;
-------------------------------

p++ will cause the difference. It seems the developers are not quite
sure why it's there, or whether memo keys can start at 0 or have to
start at 1.

Here is the corresponding section of the Python version (pickle.py), taken
from Python 2.5:
---------pickle.py----------
def memoize(self, obj):
    """Store an object in the memo."""

    # The Pickler memo is a dictionary mapping object ids to 2-tuples
    # that contain the Unpickler memo key and the object being memoized.
    # The memo key is written to the pickle and will become
    # the key in the Unpickler's memo. The object is stored in the
    # Pickler memo so that transient objects are kept alive during
    # pickling.

    # The use of the Unpickler memo length as the memo key is just a
    # convention. The only requirement is that the memo values be unique.
    # But there appears no advantage to any other scheme, and this
    # scheme allows the Unpickler memo to be implemented as a plain (but
    # growable) array, indexed by memo key.
    if self.fast:
        return
    assert id(obj) not in self.memo
    memo_len = len(self.memo)
    self.write(self.put(memo_len))
    self.memo[id(obj)] = memo_len, obj

# Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
def put(self, i, pack=struct.pack):
    if self.bin:
        if i < 256:
            return BINPUT + chr(i)
        else:
            return LONG_BINPUT + pack("<i", i)
    return PUT + repr(i) + '\n'
------------------------------------------

In memoize, memo_len corresponds to the 'int p' of the C version. In
pickle.py the first memo key is 0 and stays 0, while in the C version it
starts at 0 but is then incremented by p++, so the first key written is 1.
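
To see the memo (PUT) opcode in an actual pickle, here is a minimal sketch using the standard pickletools module. It assumes Python 3, where the two implementations have been merged behind the single pickle module, so the index printed here need not match what Python 2.5's cPickle wrote:

```python
import pickle
import pickletools

raw = pickle.dumps('1001799', protocol=0)

# genops yields (opcode, argument, position) triples for each opcode.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(raw)]
print(ops)

# optimize() removes PUT opcodes that no later GET refers to.
optimized = pickletools.optimize(raw)
optimized_names = [op.name for op, arg, pos in pickletools.genops(optimized)]
print(optimized_names)
```

The PUT entry only records the string in the memo; removing it changes the bytes of the pickle but not the value that comes back from loading it.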

Any developers that know more about this?

-Nick Vatamaniuc
 
Marc 'BlackJack' Rintsch

Daniel Nogradi said:
The OP was not comparing identity but equality. So it looks like a
real bug, I think the following should be True for any function f:

if a == b: f(a) == f(b)

or not?

In [74]: def f(x):
   ....:     return x / 2
   ....:

In [75]: a = 5

In [76]: b = 5.0

In [77]: a == b
Out[77]: True

In [78]: f(a) == f(b)
Out[78]: False

And `f()` doesn't even use something like `random()` or `time()` here. ;-)
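
The session above depends on Python 2 integer division. Under Python 3, `/` is true division, so that particular `f` gives equal results for 5 and 5.0. A sketch of the same point for Python 3, using `repr` as a stand-in for `f` (my substitution, not part of the original session):

```python
a = 5
b = 5.0

def f(x):
    # repr() is deterministic but distinguishes int from float.
    return repr(x)

print(a == b)        # the two values compare equal as numbers
print(f(a) == f(b))  # compares '5' with '5.0'
```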

Ciao,
Marc 'BlackJack' Rintsch
 
Chris Mellon

Daniel Nogradi said:
The OP was not comparing identity but equality. So it looks like a
real bug, I think the following should be True for any function f:

if a == b: f(a) == f(b)

or not?

Obviously not, in the general case. random.random() is the most
obvious example, but there are any number of functions which don't return
the same value for equal inputs. Take file() or open() - since you get
a new file object with new state, it obviously will not be equal even
if it's the same file path.
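
A short sketch of both cases, assuming Python 3 (where file() is gone and only open() remains). `os.devnull` is used only because it is a path that exists on every platform:

```python
import os
import random

path = os.devnull  # stand-in for "the same file path"

# Two calls with identical (empty) arguments; the results almost surely differ.
print(random.random() == random.random())

# Opening the same path twice gives two distinct file objects,
# and file objects compare by identity.
with open(path) as first, open(path) as second:
    same_path = first.name == second.name
    same_object = first == second
print(same_path, same_object)
```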

For certain inputs, cPickle doesn't print the memo information that is
used to support recursive and shared data structures. I'm not sure how
it tells the difference, perhaps it has something to do with
refcounts. In any case, it's an optimization of the pickle output, not
a bug.
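
What the memo is for can be sketched with a structure that contains the same object twice. This assumes Python 3 and the standard pickletools module; `shared` and `data` are illustrative names:

```python
import pickle
import pickletools

shared = ['1001799']
data = [shared, shared]  # the same list object appears twice

raw = pickle.dumps(data, protocol=0)
names = [op.name for op, arg, pos in pickletools.genops(raw)]
print(names)

# The second occurrence is written as a GET that refers back to the memo,
# so the sharing survives the round trip.
loaded = pickle.loads(raw)
print(loaded[0] is loaded[1])
```

For an object that is referenced only once, the memo entry is never read back, which is why leaving it out does not change the unpickled result.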
 
Daniel Nogradi

Chris Mellon said:
Obviously not, in the general case. random.random() is the most
obvious example, but there are any number of functions which don't return
the same value for equal inputs. Take file() or open() - since you get
a new file object with new state, it obviously will not be equal even
if it's the same file path.

Right, sorry about that, posted too quickly :)
I was thinking for a while about a deterministic

Chris Mellon said:
For certain inputs, cPickle doesn't print the memo information that is
used to support recursive and shared data structures. I'm not sure how
it tells the difference, perhaps it has something to do with
refcounts. In any case, it's an optimization of the pickle output, not
a bug.

Caching?

True
.........
.........
False


Daniel
 
Gabriel Genellina

On Thu, 17 May 2007 02:09:02 -0300, Josiah Carlson wrote:
All strings of length 0 (there is 1) and 1 (there are 256) are interned.

I thought it was the case too, but not always:

py> a = "a"
py> b = "A".lower()
py> a==b
True
py> a is b
False
py> a is intern(a)
True
py> b is intern(b)
False
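
For readers on Python 3, the built-in intern() moved to sys.intern(). A sketch of the same experiment, with a longer string built at run time (CPython 3 may cache one-character strings, so the exact session above might not reproduce):

```python
import sys

a = 'pickle'
b = ''.join(['pick', 'le'])  # built at run time, typically not interned

print(a == b)  # equal by value
print(a is b)  # identity is an implementation detail; do not rely on it

# sys.intern() returns one canonical object per string value.
print(sys.intern(a) is sys.intern(b))
```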
 
