cPickle.dumps differs from pickle.dumps; looks like a bug.

Discussion in 'Python' started by Victor Kryukov, May 16, 2007.

  1. Hello list,

    I've found the following strange behavior of cPickle. Do you think
    it's a bug, or is it by design?

    Best regards,
    Victor.

    from pickle import dumps
    from cPickle import dumps as cdumps

    print dumps('1001799')==dumps(str(1001799))
    print cdumps('1001799')==cdumps(str(1001799))

    outputs

    True
    False


    vicbook:~ victor$ python
    Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
    [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> quit()

    vicbook:~ victor$ uname -a
    Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
    PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386
    Victor Kryukov, May 16, 2007
    #1

  2. On May 16, 1:13 pm, Victor Kryukov <> wrote:
    > Hello list,
    >
    > I've found the following strange behavior of cPickle. Do you think
    > it's a bug, or is it by design?
    > [...]


    If you unpickle though will the results be the same? I suspect they
    will be. That should matter most of all (unless you plan to compare
    objects' identity based on their pickled version.)

    Remember that by default pickle and cPickle use protocol 0, which
    produces an ASCII representation that is usually longer. For a binary
    representation, use a higher pickle protocol (1 or 2).

    Hope that helps,
    -Nick Vatamaniuc
    Nick Vatamaniuc, May 16, 2007
    #2
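A minimal sketch of the protocol point above, assuming Python 3 (where cPickle no longer exists as a separate module and `pickle` is used for everything). The variable names are illustrative only. It shows that protocol 0 and protocol 2 give different bytes for the same string while both decode to the same value:

```python
import pickle

value = "1001799"

# Protocol 0 is the text (ASCII) format; protocol 2 is a binary format.
text_pickle = pickle.dumps(value, protocol=0)
binary_pickle = pickle.dumps(value, protocol=2)

# The byte strings differ, but both decode to the same value.
same_bytes = text_pickle == binary_pickle
same_value = pickle.loads(text_pickle) == pickle.loads(binary_pickle) == value

print(repr(text_pickle))
print(repr(binary_pickle))
print(same_bytes, same_value)
```

Comparing the unpickled values, rather than the pickles themselves, is the check that matters for correctness.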

  3. > > I've found the following strange behavior of cPickle. Do you think
    > > it's a bug, or is it by design?
    > > [...]

    >
    > If you unpickle though will the results be the same? I suspect they
    > will be. That should matter most of all (unless you plan to compare
    > objects' identity based on their pickled version.)


    The OP was not comparing identity but equality, so it looks like a
    real bug. I think the following should be True for any function f:

    if a == b: f(a) == f(b)

    or not?

    Daniel
    Daniel Nogradi, May 16, 2007
    #3
  4. On May 16, 1:13 pm, Victor Kryukov <> wrote:
    > Hello list,
    >
    > I've found the following strange behavior of cPickle. Do you think
    > it's a bug, or is it by design?
    > [...]


    I might have found the culprit: see http://svn.python.org/projects/python/trunk/Modules/cPickle.c
    The function static int put2(...) contains the following code block:

    ---------cPickle.c-----------
    int p;
    ....
    if ((p = PyDict_Size(self->memo)) < 0) goto finally;
    /* Make sure memo keys are positive! */
    /* XXX Why?
     * XXX And does "positive" really mean non-negative?
     * XXX pickle.py starts with PUT index 0, not 1. This makes for
     * XXX gratuitous differences between the pickling modules.
     */
    p++;
    -------------------------------

    p++ will cause the difference. It seems the developers are not quite
    sure why it's there, or whether memo keys may start at 0 or have to
    start at 1.

    Here is the corresponding section of the Python version (pickle.py),
    taken from Python 2.5:
    ---------pickle.py----------
    def memoize(self, obj):
        """Store an object in the memo."""

        # The Pickler memo is a dictionary mapping object ids to 2-tuples
        # that contain the Unpickler memo key and the object being memoized.
        # The memo key is written to the pickle and will become
        # the key in the Unpickler's memo. The object is stored in the
        # Pickler memo so that transient objects are kept alive during
        # pickling.

        # The use of the Unpickler memo length as the memo key is just a
        # convention. The only requirement is that the memo values be unique.
        # But there appears no advantage to any other scheme, and this
        # scheme allows the Unpickler memo to be implemented as a plain (but
        # growable) array, indexed by memo key.
        if self.fast:
            return
        assert id(obj) not in self.memo
        memo_len = len(self.memo)
        self.write(self.put(memo_len))
        self.memo[id(obj)] = memo_len, obj

    # Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
    def put(self, i, pack=struct.pack):
        if self.bin:
            if i < 256:
                return BINPUT + chr(i)
            else:
                return LONG_BINPUT + pack("<i", i)

        return PUT + repr(i) + '\n'
    ------------------------------------------

    In memoize, memo_len plays the role of 'int p' in the C version. In
    the Python version the first memo key is 0 and is written as is, while
    in the C version the size is initially 0 but is then incremented with
    p++, so the first key is 1.

    Any developers that know more about this?

    -Nick Vatamaniuc
    Nick Vatamaniuc, May 16, 2007
    #4
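To illustrate how little a memo index matters to the unpickler, here is a small sketch assuming Python 3. The two byte strings are hand-written protocol-0 pickles made up for this example (they are not captured output from pickle.py or cPickle); they differ only in the index given to the PUT opcode, `p0` versus `p1`:

```python
import pickle
import pickletools

# Hand-written protocol-0 pickles of the string 'abc'. They differ only in
# the memo index passed to the PUT opcode ("p0" versus "p1").
put_zero = b"Vabc\np0\n."
put_one = b"Vabc\np1\n."

# Different bytes, same decoded value.
value_zero = pickle.loads(put_zero)
value_one = pickle.loads(put_one)
print(put_zero == put_one, value_zero == value_one)

# pickletools.optimize() removes PUT opcodes that no GET ever refers to,
# so the two pickles become byte-for-byte comparable afterwards.
stripped_zero = pickletools.optimize(put_zero)
stripped_one = pickletools.optimize(put_one)
print(stripped_zero == stripped_one)
```

If pickles really must be compared as bytes, normalizing them this way first is one option; comparing the unpickled values is usually simpler.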
  5. Daniel Nogradi wrote:

    > The OP was not comparing identity but equality. So it looks like a
    > real bug, I think the following should be True for any function f:
    >
    > if a == b: f(a) == f(b)
    >
    > or not?


    In [74]: def f(x):
       ....:     return x / 2
       ....:

    In [75]: a = 5

    In [76]: b = 5.0

    In [77]: a == b
    Out[77]: True

    In [78]: f(a) == f(b)
    Out[78]: False

    And `f()` doesn't even use something like `random()` or `time()` here. ;-)

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, May 16, 2007
    #5
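The session above relies on Python 2 integer division; under Python 3, `/` is true division and `5 / 2 == 5.0 / 2` holds. A sketch of the same counterexample for Python 3, using `repr` and `pickle.dumps` as the function f (a choice made for this illustration, not taken from the thread):

```python
import pickle

a = 5
b = 5.0

print(a == b)                              # the inputs compare equal
print(repr(a) == repr(b))                  # '5' versus '5.0'
print(pickle.dumps(a) == pickle.dumps(b))  # int and float pickle differently

# Round-tripping still gives values that compare equal.
print(pickle.loads(pickle.dumps(a)) == pickle.loads(pickle.dumps(b)))
```

So equality of inputs does not imply equality of their serialized forms, even for a fully deterministic function.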
  6. On 5/16/07, Daniel Nogradi <> wrote:
    > > > I've found the following strange behavior of cPickle. Do you think
    > > > it's a bug, or is it by design?
    > > > [...]

    > >
    > > If you unpickle though will the results be the same? I suspect they
    > > will be. That should matter most of all (unless you plan to compare
    > > objects' identity based on their pickled version.)

    >
    > The OP was not comparing identity but equality. So it looks like a
    > real bug, I think the following should be True for any function f:
    >
    > if a == b: f(a) == f(b)
    >
    > or not?
    >


    Obviously not, in the general case. A function that calls
    random.random() is the most obvious example, but there are any number
    of functions which don't return the same value for equal inputs. Take
    file() or open(): since you get a new file object with new state, it
    obviously will not be equal even if it's the same file path.

    For certain inputs, cPickle doesn't write the memo information that is
    used to support recursive and shared data structures. I'm not sure how
    it tells the difference; perhaps it has something to do with
    refcounts. In any case, it's an optimization of the pickle output, not
    a bug.
    Chris Mellon, May 16, 2007
    #6
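A short sketch of what the memo information is for, assuming Python 3 and protocol 0 so that the opcode names are the plain `PUT` and `GET`. The names `shared` and `data` are illustrative. When the same object appears twice, the pickler stores it once with a PUT and refers back to it with a GET, which lets the unpickler restore the sharing:

```python
import pickle
import pickletools

shared = [1, 2]
data = [shared, shared]   # the same list object appears twice

payload = pickle.dumps(data, protocol=0)

# List the opcode names in the pickle stream.
opcode_names = [op.name for op, arg, pos in pickletools.genops(payload)]
print(opcode_names)

# After unpickling, both entries are again one and the same object.
restored = pickle.loads(payload)
print(restored[0] is restored[1])
```

A PUT that no GET ever refers to carries no information for the unpickler, which is why leaving it out changes the bytes but not the result.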
  7. > > > > I've found the following strange behavior of cPickle. Do you think
    > > > > it's a bug, or is it by design?
    > > > > [...]
    > > >
    > > > If you unpickle though will the results be the same? I suspect they
    > > > will be. That should matter most of all (unless you plan to compare
    > > > objects' identity based on their pickled version.)

    > >
    > > The OP was not comparing identity but equality. So it looks like a
    > > real bug, I think the following should be True for any function f:
    > >
    > > if a == b: f(a) == f(b)
    > >
    > > or not?
    > >

    >
    > Obviously not, in the general case. random.random(x) is the most
    > obvious example, but there's any number functions which don't return
    > the same value for equal inputs. Take file() or open() - since you get
    > a new file object with new state, it obviously will not be equal even
    > if it's the same file path.


    Right, sorry about that, posted too quickly :)
    I was thinking for a while about a deterministic

    > For certain inputs, cPickle doesn't print the memo information that is
    > used to support recursive and shared data structures. I'm not sure how
    > it tells the difference, perhaps it has something to do with
    > refcounts. In any case, it's an optimization of the pickle output, not
    > a bug.


    Caching?

    >>> from cPickle import dumps
    >>> dumps('0') == dumps(str(0))
    True
    >>> dumps('1') == dumps(str(1))
    True
    >>> dumps('2') == dumps(str(2))
    True
    .........
    .........
    >>> dumps('9') == dumps(str(9))
    True
    >>> dumps('10') == dumps(str(10))
    False
    >>> dumps('11') == dumps(str(11))
    False


    Daniel
    Daniel Nogradi, May 16, 2007
    #7
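A sketch of the same experiment written so that it does not depend on byte equality, assuming Python 3. The helper `same_after_unpickling` is a hypothetical name introduced here. Whether the byte comparison comes out True or False depends on the interpreter and pickler in use, so the code records it without relying on it:

```python
import pickle


def same_after_unpickling(first, second):
    """Compare two pickles by the values they decode to, not by their bytes."""
    return pickle.loads(first) == pickle.loads(second)


results = {}
for n in range(12):
    from_format = pickle.dumps("%d" % n, protocol=0)
    from_str = pickle.dumps(str(n), protocol=0)
    # (byte-for-byte equal?, equal after unpickling?)
    results[n] = (from_format == from_str,
                  same_after_unpickling(from_format, from_str))

for n, (same_bytes, same_value) in sorted(results.items()):
    print(n, same_bytes, same_value)
```

Only the second column is something a program should depend on.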
  8. Daniel Nogradi wrote:
    > Caching?
    >
    > >>> from cPickle import dumps
    > >>> dumps('0') == dumps(str(0))
    > True
    > >>> dumps('1') == dumps(str(1))
    > True
    > >>> dumps('2') == dumps(str(2))
    > True
    > ........
    > ........
    > >>> dumps('9') == dumps(str(9))
    > True
    > >>> dumps('10') == dumps(str(10))
    > False
    > >>> dumps('11') == dumps(str(11))
    > False


    All strings of length 0 (there is 1) and 1 (there are 256) are interned.

    - Josiah
    Josiah Carlson, May 17, 2007
    #8
  9. On Thu, 17 May 2007 02:09:02 -0300, Josiah Carlson
    <> wrote:

    > All strings of length 0 (there is 1) and 1 (there are 256) are interned.


    I thought it was the case too, but not always:

    py> a = "a"
    py> b = "A".lower()
    py> a==b
    True
    py> a is b
    False
    py> a is intern(a)
    True
    py> b is intern(b)
    False

    --
    Gabriel Genellina
    Gabriel Genellina, May 17, 2007
    #9
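For readers on Python 3, where `intern` has moved to `sys.intern`, a minimal sketch of the distinction being drawn here. Whether two equal strings built in different ways are the same object is an implementation detail, so the code below does not rely on `a is b`; it only relies on `sys.intern` returning one canonical object for equal strings:

```python
import sys

a = "python"
b = "".join(["py", "thon"])   # equal to a, but built at run time

equal = a == b
print(equal)

# sys.intern() maps equal strings to a single canonical object.
canonical_a = sys.intern(a)
canonical_b = sys.intern(b)
print(canonical_a is canonical_b)
```

Equality is what the language guarantees for such strings; identity only follows once both have been explicitly interned.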
