cPickle asymptotic performance?

Discussion in 'Python' started by Eric Jonas, Jun 12, 2008.

  1. Eric Jonas

    Eric Jonas Guest

    Hello,

    I've done some benchmarking while attempting to serialize my (large)
    graph data structure with cPickle; I'm seeing superlinear performance
    (plotting it seems to suggest n^2 where n is the number of nodes of my
    graph), in the duration of the pickle.dump calls and I can't quite
    figure out why. The connectivity of the graph is such that the number of
    nodes is ~ number of edges, so I don't think this is a problem of edge
    count secretly growing > O(n). Is cPickle's behavior known to be O(n^2)?
    Does anyone have any generic tips for speeding up cPickle?

    Thanks,
    ...Eric
     
    Eric Jonas, Jun 12, 2008
    #1
    1. Advertising

  2. Eric Jonas <> writes:

    > I've done some benchmarking while attempting to serialize my (large)
    > graph data structure with cPickle; I'm seeing superlinear performance
    > (plotting it seems to suggest n^2 where n is the number of nodes of my
    > graph), in the duration of the pickle.dump calls and I can't quite
    > figure out why.


    Try gc.disable() before loading the pickle, and gc.enable() after.

    > Is cPickle's behavior known to be O(n^2)?


    No, but the garbage collector's sometimes is.
     
    Hrvoje Niksic, Jun 12, 2008
    #2
    1. Advertising

  3. Eric Jonas

    Eric Jonas Guest

    On Thu, 2008-06-12 at 20:57 +0200, Hrvoje Niksic wrote:
    > Eric Jonas <> writes:
    >
    > > I've done some benchmarking while attempting to serialize my (large)
    > > graph data structure with cPickle; I'm seeing superlinear performance
    > > (plotting it seems to suggest n^2 where n is the number of nodes of my
    > > graph), in the duration of the pickle.dump calls and I can't quite
    > > figure out why.

    >
    > Try gc.disable() before loading the pickle, and gc.enable() after.
    >
    > > Is cPickle's behavior known to be O(n^2)?

    >
    > No, but the garbage collector's sometimes is.



    Wow, that totally fixed it -- we went from 1200 seconds to 60 seconds.
    I'm somewhat shocked -- is the occasionally-abysmal behavior of the GC
    documented anywhere?
    ...Eric
     
    Eric Jonas, Jun 12, 2008
    #3
  4. Eric Jonas <> writes:

    >> Try gc.disable() before loading the pickle, and gc.enable() after.
    >>
    >> > Is cPickle's behavior known to be O(n^2)?

    >>
    >> No, but the garbage collector's sometimes is.

    >
    > Wow, that totally fixed it -- we went from 1200 seconds to 60
    > seconds.


    60 seconds is still a long time. How many objects are you
    deserializing? Is the time now approximately O(n)?

    > I'm somewhat shocked -- is the occasionally-abysmal behavior of the
    > GC documented anywhere?


    I don't think so. It's somewhat of an FAQ on the list, though. The
    question tends to arise when someone tries to construct a large list
    of non-trivial objects, which takes quadratic time because object
    allocation regularly triggers GC, which traverses the growing list.
     
    Hrvoje Niksic, Jun 13, 2008
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Drochom

    cPickle alternative?

    Drochom, Aug 15, 2003, in forum: Python
    Replies:
    13
    Views:
    880
    Michael Peuser
    Aug 17, 2003
  2. Carsten Gips

    Jython: jythonc and cPickle

    Carsten Gips, Sep 9, 2003, in forum: Python
    Replies:
    0
    Views:
    428
    Carsten Gips
    Sep 9, 2003
  3. Guenter Walser
    Replies:
    0
    Views:
    607
    Guenter Walser
    Oct 15, 2003
  4. Replies:
    2
    Views:
    342
    Paul Rubin
    May 11, 2004
  5. Replies:
    2
    Views:
    355
Loading...

Share This Page