How to make this unpickling/sorting demo faster?

Discussion in 'Python' started by Steve Bergman, Apr 17, 2008.

  1. I'm involved in a discussion thread in which it has been stated that:

    """
    Anything written in a language that is > 20x slower (Perl, Python,
    PHP) than C/C++ should be instantly rejected by users on those grounds
    alone.
    """

    I've challenged someone to beat the snippet of code below in C, C++,
    or assembler, for reading in one million pairs of random floats and
    sorting them by the second member of the pair. I'm not a master
    Python programmer. Is there anything I could do to make this even
    faster than it is?

    Also, if I try to write the resulting list of tuples back out to a
    gdbm file, it takes a good 14 seconds, which is far longer than the
    reading and sorting takes. The problem seems to be that the 'f' flag
    to gdbm.open() is being ignored and writes are being sync'd to disk
    either on each write, or on close. I'd really prefer to let the OS
    decide when to actually write to disk.

    I'm using Python 2.5.2, libgdbm 1.8.3, and python-gdbm 2.5.2 under
    Ubuntu 8.04 beta on an x86_64 architecture.

    Thanks for any tips.

    =====
    import cPickle, gdbm, operator
    dbmIn = gdbm.open('float_pairs_in.pickel')
    print "Reading pairs..."
    pairs = cPickle.loads(dbmIn['pairs'])
    print "Sorting pairs..."
    pairs.sort(key=operator.itemgetter(1))
    print "Done!"
    =====


    The input file was created with this:

    =====
    import random, gdbm, cPickle
    print "Creating pairs file..."
    pairs = [(random.random(), random.random())
             for pair in range(1000000)]
    dbmOut = gdbm.open('float_pairs_in.pickel', 'nf')
    dbmOut['pairs'] = cPickle.dumps(pairs, 2)
    dbmOut.close()
    print "Done!"
    =====
     
    Steve Bergman, Apr 17, 2008
    #1

  2. Paddy Guest

    On Apr 17, 6:15 pm, Steve Bergman <> wrote:
    > I'm involved in a discussion thread in which it has been stated that:
    >
    > """
    > Anything written in a language that is > 20x slower (Perl, Python,
    > PHP) than C/C++ should be instantly rejected by users on those grounds
    > alone.
    > """
    >

    The above is applied slavishly by those who value machine time over
    people's time. Do you want to work with them?

    - Paddy.
     
    Paddy, Apr 17, 2008
    #2


  3. > The above is applied slavishly by those who value machine time over
    > people's time. Do you want to work with them?


    I understand where you are coming from. But in writing the code
    snippet I learned something about pickling/unpickling, which I knew
    *about* but had never actually used before. And also learned the
    quick and easy way of DSU sorting, which got easier and faster in
    2.4. So, personally, I'm ahead on this one. Otherwise, I agree that
    arguing the matter in long flame threads is a waste of time.
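    For anyone following along, the old decorate-sort-undecorate idiom and the
    newer key= form mentioned above can be compared like this (a minimal sketch
    with made-up data, not code from the thread):

```python
import operator

pairs = [(1.0, 0.9), (2.0, 0.1), (3.0, 0.5)]

# Pre-2.4 decorate-sort-undecorate: build (key, value) tuples, sort, strip.
decorated = [(p[1], p) for p in pairs]
decorated.sort()
old_way = [p for _, p in decorated]

# Since 2.4, key= does the decoration internally, in C, so it is both
# shorter and faster.
new_way = sorted(pairs, key=operator.itemgetter(1))

print(old_way == new_way)  # True
```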
     
    Steve Bergman, Apr 17, 2008
    #3
  4. Steve Bergman

    Paul Rubin Guest

    Steve Bergman <> writes:
    > Anything written in a language that is > 20x slower (Perl, Python,
    > PHP) than C/C++ should be instantly rejected by users on those grounds
    > alone.


    Well, if you time it starting from when you sit down at the computer
    and start programming until the sorted array is output, Python
    might be 20x faster than C/C++ and 100x faster than assembler.

    > I've challenged someone to beat the snippet of code below in C, C++,
    > or assembler, for reading in one million pairs of random floats and
    > sorting them by the second member of the pair. I'm not a master
    > Python programmer. Is there anything I could do to make this even
    > faster than it is?


    1. Turn off the cyclic garbage collector during the operation since
    you will have no cyclic garbage.

    2. See if there is a way to read the array directly (using something
    like the struct module or ctypes) rather than a pickle.

    3. Use psyco and partition the array into several smaller ones with a
    quicksort-like partitioning step, then sort the smaller arrays in
    parallel using multiple processes, if you have a multicore CPU.

    4. Write your sorting routine in C or assembler and call it through
    the C API. If the sorting step is a small part of a large program and
    the sort is using a lot of cpu, this is a good approach since for the
    other parts of the program you still get the safety and productivity
    gain of Python.
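    Suggestion 1 might be sketched like this (the timing print is
    illustrative only; actual numbers depend on the machine):

```python
import gc
import random
import time

pairs = [(random.random(), random.random()) for _ in range(1000000)]

gc.disable()  # plain tuples of floats create no reference cycles,
              # so the cyclic collector only adds overhead here
t0 = time.time()
pairs.sort(key=lambda p: p[1])
elapsed = time.time() - t0
gc.enable()

print("sorted 1000000 pairs in %.2f s" % elapsed)
```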

    > Also, if I try to write the resulting list of tuples back out to a
    > gdbm file,


    I don't understand what you're doing with gdbm. Just use a binary
    file.
     
    Paul Rubin, Apr 17, 2008
    #4
  5. Thanks for the help. Yes, I see now that gdbm is really superfluous.
    Not having used pickle before, I started from the example in "Python
    in a Nutshell", which uses anydbm. Using a regular file saves some
    time. By far, the biggest improvement has come from using marshal
    rather than cPickle, especially for writing. gc.disable() shaved off
    another 20%. I can't use psyco because I'm on x86_64. And my goal is
    to demonstrate how Python can combine simplicity, readability, *and*
    speed when the standard library is leveraged properly. So, in this
    case, calling C or assembler code would defeat the purpose, as would
    partitioning into smaller arrays. And my processor is single core.

    I started at about 7 seconds to just read in and sort, and at 20
    seconds to read, sort, and write. I can now read in, sort, and write
    back out in almost exactly 5 seconds, thanks to marshal and your
    suggestions.
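    For reference, the plain-file-plus-marshal approach described above might
    look something like this (the filename is made up; note that marshal only
    handles core built-in types and its format is not guaranteed stable across
    Python versions):

```python
import marshal
import random

pairs = [(random.random(), random.random()) for _ in range(1000000)]

# Write: marshal serializes lists/tuples of floats much faster than pickle.
out = open('float_pairs.marshal', 'wb')
marshal.dump(pairs, out)
out.close()

# Read back and sort by the second member of each pair.
inp = open('float_pairs.marshal', 'rb')
pairs = marshal.load(inp)
inp.close()
pairs.sort(key=lambda p: p[1])
```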

    I'll look into struct and ctypes. Although I know *of* them, I'm not
    very familiar *with* them.
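    As a rough illustration of what the struct route could look like (a
    sketch, not benchmarked): the pairs are packed as one flat run of native
    doubles, much like the struct array a C program would fwrite.

```python
import random
import struct

pairs = [(random.random(), random.random()) for _ in range(100000)]

# Pack all pairs as one flat run of native doubles ('d').
flat = [x for pair in pairs for x in pair]
data = struct.pack('%dd' % len(flat), *flat)

# Unpack and re-pair: even slots are first members, odd slots second.
values = struct.unpack('%dd' % len(flat), data)
restored = list(zip(values[0::2], values[1::2]))
restored.sort(key=lambda p: p[1])
```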

    Thank you, again, for your help.
     
    Steve Bergman, Apr 17, 2008
    #5
  6. Paul Rubin Guest

    Steve Bergman <> writes:
    > I started at about 7 seconds to just read in and sort, and at 20
    > seconds to read, sort, and write. I can now read in, sort, and write
    > back out in almost exactly 5 seconds, thanks to marshal and your
    > suggestions.


    That is not bad if you aren't using a raid disk. If you require that
    the output actually be written to the disk though, make sure to type
    "sync" after running your program and count that as part of the
    elapsed time. If those are 64-bit floats then a million pairs of them
    is 16 MB, and you're doing pretty well if you can get that much through a
    file system in 1 second, so reading and writing the data is about 2
    seconds all by itself. Therefore the likelihood of a C or asm program
    being 20x faster including disk i/o is dim. But realistically,
    counting just CPU time, you might get a 20x speedup with assembler if
    you're really determined, using x86 SSE (128-bit vector) instructions,
    cache prefetching, etc.
     
    Paul Rubin, Apr 18, 2008
    #6
  7. hdante Guest

    On Apr 17, 7:33 pm, Steve Bergman <> wrote:
    > to demonstrate how Python can combine simplicity, readability, *and*
    > speed when the standard library is leveraged properly. So, in this


    But that's not true. You're just creating an artificial example to
    prove your point.

    Consider this C code that does what you say. The code is very basic
    and readable. In my machine, it takes a little longer than one second
    to execute.

    #include <stdio.h>
    #include <stdlib.h>

    #define size 1000000

    struct float_pair {
    double a;
    double b;
    };
    static struct float_pair s[size];

    int compare(const void *a, const void *b)
    {
    const struct float_pair *sa;
    const struct float_pair *sb;
    sa = a;
    sb = b;
    if (sa->b > sb->b) return 1;
    if (sa->b == sb->b) return 0;
    return -1;
    }

    int main(void)
    {
    FILE *f;
    f = fopen("floats", "rb");
    puts("Reading pairs...");
    fread(s, sizeof(s), 1, f);
    fclose(f);
    qsort(s, size, sizeof(struct float_pair), compare);
    puts("Done!");
    return 0;
    }

    Is the code too big? Yes. Can you make it faster in Python?
    Probably not in the 10 minutes required to write this. The code that
    generates the files is the following:

    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>

    struct float_pair {
    double a;
    double b;
    };

    static struct float_pair s[1000000];

    int main(void)
    {
    FILE *f;
    unsigned i;
    f = fopen("floats", "wb");
    srand(time(NULL));
    for (i = 0; i < 1000000; i++) {
    s[i].a = (double)rand()/RAND_MAX;
    s[i].b = (double)rand()/RAND_MAX;
    }
    fwrite(s, sizeof(s), 1, f);
    fclose(f);
    return 0;
    }

    The binary file used is not portable, because the representation of
    the "double" type is not standardized in C. However:
    - if you restrict this code to machines that follow the IEEE 754
    standard, it will probably be more portable than the Python one;
    - in Python, there may be small rounding errors when exchanging the
    data between machines, since "float" is not standardized either.
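    If portability is the concern, Python's struct module can pin the layout
    explicitly: '<d' means a little-endian IEEE 754 double regardless of the
    host's byte order or native double format (a small sketch):

```python
import struct

pairs = [(0.25, 0.75), (1.5, 0.5)]

# '<dd' = two little-endian IEEE 754 doubles, independent of host byte
# order and of whatever the platform's native 'double' happens to be.
data = b''.join(struct.pack('<dd', a, b) for a, b in pairs)

# Each record is 16 bytes; unpack_from reads at a byte offset.
decoded = [struct.unpack_from('<dd', data, i * 16)
           for i in range(len(data) // 16)]

print(decoded == pairs)  # True
```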


    > case, calling C or assembler code would defeat the purpose, as would
    > partitioning into smaller arrays. And my processor is single core.
    >
    > I started at about 7 seconds to just read in and sort, and at 20
    > seconds to read, sort, and write. I can now read in, sort, and write
    > back out in almost exactly 5 seconds, thanks to marshal and your
    > suggestions.
    >
    > I'll look into struct and ctypes. Although I know *of* them, I'm not
    > very familiar *with* them.
    >
    > Thank you, again, for your help.
     
    hdante, Apr 18, 2008
    #7
  8. On Apr 17, 7:37 pm, Paul Rubin <http://> wrote:

    > Therefore the likelihood of a C or asm program
    > being 20x faster including disk i/o is dim.  But realistically,
    > counting just CPU time, you might get a 20x speedup with assembler if
    > you're really determined, using x86 SSE (128-bit vector) instructions,
    > cache prefetching, etc.


    I think that the attitude that is prevalent is that although Python
    and other "interpreted" languages are slow, there are places where
    that slowness is not important and using them is OK. The point that I
    am trying to make with my example is that often, by leveraging the
    care and optimization work that others have put into the Python
    standard library over the years, a rather casual programmer can match
    or better the performance of 'average' C code. i.e. C code that was
    written by a good C programmer while no one was looking over his
    shoulder, and no pressures motivated him to spend a lot of time
    optimizing the C to the hilt, or switching to asm.

    With the reading and writing of the data (which actually works out to
    about 23MB, marshalled) now down to 1 second each, I'm content. In
    the beginning, the io time was overshadowing the sort time by a good
    bit. And it was the sort time that I wanted to highlight.

    BTW, no one has responded to my challenge to best the original sample
    Python code with C, C++, or asm.
     
    Steve Bergman, Apr 18, 2008
    #8
