best (fastest) way to send and get lists from files

Discussion in 'Python' started by Abrahams, Max, Jan 31, 2008.

  1. I've looked into pickle, dump, load, save, readlines(), etc.

    Which is the best method? Fastest? My lists tend to be around a thousand to a million items.

    Binary and text files are both okay; text would be preferred in general unless there's a significant speed boost from something binary.

    thanks
     
    Abrahams, Max, Jan 31, 2008
    #1

  2. Yu-Xi Lim Guest

    Abrahams, Max wrote:
    > I've looked into pickle, dump, load, save, readlines(), etc.
    >
    > Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
    >
    > Binary and text files are both okay, text would be preferred in general unless there's a significant speed boost from something binary.
    >
    > thanks


    1) Why don't you time them with the timeit module?
    http://docs.python.org/lib/module-timeit.html

    Results will vary with the specific data you have and your hardware
    speed, but if it's a lot of data, the hardware (in particular disk
    I/O) is most likely going to be the bottleneck. A compact binary
    format will help alleviate this.

    If you're reading a lot of data into memory, you might have to deal with
    your OS swap/virtual memory.
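Following that suggestion, here is a minimal sketch for Python 3 (the code elsewhere in this thread is Python 2) that times a full write-then-read round trip of a list of integers with pickle, marshal and a plain text file holding one number per line. The helper names, the assumption that the list holds only ints, and the temporary file names are all made up for illustration; as said above, the timings you get will depend on your data and hardware.

```python
import marshal
import os
import pickle
import tempfile
import timeit


def roundtrip_pickle(data, path):
    # Binary pickle, using the newest protocol this Python supports
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as f:
        return pickle.load(f)


def roundtrip_marshal(data, path):
    with open(path, "wb") as f:
        marshal.dump(data, f)
    with open(path, "rb") as f:
        return marshal.load(f)


def roundtrip_text(data, path):
    # One integer per line; assumes the list holds only ints
    with open(path, "w") as f:
        f.writelines("%d\n" % x for x in data)
    with open(path) as f:
        return [int(line) for line in f]


def compare(n=100000, repeat=3):
    """Return the best write+read time in seconds for each method."""
    data = list(range(n))
    methods = {
        "pickle": roundtrip_pickle,
        "marshal": roundtrip_marshal,
        "text": roundtrip_text,
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, fn in methods.items():
            path = os.path.join(tmp, name + ".dat")
            assert fn(data, path) == data  # sanity check before timing
            times = timeit.repeat(lambda: fn(data, path), number=1, repeat=repeat)
            results[name] = min(times)
    return results


if __name__ == "__main__":
    for name, secs in sorted(compare().items(), key=lambda kv: kv[1]):
        print("%-8s %.4f s" % (name, secs))
```

Taking the minimum of several repeats reduces the influence of other activity on the machine. Since each file is read right after it is written, the OS file cache will usually make the reads faster than a cold read from disk would be.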

    2) "Best" depends on what your data is and what you're doing with it.

    Are you reinventing a flat-file database? There are better solutions for
    databases.

    If you're just reformatting data to pass to another program, say, for
    scientific computation, portability may be more of an issue. Number
    crunching the resulting data may take so much longer that the time
    spent writing/reading it becomes insignificant.
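If the answer to the flat-file database question is yes, one option that ships with Python is the sqlite3 module. That choice is mine for illustration; the post above doesn't name a particular database. A minimal sketch, assuming a list of integers and a hypothetical `items` table whose `pos` column preserves the list order:

```python
import sqlite3


def save_list(conn, items):
    """Replace the contents of the hypothetical 'items' table with the list."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS items (pos INTEGER PRIMARY KEY, value INTEGER)"
        )
        conn.execute("DELETE FROM items")
        conn.executemany(
            "INSERT INTO items (pos, value) VALUES (?, ?)", enumerate(items)
        )


def load_list(conn):
    """Read the list back in its original order."""
    return [value for (value,) in conn.execute("SELECT value FROM items ORDER BY pos")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a filename instead to store on disk
    save_list(conn, range(1000))
    print(load_list(conn)[:5])
    conn.close()
```

A database pays off mostly when you need to query or update parts of the data; for dumping and reloading a whole list at once, the simpler serializers in this thread are likely to be faster.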
     
    Yu-Xi Lim, Jan 31, 2008
    #2

  3. Paddy Guest

    On Jan 31, 7:34 pm, "Abrahams, Max" <> wrote:
    > I've looked into pickle, dump, load, save, readlines(), etc

    I've used the following sometimes:

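    # the output is meant to end up in a .py file (e.g. by redirecting stdout)
    from pprint import pprint as pp
    print("data = \\")  # trailing backslash continues the assignment
    pp(data)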

    That created a Python file that could be imported as a module, but it
    only works when the __repr__ of the data is valid Python that recreates it.

    - Paddy.
    P.S. I never timed it - it was fast enough, and the data was readable.
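A minimal Python 3 sketch of the same idea, writing the module to a file directly and importing it back with importlib rather than redirecting stdout. The function names and the module name `saved_data` are hypothetical. The file name must end in .py for the loader to recognise it, and since importing executes the file as code, only load files you wrote yourself.

```python
import importlib.util
import os
import pprint
import tempfile


def write_data_module(data, path):
    """Write `data` to `path` as Python source defining one name, `data`."""
    with open(path, "w") as f:
        # Same trick as above: a line continuation, then the pretty-printed repr
        f.write("data = \\\n")
        f.write(pprint.pformat(data))
        f.write("\n")


def read_data_module(path, module_name="saved_data"):
    """Import the file at `path` and return its `data` attribute."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "saved.py")
        write_data_module(list(range(10)), path)
        print(read_data_module(path))
```

This keeps the file human-readable, which was the point of the approach; the __repr__ limitation still applies, so instances of your own classes generally won't survive the round trip.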
     
    Paddy, Feb 1, 2008
    #3
  4. Abrahams, Max <> wrote:
    >
    > I've looked into pickle, dump, load, save, readlines(), etc.
    >
    > Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
    >
    > Binary and text files are both okay, text would be preferred in
    > general unless there's a significant speed boost from something
    > binary.


    You could try the marshal module, which is very fast, lightweight and
    built in.

    http://www.python.org/doc/current/lib/module-marshal.html

    It produces a binary format, though, and it will only dump "simple"
    objects (see the page above). I believe it is what Python uses
    internally to write .pyc files from .py files.

    ------------------------------------------------------------
    #!/usr/bin/python

    import os
    from marshal import dump, load
    from timeit import Timer

    def write(N, file_name="z.marshal"):
        # list() keeps this working on Python 3, where range() is lazy
        # and marshal cannot dump a range object
        L = list(range(N))
        out = open(file_name, "wb")
        dump(L, out)
        out.close()
        print("Written %d bytes for list size %d" % (os.path.getsize(file_name), N))

    def read(N, file_name="z.marshal"):
        inp = open(file_name, "rb")
        L = load(inp)
        inp.close()
        assert len(L) == N

    for log_N in range(7):
        N = 10**log_N
        loops = 10
        write(N)
        # average time of a single read() over `loops` calls
        t = Timer("read(%d)" % N, "from __main__ import read").repeat(1, loops)[0] / loops
        print("Read back %d items in %s s" % (N, t))
    ------------------------------------------------------------

    Produces

    $ ./test-marshal.py
    Written 10 bytes for list size 1
    Read back 1 items in 4.14133071899e-05 s
    Written 55 bytes for list size 10
    Read back 10 items in 4.31060791016e-05 s
    Written 505 bytes for list size 100
    Read back 100 items in 8.23020935059e-05 s
    Written 5005 bytes for list size 1000
    Read back 1000 items in 0.000352478027344 s
    Written 50005 bytes for list size 10000
    Read back 10000 items in 0.00165479183197 s
    Written 500005 bytes for list size 100000
    Read back 100000 items in 0.0175776958466 s
    Written 5000005 bytes for list size 1000000
    Read back 1000000 items in 0.175704598427 s
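The "simple objects" restriction is easy to check for yourself. A minimal Python 3 sketch; the `Point` class and the `is_marshallable` helper are made up for illustration. The marshal documentation also says its format may change between Python versions and points to pickle for general persistence, so marshal files are best treated as short-lived.

```python
import marshal


def is_marshallable(obj):
    """Return True if marshal can serialize obj."""
    try:
        marshal.dumps(obj)
    except ValueError:  # raised for (or for containers holding) unsupported types
        return False
    return True


class Point:
    """A trivial user-defined class, used only to show what marshal rejects."""

    def __init__(self, x, y):
        self.x, self.y = x, y


if __name__ == "__main__":
    simple = [1, 2.5, "three", b"four", (5, 6), {"seven": 7}, {8, 9}, None, True]
    print(marshal.loads(marshal.dumps(simple)) == simple)  # built-in types round-trip
    print(is_marshallable(simple))
    print(is_marshallable([1, Point(2, 3)]))  # a list containing an instance fails
```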

    --
    Nick Craig-Wood <> -- http://www.craig-wood.com/nick
     
    Nick Craig-Wood, Feb 5, 2008
    #4
