Best (fastest) way to write lists to files and read them back

Abrahams, Max

I've looked into pickle, dump, load, save, readlines(), etc.

Which is the best method? Fastest? My lists tend to be around a thousand to a million items.

Binary and text files are both okay, text would be preferred in general unless there's a significant speed boost from something binary.

thanks
 
Yu-Xi Lim

> I've looked into pickle, dump, load, save, readlines(), etc.
>
> Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
>
> Binary and text files are both okay, text would be preferred in general unless there's a significant speed boost from something binary.
>
> thanks

1) Why don't you time them with the timeit module?
http://docs.python.org/lib/module-timeit.html

Results will vary with your specific data and your hardware speed,
but with a lot of data the latter (in practice, most likely disk I/O)
is going to be the bottleneck. A compact binary format will help
alleviate this because there are fewer bytes to read and write.
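
As a rough illustration of the timeit suggestion, here is a minimal sketch that compares a one-item-per-line text file with pickle for a list of integers. It targets Python 3 (the rest of this thread is Python 2), and the helper names (write_text, read_text, write_pickle, read_pickle, benchmark) are made up for the example. Real numbers will depend on your data and hardware, so treat it as a starting point only.

```python
import os
import pickle
import tempfile
import timeit

def write_text(items, path):
    # One integer per line: human-readable, but must be parsed on read.
    with open(path, "w") as f:
        f.write("\n".join(map(str, items)))

def read_text(path):
    with open(path) as f:
        return [int(line) for line in f]

def write_pickle(items, path):
    with open(path, "wb") as f:
        pickle.dump(items, f, protocol=pickle.HIGHEST_PROTOCOL)

def read_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def benchmark(n=100_000, loops=5):
    """Return {format: (bytes on disk, seconds per read)} for a list of n ints."""
    items = list(range(n))
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, writer, reader in [
            ("text", write_text, read_text),
            ("pickle", write_pickle, read_pickle),
        ]:
            path = os.path.join(tmp, "data." + name)
            writer(items, path)
            assert reader(path) == items  # sanity-check the round trip
            seconds = timeit.timeit(lambda: reader(path), number=loops) / loops
            results[name] = (os.path.getsize(path), seconds)
    return results

if __name__ == "__main__":
    for name, (size, seconds) in benchmark().items():
        print("%-6s %8d bytes  %.6f s per read" % (name, size, seconds))
```

The same loop can be extended with any other format you want to compare; the round-trip assert is there so that a fast but lossy reader does not win by accident.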

If you're reading a lot of data into memory, you might have to deal with
your OS swap/virtual memory.

2) "Best" depends on what your data is and what you're doing with it.

Are you reinventing a flat-file database? There are better solutions for
databases.

If you're just reformatting data to pass to another program, say, for
scientific computation, portability may be more of an issue. The number
crunching on the resulting data may also take so much longer that the
time spent writing/reading it becomes insignificant.
 
Paddy

> I've looked into pickle, dump, load, save, readlines(), etc

I've used the following sometimes:

from pprint import pprint as pp
print "data = \\"
pp(data)

With the output redirected to a file, that created a Python file that
could be read as a module, but there are limitations: it only works for
data whose __repr__ is valid Python source.
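
For anyone on Python 3, a minimal sketch of the same idea follows. Note one swapped-in technique: instead of importing the generated file as a module, it reads the literal back with ast.literal_eval, which avoids executing the file. The function names save_as_module and load_from_module are hypothetical, chosen just for this example.

```python
import ast
import os
import tempfile
from pprint import pformat

def save_as_module(data, path):
    # Write "data = <pretty-printed repr>" so the file is valid Python source.
    with open(path, "w") as f:
        f.write("data = " + pformat(data) + "\n")

def load_from_module(path):
    # Parse the literal after "data = " without executing the file.
    with open(path) as f:
        source = f.read()
    prefix = "data = "
    if not source.startswith(prefix):
        raise ValueError("unexpected file format")
    return ast.literal_eval(source[len(prefix):])

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "saved_data.py")
        original = {"name": "example", "values": list(range(20))}
        save_as_module(original, path)
        print(load_from_module(path) == original)
```

The repr limitation still applies: ast.literal_eval only accepts plain literals (numbers, strings, tuples, lists, dicts, sets, booleans, None), so instances of your own classes will not round-trip this way.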

- Paddy.
P.S. I never timed it - it was fast enough, and the data was readable.
 
Nick Craig-Wood

Abrahams said:
> I've looked into pickle, dump, load, save, readlines(), etc.
>
> Which is the best method? Fastest? My lists tend to be around a thousand to a million items.
>
> Binary and text files are both okay, text would be preferred in
> general unless there's a significant speed boost from something
> binary.

You could try the marshal module, which is very fast, lightweight and
built in.

http://www.python.org/doc/current/lib/module-marshal.html

It produces a binary format though, and it will only dump "simple"
objects - see the page above. I believe it is what Python uses
internally to make .pyc files from .py files.

------------------------------------------------------------
#!/usr/bin/python

import os
from marshal import dump, load
from timeit import Timer

def write(N, file_name = "z.marshal"):
    # Dump a list of N integers to a binary file and report its size
    L = range(N)
    out = open(file_name, "wb")
    dump(L, out)
    out.close()
    print "Written %d bytes for list size %d" % (os.path.getsize(file_name), N)

def read(N):
    # Load the list back and check that nothing was lost
    inp = open("z.marshal", "rb")
    L = load(inp)
    inp.close()
    assert len(L) == N

# Time reading lists of 1, 10, ..., 1000000 items (average over 10 loops)
for log_N in range(7):
    N = 10**log_N
    loops = 10
    write(N)
    print "Read back %d items in" % N, Timer("read(%d)" % N, "from __main__ import read").repeat(1, loops)[0]/loops, "s"
------------------------------------------------------------

Produces

$ ./test-marshal.py
Written 10 bytes for list size 1
Read back 1 items in 4.14133071899e-05 s
Written 55 bytes for list size 10
Read back 10 items in 4.31060791016e-05 s
Written 505 bytes for list size 100
Read back 100 items in 8.23020935059e-05 s
Written 5005 bytes for list size 1000
Read back 1000 items in 0.000352478027344 s
Written 50005 bytes for list size 10000
Read back 10000 items in 0.00165479183197 s
Written 500005 bytes for list size 100000
Read back 100000 items in 0.0175776958466 s
Written 5000005 bytes for list size 1000000
Read back 1000000 items in 0.175704598427 s
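
To make the "simple objects" restriction mentioned above concrete, here is a small Python 3 sketch. The Point class and the can_marshal helper are hypothetical, introduced only for illustration; the only library calls used are marshal.dumps and marshal.loads.

```python
import marshal

class Point:
    # A user-defined class: not one of the built-in types marshal supports.
    def __init__(self, x, y):
        self.x = x
        self.y = y

def can_marshal(obj):
    # marshal raises ValueError for objects it cannot serialize.
    try:
        marshal.dumps(obj)
        return True
    except ValueError:
        return False

if __name__ == "__main__":
    print(can_marshal([1, 2.0, "three", (4, 5), None]))  # built-in types only
    print(can_marshal(Point(1, 2)))                      # class instance
    print(can_marshal(range(3)))                         # Python 3 range object
```

One consequence if you port the benchmark above to Python 3: range(N) is no longer a list there, so it would need to be wrapped as list(range(N)) before being passed to dump.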
 
