performance question: dictionary or list, float or string?

bkamrani

Hi Python gurus!
I'm going to read in an ASCII file containing float numbers in rows
and columns (say 10 columns and 500000 rows) for further numerical
processing. Which format is best to store them in, e.g. a dictionary,
a list, or a NumPy array, when it comes to performance?
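A minimal stdlib sketch of one option, assuming whitespace-separated numeric columns: parse each line once and append the values to one `array.array('d')` per column, which stores raw C doubles instead of separate Python float objects. The function name `read_columns` and the sample data are hypothetical. For heavy numerical work a NumPy array (for example via `numpy.loadtxt`) is the usual choice, but it is not shown here.

```python
import io
from array import array


def read_columns(lines, ncols):
    """Parse whitespace-separated floats into one array('d') per column."""
    columns = [array('d') for _ in range(ncols)]
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != ncols:
            raise ValueError("expected %d columns, got %d" % (ncols, len(fields)))
        for col, text in zip(columns, fields):
            col.append(float(text))
    return columns


# Hypothetical three-column sample standing in for the real file.
sample = io.StringIO("1.0 2.0 3.0\n4.0 5.0 6.0\n")
cols = read_columns(sample, 3)
print([list(c) for c in cols])  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

Storing by column fits the later per-column sums; storing by row would suit code that processes one record at a time.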

Would it be beneficial to convert all strings to floats directly after
reading, or does it not matter if I store them as strings and convert
them to floats later, when it comes to calculation?

Thank you!
/Ben
 
bkamrani

I forgot to mention that I did a simple timeit test, which doesn't show
a significant runtime difference: 3.5 sec for the dictionary case and
3.48 sec for the list case.


def read_as_dictionary():
    fil = open('myDataFile', 'r')
    forces = {}
    for region in range(25):
        forces[region] = {}

    for step in range(20000):
        for region in range(25):
            line = fil.next(); spl = line.split()
            forces[region] [step] = spl

def read_as_list():
    fil = open('myDataFile.txt', 'r')
    forces = []
    for region in range(25):
        forces.append([])

    for step in range(20000):
        for region in range(25):
            line = fil.next(); spl = line.split()
            forces[region].append(spl)
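A self-contained timeit harness for this kind of comparison might look like the sketch below. It assumes Python 3 (`next(f)` replaces `fil.next()`) and uses a hypothetical in-memory file instead of `myDataFile`, with far fewer steps so it runs quickly; the timings it prints depend on the machine and are not reproduced here. Both readers only store strings, so a test like this measures reading and splitting, not the later calculations.

```python
import io
import timeit

REGIONS = 25   # as in the original post
STEPS = 200    # hypothetical; far fewer than 20000 so the sketch runs quickly

# Hypothetical in-memory stand-in for 'myDataFile': one line of 10 numbers per
# (step, region) pair, in the same order the original loops read them.
TEXT = "".join(" ".join(["%d.0" % i] * 10) + "\n" for i in range(REGIONS * STEPS))


def read_as_dictionary(f):
    forces = {region: {} for region in range(REGIONS)}
    for step in range(STEPS):
        for region in range(REGIONS):
            forces[region][step] = next(f).split()
    return forces


def read_as_list(f):
    forces = [[] for _ in range(REGIONS)]
    for step in range(STEPS):
        for region in range(REGIONS):
            forces[region].append(next(f).split())
    return forces


for reader in (read_as_dictionary, read_as_list):
    # A fresh StringIO per call, so every run reads the whole data set.
    seconds = timeit.timeit(lambda: reader(io.StringIO(TEXT)), number=20)
    print("%s: %.3f s" % (reader.__name__, seconds))
```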

Cheers,
/Ben
 
Steven D'Aprano

> Hi Python gurus!
> I'm going to read in an ASCII file containing float numbers in rows and
> columns (say 10 columns and 500000 rows) for further numerical processing.
> Which format is best to store them in, e.g. a dictionary, a list, or a
> NumPy array, when it comes to performance?

That depends on:

(1) What do you mean by performance? Speed or memory use?

(2) Do you care about the performance of reading the data in, or the
performance of working with the data later, or both?

(3) What do you intend to do with the numbers later?

> Would it be beneficial to convert all strings to floats directly after
> reading, or does it not matter if I store them as strings and convert
> them to floats later, when it comes to calculation?

That depends on what you intend to do with them. Since you're doing
numerical processing, it's probably a good idea to convert them to
numbers rather than strings.
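A minimal sketch of converting at read time, assuming whitespace-separated values; `parse_rows` and the sample data are hypothetical. Each string is converted exactly once, whereas keeping strings would mean calling `float()` again in every calculation that touches them.

```python
import io


def parse_rows(lines):
    """Convert each non-blank whitespace-separated line to a list of floats, once."""
    return [[float(field) for field in line.split()] for line in lines if line.strip()]


# Hypothetical sample data.
rows = parse_rows(io.StringIO("1.5 2.5\n3.0 4.0\n"))

# Later calculations work on floats directly; no repeated float() calls.
column_sums = [sum(col) for col in zip(*rows)]
print(column_sums)  # [4.5, 6.5]
```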
 
Matimus

There really isn't enough information to recommend a particular
direction. A dictionary doesn't seem appropriate for this information,
though. Also, you are hard-coding the step range to 20000. Is that the
number of lines in the file? That isn't really a safe way to do it.
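One way to avoid hard-coding the step count, sketched under the assumption (taken from the original loops) that consecutive lines cycle through the regions: read all lines, check that the total is a multiple of the region count, and derive the number of steps from it. `split_by_region` is a hypothetical name, and blank lines are skipped in this sketch.

```python
import io


def split_by_region(lines, regions=25):
    """Distribute lines round-robin over `regions` lists and report the step count."""
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) % regions != 0:
        raise ValueError("%d data lines is not a multiple of %d regions"
                         % (len(rows), regions))
    # Line i belongs to region i % regions, exactly as in the nested loops.
    forces = [rows[r::regions] for r in range(regions)]
    return forces, len(rows) // regions


# Hypothetical data: 2 regions, 3 steps.
forces, steps = split_by_region(io.StringIO("a\nb\nc\nd\ne\nf\n"), regions=2)
print(steps)      # 3
print(forces[0])  # [['a'], ['c'], ['e']]
```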

# this is just bad style in python:
line = fil.next(); spl = line.split()
# better written
spl = fil.next().split()
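In Python 3, file objects no longer have a `.next()` method; the built-in `next(fil)` does the same job. A minimal sketch, using a hypothetical in-memory file, also shows the usual alternative of letting a `for` loop drive the iteration:

```python
import io

fil = io.StringIO("1.0 2.0\n3.0 4.0\n5.0 6.0\n")

# Pull a single line explicitly and split it in one expression.
spl = next(fil).split()
print(spl)  # ['1.0', '2.0']

# Or let a for loop fetch the lines; it stops cleanly at end of file.
remaining = [line.split() for line in fil]
print(remaining)  # [['3.0', '4.0'], ['5.0', '6.0']]
```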

I would just do it this way:

def read_as_list(data, regions=25, maxlines=20000):
    # If data is a filename, open the file. If it is a file
    # object or any sequence of 'lines' it should just work.

    file_opened = False
    if isinstance(data, basestring):
        data = open(data, 'r')
        file_opened = True

    forces = [[] for _ in xrange(regions)]
    try:
        for i, line in data:
            if i == maxlines:
                break
            forces[i % 25].append(line.split())
    finally:
        if file_opened:
            f.close()
    return forces


Matt
 
bkamrani

Thanks for your questions. Here are some answers.

> That depends on:
>
> (1) What do you mean by performance? Speed or memory use?

I think speed is more important in this case, as the volume of data
is not large.

> (2) Do you care about the performance of reading the data in, or the
> performance of working with the data later, or both?

Reading is pretty fast, in the range of a few seconds; I meant the
performance of working with the data.

> (3) What do you intend to do with the numbers later?

Normal numerical calculations such as sums and multiplication (but not
matrix multiplication).
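For plain sums and element-wise products, a sketch of working on lists of floats, assuming the data has already been converted and grouped by column; the two sample columns are hypothetical. `math.fsum` is the stdlib's accurate floating-point sum, which limits rounding error over long columns. For this kind of work at 500000 rows, NumPy arrays would typically be much faster than pure-Python loops.

```python
import math

# Hypothetical columns of already-converted floats.
col_a = [1.5, 2.0, 4.0]
col_b = [2.0, 0.5, 0.25]

# Column total; math.fsum tracks partial sums to limit rounding error.
total_a = math.fsum(col_a)

# Element-wise multiplication of two columns of equal length.
products = [a * b for a, b in zip(col_a, col_b)]

print(total_a)   # 7.5
print(products)  # [3.0, 1.0, 1.0]
```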

Thanks
/Ben
 
bkamrani

Matt, many thanks for your comments!
Even though it was not a direct answer to my questions,
I like your coding style very much and I think you have a good point.

As for the number of lines in the file: I get that information from
elsewhere in advance, so I thought it could be hard-coded.

BTW, could you recommend a book or some notes on the points you
mentioned, so that I can learn more like that?

Thanks,
/Ben

I forgot to mention that I did a simple timeit test which doesn't
show
significant runtime difference 3.5 sec for dictionary case and 3.48
for
list case.
def read_as_dictionary():
    fil = open('myDataFile', 'r')
    forces = {}
    for region in range(25):
        forces[region] = {}
    for step in range(20000):
        for region in range(25):
            line = fil.next(); spl = line.split()
            forces[region] [step] = spl
def read_as_list():
    fil = open('myDataFile.txt', 'r')
    forces = []
    for region in range(25):
        forces.append([])
    for step in range(20000):
        for region in range(25):
            line = fil.next(); spl = line.split()
            forces[region].append(spl)
Cheers,
/Ben

There really isn't enough information to recommend a particular
direction. A dictionary doesn't seem appropriate for
this information though. Also, you are hard coding the step range to
20000. Is that the number of lines in the file? That isn't really a
safe way to do it.

# this is just bad style in python:
line = fil.next(); spl = line.split()
# better written
spl = fil.next().split()

I would just do it this way:

def read_as_list(data, regions=25, maxlines=20000):
    # If data is a filename, open the file. If it is a file
    # object or any sequence of 'lines' it should just work.

    file_opened = False
    if isinstance(data, basestring):
        data = open(data, 'r')
        file_opened = True

    forces = [[] for _ in xrange(regions)]
    try:
        for i, line in data:
            if i == maxlines:
                break
            forces[i % 25].append(line.split())
    finally:
        if file_opened:
            f.close()
    return forces

Matt
 
bkamrani

About the piece of code you posted, there is something I don't
understand:

for i, line in data:

where data is a file object. Is it legal to write that?
I believe it results in "too many values to unpack", or am I missing
something?
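The suspicion is easy to check with a small sketch. Iterating over a file yields one string per line, and `for i, line in data` tries to unpack each string into two names, which only succeeds when the string has exactly two characters. The helper `try_unpack` and the in-memory lines are hypothetical.

```python
import io


def try_unpack(text):
    """Run the questioned loop over `text`; return the pairs or the error message."""
    pairs = []
    try:
        for i, line in io.StringIO(text):
            pairs.append((i, line))
    except ValueError as exc:
        return str(exc)
    return pairs


print(try_unpack("1.0 2.0\n"))  # a "too many values to unpack" error message
print(try_unpack("7\n"))        # [('7', '\n')]: a two-character line unpacks silently
```

The second case shows why the bug can hide in tests with tiny sample lines.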

/Ben

Matt, really thanks for your comments!
Even thogh it was not a direct answer to my questions,
I like your coding style very much and I think you have a good point.

About the number of line in the file, because I get that info from
another
in advance. Therefore I thought it could be hard coded.

BTW, could you recommend a book or a note on points you have mentioned
so that I can learn more like that?

Thanks,
/Ben
# this is just bad style in python:
line = fil.next(); spl = line.split()
# better written
spl = fil.next().split()
I would just do it this way:
def read_as_list(data, regions=25, maxlines=20000):
    # If data is a filename, open the file. If it is a file
    # object or any sequence of 'lines' it should just work.
    file_opened = False
    if isinstance(data, basestring):
        data = open(data, 'r')
        file_opened = True
    forces = [[] for _ in xrange(regions)]
    try:
        for i, line in data:
            if i == maxlines:
                break
            forces[i % 25].append(line.split())
    finally:
        if file_opened:
            f.close()
    return forces
 
alex23

> About the piece of code you posted, there is something I don't
> understand.
>
>         for i, line in data:
>
> where data is a file object. Is it legal to write that?
> I believe it results in "too many values to unpack", or am I missing
> something?

From the context, my guess is Matimus intended to write:

for i, line in enumerate(data):
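Putting the fixes together, a sketch of Matt's function in Python 3, assuming lines cycle through the regions as in Ben's code. Besides `enumerate`, it uses `regions` instead of the literal 25 in the modulo, closes `data` (the posted `f.close()` refers to a name that is never defined), and uses `str` and `range` because `basestring` and `xrange` no longer exist in Python 3.

```python
import io


def read_as_list(data, regions=25, maxlines=20000):
    """Split lines round-robin into `regions` lists of fields.

    `data` may be a filename or any iterable of lines (e.g. an open file).
    Reading stops after `maxlines` lines or at end of input.
    """
    file_opened = False
    if isinstance(data, str):
        data = open(data, 'r')
        file_opened = True

    forces = [[] for _ in range(regions)]
    try:
        for i, line in enumerate(data):
            if i == maxlines:
                break
            forces[i % regions].append(line.split())
    finally:
        # Only close files this function opened itself.
        if file_opened:
            data.close()
    return forces


# Hypothetical usage with in-memory lines instead of a real file.
forces = read_as_list(io.StringIO("1 2\n3 4\n5 6\n7 8\n"), regions=2)
print(forces)  # [[['1', '2'], ['5', '6']], [['3', '4'], ['7', '8']]]
```

A `with` block is the usual modern way to close files, but the explicit flag fits here because the function must leave caller-supplied file objects open.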
 
