repeatedly open file or save entire file to memory?

Jason Lillywhite

I want to make sure I do what is most efficient when dealing with
multiple and potentially large files.

I need to take row(n) and row(n+1) from a file and use the data to do
things in other parts of my program. Then the program will iterate by
incrementing n. I may have up to 30 files, each having 50,000 rows.

My question is should I read row(n) and row(n+1), accessing the file
again and again on each iteration of the main program? Or should I just
read the whole file into memory (say, an array) then just grab items
from the array by index in the main program?
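For concreteness, the second option (load everything, then index) could look like the sketch below. The helper name is hypothetical and not from the thread. Note that 30 files of 50,000 rows is 1.5 million rows in total; whether that fits comfortably in memory depends on the row length, which is not stated here.

```ruby
# Hypothetical helper (not from the thread): load one file completely,
# then pair row n with row n+1 by array index.
def adjacent_rows_in_memory(path)
  rows = File.readlines(path, chomp: true) # whole file as an Array of Strings
  (0...rows.size - 1).map { |n| [rows[n], rows[n + 1]] }
end
```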
 
Robert Klemme

2009/9/17 Jason Lillywhite said:
I want to make sure I do what is most efficient when dealing with
multiple and potentially large files.

I need to take row(n) and row(n+1) from a file and use the data to do
things in other parts of my program. Then the program will iterate by
incrementing n. I may have up to 30 files, each having 50,000 rows.

My question is should I read row(n) and row(n+1), accessing the file
again and again on each iteration of the main program? Or should I just
read the whole file into memory (say, an array) then just grab items
from the array by index in the main program?

Other schemes can be devised too:

1. read the file once remembering indexes for every file and row
(IO#tell) and then access rows via IO#seek

2. since you are incrementing n, read row n, remember pos, read row
n + 1, next time round #seek to position and continue reading

3. as 2 but remember line n+1 so you do not have to read it again

4. if the access pattern to files is not round robin but different,
you might get better results by storing more information in memory
for least recently accessed files

5. read files in chunks of x lines and remember them in memory thus
reducing file accesses

...
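Scheme 1 could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `build_line_index` and `rows_at` are made up, row numbers are 0-based, and the file is opened in binary mode so that byte offsets are reliable across platforms.

```ruby
# Hypothetical helper: one pass over the file, recording the byte
# offset (IO#tell) at which every line starts.
def build_line_index(path)
  offsets = []
  File.open(path, "rb") do |f|
    until f.eof?
      offsets << f.tell
      f.gets
    end
  end
  offsets
end

# Hypothetical helper: jump to row n with IO#seek and read rows n and n+1
# from an already open IO. Returns nil for row n+1 if row n is the last
# row; raises IndexError if n is past the end of the index.
def rows_at(io, offsets, n)
  io.seek(offsets.fetch(n))
  [io.gets, io.gets].map { |line| line&.chomp }
end

# Usage (file name is a placeholder):
#   offsets = build_line_index("data.txt")
#   File.open("data.txt", "rb") { |f| p rows_at(f, offsets, 10) }
```

Only the offsets are kept in memory (one Integer per row), and the file handle can stay open between iterations instead of being reopened each time.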

It really depends on what you do with those files, how your access
patterns are etc.
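If the access really is strictly sequential (n, then n + 1, and so on), the idea behind schemes 2 and 3 can be approximated without any seeking at all: stream the file once and keep a sliding window of two rows. A minimal sketch, assuming each file is processed front to back; the method name is hypothetical.

```ruby
# Hypothetical helper: yields [row n, row n+1] for every n, reading the
# file line by line so only two rows are held in memory at a time.
def each_adjacent_pair(path, &block)
  File.foreach(path, chomp: true).each_cons(2, &block)
end

# Usage (file name is a placeholder):
#   each_adjacent_pair("data.txt") { |row, next_row| puts "#{row} -> #{next_row}" }
```

Each row is read from disk exactly once, so this avoids both the repeated file access and the full in-memory copy. It does not help if the program needs to interleave rows from different files in lockstep or jump backwards.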

Kind regards

robert
 
Axel Etzold

Jason,


you can use ruby-prof for profiling of your code. It's available as
a gem.

Best regards,

Axel
 
Robert Klemme

2009/9/17 Axel Etzold said:
Jason,

you can use ruby-prof for profiling of your code. It's available as
a gem.

I consider Jason's question as a design level question. That's
nothing where a profiler can really help. Of course you can code up
alternatives and measure performance. But it can only tell you which
version of several is fastest - it cannot tell you how you should
change your design to improve it.

In this case performance bottlenecks are rather in the area of disk IO
and all a profiler can tell you is how much of your time you spend in
IO - but not how to minimize that.
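Coding up alternatives and measuring them, as mentioned above, could be done with Ruby's standard Benchmark module. The sketch below is an assumption-laden illustration: the method name is made up, and it only times two candidate designs (read everything vs. stream line by line) on one file. Because the operating system usually caches a file after the first read, the second measurement may look better than it would on a cold disk.

```ruby
require "benchmark"

# Hypothetical helper: times two ways of visiting every (row n, row n+1)
# pair in one file and returns [timings_in_seconds, pair_counts].
def time_strategies(path)
  counts = {}
  timings = {
    # Load the whole file into an Array, then walk adjacent pairs.
    in_memory: Benchmark.realtime do
      counts[:in_memory] = File.readlines(path, chomp: true).each_cons(2).count
    end,
    # Stream the file line by line, holding two rows at a time.
    streaming: Benchmark.realtime do
      counts[:streaming] = File.foreach(path, chomp: true).each_cons(2).count
    end
  }
  [timings, counts]
end

# Usage (file name is a placeholder):
#   timings, counts = time_strategies("data.txt")
#   timings.each { |name, secs| puts format("%-10s %.4fs", name, secs) }
```

Such numbers only rank the versions that were actually written; they say nothing about a design that was not tried.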

Kind regards

robert
 
Fabian Streitel

You could put the data into a database,
which should be fast enough
and still very easy to use, even if
your lookup pattern changes
in the future.

Greetz!
 
Fabian Streitel

Robert Klemme said:
In this case performance bottlenecks are rather in the area of disk IO
and all a profiler can tell you is how much of your time you spend in
IO - but not how to minimize that.

I agree, although that argument doesn't make much sense.

A profiler can never tell you how to minimize anything, it can
only show you where you should look for optimizations.
In this case of course that's futile, since we already know where
to optimize: the IO

Greetz!
 
Axel Etzold

-------- Original Message --------
Date: Fri, 18 Sep 2009 15:30:38 +0900
From: Robert Klemme <shortcutter@googlema il.com>
To: (e-mail address removed)
Subject: Re: repeatedly open file or save entire file to memory?

Dear Robert,
I consider Jason's question as a design level question. That's
nothing where a profiler can really help. Of course you can code up
alternatives and measure performance. But it can only tell you which
version of several is fastest - it cannot tell you how you should
change your design to improve it.

I agree with you. I proposed this precisely to see how long several
alternatives take. One always has to think about design oneself :)

Best regards,

Axel
 
Jason Lillywhite

Fabian said:
You could put the data into a database,
which should be fast enough
and still very easy to use, even if
your lookup pattern changes
in the future.

Greetz!

That is a good idea. Do you recommend ruby DBI or ActiveRecord? I need
ease of use and simplicity. My interface is the command line.
 
Fabian Streitel

Actually, I like DataMapper the most. It's very intuitive.
You should check it out: http://datamapper.org/doku.php

I definitely like the way DataMapper handles things better
than ActiveRecord, but that's a matter of taste.

Greetz!
 
