repeatedly open file or save entire file to memory?

Discussion in 'Ruby' started by Jason Lillywhite, Sep 17, 2009.

  1. I want to make sure I do what is most efficient when dealing with
    multiple and potentially large files.

    I need to take row(n) and row(n+1) from a file and use the data to do
    things in other parts of my program. Then the program will iterate by
    incrementing n. I may have up to 30 files, each having 50,000 rows.

    My question is should I read row(n) and row(n+1), accessing the file
    again and again on each iteration of the main program? Or should I just
    read the whole file into memory (say, an array) then just grab items
    from the array by index in the main program?
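    The two options can be sketched in plain Ruby. This is only an
    illustration: the file name and contents are made up, and the block
    writes a tiny sample file so it can run on its own.

```ruby
# Build a small sample file so the sketch is self-contained.
File.write("data.csv", (1..5).map { |i| "row#{i}" }.join("\n") + "\n")

# Option A: slurp the whole file into an array, then index into it.
rows = File.readlines("data.csv", chomp: true)
pairs_a = rows.each_cons(2).to_a

# Option B: stream line by line; only two rows are in memory at a time.
pairs_b = File.foreach("data.csv", chomp: true).each_cons(2).to_a

pairs_a == pairs_b  # both approaches yield the same [row(n), row(n+1)] pairs
```

    At 30 files x 50,000 short rows, option A is usually still only a few
    megabytes per file, so slurping is often fine; option B keeps memory
    flat regardless of file size.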
    --
    Posted via http://www.ruby-forum.com/.
     
    Jason Lillywhite, Sep 17, 2009
    #1

  2. 2009/9/17 Jason Lillywhite <>:
    > I want to make sure I do what is most efficient when dealing with
    > multiple and potentially large files.
    >
    > I need to take row(n) and row(n+1) from a file and use the data to do
    > things in other parts of my program. Then the program will iterate by
    > incrementing n. I may have up to 30 files, each having 50,000 rows.
    >
    > My question is should I read row(n) and row(n+1), accessing the file
    > again and again on each iteration of the main program? Or should I just
    > read the whole file into memory (say, an array) then just grab items
    > from the array by index in the main program?


    Other schemes can be devised too:

    1. read the file once remembering indexes for every file and row
    (IO#tell) and then access rows via IO#seek

    2. since you are incrementing n, read row n, remember the position
    (IO#tell), read row n + 1; next time round, #seek to that position and
    continue reading

    3. as 2 but remember line n+1 so you do not have to read it again

    4. if the access pattern to files is not round robin but different,
    you might get better results by storing more information in memory
    for least recently accessed files

    5. read files in chunks of x lines and remember them in memory thus
    reducing file accesses

    ...

    It really depends on what you do with those files, what your access
    patterns are, etc.
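    Scheme 2, for instance, might look like the following. This is a
    minimal sketch: the sample file and the loop body are made up, and the
    rest of the program would do something useful with each pair of rows.

```ruby
# Create a sample file so the sketch runs on its own.
File.write("sample.txt", (1..4).map { |i| "line#{i}" }.join("\n") + "\n")

pairs = []
File.open("sample.txt") do |io|
  pos = 0
  loop do
    io.seek(pos)              # jump back to the start of row n
    row_n  = io.gets&.chomp
    pos    = io.tell          # remember where row n+1 begins
    row_n1 = io.gets&.chomp
    break if row_n1.nil?      # no row n+1 left: done
    pairs << [row_n, row_n1]  # hand both rows to the rest of the program
  end
end
pairs  # => [["line1","line2"], ["line2","line3"], ["line3","line4"]]
```

    Note that row n+1 is read twice per iteration here, which is exactly
    the redundancy scheme 3 removes by remembering line n+1 in memory.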

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 17, 2009
    #2

  3. Jason,


    > > I want to make sure I do what is most efficient when dealing with
    > > multiple and potentially large files.


    you can use ruby-prof to profile your code. It's available as
    a gem.

    Best regards,

    Axel
    --
     
    Axel Etzold, Sep 17, 2009
    #3
  4. 2009/9/17 Axel Etzold <>:
    > Jason,


    >> > I want to make sure I do what is most efficient when dealing with
    >> > multiple and potentially large files.

    >
    > you can use ruby-prof for profiling of your code. It's available as
    > a gem.


    I consider Jason's question a design-level question. That's not
    something a profiler can really help with. Of course you can code up
    alternatives and measure performance, but that can only tell you which
    of several versions is fastest - it cannot tell you how you should
    change your design to improve it.

    In this case performance bottlenecks are rather in the area of disk IO
    and all a profiler can tell you is how much of your time you spend in
    IO - but not how to minimize that.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Sep 18, 2009
    #4
  5. You could put the data into a database, which should be performant
    enough and still very easy to use, even if your lookup pattern
    changes in the future.

    Greetz!


    > > > I want to make sure I do what is most efficient when dealing with
    > > > multiple and potentially large files.

     
    Fabian Streitel, Sep 18, 2009
    #5
  6. > In this case performance bottlenecks are rather in the area of disk IO
    > and all a profiler can tell you is how much of your time you spend in
    > IO - but not how to minimize that.

    I agree, although that argument doesn't quite work: a profiler can
    never tell you how to minimize anything; it can only show you where to
    look for optimizations. In this case, of course, that's futile, since
    we already know where to optimize: the IO.

    Greetz!
     
    Fabian Streitel, Sep 18, 2009
    #6
  7. -------- Original Message --------
    > Date: Fri, 18 Sep 2009 15:30:38 +0900
    > From: Robert Klemme <shortcutter@googlemail.com>
    > To:
    > Subject: Re: repeatedly open file or save entire file to memory?


    > 2009/9/17 Axel Etzold <>:
    > > Jason,

    >
    > >> > I want to make sure I do what is most efficient when dealing with
    > >> > multiple and potentially large files.

    > >
    > > you can use ruby-prof for profiling of your code. It's available as
    > > a gem.


    Dear Robert,

    >
    > I consider Jason's question a design-level question. That's not
    > something a profiler can really help with. Of course you can code up
    > alternatives and measure performance, but that can only tell you which
    > of several versions is fastest - it cannot tell you how you should
    > change your design to improve it.


    I agree with you. I proposed this precisely to see how long several
    alternatives take. One always has to think about design oneself :)

    Best regards,

    Axel
    --
     
    Axel Etzold, Sep 18, 2009
    #7
  8. Fabian Streitel wrote:
    > You could put the data into a database,
    > which should be performant enough
    > and still very easy to use, even when
    > your lookup pattern should change
    > in the future.
    >
    > Greetz!


    That is a good idea. Do you recommend ruby DBI or ActiveRecord? I need
    ease of use and simplicity. My interface is the command line.
    --
    Posted via http://www.ruby-forum.com/.
     
    Jason Lillywhite, Sep 18, 2009
    #8
  9. Actually I like DataMapper the most. It's very intuitive.
    You should check it out: http://datamapper.org/doku.php

    I definitely like the way datamapper handles things better
    than ActiveRecord, but that's a matter of taste.

    Greetz!


    > That is a good idea. Do you recommend ruby DBI or ActiveRecord? I need
    > ease of use and simplicity. My interface is the command line.
     
    Fabian Streitel, Sep 18, 2009
    #9
