Reading a whole file into memory: parsing a C-like file efficiently

Discussion in 'Perl Misc' started by n_macpherson@sky.com, Jun 17, 2008.

  1. Guest

    I know there are a number of FAQs which discourage reading whole
    files into memory rather than processing them line by line.

    However my problem is as follows.

    I am reading a file written in a language that looks like (but isn't)
    C. I need to insert comments / documentation at various points in the
    file. However, sometimes I don't know what I want to insert until I get
    well past the current line - for example:


    for(i=0;i<64;i++)
    {
    // lots of code
    }

    Say my opening brace is on line 95 and my closing brace 195 I want to
    insert a comment

    // for loop ends line 195

    at line 94 (i.e. immediately above the opening brace). The problem is
    that, processing line by line, I don't know until I get to line 195 what
    I have to change at line 94, so I have to store lines 94 to 195 in
    memory anyway.
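    A minimal Perl sketch of why holding the whole file helps: with every
    line in an array, a stack of opening-brace positions can be matched
    against closing braces, and the comment placed above the opening brace
    once its end is known. `annotate_block_ends` is a hypothetical helper; it
    assumes chomped lines and braces that never occur inside strings or
    comments, and it reports the closing brace's line number in the
    original, unannotated file.

```perl
use strict;
use warnings;

# Hypothetical helper: match braces with a stack, then insert
# "// block ends line N" immediately above each opening-brace line.
# N is the closing brace's line number in the ORIGINAL input.
sub annotate_block_ends {
    my @lines = @_;
    my (@stack, @inserts);
    for my $i (0 .. $#lines) {
        # Visit every brace on the line, left to right.
        while ($lines[$i] =~ /([{}])/g) {
            if ($1 eq '{') {
                push @stack, $i;                  # line index of the opening brace
            }
            elsif (@stack) {
                my $open = pop @stack;            # matching opening brace
                push @inserts, [ $open, '// block ends line ' . ($i + 1) ];
            }
        }
    }
    # Insert from the bottom up so the indices still to be used stay valid.
    for my $ins (sort { $b->[0] <=> $a->[0] } @inserts) {
        splice @lines, $ins->[0], 0, $ins->[1];
    }
    return @lines;
}

my @demo = annotate_block_ends(split /\n/, <<'END');
for(i=0;i<64;i++)
{
    x++;
}
END
print "$_\n" for @demo;
```

    Because the comment for a block is only known when its closing brace
    is seen, all insertions are collected first and applied afterwards.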

    Similarly, if I read a function header, I want to insert some
    documentation before the function header,
    so I don't believe processing the file line by line is the best
    solution here. As I will be inserting extra lines into the middle of
    an array, I think I am going to need a module to do this.
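    The function-header case can work on the same in-memory array. This
    sketch assumes, purely for illustration, that a header is a single line
    of the form `type name(args)` with no trailing semicolon; the helper
    `add_function_docs` and its comment template are hypothetical and would
    need adjusting to the real language's grammar.

```perl
use strict;
use warnings;

# Words that can look like "type name(" but are not function headers.
my %KEYWORD = map { $_ => 1 } qw(if while for switch return);

# Hypothetical helper: insert a documentation template above every line
# that looks like "type name(args)" (assumed header shape, no semicolon).
sub add_function_docs {
    my @lines = @_;
    # Walk backwards so inserting at $i never shifts a line not yet visited.
    for my $i (reverse 0 .. $#lines) {
        next unless $lines[$i] =~ /^\s*(?:\w+[\s*]+)+(\w+)\s*\([^;]*\)\s*$/;
        my $name = $1;
        next if $KEYWORD{$name};
        splice @lines, $i, 0, "/* $name: TODO describe purpose, inputs, outputs */";
    }
    return @lines;
}

print "$_\n" for add_function_docs('static char *name(int a)', '{', '}');
```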

    Memory won't be an issue - my largest file will only be 6000 lines.

    I've been away from Perl for a while but I seem to remember there was
    a module File::Tie which might be suitable.

    I'd be grateful if anyone has any suggestions - the people who will be
    using this don't normally use Perl, so I'd like to avoid using any
    non-standard modules if possible.

    Thanks

    Niall
     
    , Jun 17, 2008
    #1

  2. wrote:
    >Similarly if I read a function header, I want to insert some
    >documentation before the function header
    >so I don't believe processing the file line by line is the best
    >solution here.


    Based on what you said I would tend to agree.

    Whether that kind of automated annotation is useful is a different story,
    though. I doubt it. Take, for example:

    >Say my opening brace is on line 95 and my closing brace 195 I want to
    >insert a comment
    >// for loop ends line 195


    First of all, proper indentation will provide even better guidance as
    to where the loop ends. And second, a single block spanning 100 lines is
    just plain nuts. A classic rule of thumb used to be that if the code for
    a sub doesn't fit on a VT220 screen, then it is too long and you should
    think about splitting it. There were two reasons for this:
    - you don't want to keep scrolling up and down while thinking about this
    sub
    - anything much longer becomes too complex for a single sub

    Granted, times have changed and typically you can display many more
    lines on modern terminals. But the second reason is still very sound.
    Many people will probably consider 30-50 lines of code to be the maximum
    length of code that can still be easily viewed and recognized without
    too much mental scrolling.

    >As I will be inserting extra lines into the middle of
    >an array I think I am going to need a module to do this.


    Why? Sounds like a perfect job for splice().
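    For reference, `splice` with a length of 0 removes nothing and inserts
    its list before the element at the given offset. A minimal example using
    the loop from the original question:

```perl
use strict;
use warnings;

my @lines = ('for(i=0;i<64;i++)', '{', '    x++;', '}');

# splice ARRAY, OFFSET, LENGTH, LIST: with LENGTH 0 nothing is removed and
# LIST is inserted before the element currently at OFFSET.
splice @lines, 1, 0, '// for loop ends line 4';

print "$_\n" for @lines;
```

    When several lines go into the same array, inserting from the highest
    offset down keeps the offsets computed earlier valid.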

    jue
     
    Jürgen Exner, Jun 17, 2008
    #2

  3. Guest

    >
    > First of all, proper indentation will provide even better guidance as
    > to where the loop ends. And second, a single block spanning 100 lines is
    > just plain nuts. A classic rule of thumb used to be that if the code for
    > a sub doesn't fit on a VT220 screen, then it is too long and you should
    > think about splitting it. There were two reasons for this:
    > - you don't want to keep scrolling up and down while thinking about this
    > sub
    > - anything much longer becomes too complex for a single sub
    >
    > Granted, times have changed and typically you can display many more
    > lines on modern terminals. But the second reason is still very sound.
    > Many people will probably consider 30-50 lines of code to be the maximum
    > length of code that can still be easily viewed and recognized without
    > too much mental scrolling.
    >


    One of the reasons I am writing this script is because we have
    introduced coding standards which specify a maximum of 300 lines per
    function and 70 lines for a while/if/else/for loop and I need to
    highlight places in our scripts where this occurs. I agree 300 lines
    for a function is probably too long but in the language concerned
    anything less than 200 would be completely impractical unfortunately.
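    Checking those limits comes down to the same brace matching. This
    sketch (a hypothetical `long_blocks` helper) assumes the outermost
    braces are function bodies and everything nested is a while/if/else/for
    block, counts length from the opening-brace line to the closing-brace
    line inclusive, and ignores the possibility of braces inside strings or
    comments; the real standard may count lines differently.

```perl
use strict;
use warnings;

# Hypothetical helper: report blocks longer than the allowed length.
# Outermost braces are treated as function bodies (an assumption).
sub long_blocks {
    my ($fn_limit, $block_limit, @lines) = @_;
    my (@stack, @report);
    for my $i (0 .. $#lines) {
        while ($lines[$i] =~ /([{}])/g) {
            if ($1 eq '{') {
                push @stack, $i;                        # where the block opens
            }
            elsif (@stack) {
                my $open = pop @stack;
                my $len  = $i - $open + 1;              # brace to brace, inclusive
                my $kind = @stack ? 'block' : 'function';
                my $max  = @stack ? $block_limit : $fn_limit;
                push @report, [ $kind, $open + 1, $i + 1, $len ] if $len > $max;
            }
        }
    }
    return @report;    # each entry: [kind, first line, last line, length]
}

my @code = ('void f(void)', '{', 'for(;;)', '{', ('    x++;') x 80, '}', '}');
printf "%s at lines %d-%d is %d lines long\n", @$_ for long_blocks(300, 70, @code);
```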

    The indentation is a good point - our developers mostly develop on
    site, which means a variety of editors (UltraEdit, Visual Studio,
    Notepad++, our own proprietary editor) are used. This means
    indentation across scripts becomes inconsistent. One of the functions
    of the script I am writing will be to make sure the indentation
    conforms to the coding standards.
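    A rough sketch of re-indenting by brace depth. It assumes a 4-space
    indent unit (a placeholder for whatever the coding standard specifies)
    and braces that never appear in strings or comments; `reindent` is a
    hypothetical helper, and continuation lines, `case` labels and similar
    constructs are not handled.

```perl
use strict;
use warnings;

my $INDENT = '    ';    # placeholder indent unit; set per the coding standard

# Hypothetical helper: rebuild leading whitespace from brace depth alone.
sub reindent {
    my @lines = @_;
    my $depth = 0;
    my @out;
    for my $line (@lines) {
        (my $text = $line) =~ s/^\s+//;          # drop existing indentation
        my $opens  = () = $text =~ /\{/g;        # count braces on this line
        my $closes = () = $text =~ /\}/g;
        # A line starting with '}' is printed at the enclosing level.
        my $level = $text =~ /^\}/ ? $depth - 1 : $depth;
        $level = 0 if $level < 0;
        push @out, $text eq '' ? '' : ($INDENT x $level) . $text;
        $depth += $opens - $closes;
        $depth = 0 if $depth < 0;
    }
    return @out;
}

print "$_\n" for reindent('int f(void)', '{', 'for(;;)', '{', 'x++;', '}', '}');
```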

    > Why? Sounds like a perfect job for splice().


    Yes - I'd forgotten splice() will allow me to insert into the middle
    of an array (as I said, I have been away from Perl for a little
    while). That should work fine for my purposes.
     
    , Jun 17, 2008
    #3
  4. Guest

    wrote:
    > I know there are a number of FAQs which discourage reading whole
    > files into memory rather than processing them line by line.


    I hope they discourage you from reading whole files into memory
    thoughtlessly and without good reason. It seems like you do have a good
    reason to read them into memory, so go ahead and do it. There is even a
    module, File::Slurp, to facilitate it.
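    File::Slurp is a CPAN module rather than part of the core distribution,
    and the original poster wanted to avoid non-standard modules, so here is
    a core-only equivalent. `read_lines` and `write_lines` are hypothetical
    helper names.

```perl
use strict;
use warnings;

# Read a whole file into an array of lines using only core Perl.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    my @lines = <$fh>;        # list context: one element per line
    close $fh;
    chomp @lines;             # strip the trailing newline from each line
    return @lines;
}

# Write the (possibly modified) lines back out, one per line.
sub write_lines {
    my ($path, @lines) = @_;
    open my $fh, '>', $path or die "Can't write '$path': $!";
    print {$fh} "$_\n" for @lines;
    close $fh or die "Can't close '$path': $!";
}
```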

    ....
    >
    > Memory won't be an issue - my largest file will only be 6000 lines.


    Those are famous last words :)

    I remember many times when I've said "it will only ever be X large" and
    then had to eat those words. But of course, I suspect there are many many
    more times that my statement held true and it never did get much larger,
    but those ones don't force themselves back into your attention the way the
    other ones do.

    >
    > I've been away from Perl for a while but I seem to remember there was
    > a module File::Tie which might be suitable.


    For 6000 lines of code, you should be a long long way from needing
    Tie::File. In fact, last time I investigated it, the memory overhead for
    Tie::File was so large that, unless your file's lines are very long, much
    longer than one generally finds in a computer program, it provided little
    memory benefit over slurping the file.

    >
    > I'd be grateful if anyone has any suggestions -


    Don't worry about this particular problem until it has proven itself
    to be an issue (which it probably won't).

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Jun 17, 2008
    #4
  5. Ben Morrow Guest

    Quoth :
    > wrote:
    >

    [slurping a file into an array]
    > > I've been away from Perl for a while but I seem to remember there was
    > > a module File::Tie which might be suitable.

    >
    > For 6000 lines of code, you should be a long long way from needing
    > Tie::File. In fact, last time I investigated it, the memory overhead for
    > Tie::File was so large that, unless your file's lines are very long, much
    > longer than one generally finds in a computer program, it provided little
    > memory benefit over slurping the file.


    One major advantage of Tie::File is that the interface is exactly the
    same as a slurped array, so if/when memory does become a problem, you
    can simply replace

    use File::Slurp qw/read_file/;

    my @data = read_file 'name';

    with

    use Tie::File;

    tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";

    and leave the rest of the code unchanged.
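    Tie::File itself has shipped with Perl since 5.8, so it also meets the
    standard-modules-only constraint. A sketch of Ben's point that the array
    interface is unchanged: the same `splice` call used on a slurped array
    inserts a line directly into the file on disk. `insert_before_first_brace`
    is a hypothetical helper; by default Tie::File strips the record
    separator from each element (its `autochomp` option), so elements look
    like chomped lines.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: insert $text above the first line containing '{',
# editing the file in place through a tied array.
sub insert_before_first_brace {
    my ($path, $text) = @_;
    tie my @data, 'Tie::File', $path or die "can't tie '$path': $!";
    for my $i (0 .. $#data) {
        if ($data[$i] =~ /\{/) {
            splice @data, $i, 0, $text;   # same call as on a plain array
            last;
        }
    }
    untie @data;                          # flush pending changes to disk
}
```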

    Ben

    --
    Many users now operate their own computers day in and day out on various
    applications without ever writing a program. Indeed, many of these users
    cannot write new programs for their machines...
    -- F.P. Brooks, 'No Silver Bullet', 1987 []
     
    Ben Morrow, Jun 17, 2008
    #5
  6. Guest

    Ben Morrow <> wrote:
    > Quoth :
    > > wrote:
    > >

    > [slurping a file into an array]
    > > > I've been away from Perl for a while but I seem to remember there was
    > > > a module File::Tie which might be suitable.

    > >
    > > For 6000 lines of code, you should be a long long way from needing
    > > Tie::File. In fact, last time I investigated it, the memory overhead
    > > for Tie::File was so large that, unless your file's lines are very
    > > long, much longer than one generally finds in a computer program, it
    > > provided little memory benefit over slurping the file.

    >
    > One major advantage of Tie::File is that the interface is exactly the
    > same as a slurped array, so if/when memory does become a problem, you
    > can simply replace
    >
    > use File::Slurp qw/read_file/;
    >
    > my @data = read_file 'name';


    This uses 3 times as much memory as reading the file in a while loop
    and pushing each line onto the array. It seems like it should only be two
    times as much, but it isn't (and it is 1.5 times as much as @data = <$fh>
    takes). Of course, most of that excess memory is eligible for later
    reuse, provided your program survives and needs it.
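    For comparison, the incremental loop Xho refers to might look like this
    (the helper name is hypothetical). By his figures, read_file used about
    three times the memory of this loop and 1.5 times that of
    `@data = <$fh>`, which puts `@data = <$fh>` at roughly twice the loop;
    those are his measurements, not re-run here.

```perl
use strict;
use warnings;

# Build the array one line at a time instead of slurping in list context.
sub read_lines_incrementally {
    my ($fh) = @_;
    my @data;
    while (my $line = <$fh>) {    # Perl wraps this test in defined()
        push @data, $line;
    }
    return \@data;                # return a reference to avoid copying the array
}
```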

    >
    > with
    >
    > use Tie::File;
    >
    > tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";
    >
    > and leave the rest of the code unchanged.


    But my lament is that this just doesn't save all that much memory over
    an already efficient slurping method, due to the overhead of Tie::File's
    internal structures. I checked again on the latest Tie::File, and based on
    vague recollections it does seem substantially better than the older one I
    played around with, but still the memory overhead is not an insignificant
    fraction of what it would be to just slurp a large file of short lines. So
    I consider Tie::File to be an emergency measure I'd throw at a program to
    keep it limping along while I redesign and rewrite. (Not that there is
    anything wrong with that)

    Xho

     
    , Jun 17, 2008
    #6
  7. cartercc Guest

    On Jun 17, 6:49 am, wrote:
    > Say my opening brace is on line 95 and my closing brace 195 I want to
    > insert a comment
    >
    > // for loop ends line 195
    >
    > at line 94 (i.e. immediately above the opening brace). The problem is
    > that, processing line by line, I don't know until I get to line 195 what
    > I have to change at line 94, so I have to store lines 94 to 195 in
    > memory anyway.
    >
    > Similarly if I read a function header, I want to insert some
    > documentation before the function header
    > so I don't believe processing the file line by line is the best
    > solution here. As I will be inserting extra lines into the middle of
    > an array I think I am going to need a module to do this.


    I might approach this by matching delimiters. You can certainly match
    delimiters and insert comments just above the opening brace. If you
    match on keywords (for, while, if, else, etc.) and count your lines,
    you can create an intermediate file with a comment template just above
    the opening brace, and then manually edit for the final program.
    Something like this, maybe:

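    use strict;
    use warnings;

    # Assumes INFILE and OUTFILE are already-open filehandles.
    my $depth = 0;      # current brace nesting depth
    my @brace_stack;    # line numbers of the currently open braces
    while ( my $line = <INFILE> ) {
        if ( $line =~ /\{/ ) {
            $depth++;
            push @brace_stack, $.;
            print OUTFILE "// COMMENT\n", $line;
        }
        elsif ( $line =~ /\}/ ) {
            $depth--;
            pop @brace_stack;
            print OUTFILE $line, "// COMMENT\n";
        }
        else {
            print OUTFILE $line;    # copy every other line unchanged
        }
    }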

    Obviously, your logic would depend on your coding standard. I wrote
    something like this in Java, as a class that did much the same job.
    Perl ought to make it a lot easier.

    CC
     
    cartercc, Jun 17, 2008
    #7