Reading whole file into memory. Parsing 'C' like file efficiently

n_macpherson · Jun 17, 2008

I know there are a number of FAQs which discourage reading whole
files into memory rather than line by line.

However my problem is as follows.

I am reading a file written in a language which looks like (but isn't)
C. I need to insert comments / documentation at various points in the
file. However, sometimes I don't know what I want to insert until I get
well past the current line - for example

for(i=0;i<64;i++)
{
// lots of code
}

Say my opening brace is on line 95 and my closing brace on line 195. I
want to insert a comment

// for loop ends line 195

at line 94 (i.e. immediately above the opening brace). The problem is
that processing line by line I don't know until I get to line 195 what
I have to change at line 94, so I have to store lines 94 to 195 in
memory anyway.

Similarly, if I read a function header, I want to insert some
documentation before the function header,
so I don't believe processing the file line by line is the best
solution here. As I will be inserting extra lines into the middle of
an array, I think I am going to need a module to do this.

Memory won't be an issue - my largest file will only be 6000 lines.

I've been away from Perl for a while but I seem to remember there was
a module Tie::File which might be suitable.

I'd be grateful if anyone has any suggestions - the people who will be
using this don't normally use Perl so I'd like to avoid using any non-
standard modules if possible

Thanks

Niall

Jürgen Exner · Jun 17, 2008

Similarly if I read a function header, I want to insert some
documentation before the function header
so I don't believe processing the file line by line is the best
solution here.

Based on what you said I would tend to agree.

Whether that kind of automated annotation is useful is a different
story, though. I doubt it. Take for example

Say my opening brace is on line 95 and my closing brace 195 I want to
insert a comment
// for loop ends line 195

First of all, proper indentation will provide even better guidance as
to where the loop ends. And second, a single block spanning 100 lines is
just plain nuts. A classic rule of thumb used to be that if the code for
a sub doesn't fit on a VT220 screen, then it is too long and you should
think about splitting it. There were two reasons for this:
- you don't want to keep scrolling up and down while thinking about this
sub
- anything much longer becomes too complex for a single sub

Granted, times have changed and typically you can display many more
lines on modern terminals. But the second reason is still very sound.
Many people will probably consider 30-50 lines of code to be the maximum
length of code that can still be easily viewed and recognized without
too much mental scrolling.

As I will be inserting extra lines into the middle of
an array I think I am going to need a module to do this.

Why? Sounds like a perfect job for splice().

jue

n_macpherson · Jun 17, 2008

First of all, proper indentation will provide even better guidance as
to where the loop ends. And second, a single block spanning 100 lines is
just plain nuts. A classic rule of thumb used to be that if the code for
a sub doesn't fit on a VT220 screen, then it is too long and you should
think about splitting it. There were two reasons for this:
- you don't want to keep scrolling up and down while thinking about this
sub
- anything much longer becomes too complex for a single sub

Granted, times have changed and typically you can display many more
lines on modern terminals. But the second reason is still very sound.
Many people will probably consider 30-50 lines of code to be the maximum
length of code that can still be easily viewed and recognized without
too much mental scrolling.

One of the reasons I am writing this script is because we have
introduced coding standards which specify a maximum of 300 lines per
function and 70 lines for a while/if/else/for block, and I need to
highlight places in our scripts where these limits are exceeded. I agree
300 lines for a function is probably too long, but in the language
concerned anything less than 200 would be completely impractical,
unfortunately.
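
A check like that could be sketched roughly as follows. This is only an illustration under loud assumptions: the helper name long_blocks is made up, block length is counted from the opening brace to the closing brace inclusive, each brace sits on a line with no other brace, and braces inside strings or comments are not handled. The demo uses a limit of 3 instead of 70 to keep the sample short.

```perl
use strict;
use warnings;

# Hypothetical helper: return [open_line, close_line] pairs (1-based) for
# every brace-delimited block that spans more than $limit lines.
# Assumes at most one brace per line and no braces in strings or comments.
sub long_blocks {
    my ( $limit, @lines ) = @_;
    my ( @open_stack, @too_long );
    for my $i ( 0 .. $#lines ) {
        if ( $lines[$i] =~ /\{/ ) {
            push @open_stack, $i + 1;    # remember where the block opened
        }
        elsif ( $lines[$i] =~ /\}/ && @open_stack ) {
            my $opened = pop @open_stack;
            my $closed = $i + 1;
            push @too_long, [ $opened, $closed ]
                if $closed - $opened + 1 > $limit;
        }
    }
    return @too_long;
}

# Tiny demonstration with a limit of 3 lines instead of 70.
my @demo = (
    "for(i=0;i<64;i++)", "{", "a();", "b();", "c();", "}",
    "if(x)", "{", "d();", "}",
);
printf "block at lines %d-%d is too long\n", @$_
    for long_blocks( 3, @demo );
```

Only the first block (lines 2 to 6) is reported; the second one is exactly 3 lines long and stays within the limit.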

The indentation is a good point - our developers mostly develop on
site, which means a variety of editors (UltraEdit, Visual Studio,
Notepad++, our own proprietary editor) are used. This means
indentation across scripts becomes inconsistent. One of the functions
of the script I am writing will be to make sure the indentation
conforms to the coding standards.

Why? Sounds like a perfect job for splice().

Yes - I'd forgotten splice() will allow me to insert into the middle
of an array (as I said, I have been away from Perl for a little
while). That should work fine for my purposes.
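
For reference, a minimal sketch of inserting a line with splice(), using the for loop from the first post as sample data. The array contents and the comment text are illustrative only; the line number in the comment is the closing brace's position after the insert.

```perl
use strict;
use warnings;

my @lines = (
    "for(i=0;i<64;i++)\n",
    "{\n",
    "// lots of code\n",
    "}\n",
);

# splice ARRAY, OFFSET, LENGTH, LIST: a LENGTH of 0 removes nothing,
# so the new element is inserted before index 1 (the opening brace).
# After the insert the closing brace sits on line 5.
splice @lines, 1, 0, "// for loop ends line 5\n";

print @lines;
```

Everything from the opening brace onwards moves down by one element, so any line numbers recorded before the insert have to be adjusted or the inserts done from the bottom of the file upwards.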

xhoster · Jun 17, 2008

I know there are a number of FAQs which discourage reading whole
files into memory rather than line by line.

I hope they discourage you from reading whole files into memory
thoughtlessly and without good reason. It seems like you do have a good
reason to read them into memory, so go ahead and do it. There is even a
module, File::Slurp, to facilitate it.
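
Since File::Slurp is a CPAN module rather than part of the core distribution, and the original poster wanted to avoid non-standard modules, here is a rough core-only sketch of the same idea. The helper name slurp_lines is made up for illustration, and the temporary file exists only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return every line of $path, newlines included.
sub slurp_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't read '$path': $!";
    my @lines = <$fh>;    # list context reads all remaining lines
    close $fh;
    return @lines;
}

# Demonstration on a throwaway file.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line one\n", "line two\n", "line three\n";
close $tmp_fh;

my @lines = slurp_lines($tmp_name);
printf "%d lines read\n", scalar @lines;
```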

....

Memory won't be an issue - my largest file will only be 6000 lines.

Those are famous last words.

I remember many times when I've said "it will only ever be X large" and
then had to eat those words. But of course, I suspect there are many many
more times that my statement held true and it never did get much larger,
but those ones don't force themselves back into your attention the way the
other ones do.

I've been away from Perl for a while but I seem to remember there was
a module Tie::File which might be suitable.

For 6000 lines of code, you should be a long long way from needing
Tie::File. In fact, last time I investigated it, the memory overhead for
Tie::File was so large that, unless your file's lines are very long, much
longer than one generally finds in a computer program, it provided little
memory benefit over slurping the file.

I'd be grateful if anyone has any suggestions -

Don't worry about this particular problem until it has proven itself
to be an issue (which it probably won't)

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

Ben Morrow · Jun 17, 2008

[slurping a file into an array]

For 6000 lines of code, you should be a long long way from needing
Tie::File. In fact, last time I investigated it, the memory overhead for
Tie::File was so large that, unless your file's lines are very long, much
longer than one generally finds in a computer program, it provided little
memory benefit over slurping the file.

One major advantage of Tie::File is that the interface is exactly the
same as a slurped array, so if/when memory does become a problem, you
can simply replace

use File::Slurp qw/read_file/;

my @data = read_file 'name';

with

use Tie::File;

tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";

and leave the rest of the code unchanged.
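
A self-contained sketch of the Tie::File side of that swap, for illustration only. The temporary file and the sample lines are assumptions made to keep the example runnable. One difference from a slurped array is worth knowing: by default Tie::File strips the line ending from each element, so inserted text carries no "\n".

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a throwaway input file so the sketch is self-contained.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "for(i=0;i<64;i++)\n{\n// lots of code\n}\n";
close $tmp_fh;

tie my @data, 'Tie::File', $tmp_name
    or die "can't tie '$tmp_name': $!";

# Changes to the tied array are written through to the file.
# Elements hold no line ending, so the new one has none either.
splice @data, 1, 0, '// for loop ends line 5';

untie @data;    # release the file

# Read the file back with plain I/O to see what ended up on disk.
open my $check, '<', $tmp_name or die "can't read '$tmp_name': $!";
my @on_disk = <$check>;
close $check;
print @on_disk;
```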

Ben

xhoster · Jun 17, 2008

Ben Morrow said:

[slurping a file into an array]

For 6000 lines of code, you should be a long long way from needing
Tie::File. In fact, last time I investigated it, the memory overhead
for Tie::File was so large that, unless your file's lines are very
long, much longer than one generally finds in a computer program, it
provided little memory benefit over slurping the file.

One major advantage of Tie::File is that the interface is exactly the
same as a slurped array, so if/when memory does become a problem, you
can simply replace

use File::Slurp qw/read_file/;

my @data = read_file 'name';

This uses 3 times as much memory as reading the file in a while loop
and pushing each line onto the array. It seems like it should only be two
times as much, but it isn't. (And it is 1.5 times as much as
@data = <$fh> takes.) Of course, most of that excess memory is eligible
for later reuse, provided your program survives and needs it.

with

use Tie::File;

tie my @data, 'Tie::File', 'name' or die "can't read 'name': $!";

and leave the rest of the code unchanged.

But my lament is that this just doesn't save all that much memory over
an already efficient slurping method, due to the overhead of Tie::File's
internal structures. I checked again on the latest Tie::File, and based on
vague recollections it does seem substantially better than the older one I
played around with, but still the memory overhead is not an insignificant
fraction of what it would be to just slurp a large file of short lines. So
I consider Tie::File to be an emergency measure I'd throw at a program to
keep it limping along while I redesign and rewrite. (Not that there is
anything wrong with that)

Xho

cartercc · Jun 17, 2008

Say my opening brace is on line 95 and my closing brace 195 I want to
insert a comment

// for loop ends line 195

at line 94 (i.e. immediately above the opening brace). The problem is
that processing line by line I don't know until I get to line 195 what
I have to change at line 94, so I have to store lines 94 to 195 in
memory anyway.

Similarly if I read a function header, I want to insert some
documentation before the function header
so I don't believe processing the file line by line is the best
solution here. As I will be inserting extra lines into the middle of
an array I think I am going to need a module to do this.

I might approach this by matching delimiters. You can certainly match
delimiters and insert comments just above the opening brace. If you
match on key words (for, while, if, else, etc.) and count your lines,
you can create an intermediate file with a comment template just above
the opening brace, and then manually edit for the final program.
Something like this, maybe:

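use strict;
use warnings;

@ARGV == 2 or die "usage: $0 INFILE OUTFILE\n";
my ( $in_name, $out_name ) = @ARGV;
open my $in,  '<', $in_name  or die "can't read '$in_name': $!";
open my $out, '>', $out_name or die "can't write '$out_name': $!";

my @brace_stack;    # holds info about your block: line number of each open '{'
while ( my $line = <$in> ) {
    if ( $line =~ /\{/ ) {
        push @brace_stack, $.;    # nesting depth is scalar @brace_stack
        print {$out} "// COMMENT\n";
        print {$out} $line;
    }
    elsif ( $line =~ /\}/ ) {
        pop @brace_stack;
        print {$out} $line;
        print {$out} "// COMMENT\n";
    }
    else {
        print {$out} $line;       # everything else passes through unchanged
    }
}
close $out or die "can't close '$out_name': $!";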

Obviously, your logic would depend on your coding standard. I wrote
something similar in Java, wrapped up in a class. Perl ought to be a
lot easier.
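
One way the brace stack and the splice() suggestion from earlier in the thread might be combined, as a rough sketch rather than a finished tool. The helper name annotate_blocks and the comment wording are made up. It assumes each brace sits at the start of its own line, that braces never appear inside strings or comments, and that the reported line number is the closing brace's position in the ORIGINAL file, before any comments are inserted.

```perl
use strict;
use warnings;

# Hypothetical helper: given the lines of a file, return a copy with a
# "// block ends line N" comment inserted above each opening brace.
# N is the closing brace's line number in the ORIGINAL file.
sub annotate_blocks {
    my @lines = @_;
    my ( @open_stack, @inserts );

    for my $i ( 0 .. $#lines ) {
        if ( $lines[$i] =~ /^\s*\{/ ) {
            push @open_stack, $i;    # index of the opening brace
        }
        elsif ( $lines[$i] =~ /^\s*\}/ && @open_stack ) {
            my $open_index = pop @open_stack;
            push @inserts, [ $open_index, $i + 1 ];
        }
    }

    # Insert from the bottom up so the remaining indexes stay valid.
    for my $ins ( sort { $b->[0] <=> $a->[0] } @inserts ) {
        my ( $index, $end_line ) = @$ins;
        splice @lines, $index, 0, "// block ends line $end_line\n";
    }
    return @lines;
}

my @source = (
    "for(i=0;i<64;i++)\n",
    "{\n",
    "if(i>3)\n",
    "{\n",
    "x();\n",
    "}\n",
    "}\n",
);
print annotate_blocks(@source);
```

Collecting the insert positions first and applying them afterwards keeps the matching pass simple, because nothing moves while the braces are being paired up.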

CC
