How to handle large variable

  • Thread starter Shashidhar Vajramatti

Shashidhar Vajramatti

Hello,
As shown in the code snippet below, I am trying to handle a
file's contents by copying them into a single variable. It works fine for
smaller files (a few megabytes). But when the file is larger
than 16 MB, I get an error message saying "System out of memory".

But Perl does not have any limit on the size of a variable. So what
can be the reason for this problem?

Following is the code which is used to handle the file contents through a
single variable using regular expressions.

#$file = join '',<IN>;

@file = <IN>;



Error Message: "System Out of Memory"
 

Paul Lalli

Shashidhar Vajramatti said:
Hello,
As listed below in the snippet of code, I am trying to handle a
file's contents by copying in to a single variable. It works fine for the
files of lesser size( few MegaBytes). But when the size of the file is more
than 16MB, it gives me an error message saying "System out of memory".

But perl doesnot have any limits to the size of the variable. Then, what
can be the reason for this problem?

Exactly what the error message says: your system is out of memory.
Perl's only limitations (memory-wise) are your system's limitations.
You're trying to store an absurd amount of data in active memory.

What makes you believe you have a need to do this? Perhaps if you tell
us your actual goal, rather than your proposed solution to your goal,
we can help you find a better way.

Following is the code which is used to handle the file contents through a
single variable using regular expressions.

#$file = join '',<IN>;

@file = <IN>;

There are no regular expressions in that snippet. Show us a *short but
complete* program that demonstrates what you're trying to do.

Paul Lalli
 

Arndt Jonasson

Shashidhar Vajramatti said:
As listed below in the snippet of code, I am trying to handle a
file's contents by copying in to a single variable. It works fine for the
files of lesser size( few MegaBytes). But when the size of the file is more
than 16MB, it gives me an error message saying "System out of memory".

But perl doesnot have any limits to the size of the variable. Then, what
can be the reason for this problem?

Following is the code which is used to handle the file contents through a
single variable using regular expressions.

#$file = join '',<IN>;

@file = <IN>;

Error Message: "System Out of Memory"

For what it's worth, I tried reading in a 9 MB file using this method,
and my perl ran out of memory and left a core dump of 66 MB. The data
needs a lot more memory when held in a Perl data structure than it does
on disk. Exactly how much depends on the system.

I think that even if you have enough swap space to handle all your big
files, doing it this way is a bad idea. Handling the files one line
at a time keeps the perl process small, and therefore faster (due to
less paging).
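
For illustration, a minimal sketch of the line-at-a-time approach. The helper name count_matches and its arguments are made up for this example and do not come from the original post; the point is only that one line is held in memory at any moment.

```perl
use strict;
use warnings;

# Count the lines that match a pattern without loading the whole file.
# count_matches() is a hypothetical helper for this sketch.
sub count_matches {
    my ($path, $regex) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$in>) {    # reads one line per iteration
        $count++ if $line =~ $regex;
    }
    close $in;
    return $count;
}
```

It could be called as count_matches('big.log', qr/ERROR/), where the file name is again just a placeholder. Memory use then depends on the longest line, not on the size of the file.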
 

James Willmore

Shashidhar said:
As listed below in the snippet of code, I am trying to handle a
file's contents by copying in to a single variable. It works fine for the
files of lesser size( few MegaBytes). But when the size of the file is more
than 16MB, it gives me an error message saying "System out of memory".

But perl doesnot have any limits to the size of the variable. Then, what
can be the reason for this problem?

Physical and "virtual" memory (such as swap space) has run out. This is
the reason why most in this newsgroup say to read a file a line at a
time. If you have no way of knowing how big a file will be *before* you
read it, then you should stick with reading the file a line at a time.

Following is the code which is used to handle the file contents through a
single variable using regular expressions.

#$file = join '',<IN>;

@file = <IN>;



Error Message: "System Out of Memory"

Both methods you have outlined above will fail if you try to read a file
that is greater than your available memory. I'm not 100% sure if you
can use File::Slurp to do whatever it is you're trying to do.

What are you trying to do that you feel the need to read the entire file
into memory?

Jim
 

A. Sinan Unur

Following is the code which is used to handle the file contents
through a single variable using regular expressions.

#$file = join '',<IN>;

@file = <IN>;

I am surprised that no one has pointed out that this is a bad, bad way to
slurp a file. File::Slurp is an excellent module:

http://search.cpan.org/~uri/File-Slurp-9999.06/lib/File/Slurp.pm

One can also do:

{
local $/;
$file = <IN>;
}

Reading the lines of a file into an array and then joining those lines
ought to effectively double the memory requirement.
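
As a rough sketch of the same idiom with a lexical filehandle and error checking (slurp is a hypothetical helper name, not part of any module mentioned here):

```perl
use strict;
use warnings;

# Read a whole file into one scalar with a single read.
# slurp() is a hypothetical helper for this sketch.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    local $/;               # undef until the sub returns: no record separator
    my $content = <$in>;    # one read returns the entire file
    close $in;
    return $content;
}
```

Unlike join '', <IN>, this never builds an intermediate list of lines, so only one copy of the data should be needed. It still fails if the file does not fit in memory.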
 

Ben Morrow

Shashidhar Vajramatti said:
Hello,
As listed below in the snippet of code, I am trying to handle a
file's contents by copying in to a single variable. It works fine for the
files of lesser size( few MegaBytes). But when the size of the file is more
than 16MB, it gives me an error message saying "System out of memory".

But perl doesnot have any limits to the size of the variable. Then, what
can be the reason for this problem?

Err... that the system is out of memory? You probably want to work a
line at a time.

Following is the code which is used to handle the file contents through a
single variable using regular expressions.

#$file = join '',<IN>;

The usual idiom for this is

$file = do { local $/; <IN> };

A better idiom is

use File::Slurp;

$file = read_file $filename;

@file = <IN>;

Well, which? If you want the array, then you can use Tie::File instead.
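
A small sketch of what the Tie::File route might look like, assuming read-only access is enough. The helper name nth_line is invented for this example; Tie::File itself has shipped with Perl since 5.8.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Fetch one line (zero-based index) through a tied array.
# nth_line() is a hypothetical helper for this sketch.
sub nth_line {
    my ($path, $n) = @_;
    tie my @lines, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";
    my $line = $lines[$n];    # returned without its trailing newline
    untie @lines;
    return $line;
}
```

The array looks like @file = <IN> to the rest of the program, but the lines are fetched from disk on demand instead of being loaded up front. Tying once and reusing the array would be the better choice for repeated lookups.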

Ben
 

Laura

A. Sinan Unur said:
I am surprised why no one has pointed out that this a bad, bad way to
slurp a file. File::Slurp is an excellent module:

What is so great about this File::Slurp module anyway? Does it deal with
files on a byte level using buffers appropriately? How is it that the
built-in way of doing it is less efficient? If File::Slurp is so great,
why is it not one of the modules included with Perl?

The problem with reading line by line is that you have to read ahead through
the file to see where the lines start and end if you want random access;
otherwise you are stuck going sequentially. I propose that a user should be
able to page through a file with random access, loading a chunk of a
specified size at a time. You could also have the chunk auto-shrink or expand
to the nearest line(s). Or, you could create an index of lines saved to the
same or a separate file so future accesses of lines will not take so long.
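
To make the index idea concrete, here is a minimal sketch using only the built-in seek and tell. The names build_line_index and line_at are hypothetical, and the index is kept in memory rather than saved to a file.

```perl
use strict;
use warnings;

# Record the byte offset at which each line starts (one sequential pass).
# build_line_index() and line_at() are hypothetical helpers for this sketch.
sub build_line_index {
    my ($fh) = @_;
    seek $fh, 0, 0 or die "seek failed: $!";
    my @offsets = (0);
    while (defined(my $line = <$fh>)) {
        push @offsets, tell($fh);    # position just after the line read
    }
    pop @offsets;                    # the last offset is end of file
    return \@offsets;
}

# Jump straight to line $n (zero-based) using the index.
sub line_at {
    my ($fh, $offsets, $n) = @_;
    return if $n < 0 || $n > $#$offsets;
    seek $fh, $offsets->[$n], 0 or die "seek failed: $!";
    return scalar <$fh>;
}
```

Building the index still costs one full pass over the file, but after that any line can be fetched with a single seek. The index needs one number per line, which is usually far smaller than the file itself.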
 

Uri Guttman

L> What is so great about this File::Slurp module anyway? Does it
L> deal with files on a byte level using buffers appropriately? How
L> is it that the built in way of doing it is less efficient? If
L> File::Slurp is so great, why is it not one of the modules included
L> with Perl?

why don't you look at the module and see for yourself? why don't you
read the article that comes with it and was published on perl.com?

L> The problem with reading line by line is that you have to read
L> ahead through the file to see where the lines start and end if you
L> want random access or you are stuck going sequentially. I propose
L> that a user should be able to page through a file with random
L> access ability, loading a specified sized chunk at a time. You
L> could also have the chunk auto-shrink or expand to the nearest
L> line(s). Or, you could create an index of lines saved to the same
L> or a separate file so future accesses of lines will not take so
L> long.

you are so clueless as to be laughable. file access is an OS issue and
not a programming language one. and some OS's support random access
lines (VMS for one) but most don't.

you are batting worse than the cardinals did in the series. perhaps you
should go to the T-ball level programming league.

uri
 

ctcgag

Shashidhar Vajramatti said:
Hello,
As listed below in the snippet of code, I am trying to handle a
file's contents by copying in to a single variable. It works fine for the
files of lesser size( few MegaBytes). But when the size of the file is
more than 16MB, it gives me an error message saying "System out of
memory".

Well, how much memory does your system have? What else is running on it?

But perl doesnot have any limits to the size of the variable. Then,
what can be the reason for this problem?

Most likely, you have run out of memory.

Xho
 

Uri Guttman

c> In my hands, not much. I found it marginally slower than the old
c> fashioned way.

care to show some benchmarks? have you looked at the benchmark script
that comes in the tarball? and was that for line/record mode or scalar
mode? return by ref or by value?

just saying it is slower is like newbies saying it doesn't work :)

uri
 

Anno Siegel

Arndt Jonasson said:
Newbie question: in what way is the above better than
$file = <IN>;

It isn't better, it's entirely different. "$file = <IN>" reads one
line from the file, "$file = do {local $/; <IN>}" reads them all.

Anno
 

Arndt Jonasson

It isn't better, it's entirely different. "$file = <IN>" reads one
line from the file, "$file = do {local $/; <IN>}" reads them all.

Silly me. 1) I didn't try it out, because I thought I knew what it would
do; 2) for some reason I thought that localized variables keep their
old values, instead of getting "undef". Thanks.

For those who still wonder: $/ (the input line separator) gets set
(locally) to nothing, so the whole file will be considered as one line
when doing <IN>.
 

Tad McClellan

For those who still wonder: $/ (the input *line* separator) gets set
(locally) to *nothing*, so the whole file will be considered as one *line*
when doing <IN>.


Let's not be so loose with terminology, it can lead to confusion...


Calling a thing that is not necessarily a line "line" is a Bad Idea.

A "line" has one \n in it, and it is at the end.

The name of $/ is "input *record* separator", probably for
that very reason. :)

$/ does not get set to nothing (whatever that means), it
gets set to undef.

... so the whole file will be considered as one *record* when doing <IN>.
 

Arndt Jonasson

Tad McClellan said:


Let's not be so loose with terminology, it can lead to confusion...

I agree.

Calling a thing that is not necessarily a line "line" is a Bad Idea.

I agree, but Programming Perl, 2nd edition (I know there is a 3rd),
only uses the word "line" when describing the angle operator. But I
should have spent 10 seconds and looked up the full name of $/ before
posting. I hope no one reads my explanations without looking the stuff
up themselves when they need it.

$/ does not get set to nothing (whatever that means), it
gets set to undef.

My thought process was probably something like "$/ gets set to the undefined
value, which causes the input operation to behave as if there were no
such thing as a record separator". Saying "$/ is set to undef" would have
been saying less than I wanted. It turned out a bit sloppy, yes. In another
context, "nothing" could very well mean the empty string.

Exactly how vague is one allowed to be without misleading, when saying
things about 'undef', I wonder. "$x has no value", "$x is undefined",
"$x has the value 'undef'", "$x has the undefined value", "$x is set
to nothing". Which of these are OK? I haven't picked up the common
Perl speech patterns yet.

(I find your two-line blank spaces harder to read than one-line ones, but
that's personal taste, of course.)
 

Uri Guttman

AJ> My thought process was probably something like "$/ gets set to the
AJ> undefined value, which causes the input operation to behave as if
AJ> there were no such thing as a record separator". Saying "$/ is set
AJ> to undef" would have been saying less than I wanted. It turned out
AJ> a bit sloppy, yes. In another context, "nothing" could very well
AJ> mean the empty string.

and the empty string in $/ has another specific meaning of paragraph
mode. so using the term 'nothing' around $/ is not a good idea. computer
stuff likes specificity and not vagueness.
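
To show the difference concretely, here is a small sketch that reads the same text under three settings of $/. The helper name records is made up for this example, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Return all records read from $text with $/ set to $separator.
# records() is a hypothetical helper for this sketch.
sub records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $text = "a\nb\n\nc\n";

my @lines      = records($text, "\n");    # 4 records, one per newline
my @paragraphs = records($text, '');      # 2 records: paragraph mode
my @whole      = records($text, undef);   # 1 record: the entire text

printf "%d lines, %d paragraphs, %d whole-file record\n",
    scalar @lines, scalar @paragraphs, scalar @whole;
```

So undef and the empty string are both "nothing" in casual speech, yet they select slurp mode and paragraph mode respectively.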

AJ> Exactly how vague is one allowed to be without misleading, when
AJ> saying things about 'undef', I wonder. "$x has no value", "$x is
AJ> undefined", "$x has the value 'undef'", "$x has the undefined
AJ> value", "$x is set to nothing". Which of these are OK? I haven't
AJ> picked up the common Perl speech patterns yet.

vagueness is not permitted!! :)

try reading some bad technical documentation where there are dangling
pronouns galore. which 'that' does this 'it' refer to? general writing
expects to have more pronouns and such for style and the reader can
usually pick out the proper meaning via many clues and some
redundancy. in computer writing (and coding), accuracy and unambiguity
are key.

uri
 

Arndt Jonasson

Uri Guttman said:
AJ> My thought process was probably something like "$/ gets set to the
AJ> undefined value, which causes the input operation to behave as if
AJ> there were no such thing as a record separator". Saying "$/ is set
AJ> to undef" would have been saying less than I wanted. It turned out
AJ> a bit sloppy, yes. In another context, "nothing" could very well
AJ> mean the empty string.

and the empty string in $/ has another specific meaning of paragraph
mode. so using the term 'nothing' around $/ is not a good idea. computer
stuff likes specificity and not vagueness.

Is there a need to rub it in? We agree. I'm the kind of person who'd
like to see a formal definition for Perl.

AJ> Exactly how vague is one allowed to be without misleading, when
AJ> saying things about 'undef', I wonder. "$x has no value", "$x is
AJ> undefined", "$x has the value 'undef'", "$x has the undefined
AJ> value", "$x is set to nothing". Which of these are OK? I haven't
AJ> picked up the common Perl speech patterns yet.

vagueness is not permitted!! :)

try reading some bad technical documentation where there are dangling
pronouns galore. which 'that' does this 'it' refer to? general writing
expects to have more pronouns and such for style and the reader can
usually pick out the proper meaning via many clues and some
redundancy. in computer writing (and coding), accuracy and unambiguity
are key.

Yes, yes, I wasn't advocating sloppy style (like permitting all of the
above statements about $x), but I do want to know where the line is
drawn. Are all of them unpermissible?

I've had documentation written by me reworked by a "technical writer" (*)
so that while the language was slightly more mellifluous, it had become
impossible to see exactly what the documentation was saying - we had to
scrap that rework and go back to the original. I know what you mean.

(*) A really bad one. I don't mean any disrespect to the technical writers
as a species.
 

Uri Guttman

AJ> Is there a need to rub it in? We agree. I'm the kind of person who'd
AJ> like to see a formal definition for Perl.

i never stated we need a formal definition for perl. how did you get
that from my post? i said that your wording was wrong in an important
way ($/ does diff things when set to undef and ''). it could have been a
minor wording issue and i would not have commented upon it.

AJ> Yes, yes, I wasn't advocating sloppy style (like permitting all of
AJ> the above statements about $x), but I do want to know where the
AJ> line is drawn. Are all of them unpermissible?

like anything else in software and the world, it takes experience. some
get it faster than other like coding in general. writing unambiguous
english (or any natural lang) is a skill unto itself. and if it is too
unambiguous it becomes stilted and boring and useless too. read damian
conway's or peter scott's books to see how to write interesting and
accurate technical english.

uri
 

Arndt Jonasson

Uri Guttman said:
AJ> Is there a need to rub it in? We agree. I'm the kind of person who'd
AJ> like to see a formal definition for Perl.

i never stated we need a formal definition for perl. how did you get
that from my post?

No, you didn't state that, and I didn't get that from your post at
all. Did I state that you did? I made it clear (or I tried to) that I
too consider it important to let things be well defined. How do you
get from "would like to see" to "thinks we need"?

Does this lead anywhere? It hasn't been about Perl for a while now; not
even about programming practice.

AJ> Yes, yes, I wasn't advocating sloppy style (like permitting all of
AJ> the above statements about $x), but I do want to know where the
AJ> line is drawn. Are all of them unpermissible?

like anything else in software and the world, it takes experience. some
get it faster than other like coding in general. writing unambiguous
english (or any natural lang) is a skill unto itself. and if it is too
unambiguous it becomes stilted and boring and useless too. read damian
conway's or peter scott's books to see how to write interesting and
accurate technical english.

I really do want to know which ones are OK (they've disappeared in the
quoted text now, I see). It's a simple, very specific, question. Maybe
there isn't a simple answer, though.

I'll look up the names you mention - apart from that I don't see how your
answer relates to my question, even if I agree (if I'm permitted to do so)
with what you write.
 
