perl memory policy

ozy

Hi guys!

I have some code which could be reduced to the following...
--------------------
my @unsorted;
open( FH, 'test_data.txt' );

while ( <FH> ){
chomp $_;
push @unsorted, $_;
}

close( FH );
--------------------

The file that we are reading is quite simple: a single 0 on every line,
but the size is about 12 MB (and will be way bigger in the real
environment)

We are using v5.6.1 built for MSWin32-x86-multi-thread (Binary build
635 provided by ActiveState Corp)

My questions would be:
- could the snippet be written some other way to be more efficient in
terms of memory usage?
- what is the actual memory policy of Perl? Just because it is using
way more than I would expect... (using 600+ MB at the end) (I know how
Tcl handles memory allocation but I can't find a clue about Perl's
behaviour )
- do I have any option to influence this policy?

thanks in advance!

ozy
 
xhoster

ozy said:
Hi guys!

I have some code which could be reduced to the following...

In micro-optimization, unlike in the macro-optimization or debugging
issues we generally address here, the gory details are what matter. A
short simple script is generally not good enough to get at those gory
details.
--------------------
my @unsorted;
open( FH, 'test_data.txt' );

while ( <FH> ){
chomp $_;
push @unsorted, $_;
}

close( FH );

Is that 6e6 lines or 4e6 lines?
(and will be way bigger in the real
environment)

In that case, you almost definitely need to find a non-memory-resident
implementation if you are going to do this in Perl. Why spend an enormous
amount of time to make your data just barely fit into memory, only to have
it break when your dataset grows by 2 percent? Spend your time doing a
thorough redesign of the algorithm/implementation instead.

We are using v5.6.1 built for MSWin32-x86-multi-thread (Binary build
635 provided by ActiveState Corp)

My questions would be:
- could the snippets be written somehow else to be more effective in
sight of memory usage?

Yes. For example, just changing the "push @unsorted, $_;" to
"push @unsorted, $_+0;" cut the memory use in half, for me (5.8.0, Linux).
Whether this applies to your real code on your OS, I can't say.
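A minimal sketch of that change, assuming every line is numeric (as in the test file). Adding 0 makes push store a numeric copy of the line rather than a string copy; how much memory that saves will depend on the perl build and platform. The in-memory filehandle (perl 5.8 or later) and the sample data are stand-ins for test_data.txt, not part of the original code.

```perl
use strict;
use warnings;

# Stand-in for test_data.txt: an in-memory file, one number per line.
my $data = "0\n0\n42\n0\n";
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";

my @unsorted;
while ( my $line = <$fh> ) {
    chomp $line;
    # "+ 0" forces numeric context, so the array holds numbers
    # instead of string copies of each line.
    push @unsorted, $line + 0;
}
close($fh);

print scalar(@unsorted), " values read\n";
```

This only makes sense when the values are later used as numbers; non-numeric lines would be turned into 0 (with a warning under "use warnings").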
- what is the actual memory policy of Perl?

I believe this is Perl's memory policy (from perldoc -q memory):

When it comes to time-space tradeoffs, Perl nearly always
prefers to throw memory at a problem.


Just because it is using
way more than I would expect... (using 600+ MB at the end)

I get 264 MB here, or 126 MB with the optimization I mentioned.

(I know how
Tcl handles memory allocation but I can't find a clue about Perl's
behaviour )

The point of high level languages is that you don't have to worry about it.
If you find yourself worrying about it, you might be better off switching to a
low level language, or fusing them (Inline::C, XS, etc.)

- do I have any option to influence this policy?

Tons of them. Far more than I am willing to expound upon without having
more details. For example, I'd first question whether it is really
necessary to have an array with 6 million zeros in it. If you have seen
one zero, you have pretty much seen them all.

Xho
 
James Taylor

Beware, that if the last line does not have a newline character
after the 0, then this while loop will evaluate false and not
run that final time. If this is a possibility, you might be
better saying:

while ( defined $_ = <FH> ) {
# etc...

Do you plan to sort @unsorted at some point?

Surely not *every* line, as there would be little point in
processing such a file.

Tons of them. Far more than I am willing to expound upon without having
more details. For example, I'd first question whether it is really
necessary to have an array with 6 million zeros in it. If you have seen
one zero, you have pretty much seen them all.

Yeah, if every line is a 0, then just counting the lines is
sufficient information to be able to reproduce the data.

If it's simply that *most* lines contain a zero, and a few lines
contain more interesting numbers, then perhaps you could store
the non-zero lines in a hash, like this:

my %nonzero;
while (defined(my $line = <FH>)) {
chomp $line;
$nonzero{$.} = $line + 0 if $line != 0;
}

Then you can find out what value is on a specific line
by saying:

my $value = $nonzero{$line_number} || 0;
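Put together as a self-contained sketch, the idea could look like the following. The sample data, the in-memory filehandle (perl 5.8 or later) and the value_at helper are made up for illustration; the real code would read from the actual file instead.

```perl
use strict;
use warnings;

# Hypothetical sample input: mostly zeros, a few other values.
my $data = "0\n0\n7\n0\n3.5\n0\n";
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";

my %nonzero;          # line number => value, for non-zero lines only
my $line_count = 0;   # total number of lines seen
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    $line_count = $.;
    $nonzero{$.} = $line + 0 if $line != 0;
}
close($fh);

# Hypothetical helper: value on a given 1-based line number.
sub value_at {
    my ($line_number) = @_;
    return $nonzero{$line_number} || 0;
}

printf "%d lines, %d stored\n", $line_count, scalar keys %nonzero;
print "line 3 = ", value_at(3), ", line 4 = ", value_at(4), "\n";
```

Keeping the total line count alongside the hash is what makes it possible to reproduce the original data; the hash alone does not record how many trailing zero lines there were.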
 
Paul Lalli

James said:
Beware, that if the last line does not have a newline character
after the 0, then this while loop will evaluate false and not
run that final time. If this is a possibility, you might be
better saying:

while ( defined $_ = <FH> ) {
# etc...

Incorrect. The lone <FH> inside a while loop is magical. It is
defined as:
defined($_ = <FH>)
for precisely the reason you mention....

perl -MO=Deparse -e' while (<FH>) { print; } '
while (defined($_ = <FH>)) {
print $_;
}
-e syntax OK

Do take note of the parentheses, btw... the way you wrote it would
result in a syntax error:
Can't modify defined operator in scalar assignment at -e line 1, near
"<FH>) "


Paul Lalli
 
James Taylor

Incorrect. The lone <FH> inside a while loop is magical. It is
defined as:
defined($_ = <FH>)
for precisely the reason you mention....

I'm astonished to discover I've been carrying that
misapprehension around for years! Literally.

I could have sworn I read something in the Camel that
explicitly warned against this. Was I just dreaming?

I'm also certain I've hit warnings generated by at least
one version of perl that forced me to put a defined() inside
the condition in order to stop the warnings being produced.
I can't have just been dreaming this, can I?

perl -MO=Deparse -e' while (<FH>) { print; } '

I'd love to get this working on my platform.
Are the modules O, B and B::Deparse all pure Perl?

Do take note of the parentheses, btw... the way you wrote it would
result in a syntax error:
Can't modify defined operator in scalar assignment at -e line 1, near
"<FH>) "

Oops, my mistake.
 
Paul Lalli

James said:
I'm astonished to discover I've been carrying that
misapprehension around for years! Literally.

I could have sworn I read something in the Camel that
explicitly warned against this. Was I just dreaming?

No, but you may have been mis-remembering. If you do not take full
advantage of the magic, you get none of the magic. The situation you
were warning against will arise if you do the following:

while (my $line = <FH>) { ... }

In this case, defined() is not automagically added, and you would have
to explicitly check for the 0-without-newline case.

I'm also certain I've hit warnings generated by at least
one version of perl that forced me to put a defined() inside
the condition in order to stop the warnings being produced.
I can't have just been dreaming this can I?

Here I couldn't tell you. I can't remember hitting any such warnings.

I'd love to get this working on my platform.
Are the modules O, B and B::Deparse all pure Perl?

Er. Don't know. As far as I know, they're all standard modules in the
core distribution, however.
http://search.cpan.org/~nwclark/perl-5.8.7/ext/B/B/Deparse.pm

Paul Lalli
 
Tad McClellan

Paul Lalli said:
James said:
No, but you may have been mis-remembering.


No, it _used_ to be as he remembers, but is no longer the way he remembers.

If you do not take full
advantage of the magic, you get none of the magic. The situation you
were warning against will arise if you do the following:

while (my $line = <FH>) { ... }

In this case, defined() is not automagically added, and you would have
to explicitly check for the 0-without-newline case.


Incorrect:

perl -MO=Deparse -e' while (my $line = <FH>) { print; } '
while (defined(my $line = <FH>)) {
print $_;
}
-e syntax OK
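
For anyone who cannot run B::Deparse, the same thing can be checked empirically. This is only a sketch, assuming a perl that supports in-memory filehandles (5.8 or later); the sample data is made up so that the last line is a bare "0" with no trailing newline. On a perl that inserts the defined() test, both loops should collect all three values.

```perl
use strict;
use warnings;

# Sample data whose last line is a bare "0" with no trailing newline.
my $data = "1\n2\n0";

# Form 1: bare <$fh> in the while condition, assigning to $_.
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";
my @via_default;
while (<$fh>) {
    chomp;
    push @via_default, $_;
}
close($fh);

# Form 2: assignment to a lexical variable in the while condition.
open( $fh, '<', \$data ) or die "Cannot open in-memory file: $!";
my @via_lexical;
while ( my $line = <$fh> ) {
    chomp $line;
    push @via_lexical, $line;
}
close($fh);

print "default: @via_default\n";
print "lexical: @via_lexical\n";
```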
 
Tad McClellan

James Taylor said:
I'm astonished to discover I've been carrying that
misapprehension around for years!


For some of those years it was a misapprehension, but for some it was not. :)

Having perl add the defined() for you was introduced at some point
(before 2000, I think).

Literally.


Gak!

How can you actually "carry" an idea (misapprehension)?

It seems to me that you were speaking _figuratively_
(ie. just the opposite of "literally"!).

I could have sworn I read something in the Camel that
explicitly warned against this. Was I just dreaming?


Not if the Camel was old enough. :)

I'm also certain I've hit warnings generated by at least
one version of perl that forced me to put a defined() inside
the condition in order to stop the warnings being produced.


The warning is still in perldiag:

---------------------
=item Value of %s can be "0"; test with defined()

(W misc) In a conditional expression, you used <HANDLE>, <*> (glob),
C<each()>, or C<readdir()> as a boolean value. Each of these constructs
can return a value of "0"; that would make the conditional expression
false, which is probably not what you intended. When using these
constructs in conditional expressions, test their values with the
C<defined> operator.
---------------------

I'd love to get this working on my platform.
Are the modules O, B and B::Deparse all pure Perl?


I doubt it, but that should be irrelevant if you have managed
to build perl at all, since they are part of the core.

Did you try running the code above? What happened?
 
Paul Lalli

Tad said:
Paul Lalli said:
No, it _used_ to be as he remembers, but is no longer the way he remembers.

Huh. In what version did that behavior change?

Incorrect:

perl -MO=Deparse -e' while (my $line = <FH>) { print; } '
while (defined(my $line = <FH>)) {
print $_;
}
-e syntax OK

Well crap. I feel like a moron. Thanks very much for the correction,
Tad.

Paul Lalli
 
James Taylor

No, but you may have been mis-remembering.

Well, I can't find it in the section on the line input operator
in the third Camel, but then I learnt my Perl using the 2nd ed
and haven't read the 3rd cover to cover yet. There may be some
other section where the warning appears. Perhaps someone with
access to online/searchable copies of the books can tell us.

If you do not take full advantage of the magic, you get none of
the magic. The situation you were warning against will arise
if you do the following:

while (my $line = <FH>) { ... }

In this case, defined() is not automagically added,

Actually, this works as expected for me.
The defined() *is* implicitly inserted.

Er. Don't know. As far as I know, they're all standard modules in the
core distribution, however.
http://search.cpan.org/~nwclark/perl-5.8.7/ext/B/B/Deparse.pm

Sadly this is not so on the version of perl I'm stuck with,
and the original porter seems to have lost interest in my platform.
I keep wishing I was better at C because I'm more than motivated
enough to spend several months porting a later version if only I
had the necessary experience.
 
ozy

In that case, you almost definitely need to find a non-memory-resident
implementation if you are going to do this in Perl. Why spend an enormous
amount of time to make your data just barely fit into memory, only to have
it break when your dataset grows by 2 percent? Spend your time doing a
thorough redesign of the algorithm/implementation instead.

Yep. It will probably be recoded so that SQL Server does most of
the job, but my boss 1. is curious and 2. doesn't want me to spend 2 days
on the rewrite (no comments on that issue please :) )

For example, just changing the "push @unsorted, $_;" to "push @unsorted, $_+0;"

Sadly, that's no good for us. It seems that I confused everyone with this
0 thing. That was just my test file (I tapped on the 0 for no reason).
In production the lines will contain any kind of data (integers,
floats, strings).

Honestly. Was there anybody out there who seriously thought that I want
to read in and sort a bunch of zeros??
The point of high level languages is that you don't have to worry about it.
If you find yourself worrying about it, you might be better off switching to a
low level language, or fusing them (Inline::C, XS, etc.)

You're right in a way, but "don't have to care about it" and "not knowing it"
are just not the same. I did not know it and I was highly surprised that
I couldn't find any usable information on the net. I don't want to dig
into Perl's core, I just would like to know. Call it curiosity or what
you wish.

Tons of them.

Actually I was wondering if there is a switch or option to make Perl
use a traditional memory policy, or to set the maximum allocatable
space in memory, or something like that ...
 
ozy

Surely not *every* line, as there would be little point in processing such a file.

You are absolutely right :)
As I mentioned in my previous post, it was just a dummy test file. The real
one will (hopefully) contain way more meaningful data :)
 
Anno Siegel

Tad McClellan said:
Gak!

How can you actually "carry" an idea (misapprehension)?

I know what you mean ("We literally died with laughter"), but in this
case I'd give him the benefit of the doubt and read it as "I've been
(figuratively) carrying that misapprehension around for (literally)
years".

Anno
 
Anno Siegel

[...]
Honestly. Was there anybody out there who seriously thought that I want
to read in and sort a bunch of zeros??

This is Usenet. We've seen stranger things than that.

Anno
 
ozy

This is Usenet. We've seen stranger things than that.

Understood :)

Btw I was the one who screwed things up by not expressing the
case properly... so sorry everyone!
 
Peter Scott

my @unsorted;
open( FH, 'test_data.txt' );

while ( <FH> ){
chomp $_;
push @unsorted, $_;
}

My questions would be:
- could the snippet be written some other way to be more efficient in
terms of memory usage?

You can improve run time by pre-extending the array before the loop:

$#unsorted = 6E6; # say

but I do not think it makes any difference to the memory used.
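
One detail worth sketching: setting $#unsorted to N makes the array N+1 elements long, so a push after that appends behind the pre-allocated (undef) slots rather than filling them. A minimal sketch that fills the slots by index and trims the unused tail afterwards; the $expected estimate, the sample data and the in-memory filehandle (perl 5.8 or later) are assumptions for illustration only.

```perl
use strict;
use warnings;

my $data = "a\nb\nc\n";        # stand-in for test_data.txt
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";

my $expected = 10;             # assumed upper estimate of the line count
my @unsorted;
$#unsorted = $expected - 1;    # pre-extend: slots 0 .. $expected-1

my $count = 0;
while ( my $line = <$fh> ) {
    chomp $line;
    $unsorted[ $count++ ] = $line;   # assign by index, not push
}
close($fh);

$#unsorted = $count - 1;       # drop the unused pre-allocated slots

print scalar(@unsorted), " lines: @unsorted\n";
```

If the file turns out to have more lines than estimated, the indexed assignment simply grows the array as usual.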
- what is the actual memory policy of Perl? Just because it is using
way more than I would expect... (using 600+ MB at the end) (I know how
Tcl handles memory allocation but I can't find a clue about Perl's
behaviour )

perldoc perldebguts, see "Debugging Perl Memory Usage"

- do I have any option to influence this policy?

Wait until Perl 6 when you can declare an array as holding native strings? :)
 
xhoster

Sadly, that's no good for us. It seems that I confused everyone with this
0 thing. That was just my test file (I tapped on the 0 for no reason).
In production the lines will contain any kind of data (integers,
floats, strings).

In micro-optimization problems, the exact nature of the data is very
important. If it were only integers and floats then you could still use
this, but of course with strings you can't. Since it is a mixture, maybe
you could detect which it is and use the $_+0 only for the non-string
cases. Of course, if you are then going to run this array through a stringy
sort (or anything else stringy) then this would be pointless or even
counterproductive.
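
A rough sketch of that detection, using looks_like_number from Scalar::Util as one possible test (my choice here, not something from the original code). The sample data and in-memory filehandle (perl 5.8 or later) are made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical mixed input: integers, floats and strings.
my $data = "12\n3.5\nhello\n0\n1e3\n";
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";

my @unsorted;
while ( my $line = <$fh> ) {
    chomp $line;
    # Store a numeric copy when the line parses as a number,
    # otherwise keep the string as it is.
    push @unsorted, looks_like_number($line) ? $line + 0 : $line;
}
close($fh);

print join( ", ", @unsorted ), "\n";
```

Note that converting to a number loses the original spelling: "1e3" comes back as 1000, so this is only safe when the textual form of the numbers does not matter.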
Honestly. Was there anybody out there who seriously thought that I want
to read in and sort a bunch of zeros??

I rather suspected you didn't want to do that, but that suspicion wasn't
sufficient to figure what you actually were doing. When in doubt, I assume
the poster meant what they said :)

You're right in a way, but "don't have to care about it" and "not knowing it"
are just not the same. I did not know it and I was highly surprised that
I couldn't find any usable information on the net.

I've posted quite a bit of usable (well, at least I used it) information
previously in this forum.

I don't want to dig
into Perl's core, I just would like to know. Call it curiosity or what
you wish.

Know what? Perl does a hell of a lot of things for you. Almost all of
those things have memory implications. That means there are a hell of a
lot of different things to (possibly) know. I've previously posted what I
consider to be Perl's memory policy. If you want more details, read
perldoc perlguts, or ask specific questions.

Actually I was wondering if there is a switch or option to make Perl
use a traditional memory policy, or to set the maximum allocatable
space in memory, or something like that ...

What is a "traditional memory policy"? It sounds like a memory policy
which won't let its daughter marry anyone outside the ethnic group/clan.

Setting maximum resource usage is generally done at the OS/shell/kernel
level (ulimit,limit, etc.) although there probably are modules for doing
it, too. But of course, once you run out of allowed memory, Perl doesn't
magically turn into C. It gives up.

Xho
 
Joe Smith

James said:
I'm astonished to discover I've been carrying that
misapprehension around for years! Literally.

I could have sworn I read something in the Camel that
explicitly warned against this. Was I just dreaming?

If you have a copy of the pink Camel book (1st ed), then
you are not dreaming. Things have changed since version 4.0.
-Joe
 
