EOF issue

IanW

I'm looping through each line of a text file, with a format like this:

# start of file
===============
Line 1
Line 2
Line 3
===============
Line 1
Line 2
Line 3
===============
Line 1
Line 2
Line 3

# end of file

I am processing it in chunks where "===============" is the start of each
chunk.

So after opening the file (FH), I while-loop through it like this:

my @chunk = ();
while (my $ln = <FH>) {
    if (($ln =~ /^={15}$/ or eof) and scalar @chunk > 0) {
        # process @chunk
        undef @chunk;
    }
    push(@chunk, $ln);
}

That works nicely except that when it gets to the end of the file "Line 3"
above is not added to @chunk. I would have expected eof to test positive
when it gets to the end of the blank line. Am I missing something obvious
here or is there a quirk with the usage of eof?

IanW
 
Graham Wood

IanW said:
I'm looping through each line of a text file
my @chunk = ();
while(my $ln = <FH>){
if(($ln =~ /^={15}$/ or eof) and scalar @chunk > 0){
# process @chunk
undef @chunk;
}
push(@chunk,$ln);
}

That works nicely except that when it gets to the end of the file "Line 3"
above is not added to @chunk. I would have expected eof to test positive
when it gets to the end of the blank line. Am I missing something obvious
here or is there a quirk with the usage of eof?

IanW

Your problem is that your while loop exits when it hits eof and
therefore you never hit the if statement when eof is true.

I'd suggest removing the eof from the condition and adding a "process
@chunk" after the while loop.

Someone will doubtless suggest a more elegant solution.

Graham
 
Gunnar Hjalmarsson

IanW said:
I'm looping through each line of a text file, with a format like
this:

# start of file
===============
Line 1
Line 2
Line 3
===============
Line 1
Line 2
Line 3
===============
Line 1
Line 2
Line 3

# end of file

I am processing it in chunks where "===============" is the start of
each chunk.

You can try playing with the $/ variable. See "perldoc perlvar".
 
Peter Wyzl

IanW said:
I'm looping through each line of a text file, with a format like this:

I am processing it in chunks where "===============" is the start of each
chunk.

So after opening the file (FH), I while-loop through it like this:

my @chunk = ();
while(my $ln = <FH>){
if(($ln =~ /^={15}$/ or eof) and scalar @chunk > 0){
# process @chunk
undef @chunk;
}
push(@chunk,$ln);
}

That works nicely except that when it gets to the end of the file "Line 3"
above is not added to @chunk. I would have expected eof to test positive
when it gets to the end of the blank line. Am I missing something obvious
here or is there a quirk with the usage of eof?



Here is how I would approach that problem...

(assume filehandle "IN" already opened for reading)

# don't need to explicitly declare an empty list, that is the default.
my @chunks;
{
    local $/ = ''; #slurp mode
    # puts all the 'chunks' into @chunks in one read
    @chunks = split /={15}\n/, <IN>;
}

# close <IN> here..

for (@chunks){
    #process chunks...
}


BTW, your problem is that the while loop encounters the eof and self
terminates therefore your test never happens. Part of the magic of the

while (<FH>){

}

construct...


HTH
 
IanW

Gunnar Hjalmarsson said:
You can try playing with the $/ variable. See "perldoc perlvar".

Well, yes, I could split the file into chunks with something like this:

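open(FH, $file) or die "Can't open $file: $!";
local $/;                # undef $/ reads the whole file in one go
my $lines = <FH>;
close(FH);
# /s lets . match newlines; \z anchors at the very end of the string
my @chunks = $lines =~ /={15}.+?(?=={15}|\z)/gs;
foreach (@chunks) {
    # process chunk
}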

(that's untested so there's probably something wrong with it, but I guess
you know what I mean)

but I'd still like to know why eof isn't working as I would expect it to.

Bigus
 
IanW

>> [..]
>> So after opening the file (FH), I while-loop through it like this:
>>
>> my @chunk = ();
>> while(my $ln = <FH>){
>> if(($ln =~ /^={15}$/ or eof) and scalar @chunk > 0){
>> # process @chunk
>> undef @chunk;
>> }
>> push(@chunk,$ln);
>> }
>>
>> That works nicely except that when it gets to the end of the file "Line 3"
>> above is not added to @chunk. I would have expected eof to test positive
>> when it gets to the end of the blank line. Am I missing something obvious
>> here or is there a quirk with the usage of eof?
>> [..]
>
> BTW, your problem is that the while loop encounters the eof and self
> terminates therefore your test never happens. Part of the magic of the
>
> while (<FH>){

Doh, of course. One oddity remains though.. In my original code I had this:

if($ln =~ /^={15}$/ and scalar @chunk > 0){

instead of the revised:

if(($ln =~ /^={15}$/ or eof) and scalar @chunk > 0){

without the eof test the whole last chunk (from = x 15 to the blank line at
the end of the file, inclusive) was skipped, because of course it was
looking for another line of ='s before processing the chunk. However, with
the eof test, it does actually process the last chunk, except misses off the
"Line 3". Any ideas on that?

IanW
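
On the remaining oddity: as far as the documented behaviour of eof goes, it returns true when the next read would hit end of file, so it is already true inside the loop body while the last line is being handled. In the loop as posted, that makes the chunk get processed before the last line is pushed, which would match the missing "Line 3". A minimal sketch of one way around it, testing eof only after the push; the in-memory sample data and the @processed array are assumptions made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the file described in the thread.
my $record = ('=' x 15) . "\nLine 1\nLine 2\nLine 3\n";
my $data   = $record x 3;

open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @processed;    # one array ref of lines per finished chunk
my @chunk;
while (my $ln = <$fh>) {
    # A separator line starts a new chunk: hand off the previous one first.
    if ($ln =~ /^={15}$/ and @chunk) {
        push @processed, [@chunk];
        @chunk = ();
    }
    push @chunk, $ln;

    # eof is already true while the last line is being handled, so test it
    # only after that line has been pushed.
    if (eof($fh)) {
        push @processed, [@chunk];
        @chunk = ();
    }
}
close $fh;

printf "%d chunks, %d lines in the last one\n",
    scalar @processed, scalar @{ $processed[-1] };
```

Processing the leftover @chunk once after the loop, without any eof test, is the other common way to get the same result.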
 
Peter Hickman

IanW said:
my @chunk = ();
while(my $ln = <FH>){
if(($ln =~ /^={15}$/ or eof) and scalar @chunk > 0){
# process @chunk
undef @chunk;
}
push(@chunk,$ln);
}

The while loop is only executed whilst there is data to process; once the file
is past the end, the loop stops processing. So the eof part of your test
is never reached. Thus after the loop is completed you have data in @chunk but
do not try to process it.

my @chunk = ();
while (my $ln = <FH>) {
    if ($ln =~ /^={15}$/) {
        process( @chunk ) if @chunk;
        undef @chunk;
    }
    push(@chunk, $ln);
}
process( @chunk ) if @chunk;

eof is rarely used, in my experience.
 
Tad McClellan

Graham Wood said:
> IanW wrote:

[ snip: code for building multiline records ]

> Your problem is that your while loop exits when it hits eof and
> therefore you never hit the if statement when eof is true.
>
> Someone will doubtless suggest a more elegant solution.


OK, how's this (untested)?


{
    local $/ = "===============\n";
    push @chunk, $_ while <FILE>;
    chomp @chunk; # chomp() knows what to remove too!
}
 
James Willmore

IanW said:
I'm looping through each line of a text file, with a format like this:

# start of file
===============
Line 1
Line 2
Line 3
===============
Line 1
Line 2
Line 3
===============
Line 1
Line 2
Line 3

# end of file

I am processing it in chunks where "===============" is the start of each
chunk.

So after opening the file (FH), I while-loop through it like this:

my @chunk = ();
while(my $ln = <FH>){
if(($ln =~ /^={15}$/ or eof) and scalar @chunk > 0){
# process @chunk
undef @chunk;
}
push(@chunk,$ln);
}

That works nicely except that when it gets to the end of the file "Line 3"
above is not added to @chunk. I would have expected eof to test positive
when it gets to the end of the blank line. Am I missing something obvious
here or is there a quirk with the usage of eof?

Try:

#!/usr/bin/perl

use strict;
use warnings;

open(IN, 'sample')
    or die "Can't open sample: $!\n";

my @chunk;
while (<IN>) {
    chomp;
    next if /^={15}$/ or length $_ == 0;
    push @chunk, $_;
}

close IN;

print join("\n", @chunk),"\n";

I ditched the $ln variable - I just operated on "$_". I also
changed the logic you used to test for a valid line (from 'and' to
'or'). I also chomp'ed the line - which causes the length of an
"empty" line to go to zero (which helps if you're testing the length
of the line - newlines and carriage returns do have length AFAIK ...
but I could be wrong on this point).

HTH

Jim
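
On the point about line length: a small sketch showing that the newline does count toward length until chomp removes it. The variable names here are made up for illustration.

```perl
use strict;
use warnings;

my $blank  = "\n";              # what <IN> returns for an empty line
my $before = length $blank;     # the newline itself counts as one character
chomp $blank;
my $after  = length $blank;     # nothing is left once chomp has removed it

print "before chomp: $before, after chomp: $after\n";
```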
 
James Willmore

Peter Hickman wrote:
> my @chunk = ();
> while(my $ln = <FH>){
> if($ln =~ /^={15}$/){

Where, in your example, have you actually assigned a value to
@chunk? I see $ln, but nothing related to @chunk. Not enough
coffee yet ... maybe ;-)

> process( @chunk ) if @chunk;
> undef @chunk;
> }
> push(@chunk,$ln);
> }
> process( @chunk ) if @chunk;
>
> eof is rarely used, in my experience.

True. Same here.

Jim
 
Thomas Kratz

IanW said:
Well, yes, I could split the file into chunks with something like this:

open(FH,$file);
local $/;
$lines = <FH>;
close(FH);
@chunks = $lines =~ /={15}.+?(?=\={15}|$)/g;
foreach(@chunk){
# process chunk
}

I think Gunnar meant something like:

{
    local $/ = '=' x 15 . "\n";
    while ( my $chunk = <FH> ) {
        chomp($chunk);
        next unless $chunk =~ /\w/;
        # process chunk
    }
}

Why separate the chunks yourself, when <> and chomp, in combination with
$/ can do the job?

Thomas

--
$/=$,,$_=<DATA>,s,(.*),$1,see;__END__
s,^(.*\043),,mg,@_=map{[split'']}split;{#>J~.>_an~>>e~......>r~
$_=$_[$%][$"];y,<~>^,-++-,?{$/=--$|?'"':#..u.t.^.o.P.r.>ha~.e..
'%',s,(.),\$$/$1=1,,$;=$_}:/\w/?{y,_, ,,#..>s^~ht<._..._..c....
print}:y,.,,||last,,,,,,$_=$;;eval,redo}#.....>.e.r^.>l^..>k^.-
 
Peter Hickman

James said:
Where, in your example, have you actually assigned a value to @chunk? I
see $ln, but nothing related to @chunk. Not enough coffee yet ... maybe
;-)

Perhaps the push?
 
Uri Guttman

PW> default.
PW> {
PW> local $/ = ''; #slurp mode

that is paragraph mode. undef is the value for slurp mode.
and File::Slurp is faster and a nicer API IMO :).

uri
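
A minimal sketch of the difference between the two settings, reading the same input once with $/ set to '' and once with it undefined. The two-paragraph sample string and the read_records helper are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

my $data = "a1\na2\n\nb1\nb2\n";    # hypothetical two-paragraph input

# Hypothetical helper: read every record from $data using the given $/.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @paragraphs = read_records('');       # paragraph mode: split on blank lines
my @slurped    = read_records(undef);    # slurp mode: the whole input at once

printf "paragraph mode: %d records, slurp mode: %d record\n",
    scalar @paragraphs, scalar @slurped;
```

With input that has no blank line before the end, as in the file under discussion, paragraph mode happens to return everything in one record too, which may be why the earlier snippet appeared to work.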
 
IanW

[..]
{
local $/ = "===============\n";
push @chunk, $_ while <FILE>;
chomp @chunk; # chomp() knows what to remove too!
}

where would one process each chunk in that - it seems to just slurp every
line of the file into @chunk except the "=" x 15 ones?

IanW
 
IanW

[..]

I don't understand the program logic here..

> local $/ = '=' x 15 . "\n";

ok, so changing the input record separator to '=' x 15 . "\n" (referred to
as "=15" hereafter for convenience), effectively specifying how you're going
to split the file (I actually need the =15 when I output the chunks after
processing them, but I can add that back in when printing out to file).

> while ( my $chunk = <FH> ) {

that's going to slurp in all text up to and including "Line 3" in my
original example on the first pass of the loop.

> chomp($chunk);

removes any newline characters from the very end of $chunk and I see when I
comment this line out in testing, it causes the following =15 to be slurped
into the chunk as well, but I've not yet figured out why that newline
character makes a difference since you've already told Perl you're splitting
the file up by =15?

> next unless $chunk =~ /\w/;
> # process chunk

So this says that we're going to process $chunk if it has an alphanumeric
character in it, which it would if the first chunk has been successfully
slurped in?

IanW
 
ctcgag

IanW said:
> [..]
>
> I don't understand the program logic here..
>
>> local $/ = '=' x 15 . "\n";
>
> ok, so changing the input record separator to '=' x 15 . "\n" (referred
> to as "=15" hereafter for convenience), effectively specifying how you're
> going to split the file (I actually need the =15 when I output the chunks
> after processing them, but I can add that back in when printing out to
> file).
>
>> while ( my $chunk = <FH> ) {
>
> that's going to slurp in all text up to and including "Line 3" in my
> original example on the first pass of the loop.

Up to and including the first Line 3, and also including the =15 after it.

> removes any newline characters from the very end of $chunk

It removes $/, which in this case is not just any newline, but also =15.

> and I see when
> I comment this line out in testing, it causes the following =15 to be
> slurped into the chunk as well, but I've not yet figured out why that
> newline character makes a difference since you've already told Perl
> you're splitting the file up by =15?
>
> So this says that we're going to process $chunk if it has an alphanumeric
> character in it, which it would if the first chunk has been successfully
> slurped in?

The normal use of $/ implies the =15 would come after each of your records.
What you actually have is a =15 before each record. What is the
difference? Well, you have an extra =15 at the start, which results (after
chomp) in an empty "record" the first time through the loop. So you have
to ignore that empty record. Of course, if it is legal to have empty
records in your file, then this method would ignore those as well, so you
can't use this method of ignoring the first (false) empty record if you
need to deal with real empty records.

Also, you don't have a =15 as the last line in your file, as you would
if =15 was the record ender rather than record starter. But that doesn't
cause any problems, because chomp only removes $/ if it is present and
does nothing if it is absent.

Xho
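
The two effects described above (an empty first record after chomp, and chomp leaving the last record alone because no separator follows it) can be seen in a short sketch. The sample data is hypothetical and modelled on the file layout from the original question.

```perl
use strict;
use warnings;

my $sep  = ('=' x 15) . "\n";
my $data = ($sep . "Line 1\nLine 2\nLine 3\n") x 3;    # hypothetical sample

my @records;
{
    local $/ = $sep;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    while (my $chunk = <$fh>) {
        chomp $chunk;             # strips a trailing $/ only if one is there
        push @records, $chunk;
    }
    close $fh;
}

# The leading separator yields one empty record ahead of the three real ones.
printf "%d records, first is %s\n",
    scalar @records, $records[0] eq '' ? 'empty' : 'not empty';
```

Skipping records that are empty after chomp, instead of testing for \w, is one alternative, though it has the same limitation if genuinely empty records can occur.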
 
Ben Morrow

Quoth IanW:
> [..]
>
> I don't understand the program logic here..
>
>> local $/ = '=' x 15 . "\n";
>
> ok, so changing the input record separator to '=' x 15 . "\n" (referred to
> as "=15" hereafter for convenience), effectively specifying how you're going
> to split the file (I actually need the =15 when I output the chunks after
> processing them, but I can add that back in when printing out to file).

...and you can get Perl to do that for you too, with

local $\ = $/;

Ben
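
A minimal sketch of that idea: with $\ set to the same string as $/, every print appends the separator that chomp removed on input. Note that $\ is added after each record, so this sample assumes a layout with the separator following each record; with the separator-first layout from the original question the =15 lines would come out after the chunks rather than before them. The sample data and the uc call standing in for real processing are made up for illustration.

```perl
use strict;
use warnings;

my $sep  = ('=' x 15) . "\n";
my $data = "first\n${sep}second\n${sep}";    # hypothetical: separator after each record
my $out  = '';

{
    local $/ = $sep;
    local $\ = $/;    # print now appends the separator automatically
    open my $in,  '<', \$data or die "Can't open input: $!";
    open my $dst, '>', \$out  or die "Can't open output: $!";
    while (my $chunk = <$in>) {
        chomp $chunk;               # drop the separator for processing
        print {$dst} uc $chunk;     # uc is a stand-in for real processing
    }
    close $in;
    close $dst;
}

print $out;
```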
 
A. Sinan Unur

[..]
{
local $/ = "===============\n";
push @chunk, $_ while <FILE>;
chomp @chunk; # chomp() knows what to remove too!
}

where would one process each chunk in that - it seems to just slurp
every line of the file into @chunk except the "=" x 15 ones?

You might want to try and see what each element of the array holds as well
as read about $/ in perldoc perlvar.

Sinan.
 
Tad McClellan

IanW said:
> [..]
> {
> local $/ = "===============\n";
> push @chunk, $_ while <FILE>;
> chomp @chunk; # chomp() knows what to remove too!
> }
>
> where would one process each chunk in that

You wouldn't, you could process each chunk _outside_ of that
by iterating over @chunk.

If you want to process chunks as they come in rather than gather
them all into an array, then you can do that too:

{
    local $/ = "===============\n";
    while ( <FILE> ) {
        # process chunk (a multiline string)
    }
}

> - it seems to just slurp every
> line of the file into @chunk

Each element of @chunk is, well, a chunk. You had a single chunk
in @chunk in your code with a _line_ from the chunk as the
array elements.

> except the "=" x 15 ones?

No, it gets the =15 ones too, but the chomp() then removes them.
 
