Using loop labels and iterating.

Nene · Feb 16, 2012

Hi,

This is a script that opens up a log file and searches for the word
'started' and if it finds the word, it prints it and iterates to the
next log file. But for the servers that didn't start, I want it to
print "didn't start". Please help, thanks.

#!/usr/bin/perl
use diagnostics;
use warnings;

open(SCRATCHPAD,"LOG.txt") or die "can't open $!";
my @uniq = <SCRATCHPAD>;

NODE:
foreach my $stuff ( @uniq ) {
chomp($stuff);

open(STARTUP, "/c\$/$stuff/esp.log") or "die can't open $!";
my @log_file = <STARTUP>;

LINE:
for my $line ( @log_file ) {

next LINE if $line !~ /ESP server (started)./;
my $capture = "$1\n";
print "$stuff => $capture";

close(STARTUP);
close(SCRATCHPAD);

}
}

Rainer Weikusat · Feb 17, 2012

Ben Morrow said:
use File::Slurp qw/slurp/;

my $startup = slurp "/c\$/$stuff/esp.log";
$startup =~ /ESP server started/
and print "$stuff => started\n";

Unless the logfile is *huge* (by which I mean several GB, these days),
it's going to be more efficient to read and match it all in one go.

It is probably going to be faster which might be a good thing if the
assumption that the machine is exclusively dedicated to this
particular task holds because it will then hog CPU and memory most
effectively. OTOH, except if the logfile is *huge*, is the difference
large enough to matter, especially considering that the computer will
very likely have more important things to do than 'monitoring itself'?

OTOH, the source code of that was 'an interesting read'. I suggest
that 'Author: Uri Guttman' should be considered a sufficient reason to
avoid any code blindly, based on that.

Rainer Weikusat · Feb 17, 2012

[...]

OTOH, the source code of that was 'an interesting read'.

For instance, if PERL_IMPLICIT_SYS is not set, sysread will simply do
a read system call and that may return less data than was requested
for any number of reasons. As far as I could determine, the module
does a single sysread call for 'small files' and returns the
results. The way 'atomic updates' are implemented is known to not work
with certain filesystems because it relies on rename having barrier
semantics wrt data writes and this might not be the case (off the top of
my head, this will break on 'older' versions of ext4, on XFS and on any
filesystem which always performs metadata updates synchronously, IOW,
UFS and FFS). There's some hardcoded support for dealing with
'Windows' textfiles in an atrociously inefficient way (by doing
single-character deletions on the read buffer which is an O(n*n)
algorithm). There's no support for dealing with any other kind of
textfiles.
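
The short-read concern above can be illustrated with a minimal sketch. It assumes nothing about how File::Slurp is actually written; `read_fully` and the 65536-byte chunk size are hypothetical choices for illustration. The point is only that code calling sysread directly has to loop until end of file, because a single call may return less than was asked for.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): read a whole file with
# sysread, looping because one call may return fewer bytes than requested.
sub read_fully {
    my ($path) = @_;

    open(my $fh, '<:raw', $path) or die("open: $path: $!");

    my $buf = '';
    while (1) {
        # Append at the current end of $buf; 65536 is an arbitrary chunk size.
        my $n = sysread($fh, $buf, 65536, length($buf));
        die("sysread: $path: $!") unless defined($n);
        last if $n == 0;    # a return value of 0 means end of file
    }

    close($fh);
    return $buf;
}
```

A caller would use it as `my $data = read_fully($some_path);`. The loop ends only when sysread reports end of file, so a short read in the middle of the file does not truncate the result.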

Wolf Behrenhoff · Feb 17, 2012

On 17.02.2012 16:46, Rainer Weikusat wrote:

[...]

OTOH, the source code of that was 'an interesting read'.

Talking about this:

what is actually the advantage of sysread over the "Perl way" of reading
from a file with code like $result = <$fileHandle> (or @result =
<$fileHandle>, depending on wantarray)? To slurp, I simply undef $/ and
everything seems fine... I have never used sysread in Perl - should I
consider using it?

I didn't expect such long code for something as "simple" as reading in
a file, especially not a comment telling me it is using "DEEP DARK MAGIC".

As I am usually using an "enterprise" Linux distribution (RHEL clone), I
still need to make scripts compatible with 5.8.8 where File::Slurp is
not a core module...

- Wolf

Rainer Weikusat · Feb 17, 2012

[...]

No, not unless you have a reason to. I use File::Slurp because (and only
because) it has a clean, simple interface for getting a file into a
string, without needing to mess about opening filehandles and setting
globals.

Provided that's really just what you want, consider using this:

sub slurp
{
my $fh;

open($fh, '<', $_[0]) or die("open: $_[0]: $!");
local $/ unless wantarray();
return <$fh>;
}

That's less buggy (because it leaves the intricacies of dealing with
system specific I/O to perl) and possibly even faster (again because
it uses features perl already has).
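
As a rough sketch of how this slurp sub could be applied to the original question: the sub is repeated from the post above so the example is self-contained, while `esp_status` is a hypothetical helper name and the pattern is the one used earlier in the thread.

```perl
use strict;
use warnings;

# The slurp sub from the post above, repeated so this sketch stands alone.
sub slurp {
    my $fh;

    open($fh, '<', $_[0]) or die("open: $_[0]: $!");
    local $/ unless wantarray();
    return <$fh>;
}

# Hypothetical helper: classify one log file by matching the whole
# contents in one go instead of looping over lines.
sub esp_status {
    my ($log_path) = @_;

    my $log = slurp($log_path);    # scalar context: whole file as one string
    return $log =~ /ESP server started/ ? 'started' : "didn't start";
}
```

In the original loop this would be called as `print "$stuff => ", esp_status("/c\$/$stuff/esp.log"), "\n";`, which covers both the 'started' and the "didn't start" case without any loop labels.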

Rainer Weikusat · Feb 17, 2012

Wolf Behrenhoff said:
[...]

OTOH, the source code of that was 'an interesting read'.

Talking about this:

what is actually the advantage of sysread over the "Perl way" of reading
from a file with code like $result = <$fileHandle> (or @result =
<$fileHandle>, depending on wantarray)? To slurp, I simply undef $/ and
everything seems fine... I have never used sysread in Perl - should I
consider using it?

It bypasses the Perl I/O buffering mechanism, including any
translation layers etc. which might be active as part of that. I
wouldn't use it for reading 'text files'. It is, however, useful when
more control over the actual I/O operations performed by a program is
required than the read-in-advance/write-behind buffering mechanism
offers. This would usually be the case either if the I/O is actually
'real-time' IPC, eg, when the program is acting as a server on an
AF_UNIX datagram socket, or when reliability is an important concern,
eg, when doing 'atomic updates' of files which must (to the degree the
application can guarantee this) remain in a consistent state even when
power suddenly goes away. Since "whining about the evil filesystem"
doesn't really help to solve the problem, this requires a multi-step
procedure where one step must have been completed before the next one
commences.
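
One possible shape of such a multi-step procedure, as a sketch only: `replace_file_carefully` and the temporary-name scheme are hypothetical, the data is assumed to be a byte string, and the final step of syncing the containing directory is left out. It writes the new contents to a temporary file with syswrite, forces them to disk with the `sync` method from IO::Handle, and only then renames the temporary file over the old one.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the sync method on filehandles

# Hypothetical helper: each step must succeed before the next one starts.
sub replace_file_carefully {
    my ($path, $data) = @_;
    my $tmp = "$path.tmp.$$";    # assumed naming scheme, same directory

    open(my $fh, '>:raw', $tmp) or die("open: $tmp: $!");

    # Step 1: write everything; syswrite may write less than requested.
    my $off = 0;
    while ($off < length($data)) {
        my $n = syswrite($fh, $data, length($data) - $off, $off);
        die("syswrite: $tmp: $!") unless defined($n);
        $off += $n;
    }

    # Step 2: ask the OS to put the data on stable storage
    # (sync is not implemented on every platform).
    $fh->sync or die("sync: $tmp: $!");
    close($fh) or die("close: $tmp: $!");

    # Step 3: only now make the new contents visible under the real name.
    rename($tmp, $path) or die("rename: $tmp -> $path: $!");
    return 1;
}
```

Because the data is synced explicitly before the rename, this sketch does not rely on rename acting as a barrier for data writes, which is the assumption criticised earlier in the thread.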

Peter J. Holzer · Feb 17, 2012

On 17.02.2012 16:46, Rainer Weikusat wrote:

[...]

OTOH, the source code of that was 'an interesting read'.

Talking about this:

what is actually the advantage of sysread over the "Perl way" of reading
from a file with code like $result = <$fileHandle> (or @result =
<$fileHandle>, depending on wantarray)?

I'll just quote from a posting I wrote about 2 years ago (incidentally
in reply to Uri Guttman's claim that sysread is faster):

[...]
| So I grabbed the server with the fastest disks I had access to (disk
| array of SSDs), created a file with 400 million lines of 80 characters
| (plus newline) each and ran some benchmarks:
|
| method                  time      speed (MB/s)
| ----------------------------------------------
| perlio $/ = "\n"        2:35.12   209
| perlio $/ = \4096       1:35.36   340
| perlio $/ = \1048576    1:35.25   340
| sysread bs = 4096       1:35.28   340
| sysread bs = 1048576    1:35.18   340
|
| The times are the median of three runs. Times between the runs differed
| by about 1 second, so the difference between reading line by line and
| block by block is significant, but the difference between perlio and
| sysread or between different blocksizes isn't.
|
| I was a bit surprised that reading line by line was so much slower than
| blockwise reading. Was it because of the higher loop overhead (81 bytes
| read per loop instead of 4096 means 50 times more overhead) or because
| splitting a block into lines is so expensive?
|
| So I did another run of benchmarks with different block sizes:
|
| method                      block   user      system   cpu   total
| read_file_by_perlio_block   4096    0.64s     26.87s   31%   1:27.91
| read_file_by_perlio_block   2048    1.48s     28.65s   34%   1:28.56
| read_file_by_perlio_block   1024    5.14s     29.03s   37%   1:30.59
| read_file_by_perlio_block   512     11.98s    31.33s   47%   1:31.22
| read_file_by_perlio_block   256     26.84s    33.13s   61%   1:36.85
| read_file_by_perlio_block   128     43.53s    29.05s   71%   1:41.66
| read_file_by_perlio_block   64      77.26s    28.16s   88%   1:59.70
| read_file_by_line                   104.68s   28.01s   93%   2:22.34
|
| (the times are a bit lower now because here the system was idle while it
| had a (relatively constant) load during the first batch)
|
| As expected elapsed time as well as CPU time increases with shrinking
| block size. However, even at 64 bytes, reading in blocks is still 20%
| faster than reading in lines, even though the loop is now executed 27%
| more often.
|
| Conclusions:
|
| * The difference between sysread and blockwise <> isn't even measurable.
|
| * Above 512 Bytes the block size matters very little (and above 4k, not
| at all).
|
| * Reading line by line is significantly slower than reading by blocks.

To slurp, I simply undef $/ and
everything seems fine...

I haven't benchmarked $/ = undef but based on the results above I would
expect it to be as fast as sysread.
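
For readers unfamiliar with the `$/ = \4096` notation in the tables: setting `$/` to a reference to an integer makes `<$fh>` return fixed-size blocks instead of lines. A minimal sketch, with `count_lines_blockwise` as a hypothetical name; it is not the benchmark code from the quoted posting, which is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count newlines while reading through perlio in
# fixed-size blocks rather than line by line.
sub count_lines_blockwise {
    my ($path, $block_size) = @_;

    open(my $fh, '<', $path) or die("open: $path: $!");

    # A reference to an integer in $/ selects record reads of that size.
    local $/ = \$block_size;

    my $lines = 0;
    while (defined(my $block = <$fh>)) {
        $lines += ($block =~ tr/\n//);    # tr counts matches without changing $block
    }

    close($fh);
    return $lines;
}
```

Called as `count_lines_blockwise($path, 4096)`, it returns the same count for any positive block size; only the number of loop iterations changes.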

hp

John W. Krahn · Feb 17, 2012

Nene said:
Hi,

This is a script that opens up a log file and searches for the word
'started' and if it finds the word, it prints it and iterates to the
next log file. But for the servers that didn't start, I want it to
print "didn't start". Please help, thanks.

#!/usr/bin/perl
use diagnostics;
use warnings;

open(SCRATCHPAD,"LOG.txt") or die "can't open $!";
my @uniq =<SCRATCHPAD>;

NODE:
foreach my $stuff ( @uniq ) {
chomp($stuff);

open(STARTUP, "/c\$/$stuff/esp.log") or "die can't open $!";
my @log_file =<STARTUP>;

LINE:
for my $line ( @log_file ) {

next LINE if $line !~ /ESP server (started)./;
my $capture = "$1\n";
print "$stuff => $capture";

close(STARTUP);
close(SCRATCHPAD);

}
}

This may work better (UNTESTED):

#!/usr/bin/perl
use warnings;
use strict;

open my $SCRATCHPAD, '<', 'LOG.txt'
    or die "can't open 'LOG.txt' because: $!";

NODE:
while ( my $stuff = <$SCRATCHPAD> ) {
    chomp $stuff;
    open my $STARTUP, '<', "/c\$/$stuff/esp.log"
        or die "can't open '/c\$/$stuff/esp.log' because: $!";
    while ( my $line = <$STARTUP> ) {
        if ( $line =~ /ESP server started\./ ) {
            print "$stuff => started\n";
            # Found it: skip the fallback below and go to the next server.
            next NODE;
        }
    }
    # Reached only when no line of this log matched.
    print "$stuff => didn't start\n";
}

__END__

John
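
To see the `next NODE` control flow in isolation, here is a small sketch that runs without any log files. The `%logs` hash and the server names `alpha` and `beta` are made-up stand-ins for the per-server esp.log files, and `report` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the per-server log files.
my %logs = (
    alpha => [ "booting\n", "ESP server started.\n" ],
    beta  => [ "booting\n", "fatal error\n" ],
);

sub report {
    my ($logs) = @_;
    my @out;

    NODE:
    for my $node (sort keys %$logs) {
        for my $line (@{ $logs->{$node} }) {
            if ($line =~ /ESP server started\./) {
                push @out, "$node => started";
                next NODE;    # leave the inner loop, continue with the next server
            }
        }
        # Reached only when the inner loop finished without a match.
        push @out, "$node => didn't start";
    }
    return @out;
}

print "$_\n" for report(\%logs);
# prints:
#   alpha => started
#   beta => didn't start
```

The label matters because a plain `next` inside the inner loop would only move on to the next line of the same log, and the fallback message would then be printed for every server.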
