ISO two Perl idioms...

B

bill

There are a couple of situations that I encounter frequently, and
I wonder if there are established Perl idioms for dealing with them.

1. The problem is to read and discard lines from a file until a
condition is met (usually some regexp match), and then process
subsequent lines. E.g.

my $found;
while (<IN>) {
if (meets_condition($_)) {
$found = 1;
last;
}
}

die "Bad data\n" unless $found;

while (<IN>) {
# etc.
}


2. Finding the first position at which two strings differ.

sub first_different {
my $len = (sort map length, @_[0, 1])[-1];
my $n;
for (my $i = 0; $i < $len; ++$i) {
if (substr($_[0], $i, 1) ne substr($_[1], $i, 1)) {
$n = $i;
last;
}
}
return defined $n ? $n : -1;
}

Are there nice Perl idioms for either of these two tasks?

-bill



P.S. This is not golf, now! I'm *sure* that it is possible to use
some insanely arcane Perl constructs to make the above a *lot* more
succinct, but the results probably would not qualify as "idioms"
in my book. Idioms are pithy, but not terribly obscure. They
don't leave you scratching your head after several minutes of trying
to decrypt them. (I'd say that

select((select(FH), $|=1)[0]);

is the most obscure construct that I'd still be willing to call an
"idiom".)
 
T

Tassilo v. Parseval

Also sprach bill:
There are a couple of situations that I encounter frequently, and
I wonder if there are established Perl idioms for dealing with them.
2. Finding the first position at which two strings differ.

sub first_different {
my $len = (sort map length, @_[0, 1])[-1];
my $n;
for (my $i = 0; $i < $len; ++$i) {
if (substr($_[0], $i, 1) ne substr($_[1], $i, 1)) {
$n = $i;
last;
}
}
return defined $n ? $n : -1;
}

Are there nice Perl idioms for either of these two tasks?

There is the infamous xor trick for this one:

$a = "foobar";
$b = "foobAr";
($a ^ $b) =~ /^(\0*)/ and print length $1;
__END__
4

It makes use of the fact that xoring two identical values yields zero.
Otherwise something non-zero (1 when looking at each bit). So after
xoring you just have to find the first non-NULL character.
P.S. This is not golf, now! I'm *sure* that it is possible to use
some insanely arcane Perl constructs to make the above a *lot* more
succinct, but the results probably would not qualify as "idioms"
in my book. Idioms are pithy, but not terribly obscure.

I don't know whether the above qualifies as obscure or golf. It is
certainly used in golf once in a while. Nonetheless I think it can look
quite elegant to people having a certain affinity towards bitwise
operations.

Tassilo
 
R

Richard Morse

bill said:
1. The problem is to read and discard lines from a file until a
condition is met (usually some regexp match), and then process
subsequent lines. E.g.

my $found;
while (<IN>) {
if (meets_condition($_)) {
$found = 1;
last;
}
}

die "Bad data\n" unless $found;

while (<IN>) {
# etc.
}

I think that you can do this with the '..' operator:

#!/usr/bin/perl
use strict;
use warnings;

while(<DATA>) {
if (m/now/ .. eof(DATA)) {
print $_;
}
}

__END__
this
is
a
test
now
i
will
print

This will still print the line that you are matching on, but you can
work with this...

HTH,
Ricky
 
A

Anno Siegel

Tassilo v. Parseval said:
Also sprach bill:
There are a couple of situations that I encounter frequently, and
I wonder if there are established Perl idioms for dealing with them.
2. Finding the first position at which two strings differ.

sub first_different {
my $len = (sort map length, @_[0, 1])[-1];
my $n;
for (my $i = 0; $i < $len; ++$i) {
if (substr($_[0], $i, 1) ne substr($_[1], $i, 1)) {
$n = $i;
last;
}
}
return defined $n ? $n : -1;
}

Are there nice Perl idioms for either of these two tasks?

There is the infamous xor trick for this one:

$a = "foobar";
$b = "foobAr";
($a ^ $b) =~ /^(\0*)/ and print length $1;
__END__
4

It makes use of the fact that xoring two identical values yields zero.
Otherwise something non-zero (1 when looking at each bit). So after
xoring you just have to find the first non-NULL character.
P.S. This is not golf, now! I'm *sure* that it is possible to use
some insanely arcane Perl constructs to make the above a *lot* more
succinct, but the results probably would not qualify as "idioms"
in my book. Idioms are pithy, but not terribly obscure.

I don't know whether the above qualifies as obscure or golf. It is
certainly used in golf once in a while. Nonetheless I think it can look
quite elegant to people having a certain affinity towards bitwise
operations.

I was going to suggest the xor solution (note promotion from "infamous
trick" to "solution") if you hadn't. What I wonder is how it would
hold up with multibyte characters.

On a more general note, trie data structures have been invented to
tackle the prefix problem (or could have been invented for that
purpose, not knowing the actual history). There are at least two
trie modules on CPAN.

So, instead of an idiom, there's a trick and a couple of modules.

As to the other problem (skipping initial lines in a file), the
".." operator has also been suggested. I'd use it in a single
loop, somewhat like this:

while ( <DATA> ) {
next unless /ccc/ .. do { 0};
print;
}

The funny way of saying "0" (meaning forever as the second operand of
".."), "do { 0}" is necessary because ".." compares literal operands
to "$.". With this quirk, I wouldn't claim idiom status for this
either.

Anno
 
B

Ben Morrow

Quoth Richard Morse said:
I think that you can do this with the '..' operator:

while(<DATA>) {
if (m/now/ .. eof(DATA)) {

This could just as well be written

if (/now/ .. 0) {

This will still print the line that you are matching on, but you can
work with this...

if ( (/now/ .. 0) > 1 ) {

will exclude the first line (see perldoc perlop).

Ben
 
B

Ben Morrow

Quoth Ben Morrow said:
This could just as well be written

if (/now/ .. 0) {

....except that, as Anno pointed out, this won't work. D'oh.

if (/now/ .. do{0}) {

or

if (/now/ .. !1) {

which are slightly nasty...

Ben
 
D

David K. Wall

bill said:
1. The problem is to read and discard lines from a file until a
condition is met (usually some regexp match), and then process
subsequent lines. E.g.

my $found;
while (<IN>) {
if (meets_condition($_)) {
$found = 1;
last;
}
}

die "Bad data\n" unless $found;

This could be a bit shorter and still not be obfuscated:

while (<DATA>) {
last if meets_condition($_);
}
die "Bad data" unless $_;
 
B

bill

I must say that reading the replies to my original query has be
great fun. Thar's gold in them thar posts!

Thanks!!!

-bill
 
T

Tassilo v. Parseval

Also sprach Anno Siegel:
I was going to suggest the xor solution (note promotion from "infamous
trick" to "solution") if you hadn't. What I wonder is how it would
hold up with multibyte characters.

Not very well, I guess. :) Depends a bit on the circumstances. When
both strings are in the same encoding (preferably not a variable-width
encoding such as UTF-8 is one), it could even be used there. And
naturally, the xor-approach can only report the byte- and not the
character-number at which two strings begin to differ.
On a more general note, trie data structures have been invented to
tackle the prefix problem (or could have been invented for that
purpose, not knowing the actual history). There are at least two
trie modules on CPAN.

The OP's problem looks a bit too basic to benefit from a trie. It's just
a matter of walking through two strings in parallel and compare each
character until they are no longer equal which would be a two-liner in C
(something like 'while (*a && *a++ == *b++) i++;').

When generalizing the problem a bit (namely finding the longest common
substring within potentially large strings), a trie comes in handy. I
think it solve this problem even in linear time.

Tassilo
 
U

Uri Guttman

b> 1. The problem is to read and discard lines from a file until a
b> condition is met (usually some regexp match), and then process
b> subsequent lines. E.g.

b> my $found;
b> while (<IN>) {
b> if (meets_condition($_)) {
b> $found = 1;
b> last;
b> }
b> }

b> die "Bad data\n" unless $found;

b> while (<IN>) {
b> # etc.
b> }

i read the other posts and none seem to use subs which make for a great
flow control mechanisim. and i don't like to use boolean flags for loop
stuff if i can help it that it reeks of fortran and spaghetti code.

sub skip_until_condition {
my( $fh, $cond_code) = @_ ;

while( <$fh> ) {
return if $cond_code->($_) ;
}

die "lousy file i just read" ;
}

now in the main code you just do:

skip_until_condition( \*IN, \&meets_condition ) ;

while( <IN> ) {

...
}

clean and easy to understand. the lesson is that return is a flow
control op too and that subs are your friend.

uri
 
D

David K. Wall

Uri Guttman said:
i read the other posts and none seem to use subs which make for a
great flow control mechanisim. and i don't like to use boolean
flags for loop stuff if i can help it that it reeks of fortran and
spaghetti code.

sub skip_until_condition {
my( $fh, $cond_code) = @_ ;

while( <$fh> ) {
return if $cond_code->($_) ;
}

die "lousy file i just read" ;
}

now in the main code you just do:

skip_until_condition( \*IN, \&meets_condition ) ;

while( <IN> ) {

...
}

clean and easy to understand. the lesson is that return is a flow
control op too and that subs are your friend.

OK, here's my original solution. I thought maybe it was overkill, so
I dropped the sub before posting. :)

sub read_until {
my ($filehandle, $condition) = @_;
while (<$filehandle>) {
last if $condition->($_);
}
return $_;
}

my $found = read_until( \*DATA, sub { ... } );
die "Nothing found!" unless $found;

I like returning the line that meets the condition, because it defers
the choice of what to do with it.
 
U

Uri Guttman

DKW> OK, here's my original solution. I thought maybe it was overkill, so
DKW> I dropped the sub before posting. :)

DKW> sub read_until {
DKW> my ($filehandle, $condition) = @_;
DKW> while (<$filehandle>) {
DKW> last if $condition->($_);

return $_ if $condition->($_);

DKW> }
DKW> return $_;

return ;

DKW> }

DKW> my $found = read_until( \*DATA, sub { ... } );
DKW> die "Nothing found!" unless $found;
^defined

what if the last line was '0' without a newline? that is the classic
(but very rare) gotcha of while(<>) in old perls (fixed with the implied
defined wrapper).

DKW> I like returning the line that meets the condition, because it defers
DKW> the choice of what to do with it.

that is ok but the OP never spec'ed it that way. he made it sound like
that line is just a marker and has no significant data.

uri
 
D

David K. Wall

Uri Guttman said:
DKW> OK, here's my original solution. I thought maybe it was
overkill, so DKW> I dropped the sub before posting. :)

DKW> sub read_until {
DKW> my ($filehandle, $condition) = @_;
DKW> while (<$filehandle>) {
DKW> last if $condition->($_);

return $_ if $condition->($_);

DKW> }
DKW> return $_;

return ;

DKW> }

DKW> my $found = read_until( \*DATA, sub { ... } );
DKW> die "Nothing found!" unless $found;
^defined

what if the last line was '0' without a newline?

Then there's a bug that might be hard to find. Oops. Bummer. I liked
that feature. It's easy enough to return a boolean AND the line, but
that's getting too crufty for something simple.
that is the
classic (but very rare) gotcha of while(<>) in old perls (fixed
with the implied defined wrapper).

DKW> I like returning the line that meets the condition, because
it defers DKW> the choice of what to do with it.

that is ok but the OP never spec'ed it that way. he made it sound
like that line is just a marker and has no significant data.

Sure. I was just putting my own preferences into it.
 
B

Ben Morrow

Quoth "David K. Wall said:
OK, here's my original solution. I thought maybe it was overkill, so
I dropped the sub before posting. :)

sub read_until {
my ($filehandle, $condition) = @_;
while (<$filehandle>) {
last if $condition->($_);
}
return $_;
}

my $found = read_until( \*DATA, sub { ... } );
die "Nothing found!" unless $found;

I like returning the line that meets the condition, because it defers
the choice of what to do with it.

Purely as a matter of style, I'd write it as

use Symbol;

sub read_until (&*) {
my ($cond, $FH) = @_;
local $_;
$FH = Symbol::qualify_to_ref $FH, caller;
$FH ||= \*ARGV;
while (<$FH>) {
last if $cond->($_);
}
return $_;
}

so you can call it as

my $found = read_until { ... } DATA;

; and I'd probably add a hash of params on the end so you could set $/
locally, croak with a given message if the condition wasn't met, etc.

Ben
 
D

David K. Wall

Dave Weaver said:
Perhaps you meant:

die "Bad data" unless meets_condition($_);
?

Otherwise a value of 0 in the data that meets the condition will
give "Bad data".

Yes, Uri Guttman pointed that out later in the thread.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,269
Messages
2,571,099
Members
48,773
Latest member
Kaybee

Latest Threads

Top