ISO two Perl idioms...

bill · May 20, 2004

There are a couple of situations that I encounter frequently, and
I wonder if there are established Perl idioms for dealing with them.

1. The problem is to read and discard lines from a file until a
condition is met (usually some regexp match), and then process
subsequent lines. E.g.

my $found;
while (<IN>) {
if (meets_condition($_)) {
$found = 1;
last;
}
}

die "Bad data\n" unless $found;

while (<IN>) {
# etc.
}

2. Finding the first position at which two strings differ.

sub first_different {
my $len = (sort map length, @_[0, 1])[-1];
my $n;
for (my $i = 0; $i < $len; ++$i) {
if (substr($_[0], $i, 1) ne substr($_[1], $i, 1)) {
$n = $i;
last;
}
}
return defined $n ? $n : -1;
}

Are there nice Perl idioms for either of these two tasks?

-bill

P.S. This is not golf, now! I'm *sure* that it is possible to use
some insanely arcane Perl constructs to make the above a *lot* more
succinct, but the results probably would not qualify as "idioms"
in my book. Idioms are pithy, but not terribly obscure. They
don't leave you scratching your head after several minutes of trying
to decrypt them. (I'd say that

select((select(FH), $|=1)[0]);

is the most obscure construct that I'd still be willing to call an
"idiom".)

Tassilo v. Parseval · May 20, 2004

Also sprach bill:

There are a couple of situations that I encounter frequently, and
I wonder if there are established Perl idioms for dealing with them.

2. Finding the first position at which two strings differ.

sub first_different {
my $len = (sort map length, @_[0, 1])[-1];
my $n;
for (my $i = 0; $i < $len; ++$i) {
if (substr($_[0], $i, 1) ne substr($_[1], $i, 1)) {
$n = $i;
last;
}
}
return defined $n ? $n : -1;
}

Are there nice Perl idioms for either of these two tasks?

There is the infamous xor trick for this one:

$a = "foobar";
$b = "foobAr";
($a ^ $b) =~ /^(\0*)/ and print length $1;
__END__
4

It makes use of the fact that xoring two identical values yields zero.
Otherwise something non-zero (1 when looking at each bit). So after
xoring you just have to find the first non-NULL character.

P.S. This is not golf, now! I'm *sure* that it is possible to use
some insanely arcane Perl constructs to make the above a *lot* more
succinct, but the results probably would not qualify as "idioms"
in my book. Idioms are pithy, but not terribly obscure.

I don't know whether the above qualifies as obscure or golf. It is
certainly used in golf once in a while. Nonetheless I think it can look
quite elegant to people having a certain affinity towards bitwise
operations.

Tassilo

Richard Morse · May 20, 2004

bill said:
1. The problem is to read and discard lines from a file until a
condition is met (usually some regexp match), and then process
subsequent lines. E.g.

my $found;
while (<IN>) {
if (meets_condition($_)) {
$found = 1;
last;
}
}

die "Bad data\n" unless $found;

while (<IN>) {
# etc.
}

I think that you can do this with the '..' operator:

#!/usr/bin/perl
use strict;
use warnings;

while(<DATA>) {
if (m/now/ .. eof(DATA)) {
print $_;
}
}

__END__
this
is
a
test
now
i
will
print

This will still print the line that you are matching on, but you can
work with this...

HTH,
Ricky

Anno Siegel · May 20, 2004

Tassilo v. Parseval said:
Also sprach bill:

There are a couple of situations that I encounter frequently, and
I wonder if there are established Perl idioms for dealing with them.

Click to expand...

2. Finding the first position at which two strings differ.

sub first_different {
my $len = (sort map length, @_[0, 1])[-1];
my $n;
for (my $i = 0; $i < $len; ++$i) {
if (substr($_[0], $i, 1) ne substr($_[1], $i, 1)) {
$n = $i;
last;
}
}
return defined $n ? $n : -1;
}

Are there nice Perl idioms for either of these two tasks?

Click to expand...

There is the infamous xor trick for this one:

$a = "foobar";
$b = "foobAr";
($a ^ $b) =~ /^(\0*)/ and print length $1;
__END__
4

It makes use of the fact that xoring two identical values yields zero.
Otherwise something non-zero (1 when looking at each bit). So after
xoring you just have to find the first non-NULL character.

P.S. This is not golf, now! I'm *sure* that it is possible to use
some insanely arcane Perl constructs to make the above a *lot* more
succinct, but the results probably would not qualify as "idioms"
in my book. Idioms are pithy, but not terribly obscure.

Click to expand...

I don't know whether the above qualifies as obscure or golf. It is
certainly used in golf once in a while. Nonetheless I think it can look
quite elegant to people having a certain affinity towards bitwise
operations.

I was going to suggest the xor solution (note promotion from "infamous
trick" to "solution") if you hadn't. What I wonder is how it would
hold up with multibyte characters.

On a more general note, trie data structures have been invented to
tackle the prefix problem (or could have been invented for that
purpose, not knowing the actual history). There are at least two
trie modules on CPAN.

So, instead of an idiom, there's a trick and a couple of modules.

As to the other problem (skipping initial lines in a file), the
".." operator has also been suggested. I'd use it in a single
loop, somewhat like this:

while ( <DATA> ) {
next unless /ccc/ .. do { 0};
print;
}

The funny way of saying "0" (meaning forever as the second operand of
".."), "do { 0}" is necessary because ".." compares literal operands
to "$.". With this quirk, I wouldn't claim idiom status for this
either.

Anno

Ben Morrow · May 20, 2004

Quoth Richard Morse said:
I think that you can do this with the '..' operator:

while(<DATA>) {
if (m/now/ .. eof(DATA)) {

This could just as well be written

if (/now/ .. 0) {

This will still print the line that you are matching on, but you can
work with this...

if ( (/now/ .. 0) > 1 ) {

will exclude the first line (see perldoc perlop).

Ben

Ben Morrow · May 20, 2004

Quoth Ben Morrow said:
This could just as well be written

if (/now/ .. 0) {

....except that, as Anno pointed out, this won't work. D'oh.

if (/now/ .. do{0}) {

or

if (/now/ .. !1) {

which are slightly nasty...

Ben

David K. Wall · May 20, 2004

bill said:
1. The problem is to read and discard lines from a file until a
condition is met (usually some regexp match), and then process
subsequent lines. E.g.

my $found;
while (<IN>) {
if (meets_condition($_)) {
$found = 1;
last;
}
}

die "Bad data\n" unless $found;

This could be a bit shorter and still not be obfuscated:

while (<DATA>) {
last if meets_condition($_);
}
die "Bad data" unless $_;

bill · May 20, 2004

I must say that reading the replies to my original query has be
great fun. Thar's gold in them thar posts!

Thanks!!!

-bill

Tassilo v. Parseval · May 20, 2004

Also sprach Anno Siegel:

I was going to suggest the xor solution (note promotion from "infamous
trick" to "solution") if you hadn't. What I wonder is how it would
hold up with multibyte characters.

Not very well, I guess.

Depends a bit on the circumstances. When
both strings are in the same encoding (preferably not a variable-width
encoding such as UTF-8 is one), it could even be used there. And
naturally, the xor-approach can only report the byte- and not the
character-number at which two strings begin to differ.

On a more general note, trie data structures have been invented to
tackle the prefix problem (or could have been invented for that
purpose, not knowing the actual history). There are at least two
trie modules on CPAN.

The OP's problem looks a bit too basic to benefit from a trie. It's just
a matter of walking through two strings in parallel and compare each
character until they are no longer equal which would be a two-liner in C
(something like 'while (*a && *a++ == *b++) i++;').

When generalizing the problem a bit (namely finding the longest common
substring within potentially large strings), a trie comes in handy. I
think it solve this problem even in linear time.

Tassilo

Uri Guttman · May 20, 2004

b> 1. The problem is to read and discard lines from a file until a
b> condition is met (usually some regexp match), and then process
b> subsequent lines. E.g.

b> my $found;
b> while (<IN>) {
b> if (meets_condition($_)) {
b> $found = 1;
b> last;
b> }
b> }

b> die "Bad data\n" unless $found;

b> while (<IN>) {
b> # etc.
b> }

i read the other posts and none seem to use subs which make for a great
flow control mechanisim. and i don't like to use boolean flags for loop
stuff if i can help it that it reeks of fortran and spaghetti code.

sub skip_until_condition {
my( $fh, $cond_code) = @_ ;

while( <$fh> ) {
return if $cond_code->($_) ;
}

die "lousy file i just read" ;
}

now in the main code you just do:

skip_until_condition( \*IN, \&meets_condition ) ;

while( <IN> ) {

...
}

clean and easy to understand. the lesson is that return is a flow
control op too and that subs are your friend.

uri

David K. Wall · May 21, 2004

Uri Guttman said:
i read the other posts and none seem to use subs which make for a
great flow control mechanisim. and i don't like to use boolean
flags for loop stuff if i can help it that it reeks of fortran and
spaghetti code.

sub skip_until_condition {
my( $fh, $cond_code) = @_ ;

while( <$fh> ) {
return if $cond_code->($_) ;
}

die "lousy file i just read" ;
}

now in the main code you just do:

skip_until_condition( \*IN, \&meets_condition ) ;

while( <IN> ) {

...
}

clean and easy to understand. the lesson is that return is a flow
control op too and that subs are your friend.

OK, here's my original solution. I thought maybe it was overkill, so
I dropped the sub before posting.

sub read_until {
my ($filehandle, $condition) = @_;
while (<$filehandle>) {
last if $condition->($_);
}
return $_;
}

my $found = read_until( \*DATA, sub { ... } );
die "Nothing found!" unless $found;

I like returning the line that meets the condition, because it defers
the choice of what to do with it.

Uri Guttman · May 21, 2004

DKW> OK, here's my original solution. I thought maybe it was overkill, so
DKW> I dropped the sub before posting.

DKW> sub read_until {
DKW> my ($filehandle, $condition) = @_;
DKW> while (<$filehandle>) {
DKW> last if $condition->($_);

return $_ if $condition->($_);

DKW> }
DKW> return $_;

return ;

DKW> }

DKW> my $found = read_until( \*DATA, sub { ... } );
DKW> die "Nothing found!" unless $found;
^defined

what if the last line was '0' without a newline? that is the classic
(but very rare) gotcha of while(<>) in old perls (fixed with the implied
defined wrapper).

DKW> I like returning the line that meets the condition, because it defers
DKW> the choice of what to do with it.

that is ok but the OP never spec'ed it that way. he made it sound like
that line is just a marker and has no significant data.

uri

David K. Wall · May 21, 2004

Uri Guttman said:
DKW> OK, here's my original solution. I thought maybe it was
overkill, so DKW> I dropped the sub before posting.

DKW> sub read_until {
DKW> my ($filehandle, $condition) = @_;
DKW> while (<$filehandle>) {
DKW> last if $condition->($_);

return $_ if $condition->($_);

DKW> }
DKW> return $_;

return ;

DKW> }

DKW> my $found = read_until( \*DATA, sub { ... } );
DKW> die "Nothing found!" unless $found;
^defined

what if the last line was '0' without a newline?

Then there's a bug that might be hard to find. Oops. Bummer. I liked
that feature. It's easy enough to return a boolean AND the line, but
that's getting too crufty for something simple.

that is the
classic (but very rare) gotcha of while(<>) in old perls (fixed
with the implied defined wrapper).

DKW> I like returning the line that meets the condition, because
it defers DKW> the choice of what to do with it.

that is ok but the OP never spec'ed it that way. he made it sound
like that line is just a marker and has no significant data.

Sure. I was just putting my own preferences into it.

Ben Morrow · May 21, 2004

Quoth "David K. Wall said:
OK, here's my original solution. I thought maybe it was overkill, so
I dropped the sub before posting.

sub read_until {
my ($filehandle, $condition) = @_;
while (<$filehandle>) {
last if $condition->($_);
}
return $_;
}

my $found = read_until( \*DATA, sub { ... } );
die "Nothing found!" unless $found;

I like returning the line that meets the condition, because it defers
the choice of what to do with it.

Purely as a matter of style, I'd write it as

use Symbol;

sub read_until (&*) {
my ($cond, $FH) = @_;
local $_;
$FH = Symbol::qualify_to_ref $FH, caller;
$FH ||= \*ARGV;
while (<$FH>) {
last if $cond->($_);
}
return $_;
}

so you can call it as

my $found = read_until { ... } DATA;

; and I'd probably add a hash of params on the end so you could set $/
locally, croak with a given message if the condition wasn't met, etc.

Ben

David K. Wall · Jun 3, 2004

Dave Weaver said:
Perhaps you meant:

die "Bad data" unless meets_condition($_);
?

Otherwise a value of 0 in the data that meets the condition will
give "Bad data".

Yes, Uri Guttman pointed that out later in the thread.

Help with finding difference between two bodies of text in order	0	Sep 10, 2024
Ajax function only returns error and does not call php function	2	Aug 6, 2022
Python point location of intersect between two lines	0	Feb 28, 2018
Translater + module + tkinter	1	Feb 16, 2023
The code idioms in the FAQ	3	May 1, 2009
Python client/server that reads HTML body from server	1	Apr 11, 2023
an array of of arrays in Perl, behavior different in Strawberry andLinux perl	3	Sep 13, 2012
building generators in Perl	1	Feb 27, 2013

ISO two Perl idioms...

bill

Tassilo v. Parseval

Richard Morse

Anno Siegel

Ben Morrow

Ben Morrow

David K. Wall

bill

Tassilo v. Parseval

Uri Guttman

David K. Wall

Uri Guttman

David K. Wall

Ben Morrow

David K. Wall

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads