Consecutive Numbers

G

Graham Drabble

I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.
 
P

Paul Lalli

Graham said:
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

Here's my solution. I don't know if it's better or worse, but it's
certainly different (a couple extra test cases given - 6 'valid' lines
total):

#!/usr/bin/perl
use strict;
use warnings;

my $rows;

while (<DATA>){
chomp;
my @nums = (split //)[0..3];
$rows++ if (grep {$nums[$_] == $nums[$_-1]+1} 1..3) == 3;
$rows++ if (grep {$nums[$_] == $nums[$_-1]-1} 1..3) == 3;
}

print "Total of $rows rows\n";

__DATA__
12345867
23457658
34568765
43215687
13245678
67893131
12335668
13456789
76542109
 
A

Anno Siegel

Graham Drabble said:
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

Please don't speak of numbers when you mean digits. Your example is hard
to understand without that distinction. That is a waste of time.
use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

Worry about speed when the program runs too slow, not before.

I don't know about shorter, or even faster, but your way of building
a regular expression for each case is certainly clumsy and limited.

Regular expressions are for text, but what you are dealing with are
numbers (despite my stressing that they are digits). Use arithmetic
operations for numbers, the other way lies, for instance, a famous
Y2K bug.

Your ascending or descending sequences of digits can be considered
arithmetic sequences with a difference of +1 or -1. That the elements
happen to be single-digit numbers is irrelevant here. So write a
general routine that recognizes arithmetic sequences (of any length):

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

That can deal with ascending and descending sequences, so you don't
have to code for both.

To extract the four-digit sequences from each line, a regex is
appropriate. So the main loop is

my $runs = 0;
while ( <DATA> ) {
my @seq = /(\d)(\d)(\d)(\d)/ or next;
$runs ++ if
is_arithmetic_sequence( -1, @seq) or
is_arithmetic_sequence( 1, @seq);
}
print "There were $runs runs\n";

That should deal with a few thousand lines with no big problems on
current hardware. The code gives the right result for your data,
but isn't severely tested.

Anno
 
J

Jay Tilton

: I have a file that contains a list of numbers. I'm trying to process
: the file to find out how many rows start with 4 consecutive numbers
: (either ascending or decending). Currently I've got
:
: use strict;
: use warnings;
:
: open (IN, '4bell.txt') or die "Can't open IN: $!";
:
: my $runs = 0;
: while(<IN>){
: chomp;
: my $first = substr($_,0,1);
: my $asc = $first . $first+1 . $first+2 . $first+3;
: my $des = $first . $first-1 . $first-2 . $first-3;
: if (/^($asc|$des)/){
: $runs++
: }

The sample data don't reveal it, but the equal precedence of the "+" and
"." operators can create false positive matches. Try it with an element
starting with "7900".

Even after disambiguating with parentheses, e.g.

my $asc = $first . ($first+1) . ($first+2) . ($first+3);

it could still create a false positive match if an element started with,
say, "78910".

: }
: print "There were $runs runs\n";
:
: IN
: 12345867
: 23457658
: 34568765
: 43215687
: 13245678
:
: Prints
: There were 4 runs
:
: which is correct.

: However I can't help but think there must be a
: shorter solution but can't think of it. Any ideas? The file could
: contain up to 5000 lines.

How about:

if(
/^([0-6])(??{($1+1) . ($1+2) . ($1+3)})/
or
/^([3-9])(??{($1-1) . ($1-2) . ($1-3)})/
) {
$runs++;
}

or:

if(
/^([0-6])/ and substr($_,1,3) - $1 x 3 == 123
or
/^([3-9])/ and substr($_,1,3) - $1 x 3 == -123
) {
$runs++;
}
 
A

A. Sinan Unur

(e-mail address removed)-berlin.de (Anno Siegel) wrote in
my $runs = 0;
while ( <DATA> ) {
my @seq = /(\d)(\d)(\d)(\d)/ or next;

I think the OP wanted to see if the first four digits in each line were
consecutive. In light of that, and because I was curious, here is another
way of doing it.

use strict;
use warnings;

my $runs = 0;
my $lines = 0;

my $t = time;

while(my $line = <DATA>) {
next unless $line =~ /^(\d\d\d\d)/;
++$lines;
my @digits = split '', $1;
if(is_arithmetic_sequence(1, @digits)
|| is_arithmetic_sequence(-1, @digits)) {
++$runs;
}
}

$t = time - $t;

print "There were $runs runs\n";
print "It took $t seconds to process $lines lines\n";

sub is_arithmetic_sequence {
my $delta = shift;
for my $d (1 .. $#_) {
return 0 unless $delta == $_[$d] - $_[$d-1];
}
return 1;
}
__DATA__
12345867
23457658

.... 1_211_896 lines ...
That should deal with a few thousand lines with no big problems on
current hardware.

C:\Temp> perl md.pl
There were 774520 runs
It took 63 seconds to process 1211896 lines

That's about 20_000 lines per second on a 1Ghz Celeron running Win XP. Not
bad :)
 
M

Matija Papec

X-Ftn-To: Graham Drabble

Graham Drabble said:
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

#untested
use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
my $s2 = reverse my $s1 = join "", 0..9;
while (<IN>) {
my ($m) = /^(\d{4})/ or next;
$runs++ if $s1 =~ /$m/ or $s2 =~ /$m/;
}
print "There were $runs runs\n";
 
A

A. Sinan Unur

The solutions I've seen so far all calculate something inside the
loop. That's not as efficient as doing the work outside of the loop.
One should realise there are only 12 possible starting strings.

Indeed. This results in an increase from about 20_000 lines per second to
about 200_000 lines per second.
my $r = join "|" => qr /0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;

ITYM qw.

Sinan.
 
I

Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Graham Drabble
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive digits
(either ascending or decending). Currently I've got

I'm puzzled at all complications: what is wrong with using something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210', $four);

Ilya
 
A

Anno Siegel

Ilya Zakharevich said:
[A complimentary Cc of this posting was sent to
Graham Drabble
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive digits
(either ascending or decending). Currently I've got

I'm puzzled at all complications: what is wrong with using something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210', $four);

There's nothing wrong, it apperars to be even a tad faster than Abigail's
regex. (Only goes to show that alternatives in a regex are relatively slow).

When I introduced a general function is_arithmetic_sequence() to solve
the problem, I didn'd have speed in mind, but, on the contrary, that
it is affordable to use a more general utility. With the original data
size of a few thousand, the difference in speed wouldn't have mattered.

Anno
 
G

Graham Drabble

[A complimentary Cc of this posting was sent to
Graham Drabble
I have a file that contains a list of numbers. I'm trying to
process the file to find out how many rows start with 4
consecutive digits (either ascending or decending). Currently
I've got

I'm puzzled at all complications: what is wrong with using
something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210',
$four);

Thanks everyone. I think I'm going to go with this one, partly for
speed and partly because it's simple enough that when I come back to
it I'll know what it does!

Is there any reason that you've used 0<= index rather than index >=0?
 
C

Christopher Nehren

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Is there any reason that you've used 0<= index rather than index >=0?

It's an extension of a technique used to prevent the all-too-common:

if($var = 0) { mumble() }

where

if($var == 0) { mumble() }

is what was meant. At least, that's how I understand it. I personally
don't see the point of doing that with relational operators other than
==, but I'm sure that someone can specify a reason.

Best Regards,
Christopher Nehren
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (FreeBSD)

iD8DBQFBvPI4k/lo7zvzJioRAkFFAJ4yzuG6Ln/q2gPwb9G14fGqC7+C+gCgsfYg
jNKKQEOABKaheJAjVJEaICQ=
=RnOJ
-----END PGP SIGNATURE-----
 
J

John W. Krahn

Graham said:
[A complimentary Cc of this posting was sent to
Graham Drabble
I have a file that contains a list of numbers. I'm trying to
process the file to find out how many rows start with 4
consecutive digits (either ascending or decending). Currently
I've got

I'm puzzled at all complications: what is wrong with using
something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210',
$four);

Thanks everyone. I think I'm going to go with this one, partly for
speed and partly because it's simple enough that when I come back to
it I'll know what it does!

Is there any reason that you've used 0<= index rather than index >=0?

Because of precedence (see perlop.) "index $a, $b >= 0" would require
parentheses "index( $a, $b ) >= 0" or it would be interpreted by perl as
"index $a, ( $b >= 0 )" while "0 <= index $a, $b" does not require parentheses.



John
 
G

Gary E. Ansok

I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

Since this is Perl, a solution involving a hash is always worth considering:

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my @run_list = qw/0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;
my %run_hash;
@run_hash{@run_list} = ();
my $runs = 0;

while (<IN>) {
$runs++ if exists $run_hash{substr($_,0,4)};
}

print "There were $runs runs\n";

Gary Ansok
 
A

Anno Siegel

Gary E. Ansok said:
Since this is Perl, a solution involving a hash is always worth considering:

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my @run_list = qw/0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;
my %run_hash;
@run_hash{@run_list} = ();
my $runs = 0;

while (<IN>) {
$runs++ if exists $run_hash{substr($_,0,4)};
}

print "There were $runs runs\n";

Fastest yet, by quite a margin, according to my inofficial score table:

regex (Abigail) 8 seconds
index (Ilya) 7 seconds
hash (Gary) 4 seconds

Anything that as much as calls a sub in the loop is out (> 10 seconds).

Anno
 
M

Matija Papec

X-Ftn-To: Anno Siegel

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

=== my benchmark ===
Rate foreach plain nosub_foreach nosub_plain
foreach 5698/s -- -68% -91% -92%
plain 17913/s 214% -- -72% -75%
nosub_foreach 64412/s 1030% 260% -- -11%
nosub_plain 72595/s 1174% 305% 13% --

I know that foreach is costly, even when using it only to topicalize
scalars. What baffles me here, is that there is far greater difference
between first two benchmarks => (foreach:plain = 1:3.14; your
is_arithmetic_sequence runs three times faster without foreach) compared to
next two 1:1.12 ?:)
Is Benchmark being misused?


use Benchmark qw:)all);
use strict;

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}
sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;
}
return 1;
}

cmpthese(40_000, {
foreach => sub {
my @r = 0..30;
is_arithmetic_sequence(1, @r);
},
plain => sub {
my @r = 0..30;
is_arithmetic_sequence2(1, @r);
},
nosub_foreach => sub {
my @r = 0..30;
my $delta = 1;
while ( @r > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift @r;
}
return 1;
},
nosub_plain => sub {
my @r = 0..30;
my $delta = 1;
while ( @r > 1 ) {
(shift @r) + $delta == $_[ 0] or return 0;
}
return 1;
}
});
 
M

Matija Papec

X-Ftn-To: Matija Papec

Matija Papec said:
is_arithmetic_sequence runs three times faster without foreach) compared to
next two 1:1.12 ?:)
Is Benchmark being misused?

I was using wrong arrays in last two subs. ;)
Rate nosub_foreach foreach nosub_plain plain
foreach 5698/s 5% -- -64% -68%
plain 17841/s 228% 213% 12% --
nosub_foreach 5434/s -- -5% -66% -70%
nosub_plain 15981/s 194% 180% -- -10%

It seems that foreach is main speed killer.
 
A

Anno Siegel

Matija Papec said:
X-Ftn-To: Anno Siegel

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

=== my benchmark ===
Rate foreach plain nosub_foreach nosub_plain
foreach 5698/s -- -68% -91% -92%
plain 17913/s 214% -- -72% -75%
nosub_foreach 64412/s 1030% 260% -- -11%
nosub_plain 72595/s 1174% 305% 13% --

I know that foreach is costly, even when using it only to topicalize
scalars. What baffles me here, is that there is far greater difference
between first two benchmarks => (foreach:plain = 1:3.14; your
is_arithmetic_sequence runs three times faster without foreach) compared to
next two 1:1.12 ?:)
Is Benchmark being misused?

Well, you have cleared that up in a followup. Benchmarking is an art.
It is all to common to find you have benchmarked something that has
little to do with the original problem. Happens to everyone...
sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

I was entirely unconcerned with speed when I wrote that. The one-shot
"for" is there so that shift() and $_[ 0] don't appear in the same
statement the way they do in the code below.
sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;

This is arguably wrong. The Perl interpreter is free to evaluate
subexpressions in any order, so there is no guarantee that shift()
has happened before $_[ 0] is evaluated. The result of the expression
is undefined, even though there was probably never a Perl interpreter
that did it the other way 'round.
}
return 1;
}

Anno
 
M

Matija Papec

X-Ftn-To: Anno Siegel

$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

I was entirely unconcerned with speed when I wrote that. The one-shot

Ofcourse, it's just that I find your example interesting in light of
"subroutine calls slow down the program". While such way of thinking is
true, it's probably overestimated as there are greater speed killers like
"foreach".
sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;

This is arguably wrong. The Perl interpreter is free to evaluate
subexpressions in any order, so there is no guarantee that shift()
has happened before $_[ 0] is evaluated. The result of the expression
is undefined, even though there was probably never a Perl interpreter
that did it the other way 'round.

Ah, so I'm living on the edge? Are there some other common cases where it
comes to undefined results?
 
A

Anno Siegel

Matija Papec said:
X-Ftn-To: Anno Siegel
sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;

This is arguably wrong. The Perl interpreter is free to evaluate
subexpressions in any order, so there is no guarantee that shift()
has happened before $_[ 0] is evaluated. The result of the expression
is undefined, even though there was probably never a Perl interpreter
that did it the other way 'round.

Ah, so I'm living on the edge? Are there some other common cases where it
comes to undefined results?

It happens every time an expression has a subexpression that has a
side-effect on other parts of the expression. The problem is by no
means limited to Perl, though Perl may have more operations with
side-effects than other languages. The scalar-modifying ++, --, +=,
*= and their ilk are candidates, as are the array-modifying operations
shift, unshift, pop, push and splice. There are many more.

They can be used freely in any expression as long as the modified
object doesn't appear again in the same expression. When it does,
there is no way of telling whether it is going to be used in its
original state or after modification.

Only a few languages (Lisp) commit themselves to a defined order of
evaluation of expressions. Others wisely reserve the freedom of
changing the evaluation order at random. This is wise because small
changes in the parsing technique (for sake of efficiency) may change
the order of evaluation in hard-to-predict ways.

Btw, the problem is worse in languages that have macro capabilities,
like C. A macro can hide the fact that a subexpression appears more
than once. The standard example is

#define SQUARE( x, y) ((x)*(x))
int i = 1;
printf( "%d is not 4\n", SQUARE( ++i));

That looks entirely innocent, yet it prints "6 is not 4".

Anno
 
A

Anno Siegel

Abigail said:
Anno Siegel ([email protected]) wrote on MMMMCXXVII
September MCMXCIII in <URL:?? > X-Ftn-To: Anno Siegel
?? >
?? > >> sub is_arithmetic_sequence2 {
?? > >> my $delta = shift;
?? > >> while ( @_ > 1 ) {
?? > >> (shift) + $delta == $_[ 0] or return 0;
?? > >
?? > >This is arguably wrong. The Perl interpreter is free to evaluate
?? > >subexpressions in any order, so there is no guarantee that shift()
?? > >has happened before $_[ 0] is evaluated. The result of the expression
[...]

?? Btw, the problem is worse in languages that have macro capabilities,
?? like C. A macro can hide the fact that a subexpression appears more
?? than once. The standard example is
??
?? #define SQUARE( x, y) ((x)*(x))
?? int i = 1;
?? printf( "%d is not 4\n", SQUARE( ++i));
??
?? That looks entirely innocent, yet it prints "6 is not 4".

Well, the latter is bad because it has a side-effect - not because the
order of evaluation isn't fully defined. If you replace '++ i' in the
expression above with a function call that adds 1 to i and then returns
it, the order of evaluation is fully defined - but it will still print 6.

It's true, the macro problem is only related by association. I was
rambling.
my $qr = qr/^.+?(;).+?\1|;Just another Perl Hacker;|;.+$/;
$qr =~ s/$qr//g;
print $qr, "\n";

Ah... A quine reproduces itself, including a payload. This cleans itself
from everything but the payload. It's an anti-quine.

Anno
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,021
Latest member
AkilahJaim

Latest Threads

Top