Consecutive Numbers

Graham Drabble · Dec 12, 2004

I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

Paul Lalli · Dec 12, 2004

Graham said:
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

Here's my solution. I don't know if it's better or worse, but it's
certainly different (a couple extra test cases given - 6 'valid' lines
total):

#!/usr/bin/perl
use strict;
use warnings;

my $rows;

while (<DATA>){
chomp;
my @nums = (split //)[0..3];
$rows++ if (grep {$nums[$_] == $nums[$_-1]+1} 1..3) == 3;
$rows++ if (grep {$nums[$_] == $nums[$_-1]-1} 1..3) == 3;
}

print "Total of $rows rows\n";

__DATA__
12345867
23457658
34568765
43215687
13245678
67893131
12335668
13456789
76542109

Anno Siegel · Dec 12, 2004

Graham Drabble said:
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

Please don't speak of numbers when you mean digits. Your example is hard
to understand without that distinction. That is a waste of time.

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

Worry about speed when the program runs too slow, not before.

I don't know about shorter, or even faster, but your way of building
a regular expression for each case is certainly clumsy and limited.

Regular expressions are for text, but what you are dealing with are
numbers (despite my stressing that they are digits). Use arithmetic
operations for numbers, the other way lies, for instance, a famous
Y2K bug.

Your ascending or descending sequences of digits can be considered
arithmetic sequences with a difference of +1 or -1. That the elements
happen to be single-digit numbers is irrelevant here. So write a
general routine that recognizes arithmetic sequences (of any length):

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

That can deal with ascending and descending sequences, so you don't
have to code for both.

To extract the four-digit sequences from each line, a regex is
appropriate. So the main loop is

my $runs = 0;
while ( <DATA> ) {
my @seq = /(\d)(\d)(\d)(\d)/ or next;
$runs ++ if
is_arithmetic_sequence( -1, @seq) or
is_arithmetic_sequence( 1, @seq);
}
print "There were $runs runs\n";

That should deal with a few thousand lines with no big problems on
current hardware. The code gives the right result for your data,
but isn't severely tested.

Anno

Jay Tilton · Dec 12, 2004

: I have a file that contains a list of numbers. I'm trying to process
: the file to find out how many rows start with 4 consecutive numbers
: (either ascending or decending). Currently I've got
:
: use strict;
: use warnings;
:
: open (IN, '4bell.txt') or die "Can't open IN: $!";
:
: my $runs = 0;
: while(<IN>){
: chomp;
: my $first = substr($_,0,1);
: my $asc = $first . $first+1 . $first+2 . $first+3;
: my $des = $first . $first-1 . $first-2 . $first-3;
: if (/^($asc|$des)/){
: $runs++
: }

The sample data don't reveal it, but the equal precedence of the "+" and
"." operators can create false positive matches. Try it with an element
starting with "7900".

Even after disambiguating with parentheses, e.g.

my $asc = $first . ($first+1) . ($first+2) . ($first+3);

it could still create a false positive match if an element started with,
say, "78910".

: }
: print "There were $runs runs\n";
:
: IN
: 12345867
: 23457658
: 34568765
: 43215687
: 13245678
:
: Prints
: There were 4 runs
:
: which is correct.

: However I can't help but think there must be a
: shorter solution but can't think of it. Any ideas? The file could
: contain up to 5000 lines.

How about:

if(
/^([0-6])(??{($1+1) . ($1+2) . ($1+3)})/
or
/^([3-9])(??{($1-1) . ($1-2) . ($1-3)})/
) {
$runs++;
}

or:

if(
/^([0-6])/ and substr($_,1,3) - $1 x 3 == 123
or
/^([3-9])/ and substr($_,1,3) - $1 x 3 == -123
) {
$runs++;
}

A. Sinan Unur · Dec 12, 2004

(e-mail address removed)-berlin.de (Anno Siegel) wrote in

my $runs = 0;
while ( <DATA> ) {
my @seq = /(\d)(\d)(\d)(\d)/ or next;

I think the OP wanted to see if the first four digits in each line were
consecutive. In light of that, and because I was curious, here is another
way of doing it.

use strict;
use warnings;

my $runs = 0;
my $lines = 0;

my $t = time;

while(my $line = <DATA>) {
next unless $line =~ /^(\d\d\d\d)/;
++$lines;
my @digits = split '', $1;
if(is_arithmetic_sequence(1, @digits)
|| is_arithmetic_sequence(-1, @digits)) {
++$runs;
}
}

$t = time - $t;

print "There were $runs runs\n";
print "It took $t seconds to process $lines lines\n";

sub is_arithmetic_sequence {
my $delta = shift;
for my $d (1 .. $#_) {
return 0 unless $delta == $_[$d] - $_[$d-1];
}
return 1;
}
__DATA__
12345867
23457658

.... 1_211_896 lines ...

That should deal with a few thousand lines with no big problems on
current hardware.

C:\Temp> perl md.pl
There were 774520 runs
It took 63 seconds to process 1211896 lines

That's about 20_000 lines per second on a 1Ghz Celeron running Win XP. Not
bad

Matija Papec · Dec 12, 2004

X-Ftn-To: Graham Drabble

Graham Drabble said:
I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

#untested
use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
my $s2 = reverse my $s1 = join "", 0..9;
while (<IN>) {
my ($m) = /^(\d{4})/ or next;
$runs++ if $s1 =~ /$m/ or $s2 =~ /$m/;
}
print "There were $runs runs\n";

A. Sinan Unur · Dec 12, 2004

The solutions I've seen so far all calculate something inside the
loop. That's not as efficient as doing the work outside of the loop.
One should realise there are only 12 possible starting strings.

Indeed. This results in an increase from about 20_000 lines per second to
about 200_000 lines per second.

my $r = join "|" => qr /0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;

ITYM qw.

Sinan.

Ilya Zakharevich · Dec 12, 2004

[A complimentary Cc of this posting was sent to
Graham Drabble

I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive digits
(either ascending or decending). Currently I've got

I'm puzzled at all complications: what is wrong with using something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210', $four);

Ilya

Anno Siegel · Dec 12, 2004

Ilya Zakharevich said:
[A complimentary Cc of this posting was sent to
Graham Drabble

I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive digits
(either ascending or decending). Currently I've got

Click to expand...

I'm puzzled at all complications: what is wrong with using something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210', $four);

There's nothing wrong, it apperars to be even a tad faster than Abigail's
regex. (Only goes to show that alternatives in a regex are relatively slow).

When I introduced a general function is_arithmetic_sequence() to solve
the problem, I didn'd have speed in mind, but, on the contrary, that
it is affordable to use a more general utility. With the original data
size of a few thousand, the difference in speed wouldn't have mattered.

Anno

Graham Drabble · Dec 13, 2004

[A complimentary Cc of this posting was sent to
Graham Drabble

I have a file that contains a list of numbers. I'm trying to
process the file to find out how many rows start with 4
consecutive digits (either ascending or decending). Currently
I've got

Click to expand...

I'm puzzled at all complications: what is wrong with using
something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210',
$four);

Thanks everyone. I think I'm going to go with this one, partly for
speed and partly because it's simple enough that when I come back to
it I'll know what it does!

Is there any reason that you've used 0<= index rather than index >=0?

Christopher Nehren · Dec 13, 2004

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Is there any reason that you've used 0<= index rather than index >=0?

It's an extension of a technique used to prevent the all-too-common:

if($var = 0) { mumble() }

where

if($var == 0) { mumble() }

is what was meant. At least, that's how I understand it. I personally
don't see the point of doing that with relational operators other than
==, but I'm sure that someone can specify a reason.

Best Regards,
Christopher Nehren
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (FreeBSD)

iD8DBQFBvPI4k/lo7zvzJioRAkFFAJ4yzuG6Ln/q2gPwb9G14fGqC7+C+gCgsfYg
jNKKQEOABKaheJAjVJEaICQ=
=RnOJ
-----END PGP SIGNATURE-----

John W. Krahn · Dec 13, 2004

Graham said:
[A complimentary Cc of this posting was sent to
Graham Drabble

I have a file that contains a list of numbers. I'm trying to
process the file to find out how many rows start with 4
consecutive digits (either ascending or decending). Currently
I've got

Click to expand...

I'm puzzled at all complications: what is wrong with using
something like

my $four = substr $in, 0, 4;
print 'OK' if length $four == 4 and
(0 <= index '0123456789', $four or 0 <= index '9876543210',
$four);

Click to expand...

Thanks everyone. I think I'm going to go with this one, partly for
speed and partly because it's simple enough that when I come back to
it I'll know what it does!

Is there any reason that you've used 0<= index rather than index >=0?

Because of precedence (see perlop.) "index $a, $b >= 0" would require
parentheses "index( $a, $b ) >= 0" or it would be interpreted by perl as
"index $a, ( $b >= 0 )" while "0 <= index $a, $b" does not require parentheses.

John

Gary E. Ansok · Dec 13, 2004

I have a file that contains a list of numbers. I'm trying to process
the file to find out how many rows start with 4 consecutive numbers
(either ascending or decending). Currently I've got

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my $runs = 0;
while(<IN>){
chomp;
my $first = substr($_,0,1);
my $asc = $first . $first+1 . $first+2 . $first+3;
my $des = $first . $first-1 . $first-2 . $first-3;
if (/^($asc|$des)/){
$runs++
}
}
print "There were $runs runs\n";

IN
12345867
23457658
34568765
43215687
13245678

Prints
There were 4 runs

which is correct. However I can't help but think there must be a
shorter solution but can't think of it. Any ideas? The file could
contain up to 5000 lines.

Since this is Perl, a solution involving a hash is always worth considering:

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my @run_list = qw/0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;
my %run_hash;
@run_hash{@run_list} = ();
my $runs = 0;

while (<IN>) {
$runs++ if exists $run_hash{substr($_,0,4)};
}

print "There were $runs runs\n";

Gary Ansok

Anno Siegel · Dec 13, 2004

Gary E. Ansok said:
Since this is Perl, a solution involving a hash is always worth considering:

use strict;
use warnings;

open (IN, '4bell.txt') or die "Can't open IN: $!";

my @run_list = qw/0123 1234 2345 3456 4567 5678 6789
3210 4321 5432 6543 7654 8765 9876/;
my %run_hash;
@run_hash{@run_list} = ();
my $runs = 0;

while (<IN>) {
$runs++ if exists $run_hash{substr($_,0,4)};
}

print "There were $runs runs\n";

Fastest yet, by quite a margin, according to my inofficial score table:

regex (Abigail) 8 seconds
index (Ilya) 7 seconds
hash (Gary) 4 seconds

Anything that as much as calls a sub in the loop is out (> 10 seconds).

Anno

Matija Papec · Dec 14, 2004

X-Ftn-To: Anno Siegel

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

=== my benchmark ===

Rate foreach plain nosub_foreach nosub_plain
foreach 5698/s -- -68% -91% -92%
plain 17913/s 214% -- -72% -75%
nosub_foreach 64412/s 1030% 260% -- -11%
nosub_plain 72595/s 1174% 305% 13% --

I know that foreach is costly, even when using it only to topicalize
scalars. What baffles me here, is that there is far greater difference
between first two benchmarks => (foreach

lain = 1:3.14; your
is_arithmetic_sequence runs three times faster without foreach) compared to
next two 1:1.12 ?

Is Benchmark being misused?

use Benchmark qw

all);
use strict;

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}
sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;
}
return 1;
}

cmpthese(40_000, {
foreach => sub {
my @r = 0..30;
is_arithmetic_sequence(1, @r);
},
plain => sub {
my @r = 0..30;
is_arithmetic_sequence2(1, @r);
},
nosub_foreach => sub {
my @r = 0..30;
my $delta = 1;
while ( @r > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift @r;
}
return 1;
},
nosub_plain => sub {
my @r = 0..30;
my $delta = 1;
while ( @r > 1 ) {
(shift @r) + $delta == $_[ 0] or return 0;
}
return 1;
}
});

Matija Papec · Dec 14, 2004

X-Ftn-To: Matija Papec

Matija Papec said:
is_arithmetic_sequence runs three times faster without foreach) compared to
next two 1:1.12 ?
Is Benchmark being misused?

I was using wrong arrays in last two subs.

Rate nosub_foreach foreach nosub_plain plain
foreach 5698/s 5% -- -64% -68%
plain 17841/s 228% 213% 12% --
nosub_foreach 5434/s -- -5% -66% -70%
nosub_plain 15981/s 194% 180% -- -10%

It seems that foreach is main speed killer.

Anno Siegel · Dec 15, 2004

Matija Papec said:
X-Ftn-To: Anno Siegel

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

Click to expand...

=== my benchmark ===

Rate foreach plain nosub_foreach nosub_plain
foreach 5698/s -- -68% -91% -92%
plain 17913/s 214% -- -72% -75%
nosub_foreach 64412/s 1030% 260% -- -11%
nosub_plain 72595/s 1174% 305% 13% --

Click to expand...

I know that foreach is costly, even when using it only to topicalize
scalars. What baffles me here, is that there is far greater difference
between first two benchmarks => (foreachlain = 1:3.14; your
is_arithmetic_sequence runs three times faster without foreach) compared to
next two 1:1.12 ?
Is Benchmark being misused?

Well, you have cleared that up in a followup. Benchmarking is an art.
It is all to common to find you have benchmarked something that has
little to do with the original problem. Happens to everyone...

sub is_arithmetic_sequence {
my $delta = shift;
while ( @_ > 1 ) {
$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

I was entirely unconcerned with speed when I wrote that. The one-shot
"for" is there so that shift() and $_[ 0] don't appear in the same
statement the way they do in the code below.

sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;

This is arguably wrong. The Perl interpreter is free to evaluate
subexpressions in any order, so there is no guarantee that shift()
has happened before $_[ 0] is evaluated. The result of the expression
is undefined, even though there was probably never a Perl interpreter
that did it the other way 'round.

}
return 1;
}

Anno

Matija Papec · Dec 17, 2004

X-Ftn-To: Anno Siegel

$_ + $delta == $_[ 0] or return 0 for shift;
}
return 1;
}

Click to expand...

I was entirely unconcerned with speed when I wrote that. The one-shot

Ofcourse, it's just that I find your example interesting in light of
"subroutine calls slow down the program". While such way of thinking is
true, it's probably overestimated as there are greater speed killers like
"foreach".

sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;

Click to expand...

This is arguably wrong. The Perl interpreter is free to evaluate
subexpressions in any order, so there is no guarantee that shift()
has happened before $_[ 0] is evaluated. The result of the expression
is undefined, even though there was probably never a Perl interpreter
that did it the other way 'round.

Ah, so I'm living on the edge? Are there some other common cases where it
comes to undefined results?

Anno Siegel · Dec 18, 2004

Matija Papec said:
X-Ftn-To: Anno Siegel

sub is_arithmetic_sequence2 {
my $delta = shift;
while ( @_ > 1 ) {
(shift) + $delta == $_[ 0] or return 0;

Click to expand...

This is arguably wrong. The Perl interpreter is free to evaluate
subexpressions in any order, so there is no guarantee that shift()
has happened before $_[ 0] is evaluated. The result of the expression
is undefined, even though there was probably never a Perl interpreter
that did it the other way 'round.

Click to expand...

Ah, so I'm living on the edge? Are there some other common cases where it
comes to undefined results?

It happens every time an expression has a subexpression that has a
side-effect on other parts of the expression. The problem is by no
means limited to Perl, though Perl may have more operations with
side-effects than other languages. The scalar-modifying ++, --, +=,
*= and their ilk are candidates, as are the array-modifying operations
shift, unshift, pop, push and splice. There are many more.

They can be used freely in any expression as long as the modified
object doesn't appear again in the same expression. When it does,
there is no way of telling whether it is going to be used in its
original state or after modification.

Only a few languages (Lisp) commit themselves to a defined order of
evaluation of expressions. Others wisely reserve the freedom of
changing the evaluation order at random. This is wise because small
changes in the parsing technique (for sake of efficiency) may change
the order of evaluation in hard-to-predict ways.

Btw, the problem is worse in languages that have macro capabilities,
like C. A macro can hide the fact that a subexpression appears more
than once. The standard example is

#define SQUARE( x, y) ((x)*(x))
int i = 1;
printf( "%d is not 4\n", SQUARE( ++i));

That looks entirely innocent, yet it prints "6 is not 4".

Anno

Anno Siegel · Dec 19, 2004

Abigail said:
Anno Siegel ([email protected]) wrote on MMMMCXXVII
September MCMXCIII in <URL:?? > X-Ftn-To: Anno Siegel
?? >
?? > >> sub is_arithmetic_sequence2 {
?? > >> my $delta = shift;
?? > >> while ( @_ > 1 ) {
?? > >> (shift) + $delta == $_[ 0] or return 0;
?? > >
?? > >This is arguably wrong. The Perl interpreter is free to evaluate
?? > >subexpressions in any order, so there is no guarantee that shift()
?? > >has happened before $_[ 0] is evaluated. The result of the expression
[...]

?? Btw, the problem is worse in languages that have macro capabilities,
?? like C. A macro can hide the fact that a subexpression appears more
?? than once. The standard example is
??
?? #define SQUARE( x, y) ((x)*(x))
?? int i = 1;
?? printf( "%d is not 4\n", SQUARE( ++i));
??
?? That looks entirely innocent, yet it prints "6 is not 4".

Well, the latter is bad because it has a side-effect - not because the
order of evaluation isn't fully defined. If you replace '++ i' in the
expression above with a function call that adds 1 to i and then returns
it, the order of evaluation is fully defined - but it will still print 6.

It's true, the macro problem is only related by association. I was
rambling.

my $qr = qr/^.+?(.+?\1|;Just another Perl Hacker;|;.+$/;
$qr =~ s/$qr//g;
print $qr, "\n";

Ah... A quine reproduces itself, including a payload. This cleans itself
from everything but the payload. It's an anti-quine.

Anno

Taskcproblem calendar	4	Aug 31, 2023
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
printing two consecutive lines	9	Aug 21, 2008
Problem Splitting Text String	2	Dec 29, 2022
PHP RSS Feed Aggregator changing to todays date everytime feed is aggregated	1	Jan 11, 2022
2 consecutive jobs in one perl program	7	Aug 3, 2005
Perl debugger has wrong initial entry point and line numbers	1	Feb 9, 2005
An embeddable job widget showing job listings discussion of an approach	0	Mar 22, 2023

Consecutive Numbers

Graham Drabble

Paul Lalli

Anno Siegel

Jay Tilton

A. Sinan Unur

Matija Papec

A. Sinan Unur

Ilya Zakharevich

Anno Siegel

Graham Drabble

Christopher Nehren

John W. Krahn

Gary E. Ansok

Anno Siegel

Matija Papec

Matija Papec

Anno Siegel

Matija Papec

Anno Siegel

Anno Siegel

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads