A regex to search for numeric ranges...

Mr P · Apr 19, 2011

I read up on this on the www and I found ideas like

if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

which is pretty uncipherable at a glance and just in general not
elegant in any sense.

I generally do something like

if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

Which to me is a lot more readable at a glance, but like the example
above not overly elegant..

But what I'd REALLY like to do is, similar to the trick for numeric
sort, a way to do it in the regex like

/[256-1024]/ # but force it to be numeric, not literal perhaps with a
switch

Thoughts, Masters?
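
The capture-then-compare idea above can be sketched as a small helper. This is only an illustration: `numbers_in_range` is a made-up name, the `\b` anchors are an added assumption so that digits embedded in a longer word are skipped, and the bounds are exclusive to mirror the `> 256` and `< 1024` tests.

```perl
use strict;
use warnings;

# Hypothetical helper: return every whole number in $text that lies
# strictly between $low and $high (exclusive bounds, as in the post).
sub numbers_in_range {
    my ($text, $low, $high) = @_;
    my @found;
    while ($text =~ /\b(\d+)\b/g) {
        push @found, $1 if $1 > $low && $1 < $high;
    }
    return @found;
}

my @hits = numbers_in_range('ports 80, 443, 1023 and 8080', 256, 1024);
print "@hits\n";    # prints "443 1023"
```

Unlike a bare /(\d+)/, the /g loop looks at every number in the string rather than only the first one.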

Mr P · Apr 19, 2011

Eli the Bearded said:
In comp.lang.perl.misc said:

I read up on this on the www and I found ideas like

if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

which is pretty uncipherable at a glance and just in general not
elegant in any sense.

True. That's why it's much better to not use regexps for numerical
ranges.

I generally do something like

if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

I'd write that as

if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )

because I like to make sure things operate in the order I want them
to.

Which to me is a lot more readable at a glance, but like the example
above not overly elegant..

But what I'd REALLY like to do is, similar to the trick for numeric
sort, a way to do it in the regex like

/[256-1024]/ # but force it to be numeric, not literal perhaps with a
switch

sub mknumre($$) {
    my $low = shift;
    my $hi = shift;

    my $set = join('|', ($low .. $hi));

    return qr/\b($set)\b/;    # \b keeps 256 from matching inside 2560

}

Thoughts, Masters?

Why does this have to be a regular expression? Use the right tool
for the job.

I guess my answer to that question is that my 1-line regex is a lot
easier to read and much shorter than your 9-line monster!

Mr P · Apr 19, 2011

I'd write that as

if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )

because I like to make sure things operate in the order I want them
to.

There is no ambiguity in the order of my example: study OPERATOR
PRECEDENCE. Mine is just less syntax-intensive.

sln · Apr 21, 2011

I read up on this on the www and I found ideas like

if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

which is pretty uncipherable at a glance and just in general not
elegant in any sense.

I generally do something like

if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

Which to me is a lot more readable at a glance, but like the example
above not overly elegant..

But what I'd REALLY like to do is, similar to the trick for numeric
sort, a way to do it in the regex like

/[256-1024]/ # but force it to be numeric, not literal perhaps with a
switch

Thoughts, Masters?

/[256-1024]/ is generally possible.
It has limitations that affect the surrounding expressions, but it
could be worked around and functionally generalized (again within
specific limitations).

-sln

-----------------------

use strict;
use warnings;

my $str = '0001023 widgets';

# Inline code is going to be a thing of the future and definitely
# going to happen (see perl 6 regex).
# This allows parameter checking and is useful when the source
# has extended data to be regex analyzed in one expression.

if ($str =~ / \b (\d+) \b
              (?(?{$^N > 256 && $^N < 1024})  # is this number between 256-1024?
                                              # yes, continue processing
                |
                  (*FAIL)                     # no, fail outright
              )
              # more expressions here ..
              \s*
              (.+)
            /x )
{
    print "Number: '$1', Type: '$2'\n";
}
else {
    print "failed\n";
}

print "\n";

# This does a source conversion of \d+ to a single utf8 character.
# It then allows checking it in a HEX numeric range character class.
# Even though the source is decimal, '1023', when magically assumed to
# be hex and converted to a utf8 char like "\x{1023}", its code point
# will be correctly matched within a regex character class range.
# Example: "\x{1023}" =~ /[\x{257}-\x{1023}]/ will match.
# And, only "\x{N}" where N is between 257-1023 will match.

for (0 .. 4096)
{
    # Construct a fake string using the current counter.
    # In reality, you have to parse the source string and do the conversion
    # so that you end up doing something like this:
    # $src =~ /^(.*?)\b(\d+)\b(.*?)$/
    # eval "\$temp_src = \"$1\\x{$2}$3\" ";
    # Then use the $temp_src in place of the $str below.

    my $padded_string = "000$_"; # the extra '000' padding is just a test
    eval "\$str = \"\\x{$padded_string} widgets\" ";

    if ( $str =~ /^ ([\x{257}-\x{1023}])
                    \s*
                    (.+)
                  /x )
    {
        print "Number: '$padded_string', Type: '$2'\n";
    }
}
__END__

Output
------------

Number: '0001023', Type: 'widgets'

Number: '000257', Type: 'widgets'
Number: '000258', Type: 'widgets'
Number: '000259', Type: 'widgets'
Number: '000260', Type: 'widgets'
Number: '000261', Type: 'widgets'
Number: '000262', Type: 'widgets'
Number: '000263', Type: 'widgets'
Number: '000264', Type: 'widgets'
Number: '000265', Type: 'widgets'
Number: '000266', Type: 'widgets'
Number: '000267', Type: 'widgets'
...
...
Number: '0001012', Type: 'widgets'
Number: '0001013', Type: 'widgets'
Number: '0001014', Type: 'widgets'
Number: '0001015', Type: 'widgets'
Number: '0001016', Type: 'widgets'
Number: '0001017', Type: 'widgets'
Number: '0001018', Type: 'widgets'
Number: '0001019', Type: 'widgets'
Number: '0001020', Type: 'widgets'
Number: '0001021', Type: 'widgets'
Number: '0001022', Type: 'widgets'
Number: '0001023', Type: 'widgets'
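
The digits-to-character conversion in the second loop does not seem to need a string eval. As a rough sketch, assuming the input is already known to consist only of decimal digits, `chr(hex(...))` should produce the same character that "\x{...}" does. The helper name `digits_to_char` is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a string of decimal digits as if it were hex
# and return the character with that code point, as "\x{...}" would.
sub digits_to_char {
    my ($digits) = @_;
    return chr(hex($digits));
}

# Same character class as in the post: code points 0x257 through 0x1023.
for my $digits ('000256', '000257', '0001023', '0001024') {
    my $verdict = digits_to_char($digits) =~ /^[\x{257}-\x{1023}]$/
        ? 'in range' : 'out of range';
    print "$digits: $verdict\n";
}
```

Only '000257' and '0001023' are reported as in range. Avoiding eval also means nothing from the source string is ever executed as Perl code.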

Uri Guttman · Apr 21, 2011

s> /[256-1024]/ is generally possible.

s> It has limitations that affect the surrounding expressions, but it
s> could be worked around and functionally generalized (again within
s> specific limitations).

limitations? it is just wrong. that is a char class of all those digits
(and i am not even sure what [6-1] will generate).
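
For what it's worth, here is a small check of how perl treats that pattern. It is a sketch only: the pattern is built from a string so that it is compiled at run time, where eval can trap the failure, and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Compile the pattern at run time so that eval can catch a failure.
my $pattern = '[256-1024]';
my $re      = eval { qr/$pattern/ };
my $error   = $@;
print defined $re ? "compiled\n" : "rejected: $error";

# Without the backwards "6-1" range, the brackets just list single
# characters, so the class matches one digit at a time.
my @matched = grep { /^[2561024]$/ } 0 .. 9;
print "@matched\n";    # prints "0 1 2 4 5 6"
```

On the perls I know of, the first pattern is rejected with an "Invalid [] range" error, because "6-1" runs backwards.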

uri

Ilya Zakharevich · Apr 24, 2011

I'm sure. The second one, mapping integer sequences to characters to
then use a Unicode character class has all the workings of a brilliant
bit of obfuscation. I suspect it doesn't scale well, say 2^16 or
2^32, but I don't really know how Perl handles Unicode internally.

When I worked on this (a long time ago), there were no compilers with
a 128-bit IV around (are there now?). Hence the support I
implemented was intended to work "up to the maximal number
representable by a UV", but it is actually coded with the limitation "not
higher than 64 bits". I doubt anybody has extended it further than
this (the "hooks" for expansion are there, just probably not implemented)...

Hope this helps,
Ilya

Keith Thompson · Apr 27, 2011

Eli the Bearded said:
In comp.lang.perl.misc said:

I read up on this on the www and I found ideas like

if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

which is pretty uncipherable at a glance and just in general not
elegant in any sense.

True. That's why it's much better to not use regexps for numerical
ranges.

I generally do something like

if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

I'd write that as

if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )

because I like to make sure things operate in the order I want them
to.

Really?

First off, I hope you're aware that both forms are exactly
equivalent, since "<" binds more tightly than "&&", and "&&"
imposes a left-to-right evaluation with or without the parentheses.

An argument for using the extra parentheses would be that they make
it clearer. They don't for me personally; in this particular case,
the precedence is carved deeply enough into my brain that it's clear
enough without the parentheses. But YMMV. Obviously, different
people have different levels of comfort with the precedence levels
of the various operators.

But I'd write it as:

if (/(\d+)/ and $1 > 256 and $1 < 1024)

I usually prefer "and" and "or" over "&&" and "||". On the other
hand, I have been bitten a few times by the *low* precedence of
"and" and "or"; I've mistakenly written things like

return $this and $that;

which never evaluates $that.
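
A minimal sketch of that pitfall, with made-up sub names `loose` and `tight`. Warnings are left off on purpose here, since recent perls are likely to complain about the first form.

```perl
use strict;

# "and" binds more loosely than "return", so this parses as
# (return $this) and $that; the sub returns $this alone.
sub loose {
    my ($this, $that) = @_;
    return $this and $that;
}

# "&&" binds more tightly, so the whole expression is returned.
sub tight {
    my ($this, $that) = @_;
    return $this && $that;
}

print loose(1, 0), "\n";    # prints 1
print tight(1, 0), "\n";    # prints 0
```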

(And none of these are equivalent to the original regexp, which
checks for values from 0 to 255.)
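
One way to convince yourself of that is to compare the quoted alternation against a plain numeric test. This is just a quick sketch; the variable names and the 0 .. 300 test range are arbitrary choices.

```perl
use strict;
use warnings;

# The alternation from the original post, stored as a compiled regex.
my $byte = qr/\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/;

# Collect every value where the regex and the numeric test disagree.
my @disagree = grep {
    my $by_regex   = /^$byte$/ ? 1 : 0;
    my $by_compare = ($_ >= 0 && $_ <= 255) ? 1 : 0;
    $by_regex != $by_compare;
} 0 .. 300;

print scalar(@disagree), " disagreements\n";    # prints "0 disagreements"
```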

Uri Guttman · Apr 27, 2011

KT> First off, I hope you're aware that both forms are exactly
KT> equivalent, since "<" binds more tightly than "&&", and "&&"
KT> imposes a left-to-right evaluation with or without the parentheses.

KT> An argument for using the extra parentheses would be that they make
KT> it clearer. They don't for me personally; in this particular case,
KT> the precedence is carved deeply enough into my brain that it's clear
KT> enough without the parentheses. But YMMV. Obviously, different
KT> people have different levels of comfort with the precedence levels
KT> of the various operators.

i agree with the dropping of unneeded parens. one place i do use extra
parens is with ?:. i find parens around the conditional part helps given
the usually longer total expression. it highlights that as the
conditional part. not critical but a little style thing i do. and it is
especially helpful when doing nested ?: ops.

uri

Jim Gibson · Apr 28, 2011

Keith Thompson said:
Eli the Bearded said:

In comp.lang.perl.misc said:

I read up on this on the www and I found ideas like

if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

which is pretty uncipherable at a glance and just in general not
elegant in any sense.

True. That's why it's much better to not use regexps for numerical
ranges.

I generally do something like

if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

I'd write that as

if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )

because I like to make sure things operate in the order I want them
to.

Really?

First off, I hope you're aware that both forms are exactly
equivalent, since "<" binds more tightly than "&&", and "&&"
imposes a left-to-right evaluation with or without the parentheses.

An argument for using the extra parentheses would be that they make
it clearer. They don't for me personally; in this particular case,
the precedence is carved deeply enough into my brain that it's clear
enough without the parentheses. But YMMV. Obviously, different
people have different levels of comfort with the precedence levels
of the various operators.

Another argument for using the extra, redundant parentheses is that it
will work without regard to precedence. I always use the parentheses.
That way I don't have to remember what the operator precedence is and
can worry about other things.

To quote Sherlock Holmes:

"You see," he explained, "I consider that a man's brain originally is
like a little empty attic, and you have to stock it with such furniture
as you choose. A fool takes in all the lumber of every sort that he
comes across, so that the knowledge which might be useful to him gets
crowded out, or at best is jumbled up with a lot of other things so
that he has a difficulty in laying his hands upon it. Now the skilful
workman is very careful indeed as to what he takes into his
brain-attic. He will have nothing but the tools which may help him in
doing his work, but of these he has a large assortment, and all in the
most perfect order. It is a mistake to think that that little room has
elastic walls and can distend to any extent. Depend upon it there comes
a time when for every addition of knowledge you forget something that
you knew before. It is of the highest importance, therefore, not to
have useless facts elbowing out the useful ones."

-- /A Study in Scarlet/, A. C. Doyle.

Justin C · Apr 28, 2011

To quote Sherlock Holmes:

"You see," he explained, "I consider that a man's brain originally is
like a little empty attic, and you have to stock it with such furniture
as you choose. A fool takes in all the lumber of every sort that he
comes across, so that the knowledge which might be useful to him gets
crowded out, or at best is jumbled up with a lot of other things so
that he has a difficulty in laying his hands upon it. Now the skilful
workman is very careful indeed as to what he takes into his
brain-attic. He will have nothing but the tools which may help him in
doing his work, but of these he has a large assortment, and all in the
most perfect order. It is a mistake to think that that little room has
elastic walls and can distend to any extent. Depend upon it there comes
a time when for every addition of knowledge you forget something that
you knew before. It is of the highest importance, therefore, not to
have useless facts elbowing out the useful ones."

Now we know where Matt Groening got Homer's quote "...every time I learn
something new it pushes some old stuff out of my brain".

I should read more... but then I'd probably forget stuff I want to
remember.

Justin.
