A regex to search for numeric ranges...

Mr P

I read up on this on the www and I found ideas like

if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

which is pretty indecipherable at a glance and just in general not
elegant in any sense.

I generally do something like

if ( /(\d+)/ && $1 > 256 && $1 < 1024 )


Which to me is a lot more readable at a glance, but like the example
above not overly elegant.

But what I'd REALLY like to do is, similar to the trick for numeric
sort, a way to do it in the regex like

/[256-1024]/ # but force it to be numeric, not literal perhaps with a
switch

Thoughts, Masters?
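
A minimal sketch of the capture-and-compare idiom above, generalized so that every number in the string is tried rather than only the first one. The helper name `first_in_range` and the inclusive bounds are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first whole number in $text that lies
# within [$low, $high] (inclusive), or undef if there is none.
sub first_in_range {
    my ($text, $low, $high) = @_;
    while ($text =~ /\b(\d+)\b/g) {
        my $n = $1;
        return $n if $n >= $low && $n <= $high;
    }
    return;
}

print first_in_range('port 80, then 443, then 8080', 257, 1023), "\n";
```

The regex only finds candidate numbers; the range check stays ordinary arithmetic, so changing the bounds never means rewriting a pattern.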
 
Mr P

Eli the Bearded said:
> In comp.lang.perl.misc said:
>> I read up on this on the www and I found ideas like
>> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
>> which is pretty indecipherable at a glance and just in general not
>> elegant in any sense.
>
> True. That's why it's much better to not use regexps for numerical
> ranges.
>
>> I generally do something like
>>  if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
>
> I'd write that as
>
>    if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>
> because I like to make sure things operate in the order I want them
> to.
>
>> Which to me is a lot more readable at a glance, but like the example
>> above not overly elegant.
>> But what I'd REALLY like to do is, similar to the trick for numeric
>> sort, a way to do it in the regex like
>> /[256-1024]/ # but force it to be numeric, not literal perhaps with a
>> switch
>
> sub mknumre($$) {
>   my $low = shift;
>   my $hi  = shift;
>
>   my $set = join('|', ($low .. $hi));
>
>   return qr/($set)/;
>
> }
>
>> Thoughts, Masters?
>
> Why does this have to be a regular expression? Use the right tool
> for the job.

I guess my answer to that question is that my 1-line regex is a lot
easier to read and much shorter than your 9-line monster!
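
For comparison, a sketch of the alternation approach quoted above with word-boundary anchors added, so that a number embedded in a longer digit run (the 300 inside 13000) is not accepted. The name `range_re` is hypothetical. The pattern still grows by one alternative per integer, so this only suits small ranges.

```perl
use strict;
use warnings;

# Hypothetical variant of the quoted mknumre(): build an alternation of
# every integer from $low to $hi and anchor it on word boundaries.
sub range_re {
    my ($low, $hi) = @_;
    my $set = join '|', $low .. $hi;
    return qr/\b($set)\b/;
}

my $re = range_re(257, 1023);
for my $text ('300 widgets', '13000 widgets', '1024 widgets') {
    printf "%-14s %s\n", $text, $text =~ $re ? "matches ($1)" : 'no match';
}
```

Note that zero-padded input such as 0300 is not matched by this sketch, because no alternative starts with a leading zero.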
 
Mr P

> I'd write that as
>
>    if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>
> because I like to make sure things operate in the order I want them
> to.

There is no ambiguity in the order of my example; study OPERATOR
PRECEDENCE. Mine is just less syntax-intensive.
 
sln

> I read up on this on the www and I found ideas like
>
> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
>
> which is pretty indecipherable at a glance and just in general not
> elegant in any sense.
>
> I generally do something like
>
> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
>
> Which to me is a lot more readable at a glance, but like the example
> above not overly elegant.
>
> But what I'd REALLY like to do is, similar to the trick for numeric
> sort, a way to do it in the regex like
>
> /[256-1024]/ # but force it to be numeric, not literal perhaps with a
> switch
>
> Thoughts, Masters?


/[256-1024]/ is generally possible.
It has limitations that affect the surrounding expressions, but it
could be worked around and functionally generalized (again within
specific limitations).

-sln

-----------------------

use strict;
use warnings;

my $str = '0001023 widgets';

# Inline code is going to be a thing of the future and definitely
# going to happen (see perl 6 regex).
# This allows parameter checking and is useful when the source
# has extended data to be regex analyzed in one expression.

if ($str =~ / \b (\d+) \b
              (?(?{$^N > 256 && $^N < 1024})  # is this number between 256-1024?
                                              # yes, continue processing
                |
                (*FAIL)                       # no, fail outright
              )
              # more expressions here ..
              \s*
              (.+)
            /x )
{
    print "Number: '$1', Type: '$2'\n";
}
else {
    print "failed\n";
}

print "\n";

# This does a source conversion of \d+ to a single utf8 character.
# It then allows checking it in a HEX numeric range character class.
# Even though the source is decimal, '1023', when magically assumed to
# be hex and converted to a utf8 char like "\x{1023}", its code point
# will be correctly matched within a regex character class range.
# Example: "\x{1023}" =~ /[\x{257}-\x{1023}]/ will match.
# And only "\x{N}" where the decimal-digit string N is between 257 and
# 1023 will match.

for (0 .. 4096)
{
    # Construct a fake string using the current counter.
    # In reality, you have to parse the source string and do the conversion
    # so that you end up doing something like this:
    # $src =~ /^(.*?)\b(\d+)\b(.*?)$/
    # eval "\$temp_src = \"$1\\x{$2}$3\" ";
    # Then use the $temp_src in place of the $str below.

    my $padded_string = "000$_"; # the extra '000' padding is just a test
    eval "\$str = \"\\x{$padded_string} widgets\" ";

    if ( $str =~ /^ ([\x{257}-\x{1023}])
                    \s*
                    (.+)
                  /x )
    {
        print "Number: '$padded_string', Type: '$2'\n";
    }
}
__END__

Output
------------

Number: '0001023', Type: 'widgets'

Number: '000257', Type: 'widgets'
Number: '000258', Type: 'widgets'
Number: '000259', Type: 'widgets'
Number: '000260', Type: 'widgets'
Number: '000261', Type: 'widgets'
Number: '000262', Type: 'widgets'
Number: '000263', Type: 'widgets'
Number: '000264', Type: 'widgets'
Number: '000265', Type: 'widgets'
Number: '000266', Type: 'widgets'
Number: '000267', Type: 'widgets'
...
...
Number: '0001012', Type: 'widgets'
Number: '0001013', Type: 'widgets'
Number: '0001014', Type: 'widgets'
Number: '0001015', Type: 'widgets'
Number: '0001016', Type: 'widgets'
Number: '0001017', Type: 'widgets'
Number: '0001018', Type: 'widgets'
Number: '0001019', Type: 'widgets'
Number: '0001020', Type: 'widgets'
Number: '0001021', Type: 'widgets'
Number: '0001022', Type: 'widgets'
Number: '0001023', Type: 'widgets'
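
The second technique can also be sketched without the string eval, which would execute whatever text ends up inside the double-quoted string. This variant swaps in Perl's built-in hex and chr for the conversion; the helper name `digits_to_char` is hypothetical. It relies on the same observation as the code above: reading a string of decimal digits as hexadecimal preserves numeric order, so the character class range still selects 257 through 1023.

```perl
use strict;
use warnings;

# Hypothetical helper: read a string of decimal digits as if it were
# hexadecimal and return the character with that code point.
sub digits_to_char {
    my ($digits) = @_;
    return chr hex $digits;
}

# Same test loop as above: zero-padded counters from 0 to 4096.
my @hits = grep { digits_to_char("000$_") =~ /^[\x{257}-\x{1023}]$/ } 0 .. 4096;
print "first: $hits[0], last: $hits[-1], count: ", scalar(@hits), "\n";
```

The trick remains limited to values whose hex reading is a valid code point, so it is an illustration rather than a general range matcher.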
 
Uri Guttman

s> /[256-1024]/ is generally possible.

s> It has limitations that affect the surrounding expressions, but it
s> could be worked around and functionally generalized (again within
s> specific limitations).

limitations? it is just wrong. that is a char class of all those digits
(and i am not even sure what [6-1] will generate).

uri
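
A quick way to see what perl makes of the [6-1] part is to compile the pattern at run time and catch the error. This is a minimal sketch; the exact wording of the message may differ between perl versions.

```perl
use strict;
use warnings;

# Compile the pattern from a string so the failure happens at run time,
# inside eval, instead of aborting compilation of the whole script.
my $pattern = '[256-1024]';
my $re      = eval { qr/$pattern/ };
my $error   = $@;

print defined($re) ? "compiled\n" : "rejected: $error";
```

Even with a valid range such as [2-6], a character class matches exactly one character, so it can never express a multi-digit numeric range by itself.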
 
Ilya Zakharevich

> I'm sure. The second one, mapping integer sequences to characters to
> then use a Unicode character class has all the workings of a brilliant
> bit of obfuscation. I suspect it doesn't scale well, say 2^16 or
> 2^32, but I don't really know how Perl handles Unicode internally.

When I worked on this (long time ago), there were no compilers with
128-bit IV sitting around (are there now?). Hence the support I
implemented was intended to work "up to maximal number
representable by UV", but it is actually coded with limitation "not
higher than 64 bits". I doubt anybody expanded to further than
this (the "hooks" for expansion are there, just probably not implemented)...

Hope this helps,
Ilya
 
Keith Thompson

Eli the Bearded said:
> In comp.lang.perl.misc said:
>> I read up on this on the www and I found ideas like
>>
>> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
>>
>> which is pretty indecipherable at a glance and just in general not
>> elegant in any sense.
>
> True. That's why it's much better to not use regexps for numerical
> ranges.
>
>> I generally do something like
>>
>> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
>
> I'd write that as
>
> if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>
> because I like to make sure things operate in the order I want them
> to.

Really?

First off, I hope you're aware that both forms are exactly
equivalent, since "<" binds more tightly than "&&", and "&&"
imposes a left-to-right evaluation with or without the parentheses.

An argument for using the extra parentheses would be that they make
it clearer. They don't for me personally; in this particular case,
the precedence is carved deeply enough into my brain that it's clear
enough without the parentheses. But YMMV. Obviously, different
people have different levels of comfort with the precedence levels
of the various operators.

But I'd write it as:

if (/(\d+)/ and $1 > 256 and $1 < 1024)

I usually prefer "and" and "or" over "&&" and "||". On the other
hand, I have been bitten a few times by the *low* precedence of
"and" and "or"; I've mistakenly written things like

return $this and $that;

which never evaluates $that.

(And none of these are equivalent to the original regexp, which
checks for values from 0 to 255.)
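
The `return $this and $that` pitfall can be reproduced with a small sketch. The sub names are hypothetical, and some perl versions may print a precedence warning for the first sub when warnings are enabled.

```perl
use strict;
use warnings;

# "and" binds more loosely than return, so this parses as
# (return $this) and $that; $that is never evaluated.
sub with_and {
    my ($this, $that) = @_;
    return $this and $that;
}

# "&&" binds more tightly, so this parses as return ($this && $that).
sub with_ampersands {
    my ($this, $that) = @_;
    return $this && $that;
}

print "and: ", with_and(1, 0), "\n";          # prints "and: 1"
print "&&:  ", with_ampersands(1, 0), "\n";   # prints "&&:  0"
```

Inside an if condition the two spellings behave the same, which is why the difference only bites in statements like return or assignment.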
 
Uri Guttman

KT> First off, I hope you're aware that both forms are exactly
KT> equivalent., since "<" binds more tightly than "&&", and "&&"
KT> imposes a left-to-right evaluation with or without the parentheses.

KT> An argument for using the extra parentheses would be that they make
KT> it clearer. They don't for me personally; in this particular case,
KT> the precedence is carved deeply enough into my brain that it's clear
KT> enough without the parentheses. But YMMV. Obviously, different
KT> people have different levels of comfort with the precedence levels
KT> of the various operators.

i agree with the dropping of unneeded parens. one place i do use extra
parens is with ?:. i find parens around the conditional part helps given
the usually longer total expression. it highlights that as the
conditional part. not critical but a little style thing i do. and it is
especially helpful when doing nested ?: ops.

uri
 
Jim Gibson

Keith Thompson said:
> Eli the Bearded said:
>> In comp.lang.perl.misc said:
>>> I read up on this on the www and I found ideas like
>>>
>>> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
>>>
>>> which is pretty indecipherable at a glance and just in general not
>>> elegant in any sense.
>>
>> True. That's why it's much better to not use regexps for numerical
>> ranges.
>>
>>> I generally do something like
>>>
>>> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
>>
>> I'd write that as
>>
>> if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
>>
>> because I like to make sure things operate in the order I want them
>> to.
>
> Really?
>
> First off, I hope you're aware that both forms are exactly
> equivalent, since "<" binds more tightly than "&&", and "&&"
> imposes a left-to-right evaluation with or without the parentheses.
>
> An argument for using the extra parentheses would be that they make
> it clearer. They don't for me personally; in this particular case,
> the precedence is carved deeply enough into my brain that it's clear
> enough without the parentheses. But YMMV. Obviously, different
> people have different levels of comfort with the precedence levels
> of the various operators.

Another argument for using the extra, redundant parentheses is that it
will work without regard to precedence. I always use the parentheses.
That way I don't have to remember what the operator precedence is and
can worry about other things.

To quote Sherlock Holmes:

"You see," he explained, "I consider that a man's brain originally is
like a little empty attic, and you have to stock it with such furniture
as you choose. A fool takes in all the lumber of every sort that he
comes across, so that the knowledge which might be useful to him gets
crowded out, or at best is jumbled up with a lot of other things so
that he has a difficulty in laying his hands upon it. Now the skilful
workman is very careful indeed as to what he takes into his
brain-attic. He will have nothing but the tools which may help him in
doing his work, but of these he has a large assortment, and all in the
most perfect order. It is a mistake to think that that little room has
elastic walls and can distend to any extent. Depend upon it there comes
a time when for every addition of knowledge you forget something that
you knew before. It is of the highest importance, therefore, not to
have useless facts elbowing out the useful ones."

-- /A Study in Scarlet/, A. C. Doyle.
 
Justin C

> To quote Sherlock Holmes:
>
> "You see," he explained, "I consider that a man's brain originally is
> like a little empty attic, and you have to stock it with such furniture
> as you choose. A fool takes in all the lumber of every sort that he
> comes across, so that the knowledge which might be useful to him gets
> crowded out, or at best is jumbled up with a lot of other things so
> that he has a difficulty in laying his hands upon it. Now the skilful
> workman is very careful indeed as to what he takes into his
> brain-attic. He will have nothing but the tools which may help him in
> doing his work, but of these he has a large assortment, and all in the
> most perfect order. It is a mistake to think that that little room has
> elastic walls and can distend to any extent. Depend upon it there comes
> a time when for every addition of knowledge you forget something that
> you knew before. It is of the highest importance, therefore, not to
> have useless facts elbowing out the useful ones."

Now we know where Matt Groening got Homer's quote "...every time I learn
something new it pushes some old stuff out of my brain".

I should read more... but then I'd probably forget stuff I want to
remember.

Justin.
 
