sln
Somebody recently posted a question on perl.beginners under this topic
(was: "Regex to match a numerical range").
The poster was trying to use a regex to match a range of IPs,
like 127.0.0.[0-255] or something like that.
A bunch of posters replied with textual solutions.
The group is a mailing list, and I don't really have an email address
for it or know how it works. I was going to reply with something like
the code below.
It uses eval heavily and has some moderately complex regexes.
The principle is that each decimal number is placed inside a \x{#}
escape, where Perl reads it as hex, so the number becomes a single
UTF-8 character.
Its workings are not at all obvious, and it is comparatively slow,
not only because of the evals but also because the values run past
the single-byte range and become UTF-8 (wide) characters in the
regexes.
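A small sketch of that principle (my own check, not part of the script below): it builds the character for a few quad values the way the script does, with a string eval of "\x{$n}", and prints the resulting code point. Note that 10 becomes U+0010 (16) and 255 becomes U+0255 (597). It also checks that chr(hex $n) produces the same character with no eval at all; whether that is noticeably faster here is an assumption I haven't measured. The helper names via_eval and via_chr are just for illustration.

```perl
use strict;
use warnings;

# Build the character the way the script does: string-eval "\x{$n}",
# which reads the decimal digits of $n as a hex code point.
sub via_eval {
    my ($n) = @_;
    my $c = eval "\"\\x{$n}\"";
    die $@ if $@;
    return $c;
}

# Assumption: chr(hex $n) is an eval-free equivalent (checked below).
sub via_chr {
    my ($n) = @_;
    return chr(hex $n);
}

for my $n (qw(0 9 10 127 255)) {
    my $cp = ord(via_eval($n));
    printf "%3s -> U+%04X (%d)\n", $n, $cp, $cp;
}

# Both constructions give the same character for every 1-3 digit quad.
for my $n (0 .. 999) {
    die "mismatch at $n\n" unless via_eval($n) eq via_chr($n);
}
print "eval and chr(hex) agree for 0..999\n";
```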
For example, a test case iterates over the numeric range 0-255,
where, for instance, 255 is read as the hex number \x255 (0x255,
i.e. 597 decimal), not as decimal 255. So a continuous range of
decimal numbers maps to a different, gapped range of hex values.
However, any check in the range \x0 - \x255 still works, because
reading the same decimal digits as hex keeps them in the same order:
\x0 < \x127 < \x255, so it follows that in decimal, 127 is greater
than 0 and less than 255.
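The deduction relies on the hex reading preserving order. For strings made only of the digits 0-9 (no leading zeros), that holds: same-length strings compare digit by digit either way, and a k-digit string is at most 0x99..9, which is below 16**k, the smallest value of any longer string. The sketch below simply checks that over 0..99999 (the widest port the script accepts), then cross-checks one character-class range against a plain numeric comparison. The bounds 180 and 200 come from the script's test template; the rest is illustrative.

```perl
use strict;
use warnings;

# 1) Reading the decimal digits of $n as hex never reverses the order.
my $prev = -1;
for my $n (0 .. 99999) {
    my $h = hex $n;                      # e.g. 255 -> 0x255 = 597
    die "order broken at $n\n" unless $h > $prev;
    $prev = $h;
}
print "hex reading is strictly increasing over 0..99999\n";

# 2) A class built from the mapped bounds agrees with a numeric test.
my ($lo, $hi) = (180, 200);
my $class = '[' . chr(hex $lo) . '-' . chr(hex $hi) . ']';
my $bad = 0;
for my $n (0 .. 255) {
    my $by_regex   = chr(hex $n) =~ /^$class\z/ ? 1 : 0;
    my $by_numbers = ($n >= $lo && $n <= $hi)   ? 1 : 0;
    $bad++ if $by_regex != $by_numbers;
}
print "class [$lo-$hi] disagreements over 0..255: $bad\n";
```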
Inserting these characters into a regex had me concerned for
a while, but I tested it enough to be satisfied.
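One case that seems worth checking explicitly: a handful of digit strings read as hex land on regex metacharacters, e.g. 20 -> ' ', 23 -> '#', 24 -> '$', 28 -> '(' and 29 -> ')'. Inside a [...] range they are literal, but a single value spliced into a pattern compiled with /x is not. This sketch (my own check, with a hypothetical helper works()) builds ^<char>\z for a few values, raw and through quotemeta, and tests whether the pattern matches its own character and rejects "x". 28 and 29 are left out because a raw '(' or ')' does not even compile.

```perl
use strict;
use warnings;

# Hypothetical helper: build ^<char>\z for one quad value $n (char = "\x{$n}"),
# optionally through quotemeta, compile it under /x as the script does, and
# report 1 only if it matches its own character and rejects "x".
sub works {
    my ($n, $quote) = @_;
    my $c    = chr(hex $n);
    my $atom = $quote ? quotemeta($c) : $c;
    my $re   = '^' . $atom . '\z';
    return ($c =~ /$re/x && 'x' !~ /$re/x) ? 1 : 0;
}

for my $n (qw(20 23 24 127)) {
    printf "%3s: raw=%d quoted=%d\n", $n, works($n, 0), works($n, 1);
}
```

Raw, 20 vanishes as /x whitespace, 23 turns the rest of the pattern into a comment, and 24 becomes an end anchor; quoted, all of them behave.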
Here is an excerpt of the post from perl.beginners:
" For a reason i don't understand:
127.0.0.1 doesn't match as expected...
Everything between 127.0.0.2 and 127.0.0.299 matches...
127.0.0.230 doesn't match...
What am I doing wrong??
"
And there were many good replies.
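We can't tell from the excerpt exactly which regex was used, but the usual trap in this kind of question is that a bracketed range such as [0-255] is a character class: it matches one character from the set 0-2, 5, 5, i.e. one of '0', '1', '2', '5', not a number from 0 to 255. A minimal sketch of that, plus the plain textual alternative of capturing the last quad and comparing it numerically (in_last_quad_range is a hypothetical name):

```perl
use strict;
use warnings;

# [0-255] is the class {'0'..'2', '5', '5'}: it matches exactly one character.
for my $ip (qw(127.0.0.1 127.0.0.5 127.0.0.7 127.0.0.25)) {
    my $class_hit = $ip =~ /^127\.0\.0\.[0-255]\z/ ? 'yes' : 'no';
    printf "%-11s [0-255] class: %s\n", $ip, $class_hit;
}

# Textual approach: capture the last quad and compare it as a number.
sub in_last_quad_range {
    my ($ip, $lo, $hi) = @_;
    return 0 unless $ip =~ /^127\.0\.0\.(\d{1,3})\z/;
    return ($1 >= $lo && $1 <= $hi) ? 1 : 0;
}

for my $ip (qw(127.0.0.1 127.0.0.230 127.0.0.299)) {
    printf "%-11s in 0..255: %d\n", $ip, in_last_quad_range($ip, 0, 255);
}
```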
-sln
# Regex IP Range matching
# (each decimal number is read as hex inside \x{#}, giving one UTF-8 char)
# ------------------------------------------------
use strict;
use warnings;
#### Test cases
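# testQuadAndPortRange();
binmode STDOUT, ':encoding(UTF-8)'; # the printed pattern contains wide characters
print "\n";
my $pattern = makeUIpRegex('178. [10-45] .[180-200] . [223-254]: [190-195] ');
# makeUIpRegex returns '' on a bad template, and an empty /$pattern/ would
# silently reuse the last successful pattern, so stop here instead.
die "Invalid IP range template\n" unless $pattern;
print "Testing IP range ...\npattern =\n$pattern\n\n";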
my ($count, $matched, $nomatch) = (0,0,0);
for my $q2 (20 .. 22) {
for my $q3 (0 .. 255) {
for my $q4 (0 .. 255)
{
my $curip = "178 .$q2. $q3 .$q4 :$q3";
if ( makeUIp( $curip ) =~ /$pattern/ ) {
print "Matched! ($curip)\n";
$matched++;
}
else {
# print "No match! ($curip)\n";
$nomatch++;
}
$count++;
}
}
}
print <<EOM;
Checked $count
Matched $matched
Not-matched $nomatch
EOM
exit;
#### subs
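# Declared at file scope so the BEGIN blocks inside the subs below
# can fill them in at compile time.
my ( $rx_Ip, $rx_IpRange );
## Constructs a UTF-8 char string from a decimal-notation IP
# (each decimal number is read as hex inside \x{#}, giving one UTF-8 char)
# Input -> '#.#.#.#' with an optional port ': #'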
sub makeUIp
{
my ($ip) = @_;
BEGIN { $rx_Ip = qr/
^ \s*
(\d{1,3}) \s* \. \s* # q1 (1-3 digits)
(\d{1,3}) \s* \. \s* # q2 "
(\d{1,3}) \s* \. \s* # q3 "
(\d{1,3}) \s* # q4 "
(?: : \s* (\d{1,5}) \s*)? # optional port num (1-5 digits)
$ /x;
}
if ($ip =~ / $rx_Ip /x ) {
if (defined $5) {
eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4} : \\x{$5}\" ";
}
else {
eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4}\" ";
}
if ($@) { warn $@; return ''; }
return $ip;
}
# not the correct ip form
return '';
}
## Constructs a UTF-8 regex pattern from a decimal-notation IP template
# (each decimal number is read as hex inside \x{#}, giving one UTF-8 char)
# Input -> '#.#.#.#', where each '#' may also be a range '[#-#]',
# with an optional port ': #' or ': [#-#]'
sub makeUIpRegex
{
my ($ip) = @_;
my $res = '';
BEGIN { $rx_IpRange = qr/
^ \s*
(?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
(?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
(?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
(?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*
(?:
: \s* (?: (\d{1,5}) | \[\s*(\d{1,5})\s*-\s*(\d{1,5})\s*\] ) \s*
)?
$ /x;
}
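if ($ip =~ / $rx_IpRange /x ) {
# Single values are wrapped in \Q..\E (quotemeta): some digit strings map
# to regex metacharacters, e.g. \x{20} = ' ' (ignored under /x),
# \x{23} = '#' (starts a /x comment), \x{24} = '$', \x{28} = '('.
# Range endpoints sit inside [...] and need no quoting.
if (defined $1) { $res .= qq(\\Q\\x{$1}\\E\\\\.) }
else { $res .= qq([\\x{$2}-\\x{$3}]\\\\.) }
if (defined $4) { $res .= qq(\\Q\\x{$4}\\E\\\\.) }
else { $res .= qq([\\x{$5}-\\x{$6}]\\\\.) }
if (defined $7) { $res .= qq(\\Q\\x{$7}\\E\\\\.) }
else { $res .= qq([\\x{$8}-\\x{$9}]\\\\.) }
if (defined $10) { $res .= qq(\\Q\\x{$10}\\E) }
else { $res .= qq([\\x{$11}-\\x{$12}]) }
if (defined $13) {
$res .= qq(\\\\ :\\\\ \\Q\\x{$13}\\E);
}
elsif (defined $14) {
$res .= qq(\\\\ :\\\\ [\\x{$14}-\\x{$15}]);
}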
eval "\$ip = \"$res\" ";
if ($@) { warn $@; return ''; }
return qr/$ip/x;
}
# not the correct form
return '';
}
## Constructs and runs the UTF-8 regex /^[$i-$i]$/ for each $i
# (each $i is read as hex inside \x{$i}, giving one UTF-8 char)
# Tests for conflicts in character class syntax
sub testQuadAndPortRange
{
print "Testing quad and port range for conflicts ...\n";
for my $i (0 .. 99999)
{
my ($rx,$src);
eval " \$rx = '^'.\"\[\\x{$i}\\-\\x{$i}\]\".'\$' ";
if ($@) { warn $@; next; }
eval " \$src = \"\\x{$i}\" ";
if ($@) { warn $@; next; }
if ($src =~ / $rx /x) {
# print "OK! $i\n";
}
else {
print "***** BAD $i $rx\n";
# sleep (1);
}
}
}
__END__
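For comparison, a sketch of the same two conversions without string eval. It keeps the idea above (decimal digits read as hex, one character per quad or port) but builds the characters with chr(hex ...) and quotes single values with quotemeta. The names to_chars, part_regex and range_regex are hypothetical, the template syntax follows makeUIpRegex (except that up to 5 digits are accepted everywhere, a simplification), and I haven't timed it against the eval version.

```perl
use strict;
use warnings;

# Hypothetical eval-free counterpart of makeUIp: '#.#.#.#' plus optional ': #'.
sub to_chars {
    my ($ip) = @_;
    my @n = $ip =~ /^ \s* (\d{1,3}) \s*\.\s* (\d{1,3}) \s*\.\s* (\d{1,3}) \s*\.\s* (\d{1,3}) \s*
                      (?: : \s* (\d{1,5}) \s* )? \z/x
        or return '';
    my $s = join '.', map { chr(hex($_)) } @n[0 .. 3];
    $s .= ' : ' . chr(hex $n[4]) if defined $n[4];
    return $s;
}

# One template part ('#' or '[#-#]') -> regex fragment, or undef if malformed.
sub part_regex {
    my ($spec) = @_;
    return quotemeta(chr(hex $1))                        # single value: quote it
        if $spec =~ /^\s*(\d{1,5})\s*\z/;
    return '[' . chr(hex $1) . '-' . chr(hex $2) . ']'   # digit-only hex never gives ] \ ^ -
        if $spec =~ /^\s*\[\s*(\d{1,5})\s*-\s*(\d{1,5})\s*\]\s*\z/;
    return undef;
}

# Hypothetical eval-free counterpart of makeUIpRegex.
sub range_regex {
    my ($tpl) = @_;
    my ($quads, $port) = split /:/, $tpl, 2;
    my @frag = map { part_regex($_) } split /\./, $quads, -1;
    return '' unless @frag == 4 && !grep { !defined } @frag;
    my $re = join '\.', @frag;
    if (defined $port) {
        my $p = part_regex($port);
        return '' unless defined $p;
        $re .= ' : ' . $p;               # to_chars writes the port as ' : p'
    }
    return qr/$re/;                      # no /x, so spaces stay literal
}

my $re = range_regex('178. [10-45] .[180-200] . [223-254]: [190-195] ');
my $hits = 0;
for my $q2 (20 .. 22) {
    for my $q3 (0 .. 255) {
        for my $q4 (0 .. 255) {
            $hits++ if to_chars("178 .$q2. $q3 .$q4 :$q3") =~ $re;
        }
    }
}
print "matched $hits\n";
```

With that template only q2 in 20..22 (3 values), q3 in 190..195 (6, since the port reuses q3) and q4 in 223..254 (32) can match, so the count should come out as 3 * 6 * 32 = 576.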