Regex to match a numerical IP range

sln

Somebody recently posted this topic on perl.beginners
(was: "Regex to match a numerical range").
The person was trying to use a regex to match a range of IPs
like 127.0.0.[0-255], or something like that.
A bunch of posters replied with a textual solution.

The group is a mailing list, and I don't really have an email address
for it or know how it works. I was going to reply with something like
the code below. It uses eval heavily and has some moderately complex
regexes. The principle is that each decimal number is read as hex and
becomes a \x{#} utf8 character.

Its workings are not at all obvious, and it's pretty slow
comparatively, not only because of the evals but because the
values run past single bytes and become utf8 characters in the
regexes.

For example, a test case is to iterate over a numeric range of
0-255, where, for instance, 255 is assumed to be the hex number
\x{255}, not a decimal. So a continuous range of decimal numbers
maps to a different, non-continuous range when used as hex numbers.
However, any check in the range of \x{0} - \x{255} utf8 characters
apparently works, where \x{0} < \x{127} < \x{255}, so it is deduced
that in decimal, 127 is greater than 0 and less than 255.
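
The order-preserving property described above can be checked directly. The following is only a sketch and not part of the script further down: it uses hex() and chr() rather than string eval to build the same code points, and the helper name dec_as_hex_cp is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a decimal digit string as if it were hex,
# e.g. "255" -> 0x255 (597), the code point written as \x{255}.
sub dec_as_hex_cp {
    my ($digits) = @_;
    return hex $digits;
}

# Verify that the mapping keeps numeric order over 0 .. 255.
my $monotonic = 1;
for my $n (1 .. 255) {
    $monotonic = 0 if dec_as_hex_cp($n) <= dec_as_hex_cp($n - 1);
}
print "order preserved: ", ($monotonic ? "yes" : "no"), "\n";

# The same comparison done on characters, as a regex class would do it.
my ($lo, $mid, $hi) = map { chr dec_as_hex_cp($_) } (0, 127, 255);
print "0 < 127 < 255 as characters: ",
    (($lo lt $mid && $mid lt $hi) ? "yes" : "no"), "\n";
```

Both lines should report "yes": a decimal digit string compares the same way whether its digits are weighted by powers of 10 or powers of 16.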

Inserting these as characters in a regex had me concerned for
a while. But I tested it enough to be satisfied.
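
One case that seems worth checking when characters are inserted into a pattern compiled with /x: decimal 20 read as hex is U+0020, a plain space, and /x skips bare whitespace in a pattern. The sketch below is an illustration only (the variable names are made up) and shows that putting the character inside a bracketed class keeps it literal.

```perl
use strict;
use warnings;

my $space = chr hex 20;    # decimal 20 read as hex is U+0020, a space

# Under /x a bare space in the pattern is skipped, so this is just \A\z.
my $bare = qr/\A${space}\z/x;

# Inside a bracketed class, /x leaves the space alone.
my $classed = qr/\A[${space}]\z/x;

print "bare matches a space:    ", (" " =~ $bare    ? "yes" : "no"), "\n";
print "bare matches empty:      ", (""  =~ $bare    ? "yes" : "no"), "\n";
print "classed matches a space: ", (" " =~ $classed ? "yes" : "no"), "\n";
```

A range class such as [\x{10}-\x{45}] is not affected, since the characters already sit inside brackets.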

Here is an excerpt of the post from perl.beginners:
" For a reason i don't understand:
127.0.0.1 doesn't match as expected...
Everything between 127.0.0.2 and 127.0.0.299 matches...
127.0.0.230 doesn't match...

What am I doing wrong??
"
And there were many good replies.

-sln

# Regex IP Range matching
# where dec number becomes hex in \x{#} utf8 char
# ------------------------------------------------

use strict;
use warnings;


#### Test cases

# testQuadAndPortRange();
print "\n";

my $pattern = makeUIpRegex('178. [10-45] .[180-200] . [223-254]: [190-195] ');
print "Testing Ip range ...\npattern =\n$pattern\n\n";

my ($count, $matched, $nomatch) = (0,0,0);

for my $q2 (20 .. 22) {
    for my $q3 (0 .. 255) {
        for my $q4 (0 .. 255)
        {
            my $curip = "178 .$q2. $q3 .$q4 :$q3";
            if ( makeUIp( $curip ) =~ /$pattern/ ) {
                print "Matched! ($curip)\n";
                $matched++;
            }
            else {
                # print "No match! ($curip)\n";
                $nomatch++;
            }
            $count++;
        }
    }
}
print <<EOM;

Checked $count
Matched $matched
Not-matched $nomatch
EOM

exit;


#### subs

my ( $rx_Ip, $rx_IpRange );

## Constructs a utf8 char string from a decimal notation IP
# (where input = dec number, becomes hex in \x{#} utf8 char)
# Input -> '#.#.#.# (optional-> : [#-#] )'

sub makeUIp
{
    my ($ip) = @_;

    BEGIN { $rx_Ip = qr/
        ^ \s*
        (\d{1,3}) \s* \. \s* # q1 (1-3 digits)
        (\d{1,3}) \s* \. \s* # q2 "
        (\d{1,3}) \s* \. \s* # q3 "
        (\d{1,3}) \s* # q4 "
        (?: : \s* (\d{1,5}) \s*)? # optional port num (1-5 digits)
        $ /x;
    }
    if ($ip =~ / $rx_Ip /x ) {
        if (defined $5) {
            eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4} : \\x{$5}\" ";
        }
        else {
            eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4}\" ";
        }
        if ($@) { warn $@; return ''; }
        return $ip;
    }
    # not the correct ip form
    return '';
}

## Constructs a regex utf8 pattern from a decimal notation IP template
# (where input = dec number, becomes hex in \x{#} utf8 char)
# Input-> '#.#.#.#' to '[#-#].[#-#].[#-#].[#-#] : [#-#]'

sub makeUIpRegex
{
    my ($ip) = @_;
    my $res = '';

    BEGIN { $rx_IpRange = qr/
        ^ \s*
        (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
        (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
        (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
        (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*
        (?:
            : \s* (?: (\d{1,5}) | \[\s*(\d{1,5})\s*-\s*(\d{1,5})\s*\] ) \s*
        )?
        $ /x;
    }
    # Single values are wrapped in [...] so that /x cannot skip
    # characters such as \x{20} (a space) in the generated pattern.
    if ($ip =~ / $rx_IpRange /x ) {
        if (defined $1) { $res .= qq([\\x{$1}]\\\\.) }
        else { $res .= qq([\\x{$2}-\\x{$3}]\\\\.) }
        if (defined $4) { $res .= qq([\\x{$4}]\\\\.) }
        else { $res .= qq([\\x{$5}-\\x{$6}]\\\\.) }
        if (defined $7) { $res .= qq([\\x{$7}]\\\\.) }
        else { $res .= qq([\\x{$8}-\\x{$9}]\\\\.) }
        if (defined $10) { $res .= qq([\\x{$10}]) }
        else { $res .= qq([\\x{$11}-\\x{$12}]) }

        if (defined $13) {
            $res .= qq(\\\\ :\\\\ [\\x{$13}]);
        }
        elsif (defined $14) {
            $res .= qq(\\\\ :\\\\ [\\x{$14}-\\x{$15}]);
        }
        eval "\$ip = \"$res\" ";
        if ($@) { warn $@; return ''; }
        return qr/$ip/x;
    }
    # not the correct form
    return '';
}

## Constructs and runs utf8 regex /[$i-$i]/,
# (where $i = dec number, becomes hex in \x{$i} utf8 char)
# Tests for conflicts in character class syntax

sub testQuadAndPortRange
{
    print "Testing quad and port range for conflicts ...\n";
    for my $i (0 .. 99999)
    {
        my ($rx,$src);
        eval " \$rx = '^'.\"\[\\x{$i}\\-\\x{$i}\]\".'\$' ";
        if ($@) { warn $@; next; }

        eval " \$src = \"\\x{$i}\" ";
        if ($@) { warn $@; next; }

        if ($src =~ / $rx /x) {
            # print "OK! $i\n";
        }
        else {
            print "***** BAD $i $rx\n";
            # sleep (1);
        }
    }
}

__END__
 
Ted Zlatanov

On Sat, 11 Dec 2010 14:26:17 -0800 (e-mail address removed) wrote:

s> Somebody posted recently on perl.beginners this topic
s> (was: "Regex to match a numerical range")
s> The person was trying to match, using a regex, a range of IP's
s> like 127.0.0.[0-255] or something like that.
s> A bunch of posters replied with a textual solution.

s> The group is a list and I don't really have an email or know
s> how it works. I was going to reply with something like below.
s> It heavily uses eval, and has some moderate level regexs'.
s> The principle is that dec number becomes hex in \x{#} utf8 char.

s> Its workings are not at all that obvious and its pretty slow
s> comparitively, not only because of the evals' but because of
s> the runover past bytes, when it becomes utf8 characters in the
s> regexs'.

I think Net::Netmask is much better for this task than any custom
solution. Have you tried it?

Ted
 
sln

Ted Zlatanov wrote:

> I think Net::Netmask is much better for this task than any custom
> solution. Have you tried it?

Well, I thought it was just a case of knowing the simple IP
address without knowing anything about the CIDR network (block).
So given a simple quad-part notation and range, a simple comparison
would be all that's needed instead of a full-blown Cisco-type thing.
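
For illustration, a "simple comparison" of that kind might look like the sketch below. It is an assumption about what such a check could be, not code from this thread: the helper name ip_in_quad_ranges and the array-of-pairs argument are made up, and it compares the quads as ordinary numbers with no regex character classes involved.

```perl
use strict;
use warnings;

# Hypothetical helper: does $ip fall inside per-quad [low, high] bounds?
# $ranges is an array ref holding four [low, high] pairs, one per quad.
sub ip_in_quad_ranges {
    my ($ip, $ranges) = @_;

    # Split into four decimal quads; anything else is rejected.
    my @quads = $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;

    for my $i (0 .. 3) {
        my ($low, $high) = @{ $ranges->[$i] };
        return 0 if $quads[$i] > 255;                            # not a valid octet
        return 0 if $quads[$i] < $low || $quads[$i] > $high;     # outside the range
    }
    return 1;
}

# 178.[10-45].[180-200].[223-254]
my $ranges = [ [178, 178], [10, 45], [180, 200], [223, 254] ];
for my $ip ('178.20.190.230', '178.20.190.10', '178.20.190.299') {
    print "$ip ", (ip_in_quad_ranges($ip, $ranges) ? "in range" : "out of range"), "\n";
}
```

This only covers per-quad ranges; it does not attempt CIDR blocks, which is the area Net::Netmask handles.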

-sln
 
sln

> However, any check in the range of \x0 - \x255 utf8 characters
> apparently works, where \x0 < \x127 < \x255, so it is deduced

I think a better solution is to use Net::Netmask.

But, to finish this thing up, I wanted to flesh out the range class.
For maximum flexibility, the template character class can now include
individual numbers and ranges, for example: [0-5,8,220,225-245], etc.
And for a little extra speed, I added a wildcard '*' so a particular part
doesn't need a range class. It just inserts a m/./ in the regex.
Validation was added on the quad and optional port parts.
This is all I will be doing on this because its relevance as a tool
is questionable, and adding anything else would take refactoring.

I thought it was a neat exercise in utf8, eval, and regular expressions though.
Parsing templates is too much work.

-sln

# Templating-Regex IP Range matching using utf8 chars \x{#}
# -----------------------------------------------------------

use strict;
use warnings;

my $show_UIpRegex = 1;

#### Test cases

print "\n";

my $pattern = makeUIpRegex(
'127. * . [99-110, 180, 182] . *' );

print "Testing Ip range ...\npattern =\n$pattern\n\n";

my ($count, $matched, $nomatch) = (0,0,0);

for my $q2 (20..22, 25) {
    for my $q3 (0..255) {
        for my $port (13, 193..195, 32700..32800, 3)
        {
            my $curip = "127 .$q2. $q3 .0 : $port";
            if ( makeUIp( $curip ) =~ /$pattern/ ) {
                print "Matched! $curip\n";
                $matched++;
            }
            else {
                # print "No match! $curip\n";
                $nomatch++;
            }
            $count++;
        }
    }
}
print <<EOM;

Checked $count
Matched $matched
Not-matched $nomatch
EOM

exit;


#### subs

my ( $rx_IP, $rx_IPRx, $rx_IpRange, $rx_PortRange );


## Constructs a utf8 char string from a decimal notation IP
# ( where input = dec number, becomes hex in \x{#} utf8 char )
# Input -> '#.#.#.# (optional-> : # )'

sub makeUIp
{
    my ($ip) = @_;

    BEGIN { $rx_IP = qr/
        ^ \s*
        (\d{1,3}) \s* \. \s* # q1 (1-3 digits)
        (\d{1,3}) \s* \. \s* # q2 "
        (\d{1,3}) \s* \. \s* # q3 "
        (\d{1,3}) \s* # q4 "
        (?: : \s* (\d{1,5}) \s*)? # optional port num (1-5 digits)
        $ /x;
    }
    if ($ip =~ /$rx_IP/)
    {
        if ( defined $5 ) {
            eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4} : \\x{$5}\" ";
        }
        else {
            eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4}\" ";
        }
        if ($@) { warn $@; return ''; }
        return $ip;
    }
    # not the correct ip form
    return '';
}


## Constructs a REGEX utf8 pattern from a decimal notation IP template
# ( where input = dec number, becomes hex in \x{#} utf8 char )
# Input-> '#.#.#.#' - '[range].[range].[range].[range] : [range]'
# '*' can be substituted for any quad/port part # and is equivalent
# to range class [0-#(max_digits)] but is implemented as m/./ in the
# regex as opposed to a [\x{0}-\x{255}] character class
# ( example: 127. *. [range]. 1 : [range] )
# Range can be any combination of #-# or # separated by commas
# (example: [#-#,#,#-#,#,#, ...] )
# Validation is done on all template parts to check they are between
# 0-255 and 0-65535; however, this is not done for the wildcard '*'
# since it has no value to check.
# '*' should speed up the match but will allow ranges of [0-999]
# and/or [0-99999] depending on which part it is used on.
# If the source string needs to be 100% valid, don't use '*';
# use a range [#-#] or #

sub makeUIpRegex
{
    my ($ip) = @_;
    my $res = '';
    my $msg = 'makeUIpRegex : Invalid %s part \'%s\'';

    BEGIN { $rx_IPRx = qr/
        ^ \s*
        (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*\.\s*
        (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*\.\s*
        (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*\.\s*
        (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*
        (?:
            : \s* (?: (\d{1,5} | \*) | \[\s* ([^\]]+) \s*\] ) \s*
        )?
        $ /x;
    }
    # Single values are wrapped in [...] so that /x cannot skip
    # characters such as \x{20} (a space) in the generated pattern.
    if ($ip =~ /$rx_IPRx/)
    {
        if ( defined $1 ) {
            die sprintf( $msg, 'quad 1', $1) if ($1 ne '*' && $1 > 255);
            $res .= ($1 eq '*' ? qq(\\.\\\\.) : qq([\\x{$1}]\\\\.)) }
        else {
            $res .= '[';
            my $str = parseIpRange( $2 );
            die sprintf( $msg, 'quad 1', "\[$2\]") if (!defined $str);
            $res .= $str . ']\\\\.';
        }
        if ( defined $3 ) {
            die sprintf( $msg, 'quad 2', $3) if ($3 ne '*' && $3 > 255);
            $res .= ($3 eq '*' ? qq(\\.\\\\.) : qq([\\x{$3}]\\\\.)) }
        else {
            $res .= '[';
            my $str = parseIpRange( $4 );
            die sprintf( $msg, 'quad 2', "\[$4\]") if (!defined $str);
            $res .= $str . ']\\\\.';
        }
        if ( defined $5 ) {
            die sprintf( $msg, 'quad 3', $5) if ($5 ne '*' && $5 > 255);
            $res .= ($5 eq '*' ? qq(\\.\\\\.) : qq([\\x{$5}]\\\\.)) }
        else {
            $res .= '[';
            my $str = parseIpRange( $6 );
            die sprintf( $msg, 'quad 3', "\[$6\]") if (!defined $str);
            $res .= $str . ']\\\\.';
        }
        if ( defined $7 ) {
            die sprintf( $msg, 'quad 4', $7) if ($7 ne '*' && $7 > 255);
            $res .= ($7 eq '*' ? qq(\\.) : qq([\\x{$7}])) }
        else {
            $res .= '[';
            my $str = parseIpRange( $8 );
            die sprintf( $msg, 'quad 4', "\[$8\]") if (!defined $str);
            $res .= $str . ']';
        }
        if ( defined $9 ) {
            die sprintf( $msg, 'port', $9) if ($9 ne '*' && $9 > 65535);
            $res .= qq(\\\\ :\\\\ );
            $res .= ($9 eq '*' ? qq(\\.) : qq([\\x{$9}]));
        }
        elsif ( defined $10 ) {
            $res .= qq(\\\\ :\\\\ [);
            my $str = parsePortRange( $10 );
            die sprintf( $msg, 'port', "\[$10\]") if (!defined $str);
            $res .= $str . ']';
        }
        if ( $show_UIpRegex ) {
            print $res,"\n\n";
        }
        eval "\$ip = \"$res\" ";
        if ($@) { warn $@; return '' }
        return qr/$ip/x;
    }
    # not the correct form
    die sprintf( $msg, 'general format', $ip);
    return undef;
}


## Range-parses individual IP quad part
#

sub parseIpRange
{
    my ($class_string) = @_;

    BEGIN { $rx_IpRange = qr/
        ^ \s* (?: (\d{1,3}) | (\d{1,3})\s* - \s* (\d{1,3}) ) \s*
        $ /x;
    }
    my @rangevals = split /,/, $class_string;
    my $res = '';
    for my $val ( @rangevals ) {
        if ( $val =~ /$rx_IpRange/ ) {
            if ( defined $1 ) {
                return undef if ($1 > 255); # bad
                $res .= qq(\\x{$1})
            }
            else {
                return undef if ($2 > 255 || $3 > 255); # bad
                $res .= qq(\\x{$2}-\\x{$3})
            }
        }
        else { return undef } # bad
    }
    return $res;
}


## Range-parses the IP port
#

sub parsePortRange
{
    my ($class_string) = @_;

    BEGIN { $rx_PortRange = qr/
        ^ \s* (?: (\d{1,5}) | (\d{1,5})\s* - \s* (\d{1,5}) ) \s*
        $ /x;
    }
    my @rangevals = split /,/, $class_string;
    my $res = '';
    for my $val ( @rangevals ) {
        if ( $val =~ /$rx_PortRange/ ) {
            if ( defined $1 ) {
                return undef if ($1 > 65535); # bad
                $res .= qq(\\x{$1})
            }
            else {
                return undef if ($2 > 65535 || $3 > 65535); # bad
                $res .= qq(\\x{$2}-\\x{$3})
            }
        }
        else { return undef } # bad
    }
    return $res;
}

__END__
 
Ted Zlatanov

On Mon, 13 Dec 2010 12:25:51 -0800 (e-mail address removed) wrote:


s> Well, I thought it was just a case of knowing the simple ip
s> address without knowing anything about the CIDR network (block).
s> So given a simple quad part notation and range, a simple comparison
s> would be is all thats needed instead of a full blown cisco type thing.

I wouldn't try to write that code myself because the risk of getting it
wrong is too high. It's surprisingly hard to do IP ranges well,
especially if you need fast operations. But it looks so easy, doesn't it...

Ted
 
sln

Ted Zlatanov wrote:

> I wouldn't try to write that code myself because the risk of getting it
> wrong is too high. It's surprisingly hard to do IP ranges well,
> especially if you need fast operations. But it looks so easy, doesn't it...

Yup, it's like Medusa: you can't really look at its face directly, lest
you turn to stone...

-sln
 
Ilya Zakharevich

Ted Zlatanov wrote:

> I wouldn't try to write that code myself because the risk of getting it
> wrong is too high. It's surprisingly hard to do IP ranges well,
> especially if you need fast operations. But it looks so easy, doesn't it...

I added (??{}) for this, but it looks broken now:

perl -wle "123 =~ /^(\d+$)(??{ $1 > 122 ? qr( )x : qr((?!)) })/ or die"
panic: top_env

If I understand the docs correctly, this should also work:

perl -wle "123 =~ /^(\d+$)(?(?{$1 > 122})|(?!))/ or die"

It looks like it works fine in 5.8.8 and 5.10.0.

Ilya
 
sln

Ilya Zakharevich wrote:

> I added (??{}) for this, but it looks broken now:
>
> perl -wle "123 =~ /^(\d+$)(??{ $1 > 122 ? qr( )x : qr((?!)) })/ or die"
> panic: top_env
>
> If I understand the docs correct, this should also work:
>
> perl -wle "123 =~ /^(\d+$)(?(?{$1 > 122})|(?!))/ or die"
>
> It looks like it works fine in 5.8.8 and 5.10.0.

With your code, anything > 122 will pass.
Both flavors, (??{}) and (?(?{})), return an in-place regex.
About the docs on (?()|):
(?{ CODE }) always succeeds; does it return 1? I don't know.
But (?(condition)yes-pattern), where condition = (?{ CODE }), treats
the code block as the condition. So apparently (?(?{ CODE })|) is a special
form of the conditional expression extension.
So it seems (?(?{ is itself a special case.

perl -wle "123 =~ /^(\d+$)(?(?{$1 > 122}) |)/ or die"

dies if it is > 122, otherwise it passes.
I guess that's because if it's > 122, the expression
becomes 123 =~ /^(\d+$) /, otherwise it's /^(\d+$)/
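
As a sketch of how the conditional form discussed above could be used for a numeric range check, here is a small self-contained example. The helper name is_octet is made up for illustration, and the behaviour assumes a Perl where literal (?{ }) blocks work as documented in perlre; it is not taken from any post in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is a decimal number in 0 .. 255.
# The (?{ ... }) block is the condition of (?(cond)yes|no): the empty
# yes-branch lets the match succeed, the no-branch (?!) always fails.
sub is_octet {
    my ($s) = @_;
    return $s =~ /\A(\d{1,3})\z(?(?{ $1 <= 255 })|(?!))/ ? 1 : 0;
}

print "$_: ", (is_octet($_) ? "ok" : "rejected"), "\n"
    for qw(0 123 255 256 abc);
```

Because the pattern is a literal with no interpolated variables, no "use re 'eval'" is needed here.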

-sln
 
