Regex to match a numerical IP range

Discussion in 'Perl Misc' started by sln@netherlands.com, Dec 11, 2010.

  1. Guest

    Somebody recently posted this topic on perl.beginners
    (was: "Regex to match a numerical range").
    The person was trying to use a regex to match a range of IPs
    like 127.0.0.[0-255], or something like that.
    A bunch of posters replied with a textual solution.

    The group is a mailing list, and I don't really have an email
    address for it, or know how it works. I was going to reply with
    something like the code below. It uses eval heavily and has some
    moderately complex regexes. The principle is that each decimal
    number is read as hex and becomes a \x{#} UTF-8 character.

    Its workings are not at all obvious, and it's pretty slow by
    comparison, not only because of the evals but because values
    that run past a single byte become multi-byte UTF-8 characters
    in the regexes.

    For example, one test case is to iterate over the numeric range
    0-255, where, for instance, 255 is taken to be the hex number
    \x{255}, not a decimal. So a range of contiguous decimal numbers
    maps to a different range once the digits are read as hex.
    However, any check within the range of \x{0} - \x{255} UTF-8
    characters apparently works, since \x{0} < \x{127} < \x{255},
    so it can be deduced that, in decimal, 127 is greater than 0
    and less than 255.
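
    A minimal sketch to check that ordering claim, using only the core hex() function. The loop and the printed messages are illustrative and not part of the posted script.

```perl
use strict;
use warnings;

# hex() reads a digit string as hexadecimal, which is what \x{...} does
# with the decimal quad value in the script.
my $ordered = 1;
for my $n (0 .. 254) {
    # Reading digits 0-9 in base 16 instead of base 10 should keep the order.
    $ordered = 0 unless hex($n) < hex($n + 1);
}

printf "127 read as hex is code point %d (U+%04X)\n", hex(127), hex(127);
print $ordered ? "order preserved for 0..255\n" : "order NOT preserved\n";
```

    The mapped values are not contiguous (hex(9) is 9 but hex(10) is 16), yet comparisons between them still agree with the decimal comparisons, which is all a character-class range needs.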

    Inserting these as characters in a regex had me concerned for
    a while. But, I tested it enough to be satisfied.
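
    As a rough illustration of why that concern is reasonable, the sketch below lists the decimal values in 0..255 whose digits, read as hex, name a character that a regex treats specially when it appears literally outside a character class (space and tab are included because /x skips them). The %special table is my own assumption about which characters matter.

```perl
use strict;
use warnings;

# Characters that are special in a Perl regex outside a character class,
# plus space and tab, which /x ignores when they appear literally.
my %special = map { $_ => 1 } split //, '\\^$.|?*+()[]{}#' . " \t";

# Decimal values 0..255 whose digits, read as hex, name one of those characters.
my @hits = grep { $special{ chr(hex($_)) } } 0 .. 255;

print "decimal values needing care: @hits\n";
```

    Inside a bracketed character class these characters are taken literally, so range-style templates are not affected.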

    Here is an excerpt of the post from perl.beginners:
    " For a reason i don't understand:
    127.0.0.1 doesn't match as expected...
    Everything between 127.0.0.2 and 127.0.0.299 matches...
    127.0.0.230 doesn't match...

    What am I doing wrong??
    "
    And there were many good replies.

    -sln

    # Regex IP Range matching
    # where dec number becomes hex in \x{#} utf8 char
    # ------------------------------------------------

    use strict;
    use warnings;


    #### Test cases

    # testQuadAndPortRange();
    print "\n";

    my $pattern = makeUIpRegex('178. [10-45] .[180-200] . [223-254]: [190-195] ');
    print "Testing Ip range ...\npattern =\n$pattern\n\n";

    my ($count, $matched, $nomatch) = (0,0,0);

    for my $q2 (20 .. 22) {
    for my $q3 (0 .. 255) {
    for my $q4 (0 .. 255)
    {
    my $curip = "178 .$q2. $q3 .$q4 :$q3";
    if ( makeUIp( $curip ) =~ /$pattern/ ) {
    print "Matched! ($curip)\n";
    $matched++;
    }
    else {
    # print "No match! ($curip)\n";
    $nomatch++;
    }
    $count++;
    }
    }
    }
    print <<EOM;

    Checked $count
    Matched $matched
    Not-matched $nomatch
    EOM

    exit;


    #### subs

    my ( $rx_Ip, $rx_IpRange );

    ## Constructs a utf8 char string from a decimal notation IP
    # (where input = dec number, becomes hex in \x{#} utf8 char)
    # Input -> '#.#.#.# (optional-> : [#-#] )'

    sub makeUIp
    {
    my ($ip) = @_;

    BEGIN { $rx_Ip = qr/
    ^ \s*
    (\d{1,3}) \s* \. \s* # q1 (1-3 digits)
    (\d{1,3}) \s* \. \s* # q2 "
    (\d{1,3}) \s* \. \s* # q3 "
    (\d{1,3}) \s* # q4 "
    (?: : \s* (\d{1,5}) \s*)? # optional port num (1-5 digits)
    $ /x;
    }
    if ($ip =~ / $rx_Ip /x ) {
    if (defined $5) {
    eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4} : \\x{$5}\" ";
    }
    else {
    eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4}\" ";
    }
    if ($@) { warn $@; return ''; }
    return $ip;
    }
    # not the correct ip form
    return '';
    }

    ## Constructs a regex utf8 pattern from a decimal notation IP template
    # (where input = dec number, becomes hex in \x{#} utf8 char)
    # Input-> '#.#.#.#' to '[#-#].[#-#].[#-#].[#-#] : [#-#]'

    sub makeUIpRegex
    {
    my ($ip) = @_;
    my $res = '';

    BEGIN { $rx_IpRange = qr/
    ^ \s*
    (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
    (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
    (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*\.\s*
    (?: (\d{1,3}) | \[\s*(\d{1,3})\s* - \s*(\d{1,3})\s*\] ) \s*
    (?:
    : \s* (?: (\d{1,5}) | \[\s*(\d{1,5})\s*-\s*(\d{1,5})\s*\] ) \s*
    )?
    $ /x;
    }
    if ($ip =~ / $rx_IpRange /x ) {
    # Single values are wrapped in a character class too, so that code
    # points such as \x{24} ('$') or \x{28} ('(') stay literal in the regex.
    if (defined $1) { $res .= qq([\\x{$1}]\\\\.) }
    else { $res .= qq([\\x{$2}-\\x{$3}]\\\\.) }
    if (defined $4) { $res .= qq([\\x{$4}]\\\\.) }
    else { $res .= qq([\\x{$5}-\\x{$6}]\\\\.) }
    if (defined $7) { $res .= qq([\\x{$7}]\\\\.) }
    else { $res .= qq([\\x{$8}-\\x{$9}]\\\\.) }
    if (defined $10) { $res .= qq([\\x{$10}]) }
    else { $res .= qq([\\x{$11}-\\x{$12}]) }

    if (defined $13) {
    $res .= qq(\\\\ :\\\\ [\\x{$13}]);
    }
    elsif (defined $14) {
    $res .= qq(\\\\ :\\\\ [\\x{$14}-\\x{$15}]);
    }
    eval "\$ip = \"$res\" ";
    if ($@) { warn $@; return ''; }
    return qr/$ip/x;
    }
    # not the correct form
    return '';
    }

    ## Constructs and runs utf8 regex /[$i-$i]/,
    # (where $i = dec number, becomes hex in \x{$i} utf8 char)
    # Tests for conflicts in character class syntax

    sub testQuadAndPortRange
    {
    print "Testing quad and port range for conflicts ...\n";
    for my $i (0 .. 99999)
    {
    my ($rx,$src);
    eval " \$rx = '^'.\"\[\\x{$i}\\-\\x{$i}\]\".'\$' ";
    if ($@) { warn $@; next; }

    eval " \$src = \"\\x{$i}\" ";
    if ($@) { warn $@; next; }

    if ($src =~ / $rx /x) {
    # print "OK! $i\n";
    }
    else {
    print "***** BAD $i $rx\n";
    # sleep (1);
    }
    }
    }

    __END__
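
    For comparison, a minimal eval-free sketch of the same string conversion, using chr(hex(...)) in place of the string eval. The name make_uip_noeval is hypothetical, and \z is used here where the posted script uses $.

```perl
use strict;
use warnings;

# Build the same character string makeUIp produces, without string eval:
# each decimal field is read as hex and turned into one character.
sub make_uip_noeval {
    my ($ip) = @_;
    my @parts = $ip =~ /^\s*(\d{1,3})\s*\.\s*(\d{1,3})\s*\.\s*(\d{1,3})\s*\.\s*(\d{1,3})\s*(?::\s*(\d{1,5})\s*)?\z/
        or return '';                       # not the expected form
    my $port = pop @parts;                  # undef when no port was given
    my $out  = join '.', map { chr(hex($_)) } @parts;
    $out .= ' : ' . chr(hex($port)) if defined $port;
    return $out;
}

# Show the code points rather than the raw (wide) characters.
my $s = make_uip_noeval('178 .20. 190 .230 :190');
print join(' ', map { sprintf 'U+%04X', ord } split //, $s), "\n";
```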
    , Dec 11, 2010
    #1

  2. Ted Zlatanov Guest

    On Sat, 11 Dec 2010 14:26:17 -0800 wrote:

    s> Somebody recently posted this topic on perl.beginners
    s> (was: "Regex to match a numerical range").
    s> The person was trying to use a regex to match a range of IPs
    s> like 127.0.0.[0-255], or something like that.
    s> A bunch of posters replied with a textual solution.

    s> The group is a mailing list, and I don't really have an email
    s> address for it, or know how it works. I was going to reply with
    s> something like the code below. It uses eval heavily and has some
    s> moderately complex regexes. The principle is that each decimal
    s> number is read as hex and becomes a \x{#} UTF-8 character.

    s> Its workings are not at all obvious, and it's pretty slow by
    s> comparison, not only because of the evals but because values
    s> that run past a single byte become multi-byte UTF-8 characters
    s> in the regexes.

    I think Net::Netmask is much better for this task than any custom
    solution. Have you tried it?
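
    Net::Netmask itself is not shown here. As a core-only sketch of the kind of netmask test such a module takes care of, the code below masks both addresses and compares them. The helper names ip_to_int and in_cidr are made up for illustration and are not that module's interface.

```perl
use strict;
use warnings;

# Convert a dotted quad to a 32-bit integer; returns undef if malformed.
sub ip_to_int {
    my ($ip) = @_;
    my @q = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/ or return;
    return if grep { $_ > 255 } @q;
    return ($q[0] << 24) | ($q[1] << 16) | ($q[2] << 8) | $q[3];
}

# True (1) when $ip falls inside a block written as "base/bits".
sub in_cidr {
    my ($ip, $cidr) = @_;
    my ($base, $bits) = split m{/}, $cidr;
    my $addr = ip_to_int($ip);
    my $net  = defined $base ? ip_to_int($base) : undef;
    return 0 unless defined $addr && defined $net;
    return 0 unless defined $bits && $bits =~ /^\d+\z/ && $bits <= 32;
    # A /0 mask is all zeros; otherwise keep the top $bits bits.
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return ($addr & $mask) == ($net & $mask) ? 1 : 0;
}

print in_cidr('127.0.0.200', '127.0.0.0/24') ? "inside\n" : "outside\n";
print in_cidr('127.0.1.1',   '127.0.0.0/24') ? "inside\n" : "outside\n";
```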

    Ted
    Ted Zlatanov, Dec 13, 2010
    #2

  3. Guest

    On Mon, 13 Dec 2010 10:51:11 -0600, Ted Zlatanov <> wrote:

    >On Sat, 11 Dec 2010 14:26:17 -0800 wrote:
    >
    >s> Somebody recently posted this topic on perl.beginners
    >s> (was: "Regex to match a numerical range").
    >s> The person was trying to use a regex to match a range of IPs
    >s> like 127.0.0.[0-255], or something like that.
    >s> A bunch of posters replied with a textual solution.
    >
    >s> The group is a mailing list, and I don't really have an email
    >s> address for it, or know how it works. I was going to reply with
    >s> something like the code below. It uses eval heavily and has some
    >s> moderately complex regexes. The principle is that each decimal
    >s> number is read as hex and becomes a \x{#} UTF-8 character.
    >
    >s> Its workings are not at all obvious, and it's pretty slow by
    >s> comparison, not only because of the evals but because values
    >s> that run past a single byte become multi-byte UTF-8 characters
    >s> in the regexes.
    >
    >I think Net::Netmask is much better for this task than any custom
    >solution. Have you tried it?
    >
    >Ted


    Well, I thought it was just a case of knowing the simple IP
    address without knowing anything about the CIDR network (block).
    So, given simple dotted-quad notation and a range, a simple comparison
    is all that's needed, instead of a full-blown Cisco-type thing.
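
    A minimal sketch of that kind of plain comparison, with no regex range tricks: each quad is checked numerically against a [low, high] pair. The function name ip_in_ranges and its argument layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Check each quad of $ip against its own [low, high] pair.
sub ip_in_ranges {
    my ($ip, @ranges) = @_;
    my @q = $ip =~ /^\s*(\d{1,3})\s*\.\s*(\d{1,3})\s*\.\s*(\d{1,3})\s*\.\s*(\d{1,3})\s*\z/
        or return 0;                        # not a dotted quad
    for my $i (0 .. 3) {
        my ($lo, $hi) = @{ $ranges[$i] };
        return 0 if $q[$i] < $lo || $q[$i] > $hi;
    }
    return 1;
}

# Same ranges as the template 178.[10-45].[180-200].[223-254]
my @ranges = ([178, 178], [10, 45], [180, 200], [223, 254]);
print ip_in_ranges('178.20.190.230', @ranges) ? "match\n" : "no match\n";
print ip_in_ranges('178.20.179.230', @ranges) ? "match\n" : "no match\n";
```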

    -sln
    , Dec 13, 2010
    #3
  4. Guest

    On Sat, 11 Dec 2010 14:26:17 -0800, wrote:

    >However, any check within the range of \x{0} - \x{255} UTF-8 characters
    >apparently works, since \x{0} < \x{127} < \x{255}, so it can be deduced


    A better solution is to use Net::Netmask, and that's what I think too.

    But, to finish this thing up, I wanted to flesh out the range class.
    For maximum flexibility, the template character class can now include
    individual numbers and ranges, for example: [0-5,8,220,225-245], etc.
    And for a little extra speed, I added a wildcard '*' so that a particular
    part doesn't need a range class. It just inserts a '.' (match any
    character) in the regex.
    Validation was added on the quad parts and the optional port part.
    This is all I will be doing on this, because its relevance as a tool
    is questionable, and adding anything else would take refactoring.

    I thought it was a neat exercise in UTF-8, eval, and regular expressions, though.
    Parsing templates is too much work.
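
    To make the class syntax concrete, here is a small sketch that parses a list such as 0-5,8,220,225-245 into numeric pairs and tests a value against them directly. The names parse_class and in_class are hypothetical, and this is a numeric alternative shown for illustration, not the character-class approach used in the script.

```perl
use strict;
use warnings;

# Parse "0-5,8,220,225-245" into [low, high] pairs.
# Returns undef (in scalar context) on malformed or out-of-range input.
sub parse_class {
    my ($spec, $max) = @_;
    my @pairs;
    for my $part (split /,/, $spec) {
        my ($lo, $hi) = $part =~ /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?\z/ or return;
        $hi = $lo unless defined $hi;       # a single number is a range of one
        return if $lo > $hi || $hi > $max;
        push @pairs, [$lo, $hi];
    }
    return \@pairs;
}

# True when $n falls inside any of the parsed pairs.
sub in_class {
    my ($n, $pairs) = @_;
    return scalar grep { $n >= $_->[0] && $n <= $_->[1] } @$pairs;
}

my $class = parse_class('0-5,8,220,225-245', 255);
print "$_: ", (in_class($_, $class) ? 'in' : 'out'), "\n" for 5, 6, 8, 230;
```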

    -sln

    # Templating-Regex IP Range matching using utf8 chars \x{#}
    # -----------------------------------------------------------

    use strict;
    use warnings;

    my $show_UIpRegex = 1;

    #### Test cases

    print "\n";

    my $pattern = makeUIpRegex(
    '127. * . [99-110, 180, 182] . *' );

    print "Testing Ip range ...\npattern =\n$pattern\n\n";

    my ($count, $matched, $nomatch) = (0,0,0);

    for my $q2 (20..22, 25) {
    for my $q3 (0..255) {
    for my $port (13, 193..195, 32700..32800, 3)
    {
    my $curip = "127 .$q2. $q3 .0 : $port";
    if ( makeUIp( $curip ) =~ /$pattern/ ) {
    print "Matched! $curip\n";
    $matched++;
    }
    else {
    # print "No match! $curip\n";
    $nomatch++;
    }
    $count++;
    }
    }
    }
    print <<EOM;

    Checked $count
    Matched $matched
    Not-matched $nomatch
    EOM

    exit;


    #### subs

    my ( $rx_IP, $rx_IPRx, $rx_IpRange, $rx_PortRange );


    ## Constructs a utf8 char string from a decimal notation IP
    # ( where input = dec number, becomes hex in \x{#} utf8 char )
    # Input -> '#.#.#.# (optional-> : # )'

    sub makeUIp
    {
    my ($ip) = @_;

    BEGIN { $rx_IP = qr/
    ^ \s*
    (\d{1,3}) \s* \. \s* # q1 (1-3 digits)
    (\d{1,3}) \s* \. \s* # q2 "
    (\d{1,3}) \s* \. \s* # q3 "
    (\d{1,3}) \s* # q4 "
    (?: : \s* (\d{1,5}) \s*)? # optional port num (1-5 digits)
    $ /x;
    }
    if ($ip =~ /$rx_IP/)
    {
    if ( defined $5 ) {
    eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4} : \\x{$5}\" ";
    }
    else {
    eval " \$ip = \"\\x{$1}\.\\x{$2}\.\\x{$3}\.\\x{$4}\" ";
    }
    if ($@) { warn $@; return ''; }
    return $ip;
    }
    # not the correct ip form
    return '';
    }


    ## Constructs a REGEX utf8 pattern from a decimal notation IP template
    # ( where input = dec number, becomes hex in \x{#} utf8 char )
    # Input-> '#.#.#.#' - '[range].[range].[range].[range] : [range]'
    # '*' can be substituted for any quad/port part # and is equivalent
    # to range class [0-#(max_digits)] but is implemented as m/./ in the
    # regex as opposed to a [\x{0}-\x{255}] character class
    # ( example: 127. *. [range]. 1 : [range] )
    # Range can be any combination of #-# or # separated by commas
    # (example: [#-#,#,#-#,#,#, ...] )
    # Validation is done on all template parts as far as being between
    # 0-255 and 0-65535, however this is not done for the wildcard '*'
    # since it has no value to check.
    # '*' should speed up the match but will allow ranges of [0-999]
    # and/or [0-99999] depending on what part it is being used on.
    # If the source string needs to be 100% valid, don't use '*'
    # use a range [#-#] or #

    sub makeUIpRegex
    {
    my ($ip) = @_;
    my $res = '';
    my $msg = 'makeUIpRegex : Invalid %s part \'%s\'';

    BEGIN { $rx_IPRx = qr/
    ^ \s*
    (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*\.\s*
    (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*\.\s*
    (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*\.\s*
    (?: (\d{1,3} | \*) | \[\s* ([^\]]+) \s*\] ) \s*
    (?:
    : \s* (?: (\d{1,5} | \*) | \[\s* ([^\]]+) \s*\] ) \s*
    )?
    $ /x;
    }
    if ($ip =~ /$rx_IPRx/)
    {
    if ( defined $1 ) {
    die sprintf( $msg, 'quad 1', $1) if ($1 ne '*' && $1 > 255);
    # single value goes in a character class so chars like '$' or '(' stay literal
    $res .= ($1 eq '*' ? qq(\\.\\\\.) : qq([\\x{$1}]\\\\.)) }
    else {
    $res .= '[';
    my $str = parseIpRange( $2 );
    die sprintf( $msg, 'quad 1', "\[$2\]") if (!defined $str);
    $res .= $str . ']\\\\.';
    }
    if ( defined $3 ) {
    die sprintf( $msg, 'quad 2', $3) if ($3 ne '*' && $3 > 255);
    $res .= ($3 eq '*' ? qq(\\.\\\\.) : qq([\\x{$3}]\\\\.)) } # class keeps the char literal
    else {
    $res .= '[';
    my $str = parseIpRange( $4 );
    die sprintf( $msg, 'quad 2', "\[$4\]") if (!defined $str);
    $res .= $str . ']\\\\.';
    }
    if ( defined $5 ) {
    die sprintf( $msg, 'quad 3', $5) if ($5 ne '*' && $5 > 255);
    $res .= ($5 eq '*' ? qq(\\.\\\\.) : qq([\\x{$5}]\\\\.)) } # class keeps the char literal
    else {
    $res .= '[';
    my $str = parseIpRange( $6 );
    die sprintf( $msg, 'quad 3', "\[$6\]") if (!defined $str);
    $res .= $str . ']\\\\.';
    }
    if ( defined $7 ) {
    die sprintf( $msg, 'quad 4', $7) if ($7 ne '*' && $7 > 255);
    $res .= ($7 eq '*' ? qq(\\.) : qq([\\x{$7}])) } # class keeps the char literal
    else {
    $res .= '[';
    my $str = parseIpRange( $8 );
    die sprintf( $msg, 'quad 4', "\[$8\]") if (!defined $str);
    $res .= $str . ']';
    }
    if ( defined $9 ) {
    die sprintf( $msg, 'port', $9) if ($9 ne '*' && $9 > 65535);
    $res .= qq(\\\\ :\\\\ );
    $res .= ($9 eq '*' ? qq(\\.) : qq([\\x{$9}])); # class keeps the char literal
    }
    elsif ( defined $10 ) {
    $res .= qq(\\\\ :\\\\ [);
    my $str = parsePortRange( $10 );
    die sprintf( $msg, 'port', "\[$10\]") if (!defined $str);
    $res .= $str . ']';
    }
    if ( $show_UIpRegex ) {
    print $res,"\n\n";
    }
    eval "\$ip = \"$res\" ";
    if ($@) { warn $@; return '' }
    return qr/$ip/x;
    }
    # not the correct form
    die sprintf( $msg, 'general format', $ip);
    return undef;
    }


    ## Range-parses individual IP quad part
    #

    sub parseIpRange
    {
    my ($class_string) = @_;

    BEGIN { $rx_IpRange = qr/
    ^ \s* (?: (\d{1,3}) | (\d{1,3})\s* - \s* (\d{1,3}) ) \s*
    $ /x;
    }
    my @rangevals = split /,/, $class_string;
    my $res = '';
    for my $val ( @rangevals ) {
    if ( $val =~ /$rx_IpRange/ ) {
    if ( defined $1 ) {
    return undef if ($1 > 255); # bad
    $res .= qq(\\x{$1})
    }
    else {
    return undef if ($2 > 255 || $3 > 255 || $2 > $3); # bad (too large or reversed)
    $res .= qq(\\x{$2}-\\x{$3})
    }
    }
    else { return undef } # bad
    }
    return $res;
    }


    ## Range-parses the IP port
    #

    sub parsePortRange
    {
    my ($class_string) = @_;

    BEGIN { $rx_PortRange = qr/
    ^ \s* (?: (\d{1,5}) | (\d{1,5})\s* - \s* (\d{1,5}) ) \s*
    $ /x;
    }
    my @rangevals = split /,/, $class_string;
    my $res = '';
    for my $val ( @rangevals ) {
    if ( $val =~ /$rx_PortRange/ ) {
    if ( defined $1 ) {
    return undef if ($1 > 65535); # bad
    $res .= qq(\\x{$1})
    }
    else {
    return undef if ($2 > 65535 || $3 > 65535 || $2 > $3); # bad (too large or reversed)
    $res .= qq(\\x{$2}-\\x{$3})
    }
    }
    else { return undef } # bad
    }
    return $res;
    }

    __END__
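
    A minimal sketch of an alternative that skips the string eval on the pattern side: the \x{...} escapes are left in the pattern text for the regex compiler to interpret, so a value such as 24 (which names '$') is matched as a literal character. The name quad_pattern is hypothetical, and the sketch handles only four fixed quads.

```perl
use strict;
use warnings;

# Build a pattern from \x{...} escapes and let the regex compiler
# resolve them, instead of eval'ing them into literal characters first.
sub quad_pattern {
    my (@quads) = @_;
    my $body = join '\.', map { sprintf '\x{%s}', $_ } @quads;
    return qr/^$body/;
}

my $re      = quad_pattern(178, 20, 24, 254);
my $subject = join '.', map { chr(hex($_)) } 178, 20, 24, 254;
my $other   = join '.', map { chr(hex($_)) } 178, 20, 25, 254;

print $subject =~ $re ? "match\n" : "no match\n";
print $other   =~ $re ? "match\n" : "no match\n";
```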
    , Dec 13, 2010
    #4
  5. Ted Zlatanov Guest

    On Mon, 13 Dec 2010 12:25:51 -0800 wrote:

    s> On Mon, 13 Dec 2010 10:51:11 -0600, Ted Zlatanov <> wrote:
    >> I think Net::Netmask is much better for this task than any custom
    >> solution. Have you tried it?


    s> Well, I thought it was just a case of knowing the simple IP
    s> address without knowing anything about the CIDR network (block).
    s> So, given simple dotted-quad notation and a range, a simple comparison
    s> is all that's needed, instead of a full-blown Cisco-type thing.

    I wouldn't try to write that code myself because the risk of getting it
    wrong is too high. It's surprisingly hard to do IP ranges well,
    especially if you need fast operations. But it looks so easy, doesn't it...

    Ted
    Ted Zlatanov, Dec 14, 2010
    #5
  6. Guest

    On Tue, 14 Dec 2010 10:20:20 -0600, Ted Zlatanov <> wrote:

    >On Mon, 13 Dec 2010 12:25:51 -0800 wrote:
    >
    >s> On Mon, 13 Dec 2010 10:51:11 -0600, Ted Zlatanov <> wrote:
    >>> I think Net::Netmask is much better for this task than any custom
    >>> solution. Have you tried it?

    >
    >s> Well, I thought it was just a case of knowing the simple IP
    >s> address without knowing anything about the CIDR network (block).
    >s> So, given simple dotted-quad notation and a range, a simple comparison
    >s> is all that's needed, instead of a full-blown Cisco-type thing.
    >
    >I wouldn't try to write that code myself because the risk of getting it
    >wrong is too high. It's surprisingly hard to do IP ranges well,
    >especially if you need fast operations. But it looks so easy, doesn't it...
    >


    Yup, it's like Medusa: you can't really look at its face directly,
    lest you turn to stone.

    -sln
    , Dec 14, 2010
    #6
  7. Ilya Zakharevich Guest

    On 2010-12-14, Ted Zlatanov <> wrote:
    > On Mon, 13 Dec 2010 12:25:51 -0800 wrote:
    >
    > s> On Mon, 13 Dec 2010 10:51:11 -0600, Ted Zlatanov <> wrote:
    >>> I think Net::Netmask is much better for this task than any custom
    >>> solution. Have you tried it?

    >
    > s> Well, I thought it was just a case of knowing the simple IP
    > s> address without knowing anything about the CIDR network (block).
    > s> So, given simple dotted-quad notation and a range, a simple comparison
    > s> is all that's needed, instead of a full-blown Cisco-type thing.
    >
    > I wouldn't try to write that code myself because the risk of getting it
    > wrong is too high. It's surprisingly hard to do IP ranges well,
    > especially if you need fast operations. But it looks so easy, doesn't it...


    I added (??{}) for this, but it looks broken now:

    perl -wle "123 =~ /^(\d+$)(??{ $1 > 122 ? qr( )x : qr((?!)) })/ or die"
    panic: top_env

    If I understand the docs correctly, this should also work:

    perl -wle "123 =~ /^(\d+$)(?(?{$1 > 122})|(?!))/ or die"

    It looks like it works fine in 5.8.8 and 5.10.0.
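
    The second one-liner, written out as a small script for readability. This is a sketch under my own assumptions: the sub name over_122 is made up, and \z is used after the capture group where the one-liner puts $ inside it.

```perl
use strict;
use warnings;

# Returns 1 when $n is all digits and numerically greater than 122.
# The (?{ ... }) condition selects the empty yes-pattern (always matches)
# or the no-pattern (?!) (never matches).
sub over_122 {
    my ($n) = @_;
    return $n =~ /^(\d+)\z(?(?{ $1 > 122 })|(?!))/ ? 1 : 0;
}

printf "%-4s %s\n", $_, over_122($_) ? 'match' : 'no match'
    for 100, 122, 123, 200;
```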

    Ilya
    Ilya Zakharevich, Dec 14, 2010
    #7
  8. Guest

    On Tue, 14 Dec 2010 19:53:11 +0000 (UTC), Ilya Zakharevich <> wrote:

    >On 2010-12-14, Ted Zlatanov <> wrote:
    >> On Mon, 13 Dec 2010 12:25:51 -0800 wrote:
    >>
    >> s> On Mon, 13 Dec 2010 10:51:11 -0600, Ted Zlatanov <> wrote:
    >>>> I think Net::Netmask is much better for this task than any custom
    >>>> solution. Have you tried it?

    >>
    >> s> Well, I thought it was just a case of knowing the simple IP
    >> s> address without knowing anything about the CIDR network (block).
    >> s> So, given simple dotted-quad notation and a range, a simple comparison
    >> s> is all that's needed, instead of a full-blown Cisco-type thing.
    >>
    >> I wouldn't try to write that code myself because the risk of getting it
    >> wrong is too high. It's surprisingly hard to do IP ranges well,
    >> especially if you need fast operations. But it looks so easy, doesn't it...

    >
    >I added (??{}) for this, but it looks broken now:
    >
    > perl -wle "123 =~ /^(\d+$)(??{ $1 > 122 ? qr( )x : qr((?!)) })/ or die"
    > panic: top_env
    >
    >If I understand the docs correctly, this should also work:
    >
    > perl -wle "123 =~ /^(\d+$)(?(?{$1 > 122})|(?!))/ or die"
    >
    >It looks like it works fine in 5.8.8 and 5.10.0.
    >


    In your code, anything > 122 will pass.
    Both flavors, (??{ }) and (?(?{ })|), return an in-place regex.
    About the docs on (?(condition)yes-pattern|no-pattern):
    (?{ CODE }) always succeeds; does it return 1? I don't know.
    But in (?(condition)yes-pattern), where condition = (?{ CODE }),
    the code block is treated as the condition. So apparently
    (?(?{ CODE })|) is a special form of the conditional expression
    extension.
    So it seems (?(?{ is itself a special case.

    perl -wle "123 =~ /^(\d+$)(?(?{$1 > 122}) |)/ or die"

    dies if the number is > 122; otherwise it passes.
    I guess that's because if it's > 122, the expression
    becomes 123 =~ /^(\d+$) /; otherwise it's /^(\d+$)/
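
    The same behavior as a small script, as a sketch: the yes-pattern is a literal space (there is no /x, so the space counts), which can never match after the end of the string. The sub name check is made up, and \z stands in for the $ used in the one-liner.

```perl
use strict;
use warnings;

# Condition true  -> yes-pattern ' ' must match after end of string: fails.
# Condition false -> empty no-pattern: succeeds.
sub check {
    my ($n) = @_;
    return $n =~ /^(\d+)\z(?(?{ $1 > 122 }) |)/ ? 'passes' : 'dies';
}

print "$_: ", check($_), "\n" for 100, 123;
```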

    -sln
    , Dec 15, 2010
    #8