A regex to search for numeric ranges...

Discussion in 'Perl Misc' started by Mr P, Apr 19, 2011.

  1. Mr P

    Mr P Guest

    I read up on this on the www and I found ideas like

    if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

    which is pretty uncipherable at a glance and just in general not
    elegant in any sense.

    I generally do something like

    if ( /(\d+)/ && $1 > 256 && $1 < 1024 )


    Which to me is a lot more readable at a glance, but like the example
    above not overly elegant..

    But what I'd REALLY like to do is, similar to the trick for numeric
    sort, a way to do it in the regex like

    /[256-1024]/ # but force it to be numeric, not literal perhaps with a
    switch

    Thoughts, Masters?
     
    Mr P, Apr 19, 2011
    #1
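The alternation quoted in the first post is easier to follow when it is written with the /x modifier and one comment per branch. What follows is only a sketch of that idea, assuming the goal is still the 0 to 255 match; the variable name $octet and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Same alternation as in the post, spread out under the x modifier so
# each branch can carry a comment. $octet is a hypothetical name.
my $octet = qr/
    \b
    (
        [0-9]           #   0 to   9
      | [1-9][0-9]      #  10 to  99
      | 1[0-9][0-9]     # 100 to 199
      | 2[0-4][0-9]     # 200 to 249
      | 25[0-5]         # 250 to 255
    )
    \b
/x;

for my $text ('ttl 64', 'ttl 255', 'ttl 256') {
    if ($text =~ $octet) {
        print "'$text' matches, captured $1\n";
    }
    else {
        print "'$text' does not match\n";
    }
}
```

The pattern is no shorter, but each branch now says which slice of the range it covers, which is most of what the one-line form hides.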

  2. Mr P

    Mr P Guest

    On Apr 19, 3:57 pm, Eli the Bearded <*> wrote:
    > In comp.lang.perl.misc, Mr P  <> wrote:
    >
    > > I read up on this on the www and I found ideas like

    >
    > > if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...

    >
    > > which is pretty uncipherable at a glance and just in general not
    > > elegant in any sense.

    >
    > True. That's why it's much better to not use regexps for numerical
    > ranges.
    >
    > > I generally do something like

    >
    > >  if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

    >
    > I'd write that as
    >
    >    if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
    >
    > because I like to make sure things operate in the order I want them
    > to.
    >
    > > Which to me is a lot more readable at a glance, but like the example
    > > above not overly elegant..

    >
    > > But what I'd REALLY like to do is, similar to the  trick for numeric
    > > sort, a way to do it in the regex like

    >
    > > /[256-1024]/ # but force it to be numeric, not literal perhaps with a
    > > switch

    >
    > sub mknumre($$) {
    >   my $low = shift;
    >   my $hi  = shift;
    >
    >   my $set = join('|', ($low .. $hi));
    >
    >   return qr/($set)/;
    >
    > }
    >
    > > Thoughts, Masters?

    >
    > Why does this have to be a regular expression? Use the right tool
    > for the job.


    I guess my answer to that question is that my 1-line regex is a lot
    easier to read and much shorter than your 9-line monster!
     
    Mr P, Apr 19, 2011
    #2
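The quoted mknumre() builds one alternation out of every integer in the range. As quoted it has no anchoring, so a number in range can match inside a longer digit run. The sketch below adds lookarounds to guard against that; they are an assumption about the intent, not part of the quoted post, and the sample strings are invented.

```perl
use strict;
use warnings;

# Build an alternation of every integer from $low to $hi.
# The lookarounds are an addition to the quoted version: they keep
# the pattern from matching part of a longer run of digits.
sub mknumre {
    my ($low, $hi) = @_;
    my $set = join '|', $low .. $hi;
    return qr/(?<!\d)($set)(?!\d)/;
}

my $re = mknumre(257, 1023);

for my $text ('512 widgets', '5120 widgets', '256 widgets') {
    if ($text =~ $re) {
        print "'$text': in range, captured $1\n";
    }
    else {
        print "'$text': out of range\n";
    }
}
```

The pattern grows with the size of the range, so this is only reasonable for small ranges; the plain numeric comparison from the first post has no such cost.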

  3. Mr P

    Mr P Guest


    > > I generally do something like

    >
    > >  if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

    >
    > I'd write that as
    >
    >    if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
    >
    > because I like to make sure things operate in the order I want them
    > to.
    >


    There is no ambiguity in the order of my example- study ORDER
    PRECEDENCE. Mine is just less syntax-intensive.
     
    Mr P, Apr 19, 2011
    #3
  4. Mr P

    Guest

    On Tue, 19 Apr 2011 12:35:56 -0700 (PDT), Mr P <> wrote:

    >I read up on this on the www and I found ideas like
    >
    >if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
    >
    >which is pretty uncipherable at a glance and just in general not
    >elegant in any sense.
    >
    >I generally do something like
    >
    > if ( /(\d+)/ && $1 > 256 && $1 < 1024 )
    >
    >
    >Which to me is a lot more readable at a glance, but like the example
    >above not overly elegant..
    >
    >But what I'd REALLY like to do is, similar to the trick for numeric
    >sort, a way to do it in the regex like
    >
    >/[256-1024]/ # but force it to be numeric, not literal perhaps with a
    >switch
    >
    >Thoughts, Masters?



    /[256-1024]/ is generally possible.
    It has limitations that affect the surrounding expressions, but it
    could be worked around and functionally generalized (again within
    specific limitations).

    -sln

    -----------------------

    use strict;
    use warnings;

    my $str = '0001023 widgets';

    # Inline code is going to be a thing of the future and definitely
    # going to happen (see perl 6 regex).
    # This allows parameter checking and is useful when the source
    # has extended data to be regex analyzed in one expression.

    if ($str =~ / \b (\d+) \b
                  (?(?{$^N > 256 && $^N < 1024})  # is this number between 256-1024?
                                                  # yes, continue processing
                    |
                    (*FAIL)                       # no, fail outright
                  )
                  # more expressions here ..
                  \s*
                  (.+)
                /x )
    {
        print "Number: '$1', Type: '$2'\n";
    }
    else {
        print "failed\n";
    }

    print "\n";

    # This does a source conversion of \d+ to a single utf8 character.
    # It then allows checking it in a HEX numeric range character class.
    # Even though the source is decimal, '1023', when magically assumed to
    # be hex and converted to a utf8 char like "\x{1023}", its code point
    # will be correctly matched within a regex character class range.
    # Example: "\x{1023}" =~ /[\x{257}-\x{1023}]/ will match.
    # And, only "\x{N}" where N is between 257-1023 will match.

    for (0 .. 4096)
    {
        # Construct a fake string using the current counter.
        # In reality, you have to parse the source string and do the conversion
        # so that you end up doing something like this:
        # $src =~ /^(.*?)\b(\d+)\b(.*?)$/
        # eval "\$temp_src = \"$1\\x{$2}$3\" ";
        # Then use the $temp_src in place of the $str below.

        my $padded_string = "000$_"; # the extra '000' padding is just a test
        eval "\$str = \"\\x{$padded_string} widgets\" ";

        if ( $str =~ /^ ([\x{257}-\x{1023}])
                        \s*
                        (.+)
                      /x )
        {
            print "Number: '$padded_string', Type: '$2'\n";
        }
    }
    __END__

    Output
    ------------

    Number: '0001023', Type: 'widgets'

    Number: '000257', Type: 'widgets'
    Number: '000258', Type: 'widgets'
    Number: '000259', Type: 'widgets'
    Number: '000260', Type: 'widgets'
    Number: '000261', Type: 'widgets'
    Number: '000262', Type: 'widgets'
    Number: '000263', Type: 'widgets'
    Number: '000264', Type: 'widgets'
    Number: '000265', Type: 'widgets'
    Number: '000266', Type: 'widgets'
    Number: '000267', Type: 'widgets'
    ...
    ...
    Number: '0001012', Type: 'widgets'
    Number: '0001013', Type: 'widgets'
    Number: '0001014', Type: 'widgets'
    Number: '0001015', Type: 'widgets'
    Number: '0001016', Type: 'widgets'
    Number: '0001017', Type: 'widgets'
    Number: '0001018', Type: 'widgets'
    Number: '0001019', Type: 'widgets'
    Number: '0001020', Type: 'widgets'
    Number: '0001021', Type: 'widgets'
    Number: '0001022', Type: 'widgets'
    Number: '0001023', Type: 'widgets'
     
    , Apr 21, 2011
    #4
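The second technique in the post above reads the decimal digits as if they were hex and checks the resulting character against a class. That works because two strings made only of the digits 0-9 sort the same way whether they are read as decimal or as hex. A minimal sketch of the same idea without string eval, using the core chr() and hex() functions; the helper names are hypothetical and not from the post.

```perl
use strict;
use warnings;

# Read a string of decimal digits as if it were hex and turn it into
# one character, e.g. "1023" becomes chr(0x1023). No string eval needed.
sub digits_to_char {
    my ($digits) = @_;
    return chr(hex($digits));
}

# True when the decimal value is strictly between 256 and 1024,
# tested with the same character class as in the post.
sub in_range_via_class {
    my ($digits) = @_;
    return digits_to_char($digits) =~ /\A[\x{257}-\x{1023}]\z/ ? 1 : 0;
}

for my $n ('256', '257', '0001023', '1024') {
    printf "%-8s %s\n", $n, in_range_via_class($n) ? 'match' : 'no match';
}
```

This is still bounded by the largest code point perl will accept, so it remains a curiosity rather than a general replacement for a numeric comparison.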
  5. Mr P

    Uri Guttman Guest

    >>>>> "s" == sln <> writes:

    s> /[256-1024]/ is generally possible.

    s> It has limitations that affect the surrounding expressions, but it
    s> could be worked around and functionally generalized (again within
    s> specific limitations).

    limitations? it is just wrong. that is a char class of all those digits
    (and i am not even sure what [6-1] will generate).

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Apr 21, 2011
    #5
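The point about /[256-1024]/ being a character class can be checked directly. The sketch below compiles the pattern from a string at run time so that a compile failure can be caught; as far as I know perl rejects the descending range 6-1 with an "Invalid [] range" error, but the exact wording may differ between versions. The variable names are made up.

```perl
use strict;
use warnings;

# Compile the pattern from a string at run time so the failure can be
# caught with eval instead of aborting the whole script.
my $pattern = '[256-1024]';
my $re      = eval { qr/$pattern/ };
my $error   = $@;

print defined $re ? "compiled\n" : "did not compile: $error";

# A class with an ascending range is still a set of single characters:
# [1-39] means '1', '2', '3' or '9', not the numbers 1 through 39.
my $class = qr/\A[1-39]\z/;
for my $text ('2', '9', '5', '39') {
    printf "%-3s %s\n", $text, $text =~ $class ? 'match' : 'no match';
}
```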
  6. On 2011-04-21, Eli the Bearded <*@eli.users.panix.com> wrote:
    > I'm sure. The second one, mapping integer sequences to characters to
    > then use a Unicode character class has all the workings of a brilliant
    > bit of obfuscation. I suspect it doesn't scale well, say 2^16 or
    > 2^32, but I don't really know how Perl handles Unicode internally.


    When I worked on this (a long time ago), there were no compilers with
    128-bit IV sitting around (are there now?). Hence the support I
    implemented was intended to work "up to the maximal number
    representable by UV", but it is actually coded with the limitation "not
    higher than 64 bits". I doubt anybody expanded it further than
    this (the "hooks" for expansion are there, just probably not implemented)...

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Apr 24, 2011
    #6
  7. Eli the Bearded <*@eli.users.panix.com> writes:
    > In comp.lang.perl.misc, Mr P <> wrote:
    >> I read up on this on the www and I found ideas like
    >>
    >> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
    >>
    >> which is pretty uncipherable at a glance and just in general not
    >> elegant in any sense.

    >
    > True. That's why it's much better to not use regexps for numerical
    > ranges.
    >
    >> I generally do something like
    >>
    >> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

    >
    > I'd write that as
    >
    > if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
    >
    > because I like to make sure things operate in the order I want them
    > to.


    Really?

    First off, I hope you're aware that both forms are exactly
    equivalent, since "<" binds more tightly than "&&", and "&&"
    imposes a left-to-right evaluation with or without the parentheses.

    An argument for using the extra parentheses would be that they make
    it clearer. They don't for me personally; in this particular case,
    the precedence is carved deeply enough into my brain that it's clear
    enough without the parentheses. But YMMV. Obviously, different
    people have different levels of comfort with the precedence levels
    of the various operators.

    But I'd write it as:

    if (/(\d+)/ and $1 > 256 and $1 < 1024)

    I usually prefer "and" and "or" over "&&" and "||". On the other
    hand, I have been bitten a few times by the *low* precedence of
    "and" and "or"; I've mistakenly written things like

    return $this and $that;

    which never evaluates $that.

    (And none of these are equivalent to the original regexp, which
    checks for values from 0 to 255.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Apr 27, 2011
    #7
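The "return $this and $that" trap described above comes from "and" binding more loosely than return. A small sketch that puts the two spellings side by side; the sub names are hypothetical, and the "no warnings" line is there only because newer perls, as far as I know, warn about this construct at compile time.

```perl
use strict;
use warnings;

# "and" binds more loosely than return, so this parses as
#   (return $this) and $that;
# and $that is never reached.
sub low_precedence {
    no warnings 'syntax';   # silence the precedence warning on newer perls
    my ($this, $that) = @_;
    return $this and $that;
}

# "&&" binds more tightly, so the whole expression is returned.
sub high_precedence {
    my ($this, $that) = @_;
    return $this && $that;
}

print 'and: ', low_precedence(1, 0),  "\n";   # value of $this
print '&&:  ', high_precedence(1, 0), "\n";   # value of $this && $that
```

Inside an if condition, as in the range test discussed in this thread, there is no return or assignment to compete with, which is why "and" and "&&" behave the same there.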
  8. Mr P

    Uri Guttman Guest

    >>>>> "KT" == Keith Thompson <> writes:

    KT> Eli the Bearded <*@eli.users.panix.com> writes:
    >> I'd write that as
    >>
    >> if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
    >>
    >> because I like to make sure things operate in the order I want them
    >> to.


    KT> First off, I hope you're aware that both forms are exactly
    KT> equivalent, since "<" binds more tightly than "&&", and "&&"
    KT> imposes a left-to-right evaluation with or without the parentheses.

    KT> An argument for using the extra parentheses would be that they make
    KT> it clearer. They don't for me personally; in this particular case,
    KT> the precedence is carved deeply enough into my brain that it's clear
    KT> enough without the parentheses. But YMMV. Obviously, different
    KT> people have different levels of comfort with the precedence levels
    KT> of the various operators.

    i agree with the dropping of unneeded parens. one place i do use extra
    parens is with ?:. i find parens around the conditional part helps given
    the usually longer total expression. it highlights that as the
    conditional part. not critical but a little style thing i do. and it is
    especially helpful when doing nested ?: ops.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Apr 27, 2011
    #8
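The style described above, parentheses around the conditional part of ?:, might look like this with a nested conditional. This is only an illustration of the layout; the sub name and the labels are invented, and the bounds are borrowed from the range used elsewhere in the thread.

```perl
use strict;
use warnings;

# Parentheses around each condition set it apart from the result values,
# which helps most when the conditionals are chained.
sub classify {
    my ($n) = @_;
    my $label = ($n <= 256)  ? 'too small'
              : ($n >= 1024) ? 'too large'
              :                'in range';
    return $label;
}

print "$_: ", classify($_), "\n" for 256, 257, 1023, 1024;
```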
  9. Mr P

    Jim Gibson Guest

    In article <>, Keith Thompson
    <> wrote:

    > Eli the Bearded <*@eli.users.panix.com> writes:
    > > In comp.lang.perl.misc, Mr P <> wrote:
    > >> I read up on this on the www and I found ideas like
    > >>
    > >> if ( /\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b/ ) ...
    > >>
    > >> which is pretty uncipherable at a glance and just in general not
    > >> elegant in any sense.

    > >
    > > True. That's why it's much better to not use regexps for numerical
    > > ranges.
    > >
    > >> I generally do something like
    > >>
    > >> if ( /(\d+)/ && $1 > 256 && $1 < 1024 )

    > >
    > > I'd write that as
    > >
    > > if ( /(\d+)/ && ($1 > 256) && ($1 < 1024) )
    > >
    > > because I like to make sure things operate in the order I want them
    > > to.

    >
    > Really?
    >
    > First off, I hope you're aware that both forms are exactly
    > equivalent, since "<" binds more tightly than "&&", and "&&"
    > imposes a left-to-right evaluation with or without the parentheses.
    >
    > An argument for using the extra parentheses would be that they make
    > it clearer. They don't for me personally; in this particular case,
    > the precedence is carved deeply enough into my brain that it's clear
    > enough without the parentheses. But YMMV. Obviously, different
    > people have different levels of comfort with the precedence levels
    > of the various operators.


    Another argument for using the extra, redundant parentheses is that it
    will work without regard to precedence. I always use the parentheses.
    That way I don't have to remember what the operator precedence is and
    can worry about other things.

    To quote Sherlock Holmes:

    "You see," he explained, "I consider that a man's brain originally is
    like a little empty attic, and you have to stock it with such furniture
    as you choose. A fool takes in all the lumber of every sort that he
    comes across, so that the knowledge which might be useful to him gets
    crowded out, or at best is jumbled up with a lot of other things so
    that he has a difficulty in laying his hands upon it. Now the skilful
    workman is very careful indeed as to what he takes into his
    brain-attic. He will have nothing but the tools which may help him in
    doing his work, but of these he has a large assortment, and all in the
    most perfect order. It is a mistake to think that that little room has
    elastic walls and can distend to any extent. Depend upon it there comes
    a time when for every addition of knowledge you forget something that
    you knew before. It is of the highest importance, therefore, not to
    have useless facts elbowing out the useful ones."

    -- /A Study in Scarlet/, A. C. Doyle.

    --
    Jim Gibson
     
    Jim Gibson, Apr 28, 2011
    #9
  10. Mr P

    Justin C Guest

    On 2011-04-28, Jim Gibson <> wrote:
    >
    > To quote Sherlock Holmes:
    >
    > "You see," he explained, "I consider that a man's brain originally is
    > like a little empty attic, and you have to stock it with such furniture
    > as you choose. A fool takes in all the lumber of every sort that he
    > comes across, so that the knowledge which might be useful to him gets
    > crowded out, or at best is jumbled up with a lot of other things so
    > that he has a difficulty in laying his hands upon it. Now the skilful
    > workman is very careful indeed as to what he takes into his
    > brain-attic. He will have nothing but the tools which may help him in
    > doing his work, but of these he has a large assortment, and all in the
    > most perfect order. It is a mistake to think that that little room has
    > elastic walls and can distend to any extent. Depend upon it there comes
    > a time when for every addition of knowledge you forget something that
    > you knew before. It is of the highest importance, therefore, not to
    > have useless facts elbowing out the useful ones."


    Now we know where Matt Groening got Homer's quote "...every time I learn
    something new it pushes some old stuff out of my brain".

    I should read more... but then I'd probably forget stuff I want to
    remember.

    Justin.

    --
    Justin C, by the sea.
     
    Justin C, Apr 28, 2011
    #10