search reg-exp for exact match

Discussion in 'Ruby' started by John Butler, Nov 20, 2008.

  1. John Butler

    John Butler Guest

    Hi,

    I have a regular expression
    /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    and I want to check if various years are present.

    "2003" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    returns 0 as expected

    "2010" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    returns nil as expected

    But I want only exact matches, so when I search for "2003 - 2008" I want
    nil returned, as there is no exact match for that particular string. I
    thought the \b would give me this, but it doesn't.

    "2003 - 2008" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    returns 0, but I want nil returned.

    Anyone help?

    Jb
    --
    Posted via http://www.ruby-forum.com/.
     
    John Butler, Nov 20, 2008
    #1

  2. On Thu, Nov 20, 2008 at 1:49 PM, John Butler <> wrote:
    > Hi,
    >
    > I have a regular expression
    > /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    > and i want to check if various years are present.
    >
    > "2003" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    > returns 0 as expected
    >
    > "2010" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    > returns nil as expected
    >
    > But i want only exact matches so when i search for "2003 - 2008" i want
    > nil returned as there is no exact match for that particular string. I
    > thought the \b would give me this but it doesnt.
    >
    > "2003 - 2008" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
    > returns 0 i want nil returned.
    >
    > Anyone help?


    To do exactly what you are asking for, you can anchor the regexp
    to the beginning and end of the string:

    irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
    => /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
    irb(main):014:0> "2003" =~ re
    => 0
    irb(main):015:0> "2003 - 2008" =~ re
    => nil
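
    A minimal sketch of why the anchors matter, assuming the seven years from
    the question; the constant names YEARS, WORD_RE, EXACT_RE and LENIENT_RE
    are made up for illustration. \b only marks a word boundary, and "2003" is
    a complete word inside "2003 - 2008", so a \b-delimited pattern still
    matches there. \A and \z pin the match to the whole string. Note that \Z
    (upper case) also accepts one trailing newline, while \z does not.

```ruby
# Hypothetical constants for illustration; the year list comes from the question.
YEARS = "2003|2004|2005|2006|2007|2008|2009"

WORD_RE    = /\b(?:#{YEARS})\b/   # word boundaries only
EXACT_RE   = /\A(?:#{YEARS})\z/   # whole string, no trailing newline allowed
LENIENT_RE = /\A(?:#{YEARS})\Z/   # whole string, one trailing newline tolerated

p("2003 - 2008" =~ WORD_RE)   # => 0   ("2003" is a word on its own)
p("2003 - 2008" =~ EXACT_RE)  # => nil
p("2003\n" =~ LENIENT_RE)     # => 0
p("2003\n" =~ EXACT_RE)       # => nil
```

    Which of \z and \Z is appropriate probably depends on whether the input
    may still carry a line ending, for example when it comes from gets.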

    In this case you don't need the \b anymore. BTW, your original regexp
    has typos: it has \2 where \b was intended.
    Anyway, if you want exact matches of strings you don't need regexps:

    irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
    => ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
    irb(main):020:0> years.include? "2003"
    => true
    irb(main):021:0> years.include? "2003 - 2008"
    => false

    If you have many numbers and many lookups, a Set should be better,
    performance-wise.
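
    As a rough sketch of the Set variant (VALID_YEARS and valid_year? are
    hypothetical names, and the year range is the one from the question), the
    lookup could look like this. Set#include? uses hashing, so it does not
    have to scan the whole collection the way Array#include? does.

```ruby
require "set"

# Hypothetical constant: the allowed years as strings, stored in a Set.
VALID_YEARS = (2003..2009).map(&:to_s).to_set

# Exact string membership test; no regexp involved.
def valid_year?(text)
  VALID_YEARS.include?(text)
end

p valid_year?("2003")         # => true
p valid_year?("2003 - 2008")  # => false
```

    For seven entries the difference to an Array is unlikely to be
    measurable; the Set mostly pays off with large lists and many lookups.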
    Now, if we are talking about ranges of years we can do even better:

    irb(main):022:0> min_year = 2003
    => 2003
    irb(main):023:0> max_year = 2009
    => 2009
    irb(main):024:0> year_to_test = "2003".to_i
    => 2003
    irb(main):025:0> min_year <= year_to_test and year_to_test <= max_year
    => true
    irb(main):026:0> year_to_test = "2008".to_i
    => 2008
    irb(main):027:0> min_year <= year_to_test and year_to_test <= max_year
    => true
    irb(main):028:0> year_to_test = "2010".to_i
    => 2010
    irb(main):029:0> min_year <= year_to_test and year_to_test <= max_year
    => false
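
    One caveat with the numeric comparison, sketched below under the
    assumption that the input is still the raw string: String#to_i stops at
    the first non-digit, so "2003 - 2008".to_i is 2003 and would pass the
    range test. The method names loose_year_check and strict_year_check are
    made up for illustration; the strict variant first checks that the string
    is exactly four digits.

```ruby
# Hypothetical constant holding the range from the session above.
YEAR_RANGE = (2003..2009)

# Converts first, then compares: trailing text is silently ignored by to_i.
def loose_year_check(text)
  YEAR_RANGE.cover?(text.to_i)
end

# Requires the whole string to be four digits before comparing.
def strict_year_check(text)
  text.match?(/\A\d{4}\z/) && YEAR_RANGE.cover?(text.to_i)
end

p loose_year_check("2003 - 2008")   # => true  (to_i yields 2003)
p strict_year_check("2003 - 2008")  # => false
p strict_year_check("2008")         # => true
p strict_year_check("2010")         # => false
```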


    Hope this helps,

    Jesus.
     
    Jesús Gabriel y Galán, Nov 20, 2008
    #2

  3. On 20.11.2008 14:17, Jesús Gabriel y Galán wrote:
    > To do exactly what you are asking for, you can anchor the regexp
    > to the beginning and end of the string:
    >
    > irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
    > => /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
    > irb(main):014:0> "2003" =~ re
    > => 0
    > irb(main):015:0> "2003 - 2008" =~ re
    > => nil


    I'd rather use /\A200[3-9]\z/.
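
    A quick way to convince yourself that the character class accepts the
    same strings as the long alternation is to compare both patterns over a
    few candidates. This is only a sketch; LONG_RE, SHORT_RE and
    patterns_agree? are hypothetical names.

```ruby
# Hypothetical constants: the spelled-out alternation and the shorter class.
LONG_RE  = /\A(?:2003|2004|2005|2006|2007|2008|2009)\z/
SHORT_RE = /\A200[3-9]\z/

# True when both patterns give the same verdict for the given string.
def patterns_agree?(text)
  LONG_RE.match?(text) == SHORT_RE.match?(text)
end

candidates = (1999..2012).map(&:to_s) + ["2003 - 2008", "200a", ""]
p candidates.all? { |text| patterns_agree?(text) }  # => true
p SHORT_RE.match?("2003")                           # => true
p SHORT_RE.match?("2002")                           # => false
```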

    > In this case you don't need the \b anymore. BTW, you had typos there
    > because you had \2 instead of \b.
    > Anyway, if you want exact matches of strings you don't need regexps:
    >
    > irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
    > => ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
    > irb(main):020:0> years.include? "2003"
    > => true
    > irb(main):021:0> years.include? "2003 - 2008"
    > => false


    Or

    irb(main):001:0> s="2005"
    => "2005"
    irb(main):002:0> (2003..2009) === s[/\A\d{4}\z/].to_i
    => true
    irb(main):003:0> s="2010"
    => "2010"
    irb(main):004:0> (2003..2009) === s[/\A\d{4}\z/].to_i
    => false
    irb(main):006:0> (2003..2009).include? s[/\A\d{4}\z/].to_i
    => false

    Note, this works because 0 (= nil.to_i) is not part of the range!
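
    To make that note concrete, here is a small sketch of the intermediate
    values (four_digit_year is a hypothetical helper name). String#[] with a
    regexp returns nil when nothing matches, nil.to_i is 0, and 0 lies
    outside 2003..2009. The trick would break for a range that contains 0.

```ruby
# Hypothetical helper: the exact four-digit year as an Integer, or 0 if the
# string is anything other than exactly four digits.
def four_digit_year(text)
  text[/\A\d{4}\z/].to_i
end

p "2003 - 2008"[/\A\d{4}\z/]                     # => nil
p four_digit_year("2003 - 2008")                 # => 0
p four_digit_year("2005")                        # => 2005
p (2003..2009) === four_digit_year("2005")       # => true
p (2003..2009) === four_digit_year("2003 - 2008") # => false
```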

    > If you have many numbers and many lookups, a Set should be better,
    > performance-wise.


    You can even use a bit set:

    irb(main):007:0> t = (2003..2009).inject(0) {|mask,y| mask | 1 << y}
    =>
    116650078639864259662055853239652489576667478532211432368528502061497852157464823887836603809757037023714110007321126217782227286423686421672874625786531963635756068971637276480699799614611885589371789821904502024698121311064730577770474098457113815634439476503092997189887743679313284635928742849521858004245675611528209841692017556564840683843349732924435866760173931843810360262352061792429448169450281904579322760817054128336138506834410834183565543664844525391283837108127106791786643268532096672079466512393065631776802367002142967381057920196424747178242497261636008255151052901022379808767413846016
    irb(main):008:0> t[s.to_i]
    => 0
    irb(main):009:0> s="2005"
    => "2005"
    irb(main):010:0> t[s.to_i]
    => 1
    irb(main):011:0>
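
    The mask above is huge because bit number 2003 is the lowest one set. A
    variation, sketched here with the hypothetical names BASE_YEAR, YEAR_MASK
    and year_bit_set?, subtracts the first year so the mask fits in seven
    bits. It assumes the input has already been validated as a bare year,
    since to_i would again ignore trailing text.

```ruby
# Hypothetical constants: bit 0 stands for 2003, bit 6 for 2009.
BASE_YEAR = 2003
YEAR_MASK = (2003..2009).inject(0) { |mask, y| mask | (1 << (y - BASE_YEAR)) }

# Integer#[] returns the bit at the given position as 0 or 1.
def year_bit_set?(text)
  offset = text.to_i - BASE_YEAR
  return false if offset.negative?  # years before BASE_YEAR have no bit
  YEAR_MASK[offset] == 1
end

p YEAR_MASK              # => 127
p year_bit_set?("2005")  # => true
p year_bit_set?("2010")  # => false
p year_bit_set?("1999")  # => false
```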

    There are many ways... :)

    > Now, if we are talking about ranges of years we can do even better:


    ... or use the range test (as above) directly.

    Kind regards

    robert
     
    Robert Klemme, Nov 22, 2008
    #3