Regular Expressions

Discussion in 'Ruby' started by Justin To, Jun 17, 2008.

  1. Justin To

    Justin To Guest

    Hello! I'm working on a problem that says I must match version strings in
    a CSV file. They could be anything like:

    v.6.0.3-3
    aajd4-43_3
    ABCD 5.0
    ABCDv.5.0
    A 3.40
    ...

    With the other fields in mind, I thought "heck, looks like versions are
    the only ones that contain a series of letters, digits, periods,
    underscores and dashes..."
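    Taken literally ("a field made only of letters, digits, periods,
    underscores and dashes"), that idea is a single anchored character class.
    A minimal sketch (the constant name and the extra "hello" sample are
    hypothetical) shows where it falls short: it accepts any plain word, and it
    rejects "ABCD 5.0" and "A 3.40" because a space is not in the class.
    `Regexp#match?` needs Ruby 2.4 or later; on older versions `=~` works the
    same way here.

```ruby
# Literal reading of the idea: the whole field consists only of
# letters, digits, periods, underscores and dashes.
# NAIVE_VERSION is a hypothetical name used only for illustration.
NAIVE_VERSION = /\A[A-Za-z0-9._-]+\z/

# "hello" stands in for a non-version field (hypothetical).
samples = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40", "hello"]
samples.each do |s|
  puts "#{s}: #{NAIVE_VERSION.match?(s)}"
end
```

    Here "ABCD 5.0" and "A 3.40" come out false while "hello" comes out true,
    which is why the answers below ask what exactly counts as a "series".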

    I'm pretty new to Ruby, so I don't have much experience with regular
    expressions. Is it possible to solve this with just one regular
    expression? I need a regular expression that will return:

    v.6.0.3-3: true because there's a v followed by a series of '.' and digits
    aajd4-43_3: true because there's a series of digits, '-', and '_'
    ABCD 5.0: true because there's a series of digits and '.'
    ABCDv.5.0: true...
    A 3.40: true...

    Thanks for the help!
    --
    Posted via http://www.ruby-forum.com/.
    Justin To, Jun 17, 2008
    #1

  2. On Tue, Jun 17, 2008 at 6:08 PM, Justin To <> wrote:
    > Hello! I'm trying this problem that says I must match versions in a CSV
    > file,
    >
    > could be anything like:
    >
    > v.6.0.3-3
    > aajd4-43_3
    > ABCD 5.0
    > ABCDv.5.0
    > A 3.40
    > ...
    >
    > With the other fields in mind, I thought "heck, looks like versions are
    > the only ones that contain a series of letters, digits, periods,
    > underscores and dashes..."
    >
    > I'm pretty new to Ruby so I don't have very much experience with regular
    > expressions. Is it possible to make just one regular expression to
    > fulfill my problem? I need a regular expression that will return:
    >
    > v.6.0.3-3: true because there's a v followed by a series of '.' and
    > digits
    > aajd4-43_3: true because there's a series of digits, '-', and '_'
    > ABCD 5.0: true because there's a series of digits and '.'
    > ABCDv.5.0: true...
    > A 3.40: true...


    I think some information is missing here: how many of these
    characters form a "series"? More than one? Do they have to appear
    in some order (for example digits, then a '.', '-' or '_', then more
    digits), or does the order not matter?

    The simplest case: two or more of those characters in a row:

    irb(main):023:0> versions = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
    => ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
    irb(main):024:0> r = /[-._0-9]{2,}/
    => /[-._0-9]{2,}/
    irb(main):025:0> versions.each {|x| puts "#{x}: #{(x =~ r) != nil}"}
    v.6.0.3-3: true
    aajd4-43_3: true
    ABCD 5.0: true
    ABCDv.5.0: true
    A 3.40: true
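    A pitfall worth knowing with this kind of character class: a hyphen
    between two characters denotes a range. Written as `[.-_1-9]`, the class
    means every character from '.' to '_' in ASCII order (which includes all
    digits and all uppercase letters, but not '-' itself) plus '1' to '9', so
    a field like "ABCD" matches on its letters alone. Putting the hyphen
    first, as in `[-._0-9]`, makes it a literal '-'. A quick check (variable
    names are illustrative):

```ruby
# Buggy class: ".-_" is a range covering '.' through '_' in ASCII,
# which includes '0'-'9' and 'A'-'Z' but not '-'.
buggy = /[.-_1-9]{2,}/
# Hyphen placed first, so it is a literal '-'.
fixed = /[-._0-9]{2,}/

p buggy.match?("ABCD")  # prints true: letters fall inside the range
p fixed.match?("ABCD")  # prints false
p buggy.match?("4-3")   # prints false: '-' is not in the range
p fixed.match?("4-3")   # prints true
```

    Ruby may also print a "character class has duplicated range" warning
    for the buggy literal, since 1-9 already lies inside '.' to '_'.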

    One or more digits, followed by '.', '_' or '-', followed by one or more digits:

    irb(main):030:0> r = /\d+[-._]\d+/
    => /\d+[-._]\d+/
    irb(main):031:0> versions.each {|x| puts "#{x}: #{(x =~ r) != nil}"}
    v.6.0.3-3: true
    aajd4-43_3: true
    ABCD 5.0: true
    ABCDv.5.0: true
    A 3.40: true
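    Both patterns answer "true" for every sample, but they match quite
    different parts of each field. A small sketch (the variable names are
    illustrative) prints the matched substring instead of just true/false,
    which makes candidates easier to compare:

```ruby
versions = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
run_rx   = /[-._0-9]{2,}/  # two or more of: '-', '.', '_', digits
pair_rx  = /\d+[-._]\d+/   # digits, one separator, digits

versions.each do |v|
  # String#[] with a Regexp returns the first matching substring, or nil.
  printf("%-12s %-12s %s\n", v, v[run_rx].inspect, v[pair_rx].inspect)
end
```

    For "v.6.0.3-3", for instance, the first pattern picks up ".6.0.3-3"
    while the second only finds "6.0".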

    You will have to refine your requirements a little in order to choose
    among these (or any variation on them).

    Jesus.
    Jesús Gabriel y Galán, Jun 17, 2008
    #2

  3. On 17.06.2008 18:08, Justin To wrote:
    > Hello! I'm trying this problem that says I must match versions in a CSV
    > file,
    >
    > could be anything like:
    >
    > v.6.0.3-3
    > aajd4-43_3
    > ABCD 5.0
    > ABCDv.5.0
    > A 3.40
    > ..
    >
    > With the other fields in mind, I thought "heck, looks like versions are
    > the only ones that contain a series of letters, digits, periods,
    > underscores and dashes..."
    >
    > I'm pretty new to Ruby so I don't have very much experience with regular
    > expressions. Is it possible to make just one regular expression to
    > fulfill my problem? I need a regular expression that will return:
    >
    > v.6.0.3-3: true because there's a v followed by a series of '.' and
    > digits
    > aajd4-43_3: true because there's a series of digits, '-', and '_'
    > ABCD 5.0: true because there's a series of digits and '.'
    > ABCDv.5.0: true...
    > A 3.40: true...
    >
    > Thanks for the help!


    Yes, that's easy, just /./ as an expression.

    Seriously, it is similarly crucial what it does *not* match.
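    One way to make "what it does not match" concrete is to keep two lists,
    fields that must match and fields that must not, and check every
    candidate pattern against both. The negative samples below are
    hypothetical stand-ins (a name, a date, a number, a decimal amount) for
    the other CSV columns, which the thread does not show; `misclassified` is
    likewise a hypothetical helper.

```ruby
MUST_MATCH = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40"]
# Hypothetical non-version fields; the real CSV columns are not known.
MUST_NOT_MATCH = ["John Smith", "2008-06-17", "42", "1.5"]

# Returns the samples a candidate pattern gets wrong.
def misclassified(rx)
  false_neg = MUST_MATCH.reject { |s| rx.match?(s) }
  false_pos = MUST_NOT_MATCH.select { |s| rx.match?(s) }
  { false_negatives: false_neg, false_positives: false_pos }
end

# Try the "digits, separator, digits" pattern from the previous reply.
p misclassified(/\d+[-._]\d+/)
```

    With that pattern nothing in the positive list is missed, but the date
    and the decimal amount are reported as false positives.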

    The easiest (but not the most efficient) approach would be to create one
    alternative for each variant you have, like

    %r{
    ^(?:
    v(?:\.\d+)+-\d+
    | \w+\d+-[\d_]+
    | ...
    )$
    }x

    etc.

    But given the number of alternatives you present it might be difficult
    to avoid also matching other stuff. At least, you'll face a pretty
    complex regular expression.
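    Filled in for the five sample strings only, the alternation approach
    might look like the sketch below. The exact shape of each alternative is
    an assumption inferred from the samples, and `\A`/`\z` are used instead
    of `^`/`$` because in Ruby `^` and `$` match at any line boundary, not
    only at the ends of the string. The name VERSION_RX is hypothetical.

```ruby
# VERSION_RX is a hypothetical name; each alternative is inferred
# from one or two of the sample strings in the thread.
VERSION_RX = %r{
  \A(?:
      v(?:\.\d+)+-\d+          # v.6.0.3-3
    | [a-z]+\d+-\d+_\d+        # aajd4-43_3
    | [A-Z]+\s\d+(?:\.\d+)+    # ABCD 5.0, A 3.40
    | [A-Z]+v(?:\.\d+)+        # ABCDv.5.0
  )\z
}x

# The last two samples are hypothetical non-version fields.
["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40",
 "2008-06-17", "hello"].each do |s|
  puts "#{s}: #{VERSION_RX.match?(s)}"
end
```

    This accepts only the exact shapes shown: "ABC 5" (no dot) or "v.6.0"
    (no dash part) would be rejected, which may or may not be what the CSV
    actually needs.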

    Kind regards

    robert
    Robert Klemme, Jun 17, 2008
    #3
  4. Dave Bass

    Dave Bass Guest

    Robert Klemme wrote:
    > The easiest (but not the most efficient) approach would be to create one
    > alternative for each variant you have, like
    >
    > %r{
    > ^(?:
    > v(?:\.\d+)+-\d+
    > | \w+\d+-[\d_]+
    > | ...
    > )$
    > }x
    >
    > etc.


    The problem with regular expressions is that they can easily get out of
    hand and become incomprehensible, as the above code shows (though
    presumably to RK it's totally transparent).

    Better to write a number of small regexps, each testing for a specific
    pattern. Then combine the results with a logical OR. This can be done
    using a flag variable, or an if-elsif tree, a case statement, etc.,
    whatever you feel happiest with. This approach will be a lot easier to
    test and debug.
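    A minimal sketch of that approach, assuming the version shapes can be
    inferred from the five samples (the constant names and `version?` are
    hypothetical). The logical OR comes from a `case` statement, where each
    `when` pattern is tried with `Regexp#===`:

```ruby
# One small, anchored pattern per known version shape.
# The shapes are inferred from the sample strings only (an assumption).
V_DOTTED_DASH = /\Av(?:\.\d+)+-\d+\z/        # v.6.0.3-3
NAME_DASH_US  = /\A[a-z]+\d+-\d+_\d+\z/      # aajd4-43_3
NAME_SPACE    = /\A[A-Z]+\s\d+(?:\.\d+)+\z/  # ABCD 5.0, A 3.40
NAME_V        = /\A[A-Z]+v(?:\.\d+)+\z/      # ABCDv.5.0

# Logical OR via case/when: the field is a version if any pattern matches.
def version?(field)
  case field
  when V_DOTTED_DASH, NAME_DASH_US, NAME_SPACE, NAME_V then true
  else false
  end
end

puts version?("ABCD 5.0")
puts version?("2008-06-17")
```

    Each pattern can also be tested on its own, e.g.
    `NAME_SPACE.match?("A 3.40")`, which is what makes this style easy to debug.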
    Dave Bass, Jun 18, 2008
    #4
  5. 2008/6/18 Dave Bass <>:
    > Robert Klemme wrote:
    >> The easiest (but not the most efficient) approach would be to create one
    >> alternative for each variant you have, like
    >>
    >> %r{
    >> ^(?:
    >> v(?:\.\d+)+-\d+
    >> | \w+\d+-[\d_]+
    >> | ...
    >> )$
    >> }x
    >>
    >> etc.

    >
    > The problem with regular expressions is that they can easily get out of
    > hand and become incomprehensible, as the above code shows (though
    > presumably to RK it's totally transparent).


    Actually, the RX I presented was deliberately incomplete; it was meant
    to illustrate exactly your point. :)

    > Better to write a number of small regexps, each testing for a specific
    > pattern. Then combine the results with a logical OR. This can be done
    > using a flag variable, or an if-elsif tree, a case statement, etc.,
    > whatever you feel happiest with. This approach will be a lot easier to
    > test and debug.


    Depends. If you build the one RX one alternative at a time and test it
    at each step, I'd say that works pretty well too. And if the volume of
    data is high, the performance advantage of a single RX might pay off.
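    The two positions can be combined: write and test the small regexps
    individually, then let `Regexp.union` build the single alternation from
    them. A sketch, with the patterns again inferred from the sample strings
    (an assumption) and a hypothetical date as a negative sample. Whether the
    combined regexp is actually faster on a given data set is something to
    measure (for example with Ruby's standard `Benchmark` module), not assume.

```ruby
# Small patterns, each easy to test on its own (shapes inferred from samples).
PATTERNS = [
  /\Av(?:\.\d+)+-\d+\z/,
  /\A[a-z]+\d+-\d+_\d+\z/,
  /\A[A-Z]+\s\d+(?:\.\d+)+\z/,
  /\A[A-Z]+v(?:\.\d+)+\z/
].freeze

# Regexp.union joins them into one alternation, so each field is
# checked with a single regexp instead of up to four.
COMBINED = Regexp.union(PATTERNS)

samples = ["v.6.0.3-3", "aajd4-43_3", "ABCD 5.0", "ABCDv.5.0", "A 3.40", "2008-06-17"]
samples.each do |s|
  separate = PATTERNS.any? { |rx| rx.match?(s) }
  combined = COMBINED.match?(s)
  puts "#{s}: separate=#{separate} combined=#{combined}"
end
```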

    Kind regards

    robert


    --
    use.inject do |as, often| as.you_can - without end
    Robert Klemme, Jun 18, 2008
    #5