Why, oh, why, little regexp?

Discussion in 'Ruby' started by Daniel Waite, Oct 30, 2007.

  1. Daniel Waite

    Daniel Waite Guest

    'cost * tax'.match(/([a-z]+)*/).to_a
    => ["cost", "cost"]

    Why?

    I'm reading it as... Take one or more characters between a and z, store
    them into a back reference, then repeat the previous match zero or more
    times.

    Now, that regexp doesn't do what I want it to do, but what it IS doing
    doesn't make sense to me.

    What I'd like is to grab all the "words" in the string. So in the above
    example I'd like two matches, cost and tax.

    Any ideas?

    PS: match(...).captures always, always returns an empty array...
    --
    Posted via http://www.ruby-forum.com/.
     
    Daniel Waite, Oct 30, 2007
    #1

  2. Daniel Waite wrote:
    > 'cost * tax'.match(/([a-z]+)*/).to_a
    > => ["cost", "cost"]
    >
    > Why?
    >
    > I'm reading it as... Take one or more characters between a and z, store
    > them into a back reference, then repeat the previous match zero or more
    > times.
    >
    > Now, that regexp doesn't do what I want it to do, but what it IS doing
    > doesn't make sense to me.
    >
    > What I'd like is to grab all the "words" in the string. So in the above
    > example I'd like two matches, cost and tax.
    >
    > Any ideas?


    'cost * tax'.scan(/\w+/)
    => ["cost", "tax"]

    > PS: match(...).captures always, always returns an empty array...


    How are you using it?

    "foo".match(/(foo)/).captures
    => ["foo"]
    'cost * tax'.match(/([a-z]+)*/).captures
    => ["cost"]

    --
    vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
     
    Joel VanderWerf, Oct 30, 2007
    #2
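
For reference, a minimal sketch of what each call in the posts above returns. The variable names (`str`, `m`, and so on) are only for illustration; the results for `to_a`, `captures`, and `scan` are the ones quoted in the thread.

```ruby
str = 'cost * tax'

# String#match stops at the first match and returns a MatchData (or nil).
m = str.match(/([a-z]+)*/)

whole    = m[0]        # the entire matched text: "cost"
groups   = m.captures  # only the parenthesized groups: ["cost"]
combined = m.to_a      # whole match followed by the groups: ["cost", "cost"]

# String#scan keeps going and collects every non-overlapping match.
words = str.scan(/[a-z]+/)  # ["cost", "tax"]

p whole, groups, combined, words
```

So the doubled "cost" in `to_a` is the whole match plus the single capture group, not two separate matches.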

  3. Daniel Waite

    Daniel Waite Guest

    Joel VanderWerf wrote:
    >> What I'd like is to grab all the "words" in the string. So in the above
    >> example I'd like two matches, cost and tax.
    >>
    >> Any ideas?

    >
    > 'cost * tax'.scan(/\w+/)
    > => ["cost", "tax"]


    How do you people do that? The last time I had a regexp question someone
    came down from the clouds and handed me something about that short. Why
    do I think it's more difficult than it is?

    After making the example a little more complex I had to change it
    ever-so-slightly...

    'cost * tax + 0.075'.scan(/[a-z]+/)
    => ["cost", "tax"]

    But it's effectively the same. Thank you Joel, you rock!

    Is there a book you recommend to learn more about regular expressions?
    How did YOU learn them?

    >> PS: match(...).captures always, always returns an empty array...

    > How are you using it?
    >
    > "foo".match(/(foo)/).captures
    > => ["foo"]
    > 'cost * tax'.match(/([a-z]+)*/).captures
    > => ["cost"]


    LOL I'm an idiot -- *captures* -- back references, right. Gotcha...
    --
    Posted via http://www.ruby-forum.com/.
     
    Daniel Waite, Oct 30, 2007
    #3
  4. On Wed, Oct 31, 2007 at 08:14:01AM +0900 Daniel Waite mentioned:
    > 'cost * tax'.match(/([a-z]+)*/).to_a
    > => ["cost", "cost"]
    >
    > Why?
    >


    Well, the regexp is greedy: it matches as much as it can.
    What you wrote is effectively equivalent to ([a-z]*).
    A single match can't return multiple strings; it always returns
    one. It can't match the space after 'cost' either, since that
    character isn't included in your regexp.

    If you want to match two words, you could write e.g.
    ([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)
    This regexp will match two words separated by whitespace.
    A regexp can't capture an undefined number of words into separate
    groups; you need to know in advance how many words you want to match.

    For more info on regexps see e.g. re_format(7).

    --
    Stanislav Sedov
    ST4096-RIPE
     
    Stanislav Sedov, Oct 30, 2007
    #4
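
A quick sketch of the two-word pattern suggested above, plus what happens when a capture group is repeated. The sample strings and variable names are made up for illustration.

```ruby
two_words = /([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)/

# Two words separated only by whitespace: both groups are filled.
pair = 'cost tax'.match(two_words).captures   # ["cost", "tax"]

# The original string has " * " between the words, so this pattern finds nothing.
none = 'cost * tax'.match(two_words)          # nil

# A repeated group keeps only its last iteration, which is why one group
# cannot hand back an open-ended list of words.
last_only = 'cost tax'.match(/(?:([a-z]+)\s*)+/).captures  # ["tax"]

p pair, none, last_only
```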
  5. Daniel Waite

    Daniel Waite Guest

    Stanislav Sedov wrote:
    > On Wed, Oct 31, 2007 at 08:14:01AM +0900 Daniel Waite mentioned:
    >> 'cost * tax'.match(/([a-z]+)*/).to_a
    >> => ["cost", "cost"]
    >>
    >> Why?
    >>

    >
    > Well, the regexp is greedy: it matches as much as it can.
    > What you wrote is effectively equivalent to ([a-z]*).
    > A single match can't return multiple strings; it always returns
    > one. It can't match the space after 'cost' either, since that
    > character isn't included in your regexp.
    >
    > If you want to match two words, you could write e.g.
    > ([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)
    > This regexp will match two words separated by whitespace.
    > A regexp can't capture an undefined number of words into separate
    > groups; you need to know in advance how many words you want to match.
    >
    > For more info on regexps see e.g. re_format(7).


    Hmm... if what you say is true, why does the second poster's solution
    capture multiple words? Wait, I know why. String#scan is different from
    String#match. Interesting...

    So how does that work if I wanted to match ALL occurrences of \w+
    WITHOUT scan?
    --
    Posted via http://www.ruby-forum.com/.
     
    Daniel Waite, Oct 30, 2007
    #5
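
One way to collect every match without `scan`, sketched under the assumption that `Regexp#match` accepts a start offset as its second argument (true on Ruby 1.9 and later, which is newer than the Ruby of this thread). The helper name `all_matches` is hypothetical.

```ruby
# Hypothetical helper: collect every match of pattern in str without String#scan.
def all_matches(str, pattern)
  results = []
  pos = 0
  # Regexp#match takes an optional start offset; it returns nil once nothing is left.
  while (m = pattern.match(str, pos))
    results << m[0]
    # Resume after this match; step one extra character on an empty match
    # so the loop cannot stall.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  results
end

p all_matches('cost * tax', /\w+/)
p all_matches('cost * tax + 0.075', /[a-z]+/)
```

Both calls should print the same two words that `scan` returns; in practice `scan` remains the shorter and more idiomatic choice.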
  6. Jim Clark

    Jim Clark Guest

    Daniel Waite wrote:
    > Is there a book you recommend to learn more about regular expressions?
    > How did YOU learn them?
    >

    "Mastering Regular Expressions" by Jeffrey Friedl. I haven't seen the
    third edition to see if there are any Ruby-specific examples, but even
    with all the Perl examples in the first edition, I still use it as a
    reference because of the similarities between Perl and Ruby's regular
    expressions.

    -Jim
     
    Jim Clark, Oct 31, 2007
    #6
  7. 7stud --

    7stud -- Guest

    Daniel Waite wrote:
    > What I'd like is to grab all the "words" in the string.
    > So how does that work if I wanted to match ALL occurrences
    > of \w+ WITHOUT scan?


    You're using the wrong method. match() only returns the first match:

    pattern = /x.x/
    str = "xax hello xbx"

    puts pattern.match(str)

    --output:--
    xax

    >
    > So how does that work if I wanted to match ALL occurrences
    > of \w+ WITHOUT scan?
    >


    str = " cost * tax"
    words = str.split("*").map {|elmt| elmt.strip()}
    p words

    --output:--
    ["cost", "tax"]



    str = " cost * tax = 123"
    words = []

    str.split().each do |word|
      good_word = true

      # Ruby 1.8: ?a and ?z are the Integer character codes 97 and 122.
      word.each_byte do |code|
        if code < ?a or code > ?z
          good_word = false
          break
        end
      end

      if good_word
        words << word
      end
    end

    p words

    --output:--
    ["cost", "tax"]

    --
    Posted via http://www.ruby-forum.com/.
     
    7stud --, Oct 31, 2007
    #7
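
The byte-by-byte check above depends on Ruby 1.8, where `?a` is an Integer. A sketch of the same filter that should also run on later versions, assuming an anchored regexp is an acceptable test for "only lowercase letters":

```ruby
str = " cost * tax = 123"

# Split on whitespace, then keep only tokens made up entirely of a-z.
# \A and \z anchor the pattern to the whole token.
words = str.split.select { |word| word =~ /\A[a-z]+\z/ }

p words
```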
  8. Daniel Waite

    Daniel Waite Guest

    7stud -- wrote:
    > str = " cost * tax = 123"
    > words = []
    >
    > str.split().each do |word|
    >   good_word = true
    >
    >   word.each_byte do |code|
    >     if code < ?a or code > ?z
    >       good_word = false
    >       break
    >     end
    >   end
    >
    >   if good_word
    >     words << word
    >   end
    > end
    >
    > p words
    >
    > --output:--
    > ["cost", "tax"]


    That's clever use of ?a, which I recognize but have never seen anyone
    use before. Thanks for the example!

    Jim Clark wrote:
    > "Mastering Regular Expressions" by Jeffrey Friedl. I haven't seen the
    > third edition to see if there is any Ruby specific examples but even
    > with all the Perl examples in the first edition, I still use it as a
    > reference because of the similarities between Perl and Ruby's regular
    > expressions.


    I shall check that out Jim, thanks much.
    --
    Posted via http://www.ruby-forum.com/.
     
    Daniel Waite, Oct 31, 2007
    #8
  9. Phrogz

    Phrogz Guest

    On Oct 30, 9:30 pm, Daniel Waite wrote:
    > That's clever use of ?a, which I recognize but have never seen anyone
    > use before. Thanks for the example!


    My current favorite use for the ?x syntax is converting single-
    character strings representing digits into their integer form:

    # Jenny jenny, who can I turn to?
    irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
    8
    6
    7
    5
    3
    0
    9
     
    Phrogz, Oct 31, 2007
    #9
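
The subtraction above works on Ruby 1.8, where `?0` is the Integer 48. A sketch that should give the same digits on 1.9 and later, assuming `String#ord` is available (it is from 1.9 on); the variable names are illustrative.

```ruby
zero = "0".ord  # 48, the byte value of the character "0"

# Subtracting the code of "0" turns each digit byte into its numeric value.
digits = "8675309".each_byte.map { |byte| byte - zero }

p digits
```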
  10. Brian Adkins

    Brian Adkins Guest

    On Oct 30, 11:58 pm, Phrogz wrote:
    > On Oct 30, 9:30 pm, Daniel Waite wrote:
    >
    > > That's clever use of ?a, which I recognize but have never seen anyone
    > > use before. Thanks for the example!

    >
    > My current favorite use for the ?x syntax is converting single-
    > character strings representing digits into their integer form:


    Yeah, so you can squeeze Ruby code into small places :)

    1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}
     
    Brian Adkins, Oct 31, 2007
    #10
  11. On 10/31/07, Brian Adkins wrote:
    > On Oct 30, 11:58 pm, Phrogz wrote:
    > > On Oct 30, 9:30 pm, Daniel Waite wrote:
    > >
    > > > That's clever use of ?a, which I recognize but have never seen anyone
    > > > use before. Thanks for the example!

    > >
    > > My current favorite use for the ?x syntax is converting single-
    > > character strings representing digits into their integer form:

    >
    > Yeah, so you can squeeze Ruby code into small places :)
    >
    > 1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}


    Except under the upcoming revision (1.9) of the (Ruby) Rules of Golf,
    the R(uby)&A(ncient) has outlawed that usage, and instituted the
    penalty that ?d will no longer be 100, but "d".

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Oct 31, 2007
    #11
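
A small sketch of the 1.9 behavior described above, assuming it is run on Ruby 1.9 or later; `letter` and `code` are illustrative names.

```ruby
letter = ?d          # on Ruby 1.9+, a one-character String
code   = letter.ord  # the Integer that ?d used to be on 1.8

p letter, code
```

On those versions the golfed loop bound would need an Integer again, for example `1.upto(?d.ord)` or simply `1.upto(100)`.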
  12. 7stud --

    7stud -- Guest

    Gavin Kistner wrote:
    > On Oct 30, 9:30 pm, Daniel Waite wrote:
    >> That's clever use of ?a, which I recognize but have never seen anyone
    >> use before. Thanks for the example!

    >
    > My current favorite use for the ?x syntax is converting single-
    > character strings representing digits into their integer form:
    >
    > # Jenny jenny, who can I turn to?
    > irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
    > 8
    > 6
    > 7
    > 5
    > 3
    > 0
    > 9



    Perhaps this is clearer:

    "8675309".each_byte{|code| puts code.chr}

    ...although slightly slower.
    --
    Posted via http://www.ruby-forum.com/.
     
    7stud --, Oct 31, 2007
    #12
  13. On Oct 31, 2007, at 7:46 AM, 7stud -- wrote:

    > Gavin Kistner wrote:
    >> On Oct 30, 9:30 pm, Daniel Waite wrote:
    >>> That's clever use of ?a, which I recognize but have never seen
    >>> anyone
    >>> use before. Thanks for the example!

    >>
    >> My current favorite use for the ?x syntax is converting single-
    >> character strings representing digits into their integer form:
    >>
    >> # Jenny jenny, who can I turn to?
    >> irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
    >> 8
    >> 6
    >> 7
    >> 5
    >> 3
    >> 0
    >> 9

    >
    >
    > Perhaps this is clearer:
    >
    > "8675309".each_byte{|code| puts code.chr}
    >
    > ...although slightly slower.


    Printed content aside, it's not equivalent. The original code is
    making Integers, not Strings.

    James Edward Gray II
     
    James Edward Gray II, Oct 31, 2007
    #13
  14. Brian Adkins

    Brian Adkins Guest

    On Oct 31, 7:17 am, "Rick DeNatale" wrote:
    > On 10/31/07, Brian Adkins wrote:
    >
    > > On Oct 30, 11:58 pm, Phrogz wrote:
    > > > On Oct 30, 9:30 pm, Daniel Waite wrote:

    >
    > > > > That's clever use of ?a, which I recognize but have never seen anyone
    > > > > use before. Thanks for the example!

    >
    > > > My current favorite use for the ?x syntax is converting single-
    > > > character strings representing digits into their integer form:

    >
    > > Yeah, so you can squeeze Ruby code into small places :)

    >
    > > 1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}

    >
    > Except under the upcoming revision (1.9) of the (Ruby) Rules of Golf,
    > the R(uby)&A(ncient) has outlawed that usage, and instituted the
    > penalty that ?d will no longer be 100, but "d".


    Well, then the least they can do is add Integer#to as an alias for
    Integer#upto so we can have a net loss of 1 character in the above
    code :)
     
    Brian Adkins, Oct 31, 2007
    #14
  15. 7stud --

    7stud -- Guest

    James Gray wrote:
    > On Oct 31, 2007, at 7:46 AM, 7stud -- wrote:
    >
    >>> irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }

    >>
    >> "8675309".each_byte{|code| puts code.chr}
    >>
    >> ...although slightly slower.

    >
    > Printed content aside, it's not equivalent. The original code is
    > making Integers, not Strings.
    >


    Whoops.
    --
    Posted via http://www.ruby-forum.com/.
     
    7stud --, Oct 31, 2007
    #15