Why, oh, why, little regexp?

Daniel Waite · Oct 30, 2007

'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

I'm reading it as... Take one or more characters between a and z, store
them into a back reference, then repeat the previous match zero or more
times.

Now, that regexp doesn't do what I want it to do, but what it IS doing
doesn't make sense to me.

What I'd like is to grab all the "words" in the string. So in the above
example I'd like two matches, cost and tax.

Any ideas?

PS: match(...).captures always, always returns an empty array...

Joel VanderWerf · Oct 30, 2007

Daniel said:
'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

I'm reading it as... Take one or more characters between a and z, store
them into a back reference, then repeat the previous match zero or more
times.

Now, that regexp doesn't do what I want it to do, but what it IS doing
doesn't make sense to me.

What I'd like is to grab all the "words" in the string. So in the above
example I'd like two matches, cost and tax.

Any ideas?

'cost * tax'.scan(/\w+/)
=> ["cost", "tax"]

PS: match(...).captures always, always returns an empty array...

How are you using it?

"foo".match(/(foo)/).captures
=> ["foo"]
'cost * tax'.match(/([a-z]+)*/).captures
=> ["cost"]
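
A small sketch of why the two results differ, assuming a current Ruby: MatchData#to_a returns the whole match followed by each group, while #captures returns the groups only. The second string, 'ab12', is an illustrative assumption, not from the thread; it suggests that a group under * keeps only its last repetition.

```ruby
m = 'cost * tax'.match(/([a-z]+)*/)
p m.to_a      # => ["cost", "cost"]  (whole match, then group 1)
p m.captures  # => ["cost"]          (group 1 only)

# A repeated group remembers only its final iteration.
p 'ab12'.match(/([a-z]|\d)*/).to_a  # => ["ab12", "2"]
```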

Daniel Waite · Oct 30, 2007

Joel said:
What I'd like is to grab all the "words" in the string. So in the above
example I'd like two matches, cost and tax.

Any ideas?

'cost * tax'.scan(/\w+/)
=> ["cost", "tax"]

How do you people do that? The last time I had a regexp question someone
came down from the clouds and handed me something about that short. Why
do I think it's more difficult than it is?

After making the example a little more complex I had to change it
ever-so-slightly...

'cost * tax + 0.075'.scan(/[a-z]+/)
=> ["cost", "tax"]

But it's effectively the same. Thank you Joel, you rock!
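
For comparison, a quick sketch of what each pattern picks up on the longer string. The reason for the change is that \w also matches digits (and underscore), so the number gets split into extra "words":

```ruby
expr = 'cost * tax + 0.075'

p expr.scan(/\w+/)     # => ["cost", "tax", "0", "075"]
p expr.scan(/[a-z]+/)  # => ["cost", "tax"]
```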

Is there a book you recommend to learn more about regular expressions?
How did YOU learn them?

PS: match(...).captures always, always returns an empty array...

How are you using it?

"foo".match(/(foo)/).captures
=> ["foo"]
'cost * tax'.match(/([a-z]+)*/).captures
=> ["cost"]

LOL I'm an idiot -- *captures* -- back references, right. Gotcha...

Stanislav Sedov · Oct 30, 2007

'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

Well, a regexp always matches the longest possible string.
What you wrote is effectively equivalent to ([a-z]*).
A single regexp can't match multiple strings; it always matches
one. It can't match the space after 'cost' either, since that
character isn't included in your regexp.

If you want to match two words, you could write, e.g.,
([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)
This regexp will match two words separated by whitespace.
A regexp can't match an undefined number of words; you need to know
in advance how many words you want to match.

For more info on regexps see, e.g., re_format(7).
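
A minimal sketch of that two-word pattern in Ruby, which accepts POSIX bracket classes. The variable name two_words and the input 'cost tax' are assumptions for illustration. Note that the original string does not match, because " * " is not pure whitespace:

```ruby
two_words = /([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)/

p 'cost tax'.match(two_words).captures  # => ["cost", "tax"]
p 'cost * tax'.match(two_words)         # => nil
```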

Daniel Waite · Oct 30, 2007

Stanislav said:
'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

Well, a regexp always matches the longest possible string.
What you wrote is effectively equivalent to ([a-z]*).
A single regexp can't match multiple strings; it always matches
one. It can't match the space after 'cost' either, since that
character isn't included in your regexp.

If you want to match two words, you could write, e.g.,
([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)
This regexp will match two words separated by whitespace.
A regexp can't match an undefined number of words; you need to know
in advance how many words you want to match.

For more info on regexps see, e.g., re_format(7).

Hmm... if what you say is true, why does the second poster's solution
capture multiple words? Wait, I know why. String#scan is different from
String#match. Interesting...

So how does that work if I wanted to match ALL occurrences of \w+
WITHOUT scan?

Jim Clark · Oct 31, 2007

Daniel said:
Is there a book you recommend to learn more about regular expressions?
How did YOU learn them?

"Mastering Regular Expressions" by Jeffrey Friedl. I haven't seen the
third edition, so I don't know whether it has any Ruby-specific examples,
but even with all the Perl examples in the first edition, I still use it
as a reference because of the similarities between Perl's and Ruby's
regular expressions.

-Jim

7stud -- · Oct 31, 2007

Daniel said:
What I'd like is to grab all the "words" in the string.
So how does that work if I wanted to match ALL occurrences
of \w+ WITHOUT scan?

You're using the wrong method. match() only returns the first match:

pattern = /x.x/
str = "xax hello xbx"

puts pattern.match(str)

--output:--
xax

So how does that work if I wanted to match ALL occurrences
of \w+ WITHOUT scan?

str = " cost * tax"
words = str.split("*").map {|elmt| elmt.strip()}
p words

--output:--
["cost", "tax"]

str = " cost * tax = 123"
words = []

str.split().map do |word|
  good_word = true

  word.each_byte do |code|
    if code < ?a or code > ?z
      good_word = false
      break
    end
  end

  if good_word
    words << word
  end
end

p words

--output:--
["cost", "tax"]
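
Another way to get every match without scan is to call match repeatedly, each time starting where the previous match ended. This is only a sketch: all_matches is a hypothetical helper name, and it assumes Ruby 1.9 or later, where Regexp#match accepts a start position as its second argument.

```ruby
# Hypothetical helper: collects every match of pattern in str without String#scan.
def all_matches(str, pattern)
  results = []
  pos = 0
  while pos <= str.length && (m = pattern.match(str, pos))
    results << m[0]
    # Resume after this match; step one extra character past an empty match
    # so the loop cannot stall.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  results
end

p all_matches(' cost * tax = 123', /[a-z]+/)  # => ["cost", "tax"]
```

For patterns without capture groups this should give the same list as str.scan(pattern); the loop just makes the "find, advance, repeat" steps explicit.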

Daniel Waite · Oct 31, 2007

7stud said:
str = " cost * tax = 123"
words = []

str.split().map do |word|
  good_word = true

  word.each_byte do |code|
    if code < ?a or code > ?z
      good_word = false
      break
    end
  end

  if good_word
    words << word
  end
end

p words

--output:--
["cost", "tax"]

That's clever use of ?a, which I recognize but have never seen anyone
use before. Thanks for the example!

Jim said:
"Mastering Regular Expressions" by Jeffrey Friedl. I haven't seen the
third edition, so I don't know whether it has any Ruby-specific examples,
but even with all the Perl examples in the first edition, I still use it
as a reference because of the similarities between Perl's and Ruby's
regular expressions.

I shall check that out Jim, thanks much.

Phrogz · Oct 31, 2007

That's clever use of ?a, which I recognize but have never seen anyone
use before. Thanks for the example!

My current favorite use for the ?x syntax is converting single-
character strings representing digits into their integer form:

# Jenny jenny, who can I turn to?
irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
8
6
7
5
3
0
9

Brian Adkins · Oct 31, 2007

My current favorite use for the ?x syntax is converting single-
character strings representing digits into their integer form:

Yeah, so you can squeeze Ruby code into small places

1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}

Rick DeNatale · Oct 31, 2007

Yeah, so you can squeeze Ruby code into small places

1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}

Except under the upcoming revision (1.9) of the (Ruby) Rules of Golf,
the R(uby)&A(ncient) has outlawed that usage, and instituted the
penalty that ?d will no longer be 100, but "d".
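
A short sketch of what that change looks like in practice, assuming Ruby 1.9 or later: the ?d literal is a one-character String, and String#ord recovers the old integer value.

```ruby
char = ?d
p char      # => "d"
p char.ord  # => 100

# Integer#upto needs a numeric bound, so a golfed loop would have to spend
# four more characters on .ord.
count = 0
1.upto(?d.ord) { count += 1 }
p count     # => 100
```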

7stud -- · Oct 31, 2007

Gavin said:
My current favorite use for the ?x syntax is converting single-
character strings representing digits into their integer form:

# Jenny jenny, who can I turn to?
irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
8
6
7
5
3
0
9

Perhaps this is clearer:

"8675309".each_byte{|code| puts code.chr}

...although slightly slower.

James Edward Gray II · Oct 31, 2007

Perhaps this is clearer:

"8675309".each_byte{|code| puts code.chr}

...although slightly slower.

Printed content aside, it's not equivalent. The original code is
making Integers, not Strings.
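
The difference is easy to see side by side. This is a sketch that assumes Ruby 1.9 or later, so it subtracts "0".ord where the original post subtracted ?0; the variable names are illustrative.

```ruby
digits = "8675309"

as_integers = digits.each_byte.map { |code| code - "0".ord }
as_strings  = digits.each_byte.map { |code| code.chr }

p as_integers      # => [8, 6, 7, 5, 3, 0, 9]
p as_strings       # => ["8", "6", "7", "5", "3", "0", "9"]
p as_integers.sum  # => 38, arithmetic works only on the Integer version
```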

James Edward Gray II

Brian Adkins · Oct 31, 2007

Except under the upcoming revision (1.9) of the (Ruby) Rules of Golf,
the R(uby)&A(ncient) has outlawed that usage, and instituted the
penalty that ?d will no longer be 100, but "d".

Well, then the least they can do is add Integer#to as an alias for
Integer#upto so we can have a net loss of 1 character in the above
code

7stud -- · Oct 31, 2007

James said:
Printed content aside, it's not equivalent. The original code is
making Integers, not Strings.

Whoops.
