Regular expression: Is this a bug or feature?

Martin Kahlert

Hi ruby experts!

Is this intended behaviour?

irb(main):001:0> s1='a=1'
=> "a=1"
irb(main):002:0> s2='b=1'
=> "b=1"
irb(main):003:0> s1 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):004:0> $1
=> nil <------ but where is argument?
irb(main):005:0> s2 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):006:0> $1
=> "1" <------ this has been expected
irb(main):007:0> s1 =~ /(a|b)=(.)/
=> 0 <------ expression matches
irb(main):012:0> $2
=> "1" <------ this has been expected

Tested on ruby 1.8.2 (2004-12-22) [i686-linux]

Thanks for your help in advance
Martin.
 
George Ogata

Martin said:
Hi ruby experts!

Is this intended behaviour?

irb(main):001:0> s1='a=1'
=> "a=1"
irb(main):002:0> s2='b=1'
=> "b=1"

Call this part 1:
irb(main):003:0> s1 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):004:0> $1
=> nil <------ but where is argument?

Part 2:
irb(main):005:0> s2 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):006:0> $1
=> "1" <------ this has been expected

Part 3:
irb(main):007:0> s1 =~ /(a|b)=(.)/
=> 0 <------ expression matches
irb(main):012:0> $2
=> "1" <------ this has been expected

I'm not sure why you think it might be a bug. The '|' operator binds
more loosely than anything else in a regexp, so you have to group the
"a|b" in parens. Note that in part 1, the bit that matches is the left
side of the '|', namely 'a' (no parens), so there are no captures and
$1 is nil. In part 2, the right side ('b=(.)') matches, so there's
1 capture. In part 3, the whole pattern ('(a|b)=(.)') matches, so
there are 2 captures.

Does this make sense?
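
One way to see which side of the '|' fired is to look at the whole match instead of only $1. The following is just a sketch using String#match and MatchData; the variable names (m1, part1 and so on) are made up for illustration:

```ruby
s1 = 'a=1'
s2 = 'b=1'

# Part 1: the left alternative /a/ matches, so the whole match is just "a"
# and the capture group on the right side never takes part.
m1 = s1.match(/a|b=(.)/)
part1 = [m1[0], m1[1]]          # [whole match, first capture]

# Part 2: 'a' cannot match, so the right alternative /b=(.)/ is used.
m2 = s2.match(/a|b=(.)/)
part2 = [m2[0], m2[1]]

# Part 3: with explicit parens both groups take part in the match.
m3 = s1.match(/(a|b)=(.)/)
part3 = [m3[0], m3[1], m3[2]]   # [whole match, first capture, second capture]

p part1  #=> ["a", nil]
p part2  #=> ["b=1", "1"]
p part3  #=> ["a=1", "a", "1"]
```

In part 1 the match stops after the single character "a", which is why =~ still returns 0 even though "=1" is never looked at.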

Note that if you only want 1 capture, you can also use the shy
grouping operator (?:...), so:

s1 =~ /(?:a|b)=(.)/

[$1, $2] #=> ["1", nil]
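
As a small sketch of how the non-capturing version behaves across several inputs (PAIR is a hypothetical constant name, and the 'c=1' input is added only to show a non-match):

```ruby
# Non-capturing group for the alternation, one capture for the value.
PAIR = /(?:a|b)=(.)/

# For each input, collect the captures, or nil when nothing matches.
results = %w[a=1 b=1 c=1].map do |s|
  m = PAIR.match(s)
  m && m.captures
end

p results  #=> [["1"], ["1"], nil]
```

Both 'a=1' and 'b=1' now give exactly one capture, so the value is always in $1 regardless of which letter matched.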
 
Per Velschow

Martin said:
irb(main):003:0> s1 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):004:0> $1
=> nil <------ but where is argument?

I assume this is the one you need an explanation for? I think you simply
misinterpret the regexp. /a|b=(.)/ is a union of the two regexps /a/
and /b=(.)/. So in this case it matches only the first one, which has no
captures. The regexp you are probably looking for would be
/(?:a|b)=(.)/. Try that.
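
The "union" reading can be checked with Regexp.union, which builds an alternation out of separate patterns. This is only a sketch; the variable names are made up, and it merely compares the two regexps on the two strings from this thread:

```ruby
# Build the alternation explicitly from its two halves.
union    = Regexp.union(/a/, /b=(.)/)
original = /a|b=(.)/

# Compare whole match and captures for both test strings.
same = %w[a=1 b=1].all? do |s|
  union.match(s).to_a == original.match(s).to_a
end

p same                      #=> true
p union.match('a=1').to_a   #=> ["a", nil]
```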
 
Martin Kahlert

George said:
I'm not sure why you think it might be a bug. The '|' operator just
binds very loosely, so you have to group the "a|b" in parens. Note
that in part 1, the bit that matches is the left side of the '|',
namely 'a' (no parens), so there are no captures. In part 2, the
right side ('b=(.)') matches, so there's 1 capture. In part 3, it
matches the whole thing ('(a|b)=(.)'), so there are 2 captures.

Does this make sense?


I always assumed 'a|b anything' means '(a|b) anything'.

Thanks for this clarification!

Regards
Martin
 
