Regexp: Rubular vs. match. Why is the result different?


Ale Ds

I have to capture, by means of a regexp, the content between '<' and '>'.

For instance:

str = 'anystring<hour>anystring<min>anystring<sec>anystring'
I need the array ['hour', 'min', 'sec'].

I have written the regexp: /(<([^<>]+)>)+/
and I have tested it on the rubular.com site (it works!)

I have run it in irb:
/(<([^<>]+)>)+/.match('anystring<hour>anystring<min>anystring<sec>anystring')
As you can see, the match method returns just the first match in a MatchData object.

Do you know why?

Thank you,
Alessandro
 

Robert Klemme

2009/10/28 Ale Ds said:
> I have to capture, by means of a regexp, the content between '<' and '>'.
>
> For instance:
>
> str = 'anystring<hour>anystring<min>anystring<sec>anystring'
> I need the array ['hour', 'min', 'sec'].
>
> I have written the regexp: /(<([^<>]+)>)+/
> and I have tested it on the rubular.com site (it works!)

The "+" at the end is superfluous because this would match multiple
concatenated sequences like <xx><yyy>, which you want as separate
items.
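A quick check of the point about "+", using a made-up string of two adjacent tags, '<xx><yyy>'. With "+", both tags end up in a single match, and a repeated capture group only remembers its last iteration, so "xx" is lost. This is a minimal sketch; the output comments assume Ruby 1.9 or later.

```ruby
# The question's pattern: one or more consecutive <...> groups.
greedy = /(<([^<>]+)>)+/

# Adjacent tags are swallowed into a single match...
m = greedy.match('<xx><yyy>')
p m[0]  # prints "<xx><yyy>" (the whole match covers both tags)
p m[2]  # prints "yyy"       (a repeated group keeps only its last iteration)

# ...while the pattern without "+" finds each tag as a separate match.
separate = '<xx><yyy>'.scan(/<([^<>]+)>/).flatten
p separate  # prints ["xx", "yyy"]
```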

> I have run it in irb:
> /(<([^<>]+)>)+/.match('anystring<hour>anystring<min>anystring<sec>anystring')
> As you can see, the match method returns just the first match in a MatchData object.
>
> Do you know why?

That's the difference between #match and #scan. You want scan in your code.

irb(main):001:0> str = 'anystring<hour>anystring<min>anystring<sec>anystring'
=> "anystring<hour>anystring<min>anystring<sec>anystring"
irb(main):002:0> str.scan /<([^>]+)>/
=> [["hour"], ["min"], ["sec"]]
irb(main):003:0> str.scan /<([^>]+)>/ do |m| p m end
["hour"]
["min"]
["sec"]
=> "anystring<hour>anystring<min>anystring<sec>anystring"
irb(main):004:0> str.scan /<([^>]+)>/ do |m,| p m end
"hour"
"min"
"sec"
=> "anystring<hour>anystring<min>anystring<sec>anystring"
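With one capture group, scan returns an array of one-element arrays rather than the flat ['hour', 'min', 'sec'] asked for in the question (the `|m,|` block above unpacks each one). A minimal sketch of two ways to get the flat array: flatten the result, or (an alternative not used elsewhere in this thread) replace the capture group with lookbehind/lookahead assertions, so that scan returns the matched strings directly. Lookbehind assumes Ruby 1.9 or later.

```ruby
str = 'anystring<hour>anystring<min>anystring<sec>anystring'

# With one capture group, scan returns one single-element array per match.
nested = str.scan(/<([^<>]+)>/)
p nested          # prints [["hour"], ["min"], ["sec"]]

# Option 1: flatten the nested result.
p nested.flatten  # prints ["hour", "min", "sec"]

# Option 2: use lookarounds instead of a capture group; with no groups,
# scan returns the matched text itself.
flat = str.scan(/(?<=<)[^<>]+(?=>)/)
p flat            # prints ["hour", "min", "sec"]
```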

Kind regards

robert
 

Chris Shea

> I have to capture, by means of a regexp, the content between '<' and '>'.
>
> For instance:
>
> str = 'anystring<hour>anystring<min>anystring<sec>anystring'
> I need the array ['hour', 'min', 'sec'].
>
> I have written the regexp: /(<([^<>]+)>)+/
> and I have tested it on the rubular.com site (it works!)
>
> I have run it in irb:
> >> /(<([^<>]+)>)+/.match('anystring<hour>anystring<min>anystring<sec>anystring')
> => #<MatchData "<hour>" 1:"<hour>" 2:"hour">
>
> As you can see, the match method returns just the first match in a MatchData object.
>
> Do you know why?
>
> Thank you,
> Alessandro

Alessandro,

You'll want the String#scan method (http://www.ruby-doc.org/core/classes/String.html#M000812).

015:0> regexp = /<([^<>]+)>/
=> /<([^<>]+)>/
016:0> str = 'anystring<hour>anystring<min>anystring<sec>anystring'
=> "anystring<hour>anystring<min>anystring<sec>anystring"
017:0> str.scan(regexp)
=> [["hour"], ["min"], ["sec"]]
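As for why match alone stops at the first hit: it performs one search and returns one MatchData. Rubular, by contrast, appears to highlight every match in the test string, which is closer to what scan does. The sketch below uses a hypothetical helper, all_matches (not a Ruby method), that calls String#match in a loop, restarting each search where the previous match ended, to collect one MatchData per match. In practice scan is the simpler choice; this only illustrates how the two relate.

```ruby
# Hypothetical helper (not part of Ruby's API): returns a MatchData for
# every non-overlapping match by calling String#match repeatedly, each
# time starting the search where the previous match ended.
def all_matches(str, regexp)
  matches = []
  pos = 0
  while (m = str.match(regexp, pos))
    matches << m
    # Resume after this match; step one character past an empty match
    # so the loop cannot get stuck at the same position.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  matches
end

str = 'anystring<hour>anystring<min>anystring<sec>anystring'
found = all_matches(str, /<([^<>]+)>/)
found.each { |md| puts "#{md[0]} -> #{md[1]}" }  # "<hour> -> hour", etc.
p found.map { |md| md[1] }                        # prints ["hour", "min", "sec"]
```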

HTH,
Chris
 

Ale Ds

> The "+" at the end is superfluous because this would match multiple
> concatenated sequences like <xx><yyy>, which you want as separate
> items.
> ...

I agree with you.

>> I have run it in irb:
>> /(<([^<>]+)>)+/.match('anystring<hour>anystring<min>anystring<sec>anystring')
>> As you can see, the match method returns just the first match in a MatchData object.
>>
>> Do you know why?
>
> That's the difference between #match and #scan. You want scan in your
> code.
> ...

Yes, scan works!
Thanks a lot,
Alessandro