Getting a list of results from one regular expression

tietyt

Hello, I'm new to Ruby. I've read most of the Pragmatic Programmer
guide but couldn't find anything that explained how to do this.

To summarize my whole question: how do I get EVERY match of a regular
expression (instead of just the first)?

Here's my situation: I've got this long string that contains XML. I
would like to parse it. Specifically, I want to search this string for
all instances of a pattern like /stringAlias="(.*)"/

I'm no pro with regex, but I think that will find a match for a string
that looks like this: stringAlias="BLAH"

And because of the (.*), the result will be BLAH

Now this is all fine and good. But what I can't figure out is how to
get every match in an array (instead of just the first match).

If I have stringAlias="BLAH" ... stringAlias="BLEH", how do I get an
array that is ["BLAH", "BLEH"]?

Keep in mind that there are a dynamic number of matches for
stringAlias="(.*)"


This is the code I wrote to try to do it:

def ...
  @aliases = []
  matchedData = /stringAlias="(.*?)"/.match(@data)
  @aliases = matchedData.to_a
  puts @aliases
end

The length of the array is 2 and the result is this:
stringAlias="OP"
OP

Even though the data is this:
<string RSLDefined="false" active="false" languageId="1"
sortOrder="0" stringAlias="OP">
<stringValue><![CDATA[Open or Pending]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="1" stringAlias="1">
<stringValue><![CDATA[Open]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="2" stringAlias="2">
<stringValue><![CDATA[Pend]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="3" stringAlias="3">
<stringValue><![CDATA[Decline]]></stringValue>
</string>
<string RSLDefined="false" active="true" languageId="1"
sortOrder="4" stringAlias="4">
<stringValue><![CDATA[Complete]]></stringValue>
</string>
 
Devin Mullins

To summarize my whole question: how do I get EVERY match of a regular
expression (instead of just the first)?
String#scan

I'm sure there are other ways, though. I just learned about String#scan
today. (Yes, Dave, my copy of the Pickaxe is on its way.)

Devin
 
C Erler

I usually use String#scan.

"testwoohootestkaboomtestyutyut".scan(/test../)
=> ["testwo", "testka", "testyu"]
 
Mark Hubbart

Regexp#match only gives the first match; the MatchData object behaves
somewhat like an array holding the entire match, followed by the
subexpression matches. What you want is String#scan: (warning, untested)

regexp = /stringAlias="(.*?)"/
matches = @data.scan(regexp)

Since the regexp contains a subexpression (a capture group), scan puts
the contents of that group, not the whole match, into the array
"matches". You'll get an array something like this:

[["OP"],["1"],["2"], ... ]

(each match has its own subarray, since it's a subexpression match)

Check out the docs for String#scan for more info...
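
To make the difference concrete, here is a minimal sketch comparing Regexp#match with String#scan. The `data` string is a shortened stand-in for the XML in the original post, and the local variable names are only for illustration. Calling flatten on the result is one way to get the plain ["OP", "1", "2"] style array that was asked for.

```ruby
# Sample data shortened from the original post (illustrative only).
data = <<XML
<string active="false" sortOrder="0" stringAlias="OP">
<string active="true" sortOrder="1" stringAlias="1">
<string active="true" sortOrder="2" stringAlias="2">
XML

regexp = /stringAlias="(.*?)"/

# Regexp#match stops at the first match: [whole match, group 1].
first = regexp.match(data).to_a

# String#scan returns one sub-array of captured groups per match.
nested = data.scan(regexp)

# flatten collapses the one-element sub-arrays into a plain list.
aliases = nested.flatten

p first    # => ["stringAlias=\"OP\"", "OP"]
p nested   # => [["OP"], ["1"], ["2"]]
p aliases  # => ["OP", "1", "2"]
```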

cheers,
Mark
 
Gavin Kistner

To summarize my whole question: how do I get EVERY match of a regular
expression (instead of just the first)?

In addition to the correct response given by others (String#scan),
you might also want to look at the StringScanner class. It gives you
the ability to crawl through a string with successive regexp calls,
where each new call starts at the new 'current' position.

story = <<ENDSTORY
Hello World! There are 3 cats in my house, with 4 feet each.

6 of those 12 feet have 5 claws each; the other 6 feet have 4 claws
each.

Ow, my back. 54 claws need clipping.
ENDSTORY

require 'strscan'
scanner = StringScanner.new( story )

info = []
count_nouns = /(\d+) (\w+)/

until scanner.eos?
  break unless scanner.scan_until( count_nouns )
  tidbit = {
    :full_match => scanner[0],
    :count => scanner[1].to_i,
    :noun => scanner[2]
  }
  info << tidbit
end

require 'pp'
pp info
info.each{ |tidbit|
  puts "Of %7s, I saw %02d" % [ tidbit[:noun], tidbit[:count] ]
}



[{:noun=>"cats", :count=>3, :full_match=>"3 cats"},
{:noun=>"feet", :count=>4, :full_match=>"4 feet"},
{:noun=>"of", :count=>6, :full_match=>"6 of"},
{:noun=>"feet", :count=>12, :full_match=>"12 feet"},
{:noun=>"claws", :count=>5, :full_match=>"5 claws"},
{:noun=>"feet", :count=>6, :full_match=>"6 feet"},
{:noun=>"claws", :count=>4, :full_match=>"4 claws"},
{:noun=>"claws", :count=>54, :full_match=>"54 claws"}]
Of    cats, I saw 03
Of    feet, I saw 04
Of      of, I saw 06
Of    feet, I saw 12
Of   claws, I saw 05
Of    feet, I saw 06
Of   claws, I saw 04
Of   claws, I saw 54
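
For comparison, when you only need the captured groups and not the scanner's position, String#scan can collect similar data in one call. This is a rough sketch rather than a replacement for the StringScanner approach; the shortened `story` string and the `pairs` variable are assumptions made for the example.

```ruby
# Shortened version of the story text (illustrative only).
story = "There are 3 cats in my house, with 4 feet each. 54 claws need clipping."

# With two capture groups, scan returns one [count, noun] pair per match.
pairs = story.scan(/(\d+) (\w+)/)

# Build the same kind of hash as the StringScanner loop, minus :full_match.
info = pairs.map do |count, noun|
  { :count => count.to_i, :noun => noun }
end

info.each do |tidbit|
  puts "Of %7s, I saw %02d" % [ tidbit[:noun], tidbit[:count] ]
end
```

StringScanner is still the better fit when you want to switch patterns partway through the string or inspect the current position between matches.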
 
Pit Capitain

Here's my situation, I've got this long string that contains XML. I
would like to parse it. Specifically, I want to search this string for
all instances of a pattern like /stringAlias="(.*)"/

One additional remark: if the input can contain multiple stringAlias
expressions on one line, the pattern should be /stringAlias="(.*?)"/
(note the question mark). You can see the difference if you match a
string like

str = "stringAlias=\"one\" bla stringAlias=\"two\""

p str.scan( /stringAlias="(.*)"/ )
# => [["one\" bla stringAlias=\"two"]]

p str.scan( /stringAlias="(.*?)"/ )
# => [["one"], ["two"]]
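
A common alternative to the non-greedy (.*?) is a negated character class, which says directly that the value may contain anything except a double quote. This is offered only as a sketch of that variant; it assumes attribute values never contain a literal double quote, and the variable names are illustrative.

```ruby
str = "stringAlias=\"one\" bla stringAlias=\"two\""

# [^"]* matches any run of characters that are not a double quote,
# so each match ends at the first closing quote.
aliases = str.scan( /stringAlias="([^"]*)"/ ).flatten

p aliases
# => ["one", "two"]
```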

Regards,
Pit
 
tietyt

First of all, thanks for all that super fast help. I've never asked a
technical question anywhere before and got such a fast response.

Specifically to Pit Capitain:
Thanks for that tip. I just googled that and learned what the .*?
does.

 
