Getting a list of results from one regular expression

Discussion in 'Ruby' started by tietyt@gmail.com, Jun 23, 2005.

  1. Guest

    Hello, I'm new to Ruby. I've read most of the Pragmatic Programmer's
    guide but couldn't find anything that explained how to do this.

    To summarize my whole question: how do I get EVERY match of a regular
    expression (instead of just the first)?

    Here's my situation: I've got this long string that contains XML, and I
    would like to parse it. Specifically, I want to search this string for
    all instances of a pattern like /stringAlias="(.*)"/

    I'm no pro with regex, but I think that will find a match for a string
    that looks like this: stringAlias="BLAH"

    And because of the (.*), the result will be BLAH

    Now this is all fine and good. But what I can't figure out is how to
    get every match in an array (instead of just the first match).

    If I have stringAlias="BLAH" ... stringAlias="BLEH", how do I get an
    array that is ["BLAH", "BLEH"]?

    Keep in mind that the number of matches for stringAlias="(.*)"
    varies from one input to the next.


    This is the code I wrote to try to do it:

    def ...
    @aliases = []
    matchedData = /stringAlias="(.*?)"/.match(@data)
    @aliases = matchedData.to_a
    puts @aliases
    end

    The length of the array is 2 and the result is this:
    stringAlias="OP"
    OP

    Even though the data is this:
    <string RSLDefined="false" active="false" languageId="1"
    sortOrder="0" stringAlias="OP">
    <stringValue><![CDATA[Open or Pending]]></stringValue>
    </string>
    <string RSLDefined="false" active="true" languageId="1"
    sortOrder="1" stringAlias="1">
    <stringValue><![CDATA[Open]]></stringValue>
    </string>
    <string RSLDefined="false" active="true" languageId="1"
    sortOrder="2" stringAlias="2">
    <stringValue><![CDATA[Pend]]></stringValue>
    </string>
    <string RSLDefined="false" active="true" languageId="1"
    sortOrder="3" stringAlias="3">
    <stringValue><![CDATA[Decline]]></stringValue>
    </string>
    <string RSLDefined="false" active="true" languageId="1"
    sortOrder="4" stringAlias="4">
    <stringValue><![CDATA[Complete]]></stringValue>
    </string>
    , Jun 23, 2005
    #1

  2. wrote:

    >To summarize my whole question: how do I get EVERY match of a regular
    >expression (instead of just the first)?

    String#scan

    I'm sure there are other ways, though. I just learned about String#scan
    today. (Yes, Dave, my copy of the Pickaxe is on its way.)
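Applied to the XML from the question, String#scan with the non-greedy pattern returns one single-element array per match, and Array#flatten turns that into the plain list the question asks for. A minimal sketch, assuming a shortened, hypothetical copy of the sample data and Ruby 2.3 or later (for the `<<~` heredoc):

```ruby
# Hypothetical, shortened copy of the XML from the question.
data = <<~XML
  <string sortOrder="0" stringAlias="OP"><stringValue><![CDATA[Open or Pending]]></stringValue></string>
  <string sortOrder="1" stringAlias="1"><stringValue><![CDATA[Open]]></stringValue></string>
  <string sortOrder="2" stringAlias="2"><stringValue><![CDATA[Pend]]></stringValue></string>
XML

# scan returns every match; with one capture group each element is a
# one-element array ([["OP"], ["1"], ["2"]]), so flatten yields plain strings.
aliases = data.scan(/stringAlias="(.*?)"/).flatten
puts aliases.inspect   # prints ["OP", "1", "2"]
```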

    Devin
    Devin Mullins, Jun 23, 2005
    #2

  3. C Erler Guest

    I usually use String#scan.

    "testwoohootestkaboomtestyutyut".scan(/test../)
    # => ["testwo", "testka", "testyu"]
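The example above has no capture group, so scan returns the matched text itself. A short sketch (reusing the same string; the variable names are illustrative) of how the result changes once a group is added, and of the block form, where Regexp.last_match exposes the full MatchData for each match:

```ruby
text = "testwoohootestkaboomtestyutyut"

# Without groups, scan returns the whole matched text for each match.
whole = text.scan(/test../)       # ["testwo", "testka", "testyu"]

# With a group, each element is an array of the captured groups instead.
groups = text.scan(/test(..)/)    # [["wo"], ["ka"], ["yu"]]

# In block form, scan yields once per match and sets Regexp.last_match ($~)
# to that match's MatchData, which also carries offsets.
positions = []
text.scan(/test(..)/) { positions << Regexp.last_match.begin(0) }
p positions                       # prints [0, 10, 20]
```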

    C Erler, Jun 23, 2005
    #3
  4. Mark Hubbart Guest

    On 6/22/05, <> wrote:
    > To summarize my whole question: how do I get EVERY match of a regular
    > expression (instead of just the first)?


    Regexp#match only gives the first match; the MatchData object is sort
    of an array of the entire match, followed by the subexpression
    matches. What you want is String#scan (warning, untested):

    regexp = /stringAlias="(.*?)"/
    matches = @data.scan(regexp)

    Since the regexp has a subexpression matcher, that is what will be put
    into the array "matches". You'll get an array something like this:

    [["OP"],["1"],["2"], ... ]

    (each match gets its own subarray, since the regexp contains a subexpression)

    Check out the docs for String#scan for more info...
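The difference described above can be checked directly. The sketch below uses a made-up one-line string in place of @data, and also shows one way to visit every match as a full MatchData object by passing a start position to Regexp#match (the optional position argument needs Ruby 1.9 or later):

```ruby
# Made-up stand-in for @data from the question.
data = 'a stringAlias="OP" b stringAlias="1" c stringAlias="2"'
regexp = /stringAlias="(.*?)"/

# Regexp#match stops at the first match; to_a is [whole match, group 1].
first = regexp.match(data).to_a
p first                     # prints ["stringAlias=\"OP\"", "OP"]

# String#scan collects every match, one subarray per match.
nested = data.scan(regexp)
p nested                    # prints [["OP"], ["1"], ["2"]]

# Alternative: restart Regexp#match at the end of the previous match.
# This pattern can never match an empty string, so pos always advances.
all = []
pos = 0
while (m = regexp.match(data, pos))
  all << m[1]
  pos = m.end(0)
end
p all                       # prints ["OP", "1", "2"]
```

The loop is only worth it when each match's offsets or other groups are needed; for a plain list of captures, scan is shorter.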

    cheers,
    Mark
    Mark Hubbart, Jun 23, 2005
    #4
  5. On Jun 22, 2005, at 8:30 PM, wrote:
    > To summarize my whole question: how do I get EVERY match of a regular
    > expression (instead of just the first)?


    In addition to the correct response given by others (String#scan),
    you might also want to look at the StringScanner class. It gives you
    the ability to crawl through a string with successive regexp calls,
    where each new call starts at the new 'current' position.

    story = <<ENDSTORY
    Hello World! There are 3 cats in my house, with 4 feet each.

    6 of those 12 feet have 5 claws each; the other 6 feet have 4 claws
    each.

    Ow, my back. 54 claws need clipping.
    ENDSTORY

    require 'strscan'
    scanner = StringScanner.new( story )

    info = []
    count_nouns = /(\d+) (\w+)/

    until scanner.eos?
      break unless scanner.scan_until( count_nouns )
      tidbit = {
        :full_match => scanner[0],
        :count      => scanner[1].to_i,
        :noun       => scanner[2]
      }
      info << tidbit
    end

    require 'pp'
    pp info
    info.each{ |tidbit|
      puts "Of %7s, I saw %02d" % [ tidbit[:noun], tidbit[:count] ]
    }

    Output:

    [{:noun=>"cats", :count=>3, :full_match=>"3 cats"},
    {:noun=>"feet", :count=>4, :full_match=>"4 feet"},
    {:noun=>"of", :count=>6, :full_match=>"6 of"},
    {:noun=>"feet", :count=>12, :full_match=>"12 feet"},
    {:noun=>"claws", :count=>5, :full_match=>"5 claws"},
    {:noun=>"feet", :count=>6, :full_match=>"6 feet"},
    {:noun=>"claws", :count=>4, :full_match=>"4 claws"},
    {:noun=>"claws", :count=>54, :full_match=>"54 claws"}]
    Of    cats, I saw 03
    Of    feet, I saw 04
    Of      of, I saw 06
    Of    feet, I saw 12
    Of   claws, I saw 05
    Of    feet, I saw 06
    Of   claws, I saw 04
    Of   claws, I saw 54
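The same StringScanner loop can pull the aliases out of the question's XML. A sketch, assuming a shortened, hypothetical sample of that data; scanner[1] reads capture group 1 of the most recent match:

```ruby
require 'strscan'

# Hypothetical, shortened sample of the question's XML.
data = '<string sortOrder="0" stringAlias="OP"> <string sortOrder="1" stringAlias="1">'
scanner = StringScanner.new(data)
aliases = []

# scan_until moves the scan pointer past the next match and returns nil
# once no further match exists, which ends the loop.
while scanner.scan_until(/stringAlias="(.*?)"/)
  aliases << scanner[1]   # capture group 1 of the latest match
end

p aliases   # prints ["OP", "1"]
```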
    Gavin Kistner, Jun 23, 2005
    #5
  6. Pit Capitain Guest

    wrote:
    > Here's my situation, I've got this long string that contains XML. I
    > would like to parse it. Specifically, I want to search this string for
    > all instances of a pattern like /stringAlias="(.*)"/


    One additional remark: if the input can contain multiple stringAlias
    expressions on one line, the pattern should be /stringAlias="(.*?)"/
    (note the question mark). You can see the difference if you match a
    string like

    str = "stringAlias=\"one\" bla stringAlias=\"two\""

    p str.scan( /stringAlias="(.*)"/ )
    # => [["one\" bla stringAlias=\"two"]]

    p str.scan( /stringAlias="(.*?)"/ )
    # => [["one"], ["two"]]
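Another common way to keep the capture inside the quotes is a negated character class instead of a non-greedy quantifier: [^"]* cannot run past a closing quote. A sketch on the same string (one difference to be aware of: unlike ., [^"] also matches newlines, so a quoted value spanning lines would still be captured):

```ruby
str = "stringAlias=\"one\" bla stringAlias=\"two\""

# [^"]* stops at the first closing quote on its own; no lazy ? is needed.
result = str.scan(/stringAlias="([^"]*)"/)
p result   # prints [["one"], ["two"]]
```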

    Regards,
    Pit
    Pit Capitain, Jun 23, 2005
    #6
  7. Guest

    First of all, thanks for all that super fast help. I've never asked a
    technical question anywhere and gotten such a fast response.

    Specifically to Pit Capitain:
    Thanks for that tip. I just googled it and learned what .*?
    does.

    , Jun 23, 2005
    #7