ruby global regex question.

Discussion in 'Ruby' started by knohr, Nov 19, 2008.

  1. knohr

    knohr Guest

    For the life of me, I can't figure out a Ruby equivalent to Perl's /g.

    Basically, I want to do the following:


    while htmlSource=~m/<table>(.*?)<\/table>/g do
      tableSource=$1
      tableSource=~m/Index (\d+)/
      indexNumber=$1

      while tableSource=~m/<tr>(.*?)<\/tr>/g do
        tableRowSource=$1
        doSomethingWith(tableRowSource, indexNumber)
      end#while tableSource

    end#while htmlSource


    I will actually need to pull multiple vars, not just a single one,
    from the regex.
    I will need to run the outer loop an unknown number of times per
    document (0-20), and the inner loop an unknown number of
    times (0-29).


    Thread safe would be a plus.


    Any suggestions?
     
    knohr, Nov 19, 2008
    #1

  2. Alan Johnson

    Alan Johnson Guest



    I think this does what you want, although I don't think gsub was really made
    for this purpose.

    def doSomethingWith(s)
      print s, "\n"
    end

    htmlSource = '<table><tr>1,1</tr><tr>1,2</tr></table>'
    htmlSource << '<table><tr>2,1</tr><tr>1,2</tr></table>'

    # gsub with a block visits every match; the replacement result is ignored here.
    htmlSource.gsub(/<table>(.*?)<\/table>/) do |t|
      tableRowSource = $1
      tableRowSource.gsub(/<tr>(.*?)<\/tr>/) do |r|
        doSomethingWith $1
      end
    end

    --
    Alan
     
    Alan Johnson, Nov 19, 2008
    #2

  3. Peter Szinek

    Peter Szinek Guest





    While I can't answer your original question, I could possibly help you
    with the scraping if you are willing to reveal the page you are trying
    to scrape and the data bits on it which should be scraped.

    Cheers,
    Peter
    ___
    http://www.rubyrailways.com
    http://scrubyt.org
     
    Peter Szinek, Nov 19, 2008
    #3
  4. Mark Thomas

    Mark Thomas Guest

    On Nov 18, 7:08 pm, knohr <> wrote:
    > Thread safe would be a plus.


    Would fast be a plus? No nested loop?

    require 'nokogiri'
    doc = Nokogiri::HTML(htmlSource)
    doc.search('//tr').each do |row|
      # Look in the enclosing table for a child element whose text contains "Index".
      index = row.xpath('ancestor::table/*[contains(., "Index")]')
      doSomethingWith(row.text, index.text[/Index (\d+)/, 1])
    end

    The location of the element containing the index may have to be
    modified.

    -- Mark.
     
    Mark Thomas, Nov 19, 2008
    #4
  5. On 19.11.2008, at 00:37 , Alan Johnson wrote:

    > I think this does what you want, although I don't think gsub was
    > really made
    > for this purpose.
    >
    > def doSomethingWith(s)
    > print s, "\n"
    > end
    >
    > htmlSource = '<table><tr>1,1</tr><tr>1,2</tr></table>'
    > htmlSource << '<table><tr>2,1</tr><tr>1,2</tr></table>'
    >
    > htmlSource.gsub(/<table>(.*?)<\/table>/) do |t|
    > tableRowSource = $1
    > tableRowSource.gsub(/<tr>(.*?)<\/tr>/) do |r|
    > doSomethingWith $1
    > end
    > end
    >
    > --
    > Alan




    That is pretty much how to do it, except that globals are hardly
    thread safe, I think. Use scan instead of gsub.
    Here's something I wrote to extract information from data structured
    like this:

    - tablename
    + field1
    + field2:string

    - table2name
    +field1 : string
    +field2

    Table = Struct.new(:name, :fields)
    Field = Struct.new(:name, :type)

    def extract_db_spec(file)
      tables = []
      doc = File.read(file)
      table_name = /\- (\w*)\s*?\n/
      field_name = /(\s+\+ (\w+)\s*(\:\s*(\w*))?\n)/
      # Outer scan: one match per table, capturing its name and its block of field lines.
      doc.scan(/#{table_name}(#{field_name}+)/) do |tablename, fields|
        t = Table.new tablename, []
        # Inner scan: one match per field line.
        fields.scan(field_name) do |junk, fieldname, junk2, type|
          if type.nil? || type == ""
            # No explicit type: names like foo_id default to int, the rest to string.
            if /\w+_id/ === fieldname
              type = "int"
            else
              type = "string"
            end
          end

          t.fields << Field.new(fieldname, type)
        end
        tables << t
      end
      tables
    end


    einarmagnus
     
    Einar Magnús Boson, Nov 19, 2008
    #5
  6. On 19.11.2008 07:08, Einar Magnús Boson wrote:

    >
    > That is pretty much how, except globals are hardly thread safe I
    > think.


    $1 and the like are thread local:

    robert@fussel ~
    $ ruby -e '2.times{|i|Thread.new(i){|ii|4.times{/(\d+)/=~ii.to_s;puts $1;sleep 1}}};sleep 5'
    0
    1
    1
    0
    1
    0
    1
    0

    robert@fussel ~
    $
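
The same check can be written as a small script that asserts on the result instead of eyeballing interleaved output. This is only a sketch: `captures_per_thread`, the thread count and the sleep interval are made up for illustration, and it relies on Ruby's documented behaviour that `$~` and the variables derived from it (such as `$1`) are local to the current thread and method.

```ruby
# Each thread matches its own number and records what $1 held afterwards.
# If $1 were shared between threads, some entries would show another
# thread's digits. (Helper name and parameters are hypothetical.)
def captures_per_thread(thread_count, rounds)
  threads = Array.new(thread_count) do |i|
    Thread.new(i) do |n|
      seen = []
      rounds.times do
        /(\d+)/ =~ n.to_s
        sleep 0.01 # give the other threads a chance to run their own match
        seen << $1
      end
      seen
    end
  end
  threads.map(&:value) # Thread#value joins the thread and returns the block result
end

p captures_per_thread(3, 4)
```

Each inner array should contain only the digits of the thread that produced it.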

    > Use scan instead of gsub:


    Right, as far as I can see no replacements should be done. Just read
    only access.

    html_source.scan %r{<table>(.*?)</table>}i do
      table_source = $1
      index_number = table_source[%r{Index\s+(\d+)}, 1].to_i

      table_source.scan %r{<tr>(.*?)</tr>}i do
        do_something_with $1, index_number
      end
    end
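
Since the original question mentions pulling several values out of each match, here is a minimal self-contained sketch of the same scan approach with more than one capture group. The sample markup, the `rows_by_index` helper and the two-cell row layout are assumptions made up for illustration; none of them come from the thread.

```ruby
# Hypothetical sample input: two tables, each announcing its index.
HTML_SOURCE = <<~HTML
  <table>Index 7<tr><td>a</td><td>1</td></tr><tr><td>b</td><td>2</td></tr></table>
  <table>Index 12<tr><td>c</td><td>3</td></tr></table>
HTML

# Collects [index, name, value] triples (illustrative helper, not a library API).
def rows_by_index(html)
  results = []
  html.scan(%r{<table>(.*?)</table>}im) do
    # Read the capture right away: later matches in this method overwrite $~.
    table_source = Regexp.last_match(1)
    index_number = table_source[/Index\s+(\d+)/, 1].to_i

    # When the pattern has several groups, scan passes them as block arguments.
    table_source.scan(%r{<tr><td>(.*?)</td><td>(.*?)</td></tr>}im) do |name, value|
      results << [index_number, name, value]
    end
  end
  results
end

rows_by_index(HTML_SOURCE).each { |row| p row }
```

Taking the groups as block arguments avoids `$1`, `$2` and so on in the inner loop, which keeps the code readable once there are more than a couple of captures.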

    But a proper HTML parser is probably much better. :)

    Kind regards

    robert
     
    Robert Klemme, Nov 19, 2008
    #6
  7. I use this as an equivalent to global match:

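    class Regexp
      def global_match(str, &proc)
        retval = nil
        loop do
          # Remove the first remaining match, handing its MatchData to the block.
          res = str.sub(self) do |m|
            proc.call($~) # pass MatchData obj
            ''
          end
          break retval if res == str # stop once sub no longer changes the string
          str = res
          retval ||= true
        end
      end
    end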

    re = /.../
    re.global_match(...) do |m|
    ...
    end
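
An alternative that leaves the string untouched is to restart the search after each match with `Regexp#match(str, pos)`. The sketch below is not from the thread: `each_match` is a hypothetical helper name, and the sample pattern and input are made up.

```ruby
# Yields a MatchData for every match, left to right, without modifying
# the string. (Hypothetical helper; not part of Ruby's standard library.)
def each_match(re, str)
  pos = 0
  while pos <= str.length && (m = re.match(str, pos))
    yield m
    # Continue after the match; step one extra character after an empty
    # match so the loop cannot stall.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
end

each_match(/(\w+)=(\d+)/, "a=1, bb=22, c=3") do |m|
  puts "#{m[1]} -> #{m[2]}"
end
```

Because the string is never modified, anchors and lookbehind still see the original text, which may matter for patterns that depend on what precedes a match.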

     
    Gustavo Carvalho, Nov 19, 2008
    #7
