Ruby global regex question

knohr

For the life of me, I can't figure out a Ruby equivalent to Perl's /g.

Basically, I want to do the following:


while htmlSource=~m/<table>(.*?)<\/table>/g do
   tableSource=$1
   tableSource=~m/Index (\d+)/
   indexNumber=$1

   while tableSource=~m/<tr>(.*?)<\/tr>/g do
      tableRowSource=$1
      doSomethingWith(tableRowSource, indexNumber)
   end#while tableSource

end#while htmlSource


I will actually need to pull multiple variables, not just a single one,
from the regex.
I will need to run the outer loop an unknown number of times per
document (0-20) and the inner loop an unknown number of times (0-29).


Thread safe would be a plus.


Any suggestions?
 
Alan Johnson

I think this does what you want, although I don't think gsub was really made
for this purpose.

def doSomethingWith(s)
  print s, "\n"
end

htmlSource = '<table><tr>1,1</tr><tr>1,2</tr></table>'
htmlSource << '<table><tr>2,1</tr><tr>1,2</tr></table>'

htmlSource.gsub(/<table>(.*?)<\/table>/) do |t|
  tableRowSource = $1
  tableRowSource.gsub(/<tr>(.*?)<\/tr>/) do |r|
    doSomethingWith $1
  end
end
 
Peter Szinek

While I can't answer your original question, I could possibly help you
with the scraping if you are willing to reveal the page you are trying
to scrape and the data bits on it which should be scraped.

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
 
Mark Thomas

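Would fast be a plus? No nested loop?

require 'nokogiri'
doc = Nokogiri::HTML(htmlSource)
doc.search('//tr').each do |row|
  # XPath's contains() takes the string to search first, then the substring.
  index = row.xpath('ancestor::table/*[contains(., "Index")]')
  # Pull the digits that follow "Index" out of the matched element's text.
  doSomethingWith(row.text, index.text[/Index\s+(\d+)/, 1])
end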

The location of the element containing the index may have to be
modified.

-- Mark.
 
Einar Magnús Boson

The gsub approach above is pretty much how to do it, except that I
don't think globals like $1 are thread safe. Use scan instead of gsub.
Here's something I wrote to extract information from data structured
like this:

- tablename
+ field1
+ field2:string

- table2name
+field1 : string
+field2

Table = Struct.new(:name, :fields)
Field = Struct.new(:name, :type)

def extract_db_spec(file)
  tables = []
  doc = File.read(file)
  table_name = /\- (\w*)\s*?\n/
  field_name = /(\s+\+ (\w+)\s*(\:\s*(\w*))?\n)/
  doc.scan(/#{table_name}(#{field_name}+)/) do |tablename, fields|
    t = Table.new(tablename, [])
    fields.scan(field_name) do |_, fieldname, _, type|
      # Default the type: names containing _id are ints, the rest strings.
      if type.nil? || type == ""
        type = /\w+_id/ === fieldname ? "int" : "string"
      end
      t.fields << Field.new(fieldname, type)
    end
    tables << t
  end
  tables
end


einarmagnus
 
Robert Klemme

Einar wrote: "That is pretty much how, except globals are hardly thread
safe I think."

$1 and the like are thread safe, since they are local to the current
thread and method:

robert@fussel ~
$ ruby -e '2.times{|i|Thread.new(i){|ii|4.times{/(\d+)/=~ii.to_s;puts $1;sleep 1}}};sleep 5'
0
1
1
0
1
0
1
0

robert@fussel ~
$
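
The one-liner above can be unpacked into a small script that is easier to read. This is only a sketch of the same check, not a proof of thread safety in general: the variable name results and the short sleep between the match and the read of $1 are my own additions, chosen so the other thread gets a chance to run its own match in between.

```ruby
# Each thread matches its own number, waits, then records what $1 holds.
results = 2.times.map do |i|
  Thread.new(i) do |id|
    4.times.map do
      /(\d+)/ =~ id.to_s
      sleep 0.01 # give the other thread time to run its own match
      $1         # read the capture only after the pause
    end
  end
end.map(&:value)

p results
```

If $1 were shared between threads, the two inner arrays would be expected to contain a mix of both numbers.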
Einar also wrote: "Use scan instead of gsub."

Right, as far as I can see no replacements need to be done, just
read-only access.

html_source.scan %r{<table>(.*?)</table>}i do
  table_source = $1
  index_number = table_source[%r{Index\s+(\d+)}, 1].to_i

  table_source.scan %r{<tr>(.*?)</tr>}i do
    do_something_with $1, index_number
  end
end

But a proper HTML parser is probably much better. :)
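
For the "multiple vars" part of the question, scan passes every capture group to the block as a parameter, so $1 and friends are not needed at all. A minimal sketch, assuming a made-up sample document and a hypothetical rows_by_index helper (neither comes from the posts above; the real page layout was never shown):

```ruby
# Hypothetical sample input, invented for illustration.
HTML = <<~DOC
  <table><caption>Index 7</caption><tr>a|1</tr><tr>b|2</tr></table>
  <table><caption>Index 12</caption><tr>c|3</tr></table>
DOC

# Collect [index, name, value] triples, one per table row.
def rows_by_index(html)
  rows = []
  # The m flag lets . match newlines, so multi-line tables still match.
  html.scan(%r{<table>(.*?)</table>}im) do |(table_source)|
    index = table_source[/Index\s+(\d+)/, 1].to_i
    # Two groups in the pattern give two block parameters.
    table_source.scan(%r{<tr>(\w+)\|(\d+)</tr>}i) do |name, value|
      rows << [index, name, value.to_i]
    end
  end
  rows
end

p rows_by_index(HTML)
```

With a single capture group the block receives a one-element array, which is why the outer block destructures it with |(table_source)|.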

Kind regards

robert
 
Gustavo Carvalho

I use this as an equivalent to global match:

class Regexp
  def global_match(str, &proc)
    retval = nil
    loop do
      res = str.sub(self) do |m|
        proc.call($~) # pass MatchData obj
        ''
      end
      break retval if res == str
      str = res
      retval ||= true
    end
  end
end

re = /.../
re.global_match(...) do |m|
  ...
end
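
A variation on the same idea, sketched here as an assumption rather than as anyone's recommended method: Regexp#match accepts a start position, so the loop can walk through the string the way Perl's /g advances pos, without deleting each match from a copy of the string. The helper name each_match is hypothetical.

```ruby
# Hypothetical helper: yield a MatchData for every non-overlapping match,
# leaving the input string untouched.
def each_match(str, re)
  pos = 0
  while pos <= str.length && (m = re.match(str, pos))
    yield m
    # Step past zero-length matches so the loop always advances.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
end

each_match("a=1, b=22, c=333", /(\w)=(\d+)/) do |m|
  puts "#{m[1]} -> #{m[2]}"
end
```

Because the block gets the MatchData object itself, all groups, offsets, and pre/post match text are available without touching any globals.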
 
