Rick DeNatale
I'm trying to scan an HTML file using Hpricot to produce a table of
links within the file.
Right now I've got something like this:
require 'open-uri'
require 'hpricot'
doc = Hpricot(open(url))
doc.search('a').each do |element|
  puts "#{element.inner_html}"        # link contents, markup included
  puts " #{element.attributes['href']}" # link target
end
This works, but in this document some of the <a> tags have markup
inside their contents. Something like:
<a href="http://blah.org/blah.htm"><b>blah blah</b> blah</a>
I'd like to strip out the markup tags so that I'd get
blah blah blah
http://blah.org/blah.htm
Is there some way to search for or iterate over the leaf elements of
the tree rooted by an element in Hpricot?
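
To make the desired result concrete, here is a minimal sketch that does not use Hpricot at all. It assumes a plain regex is good enough for simple inline markup like the example above, and strip_markup is a hypothetical helper name, not part of any library. It only illustrates the output I'm after, not how Hpricot would walk the leaf nodes.

```ruby
# Hypothetical helper (not an Hpricot method): drops anything that looks
# like a tag, then collapses runs of whitespace. A regex like this is
# only an approximation and will not handle every HTML construct.
def strip_markup(html_fragment)
  html_fragment.gsub(/<[^>]*>/, '').gsub(/\s+/, ' ').strip
end

# The inner HTML of the example link from the post.
fragment = '<b>blah blah</b> blah'
puts strip_markup(fragment)
```

A tree-based answer would still be preferable, since it would cope with nested or malformed markup that a regex can miss.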