HTML to plain text

Colin Summers

Okay, I have played with Hpricot and I am a convert. Amazing stuff.

I am struggling to get up to speed and I can't find what must be a basic
function. I've scraped the FAA site and they store all their stuff
wrapped in td's, wrapped in tr's, wrapped in tables. Thank you
Hpricot.

Now that I have "<b>Manufacturer</b>", isn't there a simple call to get
rid of the last bit of HTML?

Thanks,
--Colin
 
Florian Aßmann

Hi Colin, consult the API docs for Hpricot's inner_text:

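require 'rubygems'
require 'hpricot'
require 'open-uri'
# URI.open fetches the page (on Ruby 3.0+ plain Kernel#open no longer opens URLs)
doc = URI.open('http://www.google.com/ncr') { |io| Hpricot(io) }
# inner_text returns the document's text content with the tags stripped
puts doc.inner_text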
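
If Hpricot isn't at hand, a rough stand-in for inner_text on a small fragment such as "<b>Manufacturer</b>" is to strip the tags and decode a few common entities. This is a minimal sketch using a regular expression, not how Hpricot implements inner_text; it assumes simple markup (no comments, scripts, or ">" inside attribute values), and strip_tags and ENTITIES are hypothetical names.

```ruby
# Hypothetical helper, not part of Hpricot: a rough inner_text stand-in for
# small, simple HTML fragments. A regex is not a real HTML parser.
# Note: &nbsp; is mapped to an ordinary space here.
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
  '&quot;' => '"', '&#39;' => "'", '&nbsp;' => ' '
}.freeze

def strip_tags(html)
  # Drop everything that looks like a tag: "<" up to the next ">"
  text = html.gsub(/<[^>]*>/, '')
  # Decode the entities above in a single pass, so "&amp;lt;" becomes "&lt;"
  text.gsub(Regexp.union(ENTITIES.keys), ENTITIES)
end

puts strip_tags('<b>Manufacturer</b>')                # prints Manufacturer
puts strip_tags('<td><b>Make</b> &amp; Model</td>')   # prints Make & Model
```

Decoding after stripping keeps an escaped "&lt;b&gt;" in the text from being mistaken for a tag.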

Regards
Florian
 
Chris Shea

It looks like you're looking for the inner_text method.

HTH,
Chris
 
Todd Benson

Hi Colin, consult api doc for Hpricot.inner_text:

require 'rubygems'
require 'hpricot'
require 'open-uri'
doc = open( 'http://www.google.com/ncr' ) { |io| Hpricot io }
doc.inner_text
^^^^^^^
This code (above) doesn't work on my system.

The following does:

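require 'rubygems'
require 'hpricot'
html_string = '<b>Manufacturer</b>'
html_data = Hpricot html_string
# "/" searches the parsed document and returns all matching elements
html_element = html_data / "b"
# inner_html gives the markup inside the match; here that is just the text
puts html_element.inner_html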
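
inner_html works for this fragment because the <b> element holds nothing but text. With nested markup the two methods differ: inner_html keeps the child tags, while inner_text returns only the text, which is what Colin asked for. To keep the sketch runnable without Hpricot, it reproduces that distinction with REXML, which ships with Ruby; inner_html_of and inner_text_of are hypothetical helpers, and REXML requires well-formed markup, so this illustrates the idea rather than replacing an HTML parser.

```ruby
require 'rexml/document'

# Hypothetical helper: serialize an element's children, tags included
# (the analogue of Hpricot's inner_html).
def inner_html_of(element)
  element.children.map(&:to_s).join
end

# Hypothetical helper: concatenate all descendant text, tags dropped
# (the analogue of Hpricot's inner_text).
def inner_text_of(element)
  element.children.map do |child|
    case child
    when REXML::Text    then child.value            # unescaped text node
    when REXML::Element then inner_text_of(child)   # recurse into nested tags
    else ''                                         # comments, PIs, etc.
    end
  end.join
end

b = REXML::Document.new('<b><i>Manu</i>facturer</b>').root

puts inner_html_of(b)  # prints <i>Manu</i>facturer
puts inner_text_of(b)  # prints Manufacturer
```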

Todd
 
Todd Benson

Another "jump too soon" moment.

In the above code, I should have named html_element in the plural,
since "/" returns a collection of elements. It still works, but the
grammatically correct version would be:

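require 'rubygems'
require 'hpricot'
html_string = '<b>Manufacturer</b>'
html_data = Hpricot html_string
# "/" returns every matching element (an Hpricot::Elements collection)
html_elements = html_data / "b"
# at returns only the first matching element
first_b_element = html_data.at "b"
# Equivalent: take the first element of the full search result
first_b_element_also = (html_data / "b").first
puts first_b_element.inner_html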
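
The point here is that "/" returns every match while at returns just the first. To keep the illustration runnable without Hpricot, the sketch below shows the same distinction with REXML's XPath.match and XPath.first on a small, made-up table fragment; the markup is an assumption, not the FAA page, and REXML needs well-formed input, whereas Hpricot tolerates sloppy HTML.

```ruby
require 'rexml/document'

# Made-up, well-formed fragment in the td-inside-tr style Colin describes.
doc = REXML::Document.new(
  '<table><tr><td><b>Manufacturer</b></td><td><b>Model</b></td></tr></table>'
)

# Like Hpricot's "/": every matching element, returned as an array
all_bs = REXML::XPath.match(doc, '//b')

# Like Hpricot's "at": only the first matching element (nil if none)
first_b = REXML::XPath.first(doc, '//b')

p all_bs.map(&:text)                # prints ["Manufacturer", "Model"]
puts first_b.text                   # prints Manufacturer
puts first_b.equal?(all_bs.first)   # prints true
```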

Todd
 
