html to plain text

Discussion in 'Ruby' started by Colin Summers, Jun 24, 2007.

  1. Okay, I have played with Hpricot and I am a convert. Amazing stuff.

    I am struggling to get up to speed and I can't find what must be a basic
    function. I've scraped the FAA site and they store all their stuff
    wrapped in td's, wrapped in tr's, wrapped in tables. Thank you
    Hpricot.

    Now that I have "<b>Manufacturer</b>" isn't there a simple call to get
    rid of the last bit of html?

    Thanks,
    --Colin
    Colin Summers, Jun 24, 2007
    #1

  2. Hi Colin, consult the API docs for Hpricot's inner_text:

    require 'rubygems'
    require 'hpricot'
    require 'open-uri'
    doc = open( 'http://www.google.com/ncr' ) { |io| Hpricot io }
    doc.inner_text

    Regards
    Florian
    Florian Aßmann, Jun 24, 2007
    #2

  3. Chris Shea

    On Jun 24, 1:40 pm, "Colin Summers" <> wrote:
    > Okay, I have played with Hpricot and I am a convert. Amazing stuff.
    >
    > I am struggling to get up to speed and I can't find what must be a basic
    > function. I've scraped the FAA site and they store all their stuff
    > wrapped in td's, wrapped in tr's, wrapped in tables. Thank you
    > Hpricot.
    >
    > Now that I have "<b>Manufacturer</b>" isn't there a simple call to get
    > rid of the last bit of html?
    >
    > Thanks,
    > --Colin


    It looks like you're looking for the inner_text method.

    HTH,
    Chris
    Chris Shea, Jun 24, 2007
    #3
  4. Todd Benson

    On 6/24/07, Florian Aßmann <> wrote:
    > Hi Colin, consult the API docs for Hpricot's inner_text:
    >
    > require 'rubygems'
    > require 'hpricot'
    > require 'open-uri'
    > doc = open( 'http://www.google.com/ncr' ) { |io| Hpricot io }
    > doc.inner_text

    ^^^^^^^
    This code (above) doesn't work on my system.

    The following does:

    require 'rubygems'
    require 'hpricot'
    html_string = '<b>Manufacturer</b>'
    html_data = Hpricot html_string
    html_element = html_data / "b"
    puts html_element.inner_html

    Todd
    Todd Benson, Jun 24, 2007
    #4
  5. Todd Benson

    On 6/24/07, Todd Benson <> wrote:
    > The following does:
    >
    > require 'rubygems'
    > require 'hpricot'
    > html_string = '<b>Manufacturer</b>'
    > html_data = Hpricot html_string
    > html_element = html_data / "b"
    > puts html_element.inner_html


    Another "jumped too soon" moment.

    In the above code, I didn't point out that html_element should be
    plural, since / returns a collection of matching elements. It still
    works, but the more accurately named version would be:

    require 'rubygems'
    require 'hpricot'
    html_string = '<b>Manufacturer</b>'
    html_data = Hpricot html_string
    html_elements = html_data / "b"
    first_b_element = html_data.at "b"
    first_b_element_also = (html_data / "b").first
    puts first_b_element.inner_html

    Todd
    Todd Benson, Jun 24, 2007
    #5
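
For a simple fragment like the one in the question, the tag-stripping effect discussed in this thread can be roughly approximated without Hpricot. The following is only a sketch using plain Ruby String methods, not Hpricot's implementation: strip_tags is a hypothetical helper name, and "Example Corp" is made-up sample data. A regex like this will mishandle comments, script blocks, and a ">" inside an attribute value, and it does not decode entities such as &amp;, so a real parser is the safer choice for scraped pages.

```ruby
# Rough, stdlib-only approximation of "HTML fragment to plain text".
# strip_tags is a hypothetical helper introduced for this sketch;
# it is NOT part of Hpricot and does not parse HTML properly.
def strip_tags(html)
  html.gsub(/<[^>]*>/, '') # drop anything that looks like a tag
      .gsub(/\s+/, ' ')    # collapse runs of whitespace into one space
      .strip               # trim leading and trailing whitespace
end

# The fragment from the original question.
puts strip_tags('<b>Manufacturer</b>')

# A made-up table row, similar in shape to the td/tr nesting described above.
puts strip_tags("<tr><td><b>Manufacturer</b></td>\n<td>Example Corp</td></tr>")
```

Because the tags are simply deleted, text from adjacent cells is only separated by whatever whitespace sat between them in the source, which is one reason selecting each cell with a parser first is usually preferable.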