need script: convert html-text to text

Discussion in 'Ruby' started by keal, Jan 4, 2006.

  1. keal

    keal Guest

    keal, Jan 4, 2006
    #1

  2. Gene Tani

    Gene Tani Guest

    Gene Tani, Jan 4, 2006
    #2

  3. Ross Bamford

    Ross Bamford Guest

    On Wed, 04 Jan 2006 10:30:03 -0000, keal <> wrote:

    > I have HTML text. I have to convert this text to plain text without
    > HTML tags.
    >


    It's tricky; there's more to it than you'd think. The best way is probably
    to use Lynx, or another text-mode browser, to do it for you, e.g.:

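    # Dump a page as plain text via lynx (lynx must be installed).
    # The array form of IO.popen skips the shell, so quotes or spaces
    # in the URL can't break the command.
    def plain(url)
      IO.popen(['lynx', '-dump', url], &:read)
    end

    text = plain('http://www.google.com/')
    puts text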

    Outputs:

    [1]Personalised Home | [2]Sign in

    [3]A picture of the Braille letters spelling out "Google." Happy Birthday
    Louis Braille!

    Web [4]Images [5]Groups [6]News [7]Froogle [8]more »

    ... [snip] ...


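    Of course you'll need lynx installed for that to work, but other text-mode
    browsers with a dump mode (w3m and links both have a -dump option) can do
    the same job. A Google search will turn up more.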
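A possible variation (a sketch, not something posted in the thread): the original question is about an HTML string rather than a URL, so the markup can be piped to the browser on standard input instead of being fetched. This assumes `lynx -stdin -dump` and `w3m -dump -T text/html` are available on the machine; the helper name `html_to_text` and the `DUMPERS` list are hypothetical. The code simply tries each command in turn and returns the first successful output.

```ruby
require 'open3'

# Hypothetical list of text-mode browsers that read HTML on stdin.
# Each entry is an argv array, so no shell is involved.
DUMPERS = [
  %w[lynx -stdin -dump],
  %w[w3m -dump -T text/html]
].freeze

# Try each dumper in turn; return the first successful output,
# or nil if none is installed or all of them fail.
def html_to_text(html, commands = DUMPERS)
  commands.each do |cmd|
    begin
      out, status = Open3.capture2(*cmd, stdin_data: html)
      return out if status.success?
    rescue SystemCallError
      # Command not found (Errno::ENOENT) or similar: try the next one.
    end
  end
  nil
end

puts html_to_text('<p>Hello, <b>world</b>!</p>') || 'no HTML dumper found'
```

Keeping each command as an argv array means the HTML only ever travels over the child's stdin and never through a shell, which matters when the input is arbitrary text.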

    Cheers,

    --
    Ross Bamford -
    Ross Bamford, Jan 4, 2006
    #3
  4. Robert Klemme

    keal wrote:
    > I have HTML text. I have to convert this text to plain text without
    > HTML tags.


    This is a very low-cost variant; I'd guess the lynx approach is much more
    effective and complete:

    ruby -0777 -pe '$_.gsub!(/<[^>]*>/, "")' index.html
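If installing a browser isn't an option, the regex approach can be pushed a little further. The sketch below (a hypothetical `strip_html` helper, not from the thread) also drops comments and `<script>`/`<style>` bodies, removes tags that span lines, and decodes numeric character references plus a handful of named entities from a small hard-coded table (`ENTITIES`, also an assumption). It is still not an HTML parser, so malformed markup or a `>` inside an attribute value can defeat it.

```ruby
# Hypothetical entity table: only a few common names are covered.
# &nbsp; is mapped to a plain space for simplicity.
ENTITIES = {
  'amp' => '&', 'lt' => '<', 'gt' => '>',
  'quot' => '"', 'apos' => "'", 'nbsp' => ' '
}.freeze

# Regex-based tag stripper; a sketch, not a real HTML parser.
def strip_html(html)
  text = html.gsub(/<!--.*?-->/m, '')                       # comments
  text = text.gsub(%r{<(script|style)\b.*?</\1\s*>}mi, '')  # script/style blocks with their contents
  text = text.gsub(/<[^>]*>/, '')                           # remaining tags, even across lines
  # Decode &#NN;, &#xHH; and the named entities listed above.
  text.gsub(/&(#[xX]\h+|#\d+|[a-zA-Z]+);/) do
    ref = $1
    if ref.start_with?('#x', '#X')
      [ref[2..-1].to_i(16)].pack('U')
    elsif ref.start_with?('#')
      [ref[1..-1].to_i].pack('U')
    else
      ENTITIES.fetch(ref, "&#{ref};")   # unknown names are left untouched
    end
  end
end

puts strip_html('<p>Fish &amp; chips</p>')  # prints "Fish & chips"
```

Entities are decoded only after the tags are gone, so an escaped `&lt;b&gt;` in the source ends up as the literal text `<b>` instead of being stripped as a tag.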

    Kind regards

    robert
    Robert Klemme, Jan 4, 2006
    #4
