Need script: convert HTML text to plain text

Ross Bamford

I have HTML text. I need to convert it to plain text without the
HTML tags.

It's tricky; there's more to it than you'd think. The best way is probably
to let Lynx, or another text-mode browser, do it for you, e.g.:

def plain(url)
  # Pass the command as an array so the URL never goes through a shell.
  IO.popen(['lynx', '-dump', url], &:read)
end

text = plain('http://www.google.com/')
puts text

Outputs:

[1]Personalised Home | [2]Sign in

[3]A picture of the Braille letters spelling out "Google." Happy Birthday
Louis Braille!

Web [4]Images [5]Groups [6]News [7]Froogle [8]more »
... [snip] ...

Of course, you'll need lynx installed for that to work, but other
text-mode browsers can do the same job. Try a Google search.
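
If the HTML is already in a string rather than behind a URL, the same idea can be sketched by piping it to the browser on standard input. This is only a rough sketch: html_to_text and its command: parameter are names made up for this example, and it assumes your lynx accepts -stdin together with -dump.

```ruby
require 'open3'

# Hypothetical helper: feeds an HTML string to a text-mode browser on stdin
# and returns the rendered text, or nil if the command could not be run
# or exited with an error.
def html_to_text(html, command: ['lynx', '-stdin', '-dump'])
  # The command is passed as separate arguments, so no shell is involved.
  out, status = Open3.capture2(*command, stdin_data: html)
  status.success? ? out : nil
rescue Errno::ENOENT
  nil # the browser is not installed or not on PATH
end

text = html_to_text('<p>Hello, <b>world</b>!</p>')
puts(text || 'lynx is not available here')
```

Returning nil instead of raising lets the caller fall back to something cruder when no browser is installed.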

Cheers,
 
Robert Klemme

keal said:
I have HTML text. I need to convert it to plain text without the
HTML tags.

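This is a very low-cost variant. I guess the lynx approach is much more
effective and complete, since this one works line by line (a tag that
spans two lines is missed) and leaves entities such as &amp; untouched: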

ruby -pe '$_.gsub!(%r{</?[^>]*>}, "")' index.html
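
For a slightly fuller version of the same regex idea, here is a rough sketch that works on a whole string instead of line by line. The name strip_tags and the small ENTITIES table are made up for this example; it is still not a real HTML parser, and numeric or unlisted entities are left as they are.

```ruby
# Named entities handled by this sketch; a real converter would know many more.
ENTITIES = {
  'amp' => '&', 'lt' => '<', 'gt' => '>',
  'quot' => '"', 'apos' => "'", 'nbsp' => ' '
}.freeze

# Hypothetical helper: a rough regex-based tag stripper.
def strip_tags(html)
  # Drop script and style elements together with their contents.
  text = html.gsub(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, '')
  # Drop comments, including multi-line ones.
  text = text.gsub(/<!--.*?-->/m, '')
  # Drop the remaining tags; [^>] also matches newlines, so tags may span lines.
  text = text.gsub(/<[^>]*>/, '')
  # Decode the known named entities and leave anything else untouched.
  text = text.gsub(/&(\w+);/) { ENTITIES.fetch($1, $&) }
  # Collapse runs of whitespace into single spaces.
  text.gsub(/\s+/, ' ').strip
end

puts strip_tags("<p>Fish &amp;\n<b\nclass=\"x\">chips</b></p>")
```

Entities are decoded only after the tags are gone, so text like &lt;b&gt; comes out as a literal <b> instead of being stripped.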

Kind regards

robert
 
