HTML to XML

2peachy

Hello... I am brand new to this.
I did a search with no results.

How do you convert an HTML page into an XML page?

2peach
 
Johannes Koch

2peachy said:
hello... I am brand new to this...
I did a search with no results...

how do you convert an html page into an xml page ?

For valid HTML documents you can use sx from OpenSP. Or use tidy to
output XHTML.
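Here is a minimal sketch of driving Tidy from Python. It assumes HTML Tidy is installed and on the PATH as `tidy`. The flags are standard Tidy command-line options: `-asxml` asks for well-formed XHTML, `-numeric` asks for numeric character references instead of named entities, and `-quiet` drops the banner. The function names are made up for illustration, and the sx route is not shown.

```python
import shutil
import subprocess

def tidy_command(path):
    """Build a Tidy command line that converts the HTML file at `path` to XHTML."""
    # -asxml:   convert HTML to well-formed XHTML
    # -numeric: emit numeric character references (plain XML parsers don't know &nbsp;)
    # -quiet:   suppress Tidy's informational banner
    return ["tidy", "-asxml", "-numeric", "-quiet", path]

def html_file_to_xhtml(path):
    """Run Tidy on `path` and return the XHTML it writes to stdout."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("HTML Tidy ('tidy') is not on the PATH")
    result = subprocess.run(tidy_command(path), capture_output=True, text=True)
    # Tidy's exit status: 0 = clean, 1 = warnings only, 2 = errors.
    if result.returncode == 2:
        raise RuntimeError("tidy reported errors:\n" + result.stderr)
    return result.stdout
```

Usage would be something like `xhtml = html_file_to_xhtml("page.html")`, where `page.html` is a hypothetical input file. Warnings are tolerated here and only errors are treated as failures. Whether that is acceptable depends on how dirty your input is.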
 
Andy Dingley

how do you convert an html page into an xml page ?

How long is a piece of string?


How many pages are you dealing with? Is this a one-off "I want to
convert my site" or a regular "I want to scrape stock prices from
another site and make them into an XML feed"?

What's "HTML"? Is this well-coded, valid HTML 3.2 / 4.0, XHTML, or
some tag soup written by a M$oft tool? What happens if it's not
valid? Can your code crash, abandon the page, scream for human help,
or must it make a best attempt?

Can you avoid this altogether? Can you obtain the content by some
friendlier means, such as RSS, direct access to the database, or some
other source?
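If the site already publishes a feed, reading it is far simpler than scraping. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes an RSS 0.91, 0.92 or 2.0 feed. Those versions keep `<item>` elements inside `<channel>` with no namespace. RSS 1.0 is RDF-based and would need namespaced paths. `rss_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def rss_items(feed_xml):
    """Return (title, link) pairs from an RSS 0.91/0.92/2.0 feed string."""
    root = ET.fromstring(feed_xml)  # the root element is <rss>
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iterfind("channel/item")
    ]
```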

Why do you want to do it? There are no "XML pages", there are only
XML documents. If you want to end up with "a web page" at the end of
it, then raw XML isn't enough of a finishing point; you need to take
it further.

What is "XML"? What DTD or schema are you aiming at?


For one-offs, use Dave Raggett's Tidy (easily obtained via HTMLKit).
Even if you're not looking for an XHTML output, Tidy can be an
excellent pre-processor for sorting out ugly Tag Soup.
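Once Tidy has produced XHTML, the result is ordinary well-formed XML, so a standard XML parser can take over. Below is a minimal Python sketch. It assumes the output carries the usual XHTML namespace (`http://www.w3.org/1999/xhtml`) and uses numeric rather than named entities. A plain XML parser doesn't know `&nbsp;`. `summarize_xhtml` is a made-up name.

```python
import xml.etree.ElementTree as ET

# Tidy's XHTML output normally declares this namespace on <html>.
XHTML = {"h": "http://www.w3.org/1999/xhtml"}

def summarize_xhtml(xhtml):
    """Return (title, paragraph texts) from a well-formed XHTML string."""
    root = ET.fromstring(xhtml)
    title = root.findtext("h:head/h:title", default="", namespaces=XHTML)
    # itertext() joins text from nested inline elements such as <b> or <a>.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iterfind(".//h:p", XHTML)]
    return title, paragraphs
```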

For screen-scrapes, use your favourite scripting language (Perl is
always a good start, but you could use Python or even JavaScript) and
use someone else's HTML parser.
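The sketch below illustrates the "someone else's parser" approach. It is a best-attempt converter built on Python's standard `html.parser` module, and it re-emits the tags as well-formed XML:

- void elements like `<br>` become self-closing
- boolean attributes get a value
- stray end tags are dropped
- unclosed elements are closed
- a few of HTML's implied-end rules are approximated (a new `<p>` or `<li>` closes the previous one)

Comments, doctypes and processing instructions are simply discarded. It is nowhere near a full HTML parser, and the names (`HtmlToXml`, `html_to_xml`) are illustrative.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# A start tag (key) implicitly closes an open element listed in its set.
# This is a small subset of HTML's real rules, enough for common tag soup.
IMPLIED_CLOSE = {"p": {"p"}, "li": {"li"}, "option": {"option"},
                 "tr": {"tr", "td", "th"}, "td": {"td", "th"}, "th": {"td", "th"}}

class HtmlToXml(HTMLParser):
    """Re-serialise HTML (including tag soup) as well-formed XML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive as plain text
        self.out = []
        self.stack = []  # currently open elements

    def handle_starttag(self, tag, attrs):
        while self.stack and self.stack[-1] in IMPLIED_CLOSE.get(tag, ()):
            self.out.append(f"</{self.stack.pop()}>")
        seen = {}
        for name, value in attrs:
            # XML forbids duplicate attributes (keep the first) and valueless
            # attributes (use the XHTML convention checked="checked").
            seen.setdefault(name, name if value is None else value)
        attr_text = "".join(f" {n}={quoteattr(v)}" for n, v in seen.items())
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag (or a void element's end tag): drop it
        while True:  # close everything opened after it, then the tag itself
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered text
        while self.stack:  # close whatever the page left open
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)

def html_to_xml(html):
    parser = HtmlToXml()
    parser.feed(html)
    return parser.result()
```

For example, `<ul><li>one<li>two</ul>` comes out as `<ul><li>one</li><li>two</li></ul>`. Real tag soup will still find corners this misses, which is why running Tidy first is often the easier path.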

RSS 1.0 (but not RSS 0.92) is a good XML format to target for generic
screen scraping, even if you don't think your content is "relevant" to
a newsfeed.
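Here is a sketch of emitting RSS 1.0 with Python's `xml.etree.ElementTree`. RSS 1.0 is an RDF vocabulary with three main parts:

- an `rdf:RDF` root
- a `channel` that lists its items in order in an `rdf:Seq`
- each `item` as a sibling of the channel

The function name, its parameters and the example URLs are hypothetical. Note that `register_namespace` changes a process-wide prefix table.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS_NS)  # RSS 1.0 elements go in the default namespace

def rdf(tag):
    return f"{{{RDF_NS}}}{tag}"

def rss(tag):
    return f"{{{RSS_NS}}}{tag}"

def build_rss10(feed_url, site_url, title, description, items):
    """Serialise scraped (title, link) pairs as an RSS 1.0 document string."""
    root = ET.Element(rdf("RDF"))
    channel = ET.SubElement(root, rss("channel"), {rdf("about"): feed_url})
    ET.SubElement(channel, rss("title")).text = title
    ET.SubElement(channel, rss("link")).text = site_url
    ET.SubElement(channel, rss("description")).text = description
    # The channel lists its items, in order, in an rdf:Seq ...
    seq = ET.SubElement(ET.SubElement(channel, rss("items")), rdf("Seq"))
    for _, link in items:
        ET.SubElement(seq, rdf("li"), {rdf("resource"): link})
    # ... and each item is a sibling of the channel, identified by rdf:about.
    for item_title, link in items:
        item = ET.SubElement(root, rss("item"), {rdf("about"): link})
        ET.SubElement(item, rss("title")).text = item_title
        ET.SubElement(item, rss("link")).text = link
    return ET.tostring(root, encoding="unicode")
```

The scraper's only job then becomes producing `(title, link)` pairs. Everything about the output format lives in one function.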
 
