HTML to XML Conversion - Difficulty with Tidy and TagSoup

Eric

I'm trying to convert html pages to xml and I'm having some difficulty
with the following:

1. I tried Tidy (with -asxml), but the html that I'm trying to convert to
xhtml has too many errors, so I spend a lot of time trying to "fix" the
html by hand before running it through Tidy.

2. I've tried using TagSoup with JDOM, but the SAXBuilder internally
tries to turn on the SAX namespace-prefixes feature, and TagSoup does not
support that feature.

I would really appreciate help from someone who has dealt with having
to crank out lots of xml from poorly formatted html. I appreciate
any help! ;)

-Eric
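
One way around the problem in point 2 can be sketched with plain SAX: wrap the parser in a filter that quietly ignores feature requests the parser rejects, so a caller that insists on setting namespace-prefixes no longer fails. This is only a minimal sketch under assumptions. The class names LenientReaderDemo and LenientFeatureFilter and the method toXml are made up for illustration, the JDK's own XML parser stands in for TagSoup's org.ccil.cowan.tagsoup.Parser so the example runs without extra jars (unlike TagSoup, it needs well-formed input), and how the wrapped reader is handed to JDOM's SAXBuilder depends on the JDOM version and is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class LenientReaderDemo {

    /** Hypothetical wrapper: ignores features the wrapped reader refuses. */
    public static class LenientFeatureFilter extends XMLFilterImpl {
        public LenientFeatureFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void setFeature(String name, boolean value) {
            try {
                super.setFeature(name, value);
            } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
                // The wrapped reader rejected this feature; carry on
                // with its defaults instead of failing the whole parse.
            }
        }
    }

    /** Parses markup with the given reader and serializes the SAX events as XML. */
    public static String toXml(XMLReader reader, String markup) throws Exception {
        // A Transformer created without a stylesheet copies input to output.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(
                new SAXSource(new LenientFeatureFilter(reader),
                        new InputSource(new StringReader(markup))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader (assumption): with the TagSoup jar on the classpath
        // this would be `new org.ccil.cowan.tagsoup.Parser()` instead.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        System.out.println(toXml(reader, "<html><body><p>hi</p></body></html>"));
    }
}
```

If plain xml output is all that is needed, the identity transform in toXml may be enough on its own and JDOM can be skipped. If a JDOM Document is required, the same filter could sit in front of whatever reader SAXBuilder ends up using.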
 
