HTML to XML to my custom XML with XSLT?


asd

Hello,
I have to parse a lot of websites.
I want to take their HTML and transform it into XML. My idea is to
take the HTML, convert it to XML, apply an XSLT stylesheet to that XML,
and obtain my custom XML. Each site will have its own XSLT stylesheet
with its own XPath expressions.

I've done something like this with the HTML Parser project's jar:

org.xml.sax.XMLReader reader =
    org.xml.sax.helpers.XMLReaderFactory.createXMLReader("org.htmlparser.sax.XMLReader");
org.xml.sax.ContentHandler content = new MyContentHandler();
reader.setContentHandler(content);
org.xml.sax.ErrorHandler errors = new MyErrorHandler();
reader.setErrorHandler(errors);
reader.parse("http://www.google.com");

I understand that MyContentHandler takes care of processing the XML
tags. For the moment I've implemented it only with System.out calls
to test it.

I really don't know how to do what I want.
For example: how can I apply an XSLT stylesheet to the XML of the
Google site to obtain another XML document?
I don't want to handle each tag with Java code in MyContentHandler;
I want the XSLT to take care of that. Once I retrieve the clean XML
from the HTML, I'll pass it to the XSLT to get my custom XML.
Can someone help me?
Thanks a lot, guys.
Martina
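
One possible way to wire this up, as a minimal sketch rather than a definitive answer: the standard javax.xml.transform API accepts a SAXSource, which wraps an XMLReader together with an InputSource, so the SAX events can go straight into an XSLT transformer with no hand-written ContentHandler. The class name HtmlToCustomXml, the transform helper, and the SAMPLE_XSLT stylesheet are hypothetical names made up for illustration. To keep the example runnable without extra jars or network access, it uses the JDK's built-in XML parser on a small well-formed page; the assumption is that the reader created from "org.htmlparser.sax.XMLReader" in the post could be passed to the same method instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HtmlToCustomXml {

    // Hypothetical per-site stylesheet: copies the page title into a custom <page> element.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<page><title><xsl:value-of select='//title'/></title></page>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Feeds the SAX events produced by 'reader' for 'input' into the XSLT
    // transformer and returns the transformed document as a string.
    static String transform(XMLReader reader, InputSource input, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK's own namespace-aware XML parser.
        // Assumption: an HTML-tolerant XMLReader could be substituted here.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        String page = "<html><head><title>Hello</title></head><body><p>text</p></body></html>";
        System.out.println(transform(reader, new InputSource(new StringReader(page)), SAMPLE_XSLT));
    }
}
```

With this layout, each site would only need its own stylesheet string or file; the Java side stays the same. If the HTML parser's reader is swapped in, it is worth checking how it reports element names (for example their case), since XPath matching is case-sensitive.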
 
