Hello
I have to parse a lot of web sites.
I want to take their HTML and transform it into XML. My idea is to
take the HTML, convert it to XML, apply an XSLT to that XML, and
obtain my custom XML. Each site will have its own XSLT with its own
XPath expressions.
I've done something like this with the HTML Parser project jar:
org.xml.sax.XMLReader reader =
    org.xml.sax.helpers.XMLReaderFactory.createXMLReader("org.htmlparser.sax.XMLReader");
org.xml.sax.ContentHandler content = new MyContentHandler();
reader.setContentHandler(content);
org.xml.sax.ErrorHandler errors = new MyErrorHandler();
reader.setErrorHandler(errors);
reader.parse("http://www.google.com");
I understand that MyContentHandler takes care of processing the XML
tags. For the moment I've implemented it with System.out calls only,
to test it.
I really don't know how to do what I want.
For example: how can I apply an XSLT to the XML of the Google site to
obtain another XML document?
I don't want to handle each tag with Java code in MyContentHandler;
I want the XSLT to take care of that. Once I have retrieved clean XML
from the HTML, I'll pass it to the XSLT so I can get my custom
XML.
Can someone help me?
Thanks a lot, guys.
Martina
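
One way the pipeline described above (HTML, then XML, then XSLT, then custom XML) could be wired up is with the standard javax.xml.transform API: a SAXSource wraps an XMLReader, and the Transformer installs its own content handler on that reader, so no per-tag Java code is needed. The following is only a minimal sketch under stated assumptions. The class name HtmlToCustomXml, its two methods, and the sample page and stylesheet are made up for illustration, and the demo uses the JDK's built-in XML parser on a well-formed string so that it runs without extra jars. Whether the reader obtained from "org.htmlparser.sax.XMLReader" behaves well enough as a SAXSource is an assumption that would need testing, and each stylesheet's XPath has to match whatever element names that reader actually emits.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HtmlToCustomXml {

    /**
     * Feeds the SAX events produced by the given reader into an XSLT
     * transformer and returns the result as a string. The transformer
     * registers its own ContentHandler on the reader.
     */
    static String transform(XMLReader reader, InputSource input, Source stylesheet)
            throws Exception {
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer(stylesheet);
        StringWriter out = new StringWriter();
        transformer.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    /**
     * Demo helper (hypothetical): uses the JDK's own XML parser on in-memory
     * strings. For real HTML, the reader from the original post would be
     * passed to transform(...) instead (untested assumption).
     */
    static String transformString(String markup, String xslt) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XSLT processing expects namespace-aware events
        XMLReader reader = factory.newSAXParser().getXMLReader();
        return transform(reader,
                new InputSource(new StringReader(markup)),
                new StreamSource(new StringReader(xslt)));
    }

    public static void main(String[] args) throws Exception {
        // Sample input and per-site stylesheet, both invented for the demo.
        String page = "<html><head><title>Example</title></head>"
                + "<body><p>text</p></body></html>";
        String xslt = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
                + "<xsl:template match='/'>"
                + "<page><name><xsl:value-of select='/html/head/title'/></name></page>"
                + "</xsl:template>"
                + "</xsl:stylesheet>";
        System.out.println(transformString(page, xslt));
    }
}
```

With this shape, each site only needs its own stylesheet passed as a Source (for example a StreamSource over a .xsl file), and a custom ErrorHandler can still be set on the reader before it is handed to transform(...).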