stixwix
What are people's favourite ways of doing this?
I tried TagSoup, but I have little experience with XML and can't find any
decent docs on the XPath part.
The following prints the doc (a basic html file) title as expected:
URL url = new URL("file:///c:\\tmp\\test.htm");
Parser p = new Parser();
SAX2DOM sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(url.openStream()));
Node doc = sax2dom.getDOM();
String titlePath = "/html:html/html:head/html:title";
XObject title = XPathAPI.eval(doc,titlePath);
System.out.println("Title is '"+title+"'");
However, changing the titlePath to the following doesn't give the text
from the body tag:
String titlePath = "/html:html/html:body";
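For comparison, here is a minimal sketch of the same body lookup using only the JDK's javax.xml.xpath API instead of Xalan's XPathAPI. It assumes the page is already well-formed XHTML held in a string (a stand-in for what TagSoup would produce), and it binds the html prefix to the XHTML namespace explicitly. The class name BodyTextSketch, the method bodyText and the sample page are hypothetical, made up for illustration. Wrapping the path in string() asks XPath for the concatenated text of the body element rather than the node itself.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BodyTextSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Maps the "html" prefix used in the XPath expression to the XHTML namespace.
    static final NamespaceContext HTML_PREFIX = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "html" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                    ? Collections.singletonList("html").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Returns the concatenated text content of the body element, trimmed.
    static String bodyText(String xhtml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the html: prefix to match
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(HTML_PREFIX);
        return xpath.evaluate("string(/html:html/html:body)", doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample page, not the poster's test.htm.
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Test</title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println("Body is '" + bodyText(page) + "'");
    }
}
```

This does not explain why the Xalan call behaves differently; it only shows one way to get the body text once a namespace-aware DOM is available.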
I would eventually like to be able to read HTML comments into my Java
program as well.
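As a rough sketch of the comment part: XPath has a comment() node test, so //comment() selects every comment node in a DOM. The example below uses only JDK classes and assumes the comments have already made it into the DOM; with a SAX-based pipeline that may require a lexical handler to be registered, which is not shown here. The class name CommentSketch and the method comments are hypothetical names for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CommentSketch {
    // Returns the text of every comment node in the document, in document order.
    static List<String> comments(String xhtml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        // comment() matches comment nodes regardless of namespace,
        // so no prefix mapping is needed for this expression.
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//comment()", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getNodeValue());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample page, not the poster's test.htm.
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<!-- first --><p>x</p><!--second--></body></html>";
        for (String c : comments(page)) {
            System.out.println("Comment: '" + c + "'");
        }
    }
}
```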
Thanks,
Andy