Parsing HTML

stixwix

What is people's favourite way of doing this?
I tried TagSoup, but I have little experience with XML and can't find any
decent docs on the XPath part.
The following prints the title of the doc (a basic HTML file) as expected:

URL url = new URL("file:///c:\\tmp\\test.htm");
Parser p = new Parser();
SAX2DOM sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(url.openStream()));
Node doc = sax2dom.getDOM();
String titlePath = "/html:html/html:head/html:title";
XObject title = XPathAPI.eval(doc,titlePath);
System.out.println("Title is '"+title+"'");

However, changing titlePath to the following doesn't give the text
of the body element:

String titlePath = "/html:html/html:body";

Eventually I would also like to be able to read HTML comments into my
Java program.

Thanks,
Andy
 
jcsnippets.atspace.com

If you're going to parse HTML files, have a look at
http://sourceforge.net/projects/htmlparser. It is very easy to use, and
samples are included.
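
On the XPath part of the question, here is a minimal sketch that uses only the JDK's javax.xml.xpath classes. It is not the TagSoup/Xalan setup from the original post: it assumes the page is already well-formed XHTML held in a string, and the class name XPathHtmlSketch, the helper methods and the SAMPLE markup are all made up for illustration. It shows the "html" prefix being bound to the XHTML namespace, the text of the body element being read as its string value, and comments being selected with //comment().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class XPathHtmlSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Made-up, well-formed sample page (assumption: real input may need tidying first).
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML_NS + "\">"
        + "<head><title>Test page</title></head>"
        + "<body><!-- a comment --><p>Hello <b>world</b></p></body>"
        + "</html>";

    // Parse the markup into a namespace-aware DOM.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    // Create an XPath object that knows the "html" prefix used in the expressions.
    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "html".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "html" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                List<String> prefixes = new ArrayList<String>();
                if (XHTML_NS.equals(uri)) {
                    prefixes.add("html");
                }
                return prefixes.iterator();
            }
        });
        return xpath;
    }

    // String value of the first node matched by expr (all descendant text, concatenated).
    static String text(Document doc, String expr) throws Exception {
        return newXPath().evaluate(expr, doc);
    }

    // Contents of every comment node in the document, in document order.
    static List<String> comments(Document doc) throws Exception {
        NodeList nodes = (NodeList) newXPath().evaluate("//comment()", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("Title is '" + text(doc, "/html:html/html:head/html:title") + "'");
        System.out.println("Body text is '" + text(doc, "/html:html/html:body") + "'");
        System.out.println("Comments: " + comments(doc));
    }
}
```

Whether a TagSoup-built DOM behaves the same way depends on how that parser is configured, for example whether comments are passed through to the DOM at all, so treat this as a starting point for experimenting rather than a fix for the original code.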

Best regards,

JayCee
 
