Ignoring tags when extracting data from xhtml

Damo_Suzuki

hi again,
I'm traversing an org.w3c.dom.Document to extract data. Say I'm going
through the following line:

<h2 class=r>
<a class=l href="http://www.java.com/" onmousedown="return clk(this.href,'','','res','1','')">
<b>java</b>.com: Hot Games, Cool Apps</a></h2>

I look for the h2 node. I want it to print out just "java.com: Hot
Games, Cool Apps". At the moment it doesn't print anything. I think it's
because of the <b></b> tags in the middle. Is there any way I can ignore
the tags after I find the h2 tag?
thanks
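In the DOM the h2 element has no text value of its own: "java" is a text node inside <b>, and ".com: Hot Games, Cool Apps" is a text node inside <a>. So if the code reads the h2's own value (getNodeValue() on an Element returns null) or only its first child (here a whitespace text node), the <a> and <b> elements may be what gets in the way. A minimal sketch, assuming well-formed input: it walks every descendant of each h2 and keeps only the text nodes, which effectively ignores the tags. The class and method names (H2Text, collectText, textOf, h2Texts) are hypothetical, and the sample string is a quoted-attribute version of the snippet above, since a strict XML parser rejects unquoted values like class=r.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class H2Text {

    // Appends the character data found anywhere under `node`. Element nodes
    // such as <a> and <b> contribute nothing themselves; only their text
    // children are kept, so the markup is effectively ignored.
    static void collectText(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    // Text of a node with whitespace runs (e.g. the newlines between tags)
    // collapsed to single spaces and trimmed.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collectText(node, sb);
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    // Parses well-formed XHTML and returns the text of every <h2> element.
    static List<String> h2Texts(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList h2s = doc.getElementsByTagName("h2");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < h2s.getLength(); i++) {
                result.add(textOf(h2s.item(i)));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical well-formed version of the snippet from the question.
        String xhtml = "<html><body><h2 class=\"r\">\n"
                + "<a class=\"l\" href=\"http://www.java.com/\">\n"
                + "<b>java</b>.com: Hot Games, Cool Apps</a></h2></body></html>";
        for (String text : h2Texts(xhtml)) {
            System.out.println(text); // java.com: Hot Games, Cool Apps
        }
    }
}
```

The recursive walk only uses DOM Level 1 calls, so it should also work on a Document built by some other tool (JTidy's parseDOM, for instance). Where the DOM implementation supports DOM Level 3, Node.getTextContent() on the h2 gives the same concatenated text, just without the whitespace collapsing.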
 
Damo_Suzuki

hi,
I just noticed JTidy has a method getDropFontTags() (oddly named!!),
but there's no documentation on how to use it. If you call it on a new
instance of a Tidy object, how does it know which file to remove the
tags from? Has anyone ever used this method, and if so, could you show me
how?
Thanks
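As far as I know, getDropFontTags() is only the getter for JTidy's drop-font-tags configuration flag; the matching setter is setDropFontTags(boolean). Neither one touches a file. Options are set on the Tidy object first, and the document is supplied later, when you call parse(...) or parseDOM(...). A minimal sketch, assuming a JTidy release (such as r938) that provides these setters; the class and method names (TidyFontExample, configuredTidy) are hypothetical. Note that in HTML Tidy this option is about <font> and <center> tags, so it would probably not remove the <b> tags from the first post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class TidyFontExample {

    // Builds a Tidy instance with options set up front. These calls only
    // change configuration; no input file is involved yet.
    static Tidy configuredTidy() {
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        tidy.setDropFontTags(true); // assumed to map to HTML Tidy's drop-font-tags option
        return tidy;
    }

    public static void main(String[] args) {
        Tidy tidy = configuredTidy();
        // The getter just reports the current setting.
        System.out.println(tidy.getDropFontTags());

        // The input is supplied here, at parse time. Passing null as the
        // output stream means no cleaned markup is written anywhere.
        String html = "<html><body><p><font color=\"red\">hi</font></p></body></html>";
        Document doc = tidy.parseDOM(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), null);
        System.out.println(doc != null);
    }
}
```

To tidy an actual file, the same Tidy object would be given a FileInputStream instead of the in-memory string.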
 
