Ignoring tags when extracting data from xhtml

Damo_Suzuki · Dec 8, 2006

Hi again,
I'm traversing an org.w3c.dom.Document to extract data. Say I'm going
through the following markup:

<h2 class=r>
<a class=l href="http://www.java.com/" onmousedown="return
clk(this.href,'','','res','1','')">
<b>java</b>.com: Hot Games, Cool Apps</a></h2>

I look for the h2 node, and I want it to print just "java.com: Hot
Games, Cool Apps". At the moment it doesn't print anything. I think it's
because of the <b></b> tags in the middle. Is there any way I can ignore
tags after I find the h2 tag?
Thanks
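
One likely cause (an assumption, since the traversal code isn't shown) is that getNodeValue() is being called on the h2 element itself; for element nodes it returns null, and the visible text actually lives in text nodes nested under <a> and <b>. A minimal sketch of collecting those text nodes recursively follows. The class name HeadingText and the helpers textOf and firstHeadingText are made up for illustration, and the sample markup uses quoted attributes so the standard XML parser accepts it:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of any library.
class HeadingText {

    // Concatenates the values of all text nodes below the given node,
    // so nested elements such as <a> and <b> are effectively ignored.
    static String textOf(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return node.getNodeValue();
        }
        StringBuilder sb = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            sb.append(textOf(child));
        }
        return sb.toString();
    }

    // Parses well-formed XHTML and returns the text of the first h2, or "" if none.
    static String firstHeadingText(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList headings = doc.getElementsByTagName("h2");
        return headings.getLength() == 0 ? "" : textOf(headings.item(0)).trim();
    }

    public static void main(String[] args) throws Exception {
        // Sample input assumed for illustration; attributes are quoted to keep it well-formed.
        String xhtml = "<h2 class=\"r\"><a class=\"l\" href=\"http://www.java.com/\">"
                + "<b>java</b>.com: Hot Games, Cool Apps</a></h2>";
        System.out.println(firstHeadingText(xhtml)); // prints "java.com: Hot Games, Cool Apps"
    }
}
```

On a DOM Level 3 implementation, Node.getTextContent() on the h2 node should give the same result in one call; the recursive version is shown because it only relies on DOM Level 1 methods, which may matter if the Document comes from an older parser.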

Damo_Suzuki · Dec 8, 2006

Hi,
I just noticed JTidy has a method getDropFontTags() (oddly named!!),
but it has no documentation on how to use it. If you call it on a new
instance of a Tidy object, how does it know which file to remove the
tags from? Has anyone ever used this method, and if so, could you show me
how?
Thanks
