Damo
Hi
I'm trying to extract text from an HTML page using DOM. I ran JTidy
on it first. The HTML itself is not very descriptive: there are no
standout tags around the text I need to extract. My plan was to find
the text through the elements' attributes, but I keep getting a
NullPointerException. This is the HTML:
<div class="mb16">
<div id="r_t0" class="prel">
<a id="r0_t" class="L4"href="http://java.sun.com/"">
<b>Java</b> Technology</a></div>
<div class="T1" id="r0_a">Sun's home for <b>Java</b>. Offers
Windows, Solaris, and Linux <b>Java</b> Development Kits (JDKs),
extensions, news, tutorials, and product information.</div>
<div id="r_b0" class="prel T11"><a id="r0_b"
href="http://java.sun.com/">
<img src="http://sp.ask.com/sh/i/icon_bins.gif" border="0"class="bb"
/></a>
<span id="r0_u" class="T10">java.sun.com/</span>
<strong>·</strong> <a class="L5 nw"
href="http://www.askcache.com">
Cached</a> 1f40 <strong>·</strong>
<a class="L5 L5V" href="javascript:void(0)">Save</a>
</div>
</div>
This is the part I want to skip to in order to extract the text. It's
buried in loads of other HTML. Can anyone please help me do this?
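
The failing code is not shown, so the following is only a guess at the cause. In the W3C DOM, Node.getAttributes() returns null for every node that is not an element (text nodes, comments, the document itself), and NamedNodeMap.getNamedItem() returns null when an element lacks the requested attribute. Calling a method on either result without a check would throw a NullPointerException. The sketch below walks the tree with both checks and looks elements up by the id values from the snippet (r0_a, r0_u). The class and method names (ExtractById, parse, findById, textOf) are made up for illustration, and the JDK's own XML parser on a hand-tidied fragment stands in for the Document that JTidy would return.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class ExtractById {

    // Stand-in for JTidy: parses an already well-formed fragment with the JDK parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    // Depth-first search for the first element whose id attribute equals wantedId.
    // Returns null when no such element exists.
    static Node findById(Node node, String wantedId) {
        NamedNodeMap attrs = node.getAttributes();   // null for non-element nodes
        if (attrs != null) {
            Node id = attrs.getNamedItem("id");      // null when the element has no id
            if (id != null && wantedId.equals(id.getNodeValue())) {
                return node;
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            Node hit = findById(child, wantedId);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Concatenates the text nodes below the given node, so nested <b> tags are flattened.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        collectText(node, sb);
        return sb.toString().trim();
    }

    private static void collectText(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, sb);
        }
    }

    public static void main(String[] args) {
        // Hand-tidied, shortened version of the snippet above (an assumption, not real JTidy output).
        String xhtml = "<div class=\"mb16\">"
                + "<div class=\"T1\" id=\"r0_a\">Sun's home for <b>Java</b>.</div>"
                + "<span id=\"r0_u\" class=\"T10\">java.sun.com/</span>"
                + "</div>";
        Document doc = parse(xhtml);
        for (String id : new String[] {"r0_a", "r0_u"}) {
            Node hit = findById(doc, id);
            System.out.println(id + ": " + (hit == null ? "(not found)" : textOf(hit)));
        }
    }
}
```

With JTidy, the Document returned by Tidy.parseDOM could be passed to findById in place of parse(...). The text is collected by hand rather than with Node.getTextContent() because that DOM Level 3 method may not be supported by JTidy's DOM implementation.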