Drew
Hi All:
I'm working on a mini HTML parser. Basically, what I need to do is
take an HTML file and parse through it. I want to pick out all of the
text that is between table data tags <td> and </td> and all of the
text between list item tags <li> and </li>.
Since it's possible that a line of HTML could have no spaces at all,
like the one below:
<tr><td>SomeFixture</td></tr>
I'm thinking that I'm going to need to read the HTML file one line at
a time, then look for < and its closing >. If the text between the
two is td or li, then start capturing text at the position just after
the >, and keep going until I hit another < with /td (or /li) after it.
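To make that concrete, here is a rough sketch of the scanning idea. The class name TagTextExtractor and the helper indexOfIgnoreCase are made up for illustration. The sketch assumes the HTML is already in a single String rather than processed line by line (a tag or cell could span lines), and it ignores comments, scripts, and nested <td>/<li> elements, so it is not a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TagTextExtractor {

    // Collects the raw text between <td>...</td> and <li>...</li> pairs.
    // Any markup nested inside a cell or list item is returned as-is.
    public static List<String> extract(String html) {
        List<String> results = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = html.indexOf('<', pos);        // next '<'
            if (open < 0) break;
            int close = html.indexOf('>', open + 1);  // its closing '>'
            if (close < 0) break;

            // Tag name is the first token between '<' and '>',
            // so <td class="x"> is still recognised as td.
            String tag = html.substring(open + 1, close).trim();
            String name = tag.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
            pos = close + 1;

            if (name.equals("td") || name.equals("li")) {
                // Capture from just after '>' up to the matching "</td" or "</li".
                int end = indexOfIgnoreCase(html, "</" + name, pos);
                if (end < 0) break;                   // unclosed tag: stop scanning
                results.add(html.substring(pos, end));
                pos = end;
            }
        }
        return results;
    }

    // Hypothetical helper: case-insensitive indexOf, so </TD> also matches.
    private static int indexOfIgnoreCase(String s, String target, int from) {
        for (int i = from; i + target.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // prints [SomeFixture]
        System.out.println(extract("<tr><td>SomeFixture</td></tr>"));
    }
}
```

Working on the whole string rather than per line means the lack of spaces or line breaks in the sample above does not matter.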
Does this sound reasonable? Or am I coming up with too difficult a
solution? Does Java have any built-in HTML parsing methods that make
this easier?
Or, if there's an existing Java program that I could modify for
this, that would be great too.
Any help is appreciated!
Drew