HTML Parser Help Please

ZOCOR

Hi

I am using the HTMLEditorKit.Parser class to parse an HTML file. However, I
have found this Swing HTML parser extremely difficult to use.

I am trying to parse an HTML file and extract specific information from it
into a table. Consider this snippet of my HTML and the table I would like to
generate from it:

HTML source:

<HTML>
<TITLE></TITLE>
<BODY>
<PRE>
Identifer: ABCDEFG
</PRE>
data: 123456
<PRE>
</PRE>
</BODY>
</HTML>

TABLE:

ABCDEFG 123456


Here is the code I have so far:

import javax.swing.text.*;
import javax.swing.text.html.*;
import java.io.*;

public class HTMLParser extends HTMLEditorKit
{
    // getParser() is protected in HTMLEditorKit; this override makes it public
    public HTMLEditorKit.Parser getParser()
    {
        return super.getParser();
    }

    public static void main (String[] args)
    {
        try
        {
            Reader r = new FileReader("html_file.html");
            HTMLEditorKit.Parser parse = new HTMLParser().getParser();
            HTMLEditorKit.ParserCallback cb = new HTMLEditorKit.ParserCallback()
            {
                public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
                {
                    if (t == HTML.Tag.PRE)
                    {
                        // print what's between the pre tags
                    }
                }
                public void handleText(char[] data, int pos)
                {
                    // print what's between the pre tags
                }
            };

            parse.parse(r, cb, true);
        }
        catch (IOException e)
        {
            System.out.println(e);
        }
    }
}

I would appreciate it very much if someone could solve this problem for me.
I tried the Sun tutorial, but the examples aren't clear enough for me.

Thanks

ZOCOR
 
Nathan Zumwalt

I've never used this HTML Parser before, but I've done similar things
when scraping HTML off websites. My general solution is to:

1. Get the HTML as text (which you already have).
2. Run it through an HTML to XHTML cleanser (I like JTidy).
3. Parse the XHTML using Java's XML parsers.
4. Use XPath statements to get the values I want.

This probably isn't very efficient for getting small bits of data, but
it works.
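
Steps 3 and 4 above can be sketched roughly as follows. This is only an illustration under some assumptions: the JTidy step is skipped entirely, the input is assumed to already be well-formed XHTML with lowercase tags, no DOCTYPE and no namespace declaration, and the class and method names (XhtmlXPathSketch, preTexts) are made up for the example.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class XhtmlXPathSketch {

    // Returns the trimmed text of every <pre> element, in document order.
    // Assumes the input is already well-formed XHTML (e.g. cleaned beforehand).
    static List<String> preTexts(String xhtml) {
        try {
            // Step 3: parse the XHTML with the standard JAXP DOM parser.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Step 4: select the nodes of interest with an XPath expression.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//pre", doc, XPathConstants.NODESET);

            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // Parsing or XPath failure; wrapped so callers need no checked exceptions.
            throw new IllegalStateException("could not process XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title></title></head><body>"
                + "<pre>\nIdentifier: ABCDEFG\n</pre>"
                + "<pre>data: 123456</pre>"
                + "</body></html>";
        System.out.println(preTexts(xhtml));
    }
}
```

Real pages usually carry a DOCTYPE and an XHTML namespace, so the parser setup and the XPath expression would likely need adjusting for them.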

//Nathan
 
Paul Lutus

ZOCOR said:
> Hi
>
> I am using HTMLEditorKit.Parser class to parse a HTML file. However, I
> have found this Swing HTML parser extremely difficult to use.

Problem: "difficult".

> I am trying to parse a HTML file and extracting specific information from
> it into a table.

Problem: "trying".

> Consider the snippet of my HTML and the table I like it
> to generate:

You left out the table, the final goal of your program.

/ ...

> I would appreciate it very much if someone could solve this problem for
> me.

Which problem, "difficult" or "trying"? Children are both difficult and
trying, but this is not a specific complaint. Neither is yours.

Tell us what you wanted, what you got, and how they differ.

> I tried the sun tutortial, but the examples aren't that clear enough
> for me.

Clear enough to do what?
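
For the HTMLEditorKit.ParserCallback approach in the original question, one possible way to collect the text between PRE tags is sketched below. It is a guess at what was wanted, not a definitive answer: it uses ParserDelegator (the parser that HTMLEditorKit hands out by default) instead of subclassing the kit, and the names PreTextExtractor and extractPreText are invented for the example. The callback keeps a flag that is set on the PRE start tag and cleared on the PRE end tag, and only text seen while the flag is set is kept.

```java
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Swing.
class PreTextExtractor {

    // Returns the trimmed text of each non-empty PRE block, in document order.
    static List<String> extractPreText(Reader reader) {
        final List<String> blocks = new ArrayList<>();

        HTMLEditorKit.ParserCallback cb = new HTMLEditorKit.ParserCallback() {
            private boolean inPre = false;                       // true between <PRE> and </PRE>
            private StringBuilder current = new StringBuilder(); // text of the open PRE block

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.PRE) {
                    inPre = true;
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Text may arrive in several chunks, so append rather than assign.
                if (inPre) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.PRE && inPre) {
                    inPre = false;
                    String text = current.toString().trim();
                    if (!text.isEmpty()) {
                        blocks.add(text);
                    }
                }
            }
        };

        try {
            // true = ignore any charset directive found in the document
            new ParserDelegator().parse(reader, cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<HTML><BODY><PRE>Identifier: ABCDEFG</PRE>"
                + "<PRE>data: 123456</PRE></BODY></HTML>";
        // Joining the blocks with a space gives one table row per document.
        System.out.println(String.join(" ", extractPreText(new StringReader(html))));
    }
}
```

Stripping the "Identifier:" and "data:" labels to get just the values would be an extra string-handling step on each block.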
 
