parse HTML

VitaminB

Hello,

I want to parse an HTML document to get the URLs of all frames in a
frameset. I get a NullPointerException at the System.out.println...

Thanks a lot for your help.

Regards,
Marcus


##################
Java Code:
##################

URL urlobj = new URL(str);

HttpURLConnection uc = null;
uc = (HttpURLConnection)urlobj.openConnection();
uc.setUseCaches(false);
DataInputStream is = new DataInputStream(uc.getInputStream());

HTMLEditorKit hKit = new HTMLEditorKit();
HTMLDocument hDoc = new HTMLDocument();
hKit.read(is, hDoc, 0);
HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);

AttributeSet attSet = it.getAttributes();
String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
System.out.println(s);





##################
Example HTML page:
##################

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<html>
<head>

<script language="JavaScript" type="text/javascript">
<!--
self._domino_name = "_Main";
// -->
</script>
</head>

<frameset cols="45%,55%">

<frame
src="/Test/HET/PerformanceTestDB.nsf/ContentDeliveryMeasurement?OpenForm">


<frameset rows="1*,1*">

<frame src="/Test/HET/PerformanceTestDB.nsf/DocsInserted?OpenView">

<frame name="docPreviewFrame"
src="/Test/HET/PerformanceTestDB.nsf/select?OpenForm">
</frameset>
</frameset>
</html>
 
Amfur Kilnem

VitaminB said:
AttributeSet attSet = it.getAttributes();
String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
System.out.println(s);

attSet.getAttribute must've returned null.
 
Oliver Wong

VitaminB said:
I want to parse an HTML document to get the URLs of all frames in a
frameset. I get a NullPointerException at the System.out.println... [...]

##################
Java Code:
##################

URL urlobj = new URL(str);

HttpURLConnection uc = null;
uc = (HttpURLConnection)urlobj.openConnection();
uc.setUseCaches(false);
DataInputStream is = new DataInputStream(uc.getInputStream());

HTMLEditorKit hKit = new HTMLEditorKit();
HTMLDocument hDoc = new HTMLDocument();
hKit.read(is, hDoc, 0);
HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);

AttributeSet attSet = it.getAttributes();
String s = (String)attSet.getAttribute(HTML.Attribute.SRC);
System.out.println(s);

I don't see how you could have gotten an NPE from the
System.out.println statement. Are you sure you didn't get it from the line
above, or possibly somewhere else? See the section titled "If you get an
error message, repeat it exactly." at
http://riters.com/JINX/index.cgi/Suggestions_20for_20Asking_20Questions_20on_20Newsgroups

- Oliver
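
For what it's worth, println accepts a null String and simply prints "null", so a more likely source of the NPE is the line above it: as far as I can tell, getAttributes() returns null when the iterator has no current FRAME tag, and calling getAttribute on that null reference throws. Below is a minimal sketch of a guarded loop over all frames. The class name FrameSources and the method list are made up for illustration, and setting the "IgnoreCharsetDirective" property is an addition that is not in the posted code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper; the names are not from the thread.
final class FrameSources {

    // Returns the src attribute of every <frame> in the HTML read from 'in'.
    static List<String> list(Reader in) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = new HTMLDocument();
        // Assumption: charset declarations in <meta> tags can be ignored here.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(in, doc, 0);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        List<String> sources = new ArrayList<>();
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.FRAME);
        // Only touch the attributes while the iterator points at a tag.
        for (; it != null && it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object src = (attrs == null) ? null : attrs.getAttribute(HTML.Attribute.SRC);
            if (src != null) {
                sources.add(src.toString());
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        // A page without frames yields an empty list instead of an NPE.
        String html = "<html><body><p>No frames here</p></body></html>";
        System.out.println(list(new StringReader(html)));
    }
}
```

If the list comes back empty for the real URL, it may be worth printing the raw response first to check that the server actually returned the frameset page.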
 
VitaminB

OK, I have now reworked my code and get another exception, and again I
don't know why.

Here is the failure:
javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(DocumentParser.java:198)
    at javax.swing.text.html.parser.Parser.startTag(Parser.java:401)
    at javax.swing.text.html.parser.Parser.parseTag(Parser.java:1875)
    at javax.swing.text.html.parser.Parser.parseContent(Parser.java:1910)
    at javax.swing.text.html.parser.Parser.parse(Parser.java:2076)
    at javax.swing.text.html.parser.DocumentParser.parse(DocumentParser.java:135)
    at javax.swing.text.html.parser.ParserDelegator.parse(ParserDelegator.java:107)
    at javax.swing.text.html.HTMLEditorKit.read(HTMLEditorKit.java:262)
    at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:163)
    at Stress.urlRequest(Stress.java:76)
    at Stress.run(Stress.java:40)



Here is the code:

public long[] urlRequest(String str) {
    Cal starttime = new Cal();
    long[] read = new long[2];
    try {
        int c = 0;
        byte[] rc = new byte[1024];
        URL urlobj = new URL(str);

        HTTPRequest request = new HTTPRequest(str, user, pass);
        DataInputStream is = new DataInputStream(request.get());

        HTMLEditorKit hKit = new HTMLEditorKit();
        HTMLDocument hDoc = new HTMLDocument();
        hKit.read(is, hDoc, 0);

        HTMLDocument.Iterator it = hDoc.getIterator(HTML.Tag.FRAME);
        it.next();
        AttributeSet attSet = it.getAttributes();
        String s = (String) attSet.getAttribute(HTML.Attribute.SRC);
        System.out.println(s);

        //System.out.println(attSet.getAttributeCount());

        while ((c = is.read(rc)) != -1) {
            read[0] = read[0] + c;
        }
        Cal endtime = new Cal();
        read[1] = endtime.getTimeInMillis() - starttime.getTimeInMillis();
        return read;
    }
    catch (Exception e) {
        e.printStackTrace();
    }
    return read;
}

}
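
A note on the exception above: the Swing parser throws ChangedCharSetException when it meets a <meta> tag that declares a charset and it has not been told to ignore such declarations. If only the frame URLs are needed, one option is to skip HTMLDocument and drive ParserDelegator directly, passing true for its ignoreCharSet argument. The sketch below assumes that approach; FrameSrcCollector and collect are hypothetical names, not part of the posted code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback that records the src of each <frame> tag it sees.
final class FrameSrcCollector extends HTMLEditorKit.ParserCallback {
    private final List<String> sources = new ArrayList<>();

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        record(tag, attrs);
    }

    // Also covered in case the parser reports the tag as a start tag.
    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        record(tag, attrs);
    }

    private void record(HTML.Tag tag, MutableAttributeSet attrs) {
        if (tag == HTML.Tag.FRAME) {
            Object src = attrs.getAttribute(HTML.Attribute.SRC);
            if (src != null) {
                sources.add(src.toString());
            }
        }
    }

    // Parses the HTML from 'in' and returns all frame sources in document order.
    static List<String> collect(Reader in) {
        FrameSrcCollector collector = new FrameSrcCollector();
        try {
            // true = ignore charset declarations instead of throwing
            new ParserDelegator().parse(in, collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.sources;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
                + "</head><frameset cols=\"45%,55%\">"
                + "<frame src=\"/left\"><frame src=\"/right\">"
                + "</frameset></html>";
        for (String src : collect(new StringReader(html))) {
            System.out.println(src);
        }
    }
}
```

In urlRequest the Reader could be an InputStreamReader around the stream from request.get(), assuming that method returns an InputStream as the posted code suggests. Note also that parsing consumes the stream, so a read loop placed after it will see end-of-stream right away and count 0 bytes.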