HTMLEditorKit is throwing exception

Mike Mimic

Hi!

What is wrong with this code:

HTMLDocument html = new HTMLDocument();
try {
    HTMLEditorKit kit = new HTMLEditorKit();
    URL url = new URL("http://www.google.com/");
    kit.read(new BufferedReader(new InputStreamReader(url.openStream())), html, 0);
}
catch (Exception e) {
    System.err.println("Error");
}

An exception is thrown every time. (Yes, I am connected to the Internet,
and I can read from the reader: I have tried reading directly from new
BufferedReader(new InputStreamReader(url.openStream())) and it works.)


Mike
 
Bryce (Work)

Mike said:
Exception is thrown every time.

What's the Exception? Just printing "error" doesn't tell you much. Try
e.printStackTrace();
 
Mike Mimic

Hi!

Bryce said:
What's the Exception? Just printing "error" doesn't tell you much. Try
e.printStackTrace();

Good idea. :)

I had been printing the exception's message, but it was null.

Here it is:

javax.swing.text.ChangedCharSetException
at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(Unknown Source)
at javax.swing.text.html.parser.Parser.startTag(Unknown Source)
at javax.swing.text.html.parser.Parser.parseTag(Unknown Source)
at javax.swing.text.html.parser.Parser.parseContent(Unknown Source)
at javax.swing.text.html.parser.Parser.parse(Unknown Source)
at javax.swing.text.html.parser.DocumentParser.parse(Unknown Source)
at javax.swing.text.html.parser.ParserDelegator.parse(Unknown Source)
at javax.swing.text.html.HTMLEditorKit.read(Unknown Source)
at test.main(test.java:18)

Does not tell me much. Empty tag?


Mike
 
Chris Smith

Mike said:
Here it is:

javax.swing.text.ChangedCharSetException

Mike,

When you provide a Reader to HTMLEditorKit to retrieve an HTML page,
you're giving it a character stream that isn't encoded at all. If it
encounters a meta tag in the HTML source that tries to change the
encoding, it will therefore throw an exception, because it has no way to
honor that request.

You can set a property on the document, before calling read, to tell the
parser to ignore the charset directive. The code looks like this:

doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
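
As a minimal sketch of where that call goes, here is a self-contained version that runs offline. The class name, the helper methods and the sample HTML string are made up for illustration, and a StringReader stands in for the URL stream from the original post. It parses the same page twice: once without the property, which should fail with ChangedCharSetException because of the meta tag, and once with it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class IgnoreCharsetExample {

    // Hypothetical sample page; the meta tag is what triggers the exception.
    static final String SAMPLE_HTML =
        "<html><head>"
        + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">"
        + "</head><body><p>Hello</p></body></html>";

    // Parses the HTML string, optionally telling the parser to skip charset directives.
    static HTMLDocument parse(String html, boolean ignoreCharset)
            throws IOException, BadLocationException {
        HTMLDocument doc = new HTMLDocument();
        if (ignoreCharset) {
            // Set before read(): the kit looks the property up when parsing starts.
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        }
        new HTMLEditorKit().read(new StringReader(html), doc, 0);
        return doc;
    }

    // Returns true if parsing without the property fails with ChangedCharSetException.
    static boolean failsWithoutProperty(String html) throws Exception {
        try {
            parse(html, false);
            return false;
        } catch (ChangedCharSetException e) {
            return true;
        }
    }

    // Returns the plain text content of a parsed document.
    static String textOf(HTMLDocument doc) throws BadLocationException {
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fails without property: " + failsWithoutProperty(SAMPLE_HTML));
        System.out.println("parsed text: " + textOf(parse(SAMPLE_HTML, true)).trim());
    }
}
```

With a real URL, the StringReader would be replaced by the InputStreamReader from the original code; the putProperty call stays in the same place.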

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Mike Mimic

Hi!

Chris said:
When you provide a Reader to HTMLEditorKit to retrieve an HTML page,
you're giving it a character stream that isn't encoded at all.

Encoded to what?

How can I then encode stream so that HTMLEditorKit will be able to
change the encoding?


Mike
 
Chris Smith

Mike said:
Encoded to what?

How can I then encode stream so that HTMLEditorKit will be able to
change the encoding?

Though I haven't tried it, I believe you may have better luck if you
used InputStream instead of Reader. Alternatively, just set that
property.

 
Mike Mimic

Hi!

Chris said:
Though I haven't tried it, I believe you may have better luck if you
used InputStream instead of Reader. Alternatively, just set that
property.

I tried using only a BufferedInputStream (wrapping the InputStream), but the result was the same.


Mike
 
Chris Smith

Mike said:
I tried using only a BufferedInputStream (wrapping the InputStream), but the result was the same.

Yep, after looking into it, turns out that the default parser just gives
up anyway, rather than switching encodings. Set that property and avoid
the whole mess.
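
If the declared encoding actually matters for your pages, one possible workaround is to catch the exception yourself and parse a second time. This is only a sketch under assumptions: the class and method names are hypothetical, the page is assumed to be already buffered as a byte array so it can be decoded twice, and the charset parsing is deliberately simple. It uses ChangedCharSetException.getCharSetSpec() and keyEqualsCharSet() to find the declared charset, then re-reads with the directive ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.ChangedCharSetException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class CharsetRetryExample {

    // Pulls the charset out of a spec such as "text/html; charset=UTF-8".
    // When keyEqualsCharSet() is true, the spec is already the bare charset name.
    static Charset charsetFrom(ChangedCharSetException e, Charset fallback) {
        String name = e.getCharSetSpec();
        if (!e.keyEqualsCharSet()) {
            int i = name.toLowerCase().indexOf("charset=");
            if (i < 0) {
                return fallback;
            }
            name = name.substring(i + "charset=".length());
            int semi = name.indexOf(';');
            if (semi >= 0) {
                name = name.substring(0, semi);
            }
        }
        try {
            return Charset.forName(name.trim().replace("\"", ""));
        } catch (IllegalArgumentException ex) {
            return fallback; // illegal or unsupported charset name
        }
    }

    // First pass uses a guessed charset; on a charset directive, decode again
    // with the declared charset and tell the parser to ignore the directive.
    static HTMLDocument read(byte[] page, Charset firstGuess)
            throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        try {
            HTMLDocument doc = new HTMLDocument();
            kit.read(new InputStreamReader(new ByteArrayInputStream(page), firstGuess), doc, 0);
            return doc;
        } catch (ChangedCharSetException e) {
            Charset declared = charsetFrom(e, firstGuess);
            HTMLDocument doc = new HTMLDocument();
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            kit.read(new InputStreamReader(new ByteArrayInputStream(page), declared), doc, 0);
            return doc;
        }
    }

    // Returns the plain text content of a parsed document.
    static String textOf(HTMLDocument doc) throws BadLocationException {
        return doc.getText(0, doc.getLength());
    }

    // Hypothetical UTF-8 page used for the demo.
    static byte[] samplePage() {
        String html = "<html><head>"
            + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
            + "</head><body><p>caf\u00e9</p></body></html>";
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        HTMLDocument doc = read(samplePage(), StandardCharsets.ISO_8859_1);
        System.out.println(textOf(doc).trim());
    }
}
```

For a page fetched over the network, the byte array would have to be filled from url.openStream() first, since the stream itself cannot be rewound. If the encoding does not matter to you, setting the property alone is simpler.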

 
Mike Mimic

Hi!

Chris said:
Yep, after looking into it, turns out that the default parser just gives
up anyway, rather than switching encodings. Set that property and avoid
the whole mess.

I have. It works now. Thanks.


Mike
 