Can't decode UTF-8

Gary Thomas · Jul 30, 2003

Hi,

This is driving me nuts, could anyone assist? I am passing a 5-character
(Chinese) Unicode string as a GET parameter to a servlet. The URL
encoding looks fine; I've verified that the hex below corresponds to the
UTF-8 character values:

id=%E4%BA%B8%E4%BA%B4%E4%BA%B0%E4%BA%A9%E4%BA%A3

However, my servlet does not seem to be able to decode this back into a
proper Java String (i.e. 5 characters). Code snippet:

....
String id = request.getParameter("id");
char[] c = id.toCharArray();
logger.error("BEFORE: " + id + " - # of chars: " + c.length);

id = URLDecoder.decode(id, "UTF-8");
c = id.toCharArray();
logger.error("AFTER: " + id + " - " + c.length);
....

logs then show:

....
BEFORE: äº¸äº´äº°äº©äº£ - # of chars: 15
AFTER: äº¸äº´äº°äº©äº£ - # of chars: 15
....

So obviously, it looks like the String was not decoded from UTF-8
properly. However, if I view the logs with an editor that reads UTF-8,
the 15 characters above show as the correct 5 Chinese characters, so the
original UTF-8 does not seem to be incorrect.

Am I missing something obvious? This seems so simple, but I just can't
get it to work... I'm using JDK 1.4.2.

Thanks,

Gary
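
A minimal sketch that reproduces the numbers in the logs above, offered as an illustration rather than a diagnosis. It assumes the container had already percent-decoded the query string using ISO-8859-1 before getParameter() returned; that would match the 15-character result, but the post does not confirm it. The class name QueryDecodingSketch and the helper percentDecode are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodingSketch {
    // The raw, still percent-encoded parameter value from the post.
    static final String RAW = "%E4%BA%B8%E4%BA%B4%E4%BA%B0%E4%BA%A9%E4%BA%A3";

    // Hypothetical helper: percent-decode with a named charset and
    // turn the checked exception into an unchecked one.
    static String percentDecode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Decoding the raw bytes as UTF-8 gives the intended 5 characters.
        String asUtf8 = percentDecode(RAW, "UTF-8");
        System.out.println(asUtf8.length());        // 5

        // Decoding the same bytes as ISO-8859-1 gives one char per byte.
        String asLatin1 = percentDecode(RAW, "ISO-8859-1");
        System.out.println(asLatin1.length());      // 15

        // A second decode changes nothing: the string no longer
        // contains any '%' or '+' for URLDecoder to act on.
        String again = percentDecode(asLatin1, "UTF-8");
        System.out.println(again.equals(asLatin1)); // true
    }
}
```

Under that assumption, the BEFORE and AFTER log lines are identical because the percent-escapes were already consumed by the time URLDecoder.decode() ran.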

Manish Jethani · Jul 30, 2003

Gary said:
This is driving me nuts, could anyone assist?

It has driven me nuts in the past.

Gary said:
I am passing a 5-character (Chinese) Unicode string as a GET parameter
to a servlet. The URL encoding looks fine; I've verified that the hex
below corresponds to the UTF-8 character values:

id=%E4%BA%B8%E4%BA%B4%E4%BA%B0%E4%BA%A9%E4%BA%A3

However, my servlet does not seem to be able to decode this back into a
proper Java String (i.e. 5 characters). Code snippet:

...
String id = request.getParameter("id");
char[] c = id.toCharArray();
logger.error("BEFORE: " + id + " - # of chars: " + c.length);

You need to set the character encoding in the request object.

request.setCharacterEncoding("UTF-8");

Either you do this in code (above), or set this in the config
files of your servlet container.

If you set it in the code, then make sure this is done before
calling any getParameter() methods. So it's best set at the
beginning of your doGet() and doPost().

Gary said:
id = URLDecoder.decode(id, "UTF-8");
c = id.toCharArray();
logger.error("AFTER: " + id + " - " + c.length);
...

This is redundant. There's no need to decode().

One more thing: if you're converting from a String object to a
byte[] array, and vice versa, you need to specify the encoding
explicitly in the String constructor and getBytes()
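
A short sketch of that last point, naming the encoding on both sides of the conversion. The class and method names are hypothetical. It uses java.nio.charset.StandardCharsets, which requires Java 7 or later; on JDK 1.4.2, as used in this thread, the same calls would take the charset name "UTF-8" as a String and declare UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharsetSketch {
    // String -> byte[] with the encoding named explicitly,
    // instead of relying on the platform default.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // byte[] -> String with the same explicit encoding.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first two characters of the id parameter in this thread.
        String original = "\u4eb8\u4eb4";
        byte[] encoded = toUtf8(original);
        System.out.println(encoded.length);                     // 6
        System.out.println(fromUtf8(encoded).equals(original)); // true
    }
}
```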

HTH,
Manish

Gary Thomas · Jul 30, 2003

Manish said:
You need to set the character encoding in the request object.

request.setCharacterEncoding("UTF-8");

Either you do this in code (above), or set this in the config
files of your servlet container.

If you set it in the code, then make sure this is done before
calling any getParameter() methods. So it's best set at the
beginning of your doGet() and doPost()

Thanks for the reply, I'm still confused though. I have been calling
request.setCharacterEncoding("UTF-8") in my request processor all along,
and it seems to be setting it correctly, but the parameter is not being
decoded. You can see below that the request encoding is correct:

Code snippet:

....
logger.error("Encoding: " + request.getCharacterEncoding());
String id = request.getParameter("id");
char[] c = id.toCharArray();
logger.error("BEFORE: " + id + " - # of chars: " + c.length);

id = new String(id.getBytes(), "UTF-8");
c = id.toCharArray();
logger.error("AFTER: " + id + " - # of chars: " + c.length);
....

Logs show:
....
Encoding: UTF-8
BEFORE: äº¸äº´äº°äº©äº£ - # of chars: 15
AFTER: ????? - # of chars: 5
....

As you can see though, 'new String(id.getBytes(), "UTF-8")' works
correctly. I also tried 'request.setCharacterEncoding("UTF-8")' in the
code above, but to no avail.

Are there any problems with using 'new String(id.getBytes(), "UTF-8")' as
a workaround?
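
One caveat with that workaround, sketched below: id.getBytes() with no argument uses the platform default charset, so the result can differ between machines. The sketch names both charsets explicitly. It assumes the container decoded the parameter as ISO-8859-1, which would match the 15-character result but is not confirmed in this thread. The class and method names are hypothetical, and StandardCharsets needs Java 7 or later (on JDK 1.4.2, pass the charset names as Strings).

```java
import java.nio.charset.StandardCharsets;

class RepairSketch {
    // Simulates the suspected fault: UTF-8 bytes read as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses it: recover the original bytes, then decode as UTF-8.
    // Both charsets are explicit, so the platform default plays no part.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4eb8\u4eb4\u4eb0\u4ea9\u4ea3";
        String garbled = misdecode(original);
        System.out.println(garbled.length());                  // 15
        System.out.println(repair(garbled).length());          // 5
        System.out.println(repair(garbled).equals(original));  // true
    }
}
```

Two hedged observations. The "?????" in the AFTER line may come from the log output encoding being unable to represent the characters, rather than from the String itself, since the count is the expected 5. And setCharacterEncoding() is documented in the Servlet API as overriding the encoding of the request body, so a container may decode the query string of a GET through its own configuration instead, which could explain why the call appears to have no effect here.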

I should also mention that I'm using the Struts framework, but this
shouldn't have an effect on the code above, correct?

Many Thanks,

Gary

Dave Miller · Jul 31, 2003

In article <G7HVa.23857$Bp2.380@fed1read07>, (e-mail address removed) says...

Strings are immutable.

try -

String id = URLDecoder.decode(request.getParameter("id"), "UTF-8");

DM

Illya Kysil · Jul 31, 2003

Take a look @ http://www.anassina.com/struts/i18n/i18n.html
i18n with Struts tutorial

Gary Thomas · Aug 1, 2003

Thank you for the link Illya.

- Gary
