Gary Thomas
Hi,
This is driving me nuts, could anyone assist? I am passing a
5-character (Chinese) Unicode string as a GET parameter to a servlet. The
URL encoding looks fine; I've verified that the hex below corresponds to the
UTF-8 byte values of the characters:
id=%E4%BA%B8%E4%BA%B4%E4%BA%B0%E4%BA%A9%E4%BA%A3
However, my servlet does not seem to be able to decode this back into a
proper Java String (i.e. 5 characters). Code snippet:
....
String id = request.getParameter("id");
char[] c = id.toCharArray();
logger.error("BEFORE: " + id + " - # of chars: " + c.length);
id = URLDecoder.decode(id, "UTF-8");
c = id.toCharArray();
logger.error("AFTER: " + id + " - " + c.length);
....
logs then show:
....
BEFORE: äº¸äº´äº°äº©äº£ - # of chars: 15
AFTER: äº¸äº´äº°äº©äº£ - # of chars: 15
....
So it looks like the String was not decoded from UTF-8 properly: each of
the 15 bytes has become its own char. However, if I view the logs with an
editor that reads UTF-8, the 15 characters above show as the correct 5
Chinese characters, so the original UTF-8 bytes do not seem to be incorrect.
Am I missing something obvious? This seems so simple, but I just can't
get it to work... I'm using JDK 1.4.2.
Thanks,
Gary
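A possible explanation, offered as a sketch rather than a confirmed diagnosis: by the time getParameter() returns, the container has already percent-decoded the query string, and many containers default to ISO-8859-1 for that step. That would turn each of the 15 UTF-8 bytes into one char, and URLDecoder.decode() would then change nothing because no % escapes are left. Under that assumption, the usual workaround is to turn the chars back into bytes with ISO-8859-1 and decode those bytes as UTF-8. The class and method names below are hypothetical, and the container's behaviour is simulated with a byte array instead of a real request:

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; not part of the servlet in the post.
final class ParamRecode {

    // Assumes the input was produced by decoding UTF-8 bytes as ISO-8859-1.
    // ISO-8859-1 maps every byte to exactly one char, so the original bytes
    // can be recovered losslessly and then decoded with the right charset.
    public static String recodeLatin1AsUtf8(String misdecoded)
            throws UnsupportedEncodingException {
        byte[] raw = misdecoded.getBytes("ISO-8859-1");
        return new String(raw, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The 15 bytes from id=%E4%BA%B8%E4%BA%B4%E4%BA%B0%E4%BA%A9%E4%BA%A3
        byte[] utf8 = {
            (byte) 0xE4, (byte) 0xBA, (byte) 0xB8,
            (byte) 0xE4, (byte) 0xBA, (byte) 0xB4,
            (byte) 0xE4, (byte) 0xBA, (byte) 0xB0,
            (byte) 0xE4, (byte) 0xBA, (byte) 0xA9,
            (byte) 0xE4, (byte) 0xBA, (byte) 0xA3
        };

        // Simulates what getParameter("id") would return if the container
        // decoded the query string as ISO-8859-1 (an assumption).
        String asReceived = new String(utf8, "ISO-8859-1");
        String fixed = recodeLatin1AsUtf8(asReceived);

        System.out.println(asReceived.length()); // prints 15
        System.out.println(fixed.length());      // prints 5
    }
}
```

In the servlet this would be applied to the value returned by request.getParameter("id") in place of the URLDecoder.decode() call. If the container is Tomcat, setting the URIEncoding attribute on the Connector to UTF-8 is the configuration-level alternative; other containers have their own settings, so check the documentation for yours.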