Do URLDecoder and URLEncoder have a problem with Big5 encoding?

carfield

It seems to me that URLDecoder sometimes can't decode a string that was
encoded by URLEncoder, and sometimes it can. Does anybody know more
about this?
 
Roland

Do you have an example of a string that fails to decode (after being
encoded)?

[Since the single argument methods of URLEncoder/URLDecoder are
deprecated, you should always supply the character encoding as second
parameter to the encode/decode method.]
--
Regards,

Roland de Ruiter
 
Carfield Yim

It turns out this is not actually a Java encoder problem. Sorry about
that.

The scenario is a Java servlet webapp that talks to Apache through
mod_proxy. The application builds the URI dynamically in Java with
URLEncoder, based on file names on the server, and some of the file
names contain multibyte characters, such as Big5 Chinese characters.

However, Apache tries to decode the URI. Sometimes it decodes it
correctly, but sometimes it fails to produce the correct result.

Although it is off-topic, do you know how to tell Apache to stop
decoding the URI so that I can handle it in Java?
 
Roland

OK, no matter; you have tracked down the source of the problem.

As for telling Apache to stop decoding the URI: I don't know. Maybe ask
in alt.apache.configuration (it seems to be a newsgroup with quite some
traffic).
--
Regards,

Roland de Ruiter
 
John C. Bollinger

I don't know of any way to make Apache stop decoding the URI, but you
might be able to get Apache to assume a different charset for the
decoded data. On the other hand,
you would probably be better off handling it on the originating side by
following the applicable W3C recommendation, which is summarized in the
API docs for URLEncoder.encode(String, String): "Note: The World Wide
Web Consortium Recommendation states that UTF-8 should be used. Not
doing so may introduce incompatibilities". UTF-8 can encode any Unicode
character, including all those covered by the Big5 charset. All you
need to do is specify "UTF-8" as the second parameter to the encoder's
encode method. (If you are currently using the one-arg version then you
may also need to insert a catch block for UnsupportedEncodingException,
but this will never be exercised because all Java implementations are
required to support UTF-8.)
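
The advice above can be sketched roughly as follows. This is only an illustration, not code from the application under discussion: the class name, the helper names and the sample file name are made up. It uses the two-argument encode/decode methods with "UTF-8" and wraps the checked exception that should never actually occur.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class; the names are for illustration only.
class Utf8UrlRoundTrip {

    // Encode with the two-argument method, always naming the charset.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java implementation must support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Decode with the same charset that was used for encoding.
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample file name (made up): two Chinese characters plus ".txt".
        String fileName = "\u4e2d\u6587.txt";
        String encoded = encodeUtf8(fileName);
        System.out.println(encoded);                              // %E4%B8%AD%E6%96%87.txt
        System.out.println(fileName.equals(decodeUtf8(encoded))); // true
    }
}
```

Note that URLEncoder produces form-style encoding (a space becomes "+"), so whether its output is suitable for every part of a URI depends on how the receiving side interprets it.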
 
carfield

I use a non-UTF-8 encoding because, if the user enters the URI by hand
in old IE, it is sent as Big5 rather than UTF-8. However, I just
checked the default setting on a newly installed XP, and it defaults to
UTF-8. Maybe I should support both the platform-specific encoding and
UTF-8, and use whichever one works.
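
One way the "support both and use whichever works" idea could look is sketched below. It is a heuristic under stated assumptions, not a recommendation from the thread: the class and method names are hypothetical, the input is assumed to be plain ASCII with all non-ASCII bytes percent-escaped, and the fallback charset (for example Big5, where the JVM provides it) is passed in by the caller. The bytes are first tried as strict UTF-8, and the fallback is used only if that fails.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes the input is fully percent-encoded ASCII.
class FallbackUrlDecoder {

    static String decode(String encoded, Charset fallback) {
        byte[] raw;
        try {
            // ISO-8859-1 maps each byte to the char with the same value,
            // so decoding and re-encoding with it recovers the raw bytes.
            raw = URLDecoder.decode(encoded, "ISO-8859-1")
                            .getBytes(StandardCharsets.ISO_8859_1);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always available
        }
        try {
            // Strict decoding: fail instead of substituting replacement chars.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: interpret the bytes in the fallback charset.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        // Big5 is an optional charset, so check before using it.
        if (Charset.isSupported("Big5")) {
            Charset big5 = Charset.forName("Big5");
            System.out.println(decode("%E4%B8%AD%E6%96%87", big5)); // valid UTF-8
            System.out.println(decode("%A4%A4%A4%E5", big5));       // not UTF-8, falls back
        }
    }
}
```

The guess can be wrong: some Big5 byte pairs also happen to form valid UTF-8 sequences, so a sketch like this only reduces the problem and does not eliminate it.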
 
John C. Bollinger

You are missing the point. By the time you have your hands on the
parameter inside the servlet, it is a String, and thus devoid of any
charset encumbrances.* If you pass that String on to something else,
such as a DB, you have both full control and full responsibility for
selecting an appropriate charset with which to encode it. UTF-8 always
works, for any characters, but Big5 doesn't. You must in any case
always decode with the same charset that you encoded with.
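
A small sketch of why the charsets have to match, for illustration only: the class name, the helper and the sample text are made up, and the Big5 part runs only if the JVM actually provides that charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical demo class; the names are for illustration only.
class CharsetMismatchDemo {

    // Encode with one charset, then decode with another (or the same) one.
    static String roundTrip(String text, String encodeWith, String decodeWith) {
        try {
            String encoded = URLEncoder.encode(text, encodeWith);
            return URLDecoder.decode(encoded, decodeWith);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available", e);
        }
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587"; // sample text: two Chinese characters

        // Same charset on both sides: the text survives.
        System.out.println(name.equals(roundTrip(name, "UTF-8", "UTF-8"))); // true

        // Big5 is an optional charset, so check before using it.
        if (Charset.isSupported("Big5")) {
            System.out.println(name.equals(roundTrip(name, "Big5", "Big5")));  // true
            // Mismatch: Big5 bytes read as UTF-8 do not give the text back.
            System.out.println(name.equals(roundTrip(name, "Big5", "UTF-8"))); // false
        }
    }
}
```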

(*) Note that you have a potential problem in that the client must
correctly specify the charset(s) used to encode the request. (And the
servlet container must use that information correctly to decode the
request.) This is largely outside your control, but you can influence
it by the structure of your HTML interface. For instance, HTML 4 forms
support an optional attribute (accept-charset) with which you can
specify which charsets the server is willing to accept for encoding the
form data.
 
