What determines the character encoding for HTTP request variables?

Soefara

IIRC, servlet containers such as Tomcat take the bytes sent in HTTP
requests and decode them with a default character set (e.g. ISO-8859-1)
when creating String objects. This might be the server's default character set.


To get around this, I have been using the SetCharacterEncodingFilter
that ships with Tomcat's 'examples' webapp to ensure that ALL
requests are treated as UTF-8.


However, I'm now building a webapp which third-party companies will
be submitting their forms to. It is highly likely that people in the
UK/USA would build their websites and forms using the standard ISO-8859-1
character set. It shouldn't be too much trouble if they submit
their forms to my webapp.

However, other people might use the BIG5 or GB2312 character sets when
building their sites and forms. What would happen if a BIG5 or GB2312
form were submitted (via a GET or a POST) to my webapp which, thanks
to the SetCharacterEncodingFilter, is treating all incoming
requests as UTF-8?


Surely it will result in data corruption? If so, as I fear, how
do you get around this?
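
A minimal sketch of the mismatch described above, using only the JDK's
URLEncoder and URLDecoder to stand in for the browser and the servlet
container. The class and method names are hypothetical, and ISO-8859-1 is
used in place of BIG5 or GB2312 only because every JVM is guaranteed to
support it; the same kind of mismatch would apply to any form charset that
differs from the one the server assumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: simulates a form submitted in one charset
// and decoded by the server in another.
class CharsetMismatchDemo {

    // Percent-encode the text as a browser would for a page in formCharset,
    // then decode it as a server would when it assumes serverCharset.
    public static String submitAndDecode(String text, String formCharset, String serverCharset) {
        try {
            String onTheWire = URLEncoder.encode(text, formCharset);
            return URLDecoder.decode(onTheWire, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset", e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Same charset on both sides: the value survives the round trip.
        System.out.println(submitAndDecode(original, "UTF-8", "UTF-8").equals(original)); // true

        // ISO-8859-1 form decoded as UTF-8: the accented character is lost.
        System.out.println(submitAndDecode(original, "ISO-8859-1", "UTF-8").equals(original)); // false
    }
}
```

In the second case the single byte 0xE9 is not a valid UTF-8 sequence, so the
decoder substitutes the replacement character U+FFFD and the original
character cannot be recovered afterwards.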

Thank you very much,

Soefara
 
