What determines the character encoding for HTTP request variables?

Soefara · Feb 10, 2004

IIRC, servlet containers such as Tomcat take the bytes sent in HTTP
requests and decode them with a default character set (e.g. iso-8859-1)
when creating String objects. This might be the server's default
character set.
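
To illustrate the decoding step described above, here is a minimal sketch in plain Java (no servlet API). It assumes the container decoded the raw bytes as iso-8859-1, which maps every byte to exactly one character, so the original bytes can be recovered and decoded again. The class and method names (Latin1RoundTrip, redecode) are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1RoundTrip {
    // Hypothetical helper: undo an iso-8859-1 decode and decode the same
    // bytes again with the charset the client actually used.
    static String redecode(String misdecoded, Charset actual) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"
        // Simulate a client sending UTF-8 bytes...
        byte[] sent = original.getBytes(StandardCharsets.UTF_8);
        // ...and a container decoding them as iso-8859-1.
        String asSeenByWebapp = new String(sent, StandardCharsets.ISO_8859_1);
        System.out.println(original.equals(asSeenByWebapp));
        System.out.println(original.equals(redecode(asSeenByWebapp, StandardCharsets.UTF_8)));
    }
}
```

This only works when the first decode was lossless, as iso-8859-1 is. It still requires knowing the charset the client really used.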

To get around this, I have been using the SetCharacterEncodingFilter
filter which comes in Tomcat's 'examples' webapp to ensure that ALL
requests are treated as UTF-8.

However, I'm now building a webapp which third-party companies will
be submitting their forms to. It is highly likely that people in the
UK/USA would build their websites and forms using the standard
iso-8859-1 character set. It shouldn't be too much trouble if they
submit their forms to my webapp.

However, other people might use the BIG5 or GB2312 character sets when
building their sites and forms. What would happen if a BIG5 or GB2312
form were submitted (via a GET or a POST) to my webapp which, thanks
to the SetCharacterEncodingFilter filter, is treating all incoming
requests as UTF-8?

Surely it will result in data corruption? If so, as I fear, then how
do you get around this?
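
For reference, the scenario in question can be reproduced outside a servlet container. The sketch below is plain Java and only simulates what would happen: it encodes two Chinese characters as BIG5 bytes, then decodes those bytes as UTF-8, the way a filter forcing UTF-8 would. The class and method names are made up for this example, and it assumes the JVM ships the Big5 charset (standard JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Hypothetical helper: decode raw request bytes with a given charset.
    static String decodeAs(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5"); // assumed to be available
        String original = "\u4e2d\u6587";       // two Chinese characters
        byte[] submitted = original.getBytes(big5);

        // Decoding with the wrong charset: invalid UTF-8 sequences are
        // replaced with U+FFFD, so the original text is lost.
        String wrong = decodeAs(submitted, StandardCharsets.UTF_8);
        // Decoding with the charset the form actually used.
        String right = decodeAs(submitted, big5);

        System.out.println("UTF-8 decode matches original: " + original.equals(wrong));
        System.out.println("Big5 decode matches original:  " + original.equals(right));
    }
}
```

Once the bytes have been decoded as UTF-8 and replaced with U+FFFD, the original characters cannot be recovered from the resulting String.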

Thank you very much,

Soefara

