Charset conversion question

djthomp · Feb 5, 2007

To put it simply, I need some help dealing with the 'smart' character
corrections that Word automatically performs (quotes, hyphens,
fractions, etc.), specifically after the text has been copied from Word
and pasted into my web form.

I am working on a project using JSP that has a web form with a few
essay questions. Because of the nature of the form (a scholarship
application form), it is very often filled out by applicants who are
writing their essays in Word and cutting and pasting them into the
form. Often those essays contain quotes, other punctuation, or special
characters that Word has changed from plain ASCII into something else.
This causes me problems both with character counts and with corrupted
data after the form is submitted (each modified character shows up as
2-3 garbage characters).
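The "2-3 garbage characters" pattern is typical of a charset mismatch: the browser sends the text as UTF-8, where each curly quote or dash takes three bytes, and the server decodes those bytes one at a time as ISO-8859-1. The sketch below reproduces that assumed mismatch with plain JDK classes (the class and method names are made up for illustration) and also shows why the character counts come out wrong.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates the assumed mismatch: the browser encodes the text as UTF-8,
    // but the server decodes the resulting bytes as ISO-8859-1.
    static String misdecode(String typed) {
        byte[] sent = typed.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Word's "smart" punctuation: curly double quotes and an en dash.
        String typed = "\u201Cquoted\u201D \u2013 text";
        String received = misdecode(typed);
        System.out.println(typed.length());    // 15
        System.out.println(received.length()); // 21: each smart character became 3
    }
}
```

Plain ASCII passes through unchanged, which is why only the Word-corrected characters appear damaged.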

I've tried to find a solution using the various String methods that
take a character set name (public byte[] getBytes(String charsetName)
and public String(byte[] bytes, String charsetName) in particular), but
either it's the wrong approach, I'm not doing it right, or I haven't
found the right charsetName yet. I have not yet tried CharsetEncoder or
CharsetDecoder, as I'm a little uncertain where to begin with them.
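With getBytes and new String(...), one combination that can work is reversing the specific mix-up: re-encode the garbled String as ISO-8859-1 to recover the original bytes, then decode those bytes as UTF-8. This is a minimal sketch that assumes the damage was exactly UTF-8 read as ISO-8859-1 (another pair, such as windows-1252, may not round-trip cleanly); the names are hypothetical. The strict variant uses a CharsetDecoder set to REPORT errors, so text that was never garbled (for example a genuine Latin-1 "café") raises an exception instead of being silently corrupted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class RepairDemo {
    // Undoes a UTF-8-read-as-ISO-8859-1 mix-up. ISO-8859-1 maps each of the
    // 256 byte values to exactly one char, so getBytes recovers the raw bytes.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8); // invalid bytes become U+FFFD
    }

    // Strict variant: throws instead of inserting U+FFFD when the recovered
    // bytes are not valid UTF-8.
    static String repairStrict(String garbled) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String garbled = "\u00E2\u0080\u009Cquoted\u00E2\u0080\u009D";
        System.out.println(repair(garbled).equals("\u201Cquoted\u201D")); // true
        System.out.println(repairStrict(garbled).length());               // 8
    }
}
```

This only repairs data that already arrived damaged; it is safest to apply when you know the mismatch happened.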

I do have a working fix for this, but since it consists of a loop
through the string that finds and fixes the problem spots with some
hard-coded comparisons, I really don't believe it's a good long-term
solution.
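A loop of the kind described above might look something like this hypothetical sketch, which maps a handful of Word's auto-corrected characters to ASCII stand-ins. The table is an assumption, not the poster's actual code, and it only helps once the text has been decoded correctly; applied to already-garbled input it finds nothing to replace.

```java
import java.util.HashMap;
import java.util.Map;

public class SmartPunctuation {
    // Hypothetical table of common Word auto-corrections and ASCII stand-ins.
    private static final Map<Character, String> ASCII_EQUIVALENTS = new HashMap<>();
    static {
        ASCII_EQUIVALENTS.put('\u2018', "'");   // left single quote
        ASCII_EQUIVALENTS.put('\u2019', "'");   // right single quote / apostrophe
        ASCII_EQUIVALENTS.put('\u201C', "\"");  // left double quote
        ASCII_EQUIVALENTS.put('\u201D', "\"");  // right double quote
        ASCII_EQUIVALENTS.put('\u2013', "-");   // en dash
        ASCII_EQUIVALENTS.put('\u2014', "--");  // em dash
        ASCII_EQUIVALENTS.put('\u2026', "..."); // ellipsis
        ASCII_EQUIVALENTS.put('\u00BD', "1/2"); // one-half fraction
    }

    // Replaces each mapped character; everything else passes through unchanged.
    static String toAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = ASCII_EQUIVALENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CIt\u2019s \u00BD done\u2026\u201D"));
        // prints "It's 1/2 done..."
    }
}
```

Keeping the mappings in one table at least makes the hard-coded cases easy to extend, though it still treats the symptom rather than the decoding problem.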

opalpa · Feb 5, 2007

Maybe helpful: http://www.ljmu.ac.uk/cis/webpublishing/81434.htm

opalpa
http://opalpa.info/

djthomp · Feb 7, 2007

Unfortunately, we don't really want to give the users of our site the
additional instructions they would need to paste only 'clean'
characters into the form. We're aiming for as simple and clean an
application process as possible, and want a solution that requires no
additional effort on the user's side.

I ended up cleaning out the quotes with a little client-side
JavaScript, but I'm still looking for a server-side Java method.

djthomp · Mar 9, 2007

Well, I finally found my server-side solution to this. The right Google
search led me to this page on Sun's developer site. After that it was
just a matter of using the proper page directive and meta tag
attributes, and calling request.setCharacterEncoding(encodingName)
before reading any request parameters (all of which was explained
pretty clearly on the Sun page I found).
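request.setCharacterEncoding(...) tells the container which charset to use when it turns the %XX escapes in the posted form data back into characters, so it has to match the charset the page declared (for example through contentType/pageEncoding in the page directive and the meta tag) and must be called before the first getParameter(). The servlet API is not part of the JDK, so this sketch imitates that decoding step with java.net.URLDecoder instead; UTF-8 is an assumption, since the thread doesn't say which encoding name was used, and decodeParam is a hypothetical helper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {
    // Decodes one application/x-www-form-urlencoded value with the given
    // charset, roughly the step the container performs for getParameter().
    static String decodeParam(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser on a UTF-8 page submits for a curly-quoted word.
        String body = "%E2%80%9Cquoted%E2%80%9D";
        System.out.println(decodeParam(body, "UTF-8").length());      // 8
        System.out.println(decodeParam(body, "ISO-8859-1").length()); // 12
    }
}
```

With the request decoded using the same charset the page declared, a curly quote counts as one character, which also addresses the character-count problem from the original question.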

Just thought I'd give a success update with the answer. When I was
looking I found the same question being asked a lot, but not this
particular answer. Hope that others might find it useful.

djthomp · Mar 9, 2007

D'oh, that link got mangled; it's at: http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/

opalpa · Mar 12, 2007

Good stuff.

nio charset doubt	3	Jul 2, 2008
Can't solve problems! please Help	0	Sep 26, 2022
charset problems with urllib/urllib2	0	Feb 23, 2009
JavaScript in Acrobat Save As Found Text	3	Nov 11, 2021
Can someone tell me what's wrong with this question on StackOverflow?	0	Aug 19, 2023
Converting an Array to a String in JavaScript	7	Sep 22, 2023
accept-charset in forms	3	Jan 7, 2007
Do Assignment Operator Conversion	4	Aug 27, 2010

Charset conversion question

djthomp

opalpa opalpa

djthomp

djthomp

djthomp

opalpa opalpa

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads