djthomp
To put it simply, I need some help dealing with the 'smart' character
corrections that Word automatically performs (quotes, hyphens,
fractions, etc.), specifically after the text has been copied from Word
and pasted into my web form.
I am working on a project using JSP that has a web form with a few
essay questions. Because of the nature of the form (a scholarship
application form), it is very often filled out by applicants who are
writing their essays in Word and cutting and pasting them into the
form. Often those essays have quotes or other punctuation and special
characters that have been changed from plain ASCII into something
else. This is causing me problems both with character counts and with
corrupted data after the form is submitted (each of the modified
characters shows up as 2-3 garbage characters).
I've tried to find a solution using the various bits of String
functionality that take a character set name (public byte[]
getBytes(String charsetName) and public String(byte[] bytes, String
charsetName) in particular), but either it's the wrong approach, or I'm
not doing it right, or I haven't found the right charsetName yet. I
have not yet tried CharsetEncoder or CharsetDecoder, as I am a little
uncertain where to begin with them.
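
For reference, here is a minimal sketch of the kind of getBytes / new String round trip described above. It rests on an assumption that is not confirmed anywhere in this post: that the browser submits the form as UTF-8 and the server decodes those bytes as ISO-8859-1, which would turn each smart character into 2-3 garbage characters. The class and method names (MojibakeDemo, misdecode, repair) are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative and not from any library.
final class MojibakeDemo {

    // Simulates the suspected failure: UTF-8 bytes read as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses it: recover the raw bytes, then decode them as UTF-8.
    // Only valid if the wrong decoding really was ISO-8859-1 (an assumption).
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly quotes U+201C and U+201D around a plain word.
        String pasted = "\u201CHello\u201D";
        String garbled = misdecode(pasted);

        // Each curly quote is 3 bytes in UTF-8, so it becomes 3 chars here.
        System.out.println(pasted.length() + " chars became " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(pasted));
    }
}
```

If the server is actually decoding with some other charset (windows-1252, for example), this particular round trip may lose data, so it is worth checking what encoding the page and the request are really using before relying on it.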
I do have a working fix for this, but since it consists of a loop
through the string in question that manually finds and fixes the
problem spots using some hard-coded comparisons, I really don't
believe it's a good long-term solution.
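
To make the shape of that workaround concrete, a loop of this kind might look roughly like the sketch below. This is not the actual code from the project; the class name SmartCharNormalizer, the method toAscii, and the particular set of characters handled are all assumptions chosen for illustration. It also assumes the string has already been decoded correctly, so that the smart characters arrive as single Unicode characters rather than as garbage.

```java
// Hypothetical sketch of a hard-coded replacement loop; not the poster's code.
final class SmartCharNormalizer {

    // Replaces a few common Word "smart" characters with ASCII equivalents.
    static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u2018': // left single quote
                case '\u2019': // right single quote or apostrophe
                    out.append('\'');
                    break;
                case '\u201C': // left double quote
                case '\u201D': // right double quote
                    out.append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    out.append('-');
                    break;
                case '\u2026': // ellipsis
                    out.append("...");
                    break;
                case '\u00BC': out.append("1/4"); break;
                case '\u00BD': out.append("1/2"); break;
                case '\u00BE': out.append("3/4"); break;
                default:
                    out.append(c); // everything else passes through unchanged
            }
        }
        return out.toString();
    }
}
```

The weakness is the one already noted: any character missing from the list passes through untouched, so the list has to grow every time an applicant pastes something new.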