Charset conversion question


djthomp

To put it simply, I need some help dealing with the 'smart' character
corrections that Word automatically performs (quotes, hyphens,
fractions, etc.), specifically after the text has been copied from Word
and pasted into my web form.

I am working on a project using JSP that has a web form with a few
essay questions. Because of the nature of the form (a scholarship
application form), it is very often filled out by applicants who write
their essays in Word and then cut and paste them into the form. Those
essays often contain quotes, other punctuation, or special characters
that Word has converted from plain ASCII into something else. This
causes problems both with character counts and with corrupted data
after the form is submitted (each modified character shows up as 2-3
garbage characters).
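
The "2-3 garbage characters" symptom is consistent with the browser
sending UTF-8 while the server decodes the bytes with a single-byte
charset such as ISO-8859-1 (an assumption here; the post does not say
which encodings are involved). Word's curly quotes and dashes take 3
bytes in UTF-8 and fractions such as one-half take 2, so each one turns
into 2 or 3 characters. A minimal sketch that reproduces this, with
illustrative class and method names:

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode the text as UTF-8 (assumed to be what the browser sends), then
    // decode those bytes as ISO-8859-1 (assumed server-side default). Each
    // byte becomes one char, so multi-byte characters turn into 2-3 chars.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Curly double quotes, an en dash and the one-half fraction.
        String[] samples = {"\u201Cquoted\u201D", "en\u2013dash", "\u00BD cup"};
        for (String s : samples) {
            String g = garble(s);
            System.out.println(s.length() + " chars -> " + g.length() + " chars");
        }
    }
}
```

Running it prints "8 chars -> 12 chars", "7 chars -> 9 chars" and
"5 chars -> 6 chars", i.e. every substituted character grows by one or
two garbage characters.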

I've tried to find a solution using the various String methods that
take a character set name (in particular public byte[]
getBytes(String charsetName) and public String(byte[] bytes, String
charsetName)), but either it's the wrong approach, I'm not doing it
right, or I haven't found the right charsetName yet. I have not yet
tried CharsetEncoder or CharsetDecoder, as I am a little uncertain
where to begin with them.
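
If the text has already been mis-decoded as ISO-8859-1, the two String
methods mentioned above can undo it, because ISO-8859-1 maps every byte
to exactly one char: re-encode with getBytes and decode the resulting
bytes as UTF-8. A CharsetDecoder set to CodingErrorAction.REPORT does
the same job but throws, instead of quietly inserting replacement
characters, when the bytes are not valid UTF-8. This is a sketch that
assumes the UTF-8/ISO-8859-1 mismatch described above (the class name
is hypothetical); setting the request encoding correctly in the first
place, as the thread eventually does, avoids the need for it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetRepair {
    // Undo a UTF-8 -> ISO-8859-1 mis-decode. ISO-8859-1 maps every byte to one
    // char in U+0000..U+00FF, so getBytes recovers the original raw bytes.
    // Only use this on text known to be mis-decoded: chars above U+00FF are
    // turned into '?' by getBytes and the result is then wrong.
    static String repair(String misDecoded) {
        byte[] raw = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Strict variant: a CharsetDecoder configured with REPORT throws on bytes
    // that are not valid UTF-8 instead of inserting U+FFFD replacement chars.
    static String repairStrict(String misDecoded) throws CharacterCodingException {
        byte[] raw = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // What curly double quotes look like after the mis-decode.
        String garbled = "\u00E2\u0080\u009Cquoted\u00E2\u0080\u009D";
        System.out.println(repair(garbled).equals("\u201Cquoted\u201D"));       // true
        System.out.println(repairStrict(garbled).equals("\u201Cquoted\u201D")); // true
    }
}
```

The strict version is the safer choice when it is unclear whether a
given value was actually mis-decoded: text that was already correct
Latin-1 (for example "caf" followed by e-acute) is rejected rather
than silently mangled.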

I do have a working fix, but since it is a loop through the string
that finds and fixes the problem spots with hard-coded comparisons, I
don't believe it's a good long-term solution.
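
The hard-coded loop described above can at least be made table-driven.
The sketch below maps a few common Word substitutions back to ASCII;
the mapping is illustrative rather than complete, the class and method
names are hypothetical, and it only helps once the text has been
decoded with the right charset in the first place.

```java
import java.util.HashMap;
import java.util.Map;

class SmartPunctuation {
    // Illustrative, incomplete mapping of Word "smart" characters to ASCII.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u2018', "'");   // left single quotation mark
        REPLACEMENTS.put('\u2019', "'");   // right single quotation mark / apostrophe
        REPLACEMENTS.put('\u201C', "\"");  // left double quotation mark
        REPLACEMENTS.put('\u201D', "\"");  // right double quotation mark
        REPLACEMENTS.put('\u2013', "-");   // en dash
        REPLACEMENTS.put('\u2014', "--");  // em dash
        REPLACEMENTS.put('\u2026', "..."); // horizontal ellipsis
        REPLACEMENTS.put('\u00BC', "1/4"); // vulgar fraction one quarter
        REPLACEMENTS.put('\u00BD', "1/2"); // vulgar fraction one half
        REPLACEMENTS.put('\u00BE', "3/4"); // vulgar fraction three quarters
    }

    // Copy the text, swapping any mapped character for its ASCII replacement.
    static String toAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CHi\u201D \u2013 it\u2019s \u00BD done\u2026"));
        // prints: "Hi" - it's 1/2 done...
    }
}
```

Once decoding is correct, String.length() already counts each curly
quote or dash as a single character, so a mapping like this is only
needed if the stored essays must be plain ASCII.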
 

opalpa

Maybe helpful: http://www.ljmu.ac.uk/cis/webpublishing/81434.htm

opalpa
http://opalpa.info/
 

djthomp

Unfortunately, we don't really want to give the users of our site the
additional instructions they would need to paste only 'clean'
characters into the form. We want the application process to be as
simple and clean as possible, and we want a solution that requires no
additional effort on the user's side.

I ended up cleaning out the quotes with a little client-side
JavaScript, but I'm still looking for a server-side Java method.
 

djthomp

Well, I finally found my server-side solution to this. The right
Google search led me to this page:
http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
After that it was just a matter of using the proper page directive and
meta tag attributes, and calling
request.setCharacterEncoding(encodingName) before reading any request
parameters (all of which is explained pretty clearly on that Sun page).
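
A rough illustration of why the request encoding matters: the browser
percent-encodes form fields using the charset of the page (which the
page directive and meta tag declare), and the server has to decode them
with that same charset. In a servlet or JSP, request.setCharacterEncoding
tells the container which one to use, which is why it must be called
before the first getParameter(). The sketch below uses
java.net.URLDecoder outside any container to decode the same submitted
value two ways. It assumes UTF-8 as the page encoding (the post does
not say which encodingName was actually used), and the class name is
hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodingDemo {
    // Decode one application/x-www-form-urlencoded value with the given
    // charset. A servlet container does something similar for getParameter()
    // using the encoding passed to request.setCharacterEncoding().
    static String decode(String encodedValue, String charsetName) {
        try {
            return URLDecoder.decode(encodedValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser on a UTF-8 page would submit the curly-quoted word like this.
        String submitted = "%E2%80%9Cquoted%E2%80%9D";
        System.out.println(decode(submitted, "UTF-8").length());      // 8
        System.out.println(decode(submitted, "ISO-8859-1").length()); // 12
    }
}
```

Decoding with the matching charset yields the 8 characters the
applicant typed; the mismatched charset yields 12, reproducing the
garbage characters from the original question.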

Just thought I'd post a success update with the answer. When I was
searching, I found the same question asked a lot, but not this
particular answer. I hope others find it useful.
 
