Charset conversion question

Discussion in 'Java' started by djthomp, Feb 5, 2007.

  1. djthomp

    djthomp Guest

    To put it simply, I need some help dealing with the 'smart' character
    corrections that Word automatically performs (quotes, hyphens,
    fractions, etc.), specifically after the text has been copied from Word
    and pasted into my web form.

    I am working on a project using JSP that has a web form with a few
    essay questions. Because of the nature of the form (a scholarship
    application form), it is very often filled out by applicants who are
    writing their essays in Word and cutting and pasting them into the
    form. Often those essays have quotes or other punctuation and special
    characters that have been changed from plain ASCII into something
    else. This is causing me problems both with character counts and with
    corrupted data after the form is submitted (each of the modified
    characters shows up as 2-3 garbage characters).

    I've tried to find a solution using the various bits of String
    functionality that take a character set name (public byte[]
    getBytes(String charsetName) and public String(byte[] bytes, String
    charsetName) in particular), but either it's the wrong approach, or I'm
    not doing it right, or I haven't found the right charsetName yet. I
    have not yet tried CharsetEncoder or CharsetDecoder, as I am a little
    uncertain where to begin with them.
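
    One plausible reading of the "2-3 garbage characters" symptom (an
    assumption; the thread never confirms it) is that the browser submits
    the essay as UTF-8 bytes and the server decodes them as ISO-8859-1.
    A minimal sketch of that failure, and of the getBytes/new String round
    trip that undoes it, follows. The class and method names are
    hypothetical, and StandardCharsets needs Java 7 or later (older JDKs
    would pass the charset names as strings instead).

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates the assumed bug: UTF-8 bytes decoded as ISO-8859-1,
    // so each multi-byte character turns into 2-3 unrelated characters.
    public static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes with ISO-8859-1
    // (a lossless byte-to-char mapping in Java), then decode as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly quotes, an en dash, and a one-half fraction, as Word inserts them.
        String essay = "\u201CHello\u201D \u2013 \u00BD";
        String garbled = misdecode(essay);
        System.out.println(essay.length() + " -> " + garbled.length()); // 11 -> 18
        System.out.println(repair(garbled).equals(essay));              // true
    }
}
```

    With this sample, an 11-character essay becomes 18 characters after
    the wrong decoding, which would also account for the character-count
    drift. The round trip only works while every byte survives the wrong
    decoding, so it is more of a diagnostic aid than a fix.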

    I do have a working fix for this, but since it consists of a loop
    through the string in question which manually finds and fixes the
    problem spots with some hard-coded comparisons, I really don't
    believe it's a good long-term solution.
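
    For reference, a loop of the kind described above might look like the
    following sketch. It is not the poster's actual code: the class name,
    the method name, and the particular characters mapped are all
    assumptions chosen for illustration, and it presumes the text has
    already been decoded correctly.

```java
public class SmartPunctuation {
    // Replaces a hand-picked (hypothetical) set of Word "smart" characters
    // with plain ASCII equivalents; everything else is copied unchanged.
    public static String toAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u2018': // left single quote
                case '\u2019': // right single quote / apostrophe
                    out.append('\'');
                    break;
                case '\u201C': // left double quote
                case '\u201D': // right double quote
                    out.append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    out.append('-');
                    break;
                case '\u2026': // ellipsis
                    out.append("...");
                    break;
                case '\u00BD': // one-half fraction
                    out.append("1/2");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CHi\u201D \u2013 it\u2019s \u00BD\u2026"));
    }
}
```

    The weakness is the one already noted: every new character Word
    introduces needs another hard-coded case.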
    djthomp, Feb 5, 2007
    #1

  2. On Feb 5, 12:00 pm, "djthomp" wrote:
    > To put it simply, I need some help dealing with the 'smart' character
    > corrections that Word automatically performs (quotes, hyphens,
    > fractions, etc), specifically after it has been copied from Word and
    > pasted into my web form.
    >
    > I am working on a project using JSP that has a web form with a few
    > essay questions. Because of the nature of the form (a scholarship
    > application form), it is very often filled out by applicants who are
    > writing their essays in Word and cutting and pasting them into the
    > form. Often those essays have quotes or other punctuation and special
    > characters that have been modified from straight up ASCII into
    > something else. This is causing me a problem both with character
    > counts, as well as with corrupted data after the data is submitted
    > (all of the modified characters show up as 2-3 garbage characters).
    >
    > I've tried to find a solution using the various bits of String
    > functionality that take a character set name (public byte[]
    > getBytes(String charsetName) and public String(byte[] bytes, String
    > charsetName) in particular), but either it's the wrong approach, or I'm
    > not doing it right, or I haven't found the right charsetName yet. I
    > have not yet tried CharsetEncoder or CharsetDecoder, as I am a little
    > uncertain where to begin with them.
    >
    > I do have a working fix for this, but since it consists of a loop
    > through the string in question which manually finds and fixes the
    > problem spots by finding them with some hard coded comparisons, I
    > really don't believe it's a good long term solution.



    Maybe helpful: http://www.ljmu.ac.uk/cis/webpublishing/81434.htm

    opalpa

    http://opalpa.info/
    opalpa http://opalpa.info, Feb 5, 2007
    #2

  3. djthomp

    djthomp Guest

    On Feb 5, 10:33 am, "opalpa http://opalpa.info" wrote:
    >
    > Maybe helpful: http://www.ljmu.ac.uk/cis/webpublishing/81434.htm
    >
    > opalpa
    > http://opalpa.info/


    Unfortunately, we don't really want to give the users of our site the
    additional instructions they would need so that they only paste
    'clean' characters into the form. We're looking for as simple and
    clean an application process as possible, and want a solution for
    this that requires no additional user-side effort.

    I ended up cleaning out the quotes with a little client-side
    JavaScript, but I'm still looking for a server-side Java method.
    djthomp, Feb 7, 2007
    #3
  4. djthomp

    djthomp Guest

    On Feb 7, 8:36 am, "djthomp" wrote:
    > Unfortunately, we don't really want to give the users of our site the
    > additional instructions they would need so that they only paste
    > 'clean' characters into the form. We're looking for as simple and
    > clean of an application process as possible, and want a solution for
    > this that requires no additional user-side effort.
    >
    > I ended up cleaning out the quotes with a little client-side
    > javascript, but I'm still looking for a server-side java method.


    Well, I finally found my server-side solution to this. The right
    Google search led me to this page:
    http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/
    After that it was just a question of using the proper page directive
    and meta tag attributes, and calling
    request.setCharacterEncoding(encodingName) before reading any request
    parameters (all of which was detailed pretty clearly on the Sun page I
    found).
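
    The Sun article is not reproduced in this thread, so what follows is
    only a sketch of why that call matters. In a servlet or JSP the fix is
    request.setCharacterEncoding("UTF-8") before the first getParameter
    call. Since the servlet API is not part of the JDK, the example uses
    URLDecoder as a stand-in for the container's parameter decoding. The
    class name, the decodeParam method, and the choice of UTF-8 are
    assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class RequestEncodingSketch {
    // Hypothetical stand-in for what a servlet container does with a
    // form value: percent-decode it using the request's character encoding.
    public static String decodeParam(String rawValue, String encodingName) {
        try {
            return URLDecoder.decode(rawValue, encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encodingName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // A word wrapped in Word-style curly quotes (8 characters).
        String typed = "\u201Cquoted\u201D";

        // Roughly what a browser submits when the page is served as UTF-8.
        String onTheWire = URLEncoder.encode(typed, "UTF-8");
        System.out.println(onTheWire); // %E2%80%9Cquoted%E2%80%9D

        // Decoded with the wrong encoding: each quote becomes 3 characters.
        System.out.println(decodeParam(onTheWire, "ISO-8859-1").length()); // 12

        // Decoded with the encoding the page declared: the text is intact.
        System.out.println(decodeParam(onTheWire, "UTF-8").equals(typed)); // true
    }
}
```

    The page directive and meta tag make the browser send UTF-8, and the
    setCharacterEncoding call makes the server read it as UTF-8; the
    garbage appears when the two sides disagree.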

    Just thought I'd give a success update with the answer. When I was
    looking I found the same question being asked a lot, but not this
    particular answer. Hope that others might find it useful.
    djthomp, Mar 9, 2007
    #4
  6. Good stuff.
    opalpa http://opalpa.info, Mar 12, 2007
    #6
