Character Encoding

Discussion in 'Java' started by Fred, Feb 20, 2005.

  1. Fred

    Fred Guest

    Hi,

    I've been using java.net.URLEncoder to encode text coming from a form
    on a web page before I store it in my database, and java.net.URLDecoder
    to decode the text when I read it from the database so I can display it
    to the user. I'm using UTF-8 character encoding.

    I recently had a problem where a user copied and pasted text from the
    Attachmate terminal emulator into a textarea and submitted the form.
    The text was stored successfully, but when it came time to decode it,
    the URLDecoder class started throwing errors. I'm guessing that some
    characters that were UTF-8 incompatible came along for the ride,
    because I've had similar problems with Attachmate in the past.

    Are there other classes I should use to perform the encoding? Am I
    using the best character encoding? Any suggestions would be greatly
    appreciated.

    Thank you.

    Fred
     
    Fred, Feb 20, 2005
    #1

  2. Malte

    Malte Guest

    Fred wrote:
    > Hi,
    >
    > I've been using java.net.URLEncoder to encode text coming from a form
    > on a web page before I store it in my database, and java.net.URLDecoder
    > to decode the text when I read it from the database so I can display it
    > to the user. I'm using UTF-8 character encoding.
    >
    > I recently had a problem where a user copied and pasted text from the
    > Attachmate terminal emulator into a textarea and submitted the form.
    > The text was stored successfully, but when it came time to decode it,
    > the URLDecoder class started throwing errors. I'm guessing that some
    > characters that were UTF-8 incompatible came along for the ride,
    > because I've had similar problems with Attachmate in the past.
    >
    > Are there other classes I should use to perform the encoding? Am I
    > using the best character encoding? Any suggestions would be greatly
    > appreciated.
    >
    > Thank you.
    >
    > Fred
    >


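    Can you convert the input String if you do something like this
    (assuming the container decoded the parameter as ISO-8859-1)?

    // Turn the mis-decoded characters back into bytes, then read them as UTF-8.
    String input = new String(
        request.getParameter("your_field").getBytes("ISO-8859-1"), "UTF-8"
    );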
     
    Malte, Feb 21, 2005
    #2

  3. John C. Bollinger

    Fred wrote:

    > I've been using java.net.URLEncoder to encode text coming from a form
    > on a web page before I store it in my database, and java.net.URLDecoder
    > to decode the text when I read it from the database so I can display it
    > to the user. I'm using UTF-8 character encoding.
    >
    > I recently had a problem where a user copied and pasted text from the
    > Attachmate terminal emulator into a textarea and submitted the form.
    > The text was stored successfully, but when it came time to decode it,
    > the URLDecoder class started throwing errors. I'm guessing that some
    > characters that were UTF-8 incompatible came along for the ride,
    > because I've had similar problems with Attachmate in the past.


    There are no characters incompatible with UTF-8 -- it is a
    general-purpose charset covering all of Unicode. Moreover, if you
    successfully _encode_ the characters with UTF-8 (in the process of
    URL-encoding them) then there is absolutely no reason that you should
    not be able to reverse the process. (You do, however, need to specify
    UTF-8 at both encoding and decoding time.)
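
    A minimal sketch of that round trip, assuming the same charset name is
    passed at both encoding and decoding time. The class and method names
    (UrlRoundTrip, encode, decode) and the sample text are invented for
    illustration and are not taken from the original application.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlRoundTrip {

    // Hypothetical helper: URL-encodes text, using UTF-8 for the byte conversion.
    public static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: reverses encode(), again naming UTF-8 explicitly.
    public static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Accented letter, euro sign, a literal percent sign, box-drawing characters.
        String original = "caf\u00e9 \u20ac 100% \u2500\u2502";
        String stored = encode(original);   // what would go into the database
        String restored = decode(stored);   // what would be shown to the user
        System.out.println(stored);
        System.out.println(restored.equals(original));
    }
}
```

    If a round trip like this succeeds but the real application still fails,
    the stored value was probably not produced by the matching encode step,
    or was altered somewhere between the two calls.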

    If you post a small, self-contained, compilable example that exhibits
    the problem, preferably with test data, then we can probably point you
    to where the problem lies. You would also get much better advice if you
    showed the actual stack traces for the exceptions thrown. The problem
    is not that the classes you are trying to use are broken; it is that you
    are not using them according to specs.

    Do note, by the way, that you have _two_ encoding/decoding pairs to
    worry about here, and so far you have only discussed one. You also need
    to worry about the encoding and decoding involved in sending the
    form from the client to your application. Since you say you've had
    trouble with Attachmate before, I tend to suspect that your
    application's character handling is not as robust as you think it is.
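
    To illustrate that second pair, here is a rough sketch of what can go
    wrong when the form bytes are decoded with a different charset than the
    browser used. It assumes the browser sent UTF-8 and the server read the
    bytes as ISO-8859-1; both are assumptions for the example, not known
    facts about this application. The class and method names are
    hypothetical. In a servlet, calling request.setCharacterEncoding("UTF-8")
    before reading any parameter is the usual way to avoid the mismatch in
    the first place.

```java
import java.nio.charset.StandardCharsets;

public class FormCharsetMismatch {

    // Simulates a server that reads UTF-8 request bytes as ISO-8859-1.
    public static String misdecode(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decoding. This works because ISO-8859-1 maps every
    // byte value to exactly one character, so the original bytes are recoverable.
    public static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9";                              // what the user entered
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);    // what the browser sent
        String seenByServer = misdecode(wire);                   // two characters where there was one
        System.out.println(seenByServer.length());
        System.out.println(repair(seenByServer).equals(typed));
    }
}
```

    The repair step only helps when the wrong charset really was
    ISO-8859-1. With a lossy charset in the middle, some bytes are gone for
    good, so fixing the decoding at the point of entry is the safer route.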

    --
    John Bollinger
     
    John C. Bollinger, Feb 21, 2005
    #3
