What charset does IIS use to decode a POST request?

Discussion in 'ASP General' started by Pavils Jurjans, Oct 23, 2003.

  1. Hallo,

    I am working on a multilingual web application, and I have to be very sure
    about how international characters are encoded and decoded in
    client-server form requests.

    There's a great article about the issue:
    http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html

    In general, it says that this area is full of landmines. From my tests
    I see that on a POST request the form content is encoded using the character
    encoding of the HTML page that hosted the form. However, the POST request
    carries no information about which code page was used, so the server side
    has to guess it somehow in order to decode the data properly and populate
    the Request.Form collection. My tests show that if the requesting page is
    plain HTML with a UTF-8 Content-Type meta tag, the server side
    sometimes decodes the characters properly, but most of the time fails.

    So, my question is: what code page is used to interpret and decode the
    POST request data when the Request.Form collection is populated? I could
    write my own interpreter that takes the data from Request.BinaryRead(), but
    I would prefer to use the default Request.Form collection, though.

    Thanks,


    -- Pavils
     
    Pavils Jurjans, Oct 23, 2003
    #1
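
The mismatch described in the first post can be reproduced in a few lines. This is only a sketch under assumptions: the sample body `name=%C5%A1` is made up for illustration, and `unescape()` maps each `%XX` to the Latin-1 character with that code, which only approximates a real Windows ANSI code page such as 1252.

```javascript
// The same percent-encoded request body, decoded two ways.
// "%C5%A1" is how "š" (U+0161) is submitted from a UTF-8 page.
var body = "name=%C5%A1"; // hypothetical sample body
var raw = body.split("=")[1];

// Interpret the two bytes as one UTF-8 sequence (code page 65001).
var asUtf8 = decodeURIComponent(raw);

// Interpret each byte as its own character, as a single-byte code page would.
// unescape() is a Latin-1 stand-in here, not an exact ANSI code page.
var asSingleByte = unescape(raw);

console.log(asUtf8.length, asUtf8);             // one character
console.log(asSingleByte.length, asSingleByte); // two characters of mojibake
```

The bytes on the wire are identical in both cases; only the code page chosen for decoding differs, which is why the result depends on server-side settings rather than on the request itself.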

  2. My sympathies. You may have noticed my posts on this question, and also the
    lack of any response. Yes, that link has a super discussion of the issue.

    The route I took was to end-run the problem by converting the input at POST
    time to 7-bit-safe text, placed in a hidden field. In addition to keeping a
    database record of the input, I was trying to generate data for an RTF file
    as a possible output. The database contents were handled correctly
    in both directions, but I could find nothing on their format that
    would help with converting them to the "hex Unicode" code-page format that
    RTF requires.

    That is, a two-byte UTF-8 Cyrillic character was converted to a 4-byte
    value, and I couldn't discern the conversion algorithm. I expect it's
    related to a double conversion. A couple of attempts at reverse-engineering
    failed. If you succeed, please share the solution.

    FYI, I used the JavaScript charCodeAt() function for the client-side
    conversion. HTH a bit more than just sympathy.

    AS
     
    Arnold Shore, Oct 23, 2003
    #2
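
The client-side workaround in the post above might look roughly like the following. The function names `toSevenBit`, `fromSevenBit`, and `toRtfUnicode` are hypothetical, and the choice of decimal numeric references (`&#NNNN;`) as the 7-bit form is an assumption, since the post does not say which form was used. The RTF part uses the `\uN` control word (a signed 16-bit decimal followed by a fallback character); whether that is the format the post was after is also an assumption.

```javascript
// Replace "&" and every character above 7-bit ASCII with a decimal
// numeric reference, e.g. "Ж" -> "&#1046;". Escaping "&" as well keeps
// the conversion reversible.
function toSevenBit(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i); // UTF-16 code unit, 0..65535
    out += (code > 127 || code === 38) ? "&#" + code + ";" : text.charAt(i);
  }
  return out;
}

// Reverse step, e.g. on the server: turn "&#NNNN;" back into characters.
function fromSevenBit(text) {
  return text.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCharCode(parseInt(digits, 10));
  });
}

// Emit RTF text: non-ASCII becomes \uN? and RTF's special characters
// (backslash and braces) are escaped with a backslash.
function toRtfUnicode(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    var ch = text.charAt(i);
    if (code > 127) {
      // \uN takes a signed 16-bit value, so 32768..65535 become negative.
      var signed = code > 32767 ? code - 65536 : code;
      out += "\\u" + signed + "?";
    } else if (ch === "\\" || ch === "{" || ch === "}") {
      out += "\\" + ch;
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(toSevenBit("Ж & co"));
console.log(toRtfUnicode("Ж"));
```

Because the hidden field then carries only ASCII, the server's code page no longer affects how the submitted value is decoded.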

  3. Hi, Pavils

    ASP uses the ANSI code page to decode the source data. You have to specify
    the UTF-8 code page explicitly for form data to be handled correctly:
    <%@ Codepage=65001 %>

    BTW, I spent a lot of time creating a component that works with form data
    in any code page and accepts up to 2 GB of multipart and URL-encoded form
    data (Request.Form has a 100 kB limit). You can find it at
    http://www.pstruh.cz (Huge-ASP upload).

    Antonin
     
    Antonin Foller, Oct 23, 2003
    #3
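
For anyone who does go the Request.BinaryRead() route mentioned in the first post, decoding a URL-encoded body by hand might look roughly like this. It is a sketch, not what ASP does internally: the function name `parseFormBody` is hypothetical, the body is assumed to be already available as an ASCII string, and the bytes behind the percent escapes are assumed to be UTF-8.

```javascript
// Parse an application/x-www-form-urlencoded body, assuming UTF-8 bytes.
function parseFormBody(body) {
  var fields = {};
  if (body === "") {
    return fields;
  }
  var pairs = body.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    var rawName = eq < 0 ? pairs[i] : pairs[i].substring(0, eq);
    var rawValue = eq < 0 ? "" : pairs[i].substring(eq + 1);
    // "+" stands for a space in this format; a literal plus arrives as %2B.
    // decodeURIComponent reads the percent escapes as UTF-8 and throws a
    // URIError if they are not valid UTF-8.
    var name = decodeURIComponent(rawName.replace(/\+/g, " "));
    var value = decodeURIComponent(rawValue.replace(/\+/g, " "));
    fields[name] = value; // a repeated name overwrites the earlier value
  }
  return fields;
}

var form = parseFormBody("city=R%C4%ABga&note=a+b%2Bc"); // sample input
console.log(form.city, form.note);
```

The URIError on invalid sequences is useful here: it signals that the page which hosted the form was probably not UTF-8, instead of silently producing wrong characters.
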
  4. Thanks, Antonin,

    This bit of info was the last piece of my puzzle. Now I'm happy (tm).

    Yes, I know your site and the great components you have made. In this
    case, I am looking for more technical insight into the POST format problems.
    I have my own pure-ASP upload in JScript working just fine, and I prefer to
    stay with a pure-code class, because then I have full control of what
    happens inside.

    Regards,

    -- Pavils
     
    Pavils Jurjans, Oct 24, 2003
    #4
  5. Arnold, I think I may be able to help you with your issues. Just make
    yourself reachable: mail me or ICQ: 4047612

    -- Pavils
     
    Pavils Jurjans, Oct 27, 2003
    #5