query string encoding/decoding

Discussion in 'ASP .Net' started by Mark, Mar 3, 2004.

  1. I've run a few simple tests looking at how query string encoding/decoding gets handled in asp.net, and it seems like the situation is even messier than it was in asp... Can't say I think much of the "improvements", but maybe someone here can point me in the right direction...

    First, it looks like asp.net will automatically read and recognize query strings encoded in utf-8 and in 16-bit unicode, but the latter uses a mutant, non-standard escape syntax that only works in IIS (%u00f1, for example). These look like the *only* two encodings it will decode. Too bad, 'cause browsers out there can encode query strings all kinds of different ways, and by default most will use windows-1252. At least in old asp, the lazy defaults for putting together your forms and the default behavior of most browsers fit together well. Seems like more active attention is required in asp.net.

    Second, it no longer appears to use the page's declared output encoding as a means of interpreting the input (both good and bad, I guess). This means that if it runs into a character in the querystring that's *not* encoded as utf-8 or mutant, it just drops that character from the input. Period. No way to handle it. Accented Spanish characters, for example, that most browsers are going to encode in 1252 (e.g. %f1 for ñ) just vanish from the asp.net environment.
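
    The byte-level difference behind the first two points can be sketched outside ASP.NET. The snippet below is a minimal illustration in Python (chosen only so it runs anywhere; it is not ASP.NET's decoder). It shows ñ percent-encoded in windows-1252 and in utf-8, and what a strict utf-8 decoder can do with the windows-1252 form. The sample string "a%f1o" is made up for the example.

```python
from urllib.parse import quote, unquote

# One character, two different percent-encodings depending on the charset.
enc_1252 = quote("ñ", encoding="cp1252")
enc_utf8 = quote("ñ", encoding="utf-8")
print(enc_1252, enc_utf8)  # %F1 %C3%B1

# Decoding windows-1252 bytes as UTF-8: the byte 0xF1 is not valid on its own.
dropped = unquote("a%f1o", encoding="utf-8", errors="ignore")   # byte silently discarded
marked = unquote("a%f1o", encoding="utf-8", errors="replace")   # U+FFFD substituted
correct = unquote("a%f1o", encoding="cp1252")                   # decoded as intended
print(repr(dropped), repr(marked), repr(correct))
```

    The "ignore" case matches the vanishing character described above; whether ASP.NET drops or replaces the byte internally is not something this sketch can confirm.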

    Third, in asp, when you have more than one value for a query string variable name, referencing the Request object gives you a collection. That collection has a toString method that makes a comma-separated list of the values, but you *can* refer to each of the values separately. In asp.net, the NameValueCollection mashes multiple values into a single comma-separated string, so if your input has commas in it, well, too bad.
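
    Why a comma-joined string loses information can be shown with a small, language-neutral sketch (Python here, as an illustration only; the "tag" parameter and its values are invented). Two values go in, one of which contains a comma, and splitting the joined string gives back three.

```python
from urllib.parse import parse_qs

raw_query = "tag=a%2Cb&tag=c"          # two values: "a,b" and "c"
values = parse_qs(raw_query)["tag"]    # each value kept separately
joined = ",".join(values)              # what a comma-joined accessor hands back

print(values)              # ['a,b', 'c']
print(joined.split(","))   # ['a', 'b', 'c']
```

    Any accessor that returns the values as a list avoids the ambiguity.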

    Fourth, in asp, Request.QueryString gives you the original urlencoded bytes of the querystring, i.e. what you were sent. In asp.net, it's actually a re-urlencoding of the post-interpreted values, so you can't get out what you got in... This is most annoying when you get a querystring encoded in utf-8: referencing Request.QueryString in asp.net returns a value encoded in the mutant %uxxxx syntax instead.

    I'm just getting my feet wet in asp.net, coming from an asp environment. Any pointers on how to handle query string issues better than what appears to be the default in asp.net? Seems like there are some steps backwards in asp.net.

    Thanks
    -Mark
    Mark, Mar 3, 2004
    #1

  2. Hi Mark,

    Thanks for posting in the community!
    From your description, you're asking how ASP.NET handles the Request's
    querystring, which seems quite different from classic ASP.
    As for the first two points you mentioned, I think they arise because the
    querystring is encoded using the client browser's codepage and then sent
    to the server, which decodes it using the page's codepage. If no codepage
    is specified (on either the client browser or the server-side page), each
    side falls back to its default codepage, and if the two don't match, the
    decoded result may be incorrect.

    As for #3, I've searched MSDN and found that in ASP.NET, to get the
    individual values of a multi-value querystring item, you need to call the
    QueryString.GetValues method. Here is the description in MSDN:
    --------------------------------------------
    If the item you are accessing contains exactly one value for the specified
    key, you do not need to modify your code. However, if there are multiple
    values for a given key, you need to use a different method to return the
    collection of values. Also, note that collections in Visual Basic .NET are
    zero-based, whereas the collections in VBScript are one-based.

    For example, in ASP the individual query string values from a request to
    http://localhost/myweb/valuetest.asp?values=10&values=20 would be accessed
    as follows:

    <%
    'This will output "10"
    Response.Write Request.QueryString("values")(1)

    'This will output "20"
    Response.Write Request.QueryString("values")(2)
    %>

    In ASP.NET, the QueryString property is a NameValueCollection object from
    which you would need to retrieve the Values collection before retrieving
    the actual item you want. Again, note the first item in the collection is
    retrieved by using an index of zero rather than one:

    <%
    'This will output "10"
    Response.Write (Request.QueryString.GetValues("values")(0))

    'This will output "20"
    Response.Write (Request.QueryString.GetValues("values")(1))
    %>

    In the case of both ASP and ASP.NET, the following code will behave
    identically:

    <%
    'This will output "10", "20"
    Response.Write (Request.QueryString("values"))
    %>

    ----------------------------------------------------------

    As for point #4, I think forms such as %uxxxx appear because the
    querystring travels in the URL, and a URL can only contain ISO-8859-1
    characters, so any Unicode content is first encoded and then urlencoded
    (certain characters are replaced). I think we can certainly get the
    querystring values as they were entered at the client, as long as the
    codepages on the client and the server side match.

    In addition, here are some tech articles on Migrating from ASP TO ASP.NET:
    #New ASP.NET Page Directives
    http://msdn.microsoft.com/library/en-us/cpguide/html/cpconnewpagedirectives.asp?frame=true

    #Migrating to ASP.NET: Key Considerations
    http://msdn.microsoft.com/library/en-us/dnaspp/html/aspnetmigrissues.asp?frame=true

    #Migrating a Commerce Server Site from ASP to ASP.NET
    http://msdn.microsoft.com/library/en-us/dncomsrv02/html/mscs_csnetmig.asp?frame=true

    #Converting ASP to ASP.NET
    http://msdn.microsoft.com/library/en-us/dndotnet/html/convertasptoaspnet.asp?frame=true

    Hope these help.


    Regards,

    Steven Cheng
    Microsoft Online Support

    Get Secure! www.microsoft.com/security
    (This posting is provided "AS IS", with no warranties, and confers no
    rights.)

    Get Preview at ASP.NET whidbey
    http://msdn.microsoft.com/asp.net/whidbey/default.aspx
    Steven Cheng[MSFT], Mar 4, 2004
    #2

  3. Hi Steve..

    First off, thanks for the pointer about GetValues(). That seems much closer to the old method than the default NameValueCollection access, and it makes it possible to work with querystrings more effectively.

    > As for the first two points you mentioned, I think they're because the
    > querystring is encoded based on its client broswer's codepage and then post
    > to serverside. The serverside will decode the querystring via the page's
    > codepage. If not specified(both clientbrowser for serverside page), they'll
    > take the default codepage, if not equals, the result we got may become
    > incorrect


    Is the default code page in .aspx different than in .asp? Do you set the codepage differently in .aspx? I have a little sample .aspx page with the header
    <%@Language="Jscript" CodePage=1252 EnableSessionState="False"%>
    and yet asp.net does *not* seem to be decoding using the declared codepage. It only decodes utf-8 and the mutant utf-16 escapes. Period. Characters encoded in the querystring using the declared codepage just vaporize because asp.net is not decoding them properly.

    This does pose some real practical problems, since most legacy pages don't go out of their way to declare codepages on either the client or the server side. In .asp, the default codepage is based on a system setting, usually windows-1252. This lazy default matches up well with 99% of the browsers out there, which are also set up to default to windows-1252, so the two can play together. Since asp.net seems only to decode utf-8 (and non-standard mutant), extra care seems to be necessary to get the client and server to play together reliably... An unnecessary potential gotcha for going to asp.net, it seems.

    Having the client and server play well by default is especially important because
    a) there is no way to communicate what the codepage is for url encoding on GET
    b) I.E. doesn't seem to send the client's codepage back up for the ride by default, even on POSTs. I wrote a couple of little sample posts. The form gets the charset both from the Content-Type header and from a <Meta http-equiv="Content-type"> tag in the html, and it does encode the form values in that codepage, but the http request on the form submission does *not* include any info to tell the server what codepage the post data is in (seems like a deficiency in IE, but probably not uncommon among browsers).

    The fact that asp.net doesn't follow/respect any of the common settings like asp did seems like it creates unnecessary openings for bugs.

    On point #3, do you know off hand if asp.net works the same way as asp in that the QueryString collection doesn't get chopped up until there's a reference to it? Or is asp.net going to interpret the querystring whether or not the code references it?

    > As for the #4 point, I think such things as %uxxxx is because the
    > querystrings are in url and url can only contains ISO-8859-1 charset
    > characters, so if contains unicode, it'll be first encoded and also
    > urlencoded( replace some particular characters). I think we're certainly to
    > get the values in the querystring how they're input at client as long as we
    > mapping the correct codepage between the client and serversdie


    I think you missed the point of #4. The point was that no, you *don't* get back what the client put in, and that seems undesirable and arbitrary. If the user input is
    http://foo.com/test.aspx?query=añ
    (the ñ is encoded with the utf-8 codepage)
    <% Response.Write (Request.QueryString); %>
    outputs
    query=a%u00f1
    (the post-interpreted value re-encoded using the non-standard mutant form instead of the original encoding). This is kinda like the xml assertion that one equivalent representation is just as good as another, but that's codified in the xml standard. The change in asp.net from asp just seems arbitrary and annoying (especially since it uses a syntax that is non-standard and only makes sense to certain versions of IE and IIS and nobody else). I guess I can still get at the real user input by looking at the RawUrl property instead. I haven't tried that yet.


    I know this encoding stuff is difficult. That's why we had to write our own COM objects to interpret the querystring in asp. Mostly we wrote them for two reasons:
    1) we wanted, like other websites do, to let the page handle the querystring encoding based on user input (like google with a form value) so that people could encode things any which way and we'd still be able to read them.

    2) I haven't tried this in asp.net yet, but at least in asp, there was also the unfortunate side-effect that Server.UrlEncode would only encode things in the page's output-encoding. If you have a multi-tier system where you need to construct urls to call the next tier, you can't always say the next tier will take urls in the encoding the last tier does. So we needed the extra flexibility to be able to urlencode in any codepage.
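
    The two needs listed above (decode in a charset the request itself names, and encode in any charset) can be sketched as follows. This is a Python illustration under stated assumptions, not the COM objects described in the post: the function name parse_query, the hint parameter name "ie", and the cp1252 default are all hypothetical choices for the example.

```python
import codecs
from urllib.parse import parse_qs, quote

def parse_query(raw_query, default="cp1252", hint_param="ie"):
    """Decode a raw query string using the charset named in `hint_param`, if any."""
    # The hint value is expected to be plain ASCII, so an ASCII-only first pass is enough.
    hint = parse_qs(raw_query, encoding="ascii", errors="replace").get(hint_param)
    encoding = default
    if hint:
        try:
            encoding = codecs.lookup(hint[0]).name
        except (LookupError, ValueError):
            pass  # unknown or malformed charset name: keep the default
    return parse_qs(raw_query, encoding=encoding, errors="replace")

raw_url = "/test.aspx?query=a%C3%B1o&ie=utf-8"
raw_query = raw_url.partition("?")[2]        # everything after the first '?'
print(parse_query(raw_query)["query"])       # ['año']
print(parse_query("query=a%F1o")["query"])   # ['año'] via the cp1252 default

# Encoding for a downstream tier that expects a particular charset.
print(quote("año", encoding="cp1252"))       # a%F1o
print(quote("año", encoding="utf-8"))        # a%C3%B1o
```

    Working from the raw query string rather than an already-decoded collection is what makes the charset choice possible at all.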

    For the most part, as I said, the codepage handling seemed to work pretty well by default in asp. There were limitations, but by default most pages worked pretty well with the world at large. Your default page would cover English and all latinate languages without extra effort. Asp.net, it seems, requires extra non-default effort to correctly handle anything that's not 7-bit ascii, and that seems like a step backwards. That's all I'm saying. Pardon me for saying so, but it doesn't seem like you've actually tried any of this stuff past the standard ascii input. If you want, I can send you my sample page and some sample queries to demonstrate what I'm saying.

    I'll read through the reference pages, but after the first few there still doesn't appear to be anything to correct the impression I've gotten that asp.net is worse than asp in codepage issues.

    Thanks
    -Mark
    Mark, Mar 4, 2004
    #3
  4. Hi Mark,

    Thank you for the response. Regarding this issue, I'll consult some
    further experts and will update you as soon as possible. Also, I
    think it would be helpful if you attached some sample pages showing the
    issues you've mentioned. Thanks.


    Regards,

    Steven Cheng
    Microsoft Online Support

    Steven Cheng[MSFT], Mar 5, 2004
    #4
  5. Hi Steve..

    The example code is really rather simple:
    <%@ Page Language="JScript" CodePage="1252" EnableSessionState="false" %>
    <%
    // Dump every query string key together with each of its decoded values.
    var key : int, args : NameValueCollection;
    args = Request.QueryString;
    for (key = 0; key < args.AllKeys.Length; key++)
    {
        var keys : String;
        keys = args.AllKeys[key];
        var val = Request.QueryString.GetValues(keys);
        Response.Write("<pre>" + keys + ": " + val + "\ntype: " + typeof(val) + "\n");
        Response.Write(val.Length + " " + typeof(val) + "\n");
        if (val.Length > 0)
        {
            var i : int;
            for (i = 0; i < val.Length; i++)
                Response.Write("\t" + i + " " + val[i] + "\n");
        }
        Response.Write("</pre>\n");
    }
    // Serialized form of the whole collection, for comparison with the raw input.
    Response.Write(Request.QueryString + "<br>\n");
    %>

    I tweaked it from the original to use GetValues() as you suggested, which works fine for multiple values. All the page does is output the individual query string values that were passed into it.

    The important parts of the demonstration, though, are these:
    1) note that the page *does* declare a codepage explicitly (not that it seems to matter) as windows-1252, which, if it were like asp, would/should be the default codepage anyway

    2) queries that will demonstrate all of the problems in asp.net
    (a little background: ñ = %f1 (latin-1, windows-1252 encoding) = %c3%b1 (utf-8 encoding) = %u00f1 (MS mutant utf-16 syntax))
    http://localhost/qs.aspx?query=a%f1
    http://localhost/qs.aspx?query=añ
    http://localhost/qs.aspx?query=a%u00f1

    The first url encodes the querystring año in the declared codepage of the program. In the output, you'll see that the ñ just gets vaporized (i.e. declared invalid on parsing). Problem #1 is that asp.net is not using the page's declared codepage for interpretation.

    The second url encodes the same query in utf-8. This page will show you that, despite the declared codepage, asp.net is going to read the querystring as a utf-8 encoding (at least it reads it properly). But this url also demonstrates problem #4 at the end with the Response.Write (Request.QueryString+"<br>\n");. It does *not* output the same value you got in. Instead it outputs mutant encoding. This is inconsistent with asp, where Request.QueryString will give you the raw bytes as you got them (without interpretation), but I suppose arguably consistent with serializing a NameValueCollection, post-interpretation. I'm feeling generous today, so I can acknowledge that if you put the same values into a string array and then tried to write out the .toString() of the array, you'd probably get the same result as asp.net currently produces. But it is still a deviation from asp. As I also said, I suppose I can grab the RawUrl property and do a split on the '?'.

    The third url encodes the same query with the MS mutant utf-16 encoding, which, despite the declared codepage, asp.net reads just fine. This url only demonstrates that it will read utf-16 mutant no matter what your codepage is. Point #4 is not exactly demonstrated by this, since getting MS mutant utf-16 out happens to be what you put in, in this instance. Even a broken clock is right twice a day, I guess.
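
    The three forms of ñ used in these urls can be compared with a small decoder. The sketch below is Python, for illustration only, and the helper decode_value is hypothetical: it treats each %uXXXX escape as a single code point (it does not combine surrogate pairs) and decodes ordinary %XX escapes in a caller-chosen charset. It does not claim to reproduce ASP.NET's own parser.

```python
import re
from urllib.parse import unquote

PCT_U = re.compile(r"%u([0-9a-fA-F]{4})")

def decode_value(text, encoding="utf-8"):
    """Decode %XX escapes as bytes in `encoding` and %uXXXX escapes as code points."""
    # re.split with one capture group alternates: plain piece, hex digits, plain piece, ...
    pieces = PCT_U.split(text)
    out = []
    for i, piece in enumerate(pieces):
        if i % 2:    # the four hex digits captured from a %uXXXX escape
            out.append(chr(int(piece, 16)))
        else:        # ordinary text containing standard %XX escapes
            out.append(unquote(piece, encoding=encoding, errors="replace"))
    return "".join(out)

for sample in ("a%f1", "a%c3%b1", "a%u00f1"):
    print(sample, "->", repr(decode_value(sample)), repr(decode_value(sample, "cp1252")))
```

    Only the %uXXXX form decodes the same way under both charsets, which is consistent with the observation that it is read correctly regardless of the declared codepage.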

    Thanks
    -Mark
    Mark, Mar 5, 2004
    #5
  6. Hi Mark,

    Here's some inline replies.

    > I've run a few simple tests looking at how query string encoding/decoding
    > gets handled in asp.net, and it seems like the situation is even messier
    > than it was in asp... Can't say I think much of the "improvements", but
    > maybe someone here can point me in the right direction...
    > First, it looks like asp.net will automatically read and recognize query
    > strings encoded in utf8 and 16-bit unicode, only the latter is some mutant,
    > non-standard encoding mechanism that only works in IIS (%u00f1 for
    > example). This looks like it's the *only* way to decode querystrings. Too
    > bad, 'cause browsers out there can encode them all kinds of different ways,
    > and the way most will get done by default is windows-1252. At least in old
    > asp, the lazy defaults for putting together your forms and the default
    > behavior for most browsers would fit well. Seems like there's more active
    > attention required in asp.net.

    This is due to the requestEncoding/responseEncoding attributes of the
    <globalization> element in web.config. The default is UTF-8. You can
    change this to windows-1252 if you want. UTF-8 is much better.
    IE will encode the url (unless you type it directly into the address
    bar) according to the following, in order: Response.Charset, the Charset
    header, then the <meta> tag. Setting responseEncoding will send the
    Charset header. If you simply type the address into the address bar, I
    believe that it is encoded with the system codepage.

    > Second, it no longer appears to be using the page's declared output
    > encoding as a means of interpreting the input (both good and bad, i guess).
    > This means if it runs into a character in the querystring that's *not*
    > encoded utf-8 or mutant, it just drops that character out of the input.
    > Period. No way to handle it. Accented spanish characters, for example,
    > that most browsers are going to encode in 1252 (i.e. %f1 for ñ) just vanish
    > from the asp.net environment.

    It does use the declared output encoding, but that encoding is not
    declared where you expect. It is in <globalization> in web.config.
    If you do expect to get upper ascii characters, you can use the following
    code in Global.asax in the Application_BeginRequest event:
    protected void Application_BeginRequest(Object sender, EventArgs e)
    {
        // Fires at the beginning of each request.
        string str = Request.QueryString.ToString();
        char[] delimiter = "%".ToCharArray();

        // Split on the first '%'; a single piece means no '%' was found.
        string[] split = str.Split(delimiter, 2);

        if (split.Length <= 1)
        {
            // No escape in the parsed query string: rewrite from the raw URL.
            System.Web.HttpContext.Current.RewritePath(Request.RawUrl);
        }
    }

    or, in VB:
    Dim str As String
    str = Server.UrlPathEncode(Request.RawUrl)
    System.Web.HttpContext.Current.RewritePath(str)

    > Third, in asp, when you have more than one value for a query string
    > variable name, referencing the Request object gives you a collection. Now
    > that collection has a toString method that makes a comma-separated list of
    > the values but you *can* refer to each of the different values separately.
    > In asp.net, the NameValueCollection mashes multiple values into a single
    > comma-separated string so if your input has commas in it, well too bad.

    In both ASP and ASP.NET, you get a collection when referencing the
    querystring object.
    The comma-joining behaviour you describe for the NameValueCollection is
    exactly the same as the behaviour in asp.
    Example: using the following querystring gives the following results:
    http://kronicas26/Converting/P1.asp?id=5&id=9&test=7
    ASP
    Code:
    <%
    Response.Write Request.QueryString & "<BR>"
    dim o
    for each o in Request.QueryString
    Response.Write "Key: " & o & " Value: " & Request.QueryString(o) & "<BR>"
    next
    %>

    Output:
    id=5&id=9&test=7
    Key: id Value: 5, 9
    Key: test Value: 7
    ASP.NET
    http://kronicas26/ssCS/webform1.aspx?id=5&id=9&test=7
    Code:
    Response.Write(Request.QueryString.ToString() + "<BR>");
    foreach(string t in Request.QueryString.Keys)
    {
    Response.Write("Key :" + t + " Value: " + Request.QueryString[t] +
    "<BR>");
    }
    Output:
    id=5&id=9&test=7
    Key :id Value: 5,9
    Key :test Value: 7

    This shows there is no difference in querystring handling between asp and
    asp.net as far as the collection behaviour goes.


    > Fourth, in asp Request.QueryString gives you the original urlencoded bytes
    > of the querystring i.e. what you were sent. In asp.net, it's actually a
    > re-urlencoding of the post-interpreted values, so you can't get out what
    > you got in... This is most annoying when you get a querysting encoded in
    > utf-8; referencing Request.QueryString returns you a value encoded in the
    > mutant %uxxxx syntax instead in asp.net.

    If you want to get the original querystring, use Request.RawUrl. This
    property is not encoded, and is pulled straight from the aspnet_isapi.dll.

    > I'm just getting my feet wet in asp.net, coming from an asp environment.
    > Any pointers on how to handle query string issues better than what appears
    > to be the default in asp.net? Seems like there are some steps backwards in
    > asp.net.

    If you want things to behave like they did in asp, at least as far as
    encoding issues go, do the following:
    In web.config, change
    <globalization
        requestEncoding="utf-8"
        responseEncoding="utf-8"
    />
    to
    <globalization
        requestEncoding="windows-1252"
        responseEncoding="windows-1252"
    />
    And use Response.Charset and Session.CodePage. Windows-1252 is a single
    byte encoding scheme, and multibyte characters will pass through unchanged.
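
    The "pass through unchanged" behaviour of a single-byte charset can be checked at the byte level. The sketch below is Python, for illustration only: it reads utf-8 bytes as windows-1252 and writes them back. It also shows one caveat that holds for Python's cp1252 table, which leaves a few byte values (0x81 among them) undefined; other runtimes may map those bytes differently, so this says nothing definite about .NET.

```python
utf8_bytes = "ñ".encode("utf-8")         # b'\xc3\xb1'
misread = utf8_bytes.decode("cp1252")    # each byte becomes one character
restored = misread.encode("cp1252")      # the same two bytes come back out

print(repr(misread), restored == utf8_bytes)   # 'Ã±' True

# Not every byte survives in Python's table: "Á" in UTF-8 is 0xC3 0x81,
# and 0x81 has no cp1252 mapping here.
try:
    "Á".encode("utf-8").decode("cp1252")
    lossless = True
except UnicodeDecodeError:
    lossless = False
print(lossless)   # False
```

    The bytes survive the round trip, but the intermediate string holds two characters rather than ñ, so code that inspects the decoded value still sees the wrong text.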

    Thanks,
    Earl Beaman
    Microsoft, ASP.NET

    This posting is provided "AS IS", with no warranties, and confers no
    rights.
    Earl Beaman[MS], Mar 6, 2004
    #6
  7. Hi Mark,

    Thing is, in asp.net, you really don't use the codepage. You can, but it
    is only there for compatibility.
    Now, you use the System.Threading and System.Globalization namespaces to
    create CultureInfo objects and set one as the current thread's
    CurrentCulture.

    The main thing you are missing is the extra layer in asp.net. This is the
    <globalization> element in web.config. This controls how input and output
    are encoded. You can override it in web.config or on the page itself.
    This is why you are seeing that only utf-8 strings are correctly deciphered.

    The querystring gets chopped up and encoded automatically. I believe that
    this also occurred in asp; I can confirm that in another post. I do know
    that the ServerVariables collection behaves that way, but I don't believe
    we have had any issues with the querystring.

    It may seem like querystring handling is more difficult, but the
    globalization story is 9000 times better than in asp. And when you
    understand how things work, I believe you will see that things are better.

    Here are some links that should help you out.
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/gngrfglobalizationsection.asp
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemglobalization.asp
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemthreadingthreadclasscurrentculturetopic.asp

    I will check this thread again to see if you have questions for me.

    Thanks,
    Earl Beaman
    Microsoft, ASP.NET

    Earl Beaman[MS], Mar 6, 2004
    #7
  8. T Conti Guest

    Thanks for this posting. We are running into Globalization issues
    with asp calls to ashx/aspx. Your last posting helped clear up some
    questions about asp.net (and confirmed some fears).

    (Earl Beaman[MS]) wrote in message news:<>...
    > Hi Mark,
    >
    > For your questions:
    > Yes, @Codepage only affects the response encoding. You can see the help
    > for it (Session.Codepage) at
    > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemwebsessionstatehttpsessionstateclasscodepagetopic.asp
    >
    >
    > Yes, this is different than in asp. ASP.NET uses the globalization element
    > of web.config for the request encoding. You can change it
    > programmatically in the Application_BeginRequest event if you wish. After
    > this event, it is too late.
    > The main difference is that ASP.NET interprets the request before you get
    > to your page code.
    >
    > The RawUrl property does not cover the server name, but it does include the
    > application/page and querystring.
    > So, if you had an application such as
    > http://myserver/webapp1/webform1.aspx?id=Mark, RawUrl would return
    > /webapp1/webform1.aspx?id=Mark.
    >
    > You don't have to necessarily do a split on it, it depends on whether there
    > will be the "mutant" utf-16 encoding in the querystring.
    >
    > For the re-encode question:
    > UTF-8 is a variable length encoding, while UTF-16 is not. This makes
    > UTF-16 easily recognizable.
    > UTF-16 is standard, and all browsers SHOULD support it (i only work with
    > IE, so i don't know the level of conformance for other browsers).
    >
    > RFC 2616 does not discuss unicode characters, only ascii. ISO-10646
    > establishes that the %uxxxx format is standardized.
    > References:
    > Internationalization of the Hypertext Markup Language
    > http://www.faqs.org/rfcs/rfc2070.html
    >
    > RFC 1641 - Using Unicode with MIME
    > http://www.faqs.org/rfcs/rfc1641.html
    >
    > RFC 2279 - UTF-8, a transformation format of ISO 10646
    > http://www.faqs.org/rfcs/rfc2279.html
    >
    > Also see http://www.w3.org/TR/html401/references.html for a complete list
    > of references that the W3 org has listed.
    >
    > HTH,
    > Earl Beaman
    > Microsoft, ASP.NET
    >
    > This posting is provided "AS IS", with no warranties, and confers no
    > rights.
    T Conti, Apr 5, 2004
    #8
