urllib.unquote + unicode

Discussion in 'Python' started by koara, Nov 13, 2007.

  1. koara

    koara Guest

    Hello all,

    I am using urllib.unquote_plus to unquote a string. Sometimes I get a
    strange string to unquote, for example "spolu%u017E%E1ci.cz". The
    problem here is that some application decided to quote a non-ASCII
    character directly as %uxxxx, instead of encoding it and quoting the
    result byte by byte.

    Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely
    not what the application meant.
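
    To make the difference concrete, here is a small sketch. It assumes
    Python 3's urllib.parse rather than the 2.4.1 urllib mentioned above,
    and the choice of UTF-8 and Latin-1 is an assumption for illustration
    only. It shows standard byte-by-byte quoting next to what the unquote
    function does with the %u form.

```python
from urllib.parse import quote, unquote_plus

# Standard percent-encoding: encode the text to bytes (UTF-8 assumed here),
# then quote each non-ASCII byte as %XX.
standard = quote('spolu\u017e\xe1ci.cz', encoding='utf-8')
print(standard)  # spolu%C5%BE%C3%A1ci.cz

# The %uXXXX form is not valid percent-encoding, so it is passed through
# untouched, while the ordinary %E1 escape is decoded (as Latin-1 here).
odd = unquote_plus('spolu%u017E%E1ci.cz', encoding='latin-1')
print(odd)  # spolu%u017Eáci.cz
```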

    My question is: is this %u quoting a standard (i.e., urllib is in the
    wrong), or is it not (i.e., the application is in the wrong and urllib
    silently ignores the '%u0' - why?)? And most importantly, is there a
    simple workaround to get it working as expected?

    Cheers!
     
    koara, Nov 13, 2007
    #1

  2. On Tue, 13 Nov 2007 13:14:18 -0300, koara wrote:

    > I am using urllib.unquote_plus to unquote a string. Sometimes I get a
    > strange string like for example "spolu%u017E%E1ci.cz" to unquote. Here
    > the problem is that some application decided to quote a non-ASCII
    > character as %uxxxx directly, instead of using an encoding and quoting
    > byte per byte.
    >
    > Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely
    > not what the application meant.
    >
    > My question is, is this %u quoting a standard (i.e., urllib is in the
    > wrong),


    Not that I know of (and that doesn't prove anything).

    > is it not (i.e., the application is in the wrong and urllib
    > silently ignores the '%u0' - why?), and most importantly, is there a
    > simple workaround to get it working as expected?


    Try this (untested):

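    from urllib import unquote_plus  # Python 2 location of unquote_plus

    def unquote_plus_u(source):
        # Let urllib handle '+' and the standard %XX escapes first.
        result = unquote_plus(source)
        # Rewrite any leftover %uXXXX as \uXXXX and decode it.
        if '%u' in result:
            result = result.replace('%u', '\\u').decode('unicode_escape')
        return result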
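
    For readers on Python 3, where str has no decode method, a single-pass
    variant might look like the sketch below. It is only a sketch under
    stated assumptions: the %uXXXX form is taken to hold one UTF-16 code
    unit (as JavaScript's escape() emits), and plain %XX bytes are assumed
    to be Latin-1 unless another encoding is passed in. The regular
    expression and the encoding parameter are illustrative choices, not
    part of urllib.

```python
import re
from urllib.parse import unquote

# Either one non-standard %uXXXX escape, or a run of standard %XX escapes.
_ESCAPE = re.compile(r'%u([0-9a-fA-F]{4})|((?:%[0-9a-fA-F]{2})+)')


def unquote_plus_u(source, encoding='latin-1'):
    """Unquote `source`, also accepting non-standard %uXXXX escapes.

    `encoding` (an assumption of this sketch) says how %XX bytes are decoded.
    """
    def decode(match):
        if match.group(1) is not None:
            # %uXXXX: the four hex digits are taken as a code point.
            return chr(int(match.group(1), 16))
        # A run of %XX escapes: let urllib decode the bytes together,
        # so multi-byte encodings such as UTF-8 also work.
        return unquote(match.group(2), encoding=encoding)

    # '+' means space in form data; handle it before decoding escapes so
    # that an escaped plus sign (%2B) survives as a literal '+'.
    return _ESCAPE.sub(decode, source.replace('+', ' '))


print(unquote_plus_u('spolu%u017E%E1ci.cz'))  # spolužáci.cz
```

    Because every escape is decoded in one pass, an input such as
    '%25u0041' yields the literal text '%u0041' rather than being decoded
    twice. Characters outside the Basic Multilingual Plane, which would
    arrive as two %uXXXX surrogates, are not recombined by this sketch.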

    --
    Gabriel Genellina
     
    Gabriel Genellina, Nov 14, 2007
    #2