urllib.unquote + unicode

koara · Nov 13, 2007

Hello all,

I am using urllib.unquote_plus to unquote a string. Sometimes I get a
strange string to unquote, for example "spolu%u017E%E1ci.cz". The
problem here is that some application decided to quote a non-ASCII
character directly as %uxxxx, instead of encoding it and quoting the
result byte by byte.

Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely
not what the application meant.

My question is, is this %u quoting a standard (i.e., urllib is in the
wrong), is it not (i.e., the application is in the wrong and urllib
silently ignores the '%u0' - why?), and most importantly, is there a
simple workaround to get it working as expected?

Cheers!

Gabriel Genellina · Nov 14, 2007

koara wrote:
> I am using urllib.unquote_plus to unquote a string. Sometimes I get a
> strange string to unquote, for example "spolu%u017E%E1ci.cz". The
> problem here is that some application decided to quote a non-ASCII
> character directly as %uxxxx, instead of encoding it and quoting the
> result byte by byte.
>
> Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely
> not what the application meant.
>
> My question is, is this %u quoting a standard (i.e., urllib is in the
> wrong),

Not that I know of (and that doesn't prove anything).

> is it not (i.e., the application is in the wrong and urllib
> silently ignores the '%u0' - why?), and most importantly, is there a
> simple workaround to get it working as expected?

Try this (untested):

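from urllib import unquote_plus

def unquote_plus_u(source):
    # Decode the standard %XX escapes first; any %uXXXX is left untouched.
    result = unquote_plus(source)
    if '%u' in result:
        # Turn each %uXXXX into \uXXXX and let the codec decode it.
        # Note: the result is then a unicode object, not a str.
        result = result.replace('%u', '\\u').decode('unicode_escape')
    return result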
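
For readers on Python 3, where urllib.unquote_plus has moved to urllib.parse.unquote_plus and str no longer has a .decode method, the same idea can be sketched as below. This is a minimal sketch under stated assumptions, not a tested drop-in: the function name unquote_percent_u is made up for illustration, and the single-byte %XX escapes are assumed to be Latin-1. That assumption fits the sample string, which looks like the output of JavaScript's legacy escape() function. The %uXXXX escapes are expanded before the regular unquoting, so an escaped percent sign such as "%25u0041" is not mistaken for a %u escape.

```python
import re
from urllib.parse import unquote_plus

# Matches the non-standard %uXXXX form (exactly four hex digits).
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')

def unquote_percent_u(source, encoding='latin-1'):
    """Unquote a string that mixes %uXXXX and ordinary %XX escapes.

    Hypothetical helper. The 'latin-1' default is an assumption about
    how the producing application encoded its single-byte escapes.
    Surrogate pairs written as two %uXXXX escapes are not combined.
    """
    # Step 1: replace each %uXXXX with the code point it names.
    expanded = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), source)
    # Step 2: let the standard library handle '+' and the remaining %XX.
    return unquote_plus(expanded, encoding=encoding)
```

With the string from the question, unquote_percent_u("spolu%u017E%E1ci.cz") should give "spolužáci.cz". If the producing application used a different single-byte encoding, pass it as the encoding argument.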
