urllib2.unquote() vs unicode


Maciej Bliziński

I've been hit by a urllib2.unquote() issue. Consider the following
unit test:

import unittest
import urllib2

class UnquoteUnitTest(unittest.TestCase):

    def setUp(self):
        self.utxt = u'%C4%99'
        self.stxt = '%C4%99'

    def testEq(self):
        self.assertEqual(
                self.utxt,
                self.stxt)

    def testStrEq(self):
        self.assertEqual(
                str(self.utxt),
                str(self.stxt))

    def testUnicodeEq(self):
        self.assertEqual(
                unicode(self.utxt),
                unicode(self.stxt))

    def testUnquote(self):
        self.assertEqual(
                urllib2.unquote(self.utxt),
                urllib2.unquote(self.stxt))

    def testUnquoteStr(self):
        self.assertEqual(
                urllib2.unquote(str(self.utxt)),
                urllib2.unquote(str(self.stxt)))

    def testUnquoteUnicode(self):
        self.assertEqual(
                urllib2.unquote(unicode(self.utxt)),
                urllib2.unquote(unicode(self.stxt)))


if __name__ == '__main__':
    unittest.main()

The three *Eq tests (testEq, testStrEq and testUnicodeEq) confirm that the two
values are equal as they are, and also when both are cast to str or both to
unicode. The tests that call unquote() on utxt and stxt after casting both to
str, or both to unicode, pass as well. However...


...E..
======================================================================
ERROR: testUnquote (__main__.UnquoteUnitTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unquote.py", line 28, in testUnquote
    urllib2.unquote(self.stxt))
  File "/usr/lib/python2.4/unittest.py", line 332, in failUnlessEqual
    if not first == second:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

----------------------------------------------------------------------
Ran 6 tests in 0.001s

FAILED (errors=1)

Why does this test fail while others are successful? Any ideas?

Regards,
Maciej

Gabriel Genellina

> Why does this test fail while others are successful? Any ideas?

Both utxt and stxt consist exclusively of ASCII characters, so the implicit
conversion with the default ASCII encoding works fine when you compare them.
When both are converted to unicode, or both to str, and then unquoted, the
results are again of the same type (both unicode or both str), so they
compare without problem, even if they can no longer be represented in ASCII
at this stage.
In testUnquote, unquoting the str gives a str containing the non-ASCII bytes
'\xc4\x99', while the other result is unicode. To compare a str with a
unicode object, Python first decodes the str using the default ASCII codec,
and that fails on byte 0xc4; this is the UnicodeDecodeError in your
traceback.
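A minimal sketch of the same rule (compare like with like), written for Python 3 as an illustration rather than a fix for the Python 2.4 test above. In Python 3 urllib2 no longer exists; unquote() and unquote_to_bytes() live in urllib.parse, and by default comparing bytes with str returns False instead of raising. The helper unquote_text() is a hypothetical name used only for this example.

```python
# Python 3 sketch: urllib2 is gone; the unquote functions live in urllib.parse.
from urllib.parse import unquote, unquote_to_bytes

quoted = '%C4%99'  # percent-encoded UTF-8 bytes of U+0119 (e with ogonek)

raw = unquote_to_bytes(quoted)  # bytes result: b'\xc4\x99'
text = unquote(quoted)          # str result, decoded as UTF-8 by default


def unquote_text(value, encoding='utf-8'):
    """Hypothetical helper: unquote a str or bytes value, always returning str.

    Assumes a bytes input holds only ASCII, as percent-encoded data normally does.
    """
    if isinstance(value, bytes):
        value = value.decode('ascii')
    return unquote(value, encoding=encoding)


# Mixed types: by default Python 3 does not raise here, it just reports "not equal".
print(raw == text)                   # False
# Same type on both sides: the comparison behaves as expected.
print(raw.decode('utf-8') == text)   # True
print(unquote_text(quoted) == unquote_text(quoted.encode('ascii')))  # True
```

The choice here is to settle on one type (text decoded as UTF-8) and convert every value to it before comparing; in the Python 2 test the equivalent is making both arguments unicode, or both str, before the assertEqual() call.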
 
