urllib2.unquote() vs unicode

Discussion in 'Python' started by Maciej Bliziński, Mar 18, 2008.

  1. I've been hit by a urllib2.unquote() issue. Consider the following
    unit test:

    import unittest
    import urllib2

    class UnquoteUnitTest(unittest.TestCase):

        def setUp(self):
            self.utxt = u'%C4%99'
            self.stxt = '%C4%99'

        def testEq(self):
            self.assertEqual(
                    self.utxt,
                    self.stxt)

        def testStrEq(self):
            self.assertEqual(
                    str(self.utxt),
                    str(self.stxt))

        def testUnicodeEq(self):
            self.assertEqual(
                    unicode(self.utxt),
                    unicode(self.stxt))

        def testUnquote(self):
            self.assertEqual(
                    urllib2.unquote(self.utxt),
                    urllib2.unquote(self.stxt))

        def testUnquoteStr(self):
            self.assertEqual(
                    urllib2.unquote(str(self.utxt)),
                    urllib2.unquote(str(self.stxt)))

        def testUnquoteUnicode(self):
            self.assertEqual(
                    urllib2.unquote(unicode(self.utxt)),
                    urllib2.unquote(unicode(self.stxt)))


    if __name__ == '__main__':
        unittest.main()

    The testEq, testStrEq and testUnicodeEq tests confirm that the two
    values compare equal, both as they are and when both are cast to str
    or to unicode. The tests that call unquote() with utxt and stxt both
    cast to str, or both cast to unicode, also pass. However...


    ...E..
    ======================================================================
    ERROR: testUnquote (__main__.UnquoteUnitTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "unquote.py", line 28, in testUnquote
        urllib2.unquote(self.stxt))
      File "/usr/lib/python2.4/unittest.py", line 332, in failUnlessEqual
        if not first == second:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

    ----------------------------------------------------------------------
    Ran 6 tests in 0.001s

    FAILED (errors=1)

    Why does this test fail while others are successful? Any ideas?

    Regards,
    Maciej
    Maciej Bliziński, Mar 18, 2008
    #1

  2. On Mar 18, 02:20, Maciej Bliziński wrote:
    > Why does this test fail while others are successful? Any ideas?


    Both utxt and stxt consist exclusively of ASCII characters, so the
    default ASCII encoding works fine.
    When both are converted to unicode, or both are converted to str,
    and then "unquoted", the resulting objects are again both unicode or
    both str, and they compare without problem (even if they can't be
    represented in ASCII at this stage).
    In testUnquote, after "unquoting", you have non-ASCII data and two
    different types, str and unicode. To compare them, Python first has
    to convert both to the same type, which means decoding the str with
    the default ASCII codec. That fails on byte 0xc4, which is the
    UnicodeDecodeError you see.

    --
    Gabriel Genellina
    Gabriel Genellina, Mar 18, 2008
    #2
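
The failing step described above can be reproduced by hand. The sketch below is only an illustration, not code from the thread: it assumes Python 3, where urllib.parse.unquote_to_bytes() stands in for what Python 2's unquote() did with a byte string, and it performs the ASCII decode explicitly instead of relying on an implicit conversion. The choice of UTF-8 for the final decode is an assumption about the data.

```python
from urllib.parse import unquote, unquote_to_bytes

quoted = '%C4%99'

# Byte-level result: the two raw bytes 0xC4 0x99, comparable to what
# Python 2's unquote() produced for a byte-string argument.
raw = unquote_to_bytes(quoted)

# Python 2 compared str with unicode by first decoding the str with the
# default ASCII codec. Doing that step by hand reproduces the message
# from the traceback.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with an explicitly chosen encoding (UTF-8 is an assumption
# about the data) gives text that compares cleanly with other text.
text = raw.decode('utf-8')

print(repr(raw))
print(ascii_error)
print(text == unquote(quoted))
```

The general idea is to bring both sides to the same type yourself, with an encoding you chose, before comparing them.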
