Bug in Python 2.6 urlencode

John Nagle · Sep 7, 2010

There's a bug in Python 2.6's "urllib.urlencode". If you pass
in a Unicode character outside the ASCII range, instead of it
being encoded properly, an exception is raised.

  File "C:\python26\lib\urllib.py", line 1267, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128)
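
The traceback points at str(v), which on Python 2 implicitly encodes a unicode object with the ASCII codec. A minimal sketch that triggers the same codec error explicitly, so it can be run on 2.x or 3.x (the variable names are illustrative only):

```python
# On Python 2, str(u"\xa9") does the equivalent of this explicit encode call.
copyright_sign = u"\xa9"

try:
    copyright_sign.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    # Same exception class and codec complaint as in the traceback.
    message = str(exc)

print(message)
```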

This will probably work in 3.x, because there, "str" converts
to Unicode, and quote_plus can handle Unicode. This is one of
those legacy bugs left from the pre-Unicode era.
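
For comparison, a short sketch of the same call on Python 3, assuming the standard urllib.parse module (the "symbol" key is just an example). There, text values are percent-encoded as UTF-8 by default:

```python
from urllib.parse import urlencode

# Python 3 only: str is already Unicode, and quoting defaults to UTF-8.
query = urlencode({"symbol": "\xa9"})
print(query)  # symbol=%C2%A9
```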

There's a workaround. Call urllib.urlencode with a second
parameter of 1. This turns on the optional feature of
accepting tuples in the argument to be encoded, and the
code goes through a newer code path that works.
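
A minimal sketch of that workaround, written so that it imports on either 2.x or 3.x (the "name" key is only an example):

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

params = {"name": u"\xa9"}

# A true second argument (doseq) selects the sequence-aware code path.
query = urlencode(params, 1)
print(query)
```

Worth verifying on your own 2.x install: that path appears to encode unicode values with the ASCII codec and the "replace" error handler, so the exception goes away but the character may come out as a quoted "?" rather than as its UTF-8 bytes.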

Is it worth reporting 2.x bugs any more? Or are we in the
version suckage period, where version N is abandonware and
version N+1 isn't deployable yet.

John Nagle

Ned Deily · Sep 7, 2010

Is it worth reporting 2.x bugs any more? Or are we in the
version suckage period, where version N is abandonware and
version N+1 isn't deployable yet.

Yes!! 2.7 is being actively maintained for bug fixes. (2.6 only for any
security issues that might arise.) It's easy enough to see this if you
take a glance at current activity on any of several Python
development-related mailing lists:

http://www.python.org/community/lists/

Terry Reedy · Sep 8, 2010

Is it worth reporting 2.x bugs any more? Or are we in the
version suckage period, where version N is abandonware and
version N+1 isn't deployable yet.

You may report 2.7 bugs, but please verify that the behavior is a bug in
2.7. However, bugs that have been fixed by the switch to
unicode for text are unlikely to be fixed a second time in 2.7. You
might suggest an enhancement to the doc for urlencode if that workaround
is not clear. Or perhaps that workaround suggests that in this case, a
fix would not be too difficult, and you can supply a patch.

The basic deployment problem is that people who want to use unicode text
also want to use libraries that have not been ported to use unicode
text. That is the major issue for many porting projects.

John Nagle · Sep 8, 2010

In other words, we're in the version suckage period.

John Nagle

Ned Deily · Sep 8, 2010

In other words, we're in the version suckage period.

It took me all of one minute to find where a similar issue was reported
previously (http://bugs.python.org/issue1349732). One of the comments
on the issue explains how to use the "doseq" form and an explicit encode
to handle Unicode items. I don't see where that part of the suggestion
made it into the documentation. I'm sure if you make a specific doc
change suggestion, it will be incorporated into the 2.7 docs. If you
think a code change is needed, suggest a specific patch.
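
A rough sketch of the "doseq plus explicit encode" approach described above. The helper names to_bytes and urlencode_text are hypothetical, and UTF-8 is an assumed choice of encoding, not something the issue mandates:

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

TEXT_TYPE = type(u"")  # unicode on 2.x, str on 3.x


def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes; leave every other type untouched."""
    if isinstance(value, TEXT_TYPE):
        return value.encode(encoding)
    return value


def urlencode_text(params, encoding="utf-8"):
    """Hypothetical helper: encode keys and values, then use the doseq form."""
    pairs = [(to_bytes(k, encoding), to_bytes(v, encoding))
             for k, v in sorted(params.items())]
    return urlencode(pairs, True)


query = urlencode_text({u"name": u"\xa9", u"year": 2010})
print(query)  # name=%C2%A9&year=2010
```

Encoding up front means urlencode only ever sees byte strings, so the result no longer depends on how a given version treats unicode values.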

John Nagle · Sep 8, 2010

That's a very funny bug report.

The report was created back in 2005:

Title: urllib.urlencode provides two features in one param
Type: feature request
Stage: committed/rejected

It wasn't listed as an actual bug.

On 2005-12-29, "Mike Brown" writes "However, I was unable to
reproduce your observation that doseq=0 results in urlencode
not knowing how to handle unicode. The object is just passed to str()."
This was back in the Python 2.4 days, when the restriction of "str" to
ASCII wasn't enforced. Perhaps the original reporter and the developer
were using different versions.

Five years later (!) Terry J. Reedy writes '"put something
somewhere" will not get action.'

In July 2010, Senthil Kumaran writes "This was fixed as part of
Issue8788. Closing this." Issue 8788 is a documentation fix only.

John Nagle
