urllib.quote fails on Unicode URL

John Nagle · May 4, 2007

The code in urllib.quote fails on Unicode input when
called by robotparser.

That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in Python
for a while now.
- The initialization may not be thread-safe; a table is being initialized
on first use. The code is too clever and uncommented.
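
A minimal sketch of the two points above, assuming a lookup table keyed by the 256 single-byte characters. It is a simplified stand-in, not the library's source: the names SAFE, TABLE, build_table and table_quote are hypothetical, and the code is written for Python 3 so it can be run today. It only shows why a 256-entry table cannot handle a character such as U+201D, and that building the table at import time sidesteps first-use initialization.

```python
# Hypothetical stand-in for a table-driven quote function (not urllib's code).
SAFE = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789_.-/")


def build_table(safe):
    """Map each of the 256 single-byte characters to itself or to %XX."""
    table = {}
    for i in range(256):
        c = chr(i)
        table[c] = c if c in safe else "%%%02X" % i
    return table


# Built once at import time, so no thread ever sees a half-filled table.
TABLE = build_table(SAFE)


def table_quote(s):
    """Quote s one character at a time using the 256-entry table."""
    return "".join(TABLE[c] for c in s)


print(table_quote("/a b"))  # /a%20b

try:
    table_quote("/DynamicContent/\u201d/mysaved/privacyPref.asp")
except KeyError as exc:
    # U+201D has no entry: the table stops at code point 255.
    print("KeyError:", repr(exc.args[0]))
```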

"robotparser" was trying to check whether a URL,
"http://www.highbeam.com/DynamicContent/”/mysaved/privacyPref.asp",
could be accessed, and there are some weird characters in there. Unicode
URLs are legal, so this is a real bug.

Logged in as Bug #1712522.

John Nagle

Peter Otten · May 4, 2007

John said:
The code in urllib.quote fails on Unicode input when
called by robotparser.

That bit of code needs some attention.
- It still assumes ASCII goes up to 255, which hasn't been true in Python
for a while now.
- The initialization may not be thread-safe; a table is being initialized
on first use. The code is too clever and uncommented.

"robotparser" was trying to check whether a URL,
"http://www.highbeam.com/DynamicContent/”/mysaved/privacyPref.asp",
could be accessed, and there are some weird characters in there. Unicode
URLs are legal, so this is a real bug.

Logged in as Bug #1712522.

There has been a related discussion:

http://groups.google.com/group/comp...read/thread/b331dc3625dbfc41/ce6e6a3c0635e340

IIRC the outcome was that, while UTF-8 is recommended,
urllib.quote()/unquote() should not guess the encoding.

What changes that would imply for robotparser I don't know...
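
If quote() is not to guess, the caller has to choose the encoding. A minimal sketch of that idea, assuming the caller picks UTF-8 and encodes before quoting; the helper name quote_path_utf8 is hypothetical and nothing here is robotparser's actual code. The import falls back to urllib.parse.quote, the Python 3 location of the same function.

```python
try:
    from urllib import quote          # Python 2, as discussed in this thread
except ImportError:
    from urllib.parse import quote    # Python 3 location of the same function


def quote_path_utf8(path):
    """Percent-encode a URL path, encoding text to UTF-8 explicitly first.

    The choice of UTF-8 is made here, by the caller, not inside quote().
    """
    if not isinstance(path, bytes):
        path = path.encode("utf-8")
    return quote(path)


print(quote_path_utf8(u"/DynamicContent/\u201d/mysaved/privacyPref.asp"))
# /DynamicContent/%E2%80%9D/mysaved/privacyPref.asp
```

Only the path is quoted here; passing a full URL would also escape the ":" after the scheme.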

Peter
