urllib.quote fails on Unicode URL

Discussion in 'Python' started by John Nagle, May 4, 2007.

  1. John Nagle

    John Nagle Guest

    The code in urllib.quote fails on Unicode input, when
    called by robotparser.

    That bit of code needs some attention.
    - It still assumes ASCII goes up to 255, which hasn't been true in Python
    for a while now.
    - The initialization may not be thread-safe; a table is being initialized
    on first use. The code is too clever and uncommented.

    "robotparser" was trying to check if a URL,
    "http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
    could be accessed, and there are some weird characters in there. Unicode
    URLs are legal, so this is a real bug.

    Logged in as Bug #1712522.

    John Nagle
     
    John Nagle, May 4, 2007
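
To make the complaint above concrete, here is a minimal sketch of a table-driven quote function of the kind the post describes. It is not the library source: `ALWAYS_SAFE`, `build_safe_map` and `table_quote` are hypothetical names, and the sketch is written for Python 3, where every `str` is Unicode. It assumes the lookup table covers only character codes 0 to 255, so any character above that range has no entry and the lookup fails.

```python
import string

# Characters left unescaped in this sketch (an assumption, not the library's exact set).
ALWAYS_SAFE = string.ascii_letters + string.digits + "_.-"


def build_safe_map(safe="/"):
    """Map each of the 256 single-byte characters to itself or to a %XX escape."""
    allowed = ALWAYS_SAFE + safe
    return {
        chr(i): (chr(i) if chr(i) in allowed else "%%%02X" % i)
        for i in range(256)
    }


def table_quote(s, safe="/"):
    """Quote s by table lookup; raises KeyError for characters above 255."""
    safe_map = build_safe_map(safe)
    return "".join(safe_map[c] for c in s)


print(table_quote("a b/c"))  # space is escaped, "/" is kept

try:
    table_quote("\u201d")  # RIGHT DOUBLE QUOTATION MARK, code point 0x201D
except KeyError as exc:
    print("no table entry for", repr(exc.args[0]))
```

This sketch rebuilds the table on every call, which sidesteps the first-use initialization question; a cached table shared between threads would need its own care.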

  2. Peter Otten

    Peter Otten Guest

    John Nagle wrote:

    > The code in urllib.quote fails on Unicode input, when
    > called by robotparser.
    >
    > That bit of code needs some attention.
    > - It still assumes ASCII goes up to 255, which hasn't been true in
    > Python for a while now.
    > - The initialization may not be thread-safe; a table is being
    > initialized on first use. The code is too clever and uncommented.
    >
    > "robotparser" was trying to check if a URL,
    > "http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"
    > could be accessed, and there are some weird characters in there. Unicode
    > URLs are legal, so this is a real bug.
    >
    > Logged in as Bug #1712522.


    There has been a related discussion:

    http://groups.google.com/group/comp...read/thread/b331dc3625dbfc41/ce6e6a3c0635e340

    IIRC the outcome was that, while UTF-8 is recommended,
    urllib.quote()/unquote() should not guess the encoding.

    What changes that would imply for robotparser I don't know...

    Peter
     
    Peter Otten, May 4, 2007
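
If the quoting function is not supposed to guess, the caller has to choose the encoding before quoting. The following is a minimal sketch of that idea, not a proposed fix for robotparser. It assumes Python 3, where the function lives at `urllib.parse.quote` and accepts bytes (in the Python 2 discussed in this thread it is `urllib.quote`). The helper name `quote_text` is hypothetical.

```python
from urllib.parse import quote, unquote


def quote_text(text, encoding="utf-8", safe="/"):
    """Percent-encode text after encoding it with an explicitly chosen codec."""
    return quote(text.encode(encoding), safe=safe)


# The path from the bug report, with the escaped characters written out.
path = '/DynamicContent/\u201d/mysaved/privacyPref.asp"'

quoted = quote_text(path)
print(quoted)  # /DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22

# Decoding needs the same encoding; this round-trips back to the original text.
print(unquote(quoted, encoding="utf-8") == path)  # True

# A different codec gives different bytes, which is why guessing is risky.
print(quote_text("\u00e9", encoding="latin-1"))  # %E9
print(quote_text("\u00e9", encoding="utf-8"))    # %C3%A9
```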
