urllib.quote fails on Unicode URL

Discussion in 'Python' started by John Nagle, May 4, 2007.

  1. John Nagle

    John Nagle Guest

    The code in urllib.quote fails on Unicode input, when
    called by robotparser.

    That bit of code needs some attention.
    - It still assumes ASCII goes up to 255, which hasn't been true in Python
    for a while now.
    - The initialization may not be thread-safe; a table is being initialized
    on first use. The code is too clever and uncommented.

    "robotparser" was trying to check if a URL,
    could be accessed, and there are some wierd characters in there. Unicode
    URLs are legal, so this is a real bug.

    Logged in as Bug #1712522.

    John Nagle
    John Nagle, May 4, 2007
    1. Advertisements

  2. John Nagle

    Peter Otten Guest

    There has been a related discussion:


    IIRC the outcome was that while UTF-8 is recommended
    urllib.quote()/unquote() should not guess the encoding.

    What changes that would imply for robotparser I don't know...

    Peter Otten, May 4, 2007
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.