URL Character Decoding

Discussion in 'Python' started by Kirk McDonald, Jan 30, 2006.

  1. If you have a link such as, e.g.:

    <a href="index.py?title=Main Menu">Main menu!</a>

    The space will be translated to the character code '%20' when you later
    retrieve the GET data. Not knowing if there was a library function that
    would convert these back to their actual characters, I've written the
    following:

    import re

    def sub_func(m):
        return chr(int(m.group()[1:], 16))

    def parse_title(title):
        p = re.compile(r'%[0-9][0-9]')
        return re.sub(p, sub_func, title)

    (I know I could probably use a lambda function instead of sub_func, but
    I come to Python via C++ and am still not entirely used to them. This is
    clearer to me, at least.)

    I guess what I'm asking is: Is there a library function (in Python or
    mod_python) that knows how to do this? Or, failing that, is there a
    different regex I could use to get rid of the substitution function?

    -Kirk McDonald
     
    Kirk McDonald, Jan 30, 2006
    #1

  2. Kirk McDonald wrote:
    > If you have a link such as, e.g.:
    >
    > <a href="index.py?title=Main Menu">Main menu!</a>
    >
    > The space will be translated to the character code '%20' when you later
    > retrieve the GET data. Not knowing if there was a library function that
    > would convert these back to their actual characters, I've written the
    > following:
    >
    > import re
    >
    > def sub_func(m):
    >     return chr(int(m.group()[1:], 16))
    >
    > def parse_title(title):
    >     p = re.compile(r'%[0-9][0-9]')
    >     return re.sub(p, sub_func, title)
    >
    > (I know I could probably use a lambda function instead of sub_func, but
    > I come to Python via C++ and am still not entirely used to them. This is
    > clearer to me, at least.)
    >
    > I guess what I'm asking is: Is there a library function (in Python or
    > mod_python) that knows how to do this? Or, failing that, is there a
    > different regex I could use to get rid of the substitution function?
    >
    > -Kirk McDonald


    Actually, I just noticed this doesn't really work at all. The URL
    character codes are in hex, so not only does the regex not match what it
    should, but sub_func fails miserably. See why I wanted a library function?

    -Kirk McDonald
     
    Kirk McDonald, Jan 30, 2006
    #2
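
The mismatch described in this post can be reproduced with a short check. This is only a sketch: the sample string is a hypothetical input, and the pattern is the one from the first post.

```python
import re

# The pattern from the first post: '%' followed by two decimal digits.
decimal_only = re.compile(r'%[0-9][0-9]')

# Hypothetical input containing three escapes: %3F, %3D and %20.
sample = 'index.py%3Ftitle%3DMain%20Menu'

# '%3F' and '%3D' contain hex letters, so the decimal-only pattern skips them.
matches = decimal_only.findall(sample)
print(matches)
```

Only the escape made of decimal digits is found, so escapes such as '%3F' would be left undecoded.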

  3. Kirk McDonald wrote:
    > Actually, I just noticed this doesn't really work at all. The URL
    > character codes are in hex, so not only does the regex not match what it
    > should, but sub_func fails miserably. See why I wanted a library function?
    >
    > -Kirk McDonald


    Not to keep talking to myself, but it looks like sub_func works fine, and
    the regex just needs to be r'%[0-9a-fA-F][0-9a-fA-F]'. But even so.

    -Kirk McDonald
     
    Kirk McDonald, Jan 30, 2006
    #3
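
Putting the corrected pattern together with the lambda mentioned in the first post might look like the sketch below. The name `_ESCAPE` is an assumption introduced for illustration, and the capture group is a small variation on the original `m.group()[1:]` slicing.

```python
import re

# '%' followed by exactly two hex digits, upper or lower case.
# _ESCAPE is a hypothetical name, not something from the thread.
_ESCAPE = re.compile(r'%([0-9a-fA-F]{2})')

def parse_title(title):
    """Replace each %XX escape with the character whose code is 0xXX."""
    # The lambda plays the role of sub_func from the first post.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), title)

print(parse_title('Main%20Menu'))
```

This maps each escape to a single code point, so it is only reliable for ASCII escapes. Multi-byte sequences such as UTF-8 encoded characters would need byte-level decoding, which is one more reason to prefer a library function.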
  4. "Kirk McDonald" <> wrote in message
    news:...
    > If you have a link such as, e.g.:
    >
    > <a href="index.py?title=Main Menu">Main menu!</a>
    >
    > The space will be translated to the character code '%20' when you later
    > retrieve the GET data.
    >
    > I guess what I'm asking is: Is there a library function (in Python or
    > mod_python) that knows how to do this? Or, failing that, is there a
    > different regex I could use to get rid of the substitution function?
    >
    > -Kirk McDonald



    >>> import urllib
    >>> urllib.quote("index.py?title=Main Menu")
    'index.py%3Ftitle%3DMain%20Menu'
    >>> urllib.unquote("index.py%3Ftitle%3DMain%20Menu")
    'index.py?title=Main Menu'
     
    Paul McGuire, Jan 30, 2006
    #4
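
The session above uses the Python 2 `urllib` module. In Python 3 the same functions live in `urllib.parse`, so a roughly equivalent sketch would be the following. The variable names are illustrative only, and `unquote_plus` and `parse_qs` are shown as related helpers that the thread itself does not mention.

```python
from urllib.parse import parse_qs, quote, unquote, unquote_plus

# Round trip of the string used in the thread.
encoded = quote("index.py?title=Main Menu")
decoded = unquote(encoded)

# Form-encoded query strings may use '+' for a space; unquote_plus handles that.
plus_decoded = unquote_plus("Main+Menu")

# parse_qs splits a query string into fields and decodes the values.
fields = parse_qs("title=Main%20Menu")

print(encoded)
print(decoded)
print(fields)
```

For GET data specifically, `parse_qs` may be the more convenient entry point, since it splits and decodes in one step.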
  5. Paul McGuire wrote:
    > "Kirk McDonald" <> wrote in message
    > news:...
    >
    >>If you have a link such as, e.g.:
    >>
    >><a href="index.py?title=Main Menu">Main menu!</a>
    >>
    >>The space will be translated to the character code '%20' when you later
    >>retrieve the GET data.
    >>
    >>I guess what I'm asking is: Is there a library function (in Python or
    >>mod_python) that knows how to do this? Or, failing that, is there a
    >>different regex I could use to get rid of the substitution function?
    >>
    >>-Kirk McDonald

    >
    > >>> import urllib
    > >>> urllib.quote("index.py?title=Main Menu")
    > 'index.py%3Ftitle%3DMain%20Menu'
    > >>> urllib.unquote("index.py%3Ftitle%3DMain%20Menu")
    > 'index.py?title=Main Menu'


    Perfect! Thanks.

    -Kirk McDonald
     
    Kirk McDonald, Jan 30, 2006
    #5