URL Character Decoding

Kirk McDonald

If you have a link such as:

<a href="index.py?title=Main Menu">Main menu!</a>

The space will be translated to the character code '%20' when you later
retrieve the GET data. Not knowing if there was a library function that
would convert these back to their actual characters, I've written the
following:

import re

def sub_func(m):
    return chr(int(m.group()[1:], 16))

def parse_title(title):
    p = re.compile(r'%[0-9][0-9]')
    return re.sub(p, sub_func, title)

(I know I could probably use a lambda function instead of sub_func, but
I come to Python via C++ and am still not entirely used to them. This is
clearer to me, at least.)

I guess what I'm asking is: Is there a library function (in Python or
mod_python) that knows how to do this? Or, failing that, is there a
different regex I could use to get rid of the substitution function?

-Kirk McDonald
 
Kirk McDonald

Actually, I just noticed this doesn't really work at all. The URL
character codes are in hex, so not only does the regex not match what it
should, but sub_func fails miserably. See why I wanted a library function?

-Kirk McDonald
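To make the mismatch concrete, here is a small sketch of what the digits-only pattern does with a code that contains a hex letter. The sample string is made up for illustration; it is not from the original post.

```python
import re

# The pattern from the first post: '%' followed by two decimal digits only.
digits_only = re.compile(r'%[0-9][0-9]')

# Hypothetical sample: '%2F' encodes '/', '%20' encodes a space.
sample = 'a%2Fb%20c'

# '%2F' is skipped because 'F' is not in [0-9]; only '%20' is found.
matches = digits_only.findall(sample)
print(matches)

# The two characters after '%' are a base-16 number.
slash = chr(int('2F', 16))
print(slash)
```

So any escape that uses the letters A-F passes through undecoded with this pattern.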
 
Kirk McDonald

Not to keep talking to myself, but looks like sub_func works fine, and
the regex just needs to be r'%[0-9a-fA-F][0-9a-fA-F]'. But even so.

-Kirk McDonald
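For reference, a minimal sketch of the hand-rolled decoder with the corrected pattern, using a lambda in place of sub_func. The name PERCENT_CODE is introduced here for illustration. It assumes every escaped character is a single byte, so multi-byte UTF-8 sequences would come out wrong, and it does not treat '+' as a space.

```python
import re

# '%' followed by exactly two hex digits, e.g. %20 or %2F (either case).
PERCENT_CODE = re.compile(r'%[0-9a-fA-F][0-9a-fA-F]')

def parse_title(title):
    # Drop the leading '%' and read the remaining two characters as base 16.
    return PERCENT_CODE.sub(lambda m: chr(int(m.group()[1:], 16)), title)

print(parse_title('Main%20Menu'))
```

Compiling the pattern once at module level avoids recompiling it on every call, though the re module caches patterns anyway, so this is mostly a readability choice.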
 
Paul McGuire

Kirk McDonald said:
If you have a link such as:

<a href="index.py?title=Main Menu">Main menu!</a>

The space will be translated to the character code '%20' when you later
retrieve the GET data.

I guess what I'm asking is: Is there a library function (in Python or
mod_python) that knows how to do this? Or, failing that, is there a
different regex I could use to get rid of the substitution function?

-Kirk McDonald

'index.py?title=Main Menu'
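The code that produced the output above appears to be missing from this copy of the thread. The output is consistent with the standard library's unquote function, so the following is a sketch under that assumption rather than a reconstruction of the original reply. In current Python the function lives in urllib.parse; in Python 2 it was urllib.unquote.

```python
from urllib.parse import parse_qs, unquote, unquote_plus, urlsplit

# Decode %XX escapes in a string.
decoded = unquote('index.py?title=Main%20Menu')
print(decoded)

# Form submissions often encode spaces as '+'; unquote_plus handles that too.
plus_decoded = unquote_plus('Main+Menu')

# To get at a single GET parameter, split off the query string and parse it.
# parse_qs decodes the values and returns a list for each key.
query = urlsplit('index.py?title=Main%20Menu').query
title = parse_qs(query)['title'][0]
print(title)
```

Unlike a one-byte-at-a-time regex substitution, unquote decodes multi-byte UTF-8 sequences as well, so it is the safer choice when titles may contain non-ASCII characters.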
 
