URL Character Decoding

Kirk McDonald · Jan 30, 2006

If you have a link such as:

<a href="index.py?title=Main Menu">Main menu!</a>

The space will be translated to the character code '%20' when you later
retrieve the GET data. Not knowing if there was a library function that
would convert these back to their actual characters, I've written the
following:

import re

def sub_func(m):
    return chr(int(m.group()[1:], 16))

def parse_title(title):
    p = re.compile(r'%[0-9][0-9]')
    return re.sub(p, sub_func, title)

(I know I could probably use a lambda function instead of sub_func, but
I come to Python via C++ and am still not entirely used to them. This is
clearer to me, at least.)

I guess what I'm asking is: Is there a library function (in Python or
mod_python) that knows how to do this? Or, failing that, is there a
different regex I could use to get rid of the substitution function?

-Kirk McDonald

Kirk McDonald · Jan 30, 2006

Actually, I just noticed this doesn't really work at all. The URL
character codes are in hex, so not only does the regex not match what it
should, but sub_func fails miserably. See why I wanted a library function?

-Kirk McDonald

Kirk McDonald · Jan 30, 2006

Not to keep talking to myself, but it looks like sub_func works fine, and
the regex just needs to be r'%[0-9a-fA-F][0-9a-fA-F]'. But even so.

-Kirk McDonald
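
For reference, the corrected version described above might look like the following. This is a minimal sketch, assuming Python 3; the name PERCENT_ESCAPE is made up for illustration, and the {2} repetition is just a shorter way of writing the two hex-digit character classes.

```python
import re

# Hypothetical name: matches '%' followed by exactly two hex digits.
PERCENT_ESCAPE = re.compile(r'%[0-9a-fA-F]{2}')

def sub_func(m):
    # Drop the leading '%' and read the remaining two characters as base 16.
    return chr(int(m.group()[1:], 16))

def parse_title(title):
    # Replace every percent escape with the character it encodes.
    return PERCENT_ESCAPE.sub(sub_func, title)

print(parse_title('Main%20Menu'))
```

Note that this decodes each escape as a single character code, so it only gives the right answer for single-byte characters; multi-byte UTF-8 sequences such as '%C3%A9' would come out wrong, which is one more reason to prefer a library function.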

Paul McGuire · Jan 30, 2006

Kirk McDonald said:
If you have a link such as, e.g.:

<a href="index.py?title=Main Menu">Main menu!</a>

The space will be translated to the character code '%20' when you later
retrieve the GET data.

I guess what I'm asking is: Is there a library function (in Python or
mod_python) that knows how to do this? Or, failing that, is there a
different regex I could use to get rid of the substitution function?

-Kirk McDonald

'index.py?title=Main Menu'
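
The call that produced this output did not survive in the post as shown here. A likely candidate is the standard library's unquote function; the sketch below assumes Python 3, where it lives in urllib.parse (in the Python 2 of the time it was urllib.unquote).

```python
from urllib.parse import unquote, unquote_plus

# Decode percent escapes such as %20 back into the characters they encode.
decoded = unquote('index.py?title=Main%20Menu')
print(repr(decoded))

# Form-encoded query strings may also use '+' for a space;
# unquote_plus handles both conventions.
decoded_plus = unquote_plus('index.py?title=Main+Menu')
print(repr(decoded_plus))
```

Unlike the regex approach, unquote decodes multi-byte UTF-8 escapes correctly by default.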

Kirk McDonald · Jan 30, 2006

Paul said:
'index.py?title=Main Menu'

Perfect! Thanks.

-Kirk McDonald
