URL 'special character' replacements

Claude Henchoz

Hi guys

I have a huge list of URLs. These URLs all have ASCII codes for special
characters, like "%20" for a space or "%21" for an exclamation mark.

I've already googled quite some time, but I have not been able to find
any elegant way on how to replace these with their 'real' counterparts
(" " and "!").

Of course, I could just replace(), but that seems to be a lot of work.

Thanks for any help.

Cheers, Claude
 
Richie Hindle

[Claude]
I have a huge list of URLs. These URLs all have ASCII codes for special
characters, like "%20" for a space or "%21" for an exclamation mark.

You need urllib.unquote:
Help on function unquote in module urllib:

unquote(s)
unquote('abc%20def') -> 'abc def'.
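
A minimal sketch of that call, using the two escapes from the question. One detail is an addition not stated in the thread: in Python 3 the function lives in `urllib.parse`, while `urllib.unquote` is the Python 2 name used here, so the import below tries both.

```python
# unquote() moved to urllib.parse in Python 3; fall back to the
# Python 2 location used in this thread if that import fails.
try:
    from urllib.parse import unquote
except ImportError:
    from urllib import unquote

# "%20" is a space and "%21" is an exclamation mark.
decoded = unquote("Hello%20world%21")
print(decoded)
```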
 
Duncan Booth

Claude said:
I have a huge list of URLs. These URLs all have ASCII codes for special
characters, like "%20" for a space or "%21" for an exclamation mark.

I've already googled quite some time, but I have not been able to find
any elegant way on how to replace these with their 'real' counterparts
(" " and "!").

Of course, I could just replace(), but that seems to be a lot of work.

urllib.unquote() or urllib.unquote_plus() as appropriate:

unquote( string)

Replace "%xx" escapes by their single-character equivalent.
Example: unquote('/%7Econnolly/') yields '/~connolly/'.


unquote_plus( string)

Like unquote(), but also replaces plus signs by spaces, as required for
unquoting HTML form values.
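
To make the difference between the two concrete, here is a short sketch. It assumes Python 3, where both functions are in `urllib.parse`, and the sample path is made up for illustration.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical sample containing both a "+" and a "%20".
sample = "/%7Econnolly/my+file%20name"

# unquote() decodes %xx escapes but leaves "+" alone.
plain = unquote(sample)

# unquote_plus() additionally turns "+" into a space,
# which is what form-encoded values need.
form_style = unquote_plus(sample)

print(plain)
print(form_style)
```

For ordinary URL paths `unquote()` is usually the safer choice, since a literal "+" in a path is not a space.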
 
Fredrik Lundh

Claude said:
I have a huge list of URLs. These URLs all have ASCII codes for special
characters, like "%20" for a space or "%21" for an exclamation mark.

I've already googled quite some time, but I have not been able to find
any elegant way on how to replace these with their 'real' counterparts
(" " and "!").

Of course, I could just replace(), but that seems to be a lot of work.
'http://docs.python.org/lib/module-urllib.html !'

</F>
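
The call that produced the output quoted above did not survive in this copy of the thread. The sketch below shows one call that yields the same string. The input URL is an assumption reconstructed from the output, and the import assumes Python 3.

```python
from urllib.parse import unquote

# Assumed input: the docs URL followed by an escaped space and "!".
quoted = "http://docs.python.org/lib/module-urllib.html%20%21"

result = unquote(quoted)
print(repr(result))
```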
 
Tim N. van der Leeuw

My outline for a solution would be:

- Use StringIO or cStringIO for reading the original URLs character for
character, and to build the result URLs character for character

- When you read a '%' then read the next 2 characters (these should be
hexadecimal digits!) and create a new string with them
- The numbers like '20' etc. are hexadecimal values, meaning integers
with base 16.
Get the actual int-value like this:
code_int = int(code_str, 16)
- Convert to character as: code_chr = chr(code_int)
- Write this character to the output cStringIO buffer
- When the whole URL is done, do getvalue() to get the string of the
new URL and close the cStringIO buffer.
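
The outline above could be sketched roughly as follows. This is only an illustration under some assumptions: it targets Python 3, so `io.StringIO` stands in for `cStringIO`, the function name `manual_unquote` is made up, and a '%' that is not followed by two hex digits is copied through unchanged. Because it maps each escape to a single character with `chr()`, it does not handle multi-byte UTF-8 sequences such as "%C3%A9" the way `unquote()` does.

```python
import io

HEX_DIGITS = "0123456789abcdefABCDEF"

def manual_unquote(url):
    """Decode %xx escapes one character at a time (hypothetical helper)."""
    src = io.StringIO(url)   # input buffer, read character by character
    out = io.StringIO()      # output buffer, built character by character
    while True:
        ch = src.read(1)
        if ch == "":
            break            # end of the URL
        if ch != "%":
            out.write(ch)
            continue
        pos = src.tell()     # remember where the escape digits start
        code_str = src.read(2)
        if len(code_str) == 2 and all(c in HEX_DIGITS for c in code_str):
            code_int = int(code_str, 16)   # base-16 value, e.g. '20' -> 32
            out.write(chr(code_int))
        else:
            # Not a valid escape: keep the '%' and re-read what follows.
            src.seek(pos)
            out.write("%")
    result = out.getvalue()
    src.close()
    out.close()
    return result

print(manual_unquote("abc%20def%21"))
```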

Is that sufficiently comprehensible? Or still too convoluted for you?

(PS: I researched doing it the manual way, 'the hard way'. However,
there are plenty of libraries in Python for all sorts of internet
stuff. Perhaps urllib or urllib2 already has the functionality that you
need -- didn't look it up)

cheers,

--Tim
 
Brett g Porter

Claude said:
Hi guys

I have a huge list of URLs. These URLs all have ASCII codes for special
characters, like "%20" for a space or "%21" for an exclamation mark.

I've already googled quite some time, but I have not been able to find
any elegant way on how to replace these with their 'real' counterparts
(" " and "!").

Of course, I could just replace(), but that seems to be a lot of work.

Thanks for any help.

Cheers, Claude

The standard library module 'urllib' gives you two choices, depending on
the exact behavior you'd like:

http://www.python.org/doc/2.3.2/lib/module-urllib.html
unquote(string)
Replace "%xx" escapes by their single-character equivalent.

Example: unquote('/%7Econnolly/') yields '/~connolly/'.

unquote_plus(string)
Like unquote(), but also replaces plus signs by spaces, as required
for unquoting HTML form values.
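
Applied to the original problem, a whole list of URLs can be decoded in one pass. The sketch below assumes Python 3 (`urllib.parse`), and the URLs are placeholder examples rather than data from the thread.

```python
from urllib.parse import unquote

# Placeholder URLs standing in for the "huge list".
urls = [
    "http://example.com/a%20b",
    "http://example.com/hello%21",
    "http://example.com/%7Econnolly/",
]

# Decode every entry; swap in unquote_plus if the values are form-encoded.
decoded_urls = [unquote(u) for u in urls]

for u in decoded_urls:
    print(u)
```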
 
Claude Henchoz

Thanks guys, I like the urllib solution. Stupid me, looked at urllib
reference, but thought that "quote" and "unquote" deal with
&nbsp; style entities.
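
For anyone with the same confusion, a small sketch of the distinction. It assumes Python 3, where `html.unescape` handles HTML entities and `urllib.parse.unquote` handles percent escapes. Neither function touches the other kind of escape.

```python
import html
from urllib.parse import unquote

percent_text = "a%20b"      # URL percent escape
entity_text = "a&nbsp;b"    # HTML entity

# unquote() decodes %xx escapes and ignores entities.
url_decoded = unquote(percent_text)
url_untouched = unquote(entity_text)

# html.unescape() decodes entities and ignores %xx escapes.
# &nbsp; becomes U+00A0 (no-break space), not a plain space.
entity_decoded = html.unescape(entity_text)
entity_untouched = html.unescape(percent_text)

print(repr(url_decoded), repr(entity_decoded))
```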
 
