utf-8 encoding issue

Marc Petitmermet

The line below looks up the name "öttinger" (with the German umlaut) of
an author using the mysql console:

mysql> select author from records where author like '%Öttinger%';

This successfully finds all entries in the records database where
"öttinger" is the author or the co-author.

In a web form, the user enters "öttinger" and wants to search with this
search string. My idea is now to convert the search string (which also
could be e.g. some cyrillic text) into unicode and then to utf-8:

unicode(search_string).encode('utf-8')

This gives me the utf-8 encoded version of the string but not yet in the
correct representation. How can I get the correct one (is this the hex
version? I don't know the correct terminology.)?

In short: how do I e.g. convert a string containing a "ö" into a string
containing a "%Ö"?

Regards,
Marc
 
Fredrik Lundh

Marc said:
In a web form, the user enters "öttinger" and wants to search with this
search string. My idea is now to convert the search string (which also
could be e.g. some cyrillic text) into unicode and then to utf-8:

unicode(search_string).encode('utf-8')

This gives me the utf-8 encoded version of the string but not yet in the
correct representation. How can I get the correct one (is this the hex
version? I don't know the correct terminology.)?

In short: how do I e.g. convert a string containing a "ö" into a string
containing a "%Ö"?

that's not UTF-8, that's HTML/XML-style charrefs.

if mysql translates the charrefs to unicode characters, you can simply
use:

s = u.encode("ascii", "xmlcharrefreplace")

where "u" is a unicode string.
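
as an illustration, here is a minimal sketch of that call. it assumes Python 3, where every str is already a unicode string and encode() returns bytes; the variable names are only for the example.

```python
# Assumes Python 3: str is Unicode, and .encode() returns bytes.
u = "Öttinger"

# Non-ASCII characters are replaced by decimal character references.
s = u.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(s)  # &#214;ttinger
```

the trailing decode("ascii") only turns the bytes back into text so the result can be concatenated into a larger string.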

if you've stored charrefs as is in the database, you're in for some
serious trouble. assuming that all charrefs are hexadecimal charrefs,
you can use something like:

import re
def fixup(m): return "&#" + hex(int(m.group(1)))[1:]
s = re.sub(r"&#(\d+)", fixup, u.encode("ascii", "xmlcharrefreplace"))

to map all non-ASCII characters to charrefs, and then translate all
charrefs to hexadecimal charrefs.
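
the same two-step idea can be sketched for Python 3 roughly as follows. the helper name to_hex_charrefs is hypothetical, and whether the database holds lower-case or upper-case hex digits is an assumption you would need to check against the stored data.

```python
import re

def to_hex_charrefs(text):
    """Replace non-ASCII characters with hexadecimal charrefs (e.g. &#xf6;)."""
    # Step 1: non-ASCII characters become decimal charrefs such as &#246;
    decimal = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
    # Step 2: rewrite every decimal charref as a lower-case hexadecimal one
    return re.sub(r"&#(\d+);", lambda m: "&#x%x;" % int(m.group(1)), decimal)

print(to_hex_charrefs("öttinger"))  # &#xf6;ttinger
```

note that step 2 also rewrites any decimal charrefs that were already present in the input, not only the ones produced by step 1.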

decoding the charrefs *before* you add the strings to the database
is a better idea, though.
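
one possible way to do that decoding, assuming Python 3 and its standard html module, is sketched below. the sample string is made up for the example.

```python
import html

# Hypothetical stored value containing a decimal and a hexadecimal charref.
stored = "&#246;ttinger and &#xD6;ttinger"

# html.unescape turns numeric (and named) character references into characters.
decoded = html.unescape(stored)
print(decoded)  # öttinger and Öttinger
```

html.unescape also decodes named references such as &amp;amp;, so it is only appropriate if the stored text is meant to be read as HTML-escaped in the first place.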

</F>
 
