string conversion latin2 to ascii

Martin Landa

Hi all,

sorry for the newbie question. I have a unicode string (or, better said,
a latin2-encoded string) containing non-ascii characters, e.g.

s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"

I would like to convert this string to plain ascii (using some lookup
table for latin2)

to get

-> Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA

Thanks for any hints! Regards, Martin Landa
 
kyosohma

With a little googling, I found this:

http://www.peterbe.com/plog/unicode-to-ascii

You might also find this article useful:

http://www.reportlab.com/i18n/python_unicode_tutorial.html

Mike
 
Martin v. Löwis

> sorry for a newbie question. I have unicode string (or better say
> latin2 encoding) containing non-ascii characters, e.g.
>
> s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"

That's not a Unicode string (at least in Python 2); it is
a latin-2 encoded byte string; it has nothing to do with Unicode.
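
For anyone reading this on Python 3, where the two types are kept apart explicitly, the distinction can be seen with a small sketch like the one below. The variable names are made up for illustration, and "iso-8859-2" is the codec name Python uses for latin-2.

```python
# Python 3 sketch: str holds Unicode text, bytes holds encoded data.
text = "Ukázka"                      # Unicode text (str)
raw = text.encode("iso-8859-2")      # the same text as latin-2 encoded bytes

# In latin-2 every character of this word fits in a single byte.
print(type(raw).__name__, len(raw))

# Decoding with the matching codec gives the original text back.
print(raw.decode("iso-8859-2") == text)
```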
> I would like to convert this string to plain ascii (using some lookup
> table for latin2)
>
> to get
>
> -> Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA

I recommend using string.translate. You need a translation
table for it, which is best generated with string.maketrans.

import string
# s must be a latin-2 byte string (one byte per character)
table = string.maketrans("áží", "azi")
print s.translate(table)
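
The snippet above is Python 2 and works on byte strings. On Python 3, where strings are Unicode, a rough equivalent might look like the sketch below. The table name CZECH_TO_ASCII and the helper to_ascii are hypothetical, and the table covers Czech letters only, so other latin-2 languages would need more entries.

```python
# Python 3 sketch: str.translate works on Unicode characters, not bytes.
# CZECH_TO_ASCII is a hand-written, illustrative table (Czech letters only).
CZECH_TO_ASCII = str.maketrans(
    "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ",
    "acdeeinorstuuyzACDEEINORSTUUYZ",
)


def to_ascii(text: str) -> str:
    """Replace accented Czech letters with their unaccented counterparts."""
    return text.translate(CZECH_TO_ASCII)


s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"
print(to_ascii(s))
```

Characters that are not in the table pass through unchanged, so the result is only guaranteed to be plain ascii if the table covers everything that occurs in the input.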

HTH,
Martin
 
John Machin


> With a little googling, I found this:
>
> http://www.peterbe.com/plog/unicode-to-ascii

and if the OP has the patience to read *ALL* the comments on that blog
entry, he will find that comment[-2] points to

http://effbot.python-hosting.com/file/stuff/sandbox/text/unaccent.py

and comment[-1] (from the blog owner) is "Brilliant! Thank you."

The bottom line is that there is no universal easy solution; you need
to handcraft a translation table suited to your particular purpose
(e.g. do you want u-with-umlaut to become u or ue?). The
unicodedata.normalize function is useful for off-line preparation of a
set of candidate mappings for that table; it should not be applied
either on-line or blindly.
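
As a rough illustration of that off-line preparation step, the Python 3 sketch below walks the upper half of latin-2 and prints a candidate base letter for each accented character, ready to be reviewed and pasted into a hand-maintained table. The function name candidate_mappings is made up for this example, and the output is only a starting point, not a finished table.

```python
import unicodedata


def candidate_mappings(encoding: str = "iso-8859-2") -> dict:
    """Suggest an ascii base letter for each accented letter in a single-byte encoding."""
    candidates = {}
    for byte in range(0xA0, 0x100):
        ch = bytes([byte]).decode(encoding)
        # NFKD splits e.g. "á" into "a" plus a combining acute accent.
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        # Keep only cases where stripping the marks leaves an ascii letter.
        if base != ch and base.isascii() and base.isalpha():
            candidates[ch] = base
    return candidates


# Print the candidates in a form that can be reviewed and edited by hand.
for ch, base in sorted(candidate_mappings().items()):
    print(f"{ch!r}: {base!r},")
```

The review step matters: this always proposes u for u-with-umlaut, never ue, and letters such as Ł or ß have no decomposition at all, so they do not show up in the candidates and have to be added manually.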

Cheers,
John
 
Jakub Wilk

> I have unicode string (or better say latin2 encoding) containing
> non-ascii characters, e.g.
>
> s = "Ukázka_možnosti_využití_programu_OpenJUMP_v_SOA"
>
> I would like to convert this string to plain ascii (using some lookup
> table for latin2)
>
> to get
>
> -> Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA

You may try python-elinks
Ukazka_moznosti_vyuziti_programu_OpenJUMP_v_SOA
 
kyosohma

Sorry...I didn't know about translation tables or I would have
mentioned that instead. My bad.

Mike
 
