convert Unicode filenames to good-looking ASCII

coldpizza · May 6, 2010

Hello,

I need to convert accented Unicode characters in the names of some
audio files to similar-looking ASCII characters. The following code
seems to work on Windows:

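import os
import sys
import glob

EXT = '*.*'

# unicode pattern: the file names come back as unicode objects
lst_uni = glob.glob(unicode(EXT))

os.system('chcp 437')
# byte-string pattern: the same files come back as byte strings
lst_asci = glob.glob(EXT)
print sys.stdout.encoding

for i in range(len(lst_asci)):
    try:
        os.rename(lst_uni[i], lst_asci[i])
    except Exception as e:
        print e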

On Windows it converts most of the accented characters from the Latin-1
encoding. It does not work on Linux, since it relies on 'chcp'.

The questions are (1) *why* does it work on Windows, and (2) what is
the proper and portable way to convert Unicode characters to
similar-looking plain ASCII characters?

That is, how do I properly do this kind of conversion?
ü > u
é > e
â > a
ä > a
à > a
á > a
ç > c
ê > e
ë > e
è > e

Is there any other way apart from creating my own char replacement
table?

Iliya · May 6, 2010

Try something like this:

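import unicodedata

def remove_accents(text):
    # Decompose each accented character into base character + combining marks
    nfkd_form = unicodedata.normalize('NFKD', unicode(text))
    # Keep only the characters that are not combining marks
    return u''.join([c for c in nfkd_form if not unicodedata.combining(c)])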
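
To connect the function above back to the original task of renaming files, here is a minimal sketch written for Python 3, where str is already Unicode and the unicode() call is not needed. The helper rename_without_accents and its collision handling are assumptions for illustration, not something from the thread.

```python
import os
import unicodedata


def remove_accents(text):
    """Drop combining marks after NFKD decomposition (same idea as above)."""
    nfkd_form = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in nfkd_form if not unicodedata.combining(c))


def rename_without_accents(directory='.'):
    """Rename each entry in `directory` to its accent-free name.

    Hypothetical helper: skips names that would not change and names
    that would collide with an existing entry.
    """
    renamed = []
    for old_name in sorted(os.listdir(directory)):
        new_name = remove_accents(old_name)
        if new_name == old_name:
            continue  # nothing to strip
        old_path = os.path.join(directory, old_name)
        new_path = os.path.join(directory, new_name)
        if os.path.exists(new_path):
            continue  # do not overwrite an existing file
        os.rename(old_path, new_path)
        renamed.append((old_name, new_name))
    return renamed
```

Because it goes through os.listdir and os.rename only, this does not depend on 'chcp' or on any console code page. Characters that have no decomposition are left as they are, so the result is not guaranteed to be pure ASCII.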

Peter Otten · May 6, 2010

... é > e
... â > a
... ä > a
... à > a
... á > a
... ç > c
... ê > e
... ë > e
... è > e
... """
>>> from unicodedata import normalize
u > u
e > e
a > a
a > a
a > a
a > a
c > c
e > e
e > e
e > e
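
A minimal sketch of one way to get output like the session above, written for Python 3. It assumes NFKD decomposition followed by an ASCII encode with errors='ignore'; the helper name to_ascii is illustrative and not from the thread.

```python
import unicodedata


def to_ascii(text):
    """Decompose accented characters, then drop everything non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' discards the combining marks left by the decomposition,
    # and also any other character that has no ASCII form.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('ü > u'))       # u > u
print(to_ascii('smørrebrød'))  # smrrebrd
```

The second call shows the limit of this approach: a letter such as ø has no decomposition into a base letter plus a combining mark, so it is dropped rather than turned into o. Covering such letters would still need a small replacement table.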

coldpizza · May 6, 2010

Cool! Thanks to both Iliya and Peter!
