way to remove all non-ascii characters from a file?

omission9

I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in Python?
 
Larry Bates

Something simple like the following will work for files
that fit in memory:

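def onlyascii(char):
    # keep only 7-bit ASCII characters (code points 0-127);
    # spaces, newlines and punctuation are ASCII too, so they stay
    return ord(char) < 128

f = open('filename.ext', 'r')
data = f.read()
f.close()
# ''.join turns filter's result into a string in both Python 2 and 3
filtered_data = ''.join(filter(onlyascii, data))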

For larger files you will need to loop and read
the data in chunks.

-Larry Bates
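A minimal sketch of the chunked loop mentioned above, written for Python 3. It assumes the file can be handled as raw bytes, so every byte of 128 or above is dropped (for UTF-8 input that removes each multi-byte character completely). The name `strip_non_ascii_file` and the 64 KiB chunk size are illustrative choices, not from the post.

```python
def strip_non_ascii_file(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path, dropping every byte outside 0-127.

    Works on raw bytes, so nothing is decoded and at most chunk_size
    bytes of input are held in memory at a time.
    """
    # bytes.translate(None, delete) removes every byte listed in `delete`
    non_ascii = bytes(range(128, 256))
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            dst.write(chunk.translate(None, non_ascii))
```

Usage would be something like `strip_non_ascii_file('filename.ext', 'filtered.ext')`. Because each byte is kept or dropped on its own, a chunk boundary cannot split anything that matters, so no state has to be carried between reads.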
 
Ivan Voras

omission9 said:
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?

import string
file("file2","w").write("".join(
    [ch for ch in file("file1", "r").read()
     if ch in string.ascii_letters]))

but this will also strip line breaks, spaces, digits and punctuation :)

(n.b. I didn't actually test the above code, and wrote it for its
amusement value :) )
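If the goal is to keep all ASCII rather than only letters, the membership test can check for ASCII directly. A sketch assuming Python 3.7 or later (for `str.isascii()`) and a UTF-8 input file; `keep_ascii` and `copy_ascii_only` are hypothetical names:

```python
def keep_ascii(text):
    # str.isascii() (Python 3.7+) is True only for code points 0-127,
    # so newlines, spaces, digits and punctuation all survive
    return "".join(ch for ch in text if ch.isascii())


def copy_ascii_only(src_path, dst_path):
    # Assumes the source file is UTF-8; adjust `encoding` for other files
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    # Everything left is ASCII, so an ASCII writer cannot fail here
    with open(dst_path, "w", encoding="ascii") as dst:
        dst.write(keep_ascii(text))
```

For example, `copy_ascii_only("file1", "file2")` mirrors the one-liner above without losing line breaks.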
 
Peter Otten

omission9 said:
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?

Read it in chunks, then remove the non-ascii characters like so:
'Trichte Logik bser Kobold'

and finally write the maimed chunks to a file. However, it's not clear to
me how removing characters could be a good idea in the first place.
Replacing them at least gives some minimal hint that something is missing:
'T?richte Logik b?ser Kobold'

Peter
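The interactive session in Peter's post did not survive in this archive; only its output did. Both outputs are consistent with encoding the text to ASCII with the `"ignore"` and `"replace"` error handlers. A Python 3 sketch of that, assuming the sample sentence was "Törichte Logik böser Kobold" (inferred from the output, not quoted from the post):

```python
# Assumed input, reconstructed from the outputs shown in the post
text = "Törichte Logik böser Kobold"

# "ignore" silently drops every character ASCII cannot encode
dropped = text.encode("ascii", "ignore").decode("ascii")

# "replace" substitutes "?" for each such character instead
marked = text.encode("ascii", "replace").decode("ascii")

print(dropped)  # Trichte Logik bser Kobold
print(marked)   # T?richte Logik b?ser Kobold
```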
 
Peter Hansen

Gerhard said:
I have a text file which contains the occasional non-ascii character.
What is the best way to remove all of these in python?

Here's a simple example that does what you want:
>>> orig = "Häring"
>>> "".join([x for x in orig if ord(x) < 128])
'Hring'


Or, if performance is critical, it's possible something like this would
be faster (a regex might be even better, avoiding the redundant identity
transformation step):
>>> from string import maketrans, translate
>>> table = maketrans('', '')
>>> translate(orig, table, table[128:])
'Hring'


-Peter
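`string.maketrans` and the module-level `string.translate` function exist only in Python 2. A sketch of Python 3 counterparts, including the regex variant Peter mentions; the function names are hypothetical and no timings are claimed:

```python
import re

# Matches any single character outside the 7-bit ASCII range
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def strip_with_regex(text):
    # str version: delete every non-ASCII code point in one pass
    return NON_ASCII.sub("", text)


def strip_with_translate(data):
    # bytes version: a table of None means "delete only", so there is
    # no identity mapping step; bytes 128-255 are removed
    return data.translate(None, bytes(range(128, 256)))


print(strip_with_regex("Häring"))                      # Hring
print(strip_with_translate("Häring".encode("utf-8")))  # b'Hring'
```

The regex version works on decoded text, while the translate version works on raw bytes, so it suits files read in binary mode.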
 
