unicode text file

Junaid · Sep 27, 2009

I want to do replacements in a UTF-8 text file. Example:

f=open("test.txt","r") #this file is utf-8 encoded

raw = f.read()
txt = raw.decode("utf-8")

txt.replace{'English', ur'ഇംഗ്ലീഷ്') #replacing raw unicode string, but not working

f.write(txt)
f.close()
f.flush()

please, help me

thanks

Vlastimil Brom · Sep 27, 2009

Does
txt.replace('English', ur'ഇംഗ്ലീഷ്')
instead of
txt.replace{'English', ur'ഇംഗ്ലീഷ്')

fix the problem?

hth
vbr

MRAB · Sep 27, 2009

Junaid said:
I want to do replacements in a utf-8 text file. example

f=open("test.txt","r") #this file is utf-8 encoded

raw = f.read()
txt = raw.decode("utf-8")

txt.replace{'English', ur'ഇംഗ്ലീഷ്') #replacing raw unicode string, but not working

txt = txt.replace('English', ur'ഇംഗ്ലീഷ്')

f.write(txt)
f.close()
f.flush()

The file will be flushed when it's closed, and flushing it after closing
is meaningless.
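
A minimal sketch of that point, assuming Python 3 (the thread itself uses Python 2) and a hypothetical scratch file created only for the illustration. Leaving the `with` block closes the file, which flushes it, and a later flush() on the closed file raises ValueError rather than doing anything useful:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("English")
# Leaving the with-block closed f; closing flushes any buffered data.

file_is_closed = f.closed

try:
    f.flush()  # flushing an already closed file is an error
    flush_error = None
except ValueError as exc:
    flush_error = exc

# The data reached the disk without any explicit flush() call.
with open(path, encoding="utf-8") as f:
    contents = f.read()
```

So the order close() then flush() is not just redundant; dropping the flush() call is enough.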

Mark Tolonen · Sep 27, 2009

Junaid said:
I want to do replacements in a utf-8 text file. example

f=open("test.txt","r") #this file is utf-8 encoded
raw = f.read()
txt = raw.decode("utf-8")

You can use the codecs module to open and decode the file in one step.

txt.replace{'English', ur'ഇംഗ്ലീഷ്') #replacing raw unicode string, but not working

The replace method returns the altered string. It does not modify it in
place. You also should use Unicode strings for both the arguments (although
it doesn't matter in this case). Using a raw Unicode string is also
unnecessary in this case.

txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')
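
A short sketch of why the assignment matters, assuming Python 3 string literals (in Python 2 they would carry a u prefix). The sample sentence is hypothetical. Strings are immutable, so replace() builds a new string and leaves the original alone:

```python
txt = "I speak English"

# The return value is thrown away here, so txt does not change.
txt.replace("English", "ഇംഗ്ലീഷ്")
unchanged = txt

# Rebinding the name keeps the result of the replacement.
txt = txt.replace("English", "ഇംഗ്ലീഷ്")
```

After the first call `unchanged` still contains "English"; only the reassigned `txt` holds the replaced text.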

f.write(txt)

You opened the file for reading. You'll need to close the file and reopen
it for writing.

f.close()
f.flush()

Flush isn't required. close() will flush.

Also, to have text like ഇംഗ്ലീഷ് in a Python source file, you'll need to
declare the encoding of the file at the top and be sure to actually save the
file in that encoding.
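
One alternative that is not from the thread, sketched here under the assumption of Python 3: spelling the word with \u escapes keeps the source file pure ASCII, so the result does not depend on how the editor saved the script. In Python 2 the same escapes would go in a u'...' literal. The code points below are the eight characters of the Malayalam word used above.

```python
# The Malayalam word for "English", written as code-point escapes so the
# source file itself contains only ASCII characters.
word = "\u0d07\u0d02\u0d17\u0d4d\u0d32\u0d40\u0d37\u0d4d"

# Each of these characters takes three bytes in UTF-8.
encoded = word.encode("utf-8")
```

This trades readability for robustness; with an editor that reliably saves UTF-8, the literal form plus the coding declaration is easier to read.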

In summary:

# coding: utf-8
import codecs
f = codecs.open('test.txt','r','utf-8')
txt = f.read()
txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')
f.close()
f = codecs.open('test.txt','w','utf-8')
f.write(txt)
f.close()

-Mark
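
For readers on Python 3, the same read, replace, write round trip might look roughly like the sketch below. It assumes the built-in open() with its encoding parameter in place of codecs.open; the helper name replace_in_file and the temporary sample file are hypothetical and exist only for the illustration.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding="utf-8"):
    """Read the file at path, replace old with new, and write it back."""
    with open(path, "r", encoding=encoding) as f:
        txt = f.read()
    txt = txt.replace(old, new)  # replace() returns a new string
    # Reopen for writing only after the read handle has been closed.
    with open(path, "w", encoding=encoding) as f:
        f.write(txt)
    return txt


# Hypothetical sample file standing in for test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Language: English")

result = replace_in_file(path, "English", "ഇംഗ്ലീഷ്")

# Read the raw bytes back to check that the file really is UTF-8.
with open(path, "rb") as f:
    raw = f.read()
```

The `with` blocks take care of closing, so no explicit close() or flush() calls are needed.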

Junaid · Oct 3, 2009

thanx everyone for replying,

I did as Mark suggested, and it worked

thanx once more
