utf-8 read/write file


Bruno

Hi!

I have a big .txt file which I want to read, process, and write to another .txt file.
I have written a script for that, but I'm having problems with Croatian characters
(Š, Đ, Ž, Č, Ć).
How can I read from and write to a file in UTF-8 encoding?
I read the file with fileinput.input.

thanks
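A minimal sketch of the read, process, write loop, assuming the input really is UTF-8 and that Python 3 is used (the thread itself is on Python 2.5, where the output file would be opened with codecs.open instead). `fileinput.hook_encoded` makes `fileinput.input` decode every line; `process_line`, `convert` and the default file names are hypothetical placeholders.

```python
import fileinput


def process_line(line):
    # Hypothetical placeholder for the real text processing.
    return line.upper()


def convert(src="file1.txt", dst="file2.txt"):
    # hook_encoded("utf-8") makes fileinput decode each line from UTF-8 into text.
    with fileinput.input(src, openhook=fileinput.hook_encoded("utf-8")) as lines:
        # An explicit encoding on the output encodes the text back to UTF-8,
        # so characters like š, đ, ž, č, ć survive the round trip.
        with open(dst, "w", encoding="utf-8") as out:
            for line in lines:
                out.write(process_line(line))
```

The key point is that the encoding is named on both ends; relying on the platform default (often CP1250 on a Croatian Windows install) is what usually mangles these characters.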
 

Benjamin

Bruno said:
Hi!

I have a big .txt file which I want to read, process, and write to another .txt file.
I have written a script for that, but I'm having problems with Croatian characters
(Š, Đ, Ž, Č, Ć).

Can you show us what you have so far?

Bruno said:
How can I read from and write to a file in UTF-8 encoding?

import codecs
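# Pass the encoding explicitly; without it codecs.open falls back to the plain built-in open()
data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()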
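Building on the codecs.open suggestion, a sketch of the full round trip: the encoding has to be given on the read side and on the write side. The helper name `copy_utf8` is hypothetical. In Python 3 the built-in `open()` also accepts `encoding=` directly and is generally the preferred spelling.

```python
import codecs


def copy_utf8(src, dst):
    # encoding="utf-8" on read decodes the file's bytes into text.
    with codecs.open(src, "r", encoding="utf-8") as f:
        text = f.read()
    # encoding="utf-8" on write encodes the text back into UTF-8 bytes.
    with codecs.open(dst, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```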
 

gigs

Benjamin said:
Can you show us what you have so far?


import codecs
data = codecs.open("my-utf8-file.txt", encoding="utf-8").read()
I have tried codecs, but when I use encoding="utf-8" I get this error on
the word "život":

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\getcontent.py", line 43, in <module>
    encoding="utf-8").readlines()
  File "C:\Python25\Lib\codecs.py", line 626, in readlines
    return self.reader.readlines(sizehint)
  File "C:\Python25\Lib\codecs.py", line 535, in readlines
    data = self.read()
  File "C:\Python25\Lib\codecs.py", line 424, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0: unexpected code byte


I just need to read from file1.txt, process some words (it's simple text processing),
and write them to file2.txt without losing the Croatian characters (š, đ, ž, č, ć).
 

Kent Johnson

gigs said:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9e in position 0: unexpected code byte

Are you sure you have UTF-8 data? I guess your file is encoded in
CP1250 or CP1252; in both of these charsets 0x9e represents LATIN
SMALL LETTER Z WITH CARON.

Kent
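Kent's diagnosis can be checked directly: the same byte 0x9E fails as UTF-8 but decodes to ž under both CP1250 and CP1252. The sample bytes below are an assumption about how the word "život" would be stored by a Windows editor using one of those code pages.

```python
raw = b"\x9eivot"  # assumed: "život" as saved in CP1250/CP1252

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x9E has the bit pattern 10xxxxxx, a UTF-8 continuation byte,
    # so it can never start a character.
    print("UTF-8 failed at byte", exc.start)

for codec in ("cp1250", "cp1252"):
    print(codec, raw.decode(codec))
```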
 

gigs

Kent said:
Are you sure you have UTF-8 data? I guess your file is encoded in
CP1250 or CP1252; in both of these charsets 0x9e represents LATIN
SMALL LETTER Z WITH CARON.

Kent

This data probably wasn't in UTF-8. Today I got another file that is UTF-8, and it works.

thanks
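When input files can arrive in either encoding, as happened here, one option is to try UTF-8 first and fall back to a Windows code page. This is only a sketch: CP1250 as the fallback follows Kent's guess and is an assumption, and `read_text_guessing` is a hypothetical helper. UTF-8 is tried first because non-UTF-8 bytes usually make it fail loudly, whereas CP1250 accepts almost any byte, so the order matters.

```python
def read_text_guessing(path, encodings=("utf-8", "cp1250")):
    # Read the raw bytes once, then try each candidate encoding in order.
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode %s" % (encodings, path))
```

Whatever encoding the input turns out to be, the decoded text can then be written out with an explicit `encoding="utf-8"`, so file2.txt is always UTF-8.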
 
