processing a large utf-8 file

Ivan Voras

Since the .encoding attribute of file objects is read-only, what is the
proper way to process large UTF-8 text files?

I need "bulk" processing (i.e. in blocks, since the file is ~1 GB), but
reading it in fixed-size blocks is bound to split multi-byte UTF-8
characters at block boundaries.
 
Guest

Ivan said:
Since the .encoding attribute of file objects is read-only, what is the
proper way to process large UTF-8 text files?

You should use codecs.open, or codecs.getreader to get a StreamReader
for UTF-8.
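
As a rough sketch of the codecs.getreader route (the helper name
iter_utf8_blocks and the block_size parameter are made up for this
example, not part of the codecs module): the StreamReader holds back any
incomplete byte sequence at the end of a chunk and decodes it together
with the next chunk, so each block you get back contains only whole
characters.

```python
import codecs
import io


def iter_utf8_blocks(binary_stream, block_size=64 * 1024):
    """Yield decoded text blocks from a binary stream of UTF-8 bytes.

    iter_utf8_blocks and block_size are illustrative names only.
    """
    # Wrap the byte stream in a StreamReader for UTF-8.
    reader = codecs.getreader("utf-8")(binary_stream)
    while True:
        # With a single argument, read(n) returns at most n characters,
        # pulling about n bytes at a time from the underlying stream.
        # Bytes of a character split across two chunks stay buffered
        # inside the reader until the rest of the character arrives.
        block = reader.read(block_size)
        if not block:
            break
        yield block


if __name__ == "__main__":
    # Small in-memory stand-in for the large file; a tiny block size
    # forces many chunk boundaries to fall inside multi-byte characters.
    sample = "na\u00efve caf\u00e9 \u65e5\u672c\u8a9e " * 20
    stream = io.BytesIO(sample.encode("utf-8"))
    blocks = list(iter_utf8_blocks(stream, block_size=7))
    print("round trip intact:", "".join(blocks) == sample)
```

For a real file you would pass a file object opened in binary mode, e.g.
open(path, "rb"), where path is whatever your file is called.
codecs.open(path, "r", encoding="utf-8") should give you an equivalent
reader in one call.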

Regards,
Martin
 
