Py3: Read file with Unicode characters

Gnarlodious · Apr 8, 2010

Attempting to read a file containing Unicode characters such as ± fails with:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 5007: ordinal not in range(128)

I did succeed by converting all the characters to HTML entities such
as "&plusmn;", but I want the actual characters to appear in the
source file. What am I doing wrong? My understanding is that ALL
strings in Py3 are Unicode, so I am confused.

-- Gnarlie

Martin v. Loewis · Apr 8, 2010

When opening the file, you need to specify the file encoding. If you
don't, it defaults to ASCII (in your situation; the specific default
depends on the environment).
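
To make the point about the default concrete, here is a minimal sketch. It assumes that text-mode open() without an encoding argument falls back to the value reported by locale.getpreferredencoding(False), which is my understanding of the documented behaviour. The sample file and the names sample_path and ascii_error are hypothetical, created only for this illustration. Forcing encoding="ascii" reproduces the same kind of error as in the original post.

```python
import locale
import os
import tempfile

# The encoding that text-mode open() is expected to fall back to
# when no encoding argument is given (depends on the environment).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Write a small UTF-8 file containing the plus-minus sign U+00B1,
# which UTF-8 stores as the two bytes 0xC2 0xB1.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("tolerance \u00b15\n".encode("utf-8"))
    sample_path = tmp.name  # hypothetical sample file

# Decoding those bytes as ASCII fails on the first non-ASCII byte.
try:
    with open(sample_path, "r", encoding="ascii") as f:
        f.read()
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
    print(exc)

os.remove(sample_path)
```

If the printed default is something like ANSI_X3.4-1968 or US-ASCII, an open() call without an explicit encoding would be expected to fail in the same way on this file.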

Regards,
Martin

Gnarlodious · Apr 8, 2010

Martin v. Loewis said:
When opening the file, you need to specify the file encoding.

OK, I had tried this:

open(path, 'r').read().encode('utf-8')

However, I get this error:

TypeError: Can't convert 'bytes' object to str implicitly

I had assumed a Unicode string was a Unicode string, so why is it a
bytes string?

Sorry, doing Unicode in Py3 has really been a challenge.

-- Gnarlie

Martin v. Loewis · Apr 8, 2010

Gnarlodious said:
OK, I had tried this:

open(path, 'r').read().encode('utf-8')

No, when *opening* the file, you need to specify the encoding:

open(path, 'r', encoding='utf-8').read()
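
As a slightly fuller sketch of the same call, assuming the file really is UTF-8: the helper name read_text and the temporary sample file are hypothetical and exist only for this illustration. The with statement closes the file once the read is done.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Return the file's contents as str, decoded with the given encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Hypothetical sample file holding the plus-minus sign U+00B1 as UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("tolerance \u00b15\n".encode("utf-8"))
    sample_path = tmp.name

text = read_text(sample_path)
os.remove(sample_path)

# ascii() escapes non-ASCII characters, so this prints safely on any console.
print(type(text).__name__, ascii(text))
```

The result is already a str, so no further .encode() or .decode() call is needed before working with the text.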

Gnarlodious said:
Sorry, doing Unicode in Py3 has really been a challenge.

That's because you need to re-learn some things.
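
The main thing to re-learn is probably the split between str (text) and bytes (encoded data). The sketch below is only an illustration with made-up variable names. It assumes the earlier TypeError came from combining the result of .encode() with an ordinary string, which the thread does not actually show, and the exact wording of that message varies between Python versions.

```python
text = "\u00b1"                # str: one Unicode character (plus-minus sign)
data = text.encode("utf-8")    # bytes: the two bytes 0xC2 0xB1

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# str and bytes do not mix implicitly; concatenating them raises TypeError.
try:
    "value: " + data
    mix_error = None
except TypeError as exc:
    mix_error = exc
    print("TypeError:", exc)

# decode() is the inverse of encode(): bytes back to str.
round_trip = data.decode("utf-8")
```

In short, .encode() goes from str to bytes and .decode() goes from bytes to str; passing encoding= to open() performs the decode step while reading.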

Regards,
Martin

Gnarlodious · Apr 8, 2010

Martin v. Loewis said:
That's because you need to re-learn some things.

Apparently so; every little item is a lesson. Thank you.

-- Gnarlie
