Py3: Read file with Unicode characters

Discussion in 'Python' started by Gnarlodious, Apr 8, 2010.

  1. Gnarlodious


    Attempting to read a file containing Unicode characters such as ±:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    5007: ordinal not in range(128)

    I did succeed by converting all the characters to HTML entities (such
    as the entity for "±"), but I want the actual characters in the
    source file. What am I doing wrong? My understanding is that ALL
    strings in Py3 are Unicode, so... confused.

    -- Gnarlie
    Gnarlodious, Apr 8, 2010
    #1

  2. Gnarlodious wrote:
    > Attempting to read a file containing Unicode characters such as ±:
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
    > 5007: ordinal not in range(128)
    >
    > I did succeed by converting all the characters to HTML entities (such
    > as the entity for "±"), but I want the actual characters in the
    > source file. What am I doing wrong? My understanding is that ALL
    > strings in Py3 are Unicode, so... confused.


    When opening the file, you need to specify the file encoding. If you
    don't, open() falls back to a default that depends on the environment
    (typically the locale's preferred encoding); in your situation that
    default is ASCII.
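A minimal sketch of this point, assuming a throwaway temporary file whose contents ("tolerance: ±5 mm") are purely illustrative. It prints the default encoding open() would use when none is given, reproduces the ASCII decode error (in UTF-8, "±" is stored as the two bytes 0xC2 0xB1, so the ascii codec fails on 0xc2), and then reads the same file correctly by passing encoding="utf-8".

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is passed; it depends on
# the locale (or is UTF-8 when Python runs in UTF-8 mode).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical sample file: "±" is stored in UTF-8 as the bytes 0xC2 0xB1.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write("tolerance: ±5 mm\n")

# Decoding the same bytes as ASCII reproduces the error from the post.
ascii_error = None
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
print("ascii failed:", ascii_error)

# Naming the file's real encoding when opening it decodes correctly.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
print(ascii(text))  # ascii() escapes non-ASCII so this prints on any console

os.remove(path)
```

The exact position reported in the error depends on where the first non-ASCII byte sits in the file.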

    Regards,
    Martin
    Martin v. Loewis, Apr 8, 2010
    #2

  3. Gnarlodious


    On Apr 8, 9:14 am, "Martin v. Loewis" wrote:

    > When opening the file, you need to specify the file encoding.


    OK, I had tried this:

    open(path, 'r').read().encode('utf-8')

    however I get error

    TypeError: Can't convert 'bytes' object to str implicitly

    I had assumed a Unicode string was a Unicode string, so why is it a
    bytes string?

    Sorry, doing Unicode in Py3 has really been a challenge.

    -- Gnarlie
    Gnarlodious, Apr 8, 2010
    #3
  4. Gnarlodious wrote:
    > On Apr 8, 9:14 am, "Martin v. Loewis" wrote:
    >
    >> When opening the file, you need to specify the file encoding.

    >
    > OK, I had tried this:
    >
    > open(path, 'r').read().encode('utf-8')


    No, when *opening* the file, you need to specify the encoding:

    open(path, 'r', encoding='utf-8').read()
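To spell out the str/bytes distinction behind the TypeError, here is a small sketch, again using a hypothetical temporary file. Decoding happens when the file is opened, so read() already returns str; reading in binary mode ("rb") and calling .decode("utf-8") is the equivalent manual route; and .encode() goes the other way, str to bytes, which is why the earlier attempt ended up with a bytes object that could not be mixed with str. The exact TypeError wording differs between Python versions.

```python
import os
import tempfile

# Hypothetical sample file written as UTF-8.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write("±5 mm")

# Decoding happens at open() time: read() returns str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Equivalent manual route: read raw bytes, then decode them yourself.
with open(path, "rb") as f:
    raw = f.read()          # bytes: b'\xc2\xb15 mm'
print(raw)
text_from_bytes = raw.decode("utf-8")

# .encode() converts str back to bytes; that is what the earlier
# open(path, 'r').read().encode('utf-8') call produced.
encoded = text.encode("utf-8")

# Python 3 refuses to combine str and bytes implicitly.
mix_failed = False
try:
    combined = "Value: " + encoded
except TypeError:
    mix_failed = True

os.remove(path)
```

In short: decode bytes to str at the input boundary (ideally via open()'s encoding argument), work with str inside the program, and encode only when writing bytes back out.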

    > Sorry, doing Unicode in Py3 has really been a challenge.


    That's because you need to re-learn some things.

    Regards,
    Martin
    Martin v. Loewis, Apr 8, 2010
    #4
  5. Gnarlodious


    On Apr 8, 11:04 am, "Martin v. Loewis" wrote:

    > That's because you need to re-learn some things.


    Apparently so; every little item is a lesson. Thank you.

    -- Gnarlie
    Gnarlodious, Apr 8, 2010
    #5

Similar Threads
  1. Laszlo Nagy
    Replies:
    6
    Views:
    602
  2. Terry Reedy
    Replies:
    0
    Views:
    500
    Terry Reedy
    Jul 1, 2008
  3. Grzegorz Śliwiński
    Replies:
    2
    Views:
    940
    Grzegorz Śliwiński
    Jan 19, 2011
  4. jmfauth
    Replies:
    2
    Views:
    195
    jmfauth
    Feb 29, 2012
  5. jmfauth

    Py3.3 unicode literal and input()

    jmfauth, Jun 18, 2012, in forum: Python
    Replies:
    24
    Views:
    532
    Steven D'Aprano
    Jun 25, 2012
