Character encoding & the copyright symbol

Discussion in 'Python' started by Robert Dailey, Aug 6, 2009.

  1. Hello,

    I'm loading a file via open() in Python 3.1 and I'm getting the
    following error when I try to print the contents of the file that I
    obtained through a call to read():

    UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    position 1650: character maps to <undefined>

    The file is defined as ASCII and the copyright symbol shows up just
    fine in Notepad++. However, Python will not print this symbol. How can
    I get this to work? And no, I won't replace it with "(c)". Thanks!
     
    Robert Dailey, Aug 6, 2009
    #1

  2. On Aug 6, 2009, at 12:14 PM, Robert Dailey wrote:

    > Hello,
    >
    > I'm loading a file via open() in Python 3.1 and I'm getting the
    > following error when I try to print the contents of the file that I
    > obtained through a call to read():
    >
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    > position 1650: character maps to <undefined>
    >
    > The file is defined as ASCII and the copyright symbol shows up just
    > fine in Notepad++. However, Python will not print this symbol. How can
    > I get this to work? And no, I won't replace it with "(c)". Thanks!


    If the file is defined as ASCII and it contains 0xa9, then either the file
    was written incorrectly or you were told the wrong encoding. ASCII runs
    only from 0x00 to 0x7f, so it has no such character.

    The copyright symbol is 0xa9 in both ISO-8859-1 and windows-1252, and
    since you're on Windows the latter is the likelier bet.

    http://en.wikipedia.org/wiki/Ascii
    http://en.wikipedia.org/wiki/Iso-8859-1
    http://en.wikipedia.org/wiki/Windows-1252


    Bottom line is that your file is not in ASCII. Try specifying
    windows-1252 as the encoding. Without seeing your code I can't tell
    you where you need to specify the encoding, but the Python docs should
    help you out.
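    A minimal sketch of the point above, assuming the byte string below stands in for the file's contents: 0xA9 lies outside ASCII's 0x00-0x7F range, so an ASCII decode fails, while windows-1252 (Python codec name "cp1252") and ISO-8859-1 both map it to U+00A9 COPYRIGHT SIGN. The results are printed with ascii() so the example cannot itself trip over the console encoding.

```python
raw = b"Copyright \xa9 2009"  # hypothetical file contents

# ASCII only covers 0x00-0x7F, so byte 0xA9 cannot be decoded.
try:
    raw.decode("ascii")
    ascii_error_at = None
except UnicodeDecodeError as exc:
    ascii_error_at = exc.start  # offset of the first offending byte

# Both candidate 8-bit encodings decode 0xA9 to the copyright sign.
text_1252 = raw.decode("cp1252")
text_latin1 = raw.decode("iso-8859-1")

print("ascii decode failed at byte", ascii_error_at)
print(ascii(text_1252), ascii(text_latin1))
```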


    HTH
    Philip
     
    Philip Semanchuk, Aug 6, 2009
    #2

  3. "Robert Dailey" <> wrote in message
    news:...

    > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    > position 1650: character maps to <undefined>
    >
    > The file is defined as ASCII.


    That's the problem: ASCII is a seven-bit code. What you have is
    actually ISO-8859-1 (or possibly Windows-1252).

    The different ISO-8859-n variants assign various characters to
    '\xa9'. Rather than being Western-European centric and assuming
    ISO-8859-1 by default, Python throws an error when you stray
    outside of strict ASCII.
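    To illustrate the point above, the sketch below decodes the single byte 0xA9 under three ISO-8859 variants. Only the Latin-1 table maps it to the copyright sign; Latin-2 and the Cyrillic table assign other letters, which is why the byte alone doesn't identify the encoding.

```python
import unicodedata

# One byte, three ISO-8859 code pages, three different characters.
decoded = {
    codec: b"\xa9".decode(codec)
    for codec in ("iso-8859-1", "iso-8859-2", "iso-8859-5")
}

for codec, char in decoded.items():
    # unicodedata.name() gives an ASCII-only description of each character.
    print(codec, ascii(char), unicodedata.name(char))
```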
     
    Richard Brodie, Aug 6, 2009
    #3
  4. On Aug 6, 11:31 am, "Richard Brodie" <> wrote:
    > "Robert Dailey" <> wrote in message
    >
    > news:...
    >
    > > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    > > position 1650: character maps to <undefined>
    >
    > > The file is defined as ASCII.
    >
    > That's the problem: ASCII is a seven bit code. What you have is
    > actually ISO-8859-1 (or possibly Windows-1252).
    >
    > The different ISO-8859-n variants assign various characters to
    > '\xa9'. Rather than being Western-European centric and assuming
    > ISO-8859-1 by default, Python throws an error when you stray
    > outside of strict ASCII.


    Thanks for the help guys. Sorry I left out code, I wasn't sure at the
    time if it would be helpful. Below is my code:


    #========================================================
    def GetFileContentsAsString( file ):
        f = open( file, mode='r', encoding='cp1252' )
        contents = f.read()
        f.close()
        return contents

    #========================================================
    def ReplaceVersion( file, version, regExps ):
        #match = regExps[0].search( 'FILEVERSION 1,45332,2100,32,' )
        #print( match.group() )
        text = GetFileContentsAsString( file )
        print( text )


    As you can see, I am trying to load the file with encoding 'cp1252'
    which, according to the python 3.1 docs, translates to windows-1252. I
    also tried 'latin_1', which translates to ISO-8859-1, but this did not
    work either. Am I doing something else wrong?
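    For comparison, a sketch of the same reader written with a with-block, so the file is closed even if read() raises. The names get_file_contents and path are illustrative, not from the original code, and a temporary file stands in for the real one. Reading succeeds, which suggests the failure lies somewhere other than the decode step.

```python
import os
import tempfile

def get_file_contents(path, encoding="cp1252"):
    """Return the text of the file at path, decoded with encoding."""
    # The with-block closes the file even if read() raises.
    with open(path, encoding=encoding) as f:
        return f.read()

# Hypothetical input: a small Windows-1252 file containing byte 0xA9.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"Copyright \xa9 2009")

try:
    text = get_file_contents(path)
finally:
    os.remove(path)

print(ascii(text))  # decoding worked; nothing has been printed raw yet
```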
     
    Robert Dailey, Aug 6, 2009
    #4
  5. On Thu, 2009-08-06 at 09:14 -0700, Robert Dailey wrote:
    > Hello,
    >
    > I'm loading a file via open() in Python 3.1 and I'm getting the
    > following error when I try to print the contents of the file that I
    > obtained through a call to read():
    >
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    > position 1650: character maps to <undefined>
    >
    > The file is defined as ASCII and the copyright symbol shows up just
    > fine in Notepad++. However, Python will not print this symbol. How can
    > I get this to work? And no, I won't replace it with "(c)". Thanks!


    It's not actually ASCII but Windows-1252, an 8-bit extension of ASCII.
    With that information you can do one of two things. You can open it in
    text mode and specify the encoding:

    >>> fp = open(filename, 'r', encoding='windows-1252')
    >>> s = fp.read()
    >>> print(s)


    or you can open it in binary mode and decode it later:

    >>> fp = open(filename, 'rb')
    >>> b = fp.read()
    >>> print(str(b, encoding='windows-1252'))


    Or you may be able to set the default encoding to windows-1252, but I
    don't know how to do that (on Windows).
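    The two approaches above should produce the same string. A sketch that checks this, assuming a throwaway temporary file in place of filename:

```python
import os
import tempfile

# Hypothetical stand-in for the real file: write some Windows-1252 bytes.
fd, filename = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"Copyright \xa9 2009")

try:
    # Option 1: text mode, decoding happens inside read().
    with open(filename, "r", encoding="windows-1252") as fp:
        s_text = fp.read()
    # Option 2: binary mode, decode the bytes explicitly afterwards.
    with open(filename, "rb") as fp:
        s_bin = str(fp.read(), encoding="windows-1252")
finally:
    os.remove(filename)

print(s_text == s_bin, ascii(s_text))
```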

    p.s.

    Next time it might help to paste a code snippet; otherwise we have to
    make assumptions about what you are actually doing.
     
    Albert Hopkins, Aug 6, 2009
    #5
  6. "Robert Dailey" <> wrote in message
    news:...

    > As you can see, I am trying to load the file with encoding 'cp1252'
    > which, according to the python 3.1 docs, translates to windows-1252. I
    > also tried 'latin_1', which translates to ISO-8859-1, but this did not
    > work either. Am I doing something else wrong?


    Probably it's just the debugging print that has a problem; if you
    wrote to an output file opened with an explicit encoding, it would be
    fine. When you get a UnicodeEncodeError, it's the conversion _from_
    Unicode that has failed.
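    A sketch reproducing the message from the first post without any file at all, assuming the console uses code page 437 (a common default for US-English Windows consoles; the thread never says which page was in use). Encoding the copyright sign to cp437 raises the same 'charmap' UnicodeEncodeError, while cp850 happens to include the sign.

```python
text = "Copyright \u00a9 2009"

try:
    text.encode("cp437")  # cp437 has no copyright sign
    error = None
except UnicodeEncodeError as exc:
    error = exc
    # The message escapes the character, so printing it is safe.
    print(exc)

# cp850, another OEM code page, does contain the sign (as byte 0xB8).
print(text.encode("cp850"))
```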
     
    Richard Brodie, Aug 6, 2009
    #6
  7. On Aug 6, 2009, at 12:41 PM, Robert Dailey wrote:

    > On Aug 6, 11:31 am, "Richard Brodie" <> wrote:
    >> "Robert Dailey" <> wrote in message
    >>
    >> news:
    >> ...
    >>
    >>> UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    >>> position 1650: character maps to <undefined>
    >>
    >>> The file is defined as ASCII.
    >>
    >> That's the problem: ASCII is a seven bit code. What you have is
    >> actually ISO-8859-1 (or possibly Windows-1252).
    >>
    >> The different ISO-8859-n variants assign various characters to
    >> '\xa9'. Rather than being Western-European centric and assuming
    >> ISO-8859-1 by default, Python throws an error when you stray
    >> outside of strict ASCII.
    >
    > Thanks for the help guys. Sorry I left out code, I wasn't sure at the
    > time if it would be helpful. Below is my code:
    >
    >
    > #========================================================
    > def GetFileContentsAsString( file ):
    >   f = open( file, mode='r', encoding='cp1252' )
    >   contents = f.read()
    >   f.close()
    >   return contents
    >
    > #========================================================
    > def ReplaceVersion( file, version, regExps ):
    >   #match = regExps[0].search( 'FILEVERSION 1,45332,2100,32,' )
    >   #print( match.group() )
    >   text = GetFileContentsAsString( file )
    >   print( text )
    >
    >
    > As you can see, I am trying to load the file with encoding 'cp1252'
    > which, according to the python 3.1 docs, translates to windows-1252. I
    > also tried 'latin_1', which translates to ISO-8859-1, but this did not
    > work either. Am I doing something else wrong?



    Are you getting the error when you read the file or when you
    print(text)?

    As a side note, you should probably use something other than "file"
    for the parameter name in GetFileContentsAsString() since file() is a
    Python function.
     
    Philip Semanchuk, Aug 6, 2009
    #7
  8. On Thu, 06 Aug 2009 09:14:08 -0700, Robert Dailey wrote:

    > I'm loading a file via open() in Python 3.1 and I'm getting the
    > following error when I try to print the contents of the file that I
    > obtained through a call to read():
    >
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    > position 1650: character maps to <undefined>
    >
    > The file is defined as ASCII and the copyright symbol shows up just
    > fine in Notepad++. However, Python will not print this symbol. How can
    > I get this to work? And no, I won't replace it with "(c)". Thanks!


    1. As others have said, your file *isn't* ASCII, but that isn't the
    problem.

    2. The problem is that the encoding used by your standard output
    stream doesn't include the copyright symbol. You need to use something
    like:

    import io, sys
    sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='iso-8859-1')
    sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='iso-8859-1')

    to fix the encoding of the stdout and stderr streams.
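    The wrapping technique can be tried out safely on an in-memory buffer. In this sketch io.BytesIO stands in for the raw stream that sys.stdout.detach() would return, and encode_via_wrapper is a hypothetical helper name. Whether iso-8859-1 bytes display correctly depends on the console's actual code page, which is an assumption carried over from the post. The errors argument is an alternative that substitutes characters instead of raising; Python 3.7 and later also offer sys.stdout.reconfigure(encoding=..., errors=...).

```python
import io

def encode_via_wrapper(text, encoding, errors="strict"):
    """Write text through a TextIOWrapper and return the bytes it produced."""
    raw = io.BytesIO()  # stand-in for the detached stdout byte stream
    out = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    out.write(text)
    out.flush()  # push buffered text down to raw as bytes
    data = raw.getvalue()
    out.detach()  # release raw without closing it
    return data

latin1_bytes = encode_via_wrapper("Copyright \u00a9 2009", "iso-8859-1")
# errors="replace" writes '?' instead of raising UnicodeEncodeError.
ascii_bytes = encode_via_wrapper("Copyright \u00a9 2009", "ascii", errors="replace")

print(latin1_bytes)
print(ascii_bytes)
```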
     
    Nobody, Aug 6, 2009
    #8
  9. > As a side note, you should probably use something other than "file" for
    > the parameter name in GetFileContentsAsString() since file() is a Python
    > function.


    Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42)
    [GCC 4.3.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    py> file
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'file' is not defined

    Regards,
    Martin
     
    Martin v. Löwis, Aug 6, 2009
    #9
  10. On Aug 6, 2009, at 3:14 PM, Martin v. Löwis wrote:

    >> As a side note, you should probably use something other than "file"
    >> for the parameter name in GetFileContentsAsString() since file() is a
    >> Python function.
    >
    > Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42)
    > [GCC 4.3.3] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > py> file
    > Traceback (most recent call last):
    > File "<stdin>", line 1, in <module>
    > NameError: name 'file' is not defined



    Whooops, didn't know about that change from 2.x to 3.x. Thanks.
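    The change Martin points out is easy to confirm: Python 3 dropped the file() builtin and kept open(), so a parameter named file shadows no builtin in 3.x.

```python
import builtins

# file() existed in Python 2; Python 3 only keeps open().
has_file = hasattr(builtins, "file")
has_open = hasattr(builtins, "open")
print(has_file, has_open)  # False True on Python 3
```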
     
    Philip Semanchuk, Aug 6, 2009
    #10
  11. On Thu, Aug 6, 2009 at 12:41 PM, Robert Dailey<> wrote:
    > On Aug 6, 11:31 am, "Richard Brodie" <> wrote:
    >> "Robert Dailey" <> wrote in message
    >>
    >> news:....
    >>
    >> > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    >> > position 1650: character maps to <undefined>
    >>
    >> > The file is defined as ASCII.
    >>
    >> That's the problem: ASCII is a seven bit code. What you have is
    >> actually ISO-8859-1 (or possibly Windows-1252).
    >>
    >> The different ISO-8859-n variants assign various characters to
    >> '\xa9'. Rather than being Western-European centric and assuming
    >> ISO-8859-1 by default, Python throws an error when you stray
    >> outside of strict ASCII.
    >
    > Thanks for the help guys. Sorry I left out code, I wasn't sure at the
    > time if it would be helpful. Below is my code:
    >
    >
    > #========================================================
    > def GetFileContentsAsString( file ):
    >   f = open( file, mode='r', encoding='cp1252' )
    >   contents = f.read()
    >   f.close()
    >   return contents
    >
    > #========================================================
    > def ReplaceVersion( file, version, regExps ):
    >   #match = regExps[0].search( 'FILEVERSION 1,45332,2100,32,' )
    >   #print( match.group() )
    >   text = GetFileContentsAsString( file )
    >   print( text )
    >
    >
    > As you can see, I am trying to load the file with encoding 'cp1252'
    > which, according to the python 3.1 docs, translates to windows-1252. I
    > also tried 'latin_1', which translates to ISO-8859-1, but this did not
    > work either. Am I doing something else wrong?


    This is why we need code and full tracebacks. There's a good chance
    that your error is on the print(text) line. That's because sys.stdout
    is probably a byte stream without an encoding defined. When you try to
    print your unicode string, Python has to convert it to a stream of
    bytes. Python refuses to guess on the console encoding and just falls
    back to ascii, the conversion fails, and you get your error. Try using
    print( text.encode( 'cp1252' ) ) instead.
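    Two hedged notes on the suggestion above. First, the error names the 'charmap' codec rather than 'ascii', which suggests stdout already had a code-page encoding (cp437, say) rather than falling back to ASCII; printing sys.stdout.encoding would settle it. Second, in Python 3, print(text.encode('cp1252')) shows a bytes literal such as b'\xa9' rather than the glyph. The printable() helper below is hypothetical: it escapes only the characters the target encoding lacks.

```python
import sys

def printable(text, encoding):
    """Return text with characters missing from encoding backslash-escaped."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

text = "Copyright \u00a9 2009"

# Printing encoded bytes shows their repr, not the copyright sign.
shown = str(text.encode("cp1252"))
print(shown)  # b'Copyright \xa9 2009'

# Check what the console really uses, then print safely for that encoding.
encoding = getattr(sys.stdout, "encoding", None) or "ascii"
print("stdout encoding:", encoding)
print(printable(text, encoding))
```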
     
    Benjamin Kaplan, Aug 6, 2009
    #11
  12. Robert Dailey wrote:
    > Hello,
    >
    > I'm loading a file via open() in Python 3.1 and I'm getting the
    > following error when I try to print the contents of the file that I
    > obtained through a call to read():
    >
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
    > position 1650: character maps to <undefined>
    >
    > The file is defined as ASCII and the copyright symbol shows up just
    > fine in Notepad++. However, Python will not print this symbol. How can
    > I get this to work? And no, I won't replace it with "(c)". Thanks!
    >
    >

    I see others have alerted you to changes needed in stdout, which is
    ASCII coded by default.

    But I wanted to comment on the (c) remark. If you're in the US, that's
    the wrong abbreviation for copyright. The only recognized abbreviation
    is (copr).

    DaveA
     
    Dave Angel, Aug 6, 2009
    #12

Similar Threads
  1. moWhite
    copyright symbol in web.config
    moWhite, Mar 7, 2006, in forum: ASP .Net
    Replies: 10 | Views: 6,482 | ankyeez | Aug 22, 2012
  2. raavi
    Replies: 2 | Views: 917 | raavi | Mar 2, 2006
  3. baumann@pan
    Replies: 1 | Views: 757 | Richard Bos | Apr 15, 2005
  4. Brian Marick
    Replies: 1 | Views: 206 | NAKAMURA, Hiroshi | Nov 9, 2003
  5. Song Ma
    Replies: 2 | Views: 244 | Charles Oliver Nutter | Jul 20, 2008