Character encoding & the copyright symbol

Robert Dailey · Aug 6, 2009

Hello,

I'm loading a file via open() in Python 3.1 and I'm getting the
following error when I try to print the contents of the file that I
obtained through a call to read():

UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
position 1650: character maps to <undefined>

The file is defined as ASCII and the copyright symbol shows up just
fine in Notepad++. However, Python will not print this symbol. How can
I get this to work? And no, I won't replace it with "(c)". Thanks!

Philip Semanchuk · Aug 6, 2009

If the file is defined as ASCII and it contains 0xa9, then the file
was written incorrectly or you were told the wrong encoding. There is
no such character in ASCII, which runs from 0x00 to 0x7f.

The copyright symbol == 0xa9 if the encoding is ISO-8859-1 or
windows-1252, and since you're on Windows the latter is a likely bet.

http://en.wikipedia.org/wiki/Ascii
http://en.wikipedia.org/wiki/Iso-8859-1
http://en.wikipedia.org/wiki/Windows-1252

Bottom line is that your file is not in ASCII. Try specifying
windows-1252 as the encoding. Without seeing your code I can't tell
you where you need to specify the encoding, but the Python docs should
help you out.
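
A minimal sketch of the point above, assuming the file holds the single byte 0xa9 where the copyright sign sits. The sample byte string is made up for illustration and is not taken from the original file.

```python
# Hypothetical sample bytes: a copyright notice stored as windows-1252.
raw = b"Copyright \xa9 2009"

# 0xa9 is outside the 7-bit ASCII range, so a strict ASCII decode fails.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Both windows-1252 and ISO-8859-1 map 0xa9 to U+00A9 (the copyright sign).
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("iso-8859-1")

print(ascii_ok)                 # False
print(as_cp1252 == as_latin1)   # True
print(hex(ord(as_cp1252[10])))  # 0xa9
```

The code point is printed instead of the glyph so that the example itself does not depend on the console encoding.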

HTH
Philip

Richard Brodie · Aug 6, 2009

UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
position 1650: character maps to <undefined>

The file is defined as ASCII.

That's the problem: ASCII is a seven bit code. What you have is
actually ISO-8859-1 (or possibly Windows-1252).

The different ISO-8859-n variants assign various characters to
'\xa9'. Rather than being Western-European centric and assuming
ISO-8859-1 by default, Python throws an error when you stray
outside of strict ASCII.
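
To illustrate how the meaning of that one byte depends on the variant, here is a small sketch that decodes it under a few ISO-8859 parts. The choice of parts is arbitrary. ISO-8859-1 yields the copyright sign, while parts such as ISO-8859-2 and ISO-8859-5 should yield letters instead.

```python
# The same byte decoded under several ISO-8859 parts.
# Code points are printed instead of glyphs so the output is console-safe.
raw = b"\xa9"
variants = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7"]

decoded = {name: raw.decode(name) for name in variants}
for name, ch in decoded.items():
    print(name, "U+%04X" % ord(ch))
```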

Robert Dailey · Aug 6, 2009

Thanks for the help guys. Sorry I left out code, I wasn't sure at the
time if it would be helpful. Below is my code:

#========================================================
def GetFileContentsAsString( file ):
    f = open( file, mode='r', encoding='cp1252' )
    contents = f.read()
    f.close()
    return contents

#========================================================
def ReplaceVersion( file, version, regExps ):
    #match = regExps[0].search( 'FILEVERSION 1,45332,2100,32,' )
    #print( match.group() )
    text = GetFileContentsAsString( file )
    print( text )

As you can see, I am trying to load the file with encoding 'cp1252'
which, according to the python 3.1 docs, translates to windows-1252. I
also tried 'latin_1', which translates to ISO-8859-1, but this did not
work either. Am I doing something else wrong?

Albert Hopkins · Aug 6, 2009

It's not actually ASCII but Windows-1252, an ASCII-like extended
encoding. So with that information you can do either of two things:
you can open it in text mode and specify the encoding, or you can open
it in binary mode and decode it later.

Or you may be able to set the default encoding to windows-1252 but I
don't know how to do that (in Windows).
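
The two options described above can be sketched as follows. This is only an illustration: the temporary file and its contents are hypothetical stand-ins for the real file, which the thread does not show.

```python
import os
import tempfile

# Create a small windows-1252 sample file (hypothetical stand-in for the real one).
fd, path = tempfile.mkstemp(suffix=".rc")
with os.fdopen(fd, "wb") as f:
    f.write(b"Copyright \xa9 2009\n")

# Option 1: text mode, with the encoding given to open().
with open(path, mode="r", encoding="cp1252") as f:
    text_a = f.read()

# Option 2: binary mode, then decode the bytes explicitly.
with open(path, mode="rb") as f:
    text_b = f.read().decode("cp1252")

os.remove(path)
print(text_a == text_b)  # True
```

Either way the result is an ordinary str. The choice mostly depends on whether the encoding is known before the file is opened.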

p.s.

Next time it might be helpful to paste a code snippet else we have to
make assumptions about what you are actually doing.

Richard Brodie · Aug 6, 2009

Probably it's just the debugging print that has a problem, and if you
opened an output file with an encoding specified it would be fine.
When you get a UnicodeEncodeError, it's conversion _from_
Unicode that has failed.
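
A short sketch of that distinction. It assumes, purely for illustration, a target code page of cp437 (a common legacy console code page that has no copyright sign), and uses an in-memory buffer in place of a real output file.

```python
import io

text = "Copyright \u00a9 2009"

# Encoding (str -> bytes) fails when the target code page lacks the character.
# cp437 is used here only as an example of such a code page.
try:
    text.encode("cp437")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Writing through a stream whose encoding has the character works.
buffer = io.BytesIO()  # stands in for an output file opened in binary mode
out = io.TextIOWrapper(buffer, encoding="cp1252")
out.write(text)
out.flush()

print(type(error).__name__)  # UnicodeEncodeError
print(buffer.getvalue())     # b'Copyright \xa9 2009'
```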

Philip Semanchuk · Aug 6, 2009

Are you getting the error when you read the file or when you
print(text)?

As a side note, you should probably use something other than "file"
for the parameter name in GetFileContentsAsString() since file() is a
Python function.

Nobody · Aug 6, 2009

1. As others have said, your file *isn't* ASCII, but that isn't the
problem.

2. The problem is that the encoding which your standard output
stream uses doesn't have the copyright symbol. You need to use something
like:

import io
import sys

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='iso-8859-1')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='iso-8859-1')

to fix the encoding of the stdout and stderr streams.
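
A guarded variant of the same idea, as a sketch only: it rewraps the stream just when its current encoding cannot represent the character. The helpers `can_encode` and `rewrap` are hypothetical names, and an in-memory stream with an assumed cp437 encoding stands in for the real console so the example does not touch sys.stdout.

```python
import io

def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def rewrap(stream, encoding, errors="strict"):
    """Detach a text stream from its binary buffer and wrap it anew."""
    stream.flush()
    return io.TextIOWrapper(stream.detach(), encoding=encoding, errors=errors)

# A stand-in for a console stream that uses code page 437 (an assumption).
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp437")

if not can_encode("\u00a9", fake_stdout.encoding):
    fake_stdout = rewrap(fake_stdout, "iso-8859-1")

fake_stdout.write("\u00a9 2009")
fake_stdout.flush()
print(fake_stdout.buffer.getvalue())  # b'\xa9 2009'
```

Whether the terminal then displays the right glyph still depends on what encoding the terminal itself expects for the bytes it receives.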

Martin v. Löwis · Aug 6, 2009

As a side note, you should probably use something other than "file"
for the parameter name in GetFileContentsAsString() since file() is a
Python function.

Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> file
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'file' is not defined

Regards,
Martin

Philip Semanchuk · Aug 6, 2009

Whooops, didn't know about that change from 2.x to 3.x. Thanks.

Benjamin Kaplan · Aug 6, 2009

This is why we need code and full tracebacks. There's a good chance
that your error is on the print(text) line. That's because sys.stdout
is probably a byte stream without an encoding defined. When you try to
print your unicode string, Python has to convert it to a stream of
bytes. Python refuses to guess on the console encoding and just falls
back to ascii, the conversion fails, and you get your error. Try using
print( text.encode( 'cp1252' ) ) instead.
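
One caveat when trying this on Python 3: print() of a bytes object shows its repr, not the decoded characters. For a debugging print, a variant that keeps the output as text and escapes whatever the console cannot show may be closer to the intent. The sketch below assumes a cp437 console for the demonstration, and the helper name `printable` is hypothetical.

```python
import sys

text = "Copyright \u00a9 2009"

def printable(s, encoding=None):
    """Return s with characters the target encoding lacks shown as escapes."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    return s.encode(encoding, errors="backslashreplace").decode(encoding)

# The repr of the encoded bytes is what print(text.encode(...)) would display.
print(repr(text.encode("cp1252")))  # b'Copyright \xa9 2009'
print(printable(text, "cp437"))     # Copyright \xa9 2009
```

Calling printable(text) without an encoding uses whatever sys.stdout.encoding reports, which is worth checking first when diagnosing this kind of error.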

Dave Angel · Aug 6, 2009

I see others have alerted you to changes needed in stdout, which is
ASCII coded by default.

But I wanted to comment on the (c) remark. If you're in the US, that's
the wrong abbreviation for copyright. The only recognized abbreviation
is (copr).

DaveA
