Printing characters outside of the ASCII range

danielk

I'm converting an application to Python 3. The app works fine on Python 2.

Simply put, this simple one-liner:

print(chr(254))

errors out with:

Traceback (most recent call last):
  File "D:\home\python\tst.py", line 1, in <module>
    print(chr(254))
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>

I'm using this character as a delimiter in my application.

What do I have to do to convert this string so that it does not error out?
 
Ian Kelly

In Python 2, chr(254) means the byte 254.

In Python 3, chr(254) means the Unicode character with code point 254,
which is "þ". This character does not exist in CP 437, so it fails to
encode it for output.

If what you really want is the byte, then use b'\xfe' or bytes([254]) instead.
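
The difference described above can be checked directly. The following is a minimal sketch rather than code from the thread; the variable names are illustrative, and ascii() is used when printing so that the demonstration itself runs on any console.

```python
# chr() in Python 3 returns a one-character str (a Unicode code point).
ch = chr(254)
assert ch == "\u00fe"  # LATIN SMALL LETTER THORN

# CP437 has no thorn, so encoding it for a CP437 console fails.
try:
    ch.encode("cp437")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The *byte* 254 is a different thing; CP437 maps it to U+25A0.
raw = bytes([254])
glyph = raw.decode("cp437")

print(encodable, raw, ascii(glyph))
```

Here `encodable` ends up False, while the byte 254 decodes under CP437 to the black square character.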
 
Andrew Berg

That character is outside of cp437 - the default terminal encoding on
many Windows systems. You will either need to change the code page to
something that supports the character (if you're going to change it, you
might as well change it to cp65001 since you are using 3.3), catch the
error and replace the character with something that is in the current
codepage (don't assume cp437; it is not the default everywhere), or use
a different character completely. If it works on Python 2, it's probably
changing the character automatically to a replacement character or you
were using IDLE, which is graphical and is not subject to the weird
encoding system of terminals.
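
One of the options above, substituting a character that the current code page can represent, might look like the sketch below. It uses the errors argument of str.encode instead of an explicit try/except; console_safe is a hypothetical helper name, and the sketch assumes a '?' placeholder is acceptable.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the console cannot encode replaced by '?'."""
    # sys.stdout.encoding can be None when output is redirected,
    # so fall back to ASCII in that case.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(enc, errors="replace").decode(enc)


# The encoding is passed explicitly here only to make the demo deterministic.
print(console_safe("name" + chr(254) + "address", encoding="cp437"))
```

Leaving encoding unset makes the helper follow whatever code page the console actually reports, which avoids assuming cp437.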
 
Dave Angel

What character do you want? What characters does your console handle
directly? What does a "delimiter" mean for your particular console?

Or are you just printing it for the fun of it, and the real purpose is
for further processing, which will not go to the console?

What kind of things will it be separating? (strings, bytes ?) Clearly
you originally picked it as something unlikely to occur in those elements.

When those things are combined with a separator between, how are the
results going to be used? Saved to a file? Printed to console? What?
 
danielk

The database I'm using stores information as a 3-dimensional array. The delimiters between elements are chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for simplicity):

name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip

The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name' then the 'address' field would look like this:

addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...

I use Python to connect to the database using subprocess.Popen to run a server process. Python requests 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some actions require that the server send back information in the form of records that contain those delimiters.

I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could convert those characters on the server before sending them to Python and that is what I'm probably going to do, so I guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.

I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?
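
The record layout described in this post can be sketched as nested splits. This is only an illustration: the post does not say which delimiter sits at which level beyond chr(254) between fields and chr(253) inside a field, so treating chr(252) as the innermost level is an assumption, and parse_record and the sample data are made up.

```python
# Assumed nesting: 254 between fields, 253 between values, 252 innermost.
FIELD_SEP, VALUE_SEP, SUBVALUE_SEP = chr(254), chr(253), chr(252)


def parse_record(record):
    """Split a delimited record into fields -> values -> subvalues."""
    return [
        [value.split(SUBVALUE_SEP) for value in field.split(VALUE_SEP)]
        for field in record.split(FIELD_SEP)
    ]


# Made-up sample: a name, two addresses, and a city.
record = (
    "Smith" + FIELD_SEP
    + "1 Main St" + VALUE_SEP + "2 Oak Ave" + FIELD_SEP
    + "Springfield"
)
parsed = parse_record(record)
print(parsed)
```

Printing the parsed list is safe on any console because the delimiters are gone once the record has been split.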
 
Prasad, Ramit

danielk said:
I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I
know there will be characters outside of the ASCII range of 0-127?

You just need to change the string to one that is not
trying to use the ASCII codec when printing.

print(chr(253).decode('latin1')) # change latin1 to your chosen encoding.
ý


~Ramit


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.
 
Andrew Berg

I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?
You don't. It's raising that exception because the terminal cannot
display that character, not because it's using the wrong encoding. As
Ian mentioned, chr() on Python 2 and chr() on Python 3 return two
different things. I'm not very familiar with the oddities of Python 2,
but I suspect sending bytes to the terminal could work since that is
what chr() on Python 2 returns.
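
Sending bytes to the terminal in Python 3 could look like the sketch below, which writes to the binary layer underneath sys.stdout. write_raw is a hypothetical helper, and what the console shows for byte 254 depends on its active code page. The demonstration writes to an in-memory stream so that it runs anywhere.

```python
import io
import sys


def write_raw(data, stream=None):
    """Write bytes unchanged to a binary stream (default: sys.stdout.buffer)."""
    out = stream if stream is not None else sys.stdout.buffer
    out.write(data)
    out.flush()


# In-memory stand-in for the console's binary stream.
sink = io.BytesIO()
write_raw(b"name\xfeaddress\n", sink)
```

Calling write_raw(b"name\xfeaddress\n") without a stream would hand the bytes to the real console without any encoding step.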
 
danielk

D:\home\python>pytest.py
Traceback (most recent call last):
  File "D:\home\python\pytest.py", line 1, in <module>
    print(chr(253).decode('latin1'))
AttributeError: 'str' object has no attribute 'decode'

Do I need to import something?
 
Ian Kelly

D:\home\python>pytest.py
Traceback (most recent call last):
  File "D:\home\python\pytest.py", line 1, in <module>
    print(chr(253).decode('latin1'))
AttributeError: 'str' object has no attribute 'decode'

Do I need to import something?

Ramit should have written "encode", not "decode". But the above still
would not work, because chr(253) gives you the character at *Unicode*
code point 253, not the character with CP437 ordinal 253 that your
terminal can actually print. The Unicode equivalents of those
characters are:
list(map(ord, bytes([252, 253, 254]).decode('cp437')))
[8319, 178, 9632]

So these are what you would need to encode to CP437 for printing.
ⁿ
²
■

That's probably not the way you want to go about printing them,
though, unless you mean to be inserting them manually. Is the data
you get from your database a string, or a bytes object? If the
former, just do:

print(data.encode('cp437'))

If the latter, then it should be printable as is, unless it is in some
other encoding than CP437.
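
If the data is a str whose delimiters are the Unicode code points 252 to 254, one possible approach is to translate them to the characters that CP437 stores at those byte values before printing. This is a sketch under that assumption; printable_on_cp437 is a hypothetical name, and any genuine 'ü', 'ý' or 'þ' in the data would be translated as well.

```python
# Map the delimiters (Unicode code points 252-254) to the characters
# that CP437 stores at byte values 252-254.
DELIMS = bytes([252, 253, 254])
TO_CP437_GLYPHS = str.maketrans(DELIMS.decode("latin-1"), DELIMS.decode("cp437"))


def printable_on_cp437(data):
    """Return data with the three delimiters swapped for CP437-encodable glyphs."""
    return data.translate(TO_CP437_GLYPHS)


shown = printable_on_cp437("name" + chr(254) + "addr1" + chr(253) + "addr2")
encoded = shown.encode("cp437")  # succeeds; the delimiters are bytes 254 and 253 again
```

In Python 3, print() on a bytes object displays its b'...' representation rather than sending the raw bytes to the console, so translating the string first may be more convenient than encoding it.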
 
wxjmfauth

On Friday, 9 November 2012 at 18:17:54 UTC+1, danielk wrote:
I'm using this character as a delimiter in my application.

-----

There is nothing wrong in having the character with
the code point 0xfe in the cp437 coding scheme as
a delimiter.

If it is coming from a byte string, you should
decode it properly
'=■=■='

or you can directly use the Unicode equivalent
'=■=■='
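
The two routes above, decoding a byte string or writing the Unicode character directly, can be sketched as follows. The statements that produced the outputs shown in this post did not survive, so this is a reconstruction under assumptions, not the poster's code.

```python
import unicodedata

raw = b"=\xfe=\xfe="        # bytes as they might arrive from a CP437 source
text = raw.decode("cp437")  # str containing U+25A0 twice

same = "=\u25a0=\u25a0="    # the same string written with Unicode escapes
print(text == same, unicodedata.name(text[1]), hex(ord(text[1])))
```

Both spellings yield the same str, and unicodedata confirms that the character is U+25A0.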

That's for "input". For "output" see:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/c29f2f7f5a4962e8#


The choice of that character as a delimiter is not wrong.
It's a little bit unfortunate, because it falls high in
the Unicode table, in the 'Geometric Shapes' block.

(Another form of explanation)
jmf
 
danielk

Ian's solution gives me what I need (thanks Ian!). But I notice a difference between '__str__' and '__repr__'.

class Pytest(str):
    def __init__(self, data = None):
        if data == None: data = ""
        self.data = data

    def __repr__(self):
        return (self.data).encode('cp437')

<class 'str'>

If I change '__repr__' to '__str__' then I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type bytes)

Why is '__str__' behaving differently than '__repr__'? I'd like to be able to use '__str__' because the result is not executable code, it's just a string of the record contents.

The documentation for the 'encode' method says: "Return an encoded version of the string as a bytes object." Yet when I displayed the type, it said it was <class 'str'>, which I'm taking to be 'type string', or can a 'string' also be 'a string of bytes'?

I'm trying to get my head around all this codecs/unicode stuff. I haven't had to deal with it until now but I'm determined to not let it get the best of me :)

My goals are:

a) display a 'raw' database record with the delimiters intact, and
b) allow the client to create a string that represents a database record. So, if they know the record format then they should be able to create a database object like it does above, but with the chr(25x) characters. I will handle the conversion of the chr(25x) characters internally.
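
A likely explanation for the difference, offered with some hedging: because the class subclasses str, print() uses the inherited str.__str__ and never calls the overridden __repr__, so the bytes return value goes unnoticed until repr() is called explicitly. In Python 3 both methods must return str. The sketch below shows one possible shape for such a class; Record and its methods are hypothetical, and the __bytes__ method assumes every character in the record is below U+0100.

```python
class Record:
    """Hypothetical wrapper around one delimited database record."""

    def __init__(self, data=None):
        self.data = "" if data is None else data

    def __str__(self):
        # Must return str; returning bytes raises TypeError.
        return self.data

    def __repr__(self):
        # ascii() escapes non-ASCII characters, so this prints on any console.
        return "Record({})".format(ascii(self.data))

    def __bytes__(self):
        # Assumes all characters are below U+0100, where latin-1 maps each
        # code point to the byte with the same value.
        return self.data.encode("latin-1")


rec = Record("name" + chr(254) + "city")
print(repr(rec))
```

With this layout, clients can build records with chr(25x) characters, repr() shows the delimiters as escapes, and bytes(rec) gives the raw form for the server.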
 
Thomas Rachel

On 9 November 2012 at 18:17, danielk wrote:
I'm using this character as a delimiter in my application.

Then you probably use the *byte* 254 as opposed to the *character* 254.

So it might be better to either switch to byte strings, or output the
representation of the string instead of itself.

So do print(repr(chr(254))) or, for byte strings, print(bytes([254])).
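
One caveat worth checking: in Python 3, repr() leaves printable non-ASCII characters as they are, so print(repr(chr(254))) can hit the same encoding error on a CP437 console. The built-in ascii() escapes them instead. A minimal sketch:

```python
ch = chr(254)

escaped = ascii(ch)        # non-ASCII characters become backslash escapes
as_bytes = bytes([254])    # printing a bytes object shows its b'...' form

print(escaped)
print(as_bytes)
```

Both lines of output contain only ASCII characters, so they print regardless of the console's code page.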


Thomas
 
diccon.tesson

You're handling Pick multi-value fields, aren't you? ;)
Just hit the same issue, thanks all here for various solutions.
Interfacing with OpenQM / Scarlet DME here.
 
Mark Lawrence

You're handling Pick multi-value fields, aren't you? ;)
Just hit the same issue, thanks all here for various solutions.
Interfacing with OpenQM / Scarlet DME here.

The context is conspicuous by its absence. In future would you please
be kind enough to provide some.
 
Zachary Ware

You're handling Pick multi-value fields, aren't you? ;)
Just hit the same issue, thanks all here for various solutions.
Interfacing with OpenQM / Scarlet DME here.

For future posts, please be sure to quote what you're replying to.
Google Groups makes things easy to find and reply to, but this is a
mailing list. When we receive a mail with just a subject line and a
cryptic message, we're likely to think it spam and ignore future mail
from that sender. It's also a bit less than ideal to reply to
years-old threads.

The context is conspicuous by its absence. In future would you please be
kind enough to provide some.

In a fit of curiosity, I went looking:
https://mail.python.org/pipermail/python-list/2012-November/634803.html
I'm almost surprised it wasn't any older than that :)

Ironically, on my way down the November 2012 archive page, I noticed a
long thread about "Obnoxious postings from Google Groups".
 
Mark Lawrence

Ironically, on my way down the November 2012 archive page, I noticed a
long thread about "Obnoxious postings from Google Groups".

Thankfully the number of grotty postings from gg has dropped
considerably. Sadly our resident unicode expert quite deliberately
continues to use it in a manner which is designed to annoy.
 
