Printing characters outside of the ASCII range

Discussion in 'Python' started by danielk, Nov 9, 2012.

  1. danielk

    danielk Guest

    I'm converting an application to Python 3. The app works fine on Python 2.

    Simply put, this simple one-liner:

    print(chr(254))

    errors out with:

    Traceback (most recent call last):
    File "D:\home\python\tst.py", line 1, in <module>
    print(chr(254))
    File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>

    I'm using this character as a delimiter in my application.

    What do I have to do to convert this string so that it does not error out?
    danielk, Nov 9, 2012
    #1

  2. danielk

    Ian Kelly Guest

    On Fri, Nov 9, 2012 at 10:17 AM, danielk <> wrote:
    > I'm converting an application to Python 3. The app works fine on Python 2.
    >
    > Simply put, this simple one-liner:
    >
    > print(chr(254))
    >
    > errors out with:
    >
    > Traceback (most recent call last):
    > File "D:\home\python\tst.py", line 1, in <module>
    > print(chr(254))
    > File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    > return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
    >
    > I'm using this character as a delimiter in my application.
    >
    > What do I have to do to convert this string so that it does not error out?


    In Python 2, chr(254) means the byte 254.

    In Python 3, chr(254) means the Unicode character with code point 254,
    which is "þ". This character does not exist in CP 437, so it fails to
    encode it for output.

    If what you really want is the byte, then use b'\xfe' or bytes([254]) instead.
    Ian Kelly, Nov 9, 2012
    #2
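
    A minimal sketch of the distinction described in #2, assuming Python 3. The variable names are illustrative only and do not come from the original application:

```python
# In Python 3, chr() returns a one-character str (a Unicode code point),
# while bytes([n]) returns a one-byte bytes object.
as_text = chr(254)        # U+00FE LATIN SMALL LETTER THORN
as_byte = bytes([254])    # the single byte 0xFE

# CP437 has no mapping for U+00FE, so encoding the str for a CP437
# console raises UnicodeEncodeError, as in the traceback above.
try:
    as_text.encode('cp437')
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The byte 0xFE does exist in CP437; it decodes to U+25A0 (BLACK SQUARE).
decoded = as_byte.decode('cp437')
```

    So the same number, 254, names two different things depending on whether it is treated as a code point or as a byte in a particular code page.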

  3. danielk

    Andrew Berg Guest

    On 2012.11.09 11:17, danielk wrote:
    > I'm converting an application to Python 3. The app works fine on Python 2.
    >
    > Simply put, this simple one-liner:
    >
    > print(chr(254))
    >
    > errors out with:
    >
    > Traceback (most recent call last):
    > File "D:\home\python\tst.py", line 1, in <module>
    > print(chr(254))
    > File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    > return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
    >
    > I'm using this character as a delimiter in my application.
    >
    > What do I have to do to convert this string so that it does not error out?
    >

    That character is outside of cp437 - the default terminal encoding on
    many Windows systems. You will either need to change the code page to
    something that supports the character (if you're going to change it, you
    might as well change it to cp65001 since you are using 3.3), catch the
    error and replace the character with something that is in the current
    codepage (don't assume cp437; it is not the default everywhere), or use
    a different character completely. If it works on Python 2, it's probably
    changing the character automatically to a replacement character or you
    were using IDLE, which is graphical and is not subject to the weird
    encoding system of terminals.
    --
    CPython 3.3.0 | Windows NT 6.1.7601.17835
    Andrew Berg, Nov 9, 2012
    #3
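
    One of the options mentioned in #3, replacing characters the current code page cannot represent, might look like the sketch below. The helper name `console_safe` is hypothetical, and falling back to ASCII when no stdout encoding is reported is an assumption:

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Use the active stdout encoding unless one is given; do not assume cp437.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # errors='replace' substitutes '?' for anything that cannot be encoded.
    return text.encode(encoding, errors='replace').decode(encoding)

sample = console_safe('a' + chr(254) + 'b', encoding='cp437')
```

    Calling print(console_safe(record)) then trades fidelity for robustness: the output no longer raises, but the replaced characters are lost.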
  4. danielk

    Dave Angel Guest

    On 11/09/2012 12:17 PM, danielk wrote:
    > I'm converting an application to Python 3. The app works fine on Python 2.
    >
    > Simply put, this simple one-liner:
    >
    > print(chr(254))
    >
    > errors out with:
    >
    > Traceback (most recent call last):
    > File "D:\home\python\tst.py", line 1, in <module>
    > print(chr(254))
    > File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    > return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
    >
    > I'm using this character as a delimiter in my application.
    >
    > What do I have to do to convert this string so that it does not error out?


    What character do you want? What characters do your console handle
    directly? What does a "delimiter" mean for your particular console?

    Or are you just printing it for the fun of it, and the real purpose is
    for further processing, which will not go to the console?

    What kind of things will it be separating? (strings, bytes ?) Clearly
    you originally picked it as something unlikely to occur in those elements.

    When those things are combined with a separator between, how are the
    results going to be used? Saved to a file? Printed to console? What?

    --

    DaveA
    Dave Angel, Nov 9, 2012
    #4
  5. danielk

    danielk Guest

    On Friday, November 9, 2012 12:48:05 PM UTC-5, Dave Angel wrote:
    > On 11/09/2012 12:17 PM, danielk wrote:
    > > I'm converting an application to Python 3. The app works fine on Python 2.
    > >
    > > Simply put, this simple one-liner:
    > >
    > > print(chr(254))
    > >
    > > errors out with:
    > >
    > > Traceback (most recent call last):
    > > File "D:\home\python\tst.py", line 1, in <module>
    > > print(chr(254))
    > > File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    > > return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    > > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
    > >
    > > I'm using this character as a delimiter in my application.
    > >
    > > What do I have to do to convert this string so that it does not error out?
    >
    > What character do you want? What characters do your console handle
    > directly? What does a "delimiter" mean for your particular console?
    >
    > Or are you just printing it for the fun of it, and the real purpose is
    > for further processing, which will not go to the console?
    >
    > What kind of things will it be separating? (strings, bytes ?) Clearly
    > you originally picked it as something unlikely to occur in those elements.
    >
    > When those things are combined with a separator between, how are the
    > results going to be used? Saved to a file? Printed to console? What?
    >
    > --
    > DaveA

    The database I'm using stores information as a 3-dimensional array. The delimiters between elements are chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for simplicity):

    name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip

    The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name' then the 'address' field would look like this:

    addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...

    I use Python to connect to the database using subprocess.Popen to run a server process. Python requests 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some actions require that the server send back information in the form of records that contain those delimiters.

    I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could convert those characters on the server before sending them to Python and that is what I'm probably going to do, so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.

    I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?
    danielk, Nov 9, 2012
    #5
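
    The record layout described in #5 could be split along these lines. This is only a sketch: the constant and function names are hypothetical, and the post does not say which nesting level chr(252) separates, so treating it as the innermost level is an assumption:

```python
# Delimiters as named in the post. Which level chr(252) separates is an
# assumption; the post only shows chr(254) and chr(253) in use.
FIELD_SEP = chr(254)
VALUE_SEP = chr(253)
SUBVALUE_SEP = chr(252)

def parse_record(record):
    """Split a delimited record into fields, then values, then subvalues."""
    return [
        [value.split(SUBVALUE_SEP) for value in field.split(VALUE_SEP)]
        for field in record.split(FIELD_SEP)
    ]

record = 'Smith' + FIELD_SEP + 'addr1' + VALUE_SEP + 'addr2' + FIELD_SEP + 'Boston'
parsed = parse_record(record)
```

    Working on the parsed structure rather than on the raw delimited string means the delimiter characters never have to reach the console at all.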
  7. danielk wrote:

    >
    > The database I'm using stores information as a 3-dimensional array. The delimiters between elements are
    > chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for
    > simplicity):
    >
    > name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
    >
    > The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name'
    > then the 'address' field would look like this:
    >
    > addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
    >
    > I use Python to connect to the database using subprocess.Popen to run a server process. Python requests
    > 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some
    > actions require that the server send back information in the form of records that contain those delimiters.
    >
    > I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could
    > convert those characters on the server before sending them to Python and that is what I'm probably going to do,
    > so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.
    >
    > I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I
    > know there will be characters outside of the ASCII range of 0-127?


    You just need to change the string to one that is not
    trying to use the ASCII codec when printing.

    print(chr(253).decode('latin1')) # change latin1 to your
    # chosen encoding.
    ý


    ~Ramit


    Prasad, Ramit, Nov 9, 2012
    #7
  8. danielk

    Andrew Berg Guest

    On 2012.11.09 15:17, danielk wrote:
    > I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I know there will be characters outside of the ASCII range of 0-127?

    You don't. It's raising that exception because the terminal cannot
    display that character, not because it's using the wrong encoding. As
    Ian mentioned, chr() on Python 2 and chr() on Python 3 return two
    different things. I'm not very familiar with the oddities of Python 2,
    but I suspect sending bytes to the terminal could work since that is
    what chr() on Python 2 returns.
    --
    CPython 3.3.0 | Windows NT 6.1.7601.17835
    Andrew Berg, Nov 9, 2012
    #8
  9. danielk

    danielk Guest

    On Friday, November 9, 2012 4:34:19 PM UTC-5, Prasad, Ramit wrote:
    > danielk wrote:
    > > The database I'm using stores information as a 3-dimensional array. The delimiters between elements are
    > > chr(252), chr(253) and chr(254). So a record can look like this (example only uses one of the delimiters for
    > > simplicity):
    > >
    > > name + chr(254) + address + chr(254) + city + chr(254) + st + chr(254) + zip
    > >
    > > The other delimiters can be embedded within each field. For example, if there were multiple addresses for 'name'
    > > then the 'address' field would look like this:
    > >
    > > addr1 + chr(253) + addr2 + chr(253) + addr3 + etc ...
    > >
    > > I use Python to connect to the database using subprocess.Popen to run a server process. Python requests
    > > 'actions' like 'read' and 'write' to the server process, whereby the server process performs the actions. Some
    > > actions require that the server send back information in the form of records that contain those delimiters.
    > >
    > > I have __str__ and __repr__ methods in the classes but Python is choking on those characters. Surely, I could
    > > convert those characters on the server before sending them to Python and that is what I'm probably going to do,
    > > so guess I've answered my own question. On Python 2, it just printed the 'extended' ASCII representation.
    > >
    > > I guess the question I have is: How do you tell Python to use a specific encoding for 'print' statements when I
    > > know there will be characters outside of the ASCII range of 0-127?
    >
    > You just need to change the string to one that is not
    > trying to use the ASCII codec when printing.
    >
    > print(chr(253).decode('latin1')) # change latin1 to your
    > # chosen encoding.
    > ý
    >
    > ~Ramit

    D:\home\python>pytest.py
    Traceback (most recent call last):
    File "D:\home\python\pytest.py", line 1, in <module>
    print(chr(253).decode('latin1'))
    AttributeError: 'str' object has no attribute 'decode'

    Do I need to import something?
    danielk, Nov 9, 2012
    #9
  11. danielk

    Ian Kelly Guest

    On Fri, Nov 9, 2012 at 2:46 PM, danielk <> wrote:
    > D:\home\python>pytest.py
    > Traceback (most recent call last):
    > File "D:\home\python\pytest.py", line 1, in <module>
    > print(chr(253).decode('latin1'))
    > AttributeError: 'str' object has no attribute 'decode'
    >
    > Do I need to import something?


    Ramit should have written "encode", not "decode". But the above still
    would not work, because chr(253) gives you the character at *Unicode*
    code point 253, not the character with CP437 ordinal 253 that your
    terminal can actually print. The Unicode equivalents of those
    characters are:

    >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
    [8319, 178, 9632]

    So these are what you would need to encode to CP437 for printing.

    >>> print(chr(8319))
    ⁿ
    >>> print(chr(178))
    ²
    >>> print(chr(9632))
    ■

    That's probably not the way you want to go about printing them,
    though, unless you mean to be inserting them manually. Is the data
    you get from your database a string, or a bytes object? If the
    former, just do:

    print(data.encode('cp437'))

    If the latter, then it should be printable as is, unless it is in some
    other encoding than CP437.
    Ian Kelly, Nov 9, 2012
    #11
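
    A sketch of the approach in #11, decoding once at the boundary and working with str afterwards. The reply bytes below are made up for illustration, and CP437 as the server's encoding is an assumption taken from the traceback in this thread:

```python
# Hypothetical raw reply from the server process, as bytes.
raw = b'name\xfeaddr1\xfdaddr2\xfecity'

# Decode once, at the boundary, with the encoding the server is assumed to use.
text = raw.decode('cp437')

# The delimiters become the Unicode characters CP437 assigns to 252-254.
delimiters = bytes([252, 253, 254]).decode('cp437')
fields = text.split(delimiters[2])

# Encoding back to CP437 reproduces the original bytes exactly.
round_trip = text.encode('cp437')
```

    On a CP437 console, print(text) should then succeed, because every character in text came out of CP437 in the first place. Note that on Python 3, print(data.encode('cp437')) displays the repr of the bytes object, with \x escapes, rather than the characters themselves.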
  12. danielk

    Guest

    On Friday, November 9, 2012 18:17:54 UTC+1, danielk wrote:
    > I'm converting an application to Python 3. The app works fine on Python 2.
    >
    > Simply put, this simple one-liner:
    >
    > print(chr(254))
    >
    > errors out with:
    >
    > Traceback (most recent call last):
    > File "D:\home\python\tst.py", line 1, in <module>
    > print(chr(254))
    > File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    > return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    > UnicodeEncodeError: 'charmap' codec can't encode character '\xfe' in position 0: character maps to <undefined>
    >
    > I'm using this character as a delimiter in my application.
    >
    > What do I have to do to convert this string so that it does not error out?

    -----

    There is nothing wrong in having the character with
    the code point 0xfe in the cp437 coding scheme as
    a delimiter.

    If it is coming from a byte string, you should
    decode it properly

    >>> b'=\xfe=\xfe='.decode('cp437')
    '=■=■='

    or you can use directly the unicode equivalent

    >>> '=\u25a0=\u25a0='
    '=■=■='

    That's for "input". For "output" see:
    http://groups.google.com/group/comp.lang.python/browse_thread/thread/c29f2f7f5a4962e8#


    The choice of that character as a delimiter is not wrong.
    It's a little bit unfortunate, because it falls high in
    the "unicode table".

    >>> import fourbiunicode as fu
    >>> fu.UnicodeBlock('\u25a0')

    'Geometric Shapes'
    >>>
    >>> fu.UnicodeBlock(b'\xfe'.decode('cp437'))

    'Geometric Shapes'

    (Another form of explanation)
    jmf
    , Nov 10, 2012
    #12
  13. danielk

    danielk Guest

    On Friday, November 9, 2012 5:11:12 PM UTC-5, Ian wrote:
    > On Fri, Nov 9, 2012 at 2:46 PM, danielk <> wrote:
    > > D:\home\python>pytest.py
    > > Traceback (most recent call last):
    > > File "D:\home\python\pytest.py", line 1, in <module>
    > > print(chr(253).decode('latin1'))
    > > AttributeError: 'str' object has no attribute 'decode'
    > >
    > > Do I need to import something?
    >
    > Ramit should have written "encode", not "decode". But the above still
    > would not work, because chr(253) gives you the character at *Unicode*
    > code point 253, not the character with CP437 ordinal 253 that your
    > terminal can actually print. The Unicode equivalents of those
    > characters are:
    >
    > >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
    > [8319, 178, 9632]
    >
    > So these are what you would need to encode to CP437 for printing.
    >
    > >>> print(chr(8319))
    > ⁿ
    > >>> print(chr(178))
    > ²
    > >>> print(chr(9632))
    > ■
    >
    > That's probably not the way you want to go about printing them,
    > though, unless you mean to be inserting them manually. Is the data
    > you get from your database a string, or a bytes object? If the
    > former, just do:
    >
    > print(data.encode('cp437'))
    >
    > If the latter, then it should be printable as is, unless it is in some
    > other encoding than CP437.

    Ian's solution gives me what I need (thanks Ian!). But I notice a difference between '__str__' and '__repr__'.

    class Pytest(str):
        def __init__(self, data = None):
            if data == None: data = ""
            self.data = data

        def __repr__(self):
            return (self.data).encode('cp437')

    >>> import pytest
    >>> p = pytest.Pytest("abc" + chr(178) + "def")
    >>> print(p)

    abc²def
    >>> print(p.data)

    abc²def
    >>> print(type(p.data))

    <class 'str'>

    If I change '__repr__' to '__str__' then I get:

    >>> import pytest
    >>> p = pytest.Pytest("abc" + chr(178) + "def")
    >>> print(p)

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: __str__ returned non-string (type bytes)

    Why is '__str__' behaving differently than '__repr__' ? I'd like to be able to use '__str__' because the result is not executable code, it's just a string of the record contents.

    The documentation for the 'encode' method says: "Return an encoded version of the string as a bytes object." Yet when I displayed the type, it said it was <class 'str'>, which I'm taking to be 'type string', or can a 'string' also be 'a string of bytes' ?

    I'm trying to get my head around all this codecs/unicode stuff. I haven't had to deal with it until now but I'm determined to not let it get the best of me :)

    My goals are:

    a) display a 'raw' database record with the delimiters intact, and
    b) allow the client to create a string that represents a database record. So, if they know the record format then they should be able to create a database object like it does above, but with the chr(25x) characters. I will handle the conversion of the chr(25x) characters internally.
    danielk, Nov 11, 2012
    #13
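
    On the __str__ / __repr__ question in #13: in Python 3 both methods must return str, not bytes. In the class shown there, print(p) never calls __repr__ at all, because Pytest subclasses str and print() uses the inherited __str__; and type(p.data) is str because p.data itself was never encoded. A minimal sketch of one way to keep text and bytes apart follows. The class name `Record` and the method `to_bytes` are hypothetical, not the poster's code:

```python
class Record:
    """Illustrative wrapper around a delimited record held as str."""

    def __init__(self, data=""):
        self.data = data

    def __str__(self):
        # __str__ must return str, never bytes.
        return self.data

    def __repr__(self):
        # __repr__ must also return str; ascii() escapes non-ASCII characters.
        return 'Record(' + ascii(self.data) + ')'

    def to_bytes(self, encoding='cp437'):
        # Encoding is a separate, explicit step for the I/O boundary.
        return self.data.encode(encoding)

p = Record('abc' + chr(178) + 'def')
```

    With this split, str(p) is for display, repr(p) is safe on any console, and to_bytes() is what would be sent back to the server process.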
  15. On 09.11.2012 18:17, danielk wrote:

    > I'm using this character as a delimiter in my application.


    Then you probably use the *byte* 254 as opposed to the *character* 254.

    So it might be better to either switch to byte strings, or output the
    representation of the string instead of itself.

    So do print(repr(chr(254))) or, for byte strings, print(bytes([254])).


    Thomas
    Thomas Rachel, Nov 11, 2012
    #15
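
    A small sketch related to the suggestion in #15, assuming Python 3. There, repr() of a str leaves printable non-ASCII characters unescaped, so the built-in ascii() may be the safer choice when the goal is output that any console can display:

```python
delimiter = chr(254)

# On Python 3, repr() keeps printable non-ASCII characters as they are,
# so its result can still fail to encode on a CP437 console.
repr_form = repr(delimiter)

# ascii() escapes everything outside ASCII, so its result is ASCII-only.
ascii_form = ascii(delimiter)

# For bytes objects, repr() already uses \x escapes.
bytes_form = repr(bytes([254]))
```

    Printing ascii_form or bytes_form shows the delimiter as an escape sequence, which keeps it visible in a raw record without depending on the terminal's code page.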
  16. danielk

    Guest

    You're handling Pick multi-value fields, aren't you ;)
    Just hit the same issue, thanks all here for various solutions.
    Interfacing with OpenQM / Scarlet DME here.
    , Mar 19, 2014
    #16
  17. On 19/03/2014 13:11, wrote:
    > You're handling Pick multi-value fields, aren't you ;)
    > Just hit the same issue, thanks all here for various solutions.
    > Interfacing with OpenQM / Scarlet DME here.
    >


    The context is conspicuous by its absence. In future would you please
    be kind enough to provide some.

    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.

    Mark Lawrence

    Mark Lawrence, Mar 19, 2014
    #17
  18. danielk

    Zachary Ware Guest

    On 19/03/2014 13:11, wrote:
    > You're handling Pick multi-value fields, aren't you ;)
    > Just hit the same issue, thanks all here for various solutions.
    > Interfacing with OpenQM / Scarlet DME here.


    For future posts, please be sure to quote what you're replying to.
    Google Groups makes things easy to find and reply to, but this is a
    mailing list. When we receive a mail with just a subject line and a
    cryptic message, we're likely to think it spam and ignore future mail
    from that sender. It's also a bit less than ideal to reply to years
    old threads.

    On Wed, Mar 19, 2014 at 9:19 AM, Mark Lawrence <> wrote:
    > The context is conspicuous by its absence. In future would you please be
    > kind enough to provide some.


    In a fit of curiosity, I went looking:
    https://mail.python.org/pipermail/python-list/2012-November/634803.html
    I'm almost surprised it wasn't any older than that :)

    Ironically, on my way down the November 2012 archive page, I noticed a
    long thread about "Obnoxious postings from Google Groups".

    --
    Zach
    Zachary Ware, Mar 19, 2014
    #18
  19. On 19/03/2014 14:43, Zachary Ware wrote:
    > Ironically, on my way down the November 2012 archive page, I noticed a
    > long thread about "Obnoxious postings from Google Groups".
    >


    Thankfully the number of grotty postings from gg has dropped
    considerably. Sadly our resident unicode expert quite deliberately
    continues to use it in a manner which is designed to annoy.

    --
    My fellow Pythonistas, ask not what our language can do for you, ask
    what you can do for our language.

    Mark Lawrence

    Mark Lawrence, Mar 19, 2014
    #19