Re: codecs.register_error for "strict", unicode.encode() and str.decode()

Discussion in 'Python' started by Peter Otten, Jul 27, 2012.

  1. Peter Otten

    Alan Franzoni wrote:

    > Hello,
    > I think I'm missing some piece here.
    >
    > I'm trying to register a default error handler to prevent
    > encoding/decoding errors (I know how this works and that making this
    > global is probably not good practice, but I found this strange
    > behaviour while writing a proof of concept of how to let Python work
    > in a more forgiving way).
    >
    > What I discovered is that register_error() for "strict" seems to work
    > the way I expect for string decoding, but not for unicode encoding.
    >
    > This is what happens on a Mac with Apple's Python 2.7.1:
    >
    > melquiades:tmp alan$ cat minimal_test_encode.py
    > # -*- coding: utf-8 -*-
    >
    > import codecs
    >
    > def handle_encode(e):
    >     return ("ASD", e.end)
    >
    > codecs.register_error("strict", handle_encode)
    >
    > print u"à".encode("ascii")
    >
    > melquiades:tmp alan$ python minimal_test_encode.py
    > Traceback (most recent call last):
    >   File "minimal_test_encode.py", line 10, in <module>
    >     u"à".encode("ascii")
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in
    > position 0: ordinal not in range(128)
    >
    >
    > OTOH this works properly:
    >
    > melquiades:tmp alan$ cat minimal_test_decode.py
    > # -*- coding: utf-8 -*-
    >
    > import codecs
    >
    > def handle_decode(e):
    >     return (u"ASD", e.end)
    >
    > codecs.register_error("strict", handle_decode)
    >
    > print "à".decode("ascii")
    >
    > melquiades:tmp alan$ python minimal_test_decode.py
    > ASDASD
    >
    >
    > What piece am I missing? The doc at
    > http://docs.python.org/library/codecs.html says "For encoding
    > /error_handler/ will be called with a UnicodeEncodeError
    > <http://docs.python.org/library/exceptions.html#exceptions.UnicodeEncodeError>
    > instance, which contains information about the location of the error."
    > Is there any reason why the standard "strict" handler cannot be replaced?


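    The handling of the standard error handlers "strict", "replace", "ignore",
    and "xmlcharrefreplace" is hardwired: the encoder deals with these names
    itself instead of looking them up in the handler registry. See the
    function unicode_encode_ucs1 in Objects/unicodeobject.c: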

    if (known_errorHandler==-1) {
        if ((errors==NULL) || (!strcmp(errors, "strict")))
            known_errorHandler = 1;
        ....
    switch (known_errorHandler) {
    case 1: /* strict */
        raise_encode_exception(&exc, encoding, unicode, collstart,
                               collend, reason);
        goto onError;

    You need another gun to shoot yourself in the foot ;)
    Peter Otten, Jul 27, 2012
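
One way to get the forgiving behaviour without touching "strict" is to register the handler under a name of its own and pass that name explicitly. The following is only a minimal sketch: the name "forgiving" and the function handle_forgiving are made up for illustration and are not part of the codecs module. Because the encoder only hardwires the four built-in names, a handler registered under any other name is looked up and called for both encoding and decoding.

```python
# -*- coding: utf-8 -*-
import codecs


def handle_forgiving(e):
    # e is a UnicodeEncodeError or UnicodeDecodeError instance.
    # Substitute a fixed marker for the offending span and resume at e.end.
    return (u"ASD", e.end)


# "forgiving" is a hypothetical handler name chosen for this example;
# any name other than the hardwired built-in ones would do.
codecs.register_error("forgiving", handle_forgiving)

# The handler has to be requested explicitly at each call site.
encoded = u"\xe0".encode("ascii", "forgiving")      # u"\xe0" is "à"
decoded = b"\xc3\xa0".decode("ascii", "forgiving")  # UTF-8 bytes of "à"
```

Here encoded comes out as the ASCII bytes ASD, and decoded as ASDASD, since the ASCII decoder reports each of the two undecodable bytes separately, just as in the decode example quoted above. The price is that every encode() and decode() call has to name the handler, which is also what keeps the behaviour from leaking into unrelated code.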