Re: Python3.3 str() bug?

Discussion in 'Python' started by Stefan Behnel, Nov 9, 2012.

  1. Helmut Jarausch, 09.11.2012 10:18:
    > probably I'm missing something.
    >
    > Using str(Arg) works just fine if Arg is a list.
    > But
    > str([],encoding='latin-1')
    >
    > gives the error
    > TypeError: coercing to str: need bytes, bytearray or buffer-like object,
    > list found
    >
    > If this isn't a bug how can I use str(Arg,encoding='latin-1') in general.
    > Do I need to flatten any data structure which is normally accepted by str()?


    Funny idea to call this a bug in Python. What your code is asking for is to
    decode the object you pass in using the "latin-1" encoding. Since a list is
    not something that is "encoded", let alone in latin-1, you get an error,
    and actually a rather clear one.

    Note that this is not specific to Python3.3 or even 3.x. It's the same
    thing in Py2 when you call the equivalent unicode() function.
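    A quick way to see the distinction Stefan describes, assuming any current Python 3. The exact TypeError wording has changed since 3.3, so the message is printed rather than asserted:

```python
# Without an encoding, str() accepts any object and returns its text form.
assert str([]) == "[]"

# With an encoding, str() decodes a bytes-like object...
assert str(b"caf\xe9", encoding="latin-1") == "café"

# ...and refuses anything that is not bytes-like, such as a list.
try:
    str([], encoding="latin-1")
except TypeError as exc:
    print("TypeError:", exc)  # wording differs between Python versions
```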

    Stefan
     
    Stefan Behnel, Nov 9, 2012
    #1

  2. Re: Python3.3 str() bug?

    On Fri, Nov 9, 2012 at 10:08 PM, Helmut Jarausch
    <-aachen.de> wrote:
    > For me it's not funny, at all.


    His description "funny" was in reference to the fact that you
    described this as a bug. This is a heavily-used mature language; bugs
    as fundamental as you imply are unlikely to exist (consequences of
    design decisions there will be, but not outright bugs, usually);
    extraordinary claims require extraordinary evidence.

    > Whenever Python3 encounters a bytestring it needs an encoding to convert it to
    > a string. If I feed a list of bytestrings or a list of list of bytestrings to
    > 'str' , etc, it should use the encoding for each bytestring component of the
    > given data structure.
    >
    > How can I convert a data structure of arbitrarily complex nature, which contains
    > bytestrings somewhere, to a string?


    Okay, now we're getting somewhere.

    What you really should be doing is not transforming the whole
    structure, but explicitly transforming each part inside it. I
    recommend you stop fighting the language and start thinking about your
    data as either *bytes* or *characters* and using the appropriate data
    types (bytes or str) everywhere. You'll then find that it makes
    perfect sense to explicitly translate (en/decode) from one to another,
    but it doesn't make sense to encode a list in UTF-8 or decode a
    dictionary from Latin-1.
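    A small illustration of translating explicitly between the two types: the same characters produce different bytes under different encodings, so the encoding has to be named at the point of conversion. The word and the two encodings are arbitrary examples:

```python
text = "café"                    # characters (str)
utf8 = text.encode("utf-8")      # bytes: b'caf\xc3\xa9'
latin1 = text.encode("latin-1")  # bytes: b'caf\xe9'

# Decoding with the matching encoding gives the same characters back.
assert utf8.decode("utf-8") == text
assert latin1.decode("latin-1") == text

# Decoding with the wrong encoding silently yields different characters.
print(utf8.decode("latin-1"))    # cafÃ©
```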

    > This problem has arisen while converting a working Python2 script to Python3.3.
    > Since Python2 doesn't have bytestrings it just works.


    Actually it does; it just calls them "str". And there's a Unicode
    string type, called "unicode", which is (more or less) the thing that
    Python 3 calls "str".

    You may be able to do some kind of recursive cast that, in one sweep
    of your data structure, encodes all str objects into bytes using a
    given encoding (or the reverse thereof). But I don't think this is the
    best way to do things.
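    A minimal sketch of such a recursive cast, written in the decoding direction (bytes to str) since that is what Helmut needs. `decode_all` is a hypothetical helper, not a stdlib function; it walks only the built-in list, tuple, dict, set and frozenset types, assumes Latin-1 by default, and returns every other object unchanged:

```python
from typing import Any


def decode_all(obj: Any, encoding: str = "latin-1", errors: str = "strict") -> Any:
    """Return a copy of obj with every bytes/bytearray value decoded to str.

    Only the built-in containers below are walked; any other object is
    returned unchanged. Tuple subclasses (e.g. namedtuples) come back as
    plain tuples.
    """
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {decode_all(k, encoding, errors): decode_all(v, encoding, errors)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_all(x, encoding, errors) for x in obj]
    if isinstance(obj, tuple):
        return tuple(decode_all(x, encoding, errors) for x in obj)
    if isinstance(obj, frozenset):
        return frozenset(decode_all(x, encoding, errors) for x in obj)
    if isinstance(obj, set):
        return {decode_all(x, encoding, errors) for x in obj}
    return obj


data = [1, [b"caf\xe9", (b"na\xefve",)], {b"key": b"\xe9t\xe9"}]
print(str(decode_all(data)))  # [1, ['café', ('naïve',)], {'key': 'été'}]
```

    Once the structure holds only str, a plain str() call gives the readable result Helmut was after.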

    ChrisA
     
    Chris Angelico, Nov 9, 2012
    #2

  3. Re: Python3.3 str() bug?

    Helmut Jarausch, 09.11.2012 14:13:
    > On Fri, 09 Nov 2012 23:22:04 +1100, Chris Angelico wrote:
    >> What you really should be doing is not transforming the whole
    >> structure, but explicitly transforming each part inside it. I
    >> recommend you stop fighting the language and start thinking about your
    >> data as either *bytes* or *characters* and using the appropriate data
    >> types (bytes or str) everywhere. You'll then find that it makes
    >> perfect sense to explicitly translate (en/decode) from one to another,
    >> but it doesn't make sense to encode a list in UTF-8 or decode a
    >> dictionary from Latin-1.
    >>
    >>> This problem has arisen while converting a working Python2 script to Python3.3.
    >>> Since Python2 doesn't have bytestrings it just works.
    >>
    >> Actually it does; it just calls them "str". And there's a Unicode
    >> string type, called "unicode", which is (more or less) the thing that
    >> Python 3 calls "str".
    >>
    >> You may be able to do some kind of recursive cast that, in one sweep
    >> of your data structure, encodes all str objects into bytes using a
    >> given encoding (or the reverse thereof). But I don't think this is the
    >> best way to do things.
    >
    > Thanks, but in my case the (complex) object is returned via ctypes from the
    > aspell library.
    > I still think that a standard function in Python3 which is able to 'stringify'
    > objects should take an encoding parameter.


    And how would that work? Would it recursively run through all data
    structures you pass in or stop at some level or at some type of object?
    Would it simply concatenate the substrings (and with what separator?), or
    does the chaining depend on the objects found? Should it use the same
    separator for everything or different separators for each level of the data
    structure? Should it use str() for everything or repr() for some? Is str()
    the right thing or are there special objects that need more than just a
    call to str(), some kind of further preprocessing?

    There are so many ways to do something like this, and it's so
    straightforward to do in a given use case, that it's IMHO useless to even
    think about adding a "general solution" for this to the stdlib.
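    To illustrate how short the use-case-specific version usually is: for a flat list of byte strings, each of the questions above has an obvious answer. The data, the Latin-1 encoding and the ", " separator are all made up for the example:

```python
# Hypothetical result: a flat list of Latin-1 encoded words.
suggestions = [b"caf\xe9", b"cafe", b"caff\xe8"]

# For this one case the decisions are easy: one level deep, decode each
# item, use str rather than repr, and join with ", ".
line = ", ".join(word.decode("latin-1") for word in suggestions)
print(line)  # café, cafe, caffè
```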

    Stefan
     
    Stefan Behnel, Nov 9, 2012
    #3
  4. RE: Python3.3 str() bug?

    Chris Angelico wrote:

    >
    > What you really should be doing is not transforming the whole
    > structure, but explicitly transforming each part inside it. I
    > recommend you stop fighting the language and start thinking about your
    > data as either *bytes* or *characters* and using the appropriate data
    > types (bytes or str) everywhere. You'll then find that it makes
    > perfect sense to explicitly translate (en/decode) from one to another,
    > but it doesn't make sense to encode a list in UTF-8 or decode a
    > dictionary from Latin-1.
    >

    [snip]

    >
    > You may be able to do some kind of recursive cast that, in one sweep
    > of your data structure, encodes all str objects into bytes using a
    > given encoding (or the reverse thereof). But I don't think this is the
    > best way to do things.


    I would think the best way is to convert as you load the data.
    That way everything is in the correct format as you manipulate
    and generate new data.
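    Applied to Helmut's case, where the data comes from a C library via ctypes: a `char *` exposed as `ctypes.c_char_p` yields bytes from `.value` in Python 3, so the decode can happen right where values enter Python. This is a sketch only; no C library is called, the `c_char_p` objects are built by hand to stand in for what a foreign function might return, `load_words` is a hypothetical helper, and Latin-1 is an assumption:

```python
import ctypes
from typing import Iterable, List


def load_words(raw: Iterable[ctypes.c_char_p], encoding: str = "latin-1") -> List[str]:
    """Decode C strings into str as soon as they enter Python.

    Each c_char_p.value is a bytes object, or None for a NULL pointer,
    which is skipped here.
    """
    words = []
    for ptr in raw:
        value = ptr.value
        if value is not None:
            words.append(value.decode(encoding))
    return words


# Stand-ins for pointers a C library might hand back (hypothetical data).
raw = [ctypes.c_char_p(b"caf\xe9"), ctypes.c_char_p(None), ctypes.c_char_p(b"na\xefve")]
words = load_words(raw)
print(words)  # ['café', 'naïve']
```

    Everything downstream of `load_words` then deals only with str, and no later stringify step needs to know about encodings.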


    ~Ramit


     
    Prasad, Ramit, Nov 9, 2012
    #4
  5. Re: Python3.3 str() bug?

    On 11/9/2012 8:13 AM, Helmut Jarausch wrote:

    > Just for the record.
    > I first discovered a real bug with Python3 when using os.walk on a file system
    > containing non-ascii characters in file names.
    >
    > I encountered a very strange behavior (I still would call it a bug) when trying
    > to put non-ascii characters in email headers.
    > This has only been solved satisfactorily in Python3.3.


    Most bugs, such as the above, are in library modules. There have been
    many related to unicode. In my opinion, 3.3 is the first version to
    handle unicode decently well.

    >>> How can I convert a data structure of arbitrarily complex nature, which contains
    >>> bytestrings somewhere, to a string?


    > Thanks, but in my case the (complex) object is returned via ctypes from the
    > aspell library.
    > I still think that a standard function in Python3 which is able to 'stringify'
    > objects should take an encoding parameter.


    This is an interesting idea, which I have not seen before. It is more
    sensible in Python 3 than in Python 2. (For py2, unicode(str(object),
    encoding='xxx') does what you want.) Try presenting it here or on
    python-ideas as an enhancement request, rather than as a bug report ;-).

    In the meanwhile, if you cannot have the object constructed with strings
    rather than bytes, I suggest you write a custom converter function that
    understands the structure and replaces bytes with strings.

    --
    Terry Jan Reedy
     
    Terry Reedy, Nov 9, 2012
    #5
  6. Re: Python3.3 str() bug?

    On 9 November 2012 11:08, Helmut Jarausch <-aachen.de> wrote:
    > On Fri, 09 Nov 2012 10:37:11 +0100, Stefan Behnel wrote:
    >
    >> Helmut Jarausch, 09.11.2012 10:18:
    >>> probably I'm missing something.
    >>>
    >>> Using str(Arg) works just fine if Arg is a list.
    >>> But
    >>> str([],encoding='latin-1')
    >>>
    >>> gives the error
    >>> TypeError: coercing to str: need bytes, bytearray or buffer-like object,
    >>> list found
    >>>
    >>> If this isn't a bug how can I use str(Arg,encoding='latin-1') in general.
    >>> Do I need to flatten any data structure which is normally accepted by str()?
    >>
    >> Funny idea to call this a bug in Python. What your code is asking for is to
    >> decode the object you pass in using the "latin-1" encoding. Since a list is
    >> not something that is "encoded", let alone in latin-1, you get an error,
    >> and actually a rather clear one.
    >>
    >> Note that this is not specific to Python3.3 or even 3.x. It's the same
    >> thing in Py2 when you call the equivalent unicode() function.
    >>
    >
    > For me it's not funny, at all.


    I think the problem is that the str constructor does two fundamentally
    different things depending on whether you have supplied the encoding
    argument. From help(str) in Python 3.2:

    | str(object[, encoding[, errors]]) -> str
    |
    | Create a new string object from the given object. If encoding or
    | errors is specified, then the object must expose a data buffer
    | that will be decoded using the given encoding and error handler.
    | Otherwise, returns the result of object.__str__() (if defined)
    | or repr(object).
    | encoding defaults to sys.getdefaultencoding().
    | errors defaults to 'strict'.

    So str(obj) returns obj.__str__() but str(obj, encoding='xxx') decodes
    a byte string (or a similar object) using a given encoding. In most
    cases obj will be a byte string and it will be equivalent to using
    obj.decode('xxx').
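    The two modes can be checked directly. This sketch assumes a current Python 3, where the default encoding is UTF-8; the sample bytes are arbitrary:

```python
raw = b"caf\xe9"

# With an encoding, str() decodes, just like bytes.decode().
assert str(raw, "latin-1") == raw.decode("latin-1") == "café"

# The decoding form accepts other buffer-like objects too.
assert str(bytearray(raw), "latin-1") == "café"
assert str(memoryview(raw), "latin-1") == "café"

# Passing only errors also decodes, using the default encoding (UTF-8).
assert str(b"caf\xc3\xa9", errors="strict") == "café"

# With neither encoding nor errors, str() returns the repr-style text.
print(str(raw))  # b'caf\xe9'
```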

    I think the help text is a little confusing. It says that encoding
    defaults to sys.getdefaultencoding() but doesn't clarify that this only
    applies if errors is given as a keyword argument, since otherwise no
    decoding is performed. Perhaps the help text would be clearer if it
    listed the two operations as two separate cases, e.g.:

    str(object)
    Returns a string object from object.__str__() if it is defined or
    otherwise object.__repr__(). Raises TypeError if the returned result
    is not a string object.

    str(bytes[, encoding[, errors]])
    If either encoding or errors is supplied, creates a new string
    object by decoding bytes with the specified encoding. The bytes
    argument can be any object that supports the buffer interface.
    encoding defaults to sys.getdefaultencoding() and errors defaults to
    'strict'.

    > Whenever Python3 encounters a bytestring it needs an encoding to convert it to
    > a string.


    Well actually Python 3.3 will happily convert it to a string using
    bytes.__repr__ if you don't supply the encoding argument:

    >>> str(b'this is a byte string')
    "b'this is a byte string'"

    > If I feed a list of bytestrings or a list of list of bytestrings to
    > 'str' , etc, it should use the encoding for each bytestring component of the
    > given data structure.


    You can always do:

    [str(obj, encoding='xxx') for obj in list_of_byte_strings]

    > How can I convert a data structure of arbitrarily complex nature, which contains
    > bytestrings somewhere, to a string?


    Using str(obj) or repr(obj). Of course this relies on the author of
    type(obj) defining the appropriate methods and writing the code that
    actually converts the object into a string.

    > This problem has arisen while converting a working Python2 script to Python3.3.
    > Since Python2 doesn't have bytestrings it just works.


    In Python 2 ordinary strings are byte strings.

    > Tell me how to convert str(obj) from Python2 to Python3 if obj is an
    > arbitrarily complex data structure containing bytestrings somewhere
    > which have to be converted to strings with a given encoding?


    The str function when used to convert a non-string object into a
    string knows nothing about the object you provide except whether it
    has __str__ or __repr__ methods. The only processing that is done is
    to check that the returned result was actually a string:

    >>> class A:
    ...     def __str__(self):
    ...         return []
    ...
    >>> a = A()
    >>> str(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: __str__ returned non-string (type list)

    Perhaps it would help if you would explain why you want the string
    object. I would only use str(complex_object) as something to print for
    debugging so I would actually want it to show me which strings were
    byte strings by marking them with a 'b' prefix and I would also want
    it to show non-ascii characters with a \x hex code as it already does:

    >>> a = [1, 2, b'caf\xe9']
    >>> str(a)
    "[1, 2, b'caf\\xe9']"

    If I wanted to convert the object to a string in order to e.g. save it
    to a file or database then I would write a function to create the
    string that I wanted. I would only use str() to convert elementary
    types like int and float into strings.
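    One way to write such a function without walking the structure by hand is the `default` hook of `json.dumps`, which is called for any value the encoder cannot serialise itself, bytes included. This is a sketch under two assumptions: the bytes are Latin-1, and JSON is an acceptable output format. `default` is not consulted for dict keys (bytes keys still raise TypeError), and tuples are written as JSON arrays. `bytes_as_text` is a hypothetical name:

```python
import json


def bytes_as_text(obj):
    """Called by json.dumps() for values it cannot serialise on its own."""
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode("latin-1")  # assumption: the bytes are Latin-1
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"id": 1, "words": [b"caf\xe9", b"na\xefve"]}
text = json.dumps(record, default=bytes_as_text, ensure_ascii=False)
print(text)  # {"id": 1, "words": ["café", "naïve"]}
```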


    Oscar
     
    Oscar Benjamin, Nov 10, 2012
    #6
