[unicode] inconvenient unicode conversion of non-string arguments

Discussion in 'Python' started by Holger Joukl, Dec 13, 2006.

  1. Holger Joukl

    Holger Joukl Guest

    Hi there,

    I consider the behaviour of unicode() inconvenient with respect to
    conversion of non-string arguments.
    While you can do:

    >>> unicode(17.3)
    u'17.3'

    you cannot do:

    >>> unicode(17.3, 'ISO-8859-1', 'replace')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: coercing to Unicode: need string or buffer, float found


    This is somewhat annoying when you want to convert a mixed-type argument
    list to unicode strings, e.g. for a logging system (that's where it bit me),
    and want to make sure that any raw string arguments are also converted to
    unicode without errors (albeit by force).
    It matters all the more because this is a performance-critical part of my
    application, so I would really rather not wrap unicode() in a custom
    tounicode() function that handles such cases by distinguishing argument
    types.

    Any reason why unicode() with a non-string argument should not allow the
    encoding and errors arguments?
    Or some good solution to work around my problem?

    (Currently running on python 2.4.3)

    Regards,
    Holger

    The contents of this e-mail are confidential. If you are not the named
    addressee or if this transmission has been addressed to you in error,
    please notify the sender immediately and then delete this e-mail. Any
    unauthorized copying and transmission is forbidden. E-Mail transmission
    cannot be guaranteed to be secure. If verification is required, please
    request a hard copy version.
     
    Holger Joukl, Dec 13, 2006
    #1

  2. Leo Kislov

    Leo Kislov Guest

    Re: inconvenient unicode conversion of non-string arguments

    Holger Joukl wrote:
    > Hi there,
    >
    > I consider the behaviour of unicode() inconvenient with respect to
    > conversion of non-string arguments.
    > While you can do:
    >
    > >>> unicode(17.3)
    > u'17.3'
    >
    > you cannot do:
    >
    > >>> unicode(17.3, 'ISO-8859-1', 'replace')
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in ?
    > TypeError: coercing to Unicode: need string or buffer, float found
    >
    > This is somewhat annoying when you want to convert a mixed-type argument
    > list to unicode strings, e.g. for a logging system (that's where it bit me),
    > and want to make sure that any raw string arguments are also converted to
    > unicode without errors (albeit by force).
    > It matters all the more because this is a performance-critical part of my
    > application, so I would really rather not wrap unicode() in a custom
    > tounicode() function that handles such cases by distinguishing argument
    > types.
    >
    > Any reason why unicode() with a non-string argument should not allow the
    > encoding and errors arguments?


    There is a reason: an encoding is a property of bytes; it is not
    applicable to other objects.
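
A minimal sketch of this point, written for Python 3 as an assumption (there, `str` plays the role of Python 2's `unicode` and `bytes` the role of the old byte string). The helper name `try_decode` is hypothetical. It suggests that the encoding and errors arguments are only meaningful when the object being converted is bytes:

```python
# Python 3 sketch: str stands in for Python 2's unicode (assumption).

def try_decode(value, encoding="ISO-8859-1", errors="replace"):
    """Return (True, text) on success, or (False, exception type name)."""
    try:
        # With encoding/errors given, str() expects a bytes-like object.
        return True, str(value, encoding, errors)
    except TypeError as exc:
        return False, type(exc).__name__


bytes_result = try_decode(b"caf\xe9")  # bytes carry an encoding question
float_result = try_decode(17.3)        # a float has nothing to decode
plain_result = str(17.3)               # plain conversion works for any object
```

Only the bytes input can be decoded; the float is rejected with a `TypeError`, while the one-argument form converts it without complaint.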

    > Or some good solution to work around my problem?


    Do not put undecoded bytes in a mixed-type argument list. A rule of
    thumb when working with unicode: decode as soon as possible, encode as
    late as possible.
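
One way this rule of thumb could look in practice, sketched for Python 3 as an assumption (`str` in place of Python 2's `unicode`). The names `SOURCE_ENCODING`, `read_field`, `format_log_line` and `encode_for_output` are hypothetical, as is the choice of UTF-8 for output:

```python
# Python 3 sketch of "decode early, encode late" (all names are hypothetical).

SOURCE_ENCODING = "ISO-8859-1"  # assumed encoding of the incoming bytes


def read_field(raw):
    """Decode bytes at the point where they enter the program."""
    return raw.decode(SOURCE_ENCODING, "replace")


def format_log_line(args):
    """No bytes are left at this point, so plain str() is enough."""
    return " ".join(str(arg) for arg in args)


def encode_for_output(line):
    """Encode only at the very end, when the text leaves the program."""
    return line.encode("utf-8")


line = format_log_line([17.3, read_field(b"caf\xe9"), 42])
encoded = encode_for_output(line)
```

Because the byte string is decoded where it enters, the mixed-type list contains only text and non-string objects, and the formatting step needs no per-type dispatch.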

    -- Leo
     
    Leo Kislov, Dec 13, 2006
    #2

  3. Re: inconvenient unicode conversion of non-string arguments

    Holger Joukl wrote:

    > Ok, but I still don't see why these arguments shouldn't simply be silently
    > ignored


    >>> import this


    </F>
     
    Fredrik Lundh, Dec 13, 2006
    #3
  4. Leo Kislov

    Leo Kislov Guest

    Re: inconvenient unicode conversion of non-string arguments

    Holger Joukl wrote:
    > python-list-bounces+holger.joukl= wrote on 13.12.2006 11:02:30:
    >
    > >
    > > Holger Joukl wrote:
    > > > Hi there,
    > > >
    > > > I consider the behaviour of unicode() inconvenient with respect to
    > > > conversion of non-string arguments.
    > > > While you can do:
    > > >
    > > > >>> unicode(17.3)
    > > > u'17.3'
    > > >
    > > > you cannot do:
    > > >
    > > > >>> unicode(17.3, 'ISO-8859-1', 'replace')
    > > > Traceback (most recent call last):
    > > > File "<stdin>", line 1, in ?
    > > > TypeError: coercing to Unicode: need string or buffer, float found
    > > > >>>
    > > > [...]
    > > > Any reason why unicode() with a non-string argument should not allow
    > > > the encoding and errors arguments?
    > >
    > > There is a reason: an encoding is a property of bytes; it is not
    > > applicable to other objects.
    >
    > Ok, but I still don't see why these arguments shouldn't simply be silently
    > ignored for non-string arguments.


    That's a rather bizarre and sloppy approach. Should

    unicode(17.3, 'just-having-fun', 'I-do-not-like-errors')
    unicode(17.3, 'sdlfkj', 'ewrlkj', 'eoirj', 'sdflkj')

    work?
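
For comparison, a byte string decoded with an unknown codec name is rejected rather than silently accepted, so ignoring the same arguments for non-string input would hide exactly this kind of typo. A small Python 3 sketch (the helper `decode_or_none` is hypothetical):

```python
# Python 3 sketch: an unknown codec name is an error when decoding bytes.

def decode_or_none(raw, encoding):
    """Decode bytes, returning None when the codec name is unknown."""
    try:
        return raw.decode(encoding)
    except LookupError:
        # Raised for codec names Python does not recognise.
        return None


good = decode_or_none(b"abc", "ascii")
bad = decode_or_none(b"abc", "just-having-fun")
```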


    > > > Or some good solution to work around my problem?
    > >
    > > Do not put undecoded bytes in a mixed-type argument list. A rule of
    > > thumb working with unicode: decode as soon as possible, encode as late
    > > as possible.
    >
    > It's not always that easy when you deal with a tree data structure whose
    > elements contain different data types and your user may decide to output
    > root.element.subelement.whateverData.
    > I have the problem in a logging mechanism, and it would vanish if
    > unicode(<non-string>, encoding, errors) worked and just ignored the
    > superfluous arguments.


    I don't really see from your example what stops you from putting
    unicode instead of bytes into your tree, but I can believe that some
    libraries cause extra work. That's a problem with the libraries,
    not with the builtin function unicode(). Would you be happy if the
    floating-point value 17.3 were stored as 8 bytes in your tree? After all,
    that is how 17.3 is actually represented in computer memory. Same story
    with unicode: if some library gives you raw bytes, *you* have to do
    extra work later.
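
The float analogy can be made concrete with the standard `struct` module. A short Python 3 sketch, assuming the 8-byte IEEE 754 double layout that `struct` uses for the `d` format in standard mode:

```python
import struct

# Pack the float into its raw 8-byte little-endian representation.
raw = struct.pack("<d", 17.3)

# Nobody wants to pass these bytes around; they are turned back into a
# float before any further use, just as bytes are decoded into text.
restored = struct.unpack("<d", raw)[0]
```

Here `raw` is an 8-byte `bytes` object, and `restored` compares equal to `17.3` again.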

    -- Leo
     
    Leo Kislov, Dec 13, 2006
    #4
  5. In <>, Holger Joukl
    wrote:

    > The contents of this e-mail are confidential. If you are not the named
    > addressee or if this transmission has been addressed to you in error,
    > please notify the sender immediately and then delete this e-mail. Any
    > unauthorized copying and transmission is forbidden. E-Mail transmission
    > cannot be guaranteed to be secure. If verification is required, please
    > request a hard copy version.


    Maybe you should reconsider whether it really makes sense to add this huge
    block of "nonsense" to a post to a newsgroup or public mailing list. If
    it's confidential, just keep it secret. ;-)

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Dec 13, 2006
    #5
  6. Ben Finney

    Ben Finney Guest

    Stupid email disclaimers (was: [unicode] inconvenient unicode conversion of non-string arguments)

    "Marc 'BlackJack' Rintsch" <> writes:

    > In <>, Holger Joukl
    > wrote:
    > > [a meaningless disclaimer text at the bottom of every message]

    >
    > Maybe you should rethink if it really makes sense to add this huge
    > block of "nonsense" to a post to a newsgroup or public mailing list.
    > If it's confidential, just keep it secret. ;-)


    In all likelihood, the OP isn't choosing specifically to attach it;
    these things are often done to *every* outgoing message at an
    organisational level by people who don't think the issue through very
    well.

    <URL:http://goldmark.org/jeff/stupid-disclaimers/>

    Please, those with such badly-configured systems, discuss the issue of
    public discussion forums with the boneheads who think these disclaimer
    texts are a good idea and at least try to change that behaviour.

    Alternatively, post from some other mail system that doesn't slap
    these obnoxious blocks onto your messages.

    --
    \ "I wish there was a knob on the TV to turn up the intelligence. |
    `\ There's a knob called 'brightness' but it doesn't work." -- |
    _o__) Eugene P. Gallagher |
    Ben Finney
     
    Ben Finney, Dec 13, 2006
    #6