byte count unicode string

Discussion in 'Python' started by willie, Sep 20, 2006.

  1. willie

    willie Guest

    John Machin:

    >You are confusing the hell out of yourself. You say that your web app
    >deals only with UTF-8 strings. Where do you get "the unicode string"
    >from??? If name is a utf-8 string, as your comment says, then len(name)
    >is all you need!!!



    # I'll go ahead and concede defeat since you appear to be on the
    # verge of a heart attack :)
    # I can see that I lack clarity so I don't blame you.

    # By UTF-8 string, I mean a unicode object with UTF-8 encoding:

    >>> type(ustr)
    <type 'unicode'>
    >>> repr(ustr)
    "u'\\u2708'"

    # The database API expects unicode objects:
    # A template query, then a variable number of values.
    # Perhaps I'm a victim of arbitrary design decisions :)
     
    willie, Sep 20, 2006
    #1

  2. John Machin

    John Machin Guest

    willie wrote:
    > John Machin:
    >
    > >You are confusing the hell out of yourself. You say that your web app
    > >deals only with UTF-8 strings. Where do you get "the unicode string"
    > >from??? If name is a utf-8 string, as your comment says, then len(name)
    > >is all you need!!!

    >
    >
    > # I'll go ahead and concede defeat since you appear to be on the
    > # verge of a heart attack :)
    > # I can see that I lack clarity so I don't blame you.


    All you have to do is use terminology like "Python str object, encoded
    in utf-8" and "Python unicode object".

    >
    > # By UTF-8 string, I mean a unicode object with UTF-8 encoding:


    There is no such animal as a "unicode object with UTF-8 encoding".
    Don't make up terminology as you go.
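
    A minimal sketch of the two kinds of object being distinguished here.
    It is written for Python 3, where `str` plays the role of the Python 2
    `unicode` type and `bytes` plays the role of the Python 2 `str` type.
    The sample value in `name` is an assumption for illustration, not the
    actual post data from the web app.

```python
# Text object: a sequence of code points with no encoding attached.
name = "Ren\u00e9 \u2708"          # hypothetical sample value

# Byte string: the result of encoding that text, here as UTF-8.
name_utf8 = name.encode("utf-8")

char_count = len(name)             # counts code points
byte_count = len(name_utf8)        # counts bytes

# Decoding with the same codec recovers the original text object.
round_trip = name_utf8.decode("utf-8")

print(type(name).__name__, char_count)
print(type(name_utf8).__name__, byte_count)
```

    Calling len() on the text object counts characters; calling it on the
    encoded byte string counts bytes. Which of the two is "all you need"
    depends on which object the variable actually holds.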

    >
    > >>> type(ustr)
    > <type 'unicode'>
    > >>> repr(ustr)
    > "u'\\u2708'"


    Sigh. I suppose we have to infer that "ustr" is the same as the "name"
    that you were getting as post data. Is that correct?

    >
    > # The database API expects unicode objects:
    > # A template query, then a variable number of values.
    > # Perhaps I'm a victim of arbitrary design decisions :)


    And the database will encode those unicode objects as utf-8, silently
    truncating any that are too long -- just as Duncan feared? "Arbitrary"
    is not the word for it.
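
    If the database really does store the values as utf-8 in a column with
    a byte limit, one way to avoid silent truncation is to measure the
    encoded length before handing the value over. The sketch below is
    Python 3 and purely illustrative: `utf8_len`, `truncate_utf8` and the
    limits used are hypothetical, and nothing here is taken from the
    actual database API in question.

```python
def utf8_len(text):
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text, max_bytes):
    """Shorten `text` so its UTF-8 form fits in `max_bytes` bytes.

    The cut is made on the encoded bytes; errors="ignore" then drops a
    trailing multi-byte sequence that the cut left incomplete, so no
    character is ever split in half.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


value = "ab\u2708"                    # 3 characters, 5 bytes in UTF-8
fits = utf8_len(value) <= 4           # hypothetical 4-byte column limit
shortened = truncate_utf8(value, 4)   # the 3-byte character is dropped whole
print(fits, repr(shortened))
```

    Truncating deliberately in the application at least makes the data
    loss visible and keeps the stored bytes valid utf-8.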

    Good luck!

    Cheers,
    John
     
    John Machin, Sep 20, 2006
    #2

  3. willie wrote:

    > John Machin:
    >
    > >You are confusing the hell out of yourself. You say that your web app
    > >deals only with UTF-8 strings. Where do you get "the unicode string"
    > >from??? If name is a utf-8 string, as your comment says, then len(name)
    > >is all you need!!!

    >
    >
    > # I'll go ahead and concede defeat since you appear to be on the
    > # verge of a heart attack :)
    > # I can see that I lack clarity so I don't blame you.


    Could you please change your style of quoting/posting? It is extremely
    confusing - not only using a different character than > for citations, but
    also appearing to cite yourself while in fact it is your answer one reads.

    I'm all for expressing oneself and proving to be an individual - but
    communication can get tricky even with standardized manners of doing so,
    and there is no need to add more confusion.

    > # By UTF-8 string, I mean a unicode object with UTF-8 encoding:
    >
    > >>> type(ustr)
    > <type 'unicode'>
    > >>> repr(ustr)
    > "u'\\u2708'"


    You ARE confusing the hell out of yourself. There is no such thing as a
    unicode object with UTF-8 encoding. There are unicode objects. And there
    are byte-strings, which may happen to represent text encoded in utf-8.

    What you see above is a unicode literal containing a code point escape.
    Encoding it gives a certain utf-8 byte string. For ASCII characters the
    two look suspiciously alike, because utf-8 maps those code points to the
    same byte values, but a code point such as U+2708 becomes several bytes.

    But it still remains true: a unicode object is a unicode object. And has no
    encoding whatsoever!
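
    One way to see this is to encode the same object with several codecs
    and compare the results. A short sketch in Python 3 (where the text
    type is `str` rather than `unicode`); the choice of codecs is only for
    illustration.

```python
text = "\u2708"   # one code point, U+2708

# The byte count is a property of the chosen encoding, not of the text.
sizes = {
    codec: len(text.encode(codec))
    for codec in ("utf-8", "utf-16-le", "utf-32-le")
}

utf8_bytes = text.encode("utf-8")

print(len(text))     # length of the text object in code points
print(sizes)         # bytes needed per codec
print(utf8_bytes)    # the UTF-8 byte string itself
```

    The text object has length 1 throughout. Only once a codec is picked
    does a byte count exist, and it differs from codec to codec.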

    > # The database API expects unicode objects:
    > # A template query, then a variable number of values.
    > # Perhaps I'm a victim of arbitrary design decisions :)


    The same happens in java all the time, as java only deals with unicode
    strings. And for dealing with it, you also need to explicitly convert them
    to the proper encoded byte array. Unfortunate, but true.

    Diez
     
    Diez B. Roggisch, Sep 20, 2006
    #3