unicode and strings

Discussion in 'Python' started by Jacob Friis, Nov 2, 2004.

  1. Jacob Friis

    Jacob Friis Guest

    I'm trying to learn Python via Mark's Feedparser.

    <snip src="http://feedparser.org/docs/character-encoding.html">
    If the character encoding can not be determined, Universal Feed Parser
    sets the bozo bit to 1 and sets bozo_exception to
    feedparser.CharacterEncodingUnknown. In this case, parsed values will be
    strings, not Unicode strings.
    </snip>

    I guess this means that all data will be unicode, and to put it in a
    database I could use my mycode function. Correct?

    def mycode(value):
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        return value

    What do I do about data that is a string?

    Thanks,
    Jacob
     
    Jacob Friis, Nov 2, 2004
    #1

  2. Jacob Friis wrote:

    > I'm trying to learn Python via Mark's Feedparser.
    >
    > <snip src="http://feedparser.org/docs/character-encoding.html">
    > If the character encoding can not be determined, Universal Feed Parser
    > sets the bozo bit to 1 and sets bozo_exception to
    > feedparser.CharacterEncodingUnknown. In this case, parsed values will be
    > strings, not Unicode strings.
    > </snip>
    >
    > I guess this means that all data will be unicode, and to put it in a
    > database I could use my mycode function. Correct?


    No. It means that you don't get unicode objects, but strings which are
    basically sequences of bytes. And there is no way to be sure what encoding
    they are in.
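
To illustrate the ambiguity, here is a minimal sketch. It assumes Python 3, where `bytes` plays the role of the plain string discussed here and `str` the role of `unicode`; the sample byte and the two codecs are picked only for illustration. The same byte decodes to different characters depending on which encoding you assume:

```python
# A single byte whose encoding we do not know.
raw = b"\xe9"

# Interpreted as Latin-1 (Western European) it is one character...
as_latin1 = raw.decode("latin-1")

# ...interpreted as cp1251 (Cyrillic) it is a different one.
as_cp1251 = raw.decode("cp1251")

print(repr(as_latin1), repr(as_cp1251))
```

Nothing in the byte itself says which reading is the right one, which is why the encoding has to come from somewhere else (HTTP headers, an XML declaration, or a guess).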

    >
    > def mycode(value):
    >     if isinstance(value, unicode):
    >         value = value.encode('utf-8')
    >     return value


    This will yield either a UTF-8 encoded string (if the value was a unicode
    object) or, since plain strings pass through unchanged, a string in an
    unknown encoding.
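
One possible way to handle the plain-string case is to check whether the bytes are already valid UTF-8 and, if not, decode them with a guessed encoding before re-encoding. This is only a sketch, assuming Python 3 (`str` is the Unicode type, `bytes` the plain string). The name `to_utf8` and the `latin-1` fallback are assumptions made for illustration and are not part of Feedparser; if the guess is wrong, the result will be valid UTF-8 but may contain the wrong characters.

```python
def to_utf8(value, fallback_encoding="latin-1"):
    """Return UTF-8 encoded bytes for either text or byte input.

    fallback_encoding is a guess used only when the bytes are not
    valid UTF-8; it is an assumption, not something we can detect.
    """
    if isinstance(value, str):
        # Text is unambiguous: just encode it.
        return value.encode("utf-8")
    if isinstance(value, bytes):
        try:
            # Already valid UTF-8: keep the bytes as they are.
            value.decode("utf-8")
            return value
        except UnicodeDecodeError:
            # Not UTF-8: decode with the guessed encoding, then re-encode.
            return value.decode(fallback_encoding).encode("utf-8")
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

Since `latin-1` maps every possible byte to a character, the fallback never raises, which is convenient for storing data but also means a wrong guess goes unnoticed.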

    --
    Regards,

    Diez B. Roggisch
     
    Diez B. Roggisch, Nov 3, 2004
    #2
