unicode and strings


Jacob Friis

I'm trying to learn Python via Mark's Feedparser.

<snip src="http://feedparser.org/docs/character-encoding.html">
If the character encoding can not be determined, Universal Feed Parser
sets the bozo bit to 1 and sets bozo_exception to
feedparser.CharacterEncodingUnknown. In this case, parsed values will be
strings, not Unicode strings.
</snip>

I guess this means that all data will be unicode, and to put in a
database I could use my mycode function. Correct?

def mycode(value):
    if isinstance(value, unicode):
        value = value.encode('utf-8')
    return value

What do I do about data that is a string?

Thanks,
Jacob
 

Diez B. Roggisch

Jacob said:
I'm trying to learn Python via Mark's Feedparser.

<snip src="http://feedparser.org/docs/character-encoding.html">
If the character encoding can not be determined, Universal Feed Parser
sets the bozo bit to 1 and sets bozo_exception to
feedparser.CharacterEncodingUnknown. In this case, parsed values will be
strings, not Unicode strings.
</snip>

I guess this means that all data will be unicode, and to put in a
database I could use my mycode function. Correct?

No. It means that in this case you don't get unicode objects but plain
strings, which are basically sequences of bytes. And there is no way to be
sure what encoding they are in.
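
To illustrate why the encoding of a byte string cannot be recovered from the bytes alone, here is a minimal sketch. The sample bytes and the two candidate encodings are assumptions chosen for the demonstration; they do not come from Feedparser. The same bytes decode without error under both encodings, but to different text.

```python
# Sample byte string of "unknown" origin (hypothetical data for illustration).
raw = b"caf\xc3\xa9"

# Both decodes succeed, so nothing in the bytes tells us which one is right.
as_utf8 = raw.decode("utf-8")      # 4 characters, ends in U+00E9
as_latin1 = raw.decode("latin-1")  # 5 characters, ends in U+00C3 U+00A9

same_text = (as_utf8 == as_latin1)  # False: two different readings
```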
def mycode(value):
    if isinstance(value, unicode):
        value = value.encode('utf-8')
    return value

This will yield either a UTF-8 encoded string or a string in an unknown
encoding.
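
One possible way to make the second case explicit is sketched below. The name to_utf8 and its assumed_encoding parameter are hypothetical, not part of Feedparser, and the caller still has to supply the guess. Instead of silently passing byte strings through, it refuses them unless an encoding is given. The text_type line is only there so the sketch runs on both Python 2 and Python 3.

```python
# `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")

def to_utf8(value, assumed_encoding=None):
    """Return UTF-8 bytes; refuse byte strings whose encoding is unknown."""
    if isinstance(value, text_type):
        # Text is unambiguous, so encoding it is always safe.
        return value.encode("utf-8")
    if assumed_encoding is None:
        # Bytes with no known encoding: do not pretend they are UTF-8.
        raise ValueError("byte string with unknown encoding")
    # The caller takes responsibility for the guess (assumption).
    return value.decode(assumed_encoding).encode("utf-8")
```

Whether raising, guessing, or skipping the entry is the right reaction depends on the application; the point is only that the unknown case is handled deliberately.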
 
