Dr. Dobb's Python-URL! - weekly Python news and links (Dec 30)

Discussion in 'Python' started by Cameron Laird, Dec 30, 2004.

  1. QOTW: "I found the discussion of unicode, in any python book I have,
    insufficient." -- Thomas Heller

    "If you develop on a Mac, ... Objective-C could come in handy. . . .
    PyObjC makes mixing the two languages dead easy and more convenient than
    indoor plumbing." -- Robert Kern


    Among other activities, the PSF aggregates donors with dollars
    destined to do good Python works, and developers expert in
    obscure corners of Pythonia.
    http://groups-beta.google.com/group/comp.lang.python.announce/browse_thread/thread/705bfe05419aa0b3
    http://groups-beta.google.com/group/comp.lang.python.announce/browse_thread/thread/1122f3e14752ce5/

    Yippee! The martellibot promises to explain Unicode for Pythoneers.
    http://groups-beta.google.com/group/comp.lang.python/msg/6015a5a05c206712

    The glorious SciPy project supports *multiple* worthwhile Wikis.
    http://www.scipy.org/wikis

    Good style in Python does not generally include "in-place"
    operations on lists. Several cleaner idioms are possible.
    http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/c94559f53d25474e

    Assume you're comfortable with tuples' semantics, immutability,
    and so on. Do you correctly understand the basics of their
    syntax, though? This is another opportunity to think about
    Unicode, by the way.
    http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/990049d7adb1bcce

    Robert Kern, Paul Rubin, Mike Meyer, Alex Martelli, and others
    provide disproportionately high-quality advice (and tangents!)
    on the subject of languages which complement Python.
    http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/bbc1c6d9d87049b6


    ========================================================================
    Everything Python-related you want is probably one or two clicks away in
    these pages:

    Python.org's Python Language Website is the traditional
    center of Pythonia
    http://www.python.org
    Notice especially the master FAQ
    http://www.python.org/doc/FAQ.html

    PythonWare complements the digest you're reading with the
    marvelous daily python url
    http://www.pythonware.com/daily
    Mygale is a news-gathering webcrawler that specializes in (new)
    World-Wide Web articles related to Python.
    http://www.awaretek.com/nowak/mygale.html
    While cosmetically similar, Mygale and the Daily Python-URL
    are utterly different in their technologies and generally in
    their results.

    comp.lang.python.announce announces new Python software. Be
    sure to scan this newsgroup weekly.
    http://groups.google.com/groups?oi=djq&as_ugroup=comp.lang.python.announce

    Brett Cannon continues the marvelous tradition established by
    Andrew Kuchling and Michael Hudson of intelligently summarizing
    action on the python-dev mailing list once every other week.
    http://www.python.org/dev/summary/

    The Python Package Index catalogues packages.
    http://www.python.org/pypi/

    The somewhat older Vaults of Parnassus ambitiously collects references
    to all sorts of Python resources.
    http://www.vex.net/~x/parnassus/

    Much of Python's real work takes place on Special-Interest Group
    mailing lists
    http://www.python.org/sigs/

    The Python Business Forum "further the interests of companies
    that base their business on ... Python."
    http://www.python-in-business.org

    Python Success Stories--from air-traffic control to on-line
    match-making--can inspire you or decision-makers to whom you're
    subject with a vision of what the language makes practical.
    http://www.pythonology.com/success

    The Python Software Foundation (PSF) has replaced the Python
    Consortium as an independent nexus of activity. It has official
    responsibility for Python's development and maintenance.
    http://www.python.org/psf/
    Among the ways you can support PSF is with a donation.
    http://www.python.org/psf/donate.html

    Kurt B. Kaiser publishes a weekly report on faults and patches.
    http://www.google.com/groups?as_usubject=weekly python patch

    Cetus collects Python hyperlinks.
    http://www.cetus-links.org/oo_python.html

    Python FAQTS
    http://python.faqts.com/

    The Cookbook is a collaborative effort to capture useful and
    interesting recipes.
    http://aspn.activestate.com/ASPN/Cookbook/Python

    Among several Python-oriented RSS/RDF feeds available are
    http://www.python.org/channews.rdf
    http://bootleg-rss.g-blog.net/pythonware_com_daily.pcgi
    http://python.de/backend.php
    For more, see
    http://www.syndic8.com/feedlist.php?ShowMatch=python&ShowStatus=all
    The old Python "To-Do List" now lives principally in a
    SourceForge reincarnation.
    http://sourceforge.net/tracker/?atid=355470&group_id=5470&func=browse
    http://python.sourceforge.net/peps/pep-0042.html

    The online Python Journal is posted at pythonjournal.cognizor.com.
    and
    welcome submission of material that helps people's understanding
    of Python use, and offer Web presentation of your work.

    del.icio.us presents an intriguing approach to reference commentary.
    It already aggregates quite a bit of Python intelligence.
    http://del.icio.us/tag/python

    *Py: the Journal of the Python Language*
    http://www.pyzine.com

    Archive probing tricks of the trade:
    http://groups.google.com/groups?oi=djq&as_ugroup=comp.lang.python&num=100
    http://groups.google.com/groups?meta=site=groups&group=comp.lang.python.*

    Previous - (U)se the (R)esource, (L)uke! - messages are listed here:
    http://www.ddj.com/topics/pythonurl/
    http://purl.org/thecliff/python/url.html (dormant)
    or
    http://groups.google.com/groups?oi=djq&as_q=+Python-URL!&as_ugroup=comp.lang.python


    Suggestions/corrections for next week's posting are always welcome.
    E-mail to <> should get through.

    To receive a new issue of this posting in e-mail each Monday morning
    (approximately), ask <> to subscribe. Mention
    "Python-URL!".


    -- The Python-URL! Team--

    Dr. Dobb's Journal (http://www.ddj.com) is pleased to participate in and
    sponsor the "Python-URL!" project.
     
    Cameron Laird, Dec 30, 2004
    #1

  2. Cameron Laird <> wrote:
    ...
    > Yippee! The martellibot promises to explain Unicode for Pythoneers.
    > http://groups-beta.google.com/group/comp.lang.python/msg/6015a5a05c206712


    Uh -- _did_ I? Eeep... I guess I did... mostly, I was pointing to
    Holger Krekel's very nice recipe (not sure he posted it to the site as
    well as submitting it for the printed edition, but, lobby _HIM_ about
    that;-).


    Alex
     
    Alex Martelli, Dec 31, 2004
    #2

  3. On Fri, Dec 31, 2004 at 19:18 +0100, Alex Martelli wrote:
    > Cameron Laird <> wrote:
    > ...
    > > Yippee! The martellibot promises to explain Unicode for Pythoneers.
    > > http://groups-beta.google.com/group/comp.lang.python/msg/6015a5a05c206712

    >
    > Uh -- _did_ I? Eeep... I guess I did... mostly, I was pointing to
    > Holger Krekel's very nice recipe (not sure he posted it to the site as
    > well as submitting it for the printed edition, but, lobby _HIM_ about
    > that;-).


    FWIW, i added the recipe back to the online cookbook. It's not perfectly
    formatted but still useful, i hope.

    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/361742

    cheers,

    holger

    P.S: happy new year.
     
    holger krekel, Jan 4, 2005
    #3
  4. Cameron Laird

    Guest

    Holger:

    > FWIW, i added the recipe back to the online cookbook. It's not perfectly
    > formatted but still useful, i hope.
    >
    > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/361742


    Uhm... on my system I get:

    >>> german_ae = unicode('\xc3\xa4', 'utf8')
    >>> print german_ae # dunno if it will appear right on Google groups

    ä

    >>> german_ae.decode('latin1')

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
    position 0: ordinal not in range(128)
    ?? What's wrong?

    Michele Simionato
     
    , Jan 4, 2005
    #4
  5. On Tue, 04 Jan 2005 05:43:32 -0800, michele.simionato wrote:

    > Holger:
    >
    >> FWIW, i added the recipe back to the online cookbook. It's not perfectly
    >> formatted but still useful, i hope.
    >>
    >> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/361742

    >
    > Uhm... on my system I get:
    >
    >>>> german_ae = unicode('\xc3\xa4', 'utf8')
    >>>> print german_ae # dunno if it will appear right on Google groups

    > ä
    >
    >>>> german_ae.decode('latin1')

    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
    > position 0: ordinal not in range(128)
    > ?? What's wrong?


    I'd rather use german_ae.encode('latin1')
                             ^^^^^^

    which returns '\xe4'.
    >
    > Michele Simionato
     
    Stephan Diehl, Jan 4, 2005
    #5
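
The correction above can be sketched in Python 3 terms. This is an illustration under an assumption, not the recipe's own code: the thread uses Python 2, whereas here `str` stands in for the old `unicode` type and `bytes` for the old 8-bit `str`. The variable names are illustrative only.

```python
# Python 3 sketch (assumption: the thread itself is about Python 2).
utf8_bytes = b"\xc3\xa4"                     # UTF-8 encoding of U+00E4 ("ä")

# Decoding goes from bytes to text.
german_ae = utf8_bytes.decode("utf-8")

# Encoding goes from text to bytes, here in the latin-1 encoding.
latin1_bytes = german_ae.encode("latin-1")

print(repr(german_ae))
print(latin1_bytes)
```

The same character ends up as two bytes in UTF-8 and one byte in latin-1; only the encoded form differs, not the text.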
  6. Cameron Laird

    Guest

    Stephan:

    > I'd rather use german_ae.encode('latin1')
    >                          ^^^^^^
    > which returns '\xe4'.


    uhm ... then there is a misprint in the discussion of the recipe;
    BTW what's the difference between .encode and .decode ?
    (yes, I have been living in happy ASCII-land until now ... ;)
    I should probably ask for a unicode primer, I have found the
    one by Marc André Lemburg
    http://www.reportlab.com/i18n/python_unicode_tutorial.html
    and I am reading it right now.


    Michele Simionato
     
    , Jan 4, 2005
    #6
  7. Cameron Laird

    Aahz Guest

    Unicode universe (was Re: Dr. Dobb's Python-URL! - weekly Python news and links (Dec 30))

    In article <>,
    <> wrote:
    >
    >BTW what's the difference between .encode and .decode ?
    >(yes, I have been living in happy ASCII-land until now ... ;)


    Here's the stark simple recipe: when you use Unicode, you *MUST* switch
    to a Unicode-centric view of the universe. Therefore you encode *FROM*
    Unicode and you decode *TO* Unicode. Period. It's similar to the way
    floating point contaminates ints.
    --
    Aahz () <*> http://www.pythoncraft.com/

    "19. A language that doesn't affect the way you think about programming,
    is not worth knowing." --Alan Perlis
     
    Aahz, Jan 4, 2005
    #7
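
A minimal sketch of the rule of thumb above, assuming Python 3, where the two directions are enforced by the types themselves: text (`str`) only has `encode`, and byte strings (`bytes`) only have `decode`. The Python 2 objects discussed in this thread carry both methods, which is the source of the confusion that follows.

```python
# Python 3 sketch (assumption: str is Unicode text, bytes is the encoded form).
text = "\xe4"                        # the character "ä" as text

encoded = text.encode("utf-8")       # encode FROM Unicode: str -> bytes
decoded = encoded.decode("utf-8")    # decode TO Unicode:   bytes -> str

# In Python 3 the "wrong direction" methods simply do not exist.
text_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(encoded, "encode")

print(type(encoded).__name__, type(decoded).__name__)
print(text_has_decode, bytes_has_encode)
```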
  8. michele> BTW what's the difference between .encode and .decode ?

    I started to answer, then got confused when I read the docstrings for
    unicode.encode and unicode.decode:

    >>> help(u"\xe4".decode)

    Help on built-in function decode:

    decode(...)
    S.decode([encoding[,errors]]) -> string or unicode

    Decodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registerd with codecs.register_error that is
    able to handle UnicodeDecodeErrors.

    >>> help(u"\xe4".encode)

    Help on built-in function encode:

    encode(...)
    S.encode([encoding[,errors]]) -> string or unicode

    Encodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    'xmlcharrefreplace' as well as any other name registered with
    codecs.register_error that can handle UnicodeEncodeErrors.

    It probably makes sense to one who knows, but for the feeble-minded like
    myself, they seem about the same.

    I'd be happy to add a couple examples to the string methods section of the
    docs if someone will produce something simple that makes the distinction
    clear.

    Skip
     
    Skip Montanaro, Jan 4, 2005
    #8
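
The two docstrings differ mainly in the direction of the conversion and in the error each raises. A small sketch of the error handlers they mention, assuming Python 3 (`str` for text, `bytes` for encoded data); the variable names are illustrative only.

```python
# Python 3 sketch of the 'errors' argument described in the docstrings.
text = "\xe4"    # "ä" cannot be represented in ASCII
raw = b"\xe4"    # a byte outside the ASCII range

# Encoding (text -> bytes) with the different error handlers.
results = {
    "ignore": text.encode("ascii", "ignore"),
    "replace": text.encode("ascii", "replace"),
    "xmlcharrefreplace": text.encode("ascii", "xmlcharrefreplace"),
}

# The default handler, 'strict', raises instead of substituting.
strict_encode_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_encode_error = exc

# Decoding (bytes -> text) fails with the mirror-image exception.
strict_decode_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    strict_decode_error = exc

# 'replace' on decode substitutes U+FFFD, the replacement character.
decoded_replace = raw.decode("ascii", "replace")

print(results)
```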
  9. Cameron Laird

    Guest

    Yep, I did the same and got confused :-/

    Michele
     
    , Jan 4, 2005
    #9
  10. Re: Unicode universe (was Re: Dr. Dobb's Python-URL! - weekly Python news and links (Dec 30))

    aahz> Here's the stark simple recipe: when you use Unicode, you *MUST*
    aahz> switch to a Unicode-centric view of the universe. Therefore you
    aahz> encode *FROM* Unicode and you decode *TO* Unicode. Period. It's
    aahz> similar to the way floating point contaminates ints.

    That's what I do in my code. Why do Unicode objects have a decode method
    then?

    Skip
     
    Skip Montanaro, Jan 4, 2005
    #10
  11. Skip Montanaro <> writes:

    > michele> BTW what's the difference between .encode and .decode ?
    >
    > I started to answer, then got confused when I read the docstrings for
    > unicode.encode and unicode.decode:
    >
    > >>> help(u"\xe4".decode)

    > Help on built-in function decode:
    >
    > decode(...)
    > S.decode([encoding[,errors]]) -> string or unicode
    >
    > Decodes S using the codec registered for encoding. encoding defaults
    > to the default encoding. errors may be given to set a different error
    > handling scheme. Default is 'strict' meaning that encoding errors raise
    > a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    > as well as any other name registerd with codecs.register_error that is
    > able to handle UnicodeDecodeErrors.
    >
    > >>> help(u"\xe4".encode)

    > Help on built-in function encode:
    >
    > encode(...)
    > S.encode([encoding[,errors]]) -> string or unicode
    >
    > Encodes S using the codec registered for encoding. encoding defaults
    > to the default encoding. errors may be given to set a different error
    > handling scheme. Default is 'strict' meaning that encoding errors raise
    > a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    > 'xmlcharrefreplace' as well as any other name registered with
    > codecs.register_error that can handle UnicodeEncodeErrors.
    >
    > It probably makes sense to one who knows, but for the feeble-minded like
    > myself, they seem about the same.


    It seems also the error messages aren't too helpful:

    >>> "ä".encode("latin-1")

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)
    >>>


    Hm, why does the 'encode' call complain about decoding?

    Why do string objects have an encode method, and why do unicode objects
    have a decode method, and what does this error message want to tell me:

    >>> u"ä".decode("latin-1")

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
    >>>


    Thomas
     
    Thomas Heller, Jan 4, 2005
    #11
  12. Cameron Laird

    Max M Guest

    wrote:

    > uhm ... then there is a misprint in the discussion of the recipe;
    > BTW what's the difference between .encode and .decode ?
    > (yes, I have been living in happy ASCII-land until now ... ;)



    # -*- coding: latin-1 -*-


    # here i make a unicode string
    unicode_file = u'Some danish characters æøå' #.encode('hex')
    print type(unicode_file)
    print repr(unicode_file)
    print ''


    # I can convert this unicode string to an ordinary string.
    # Because æøå are in the latin-1 charmap it can be understood as
    # a latin-1 string.
    # The æøå characters even have the same values in both.
    latin1_file = unicode_file.encode('latin-1')
    print type(latin1_file)
    print repr(latin1_file)
    print latin1_file
    print ''


    ## I can *not* convert it to ascii
    #ascii_file = unicode_file.encode('ascii')
    #print ''


    # I can also convert it to utf-8
    utf8_file = unicode_file.encode('utf-8')
    print type(utf8_file)
    print repr(utf8_file)
    print utf8_file
    print ''


    # utf8_file is now an ordinary string. Again, it can help to think of it
    # as a file format.
    #
    # I can convert this file/string back to unicode again by using the
    # decode method. It tells python to decode this "file format" as utf-8
    # when it loads it into a unicode string. And we are back where we started.

    unicode_file = utf8_file.decode('utf-8')
    print type(unicode_file)
    print repr(unicode_file)
    print ''


    # So basically you can encode a unicode string into a special
    # string/file format, and you can decode a string from a special
    # string/file format back into unicode.


    ###################################


    <type 'unicode'>
    u'Some danish characters \xe6\xf8\xe5'

    <type 'str'>
    'Some danish characters \xe6\xf8\xe5'
    Some danish characters æøå

    <type 'str'>
    'Some danish characters \xc3\xa6\xc3\xb8\xc3\xa5'
    Some danish characters æøå

    <type 'unicode'>
    u'Some danish characters \xe6\xf8\xe5'





    --

    hilsen/regards Max M, Denmark

    http://www.mxm.dk/
    IT's Mad Science
     
    Max M, Jan 4, 2005
    #12
  13. Cameron Laird

    Max M Guest

    Thomas Heller wrote:

    > It seems also the error messages aren't too helpful:
    >
    >>>>"ä".encode("latin-1")

    >
    > Traceback (most recent call last):
    > File "<stdin>", line 1, in ?
    > UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)
    >
    > Hm, why does the 'encode' call complain about decoding?


    Because it tries to print it out to your console and fails. While writing
    to the console it tries to convert to ascii.

    Besides, you should write:

    u"ä".encode("latin-1") to get a latin-1 encoded string.


    --

    hilsen/regards Max M, Denmark

    http://www.mxm.dk/
    IT's Mad Science
     
    Max M, Jan 4, 2005
    #13
  14. Max M <> writes:

    > Thomas Heller wrote:
    >
    >> It seems also the error messages aren't too helpful:
    >>
    >>>>>"ä".encode("latin-1")

    >> Traceback (most recent call last):
    >> File "<stdin>", line 1, in ?
    >> UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)
    >> Hm, why does the 'encode' call complain about decoding?

    >
    > Because it tries to print it out to your console and fail. While
    > writing to the console it tries to convert to ascii.


    Wrong, same error without trying to print something:

    >>> x = "ä".encode("latin-1")

    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)
    >>>


    >
    > Beside, you should write:
    >
    > u"ä".encode("latin-1") to get a latin-1 encoded string.


    I know, but the question was: why does a unicode string have an encode
    method, and why does it complain about decoding (which has already been
    answered in the meantime).

    Thomas
     
    Thomas Heller, Jan 4, 2005
    #14
  15. Re: Unicode universe (was Re: Dr. Dobb's Python-URL! - weekly Python news and links (Dec 30))

    Skip Montanaro wrote:
    > aahz> Here's the stark simple recipe: when you use Unicode, you *MUST*
    > aahz> switch to a Unicode-centric view of the universe. Therefore you
    > aahz> encode *FROM* Unicode and you decode *TO* Unicode. Period. It's
    > aahz> similar to the way floating point contaminates ints.
    >
    > That's what I do in my code. Why do Unicode objects have a decode method
    > then?


    Because MAL implemented it! >;->

    It first encodes in the default encoding and then decodes the result
    with the specified encoding, so if u is a unicode object
    u.decode("utf-16")
    is an abbreviation of
    u.encode().decode("utf-16")

    In the same way str has an encode method, so
    s.encode("utf-16")
    is an abbreviation of
    s.decode().encode("utf-16")

    Bye,
    Walter Dörwald
     
    Walter Dörwald, Jan 4, 2005
    #15
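
The two-step behaviour described above also explains why an `encode` call can fail with a *decode* error. A minimal sketch, assuming Python 3 and using a hypothetical helper, `py2_style_str_encode`, that spells out the implicit step explicitly. It mimics the description in this post; it is not how Python 3 itself behaves, and the ASCII default is an assumption taken from the tracebacks quoted in this thread.

```python
# Python 3 sketch. py2_style_str_encode is a hypothetical helper, not a
# real library function; it imitates the implicit conversion described above.
def py2_style_str_encode(raw, encoding, default="ascii"):
    """Decode a byte string with the default codec, then encode the result."""
    return raw.decode(default).encode(encoding)

# Pure ASCII input survives the hidden decode step.
ok = py2_style_str_encode(b"abc", "utf-16-le")
print(ok)

# A non-ASCII byte fails in the hidden *decode* step, before any encoding
# to latin-1 is attempted, hence the UnicodeDecodeError.
failure = None
try:
    py2_style_str_encode(b"\x84", "latin-1")
except UnicodeDecodeError as exc:
    failure = exc

print(type(failure).__name__, failure.encoding)
```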
  16. Cameron Laird

    Carl Banks Guest

    Skip Montanaro wrote:
    > I started to answer, then got confused when I read the docstrings for
    > unicode.encode and unicode.decode:

    [snip]


    It certainly is confusing. When I first started Unicoding, I pretty
    much stuck to Aahz's rule of thumb, without understanding these details,
    and still do that. But now I do understand it.

    Although encodings are bijective (i.e., equivalent one-to-one
    mappings), they are not apolar. One side of the encoding is
    arbitrarily labeled the encoded form; the other is arbitrarily labeled
    the decoded form. (This is not a relativistic system, here.) The
    encode method maps from the decoded to the encoded set. The decode
    method does the inverse.

    That's it. The only real technical difference between encode and
    decode is the direction they map in.

    By convention, the decoded form is a Python unicode string, and the
    encoded form is the byte string.

    I believe it's technically possible (but very rude) to write an
    "inverse encoding", where the "encoded" form is a unicode string, and
    the decoded form is UTF-8 byte string.

    Also, note that there are some encodings unrelated to Unicode. For
    example, try this:

    >>> "abcd".encode("base64")
    This is an encoding between two byte strings.


    --
    CARL BANKS
     
    Carl Banks, Jan 5, 2005
    #16
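
A sketch of a bytes-to-bytes codec such as the base64 example above, assuming Python 3. There, `bytes` objects have no `encode` method, so the sketch goes through the standard `codecs.encode` and `codecs.decode` functions instead of the method call shown in the post.

```python
import codecs

# Python 3 sketch: base64 maps a byte string to another byte string,
# so no Unicode text is involved in either direction.
encoded = codecs.encode(b"abcd", "base64")
decoded = codecs.decode(encoded, "base64")

print(encoded)
print(decoded)
```

Both sides of this codec are byte strings; "encoded" and "decoded" are only labels for the two directions of the mapping.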
  17. Cameron Laird

    Max M Guest

    Carl Banks wrote:

    > Also, note that there are some encodings unrelated to Unicode. For
    > example, try this:
    >
    > >>> "abcd".encode("base64")
    > This is an encoding between two byte strings.


    Yes. This can be especially nice when you need to use restricted charsets.

    I needed to use unicode objects as Zope ids. But Zope only accepts a
    subset of ascii as ids.

    So I used:


    hex_id = u'INBOX'.encode('utf-8').encode('hex')
    >>494e424f58


    And I can get the unicode representation back with:

    unicode_id = hex_id.decode('hex').decode('utf-8')
    >>u'INBOX'


    In that case hex_id.decode('hex') doesn't return a unicode, but a utf-8
    encoded string.

    --

    hilsen/regards Max M, Denmark

    http://www.mxm.dk/
    IT's Mad Science
     
    Max M, Jan 5, 2005
    #17
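
The round trip above can be sketched in Python 3, assuming `bytes.hex()` and `bytes.fromhex()` as stand-ins for the Python 2 `'hex'` codec. The helper names `to_hex_id` and `from_hex_id` are hypothetical and only serve the illustration.

```python
# Python 3 sketch; to_hex_id and from_hex_id are hypothetical helper names.
def to_hex_id(name):
    """Turn text into an ASCII-only id: text -> UTF-8 bytes -> hex digits."""
    return name.encode("utf-8").hex()

def from_hex_id(hex_id):
    """Reverse the mapping: hex digits -> UTF-8 bytes -> text."""
    return bytes.fromhex(hex_id).decode("utf-8")

hex_id = to_hex_id("INBOX")
print(hex_id)
print(from_hex_id(hex_id))
```

The intermediate value returned by `bytes.fromhex` is a UTF-8 encoded byte string, which is why the final `decode("utf-8")` step is still needed.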
