Dr. Dobb's Python-URL! - weekly Python news and links (Dec 30)

Cameron Laird

QOTW: "I found the discussion of unicode, in any python book I have,
insufficient." -- Thomas Heller

"If you develop on a Mac, ... Objective-C could come in handy. . . .
PyObjC makes mixing the two languages dead easy and more convenient than
indoor plumbing." -- Robert Kern


Among other activities, the PSF aggregates donors with dollars
destined to do good Python works, and developers expert in
obscure corners of Pythonia.
http://groups-beta.google.com/group/comp.lang.python.announce/browse_thread/thread/705bfe05419aa0b3
http://groups-beta.google.com/group/comp.lang.python.announce/browse_thread/thread/1122f3e14752ce5/

Yippee! The martellibot promises to explain Unicode for Pythoneers.
http://groups-beta.google.com/group/comp.lang.python/msg/6015a5a05c206712

The glorious SciPy project supports *multiple* worthwhile Wikis.
http://www.scipy.org/wikis

Good style in Python does not generally include "in-place"
operations on lists. Several cleaner idioms are possible.
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/c94559f53d25474e

Assume you're comfortable with tuples' semantics, immutability,
and so on. Do you correctly understand the basics of their
syntax, though? This is another opportunity to think about
Unicode, by the way.
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/990049d7adb1bcce

Robert Kern, Paul Rubin, Mike Meyer, Alex Martelli, and others
provide disproportionately high-quality advice (and tangents!)
on the subject of languages which complement Python.
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/bbc1c6d9d87049b6


========================================================================
Everything Python-related you want is probably one or two clicks away in
these pages:

Python.org's Python Language Website is the traditional
center of Pythonia
http://www.python.org
Notice especially the master FAQ
http://www.python.org/doc/FAQ.html

PythonWare complements the digest you're reading with the
marvelous daily python url
http://www.pythonware.com/daily
Mygale is a news-gathering webcrawler that specializes in (new)
World-Wide Web articles related to Python.
http://www.awaretek.com/nowak/mygale.html
While cosmetically similar, Mygale and the Daily Python-URL
are utterly different in their technologies and generally in
their results.

comp.lang.python.announce announces new Python software. Be
sure to scan this newsgroup weekly.
http://groups.google.com/groups?oi=djq&as_ugroup=comp.lang.python.announce

Brett Cannon continues the marvelous tradition established by
Andrew Kuchling and Michael Hudson of intelligently summarizing
action on the python-dev mailing list once every other week.
http://www.python.org/dev/summary/

The Python Package Index catalogues packages.
http://www.python.org/pypi/

The somewhat older Vaults of Parnassus ambitiously collects references
to all sorts of Python resources.
http://www.vex.net/~x/parnassus/

Much of Python's real work takes place on Special-Interest Group
mailing lists
http://www.python.org/sigs/

The Python Business Forum "further the interests of companies
that base their business on ... Python."
http://www.python-in-business.org

Python Success Stories--from air-traffic control to on-line
match-making--can inspire you or decision-makers to whom you're
subject with a vision of what the language makes practical.
http://www.pythonology.com/success

The Python Software Foundation (PSF) has replaced the Python
Consortium as an independent nexus of activity. It has official
responsibility for Python's development and maintenance.
http://www.python.org/psf/
Among the ways you can support PSF is with a donation.
http://www.python.org/psf/donate.html

Kurt B. Kaiser publishes a weekly report on faults and patches.
http://www.google.com/groups?as_usubject=weekly python patch

Cetus collects Python hyperlinks.
http://www.cetus-links.org/oo_python.html

Python FAQTS
http://python.faqts.com/

The Cookbook is a collaborative effort to capture useful and
interesting recipes.
http://aspn.activestate.com/ASPN/Cookbook/Python

Among several Python-oriented RSS/RDF feeds available are
http://www.python.org/channews.rdf
http://bootleg-rss.g-blog.net/pythonware_com_daily.pcgi
http://python.de/backend.php
For more, see
http://www.syndic8.com/feedlist.php?ShowMatch=python&ShowStatus=all

The old Python "To-Do List" now lives principally in a
SourceForge reincarnation.
http://sourceforge.net/tracker/?atid=355470&group_id=5470&func=browse
http://python.sourceforge.net/peps/pep-0042.html

The online Python Journal is posted at pythonjournal.cognizor.com.
(e-mail address removed) and (e-mail address removed)
welcome submission of material that helps people's understanding
of Python use, and offer Web presentation of your work.

del.icio.us presents an intriguing approach to reference commentary.
It already aggregates quite a bit of Python intelligence.
http://del.icio.us/tag/python

*Py: the Journal of the Python Language*
http://www.pyzine.com

Archive probing tricks of the trade:
http://groups.google.com/groups?oi=djq&as_ugroup=comp.lang.python&num=100
http://groups.google.com/groups?meta=site=groups&group=comp.lang.python.*

Previous - (U)se the (R)esource, (L)uke! - messages are listed here:
http://www.ddj.com/topics/pythonurl/
http://purl.org/thecliff/python/url.html (dormant)
or
http://groups.google.com/groups?oi=djq&as_q=+Python-URL!&as_ugroup=comp.lang.python


Suggestions/corrections for next week's posting are always welcome.
E-mail to <[email protected]> should get through.

To receive a new issue of this posting in e-mail each Monday morning
(approximately), ask <[email protected]> to subscribe. Mention
"Python-URL!".


-- The Python-URL! Team--

Dr. Dobb's Journal (http://www.ddj.com) is pleased to participate in and
sponsor the "Python-URL!" project.
 
michele.simionato

Stephan Diehl

Holger:



Uhm... on my system I get:

Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
position 0: ordinal not in range(128)
?? What's wrong?

I'd rather use german_ae.encode('latin1')
                                 ^^^^^^

which returns '\xe4'.
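
For anyone who wants to reproduce the difference, here is a minimal sketch. It assumes Python 3, where `str` is the Unicode type and `bytes` is the byte string (the thread itself uses Python 2's `unicode` and `str`). The name `german_ae` is borrowed from the post; the other names are illustrative.

```python
# Assumption: Python 3, where str is Unicode text and bytes is the byte string.
german_ae = "\xe4"  # the character U+00E4 (a-umlaut)

# ASCII has no code for U+00E4, so a strict encode raises UnicodeEncodeError.
try:
    german_ae.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii failed:", exc.reason)

# Latin-1 maps U+00E4 to the single byte 0xE4.
latin1_bytes = german_ae.encode("latin1")
print(repr(latin1_bytes))  # b'\xe4'
```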
 
michele.simionato

Stephan:
I'd rather use german_ae.encode('latin1')
                                 ^^^^^^
which returns '\xe4'.

uhm ... then there is a misprint in the discussion of the recipe;
BTW what's the difference between .encode and .decode ?
(yes, I have been living in happy ASCII-land until now ... ;)
I should probably ask for a Unicode primer; I have found the
one by Marc André Lemburg
http://www.reportlab.com/i18n/python_unicode_tutorial.html
and I am reading it right now.


Michele Simionato
 
Aahz

BTW what's the difference between .encode and .decode ?
(yes, I have been living in happy ASCII-land until now ... ;)

Here's the stark simple recipe: when you use Unicode, you *MUST* switch
to a Unicode-centric view of the universe. Therefore you encode *FROM*
Unicode and you decode *TO* Unicode. Period. It's similar to the way
floating point contaminates ints.
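
The rule can be sketched as a round trip. This assumes Python 3, where `str` holds Unicode text and `bytes` holds encoded data; the sample text is borrowed from elsewhere in this thread, and the variable names are illustrative only.

```python
# Assumption: Python 3 (str = Unicode text, bytes = encoded byte string).
text = "Some danish characters \xe6\xf8\xe5"

encoded = text.encode("utf-8")     # encode: FROM Unicode text TO bytes
decoded = encoded.decode("utf-8")  # decode: FROM bytes back TO Unicode text

print(type(encoded).__name__, type(decoded).__name__)
print(decoded == text)

# Python 3 builds the rule into the types: str has no decode method
# and bytes has no encode method.
print(hasattr(text, "decode"), hasattr(encoded, "encode"))
```

In Python 2, by contrast, both types carried both methods, which is the source of much of the confusion discussed in this thread.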
 
Skip Montanaro

michele> BTW what's the difference between .encode and .decode ?

I started to answer, then got confused when I read the docstrings for
unicode.encode and unicode.decode:
Help on built-in function decode:

decode(...)
S.decode([encoding[,errors]]) -> string or unicode

Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registerd with codecs.register_error that is
able to handle UnicodeDecodeErrors.
Help on built-in function encode:

encode(...)
S.encode([encoding[,errors]]) -> string or unicode

Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

It probably makes sense to one who knows, but for the feeble-minded like
myself, they seem about the same.

I'd be happy to add a couple examples to the string methods section of the
docs if someone will produce something simple that makes the distinction
clear.
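
One possible illustration of the two docstrings, offered as a sketch rather than as text for the official docs. It assumes Python 3 (`str` for text, `bytes` for byte strings) and uses only the error handler names listed in the docstrings quoted above.

```python
# Assumption: Python 3 (str = Unicode text, bytes = byte string).
text = "\xe4"  # U+00E4 as text
raw = b"\xe4"  # the same character as a Latin-1 byte

# encode: text -> bytes. ASCII cannot represent U+00E4, so the handler decides.
enc_ignore = text.encode("ascii", "ignore")          # drop the character
enc_replace = text.encode("ascii", "replace")        # substitute '?'
enc_xml = text.encode("ascii", "xmlcharrefreplace")  # numeric character reference

# decode: bytes -> text. 0xE4 is not valid ASCII, so the handler decides.
dec_ignore = raw.decode("ascii", "ignore")    # drop the byte
dec_replace = raw.decode("ascii", "replace")  # substitute U+FFFD

print(enc_ignore, enc_replace, enc_xml)
print(repr(dec_ignore), repr(dec_replace))
```

Note that 'xmlcharrefreplace' only makes sense when encoding, which is why it appears in one docstring and not the other.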

Skip
 
Skip Montanaro

aahz> Here's the stark simple recipe: when you use Unicode, you *MUST*
aahz> switch to a Unicode-centric view of the universe. Therefore you
aahz> encode *FROM* Unicode and you decode *TO* Unicode. Period. It's
aahz> similar to the way floating point contaminates ints.

That's what I do in my code. Why do Unicode objects have a decode method
then?

Skip
 
Thomas Heller

Skip Montanaro said:
michele> BTW what's the difference between .encode and .decode ?

I started to answer, then got confused when I read the docstrings for
unicode.encode and unicode.decode:
Help on built-in function decode:

decode(...)
S.decode([encoding[,errors]]) -> string or unicode

Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registerd with codecs.register_error that is
able to handle UnicodeDecodeErrors.
Help on built-in function encode:

encode(...)
S.encode([encoding[,errors]]) -> string or unicode

Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

It probably makes sense to one who knows, but for the feeble-minded like
myself, they seem about the same.

It seems also the error messages aren't too helpful:
Traceback (most recent call last):

Hm, why does the 'encode' call complain about decoding?

Why do string objects have an encode method, and why do unicode objects
have a decode method, and what does this error message want to tell me:
Traceback (most recent call last):

Thomas
 
Max M

uhm ... then there is a misprint in the discussion of the recipe;
BTW what's the difference between .encode and .decode ?
(yes, I have been living in happy ASCII-land until now ... ;)


# -*- coding: latin-1 -*-


# here i make a unicode string
unicode_file = u'Some danish characters æøå' #.encode('hex')
print type(unicode_file)
print repr(unicode_file)
print ''


# I can convert this unicode string to an ordinary string.
# because æøå are in the latin-1 charmap it can be understood as
# a latin-1 string
# the æøå characters even have the same value in both
latin1_file = unicode_file.encode('latin-1')
print type(latin1_file)
print repr(latin1_file)
print latin1_file
print ''


## I can *not* convert it to ascii
#ascii_file = unicode_file.encode('ascii')
#print ''


# I can also convert it to utf-8
utf8_file = unicode_file.encode('utf-8')
print type(utf8_file)
print repr(utf8_file)
print utf8_file
print ''


# utf8_file is now an ordinary string. Again, it can help to think of it
# as a file format.
#
# I can convert this file/string back to unicode again by using the
# decode method. It tells python to decode this "file format" as utf-8
# when it loads it onto a unicode string. And we are back where we started.


unicode_file = utf8_file.decode('utf-8')
print type(unicode_file)
print repr(unicode_file)
print ''


# So basically you can encode a unicode string into a special
# string/file format, and you can decode a string from a special
# string/file format back into unicode.


###################################


<type 'unicode'>
u'Some danish characters \xe6\xf8\xe5'

<type 'str'>
'Some danish characters \xe6\xf8\xe5'
Some danish characters æøå

<type 'str'>
'Some danish characters \xc3\xa6\xc3\xb8\xc3\xa5'
Some danish characters æøå

<type 'unicode'>
u'Some danish characters \xe6\xf8\xe5'





--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
Max M

Thomas said:
It seems also the error messages aren't too helpful:


Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)

Hm, why does the 'encode' call complain about decoding?

Because it tries to print it out to your console and fails. While writing
to the console, it tries to convert to ascii.

Besides, you should write:

u"ä".encode("latin-1") to get a latin-1 encoded string.


--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
Thomas Heller

Max M said:
Because it tries to print it out to your console and fails. While
writing to the console, it tries to convert to ascii.

Wrong, same error without trying to print something:
Traceback (most recent call last):

Besides, you should write:

u"ä".encode("latin-1") to get a latin-1 encoded string.

I know, but the question was: why does a unicode string have an encode
method, and why does it complain about decoding (which has already been
answered in the meantime).

Thomas
 
Walter Dörwald

Skip said:
aahz> Here's the stark simple recipe: when you use Unicode, you *MUST*
aahz> switch to a Unicode-centric view of the universe. Therefore you
aahz> encode *FROM* Unicode and you decode *TO* Unicode. Period. It's
aahz> similar to the way floating point contaminates ints.

That's what I do in my code. Why do Unicode objects have a decode method
then?

Because MAL implemented it! >;->

It first encodes in the default encoding and then decodes the result
with the specified encoding, so if u is a unicode object
u.decode("utf-16")
is an abbreviation of
u.encode().decode("utf-16")

In the same way str has an encode method, so
s.encode("utf-16")
is an abbreviation of
s.decode().encode("utf-16")
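
The hidden first step explains why an `encode` call on a byte string can raise a *decode* error. The sketch below emulates that Python 2 behaviour in Python 3, which has no such shortcut. `py2_style_encode` and `DEFAULT_ENCODING` are hypothetical names made up for this illustration, and 'ascii' is assumed as the default encoding.

```python
# Sketch only: emulates in Python 3 the two-step behaviour described above
# for Python 2. py2_style_encode is a hypothetical helper, not a real API.
DEFAULT_ENCODING = "ascii"  # assumed default encoding, as in the tracebacks


def py2_style_encode(byte_string, encoding):
    """Mimic Python 2's s.encode(encoding) when s is a byte string."""
    text = byte_string.decode(DEFAULT_ENCODING)  # implicit first step
    return text.encode(encoding)                 # the step that was asked for


# Pure-ASCII bytes survive the implicit decode, so nothing looks odd.
ok_result = py2_style_encode(b"abc", "latin-1")
print(ok_result)

# A non-ASCII byte fails in the implicit decode, before any encoding happens.
try:
    py2_style_encode(b"\x84", "latin-1")
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    print(error_name, "raised by an encode call:", exc.reason)
```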

Bye,
Walter Dörwald
 
Carl Banks

Skip said:
I started to answer, then got confused when I read the docstrings for
unicode.encode and unicode.decode:
[snip]


It certainly is confusing. When I first started Unicoding, I pretty
much stuck to Aahz's rule of thumb, without understanding the details,
and still do that. But now I do understand it.

Although encodings are bijective (i.e., equivalent one-to-one
mappings), they are not apolar. One side of the encoding is
arbitrarily labeled the encoded form; the other is arbitrarily labeled
the decoded form. (This is not a relativistic system, here.) The
encode method maps from the decoded to the encoded set. The decode
method does the inverse.

That's it. The only real technical difference between encode and
decode is the direction they map in.

By convention, the decoded form is a Python unicode string, and the
encoded form is the byte string.

I believe it's technically possible (but very rude) to write an
"inverse encoding", where the "encoded" form is a unicode string, and
the decoded form is a UTF-8 byte string.

Also, note that there are some encodings unrelated to Unicode. For
example, try this:

>>> "abcd".encode("base64")
This is an encoding between two byte strings.
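
For comparison, a sketch of the same idea in Python 3 (an assumption; the line above is Python 2). There, byte-to-byte codecs such as base64 are no longer reachable through the string methods and go through the `codecs` module instead.

```python
import codecs

# Assumption: Python 3, where bytes-to-bytes codecs are used via codecs.encode
# and codecs.decode rather than via methods on the string itself.
encoded = codecs.encode(b"abcd", "base64")
decoded = codecs.decode(encoded, "base64")

print(encoded)  # base64 text as bytes, with a trailing newline
print(decoded)
```

Both the input and the output are byte strings; no Unicode text is involved at any point.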
 
Max M

Carl said:
Also, note that there are some encodings unrelated to Unicode. For
example, try this:

>>> "abcd".encode("base64")
This is an encoding between two byte strings.

Yes. This can be especially nice when you need to use restricted charsets.

I needed to use unicode objects as Zope ids. But Zope only accepts a
subset of ascii as ids.

So I used:


hex_id = u'INBOX'.encode('utf-8').encode('hex')

And I can get the unicode representation back with:

unicode_id = hex_id.decode('hex').decode('utf-8')

In that case hex_id.decode('hex') doesn't return a unicode, but a utf-8
encoded string.
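
A sketch of the same round trip in Python 3 (an assumption; the snippet above is Python 2). The 'hex' string codec is replaced here by `bytes.hex()` and `bytes.fromhex()`. The sample id with Danish characters is made up for illustration.

```python
# Assumption: Python 3; bytes.hex() and bytes.fromhex() stand in for the
# Python 2 'hex' codec used above. The id value is an invented example.
original_id = "INBOX \xe6\xf8\xe5"

# Unicode text -> UTF-8 bytes -> hex digits (a restricted, ASCII-only charset).
hex_id = original_id.encode("utf-8").hex()
print(hex_id)

# hex digits -> UTF-8 bytes (still encoded) -> Unicode text.
utf8_bytes = bytes.fromhex(hex_id)
restored_id = utf8_bytes.decode("utf-8")
print(restored_id == original_id)
```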

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
