[RELEASED] Python 3.1 final

Benjamin Peterson · Jun 27, 2009

On behalf of the Python development team, I'm thrilled to announce the first
production release of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of the features and
changes that Python 3.0 introduced. For example, the new I/O system has been
rewritten in C for speed. File system APIs that use unicode strings now handle
paths with undecodable bytes in them. Other features include an ordered
dictionary implementation, a condensed syntax for nested with statements, and
support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1,
see http://doc.python.org/3.1/whatsnew/3.1.html or Misc/NEWS in the Python
distribution.
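
Two of the features listed above, the ordered dictionary and the condensed syntax for nested with statements, can be illustrated with a minimal sketch. The file names and sample data here are invented for illustration and are not part of the release notes.

```python
import os
import shutil
import tempfile
from collections import OrderedDict

# OrderedDict remembers the order in which keys were first inserted.
colors = OrderedDict()
colors["red"] = 1
colors["green"] = 2
colors["blue"] = 3
key_order = list(colors)

# Condensed "with": several context managers in one statement,
# instead of one nested block per manager.
tmpdir = tempfile.mkdtemp()
try:
    src = os.path.join(tmpdir, "src.txt")  # hypothetical file names
    dst = os.path.join(tmpdir, "dst.txt")
    with open(src, "w") as f:
        f.write("hello\n")
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(fin.read().upper())
    with open(dst) as f:
        copied = f.read()
finally:
    shutil.rmtree(tmpdir)
```

Both files in the combined with statement are closed when the block exits, just as they would be with two nested blocks.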

To download Python 3.1 visit:

http://www.python.org/download/releases/3.1/

The 3.1 documentation can be found at:

http://docs.python.org/3.1

Bugs can always be reported to:

http://bugs.python.org

Enjoy!

Nobody · Jun 28, 2009

Python 3.1 focuses on the stabilization and optimization of the features and
changes that Python 3.0 introduced. For example, the new I/O system has been
rewritten in C for speed. File system APIs that use unicode strings now
handle paths with undecodable bytes in them.

That's a significant improvement. It still decodes os.environ and sys.argv
before you have a chance to call sys.setfilesystemencoding(), but it
appears to be recoverable (with some effort; I can't find any way to re-do
the encoding without manually replacing the surrogates).

However, sys.std{in,out,err} are still created as text streams, and AFAICT
there's nothing you can do about this from within your code.

All in all, Python 3.x still has a long way to go before it will be
suitable for real-world use.

Martin v. Löwis · Jun 28, 2009

That's a significant improvement. It still decodes os.environ and sys.argv
before you have a chance to call sys.setfilesystemencoding(), but it
appears to be recoverable (with some effort; I can't find any way to re-do
the encoding without manually replacing the surrogates).

See PEP 383.

However, sys.std{in,out,err} are still created as text streams, and AFAICT
there's nothing you can do about this from within your code.

That's intentional, and not going to change. You can access the
underlying byte streams if you want to, as you could already in 3.0.

Regards,
Martin

P.S. Please identify yourself on this newsgroup.

Benjamin Peterson · Jun 28, 2009

Nobody said:
All in all, Python 3.x still has a long way to go before it will be
suitable for real-world use.

Such as?

Paul Moore · Jun 28, 2009

2009/6/28 Martin v. Löwis said:
That's intentional, and not going to change. You can access the
underlying byte streams if you want to, as you could already in 3.0.

I had a quick look at the documentation, and couldn't see how to do
this. It's the first time I'd read the new IO module documentation, so
I probably missed something obvious. Could you explain how I get the
byte stream underlying sys.stdin? (That should give me enough to find
what I was misunderstanding in the docs).

Thanks,
Paul.

Piet van Oostrum · Jun 28, 2009

PM> I had a quick look at the documentation, and couldn't see how to do
PM> this. It's the first time I'd read the new IO module documentation, so
PM> I probably missed something obvious. Could you explain how I get the
PM> byte stream underlying sys.stdin? (That should give me enough to find
PM> what I was misunderstanding in the docs).

http://docs.python.org/3.1/library/sys.html#sys.stdin
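
For readers who do not want to follow the link, here is a minimal sketch of the idea: the standard text streams expose their underlying binary stream through a "buffer" attribute. The helper name read_all_bytes is invented for this example, and an in-memory stream stands in for sys.stdin so the snippet runs without piped input.

```python
import io

def read_all_bytes(text_stream):
    """Read the remaining data as bytes, bypassing the text layer's decoding."""
    return text_stream.buffer.read()

# Stand-in for sys.stdin: a text wrapper around an in-memory byte stream.
# The payload is deliberately not valid UTF-8.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"caf\xe9\n"), encoding="utf-8")
data = read_all_bytes(fake_stdin)
```

With real input the call would be read_all_bytes(sys.stdin). This assumes sys.stdin has not been replaced by an object that lacks a "buffer" attribute.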

Nobody · Jun 28, 2009

Such as?

Such as not trying to shoe-horn every byte string it encounters into
Unicode. Some of them really are *just* byte strings.

Benjamin Peterson · Jun 28, 2009

Nobody said:
Such as not trying to shoe-horn every byte string it encounters into
Unicode. Some of them really are *just* byte strings.

You're certainly allowed to convert them back to byte strings if you want.

Terry Reedy · Jun 28, 2009

Nobody said:
Such as not trying to shoe-horn every byte string it encounters into
Unicode. Some of them really are *just* byte strings.

Let's ignore the disinformation. So false it is hardly worth refuting.

Benjamin Peterson · Jun 28, 2009

Paul Moore said:
The "buffer" attribute doesn't seem to be documented in the docs for
the io module. I'm guessing that the TextIOBase class should have a
note that you get at the buffer through the "buffer" attribute?

Good point. I've now documented it, and the "raw" attribute of BufferedIOBase.
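
As a rough illustration of how the two attributes relate, the sketch below opens a temporary file in text mode and walks down the layers. It assumes an ordinary buffered file opened with open(); other stream types may not have all three layers.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as text_layer:
        buffered_layer = text_layer.buffer   # binary, buffered stream
        raw_layer = buffered_layer.raw       # unbuffered stream under the buffer
        is_text = isinstance(text_layer, io.TextIOBase)
        is_buffered = isinstance(buffered_layer, io.BufferedIOBase)
        is_raw = isinstance(raw_layer, io.RawIOBase)
finally:
    os.remove(path)
```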

Aahz · Jun 28, 2009

You're certainly allowed to convert them back to byte strings if you want.

Yes, but do you get back the original byte strings? Maybe I'm missing
something, but my impression is that this is still an issue for the email
module as well as command-line arguments and environment variables.

Benjamin Peterson · Jun 28, 2009

Aahz said:
Yes, but do you get back the original byte strings? Maybe I'm missing
something, but my impression is that this is still an issue for the email
module as well as command-line arguments and environment variables.

The email module is, yes, broken. You can recover the bytestrings of
command-line arguments and environment variables.

Nobody · Jun 28, 2009

The email module is, yes, broken. You can recover the bytestrings of
command-line arguments and environment variables.

1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

Most of the issues can be worked around by calling
sys.setfilesystemencoding('iso-8859-1') at the start of the program, but
sys.argv and os.environ have already been converted by this point.

Nobody · Jun 28, 2009

Let's ignore the disinformation.

Translation: let's ignore anything which falsifies the assumptions.

So false it is hardly worth refuting.

Your copy of Trolling by Numbers must be getting pretty dog-eared by now.

Benjamin Peterson · Jun 28, 2009

Nobody said:
1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

import sys

fs_encoding = sys.getfilesystemencoding()
bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]
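
To see why this recovers the original bytes, here is a small self-contained sketch of the PEP 383 round trip. The file name and the fixed "utf-8" encoding are made up for the example. A real program would use sys.getfilesystemencoding() as above.

```python
def recover_bytes(text, encoding="utf-8"):
    """Turn a str produced with 'surrogateescape' back into the original bytes."""
    return text.encode(encoding, "surrogateescape")

# A byte string that is not valid UTF-8 (hypothetical file name).
original = b"report-\xe4\xf6.txt"

# Decoding maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
as_text = original.decode("utf-8", "surrogateescape")

# Encoding with the same handler maps the surrogates back to the bytes.
recovered = recover_bytes(as_text)
```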

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

What's a non-invertible encoding? I can't find a reference to the term.

Hallvard B Furuseth · Jun 28, 2009

Benjamin said:
Nobody said:

On Sun, 28 Jun 2009 19:21:49 +0000, Benjamin Peterson wrote:
1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

fs_encoding = sys.getfilesystemencoding()
bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

What's a non-invertible encoding? I can't find a reference to the term.

Different ISO-2022 strings can map to the same Unicode string.
Thus you can convert back to _some_ ISO-2022 string, but it won't
necessarily match the original.
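
A small sketch of that effect, assuming Python's "iso2022_jp" codec accepts a redundant escape sequence that selects ASCII when ASCII is already active:

```python
CODEC = "iso2022_jp"

plain = b"abc"
redundant = b"\x1b(Babc"  # ESC ( B selects ASCII, which is already the initial state

# Two different byte strings decode to the same text...
same_text = plain.decode(CODEC) == redundant.decode(CODEC)

# ...so re-encoding that text cannot reproduce both of them.
re_encoded = redundant.decode(CODEC).encode(CODEC)
lossy = re_encoded != redundant
```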

Martin v. Löwis · Jun 28, 2009

2. How do you do this for non-invertible encodings (e.g. ISO-2022)?

ISO-2022 cannot be used as a system encoding.

Please do read the responses I write, and please do identify yourself.

Regards,
Martin

Gerhard Häring · Jun 28, 2009

Scott said:
Fortunately, I have assiduously avoided the real world, and am happy to
embrace the world from our 'bot overlords.

Congratulations on another release from the hydra-like world of
multi-head development.

+1 QOTW

-- Gerhard

Nobody · Jun 29, 2009

1. Does Python offer any assistance in doing so, or do you have to
manually convert the surrogates which are generated for unrecognised bytes?

fs_encoding = sys.getfilesystemencoding()
bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]

This results in an internal error:

"\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

[FWIW, the error corresponds to _PyBytes_Resize, which has a
cautionary comment almost as large as the code.]

The documentation gives the impression that "surrogateescape" is only
meaningful for decoding.

What's a non-invertible encoding? I can't find a reference to the term.

One where different inputs can produce the same output.

Nobody · Jun 29, 2009

See PEP 383.

Okay, that's useful, except that it may have some bugs:

r = "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

Trying a few random test cases suggests that the ratio of valid to invalid
bytes has an effect. Strings which consist mostly of invalid bytes trigger
the error, those which are mostly valid don't.
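
One way to probe that observation systematically is a small sweep over mixes of valid and invalid bytes. This is only a sketch: the helper name round_trips and the choice of the "ascii" codec (where every byte at or above 0x80 is undecodable) are assumptions made for the example. On an interpreter without the bug, every combination is expected to round-trip.

```python
def round_trips(n_valid, n_invalid, encoding="ascii"):
    """Decode then re-encode with 'surrogateescape' and compare with the input."""
    data = b"a" * n_valid + b"\xe4" * n_invalid
    text = data.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape") == data

# Collect every (valid, invalid) mix that fails to round-trip.
failures = [
    (n_valid, n_invalid)
    for n_valid in range(9)
    for n_invalid in range(9)
    if not round_trips(n_valid, n_invalid)
]

# The exact expression from the report above, for comparison.
reported_case = "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
```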

The error corresponds to _PyBytes_Resize(), which has the following
words of caution in a preceding comment:

/* The following function breaks the notion that strings are immutable:
   it changes the size of a string. We get away with this only if there
   is only one module referencing the object. You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently. In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned. Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

Assuming that this gets fixed, it should make most of the problems with
3.0 solvable. OTOH, it wouldn't have killed them to have added e.g.
sys.argv_bytes and os.environ_bytes.

That's intentional, and not going to change. You can access the
underlying byte streams if you want to, as you could already in 3.0.

Okay, I've since been pointed to the relevant information (I was looking
under "File Objects"; I didn't think to look at "sys").
