Article on the future of Python

wxjmfauth

On Wednesday, 26 September 2012 17:54:04 UTC+2, Ian wrote:
Indeed. Here's an interesting article about Unicode handling that
identifies Python 3.3 as one of only four programming languages that
handle Unicode correctly (the other three being Bash, Haskell 98, and
Scheme R6RS).

http://unspecified.wordpress.com/20...e-of-language-level-abstract-unicode-strings/

May I suggest you dive into the TeX documentation (sometimes
not so easy to find quickly).

To my mind it is much better than all these web pages around. The big
plus: you will also understand "characters" as a whole.

jmf
 
Paul Rubin

Chris Angelico said:
When you compare against a wide build, semantics of 3.2 and 3.3 are
identical, and then - and ONLY then - can you sanely compare
performance. And 3.3 stacks up much better.

I'd like to have seen real world benchmarks against a pure UTF-8
implementation. That means O(n) access to the n'th character of a
string which could theoretically slow some programs down terribly, but I
wonder how often that actually matters in ways that can't easily be
worked around.
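
To make the O(n) point concrete, here is a minimal sketch of what "indexing" means on a pure UTF-8 byte string. The function name utf8_char_at is made up for illustration and is not from any library; it simply walks the bytes and counts the ones that start a code point.

```python
def utf8_char_at(data, n):
    """Return the n-th code point (0-based) of UTF-8 bytes by linear scan."""
    if n < 0:
        raise IndexError("negative index not supported in this sketch")
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where the n-th code point begins
    for i, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx are continuation bytes;
        # every other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    # The n-th code point was the last one in the buffer.
    return data[start:].decode("utf-8")


sample = "na\u00efve \u2207 \U0001F600 x"
encoded = sample.encode("utf-8")
print(utf8_char_at(encoded, 6))  # same character as sample[6]
```

Every lookup has to scan from the start, so the cost grows with n, whereas sample[6] on a fixed-width representation is a single offset calculation.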
 
Chris Angelico

I'd like to have seen real world benchmarks against a pure UTF-8
implementation. That means O(n) access to the n'th character of a
string which could theoretically slow some programs down terribly, but I
wonder how often that actually matters in ways that can't easily be
worked around.

That's pretty much what we have with the PHP parts of our web site.
We've decreed that everything should be UTF-8 byte streams (actually,
it took some major campaigning from me to get rid of the underlying
thinking that "binary-safe" and "UTF-8" and "characters" and so on
were all equivalent), but there are very few places where we actually
index strings in PHP. There's a small amount of parsing, but it's all
done by splitting on particular strings - if you search for 0x0A in a
UTF-8 bytestream and split at that index, it's the same as searching
for U+000A in a Unicode string and splitting there - and all of our
structural elements fit inside ASCII. The few times we actually care
about character length (eg limiting user-specified rule names to N
characters), we don't much care about performance, because they're
unusual checks.
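
The splitting claim above is easy to check. What follows is only a small Python sketch (the site in question is PHP, and the sample data here is invented): because UTF-8 never uses bytes below 0x80 inside a multi-byte sequence, splitting the byte stream on 0x0A gives the same pieces as splitting the decoded string on U+000A.

```python
# Invented sample data: ASCII structure, non-ASCII payload.
text = "rule=caf\u00e9\nname=\u2207 operator\nnote=\U0001F600"
data = text.encode("utf-8")

# Split the raw bytes on 0x0A, then decode each piece.
lines_from_bytes = [chunk.decode("utf-8") for chunk in data.split(b"\n")]

# Split the decoded string on U+000A.
lines_from_text = text.split("\n")

print(lines_from_bytes == lines_from_text)

# Byte length and character length differ, which is why a "limit to
# N characters" check has to decode (or count lead bytes) first.
print(len(data), len(text))
```

The length check at the end is the one place where the byte stream and the character string genuinely disagree.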

So, I don't actually have any stats for you, because it's really easy
to just not index strings at all.

ChrisA
 
Dennis Lee Bieber

How long did you just say??? I promised it in 8 weeks, not 12 you
complete moron :)

That means it will be done in 16 months (double the time estimate
and change to next larger unit)
 
Paul Rubin

Chris Angelico said:
So, I don't actually have any stats for you, because it's really easy
to just not index strings at all.

Right, that's why I think the O(n) indexing issue of UTF-8 may be
overblown. Haskell 98 was mentioned earlier as a language that did
Unicode "correctly", but its strings are linked lists of code points.
They are a performance pig to be sure but the O(n) indexing is usually
not the bottleneck. These days there is a "Text" module that I think is
basically UTF-16 arrays. I have been meaning to find out what happens
with non-BMP characters.
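
For what it's worth, the general UTF-16 answer can be seen from Python. This says nothing about how the Haskell module behaves; it is only a sketch of how UTF-16 itself encodes a non-BMP code point, and the helper name utf16_code_units is made up for illustration.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]


bmp = "\u2207"            # NABLA, inside the BMP
astral = "\U0001F600"     # outside the BMP

print([hex(u) for u in utf16_code_units(bmp)])
print([hex(u) for u in utf16_code_units(astral)])
```

A BMP character takes one 16-bit unit and a non-BMP character takes a surrogate pair, so an array of UTF-16 units only gives O(1) indexing by code unit, not by code point.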
 
wxjmfauth

On Wednesday, 26 September 2012 18:52:44 UTC+2, Paul Rubin wrote:
I'd like to have seen real world benchmarks against a pure UTF-8
implementation. That means O(n) access to the n'th character of a
string which could theoretically slow some programs down terribly, but I
wonder how often that actually matters in ways that can't easily be
worked around.

The selection of a coding scheme is a problem per
se. In Py33 there is a mix of coding schemes, an
artificial construction that is supposed to be a new
coding scheme.

As an exercise, pick characters from each individual
coding scheme, toy with them, and see what happens.
This poor Python not only has the task of handling
the bytes of a coding scheme; it now also has the
task of selecting the coding scheme it will use,
probably with plenty of side effects.

Completely absurd. I am penalized simply because I add
a French character to a French word, a character which
does not belong to the same "category" as the other
characters composing this word.
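
The behaviour being described can be observed directly. The sketch below assumes CPython 3.3 or later (sys.getsizeof reports implementation-specific sizes, so other implementations may differ); the helper bytes_per_char is made up for illustration. It estimates how many bytes each character costs by comparing two strings of different lengths, which cancels out the fixed object header.

```python
import sys


def bytes_per_char(ch):
    """Estimate storage per character for a string made only of ch."""
    # The difference between two lengths cancels the fixed header size.
    return (sys.getsizeof(ch * 200) - sys.getsizeof(ch * 100)) // 100


for label, ch in [("ASCII e", "e"),
                  ("Latin-1 e-acute", "\u00e9"),
                  ("BMP oe ligature", "\u0153"),
                  ("non-BMP U+1F600", "\U0001F600")]:
    print(label, bytes_per_char(ch))

# Two French-looking strings of equal length (500 characters each).
narrow = "coeur" * 100          # pure ASCII
wide = "c\u0153ur" * 125        # contains U+0153, which is not in Latin-1
print(len(narrow), sys.getsizeof(narrow))
print(len(wide), sys.getsizeof(wide))
```

On CPython 3.3+ I would expect the per-character costs to come out as 1, 1, 2 and 4 bytes, and the string containing the ligature to be roughly twice the size of the ASCII one. Whether that counts as a penalty or as a saving compared with a fixed 4 bytes per character is the point under dispute in this thread.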

jmf
 
Matej Cepl

Apart from IronPython, what constituency do these alternative
implementations of Python have that would raise them above the level of
interesting experiments?

and Jython ... that is widely used in the Java server world

Matěj
 
Terry Reedy

You are always selling the same argument.

Because you keep repeating the same insane argument against 3.3.

Py3.3 is the only computer language I'm aware of which
is maltreating Unicode in such a way.

You have it backwards. 3.3 fixes maltreatment of Unicode, such as also
exists in other languages. re will also run better with 3.3. You have
not shown any new bugs. Many other languages do not handle extended
plane characters properly.

After all, if replacing a Nabla operator in a string takes
10 times more time in Py33 than in Python32, it takes 10
times more. There is nothing more to say.

On the contrary, there is lots more to say. You have picked out the one
thing that 3.3 does not do as well and ignored all the things 3.3 does
better. I and others have already explained many of them. Included is
the fact that 3.3 does one operation 10, 100, 1000,... times faster
than 3.2.
 
Terry Reedy

You know, usually when I see software decried as America-centric, it's
because it doesn't support Unicode. This must be the first time I've
seen that label applied to software that dares to *fully* support Unicode.

What is truly bizarre is the idea came from and much or most of the
implementation was done by Europeans, not Americans.
 
Chris Angelico

What is truly bizarre is the idea came from and much or most of the
implementation was done by Europeans, not Americans.

I suppose that a system that supports only Latin-1 is therefore Italy-centric?

ChrisA
 
Steven D'Aprano

No, I'm comparing Py33 with Py32 narrow build [*]. And I am not a Python
newbie. Others in a previous discussion have pointed "bad" numbers and
even TR wrote something like "I'm baffled (?) by these numbers".

jmf, some time ago I said to you that if you want your claims to be taken
seriously, you should come up with a test suite that exercises the FULL
range of string operations and still demonstrates a significant slowdown.

Have you done this? I would be interested to run your test suite.

We know that if the only thing you do is repeatedly create strings, then
throw them away, then create more strings, then throw them away, Python
3.3 will be a little slower than Python 3.2. You say "ten times" slower,
but nobody else has been able to confirm this. Others are reporting that,
at worst, string handling is twice as slow and sometimes twice as fast,
depending on what operations you do, and what operating system you have.

(Since creating strings depends on allocating, and moving, blocks of
memory, the speed of creating strings is highly dependent on the
operating system's memory management.)

If all you want to do is complain and whinge and feel morally superior
that you are the only one that cares that "Python is slower" (allegedly),
please take it to your blog because we don't care.

But if you genuinely want to determine whether or not this slowdown is
meaningful in practice, and if so help optimise it so that it is faster,
then stop with the propaganda about Python destroying Unicode and start
writing a test suite.
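
As a starting point, such a suite might look something like the sketch below. It is not anyone's actual benchmark: the sample strings, the choice of operations and the name run_suite are all assumptions made for illustration, and a serious suite would cover far more of the str API. Run it unchanged under both 3.2 and 3.3 and compare the numbers.

```python
import timeit

# Sample strings chosen to hit each storage width (assumed data).
SAMPLES = {
    "ascii": "abc" * 1000,
    "latin1": "ab\u00e9" * 1000,
    "bmp": "ab\u2207" * 1000,
    "astral": "ab\U0001F600" * 1000,
}

# A small, deliberately mixed set of operations, not a full survey.
OPERATIONS = {
    "replace": lambda s: s.replace("b", "c"),
    "upper": lambda s: s.upper(),
    "find_missing": lambda s: s.find("zzz"),
    "slice": lambda s: s[10:-10],
    "encode_utf8": lambda s: s.encode("utf-8"),
    "join": lambda s: "-".join([s, s]),
    "split": lambda s: s.split("b"),
}


def run_suite(number=200, repeat=3):
    """Time every operation on every sample; return best-of-repeat seconds."""
    results = {}
    for sample_name, sample in SAMPLES.items():
        for op_name, op in OPERATIONS.items():
            timer = timeit.Timer(lambda: op(sample))
            results[(sample_name, op_name)] = min(
                timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    for (sample_name, op_name), seconds in sorted(run_suite().items()):
        print("%-8s %-14s %.6f" % (sample_name, op_name, seconds))
```

Taking the minimum over several repeats reduces noise from the operating system, which matters here because string creation is dominated by memory allocation.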
 
Steven D'Aprano

You remind me of the opening to the song Plaistow Patricia by Ian Dury
and the Blockheads.

While I always appreciate a good reference to Ian Dury, please stop
feeding D.H.'s ego by responding to his taunts.
 
Steven D'Aprano

Apart from IronPython, what constituency do these alternative
implementations of Python have that would raise them above the level of
interesting experiments?

The "Big Four" are CPython, Jython, IronPython and PyPy. Possibly "Big
Five" if you include Stackless, although I'm not quite sure just how big
(popular) Stackless actually is. It's certainly old and venerable, and
actively maintained. If you've played EVE Online, you've seen Stackless
in action.

Jython has a big constituency in Java shops. I can't tell you much about
that because I don't use Java.

PyPy is, well, PyPy is amazing, if you have the hardware to run it. It is
an optimizing Python JIT compiler, and it can consistently demonstrate
speeds of about 10 times the speed of CPython, which puts it in the same
ballpark as native code generated by Java compilers. For some (admittedly
artificially narrow) tasks it can beat optimized C code. It's fast enough
for real time video processing, depending on the algorithm used.

While PyPy is still a work in progress, and is not anywhere near as
mature as (say) gcc or clang, it should be considered production-ready.

I expect that, within the decade, PyPy will become "the" standard Python
compiler and CPython will be relegated to "merely" the reference
implementation.
 
Mark Lawrence

While I always appreciate a good reference to Ian Dury, please stop
feeding D.H.'s ego by responding to his taunts.

Good point as he's had so much rope that he's hung himself several times
over. Thanks for helping me get my feet back on the ground.
 
Walter Hurry

Any chance you could work on your usenet literacy and fix your double
posts?

I have a better idea: Consign him to the same bin as Dwight Hutto and
Dihedral.
 
