Native Code vs. Python code for modules

koblas

Ruby has been getting pummeled for the last year or more on the
performance subject. They've been working hard at improving it. From
my armchair perspective, Python is sitting on its laurels and not
taking this as seriously as it probably should. In general it's
possible to make many comments that swirl around religion and
approach, but one thing I've noticed is that while Python has a much
more disciplined developer community, it isn't using this for the
greater good.

Specifically, in looking at a benchmark that was posted, Python was an
order of magnitude (20 secs vs. 1 sec) slower than Ruby. On
investigation, the performance bottleneck turned out to be the random
module. This got me thinking: why does Python have both "pickle" and
"cPickle"? It comes down to the lowest common denominator, or the
inability for somebody to write an optimized package that can mimic a
Python package.

To that end, why should somebody have to write big try/except blocks
to see if modules exist, and alias their names if they do? Wouldn't it
be better if there were a way for Python to give preference to an
"interface compatible" native (a.k.a. C) module with better
performance?

e.g.

  import random(version=1.2, lang=c)
or
  import random(version=1.2, lang=py)  # use the Python version by default
or
  import random                        # use the latest version of the "fastest" code (C given preference)

where there could be a nice set of "standard" key/value pairs that
could provide additional hints as to which language and version of a
library should be used.
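
For reference, the idiom I mean is the usual import fallback dance;
this is (roughly) how pickle tends to get aliased today:

  try:
      import cPickle as pickle    # prefer the optimized C implementation
  except ImportError:
      import pickle               # fall back to the pure Python version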
 
alex23

> Ruby has been getting pummeled for the last year or more on the
> performance subject. They've been working hard at improving it. From
> my armchair perspective, Python is sitting on its laurels and not
> taking this as seriously as it probably should.

Well, the snarky response is that most Python developers are too busy
working on actual real-world projects :)

A more measured response would be that it's so easy to integrate
Python with C that most developers would prefer to profile their own
code and implement bottleneck code in C where appropriate.
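
For instance, a minimal sketch of that workflow (slow_function is just
a placeholder name):

  import cProfile

  def slow_function():
      return sum(i * i for i in range(10 ** 6))

  cProfile.run('slow_function()')   # shows where the time actually goes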

> In general it's possible to make many comments that swirl around
> religion and approach, but one thing I've noticed is that while
> Python has a much more disciplined developer community, it isn't
> using this for the greater good.

I guess it all depends on what you mean by "greater good". If you
mean "faster performance", then maybe not, although that definitely
seems to be -one- aspect being addressed by the PyPy project, at
least. For me, "greater good" == "makes my life as a developer
easier", and by that definition I've nothing but admiration for the
community as a whole.

> This got me thinking: why does Python have both "pickle" and
> "cPickle"?

Because sometimes you want something to be fast, but other times you
want a pure Python version that's easy to extend.

> To that end, why should somebody have to write big try/except blocks
> to see if modules exist, and alias their names if they do?

Because it's the established approach, the blocks are run through only
once during importing, and it doesn't require modifying the language
to achieve it.

> Wouldn't it be better if there were a way for Python to give
> preference to an "interface compatible" native (a.k.a. C) module
> with better performance?

Shouldn't that -always- be decided by the developer? I don't want to
rely on magic behaviour based on whatever an end user happens to have
installed in the runtime environment...

>   import random(version=1.2, lang=c)
> or
>   import random(version=1.2, lang=py)

I don't see the gain over sticking with an established convention,
like pickle and cPickle.

>   import random   # use the latest version of the "fastest" code (C given preference)

I personally have a preference for predictable behaviour over
idealised optimisation.

All of this, of course, is personal preference; please don't take me
for a jihadi :)

- alex23
 
Paddy

> Ruby has been getting pummeled for the last year or more on the
> performance subject. They've been working hard at improving it. From
> my armchair perspective, Python is sitting on its laurels and not
> taking this as seriously as it probably should.

I personally DON'T think that the developers should be chasing down
every micro-benchmark. Someone has to look up and see the bigger
picture. If you take a look at a larger set of micro-benchmarks, Ruby
is below par compared to Python and Perl on speed. I had the
misfortune to read their development group entries when a critical
bug was 'fixed' by a release from Japan that didn't have enough
testing, so I think the Ruby development 'group' has quality problems
that they also need to address. (And they are doing so, I think, by
adding to their test suite.)

PyPy and earlier tools like Psyco are ways in which Python developers
are continuing to rise above the eternal grind of micro-optimising the
C interpreter, to try and give us tools that can approach compiled-
language speeds from the language we love. I like to think of it as
working smarter rather than harder - the brain's a much better
muscle :)
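
For what it's worth, the classic Psyco idiom is tiny and degrades
gracefully when the extension isn't installed:

  try:
      import psyco
      psyco.full()    # JIT-specialise every function in the program
  except ImportError:
      pass            # no Psyco available: run on the plain interpreter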

- Paddy.
 
Sion Arrowsmith

alex23 said:
> Well, the snarky response is that most Python developers are too
> busy working on actual real-world projects :)

The follow-up snarky response is that working on actual real-world
projects has led Python developers to realise that most real-world
bottlenecks arise from disk access, network speed, and user
interaction, and that you should only sweat about code speed in the
rare case when you *know* that it's not the interface to the real
world that's slow.
 
pruebauno

> better if there were a way for Python to give preference to an
> "interface compatible" native (a.k.a. C) module with better
> performance?
>
> e.g.
>
>   import random(version=1.2, lang=c)
> or
>   import random(version=1.2, lang=py)  # use the Python version by default
> or
>   import random                        # use the latest version of the "fastest" code (C given preference)
>
> where there could be a nice set of "standard" key/value pairs that
> could provide additional hints as to which language and version of a
> library should be used.

I will only make a comment on automatic C-module importing. Some of
what you are suggesting already happens (e.g. ElementTree). In Python
3.0 more of the "legacy" modules (e.g. StringIO) should also work that
way. I don't know what the current status for pickle is, because
cPickle had some additional internal functionality, not available in
regular pickle, that Zope depended on.
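
The ElementTree case, for example, is usually handled with the same
fallback idiom (cElementTree has shipped with Python since 2.5):

  try:
      import xml.etree.cElementTree as ET   # C accelerator
  except ImportError:
      import xml.etree.ElementTree as ET    # pure Python fallback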
 
John Nagle

koblas said:
> Ruby has been getting pummeled for the last year or more on the
> performance subject. They've been working hard at improving it. From
> my armchair perspective, Python is sitting on its laurels and not
> taking this as seriously as it probably should.

PyPy was supposed to help, but that project has been dragging on
for half a decade now. Personally, I think the Shed Skin approach
is more promising. PyPy has too many different goals, and tends
to generate lots of auxiliary tools, blogs, videos, and "sprints",
but not a usable PyPy compiler.

We'll have to see if PyPy 1.1 works.

John Nagle
 
bearophileHUGS

John Nagle:
> Personally, I think the Shed Skin approach is more promising.

ShedSkin will probably have scaling problems: as the program size
grows, it may need too much time to infer all the types. The author
has a strict policy of refusing any kind of type annotation, which
makes it impractical.

And, despite your interest in ShedSkin, so far very few people have
actually given a hand developing SS (I think partially because
ShedSkin's Python sources aren't very hackable, which is very bad for
an open-source project), so I think the author has now lost part of
the will to develop this project (but we'll probably see one or two
more versions).

For me, so far the most viable way to produce a faster Python system
seems to be a version of CPython with Cython and something Psyco-like
built in (and a built-in compiler on Windows, like MinGW 4.2.1), maybe
with some syntax support in the Python language, allowing statically
compiled Python code to be mixed with dynamically compiled Python code
in an easy way (as CLisp sometimes does).

Bye,
bearophile
 
alex23

> The follow-up snarky response is that working on actual real-world
> projects has led Python developers to realise that most real-world
> bottlenecks arise from disk access, network speed, and user
> interaction, and that you should only sweat about code speed in the
> rare case when you *know* that it's not the interface to the real
> world that's slow.

You won't hear any disagreement from me on that point :)
 
srepmub

> ShedSkin will probably have scaling problems: as the program size
> grows, it may need too much time to infer all the types. The author
> has a strict policy of refusing any kind of type annotation, which
> makes it impractical.

well, I admit I really don't like manual type annotations (except for
documentation purposes). it seems a much nicer (more pythonic)
approach to just get type information from a profiler. if I had four
hands (and two brains), shedskin would probably already include one.

that said, I know of several ways to improve the scalability of
shedskin's type analysis itself, and I might still pursue those. but I
think, in combination with a profiler, things should scale pretty well
already.. certainly enough to compile most smallish programs/extension
modules of up to a few thousand lines.

> And, despite your interest in ShedSkin, so far very few people have
> actually given a hand developing SS (I think partially

well, it's been quite a few people actually - about 15 have
contributed substantial improvements. of course, doing a compiler like
this is probably more than 10 person-years of work, so I could always
use more help.


> because ShedSkin's Python sources aren't very hackable, which is
> very bad for an open-source project), so I think the author has now
> lost part of the

I think they are reasonably hackable for the most part, and this can
only improve. in the beginning I had little documentation, and there
was just this 7000-line Python file :) now things are more split up,
and I even recently added documentation to each part. yes, type
inference will always be hard to hack on, but that's only one part.
the C++ side, where I can arguably use the most help, and which makes
up more than half of the code, has always been easily hackable.

> will to develop this project (but we'll probably see one or two more
> versions).

I have my ups and downs of course, but at the moment I'm quite
enthusiastic about the whole thing, in part because people are
actually contributing. a new release is coming up, with support for
datetime and ConfigParser among many other improvements/fixes, and
there is a much faster set implementation in the pipeline. at the
moment, I have no plans to halt development at all.

> For me, so far the most viable way to produce a faster Python system
> seems to be a version of CPython with Cython and something
> Psyco-like built in (and a built-in compiler on Windows, like MinGW
> 4.2.1), maybe with some syntax support in the Python language,
> allowing statically compiled Python code to be mixed with
> dynamically compiled Python code in an easy way (as CLisp sometimes
> does).

shedskin can of course generate extension modules (shedskin -e) that
can be imported from larger Python programs. it's a bit clumsy, as
only builtins can be passed to/from shedskin, and everything (args,
return values) is copied recursively, but it can be quite useful
already. and of course it can only improve as well..
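
for example, the flow looks roughly like this (pair.py is just a
stand-in module name):

  # pair.py -- build with: shedskin -e pair && make
  def dot(a, b):
      s = 0.0
      for i in range(len(a)):
          s += a[i] * b[i]
      return s

  if __name__ == '__main__':
      print dot([1.0, 2.0], [3.0, 4.0])   # example call, so types can be inferred

then, from any CPython program:

  import pair                             # picks up the compiled pair.so
  print pair.dot([1.0, 2.0], [3.0, 4.0])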


mark.
 
castironpi

> To that end, why should somebody have to write big try/except blocks
> to see if modules exist, and alias their names if they do? Wouldn't
> it be better if there were a way for Python to give preference to an
> "interface compatible" native (a.k.a. C) module with better
> performance?
>
> e.g.
>
>   import random(version=1.2, lang=c)
> or
>   import random(version=1.2, lang=py)  # use the Python version by default
> or
>   import random                        # use the latest version of the "fastest" code (C given preference)
>
> where there could be a nice set of "standard" key/value pairs that
> could provide additional hints as to which language and version of a
> library should be used.

I don't see any reason why you couldn't write your own. You won't get
that exact syntax in Python, but here are a few things that come
close:

  versioned_import('random', 1.2, versioning.C)
  versioned_import('random', 1.2, 'C')
  versioned_import('random', lang='python')

In order to get your new module added to the namespace, you might need
an assignment:

  randomC = versioned_import('random', lang='C')

This would perform the same functionality as 'import random as
randomC'.

The exact details of priority and search order (you find a better
match in a more remote search location, or a better version match but
a worse language match) are up to you; there are ways to specify them
or to decide behind the scenes. You could even have two versions of a
lookup table in the same language, one implemented with a hash table
and one with an AVL tree. Do you specify that with version numbers,
version names, or extra keywords? If you're dealing with frequent
lookups on powers of two, for example, you might want to avoid a hash
table, or at least avoid Python's:
  >>> [hash(2**x) for x in range(0, 700, 32)]
  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Can a package that's 'versioned_import-ready' offer additional
keywords, such as enabling particular options, etc.? Perhaps the
function could begin to import a package, make some run-time
determinations based on parameters, and branch from there.
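
A bare-bones sketch of the helper itself (the 'c' + name lookup rule
and the __version__ check are just one possible convention):

  def versioned_import(name, version=None, lang=None):
      candidates = [name]
      if lang in (None, 'C'):
          # try a C-accelerated variant first, 'cPickle'-style naming
          candidates.insert(0, 'c' + name)
      for candidate in candidates:
          try:
              module = __import__(candidate)
          except ImportError:
              continue
          if version is None or getattr(module, '__version__', None) == version:
              return module
      raise ImportError('no suitable implementation of %r found' % name)

  random = versioned_import('random')   # a plain import, plus the lookup rule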
 
