Python advocacy in scientific computation

Alex Martelli

sturlamolden said:
just a toy. And as Matlab's run-time does reference counting instead of
proper garbage collection, any data structure more complex than arrays
is sure to leak memory (I believe Python also suffered from this at
some point).

Yes, that was fixed in the move from 1.5.2 to 2.0 (I don't recall if the
intermediate short-lived 1.6 also fixed it, but nobody used that
anyway;-).

Matlab is not useful for anything except plotting data
quickly. And as for the expensive license, I am not sure it's worth it.
I have been considering a move to Scilab for some time, but it too
carries the burden of working with a flawed language.

There was a pyscilab once, still around at
<http://pdilib.sourceforge.net/>, but I don't think it ever matured
beyond a "proof of concept" release 0.1 or something.


Alex
 
Robert Kern

sturlamolden said:
Robert Kern wrote:

Yes, and that is why I use C (that is ISO C99, not ANSI C89) instead of
Matlab for everything except trivial tasks. The design of Matlab's
language is fundamentally flawed. I once wrote a tutorial on how to
implement things like lists and trees in Matlab (using functional
programming, e.g. using functions to represent list nodes), but it's
just a toy. And as Matlab's run-time does reference counting instead of
proper garbage collection, any data structure more complex than arrays
is sure to leak memory (I believe Python also suffered from this at
some point).

Python still uses reference counting and has several very good data structures
more complex than arrays. And yet, most programs don't leak memory.

Matlab is not useful for anything except plotting data
quickly. And as for the expensive license, I am not sure it's worth it.
I have been considering a move to Scilab for some time, but it too
carries the burden of working with a flawed language.

And you need to ask why Python is a better Matlab than Matlab?

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Robert Kern

Brian said:
I'd like to ask, being new to python, in which ways is this array object far superior
to Matlab's? (I'm not being sarcastic, I really would like to know!)

Matlab takes the view that "everything is a rank-2 matrix of floating point values."

Our arrays have been N-dimensional since day one. They really are arrays, not
matrices. You have complete control over the types.
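A small sketch of the two points above, assuming the present-day `numpy` package (its API has changed since this thread was written). The shapes and values are made up for illustration.

```python
import numpy as np

# A rank-3 array of 8-bit unsigned integers, not a rank-2 matrix of doubles.
a = np.zeros((2, 3, 4), dtype=np.uint8)
print(a.ndim, a.shape, a.dtype)

# The element type is chosen explicitly and can be converted on request.
b = np.arange(6, dtype=np.int32).reshape(2, 3)
c = b.astype(np.complex64)

# "*" on arrays is elementwise; it is not a matrix product.
elementwise = b * b
print(elementwise)
```

Because these are arrays rather than matrices, a matrix product has to be asked for by name (for example with `np.dot`).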

 
sturlamolden

Robert said:
And you need to ask why Python is a better Matlab than Matlab?


First there are a few things I don't like:

1. Indentation as a part of the syntax, really annoying.

2. The "self.something" syntax is really tedious (look to ruby)!

4. Multithreading and parallel execution is impossible AFAIK because of
the so-called GIL (global interpreter lock). Matlab is perhaps even
worse in this respect.

5. I don't like numpy's array slicing. Array operations should be a
part of the language, as in Matlab, Fortran 90, Ada 95, D, Octave.


And there are a couple of questions I need answered:

1. Can python do "pass by reference"? Are data structures represented by
references as in Java? (I don't know yet.)

2. How good is matplotlib/pylab? I tried to install it but only get
error messages, so I haven't tested it. But plotting capability is a
really major issue.

3. Speed. I haven't seen any performance benchmarks that actually deal
with things that are important for scientific programs.

4. Are there "easy to use" libraries containing other stuff important
for scientific programs, e.g. linear algebra (LU, SVD, Cholesky),
Fourier transforms, etc.? E.g. in Matlab I can just type,

[u,s,v] = svd(x) % which calls LAPACK linked to ATLAS or vendor-optimized BLAS

Even though the language itself is very limited this type of library
functionality more than makes up for it.


I have looked for alternatives to Matlab for quite a while, mainly due
to the cost, the poor speed and poor memory management. I am not sure
it is Python but so far I have not found anything more promising either.
 
Duncan Booth

sturlamolden said:
First there are a few things I don't like:

1. Indentation as a part of the syntax, really annoying.

Each to his own. I find having the compiler enforce indentation rules is a
real benefit. There's nothing quite so annoying as having 'coding
standards' you are supposed to be following which aren't enforced (so
nobody else working on the code followed them either). C code which never
executes a for loop because of a spurious ';' can be pretty annoying too.

2. The "self.something" syntax is really tedious (look to ruby)!

I find it really useful to use a similar style when programming in other
languages such as Java or C#. In particular it means that you can
instantly see when you are referring to a member variable (without
having to prefix them all with m_ or some other abomination) and when you
have method arguments which get assigned to member variables you don't have
to think of different names for each.

4. Multithreading and parallel execution is impossible AFAIK because of
the so-called GIL (global interpreter lock). Matlab is perhaps even
worse in this respect.

Multithreading and parallel execution work fine. The only problem is that
you don't get the full benefit when you have multiple processors. This one
will become more of an annoyance in the future though as more systems have
hyperthreading and multi-core processors.

And there are a couple of questions I need answered:

1. Can python do "pass by reference"? Are data structures represented by
references as in Java? (I don't know yet.)

Python only does "pass by reference", although it is more normally referred
to as "pass by object reference" to distinguish it from languages where the
references refer to variables rather than objects.

What it doesn't do is let you rebind a variable in the caller's scope which
is what many people expect as a consequence of pass by reference. If you
pass an object to a function (and in Python *every* value is an object)
then when you mutate the object the changes are visible to everything else
using the same object. Of course, some objects aren't mutable so it isn't
that easy to tell that they are always passed by reference.
 
Dennis Lee Bieber

1. Indentation as a part of the syntax, really annoying.

Really lovely --- no need for superfluous begin/end blocking to
delimit the logic (and any decent programming course already recommends
indenting to illustrate the nesting of the logic)

2. The "self.something" syntax is really tedious (look to ruby)!
I did look at Ruby once... It looked to me like the worst aspects of
PERL grafted onto the worst parts of old Python.

4. Multithreading and parallel execution is impossible AFAIK because of
the so-called GIL (global interpreter lock). Matlab is perhaps even
worse in this respect.

For CPU-bound number-crunching, perhaps... For I/O-bound jobs, the
GIL is (should be) released whenever a thread is blocked waiting for I/O
to complete.
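
A rough way to see this for yourself, assuming CPython and using `time.sleep` as a stand-in for a blocking read or network call (the helper names `blocking_task` and `run_threads` are made up for this sketch):

```python
import threading
import time

def blocking_task(delay):
    # time.sleep stands in for blocking I/O; CPython releases the GIL
    # while a thread is blocked like this.
    time.sleep(delay)

def run_threads(n_threads, delay):
    """Start n_threads blocking tasks and return the elapsed wall time."""
    threads = [threading.Thread(target=blocking_task, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_threads(4, 0.2)
print("4 threads blocking 0.2 s each took %.2f s in total" % elapsed)
```

If the waits overlap, the total should be close to one delay rather than four. A CPU-bound loop in pure Python would not show the same overlap.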
5. I don't like numpy's array slicing. Array operations should be a
part of the language, as in Matlab, Fortran 90, Ada 95, D, Octave.
Well, it took what -- 30 years for FORTRAN to evolve those
operations? 15 for Ada? And both of those were targeted for
computationally intensive jobs. Python started with a different goal, so
far as I recall. Something closer to REXX.

And there are a couple of questions I need answered:

1. Can python do "pass by reference"? Are data structures represented by
references as in Java? (I don't know yet.)

Everything in Python is a reference to an object. I think the
question you want is more on the lines of: Can I change an object that
has been passed? Short answer: depends on what type of object is at the
end of the reference. A mutable container object (list, dictionary,
maybe a class instance) can have its contents changed.

def junk(a):
    a.append(3.1415)  # if "a" referenced a list object,
                      # this appends to the "contents"
                      # of the list passed in.
    a = []  # no effect on the actual object passed in:
            # inside junk, "a" started as a reference to
            # whatever was passed in;
            # this assignment now makes "a" a reference to an
            # empty list, but has no effect on the original reference

b = [1, 2, 3]  # "b" is a reference to a list somewhere in memory
junk(b)        # passes a copy of the reference
3. Speed. I haven't seen any performance benchmarks that actually deal
with things that are important for scientific programs.

Feel free to produce them -- benchmarks tend to be created by the
people with an interest in whatever is being benchmarked. If you haven't
found any, I'd conclude that either no one is interested in the tasks
you find important, or they are interested, but have found Python
sufficient in practice that they don't feel a need to spend the time
creating a fair benchmark suite.
 
sturlamolden

Dennis said:
Everything in Python is a reference to an object. I think the
question you want is more on the lines of: Can I change an object that
has been passed? Short answer: depends on what type of object is at the
end of the reference. A mutable container object (list, dictionary,
maybe a class instance) can have its contents changed.

Thank you.
 
Evan Monroig

First there are a few things I don't like:

Hi, I will respond to things that others haven't responded to yet.

2. How good is matplotlib/pylab? I tried to install it but only get
error messages, so I haven't tested it. But plotting capability is a
really major issue.

I don't know because I haven't managed to get it working either. But
other people have and I guess it should not be so difficult.

I personally use gnuplot-py (gnuplot-py.sourceforge.net) which I adapted
to the new NumPy/SciPy (see below) by searching-and-replacing "Numeric"
by "numpy". It lets you use raw gnuplot commands.

4. Are there "easy to use" libraries containing other stuff important
for scientific programs, e.g. linear algebra (LU, SVD, Cholesky),
Fourier transforms, etc.? E.g. in Matlab I can just type,

[u,s,v] = svd(x) % which calls LAPACK linked to ATLAS or vendor-optimized BLAS

Yes!

There is the excellent SciPy package, which you can get at www.scipy.org

Personally I use it a lot for linear algebra (linked to
LAPACK/ATLAS/BLAS), but there are also libraries for statistics,
optimization, signal processing, etc.
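
For comparison with the Matlab one-liner, here is a minimal sketch using `numpy.linalg.svd` from a current NumPy (`scipy.linalg.svd` can be called the same way). The matrix `x` is made up for illustration.

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# full_matrices=False asks for the reduced ("economy") decomposition.
# The third return value is V transposed, not V as in Matlab's svd.
u, s, vt = np.linalg.svd(x, full_matrices=False)

# Rebuild x from its factors to check the decomposition.
x_rebuilt = u @ np.diag(s) @ vt
print(np.allclose(x, x_rebuilt))
```

Note that `s` comes back as a 1-D array of singular values rather than a diagonal matrix, hence the `np.diag` call.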

There have been many changes recently, including package names, so don't
get confused and be sure to get recent versions of NumPy and SciPy ;).

Evan
 
Alex Martelli

Robert Kern said:
Python still uses reference counting and has several very good data structures
more complex than arrays. And yet, most programs don't leak memory.

Python uses reference counting *AND* cyclic-garbage collection for the
kind of garbage that wouldn't go away by RC due to reference-cycles
(plus, also, weak references for yet another helper). To leak memory
despite all of that, you really need to do it on purpose (e.g. via a
C-coded container extension-type that does NOT play nice with gc;-).
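
A small demonstration of the three mechanisms working together, assuming CPython (other implementations manage memory differently). The `Node` class is hypothetical, and automatic collection is switched off only so that the effect of each step is visible.

```python
import gc
import weakref

class Node:
    """Hypothetical container that can point at another Node."""
    def __init__(self):
        self.other = None

gc.disable()  # keep the cycle collector from running on its own

a, b = Node(), Node()
a.other, b.other = b, a   # reference cycle: a -> b -> a
probe = weakref.ref(a)    # a weak reference does not keep "a" alive

del a, b
# Reference counting alone cannot free the pair: each still holds the other.
alive_after_del = probe() is not None

gc.collect()              # the cycle detector finds and frees both nodes
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, alive_after_collect)
```

In normal use the collector runs automatically, so nothing like the explicit `gc.collect()` call is needed.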


Alex
 
Steve Holden

sturlamolden said:
First there are a few things I don't like:

1. Indentation as a part of the syntax, really annoying.

Troll. You think this is going away because *you* don't like it? Am I to
presume that you don't bother to indent your C code according to its
nested block structure? If you *do* indent your C code, perhaps you can
explain the additional benefits of the braces?

2. The "self.something" syntax is really tedious (look to ruby)!

This is done because of a preference for explicit references over
implied ones. It does avoid a lot of namespace confusion.

By the way, anyone who can't count shouldn't be criticising programming
languages. What happened to "3"?

4. Multithreading and parallel execution is impossible AFAIK because of
the so-called GIL (global interpreter lock). Matlab is perhaps even
worse in this respect.

Right. So kindly tell us how to write thread-safe code without using a
GIL. This is not an easy problem, and you shouldn't assume that all you
have to do to get rid of the GIL is to wave your magic wand. There are
deep reasons why the GIL is there.

5. I don't like numpy's array slicing. Array operations should be a
part of the language, as in Matlab, Fortran 90, Ada 95, D, Octave.

Slicing *is* a part of the language, inserted into the grammar (as far
as I know) precisely to support the numeric/scientific community.
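
To see what the grammar hands to an object, here is a sketch with a made-up class, `ShowIndex`, whose `__getitem__` simply returns the index it receives. Array packages build their multi-dimensional slicing on exactly these objects.

```python
class ShowIndex:
    """Hypothetical class that returns whatever index it is given."""
    def __getitem__(self, index):
        return index

s = ShowIndex()
print(s[1:10:2])   # slice(1, 10, 2)
print(s[::2, 3])   # (slice(None, None, 2), 3)
print(s[..., 0])   # (Ellipsis, 0)
```

The strided, multi-axis and `...` forms are all core syntax; only their interpretation is left to the library.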
And there are a couple of questions I need answered:

1. Can python do "pass by reference"? Are data structures represented by
references as in Java? (I don't know yet.)

All assignments store references.

2. How good is matplotlib/pylab? I tried to install it but only get
error messages, so I haven't tested it. But plotting capability is a
really major issue.

Good enough to keep you away, apparently ;-) (Sorry, I don't use these
features).

3. Speed. I haven't seen any performance benchmarks that actually deal
with things that are important for scientific programs.

The major fact here is that no matter how fast a language is there is
always a need for more speed in certain areas.

Suffice it to say that Python is being used for a wide range of
scientific and engineering problems to the evident satisfaction of its
users.

4. Are there "easy to use" libraries containing other stuff important
for scientific programs, e.g. linear algebra (LU, SVD, Cholesky),
Fourier transforms, etc.? E.g. in Matlab I can just type,

[u,s,v] = svd(x) % which calls LAPACK linked to ATLAS or vendor-optimized BLAS

Even though the language itself is very limited this type of library
functionality more than makes up for it.

The more people who join in and write libraries to add to the growing
corpus of scientific and engineering libraries the sooner the answer to
this question will be "we have everything you want".

For the moment, however, since apparently Google isn't available where
you are, a quick search for "Python LAPACK" gave

http://mdp-toolkit.sourceforge.net/faq.html

as its first hit. This appears to include information about how to have
LAPACK make use of ATLAS' faster LAPACK routines. Satisfied?

I have looked for alternatives to Matlab for quite a while, mainly due
to the cost, the poor speed and poor memory management. I am not sure
it is Python but so far I have not found anything more promising either.

You know, recently the Python community has acquired a reputation in
certain quarters for defensive support of the status quo. With
ill-informed criticism like this from self-confessed beginners it's not
hard to see how this myth has arisen.

I'd be very surprised if Python doesn't already give you 95% of what you
appear to want. If your prejudices about indented code and self-relative
references blind you to the clear advantages of the Python environment
then frankly you are a lost cause, and good riddance.

If, on the other hand, you are prepared to engage the community and do a
little bit of learning rather than just trolling, you may find that one
of the most valuable features of Python is its supportive user base,
whom at the moment you seem to be doing your best to offend.

regards
Steve
 
Michael Tobis

1) indentation:

I claim that this is a trivial matter. Indentation is enforced, but if
you want to end all your blocks with #end, feel free.

Or write a preprocessor to go from your preferred block style to
python's

2) self.something tedious to look at.

Again, you can somewhat work around this if you like. Just use "s" or
"_" instead of "self".

3) missing

4) multithreading and parallel execution impossible

This is simply false, though admittedly the MPI libraries need a little
work. Mike Steder of our group is likely to release an alternative. A
good way to think of Python is as a powerful extension to C. So using
MPI from Python just amounts to setting up a communicator in C and
wrapping the MPI calls.

As for less tightly coupled threads on a single processor, Python is
adept at it. I think the issues with multiple processors are much more
those of a server farm than those of a computational cluster.

We have been encountering no fundamental difficulty in cluster programs
using Python.

5) "I don't like numpy's array slicing" ?

this is unclear. It is somewhat different from Matlab's, but much more
powerful.

1) pass by reference

Python names are all references. The model is a little peculiar to
Fortran and C people, but rather similar to the Java model.

2) matplotlib

A correct install can be difficult, but once it works it rocks. ipython
(with its pylab mode) is also a very nice work environment.

3D plots remain unavailable at present.

3) speed

Speed matters less in Python than in other languages because Python
plays so well with others. For many applications, NumPy is fine.
Otherwise write your own C or C++ or F77; building the Python bindings
is trivial. (F9* is problematic, though in fact we do some calling of
F90 from Python using the Babel toolkit)

4) useful libraries

yes. for your svd example see Hinsen's Scientific package. In general,
Python's claim of "batteries included" applies to scientific code.

mt
 
Magnus Lycka

Dennis said:
I did look at Ruby once... It looked to me like the worst aspects of
PERL grafted onto the worst parts of old Python.

Don't forget that there are portions of Smalltalk syntax
(blocks) added in as well. I guess it could be seen as Perl-NG.
Both the name 'Ruby' and the Ruby syntax seem to suggest that
Matz had the idea to "flirt" a bit with the Perl programmers,
and considering how Perl seems to be in decline today, that
might have been clever from a user-base point of view. Whether
it was really good for the language is another issue. I still
think it's a bit prettier than Perl though.

For CPU-bound number-crunching, perhaps... For I/O-bound jobs, the
GIL is (should be) released whenever a thread is blocked waiting for I/O
to complete.

I think CPU-bound number-crunching was the big deal in this case.
Somehow, I doubt that the OP uses Matlab for I/O-bound jobs. At
least if writing threaded applications becomes less error prone
in competing languages, this might well be the weak point of Python
in the future. I hope to see some clever solution to this from the
Python developers.

It seems the Python attitude to performance has largely been:
Let Python take care of development speed, and let Moore's law
and the hardware manufacturers take care of execution speed. As
it seems now, increases in processing speed in the coming years
will largely be through parallel threads. If Python can't utilize
that well, we have a real problem.

Python is not primarily a mathematics language. It's not a text
processing language either, so no regexp support directly in the
syntax. That might make it less ideal as a Matlab substitute, or
as a sed or awk substitute, but on the other hand, it's useful for
so many other things...

Everything in Python is a reference to an object. I think the
question you want is more on the lines of: Can I change an object that
has been passed?

The key lies in understanding that "a=b" means "bind the local name
(unless declared global) "a" to the object the name "b" refers to.
It never means "copy the content of b into the location of a".
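
A short illustration of that binding rule (the variable names are arbitrary):

```python
a_list = [1, 2, 3]
b = a_list              # binds the name "b" to the same list object
same_object = b is a_list

b.append(4)             # a mutation is visible through both names

c = list(a_list)        # copying has to be asked for explicitly
c.append(5)             # changes only the copy

b = "something else"    # rebinding "b" leaves the list untouched
print(a_list, c, same_object)
```

Only the explicit `list(...)` call creates a second object; plain assignment never does.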
 
Bil Kleb

Magnus said:
Don't forget that there are portions of Smalltalk syntax
(blocks) added in as well. I guess it could be seen as Perl-NG.

Actually, the Perl part was one of the last steps
in the Ruby recipe according to Matz:

Ruby is a language designed in the following steps:

* take a simple lisp language (like one prior to CL).
* remove macros, s-expression.
* add simple object system (much simpler than CLOS).
* add blocks, inspired by higher order functions.
* add methods found in Smalltalk.
* add functionality found in Perl (in OO way).

So, Ruby was a Lisp originally, in theory.
Let's call it MatzLisp from now on. ;-)

--from http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/179642

Regards,
 
Tim Hochberg

Magnus said:
Dennis Lee Bieber wrote: [SNIP]
For CPU-bound number-crunching, perhaps... For I/O-bound jobs, the
GIL is (should be) released whenever a thread is blocked waiting for I/O
to complete.


I think CPU-bound number-crunching was the big deal in this case.
Somehow, I doubt that the OP uses Matlab for I/O-bound jobs. At
least if writing threaded applications becomes less error prone
in competing languages, this might well be the weak point of Python
in the future. I hope to see some clever solution to this from the
Python developers.

I don't disagree with this, but it's largely irrelevant to CPU-bound
number-crunching using numpy and its brethren. In that case the bulk of
the work is going on in the array extensions, in C, and thus the GIL
should be released. Whether it actually is released, I can't say --
never having been blessed/cursed with a multiprocessing box, I haven't
looked into it.

[SNIP]

-tim
 
Dennis Lee Bieber

I think CPU-bound number-crunching was the big deal in this case.
Somehow, I doubt that the OP uses Matlab for I/O-bound jobs. At

Concede to that -- though the original comment was rather absolute,
to me, allowing no slack for differing problem domains.
 
Pauli Virtanen

Michael said:
3) speed

Speed matters less in Python than in other languages because Python
plays so well with others. For many applications, NumPy is fine.
Otherwise write your own C or C++ or F77; building the Python bindings
is trivial. (F9* is problematic, though in fact we do some calling of
F90 from Python using the Babel toolkit)

I have met no problems using F90 together with f2py -- in fact usually it
can bind f90 code to python completely automatically with no need to write
extra glue code.

The only problem was setting up Intel Fortran compiler and making it play
along with f2py, but I suppose the compiler business will become easier
when gfortran matures.
 
Michael Tobis

I have met no problems using F90 together with f2py

Thank you for the correction. I should have qualified my statement.

Our group must cope with F90 derived types to wrap a library that we
need. f2py fails to handle this case. While the f2py site alleges that
someone is working on this, I contacted the fellow and he said it was
on hold.

To our knowledge only the (officially unreleased) Babel toolkit can
handle F9* derived types. I would be pleased to know of alternatives.
To be fair, Babel remains in development, and so perhaps it will become
less unwieldy, and the developers have been very helpful to us. Still,
it certainly is not as simple a matter as f2py or swig.

mt
 
Terry Reedy

Magnus Lycka said:
It seems the Python attitude to performance has largely been:
Let Python take care of development speed, and let Moore's law
and the hardware manufacturers take care of execution speed.

I think this is pretty fair, and yet .... the core Python interpreter has
perhaps doubled in speed (hardware held constant) since some years ago.
And new builtins like enumerate speed up code that needs to enumerate
sequence items (which is not uncommon). And class sets.Set is rewritten in
C as builtin type set primarily for speed.

It is certainly true that Guido regards continued correct performance of
the interpreter to be more important than greater speed. Ditto for Python
programs.

it seems now, increases in processing speed in the coming years
will largely be through parallel threads. If Python can't utilize
that well, we have a real problem.

I believe it is Guido's current view, perhaps Google's collective view, and
a general *nix view that such increases can just as well come thru parallel
processes. I believe one can run separate Python processes on separate
cores just as well as one can run separate processes on separate chips or
separate machines. Your view has also been presented and discussed on the
pydev list. (But I am not one for thread versus process debate.)

At least if writing threaded applications becomes less error prone
in competing languages, this might well be the weak point of Python
in the future.

Queue.Queue was added to help people write correct threaded programs.
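
A minimal worker-pool sketch of that idea, written for Python 3, where the module is named `queue` rather than `Queue`. The `worker` and `square_all` helpers and the `None` sentinel convention are assumptions made for this example.

```python
import queue
import threading

def worker(tasks, results):
    # Queue does its own locking, so no explicit lock is needed here.
    while True:
        n = tasks.get()
        if n is None:        # sentinel: no more work for this worker
            break
        results.put(n * n)

def square_all(numbers, n_workers=3):
    """Square each number using a small pool of threads."""
    numbers = list(numbers)
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)      # one sentinel per worker
    for t in threads:
        t.join()
    # Results arrive in no particular order, so sort them before returning.
    return sorted(results.get() for _ in numbers)

print(square_all(range(5)))
```

The threads never touch shared state directly; everything passes through the two queues, which is what makes the program easy to get right.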
I hope to see some clever solution to this from the Python developers.

A Python developer is one who helps develop Python. Threading improvements
will have to come from those who want it enough to contribute to the
effort. (There have been some already.)

Terry Jan Reedy
 
Peter Maas

Duncan said:
Python only does "pass by reference", although it is more normally referred
to as "pass by object reference" to distinguish it from languages where the
references refer to variables rather than objects.

What it doesn't do is let you rebind a variable in the caller's scope which
is what many people expect as a consequence of pass by reference. If you
pass an object to a function (and in Python *every* value is an object)
then when you mutate the object the changes are visible to everything else
using the same object. Of course, some objects aren't mutable so it isn't
that easy to tell that they are always passed by reference.

This is hard to understand for an outsider. If you pass an int, a float,
a string or any other "atomic" object to a function you have "pass by
value" semantics. If you put a compound object like a list or a dictionary
or any other object that acts as an editable data container you can return
modified *contents* (list elements etc.) to the caller, exactly like in
Java and different from C/C++.
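
A compact sketch of the two behaviours side by side; the function name `modify` and the values are made up for illustration.

```python
def modify(n, items):
    n += 1            # builds a new int and rebinds the local name only
    items.append(n)   # mutates the caller's list in place
    return n

count = 10
log = []
result = modify(count, log)
print(count, result, log)
```

The int looks as if it were passed by value only because ints cannot be changed in place; the list shows that the same object is shared with the caller.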

Peter Maas, Aachen
 
Peter Maas

Steve said:
Troll. You think this is going away because *you* don't like it?

You are over-reacting. Keep in mind that sturlamolden has criticized
Python and not you :) I think there is a more convincing reply to
indentation phobia:

It is natural that compiler and programmer agree on how to identify
block structures. Anybody who disagrees should bang his code against
the left side or put everything in one line to get rid of annoying
line breaks. :)

Peter Maas, Aachen
 
