switching to numpy and failing, a user story

greg.landrum

After using Numeric for almost ten years, I decided to attempt to
switch a large codebase (Python and C++) to using numpy. Here are
some comments about how that went.

- The code to automatically switch python stuff over just kind of
works. But it was a 90% solution, I could do the rest by hand. Of
course, the problem is that then the code is still using the old
numeric API, so it's not a long term solution. Unfortunately, to switch
to the numpy API one needs documentation, which is a problem; see
below.

- Well, ok, the automatic switching code doesn't really work all that
well... my uses of RandomArray still work, but they generate different
numbers. The underlying random-number generator must have changed. I'm
sure that it's "better" now, but it's different. This is a major pain
for my regression tests that rely on seeding the random number
generator and getting particular results. But that's ok, I can update
the regressions for the new RNG.

- My extension modules just won't build because the new numpy stuff
lives in a different location from where Numeric used to live. I
probably could fix this, but without the documentation I can't figure
out how to do that. I'd also need to figure out how to port my code to
use the new numpy API instead of the compatibility layer, but I can't
do that without docs either.

- I guess I should just buy the documentation. I don't like this idea,
because I think it's counter-productive to the project to have payware
docs (would Python be successful if you had to buy the documentation? I
don't think so), but that's the way this project goes. I'm doubly
unhappy about it because the payment system is using Paypal and I
don't like Paypal at all, but I guess that's just the way it goes. Oh,
wait, I *can't* buy the docs because I'm not in the US and the payment
page requires a US address. I give up; I guess NumPy is only for people
living in the US.

I guess I'll come back to NumPy in 2010, when the docs are available.

-greg
 
Robert Kern

After using numeric for almost ten years, I decided to attempt to
switch a large codebase (python and C++) to using numpy. Here are
some comments about how that went.

- The code to automatically switch python stuff over just kind of
works. But it was a 90% solution, I could do the rest by hand. Of
course, the problem is that then the code is still using the old
numeric API, so it's not a long term solution. Unfortunately, to switch
to the numpy API one needs documentation, which is a problem; see
below.

Actually, it's not too hard to do so with the files that come with the source
and utilizing the mailing list. I've converted quite a lot of code without
reference to _The Guide to NumPy_.

- Well, ok, the automatic switching code doesn't really work all that
well... my uses of RandomArray still work, but they generate different
numbers. The underlying random-number generator must have changed. I'm
sure that it's "better" now, but it's different. This is a major pain
for my regression tests that rely on seeding the random number
generator and getting particular results. But that's ok, I can update
the regressions for the new RNG.

Sorry, but the old PRNG code had a non-open-source license that prohibited
commercial use, and it had to be replaced.

- My extension modules just won't build because the new numpy stuff
lives in a different location from where Numeric used to live. I
probably could fix this, but without the documentation I can't figure
out how to do that.

That is documented in the files that come with the source. Or you could have
asked us on the numpy mailing list. In short, if you use numpy.distutils,
everything would have been taken care of for you; otherwise, numpy.get_include()
will give you the location of the headers.
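
As a rough sketch of the second option (not taken from the NumPy docs): ask numpy.get_include() for the header directory and hand it to whatever builds the extension. The helper name extension_kwargs and the module name mymodule are made up for illustration; numpy.get_include() is the only NumPy call assumed here.

```python
import numpy


def extension_kwargs(extra_include_dirs=()):
    """Hypothetical helper: keyword arguments for building a C/C++ extension.

    numpy.get_include() returns the directory that holds the NumPy C
    headers, wherever NumPy happens to be installed.
    """
    return {"include_dirs": [numpy.get_include(), *extra_include_dirs]}


# In a setup.py this would be used roughly like so (module and file
# names are placeholders, not from the original post):
#
#   from setuptools import Extension, setup
#   ext = Extension("mymodule", sources=["mymodule.cpp"], **extension_kwargs())
#   setup(name="mymodule", ext_modules=[ext])
```

The point of going through the function call is that the build script never hard-codes where the headers live, which is exactly what changed between Numeric and numpy.
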
- I guess I should just buy the documentation. I don't like this idea,
because I think it's counter-productive to the project to have payware
docs (would Python be successful if you had to buy the documentation? I
don't think so), but that's the way this project goes. I'm doubly
unhappy about it because the payment system is using Paypal and I
don't like Paypal at all, but I guess that's just the way it goes. Oh,
wait, I *can't* buy the docs because I'm not in the US and the payment
page requires a US address. I give up; I guess NumPy is only for people
living in the US.

Or you could have emailed Travis, and he would have worked around the issue.

I'm sorry that you had such problems, but if you had let us know about them
earlier, we could have helped you out.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Travis E. Oliphant

After using numeric for almost ten years, I decided to attempt to
switch a large codebase (python and C++) to using numpy. Here are
some comments about how that went.

- The code to automatically switch python stuff over just kind of
works. But it was a 90% solution, I could do the rest by hand. Of
course, the problem is that then the code is still using the old
numeric API, so it's not a long term solution. Unfortunately, to switch
to the numpy API one needs documentation, which is a problem; see
below.

I'm glad to hear of your experiences (good and bad). We need feedback
exactly from users like you in order to improve things. Yep, the code
converter is an 80% solution, but it does work. Improvements are always
welcome.

- Well, ok, the automatic switching code doesn't really work all that
well... my uses of RandomArray still work, but they generate different
numbers. The underlying random-number generator must have changed. I'm
sure that it's "better" now, but it's different. This is a major pain
for my regression tests that rely on seeding the random number
generator and getting particular results. But that's ok, I can update
the regressions for the new RNG.

You definitely can't expect the same random number generator. The new
one (thanks to Robert Kern) is quite good.
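
One way to make such regression tests less sensitive to which generator is underneath, sketched under the assumption that the test only needs reproducibility plus sane statistics: seed a private generator, then check properties with a tolerance instead of hard-coded values. The function name sample_mean, the seed, and the 0.02 tolerance are illustrative choices, not anything from Numeric or numpy.

```python
import numpy as np


def sample_mean(seed, n=10000):
    """Mean of n uniform draws on [0, 1) from a privately seeded generator."""
    rng = np.random.RandomState(seed)  # does not touch the global random state
    return rng.uniform(0.0, 1.0, size=n).mean()


# Reproducibility: the same seed on the same generator gives the same answer.
assert sample_mean(1234) == sample_mean(1234)

# Generator-independent check: the mean of U(0, 1) should be close to 0.5.
# The standard error for n=10000 is about 0.003, so 0.02 is a loose bound.
assert abs(sample_mean(1234) - 0.5) < 0.02
```

Tests written against exact seeded output still have to be regenerated when the generator changes; tests written against tolerances like the second assert generally do not.
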

- My extension modules just won't build because the new numpy stuff
lives in a different location from where Numeric used to live.

This is an easy one and is documented in lots of places on the new
http://www.scipy.org site. Plus, people are always willing to help out
if you just ask. The numpy-discussion list is quite active. Don't be
shy.

- I guess I should just buy the documentation. I don't like this idea,
because I think it's counter-productive to the project to have payware
docs (would Python be successful if you had to buy the documentation? I
don't think so), but that's the way this project goes.

It's probably better to call it "complete documentation." Normal
open-source documentation is available from http://www.scipy.org. There
are lots of people who have helped with it. I had to do something to at
least pretend to justify the time NumPy took me to the people who care
about how I spend my time (including my family). This was the best I
could come up with.

I'm doubly
unhappy about it because the payment system is using Paypal and I
don't like Paypal at all, but I guess that's just the way it goes. Oh,
wait, I *can't* buy the docs because I'm not in the US and the payment
page requires a US address.

Is that really true? A lot of people not living in the U.S. have used
Paypal successfully. When difficulties arise, telling me about the
problem is usually productive.

I give up; I guess NumPy is only for people living in the US.

Definitely not true. People in Singapore, Japan, Ghana, South Africa,
France, Germany, New Zealand, Australia, and many other countries are
using NumPy successfully. Gratefully, a few have contributed by buying
the book, but a lot more have downloaded and are successfully using it.

I'm sorry about your experience; you definitely don't *have* to buy the
book to use NumPy. Just like you don't *have* to buy any Python book to
use Python. The amount of documentation for NumPy is growing and I
expect that trend to continue. There is a lot of information in the
source files.

I guess I'll come back to NumPy in 2010, when the docs are available.

Or just ask on the mailing lists, or use the numpy.oldnumeric interface
(the differences are all documented in the first few pages of my book
which is available for free now).

Thanks,

-Travis
 
sturlamolden

Travis said:
Definitely not true. People in Singapore, Japan, Ghana, South Africa,
France, Germany, New Zealand, Australia, and many other countries are
using NumPy successfully. Gratefully, a few have contributed by buying
the book, but a lot more have downloaded and are successfully using it.

From the PayPal site for purchasing your book: "Select Payment Type:
Don't have a PayPal account? You don't need an account. Pay securely
using your credit card." I bought the book from Norway using my credit
card, and received the pdf file soon afterwards. Obviously the book can
be bought without a PayPal account or a US billing address.

NumPy is the most versatile array type I have worked with, including
those of Matlab and Fortran 95. In particular, the explosive memory use
of Matlab is avoided. Keep up the good work!
 
greg.landrum

sturlamolden said:
Don't have a PayPal account? You don't need an account. Pay securely
using your credit card." I bought the book from Norway using my credit
card, and received the pdf file soon afterwards. Obviously the book can
be bought without a PayPal account or a us billing address.

ok, my apologies on this one. I was wrong about the paypal option. I
missed the spot (right at the top, where I should have seen it) to
change the country. There is indeed a way to buy the book if you don't
live in the US.

-greg
 
greg.landrum

Travis said:
It's probably better to call it "complete documentation." Normal
open-source documentation is available from http://www.scipy.org. There
are lots of people who have helped it. I had to do something to at
least pretend to justify the time NumPy took me to the people who care
about how I spend my time (including my family). This was the best I
could come up with.

Given the quality of Python's (free) documentation and how good it's
been for a very long time, it's a bit ironic to be using the phrase
"normal open-source documentation" on this mailing list. Numeric
Python, which numpy aspires to be a replacement for, has perfectly
reasonable documentation. It wasn't perfect, but it told you pretty
much everything you needed to know to get started, use the system, and
build extension modules. I guess this set my expectations for NumPy.

Or just ask on the mailing lists, use the numpy.oldnumeric interface
(the differences are all documented in the first few pages of my book
which is available for free now).

"Ask on the mailing lists" is viable for the occasional question or
detail, but it's not really an efficient way to get started with a
system. At least not for me. But that's fine, I have something that
works (numeric), and I can do what I need to do there.

-greg
 
Travis Oliphant

greg.landrum wrote:


Given the quality of python's (free) documentation and how good it's
been for a very long time, it's a bit ironic to be using the phrase
"normal open-source documentation" on this mailing list. Numeric
python, which numpy aspires to be a replacement for, has perfectly
reasonable documentation.

And it is still perfectly useful. Only a couple of details have
changed. The overall description is still useful.

It wasn't perfect, but it told you pretty
much everything you needed to know to get started, use the system, and
build extension modules. I guess this set my expectations for NumPy.

This documentation was written largely due to funding from a national
laboratory. I didn't have those resources. If somebody wanted to step
up to the plate and make me an offer, the NumPy docs could be free as
well. So far, people have been content to buy it a piece at a time.

"Ask on the mailing lists" is viable for the occasional question or
detail, but it's not really an efficient way to get started with a
system. At least not for me. But that's fine, I have something that
works (numeric), and I can do what I need to do there.


Absolutely, that's the advantage of open source. If the world moves
ahead, you don't *have* to. It's entirely your choice. There is no lock-in.


-Travis
 
sturlamolden

Given the quality of python's (free) documentation and how good it's
been for a very long time, it's a bit ironic to be using the phrase
"normal open-source documentation" on this mailing list. Numeric
python, which numpy aspires to be a replacement for, has perfectly
reasonable documentation. It wasn't perfect, but it told you pretty
much everything you needed to know to get started, use the system, and
build extension modules. I guess this set my expectations for NumPy.


NumPy is perhaps the most well-thought-out array object known to man. That
includes those of Matlab, Fortran 95 and C++ libraries (e.g. Blitz++).
I don't think we should be modest about the quality of NumPy. NumPy
allows us to do serious number crunching in a well-designed language -
Python. Scientists pay thousands of dollars for software that is not
on par with NumPy, and we get NumPy for free.

Those involved in the development of NumPy must receive some
compensation. Financial support for NumPy also ensures that the
development can continue. I for one do not want to see NumPy become
abandonware in the near future. Unfortunately, getting scientists to
make voluntary financial contributions to an open-source project has
proven difficult. A modest charge for the documentation is a fair way
of doing things. I cannot be reimbursed by my employer for making a
donation to NumPy, so I would have to take that out of my own pocket. I
can be reimbursed for buying a copy of the documentation, however.
 
Istvan Albert

sturlamolden said:
Those involved in the development of NumPy must receive some
compensation. Financial support to NumPy also ensures that the
development can continue. I for one do not want to see NumPy as

Then charge for NumPy ... or write a book *besides* the documentation.
One in which you make good use of NumPy and demonstrate the actual
problem solving process.

Charging for docs is just shooting yourself in the foot.

Plus, that so-called documentation seems very unwieldy and unattractive:
long-winded text, you can't search it, Google won't index it -> people
won't find what they are looking for.

I.
 
Scott David Daniels

Istvan said:
Then charge for NumPy ... or write a book *besides* ....
Charging for docs is just shooting yourself in the foot.

You overlook the fact that this pricing model means that a developer
can produce some software, and freely deliver that software (with all
needed libraries) to a customer at a price that needn't be so high as
to make each of the programmer's customers pay as if they are using
the full library.

--Scott David Daniels
(e-mail address removed)
 
Ramon Diaz-Uriarte

Then charge for NumPy ... or write a book *besides* the documentation.
One in which you make good use of NumPy and demonstrate the actual
problem solving process.

Charging for docs is just shooting yourself in the foot.

I beg to disagree with you (even if I'd rather have the docs for free):

1. You have NumPy available, so you can use it. Paying for software
is, for many reasons, and for many of us, an absolute show stopper.

2. For many people the "for free" docs and help are enough.

3. As already said, this mechanism allows some people to make a
contribution that would otherwise be impossible. For instance, if you
use NumPy in your job, and your employer (be that a private business
or the public sector if you are paid with tax money) benefits from it,
how can you return that back? Ask that the book be bought. Moreover,
even if, say, I might be willing to pay for the book, maybe my grad
students can't; I might be able to use grant money to purchase a copy
for a grad student.

(It is actually interesting that in the R help mailing list there are
from time to time suggestions that CDs with R ---even if R is GPL'd
software that you can download from the web--- be sellable by the R
foundation, as a possible way to allow employers to make a
contribution to the R project.)

Plus that so called documentation seems very unwieldy and unattractive,

Attractiveness is in the eye of the beholder, I'd say. A lot of it looks
like classical good-looking LaTeX to me.

long winded text, you can't search it, google won't index it -> people
won't find what they are looking for.

Many textbooks and reference books are not indexed by google. Yet we
pay for them and we use them. These are inconveniences, not fatal
blows.


Best,

R.


--
Ramon Diaz-Uriarte
Computational Statistics Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
 
Fernando Perez

After using numeric for almost ten years, I decided to attempt to
switch a large codebase (python and C++) to using numpy. Here are
some comments about how that went.

- The code to automatically switch python stuff over just kind of
works. But it was a 90% solution, I could do the rest by hand. Of
course, the problem is that then the code is still using the old
numeric API, so it's not a long term solution. Unfortunately, to switch
to the numpy API one needs documentation, which is a problem; see
below.

[ RNG issues, already addressed ]

- My extension modules just won't build because the new numpy stuff
lives in a different location from where Numeric used to live. I
probably could fix this, but without the documentation I can't figure
out how to do that. I'd also need to figure out how to port my code to
use the new numpy API instead of the compatibility layer, but I can't
do that without docs either.

I have to call bull on this. I happen to have the for-pay book, but I
recently converted a large codebase from Numeric to Numpy, and I actually
never had to open the book. Travis has made sure that the
compatibility-related information is easy to find:

http://www.scipy.org/Converting_from_Numeric

has most of what you need, and this page:

http://www.tramy.us/guidetoscipy.html

contains a link to the first two chapters FOR FREE. A lot of what's needed
for a port is covered in there.

In addition, the C API is available here (as well as being obviously in the
source, which you have access to):

http://www.scipy.org/NumPyCapi

And the doc page has many more useful links:

http://www.scipy.org/Documentation

So yes, you have to buy the book. Travis has sunk over a year of his time
into an absolutely herculean effort that provides working scientists with
tools that are better than anything money can buy (yes, I have access to
both Matlab and IDL, and you can't pay me enough to use them instead of
Python).

And he has the gall to ask for some money for a 300 page book? How dare he,
when one can walk into any Barnes and Noble and just walk out of the store
with a cart full of books for free!

It's funny how I don't see anyone complaining about any of the Python books
sold here (or at any other publishing house):

http://www.oreilly.com/pub/topic/python

I recently was very happy to pay for the Twisted book, since I need Twisted
for a project and a well-organized book is a good complement to the
auto-generated documentation.


And finally, if you had porting problems, none were reported on any of the
numpy/scipy mailing lists (else you used a different name or email, since I
can't find traces of queries from you in my gmail archive where I keep
everything posted on all the lists):

http://www.scipy.org/Mailing_Lists

Lots of people have been porting their codes recently, and inevitably some
have run into difficulties. EVERY single time when they actually say
something on the list, Travis (and others as well) is very fast with
specific help on exactly how to solve the problems, or with bug fixes when
the problem happens to be a numpy bug discovered by the user. But don't
take my word for it:

http://sourceforge.net/mailarchive/forum.php?thread_id=30703688&forum_id=4890

(and Francesc has found some really nasty things, given how pytables pushes
numpy far beyond where Numeric ever went, so this is not a light
compliment).

Look, I'm sure you had issues with your code, we all have. But I want to
make sure that others don't take from your message the wrong impression
regarding numpy, its future, its quality as a scientific computing
platform, or Travis (I'd build the man a statue if I could :).

The environment which is developing around Python for scientific computing
is nothing short of remarkable. If you find issues in the process, ask for
help on the numpy/scipy lists and I'm sure you will receive some. But
please refrain from spreading FUD.

Regards,

f
 
Istvan Albert

Fernando said:
It's funny how I don't see anyone complaining about any of the Python books
sold here (or at any other publishing house):

That is maybe because the language is fairly well documented to begin
with. Try to imagine for a moment how many people would use Python if
on the first page of documentation you'd see a link sending you to buy
the book...

No one is questioning one's right to try to sell a product/book etc.
But I happen to believe that trying to make money by selling the docs
is stupid, you'll scare away potential users, hinder the acceptance of
the product, further fragment the community of users needing such
functionality. Once I hit the page asking me to pay even before telling
me what NumPy does, I went back to Numeric. Even now I can't tell in
what way NumPy is different from Numeric or Numarray (I understand
that implementation-wise it is different).

In the past I have donated to several open source projects (and
individual developers) whose work saved me a lot of time, I don't mind
the cost of it. But I just can't see doing the same in a "pay it
forward" fashion.

I wish the author all the best, with the remark that in this day and
age with the pervasive need for large scale management/analysis he
would/could make a lot more money by fostering relationships with
research groups (private/university) and selling his scientific
computing (support) expertise rather than a pdf file.

That's all.
 
Robert Kern

Istvan said:
No one is questioning one's right to try to sell a product/book etc.
But I happen to believe that trying to make money by selling the docs
is stupid, you'll scare away potential users, hinder the acceptance of
the product, further fragment the community of users needing such
functionality. Once I hit the page asking me to pay even before telling
me what NumPy does, I went back to Numeric. Even now I can't tell in
what way is NumPy different from Numeric or Numarray (I understand
that implementation wise it is different).

Really? This link is made before any mention of the book on numpy.scipy.org:

http://numpy.scipy.org/new_features.html

I wish the author all the best, with the remark that in this day and
age with the pervasive need for large scale management/analysis he
would/could make a lot more money by fostering relationships with
research groups (private/university) and selling his scientific
computing (support) expertise rather than a pdf file.

Since he is working towards tenure, working on numpy and the book is about as
much as his tenure committee will tolerate, and that grudgingly. Selling
consulting services in any significant amount is probably out of the question.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Robert Kern

Istvan said:
No one is questioning one's right to try to sell a product/book etc.
But I happen to believe that trying to make money by selling the docs
is stupid, you'll scare away potential users, hinder the acceptance of
the product, further fragment the community of users needing such
functionality. Once I hit the page asking me to pay even before telling
me what NumPy does, I went back to Numeric. Even now I can't tell in
what way is NumPy different from Numeric or Numarray (I understand
that implementation wise it is different).

Really?

http://numpy.scipy.org/new_features.html

Okay, I guess this link is indeed a few whole sentences after the mention of the
book. And also after the link to the sample chapters which, lo and behold,
contain even more detailed information about the changes as well as instructions
on how to port Numeric and numarray code to numpy.

http://numpy.scipy.org/numpybooksample.pdf

I wish the author all the best, with the remark that in this day and
age with the pervasive need for large scale management/analysis he
would/could make a lot more money by fostering relationships with
research groups (private/university) and selling his scientific
computing (support) expertise rather than a pdf file.

Since he is working towards tenure, working on numpy and the book is about as
much as his tenure committee will tolerate, and that grudgingly. Selling
consulting services in any significant amount is probably out of the question.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Robert Kern

Apologies for the dupe. It looked like something went wrong with the first send
(and the first post was partly incorrect to begin with).

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
