At said:
(e-mail address removed) (Christopher T King) wrote in
Yes but see I'm interested in what speed hacks can actually be done to
improve the above code. I just don't see anything that can iterate and add
over that memory region faster.
Well, numarray probably isn't faster for this case (adding a scalar to
a vector). In fact, the relevant numarray code looks like this:
    static int add_Float64_vector_scalar(long niter, long ninargs,
                                         long noutargs, void **buffers,
                                         long *bsizes) {
        long i;
        Float64 *tin1 = (Float64 *) buffers[0];
        Float64 tscalar = *(Float64 *) buffers[1];
        Float64 *tout = (Float64 *) buffers[2];
        for (i=0; i<niter; i++, tin1++, tout++) {
            *tout = *tin1 + tscalar;
        }
        return 0;
    }
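To make the calling convention concrete, here is a minimal sketch of how a routine with that signature could be driven from plain C. The typedef of Float64 as double, the add_scalar wrapper, and the values put in bsizes are my assumptions for illustration; they are not taken from the numarray sources, and the loop never reads bsizes anyway.

```c
#include <stddef.h>

typedef double Float64;  /* assumption: Float64 is a C double */

/* Same loop as the numarray routine quoted above. */
static int add_Float64_vector_scalar(long niter, long ninargs,
                                     long noutargs, void **buffers,
                                     long *bsizes) {
    long i;
    Float64 *tin1 = (Float64 *) buffers[0];
    Float64 tscalar = *(Float64 *) buffers[1];
    Float64 *tout = (Float64 *) buffers[2];
    (void) ninargs; (void) noutargs; (void) bsizes;  /* unused here */
    for (i = 0; i < niter; i++, tin1++, tout++) {
        *tout = *tin1 + tscalar;
    }
    return 0;
}

/* Hypothetical wrapper: packs the three buffers the routine expects
   (input vector, scalar, output vector) and calls it. */
static int add_scalar(Float64 *in, Float64 scalar, Float64 *out, long n) {
    void *buffers[3];
    long bsizes[3];
    buffers[0] = in;
    buffers[1] = &scalar;
    buffers[2] = out;
    /* Assumption: sizes in bytes; the routine above ignores them. */
    bsizes[0] = n * (long) sizeof(Float64);
    bsizes[1] = (long) sizeof(Float64);
    bsizes[2] = n * (long) sizeof(Float64);
    return add_Float64_vector_scalar(n, 2, 1, buffers, bsizes);
}
```

Calling add_scalar(in, 0.5, out, 3) on in = {1.0, 2.0, 3.0} fills out with each element plus 0.5. The inner loop is already about as tight as a straightforward C loop gets, which is the point: there is little left to hand-optimize.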
What you *do* get with numarray is:
1) transparent handling of byteswapped, misaligned, discontiguous,
type-mismatched data (say, from a memory-mapped file generated on a
system with a different byte order as single-precision instead of
double-precision).
2) ease-of-use. Those two lines of Python code above are _it_ (except
for an 'import numarray' statement). Your C code isn't anywhere
near complete enough to use. You would need to add routines to
read the file, etc.
3) interactive use. You can do all this at the Python command line. If
you want to multiply instead of add, an up-arrow and some editing
will do that. With C, you'd have to recompile.
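To give a feel for what point 1 saves you, here is a rough sketch of the extra C needed just for the byteswapped, possibly misaligned, single-precision case. The function name add_scalar_swapped_f32 is hypothetical, and the code assumes a 4-byte IEEE float and data stored in the opposite byte order from the host. It is not how numarray does it internally, only an illustration of the bookkeeping involved.

```c
#include <string.h>

/* Hypothetical helper: read n single-precision values stored in the
   opposite byte order from the host, promote each to double, and add
   a scalar. Reading byte by byte means raw may be misaligned.
   Assumption: sizeof(float) == 4. */
static void add_scalar_swapped_f32(const unsigned char *raw, long n,
                                   double scalar, double *out) {
    long i;
    for (i = 0; i < n; i++) {
        const unsigned char *p = raw + 4 * i;
        unsigned char native[4];
        float value;
        /* Reverse the four bytes to get host byte order. */
        native[0] = p[3];
        native[1] = p[2];
        native[2] = p[1];
        native[3] = p[0];
        /* memcpy avoids alignment and aliasing problems. */
        memcpy(&value, native, sizeof value);
        out[i] = (double) value + scalar;
    }
}
```

And that still covers only one combination of byte order, type, and layout; discontiguous (strided) data would need yet another variant.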
If you need the best possible speed (after doing it in numarray and
finding it isn't fast enough), you can write an extension module to
do that bit in C, or look into scipy.weave for inlining C code, or into
f2py for linking Fortran code to Python.