Pyrex speed

Jim Lewis

Has anyone found a good link on exactly how to speed up code using
pyrex? I found various info but the focus is usually not on code
speedup.
 
Diez B. Roggisch

Jim said:
> Has anyone found a good link on exactly how to speed up code using
> pyrex? I found various info but the focus is usually not on code
> speedup.

The speedup comes from Pyrex compiling to C itself, and from using it to
put a thin layer over C functions that already exist or are coded for
that purpose.

Diez
 
Jim Lewis

I'm not planning to write C functions. My understanding is that by
using cdefs in the python code one can gain substantial speed. I'm
trying to find a description of how to modify python code in more
detail so it runs fast under pyrex.
 
Simon Percivall

You can gain substantial speed-ups in certain specific cases, but the
main point of Pyrex is ease of wrapping, not speeding up.

Depending on what you're doing, rewriting in Pyrex or even in C, using
the Python/C API directly, might not gain you much.
 
Jim Lewis

> main point of Pyrex is ease of wrapping, not of speeding-up.

Supposedly the primes example is 50 times faster.
 
Jarek Zgoda

Jim Lewis wrote:
> Supposedly the primes example is 50 times faster.

How often do you perform prime calculations in your programs? In my more
than 10 years of professional career writing business software, I never
had an opportunity to do any math more sophisticated than simple adding,
multiplying, subtracting and dividing.
 
Jim Lewis

> I never had an opportunity to do any more sophisticated math than simple adding,
> multiplying, subtracting and dividing.

The primes example doesn't do anything more sophisticated than basic
arithmetic either, but it's 50 times faster.
 
Graham Breed

Jim Lewis wrote:
> I'm not planning to write C functions. My understanding is that by
> using cdefs in the python code one can gain substantial speed. I'm
> trying to find a description of how to modify python code in more
> detail so it runs fast under pyrex.

I've used pyrex to speed up my code. It worked. While it isn't
intended as a tutorial on pyrex you can have a look at it here:

http://www.microtonal.co.uk/temper.html

The trick is to write C functions using pyrex. That's not much easier
than writing C functions in C. But I still found it convenient enough
to be worth doing that way. Some tips:

- declare functions with cdef

- declare the type of every variable you use

- don't use Python builtins, or other libraries

The point of these rules is that generated C code using Python
variables will still be slow. You want Pyrex to write C code using C
variables only. To check this is happening you can look at the
automatically generated source code to make sure there are no reference
counting functions where there shouldn't be.

The usual rule for C optimization applies -- rewrite the code that
you're spending most time in. But if that innermost function's being
called from a loop it can be worth changing the loop as well so that
you pass in and out C variables.
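
Finding "the code that you're spending most time in" can be done before
touching Pyrex at all. The following is a minimal sketch using the
standard cProfile and pstats modules; the functions inner, outer and
hottest_functions are hypothetical names invented for the illustration,
not anything from this thread.

```python
import cProfile
import io
import pstats


def inner(n):
    # Hypothetical hot spot: a tight arithmetic loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer():
    # Hypothetical caller that invokes the hot spot from a loop.
    return [inner(2000) for _ in range(200)]


def hottest_functions(func, limit=5):
    """Profile func() and return a text report sorted by time spent
    inside each function itself (excluding the functions it calls)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(limit)
    return stream.getvalue()


report = hottest_functions(outer)
print(report)
```

The functions at the top of such a report, plus the loop that calls
them, are the candidates for rewriting with cdef and typed variables.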

HTH,

Graham
 
Gonzalo Monzón

Hi Jim,

It depends a lot on what you're doing. You will get a speedup from Pyrex
or from wrapping C code if you understand how it works internally. If
you want to speed up your application by coding parts in Pyrex *only* (I
mean implementing in Pyrex rather than using it to wrap C), that limits
a lot what you can expect to get faster, because for some kinds of
things you can even get better performance from straight Python than
from Pyrex code converted to C and compiled. I think you need to know
how Python works on the C side to understand this fully.

I attach some examples of different code where C is a lot faster, or
just a little bit faster (I compare with C counterparts, not Pyrex ones;
Pyrex is only used for wrapping in these examples), so you can get an
idea of why it depends a lot on what you're doing. If you plan to use
only cdefs to speed up Python code, you're very limited in the things
that can be sped up. Try some experiments and examine the C code
generated by Pyrex, and you will see why: Pyrex makes Python C API calls
to convert Python objects to C values every time you use such a
variable, and that's not a great gain. Some operations can even be
worse, as Python does a better job than the C code generated by Pyrex
for some operations or value conversions (for example, I remember
reading in some paper that for operations on some kinds of iterable
objects Pyrex does not translate to the faster C approach).

Some days ago I posted some timing results for a function coded in
Python, or coded in C and wrapped by Pyrex. The C approach was more than
80 times faster. Below I attach another one, where C isn't much of a
gain (1 time faster).

Example A:
This code is more than 80 times faster than an "easy" Python
implementation. For every call, it does some bitwise operations and an
array lookup for every character of the string argument. It's a lot
faster because the Python approach does a list lookup, and a C array
lookup is a lot faster. Note that these C loops need no Python type
value conversions; if they did, the C approach would not be so much
faster than Python. I don't know how array-based Python code would
perform, but I expect it to be a lot faster than using a list, so Python
code can be sped up a lot if you know how to do it.

// C code:
int CRC16Table[256]; // Filled elsewhere
int CalcCRC16(char *str)
{
    int crc;

    for(crc = 0xFFFF; *str != 0; str++) {
        crc = CRC16Table [(( crc >> 8 ) & 255 )] ^ ( crc << 8 ) ^ *str;
    }

    return crc;
}

# Python code
gCRC16Table = [] # Filled elsewhere
def CalcCRC16(astr):
    crc = 0xFFFFL
    for c in astr:
        crc = (gCRC16Table[((crc >> 8) & 255)] ^ ((crc & 0xFFFFFF) << 8)
               ^ ord(c))
    return crc

-------------------------------------------------------------------------
Example B:
If we compare the functions below, the Python approach is only a bit
slower than the C implementation. I know neither is the fastest approach
in its language, but that's a different issue. C here is only about 1
time faster:

// C code. gTS type is struct { int m; int s; }
gTS gTS_diff(gTS t0, gTS t1) {
    gTS retval;

    retval.s = (t1.s-t0.s);
    if ((t0.m>t1.m)) {
        retval.m = (t1.m-t0.m);

        while((retval.m<0)) {
            retval.s = (retval.s-1);
            retval.m = (retval.m+1000);
        }
    } else {
        retval.m = (t1.m-t0.m);
    }

    while((retval.m>999)) {
        retval.m = (retval.m-1000);
        retval.s = (retval.s+1);
    }
    return retval;
}

# Python code (t0 and t1 are tuples)
def gts_diff(t0,t1):
    s = t1[0] - t0[0]
    if (t0[1] > t1[1]):
        m = t1[1] - t0[1]

        while m < 0:
            s = s - 1
            m = m + 1000
    else:
        m = t1[1] - t0[1]

    while m > 999:
        m = m - 1000
        s = s + 1
    return s, m
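
To judge whether a function like this is worth moving to C, its cost per
call can be measured first. The following is a rough sketch with the
standard timeit module. It condenses gts_diff (both branches of the
original compute the same m), reads the tuples as (seconds,
milliseconds), which is an assumption based on the variable names, and
uses a hypothetical helper seconds_per_call.

```python
import timeit


def gts_diff(t0, t1):
    """Difference of two (s, m) tuples, condensed from Example B."""
    s = t1[0] - t0[0]
    m = t1[1] - t0[1]
    while m < 0:        # borrow from s when m went negative
        s -= 1
        m += 1000
    while m > 999:      # carry into s when m overflowed
        m -= 1000
        s += 1
    return s, m


def seconds_per_call(func, args, number=100000):
    """Average wall-clock seconds for one call of func(*args)."""
    total = timeit.timeit(lambda: func(*args), number=number)
    return total / number


cost = seconds_per_call(gts_diff, ((10, 200), (12, 100)))
print("gts_diff: %.3e s per call" % cost)
```

The figure includes the overhead of the lambda and the call itself,
which is exactly the kind of cost that remains when a small C function
is called from Python.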


I encourage you to google for some Pyrex papers on the net; they explain
the dos and don'ts of Pyrex. Sorry, but I don't have the URLs.

Regards,
Gonzalo

 
John Machin

On 28/05/2006 12:10 AM, Gonzalo Monzón wrote:

[good advice snipped]
> Example A:
> This code is more than 80 times faster than an "easy" Python
> implementation. For every call, it does some bitwise operations and an
> array lookup for every character of the string argument. It's a lot
> faster because the Python approach does a list lookup, and a C array
> lookup is a lot faster. Note that these C loops need no Python type
> value conversions; if they did, the C approach would not be so much
> faster than Python. I don't know how array-based Python code would
> perform, but I expect it to be a lot faster than using a list, so Python
> code can be sped up a lot if you know how to do it.
>
> // C code:
> int CRC16Table[256]; // Filled elsewhere
> int CalcCRC16(char *str)
> {
>     int crc;
>     for(crc = 0xFFFF; *str != 0; str++) {
>         crc = CRC16Table [(( crc >> 8 ) & 255 )] ^ ( crc << 8 ) ^ *str;

Gonzalo, just in case there are any C compilers out there which need to
be told:
    for(crc = 0xFFFF; *str != 0;) {
        crc = CRC16Table [(( crc >> 8 ) & 255 )] ^ ( crc << 8 ) ^ *str++;

>     }
>     return crc;
> }
>
> # Python code
> gCRC16Table = [] # Filled elsewhere
> def CalcCRC16(astr):
>     crc = 0xFFFFL

Having that L on the end (plus the fact that you are pointlessly
maintaining "crc" as an *unsigned* 32-bit quantity) will be slowing the
calculation down -- Python will be doing it in long integers. You are
calculating a *sixteen bit* CRC! The whole algorithm can be written
simply so as to not need more than 16-bit registers, and not to pollute
high-order bits in 17-or-more-bit registers.
>     for c in astr:
>         crc = gCRC16Table[((crc >> 8) & 255)] ^ ((crc & 0xFFFFFF) << 8) ^ ord(c)

Note that *both* the C and Python routines still produce a 32-bit result
with 16 bits of high-order rubbish -- I got the impression from the
previous thread that you were going to fix that.

This Python routine never strays outside 16 bits, so avoiding your "&
255" and a final "& 0xFFFF" (which you don't have).

def CalcCRC16(astr):
    crc = 0xFFFF
    for c in astr:
        crc = gCRC16Table[crc >> 8] ^ ((crc & 0xFF) << 8) ^ ord(c)
    return crc
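
The claim that the narrower routine matches the low 16 bits of the wider
one can be checked directly. The sketch below is written for Python 3
(iterating over bytes yields integers, so ord() is not needed). The
table contents are an assumption, since the thread only says the table
is "filled elsewhere": here it is built from the common 0x1021
polynomial. The function names are hypothetical.

```python
def make_crc16_table(poly=0x1021):
    """Build a 256-entry table of 16-bit values.
    The 0x1021 polynomial is an assumption, not taken from the thread."""
    table = []
    for byte in range(256):
        reg = byte << 8
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ poly) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
        table.append(reg)
    return table


CRC16_TABLE = make_crc16_table()


def crc16_wide(data):
    """Update rule from the original post; high-order bits accumulate."""
    crc = 0xFFFF
    for byte in data:
        crc = CRC16_TABLE[(crc >> 8) & 255] ^ ((crc & 0xFFFFFF) << 8) ^ byte
    return crc


def crc16_narrow(data):
    """Update rule from the reply; the value never leaves 16 bits."""
    crc = 0xFFFF
    for byte in data:
        crc = CRC16_TABLE[crc >> 8] ^ ((crc & 0xFF) << 8) ^ byte
    return crc


sample = b"Pyrex speed"
print(hex(crc16_wide(sample)), hex(crc16_narrow(sample)))
```

Masking the wide result with 0xFFFF gives the narrow result, because the
low 16 bits of each step depend only on the low 16 bits of the previous
value as long as every table entry fits in 16 bits.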

==============
To the OP:

I'd just like to point out that C code and Pyrex code can gain
significantly (as the above example does) by not having to use ord() and
chr().

As Gonzalo says, read the generated C code. Look for other cases of
using Python built-ins that could be much faster with a minor bit of
effort in Pyrex e.g. "max(a, b)" -> "(a) > (b) ? (a) : (b) " or if you
don't like that, a cdef function to get the max of 2 ints will be *way*
faster than calling Python's max()
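
The cost of calling a built-in for a trivial comparison can be observed
even in plain Python, without Pyrex. The following is only a rough
sketch of that experiment, with hypothetical function names; it says
nothing about how large the gain is in generated C code, where the
benefit comes from working on C ints.

```python
import timeit


def clamp_builtin(values, floor):
    """Raise every value to at least floor using the max() built-in."""
    return [max(v, floor) for v in values]


def clamp_inline(values, floor):
    """Same result with an inline comparison instead of a function call."""
    return [v if v > floor else floor for v in values]


data = list(range(-500, 500))
t_builtin = timeit.timeit(lambda: clamp_builtin(data, 0), number=200)
t_inline = timeit.timeit(lambda: clamp_inline(data, 0), number=200)
print("max(): %.4f s, inline: %.4f s" % (t_builtin, t_inline))
```

Both functions return the same list, so the timings compare only the
cost of the comparison itself.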
 
Gonzalo Monzón

Hi John,

John Machin wrote:
> On 28/05/2006 12:10 AM, Gonzalo Monzón wrote:
>
> [good advice snipped]
>
>> Example A:
>> This code is more than 80 times faster than an "easy" Python
>> implementation. For every call, it does some bitwise operations and an
>> array lookup for every character of the string argument. It's a lot
>> faster because the Python approach does a list lookup, and a C array
>> lookup is a lot faster. Note that these C loops need no Python type
>> value conversions; if they did, the C approach would not be so much
>> faster than Python. I don't know how array-based Python code would
>> perform, but I expect it to be a lot faster than using a list, so Python
>> code can be sped up a lot if you know how to do it.
>>
>> // C code:
>> int CRC16Table[256]; // Filled elsewhere
>> int CalcCRC16(char *str)
>> {
>>     int crc;
>>     for(crc = 0xFFFF; *str != 0; str++) {
>>         crc = CRC16Table [(( crc >> 8 ) & 255 )] ^ ( crc << 8 ) ^ *str;
>
> Gonzalo, just in case there are any C compilers out there which need to
> be told:
>     for(crc = 0xFFFF; *str != 0;) {
>         crc = CRC16Table [(( crc >> 8 ) & 255 )] ^ ( crc << 8 ) ^ *str++;

Thank you for the advice! I didn't know that in some compilers you
couldn't advance the pointer in the for statement...
>>     }
>>     return crc;
>> }
>>
>> # Python code
>> gCRC16Table = [] # Filled elsewhere
>> def CalcCRC16(astr):
>>     crc = 0xFFFFL
>
> Having that L on the end (plus the fact that you are pointlessly
> maintaining "crc" as an *unsigned* 32-bit quantity) will be slowing the
> calculation down -- Python will be doing it in long integers. You are
> calculating a *sixteen bit* CRC! The whole algorithm can be written
> simply so as to not need more than 16-bit registers, and not to pollute
> high-order bits in 17-or-more-bit registers.

Yes, I know, but I wanted to post a quick example for Jim, and grabbed
the first file from several versions... :) The point was for Jim to
understand how some code can be sped up a lot and other code cannot, and
how that's not a trivial question.
>>     for c in astr:
>>         crc = gCRC16Table[((crc >> 8) & 255)] ^ ((crc & 0xFFFFFF) << 8) ^ ord(c)
>
> Note that *both* the C and Python routines still produce a 32-bit result
> with 16 bits of high-order rubbish -- I got the impression from the
> previous thread that you were going to fix that.

Yes, of course! I plan to spend some time on this issue. Last week I
didn't have much time to work on it, but I thought it was worth the pain
to set up a compiling environment (MS eVC++, obviously), and I
successfully compiled Python and some of my own custom Pyrex extensions
for the PocketPC. It was easy, only adding the C files to the makefile,
as the Pyrex glue code compiles well on ARM. So I have to make some
timings and decide which version to use for the code that isn't likely
to change for a long time. I still have to test the latest improved
Python array-based approach and make some timings on the PDA.

> This Python routine never strays outside 16 bits, so avoiding your "&
> 255" and a final "& 0xFFFF" (which you don't have).
>
> def CalcCRC16(astr):
>     crc = 0xFFFF
>     for c in astr:
>         crc = gCRC16Table[crc >> 8] ^ ((crc & 0xFF) << 8) ^ ord(c)
>     return crc

Thank you again for your thoughts John! :)

Regards,
Gonzalo
 
yairchu

The stuff you do is not representative of 100% of the programming
conducted in the world. Not even 90%, and probably not even 50%, of
programming work is similar to what you do.
The fact that you never use sophisticated math doesn't mean this guy
doesn't either.
Personally, I've used pyrex a lot. And it was never for wrapping -
always for speeding up.
 
