squeeze out some performance

  • Thread starter Robert Voigtländer
  • Start date
Robert Voigtländer

Hi,

I'm trying to squeeze some more performance out of the code pasted at the link below.
http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set it up to calculate all coordinates in a sonar cone without heavy use of trigonometry (on the assumption that this is faster in the end).

I've optimized as much as I could. Maybe one of you has another bright idea to squeeze out a bit more?

Thanks
Robert
 
Jeremy Sanders

Robert said:
I'm trying to squeeze some more performance out of the code pasted at the
link below. http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set
it up to calculate all coordinates in a sonar cone without heavy use of
trigonometry (on the assumption that this is faster in the end).

I've optimized as much as I could. Maybe one of you has another bright idea
to squeeze out a bit more?

This sort of code is hard to make much faster in pure Python. You could
try profiling it to see where the hot spots are. Perhaps the choice of
arrays or sets might have some speed impact.

One idea would be to use something like Cython to compile your Python code
to an extension module, with some hints about the types of the various values.

I would go down the geometry route. If you can restate your problem in terms
of geometry, it might be possible to replace all that code with a few numpy
array operations.

e.g. for finding the pixels inside a circle of radius 50, centred at (50, 50):

    import numpy as np
    radiussqd = np.fromfunction(lambda y, x: (y - 50)**2 + (x - 50)**2, (100, 100))
    all_y, all_x = np.indices((100, 100))
    yvals = all_y[radiussqd < 50**2]
    xvals = all_x[radiussqd < 50**2]
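Extending that to a cone only needs one more boolean mask, for the angle. A rough sketch, with a centre, radius and angular limits I've made up here (they're not taken from your code):

```python
import numpy as np

def cone_mask(shape, cy, cx, radius, ang_lo, ang_hi):
    """Boolean mask of the grid points inside a circular sector.

    ang_lo/ang_hi are radians in numpy's arctan2 convention
    (0 along +x, increasing counter-clockwise).
    """
    y, x = np.indices(shape)
    dy, dx = y - cy, x - cx
    in_circle = dy * dy + dx * dx <= radius * radius
    ang = np.arctan2(dy, dx)
    return in_circle & (ang >= ang_lo) & (ang <= ang_hi)

# all points within 40 cells of (50, 50), in a 45-degree wedge
mask = cone_mask((100, 100), 50, 50, 40, 0.0, np.pi / 4)
ys, xs = np.nonzero(mask)
```

All of the looping happens inside numpy, so this tends to beat an equivalent pure-Python double loop by a wide margin.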

Jeremy
 
Chris Angelico

This sort of code is hard to make much faster in pure Python. You could
try profiling it to see where the hot spots are. Perhaps the choice of
arrays or sets might have some speed impact.

I'd make this recommendation MUCH stronger.

Rule 1 of optimization: Don't.
Rule 2 (for experts only): Don't yet.

Once you find that your program actually is running too slowly, then
AND ONLY THEN do you start looking at tightening something up. You'll
be amazed how little you need to change; start with good clean
idiomatic code, and then if it takes too long, you tweak just a couple
of things and it's fast enough. And when you do come to the
tweaking...

Rule 3: Measure twice, cut once.
Rule 4: Actually, measure twenty times, cut once.

Profile your code to find out what's actually slow. This is very
important. Here's an example from a real application (not in Python,
it's in a semantically-similar language called Pike):

https://github.com/Rosuav/Gypsum/blob/d9907e1507c52189c83ae25f5d7be85235b616fa/window.pike

I noticed that I could saturate one CPU core by typing commands very
quickly. Okay. That gets us past the first two rules (it's a MUD
client, it should not be able to saturate one core of an i5). The code
looks roughly like this:

    paint():
        for line in lines:
            if line_is_visible:
                paint_line(line)

    paint_line():
        for piece_of_text in text:
            if highlighted: draw_highlighted()
            else: draw_not_highlighted()

My first guess was that the actual drawing was taking the time, since
that's a whole lot of GTK calls. But no; the actual problem was the
iteration across all lines and then finding out if they're visible or
not (possibly because it obliterates the CPU caches). Once the
scrollback got to a million lines or so, that was prohibitively
expensive. I didn't realize that until I actually profiled the code
and _measured_ where the time was being spent.

How fast does your code run? How fast do you need it to run? Lots of
optimization questions are answered by "Yaknow what, it don't even
matter", unless you're running in a tight loop, or on a
microcontroller, or something. Halving the time taken sounds great
until you see that it's currently taking 0.0001 seconds and happens in
response to user action.
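In Python, the measuring part is only a few lines with the standard library; `hot()` below is just a stand-in for whatever you suspect is slow:

```python
import cProfile
import io
import pstats

def hot(n):
    # stand-in for the code path you suspect is slow
    return sum(i * i for i in range(n))

prof = cProfile.Profile()
prof.enable()
hot(100000)
prof.disable()

out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)  # shows where the time actually went, not where you guessed
```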

ChrisA
 
Robert Voigtländer

Thanks for your replies.

I already did some basic profiling and optimized a lot. Especially with help of a goof python performance tips list I found.

I think I'll follow the Cython path.
The geometry approach also sounds good, but it's way above my math/geometry knowledge.

Thanks for your input!
 
Mark Lawrence

Thanks for your replies.

I already did some basic profiling and optimized a lot. Especially > with help of a goof python performance tips list I found.

Wonderful typo -----^ :)
 
Robert Voigtländer

On Friday, 6 December 2013 at 17:36:03 UTC+1, Mark Lawrence wrote:
Wonderful typo -----^ :)

Oh well :) ... it was a good one. Just had a quick look at Cython. Looks great. Thanks for the tip.
 
John Ladasky

I'm trying to squeeze some more performance out of the code pasted at the link below.
http://pastebin.com/gMnqprST

Several comments:

1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code. Look at lines 53-80, and lines 108-287, and lines 294-311. It makes it harder to see what this algorithm actually does. Is there a way to refactor some of this code to use some shared function calls?

2) I looked up the "Bresenham algorithm", and found two references which may be relevant. The original algorithm was one which computed good raster approximations to straight lines. The second algorithm described may be more pertinent to you, because it draws arcs of circles.

http://en.wikipedia.org/wiki/Bresenham's_line_algorithm
http://en.wikipedia.org/wiki/Midpoint_circle_algorithm

Both of these algorithms are old, dating from the 1960s, and can be implemented using very simple CPU register operations and minimal memory. Both of the web pages I referenced have extensive example code and pseudocode, and discuss optimization. If you need speed, is this really a job for Python?
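To give a flavour of how simple they are, here is a sketch of the midpoint circle algorithm's integer-only core in Python (the textbook version, not the modified one in the pastebin):

```python
def circle_points(cx, cy, r):
    """Integer points on a circle of radius r centred at (cx, cy),
    computed with additions and comparisons only (midpoint algorithm)."""
    pts = set()
    x, y, err = r, 0, 1 - r
    while x >= y:
        # one octant is computed; symmetry gives the other seven
        for dx, dy in ((x, y), (y, x), (-y, x), (-x, y),
                       (-x, -y), (-y, -x), (y, -x), (x, -y)):
            pts.add((cx + dx, cy + dy))
        y += 1
        if err < 0:
            err += 2 * y + 1
        else:
            x -= 1
            err += 2 * (y - x) + 1
    return pts
```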

3) I THINK that I see some code -- those duplicated parts -- which might benefit from the use of multiprocessing (assuming that you have a multi-core CPU). But I would have to read more deeply to be sure. I need to understand the algorithm more completely, and exactly how you have modified it for your needs.
 
Joel Goldstick

Not that this will speed up your code, but you have this:

    if not clockwise:
        s = start
        start = end
        end = s

Python people would write:

    end, start = start, end

You have quite a few if statements that involve multiple comparisons of the
same variable. Did you know you can do things like this in python:
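For example, a pair of bounds checks on the same variable collapses into one chained comparison (`x` and `width` are invented names here):

```python
x, width = 5, 10

# instead of:  x >= 0 and x < width
inside = 0 <= x < width  # x is evaluated only once
print(inside)
```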
 
Mark Lawrence

Several comments:

1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code. Look at lines 53-80, and lines 108-287, and lines 294-311. It makes it harder to see what this algorithm actually does. Is there a way to refactor some of this code to use some shared function calls?

There's a handy tool for detecting duplicated code at
http://clonedigger.sourceforge.net/ for anyone who's interested.
 
Robert Voigtländer

On Saturday, 7 December 2013 at 00:01:49 UTC+1, Dan Stromberg wrote:

Mark Lawrence said:
A handy tool for detecting duplicated code here http://clonedigger.sourceforge.net/ for anyone who's interested.

Pylint does this too...

Thanks again. I'll try to compress the code and have a look at the "multiple comparisons" topic.

Robert
 
Robin Becker

On 06/12/2013 22:07, Joel Goldstick wrote:
...........
Not that this will speed up your code but you have this:

    if not clockwise:
        s = start
        start = end
        end = s

Python people would write:

    end, start = start, end

This works for small numbers of variables, but on my machine with Python 2.7
I start losing with 4 variables, e.g.
C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "a,b,c,d=b,c,d,a"
1000000 loops, best of 3: 0.206 usec per loop

C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "t=a;a=b;b=c;c=d;d=t"
10000000 loops, best of 3: 0.118 usec per loop


It doesn't seem to make much difference that the variables are related, as I see
similar behaviour for simple assignments:
C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a,b,c,d=e,f,g,h"
1000000 loops, best of 3: 0.204 usec per loop

C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a=e;b=f;c=g;d=h"
10000000 loops, best of 3: 0.103 usec per loop

For fewer than 4 variables the tuple method is faster.
 
Mark Lawrence

Hi,

I'm trying to squeeze some more performance out of the code pasted at the link below.
http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set it up to calculate all coordinates in a sonar cone without heavy use of trigonometry (on the assumption that this is faster in the end).

I've optimized as much as I could. Maybe one of you has another bright idea to squeeze out a bit more?

Thanks
Robert

Actually for optimised code it looks very similar to some code posted
here
http://www.daniweb.com/software-dev.../321181/python-bresenham-circle-arc-algorithm
over three years ago.
 
Robin Becker

What does speed have to do with it? When you want to swap two variables, the
tuple assignment reads better.
Well the OP is asking about performance so I guess the look and feel might be
sacrificed for speed in some circumstances.

The tuple approach is more appealing when the lhs & rhs are connected, but it
seems that even for more general assignments the tuple assignment may be faster
for small numbers of variables. Looking at the output of dis for this case

    d, e, f = c, b, a

it seems that we get code like this:

    LOAD_NAME    3 (c)
    LOAD_NAME    2 (b)
    LOAD_NAME    1 (a)
    ROT_THREE
    ROT_TWO
    STORE_NAME   4 (d)
    STORE_NAME   5 (e)
    STORE_NAME   6 (f)

for

    d = c
    e = b
    f = a

we get this:

    LOAD_NAME    3 (c)
    STORE_NAME   4 (d)
    LOAD_NAME    2 (b)
    STORE_NAME   5 (e)
    LOAD_NAME    1 (a)
    STORE_NAME   6 (f)

which is not obviously slower, but I consistently get the former to be faster. I
suppose it's a cache issue or something. However, for this particular case when
the variables are not connected I find the tuple assignment less pythonic, but
perhaps faster (even though in this case no tuples are built).
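The listings above came from the dis module on Python 2.7; anyone can reproduce them (opcode names differ between CPython versions, but the shape of the comparison survives):

```python
import dis

code = compile("d, e, f = c, b, a", "<swap>", "exec")
dis.dis(code)  # prints the LOAD_NAME/STORE_NAME listing for this version

ops = [ins.opname for ins in dis.get_instructions(code)]
```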
 
