squeeze out some performance

  • Thread starter Robert Voigtländer
  • Start date
Robert Voigtländer

Hi,

I'm trying to squeeze some more performance out of the code pasted at the link below.
http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set it up to calculate all coordinates in a sonar cone without heavy use of trigonometry (on the assumption that this is faster in the end).

I've optimized as much as I could. Maybe one of you has another bright idea to squeeze out a bit more?

Thanks
Robert
 
Jeremy Sanders

Robert said:
I'm trying to squeeze some more performance out of the code pasted at the
link below. http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set
it up to calculate all coordinates in a sonar cone without heavy use of
trigonometry (on the assumption that this is faster in the end).

I've optimized as much as I could. Maybe one of you has another bright idea
to squeeze out a bit more?

This sort of code is hard to make much faster in pure Python. You could
try profiling it to see where the hot spots are. Perhaps the choice of
arrays or sets might have some speed impact.

One idea would be to use something like Cython to compile your Python code
to an extension module, with some hints about the types of the various values.

I would go down the geometry route. If you can restate your problem in terms
of geometry, it might be possible to replace all that code with a few numpy
array operations.

e.g. for finding the pixels inside a circle of radius 50, centred at (50, 50):

    import numpy as np
    radiussqd = np.fromfunction(lambda y, x: (y - 50)**2 + (x - 50)**2, (100, 100))
    all_y, all_x = np.indices((100, 100))
    yvals = all_y[radiussqd < 50**2]
    xvals = all_x[radiussqd < 50**2]
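Extending that to a cone only needs one more boolean mask, for the angle. A rough sketch, with a centre, radius and angular limits I've made up here (they're not taken from your code):

```python
import numpy as np

def cone_mask(shape, cy, cx, radius, ang_lo, ang_hi):
    """Boolean mask of the grid points inside a circular sector.

    ang_lo/ang_hi are radians in numpy's arctan2 convention
    (0 along +x, increasing counter-clockwise).
    """
    y, x = np.indices(shape)
    dy, dx = y - cy, x - cx
    in_circle = dy * dy + dx * dx <= radius * radius
    ang = np.arctan2(dy, dx)
    return in_circle & (ang >= ang_lo) & (ang <= ang_hi)

# all points within 40 cells of (50, 50), in a 45-degree wedge
mask = cone_mask((100, 100), 50, 50, 40, 0.0, np.pi / 4)
ys, xs = np.nonzero(mask)
```

All of the looping happens inside numpy, so this tends to beat an equivalent pure-Python double loop by a wide margin.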

Jeremy
 
Chris Angelico

This sort of code is hard to make much faster in pure Python. You could
try profiling it to see where the hot spots are. Perhaps the choice of
arrays or sets might have some speed impact.

I'd make this recommendation MUCH stronger.

Rule 1 of optimization: Don't.
Rule 2 (for experts only): Don't yet.

Once you find that your program actually is running too slowly, then
AND ONLY THEN do you start looking at tightening something up. You'll
be amazed how little you need to change; start with good clean
idiomatic code, and then if it takes too long, you tweak just a couple
of things and it's fast enough. And when you do come to the
tweaking...

Rule 3: Measure twice, cut once.
Rule 4: Actually, measure twenty times, cut once.

Profile your code to find out what's actually slow. This is very
important. Here's an example from a real application (not in Python,
it's in a semantically-similar language called Pike):

https://github.com/Rosuav/Gypsum/blob/d9907e1507c52189c83ae25f5d7be85235b616fa/window.pike

I noticed that I could saturate one CPU core by typing commands very
quickly. Okay. That gets us past the first two rules (it's a MUD
client, it should not be able to saturate one core of an i5). The code
looks roughly like this:

    paint():
        for line in lines:
            if line_is_visible:
                paint_line(line)

    paint_line():
        for piece_of_text in text:
            if highlighted: draw_highlighted()
            else: draw_not_highlighted()

My first guess was that the actual drawing was taking the time, since
that's a whole lot of GTK calls. But no; the actual problem was the
iteration across all lines and then finding out if they're visible or
not (possibly because it obliterates the CPU caches). Once the
scrollback got to a million lines or so, that was prohibitively
expensive. I didn't realize that until I actually profiled the code
and _measured_ where the time was being spent.

How fast does your code run? How fast do you need it to run? Lots of
optimization questions are answered by "Yaknow what, it don't even
matter", unless you're running in a tight loop, or on a
microcontroller, or something. Halving the time taken sounds great
until you see that it's currently taking 0.0001 seconds and happens in
response to user action.
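In Python, the measuring part is only a few lines with the standard library; `hot()` below is just a stand-in for whatever you suspect is slow:

```python
import cProfile
import io
import pstats

def hot(n):
    # stand-in for the code path you suspect is slow
    return sum(i * i for i in range(n))

prof = cProfile.Profile()
prof.enable()
hot(100000)
prof.disable()

out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)  # shows where the time actually went, not where you guessed
```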

ChrisA
 
Robert Voigtländer

Thanks for your replies.

I already did some basic profiling and optimized a lot. Especially with help of a goof python performance tips list I found.

I think I'll follow the Cython path.
The geometry approach also sounds good, but it's way above my math/geometry knowledge.

Thanks for your input!
 
Mark Lawrence

Thanks for your replies.

I already did some basic profiling and optimized a lot. Especially > with help of a goof python performance tips list I found.

Wonderful typo -----^ :)
 
Robert Voigtländer

On Friday, 6 December 2013 at 17:36:03 UTC+1, Mark Lawrence wrote:
Wonderful typo -----^ :)

Oh well :) ... it was a good one. Just had a quick look at Cython. Looks great. Thanks for the tip.
 
John Ladasky

I'm trying to squeeze some more performance out of the code pasted at the link below.
http://pastebin.com/gMnqprST

Several comments:

1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code. Look at lines 53-80, and lines 108-287, and lines 294-311. It makes it harder to see what this algorithm actually does. Is there a way to refactor some of this code to use some shared function calls?

2) I looked up the "Bresenham algorithm", and found two references which may be relevant. The original algorithm was one which computed good raster approximations to straight lines. The second algorithm described may be more pertinent to you, because it draws arcs of circles.

http://en.wikipedia.org/wiki/Bresenham's_line_algorithm
http://en.wikipedia.org/wiki/Midpoint_circle_algorithm

Both of these algorithms are old, dating from the 1960s, and can be implemented using very simple CPU register operations and minimal memory. Both of the web pages I referenced have extensive example code and pseudocode, and discuss optimization. If you need speed, is this really a job for Python?
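To give a flavour of how simple they are, here is a sketch of the midpoint circle algorithm's integer-only core in Python (the textbook version, not the modified one in the pastebin):

```python
def circle_points(cx, cy, r):
    """Integer points on a circle of radius r centred at (cx, cy),
    computed with additions and comparisons only (midpoint algorithm)."""
    pts = set()
    x, y, err = r, 0, 1 - r
    while x >= y:
        # one octant is computed; symmetry gives the other seven
        for dx, dy in ((x, y), (y, x), (-y, x), (-x, y),
                       (-x, -y), (-y, -x), (y, -x), (x, -y)):
            pts.add((cx + dx, cy + dy))
        y += 1
        if err < 0:
            err += 2 * y + 1
        else:
            x -= 1
            err += 2 * (y - x) + 1
    return pts
```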

3) I THINK that I see some code -- those duplicated parts -- which might benefit from the use of multiprocessing (assuming that you have a multi-core CPU). But I would have to read more deeply to be sure. I need to understand the algorithm more completely, and exactly how you have modified it for your needs.
 
Joel Goldstick

Not that this will speed up your code, but you have this:

    if not clockwise:
        s = start
        start = end
        end = s

Python people would write:

    end, start = start, end

You have quite a few if statements that involve multiple comparisons of the
same variable. Did you know you can do things like this in python:
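For example, a pair of bounds checks on the same variable collapses into one chained comparison (`x` and `width` are invented names here):

```python
x, width = 5, 10

# instead of:  x >= 0 and x < width
inside = 0 <= x < width  # x is evaluated only once
print(inside)
```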
 
Mark Lawrence

Several comments:

1) I find this program to be very difficult to read, largely because there's a whole LOT of duplicated code. Look at lines 53-80, and lines 108-287, and lines 294-311. It makes it harder to see what this algorithm actually does. Is there a way to refactor some of this code to use some shared function calls?

There's a handy tool for detecting duplicated code at
http://clonedigger.sourceforge.net/ for anyone who's interested.
 
Robert Voigtländer

On Saturday, 7 December 2013 at 00:01:49 UTC+1, Dan Stromberg wrote:

Mark Lawrence said:
A handy tool for detecting duplicated code here http://clonedigger.sourceforge.net/ for anyone who's interested.

Pylint does this too...

Thanks again. I'll try to compress the code and have a look at the "multiple comparisons" topic.

Robert
 
Robin Becker

On 06/12/2013 22:07, Joel Goldstick wrote:
...........
Not that this will speed up your code but you have this:

    if not clockwise:
        s = start
        start = end
        end = s

Python people would write:

    end, start = start, end

This works for small numbers of variables, but on my machine with Python 2.7
I start losing with 4 variables, e.g.
C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "a,b,c,d=b,c,d,a"
1000000 loops, best of 3: 0.206 usec per loop

C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4" "t=a;a=b;b=c;c=d;d=t"
10000000 loops, best of 3: 0.118 usec per loop


It doesn't seem to make much difference that the variables are related, as I see
similar behaviour for simple assignments:
C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a,b,c,d=e,f,g,h"
1000000 loops, best of 3: 0.204 usec per loop

C:\code\optichrome\74663>python -mtimeit -s"a=1;b=2;c=3;d=4;e=5;f=6;g=7;h=8" "a=e;b=f;c=g;d=h"
10000000 loops, best of 3: 0.103 usec per loop

For fewer than 4 variables the tuple method is faster.
 
Mark Lawrence

Hi,

I'm trying to squeeze some more performance out of the code pasted at the link below.
http://pastebin.com/gMnqprST

The code will be used to continuously analyze sonar sensor data. I set it up to calculate all coordinates in a sonar cone without heavy use of trigonometry (on the assumption that this is faster in the end).

I've optimized as much as I could. Maybe one of you has another bright idea to squeeze out a bit more?

Thanks
Robert

Actually for optimised code it looks very similar to some code posted
here
http://www.daniweb.com/software-dev.../321181/python-bresenham-circle-arc-algorithm
over three years ago.
 
Robin Becker

What does speed have to do with it? When you want to swap two variables, the
tuple assignment reads better.
Well the OP is asking about performance so I guess the look and feel might be
sacrificed for speed in some circumstances.

The tuple approach is more appealing when the lhs & rhs are connected, but it
seems that even for more general assignments the tuple assignment may be faster
for small numbers of variables. Looking at the output of dis for this case

    d, e, f = c, b, a

it seems that we get code like this:

    LOAD_NAME    3 (c)
    LOAD_NAME    2 (b)
    LOAD_NAME    1 (a)
    ROT_THREE
    ROT_TWO
    STORE_NAME   4 (d)
    STORE_NAME   5 (e)
    STORE_NAME   6 (f)

for

    d = c
    e = b
    f = a

we get this:

    LOAD_NAME    3 (c)
    STORE_NAME   4 (d)
    LOAD_NAME    2 (b)
    STORE_NAME   5 (e)
    LOAD_NAME    1 (a)
    STORE_NAME   6 (f)

which is not obviously slower, but I consistently get the former to be faster. I
suppose it's a cache issue or something. However, for this particular case when
the variables are not connected I find the tuple assignment less pythonic, but
perhaps faster (even though in this case no tuples are built).
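The listings above came from the dis module on Python 2.7; anyone can reproduce them (opcode names differ between CPython versions, but the shape of the comparison survives):

```python
import dis

code = compile("d, e, f = c, b, a", "<swap>", "exec")
dis.dis(code)  # prints the LOAD_NAME/STORE_NAME listing for this version

ops = [ins.opname for ins in dis.get_instructions(code)]
```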
 
