Writing more efficient code

G

gonzlobo

Greetings, and happyNewYear to all.

I picked up Python a few weeks ago, and have been able to parse large
files and process data pretty easily, but I believe my code isn't too
efficient. I'm hoping dictionaries will help out, but I'm not sure the
best way to implement them.

I've been using a bunch of nested if/elif/else statements to select
slices (0317 & 03de) from a file, then parse the data (aa, hh, bb,
d2-d9) into parameters (a = airspeed, h = heading) & flags.

#sample file contents
0000007d 03 0317 aa aa aa aa aa hh hh hh bb bb
0000007e 06 03de d2 d3 d4 d5 d6 d7 d8 d9 10 11

# some pseudo code
if PID == '03de':
flapsCmd = int(d3, 16)
if flapsCmd == 0xc0:
<flaps up code>
elif flapsCmd == 0x03:
<flaps down code>
if PID == '0317':
airspeed == 'combine aa for airspeed & multiply by 0.1'
heading == 'combine hh for heading'
mach == 'combine bb for mach & multiply by 0.01'

Might dictionaries help in this case... say Label0317(parameterName,
slice (d3), scaleFactor(0.1))... I'd like to use them if they'll
replace the dozens of nested conditionals. I have roughly 75
different parameters to decode from a file containing ~2.5 million
lines of data.

I know my pseudo code lacks details, but hopefully I'm getting my
point across...

(I suppose switch/select/case statements would help a bit, but python
doesn't seem to use them... not to start a religious war or anything).

Any help (or encouragement) is appreciated.
 
J

Jon Harrop

gonzlobo said:
I picked up Python a few weeks ago, and have been able to parse large
files and process data pretty easily, but I believe my code isn't too
efficient. I'm hoping dictionaries will help out, but I'm not sure the
best way to implement them.

I've been using a bunch of nested if/elif/else statements to select
slices (0317 & 03de) from a file, then parse the data (aa, hh, bb,
d2-d9) into parameters (a = airspeed, h = heading) & flags.

#sample file contents
0000007d 03 0317 aa aa aa aa aa hh hh hh bb bb
0000007e 06 03de d2 d3 d4 d5 d6 d7 d8 d9 10 11

# some pseudo code
if PID == '03de':
flapsCmd = int(d3, 16)
if flapsCmd == 0xc0:
<flaps up code>
elif flapsCmd == 0x03:
<flaps down code>
if PID == '0317':
airspeed == 'combine aa for airspeed & multiply by 0.1'
heading == 'combine hh for heading'
mach == 'combine bb for mach & multiply by 0.01'

Might dictionaries help in this case... say Label0317(parameterName,
slice (d3), scaleFactor(0.1))... I'd like to use them if they'll
replace the dozens of nested conditionals.

Sounds like you want pattern matching over lists:

| 0x03 :: 0xde :: 0xc0 :: t -> <flaps up code>
| 0x03 :: 0xde :: 0x03 :: t -> <flaps down code>
| 0x03 :: 0x17 :: t -> ...

That's OCaml code which finds patterns in sequences of bytes, just like the
ones you're searching for.
I have roughly 75
different parameters to decode from a file containing ~2.5 million
lines of data.

I know my pseudo code lacks details, but hopefully I'm getting my
point across...

(I suppose switch/select/case statements would help a bit, but python
doesn't seem to use them... not to start a religious war or anything).

Pattern matching is "switch" on steroids.

OCaml's pattern matcher is very sophisticated and very fast. You'll probably
shrink your code by several fold whilst also having it run a few orders of
magnitude faster and having it statically checked, so it will be more
reliable.

You might want to look at any language with pattern matching: OCaml, SML,
Haskell, F#, Mathematica etc. If you're running Windows then F# is a .NET
language...
 
B

bearophileHUGS

Jon Harrop:
OCaml's pattern matcher is very sophisticated and very fast. You'll probably
shrink your code by several fold whilst also having it run a few orders of
magnitude faster and having it statically checked, so it will be more
reliable.

You seem to forget some months of time to learn OCaml.
And my Python programs seem reliable enough despite being unstatically
checked :)

You might want to look at any language with pattern matching: OCaml, SML,
Haskell, F#, Mathematica etc.

Mathematica pattern matching is good, but Mathematica costs a lot of
money (and you need some time to learn it, it's not an easy system).

Bye,
bearophile
 
D

Diez B. Roggisch

gonzlobo said:
Greetings, and happyNewYear to all.

I picked up Python a few weeks ago, and have been able to parse large
files and process data pretty easily, but I believe my code isn't too
efficient. I'm hoping dictionaries will help out, but I'm not sure the
best way to implement them.

I've been using a bunch of nested if/elif/else statements to select
slices (0317 & 03de) from a file, then parse the data (aa, hh, bb,
d2-d9) into parameters (a = airspeed, h = heading) & flags.

#sample file contents
0000007d 03 0317 aa aa aa aa aa hh hh hh bb bb
0000007e 06 03de d2 d3 d4 d5 d6 d7 d8 d9 10 11

# some pseudo code
if PID == '03de':
flapsCmd = int(d3, 16)
if flapsCmd == 0xc0:
<flaps up code>
elif flapsCmd == 0x03:
<flaps down code>
if PID == '0317':
airspeed == 'combine aa for airspeed & multiply by 0.1'
heading == 'combine hh for heading'
mach == 'combine bb for mach & multiply by 0.01'

Might dictionaries help in this case... say Label0317(parameterName,
slice (d3), scaleFactor(0.1))... I'd like to use them if they'll
replace the dozens of nested conditionals. I have roughly 75
different parameters to decode from a file containing ~2.5 million
lines of data.

I know my pseudo code lacks details, but hopefully I'm getting my
point across...

(I suppose switch/select/case statements would help a bit, but python
doesn't seem to use them... not to start a religious war or anything).

The dictionary approach certainly will be speedier - lookup should
usually be O(1) instead of O(n) for the if-elif-chain.

Diez
 
J

Jon Harrop

Jon Harrop:

You seem to forget some months of time to learn OCaml.

I think most people could pick up the core ideas in a day and start writing
working programs. Just read:

http://www.ffconsultancy.com/free/ocaml
http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html
And my Python programs seem reliable enough despite being unstatically
checked :)

Damn. ;-)
Mathematica pattern matching is good, but Mathematica costs a lot of
money (and you need some time to learn it, it's not an easy system).

Mathematica is expensive but learning to use pattern matching is much easier
than learning how to write a pattern matcher and much less tedious than
reimplementing it yourself all the time (which is exactly what the OP will
end up doing).
 
P

Paul McGuire

gonzlobo said:
Greetings, and happyNewYear to all.

I picked up Python a few weeks ago, and have been able to parse large
files and process data pretty easily, but I believe my code isn't too
efficient. I'm hoping dictionaries will help out, but I'm not sure the
best way to implement them.

I've been using a bunch of nested if/elif/else statements to select
slices (0317 & 03de) from a file, then parse the data (aa, hh, bb,
d2-d9) into parameters (a = airspeed, h = heading) & flags.

#sample file contents
0000007d 03 0317 aa aa aa aa aa hh hh hh bb bb
0000007e 06 03de d2 d3 d4 d5 d6 d7 d8 d9 10 11

# some pseudo code
if PID == '03de':
flapsCmd = int(d3, 16)
if flapsCmd == 0xc0:
<flaps up code>
elif flapsCmd == 0x03:
<flaps down code>
if PID == '0317':
airspeed == 'combine aa for airspeed & multiply by 0.1'
heading == 'combine hh for heading'
mach == 'combine bb for mach & multiply by 0.01'

Might dictionaries help in this case... say Label0317(parameterName,
slice (d3), scaleFactor(0.1))... I'd like to use them if they'll
replace the dozens of nested conditionals. I have roughly 75
different parameters to decode from a file containing ~2.5 million
lines of data.

I know my pseudo code lacks details, but hopefully I'm getting my
point across...

(I suppose switch/select/case statements would help a bit, but python
doesn't seem to use them... not to start a religious war or anything).

Any help (or encouragement) is appreciated.

First, your data is so regularly formatted, I suspect you are using the
split() method of string to get at the pieces. Good, this keeps things
simple.

It *would* help if you posted actual data lines, instead of aa's, and bb's.
So I took a guess that these were hex representations of characters
representing decimal numbers.

Instead of switch/case, I selected a dict that creates a dispatch table,
selecting one of several object classes whose __init__ constructors are
written to pick apart the relevant fields from the 10-element list following
the 4-digit selection code. I then just created an object dumping method to
show what objects were created from what lines of data, and perhaps you can
take it from there.

-- Paul


test = """\
0000007d 03 0317 30 33 32 38 38 32 37 35 34 31
0000007e 06 03de af c0 fe d5 d6 d7 d8 d9 10 11
0000007e 06 junk af c0 fe d5 d6 d7 d8 d9 10 11
0000007f 06 03de af 03 fe d5 d6 d7 d8 d9 10 11"""

hex2int = lambda hh : int(hh,16)
class AirspeedHeadingMach(object):
def __init__(self,datalist):
self.airspeed = int("".join(map(chr,map(hex2int,datalist[:5]))))*0.1
self.heading = "".join(map(chr,map(hex2int,datalist[5:8])))
self.mach = int("".join(map(chr,map(hex2int,datalist[8:]))))*0.01

UP,DOWN = 0,1
class FlapsStatus(object):
def __init__(self,datalist):
self.flaps = { 0xc0:UP, 0x03:DOWN } [ int(datalist[1],16) ]

class UnknownRec(object):
def __init__(self,code,datalist):
self.code = code
self.datalist = datalist[:]

def dump(o):
print o.__class__.__name__
print o.__dict__
print

recCodeTypeMap = {
'0317' : AirspeedHeadingMach,
'03de' : FlapsStatus,
}
for line in test.split("\n"):
toks = line.split()
code = toks[2]
datalist = toks[3:]
if code in recCodeTypeMap:
dataObject = recCodeTypeMap
Code:
(datalist)
    else:
        dataObject = UnknownRec(code,datalist)
    dump(dataObject)

Prints:
AirspeedHeadingMach
{'mach': 0.41000000000000003, 'airspeed': 328.80000000000001, 'heading': 
'275'}

FlapsStatus
{'flaps': 0}

UnknownRec
{'datalist': ['af', 'c0', 'fe', 'd5', 'd6', 'd7', 'd8', 'd9', '10', '11'], 
'code': 'junk'}

FlapsStatus
{'flaps': 1}
 
B

bearophileHUGS

Jon Harrop:
I think most people could pick up the core ideas in a day and start writing
working programs.

Probably I am not that intelligent, I probably need some months :) But
that language has many good sides, and one day I'll probably try to
learn it a bit.

Mathematica is expensive but learning to use pattern matching is much easier
than learning how to write a pattern matcher and much less tedious than
reimplementing it yourself all the time (which is exactly what the OP will
end up doing).

I see. This is a very old post of mine, at the bottom there are few
notes about the Mathematica pattern matching syntax:
http://groups.google.com/group/comp.lang.python/msg/93ce3e9a08f5e4c7

To avoid reimplementing it yourself all the time then maybe someone
(you?) can try to write a good pattern matcher for sequences for
CPython. With such system it may become less important to switch to a
different language ;-)

Bye,
bearophile
 
J

John Machin

gonzlobo said:
Greetings, and happyNewYear to all.

I picked up Python a few weeks ago, and have been able to parse large
files and process data pretty easily, but I believe my code isn't too
efficient. I'm hoping dictionaries will help out, but I'm not sure the
best way to implement them.

I've been using a bunch of nested if/elif/else statements to select
slices (0317 & 03de) from a file, then parse the data (aa, hh, bb,
d2-d9) into parameters (a = airspeed, h = heading) & flags.

#sample file contents
0000007d 03 0317 aa aa aa aa aa hh hh hh bb bb
0000007e 06 03de d2 d3 d4 d5 d6 d7 d8 d9 10 11

Do you have the original file from which this hex dump was made? It may
be a lot easier to write the code to pick that apart using the struct
module's unpack function than fiddling with the hex dump. It would
probably run faster as well.
# some pseudo code
if PID == '03de':
flapsCmd = int(d3, 16)
if flapsCmd == 0xc0:
<flaps up code>
elif flapsCmd == 0x03:
<flaps down code>
if PID == '0317':
airspeed == 'combine aa for airspeed & multiply by 0.1'

*five* bytes for airspeed? Are they ascii characters e.g. "01234"
meaning 123.4?
heading == 'combine hh for heading'
mach == 'combine bb for mach & multiply by 0.01'

Might dictionaries help in this case... say Label0317(parameterName,
slice (d3), scaleFactor(0.1))... I'd like to use them if they'll
replace the dozens of nested conditionals. I have roughly 75
different parameters to decode from a file containing ~2.5 million
lines of data.

I know my pseudo code lacks details, but hopefully I'm getting my
point across...

It would help if you gave some more precise info on what format the
individual fields can take.

Cheers,
John
 
B

Beliavsky

If in the newsgroup comp.lang.x somone asks how to do y, and you
suggest using language z, without answering their question, which was
how to do it in x, you will likely just annoy people and perhaps make
it even less likely that they will try z.

I have my own favorite language z and have not always heeded the above
advice, but I think the principle is still correct.
 
J

John Machin

Thanks to John, Paul & Jon for their responses. This list is great for
info.

Hi gonzlobo,

Please dont use private e-mail; post to the Python mailing-list /
newsgroup, so that everybody can see what the eventual outcome is.

Please also answer the question that I asked: in effect, what you have
showed looks like a hex dump of a binary fiie -- are you saying that the
data is actually transmitted over the serial bus in hex with spaces in
between?
Jon,
OCaml might be the best option, but my braincell is only good for 1
language at a time. Hopefully python doesn't replace English. :^)

Paul,
Thanks for the snippets. It looks pretty complicated, but worth
looking into. I'm all for reducing line of code (hopefully not for
readability).

John M,
You're right, my original example contained non-sensical data (I made
it up for example's sake).

It wasn't me who wrote that, but I agree.
Here's a snippet of real data (it's a 10Mb/s serial bus, so there's
*alot* of data).
0000007a 06 0321 80 00 34 d1 01 0b 3f f7 01 6b
0000007b 26 0311 00 00 00 00 1a bd 00 00 00 00
0000007c 06 0321 80 00 a0 04 81 eb 20 05 81 1b
0000007d 16 0614 00 00 00 00 00 00 00 00 00 00
0000007e 06 0321 80 00 20 00 01 07 a0 43 01 9b
0000007f 06 0301 80 00 a0 b9 82 2b 3f d6 02 ab
00000080 06 0321 80 00 bf d4 01 5b a3 f0 01 db
00000081 06 0301 80 00 31 9c 02 0b bf d7 02 15
00000082 0f 0416 01 01 00 00 20 20 20 20 20 20
00000083 06 0301 80 00 bf ff 02 6b bf f3 82 eb
00000084 0f 0416 02 01 00 00 20 20 20 20 20 20
00000085 06 0301 80 00 bf ed 82 1b a0 07 02 07
00000086 06 0321 00 00 00 00 01 af 00 00 00 00
00000087 26 0311 80 00 e0 ce 02 30 80 07 82 86
00000088 06 0301 80 00 a0 4a 02 9b 3f df 02 5b
00000089 06 0301 80 00 80 00 02 ce 80 00 02 b3
0000008a 06 0301 80 00 00 00 02 5f e0 00 02 89
0000008b 16 0614 00 fe 31 00 00 00 00 00 00 00
0000008c 43 03a1 01 00 80 00 02 5d 80 0b 06 5d
0000008d 06 0301 80 00 60 a1 92 c1 e0 a1 8a 21
0000008e 4f 0450 01 10 00 00 80 00 37 00 00 00

line = 0000007a 06 0321 80 00 34 d1 01 0b 3f f7 01 6b
0....v....1....v....2....v....3....v....4....v
Label 321 actually contains:

time = line[:8]
PID = line[12:16]
d2 = line[17:19]
d3 = line[20:22]
d4 = line[23:25]
d5 = line[26:28]
d6 = line[29:31]
d7 = line[32:34]
d8 = line[35:37]
d9 = line[38:40]
d10 = line[41:43]
d11 = line[44:46]

That's not Python 101, it's PYBOL :) I haven't had to resort to the
0....v....1 etc caper for a very long time.

Consider doing this:

ints = [int(x, 16) for x in line.split()]
time = ints[0]
d = ints[1:]
pid = d[1]
# what is the undescribed field in ints[1]?

d2 + d3 = Pitch Angle (* 0.01)

I asked you before what you mean by "combine" ... now I'll ask what you
mean by "d2 + d3"

Do you mean this:
pitch_angle = (d[3] * 256 + d[2]) * 0.01
?

d4 + d5 = Roll Angle (* 0.05)
d6 + d8 = Magnetic Heading (* 0.05)

What happened to d7?
d9 + d10 = Pressure Altitude (* 1.0)
d11 = various flags


My code is python 101 (uses lots of nested if's), so using
dictionaries would be very helpful.

Here's my Python 102 ... we can help you get to 201 level with a bit
more disclosure from you on the specifics ...

| >>> line = "0000007a 06 0321 80 00 34 d1 01 0b 3f f7 01 6b"
| >>> ints = [int(x, 16) for x in line.split()]
| >>> ints
| [122, 6, 801, 128, 0, 52, 209, 1, 11, 63, 247, 1, 107]
| >>> time = ints[0]
| >>> time
| 122
| >>> d = ints[1:]
| >>> d
| [6, 801, 128, 0, 52, 209, 1, 11, 63, 247, 1, 107]
| >>> unknown = d[0]
| >>> unknown
| 6
| >>> pid = d[1]
| >>> pid
| 801
| >>> hex(pid)
| '0x321'
| >>> pitch_angle = (d[3] * 256 + d[2]) * 0.01
| >>> pitch_angle
| 1.28
| >>>

Call me crazy, but I'm highly suspicious of 0x8000 becoming a pitch
angle of 1.28 degrees ;-) Especially since other lines in your sample
with pid == 0x321 have (mostly) d2d3 == 0x8000 also, except for one with
0x000 -- I'm not an aeronautical engineer, but I would have expected
other values for pitch angle.

HTH,
John
 
J

Jon Harrop

Beliavsky said:
If in the newsgroup comp.lang.x somone asks how to do y, and you
suggest using language z, without answering their question, which was
how to do it in x, you will likely just annoy people and perhaps make
it even less likely that they will try z.

Pattern matching isn't a language.
 
J

Jon Harrop

Jon Harrop:

Probably I am not that intelligent, I probably need some months :) But
that language has many good sides, and one day I'll probably try to
learn it a bit.

It is very cool, and there are a growing number of resources about these
languages. You might also like to try Microsoft's F#, which runs
under .NET.
I see. This is a very old post of mine, at the bottom there are few
notes about the Mathematica pattern matching syntax:
http://groups.google.com/group/comp.lang.python/msg/93ce3e9a08f5e4c7

Yes. Lots of good points. I think this sort of functionality would be a
welcome addition to Python. What is the easiest way to add such
functionality to Python? Perhaps it can be written in Python?
To avoid reimplementing it yourself all the time then maybe someone
(you?) can try to write a good pattern matcher for sequences for
CPython. With such system it may become less important to switch to a
different language ;-)

I think that is an excellent idea. Who will pay me? ;-)
 
B

bearophileHUGS

Jon Harrop:
I think this sort of functionality would be a welcome addition to Python.

I don't know.

Perhaps it can be written in Python?

Pyparsing and beautifulsoup show that practically useful parsing
modules can be done using Python alone too.
Set module of Python 2.3, translated to C in Python 2.4 and improved in
Python 2.5 shows that sometimes Python is fit to create prototypes that
can be debugged, and whose API can be improved, and they can later
translated to a faster language.
But the decimal module shows that sometimes such translation can become
a bit difficult.
The re module shows that external modules too can be good enough
compared of Perl/Ruby built-in regex syntax.
Beside Python, various much faster languages may be used, like D,
Pyrex, C, or your loved OCaml. Or theoretically even ShedSkin. I think
D may be fit, the "Pyd" Python <-> D bridge is quite good, and it's
improving. And D is reaching version 1.0.
OCaml maybe can be used to produce Python compiled modules, so it can
be a possibility too, but then very few people are able to maintain it,
so maybe it's better to use a more common language (even D is more
common, because its syntax is easy to grasp by C++ programmers).

What is the easiest way to add such functionality to Python?

I think implementation language isn't the main problem. I think the
definition of the purpose and API are more difficult.
Mathematica allows you to define rules that let the system chose the
right function (among some with the same name) according to the input
pattern or the kind of input. Guido has discussed some about something
similar, the multimethods. There are some ways and tricks to add such
capabilities to Python, but I don't think they are fast and reliable
enough for real programs. Maybe Python 3.0 will change this some.
If you want just to create something like a regular engine that works
on lists, that contains matching rules, rewriting rules and calls to
many user-defined functions, then I think you can do it with Python
(with Psyco if you want) in few lines (a really basic RE on lists can
be defined in about 20 lines, maybe a good system may can be built with
2000-10000 lines), but I don't know how much useful it can be, maybe it
can find some purpose (and maybe someone has already written such
module).

I think that is an excellent idea. Who will pay me? ;-)

I don't know, probably no one. Most people don't even know how to use
such pattern matching programming paradigm. For Python it may become
just an experiment.

Bye,
bearophile
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,733
Messages
2,569,439
Members
44,829
Latest member
PIXThurman

Latest Threads

Top