An idiom for code generation with exec

eliben

Hello,

In a Python program I'm writing I need to dynamically generate
functions[*] and store them in a dict. eval() can't work for me
because a function definition is a statement and not an expression, so
I'm using exec. At the moment I came up with the following to make it
work:

def build_func(args):
    code = """def foo(...)..."""
    d = {}
    exec code in globals(), d
    return d['foo']

My question is, considering that I really need code generation[*] -
"is there a cleaner way to do this ?" Also, what happens if I replace
globals() by None ?
Additionally, I've found indentation to be a problem in such
constructs. Is there a workable way to indent the code at the level of
build_func, and not on column 0 ?

Thanks in advance
Eli

[*] I know that each time a code generation question comes up people
suggest that there's a better way to achieve this, without using exec,
eval, etc. But in my case, for reasons too long to fully lay out, I
really need to generate non-trivial functions with a lot of hard-coded
actions for performance. And there's no problem of security
whatsoever. If someone is very interested in the application, I will
elaborate more.
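
For readers on Python 3, where exec is a function rather than a statement, the idiom above might look roughly like this. It is a minimal sketch, not the poster's actual code: the `source` and `name` parameters and the generated body of `foo` are placeholders chosen for illustration.

```python
def build_func(source, name):
    """Execute generated source in a private namespace and return one function.

    `source` and `name` are illustrative parameters, not part of the
    original post.
    """
    namespace = {}
    # Python 3 spelling of `exec code in globals(), namespace`.
    exec(source, globals(), namespace)
    return namespace[name]


# Placeholder for generated code; a real generator would build this string.
SOURCE = "def foo(packet):\n    return packet[2:6][::-1]\n"

foo = build_func(SOURCE, "foo")
print(foo([0, 1, 2, 3, 4, 5, 6]))
```

With separate dictionaries, the generated function resolves global names in `globals()`, not in `namespace`, so helpers it needs should live in the globals dictionary.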
 
Bruno Desthuilliers

eliben wrote:
Hello,

In a Python program I'm writing I need to dynamically generate
functions[*] (snip)

[*] I know that each time a code generation question comes up people
suggest that there's a better way to achieve this, without using exec,
eval, etc.

Just to make things clear: you do know that you can dynamically build
functions without exec, don't you?
But in my case, for reasons too long to fully lay out, I
really need to generate non-trivial functions with a lot of hard-coded
actions for performance.

Just out of curiosity: could you tell a bit more about your use case
and what makes a simple closure not an option?
 
eliben

eliben wrote:
Hello,
In a Python program I'm writing I need to dynamically generate
functions[*]
(snip)

[*] I know that each time a code generation question comes up people
suggest that there's a better way to achieve this, without using exec,
eval, etc.

Just to make things clear: you do know that you can dynamically build
functions without exec, do you ?

Yes, but the other options for doing so are significantly less
flexible than exec.
Just out of curiosity: could you tell a bit more about your use case
and what makes a simple closure not an option?

Okay.

I work in the field of embedded programming, and one of the main uses
I have for Python (and previously Perl) is writing GUIs for
controlling embedded systems. The communication protocols are usually
ad-hoc messages (header, footer, data, crc) built on top of serial
communication (RS232).

The packets that arrive have a known format. For example (YAMLish
syntax):

packet_length: 10
fields:
  - name: header
    offset: 0
    length: 1
  - name: time_tag
    offset: 1
    length: 1
    transform: val * 2048
    units: ms
  - name: counter
    offset: 2
    length: 4
    bytes-msb-first: true
  - name: bitmask
    offset: 6
    length: 1
    bit_from: 0
    bit_to: 5
  ...

This is a partial capability display. Fields have defined offsets and
lengths, can be only several bits long, can have defined
transformations and units for convenient display.

I have a program that should receive such packets from the serial port
and display their contents in tabular form. I want the user to be able
to specify the format of his packets in a file similar to above.

Now, in previous versions of this code, written in Perl, I found out
that the procedure of extracting field values from packets is very
inefficient. I've rewritten it using a dynamically generated procedure
for each field, that does hard coded access to its data. For example:

def get_counter(packet):
    data = packet[2:6]
    data.reverse()
    return data

This gave me a huge speedup, because each field now had its specific
function sitting in a dict that quickly extracted the field's data
from a given packet.

Now I'm rewriting this program in Python and am wondering about the
idiomatic way to use exec (in Perl, eval() replaces both eval and exec
of Python).

Eli
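
A generator of the kind described here could be sketched as follows, in Python 3 syntax. The function name `make_extractor` and the dict layout are assumptions modelled on the YAML-ish example. Only `offset`, `length` and `bytes-msb-first` are handled, and slice reversal stands in for `data.reverse()` so the result also works on immutable `bytes`.

```python
def make_extractor(field):
    """Generate a specialised extractor for one field description.

    `field` is a dict shaped like one entry of the YAML-ish example;
    the helper name and the supported keys are illustrative only.
    """
    start = field["offset"]
    stop = start + field["length"]
    lines = [
        "def get_%s(packet):" % field["name"],
        "    data = packet[%d:%d]" % (start, stop),
    ]
    # The decision is taken once, at generation time, not per packet.
    if field.get("bytes-msb-first"):
        lines.append("    data = data[::-1]")
    lines.append("    return data")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["get_" + field["name"]]


counter_spec = {"name": "counter", "offset": 2, "length": 4,
                "bytes-msb-first": True}
get_counter = make_extractor(counter_spec)
print(get_counter(bytes(range(10))))
```

The generated functions can then be stored in a dict keyed by field name, as the post describes.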
 
Peter Otten

eliben said:
Additionally, I've found indentation to be a problem in such
constructs. Is there a workable way to indent the code at the level of
build_func, and not on column 0 ?

exec "if 1:" + code.rstrip()

Peter
 
George Sakkis

eliben wrote:

This gave me a huge speedup, because each field now had its specific
function sitting in a dict that quickly extracted the field's data
from a given packet.

It's still not clear why the generic version is so much slower, unless you
extract only a few selected fields, not all of them. Can you post a
sample of how you used to write it without exec to clarify where the
inefficiency comes from?

George
 
Raymond Hettinger

I've rewritten it using a dynamically generated procedure
for each field, that does hard coded access to its data. For example:

def get_counter(packet):
  data = packet[2:6]
  data.reverse()
  return data

This gave me a huge speedup, because each field now had its specific
function sitting in a dict that quickly extracted the field's data
from a given packet.

Now I'm rewriting this program in Python and am wondering about the
idiomatic way to use exec (in Perl, eval() replaces both eval and exec
of Python).

FWIW, when I had a similar challenge for dynamic coding, I just
generated a py file and then imported it. This technique was nice
because it can also work with Pyrex or Psyco.

Also, the code above can be simplified to:

get_counter = lambda packet: packet[5:1:-1]

Since function calls are expensive in python, you can also gain speed
by parsing multiple fields at a time:

header, timetag, counter = parse(packet)


Raymond
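
The two suggestions above can be checked with a small sketch. The `parse` function here is a hand-written stand-in for what a generator might emit; its field handling is an assumption based on the example spec, not code from the thread.

```python
packet = list(range(10))

# Slice with a negative step: equivalent to reversing packet[2:6].
get_counter = lambda packet: packet[5:1:-1]
assert get_counter(packet) == list(reversed(packet[2:6]))


def parse(packet):
    """Hand-written stand-in for a generated multi-field parser."""
    header = packet[0]
    time_tag = packet[1] * 2048     # transform from the spec: val * 2048
    counter = packet[5:1:-1]
    return header, time_tag, counter


# One call yields several fields, so the call overhead is paid once.
header, time_tag, counter = parse(packet)
print(header, time_tag, counter)
```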
 
Bruno Desthuilliers

eliben wrote:
Hello,
In a Python program I'm writing I need to dynamically generate
functions[*] (snip)

[*] I know that each time a code generation question comes up people
suggest that there's a better way to achieve this, without using exec,
eval, etc.
Just to make things clear: you do know that you can dynamically build
functions without exec, do you ?

Yes, but the other options for doing so are significantly less
flexible than exec.

Let's see...
Just out of curiosity: could you tell a bit more about your use case
and what makes a simple closure not an option?

Okay.

I work in the field of embedded programming, and one of the main uses
I have for Python (and previously Perl) is writing GUIs for
controlling embedded systems. The communication protocols are usually
ad-hoc messages (header, footer, data, crc) built on top of serial
communication (RS232).
ok

The packets that arrive have a known format. For example (YAMLish
syntax):

packet_length: 10
fields:
  - name: header
    offset: 0
    length: 1
  - name: time_tag
    offset: 1
    length: 1
    transform: val * 2048
    units: ms
  - name: counter
    offset: 2
    length: 4
    bytes-msb-first: true
  - name: bitmask
    offset: 6
    length: 1
    bit_from: 0
    bit_to: 5
  ...

This is a partial capability display. Fields have defined offsets and
lengths, can be only several bits long, can have defined
transformations and units for convenient display.
ok

I have a program that should receive such packets from the serial port
and display their contents in tabular form. I want the user to be able
to specify the format of his packets in a file similar to above.
ok

Now, in previous versions of this code, written in Perl, I found out
that the procedure of extracting field values from packets is very
inefficient. I've rewritten it using a dynamically generated procedure
for each field, that does hard coded access to its data. For example:

def get_counter(packet):
    data = packet[2:6]
    data.reverse()
    return data

This gave me a huge speedup, because each field now had its specific
function sitting in a dict that quickly extracted the field's data
from a given packet.

ok. So if I get it right, you build the function's code as a string
based on the YAML specification.

If so, well, I can't think of anything really better[1] - at least *if*
dynamically generated procedures are really better performance-wise,
which may *or may not* be the case in Python.

[1] except using compile to build a code object with the function's
body, then instantiating a function object using this code, but I'm not
sure whether it will buy you much more performance-wise. I'd personally
prefer this because I find it more explicit and readable, but YMMV.
Now I'm rewriting this program in Python and am wondering about the
idiomatic way to use exec (in Perl, eval() replaces both eval and exec
of Python).

Well... So far, the most pythonic way to use exec is to avoid using it -
unless it's the right tool for the job !-)
 
eliben

FWIW, when I had a similar challenge for dynamic coding, I just
generated a py file and then imported it. This technique was nice
because can also work with Pyrex or Psyco.

I guess this is not much different than using exec, at the conceptual
level. exec is perhaps more suitable when you really need just one
function at a time and not a whole file of related functions.
Also, the code above can be simplified to: get_counter = lambda
packet: packet[5:1:-1]

OK, but that was just a demonstration. The actual functions are
complex enough to not fit into a single expression.

Eli
 
eliben

[1] except using compile to build a code object with the function's
body, then instantiating a function object using this code, but I'm not
sure whether it will buy you much more performance-wise. I'd personally
prefer this because I find it more explicit and readable, but YMMV.

How is compiling more readable than exec - doesn't it require an extra
step ? You generate code dynamically anyway.

Eli
 
eliben

It's still not clear why the generic version is so much slower, unless you
extract only a few selected fields, not all of them. Can you post a
sample of how you used to write it without exec to clarify where the
inefficiency comes from?

George

The generic version has to make a lot of decisions at runtime, based
on the format specification.
Extract the offset from the spec, extract the length. Is it
msb-first? Then reverse. Are specific bits required? If so, do bit
operations. Should bits be reversed? etc.

A dynamically generated function doesn't have to make any decisions -
everything is hard coded in it, because these decisions have been made
at compile time. This can save a lot of dict accesses and conditions,
and results in a speedup.

I guess this is not much different from Lisp macros - making decisions
at compile time instead of run time and saving performance.

Eli
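
The contrast described above can be measured with a small harness like the one below. It is only a sketch under assumptions: `SPEC`, `extract_generic` and `extract_specialised` are hypothetical names, the spec covers a single field, and no particular timing result is implied. Real numbers depend on the interpreter and on the actual field logic.

```python
import timeit

SPEC = {"offset": 2, "length": 4, "bytes-msb-first": True}


def extract_generic(packet, spec):
    """Interprets the spec on every call: dict lookups plus a branch."""
    data = packet[spec["offset"]:spec["offset"] + spec["length"]]
    if spec.get("bytes-msb-first"):
        data = data[::-1]
    return data


def extract_specialised(packet):
    """What a generator could emit for SPEC: every decision already made."""
    return packet[5:1:-1]


packet = bytes(range(10))
# Both versions must agree before their speed is worth comparing.
assert extract_generic(packet, SPEC) == extract_specialised(packet)

timings = {
    "generic": timeit.timeit(lambda: extract_generic(packet, SPEC),
                             number=100_000),
    "specialised": timeit.timeit(lambda: extract_specialised(packet),
                                 number=100_000),
}
for label, seconds in timings.items():
    print(label, seconds)
```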
 
George Sakkis

On Jun 20, 9:17 am, Bruno Desthuilliers wrote:
eliben wrote:

The generic version has to make a lot of decisions at runtime, based
on the format specification.
Extract the offset from the spec, extract the length. Is it
msb-first? Then reverse. Are specific bits required? If so, do bit
operations. Should bits be reversed? etc.

So you are saying that for example "if do_reverse: data.reverse()" is
*much* slower than "data.reverse()"? I would expect that checking the
truthiness of a boolean would be negligible compared to the reverse
itself. Did you try converting all checks to identity comparisons with
None? I mean replacing every "if compile_time_condition:" in a loop
with

compile_time_condition = compile_time_condition or None
for i in some_loop:
    if compile_time_condition is None:
        ...

It's hard to believe that the overhead of identity checks is
comparable (let alone much higher) to the body of the loop for
anything more complex than "pass".

George
 
bruno.desthuilliers

[1] except using compile to build a code object with the function's
body, then instantiating a function object using this code, but I'm not
sure whether it will buy you much more performance-wise. I'd personally
prefer this because I find it more explicit and readable, but YMMV.

How is compiling more readable than exec -

Using compile and function(), you explicitly instantiate a new
function object, while using exec you're relying on a side effect.
doesn't it require an extra
step ?

Well... Your way:

d = {}
exec code in globals(), d
return d['foo']

My way:

return function(compile(code, '<string>', 'exec'), globals())

As far as I'm concerned, it's two steps less - but YMMV, of course !-)
You generate code dynamically anyway.

Yes, indeed. Which may or may not be the right thing to do here, but this
is a different question (and one I can't actually answer).
 
bruno.desthuilliers

(snip)



The generic version has to make a lot of decisions at runtime, based
on the format specification.
Extract the offset from the spec, extract the length.

import operator

transformers = []
transformers.append(operator.itemgetter(slice(format.offset,
                                              format.offset + format.length)))
Is it msb-first ? Then reverse.

if format.msb_first:
    transformers.append(reverse)
Are specific bits required ? If so, do bit
operations.

etc... Python functions are objects, you can define your own callable
(i.e. function-like) types, you can define anonymous single-expression
functions using lambda, functions are closures too so they can carry
the environment they were defined in, implementing partial application
(using either closures or callable objects) is trivial (and is in the
stdlib functools module since 2.5 FWIW), well... Defining a sequence
of transformer functions is not a problem either. And applying it to
your data bytestring is just trivial:

def apply_transformers(data, transformers):
    for transformer in transformers:
        data = transformer(data)
    return data

... and is not necessarily that bad performance-wise (here you'd have
to benchmark both solutions to know for sure).
A dynamically generated function doesn't have to make any decisions -

No, but neither does a sequence of callable objects. The decisions are
taken where you have the necessary context, and applied somewhere
else. Dynamically generating/compiling code is one possible solution,
but not the only one.

I guess this is not much different from Lisp macros
The main difference is that Lisp macros are not built as raw strings,
but as first-class objects. I've so far found this approach more flexible
and way easier to maintain, but here again, YMMV.

Anyway, even though (as you may have noticed by now) I'm one of these
"there's-a-better-way-than-eval-exec" people, I'd think you may
(depending on benchmarks with both solutions and real-life data) have
a valid use case here - and if you encapsulate this part correctly,
you can always start with your current solution (so you make it work),
then eventually switch implementation later if it's worth the extra
effort...


Just my 2 cents. Truth is that as long as it works and is
maintainable, then who cares...
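
Put together, the closure-based alternative outlined in this post might look like the sketch below (Python 3). The helper name `build_transformers` and the dict-based field description are assumptions. The undefined `reverse` from the fragments above is replaced here by a small lambda.

```python
import operator


def build_transformers(field):
    """Turn one field description into a list of callables.

    All decisions are made here, once; the callables just do the work.
    """
    start = field["offset"]
    transformers = [operator.itemgetter(slice(start, start + field["length"]))]
    if field.get("bytes-msb-first"):
        transformers.append(lambda data: data[::-1])
    return transformers


def apply_transformers(data, transformers):
    """Feed the data through each transformer in order."""
    for transformer in transformers:
        data = transformer(data)
    return data


counter_spec = {"name": "counter", "offset": 2, "length": 4,
                "bytes-msb-first": True}
pipeline = build_transformers(counter_spec)
print(apply_transformers(bytes(range(10)), pipeline))
```

Whether this is faster or slower than a generated function is exactly what a benchmark on real data would have to show.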
 
eliben

So you are saying that for example "if do_reverse: data.reverse()" is
*much* slower than "data.reverse()" ? I would expect that checking the
truthness of a boolean would be negligible compared to the reverse
itself. Did you try converting all checks to identity comparisons with
None ? I mean replacing every "if compile_time_condition:" in a loop
with

compile_time_condition = compile_time_condition or None
for i in some_loop:
    if compile_time_condition is None:
        ...

It's hard to believe that the overhead of identity checks is
comparable (let alone much higher) to the body of the loop for
anything more complex than "pass".

There are also dict accesses (to extract the format parameters, such
as length and offsets) to the format, which are absent. Besides, the
fields are usually small, so reverse is relatively cheap.

Eli
 
eliben

d = {}
exec code in globals(), d
return d['foo']

My way:

return function(compile(code, '<string>', 'exec'), globals())

With some help from the guys at IRC I came to realize your way doesn't
do the same. It creates a function that, when called, creates 'foo' on
globals(). This is not exactly what I need.

Eli
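
The difference noted here can be reproduced with a short Python 3 sketch. It assumes that the `function` constructor in the quoted one-liner corresponds to `types.FunctionType`; the sample source string is illustrative, and `ord()` is dropped because indexing `bytes` already yields integers in Python 3.

```python
import types

code = "def foo(packet):\n    return packet[3] + 256 * packet[4]\n"

# Wrapping the compiled module-level code in a function object: the result
# is a function that, when called, runs the `def` statement inside `ns`.
ns = {}
wrapper = types.FunctionType(compile(code, "<string>", "exec"), ns)
result = wrapper()          # defines foo in ns as a side effect
print(result)               # the wrapper itself returns None
foo_via_wrapper = ns["foo"]

# exec into a dictionary and pull the function out by name.
d = {}
exec(code, d)
foo_via_exec = d["foo"]

print(foo_via_exec(bytes(range(10))))
```

So the wrapper is not the generated function itself; it has to be called once before `foo` exists in the namespace passed to it.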
 
eliben

exec "if 1:" + code.rstrip()

Peter

Why is the 'if' needed here ? I had .strip work for me:

def make_func():
    code = """
    def foo(packet):
        return ord(packet[3]) + 256 * ord(packet[4])
    """

    d = {}
    exec code.strip() in globals(), d
    return d['foo']

Without .strip this doesn't work:

Traceback (most recent call last):
  File "exec_code_generation.py", line 25, in <module>
    foo = make_func()
  File "exec_code_generation.py", line 20, in make_func
    exec code in globals(), d
  File "<string>", line 2
    def foo(packet):
    ^
IndentationError: unexpected indent
 
Peter Otten

eliben said:
Why is the 'if' needed here ? I had .strip work for me:

A simple .strip() doesn't work if the code comprises multiple lines:

>>> def f():
...     return """
...     x = 42
...     if x > 0:
...         print x
...     """
...
>>> exec "if 1:\n" + f().rstrip()
42
>>> exec f().strip()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 2
    if x > 0:
    ^
IndentationError: unexpected indent

You can of course split the code into lines, calculate the indentation of
the first non-white line, remove that indentation from all lines and then
rejoin.

Peter
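
The split, measure, strip and rejoin procedure described above is essentially what the standard library's `textwrap.dedent` does. A minimal Python 3 sketch follows; the variable `result` replaces the `print` in the session above so the outcome can be inspected.

```python
import textwrap


def f():
    # The generated code is indented to match the surrounding function.
    return """
        x = 42
        if x > 0:
            result = x
        """


namespace = {}
# dedent removes the whitespace common to all lines, so the block
# starts at column 0 while its internal structure is preserved.
exec(textwrap.dedent(f()), namespace)
print(namespace["result"])
```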
 
eliben

A simple .strip() doesn't work if the code comprises multiple lines:

>>> def f():
...     return """
...     x = 42
...     if x > 0:
...         print x
...     """
...
>>> exec "if 1:\n" + f().rstrip()
42
>>> exec f().strip()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 2
    if x > 0:
    ^
IndentationError: unexpected indent

I see. In my case I only evaluate function definitions with 'exec', so
I only need to de-indent the first line, and the others can be
indented because they're in a new scope anyway. What you suggest works
for arbitrary code and not only function definitions. It's a nice
trick with the "if 1:" :)
 
Lie

I see. In my case I only evaluate function definitions with 'exec', so
I only need to de-indent the first line, and the others can be
indented because they're in a new scope anyway. What you suggest works
for arbitrary code and not only function definitions. It's a nice
trick with the "if 1:" :)

Have you actually profiled your code? Or are you just basing these
assumptions on guesses?
 
eliben

I see. In my case I only evaluate function definitions with 'exec', so
Have you actually profiled your code? Or are you just basing this
assumptions on guesses?

First of all, I see absolutely no connection between your question and
the text you quote. Is there? Or did you pick one post randomly to
post your question on?

Second, yes - I have profiled my code.

Third, this is a very typical torture path one has to go through when
asking about code generation. It is true of almost all communities,
except Lisp, perhaps. You have to convince everyone that you have a
real reason to do what you do. The simple norm of getting a reply to
your question doesn't work when you get to code generation. I wonder
why that is so. How many people have actually been "burned" by bad code
generation techniques, and how many are just parroting "goto is evil"
because it's the accepted thing to say? This is an interesting point
to ponder.

Eli
 
