Idiom for running compiled python scripts?

Mark

Hi, I'm new to python and looking for a better idiom for the way I
have been organising my python scripts. I've googled all over the
place about this but found absolutely nothing.

I'm a linux/unix command line guy, quite experienced in shell scripts
etc. I have a heap of command line utility scripts which I run directly.
What is the best way to create python command line scripts that still
exploit the (load-time) speed-up benefit of python compiled code?

E.g. say I have a python script "myprog.py". I could just execute that
directly each time, but that means it is "compiled" each time I run it,
which is inefficient and adds to startup time. I have been creating a
complementary stub script "myprog" which just does:

#!/usr/bin/env python
from myprog import main
if __name__ == "__main__":
    main()

Of course this compiles myprog.py into myprog.pyc on first run as I am
wanting.

I have one of these stubs for all my python scripts I've created so far.
Is there not a better way? Do I have to create a separate stub each
time? I find it a bit messy to require a pair of scripts for each
utility and it also contributes some inefficiency. Given the above stub
is so boilerplate, why does python not provide a general stub/utility
mechanism for this?
 
Bjoern Schliessmann

Mark said:
E.g. say I have a python script "myprog.py". I could just execute
that directly each time but that means it is "compiled" each time
I run it which is not efficient and adds to startup time.

Did you measure the performance hit in your case?
I have one of these stubs for all my python scripts I've created
so far. Is there not a better way? Do I have to create a separate
stub each time? I find it a bit messy to require a pair of scripts
for each utility and it also contributes some inefficiency. Given
the above stub is so boilerplate, why does python not provide a
general stub/utility mechanism for this?

I've noticed that calling the interpreter with pre-compiled pyc
files also works.
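
For example (a hypothetical session; "myprog" stands in for your own
script):

$ python -c "import py_compile; py_compile.compile('myprog.py')"
$ python myprog.pyc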

Regards,


Björn
 
Max Erickson

....
#!/usr/bin/env python
from myprog import main
if __name__ == "__main__":
    main()

Of course this compiles myprog.py into myprog.pyc on first run as
I am wanting.

I have one of these stubs for all my python scripts I've created
so far. Is there not a better way? Do I have to create a separate
stub each time? I find it a bit messy to require a pair of
scripts for each utility and it also contributes some
inefficiency. Given the above stub is so boilerplate, why does
python not provide a general stub/utility mechanism for this?

I don't know of a better way to organize things, but as an
alternative, you could have a script that imports each of the scripts
you want compiled; python will write the compiled files to disk when
you run it (thus avoiding the need to split the other scripts). There
are also the py_compile and compileall modules, which have facilities
for generating byte code.

More here:

http://effbot.org/zone/python-compile.htm

under 'Compiling python modules to byte code'.
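
For instance, a minimal sketch (the paths are hypothetical stand-ins
for your own script directory):

import compileall, py_compile

# Byte-compile every .py file under a directory in one pass.
compileall.compile_dir('/home/me/bin', quiet=True)

# Or byte-compile a single script, writing myprog.pyc next to it.
py_compile.compile('/home/me/bin/myprog.py')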


max
 
Mark

So given the lack of response it seems that there is probably no such
idiom and that I should not be concerned by the inefficiency inherent in
running .py scripts directly?

I did some time tests and sure, the speed gain is slight, but it is a
gain none the less.
 
Steve Holden

Mark said:
So given the lack of response it seems that there is probably no such
idiom and that I should not be concerned by the inefficiency inherent in
running .py scripts directly?

I did some time tests and sure, the speed gain is slight, but it is a
gain none the less.

Someone already told you - compile the files manually (see the compile
built-in function) and then use

python file.pyc

regards
Steve
 
Steve Holden

Mark said:
So given the lack of response it seems that there is probably no such
idiom and that I should not be concerned by the inefficiency inherent in
running .py scripts directly?

I did some time tests and sure, the speed gain is slight, but it is a
gain none the less.

Sorry, what you really need is the compileFile() function from the
compiler module.
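
That is, something like this (a minimal illustration; note the
compiler module is Python 2 only and was removed in Python 3):

from compiler import compileFile
compileFile('myprog.py')   # writes myprog.pyc next to the source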

regards
Steve
 
Steven D'Aprano

So given the lack of response it seems that there is probably no such
idiom and that I should not be concerned by the inefficiency inherent in
running .py scripts directly?

I did some time tests and sure, the speed gain is slight, but it is a
gain none the less.

Since you've done these tests already, perhaps you can tell us what gain
you actually got?

Here's a test I did:

[steve@apple ~]$ time python script.py
the result is 166166000

real 0m0.555s
user 0m0.470s
sys 0m0.011s
[steve@apple ~]$ time python script.pyc
the result is 166166000

real 0m0.540s
user 0m0.456s
sys 0m0.011s


That gives me an absolute gain of 15ms which is a percentage gain of about
3%. But don't forget the time it takes you to type the extra "c" at the
end of the script, even with filename completion. The average human
reaction time is something between 200 and 270 milliseconds, so unless
you're saving at least 200ms, typing that extra "c" at the end actually
wastes time.

Of course you have to type the "c". You're not deleting the source
files, are you? *wink*
 
Mark

Since you've done these tests already, perhaps you can tell us what gain
you actually got?

About the same as you, ~20 msecs for my small script samples.
Of course you have to type the "c". You're not deleting the source
files, are you? *wink*

Sorry, the wink is lost on me?

Of course I am not deleting the sources. In fact, I am also talking
about python scripts being called from shell scripts. I guess I'm just
surprised that the python installation does not provide a small stub
invoker, e.g:

A small script called "python_compile_and_run" in "pseudo" code:

#!/usr/bin/env python
import sys

# Following is invalid syntax unfortunately :(
from sys.argv[1].rstrip('.py') import main

sys.argv = sys.argv[1:]
if __name__ == "__main__":
    main()

so I could just do a "python_compile_and_run myscript.py" and it would
do what I want, i.e. run myscript.pyc if available and valid, generate
and run it if necessary.
 
Gerard Flanagan

Of course I am not deleting the sources. In fact, I am also talking
about python scripts being called from shell scripts.

There's a nice recipe in Python Cookbook (Martelli et al.) for this.
It involves zipping your .pyc files and adding a shell stub. Never
used it before but I'm going to need something similar in the near
future, probably with a text templating system such as Cheetah
(www.cheetahtemplate.org).
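
(A rough sketch of the idea, not the Cookbook's exact recipe: a zip
archive keeps its index at the end of the file, so a shell line can be
prepended and zipimport will still read the archive. The names
"myprog", "payload.zip" and "myprog.run" are made up for illustration,
and the .pyc must be built by the same Python version the stub
invokes:)

import os, py_compile, zipfile

# Byte-compile the script and bundle the .pyc into a zip.
py_compile.compile("myprog.py", cfile="myprog.pyc")
zf = zipfile.ZipFile("payload.zip", "w")
zf.write("myprog.pyc")
zf.close()

# Shell stub: $0 is the bundle itself, so putting it on sys.path
# lets zipimport load myprog straight out of the archive.
stub = ('#!/bin/sh\n'
        '''exec python -c "import sys; sys.path.insert(0, '$0'); '''
        '''import myprog; myprog.main()" "$@"\n''')

out = open("myprog.run", "wb")
out.write(stub.encode("ascii"))
out.write(open("payload.zip", "rb").read())
out.close()
os.chmod("myprog.run", 0o755)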

HTH

Gerard
 
Steven D'Aprano

About the same as you, ~20 msecs for my small script samples.

Well, I think that pretty much answers your question about whether it is
worth pre-compiling short shell scripts: you save about 20ms in execution
time, and lose 200ms in typing time. (Maybe a bit less if you are a
fast typist and don't use auto-completion.) You do the maths.

Sorry, the wink is lost on me?

It is because I didn't really think you were deleting the source files.
That would be incredibly stupid. But I mentioned it just in case some
not-so-bright spark decided to argue that you could use auto-completion
without needing to type that final "c" if you deleted the source file.

Presumably now somebody is going to suggest merely *moving* the source
files into another directory, thus spending a minute or two each time they
edit a script re-arranging files in order to save twenty or thirty
milliseconds when they execute the script. Hey, if your time is so
valuable that 20ms means that much to you, go for it.

Of course I am not deleting the sources. In fact, I am also talking
about python scripts being called from shell scripts. I guess I'm just
surprised that the python installation does not provide a small stub
invoker, e.g:

A small script called "python_compile_and_run" in "pseudo" code:

[snip pseudo-code]
so I could just do a "python_compile_and_run myscript.py" and it would
do what I want, i.e. run myscript.pyc if available and valid, generate
and run it if necessary.

You shouldn't expect Python to come with every imaginable special-purpose
script already written for you! Besides, it's pretty simple to get that
functionality by hand when you need it, or automatically for that matter.

Here's one (untested) script that executes the pyc file in a subshell if
it exists and is new enough, and compiles it if it doesn't.


import os, sys, compiler
from stat import ST_MTIME as MT

if __name__ == "__main__":
    scriptname = sys.argv[1]
    compiledname = scriptname + "c"
    if not os.path.exists(compiledname) or \
            os.stat(compiledname)[MT] < os.stat(scriptname)[MT]:
        # compiled file doesn't exist, or is too old
        compiler.compileFile(scriptname)
        assert os.path.exists(compiledname)
    resultcode = os.system('python %s' % compiledname)
    sys.exit(resultcode)

Now don't forget to test whether launching the subshell takes longer than
the 20ms you might save. All that effort, and wouldn't it be ironic if it
was actually *slower* than executing the script from scratch each time...
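
(A note for later readers: the compiler module used above is Python 2
only and is gone in Python 3; a rough equivalent of the same
compile-if-stale-then-run idea there would be:)

import os, sys, py_compile, subprocess

if __name__ == "__main__":
    scriptname = sys.argv[1]
    compiledname = scriptname + "c"
    if (not os.path.exists(compiledname) or
            os.stat(compiledname).st_mtime < os.stat(scriptname).st_mtime):
        # compiled file doesn't exist, or is too old
        py_compile.compile(scriptname, cfile=compiledname)
    sys.exit(subprocess.call([sys.executable, compiledname] + sys.argv[2:]))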
 
Mark

if not os.path.exists(compiledname) or \
        os.stat(compiledname)[MT] < os.stat(scriptname)[MT]:
    # compiled file doesn't exist, or is too old

Surely the validity check done by Python is more sophisticated than
this? Doesn't the binary file have to be compiled under the same python
version etc?
Now don't forget to test whether launching the subshell takes longer
than the 20ms you might save. All that effort, and wouldn't it be ironic
if it was actually *slower* than executing the script from scratch each
time...

But Python proper is executing all the above anyhow isn't it? So the 20
msecs advantage I measured already includes this logic.

Anyhow, I give up. Compilation, it seems, only applies to python
modules. Compilation is not appropriate for Python scripts. Should be
in the FAQ.
 
Steven D'Aprano

if not os.path.exists(compiledname) or \
        os.stat(compiledname)[MT] < os.stat(scriptname)[MT]:
    # compiled file doesn't exist, or is too old

Surely the validity check done by Python is more sophisticated than
this? Doesn't the binary file have to be compiled under the same python
version etc?

Of course. What, do you want me to do all your work? :)


But Python proper is executing all the above anyhow isn't it? So the 20
msecs advantage I measured already includes this logic.

I don't know how you measured the 20ms.

When you call a script direct from the shell, you've already got a shell
running. When you call a script from within Python via os.system, it has
to launch a sub-shell. That takes time.

Anyhow, I give up. Compilation, it seems, only applies to python
modules. Compilation is not appropriate for Python scripts. Should be
in the FAQ.

No, that's not true. Python scripts certainly take advantage of compiled
modules.

The real lesson of this is that optimization isn't necessarily
straightforward. What we imagine "should be" faster might not be in
practice -- especially when dealing with micro-optimizations that only
save a few tens of milliseconds.

Frankly, it simply isn't worth trying to save 20ms in a script that takes
less than a second to run. If you scratch your nose before hitting enter,
you've wasted a hundred times what you've just spent hours trying to save.

Or, to put it another way:

The Rules of Optimization are simple.
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- Michael A. Jackson (no, not that Michael Jackson), "Principles of
Program Design", 1975.
 
Alex Martelli

Mark said:
so I could just do a "python_compile_and_run myscript.py" and it would
do what I want, i.e. run myscript.pyc if available and valid, generate
and run it if necessary.

You can use

python -c 'import myscript; myscript.main()'

and variations thereon.


Alex
 
Mark

You can use

python -c 'import myscript; myscript.main()'

and variations thereon.

Hmmm, after all that, this seems to be close to what I was looking for.

Thanks Alex. Didn't find anything about this in your cookbook! (I'm
just starting to read it in full - the best way to really learn the
language, I think.)

So the general purpose invoking bash script e.g. "runpy" is merely something
like:

#################################################################
#!/bin/bash

if [ $# -lt 1 ]; then
    echo "usage: `basename $0` script.py [script args ..]"
    exit 1
fi

PROG=$1
DIR=`dirname $PROG`
MOD=`basename $PROG`
MOD=${MOD%.py}
shift
exec python -c "import sys; \
sys.argv[0] = \"$PROG\"; \
sys.path.append(\"$DIR\"); \
import $MOD; $MOD.main()" "$@"
#################################################################

So I timed "~/bin/myscript.py myargs" against "runpy ~/bin/myscript.py
myargs" but got only maybe a couple of millisec improvement (using a
1000 line myscript.py which just immediately exits - to try and push the
advantage!).

So I agree - the ends do not justify the means here and I'll just
execute myscript.py directly. Still not sure why python does not provide
this as a "python --compile_and_run myscript.py" option though?! ;)
 
irstas

A small script called "python_compile_and_run" in "pseudo" code:

#!/usr/bin/env python
import sys

# Following is invalid syntax unfortunately :(
from sys.argv[1].rstrip('.py') import main

sys.argv = sys.argv[1:]
if __name__ == "__main__":
    main()

so I could just do a "python_compile_and_run myscript.py" and it would
do what I want, i.e. run myscript.pyc if available and valid, generate
and run it if necessary.

There's __import__ which allows you to do what you tried:

m = __import__(sys.argv[1].rstrip('.py'))

Also, rstrip doesn't work like you think it does.
'pyxyypp.py'.rstrip('.py') == 'pyx'
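
(A safer way to derive the module name, as a small sketch:
os.path.splitext strips exactly one extension rather than a set of
characters:)

import os, sys

# 'myprog.py' -> 'myprog'; 'pyxyypp.py' -> 'pyxyypp', unlike rstrip.
modname = os.path.splitext(os.path.basename(sys.argv[1]))[0]
m = __import__(modname)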

Answering also to your later message:
So the general purpose invoking bash script e.g. "runpy" is merely something
like:

Curiously enough, Python 2.5 has a module called runpy:
http://docs.python.org/lib/module-runpy.html

which seems to almost do what you want. It doesn't compile the modules
but you could make a modification which does. The benefit over just
using __import__("module").main() would be
that your scripts wouldn't necessarily need a function called "main",
but would still work with scripts that use the __name__ == '__main__'
idiom.
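
(As a side note, a minimal sketch of that no-main() style;
runpy.run_path only appeared later, in Python 2.7, so treat this as an
assumption beyond the 2.5 module described above:)

import sys, runpy

script = sys.argv[1]
del sys.argv[0]            # shift argv so the script sees its own args
runpy.run_path(script, run_name='__main__')
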
A simple implementation that "works":


import imp, sys, os

c = sys.argv[1]
if (not os.path.exists(c + 'c')
        or os.stat(c).st_mtime > os.stat(c + 'c').st_mtime):
    import compiler
    compiler.compileFile(c)
del sys.argv[0]
imp.load_compiled('__main__', c + 'c')


I timed it against running plain .py and running .pyc directly. It
seemed to be roughly on par with running .pyc directly, and about 18ms
faster than running .py. The file had 600 lines (21kb) of code.
 
Mark

Also, rstrip doesn't work like you think it does.
'pyxyypp.py'.rstrip('.py') == 'pyx'

Well there is embarrassing confirmation that I am a python newbie :(
I timed it against running plain .py and running .pyc directly. It
seemed to be roughly on par with running .pyc directly, and about 18ms
faster than running .py. The file had 600 lines (21kb) of code.

So see my point at least? I'm still not sure why this approach is
ill-favoured?

Thanks very much for all the detailed responses here.
 
Mark

A simple implementation that "works":

Not quite, irstas, BTW ..

import imp, sys, os

c = sys.argv[1]
if (not os.path.exists(c + 'c')
        or os.stat(c).st_mtime > os.stat(c + 'c').st_mtime):
    import compiler
    compiler.compileFile(c)
del sys.argv[0]
imp.load_compiled('__main__', c + 'c')

The above doesn't actually work for my test script. I have an atexit
call in the script which is deleting some temp files and I get the
following traceback on termination when run with the above:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/home/mark/bin/myscript.py", line 523, in delete
    if files.tempdir:
AttributeError: 'NoneType' object has no attribute 'tempdir'
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib/python2.4/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/home/mark/bin/myscript.py", line 523, in delete
    if files.tempdir:
AttributeError: 'NoneType' object has no attribute 'tempdir'

The relevant source code is:

At the start of main() ..

# Ensure all temp files deleted on exit
import atexit
atexit.register(files.delete)

and then from my class "files":

@staticmethod
def delete():
    '''Called to delete all temp files'''
    if files.tempdir:
        shutil.rmtree(files.tempdir)

Something about the environment is not quite the same. Any ideas?
 
Gabriel Genellina

The above doesn't actually work for my test script. I have an atexit
call in the script which is deleting some temp files and I get the
following traceback on termination when run with the above:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/home/mark/bin/myscript.py", line 523, in delete
    if files.tempdir:
AttributeError: 'NoneType' object has no attribute 'tempdir'

I don't know exactly why it no longer works (it clearly worked
before), but script finalization is always a bit fragile. All values
in all module dictionaries (which hold the globals) are set to None
(presumably to help garbage collection by breaking cycles). When your
delete function is called, globals like shutil or files are already
gone. A way to avoid this problem is to hold a reference to all
required globals, so your delete function would become:

@staticmethod
def delete(files=files, rmtree=shutil.rmtree):
    '''Called to delete all temp files'''
    if files.tempdir:
        rmtree(files.tempdir)

But I'm not sure if this is enough because rmtree relies on the os module
to do its work.
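
(Another way to sidestep the teardown problem: bind the arguments at
registration time by passing them to atexit.register itself, so
nothing has to be looked up at exit. A small sketch:)

import atexit, shutil, tempfile

tempdir = tempfile.mkdtemp()
# rmtree and tempdir are captured here, before any module teardown.
atexit.register(shutil.rmtree, tempdir)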
 
Steven D'Aprano

So see my point at least? I'm still not sure why this approach is
ill-favoured?

Because the saving is entirely trivial. Who cares? Sheesh.

That's less than the natural variation in execution speed caused by (e.g.)
network events on your PC. I've just run the same do-nothing script (a
simple "pass") three times, and got times of 338ms, 67ms and 74ms. That's
a variation of 271 milliseconds between runs of the same script, and you
care about 18ms???

Saving 18ms on a script that takes 50ms to execute *might* be worthwhile,
if you're using that script in an automated system that executes it
thousands of times. If you're calling it by hand, come on now, you're not
even going to notice the difference! 50ms is close enough to instantaneous
that 32ms is not detectably faster to the human eye.

If you save 18ms one thousand times a day, you save a grand total of ...
eighteen seconds. Wow. Now you can spend more time with your family.

As of 2005, the world's fastest typist Barbara Blackburn has been clocked
at a peak of 212 words per minute for short bursts. Assuming an average of
five key strokes per word (including the space) that's about 18 key
presses per second, or 55 milliseconds per key press. A more realistic
figure for the average professional typist is about six key presses per
second, or 160 milliseconds per key press, and that's for pure
transposition (copying). If you've got to think carefully about what
you're typing, like sys admins do, the average time per key press is
significantly larger.

In other words, unless you can save AT LEAST 160 milliseconds, it isn't
worth typing even one more character. If you have to type one extra
character to save 18ms, you're actually 140ms worse off.

I can't believe the number of people who are spending this amount of time
worrying about such a trivial saving, and I can't believe that I've let
myself be suckered into this discussion. Don't you people have lives???
 
