Idiom for running compiled python scripts?

Discussion in 'Python' started by Mark, Mar 21, 2007.

  1. Mark

    Mark Guest

    Hi, I'm new to python and looking for a better idiom for the way I
    have been organising my python scripts. I've googled all over the
    place about this but found absolutely nothing.

    I'm a linux/unix command line guy, quite experienced in shell scripts
    etc. I have a heap of command line utility scripts which I run directly.
    What is the best way to create python command line scripts but exploit
    the (load-only) speed-up benefit of python compiled code?

    E.g. say I have a python script "myprog.py". I could just execute that
    directly each time but that means it is "compiled" each time I run it
    which is not efficient and adds to startup time. I have been creating a
    complementary "myprog" stub script which just does:

    #!/usr/bin/env python
    from myprog import main
    if __name__ == "__main__":
        main()

    Of course this compiles myprog.py into myprog.pyc on first run as I am
    wanting.
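    For comparison, a modern Python 3 interpreter (an assumption; this thread
    is about Python 2) still caches bytecode when a module is imported, but
    since PEP 3147 it writes the file into a `__pycache__` directory rather
    than next to the source, and `importlib.util.cache_from_source` reports
    where. A minimal sketch; the directory and module name are whatever
    throwaway values you pass in:

```python
import importlib
import importlib.util
import os
import sys


def import_and_locate_cache(directory, module_name):
    """Import module_name from directory; return where its bytecode is cached."""
    source = os.path.join(directory, module_name + ".py")
    sys.path.insert(0, directory)
    saved_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = False  # make sure caching isn't disabled for the demo
    try:
        importlib.invalidate_caches()  # the directory's contents may be brand new
        importlib.import_module(module_name)
    finally:
        sys.dont_write_bytecode = saved_flag
        sys.path.remove(directory)
    # On CPython this is <directory>/__pycache__/<module_name>.<tag>.pyc,
    # unless a pycache prefix (PYTHONPYCACHEPREFIX) has been configured.
    return importlib.util.cache_from_source(source)
```

    So the stub trick still works today; it just leaves the cached file in
    `__pycache__` instead of beside `myprog.py`.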

    I have one of these stubs for all my python scripts I've created so far.
    Is there not a better way? Do I have to create a separate stub each
    time? I find it a bit messy to require a pair of scripts for each
    utility and it also contributes some inefficiency. Given the above stub
    is so boilerplate, why does python not provide a general stub/utility
    mechanism for this?
     
    Mark, Mar 21, 2007
    #1

  2. Mark wrote:

    > E.g. say I have a python script "myprog.py". I could just execute
    > that directly each time but that means it is "compiled" each time
    > I run it which is not efficient and adds to startup time.


    Did you measure the performance hit in your case?

    > I have one of these stubs for all my python scripts I've created
    > so far. Is there not a better way? Do I have to create a separate
    > stub each time? I find it a bit messy to require a pair of scripts
    > for each utility and it also contributes some inefficiency. Given
    > the above stub is so boilerplate, why does python not provide a
    > general stub/utility mechanism for this?


    I've noticed that calling the interpreter with pre-compiled pyc
    files also works.

    Regards,


    Björn


    --
    BOFH excuse #68:

    only available on a need to know basis
     
    Bjoern Schliessmann, Mar 21, 2007
    #2

  3. Mark

    Max Erickson Guest

    Mark <> wrote:

    ....
    > #!/usr/bin/env python
    > from myprog import main
    > if __name__ == "__main__":
    >     main()
    >
    > Of course this compiles myprog.py into myprog.pyc on first run as
    > I am wanting.
    >
    > I have one of these stubs for all my python scripts I've created
    > so far. Is there not a better way? Do I have to create a separate
    > stub each time? I find it a bit messy to require a pair of
    > scripts for each utility and it also contributes some
    > inefficiency. Given the above stub is so boilerplate, why does
    > python not provide a general stub/utility mechanism for this?


    I don't know of a better way to organize things, but as an
    alternative, you could have a script where you import each of the
    scripts that you want compiled; python will write the compiled files
    to disk when you run it (thus avoiding the need to split the other
    scripts). There are also the py_compile and compileall modules, which
    have facilities for generating byte code.

    More here:

    http://effbot.org/zone/python-compile.htm

    under 'Compiling python modules to byte code'.
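    A minimal sketch of the compileall route, written for a current Python 3
    interpreter (an assumption; the thread is about Python 2.4/2.5).
    `compileall.compile_dir` byte-compiles every `.py` file under a directory
    and returns a true value if everything compiled; `precompile_tree` is
    just an illustrative wrapper name:

```python
import compileall


def precompile_tree(directory):
    """Byte-compile every .py file under directory; True if all compiled cleanly."""
    # quiet=1 prints only errors, not every file name; force=True rewrites
    # the cached .pyc files even when they already look up to date.
    return bool(compileall.compile_dir(directory, quiet=1, force=True))
```

    For example `precompile_tree(os.path.expanduser("~/bin"))`; from the
    shell, `python -m compileall ~/bin` does the same job.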


    max
     
    Max Erickson, Mar 21, 2007
    #3
  4. Mark

    Mark Guest

    So given the lack of response it seems that there is probably no such
    idiom and that I should not be concerned by the inefficiency inherent in
    running .py scripts directly?

    I did some time tests and sure, the speed gain is slight, but it is a
    gain none the less.
     
    Mark, Mar 23, 2007
    #4
  5. Mark

    Steve Holden Guest

    Mark wrote:
    > So given the lack of response it seems that there is probably no such
    > idiom and that I should not be concerned by the inefficiency inherent in
    > running .py scripts directly?
    >
    > I did some time tests and sure, the speed gain is slight, but it is a
    > gain none the less.


    Someone already told you - compile the files manually (see the compile
    built-in function) and then use

    python file.pyc
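    One caveat: the built-in compile() only returns a code object in memory.
    The standard-library function that writes a .pyc file to disk is
    `py_compile.compile`. A minimal sketch, assuming Python 3, that writes
    `file.pyc` beside the source (via the `cfile` argument, since Python 3
    would otherwise put it in `__pycache__`); `write_pyc_beside` is a
    hypothetical helper name:

```python
import py_compile


def write_pyc_beside(source_path):
    """Compile source_path into <source_path>c and return the .pyc's path."""
    # cfile puts the bytecode next to the source instead of in __pycache__;
    # doraise=True raises py_compile.PyCompileError on a syntax error.
    return py_compile.compile(source_path, cfile=source_path + "c", doraise=True)
```

    After that, `python file.pyc` runs the compiled file directly.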

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
    Recent Ramblings http://holdenweb.blogspot.com
     
    Steve Holden, Mar 23, 2007
    #5
  6. Mark

    Steve Holden Guest

    Mark wrote:
    > So given the lack of response it seems that there is probably no such
    > idiom and that I should not be concerned by the inefficiency inherent in
    > running .py scripts directly?
    >
    > I did some time tests and sure, the speed gain is slight, but it is a
    > gain none the less.


    Sorry, what you really need is the compileFile() function from the
    compiler module.

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
    Recent Ramblings http://holdenweb.blogspot.com
     
    Steve Holden, Mar 23, 2007
    #6
  7. On Fri, 23 Mar 2007 01:01:15 +0000, Mark wrote:

    > So given the lack of response it seems that there is probably no such
    > idiom and that I should not be concerned by the inefficiency inherent in
    > running .py scripts directly?
    >
    > I did some time tests and sure, the speed gain is slight, but it is a
    > gain none the less.


    Since you've done these tests already, perhaps you can tell us what gain
    you actually got?

    Here's a test I did:

    [steve@apple ~]$ time python script.py
    the result is 166166000

    real 0m0.555s
    user 0m0.470s
    sys 0m0.011s
    [steve@apple ~]$ time python script.pyc
    the result is 166166000

    real 0m0.540s
    user 0m0.456s
    sys 0m0.011s


    That gives me an absolute gain of 15ms which is a percentage gain of about
    3%. But don't forget the time it takes you to type the extra "c" at the
    end of the script, even with filename completion. The average human
    reaction time is something between 200 and 270 milliseconds, so unless
    you're saving at least 200ms, typing that extra "c" at the end actually
    wastes time.
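    Single `time` runs are noisy (a point that comes up again later in the
    thread), so a fairer comparison repeats each command and keeps the
    fastest run. A rough sketch, assuming Python 3; each run starts a fresh
    interpreter, so it measures startup plus execution, and
    `best_wall_time` is a hypothetical name:

```python
import subprocess
import time


def best_wall_time(cmd, repeats=5):
    """Run cmd `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best
```

    Compare, say, `best_wall_time([sys.executable, "script.py"])` with
    `best_wall_time([sys.executable, "script.pyc"])`, where both files are
    placeholders for your own script and its compiled form.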

    Of course you have to type the "c". You're not deleting the source files
    away are you? *wink*



    --
    Steven D'Aprano
     
    Steven D'Aprano, Mar 23, 2007
    #7
  8. Mark

    Mark Guest

    On Fri, 23 Mar 2007 14:03:12 +1100, Steven D'Aprano wrote:
    > Since you've done these tests already, perhaps you can tell us what gain
    > you actually got?


    About the same as you, ~20 msecs for my small script samples.

    > Of course you have to type the "c". You're not deleting the source files
    > away are you? *wink*


    Sorry, the wink is lost on me?

    Of course I am not deleting the sources. In fact, I am also talking
    about python scripts being called from shell scripts. I guess I'm just
    surprised that the python installation does not provide a small stub
    invoker, e.g.:

    A small script called "python_compile_and_run" in "pseudo" code:

    #!/usr/bin/env python
    import sys

    # Following is invalid syntax unfortunately :(
    from sys.argv[1].rstrip('.py') import main

    sys.argv = sys.argv[1:]
    if __name__ == "__main__":
        main()

    so I could just do a "python_compile_and_run myscript.py" and it would
    do what I want, i.e. run myscript.pyc if available and valid, generate
    and run it if necessary.
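    The pseudo-code above can be made to work with `importlib.import_module`,
    stripping the `.py` suffix with `os.path.splitext`. A minimal sketch,
    assuming Python 3 and assuming every target script defines `main()` (the
    same convention as the stub); because the script is imported, its
    bytecode is cached like any other module's. One wrinkle: a script whose
    name matches an already-imported module will not be re-imported.
    `run_script_main` is a hypothetical name:

```python
import importlib
import os
import sys


def run_script_main(argv):
    """Import the script named by argv[0] as a module and call its main()."""
    script = argv[0]
    directory, filename = os.path.split(os.path.abspath(script))
    module_name = os.path.splitext(filename)[0]  # "myscript.py" -> "myscript"
    sys.path.insert(0, directory)  # let the import system find the script
    sys.argv = list(argv)          # the script sees itself as sys.argv[0]
    module = importlib.import_module(module_name)
    return module.main()
```

    A wrapper called "python_compile_and_run" would then just do
    `sys.exit(run_script_main(sys.argv[1:]))`.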
     
    Mark, Mar 23, 2007
    #8
  9. On Mar 23, 8:30 am, Mark <> wrote:
    >
    > Of course I am not deleting the sources. In fact, I am also talking
    > about python scripts being called from shell scripts.


    There's a nice recipe in Python Cookbook (Martelli et al.) for this.
    It involves zipping your .pyc files and adding a shell stub. Never
    used it before but I'm going to need something similar in the near
    future, probably with a text templating system such as Cheetah
    (www.cheetahtemplate.org).

    HTH

    Gerard
     
    Gerard Flanagan, Mar 23, 2007
    #9
  10. On Fri, 23 Mar 2007 07:30:58 +0000, Mark wrote:

    > On Fri, 23 Mar 2007 14:03:12 +1100, Steven D'Aprano wrote:
    >> Since you've done these tests already, perhaps you can tell us what gain
    >> you actually got?

    >
    > About the same as you, ~20 msecs for my small script samples.


    Well, I think that pretty much answers your question about whether it is
    worth pre-compiling short shell scripts: you save about 20ms in execution
    time, and lose 200ms in typing time. (Maybe a bit less if you are a
    fast typist and don't use auto-completion.) You do the maths.


    >> Of course you have to type the "c". You're not deleting the source files
    >> away are you? *wink*

    >
    > Sorry, the wink is lost on me?


    It is because I didn't really think you were deleting the source files.
    That would be incredibly stupid. But I mentioned it just in case some
    not-so-bright spark decided to argue that you could use auto-completion
    without needing to type that final "c" if you deleted the source file.

    Presumably now somebody is going to suggest merely *moving* the source
    files into another directory, thus spending a minute or two each time they
    edit a script re-arranging files in order to save twenty or thirty
    milliseconds when they execute the script. Hey, if your time is so
    valuable that 20ms means that much to you, go for it.


    > Of course I am not deleting the sources. In fact, I am also talking
    > about python scripts being called from shell scripts. I guess I'm just
    > surprised that the python installation does not provide a small stub
    > invoker, e.g:
    >
    > A small script called "python_compile_and_run" in "pseudo" code:


    [snip pseudo-code]

    > so I could just do a "python_compile_and_run myscript.py" and it would
    > do what I want, i.e. run myscript.pyc if available and valid, generate
    > and run it if necessary.


    You shouldn't expect Python to come with every imaginable special-purpose
    script already written for you! Besides, it's pretty simple to get that
    functionality by hand when you need it, or automatically for that matter.

    Here's one (untested) script that executes the pyc file in a subshell if
    it exists and is new enough, and compiles it if it doesn't.


    import os, sys, compiler
    from stat import ST_MTIME as MT
    if __name__ == "__main__":
        scriptname = sys.argv[1]
        compiledname = scriptname + "c"
        if not os.path.exists(compiledname) or \
                os.stat(compiledname)[MT] < os.stat(scriptname)[MT]:
            # compiled file doesn't exist, or is too old
            compiler.compileFile(scriptname)
            assert os.path.exists(compiledname)
        resultcode = os.system('python %s' % compiledname)
        sys.exit(resultcode)

    Now don't forget to test whether launching the subshell takes longer than
    the 20ms you might save. All that effort, and wouldn't it be ironic if it
    was actually *slower* than executing the script from scratch each time...
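    For the record, a Python 3 take on the same idea might look like the
    sketch below. It is a port under assumptions, not Steven's code: the
    compiler module no longer exists, so it uses py_compile; it runs the
    child with subprocess.run and no shell, avoiding the extra /bin/sh
    process; and it returns the child's real exit code (on Unix, os.system
    returns an encoded wait status rather than the exit code). The mtime
    check is as naive as the original's. `run_cached` is a hypothetical name.

```python
import os
import py_compile
import subprocess
import sys


def run_cached(scriptname, *args):
    """Run scriptname + "c", recompiling it first if missing or older than the source."""
    compiledname = scriptname + "c"
    if (not os.path.exists(compiledname)
            or os.stat(compiledname).st_mtime < os.stat(scriptname).st_mtime):
        # compiled file doesn't exist, or is older than the source
        py_compile.compile(scriptname, cfile=compiledname, doraise=True)
    # No shell: run the .pyc directly with this interpreter and hand back
    # the child's exit code.
    return subprocess.run([sys.executable, compiledname, *args]).returncode
```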


    --
    Steven.
     
    Steven D'Aprano, Mar 23, 2007
    #10
  11. Mark

    mark Guest

    On Fri, 23 Mar 2007 22:24:07 +1100, Steven D'Aprano wrote:
    > if not os.path.exists(compiledname) or \
    >         os.stat(compiledname)[MT] < os.stat(scriptname)[MT]:
    >     # compiled file doesn't exist, or is too old


    Surely the validity check done by Python is more sophisticated than
    this? Doesn't the binary file have to be compiled under the same python
    version etc?
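    The suspicion is right. In CPython 3.7 and later (PEP 552) a .pyc starts
    with a 16-byte header: a 4-byte magic number that changes between
    bytecode versions, a 4-byte flags field, then either the source's
    modification time and size (4 bytes each, little-endian) or an 8-byte
    source hash. A rough sketch of a timestamp-based freshness check,
    assuming that layout and simply rejecting hash-based .pyc files;
    `pyc_is_fresh` is a hypothetical name:

```python
import importlib.util
import os
import struct


def pyc_is_fresh(source_path, pyc_path):
    """Approximate CPython's validity check for a timestamp-based .pyc."""
    try:
        with open(pyc_path, "rb") as f:
            header = f.read(16)
    except OSError:
        return False
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False  # missing, truncated, or written by another bytecode version
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags != 0:
        return False  # hash-based .pyc: not handled by this sketch
    st = os.stat(source_path)
    return (mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and size == (st.st_size & 0xFFFFFFFF))
```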

    > Now don't forget to test whether launching the subshell takes longer
    > than the 20ms you might save. All that effort, and wouldn't it be ironic
    > if it was actually *slower* than executing the script from scratch each
    > time...


    But Python proper is executing all the above anyhow, isn't it? So the 20
    msecs advantage I measured already includes this logic.

    Anyhow, I give up. Compilation, it seems, only applies to python
    modules. Compilation is not appropriate for Python scripts. Should be
    in the FAQ.
     
    mark, Mar 23, 2007
    #11
  12. On Fri, 23 Mar 2007 12:22:44 +0000, mark wrote:

    > On Fri, 23 Mar 2007 22:24:07 +1100, Steven D'Aprano wrote:
    >> if not os.path.exists(compiledname) or \
    >>         os.stat(compiledname)[MT] < os.stat(scriptname)[MT]:
    >>     # compiled file doesn't exist, or is too old

    >
    > Surely the validity check done by Python is more sophisticated than
    > this? Doesn't the binary file have to be compiled under the same python
    > version etc?


    Of course. What, do you want me to do all your work? :)



    >> Now don't forget to test whether launching the subshell takes longer
    >> than the 20ms you might save. All that effort, and wouldn't it be ironic
    >> if it was actually *slower* than executing the script from scratch each
    >> time...

    >
    > But Python proper is executing all the above anyhow isn't it? So the 20
    > msecs advantage I measured already includes this logic.


    I don't know how you measured the 20ms.

    When you call a script direct from the shell, you've already got a shell
    running. When you call a script from within Python via os.system, it has
    to launch a sub-shell. That takes time.


    > Anyhow, I give up. Compilation, it seems, only applies to python
    > modules. Compilation is not appropriate for Python scripts. Should be
    > in the FAQ.


    No, that's not true. Python scripts certainly take advantage of compiled
    modules.

    The real lesson of this is that optimization isn't necessarily
    straightforward. What we imagine "should be" faster might not be in
    practice -- especially when dealing with micro-optimizations that only
    save a few tens of milliseconds.

    Frankly, it simply isn't worth trying to save 20ms in a script that takes
    less than a second to run. If you scratch your nose before hitting enter,
    you've wasted a hundred times what you've just spent hours trying to save.

    Or, to put it another way:

    The Rules of Optimization are simple.
    Rule 1: Don't do it.
    Rule 2 (for experts only): Don't do it yet.
    -- Michael A. Jackson (no, not that Michael Jackson), "Principles of
    Program Design", 1975.



    --
    Steven.
     
    Steven D'Aprano, Mar 23, 2007
    #12
  13. Mark <> wrote:
    ...
    > so I could just do a "python_compile_and_run myscript.py" and it would
    > do what I want, i.e. run myscript.pyc if available and valid, generate
    > and run it if necessary.


    You can use

    python -c 'import myscript; myscript.main()'

    and variations thereon.


    Alex
     
    Alex Martelli, Mar 23, 2007
    #13
  14. Mark

    Mark Guest

    On Fri, 23 Mar 2007 07:47:04 -0700, Alex Martelli wrote:
    > You can use
    >
    > python -c 'import myscript; myscript.main()'
    >
    > and variations thereon.


    Hmmm, after all that, this seems to be close to what I was looking for.

    Thanks Alex. I didn't find anything about this in your cookbook! (I'm
    just starting to read it cover to cover; best way to really learn the
    language, I think.)

    So the general purpose invoking bash script e.g. "runpy" is merely something
    like:

    #################################################################
    #!/bin/bash

    if [ $# -lt 1 ]; then
        echo "usage: `basename $0` script.py [script args ..]"
        exit 1
    fi

    PROG=$1
    DIR=`dirname "$PROG"`
    MOD=`basename "$PROG"`
    MOD=${MOD%.py}
    shift
    # Put the script's directory first on sys.path so it wins over any
    # same-named module, and pass the remaining args through intact.
    exec python -c "import sys; \
    sys.argv[0] = \"$PROG\"; \
    sys.path.insert(0, \"$DIR\"); \
    import $MOD; $MOD.main()" "$@"
    #################################################################

    So I timed "~/bin/myscript.py myargs" against "runpy ~/bin/myscript.py
    myargs" but got only maybe a couple of millisec improvement (using a
    1000 line myscript.py which just immediately exits - to try and push the
    advantage!).

    So I agree - the ends do not justify the means here and I'll just
    execute myscript.py directly. Still not sure why python does not provide
    this as a "python --compile_and_run myscript.py" option though?! ;)
     
    Mark, Mar 24, 2007
    #14
  15. Mark

    Guest

    On Mar 23, 9:30 am, Mark <> wrote:
    > A small script called "python_compile_and_run" in "pseudo" code:
    >
    > #!/usr/bin/env python
    > import sys
    >
    > # Following is invalid syntax unfortunately :(
    > from sys.argv[1].rstrip('.py') import main
    >
    > sys.argv = sys.argv[1:]
    > if __name__ == "__main__":
    >     main()
    >
    > so I could just do a "python_compile_and_run myscript.py" and it would
    > do what I want, i.e. run myscript.pyc if available and valid, generate
    > and run it if necessary.


    There's __import__ which allows you to do what you tried:

    m = __import__(sys.argv[1].rstrip('.py'))

    Also, rstrip doesn't work like you think it does.
    'pyxyypp.py'.rstrip('.py') == 'pyx'
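    The reason is that str.rstrip treats its argument as a set of characters
    to strip, not as a suffix. To drop an extension, os.path.splitext works
    on any Python version, and Python 3.9 added str.removesuffix (the 3.9
    requirement is the only assumption here):

```python
import os.path

name = "pyxyypp.py"
print(name.rstrip(".py"))         # pyx  (strips any trailing '.', 'p' or 'y')
print(os.path.splitext(name)[0])  # pyxyypp
print(name.removesuffix(".py"))   # pyxyypp  (Python 3.9+)
```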

    Answering also to your later message:

    > So the general purpose invoking bash script e.g. "runpy" is merely something
    > like:

    Curiously enough, Python 2.5 has a module called runpy:
    http://docs.python.org/lib/module-runpy.html

    which seems to almost do what you want. It doesn't compile the modules
    but you could make a modification which does. The benefit over just
    using __import__("module").main() would be that your scripts wouldn't
    necessarily need a function called "main", but would still work with
    scripts that use the __name__ == '__main__' idiom.

    A simple implementation that "works":


    import imp, sys, os
    c = sys.argv[1]
    if not os.path.exists(c + 'c') or \
            os.stat(c).st_mtime > os.stat(c + 'c').st_mtime:
        import compiler
        compiler.compileFile(c)
    del sys.argv[0]
    imp.load_compiled('__main__', c + 'c')


    I timed it against running plain .py and running .pyc directly.
    It seemed to be roughly on par with running .pyc directly, and about
    18ms faster than running .py. The file had 600 lines (21kb) of code.
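    In current Python the runpy module mentioned above offers
    run_module(mod_name, run_name='__main__', alter_sys=True), the same
    machinery that sits behind `python -m`. Because it goes through the
    import system, a module run this way should pick up cached bytecode from
    `__pycache__`, and scripts need no main() function: the usual
    `if __name__ == '__main__':` block fires. A rough sketch of a Python
    replacement for the bash wrapper, assuming Python 3; `run_as_main` is a
    hypothetical name:

```python
import os
import runpy
import sys


def run_as_main(script_path, args):
    """Run script_path as __main__ via the import system; return its globals."""
    directory, filename = os.path.split(os.path.abspath(script_path))
    module_name = os.path.splitext(filename)[0]
    sys.path.insert(0, directory)  # make the script importable by name
    sys.argv = [script_path, *args]
    try:
        # alter_sys=True points sys.argv[0] and sys.modules['__main__'] at
        # the script's module for the duration of the run.
        return runpy.run_module(module_name, run_name="__main__", alter_sys=True)
    finally:
        sys.path.remove(directory)
```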
     
    , Mar 24, 2007
    #15
  16. Mark

    Mark Guest

    On Sat, 24 Mar 2007 07:21:21 -0700, irstas wrote:
    > Also, rstrip doesn't work like you think it does.
    > 'pyxyypp.py'.rstrip('.py') == 'pyx'


    Well there is embarrassing confirmation that I am a python newbie :(

    > I timed it against running plain .py and running .pyc directly. It
    > seemed to be roughly on par with running .pyc directly, and about 18ms
    > faster than running .py. The file had 600 lines (21kb) of code.


    So you see my point, at least? I'm still not sure why this approach is
    frowned upon.

    Thanks very much for all the detailed responses here.
     
    Mark, Mar 24, 2007
    #16
  17. Mark

    Mark Guest

    On Sat, 24 Mar 2007 07:21:21 -0700, irstas wrote:
    > A simple implementation that "works":


    Not quite, irstas, BTW...

    > import imp, sys, os
    > c = sys.argv[1]
    > if not os.path.exists(c + 'c') or \
    >         os.stat(c).st_mtime > os.stat(c + 'c').st_mtime:
    >     import compiler
    >     compiler.compileFile(c)
    > del sys.argv[0]
    > imp.load_compiled('__main__', c + 'c')


    The above doesn't actually work for my test script. I have an atexit
    call in the script which is deleting some temp files and I get the
    following traceback on termination when run with the above:

    Error in atexit._run_exitfuncs:
    Traceback (most recent call last):
      File "atexit.py", line 24, in _run_exitfuncs
        func(*targs, **kargs)
      File "/home/mark/bin/myscript.py", line 523, in delete
        if files.tempdir:
    AttributeError: 'NoneType' object has no attribute 'tempdir'
    Error in sys.exitfunc:
    Traceback (most recent call last):
      File "/usr/lib/python2.4/atexit.py", line 24, in _run_exitfuncs
        func(*targs, **kargs)
      File "/home/mark/bin/myscript.py", line 523, in delete
        if files.tempdir:
    AttributeError: 'NoneType' object has no attribute 'tempdir'

    The relevant source code is:

    At the start of main() ..

    # Ensure all temp files deleted on exit
    import atexit
    atexit.register(files.delete)

    and then from my class "files":

    @staticmethod
    def delete():
    '''Called to delete all temp files'''

    if files.tempdir:
    shutil.rmtree(files.tempdir)

    Something about the environment is not quite the same. Any ideas?
     
    Mark, Mar 24, 2007
    #17
  18. En Sat, 24 Mar 2007 20:46:15 -0300, Mark <> escribió:

    > The above doesn't actually work for my test script. I have an atexit
    > call in the script which is deleting some temp files and I get the
    > following traceback on termination when run with the above:
    >
    > Error in atexit._run_exitfuncs:
    > Traceback (most recent call last):
    >   File "atexit.py", line 24, in _run_exitfuncs
    >     func(*targs, **kargs)
    >   File "/home/mark/bin/myscript.py", line 523, in delete
    >     if files.tempdir:
    > AttributeError: 'NoneType' object has no attribute 'tempdir'


    I don't know exactly what changed to make it stop working (it worked
    before), but script finalization is always a bit fragile. All values in
    all module dictionaries (which hold the globals) are set to None
    (presumably to help garbage collection by breaking cycles). By the time
    your delete function is called, globals like shutil or files are already
    gone. One way to avoid this problem is to hold a reference to every
    required global, so your delete function would become:

    @staticmethod
    def delete(files=files, rmtree=shutil.rmtree):
        '''Called to delete all temp files'''
        if files.tempdir:
            rmtree(files.tempdir)

    But I'm not sure if this is enough because rmtree relies on the os module
    to do its work.
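    A different way to sidestep the problem, sketched for a modern Python 3
    interpreter: atexit.register also accepts positional and keyword
    arguments and stores them alongside the function, so the cleanup call
    can be bound completely at registration time and nothing has to be
    looked up in module globals when the exit handlers run. (Note also that a
    default such as `files=files` written inside the body of class `files`
    raises NameError, because the class name is only bound once the class
    body has finished executing, unless some other global named `files`
    already exists.) The sketch assumes the temp directory comes from
    tempfile.mkdtemp; `make_tempdir` is a hypothetical helper:

```python
import atexit
import shutil
import tempfile


def make_tempdir():
    """Create a temporary directory and arrange for it to be removed at exit."""
    path = tempfile.mkdtemp()
    # rmtree and path are captured now, so nothing is looked up in module
    # globals when the exit handlers run; ignore_errors avoids a traceback
    # if the directory has already been removed.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

    On Python 3, tempfile.TemporaryDirectory(), used as a context manager or
    with an explicit cleanup() call, is often a tidier alternative.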

    --
    Gabriel Genellina
     
    Gabriel Genellina, Mar 25, 2007
    #18
  20. On Sat, 24 Mar 2007 22:59:06 +0000, Mark wrote:

    >> I timed it against running plain .py and running .pyc directly. It
    >> seemed to be roughly on par with running .pyc directly, and about 18ms
    >> faster than running .py. The file had 600 lines (21kb) of code.

    >
    > So see my point at least? I'm still not sure why this approach is
    > ill-favoured?


    Because this is entirely a trivial saving. Who cares? Sheesh.

    That's less than the natural variation in execution speed caused by (e.g.)
    network events on your PC. I've just run the same do-nothing script (a
    simple "pass") three times, and got times of 338ms, 67ms and 74ms. That's
    a variation of 271 milliseconds between runs of the same script, and you
    care about 18ms???

    Saving 18ms on a script that takes 50ms to execute *might* be worthwhile,
    if you're using that script in an automated system that executes it
    thousands of times. If you're calling it by hand, come on now, you're not
    even going to notice the difference! 50ms is close enough to instantaneous
    that 32ms is not detectably faster to the human eye.

    If you save 18ms one thousand times a day, you save a grand total of ...
    eighteen seconds. Wow. Now you can spend more time with your family.

    As of 2005, the world's fastest typist Barbara Blackburn has been clocked
    at a peak of 212 words per minute for short bursts. Assuming an average of
    five key strokes per word (including the space) that's about 18 key
    presses per second, or 55 milliseconds per key press. A more realistic
    figure for the average professional typist is about six key presses per
    second, or 160 milliseconds per key press, and that's for pure
    transposition (copying). If you've got to think carefully about what
    you're typing, like sys admins do, the average time per key press is
    significantly larger.

    In other words, unless you can save AT LEAST 160 milliseconds, it isn't
    worth typing even one more character. If you have to type one extra
    character to save 18ms, you're actually 140ms worse off.
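    The typing figures are easy to re-derive. A quick check of the
    arithmetic, using the same inputs quoted above (212 words per minute,
    five key strokes per word, six key presses per second, 18ms saved):

```python
words_per_minute = 212
keys_per_word = 5        # including the space
saving_ms = 18

peak_keys_per_second = words_per_minute * keys_per_word / 60
peak_ms_per_key = 1000 / peak_keys_per_second
typical_ms_per_key = 1000 / 6  # six key presses per second

print(round(peak_keys_per_second, 1))            # 17.7
print(round(peak_ms_per_key, 1))                 # 56.6
print(round(typical_ms_per_key, 1))              # 166.7
print(round(typical_ms_per_key - saving_ms, 1))  # 148.7
```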

    I can't believe the number of people who are spending this amount of time
    worrying about such a trivial saving, and I can't believe that I've let
    myself be suckered into this discussion. Don't you people have lives???



    --
    Steven
    who has no life, which is why he is spending time complaining about people
    who have no lives.
     
    Steven D'Aprano, Mar 25, 2007
    #20