python profiling, hotspot and strange execution time

Discussion in 'Python' started by cournape@gmail.com, Sep 6, 2005.

  1. Guest

    Hi there,

    I have some scientific application written in python. There is a
    good deal of list processing, but also some "simple" computation such
    as basic linear algebra involved. I would like to speed things up
    implementing some of the functions in C. So I need profiling.

    I first tried to use the default python profiler, but profiling my
    application multiplies the execution time by a factor between 10 and
    100 ! So I decided to give a try to hotspot. I just followed the
    example of the python library reference, but I have some strange
    results concerning cpu time. My profiling script is something like the
    following:

    def run_foo():
    print time.clock()

    function_to_profile()

    print time.clock()

    prof = hotshot.Profile("essai.prof")
    benchtime= prof.runcall(run_foo)
    prof.close()
    stats = hotshot.stats.load("essai.prof")
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

    The goal is to profile the function function_to_profile(). Running this
    script gives me a CPU executime time of around 2 seconds, whereas the
    difference between the two clock calls is around 10 seconds ! And I
    don't run any other cpu consuming tasks at the same time, so this
    cannot come from other running processes. Is there something perticular
    about hotspot timing I should know ? I am not sure how I can get more
    accurate results with hotspot.

    I would appreciate any help,

    Thanks
    , Sep 6, 2005
    #1
    1. Advertising

  2. Paul Rubin Guest

    writes:
    > I have some scientific application written in python. There is a
    > good deal of list processing, but also some "simple" computation such
    > as basic linear algebra involved. I would like to speed things up
    > implementing some of the functions in C. So I need profiling.


    Why don't you use numarray for the linear algebra?
    Paul Rubin, Sep 6, 2005
    #2
    1. Advertising

  3. In message <>,
    writes
    >Hi there,
    >
    > I have some scientific application written in python. There is a
    >good deal of list processing, but also some "simple" computation such
    >as basic linear algebra involved. I would like to speed things up
    >implementing some of the functions in C. So I need profiling.


    You haven't said which platform you are on. If you are on Windows you
    may want to try Python Performance Validator.

    http://www.softwareverify.com

    Stephen
    --
    Stephen Kellett
    Object Media Limited http://www.objmedia.demon.co.uk/software.html
    Computer Consultancy, Software Development
    Windows C++, Java, Assembler, Performance Analysis, Troubleshooting
    Stephen Kellett, Sep 6, 2005
    #3
  4. Jeremy Jones Guest

    wrote:

    >Hi there,
    >
    > I have some scientific application written in python. There is a
    >good deal of list processing, but also some "simple" computation such
    >as basic linear algebra involved. I would like to speed things up
    >implementing some of the functions in C. So I need profiling.
    >
    > I first tried to use the default python profiler, but profiling my
    >application multiplies the execution time by a factor between 10 and
    >100 ! So I decided to give a try to hotspot.
    >

    OK - first of all, as someone else has asked, what platform are you
    running? I'm assuming it's windows since you're referring to
    time.clock() and then later saying "wall clock".

    Next, what are you hoping that the profiler will give you? If you're
    expecting it to give you the big picture of your application's
    performance and give you "real runtime numbers", you may be
    disappointed. It is a deterministic profiler and will give you CPU time
    spent in different areas of code rather than and overall "how long did
    this thing take to run?".

    > I just followed the
    >example of the python library reference, but I have some strange
    >results concerning cpu time. My profiling script is something like the
    >following:
    >
    >def run_foo():
    > print time.clock()
    >
    > function_to_profile()
    >
    > print time.clock()
    >
    >prof = hotshot.Profile("essai.prof")
    >benchtime= prof.runcall(run_foo)
    >prof.close()
    >stats = hotshot.stats.load("essai.prof")
    >stats.strip_dirs()
    >stats.sort_stats('time', 'calls')
    >stats.print_stats(20)
    >
    >


    Well, let's just add more confusion to the pot, shall we? Look at this
    example (a slight hack from yours)::

    import time
    import hotshot
    import hotshot.stats


    def run_foo():
    print time.clock()
    print time.time()

    time.sleep(5)

    print time.clock()
    print time.time()

    prof = hotshot.Profile("essai.prof")
    benchtime= prof.runcall(run_foo)
    prof.close()
    stats = hotshot.stats.load("essai.prof")
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

    and the output::

    0.24
    1126011669.55
    0.24
    1126011674.55
    1 function calls in 0.000 CPU seconds

    Ordered by: internal time, call count

    ncalls tottime percall cumtime percall filename:lineno(function)
    1 0.000 0.000 0.000 0.000 tmphQNKbq.py:6(run_foo)
    0 0.000 0.000 profile:0(profiler)



    I inserted a time.time() call since I'm on Linux and time.clock()
    returns a process's CPU time and wanted to show the "wall clock time" as
    it were. So, the stats show 0 time taken, whereas time.time() shows 5
    seconds. It's because the time.sleep() took a negligable amount of CPU
    time which is what the profiler looks at.

    >The goal is to profile the function function_to_profile(). Running this
    >script gives me a CPU executime time of around 2 seconds, whereas the
    >difference between the two clock calls is around 10 seconds !
    >

    I would attribute the wall clock and profile time difference to the
    overhead of hotshot. While hotshot is miles better than the "regular"
    profiler, it can still take a little time to profile code.

    > And I
    >don't run any other cpu consuming tasks at the same time, so this
    >cannot come from other running processes. Is there something perticular
    >about hotspot timing I should know ? I am not sure how I can get more
    >accurate results with hotspot.
    >
    >I would appreciate any help,
    >
    >Thanks
    >
    >
    >

    HTH,


    JMJ
    Jeremy Jones, Sep 6, 2005
    #4
  5. Guest


    > OK - first of all, as someone else has asked, what platform are you
    > running? I'm assuming it's windows since you're referring to
    > time.clock() and then later saying "wall clock".


    Actually, no. I am working on a x86 linux (HT disabled for this
    testing, as I thought it may introduce some subtilities). I am not sure
    aht you mean by wall clock ?
    >
    > Next, what are you hoping that the profiler will give you? If you're
    > expecting it to give you the big picture of your application's
    > performance and give you "real runtime numbers", you may be
    > disappointed. It is a deterministic profiler and will give you CPU time
    > spent in different areas of code rather than and overall "how long did
    > this thing take to run?".


    I am not sure to understand the big difference between "time spent in
    different areas of code" and "how long did this thing take to run?".
    Looking at python doc for deterministic profiling, I understand the
    implementation difference, and the performance implications, but I
    don't see why deterministic profiling would not give me an overall
    picture ?

    > Well, let's just add more confusion to the pot, shall we? Look at this
    > example (a slight hack from yours)::
    >
    > import time
    > import hotshot
    > import hotshot.stats
    >
    >
    > def run_foo():
    > print time.clock()
    > print time.time()
    >
    > time.sleep(5)
    >
    > print time.clock()
    > print time.time()
    >
    > prof = hotshot.Profile("essai.prof")
    > benchtime= prof.runcall(run_foo)
    > prof.close()
    > stats = hotshot.stats.load("essai.prof")
    > stats.strip_dirs()
    > stats.sort_stats('time', 'calls')
    > stats.print_stats(20)
    >
    > and the output::
    >
    > 0.24
    > 1126011669.55
    > 0.24
    > 1126011674.55
    > 1 function calls in 0.000 CPU seconds
    >
    > Ordered by: internal time, call count
    >
    > ncalls tottime percall cumtime percall filename:lineno(function)
    > 1 0.000 0.000 0.000 0.000 tmphQNKbq.py:6(run_foo)
    > 0 0.000 0.000 profile:0(profiler)
    >
    >
    >
    > I inserted a time.time() call since I'm on Linux and time.clock()
    > returns a process's CPU time and wanted to show the "wall clock time" as
    > it were. So, the stats show 0 time taken, whereas time.time() shows 5
    > seconds. It's because the time.sleep() took a negligable amount of CPU
    > time which is what the profiler looks at.


    Well, your example make actually more sense to me :) I understand the
    difference between CPU time and time spent in the python code (even if
    I was not clear in my previous post about it...). But this case does
    not apply to my code, as my code is never "idled", takes 100 % of the
    cpu, with no other CPU consuming task

    > I would attribute the wall clock and profile time difference to the
    > overhead of hotshot. While hotshot is miles better than the "regular"
    > profiler, it can still take a little time to profile code.


    Well, if hotshot reported a timing which is longer than the execution
    time without it, I would have considered that to be normal. Even in C,
    using gprof has a non negligeable overhead, most of the time.

    What I don't understand is why hotshot reports that do_foo is executed
    in 2 seconds whereas it effectively takes more than 10 seconds ? Is it
    because I don't understand what deterministic profiling is about ?

    David
    , Sep 8, 2005
    #5
  6. Jeremy Jones Guest

    wrote:
    <snip>

    >I am not sure to understand the big difference between "time spent in
    >different areas of code" and "how long did this thing take to run?".
    >Looking at python doc for deterministic profiling, I understand the
    >implementation difference, and the performance implications, but I
    >don't see why deterministic profiling would not give me an overall
    >picture ?
    >
    >

    I think from below you said you were more clear on this. Cool.

    <snip>

    >Well, your example make actually more sense to me :) I understand the
    >difference between CPU time and time spent in the python code (even if
    >I was not clear in my previous post about it...). But this case does
    >not apply to my code, as my code is never "idled", takes 100 % of the
    >cpu, with no other CPU consuming task
    >
    >
    >
    >>I would attribute the wall clock and profile time difference to the
    >>overhead of hotshot. While hotshot is miles better than the "regular"
    >>profiler, it can still take a little time to profile code.
    >>
    >>

    >
    >Well, if hotshot reported a timing which is longer than the execution
    >time without it, I would have considered that to be normal. Even in C,
    >using gprof has a non negligeable overhead, most of the time.
    >
    >

    Actually, I'd expect the opposite, but not as extreme for your case. I
    would expect it to *report* that a piece of code took less time to
    execute than I *observed* it taking. Reasons in the snipped area
    above. Unless you're calling a C extension, in which case, IIRC, it's
    supposed to report the actual execution time of the C call (and I guess
    plus any overhead that hotshot may cause it to incur) in which case you
    would be IMO 100% correct. I hope you're not calling a C extension, or
    my head's gonna explode.

    >What I don't understand is why hotshot reports that do_foo is executed
    >in 2 seconds whereas it effectively takes more than 10 seconds ? Is it
    >because I don't understand what deterministic profiling is about ?
    >
    >

    The profiler is supposed to be smart about how it tracks time spent in
    execution so it doesn't get readings that are tainted by other processes
    running or other stuff. I could easily see a 2->10 second disparity if
    your process were idling somehow. Now, if you're doing a lot of IO, I
    wonder if the profiler isn't taking into consideration any blocking
    calls that may max out the CPU in IOWAIT... Are you doing a lot of IO?

    >David
    >
    >
    >

    JMJ
    Jeremy Jones, Sep 8, 2005
    #6
  7. Guest

    I was unhappy with both hotshot and the standard python profiler, so
    I wrote my own, which may be what you are looking for. I've submitted
    it as a patch at:

    http://sourceforge.net/tracker/index.php?func=detail&aid=1212837&group_id=5470&atid=305470

    It should add a minimum of overhead, give real numbers and also
    gives stats on child calls. However, it is not compatible with
    the stats module.

    You can compile it as a standalone module.


    wrote:
    > Hi there,
    >
    > I have some scientific application written in python. There is a
    > good deal of list processing, but also some "simple" computation such
    > as basic linear algebra involved. I would like to speed things up
    > implementing some of the functions in C. So I need profiling.
    >
    > I first tried to use the default python profiler, but profiling my
    > application multiplies the execution time by a factor between 10 and
    > 100 ! So I decided to give a try to hotspot. I just followed the
    > example of the python library reference, but I have some strange
    > results concerning cpu time. My profiling script is something like the
    > following:
    >
    > def run_foo():
    > print time.clock()
    >
    > function_to_profile()
    >
    > print time.clock()
    >
    > prof = hotshot.Profile("essai.prof")
    > benchtime= prof.runcall(run_foo)
    > prof.close()
    > stats = hotshot.stats.load("essai.prof")
    > stats.strip_dirs()
    > stats.sort_stats('time', 'calls')
    > stats.print_stats(20)
    >
    > The goal is to profile the function function_to_profile(). Running this
    > script gives me a CPU executime time of around 2 seconds, whereas the
    > difference between the two clock calls is around 10 seconds ! And I
    > don't run any other cpu consuming tasks at the same time, so this
    > cannot come from other running processes. Is there something perticular
    > about hotspot timing I should know ? I am not sure how I can get more
    > accurate results with hotspot.
    >
    > I would appreciate any help,
    >
    > Thanks
    , Sep 8, 2005
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Naresh Agarwal
    Replies:
    9
    Views:
    7,553
    Brad BARCLAY
    Jan 8, 2004
  2. WiZZ
    Replies:
    6
    Views:
    7,619
  3. aurora
    Replies:
    3
    Views:
    383
    aurora
    Feb 2, 2005
  4. westblob

    Building Hotspot and JDK

    westblob, Nov 26, 2006, in forum: Java
    Replies:
    1
    Views:
    292
    Andrew Thompson
    Nov 26, 2006
  5. Kelvin
    Replies:
    1
    Views:
    191
    Aaron Bertrand [SQL Server MVP]
    Jul 13, 2005
Loading...

Share This Page