unit-profiling, similar to unit-testing

Discussion in 'Python' started by Ulrich Eckhardt, Nov 16, 2011.

  1. Hi!

    I'm currently trying to establish a few tests here that evaluate certain
    performance characteristics of our systems. As part of this, I found
    that these tests are rather similar to unit-tests, only that they are
    much more fuzzy and obviously dependent on the systems involved, CPU
    load, network load, day of the week (Tuesday is virus scan day) etc.

    What I'd just like to ask is how you do such things. Are there tools
    available that help? I was considering using the unit testing framework,
    but the problem with that is that the results are too hard to interpret
    programmatically and too easy to misinterpret manually. Any suggestions?

    Cheers!

    Uli
     
    Ulrich Eckhardt, Nov 16, 2011
    #1

  2. Ulrich Eckhardt

    Roy Smith Guest

    In article <>,
    Ulrich Eckhardt <> wrote:

    > Hi!
    >
    > I'm currently trying to establish a few tests here that evaluate certain
    > performance characteristics of our systems. As part of this, I found
    > that these tests are rather similar to unit-tests, only that they are
    > much more fuzzy and obviously dependent on the systems involved, CPU
    > load, network load, day of the week (Tuesday is virus scan day) etc.
    >
    > What I'd just like to ask is how you do such things. Are there tools
    > available that help? I was considering using the unit testing framework,
    > but the problem with that is that the results are too hard to interpret
    > programmatically and too easy to misinterpret manually. Any suggestions?


    It's really, really, really hard to either control for, or accurately
    measure, things like CPU or network load. There's so much stuff you
    can't even begin to see. The state of your main memory cache. Disk
    fragmentation. What I/O is happening directly out of kernel buffers vs
    having to do a physical disk read. How slow your DNS server is today.

    What I suggest is instrumenting your unit test suite to record not just
    the pass/fail status of every test, but also the test duration. Stick
    these into a database as the tests run. Over time, you will accumulate
    a whole lot of performance data, which you can then start to mine.

    While you're running the tests, gather as much system performance data
    as you can (output of top, vmstat, etc) and stick that into your
    database too. You never know when you'll want to refer to the data, so
    just collect it all and save it forever.
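
    One way to sketch the first part in Python (a rough illustration only;
    the SQLite file name, table layout and class names are made up, not an
    existing API) is a TestCase base class that notes the start time in
    setUp() and logs the elapsed time in tearDown():

        import sqlite3
        import time
        import unittest

        class TimedTestCase(unittest.TestCase):
            """Record the duration of every test in an SQLite table."""

            DB_PATH = "perf.db"  # hypothetical results database

            @classmethod
            def setUpClass(cls):
                cls.db = sqlite3.connect(cls.DB_PATH)
                cls.db.execute("CREATE TABLE IF NOT EXISTS runs "
                               "(test TEXT, when_run TEXT, seconds REAL)")

            @classmethod
            def tearDownClass(cls):
                cls.db.close()

            def setUp(self):
                self._start = time.time()

            def tearDown(self):
                # tearDown also runs for failing tests, so every run gets
                # logged; recording pass/fail as well would need a custom
                # TestResult.
                elapsed = time.time() - self._start
                self.db.execute(
                    "INSERT INTO runs VALUES (?, datetime('now'), ?)",
                    (self.id(), elapsed))
                self.db.commit()

        class TestSomething(TimedTestCase):
            def test_sum(self):
                self.assertEqual(sum(range(1000)), 499500)

        if __name__ == "__main__":
            unittest.main()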
     
    Roy Smith, Nov 16, 2011
    #2

  3. Am 16.11.2011 15:36, schrieb Roy Smith:
    > It's really, really, really hard to either control for, or accurately
    > measure, things like CPU or network load. There's so much stuff you
    > can't even begin to see. The state of your main memory cache. Disk
    > fragmentation. What I/O is happening directly out of kernel buffers vs
    > having to do a physical disk read. How slow your DNS server is today.


    Fortunately, I am in a position where I run the tests on one system (a
    generic desktop PC) while the system under test is a separate one, and
    both its hardware and software are under my control. Since that system
    is a rather small embedded device, the power and load of the desktop
    don't play a significant role; the other side is usually the
    bottleneck. ;)


    > What I suggest is instrumenting your unit test suite to record not just
    > the pass/fail status of every test, but also the test duration. Stick
    > these into a database as the tests run. Over time, you will accumulate
    > a whole lot of performance data, which you can then start to mine.


    I'm not sure. I see unit tests as something that makes sure things run
    correctly. For performance testing, I have functions to set up and tear
    down the environment. I have also found it useful to have separate code
    to prime a cache, something that is done before each test run but is
    not part of the measured run itself. I repeat each test run N times,
    recording the times and calculating maximum, minimum, average and
    standard deviation. Some of this is similar to unit testing (code to
    set up/tear down), but other things are too different. Also, sometimes
    I can vary a test with a factor F, and then I also want to capture the
    influence of that factor. I would even wonder if you can't verify the
    behaviour against an expected Big-O complexity somehow.

    All of this is rather general, not specific to my use case, hence my
    question whether there are existing frameworks to facilitate this task.
    Maybe it's time to create one...
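
    Roughly, the harness I have in mind would look something like this
    (just a sketch with made-up names, not an existing framework):

        import math
        import time

        def benchmark(run, n=10, setup=None, teardown=None, prime=None):
            """Time run() n times; prime() warms caches outside the timing."""
            if setup:
                setup()
            try:
                times = []
                for _ in range(n):
                    if prime:
                        prime()              # e.g. fill a cache, not timed
                    start = time.time()
                    run()
                    times.append(time.time() - start)
            finally:
                if teardown:
                    teardown()
            mean = sum(times) / len(times)
            var = sum((t - mean) ** 2 for t in times) / len(times)
            return {"min": min(times), "max": max(times),
                    "mean": mean, "stddev": math.sqrt(var)}

        # Vary a factor F to see how the timings react to it.
        for factor in (100, 1000, 10000):
            stats = benchmark(lambda: sorted(range(factor, 0, -1)), n=5)
            print("%d: %r" % (factor, stats))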


    > While you're running the tests, gather as much system performance data
    > as you can (output of top, vmstat, etc) and stick that into your
    > database too. You never know when you'll want to refer to the data, so
    > just collect it all and save it forever.


    Yes, this is surely necessary, in particular since there are no clear
    success/failure outputs as with unit tests, so the results require a
    human to interpret them.


    Cheers!

    Uli
     
    Ulrich Eckhardt, Nov 17, 2011
    #3
  4. Ulrich Eckhardt

    Roy Smith Guest

    In article <>,
    Ulrich Eckhardt <> wrote:

    > Yes, this is surely necessary, in particular since there are no clear
    > success/failure outputs as with unit tests, so the results require a
    > human to interpret them.


    As much as possible, you want to automate things so no human
    intervention is required.

    For example, let's say you have a test which calls foo() and times how
    long it takes. You've already mentioned that you run it N times and
    compute some basic (min, max, avg, sd) stats. So far, so good.

    The next step is to do some kind of regression against past results.
    Once you've got a bunch of historical data, it should be possible to
    look at today's numbers and detect any significant change in performance.

    Much as I loathe the bureaucracy and religious fervor which has grown up
    around Six Sigma, it does have some good tools. You might want to look
    into control charts (http://en.wikipedia.org/wiki/Control_chart). You
    think you've got the test environment under control, do you? Try
    plotting a month's worth of run times for a particular test on a control
    chart and see what it shows.

    Assuming your process really is under control, I would write scripts
    that did the following kinds of analysis:

    1) For a given test, do a linear regression of run time vs date. If the
    line has any significant positive slope, you want to investigate why.

    2) You already mentioned, "I would even wonder if you can't verify the
    behaviour against an expected Big-O complexity somehow". Of course you
    can. Run your test a bunch of times with different input sizes. I
    would try something like a 1-2-5 progression over several decades (i.e.
    input sizes of 10, 20, 50, 100, 200, 500, 1000, etc.). You will have to
    figure out what an appropriate range is, and how to generate useful
    input sets. Now, curve fit your performance numbers to various shape
    curves and see what correlation coefficient you get.

    All that being said, in my experience, nothing beats plotting your data
    and looking at it.
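
    To make both checks concrete, here is a rough sketch (the timing
    numbers are made-up illustrations, and numpy is assumed to be
    available):

        import numpy as np

        # 1) Trend over time: regress run time against day number and
        # look at the slope.
        days = np.arange(7.0)                       # days since first run
        secs = np.array([1.02, 1.01, 1.05, 1.04, 1.08, 1.11, 1.15])
        slope, intercept = np.polyfit(days, secs, 1)
        if slope > 0.01:                            # threshold is arbitrary
            print("run time drifting upwards: %.3f s/day" % slope)

        # 2) Complexity check: correlate measured times against candidate
        # growth curves for a 1-2-5 progression of input sizes.
        sizes = np.array([10, 20, 50, 100, 200, 500, 1000], dtype=float)
        times = np.array([0.011, 0.021, 0.055, 0.12, 0.26, 0.71, 1.6])
        for name, xs in [("O(n)", sizes),
                         ("O(n log n)", sizes * np.log(sizes)),
                         ("O(n^2)", sizes ** 2)]:
            r = np.corrcoef(xs, times)[0, 1]
            print("%-10s correlation %.4f" % (name, r))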
     
    Roy Smith, Nov 17, 2011
    #4
  5. On Wed, Nov 16, 2011 at 09:36:40AM -0500, Roy Smith wrote:
    > In article <>,
    > Ulrich Eckhardt <> wrote:
    >
    > > Hi!
    > >
    > > I'm currently trying to establish a few tests here that evaluate certain
    > > performance characteristics of our systems. As part of this, I found
    > > that these tests are rather similar to unit-tests, only that they are
    > > much more fuzzy and obviously dependent on the systems involved, CPU
    > > load, network load, day of the week (Tuesday is virus scan day) etc.
    > >
    > > What I'd just like to ask is how you do such things. Are there tools
    > > available that help? I was considering using the unit testing framework,
    > > but the problem with that is that the results are too hard to interpret
    > > programmatically and too easy to misinterpret manually. Any suggestions?

    >
    > It's really, really, really hard to either control for, or accurately
    > measure, things like CPU or network load. There's so much stuff you
    > can't even begin to see. The state of your main memory cache. Disk
    > fragmentation. What I/O is happening directly out of kernel buffers vs
    > having to do a physical disk read. How slow your DNS server is today.


    While I agree there are a lot of things you can't control for, you can
    get a more accurate picture by using CPU time instead of wall time
    (e.g. via the clock() library call). If what you care about is mostly
    CPU time, this controls for the "your disk is fragmented", "your DNS
    server died", or "my cow-orker was banging on the test machine" kinds
    of noise.
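
    In Python terms, that's roughly the difference between time.clock()
    and time.time() (a tiny illustration; note that time.clock() measures
    CPU time on Unix but wall-clock time on Windows):

        import time

        wall_start = time.time()
        cpu_start = time.clock()          # CPU time on Unix
        time.sleep(1.0)                   # waiting costs wall time, no CPU
        total = sum(i * i for i in range(10 ** 6))   # CPU-bound work
        print("wall: %.2f s" % (time.time() - wall_start))
        print("cpu:  %.2f s" % (time.clock() - cpu_start))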

     
    Tycho Andersen, Nov 17, 2011
    #5
  6. Ulrich Eckhardt

    spartan.the Guest

    On Nov 17, 4:03 pm, Roy Smith <> wrote:
    > In article <>,
    >  Ulrich Eckhardt <> wrote:
    >
    > > Yes, this is surely necessary, in particular since there are no clear
    > > success/failure outputs as with unit tests, so the results require a
    > > human to interpret them.

    >
    > As much as possible, you want to automate things so no human
    > intervention is required.
    >
    > For example, let's say you have a test which calls foo() and times how
    > long it takes.  You've already mentioned that you run it N times and
    > compute some basic (min, max, avg, sd) stats.  So far, so good.
    >
    > The next step is to do some kind of regression against past results.
    > Once you've got a bunch of historical data, it should be possible to
    > look at today's numbers and detect any significant change in performance.
    >
    > Much as I loathe the bureaucracy and religious fervor which has grown up
    > around Six Sigma, it does have some good tools.  You might want to look
    > into control charts (http://en.wikipedia.org/wiki/Control_chart).  You
    > think you've got the test environment under control, do you?  Try
    > plotting a month's worth of run times for a particular test on a control
    > chart and see what it shows.
    >
    > Assuming your process really is under control, I would write scripts
    > that did the following kinds of analysis:
    >
    > 1) For a given test, do a linear regression of run time vs date.  If the
    > line has any significant positive slope, you want to investigate why.
    >
    > 2) You already mentioned, "I would even wonder if you can't verify the
    > behaviour against an expected Big-O complexity somehow".  Of course you
    > can.  Run your test a bunch of times with different input sizes.  I
    > would try something like a 1-2-5 progression over several decades (i.e.
    > input sizes of 10, 20, 50, 100, 200, 500, 1000, etc.).  You will have to
    > figure out what an appropriate range is, and how to generate useful
    > input sets.  Now, curve fit your performance numbers to various shape
    > curves and see what correlation coefficient you get.
    >
    > All that being said, in my experience, nothing beats plotting your data
    > and looking at it.


    I strongly agree with Roy, here.

    Ulrich, I recommend you to explore how google measures appengine's
    health here: http://code.google.com/status/appengine.

    Unit tests are inappropriate here; a single unit test can only answer
    PASS or FAIL, YES or NO. It can't answer the question "how much?".
    (Unless you are set on using unit tests no matter what, in which case
    these arguments don't really apply.)

    I suggest:

    1. Decide what you want to measure. Each measurement should be a number
    in a fixed range (0..100, -5..5, ...), so you can plot it.
    2. Write small no-UI programs that take each measurement and write it
    to CSV. Run each of them several times, drop the 1 worst and 1 best
    result, and average the rest (a sketch follows below).
    3. Collect the data for some period of time.
    4. Plot the averages over a time axis (which is easy with CSV).
    5. Automate the whole process (batch files or similar) so the plot is
    regenerated every hour or every day.
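
    A bare-bones sketch of steps 2-4 (the file name and the workload are
    placeholders for whatever you actually measure):

        import csv
        import time

        def measure_once():
            start = time.time()
            sum(i * i for i in range(10 ** 5))   # stand-in for real work
            return time.time() - start

        def trimmed_average(samples):
            trimmed = sorted(samples)[1:-1]      # drop 1 best and 1 worst
            return sum(trimmed) / len(trimmed)

        samples = [measure_once() for _ in range(7)]
        with open("measurements.csv", "a") as f:
            csv.writer(f).writerow([time.strftime("%Y-%m-%d %H:%M"),
                                    trimmed_average(samples)])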

    Then, after a month, you can decide whether you want to divide your
    number ranges into green-yellow-red zones. More often than not you will
    find that your measurements are so noisy and random that you can't
    trust them. At that point you'll either drop the idea or dive into the
    math (statistics). You have about a 5% chance of succeeding ;)
     
    spartan.the, Nov 17, 2011
    #6
  7. Ulrich Eckhardt

    Roy Smith Guest

    In article <>,
    Tycho Andersen <> wrote:

    > While I agree there's a lot of things you can't control for, you can
    > get a more accurate picture by using CPU time instead of wall time
    > (e.g. the clock() system call). If what you care about is mostly CPU
    > time [...]


    That's a big if. In some cases, CPU time is important, but more often,
    wall-clock time is more critical. Let's say I've got two versions of a
    program. Here are some results from my test run:

    Version   CPU time    Wall-clock time
    1         2.0 hours   2.5 hours
    2         1.5 hours   5.0 hours

    Between versions, I reduced the CPU time to complete the given task, but
    increased the wall clock time. Perhaps I doubled the size of some hash
    table. Now I get a lot fewer hash collisions (so I spend less CPU time
    re-hashing), but my memory usage went up so I'm paging a lot and my
    locality of reference went down so my main memory cache hit rate is
    worse.

    Which is better? I think most people would say version 1 is better.

    CPU time is only important in a situation where the system is CPU bound.
    In many real-life cases, that's not at all true. Things can be memory
    bound. Or I/O bound (which, when you consider paging, is often the same
    thing as memory bound). Or lock-contention bound.

    Before you starting measuring things, it's usually a good idea to know
    what you want to measure, and why :)
     
    Roy Smith, Nov 18, 2011
    #7
