Python reliability

Discussion in 'Python' started by Ville Voipio, Oct 9, 2005.

  1. Ville Voipio

    Ville Voipio Guest

    I would need to make some high-reliability software
    running on Linux in an embedded system. Performance
    (or lack of it) is not an issue, reliability is.

    The piece of software is rather simple, probably a
    few hundred lines of code in Python. There is a need
    to interact with the network using the socket module,
    and then probably a need to do something hardware-
    related which will get its own driver written in
    C.

    Threading and other more error-prone techniques can
    be left aside; everything can run in one thread with
    a poll loop.
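
    Roughly the kind of structure I have in mind is sketched below
    (the port number and the check_hardware() hook are made-up
    placeholders, not part of the real design):

    import select, socket

    # one UDP socket, one poll object, no threads
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('', 9999))                      # placeholder port
    sock.setblocking(0)

    poller = select.poll()
    poller.register(sock, select.POLLIN)

    def check_hardware():
        # stand-in for the hardware part with its own C driver
        pass

    while 1:
        for fd, event in poller.poll(1000):    # wait at most 1 s
            if event & select.POLLIN:
                data, addr = sock.recvfrom(1500)
                # handle the datagram here
        check_hardware()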

    The software should be running continuously for
    practically forever (at least a year without a reboot).
    Is the Python interpreter (on Linux) stable and
    leak-free enough to achieve this?

    - Ville

    --
    Ville Voipio, Dr.Tech., M.Sc. (EE)
     
    Ville Voipio, Oct 9, 2005
    #1

  2. Ville Voipio

    Paul Rubin Guest

    Ville Voipio <> writes:
    > The software should be running continuously for
    > practically forever (at least a year without a reboot).
    > Is the Python interpreter (on Linux) stable and
    > leak-free enough to achieve this?


    I would say give the app the heaviest stress testing that you can
    before deploying it, checking carefully for leaks and crashes. I'd
    say that regardless of the implementation language.
     
    Paul Rubin, Oct 9, 2005
    #2

  3. Ville Voipio

    Paul Rubin Guest

    Steven D'Aprano <> writes:
    > If performance is really not such an issue, would it really matter if you
    > periodically restarted Python? Starting Python takes a tiny amount of time:


    If you have to restart an application, every network peer connected to
    it loses its connection. Think of a phone switch. Do you really want
    your calls dropped every few hours of conversation time, just because
    some lame application decided to restart itself? Phone switches go to
    great lengths to keep running through both hardware failures and
    software upgrades, without dropping any calls. That's the kind of
    application it sounds like the OP is trying to run.

    To the OP: besides Python you might also consider Erlang.
     
    Paul Rubin, Oct 10, 2005
    #3
  4. On Sun, 09 Oct 2005 23:00:04 +0300, Ville Voipio wrote:

    > I would need to make some high-reliability software
    > running on Linux in an embedded system. Performance
    > (or lack of it) is not an issue, reliability is.


    [snip]

    > The software should be running continuously for
    > practically forever (at least a year without a reboot).
    > Is the Python interpreter (on Linux) stable and
    > leak-free enough to achieve this?


    If performance is really not such an issue, would it really matter if you
    periodically restarted Python? Starting Python takes a tiny amount of time:

    $ time python -c pass
    real 0m0.164s
    user 0m0.021s
    sys 0m0.015s

    If performance isn't an issue, your users may not even care about ten
    times that delay even once an hour. In other words, build your software to
    deal gracefully with restarts, and your users won't even notice or care if
    it restarts.
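
    One crude way to get that is a dumb supervisor loop around the real
    program; worker.py below is a made-up name, and this is only a
    sketch of the idea:

    import os, sys, time

    # Restart the real program whenever it dies; the pause keeps a
    # crash loop from spinning the CPU.
    while 1:
        status = os.system('%s worker.py' % sys.executable)
        if status == 0:
            break                  # clean exit: stop supervising
        time.sleep(1)              # crashed or was killed: restart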

    I'm not saying that you will need to restart Python once an hour, or even
    once a month. But if you did, would it matter? What's more important is
    the state of the operating system. (I'm assuming that, with a year of
    uptime in the requirements, you aren't even thinking of WinCE.)


    --
    Steven.
     
    Steven D'Aprano, Oct 10, 2005
    #4
  5. Steven D'Aprano wrote:

    > On Sun, 09 Oct 2005 23:00:04 +0300, Ville Voipio wrote:
    >
    > > I would need to make some high-reliability software
    > > running on Linux in an embedded system. Performance
    > > (or lack of it) is not an issue, reliability is.

    >
    > [snip]
    >
    > > The software should be running continuously for
    > > practically forever (at least a year without a reboot).
    > > Is the Python interpreter (on Linux) stable and
    > > leak-free enough to achieve this?

    >
    > If performance is really not such an issue, would it really matter if you
    > periodically restarted Python? Starting Python takes a tiny amount of time:


    You must have missed or misinterpreted the "The software should be
    running continuously for practically forever" part. The problem with
    restarting Python is not the 200 msec lost but the reliability that
    is put at stake (e.g. for health monitoring devices, avionics,
    nuclear reactor controllers, etc.) and the robustness (e.g. a
    computation that takes weeks of CPU time to complete is interrupted
    with no possibility of restarting from the point where it stopped).

    George
     
    George Sakkis, Oct 10, 2005
    #5
  6. Ville Voipio

    Neal Norwitz Guest

    Ville Voipio wrote:
    >
    > The software should be running continuously for
    > practically forever (at least a year without a reboot).
    > Is the Python interpreter (on Linux) stable and
    > leak-free enough to achieve this?


    Jp gave you the answer that he has done this.

    I've spent quite a bit of time since 2.1 days trying to improve the
    reliability. I think it has gotten much better. Valgrind is run on
    (nearly) every release. We look for various kinds of problems. I try
    to review C code for these sorts of problems etc.

    There are very few known issues that can crash the interpreter. I
    don't know of any memory leaks. socket code is pretty well tested and
    heavily used, so you should be in fairly safe territory, particularly
    on Unix.

    n
     
    Neal Norwitz, Oct 10, 2005
    #6
  7. George Sakkis wrote:

    > Steven D'Aprano wrote:
    >
    >
    >>On Sun, 09 Oct 2005 23:00:04 +0300, Ville Voipio wrote:
    >>
    >>
    >>>I would need to make some high-reliability software
    >>>running on Linux in an embedded system. Performance
    >>>(or lack of it) is not an issue, reliability is.

    >>
    >>[snip]
    >>
    >>
    >>>The software should be running continuously for
    >>>practically forever (at least a year without a reboot).
    >>>Is the Python interpreter (on Linux) stable and
    >>>leak-free enough to achieve this?

    >>
    >>If performance is really not such an issue, would it really matter if you
    >>periodically restarted Python? Starting Python takes a tiny amount of time:

    >
    >
    > You must have missed or misinterpreted the "The software should be
    > running continuously for practically forever" part. The problem with
    > restarting Python is not the 200 msec lost but the reliability that
    > is put at stake (e.g. for health monitoring devices, avionics,
    > nuclear reactor controllers, etc.) and the robustness (e.g. a
    > computation that takes weeks of CPU time to complete is interrupted
    > with no possibility of restarting from the point where it stopped).



    Er, no, I didn't miss that at all. I did miss that it
    needed continual network connections. I don't know if
    there is a way around that issue, although mobile
    phones move in and out of network areas, swapping
    connections when and as needed.

    But as for reliability, well, tell that to Buzz Aldrin
    and Neil Armstrong. The Apollo 11 moon lander rebooted
    multiple times on the way down to the surface. It was
    designed to recover gracefully when rebooting unexpectedly:

    http://www.hq.nasa.gov/office/pao/History/alsj/a11/a11.1201-pa.html

    I don't have an authoritative source for how many times
    the computer rebooted during the landing, but it was
    measured in the dozens. Calculations were performed in
    an iterative fashion, with an initial estimate that was
    improved over time. If a calculation was interrupted the
    computer lost no more than one iteration.

    I'm not saying that this strategy is practical or
    useful for the original poster, but it *might* be. In a
    noisy environment, it pays to design a system that can
    recover transparently from a lost connection.

    If your heart monitor can reboot in 200 ms, you might
    miss one or two beats, but so long as you pick up the
    next one, that's just noise. If your calculation takes
    more than a day of CPU time to complete, you should
    design it in such a way that you can save state and
    pick it up again when you are ready. You never know
    when the cleaner will accidentally unplug the computer...
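
    As a sketch of that save-state idea, assuming the state fits in a
    simple dictionary (the file name and the checkpoint interval are
    arbitrary):

    import os, pickle

    CHECKPOINT = 'state.pickle'

    def save_state(state):
        # write a temp file, then rename, so a half-written
        # checkpoint never replaces the last good one
        tmp = CHECKPOINT + '.tmp'
        f = open(tmp, 'wb')
        pickle.dump(state, f)
        f.close()
        os.rename(tmp, CHECKPOINT)

    def load_state(default):
        if os.path.exists(CHECKPOINT):
            f = open(CHECKPOINT, 'rb')
            state = pickle.load(f)
            f.close()
            return state
        return default

    state = load_state({'iteration': 0})
    while state['iteration'] < 1000000:
        state['iteration'] += 1            # stand-in for the real work
        if state['iteration'] % 1000 == 0:
            save_state(state)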


    --
    Steven.
     
    Steven D'Aprano, Oct 10, 2005
    #7
  8. Ville Voipio

    Ville Voipio Guest

    In article <>, Paul Rubin wrote:

    > I would say give the app the heaviest stress testing that you can
    > before deploying it, checking carefully for leaks and crashes. I'd
    > say that regardless of the implementation language.


    Goes without saying. But I would like to be confident (or as
    confident as possible) that all bugs are mine. If I use plain
    C, I think this is the case. Of course, bad memory management
    in the underlying platform will wreak havoc. I am planning to
    use Linux 2.4.somethingnew as the OS kernel, and there I have
    not experienced too many problems before.

    Adding the Python interpreter adds one layer of uncertainty.
    On the other hand, I am after the simplicity of programming
    offered by Python.

    - Ville

    --
    Ville Voipio, Dr.Tech., M.Sc. (EE)
     
    Ville Voipio, Oct 10, 2005
    #8
  9. Ville Voipio

    Ville Voipio Guest

    In article <>,
    Steven D'Aprano wrote:

    > If performance is really not such an issue, would it really matter if you
    > periodically restarted Python? Starting Python takes a tiny amount of time:


    Uhhh. Sounds like playing with Microsoft :) I know of a mission-
    critical system which was restarted every week due to some memory
    leaks. If it wasn't, it crashed after two weeks. Guess which
    platform...

    > $ time python -c pass
    > real 0m0.164s
    > user 0m0.021s
    > sys 0m0.015s


    This is on the limit of being acceptable. I'd say that a one-second
    time lag is the maximum. The system is a safety system after all,
    and there will be a hardware watchdog to take care of odd crashes.
    The software itself is stateless in the sense that its previous
    state does not affect the next round. Basically, it is just checking
    a few numbers over the network. Even the network connection is
    stateless (single UDP packet pairs) to avoid TCP problems with
    partial closings, etc.
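
    Per instrument, each query then boils down to something like the
    sketch below (the addresses and the payload format are placeholders):

    import socket

    def query_instrument(host, port, request, timeout=1.0):
        # one request datagram, one reply datagram, no connection state
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        try:
            s.sendto(request, (host, port))
            reply, addr = s.recvfrom(1500)
        except (socket.error, socket.timeout):
            reply = None                   # no answer: treat as a failure
        s.close()
        return reply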

    There are a gazillion things which may go wrong. A stray cosmic
    ray may change the state of one bit in the wrong place in memory,
    and that's it, etc. So, the system has to be able to recover from
    pretty much everything. I will in any case build an independent
    process which probes the state of the main process. However,
    I hope it is never really needed.
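
    The probing side can stay almost trivially simple, for example
    (the probe port, the payload and the reaction are made up; the
    main process would answer the probe in its normal loop):

    import socket, time

    def probe_alive(port=9998, timeout=2.0):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        try:
            s.sendto('ping', ('127.0.0.1', port))
            s.recvfrom(32)
            alive = True
        except (socket.error, socket.timeout):
            alive = False
        s.close()
        return alive

    while 1:
        if not probe_alive():
            # log it, trip the alarm, or let the hardware
            # watchdog force a reboot
            pass
        time.sleep(60)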

    > I'm not saying that you will need to restart Python once an hour, or even
    > once a month. But if you did, would it matter? What's more important is
    > the state of the operating system. (I'm assuming that, with a year uptime
    > the requirements, you aren't even thinking of WinCE.)


    Not even in my worst nightmares! The platform will be an embedded
    Linux computer running 2.4.somethingnew.

    - Ville

    --
    Ville Voipio, Dr.Tech., M.Sc. (EE)
     
    Ville Voipio, Oct 10, 2005
    #9
  10. Ville Voipio

    Paul Rubin Guest

    Ville Voipio <> writes:
    > Goes without saying. But I would like to be confident (or as
    > confident as possible) that all bugs are mine. If I use plain
    > C, I think this is the case. Of course, bad memory management
    > in the underlying platform will wreak havoc. I am planning to
    > use Linux 2.4.somethingnew as the OS kernel, and there I have
    > not experienced too many problems before.


    You might be better off with a 2.6 series kernel. If you use Python
    conservatively (be careful with the most advanced features, and don't
    stress anything too hard) you should be ok. Python works pretty well
    if you use it the way the implementers expected you to. Its
    shortcomings are when you try to press it to its limits.

    You do want reliable hardware with ECC and all that, maybe with multiple
    servers and automatic failover. This site might be of interest:

    http://www.linux-ha.org/
     
    Paul Rubin, Oct 10, 2005
    #10
  11. Ville Voipio wrote:

    > There are a gazillion things which may go wrong. A stray cosmic
    > ray may change the state of one bit in the wrong place in memory,
    > and that's it, etc. So, the system has to be able to recover from
    > pretty much everything. I will in any case build an independent
    > process which probes the state of the main process. However,
    > I hope it is never really needed.


    If you have enough hardware grunt, you could think
    about having three independent processes working in
    parallel. They vote on their output, and best out of
    three gets reported back to the user. In other words,
    only if all three results are different does the device
    throw its hands up in the air and say "I don't know!"
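
    The voting step itself is tiny, something like the sketch below
    (with equality standing in for whatever tolerance check the real
    values need):

    def vote(a, b, c):
        # report any value that at least two of the three agree on
        if a == b or a == c:
            return a
        if b == c:
            return b
        return None                # all three differ: "I don't know!"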

    Of course, unless you are running each of them on an
    independent set of hardware and OS, you really aren't
    getting that much benefit. And then there is the
    question, can you trust the voting mechanism... But if
    this is so critical you are worried about cosmic rays,
    maybe it is the way to go.

    If it is not a secret, what are you monitoring with
    this device?


    --
    Steven.
     
    Steven D'Aprano, Oct 10, 2005
    #11
  12. Ville Voipio

    Ville Voipio Guest

    In article <>, Steven D'Aprano wrote:

    > If you have enough hardware grunt, you could think
    > about having three independent processes working in
    > parallel. They vote on their output, and best out of
    > three gets reported back to the user. In other words,
    > only if all three results are different does the device
    > throw its hands up in the air and say "I don't know!"


    Ok, I will give you a bit more information, so that the
    situation is a bit clearer. (Sorry, I cannot tell you
    the exact application.)

    The system is a safety system which supervises several
    independent measurements (two or more). The measurements
    are carried out by independent measurement instruments
    which have their independent power supplies, etc.

    The application communicates with the independent
    measurement instruments through the network. Each
    instrument is queried regularly for its measurement
    results and status information. If the results given
    by different instruments differ more than a given
    amount, then an alarm is set (relay contacts opened).

    Naturally, in case of equipment malfunction, the
    alarm is set. This covers a wide range of problems from
    errors reported by the instrument to physical failures
    or program bugs.
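
    In outline, each round then reduces to something like this
    (the names and the tolerance value are invented for the example):

    TOLERANCE = 0.5                # allowed spread between instruments

    def evaluate_round(readings):
        # readings: one number per instrument, or None where a query
        # failed; anything suspicious opens the relay contacts
        if len(readings) < 2 or None in readings:
            return 'ALARM'
        if max(readings) - min(readings) > TOLERANCE:
            return 'ALARM'
        return 'OK'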

    The system has several weak spots. However, the basic
    principle is simple: if anything goes wrong, start
    yelling. A false alarm is costly, but not giving the
    alarm when required is downright unacceptable.

    I am not building a redundant system with independent
    instruments voting. At this point I am trying to minimize
    the false alarms. This is why I want to know if Python
    is reliable enough to be used in this application.

    By the postings I have seen in this thread it seems that
    the answer is positive. At least if I do not try to
    apply any adventurous programming techniques.

    - Ville

    --
    Ville Voipio, Dr.Tech., M.Sc. (EE)
     
    Ville Voipio, Oct 10, 2005
    #12
  13. Ville Voipio

    Ville Voipio Guest

    In article <>, Paul Rubin wrote:

    > You might be better off with a 2.6 series kernel. If you use Python
    > conservatively (be careful with the most advanced features, and don't
    > stress anything too hard) you should be ok. Python works pretty well
    > if you use it the way the implementers expected you to. Its
    > shortcomings are when you try to press it to its limits.


    Just one thing: how reliable is the garbage collecting system?
    Should I try to either not produce any garbage or try to clean
    up manually?

    > You do want reliable hardware with ECC and all that, maybe with multiple
    > servers and automatic failover. This site might be of interest:


    Well... Here the uptime benefit from using several servers is
    not economically justifiable. I am right now at the phase of
    trying to minimize the downtime with given hardware resources.
    This is not flying; downtime does not kill anyone. I just want
    to avoid choosing tools which belong more to the problem than
    to the solution set.

    - Ville

    --
    Ville Voipio, Dr.Tech., M.Sc. (EE)
     
    Ville Voipio, Oct 10, 2005
    #13
  14. Ville Voipio

    Paul Rubin Guest

    Ville Voipio <> writes:
    > Just one thing: how reliable is the garbage collecting system?
    > Should I try to either not produce any garbage or try to clean
    > up manually?


    The GC is a simple, manually-updated reference counting system
    augmented with some extra contraption to resolve cyclic dependencies.
    It's extremely easy to make errors with the reference counts in C
    extensions, and either leak references (causing memory leaks) or
    forget to add them (causing double-free crashes). The standard
    libraries are pretty careful about managing references but if you're
    using 3rd party C modules, or writing your own, then watch out.

    There is no way you can avoid making garbage. Python conses
    everything, even integers (small positive ones are cached). But I'd
    say, avoid making cyclic dependencies, be very careful if you use the
    less popular C modules or any 3rd party ones, and stress test the hell
    out of your app while monitoring memory usage very carefully. If you
    can pound it with as much traffic in a few hours as it's likely to see
    in a year of deployment, without memory leaks or thread races or other
    errors, that's a positive sign.
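
    One crude way to watch for creeping memory use during such a run
    is sketched below (Linux-specific, and exercise() just stands in
    for whatever the application really does):

    import gc

    def rss_kb():
        # resident set size from /proc, in kilobytes (Linux only)
        for line in open('/proc/self/status'):
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
        return 0

    def exercise():
        junk = [str(i) for i in range(1000)]   # stand-in for real work

    baseline = rss_kb()
    for i in range(100000):
        exercise()
        if i % 10000 == 0:
            gc.collect()
            print i, rss_kb() - baseline       # steady growth = suspect leak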

    > Well... Here the uptime benefit from using several servers is not
    > economically justifiable. I am right now at the phase of trying to
    > minimize the downtime with given hardware resources. This is not
    > flying; downtime does not kill anyone. I just want to avoid choosing
    > tools which belong more to the problem than to the solution set.


    You're probably ok with Python in this case.
     
    Paul Rubin, Oct 10, 2005
    #14
  15. Ville Voipio

    Max M Guest

    Ville Voipio wrote:
    > In article <>, Paul Rubin wrote:
    >
    >>I would say give the app the heaviest stress testing that you can
    >>before deploying it, checking carefully for leaks and crashes. I'd
    >>say that regardless of the implementation language.

    >
    > Goes without saying. But I would like to be confident (or as
    > confident as possible) that all bugs are mine. If I use plain
    > C, I think this is the case. Of course, bad memory management
    > in the underlying platform will wreak havoc.


    Python isn't perfect, but I do believe that it is as good as the best of
    the major "standard" systems out there.

    You will have *far* greater chances of introducing errors yourself by
    coding in C than you will encounter in Python.

    You can see the bugs fixed in recent versions, and see for yourself
    whether they would have crashed your system. That should be an indicator:

    http://www.python.org/2.4.2/NEWS.html


    --

    hilsen/regards Max M, Denmark

    http://www.mxm.dk/
    IT's Mad Science
     
    Max M, Oct 10, 2005
    #15
  16. Ville Voipio

    Tom Anderson Guest

    Python's garbage collection was Re: Python reliability

    On Mon, 10 Oct 2005, it was written:

    > Ville Voipio <> writes:
    >
    >> Just one thing: how reliable is the garbage collecting system? Should I
    >> try to either not produce any garbage or try to clean up manually?

    >
    > The GC is a simple, manually-updated reference counting system augmented
    > with some extra contraption to resolve cyclic dependencies. It's
    > extremely easy to make errors with the reference counts in C extensions,
    > and either leak references (causing memory leaks) or forget to add them
    > (causing double-free crashes).


    Has anyone looked into using a real GC for python? I realise it would be a
    lot more complexity in the interpreter itself, but it would be faster,
    more reliable, and would reduce the complexity of extensions.

    Hmm. Maybe it wouldn't make extensions easier or more reliable. You'd
    still need some way of figuring out which variables in C-land held
    pointers to objects; if anything, that might be harder, unless you want to
    impose a horrendous JAI-like bondage-and-discipline interface.

    > There is no way you can avoid making garbage. Python conses everything,
    > even integers (small positive ones are cached).


    So Python doesn't use the old Smalltalk-80 SmallInteger hack, or similar?
    Fair enough - the performance gain is nice, but the extra complexity would
    be a huge pain, I imagine.

    tom

    --
    Fitter, Happier, More Productive.
     
    Tom Anderson, Oct 10, 2005
    #16
  17. Ville Voipio

    Aahz Guest

    Re: Python's garbage collection was Re: Python reliability

    In article <>,
    Tom Anderson <> wrote:
    >
    >Has anyone looked into using a real GC for python? I realise it would be a
    >lot more complexity in the interpreter itself, but it would be faster,
    >more reliable, and would reduce the complexity of extensions.
    >
    >Hmm. Maybe it wouldn't make extensions easier or more reliable. You'd
    >still need some way of figuring out which variables in C-land held
    >pointers to objects; if anything, that might be harder, unless you want to
    >impose a horrendous JAI-like bondage-and-discipline interface.


    Bingo! There's a reason why one Python motto is "Plays well with
    others".
    --
    Aahz () <*> http://www.pythoncraft.com/

    "If you think it's expensive to hire a professional to do the job, wait
    until you hire an amateur." --Red Adair
     
    Aahz, Oct 10, 2005
    #17
  18. Ville Voipio

    Mike Meyer Guest

    Re: Python's garbage collection was Re: Python reliability

    Tom Anderson <> writes:
    > Has anyone looked into using a real GC for python? I realise it would
    > be a lot more complexity in the interpreter itself, but it would be
    > faster, more reliable, and would reduce the complexity of extensions.
    >
    > Hmm. Maybe it wouldn't make extensions easier or more reliable. You'd
    > still need some way of figuring out which variables in C-land held
    > pointers to objects; if anything, that might be harder, unless you
    > want to impose a horrendous JAI-like bondage-and-discipline interface.


    Wouldn't necessarily be faster, either. I rewrote, in a compiled
    language with a real garbage collector, a program that built a static
    data structure of a couple of hundred thousand objects and then went
    traipsing through it while generating a few hundred more objects. The
    resulting program ran about an order of magnitude slower than the
    Python version.

    Profiling revealed that it was spending 95% of its time in the
    garbage collector, marking and sweeping that large data structure.

    There's lots of research on dealing with this problem, as my usage
    pattern isn't unusual - just a little extreme. Unfortunately, none of
    the techniques were applicable to compiled code without a serious
    performance impact on pretty much everything. They could probably be
    used in Python without a problem.

    <mike
    --
    Mike Meyer <> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
     
    Mike Meyer, Oct 10, 2005
    #18
  19. "Ville Voipio" <> wrote in message
    news:...
    > In article <>, Paul Rubin wrote:

    <snip>

    > I would need to make some high-reliability software
    > running on Linux in an embedded system. Performance
    > (or lack of it) is not an issue, reliability is.


    > The software should be running continuously for
    > practically forever (at least a year without a reboot).
    > Is the Python interpreter (on Linux) stable and
    > leak-free enough to achieve this?


    >
    > Adding the Python interpreter adds one layer of uncertainty.
    > On the other hand, I am after the simplicity of programming
    > offered by Python.

    <snip>


    All in all, it would seem that the reliability of the Python run time is the
    least of your worries. The best multi-tasking operating systems do a good
    job of segregating different processes BUT what multitasking operating
    system meets the standard you request in that last paragraph? Assuming that
    the Python interpreter itself is robust enough to meet that standard, what
    about that other 99% of everything else that is competing with your Python
    script for cpu, memory, and other critical resources? Under ordinary Linux,
    your Python script will be interrupted frequently and regularly by processes
    entirely outside of Python's control.

    You may not want a multitasking OS at all but rather a single tasking OS
    where nothing happens that isn't 100% under your program control. Or if you
    do need a multitasking system, you probably want something designed for the
    type of rugged use you are demanding. I would google "embedded systems".
    If you want to use Python/Linux, I might suggest you search "Embedded
    Linux".

    And I wouldn't be surprised if some dedicated microcontrollers aren't
    showing up with Python capability. In any case, it would seem you need more
    control than a Python interpreter would receive when running under Linux.

    Good Luck.
    Thomas Bartkus
     
    Thomas Bartkus, Oct 10, 2005
    #19
  20. Ville Voipio

    Paul Rubin Guest

    Re: Python's garbage collection was Re: Python reliability

    Tom Anderson <> writes:
    > Has anyone looked into using a real GC for python? I realise it would
    > be a lot more complexity in the interpreter itself, but it would be
    > faster, more reliable, and would reduce the complexity of extensions.


    The next PyPy sprint (this week I think) is going to focus partly on GC.

    > Hmm. Maybe it wouldn't make extensions easier or more reliable. You'd
    > still need some way of figuring out which variables in C-land held
    > pointers to objects; if anything, that might be harder, unless you
    > want to impose a horrendous JAI-like bondage-and-discipline interface.


    I'm not sure what JAI is (do you mean JNI?) but you might look at how
    Emacs Lisp does it. You have to call a macro to protect intermediate
    heap results in C functions from being GC'd, so it's possible to make
    errors, but it cleans up after itself and is generally less fraught
    with hazards than Python's method is.
     
    Paul Rubin, Oct 10, 2005
    #20