Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

Discussion in 'Python' started by OlafMeding@gmail.com, May 3, 2006.

  1. Guest

    Below are 2 files that isolate the problem. Note, both programs hang
    (stop responding) with hyper-threading turned on (a BIOS setting), but
    work as expected with hyper-threading turned off.

    Note, the Windows task manager shows 2 CPUs on the Performance tab with
    hyper-threading is turned on.

    Both Python 2.3.5 and 2.4.3 (downloaded from python.org) have this
    problem.
    The operating system is MS Windows XP Professional.

    winmsd.exe shows:
    2CPUs: x86 Family 15 Model 4 Stepping 1 GenuineIntel ~3000 MHz
    Version: 5.1.2600 Service Pack 2 Build 2600

    Could someone with a hyper-threading (or dual/multicore) CPU please
    confirm this bug?

    Many Thanks

    Olaf


    # testsleep.py
    import threading
    import time

    class Task(threading.Thread):
    def __init__(self, n, t):
    threading.Thread.__init__(self)
    self.n = n
    self.t = t
    def run(self):
    print 'thread %d started' % self.n
    print 'sleep time:', self.t
    print time.clock()
    print time.clock()
    print time.clock()
    print
    count = 0
    printCount = int(10 / self.t)
    while True:
    start = time.clock()
    time.sleep(self.t)
    stop = time.clock()
    if stop - start > 1.0:
    print 'thread', self.n, stop - start

    count += 1
    if count > printCount:
    count = 0
    print self.n,

    def test():
    thread1 = Task(1, 0.01)
    thread2 = Task(2, 0.003)
    thread1.start()
    thread2.start()

    test()

    ------------------------------------------------------------------------

    # testsleep2.py
    import thread
    import time
    import sys

    def run(n, t):
    print 'thread %d started' % n
    print 'sleep time:', t
    print time.clock()
    print time.clock()
    print time.clock()
    print
    count = 0
    printCount = int(10 / t)
    while True:
    start = time.clock()
    time.sleep(t)
    stop = time.clock()
    if stop - start > 1.0:
    print 'thread', n, stop - start

    count += 1
    if count > printCount:
    count = 0
    print n,

    def test():
    thread.start_new_thread(run, (1, 0.01))
    thread.start_new_thread(run, (2, 0.003))

    # Wait until the user presses the enter key.
    sys.stdin.read(1)


    test()
    , May 3, 2006
    #1
    1. Advertising

  2. Serge Orlov Guest

    wrote:
    > Below are 2 files that isolate the problem. Note, both programs hang
    > (stop responding) with hyper-threading turned on (a BIOS setting), but
    > work as expected with hyper-threading turned off.


    What do you mean "stop responding"? Not responding when you press
    ctrl-c? They stop printing? If you mean stop printing, try
    sys.stdout.flush() after each print
    Serge Orlov, May 3, 2006
    #2
    1. Advertising

  3. Tim Peters Guest

    Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

    []
    > Below are 2 files that isolate the problem. Note, both programs hang
    > (stop responding)


    What does "stop responding" mean?

    > with hyper-threading turned on (a BIOS setting), but
    > work as expected with hyper-threading turned off.
    >
    > Note, the Windows task manager shows 2 CPUs on the Performance tab with
    > hyper-threading is turned on.
    >
    > Both Python 2.3.5 and 2.4.3 (downloaded from python.org) have this
    > problem.
    > The operating system is MS Windows XP Professional.
    >
    > winmsd.exe shows:
    > 2CPUs: x86 Family 15 Model 4 Stepping 1 GenuineIntel ~3000 MHz
    > Version: 5.1.2600 Service Pack 2 Build 2600
    >
    > Could someone with a hyper-threading (or dual/multicore) CPU please
    > confirm this bug?


    I don't see anything unexpected (to me) on a WinXP Pro SP2 box with HT
    enabled, using 2.4.3, but I'm not sure what "not responding" means to
    you.

    > # testsleep.py
    > import threading
    > import time
    >
    > class Task(threading.Thread):
    > def __init__(self, n, t):
    > threading.Thread.__init__(self)
    > self.n = n
    > self.t = t
    > def run(self):
    > print 'thread %d started' % self.n
    > print 'sleep time:', self.t
    > print time.clock()
    > print time.clock()
    > print time.clock()
    > print
    > count = 0
    > printCount = int(10 / self.t)
    > while True:
    > start = time.clock()
    > time.sleep(self.t)
    > stop = time.clock()
    > if stop - start > 1.0:
    > print 'thread', self.n, stop - start


    You don't _expect_ this print to execute, do you? It should trigger
    very rarely (if ever).

    > count += 1
    > if count > printCount:
    > count = 0
    > print self.n,
    >
    > def test():
    > thread1 = Task(1, 0.01)
    > thread2 = Task(2, 0.003)
    > thread1.start()
    > thread2.start()
    >
    > test()


    This is what happened when I ran it, until I got tired of watching it
    and killed the job:

    C:\Python24>python test1.py
    thread 1 started
    sleep time: 0.01
    7.8499538238e-007
    4.42770924877e-005
    8.62618455186e-005

    thread 2 started
    sleep time: 0.003
    0.000479349533238
    0.000521904282916
    0.000563649037359

    1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1
    2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2
    1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1

    > # testsleep2.py
    > import thread
    > import time
    > import sys
    >
    > def run(n, t):
    > print 'thread %d started' % n
    > print 'sleep time:', t
    > print time.clock()
    > print time.clock()
    > print time.clock()
    > print
    > count = 0
    > printCount = int(10 / t)
    > while True:
    > start = time.clock()
    > time.sleep(t)
    > stop = time.clock()
    > if stop - start > 1.0:
    > print 'thread', n, stop - start


    See above.

    > count += 1
    > if count > printCount:
    > count = 0
    > print n,
    >
    > def test():
    > thread.start_new_thread(run, (1, 0.01))
    > thread.start_new_thread(run, (2, 0.003))
    >
    > # Wait until the user presses the enter key.
    > sys.stdin.read(1)


    Do you want someone running this test to hit the ENTER key, or not?

    >
    > test()


    Works much the same for me, except I got bored quicker ;-):

    C:\Python24>python test2..py
    thread 1 started
    sleep time: 0.01
    1.14999323533e-006
    8.01271757225e-005
    thread 2 started
    sleep time: 0.003
    0.000395865318439
    0.000474259857295
    0.000559831706872

    0.00071061346698

    1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1

    At that point I hit the ENTER key, and saw:

    Unhandled exception in thread started by
    Error in sys.excepthook:

    Original exception was:
    Unhandled exception in thread started by
    Error in sys.excepthook:

    Original exception was:


    That's unfortunate, but not unexpected either. The interpreter
    doesn't wait for a `thread` module thread to finish before tearing
    itself down, so the threads keep running after Python has torn so much
    of itself down that weird execptions occur -- and Python is _so_ torn
    down by that point it can't even display a useful error message. The
    interpreter does (by default) wait for `threading` module threads to
    exit before tearing itself down, so those kinds of useless exit
    messages weren't expected in the first test.
    Tim Peters, May 3, 2006
    #3
  4. Guest

    > What do you mean "stop responding"?

    Both threads print their thread numbers (either 1 or 2) approximately
    every 10 seconds. However, after a while (minutes to hours) both
    programs (see above) hang!

    Pressing ctrl-c (after the printing stops) causes the threads to "wake
    up" from their sleep statement. And since the sleep took more than 1
    seconds the thread number and the duration of the sleep is printed to
    the screen.

    Do you have a hyper-threading/dual/multi core CPU? Did you try this?

    Olaf
    , May 3, 2006
    #4
  5. Guest

    Time

    >>>

    1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1
    2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2
    1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1
    >>>


    This is exactly what you should see. The problem I see is that after a
    while (minutes to hours) the printing of 1s and 2s stops! If you press
    ctrl-c at that point the threads will print how many seconds they were
    stuck in the sleep statements (proving that the threads were stuck in
    the sleep statement until you pressed ctrl-c).

    Could you please try again (perhaps let it run overnight)? Many, many
    thanks.

    Olaf
    , May 3, 2006
    #5
  6. Guest

    Tim

    > Do you want someone running this test to hit the ENTER key, or not?


    The purpose of the "sys.stdin.read(1)" statement is simply to prevent
    the main thread from exiting and thus ending the test. And yes, I also
    get an exception when I press the enter key.

    Olaf
    , May 3, 2006
    #6
  7. Serge Orlov Guest

    wrote:
    > > What do you mean "stop responding"?

    >
    > Both threads print their thread numbers (either 1 or 2) approximately
    > every 10 seconds. However, after a while (minutes to hours) both
    > programs (see above) hang!
    >
    > Pressing ctrl-c (after the printing stops) causes the threads to "wake
    > up" from their sleep statement. And since the sleep took more than 1
    > seconds the thread number and the duration of the sleep is printed to
    > the screen.
    >
    > Do you have a hyper-threading/dual/multi core CPU? Did you try this?


    I don't have such CPU but I run the first program anyway. It printed

    C:\py>th.py
    thread 1 started
    sleep time: 0.01
    3.63174649292e-006
    8.43682646817e-005
    0.000164825417756

    thread 2 started
    sleep time: 0.003
    0.000675225482568
    0.000753447714724
    0.00082943502596

    1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 2 1

    I got bored and tried to stop it with ctrl-c but it didn't respond and
    kept running and printing the numbers. I had to kill it from task
    manager.
    Serge Orlov, May 3, 2006
    #7
  8. Guest

    Serge

    > I got bored and tried to stop it with ctrl-c ...


    Yes, you have to use the ctrl-break key to stop the first program. And
    neither program every hangs on a single core CPU. It also does not
    hang on a hyper-threading CPU if hyper-threading is turned off in the
    BIOS.

    Olaf
    , May 3, 2006
    #8
  9. Tim Peters Guest

    Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

    >> What do you mean "stop responding"?

    []
    > Both threads print their thread numbers (either 1 or 2) approximately
    > every 10 seconds. However, after a while (minutes to hours) both
    > programs (see above) hang!


    Where "hang" means they stop printing.

    > Pressing ctrl-c (after the printing stops) causes the threads to "wake
    > up" from their sleep statement. And since the sleep took more than 1
    > seconds the thread number and the duration of the sleep is printed to
    > the screen.
    >
    > Do you have a hyper-threading/dual/multi core CPU? Did you try this?


    I was using a 3.4 GHz Pentium 4 (single core) with hyper-threading
    enabled. I didn't run it for hours ;-)

    But supposing I do and see a hang, it's unlikely that will have
    anything to do with Python. On Windows, time.sleep() called from any
    thread other than the main thread just calls the Win32 API Sleep()
    function, after converting the argument to milliseconds. So it may be
    more fruitful to recode your test program in C (if there is a bug
    here, it's most likely in Microsoft's implementation of its Sleep()
    function).
    Tim Peters, May 3, 2006
    #9
  10. Guest

    Tim

    > I didn't run it for hours ;-)


    Please try.

    The sleep statement does not return! And this should not happen. The
    code above does nothing special or unusual. The problem only occurs if
    2 threads use the sleep statement and hyper-threading is enabled.

    We discovered this bug perhaps a year ago. The only solution was to
    tell our customers to disable hyper-threading (you can imagine they did
    not like our "solution" very much). It then took many days of hard
    work to isolate the problem down to the code I posted above.

    > Where "hang" means they stop printing.


    Our Python code just stops running (locking up the entire application).
    We use Qt for our GUI. We have over a hundred .py files. We use a
    total of 4 threads (they get created once and stay running). One
    thread uses sockets.

    Once the application locks up (getting stuck in a sleep statement) all
    comes back to live if I pull the network cable out. So perhaps the
    socket thread returns from the sleep statement and other threads return
    to live because they were waiting for the socket thread.

    Our software runs on both Windows and Linux. We are not sure if the
    problem also happens on Linux.

    In any case, if someone else can confirm the bug then this is a serious
    problem meriting further investigation.

    We have searched the Internet far and wide and were not able to find
    any information that indicates that someone else has reported a similar
    problem (neither Python nor Windows Sleep related).

    Thank you for your help.

    Olaf
    , May 4, 2006
    #10
  11. On 2006-05-04, <> wrote:

    > Please try.
    >
    > The sleep statement does not return!


    Never, or does it just take a long time?

    > And this should not happen.


    Dude, it's MS Windows.

    It does all _sorts_ of stuff that it shouldn't.

    Having sleep() take orders of magnitude longer than it should
    is not an uncommon complaint for MS Windows users. There was a
    fairly extensive thread in this group about that problem just a
    few weeks ago. IIRC, disabling some other program or service
    fixed it for one MS victem.

    > The code above does nothing special or unusual. The problem
    > only occurs if 2 threads use the sleep statement and
    > hyper-threading is enabled.


    That part is new. I'm pretty sure other people who complained
    about sleep() not returning in a reasonable amount of time saw
    it all of the time.

    --
    Grant Edwards grante Yow! I was giving HAIR
    at CUTS to th' SAUCER PEOPLE
    visi.com ... I'm CLEAN!!
    Grant Edwards, May 4, 2006
    #11
  12. Guest

    Grant

    > Having sleep() take orders of magnitude longer than it should


    I seen a few times where sleep returns after some seconds or even after
    tens of seconds (my code above check for that). But most of the time
    it gets stuck forever.

    Olaf
    , May 4, 2006
    #12
  13. In <>, Grant Edwards wrote:

    > On 2006-05-04, <> wrote:
    >> The sleep statement does not return!

    >
    > Never, or does it just take a long time?
    >
    >> And this should not happen.

    >
    > Dude, it's MS Windows.
    >
    > It does all _sorts_ of stuff that it shouldn't.
    >
    > Having sleep() take orders of magnitude longer than it should
    > is not an uncommon complaint for MS Windows users. There was a
    > fairly extensive thread in this group about that problem just a
    > few weeks ago. IIRC, disabling some other program or service
    > fixed it for one MS victem.


    IIRC it was something like an NTP daemon that caused the clock to "jump"
    a little and (Window's) sleep was confused.

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, May 4, 2006
    #13
  14. Guest

    Marc

    > IIRC it was something like an NTP daemon that caused the clock to "jump" a little and (Window's) sleep was confused.


    The problem is not a "jump" but a permanet lockup of the sleep
    statement.

    To all others, is there nobody out there that could run the test code
    at the beginning of this post on a hyper-threading CPU?

    Olaf
    , May 5, 2006
    #14
  15. Tim Peters Guest

    Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

    [Tim Peters]
    >> I didn't run it for hours ;-)


    []
    > Please try.


    OK, I let the first test program run for over 24 hours by now. It
    never hung. Overnight, the box did go into sleep mode, but the test
    woke itself up after sleep mode ended, and the threads reported they
    were sleeping for about 8 hours then.

    I did this under a debug build of Python, so that if it did hang I'd
    have _some_chance of getting useful info from the VC 7.1 debugger.
    It's possible (but unlikely) that using a debug-build Python prevented
    a hang that would have occurred had I used a release-build Python.

    > The sleep statement does not return! And this should not happen. The
    > code above does nothing special or unusual. The problem only occurs if
    > 2 threads use the sleep statement and hyper-threading is enabled.
    >
    > We discovered this bug perhaps a year ago. The only solution was to
    > tell our customers to disable hyper-threading (you can imagine they did
    > not like our "solution" very much). It then took many days of hard
    > work to isolate the problem down to the code I posted above.


    As before, since Python merely calls the Win32 API Sleep() function,
    it's extremely unlikely that the problem is due to Python.

    It's quite possible that the problem is due to a tiny timing hole in
    MS's implementation of Sleep(). Since I don't have the source code
    for that, and disassembling is prohibited by my license ;-), I can't
    pursue that.

    I've seen software with honest-to-God thread-race bugs that were never
    reported across years of heavy production use, until a user tried the
    code on a hyper-threaded or multi-core box. Tiny timing holes can be
    _extremely_ shy, and running on a box with true, or even just
    finer-grained (like HT), concurrency can make them much more likely to
    appear. I've never seen a software failure with (eventually) known
    cause occur on an HT box that could not have happened on a non-HT box.
    The statistics of extremely unlikely events aren't a natural fit to
    the unabused human mind ;-)

    > ...
    > Once the application locks up (getting stuck in a sleep statement) all
    > comes back to live if I pull the network cable out. So perhaps the
    > socket thread returns from the sleep statement and other threads return
    > to live because they were waiting for the socket thread.


    That's peculiar. time.sleep() called from a thread other than the
    main thread on Windows is non-interruptable (because the Windows
    Sleep() function is non-interruptable). time.sleep() called from the
    main thread on Windows _is_ interruptable: the main thread uses the
    Win32 API WaitForSingleObject() instead, passing a handle to a custom
    interrupt event; the _intent_ is so that a time.sleep() in the main
    thread can be aborted by a keyboard interrupt. But it shouldn't be
    possible for anything other than a keyboard interrupt or a normal
    (timeout) return to knock that loose.

    So if unplugging the cable knocks things lose, that just points even
    stronger at a bug in the Windows kernel.

    > Our software runs on both Windows and Linux. We are not sure if the
    > problem also happens on Linux.


    Well, I ran your test for a day on my Windows box -- you try running
    it for a week on your Linux box ;-)

    > ...
    > We have searched the Internet far and wide and were not able to find
    > any information that indicates that someone else has reported a similar
    > problem (neither Python nor Windows Sleep related).


    I'm pretty sure I would have known about it if anyone reported such a
    Python bug in the last 15 years. But this is the first time I've
    heard it. I don't keep up with Microsoft bug reports at all.
    Tim Peters, May 5, 2006
    #15
  16. In <>, OlafMeding
    wrote:

    >> IIRC it was something like an NTP daemon that caused the clock to
    >> "jump" a little and (Window's) sleep was confused.

    >
    > The problem is not a "jump" but a permanet lockup of the sleep statement.


    The "jump" of the system clock might confuse the systems `sleep`
    implementation.

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, May 5, 2006
    #16
  17. On 2006-05-05, Marc 'BlackJack' Rintsch <> wrote:
    > In <>, OlafMeding
    > wrote:
    >
    >>> IIRC it was something like an NTP daemon that caused the clock
    >>> to "jump" a little and (Window's) sleep was confused.

    >>
    >> The problem is not a "jump" but a permanet lockup of the sleep
    >> statement.

    >
    > The "jump" of the system clock might confuse the systems
    > `sleep` implementation.


    I wouldn't be a bit surprised if MS write code to wake up a
    task when the currentTime == task.wakeupTime instead of when
    currentTime >= task.wakeupTime. If the system time jumps past
    the "wakeup" point without ever passing through it, the task
    sleeps forever.

    Microsoft's bungling of simple things like timer ticks is
    legend in some circles:

    http://www.embedded.com/showArticle.jhtml?articleID=57703680
    http://www.embedded.com/showArticle.jhtml?articleID=159902113

    --
    Grant Edwards grante Yow! I'm young... I'm
    at HEALTHY... I can HIKE
    visi.com THRU CAPT GROGAN'S LUMBAR
    REGIONS!
    Grant Edwards, May 5, 2006
    #17
  18. Guest

    Tim

    > I did this under a debug build of Python


    Perhaps this is the reason why you were not able to reproduce the
    problem. Could you try again with a standard build of Python?

    I am a bit surprised that nobody else has tried running the short
    Python program above on a hyper-threading or dual core / dual processor
    system.

    Olaf
    , May 8, 2006
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Gonzalo Moreno
    Replies:
    2
    Views:
    12,375
  2. Delaney, Timothy (Tim)
    Replies:
    4
    Views:
    275
  3. Replies:
    2
    Views:
    431
    =?ISO-8859-1?Q?Arne_Vajh=F8j?=
    Oct 3, 2006
  4. Sam Roberts
    Replies:
    16
    Views:
    363
    Daniel Berger
    Feb 18, 2005
  5. Richard
    Replies:
    7
    Views:
    213
    Richard
    May 22, 2007
Loading...

Share This Page