Problem with awaitTermination in ThreadpoolExecutor.

Discussion in 'Java' started by TomInDenver, Aug 23, 2010.

  1. TomInDenver

    TomInDenver Guest

    Hi,

    The javadoc for awaitTermination in ExecutorService and
    ThreadPoolExecutor includes the following:

    Description
    Blocks until all tasks have completed execution after a shutdown
    request, or the timeout occurs, or the current thread is interrupted,
    whichever happens first.

    Returns:
    true if this executor terminated and false if the timeout elapsed
    before termination

    We have occasionally noticed that awaitTermination returns true when
    tasks submitted to the executor are still running, a timeout has not
    occurred, and the submitting thread was not interrupted. This has been
    an infrequent occurrence, but when it happens it severely impacts our
    application. Our log clearly shows the condition (log messages from
    Runnables exist after the awaitTermination returned true), and the
    application behavior reflects the result of this condition (failures
    due to threads still running when it is expected that the threads
    have completed).

    Below is the relevant code. (In this instance the tasks are
    downloading files from an FTP site using a 3rd party FTP library, one
    file per thread.)

    Can anyone point out anything in this code that might cause the
    problem, or suggest how we might refactor the code so the chances of
    the problem occurring are reduced, or let us know if you recall a
    bugfix for a problem like this ? We are using java build 1.6.0_11-
    b03.


    // Create thread pool
    ExecutorService downloadThreadPool = new ThreadPoolExecutor(
    3, // corePoolSz
    5, // maxPoolSz,
    7, // keepalive (7 days)
    TimeUnit.DAYS,
    new LinkedBlockingQueue<Runnable>(),
    new ThreadFactory() {
    public Thread newThread(Runnable r) {
    Thread t = new Thread(r);
    t.setDaemon(false);
    t.setName("XF-Download-Thread-Pool");
    return t;
    }
    });

    //The application creates many Runnables and then executes the
    following line in a loop for each:
    // (code to create Runnables not shown here)
    downloadThreadPool.execute(aRunnable);

    // Make threadpool wait up to 7 days for Runnables to end, after
    which a threadpool timeout will occur.
    downloadThreadPool.shutdown();
    try {
    if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    SPLogger.LogPhase.UNKNOWN);
    }
    } catch (InterruptedException ie) {
    Log.log(SPLogger.LogLevel.WARNING, "Threadpool prematurely
    terminated due to interruption in thread that created pool",
    SPLogger.LogPhase.UNKNOWN);
    }

    Thank you,

    Tom Vicker
    TomInDenver, Aug 23, 2010
    #1
    1. Advertising

  2. TomInDenver

    Daniel Pitts Guest

    On 8/23/2010 7:20 AM, TomInDenver wrote:
    > Hi,
    >
    > The javadoc for awaitTermination in ExecutorService and
    > ThreadPoolExecutor includes the following:
    >
    > Description
    > Blocks until all tasks have completed execution after a shutdown
    > request, or the timeout occurs, or the current thread is interrupted,
    > whichever happens first.
    >
    > Returns:
    > true if this executor terminated and false if the timeout elapsed
    > before termination
    >
    > We have occasionally noticed that awaitTermination returns true when
    > tasks submitted to the executor are still running, a timeout has not
    > occurred, and the submitting thread was not interrupted. This has been
    > an infrequent occurrence, but when it happens it severely impacts our
    > application. Our log clearly shows the condition (log messages from
    > Runnables exist after the awaitTermination returned true), and the
    > application behavior reflects the result of this condition (failures
    > due to threads still running when it is expected that the threads
    > have completed).
    >
    > Below is the relevant code. (In this instance the tasks are
    > downloading files from an FTP site using a 3rd party FTP library, one
    > file per thread.)
    >
    > Can anyone point out anything in this code that might cause the
    > problem, or suggest how we might refactor the code so the chances of
    > the problem occurring are reduced, or let us know if you recall a
    > bugfix for a problem like this ? We are using java build 1.6.0_11-
    > b03.
    >
    >
    > // Create thread pool
    > ExecutorService downloadThreadPool = new ThreadPoolExecutor(
    > 3, // corePoolSz
    > 5, // maxPoolSz,
    > 7, // keepalive (7 days)
    > TimeUnit.DAYS,
    > new LinkedBlockingQueue<Runnable>(),
    > new ThreadFactory() {
    > public Thread newThread(Runnable r) {
    > Thread t = new Thread(r);
    > t.setDaemon(false);
    > t.setName("XF-Download-Thread-Pool");
    > return t;
    > }
    > });
    >
    > //The application creates many Runnables and then executes the
    > following line in a loop for each:
    > // (code to create Runnables not shown here)
    > downloadThreadPool.execute(aRunnable);
    >
    > // Make threadpool wait up to 7 days for Runnables to end, after
    > which a threadpool timeout will occur.
    > downloadThreadPool.shutdown();
    > try {
    > if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    > Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    > SPLogger.LogPhase.UNKNOWN);
    > }
    > } catch (InterruptedException ie) {
    > Log.log(SPLogger.LogLevel.WARNING, "Threadpool prematurely
    > terminated due to interruption in thread that created pool",
    > SPLogger.LogPhase.UNKNOWN);
    > }
    >
    > Thank you,
    >
    > Tom Vicker


    The problem is in the code you didn't post.

    Please create an SSCCE and post it here.
    <http://sscce.org/>

    I suspicion is that perhaps it *is* terminated., but you're log might be
    buffered and the buffer isn't flushed in the order you expect.

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Aug 23, 2010
    #2
    1. Advertising

  3. TomInDenver

    markspace Guest

    On 8/23/2010 10:52 AM, Daniel Pitts wrote:

    > On 8/23/2010 7:20 AM, TomInDenver wrote:


    >> if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    >> Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    >> SPLogger.LogPhase.UNKNOWN);


    >
    > I suspicion is that perhaps it *is* terminated., but you're log might be
    > buffered and the buffer isn't flushed in the order you expect.



    Well, he only logs the above message if the timeout occurs. Which at 7
    days, is pretty unlikely unless he really is waiting 168 hours for his
    app to shutdown.

    However the OP could just do this:

    while(!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS )) ;

    And the code would now wait forever. No chance of the shutdown wait
    exiting prematurely. Whether that's really what the OP wants, is up to him.
    markspace, Aug 23, 2010
    #3
  4. ThreadPoolExecutor is very problematic. It has feature bloat, poorly
    defined threading characteristics, and it wouldn't surprise me if you
    were hitting a bug. From a quick code review, it appears that
    awaitTermination() will cause threads to get stuck in run() -> getTask()
    -> workerCanExit() -> mainLock.lock().

    These are a couple of bugs that I've seen cause problems:

    - It has a positive feedback loop on creating threads. If there's no
    thread for a new task it creates a new thread. That takes CPU time from
    a thread that might be about to complete. Now the next task needs a new
    thread too, and on and on. With a large thread pool, such as for Tomcat
    HTTP handlers, this causes stalls.

    - Worker threads are used round-robin so extra threads can not time out
    as long as there is a steady stream in new tasks. In an environment
    where per-thread caches are used to improve concurrency, this causes
    more frequent GCing that hurts performance.
    --
    I won't see Google Groups replies because I must filter them as spam
    Kevin McMurtrie, Aug 23, 2010
    #4
  5. TomInDenver

    TomInDenver Guest

    On Aug 23, 11:52 am, Daniel Pitts
    <> wrote:
    > On 8/23/2010 7:20 AM, TomInDenver wrote:
    >
    >
    >
    > > Hi,

    >
    > > The javadoc for awaitTermination in ExecutorService and
    > > ThreadPoolExecutor includes the following:

    >
    > > Description
    > > Blocks until all tasks have completed execution after a shutdown
    > > request, or the timeout occurs, or the current thread is interrupted,
    > > whichever happens first.

    >
    > > Returns:
    > > true if this executor terminated and false if the timeout elapsed
    > > before termination

    >
    > > We have occasionally noticed that awaitTermination returns true when
    > > tasks submitted to the executor are still running, a timeout has not
    > > occurred, and the submitting thread was not interrupted. This has been
    > > an infrequent occurrence, but when it happens it severely impacts our
    > > application. Our log clearly shows the condition (log messages from
    > > Runnables exist after the awaitTermination returned true), and the
    > > application behavior reflects the result of this condition (failures
    > > due to threads still running when it is expected that the threads
    > > have  completed).

    >
    > > Below is the relevant code. (In this instance the tasks are
    > > downloading files from an FTP site using a 3rd party FTP library, one
    > > file per thread.)

    >
    > > Can anyone point out anything in this code that might cause the
    > > problem, or suggest how we might refactor the code so the chances of
    > > the problem occurring are reduced, or let us know if you recall a
    > > bugfix for a problem like this ? We are using java build 1.6.0_11-
    > > b03.

    >
    > >          // Create thread pool
    > >    ExecutorService downloadThreadPool = new ThreadPoolExecutor(
    > >            3,    // corePoolSz
    > >            5,    // maxPoolSz,
    > >            7,    // keepalive (7 days)
    > >            TimeUnit.DAYS,
    > >            new LinkedBlockingQueue<Runnable>(),
    > >            new ThreadFactory() {
    > >                    public Thread newThread(Runnable r) {
    > >                    Thread t = new Thread(r);
    > >                    t.setDaemon(false);
    > >                    t.setName("XF-Download-Thread-Pool");
    > >                    return t;
    > >                    }
    > >            });

    >
    > >    //The application creates many Runnables and then executes the
    > > following line in a loop for each:
    > >    // (code to create Runnables not shown here)
    > >    downloadThreadPool.execute(aRunnable);

    >
    > >    // Make threadpool wait up to 7 days for Runnables to end, after
    > > which a threadpool timeout will occur.
    > >    downloadThreadPool.shutdown();
    > >    try {
    > >            if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    > >                    Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    > > SPLogger.LogPhase.UNKNOWN);
    > >            }
    > >    } catch (InterruptedException ie) {
    > >            Log.log(SPLogger.LogLevel.WARNING, "Threadpool prematurely
    > > terminated due to interruption in thread that created pool",
    > > SPLogger.LogPhase.UNKNOWN);
    > >    }

    >
    > > Thank you,

    >
    > > Tom Vicker

    >
    > The problem is in the code you didn't post.
    >
    > Please create an SSCCE and post it here.
    > <http://sscce.org/>
    >
    > I suspicion is that perhaps it *is* terminated., but you're log might be
    > buffered and the buffer isn't flushed in the order you expect.
    >
    > --
    > Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>


    Daniel,

    Thanks for response. It seems you're interested in seeing the code of
    the thread class. There is quite a bit of code I would need to post
    and I am not sure it would be productive. The run() method code is
    within a try/catch and the catch is for "Exception", which is handled
    without rethrowing.

    I am curious what code could be in the Runnable that would cause the
    ThreadPoolExecutor to think the thread has terminated when it actually
    has not terminated. Do you know of anything that would cause this
    condition ? Some of the code in the Runnable are calls to objects in a
    3rd party lib, so I cannot see what that code is doing. If I knew what
    could cause the problem condition, I can check our code and also
    request the 3rd party vendor to check their code. So please, if you
    know what would cause it, please respond.

    The buffer flush scenario you described isn't happening because we see
    evidence of the thread running before the timeout expiration and well
    after the awaitTermination unblocked. Furthermore, the behavior of the
    application is such that if the threads did not terminate, subsequent
    processing would fail, which is exactly what happens.

    Tom Vicker
    TomInDenver, Aug 23, 2010
    #5
  6. TomInDenver

    Daniel Pitts Guest

    On 8/23/2010 2:25 PM, TomInDenver wrote:
    > On Aug 23, 11:52 am, Daniel Pitts
    > <> wrote:
    >> On 8/23/2010 7:20 AM, TomInDenver wrote:
    >>
    >>
    >>
    >>> Hi,

    >>
    >>> The javadoc for awaitTermination in ExecutorService and
    >>> ThreadPoolExecutor includes the following:

    >>
    >>> Description
    >>> Blocks until all tasks have completed execution after a shutdown
    >>> request, or the timeout occurs, or the current thread is interrupted,
    >>> whichever happens first.

    >>
    >>> Returns:
    >>> true if this executor terminated and false if the timeout elapsed
    >>> before termination

    >>
    >>> We have occasionally noticed that awaitTermination returns true when
    >>> tasks submitted to the executor are still running, a timeout has not
    >>> occurred, and the submitting thread was not interrupted. This has been
    >>> an infrequent occurrence, but when it happens it severely impacts our
    >>> application. Our log clearly shows the condition (log messages from
    >>> Runnables exist after the awaitTermination returned true), and the
    >>> application behavior reflects the result of this condition (failures
    >>> due to threads still running when it is expected that the threads
    >>> have completed).

    >>
    >>> Below is the relevant code. (In this instance the tasks are
    >>> downloading files from an FTP site using a 3rd party FTP library, one
    >>> file per thread.)

    >>
    >>> Can anyone point out anything in this code that might cause the
    >>> problem, or suggest how we might refactor the code so the chances of
    >>> the problem occurring are reduced, or let us know if you recall a
    >>> bugfix for a problem like this ? We are using java build 1.6.0_11-
    >>> b03.

    >>
    >>> // Create thread pool
    >>> ExecutorService downloadThreadPool = new ThreadPoolExecutor(
    >>> 3, // corePoolSz
    >>> 5, // maxPoolSz,
    >>> 7, // keepalive (7 days)
    >>> TimeUnit.DAYS,
    >>> new LinkedBlockingQueue<Runnable>(),
    >>> new ThreadFactory() {
    >>> public Thread newThread(Runnable r) {
    >>> Thread t = new Thread(r);
    >>> t.setDaemon(false);
    >>> t.setName("XF-Download-Thread-Pool");
    >>> return t;
    >>> }
    >>> });

    >>
    >>> //The application creates many Runnables and then executes the
    >>> following line in a loop for each:
    >>> // (code to create Runnables not shown here)
    >>> downloadThreadPool.execute(aRunnable);

    >>
    >>> // Make threadpool wait up to 7 days for Runnables to end, after
    >>> which a threadpool timeout will occur.
    >>> downloadThreadPool.shutdown();
    >>> try {
    >>> if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    >>> Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    >>> SPLogger.LogPhase.UNKNOWN);
    >>> }
    >>> } catch (InterruptedException ie) {
    >>> Log.log(SPLogger.LogLevel.WARNING, "Threadpool prematurely
    >>> terminated due to interruption in thread that created pool",
    >>> SPLogger.LogPhase.UNKNOWN);
    >>> }

    >>
    >>> Thank you,

    >>
    >>> Tom Vicker

    >>
    >> The problem is in the code you didn't post.
    >>
    >> Please create an SSCCE and post it here.
    >> <http://sscce.org/>
    >>
    >> I suspicion is that perhaps it *is* terminated., but you're log might be
    >> buffered and the buffer isn't flushed in the order you expect.
    >>
    >> --
    >> Daniel Pitts' Tech Blog:<http://virtualinfinity.net/wordpress/>

    >
    > Daniel,
    >
    > Thanks for response. It seems you're interested in seeing the code of
    > the thread class. There is quite a bit of code I would need to post
    > and I am not sure it would be productive. The run() method code is
    > within a try/catch and the catch is for "Exception", which is handled
    > without rethrowing.
    >
    > I am curious what code could be in the Runnable that would cause the
    > ThreadPoolExecutor to think the thread has terminated when it actually
    > has not terminated. Do you know of anything that would cause this
    > condition ? Some of the code in the Runnable are calls to objects in a
    > 3rd party lib, so I cannot see what that code is doing. If I knew what
    > could cause the problem condition, I can check our code and also
    > request the 3rd party vendor to check their code. So please, if you
    > know what would cause it, please respond.
    >
    > The buffer flush scenario you described isn't happening because we see
    > evidence of the thread running before the timeout expiration and well
    > after the awaitTermination unblocked. Furthermore, the behavior of the
    > application is such that if the threads did not terminate, subsequent
    > processing would fail, which is exactly what happens.
    >
    > Tom Vicker

    Is it possible then that your runnables are spinner off yet more
    threads, and *those* threads are the ones you observe after
    awaitTermination completes?

    I'm not asking you to provide your entire code-base. And SSCCE is
    specifically the smallest running program which exhibits your problem.
    It is definitely worth the exercise of creating a test harness and
    attempting to recreate the errant conditions. You may find that you
    needn't post *any* code here, as you could discover the problem simply
    by trying to recreate it.

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
    Daniel Pitts, Aug 23, 2010
    #6
  7. TomInDenver

    TomInDenver Guest

    On Aug 23, 4:21 pm, Daniel Pitts
    <> wrote:
    > On 8/23/2010 2:25 PM, TomInDenver wrote:
    >
    > > On Aug 23, 11:52 am, Daniel Pitts
    > > <>  wrote:
    > >> On 8/23/2010 7:20 AM, TomInDenver wrote:

    >
    > >>> Hi,

    >
    > >>> The javadoc for awaitTermination in ExecutorService and
    > >>> ThreadPoolExecutor includes the following:

    >
    > >>> Description
    > >>> Blocks until all tasks have completed execution after a shutdown
    > >>> request, or the timeout occurs, or the current thread is interrupted,
    > >>> whichever happens first.

    >
    > >>> Returns:
    > >>> true if this executor terminated and false if the timeout elapsed
    > >>> before termination

    >
    > >>> We have occasionally noticed that awaitTermination returns true when
    > >>> tasks submitted to the executor are still running, a timeout has not
    > >>> occurred, and the submitting thread was not interrupted. This has been
    > >>> an infrequent occurrence, but when it happens it severely impacts our
    > >>> application. Our log clearly shows the condition (log messages from
    > >>> Runnables exist after the awaitTermination returned true), and the
    > >>> application behavior reflects the result of this condition (failures
    > >>> due to threads still running when it is expected that the threads
    > >>> have  completed).

    >
    > >>> Below is the relevant code. (In this instance the tasks are
    > >>> downloading files from an FTP site using a 3rd party FTP library, one
    > >>> file per thread.)

    >
    > >>> Can anyone point out anything in this code that might cause the
    > >>> problem, or suggest how we might refactor the code so the chances of
    > >>> the problem occurring are reduced, or let us know if you recall a
    > >>> bugfix for a problem like this ? We are using java build 1.6.0_11-
    > >>> b03.

    >
    > >>>           // Create thread pool
    > >>>     ExecutorService downloadThreadPool = new ThreadPoolExecutor(
    > >>>             3,    // corePoolSz
    > >>>             5,    // maxPoolSz,
    > >>>             7,    // keepalive (7 days)
    > >>>             TimeUnit.DAYS,
    > >>>             new LinkedBlockingQueue<Runnable>(),
    > >>>             new ThreadFactory() {
    > >>>                     public Thread newThread(Runnable r) {
    > >>>                     Thread t = new Thread(r);
    > >>>                     t.setDaemon(false);
    > >>>                     t.setName("XF-Download-Thread-Pool");
    > >>>                     return t;
    > >>>                     }
    > >>>             });

    >
    > >>>     //The application creates many Runnables and then executes the
    > >>> following line in a loop for each:
    > >>>     // (code to create Runnables not shown here)
    > >>>     downloadThreadPool.execute(aRunnable);

    >
    > >>>     // Make threadpool wait up to 7 days for Runnables to end, after
    > >>> which a threadpool timeout will occur.
    > >>>     downloadThreadPool.shutdown();
    > >>>     try {
    > >>>             if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    > >>>                     Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    > >>> SPLogger.LogPhase.UNKNOWN);
    > >>>             }
    > >>>     } catch (InterruptedException ie) {
    > >>>             Log.log(SPLogger.LogLevel.WARNING, "Threadpool prematurely
    > >>> terminated due to interruption in thread that created pool",
    > >>> SPLogger.LogPhase.UNKNOWN);
    > >>>     }

    >
    > >>> Thank you,

    >
    > >>> Tom Vicker

    >
    > >> The problem is in the code you didn't post.

    >
    > >> Please create an SSCCE and post it here.
    > >> <http://sscce.org/>

    >
    > >> I suspicion is that perhaps it *is* terminated., but you're log might be
    > >> buffered and the buffer isn't flushed in the order you expect.

    >
    > >> --
    > >> Daniel Pitts' Tech Blog:<http://virtualinfinity.net/wordpress/>

    >
    > > Daniel,

    >
    > > Thanks for response. It seems you're interested in seeing the code of
    > > the thread class. There is quite a bit of code I would need to post
    > > and I am not sure it would be productive. The run() method code is
    > > within a try/catch and the catch is for "Exception", which is handled
    > > without rethrowing.

    >
    > > I am curious what code could be in the Runnable that would cause the
    > > ThreadPoolExecutor to think the thread has terminated when it actually
    > > has not terminated. Do you know of anything that would cause this
    > > condition ? Some of the code in the Runnable are calls to objects in a
    > > 3rd party lib, so I cannot see what that code is doing. If I knew what
    > > could cause the problem condition, I can check our code and also
    > > request the 3rd party vendor to check their code. So please, if you
    > > know what would cause it, please respond.

    >
    > > The buffer flush scenario you described isn't happening because we see
    > > evidence of the thread running before the timeout expiration and well
    > > after the awaitTermination unblocked. Furthermore, the behavior of the
    > > application is such that if the threads did not terminate, subsequent
    > > processing would fail, which is exactly what happens.

    >
    > > Tom Vicker

    >
    > Is it possible then that your runnables are spinner off yet more
    > threads, and *those* threads are the ones you observe after
    > awaitTermination completes?
    >
    > I'm not asking you to provide your entire code-base. And SSCCE is
    > specifically the smallest running program which exhibits your problem.
    > It is definitely worth the exercise of creating a test harness and
    > attempting to recreate the errant conditions.  You may find that you
    > needn't post *any* code here, as you could discover the problem simply
    > by trying to recreate it.
    >
    > --
    > Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>


    Thanks Daniel, for these suggestions. I can assure you, we have tried
    to recreate this problem many, many times. I wouldn't post here unless
    that were true. The code runs very frequently on a daily basis by our
    customers, QA, and development people and 99% of the time it works
    perfectly. When the problem occurs, however, it is very visible
    because the integrity of our application falls apart because of it.

    Having a test harness run continuously might eventually expose the
    problem unless the problem is triggered by a specific condition that
    the test harness didn't create. Our threads are processing data from
    different vendor data feeds, so the data varies daily. If the problem
    occurs due to some oddity in the data, then it would be very hard to
    pinpoint. But this something we will definitely consider.

    If you have any insights into what key information we might be able to
    capture (like various state information within the ThreadPoolExecutor)
    on the rare occurrence when the problem happens, please let us know.

    Yes, it is possible that the Runnables submitted to the executor are
    themselves spinning off other Runnables. The 3rd party code we are
    calling could be doing this. But the logging we see after the executor
    terminates are coming from our Runnables that we submit to the
    executor. I wonder if there is some way the executor somehow gets
    confused where the termination of a distant Runnable that was spun off
    (from a Runnable submitted to the executor), makes the executor think
    a Runnable has terminated when it really hasn't ?? I guess we'd have
    to know how the executor keeps track (do you know ?).

    As a temporary work-around we are writing code to "backstop" the
    executor by using our own synchronizer to keep track of when the
    threads terminate. This is very unfortunate, but if we can't get it
    resolved, its our only choice.

    Tom Vicker
    TomInDenver, Aug 24, 2010
    #7
  8. On 24.08.2010 15:11, TomInDenver wrote:
    > On Aug 23, 4:21 pm, Daniel Pitts
    > <> wrote:
    >> On 8/23/2010 2:25 PM, TomInDenver wrote:
    >>
    >>> On Aug 23, 11:52 am, Daniel Pitts
    >>> <> wrote:
    >>>> On 8/23/2010 7:20 AM, TomInDenver wrote:

    >>
    >>>>> Hi,

    >>
    >>>>> The javadoc for awaitTermination in ExecutorService and
    >>>>> ThreadPoolExecutor includes the following:

    >>
    >>>>> Description
    >>>>> Blocks until all tasks have completed execution after a shutdown
    >>>>> request, or the timeout occurs, or the current thread is interrupted,
    >>>>> whichever happens first.

    >>
    >>>>> Returns:
    >>>>> true if this executor terminated and false if the timeout elapsed
    >>>>> before termination

    >>
    >>>>> We have occasionally noticed that awaitTermination returns true when
    >>>>> tasks submitted to the executor are still running, a timeout has not
    >>>>> occurred, and the submitting thread was not interrupted. This has been
    >>>>> an infrequent occurrence, but when it happens it severely impacts our
    >>>>> application. Our log clearly shows the condition (log messages from
    >>>>> Runnables exist after the awaitTermination returned true), and the
    >>>>> application behavior reflects the result of this condition (failures
    >>>>> due to threads still running when it is expected that the threads
    >>>>> have completed).

    >>
    >>>>> Below is the relevant code. (In this instance the tasks are
    >>>>> downloading files from an FTP site using a 3rd party FTP library, one
    >>>>> file per thread.)

    >>
    >>>>> Can anyone point out anything in this code that might cause the
    >>>>> problem, or suggest how we might refactor the code so the chances of
    >>>>> the problem occurring are reduced, or let us know if you recall a
    >>>>> bugfix for a problem like this ? We are using java build 1.6.0_11-
    >>>>> b03.

    >>
    >>>>> // Create thread pool
    >>>>> ExecutorService downloadThreadPool = new ThreadPoolExecutor(
    >>>>> 3, // corePoolSz
    >>>>> 5, // maxPoolSz,
    >>>>> 7, // keepalive (7 days)
    >>>>> TimeUnit.DAYS,
    >>>>> new LinkedBlockingQueue<Runnable>(),
    >>>>> new ThreadFactory() {
    >>>>> public Thread newThread(Runnable r) {
    >>>>> Thread t = new Thread(r);
    >>>>> t.setDaemon(false);
    >>>>> t.setName("XF-Download-Thread-Pool");
    >>>>> return t;
    >>>>> }
    >>>>> });

    >>
    >>>>> //The application creates many Runnables and then executes the
    >>>>> following line in a loop for each:
    >>>>> // (code to create Runnables not shown here)
    >>>>> downloadThreadPool.execute(aRunnable);

    >>
    >>>>> // Make threadpool wait up to 7 days for Runnables to end, after
    >>>>> which a threadpool timeout will occur.
    >>>>> downloadThreadPool.shutdown();
    >>>>> try {
    >>>>> if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    >>>>> Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    >>>>> SPLogger.LogPhase.UNKNOWN);
    >>>>> }
    >>>>> } catch (InterruptedException ie) {
    >>>>> Log.log(SPLogger.LogLevel.WARNING, "Threadpool prematurely
    >>>>> terminated due to interruption in thread that created pool",
    >>>>> SPLogger.LogPhase.UNKNOWN);
    >>>>> }

    >>
    >>>>> Thank you,

    >>
    >>>>> Tom Vicker

    >>
    >>>> The problem is in the code you didn't post.

    >>
    >>>> Please create an SSCCE and post it here.
    >>>> <http://sscce.org/>

    >>
    >>>> I suspicion is that perhaps it *is* terminated., but you're log might be
    >>>> buffered and the buffer isn't flushed in the order you expect.

    >>
    >>>> --
    >>>> Daniel Pitts' Tech Blog:<http://virtualinfinity.net/wordpress/>

    >>
    >>> Daniel,

    >>
    >>> Thanks for response. It seems you're interested in seeing the code of
    >>> the thread class. There is quite a bit of code I would need to post
    >>> and I am not sure it would be productive. The run() method code is
    >>> within a try/catch and the catch is for "Exception", which is handled
    >>> without rethrowing.

    >>
    >>> I am curious what code could be in the Runnable that would cause the
    >>> ThreadPoolExecutor to think the thread has terminated when it actually
    >>> has not terminated. Do you know of anything that would cause this
    >>> condition ? Some of the code in the Runnable are calls to objects in a
    >>> 3rd party lib, so I cannot see what that code is doing. If I knew what
    >>> could cause the problem condition, I can check our code and also
    >>> request the 3rd party vendor to check their code. So please, if you
    >>> know what would cause it, please respond.

    >>
    >>> The buffer flush scenario you described isn't happening because we see
    >>> evidence of the thread running before the timeout expiration and well
    >>> after the awaitTermination unblocked. Furthermore, the behavior of the
    >>> application is such that if the threads did not terminate, subsequent
    >>> processing would fail, which is exactly what happens.

    >>
    >>> Tom Vicker

    >>
    >> Is it possible then that your runnables are spinner off yet more
    >> threads, and *those* threads are the ones you observe after
    >> awaitTermination completes?
    >>
    >> I'm not asking you to provide your entire code-base. And SSCCE is
    >> specifically the smallest running program which exhibits your problem.
    >> It is definitely worth the exercise of creating a test harness and
    >> attempting to recreate the errant conditions. You may find that you
    >> needn't post *any* code here, as you could discover the problem simply
    >> by trying to recreate it.


    > Thanks Daniel, for these suggestions. I can assure you, we have tried
    > to recreate this problem many, many times. I wouldn't post here unless
    > that were true. The code runs very frequently on a daily basis by our
    > customers, QA, and development people and 99% of the time it works
    > perfectly. When the problem occurs, however, it is very visible
    > because the integrity of our application falls apart because of it.
    >
    > Having a test harness run continuously might eventually expose the
    > problem unless the problem is triggered by a specific condition that
    > the test harness didn't create. Our threads are processing data from
    > different vendor data feeds, so the data varies daily. If the problem
    > occurs due to some oddity in the data, then it would be very hard to
    > pinpoint. But this something we will definitely consider.
    >
    > If you have any insights into what key information we might be able to
    > capture (like various state information within the ThreadPoolExecutor)
    > on the rare occurrence when the problem happens, please let us know.
    >
    > Yes, it is possible that the Runnables submitted to the executor are
    > themselves spinning off other Runnables.


    What do you mean by this? Are you assuming that new threads are spawned
    from Runnable.run()?

    > The 3rd party code we are
    > calling could be doing this. But the logging we see after the executor
    > terminates are coming from our Runnables that we submit to the
    > executor. I wonder if there is some way the executor somehow gets
    > confused where the termination of a distant Runnable that was spun off
    > (from a Runnable submitted to the executor), makes the executor think
    > a Runnable has terminated when it really hasn't ?? I guess we'd have
    > to know how the executor keeps track (do you know ?).
    >
    > As a temporary work-around we are writing code to "backstop" the
    > executor by using our own synchronizer to keep track of when the
    > threads terminate. This is very unfortunate, but if we can't get it
    > resolved, its our only choice.


    I'm with Kevin on this one: we also have seen issues with it in the past
    (IIRC there was / is a bug with creation of threads and automatic
    termination of threads). I'd seriously consider cooking your own
    version of TPE. IMHO it is not that hard if you do not want all the
    features (dynamic thread creation for example). If you work with a
    fixed number of threads the job is comparatively easy. IMHO interface
    ExecutorService is also too big for many use cases so if you stick with
    Executor and a few necessary additions for shutdown handling the job
    will be easier than if you go for the full ExecutorService.

    My 0.02EUR

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 24, 2010
    #8
  9. TomInDenver

    TomInDenver Guest

    On Aug 24, 1:51 pm, Robert Klemme <> wrote:
    > On 24.08.2010 15:11, TomInDenver wrote:
    >
    >
    >
    > > On Aug 23, 4:21 pm, Daniel Pitts
    > > <>  wrote:
    > >> On 8/23/2010 2:25 PM, TomInDenver wrote:

    >
    > >>> On Aug 23, 11:52 am, Daniel Pitts
    > >>> <>    wrote:
    > >>>> On 8/23/2010 7:20 AM, TomInDenver wrote:

    >
    > >>>>> Hi,

    >
    > >>>>> The javadoc for awaitTermination in ExecutorService and
    > >>>>> ThreadPoolExecutor includes the following:

    >
    > >>>>> Description
    > >>>>> Blocks until all tasks have completed execution after a shutdown
    > >>>>> request, or the timeout occurs, or the current thread is interrupted,
    > >>>>> whichever happens first.

    >
    > >>>>> Returns:
    > >>>>> true if this executor terminated and false if the timeout elapsed
    > >>>>> before termination

    >
    > >>>>> We have occasionally noticed that awaitTermination returns true when
    > >>>>> tasks submitted to the executor are still running, a timeout has not
    > >>>>> occurred, and the submitting thread was not interrupted. This has been
    > >>>>> an infrequent occurrence, but when it happens it severely impacts our
    > >>>>> application. Our log clearly shows the condition (log messages from
    > >>>>> Runnables exist after the awaitTermination returned true), and the
    > >>>>> application behavior reflects the result of this condition (failures
    > >>>>> due to threads still running when it is expected that the threads
    > >>>>> have  completed).

    >
    > >>>>> Below is the relevant code. (In this instance the tasks are
    > >>>>> downloading files from an FTP site using a 3rd party FTP library, one
    > >>>>> file per thread.)

    >
    > >>>>> Can anyone point out anything in this code that might cause the
    > >>>>> problem, or suggest how we might refactor the code so the chances of
    > >>>>> the problem occurring are reduced, or let us know if you recall a
    > >>>>> bugfix for a problem like this ? We are using java build 1.6.0_11-
    > >>>>> b03.

    >
    > >>>>>            // Create thread pool
    > >>>>>      ExecutorService downloadThreadPool = new ThreadPoolExecutor(
    > >>>>>              3,    // corePoolSz
    > >>>>>              5,    // maxPoolSz,
    > >>>>>              7,    // keepalive (7 days)
    > >>>>>              TimeUnit.DAYS,
    > >>>>>              new LinkedBlockingQueue<Runnable>(),
    > >>>>>              new ThreadFactory() {
    > >>>>>                      public Thread newThread(Runnable r) {
    > >>>>>                      Thread t = new Thread(r);
    > >>>>>                      t.setDaemon(false);
    > >>>>>                      t.setName("XF-Download-Thread-Pool");
    > >>>>>                      return t;
    > >>>>>                      }
    > >>>>>              });

    >
    > >>>>>      //The application creates many Runnables and then executes the
    > >>>>> following line in a loop for each:
    > >>>>>      // (code to create Runnables not shown here)
    > >>>>>      downloadThreadPool.execute(aRunnable);

    >
    > >>>>>      // Make threadpool wait up to 7 days for Runnables to end, after
    > >>>>> which a threadpool timeout will occur.
    > >>>>>      downloadThreadPool.shutdown();
    > >>>>>      try {
    > >>>>>              if (!downloadThreadPool.awaitTermination(7, TimeUnit.DAYS)) {
    > >>>>>                      Log.log(SPLogger.LogLevel.WARNING, "Threadpool timeout occurred",
    > >>>>> SPLogger.LogPhase.UNKNOWN);
    > >>>>>              }
    > >>>>>      } catch (InterruptedException ie) {
    > >>>>>              Log.log(SPLogger.LogLevel.WARNING, "Threadpool prematurely
    > >>>>> terminated due to interruption in thread that created pool",
    > >>>>> SPLogger.LogPhase.UNKNOWN);
    > >>>>>      }

    >
    > >>>>> Thank you,

    >
    > >>>>> Tom Vicker

    >
    > >>>> The problem is in the code you didn't post.

    >
    > >>>> Please create an SSCCE and post it here.
    > >>>> <http://sscce.org/>

    >
    > >>>> I suspicion is that perhaps it *is* terminated., but you're log might be
    > >>>> buffered and the buffer isn't flushed in the order you expect.

    >
    > >>>> --
    > >>>> Daniel Pitts' Tech Blog:<http://virtualinfinity.net/wordpress/>

    >
    > >>> Daniel,

    >
    > >>> Thanks for response. It seems you're interested in seeing the code of
    > >>> the thread class. There is quite a bit of code I would need to post
    > >>> and I am not sure it would be productive. The run() method code is
    > >>> within a try/catch and the catch is for "Exception", which is handled
    > >>> without rethrowing.

    >
    > >>> I am curious what code could be in the Runnable that would cause the
    > >>> ThreadPoolExecutor to think the thread has terminated when it actually
    > >>> has not terminated. Do you know of anything that would cause this
    > >>> condition ? Some of the code in the Runnable are calls to objects in a
    > >>> 3rd party lib, so I cannot see what that code is doing. If I knew what
    > >>> could cause the problem condition, I can check our code and also
    > >>> request the 3rd party vendor to check their code. So please, if you
    > >>> know what would cause it, please respond.

    >
    > >>> The buffer flush scenario you described isn't happening because we see
    > >>> evidence of the thread running before the timeout expiration and well
    > >>> after the awaitTermination unblocked. Furthermore, the behavior of the
    > >>> application is such that if the threads did not terminate, subsequent
    > >>> processing would fail, which is exactly what happens.

    >
    > >>> Tom Vicker

    >
    > >> Is it possible then that your runnables are spinner off yet more
    > >> threads, and *those* threads are the ones you observe after
    > >> awaitTermination completes?

    >
    > >> I'm not asking you to provide your entire code-base. And SSCCE is
    > >> specifically the smallest running program which exhibits your problem.
    > >> It is definitely worth the exercise of creating a test harness and
    > >> attempting to recreate the errant conditions.  You may find that you
    > >> needn't post *any* code here, as you could discover the problem simply
    > >> by trying to recreate it.

    > > Thanks Daniel, for these suggestions. I can assure you, we have tried
    > > to recreate this problem many, many times. I wouldn't post here unless
    > > that were true. The code runs very frequently on a daily basis by our
    > > customers, QA, and development people and 99% of the time it works
    > > perfectly. When the problem occurs, however, it is very visible
    > > because the integrity of our application falls apart because of it.

    >
    > > Having a test harness run continuously might eventually expose the
    > > problem unless the problem is triggered by a specific condition that
    > > the test harness didn't create. Our threads are processing data from
    > > different vendor data feeds, so the data varies daily. If the problem
    > > occurs due to some oddity in the data, then it would be very hard to
    > > pinpoint. But this something we will definitely consider.

    >
    > > If you have any insights into what key information we might be able to
    > > capture (like various state information within the ThreadPoolExecutor)
    > > on the rare occurrence when the problem happens, please let us know.

    >
    > > Yes, it is possible that the Runnables submitted to the executor are
    > > themselves spinning off other Runnables.

    >
    > What do you mean by this?  Are you assuming that new threads are spawned
    > from Runnable.run()?
    >
    > > The 3rd party code we are
    > > calling could be doing this. But the logging we see after the executor
    > > terminates are coming from our Runnables that we submit to the
    > > executor. I wonder if there is some way the executor somehow gets
    > > confused where the termination of a distant Runnable that was spun off
    > > (from a Runnable submitted to the executor), makes the executor think
    > > a Runnable has terminated when it really hasn't ?? I guess we'd have
    > > to know how the executor keeps track (do you know ?).

    >
    > > As a temporary work-around we are writing code to "backstop" the
    > > executor by using our own synchronizer to keep track of when the
    > > threads terminate. This is very unfortunate, but if we can't get it
    > > resolved, its our only choice.

    >
    > I'm with Kevin on this one: we also have seen issues with it in the past
    > (IIRC there was / is a bug with creation of threads and automatic
    > termination of threads).  I'd seriously consider cooking your own
    > version of TPE.  IMHO it is not that hard if you do not want all the
    > features (dynamic thread creation for example).  If you work with a
    > fixed number of threads the job is comparatively easy.  IMHO interface
    > ExecutorService is also too big for many use cases so if you stick with
    > Executor and a few necessary additions for shutdown handling the job
    > will be easier than if you go for the full ExecutorService.
    >
    > My 0.02EUR
    >
    > Kind regards
    >
    >         robert
    >
    > --
    > remember.guy do |as, often| as.you_can - without endhttp://blog.rubybestpractices.com/


    Yes, since we call into a 3rd party FTP lib from within the
    Runnable.run(), it is possible that other threads are spawned by this
    3rd party code, but we don't own that code and don't really know. But
    even if there were other threads spawned this way, I don't see how the
    TPE would know or care about them. It should only keep track of the
    threads that are submitted to it.

    Unfortunately, we are under some serious time constraints so at this
    point I couldn't get the OK to go with our own TPE implementation.
    That's something to keep in mind, though.

    Thanks,

    Tom Vicker
    TomInDenver, Aug 24, 2010
    #9
  10. On 24.08.2010 22:49, TomInDenver wrote:

    > Yes, since we call into a 3rd party FTP lib from within the
    > Runnable.run(), it is possible that other threads are spawned by this
    > 3rd party code, but we don't own that code and don't really know. But
    > even if there were other threads spawned this way, I don't see how the
    > TPE would know or care about them. It should only keep track of the
    > threads that are submitted to it.


    I would not expect it either but you never know. For example, if a
    ThreadGroup was used to track threads newly spawned threads might end up
    in the same group as the TPE's.

    > Unfortunately, we are under some serious time constraints so at this
    > point I couldn't get the OK to go with our own TPE implementation.
    > That's something to keep in mind, though.


    That's bad because in my experience these workarounds tend to stick
    forever and cause more trouble over the course of their existence than a
    real solution.

    Good luck!

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Aug 25, 2010
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    818
  2. Marc E
    Replies:
    0
    Views:
    674
    Marc E
    Apr 29, 2006
  3. Maciej
    Replies:
    1
    Views:
    652
    Thomas Hawtin
    Oct 27, 2006
  4. Replies:
    11
    Views:
    21,194
  5. pksiazek
    Replies:
    2
    Views:
    1,590
    pksiazek
    Oct 15, 2007
Loading...

Share This Page