An Executor-like structure providing more than threads

Tom Anderson · Jan 17, 2010

Holla yalls,

This is a slightly confused question, for which i apologise. I've been on
a minor quick-and-dirty hacking run, and haven't really thought this
through properly.

An Executor is a thing that has a pool of Threads, accepts a stream of
Runnables (or Callables), pairs them up one after the other with Threads,
then sends the pairs off together to do some work, and accepts the Threads
back afterwards.

What if the resource needed to perform a task, the thing you wanted to
maintain a limited pool of, reuse, and provide shared access to, was more
than a Thread?

In my case, it was a Downloader, which was a Thread plus an instance of
Apache's HttpClient and a buffer. The tasks were URLs to download. URLs
come in, are assigned to a Downloader which downloads them, and when the
download is done, the Downloader goes back to be assigned another URL.

I could have written this with the Downloaders being the active party,
going to a queue of URLs, pulling one off, downloading it, then going back
for another. Or i could have put Downloaders in a pool, and had the
submission mechanism pull them out and hand URLs to them. But i really
wanted to be able to use all the cool stuff in ExecutorService, like
getting Futures and having orderly shutdown, and a properly controllable
thread pool and so on. So what i did was subclass Thread to add the other
bits (the HttpClient and so on), and then, in the tasks, do something
like:

((DownloadThread)Thread.currentThread()).getHttpClient()

And so on. I thought that was quite clever, although it is clearly also
entirely bletcherous.

Any thoughts? What's the right way to do this?

tom

Roedy Green · Jan 17, 2010

Any thoughts? What's the right way to do this?

Does Executor dispense just raw Threads or you can you get it to
dispense your custom Thread class? If so, your thread constructor
could allocate the extra goodies.

John B. Matthews · Jan 17, 2010

Tom Anderson said:
An Executor is a thing that has a pool of Threads, accepts a stream
of Runnables (or Callables), pairs them up one after the other with
Threads, then sends the pairs off together to do some work, and
accepts the Threads back afterwards.

What if the resource needed to perform a task, the thing you wanted
to maintain a limited pool of, reuse, and provide shared access to,
was more than a Thread?

In my case, it was a Downloader, which was a Thread plus an instance
of Apache's HttpClient and a buffer. The tasks were URLs to download.
URLs come in, are assigned to a Downloader which downloads them, and
when the download is done, the Downloader goes back to be assigned
another URL.

I could have written this with the Downloaders being the active
party, going to a queue of URLs, pulling one off, downloading it,
then going back for another. Or i could have put Downloaders in a
pool, and had the submission mechanism pull them out and hand URLs to
them. But i really wanted to be able to use all the cool stuff in
ExecutorService, like getting Futures and having orderly shutdown,
and a properly controllable thread pool and so on. So what i did was
subclass Thread to add the other bits (the HttpClient and so on), and
then, in the tasks, do something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

And so on. I thought that was quite clever, although it is clearly also
entirely bletcherous.

Any thoughts? What's the right way to do this?

As DownloadThread implements Runnable, I'd think you can just execute()
them from, or submit() them for Future reference to, an Executor
created by Executors.newFixedThreadPool(), as shown here:

<http://groups.google.com/group/comp.lang.java.programmer/msg/1d9c821dcda040f6>

Daniel Pitts · Jan 17, 2010

Tom said:
Holla yalls,

This is a slightly confused question, for which i apologise. I've been
on a minor quick-and-dirty hacking run, and haven't really thought this
through properly.

An Executor is a thing that has a pool of Threads, accepts a stream of
Runnables (or Callables), pairs them up one after the other with
Threads, then sends the pairs off together to do some work, and accepts
the Threads back afterwards.

What if the resource needed to perform a task, the thing you wanted to
maintain a limited pool of, reuse, and provide shared access to, was
more than a Thread?

In my case, it was a Downloader, which was a Thread plus an instance of
Apache's HttpClient and a buffer. The tasks were URLs to download. URLs
come in, are assigned to a Downloader which downloads them, and when the
download is done, the Downloader goes back to be assigned another URL.

I could have written this with the Downloaders being the active party,
going to a queue of URLs, pulling one off, downloading it, then going
back for another. Or i could have put Downloaders in a pool, and had the
submission mechanism pull them out and hand URLs to them. But i really
wanted to be able to use all the cool stuff in ExecutorService, like
getting Futures and having orderly shutdown, and a properly controllable
thread pool and so on. So what i did was subclass Thread to add the
other bits (the HttpClient and so on), and then, in the tasks, do
something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

And so on. I thought that was quite clever, although it is clearly also
entirely bletcherous.

Any thoughts? What's the right way to do this?

tom

The pattern you are asking about is called a Resource Pool. I wouldn't
combine the concept of Thread Pool with HttpClient Pool. They are
orthogonal concepts, so they should be accessible orthogonally.

Lew · Jan 17, 2010

Steven said:
What if the resource needed to perform a task, the thing you wanted to
maintain a limited pool of, reuse, and provide shared access to, was
more than a Thread?

[...] But i really wanted to be able to use all the cool stuff in
ExecutorService, like getting Futures and having orderly shutdown, and
a properly controllable thread pool and so on. So what i did was
subclass Thread to add the other bits (the HttpClient and so on), and
then, in the tasks, do something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

Click to expand...

Would a ThreadLocal<HttpClient> be appropriate?

My noion based on the question and some of the answers given is that the
Runnable or Callable running from the thread pool should instantiate a client
locally from the separate client pool. It won't need 'ThreadLocal' because
the client reference will already be local. The tangle comes from the idea
that the client must be associated with the thread at construction of the
thread. I propose that the client be acquired in the run method of the thread.

I am not fond of 'ThreadLocal'; I'm not a fan of global objects generally.

Arved Sandstrom · Jan 17, 2010

Lew said:
Steven said:

What if the resource needed to perform a task, the thing you wanted to
maintain a limited pool of, reuse, and provide shared access to, was
more than a Thread?

[...] But i really wanted to be able to use all the cool stuff in
ExecutorService, like getting Futures and having orderly shutdown, and
a properly controllable thread pool and so on. So what i did was
subclass Thread to add the other bits (the HttpClient and so on), and
then, in the tasks, do something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

Click to expand...

Would a ThreadLocal<HttpClient> be appropriate?

Click to expand...

My noion based on the question and some of the answers given is that the
Runnable or Callable running from the thread pool should instantiate a
client locally from the separate client pool. It won't need
'ThreadLocal' because the client reference will already be local. The
tangle comes from the idea that the client must be associated with the
thread at construction of the thread. I propose that the client be
acquired in the run method of the thread.

I am not fond of 'ThreadLocal'; I'm not a fan of global objects generally.

Used with care, though, used with care.

The canonical examples for
good use of ThreadLocals are probably best called "thread globals",
which is what I think you're getting at. However, this makes sense for
things like JDBC connections and transaction contexts.

In some of the applications I maintain the mechanism for keeping an
application-managed JPA EntityManager around for each request,
furthermore in a long-running transaction implementation, includes
ThreadLocal storage. To me this is defensible - the EM is part of the
"context" or "environment" of each thread.

The criticism you're making is well-founded. What is stored *in* the
ThreadLocal is not immune to being abused.

AHS

Tom Anderson · Jan 21, 2010

Does Executor dispense just raw Threads or you can you get it to
dispense your custom Thread class? If so, your thread constructor could
allocate the extra goodies.

I can, and that's exactly what it does:

ExecutorService executor = Executors.newFixedThreadPool(numThreads);
((ThreadPoolExecutor)executor).setThreadFactory(new DownloadThreadFactory());

public class DownloadThreadFactory implements ThreadFactory {
HttpConnectionManager connMgr = new MultiThreadedHttpConnectionManager();

public Thread newThread(Runnable r) {
return new DownloadThread(connMgr, r);
}
}

public class DownloadThread extends Thread {
private final HttpClient client;
private byte[] buffer;

public DownloadThread(HttpConnectionManager connMgr, Runnable r) {
super(r);
client = new HttpClient(connMgr);
buffer = new byte[1024];
}
}

tom

Tom Anderson · Jan 21, 2010

Tom said:
Tom said:

[...]
I could have written this with the Downloaders being the active party,
going to a queue of URLs, pulling one off, downloading it, then going back
for another. Or i could have put Downloaders in a pool, and had the
submission mechanism pull them out and hand URLs to them. But i really
wanted to be able to use all the cool stuff in ExecutorService, like
getting Futures and having orderly shutdown, and a properly controllable
thread pool and so on. So what i did was subclass Thread to add the other
bits (the HttpClient and so on), and then, in the tasks, do something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

And so on. I thought that was quite clever, although it is clearly also
entirely bletcherous.

Click to expand...

I don't see anything fundamentally wrong with the idea. So long as you
really have a one-to-one relationship between the "downloader" and the
"thread" parts, maybe it does make sense to have a "downloader thread".

As far as the specific line of code you posted, the one change I would have
made is to have an appropriate static method in DownloadThread:

class DownloadThread extends Thread
{
public static DownloadThread currentThread() throws UnsupportedOperationException
{
Thread thread = Thread.currentThread();

if (thread instanceof DownloadThread)
{
return (DownloadThread)thread;
}

// Alternative to the below would be to just make the
// cast and let the bad cast exception propagate up
throw new UnsupportedOperationException("current thread is not a DownloadThread");
}
}

That way you don't have a bunch of explicit casting all over the place.

That's certainly an improvement.

Though, that said it seems to me that most if not all of the time, you should
not need the currentThread() method anyway, because you should be executing
methods within the DownloadThread instance already.

Ah, no, because the idea is that the DownloadThread is infrastructure to
support all sorts of generic downloading tasks. One might be to get a file
from a given URL; another might be to get a series of files from the same
server, yet another might download one thing, parse it to find the URL to
another thing, then download that, etc. Moreover, a single DownloadThread
might do different tasks in succession. That means having code outside the
DownloadThread itself.

Although i admit that in this app, there was only one type of task. I
would like to turn this into a generic utility, though.

Cases where client code needs to get specific output of the
DownloadThread class would be handled instead via some mechanism where
you don't need to concern yourself with the current thread. For
example, a listener/event API, or even just a simple completion
callback, either approach in a Future implementation where the relevant
instance of the DownloadThread class is delivered via some mechanism
other than checking the current Thread instance.

I guess the more I think about it, the more I wonder why you should ever need
to make the assumption that the DownloadThread object of interest is the same
as the current Thread object.

I also think that you could implement a Downloader pool that is not so
explicitly tied to a Thread pool. It seems to me that the Thread pool
dependency is an implementation detail, and should be abstracted such that
the client of the Downloader pool should not need to know or concern their
self with that detail.

I would think that the Downloader pool client would simply submit the URLs,
and receive notification via some mechanism of completion defined by the
Downloader pool. That the Downloader pool is itself depending on a Thread
pool behind the scenes should be irrelevant and unknowable from the API
alone. To make the Downloader and Thread one and the same object seems like
a leaky abstraction to me.

Could be. If by 'thread pool', you mean something like an executor, where
there is explicitly a pool of threads, then i can't see how you'd
implement the Downloader pool - would Downloaders fetch threads from the
pool when they need to run or something? Store the download task in
themselves, then submit themselves to the executor, having a run method
which runs the download task? Yes, that would work.

I wonder if there's a design pattern there - objects which submit
themselves to be executed to service a request from another object,
converting method parameters into instance fields. Sort of a variation of
Active Object. I haven't seen it enough times to say so, really, but it
has the feel of a pattern, somehow.

Thinking about it, i could probably do away with pools altogether and just
have active threads pulling tasks from a queue and executing them; these
tasks could have a wider interface than Runnable, through which a
HttpClient and buffer could be passed:

interface Download {
public void perform(HttpClient client, byte[] buffer);
}

class DownloadWorker implements Runnable {
private BlockingQueue<Download> downloads; // shared
private HttpClient client;
private byte[] buffer;

public DownloadWorker(BlockingQueue<Download> downloads, HttpConnectionManager connMgr) {
this.downloads = downloads;
client = new HttpClient(connMgr);
buffer = new byte[1024];
}
public void run() {
// exception handling not shown
while (true) {
Download dl = downloads.take();
dl.perform(client, buffer);
}
}
}

BlockingQueue<Download> downloads;
HttpConnectionManager connMgr;
// pool management also not shown
for (int i = 0; i < numThreads; ++i) {
new Thread(new DownloadWorker(downloads, connMgr)).start();
}

Look ma, no executor!

That said, I try to be pragmatic. There are certainly others who are
more OOP-Nazi than I am (and I preemptively reject the rampant
over-application of Godwin's law that goes on in this newsgroup) and who
might insist against an unnecessary dependency in the object hierarchy
like that.

But while I personally probably wouldn't implement it that way, I'm not
comfortable asserting that your own choice is wrong. Especially not
having a full view of the entire system you're implementing, and
especially having experience with your other contributions to the
newsgroup (granted, sometimes having a positive reputation can interfere
with your ability to receive good, critical feedback because people make
too broad an assumption that you know what you're doing ). Maybe for
your particular system, this really is a good design.

I refer you to the timestamp on my original message - it was getting on
for one in the morning when i wrote that. I wouldn't for a second dream of
trying to defend that code!

tom

Tom Anderson · Jan 21, 2010

The pattern you are asking about is called a Resource Pool. I wouldn't
combine the concept of Thread Pool with HttpClient Pool. They are
orthogonal concepts, so they should be accessible orthogonally.

Except there's a 1:1 relationship between threads and HttpClients. Yes,
you could write a task which looked like:

public void run() {
HttpClient client = httpClientPool.acquire();
// do stuff
httpClientPool.release(client);
}

But that seems like unnecessary boilerplate. Still, it is actually less
code than my smartarse solution, so maybe it's the right answer.

It would deal gracefully with the case where a task needs no HttpClient,
or more than one, which mine doesn't. I wouldn't anticipate those being
common, though.

tom

Tom Anderson · Jan 21, 2010

What if the resource needed to perform a task, the thing you wanted to
maintain a limited pool of, reuse, and provide shared access to, was
more than a Thread?

[...] But i really wanted to be able to use all the cool stuff in
ExecutorService, like getting Futures and having orderly shutdown, and
a properly controllable thread pool and so on. So what i did was
subclass Thread to add the other bits (the HttpClient and so on), and
then, in the tasks, do something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

Click to expand...

Would a ThreadLocal<HttpClient> be appropriate?

That would certainly do it. ThreadLocals make me uneasy, though. Don't
really know why - intellectually, i know they're no different to a normal
variable with the same scope.

tom

Daniel Pitts · Jan 21, 2010

Tom said:
Except there's a 1:1 relationship between threads and HttpClients. Yes,
you could write a task which looked like:

public void run() {
HttpClient client = httpClientPool.acquire();
// do stuff
httpClientPool.release(client);
}

But that seems like unnecessary boilerplate. Still, it is actually less
code than my smartarse solution, so maybe it's the right answer.

It would deal gracefully with the case where a task needs no HttpClient,
or more than one, which mine doesn't. I wouldn't anticipate those being
common, though.

tom

Are HttpClient objects really that expensive to create? I noticed your
solution is re-using the same connection manager... Perhaps you're
anticipating problems where there are non, and you should just use "new
HttpClient(connectionManager)" where it is needed, and forget about
HttpClient pooling.

*then*, consider profiling your code. If it turns out that there is a
lot of time spent allocating and releasing HttpClient objects, then you
should consider pooling.

Tom Anderson · Jan 21, 2010

Steven said:
Steven said:

On 17/01/10 01:11, Tom Anderson wrote:
What if the resource needed to perform a task, the thing you wanted to
maintain a limited pool of, reuse, and provide shared access to, was
more than a Thread?

[...] But i really wanted to be able to use all the cool stuff in
ExecutorService, like getting Futures and having orderly shutdown, and
a properly controllable thread pool and so on. So what i did was
subclass Thread to add the other bits (the HttpClient and so on), and
then, in the tasks, do something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

Would a ThreadLocal<HttpClient> be appropriate?

Click to expand...

My noion based on the question and some of the answers given is that
the Runnable or Callable running from the thread pool should
instantiate a client locally from the separate client pool.

Click to expand...

I thought the point was that you need at least one HttpClient per thread
(or you will have no more threads running than HttpClients), and you do
not need more than one per thread (spares are wasted). (This is what I
take from the items of the pool being 'more than a Thread'.) The
threads are already in a pool, so you may as well exploit the thread
pool policy for maintaining HttpClients, rather than trying to build a
parallel pool.
Exactly.

It won't need 'ThreadLocal' because the client reference will already
be local. The tangle comes from the idea that the client must be
associated with the thread at construction of the thread. I propose
that the client be acquired in the run method of the thread.

Click to expand...

I am supposing that the OP is using one of the Executors statics to
build a thread pool, so he'll have to pass in a ThreadFactory, in order
to inject his DownloadThread subclass. The thread pool determines what
Runnable to pass to the thread, so he can't inject his own
client-acquiring code there. (He could supply his own Runnable to the
thread to create the client, and then run the pool's Runnable:

new ThreadFactory() {
public Thread createThread(final Runnable action) {
Runnable myAction = new Runnable() {
public void run() {
HttpClient client = new HttpClient(...);
action.run();
}
};
return new DownloadThread(myAction);
}
}

...but he still can't get that client reference into action.run() or
indeed the tasks' run()/call() methods, except by the way he's doing it
now (making it a field in DownloadThread).)

Right. The whole DownloadThread thing was really an attempt to pass the
HttpClient around the side of the Runnable interface.

With the ThreadLocal, the first task that each thread executes will set
up the HttpClient, and subsequent tasks will re-use it:

// field with same lifetime as the ExecutorService
ThreadLocal<HttpClient> clientPool = new ThreadLocal<HttpClient>();

// in method creating new task
Runnable action = new Runnable() {
public void run() {
HttpClient client = clientPool.get();
if (client == null) {
client = new HttpClient(...);
clientPool.set(client);
}

// Do stuff with client...
}
};

You could remove the uncertainty by putting the setup in the factory:

public Thread newThread(Runnable r) {
return new Thread(new Runnable() {
public void run() {
clientPool.set(new HttpClient(connMgr));
r.run();
}
});
}

That will run the HttpClient setup before heading into the code supplied
by the Executor, which is presumably some kind of internal class which
pulls Runnables off the Executor's queue.

I think that's cleaner, as it doesn't require a Thread subclass, so it
doesn't require a ThreadFactory, nor the 'bletcherous' cast to the subclass.

But it does require a ThreadLocal, which is on about the same level as
Thread subclasses and bletcherous casts, ie correct and legal, but
aesthetically displeasing.

tom

Daniel Pitts · Jan 21, 2010

Tom said:
Steven Simpson wrote:
On 17/01/10 01:11, Tom Anderson wrote:
What if the resource needed to perform a task, the thing you wanted to
maintain a limited pool of, reuse, and provide shared access to, was
more than a Thread?

[...] But i really wanted to be able to use all the cool stuff in
ExecutorService, like getting Futures and having orderly shutdown, and
a properly controllable thread pool and so on. So what i did was
subclass Thread to add the other bits (the HttpClient and so on), and
then, in the tasks, do something like:

((DownloadThread)Thread.currentThread()).getHttpClient()

Would a ThreadLocal<HttpClient> be appropriate?

My noion based on the question and some of the answers given is that
the Runnable or Callable running from the thread pool should
instantiate a client locally from the separate client pool.

Click to expand...

I thought the point was that you need at least one HttpClient per
thread (or you will have no more threads running than HttpClients),
and you do not need more than one per thread (spares are wasted).
(This is what I take from the items of the pool being 'more than a
Thread'.) The threads are already in a pool, so you may as well
exploit the thread pool policy for maintaining HttpClients, rather
than trying to build a parallel pool.
Exactly.

It won't need 'ThreadLocal' because the client reference will already
be local. The tangle comes from the idea that the client must be
associated with the thread at construction of the thread. I propose
that the client be acquired in the run method of the thread.

Click to expand...

I am supposing that the OP is using one of the Executors statics to
build a thread pool, so he'll have to pass in a ThreadFactory, in
order to inject his DownloadThread subclass. The thread pool
determines what Runnable to pass to the thread, so he can't inject his
own client-acquiring code there. (He could supply his own Runnable to
the thread to create the client, and then run the pool's Runnable:

new ThreadFactory() {
public Thread createThread(final Runnable action) {
Runnable myAction = new Runnable() {
public void run() {
HttpClient client = new HttpClient(...);
action.run();
}
};
return new DownloadThread(myAction);
}
}

...but he still can't get that client reference into action.run() or
indeed the tasks' run()/call() methods, except by the way he's doing
it now (making it a field in DownloadThread).)

Click to expand...

Right. The whole DownloadThread thing was really an attempt to pass the
HttpClient around the side of the Runnable interface.

With the ThreadLocal, the first task that each thread executes will set
up the HttpClient, and subsequent tasks will re-use it:

// field with same lifetime as the ExecutorService
ThreadLocal<HttpClient> clientPool = new ThreadLocal<HttpClient>();

// in method creating new task
Runnable action = new Runnable() {
public void run() {
HttpClient client = clientPool.get();
if (client == null) {
client = new HttpClient(...);
clientPool.set(client);
}

// Do stuff with client...
}
};

Click to expand...

You could remove the uncertainty by putting the setup in the factory:

public Thread newThread(Runnable r) {
return new Thread(new Runnable() {
public void run() {
clientPool.set(new HttpClient(connMgr));
r.run();
}
});
}

That will run the HttpClient setup before heading into the code supplied
by the Executor, which is presumably some kind of internal class which
pulls Runnables off the Executor's queue.

I think that's cleaner, as it doesn't require a Thread subclass, so it
doesn't require a ThreadFactory, nor the 'bletcherous' cast to the
subclass.

Click to expand...

But it does require a ThreadLocal, which is on about the same level as
Thread subclasses and bletcherous casts, ie correct and legal, but
aesthetically displeasing.

tom

Actually, the appropriate construct would be:

private final ThreadLocal<HttpClient> pool = ThreadLocal<HttpClient>() {
@Override
protected HttpClient initialValue() {
return new HttpClient(connMgr);
}
};

Executor	8	Sep 19, 2005
Parallel processing using Executor?	8	Aug 14, 2007
How to not load an insanely big dataset in less than 50 hrs	1	Sep 2, 2023
Problem with awaitTermination in ThreadpoolExecutor.	9	Aug 23, 2010
Unlimited threads being created	13	Mar 18, 2008
JNI, threads are still running after DestroyJavaVM()	2	Feb 19, 2013
OutOfMemory using ThreadPoolExecutor	3	Apr 22, 2010
No more than N element of an array	12	Jul 25, 2013

An Executor-like structure providing more than threads

Tom Anderson

Roedy Green

John B. Matthews

Daniel Pitts

Lew

Arved Sandstrom

Tom Anderson

Tom Anderson

Tom Anderson

Tom Anderson

Daniel Pitts

Tom Anderson

Daniel Pitts

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads