concurrency, threads and objects

Tom Forsmo

Hi

I have recently done some thread programming in Java; my previous
experience is with POSIX threads in C. One thing puzzles me about
thread programming in Java.

In C there are no function instances (as in objects or similar things),
only function invocations. When programming threads in C there is one
function and several threads, each with its own invocation of that
function. In Java you can either create one object and have a number of
threads execute its run() method, or you can create one object per thread.

What puzzles me is that, in a way, both approaches seem slightly wrong.

Creating a number of objects with a thread for each is sort of like
creating many separate programs/processes; it seems like a waste of
objects to start with. Why create as many objects as you would create
threads?

Creating only one object and creating a number of threads for that
object's run() method also seems wrong, for two reasons: 1) the object
could then not have any state unless that state was to be shared, and
2) when reading the name of the thread, all threads are named the same.

I understand that it would be normal to create other objects and execute
their run() methods, which would then effectively create a
function/method invocation, and that all aligns well with my previous
perception. But the startup part of it seems a bit strange to me.

Anyone care to help me push my perception into alignment again?

tom
 
Chris Uppal

Tom said:
Creating only one object and creating a number of threads for that
objects run method, also seems wrong, for two reasons. 1) the object
could then not have any state unless it was to be shared and [...]

That is correct, but it may be the semantics that you want -- if several
threads have to share data for instance. (It's not the typical case, though.)

[...] 2) when
reading the name of the thread all threads are named the same.

The name comes from the instance of Thread, and can be set independently of the
Runnable object that each thread executes.

Creating a number of objects with a thread for each is sort of like
creating many separate programs/processes, it seems like waste of
objects to start with. Why create as many objects as you would create
threads?

I don't know Posix threads, but I assume there is some way that you can ask the
system about a thread once created (is it still running, what groups does it
belong to, and so on). So there is a /something/ there that you can talk
about. In Java a "something" is always represented by an object, so in Java
there is an object (instance of Thread) which stands for each thread.

As an example, a thread couldn't have a name unless there was an object to hold
the name.
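
A minimal sketch of that point: one Runnable shared by every thread, while each Thread object carries its own name. The class SharedRunnableNames and the helper runWithNames are made up for illustration; only the "thread num: " prefix is borrowed from the question. It assumes the name is read through Thread.currentThread() inside run().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not code from the thread.
final class SharedRunnableNames {

    // Starts `count` threads that all execute the SAME Runnable and returns
    // the (sorted) names that the threads reported from inside run().
    static List<String> runWithNames(int count) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());

        // One Runnable with no per-thread state of its own.
        Runnable shared = () -> seen.add(Thread.currentThread().getName());

        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(shared);
            threads[i].setName("thread num: " + i); // the name lives in the Thread
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait so that every name has been recorded
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        List<String> result = new ArrayList<>(seen);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runWithNames(3));
    }
}
```

The Runnable never stores a name; whatever name a thread reports comes from the Thread instance that is executing it.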

-- chris
 
bugbear

Tom said:
Hi

I have recently done some thread programming in java, my previous
experience is from posix threads in C. There is one thing that puzzles
me about thread programming in java.

In C there are no function instances (as in objects or similar things),
only function invocations. When programming threads in C there is one
function and several threads with its separate function invocations.
In java you can either create an object and have a number of threads
execute its run() method or you can create one object per thread.

What puzzles me is that in a way both ways seems slightly wrong.

Objects to the rescue!

Data that is per thread should be in the object associated
with the thread.

Shared data should be another object(s), held in a
field of the per-thread objects.
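
The layout described above might look roughly like this. It is only a sketch: CountingWorker, ownCount and sharedTotal are hypothetical names, and an AtomicInteger is just one possible choice for the shared object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; a sketch, not code from the thread.
final class CountingWorker implements Runnable {
    private final AtomicInteger sharedTotal; // shared: the same object in every worker
    private int ownCount;                    // per-thread: one copy per worker

    CountingWorker(AtomicInteger sharedTotal) {
        this.sharedTotal = sharedTotal;
    }

    @Override
    public void run() {
        for (int i = 0; i < 1000; i++) {
            ownCount++;                    // touched by one thread only, no locking
            sharedTotal.incrementAndGet(); // touched by all threads, so atomic
        }
    }

    int ownCount() {
        return ownCount;
    }

    // Runs `n` workers, one thread each, and returns the shared total.
    static int runWorkers(int n) {
        AtomicInteger total = new AtomicInteger();
        CountingWorker[] workers = new CountingWorker[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new CountingWorker(total);
            threads[i] = new Thread(workers[i]);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        for (CountingWorker w : workers) {
            if (w.ownCount() != 1000) {
                throw new IllegalStateException("unexpected per-thread count");
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4));
    }
}
```

Reading ownCount after join() is safe because join() guarantees that the finished thread's writes are visible to the caller.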

BugBear
 
Robert Klemme

Hi

I have recently done some thread programming in java, my previous
experience is from posix threads in C. There is one thing that puzzles
me about thread programming in java.

In C there are no function instances (as in objects or similar things),
only function invocations. When programming threads in C there is one
function and several threads with its separate function invocations.
In java you can either create an object and have a number of threads
execute its run() method or you can create one object per thread.

What puzzles me is that in a way both ways seems slightly wrong.

Both are valid approaches although the one instance per thread seems
more common.

Creating a number of objects with a thread for each is sort of like
creating many separate programs/processes, it seems like waste of
objects to start with. Why create as many objects as you would create
threads?

In order to not let the threads interfere with each other. If you work
with Runnable then this is the typical setup, i.e. you create one
Runnable instance per thread. That's not really a waste of objects
since it's just this one instance. An object is nothing "heavy", at
least not by default. Creating an instance without any state is very
cheap on modern JVMs. You can easily create tons of objects per second.

Creating only one object and creating a number of threads for that
objects run method, also seems wrong, for two reasons. 1) the object
could then not have any state unless it was to be shared

Exactly. And you must synchronize access to that state. But if the
Runnable just implements some kind of function (i.e. has no state of its
own) it is perfectly ok to execute it from multiple threads.

and 2) when
reading the name of the thread all threads are named the same.

This is wrong. The name is read from the Thread instance not from the
Runnable the thread executes (unless of course you make the name you are
referring to a member of that Runnable).

I understand that it would be normal to create other objects and execute
their methods run(), which would then effectively create a
function/method invocation and it all aligns well with my previous
perception. But the startup part of it seems a bit strange to me.

You do not actually create a function but an object. That object can
have methods (and typically has). You can invoke methods on an object
from multiple threads - in some cases it works, in others it does not.
That completely depends on the class implementation: if there is no
state or if access to state is properly synchronized then it is likely
to work from multiple threads - if there is state and accesses are not
properly synchronized all bets are off.
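
As a sketch of the "state with properly synchronized access" case: a single Runnable instance is handed to several threads, and its only field is guarded by the object's own lock. SharedTally and runOnThreads are hypothetical names chosen for this illustration.

```java
// Hypothetical class; a sketch of ONE Runnable executed by several threads.
final class SharedTally implements Runnable {
    private int hits; // shared state, guarded by the lock on "this"

    @Override
    public void run() {
        for (int i = 0; i < 10_000; i++) {
            synchronized (this) { // without this, increments could be lost
                hits++;
            }
        }
    }

    synchronized int hits() {
        return hits;
    }

    // Starts `n` threads on the same SharedTally and returns the final count.
    static int runOnThreads(int n) {
        SharedTally tally = new SharedTally(); // one object for all threads
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(tally);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tally.hits();
    }

    public static void main(String[] args) {
        System.out.println(runOnThreads(4));
    }
}
```

Removing the synchronized block turns this into the "all bets are off" case: hits++ is a read-modify-write, so concurrent updates can overwrite each other.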
Anyone care to help me push my perception into alignment again.

It seems to me that your problem might lie more in the area of
object-oriented thinking. This can be difficult when coming from a procedural
background. The same happened to me when I embraced OO. It can take
some time to get used to it. However, there are plenty of resources out
there that introduce OO.

For reading up on Java threads I can very much recommend Doug Lea's book
and website:

http://www.awprofessional.com/bookstore/product.asp?isbn=0201310090&rl=1
http://g.oswego.edu/dl/

Kind regards

robert
 
Tom Forsmo

Chris said:
Tom said:
Creating only one object and creating a number of threads for that
objects run method, also seems wrong, for two reasons. 1) the object
could then not have any state unless it was to be shared and [...]

That is correct, but it may be the semantics that you want -- if several
threads have to share data for instance. (It's not the typical case, though.)

I didn't quite think of it like that...

[...] 2) when
reading the name of the thread all threads are named the same.

The name comes from the instance of Thread, and can be set independently of the
Runnable object that each thread executes.

I tried this but it did not work properly. What I did was this:

thr = new Thread[opt.getThreads()];

for (int i = 0; i < opt.getThreads(); i++) {
    thr[i] = new Thread(this);
    thr[i].setName("thread num: " + i);
    thr[i].start();
}

But I found out, just now, that if I use setName() inside run(), then it
works OK. Why is that? The object is created outside run(), so changing its
state there should be OK, unless the object is reinitialised when run()
starts to execute.

I don't know Posix threads, but I assume there is some way that you can ask the
system about a thread once created (is it still running, what groups does it
belong to, and so on). So there is a /something/ there that you can talk
about. In Java a "something" is always represented by an object, so in Java
there is an object (instance of Thread) which stands for each thread.

As an example, a thread couldn't have a name unless there was an object to hold
the name.

Sorry, what I meant was my defined objects, not the Thread objects. E.g.
I can define a class ClassA which implements Runnable. If I use 100
threads, I would create 100 ClassA objects, which means I would have 100
ClassA objects and 100 Thread objects.

thr = new Thread[opt.getThreads()];

for (int i = 0; i < opt.getThreads(); i++) {
    Runnable r = new ClassA();
    thr[i] = new Thread(r);
    thr[i].start();
}

Of course the Thread object is here composed of the ClassA object, so it
would be the same as if you extended the Thread class, but that is my
point. It seems like a waste of resources to have those 100 ClassA
objects lying around for no reason. A thread-safe program requires
re-entrancy, which this example achieves not by having a re-entrant
object, but rather by creating completely new objects, avoiding the
entire issue... Sort of like starting 100 separate programs/processes.
This is where my perception clashes with Java threads.

tom
 
Thomas Fritsch

Tom Forsmo wrote:
[...]
I can define a class ClassA which implements Runnable. If I use 100
threads, I would create 100 ClassA objects, which means I would have 100
ClassA objects and 100 Thread objects.

thr = new Thread[opt.getThreads()];

for (int i = 0; i < opt.getThreads(); i++) {
    Runnable r = new ClassA();
    thr[i] = new Thread(r);
    thr[i].start();
}


Of course the Thread object is here composed of the ClassA object, so it
would be the same as if you extended a Thread class, but that is my
point. It seems like a waste of resources to have those 100 ClassA
objects lying around for no reason.

So, why not use the same ClassA object for all your 100 threads?

Runnable r = new ClassA();
for (int i = 0; i < opt.getThreads(); i++) {
    thr[i] = new Thread(r);
    thr[i].start();
}

But of course this approach depends on how carefully your ClassA object
handles concurrent calls to its run() method.

A thread safe program requires
re-entrancy, which it in this example solves, not by having a re-entrant
object, but rather by just creating completely new objects avoiding the
entire issue... Sort of like starting 100 separate programs/processes.

Not quite: a thread is actually much cheaper than a process.

This is where my perception clashes with java threads.

Well, having 100 objects may not be such a big thing as you suspect. A
Java Thread object is actually nothing more than a native OS thread and
a few more bytes (for its member variables).
 
Robert Klemme

Sorry, what I meant was my defined objects, not the Thread objects. E.g.
I can define a class ClassA which implements Runnable. If I use 100
threads, I would create 100 ClassA objects, which means I would have 100
ClassA objects and 100 Thread objects.

thr = new Thread[opt.getThreads()];

for (int i = 0; i < opt.getThreads(); i++) {
    Runnable r = new ClassA();
    thr[i] = new Thread(r);
    thr[i].start();
}


Of course the Thread object is here composed of the ClassA object, so it
would be the same as if you extended a Thread class, but that is my
point. It seems like a waste of resources to have those 100 ClassA
objects lying around for no reason.


As said earlier, the overhead of a single object is not much. And the
advantage of using a class that implements Runnable is that you get more
flexibility. You can, for example, push those instances into a queue
and have a fixed number of threads processing them one by one. (I think
Doug calls it "lightweight processing framework".)
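
A rough sketch of that idea, with many small Runnable tasks and only a few threads working through them. TaskQueueSketch and process are hypothetical names, and this hand-rolled queue is only an illustration; the ExecutorService in java.util.concurrent packages the same pattern.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: many Runnable tasks, a fixed number of worker threads.
final class TaskQueueSketch {

    // Queues `taskCount` tasks, lets `workerCount` threads process them one
    // by one, and returns how many tasks were executed.
    static int process(int taskCount, int workerCount) {
        AtomicInteger completed = new AtomicInteger();

        // All tasks are queued before any worker starts, so an empty queue
        // means "no work left" rather than "no work yet".
        Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> completed.incrementAndGet()); // stand-in for real work
        }

        // Each worker repeatedly takes the next task until none remain.
        Runnable drain = () -> {
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
        };

        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(drain);
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(process(100, 4));
    }
}
```

Here the task objects outnumber the threads by a wide margin, which is the flexibility being pointed at: the number of Runnable instances and the number of threads no longer have to match.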
A thread safe program requires
re-entrancy, which it in this example solves, not by having a re-entrant
object, but rather by just creating completely new objects avoiding the
entire issue... Sort of like starting 100 separate programs/processes.
This is where my perception clashes with java threads.

The concept is called "thread confinement" (-> Doug Lea's book).
Basically this means that you avoid contention by giving each thread its
own separate set of data, which in turn removes the need for synchronization.

Whether you apply that or not depends on the circumstances. Also of
course this can be mixed with other approaches so you get very precise
control over which data is shared and which not.

Regards

robert
 
Chris Uppal

Tom said:
[...] 2) when
reading the name of the thread all threads are named the same.

The name comes from the instance of Thread, and can be set
independently of the Runnable object that each thread executes.

I tried this but it did not work properly. What I did was this:

thr = new Thread[opt.getThreads()];

for (int i = 0; i < opt.getThreads(); i++) {
    thr[i] = new Thread(this);
    thr[i].setName("thread num: " + i);
    thr[i].start();
}

But I found out, just now, that if I use setName() inside run(), then it
works ok, why is that? the object is created outside run so changing its
state there should be ok, unless the object is reinitialised when run()
starts to execute.


I can't think of any reason why setName() wouldn't work. I admit I haven't
tested it, but I can't see anything odd in the source. I suspect that it's an
artefact of whatever you are using to "see" the Thread's names.

But why use setName() at all? It seems easier just to pass the correct names
to the Threads' constructors in the first place.

If I use 100
threads, I would create 100 ClassA objects, which means I would have 100
ClassA objects and 100 Thread objects.

So what ?

;-)

Think of it like this. If those 100 objects which implement Runnable are
genuinely unnecessary, then they must be effectively stateless in that none of
the processing in any thread depends on the state of its Runnable object -- in
which case there is no reason not to use the same Runnable for every Thread.
But, going further, if they are /actually/ stateless, or nearly so, then they
are so cheap that they cost much less (in space and time) than the Thread
itself, and will almost certainly cost less even than the thread's /name/, so
there is no reason to go to the (cognitive) effort of reusing the same object.
Just create 100 of 'em -- you can afford it.

OTOH, if the Runnables' states /do/ affect the subsequent execution, then you
obviously can't get away without having separate objects...

BTW, the most common case is that each thread /is/ parameterised in some way
(which Socket to read from, which array to process, which Snoggle to
delaminate(), ....) and the Runnable objects are the natural place to put that
information.
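
To make the "parameterised Runnable" case concrete, here is a sketch in which each Runnable is told which part of an array to process. SliceSum and parallelSum are hypothetical names; summing is just a stand-in for real per-thread work.

```java
// Hypothetical example: each Runnable carries the parameters for its thread.
final class SliceSum implements Runnable {
    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive
    private long sum;       // result; read only after the thread has finished

    SliceSum(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    public void run() {
        long s = 0;
        for (int i = from; i < to; i++) {
            s += data[i];
        }
        sum = s;
    }

    long sum() {
        return sum;
    }

    // Splits `data` into `parts` slices, sums each on its own thread,
    // and adds up the partial results.
    static long parallelSum(int[] data, int parts) {
        SliceSum[] tasks = new SliceSum[parts];
        Thread[] threads = new Thread[parts];
        for (int p = 0; p < parts; p++) {
            int from = (int) ((long) data.length * p / parts);
            int to = (int) ((long) data.length * (p + 1) / parts);
            tasks[p] = new SliceSum(data, from, to);
            threads[p] = new Thread(tasks[p]);
            threads[p].start();
        }
        long total = 0;
        for (int p = 0; p < parts; p++) {
            try {
                threads[p].join(); // makes tasks[p].sum() safe to read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += tasks[p].sum();
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4));
    }
}
```

Here the per-thread objects are not waste at all: the slice bounds and the partial result have to live somewhere, and the Runnable is the obvious home for them.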

-- chris
 
Tom Forsmo

Chris said:
I can't think of any reason why setName() wouldn't work. I admit I haven't
tested it, but I can't see anything odd in the source. I suspect that it's an
artefact of whatever you are using to "see" the Thread's names.

I use println() just before start(), and then another println() inside
run() to print the progress of each thread. When I print before start()
it prints out the name I set, but when I print inside run() the name is
reset to the original name. It's almost as if there is an object
reinitialisation when run() starts.

But why use setName() at all ? It seems easier just to pass the correct names
to the Threads' constructors in the first place.

I haven't tried that, it might work.

So what ?

I don't believe in code bloat, and I see it as unnecessary runtime
resource consumption. I don't subscribe to the idea that you should not
worry about resources (CPU, memory etc.) because they are so cheap. The
reason is simple: bloated code runs slower and is more difficult to
maintain. Think of a program that takes up 300 MB of memory and compare
it to a program that only requires, say, 150 MB. The smaller program
requires less bus bandwidth between the CPU, memory and disk, and fewer
processing cycles (barring algorithm efficiency).

I do see, though, that there are solutions where having one object
per thread is beneficial. But it sort of leaves a bad taste in my mouth...

Think of it like this. If those 100 objects which implement Runnable are
genuinely unnecessary, then they must be effectively stateless in that none of
the processing in any thread depends on the state of its Runnable object -- in
which case there is no reason not to use the same Runnable for every Thread.
But, going further, if they are /actually/ stateless, or nearly so, then they
are so cheap that they cost much less (in space and time) than the Thread
itself, and will almost certainly cost less even than the thread's /name/, so
there is no reason to go to the (cognitive) effort of reusing the same object.
Just create 100 of 'em -- you can afford it.

OTOH, if the Runnables' states /do/ affect the subsequent execution, then you
obviously can't get away without having separate objects...

BTW, the most common case is that each thread /is/ parameterised in some way
(which Socket to read from, which array to process, which Snoggle to
delaminate(), ....) and the Runnable objects are the natural place to put that
information.

I have come to the same conclusions as well. But as I said I think it
leaves a bad taste... But then again, I might just be a bit picky.

thanks for all your feedback.

tom
 
Robert Klemme

Tom said:
Where in Dougs book is that described? I could not find it.

?

http://www.awprofessional.com/bookstore/product.asp?isbn=0201310090&rl=1#info2

click on Table of Contents, Chapter 2

In any case the more general computer science term is re-entrant, as
part of the subject of thread safe code, see

Reentrancy (?) is something completely different from thread
confinement. You can use the latter to achieve the former. Reentrancy
is a property of a piece of code, while thread confinement is a technique
which can be used to achieve reentrancy, among other ends.

Regards

robert
 
A. Bolmarcich

I use println() just before start(), and then another println() inside
run() to print the progress of each thread. When I print before start()
it prints out the name I set, but when I print inside run() the name is
reset to the original name. Its almost as if there is an object
reinitialisation when run starts.

Be careful to invoke getName() on the same object on which setName()
was invoked. Within run() you likely want to use an expression like

Thread.currentThread().getName()

Instead of posting partial code, please post a small complete program that
demonstrates the problem that others can compile and run.

I haven't tried that, it might work.
[snip]

Not in the example code that you posted. It had a loop whose body contained

thr[i] = new Thread(this);

That would create 100 Thread objects. The statement

thr[i].start();

later in the loop would invoke the run() method of the (single) object
referred to by the keyword "this" 100 times.
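
One way the symptom could arise, sketched under the assumption (not confirmed by the posted code) that the class containing run() itself extends Thread. NamePitfall and demo are hypothetical names. Inside run(), a bare getName() asks the "this" object for its name, whereas Thread.currentThread().getName() asks the thread that is actually executing:

```java
// Hypothetical reconstruction; assumes the class with run() extends Thread.
final class NamePitfall extends Thread {
    volatile String ownName;     // what getName() on "this" returned in run()
    volatile String currentName; // what the executing thread's name was

    @Override
    public void run() {
        ownName = getName();                            // name of the "this" object
        currentName = Thread.currentThread().getName(); // name of the running thread
    }

    // Wraps one NamePitfall in a second, named Thread and reports both names
    // as { ownName, currentName }.
    static String[] demo() {
        NamePitfall self = new NamePitfall(); // never started itself
        Thread wrapper = new Thread(self);    // like "new Thread(this)"
        wrapper.setName("thread num: 0");
        wrapper.start();
        try {
            wrapper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new String[] { self.ownName, self.currentName };
    }

    public static void main(String[] args) {
        String[] names = demo();
        System.out.println("getName() inside run():      " + names[0]);
        System.out.println("currentThread().getName():   " + names[1]);
    }
}
```

setName() was applied to the wrapper, so only the second lookup sees "thread num: 0"; the first still returns the default name of the object that was never started. Nothing is reinitialised when run() begins.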
 
Tom Forsmo

Robert said:

That explains it: I have edition 1 of the book, and you are referring to
edition 2. A question then: do you know if the 2nd edition is much
different from the 1st?

I see there is a difference in the TOC, but to me it seems like a
reorganisation of the book with possibly better titles or something. The
book came out 2 years after the first, so it seemed to me that there
really could not be much difference. Basically, since the theory and
practice of concurrent programming is quite old, not much would probably
have changed in 2 years.

Reentrancy (?) is something completely different from thread
confinement.

Fair enough, but I was talking about re-entrancy. I am not saying thread
confinement could not be used, though.

Btw, could you give a bit more detailed description of thread
confinement? I could not find anything when googling, only references to
Doug's book.

tom
 
Tom Forsmo

A. Bolmarcich said:
Be careful to invoke getName() on the same object on which setName()
was invoked. Within run() you likely want to use an expression like

Thread.currentThread().getName()

That seemed to work, but I don't quite understand why.

Instead of posting partial code, please post a small complete program that
demonstrates the problem that others can compile and run.

(Please don't post comments in a thread where they don't belong, it
makes it difficult to understand what message and part of it you are
commenting.)

I was posting only the relevant parts of the code; all the other code
has nothing to do with the problem.

Not in the example code that you posted. It had a loop whose body contained

thr[i] = new Thread(this);

That would create 100 Thread objects. The statement

thr[i].start();

later in the loop would invoke the run() method of the (single) object
referred to by the keyword "this" 100 times.


Yes, but it starts the 100 threads (i.e. the thread objects), which all
execute within the same object, namely this.

In any case, it was the wrong code; it was supposed to be

thr = new Thread[opt.getThreads()];

for (int i = 0; i < opt.getThreads(); i++) {
    System.out.println("Starting Thread " + i);
    thr[i] = new Worker();
    thr[i].start();
}

This creates 100 Worker objects and 100 thread objects.

tom
 
Chris Smith

Tom Forsmo said:
I don't believe in code bloat and I see it as unnecessary runtime
resource consumption. I don't subscribe to the idea that you should not
worry about resources (cpu, memory etc.), because its so cheap. The
reason is simple, bloated code runs slower and is more difficult to
maintain. Think of a program that takes up 300 MB of memory and compare
it to a program that only requires say, 150MB. The smaller program
requires less bus bandwidth between the cpu, memory and disk and less
processing cycles (barring algorithm efficiency).

You seem to see things in black and white. The world doesn't work that
way. Practically everything is an object in Java. Objects are cheap.
The entire runtime system, memory management, etc. is designed that way,
and people have put lots of effort into making it so. Anything you
do to try to minimize object creation is unlikely to be a
noticeable improvement, and often hurts the performance of your code.

On the other hand, creating 100 threads is certainly not cheap, and
almost certainly harmful if you care about performance in this
application... unless it will be running on some kind of supercomputer
that has at least 50 processors or so. Sometimes creating 100 threads
can make your development life easier by helping you separate various
tasks in your application design; but if that cost is okay with you, you
are certainly misplacing your priorities when you worry about creating
that extra 100 objects. This isn't about whether you should be happy
with a sub-optimal program. It's about whether you should worry about
polishing the deck when the Titanic is sinking.
 
Robert Klemme

That explains it, I have edition 1 of the book, you are referring to
edition 2. A question then, do you know if 2nd Ed much different that
1st Ed?

I can't seem to find my book (time to search colleagues' desks) and also
I do not know the 1st edition. :)

I see there is a difference in the TOC, but to me it seems like a
reorganisation of the book with possibly better titles or something.

I believe the foreword mentioned something like reorganization as one of
the major differences.

The
book came out 2 years after the first, so it seemed to me that there
really could not be much difference. Basically since the theory and
practice of concurrent programming is quite old, so much would probably
not have changed in 2 years.

Still a book can be improved. :)

Fair enough, but I was talking about re-entrancy. I am not saying thread
confinement could not be used, thought.

Btw could you give a bit more detailed description about thread
confinement, I could not find anything when googling, only references to
Dougs book.

I believe I stated that earlier: basically you restrict access to data
to a thread and thus avoid synchronization issues. Can be achieved in
different ways (local variables etc.).
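
A small sketch of confinement through local variables. ConfinementSketch and buildLabels are hypothetical names. Each thread builds its result in a StringBuilder that is local to its own run() invocation; StringBuilder is not thread-safe, but no other thread can ever reach that instance, so no locking is needed.

```java
// Hypothetical sketch of thread confinement using local variables.
final class ConfinementSketch {

    // Starts `n` threads; thread number k produces the digit k repeated
    // three times, and the results are returned in thread order.
    static String[] buildLabels(int n) {
        String[] out = new String[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                // Confined: created, used and dropped by this thread only.
                StringBuilder sb = new StringBuilder();
                for (int k = 0; k < 3; k++) {
                    sb.append(id);
                }
                out[id] = sb.toString(); // each thread writes only its own slot
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // after join() the written slot is visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", buildLabels(3)));
    }
}
```

The only data that crosses a thread boundary is the finished String, handed over once via the array slot and made visible by join().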

Kind regards

robert
 
Tom Forsmo

Chris said:
You seem to see things in black and white. The world doesn't work that
way.

:)

Practically everything is an object in Java. Objects are cheap.
The entire runtime system, memory management, etc. is designed that way,
and people have put lots of effort into making it so. Anything else you
do that tries to minimize creating objects is likely to not be a
noticable improvement, and often hurts the performance of your code.

That only applies if you don't have experience in thinking about
avoiding code bloat and its problems. I have numerous times created
applications that require a fraction of the memory or CPU power of
ones built on the idea that you should not worry about it. In some cases I
have also created working and stable solutions when others have not
managed to get one off the ground, because of code bloat.

Sometimes creating 100 threads
can make your development life easier by helping you separate various
tasks in your application design; but if that cost is okay with you,

Yes, for example in high performance server design, where the server
should be able to handle between one thousand and ten thousand transactions
per second.

you
are certainly misplacing your priorities when you worry about creating
that extra 100 objects. This isn't about whether you should be happy
with a sub-optimal program. It's about whether you should worry about
polishing the deck when the Titanic is sinking.

It's funny how many people hide behind that statement; it clearly shows
they really do not know what they are talking about in that respect. I
have experience in thinking about the problem, so I don't use much
"cognitive effort" to avoid it. A person not used to thinking about it
would spend much time worrying about it, which incidentally seems to be
the majority of Java developers I have talked to. The mutual feeling
among them seems to be that the JVM will take care of it all for you, so
don't worry your pretty little head about it... I am not saying there
is nothing to what you are saying, of course there is, but it's not as
black and white as you are saying it is.

tom
 
Chris Smith

Tom Forsmo said:
That only applies if you don't have experience in thinking about
avoiding code bloat and its problems. I have numerous times created
applications that require a fraction of memory or cpu power compared to
a someone's idea that you should not worry about it. In some cases I
have also created working and stable solutions when others have not
managed to get one off the ground, because of code bloat.

I don't believe you. That is, I believe you've created better software
than other people; I don't believe that an improvement of that scale
came from decisions along the lines of trying to avoid creating one
object per thread.

Yes, for example in high performance server design, where the server
should be able to handle between thousand and ten thousand transactions
per second.

Is there anything close to 50 CPUs in the box? If not, then 100
threads is still very likely to be killing your performance to the point
that it's way past time to worry about 100 objects. Half my point is
that you are overestimating the performance impact of creating
objects... but the other half is that you are underestimating the
performance impact of creating threads. If you need that kind of
performance, you should be doing thread pooling and asynchronous I/O in
concert with state machines to reduce the number of threads to less than
about twice the number of CPUs.
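
The thread-pooling part of that advice can be sketched with the standard ExecutorService. PooledRequests and handleAll are hypothetical names, the "request" is a trivial stand-in computation, and the asynchronous I/O and state-machine parts of the suggestion are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: many requests, a pool sized from the CPU count.
final class PooledRequests {

    // Handles `requestCount` stand-in requests on a fixed pool and returns
    // the sum of the replies (request i replies with i * 2).
    static int handleAll(int requestCount) {
        // Twice the CPU count, used as the upper bound from the advice above.
        int poolSize = 2 * Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> replies = new ArrayList<>();
            for (int i = 0; i < requestCount; i++) {
                final int request = i;
                // One small task object per request; threads are reused.
                replies.add(pool.submit(() -> request * 2));
            }
            int total = 0;
            for (Future<Integer> reply : replies) {
                total += reply.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(1000));
    }
}
```

Note the inversion compared with one thread per request: a thousand cheap task objects are created, while the number of threads stays tied to the hardware.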
Its funny how many people hide behind that statement, it clearly shows
they really do not know what they are talking about in that respect.

"That" statement? You mean the one that says that threads are likely
killing your performance, so stop worrying about 1K of memory
allocations? Most people probably don't hear that a lot. If you're
hearing that a lot from other developers, then perhaps it's time to
think about whether threads are killing your performance.

Yes, I realize you'd like to come away from this conversation feeling
superior because you can avoid "code bloat" while all us lowly Java
programmers can't. Feel free to do so, if that's what your ego needs.
Otherwise, you might want to fix your thread problem.
 
Tom Forsmo

Chris said:
I don't believe you. That is, I believe you've created better software
than other people; I don't believe that an improvement of that scale
came from decisions along the lines of trying to avoid creating one
object per thread.

No, not at that small level, but applying the same principle to all code
and especially to data structures that hold big amounts of data.

Are there anything close to 50 CPUs in the box? If not, then 100
threads is still very likely to be killing your performance to the point
that it's way past time to worry about 100 objects.

Are you pulling my leg? What system are you running on?

The code I wrote which prompted me to ask the original question ran a
thousand server threads and five hundred client threads, where the client
issued ten thousand requests per thread. In total five million requests,
which finished in around 45 minutes. This is on an Intel Core Duo
processor running Linux with 1.5 GB of RAM:

Linux duplo 2.6.17.8tf2 #10 SMP PREEMPT Wed Aug 30 22:35:48 CEST 2006
i686 Genuine Intel(R) CPU T2300 @ 1.66GHz unknown GNU/Linux

Threads are very cheap in Linux 2.6. When they changed the kernel thread
model, they did a test where they created one hundred thousand threads.
With the old model that took about 15 minutes; with the new model it took
2 seconds. Ref: http://kerneltrap.org/node/422

As far as I understand it, on Windows processes are expensive while
threads are cheap. On Linux processes are cheap and threads are
extremely cheap.

Back to the business at hand. The server and client communicate
over UDP (so that's a bit cheaper), and it's a simple request/reply
operation which talks to a DB (an Oracle cluster, so the DB does not cause
any problems). In addition the code is completely self-made, no app
servers or anything like that, which of course would eat up a lot of the
CPU power and memory.

"That" statement? You mean the one that says that threads are likely
killing your performance, so stop worrying about 1K of memory
allocations?

No, I mean the statement: "stop worrying about memory and processing
power, we can just buy some more...".

Most people probably don't hear that a lot. If you're
hearing that a lot from other developers, then perhaps it's time to
think about whether threads are killing your performance.

It's almost exclusively coming from Java developers, but also from
developers of other languages, although not as much. I think it's lazy
programming. I don't mean to be rude and condescending towards Java or
Java developers; I like Java as well. I just think there are some ideas
that the programming and Java community should open their eyes to. I
have been working on a C project for the last couple of years, and that's
where I learned to appreciate that sentiment.

tom
 
Chris Smith

Tom Forsmo said:
The code I did which prompted me to ask the original question, ran a
thousand server threads and five hundred client threads where the client
issued ten thousand requests per thread. In total five million requests,
which finished in around 45 minutes.

Good. Then you have no reason to care about the memory required for one
small object per thread.
 
