Concurrency, threads and objects


Tom Forsmo

Chris said:
Good. Then you have no reason to care about the memory required for one
small object per thread.

That's exactly the general sentiment I am opposing in my argument...

In any case, I asked in the previous post what system you were running
on, since you say that running 100 threads would kill the performance.
Do you mind telling me? If there is that much difference in performance
between systems, it's good to be aware of that.

tom
 

Tom Forsmo

Chris said:
Good. Then you have no reason to care about the memory required for one
small object per thread.

Not the point. This discussion is not about performance as a consequence
of memory consumption, those are two separate issues in this thread.

In any case, I asked in the previous post what system you were running
on, since you say that running 100 threads would kill the performance.
Do you mind telling us?

If there is that much difference in performance between systems I think
that's a valuable discussion to have in this group.

tom
 

Robert Klemme

As far as I understand it, on Windows processes are expensive while
threads are cheap. On Linux processes are cheap and threads are
extremely cheap.

Yep, threads on modern systems are very cheap. I once cooked up a small
program (attached) to collect thread stats. On my 3GHz P4D with Win XP
Pro x64 it yields

max t11 - start time in thread: 140
avg t11 - start time in thread: 4.14126
max t2 - creation time : 78
avg t2 - creation time : 0.03084
max t3 - start time in main : 78
avg t3 - start time in main : 0.1558

max t11 - start time in thread: 204
avg t11 - start time in thread: 5.35656
max t2 - creation time : 16
avg t2 - creation time : 0.0282
max t3 - start time in main : 16
avg t3 - start time in main : 0.15657

5 ms as an average starting time for a thread isn't really much.

No, I mean the statement: "stop worrying about memory and processing
power, we can just buy some more...".


It's almost exclusively coming from Java developers, but also from
developers of other languages, although not as much. I think it's lazy
programming.

I do not think so - rather it is consciously trying to find a good OO
design. OOA/D/P are quite different from procedural programming. While I do agree
that thought has to be given to issues of memory consumption and CPU
usage during the design of performance-critical applications, overdoing it
is certainly doing more harm than good. Considering the overhead of one
object created per thread to be too much will definitely harm the
design of the application. And this is even more true in Java, where
the overhead of object creation on modern VMs is negligible.
Performance might be one goal, but there are tons of other goals. If you
have an ultra-performant application that nobody can maintain, then
you're getting nowhere.
> I don't mean to be rude and condescending towards Java or
Java developers, I like Java as well. I just think there are some ideas
that the programming and Java community should open their eyes to. I
have been working on a C project for the last couple of years, and that's
where I learned to appreciate that sentiment.

I would be very careful about carrying over knowledge from a C environment
to a Java or other OO environment. While there are similarities and
general principles, one must be aware of the platform and adjust to its
specifics.

Regards

robert

package threads;

import java.util.Iterator;
import java.util.List;
import java.util.Vector;

/**
 * @author robert.klemme
 * @created 04.07.2005 10:29:35
 * @version $Id:$
 */
public class ThreadCreationOverhead {

    private static final int THREADS = 100000;

    private static class Maximizer {
        private long max = 0;
        private long sum = 0;
        private int count = 0;

        public synchronized void update( long n ) {
            if ( n > max ) {
                max = n;
            }

            sum += n;
            ++count;
        }

        public synchronized long getMax() {
            return max;
        }

        public synchronized double getAvg() {
            return ( ( double ) sum ) / count;
        }
    }

    private static void print( String msg ) {
        // System.out.println( Thread.currentThread().getName() + ": " + msg );
    }

    public static void main( String[] args ) {
        testRun();
        System.out.println();
        testRun();
    }

    private static void testRun() {
        final Maximizer mt11 = new Maximizer();
        final Maximizer mt2 = new Maximizer();
        final Maximizer mt3 = new Maximizer();

        List threads = new Vector();

        for ( int i = 0; i < THREADS; ++i ) {
            // System.out.println( "Run " + i );
            final long t1 = System.currentTimeMillis();
            Thread th = new Thread(
                new Runnable() {
                    public void run() {
                        long t11 = System.currentTimeMillis() - t1;
                        mt11.update( t11 );
                        print( "in thread: " + t11 );
                    }
                } );
            long t2 = System.currentTimeMillis() - t1;
            th.start();
            long t3 = System.currentTimeMillis() - t1;
            mt2.update( t2 );
            print( "after creation: " + t2 );
            mt3.update( t3 );
            print( "after start: " + t3 );
            // System.out.println();

            threads.add( th );
        }

        for ( Iterator iter = threads.iterator(); iter.hasNext(); ) {
            Thread th = ( Thread ) iter.next();
            try {
                th.join();
            }
            catch ( InterruptedException e ) {
                e.printStackTrace();
            }
        }

        System.out.println( "max t11 - start time in thread: " + mt11.getMax() );
        System.out.println( "avg t11 - start time in thread: " + mt11.getAvg() );
        System.out.println( "max t2 - creation time : " + mt2.getMax() );
        System.out.println( "avg t2 - creation time : " + mt2.getAvg() );
        System.out.println( "max t3 - start time in main : " + mt3.getMax() );
        System.out.println( "avg t3 - start time in main : " + mt3.getAvg() );
    }
}
 

Tom Forsmo

Robert said:
5 ms as an average starting time for a thread isn't really much.

That means Windows can only create about 400 threads in 2 seconds
(2,000 ms / 5 ms per thread), compared to Linux 2.6, which creates
100,000 threads in 2 seconds. That's a big difference. That makes me
understand why people in this thread talk about the performance hit of
having a large number of threads.

We are, though, comparing C thread calls to Java thread calls, even
though Java threads are native threads on both Windows and Linux in Java 5.0.
Additionally, these numbers say nothing about the execution efficiency of
threads on Windows compared to Linux.

I will have a look at your program and run it on my computer, in both
Windows and Linux, since there would then be no hardware difference. I
never thought there might be that big a difference between Linux and
Windows; actually, I am not sure this difference can be correct. I know
Ingo Molnar of the Linux kernel team is really good when it comes to this
stuff, but Microsoft cannot be doing that badly here. We will see.

To test execution efficiency I will create a small test app which I will
run on both systems as well, just to get that angle. I will post my results.
> While I do agree
that thought has to be given to issues of memory consumption and CPU
usage during the design of performance-critical applications, overdoing it
is certainly doing more harm than good. Considering the overhead of one
object created per thread to be too much will definitely harm the
design of the application. And this is even more true in Java, where
the overhead of object creation on modern VMs is negligible.

I agree; it was an instinctive reaction that prompted me to start this
thread, and I decided I wanted to know the answer. I like to know the
cost and consequence of doing things one way compared to another, for
future reference.
I would be very careful about carrying over knowledge from a C environment
to a Java or other OO environment. While there are similarities and
general principles, one must be aware of the platform and adjust to its
specifics.

I agree, but that does not preclude the chance that there might be
something to be learned from other platforms.

tom
 

Chris Smith

Tom Forsmo said:
Not the point. This discussion is not about performance as a consequence
of memory consumption, those are two separate issues in this thread.

Of course it's the point. The point was that you were concerned about
the memory overhead of creating one unnecessary object per thread. I
was pointing out that it's not a sensible thing to be concerned about.
In any case, I asked in the previous post what system you were running
on, since you say that running 100 threads would kill the performance.
Do you mind telling us?

The machine I was speaking of was the hypothetical system in which
creating 100 objects has a discernible performance impact. I am not
aware of any such machine in common use; but apparently you are
convinced that you are using one. My Commodore 64 certainly qualifies;
but I haven't yet figured out how to get it to do multithreading.

Robert mentioned a start time of 5 ms per thread. You responded that
your Linux server creates a thread in 0.02 ms. By contrast, an object
allocation for Integer on my system (including some amortized time for
garbage collection, though probably too little, since the object graph
in test code is inevitably simpler than in production code) takes about
0.000015 ms. That still doesn't account for the real performance impact
of the threads, though. Unless you have a hideously bad architecture,
you won't spend a lot of time creating threads. Since you've now decided
on 100,000 threads instead of 100, the real cost for these threads will
be paid:

1. During scheduling.
2. In cache misses and TLB flushes due to context switching.

Memory-wise, 100,000 unnecessary objects require about 1 MB of memory;
not trivial in absolute terms, certainly. But each thread will require
at a minimum one machine page (typically 4 KB) of stack space, plus
extra data structures in the kernel for tracking. At 100,000 threads
times 4 KB each, that's already on the order of half a gigabyte of
memory, and that's extremely conservative.

Result: it makes no sense to worry about one unnecessary object per
thread.
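
A quick way to sanity-check those orders of magnitude is a throwaway
micro-benchmark like the sketch below. It is only illustrative: the
class name, iteration counts and use of System.nanoTime() are my own
choices, it ignores JIT warm-up and GC pauses, and its output should be
read as rough magnitudes rather than precise figures.

// Illustrative sketch: compares the cost of allocating small objects
// with the cost of creating, starting and joining threads that do no work.
public class AllocVsThreadCost {

    public static void main(String[] args) throws InterruptedException {
        final int allocations = 1000000;
        final int threads = 1000;

        // Allocation cost: keep the objects in an array so the JIT cannot
        // optimize the allocations away entirely.
        Integer[] sink = new Integer[allocations];
        long t0 = System.nanoTime();
        for (int i = 0; i < allocations; i++) {
            sink[i] = new Integer(i);
        }
        long allocNanos = System.nanoTime() - t0;

        // Thread cost: create, start and join threads that do nothing,
        // so only the start-up and teardown overhead is measured.
        Thread[] workers = new Thread[threads];
        long t1 = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() { /* no work */ }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        long threadNanos = System.nanoTime() - t1;

        System.out.println("avg allocation cost         : "
                + (allocNanos / (double) allocations) / 1000000.0 + " ms");
        System.out.println("avg per-thread overhead     : "
                + (threadNanos / (double) threads) / 1000000.0 + " ms");
    }
}

On any reasonably modern JVM the first number comes out several orders
of magnitude below the per-thread figures quoted above, which is the
core of the point being made here.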
 

Tom Forsmo

Robert said:
Yep, threads on modern systems are very cheap. I once cooked up a small
program (attached) to collect thread stats.

I ran your program on my machine in both Windows and Linux and discovered
some interesting results:

The machine is a dual-boot ThinkPad T60 with an Intel dual core. No special
system/kernel optimisations have been performed on either system.

Linux: vanilla Linux 2.6.17.8 kernel release running on Mandriva 2006
Windows: factory-installed Windows XP with SP2 (version 2002)

tf - linux:

max t11 - start time in thread: 55
avg t11 - start time in thread: 0.16632
max t2 - creation time : 42
avg t2 - creation time : 0.02114
max t3 - start time in main : 42
avg t3 - start time in main : 0.09306

max t11 - start time in thread: 65
avg t11 - start time in thread: 0.15874
max t2 - creation time : 15
avg t2 - creation time : 0.01887
max t3 - start time in main : 15
avg t3 - start time in main : 0.09395


tf - windows:

max t11 - start time in thread: 78
avg t11 - start time in thread: 0.66997
max t2 - creation time : 78
avg t2 - creation time : 0.14944
max t3 - start time in main : 63
avg t3 - start time in main : 0.27753

max t11 - start time in thread: 47
avg t11 - start time in thread: 0.73756
max t2 - creation time : 47
avg t2 - creation time : 0.14407
max t3 - start time in main : 47
avg t3 - start time in main : 0.29903


Conclusion: Linux is faster.

I also tested a thread-efficiency program I made; it's a UDP server and
client.

server: -t 1000 (number of threads: 1000)
client: -t 500 -r 10000 (number of threads 500,
number of requests per thread: 10000)

tf - linux

Average creation time for client object: 0.00462ms
Time executing threads: 183114ms (183.114s)
Average creation time for client object: 0.00426ms
Time executing threads: 182486ms (182.486s)

tf - windows

Average creation time for client object: 0.00359ms
Time executing threads: 535891ms (535.891s)
Average creation time for client object: 0.00296ms
Time executing threads: 536219ms (536.219s)

Conclusion: Windows is a little faster at creating client objects, but
Linux is about 3 times faster at executing the actual operations.

I did another test with this code as well:

In the server there is a sleep() call to simulate DB access, and I
experimented a bit with what values it could hold and how it would
affect the total performance. On Linux I found that the performance
increase is proportional to the decrease in sleep time, and that all
values down to 1 ms (the lowest value the call I made accepts) affected
performance. But on Windows the story was completely different; why
that is I don't know. On Windows any value below 100-110 ms was rounded
up to approximately 100 ms, so I could not get any performance increase
with values below 100 ms. There was also a strange spike at the 1 ms
and 2 ms tests (it might have something to do with kernel
context-switching thresholds).
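
A simple way to check what granularity Thread.sleep() actually delivers
on a given OS is a probe along the lines of the sketch below; this is
my own illustration (not part of the attached code), and the requested
durations are arbitrary.

// Measures how long Thread.sleep() really takes for various requested
// durations, which exposes the timer granularity of the OS/JVM.
public class SleepGranularity {

    public static void main(String[] args) throws InterruptedException {
        int[] requestedMs = { 1, 2, 5, 10, 20, 50, 100 };
        int rounds = 50;

        for (int ms : requestedMs) {
            long totalNanos = 0;
            for (int i = 0; i < rounds; i++) {
                long start = System.nanoTime();
                Thread.sleep(ms);
                totalNanos += System.nanoTime() - start;
            }
            double avgMs = totalNanos / (double) rounds / 1000000.0;
            System.out.println("requested " + ms + " ms, actual avg "
                    + avgMs + " ms");
        }
    }
}

If the measured averages sit far above the requested values, the
simulated DB delay in the server is effectively being rounded up, which
would explain why shrinking it below a certain point stops helping.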

Here are the measurements:

tf - windows:

1ms:
E:\threads_perf>java -cp . tf.StatelessUdpClient -t 500 -r 10000
Time executing threads: 453875ms (453.875s)
Time executing threads: 483859ms (483.859s)

2ms:
Time executing threads: 656609ms (656.609s)
Time executing threads: 684734ms (684.734s)

4ms:
Time executing threads: 572547ms (572.547s)
Time executing threads: 604500ms (604.5s)

30ms:
Time executing threads: 578796ms (578.796s)
Time executing threads: 555860ms (555.86s)

100ms:
Time executing threads: 571079ms (571.079s)
Time executing threads: 593531ms (593.531s)

120ms:
Time executing threads: 632657ms (632.657s)
Time executing threads: 639125ms (639.125s)

150ms:
Time executing threads: 773750ms (773.75s)
Time executing threads: 771406ms (771.406s)

200ms:
Time executing threads: 1019234ms (1019.234s)
Time executing threads: 1021328ms (1021.328s)

500ms:
Time executing threads: 2543078ms (2543.078s)
Time executing threads: 2544656ms (2544.656s)


The code is attached.
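
The attachment itself is not reproduced in this thread. Purely to
illustrate the shape of such a test, a minimal threaded UDP load client
might look roughly like the sketch below; the class name, port, payload
and echo-style protocol are my assumptions, not Tom's actual program.

// Illustrative sketch: N threads each send R UDP requests to a server
// that is assumed to echo every datagram back; the total elapsed time
// is reported at the end.
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpLoadClientSketch {

    public static void main(String[] args) throws Exception {
        final int threads = 500;      // corresponds to -t 500
        final int requests = 10000;   // corresponds to -r 10000
        final InetAddress host = InetAddress.getByName("localhost");
        final int port = 9999;        // assumed server port

        Thread[] workers = new Thread[threads];
        long start = System.currentTimeMillis();

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        DatagramSocket socket = new DatagramSocket();
                        byte[] out = "ping".getBytes();
                        byte[] in = new byte[512];
                        for (int r = 0; r < requests; r++) {
                            socket.send(new DatagramPacket(out, out.length, host, port));
                            socket.receive(new DatagramPacket(in, in.length));
                        }
                        socket.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }

        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Time executing threads: " + elapsed + "ms ("
                + (elapsed / 1000.0) + "s)");
    }
}

The printed total corresponds to the "Time executing threads" lines in
the measurements above.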

tom
 

Robert Klemme

I ran your program on my machine in both Windows and Linux and discovered
some interesting results:

The machine is a dual-boot ThinkPad T60 with an Intel dual core. No special
system/kernel optimisations have been performed on either system.

Does it also have a dual display and a dual keyboard? :) SCNR
Linux: vanilla Linux 2.6.17.8 kernel release running on Mandriva 2006
Windows: factory-installed Windows XP with SP2 (version 2002)
Conclusion: Linux is faster.

I also tested a thread-efficiency program I made; it's a UDP server and
client.

server: -t 1000 (number of threads: 1000)
client: -t 500 -r 10000 (number of threads 500,
number of requests per thread: 10000)

tf - linux

Average creation time for client object: 0.00462ms
Time executing threads: 183114ms (183.114s)
Average creation time for client object: 0.00426ms
Time executing threads: 182486ms (182.486s)

tf - windows

Average creation time for client object: 0.00359ms
Time executing threads: 535891ms (535.891s)
Average creation time for client object: 0.00296ms
Time executing threads: 536219ms (536.219s)

Conclusion: Windows is a little faster at creating client objects, but
Linux is about 3 times faster at executing the actual operations.

Interesting findings! Thanks for sharing these!

Kind regards

robert
 
