performance question

  • Thread starter Olivier Scalbert

Steve Wampler

Olivier said:
Ok, but how can I do the same thing in Java ?
By the way, I do not want to attack java (that I use daily). I just want
to understand ...

Try the following version (lifted almost verbatim from an earlier thread).
Note that it includes instrumentation code that could be stripped out
in 'Real Life'. Also note that the methods it uses could be generalized
and provided as part of a 'performance' package. (I didn't write the
methods, incidentally.)

---------------------------------------------------------------------
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.util.Date;

class T1 {

    static public void main(String[] argv) {
        int lim = Integer.parseInt(argv[0]);     // lines written per run
        int nbench = Integer.parseInt(argv[1]);  // number of timed runs
        int b;

        for (b = 0; b < nbench; b++) {
            System.err.println("Bench " + b);

            Date start = new Date();
            try {
                write3fastItoS2(lim);
            }
            catch (Exception e) {
                System.err.println("Exception occurred");
                System.err.println(e.toString());
            }
            Date now = new Date();

            System.err.println("Took " + ((now.getTime() -
                    start.getTime()) / 1000.0) + " seconds");
        }
    }

    static public void write3fastItoS2(int lim) throws IOException {
        BufferedOutputStream os = new BufferedOutputStream(System.out);
        String message2 = "abcdefghijk ";
        byte[] mbuff = message2.getBytes();
        int mlength = mbuff.length;

        // One reusable buffer, sized for the longest possible int.
        AsciiByteBuff ibuff = new AsciiByteBuff();
        ibuff.ascii = new byte[Integer.toString(Integer.MAX_VALUE).length()
                + 2];

        for (int i = 0; i < lim; i++) {
            os.write(mbuff, 0, mlength);
            fastItoS2(ibuff, i);
            os.write(ibuff.ascii, ibuff.start, ibuff.slength);
            os.write('\n');
        }
        // Flush rather than close: close() would also close System.out,
        // and every run after the first would then silently write nothing.
        os.flush();
    }

    /** Converts a NON-NEGATIVE integer to a byte [], with an emphasis on speed.
     *
     * @param buff The start, length and ASCII values are stored in this buffer.
     * The buffer must therefore be at least as long as the decimal
     * representation of Integer.MAX_VALUE, plus 2.
     * buff.start stores the offset of the first ASCII character.
     * buff.slength stores the length of the ASCII character string.
     * Strings are written to the end of the buff.ascii array.
     * @param i MUST BE NON-NEGATIVE. This is not tested by the method. Stuff
     * will explode in spectacular ways if you pass this routine a negative
     * integer.
     */
    static void fastItoS2(AsciiByteBuff buff, int i) {
        int index = buff.ascii.length - 1;
        int q = i;
        int r;

        // Emit digits from least to most significant, filling from the end.
        for (;;) {
            r = q % 10;
            q = q / 10;
            buff.ascii[index--] = digits[r];
            if (q == 0) break;
        }

        buff.start = (index + 1);
        buff.slength = (buff.ascii.length - 1 - index);
    }

    private static class AsciiByteBuff {
        public int start;
        public int slength;
        public byte[] ascii;
    }

    private static byte[] digits = { '0', '1', '2', '3', '4',
            '5', '6', '7', '8', '9' };

}
----------------------------------------------------------------

Yes, it's quite a bit longer than the C version, but by rewriting
most of it into a 'performance package' (presumably with additional
methods for other, similar operations), the remaining code could
be written in not many more lines than the C one. (The fact that
no one [as far as I know] has written such a performance package
probably means that there hasn't been a perceived need for it!)
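
As a rough illustration of what one method of such a 'performance package' might look like, here is a minimal sketch. The class name FastAsciiWriter and its methods are hypothetical (nothing like this is named in the thread); it simply wraps the same digit-by-digit conversion behind a small API and adds the negative-value check the original leaves out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and API are illustrative only.
final class FastAsciiWriter {
    private final OutputStream out;
    // Integer.MAX_VALUE has 10 decimal digits, so 10 bytes always suffice.
    private final byte[] scratch = new byte[10];

    FastAsciiWriter(OutputStream out) {
        this.out = out;
    }

    /** Writes a non-negative int as ASCII decimal digits without allocating. */
    void writeInt(int value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        int index = scratch.length;
        // Fill the scratch buffer from the end, least significant digit first.
        do {
            scratch[--index] = (byte) ('0' + value % 10);
            value /= 10;
        } while (value != 0);
        out.write(scratch, index, scratch.length - index);
    }

    /** Convenience wrapper for trying the writer on an in-memory stream. */
    static String toAscii(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            new FastAsciiWriter(sink).writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

With a helper along these lines, the body of the output loop would shrink to three calls: write the prefix bytes, call writeInt(i), write the newline.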

Performance with a single run (to match the C version) through the
key code (Sun's JDK 1.6):

---------------------------------------------------------------
->time java -server T1 1000000 1 | cat >/dev/null
Bench 0
Took 0.251 seconds
java -server T1 1000000 1 0.31s user 0.03s system 88% cpu 0.384 total
cat > /dev/null 0.00s user 0.02s system 5% cpu 0.333 total
---------------------------------------------------------------

while on the same machine, using gcc 4.1.1 (-O4) the original C version gives:

---------------------------------------------------------------
->time ./t1 | cat >/dev/null
./t1 0.29s user 0.02s system 97% cpu 0.319 total
cat > /dev/null 0.00s user 0.02s system 6% cpu 0.318 total
->
---------------------------------------------------------------

And to show what Hot Spot can do, let's give it a chance to really kick
in (it doesn't take long):

---------------------------------------------------------------
->time java -server T1 1000000 20 | cat >/dev/null
Bench 0
Took 0.239 seconds
Bench 1
Took 0.195 seconds
Bench 2
Took 0.194 seconds
Bench 3
Took 0.199 seconds
Bench 4
Took 0.198 seconds
Bench 5
Took 0.207 seconds
Bench 6
Took 0.194 seconds
Bench 7
Took 0.192 seconds
Bench 8
Took 0.194 seconds
Bench 9
Took 0.194 seconds
Bench 10
Took 0.196 seconds
Bench 11
Took 0.192 seconds
Bench 12
Took 0.193 seconds
Bench 13
Took 0.194 seconds
Bench 14
Took 0.194 seconds
Bench 15
Took 0.198 seconds
Bench 16
Took 0.201 seconds
Bench 17
Took 0.193 seconds
Bench 18
Took 0.194 seconds
Bench 19
Took 0.195 seconds
java -server T1 1000000 20 4.01s user 0.04s system 99% cpu 4.083 total
cat > /dev/null 0.00s user 0.01s system 3% cpu 0.319 total
->
---------------------------------------------------------------

Whereas 20 runs of the C version would produce no significant difference
from the original. (Feel free to instrument the C version in the same way.)
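
The per-run instrumentation used above can be separated from the code being measured. The following is a minimal sketch of that idea, not the harness used for the numbers in this post: RepeatTimer and timeRuns are made-up names, and the workload in main is a placeholder.

```java
// Hypothetical timing helper; names are illustrative only.
final class RepeatTimer {
    /** Runs the task `runs` times and returns each run's elapsed nanoseconds. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] sink = new long[1]; // keeps the result reachable
        // Placeholder workload; substitute the code you actually want to time.
        long[] times = timeRuns(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 10;
            }
            sink[0] = sum;
        }, 20);
        for (int b = 0; b < times.length; b++) {
            System.err.printf("Bench %d took %.3f seconds%n", b, times[b] / 1e9);
        }
    }
}
```

Reporting every run separately, rather than only a total, is what lets you see whether later runs differ from the first one.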

Does this prove Java is faster than C? Of course not. I imagine one could
do similar hand optimizations on the C side to get a quicker version there
as well (or find a better C compiler!). In fact, I'm *sure* one could also
write a similar 'performance package' for C...

But it does help show that Java isn't inherently as slow as some would claim.

(If someone who's not as cheap as I am has access to JET, I'd *love* to see
what JET does with this code [and the original Java version].)
 

Steve Wampler

Steve said:
And to show what Hot Spot can do, let's give it a chance to really kick
in (it doesn't take long):

---------------------------------------------------------------
->time java -server T1 1000000 20 | cat >/dev/null
Bench 0
Took 0.239 seconds
Bench 1
Took 0.195 seconds
Bench 2
Took 0.194 seconds
Bench 3

Just for fun, I decided to see what happens if the program has to
produce more lines of output (10,000,000 instead of 1,000,000).
This allows the effect of Hot Spot to show up more in a single run:

--------------------------------------------------------------
->time java -server T1 10000000 1 | cat >/dev/null
Bench 0
Took 2.07 seconds
java -server T1 10000000 1 1.92s user 0.12s system 92% cpu 2.205 total
cat > /dev/null 0.01s user 0.13s system 6% cpu 2.154 total
 

Tim Smith

in response to Joshua Cranmer's equally definitive point about planes being
faster to fly a few hundred people across a continent but trains being faster
to transport tons of coal.

Although, it occurs to me that if one used a multi-airplane solution with
parallelized coal transport that it would be possible to transport coal more
swiftly with a fleet of transport aircraft than with a train. Naturally it
would be much, much more expensive than a freight train, but hey, speed is
all, right?

Having just recently completed a round trip on Amtrak from Tacoma, WA,
to Beaumont, TX, and back, I'm not sure the train would beat the plane
on coal hauling, even if there was only one plane.

The train is about 40 hours each way, if there are no delays, but trains
seem really good at finding ways to be delayed. A plane is about 4.5 hours.
So that one plane could take about 10 trips for every one train trip. A
quick check on Wikipedia shows that a C-5A (the first cargo plane that
came to mind) can carry about 1/8th of the coal that Joshua Cranmer's
example used...so it looks like the plane could actually win this! :)
 

Wojtek

Jon Harrop wrote :
If I'm running a program many times then I'll care about Java's
uniquely-poor performance at starting up.

If you are running a program that starts up, does a little something or
other, then goes away, then Java is not for you. The startup costs for
the JRE will swamp out everything else.

If you are running a program which starts up, then remains up for
hours/days/weeks/months/years, then Java will outperform many other
languages.

You can however re-write the program to accept information along a
channel (such as IP). That way the program is up and running when it
gets the request for processing, and you remove the JRE startup.
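
A minimal sketch of that arrangement, assuming one line of text per request over TCP. The class ResidentWorker, its process method (which just reverses the text) and the ephemeral port are all illustrative assumptions; a real service would need its own protocol and error handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; class and method names are illustrative only.
final class ResidentWorker {
    /** Stand-in for the real processing: reverses the request text. */
    static String process(String request) {
        return new StringBuilder(request).reverse().toString();
    }

    /** Answers one-line requests until the server socket is closed. */
    static void serve(ServerSocket server) {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 BufferedReader in = reader(client);
                 PrintWriter out = writer(client)) {
                String line = in.readLine();
                if (line != null) {
                    out.println(process(line));
                }
            } catch (IOException e) {
                // accept() throws once the socket is closed; the loop test then exits.
            }
        }
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoflush = true, so println() sends the line immediately
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Demo: starts a server on an ephemeral port, sends one request, returns the reply. */
    static String roundTrip(String request) {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread worker = new Thread(() -> serve(server));
            worker.setDaemon(true);
            worker.start();
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = writer(s);
                 BufferedReader in = reader(s)) {
                out.println(request);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The JVM is started once; each later request pays only for a socket connection, not for JRE startup.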

And again, simple tests like a small loop are utterly inadequate for
benchmarks.

And it is not unique. PASCAL had a runtime environment many years ago.
The idea is not new.
 

Daniel Pitts

Tim said:
Having just recently completed a round trip on Amtrak from Tacoma, WA,
to Beaumont, TX, and back, I'm not sure the train would beat the plane
on coal hauling, even if there was only one plane.

The train is about 40 hours each way, if there are no delays, but trains
seem really good at finding ways to be delayed. A plane is about 4.5 hours.
So that one plane could take about 10 trips for every one train trip. A
quick check on Wikipedia shows that a C-5A (the first cargo plane that
came to mind) can carry about 1/8th of the coal that Joshua Cranmer's
example used...so it looks like the plane could actually win this! :)

But at what cost in fuel?
 

Michael Jung

Tim Smith said:
Having just recently completed a round trip on Amtrak from Tacoma, WA,
to Beaumont, TX, and back, I'm not sure the train would beat the plane
on coal hauling, even if there was only one plane.

The train is about 40 hours each way, if there are no delays, but trains
seem really good at finding ways to be delayed. A plane is about 4.5 hours.
So that one plane could take about 10 trips for every one train trip. A
quick check on Wikipedia shows that a C-5A (the first cargo plane that
came to mind) can carry about 1/8th of the coal that Joshua Cranmer's
example used...so it looks like the plane could actually win this! :)

This example does have some bearing on performance in computing.
Imagine that this "trick" didn't cost that much fuel in itself
(a metaphor for electricity?). Then many planes would fly, and planes
would have to wait for other planes, just as trains get delayed for
similar reasons. Eventually, when judging performance, you have to be
aware of what other software will run in parallel with yours.

Michael
 

Daniel Pitts

Andreas said:
g++ without any optimisation option isn't exactly a fair match...
Afterall, java also performs optimisation.

That may be true, but keep in mind also that you can run g++ with
maximum optimization today, and it will get the benefit of today's
optimizer. You could have compiled a Java program last year, and it
gets the benefit of the optimizer of today's JVM. Next year, it will get
the benefit of the latest platform.

The real point of the post is that the micro-benchmark can prove Java is
fast enough.
 

Roger Lindsjö

Jon said:
Note that, even if I could reproduce your results, I can trivially optimize
your C code but nobody has been able to optimize Olivier's Java to run
anything like as fast as C on any system.

I rewrote the java program to use a buffered output stream with a 512
byte buffer and ASCII encoding (didn't make much difference from UTF-8,
but UTF-16 was slower). Also avoid building new strings for each output.
I'm also measuring the loop time to be able to compare to total time.

<sscce>
import java.io.BufferedWriter;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;

public class Test {
    public static void main(String... args) throws Exception {
        // Write to stdout's file descriptor through a 512-character
        // buffer, encoding the text as ASCII.
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(FileDescriptor.out), "ASCII"), 512);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            out.write("abcdefghijk ");
            out.write(String.valueOf(i));
            out.write('\n');
        }
        System.err.println("Loop time: " +
                (System.currentTimeMillis() - start));
        // Flush what is still buffered; otherwise the tail of the output is lost.
        out.flush();
    }
}
</sscce>

Results:
(C program)
time ./a.out > /dev/null

real 0m0.286s
user 0m0.282s
sys 0m0.001s

(Original java program)
time java Original > /dev/null

real 0m3.774s
user 0m3.101s
sys 0m0.866s

(Rewritten java program)
time java Test > /dev/null
Loop time: 426

real 0m0.523s
user 0m0.597s
sys 0m0.036s

Interesting (perhaps), if I did not redirect to /dev/null I got (output
snipped)

(C program)
time ./a.out
real 0m9.725s
user 0m0.759s
sys 0m3.425s

(Rewritten java program)
time java Test
Loop time: 7571

real 0m7.675s
user 0m0.714s
sys 0m0.699s

//Roger Lindsjö
 

Mark Thornton

Roger said:
I rewrote the java program to use a buffered output stream with a 512
byte buffer and ASCII encoding (didn't make much difference from UTF-8,
but UTF-16 was slower). Also avoid building new strings for each output.
I'm also measuring the loop time to be able to compare to total time.


I think these tests should be done writing to a real file. After all
what is the point of sending a million lines to either /dev/null or a
console.

Mark Thornton
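
A minimal sketch of the file-based variant suggested here, assuming a temporary file as the target. FileBench, writeLines and bytesWritten are illustrative names that do not appear elsewhere in the thread, and no timings are claimed for it.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; class and method names are illustrative only.
final class FileBench {
    /** Writes `lines` lines to `target`; returns elapsed milliseconds, final flush included. */
    static long writeLines(Path target, int lines) {
        long start = System.nanoTime();
        try (BufferedWriter out =
                     Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            for (int i = 0; i < lines; i++) {
                out.write("abcdefghijk ");
                out.write(Integer.toString(i));
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the clock is read.
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Writes to a temporary file, reports its size in bytes, then deletes it. */
    static long bytesWritten(int lines) {
        try {
            Path tmp = Files.createTempFile("bench", ".txt");
            try {
                writeLines(tmp, lines);
                return Files.size(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bench", ".txt");
        System.err.println("File write time: " + writeLines(tmp, 1_000_000) + " ms");
        Files.delete(tmp);
    }
}
```

Note that the operating system may still cache the writes, so this measures the cost of handing the data to the file system, not necessarily of reaching the disk.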
 

Roger Lindsjö

Mark said:
Jon Harrop wrote:

Do you know what you are measuring in the Java case? Try rerunning with

time java -Xms32m Bench 2000000

Java's allocation IS fast. However the cost of garbage collection
depends mostly on the number and size of live objects. I find gc very
valuable, but none of your examples benefit from it (was that your
deliberate choice?).

Yes, growing the heap takes some time. It changes the time by a factor
of 2.5 on my machine.

time java Bench 2000000
3159ms

real 0m3.262s
user 0m8.307s
sys 0m0.252s

time java -Xms256m Bench 2000000
1159ms

real 0m1.282s
user 0m4.138s
sys 0m0.162s

//Roger Lindsjö
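
The Bench program itself is not shown in this thread, so the following is only a stand-in for watching the heap grow. HeapGrowth and allocateLive are made-up names, and the object size and count are arbitrary. It uses Runtime.totalMemory() and Runtime.maxMemory(), which report the current and maximum heap sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is NOT the Bench program from the thread.
final class HeapGrowth {
    /**
     * Allocates `count` small arrays and keeps them all reachable.
     * Returns {heap size before, heap size after, number of live arrays}.
     */
    static long[] allocateLive(int count) {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory();
        List<int[]> live = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            live.add(new int[16]);
        }
        long after = rt.totalMemory();
        return new long[] { before, after, live.size() };
    }

    public static void main(String[] args) {
        long[] r = allocateLive(2_000_000);
        long mb = 1024 * 1024;
        System.err.printf("heap before: %d MB, after: %d MB, max: %d MB%n",
                r[0] / mb, r[1] / mb, Runtime.getRuntime().maxMemory() / mb);
    }
}
```

Running it once with default settings and once with a larger -Xms value should show whether the heap had to be resized during the run. The exact figures depend on the JVM and its collector.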
 

Lew

Mark said:
I think these tests should be done writing to a real file. After all
what is the point of sending a million lines to either /dev/null or a
console.

Well, if you write to the console you'll get snickered at on the newsgroup,
even if you're comparing to someone else's use of the exact same thing.
 

Jon Harrop

Mark said:
Do you know what you are measuring in the Java case? Try rerunning with

time java -Xms32m Bench 2000000

You can do the same to the OCaml, of course.
Java's allocation IS fast.

Not really.
However the cost of garbage collection
depends mostly on the number and size of live objects. I find gc very
valuable, but none of your examples benefit from it (was that your
deliberate choice?).

What do you mean by "my examples"?

Functional programming tends to allocate huge numbers of very short lived
objects and GCs like the one in the JVM are not optimized for this.
Consequently, any decent FPL will be much faster (typically ~5x faster)
than Java at such benchmarks.

Lots of practical applications fall into this category (being ideally suited
to functional programming). Symbolic computations are the most obvious
example. I'll wager that a Java port of this symbolic simplifier will be
~5x slower, for example:

http://www.lambdassociates.org/studies/study10.htm
 

Jon Harrop

Wojtek said:
And it is not unique. PASCAL had a runtime environment many years ago.
The idea is not new.

All functional programming languages (including F#) have runtimes but none
have startup times anything like as slow as Java.
 

Jon Harrop

Daniel said:
That may be true, but keep in mind also that you can run g++ with
maximum optimization today, and it will get the benefit of today's
optimizer. You could have compiled a Java program last year, and it
gets the benefit of the optimizer of today's JVM. Next year, it will get
the benefit of the latest platform.

That is a triumph of hope over reality. You're giving theoretical reasons
why Java could be faster when, in reality, it is slower (often a lot
slower).
The real point of the post is that the micro-benchmark can prove Java is
fast enough.

But you can't optimize the OP's Java.
 

Jon Harrop

Steve said:
I wrote:

Of course, this one is (on my machine) over 700 times *faster* in
Java than in C.

There was a reason for the parenthetical remark. Benchmarks are
dependent on a *lot* of factors beyond the code they contain, and
rarely simply measure one *language* against another. On my machine, with
gcc 3.4.6 and Sun's JDK 1.6, the times I get are the times I
posted. Had I used gcc4, for example, the optimizations would be
different. However, perhaps I could also find a Java system that optimizes
that code differently as well (JET, perhaps?). "Simple, flawless" 'benchmarks'
are particularly susceptible to environmental differences and should
never be used to make blanket statements about languages.

Irrelevant. We can optimize your C. The point here is that nobody has been
able to optimize the Java yet.
I've made *no* claim about the relative performance of Java and C, but
agree with others that there are plenty of applications where Java is
certainly "fast enough". And I find its ease of use *compared to C and
C++* superior for the tasks I have to write. I have no doubt that others
would disagree *even for those same tasks* and am sure that, for at
least some of those tasks, I could find an easier-to-use language as
well. So what?

So telling the OP that other Java programs might be fast enough for their
different purposes isn't helpful.
 

Jon Harrop

Lew said:
GIYF, Jon. The numbers are out there.

Yeah, like the rubbish Steve Wampler just posted. There are lots of numbers
out there but nothing objective that I can reproduce.
 

Jon Harrop

Steve said:
Whereas 20 runs of the C version would produce no significant difference
from the original. (Feel free to instrument the C version in the same
way.)

There is absolutely no evidence whatsoever that Hotspot has improved
anything in these results. Your first run warms the cache (just as it does
in C) and your second run in Java jumped virtually to the fastest of all of
your results (it was even identical to your last run).
But it does help show that Java isn't inherently as slow as some would
claim.

The claim can equally well be taken as "Java is prohibitively difficult to
optimize". Real programs often won't afford 10x code bloat to get decent
performance.
 

Lew

Jon said:
I don't think it was intentional but by not piping the output to /dev/null
or somewhere else you were measuring the speed at which it drew on the
screen. That is the only reason your results were so similar for C and
Java.

It was in response to a benchmark that purported to show a much larger
difference also writing to the screen. Maybe it was wrong, but it was
necessary to keep the conversation on an even playing field.

Piped, as you suggest, Java was about 10x slower than C on that benchmark.
 
