Java vs C++ speed (IO & Sorting)

M

Mirek Fidler

Well, it's weird older version is faster for smaller file (like 80 MB
file), so I guess I go back to old one

http://www.pastebin.ca/975248

Well, this is exactly the problem: You are developing input data
together with code....

The original benchmark was "wc of 2MB text file". Bible.txt is OK
here. Stick to it please. If you are getting too small times, you can
introduce loop just gather longer times.

But be aware that GC will kick in too harder with cpp, which OTOH is
quite fair.

Mirek
 
M

Mirek Fidler

I was playing around. Yes, UPP map and Strings are about 2.5 times
faster than java's and C++ standard library.

This java version (the old one) is probably the more standard version
of java.

http://www.pastebin.ca/977902

4 MB file
Time: 281 ms (java client).
Time: 78 ms (Upp)

40 MB file
Time: 2250 ms (java -server)
Time: 828 ms (Upp).

Anyway, the idea of moving trie structure into a single char array was
by pm_kirkham (not mine). Very neat trick and jump in speed .. crushed
Upp :)

Yes, I have already appreciated it, did not I?

However, the practical uses of this clever trick are quite limited.

Mirek
 
M

Mirek Fidler

Why? We have at least reached the conclusion that this is the best
solution to this problem. Lookup time is O(m) where m is the length of
the lookup word. In any case, these kinds of structures are already
used in IP routing software (according to Wikipedia:http://en.wikipedia.org/wiki/Radix_tree)

The standard library should provide this as an option (right now there
are only Hashmap and tree map). In Java, this should be part of
Collection container and would probably go under SortedMap interface.

Ah, but this is completely different result at all.

Sure, such structure can have some special uses, like countind words
in bible * 80 text. Does it tell anything about Java vs C++
performance?

However, its limitation is that in most cases, it seems to waste a lot
of memory too..

You have got away in this case quite well as each node has to store
only 2 * 26 pointers. Try to extend "wc" to counting words that
contain all unicode latin letters (means, 2000 pointers per node) or
define them as "space separated items" and you are in trouble
(IMO!)...

Mirek
 
M

Mirek Fidler

The way it was implemented in this java version, it wastes a lot of
space. However, the form called "Radix tree" (or PATRICIA trie), it's
more compact.

Yes. But also much slower for inserting.

See, Trie or Radix tree are really useful... SOMETIMES. But I would
not use them e.g. in linker to create the map of function names or in
Java compiler to store the class map.

Mirek
 
M

Mirek Fidler

Yes, and the Windows will cache 80 MB files. As for 800 MB file, the
U++ version tries to load the entire file instead of reading it in
chunks.

Well, U++ version is created this way because original D was doing the
same thing... means it really is created for smaller files than
800MB...

Mirek
 
L

ldv

For small benchmarking problems (which are some structured algorithms)
if you compileJavainto ASM (and an exe file) then it may be as fast
as C/C++. But then you will loose the portability. For being fast, it
need to be native, which isn't portable.

I do not get your point here. A portable C++ program loses portability
when you compile it for a particular platform. A Java program loses
portability if it uses a native library specific to a particular
platform (e.g. to access Windows registry).

However you define "portability", programs are not written for the
sake of being portable, they are written to solve problems. The only
exception I can think of are programs written specifically to teach
the concept of portability. :)

Finally, Excelsior JET is not the only spec-compliant Java
implementation with an Ahead-Of-Time compiler. IBM has it in WebSphere
Realtime:

http://www.ibm.com/developerworks/java/library/j-rtj2/index.html

(scroll down to "AOT Java compilation")

LDV
 
T

timjowers

I made bible.txt 10 times and made it a 43 meg file

C++ is doing far worse now (the code used was multiset version)

Time for reading, sorting, writing: 2047 ms (java)
Time for reading, sorting, writing: 2016 ms (java)
Time for reading, sorting, writing: 2016 ms (java)
Time for reading, sorting, writing: 2015 ms (java)

and for c++

Time for reading, sorting, writing: 5281 ms (c++)
Time for reading, sorting, writing: 5703 ms (c++)
Time for reading, sorting, writing: 3921 ms (c++)
Time for reading, sorting, writing: 3718 ms (c++)

How come? c++ is at least 45% times slowe (if using 3718 ms)


I'm sure your deltas are not significant in terms of computer
architecture and maybe not statistically either. I've done other
benchmarks on JVM versions and you'll get variance based on disk IO.
once you run from RAM you get fairly comparable results but certainly
you'd need to drop to runlevel 1 or something (boot into ksh?) to get
close to a bare-metal comparison. Maybe that was discussed before as I
read only this posting.

Anyways, what I see is Java is fairly fast once it gets into RAM. Of
course you can code something faster in C (not C++ even). Some
companies used to even hand-code assembly! I'm sure the compiler,
compiler options, hardware architecture, etc. are a big factor in C/C+
+. Time was every major tech company had a compiler team just for that
reason. Until you have tailored the assembly output by the C code then
you are not being fair. E.g. the JVM is architecture dependent and
init'ed for that hardware at the time at which you start your clock.
You'd need to likewise have optimized assembly output for the C binary
to make an apples comparison. Not just compiled with -o or -g or
whatever but with architecture specific flags at least.

If you think performance is not too important anymore then you are
right. Modern OS's do not perform anywhere in the ballpark as fast as
old ones, for instance.

Best,
TimJowers
 
E

Erik Wikström

I do not get your point here. A portable C++ program loses portability
when you compile it for a particular platform. A Java program loses
portability if it uses a native library specific to a particular
platform (e.g. to access Windows registry).

However you define "portability", programs are not written for the
sake of being portable, they are written to solve problems. The only
exception I can think of are programs written specifically to teach
the concept of portability. :)

True, but one of the advantages of Java is that you can "write once, run
everywhere", by compiling to native you lose that kind of portability.
 
L

ldv

I found a few more that can compile bytecode to native machine code.

gcjhttp://gcc.gnu.org/java/

JNC is just a graphical front-end to GCJ.

GCJ itself is not Java spec-compliant, so it would not be fair to use
it for performance comparison.

ARM Jazelle supports AOT compilation, but that's a Java ME solution.

LDV
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top