Java vs C++ speed (IO & Sorting)

Razii

I wonder if the latest beta of Excelsior JET would make any difference
in your particular tests? They claim that


I already tried it. The version "1" that I posted was faster on JET
(especially startup) but there was not much difference for version 3.
I wonder why? In fact, java -server is fastest when the file is large
(it's slower with small files).

160 KB file
Time: 47 ms (JET)
Time: 47 ms (Java client)
Time: 156 ms (Java -server)

3.9 meg file
Time: 312 ms (JET)
Time: 344 ms (Java client)
Time: 422 ms (Java -server)

40 meg file
Time: 2469 ms (JET)
Time: 2735 ms (Java client)
Time: 2211 ms (Java -server)
 
Razii

So a memory mapped file is faster than copying the file content
multiple times. Surprise!

IO might be one of the reasons why the C++ version is slower (fix it).
Another reason is the sorted map. I wasn't using TreeMap in Java,
which, like C++'s map, is sorted. If I use TreeMap, it gets 2 times
slower. I only sort at the end, after the file is parsed. Do you
have an alternative to a sorted map in the standard library? If not, I
doubt the C++ version can get faster than my version. Prove me wrong:
post one version that is faster.
 
Mirek Fidler

Yikes! Never mind. This "bug fix" was itself a bug and doesn't really work.

NP. I have learned long ago that before celebrating a new optimization
achievement, you should always check it still works :)

Mirek
 
Mirek Fidler

I think your library is designed specifically for this benchmark
(i.e., looking for words in a file).

Actually, if you check the code, identifying the words in the file is the
only thing that had to be programmed :)

Sure, it IS optimized for String maps, but hey, String maps are so
common that optimizing them benefits nearly everything.
How about if I change the
criteria so that instead of finding words, we find quotes, i.e., all words
and sentences within " and '?

IMO, would not change a thing.

In any case, in some situations, the
java version would look simpler to understand and read than U++
(probably in an application that has threading, network and/or GUI).

Well, considering GUI:

http://www.ultimatepp.org/www$uppweb$vsswing$en-us.html

I agree it has to be taken with a grain of salt, but to show C++
potential, I guess it is good enough...

Mirek
 
Mirek Fidler

Razii said:
Well, I am really disappointed with C++ people and especially
VC++. I fixed a minor bug in version three and it's now two times
faster :)
Here is what I have now
3 meg file
Time: 625 ms (My version 1) (3 meg)
Time: 187 ms (My version 3 with the fix) (3 meg)
40 meg file (and java -server)
Time: 5297 ms (my version 1)
Time: 1265 ms (my version 3 with the fix)
What about C++ with standard library and VC++?
Time: 531 ms (3 meg)
Time: 5546 ms (for 40 meg)
Am I to believe that C++ with standard library is 4 TIMES SLOWER?
C++ IS FOUR TIMES SLOWER THAN JAVA WITH standard library?
This is really disappointing. I had high hopes.
The version 3 with bug fix is here
---------------
Also, posted here: http://www.pastebin.ca/964045
//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII
ByteBuffer in = new FileInputStream(arg).getChannel().map(
FileChannel.MapMode.READ_ONLY, 0, numBytes);

Ok, if I get this you are using a memory mapped file in your Java
version.
std::ifstream input_file( argv );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );


While in the C++ version you use a high level operator<< to do a
bytewise copy from one stream to another, then copy the result to a
std::string.

So a memory mapped file is faster than copying the file content
multiple times. Surprise!

I thought we had already decided that when processing large files, the
I/O times dominate the test, and supplying proper buffering is
important. Redundant copying certainly is not!

Bo Persson


AFAIK, not true. See this:

http://www.ultimatepp.org/www$uppweb$vsstd$en-us.html

IMO, before you conclude that I/O dominates, you should test it...

Mirek
 
Razii

NP. I have learned long ago that before celebrating a new optimization
achievement, you should always check it still works :)

:) I am working on version 4. I intend to catch up with U++ in this
benchmark. That will be the start of a new topic.
 
Mirek Fidler

Do you
have an alternative to sorted map in standard library?

The upcoming C++ standard has "unordered_map".
If not, I doubt
C++ version can get faster than my version. If not, prove me wrong.
Post one version that is faster.

Just to make things clear, you are asking for "post me C++ version
using map container and string from standard library". U++ after all
is C++...

Mirek
 
Razii

Just to make things clear, you are asking for "post me C++ version
using map container and string from standard library". U++ after all
is C++...


Yes, standard library, which doesn't have VectorMap.
 
Bo Persson

Mirek said:
Razii said:
Well, I am really disappointed with C++ people and especially
VC++. I fixed a minor bug in version three and it's now two times
faster :)
Here is what I have now
3 meg file
Time: 625 ms (My version 1) (3 meg)
Time: 187 ms (My version 3 with the fix) (3 meg)
40 meg file (and java -server)
Time: 5297 ms (my version 1)
Time: 1265 ms (my version 3 with the fix)
What about C++ with standard library and VC++?
Time: 531 ms (3 meg)
Time: 5546 ms (for 40 meg)
Am I to believe that C++ with standard library is 4 TIMES SLOWER?
C++ IS FOUR TIMES SLOWER THAN JAVA WITH standard library?
This is really disappointing. I had high hopes.
The version 3 with bug fix is here
---------------
Also, posted here: http://www.pastebin.ca/964045
//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII
ByteBuffer in = new FileInputStream(arg).getChannel().map(
FileChannel.MapMode.READ_ONLY, 0, numBytes);

Ok, if I get this you are using a memory mapped file in your Java
version.
std::ifstream input_file( argv );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );


While in the C++ version you use a high level operator<< to do a
bytewise copy from one stream to another, then copy the result to a
std::string.

So a memory mapped file is faster than copying the file content
multiple times. Surprise!

I thought we had already decided that when processing large files,
the I/O times dominate the test, and supplying proper buffering is
important. Redundant copying certainly is not!

Bo Persson


AFAIK, not true. See this:

http://www.ultimatepp.org/www$uppweb$vsstd$en-us.html

IMO, before you conclude that I/O dominates, you should test it...


We did that last week, when read-sort-write was just 15% slower than
plain read-write. To optimize that, read more than one character at a
time.

In the current test, the C++ version first does a byte-by-byte copy
from one stream to another, then copies that to a std::string, then
claims that C++ is slow at counting words!

My claim is that this is perhaps not optimal code:
std::ifstream input_file( argv );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );




Bo Persson
 
Razii

We did that last week, when read-sort-write was just 15% slower than
plain read-write.

However, in this case the time to write is not included. Also,
hashing, storing words is far more expensive than plain reading lines
and sorting the list like we did last time. Try it.
In the current test, the C++ version first does a byte-by-byte copy
from one stream to another, then copies that to a std::string, then
claims that C++ is slow at counting words!

My claim is that this is perhaps not optimal code:

Why can't you fix it and post the fixed version?
 
stan

Razii said:
However, in this case the time to write is not included. Also,
hashing, storing words is far more expensive than plain reading lines
and sorting the list like we did last time. Try it.


Why can't you fix it and post the fixed version?

You seem not to grasp the point that adults respond to childish taunts
differently than children do. You might note that you have completely
failed to get one competent programmer to play your game.

Turns out that people grow out of the "mine is bigger than yours"
games. We learn that comparing apples to oranges is pointless and
silly. If you need to make orange juice then an apple is inappropriate,
if you need something red then the orange is out, and if all you
need is something round then they are about equal. Generalized benchmarks
are likewise pointless and impossible to develop. In fact, benchmarks are
very hard, and so far this thread has failed to even come close to
anything meaningful.

Since you seem completely bent on trolling usenet, I hear the assembly
language guys have some pretty snappy code solutions, maybe you could go
there and play awhile. If that doesn't appeal to you then maybe you
could at least stop crossposting.
 
Mirek Fidler

Mirek said:
Razii wrote:
Well, I am really disappointed with C++ people and especially
VC++. I fixed a minor bug in version three and it's now two times
faster :)
Here is what I have now
3 meg file
Time: 625 ms (My version 1) (3 meg)
Time: 187 ms (My version 3 with the fix) (3 meg)
40 meg file (and java -server)
Time: 5297 ms (my version 1)
Time: 1265 ms (my version 3 with the fix)
What about C++ with standard library and VC++?
Time: 531 ms (3 meg)
Time: 5546 ms (for 40 meg)
Am I to believe that C++ with standard library is 4 TIMES SLOWER?
C++ IS FOUR TIMES SLOWER THAN JAVA WITH standard library?
This is really disappointing. I had high hopes.
The version 3 with bug fix is here
---------------
Also, posted here: http://www.pastebin.ca/964045
//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII
ByteBuffer in = new FileInputStream(arg).getChannel().map(
FileChannel.MapMode.READ_ONLY, 0, numBytes);
Ok, if I get this you are using a memory mapped file in your Java
version.
std::ifstream input_file( argv );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );
While in the C++ version you use a high level operator<< to do a
bytewise copy from one stream to another, then copy the result to a
std::string.
So a memory mapped file is faster than copying the file content
multiple times. Surprise!
I thought we had already decided that when processing large files,
the I/O times dominate the test, and supplying proper buffering is
important. Redundant copying certainly is not!
Bo Persson

AFAIK, not true. See this:

IMO, before you conclude that I/O dominates, you should test it...

We did that last week, when read-sort-write was just 15% slower than
plain read-write. To optimize that, read more than one character at a
time.

In the current test, the C++ version first does a byte-by-byte copy
from one stream to another, then copies that to a std::string, then
claims that C++ is slow at counting words!

My claim is that this is perhaps not optimal code:
std::ifstream input_file( argv );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );


Bo Persson


Well, but that is not what I would call an "I/O dominated benchmark".

In my vocabulary, "I/O dominated" starts at the system call to read/write
the file (e.g., ReadFile in Win32, 'read' in POSIX, or the total delays
of regular memory reads from the mapped file). If your profiler shows
that you are spending more than 90% of the time in there (and you are not
doing some wild unnecessary seeks and rereading), then, well, it is "I/O
dominated". Otherwise the responsibility is in userland, either in the
C++ library or the benchmark.

Mirek
 
Mirek Fidler

You seem not to grasp the point that adults respond to childish taunts
differently than children do. You might note that you have completely
failed to get one competent programmer to play your game.

Turns out that people grow out of the "mine is bigger than yours"
games. We learn that comparing apples to oranges is pointless and
silly. If you need to make orange juice then an apple is inappropriate,
if you need something red then the orange is out, and if all you
need is something round then they are about equal. Generalized benchmarks
are likewise pointless and impossible to develop. In fact, benchmarks are
very hard, and so far this thread has failed to even come close to
anything meaningful.

Well, now this sounds like a bunch of very childish excuses to me...
Of course, Razii's posts are somewhat annoying, but IMO the C++ community
should take these issues a little more seriously. It is way too
easy to outperform C++ nowadays with languages supposed to be much
slower. It would be ridiculous if C++ gained "legacy language
status" just because it looks slow...

And, after all, I did post C++ code that Razii is completely
unable to beat... :)

Mirek
 
Razii

You seem not to grasp the point that adults respond to childish taunts
differently than children do. You might note that you have completely
failed to get one competent programmer to play your game.

Another post "about me" by the clown 'stan.' All his posts to this
newsgroup are about me (he has nothing else to contribute).
He claims adults don't respond to taunts, even though he
reads all my posts obsessively and has posted to every thread of
mine. How old are you, stan, four?

Thanks for posting to all my threads, by the way :)

Continue.
 
Chris Thomasson

Razii said:
This topic was on these newsgroups 7 years ago :)

http://groups.google.com/group/comp.lang.c++/msg/695ebf877e25b287

I said then: "How about reading the whole Bible, sorting by lines, and
writing the sorted book to a file?"

Who remembers that from 7 years ago? It was one of the longest threads
on this newsgroup :)

The text file used for the bible is here
ftp://ftp.cs.princeton.edu/pub/cs126/markov/textfiles/bible.txt

Back to see if anything has changed

(downloaded whatever is latest version from sun.java.com)

Time for reading, sorting, writing: 359 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)

Visual C++ express and command I used was cl IOSort.cpp /O2

Time for reading, sorting, writing: 375 ms (c++)
Time for reading, sorting, writing: 390 ms (c++)
Time for reading, sorting, writing: 359 ms (c++)

The question still is (7 years later): where is the great speed advantage
you guys were claiming for C++?
[...]

http://groups.google.com/group/comp.lang.c++/msg/30c5e869150a2041

http://groups.google.com/group/comp.lang.c++/msg/9c11afcb3b21dc90

IMVHO, I would have to use platform-dependent resources in order to compete
with the standard Java API, exactly like how I would need to use them (e.g.,
Linux/Windows/Solaris/POSIX native sys-calls) to build a high-performance
JVM with C/C++ and some _asm, of course... ;^)
 
Razii

And, after all, I did post C++ code that Razii is completely
unable to beat... :)

Well, I tried some tricks but nothing worked. I give up. So this is
the final version (for now).

http://pastebin.com/f529d07e5

for 4 meg text file, it's 359 ms (client) and for 40 meg it's around
2300 ms (-server)

With a 40 meg (or larger) file, it's 2.5 times slower than U++ and
around 2.3 times faster than the C++ version compiled with VC++ (I
haven't tried GCC).

Well, time for you to post the results, as you said you would,
including the time for C++ version that uses standard library.

Use java client with files under 10 meg and -server with larger
files.
 
Razii

In the current test, the C++ version first does a byte-by-byte copy
from one stream to another, then copies that to a std::string, then
claims that C++ is slow at counting words!

My Java version, U++ version, D version are all doing the same thing,
creating strings from bytes. What are you talking about?

In the last case we were parsing lines as strings and sorting them.
That took far less time than reading and writing the file.

In this case...

(1) We are not counting the output time.
(2) It takes time to parse each word (instead of a whole line) as a
distinct string (this takes much of the time in this benchmark).
(3) You need to save the new word/string and increment the count each
time the word is found again (i.e., use some kind of map container)
(this also takes much of the time).
(4) Sort the list -- which in the C++ version is already sorted due to
using a sorted map.

Numbers 2 and 3 take more time than reading the file itself.
 
