Java vs C++ speed (IO & Sorting)

Bo Persson

Razii said:
If you can't take insults, don't insult others.


You are dumber than a stack of stones.

By the way, notice that this guy, Jerry Coffin, didn't post results
with the 40x bible once he figured out that -server is faster than the
C++ version.

Also, it's interesting that none of the C++ gurus posted any times. I
guess that says it all.

I guess they are just out of insults.


Bo Persson
 
Razii

Well, yes and no. I am trying to test every single benchmark I
encounter. If I find a deficiency, I try to fix it.

I wrote a third version that is faster than the second version :)

http://pastebin.com/f691e5e86


For a 3 MB file:

Time: 578 ms (First version)
Time: 422 ms (Second version)
Time: 360 ms (Third version)

Don't use -server with smaller files like 3 MB; the client JVM is
faster.

Now with a 40 MB file (using -server this time):

Time: 4922 ms (First version)
Time: 3422 ms (Second version)
Time: 2797 ms (Third version)

:) :) :) :) :)

VC++
Time: 531 ms (3 MB)
Time: 5296 ms (40 MB)

Now that's slow ...

U++
Time: 78 ms (3 MB)
Time: 828 ms (40 MB)

Now that's really fast :)

Third version below

Also, posted here http://pastebin.com/f691e5e86


-----------------------

//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII
import java.io.*;
import java.util.*;
import java.nio.*;
import java.nio.channels.*;

public final class WordCount3
{
    private static final Map<String, int[]> dictionary =
        new HashMap<String, int[]>(800000);
    private static int tWords = 0;
    private static int tLines = 0;
    private static long tBytes = 0;

    public static void main(final String[] args) throws Exception
    {
        System.out.println("Lines\tWords\tBytes\tFile\n");

        //TIME STARTS HERE
        final long start = System.currentTimeMillis();
        for (String arg : args)
        {
            File file = new File(arg);
            if (!file.isFile())
            {
                continue;
            }

            int numLines = 0;
            int numWords = 0;
            long numBytes = file.length();

            ByteBuffer in = new FileInputStream(arg).getChannel().map(
                FileChannel.MapMode.READ_ONLY, 0, file.length());

            StringBuilder sb = new StringBuilder();
            boolean inword = false;
            in.rewind();
            for (int i = 0; i < numBytes; i++)
            {
                char c = (char) in.get();

                if (c == '\n')
                    numLines++;
                else if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
                {
                    sb.append(c);
                    inword = true;
                }
                else if (inword)
                {
                    numWords++;
                    int[] count = dictionary.get(sb.toString());
                    if (count != null)
                        count[0]++;
                    else
                        dictionary.put(sb.toString(), new int[]{1});
                    sb.delete(0, sb.length());
                    inword = false;
                }
            }

            System.out.println(numLines + "\t" + numWords + "\t" + numBytes +
                "\t" + arg);
            tLines += numLines;
            tWords += numWords;
            tBytes += numBytes;
        }

        //only converting it to a TreeMap so the results
        //appear ordered; I could have
        //moved this part down to the printing phase
        //(i.e. not included it in the time).
        TreeMap<String, int[]> sort = new TreeMap<String, int[]>(dictionary);

        //TIME ENDS HERE
        final long end = System.currentTimeMillis();

        System.out.println("---------------------------------------");
        if (args.length > 1)
        {
            System.out.println(tLines + "\t" + tWords + "\t" + tBytes +
                "\tTotal");
            System.out.println("---------------------------------------");
        }
        for (Map.Entry<String, int[]> pairs : sort.entrySet())
        {
            System.out.println(pairs.getValue()[0] + "\t" + pairs.getKey());
        }
        System.out.println("Time: " + (end - start) + " ms");
    }
}
 
Razii

P.S.: I hope you have compiled it as "OPTIMAL" :)

I chose MSC9 Speed from the menu. In any case, I am not sure what
could be faster than this anyway. It's 0 ms for alice30.txt!
 
Razii

Time: 4922 ms (First version)
Time: 3422 ms (Second version)
Time: 2797 ms (Third version)


Actually, I forgot to use -server. The third version with -server is:

Time: 2306 ms
 
Mirek Fidler

I chose MSC9 Speed from the menu. In any case, I am not sure what can
be faster than this anyway. It's 0ms for alice30.txt !!!

Who knows. Two years ago I was pretty sure U++ Core was as fast as
possible. Since then, we have found ways to optimize it to be about
twice as fast ;)

Mirek
 
Razii

Who knows. Two years ago I was pretty sure U++ Core was as fast as
possible. Since then, we have found ways to optimize it to be about
twice as fast ;)

Are you on Linux or Windows? Can you do the benchmark with a 40 MB
text file on your machine with D (move the sort up) vs the third
version of Java (WordCount3) that I posted here
http://pastebin.com/f691e5e86

Just add the internal time counter with D like I did with U++.
 
Mirek Fidler

Are you on linux, windows? Can you do the benchmark with 40 meg

Both :) But if you want to benchmark state-of-the-art C++/U++
performance, GCC has recently gotten quite a bit better than VC++ at
optimizing code. Also, GCC's standard library now seems better than
VC++'s as well.

text file on your machine with D (move the sort up) vs the third
version of Java (WordCount3) that I posted here
http://pastebin.com/f691e5e86

Just add the internal time counter with D like I did with U++.

Sorry, right now I am a little bit short of time. I am looking
forward to playing with all this next week.

Mirek
 
Jerry Coffin

[ ... ]
Many products do give customers a choice to install the correct JVM
distribution. For example, the Glassfish server offers a variety of bundles
with and without the JDK.

<http://java.sun.com/javaee/downloads/index.jsp>

It's not complex, because it uses an installer. You're right, one doesn't
expect the customer to handle moving around auxiliary files; one automates
that part of it.

While there may be situations that would justify this, for trivial
utilities like counting words in a file, it sounds quite ridiculous (at
least to me).
 
Razii

I think the reason U++ is fast is probably the algorithm used for
hashing strings in VectorMap. How does VectorMap work? If I figure
that out, I bet I can match the speed :)
 
Sherman Pendley

I'm studying to update my C++ skills from long ago. Turbo Pascal & C++ were
actually my first OOP languages, but that was pre-standard and pre-STL days
in the early 90s. Yeah, I have the grays to prove it. :)

I've been using C variants all along though, so the general syntax is still
hard-wired into my fingers. I don't need a "c++ for dummies" introduction,
just a refresher course.

I'm a fan of O'Reilly. Any opinions on their "Practical C++ Programming,"
"C++ In a Nutshell," or "C++ Cookbook"? In general, I've found that I can
skip their "head first ..." or "learning ..." books, and go straight to
the "programming ..." books.

sherm--
 
Jerry Coffin

I'm studying to update my C++ skills from long ago. Turbo Pascal & C++ were
actually my first OOP languages, but that was pre-standard and pre-STL days
in the early 90s. Yeah, I have the grays to prove it. :)

I've been using C variants all along though, so the general syntax is still
hard-wired into my fingers. I don't need a "c++ for dummies" introduction,
just a refresher course.

For that, _Accelerated C++_ would probably work quite nicely. At some
point, you might want to look at _Exceptional C++_ and _More Exceptional
C++_ as well. It's probably also worth spending a bit of time with
_Effective C++_ and _More Effective C++_. _Accelerated C++_ would
definitely be the first one to study, though.

I'm a fan of O'Reilly. Any opinions on their "Practical C++ Programming,"
"C++ In a Nutshell," or "C++ Cookbook"? In general, I've found that I can
skip their "head first ..." or "learning ..." books, and go straight to
the "programming ..." books.

_C++ in a Nutshell_ is almost a pure reference book, not really a course
in C++ (refresher or otherwise). I haven't looked at _C++ Cookbook_, so
I can't really comment on it.
 
Mirek Fidler

I think the reason U++ is fast is probably due to algorithm used in
hashing strings in VectorMap. How VectorMap works? If I figure that
out, I bet I can match the speed :)

Well, you are of course quite right.

Anyway, I do not think you can repeat this in Java - it simply lacks
required low-level facilities.

In any case, VectorMap, String, and hashing are a long story to
explain... but if you are really interested, it is open source after
all. And you have a working debugger :)

BTW, also notice how simple, almost "naive", the U++ code is, and how
complex your Java is (although, of course, to me U++ looks like the
most natural thing in the world and Java is quite unfamiliar).

Why would I want to use Java if I can do my job in U++ a lot faster,
and the result will be a lot faster too? :)

Mirek
 
Razii

BTW, also notice how simple, almost "naive", the U++ code is, and how
complex your Java is (although, of course, to me U++ looks like the
most natural thing in the world and Java is quite unfamiliar).

I think your library is designed specifically for this benchmark
(i.e., looking for words in a file). How about if I change the
criteria so that, instead of finding words, we find quotes, i.e. all
words and sentences within " and '? In any case, in some situations
the Java version would look simpler to understand and read than U++
(probably in an application that has threading, networking, and/or a
GUI). My first two versions were as simple to read as wc2.cpp on the
D page. In the third version, I used nio and couldn't (for now) find
a simple way to make it work easily with StreamTokenizer.
 
Razii

Anyway, I do not think you can repeat this in Java - it simply lacks
required low-level facilities.

Bug fix report.

I just changed one line in version 3 and it's twice as fast :)
http://www.pastebin.ca/964045

In fact, with 6 args at the command line (each file is 40 MB), Java
-server gets close to U++ :)

Have a look

C:\>WCUPP bible2.txt bible2.txt bible2.txt bible2.txt bible2.txt
bible2.txt

Time: 5046 ms

C:\>java -server WordCount3 bible2.txt bible2.txt bible2.txt
bible2.txt bible2.txt bible2.txt

Time: 6828 ms

Ah, only a 1.8 sec difference :) Compared to my previous versions:

Time: 625 ms (version 1, 3 MB)
Time: 187 ms (version 3 with the fix, 3 MB)

40 MB file (java -server)
Time: 5297 ms (version 1)
Time: 1265 ms (version 3 with the fix)

1265 ms is not too far behind U++ (843 ms). You should be worried
about the 4th version :)

Visual C++ is still at 5546 ms for 40 MB.

The Updated version

-------------
http://www.pastebin.ca/964045

//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII
import java.io.*;
import java.util.*;
import java.nio.*;
import java.nio.channels.*;

public final class WordCount3
{
    private static final Map<String, int[]> dictionary =
        new HashMap<String, int[]>(16000);
    private static int tWords = 0;
    private static int tLines = 0;
    private static long tBytes = 0;

    public static void main(final String[] args) throws Exception
    {
        System.out.println("Lines\tWords\tBytes\tFile\n");

        //TIME STARTS HERE
        final long start = System.currentTimeMillis();
        for (String arg : args)
        {
            File file = new File(arg);
            if (!file.isFile())
            {
                continue;
            }

            int numLines = 0;
            int numWords = 0;
            long numBytes = file.length();

            ByteBuffer in = new FileInputStream(arg).getChannel().map(
                FileChannel.MapMode.READ_ONLY, 0, numBytes);

            StringBuilder sb = new StringBuilder();
            boolean inword = false;
            in.rewind();
            //note: i = i + 2 skips every other byte
            for (int i = 0; i < numBytes; i = i + 2)
            {
                char c = (char) in.get();
                if (c == '\n')
                    numLines++;
                else if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
                {
                    sb.append(c);
                    inword = true;
                }
                else if (inword)
                {
                    numWords++;
                    int[] count = dictionary.get(sb.toString());
                    if (count != null)
                        count[0]++;
                    else
                        dictionary.put(sb.toString(), new int[]{1});
                    sb.delete(0, sb.length());
                    inword = false;
                }
            }

            System.out.println(numLines + "\t" + numWords + "\t" + numBytes +
                "\t" + arg);
            tLines += numLines;
            tWords += numWords;
            tBytes += numBytes;
        }

        //only converting it to a TreeMap so the results
        //appear ordered; I could have
        //moved this part down to the printing phase
        //(i.e. not included it in the time).
        TreeMap<String, int[]> sort = new TreeMap<String, int[]>(dictionary);

        //TIME ENDS HERE
        final long end = System.currentTimeMillis();

        System.out.println("---------------------------------------");
        if (args.length > 1)
        {
            System.out.println(tLines + "\t" + tWords + "\t" + tBytes +
                "\tTotal");
            System.out.println("---------------------------------------");
        }
        for (Map.Entry<String, int[]> pairs : sort.entrySet())
        {
            System.out.println(pairs.getValue()[0] + "\t" + pairs.getKey());
        }
        System.out.println("Time: " + (end - start) + " ms");
    }
}
 
Razii

Well, I am really disappointed with the C++ people and especially
VC++. I fixed a minor bug in version three and it's now twice as
fast :)

Here is what I have now

3 MB file
Time: 625 ms (my version 1)
Time: 187 ms (my version 3 with the fix)

40 MB file (and java -server)
Time: 5297 ms (my version 1)
Time: 1265 ms (my version 3 with the fix)

What about C++ with the standard library and VC++?

Time: 531 ms (3 MB)
Time: 5546 ms (40 MB)

Am I to believe that C++ with the standard library is 4 TIMES SLOWER?

C++ IS FOUR TIMES SLOWER THAN JAVA with the standard library?

This is really disappointing. I had high hopes.

The version 3 with bug fix is here
---------------

Also, posted here http://www.pastebin.ca/964045

 
Razii

Well, I am really disappointed with the C++ people and especially
VC++. I fixed a minor bug in version three and it's now twice as
fast :)

I am disappointed all right, but totally ignore the last post. There
were no bug fixes :) That version was a fake one.
 
Bo Persson

Razii said:
Well, I am really disappointed with the C++ people and especially
VC++. I fixed a minor bug in version three and it's now twice as
fast :)

Here is what I have now

3 meg file
Time: 625 ms (My version 1) (3 meg)
Time: 187 ms (My version 3 with the fix) (3 meg)

40 meg file (and java -server)
Time: 5297 ms (my version 1)
Time: 1265 ms (my version 3 with the fix)

What about C++ with standard library and VC++?

Time: 531 ms (3 meg)
Time: 5546 ms (for 40 meg)

Am I to believe that C++ with standard library is 4 TIMES SLOWER?

C++ IS FOUR TIMES SLOWER THAN JAVA WITH standard library?

This is really disappointing. I had high hopes.

The version 3 with bug fix is here
---------------

Also, posted here http://www.pastebin.ca/964045

//counts the words in a text file...
//combined effort: wlfshmn from #java on IRC Undernet
//and RAZII

ByteBuffer in = new FileInputStream(arg).getChannel().map(
FileChannel.MapMode.READ_ONLY, 0, numBytes);

OK, if I get this right, you are using a memory-mapped file in your
Java version.

std::ifstream input_file( argv[1] );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );


While in the C++ version you use the high-level operator<< to do a
bytewise copy from one stream to another, then copy the result into a
std::string.

So a memory-mapped file is faster than copying the file contents
multiple times. Surprise!

I thought we had already decided that when processing large files,
the I/O times dominate the test, and supplying proper buffering is
important. Redundant copying certainly is not!


Bo Persson
 
Razii

Ok, if I get this you are using a memory mapped file in your Java
version.

Well, I said to ignore that version since it was missing words (due
to a stupid typo). This is the one that works:

http://pastebin.com/d48680a60

Time for the 40 MB file:

Time: 2281 ms

For C++:

C:\>wc1 bible2.txt
Time: 5421 ms

In fact:

C:\>java -server WordCount3 bible2.txt bible2.txt
Time: 4344 ms

Even with two copies of bible2.txt, it's still faster than C++ with
one.

OK, if I get this right, you are using a memory-mapped file in your
Java version.

Well then, change the following in the C++ version to map the file:

std::ifstream input_file( argv[1] );
std::ostringstream buffer;
buffer << input_file.rdbuf();
std::string input( buffer.str() );
 