Java vs C++ speed (IO & Sorting)

Razii

This topic was on these newsgroups 7 years ago :)

http://groups.google.com/group/comp.lang.c++/msg/695ebf877e25b287

I said then: "How about reading the whole Bible, sorting by lines, and
writing the sorted book to a file?"

Who remembers that from 7 years ago? It was one of the longest threads on
this newsgroup :)

The text file used for the bible is here
ftp://ftp.cs.princeton.edu/pub/cs126/markov/textfiles/bible.txt

Back to see if anything has changed

(downloaded whatever is latest version from sun.java.com)

Time for reading, sorting, writing: 359 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)

Visual C++ express and command I used was cl IOSort.cpp /O2

Time for reading, sorting, writing: 375 ms (c++)
Time for reading, sorting, writing: 390 ms (c++)
Time for reading, sorting, writing: 359 ms (c++)

The question still is (7 years later), where is the great speed advantage
you guys were claiming for C++?

------------------- Java Code -------------- (same as 7 years ago :)

import java.io.*;
import java.util.*;

public class IOSort
{
    public static void main(String[] arg) throws Exception
    {
        ArrayList ar = new ArrayList(5000);
        String line = "";

        BufferedReader in = new BufferedReader(
            new FileReader("bible.txt"));
        PrintWriter out = new PrintWriter(new BufferedWriter(
            new FileWriter("output.txt")));

        long start = System.currentTimeMillis();
        while (true)
        {
            line = in.readLine();
            if (line == null)
                break;
            if (line.length() == 0)
                continue;
            ar.add(line);
        }

        Collections.sort(ar);
        int size = ar.size();
        for (int i = 0; i < size; i++)
        {
            out.println(ar.get(i));
        }
        out.close();
        long end = System.currentTimeMillis();
        System.out.println("Time for reading, sorting, writing: " +
            (end - start) + " ms");
    }
}

--------- C++ Code ---------------

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <ctime>
using namespace ::std;

int main()
{
    vector<string> buf;
    string linBuf;
    ifstream inFile("bible.txt");
    clock_t start = clock();
    buf.reserve(50000);

    while (getline(inFile, linBuf)) buf.insert(buf.end(), linBuf);
    sort(buf.begin(), buf.end());
    ofstream outFile("output.txt");
    copy(buf.begin(), buf.end(), ostream_iterator<string>(outFile, "\n"));
    clock_t endt = clock();
    cout << "Time for reading, sorting, writing: " << endt - start << " ms\n";
    return 0;
}
 
Tim H

Did this include JVM startup time? What were the memory footprints?
 
Razii

Did this include JVM startup time? What were the memory footprints?


Read the code.. you will see where the time comes from. What does
the start time of the virtual machine have to do with the time for reading,
sorting and writing the file?
 
jason.cipriani

The question still is (7 years later), where is the great speed advantage
you guys were claiming for C++?

Well I was not involved in that original topic, but I can tell you
that Java has improved a lot over the years. VM startup times aside,
there are VMs that will compile to native code on the fly, the byte
code optimizers have been greatly improved, there are even CPUs that
execute Java byte code directly (not on your test platform, but you'll
find these on devices like PDAs and mobile phones).

C++ will bring you closer to the hardware you are developing for, that
is one of the strengths of the language, but Java can be just as
respectable as far as performance goes. It really just depends on what
you are using the language for. Use the most appropriate tool for the
job.

Also, comparing to your results 7 years ago, it looks like Java has
slowed down a bit, relatively. :-D

Jason
 
Razii

Also, comparing to your results 7 years ago, it looks like Java has
slowed down a bit, relatively. :-D

Slowed down? It was 2080 ms in the google link that I posted. It's 359
ms this time (however, the bible.txt file was different back then. So
there can't be any comparison with the old times).
 
jason.cipriani

Slowed down? It was 2080 ms in the google link that I posted. It's 359
ms this time (however, the bible.txt file was different back then. So
there can't be any comparison with the old times).

Key word: relatively. I was making a joke that the old C++:Java ratio
was 3400:2080 (1.0:0.6), and the new one is 375:370 (1.0:1.0).

Jason
 
Ian Collins

Razii said:
--------- C++ Code ---------------

#include <fstream>
#include<iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <ctime>

#include <iterator>

is required for ostream_iterator.

using namespace ::std;


int main()
{
vector<string> buf;
string linBuf;
ifstream inFile("bible.txt");
clock_t start=clock();
buf.reserve(50000);


while(getline(inFile,linBuf)) buf.insert(buf.end(), linBuf);
sort(buf.begin(), buf.end());

Why not use a sorted container? Your example takes 120mS on my box;
using std::multiset reduces this to 90.

ofstream outFile("output.txt");
copy(buf.begin(),buf.end(),ostream_iterator<string>(outFile,"\n"));
clock_t endt=clock();
cout <<"Time for reading, sorting, writing: " << endt-start << " ms\n";

endt-start is in whatever unit the system returns from clock(); it
should be scaled by CLOCKS_PER_SEC.
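
A minimal sketch of that scaling, for anyone following along. The helper name ticks_to_ms is made up for illustration and is not from the posted code. Visual C++ defines CLOCKS_PER_SEC as 1000, which is why the unscaled number happened to look like milliseconds there; on POSIX systems it is 1000000, so the same line would print a figure 1000 times larger.

```cpp
#include <ctime>

// Hypothetical helper: convert a difference of two clock() readings to
// milliseconds. clock() counts in implementation-defined ticks, and
// CLOCKS_PER_SEC says how many of those ticks make up one second.
double ticks_to_ms(std::clock_t start, std::clock_t end)
{
    return 1000.0 * static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

In the posted program the last line would then print ticks_to_ms(start, endt) instead of endt-start. Note that clock() is specified to report processor time, not elapsed wall time, so it may not be directly comparable with System.currentTimeMillis() in the Java version.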
 
peter koch

This topic was on these newsgroups 7 years ago :)

http://groups.google.com/group/comp.lang.c++/msg/695ebf877e25b287

I said then: "How about reading the whole Bible, sorting by lines, and
writing the sorted book to a file?"

Who remembers that from 7 years ago? It was one of the longest threads on
this newsgroup :)
[snip]

First of all, I believe this is a bad test. A lot of the time will be
involved with I/O, which the compilers can't really affect. I also
notice that the time included does not involve releasing memory used
by the Java program, which is unfair as this time was measured in the
C++ version.

Be that as it may, I notice that the C++ version is fifty percent
shorter, which suggests that developing with C++ will be quite a lot
faster.

I also wonder what happens in the hypothetical case where you were
told that the solution produced was simply too slow. I know that C++
offers you lots of flexibility where you could program towards a
certain environment, using e.g. memory-mapped I/O. (*)

So all in all, the above benchmark could never make me consider
switching languages.

/Peter

(*) Simpler measures such as adjusting the buffers of the streams
could also have an effect.
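
As a rough sketch of the footnote's "adjusting the buffers" idea, something like the following could be tried. The function name count_lines_buffered and the buffer size are assumptions for illustration. What pubsetbuf actually does on a file stream is implementation-defined, and with common implementations it only takes effect if called before the file is opened, so any gain would have to be measured.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read a file line by line through a caller-sized
// stream buffer and return the number of lines seen.
std::size_t count_lines_buffered(const char* path, std::size_t buffer_bytes)
{
    // The buffer is declared before the stream so that it outlives it.
    std::vector<char> buffer(buffer_bytes);
    std::ifstream in;

    // Hand the buffer to the underlying filebuf before opening the file.
    in.rdbuf()->pubsetbuf(buffer.data(),
                          static_cast<std::streamsize>(buffer.size()));
    in.open(path);

    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line))
        ++count;
    return count;
}
```

For example, count_lines_buffered("bible.txt", 1 << 20) would read the test file through a 1 MiB buffer.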
 
jason.cipriani

Why not use a sorted container? Your example takes 120mS on my box,
using std::multiset reduces this to 90.

What kind of super computers are you guys using? I mean my machine is
a little over a year old but... took me 790ms on a 2.16GHz Core Duo,
7200 RPM SATA something or other hard drive, with GCC -O2 (MinGW, GCC
3.4.5), using QueryPerformanceCounter for timings. Multiset reduced it
to about 720; with 380 for read + sort and 340 for write.

Jason
 
Razii

Key word: relatively. I was making a joke that the old C++:Java ratio
was 3400:2080 (1.0:0.6), and the new one is 375:370 (1.0:1.0).

Read the whole thread.. (700+ posts .. is that still a record in this
group?) :)

In the end they whined, made me change compilers, then after I got
VC++, claimed there is a bug in the VC++ 5.0 library. I had to fix the
bug. C++ ended up slightly faster after all that. Even then there was
nothing to brag about.
 
Ian Collins

What kind of super computers are you guys using? I mean my machine is
a little over a year old but... took me 790ms on a 2.16GHz Core Duo,
7200 RPM SATA something or other hard drive, with GCC -O2 (MinGW, GCC
3.4.5), using QueryPerformanceCounter for timings. Multiset reduced it
to about 720; with 380 for read + sort and 340 for write.

Super computer? Just an AMD FX74 3GHz, Sun CC. 70mS reading to
multiset, 20mS writing.
 
Razii

Why not use a sorted container? Your example takes 120mS on my box,
using std::multiset reduces this to 90.

Two chapters in the bible are identical. If you used set, that won't
include duplicates.

Both java and c++ used the same code, so what's the problem?

Funny that in 2001 when I first posted this I used set. Some guy, Pete
Becker, claimed I was comparing apples and oranges and must use
vector.
 
Ian Collins

Razii said:
Two chapters in the bible are identical. If you used set, that won't
include duplicates.

I said multiset.

Your requirement was "How about reading the whole Bible, sorting by
lines, and writing the sorted book to a file?"

Reading into a multiset and then writing out meets those requirements.
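
For reference, a minimal sketch of what the multiset variant could look like. This is not the code that was actually timed above (that was not posted), and the function name sort_lines_multiset is made up for illustration. std::multiset keeps its elements ordered on insertion and, unlike std::set, retains duplicate lines.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <set>
#include <string>

// Hypothetical helper: read all lines of in_path into a multiset (sorted
// as they are inserted, duplicates kept) and write them to out_path.
// Returns the number of lines written.
std::size_t sort_lines_multiset(const std::string& in_path,
                                const std::string& out_path)
{
    std::ifstream in(in_path.c_str());
    std::multiset<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.insert(line);

    // No separate sort step: iterating the multiset yields sorted order.
    std::ofstream out(out_path.c_str());
    std::copy(lines.begin(), lines.end(),
              std::ostream_iterator<std::string>(out, "\n"));
    return lines.size();
}
```

Calling sort_lines_multiset("bible.txt", "output.txt") would cover the read, sort and write steps of the original task. Whether it beats vector plus std::sort depends on the library and allocator, as the differing timings in this thread suggest.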
 
Lionel B

Two chapters in the bible are identical. If you used set, that won't
include duplicates.

That's probably why he suggested using `multiset'.

Both java and c++ used the same code, so what's the problem?

"Same code" seems like stretching it a bit to me...

Funny that in 2001 when I first posted this I used set. Some guy, Pete
Becker, claimed I was comparing apples and oranges and must use vector.

Does Java have a `multiset' equivalent? If so, maybe try a comparison
using that.
 
Juha Nieminen

Razii said:
The question still is (7 years later), where is the great speed advantage
you guys were claiming for C++?

1) 300 ms is too short of a time for any reliable comparison.

2) With heavy I/O, as in this case, the bottleneck is not in the
language but in the I/O system, which is often independent of the
language (and more dependent on the hardware and somewhat on the
operating system).
Just because Java can read and write files at the same speed as C++
doesn't mean that it's equally fast in general. (OTOH, it doesn't mean
the contrary either, of course.)
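
One way to act on point 1 is to repeat the measurement and look at a robust summary such as the median rather than a single run. The following is only a sketch: the name median_ms is hypothetical, and it assumes a compiler with C++11 support for std::chrono and lambdas, which is newer than the toolchains discussed in this thread.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper: call work() `runs` times (runs must be at least 1)
// and return the median wall-clock time of a single call in milliseconds.
template <typename F>
double median_ms(F work, int runs)
{
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Repeating the run does not remove the effect described in point 2: after the first pass the input file is likely served from the operating system's cache, so the numbers say even less about the disk.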
 
Razii

I also
notice that the time included does not involve releasing memory used
by the Java-program which is unfair as this time was measured in the
C++ version.

You are not making sense. Where on earth is C++ releasing memory in the
code that I posted?

Be that as it is, I notice that the C++ version is fifty percent
shorter which suggests that developing with C++ will be quite a lot
faster.

No, it's generally accepted that developing in C++ is much slower and
more difficult due to the pathetic C++ library, no thread support, no
network library. As for the length of code I posted, I can jumble
everything together and make Java code look short :)

import java.io.*; import java.util.*; public class IOSort
{public static void main(String[] arg) throws Exception {
ArrayList<String> ar = new ArrayList<String>(50000); String line = "";
BufferedReader in = new BufferedReader( new FileReader("bible.txt"));
PrintWriter out = new PrintWriter(new BufferedWriter(new
FileWriter("output.txt"))); long start = System.currentTimeMillis();
while (true) { line = in.readLine(); if (line == null) break;
ar.add(line); } Collections.sort(ar); int size = ar.size();
for (int i = 0; i < size; i++) { out.println(ar.get(i));}
out.close(); long end = System.currentTimeMillis();
System.out.println("Time for reading, sorting, writing: "+ (end -
start) + " ms"); } }

I hope you are satisfied :))

On a serious note, I also removed an unneeded line, (if (line.length()
==0) continue;) that was in the loop. That probably helped in speed.

So all in all, the above benchmark could never make me consider
switching languages.

Yawn. I really care what language you use (NOT).
 
jason.cipriani

Read the whole thread.. (700+ posts .. is that still a record in this
group?) :)

I most certainly will not read the whole thread; because I do not
care. Also I'm happy for your record. Makes for a good resume, I
guess...? At least you'll be able to have fun trolling with the rest
of the people here that will be taking the bait for the next few days.
Good luck, I'll pop in and say hi when this thread beats the old
one.

Jason
 
jason.cipriani

Read the whole thread.. (700+ posts .. is that still a record in this
group?) :)

In the end they whined, made me change compilers, then after I got
VC++, claimed there is a bug in the VC++ 5.0 library. I had to fix the
bug. C++ ended up slightly faster after all that. Even then there was
nothing to brag about.

Anyways, if you use Java, it should be because of its rich component
library, cross-platformness, and other great strengths -- not because
some strange test case out-performed another language by a few
milliseconds. You are testing all the wrong things. What you really
need to do is use whatever tool is most appropriate for the job at
hand, not whatever tool sorts the bible 4 milliseconds faster than the
other one...
 
James Kanze

This topic was on these newsgroups 7 years ago :)

I said then: "How about reading the whole Bible, sorting by lines, and
writing the sorted book to a file?"
Who remembers that from 7 years ago? It was one of the longest threads on
this newsgroup :)
The text file used for the bible is here
ftp://ftp.cs.princeton.edu/pub/cs126/markov/textfiles/bible.txt
Back to see if anything has changed
(downloaded whatever is latest version from sun.java.com)
Time for reading, sorting, writing: 359 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)
Time for reading, sorting, writing: 375 ms (Java)
Visual C++ express and command I used was cl IOSort.cpp /O2
Time for reading, sorting, writing: 375 ms (c++)
Time for reading, sorting, writing: 390 ms (c++)
Time for reading, sorting, writing: 359 ms (c++)
The question still is (7 years later), where is the great speed advantage
you guys were claiming for C++?

Who ever claimed a speed advantage for C++? I've said it more
than once, I can write a benchmark in which C++ will beat Java
hands down. Or vice versa. It happens that C++ will beat Java
in the type of code I'm working on now, but the real reason I
use C++ is because my applications have to be robust, and it's
easier to develop correct code with C++ than with Java.

For those who want to prove C++ faster, just do something with
large arrays of user defined types having value semantics. For
those who want to prove Java faster, use large arrays of basic
types, or where you can swap the pointers, rather than the
values. Like this particular example:)---I'm really surprised
that Java didn't do a lot better.

For those who are concerned with performance in your actual
work, of course, write a benchmark which simulates your actual
work (I don't know of too many people just sorting lines in a
single large text corpus), and benchmark it, on the machine
you'll actually be running on. (The quality of Java---and
C++---implementations varies a lot.) In theory, Java has the
advantage in array accesses (because of the lack of aliasing);
C++ when it comes to handling user defined value types (no
allocation is even cheaper than garbage collected allocation).
In practice, however, it will depend on the implementation.
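
To make the "swap the pointers, rather than the values" point concrete, here is a sketch of sorting the lines indirectly in C++, which is roughly what the Java version does when it sorts an ArrayList of String references. The function name sorted_view is hypothetical, and the code assumes C++11 for the range-based for loop and the lambda. Whether this is faster than sorting the strings themselves depends on the implementation, as noted above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: return pointers to the strings in `lines`, ordered
// by the strings they point to. Only the pointers are moved during the
// sort; the strings themselves stay where they are.
std::vector<const std::string*> sorted_view(const std::vector<std::string>& lines)
{
    std::vector<const std::string*> view;
    view.reserve(lines.size());
    for (const std::string& s : lines)
        view.push_back(&s);

    std::sort(view.begin(), view.end(),
              [](const std::string* a, const std::string* b) { return *a < *b; });
    return view;
}
```

The returned pointers are only valid as long as the original vector is neither modified nor destroyed.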
 
