Java vs C++ speed (IO & Sorting)

Razii

That's probably why he suggested using `multiset'.

By the way, have a look at the 2001 post where I used set...

http://groups.google.com/group/comp.lang.c++/msg/695ebf877e25b287

In that case, it didn't make a difference because the bible.txt I was
using had verse numbers (so there could be no duplicates). A few here
(including Pete Becker from Dinkumware) claimed that I used set and it
was unfair...

In any case, I can use multiset...

Time for reading, sorting, writing: 328 ms (c++)
Time for reading, sorting, writing: 312 ms (c++)
Time for reading, sorting, writing: 312 ms (c++)

Just a little improvement only...

For Java I can try TreeSet
http://java.sun.com/javase/6/docs/api/java/util/TreeSet.html

(which I assume does something similar -- if not, any Java expert can
correct it).

Time for reading, sorting, writing: 359 ms
Time for reading, sorting, writing: 360 ms
Time for reading, sorting, writing: 359 ms

Not much changes here ...

------ c++-------
#include <algorithm>
#include <ctime>
#include <fstream>
#include <iostream>
#include <iterator>   // ostream_iterator
#include <set>
#include <string>
using namespace std;

int main()
{
    multiset<string> buf;   // keeps duplicate lines, sorted on insert
    string linBuf;
    ifstream inFile("bible.txt");

    clock_t start = clock();
    while (getline(inFile, linBuf))
        buf.insert(linBuf);
    ofstream outFile("output.txt");
    copy(buf.begin(), buf.end(), ostream_iterator<string>(outFile, "\n"));
    clock_t endt = clock();
    // clock() counts ticks, so convert to milliseconds explicitly
    cout << "Time for reading, sorting, writing: "
         << (endt - start) * 1000 / CLOCKS_PER_SEC << " ms\n";
}

-------- java -----------

import java.io.*;
import java.util.*;

public class IOSort
{
    public static void main(String[] arg) throws Exception
    {
        // TreeSet keeps the lines sorted but silently drops duplicates
        Collection<String> ar = new TreeSet<String>();
        String line = "";
        BufferedReader in = new BufferedReader(new FileReader("bible.txt"));
        PrintWriter out = new PrintWriter(new BufferedWriter(
                new FileWriter("output.txt")));
        long start = System.currentTimeMillis();
        while (true)
        {
            line = in.readLine();
            if (line == null)
                break;
            ar.add(line);
        }
        for (String c : ar)
        {
            out.println(c);
        }
        out.close();
        long end = System.currentTimeMillis();
        System.out.println("Time for reading, sorting, writing: "
                + (end - start) + " ms");
    }
}
 
Razii

Who ever claimed a speed advantage for C++?


You were here 7 years ago ... I remember your name. Are you claiming
no one has ever said C++ has a huge speed advantage? If that's your
claim, then you are either being dishonest or have a bad memory.
 
Ian Collins

Razii said:
By the way, have a look at the 2001 post where I used set...

http://groups.google.com/group/comp.lang.c++/msg/695ebf877e25b287

In that case, it didn't make a difference because the bible.txt I was
using had verse numbers (so there could be no duplicates). A few here
(including Pete Becker from Dinkumware) claimed that I used set and it
was unfair...

In any case, I can use multiset...

Time for reading, sorting, writing: 328 ms (c++)
Time for reading, sorting, writing: 312 ms (c++)
Time for reading, sorting, writing: 312 ms (c++)

Just a little improvement only...

So you have a slow system that may be I/O bound, or you have a poor
multiset implementation. Try checking the read and write times on your
original code.
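One way to check the read and write times separately is to timestamp each phase. The following is only a sketch under stated assumptions: it uses a vector plus std::sort rather than the multiset from the posted program, and the names PhaseTimes and timed_sort_file are made up for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical result type: wall-clock milliseconds spent in each phase.
struct PhaseTimes {
    long long read_ms;
    long long sort_ms;
    long long write_ms;
};

// Reads all lines of in_path, sorts them, writes them to out_path,
// and reports how long each of the three phases took.
PhaseTimes timed_sort_file(const std::string& in_path,
                           const std::string& out_path)
{
    using clock = std::chrono::steady_clock;
    auto ms = [](clock::duration d) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };

    const auto t0 = clock::now();
    std::vector<std::string> lines;
    {
        std::ifstream in(in_path);
        std::string line;
        while (std::getline(in, line))
            lines.push_back(line);
    }
    const auto t1 = clock::now();

    std::sort(lines.begin(), lines.end());
    const auto t2 = clock::now();

    {
        std::ofstream out(out_path);
        std::copy(lines.begin(), lines.end(),
                  std::ostream_iterator<std::string>(out, "\n"));
    }   // the stream is flushed and closed here, inside the write phase
    const auto t3 = clock::now();

    return PhaseTimes{ms(t1 - t0), ms(t2 - t1), ms(t3 - t2)};
}
```

Calling timed_sort_file("bible.txt", "output.txt") and printing the three fields would show whether the run is dominated by I/O or by the sort.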
 
Yannick Tremblay

Yawn. I really care what language you use (NOT).

If you don't care about what language other people use, why are
you trying to start a flame war across 2 newsgroups?
 
Razii

1) 300 ms is too short of a time for any reliable comparison.


I copied bible.txt 10 times over to make a 43 MB file.

C++ is doing far worse now (the code used was multiset version)

Time for reading, sorting, writing: 2047 ms (java)
Time for reading, sorting, writing: 2016 ms (java)
Time for reading, sorting, writing: 2016 ms (java)
Time for reading, sorting, writing: 2015 ms (java)


and for c++

Time for reading, sorting, writing: 5281 ms (c++)
Time for reading, sorting, writing: 5703 ms (c++)
Time for reading, sorting, writing: 3921 ms (c++)
Time for reading, sorting, writing: 3718 ms (c++)

How come? C++ takes at least 84% longer (3718 ms vs. 2015 ms); put the other way, Java needs about 46% less time.
 
Razii

So you have a slow system that may be I/O bound, or you have a poor
multiset implementation. Try checking the read and write times on your
original code.

Well, the java code is running on the same "slow system"

And see the other post: make bible.txt 10 times bigger by copying
and pasting it 9 more times, giving a 43 MB file. After that...

Time for reading, sorting, writing: 2047 ms (java)
Time for reading, sorting, writing: 2016 ms (java)
Time for reading, sorting, writing: 2016 ms (java)
Time for reading, sorting, writing: 2015 ms (java)


and for c++

Time for reading, sorting, writing: 5281 ms (c++)
Time for reading, sorting, writing: 5703 ms (c++)
Time for reading, sorting, writing: 3921 ms (c++)
Time for reading, sorting, writing: 3718 ms (c++)

c++ performed even worse...
 
Razii

If you don't care about what language other people use, why are
you trying to start a flame war across 2 newsgroups?

It's not a flame war. It's a discussion and testing. If you are not
interested in it, just move on to the next thread.
 
peter koch

You are not making sense. Where on earth is C++ releasing memory in the
code that I posted?


No, it's generally accepted that developing in C++ is much slower and
more difficult due to the pathetic C++ library, no thread support, and no
network library.

This is weird. If the C++ library is so bad I do not understand why
the C++ code in your example is so much clearer than the Java
equivalent with an "endless" loop that is exited in the middle.
Apart from that, the C++ philosophy is very different from the Java
one: Java takes an "everything in one package" approach, whereas in C++ you
typically use add-on packages. So if you need threading and networking,
just use e.g. POSIX, CORBA, or Boost, which give you everything.

As for the length of code I posted, I can jumble everything
together and make Java code look short :)

import java.io.*;  import java.util.*;  public class IOSort
{public static void main(String[] arg) throws Exception {
ArrayList<String> ar = new ArrayList<String>(50000); String line = "";
BufferedReader in = new BufferedReader( new FileReader("bible.txt"));
PrintWriter out  = new PrintWriter(new BufferedWriter(new
FileWriter("output.txt"))); long start = System.currentTimeMillis();
while (true) { line = in.readLine(); if (line == null) break;
ar.add(line);  } Collections.sort(ar); int size = ar.size();
for (int i = 0; i < size; i++) { out.println(ar.get(i));}
out.close();  long end = System.currentTimeMillis();
System.out.println("Time for reading, sorting, writing: "+ (end -
start) + " ms"); } }

I hope you are satisfied :))

Right. But count the number of statements: they are the same, and the
Java version is still half again as long.
On a serious note, I also removed an unneeded line, (if (line.length()
==0)  continue;) that was in the loop. That probably helped in speed..

It did? That would give you more lines to sort, wouldn't it?

Yawn. I really care what language you use (NOT).

I do not know your purpose of that test, but to me it confirms that
you should use C++ and not Java. I guess that must be of relevance
somewhere?

/Peter
 
Cory Nelson

You were here 7 years ago ... I remember your name. Are you claiming
no one has ever said c++ has huge speed advantage? If that's your
claim, then you are either being dishonest or have a bad memory

This specific test has a lot of room to be specialized. A quick
example: arenas and intrusive containers. Near zero-overhead paged
allocation means less memory usage and much faster execution, and
boost.intrusive's multiset container can give less overhead than a
std::multiset and better locality.
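As a rough illustration of the arena idea only: the sketch below swaps in C++17's std::pmr::monotonic_buffer_resource as the arena instead of boost.intrusive, so it does not show an intrusive container. The function name sorted_via_arena and the 1 MiB initial buffer size are assumptions made for the example.

```cpp
#include <memory_resource>
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Sorts lines through a multiset whose tree nodes and string buffers are
// all carved out of one arena, which is released in a single step.
std::vector<std::string> sorted_via_arena(const std::vector<std::string>& lines)
{
    // The arena hands out memory by bumping a pointer and never frees
    // individual blocks. The initial size is an arbitrary choice.
    std::pmr::monotonic_buffer_resource arena(1 << 20);
    std::pmr::multiset<std::pmr::string> buf(&arena);

    for (const std::string& line : lines)
        buf.emplace(std::string_view(line));   // the copy lives in the arena

    // Copy the result out into ordinary strings before the arena goes away.
    std::vector<std::string> out;
    out.reserve(buf.size());
    for (const std::pmr::string& s : buf)
        out.emplace_back(s.data(), s.size());
    return out;
}   // buf is destroyed first, then the arena frees everything at once
```

Whether this beats std::multiset with the default allocator on the bible.txt workload would have to be measured; the sketch only shows the mechanism.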

Is such optimization usually needed? No. But if, after you've profiled,
you decide your approach isn't adequate, C++ gives you a lot more
freedom than Java to do so when you need it. Apples to apples
comparisons are silly: in what real-world situation would you limit
your app design because another language can't do something?

And I question both benchmarks' timing methods. On Windows, clock()
will return wall time. On *nix it will return real processor usage
time. These should both be measuring processor usage, not wall time.
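To see the difference on a given platform, both clocks can be read around the same piece of work. This is a minimal sketch: the Timing struct and the measure helper are hypothetical names, and std::chrono::steady_clock is assumed as the wall-clock source.

```cpp
#include <chrono>
#include <ctime>

// Hypothetical helper type: both measurements for one run, in milliseconds.
struct Timing {
    double wall_ms;  // elapsed real time from std::chrono::steady_clock
    double cpu_ms;   // whatever std::clock() reports, scaled by CLOCKS_PER_SEC
};

// Runs work() once and records it with both clocks.
template <typename Work>
Timing measure(Work work)
{
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_ms = std::chrono::duration<double, std::milli>(
                    wall_end - wall_start).count();
    t.cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start)
               / CLOCKS_PER_SEC;
    return t;
}
```

If the work spends most of its time waiting on disk, wall_ms should come out well above cpu_ms wherever clock() reports processor time, and the two should be close wherever clock() reports elapsed time.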
 
Razii

C++ is doing far worse now (the code used was multiset version)

In Java, TreeSet keeps no duplicates, so I will put back ArrayList.

With the 10-bible, 43 MB file:

C:\>java -Xmx128m IOSort

Time for reading, sorting, writing: 4203 ms
Time for reading, sorting, writing: 3141 ms
Time for reading, sorting, writing: 3203 ms
Time for reading, sorting, writing: 2954 ms


and for c++ (using multiset)

Time for reading, sorting, writing: 5281 ms (c++)
Time for reading, sorting, writing: 5703 ms (c++)
Time for reading, sorting, writing: 3921 ms (c++)
Time for reading, sorting, writing: 3718 ms (c++)
 
James Kanze

these newsgroups 7 years ago :)
[snip]
First of all, I believe this is a bad test. A lot of the time
will be involved with I/O which the compilers can't really
affect.

If most of your application is involved with doing I/O, it could
be a valid measurement. If your application is CPU bound with
floating point operations, it's totally irrelevant. If your
application is sorting large text files, it's very relevant.
For most applications, I suspect, it's somewhere in between.

I also notice that the time included does not involve
releasing memory used by the Java-program which is unfair as
this time was measured in the C++ version.

Yes and no. This could be considered a constraint inherent in
C++---that you can't defer releasing memory until later. (It
would be interesting to see the times for C++ with the Boehm
collector. Interesting, but not necessarily relevant to
anything in particular either.)

Be that as it is, I notice that the C++ version is fifty
percent shorter which suggests that developing with C++ will
be quite a lot faster.

That's pretty much an established fact:).

Seriously, it depends on what you're developing. When
reliability is important, C++ tends to win out. For
applications where it's not too important, there are some
domains where Java is particularly well integrated---it's
certainly a lot less work to develop a few beans for your web
server than it is to write CGI programs in C++. (Curiously, one
of the application domains where I think Java would have the
edge would be light weight graphic clients---a very good, fully
integrated GUI library and portability of the compiled code
would seem to be major trump cards for that. But it doesn't
seem to be widely used there.)

I also wonder what happens in the hypothetical case where you
were told that the solution produced was simply too slow. I
know that C++ offers you lots of flexibility where you could
program towards a certain environment, using e.g.
memory-mapped I/O. (*) So all in all, the above benchmark
could never make me consider switching languages.

That's not the purpose of it. The purpose is just to try to get
an argument going.
 
Razii

I do not know your purpose of that test, but to me it confirms that
you should use C++ and not Java. I guess that must be of relevance
somewhere?

No, the test says nothing about C++ or Java usage. It's about IO and
sorting speed, and the test shows C++ has no advantage in speed. The
Java code is clearer and easier to understand, but even that has
nothing to do with this test.

As for the library, that's the problem with C++. You have to use
third-party libraries for something as basic and important as networking
and threading. In 99% of software today, threading and networking are
needed. That's a very good reason not to use C++ and why there is
no C++ on the web, in commerce, business apps, etc. Where is C++ in
server-side apps, for example? Nowhere. Why did C++ lose so much ground to C#
and Java in the last 8 years?

All you do with C++ is write drivers and that can be done just fine
with C.
 
dave_mikesell

OK, I figured out the reason: there are no duplicates in TreeSet in
Java :)

So over the last seven years that it took you to craft a benchmark in
Java's favor, you didn't learn what the basic containers in each
language can do?
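The container difference at issue is easy to show on the C++ side, where std::set behaves like Java's TreeSet (one copy of each equal element) and std::multiset keeps every copy. A minimal sketch, with Counts and count_lines as illustrative names:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: how many lines each container ends up holding.
struct Counts {
    std::size_t unique_lines;  // size of a std::set built from the input
    std::size_t all_lines;     // size of a std::multiset built from the input
};

Counts count_lines(const std::vector<std::string>& lines)
{
    // std::set silently discards elements that compare equal to one
    // already stored, so repeated lines collapse to a single entry.
    std::set<std::string> unique(lines.begin(), lines.end());

    // std::multiset stores every element, duplicates included.
    std::multiset<std::string> all(lines.begin(), lines.end());

    return Counts{unique.size(), all.size()};
}
```

With the bible.txt file pasted ten times over, a set-like container holds roughly a tenth of the lines, so it sorts and writes far less data than a multiset does on the same input.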
 
Razii

(It
would be interesting to see the times for C++ with the Boehm
collector.

The Boehm collector will always be slower than the collectors in languages
with built-in GC. It's implemented as a library. Retrofitting a language
with GC means it will always be slower than a language designed for GC.


the following was posted to this group before (credit John Harpo)

http://lists.tunes.org/archives/gclist/1997-November/001291.html

Such optimisations require the optimiser in the compiler to know the
details of the memory allocator and collector, i.e. the GC. This is
not possible if the GC has been retrofitted onto the language as a
library. The compiler's optimiser does not have the necessary
information to make the optimisations.


"Because it is hard to move objects for C programs", i.e. retrofitting
limits choices which limits performance.

"Many Java/ML/Scheme implementations have faster garbage collectors
that may move objects..." - Hans Boehm
http://www.hpl.hp.com/personal/Hans_Boehm/gc/nonmoving/html/slide_4.html
 
Lew

Razii said:
You are not making sense. Where on earth is C++ releasing memory in the
code that I posted?

Where on earth is the Java code NOT releasing its unused memory?
How is the time for Java to release memory NOT being measured?

I have a hard time imagining any simple way NOT to include GC time in the Java
timings.
 
Razii

So over the last seven years that it took you to craft a benchmark in
Java's favor, you didn't learn what the basic containers in each
language can do?

It took me seven years? What the heck? I posted this 7 years ago and
you (and by that I mean these two newsgroups) failed to show c++ is
faster. I came back 7 years later and posted the same thing and you
still failed to show c++ is faster.

What does that have to do with what I did for 7 years?
 
Michael.Boehnisch

First of all, I believe this is a bad test. A lot of the time will be
involved with I/O which the compilers can't really affect.

Quick check, comment out the std::sort() call:
with std::sort(): 375ms
w/o : 281ms
I seem to have a similar machine in terms of performance, compared to
the original poster :)

The program's runtime is dominated by the I/O, executed in both cases
by the same OS back-end functions.
The rest of the program is mainly comparing and shoving around memory
segments, I assume in both cases executed by library *machine* code.
It's only natural to me that the execution time is nearly identical.
Memory allocation seems to be no issue, at least not for C++ - if I
comment out the buf.reserve() call, no change in runtime is
noticeable.
However, the Java code pre-allocates 5000 lines, the C++ version
50000. Somebody with a Java environment may check out what happens if
the number is adjusted.
(The example text is ~31,000 lines).
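A plausible reason the reserve() call makes no visible difference is that std::vector grows its capacity geometrically, so filling ~31,000 lines without it triggers only a small number of reallocations. The sketch below counts them; count_reallocations is a hypothetical helper, and the exact count depends on the standard library's growth factor.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends n empty strings to a vector that was reserved for reserve_hint
// elements and returns how many times the capacity changed along the way.
std::size_t count_reallocations(std::size_t n, std::size_t reserve_hint)
{
    std::vector<std::string> buf;
    buf.reserve(reserve_hint);

    std::size_t reallocations = 0;
    std::size_t capacity = buf.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        buf.emplace_back();
        if (buf.capacity() != capacity) {   // the vector had to grow
            ++reallocations;
            capacity = buf.capacity();
        }
    }
    return reallocations;
}
```

count_reallocations(31000, 50000) returns 0 because the reserved capacity is never exceeded, while count_reallocations(31000, 0) returns a small implementation-dependent number, which is cheap next to reading and writing the file.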

One more thing caught my eye: The bible file contains a single empty
line that is processed by the C++ version but not by the Java version.
One extra empty line is not much, but induces O(log n) extra steps for
the sorting. If I modify the C++ program to disregard the empty line,
computing time goes down to 358ms (or 94ms --> 77ms for the sorting
only!).

I also notice that the time included does not involve releasing memory
used by the Java-program which is unfair as this time was measured in the
C++ version.

Plus, the considerable effort for loading and initialization, and
garbage collection of the Java VM is not included.

Be that as it is, I notice that the C++ version is fifty percent
shorter which suggests that developing with C++ will be quite a lot
faster.

While I agree in part, IMHO you are not referring to the right reason.
The physical typing of the programs should not make a big difference -
in C++ you can use really nifty constructs that save plenty of source
bytes. However, most other developers will have problems reading your
code - even you yourself may not be able to explain a code snippet you
wrote "ad hoc" only one week later.
Just as an example, the infamous Ackermann function:

int ack( const int m, const int n ) {
    return m?n?ack(m-1,ack(m,n-1)):ack(m-1,1):n+1;
}

This *is* valid C++ code - a real space-saver, horrible style. I would
prefer the more typing intensive, but better manageable version:

int ack( const int m, const int n ) {
    if ( m == 0 ) return n+1;
    if ( n == 0 ) return ack( m-1, 1 );
    return ack( m-1, ack( m, n-1 ) );
}

The two versions are not identical in run-time efficiency: the "nifty"
version takes 5.4 s on my system for ack(4, 1), the lengthy one only
3.6 s. I have no quick explanation for the difference, though.
For my feeling, too, the Java version looks clumsy style - it is
harder to understand.

just my EURO.02,

Michael.
 
Lew

James said:
the real reason I
use C++ is because my applications have to be robust, and it's
easier to develop correct code with C++ than with Java.

YMMV. I find that Java supports correct code, robustness and scalability a
LOT more than C++. But then, I like emacs better than vi, too.

Neither one of us can claim that either language makes it "easier to develop
correct code" without a whole lot of evidence, and factoring in the impedance
match to the developer's mind.

The best you can aver is that you /feel/ that C++ makes it easier /for you/ to
develop correct code, for certain values of "correct".
 
