Array optimizing problem in C++?

J

jason.cipriani


Razii, you are benchmarking compiler optimization techniques, not
language differences. Again, just as with your I/O hardware
benchmarks, your tests have two many variables in them to be used as
simply a comparison between C++ and Java.

Jason
 
R

Razii

g++: 4.080 g++ -O3 bench.cpp -o bench; time ./bench
g++2: 4.080 g++ -O3 bench2.cpp -o bench2; time ./bench2
ocaml: 4.048 ocamlopt bench.ml -o bench; time ./bench
java: 4.016 javac Arrays.java; time java Arrays

Make sure you are using server VM to run it ... that is,
java -server Arrays
 
J

Jerry Coffin

[ ... ]
Out of curiosity, I tried to test if the above is true. It didn't make
any difference. In fact, C++ was a bit faster (not by much, just 6%).
Probably due to array bound check in Java, if there is in indeed an
issue with C++ arrays, overall there is no difference.

Or maybe it did make a difference, and C++ would have won by a much
larger margin if it hadn't been for aliasing problems. If you're looking
at heavy-duty numeric programming (which is where this sort of problem
typically arises) the normal standard of comparison is Fortran.
 
A

Andrey Tarasevich

Razii said:
Out of curiosity, I tried to test if the above is true. It didn't make
any difference. In fact, C++ was a bit faster (not by much, just 6%).
Probably due to array bound check in Java, if there is in indeed an
issue with C++ arrays, overall there is no difference.
...

I don't really see much sense in comparing Java performance with C++
performance. What might be more interesting is the effect of the 'restrict'
specifier supported by C99 compilers (and many C89/90 and C++ compielers as an
extension), which is intended to assist compiler in performing exactly this kind
of optimizations.
 
J

Jerry Coffin

[ ... ]
I don't really see much sense in comparing Java performance with C++
performance. What might be more interesting is the effect of the 'restrict'
specifier supported by C99 compilers (and many C89/90 and C++ compielers as an
extension), which is intended to assist compiler in performing exactly this kind
of optimizations.

Of, if you're really interested in C++, you could throw in a comparison
use a valarray, which was designed to give the same sort of assurance.
Unfortunately, I don't know of anybody who seems to have gone to much
(if any) trouble to optimize valarray at all -- rather the contrary, it
was an idea that even its own creator admits was in the wrong place at
the wrong time, so it's ignored almost to death, so to speak.
 
R

Razii

Or maybe it did make a difference, and C++ would have won by a much
larger margin if it hadn't been for aliasing problems.

No it's not. The compiler probably inlined the method, so there was no
aliasing problem here.
If you're looking
at heavy-duty numeric programming (which is where this sort of problem
typically arises) the normal standard of comparison is Fortran.

If you read the post carefully, you would have noticed that Kanze was
discussing java and c++. Why would I use Fortran?
 
J

Jon Harrop

Only sometimes; and it's a valid optimization.

No, that is completely wrong. Valid optimizations must not break correct
code as -ffast-math does. This is why -ffast-math is not enabled by any -On
setting. Read the documentation.
Specifically, in this case, the results are identical.

They may appear identical on your machine with your compiler but that is
certainly not true in general. In particular, the -ffast-math option breaks
this program on my computer, giving incorrect results.
Mostly, in my experience, you start
to lose precision with -ffast-math when you start doing things beyond
simple arithmetic, such as sqrt() and cos(), or when you get into the
realm of overflows and NaNs.

In case anybody is curious the Intel compiler yields similar results
to VS, and to GCC with SSE3 enabled (but no -ffast-math), which is the
expected results:

icl /Ox /QxP /Qipo /Qunroll-aggressive smooth.cpp

Was about 7400 ms for me. With:

icl /Ox /QxP /Qipo /Qprec-div- /Qunroll-aggressive smooth.cpp

Dropping it down to 1100 ms (ICC's /Qprec-div- is similar in spirit to
GCC's -ffast-math).

Following are 3 source files and a Makefile, I used MinGW GCC 3.4.5;
you will want to implement your own tick()/tock() functions; the
windows.h #include is only for those. The output, for me, is:

$ ./smooth.exe
no -ffast-math: 8796.27
-ffast-math: 923.052
1e-014
delta: 0
they are precisely equal.

Compiling with -ffast-math gives 25% incorrect results on this machine (AMD
Athlon(tm) 64 X2 Dual Core Processor 4400+, g++ 4.2.3):

$ g++ -O3 test1.cpp -o test1
$ ./test1 >output.txt
$ g++ -O3 -ffast-math test1.cpp -o test1
$ ./test1 >output2.txt
$ diff output.txt output2.txt | wc
21242 42480 615958

Note: I replaced "rand()" in "fill" with "i" to make the program
deterministic.

Here are some of the differing results (correct results first):

49361.00000000000000000000
49362.00000000000000000000
49363.00000000000000000000
49364.00000000000000000000
49365.00000000000000000000

49360.99999999998544808477
49361.99999999997817212716
49362.99999999995634425431
49363.99999999992724042386
49364.99999999989813659340

As you can see, enabling -ffast-math really does break this program. As I
said, this is not a valid optimization.
 
J

Jon Harrop

Razii said:
No it's not. The compiler probably inlined the method, so there was no
aliasing problem here.

This benchmark appears to run several times faster with -ffast-math on
several machines. Until we find out why, I don't believe there is any point
in speculating about the relative importance of aliasing in this context.
If you read the post carefully, you would have noticed that Kanze was
discussing java and c++. Why would I use Fortran?

Using Fortran instead of Java would remove a lot of irrelevant
complications, like JIT, HotSpot, slow startup times, ugly programmers etc.
 
J

Jon Harrop

Razii said:
Is there a difference between float and double in OCaml? If yes,
wouldn't it make difference here?

Except for specialized container types, OCaml only provides 64-bit floats.
The keyword "float" there means "double" in C/C++/Java. So there was no
problem with my benchmark results.

Indeed, it might be interesting to do the same comparison again but with
32-bit floats.
 
D

dave_mikesell

The problem is C++ is already suffering from bloating problem.

What? Haven't you been complaining that its library is too small?
Don't you prefer a language that produces a 22MB "Hello World"
program?
 
J

James Kanze

What all other bench marks? :) I only posted one IO example. This
post was not a benchmark but a question to Kanze to prove by posting
an example (that I can test) where aliasing in c++ makes optimizing
arrays hard (or impossible). The example that he gave (which I used),
for ( size_t i = 1 ; i < len - 1 ; ++ i ) {
dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;
}
c++ was in fact faster. Obviously aliasing was no issue in
this case, as he claimed.

It was obvious that the Java optimizer wasn't very good. Or
that the code was used in a way that didn't allow the Java
optimizer to do as good a job as it was capable. The fact
remains that aliasing is an issue here, and that Java should be
able to optimize the loop better than C++, because of the lack
of aliasing in Java.

The fact that you're not intelligent enough to understand the
issues, and prefer annecdotal evidence, doesn't change anything.
 
J

James Kanze

The bench2 program is the following, more idiomatic, C++
version that uses bounds checking:

You use vector<>::at(), which is certainly not idiomatic
C++---I've never seen C++ code which used it, nor which should.
Most implementations of vector<>::eek:perator[] do the right thing
(assertion failure on bounds error) if compiled with the correct
options. (FWIW: VC++ requires some special
option---non-documented, I think---to turn off all runtime
checking. I don't know whether bounds checking is part of this
or not.)
 
J

James Kanze

[ ... ]
Out of curiosity, I tried to test if the above is true. It
didn't make any difference. In fact, C++ was a bit faster
(not by much, just 6%). Probably due to array bound check
in Java, if there is in indeed an issue with C++ arrays,
overall there is no difference.
Or maybe it did make a difference, and C++ would have won by a
much larger margin if it hadn't been for aliasing problems. If
you're looking at heavy-duty numeric programming (which is
where this sort of problem typically arises) the normal
standard of comparison is Fortran.

Quite. It is problems like these which caused the Fortran
standard to say that such aliasing results in undefined
behavior. Precisely so that the compiler can do the important
optimizations.

Of course, this is all a discussion about the optimizations
which the language allows. Whether any particular
implementation actually does them or not is a different
question.
 
R

Razii

The fact that you're not intelligent enough to understand the
issues, and prefer annecdotal evidence, doesn't change anything.

You are the one who claimed that you can write code based on arrays
aliasing problem that would show c++ to be slower than java. You also
claimed that such code often shows up in benchmarks by people claiming
java is fast. Now that I asked you to post proof, you are claiming
that java optimizing is not good and that I am not intelligent enough.
Nice excuse and ad hominem but that doesn't change the fact that you
were wrong. You have been proven wrong and now you are running away by
name calling. Why did you claim that such code often shows up in
benchmark by java people? Where is it?
 
L

Lionel B

[ ... ]
I don't really see much sense in comparing Java performance with C++
performance. What might be more interesting is the effect of the
'restrict' specifier supported by C99 compilers (and many C89/90 and
C++ compielers as an extension), which is intended to assist compiler
in performing exactly this kind of optimizations.

Of, if you're really interested in C++, you could throw in a comparison
use a valarray, which was designed to give the same sort of assurance.
Unfortunately, I don't know of anybody who seems to have gone to much
(if any) trouble to optimize valarray at all -- rather the contrary, it
was an idea that even its own creator admits was in the wrong place at
the wrong time, so it's ignored almost to death, so to speak.

FWIW on my (gcc 4.1.1 supplied) implementation valarrays appear to be
simply pointers to new'ed memory with a (GCC-specific) __restrict__
qualifier.

Personally I've never managed to code up a scenario (using GCC with
various optimisations) where __restrict__ appears to have made any
difference whatsoever. I seem to recall that this has come up now and
again - inconclusively - on the g++ lists.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,777
Messages
2,569,604
Members
45,203
Latest member
KaliShumat

Latest Threads

Top