incredible slowdown switching to 64 bit g++


nandor.sieben

I have a fairly complex C++ program that uses a lot of STL, number
crunching using doubles, and the lapack library (-llapack -lblas -lg2c
-lm). The code works fine on any 32 bit unix machine compiled with g++,
but when I try it on a 64 bit machine a running time of 10 seconds
becomes 15 minutes. The code is complex; I could not create a simple
subset that reproduces the problem. I tried this on several 32 and 64
bit machines. The speeds of the machines are comparable. I use -O2
optimization. The program is not swapping to disk. What could cause
this incredible slowdown?

Some suspects:

- The lapack library
- Tolerances I use for floating point comparisons
- Large vector<vector<int > > variables ( even vector<vector<vector< >> >  variable )
- Need a compiler option on the 64 bit machines?
- Random number generator
 

Kai-Uwe Bux

I have a fairly complex C++ program that uses a lot of STL, number
crunching using doubles and the lapack library (-llapack -lblas -lg2c -
lm). The code works fine on any 32 bit unix machine compiled with g++
but when I try it on a 64 bit machine a running time of 10 seconds
becomes 15 minutes. The code is complex, I could not create a simple
subset that produces this problem. I tried this on several 32 and 64
bit machines. The speed of the machines are comparable. I use -O2
optimization. The program is not swapping to disk. What could cause
this incredible slowdown?
[snip]

Just a quick question: does the program do the same thing on a 64-bit
machine as on a 32-bit machine? Has testing shown that for the same
input you get the same output?


Best

Kai-Uwe Bux
 

Ian Collins

I have a fairly complex C++ program that uses a lot of STL, number
crunching using doubles and the lapack library (-llapack -lblas -lg2c -
lm). The code works fine on any 32 bit unix machine compiled with g++
but when I try it on a 64 bit machine a running time of 10 seconds
becomes 15 minutes. The code is complex, I could not create a simple
subset that produces this problem. I tried this on several 32 and 64
bit machines. The speed of the machines are comparable. I use -O2
optimization. The program is not swapping to disk. What could cause
this incredible slowdown?
You'll have to profile to find out. It's not uncommon for 64 bit
executables to be slower when the code has been tuned for 32bit. That's
one reason why 32 bit executables are still common on 64 bit platforms.
Some suspects:

-The lapack library
- Tolerances I use for floating point comparisons

Shouldn't matter.

- Large vector<vector<int > > variables ( even vector<vector<vector< >> >  variable )

Shouldn't matter. A heavy use of long might.

Try a gcc or Linux group or maybe comp.unix.programmer.
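
To make the "tuned for 32 bit" point concrete, here is a minimal sketch
you can compile in both modes; the numbers in the comments are what a
typical Linux g++/libstdc++ gives and are illustrative, not guaranteed:

#include <iostream>
#include <vector>

int main()
{
    // On a typical 32 bit Linux/g++ build this prints 4, 4, 12, 4, 8;
    // on a typical 64 bit (LP64) build it prints 8, 8, 24, 4, 8.
    // int and double keep their sizes, but long, pointers, and the
    // three-pointer bookkeeping inside every std::vector double in
    // size, so nested vector< vector<double> > structures use more
    // memory and cache in 64 bit mode.
    std::cout << "sizeof(long)                = " << sizeof(long) << '\n';
    std::cout << "sizeof(void*)               = " << sizeof(void*) << '\n';
    std::cout << "sizeof(std::vector<double>) = " << sizeof(std::vector<double>) << '\n';
    std::cout << "sizeof(int)                 = " << sizeof(int) << '\n';
    std::cout << "sizeof(double)              = " << sizeof(double) << '\n';
    return 0;
}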
 

nandor.sieben

Just a quick question: does the program do the same thing on a 64-bit
machine as on a 32-bit machine? Has testing shown that for the same
input you get the same output?

There are small differences in the values of doubles but I guess
that's not unexpected.
 

nandor.sieben

You'll have to profile to find out.  

I did try profiling but I could not make much sense of it. Generally it
seemed like everything takes somewhat longer.
It's not uncommon for 64 bit
executables to be slower when the code has been tuned for 32bit.  That's
one reason why 32 bit executables are still common on 64 bit platforms.

But could it be such a huge difference? What does it mean to be tuned
for 32 bit? The code does not depend on it; could it be that the STL
library or the lapack library is optimized for 32 bit?
Shouldn't matter.  A heavy use of long might.

No long in the code.
 

nandor.sieben

It's not uncommon for 64 bit
executables to be slower when the code has been tuned for 32bit.  That's
one reason why 32 bit executables are still common on 64 bit platforms.

I am not trying to run the executable compiled on the 32 bit machine.
I recompile everything on the 64 bit machines.
 

nandor.sieben

I am not trying to run the executable compiled on the 32 bit machine.

It is my own code. Since I have the source code it makes sense to
recompile and hope it will be optimized for the new machine. I don't
know if the 32 bit executable would run on the 64 bit machines but
perhaps I should try that.

Could this piece of code be responsible?

extern "C"
{
void dsyev_ (const char *jobz,
const char *uplo,
const int &n,
double a[],
const int &lda,
double w[], double work[], int &lwork, int &info);
}

int
dsyev (const vector said:
vector < vector < double > >&evec)
{
....
dsyev_ ("V", "U", n, a, n, w, work, lwork, info);
....
}

This is how I use the Fortran lapack library. Perhaps the type sizes
change differently in C++ and in Fortran when going from 32 bit to 64
bit.
 

Ian Collins

It is my own code. Since I have the source code it makes sense to
recompile and hope it
will be optimized for the new machine. I don't know if the 32 bit
executable would run on
the 64 bit machines but perhaps I should try that.
Under any decent OS, they should. I don't use 32 bit systems any more
and I seldom build 64 bit executables.
Could this piece of code be responsible?

extern "C"
{
void dsyev_ (const char *jobz,
const char *uplo,
const int &n,
double a[],
const int &lda,
double w[], double work[], int &lwork, int &info);
}
Doubles or ints shouldn't be an issue.

You should try asking on a more specialised group. You should be able
to find a 64-bit porting guide for your platform.
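
If you want to rule out the type-size worry directly, here is a
compile-time sketch you can build on both machines. It assumes the
LAPACK/BLAS you link against uses the default 32-bit Fortran INTEGER
and 64-bit DOUBLE PRECISION (the usual case on Linux); the typedef
names are made up:

// Pre-C++11 compile-time size checks: a typedef of an array with a
// negative size does not compile, so if either assumption below is
// wrong the build fails.  If both files compile on the 32 bit and the
// 64 bit machine, the dsyev_ prototype keeps matching in both modes.
typedef char fortran_integer_matches_int[sizeof(int) == 4 ? 1 : -1];
typedef char fortran_double_matches_double[sizeof(double) == 8 ? 1 : -1];

int main() { return 0; }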
 

Maxim Yegorushkin

I have a fairly complex C++ program that uses a lot of STL, number
crunching using doubles and the lapack library (-llapack -lblas -lg2c -
lm). The code works fine on any 32 bit unix machine compiled with g++
but when I try it on a 64 bit machine a running time of 10 seconds
becomes 15 minutes. The code is complex, I could not create a simple
subset that produces this problem. I tried this on several 32 and 64
bit machines. The speed of the machines are comparable. I use -O2
optimization. The program is not swapping to disk. What could cause
this incredible slowdown?

[snip]

Have you tried comparing 32 and 64-bit versions compiled on the very
same machine? Use the -m32 compiler switch to compile a 32-bit version.
 

James Kanze

There are small differences in the values of doubles but I
guess that's not unexpected.

Not really. Both the 64 bit machine and the 32 bit one are
probably using IEEE doubles. Even on a 32 bit machine, a double
is normally 64 bits.

Compiling in 64 bit mode will often result in some reduction in
speed, because of larger program size, and thus poorer locality.
I can't imagine this representing more than a difference of
about 10 or 20 percent, however, and I would expect it usually
to be a lot less.

Have you profiled the two cases, to see which functions have
become significantly slower?
 

joseph cook

On Nov 25, 1:21 am, (e-mail address removed) wrote:
The program is not swapping to disk. What could cause
this incredible slowdown?

Some suspects:

-The lapack library
- Tolerances I use for floating point comparisons
- Large vector<vector<int > > variables ( even vector<vector<vector< >> >  variable )

- Need a compiler option on the 64 bit machines?
- Random number generator

Your #1 suspect you have not mentioned: you switched out the
machine! The amount of memory and the architecture (how much L1
cache, for example) will make major differences. I'm sure you got
this new machine because it has better specs, but maybe it is short on
memory for the load. (Is this program competing with a Windows OS
while it runs now, or something similar?) Maybe there is an
architecture-specific flag you should be adding on these new
machines?

Joe
 

gpderetta

I have a fairly complex C++ program that uses a lot of STL, number
crunching using doubles and the lapack library (-llapack -lblas -lg2c -
lm). The code works fine on any 32 bit unix machine compiled with g++
but when I try it on a 64 bit machine a running time of 10 seconds
becomes 15 minutes. The code is complex, I could not create a simple
subset that produces this problem. I tried this on several 32 and 64
bit machines. The speed of the machines are comparable. I use -O2
optimization. The program is not swapping to disk. What could cause
this incredible slowdown?

Some suspects:

-The lapack library
- Tolerances I use for floating point comparisons
- Large vector<vector<int > > variables ( even vector<vector<vector< >> >  variable )

- Need a compiler option on the 64 bit machines?
- Random number generator

Random guess: maybe the lapack/blas library uses hand-optimized
assembler in 32 bit mode while it uses unoptimized C code (or Fortran
or whatever) in 64 bit mode. But I do not think this is enough to
explain the slowdown.
 

Lionel B

Random guess: maybe the lapack/blas library uses hand optimized
assembler in 32 bit mode while it uses unoptimized C code (or fortran
or whatever) in 64 mode.

That was my thought; many default OS installations will supply an
unoptimised "reference" BLAS/LAPACK.
But I do not think this is enough to explain the slow down.

I have seen pretty drastic performance hits with reference BLAS/LAPACK as
compared with a vendor-supplied optimised version, or something like
ATLAS, although not quite that dramatic...

[BTW, is there something odd about the wrapping/flowing in the previous
article? My newsreader seems to display it as blank unless I force it to
wrap.]
 

nandor.sieben

Have you tried comparing 32 and 64-bit versions compiled on the very
same machine? Use -m32 compiler switch to compile a 32-bit version.
Max

I compiled the code on the 32-bit machine and ran it on the 64-bit
machine. It runs without the extreme slowdown.

I was not able to compile on the 64-bit machine using -m32. It
produces the error message:
/usr/bin/ld: cannot open gcrt1.o: No such file or directory
collect2: ld returned 1 exit status
 

nandor.sieben

Your #1 suspect you have not mentioned.  You switched out the
machine!   The amount of memory and the architecture (how much L1
cache for example) will make major differences.   I'm sure you got
this new machine because it has better specs, but maybe it is short on
memory for the loading.  (Is this program competing with a Windows O/S
while it runs now, or something similar?).  Maybe there is a
architecture specific flag you should be adding on these new
machines ?

Joe

I tried it on two different 32-bit machines (Red Hat, Ubuntu) and three
different 64-bit machines. The machines don't matter. The compiler is
g++ in all 5 cases. The systems are comparable in speed and memory.
 

nandor.sieben

Random guess: maybe the lapack/blas library uses hand optimized
assembler in 32 bit mode while it uses unoptimized C code (or fortran
or whatever) in 64 mode. But I do not think this is enough to explain
the slow down.

Based on printed output, it looks like everything is slower, not only
the lapack calls.
 

Maxim Yegorushkin

I compiled the code on the 32-bit machine and ran it on the 64-bit
machine. It runs without the extreme slowdown.

I was not able to compile on the 64-bit machine using -m32.
It produces the error message:
/usr/bin/ld: cannot open gcrt1.o: No such file or directory
collect2: ld returned 1 exit status

It is a linker error, not a compiler one. Try using the -m32 switch for
linking as well.
 

nandor.sieben

It is a linker error, not a compiler one. Try using -m32 switch for
linking as well.

I had the -m32 for the linker. I moved to another 64-bit machine. Same
slowness with a regular compile. It runs as fast as on the 32-bit
machines when I compile with -m32.
 

nandor.sieben

Thank you everybody for the help. I have found the solution.
It is simple: I just need the compiler flag -ffast-math.
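
One plausible explanation for why an optimization flag makes such a
dramatic difference (a guess, not something verified in this thread):
the 64 bit build does its double arithmetic in SSE registers, where
operations on subnormal ("denormal") values can be very slow, while the
32 bit x87 code mostly avoids them thanks to its extended-precision
registers. With g++, -ffast-math links in startup code that puts SSE
into flush-to-zero / denormals-are-zero mode, which removes that
penalty. A tiny sketch of the effect; the constants and iteration count
are arbitrary:

#include <cstdio>

int main()
{
    // x starts as a subnormal double (below about 2.2e-308) and stays
    // subnormal throughout the loop, so almost every multiply and add
    // below operates on denormal values.  Built with plain -O2 on a
    // 64 bit machine this can run orders of magnitude slower than the
    // same program compiled with -O2 -ffast-math, where the startup
    // code flushes such values to zero.
    double x = 1e-320;
    double sum = 0.0;
    for (long i = 0; i < 100000000; ++i) {
        x *= 1.0000001;   // remains in the subnormal range
        sum += x;
    }
    std::printf("sum = %g\n", sum);
    return 0;
}

Note that -ffast-math also relaxes strict IEEE semantics, so results
can change slightly; that would also be consistent with the small
differences in printed doubles mentioned earlier in the thread.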
 
