incredible slowdown switching to 64 bit g++

Discussion in 'C++' started by nandor.sieben@gmail.com, Nov 25, 2008.

  1. Guest

    I have a fairly complex C++ program that uses a lot of STL, does number
    crunching with doubles, and calls the lapack library (-llapack -lblas
    -lg2c -lm). The code works fine on any 32 bit unix machine compiled with
    g++, but when I try it on a 64 bit machine a running time of 10 seconds
    becomes 15 minutes. The code is complex, and I could not create a simple
    subset that reproduces the problem. I tried this on several 32 and 64
    bit machines of comparable speed. I use -O2 optimization. The program is
    not swapping to disk. What could cause this incredible slowdown?

    Some suspects:

    - The lapack library
    - Tolerances I use for floating point comparisons
    - Large vector<vector<int> > variables (even vector<vector<vector< > > > variables)
    - Need a compiler option on the 64 bit machines?
    - Random number generator
     
    , Nov 25, 2008
    #1

  2. Kai-Uwe Bux Guest

    wrote:

    > I have a fairly complex C++ program that uses a lot of STL, number
    > crunching using doubles and the lapack library (-llapack -lblas -lg2c
    > -lm). The code works fine on any 32 bit unix machine compiled with g++
    > but when I try it on a 64 bit machine a running time of 10 seconds
    > becomes 15 minutes. The code is complex, I could not create a simple
    > subset that produces this problem. I tried this on several 32 and 64
    > bit machines. The speed of the machines are comparable. I use -O2
    > optimization. The program is not swapping to disk. What could cause
    > this incredible slowdown?

    [snip]

    Just a quick question: does the program do the same thing on a 64bit machine
    as on a 32bit machine? Has testing shown that for the same input you get
    the same output?


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, Nov 25, 2008
    #2

  3. Ian Collins Guest

    wrote:
    > I have a fairly complex C++ program that uses a lot of STL, number
    > crunching using doubles and the lapack library (-llapack -lblas -lg2c
    > -lm). The code works fine on any 32 bit unix machine compiled with g++
    > but when I try it on a 64 bit machine a running time of 10 seconds
    > becomes 15 minutes. The code is complex, I could not create a simple
    > subset that produces this problem. I tried this on several 32 and 64
    > bit machines. The speed of the machines are comparable. I use -O2
    > optimization. The program is not swapping to disk. What could cause
    > this incredible slowdown?
    >

    You'll have to profile to find out. It's not uncommon for 64 bit
    executables to be slower when the code has been tuned for 32bit. That's
    one reason why 32 bit executables are still common on 64 bit platforms.

    > Some suspects:
    >
    > -The lapack library
    > - Tolerances I use for floating point comparisons


    Shouldn't matter.

    > - Large vector<vector<int > > variables ( even vector<vector<vector< > > > variable )


    Shouldn't matter. A heavy use of long might.

    Try a gcc or Linux group or maybe comp.unix.programmer.

    --
    Ian Collins
     
    Ian Collins, Nov 25, 2008
    #3
  4. Guest

    > Just A Quick question: does the program do the same thing on a 64bit machine
    > as on a 32bit machine? has testing shown that for the same input you get
    > the same output?


    There are small differences in the values of doubles but I guess
    that's not unexpected.
     
    , Nov 25, 2008
    #4
  5. Guest

    > You'll have to profile to find out.  

    I did try profiling but I could not make much sense of it. Generally it
    seemed like everything takes somewhat longer.

    > It's not uncommon for 64 bit
    > executables to be slower when the code has been tuned for 32bit.  That's
    > one reason why 32 bit executables are still common on 64 bit platforms.


    But could it be such a huge difference? What does it mean to be tuned
    for 32bit? The code does not depend on it. Could it be that the STL
    library or the lapack library is optimized for 32 bit?

    > Shouldn't matter.  A heavy use of long might.


    No long in the code.
     
    , Nov 25, 2008
    #5
  6. Guest

    > It's not uncommon for 64 bit
    > executables to be slower when the code has been tuned for 32bit.  That's
    > one reason why 32 bit executables are still common on 64 bit platforms.


    I am not trying to run the executable compiled on the 32 bit machine.
    I recompile everything on the 64 bit machines.
     
    , Nov 25, 2008
    #6
  7. Ian Collins Guest

    wrote:
    >> It's not uncommon for 64 bit
    >> executables to be slower when the code has been tuned for 32bit. That's
    >> one reason why 32 bit executables are still common on 64 bit platforms.

    >
    > I am not trying to run the executable compiled on the 32 bit machine.
    > I recompile
    > everything on the 64 bit machines.


    Why?

    --
    Ian Collins
     
    Ian Collins, Nov 25, 2008
    #7
  8. Guest

    > > I am not trying to run the executable compiled on the 32 bit machine.
    > > I recompile
    > > everything on the 64 bit machines.

    >
    > Why?


    It is my own code. Since I have the source code it makes sense to
    recompile and hope it will be optimized for the new machine. I don't
    know if the 32 bit executable would run on the 64 bit machines but
    perhaps I should try that.

    Could this piece of code be responsible?

    extern "C"
    {
      // Fortran LAPACK routine: eigenvalues and (optionally) eigenvectors
      // of a real symmetric matrix.
      void dsyev_ (const char *jobz,
                   const char *uplo,
                   const int &n,
                   double a[],
                   const int &lda,
                   double w[], double work[], int &lwork, int &info);
    }

    int
    dsyev (const vector < vector < double > >&mat,
           vector < double >&eval,
           vector < vector < double > >&evec)
    {
      ....
      dsyev_ ("V", "U", n, a, n, w, work, lwork, info);
      ....
    }

    This is how I use the Fortran lapack library. Perhaps the type sizes
    change differently in C++ and in Fortran when going from 32 bit to
    64 bit.
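
One quick way to check the type-size question is to print the widths the compiler actually uses on each machine. The sketch below is illustrative only: `bits_of` and `type_size_report` are hypothetical helper names, not part of the poster's program. On the LP64 model used by 64-bit Linux, `int` and `double` keep the widths they have in a 32-bit build while `long` and pointers grow to 64 bits, so an `int` passed by reference should still match a default 32-bit Fortran INTEGER (assuming the LAPACK library was not built with 8-byte default integers).

```cpp
#include <cassert>   // assert, handy for quick checks when calling the helpers
#include <climits>
#include <cstddef>
#include <sstream>
#include <string>

// Width in bits of type T on the current build target.
template <typename T>
constexpr std::size_t bits_of() { return sizeof(T) * CHAR_BIT; }

// Hypothetical helper: one-line summary of the type widths that matter
// for the Fortran interface and for pointer-heavy containers.
std::string type_size_report() {
    std::ostringstream out;
    out << "int=" << bits_of<int>()
        << " long=" << bits_of<long>()
        << " pointer=" << bits_of<void*>()
        << " double=" << bits_of<double>();
    return out.str();
}
```

Printing `type_size_report()` from both the regular build and a `-m32` build and comparing the two strings shows which types actually changed.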
     
    , Nov 25, 2008
    #8
  9. Ian Collins Guest

    wrote:
    >>> I am not trying to run the executable compiled on the 32 bit machine.
    >>> I recompile
    >>> everything on the 64 bit machines.

    >> Why?

    >
    > It is my own code. Since I have the source code it makes sense to
    > recompile and hope it
    > will be optimized for the new machine. I don't know if the 32 bit
    > executable would run on
    > the 64 bit machines but perhaps I should try that.
    >

    Under any decent OS, they should. I don't use 32 bit systems any more
    and I seldom build 64 bit executables.

    > Could this piece of code be responsible?
    >
    > extern "C"
    > {
    > void dsyev_ (const char *jobz,
    > const char *uplo,
    > const int &n,
    > double a[],
    > const int &lda,
    > double w[], double work[], int &lwork, int &info);
    > }
    >

    doubles or ints shouldn't be an issue.

    You should try asking on a more specialised group. You should be able
    to find a 64 bit porting guide for your platform.

    --
    Ian Collins
     
    Ian Collins, Nov 25, 2008
    #9
  10. Maxim Yegorushkin

    On Nov 25, 6:21 am, wrote:
    > I have a fairly complex C++ program that uses a lot of STL, number
    > crunching using doubles and the lapack library (-llapack -lblas -lg2c
    > -lm). The code works fine on any 32 bit unix machine compiled with g++
    > but when I try it on a 64 bit machine a running time of 10 seconds
    > becomes 15 minutes. The code is complex, I could not create a simple
    > subset that produces this problem. I tried this on several 32 and 64
    > bit machines. The speed of the machines are comparable. I use -O2
    > optimization. The program is not swapping to disk. What could cause
    > this incredible slowdown?


    []

    Have you tried comparing 32 and 64-bit versions compiled on the very
    same machine? Use -m32 compiler switch to compile a 32-bit version.

    --
    Max
     
    Maxim Yegorushkin, Nov 25, 2008
    #10
  11. James Kanze Guest

    On Nov 25, 8:27 am, wrote:
    > > Just A Quick question: does the program do the same thing on
    > > a 64bit machine as on a 32bit machine? has testing shown
    > > that for the same input you get the same output?


    > There are small differences in the values of doubles but I
    > guess that's not unexpected.


    Not really. Both the 64 bit machine and the 32 bit one are
    probably using IEEE doubles. Even on a 32 bit machine, a double
    is normally 64 bits.
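
Even when both builds store doubles in the same 64-bit IEEE format, intermediate results may be computed at a different precision. By default g++ targets the x87 FPU on 32-bit x86, which keeps intermediates in 80-bit registers, and SSE2 on x86-64, which does not; that alone can account for small differences in the last digits. The sketch below reports the standard `FLT_EVAL_METHOD` macro for the current build. The function names are hypothetical, chosen for illustration.

```cpp
#include <cassert>   // assert, handy for quick checks when calling the helpers
#include <cfloat>
#include <string>

// Translate a FLT_EVAL_METHOD value into a short description.
std::string eval_method_description(int method) {
    switch (method) {
        case 0:  return "operations evaluated in the precision of their own type";
        case 1:  return "float and double operations evaluated as double";
        case 2:  return "all operations evaluated as long double";
        default: return "indeterminable or implementation-defined";
    }
}

// Description for the build this code is compiled into.
std::string current_eval_method() {
    return eval_method_description(FLT_EVAL_METHOD);
}
```

If the 32-bit and 64-bit builds report different evaluation methods, slightly different double results for the same input are expected rather than a sign of a bug.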

    Compiling in 64 bit mode will often result in some reduction in
    speed, because of larger program size, and thus poorer locality.
    I can't imagine this representing more than a difference of
    about 10 or 20 percent, however, and I would expect it usually
    to be a lot less.

    Have you profiled the two cases, to see which functions have
    become significantly slower?
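
When profiler output is hard to read, coarse manual timing of a few suspect sections can be a simpler first step. The following is a minimal sketch, assuming a C++11 compiler; `seconds_taken` and `best_of` are hypothetical helper names, and the sections to wrap are up to the reader.

```cpp
#include <cassert>
#include <chrono>

// Run a callable once and return the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Run the callable several times and return the shortest time, which
// reduces noise from other activity on the machine.
template <typename F>
double best_of(int runs, F&& work) {
    assert(runs > 0);
    double best = seconds_taken(work);
    for (int i = 1; i < runs; ++i) {
        const double t = seconds_taken(work);
        if (t < best) best = t;
    }
    return best;
}
```

Wrapping the same sections in the 32-bit and 64-bit builds and comparing the numbers side by side shows whether one call dominates or everything slows down uniformly.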

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Nov 25, 2008
    #11
  12. joseph cook Guest

    On Nov 25, 1:21 am, wrote:
    > The program is not swapping to disk. What could cause
    > this incredible slowdown?
    >
    > Some suspects:
    >
    > -The lapack library
    > - Tolerances I use for floating point comparisons
    > - Large vector<vector<int > > variables ( even vector<vector<vector< >> >  variable )
    >
    > - Need a compiler option on the 64 bit machines?
    > - Random number generator


    Your #1 suspect you have not mentioned. You switched out the
    machine! The amount of memory and the architecture (how much L1
    cache, for example) will make major differences. I'm sure you got
    this new machine because it has better specs, but maybe it is short on
    memory for the load. (Is this program competing with a Windows O/S
    while it runs now, or something similar?) Maybe there is an
    architecture-specific flag you should be adding on these new
    machines?

    Joe
     
    joseph cook, Nov 25, 2008
    #12
  13. gpderetta Guest

    On Nov 25, 7:21 am, wrote:
    > I have a fairly complex C++ program that uses a lot of STL, number
    > crunching using doubles and the lapack library (-llapack -lblas -lg2c
    > -lm). The code works fine on any 32 bit unix machine compiled with g++
    > but when I try it on a 64 bit machine a running time of 10 seconds
    > becomes 15 minutes. The code is complex, I could not create a simple
    > subset that produces this problem. I tried this on several 32 and 64
    > bit machines. The speed of the machines are comparable. I use -O2
    > optimization. The program is not swapping to disk. What could cause
    > this incredible slowdown?
    >
    > Some suspects:
    >
    > -The lapack library
    > - Tolerances I use for floating point comparisons
    > - Large vector<vector<int > > variables ( even vector<vector<vector< >> >  variable )
    >
    > - Need a compiler option on the 64 bit machines?
    > - Random number generator


    Random guess: maybe the lapack/blas library uses hand optimized
    assembler in 32 bit mode while it uses unoptimized C code (or fortran
    or whatever) in 64 bit mode. But I do not think this is enough to
    explain the slowdown.

    --
    Giovanni P. Deretta
     
    gpderetta, Nov 26, 2008
    #13
  14. Lionel B Guest

    On Wed, 26 Nov 2008 06:44:15 -0800, gpderetta wrote:
    > On Nov 25, 7:21 am, wrote:
    >
    >> I have a fairly complex C++ program that uses a lot of STL, number
    >> crunching using doubles and the lapack library (-llapack -lblas -lg2c
    >> -lm). The code works fine on any 32 bit unix machine compiled with g++
    >> but when I try it on a 64 bit machine a running time of 10 seconds
    >> becomes 15 minutes. The code is complex, I could not create a simple
    >> subset that produces this problem. I tried this on several 32 and 64
    >> bit machines. The speed of the machines are comparable. I use -O2
    >> optimization. The program is not swapping to disk. What could cause
    >> this incredible slowdown?
    >>
    >> Some suspects:
    >>
    >> -The lapack library
    >> - Tolerances I use for floating point comparisons
    >> - Large vector<vector<int > > variables ( even vector<vector<vector< >>
    >>  variable )
    >>
    >> - Need a compiler option on the 64 bit machines?
    >> - Random number generator


    > Random guess: maybe the lapack/blas library uses hand optimized
    > assembler in 32 bit mode while it uses unoptimized C code (or fortran
    > or whatever) in 64 mode.


    That was my thought; many default OS installations will supply an
    unoptimised "reference" BLAS/LAPACK.

    > But I do not think this is enough to explain the slow down.


    I have seen pretty drastic performance hits with reference BLAS/LAPACK as
    compared with a vendor-supplied optimised version, or something like
    ATLAS, although not quite that dramatic...

    [BTW, is there something odd about the wrapping/flowing in the previous
    article? My newsreader seems to display it as blank unless I force it to
    wrap.]

    --
    Lionel B
     
    Lionel B, Nov 26, 2008
    #14
  15. Guest


    > Have you tried comparing 32 and 64-bit versions compiled on the very
    > same machine? Use -m32 compiler switch to compile a 32-bit version.
    > Max


    I compiled the code on the 32-bit machine and ran it on the 64-bit
    machine. It runs without the extreme slowdown.

    I was not able to compile on the 64-bit machine using -m32. It
    produces the error message:
    /usr/bin/ld: cannot open gcrt1.o: No such file or directory
    collect2: ld returned 1 exit status
     
    , Nov 27, 2008
    #15
  16. Guest

    > Your #1 suspect you have not mentioned.  You switched out the
    > machine!   The amount of memory and the architecture (how much L1
    > cache for example) will make major differences.   I'm sure you got
    > this new machine because it has better specs, but maybe it is short on
    > memory for the loading.  (Is this program competing with a Windows O/S
    > while it runs now, or something similar?).  Maybe there is a
    > architecture specific flag you should be adding on these new
    > machines ?
    >
    > Joe


    I tried it on two different 32 bit machines (Redhat, Ubuntu) and three
    different 64-bit machines. The machines don't matter. The compiler is
    g++ in all 5 cases. The systems are comparable in speed and memory.
     
    , Nov 27, 2008
    #16
  17. Guest

    > Random guess: maybe the lapack/blas library uses hand optimized
    > assembler in 32 bit mode while it uses unoptimized C code (or fortran
    > or whatever) in 64 mode. But I do not think this is enough to explain
    > the slow down.
    > --
    > Giovanni P. Deretta


    Based on printed output, it looks like everything is slower, not only
    the lapack calls.
     
    , Nov 27, 2008
    #17
  18. Maxim Yegorushkin

    On Nov 27, 5:09 am, wrote:
    > > Have you tried comparing 32 and 64-bit versions compiled on the very
    > > same machine? Use -m32 compiler switch to compile a 32-bit version.
    > > Max

    >
    > I compiled the code on the 32-bit machine and ran it on the 64-bit
    > machine.
    > It runs without the extreme slowdown.
    >
    > I was not able to compile on the 64-bit machine using -m32.
    > It produces the error message:
    > /usr/bin/ld: cannot open gcrt1.o: No such file or directory
    > collect2: ld returned 1 exit status


    It is a linker error, not a compiler one. Try using the -m32 switch for
    linking as well.

    --
    Max
     
    Maxim Yegorushkin, Nov 27, 2008
    #18
  19. Guest

    > It is a linker error, not a compiler one. Try using -m32 switch for
    > linking as well.
    >
    > --
    > Max


    I had the -m32 for the linker. I moved to another 64 bit machine. Same
    slowness for regular compiling. It runs as fast as it does on the 32 bit
    machines when I compile with -m32.
     
    , Nov 28, 2008
    #19
  20. Guest

    Thank you everybody for the help. I have found the solution.
    It is simple, I just need the compiler flag -ffast-math.
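
The thread does not say why -ffast-math helped. One possibility, stated here as an assumption and not confirmed by anything above, is that the computation produces subnormal (denormal) doubles, which many x86 processors handle far more slowly than normal values; on x86 Linux, linking with -ffast-math typically enables flush-to-zero, which would avoid them. Note that -ffast-math also relaxes IEEE conformance, so results should be re-checked after enabling it. The sketch below is a way to test the assumption: `count_subnormals` is a hypothetical helper that could be applied to intermediate vectors in a build compiled without -ffast-math.

```cpp
#include <cassert>   // assert, handy for quick checks when calling the helper
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Count the entries of v that are subnormal (denormal) doubles.
std::size_t count_subnormals(const std::vector<double>& v) {
    std::size_t n = 0;
    for (double x : v) {
        if (std::fpclassify(x) == FP_SUBNORMAL) ++n;
    }
    return n;
}
```

A large count in the hot loops would support the subnormal explanation; a count of zero would point to some other effect of the flag.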
     
    , Nov 28, 2008
    #20
