ANSI C problem on P4 under Linux & Windows

Discussion in 'C Programming' started by VNG, Aug 22, 2004.

  1. VNG

    VNG Guest

    I have an ANSI C program that was compiled with MSVC++ 6.0 (SP6) under Windows
    and with the GNU compiler under Linux, and run on P3, P4 and AMD machines.

    It runs fine on the P3 and the AMD under both Windows and Linux, but on the P4
    it has problems. Under Windows, the 3 GHz P4 runs at half the speed of the
    800 MHz P3, and under Linux it not only runs slower (while the AMD is 40 times
    faster), but it also produces wrong numerical results.

    Any suggestions as to what the problem could be?

    How can we fix the P4 speed under MSVC++ (SP6)?
    How can we fix the P4's speed and numerical results under Linux?

    Here's some more details about the compilation:
    GNU:
    CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
    -funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused


    Basically, one of the most intensive loops (which we suspect, but aren't sure,
    is causing the problem) looks like this:

    static long loop_order;

    void functionname ()
    {
    register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
    register long j;
    :
    {
    register float c1, c2;
    j = loop_order;
    while (j--)
    {
    acc = *itPtr-- * c1;
    acc += *itPtr-- * c2;
    acc += *itPtr++ * c3;
    *cPtr++ += *iPtr1++ * acc;
    }
    }
    :
    }

    We have tried removing the "register" keyword and redeclaring "j" as
    volatile, with no change.


    Thanks,
    -- VNG
     
    VNG, Aug 22, 2004
    #1

  2. VNG

    SM Ryan Guest

    # {
    # register float c1, c2;
    # j = loop_order;
    # while (j--)
    # {
    # acc = *itPtr-- * c1;
    # acc += *itPtr-- * c2;
    # acc += *itPtr++ * c3;
    # *cPtr++ += *iPtr1++ * acc;
    # }
    # }

    Is there some reason to keep loading itPtr[-1] and itPtr[-2]
    inside the loop instead of outside?

    --
    SM Ryan http://www.rawbw.com/~wyrmwif/
    One of the drawbacks of being a martyr is that you have to die.
     
    SM Ryan, Aug 22, 2004
    #2

  3. VNG

    Profetas Guest

    Which OS do you have on your P3?

    A newer OS/compiler may not use registers to store your variables, which
    would be slower.
     
    Profetas, Aug 22, 2004
    #3
  4. VNG

    -berlin.de Guest

    Profetas <> wrote:
    > which OS do you have in your P3?


    Did you ever read the post? The OP writes it all at the start of his
    article.

    > newer OS/compiler may not use the register to store your vars which will
    > be slower


    That's simply BS. First of all, 'register' was never more than a
    hint to the compiler that a variable will be used a lot and that
    it might be a good idea to store it in a register. But the compiler
    was always free to disregard this hint. Moreover, newer compilers
    are usually quite good at figuring out such things, so you usually
    don't need the 'register' keyword anymore because the compiler will
    automatically pick the most suitable variables for keeping them in
    registers. And, finally, this has nothing at all to do with
    the OS.
    Regards, Jens
    --
    \ Jens Thoms Toerring ___ -berlin.de
    \__________________________ http://www.toerring.de
     
    -berlin.de, Aug 22, 2004
    #4
  5. VNG

    -berlin.de Guest

    VNG <> wrote:
    > I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
    > under Linux gnu, and ran under P3, P4 and AMD.


    > It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
    > problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and under
    > Linux not only that it runs slower (while AMD is 40 times faster), but it also
    > produces wrong numerical results...


    > Any suggestion what can be the problem?


    > How to fix the P4 speed under MSVC++ (SP6)?
    > How to fix P4's speed and numerical result under Linux?


    > Here's some more details about the compilation:
    > GNU:
    > CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
    > -funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused


    No idea about the speed issues - and that's rather off-topic here,
    because it's about the behavior of certain compilers in combination
    with certain processors, none of which has much to do with C. As for
    the wrong results with gcc, have another look at the info
    pages concerning the -ffast-math option:

    > This option should never be turned on by any `-O' option since it
    > can result in incorrect output for programs which depend on an
    > exact implementation of IEEE or ISO rules/specifications for math
    > functions.


    Perhaps it has something to do with this...

    In your place I would probably start by throwing out all those
    options and testing carefully which of them really make a difference
    - some of them could even result in a slow-down when used with the
    wrong processor type. And your code is actually so obfuscated (and,
    by the way, not the code you're really using) that a compiler might
    have trouble working out how to optimize it. Try to rewrite it in an
    understandable form and you might have a much better chance of getting
    it optimized. If you then find it's too slow you can still try to
    micro-optimize (but expect the effect to differ between compilers
    and processors).
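
As a rough illustration of what "an understandable form" could look like, here is a sketch that puts the posted pointer-walking loop next to an indexed version. The function names fir3_ptr and fir3_idx are made up for this example, c3 is passed in as a parameter because the post never declares it, and the caller is assumed to supply valid buffers (it must point at the last of n + 2 readable floats). Note that the net effect of the three pointer steps is that it moves back by one element per iteration.

```c
/* Pointer-walking form, following the loop in the original post.
   Hypothetical wrapper: all names and parameters are assumptions. */
void fir3_ptr(const float *it, const float *in1, float *c,
              float c1, float c2, float c3, long n)
{
    while (n--) {
        float acc;
        acc  = *it-- * c1;      /* reads it[0],  then steps back   */
        acc += *it-- * c2;      /* reads it[-1], then steps back   */
        acc += *it++ * c3;      /* reads it[-2], then steps forward */
        *c++ += *in1++ * acc;
    }
}

/* Indexed form: the same reads and writes, with the window spelled out.
   Iteration k reads it[-k], it[-k-1] and it[-k-2]. */
void fir3_idx(const float *it, const float *in1, float *c,
              float c1, float c2, float c3, long n)
{
    long k;
    for (k = 0; k < n; k++) {
        float acc = it[-k] * c1 + it[-k - 1] * c2 + it[-k - 2] * c3;
        c[k] += in1[k] * acc;
    }
}
```

Both versions should give identical results for values that are exactly representable; with options such as -ffast-math or on hardware with extended intermediate precision, the last bits may differ.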

    > Basically one of the most intensive loops (that we suspect in but aren't sure if
    > it causes the problem) looks like this:


    Profiling your code would probably be better than just guessing...

    > static long loop_order;


    > void functionname ()
    > {
    > register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;


    iPtr is twice defined, that should get the compiler quite a bit upset.

    > register long j;
    > :


    What's that colon good for?

    > {


    Why wrap this in another block?

    > register float c1, c2;


    Where do c1 and c2 ever get assigned values?

    > j = loop_order;
    > while (j--)
    > {
    > acc = *itPtr-- * c1;


    itPtr has never been assigned a value.

    > acc += *itPtr-- * c2;
    > acc += *itPtr++ * c3;


    c3 is never defined anywhere.

    > *cPtr++ += *iPtr1++ * acc;


    cPtr and iPtr1 also didn't get assigned values.

    > }
    > }


    Now, what the hell is all that supposed to do?

    Regards, Jens
    --
    \ Jens Thoms Toerring ___ -berlin.de
    \__________________________ http://www.toerring.de
     
    -berlin.de, Aug 22, 2004
    #5
  6. VNG

    CBFalconer Guest

    VNG wrote:
    >

    .... snip about systems - OT ...
    >
    > Basically one of the most intensive loops (that we suspect in but
    > aren't sure if it causes the problem) looks like this:
    >
    > static long loop_order;
    >
    > void functionname ()
    > {
    > register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
    > register long j;
    > :
    > {
    > register float c1, c2;
    > j = loop_order;
    > while (j--)
    > {
    > acc = *itPtr-- * c1;
    > acc += *itPtr-- * c2;
    > acc += *itPtr++ * c3;
    > *cPtr++ += *iPtr1++ * acc;
    > }
    > }
    > :
    > }
    >
    > We have tried to eliminate the use of the word "register" and
    > redefined "j" as volatile, no change.


    What are those isolated colons doing? The register keyword seems
    pointless, as does the volatile. Initializing the various
    pointers might help. Same for the cNs. c3 seems to be undefined.
    The time for multiplication can vary greatly with the operands.
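
On the remark that multiplication time can vary with the operands: one well-known case is subnormal (denormal) floating-point values, which some processors handle much more slowly than normal ones. Whether that is what is happening here is only an assumption; the thread does not establish it. A minimal sketch for checking a buffer, using the C99 fpclassify macro (so this is not strict ANSI C89) and a made-up function name:

```c
#include <math.h>    /* fpclassify, FP_SUBNORMAL (C99) */
#include <stddef.h>

/* Count how many values in buf are subnormal floats.
   count_subnormals is a hypothetical helper, not from the thread. */
size_t count_subnormals(const float *buf, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (fpclassify(buf[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

Running this over the input and output arrays after a slow run would show whether subnormals are present at all. Compile it without -ffast-math, since that option can change how such values are treated.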

    As ever, first measure. It should not be any great effort to do
    some profiling runs.
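
A real profiler is the better tool, but as a first measurement a small timing wrapper built on the standard clock() function can already show whether the suspected loop dominates. The sketch below is only an illustration: time_calls and sample_work are made-up names, and the workload is a placeholder to be replaced by the code under suspicion.

```c
#include <time.h>

/* Placeholder workload: increments the long that arg points to.
   Stands in for the real function to be measured. */
void sample_work(void *arg)
{
    *(long *)arg += 1;
}

/* Call fn(arg) reps times and return the CPU time used, in seconds.
   The resolution is limited by CLOCKS_PER_SEC, so use enough reps. */
double time_calls(void (*fn)(void *), void *arg, long reps)
{
    clock_t start = clock();
    long i;
    for (i = 0; i < reps; i++)
        fn(arg);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing the same build on the P3 and the P4, first with all optimization flags removed and then adding them back one at a time, would separate the effect of the flags from the effect of the processor.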

    --
    fix (vb.): 1. to paper over, obscure, hide from public view; 2.
    to work around, in a way that produces unintended consequences
    that are worse than the original problem. Usage: "Windows ME
    fixes many of the shortcomings of Windows 98 SE". - Hutchison
     
    CBFalconer, Aug 22, 2004
    #6
  7. VNG

    Christian Bau Guest

    In article <5nWVc.5582$>,
    VNG <> wrote:

    > I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
    > under Linux gnu, and ran under P3, P4 and AMD.
    >
    > It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
    > problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and
    > under
    > Linux not only that it runs slower (while AMD is 40 times faster), but it
    > also
    > produces wrong numerical results...
    >
    > Any suggestion what can be the problem?
    >
    > How to fix the P4 speed under MSVC++ (SP6)?
    > How to fix P4's speed and numerical result under Linux?
    >
    > Here's some more details about the compilation:
    > GNU:
    > CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
    > -funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused
    >
    >
    > Basically one of the most intensive loops (that we suspect in but aren't sure
    > if
    > it causes the problem) looks like this:
    >
    > static long loop_order;
    >
    > void functionname ()
    > {
    > register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
    > register long j;
    > :
    > {
    > register float c1, c2;
    > j = loop_order;
    > while (j--)
    > {
    > acc = *itPtr-- * c1;
    > acc += *itPtr-- * c2;
    > acc += *itPtr++ * c3;
    > *cPtr++ += *iPtr1++ * acc;
    > }
    > }
    > :
    > }


    P4s dislike accessing data at certain distances from each other. If the
    distance between the various pointer variables is a multiple of a large
    power of two (for example 64 KB) then you might be in trouble.
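
To test this idea, one could print how far apart the buffers are modulo 64 KB. The sketch below is an assumption-laden illustration: the helper names are invented, the 64 KB figure is taken from the example above, and the window size that counts as "too close" is left to the caller. It uses uintptr_t from the C99 header stdint.h.

```c
#include <stddef.h>
#include <stdint.h>   /* uintptr_t (C99) */

/* Distance between two addresses modulo 64 KB, folded into 0..32768,
   so 0 means the addresses are an exact multiple of 64 KB apart. */
size_t dist_mod_64k(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;
    uintptr_t d = (x > y ? x - y : y - x) % 65536u;
    return (size_t)(d > 32768u ? 65536u - d : d);
}

/* Nonzero if the two addresses fall within `window` bytes of each other
   modulo 64 KB. The choice of window is an assumption of this sketch. */
int may_alias_64k(const void *a, const void *b, size_t window)
{
    return dist_mod_64k(a, b) < window;
}
```

If the pointers used in the loop turn out to be close modulo 64 KB, padding one of the allocations by a few hundred bytes and re-timing would be a cheap experiment.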
     
    Christian Bau, Aug 22, 2004
    #7
