ANSI C problem on P4 under Linux & Windows

VNG · Aug 22, 2004

I have an ANSI C program that was compiled under Windows with MSVC++ 6.0 (SP6)
and under Linux with GNU gcc, and run on P3, P4 and AMD machines.

It runs fine on P3 and AMD under both Windows and Linux, but on a P4 it has
problems. Under Windows, a 3GHz P4 runs it at half the speed of an 800MHz P3...
and under Linux it not only runs slower (while the AMD is 40 times faster), but
it also produces wrong numerical results...

Any suggestions as to what the problem could be?

How can we fix the P4 speed under MSVC++ (SP6)?
How can we fix the P4's speed and numerical results under Linux?

Here are some more details about the compilation:
GNU:
CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
-funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused

Basically, one of the most intensive loops (which we suspect, but aren't sure,
is causing the problem) looks like this:

static long loop_order;

void functionname ()
{
    register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
    register long j;
    :
    {
        register float c1, c2;
        j = loop_order;
        while (j--)
        {
            acc = *itPtr-- * c1;
            acc += *itPtr-- * c2;
            acc += *itPtr++ * c3;
            *cPtr++ += *iPtr1++ * acc;
        }
    }
    :
}

We have tried removing the "register" keyword and redeclaring "j" as
volatile, with no change.

Thanks,
-- VNG

SM Ryan · Aug 22, 2004

# {
# register float c1, c2;
# j = loop_order;
# while (j--)
# {
# acc = *itPtr-- * c1;
# acc += *itPtr-- * c2;
# acc += *itPtr++ * c3;
# *cPtr++ += *iPtr1++ * acc;
# }
# }

Is there some reason to keep loading itPtr[-1] and itPtr[-2]
inside the loop instead of outside?
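
To make the question concrete: each pass through the posted loop reads three neighbouring elements and then leaves itPtr one element lower, so two of the three values read in one pass are read again in the next. Below is a minimal sketch of carrying those two values in locals instead. It assumes that c3 is a float coefficient like c1 and c2 (the post never declares it) and that the output array does not overlap the input. The names fir3_pointer, fir3_carried and check_fir3_carried are made up for illustration.

```c
#include <assert.h>

/* The loop as posted, with the undeclared pieces passed in as parameters
   and a guard so that a negative count does nothing. */
void fir3_pointer(const float *itPtr, const float *iPtr1, float *cPtr,
                  float c1, float c2, float c3, long n)
{
    float acc;
    long j = n;

    while (j-- > 0) {
        acc  = *itPtr-- * c1;
        acc += *itPtr-- * c2;
        acc += *itPtr++ * c3;       /* net effect: itPtr drops by one */
        *cPtr++ += *iPtr1++ * acc;
    }
}

/* Same arithmetic in the same order, but only one new input element is
   loaded per pass; the other two are carried over from the previous pass. */
void fir3_carried(const float *itPtr, const float *iPtr1, float *cPtr,
                  float c1, float c2, float c3, long n)
{
    float x0, x1, x2, acc;
    long k;

    if (n <= 0)
        return;
    x0 = itPtr[0];
    x1 = itPtr[-1];
    for (k = 0; k < n; k++) {
        x2 = itPtr[-k - 2];         /* the only new element this pass */
        acc  = x0 * c1;
        acc += x1 * c2;
        acc += x2 * c3;
        cPtr[k] += iPtr1[k] * acc;
        x0 = x1;                    /* slide the window down by one */
        x1 = x2;
    }
}

/* Small self-check on made-up data: both versions must agree. */
void check_fir3_carried(void)
{
    float x[5] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
    float gain[3] = { 1.0f, 1.0f, 2.0f };
    float a[3] = { 0.0f, 0.0f, 0.0f };
    float b[3] = { 0.0f, 0.0f, 0.0f };
    int k;

    fir3_pointer(&x[4], gain, a, 1.0f, 2.0f, 3.0f, 3L);
    fir3_carried(&x[4], gain, b, 1.0f, 2.0f, 3.0f, 3L);
    for (k = 0; k < 3; k++)
        assert(a[k] == b[k]);
}
```

Whether the carried version is any faster on a given compiler and CPU would have to be measured. An optimizer may already do this on its own when it can prove the arrays do not overlap.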

Profetas · Aug 22, 2004

which OS do you have in your P3?

newer OS/compiler may not use the register to store your vars which will
be slower

Jens.Toerring · Aug 22, 2004

Profetas said:
which OS do you have in your P3?

Did you ever read the post? The OP writes it all at the start of his
article.

newer OS/compiler may not use the register to store your vars which will
be slower

That's simply BS. First of all, 'register' was never more than a
hint to the compiler that a variable will be used a lot and that
it might be a good idea to store it in a register. But the compiler
was always free to disregard this hint. Moreover, newer compilers
are usually quite good at figuring out such things, so you usually
don't need the 'register' keyword anymore because the compiler will
automatically pick the most suitable variables for keeping them in
registers. And, finally, this hasn't got anything at all to do with
the OS.
Regards, Jens

Jens.Toerring · Aug 22, 2004

VNG said:
I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
under Linux gnu, and ran under P3, P4 and AMD.

It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and under
Linux not only that it runs slower (while AMD is 40 times faster), but it also
produces wrong numerical results...

Any suggestion what can be the problem?

How to fix the P4 speed under MSVC++ (SP6)?
How to fix P4's speed and numerical result under Linux?

Here's some more details about the compilation:
GNU:
CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
-funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused

No idea about the speed issues - and that's rather off-topic here,
because it's about the behavior of certain compilers in combination
with certain processors, none of which has much to do with C. As for
the wrong results with gcc, have another look at the info
pages concerning the -ffast-math option:

This option should never be turned on by any `-O' option since it
can result in incorrect output for programs which depend on an
exact implementation of IEEE or ISO rules/specifications for math
functions.

Perhaps it has something to do with this...
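
As an illustration of why relaxing the IEEE rules can change results, here is a minimal sketch showing that float addition is not associative: regrouping the same three operands gives a different sum. On recent GCC versions -ffast-math is documented to allow this kind of regrouping; whether the gcc used in this thread did so, and whether that is what breaks the program, is an assumption. The function names are hypothetical, and the volatile temporaries are only there to force each intermediate to be rounded to float.

```c
#include <assert.h>

/* (a + b) + c, with every intermediate rounded to float */
float sum_grouped_left(float a, float b, float c)
{
    volatile float t = a + b;
    volatile float r = t + c;
    return r;
}

/* a + (b + c), with every intermediate rounded to float */
float sum_grouped_right(float a, float b, float c)
{
    volatile float t = b + c;
    volatile float r = a + t;
    return r;
}

/* With 32-bit IEEE floats, 1.0f is lost when it is added to -1e8f first,
   because neighbouring floats near 1e8 are 8 apart. */
void check_regrouping(void)
{
    assert(sum_grouped_left(1e8f, -1e8f, 1.0f) == 1.0f);
    assert(sum_grouped_right(1e8f, -1e8f, 1.0f) == 0.0f);
}
```

A loop that accumulates many products, like the one in the question, offers plenty of opportunities for this kind of regrouping once the compiler is allowed to do it.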

In your place I would probably start by throwing out all those
options and testing carefully which of them really make a difference
- some of them could even result in a slow-down when used with the
wrong processor type. And your code is actually so obfuscated (and
not the code you're really using, by the way) that a compiler might
have problems finding out how to optimize it. Try to rewrite it in an
understandable form and you might have a much better chance of getting
it optimized. If you then find it's too slow you can still try to
micro-optimize (but expect the effect to differ between compilers
and processors).
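
One possible reading of "an understandable form", as a sketch only: the same computation written with array indices and explicit parameters. Everything the posted fragment leaves out is an assumption here, namely that c3 is a third float coefficient, that the inputs and the output do not overlap, and that the pointer walks downward from some top index. The names fir3_indexed and check_fir3_indexed are made up.

```c
#include <assert.h>
#include <stddef.h>

/* out[k] += gain[k] * (c1*x[top-k] + c2*x[top-k-1] + c3*x[top-k-2])
   for k = 0 .. n-1. Requires top >= n + 1 so that no index goes below 0. */
void fir3_indexed(const float *x, size_t top, const float *gain, float *out,
                  size_t n, float c1, float c2, float c3)
{
    size_t k;

    assert(n == 0 || top >= n + 1);
    for (k = 0; k < n; k++) {
        size_t i = top - k;
        float acc = x[i] * c1 + x[i - 1] * c2 + x[i - 2] * c3;

        out[k] += gain[k] * acc;
    }
}

/* Self-check on made-up data with exactly representable values. */
void check_fir3_indexed(void)
{
    float x[5] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
    float gain[3] = { 1.0f, 1.0f, 2.0f };
    float out[3] = { 0.0f, 0.0f, 0.0f };

    fir3_indexed(x, 4, gain, out, 3, 1.0f, 2.0f, 3.0f);
    assert(out[0] == 22.0f);    /* 5*1 + 4*2 + 3*3 */
    assert(out[1] == 16.0f);    /* 4*1 + 3*2 + 2*3 */
    assert(out[2] == 20.0f);    /* (3*1 + 2*2 + 1*3) * 2 */
}
```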

Basically one of the most intensive loops (that we suspect in but aren't sure if
it causes the problem) looks like this:

Profiling your code would probably be better than just guessing...

static long loop_order;

void functionname ()
{
register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;

iPtr is twice defined, that should get the compiler quite a bit upset.

register long j;
:

What's that colon good for?

{

Why wrap this in another block?

register float c1, c2;

Where do c1 and c2 ever get assigned values?

j = loop_order;
while (j--)
{
acc = *itPtr-- * c1;

itPtr has never been assigned a value.

acc += *itPtr-- * c2;
acc += *itPtr++ * c3;

c3 is never defined anywhere.

*cPtr++ += *iPtr1++ * acc;

cPtr and iPtr1 also didn't get assigned values.

}
}

Now, what the hell is all that supposed to do?

Regards, Jens

CBFalconer · Aug 22, 2004

.... snip about systems - OT ...

Basically one of the most intensive loops (that we suspect in but
aren't sure if it causes the problem) looks like this:

static long loop_order;

void functionname ()
{
register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
register long j;
:
{
register float c1, c2;
j = loop_order;
while (j--)
{
acc = *itPtr-- * c1;
acc += *itPtr-- * c2;
acc += *itPtr++ * c3;
*cPtr++ += *iPtr1++ * acc;
}
}
:
}

We have tried to eliminate the use of the word "register" and
redefined "j" as volatile, no change.

What are those isolated colons doing? The register keyword seems
pointless, as does the volatile. Initializing the various
pointers might help. Same for the cNs. c3 seems to be undefined.
The time for multiplication can vary greatly with the operands.

As ever, first measure. It should not be any great effort to do
some profiling runs.
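
Short of a real profiler, a first measurement can be taken with nothing but the standard library. The sketch below times repeated calls of a function with clock() from <time.h>. The names time_cpu_seconds and sample_work are hypothetical, and sample_work is only a stand-in for the suspected loop.

```c
#include <assert.h>
#include <time.h>

typedef void (*work_fn)(void);

/* Returns the CPU time in seconds used by `repeats` calls of fn,
   or -1.0 if the implementation cannot report processor time. */
double time_cpu_seconds(work_fn fn, long repeats)
{
    clock_t start, stop;
    long i;

    assert(fn != 0 && repeats > 0);
    start = clock();
    if (start == (clock_t)-1)
        return -1.0;
    for (i = 0; i < repeats; i++)
        fn();
    stop = clock();
    if (stop == (clock_t)-1)
        return -1.0;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload; the volatile sink keeps the loop from being
   optimized away entirely. */
volatile float work_sink;

void sample_work(void)
{
    float s = 0.0f;
    long i;

    for (i = 0; i < 100000L; i++)
        s += (float)i * 0.5f;
    work_sink = s;
}
```

Calling time_cpu_seconds(sample_work, 1000L) on each machine, with sample_work replaced by the routine under suspicion, gives numbers that can be compared directly. The resolution of clock() is coarse, so the repeat count should be large enough that the run takes at least a second or so.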

Christian Bau · Aug 22, 2004

VNG said:
I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
under Linux gnu, and ran under P3, P4 and AMD.

It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and under
Linux not only that it runs slower (while AMD is 40 times faster), but it also
produces wrong numerical results...

Any suggestion what can be the problem?

How to fix the P4 speed under MSVC++ (SP6)?
How to fix P4's speed and numerical result under Linux?

Here's some more details about the compilation:
GNU:
CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
-funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused

Basically one of the most intensive loops (that we suspect in but aren't sure if
it causes the problem) looks like this:

static long loop_order;

void functionname ()
{
register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
register long j;
:
{
register float c1, c2;
j = loop_order;
while (j--)
{
acc = *itPtr-- * c1;
acc += *itPtr-- * c2;
acc += *itPtr++ * c3;
*cPtr++ += *iPtr1++ * acc;
}
}
:
}

P4s dislike accessing data at certain distances from each other. If the
distance between the various pointer variables is a multiple of a large
power of two (for example 64 KB) then you might be in trouble.
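
A quick way to test this hypothesis is to check whether two of the pointers sit at the same offset within a 64 KB stride. The sketch below takes the 64 KB figure from the example above. The helper name same_offset_mod_64k is made up, and uintptr_t requires a C99 <stdint.h>, which MSVC++ 6.0 does not ship, so an unsigned integer type wide enough to hold a pointer would have to be substituted there.

```c
#include <assert.h>
#include <stdint.h>

#define ALIAS_STRIDE ((uintptr_t)65536)   /* 64 KB, as in the example */

/* Nonzero if the addresses of a and b differ by a multiple of 64 KB. */
int same_offset_mod_64k(const void *a, const void *b)
{
    return ((uintptr_t)a % ALIAS_STRIDE) == ((uintptr_t)b % ALIAS_STRIDE);
}

/* Self-check on a static buffer: one stride apart matches,
   one stride plus 64 bytes apart does not. */
void check_same_offset(void)
{
    static char buf[2 * 65536];

    assert(same_offset_mod_64k(buf, buf + 65536));
    assert(!same_offset_mod_64k(buf, buf + 65536 + 64));
}
```

If the check fires for itPtr, iPtr1 and cPtr in the real program, shifting one of the arrays by a small amount (for example by allocating a little extra and offsetting the start) and timing again would show whether the distance is the cause.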
