ANSI C problem on P4 under Linux & Windows

V

VNG

I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
under Linux gnu, and ran under P3, P4 and AMD.

It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and under
Linux not only that it runs slower (while AMD is 40 times faster), but it also
produces wrong numerical results...

Any suggestion what can be the problem?

How to fix the P4 speed under MSVC++ (SP6)?
How to fix P4's speed and numerical result under Linux?

Here's some more details about the compilation:
GNU:
CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
-funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused


Basically one of the most intensive loops (that we suspect in but aren't sure if
it causes the problem) looks like this:

static long loop_order;

void functionname ()
{
register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
register long j;
:
{
register float c1, c2;
j = loop_order;
while (j--)
{
acc = *itPtr-- * c1;
acc += *itPtr-- * c2;
acc += *itPtr++ * c3;
*cPtr++ += *iPtr1++ * acc;
}
}
:
}

We have tried to eliminate the use of the word "register" and redefined "j" as
volatile, no change.


Thanks,
-- VNG
 
I

Ioannis Vranos

VNG said:
I have an ANSI C program that was compiled under Windows MSVC++ 6.0
(SP6) and
under Linux gnu, and ran under P3, P4 and AMD.

It runs fine on P3 and AMD under both Windows and Linux, but under P4 it
has
problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3...
and under
Linux not only that it runs slower (while AMD is 40 times faster), but
it also
produces wrong numerical results...

Any suggestion what can be the problem?

How to fix the P4 speed under MSVC++ (SP6)?
How to fix P4's speed and numerical result under Linux?

Here's some more details about the compilation:
GNU:
CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
-funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused


Basically one of the most intensive loops (that we suspect in but aren't
sure if
it causes the problem) looks like this:

static long loop_order;

void functionname ()
{
register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
register long j;
:
{
register float c1, c2;
j = loop_order;
while (j--)
{
acc = *itPtr-- * c1;
acc += *itPtr-- * c2;
acc += *itPtr++ * c3;
*cPtr++ += *iPtr1++ * acc;
}
}
:
}

We have tried to eliminate the use of the word "register" and redefined
"j" as
volatile, no change.


Why volatile? Also -ffast-math sounds like lower floating pointprecision
than normal.


The command line parameters I use for C90 programs:


-std=iso9899:199409 -pedantic-errors -Wall -fexpensive-optimizations -O3
-ffloat-store -mcpu=pentiumpro




Try this, and do not use volatile and register unless needed.






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,731
Messages
2,569,432
Members
44,832
Latest member
GlennSmall

Latest Threads

Top