MS VC++ 6 SP6 optimiser

Andrew Holme

I'm prototyping a fixed-point FFT algorithm for an FPGA.

I wrote a class 'complex' based on type 'double' for real and imaginary
parts. With some difficulty, I eventually got to the point where most
operations compiled to inline co-processor instructions. I then wrote a
class 'fixed' using type 'int' to represent fixed-point numbers with 16 bits
either side of the binary point. A simple test program using class 'fixed'
was well optimised; but when I changed class 'complex' to use 'fixed'
instead of 'double', the results were disappointing.

Is this because the optimiser can only see so many levels down through a
class hierarchy?

Speed of execution on the PC is not important, since I am targeting a
totally different platform; but it would be nice to see it produce good
code.
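[No code was posted in the thread, so purely as a hedged sketch: a 16.16 'fixed' class of the kind described might look roughly like this. The names and operator set are guesses, not the OP's code. Note that the add reduces to one integer instruction, while the multiply inherently needs a widening multiply plus a shift, so it can never match a single FPU multiply.]

```cpp
#include <cassert>

// Hypothetical 16.16 fixed-point type: one 32-bit int holds 16 integer
// bits and 16 fractional bits (raw value = real value * 65536).
struct fixed {
    int raw;
    fixed() : raw(0) {}
    explicit fixed(double d) : raw(int(d * 65536.0)) {}
    double toDouble() const { return raw / 65536.0; }
};

// Addition maps to a single integer add.
inline fixed operator+(fixed a, fixed b) {
    fixed r; r.raw = a.raw + b.raw; return r;
}

// Multiplication needs a 64-bit intermediate and a shift. (On MSVC6 the
// 64-bit type would be spelled __int64 rather than long long.)
inline fixed operator*(fixed a, fixed b) {
    fixed r;
    r.raw = int(((long long)a.raw * b.raw) >> 16);
    return r;
}
```

For example, fixed(1.5) * fixed(2.0) yields a raw value of 3 << 16, i.e. 3.0.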
 

Dombo

Op 11-Jun-11 15:26, Andrew Holme schreef:
I'm prototyping a fixed-point FFT algorithm for an FPGA.

I wrote a class 'complex' based on type 'double' for real and imaginary
parts. With some difficulty, I eventually got to the point where most
operations compiled to inline co-processor instructions. I then wrote a
class 'fixed' using type 'int' to represent fixed-point numbers with 16 bits
either side of the binary point. A simple test program using class 'fixed'
was well optimised; but when I changed class 'complex' to use 'fixed'
instead of 'double', the results were disappointing.

Is this because the optimiser can only see so many levels down through a
class hierarchy?

That depends. If your fixed-point code is in a different translation
unit (= .cpp file) than your complex class, chances are that the
optimizer will not be able to inline calls into your fixed-point class
(especially since you are using a rather antique compiler).

Even if your fixed point class has inline implementations (visible
through the header file) of its member functions, it still depends very
much on how you implemented the 16-bit fixed point arithmetic. Unlike
double arithmetic, 16-bit fixed point arithmetic is not natively
supported by the x86 processor. So I'm not too surprised if the
optimizer is less successful at optimizing the 16-bit fixed point code
compared to doubles. Also note that 16-bit operations tend to be
somewhat inefficient on x86 processors in 32-bit mode.
 

robertwessel2

I'm prototyping a fixed-point FFT algorithm for an FPGA.

I wrote a class 'complex' based on type 'double' for real and imaginary
parts.  With some difficulty, I eventually got to the point where most
operations compiled to inline co-processor instructions.  I then wrote a
class 'fixed' using type 'int' to represent fixed-point numbers with 16 bits
either side of the binary point.  A simple test program using class 'fixed'
was well optimised; but when I changed class 'complex' to use 'fixed'
instead of 'double', the results were disappointing.

Is this because the optimiser can only see so many levels down through a
class hierarchy?

Speed of execution on the PC is not important, since I am targeting a
totally different platform; but it would be nice to see it produce good
code.



As Dombo mentioned, if these are defined in another translation unit,
then the compiler can't see them to inline them. Also, you might ask
for more aggressive inlining. I don't remember if MSVC6 allowed
anything more than /Ob2, but it's worth taking a look. Explicit
inlines might help as well.
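[On the explicit-inline point: plain `inline` is only a hint, but MSVC, including VC6, also has `__forceinline`, which overrides the compiler's inlining cost heuristics. A guarded sketch, with the guard added so the snippet also builds on non-Microsoft compilers:]

```cpp
#include <cassert>

// __forceinline tells MSVC to inline regardless of its cost heuristics;
// plain 'inline' is merely a suggestion the optimizer may ignore.
#ifdef _MSC_VER
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline
#endif

// A trivial 16.16 fixed-point add: one integer instruction once inlined.
FORCE_INLINE int fixed_add(int a, int b) { return a + b; }
```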

But the real question is: why on earth are you using a 13 year old
compiler when the new version is free? The VS10 optimizer is
better, and it supports link-time code generation (which allows the
compiler to optimize across translation units). Not only is MSVC6
horribly outdated, its support of templates is at considerable
variance with the C++ standard (not really MS's fault; they followed
the draft standard which existed at the time they implemented VC6, and
that got changed).
 

robertwessel2

Op 11-Jun-11 15:26, Andrew Holme schreef:
That depends. If your fixed-point code is in a different translation
unit (= .cpp file) than your complex class, chances are that the
optimizer will not be able to inline calls into your fixed-point class
(especially since you are using a rather antique compiler).

Even if your fixed point class has inline implementations (visible
through the header file) of its member functions, it still depends very
much on how you implemented the 16-bit fixed point arithmetic. Unlike
double arithmetic, 16-bit fixed point arithmetic is not natively
supported by the x86 processor. So I'm not too surprised if the
optimizer is less successful at optimizing the 16-bit fixed point code
compared to doubles. Also note that 16-bit operations tend to be
somewhat inefficient on x86 processors in 32-bit mode.



How is 16 bit arithmetic not natively supported on x86? You can code
"add ax,bx" on every x86 from the 16 bit 8086 to the latest 64 bit
monsters. In any event, the OP appears to be doing 32 bit arithmetic,
with a fixed radix point in the middle.
 

Dombo

Op 12-Jun-11 0:52, (e-mail address removed) schreef:
> How is 16 bit arithmetic not natively supported on x86?

I didn't say that; read again.

> You can code "add ax,bx" on every x86 from the 16 bit 8086 to the
> latest 64 bit monsters.

In 32-bit mode that requires a prefix, which slows the operation down
compared to add eax,ebx.

> In any event, the OP appears to be doing 32 bit arithmetic, with a
> fixed radix point in the middle.

I didn't see any code from the OP, did you?
 

robertwessel2

Op 12-Jun-11 0:52, (e-mail address removed) schreef:
> I didn't say that, read again.

You said "16-bit fixed point arithmetic is not natively supported by
the x86 processor."

> In 32-bit mode that requires a prefix which slows the operation down
> compared to add eax,ebx

The use of a prefix byte in the encoding really has no impact on
whether the instruction is native or not. If it did, you'd have to
declare all the 64 bit instructions and any use of registers R8-R15 to
be "non-native," since you need a REX prefix to do any of that. If
the instruction is in the ISA, no matter how inefficient the encoding
or implementation, it's native.

In any event, whether the longer instruction is slower or not is
implementation and code sequence dependent.

> I didn't see any code from the OP, did you?

No, but he did say he was using ints on MSVC6, which makes them 32
bits. I grant you that the interpretation that he's using a single int
to store both 16-bit halves of the number does require a bit of
reading between the lines, but it's a common enough approach for
fixed-point arithmetic.
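[Purely as an illustration of that reading of the OP's description, "a single int with the radix point in the middle" would mean something like the following. The helper names are mine, not from the thread.]

```cpp
#include <cassert>

// Hypothetical 16.16 packing: upper 16 bits are the integer part,
// lower 16 bits are the fraction in units of 1/65536.
int to_fixed(int whole, unsigned frac16) {
    // Shift in unsigned arithmetic to avoid UB on negative values.
    return (int)(((unsigned)whole << 16) | (frac16 & 0xFFFFu));
}

int int_part(int f) { return f >> 16; }                 // arithmetic shift
unsigned frac_part(int f) { return (unsigned)f & 0xFFFFu; }
```

For example, 3.5 would be stored as to_fixed(3, 0x8000), i.e. the raw bit pattern 0x00038000.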
 

Andrew Holme

Andrew Holme said:
I'm prototyping a fixed-point FFT algorithm for an FPGA.

I wrote a class 'complex' based on type 'double' for real and imaginary
parts. With some difficulty, I eventually got to the point where most
operations compiled to inline co-processor instructions. I then wrote a
class 'fixed' using type 'int' to represent fixed-point numbers with 16
bits either side of the binary point. A simple test program using class
'fixed' was well optimised; but when I changed class 'complex' to use
'fixed' instead of 'double', the results were disappointing.

Is this because the optimiser can only see so many levels down through a
class hierarchy?

Speed of execution on the PC is not important, since I am targeting a
totally different platform; but it would be nice to see it produce good
code.

Thanks for the comments. I tried it on a modern version of MSVC at work
today and the compiled code was optimal. So it seems they have made a lot
of improvements over the past decade!
 
