Hello,
this is cross-posted, with follow-ups set to comp.lang.asm.x86, because I talk
about the generated and optimized x86 code for a C program, which is rather OT
in comp.lang.c.
Skarmander said:
I suppose. Last week's gcc 4.1 from CVS:
[mark@icepick ~]$ gcc-4_1_cvs_20051015 --version
gcc-4_1_cvs_20051015 (GCC) 4.1.0 20051015 (experimental)
[...]
[mark@icepick ~]$ gcc-4_1_cvs_20051015 -Wall -ansi -pedantic -O2
-mtune=i686 -fomit-frame-pointer -c -o foo.o foo.c
modTest:
movzwl 8(%esp), %edx
movzwl 4(%esp), %eax
subl %edx, %eax
cltd
shrl $29, %edx
addl %edx, %eax
andl $7, %eax
subl %edx, %eax
movzwl %ax, %eax
ret
Yup, clever. Branchless and pipeline-optimized. Thanks, I didn't see this.
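For anyone who does not read AT&T syntax fluently, here is my own reading of
that sequence written back as C. This is only a sketch, not anything gcc
emits: the function names are made up, and I assume a 32-bit two's complement
int and C99-style truncating division for negative operands.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder of x / 8 carrying the sign of x, without a divide or a branch.
   Follows the cltd / shrl / addl / andl / subl sequence step by step. */
int32_t rem8_branchless(int32_t x)
{
    uint32_t sign = (x < 0) ? 0xFFFFFFFFu : 0u; /* cltd: edx = copies of the sign bit */
    uint32_t bias = sign >> 29;                 /* shrl $29: 7 if x < 0, else 0 */
    uint32_t low  = ((uint32_t)x + bias) & 7u;  /* addl, then andl $7 */
    return (int32_t)low - (int32_t)bias;        /* subl: take the bias off again */
}

/* Hypothetical stand-in for modTest: subtract as int, then truncate to 16 bits. */
uint16_t mod_test_sketch(uint16_t a, uint16_t b)
{
    return (uint16_t)rem8_branchless((int32_t)a - (int32_t)b); /* movzwl %ax, %eax */
}

/* Compare against the % operator over every value a - b can take. */
void check_rem8(void)
{
    int32_t x;
    for (x = -65535; x <= 65535; ++x)
        assert(rem8_branchless(x) == x % 8);
}
```

The bias is what makes a negative difference come out with a negative
remainder, which a plain "and $7" alone would not do.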
Now, you are comparing two rather unequal things, though. MSVC++ 6.0 is some
years older than "last week's gcc CVS".
Ok, although it is OT, I ran some tests with a newer version of the MS
compiler (the one shipped with the latest released DDK, the Win 2003 DDK SP 1,
using the default settings for release builds):
C:\test>cl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.4035 for 80x86
Copyright (C) Microsoft Corporation 1984-2002. All rights reserved.
Now, let's have a look at the compiled code:
unsigned short modTest1(unsigned short a, unsigned short b)
{
    return ( (a - b) % 8 );
}
test!modTest1:
01001ba7 8bff mov edi,edi
01001ba9 55 push ebp
01001baa 8bec mov ebp,esp
01001bac 0fb74d0c movzx ecx,word ptr [ebp+0xc]
01001bb0 0fb74508 movzx eax,word ptr [ebp+0x8]
01001bb4 2bc1 sub eax,ecx
01001bb6 99 cdq
01001bb7 6a08 push 0x8
01001bb9 59 pop ecx
01001bba f7f9 idiv ecx
01001bbc 668bc2 mov ax,dx
01001bbf 5d pop ebp
01001bc0 c20800 ret 0x8
[...]
unsigned short modTest2(unsigned short a, unsigned short b)
{
    return ( (a - b) % 8U );
}
test!modTest2:
01001bc8 8bff mov edi,edi
01001bca 55 push ebp
01001bcb 8bec mov ebp,esp
01001bcd 8b4508 mov eax,[ebp+0x8]
01001bd0 8b4d0c mov ecx,[ebp+0xc]
01001bd3 2bc1 sub eax,ecx
01001bd5 83e007 and eax,0x7
01001bd8 5d pop ebp
01001bd9 c20800 ret 0x8
[...]
unsigned short modTest3(unsigned short a, unsigned short b)
{
    return ( (unsigned short)( (unsigned int)(a - b) ) % 8U );
}
test!modTest3:
01001be1 8bff mov edi,edi
01001be3 55 push ebp
01001be4 8bec mov ebp,esp
01001be6 33c0 xor eax,eax
01001be8 8a4508 mov al,[ebp+0x8]
01001beb 2a450c sub al,[ebp+0xc]
01001bee 83e007 and eax,0x7
01001bf1 5d pop ebp
01001bf2 c20800 ret 0x8
(Remark: I was not able to get code for modTest2 and modTest3 as long as they
were in the same compilation unit as the main() calling them. The compiler
always insisted on optimizing them away. I had to use two different
compilation units and link them together to actually get code for them.)
Now, this looks much better than the MSVC++ 6.0 code, doesn't it?
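One caveat when comparing the listings: as far as I can tell, modTest1 and
modTest2 are not the same function, so the compiler is not free to use the
plain AND for modTest1. With "% 8" the difference a - b stays a signed int and
the remainder can be negative; with "% 8U" it is converted to unsigned first.
A small sketch (the names mod_signed and mod_unsigned are mine, and I assume
int is wider than unsigned short and division truncates as in C99):

```c
#include <assert.h>
#include <limits.h>

/* Same expression as modTest1: signed remainder, then converted back. */
unsigned short mod_signed(unsigned short a, unsigned short b)
{
    return (unsigned short)((a - b) % 8);
}

/* Same expression as modTest2: a - b is converted to unsigned before the %. */
unsigned short mod_unsigned(unsigned short a, unsigned short b)
{
    return (unsigned short)((a - b) % 8U);
}

/* The two agree whenever a >= b, and can disagree when a < b. */
void check_difference(void)
{
    assert(mod_signed(5, 2) == mod_unsigned(5, 2)); /* both 3 */
    assert(mod_signed(0, 1) == USHRT_MAX);          /* -1 wraps to USHRT_MAX */
    assert(mod_unsigned(0, 1) == 7);                /* UINT_MAX % 8 */
}
```

So for a < b the two variants can return different values, which would explain
why only the unsigned one collapses to a single AND.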
Anyway, I am not sure whether the IDIV approach in modTest1() is a good
solution compared with the code gcc 4.1 generates. Any thoughts from the
assembler gurus here?
Regards,
Spiro.