The difference between signed and unsigned when doing `mod'

Hanzac Chen · Oct 20, 2005

Hi,

I don't understand why this happens.
Code 1 outputs `fff9'
and Code 2 outputs `1'.
Why does the `mod 8' have no effect?

/* Code 1 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
unsigned short a, b, c;
a = 0;
b = 7;

c = (a-b)%8;

printf("%x\n", c);
return 0;
}

/* Code 2 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
unsigned short a, b, c;
a = 0;
b = 7;

c = (a-b)%8U;

printf("%x\n", c);
return 0;
}

Walter Roberson · Oct 20, 2005

/* Code 1 */
#include <stdio.h>
#include <stdlib.h>

Note: you do not actually use stdlib.h in any of the code you show.
stdlib.h is, though, the source of EXIT_SUCCESS which you could be
using as your return code instead of using the magic number 0.

int main(int argc, char *argv[])
{
unsigned short a, b, c;

Code 1 outputs `fff9'

Not if unsigned short happens to be a different size than on the machine
you happened to test it on.

a = 0;
b = 7;

c = (a-b)%8;

printf("%x\n", c);

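A printf %x format requires an unsigned int, not an unsigned short.
The default argument promotions convert the unsigned short to int (or to
unsigned int where int cannot hold every unsigned short value), so this
usually works, but an explicit cast to unsigned int removes the doubt.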

Keith Thompson · Oct 20, 2005

Note: you do not actually use stdlib.h in any of the code you show.
stdlib.h is, though, the source of EXIT_SUCCESS which you could be
using as your return code instead of using the magic number 0.

But 0 is the least magical of all numbers, and the standard guarantees
that "return 0;" in main() returns a status indicating success.

Jack Klein · Oct 20, 2005

in comp.lang.c:

In addition to the things that Walter Roberson correctly mentioned...

Hi,

I don't understand why this happens.
Code 1 outputs `fff9'
and Code 2 outputs `1'.
Why does the `mod 8' have no effect?

/* Code 1 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
unsigned short a, b, c;
a = 0;
b = 7;

c = (a-b)%8;

If signed int on your platform can hold all the values of an unsigned
short, 'a' and 'b' will be promoted to signed int for the subtraction.
Equivalent to:

unsigned short a, b, c;
signed int sia, sib;
a = 0;
b = 7;
sia = a;
sib = b;

c = (sia - sib)%8;

The expression inside the parentheses evaluates to (int)-7. One of
the two possible correct values for -7 % 8 allowed prior to C99 is -7.

When you assign (int)-7 to an unsigned short, the behavior of unsigned
types with out of range values occurs. (int)-7 is converted to
USHRT_MAX + 1 - 7. For a 16-bit unsigned short, this is 0x10000 - 7, which
equals 0xfff9.

printf("%x\n", c);
return 0;
}

/* Code 2 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
unsigned short a, b, c;
a = 0;
b = 7;

c = (a-b)%8U;

The same thing happens here: the subtraction yields a value of
(int)-7. Since the other operand of the % operator has the type
unsigned int, however, the value (int)-7 is converted to (unsigned
int)-7, that is UINT_MAX - 6, before the operation is performed.
Because UINT_MAX + 1 is a multiple of 8, that value leaves a
remainder of 1.

printf("%x\n", c);
return 0;
}

When dealing with promotions of unsigned integer types to higher
ranking integer types, these promotions sometimes result in signed
values of the higher types.

Walter Roberson · Oct 20, 2005

But 0 is the least magical of all numbers

The ancient Greek mathematicians (e.g., Pythagoras) would
not have agreed at all -- they couldn't grasp the existence of 0.

0 has so many unusual properties that it is one of the -most-
magical of finite numbers.

In every other context in C, 0 represents "false" and non-zero
represents "true", but to continue that on to the exit value
would suppose that the exit value is posing the question
"Did this program fail", and 0 is answering that in the negative,
"No, it is false that the program failed". That's a pretty magical
interpretation.

Mike Wahler · Oct 20, 2005

Walter Roberson said:
The ancient Greek mathematicians (e.g., Pythagoras) would
not have agreed at all -- they couldn't grasp the existence of 0.

0 has so many unusual properties that it is one of the -most-
magical of finite numbers.

In every other context in C, 0 represents "false" and non-zero
represents "true",

Not all contexts. It often means literally the value
zero, or some function could give it some special meaning as
a return value (e.g. 'not found'), etc.

but to continue that on to the exit value
would suppose that the exit value is posing the question
"Did this program fail",

Or, "did it succeed?"

and 0 is answering that in the negative,
"No, it is false that the program failed". That's a pretty magical
interpretation.

No magic at all. The standard explicitly specifies
that main() returning zero means 'success'. Whether
or not the value received by the host is also zero
is implementation/platform dependent.

-Mike

Hanzac Chen · Oct 20, 2005

Hi,

Thanks to all of you that replied to my mail.

Jack said:
In addition to the things that Walter Roberson correctly mentioned...

Yes, I read that. But I've just written a test program.

Thanks to Walter Roberson, although you're strict.

If signed int on your platform can hold all the values of an unsigned
short, 'a' and 'b' will be promoted to signed int for the subtraction.
Equivalent to:

unsigned short a, b, c;
signed int sia, sib;
a = 0;
b = 7;
sia = a;
sib = b;

c = (sia - sib)%8;

The expression inside the parentheses evaluates to (int)-7. One of
the two possible correct values for -7 % 8 allowed prior to C99 is -7.

This is what I can't fully understand: all the variables in this
calculation are unsigned short, there is no need to promote to int.

Anyway, I have to obey the rule.

When you assign (int)-7 to an unsigned short, the behavior of unsigned
types with out of range values occurs. (int)-7 is converted to
USHRT_MAX + 1 - 7. For a 16-bit unsigned short, this is 0x10000 - 7, which
equals 0xfff9.

printf("%x\n", c);
return 0;
}

/* Code 2 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
unsigned short a, b, c;
a = 0;
b = 7;

c = (a-b)%8U;

The same thing happens here, the subtraction yields a value of
(int)-7. Since the other operand of the % operator has the type
unsigned int, however, the value (int)-7 is promoted to (unsigned
int)-7 before the operation is performed.

printf("%x\n", c);
return 0;
}

When dealing with promotions of unsigned integer types to higher
ranking integer types, these promotions sometimes result in signed
values of the higher types.

Thanks, I understand the full process now.

Jack Klein · Oct 20, 2005

Hi,

Thanks to all of you that replied to my mail.

Yes, I read that. But I've just written a test program.
Thanks to Walter Roberson, although you're strict.

This is what I can't fully understand: all the variables in this
calculation are unsigned short, there is no need to promote to int.

There IS a need. The need is provided by the C standard, which
requires that almost all operators act on values of int or greater
rank.

Anyway, I have to obey the rule.

When you assign (int)-7 to an unsigned short, the behavior of unsigned
types with out of range values occurs. (int)-7 is converted to
USHRT_MAX + 1 - 7. For a 16-bit unsigned short, this is 0x10000 - 7, which
equals 0xfff9.

printf("%x\n", c);
return 0;
}

/* Code 2 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
unsigned short a, b, c;
a = 0;
b = 7;

c = (a-b)%8U;

The same thing happens here, the subtraction yields a value of
(int)-7. Since the other operand of the % operator has the type
unsigned int, however, the value (int)-7 is promoted to (unsigned
int)-7 before the operation is performed.

printf("%x\n", c);
return 0;
}

When dealing with promotions of unsigned integer types to higher
ranking integer types, these promotions sometimes result in signed
values of the higher types.

Thanks, I understand the full process now.

Keyser Soze · Oct 22, 2005

Jack Klein said:
Hi,

Thanks to all of you that replied to my mail.

Yes, I read that. But I've just written a test program.
Thanks to Walter Roberson, although you're strict.

This is what I can't fully understand: all the variables in this
calculation are unsigned short, there is no need to promote to int.

There IS a need. The need is provided by the C standard, which
requires that almost all operators act on values of int or greater
rank.

Anyway, I have to obey the rule.

When you assign (int)-7 to an unsigned short, the behavior of unsigned
types with out of range values occurs. (int)-7 is converted to
USHRT_MAX + 1 - 7. For a 16-bit unsigned short, this is 0x10000 - 7, which
equals 0xfff9.

printf("%x\n", c);
return 0;
}

/* Code 2 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
unsigned short a, b, c;
a = 0;
b = 7;

c = (a-b)%8U;

The same thing happens here, the subtraction yields a value of
(int)-7. Since the other operand of the % operator has the type
unsigned int, however, the value (int)-7 is promoted to (unsigned
int)-7 before the operation is performed.

printf("%x\n", c);
return 0;
}

When dealing with promotions of unsigned integer types to higher
ranking integer types, these promotions sometimes result in signed
values of the higher types.

Thanks, I understand the full process now.

Side effects on code generation when unsigned types are promoted to signed:

--------------------

/* Signed constant promotes expression from */
/* unsigned short to signed int. */
/* Mod operator returns a range of -7 to +7 */

unsigned short modTest(unsigned short a, unsigned short b)
{
return ( (a - b) % 8 );
}

00401000 mov eax,dword ptr [esp+4]
00401004 mov ecx,dword ptr [esp+8]
00401008 and eax,0FFFFh
0040100D and ecx,0FFFFh
00401013 sub eax,ecx
00401015 and eax,80000007h
0040101A jns 00401021
0040101C dec eax
0040101D or eax,0F8h
00401020 inc eax
00401021 ret

--------------------

/* Unsigned constant promotes expression from */
/* unsigned short to unsigned int. */
/* Mod operator returns a range of 0 to +7 */

unsigned short modTest(unsigned short a, unsigned short b)
{
return ( (a - b) % 8U );
}

00401000 mov eax,dword ptr [esp+4]
00401004 mov ecx,dword ptr [esp+8]
00401008 sub eax,ecx
0040100A and eax,7
0040100D ret

--------------------

/* Included just because it's so strange. */
/* The explicit casts resulted in demoting */
/* the interim expression to unsigned char! */

unsigned short modTest(unsigned short a, unsigned short b)
{
return ( (unsigned short)( (unsigned int)(a - b) ) % 8U );
}

00401000 mov al,byte ptr [esp+4]
00401004 mov cl,byte ptr [esp+8]
00401008 sub al,cl
0040100A and eax,7
0040100D ret

Mark F. Haigh · Oct 22, 2005

Keyser Soze wrote:

Side affects on code generation when unsigned types are promoted to signed:

--------------------

/* Signed constant promotes expression from */
/* unsigned short to signed int. */
/* Mod operator returns a range of -7 to +7 */

unsigned short modTest(unsigned short a, unsigned short b)
{
return ( (a - b) % 8 );
}

00401000 mov eax,dword ptr [esp+4]
00401004 mov ecx,dword ptr [esp+8]
00401008 and eax,0FFFFh
0040100D and ecx,0FFFFh
00401013 sub eax,ecx
00401015 and eax,80000007h
0040101A jns 00401021
0040101C dec eax
0040101D or eax,0F8h
00401020 inc eax
00401021 ret

OT: Crappy code generation. It could simultaneously do the 'mov' and
'and' with 'movzx', and eliminate branches using 'cmov' or some other
equivalent non-branching code transformation. Yawn.

Mark F. Haigh

Skarmander · Oct 22, 2005

Mark said:
Keyser Soze wrote:

Side effects on code generation when unsigned types are promoted to signed:

--------------------

/* Signed constant promotes expression from */
/* unsigned short to signed int. */
/* Mod operator returns a range of -7 to +7 */

unsigned short modTest(unsigned short a, unsigned short b)
{
return ( (a - b) % 8 );
}

00401000 mov eax,dword ptr [esp+4]
00401004 mov ecx,dword ptr [esp+8]
00401008 and eax,0FFFFh
0040100D and ecx,0FFFFh
00401013 sub eax,ecx
00401015 and eax,80000007h
0040101A jns 00401021
0040101C dec eax
0040101D or eax,0F8h
00401020 inc eax
00401021 ret

OT: Crappy code generation. It could simultaneously do the 'mov' and
'and' with 'movzx', and eliminate branches using 'cmov' or some other
equivalent non-branching code transformation. Yawn.

Yes, OT.

There's also no indication of what architecture was used. 'cmov' only
exists on Pentium Pro and higher, while the 'mov'-'and' combination
beats 'movzx' on the 486 and 586. Indeed, "gcc -march=i386 -mtune=i486"
(use 386 instruction set but optimize for 486) produces code quite
similar to the above. On a pure 386 they should be equal, but gcc
chooses 'movzx' regardless, so it might know something I don't.

For better or worse, the default setting for most x86 compilers is to
use the 386 instruction set. For a 386-586, the VC compiler is producing
fine code. You can fault it for its default settings, but not for the
generated code.

Morals:
- Intel instruction sets are weird and optimal code differs from CPU to CPU.
- If code generation is a concern to you, tell your compiler exactly
what architecture you're compiling for: it matters.
- Don't assume too quickly that you're smarter than a compiler.

S.

Mark F. Haigh · Oct 23, 2005

Skarmander said:
Mark said:

Keyser Soze wrote:

Side effects on code generation when unsigned types are promoted to signed:

--------------------

/* Signed constant promotes expression from */
/* unsigned short to signed int. */
/* Mod operator returns a range of -7 to +7 */

unsigned short modTest(unsigned short a, unsigned short b)
{
return ( (a - b) % 8 );
}

00401000 mov eax,dword ptr [esp+4]
00401004 mov ecx,dword ptr [esp+8]
00401008 and eax,0FFFFh
0040100D and ecx,0FFFFh
00401013 sub eax,ecx
00401015 and eax,80000007h
0040101A jns 00401021
0040101C dec eax
0040101D or eax,0F8h
00401020 inc eax
00401021 ret

OT: Crappy code generation. It could simultaneously do the 'mov' and
'and' with 'movzx', and eliminate branches using 'cmov' or some other
equivalent non-branching code transformation. Yawn.

Yes, OT.

There's also no indication of what architecture was used. 'cmov' only
exists on Pentium Pro and higher, [...]

Gee, really? Why do you think I wrote "or some other equivalent
non-branching code transformation"?

[...] while the 'mov'-'and' combination
beats 'movzx' on the 486 and 586.

Who cares? It's a single uop on anything modern, and (allegedly) costs
a cycle or two on 486 or 586's. The Intel optimization manuals say to
use it, and that's good enough for me, and good enough for gcc,
apparently.

Indeed, "gcc -march=i386 -mtune=i486"
(use 386 instruction set but optimize for 486) produces code quite
similar to the above. On a pure 386 they should be equal, but gcc
chooses 'movzx' regardless, so it might know something I don't.

The newer gcc's will produce branchless code for the example when
tuning for a 586 or higher. A mispredicted branch is a minor
catastrophe. In fact, eliminating branches is the *first*
assembler/compiler coding rule in the Intel optimization manuals.

<snip>

Mark F. Haigh

Skarmander · Oct 23, 2005

Mark said:
Skarmander said:

Mark said:

Keyser Soze wrote:

<snip>

Side effects on code generation when unsigned types are promoted to signed:

--------------------

/* Signed constant promotes expression from */
/* unsigned short to signed int. */
/* Mod operator returns a range of -7 to +7 */

unsigned short modTest(unsigned short a, unsigned short b)
{
return ( (a - b) % 8 );
}

00401000 mov eax,dword ptr [esp+4]
00401004 mov ecx,dword ptr [esp+8]
00401008 and eax,0FFFFh
0040100D and ecx,0FFFFh
00401013 sub eax,ecx
00401015 and eax,80000007h
0040101A jns 00401021
0040101C dec eax
0040101D or eax,0F8h
00401020 inc eax
00401021 ret

OT: Crappy code generation. It could simultaneously do the 'mov' and
'and' with 'movzx', and eliminate branches using 'cmov' or some other
equivalent non-branching code transformation. Yawn.

Yes, OT.

There's also no indication of what architecture was used. 'cmov' only
exists on Pentium Pro and higher, [...]

Gee, really? Why do you think I wrote "or some other equivalent
non-branching code transformation"?

Presumably because you expected me to believe you knew one, but felt
strengthening your argument by actually exhibiting it unnecessary. If
you feel this is trivial, I apologize. I'll admit I don't immediately
see it, and I was also distracted by what appeared to be unwarranted
assumptions on your end.

[...] while the 'mov'-'and' combination
beats 'movzx' on the 486 and 586.

Who cares? It's a single uop on anything modern, and (allegedly) costs
a cycle or two on 486 or 586's. The Intel optimization manuals say to
use it, and that's good enough for me, and good enough for gcc,
apparently.

Yes, it's possible gcc chooses to use 'movzx' for "generic" 386 code on
the assumption that the instruction will on average be faster for newer
processors. It certainly isn't *worse* on an actual 386, so this is
acceptable.

Note that, the Intel optimization manuals and alleged cycle costs
notwithstanding, gcc 3.4.2 avoids the "good enough" on either the 486 or
the 586. So VC's choice may be suboptimal here (good only for the 386,
486 and 586, while gcc covers the 386, the 686 and (presumably, since
you never know with Intel) the future), but this doesn't seem to warrant
your snap judgment of "crappy code generation".

The newer gcc's will produce branchless code for the example when
tuning for a 586 or higher.

I only have gcc 3.4.2 at my disposal here. You do mean tuning for 586
while using the 386 instruction set, right? Could you reproduce the
relevant code?

A mispredicted branch is a minor catastrophe. In fact, eliminating
branches is the *first* assembler/compiler coding rule in the Intel
optimization manuals.

Of course. It's quite possible that if you tell VC to tune for a 586 or
higher, the outcome would be different too. I'm fairly sure that
expecting a compiler to eliminate branches without giving it the
assumptions necessary to do this effectively will produce more than a
few minor catastrophes.

Like I said, you can fault VC for its default assumptions, but when you
know these, you have to evaluate its code generation in light of them.

S.

Keyser Soze · Oct 23, 2005

< snip of all other OT rants >

This thread has been hijacked down a rabbit hole of how many code
generators can dance on the head of an optimizer.

The point of the examples was to show how explicit type casts can affect the
generated code.

Not to start a discussion of the quality of the generated code.

I am too old to get a life but you youngsters should really get out more

Mark F. Haigh · Oct 23, 2005

Skarmander said:
Mark said:

Skarmander wrote:

There's also no indication of what architecture was used. 'cmov' only
exists on Pentium Pro and higher, [...]

Gee, really? Why do you think I wrote "or some other equivalent
non-branching code transformation"?

Presumably because you expected me to believe you knew one, but felt
strengthening your argument by actually exhibiting it unnecessary. If
you feel this is trivial, I apologize. I'll admit I don't immediately
see it, and I was also distracted by what appeared to be unwarranted
assumptions on your end.

Ok, this is getting too far OT, and this will be the last I have to say
on the matter. It has been my experience that the code fragment is
small and short enough that it can be solved (by an appropriately
advanced optimizing compiler) without branches.

[...] while the 'mov'-'and' combination
beats 'movzx' on the 486 and 586.

Who cares? It's a single uop on anything modern, and (allegedly) costs
a cycle or two on 486 or 586's. The Intel optimization manuals say to
use it, and that's good enough for me, and good enough for gcc,
apparently.

Yes, it's possible gcc chooses to use 'movzx' for "generic" 386 code on
the assumption that the instruction will on average be faster for newer
processors. It certainly isn't *worse* on an actual 386, so this is
acceptable.

Note that, the Intel optimization manuals and alleged cycle costs
notwithstanding, gcc 3.4.2 avoids the "good enough" on either the 486 or
the 586. So VC's choice may be suboptimal here (good only for the 386,
486 and 586, while gcc covers the 386, the 686 and (presumably, since
you never know with Intel) the future), but this doesn't seem to warrant
your snap judgment of "crappy code generation".

The newer gcc's will produce branchless code for the example when
tuning for a 586 or higher.

I only have gcc 3.4.2 at my disposal here. You do mean tuning for 586
while using the 386 instruction set, right? Could you reproduce the
relevant code?

I suppose. Last week's gcc 4.1 from CVS:

[mark@icepick ~]$ gcc-4_1_cvs_20051015 --version
gcc-4_1_cvs_20051015 (GCC) 4.1.0 20051015 (experimental)
[...]

[mark@icepick ~]$ gcc-4_1_cvs_20051015 -Wall -ansi -pedantic -O2
-mtune=i686 -fomit-frame-pointer -c -o foo.o foo.c

modTest:
movzwl 8(%esp), %edx
movzwl 4(%esp), %eax
subl %edx, %eax
cltd
shrl $29, %edx
addl %edx, %eax
andl $7, %eax
subl %edx, %eax
movzwl %ax, %eax
ret

Of course. It's quite possible that if you tell VC to tune for a 586 or
higher, the outcome would be different too. I'm fairly sure that
expecting a compiler to eliminate branches without giving it the
assumptions necessary to do this effectively will produce more than a
few minor catastrophes.

IMO, probably not. It's been a while since I've used VC6, but I
remember it seemed to produce nearly the same code no matter what you
tell it. Give it a try yourself-- hopefully I'll never have to touch
it again.

Like I said, you can fault VC for its default assumptions, but when you
know these, you have to evaluate its code generation in light of them.

Fair enough. I think it's a waste of time to look at the code
generation of what I consider to be a mostly obsolete compiler that
targets mostly obsolete machines. Like I originally said: Yawn.

Mark F. Haigh

Skarmander · Oct 23, 2005

Keyser said:
< snip of all other OT rants >

This thread has been hijacked down a rabbit hole of how many code
generators can dance on the head of an optimizer.

The point of the examples was to show how explicit type casts can affect the
generated code.

Not to start a discussion of the quality of the generated code.

I am too old to get a life but you youngsters should really get out more

That's why it was flagged "off-topic". And telling Usenet posters to get
out more? Hmm...

Hey, at least nobody was compared to Hitler. That should count for
something.

S.

Skarmander · Oct 23, 2005

Mark said:
Skarmander said:

Mark said:

Skarmander wrote:
[...] while the 'mov'-'and' combination
beats 'movzx' on the 486 and 586.

Who cares? It's a single uop on anything modern, and (allegedly) costs
a cycle or two on 486 or 586's. The Intel optimization manuals say to
use it, and that's good enough for me, and good enough for gcc,
apparently.

Yes, it's possible gcc chooses to use 'movzx' for "generic" 386 code on
the assumption that the instruction will on average be faster for newer
processors. It certainly isn't *worse* on an actual 386, so this is
acceptable.

Note that, the Intel optimization manuals and alleged cycle costs
notwithstanding, gcc 3.4.2 avoids the "good enough" on either the 486 or
the 586. So VC's choice may be suboptimal here (good only for the 386,
486 and 586, while gcc covers the 386, the 686 and (presumably, since
you never know with Intel) the future), but this doesn't seem to warrant
your snap judgment of "crappy code generation".

The newer gcc's will produce branchless code for the example when
tuning for a 586 or higher.

I only have gcc 3.4.2 at my disposal here. You do mean tuning for 586
while using the 386 instruction set, right? Could you reproduce the
relevant code?

I suppose. Last week's gcc 4.1 from CVS:

[mark@icepick ~]$ gcc-4_1_cvs_20051015 --version
gcc-4_1_cvs_20051015 (GCC) 4.1.0 20051015 (experimental)
[...]

[mark@icepick ~]$ gcc-4_1_cvs_20051015 -Wall -ansi -pedantic -O2
-mtune=i686 -fomit-frame-pointer -c -o foo.o foo.c

modTest:
movzwl 8(%esp), %edx
movzwl 4(%esp), %eax
subl %edx, %eax
cltd
shrl $29, %edx
addl %edx, %eax
andl $7, %eax
subl %edx, %eax
movzwl %ax, %eax
ret

Yup, clever. Branchless and pipeline-optimized. Thanks, I didn't see this.

IMO, probably not. It's been a while since I've used VC6, but I
remember it seemed to produce nearly the same code no matter what you
tell it. Give it a try yourself-- hopefully I'll never have to touch
it again.

I don't have it. What do you take me for, a Microsoft apologist? The
point here was not to defend VC, but to verify whether your condemnation
was valid.

Fair enough. I think it's a waste of time to look at the code
generation of what I consider to be a mostly obsolete compiler that
targets mostly obsolete machines. Like I originally said: Yawn.

Such is the danger of throwaway comments. That said, thanks for
indulging my demands that you expand your statements.

S.

Skarmander · Oct 23, 2005

Skarmander said:
Mark F. Haigh wrote:

[mark@icepick ~]$ gcc-4_1_cvs_20051015 --version
gcc-4_1_cvs_20051015 (GCC) 4.1.0 20051015 (experimental)
[...]

[mark@icepick ~]$ gcc-4_1_cvs_20051015 -Wall -ansi -pedantic -O2
-mtune=i686 -fomit-frame-pointer -c -o foo.o foo.c

modTest:
movzwl 8(%esp), %edx
movzwl 4(%esp), %eax
subl %edx, %eax
cltd
shrl $29, %edx
addl %edx, %eax
andl $7, %eax
subl %edx, %eax
movzwl %ax, %eax
ret

Yup, clever. Branchless and pipeline-optimized. Thanks, I didn't see this.

One slight coda to this: gcc 3.4.2 will produce almost identical
branchless code if you manually crank up the cost of branches with
-mbranch-cost. Obviously, I didn't know about this option before, or I
would have tried it.

S.

Spiro Trikaliotis · Oct 23, 2005

Hello,

this is cross-posted with follow-up to comp.lang.asm.x86, because I talk about
generated and optimized x86 code for a C program, which is rather OT in comp.lang.c.

Skarmander said:
I suppose. Last week's gcc 4.1 from CVS:

[mark@icepick ~]$ gcc-4_1_cvs_20051015 --version
gcc-4_1_cvs_20051015 (GCC) 4.1.0 20051015 (experimental)
[...]

[mark@icepick ~]$ gcc-4_1_cvs_20051015 -Wall -ansi -pedantic -O2
-mtune=i686 -fomit-frame-pointer -c -o foo.o foo.c

modTest:
movzwl 8(%esp), %edx
movzwl 4(%esp), %eax
subl %edx, %eax
cltd
shrl $29, %edx
addl %edx, %eax
andl $7, %eax
subl %edx, %eax
movzwl %ax, %eax
ret

Yup, clever. Branchless and pipeline-optimized. Thanks, I didn't see this.

Now, you are comparing something very weird. MSVC++ 6.0 is some years
older than "last week's gcc CVS".

OK, although it is OT, I did some tests with a newer version of the MS
compiler (as available with the latest released DDK, Win 2003 DDK SP
1, with default settings for release builds):

C:\test>cl
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.4035 for 80x86
Copyright (C) Microsoft Corporation 1984-2002. All rights reserved.

Now, let's have a look at the compiled code:

unsigned short modTest1(unsigned short a, unsigned short b)
{
return ( (a - b) % 8 );
}

test!modTest1:
01001ba7 8bff mov edi,edi
01001ba9 55 push ebp
01001baa 8bec mov ebp,esp
01001bac 0fb74d0c movzx ecx,word ptr [ebp+0xc]
01001bb0 0fb74508 movzx eax,word ptr [ebp+0x8]
01001bb4 2bc1 sub eax,ecx
01001bb6 99 cdq
01001bb7 6a08 push 0x8
01001bb9 59 pop ecx
01001bba f7f9 idiv ecx
01001bbc 668bc2 mov ax,dx
01001bbf 5d pop ebp
01001bc0 c20800 ret 0x8
[...]

unsigned short modTest2(unsigned short a, unsigned short b)
{
return ( (a - b) % 8U );
}

test!modTest2:
01001bc8 8bff mov edi,edi
01001bca 55 push ebp
01001bcb 8bec mov ebp,esp
01001bcd 8b4508 mov eax,[ebp+0x8]
01001bd0 8b4d0c mov ecx,[ebp+0xc]
01001bd3 2bc1 sub eax,ecx
01001bd5 83e007 and eax,0x7
01001bd8 5d pop ebp
01001bd9 c20800 ret 0x8
[...]

unsigned short modTest3(unsigned short a, unsigned short b)
{
return ( (unsigned short)( (unsigned int)(a - b) ) % 8U );
}

test!modTest3:
01001be1 8bff mov edi,edi
01001be3 55 push ebp
01001be4 8bec mov ebp,esp
01001be6 33c0 xor eax,eax
01001be8 8a4508 mov al,[ebp+0x8]
01001beb 2a450c sub al,[ebp+0xc]
01001bee 83e007 and eax,0x7
01001bf1 5d pop ebp
01001bf2 c20800 ret 0x8

(Remark: I was not able to compile modTest2 and modTest3 if they were both
in the same compilation unit as the main() calling them. The compiler always
insisted on optimizing them away. I had to use two different compilation
units and link them together to actually get code for them.)

Now, this looks much better than the MSVC++ 6.0 code, doesn't it?

Anyway, I am not sure if the IDIV approach with modTest1() is a good solution
compared with the solution of gcc 4.1. Any thoughts from the assembler gurus
here?

Regards,
Spiro.

Keyser Soze · Oct 23, 2005

Skarmander said:
That's why it was flagged "off-topic". And telling Usenet posters to get
out more? Hmm...

Hey, at least nobody was compared to Hitler. That should count for
something.

S.

Yes that's a good thing.
