Netocrat
The code at the bottom illustrates the nature of a situation discussed on
comp.lang.asm.x86. Basically an unsigned integer is being used as a
negative index in a loop, and it works as though the integer were signed.
I want to confirm my interpretation of this behaviour as there have been
differing understandings of why it works.
On my system (a P4 running gcc under Linux), the results are as follows:
len before sign swap : 9
shlen before sign swap : 9
l1 before addition : 0x8048644
l1[0] before addition : a
l1 after addition : 0x804864d
len after sign swap : 4294967287
&l1[len] : 0x8048644
l1[len] : a
Looping using len
l1[len==4294967287]: a
l1[len==4294967288]: b
l1[len==4294967289]: c
l1[len==4294967290]: d
l1[len==4294967291]: e
l1[len==4294967292]: f
l1[len==4294967293]: g
l1[len==4294967294]: h
l1[len==4294967295]: i
shlen after sign swap : 65527
Segmentation fault
I expected those results for shlen, but I didn't expect that the access
through len would work.
I have looked at the draft C89 and C99 standards and they don't seem to
say anything about the interpretation of indexes except that they must
have integer type (6.5.2.1 Array subscripting). I assume that the sign is
therefore interpreted according to the type of the integer and no implicit
conversion is applied.
So my interpretation according to the standard - given that
sizeof(unsigned int) is 4 and sizeof(char *) is also 4 in this case - is
as follows:
The code len = -len; sets len to (unsigned int)-9 == 4294967287. The
output confirms this. Then &l1[len] expands to:
    l1 + len * sizeof(char)
 == l1 + 4294967287
Given that the value of l1 is greater than 9, the sum does not fit in 32
bits, so this should overflow.
From C89 Draft, 3.3.6 Additive operators:
"As with any other arithmetic overflow, if the result does not fit
in the space provided, the behavior is undefined." [referring to pointer
addition]
So the fact that the code happens to work in this case is not by virtue of
the standard as I interpret it. It is officially undefined behaviour.
Is this interpretation correct?
Code is below
--------------
#include <stdio.h>

int main(void)
{
    char *l1 = "abcdefghi";
    unsigned int len = 9;
    unsigned short shlen = len;

    printf("len before sign swap : %u\n", len);
    printf("shlen before sign swap : %hu\n", shlen);
    printf("l1 before addition : %p\n", l1 );
    printf("l1[0] before addition : %c\n", l1[0] );

    l1 += len;
    len = -len;
    shlen = -shlen;

    printf("l1 after addition : %p\n", l1 );
    printf("len after sign swap : %u\n", len);
    printf("&l1[len] : %p\n", &l1[len]);
    printf("l1[len] : %c\n", l1[len]);
    printf("Looping using len\n");
    while(len != 0) {
        printf("l1[len==%u]: %c\n", len, l1[len]);
        len++;
    }

    printf("shlen after sign swap : %hu\n", shlen);
    printf("l1[shlen] : %c\n", l1[shlen]);
    printf("&l1[shlen] : %p\n", &l1[shlen]);
    printf("Looping using shlen\n");
    while(shlen != 0) {
        printf("l1[shlen==%hu]: %c\n", shlen, l1[shlen]);
        shlen++;
    }
    return 0;
}