Amusing C, amusing compiler

Ark

A function
int foo(struct T *x)
{
    return (x+1)-x;
}
should always return a 1, no matter how T is defined. (And int could be
replaced with ptrdiff_t for you pedants.)

For one thing, it was amusing to watch how my compiler (the famed IAR
EWARM for ARM) jumps through hoops to arrive at this answer.
[ For sizeof(struct T)==6,
00000000 061080E2 ADD R1,R0,#+6
00000004 18209FE5 LDR R2,??foo_0 ;; 0xaaaaaaab
00000008 92318CE0 UMULL R3,R12,R2,R1
0000000C 2CC1A0E1 LSR R12,R12,#+2
00000010 0210A0E1 MOV R1,R2
00000014 912083E0 UMULL R2,R3,R1,R0
00000018 2331A0E1 LSR R3,R3,#+2
0000001C 03004CE0 SUB R0,R12,R3
00000020 0EF0A0E1 MOV PC,LR ;; return
??foo_0:
00000024 ABAAAAAA DC32 0xaaaaaaab
]

How does your compiler fare?
[MSVC gets it right:
mov eax, 1
ret 0
]

Another thing is that, logically, since the actual type doesn't matter,
it could be an incomplete type. However, if I just say
struct T;
before the foo's body, compilation fails. Is it good and/or justified?

- Ark
 
jacob navia

Ark said:
A function
int foo(struct T *x)
{
return (x+1)-x;
}
should always return a 1, no matter how T is defined. (And int could be
replaced with ptrdiff_t for you pedants.)

For one thing, it was amusing to watch how my compiler (the famed IAR
EWARM for ARM) jumps through hoops to arrive at this answer.
[ For sizeof(struct T)==6,
00000000 061080E2 ADD R1,R0,#+6
00000004 18209FE5 LDR R2,??foo_0 ;; 0xaaaaaaab
00000008 92318CE0 UMULL R3,R12,R2,R1
0000000C 2CC1A0E1 LSR R12,R12,#+2
00000010 0210A0E1 MOV R1,R2
00000014 912083E0 UMULL R2,R3,R1,R0
00000018 2331A0E1 LSR R3,R3,#+2
0000001C 03004CE0 SUB R0,R12,R3
00000020 0EF0A0E1 MOV PC,LR ;; return
??foo_0:
00000024 ABAAAAAA DC32 0xaaaaaaab
]

How does your compiler fare?
[MSVC gets it right:
mov eax, 1
ret 0
]

Another thing is that, logically, since the actual type doesn't matter,
it could be an incomplete type. However, if I just say
struct T;
before the foo's body, compilation fails. Is it good and/or justified?

- Ark

It is amusing how stupid machines are.
You know? You forget a semicolon and they get all screwed up.

Stupid isn't it?

1) If x is a double and is a NaN or infinity,
the result is not one but NaN.
2) For x == INT_MAX, x+1-x depends on how the operations are
ordered by the compiler. If it computes (x-x)+1 then the result is 1,
otherwise it overflows and there is undefined behavior.
If INT_MAX+1 wraps around to negative, the result is not one
either.

3) If x is unsigned and equal to UINT_MAX, the result is -x.

4) There is an infinite number of this kind of optimizations
x+1-x
x+2-x
x+3-x
x+4-x+2*x-x-x
x+5-x + 3*x-x-x-x + 5765*x-5000*x-765*x

For that last expression that should be 5 Microsoft generates:
mov eax, DWORD PTR _x$[ebp]
add eax, 5
sub eax, DWORD PTR _x$[ebp]
mov ecx, DWORD PTR _x$[ebp]
imul ecx, 3
add eax, ecx
sub eax, DWORD PTR _x$[ebp]
sub eax, DWORD PTR _x$[ebp]
sub eax, DWORD PTR _x$[ebp]
mov edx, DWORD PTR _x$[ebp]
imul edx, 5765
add eax, edx
mov ecx, DWORD PTR _x$[ebp]
imul ecx, 5000
sub eax, ecx
mov edx, DWORD PTR _x$[ebp]
imul edx, 765
sub eax, edx

Stupid isn't it???


What is not so clever is to expect machines to
understand anything.

A machine doesn't grasp anything, nor does it understand anything.
It is we who give understanding to the machines, and we
have to do it mostly by enumeration. There is no way for
a machine to make an abstraction; I suspect that is because
THEY DO NOT THINK!!!!

Happily for us programmers, they are not there yet, nor will they be
there for a long while.

Brains THINK, you see? Machines have no brains and can't generalize,
can't build from experience, can't do anything. All they have is
a fake "intelligence" in the form of complex enumerations
of facts and algorithms that humans program into them.

Brains THINK, machines execute. That's why machines never make any
mistakes. Only WE can make mistakes. It is because only WE have
intent. They do not have any.

jacob
 
Ian Collins

jacob said:
It is amusing how stupid machines are.
You know? You forget a semicolon and they get all screwed up.

Stupid isn't it?

1) If x is a double and is a NaN or infinity,
the result is not one but NaN.

But the example is only doing pointer arithmetic.
 
Ian Collins

Ark said:
Another thing is that, logically, since the actual type doesn't matter,
it could be an incomplete type. However, if I just say
struct T;
before the foo's body, compilation fails. Is it good and/or justified?

The expression still has to be parsed before it is optimised, so the
size of T has to be known in order to attempt the pointer arithmetic.
 
pete

Ark said:
A function
int foo(struct T *x)
{
return (x+1)-x;
}
should always return a 1, no matter how T is defined.
(And int could be
replaced with ptrdiff_t for you pedants.)
Another thing is that, logically,
since the actual type doesn't matter,
it could be an incomplete type. However, if I just say
struct T;
before the foo's body, compilation fails. Is it good and/or justified?

It is good and/or justified.

If x is a pointer to an incomplete type, then (x + 1) is undefined.

You can't do pointer arithmetic on pointers to incomplete types.
 
straywithsmile

In Visual Studio 2005, if struct T has not been defined (only declared),
the compiler gives an error message. If it has been defined and you
pass a pointer that is not initialized, e.g.
struct T* ptr;
foo(ptr);
Visual Studio 2005 also gives a run-time error message saying that ptr
is not initialized. On GCC, however, it is right and returns one; the
assembly output of my compiler, DJGPP, shows that foo is optimized by
the compiler and returns 1 directly.
 
Keith Thompson

Especially if it is void*

Why "especially"?

<OT>gcc allows arithmetic on void* as an extension; it doesn't allow
arithmetic on pointers to other incomplete types.</OT>
 
websnarf

Ian said:
But the example is only doing pointer arithmetic..

Setting a pointer beyond its boundaries apparently leads to undefined
behavior. This suggests that on at least some theoretical platforms it
might be worthwhile not to do this simplification. If I am not
mistaken, unsigned integers are the only data type where the standard
guarantees the applicability of that simplification.

Another more pressing point -- what is the use of such a nonsensical
simplification? Who is subtracting two pointers where the base pointer
type is clearly the same, rather than doing the simplification by hand?
Maybe some macro shenanigans, but to be honest I have never found
myself doing that and *not* simplifying it by hand. Personally, I
judge the compiler optimizer for its ability to generate good code
where human intervention is either impossible (jump tables) or difficult
(software-based register renaming.) This is why WATCOM C/C++ is still
in my arsenal of compilers.
 
Simon Biber

In Visual Studio 2005, if struct T has not been defined (only declared),
the compiler gives an error message. If it has been defined and you
pass a pointer that is not initialized, e.g.
struct T* ptr;
foo(ptr);
Visual Studio 2005 also gives a run-time error message saying that ptr
is not initialized

Absolutely correct. It's undefined behaviour to add one to an
uninitialised pointer, since it does not point at any object.

VS 2005 is being helpful in diagnosing this undefined behaviour for you,
though it is not required to do so.

but on GCC it is right and returns one;

That's not "right" per se, merely one possible manifestation of
undefined behaviour. The way GCC is behaving is correct but it is not
required to behave in that way.

the assembly output of my compiler, DJGPP, shows that foo is optimized
by the compiler and returns 1 directly.

Compilers are allowed to do that, but not required to do so.
 
Ark

Setting a pointer beyond its boundaries apparently leads to undefined
behavior.

No. I can address (but not dereference) one element past the end of an
array.

This suggests that on at least some theoretical platforms it
might be worthwhile not to do this simplification. If I am not
mistaken, unsigned integers are the only data type where the standard
guarantees the applicability of that simplification.

Another more pressing point -- what is the use of such a nonsensical
simplification? Who is subtracting two pointers where the base pointer
type is clearly the same, rather than doing the simplification by hand?
Maybe some macro shenanigans, but to be honest I have never found
myself doing that and *not* simplifying it by hand.

Oh. It's a boiled-down example from real life. The origin is indeed
a macro.

Personally, I
judge the compiler optimizer for its ability to generate good code
where human intervention is either impossible (jump tables) or difficult
(software-based register renaming.)

Also consider how they optimize constant expressions and eliminate
common subexpressions. The latter may not be yours but of the compiler's
own production. Add instruction scheduling, data rearrangement, function
inlining, function prologue/epilogue optimization etc.
- Ark
 
Ark

pete said:
It is good and/or justified.

If x is a pointer to an incomplete type, then (x + 1) is undefined.

You can't do pointer arithmetic on pointers to incomplete types.

Thanks. But I didn't ask if it is /legal/; I questioned the wisdom of it
being illegal - at today's level of compiler technology.
If
- an expression correctly evaluates to something sensible /regardless/
of the values of some of its terms, and
- these terms are known to be valid (although not known precisely)
then what's wrong with accepting such an expression?

- Ark
 
Ian Collins

Ark said:
Thanks. But I didn't ask if it is /legal/; I questioned the wisdom of it
being illegal - at today's level of compiler technology.
If
- an expression correctly evaluates to something sensible /regardless/
of the values of some of its terms, and
- these terms are known to be valid (although not known precisely)
then what's wrong with accepting such an expression?

As I posted yesterday, the expression has to be parsed before it is
optimised.
 
Keith Thompson

Not necessary; the expression, if it's legal, yields a result of type
ptrdiff_t, which will be implicitly converted to type int.
Thanks. But I didn't ask if it is /legal/; I questioned the wisdom of
it being illegal - at today's level of compiler technology.
If
- an expression correctly evaluates to something sensible /regardless/
of the values of some of its terms, and
- these terms are known to be valid (although not known precisely)
then what's wrong with accepting such an expression?

What's *right* with accepting such an expression?

Attempting to do arithmetic on a pointer to an incomplete type is an
error, requiring a diagnostic. The purpose of the diagnostic is to
tell the programmer that he's done something wrong or meaningless. I
see no benefit in suppressing that diagnostic if some terms of the
expression happen to cancel out.

If you want to write "return 1;", just write "return 1;". If you want
to perform arithmetic on a pointer to an incomplete type, then you're
making a mistake.
 
Christopher Benson-Manica

(This post is not really on-topic at all; it deals with assembly code
generated by gcc 3.3.3 for a situation described by OP.)
A function
int foo(struct T *x)
{
return (x+1)-x;
}

FWIW, gcc 3.3.3 (such is what my Unix host, SDF, provides), when fed

#include <stdio.h>

struct foo {
    int bar;
    int baz;
};

int qux( struct foo *f ) {
    return (f+1)-f;
}

int main(void)
{
    struct foo g;
    return qux(&g)-1;
}

, generates this with optimization disabled:

.set noat
.set noreorder
.text
.align 2
.globl qux
.ent qux
$qux..ng:
qux:
.frame $15,32,$26,0
.mask 0x4008000,-32
lda $30,-32($30)
stq $26,0($30)
stq $15,8($30)
bis $31,$30,$15
.prologue 0
stq $16,16($15)
lda $0,1($31)
bis $31,$15,$30
ldq $26,0($30)
ldq $15,8($30)
lda $30,32($30)
ret $31,($26),1
.end qux
.align 2
.globl main
.ent main
main:
.frame $15,32,$26,0
.mask 0x4008000,-32
ldgp $29,0($27)
$main..ng:
lda $30,-32($30)
stq $26,0($30)
stq $15,8($30)
bis $31,$30,$15
.prologue 1
lda $16,16($15)
bsr $26,$qux..ng
subl $0,1,$1
addl $31,$1,$1
bis $31,$1,$0
bis $31,$15,$30
ldq $26,0($30)
ldq $15,8($30)
lda $30,32($30)
ret $31,($26),1
.end main
.ident "GCC: (GNU) 3.3.3 (NetBSD nb3 20040520)"

Not so good, but as I believe someone elsethread (but in ng) mentioned
recently, gcc is notorious for generating brain-dead code unless you
ask it to optimize. When asked to do so (given -O3), gcc generates

.set noat
.set noreorder
.text
.align 2
.align 4
.globl main
.ent main
$main..ng:
main:
.frame $30,16,$26,0
lda $30,-16($30)
.prologue 0
bis $31,$31,$0
lda $30,16($30)
ret $31,($26),1
.end main
.align 2
.align 4
.globl qux
.ent qux
$qux..ng:
qux:
.frame $30,0,$26,0
.prologue 0
lda $0,1($31)
ret $31,($26),1
.end qux
.ident "GCC: (GNU) 3.3.3 (NetBSD nb3 20040520)"

I'm far from an assembler guru (I'm quite happy to let those of you
who are continue to make gobs of cash so that I never have to code in
it), but gcc seems to be pretty capable when you ask it to be. It
isn't what ICC will do for you (presumably), but then again ICC costs
a bit more, and cannot claim any contribution from the eminently
eminent Richard Stallman. One could do a lot worse, although I'd be
curious how gcc 4 performs.
 
lovecreatesbea...

A function
int foo(struct T *x)
{
return (x+1)-x;
}
should always return a 1, no matter how T is defined. (And int could be
replaced with ptrdiff_t for you pedants.)

I remember that someone has posted code emulating the sizeof operator
similar to the following:

/*a.c*/
struct T{
struct T *p;
int i;
} t;

int foo(struct T *x){
return (x+1)-x;
}

int size_of(struct T *x){
char *p1 = (char*)(x + 1), *p2 = (char*) x;
return p1 - p2;
}

int main(void){
printf("%d\n", foo(&t));
printf("%d\n", size_of(&t));
return 0;
}

$ gcc a.c

$ a
1
8

$
 
Richard Heathfield

(e-mail address removed) said:
/*a.c*/
struct T{
struct T *p;
int i;
} t;

int foo(struct T *x){
return (x+1)-x;
}

int size_of(struct T *x){
char *p1 = (char*)(x + 1), *p2 = (char*) x;
return p1 - p2;
}

int main(void){
printf("%d\n", foo(&t));

It took a while this time, but the undefined behaviour starts here.
 
pete

Ark said:
Thanks. But I didn't ask if it is /legal/;
I questioned the wisdom of it being illegal
- at today's level of compiler technology.

It has nothing to do with modern technology.
Pointer arithmetic on pointers to incomplete types is meaningless;
that's the problem.
If
- an expression correctly evaluates to something sensible /regardless/
of the values of some of its terms, and
- these terms are known to be valid (although not known precisely)
then what's wrong with accepting such an expression?

If x is a pointer to an incomplete type,
then (x + 1) doesn't evaluate to anything.

If x is a pointer to type int,
then ((char *)(x + 1) - (char *)x) equals sizeof(int).

If x is a pointer to an incomplete type,
then what does ((char *)(x + 1) - (char *)x) equal?
 
Richard Tobin

Ark said:
A function
int foo(struct T *x)
{
return (x+1)-x;
}
should always return a 1, no matter how T is defined.

I think that's true, provided it does not invoke undefined behaviour.

The following calls would invoke undefined behaviour when the addition
is performed:

void bar(void)
{
struct T *undefined;
struct T array[1];
struct T *null = 0;

foo(undefined);
foo(&array[1]);
foo(null);
}

(maybe the first one invokes it as soon as the variable is passed to
the function?)

-- Richard
 
