Circumventing the -fno-strict-aliasing switch


Marco Devillers

Hi all, a short question.

I wrote a compiler which compiles to C and has a garbage collector
written in C. The garbage collector assumes that nodes in memory are
arrays of integers or pointers (I use the default pointer-sized integer
type for that).

However, sometimes terminal nodes may hold other values, such as
floats, doubles, etc. Naturally, I then cast these to the correct
type, so an intptr_t* becomes a float*. Of course, this is unspecified
behavior according to the C standard, so I need to compile with the
-fno-strict-aliasing switch in gcc.

I want to rewrite my code such that I don't need that switch anymore,
i.e., I want to end up with portable code.

My question: what is the correct way to rewrite this code?

I know there's an exception to strict aliasing, i.e., the char
pointer. Should I redefine the node pointer type as char*, or is it
sufficient to cast intptr_t* to char* and then to float* (can I use a
char* as an intermediate to trick the C type system)?

Thanks all,
Marco
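A minimal sketch of what the character-type exception actually permits, assuming node cells are intptr_t as described above: the bytes of a float may be copied into a cell through unsigned char, and copied back out into a real float object. Casting the pointer through char* on the way to float* would not help, because the exception covers accesses made through a character type, not pointers that merely passed through one. The helper names cell_put_float and cell_get_float are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time check of an assumption the C standard does not guarantee:
   a float fits inside one intptr_t cell. */
typedef char float_fits_in_cell[sizeof(float) <= sizeof(intptr_t) ? 1 : -1];

/* Hypothetical helper: copy the bytes of a float into one node cell. */
static void cell_put_float(intptr_t *cell, float value)
{
    const unsigned char *src = (const unsigned char *)&value;
    unsigned char *dst = (unsigned char *)cell;
    for (size_t k = 0; k < sizeof value; k++)
        dst[k] = src[k];              /* character-type access: allowed */
}

/* Hypothetical helper: recover the float by copying the bytes back into
   a genuine float object, instead of dereferencing (float *)cell. */
static float cell_get_float(const intptr_t *cell)
{
    float value;
    const unsigned char *src = (const unsigned char *)cell;
    unsigned char *dst = (unsigned char *)&value;
    for (size_t k = 0; k < sizeof value; k++)
        dst[k] = src[k];
    return value;
}
```

This byte loop is exactly what memcpy does, so in practice memcpy would be the idiomatic spelling.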
 

jacob navia

On 25/05/11 14:53, Marco Devillers wrote:
[snip]

This will never work correctly. As you say, "this is unspecified
behavior according to the C standard". Do not do this.

Make a union of the different pointer types (int *, float *, whatever)
and use the language instead of fighting against it.

typedef union ptrs {
        int *pInt;
        float *pFloat;
        double *pDouble;
        // etc
} Ptrs;
 

Marco Devillers

On 25/05/11 14:53, Marco Devillers wrote:
[snip]

This will never work correctly. As you say "this is unspecified
behavior according to the C standard". Do not do this

Make a union of the different pointer types (int * float * whatever)
and use the language instead of fighting against it.

typedef union ptrs {
        int *pInt;
        float *pFloat;
        double *pDouble;
        // etc
} Ptrs;

Can't. I don't know, a priori, what will be stored in a terminal node.
 

jacob navia

On 25/05/11 16:15, Marco Devillers wrote:
Can't. I don't know, a priori, what will be stored in a terminal node.

Then use a void pointer at the end. That is the official "wildcard".
 

Marco Devillers

On 25/05/11 16:15, Marco Devillers wrote:
[snip]

Then use a void pointer at the end. That is the official "wildcard".

Both remarks are nonsense. The union will not work, since I am storing
arbitrary data; if anything, I need a union of primitive types.
Moreover, even with a union, there is no guarantee according to the
standard that your solution will work, since reading a value from a
union with a different type than the one written is also unspecified.
Also, void* isn't the official wildcard when dealing with
aliasing rules: the char pointer is.
 

Tom St Denis

[snip]

As Jacob pointed out, use a "void *" pointer in your terminal node so
you can assign any pointer type to it. You might want to have some
form of metadata along for the ride so you know what type the node is
when using it...

Tom
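A rough sketch of the approach Tom describes, with a hypothetical tag enum and field names: the terminal node carries metadata plus a void * to a separately allocated payload, and the payload is always read back through the type it was stored as.

```c
#include <stdlib.h>

/* Hypothetical type tags; a real system would have many more. */
enum payload_tag { TAG_FLOAT, TAG_DOUBLE };

/* Terminal node that points at its payload instead of containing it. */
struct terminal {
    enum payload_tag tag;   /* metadata: what kind of object data points to */
    size_t size;            /* payload size in bytes */
    void *data;             /* any object pointer converts to void * and back */
};

/* Allocate a terminal holding a copy of a double. Returns 0 on success. */
static int terminal_make_double(struct terminal *t, double value)
{
    double *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = value;             /* the allocated object now has effective type double */
    t->tag = TAG_DOUBLE;
    t->size = sizeof *p;
    t->data = p;
    return 0;
}

/* Read the payload back; the caller is expected to have checked the tag. */
static double terminal_get_double(const struct terminal *t)
{
    return *(const double *)t->data;  /* read with the same type as stored */
}
```

The cost of this design is an extra allocation and an extra indirection per terminal, since the node holds a pointer to the data rather than the data itself.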
 

Marco Devillers

As Jacob pointed out use a "void *" pointer in your terminal node so
you can assign any pointer type to it.  You might want to have some
form of metadata along for the ride so you know what type the node is
when using it...

Tom

The metadata is, of course, already present; size and type tags
discriminate all terminal nodes.

Terminal nodes hold series of bits, integers or pointers, but they may
also hold the data itself, not pointers to data. I don't see how your
solution would work.
 

Marco Devillers

Forget those remarks. The only way out, I think, is casting stuff to
char* (which is allowed) and copying it with memcpy/sizeof into
memory of a given type.

Is that portable?
 

Tim Rentsch

Marco Devillers said:
Hi all, a short question.

I wrote a compiler which compiles to C and has a garbage collector
written in C. The garbage collector assumes that nodes in memory are
arrays of integers or pointers (I use the default pointer-sized integer
type for that).

It would help a lot if you showed some code to make the question
more specific. Obvious first question: is the format of a Node
(ignoring the cases with floats, etc) like this

   struct Node_s {
      union {
         some_integer_type   integers[ ELEMENTS_PER_NODE ];
         struct Node_s      *pointers[ ELEMENTS_PER_NODE ];
      } element_arrays;
   };

or like this

   struct Node_s {
      union {
         some_integer_type   integer;
         struct Node_s      *pointer;
      } elements[ ELEMENTS_PER_NODE ];
   };

or something else? The description you give isn't
specific enough so people can tell.
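For comparison, a sketch of Tim's second layout with a floating-point member added, assuming some_integer_type is intptr_t and picking an arbitrary ELEMENTS_PER_NODE. It only works if every payload type is known when the union is declared, and if reading is done through the same member that was written. The accessor names are hypothetical.

```c
#include <stdint.h>

#define ELEMENTS_PER_NODE 4   /* hypothetical size, for illustration only */

struct Node_s {
    union {
        intptr_t integer;
        struct Node_s *pointer;
        double real;          /* hypothetical extra member for floating payloads */
    } elements[ ELEMENTS_PER_NODE ];
};

/* Store a double in element k through the 'real' member. */
static void node_set_real(struct Node_s *n, int k, double v)
{
    n->elements[k].real = v;
}

/* Read element k through the same member that was written. */
static double node_get_real(const struct Node_s *n, int k)
{
    return n->elements[k].real;
}
```

Adding a double member can make every cell larger than a pointer on 32-bit targets, which affects the whole heap, not just terminal nodes.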

However, sometimes terminal nodes may hold other values, such as
floats, doubles, etc. Naturally, I then cast these to the correct
type, so an intptr_t* becomes a float*. Of course, this is unspecified
behavior according to the C standard, so I need to compile with the
-fno-strict-aliasing switch in gcc.

Most likely the behavior being evoked is undefined, not
unspecified.

I want to rewrite my code such that I don't need that switch anymore,
i.e., I want to end up with portable code.

Excellent.


My question: What is the correct manner to rewrite this code?

Please show some example code fragments (and including the
relevant data type definitions) of the code you're asking
about.

I know there's an exception to strict aliasing, i.e., the char
pointer. Should I redefine the node pointer type as char*, or is it
sufficient to cast intptr_t* to char* and then to float* (can I use a
char* as an intermediate to trick the C type system)?

Beware! The term 'strict aliasing' is a gcc-ism, and doesn't
necessarily map onto what the Standard defines as conforming.
The terminology used in the Standard is 'effective type rules'.
I suspect you intend 'strict aliasing' to be synonymous with
obeying the Standard's effective type rules, but it isn't,
and so it's important to distinguish between them.
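A small sketch of the effective-type rules Tim refers to (C99 6.5p6-7), using a hypothetical function name: storage from malloc has no declared type, so a store through a non-character lvalue gives it an effective type, and a later store of a different type re-types it. Declared objects, such as a static intptr_t array, are not re-typed this way.

```c
#include <stdlib.h>

/* Sketch of the effective-type rules for allocated storage:
   each store through a non-character lvalue sets the effective
   type used for later reads. Returns 0 on success. */
static int effective_type_demo(double d, int i, double *d_out, int *i_out)
{
    size_t size = sizeof(double) > sizeof(int) ? sizeof(double) : sizeof(int);
    void *block = malloc(size);
    if (block == NULL)
        return -1;

    *(double *)block = d;        /* effective type is now double */
    *d_out = *(double *)block;   /* OK: read with the matching type */

    *(int *)block = i;           /* a new store re-types the storage as int */
    *i_out = *(int *)block;      /* OK: read as int */

    /* Reading *(double *)block at this point would be undefined:
       the last store gave the storage effective type int. */
    free(block);
    return 0;
}
```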
 

jacob navia

On 25/05/11 16:44, Marco Devillers wrote:
Both remarks are nonsense. The union will not work since I am storing
arbitrary data; if anything, I need a union of primitive types.

Excuse me, but it was YOU who wrote:
However, sometimes, terminal nodes may hold other values, such as
floats, doubles, etc. Naturally, I cast these then to the correct
type, so a intptr_t* becomes a float*

It was you who spoke about pointers in the first place.

Now, post the definition of the structure.

Several years ago I wrote a Lisp interpreter. It used the following
structure:

typedef struct cons {
    unsigned char n_type;
    unsigned char n_flags;
    union {                                /* value */
        struct lsym {                      /* ------------------symbol */
            SYMBOL_POINTER plist;          /* symbol plist */
            struct cons *value;            /* its value */
        } n_lsym;
        struct csym {                      /* compiled-symbol */
            unsigned short offset;         /* offset */
            struct cons *psym;             /* pointer to the symbol */
        } n_csym;
        struct lcode {
            unsigned char *code;
            struct cons *data;
        } n_lcode;
        struct clabel {                    /* -----label for gotos */
            struct cons *psym;             /* pointer to the symbol */
            struct context *pcontext;      /* pointer to the context */
        } n_clabel;
        struct lsubr {                     /* ----------subr/fsubr node */
            short xorder;
            struct cons *(*MachineCode)(); /* pointer to the code */
        } n_lsubr;
        struct llist {                     /* ---------list node (cons) */
            struct cons *list_car;         /* the car pointer */
            struct cons *list_cdr;         /* the cdr pointer */
        } n_llist;
        struct lint {                      /* -------------integer node */
            ENTIER xi_int;                 /* integer value */
        } n_lint;
        struct lchar {                     /* ------------character node */
            ENTIER caractere;              /* character value */
        } n_lettre;
        struct lratio {                    /* ---------------ratio node */
            ENTIER numerator;
            ENTIER denominator;
        } n_ratio;
        struct lfloat {                    /* --------------float node */
            FLOTTANT lf_float;             /* float value */
        } n_lfloat;
        struct xcmplx {                    /* ------------complex node */
            struct cons *real_part;        /* real value */
            struct cons *imag_part;        /* imag value */
        } n_xcmplx;
        struct lstr {                      /* -------------string node */
            unsigned char *xst_str;        /* pointer */
            unsigned long xstrlen;         /* length */
        } n_lstr;
        struct lfixstr {                   /* ------------fixed string */
            unsigned char fixstr[MAX_FIX_STR];
        } n_lfixstr;
        struct lfptr {                     /* -----------------file */
            STREAM *lf_sp;                 /* pointer to info */
            short lf_savech;               /* lookahead */
        } n_lfptr;
        struct intvect {                   /* simple vector (integers or floats) */
            Vector *ivdescriptor;
            ENTIER *int_data;
        } n_intvect;
        struct vecteur {                   /* -------------vector node */
//
// BIG SNIP
//

    }
    inform;
} LISP_VALUE;

If you are writing a compiler, you KNOW what types you have.

And, as far as I know, void * is the wildcard, NOT the char pointer.
 

Tim Rentsch

Marco Devillers said:
[suggesting an approach using a union]

The union will not work

Using a union (not necessarily the suggested one) is the
approach that is most likely _to_ work.

since I am storing
arbitrary data; if anything, I need a union of primitive types.

Obviously you know all the different types that can be
held, since you are casting the pointer types to (float *),
(double *), etc. Make a union that has all the different
types that a node can hold.

Moreover, even in a union, there are no guarantees according to the
standard that your solution will work since reading a value from a
union with a different type than written to is unspecified also.

Irrelevant, since presumably you will be using the same
type as what the node actually holds, so the same type
will be used for both reading and writing.

(Technical point: the rules for reading a union member
other than the last one stored are not unspecified (or
undefined), and the semantics are better defined than
most people think. This probably isn't relevant for
what you're doing, but it's a common misconception and
one worth clarifying.)
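A sketch of the union behaviour Tim mentions, with a hypothetical function name: when a member other than the last one stored is read, the stored bytes are reinterpreted as the new type (a footnote to 6.5.2.3 in C99 TC3 and C11 says so; the result could in principle be a trap representation). It assumes float and uint32_t have the same size, which the typedef checks at compile time.

```c
#include <stdint.h>

/* Compile-time check: the sketch assumes float is exactly 32 bits wide. */
typedef char float_is_32_bits[sizeof(float) == sizeof(uint32_t) ? 1 : -1];

/* Reinterpret the bytes of a float as a uint32_t through a union. */
static uint32_t float_bits_via_union(float f)
{
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = f;      /* store through the float member */
    return pun.u;   /* read the same bytes through the uint32_t member */
}
```

On a platform with IEEE 754 floats, 1.0f comes out as 0x3F800000.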
 

Marco Devillers

Marco Devillers said:
Hi all, a short question.
I wrote a compiler which compiles to C and has a garbage collector
written in C. The garbage collector assumes that nodes in memory are
arrays of integers or pointers (I use the default pointer-sized integer
type for that).

It would help a lot if you showed some code to make the question
more specific.  Obvious first question:  is the format of a Node
(ignoring the cases with floats, etc) like this

   struct Node_s {
      union {
         some_integer_type   integers[ ELEMENTS_PER_NODE ];
         struct Node_s      *pointers[ ELEMENTS_PER_NODE ];
      } element_arrays;
   };

Excellent! No, a node is just a series of cells, integers or pointers,
so I didn't define an explicit node type. Referring to a node is just
intptr_t* node, though there are a number of invariants. For brevity,
let's assume that the first value is always an integer and tells you a)
the size and b) whether the following bits are integers (a terminal
node) or pointers (a non-terminal). (It's more complex, stuff may mix
since it is a conservative collector, but this will do.)

So I sometimes store floats (their bits) in the memory of a terminal
node, starting with the second cell. But I would like it if people were
able to add new basic types to the language such as pairs of floats
for an OpenGL binding. So, a priori, I have no knowledge of what data
is stored in the payload of a terminal node.
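A sketch of storing an arbitrary user-defined payload, here a hypothetical struct float_pair, into a terminal node laid out as described: a header cell followed by payload cells. The header encoding (just the total cell count) is a simplification, and make_terminal, read_terminal and cells_for are made-up names. The payload goes in and out with memcpy, so no intptr_t* is ever dereferenced as a float*.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-defined primitive, e.g. for an OpenGL binding. */
struct float_pair { float x, y; };

/* Number of intptr_t cells needed to hold n payload bytes. */
static size_t cells_for(size_t n)
{
    return (n + sizeof(intptr_t) - 1) / sizeof(intptr_t);
}

/* Allocate a terminal node: cell 0 is a header (here just the total
   cell count), cells 1.. hold the raw bytes of the payload. */
static intptr_t *make_terminal(const void *payload, size_t n)
{
    size_t cells = 1 + cells_for(n);
    intptr_t *node = calloc(cells, sizeof *node);
    if (node == NULL)
        return NULL;
    node[0] = (intptr_t)cells;
    memcpy(node + 1, payload, n);   /* copy bytes; no pointer cast needed */
    return node;
}

/* Copy n payload bytes back out into an object of the real type. */
static void read_terminal(const intptr_t *node, void *out, size_t n)
{
    memcpy(out, node + 1, n);
}
```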
 

Keith Thompson

Marco Devillers said:
On 25/05/11 16:15, Marco Devillers wrote:
[snip]

Then use a void pointer at the end. That is the official "wildcard".
[snip]
Moreover, the void* isn't the official wildcard when dealing with
aliasing rules: the char pointer is.

void*, char*, signed char*, and unsigned char* are guaranteed to have
the same representation and alignment requirements. For your purposes,
I think the only difference is going to be whether you need (explicit)
casts for any conversions. void* is probably going to be more
convenient, but you can use whichever one you like.

I haven't thought through the implications of using non-pointer types.
 

Keith Thompson

Keith Thompson said:
[snip]

I should mention that there's really no such thing as a "wildcard"
pointer type in C. void* has some properties that make it generally
useful, but you shouldn't conclude from those properties that it's a
"wildcard", and then draw further conclusions from that.

Converting a pointer to an object or incomplete type to void* and
then back again yields the original pointer value. (Note that
this doesn't apply to function pointers.) I think that's the
only useful sense in which void* is a "wildcard". (There are also
rules allowing implicit conversions, but since you're generating
C code you can insert any necessary casts anyway, so that's not
particularly relevant.)
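A short sketch of the one guarantee Keith describes, with hypothetical function and union names: an object pointer survives a round trip through void *, and a union of pointer types is safe only when the same member is written and read, with the conversion done afterwards.

```c
/* Converting an object pointer to void * and back yields a pointer
   that compares equal to the original (C99 6.3.2.3). */
static int void_roundtrip_ok(double *p)
{
    void *v = p;          /* implicit conversion to void * */
    double *back = v;     /* implicit conversion back to double * */
    return back == p;
}

union any_ptr { void *v; double *d; };

/* Portable use of the union: write and read the same void * member,
   then convert. Reading u.d after writing u.v would instead assume
   that void * and double * share a representation. */
static double *via_union_ok(double *p)
{
    union any_ptr u;
    u.v = p;
    return u.v;           /* converted back to double * on return */
}
```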

There are no guarantees about the representation of void* vs., say,
double*; either could in principle be larger than the other, and
storing one in a union and then retrieving the other could give
you garbage. (Though in most implementations, they happen to have
the same representation.)
 

Keith Thompson

Marco Devillers said:
Excellent! No, a node is just a series of cells, integers or pointers,
so I didn't define an explicit node type. Referring to a node is just
intptr_t* node, though there are a number of invariants.
[...]

If you're using intptr_t, your code isn't absolutely portable, since
intptr_t isn't guaranteed to exist. But it may well be portable enough
for your purposes.

You're more likely to run into an implementation that doesn't have
intptr_t because it doesn't support C99 than because there's no
appropriate integer type capable of holding pointer values.
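A sketch of how generated code might check for intptr_t, using cell_t as a hypothetical name for the cell type: C99 requires <stdint.h> to define INTPTR_MAX exactly when it provides intptr_t. A pre-C99 compiler without <stdint.h> would still fail at the #include, so this only catches a C99 implementation that lacks the type.

```c
#include <stdint.h>

/* INTPTR_MAX is defined if and only if intptr_t is provided. */
#if defined(INTPTR_MAX)
typedef intptr_t cell_t;
#else
#error "no intptr_t on this implementation; choose another cell type"
#endif

/* A void * converted to intptr_t and back compares equal to the
   original pointer; that is the only round trip the type promises. */
static int cell_roundtrip_ok(void *p)
{
    cell_t c = (cell_t)p;
    return (void *)c == p;
}
```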
 

Marco Devillers

Dear people, I want to _alias_ stuff. The only pointer type on which
it is safe to alias _to_ is the char*.

So,

float f = 3.14;
float* fp = &f;
char* aliased_fp = (char*) fp; // only supported aliasing, as far as I know

So, I guess the only option I have to safely 'convert' between int and
float is something like this (please allow for some typos/semantic
mistakes and sizing problems):

float f = 3.14;
float* fp = &f;
char* a_fp = (char*) fp;

int i = 0;
int* ip = &i;
char* a_ip = (char*) ip;

memcopy(a_ip, a_fp, sizeof(int)); // copy some bits from f to i.

Don't confuse me with void* since I really think that has totally no
bearing on the subject.

Cheers, Marco
 

Morris Keesan

So, I guess the only option I have to safely 'convert' between int and
float is something like this (please allow for some typos/semantic
mistakes and sizing problems):

float f = 3.14;
float* fp = &f;
char* a_fp = (char*) fp;

int i = 0;
int* ip = &i;
char* a_ip = (char*) ip;

memcopy(a_ip, a_fp, sizeof(int)); // copy some bits from f to i.

Don't confuse me with void* since I really think that has totally no
bearing on the subject.

I don't know about this "memcopy" function you're using, but the standard
memcpy function takes arguments of type (void *). And since (void *) has
special privileges (like automatic conversion without casts, in the
presence of a prototype), you don't need any of fp, a_fp, ip, or a_ip to
copy sizeof(int) bytes from f to i. This works:

#include <string.h>
....
float f = 3.14;
int i = 0;
....
memcpy(&i, &f, sizeof(int)); /* or memcpy(&i, &f, sizeof i); */

I would probably include an assertion that sizeof f >= sizeof i.
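Morris's version with the suggested check filled in, as a sketch; float_bits is a hypothetical name. Checking sizeof f >= sizeof i guarantees that memcpy never reads past the end of f. The resulting int may still be a meaningless value, or in principle a trap representation.

```c
#include <assert.h>
#include <string.h>

/* Copy the leading sizeof(int) bytes of a float's representation
   into an int. */
static int float_bits(float f)
{
    int i;
    assert(sizeof f >= sizeof i);   /* never read past the end of f */
    memcpy(&i, &f, sizeof i);       /* no casts or temporaries needed */
    return i;
}
```

With 32-bit int and IEEE 754 float, float_bits(3.14f) gives 1078523331, the value Keith reports later in the thread.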
 

Marco Devillers

I don't know about this "memcopy" function you're using, but the standard
memcpy function takes arguments of type (void *).  And since (void *) has
special privileges (like automatic conversion without casts, in the
presence of a prototype), you don't need any of fp, a_fp, ip, or a_ip to
copy sizeof(int) bytes from f to i.  This works:

Hey, didn't I ask you to allow for some typos and sizing errors? ;)

#include <string.h>
...
float f = 3.14;
int i = 0;
...
memcpy(&i, &f, sizeof(int)); /* or memcpy(&i, &f, sizeof i); */

I would probably include an assertion that sizeof f >= sizeof i.

But I guess, between me and you, we worked out the answer. Strict
aliasing lets the compiler assume that pointers of different types
refer to different objects. So, I guess it's sufficient to use memcpy
in the places where I alias stuff at the moment.

Anyway, thanks all, signing off,
Marco
 

Keith Thompson

Marco Devillers said:
Dear people, I want to _alias_ stuff. The only pointer type on which
it is safe to alias _to_ is the char*.

Not true; void* works just as well as char* (and gives you the advantage
of implicit conversions).

So,

float f = 3.14;
float* fp = &f;
char* aliased_fp = (char*) fp; // only supported aliasing, as far as I know

So, I guess the only option I have to safely 'convert' between int and
float is something like this (please allow for some typos/semantic
mistakes and sizing problems):

float f = 3.14;
float* fp = &f;
char* a_fp = (char*) fp;

int i = 0;
int* ip = &i;
char* a_ip = (char*) ip;

memcopy(a_ip, a_fp, sizeof(int)); // copy some bits from f to i.

Don't confuse me with void* since I really think that has totally no
bearing on the subject.

If void* confuses you, I think you need to clear up that confusion
before proceeding further. For example, memcpy (note spelling) takes
void* arguments, and you need to understand why.

You can pass arguments of any pointer-to-object type to memcpy; there's
no need to convert explicitly to another pointer type or to use
temporaries.

Your code above is equivalent to:

float f = 3.14;
int i = 0;
memcpy(&i, &f, sizeof (int));

This will "copy some bits from f to i", but it's not necessarily
a safe thing to do. If int happens to be bigger than float,
it will read past the end of f, resulting in undefined behavior.
And the value stored in i after the memcpy() won't necessarily be
meaningful; it could even be a trap representation (though that's
unlikely on most real systems).

There are sometimes good reasons to take the representation of a
float object and treat it as an int. From what I've read so far,
I'm not convinced that you actually have a reason to do this.

On my system, i gets the value 1078523331. How is that value useful
to you? (I'm not asserting that it's not useful, I just don't see
how it would be.)
 

Marco Devillers

First off, the mistake _I_ made is that I wanted to alias in the context
of a compiler which doesn't allow it. So, apologies to all, this is my
mistake. I just can't alias and will need to copy.

Well, now that that's cleared up:
There are sometimes good reasons to take the representation of a
float object and treat it as an int.  From what I've read so far,
I'm not convinced that you actually have a reason to do this.

On my system, i gets the value 1078523331.  How is that value useful
to you?  (I'm not asserting that it's not useful, I just don't see
how it would be.)

Thing is, my collector just knows about cells, which are either ints
or pointers. In fact, every constant value is treated as a series of
bits/ints and copied in that manner. That way, the collector
doesn't need to discriminate between all the different kinds of
terminals that might occur; it allows for a very concisely defined
collector, since it has no knowledge of values except that they are
size-tagged series of bits.
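A sketch of the collector side of this design, assuming (hypothetically) that cell 0 holds the node's total cell count: the collector copies a terminal node as opaque bytes with memcpy, so it never has to read the payload through any particular type. copy_terminal is a made-up name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a terminal node without knowing what its payload holds.
   Cell 0 is assumed to hold the total cell count (header included). */
static intptr_t *copy_terminal(const intptr_t *node)
{
    size_t cells = (size_t)node[0];
    intptr_t *copy = malloc(cells * sizeof *copy);
    if (copy == NULL)
        return NULL;
    /* For allocated storage, memcpy gives the destination the effective
       type of the source (C99 6.5p6), so floats, doubles or user
       structs survive without being read as intptr_t. */
    memcpy(copy, node, cells * sizeof *copy);
    return copy;
}
```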

Moreover, I just want people to be able to add arbitrary 'primitive'
types which means they should also be able to store anything in a
terminal node. I.e., I envision that people will want to store
arbitrary structures into them. Like a tuple of doubles for OpenGL, or
a tuple of characters for an environment hashmap, or whatever.

These are daydreams of course. My compiler at the moment is more a
personal experiment which allows me to keep a bit in touch with
programming, modern compiler technology, type theory, and it allows me
to explore some armchair ideas. It certainly is not a production ready
compiler...

Again, thanks all, Marco
 
