strict-aliasing??

  • Thread starter Ivan Riis Nielsen
  • Start date
I

Ivan Riis Nielsen

Hello everyone,

I have been playing around with object oriented techniques in C.
Specifically, I wanted to implement generic reference counting. This
would be implemented by a base class which would always be the first
field in all derived classes. The reference counting would then be
handled by a universal assignment macro. Below is some trimmed down
code demonstrating my issue:

gcc (4.4.3) complains about strict-aliasing rules.

% gcc -O3 -Wall t1.c
t1.c: In function ‘main’:
t1.c:28: warning: dereferencing pointer ‘_pdst’ does break
strict-aliasing rules
t1.c:28: note: initialized from here

The output from the program is

% ./a.out
d1 = {{1},2}
d1 = 0x9b53008, d2 = (nil)
d1 = {{2},2}
d1 = 0x9b53008, d2 = 0x9b53008

The compiled program does as expected. I don't understand exactly what
the problem is, can anyone shed some light on this? Or is this a false
warning?

Thanks
Ivan


<code>

#include <stdio.h>
#include <stdlib.h>

typedef struct {
int count;
} base_struct, *base;

typedef struct {
base_struct b;
int deriv_field;
} deriv_struct,*deriv;

#define UNIVERSAL_ASSIGN(dst,src) \
do { \
base* _pdst=(base*)&(dst); \
base _src=(base)(src); \
if (_src) ++(_src->count); \
if (*_pdst) --((*_pdst)->count); \
*_pdst=_src; \
} while (0)

int main(void) {
deriv d1=(deriv)malloc(sizeof(deriv_struct)),d2=0;
d1->b.count = 1;
d1->deriv_field = 2;
printf("d1 = {{%d},%d}\n",d1->b.count,d1->deriv_field);
printf("d1 = %p, d2 = %p\n",d1,d2);
UNIVERSAL_ASSIGN(d2,d1);
printf("d1 = {{%d},%d}\n",d1->b.count,d1->deriv_field);
printf("d1 = %p, d2 = %p\n",d1,d2);
return 0;
}

</code>
 
S

Seebs

I have been playing around with object oriented techniques in C.
Specifically, I wanted to implement generic reference counting. This
would be implemented by a base class which would always be the first
field in all derived classes. The reference counting would then be
handled by a universal assignment macro. Below is some trimmed down
code demonstrating my issue:

This is... Not Right.
% ./a.out
d1 = {{1},2}
d1 = 0x9b53008, d2 = (nil)
d1 = {{2},2}
d1 = 0x9b53008, d2 = 0x9b53008
The compiled program does as expected. I don't understand exactly what
the problem is, can anyone shed some light on this? Or is this a false
warning?

The problem is that you're violating one of the language rules in a way
which justifiably causes the compiler to complain. If you do this often
enough, eventually the optimizer will throw out assignments or corrupt
something.
#include <stdio.h>
#include <stdlib.h>

typedef struct {
int count;
} base_struct, *base;

typedef struct {
base_struct b;
int deriv_field;
} deriv_struct,*deriv;

#define UNIVERSAL_ASSIGN(dst,src) \
do { \
base* _pdst=(base*)&(dst); \
base _src=(base)(src); \
if (_src) ++(_src->count); \
if (*_pdst) --((*_pdst)->count); \
*_pdst=_src; \
} while (0)

int main(void) {
deriv d1=(deriv)malloc(sizeof(deriv_struct)),d2=0;
d1->b.count = 1;
d1->deriv_field = 2;
printf("d1 = {{%d},%d}\n",d1->b.count,d1->deriv_field);
printf("d1 = %p, d2 = %p\n",d1,d2);
UNIVERSAL_ASSIGN(d2,d1);
printf("d1 = {{%d},%d}\n",d1->b.count,d1->deriv_field);
printf("d1 = %p, d2 = %p\n",d1,d2);
return 0;
}

I think I get what you're trying to do here. The goal is to make it so
that you are counting that both 'd1' and 'd2' are the same pointer.

The problem is that it is not generally safe to take the address of an
object, convert it to a pointer to another type, and then modify it through
that pointer. (It can be with unsigned char). In particular, while in
practice it's probably always going to work to modify pointers to structures
as though they were pointers to other structures, it's not really safe,
and the optimizer is allowed to assume you won't, even if it sees you do
so. Thus the warning.

I'd probably write this as something like:
#define REFERENCE_ASSIGN(d2, d1) do { \
if (d2) (d2)->b.count--; \
if (d1) (d1)->b.count++; \
(d2) = (d1); \
} while(0)

.... Well, actually, I wouldn't, because I wouldn't try to do this at
all, because C hasn't really got the hooks you need -- there's no destructors.

void catch_me_if_you_can(deriv *b1) {
deriv *b2;
UNIVERSAL_ASSIGN(b2, b1);
/* oops, b2 goes away and we never find out */
return;
}

For that matter:
void catch_me_if_you_can(deriv *b1) {
deriv *b2 = {something that makes a new one};
UNIVERSAL_ASSIGN(b1, b2);
drop_reference_and_free(b2);
/* oops, b1's reference count was lowered, but
* the caller still has the reference.
*/
}

In general, if you want to do a reference counting API, I'd recommend
acknowledging up front in the design that it is an API, and people must use
your API functions to create or release the objects.

-s
 
I

ImpalerCore

Hello everyone,

I have been playing around with object oriented techniques in C.
Specifically, I wanted to implement generic reference counting.  This
would be implemented by a base class which would always be the first
field in all derived classes.  The reference counting would then be
handled by a universal assignment macro.  Below is some trimmed down
code demonstrating my issue:

gcc (4.4.3) complains about strict-aliasing rules.

% gcc -O3 -Wall t1.c
t1.c: In function ‘main’:
t1.c:28: warning: dereferencing pointer ‘_pdst’ does break
strict-aliasing rules
t1.c:28: note: initialized from here

The output from the program is

% ./a.out
d1 = {{1},2}
d1 = 0x9b53008, d2 = (nil)
d1 = {{2},2}
d1 = 0x9b53008, d2 = 0x9b53008

The compiled program does as expected.  I don't understand exactly what
the problem is, can anyone shed some light on this?  Or is this a false
warning?

Thanks
Ivan

<code>

#include <stdio.h>
#include <stdlib.h>

typedef struct {
   int count;

} base_struct, *base;

typedef struct {
   base_struct b;
   int deriv_field;

} deriv_struct,*deriv;

#define UNIVERSAL_ASSIGN(dst,src)    \
   do {                              \
     base* _pdst=(base*)&(dst);          \
     base _src=(base)(src);          \
     if (_src) ++(_src->count);           \
     if (*_pdst) --((*_pdst)->count); \
     *_pdst=_src;                    \
   } while (0)

int main(void) {
   deriv d1=(deriv)malloc(sizeof(deriv_struct)),d2=0;
   d1->b.count = 1;
   d1->deriv_field = 2;
   printf("d1 = {{%d},%d}\n",d1->b.count,d1->deriv_field);
   printf("d1 = %p, d2 = %p\n",d1,d2);
   UNIVERSAL_ASSIGN(d2,d1);
   printf("d1 = {{%d},%d}\n",d1->b.count,d1->deriv_field);
   printf("d1 = %p, d2 = %p\n",d1,d2);
   return 0;

}

</code>

I don't know exactly why you are getting that error, but it usually
indicates that you're doing something dangerous, possibly introducing
some sort of memory corruption.

You can make a rudimentary reference counting wrapper, but you have to
make people use that interface. I've done something that you can
experiment with, but I haven't tried to do anything particularly
useful with it yet.

\code
#include <stdio.h>
#include <string.h>

void* rc_malloc( size_t n );
void* rc_copy( void* p );
void rc_free( void* p );
size_t refcount( void* p );

char* rc_strdup( const char* str );

int main( void )
{
char* s;
char* sref;

s = rc_strdup( "This is a test" );
printf( "s [string,length,rc] = [%s,%lu,%lu]\n", s, strlen( s ),
refcount( s ) );

sref = rc_copy( s );
printf( "sref [string,length,rc] = [%s,%lu,%lu]\n", sref,
strlen( sref ), refcount( sref ) );
printf( "\n" );

s[0] = 't';
sref[1] = 'H';
printf( "Setting s[0] to 't' and sref[1] to 'H'...\n" );
printf( "s [string,length,rc] = [%s,%lu,%lu]\n", s, strlen( s ),
refcount( s ) );
printf( "sref [string,length,rc] = [%s,%lu,%lu]\n", sref,
strlen( sref ), refcount( sref ) );
printf( "\n" );

rc_free( s );
s = NULL;

printf( "sref [string,length,rc] = [%s,%lu,%lu]\n", sref,
strlen( sref ), refcount( sref ) );

rc_free( sref );

return EXIT_SUCCESS;
}

void* rc_malloc( size_t n )
{
void* p = NULL;
void* mem = NULL;

/*
* To track the reference count, a 'size_t' object will be
* added to the allocation.
*/
p = malloc( n + sizeof( size_t ) );

if ( p )
{
*((size_t*)p) = 1;
mem = (unsigned char*)p + sizeof( size_t );
}

return mem;
}

void* rc_copy( void* p )
{
void* actual_p = NULL;

if ( p )
{
actual_p = (unsigned char*)p - sizeof( size_t );
++*((size_t*)actual_p);
}

return p;
}

void rc_free( void* p )
{
void* actual_p = NULL;
size_t rc;

if ( p )
{
actual_p = (unsigned char*)p - sizeof( size_t );
rc = --*((size_t*)actual_p);
if ( rc == 0 ) {
free( actual_p );
}
}
}

size_t refcount( void* p )
{
void* actual_p = NULL;
size_t rc = 0;

if ( p )
{
actual_p = (unsigned char*)p - sizeof( size_t );
rc = *((size_t*)actual_p);
}

return rc;
}

char* rc_strdup( const char* str )
{
char* new_str = NULL;
size_t length;

if ( str )
{
length = strlen( str ) + 1;
new_str = rc_malloc( length );
if ( new_str ) {
memcpy( new_str, str, length );
}
}

return new_str;
}
\endcode

\output
s [string,length,rc] = [This is a test,14,1]
sref [string,length,rc] = [This is a test,14,2]

Setting s[0] to 't' and sref[1] to 'H'...
s [string,length,rc] = [tHis is a test,14,2]
sref [string,length,rc] = [tHis is a test,14,2]

sref [string,length,rc] = [tHis is a test,14,1]
\endoutput

I could see a possible usecase where you have say N linked lists
acting as views pointing to the same data, and you want the behavior
to be if the node object is deleted out of all views, its resources
are freed.

Hopefully it works okay, and I don't know what all the gotchas are of
doing it this way, but it is something to play with.

Best regards,
John D.
 
I

Ivan Riis Nielsen

Seebs said:
This is... Not Right.
Hmm, you think I should do this in C++? You're probably right, it would
certainly be a lot easier...

I think I get what you're trying to do here. The goal is to make it so
that you are counting that both 'd1' and 'd2' are the same pointer.

The problem is that it is not generally safe to take the address of an
object, convert it to a pointer to another type, and then modify it through
that pointer. (It can be with unsigned char). In particular, while in
practice it's probably always going to work to modify pointers to structures
as though they were pointers to other structures, it's not really safe,
and the optimizer is allowed to assume you won't, even if it sees you do
so. Thus the warning.

I'd probably write this as something like:
#define REFERENCE_ASSIGN(d2, d1) do { \
if (d2) (d2)->b.count--; \
if (d1) (d1)->b.count++; \
(d2) = (d1); \
} while(0)

Yes, I tried this, and it does not give the strict-aliasing warning.
However, this is not quite generic... If the class hierarchy had more
levels, ie. a class 'deriv2' could have a 'deriv_struct' as its first
field, which in turn has a 'base_struct' as its first field. This one
does it and still does not give the strict-aliasing warning:

#define UNIVERSAL_ASSIGN2(dst,src) \
do { \
if (src) ++(((base)(src))->count); \
if (dst) --(((base)(dst))->count); \
(dst) = (src); \
} while (0)

However, this has another problem: Both src and dst are evaluated
multiple times. This was why I used the local variables of type 'base'
(struct pointer) and 'base*' (pointer to struct pointer).

This next one is identical to my original UNIVERSAL_ASSIGN, except that
the actual derived type name is passed in. It solves the problem but is
of course a bit less generic.

#define UNIVERSAL_ASSIGN3(Type,dst,src) \
do { \
Type * _pdst=&(dst); \
base _src=(base)(src); \
if (_src) ++(_src->count); \
if (*_pdst) --((((base)(*_pdst)))->count); \
*_pdst=(Type)_src; \
} while (0)

... Well, actually, I wouldn't, because I wouldn't try to do this at
all, because C hasn't really got the hooks you need -- there's no destructors.

void catch_me_if_you_can(deriv *b1) {
deriv *b2;
UNIVERSAL_ASSIGN(b2, b1);
/* oops, b2 goes away and we never find out */
return;
}

For that matter:
void catch_me_if_you_can(deriv *b1) {
deriv *b2 = {something that makes a new one};
UNIVERSAL_ASSIGN(b1, b2);
drop_reference_and_free(b2);
/* oops, b1's reference count was lowered, but
* the caller still has the reference.
*/
}

In general, if you want to do a reference counting API, I'd recommend
acknowledging up front in the design that it is an API, and people must use
your API functions to create or release the objects.

-s

Yes, that's true, the programmer would have to UNREF() object reference
variables when exiting their scope. Maybe the assignment macro is too
clumsy, and simple REF() and UNREF() macros would be more suitable. Or
one could go to C++ to make it prettier.

Thank you for your inputs,
Ivan
 
S

Seebs

Hmm, you think I should do this in C++? You're probably right, it would
certainly be a lot easier...

If you need to do this, C++ may be a better fit, although there it's largely
redundant.
Yes, I tried this, and it does not give the strict-aliasing warning.
However, this is not quite generic... If the class hierarchy had more
levels, ie. a class 'deriv2' could have a 'deriv_struct' as its first
field, which in turn has a 'base_struct' as its first field. This one
does it and still does not give the strict-aliasing warning:
#define UNIVERSAL_ASSIGN2(dst,src) \
do { \
if (src) ++(((base)(src))->count); \
if (dst) --(((base)(dst))->count); \
(dst) = (src); \
} while (0)

As long as you're sure that the base object is always at the beginning
of the object, I think you're fine.
However, this has another problem: Both src and dst are evaluated
multiple times. This was why I used the local variables of type 'base'
(struct pointer) and 'base*' (pointer to struct pointer).

Yeah. If you want to use extensions, you can do the above with local
stashed copies, by using gcc's typeof() extension, though of course that's
not portable.
Yes, that's true, the programmer would have to UNREF() object reference
variables when exiting their scope. Maybe the assignment macro is too
clumsy, and simple REF() and UNREF() macros would be more suitable. Or
one could go to C++ to make it prettier.

In general, I think reference counting is something that should be done at
at a lower level, such as internally. To see an example, you could look at
my old string library:

http://www.seebs.net/c/sz.html

The code's not great, but it illustrates a use of reference-counting in C
objects. It doesn't try to be generic, though.

-s
 
I

ImpalerCore

void* rc_malloc( size_t n );
void* rc_copy( void* p );
void rc_free( void* p );
size_t refcount( void* p );

<snip>

To do the reference assignment macro using the above primitives could
be something like (not tested):

#define REFERENCE_ASSIGN(dst, src) \
do { \
rc_free( (dst) ); \
(dst) = rc_copy( (src) ); \
} while(0)

And if you go to C++, check out Boost's smartptr.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,045
Latest member
DRCM

Latest Threads

Top