Strict aliasing and Q2.6 in the FAQ

Conor F

(Trying this again as the velocityreviews site doesn't seem to forward
to NNTP - hope this doesn't appear twice!)

Back to this old topic again. Sorry about this but I'm just not sure
if aliasing applies in this case. Question 2.6 in the FAQ describes
the case of using one malloc and piggy backing a char * onto it. It's
a pretty common idiom I would have thought, but I'm now having my
doubts:

struct name {
    int namelen;
    char *namep;
};

struct name *makename(char *newname)
{
    char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);

    struct name *ret = (struct name *)buf;
    ret->namelen = strlen(newname);
    ret->namep = buf + sizeof(struct name);
    strcpy(ret->namep, newname);

    return ret;
}

I don't really believe there are aliasing issues here due to a char *
being reassigned to a struct name *.

But. If you did this instead:

    struct name *ret = malloc(sizeof(struct name) + strlen(newname) + 1);
    char *buf = (char *)(&ret[1]);

which also seems a perfectly reasonable way of going about it, and
avoids the sizeof(struct name) addition which can be a little tricky
in the cases where you have several leading structs. Ok, it's not
terrible but I always thought the above was clearer.

Anyway - you've now taken an object of type struct name and converted
to a different type pointing to the same memory.

Isn't that an aliasing issue?

And if char * is a special case then what if I had used a wchar_t
instead? Or another type?

It's all a bit subtle for me.

Conor.

(Hum. Google won't let me post with my hotmail address any more. How
annoying...)
 
Harald van Dijk

struct name {
    int namelen;
    char *namep;

};

struct name *makename(char *newname)
{
    char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);

    struct name *ret = (struct name *)buf;
    ret->namelen = strlen(newname);
    ret->namep = buf + sizeof(struct name);
    strcpy(ret->namep, newname);

    return ret;

}

I don't really believe there are aliasing issues here due to a char *
being reassigned to a struct name *.

That's fine, but for a different reason. You're accessing
sizeof(struct name) bytes as struct name, and you're accessing the
following strlen(newname)+1 bytes as char. You never access any data
as a type it isn't.

Because of that, your suggested alternative has no aliasing issues
either.

My preference would be to use neither, and instead use the C99
alternative

struct name {
    int namelen;
    char name[];
};
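
A minimal sketch of how makename might look with the flexible array member, not a definitive implementation. The allocation check, the const parameter, and the use of size_t for namelen (instead of int) are assumptions added for illustration; they are not part of the FAQ code.

```c
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: the string is stored directly after namelen. */
struct name {
    size_t namelen;   /* assumption: size_t rather than the original int */
    char name[];
};

/* Returns NULL if the allocation fails; release with a single free(). */
struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    struct name *ret = malloc(sizeof *ret + len + 1);

    if (ret == NULL)
        return NULL;

    ret->namelen = len;
    memcpy(ret->name, newname, len + 1);   /* copies the terminating '\0' too */
    return ret;
}
```

Because the characters are a member of the struct, no pointer arithmetic or cast is needed, and every byte is accessed through the type it was stored as.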
 
Eric Sosman

(Trying this again as the velocityreviews site doesn't seem to forward
to NNTP - hope this doesn't appear twice!

Back to this old topic again. Sorry about this but I'm just not sure
if aliasing applies in this case. Question 2.6 in the FAQ describes
the case of using one malloc and piggy backing a char * onto it. It's
a pretty common idiom I would have thought, but I'm now having my
doubts:

struct name {
    int namelen;
    char *namep;
};

struct name *makename(char *newname)
{
    char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);

    struct name *ret = (struct name *)buf;
    ret->namelen = strlen(newname);
    ret->namep = buf + sizeof(struct name);
    strcpy(ret->namep, newname);

    return ret;
}

I don't really believe there are aliasing issues here due to a char *
being reassigned to a struct name *.

Nor do I.

But. If you did this instead:

    struct name *ret = malloc(sizeof(struct name) + strlen(newname) + 1);
    char *buf = (char *)(&ret[1]);

which also seems a perfectly reasonable way of going about it, and
avoids the sizeof(struct name) addition which can be a little tricky
in the cases where you have several leading structs. Ok, it's not
terrible but I always thought the above was clearer.

Anyway - you've now taken an object of type struct name and converted
to a different type pointing to the same memory.

Isn't that an aliasing issue?

I don't see why. Personally, I prefer the latter form (although
I usually write `ret + 1' for `&ret[1]'). Both examples convert a
pointer value from one type to another.

And if char * is a special case then what if I had used a wchar_t
instead? Or another type?

Alignment problems could arise. They can be put to bed again,
but the code gets uglier.
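
One way the alignment concern might be handled for trailing wchar_t data is sketched below. The names struct wname, makewname, WCHAR_ALIGN and round_up are hypothetical, and the offsetof trick is just one pre-C11 way of obtaining an alignment; treat this as an illustration under those assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper type: the offset of w is the alignment of wchar_t. */
struct wchar_align { char c; wchar_t w; };
#define WCHAR_ALIGN offsetof(struct wchar_align, w)

struct wname {
    int namelen;
    wchar_t *namep;
};

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Returns NULL if the allocation fails; release with a single free(). */
struct wname *makewname(const wchar_t *newname)
{
    size_t len = wcslen(newname);
    /* Start the wide string at an offset suitably aligned for wchar_t. */
    size_t offset = round_up(sizeof(struct wname), WCHAR_ALIGN);
    struct wname *ret = malloc(offset + (len + 1) * sizeof(wchar_t));

    if (ret == NULL)
        return NULL;

    ret->namelen = (int)len;
    ret->namep = (wchar_t *)((char *)ret + offset);
    wmemcpy(ret->namep, newname, len + 1);   /* includes the terminating L'\0' */
    return ret;
}
```

With this particular struct the rounding is likely to change nothing, since the pointer member usually forces at least as much alignment as wchar_t needs; it starts to matter when the trailing type is more strictly aligned than the leading struct.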
 
Conor F

I don't really believe there are aliasing issues here due to a char *
being reassigned to a struct name *.

That's fine, but for a different reason. You're accessing
sizeof(struct name) bytes as struct name, and you're accessing the
following strlen(newname)+1 bytes as char. You never access any data
as a type it isn't.

Because of that, your suggested alternative has no aliasing issues
either.

Ah! But I thought that it wouldn't matter where the data was. In other
words, gcc could decide that all that memory over there is now of type
"struct name" and if you pretend part of it isn't, gcc will play games
with you, like optimise all the references to buf out because you
didn't return it :)

I can't always tell what gcc might mess with. I've seen posts by Linus
Torvalds where he advocates using the no-strict-aliasing flag to avoid
any subtle issues, like this thread from a few years back:
https://lkml.org/lkml/2003/2/25/270

In that case, reordering the code makes a difference. But if properly
assigned pointers point to different blocks of memory, I can't see how
anything would fail. Which is why I asked here :)

My preference would be to use neither, and instead use the C99
alternative

struct name {
  int namelen;
  char name[];
};

Oh absolutely. The case I mentioned was a simple one where the above
notation would suit much better. The Windows header files have those
notations all over the place (except using the pre C99 form of char
name[1];).

char *buf = (char *)(ret + 1);

As Eric says, I also prefer the above form, especially when things get
a little hairier, like the classic array of pointers to char followed
by the char data:

char **strarray -> [ptrc0][ptrc1][ptrc2][NULL][string0][string1][string2]

dataptr = (char *)(strarray + nstrings + 1);

And then if it's an array of pointers to structures, then we hit
alignment issues. But at least those are easy to deal with (just round
up to the next even multiple of the structure size). And then compile
on a Sparc just to see if you are right!
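
The pointer-array-plus-strings layout above could be built roughly as follows. This is only a sketch: pack_strings and its parameters are hypothetical names. No extra alignment work is needed here because the trailing data is plain char.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Builds [ptr0][ptr1]...[NULL][string0\0][string1\0]... in one malloc block.
 * Returns NULL if the allocation fails; release with a single free().
 */
char **pack_strings(const char *const *strings, size_t nstrings)
{
    size_t total = 0;
    size_t i;
    char **strarray;
    char *dataptr;

    /* Total size of the character data, including terminators. */
    for (i = 0; i < nstrings; i++)
        total += strlen(strings[i]) + 1;

    strarray = malloc((nstrings + 1) * sizeof *strarray + total);
    if (strarray == NULL)
        return NULL;

    /* The string data starts right after the pointers and the NULL slot. */
    dataptr = (char *)(strarray + nstrings + 1);
    for (i = 0; i < nstrings; i++) {
        size_t len = strlen(strings[i]) + 1;

        memcpy(dataptr, strings[i], len);
        strarray[i] = dataptr;
        dataptr += len;
    }
    strarray[nstrings] = NULL;
    return strarray;
}
```

The pointer slots are only ever accessed as char *, and the bytes after them only as char, so each region is used as a single type throughout.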

Conor.
 
Harald van Dijk

Ah! But I thought that it wouldn't matter where the data was. In other
words, gcc could decide that all that memory over there is now of type
"struct name" and if you pretend part of it isn't, gcc will play games
with you, like optimise all the references to buf out because you
didn't return it :)

I can't always tell what gcc might mess with. I've seen posts by Linus
Torvalds where he advocates using the no-strict-aliasing flag to avoid
any subtle issues, like this thread from a few years back:
   https://lkml.org/lkml/2003/2/25/270

In that case, reordering the code makes a difference. But if properly
assigned pointers point to different blocks of memory, I can't see how
anything would fail. Which is why I asked here :)

Looking further in that thread, the problem comes from a subtle bug/
misfeature in the implementation of the kernel's own memcpy macro/
function. Generally speaking, when you're not writing a kernel, you
can assume memcpy behaves as required by the standard.
 
Conor F

So, to summarise :):

Strict aliasing would only apply if a type punned pointer pointed to
the same place in memory - which in my opinion is wild west code
anyway...

So, to be awkward and use a wchar_t instead simply to avoid the char *
case:

struct name { int namelen; wchar_t *namep; };

struct name *ret = malloc(sizeof(struct name) +
wcslen(newname) + 1);

wchar_t *buf = (wchar_t *)(ret + 1);

... copy to buf here ...


Would be fine simply because the type punning is to a different memory
location; but:

wchar_t *buf = (wchar_t *)(ret + 0);

Isn't fine. Ok, other than the fact that I made a mess of the example
I mean. Um, a better example would be the one in the wikipedia article
on type punning:

struct sockaddr_in sa = {0};
....
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

which is obviously bad. But if bind took a char * and they did this:

bind(sockfd, (char *)&sa, sizeof sa);

that would be ok I guess.

Plus any other type of inheritance creation like COM - where two
structures share initial sequences (IUnknown and all that) but are not
unioned are also out. But then I'd be aware that I'm messing around in
those circumstances and I'd use -fno-strict-aliasing...

Thanks,

Conor.
 
Harald van Dijk

So, to summarise :):

Strict aliasing would only apply if a type punned pointer pointed to
the same place in memory - which in my opinion is wild west code
anyway...

Pretty much, yes.

So, to be awkward and use a wchar_t instead simply to avoid the char *
case:

  struct name {  int namelen; wchar_t *namep; };

  struct name *ret = malloc(sizeof(struct name) +
                              wcslen(newname) + 1);

The size should be (wcslen(newname) + 1) * sizeof(wchar_t).

  wchar_t *buf = (wchar_t *)(ret + 1);

  ... copy to buf here ...

Would be fine simply because the type punning is to a different memory
location; but:

Right.

  wchar_t *buf = (wchar_t *)(ret + 0);

Isn't fine. Ok, other than the fact that I made a mess of the example
I mean.

Yes, that example doesn't work. Accessing the data as

struct name *ret = malloc(sizeof(wchar_t));
wchar_t *buf = (wchar_t *) (ret + 0);
*buf = L'x';

is no violation of the aliasing rules, because you're still only
accessing the data as wchar_t, even though you have a suspicious cast
now.

Um, a better example would be the one in the wikipedia article
on type punning:

  struct sockaddr_in sa = {0};
    ....
  bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

which is obviously bad.

By C's aliasing rules, yes, you're right. Remember, though, that bind
is a non-standard function, and POSIX makes additional guarantees
about what compilers must permit, beyond what standard C does. It may
say that the above use must be given the "obvious" interpretation by a
conforming POSIX compiler. I don't know if it does so.

But if bind took a char * and they did this:

  bind(sockfd, (char *)&sa, sizeof sa);

that would be ok I guess.

If bind is declared as taking a struct sockaddr *, and if bind
dereferences its parameter to get a struct sockaddr, then C's aliasing
rules don't allow you to pass a pointer to what is really a struct
sockaddr_in, not even via an intermediate char * cast. If bind takes a
char *, and accesses the memory byte by byte, then yes, that is the
special exception in the aliasing rules.

Plus any other type of inheritance creation like COM - where two
structures share initial sequences (IUnknown and all that) but are not
unioned are also out. But then I'd be aware that I'm messing around in
those circumstances and I'd use -fno-strict-aliasing...

Another case where COM pretty much ignores the aliasing rules is in
IUnknown's QueryInterface method, where its last argument's type is
void **, but will almost never really be a pointer to void *. Which is
okay if MS decides that COM compilers must allow this, even if C's
aliasing rules don't.
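
The distinction drawn above, byte-wise access versus access through an unrelated struct type, might be illustrated like this. The types generic_addr and inet_like are hypothetical stand-ins, not the real sockaddr definitions, and the two functions are only a sketch of what a callee could do without relying on anything beyond standard C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a generic and a specific address type. */
struct generic_addr { unsigned short family; char data[14]; };
struct inet_like {
    unsigned short family;
    unsigned short port;
    unsigned char addr[4];
    char pad[8];
};

/* Permitted: any object may be examined byte by byte via a character type. */
size_t count_zero_bytes(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    size_t zeros = 0;
    size_t i;

    for (i = 0; i < size; i++)
        if (p[i] == 0)
            zeros++;
    return zeros;
}

/*
 * Reads the leading family field by copying its bytes out, instead of
 * dereferencing the argument as a struct generic_addr *.
 */
unsigned short addr_family(const void *addr)
{
    unsigned short family;

    memcpy(&family, addr, sizeof family);
    return family;
}
```

Passing the address of a struct inet_like to either function involves no access through an incompatible type: the first reads individual bytes, the second copies bytes into an object of the type it then reads.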
 
Conor F

  struct name *ret = malloc(sizeof(struct name) +
                            (wcslen(newname) + 1) * sizeof(wchar_t));

Ooops. Erm, sorry. That's what I get for coding in a rush and then
changing my mind. That example was a total mess.

Grand. That clarifies a lot. I used to do QA so I'm a tad pedantic
about these things (except the example I typed in). I just wanted to
be sure on that point.

Yes, that example doesn't work. Accessing the data as

  struct name *ret = malloc(sizeof(wchar_t));
  wchar_t *buf = (wchar_t *) (ret + 0);
  *buf = L'x';

is no violation of the aliasing rules, because you're still only
accessing the data as wchar_t, even though you have a suspicious cast
now.

That's somewhat of a surprise. I guess you might get hosed as soon as
you access "ret", because the compiler would decide to optimise given
the assumption that ret and buf couldn't possibly point to the same
location.

By C's aliasing rules, yes, you're right. Remember, though, that bind
is a non-standard function, and POSIX makes additional guarantees
about what compilers must permit, beyond what standard C does. It may
say that the above use must be given the "obvious" interpretation by a
conforming POSIX compiler. I don't know if it does so.

Hmmm. I see - I believe I've seen that before with threading - Posix
makes assurances above what ISO C makes so that calls like
pthread_mutex_lock() don't get messed with. I'd guess the gcc
documentation might shed some light.

If bind is declared as taking a struct sockaddr *, and if bind
dereferences its parameter to get a struct sockaddr, then C's aliasing
rules don't allow you to pass a pointer to what is really a struct
sockaddr_in, not even via an intermediate char * cast. If bind takes a
char *, and accesses the memory byte by byte, then yes, that is the
special exception in the aliasing rules.

Yes, thank you. I had that feeling when I typed that bit that maybe it
would be bad to recast it back once cast to a char *. Doing some
googling shows that some coders (eg: Putty) have made changes to put
these in unions to avoid the problem.

Another case where COM pretty much ignores the aliasing rules is in
IUnknown's QueryInterface method, where its last argument's type is
void **, but will almost never really be a pointer to void *. Which is
okay if MS decides that COM compilers must allow this, even if C's
aliasing rules don't.

Although if I was using a compiler like mingw it would probably be
good to be aware of the possible issues and use the appropriate flags
if necessary. The Windows compilers would do the right thing
automagically of course.

Conor.
 
