Strict aliasing and Q2.6 in the FAQ

Discussion in 'C Programming' started by Conor F, Sep 19, 2011.

  1. Conor F

    (Trying this again as the velocityreviews site doesn't seem to forward
    to NNTP - hope this doesn't appear twice!)

    Back to this old topic again. Sorry about this but I'm just not sure
    if aliasing applies in this case. Question 2.6 in the FAQ describes
    the case of using one malloc and piggybacking a char * onto it. It's
    a pretty common idiom I would have thought, but I'm now having my
    doubts:

    #include <stdlib.h>
    #include <string.h>

    struct name {
        int namelen;
        char *namep;
    };

    struct name *makename(char *newname)
    {
        char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);
        if (buf == NULL)
            return NULL;

        struct name *ret = (struct name *)buf;
        ret->namelen = strlen(newname);
        ret->namep = buf + sizeof(struct name);
        strcpy(ret->namep, newname);

        return ret;
    }

    I don't really believe there are aliasing issues here due to a char *
    being reassigned to a struct name *.

    But suppose you did this instead:

    struct name *ret = malloc(sizeof(struct name) + strlen(newname) + 1);
    char *buf = (char *)(&ret[1]);

    which also seems a perfectly reasonable way of going about it, and
    avoids the sizeof(struct name) addition, which can be a little tricky
    in cases where you have several leading structs. OK, it's not
    terrible, but I always thought the above was clearer.
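    For reference, a minimal sketch of the `ret + 1` variant written out as
    a complete function. It assumes C99, adds a NULL check and takes a
    const char * parameter; otherwise it follows the fragment above. Each
    byte is still only ever accessed as the type it was stored as: the
    header as struct name, the tail as char.

```c
#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char *namep;
};

/* Variant that finds the string storage as "one past" the struct,
 * so no explicit sizeof(struct name) offset arithmetic is needed. */
struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    struct name *ret = malloc(sizeof *ret + len + 1);
    if (ret == NULL)
        return NULL;

    ret->namelen = (int)len;               /* header accessed as struct name */
    ret->namep = (char *)(ret + 1);         /* tail bytes, accessed only as char */
    memcpy(ret->namep, newname, len + 1);   /* copies the terminating '\0' too */
    return ret;
}
```

    A single free() on the returned pointer releases both the header and
    the string, which is the point of the single-malloc idiom.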

    Anyway - you've now taken a pointer to an object of type struct name
    and converted it to a different pointer type pointing into the same
    block of memory.

    Isn't that an aliasing issue?

    And if char * is a special case then what if I had used a wchar_t
    instead? Or another type?

    It's all a bit subtle for me.

    Conor.

    (Hum. Google won't let me post with my hotmail address any more. How
    annoying...)
    Conor F, Sep 19, 2011
    #1

  2. On Sep 19, 11:53 pm, Conor F <>
    wrote:
    > struct name {
    >     int namelen;
    >     char *namep;
    >
    > };
    >
    > struct name *makename(char *newname)
    > {
    >     char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);
    >
    >     struct name *ret = (struct name *)buf;
    >     ret->namelen = strlen(newname);
    >     ret->namep = buf + sizeof(struct name);
    >     strcpy(ret->namep, newname);
    >
    >     return ret;
    >
    > }
    >
    > I don't really believe there are aliasing issues here due to a char *
    > being reassigned to a struct name *.


    That's fine, but for a different reason. You're accessing
    sizeof(struct name) bytes as struct name, and you're accessing the
    following strlen(newname)+1 bytes as char. You never access any data
    as a type it isn't.

    Because of that, your suggested alternative has no aliasing issues
    either.

    My preference would be to use neither, and instead use the C99
    alternative, a flexible array member:

    struct name {
        int namelen;
        char name[];
    };
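    A sketch of makename adapted to the flexible array member, assuming
    C99; the function body is this sketch's own, not Harald's. The struct
    tag and the member can both be called name because tags and members
    live in separate namespaces.

```c
#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char name[];   /* flexible array member: adds nothing to sizeof */
};

struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    /* Header plus the characters and their terminating '\0'. */
    struct name *ret = malloc(sizeof *ret + len + 1);
    if (ret == NULL)
        return NULL;

    ret->namelen = (int)len;
    memcpy(ret->name, newname, len + 1);
    return ret;
}
```

    No pointer member and no cast are needed, and free() still releases
    everything at once.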
    Harald van Dijk, Sep 19, 2011
    #2

  3. Eric Sosman

    On 9/19/2011 5:53 PM, Conor F wrote:
    > (Trying this again as the velocityreviews site doesn't seem to forward
    > to NNTP - hope this doesn't appear twice!
    >
    > Back to this old topic again. Sorry about this but I'm just not sure
    > if aliasing applies in this case. Question 2.6 in the FAQ describes
    > the case of using one malloc and piggy backing a char * onto it. It's
    > a pretty common idiom I would have thought, but I'm now having my
    > doubts:
    >
    > struct name {
    > int namelen;
    > char *namep;
    > };
    >
    > struct name *makename(char *newname)
    > {
    > char *buf = malloc(sizeof(struct name) + strlen(newname) + 1);
    >
    > struct name *ret = (struct name *)buf;
    > ret->namelen = strlen(newname);
    > ret->namep = buf + sizeof(struct name);
    > strcpy(ret->namep, newname);
    >
    > return ret;
    > }
    >
    > I don't really believe there are aliasing issues here due to a char *
    > being reassigned to a struct name *.


    Nor do I.

    > But. If you did this instead:
    >
    > struct name *ret = malloc(sizeof(struct name) + strlen(newname) +
    > 1);
    > char *buf = (char *)(&ret[1]);
    >
    > which also seems a perfectly reasonable way of going about it, and
    > avoids the sizeof(struct name) addition which can be a little tricky
    > in the cases where you have several leading structs. Ok, it's not
    > terrible but I always though the above was clearer.
    >
    > Anyway - you've now taken an object of type struct name and converted
    > to a different type pointing to the same memory.
    >
    > Isn't that an aliasing issue?


    I don't see why. Personally, I prefer the latter form (although
    I usually write `ret + 1' for `&ret[1]'). Both examples convert a
    pointer value from one type to another.

    > And if char * is a special case then what if I had used a wchar_t
    > instead? Or another type?


    Alignment problems could arise. They can be put to bed again,
    but the code gets uglier.
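    One way to put the alignment problem to bed, sketched under the
    assumption of a C11 compiler (alignof from <stdalign.h>): round the
    offset of the trailing array up to a multiple of the element type's
    alignment. struct wname, makewname and round_up are hypothetical
    names. Because malloc returns storage suitably aligned for any type,
    an offset that is a multiple of alignof(wchar_t) yields a correctly
    aligned wchar_t pointer.

```c
#include <stdalign.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wide-character variant of struct name. */
struct wname {
    int namelen;
    wchar_t *namep;
};

/* Round n up to the next multiple of a (a > 0). */
static size_t round_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

struct wname *makewname(const wchar_t *newname)
{
    size_t len = wcslen(newname);
    /* Start the string at the first suitably aligned offset past the struct. */
    size_t off = round_up(sizeof(struct wname), alignof(wchar_t));
    /* Size the string in wchar_t units, including the terminating L'\0'. */
    struct wname *ret = malloc(off + (len + 1) * sizeof(wchar_t));
    if (ret == NULL)
        return NULL;

    ret->namelen = (int)len;
    ret->namep = (wchar_t *)((char *)ret + off);
    wmemcpy(ret->namep, newname, len + 1);
    return ret;
}
```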

    --
    Eric Sosman
    Eric Sosman, Sep 20, 2011
    #3
  4. Conor F

    > > I don't really believe there are aliasing issues here due to a char *
    > > being reassigned to a struct name *.

    >
    > That's fine, but for a different reason. You're accessing
    > sizeof(struct name) bytes as struct name, and you're accessing the
    > following strlen(newname)+1 bytes as char. You never access any data
    > as a type it isn't.
    >
    > Because of that, your suggested alternative has no aliasing issues
    > either.


    Ah! But I thought that it wouldn't matter where the data was. In other
    words, gcc could decide that all that memory over there is now of type
    "struct name" and if you pretend part of it isn't, gcc will play games
    with you, like optimise all the references to buf out because you
    didn't return it :)

    I can't always tell what gcc might mess with. I've seen posts by Linus
    Torvalds where he advocates using the -fno-strict-aliasing flag to
    avoid subtle issues, like this thread from a few years back:
    https://lkml.org/lkml/2003/2/25/270

    In that case, reordering the code made a difference. But if properly
    assigned pointers point to different blocks of memory, I can't see how
    anything would fail. Which is why I asked here :)

    > My preference would be to use neither, and instead use the C99
    > alternative
    >
    > struct name {
    >   int namelen;
    >   char name[];


    Oh absolutely. The case I mentioned was a simple one where the above
    notation would suit much better. The Windows header files have those
    notations all over the place (except using the pre-C99 form, char
    name[1];).

    char *buf = (char *)(ret + 1);

    As Eric says, I also prefer the above form, especially when things get
    a little hairier, like the classic array of pointers to char followed
    by the char data:

    char **strarray -> [ptrc0][ptrc1][ptrc2][NULL][string0][string1][string2]

    char *dataptr = (char *)(strarray + nstrings + 1);
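    A minimal sketch of that packed string array, assuming C99;
    pack_strings is a hypothetical name. No padding is needed before the
    char data, because char has the weakest alignment requirement of any
    type.

```c
#include <stdlib.h>
#include <string.h>

/* Copies nstrings strings into one allocation laid out as
 *   [ptr0][ptr1]...[NULL][string0\0][string1\0]...
 * so the whole thing can be released with a single free(). */
char **pack_strings(const char *const *src, size_t nstrings)
{
    size_t total = 0;
    for (size_t i = 0; i < nstrings; i++)
        total += strlen(src[i]) + 1;

    char **strarray = malloc((nstrings + 1) * sizeof *strarray + total);
    if (strarray == NULL)
        return NULL;

    /* Character data starts just past the NULL that ends the pointer array. */
    char *dataptr = (char *)(strarray + nstrings + 1);
    for (size_t i = 0; i < nstrings; i++) {
        size_t n = strlen(src[i]) + 1;
        memcpy(dataptr, src[i], n);
        strarray[i] = dataptr;
        dataptr += n;
    }
    strarray[nstrings] = NULL;
    return strarray;
}
```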

    And then if it's an array of pointers to structures, then we hit
    alignment issues. But at least those are easy to deal with (just round
    the offset up to the next multiple of the structure size, which works
    because a structure's size is always a multiple of its alignment). And
    then compile on a SPARC just to see if you are right!
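    A sketch of the round-up approach for an array of pointers followed by
    the structures themselves, assuming C99. struct point and make_points
    are hypothetical; the double members just give the struct an alignment
    requirement stricter than char. Rounding to a multiple of the
    structure size may waste a few more bytes than rounding to its
    alignment, but it needs nothing beyond sizeof.

```c
#include <stdlib.h>

/* Hypothetical record type with stricter alignment than char. */
struct point {
    double x, y;
};

/* One allocation holding [ptr0]...[ptrN-1][NULL][pad][point0]...[pointN-1]. */
struct point **make_points(size_t n)
{
    size_t ptrbytes = (n + 1) * sizeof(struct point *);
    size_t sz = sizeof(struct point);
    size_t off = (ptrbytes + sz - 1) / sz * sz;   /* round up to a multiple of sz */

    struct point **arr = malloc(off + n * sz);
    if (arr == NULL)
        return NULL;

    /* malloc's result suits any type, and off is a multiple of sz, so data is aligned. */
    struct point *data = (struct point *)((char *)arr + off);
    for (size_t i = 0; i < n; i++) {
        data[i].x = (double)i;
        data[i].y = 0.0;
        arr[i] = &data[i];
    }
    arr[n] = NULL;
    return arr;
}
```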

    Conor.
    Conor F, Sep 20, 2011
    #4
  5. On Sep 20, 12:11 pm, Conor F <>
    wrote:
    > > > I don't really believe there are aliasing issues here due to a char *
    > > > being reassigned to a struct name *.

    >
    > > That's fine, but for a different reason. You're accessing
    > > sizeof(struct name) bytes as struct name, and you're accessing the
    > > following strlen(newname)+1 bytes as char. You never access any data
    > > as a type it isn't.

    >
    > > Because of that, your suggested alternative has no aliasing issues
    > > either.

    >
    > Ah! But I thought that it wouldn't matter where the data was. In other
    > words, gcc could decide that all that memory over there is now of type
    > "struct name" and if you pretend part of it isn't, gcc will play games
    > with you, like optimise all the references to buf out because you
    > didn't return it :)
    >
    > I can't always tell what gcc might mess with. I've seen posts by Linus
    > Torvalds where advocates using the no-strict-aliasing flag to avoid
    > any subtle issues like this thread from a few years back:
    >    https://lkml.org/lkml/2003/2/25/270
    >
    > In that case, reordering the code make a difference. But if properly
    > assigned pointers point to different blocks of memory, I can't see how
    > anything would fail. Which is why I asked here :)


    Looking further in that thread, the problem comes from a subtle
    bug/misfeature in the implementation of the kernel's own memcpy
    macro/function. Generally speaking, when you're not writing a kernel, you
    can assume memcpy behaves as required by the standard.
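    To illustrate why a conforming memcpy matters: copying bytes with
    memcpy is the standard way to reinterpret an object's representation,
    whereas a cast such as *(uint32_t *)&f reads a float object through a
    uint32_t lvalue. A minimal sketch, assuming C11 (for _Static_assert)
    and a 32-bit float; float_bits is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f without accessing f through a
 * uint32_t lvalue: memcpy copies the bytes instead. */
uint32_t float_bits(float f)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "float must be 32 bits");
    memcpy(&u, &f, sizeof u);
    return u;
}
```

    On platforms using IEEE 754 single precision (nearly all of them),
    float_bits(1.0f) is 0x3f800000.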
    Harald van Dijk, Sep 20, 2011
    #5
  6. Conor F

    So, to summarise :):

    Strict aliasing would only apply if a type punned pointer pointed to
    the same place in memory - which in my opinion is wild west code
    anyway...

    So, to be awkward and use a wchar_t instead simply to avoid the char *
    case:

    struct name { int namelen; wchar_t *namep; };

    struct name *ret = malloc(sizeof(struct name) +
    wcslen(newname) + 1);

    wchar_t *buf = (wchar_t *)(ret + 1);

    ... copy to buf here ...


    Would be fine simply because the type punning is to a different memory
    location; but:

    wchar_t *buf = (wchar_t *)(ret + 0);

    Isn't fine. OK, other than the fact that I made a mess of the example,
    I mean. Um, a better example would be the one in the Wikipedia article
    on type punning:

    struct sockaddr_in sa = {0};
    ....
    bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

    which is obviously bad. But if bind took a char * and they did this:

    bind(sockfd, (char *)&sa, sizeof sa);

    that would be ok I guess.

    Plus any other kind of inheritance emulation like COM, where two
    structures share a common initial sequence (IUnknown and all that) but
    are not in a union, is also out. But then I'd be aware that I'm messing
    around in those circumstances and I'd use -fno-strict-aliasing...
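    For contrast, a sketch of the layout that standard C does bless: embed
    the "base" struct as the first member of the "derived" one instead of
    repeating its members as a common initial sequence. A pointer to a
    structure, suitably converted, points to its initial member (C99
    6.7.2.1), so base-only code can be handed either &w->base or a cast
    pointer. struct unknown, struct widget, add_ref and widget_add_ref are
    hypothetical names, a simplified illustration rather than the actual
    COM declarations.

```c
/* "Base class": every derived object starts with one of these. */
struct unknown {
    int refcount;
};

/* "Derived class": the base is the FIRST member, not a copied prefix. */
struct widget {
    struct unknown base;
    int width;
};

/* Works on any object whose first member is a struct unknown. */
static void add_ref(struct unknown *u)
{
    u->refcount++;
}

int widget_add_ref(struct widget *w)
{
    /* &w->base and (struct unknown *)w point to the same place. */
    add_ref(&w->base);
    return w->base.refcount;
}
```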

    Thanks,

    Conor.
    Conor F, Sep 20, 2011
    #6
  7. On Sep 20, 8:26 pm, Conor F <>
    wrote:
    > So, to summarise :):
    >
    > Strict aliasing would only apply if a type punned pointer pointed to
    > the same place in memory - which in my opinion is wild west code
    > anyway...


    Pretty much, yes.

    > So, to be awkward and use a wchar_t instead simply to avoid the char *
    > case:
    >
    >   struct name {  int namelen; wchar_t *namep; };
    >
    >   struct name *ret = malloc(sizeof(struct name) +
    >                               wcslen(newname) + 1);


    (wcslen(newname) + 1) * sizeof(wchar_t)

    >   wchar_t *buf = (wchar_t *)(ret + 1);
    >
    >   ... copy to buf here ...
    >
    > Would be fine simply because the type punning is to a different memory
    > location; but:


    Right.

    >   wchar_t *buf = (wchar_t *)(ret + 0);
    >
    > Isn't fine. Ok, other than the fact that I made a mess of the example
    > I mean.


    Yes, that example doesn't work. Accessing the data as

    struct name *ret = malloc(sizeof(wchar_t));
    wchar_t *buf = (wchar_t *) (ret + 0);
    *buf = L'x';

    is no violation of the aliasing rules, because you're still only
    accessing the data as wchar_t, even though you have a suspicious cast
    now.

    > Um, a better example would be the one in the wikipedia article
    > on type punning:
    >
    >   struct sockaddr_in sa = {0};
    >     ....
    >   bind(sockfd, (struct sockaddr *)&sa, sizeof sa);
    >
    > which is obviously bad.


    By C's aliasing rules, yes, you're right. Remember, though, that bind
    is a non-standard function, and POSIX makes additional guarantees
    about what compilers must permit, beyond what standard C does. It may
    say that the above use must be given the "obvious" interpretation by a
    conforming POSIX compiler. I don't know if it does so.

    > But if bind took a char * and they did this:
    >
    >   bind(sockfd, (char *)&sa, sizeof sa);
    >
    > that would be ok I guess.


    If bind is declared as taking a struct sockaddr *, and if bind
    dereferences its parameter to get a struct sockaddr, then C's aliasing
    rules don't allow you to pass a pointer to what is really a struct
    sockaddr_in, not even via an intermediate char * cast. If bind takes a
    char *, and accesses the memory byte by byte, then yes, that is the
    special exception in the aliasing rules.
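    A minimal sketch of that character-type exception at work: a
    hypothetical byte_sum function that reads any object through an
    unsigned char pointer, which C99 6.5p7 permits whatever the object's
    effective type.

```c
#include <stddef.h>

/* Sums the bytes of any object by reading them as unsigned char,
 * the access the aliasing rules always allow. */
unsigned long byte_sum(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}
```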

    > Plus any other type of inheritance creation like COM - where two
    > structures share initial sequences (IUnknown and all that) but are not
    > unioned are also out. But then I'd be aware that I'm messing around in
    > those circumstances and I'd use -fno-strict-aliasing...


    Another case where COM pretty much ignores the aliasing rules is in
    IUnknown's QueryInterface method, where its last argument's type is
    void **, but will almost never really be a pointer to void *. Which is
    okay if MS decides that COM compilers must allow this, even if C's
    aliasing rules don't.
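    Without such a guarantee, a portable way to call a function with a
    void ** out-parameter is to receive into a genuine void * and convert
    afterwards, rather than passing (void **)&p for some struct pointer p,
    which makes the callee store through a void * lvalue into an object of
    a different pointer type. A sketch; query, get_iface and struct iface
    are hypothetical stand-ins, not the COM API.

```c
#include <stddef.h>

struct iface { int id; };

static struct iface the_iface = { 42 };

/* Hypothetical QueryInterface-style function with a void ** out-parameter. */
int query(void **out)
{
    *out = &the_iface;
    return 0;
}

/* Portable caller: receive into a real void * object, then convert. */
struct iface *get_iface(void)
{
    void *tmp = NULL;
    if (query(&tmp) != 0)
        return NULL;
    return tmp;   /* void * converts implicitly to struct iface * */
}
```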
    Harald van Dijk, Sep 20, 2011
    #7
  8. Conor F

    > >   struct name *ret = malloc(sizeof(struct name) +
    > >                               wcslen(newname) + 1);

    >
    > (wcslen(newname) + 1) * sizeof(wchar_t)


    Ooops. Erm, sorry. That's what I get for coding in a rush and then
    changing my mind. That example was a total mess <grin>.

    > > Would be fine simply because the type punning is to a different memory
    > > location; but:

    >
    > Right.


    Grand. That clarifies a lot. I used to do QA so I'm a tad pedantic
    about these things (except the example I typed in). I just wanted to
    be sure on that point.

    > Yes, that example doesn't work. Accessing the data as
    >
    >   struct name *ret = malloc(sizeof(wchar_t));
    >   wchar_t *buf = (wchar_t *) (ret + 0);
    >   *buf = L'x';
    >
    > is no violation of the aliasing rules, because you're still only
    > accessing the data as wchar_t, even though you have a suspicious cast
    > now.


    That's somewhat of a surprise. I guess you might get hosed as soon as
    you access "ret", because the compiler would decide to optimise given
    the assumption that ret and buf couldn't possibly point to the same
    location.

    > By C's aliasing rules, yes, you're right. Remember, though, that bind
    > is a non-standard function, and POSIX makes additional guarantees
    > about what compilers must permit, beyond what standard C does. It may
    > say that the above use must be given the "obvious" interpretation by a
    > conforming POSIX compiler. I don't know if it does so.


    Hmmm. I see - I believe I've seen that before with threading - POSIX
    makes assurances above what ISO C makes so that calls like
    pthread_mutex_lock() don't get messed with. I'd guess the gcc
    documentation might shed some light.

    > > But if bind took a char * and they did this:

    >
    > >   bind(sockfd, (char *)&sa, sizeof sa);

    >
    > > that would be ok I guess.

    >
    > If bind is declared as taking a struct sockaddr *, and if bind
    > dereferences its parameter to get a struct sockaddr, then C's aliasing
    > rules don't allow you to pass a pointer to what is really a struct
    > sockaddr_in, not even via an intermediate char * cast. If bind takes a
    > char *, and accesses the memory byte by byte, then yes, that is the
    > special exception in the aliasing rules.


    Yes, thank you. I had that feeling when I typed that bit that maybe it
    would be bad to recast it back once cast to a char *. Doing some
    googling shows that some coders (e.g. PuTTY) have made changes to put
    these in unions to avoid the problem.
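    A sketch of the union approach, assuming a POSIX system for
    <sys/socket.h>, <netinet/in.h> and <arpa/inet.h>. union sockaddr_any,
    family_of and ipv4_family_via_union are hypothetical names; family_of
    merely stands in for a callee like bind() that reads the generic
    header. Whether this is strictly conforming under ISO C's aliasing
    rules, once the callee reads through a plain struct sockaddr *, is
    debatable; treat it as a common practical workaround rather than a
    guarantee.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* One object that can be viewed as the generic or the IPv4 address. */
union sockaddr_any {
    struct sockaddr sa;
    struct sockaddr_in sin;
};

/* Hypothetical stand-in for a callee such as bind(). */
static int family_of(const struct sockaddr *sa)
{
    return sa->sa_family;
}

int ipv4_family_via_union(unsigned short port)
{
    union sockaddr_any u;
    memset(&u, 0, sizeof u);
    u.sin.sin_family = AF_INET;
    u.sin.sin_port = htons(port);
    u.sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    /* Pass the union's sockaddr member instead of casting &u.sin. */
    return family_of(&u.sa);
}
```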

    > Another case where COM pretty much ignores the aliasing rules is in
    > IUnknown's QueryInterface method, where its last argument's type is
    > void **, but will almost never really be a pointer to void *. Which is
    > okay if MS decides that COM compilers must allow this, even if C's
    > aliasing rules don't.


    Although if I was using a compiler like MinGW it would probably be
    good to be aware of the possible issues and use the appropriate flags
    if necessary. The Windows compilers would do the right thing
    automagically of course.

    Conor.
    Conor F, Sep 20, 2011
    #8

Similar Threads
  1. Paul Brettschneider

    char and strict aliasing

    Paul Brettschneider, Jul 17, 2008, in forum: C++
    Replies:
    4
    Views:
    421
    James Kanze
    Jul 18, 2008
  2. Francois Duranleau

    Strict aliasing and buffer handling

    Francois Duranleau, Jun 20, 2011, in forum: C++
    Replies:
    20
    Views:
    1,555
    Alf P. Steinbach /Usenet
    Jun 22, 2011
  3. Carveone

    Strict aliasing and Q2.6 in the FAQ

    Carveone, Sep 19, 2011, in forum: C Programming
    Replies:
    0
    Views:
    360
    Carveone
    Sep 19, 2011
  4. Maxim Fomin

    Union and strict aliasing

    Maxim Fomin, Jul 28, 2012, in forum: C Programming
    Replies:
    4
    Views:
    452
    Maxim Fomin
    Aug 2, 2012
  5. Xavier Roche
    Replies:
    3
    Views:
    90
    James Kuyper
    Mar 25, 2014