Confirm if reinterpret_cast is safe?

Discussion in 'C++' started by Nephi Immortal, Feb 28, 2011.

  1. I use reinterpret_cast to convert a 32-bit integer into 8-bit
    integers. I use references instead of pointers to modify the value.
    Please confirm whether reinterpret_cast is safe on either an Intel
    or an AMD machine.
    If another machine has 9-bit bytes instead of 8-bit bytes, then I use
    an "#if" conditional to fall back to bit shifts and bit masks instead.


    typedef unsigned __int8 size_8;
    typedef unsigned __int16 size_16;
    typedef unsigned __int32 size_32;

    int main () {
        size_32 dword = 0x123456U;

        size_8 &L = *reinterpret_cast< size_8* >( &dword );
        size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
        size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
        size_16 &W = *reinterpret_cast< size_16* >( &dword );

        ++L;
        ++H;
        ++B;

        L += 2;

        return 0;
    }
     
    Nephi Immortal, Feb 28, 2011
    #1

  2. On Feb 28, 3:39 pm, Nephi Immortal <> wrote:
    >         I use reinterpret_cast to convert from 32 bits integer into 8 bits
    > integer.  I use reference instead of pointer to modify value.  Please
    > confirm if reinterpret_cast is safe on either Intel machine or AMD
    > machine.
    >         If another machine has 9 bits instead of 8 bits, then I use “if
    > condition” macro to use bit shift and bit mask instead.
    >
    > typedef unsigned __int8 size_8;
    > typedef unsigned __int16 size_16;
    > typedef unsigned __int32 size_32;
    >
    > int main () {
    >         size_32 dword = 0x123456U;
    >
    >         size_8 &L = *reinterpret_cast< size_8* >( &dword );
    >         size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    >         size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    >         size_16 &W = *reinterpret_cast< size_16* >( &dword );
    >
    >         ++L;
    >         ++H;
    >         ++B;
    >
    >         L += 2;
    >
    >         return 0;
    >
    > }


    This is broken according to the C++ standard. You are reading an
    __int32 object through an __int8 lvalue, and that is undefined
    behavior. At least, it is UB if __int8 is not a typedef of char or
    unsigned char.

    I don't know what various implementations will actually do with that code.

    To fix it, at least the following is allowed by the standard: you can
    use "char" or "unsigned char" lvalues to read any POD object.
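    For instance, a minimal sketch of that idea (not code from the thread,
    and using plain unsigned long rather than the __int typedefs): the
    unsigned char accesses below are permitted by the standard, although
    which byte holds which part of the value is still platform-specific.

    #include <cstddef>
    #include <iostream>

    int main() {
        unsigned long dword = 0x123456UL;

        // Reading any object's bytes through an unsigned char* is allowed.
        const unsigned char* bytes =
            reinterpret_cast<const unsigned char*>(&dword);

        // The byte order (and sizeof(unsigned long)) is platform-specific.
        for (std::size_t i = 0; i != sizeof dword; ++i)
            std::cout << static_cast<unsigned>(bytes[i]) << ' ';
        std::cout << '\n';
        return 0;
    }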
     
    Joshua Maurice, Mar 1, 2011
    #2

  3. On Feb 28, 6:33 pm, Joshua Maurice <> wrote:
    > On Feb 28, 3:39 pm, Nephi Immortal <> wrote:
    <snip>
    > This is broken according to the C++ standard. You are reading an
    > __int32 object through an __int8 lvalue, and that is undefined
    > behavior. At least, it is UB if __int8 is not a typedef of char or
    > unsigned char.
    >
    > I don't know what various implementations will actually do with that code.
    >
    > To fix it, at least the following is allowed by the standard: you can
    > use "char" or "unsigned char" lvalues to read any POD object.


    I think you meant an unrecognized keyword. A C++ compiler other than
    the Microsoft C++ compiler or the Intel C++ compiler will generate an
    error message stating that __int8, __int16, and __int32 are undeclared.
    I assume that you suggest:

    typedef unsigned char size_8;
    typedef unsigned short size_16;
    typedef unsigned long size_32;

    instead of

    typedef unsigned __int8 size_8;
    typedef unsigned __int16 size_16;
    typedef unsigned __int32 size_32;

    Can you guarantee that my source code works without any undefined
    behavior on IA-32, IA-64, and x64 machines? Other machines require
    different definitions.

    For example

    #if defined( __INTEL__ ) || defined( __AMD__ )
    size_32 dword = 0x123456U;

    size_8 &L = *reinterpret_cast< size_8* >( &dword );
    size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    size_16 &W = *reinterpret_cast< size_16* >( &dword );
    #else
    size_32 dword = 0x123456U;

    size_8 L = dword & 0xFFU;
    size_8 H = ( dword >> 8 ) & 0xFFU;
    size_8 B = ( dword >> 16 ) & 0xFFU;
    size_16 W = dword & 0xFFFFU;
    #end if
     
    Nephi Immortal, Mar 1, 2011
    #3
  4. On Feb 28, 6:54 pm, Nephi Immortal <> wrote:
    > On Feb 28, 6:33 pm, Joshua Maurice <> wrote:
    <snip>
    >         Can you guarantee that my source code works without any
    > undefined behavior on IA-32, IA-64, and x64 machines?  Other machines
    > require different definitions.
    >
    > For example
    >
    > #if defined( __INTEL__ ) || defined( __AMD__ )
    >         size_32 dword = 0x123456U;
    >
    >         size_8 &L = *reinterpret_cast< size_8* >( &dword );
    >         size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    >         size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    >         size_16 &W = *reinterpret_cast< size_16* >( &dword );
    > #else
    >         size_32 dword = 0x123456U;
    >
    >         size_8 L = dword & 0xFFU;
    >         size_8 H = ( dword >> 8 ) & 0xFFU;
    >         size_8 B = ( dword >> 16 ) & 0xFFU;
    >         size_16 W = dword & 0xFFFFU;
    > #end if


    Why wouldn't you just use the second form? That would be the much
    preferred way. Check the assembly yourself, but I would expect/hope
    that it compiles down to the same thing with optimization enabled.
     
    Joshua Maurice, Mar 1, 2011
    #4
  5. m0shbear Guest

    On Mar 1, 1:14 am, Joshua Maurice <> wrote:
    > On Feb 28, 6:54 pm, Nephi Immortal <> wrote:
    <snip>
    > > #if defined( __INTEL__ ) || defined( __AMD__ )
    > >         size_32 dword = 0x123456U;
    > >
    > >         size_8 &L = *reinterpret_cast< size_8* >( &dword );
    > >         size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    > >         size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    > >         size_16 &W = *reinterpret_cast< size_16* >( &dword );
    > > #else
    > >         size_32 dword = 0x123456U;
    > >
    > >         size_8 L = dword & 0xFFU;
    > >         size_8 H = ( dword >> 8 ) & 0xFFU;
    > >         size_8 B = ( dword >> 16 ) & 0xFFU;
    > >         size_16 W = dword & 0xFFFFU;

    Possible typo: L, H, B, W are not lvalue references.
    > > #end if

    #endif. If your compiler doesn't spit out an error, something is very
    wrong.
    >
    > Why wouldn't you just use the second form? That would be a much
    > preferred way. Check the assembly yourself - but I would expect/hope
    > that it should be compiled down to the same thing with optimization.


    It's only the same on little-endian machines.
    LE:  |56|34|12|00|
    (1)  | L| H| B|  |
         |  W  |
    (2)  | L| H| B|  |
         |  W  |

    On big-endian:
    BE:  |00|12|34|56|
    (1)  | L| H| B|  |
         |  W  |
    (2)  |  | B| H| L|
               |  W  |

    Hence, (1) is, strictly speaking, undefined. If you cast to char* and
    then do the pointer arithmetic, you get better results. I've only had
    to use reinterpret_cast< uintN_t* > (for N = 16, ...) when calling
    htobeN* with a pointer to _char_.
    Hence, (2) is implementation-defined, specifically with respect to
    byte ordering.
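
    A small sketch to make the difference between (1) and (2) concrete
    (not part of the original post; it only touches the storage through
    unsigned char, so the comparison itself is well-defined):

    #include <cstdio>

    int main() {
        unsigned long dword = 0x123456UL;

        // (1)-style: the byte stored at the lowest address -- endian-dependent.
        unsigned char first_byte = *reinterpret_cast<unsigned char*>(&dword);

        // (2)-style: the low-order byte by value -- identical on every machine.
        unsigned char low_byte = static_cast<unsigned char>(dword & 0xFFU);

        // On little-endian both lines print 56; on big-endian the first differs.
        std::printf("byte at lowest address: %02x\n",
                    static_cast<unsigned>(first_byte));
        std::printf("low-order byte:         %02x\n",
                    static_cast<unsigned>(low_byte));
        return 0;
    }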

    I suggest unions with packed structs if you want to be more explicit:
    union u32_u8 {
        uint32_t DW;
        struct {
            union {
                struct {
                    uint8_t L;
                    uint8_t H;
                };
                uint16_t W;
            };
            uint8_t B;
        };
    };

    Then
    u32_u8& example = *reinterpret_cast<u32_u8*>(&dword);
    Or, better yet,
    u32_u8 example2; example2.DW = dword;
    // proceed as usual, using members of example2

    Also, if supported:
    extern "C" {
    #include <stdint.h>
    }
    For Microsoft compilers, look up ms-inttypes and save the headers to
    your system include directory.
    Then you can use (u)intN_t instead of __intN.

    C++0x should have <cstdint>.

    * BSD. See <endian.h>.

    Remember, assumptions + reinterpret_cast = UB.
    I've only had to use reinterpret_cast when serializing/deserializing
    multibyte integers, e.g. for disk and cross-thread/process exception
    passing via pipes, and when doing casts _which violate conversion
    rules_, e.g. from void* to function pointer (this was an experiment
    in using a std::map<std::string, const void*> to implement runtime-
    based named parameters).
     
    m0shbear, Mar 1, 2011
    #5
  6. Goran Guest

    On Mar 1, 12:39 am, Nephi Immortal <> wrote:
    >         I use reinterpret_cast to convert from 32 bits integer into 8 bits
    > integer.  I use reference instead of pointer to modify value.  Please
    > confirm if reinterpret_cast is safe on either Intel machine or AMD
    > machine.
    >         If another machine has 9 bits instead of 8 bits, then I use “if
    > condition” macro to use bit shift and bit mask instead.
    >
    > typedef unsigned __int8 size_8;
    > typedef unsigned __int16 size_16;
    > typedef unsigned __int32 size_32;
    >
    > int main () {
    >         size_32 dword = 0x123456U;
    >
    >         size_8 &L = *reinterpret_cast< size_8* >( &dword );
    >         size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    >         size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    >         size_16 &W = *reinterpret_cast< size_16* >( &dword );
    >
    >         ++L;
    >         ++H;
    >         ++B;
    >
    >         L += 2;
    >
    >         return 0;
    >
    > }
    >
    >


    Looking at your variable names ("L", "H"), you seem to presume a
    little-endian machine, which is the case for x86 and x64 (I don't
    think the Intel/AMD distinction matters). That means that your code
    isn't doing what you think it does on a big-endian machine.

    On the other hand, I don't think that 9-bit datums are a practical
    consideration.

    I agree with Joshua: from what you've shown, masks and shifts seem to
    be a better approach (endianness is handled for you). If, on the other
    hand, you actually know the binary layout inside your "dword", I would
    say that the best approach is to write a compiler-specific POD union
    and use that, e.g.

    compiler-specific-pack-to-1-directive-here
    struct as_chars { char c[4]; };
    union data
    {
        as_chars chars;
        size_32 dword;
    };
    end_compiler-specific-pack-to-1-directive
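
    As a hedged sketch of what those placeholder directives might expand
    to (not part of the original post): #pragma pack is the spelling both
    MSVC and GCC accept, though any packing pragma is strictly
    compiler-specific, and the size_32 typedef below is an assumption.

    #include <stdint.h>

    typedef uint32_t size_32;   // assumption: a 32-bit unsigned type

    #pragma pack(push, 1)       // compiler-specific; accepted by MSVC and GCC
    struct as_chars { char c[4]; };
    union data
    {
        as_chars chars;
        size_32  dword;
    };
    #pragma pack(pop)

    int main()
    {
        data d;
        d.dword = 0x123456U;
        // Reading the char members after writing dword is fine; which byte
        // ends up in c[0] is endian-dependent.
        return d.chars.c[0] & 0xFF;
    }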

    Goran.
     
    Goran, Mar 1, 2011
    #6
  7. SG Guest

    On Mar 1, 00:39, Nephi Immortal wrote:
    >         I use reinterpret_cast to convert from 32 bits integer into 8 bits
    > integer.  I use reference instead of pointer to modify value.  Please
    > confirm if reinterpret_cast is safe on either Intel machine or AMD
    > machine.
    >         If another machine has 9 bits instead of 8 bits, then I use “if
    > condition” macro to use bit shift and bit mask instead.


    I guess that means you *do* want the program to compile on possibly
    obscure hardware/compilers.

    > typedef unsigned __int8 size_8;
    > typedef unsigned __int16 size_16;
    > typedef unsigned __int32 size_32;


    But then why would you use non-standard types like __int16 (etc)?
    What is your goal exactly?

    > int main () {
    >         size_32 dword = 0x123456U;
    >
    >         size_8 &L = *reinterpret_cast< size_8* >( &dword );
    >         size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    >         size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    >         size_16 &W = *reinterpret_cast< size_16* >( &dword );
    >
    >         ++L;
    >         ++H;
    >         ++B;
    >
    >         L += 2;
    >
    >         return 0;
    > }


    Regardless of what meaning you attach to "safety", this is rather
    unsafe. Apart from the non-standard types __int8, __int16, etc. and a
    possibly differing "endianness" you have to account for if you are
    interested in portability, you finally violate §3.10/15, which results
    in undefined behaviour.

    Here's a thought: Try to write it as portably as you can (using the
    bit operations on unsigned ints). It's totally acceptable IMHO to
    assume CHAR_BIT==8 nowadays. So,...

    #include <climits>
    #if CHAR_BIT != 8
    #error "sorry, this is all too weird for me"
    #endif

    Then you can use constant expressions for shifts (8,16,24) and masks
    (0xFF, 0xFFFF) which gives a good compiler enough opportunities to
    optimize the code.
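
    For instance, a sketch along those lines (the helper names below are
    not from the thread): everything is defined purely in terms of values,
    so it behaves the same regardless of byte order.

    #include <climits>
    #if CHAR_BIT != 8
    #error "sorry, this is all too weird for me"
    #endif

    typedef unsigned char  size_8;
    typedef unsigned short size_16;
    typedef unsigned long  size_32;

    // Extract the individual bytes / low word of a 32-bit value.
    inline size_8  byte0(size_32 v) { return static_cast<size_8>( v        & 0xFFU  ); }
    inline size_8  byte1(size_32 v) { return static_cast<size_8>((v >> 8)  & 0xFFU  ); }
    inline size_8  byte2(size_32 v) { return static_cast<size_8>((v >> 16) & 0xFFU  ); }
    inline size_16 word0(size_32 v) { return static_cast<size_16>(v        & 0xFFFFU); }

    // Write a byte back into the low 8 bits.
    inline size_32 set_byte0(size_32 v, size_8 b)
    { return (v & ~static_cast<size_32>(0xFFU)) | b; }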

    reinterpret_casts have their use. But using them is hardly portable.
    Also violating §3.10/15 ("strict aliasing") comes with risks.

    Cheers!
    SG
     
    SG, Mar 1, 2011
    #7
  8. James Kanze Guest

    On Mar 1, 12:33 am, Joshua Maurice <> wrote:
    > On Feb 28, 3:39 pm, Nephi Immortal <> wrote:


    > > I use reinterpret_cast to convert from 32 bits integer into 8 bits
    > > integer. I use reference instead of pointer to modify value.
    > > Please confirm if reinterpret_cast is safe on either Intel machine
    > > or AMD machine.


    > > If another machine has 9 bits instead of 8 bits, then I use “if
    > > condition” macro to use bit shift and bit mask instead.


    > > typedef unsigned __int8 size_8;
    > > typedef unsigned __int16 size_16;
    > > typedef unsigned __int32 size_32;


    Not sure what __int8, etc. are, although I can guess. Why not use the
    standard uint8_t, etc.?

    > > int main () {
    > > size_32 dword = 0x123456U;


    > > size_8 &L = *reinterpret_cast< size_8* >( &dword );
    > > size_8 &H = *( reinterpret_cast< size_8* >( &dword ) + 1 );
    > > size_8 &B = *( reinterpret_cast< size_8* >( &dword ) + 2 );
    > > size_16 &W = *reinterpret_cast< size_16* >( &dword );


    > > ++L;
    > > ++H;
    > > ++B;


    > > L += 2;


    > > return 0;
    > > }


    > This is broken by C++ standard. You are reading an __in32 object
    > through a __int8 lvalue, and that is undefined behavior.


    > At least, it
    > is UB if __int8 is not a typedef of char nor unsigned char.


    > I don't know what various implements will actually do with that code.


    > To fix it, at least the following is allowed by the standard: you can
    > use "char" or "unsigned char" lvalues to read any POD object.


    In practice, however, reinterpret_cast becomes useless if this doesn't
    work. The intent of the standard, here, is rather clear: the behavior
    is undefined in the standard (since there's no way it could be defined
    portably), but it is expected that the implementation define it when
    reasonable. But there are a lot of gray areas around this. There is
    also a very definite intent that the compiler can assume no aliasing
    between two pointers to different types, provided that neither type is
    char or unsigned char. From a QoI point of view, on "normal" machines,
    I would expect to be able to access, and even modify, the bit patterns
    through a pointer to any integral type, provided alignment restrictions
    are respected and all of the reinterpret_casts are in the same
    function, so the compiler can see them and take the additional aliasing
    into account (or alternatively, a compiler option is used to turn off
    all optimization based on aliasing analysis). As to what happens to the
    object whose bit pattern was actually accessed... that's very
    architecture dependent, but if you know the architecture, and the
    actual types are all more or less basic types, you can play games. When
    implementing the C library, I accessed a double through an unsigned
    short* several times, including modifying it (e.g. in ldexp). It's
    certainly not portable, and it's not the sort of thing to be used
    everywhere, but there are a few specific cases in very low level code
    where it is necessary. (FWIW: his code will give different
    results---supposing he output dword---on an Intel/AMD and on most
    other platforms.)

    --
    James Kanze
     
    James Kanze, Mar 1, 2011
    #8
  9. On Mar 1, 1:29 am, James Kanze <> wrote:
    > On Mar 1, 12:33 am, Joshua Maurice <> wrote:
    > > On Feb 28, 3:39 pm, Nephi Immortal <> wrote:
    <snip>

    >
    > In practice, however, reinterpret_cast becomes useless if this doesn't
    > work.  The intent of the standard, here, is rather clear: the behavior
    > is undefined in the standard (since there's no way it could be defined
    > portably), but it is expected that the implementation define it when
    > reasonable.  But there is a lot of gray areas around this.  There is
    > also a very definite intent that the compiler can assume no aliasing
    > between two pointers to different types, provided that neither type is
    > char or unsigned char.  From a QoI point of view, on "normal"
    > machines,
    > I would expect to be able to access, and even modify the bit patterns,
    > through a pointer to any integral type, provided alignment
    > restrictions
    > are respected, and all of the reinterpret_cast are in the same
    > function,
    > so the compiler can see them, and take the additional aliasing into
    > account (or alternatively, a compiler option is used to turn off all
    > optimization based on aliasing analysis).  As to what happens to the
    > object whose bit pattern was actually accessed... that's very
    > architecture dependent, but if you know the architecture, and the
    > actual
    > types are all more or less basic types, you can play games.  When
    > implementing the C library, I accessed a double through an unsigned
    > short* several types, including modifying it (e.g. in ldexp).  It's
    > certainly not portable, and it's not the sort of thing to be used
    > everywhere, but there are a few specific cases in very low level code
    > where it is necessary.  (FWIW: his code will give different
    > results---supposing he output dword---on an Intel/AMD and on most
    > other
    > platforms.)


    Perhaps, but the gcc team and the gcc compiler clearly disagree with
    your interpretation of the strict aliasing rules, and the C standards
    committee /seems/ to be leaning towards disagreeing with you.
    (Disagrees for the C programming language mind you, though I would
    argue that C++ ought to adopt whatever reasonable resolutions that the
    C committee comes to on the union DR and related issues.) See
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm
    and the links for associated meeting minutes for the discussion of the
    union DR. The C standards committee /seems/ to be leaning towards a
    very naive aliasing rule, though of course I guess we'll have to wait
    and see until they publish something definitive. Or, of course, you
    could go ask them for us as you actually know them in person (maybe?),
    and perhaps you could get them to answer the few other pesky issues I
    have about how a general purpose portable conforming pooling memory
    allocator on top of malloc is supposed to work, or not work. It would
    be nice.

    To quote you: "In practice, however, reinterpret_cast becomes useless
    if this doesn't work." I think that reinterpret_cast is largely
    useless (except for converting between pointer types and integer
    types), especially when compiled under the default gcc options. AFAIK,
    reinterpret_cast exists for platform dependent hackery (of which I've
    had the pleasure to never need to do), to convert between pointer
    types and integer types, and to convert pointer types to char pointer
    and unsigned char pointer, and not much else.

    I stand by my original point that you really ought not read objects
    through incorrectly typed lvalues, unless that incorrectly typed
    lvalue is char or unsigned char, if you can at all help it, for
    maximum portability and conformance.
     
    Joshua Maurice, Mar 1, 2011
    #9
  10. James Kanze Guest

    On Mar 1, 10:25 am, Joshua Maurice <> wrote:
    > On Mar 1, 1:29 am, James Kanze <> wrote:
    > > On Mar 1, 12:33 am, Joshua Maurice <> wrote:
    <snip>


    > Perhaps, but the gcc team and the gcc compiler clearly disagree with
    > your interpretation of the strict aliasing rules, and the C standards
    > committee /seems/ to be leaning towards disagreeing with you.
    > (Disagrees for the C programming language mind you, though I would
    > argue that C++ ought to adopt whatever reasonable resolutions that the
    > C committee comes to on the union DR and related issues.) See
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm


    The problem is not simple, and the current wording in both the
    C and the C++ standard definitely guarantees some behavior that
    I don't think the committee wanted to guarantee (and which
    doesn't work in g++, and probably in other compilers as well).
    Fundamentally, there are two issues which have to be addressed:

    1. Some sort of type punning is necessary in low level code.
    Historically, K&R favored unions for this (rather than
    pointer casting). For various reasons, the C committee,
    when formulating C90, moved in the direction of favoring
    pointer casts. All strictly as "intent", since there's
    nothing the standard can define with regards to modifying
    some bits in a double through a pointer to an integral type
    (for example). It's been a long time since I've been
    involved in C standardization, so I don't know if the
    committee has moved again. Both pointer casting and unions
    are widespread in C code, and from a QoI point of view,
    I would expect both to work, with the caveats discussed
    below. Anything else shows disdain for the user community.

    2. Possible aliasing kills optimization. This is the
    motivation behind restrict, and before that noalias; the
    programmer declares that there will be no aliasing, and pays
    the price if he makes a mistake. The rules also allow the
    compiler to assume that pointers to different types (with
    a number of exceptions) do not alias. Except when they do.
    My own opinion here is that type punning using pointer casts
    discussed in 1. should only hold when the aliasing mechanism
    is visible in the function where the aliasing occurs. (I'm
    not sure how to formulate this in standardese, but I think
    this is what the C committee is trying to do.)
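
    (A minimal illustration of point 2, added here as a sketch rather than
    taken from the original post: because int and float are unrelated
    types, a compiler applying the aliasing rules may assume the two
    pointers never overlap.)

    int sum_twice(int* i, float* f)
    {
        int a = *i;     // first load of *i
        *f = 1.0f;      // assumed, under strict aliasing, not to modify *i
        return a + *i;  // the compiler may reuse 'a' instead of reloading
    }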

    > and the links for associated meeting minutes for the discussion of the
    > union DR. The C standards committee /seems/ to be leaning towards a
    > very naive aliasing rule, though of course I guess we'll have to wait
    > and see until they publish something definitive.


    I haven't followed it at all closely, but the last time
    I looked, the tendency was to favor a rule which made the
    aliasing clear; accessing members of a union was fine (as long
    as the member read was the last member written), but only
    provided all of the accesses were through the union members.
    With regards to pointer casts, it's harder to specify, but in
    the end, they don't have to; they only have to make the intent
    sort of clear. Formally, accessing any union member except the
    last written, or accessing an object except as an unsigned char
    or its actual type is undefined behavior. And will remain so,
    since it is impossible to define anything which could be valid
    on all platforms. This means that formally, all type punning is
    undefined behavior.

    > Or, of course, you
    > could go ask them for us as you actually know them in person (maybe?),
    > and perhaps you could get them to answer the few other pesky issues I
    > have about how a general purpose portable conforming pooling memory
    > allocator on top of malloc is supposed to work, or not work. It would
    > be nice.


    If you ask on comp.std.c++, you'll get some feedback. Or
    perhaps comp.std.c---I haven't looked there in ages, but IMHO,
    this is a problem that C should resolve, and C++ should simply
    accept the decision. There's absolutely no reason for the
    languages to have different rules here.

    > To quote you: "In practice, however, reinterpret_cast becomes useless
    > if this doesn't work." I think that reinterpret_cast is largely
    > useless, (except for converting between pointer types and integer
    > types), especially when compiled under the default gcc options. AFAIK,
    > reinterpret_cast exists for platform dependent hackery (of which I've
    > had the pleasure to never need to hack), to convert between pointers
    > types and integer types, and convert pointer types to char pointer and
    > unsigned char pointer, and not much else.


    The purpose of reinterpret_cast is to make very low level,
    platform dependent hackery possible. I've used the equivalent
    (in C) when implementing the standard C library, on one hand in
    the implementation of malloc and free, and on the other in some
    of the low level math functions like ldexp. I tend to avoid it
    otherwise.

    > I stand by my original point that you really ought not read objects
    > through incorrectly typed lvalues, unless that incorrectly typed
    > lvalue is char or unsigned char, if you can at all help it, for
    > maximum portability and conformance.


    As soon as you need reinterpret_cast, portability goes out the
    window. It's strictly for experts, and only for very machine
    dependent code, at the very lowest level.

    --
    James Kanze
     
    James Kanze, Mar 1, 2011
    #10
  11. tni Guest

    Snipped example that does something like this:

    #include <iostream>
    #include <stdint.h>

    int main() {
        uint32_t i = 7;
        uint16_t& i_ref16 = *reinterpret_cast<uint16_t*>(&i);
        i_ref16 = 1;
        std::cout << i << " " << i_ref16 << std::endl;
        return 0;
    }

    On 2011-03-01 10:29, James Kanze wrote:
    > In practice, however, reinterpret_cast becomes useless if this doesn't
    > work.


    That's exactly the case.

    > The intent of the standard, here, is rather clear: the behavior
    > is undefined in the standard (since there's no way it could be defined
    > portably), but it is expected that the implementation define it when
    > reasonable. But there is a lot of gray areas around this. There is
    > also a very definite intent that the compiler can assume no aliasing
    > between two pointers to different types, provided that neither type is
    > char or unsigned char. From a QoI point of view, on "normal"
    > machines,
    > I would expect to be able to access, and even modify the bit patterns,
    > through a pointer to any integral type, provided alignment
    > restrictions
    > are respected, and all of the reinterpret_cast are in the same
    > function,
    > so the compiler can see them, and take the additional aliasing into
    > account (or alternatively, a compiler option is used to turn off all
    > optimization based on aliasing analysis). As to what happens to the
    > object whose bit pattern was actually accessed... that's very
    > architecture dependent, but if you know the architecture, and the
    > actual
    > types are all more or less basic types, you can play games.


    The above example with GCC 4.4 -O2 produces (unless you use
    -fno-strict-aliasing):
    7 1

    So don't do it, even if it seems to work. It's simply nasty bugs waiting
    to happen in corner cases/future compiler versions.
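
    (As an aside, a sketch not from the original post: the well-defined
    way to get the same effect is to copy the bytes with std::memcpy,
    which optimizers handle just as well; which bytes of i you end up
    touching is of course still endian-dependent.)

    #include <cstring>
    #include <iostream>
    #include <stdint.h>

    int main() {
        uint32_t i = 7;

        uint16_t half;
        std::memcpy(&half, &i, sizeof half);   // read the first two bytes of i
        half = 1;
        std::memcpy(&i, &half, sizeof half);   // write them back

        std::cout << i << " " << half << std::endl;
        return 0;
    }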
     
    tni, Mar 1, 2011
    #11
  12. Paul Guest

    "Leigh Johnston" <> wrote in message
    news:...
    > On 01/03/2011 15:18, James Kanze wrote:
    >> As soon as you need reinterpret_cast, portability goes out the
    >> window. It's strictly for experts, and only for very machine
    >> dependent code, at the very lowest level.
    >>

    <snip>
    >Also, I am not sure that agree with you that a particular language feature
    >should only be used by "experts".
    >

    No big surprise there; you are always looking for flaws in James' posts.
    I think Jamseyboy meant specialists specialising in low level C++ coding,
    also suggesting this was advanced C++ programming.
    You must remember English is probably not James' first language, so he
    does quite well. Pity he's unclear about some C++ basics though.
     
    Paul, Mar 1, 2011
    #12
  13. Paul Guest

    "Leigh Johnston" <> wrote in message
    news:...
    > On 01/03/2011 19:34, Paul wrote:
    <snip>
    >
    > The part of my post that you snipped clearly indicated that
    > reinterpret_cast is not solely the domain of low level C++ coding.
    > Pointer/integer type conversion is a very common idiom in Microsoft
    > Windows C/C++ development mainly stemming from the fact that a Windows
    > message passes information via the "WPARAM" and "LPARAM" integer
    > arguments.
    >
    > /Leigh

    Who cares?
    You're an idiot.
     
    Paul, Mar 1, 2011
    #13
  14. Paul Guest

    "Leigh Johnston" <> wrote in message
    news:...
    > On 01/03/2011 19:43, Paul wrote:
    <snip>

    >> Who cares?
    >> You're an idiot.

    >
    > Again rather than accepting your mistake you go into denial and throw
    > insults about; can you not see how pathetic this behaviour is?
    >
    > /Leigh
    >

    I can see that you are an idiot.
     
    Paul, Mar 1, 2011
    #14
  15. On Mar 1, 7:18 am, James Kanze <> wrote:
    > On Mar 1, 10:25 am, Joshua Maurice <> wrote:
    > > Or, of course, you
    > > could go ask them for us as you actually know them in person (maybe?),
    > > and perhaps you could get them to answer the few other pesky issues I
    > > have about how a general purpose portable conforming pooling memory
    > > allocator on top of malloc is supposed to work, or not work. It would
    > > be nice.

    >
    > If you ask on comp.std.c++, you'll get some feedback.


    I have had a thread up now for months, many posts by myself that
    include musing and questions, and not a single reply. It was in a
    thread talking about a DR in the current draft that would disallow
    general purpose portable conforming pooling memory allocators on top
    of malloc, maybe.

    > Or
    > perhaps comp.std.c---I haven't looked there in ages,


    I tried that as well. Got a large thread going, ~137 replies atm.
    Unfortunately, nothing was really resolved. Instead we started talking
    about the effective type rules and whether
    (*a).x
    has distinct behavior from
    a->x
    which seemed quite silly given my C++ background, where they are
    defined to be equivalent. The argument was that "*a" counts as an
    access of the struct of type a no matter its context, and "a->x" does
    not count as an access of the (whole?) struct. I think near the end of
    the discussion the silly side backed down and said that they don't
    know, and are waiting for the C committee to clear this up.

    > but IMHO,
    > this is a problem that C should resolve, and C++ should simply
    > accept the decision.  There's absolutely no reason for the
    > languages to have different rules here.


    Agreed. First C needs to figure out its own rules though. There does
    not appear to be a clear consensus on if and why the following program
    has undefined behavior in C and/or C++.

    #include <stddef.h>
    #include <stdlib.h>
    int main()
    {
        typedef struct T1 { int x; int y; } T1;
        typedef struct T2 { int x; int y; } T2;

        void* p = 0;
        T1* a = 0;
        T2* b = 0;
        int* y = 0;

        if (offsetof(T1, y) != offsetof(T2, y))
            return 1;
        if (sizeof(T1) != sizeof(T2))
            return 2;

        p = malloc(sizeof(T1));
        a = (T1*) p;
        b = (T2*) p;
        y = & a->y;
        *y = 1;
        y = & b->y;
        return *y;
    }

    The interesting part is:
    y = & b->y;
    return *y;
    My naive understanding is that
    y = & b->y;
    is not meant to be UB, nor is it UB as Rules As Written. Moreover, for
    the read
    return *y;
    we know that y has the same bit-value as it did when we made a write
    through it. That is, under a naive understanding, it is the same
    pointer value, and it points to the same memory location. In the C
    parlance, it is reading a (sub-)object whose effective type is int
    through an int lvalue. No UB there.

    However, when you combine these two things, the intent and consensus
    seems to be UB, but this is not Rules As Written anywhere that I can
    find. That's the first interesting problem.

    In a related DR, the C standards committee is talking about the
    "provenance" of the pointer - that is, a sort of data dependency
    analysis of its origins. That might be a way to resolve that
    particular mess.

    The second interesting problem is the following program:

    /* Program 2, version 1 */
    #include <stdlib.h>
    #include <stdio.h>
    int main()
    {
        void* p = 0;
        float* f = 0;
        int* i = 0;

        p = malloc(sizeof(float) + sizeof(int));
        f = (float*) p;
        *f = 1;
        printf("%f\n", *f);
        i = (int*) p;
        *i = 1;
        printf("%d\n", *i);
    }

    Which naively can be rewritten as:

    #include <stdlib.h>
    #include <stdio.h>
    int main()
    {
        void* p = 0;
        float* f = 0;
        int* i = 0;

        p = malloc(sizeof(float) + sizeof(int));
        f = (float*) p;
        i = (int*) p;
        *f = 1;
        printf("%f\n", *f);
        *i = 1;
        printf("%d\n", *i);
    }

    Which naively can be rewritten as:

    #include <stdlib.h>
    #include <stdio.h>
    void foo(float* f, int* i)
    {
        *f = 1;
        printf("%f\n", *f);
        *i = 1;
        printf("%d\n", *i);
    }
    int main()
    {
        void* p = 0;
        float* f = 0;
        int* i = 0;

        p = malloc(sizeof(float) + sizeof(int));
        f = (float*) p;
        i = (int*) p;
        foo(f, i);
    }

    Which naively can be optimized as the following, which of course
    breaks things:

    #include <stdlib.h>
    #include <stdio.h>
    void foo(float* f, int* i)
    {
        *f = 1;
        *i = 1;
        printf("%f\n", *f);
        printf("%d\n", *i);
    }
    int main()
    {
        void* p = 0;
        float* f = 0;
        int* i = 0;

        p = malloc(sizeof(float) + sizeof(int));
        f = (float*) p;
        i = (int*) p;
        foo(f, i);
    }

    "Program 2, version 1" needs to have defined behavior if you want
    userspace portable conforming general purpose pooling memory
    allocators written on top of malloc or new. I argue that we definitely
    need to allow that. The last program needs to have UB. The question of
    course is where exactly is it broken. The C standards committee,
    judging from its DR notes and meeting minutes, wants to say it breaks
    when we introduce the function foo. That is of course sensible, but
    not Rules As Written, and definitely not anything official yet.
     
    Joshua Maurice, Mar 1, 2011
    #15
  16. James Kanze Guest

    On Mar 1, 3:51 pm, Leigh Johnston <> wrote:
    > On 01/03/2011 15:18, James Kanze wrote:


    > > As soon as you need reinterpret_cast, portability goes out the
    > > window. It's strictly for experts, and only for very machine
    > > dependent code, at the very lowest level.


    > Quite often I have to use reinterpret_cast in GUI code (Microsoft) for
    > converting to/from GUI control item "LPARAM" values which I use for
    > associating a GUI control item with an object; I wouldn't call this
    > "very lowest level". Also, I am not sure that agree with you that a
    > particular language feature should only be used by "experts".


    I'll admit that it can be necessary to work around a poorly
    designed interface. The obvious answer is to fix the interface,
    but we don't always have that liberty.

    And the term "expert" is not meant to be precise, but just to
    suggest that the decision to use it should not be made lightly.
    I can easily imagine a case where the decision to use it
    regularly in the interface to some external interface was made
    by one of the project's "experts", but the actual use (following
    the guidelines laid down by the expert) was by very run of the
    mill programmers. I'd generally recommend having the experts
    write a wrapper to the poorly designed interface, and letting
    the others use that. But if the poorly designed interface is
    a standard (Posix dlsym, for example, which can't be used
    without a reinterpret_cast) or a pseudo-standard (Windows GUI?),
    there's a strong argument for letting the programmers use the
    interface they already know, rather than having to learn a new
    one.
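
    (For reference, a hedged sketch of the dlsym case mentioned above; the
    library name and whether a given compiler accepts the cast are
    platform details, but the shape of the problem is that dlsym returns a
    void* while what you need is a function pointer.)

    #include <dlfcn.h>   // POSIX, not ISO C++

    typedef double (*cosine_fn)(double);

    int main()
    {
        void* handle = dlopen("libm.so.6", RTLD_LAZY);  // library name is platform-specific
        if (!handle)
            return 1;

        // void* -> function pointer has no standard conversion; most
        // compilers accept reinterpret_cast here, which is the point.
        cosine_fn my_cos = reinterpret_cast<cosine_fn>(dlsym(handle, "cos"));

        int status = (my_cos && my_cos(0.0) == 1.0) ? 0 : 2;
        dlclose(handle);
        return status;
    }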

    --
    James Kanze
     
    James Kanze, Mar 2, 2011
    #16
  17. James Kanze Guest

    On Mar 1, 10:47 pm, Joshua Maurice <> wrote:
    > On Mar 1, 7:18 am, James Kanze <> wrote:


    > > On Mar 1, 10:25 am, Joshua Maurice <> wrote:
    > > > Or, of course, you could go ask them for us as you
    > > > actually know them in person (maybe?), and perhaps you
    > > > could get them to answer the few other pesky issues I have
    > > > about how a general purpose portable conforming pooling
    > > > memory allocator on top of malloc is supposed to work, or
    > > > not work. It would be nice.


    > > If you ask on comp.std.c++, you'll get some feedback.


    > I have had a thread up now for months, many posts by myself
    > that include musing and questions, and not a single reply. It
    > was in a thread talking about a DR in the current draft that
    > would disallow general purpose portable conforming pooling
    > memory allocators on top of malloc, maybe.


    Hmmm. In the past, you'd typically have gotten an answer. (Or
    several contradictory answers :-).)

    > > Or
    > > perhaps comp.std.c---I haven't looked there in ages,


    > I tried that as well. Got a large thread going, ~137 replies atm.
    > Unfortunately, nothing was really resolved.


    I think part of the problem is that standard (C99) doesn't
    actually say what was intended. And that different people don't
    even agree with what was intended (nor with what it actually
    says, for that matter, although in most cases, I find what it
    actually says rather clear, albeit almost certainly not what was
    intended).

    > Instead we started talking
    > about the effective type rules and whether
    > (*a).x
    > has distinct behavior from
    > a->x
    > which seemed quite silly given my C++ background where they are
    > defined to be equivalent.


    They're defined to be equivalent in C as well. Although there's
    special language in C (and maybe in C++0x) which says that the
    equivalence has some limits: a[i] is the same as *(a+i), except
    when it is preceded by a & operator (in C99), for example.

    > The argument was that "*a" counts as an
    > access of the struct of type a no matter its context, and "a->x" does
    > not count as an access of the (whole?) struct. I think near the end of
    > the discussion the silly side backed down and said that they don't
    > know, and are waiting for the C committee to clear this up.


    > > but IMHO,
    > > this is a problem that C should resolve, and C++ should simply
    > > accept the decision. There's absolutely no reason for the
    > > languages to have different rules here.


    > Agreed. First C needs to figure out its own rules though. There does
    > not appear to be a clear consensus on if and why the following program
    > has undefined behavior in C and/or C++.


    > #include <stddef.h>
    > #include <stdlib.h>
    > int main()
    > {
    > typedef struct T1 { int x; int y; } T1;
    > typedef struct T2 { int x; int y; } T2;


    > void* p = 0;
    > T1* a = 0;
    > T2* b = 0;
    > int* y = 0;


    > if (offsetof(T1, y) != offsetof(T2, y))
    > return 1;
    > if (sizeof(T1) != sizeof(T2))
    > return 2;


    > p = malloc(sizeof(T1));
    > a = (T1*) p;
    > b = (T2*) p;
    > y = & a->y;
    > *y = 1;
    > y = & b->y;
    > return *y;
    > }


    There is one place here where C and C++ are different. In
    C (IIRC), if you leave out the first T1/T2 in the typedefs, T1
    and T2 would be "compatible types". C++ doesn't have the notion
    of "compatible type", and would continue to treat them as two
    distinct types.

    > The interesting part is:
    > y = & b->y;
    > return *y;
    > My naive understanding is that
    > y = & b->y;
    > is not meant to be UB, nor is UB as Rules As Written. Moreover, for
    > the read
    > return *y;
    > we know that y has the same bit-value as y did when we made a write
    > through it. That is, under a naive understanding, it is the same
    > pointer value, and it points to the same memory location. In the C
    > parlance, it is reading a (sub-)object whose effective type is int
    > through an int lvalue. No UB there.


    I'm not sure. The real question is what is the type of the
    object in the malloc'ed memory. In the above, we're dealing
    with a corner case where I don't think the committee really
    cares if it is defined or not---they'll take whatever falls out
    of the definitions made to handle the more general cases.

    Of course, in practice, something like the above will always
    work. Even if T1 and T2 were full C++ classes, with
    constructors, virtual functions, and the works. As long as T1
    and T2 had the same layout, and they will have the same layout
    as long as their data members are the same, and they are
    "similar enough" on other grounds (e.g. either both have virtual
    functions, or neither; both have the same base classes, if any,
    etc.). In general, C++ wants this to be undefined behavior, at
    least when non-PODs are involved; the committee almost certainly
    doesn't want to get into the issues defining how similar is
    "similar enough". And as I said, they don't really care whether
    the case in the example is UB, so they don't make a special case
    of it.

    > However, when you combine these two things, the intent and consensus
    > seems to be UB, but this is not Rules As Written anywhere where I can
    > find. That's the first interesting problem.


    I think that the intent is clearly UB, at least in C++. With
    regards to the rules, the real question is when malloc'ed memory
    becomes an object with a specific type. Once it has become an
    object with a specific type, accessing it as another object with
    a specific type is clearly undefined behavior. And although *y
    has type int, the "complete object" being accessed does depend
    on how the pointer was initialized. And whether the expression
    *y = 1; has "created" an object of type T1 (since y was
    initialized with a pointer into a T1) or not. And while the
    standard definitely does forbid accessing an object of one type
    through an lvalue expression of another type, it doesn't say
    anything about when memory acquired by calling malloc or the
    operator new function becomes an object of a specific type
    (unless the object is a non-POD class, in which case, it's when
    the constructor runs).

    There is also an interesting variant on the above example.
    Suppose, instead of y, I have:
    int* y1 = &a->y;
    int* y2 = &b->y;
    assert(y1 == y2);
    If y1 == y2, then they point to the same object. So *y1 = 1;
    return *y2 is definitely legal. Unless, of course, something
    previous triggered undefined behavior. (But at this point,
    we've not yet accessed the actual memory returned by malloc, so
    its type is indeterminate. Or?)

    Until we know when (if ever) the memory returned by malloc
    becomes a T1 or a T2, we can't really answer these questions.
    And both the C and the C++ standards are completely silent about
    this.
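
    (A side note, sketched here rather than taken from the discussion: in
    C++ the explicit way to state when raw storage becomes a T1 is
    placement new, which sidesteps the question for code that can afford
    to be explicit.)

    #include <cstdlib>
    #include <new>

    struct T1 { int x; int y; };

    int main()
    {
        void* p = std::malloc(sizeof(T1));
        if (!p)
            return 1;

        T1* a = new (p) T1();   // the storage now definitely holds a T1
        a->y = 1;
        int result = a->y;

        a->~T1();               // end the object's lifetime before freeing
        std::free(p);
        return result;
    }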

    > In a related DR, the C standards committee is talking about the
    > "Providence" of the pointer - that is a sort of data dependency
    > analysis of its origins. That might be a way to resolve that
    > particular mess.


    > The second interesting problem is the following program:
    <snip>


    > "Program 2, version 1" needs to have defined behavior if you want
    > userspace portable conforming general purpose pooling memory
    > allocators written on top of malloc or new. I argue that we definitely
    > need to allow that. The last program needs to have UB.


    I agree whole heartedly. I think that that is the intent, or at
    least it should be.

    > The question of
    > course is where exactly is it broken. The C standards committee,
    > judging from its DR notes and meeting minutes, wants to say it breaks
    > when we introduce the function foo. That is of course sensible, but
    > not Rules As Written, and definitely not anything official yet.


    Yes. There are at least two problems: one (at least for C++) is
    when the raw memory allocated with malloc or the operator new
    function starts being an object with a definite type (or in
    other words, when does it start being illegal to access it as
    a different type); the other is defining the exact aliasing
    rules, so that they do depend on the compiler being able to see
    aliases: as the standard is currently written, I think your
    Program 2, version 1 has undefined behavior (although the intent
    is clearly that it should work), and the following program:

    #include <cstdio>

    int foo(int* p1, float* p2)
    {
        int result = *p1;
        *p2 = 3.14159;
        return result;
    }

    int main()
    {
        union { int i; float f; } u;
        u.i = 42;
        printf("%d\n", foo(&u.i, &u.f));
        return 0;
    }

    is clearly legal, although it doesn't work with g++; I don't
    think it was the intent that it be well defined, and there are
    very good reasons for making it undefined. But I don't know
    quite how to formulate this in standardese: intuitively, I'd say
    that any time there is a reinterpret_cast or a union in
    a function, the compiler must assume that pointers and
    references to any of the involved types may be aliases, and in
    all other cases, it may assume that pointers and references to
    different types are not aliases, unless one of the types is char
    or unsigned char. But that doesn't conform with the way the
    standard is written.

    --
    James Kanze
     
    James Kanze, Mar 2, 2011
    #17
