Questions about "alignment" in memory

Discussion in 'C++' started by J. Campbell, Oct 7, 2003.

  1. J. Campbell

    J. Campbell Guest

    I posted a question some time back about accessing a char array as an
    array of words. In order not to overrun the char array, I padded it
    with enough 0x00 bytes to ensure that when accessed as words I
    wouldn't overrun the array. I was told that this is dangerous and
    that there could be alignment problems if, for example, I wanted to
    access the char array elements from non-even multiples of sizeof(int).
    For example, if I had the array:

    char a[10];

    and I wanted to access the 8 bytes (a[2], a[3],..., a[8], a[9]) as the
    array:

    int b[2];

    where (b[0] contains the data in a[2] to a[5], and b[1] contains a[6]
    to a[9])
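
    One common way to avoid the misaligned access described above is to
    copy the bytes into a real int object rather than casting the char
    pointer. The following is only a sketch: load_word is a hypothetical
    helper, not something from the original question, and the value it
    returns still depends on the machine's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: reads sizeof(unsigned int) bytes starting at
// buffer[offset] into an unsigned int without ever forming a misaligned
// unsigned int*. The caller must ensure the bytes are inside the buffer.
unsigned int load_word(const char* buffer, std::size_t offset)
{
    assert(buffer != 0);
    unsigned int word = 0;
    // memcpy works byte by byte, so the source address needs no alignment.
    std::memcpy(&word, buffer + offset, sizeof word);
    return word;
}
```

    With the array from the example, load_word(a, 2) would play the role
    of b[0], and load_word(a, 2 + sizeof(unsigned int)) the role of b[1].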

    I understand the alignment issue in this example. My question
    is...can I turn this problem on its head...for example, create an
    empty array of ints, then access this memory space as a char?

    Here's what I'm talking about:


    unsigned int* a_words;
    char* a_bytes;

    fstream in("myfile.dat", ios::in | ios::binary | ios::ate);
    int filesize_bytes = in.tellg();
    int filesize_words = filesize_bytes / sizeof(int)
        + ((filesize_bytes % sizeof(int)) > 0); // add 1 if there is a remainder

    a_words = new unsigned int[filesize_words];
    a_bytes = reinterpret_cast<char*>(a_words);

    in.seekg(2, ios::beg); // note: out of (word) alignment, starts on the 3rd byte
    in.read(a_bytes, filesize_bytes-3);
    in.close();

    at which point the file is in memory and can be accessed as bytes (by
    indexing a_bytes[0 to filesize_bytes]) or as words (by indexing
    a_words[0 to filesize_words]).

    This seems to work fine. Additionally, it shouldn't suffer potential
    alignment problems, since the array is defined to align with words, and
    any word address should be accessible as a byte address, even if the
    converse is not true.
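
    The round-up arithmetic and the byte view described above might be
    sketched as follows. The helper names (words_needed, make_word_buffer,
    as_bytes) are made up for illustration, and std::vector is swapped in
    for the raw new[] so the storage is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of whole words needed to hold byte_count bytes (rounds up).
std::size_t words_needed(std::size_t byte_count)
{
    return (byte_count + sizeof(unsigned int) - 1) / sizeof(unsigned int);
}

// Hypothetical helper: zero-filled, word-aligned storage large enough
// for byte_count bytes.
std::vector<unsigned int> make_word_buffer(std::size_t byte_count)
{
    return std::vector<unsigned int>(words_needed(byte_count), 0u);
}

// View the word storage as bytes. Every word address is also a valid
// byte address, so this direction raises no alignment concern.
unsigned char* as_bytes(std::vector<unsigned int>& words)
{
    assert(!words.empty());
    return reinterpret_cast<unsigned char*>(&words[0]);
}
```

    A file could then be read into as_bytes(buffer) and inspected through
    buffer[i], keeping in mind that the word values depend on byte order.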

    I can see that there will be compatibility problems with this system
    if ported to a system where CHAR_BIT != 8. However, I don't care
    about these systems. If I'm only doing logical operators on the bits
    in the file, I don't even see any endian issues with doing this.

    Thanks for the slap-in-the-face I'm sure I'll get for performing such
    blasphemous operations in C++. Seriously, does this treatment
    circumvent potential alignment issues?
    J. Campbell, Oct 7, 2003
    #1

  2. J. Campbell

    WW Guest

    J. Campbell wrote:
    > I posted a question some time back about accessing a char array as an
    > array of words. In order not to overrun the char array, I padded it
    > with enough 0x00 bytes to ensure that when accessed as words I
    > wouldn't overrun the array. I was told that this is dangerous and
    > that there could be alignment problems if, for example, I wanted to
    > access the char array elements from non-even multiples of sizeof(int).
    > For example, if I had the array:
    >
    > char a[10];
    >
    > and I wanted to access the 8 bytes (a[2], a[3],..., a[8], a[9]) as the
    > array:
    >
    > int b[2];
    >
    > where (b[0] contains the data in a[2] to a[5], and b[1] contains a[6]
    > to a[9])
    >
    > I understand the alignment issue in this example. My question
    > is...can I turn this problem on its head...for example, create an
    > empty array of ints, then access this memory space as a char?


    Yes you can, but only with char being the "other" thing. Another solution
    is to define a union, with a char and an int array inside.

    --
    WW aka Attila
    WW, Oct 7, 2003
    #2

  3. J. Campbell

    Default User Guest

    WW wrote:

    > Yes you can, but only with char being the "other" thing. Another solution
    > is to define a union, with a char and an int array inside.



    This is not guaranteed. It is implementation-defined behavior if the
    value of a member of a union object is used when the most recent store
    to the object was to a different member, other than structs sharing a
    common initial sequence.

    Many implementations do allow it.

    There are more portable ways, basically shifting and or-ing the bytes
    onto an int.




    Brian Rodenborn
    Default User, Oct 7, 2003
    #3
  4. J. Campbell

    WW Guest

    Default User wrote:
    > WW wrote:
    >
    >> Yes you can, but only with char being the "other" thing. Another
    >> solution is to define a union, with a char and an int array inside.

    >
    >
    > This is not guaranteed. It is implementation-defined behavior if the
    > value of a member of a union object is used when the most recent store
    > to the object was to a different member, other than structs sharing a
    > common initial sequence.


    yep. But we are talking about a char and an int array so far.

    --
    WW aka Attila
    WW, Oct 7, 2003
    #4
  5. J. Campbell

    Default User Guest

    WW wrote:
    >
    > Default User wrote:
    > > WW wrote:
    > >
    > >> Yes you can, but only with char being the "other" thing. Another
    > >> solution is to define a union, with a char and an int array inside.

    > >
    > >
    > > This is not guaranteed. It is implementation-defined behavior if the
    > > value of a member of a union object is used when the most recent store
    > > to the object was to a different member, other than structs sharing a
    > > common initial sequence.

    >
    > yep. But we are talking about a char and an int array so far.



    Right, which don't come under the exemption. If I got the OP's problem
    right, he had a buffer of char that he wanted to convert into a series
    of ints. Using unions to do so would be implementation-defined behavior
    (if I'm reading the standard correctly).

    Here's a way from my personal library:

    unsigned int CreateDataWord (unsigned char data[4])
    {
        unsigned int dataword = 0;

        for (int i = 0; i < 4; i++)
        {
            // data[0] ends up in the most significant byte
            dataword |= data[i] << (3-i) * 8;
        }
        return dataword;
    }


    Note that this uses unsigned char for the buffer, which is guaranteed to
    be safe, requires CHAR_BIT == 8, and is predicated on 32-bit int, so it
    has its own nonportabilities.



    Brian Rodenborn
    Default User, Oct 8, 2003
    #5
  6. J. Campbell

    J. Campbell Guest

    "WW" <> wrote in:

    > yep. But we are talking about a char and an int array so far.


    Thanks...<so far :)>

    Indeed, the real question is: is it SAFE to access a region of
    memory, defined as other than char, as a char array...if you are aware
    of the issues? Your answer indicates a cautious "yes" if you are
    gentle, and make sure never to overstep the char array bounds...as
    long as CHAR_BIT is the length expected. Is this interpretation
    correct??

    Thanks for the response...still trying to learn...6 mos into the
    process...still love QB45...;-)
    J. Campbell, Oct 8, 2003
    #6
  7. J. Campbell

    J. Campbell Guest

    Default User <> wrote in message news:<>...
    > WW wrote:
    >
    > > Yes you can, but only with char being the "other" thing. Another solution
    > > is to define a union, with a char and an int array inside.

    >
    >
    > This is not guaranteed. It is implementation-defined behavior if the
    > value of a member of a union object is used when the most recent store
    > to the object was to a different member, other than structs sharing a
    > common initial sequence.
    >
    > Many implementations do allow it.
    >
    > There are more portable ways, basically shifting and or-ing the bytes
    > onto an int.
    >
    > Brian Rodenborn


    Brian,

    So...you raise issue with the use of union...but what about my
    original solution where I take a char array and put it into an int
    array...which I then access as both an int and a char array. Are
    there alignment problems with this, or are the problems more local???

    I somehow get the feeling you are posting from Galveston...if this is
    the case, then it explains the disarray. Cheers, ciao, and thanks in
    advance for the C++ help.
    J. Campbell, Oct 8, 2003
    #7
  8. "J. Campbell" <> wrote in message
    news:...
    > [...]
    > So...you raise issue with the use of union...but what about my
    > original solution where I take a char array and put it into an int
    > array...which I then access as both an int and a char array. Are
    > there alignment problems with this, or are the problems more
    > local???
    > [...]


    You would need to do a reinterpret cast, and that is not one of
    the portable types for it. So technically, no. Doing what you
    suggest will result in an ill-formed program (or maybe the
    behaviour is just implementation-defined). On the other hand,
    it will probably work on 99% of the compilers and systems out
    there. Since it would be costly to do it the "right" way, I
    personally would just run with it. But that's just me, and this is
    a C++ newsgroup, so if I were toeing the party line like a good
    programmer, I would revile you for suggesting a program which
    might possibly contravene the sacred text which is the C++
    standard. Anyway, good luck.

    Dave



    David B. Held, Oct 8, 2003
    #8
  9. J. Campbell

    Default User Guest

    "J. Campbell" wrote:

    > So...you raise issue with the use of union...but what about my
    > original solution where I take a char array and put it into an int
    > array...which I then access as both an int and a char array. Are
    > there alignment problems with this, or are the problems more local???



    You can access any object as an array of unsigned char safely. That's
    because unsigned char is guaranteed to have no trap representations. An
    array of ints can be accessed as unsigned char. However, you then must
    be cognizant of the endianness of the ints in the array. It's generally
    kind of tricky; I've found it easier and more portable (no method is
    completely portable) to use bitwise operators.
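
    As an illustration of the point about byte order, here is a small
    sketch that looks at an unsigned int through an unsigned char pointer.
    The names bytes_of, byte_at and is_little_endian are hypothetical, and
    the sketch assumes unsigned int has no padding bits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies the bytes of value in the order they sit in memory.
std::vector<unsigned char> bytes_of(const unsigned int& value)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(p, p + sizeof value);
}

// Returns the byte stored at position index within value.
unsigned char byte_at(const unsigned int& value, std::size_t index)
{
    assert(index < sizeof value);
    return reinterpret_cast<const unsigned char*>(&value)[index];
}

// True when the least significant byte is stored first in memory.
bool is_little_endian()
{
    const unsigned int one = 1u;
    return byte_at(one, 0) == 1;
}
```

    The same value produces a different byte sequence on machines with a
    different byte order, which is what makes the byte view tricky.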




    Brian Rodenborn
    Default User, Oct 8, 2003
    #9
  10. "David B. Held" <> wrote in message
    news:bm0fki$73j$...
    > [...]
    > On the other hand, it will probably work on 99% of the
    > compilers and systems out there.
    > [...]


    After reading Default User's post, I realized I should have added
    the caveat that it will probably work on 99% of the compilers
    and systems out there *but in a generally non-portable way*.
    That means that since you're reading raw bytes into an array
    from a file, and assuming a certain byte order for int, the code
    obviously won't work on a platform that has a different byte order.
    But usually, people who do stuff like this aren't interested in
    portability in the first place.

    Dave



    David B. Held, Oct 8, 2003
    #10
  11. J. Campbell

    Default User Guest

    "David B. Held" wrote:

    > After reading Default User's post, I realized I should have added
    > the caveat that it will probably work on 99% of the compilers
    > and systems out there *but in a generally non-portable way*.
    > That means that since you're reading raw bytes into an array
    > from a file, and assuming a certain byte order for int, the code
    > obviously won't work on a platform that has a different byte order.
    > But usually, people who do stuff like this aren't interested in
    > portability in the first place.



    Byte order is a big problem for me, because my code has to work on
    Windows for desktop testing, then on the target hardware, which has a
    different endianness. My methods (bitwise ops) were compatible with
    both without change. You'll have an easier time finding platforms with
    CHAR_BIT == 8 and 32-bit integral types.

    Once you devise the packing and unpacking routines for the data words,
    then all you need to deal with is the unsigned char array.
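
    A pair of packing and unpacking routines of the kind described above
    might look like the sketch below. The names pack_be32 and unpack_be32
    are invented for illustration; the sketch assumes 8-bit bytes and uses
    unsigned long, which is at least 32 bits wide.

```cpp
#include <cassert>

// Packs four 8-bit bytes, most significant first, into a 32-bit value.
// The result is the same on any host byte order.
unsigned long pack_be32(const unsigned char bytes[4])
{
    assert(bytes != 0);
    unsigned long word = 0;
    for (int i = 0; i < 4; ++i)
        word = (word << 8) | (bytes[i] & 0xFFu);
    return word;
}

// Inverse of pack_be32: writes the 32-bit value as four bytes,
// most significant first.
void unpack_be32(unsigned long word, unsigned char bytes[4])
{
    assert(bytes != 0);
    for (int i = 0; i < 4; ++i)
        bytes[i] = static_cast<unsigned char>((word >> ((3 - i) * 8)) & 0xFFu);
}
```

    Because only shifts and masks on values are involved, the rest of the
    program can treat the data purely as an unsigned char array.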




    Brian Rodenborn
    Default User, Oct 8, 2003
    #11
  12. (J. Campbell) wrote in message news:<>...
    > I understand the alignment issue in this example. My question
    > is...can I turn this problem on its head...for example, create an
    > empty array of ints, then access this memory space as a char?


    Sure.

    > I can see that there will be compatibility problems with this system
    > if ported to a system where CHAR_BIT != 8. However, I don't care
    > about these systems. If I'm only doing logical operators on the bits
    > in the file, I don't even see any endian issues with doing this.


    If you access the array as int, you will be endian-specific. Whether
    you use arithmetic or logic operations makes no difference.
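
    To make this concrete, the sketch below loads some bytes as a native
    word, applies a purely logical mask, and stores the word back. Which
    of the original bytes survives depends on the machine's byte order.
    The function name surviving_byte_index is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns the position of the only byte left non-zero after masking a
// native word with 0xFF: 0 on a little-endian machine, and
// sizeof(unsigned int) - 1 on a big-endian one.
std::size_t surviving_byte_index()
{
    unsigned char bytes[sizeof(unsigned int)];
    for (std::size_t i = 0; i < sizeof bytes; ++i)
        bytes[i] = static_cast<unsigned char>(i + 1);   // 1, 2, 3, ...

    unsigned int word;
    assert(sizeof bytes == sizeof word);
    std::memcpy(&word, bytes, sizeof word);   // view the bytes as a word
    word &= 0xFFu;                            // logical operation only
    std::memcpy(bytes, &word, sizeof word);   // back to bytes

    for (std::size_t i = 0; i < sizeof bytes; ++i)
        if (bytes[i] != 0)
            return i;
    return sizeof bytes;   // not expected: the low byte was non-zero
}
```

    An operation that treats every byte of every word identically (for
    example, inverting all bits) gives the same bytes either way; anything
    that singles out particular bits of the word does not.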

    Sam
    Samuel Barber, Oct 9, 2003
    #12
  13. J. Campbell

    J. Campbell Guest

    Default User wrote in message news:<>...
    > Here's a way from my personal library:
    >
    > unsigned int CreateDataWord (unsigned char data[4])
    > {
    > unsigned int dataword = 0;
    >
    > for (int i = 0; i < 4; i++)
    > {
    > dataword |= data[i] << (3-i) * 8;
    > }
    > return dataword;
    > }
    >
    >
    > Note that this uses unsigned char for the buffer, which is guaranteed to
    > be safe, requires CHAR_BIT == 8, and is predicated on 32-bit int, so it
    > has its own nonportabilities.
    >
    > Brian Rodenborn


    Thanks for the input, Brian. Regarding your function
    CreateDataWord...I just want to point out that if you just want to
    pack a char buffer into ints, you can do this portably while making
    no assumptions about the word size or the value of CHAR_BIT.
    However, you actually need two functions...depending on how you want
    to pack your word...the function you show packs the word big-endian
    (first byte most significant). Here is compilable code that uses 2
    portable versions of your function.

    #include <climits>
    #include <iostream>
    #include <string>

    using namespace std;

    void wait();
    unsigned int makeBE(unsigned char a[]);
    unsigned int makeLE(unsigned char a[]);
    bool endian_check();

    int main(){
        const int ws = sizeof(int); // const, so it can be used as an array bound
        cout << "This is a " << ws * CHAR_BIT << "-bit system\n"
             << "Bytes are " << CHAR_BIT << "-bits\n"
             << "Words are " << ws << " bytes\n\n"
             << "Checking system endianness...System is ";

        if(endian_check()) cout << "Little Endian (Intel)\n\n";
        else cout << "Big Endian (Motorola)\n\n";

        unsigned char data[ws]; // Make a 1-word char array and fill it
        for(int i = 0; i < ws; ++i) data[i] = 0x41 + i;

        cout << "The " << ws << " byte sequence \"";
        for(int i = 0; i < ws; ++i) cout << data[i];
        cout << "\" (Ascii)\n"
             << "is translated to a " << ws
             << " byte integer word (hex) as:\n\n" << hex;
        cout << "Big Endian(Motorola): " << makeBE(data) << endl;
        cout << "Little Endian(Intel): " << makeLE(data) << endl << endl;
        wait();
        return 0;
    }

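    // Big-endian: the first byte in the buffer becomes the most significant.
    unsigned int makeBE (unsigned char data[sizeof(int)]){
        unsigned int dataword = 0;
        int index = 0;

        for (int i = sizeof(int); i > 0; )
            dataword |= static_cast<unsigned int>(data[index++]) << (--i * CHAR_BIT);
        return dataword;
    }

    // Little-endian: the first byte in the buffer becomes the least significant.
    unsigned int makeLE (unsigned char data[sizeof(int)]){
        unsigned int dataword = 0;

        for (unsigned int i = 0; i < sizeof(int); i++)
            dataword |= static_cast<unsigned int>(data[i]) << (i * CHAR_BIT);
        return dataword;
    }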

    bool endian_check(){
    unsigned int word = 0x1;
    unsigned char* byte = reinterpret_cast<unsigned char*>(&word);
    return (byte[0]); // returns 1 if LE, 0 if BE
    }

    void wait(){
    cout<<"<Enter> to continue..";
    string z; getline(cin,z);
    }
    J. Campbell, Oct 9, 2003
    #13
  14. J. Campbell

    WW Guest

    Default User wrote:
    > WW wrote:
    >>
    >> Default User wrote:
    >>> WW wrote:
    >>>
    >>>> Yes you can, but only with char being the "other" thing. Another
    >>>> solution is to define a union, with a char and an int array inside.
    >>>
    >>>
    >>> This is not guaranteed. It is implementation-defined behavior if the
    >>> value of a member of a union object is used when the most recent
    >>> store to the object was to a different member, other than structs
    >>> sharing a common initial sequence.

    >>
    >> yep. But we are talking about a char and an int array so far.

    >
    >
    > Right, which don't come under the exemption. If I got the OP's problem
    > right, he had a buffer of char that he wanted to convert into a series
    > of ints. Using unions to do so would be implementation-defined
    > behavior (if I'm reading the standard correctly).


    Yeah, you do. Emerican Netiveness. :)

    > Here's a way from my personal library:
    >
    > unsigned int CreateDataWord (unsigned char data[4])
    > {
    > unsigned int dataword = 0;
    >
    > for (int i = 0; i < 4; i++)
    > {
    > dataword |= data[i] << (3-i) * 8;
    > }
    > return dataword;
    > }
    >
    > Note that this uses unsigned char for the buffer, which is guaranteed
    > to be safe, requires CHAR_BIT == 8, and is predicated on 32-bit int,
    > so it has its own nonportabilities.


    Yepp... But if you did write long int, then it would be fully portable
    IIRC.

    --
    WW aka Attila
    WW, Oct 9, 2003
    #14
  15. J. Campbell

    Default User Guest

    WW wrote:

    > > Note that this uses unsigned char for the buffer, which is guaranteed
    > > to be safe, requires CHAR_BIT == 8, and is predicated on 32-bit int,
    > > so it has its own nonportabilities.

    >
    > Yepp... But if you did write long int, then it would be fully portable
    > IIRC.



    Probably should have been long. My original code used our own local
    guaranteed sized type, UINT_32, which is very nonportable.
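
    For readers on a later compiler: C++11 added <cstdint>, which provides
    std::uint32_t on platforms that have an exact 32-bit unsigned type, so
    a local type such as UINT_32 may no longer be needed there. The sketch
    below is a variant of CreateDataWord under that assumption; the name
    create_data_word32 is made up, and 8-bit bytes are still assumed.

```cpp
#include <cassert>
#include <cstdint>

// Packs four 8-bit bytes, first byte most significant, into a value
// that is exactly 32 bits wide.
std::uint32_t create_data_word32(const unsigned char data[4])
{
    assert(data != nullptr);
    std::uint32_t dataword = 0;
    for (int i = 0; i < 4; ++i)
    {
        // Widen before shifting so the shift is done on an unsigned
        // 32-bit value rather than on a promoted int.
        dataword |= static_cast<std::uint32_t>(data[i]) << ((3 - i) * 8);
    }
    return dataword;
}
```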



    Brian Rodenborn
    Default User, Oct 9, 2003
    #15