Questions about "alignment" in memory


J. Campbell

I posted a question some time back about accessing a char array as an
array of words. In order not to overrun the char array, I padded it
with enough 0x00 bytes to ensure that when accessed as words I
wouldn't overrun the array. I was told that this is dangerous and
that there could be alignment problems if, for example, I wanted to
access the char array elements starting at offsets that aren't multiples of sizeof(int).
For example, if I had the array:

char a[10];

and I wanted to access the 8 bytes (a[2], a[3],..., a[8], a[9]) as the
array:

int b[2];

where (b[0] contains the data in a[2] to a[5], and b[1] contains a[6]
to a[9])
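As a hedged illustration of the alignment-safe way to do this: instead of pointing an int* at &a[2], copy the bytes into a real (and therefore correctly aligned) int array with std::memcpy. The helper name bytesToInts is hypothetical, and the sketch assumes a 4-byte int as in the example. The resulting int values still depend on the machine's byte order.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copy a[2]..a[9] into b[0]..b[1].
// No misaligned int* is ever formed; b is a genuine int array.
static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");

void bytesToInts(const char (&a)[10], int (&b)[2]) {
    std::memcpy(b, a + 2, sizeof b);  // copies exactly 8 bytes
}
```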

I understand the alignment issue in this example. My question
is...can I turn this problem on its head...for example, create an
empty array of ints, then access this memory space as a char?

Here's what I'm talking about:


#include <fstream>
using namespace std;

unsigned int* a_words;
char* a_bytes;

fstream in("myfile.dat", ios::in | ios::binary | ios::ate);
int filesize_bytes = static_cast<int>(in.tellg());
int filesize_words = filesize_bytes / sizeof(int)
                   + ((filesize_bytes % sizeof(int)) > 0); // add 1 if there is a remainder

a_words = new unsigned int[filesize_words];
a_bytes = reinterpret_cast<char*>(a_words);

in.seekg(2, ios::beg); // note: out of (word) alignment, starts on the 3rd byte
in.read(a_bytes, filesize_bytes - 2); // everything after the 2 skipped bytes
in.close();

at which point the file is in memory and can be accessed as bytes (by
indexing a_bytes[0 .. filesize_bytes-1]) or as words (by indexing
a_words[0 .. filesize_words-1]).

This seems to work fine. Additionally, it shouldn't suffer potential
alignment problems, since the array is allocated as words, and every
word-aligned address is also a valid byte address, even if the
converse is not true.
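A minimal sketch of the same idea, assuming any std::istream as the source (an std::ifstream opened with ios::binary works the same way; std::istringstream is used in the check so it runs standalone). The name readIntoWords is hypothetical. The buffer is a std::vector<unsigned int>, so word access is aligned, and it is filled through a char* view, which the language always permits.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: read everything after the first `skip` bytes of `in`
// into a buffer allocated as unsigned int. `byteCount` receives the number
// of bytes actually stored; any padding in the last word is left as 0.
std::vector<unsigned int> readIntoWords(std::istream& in, std::size_t skip,
                                        std::size_t& byteCount) {
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    const std::size_t total = size > 0 ? static_cast<std::size_t>(size) : 0;
    byteCount = total > skip ? total - skip : 0;

    // Round up so a trailing partial word still fits; elements start at 0.
    const std::size_t words =
        (byteCount + sizeof(unsigned int) - 1) / sizeof(unsigned int);
    std::vector<unsigned int> buf(words);

    in.seekg(static_cast<std::streamoff>(skip), std::ios::beg);
    if (byteCount != 0)
        in.read(reinterpret_cast<char*>(buf.data()),
                static_cast<std::streamsize>(byteCount));
    return buf;
}
```

Note that skipping 2 bytes leaves total - 2 bytes to read, and the vector frees itself, so no delete[] is needed.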

I can see that there will be compatibility problems with this system
if ported to a system where CHAR_BIT != 8. However, I don't care
about these systems. If I'm only doing logical operators on the bits
in the file, I don't even see any endian issues with doing this.
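A sketch of where this intuition holds, assuming unsigned int has no padding bits: XOR-ing with a key that is identical in every byte gives the same bytes whether you work byte by byte or word by word, on any byte order. A key whose bytes differ, any arithmetic, or shifts that cross byte boundaries would make the word-wise result depend on endianness. The name xorByWords is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: XOR every byte of `bytes` with `key`, one unsigned
// int at a time, using a word-aligned scratch buffer.
std::vector<unsigned char> xorByWords(const std::vector<unsigned char>& bytes,
                                      unsigned char key) {
    const std::size_t n = bytes.size();
    std::vector<unsigned int> buf(
        (n + sizeof(unsigned int) - 1) / sizeof(unsigned int), 0u);
    if (n != 0) std::memcpy(buf.data(), bytes.data(), n);

    // UINT_MAX / UCHAR_MAX is 0x0101...01, so the mask repeats `key` in
    // every byte; a byte-uniform mask is independent of byte order.
    const unsigned int mask = (UINT_MAX / UCHAR_MAX) * key;
    for (unsigned int& w : buf) w ^= mask;

    std::vector<unsigned char> out(n);
    if (n != 0) std::memcpy(out.data(), buf.data(), n);
    return out;
}
```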

Thanks for the slap in the face I'm sure I'll get for performing such
blasphemous operations in C++. Seriously, does this treatment
circumvent potential alignment issues?
 

WW

J. Campbell said:
I understand the alignment issue in this example. My question
is...can I turn this problem on its head...for example, create an
empty array of ints, then access this memory space as a char?

Yes you can, but only with char being the "other" thing. Another solution
is to define a union, with a char and an int array inside.
 

Default User

WW said:
Yes you can, but only with char being the "other" thing. Another solution
is to define a union, with a char and an int array inside.


This is not guaranteed. It is implementation-defined behavior if the
value of a member of a union object is used when the most recent store
to the object was to a different member, other than structs sharing a
common initial sequence.

Many implementations do allow it.

There are more portable ways, basically shifting and or-ing the bytes
onto an int.




Brian Rodenborn
 

WW

Default said:
This is not guaranteed. It is implementation-defined behavior if the
value of a member of a union object is used when the most recent store
to the object was to a different member, other than structs sharing a
common initial sequence.

yep. But we are talking about a char and an int array so far.
 

Default User

WW said:
yep. But we are talking about a char and an int array so far.


Right, which don't come under the exemption. If I got the OP's problem
right, he had a buffer of char that he wanted to convert into a series
of ints. Using unions to do so would be implementation-defined behavior
(if I'm reading the standard correctly).

Here's a way from my personal library:

unsigned int CreateDataWord (unsigned char data[4])
{
unsigned int dataword = 0;

for (int i = 0; i < 4; i++)
{
dataword |= static_cast<unsigned int>(data[i]) << (3-i) * 8; // data[0] ends up most significant
}
return dataword;
}


Note that this uses unsigned char for the buffer, which is guaranteed to
be safe, requires CHAR_BIT == 8, and is predicated on 32-bit int, so it
has its own nonportabilities.



Brian Rodenborn
 

J. Campbell

WW said:
yep. But we are talking about a char and an int array so far.

Thanks...<so far :)>

Indeed, the real question is: is it SAFE to access a region of
memory, defined as other than char, as a char array...if you are aware
of the issues? Your answer indicates a cautious "yes" if you are
gentle, and make sure never to overstep the char array bounds...as
long as CHAR_BIT is the length expected. Is this interpretation
correct??

Thanks for the response...still trying to learn...6 mos into the
process...still love QB45...;-)
 

J. Campbell

Default User said:
This is not guaranteed. It is implementation-defined behavior if the
value of a member of a union object is used when the most recent store
to the object was to a different member, other than structs sharing a
common initial sequence.

Many implementations do allow it.

There are more portable ways, basically shifting and or-ing the bytes
onto an int.

Brian Rodenborn

Brian,

So...you raise issue with the use of union...but what about my
original solution where I take a char array and put it into an int
array...which I then access as both an int and a char array. Are
there alignment problems with this, or are the problems more local???

I somehow get the feeling you are posting from Galveston...if this is
the case, then it explains the disarray. Cheers, ciao, and thanks in
advance for the C++ help.
 

David B. Held

J. Campbell said:
[...]
So...you raise issue with the use of union...but what about my
original solution where I take a char array and put it into an int
array...which I then access as both an int and a char array. Are
there alignment problems with this, or are the problems more
local???
[...]

You would need to do a reinterpret cast, and that is not one of
the portable types for it. So technically, no. Doing what you
suggest will result in an ill-formed program (or maybe the
behaviour is just implementation-defined). On the other hand,
it will probably work on 99% of the compilers and systems out
there. Since it would be costly to do it the "right" way, I
personally would just run with it. But that's just me, and this is
a C++ newsgroup, so if I were toeing the party line like a good
programmer, I would revile you for suggesting a program which
might possibly contravene the sacred text which is the C++
standard. Anyway, good luck.

Dave
 

Default User

J. Campbell said:
So...you raise issue with the use of union...but what about my
original solution where I take a char array and put it into an int
array...which I then access as both an int and a char array. Are
there alignment problems with this, or are the problems more local???


You can access any object as an array of unsigned char safely. That's
because unsigned char is guaranteed to have no trap representations. An
array of ints can be accessed as unsigned char. However, you then must
be cognizant of the endianness of the ints in the array. It's generally
kind of tricky; I've found it easier and more portable (no method is
completely portable) to use bitwise operators.
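To illustrate the point above, a small sketch (the helper name hexBytes is hypothetical) that reads an int array through an unsigned char pointer and returns its bytes as hex. Reading is always permitted; the order of bytes within each int is whatever the host uses, which is exactly the endianness caveat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: dump the object representation of `count` unsigned
// ints as a hex string, two digits per byte when CHAR_BIT == 8.
std::string hexBytes(const unsigned int* words, std::size_t count) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(words);
    std::string s;
    char buf[4];
    for (std::size_t i = 0; i < count * sizeof(unsigned int); ++i) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(p[i]));
        s += buf;
    }
    return s;
}
```

On a little-endian host with 4-byte int, the word 0x01020304 shows up as 04030201; on a big-endian host, as 01020304.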




Brian Rodenborn
 

David B. Held

David B. Held said:
[...]
On the other hand, it will probably work on 99% of the
compilers and systems out there.
[...]

After reading Default User's post, I realized I should have added
the caveat that it will probably work on 99% of the compilers
and systems out there *but in a generally non-portable way*.
That means that since you're reading raw bytes into an array
from a file, and assuming a certain byte order for int, the code
obviously won't work on a platform that has a different byte order.
But usually, people who do stuff like this aren't interested in
portability in the first place.

Dave
 

Default User

David B. Held said:
After reading Default User's post, I realized I should have added
the caveat that it will probably work on 99% of the compilers
and systems out there *but in a generally non-portable way*.
That means that since you're reading raw bytes into an array
from a file, and assuming a certain byte order for int, the code
obviously won't work on a platform that has a different byte order.
But usually, people who do stuff like this aren't interested in
portability in the first place.


Byte order is a big problem for me, because my code has to work on
Windows for desktop testing and then on the target hardware, which has a
different endianness. My methods (bitwise ops) were compatible with both
without change. You'll have an easier time finding platforms with
CHAR_BIT == 8 and 32-bit integral types.

Once you devise the packing and unpacking routines for the data words,
then all you need to deal with is the unsigned char array.
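A hedged sketch of such a packing/unpacking pair, assuming CHAR_BIT == 8 and that std::uint32_t exists on the platform. The names packBE32 and unpackBE32 are hypothetical, not the poster's actual routines. Because only shifts and masks are used, the results are identical on big- and little-endian hosts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: pack 4 bytes into a word, first byte most significant.
std::uint32_t packBE32(const unsigned char b[4]) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
            static_cast<std::uint32_t>(b[3]);
}

// Hypothetical inverse: split the word back into 4 bytes, same order.
void unpackBE32(std::uint32_t w, unsigned char b[4]) {
    b[0] = static_cast<unsigned char>((w >> 24) & 0xFFu);
    b[1] = static_cast<unsigned char>((w >> 16) & 0xFFu);
    b[2] = static_cast<unsigned char>((w >> 8) & 0xFFu);
    b[3] = static_cast<unsigned char>(w & 0xFFu);
}
```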




Brian Rodenborn
 

Samuel Barber

I understand the alignment issue in this example. My question
is...can I turn this problem on its head...for example, create an
empty array of ints, then access this memory space as a char?

Sure.

I can see that there will be compatibility problems with this system
if ported to a system where CHAR_BIT != 8. However, I don't care
about these systems. If I'm only doing logical operators on the bits
in the file, I don't even see any endian issues with doing this.

If you access the array as int, you will be endian-specific. Whether
you use arithmetic or logic operations makes no difference.
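A minimal way to see this: copy the same four bytes into a word (std::memcpy is used here to sidestep aliasing and alignment questions) and check which value the host produces. The names ByteOrder and hostOrder are hypothetical, and the sketch assumes std::uint32_t is available.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class ByteOrder { Big, Little, Other };

// Hypothetical helper: interpret the bytes 01 02 03 04 as a 32-bit word
// and report which byte order the host used.
ByteOrder hostOrder() {
    const unsigned char bytes[4] = {0x01, 0x02, 0x03, 0x04};
    std::uint32_t w;
    std::memcpy(&w, bytes, sizeof w);
    if (w == 0x01020304u) return ByteOrder::Big;
    if (w == 0x04030201u) return ByteOrder::Little;
    return ByteOrder::Other;  // e.g. mixed-endian machines
}
```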

Sam
 

J. Campbell

Default User said:
Here's a way from my personal library:

unsigned int CreateDataWord (unsigned char data[4])
{
unsigned int dataword = 0;

for (int i = 0; i < 4; i++)
{
dataword |= static_cast<unsigned int>(data[i]) << (3-i) * 8; // data[0] ends up most significant
}
return dataword;
}


Note that this uses unsigned char for the buffer, which is guaranteed to
be safe, requires CHAR_BIT == 8, and is predicated on 32-bit int, so it
has its own nonportabilities.

Brian Rodenborn


Thanks for the input, Brian. Regarding your function
CreateDataWord...I just want to point out that if you just want to
pack a char buffer into ints, you can do this portably while making
no assumptions about the system word size or the value of CHAR_BIT.
However, you actually need two functions...depending on how you want
to pack your word...the function you show packs the word Big Endian
(the first byte becomes the most significant). Here is compilable code
that uses 2 portable versions of your function.

#include <climits>
#include <iostream>
#include <string>

using namespace std;

void wait();
unsigned int makeBE(const unsigned char a[]);
unsigned int makeLE(const unsigned char a[]);
bool endian_check();

int main(){
    const int ws = sizeof(int);
    cout << "This is a " << ws * CHAR_BIT << "-bit system\n"
         << "Bytes are " << CHAR_BIT << "-bits\n"
         << "Words are " << ws << " bytes\n\n"
         << "Checking system endianness...System is ";

    if(endian_check()) cout << "Little Endian (Intel)\n\n";
    else cout << "Big Endian (Motorola)\n\n";

    unsigned char data[ws]; // Make a 1-word char array and fill it
    for(int i = 0; i < ws; ++i) data[i] = 0x41 + i;

    cout << "The " << ws << " byte sequence \"";
    for(int i = 0; i < ws; ++i) cout << data[i];
    cout << "\" (Ascii)\n"
         << "is translated to a " << ws
         << " byte integer word (hex) as:\n\n" << hex;
    cout << "Big Endian(Motorola): " << makeBE(data) << endl;
    cout << "Little Endian(Intel): " << makeLE(data) << endl << endl;
    wait();
    return 0;
}

// First byte becomes the most significant byte of the word.
unsigned int makeBE(const unsigned char data[]){
    unsigned int dataword = 0;
    int index = 0;

    for (int i = sizeof(int); i > 0; )
        dataword |= static_cast<unsigned int>(data[index++]) << --i * CHAR_BIT;
    return dataword;
}

// First byte becomes the least significant byte of the word.
unsigned int makeLE(const unsigned char data[]){
    unsigned int dataword = 0;

    for (int i = 0; i < static_cast<int>(sizeof(int)); i++)
        dataword |= static_cast<unsigned int>(data[i]) << (i * CHAR_BIT);
    return dataword;
}

bool endian_check(){
    unsigned int word = 0x1;
    unsigned char* byte = reinterpret_cast<unsigned char*>(&word);
    return byte[0] != 0; // true if LE, false if BE
}

void wait(){
    cout << "<Enter> to continue..";
    string z; getline(cin, z);
}
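Taking the "no assumptions about word size or CHAR_BIT" idea one step further, a sketch of type-generic versions, assuming the target unsigned type has no padding bits. The names packBigEndian and packLittleEndian are hypothetical. The shift is skipped for the first byte so a one-byte type never shifts by its full width.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// Hypothetical: pack sizeof(T) bytes, first byte most significant.
template <typename T>
T packBigEndian(const unsigned char* data) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    T word = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (i != 0) word = static_cast<T>(word << CHAR_BIT);  // room for next byte
        word = static_cast<T>(word | data[i]);
    }
    return word;
}

// Hypothetical: pack sizeof(T) bytes, first byte least significant.
template <typename T>
T packLittleEndian(const unsigned char* data) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    T word = 0;
    for (std::size_t i = sizeof(T); i-- > 0;) {
        if (i != sizeof(T) - 1) word = static_cast<T>(word << CHAR_BIT);
        word = static_cast<T>(word | data[i]);
    }
    return word;
}
```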
 

WW

Default said:
Right, which don't come under the exemption. If I got the OP's problem
right, he had a buffer of char that he wanted to convert into a series
of ints. Using unions to do so would be implementation-defined
behavior (if I'm reading the standard correctly).

Yeah, you do. Emerican Netiveness. :)
Here's a way from my personal library:

unsigned int CreateDataWord (unsigned char data[4])
{
unsigned int dataword = 0;

for (int i = 0; i < 4; i++)
{
dataword |= static_cast<unsigned int>(data[i]) << (3-i) * 8; // data[0] ends up most significant
}
return dataword;
}

Note that this uses unsigned char for the buffer, which is guaranteed
to be safe, requires CHAR_BIT == 8, and is predicated on 32-bit int,
so it has its own nonportabilities.


Yepp... But if you did write long int, then it would be fully portable
IIRC.
 

Default User

WW said:
Yepp... But if you did write long int, then it would be fully portable
IIRC.


Probably should have been long. My original code used our own local
guaranteed sized type, UINT_32, which is very nonportable.
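A sketch of what the unsigned long version might look like, assuming CHAR_BIT == 8 (checked at compile time). unsigned long is guaranteed to be at least 32 bits, and only 32 bits are ever filled, so the result is the same whether long is 32 or 64 bits wide. The name CreateDataWordL is hypothetical.

```cpp
#include <cassert>
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical variant of CreateDataWord using unsigned long.
unsigned long CreateDataWordL(const unsigned char data[4]) {
    unsigned long dataword = 0;
    for (int i = 0; i < 4; ++i)
        dataword |= static_cast<unsigned long>(data[i]) << (3 - i) * 8;  // data[0] most significant
    return dataword;
}
```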



Brian Rodenborn
 
