Accessing individual bytes of an integer

  • Thread starter Daniel Lidström

Daniel Lidström

Hello!

I want to work with individual bytes of integers. I know that ints are
32-bit and will always be. Sometimes I want to work with the entire
32-bits, and other times I want to modify just the first 8-bits for
example. For me, I think it would be best if I can declare the 32-bits
like this:

unsigned char bits[4];

When I want to treat this as a 32-bits integer, can I do something
like this?

unsigned int bits32 = *((unsigned int*)bits);

I'm unsure of the syntax. I don't need to work in-place so to speak. It is
fine to work with a copy.

Thanks in advance!
 
Andrew Koenig

When I want to treat this as a 32-bits integer, can I do something
like this?

unsigned int bits32 = *((unsigned int*)bits);

Yes you can, but you have absolutely no assurance as to what the results
will be :)

What's wrong with

(bits>>n) & 0xff

where n is 0, 8, 16, or 24?
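
As a rough sketch of that suggestion (not code from the thread): the helper name byte_at is made up for illustration, and it takes a byte index rather than a bit count. The value is passed as unsigned long, which is guaranteed to hold at least 32 bits. Because the function works on the value rather than on memory, the result does not depend on byte order or alignment.

```cpp
// Hypothetical helper, not from the thread: returns byte number `n`
// of `value`, where n == 0 selects the least significant byte.
// Assumes n is 0, 1, 2 or 3.
inline unsigned char byte_at(unsigned long value, unsigned n)
{
    return static_cast<unsigned char>((value >> (8 * n)) & 0xffUL);
}
```

For example, byte_at(0x12345678UL, 0) yields 0x78 and byte_at(0x12345678UL, 3) yields 0x12.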
 
MatrixV

Daniel Lidström said:
Hello!

I want to work with individual bytes of integers. I know that ints are
32-bit and will always be. Sometimes I want to work with the entire
32-bits, and other times I want to modify just the first 8-bits for
example. For me, I think it would be best if I can declare the 32-bits
like this:

unsigned char bits[4];

When I want to treat this as a 32-bits integer, can I do something
like this?

unsigned int bits32 = *((unsigned int*)bits);

I'm unsure of the syntax. I don't need to work in-place so to speak. It is
fine to work with a copy.

Thanks in advance!

Disregarding byte order, you are correct.
A better way is to use a union like:
union xxx
{
unsigned char bits[4];
unsigned int i;
};
 
Thomas Matthews

MatrixV said:
Daniel Lidström said:
Hello!

I want to work with individual bytes of integers. I know that ints are
32-bit and will always be. Sometimes I want to work with the entire
32-bits, and other times I want to modify just the first 8-bits for
example. For me, I think it would be best if I can declare the 32-bits
like this:

unsigned char bits[4];

When I want to treat this as a 32-bits integer, can I do something
like this?

unsigned int bits32 = *((unsigned int*)bits);

I'm unsure of the syntax. I don't need to work in-place so to speak. It is
fine to work with a copy.

Thanks in advance!


Disregarding byte order, you are correct.
A better way is to use a union like:
union xxx
{
unsigned char bits[4];
unsigned int i;
};

How about this:
union xxx
{
unsigned char bytes[sizeof(unsigned int)];
unsigned int i;
};
This makes no assumptions about how many bytes are
in an integer.


 
Andrew Koenig

Disregarding byte order, you are correct.
A better way is to use a union like:
union xxx
{
unsigned char bits[4];
unsigned int i;
};

Not really. When you use a union, you have no assurance about the effect
that giving a value to one member of a union will have on other members.
 
Old Wolf

MatrixV said:
Daniel Lidström said:
I want to work with individual bytes of integers. I know that ints are
32-bit and will always be. Sometimes I want to work with the entire
32-bits, and other times I want to modify just the first 8-bits for
example. For me, I think it would be best if I can declare the 32-bits
like this:

unsigned char bits[4];

When I want to treat this as a 32-bits integer, can I do something
like this?

unsigned int bits32 = *((unsigned int*)bits);

Bad - if 'bits' is not correctly aligned for an int, then
you have undefined behaviour.

You can work in-place with:
unsigned int bits32;
and then to access the chars:
((unsigned char *)&bits32)[0]
etc. Note that the contents of the chars could be anything
(eg. big endian, little endian, or something more exotic),
and if you modify one of those chars then you aren't guaranteed
to have anything sensible left in bits32.

If you don't want to work in-place then you could memcpy
between the int and the char (with the same caveats I mentioned
already).
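
A minimal sketch of the memcpy route, working on a copy. The function name is invented here. Which part of the number the first byte in memory holds depends on the platform's byte order, and on an implementation with padding bits the modified result is not guaranteed to be sensible, as noted above.

```cpp
#include <cstring>

// Hypothetical helper, not from the thread: returns a copy of `value`
// whose first byte in memory has been replaced by `b`. Whether that
// byte is the most or least significant one depends on the platform.
inline unsigned with_first_stored_byte(unsigned value, unsigned char b)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // int -> bytes
    bytes[0] = b;                              // edit the copy only
    std::memcpy(&value, bytes, sizeof value);  // bytes -> int
    return value;
}
```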

To work portably (assuming a 32-bit int and 8-bit char),
you can use bit-shifts and masks to extract the four bytes
and replace them. A good compiler would optimise this code
into a single instruction, if it could.
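
One way the replace half of the shift-and-mask version could look. This is a sketch, assuming a C++11 compiler that provides std::uint32_t (which postdates this thread); the helper name with_byte is made up.

```cpp
#include <cstdint>

// Hypothetical helper, not from the thread: returns `value` with byte
// number `n` (0 = least significant, 3 = most significant) replaced
// by `byte`. Assumes n is 0, 1, 2 or 3.
inline std::uint32_t with_byte(std::uint32_t value, unsigned n, std::uint8_t byte)
{
    const unsigned shift = 8 * n;
    const std::uint32_t mask = std::uint32_t(0xff) << shift;
    // Clear the old byte, then OR in the new one at the same position.
    return (value & ~mask) | (std::uint32_t(byte) << shift);
}
```

For example, with_byte(0x12345678, 0, 0xAB) gives 0x123456AB on every platform, regardless of how the bytes are laid out in memory.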

Disregarding byte order, you are correct.
A better way is to use a union like:
union xxx
{
unsigned char bits[4];
unsigned int i;
};

Undefined behaviour if you access a member of a union that
wasn't the one you just set.
 
Clark S. Cox III

Disregarding byte order, you are correct.
A better way is to use a union like:
union xxx
{
unsigned char bits[4];
unsigned int i;
};

Not really. When you use a union, you have no assurance about the
effect that giving a value to one member of a union will have on other
members.

You do when one of them is an array of unsigned char.
 
Ioannis Vranos

Daniel said:
Hello!

I want to work with individual bytes of integers. I know that ints are
32-bit and will always be. Sometimes I want to work with the entire
32-bits, and other times I want to modify just the first 8-bits for
example. For me, I think it would be best if I can declare the 32-bits
like this:

unsigned char bits[4];

When I want to treat this as a 32-bits integer, can I do something
like this?

unsigned int bits32 = *((unsigned int*)bits);


Yes but not like this because array bits is not initialised.


I'm unsure of the syntax. I don't need to work in-place so to speak. It is
fine to work with a copy.


What you can do is read an unsigned int or any other POD type as a
sequence of unsigned chars (or plain chars), that is, as bytes; copy it
byte by byte to another unsigned char sequence (which includes possible
padding bits); and treat the new char sequence as another unsigned int.


The following example uses an int and is portable:


#include <iostream>

int main()
{
int integer=0;

unsigned char *puc= reinterpret_cast<unsigned char *>(&integer);


unsigned char otherInt[sizeof(integer)];

// Read integer byte by byte and copy it to otherInt
for(unsigned i=0; i<sizeof(integer); ++i)
otherInt[i]= puc[i];


// We treat the new unsigned char sequence as an int
int *p= reinterpret_cast<int *>(otherInt);

// Assign another value to the integer otherInt!
*p=7;

std::cout<<*p<<"\n";
}
 
Jack Klein

Hello!

I want to work with individual bytes of integers. I know that ints are
32-bit and will always be.

No, you don't. You just think you do. But you are mistaken.
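
The C++ standard only sets a minimum range for int, so the 32-bit assumption can at least be checked at compile time. A sketch, assuming a C++11 compiler (static_assert and <cstdint> postdate this thread); the alias name word32 is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Compilation stops here on a platform where unsigned int does not
// have exactly 32 value bits.
static_assert(std::numeric_limits<unsigned int>::digits == 32,
              "this code assumes a 32-bit unsigned int");

// Alternative: a type that is exactly 32 bits wide wherever the
// implementation provides it. The alias name is hypothetical.
typedef std::uint32_t word32;
```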
 
Andrew Koenig

Clark S. Cox III said:
Not really. When you use a union, you have no assurance about the effect
that giving a value to one member of a union will have on other members.

You do when one of them is an array of unsigned char.

Can you show me where in the C++ standard it says that? The text that I
think is relevant can be found in subclause 9.5:

In a union, at most one of the data members can be active at any time, that
is, the value of at most one of the data members can be stored in a union at
any time. [Note: one special guarantee is made in order to simplify the use
of unions: If a POD-union contains several POD-structs that share a common
initial sequence (9.2), and if an object of this POD-union type contains one
of the POD-structs, it is permitted to inspect the common initial sequence
of any of POD-struct members; see 9.2. ]

I think that "Only one of the data members can be active at any time" is
pretty clear, and the one exception to that rule says nothing about array of
unsigned character.
 
Larry Brasfield

....
What you can do is read an unsigned int or any other POD type as a sequence of unsigned chars (or plain chars), that is, as bytes;
copy it byte by byte to another unsigned char sequence (which includes possible padding bits); and treat the new char sequence as
another unsigned int.


The following example uses an int and is portable:

I have to disagree with your "is portable" claim.
Comments inserted below in your code.
#include <iostream>

int main()
{
int integer=0;

unsigned char *puc= reinterpret_cast<unsigned char *>(&integer);


unsigned char otherInt[sizeof(integer)];

// Read integer byte by byte and copy it to otherInt
for(unsigned i=0; i<sizeof(integer); ++i)
otherInt[i]= puc[i];


// We treat the new unsigned char sequence as an int
int *p= reinterpret_cast<int *>(otherInt);


There is no assurance that the attempt to access an int
at the starting address of otherInt will succeed. On some
machines, it could produce an alignment fault.

// Assign another value to the integer otherInt!
*p=7;

The above access could also produce an alignment fault.

std::cout<<*p<<"\n";
}

I think "works on some platforms" versus "can fault on
some platforms" is a good example of "not portable".
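
To make the alignment concern concrete, here is a small sketch that checks whether an address would be suitable for an int. It assumes C++11 (for alignof and std::uintptr_t) and assumes that converting a pointer to std::uintptr_t yields a plain linear address, which is common but not required by the standard. The helper name is made up.

```cpp
#include <cstdint>

// Hypothetical helper, not from the thread: reports whether the
// address `p` is a multiple of alignof(int). Relies on the
// implementation-defined pointer-to-integer mapping being linear.
inline bool aligned_for_int(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}
```

Applied to a plain unsigned char array, the result may be true or false depending on where the compiler happens to place the array, which is the portability problem described above.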
 
Ron Natalie

Andrew said:
I think that "Only one of the data members can be active at any time" is
pretty clear, and the one exception to that rule says nothing about array of
unsigned character.

I believe he is referring to the passage at 3.10 p 15:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following
types the behavior is undefined:
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of
the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including,
recursively, a member of a subaggregate or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
— a char or unsigned char type.
 
Ioannis Vranos

Andrew said:
Can you show me where in the C++ standard it says that? The text that I
think is relevant can be found in subclause 9.5:

In a union, at most one of the data members can be active at any time, that
is, the value of at most one of the data members can be stored in a union at
any time. [Note: one special guarantee is made in order to simplify the use
of unions: If a POD-union contains several POD-structs that share a common
initial sequence (9.2), and if an object of this POD-union type contains one
of the POD-structs, it is permitted to inspect the common initial sequence
of any of POD-struct members; see 9.2. ]

I think that "Only one of the data members can be active at any time" is
pretty clear, and the one exception to that rule says nothing about array of
unsigned character.


However the interesting part is that the entire union can be read as an
unsigned char/plain char array, as is the case with all POD types.
 
Ioannis Vranos

Larry said:
There is no assurance that the attempt to access an int
at the starting address of otherInt will succeed. On some
machines, it could produce an alignment fault.
I think "works on some platforms" versus "can fault on
some platforms" is a good example of "not portable".


The standard guarantees that we can both read a POD type as an array of
unsigned chars/plain chars, copy its contents to another array of
unsigned chars/plain chars of the same size, and the new array is an
exact copy of the initial POD object.


That's why you can copy byte by byte or use memcpy() for this, an entire
array of ints for example. The same applies to an individual int.
 
Old Wolf

Ron said:
I believe he is referring to the passage at 3.10 p 15:
If a program attempts to access the stored value of an object through
an lvalue of other than one of the following
types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a
cv-qualified version of the dynamic type of
the object,
- an aggregate or union type that includes one of the
aforementioned types among its members (including,
recursively, a member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.

That passage is irrelevant to this situation. It says
"If (conditions) then the behaviour is undefined". The union
example in question does not meet (conditions). QED.

To put it another way, the passage you quoted doesn't say that
the behaviour is defined for those listed bullet points.
 
Larry Brasfield

Ioannis Vranos said:
The standard guarantees that we can both read a POD type as an array of unsigned chars/plain chars, copy its contents to another
array of unsigned chars/plain chars of the same size, and the new array is an exact copy of the initial POD object.

True, but irrelevant. Your code had these elements:
unsigned char otherInt[sizeof(integer)];
// ...
// We treat the new unsigned char sequence as an int
int *p= reinterpret_cast<int *>(otherInt);
// Assign another value to the integer otherInt!
*p=7;

Because char may have looser alignment restrictions than
int, the char array named 'otherInt' may start at an address
that is not aligned sufficiently to be a directly accessible int.
The reinterpret_cast can result in a pointer whose value
would never be an int pointer for any normally allocated int.
And therefore the assignment to *p could produce an alignment
fault on some platforms. The MIPS or Alpha are examples.

That's why you can copy byte by byte or use memcpy() for this, an entire array of ints for example. The same applies to an
individual int.

My beef is not with copying an int as a series of bytes. It
is with accessing a whole int at an address not allocated
as an int. That is simply not portable.
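
A sketch of the usual way around this: copy the bytes into a real int with memcpy instead of dereferencing a cast pointer. The helper name load_int is invented for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not from the thread: reads an int whose bytes
// start at buffer[offset]. memcpy places no alignment requirement on
// its source, so the address does not have to be suitable for an int.
// The buffer must hold at least offset + sizeof(int) bytes.
inline int load_int(const unsigned char* buffer, std::size_t offset)
{
    int value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}
```

The int that is actually read and written is `value`, which the compiler allocated as an int, so no misaligned access takes place even when `offset` is odd.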
 
Ioannis Vranos

Larry said:
The standard guarantees that we can both read a POD type as an array of unsigned chars/plain chars, copy its contents to another
array of unsigned chars/plain chars of the same size, and the new array is an exact copy of the initial POD object.


True, but irrelevant. Your code had these elements:
unsigned char otherInt[sizeof(integer)];
// ...
// We treat the new unsigned char sequence as an int
int *p= reinterpret_cast<int *>(otherInt);
// Assign another value to the integer otherInt!
*p=7;

Because char may have looser alignment restrictions than
int, the char array named 'otherInt' may start at an address
that is not aligned sufficiently to be a directly accessible int.


Actually char/unsigned char have no padding bits.

The reinterpret_cast can result in a pointer whose value
would never be an int pointer for any normally allocated int.
And therefore the assignment to *p could produce an alignment
fault on some platforms. The MIPS or Alpha are examples.


I am not sure I understand that. However let me give you another example:


int main()
{
int arrayInt[4]={0};

unsigned char arrayUChar[4 * sizeof(*arrayInt)];


unsigned char *p= reinterpret_cast<unsigned char *>(arrayInt);

// Or memcpy()
for(unsigned i=0; i<4 * sizeof(*arrayInt); ++i)
arrayUChar[i]= p[i];


int *pInt= reinterpret_cast<int *>(arrayUChar);

// Treat p as an exact copy of arrayInt and change the values
// of its ints.
for(unsigned i=0; i< 4 * sizeof(*arrayInt); ++i)
p[i]= i;
}


Isn't this guaranteed to be portable? Take notice of

int *pInt= reinterpret_cast<int *>(arrayUChar);


This is the meaning of an exact copy of a POD type. The same can happen
with a struct for example where I would assign SomeStruct *p instead.


Also here essentially, we access the first int of an int array.

My beef is not with copying an int as a series of bytes. It
is with accessing a whole int at an address not allocated
as an int. That is simply not portable.


We can access an int, because int is a *POD type* and the standard
guarantees that we can create *exact copies* of any POD type and access
them via a pointer or reference of the POD type.

Built in types *are* POD types.
 
Ioannis Vranos

Ioannis said:
We can access an int, because int is a *POD type* and the standard
guarantees that we can create *exact copies* of any POD type and access
them via a pointer or reference of the POD type.

Built in types *are* POD types.


Actually the standard states:

"2. For any object (other than a base-class subobject) of POD type T,
whether or not the object holds a valid value of type T, the underlying
bytes (1.7) making up the object can be copied into an array of char or
unsigned char. If the content of the array of char or unsigned char
is copied back into the object, the object shall subsequently hold its
original value.

[Example:

#define N sizeof(T)
char buf[N];
T obj; // obj initialized to its original value
memcpy(buf, &obj, N); // between these two calls to memcpy,
                      // obj might be modified
memcpy(&obj, buf, N); // at this point, each subobject of obj of scalar type
                      // holds its original value
—end example]


3. For any POD type T, if two pointers to T point to distinct T objects
obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if
the value of obj1 is copied into obj2, using the memcpy library
function, obj2 shall subsequently hold the same value as obj1. [Example:

T* t1p;
T* t2p;

// provided that t2p points to an initialized object ...
memcpy(t1p, t2p, sizeof(T)); // at this point, every subobject of POD type in *t1p contains
                             // the same value as the corresponding subobject in *t2p
—end example]


4. The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For POD types, the value representation
is a set of bits in the object representation that determines a value,
which is one discrete element of an implementation-defined set of values.

5. Object types have alignment requirements (3.9.1, 3.9.2). The
alignment of a complete object type is an implementation-defined integer
value representing a number of bytes; an object is allocated at an
address that meets the alignment requirements of its object type."



Unfortunately you are right.


Case 3:

First, the destination must be a pointer to the same object type (and
presumably the allocated space must be for objects of the same type).


Secondly, the standard guarantees that only memcpy() works in this case,
and not copying it char by char or unsigned char by unsigned char. In
other words, in an implementation memcpy() can be defined in some exotic
way (e.g. assembly), and still copying it unsigned char by unsigned char
to the destination has undefined behaviour, while using memcpy() for the
same destination is guaranteed to work!


So my examples fixed (and not producing exactly the same results):


#include <iostream>
#include <cstring>

int main()
{
using namespace std;

int integer=4;

int secondInteger;


memcpy(&secondInteger, &integer, sizeof(integer));

// Print the copy, not the source
cout<<secondInteger<<"\n";
}



and


#include <iostream>
#include <cstring>


int main()
{
using namespace std;

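int arrayInt[4]={1,2,3,4};

// Same element count as arrayInt (sizeof(arrayInt) alone is the byte count)
int secondArrayInt[sizeof(arrayInt)/sizeof(*arrayInt)];


memcpy(secondArrayInt, arrayInt, sizeof(arrayInt));

// Print each copied element
for(unsigned i=0; i< sizeof(arrayInt)/sizeof(*arrayInt); ++i)
cout<<secondArrayInt[i]<<"\n";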
}



Please correct me if I am wrong. Aren't the above guaranteed to be portable?
 
