Split a numeric value into bytes (char)

Barzo · Jan 21, 2009

Hi,

I have a value (a long, for example) and I want to split it into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure? Or is there a better method?

Tnx.
Daniele.

Helge Kruse · Jan 21, 2009

Barzo said:
Hi,

I have a value (long for example) and I want to split into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure? Or there is a better method?

Tnx.
Daniele.

This works, but keep in mind that this code is platform dependent.
bytes[0] can be 0xAA, 0xDD or probably any other value, depending on the
endianness of the CPU.

Further drawback: You don't have any size information for the bytes array.
Slightly better would be:

union
{
long l;
bytes b[4];
} u;

or

union
{
long l;
struct
{
b1:8;
b2:8;
b3:8;
b4:8;
} b;
} u;

Regards,
Helge
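
The byte-order dependence mentioned above can be observed directly. The following is only a minimal sketch, assuming 8-bit bytes; the names memoryOrderBytes, hostByteOrder and ByteOrder are made up for illustration and do not come from the thread.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the bytes of a value in the order they sit in memory.
std::vector<unsigned char> memoryOrderBytes(unsigned long value)
{
    // Inspecting an object through unsigned char* is permitted.
    unsigned char const* p = reinterpret_cast<unsigned char const*>(&value);
    return std::vector<unsigned char>(p, p + sizeof value);
}

enum ByteOrder { LittleEndian, BigEndian, OtherOrder };

// Hypothetical helper: classify the host by looking at a known pattern.
ByteOrder hostByteOrder()
{
    assert(CHAR_BIT == 8);  // assumption: the probe pattern needs 8-bit bytes
    std::vector<unsigned char> b = memoryOrderBytes(0x01020304UL);
    if (b.front() == 0x04 && b[1] == 0x03) return LittleEndian;
    if (b.back() == 0x04 && b[b.size() - 2] == 0x03) return BigEndian;
    return OtherOrder;
}
```

On a little-endian host memoryOrderBytes(0xAABBCCDDUL) starts with 0xDD; on a big-endian host 0xDD comes last, and where long is wider than 32 bits the leading bytes are zero.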

anon · Jan 21, 2009

Barzo said:
Hi,

I have a value (long for example) and I want to split into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure? Or there is a better method?

What do you mean "secure"???

You could do this:
unsigned long l = 0xAABBCCDD;
unsigned char *bytes = reinterpret_cast< unsigned char* > ( & l );

Maxim Yegorushkin · Jan 21, 2009

I have a value (long for example) and I want to split into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure?

Provided the lifetime of l is the same as or longer than that of bytes.

Or there is a better method?

Your method ignores padding bits (if any, normally none on popular x86
and x86-64 platforms) and byte order. http://en.wikipedia.org/wiki/Endianness

To make it byte order insensitive and ignore any padding bits you
could do:

#include <stdio.h>
#include <limits.h>
#include <stddef.h>

// Store t least significant byte first, whatever the host byte order is.
template<class T>
void asBytes(T t, unsigned char* bytes)
{
    for(size_t i = 0; i != sizeof t; ++i)
        bytes[i] = static_cast<unsigned char>(t >> i * CHAR_BIT & 0xff);
}

// Rebuild the value from bytes written by asBytes.
template<class T>
void fromBytes(unsigned char const* bytes, T* t)
{
    *t = 0;
    // sizeof *t is the size of the value, not of the pointer.
    for(size_t i = 0; i != sizeof *t; ++i)
        *t |= static_cast<T>(bytes[i]) << CHAR_BIT * i;
}

int main()
{
    long l = 0x12345678;
    unsigned char bytes[sizeof l];
    asBytes(l, bytes);
    long m;
    fromBytes(bytes, &m);

    printf("%lx -> %lx\n", l, m);
}

Obviously, the above two functions only work with integer types.

Barzo · Jan 21, 2009

Your method ignores padding bits (if any, normally none on popular x86
and x86-64 platforms) and byte order.

To make it byte order insensitive and ignore any padding bits you
could do:
....
Obviously, the above two functions only work with integer types.

This is exactly what I need.

Thanks Max!
Daniele.

Rolf Magnus · Jan 21, 2009

Helge said:

This works, but keep in mind that this code is platform dependent.
bytes[0] can be 0xAA, 0xDD or probably any other value, depending on the
endianness of the CPU.

Further drawback: You don't have any size information for the bytes array.

The size is equal to sizeof(unsigned long).

Slightly better would be:

union
{
long l;
bytes b[4];
} u;

I don't consider that better. It's abuse of unions, which are not supposed
to be used like that. That's what casts (in this case reinterpret_cast) are
for.
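
An alternative that needs neither a union nor a long-lived pointer is to copy the bytes out with std::memcpy. This is only a sketch: copyBytes and fromCopiedBytes are hypothetical names, the result is still in host byte order, and it assumes T is a plain built-in type such as unsigned long.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy the bytes of `value` into `out`, in host memory order.
template<class T>
void copyBytes(T const& value, unsigned char* out, std::size_t outSize)
{
    assert(outSize >= sizeof(T));  // the byte count is always sizeof(T)
    std::memcpy(out, &value, sizeof(T));
}

// Hypothetical helper: rebuild a value from bytes produced by copyBytes
// on the same platform.
template<class T>
T fromCopiedBytes(unsigned char const* in)
{
    T value;
    std::memcpy(&value, in, sizeof(T));
    return value;
}
```

Because the bytes are copied, the array does not depend on the lifetime of the original variable.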

Erik Wikström · Jan 21, 2009

Helge Kruse said:

This works, but keep in mind that this code is platform dependent.
bytes[0] can be 0xAA, 0xDD or probably any other value, depending on the
endianness of the CPU.

Further drawback: You don't have any size information for the bytes array.
Slightly better would be:

union
{
long l;
bytes b[4];
} u;

And even better would be to not make assumptions about the size of long:

union u
{
long l;
unsigned char b[sizeof(long)];
};

peter koch · Jan 21, 2009

Helge Kruse said:
[snip]

Slightly better would be:

union
{
long l;
bytes b[4];

} u;

or

union
{
long l;
struct
{
b1:8;
b2:8;
b3:8;
b4:8;
} b;

} u;

It is not better, it's worse. Unions can only be legally accessed as
the type they were last written with. It often works the union way,
but reinterpret_cast gives you better guarantees standard-wise.

/Peter

James Kanze · Jan 22, 2009

Actually, whether it works or not depends on the definition of
"works". The problem hasn't been specified enough to say.

This works, but keep in mind that this code is platform
dependent. bytes[0] can be 0xAA, 0xDD or probably any other
value, depending on the endianness of the CPU.

Endianness isn't the only issue; size can also play a role.
I've actually worked on systems where bytes[0] would be 0xBB,
and I'm aware of ones where it would be 0x00 or 0x15 (both still
being sold).

Further drawback: You don't have any size information for the
bytes array. Slightly better would be:

union
{
long l;
bytes b[4];
} u;

That's formally worse, since it results in undefined behavior if
you access b when the last value was written to l. (In
practice, which is better depends on the compiler---some
compilers ignore aliasing which results from a
reinterpret_cast.)

or

union
{
long l;
struct
{
b1:8;
b2:8;
b3:8;
b4:8;
} b;
} u;

That is completely implementation defined. Some compilers will
lay out the first bit fields from LSB to MSB, others from MSB to
LSB. Some will put any padding at the top, others will put it
at the bottom. (I don't know of any which will insert padding
bits in the middle, but I think that would be legal as well.)

Michael DOUBEZ · Jan 22, 2009

Barzo said:
I have a value (long for example) and I want to split into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure? Or there is a better method?

Whether with a cast or a union-cast, you have undefined behavior. Depending
on your architecture/compiler, the results may differ. If those won't
change, check your compiler's documentation and you may be able to use this.

The only way to reliably make this kind of conversion is to establish
your convention (such as where the MSB and LSB should go in your array) and
use an explicit construction:

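#include <cassert>

unsigned l = 0xAABBCCDD;
assert(sizeof(l)==4);
// example in little endian: least significant byte first
unsigned char bytes[4]={
    static_cast<unsigned char>((l>> 0)&0x0FF),
    static_cast<unsigned char>((l>> 8)&0x0FF),
    static_cast<unsigned char>((l>>16)&0x0FF),
    static_cast<unsigned char>((l>>24)&0x0FF)
};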
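
The same explicit construction works for the opposite convention and for the way back. A minimal sketch, assuming a 32-bit value and most significant octet first; toBigEndian4 and fromBigEndian4 are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: store a 32-bit value most significant octet first.
void toBigEndian4(unsigned long value, unsigned char out[4])
{
    assert(value <= 0xFFFFFFFFUL);  // long may be wider than 32 bits
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >>  8) & 0xFF);
    out[3] = static_cast<unsigned char>((value >>  0) & 0xFF);
}

// Hypothetical helper: rebuild the value from the same convention.
unsigned long fromBigEndian4(unsigned char const in[4])
{
    return (static_cast<unsigned long>(in[0]) << 24)
         | (static_cast<unsigned long>(in[1]) << 16)
         | (static_cast<unsigned long>(in[2]) <<  8)
         |  static_cast<unsigned long>(in[3]);
}
```

Since only shifts and masks on the value are involved, the result does not depend on how the host lays out the integer in memory.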

Guest · Jan 22, 2009

James Kanze wrote:

Actually, whether it works or not depends on the definition of
"works". The problem hasn't been specified enough to say.

This works, but keep in mind that this code is platform
dependent. bytes[0] can be 0xAA, 0xDD or probably any other
value, depending on the endianness of the CPU.

Endianness isn't the only issue; size can also play a role.
I've actually worked on systems where bytes[0] would be 0xBB,
and I'm aware of ones where it would be 0x00 or 0x15 (both still
being sold).

Further drawback: You don't have any size information for the
bytes array. Slightly better would be:
union
{
long l;
bytes b[4];
} u;

That's formally worse, since it results in undefined behavior if
you access b when the last value was written to l.

no. If bytes is an alias for unsigned char then it merely
results in implementation defined behaviour. unsigned char
is the exception to the usual "only read what you wrote to a
union". You can access any object as an array of unsigned char.
[caveat: this is true for C at least, I can't see why C++
would change it]

(In practice, which is better depends on the compiler---some
compilers ignore aliasing which results from a
reinterpret_cast.)

That is completely implementation defined.
yes

Some compilers will
lay out the first bit fields from LSB to MSB, others from MSB to
LSB. Some will put any padding at the top, others will put it
at the bottom. (I don't know of any which will insert padding
bits in the middle, but I think that would be legal as well.)

--
Nick Keighley

The different branches of Arithmetic
- Ambition, Distraction, Uglification and Derision.
Alice in Wonderland.

Guest · Jan 22, 2009

This is exactly what I need.

note: the code uses CHAR_BIT which is usually 8.
Do you actually want the integer broken up into chars
or octets (8-bit bytes)?

Triple-DES · Jan 22, 2009

Your method ignores padding bits (if any, normally none on popular x86
and x86-64 platforms) and byte order. http://en.wikipedia.org/wiki/Endianness

To make it byte order insensitive and ignore any padding bits you
could do:

#include <stdio.h>
#include <limits.h>

template<class T>
void asBytes(T t, unsigned char* bytes)
{
for(int i = 0; i != sizeof t; ++i)
bytes[i] = static_cast<unsigned char>(t >> i * CHAR_BIT &
0xff);
}

That CHAR_BIT may make the code look more portable than it is. The
function will have surprising results if CHAR_BIT is anything but 8.

Consider for example a system with CHAR_BIT 32, and sizeof(unsigned
long) == 1. asBytes( 0xAABBCCDDul, bytes)
will end up writing a single byte with the value 0xDD.

Depending on which behaviour you want (native bytes or octets),
consider removing (& 0xff) or changing CHAR_BIT to 8.
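
If octets are what is wanted, the shift width can be fixed at 8 and the loop count derived from the total number of bits. This is only a sketch for unsigned types such as unsigned long; octetCount, asOctets and fromOctets are hypothetical names, not part of the code posted earlier.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: number of octets needed to hold every bit of T.
template<class T>
std::size_t octetCount()
{
    return (sizeof(T) * CHAR_BIT + 7) / 8;
}

// Write t as octets, least significant octet first.
template<class T>
void asOctets(T t, unsigned char* octets)
{
    assert(octets != 0);
    for (std::size_t i = 0; i != octetCount<T>(); ++i) {
        octets[i] = static_cast<unsigned char>(t & 0xff);
        t >>= 8;  // always 8 bits, independent of CHAR_BIT
    }
}

// Rebuild the value, starting from the most significant octet.
template<class T>
T fromOctets(unsigned char const* octets)
{
    assert(octets != 0);
    T t = 0;
    for (std::size_t i = octetCount<T>(); i != 0; --i)
        t = static_cast<T>((t << 8) | octets[i - 1]);
    return t;
}
```

On the hypothetical system with CHAR_BIT 32 and sizeof(unsigned long) == 1, octetCount would be 4 and asOctets(0xAABBCCDDul, octets) would write 0xDD, 0xCC, 0xBB, 0xAA into four separate array elements.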

Maxim Yegorushkin · Jan 22, 2009

Your method ignores padding bits (if any, normally none on popular x86
and x86-64 platforms) and byte order. http://en.wikipedia.org/wiki/Endianness

To make it byte order insensitive and ignore any padding bits you
could do:

#include <stdio.h>
#include <limits.h>

template<class T>
void asBytes(T t, unsigned char* bytes)
{
for(int i = 0; i != sizeof t; ++i)
bytes[i] = static_cast<unsigned char>(t >> i * CHAR_BIT &
0xff);
}

That CHAR_BIT may make the code look more portable than it is. The
function will have surprising results if CHAR_BIT is anything but 8.

Consider for example a system with CHAR_BIT 32, and sizeof(unsigned
long) == 1. asBytes( 0xAABBCCDDul, bytes)
will end up writing a single byte with the value 0xDD.

Depending on which behaviour you want (native bytes or octets),
consider removing (& 0xff) or changing CHAR_BIT to 8.

Totally agree. Thanks.

Maxim Yegorushkin · Jan 22, 2009

note: the code uses CHAR_BIT which is usually 8.
Do you actually want the integer broken up into chars
or octets (8-bit bytes)?

Octets please!

James Kanze · Jan 22, 2009

On 22 Jan, 09:14, James Kanze wrote:

[...]

Further drawback: You don't have any size information for the
bytes array. Slightly better would be:
union
{
long l;
bytes b[4];
} u;

That's formally worse, since it results in undefined
behavior if you access b when the last value was written to
l.

no. If bytes is an alias for unsigned char then it merely
results in implementation defined behaviour. unsigned char
is the exception to the usual "only read what you wrote to a
union".

Where does it say that? All I see is that "only one of the
members can be active at any time". Formally, I think that this
still remains undefined. Taking the address of l, casting it
to an unsigned char*, and accessing through that is legal, but
rather defeats the purpose of using a union.

In practice, of course, it will depend on the compiler. (Also,
I don't think the results are even "implementation defined".
More along the lines of "unspecified", or "very poorly
specified". But normally: "what the hardware gives you", which
may or may not be what is wanted.)

You can access any object as an array of unsigned char.
[caveat: this is true for C at least, I can't see why C++
would change it]

C++ even extended it to encompass accessing it as an array of
char (which more or less means that plain char must be unsigned
if the machine is not 2's complement). But even something like:

union
{
char c1 ;
char c2 ;
} u ;
u.c1 = 'a' ;
putchar( u.c2 ) ;

is undefined behavior---the standard allows the implementation
to somehow maintain the information as to what the last stored
value was (except in some very special cases), and core dump if
you access via any other member.

(That was, at least, the consensus in the C committee, back in
the late 1980's. At least among some of the committee members.)

Jean-Marc Bourguet · Jan 22, 2009

James Kanze said:
C++ even extended it to encompass accessing it as an array of
char (which more or less means that plain char must be unsigned
if the machine is not 2's complement). But even something like:

union
{
char c1 ;
char c2 ;
} u ;
u.c1 = 'a' ;
putchar( u.c2 ) ;

is undefined behavior---the standard allows the implementation
to somehow maintain the information as to what the last stored
value was (except in some very special cases), and core dump if
you access via any other member.

(That was, at least, the consensus in the C committee, back in
the late 1980's. At least among some of the committee members.)

The funny thing is that

struct s1 { char c; };
struct s2 { char c; };

union {
s1 m1;
s2 m2;
} u;

u.m1.c = 'a';
putchar(u.m2.c);

is conformant...

Yours,

Stefan Ram · Jan 22, 2009

Pete Becker said:
It ran into essential complexity driven by hardware. Java, on the other
hand, imposes specific size requirements, with the result that some
operations are extremely slow. For example, requiring every operation
on a double to use exactly 64 bits meant that the runtime support on
Intel hardware had to set the math processor to 64-bit mode rather than
its native 80-bit mode. This made it unusable for serious number
crunchers.

»In older JVMs, floating-point calculations were always
strict floating-point, meaning all values used during
floating-point calculations are made in the IEEE-standard
float or double sizes. This could sometimes result in a
numeric overflow or underflow in the middle of a
calculation, even if the end result would be a valid
number. Since version 1.2 of the JVM, floating-point
calculations do not require that all numbers used in
computations are themselves limited to the standard float
or double precision.

However, for some applications, a programmer might require
every platform to have precisely the same floating-point
behavior, even if some platforms could handle more
precision. In that case, the programmer can use the
modifier strictfp to ensure that calculations are
performed as in the earlier versions—only with floats and
doubles.«

http://en.wikipedia.org/wiki/Strictfp
