Split a numeric value into bytes (char)


Barzo

Hi,

I have a value (a long, for example) and I want to split it into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure? Or is there a better method?

Tnx.
Daniele.
 

Helge Kruse


This works, but keep in mind that this code is platform dependent.
bytes[0] can be 0xAA, 0xDD or probably any other value, depending on the
endianness of the CPU.
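To find out which case applies on a given machine, a minimal sketch (assuming C++11 and that std::uint32_t exists, which implies 8-bit bytes and no padding bits) can look at the lowest-addressed byte of a known value through an unsigned char pointer. The enum ByteOrder and the function nativeByteOrder are hypothetical names, and the check only distinguishes the two common orders.

```cpp
#include <cassert>
#include <cstdint>

enum class ByteOrder { Little, Big, Other };

// Hypothetical helper: report the native byte order by inspecting the
// lowest-addressed byte of a known 32-bit value.
ByteOrder nativeByteOrder()
{
    const std::uint32_t probe = 0xAABBCCDDu;
    // Reading any object through unsigned char* is allowed by the aliasing rules.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&probe);
    if (p[0] == 0xDD) return ByteOrder::Little;  // least significant byte first
    if (p[0] == 0xAA) return ByteOrder::Big;     // most significant byte first
    return ByteOrder::Other;                     // some other (mixed) order
}
```

On little-endian x86 and x86-64 this reports ByteOrder::Little, which is why bytes[0] in the original snippet is 0xDD there.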

Further drawback: You don't have any size information for the bytes array.
Slightly better would be:

union
{
long l;
bytes b[4];
} u;

or

union
{
long l;
struct
{
unsigned b1:8;
unsigned b2:8;
unsigned b3:8;
unsigned b4:8;
} b;
} u;



Regards,
Helge
 

anon


What do you mean "secure"???

You could do this:
unsigned long l = 0xAABBCCDD;
unsigned char *bytes = reinterpret_cast< unsigned char* > ( & l );
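A small sketch of the reinterpret_cast form that also deals with the missing size information: it walks exactly sizeof(obj) bytes and renders them in memory order. The name hexDump is hypothetical; it assumes CHAR_BIT == 8 (two hex digits per byte) and a trivially copyable type.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>

static_assert(CHAR_BIT == 8, "two hex digits per byte assumes 8-bit bytes");

// Hypothetical helper: the object representation of obj as lowercase hex,
// lowest address first, so the result depends on the platform's byte order.
template <class T>
std::string hexDump(const T& obj)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::string out;
    for (std::size_t i = 0; i != sizeof obj; ++i) {
        out += digits[p[i] >> 4];    // high nibble
        out += digits[p[i] & 0x0F];  // low nibble
    }
    return out;
}
```

On a little-endian machine, hexDump(std::uint32_t{0xAABBCCDD}) gives "ddccbbaa"; on a big-endian one, "aabbccdd".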
 

Maxim Yegorushkin

I have a value (long for example) and I want to split into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure?

Provided that l's lifetime is the same as or longer than that of bytes.
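If the bytes have to outlive l, one option is to copy them out instead of keeping a pointer into l. A minimal sketch assuming C++11; copyBytes and fromCopiedBytes are hypothetical names, and the bytes are still in the platform's native order.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical helper: copy the object representation of l into an array
// that carries its own size and has no lifetime tie to l.
std::array<unsigned char, sizeof(unsigned long)> copyBytes(unsigned long l)
{
    std::array<unsigned char, sizeof(unsigned long)> out;
    std::memcpy(out.data(), &l, sizeof l);
    return out;
}

// Inverse: rebuild the value from bytes produced on the same platform.
unsigned long fromCopiedBytes(const std::array<unsigned char, sizeof(unsigned long)>& b)
{
    unsigned long l;
    std::memcpy(&l, b.data(), sizeof l);
    return l;
}
```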
Or there is a better method?

Your method ignores padding bits (if any, normally none on popular x86
and x86-64 platforms) and byte order. http://en.wikipedia.org/wiki/Endianness

To make it byte order insensitive and ignore any padding bits you
could do:

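#include <stdio.h>
#include <limits.h>

// Stores t least significant byte first; integer types only.
// (Shifting negative signed values is not portable, so prefer unsigned T.)
template<class T>
void asBytes(T t, unsigned char* bytes)
{
    for(int i = 0; i != sizeof t; ++i)
        bytes[i] = static_cast<unsigned char>(t >> i * CHAR_BIT & 0xff);
}

// Rebuilds *t from bytes stored least significant byte first.
template<class T>
void fromBytes(unsigned char const* bytes, T* t)
{
    *t = 0;
    for(int i = 0; i != sizeof *t; ++i)  // size of the value, not of the pointer
        *t |= static_cast<T>(bytes[i]) << CHAR_BIT * i;  // widen before shifting
}

int main()
{
    long l = 0x12345678;
    unsigned char bytes[sizeof l];
    asBytes(l, bytes);
    long m;
    fromBytes(bytes, &m);

    printf("%lx -> %lx\n", l, m);
}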

Obviously, the above two functions only work with integer types.
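One way to make the integer-only restriction explicit, sketched under the assumption of C++11 or later: reject non-integral types at compile time and do all shifting on the unsigned counterpart of T, so a negative value never meets a signed shift. toBytesChecked and fromBytesChecked are hypothetical names; bytes are still least significant first.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

template <class T>
void toBytesChecked(T t, unsigned char* bytes)
{
    static_assert(std::is_integral<T>::value, "integer types only");
    typedef typename std::make_unsigned<T>::type U;
    const U u = static_cast<U>(t);  // modular conversion, always well-defined
    for (std::size_t i = 0; i != sizeof u; ++i)
        bytes[i] = static_cast<unsigned char>(u >> (i * CHAR_BIT));
}

template <class T>
void fromBytesChecked(const unsigned char* bytes, T* t)
{
    static_assert(std::is_integral<T>::value, "integer types only");
    typedef typename std::make_unsigned<T>::type U;
    U u = 0;
    for (std::size_t i = 0; i != sizeof u; ++i)
        u |= static_cast<U>(static_cast<U>(bytes[i]) << (i * CHAR_BIT));
    // Converting an out-of-range unsigned value back to a signed T is
    // implementation-defined before C++20 (two's complement wrap in practice).
    *t = static_cast<T>(u);
}
```

With this, a call such as toBytesChecked(1.5, buf) fails to compile instead of silently misbehaving.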
 

Barzo

Your method ignores padding bits (if any, normally none on popular x86
and x86-64 platforms) and byte order.

To make it byte order insensitive and ignore any padding bits you
could do:
....
Obviously, the above two functions only work with integer types.

This is exactly what I need.

Thanks Max!
Daniele.
 

Rolf Magnus

Helge said:

This works, but keep in mind that this code is platform dependent.
bytes[0] can be 0xAA, 0xDD or probably any other value, depending on the
endianness of the CPU.

Further drawback: You don't have any size information for the bytes array.

The size is equal to sizeof(unsigned long).
Slightly better would be:

union
{
long l;
bytes b[4];
} u;

I don't consider that better. It's abuse of unions, which are not supposed
to be used like that. That's what casts (in this case reinterpret_cast) are
for.
 

Erik Wikström

Barzo said:
Hi,

I have a value (long for example) and I want to split into an array of
char.

Doing so:

unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;

works, but is it secure? Or there is a better method?

Tnx.
Daniele.

This works, but keep in mind that this code is platform dependent.
bytes[0] can be 0xAA, 0xDD or probably any other value, depending on the
endianness of the CPU.

Further drawback: You don't have any size information for the bytes array.
Slightly better would be:

union
{
long l;
bytes b[4];
} u;

And even better would be to not make assumptions about the size of long:

union u
{
long l;
unsigned char b[sizeof(long)];
};
 

peter koch

Barzo said:
I have a value (long for example) and I want to split into an array of
char.
Doing so:
unsigned long l = 0xAABBCCDD;
unsigned char *bytes = (unsigned char*)& l;
works, but is it secure? Or there is a better method?
[snip]

Slightly better would be:

union
{
        long l;
        bytes b[4];

} u;

or

union
{
        long l;
        struct
        {
                unsigned b1:8;
                unsigned b2:8;
                unsigned b3:8;
                unsigned b4:8;
        } b;

} u;

It is not better, it's worse. Unions can only be legally accessed as
the type they were last written with. It often works the union way,
but reinterpret_cast gives you better guarantees standardwise.

/Peter
 

James Kanze

Actually, whether it works or not depends on the definition of
"works". The problem hasn't been specified enough to say.
This works, but keep in mind that this code is platform
dependent. bytes[0] can be 0xAA, 0xDD or probably any other
value, depending on the endianness of the CPU.

Endianness isn't the only issue; size can also play a role.
I've actually worked on systems where bytes[0] would be 0xBB,
and I'm aware of ones where it would be 0x00 or 0x15 (both still
being sold).
Further drawback: You don't have any size information for the
bytes array. Slightly better would be:
union
{
long l;
bytes b[4];
} u;

That's formally worse, since it results in undefined behavior if
you access b when the last value was written to l. (In
practice, which is better depends on the compiler---some
compilers ignore aliasing which results from a
reinterpret_cast.)
union
{
long l;
struct
{
unsigned b1:8;
unsigned b2:8;
unsigned b3:8;
unsigned b4:8;
} b;
} u;

That is completely implementation defined. Some compilers will
lay out the first bit fields from LSB to MSB, others from MSB to
LSB. Some will put any padding at the top, others will put it
at the bottom. (I don't know of any which will insert padding
bits in the middle, but I think that would be legal as well.)
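Since the layout is implementation-defined, the practical move is to check what a particular compiler does. A minimal probe, assuming each bit-field is declared with an explicit unsigned type: zero the object, set b1 to a marker value, copy the object representation out with std::memcpy, and find which byte holds the marker. The struct name Fields and the function byteIndexOfB1 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Fields {
    unsigned b1 : 8;
    unsigned b2 : 8;
    unsigned b3 : 8;
    unsigned b4 : 8;
};

// Returns the offset of the byte holding b1, or -1 if no single byte holds
// exactly the marker. The answer is implementation-defined by design.
int byteIndexOfB1()
{
    Fields f;
    std::memset(&f, 0, sizeof f);  // zero everything, including any padding
    f.b1 = 0xAA;                   // marker value
    unsigned char raw[sizeof f];
    std::memcpy(raw, &f, sizeof f);
    for (std::size_t i = 0; i != sizeof f; ++i)
        if (raw[i] == 0xAA)
            return static_cast<int>(i);
    return -1;
}
```

Running the same probe on each target compiler shows whether the bit-field union would give the byte order you expect there.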
 

Michael DOUBEZ


Whether with cast or union-cast, you have undefined behavior. Depending
on your architecture/compiler, the results may differ. If those won't
change, check your compiler's documentation and you may be able to use this.

The only way to reliably make this kind of conversion is to establish
your convention (such as where the MSB and LSB should go in your array) and
use an explicit construction:

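#include <cassert>

unsigned l = 0xAABBCCDD;
assert(sizeof(l)==4);
// example in little endian: bytes[0] gets the least significant byte
unsigned char bytes[4]={
    static_cast<unsigned char>((l>> 0)&0x0FF),
    static_cast<unsigned char>((l>> 8)&0x0FF),
    static_cast<unsigned char>((l>>16)&0x0FF),
    static_cast<unsigned char>((l>>24)&0x0FF)
};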
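The same explicit approach works for whichever convention you pick. As an illustration, here is a sketch (assuming C++11's std::uint32_t, so no size assert is needed) of the opposite choice, most significant byte first, which is the order commonly used for network protocols, together with the decoding step. putBigEndian32 and getBigEndian32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store v most significant byte first.
void putBigEndian32(std::uint32_t v, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((v >>  8) & 0xFF);
    out[3] = static_cast<unsigned char>( v        & 0xFF);
}

// Inverse: the result does not depend on the host's own byte order.
std::uint32_t getBigEndian32(const unsigned char in[4])
{
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) <<  8) |
            static_cast<std::uint32_t>(in[3]);
}
```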
 

Guest

Barzo wrote:

Actually, whether it works or not depends on the definition of
"works".  The problem hasn't been specified enough to say.
This works, but keep in mind that this code is platform
dependent. bytes[0] can be 0xAA, 0xDD or probably any other
value, depending on the endianness of the CPU.

Endianness isn't the only issue; size can also play a role.
I've actually worked on systems where bytes[0] would be 0xBB,
and I'm aware of ones where it would be 0x00 or 0x15 (both still
being sold).
Further drawback: You don't have any size information for the
bytes array. Slightly better would be:
union
{
        long l;
        bytes b[4];
} u;

That's formally worse, since it results in undefined behavior if
you access b when the last value was written to l.

No. If bytes is an alias for unsigned char then it merely
results in implementation-defined behaviour. unsigned char
is the exception to the usual "only read what you wrote to a
union". You can access any object as an array of unsigned char.
[caveat: this is true for C at least; I can't see why C++
would change it]


(In practice, which is better depends on the compiler---some
compilers ignore aliasing which results from a
reinterpret_cast.)


That is completely implementation defined.  
yes


Some compilers will
lay out the first bit fields from LSB to MSB, others from MSB to
LSB.  Some will put any padding at the top, others will put it
at the bottom.  (I don't know of any which will insert padding
bits in the middle, but I think that would be legal as well.)

--
Nick Keighley

The different branches of Arithmetic
- Ambition, Distraction, Uglification and Derision.
Alice in Wonderland.
 

Triple-DES

Your method ignores padding bits (if any, normally none on popular x86
and x86-64 platforms) and byte order. http://en.wikipedia.org/wiki/Endianness

To make it byte order insensitive and ignore any padding bits you
could do:

    #include <stdio.h>
    #include <limits.h>

    template<class T>
    void asBytes(T t, unsigned char* bytes)
    {
        for(int i = 0; i != sizeof t; ++i)
            bytes[i] = static_cast<unsigned char>(t >> i * CHAR_BIT & 0xff);
    }



That CHAR_BIT may make the code look more portable than it is. The
function will have surprising results if CHAR_BIT is anything but 8.

Consider, for example, a system with CHAR_BIT == 32 and sizeof(unsigned
long) == 1. asBytes(0xAABBCCDDul, bytes)
will end up writing a single byte with the value 0xDD.

Depending on which behaviour you want (native bytes or octets),
consider removing (& 0xff) or changing CHAR_BIT to 8.
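Taking the "octets" route, a sketch that always emits 8-bit octets, least significant first, whatever CHAR_BIT is; here the & 0xFF mask is exactly what selects each octet. It assumes an unsigned integer type whose width is a multiple of 8, and the name toOctets is hypothetical. On the CHAR_BIT == 32 machine described above it would write four octets instead of one 0xDD byte.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: split an unsigned value into 8-bit octets, least
// significant first. Returns the number of octets written to out.
template <class T>
std::size_t toOctets(T value, unsigned char* out)
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    const std::size_t octets = sizeof(T) * CHAR_BIT / 8;
    for (std::size_t i = 0; i != octets; ++i)
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    return octets;
}
```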
 

Maxim Yegorushkin



Totally agree. Thanks.
 

James Kanze

On 22 Jan, 09:14, James Kanze wrote:

[...]
Further drawback: You don't have any size information for the
bytes array. Slightly better would be:
union
{
long l;
bytes b[4];
} u;
That's formally worse, since it results in undefined
behavior if you access b when the last value was written to
l.
No. If bytes is an alias for unsigned char then it merely
results in implementation-defined behaviour. unsigned char
is the exception to the usual "only read what you wrote to a
union".

Where does it say that? All I see is that "only one of the
members can be active at any time". Formally, I think that this
still remains undefined. Taking the address of l, casting it
to an unsigned char*, and accessing through that is legal, but
rather defeats the purpose of using a union.

In practice, of course, it will depend on the compiler. (Also,
I don't think the results are even "implementation defined".
More along the lines of "unspecified", or "very poorly
specified". But normally: "what the hardware gives you", which
may or may not be what is wanted.)
You can access any object as an array of unsigned char.
[caveat: this is true for C at least; I can't see why C++
would change it]

C++ even extended it to encompass accessing it as an array of
char (which more or less means that plain char must be unsigned
if the machine is not 2's complement). But even something like:

union
{
char c1 ;
char c2 ;
} u ;
u.c1 = 'a' ;
putchar( u.c2 ) ;

is undefined behavior---the standard allows the implementation
to somehow maintain the information as to what the last stored
value was (except in some very special cases), and core dump if
you access via any other member.

(That was, at least, the consensus in the C committee, back in
the late 1980's. At least among some of the committee members.)
 

Jean-Marc Bourguet

James Kanze said:
C++ even extended it to encompass accessing it as an array of
char (which more or less means that plain char must be unsigned
if the machine is not 2's complement). But even something like:

union
{
char c1 ;
char c2 ;
} u ;
u.c1 = 'a' ;
putchar( u.c2 ) ;

is undefined behavior---the standard allows the implementation
to somehow maintain the information as to what the last stored
value was (except in some very special cases), and core dump if
you access via any other member.

(That was, at least, the consensus in the C committee, back in
the late 1980's. At least among some of the committee members.)

The funny thing is that

struct s1 { char c; };
struct s2 { char c; };

union {
s1 m1;
s2 m2;
} u;

u.m1.c = 'a';
putchar(u.m2.c);

is conformant...

Yours,
 

Stefan Ram

Pete Becker said:
It ran into essential complexity driven by hardware. Java, on the other
hand, imposes specific size requirements, with the result that some
operations are extremely slow. For example, requiring every operation
on a double to use exactly 64 bits meant that the runtime support on
Intel hardware had to set the math processor to 64-bit mode rather than
its native 80-bit mode. This made it unusable for serious number
crunchers.

»In older JVMs, floating-point calculations were always
strict floating-point, meaning all values used during
floating-point calculations are made in the IEEE-standard
float or double sizes. This could sometimes result in a
numeric overflow or underflow in the middle of a
calculation, even if the end result would be a valid
number. Since version 1.2 of the JVM, floating-point
calculations do not require that all numbers used in
computations are themselves limited to the standard float
or double precision.

However, for some applications, a programmer might require
every platform to have precisely the same floating-point
behavior, even if some platforms could handle more
precision. In that case, the programmer can use the
modifier strictfp to ensure that calculations are
performed as in the earlier versions—only with floats and
doubles.«

http://en.wikipedia.org/wiki/Strictfp
 
