Int to char[4]

Grizlyk

The union version doesn't work because the standard only
allows you to inspect the common initial sequence of two structs.
What does it mean?

union X
{
int i;
char c[sizeof(int)];
};

X tmp;

1. "tmp" is aligned for "X::i" ("X::i" has offset zero)
2. "X::i" and "X::c" start at the same memory address
3. type "int" has "sizeof(int)" chars placed without holes - one by one

Why not?
 
Grizlyk

Kai-Uwe Bux said:
Jim said:
// Method 1
int p1 = 1234;
char n1[4];
*(reinterpret_cast<int*>( n1 )) = p1;

This one is interesting. Are you sure n1 satisfies the alignment
requirements for int?

I think you are right: we can get the wrong alignment here. It is better to
do it like this:

int p1 = 1234;

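// use an int to force the alignment and size of the buffer
int p2 = 0;
char *n1=reinterpret_cast<char*>(&p2);

*(reinterpret_cast<int*>( n1 )) = p1;
n1[0] = n1[1] = n1[2] = n1[3] = 0; // n1[0..3] are valid (assumes sizeof(int) == 4)

// buf is not guaranteed to be aligned for int
char buf[4];
n1=buf;

// writing an int through n1 can now be an error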
(And I wonder what happens if it is not.)
Memory access violation can occur on some targets.
 
Rolf Magnus

Jerry said:
[ ... ]
I agree. However, I'd still not go through the union. For one, the cast
is simpler, and it's the tool that was made for the job. Unions are not
meant to be used like that. I mean, you may be able to use a brick to
hammer a nail into a wall, but a hammer just seems to be a more "natural"
choice.

In this case, the correct metaphor doesn't seem to be comparing a hammer
to a brick, but comparing a chunk of reddish granite to a chunk of grey
granite, both roughly the same size and shape.

If he'd used a reinterpret_cast instead of a C-style cast, you _might_
have a point, but even then (IMO) we're just talking about a
specifically designated chunk of granite, still not really a hammer.

Ok, maybe the metaphor wasn't that good, so let's not discuss if a
reinterpret_cast is a hammer or a chunk of granite. What matters to me is
that reinterpret_cast is the tool actually designed for the task, and a
union is not. When having a choice, I stick with using the language as it
was intended, so I'd always choose the reinterpret_cast.
 
Old Wolf

Jim said:
// Method 1
int p1 = 1234;
char n1[4];
*(reinterpret_cast<int*>( n1 )) = p1;
[snip]
No, there is no problem with alignment in the above code.

Actually there is. The char array might not be correctly aligned
for an int.
However unsightly this code may be, it will not blow up when run. Because
the four chars of storage for the int were allocated by a single array
object (n1), the storage will be aligned according to the most stringent
alignment requirements of any four-byte sized type (including a four-byte
int type).

Completely untrue.
In fact a C++ program can always be certain that as long as a character
array is equal to (or larger than) the sizeof() a POD type, then the program
will be able to place an object of that type into that character array
safely.

You can place the object there by memcpy'ing it. But you may not
be able to create a pointer to that object type, and point it to the
char array.
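
The memcpy route described above could look roughly like the following. This is only a sketch: the names int_to_chars and chars_to_int are made up for illustration and do not come from anyone's post. The point is that no int* ever points into the char array, so the array's alignment never matters.

```cpp
#include <cstring>   // std::memcpy

// Copy the object representation of an int into a char buffer.
// Only char-wise copying touches the buffer, so it needs no int alignment.
void int_to_chars(int value, char (&out)[sizeof(int)])
{
    std::memcpy(out, &value, sizeof value);
}

// Copy the bytes back into a real int object before using them as an int.
int chars_to_int(const char (&in)[sizeof(int)])
{
    int value;
    std::memcpy(&value, in, sizeof value);
    return value;
}
```

The byte order inside the buffer is whatever the platform uses for int, so the buffer is only meaningful on the same kind of machine.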
 
Jerry Coffin

[ ... ]
I don't have a copy of the standard at home, but I don't believe
there's anything undefined or otherwise unsafe about this.

Sorry, but on that point you're simply mistaken.
The layout
of the int is of course not specified by the standard, but any object
can be examined as a sequence of bytes safely.

In C99 you'd be right -- and someday, C++ might follow suit -- but as
far as I can see, they're both undefined behavior at least for now.

[ ... ]
So? That's no different from what you have, except that the cast to one
of the char* types is specified to be safe. It avoids a lot of
unnecessary business in the union. If all you want to do is examine the
byte layout of an object, casting to unsigned char* is the way to go
(don't make my error that Ron pointed out, of course).

See above -- in C99, it's true that the cast to pointer to a char type
gives defined (if unspecified) results. I don't see such requirement in
the C++ standard though.
 
Jerry Coffin

The union version doesn't work because the standard only
allows you to inspect the common initial sequence of two structs.
What does it mean?

union X
{
int i;
char c[sizeof(int)];
};

X tmp;

1. "tmp" is aligned for "X::i" ("X::i" has offset zero)
2. "X::i" and "X::c" start at the same memory address
3. type "int" has "sizeof(int)" chars placed without holes - one by one

Why not?

C89 specifically required that only the most recently written member of
a union could be read, and violating this resulted in
implementation-defined behavior ("With one exception, if a member of a
union object is accessed after a value has been stored in a different
member of the object, the behavior is implementation-defined." §6.3.2.3).
Interpreted literally, that means the following code gives
implementation-defined behavior:

union X {
int i;
float j;
};

int main() {
union X x;
int a;

x.i = 1;
x.j = 1.0;
x.i = 0;
a = x.i;
return 0;
}

Even though x.i was the most recently stored member when x.i is
accessed, the access to x.i does take place "after a value has been
stored in a different member of the object."

In both C++ and C99, this (explicit) requirement seems to have
disappeared (though both still contain language about a "special" rule
dispensing with the requirement on the common initial sequence, sort of
implying that the disappearance of the rule may not have been entirely
intentional). It's open to argument that the undefined behavior still
exists, simply because neither explicitly defines what happens when you
read from a different member than was last written.

OTOH, the standard explicitly requires that the storage for the objects
in the union overlap, and that the union be aligned so that a pointer to
the beginning of the union can be used to dereference any member (and
vice versa) -- and this is true in both C and C++. So the alignment is
guaranteed to work, but the type-pun (arguably) might not.
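
The alignment and overlap guarantees can be spelled out as checks. The sketch below assumes a C++11 compiler (alignof and static_assert are newer than this discussion), and the helper name members_share_address is invented for illustration. It only compares addresses; it never reads a member other than the one last written, so it stays clear of the type-pun question.

```cpp
union X
{
    int  i;
    char c[sizeof(int)];
};

// The union must be at least as strictly aligned and as large as its int member.
static_assert(alignof(X) >= alignof(int), "X must be aligned for its int member");
static_assert(sizeof(X) >= sizeof(int), "X must be large enough for its int member");

// True if both members start at the address of the union object itself.
bool members_share_address(const X& x)
{
    const void* whole = &x;
    return whole == static_cast<const void*>(&x.i)
        && whole == static_cast<const void*>(x.c);
}
```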

A reinterpret_cast (even if it looks like a C-style cast) usually has a
problem with alignment: even though everybody "knows" that char has no
alignment requirements, the standard doesn't seem to directly guarantee
it (then again, the required similarity between pointer to char and
pointer to void could be interpreted as such). If you put the int into
dynamically allocated memory, it guarantees that its first byte is
aligned to be accessed as a char, but the remainder still might not be.

The shift and mask method works for essentially any data, but it's
clumsy (at best) to make it entirely portable. You need to convert the
int to unsigned before you do right shifting, and you need to use
CHAR_BIT to figure out how many bits there are in a byte, and use that
as the basis for your mask, etc. Even with all that, you have to live
with the fact that the int could contain some padding bits, so you could
have some number of bits in the byte-by-byte representation that are
zero for all possible inputs.
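
For what it's worth, a shift-and-mask version along those lines might look like the sketch below. The function name int_to_bytes and the choice of least-significant-byte-first order are assumptions made for the example, not something from the original code.

```cpp
#include <climits>   // CHAR_BIT, UCHAR_MAX
#include <cstddef>   // std::size_t

// Store the value bits of v into out, least significant byte first.
void int_to_bytes(int v, unsigned char (&out)[sizeof(int)])
{
    unsigned int u = static_cast<unsigned int>(v);  // only shift unsigned values
    for (std::size_t i = 0; i < sizeof(int); ++i) {
        out[i] = static_cast<unsigned char>(u & UCHAR_MAX);
        // Shift by CHAR_BIT in two steps: a single shift by the full width
        // of unsigned int would be undefined if sizeof(int) == 1.
        u >>= CHAR_BIT / 2;
        u >>= CHAR_BIT - CHAR_BIT / 2;
    }
}
```

If int has padding bits, the high-order entries of out are simply zero for every input, which is the effect described above.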

AFAIK, the lack of portability of either the cast or the union method is
purely theoretical. None of the methods is what I'd call beautiful by
any means, though the (portable) version of the shift/mask method is
undoubtedly the longest, probably the ugliest, and the most likely to
involve extra instructions. Between the cast and the union, it's close
to a toss-up: neither guarantees portability (in C++; the cast is
semi-portable in C99), but both are for all practical purposes. Both strike
me as ugly, though I think the cast is somewhat more so. The cast by
itself is fairly ugly, but when you add in the requirement to take the
address of the int, then cast that to pointer to char, then dereference
the resulting pointer, the whole is really pretty hideous (and the fact
that it's basically the only way to use a reinterpret_cast doesn't make
it any less hideous, IMO).

If possible, the real answer is to avoid all of the above, and simply
find an entirely different way to solve the problem.
 
Rolf Magnus

Jerry said:
[ ... ]
I don't have a copy of the standard at home, but I don't believe
there's anything undefined or otherwise unsafe about this.

Sorry, but on that point you're simply mistaken.
The layout
of the int is of course not specified by the standard, but any object
can be examined as a sequence of bytes safely.

In C99 you'd be right -- and someday, C++ might follow suit -- but as
far as I can see, they're both undefined behavior at least for now.

Quote from the standard (§3.9):

"For any complete POD object type T, whether or not the object holds a valid
value of type T, the underlying bytes (1.7) making up the object can be
copied into an array of char or unsigned char."

and:

"The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T)."
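
Reading the object representation in place, as opposed to copying it out, might be sketched as follows. The helper name bytes_of is invented for the example. Whether the quoted text alone guarantees this is exactly what is disputed in this thread, so treat it as an illustration of the cast being discussed rather than as a ruling on the standard.

```cpp
#include <vector>

// Return a copy of the bytes that make up value, read through unsigned char*.
std::vector<unsigned char> bytes_of(const int& value)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(p, p + sizeof value);
}
```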
 
Default User

Jerry Coffin wrote:

A reinterpret_cast (even if it looks like a C-style cast) usually has
a problem with alignment: even though everybody "knows" that char has
no alignment requirements, the standard doesn't seem to directly
guarantee it (then again, the required similarity between pointer to
char and pointer to void could be interpreted as such). If you put
the int into dynamically allocated memory, it guarantees that its
first byte is aligned to be accessed as a char, but the remainder
still might not be.

You are simply not correct about this. The standard does guarantee that
POD types can be accessed as byte buffers. Not only that, if there were
some sort of alignment concern, your union would have the same problem.





Brian
 
Jerry Coffin

[ ... ]
Quote from the standard (§3.9):

"For any complete POD object type T, whether or not the object holds a valid
value of type T, the underlying bytes (1.7) making up the object can be
copied into an array of char or unsigned char."

Copied into an array of char is one thing -- but using the cast is NOT
copying it anywhere, it's leaving it where it is, and attempting to
_treat_ it as an array of char.
and:

"The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T)."

If this has any relevance, I'm missing it.
 
Jerry Coffin

Jerry Coffin wrote:



You are simply not correct about this. The standard does guarantee that
POD types can be accessed as byte buffers.

Where does it guarantee that? The closest I see is at 3.9/2, and that
seems to fall a bit short of what you're claiming.
Not only that, if there were
some sort of alignment concern, your union would have the same problem.

No, it would not. The standard specifically requires that a union be
aligned so that each member is accessible and the addresses of all those
members are equal to each other and to the address of the union.
 
Rolf Magnus

Jerry said:
[ ... ]
Quote from the standard (§3.9):

"For any complete POD object type T, whether or not the object holds a
valid value of type T, the underlying bytes (1.7) making up the object
can be copied into an array of char or unsigned char."

Copied into an array of char is one thing -- but using the cast is NOT
copying it anywhere, it's leaving it where it is, and attempting to
_treat_ it as an array of char.

How do you copy it to an array of char without treating it as one?
If this has any relevance, I'm missing it.

It says that the object representation is an array of unsigned char. If it
is one, you can treat it as one.
 
Jerry Coffin

[ ... ]
How do you copy it to an array of char without treating it as one?

char *dest = new malloc(sizeof(int));

*(int *)dest = src_int;

for (size_t i=0; i<sizeof(int); i++)
    use(dest[i]);

Here you're copying it as an int. malloc returns storage aligned for any
possible type, so we've eliminated that problem. We copy the int as an
int, into an array of char, so we're doing the copying without treating
it as an array of char. Then, only after we've transferred the bits to
the array of char, do we look at the char's for what they really are.

I'll repeat though: I think this is all mostly nonsense. The mere fact
that the standard doesn't quite guarantee what most people think it does
is no real reason to get stupid about writing more complex code than
necessary for the portability you actually need. Even if you're sure I'm
wrong about what the standard requires, it's not going to change much:
either the compilers you care about will accept your code, or else they
won't. While it's nice to think of the standard as an absolute, we all
know that in real life, it's little more than a general guideline. We
all know most compilers don't conform even with requirements we can all
agree are present.
 
Rolf Magnus

Jerry said:
In both C++ and C99, this (explicit) requirement seems to have
disappeared (though both still contain language about a "special" rule
dispensing with the requirement on the common initial sequence, sort of
implying that the disappearance of the rule may not have been entirely
intentional). It's open to argument that the undefined behavior still
exists, simply because neither explicitly defines what happens when you
read from a different member than was last written.

It still says that only the value of one of the members can be stored in it
at any time. I don't see any rule (other than the exception of the common
initial sequence) that says you can read any other member than the one
currently stored.
A reinterpret_cast (even if it looks like a C-style cast) usually has a
problem with alignment: even though everybody "knows" that char has no
alignment requirements, the standard doesn't seem to directly guarantee
it (then again, the required similarity between pointer to char and
pointer to void could be interpreted as such).

It is guaranteed, simply due to the fact that sizeof(char) is always 1. If
char had any special alignment requirements, you couldn't create arrays of
char, and that would violate the standard.
AFAIK, the lack of portability of either the cast or the union method is
purely theoretical. None of the methods is what I'd call beautiful by
any means, though the (portable) version of the shift/mask method is
undoubtedly the longest, probably the ugliest, and the most likely to
involve extra instructions.

IMHO, the best way to deal with it is to put the required conversions into
their own functions and adapt those to target platforms that need that.
There is no really portable way of doing binary I/O (which I assume to be
the actual reason for the cast).
 
Victor Bazarov

Jerry said:
[ ... ]
How do you copy it to an array of char without treating it as one?

char *dest = new malloc(sizeof(int));

*(int *)dest = src_int;
[..]

This is fine. The alignment of dynamically allocated block is
different from one allocated automatically. The OP asked about
an automatic array of char, IIRC. That's why the whole alignment
discussion was started. So, to reiterate, this

char *dest = new malloc(sizeof(int));
*(int*)dest = 42;

is fine, however, this

char dest[sizeof(int)];
*(int*)dest = 42;

is NOT.

V
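
The difference between the two cases can be checked at run time, and the automatic array can be given int alignment explicitly. The sketch below assumes a C++11 compiler (alignof, alignas, std::uintptr_t); aligned_for_int and IntBuffer are hypothetical names used only for this example.

```cpp
#include <cstdint>   // std::uintptr_t
#include <cstdlib>   // std::malloc, std::free

// True if p is suitably aligned for an int on this implementation.
// The pointer-to-integer mapping is implementation-defined, so this is
// a diagnostic for typical flat-address machines, not a portable proof.
bool aligned_for_int(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}

// An automatic buffer that is explicitly aligned for int.
struct IntBuffer
{
    alignas(int) char bytes[sizeof(int)];
};
```

A plain char dest[sizeof(int)] may or may not pass aligned_for_int on a given run, which is the whole problem; storage from malloc and IntBuffer::bytes should always pass.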
 
Jerry Coffin

[ ... ]
This is fine. The alignment of dynamically allocated block is
different from one allocated automatically. The OP asked about
an automatic array of char, IIRC. That's why the whole alignment
discussion was started. So, to reiterate, this

char *dest = new malloc(sizeof(int));
*(int*)dest = 42;

is fine,

....other than that minor detail that you copied my typo of "new
malloc" -- obviously you need new or malloc, but not both. :)
 
Jerry Coffin

[ ... ]
It is guaranteed, simply due to the fact that sizeof(char) is always 1. If
char had any special alignment requirements, you couldn't create arrays of
char, and that would violate the standard.

You're assuming that all machines are quite a bit like you're accustomed
to. That's not necessarily the case. Consider a machine with two
entirely separate memories, one that's byte-addressable, but relatively
small, while the other is only word addressable, but larger. When you
ask for char's, it's allocated from the first memory, but when you ask
for int's, it's allocated from the second. If you want to copy from the
second to the first, you can do that -- but only by reading an entire
word, not individual bytes.

Many DSPs are more or less like this: they start as more or less Harvard
architectures, with separate memories (including separate busses) for
data and instructions. For the sake of speed, however, some instructions
can use the instruction memory as a secondary data memory -- but often
with restrictions. When/if you work with data that doesn't fit those
restrictions, it needs to be allocated in the main data memory...

[ ... ]
IMHO, the best way to deal with it is to put the required conversions into
their own functions and adapt those to target platforms that need that.
There is no really portable way of doing binary I/O (which I assume to be
the actual reason for the cast).

Assuming that's really the case (and I'm not disputing it, just
admitting that I'm not sure) I agree.
 
Kai-Uwe Bux

Jerry said:
[ ... ]
It is guaranteed, simply due to the fact that sizeof(char) is always 1.
If char had any special alignment requirements, you couldn't create
arrays of char, and that would violate the standard.

You're assuming that all machines are quite a bit like you're accustomed
to. That's not necessarily the case. Consider a machine with two
entirely separate memories, one that's byte-addressable, but relatively
small, while the other is only word addressable, but larger. When you
ask for char's, it's allocated from the first memory, but when you ask
for int's, it's allocated from the second. If you want to copy from the
second to the first, you can do that -- but only by reading an entire
word, not individual bytes.

Many DSPs are more or less like this: they start as more or less Harvard
architectures, with separate memories (including separate busses) for
data and instructions. For the sake of speed, however, some instructions
can use the instruction memory as a secondary data memory -- but often
with restrictions. When/if you work with data that doesn't fit those
restrictions, it needs to be allocated in the main data memory...

I am not sure I buy this argument (provided it is supposed to give guidance
on interpreting the C++ standard). The standard describes the memory model
and states the addressable units in memory are bytes. If you had the
architecture above, implementors would have to decide what they want a byte
to be, and if they decide on the smaller unit, they would have to implement
some trickery to make those subwords look like addressable in the region of
memory where hardware does not support it. From the point of the abstract
machine, memory is homogeneous.

[snip]


Best

Kai-Uwe Bux
 
Jerry Coffin

[ ... ]
I am not sure I buy this argument (provided it is supposed to give guidance
on interpreting the C++ standard). The standard describes the memory model
and states the addressable units in memory are bytes. If you had the
architecture above, implementors would have to decide what they want a byte
to be, and if they decide on the smaller unit, they would have to implement
some trickery to make those subwords look like addressable in the region of
memory where hardware does not support it. From the point of the abstract
machine, memory is homogeneous.

While we all know the general idea of how we think things are supposed
to be, the question would be whether there's a requirement in the
standard that this design would violate.

Most people have the general idea that the required alignment of an item
is always less than or equal to the size of that item, so reading a char
will always be aligned -- but I don't see any such actual requirement. I
don't see any other requirement it would violate either.
 
Kai-Uwe Bux

Jerry said:
[ ... ]
I am not sure I buy this argument (provided it is supposed to give
guidance on interpreting the C++ standard). The standard describes the
memory model and states the addressable units in memory are bytes. If you
had the architecture above, implementors would have to decide what they
want a byte to be, and if they decide on the smaller unit, they would
have to implement some trickery to make those subwords look like
addressable in the region of memory where hardware does not support it.
From the point of the abstract machine, memory is homogeneous.

While we all know the general idea of how we think things are supposed
to be, the question would be whether there's a requirement in the
standard that this design would violate.

Most people have the general idea that the required alignment of an item
is always less than or equal to the size of that item, so reading a char
will always be aligned -- but I don't see any such actual requirement. I
don't see any other requirement it would violate either.

I think I see your point now: although the standard guarantees that memory
consists of bytes and each byte is individually addressable, and although it
guarantees that an unsigned char has size 1, which means it is exactly one
byte, there is no guarantee that unsigned char has no alignment, i.e., the
standard does not guarantee that each byte can be addressed by means of a
pointer to unsigned char.

I think that might be a defect in the standard.


Best

Kai-Uwe Bux
 
Jerry Coffin

[ ... ]
I think I see your point now: although the standard guarantees that memory
consists of bytes and each byte is individually addressable, and although it
guarantees that an unsigned char has size 1, which means it is exactly one
byte, there is no guarantee that unsigned char has no alignment, i.e., the
standard does not guarantee that each byte can be addressed by means of a
pointer to unsigned char.

Actually, I don't see anything that even says every byte is individually
addressable. It says a char is one byte, and everything else is composed
of bytes, but I don't see anything that guarantees that those bytes are
all individually addressable. It does guarantee that anything else can
be copied into bytes and those bytes addressed individually. There's an
example that does byte-by-byte copying but it's not normative. As such,
I think the intent to allow byte-by-byte addressing of all POD types was
probably there -- but I don't see normative language that really
guarantees it.

Use of the word "alignment" tends to suggest that it's related to the
least significant bits of an address. Nonetheless, the alignment rules
really seem to say that memory that's dynamically allocated (e.g. with
malloc) can be addressed as any type that'll fit into the allocated
memory, but otherwise, you can only really address something as its
allocated type. There are a few special rules about how you can address
parts of a partially constructed object, but that's about it. Oddly, I
think these are _intended_ to restrict what you can do portably, but I
think they're actually more permissive than the rules for other
situations.
I think that might be a defect in the standard.

I suspect it is. The note mentioned above, while non-normative still
gives a _strong_ suggestion of what the authors had in mind. I think the
intent was to allow any object (in the C sense of the word) to be
addressed as a series of bytes. You're not guaranteed a particular
relationship between the original value and the values in those bytes,
but you're allowed to read them anyway.

As I've said, however, while I don't see normative language to support
that, I'm pretty sure just about every compiler writer "knows" it so
every compiler around will allow it. Systems that have memory that can't
be addressed byte-by-byte probably fake it by reading entire words and
then allowing manipulation of individual bytes in registers.
 
