Copying struct and unsigned char buffer


Roy Hills

When I'm reading from or writing to a network socket, I want to use a
struct to represent the structured data, but must use an unsigned char
buffer for the call to sendto() or recvfrom().

I have two questions:

1. Is it generally safe to "overlay" the structure on the buffer,
e.g.:

unsigned char buffer[BUFSIZ];
struct header {
whatever;
};
struct header *hdr = (struct header *) buffer;

or is it safer / more portable to declare a separate struct and use
memcpy() to copy it to or from the buffer? Obviously, it would be
more efficient to avoid the memcpy() if possible.

The overlay seems to work fine, but I'm always concerned that I'm
doing something non-portable or inelegant whenever I cast a pointer.

I've seen both approaches used in practice.

2. How should I deal with alignment?

I've read that there's no guarantee that struct members will be
adjacent in memory because of alignment requirements. So in theory, I
shouldn't try to overlay or memcpy() a struct to or from a buffer at
all. However, just about every program I've seen seems to do this with
no ill effects or lack of portability.

Are there any general guidelines on when I'm likely to run into struct
alignment padding? Is it normally safe if elements are aligned
according to size, e.g. a 32-bit on an address divisible by 4 and a
16-bit on an address divisible by 2? Is it safe to assume that there
will be no padding at the beginning of the struct? (i.e. the compiler
will automatically position the first element with the required
alignment?).

Roy Hills
 

Mark A. Odell

When I'm reading from or writing to a network socket, I want to use a
struct to represent the structured data, but must use an unsigned char
buffer for the call to sendto() or recvfrom().

I have two questions:

1. Is it generally safe to "overlay" the structure on the buffer,
e.g.:

unsigned char buffer[BUFSIZ];
struct header {
whatever;
};
struct header *hdr = (struct header *) buffer;

or is it safer / more portable to declare a separate struct and use
memcpy() to copy it to or from the buffer? Obviously, it would be
more efficient to avoid the memcpy() if possible.

The overlay seems to work fine, but I'm always concerned that I'm
doing something non-portable or inelegant whenever I cast a pointer.

You can overlay it safely. You're just not supposed to trust what happens
if you write via one var and read via the other, e.g. buffer[34] = 'a';
followed by if (hdr->elementN == 'a') where elementN happens to contain
byte 34 of buffer[].

I've seen both approaches used in practice.

2. How should I deal with alignment?

That can be accomplished by allocating N-1 bytes of storage where N is
your alignment requirement. But I don't think this will work for you since
you probably want your struct to start at the beginning of buffer not
offset by some alignment munging.

I've read that there's no guarantee that struct members will be
adjacent in memory because of alignment requirements. So in theory, I
shouldn't try to overlay or memcpy() a struct to or from a buffer at
all. However, just about every program I've seen seems to do this with
no ill effects or lack of portability.

If you fill in data via the struct you should be able to copy the struct
to another same type struct w/o issue.

Are there any general guidelines on when I'm likely to run into struct
alignment padding? Is it normally safe if elements are aligned
according to size, e.g. a 32-bit on an address divisible by 4 and a
16-bit on an address divisible by 2? Is it safe to assume that there
will be no padding at the beginning of the struct? (i.e. the compiler
will automatically position the first element with the required
alignment?).

Yes. The first element of a struct cannot be preceded by padding
bytes.
 

Eric Sosman

The simple "overlay" approach is *not* safe or portable,
for exactly the reason you mentioned: the compiler may choose
to add unnamed padding bytes after any element of the struct.
See Questions 2.12 and 2.13 in the comp.lang.c Frequently Asked
Questions (FAQ) list

http://www.eskimo.com/~scs/C-faq/top.html

A simple memcpy() has exactly the same problem as trying
to perform the I/O directly to or from the struct itself: you're
depending on the struct layout to match the externally-imposed
format, and that's an undependable coincidence.

When you say that "just about every program I've seen seems
to do this with no ill effects or lack of portability," I think
you haven't really explored the portability dimension ...

What to do? There are three fundamental approaches, plus
others formed by combining the latter two in different ways:

0: Do the I/O directly to and from the struct and hope
there's no problem. Some compilers will add no padding
to the structs of interest, others can be told (in non-
portable ways) not to do so. Of course, your code will
fail mysteriously on Frobozz Magic C ...

1: Do the I/O with a buffer of `unsigned char' and deal
directly with the bytes therein -- no struct in sight.
You can probably write yourself a few macros and/or
functions to access the fields of interest. If there
are only a few "structured" fields and most of the
stuff is relatively free-form, this approach can be
a winner.

2: Do the I/O with a buffer of `unsigned char' and move
the buffer's fields to and from the struct elements
one by one. This sounds (and is) irksome, but if you
also want to handle details like different endianness
or different sizes and representations for basic types
you'll need to do most of this work anyhow.
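
Approach 1 might be sketched roughly as follows. This is only an illustration: the helper names (get_u16, get_u32, put_u16, put_u32) are hypothetical, and the wire format is assumed to store values most significant byte first, with each buffer byte carrying 8 bits of the value. The uint16_t and uint32_t types assume a C99 <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical accessors for fixed-width unsigned fields stored
   most significant byte first ("network order") in a byte buffer.
   They never look at a struct, so padding and host endianness
   do not matter. */
static uint16_t get_u16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void put_u16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)((v >> 8) & 0xFF);
    p[1] = (unsigned char)(v & 0xFF);
}

static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}
```

With helpers like these, a field is read as, say, get_u32(buffer + 2), where the offset comes from the protocol specification rather than from any struct layout. Approach 2 uses the same helpers but stores the results into struct members one at a time.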
 

Sheldon Simms

When I'm reading from or writing to a network socket, I want to use a
struct to represent the structured data, but must use an unsigned char
buffer for the call to sendto() or recvfrom().

I have two questions:

1. Is it generally safe to "overlay" the structure on the buffer,
e.g.:

unsigned char buffer[BUFSIZ];
struct header {
whatever;
};
struct header *hdr = (struct header *) buffer;

The idea is ok. You might want to do it like this though:

struct header {
/* whatever */
};
unsigned char buf [sizeof(struct header)];
struct header * hdr = (struct header *)buf;

There are a couple of things to be careful about, both mentioned by
other people replying to you: array of unsigned char is the only
suitable type for your buffer. You cannot do something like this:

int foo;
buf[n] = 0xa5;
foo = hdr->member; /* try to retrieve value stored in buf[n] */

And you cannot expect this to work over a network. Even though struct
header is declared identically on two different machines, its
representation might be different on each of them. Therefore you
cannot do this:

/* machine A */
struct header a_hdr = { /* make a nice header */ };
send((unsigned char *)&a_hdr);
...

/* machine B */
struct header * hdr;
unsigned char buf [sizeof(struct header)];
/* buf might be too small for A's version of struct header */
recv(buf);
hdr = (struct header *)buf;
if (hdr->member == 0) ...
/* might be padding bytes, the wrong member, anything */

2. How should I deal with alignment?

I've read that there's no guarantee that struct members will be
adjacent in memory because of alignment requirements. So in theory, I
shouldn't try to overlay or memcpy() a struct to or from a buffer at
all.

No, that's not true. The fact that structs can contain padding only
means that some bytes of the buffer will correspond to padding bytes
in the struct. That's one of the reasons why you can't just write
stuff into the buffer and expect it to show up in the struct.

Is it safe to assume that there
will be no padding at the beginning of the struct?

Yes.
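
To see where padding ends up on a particular implementation, the offsetof macro from <stddef.h> can be used. The following is a small sketch, not a portable description of any layout: the members of this struct header and the two function names are hypothetical, and the values printed will differ between compilers. Only the offset of the first member is guaranteed to be 0.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical header layout, for illustration only. */
struct header {
    unsigned char  type;
    unsigned long  length;
    unsigned short flags;
};

/* Total number of padding bytes the compiler added to struct header. */
static size_t header_padding(void)
{
    return sizeof(struct header)
         - (sizeof(unsigned char) + sizeof(unsigned long)
            + sizeof(unsigned short));
}

/* Print where each member landed on this implementation. */
static void print_header_layout(void)
{
    printf("type   at offset %lu\n",
           (unsigned long)offsetof(struct header, type));
    printf("length at offset %lu\n",
           (unsigned long)offsetof(struct header, length));
    printf("flags  at offset %lu\n",
           (unsigned long)offsetof(struct header, flags));
    printf("sizeof = %lu, padding = %lu\n",
           (unsigned long)sizeof(struct header),
           (unsigned long)header_padding());
}
```

Any nonzero padding count means the struct cannot match a byte-packed wire format on that implementation.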

-Sheldon
 

Clint Olsen

Are there any general guidelines on when I'm likely to run into struct
alignment padding? Is it normally safe if elements are aligned according
to size, e.g. a 32-bit on an address divisible by 4 and a 16-bit on an
address divisible by 2? Is it safe to assume that there will be no
padding at the beginning of the struct? (i.e. the compiler will
automatically position the first element with the required alignment?).

The alignment of a struct should be constrained by the alignment of the
first element since a pointer to the first element is compatible with a
pointer to the structure itself.

If you go to Rob Pike's paper on the Plan 9 C compiler, he shows a
technique in the portability section on how they trade data. It's not a
panacea since you still must make decisions about the number of bytes for
each type you're transferring. I also could see some issues if CHAR_BIT
wasn't the same on all platforms in question.

http://www.cs.bell-labs.com/sys/doc/comp.html

He does not cover how floating point data should be handled except to say
that it requires care.

-Clint
 

Peter Shaggy Haywood

Groovy hepcat Mark A. Odell was jivin' on 27 Oct 2003 15:58:13 GMT in
comp.lang.c.
Re: Copying struct and unsigned char buffer's a cool scene! Dig it!

When I'm reading from or writing to a network socket, I want to use a
struct to represent the structured data, but must use an unsigned char
buffer for the call to sendto() or recvfrom().

I have two questions:

1. Is it generally safe to "overlay" the structure on the buffer,
e.g.:

unsigned char buffer[BUFSIZ];
struct header {
whatever;
};
struct header *hdr = (struct header *) buffer;

or is it safer / more portable to declare a separate struct and use
memcpy() to copy it to or from the buffer? Obviously, it would be
more efficient to avoid the memcpy() if possible.

That's the way to go. But copy struct members individually. (See
below.)

The overlay seems to work fine, but I'm always concerned that I'm
doing something non-portable or inelegant whenever I cast a pointer.

You can overlay it safely. You're just not supposed to trust what happens
if you write via one var and read via the other, e.g. buffer[34] = 'a';
followed by if (hdr->elementN == 'a') where elementN happens to contain
byte 34 of buffer[].

Not so. There may be problems with pointer alignment. It may be that
(on some implementation) structs are only stored at even addresses,
while chars and arrays thereof may be stored at odd addresses. In that
case a pointer to struct is not guaranteed to be able to point at an
array of char. In such a situation, even trying to point a pointer to
struct at an array of char may cause undefined behaviour.

====================================================================
6.3.2.3 Pointers
....
7 A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned for the pointed-to type, the
behavior is undefined. ...
====================================================================

That can be accomplished by allocating N-1 bytes of storage where N is
your alignment requirement. But I don't think this will work for you since
you probably want your struct to start at the beginning of buffer not
offset by some alignment munging.

This approach is also inherently non-portable, since he must know
the alignment characteristics of the implementation he is using, and
modify the code for each new implementation with different alignment
characteristics.
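
One way to avoid the pointer alignment problem without knowing anything about the implementation's alignment rules is to copy the bytes into a struct object that the compiler has already aligned, instead of pointing a struct pointer into the char array. A minimal sketch follows; struct sample and both function names are hypothetical. Note that this deals with alignment only: the bytes are still in this implementation's own layout, so it suits a round trip on one machine, not a wire format.

```c
#include <string.h>

/* Hypothetical struct, for illustration only. */
struct sample {
    int    id;
    double value;
};

/* Copy sizeof(struct sample) bytes from any byte address, even an
   odd one, into a properly aligned object. No struct pointer is
   ever formed from src, so no misaligned pointer can arise. */
static struct sample sample_from_bytes(const unsigned char *src)
{
    struct sample s;
    memcpy(&s, src, sizeof s);
    return s;
}

/* The reverse direction: dump the object's bytes, padding included. */
static void sample_to_bytes(unsigned char *dst, const struct sample *s)
{
    memcpy(dst, s, sizeof *s);
}
```

Here the caller can pass buffer + 1 or any other offset, because memcpy() works on bytes and imposes no alignment requirement on its arguments.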
If you fill in data via the struct you should be able to copy the struct
to another same type struct w/o issue.

Which doesn't really help him with his problem.

Yes. The guideline is to always assume that structs are padded, but
you don't know where and how much. (Because, on some implementations,
that will actually be the case.)

Of course, if you don't care about portability you can drop this
assumption and look up your compiler manual. But it sounds like you
(the OP) are concerned about portability. (Very good!) In that case,
assume the worst.

For that reason you should not copy an entire struct to an unsigned
char buffer in one go. Instead, as I said above, you should copy
struct members one by one. Have an extra pointer ready to keep track
of where to copy the next struct member to/from. Use memcpy() to copy
each member (to minimise alignment issues, as discussed above). You
could maybe write functions to copy a struct to and from a buffer.
E.g.:

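#include <string.h>

struct foo
{
    int a;
    double b;
    char c[42];
};

/* Copy each member of *src into dst, packed with no padding. dst needs
   room for sizeof src->a + sizeof src->b + sizeof src->c bytes. */
void foo2buf(unsigned char *dst, const struct foo *src)
{
    unsigned char *off = dst;

    memcpy(off, &src->a, sizeof src->a);   /* memcpy() needs an address */
    off += sizeof src->a;
    memcpy(off, &src->b, sizeof src->b);
    off += sizeof src->b;
    memcpy(off, src->c, sizeof src->c);    /* array decays to a pointer */
}

/* Reverse of foo2buf(): fill *dst from the packed bytes at src. */
void buf2foo(struct foo *dst, const unsigned char *src)
{
    const unsigned char *off = src;

    memcpy(&dst->a, off, sizeof dst->a);
    off += sizeof dst->a;
    memcpy(&dst->b, off, sizeof dst->b);
    off += sizeof dst->b;
    memcpy(dst->c, off, sizeof dst->c);
}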

Even better, make these functions work more portably by copying only
items of a predetermined size and representation (i.e., endianness,
representation of negative numbers, etc.). (You may have to do some
manipulating of data, converting C data types to/from some other form,
to achieve this.) This is left as an exercise.
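
One possible take on that exercise is sketched here, under stated assumptions. The struct record, the function names and the wire format are all hypothetical: a 32-bit two's complement integer sent most significant byte first, followed by an 8-byte name. The count member is assumed to fit in 32 bits, and each buffer byte is assumed to carry 8 bits of the value.

```c
#include <string.h>

#define NAME_LEN 8
#define RECORD_WIRE_SIZE (4 + NAME_LEN)

/* Hypothetical struct, for illustration only. */
struct record {
    long count;              /* must fit in 32 bits on the wire */
    char name[NAME_LEN];
};

/* Write *src to dst in the fixed wire format described above. */
static void record_pack(unsigned char *dst, const struct record *src)
{
    /* Conversion to unsigned is modulo 2^N, so a negative count
       becomes its two's complement bit pattern on any host. */
    unsigned long u = (unsigned long)src->count & 0xFFFFFFFFUL;

    dst[0] = (unsigned char)((u >> 24) & 0xFF);
    dst[1] = (unsigned char)((u >> 16) & 0xFF);
    dst[2] = (unsigned char)((u >> 8) & 0xFF);
    dst[3] = (unsigned char)(u & 0xFF);
    memcpy(dst + 4, src->name, NAME_LEN);
}

/* Read the wire format at src back into *dst. */
static void record_unpack(struct record *dst, const unsigned char *src)
{
    unsigned long u = ((unsigned long)src[0] << 24)
                    | ((unsigned long)src[1] << 16)
                    | ((unsigned long)src[2] << 8)
                    |  (unsigned long)src[3];

    /* Decode two's complement by arithmetic rather than by an
       implementation-defined unsigned-to-signed conversion. */
    if (u & 0x80000000UL)
        dst->count = -(long)(0xFFFFFFFFUL - u) - 1;
    else
        dst->count = (long)u;
    memcpy(dst->name, src + 4, NAME_LEN);
}
```

The buffer passed to sendto() or filled by recvfrom() is then always RECORD_WIRE_SIZE bytes, whatever sizeof(struct record) happens to be on either machine. Floating point members would need a wire representation of their own, which this sketch does not attempt.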

--

Dig the even newer still, yet more improved, sig!

http://alphalink.com.au/~phaywood/
"Ain't I'm a dog?" - Ronny Self, Ain't I'm a Dog, written by G. Sherry & W. Walker.
I know it's not "technically correct" English; but since when was rock & roll "technically correct"?
 
