beginner question about storing binary data in buffers, seeing binary data in a variable, etc

darren

Hi there

I'm working on an assignment that has me store data in a buffer to be
sent over the network. I'm ignorant about how C++ stores data in an
array, and about types in general.

If I declare an array of chars that is, say, 10 bytes long:
char buff[10];
does this mean that I can safely store 80 bits of data?

When I think of an array of chars, I think of each spot in the array
as a sequence of eight 1's or 0's. Is this a correct visualization?
I guess my question here is why do most buffers seem to be
implemented as char arrays? Can any binary value between 0 and 255 be
safely put into a char array slot (00000000 to 11111111)? Why not
implement a buffer using uint8_t?

Obviously I have a very loose grasp on how buffers save data, and how
a receiver gets this data on their end. I understand the sockets
stuff, just not the buffer-specific stuff. Any enlightenment would be
most appreciated.

thanks.
 
James Kanze

* darren:
A textbook would be a good resource.

Most probably don't say anything about it, because it's
unspecified. Except for the specifications of how pointer
arithmetic works within arrays.

[...]
Are they?

Transmission buffers, yes. char[] or unsigned char[] are really
your only two choices. (I generally use unsigned char, but the
C++ standard does try to make char viable as well. And on most
typical architectures, where converting between char and
unsigned char doesn't change the bit pattern, both work equally
well in practice.)
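
As a rough sketch of what I mean (the names and the size here are
just for illustration): either declaration gives you the same raw
storage, and on the usual architectures you can view it through the
other type without changing a single bit:

unsigned char buffer[ 1024 ] ;      //  what I normally declare
// char buffer[ 1024 ] ;            //  works just as well in practice

//  An API that insists on a char* can be handed the same storage; on
//  a typical 2's complement machine the cast changes no bit pattern.
char* asChars = reinterpret_cast< char* >( buffer ) ;
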
It depends what you're really asking.
If you provided a concrete example and what you expected as
result, one could say whether that was correct or not.
But a char is guaranteed at least 256 possible bit patterns (a
minimum of 8 bits), yes.
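
As a concrete check (just a small sketch), <climits> tells you
exactly what the implementation guarantees:

#include <climits>
#include <iostream>

int main()
{
    //  CHAR_BIT is required to be at least 8, and UCHAR_MAX at least
    //  255, so every value 0x00 through 0xFF fits in one array slot.
    std::cout << "bits per char: " << CHAR_BIT << '\n' ;
    std::cout << "largest unsigned char: " << UCHAR_MAX << '\n' ;

    unsigned char buff[ 256 ] ;
    for ( int i = 0 ; i < 256 ; ++ i ) {
        buff[ i ] = static_cast< unsigned char >( i ) ;  //  0..255 all fit
    }
}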

On the other hand, I think in theory, char could be signed 1's
complement, and assigning a negative 0 (0xFF) could force it to
positive (which would mean that you could never get 0xFF by
assignment---but you could memcpy it in). I think: I'm too lazy
to verify in the standard, and of course, any implementation
that actually did this would break so much code as to be
unviable.
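
The memcpy escape hatch would look something like this (a sketch
only; on any machine you're actually likely to meet, plain assignment
gives the same bytes anyway):

#include <cstring>

int main()
{
    unsigned char raw = 0xFF ;
    char c ;
    //  memcpy copies the bit pattern as is, so c holds 0xFF no matter
    //  how the implementation represents a signed char; assigning an
    //  out of range value would instead go through a conversion.
    std::memcpy( &c, &raw, 1 ) ;
}
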
That's not presently a standard C++ type.

It's still a viable alternative.

It's possible to write code for binary network protocols in a
perfectly portable manner. It's rarely worth it, since it
entails some very complex workarounds for what are, in the end,
very rare and exotic machines that most of us don't have to deal
with. Thus, I know that much of the networking software I write
professionally will fail on a machine with an unusual conversion
of unsigned to signed (i.e. which isn't 2's complement, and
doesn't just use the underlying bit pattern).
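
By way of illustration (the function names are mine), the usual
approach for an unsigned 32 bit value is shifts and masks, which
depend only on values and not on how the host represents its
integers:

void putUint32( unsigned char* dest, unsigned long value )
{
    //  Big-endian ("network") byte order, built purely from shifts
    //  and masks.
    dest[ 0 ] = static_cast< unsigned char >( (value >> 24) & 0xFF ) ;
    dest[ 1 ] = static_cast< unsigned char >( (value >> 16) & 0xFF ) ;
    dest[ 2 ] = static_cast< unsigned char >( (value >>  8) & 0xFF ) ;
    dest[ 3 ] = static_cast< unsigned char >(  value        & 0xFF ) ;
}

unsigned long getUint32( unsigned char const* src )
{
    return (static_cast< unsigned long >( src[ 0 ] ) << 24)
         | (static_cast< unsigned long >( src[ 1 ] ) << 16)
         | (static_cast< unsigned long >( src[ 2 ] ) <<  8)
         |  static_cast< unsigned long >( src[ 3 ] ) ;
}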
 
James Kanze

In that case any 1's complement implementation that also used 1's
complement for signed 'char' would be unviable... :)

As it happens, in the two implementations I'm aware of where
signed integers are not 2's complement, plain char is unsigned,
thus avoiding the problem. (If one of the goals of plain char
is to contain text characters, then it really should be unsigned
anyway. Historically, however, making char unsigned had a
non-negligible runtime cost on a PDP-11, and since back then,
all the world was PDP-11, and the only character set which
counted was ASCII, which only uses the lower 7 bits...)
It makes an interesting case for dropping that support in the
standard, and requiring two's complement for all signed integral
types.

Perhaps a better choice would be to require that plain char be
unsigned, so that you could safely use it with the results of
e.g. istream::get() or fgetc(). (Or both, but I don't think
you'll find much support for either in the committee.)
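
In the meantime, the safe way around it (a sketch; the function is
mine, not from any library) is simply to store into unsigned char,
since fgetc() returns either EOF or the byte read as an unsigned char
value widened to int:

#include <cstdio>
#include <cstddef>

std::size_t readBytes( std::FILE* input, unsigned char* buf, std::size_t max )
{
    std::size_t i = 0 ;
    int ch ;
    //  fgetc() returns the byte read, as unsigned char converted to
    //  int, or EOF; storing it into unsigned char never goes out of
    //  range.
    while ( i < max && (ch = std::fgetc( input )) != EOF ) {
        buf[ i ++ ] = static_cast< unsigned char >( ch ) ;
    }
    return i ;
}
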
Yes, and the main reason for "why not" is that it's not a
standard C++ type.
Tune up the warning level, perhaps? <g>

What I'm waiting for is a machine which will core dump if the
conversion fails (i.e. doesn't result in the same value). I
don't really expect to see it, however, given that one standard
idiom in C is things like:

int ch = fgetc( input ) ;
while ( ch != EOF && someOtherConditions( ch ) ) {
    *p ++ = ch ;            //  Where p is a char*...
    ch = fgetc( input ) ;
}

It's a bit surprising that something this widespread is
implementation defined, and may result in an implementation
defined signal (according to the C standard---the C++ standard
still has the imprecisions of C90). Because it is so
widespread, however, I don't expect to see a compiler which
doesn't support it anytime soon. (As I said, all of the
"exotic" architectures that I know make plain char unsigned,
which effectively removes the "implementation defined" here.)
 
