Binary files, little&big endian setting bits

Steve

Hi, I know this is an old question (sorry),

but it's a different problem: I need to write a binary file as follows

00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000

program will be compiled in Microsoft Visual C++

was thinking of just writing it as chars (AFAIK chars are the only
unsigned integer type that's only 1 byte), so basically I'll be writing
3,0,0,5,0,256,256,0

question is, if I write a file like that, will it come out as the bits
above? Does VC++ write little or big endian, and, endian issues aside,
if it doesn't come out as above, why not?
 
dandelion

Steve said:
Hi, I know this is an old question (sorry),

Off topic, too...

question is, if I write a file like that, will it come out as the bits
above? Does VC++ write little or big endian, and, endian issues aside,
if it doesn't come out as above, why not?

The Intel 80x86 processors are little-endian regardless of your compiler:
16-bit and 32-bit words will not be "in order" but will have their byte
order reversed. Writing single bytes, however, you should have no
endianness problems.

See
http://www.cs.umass.edu/~verts/cs32/endian.html
http://www.rdrop.com/~cary/html/endian_faq.html
for details.
 
Jens.Toerring

Steve said:
but it's a different problem: I need to write a binary file as follows

program will be compiled in Microsoft Visual C++

That should be irrelevant here; if you need something VC++-specific,
you should ask in an MS-related newsgroup.
was thinking of just writing it as chars (AFAIK chars are the only
unsigned integer type that's only 1 byte)

Sorry, but a char isn't necessarily a single byte (i.e. 8 bits) - a
char can have different numbers of bits on different architectures.
Have a look at CHAR_BIT in <limits.h>; that tells you how many bits a
char has on your system.

Steve said:
so basically I'll be writing
3,0,0,5,0,256,256,0
question is if i write a file like that will it come out as the bits
above, does VC++ write little or big endian and other than endian
issues if it doesn't come out as above, why not??

When you write single bytes endianness isn't an issue at all - it
only becomes a problem when you write out data with a size larger
than a byte.
Regards, Jens
 
Eric Sosman

Steve said:
Hi, I know this is an old question (sorry),

but it's a different problem: I need to write a binary file as follows

00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000

program will be compiled in Microsoft Visual C++

was thinking of just writing it as chars (AFAIK chars are the only
unsigned integer type that's only 1 byte), so basically I'll be writing
3,0,0,5,0,256,256,0

question is, if I write a file like that, will it come out as the bits
above? Does VC++ write little or big endian, and, endian issues aside,
if it doesn't come out as above, why not?

I think you're asking about the order in which the
individual bits of each byte will be written: will the
first bit of the 3 be the high-order zero or the low-
order one?

To begin with, there may not *be* any order at all.
For example, suppose the output is sent to a parallel
interface that presents all eight bits simultaneously:
which bit is "first among equals" when they all march
in line abreast? The individual bits may not even
exist as discrete units: Consider writing to a modem
that encodes many bits in each signal transition, or
which uses data compression and winds up transmitting
2.71828 bits to encode the eight you presented? At the
C language level -- and even at the machine language
level, for most machines -- the byte is an indivisible
unit of I/O, and since it's indivisible the "order" of
its components cannot be discerned.

The question does eventually arise, at the level of
the medium on which the data is stored or through which
it is transmitted. And here, each storage device or
transmission medium has its own standards for the encoding
of these "indivisible" bytes. Some, like serial interfaces,
will indeed "split the atom" and transmit the individual
bits in a specified order. Others, like SCSI controllers,
designate specific signal lines for specific bits. Still
others, like card punches (anybody remember punched cards?)
will produce a pattern of holes that encode the character
designated by 3; this pattern will probably not have any
obvious relation to the original eight bits.

But you needn't worry about this unless you're the
person charged with implementing the electrical interface
to the storage or transmission medium. It is the job of
that interface to accept the serialized bits or the SCSI
signals or the holes in a punched card and to reconstitute
the indivisible byte value from them. As a programmer you
almost never care about the details (unless, perhaps, you're
writing diagnostic code that wants to produce specified
patterns in the signal lines to detect cross-talk, or that
sort of thing). You write out a 3, and it's the business
of the various media through which that 3 travels to ensure
that a 3 comes out at the other end. No huhu, cobber.

Where you *do* need to worry about endianness issues
is when you're dealing with multi-byte data objects: the
low-level media take care of individual bytes for you, but
you're responsible for arranging those bytes into larger
structures. Different systems have different conventions
for such arrangements, and that's why you can't just use
`fwrite(&int_value, sizeof int_value, 1, stream)' to send
an integer from one system to another. But once you've
settled on an "exchange format" that specifies the order
and meaning of the individual bytes, all you need to do is
decompose your larger objects into those bytes before
writing them, and reassemble the bytes into the larger
objects when reading. The actual form of the bytes "in
flight" is not your problem.

The only possible worry you might have with byte-by-
byte data exchange is if the machines use different byte
sizes: Exchanging data between machines with 8-bit and
9-bit bytes, for instance, can be tricky. But if you're
dealing with a common byte size, all is well.
 
Chris Torek

... i need to write a binary file as follows
00000011
00000000
00000000
00000101
00000000
11111111
11111111
00000000
... (AFAIK chars are the only unsigned integer type that's only 1 byte)
so basically I'll be writing 3,0,0,5,0,256,256,0

Actually, eight 1 bits, treated as an unsigned char, represents the
value 255, not 256.

Eric Sosman has already addressed the (lack of) endianness that
occurs when 8-bit units are your atomic level of input/output.

I want to point out that in C, "byte" and "char" mean the same
thing, which is not necessarily "8 bits" -- but it probably does
not matter, in part because you are unlikely to have a 9- or 32-bit
"char" system in the first place, and in part because those have
to deal with the rest of the world.

And then I just had to write this... :)

Bits in the C

When using a protocol over a net
(like TCP/IP or one I forget)
Where the number of bits has got to be eight
The Standards for C won't keep the things straight:
A char that is un-signed has got enough bits
But it might have too many, giving you fits!

A byte is a char, and a char is a byte
Eight bits is common, but nine is in sight
Digital Signalling Processors? Whew!
Here you may find there's a whole thirty-two!

When external formats on you are imposed
The trick to remember (while staying composed):
The C system's "bytes" may well be too big
But this does not mean you must give up the jig
To talk to another, the box you are on
Must have SOME way for them to begone
("Them" being pesky extraneous bits)
It just is not Standard, the part that omits
Some high order zeros of values between
Oh oh and eff eff (and hex sure is keen!).

To hold the right values, a char that is un-signed
Will do the trick nicely, I think you will find.
Who cares if it's bigger than strictly required?
The values you need will never get mired.
The eight bits you want won't get overtired
And values you need will never get mired!

Perhaps, with some more work and a good rousing tune, this might
even make a Gilbert & Sullivan pastiche. :)
 
Dave Thompson

[Endianness] does eventually arise, at the level of
the medium on which the data is stored or through which
it is transmitted. <snip> Some, like serial interfaces,
will indeed "split the atom" and transmit the individual
bits in a specified order. Others, like SCSI controllers,
designate specific signal lines for specific bits. Still
others, like card punches (anybody remember punched cards?)
will produce a pattern of holes that encode the character
designated by 3; this pattern will probably not have any
obvious relation to the original eight bits.
If the bits were EBCDIC, it certainly does bear a relation obvious to
anyone who thinks a bit about it (and knows the BCDIC history); even
for ASCII significant chunks of the translation to and from EBCDIC
(and thus card aka Hollerith) are systematic.

(Otherwise concur.)

Now, if you want an octet-parallel interface people will probably have
trouble remembering, how about IEEE-488 (IIRC) GPIB nee HPIB? <G>


- David.Thompson1 at worldnet.att.net
 
Flash Gordon

On Mon, 01 Nov 2004 08:14:12 GMT

Now, if you want an octet-parallel interface people will probably have
trouble remembering, how about IEEE-488 (IIRC) GPIB nee HPIB? <G>

I think it was originally HPIB (Hewlett-Packard Interface Bus), then GPIB
and IEEE-488 came along as later names for it.

I've made plenty of use of it in the past talking to DSO, DMM...

I also did some low level hacking around with it trying to detect if kit
was connected without crashing the program doing the check or locking up
the bus. All in HP Pascal.

So I have absolutely no trouble remembering it and know where there is
kit still making use of it. :)
 
Mike Wahler

That should be irrelevant here; if you need something VC++-specific,
you should ask in an MS-related newsgroup.

Steve, note that type 'char' might or might not be unsigned.
This is defined by the implementation. If you want to ensure
an unsigned type, explicitly say so:

unsigned char c;
Sorry, but a char isn't necessarily a single byte

Actually, yes it is.

See ISO 9899: 3.6, 3.7.1, 5.2.1/3

(i.e. 8 bits) - a
char can have different numbers of bits on different architectures.

As can a byte. "byte equals eight bits" is a very common
misconception.

-Mike
 
Old Wolf

Mike Wahler said:
As can a byte. "byte equals eight bits" is a very common
misconception.

OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?
Should I start referring to file sizes in bits to avoid confusion?
 
Mike Wahler

Old Wolf said:
OOI, how many bits are there in a kilobyte, if 1 byte is 32 bits?

1024 * 32
Should I start referring to file sizes in bits to avoid confusion?

If you need to be that precise in your specification, yes.

-Mike
 
Chris Croughton

1024 * 32


If you need to be that precise in your specification, yes.

That's the reason why communications specifications use the term
'octet', defined as being exactly 8 bits, because they need to be
specific about how many of them are used for fields. They also specify
the order of them (and order of bits if that is significant) to be
totally precise (big- and little-endian confusion is a major cause of
programming errors in comms software). I often define an explicit type
'octet' in my code (the same as uint8_t in C99, but not all compilers
are C99 and have stdint.h yet).
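One way such an 'octet' definition might be sketched, with a pre-C99
fallback (the typedef name is the author's own convention, not anything
standard):

```c
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef uint8_t octet;         /* exact-width type from C99 <stdint.h> */
#else
#if CHAR_BIT != 8
#error "this fallback assumes 8-bit chars"
#endif
typedef unsigned char octet;   /* pre-C99 fallback */
#endif
```

The #error guard makes the 8-bit assumption fail loudly at compile
time instead of silently miscounting bits on an unusual machine.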

Chris C
 
