processing bytes

jmoy

I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?
 
Rolf Magnus

jmoy said:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide.

Yes, it does. It just doesn't require a byte to be exactly 8 bits wide.

jmoy said:
Stroustrup's TC++PL mentions an implementation where a 'char' is four
bytes.

I doubt that.

jmoy said:
How can I write my program so that it will work even with such an
implementation?

You would need to read in the bytes and split them up using bit
manipulation operators.
 
Thomas Matthews

jmoy said:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?

In modern times, few devices actually require byte-by-byte handling.
Most of them are designed for speed, and they are most efficient when
handling "streams" of bytes. For example, many storage devices transfer
data a sector at a time. So one would allocate an array the size of a
sector, read the sector from the file, then parse through the array.

Text files, especially ones that have records delimited by a
newline, are best read line by line into a std::string. This is still
not processing the file byte by byte.

Also, you will have to separate in your mind the concepts of a byte,
an octet and a character. A byte is the smallest addressable unit of
storage; in C++ it has 8 or more bits, and the number of bits need not
be a multiple of 8. An octet is a unit of exactly 8 bits. A character
is a single textual unit, often a letter. A character may be as small
as 6 bits or much larger; the number of bits used depends on the
platform's character encoding scheme. The CDC Cyber computers have a
6 / 12 bit character (popular letters take 6 bits, less popular ones
require 12 bits). Some Asian character sets require 16 or more bits.
Just remember that there is a difference between a byte, an octet and
a char.


--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 
red floyd

jmoy said:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?

Didn't the CDC have a 36-bit word? Not sure if C implementations used a
6-bit, 8-bit, 9-bit, 18-bit or 36-bit char.
 
Mike Wahler

jmoy said:
I have some data (say in a file) that needs to be handled byte by
byte.

Then use type 'unsigned char'.

jmoy said:
Source code I have looked at does this by treating the data as a
stream of 'char's.

That's the only way i/o can be done in (standard) C++.

jmoy said:
However, the standard does not require a 'char' to
be a byte wide.

Really?


ISO/IEC 14882:1998(E)

1.7 The C++ memory model

1 The fundamental storage unit in the C++ memory model is the
byte. A byte is at least large enough to contain any member
of the basic execution character set and is composed of a
contiguous sequence of bits, the number of which is
implementation-defined. The least significant bit is called
the low-order bit; the most significant bit is called the
high-order bit. The memory available to a C++ program consists
of one or more sequences of contiguous bytes. Every byte has
a unique address.

jmoy said:
Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes.

I can't seem to locate this mention in my copy of Stroustrup.
What page?

jmoy said:
How can I write my program so that it will work
even with such an implementation?

Such an implementation (where 'char' types have a size greater
than one byte) does not conform to the C++ standard.

-Mike
 
Bill Seurer

red said:
Didn't the CDC have a 36-bit word? Not sure if C implementations used a
6-bit, 8-bit, 9-bit, 18-bit or 36-bit char.

60 bits. Characters were 6 bits in Pascal.
 
JKop

jmoy posted:
I have some data (say in a file) that needs to be handled byte by
byte. Source code I have looked at does this by treating the data as a
stream of 'char's. However, the standard does not require a 'char' to
be a byte wide. Stroustrup's TC++PL mentions an implementation where a
'char' is four bytes. How can I write my program so that it will work
even with such an implementation?


I thought there was something in the Standard that said a byte had to be
at least 8 bits? Is there anything at all in the Standard limiting the
minimum size? I sure hope there is!

-JKop
 
Rolf Magnus

JKop said:
I thought there was something in the Standard that said a byte had to
be at least 8 bits? Is there anything at all in the Standard limiting
the minimum size? I sure hope there is!

Yep. In C++, the minimum number of bits in a byte is 8. There is however
no maximum number, so a conforming implementation could be made that
has 5173 bits/byte.
 
JKop

Rolf Magnus posted:
Yep. In C++, the minimum number of bits in a byte is 8. There is however
no maximum number, so a conforming implementation could be made that
has 5173 bits/byte.


I was thinking, if there _weren't_ a minimum, then even something like
this would be undefined behaviour:


unsigned int k = 515;


-JKop
 
Rolf Magnus

JKop said:
I was thinking, if there _weren't_ a minimum, then even something like
this would be undefined behaviour:


unsigned int k = 515;

Well, for unsigned int, a minimum range from 0 to 65535 is guaranteed.
 
Jack Klein

Thomas Matthews said:
In modern times, few devices actually require byte-by-byte handling.
Most of them are designed for speed, and they are most efficient when
handling "streams" of bytes. For example, many storage devices transfer
data a sector at a time. So one would allocate an array the size of a
sector, read the sector from the file, then parse through the array.

Actually, I need to disagree with you on this. While it may hold true
for high-level applications in hosted environments, it breaks down
quickly for many types of communication interfaces even in those same
environments.

Many programs must deal with a "clump" of data obtained from a
communication interface (Ethernet, FireWire, USB, CAN, serial port,
etc.). The details of the device are hardware specific and off-topic
here, but how a program can parse and extract various data types from
such a "clump" is not.
 
JKop

Rolf Magnus posted:
Well, for unsigned int, a minimum range from 0 to 65535 is guaranteed.

If it ain't too much trouble, could you please post all the limits?

-JKop
 
