1 byte = 32 bits?


dragan

Wouldn't it be nice if one byte was well-defined as 32 bits? Please list the
pros and cons of such a hypothetical thing in this thread.
 

Ian Collins

dragan said:
Wouldn't it be nice if one byte was well-defined as 32 bits? Please list the
pros and cons of such a hypothetical thing in this thread.
It isn't hypothetical on some platforms and it would be a royal pain in
the arse on others!
 

Marcel Müller

Hi,
Wouldn't it be nice if one byte was well-defined as 32 bits? Please list the
pros and cons of such a hypothetical thing in this thread.

pros: none
reason: nothing changes; you would only be using different terminology. You
still need 8 bits where 8 bits are required, and so on.

cons: a lot of confusion
reason: most people use 'byte' as a synonym for 8 bits, not for a unit
that could be of arbitrary length.


So why do you want to change the terminology? Or what do you expect to
change with that? There are already thousands of CPU designs around that
use 32 bits (or even more) as the machine word size.

Some portable standards like MPEG or Ogg avoided the word byte, since
its historic meaning was 'storage for one (Latin) character'. They
defined 'octet' instead. However, the meaning of byte has silently
changed with time and is now the same as octet. This is a common
evolution in living languages.


Marcel
 

Alf P. Steinbach

* Marcel Müller:
Some portable standards like MPEG or Ogg avoided the word byte, since
its historic meaning was 'storage for one (Latin) character'. They
defined 'octet' instead. However, the meaning of byte has silently
changed with time and is now the same as octet. This is a common
evolution in living languages.

In C++ the meaning of "byte" is smallest addressable unit of memory.
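
(A minimal illustration of that definition, nothing more: sizeof is
measured in bytes, and CHAR_BIT says how wide this platform's byte is.)

    #include <climits>
    #include <cstdio>

    int main()
    {
        // sizeof(char) is 1 by definition, whatever the byte width;
        // CHAR_BIT gives the number of bits in that one byte.
        std::printf("sizeof(char) = %u, CHAR_BIT = %d\n",
                    (unsigned)sizeof(char), CHAR_BIT);
        return 0;
    }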

Cheers & hth.,

- Alf
 

Rolf Magnus

Marcel said:
Hi,


pros: none
reason: nothing changes; you would only be using different terminology. You
still need 8 bits where 8 bits are required, and so on.

But you can't use them anymore, since there is no smaller unit than a byte.
cons: a lot of confusion
reason: most people use 'byte' as a synonym for 8 bits, not for a unit
that could be of arbitrary length.

Then those people are confused anyway, since that was never the definition
of "byte", either in C++ or in general.
So why do you want to change the terminology? Or what do you expect to
change with that? There are already thousands of CPU designs around that
use 32 bits (or even more) as the machine word size.

Some portable standards like MPEG or Ogg avoided the word byte, since
its historic meaning was 'storage for one (Latin) character'. They
defined 'octet' instead.

The use of 'octet' for a unit of eight bits is a lot older than MPEG or Ogg,
and it's common in computer standards. See
http://wapedia.mobi/en/Octet_(computing) for a brief history of it.
 

Johannes Bauer

dragan said:
Wouldn't it be nice if one byte was well-defined as 32 bits? Please list the
pros and cons of such a hypothetical thing in this thread.

Yeah, it would most definitely be a great thing! When we change that we
could also change "chair" to "car", "car" to "window" and "window" to
"chair". Maybe change the meaning of "seven" and "thirteen", but I'm
still unsure.

It would make life so much easier for everybody! What visionaries we are.

Regards,
Johannes

--
"Aus starken Potentialen können starke Erdbeben resultieren; es können
aber auch kleine entstehen - und "du" wirst es nicht für möglich halten
(!), doch sieh': Es können dabei auch gar keine Erdbeben resultieren."
-- "Rüdiger Thomas" alias Thomas Schulz in dsa über seine "Vorhersagen"
<1a30da36-68a2-4977-9eed-154265b17d28@q14g2000vbi.googlegroups.com>
 

Kaz Kylheku

Wouldn't it be nice if one byte was well-defined as 32 bits? Please list the
pros and cons of such a hypothetical thing in this thread.

Code which assumes that UCHAR_MAX is a small number will blow up.

int char_translation_table[UCHAR_MAX + 1]; // oops! 2^32 entries with 32-bit bytes

Communication with other machines and file formats is difficult, because
the rest of the world is ``byte == octet''.

When you open a binary file on this platform, you are reading 32 bit
unsigned chars. If the file came from another system, what happens
to its bytes? When we read a 32-bit byte of a JPEG file, are
we getting four octets of its header? In what order?

Maybe you will need some file mode for reading 8 bits at a time
from a file in a well-defined order.
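
(A rough sketch of what such unpacking might look like; the
most-significant-octet-first packing order here is an assumption,
not a given:)

    /* Split one 32-bit byte into the four octets it would carry. */
    void unpack_octets(unsigned long b, unsigned out[4])
    {
        out[0] = (b >> 24) & 0xFF;  /* assumed first octet in the file */
        out[1] = (b >> 16) & 0xFF;
        out[2] = (b >>  8) & 0xFF;
        out[3] =  b        & 0xFF;  /* assumed last octet */
    }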

What about networking? What will a packet look like when your
ethernet card receives it? Four octets to a byte?

Etc.
 

Kaz Kylheku

Yeah, it would most definitely be a great thing! When we change that we
could also change "chair" to "car", "car" to "window" and "window" to
"chair". Maybe change the meaning of "seven" and "thirteen", but I'm
still unsure.

The C and C++ standards do not define ``byte'' as ``8 bits''.

That is why in <limits.h>, or <climits> we have the CHAR_BIT constant.

http://en.wikipedia.org/wiki/36-bit

Now 9 bit bytes are workable. There is an obvious way in which external
data can be handled. If you download a file to the 9 bit system, or
receive a packet, its 8 bit bytes can just spread into 9 bit bytes;
the waste is small. When generating external data, do not generate
bytes in the range 256-511.
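
(A hedged sketch of that rule, using a hypothetical put_external_octet()
helper: when writing external data from a wide-byte system, trap any
value that would not survive the trip back to an 8-bit world.)

    #include <cassert>
    #include <cstdio>

    /* Write one octet of external data; values above 0xFF have no
       8-bit representation, so refuse them here. */
    void put_external_octet(std::FILE* f, unsigned value)
    {
        assert(value <= 0xFFu && "external data must fit in an octet");
        std::fputc((int)value, f);
    }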

With 32 bit bytes, if external data is treated the same way, it
quadruples in size.
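
(The arithmetic: storing one 8-bit octet per 32-bit byte uses 8 of
every 32 bits, i.e. four units of storage where one would do, hence
the factor of four.)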
 

dragan

dragan said:
Wouldn't it be nice if one byte was well-defined as 32 bits? Please
list the pros and cons of such a hypothetical thing in this thread.

I guess I'll have to help you all with the concept. :p A key part of this
exercise is: use of imagination. Apparently none of the responders have any
( ;) hehe). Maybe you'll give it another go with this further information
that I did not want to give right away, so that I wouldn't be coloring the
responses in any way. So, herein is some more information.

The key word in the OP is 'hypothetical'.

There are not many constraints in the OP. I was hoping that someone would
have IMAGINED what everything would be like today if a byte had been 32 bits
from the start, but all scenarios are fair game: there are no limits (or
there weren't? First responders have beaten the dead-horse clichés?). That
probably gets some imaginations going. (?) All who responded wrongly
assumed that the OP was suggesting changing the definition of "byte".
Perhaps the OP then is kind of like one of those drawings that contain
multiple scenes depending on how you are processing the information (and who
you are also, no doubt: your values, experiences and such): the pretty girl
or the witch-like woman.

That's probably enough information for a second round, if you want to "try
again" or "try" for the first time (there are no wrong responses, though the
ones so far have been clichéd or lame or something, IMO). So, let your
imagination run wild and post a response! :)
 

BGB / cr88192

dragan said:
I guess I'll have to help you all with the concept. :p A key part of this
exercise is: use of imagination. Apparently none of the responders have
any ( ;) hehe). Maybe you'll give it another go with this further
information that I did not want to give right away, so that I wouldn't be
coloring the responses in any way. So, herein is some more information.

The key word in the OP is 'hypothetical'.

There are not many constraints in the OP. I was hoping that someone would
have IMAGINED what everything would be like today if a byte had been 32
bits from the start, but all scenarios are fair game: there are no limits
(or there weren't? First responders have beaten the dead-horse clichés?).
That probably gets some imaginations going. (?) All who responded wrongly
assumed that the OP was suggesting changing the definition of "byte".
Perhaps the OP then is kind of like one of those drawings that contain
multiple scenes depending on how you are processing the information (and
who you are also, no doubt: your values, experiences and such): the pretty
girl or the witch-like woman.

That's probably enough information for a second round, if you want to "try
again" or "try" for the first time (there are no wrong responses, though
the ones so far have been clichéd or lame or something, IMO). So, let your
imagination run wild and post a response! :)

well, it is worth noting also that, among programmers, the SJ temperament is
apparently very common (some research apparently found it to be more common
than the NT temperament among programmers, which is still more common than
NF, with SP in last place...).

myself included, as an ESTJ apparently (MBTI, LSE is the socionics
equivalent...).

so, what then is the SJ stereotype?
errm... not having a whole lot of imagination, and maybe worrying about rules
and conventions (or, oddly, some sources say tending towards a systematic and
materialistic worldview, ...), ...

there are pros and cons I guess, or one can also assert that people are
people and can be whoever they want to be, granted...



ok then:
you would either need way the hell more RAM;
otherwise, craploads of shifting and masking to access any smaller
members (such as what we currently call bytes...).
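
(a rough sketch of that shifting and masking; get_field/set_field are
made-up names for accessing the n-th 8-bit field packed inside a single
32-bit storage unit:)

    unsigned get_field(unsigned word, int n)    /* n in 0..3 */
    {
        return (word >> (8 * n)) & 0xFFu;
    }

    unsigned set_field(unsigned word, int n, unsigned value)
    {
        unsigned shift = (unsigned)(8 * n);
        return (word & ~(0xFFu << shift)) | ((value & 0xFFu) << shift);
    }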

not even using 16 bits for character strings is all that compelling. why?
because, in the average case, more space is wasted than would be used
with UTF-8, even though the extended characters take more bytes.

this is because for "most" text, the characters fit nicely in 1 or 2 bytes.
it is then mostly just Asian languages which need more bytes, as most of the
other non-Latin alphabets (Greek, Cyrillic, Arabic, ...) happen to fall
nicely in the 2-byte break-even range.

it can also be observed that, in many uses, even for common forms of Asian
text, many Latin/... characters are present, and this may reduce the overall
cost of the expansion of the non-Latin chars (or even improve the net
size).

consider, for example, Chinese or Japanese source code, or even HTML, where
the vast majority of the characters are typically Latin (either part of the
source-code character set, or as markup tags, ...).

hence, in the general case, UTF-8 may be a denser encoding on average than
UTF-16 (for some hypothetical open-ended collection of text).
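
(a rough worked example, with invented proportions: a page that is 90%
ASCII markup and 10% CJK text averages about 0.9*1 + 0.1*3 = 1.2 bytes
per character in UTF-8, versus 0.9*2 + 0.1*2 = 2.0 bytes per character
in UTF-16.)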



similarly, there is little to be gained WRT performance, since most modern
CPUs handle memory in larger units anyways (I think many newer processors
generally work with data 128 bits at a time, and there is the trick that the
low-order address bits can be used to select the relevant sub-unit within a
wider access, allowing one to address smaller units in a single access,
although this trick generally requires data to be properly aligned, ...).
 

James Kanze

pros: none
reason: nothing changes; you would only be using different
terminology. You still need 8 bits where 8 bits are required, and so on.

Even on machines which don't support 8 bit accesses? There are
systems on which bytes are 32 bits; there are also some where
they are 9 bits, and historically, other values have existed.
(C and C++ require at least 8 bits, but historically, 6 and 7
were once common.)
cons: a lot of confusion
reason: most people use 'byte' as a synonym for 8 bits, not
for a unit that could be of arbitrary length.

Since when? The original use of 'byte' was for a 6 bit entity.
Most people use octet when exactly 8 bits are needed. In
everyday computer language, a "byte" is an addressable element
of memory smaller than a word.
So why do you want to change the terminology? Or what do you
expect to change with that? There are already thousands of CPU
designs around that use 32 bits (or even more) as the machine
word size.
Some portable standards like MPEG or Ogg avoided the word
byte, since its historic meaning was 'storage for one (Latin)
character'.

The historical meaning was an element smaller than a word (which
was mainly used for characters). That's the modern meaning as
well, outside of the C/C++ standards.
They defined 'octet' instead. However, the meaning of byte
has silently changed with time and is now the same as octet.
This is a common evolution in living languages.

Except that byte is still currently used for nine bit values as
well, in general computer language. And for 32 bit values in
some implementations of C and C++.
 

James Kanze

Code which assumes that UCHAR_MAX is a small number will blow up.
int char_translation_table[UCHAR_MAX + 1]; // oops!
Communication with other machines and file formats is
difficult, because the rest of the world is ``byte == octet''.

That's not really true. Today, the rest of the world does
assume that a byte can hold an octet, but that's about all. And
there are definitely machines around (still being built and
sold) today with 9 bit bytes.
When you open a binary file on this platform, you are reading
32 bit unsigned chars. If the file came from another system,
what happens to its bytes? When we read a 32-bit byte of
a JPEG file, are we getting four octets of its header? In what
order?

You probably get one octet per byte. (I've never actually used
a machine with 32 bit bytes, but supposedly, they exist.) Most
modern transmission protocols are defined in terms of octets
(although in the past, I worked on one which used 5 bit
elements).
 

dragan

BGB / cr88192 said:
well, it is worth noting also that, among programmers, the SJ temperament
is apparently very common (some research apparently found it to be more
common than the NT temperament among programmers, which is still more
common than NF, with SP in last place...).

myself included, as an ESTJ apparently (MBTI, LSE is the socionics
equivalent...).

so, what then is the SJ stereotype?
errm... not having a whole lot of imagination, and maybe worrying about
rules and conventions (or, oddly, some sources say tending towards a
systematic and materialistic worldview, ...), ...

there are pros and cons I guess, or one can also assert that people are
people and can be whoever they want to be, granted...

Myers-Briggs is a crock.
ok then:
you would either need way the hell more RAM;

Either that or...
otherwise, craploads of shifting and masking to access any smaller
members (such as what we currently call bytes...).

that. Definitely cons.
not even using 16 bits for character strings is all that compelling. why?
because, in the average case, more space is wasted than would be used
with UTF-8, even though the extended characters take more bytes.

Though it would eliminate a lot of gyrations that multi-byte encoding
algorithms require. The UNICODE fiasco was one of the things that was on my
mind when I posed "the question" in the OP. The other thing was the
elimination of padding and alignment issues on a wide class of platforms
(not those bizarre Sun things, RIP), sparked by another poster who was asking
about cross-platform struct transfer.

Of course "I can have my cake and eat it too": nothing stops anyone from
using only 32-bit integers. Maybe not a bad idea in composed structs,
especially if you are trying to avoid serializing.
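
A small sketch of that "only 32-bit integers" idea (Msg is a made-up
example): assuming the usual 4-byte alignment for int32_t, such a struct
has no internal padding, so only byte order differs across machines.

    #include <stdint.h>

    struct Msg {
        int32_t id;          /* every member is exactly 32 bits wide */
        int32_t flags;
        int32_t payload_len; /* sizeof(struct Msg) == 12, no padding */
    };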
this is because for "most" text, the characters fit nicely in 1 or 2
bytes. it is then mostly just Asian languages which need more bytes, as
most of the other non-Latin alphabets (Greek, Cyrillic, Arabic, ...)
happen to fall nicely in the 2-byte break-even range.

Yes, probably the "only" place 32-bit bytes would be practical is in the CJK
world.
it can also be observed that, in many uses, even for common forms of Asian
text, many Latin/... characters are present, and this may reduce the overall
cost of the expansion of the non-Latin chars (or even improve the net
size).

consider, for example, Chinese or Japanese source code, or even HTML,
where the vast majority of the characters are typically Latin (either part
of the source-code character set, or as markup tags, ...).

hence, in the general case, UTF-8 may be a denser encoding on average than
UTF-16 (for some hypothetical open-ended collection of text).



similarly, there is little to be gained WRT performance,

Performance was never considered to be an issue (by me). The remote thought
was "in search of" anything that would eliminate a whole area of constant
thought while programming, so as to simplify it by leaving more brain power
for important things. I think the other poster looking to alleviate some of
the tedium of serializing is on the right track, but I do think that that
has been worked on and solved formally long ago; I just can't remember
where that was done. There is a formal and comprehensive specification
for data exchange that I'm thinking of, but not XDR or ASN, I believe...
it slips my mind.
since most modern CPUs handle memory in larger units anyways (I think many
newer processors generally work with data 128 bits at a time,

Is that right? I thought mostly 64-bit now (assuming an OS to take advantage
of it) and specialized (SSE) instructions for 128-bit. I haven't read Intel
hardware specs lately, but I don't remember ever seeing anything about all
the registers being 128-bit.
 

aku ankka

Then those people are confused anyway, since that was never the definition
of "byte", either in C++ or in general.

That is an interesting conclusion; I see KB, MB, GB etc. used all over
the place by everybody! Some hard-core folks have been trying to
introduce KiB, MiB, etc. to replace the deceptive KB, MB, GB, TB and
their ilk (especially hard-drive manufacturers have found the
distinction between 10^3 and 2^10 very useful), but I digress. ;)
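
(To put a number on "deceptive": a "1 TB" drive holds 10^12 bytes,
while 1 TiB = 2^40 = 1,099,511,627,776 bytes, so the drive comes up
about 9% short of the binary unit its label resembles.)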

"just pointing out the obvious"
 

James Kanze

That is an interesting conclusion; I see KB, MB, GB etc. used all
over the place by everybody! Some hard-core folks have been
trying to introduce KiB, MiB, etc. to replace the deceptive
KB, MB, GB, TB and their ilk (especially hard-drive
manufacturers have found the distinction between 10^3 and 2^10
very useful), but I digress. ;)
"just pointing out the obvious"

The obvious what? I've only seen KB, MB, etc. used with regard
to specific hardware: my PC has 4MB main memory, etc. On most
specific hardware, the bytes have a specific size (8 bits on a
PC, although on the old PDP-10, the size of a byte was
programmable, with both 7 and 9 bit bytes being commonly used).
And on machines which aren't byte addressable, you probably
won't hear KB, MB etc. being used.
 

Nick Keighley

Most
modern transmission protocols are defined in terms of octets
(although in the past, I worked on one which used 5 bit
elements).

Baudot? Telex? AFTN?
 

Nick Keighley

I was hoping that someone would
have IMAGINED what everything would be like today if a byte had been 32 bits
from the start,

what I call overuse of the subjunctive

"I don't deal with "if". "if" is a little word that fills volumes.
  If a bullfrog had wings he wouldn't bump his rear everytime he
hopped"
                       Don King- boxing promoter
 

aku ankka

The obvious what?  I've only seen KB, MB, etc. used with regard
to specific hardware: my PC has 4MB main memory, etc.  On most
specific hardware, the bytes have a specific size (8 bits on a
PC, although on the old PDP-10, the size of a byte was
programmable, with both 7 and 9 bit bytes being commonly used).
And on machines which aren't byte addressable, you probably
won't hear KB, MB etc. being used.

"Then those people are confused anyway, since that was never
the definition of "byte", neither in C++, nor in general."

Are we discussing ideas or how the ideas are worded here? The de facto
standard for a byte is 8 bits, "everyone knows that".

What "everyone" doesn't know is that octet is 8 bits and that byte can
come in many sizes ; this is very specialized knowledge that a good
computer architect, software engineer and such discipline practitioner
*should* know.

It is poor form to say that "everyone" is confused, except for a
handful of select people. That's all. My sincerest apologies if I
thought this was FUCKING OBVIOUS. =) =)
 
