In which form are real and integer values saved in memory?


Tim Rentsch

Keith Thompson said:
It would be difficult to arrange that, though not impossible.

Depending on how this statement is meant, the proposition is
either trivially true or wrong.

Clearly a C implementation can use BCD to encode C values
using the simple trick of having one BCD digit per bit.
One octet (two nibbles) would hold two C "bits", or four
octets per char (with CHAR_BIT == 8).
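
For illustration only, here is a minimal sketch of that trick in
ordinary C, with hypothetical helpers bcd_pack_char and bcd_unpack_char
(not part of any real implementation): each C bit is stored as one BCD
digit, so an 8-bit char round-trips through four octets.

#include <stdio.h>

/* Pack the 8 bits of 'value' into 4 octets, one BCD digit (0 or 1)
 * per bit, two digits per octet. */
static void bcd_pack_char(unsigned char value, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned hi = (value >> (2 * i + 1)) & 1u;  /* odd-numbered bit  */
        unsigned lo = (value >> (2 * i)) & 1u;      /* even-numbered bit */
        out[i] = (unsigned char)((hi << 4) | lo);   /* two BCD digits    */
    }
}

/* Recover the original 8-bit value from the 4-octet encoding. */
static unsigned char bcd_unpack_char(const unsigned char in[4])
{
    unsigned value = 0;
    for (int i = 0; i < 4; i++) {
        value |= (in[i] & 1u) << (2 * i);            /* even-numbered bit */
        value |= ((in[i] >> 4) & 1u) << (2 * i + 1); /* odd-numbered bit  */
    }
    return (unsigned char)value;
}

int main(void)
{
    unsigned char packed[4];
    bcd_pack_char(0xA5, packed);
    printf("round trip: %#x\n", (unsigned)bcd_unpack_char(packed)); /* 0xa5 */
    return 0;
}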

However, if "representation" is meant in the sense that
the Standard uses the term (as in "object representation",
for example), integers cannot be represented using
BCD, as that is not consistent with other requirements
for representation of integers.
 

Keith Thompson

Stephen Sprunk said:
Keith Thompson wrote: [...]
Almost all modern character sets are based on ASCII, including
the ISO-8859 series and the various Unicode representations.
There are a few character sets that replace some ASCII characters
with accented letters, but I don't think they're used much anymore.

They're still used in Scandinavia, though UTF-8 is gradually making
inroads as more and more applications support it.
And of course we'll always have EBCDIC.

I was about to say "unfortunately", but then we'd be lacking a good
example of why it's a Bad Idea(tm) to assume all strings will be in
ASCII or at least a superset of ASCII...

On the other hand, if EBCDIC actually died (along with the more mildly
non-ASCII-compatible national variants), we wouldn't *need* it as a
Bad Idea(tm) example. The C standard could just specify the character
codes for the basic character set. But that's not going to happen for
a few decades, if ever.

I think there's an impression in some quarters that the C standard is
excessively flexible, allowing implementation-defined features that
are irrelevant in the real world. But in fact there are a number of
areas where the standard nails things down because all relevant
real-world implementations happen to be consistent -- such as
requiring CHAR_BIT >= 8, binary representation for integers, and one
of three specific schemes for representing signed integers. The
standard *could* have permitted 6-bit bytes, trinary representation,
and a bias scheme for signed integers, but there are no currently
relevant platforms that require those things.
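
Those constraints are even observable from inside a program. C99
6.2.6.2 permits exactly the three signed schemes mentioned, and the
low-order bits of -1 come out differently under each of them, so a
small sketch like this can report which one an implementation uses:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The standard guarantees at least 8 bits per byte. */
    printf("CHAR_BIT = %d\n", CHAR_BIT);

    /* The low-order bits of -1 distinguish the three permitted
     * signed integer representations. */
    switch (-1 & 3) {
    case 3: puts("two's complement");   break; /* -1 is all ones          */
    case 2: puts("ones' complement");   break; /* -1 is ~1                */
    case 1: puts("sign and magnitude"); break; /* -1 is sign bit plus one */
    }
    return 0;
}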
 

Keith Thompson

Tim Rentsch said:
Depending on how this statement is meant, the proposition is
either trivially true or wrong.

Clearly a C implementation can use BCD to encode C values
using the simple trick of having one BCD digit per bit.
One octet (two nibbles) would hold two C "bits", or four
octets per char (with CHAR_BIT == 8).

That's not what I had in mind.
However, if "representation" is meant in the sense that
the Standard uses the term (as in "object representation",
for example), integers cannot be represented using
BCD, as that is not consistent with other requirements
for representation of integers.

What I was thinking of was an implementation that uses BCD for integer
representation (so that a stored bit pattern of 0001 0101 represents
the value 15), but that goes through substantial gyrations to make it
*appear* to be using a pure binary representation. If you examined
the underlying representations of objects using, say, a debugger,
you'd see the BCD, but the BCD representation wouldn't leak out to
become visible in the behavior of any program that avoids undefined
behavior. The implementation would satisfy the "least requirements"
of C99 5.1.2.3p5.
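
For concreteness, here is a sketch of that packed-BCD encoding itself,
written as ordinary C functions (this is just an illustration of the
encoding, not the hypothetical hidden-BCD implementation): one decimal
digit per nibble, so the octet 0x15 stands for fifteen.

#include <stdio.h>

/* Encode 'value' as packed BCD, least significant digit in the
 * lowest nibble. */
static unsigned long long to_bcd(unsigned value)
{
    unsigned long long bcd = 0;
    int shift = 0;
    do {
        bcd |= (unsigned long long)(value % 10u) << shift;
        value /= 10u;
        shift += 4;
    } while (value != 0);
    return bcd;
}

/* Decode packed BCD back to an ordinary binary value. */
static unsigned from_bcd(unsigned long long bcd)
{
    unsigned value = 0, scale = 1;
    while (bcd != 0) {
        value += (unsigned)(bcd & 0xFu) * scale;
        scale *= 10u;
        bcd >>= 4;
    }
    return value;
}

int main(void)
{
    printf("15 encodes as %#llx\n", to_bcd(15));        /* 0x15 */
    printf("0x15 decodes as %u\n", from_bcd(0x15ull));  /* 15   */
    return 0;
}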

This would be a very difficult and silly thing to do, but I'm not
convinced it would violate the standard.

The counterargument is that the representation requirements of 6.2.6.1
must be met by the underlying representation, and that satisfying
5.1.2.3p5 is insufficient.

To put it another way, it should be possible to implement C on top of
a binary virtual machine running on BCD hardware.
 

Tim Rentsch

Keith Thompson said:
That's not what I had in mind.


What I was thinking of was an implementation that uses BCD for integer
representation (so that a stored bit pattern of 0001 0101 represents
the value 15), but that goes through substantial gyrations to make it
*appear* to be using a pure binary representation. If you examined
the underlying representations of objects using, say, a debugger,
you'd see the BCD, but the BCD representation wouldn't leak out to
become visible in the behavior of any program that avoids undefined
behavior. The implementation would satisfy the "least requirements"
of C99 5.1.2.3p5.

Yes, that would be another way of doing it. At the far end of the
spectrum, the entire program state could be represented as one
gigantic BCD-encoded integer (which would mean decoding/encoding
essentially the entire program state at every point). Just as silly
(or even sillier?), but of course the point is to ask whether it
can be done, not how useful it is.

This would be a very difficult and silly thing to do, but I'm not
convinced it would violate the standard.

The counterargument is that the representation requirements of 6.2.6.1
must be met by the underlying representation, and that satisfying
5.1.2.3p5 is insufficient.

To put it another way, it should be possible to implement C on top of
a binary virtual machine running on BCD hardware.

Right. BCD encoding (or any other computable encoding) is allowed
in the underlying hardware, but not in the C virtual machine.
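
That distinction is observable from inside the C virtual machine: a
program may inspect an object's bytes through unsigned char, and what
it sees must satisfy the pure binary requirement of 6.2.6.2 (modulo
padding bits and byte order, which vary between implementations). A
minimal sketch of such an inspection:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Copy out the object representation of the value 15.  The value
     * bits must use binary positional weights, so on typical systems
     * one byte reads 0x0f; the packed-BCD pattern 0x15 may not appear
     * as the visible encoding of the value. */
    unsigned int n = 15u;
    unsigned char bytes[sizeof n];
    memcpy(bytes, &n, sizeof n);

    for (size_t i = 0; i < sizeof n; i++)
        printf("byte %zu: %#04x\n", i, (unsigned)bytes[i]);
    return 0;
}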
 

Richard Bos

Keith Thompson said:
Note that Unicode is not an encoding. There are several encodings
used with Unicode, such as UTF-8 and UCS-4.

True now, but for some time there were no Unicode points beyond 0xFFFF,
and in that time (and on systems which still use that version of
Unicode, which AFAIK Windows XP does, for example) it is not
unreasonable to call UTF-16 the "Unicode" charset.

Richard
 

Ben Bacarisse

Richard Bos said:
True now, but for some time there were no Unicode points beyond 0xFFFF,
and in that time (and on systems which still use that version of
Unicode, which AFAIK Windows XP does, for example) it is not
unreasonable to call UTF-16 the "Unicode" charset.

Doing so still confuses the numbering with the encoding. As I am sure
you know, there are two UTF-16 encodings, and a Windows program can
usually read both, though one is presumably preferred internally.
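
The two serializations differ only in byte order (UTF-16BE versus
UTF-16LE, usually signalled by a byte order mark); a tiny sketch of
the difference for one BMP code point, U+00E5:

#include <stdio.h>

int main(void)
{
    /* U+00E5, LATIN SMALL LETTER A WITH RING ABOVE, is one 16-bit
     * code unit; only the byte order differs between the two forms. */
    unsigned cu = 0x00E5u;

    printf("UTF-16BE: %02X %02X\n", (cu >> 8) & 0xFFu, cu & 0xFFu); /* 00 E5 */
    printf("UTF-16LE: %02X %02X\n", cu & 0xFFu, (cu >> 8) & 0xFFu); /* E5 00 */
    return 0;
}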

I know *you* are not confused by this distinction, but I'd say it is
unreasonable to use the term that way because it confuses others.
 

Nobody

Richard Bos said:
True now, but for some time there were no Unicode points beyond 0xFFFF,
and in that time (and on systems which still use that version of
Unicode, which AFAIK Windows XP does, for example) it is not
unreasonable to call UTF-16 the "Unicode" charset.

1. Keith said "encodings", not "charsets".

2. If you limit yourself to Unicode 1.0, the character set is the BMP
(Basic Multilingual Plane).

3. The "raw" 16-bit encoding of the BMP is UCS-2; UTF-16 can encode up to
U+10FFFF using surrogates. These two are frequently confused, and given
that characters outside the BMP are uncommon, you can't generally deduce
which is correct based upon sample data.
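
A short sketch of that distinction in C: a BMP code point takes a
single 16-bit code unit (the same in UCS-2 and UTF-16), while anything
above U+FFFF needs a UTF-16 surrogate pair and cannot be expressed in
UCS-2 at all. The code point chosen for the non-BMP case is arbitrary.

#include <stdio.h>

static void utf16_encode(unsigned long cp)
{
    if (cp <= 0xFFFFul) {
        /* BMP code point: one code unit, identical in UCS-2 and UTF-16
         * (assuming cp is not itself in the surrogate range). */
        printf("U+%04lX -> %04lX\n", cp, cp);
    } else {
        /* Outside the BMP: representable only as a surrogate pair. */
        unsigned long v  = cp - 0x10000ul;
        unsigned long hi = 0xD800ul + (v >> 10);     /* high surrogate */
        unsigned long lo = 0xDC00ul + (v & 0x3FFul); /* low surrogate  */
        printf("U+%04lX -> %04lX %04lX\n", cp, hi, lo);
    }
}

int main(void)
{
    utf16_encode(0x00E5ul);   /* BMP: single code unit           */
    utf16_encode(0x1F600ul);  /* outside the BMP: surrogate pair */
    return 0;
}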
 
