In which form are real and integer values saved in memory?


Tim Rentsch

Keith Thompson said:
It would be difficult to arrange that, though not impossible.

Depending on how this statement is meant, the proposition is
either trivially true or wrong.

Clearly a C implementation can use BCD to encode C values
using the simple trick of having one BCD digit per bit.
One octet (two nibbles) would hold two C "bits", or four
octets per char (with CHAR_BIT == 8).
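
For illustration only, here is a minimal sketch of that trick in
ordinary C, with hypothetical helpers bcd_pack_char and bcd_unpack_char
(not part of any real implementation): each C bit is stored as one BCD
digit, so an 8-bit char round-trips through four octets.

#include <stdio.h>

/* Pack the 8 bits of 'value' into 4 octets, one BCD digit (0 or 1)
 * per bit, two digits per octet. */
static void bcd_pack_char(unsigned char value, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned hi = (value >> (2 * i + 1)) & 1u;  /* odd-numbered bit  */
        unsigned lo = (value >> (2 * i)) & 1u;      /* even-numbered bit */
        out[i] = (unsigned char)((hi << 4) | lo);   /* two BCD digits    */
    }
}

/* Recover the original 8-bit value from the 4-octet encoding. */
static unsigned char bcd_unpack_char(const unsigned char in[4])
{
    unsigned value = 0;
    for (int i = 0; i < 4; i++) {
        value |= (in[i] & 1u) << (2 * i);            /* even-numbered bit */
        value |= ((in[i] >> 4) & 1u) << (2 * i + 1); /* odd-numbered bit  */
    }
    return (unsigned char)value;
}

int main(void)
{
    unsigned char packed[4];
    bcd_pack_char(0xA5, packed);
    printf("round trip: %#x\n", (unsigned)bcd_unpack_char(packed)); /* 0xa5 */
    return 0;
}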

However, if "representation" is meant in the sense that
the Standard uses the term (as in "object representation",
for example), integers cannot be represented using
BCD, as that is not consistent with other requirements
for representation of integers.
 

Keith Thompson

Stephen Sprunk said:
Keith Thompson wrote: [...]
Almost all modern character sets are based on ASCII, including
the ISO-8859 series and the various Unicode representations.
There are a few character sets that replace some ASCII characters
with accented letters, but I don't think they're used much anymore.

They're still used in Scandinavia, though UTF-8 is gradually making
inroads as more and more applications support it.
And of course we'll always have EBCDIC.

I was about to say "unfortunately", but then we'd be lacking a good
example of why it's a Bad Idea(tm) to assume all strings will be in
ASCII or at least a superset of ASCII...

On the other hand, if EBCDIC actually died (along with the more mildly
non-ASCII-compatible national variants), we wouldn't *need* it as a
Bad Idea(tm) example. The C standard could just specify the character
codes for the basic character set. But that's not going to happen for
a few decades, if ever.

I think there's an impression in some quarters that the C standard is
excessively flexible, allowing implementation-defined features that
are irrelevant in the real world. But in fact there are a number of
areas where the standard nails things down because all relevant
real-world implementations happen to be consistent -- such as
requiring CHAR_BIT >= 8, binary representation for integers, and one
of three specific schemes for representing signed integers. The
standard *could* have permitted 6-bit bytes, trinary representation,
and a bias scheme for signed integers, but there are no currently
relevant platforms that require those things.
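
Those constraints are even observable from inside a program. C99
6.2.6.2 permits exactly the three signed schemes mentioned, and the
low-order bits of -1 come out differently under each of them, so a
small sketch like this can report which one an implementation uses:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The standard guarantees at least 8 bits per byte. */
    printf("CHAR_BIT = %d\n", CHAR_BIT);

    /* The low-order bits of -1 distinguish the three permitted
     * signed integer representations. */
    switch (-1 & 3) {
    case 3: puts("two's complement");   break; /* -1 is all ones          */
    case 2: puts("ones' complement");   break; /* -1 is ~1                */
    case 1: puts("sign and magnitude"); break; /* -1 is sign bit plus one */
    }
    return 0;
}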
 

Keith Thompson

Tim Rentsch said:
Depending on how this statement is meant, the proposition is
either trivially true or wrong.

Clearly a C implementation can use BCD to encode C values
using the simple trick of having one BCD digit per bit.
One octet (two nibbles) would hold two C "bits", or four
octets per char (with CHAR_BIT == 8).

That's not what I had in mind.
However, if "representation" is meant in the sense that
the Standard uses the term (as in "object representation",
for example), integers cannot be represented using
BCD, as that is not consistent with other requirements
for representation of integers.

What I was thinking of was an implementation that uses BCD for integer
representation (so that a stored bit pattern of 0001 0101 represents
the value 15), but that goes through substantial gyrations to make it
*appear* to be using a pure binary representation. If you examined
the underlying representations of objects using, say, a debugger,
you'd see the BCD, but the BCD representation wouldn't leak out to
become visible in the behavior of any program that avoids undefined
behavior. The implementation would satisfy the "least requirements"
of C99 5.1.2.3p5.
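
For concreteness, here is a sketch of that packed-BCD encoding itself,
written as ordinary C functions (this is just an illustration of the
encoding, not the hypothetical hidden-BCD implementation): one decimal
digit per nibble, so the octet 0x15 stands for fifteen.

#include <stdio.h>

/* Encode 'value' as packed BCD, least significant digit in the
 * lowest nibble. */
static unsigned long long to_bcd(unsigned value)
{
    unsigned long long bcd = 0;
    int shift = 0;
    do {
        bcd |= (unsigned long long)(value % 10u) << shift;
        value /= 10u;
        shift += 4;
    } while (value != 0);
    return bcd;
}

/* Decode packed BCD back to an ordinary binary value. */
static unsigned from_bcd(unsigned long long bcd)
{
    unsigned value = 0, scale = 1;
    while (bcd != 0) {
        value += (unsigned)(bcd & 0xFu) * scale;
        scale *= 10u;
        bcd >>= 4;
    }
    return value;
}

int main(void)
{
    printf("15 encodes as %#llx\n", to_bcd(15));        /* 0x15 */
    printf("0x15 decodes as %u\n", from_bcd(0x15ull));  /* 15   */
    return 0;
}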

This would be a very difficult and silly thing to do, but I'm not
convinced it would violate the standard.

The counterargument is that the representation requirements of 6.2.6.1
must be met by the underlying representation, and that satisfying
5.1.2.3p5 is insufficient.

To put it another way, it should be possible to implement C on top of
a binary virtual machine running on BCD hardware.
 

Tim Rentsch

Keith Thompson said:
That's not what I had in mind.


What I was thinking of was an implementation that uses BCD for integer
representation (so that a stored bit pattern of 0001 0101 represents
the value 15), but that goes through substantial gyrations to make it
*appear* to be using a pure binary representation. If you examined
the underlying representations of objects using, say, a debugger,
you'd see the BCD, but the BCD representation wouldn't leak out to
become visible in the behavior of any program that avoids undefined
behavior. The implementation would satisfy the "least requirements"
of C99 5.1.2.3p5.

Yes, that would be another way of doing it. At the far end of the
spectrum, the entire program state could be represented as one
gigantic BCD-encoded integer (which would mean decoding/encoding
essentially the entire program state at every point). Just as silly
(or even sillier?), but of course the point is to ask whether it
can be done, not how useful it is.

This would be a very difficult and silly thing to do, but I'm not
convinced it would violate the standard.

The counterargument is that the representation requirements of 6.2.6.1
must be met by the underlying representation, and that satisfying
5.1.2.3p5 is insufficient.

To put it another way, it should be possible to implement C on top of
a binary virtual machine running on BCD hardware.

Right. BCD encoding (or any other computable encoding) is allowed
in the underlying hardware, but not in the C virtual machine.
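
That distinction is observable from inside the C virtual machine: a
program may inspect an object's bytes through unsigned char, and what
it sees must satisfy the pure binary requirement of 6.2.6.2 (modulo
padding bits and byte order, which vary between implementations). A
minimal sketch of such an inspection:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Copy out the object representation of the value 15.  The value
     * bits must use binary positional weights, so on typical systems
     * one byte reads 0x0f; the packed-BCD pattern 0x15 may not appear
     * as the visible encoding of the value. */
    unsigned int n = 15u;
    unsigned char bytes[sizeof n];
    memcpy(bytes, &n, sizeof n);

    for (size_t i = 0; i < sizeof n; i++)
        printf("byte %zu: %#04x\n", i, (unsigned)bytes[i]);
    return 0;
}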
 

Richard Bos

Keith Thompson said:
Note that Unicode is not an encoding. There are several encodings
used with Unicode, such as UTF-8 and UCS-4.

True now, but for some time there were no Unicode points beyond 0xFFFF,
and in that time (and on systems which still use that version of
Unicode, which AFAIK Windows XP does, for example) it is not
unreasonable to call UTF-16 the "Unicode" charset.

Richard
 

Ben Bacarisse

Richard Bos said:
True now, but for some time there were no Unicode points beyond 0xFFFF,
and in that time (and on systems which still use that version of
Unicode, which AFAIK Windows XP does, for example) it is not
unreasonable to call UTF-16 the "Unicode" charset.

Doing so still confuses the numbering with the encoding. As I am sure
you know, there are two UTF-16 encodings, and a Windows program can
usually read both, though one is presumably preferred internally.
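
The two serializations differ only in byte order (UTF-16BE versus
UTF-16LE, usually signalled by a byte order mark); a tiny sketch of
the difference for one BMP code point, U+00E5:

#include <stdio.h>

int main(void)
{
    /* U+00E5, LATIN SMALL LETTER A WITH RING ABOVE, is one 16-bit
     * code unit; only the byte order differs between the two forms. */
    unsigned cu = 0x00E5u;

    printf("UTF-16BE: %02X %02X\n", (cu >> 8) & 0xFFu, cu & 0xFFu); /* 00 E5 */
    printf("UTF-16LE: %02X %02X\n", cu & 0xFFu, (cu >> 8) & 0xFFu); /* E5 00 */
    return 0;
}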

I know *you* are not confused by this distinction, but I'd say it is
unreasonable to use the term that way because it confuses others.
 

Nobody

Richard Bos said:
True now, but for some time there were no Unicode points beyond 0xFFFF,
and in that time (and on systems which still use that version of
Unicode, which AFAIK Windows XP does, for example) it is not
unreasonable to call UTF-16 the "Unicode" charset.

1. Keith said "encodings", not "charsets".

2. If you limit yourself to Unicode 1.0, the character set is the BMP
(Basic Multilingual Plane).

3. The "raw" 16-bit encoding of the BMP is UCS-2; UTF-16 can encode up to
U+10FFFF using surrogates. These two are frequently confused, and given
that characters outside the BMP are uncommon, you can't generally deduce
which is correct based upon sample data.
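
A short sketch of that distinction in C: a BMP code point takes a
single 16-bit code unit (the same in UCS-2 and UTF-16), while anything
above U+FFFF needs a UTF-16 surrogate pair and cannot be expressed in
UCS-2 at all. The code point chosen for the non-BMP case is arbitrary.

#include <stdio.h>

static void utf16_encode(unsigned long cp)
{
    if (cp <= 0xFFFFul) {
        /* BMP code point: one code unit, identical in UCS-2 and UTF-16
         * (assuming cp is not itself in the surrogate range). */
        printf("U+%04lX -> %04lX\n", cp, cp);
    } else {
        /* Outside the BMP: representable only as a surrogate pair. */
        unsigned long v  = cp - 0x10000ul;
        unsigned long hi = 0xD800ul + (v >> 10);     /* high surrogate */
        unsigned long lo = 0xDC00ul + (v & 0x3FFul); /* low surrogate  */
        printf("U+%04lX -> %04lX %04lX\n", cp, hi, lo);
    }
}

int main(void)
{
    utf16_encode(0x00E5ul);   /* BMP: single code unit           */
    utf16_encode(0x1F600ul);  /* outside the BMP: surrogate pair */
    return 0;
}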
 
