char type size

Francis Moreau

Hello,

I'm wondering if I can find a (realistic) implementation of C where
the char type size is different from 8 bits ?

The C spec seems to say that a char must be able to hold any member of
the basic execution character set and must have at least 8 bits.

But I'm not sure whether C claims that the size of char is exactly one
byte. If so, that would mean that any implementation must have a byte
whose size is at least 8 bits. This is actually the case for all
architectures I am aware of, so why doesn't the spec simply say that
char _is_ 8 bits ?

BTW, how many members are there in the basic execution character set ?

Again the C spec says:

In a character constant or string literal, members of the execution
character set shall be represented by corresponding members of the
source character set or by escape sequences consisting of the
backslash \ followed by one or more characters.

The last part of this section ("backslash followed by one or more
characters") confuses me, since it seems to make the number of members
undefined.

Could anybody give me some clues ?

thanks
 
Ian Collins

Francis said:
Hello,

I'm wondering if I can find a (realistic) implementation of C where
the char type size is different from 8 bits ?

Any number of DSPs.
 
Boon

Francis said:
I'm wondering if I can find a (realistic) implementation of C where
the char type size is different from 8 bits ?

The C spec seems to say that a char must be able to hold any member of
the basic execution character set and must have at least 8 bits.

But I'm not sure whether C claims that the size of char is exactly one
byte. If so, that would mean that any implementation must have a byte
whose size is at least 8 bits. This is actually the case for all
architectures I am aware of, so why doesn't the spec simply say that
char _is_ 8 bits ?

NB : other standards may /require/ that bytes be, in fact, octets,
i.e. that CHAR_BIT be 8 (e.g. POSIX).

http://www.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html

POSIX also requires int to be at least 32 bits wide.

Regards.
 
jacob navia

Francis said:
Hello,

I'm wondering if I can find a (realistic) implementation of C where
the char type size is different from 8 bits ?

In an implementation of lcc-win for a DSP we have char, short, and
int all 16 bits wide, so CHAR_BIT == 16 and
sizeof(char) == sizeof(short) == sizeof(int) == 1.

The machine can't access 8 bits and always accesses 16 bits.

long is 32 bits wide (sizeof(long) == 2). No floating point.
 
Francis Moreau

In an implementation of lcc-win for a DSP we have char, short, and
int all 16 bits wide, so CHAR_BIT == 16 and
sizeof(char) == sizeof(short) == sizeof(int) == 1.

The machine can't access 8 bits and always accesses 16 bits.

long is 32 bits wide (sizeof(long) == 2). No floating point.

I see

Thanks
 
Keith Thompson

Jack Klein said:
On Tue, 31 Mar 2009 01:39:25 -0700 (PDT), Francis Moreau


C does not define an exact number, it is implementation-defined. That
means a compiler's documentation must specify what it is.

If I'm reading C99 5.2.1 correctly, you're mistaken, probably due to a
simple confusion of terminology.

The "basic execution character set" contains exactly 96 characters
(unless I've miscounted), including the null character, the uppercase
and lowercase Latin letters, the decimal digits, 29 listed graphic
characters, the space character, and control characters representing
horizontal tab, vertical tab, and form feed.

(This happens to be the set of printable ASCII characters, minus '$',
'@', and '`', plus null, horizontal tab, vertical tab, and form feed.)

The "execution character set" consists of the fixed "basic execution
character set" plus the variable, locale-specific, and possibly empty
set of "extended characters". The full "execution character set" is
also called the "extended execution character set". (Confusingly, the
extended character set is not the set of extended characters.)

The same applies to the (basic|extended) source character set, except
that the basic source character set doesn't include the null
character, and the respective extended character sets needn't be the
same.

[...]
 
Francis Moreau

Right off the top of my head, Texas Instruments Digital Signal
Processors in the 28xx family.  I use them quite a lot.  char, short,
and int are all 16 bits wide (CHAR_BIT == 16), so sizeof(char),
sizeof(short), and sizeof(int) are all 1.  long has 32 bits and size 2;
long long has 64 bits and size 4.

We used to use a DSP family called SHARC from Analog Devices, on a now
discontinued product.  It had char, short, int and long all 32 bits
and all 1 byte in size.

Thanks for these examples.
Yes, that's correct.


Yes, C requires and guarantees, not claims, that sizeof(char) is
exactly one byte.  However C does not, and never has, stated that a
byte is an "octet", the precise term for a data type consisting of
exactly 8 bits.

It becomes more confusing over time as imprecise usage has come to
cause most people, at least in English-speaking countries, to think
that a byte is always 8 bits, but that is not the C definition of
byte.

I think what is really confusing is that to get the whole definition
of the char, we need to parse a lot of different sections in the spec.
What you are missing is the other requirements that C places on what
it defines as a byte:

No, I did read that.
 
Francis Moreau

If I'm reading C99 5.2.1 correctly, you're mistaken, probably due to a
simple confusion of terminology.

The "basic execution character set" contains exactly 96 characters
(unless I've miscounted), including the null character, the uppercase
and lowercase Latin letters, the decimal digits, 29 listed graphic
characters, the space character, and control characters representing
horizontal tab, vertical tab, and form feed.

(This happens to be the set of printable ASCII characters, minus '$',
'@', and '`', plus null, horizontal tab, vertical tab, and form feed.)

Hmm that's true.

Does it mean that using a string including the '@' character in a C
program is not portable ?
The "execution character set" consists of the fixed "basic execution
character set" plus the variable, locale-specific, and possibly empty
set of "extended characters".  The full "execution character set" is
also called the "extended execution character set".  (Confusingly, the
extended character set is not the set of extended characters.)

The same applies to the (basic|extended) source character set, except
that the basic source character set doesn't include the null
character, and the respective extended character sets needn't be the
same.

Am I right to claim the following ?

basic-execution-character-set = basic-source-character-set + null-char
+ alert + backspace + carriage-return + new-line
extended-source-character-set = locale-specific
extended-execution-character-set = locale-specific

thanks
 
Richard Bos

Francis Moreau said:
Hmm that's true.

Does it mean that using a string including the '@' character in a C
program is not portable ?

Yes. So is one using '$'. Both of these have surprised a good many
people in the past, and I believe at least some historical Ebbydick-
using IBM machines had indeed no '@' (though, being IBM, they must have
had a '$', of course).

Richard
 
Keith Thompson

Keith Thompson said:
If I'm reading C99 5.2.1 correctly, you're mistaken, probably due to a
simple confusion of terminology.

The "basic execution character set" contains exactly 96 characters
(unless I've miscounted), including the null character, the uppercase
and lowercase Latin letters, the decimal digits, 29 listed graphic
characters, the space character, and control characters representing
horizontal tab, vertical tab, and form feed.

(This happens to be the set of printable ASCII characters, minus '$',
'@', and '`', plus null, horizontal tab, vertical tab, and form feed.)
[...]

Francis Moreau reminds me that I missed a few by not reading far
enough in 5.2.1. The "basic execution character set" additionally
contains alert, backspace, carriage return, and new line.

[...]
 
Barry Schwarz

snip
Yes. So is one using '$'. Both of these have surprised a good many
people in the past, and I believe at least some historical Ebbydick-
using IBM machines had indeed no '@' (though, being IBM, they must have
had a '$', of course).

The first EBCDIC machine I used was a 360 mod 40 in the mid 60s and it
had the @ character. Maybe you know of an earlier one that didn't.
 
Francis Moreau

Yes. So is one using '$'.


Hmm now I'm not sure anymore after reading from 5.2.1.{3}:

If any other characters are encountered in a source file (except in
an identifier, a character constant, a string literal, a header name,
a comment, or a preprocessing token that is never converted to a
token), the behavior is undefined.

Note the "except in a string literal."

Could anybody shed some light ?

Thanks
 
Keith Thompson

Francis Moreau said:
Hmm now I'm not sure anymore after reading from 5.2.1.{3}:

If any other characters are encountered in a source file (except in
an identifier, a character constant, a string literal, a header name,
a comment, or a preprocessing token that is never converted to a
token), the behavior is undefined.

Note the "except in a string literal."

Could anybody shed some light ?

Hmm.

If the source and execution character sets include the '@' character
(which may or may not be the case for a given implementation), then
the behavior of a program with an '@' character in a character
constant or string literal is well defined. So the standard doesn't
give implementations license to do anything other than the obvious;
putchar('@') must, for such an implementation, print an '@' character
to stdout.

But the program isn't portable, in the sense that there's no guarantee
that it can be ported. If, for example, you copy the source file to a
machine with a different character set, it has to be translated; if
the target machine's character set doesn't include the '@' character,
then that translation is going to be a problem.

(It's also conceivable that an *implementation's* character set
doesn't include '@' even though the underlying machine supports it.
In that case, there can be an '@' character in a source file, but the
program's behavior is undefined; the implementation can do anything it
likes. But that's approaching DS9K territory.)
 
Francis Moreau

Hmm.

If the source and execution character sets include the '@' character
(which may or may not be the case for a given implementation), then
the behavior of a program with an '@' character in a character
constant or string literal is well defined.  So the standard doesn't
give implementations license to do anything other than the obvious;
putchar('@') must, for such an implementation, print an '@' character
to stdout.

Ok.

All of this means that the (source|execution) character set is not
fixed by the standard, or rather that it is locale-specific.

So using the '@' character, for example, is valid as long as the local
conventions define it, but may no longer be valid if the source is used
where the local conventions don't define '@'. So it's basically not
portable, but the behaviour is undefined.

But I still don't see the point of the part of 5.2.1.{3}, which says:

... (except in an identifier, a character constant,
a string literal, a header name, a comment, or a
preprocessing token that is never converted to a
token) ...

I don't see why all of them are exceptions.
 
Dik T. Winter

> On Wed, 01 Apr 2009 09:21:18 GMT, (e-mail address removed) (Richard Bos)
> wrote: ....
>
> The first EBCDIC machine I used was a 360 mod 40 in the mid 60s and it
> had the @ character. Maybe you know of an earlier one that didn't.

As far as I know there are different code-pages for EBCDIC, and it is quite
possible that in some of them '@' is missing.
 
Lew Pitcher

As far as I know there are different code-pages for EBCDIC, and it is
quite possible that in some of them '@' is missing.

According to a mirror of the ISO/IEC JTC1/SC2 (International Charactersets)
charactermap website (http://anubis.dkuug.dk/i18n/charmaps/),
EBCDIC-AT-DE, EBCDIC-BE, EBCDIC-BR, EBCDIC-CP-GR, EBCDIC-DK-NO,
EBCDIC-FI-SE, EBCDIC-FR, EBCDIC-IS-FRISS, EBCDIC-IT, and EBCDIC-PT all lack
the "Commercial AT" ('@') sign. The other EBCDIC charactersets map the '@'
to various codepoints (most commonly 0x7c, but also 0x80, 0xec, 0x44, 0xac,
0xb5, or 0xaf) on a characterset-by-characterset basis.


--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
