basic source character set

borophyll

Hi

Please let me know if I have this clear. The basic source character
set is the list of (96) characters that all implementations must have
in their vocabulary. All other characters recognized by an
implementation are implementation defined, and will not necessarily be
the same across implementations. The key issue as far as developers
are concerned is that if they want their code to be perfectly
portable, then they must restrict their source files to using only
characters from the basic source character set, or use universal
character names to insert characters outside of the basic source
character set.

For example, the following code is not strictly portable:

char *str = "$";

since the "$" character is not a member of the basic source character
set. To make it portable, you would need to do the following:

char *str = "\u0024";

regards, B.
 
Richard Heathfield

(e-mail address removed) said:
> Hi
>
> Please let me know if I have this clear. The basic source character
> set is the list of (96) characters that all implementations must have
> in their vocabulary. All other characters recognized by an
> implementation are implementation defined, and will not necessarily be
> the same across implementations. The key issue as far as developers
> are concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using only
> characters from the basic source character set, or use universal
> character names to insert characters outside of the basic source
> character set.

Yes, that's basically it. In practice, I think you'll be okay with all
the printable characters that are in the common subset of ASCII and
EBCDIC, although I await correction on the matter from those who have
used conforming C implementations that employ more esoteric source
character sets. Unfortunately, however, AFAICT this only extends the
basic character set by two: $ and @.

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source character
> set.

Strictly speaking, you are correct, yes. Of course, you can /read/ a '$'
character from an open stream at runtime without any trouble at all, if
one happens to be present and is representable as an unsigned char.
 
CBFalconer

> Please let me know if I have this clear. The basic source
> character set is the list of (96) characters that all
> implementations must have in their vocabulary. All other
> characters recognized by an implementation are implementation
> defined, and will not necessarily be the same across
> implementations. The key issue as far as developers are
> concerned is that if they want their code to be perfectly
> portable, then they must restrict their source files to using
> only characters from the basic source character set, or use
> universal character names to insert characters outside of the
> basic source character set.

Not quite. Including space, there are 92 printing chars in the
basic set (not 96). Chars such as $ are language dependent, and
may therefore be different on other machines. Other missing chars
are '@', '`' and the rubout (hex 7f in ASCII). The following is an
extract from N869:

[#3] Both the basic source and basic execution character
sets shall have at least the following members: the 26
uppercase letters of the Latin alphabet

A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z

the 26 lowercase letters of the Latin alphabet

a b c d e f g h i j k l m
n o p q r s t u v w x y z

the 10 decimal digits

0 1 2 3 4 5 6 7 8 9

the following 29 graphic characters

! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~

the space character, and control characters representing
horizontal tab, vertical tab, and form feed. The
 
Army1987

> Not quite. Including space, there are 92 printing chars in the
> basic set (not 96).

He did not specify "printing characters", so he's only off by one.
[...]
 
Peter Nilsson

> ...if [developers] want their code to be perfectly
> portable, then they must restrict their source files to
> using only characters from the basic source character set,

Yes.

> or use universal character names to insert characters
> outside of the basic source character set.

If you have a supporting compiler.

> For example, the following code is not strictly portable:
>
> char *str = "$";
>
> since the "$" character is not a member of the basic source
> character set.

Correct.

> To make it portable, you would need to do the following
>
> char *str = "\u0024";

That's fine for the source, but it won't actually help you
when the program executes. There is still no guarantee that
the dollar sign is a member of the execution character set,
even though you can now 'name' it.

You'll get a dollar sign on the systems that have them, but
you'll get an implementation defined character on the systems
that don't.

Given that programs that _need_ $ and @ invariably need 'A'
to be 65 as well, you might as well go ahead and use them in
the source.

[Aside: One of the pre-standard drafts of C99 actually
precluded the naming of $ and @ with universal character
escapes. Fortunately, someone alerted the Committee to
their apparent use in some circles. :-]
 
Richard Heathfield

Peter Nilsson said:

> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

But this is not true. I've worked on a number of programs that needed a
'$' but which were quite happy for 'A' to have a non-65 code point (and
it's just as well, since they often had to run on systems where 'A' was
in fact not 65).
 
Richard Bos

Peter Nilsson said:
> Given that programs that _need_ $ and @ invariably need 'A'
> to be 65 as well, you might as well go ahead and use them in
> the source.

A large amount of accounting software written to run on IBM systems
would be surprised to hear that (though I don't know whether any of that
software was written in C).

Richard
 
