alphabet q

Joe Smith · May 24, 2006

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

Kenneth Brody · May 24, 2006

Joe said:
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

Define "alphabet".

Also, a quick scan shows no '@' or '$'.


Walter Roberson · May 24, 2006

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

You don't need to backslash the bar character in any literal form.

You don't need to backslash the single-quote within a string literal.

You don't need to backslash the question-mark in that context,
only if it is followed by another question-mark (in which case
triglyphs would start coming into play).
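Walter's three points about escapes can be checked with a small sketch. The array and function names are made up for illustration. A lone ? is the same character as \?. A | needs no escape. Where two question marks sit side by side, writing the second as \? keeps the pair from being read as a trigraph such as ??! (which means |) or ??/ (which means \).

```c
#include <string.h>

/* A lone question mark needs no escape: both arrays hold "why?". */
static const char qmark_plain[]   = "why?";
static const char qmark_escaped[] = "why\?";

/* The vertical bar needs no escape either. (Backslash-bar is not one of
   the escape sequences C defines, so it should not be written at all.) */
static const char bar[] = "a|b";

/* Two adjacent question marks: escaping the second one means the pair
   can never begin a trigraph, whatever character follows. */
static const char two_qmarks_then_bang[] = "?\?!";

/* Returns nonzero if the two strings have identical contents. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```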

With regard to that last string: in the source character set,
space and the control characters representing horizontal tab,
vertical tab, and form feed are valid, but newline is not
explicitly valid -- only "some way of indicating the
end of each line of text"; the standard treats each end of line
as if it were a single new-line character. If you happened to be
using a system with fixed-length records, then an occurrence
of a newline in the source would not necessarily be valid! Likewise,
carriage return is not explicitly valid in source text except to the
extent that that particular system happens to include carriage return
in the end of line indicator.

Eric Sosman · May 24, 2006

Joe Smith wrote On 05/24/06 14:06,:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

No {obviously}. Why do you ask?

Richard Heathfield · May 24, 2006

Kenneth Brody said:

Joe said:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

Define "alphabet".

Also, a quick scan shows no '@' or '$'.

Neither of those appears in the guaranteed C source character set.

Walter Roberson · May 24, 2006

Joe Smith wrote:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"
Do the above string literals comprise an alphabet for C? joe

Define "alphabet".

Also, a quick scan shows no '@' or '$'.

Neither @ nor $ is part of C's basic source character set. Any
occurrence of either outside of a character constant,
string literal, header name, comment, or
"preprocessing token that is never converted to a token" (whatever that
means), results in undefined behaviour.

John Devereux · May 24, 2006

Kenneth Brody said:
Joe said:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

Define "alphabet".

Also, a quick scan shows no '@' or '$'.

Hmm, these are not "needed" for C are they...

Seems a waste of good characters to me. C does make admirable use of
the symbols that most unenlightened folk would have no clue where to
find on their keyboard. But @ and $, also ` and ¬ are sadly neglected
by the C language, in my opinion.

Surely this situation cries out for some new operators to be added to
the language.

Eric Sosman · May 24, 2006

Eric Sosman wrote On 05/24/06 14:24,:

Joe Smith wrote On 05/24/06 14:06,:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

No {obviously}. Why do you ask?

Hmmm -- not so obvious (I looked for something and
didn't find it, but it was there anyhow -- my eyes are
exhibiting undefined behavior ...). Still: Why do you
ask? Section 5.2.1 has the entire list if you want it.

Eric Sosman · May 24, 2006

John Devereux wrote On 05/24/06 14:42,:

Joe said:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

Define "alphabet".

Also, a quick scan shows no '@' or '$'.

Hmm, these are not "needed" for C are they...

Seems a waste of good characters to me. C does make admirable use of
the symbols that most unenlightened folk would have no clue where to
find on their keyboard. But @ and $, also ` and ¬ are sadly neglected
by the C language, in my opinion.

Surely this situation cries out for some new operators to be added to
the language.

Good idea. Let's pick an unused character like ÷ and
use it for the long-desired exponentiation operator ...

Joe Smith · May 24, 2006

Eric Sosman said:
Eric Sosman wrote On 05/24/06 14:24,:

Joe Smith wrote On 05/24/06 14:06,:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

No {obviously}. Why do you ask?

Hmmm -- not so obvious (I looked for something and
didn't find it, but it was there anyhow -- my eyes are
exhibiting undefined behavior ...). Still: Why do you
ask? Section 5.2.1 has the entire list if you want it.

The short answer to motivation is, as usual, that I'm not getting something.
Since having made the original query, I've been going back and forth on
whether backslash zero was its own character. How much larger is the list
in sec 5.2.1 than mine? Finally, how usable is the soft version of the
Standard? joe

Joe Smith · May 24, 2006

Walter Roberson said:
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

You don't need to backslash the bar character in any literal form.

You don't need to backslash the single-quote within a string literal.

You don't need to backslash the question-mark in that context,
only if it is followed by another question-mark (in which case
triglyphs would start coming into play).

With regard to that last string: in the source character set,
space and the control characters representing horizontal tab,
vertical tab, and form feed are valid, but newline is not
explicitly valid -- only "some way of indicating the
end of each line of text"; the standard treats each end of line
as if it were a single new-line character. If you happened to be
using a system with fixed-length records, then an occurrence
of a newline in the source would not necessarily be valid! Likewise,
carriage return is not explicitly valid in source text except to the
extent that that particular system happens to include carriage return
in the end of line indicator.

You seem to be right about the part that I was able to understand:
char m[] = "\a\b\f\n\r\t\v\\a'a\'a";
This gives the first seven characters that I would call formatting
characters, then the backslash.
The ' and the \' are each taken as single characters, indeed the same
character. It hardly seems possible to me that C could port anywhere
without an explicitly valid newline. joe
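Joe's reading of his array can be checked mechanically. Here is a sketch that uses his literal unchanged; count_char is a hypothetical helper added for the check. The literal holds seven control characters, one backslash, and then a'a'a. That makes 13 characters plus the terminating null. The plain ' and the escaped \' come out as the same character.

```c
#include <stddef.h>

/* Joe's array, unchanged. */
static const char m[] = "\a\b\f\n\r\t\v\\a'a\'a";

/* Counts occurrences of c among the first n elements of s. */
size_t count_char(const char *s, size_t n, char c)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i] == c)
            k++;
    }
    return k;
}
```

Indices 0 to 6 hold the control characters, index 7 holds the backslash, and indices 9 and 11 both hold a single quote.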

Eric Sosman · May 24, 2006

Joe Smith wrote On 05/24/06 15:31,:

Eric Sosman wrote On 05/24/06 14:24,:

Joe Smith wrote On 05/24/06 14:06,:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

No {obviously}. Why do you ask?

Hmmm -- not so obvious (I looked for something and
didn't find it, but it was there anyhow -- my eyes are
exhibiting undefined behavior ...). Still: Why do you
ask? Section 5.2.1 has the entire list if you want it.

The short answer to motivation is, as usual, that I'm not getting something.
Since having made the original query, I've been going back and forth on
whether backslash zero was its own character.

Backslash zero is in the required "execution character
set," but not in the "source character set." Some of the
characters in your list (\a \b \n \r) are the same: required
in the execution character set, but not in the source set.

The status of \n is a bit convoluted: It is required to
be present in the execution character set, but is not listed
among the characters required for the source set. In fact,
its use in source is governed by 5.2.1/3:

If any other characters are encountered in a source
file (except in an identifier, a character constant,
a string literal, a header name, a comment, or a
preprocessing token that is never converted to a
token), the behavior is undefined.

Yet the same paragraph also states

... this Standard treats each end-of-line indicator
as if it were a single new-line character.

I think this apparent contradiction is resolved by considering
\n only in its special role as a line terminator, and not as
a character per se. That is, on a system that uses a different
line-ending convention, an "embedded" \n character would produce
undefined behavior. For example, if some system defines a "line"
as a two-digit character count followed by the characters, the
lines

10if (a < 0)
12    abort();

could be valid but the single line

23if (a < 0)§    abort();

(where § indicates the position of a newline character) would
lead to undefined behavior.
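Eric's counted-line format is hypothetical, and so is this sketch of how a translator might handle it. Each record is a two-digit length followed by exactly that many characters. The sketch copies the records out as newline-terminated lines, roughly the way translation phase 1 turns each end-of-line indicator into a single new-line character. The function name and its error convention are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record format: a two-digit decimal length, then exactly
   that many characters. Copies the records into out as newline-terminated
   lines. Returns the number of characters stored (not counting the
   terminating null), or -1 if the input is malformed or out is too small. */
long records_to_lines(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return -1;
    while (*in != '\0') {
        if (!isdigit((unsigned char)in[0]) || !isdigit((unsigned char)in[1]))
            return -1;                  /* missing length prefix */
        size_t len = (size_t)(in[0] - '0') * 10 + (size_t)(in[1] - '0');
        in += 2;
        for (size_t i = 0; i < len; i++) {
            if (in[i] == '\0' || n + 1 >= outsize)
                return -1;              /* record too short, or no room */
            out[n++] = in[i];
        }
        in += len;
        if (n + 1 >= outsize)
            return -1;
        out[n++] = '\n';                /* end of record becomes one new-line */
    }
    out[n] = '\0';
    return (long)n;
}
```

A record whose count claims more characters than it actually has (for example "23if (a < 0)" on its own) is rejected.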

How much larger is the list
in sec 5.2.1 than mine?

5.2.1 describes the source character set as containing
fifty-two letters, ten digits, twenty-nine graphics, the
space character, and the three control characters \t \v \f.
The execution character set contains all these plus \a \b
\r \n \0. These constitute the "basic" source and execution
character sets; an implementation may extend either or both
with additional characters.
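Eric's tally can be checked with a short sketch. The grouping into strings and the function names are illustrative assumptions; the standard defines nothing like them. The source count is 26 + 26 + 10 + 29 + 1 + 3 = 95. The execution set adds alert, backspace, carriage return, new-line and the null character, giving 100. New-line is left out of the source count here, following Eric's reading of 5.2.1.

```c
#include <stddef.h>
#include <string.h>

/* Basic source character set, grouped as in C99 5.2.1. */
static const char upper[]    = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static const char lower[]    = "abcdefghijklmnopqrstuvwxyz";
static const char digits[]   = "0123456789";
static const char graphics[] = "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";  /* 29 */
static const char space_and_controls[] = " \t\v\f";   /* space, HT, VT, FF */

/* 52 letters + 10 digits + 29 graphics + space + 3 controls. */
size_t basic_source_count(void)
{
    return strlen(upper) + strlen(lower) + strlen(digits)
         + strlen(graphics) + strlen(space_and_controls);
}

/* Adds alert, backspace, carriage return and new-line, plus 1 for the
   null character, which strlen cannot count. */
size_t basic_execution_count(void)
{
    static const char extra[] = "\a\b\r\n";
    return basic_source_count() + strlen(extra) + 1;
}

/* Nonzero if c is in the basic source set as grouped above. */
int in_basic_source_set(char c)
{
    if (c == '\0')
        return 0;              /* strchr would match the terminator */
    return strchr(upper, c) != NULL || strchr(lower, c) != NULL
        || strchr(digits, c) != NULL || strchr(graphics, c) != NULL
        || strchr(space_and_controls, c) != NULL;
}
```

Joe's original list already contains all 29 graphic characters. It differs from the source set only in its control characters.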

Finally, how usable is the soft version of the
Standard? joe

Depends on one's purposes, I guess. It seems usable
for most of mine; YMMV.

Keith Thompson · May 24, 2006

Do the above string literals comprise an alphabet for C? joe

[...]
You don't need to backslash the question-mark in that context,
only if it is followed by another question-mark (in which case
triglyphs would start coming into play).

Trigraphs, not triglyphs.

With regard to that last string: in the source character set,
space and the control characters representing horizontal tab,
vertical tab, and form feed are valid, but newline is not
explicitly valid -- only "some way of indicating the
end of each line of text"; the standard treats each end of line
as if it were a single new-line character. If you happened to be
using a system with fixed-length records, then an occurrence
of a newline in the source would not necessarily be valid! Likewise,
carriage return is not explicitly valid in source text except to the
extent that that particular system happens to include carriage return
in the end of line indicator.

You're assuming that Joe is asking about the source character set
rather than the execution character set. Given the way he asked the
question, it's not clear to me just what he's asking.

The C standard does use the term "alphabet"; C99 5.2.1 refers to the
26 uppercase letters and the 26 lowercase letters of the Latin
alphabet.

Joe needs to tell us what he means by "an alphabet for C", or to ask a
different question. Or he can read section 5.2.1 of either the C90
standard or the C99 standard (for the latter, I recommend n1124.pdf).

To answer Joe's question from another followup in this thread, C99
5.2.1p2 says:

A byte with all bits set to 0, called the _null character_, shall
exist in the basic execution character set; it is used to
terminate a character string.

The null character is not in the basic source character set; the
character constant '\0' is used to represent it.
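This sketch shows the null character's role as a string terminator. The names are made up for illustration. An array initialized from a literal with an embedded '\0' keeps every byte, but the string functions stop at the first null.

```c
#include <stddef.h>
#include <string.h>

/* Six bytes: 'a', 'b', '\0', 'c', 'd', and the terminator the literal adds. */
static const char embedded[] = "ab\0cd";

size_t stored_bytes(void)  { return sizeof embedded; }   /* whole array */
size_t string_length(void) { return strlen(embedded); }  /* up to first null */
```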

Joe Smith · May 24, 2006

Keith Thompson said:
Trigraphs, not triglyphs.

Trigraphs are not what I'm after. I'm hoping that, at the end of this
thread, when I get what I'm talking about, I can do this.
For my purposes, and it might be here noted that my purpose is not to
influence the Standard as much as adhere to it, I'm going to take all five
of the above mentioned source characters, and it is a headcount of source
characters that I am primarily after.

You're assuming that Joe is asking about the source character set
rather than the execution character set. Given the way he asked the
question, it's not clear to me just what he's asking.

I wouldn't mind knowing more about the execution set, if you felt like
talking about it.

The C standard does use the term "alphabet"; C99 5.2.1 refers to the
26 uppercase letters and the 26 lowercase letters of the Latin
alphabet.

Joe needs to tell us what he means by "an alphabet for C", or to ask a
different question. Or he can read section 5.2.1 of either the C90
standard or the C99 standard (for the latter, I recommend n1124.pdf).

To answer Joe's question from another followup in this thread, C99
5.2.1p2 says:

A byte with all bits set to 0, called the _null character_, shall
exist in the basic execution character set; it is used to
terminate a character string.

The null character is not in the basic source character set; the
character constant '\0' is used to represent it.

26 + 26 + 10 + 1 + 29 + 5 <--is that going to sum to the number of source
characters, given the above? joe

Joe Smith · May 25, 2006

Joe Smith wrote On 05/24/06 15:31,:

Eric Sosman wrote On 05/24/06 14:24,:

Joe Smith wrote On 05/24/06 14:06,:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789"
" "
"!#%^&*()-_"
"+=~[]\|;:\'"
"\"{},.<>/\?"
"\a\b\f\n\r\t\v\\"

Do the above string literals comprise an alphabet for C? joe

No {obviously}. Why do you ask?

Hmmm -- not so obvious (I looked for something and
didn't find it, but it was there anyhow -- my eyes are
exhibiting undefined behavior ...). Still: Why do you
ask? Section 5.2.1 has the entire list if you want it.

The short answer to motivation is, as usual, that I'm not getting
something.
Since having made the original query, I've been going back and forth on
whether backslash zero was its own character.

Backslash zero is in the required "execution character
set," but not in the "source character set." Some of the
characters in your list (\a \b \n \r) are the same: required
in the execution character set, but not in the source set.

The status of \n is a bit convoluted: It is required to
be present in the execution character set, but is not listed
among the characters required for the source set. In fact,
its use in source is governed by 5.2.1/3:

If any other characters are encountered in a source
file (except in an identifier, a character constant,
a string literal, a header name, a comment, or a
preprocessing token that is never converted to a
token), the behavior is undefined.

Yet the same paragraph also states

... this Standard treats each end-of-line indicator
as if it were a single new-line character.

I think this apparent contradiction is resolved by considering
\n only in its special role as a line terminator, and not as
a character per se. That is, on a system that uses a different
line-ending convention, an "embedded" \n character would produce
undefined behavior. For example, if some system defines a "line"
as a two-digit character count followed by the characters, the
lines

10if (a < 0)
12    abort();

could be valid but the single line

23if (a < 0)§    abort();

(where § indicates the position of a newline character) would
lead to undefined behavior.

How much larger is the list
in sec 5.2.1 than mine?

5.2.1 describes the source character set as containing
fifty-two letters, ten digits, twenty-nine graphics, the
space character, and the three control characters \t \v \f.
The execution character set contains all these plus \a \b
\r \n \0. These constitute the "basic" source and execution
character sets; an implementation may extend either or both
with additional characters.

Finally, how usable is the soft version of the
Standard? joe

Depends on one's purposes, I guess. It seems usable
for most of mine; YMMV.

/* begin reply */
I seem to be additionally confused by my newsreader today, as I'm not seeing
the '>' as I would expect. My guess is that the execution char set is going
to be the chars that C requires from its environment, in my case, an OS.
Can a person determine the status of '\n' using #ifdefs? joe

Walter Roberson · May 25, 2006

Joe Smith said:
My guess is that the execution char set is going
to be the chars that C requires from its environment, in my case, an OS.

As far as the C standard is concerned, it is valid to compile on one
system using one character set for the source, and to execute on
a different system that uses a different character set for the
execution. In such a situation, though, there would need to be enough
coordination in the cross-compilation process so that character
constants and string literals held the correct values for the
execution target.

Can a person determine the status of '\n' using #ifdefs? joe

No, because what is output for \n depends upon whether the stream
is binary or text -- and it doesn't necessarily expand to anything
at all, if the file happens to use fixed-length records or
counted-length records or some other unusual format. The underlying
end-of-line representation for any particular file can vary from file
to file in ways that are essentially unpredictable (e.g., determined
by a JCL DD statement.)
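Walter's text-versus-binary point can be seen in a small experiment. It is a sketch only, and the scratch file name passed in is a hypothetical choice. The program writes "a\nb\n" through a text stream and rereads it in both modes. The raw byte count from binary mode is up to the implementation: 4 where a line ends in one byte, more where lines end in "\r\n", and possibly something else on record-oriented systems. Reading back in text mode should show the program the same 4 characters it wrote.

```c
#include <stdio.h>

/* Counts the characters readable from path when opened with mode.
   Returns -1 if the file cannot be opened. */
static long count_chars(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    long n = 0;

    if (f == NULL)
        return -1;
    while (getc(f) != EOF)
        n++;
    fclose(f);
    return n;
}

/* Writes "a\nb\n" through a text stream, then returns the raw byte count
   seen through a binary stream. Returns -1 on any I/O error. */
long raw_bytes_after_text_write(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    if (fputs("a\nb\n", f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) == EOF)
        return -1;
    return count_chars(path, "rb");
}

/* Characters the program sees when rereading the file as text. */
long text_chars(const char *path)
{
    return count_chars(path, "r");
}
```

No preprocessor test can predict the binary count. It depends on how the particular file is stored.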

Eric Sosman · May 25, 2006

Joe Smith wrote On 05/25/06 12:46,:

[garbled quoting snipped for clarity's sake; see up-thread]

/* begin reply */
I seem to be additionally confused by my newsreader today, as I'm not seeing
the '>' as I would expect. My guess is that the execution char set is going
to be the chars that C requires from its environment, in my case, an OS.

C is sometimes used as a cross-compiled language, where
a compiler on one system produces a program to be executed
on a dissimilar system. This sort of thing is quite common
when C programs are written to be executed in free-standing
"embedded" systems, whose hardware may be well-suited to
operating your microwave oven but ill-adapted for the job
of running a C compiler. You build the program on a machine
that has niceties like editors, compilers, file systems, and
so on, and then you (somehow) transfer the generated code
to the target machine and execute it there.

This raises the possibility that the system where the
compiler operates may use a different character set than the
system where the program runs, so the Standard is careful to
distinguish the "source character set" (the alphabet in which
the program source is written) from the "execution character
set" (the set of characters the program can use while running).
Not only can the encodings of the characters differ in the two
environments ('0' might be 48 in the source set and 240 in the
execution set), but the repertoire of available characters can
also be different.
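One practical consequence of encodings differing between environments is that portable code should not assume ASCII values. This sketch uses illustrative function names. The standard does guarantee that '0' through '9' have consecutive values in both the source and execution sets, so subtracting '0' is safe. Letters have no such guarantee (in EBCDIC, 'I' and 'J' are not adjacent), so a lookup is the portable choice.

```c
#include <string.h>

/* Digits are guaranteed consecutive, so c - '0' works whether '0'
   is 48, 240, or any other value. Returns -1 for a non-digit. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Letters are not guaranteed consecutive, so c - 'A' is not portable.
   Looking the letter up in a string literal works in any execution
   character set. Returns -1 if c is not an uppercase letter. */
int letter_index(char c)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')
        return -1;             /* strchr would match the terminator */
    p = strchr(letters, c);
    return p != NULL ? (int)(p - letters) : -1;
}
```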

The Standard enumerates the minimum required contents of
the two character sets. As you can see, there is a lot of
overlap; in fact, all the required source characters are also
required to exist in the execution set. (One imagines this is
to ensure that a C compiler can be written in C.) A small
number of execution-set characters, though, are not necessary
for the task of writing C source: there's little reason to
want a backspace character as part of the source alphabet, for
example. (A whimsical idea: considering how widely casts are
misused, perhaps the spelling of the cast operator ought to
involve the \a character!)

Can a person determine the status of '\n' using #ifdefs? joe

I'm afraid I don't understand the question. The execution
character set must include the newline character, so its status
is "present" in any conforming C implementation. Its role in
the source character set is a little strange, but that role
doesn't change from one implementation to the next. Its role
in the externally-stored format of C source may (does) differ
from system to system, but Translation Phase 1 smooths such
differences away -- and since #ifdefs aren't processed until
Phase 4, there are no differences left for them to test.

YMMV
you might move volvos?
your mother makes volleyballs?

Yankees in 2005. (Not!)

Joe Smith · May 25, 2006

Eric Sosman said:
Joe Smith wrote On 05/25/06 12:46,:

[garbled quoting snipped for clarity's sake; see up-thread]

I think my newsreader's got a mind of its own. If anyone would like to
recommend one that contributes to usenet clarity, I'm all ears.

C is sometimes used as a cross-compiled language, where
a compiler on one system produces a program to be executed
on a dissimilar system. This sort of thing is quite common
when C programs are written to be executed in free-standing
"embedded" systems, whose hardware may be well-suited to
operating your microwave oven but ill-adapted for the job
of running a C compiler. You build the program on a machine
that has niceties like editors, compilers, file systems, and
so on, and then you (somehow) transfer the generated code
to the target machine and execute it there.

And this (somehow) is OT, right?

This raises the possibility that the system where the
compiler operates may use a different character set than the
system where the program runs, so the Standard is careful to
distinguish the "source character set" (the alphabet in which
the program source is written) from the "execution character
set" (the set of characters the program can use while running).
Not only can the encodings of the characters differ in the two
environments ('0' might be 48 in the source set and 240 in the
execution set), but the repertoire of available characters can
also be different.

The Standard enumerates the minimum required contents of
the two character sets. As you can see, there is a lot of
overlap; in fact, all the required source characters are also
required to exist in the execution set. (One imagines this is
to ensure that a C compiler can be written in C.) A small
number of execution-set characters, though, are not necessary
for the task of writing C source: there's little reason to
want a backspace character as part of the source alphabet, for
example. (A whimsical idea: considering how widely casts are
misused, perhaps the spelling of the cast operator ought to
involve the \a character!)

I'm afraid I don't understand the question. The execution
character set must include the newline character, so its status
is "present" in any conforming C implementation. Its role in
the source character set is a little strange, but that role
doesn't change from one implementation to the next. Its role
in the externally-stored format of C source may (does) differ
from system to system, but Translation Phase 1 smooths such
differences away -- and since #ifdefs aren't processed until
Phase 4, there are no differences left for them to test.

The question was whether a macro would be of use, but you and Mr. Roberson
are confident that it won't be. I should have known it was determined by a
JCL DD statement.

The motivation for the original question was rooted in
a weak understanding of source vs. execution sets. Thanks for your help.
joe
--------

Yankees in 2005. (Not!)

~YMMVI , because it's going to
be the south side of the windy city

Ian Collins · May 25, 2006

Joe said:
Eric Sosman said:

Joe Smith wrote On 05/25/06 12:46,:

[garbled quoting snipped for clarity's sake; see up-thread]

I think my newsreader's got a mind of its own. If anyone would like to
recommend one that contributes to usenet clarity, I'm all ears.

Mozilla/Thunderbird. Does just about everything right out of the box.

Joe Smith · May 25, 2006

Ian Collins said:
Joe said:

Eric Sosman said:

Joe Smith wrote On 05/25/06 12:46,:

[garbled quoting snipped for clarity's sake; see up-thread]

I think my newsreader's got a mind of its own. If anyone would like to
recommend one that contributes to usenet clarity, I'm all ears.

Mozilla/Thunderbird. Does just about everything right out of the box.

And where do I procure this box? joe
