Character Array vs String

David Thompson · Nov 21, 2011

sometimes two carriage returns to give the teletype the time it needs
to do this

Or other padding, as discussed nearby. A person punching tape would
find CR easiest to use, but automation mostly used NUL or DEL.

To answer another question nearby, real Teletypes did not have flow
control, although they had an (extra cost) option to stop and start
the *Teletype* reading paper tape, which could be used by a receiving
computer or other automated equipment to slow that input (only).
Some other terminals, especially some video terminals (next), did
extend these characters (DC1 aka XON, DC3 aka XOFF) for flow control
from the computer or similar sender, but because of roundtrip delay
this only works if the terminal has at least some buffering (about 2-4
characters) and Teletypes didn't. Also, some terminal modems of the day
were half-duplex (no simultaneous return traffic), and even nominally
full-duplex ones didn't always work reliably that way.

yes but some terminals need \r\n some need \r\r\n and glass ttys don't
need both. The Unix decision to use a single logical end-of-line

Video terminals aka glass TTYs varied widely. Some used CR LF
separate, some had CR do LF, some had LF do CR, some had switch or
jumper or even PROM options. Some needed padding, some didn't -- some
early ones even needed *more* padding than mechanical TTYs!

character is the sane answer. The driver sorts out the display
problems.
Exactly.

I think I've seen \r used as a line terminator on DEC(?) machines.

I assume you mean storage, since most systems used lone CR input.

Apple MacOS (at least classic) stored CR (reverse from what Keith (no
relation!) said nearby). I don't know any DEC *software* that did;
most used CR LF and some used counts or reassigned meanings within a
6-bit code. Much DEC *hardware* ran Unix and stored \n=LF.

Kind of, except \n was and still canonically is ASCII LF not CR.

Keith Thompson · Nov 21, 2011

David Thompson said:
Apple MacOS (at least classic) stored CR (reverse from what Keith (no
relation!) said nearby).

Actually, I think that is what I said:

| But we're stuck with the current situation. MacOS managed to change
| its end-of-line marker from CR to LF, but I don't know if Windows
| could (or would) do the same thing.

[...]

Kind of, except \n was and still canonically is ASCII LF not CR.

The C standard doesn't actually require \n to be LF (it doesn't even
mention ASCII outside a couple of footnotes). \n probably couldn't be
CR, since that's \r (presumably '\n' == '\r' would be non-conforming).
But in EBCDIC \n could plausibly be NL, which is distinct from both CR
and LF.
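
A minimal sketch of how a program could check these properties on the implementation it is built with. The function names are made up for illustration, and the value 0x0A is simply the ASCII code for LF; nothing here is required by the C standard beyond what is said above.

```c
#include <stdbool.h>

/* True when this implementation's '\n' has the ASCII LF code (0x0A).
   The C standard does not require this; an EBCDIC system may differ. */
bool newline_is_ascii_lf(void)
{
    return '\n' == 0x0A;
}

/* True when '\n' and '\r' are distinct characters, which is what a
   conforming implementation is presumed to provide. */
bool newline_differs_from_cr(void)
{
    return '\n' != '\r';
}
```

On an ASCII-based system both functions return true; on an EBCDIC system the first one may well return false while the second still holds.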

Lew Pitcher · Nov 21, 2011

In the case I am dimly remembering, data readiness was not the issue.
The interface could take more data and it would be printed wherever the
print head happened to be at the time. I am sure this was regarded as
daft even then.

Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.

It was customary to send a CR followed by a LF (often followed by one
or more padding characters like NUL) so that the print head had time
to return to the left margin /before/ it received and printed the
first printable character of the next line. From the right margin, the
carriage took more than 1/10 of a second to return to the left
margin, and sometimes took more than 2/10 of a second. Thus, while
the carriage was traveling left, the LF character (and, if necessary,
the follow-on NUL padding characters) gave the carriage that time it
needed.

The LF character on its own was quite immediate; it never took the
entire 1/10 of a second to complete.
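
The padding arithmetic can be sketched roughly as follows. This assumes one character time is about 100 ms (implied by the figures above) and that the return time is a worst-case estimate; both values are parameters, and the function names are hypothetical.

```c
#include <stddef.h>

/* Number of padding characters (e.g. NUL) to send after CR LF so that the
   carriage has finished returning before the next printable character.
   return_ms: assumed worst-case carriage return time, in milliseconds
   char_ms:   assumed time to transmit one character, in milliseconds */
unsigned padding_chars_needed(unsigned return_ms, unsigned char_ms)
{
    unsigned slots;
    if (char_ms == 0)
        return 0;                      /* avoid dividing by zero */
    /* Character slots that must elapse after the CR, rounded up. */
    slots = (return_ms + char_ms - 1) / char_ms;
    /* The LF fills the first slot; any remaining slots need padding. */
    return slots > 1 ? slots - 1 : 0;
}

/* Writes CR, LF and `pad` NULs into out (capacity cap). Returns the
   number of bytes written, or 0 if the buffer is too small. */
size_t emit_line_end(char *out, size_t cap, unsigned pad)
{
    size_t need = 2u + pad;
    size_t i;
    if (cap < need)
        return 0;
    out[0] = '\r';
    out[1] = '\n';
    for (i = 2; i < need; i++)
        out[i] = '\0';
    return need;
}
```

With a 100 ms character time, a return of 150 ms needs one NUL after the LF, and a return of 210 ms needs two.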

BartC · Nov 23, 2011

Kenneth Brody said:
On 11/21/2011 2:39 AM, David Thompson wrote:
[...]

But to answer the exact question, DEC PDP-10 computers had a 36-bit
word with (few) special 'byte' instructions that could access *any*

While there were a few PDP-n's on campus as I recall, the main computer
lab at the college I went to used a KL-10. While I hadn't heard of C way
back when, I did some assembly programming on it.

As you say, it had 36-bit words, and 18-bit addresses. The O/S used six
6-bit characters per word for filename,

'SIXBIT' format. Very handy, and probably also used for identifiers for
compilers (so you were limited to rather short names, but string matching
was very quick).
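
For anyone who hasn't seen it, here is a rough sketch of SIXBIT packing in C. It assumes an ASCII execution character set, holds the 36-bit word in the low bits of a uint64_t, and uses a made-up function name; the encoding itself is the ASCII code minus 040 (octal) for the characters from space through underscore.

```c
#include <ctype.h>
#include <stdint.h>

/* Packs up to six characters into one 36-bit SIXBIT word, first character
   in the high-order six bits, padded on the right with spaces (code 0).
   Lower case is folded to upper case; characters outside the SIXBIT range
   are stored as spaces in this sketch. */
uint64_t sixbit_pack(const char *name)
{
    uint64_t word = 0;
    int ended = 0;
    int i;
    for (i = 0; i < 6; i++) {
        unsigned code = 0;                 /* SIXBIT space */
        if (!ended && name[i] != '\0') {
            int c = toupper((unsigned char)name[i]);
            if (c >= 0x20 && c <= 0x5F)
                code = (unsigned)(c - 0x20);
        } else {
            ended = 1;                     /* stop reading past the end */
        }
        word = (word << 6) | code;
    }
    return word;
}
```

Comparing two names is then a single integer comparison, e.g. `sixbit_pack("FOO") == sixbit_pack("foo")`, which is why matching was so quick.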

which IIRC were in 6.3 format. (I guess they used the other 18 bits for
file attributes?) Text was typically stored as five 7-bit characters,
with 1 bit padding.

Yet, although addresses were 18 bits wide, and each address pointed to a
36-bit word, a "pointer" [could], in fact, be wider than 18 bits. For

The address was 18-bits (for 256KWords of memory per task; just over 1MB!).
But there was also an 'indirect' bit which could be used for repeated
pointer operations, all automatic on any instruction. I believe another
4-bits was an extra index register for each level of pointers.

example, there were machine instructions for accessing sub-words,
including things like "take the 7 bits at this address/bit-location, move
it to the low-order 7 bits at this address, zero-padding it, and then
increment the pointer by 7 bits".

There were a dozen or so bits left over, which were used by byte-pointers,
but these needed special instructions to make use of.

(Of course, C requires at least 8 bits per char, so I guess it would use
the same thing, but storing four 8-bit chars per word with 4 padding bits.
You could store four 9-bit chars w/o any padding, but files on disk were
8-bit characters, so it probably wouldn't make sense to do so.)

As I said, I never saw a C compiler for it, but I would suspect that
sizeof() would be "interesting" on such a system. The simplest system
would be to simply store one 8-bit char per 36-bit word, but that seems
rather inefficient.

I'd imagine C with its various kinds of pointers would be a nightmare to
implement, if packed char arrays were to be used. I remember using Pascal
which offered a choice of unpacked (fast) or packed (slow) arrays, records
and (presumably) strings.

Kaz Kylheku · Nov 23, 2011

Kenneth Brody said:

On 11/21/2011 2:39 AM, David Thompson wrote:
[...]

But to answer the exact question, DEC PDP-10 computers had a 36-bit
word with (few) special 'byte' instructions that could access *any*

While there were a few PDP-n's on campus as I recall, the main computer
lab at the college I went to used a KL-10. While I hadn't heard of C way
back when, I did some assembly programming on it.

As you say, it had 36-bit words, and 18-bit addresses. The O/S used six
6-bit characters per word for filename,

'SIXBIT' format. Very handy, and probably also used for identifiers for
compilers (so you were limited to rather short names, but string matching
was very quick).

At that time, symbol interning had already been invented by the Lispers,
by which any length identifier is reduced to a one-word symbol atom thereafter
used in its place.
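
A minimal sketch of the idea in C, not how any particular Lisp did it: every distinct spelling is stored once, and the address of the stored copy serves as the symbol. The names are hypothetical, and a real implementation would use a hash table rather than the linear list used here for brevity.

```c
#include <stdlib.h>
#include <string.h>

struct symbol {
    struct symbol *next;
    char name[];            /* flexible array member holding the spelling */
};

static struct symbol *symbol_list = NULL;

/* Returns the unique stored copy of name, creating it on first use.
   Returns NULL if memory allocation fails. */
const char *intern(const char *name)
{
    struct symbol *s;
    size_t len = strlen(name);

    /* Linear search for an existing symbol with the same spelling. */
    for (s = symbol_list; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s->name;

    /* Not found: store a new copy at the head of the list. */
    s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s->name, name, len + 1);
    s->next = symbol_list;
    symbol_list = s;
    return s->name;
}
```

After interning, `intern("counter") == intern("counter")` is a plain pointer comparison, regardless of how long the identifier is.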

Ben Pfaff · Nov 23, 2011

BartC said:
'SIXBIT' format. Very handy, and probably also used for identifiers
for compilers (so you were limited to rather short names, but string
matching was very quick).

This idea was reincarnated, badly, in the "AML" language used by
ACPI, which has 32-bit identifiers that are limited to 4 8-bit
characters.

John Bode · Nov 24, 2011

Could anybody please mention difference between character array and
string in C?

A string is a sequence of character values terminated by a 0. This
sequence of character values may be stored in an array of type char.
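
To make the distinction concrete, here is a small sketch (the helper name is made up). An array of char holds a string only if a terminating null character lies somewhere within it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the first `size` elements of buf contain a null character,
   i.e. the array holds a string starting at buf[0]. */
bool holds_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

Given `char word[] = "abc";` the array has four elements (the literal's terminator is included) and `holds_string(word, sizeof word)` is true. Given `char raw[3] = {'a', 'b', 'c'};` the array is a perfectly good character array but not a string, so passing it to `strlen` or `printf("%s")` would be undefined behavior.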

Stefan Ram · Nov 24, 2011

John Bode said:
A string is a sequence of character values terminated by a 0. This

"of char values". "char" is not "character", "int" is not "integer",
and so on, you got it.

sequence of character values may be stored in an array of type char.

(In a sense, everything can be stored in an array of type char,
as long as it is large enough.)
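
A short sketch of that point, using unsigned char (the type the standard uses for object representations) and a hypothetical function name: the bytes of any object can be copied into a sufficiently large array and copied back unchanged.

```c
#include <string.h>

/* Copies the object representation of a double into an array of
   unsigned char and back again, returning the reconstructed value. */
double round_trip_through_bytes(double value)
{
    unsigned char bytes[sizeof(double)];
    double copy;

    memcpy(bytes, &value, sizeof value);   /* object -> array of bytes */
    memcpy(&copy, bytes, sizeof copy);     /* array of bytes -> object */
    return copy;
}
```

Whether the array's contents mean anything as a *string* is a separate matter; the bytes of a double may well contain a zero byte in the middle.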

James Kuyper · Nov 24, 2011

"of char values". "char" is not "character", "int" is not "integer",
and so on, you got it.

Section 7.1.1p1 says "character" not "char"; it also says "null
character" rather than "0". Strings are allowed to contain multibyte
characters.

Patrick Scheible · Nov 28, 2011

BartC said:
Kenneth Brody said:

On 11/21/2011 2:39 AM, David Thompson wrote:
[...]

But to answer the exact question, DEC PDP-10 computers had a 36-bit
word with (few) special 'byte' instructions that could access *any*

While there were a few PDP-n's on campus as I recall, the main
computer lab at the college I went to used a KL-10. While I hadn't
heard of C way back when, I did some assembly programming on it.

As you say, it had 36-bit words, and 18-bit addresses. The O/S used
six 6-bit characters per word for filename,

'SIXBIT' format. Very handy, and probably also used for identifiers
for compilers (so you were limited to rather short names, but string
matching was very quick).

which IIRC were in 6.3 format. (I guess they used the other 18 bits
for file attributes?) Text was typically stored as five 7-bit
characters, with 1 bit padding.

Yet, although addresses were 18 bits wide, and each address pointed
to a 36-bit word, a "pointer" [could], in fact, be wider than 18
bits. For

The address was 18-bits (for 256KWords of memory per task; just over
1MB!). But there was also an 'indirect' bit which could be used for
repeated pointer operations, all automatic on any instruction. I
believe another 4-bits was an extra index register for each level of
pointers.

example, there were machine instructions for accessing sub-words,
including things like "take the 7 bits at this address/bit-location,
move it to the low-order 7 bits at this address, zero-padding it,
and then increment the pointer by 7 bits".

There were a dozen or so bits left over, which were used by
byte-pointers, but these needed special instructions to make use of.

(Of course, C requires at least 8 bits per char, so I guess it would
use the same thing, but storing four 8-bit chars per word with 4
padding bits. You could store four 9-bit chars w/o any padding, but
files on disk were 8-bit characters, so it probably wouldn't make
sense to do so.)

As I said, I never saw a C compiler for it, but I would suspect that
sizeof() would be "interesting" on such a system. The simplest
system would be to simply store one 8-bit char per 36-bit word, but
that seems rather inefficient.

I'd imagine C with its various kinds of pointers would be a nightmare
to implement, if packed char arrays were to be used. I remember using
Pascal which offered a choice of unpacked (fast) or packed (slow)
arrays, records and (presumably) strings.

There has been at least one C compiler for the PDP-10. It uses four
9-bit chars per word.
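
As a rough illustration of why 9 bits is the natural choice, the sketch below works out how many chars of a given width fit in a 36-bit word and how many bits are left over. The function names are made up; CHAR_BIT is the standard macro that such a compiler would define as 9.

```c
#include <limits.h>
#include <stddef.h>

/* Whole characters of the given width that fit in one 36-bit word. */
unsigned chars_per_word36(unsigned char_bits)
{
    return char_bits == 0 ? 0 : 36u / char_bits;
}

/* Bits of a 36-bit word left unused by that packing. */
unsigned padding_bits_word36(unsigned char_bits)
{
    return char_bits == 0 ? 36u : 36u % char_bits;
}

/* Size of an object in bits on the implementation compiling this code:
   sizeof counts in chars, and each char has CHAR_BIT bits. */
size_t bits_in(size_t size_in_chars)
{
    return size_in_chars * CHAR_BIT;
}
```

Widths of 6, 7, 8 and 9 bits give 6, 5, 4 and 4 chars per word with 0, 1, 4 and 0 bits of padding respectively, so of the widths C permits (at least 8), 9 is the one that wastes nothing.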

The biggest model of PDP-10 supported extended addressing, with 23-bit
wide addresses. Those are word addresses; multiply by 4.5 to get the
capacity in 8-bit bytes.
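
The conversion is easy to check with a few lines of C (function names are hypothetical; 36 / 8 = 4.5 is computed as 9 / 2 to stay in integer arithmetic):

```c
#include <stdint.h>

/* Number of words addressable with the given number of address bits
   (address_bits is assumed to be less than 64). */
uint64_t addressable_words(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;
}

/* Equivalent capacity in 8-bit bytes of a number of 36-bit words. */
uint64_t words36_to_octets(uint64_t words)
{
    return words * 9u / 2u;
}
```

For 18-bit addresses this gives 262,144 words or 1,179,648 bytes (the "just over 1MB" mentioned earlier); for 23-bit addresses it gives 8,388,608 words or 37,748,736 bytes.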

-- Patrick

Uno · Nov 28, 2011

On 11/21/2011 3:02 PM, Lew Pitcher wrote:
[...]

Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.

It was customary to send a CR followed by a LF (often followed by one
or more padding characters like NUL) so that the print head had time
to return to the left margin /before/ it received and printed the
first printable character of the next line. From the right margin, the
carriage took more than 1/10 of a second to return to the left
margin, and sometimes took more than 2/10 of a second. Thus, while
the carriage was traveling left, the LF character (and, if necessary,
the follow-on NUL padding characters) gave the carriage that time it
needed.

[...]

Yup. I saw systems which didn't add some sort of delay-after-CR, and the
printouts would include what was supposed to be the first character of
the next line somewhere in the middle of the line. (Sometimes between
lines, as the LF hadn't yet fully advanced the paper.)

Interesting. Would I be "wrong" to think that 'teletype' might usually
be abbreviated to 'tty'?

Lew Pitcher · Nov 28, 2011

On 11/21/2011 3:02 PM, Lew Pitcher wrote:
[...]

Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.

[snip]
Interesting. Would I be "wrong" to think that 'teletype' might usually
be abbreviated to 'tty'?

Not in the least. Teletype was the brand name, which was commonly
abbreviated to "TTY".

Nick Keighley · Nov 28, 2011

On 11/23/2011 9:42 AM, Kenneth Brody wrote:

On 11/21/2011 3:02 PM, Lew Pitcher wrote:
[...]
Yes, teletypes (I'm thinking of the KSR33 and ASR33 models, here)
would print exactly what you sent them, regardless of where the print
head was at the time.

[snip]
Interesting. Would I be "wrong" to think that 'teletype' might usually
be abbreviated to 'tty'?

Not in the least. TeleType was the brand name, which was commonly
abbreviated to "TTY".

Well, Teletype was like "hoover": it was used generically.

David Thompson · Jan 19, 2012

Kenneth Brody said:

On 11/21/2011 2:39 AM, David Thompson wrote:
[...]

But to answer the exact question, DEC PDP-10 computers had a 36-bit
word with (few) special 'byte' instructions that could access *any*

While there were a few PDP-n's on campus as I recall, the main computer
lab at the college I went to used a KL-10. While I hadn't heard of C way
back when, I did some assembly programming on it.

Not really -n. There were four implementations of PDP-10 by DEC (KA,
KI, KL, KS) and at least one competitive clone, plus PDP-6, with
basically the same instruction set. Other machines in the series were
(quite) different; PDP-5/8/12 and PDP-11 were the more widely used,
and unlike each other or PDP-6/10 or the other less-known PDP's.

'SIXBIT' format. Very handy, and probably also used for identifiers for
compilers (so you were limited to rather short names, but string matching
was very quick).

TOPS-10 (DEC's initial OS) was 6.3 filenames, with one directory per
user (basically). TENEX, developed by BBN and adopted (more or less)
by DEC as TOPS-20, had variable-length filenames (plus version
numbers), in hierarchical directories with variable-length names.

Yet, although addresses were 18 bits wide, and each address pointed to a
36-bit word, a "pointer" [could], in fact, be wider than 18 bits. For

The address was 18-bits (for 256KWords of memory per task; just over 1MB!).
But there was also an 'indirect' bit which could be used for repeated
pointer operations, all automatic on any instruction. I believe another
4-bits was an extra index register for each level of pointers.

18-bit address plus 1-bit indirect and 4-bit _optional_ index, yes.
Directly in instruction, and indirect pointer in 'memory' if used.

'memory' in quotes because the 16 registers (usually called ACs) can
be accessed as the first 16 locations in memory also.

There were a dozen or so bits left over, which were used by byte-pointers,
but these needed special instructions to make use of.

Exactly. Because byte-pointer needs those extra bits, _and_ because it
can be modified (incremented), it is only in 'memory'.

The special instructions are: load byte (any contiguous bits up to
wordsize) from memory to register, zero-padded, without or with
incrementing the pointer (but not to arbitrary location as BartC seems
to say); store register to byte in memory without or with incrementing
the pointer; increment or 'adjust' (multiple-increment) the pointer.

Files on disk were 36-bit words, and could have characters in any
format supported by software. 6x6 and 5x7 were most common in my
experience, although I think 4x8 and 9x8 were used for file exchange
with other systems (especially in ARPAnet and early Internet).

Not really. C requires that any data type (any object) be accessible
as an array of unsigned char (e.g. for memcpy) so 4x9 seems to be
the most reasonable choice. Although nowadays with Unicode finally
becoming really widespread, you could make an argument for 2x18
(and use the halfword instructions to optimize some cases).
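
For readers who never used the machine, here is a simplified model in C of the load-byte and increment operations described above, applied to 5x7 text. It is a sketch under stated assumptions, not an emulation: each 36-bit word sits in the low bits of a uint64_t, indexing and indirection are left out, and all the names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 36u

/* Simplified byte pointer: which word, how many bits lie to the right
   of the current byte, and the byte width. */
struct byte_ptr {
    const uint64_t *words;  /* each element holds one 36-bit word */
    size_t index;           /* current word */
    unsigned pos;           /* bits to the right of the current byte */
    unsigned size;          /* byte width in bits, 1..36 */
};

/* Pointer positioned just before the first byte of words[0]. */
struct byte_ptr byte_ptr_start(const uint64_t *words, unsigned size)
{
    struct byte_ptr bp = { words, 0, WORD_BITS, size };
    return bp;
}

/* Load the designated byte, zero-padded on the left. */
uint64_t load_byte(const struct byte_ptr *bp)
{
    uint64_t mask = ((uint64_t)1 << bp->size) - 1;
    return (bp->words[bp->index] >> bp->pos) & mask;
}

/* Step to the next byte; move on to the next word when the bits that
   remain in this one cannot hold a whole byte. */
void increment_byte_ptr(struct byte_ptr *bp)
{
    if (bp->pos >= bp->size) {
        bp->pos -= bp->size;
    } else {
        bp->index++;
        bp->pos = WORD_BITS - bp->size;
    }
}

/* Increment the pointer, then load the byte it now designates. */
uint64_t increment_and_load_byte(struct byte_ptr *bp)
{
    increment_byte_ptr(bp);
    return load_byte(bp);
}

/* Pack five 7-bit characters left-justified into one 36-bit word,
   leaving the low-order bit unused. */
uint64_t pack_5x7(const char text[5])
{
    uint64_t word = 0;
    int i;
    for (i = 0; i < 5; i++)
        word = (word << 7) | ((unsigned char)text[i] & 0x7Fu);
    return word << 1;
}
```

Starting from `byte_ptr_start(mem, 7)`, repeated calls to `increment_and_load_byte` walk through the five characters of each word in turn and skip the spare bit at the end of every word automatically.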

I'd imagine C with its various kinds of pointers would be a nightmare to
implement, if packed char arrays were to be used. I remember using Pascal
which offered a choice of unpacked (fast) or packed (slow) arrays, records
and (presumably) strings.

C's allowance of different pointer formats (size and representation)
to different target types works excellently here.
