end-of-file problem

Keith Thompson

Joe Wright said:
You are correct. Thanks for sharing. At my house..

| 0 NUL| 1 SOH| 2 STX| 3 ETX| 4 EOT| 5 ENQ| 6 ACK| 7 BEL|
| 8 BS | 9 HT | 10 LF | 11 VT | 12 FF | 13 CR | 14 SO | 15 SI |
| 16 DLE| 17 DC1| 18 DC2| 19 DC3| 20 DC4| 21 NAK| 22 SYN| 23 ETB|
| 24 CAN| 25 EM | 26 SUB| 27 ESC| 28 FS | 29 GS | 30 RS | 31 US |

..but even knowing the names of the things, I don't remember what SUB
and several others are supposed to do. Do you?

Nope. That's why I Googled "ascii" and found
<http://www.lookuptables.com/>.
 
Lew Pitcher

Keith said:
Nope. That's why I Googled "ascii" and found
<http://www.lookuptables.com/>.

The ASCII characters significant to this discussion would be
ETX, EOT, ETB, and EM, which all signal the end of something. EM and
EOT would be the likeliest candidates for an "eof character", if one
existed (EM is "End of Medium", which means that we ran out of data - a
classic "end-of-file" condition, while EOT is "End of Transmission",
which may also mean out of data).

OTOH, CP/M (and its clone PCDOS) used the SUB ("Substitute") character
to flag the end of textual data in a file. SUBstitute is supposed to
signal the existence of a character byte that cannot be expressed in
ASCII. Rather than try to express an invalid character,
ASCII-compatible devices are supposed to substitute the SUB character
for the invalid one.
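
As a rough illustration of the CP/M and PCDOS convention described above, here is a minimal C sketch that copies text until it reaches either the real end of the stream or a SUB byte (26, 0x1A). The names `copy_text_until_sub` and `ASCII_SUB` are invented for this example and are not part of any library. Note that `getc` returns an `int`, so `EOF` stays distinct from every byte value, SUB included.

```c
#include <stdio.h>

#define ASCII_SUB 0x1A  /* Ctrl-Z; macro name chosen for this sketch */

/* Copy bytes from in to out, stopping at the real end of the stream
   or at the first SUB byte. Returns the number of bytes copied. */
long copy_text_until_sub(FILE *in, FILE *out)
{
    long count = 0;
    int c;  /* int, not char: EOF must not collide with a byte value */

    while ((c = getc(in)) != EOF && c != ASCII_SUB) {
        putc(c, out);
        count++;
    }
    return count;
}
```

On implementations whose text mode already treats Ctrl-Z as end of file, the input would probably need to be opened in binary mode for the explicit SUB check to make any difference.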
 
jmcgill

Joe said:
| 0 NUL| 1 SOH| 2 STX| 3 ETX| 4 EOT| 5 ENQ| 6 ACK| 7 BEL|
| 8 BS | 9 HT | 10 LF | 11 VT | 12 FF | 13 CR | 14 SO | 15 SI |
| 16 DLE| 17 DC1| 18 DC2| 19 DC3| 20 DC4| 21 NAK| 22 SYN| 23 ETB|
| 24 CAN| 25 EM | 26 SUB| 27 ESC| 28 FS | 29 GS | 30 RS | 31 US |

..but even knowing the names of the things, I don't remember what SUB
and several others are supposed to do. Do you?

SUB was a tty code to print in place of a code that was outside the
character set of the device, or for invalid codes.

Some of the codes are for (paper) tape control. The control and format
stuff was used on ttys that weren't even necessarily part of computer
systems.
 
CBFalconer

jmcgill said:
SUB was a tty code to print in place of a code that was outside
the character set of the device, or for invalid codes.

Some of the codes are for (paper) tape control. The control and
format stuff was used on ttys that weren't even necessarily part
of computer systems.

The ones I remember include:

NUL null character
SOH start of heading
EOT end of transmission
ENQ Enquiry. also known as WRU for who are you.
started the automatic reply sequence from a TTY
ACK acknowledge
BEL bell
BS back space
HT horizontal tab
LF line feed
CR carruage return
DLE data link escape
DC1 device control 1, or tape reader on
DC2 device control 2, or tape punch on
DC3 device control 3, or tape reader off
DC4 device control 4, or tape punch off
NAK Negative acknowledge
SYN Synchronize
CAN Cancel
EM End mode
ESC escaoe
FS field separator
GS group separator
RS record separator
US unit separator

and DEL DELETE (255) all bits on (all holes punched)
 
Mark McIntyre

There is no EOF character in the ASCII set. Never has been.

I'll leave you guys to argue about this, since it's off-topic here. My
printed ASCII table, taken from the manual for an IBM-PCXT, has EOF at
position 26 as far as I recall. So blame IBM...

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Richard Tobin

*Bzzzt* 127, please. ASCII is a 7-bit code.
Chuck's tape punch has even parity. :=)

(Obviously completely off-topic)

Presumably this is a good reason for using even parity for encodings
with an even number of bits, and odd for an odd number of bits: so
that a deleted character - all the holes punched - has correct parity.

-- Richard
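
The parity point can be checked with a small C sketch. It assumes the parity bit is stored as bit 7 on top of a 7-bit code; the function name `add_even_parity` is made up for this example. With that assumption, DEL (0x7F, seven bits set) picks up the parity bit and becomes 0xFF, i.e. all eight holes punched.

```c
/* Return the 7-bit code with an even-parity bit added as bit 7.
   Hypothetical helper for illustration only. */
unsigned char add_even_parity(unsigned char code7)
{
    unsigned char bits = code7 & 0x7F;
    unsigned char ones = 0;

    /* Count the set bits in the 7-bit code. */
    for (unsigned char b = bits; b != 0; b >>= 1)
        ones += b & 1;

    /* Set bit 7 only when the count is odd, so the total becomes even. */
    return (ones & 1) ? (unsigned char)(bits | 0x80) : bits;
}
```

For instance, `add_even_parity(0x7F)` gives 0xFF, while `add_even_parity(0x41)` ('A', two bits set) stays 0x41.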
 
Richard Tobin

There is no EOF character in the ASCII set. Never has been.
Mark McIntyre said:
I'll leave you guys to argue about this, since it's off-topic here. My
printed ASCII table, taken from the manual for an IBM-PCXT, has EOF at
position 26 as far as I recall. So blame IBM...

As far as I know the real ASCII standard is not available without
charge, but the control codes are specified by ISO/IEC 6429:1992 and
according to the Unicode code chart

http://www.unicode.org/charts/PDF/U0000.pdf

character 26 (control-Z) is called "SUB" (substitute) and was
presumably intended to be used as a substitute for any character not
available in ASCII. It is of course used as EOF on many systems.

-- Richard
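
The link between "control-Z" and code 26 can be shown in a couple of lines of C: the control code is the letter's code with only the low five bits kept. This is a sketch that assumes an ASCII execution character set; the macro `CTRL_CODE` and the constant names are invented for this example.

```c
/* Map a letter (or one of @ [ \ ] ^ _) to its control code by keeping
   the low five bits. Assumes ASCII; CTRL_CODE is a made-up name. */
#define CTRL_CODE(ch) ((ch) & 0x1F)

enum {
    CTRL_D = CTRL_CODE('D'),  /* 4, EOT */
    CTRL_Z = CTRL_CODE('Z')   /* 26, SUB */
};
```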
 
Dave Thompson

CBFalconer said:
The ones I remember include:

NUL null character
SOH start of heading

Originally/also SOA, start of address.

STX start of text
ETX end of text

These three, plus ETB below, were generally used only for
block-oriented synchronous protocols like BSC 'bisync' (and not very
often for that, since it generally used EBCDIC instead) but also for
some Telex/wire/cable formats (especially machine-switched ones).

EOT, ENQ, ACK, and NAK were also used mostly with such protocols, but
sometimes just by themselves.

EOT end of transmission
ENQ Enquiry. also known as WRU for who are you.
started the automatic reply sequence from a TTY
ACK acknowledge
BEL bell
BS back space
HT horizontal tab
LF line feed

VT vertical tab
FF form feed
CR carruage return

Nit: carriage

SO shift out (to special/alternate glyphs, such as Greek, APL, etc.)
SI shift in (to normal glyphs)
DLE data link escape
DC1 device control 1, or tape reader on
DC2 device control 2, or tape punch on
DC3 device control 3, or tape reader off
DC4 device control 4, or tape punch off

DC1/DC3 = (bitpaired) ctrl+Q/S were also labelled on the TTY keyboard
as, and hence often referred to as, X-ON and X-OFF. And they acquired
another use for flow control, still valid after paper tape has gone to
the great beyond, by swapping the sequence: instead of 'start reader'
and then 'stop reader', ^S is used for 'stop sending, I'm full or
busy' and ^Q for 'you may start sending again, I'm ready'.

NAK Negative acknowledge
SYN Synchronize

ETB end of text block, see above
CAN Cancel
EM End mode

End Medium (or media?)

SUB substitute for error or suspect (as per snipped prior post)
ESC escaoe

Nit: escape

FS field separator

_file_ separator

GS group separator
RS record separator
US unit separator

Note that FS GS RS US are consecutive codes descending in the
traditional file organization hierarchy, and the next sequential
codepoint is 0x20 SP space, the usual (text) word separator.

and DEL DELETE (255) all bits on (all holes punched)

127 0x7F in ASCII which is only 7 bits, but 255 0xFF in common (and
important) embeddings like even parity or mark 'parity'.

- David.Thompson1 at worldnet.att.net
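
The X-ON/X-OFF flow-control use of DC1 and DC3 described in this post can be sketched from the receiver's side in C. This is only one possible arrangement under stated assumptions: the `flow_rx` struct, the high and low watermarks, and the function `flow_rx_update` are all hypothetical names chosen for illustration, not taken from any real driver.

```c
#include <stddef.h>

#define ASCII_XON  0x11  /* DC1, ^Q: "you may start sending again" */
#define ASCII_XOFF 0x13  /* DC3, ^S: "stop sending, I'm full or busy" */

/* Hypothetical receiver state for this sketch. */
struct flow_rx {
    size_t used;       /* bytes currently buffered */
    size_t high_mark;  /* ask the sender to pause at or above this level */
    size_t low_mark;   /* ask the sender to resume at or below this level */
    int paused;        /* nonzero once XOFF has been sent */
};

/* Call after the buffer level changes. Returns ASCII_XOFF or ASCII_XON
   when a control byte should be sent to the peer, or 0 for nothing. */
int flow_rx_update(struct flow_rx *rx, size_t used)
{
    rx->used = used;
    if (!rx->paused && used >= rx->high_mark) {
        rx->paused = 1;
        return ASCII_XOFF;
    }
    if (rx->paused && used <= rx->low_mark) {
        rx->paused = 0;
        return ASCII_XON;
    }
    return 0;
}
```

Using two separate watermarks rather than a single threshold keeps the receiver from sending ^S and ^Q in rapid alternation when the buffer level hovers near one value.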
 
