Character Array vs String

B

BartC

Tobias Blass said:
Well Windows developed after UNIX, so they could have adopted the \n
encoding if they wanted to. You could as well reverse your argument and
say "but to this day text files won't display properly on *NIX as a
result" (most *NIX utilities can handle \r\n encodings, though). I also
don't think \n was used to save a byte (CMIIW). \n is more "natural" (you
want a newline, so you add a newline character), but \r\n is more natural
if you are used to typewriters. Since typewriters are quite rare these
days I think the *NIX way makes more sense, but YMMV.

Anyone who's used a teletype will know that both carriage return and
linefeed are usually necessary!

So it made sense to include both of those characters in text files, and this
was long before Windows.

On input of course, only one character is transmitted (when Return is
pressed), and the TTY software ensures it is echoed as CR followed by
LF. Perhaps Unix modelled the input sequence rather than the output.
 
K

Keith Thompson

Quentin Carbonneaux said:
To my knowledge, C does not have strings.

Your knowledge is incomplete. C doesn't have a string *type*, but it
does have strings, defined as "a contiguous sequence of characters
terminated by and including the first null character".
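
For illustration, a minimal sketch of what that definition means in code
(the array names here are invented for the example):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "hello";                     /* array of 6 chars; holds a string      */
    char arr[5] = {'h', 'e', 'l', 'l', 'o'};  /* array of 5 chars; NOT a string: no '\0' */

    /* str contains a string: a contiguous sequence of characters
       terminated by and including the first null character. */
    printf("%zu\n", strlen(str));             /* prints 5 */

    /* Calling strlen(arr) would be undefined behaviour, because there
       is no null character anywhere within that array. */
    (void)arr;
    return 0;
}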
 
K

Keith Thompson

Nick Keighley said:
CONTEXT!!
Leave some in so we know what you are responding to.



the character code you are trying to refer to is "ASCII" not "ANSI"

I doubt it. He's probably trying to refer to the Windows-specific
character set that is usually incorrectly called "ANSI"
(even though it's never been the subject of any ANSI
standard). It's more properly called Windows-1252 or CP-1252
(<http://en.wikipedia.org/wiki/Windows-1252>). It's an 8-bit
character set that's similar to ISO-8859-1 (Latin-1), except that
codes in the range 0x80 .. 0x9f are used for printable characters
rather than control characters.

In the Windows world, it's common to refer to this particular 8-bit
character set (or similar variants) as "ANSI", and to a 16-bit
representation (UTF-16 or UCS-2) as "Unicode", even though UTF-16 is
just one of several *representations* of Unicode.

ASCII is a 7-bit character set, not an 8-bit character set (ASCII
characters are typically represented as 8-bit bytes with the
high-order bit set to 0). There are a number of 8-bit extensions
to ASCII.
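
To make those ranges concrete, here's a rough sketch (the classify()
helper is invented for illustration and only covers the ranges
mentioned above):

#include <stdio.h>

/* Hypothetical helper: classify a byte according to the ranges
   discussed above.  0x00-0x7f is ASCII; 0x80-0x9f is control
   characters in ISO-8859-1 but (mostly) printable characters in
   Windows-1252; 0xa0-0xff is the same printable range in both. */
static const char *classify(unsigned char b)
{
    if (b < 0x80)
        return "7-bit ASCII";
    else if (b <= 0x9f)
        return "C1 control in ISO-8859-1, printable in Windows-1252";
    else
        return "same printable character in ISO-8859-1 and Windows-1252";
}

int main(void)
{
    unsigned char samples[] = { 'A', 0x93, 0xe9 };  /* 0x93 is a curly quote in CP-1252 */
    for (size_t i = 0; i < sizeof samples; i++)
        printf("0x%02x: %s\n", (unsigned)samples[i], classify(samples[i]));
    return 0;
}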
 
K

Keith Thompson

Malcolm McLean said:
But then you'd have two file formats, identical except for the tags,
and the potential for extra costs and incompatibilities would be
large. A bit like the decision to encode newline/carriage return as
just a newline. It saved a byte, but to this day text files won't
display properly on Windows as a result.

IMHO it would have been far better to define a single character that
marks the end of a line in a text file. In an ASCII-like encoding, for
example, you might keep the existing CR and LF characters (CR specifies
moving the cursor to the beginning of the current line, LF specifies
moving it down to the next line) and add a NL character that's
specifically for use in text files. (EBCDIC has distinct CR, LF, and NL
characters; I don't know if they're used this way.)

The use of a two-character sequence to mark the end of a line is a
concession to old hardware that needed separate commands to move the
cursor to the beginning of the line and to advance to the next one.
It's not necessary on current systems.
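
In code that has to read text from either world, the normalisation
typically ends up looking something like this (a sketch; the function
name is invented, and it treats \r\n, bare \n and bare \r all as line
ends):

#include <stdio.h>

/* Hypothetical helper: copy a text stream to stdout, treating "\r\n",
   bare '\n' and bare '\r' all as end-of-line, and emitting just '\n'. */
static void normalise_newlines(FILE *in)
{
    int c;
    while ((c = getc(in)) != EOF) {
        if (c == '\r') {
            int next = getc(in);
            if (next != '\n' && next != EOF)
                ungetc(next, in);   /* lone CR: treat it as a line end too */
            putchar('\n');
        } else {
            putchar(c);
        }
    }
}

int main(void)
{
    normalise_newlines(stdin);
    return 0;
}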

But we're stuck with the current situation. MacOS managed to change
its end-of-line marker from CR to LF, but I don't know if Windows
could (or would) do the same thing.
 
S

Stephen Sprunk

The use of a two-character sequence to mark the end of a line is a
concession to old hardware that needed separate commands to move the
cursor to the beginning of the line and to advance to the next one.
It's not necessary on current systems.

AIUI, old teletypes needed the _time_ that it took to receive two
characters to do that combined action. IOW, if only a single character
was used, the machine's output would fall further and further behind its
input with each line printed--and they didn't have much in the way of
buffers, so "real" characters would soon start getting lost.

S
 
B

Ben Bacarisse

Stephen Sprunk said:
AIUI, old teletypes needed the _time_ that it took to receive two
characters to do that combined action. IOW, if only a single character
was used, the machine's output would fall further and further behind its
input with each line printed--and they didn't have much in the way of
buffers, so "real" characters would soon start getting lost.

Some needed *more* time. A logical newline was sometimes translated
into \n\r\0\0\0\0\0\0\0\0\0. If I'm not mistaken, some systems had to
send more nulls after long lines than after short ones.

What the hardware needs is now, thankfully, kept entirely separate
from what marks logical lines in input or output.
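
Very roughly, that old-style translation looked something like this (a
sketch; the pad count of 4 is arbitrary, and as noted some systems
varied it with the length of the line just printed):

#include <stdio.h>

/* Rough sketch of the kind of expansion an output driver of that era
   did: each logical newline becomes LF, CR and a run of NUL fill
   characters, giving the print head time to get back to the margin. */
static void put_translated(int c, FILE *out)
{
    if (c == '\n') {
        putc('\n', out);
        putc('\r', out);
        for (int i = 0; i < 4; i++)   /* pad count chosen arbitrarily here */
            putc('\0', out);
    } else {
        putc(c, out);
    }
}

int main(void)
{
    const char *msg = "one line\nand another\n";
    for (const char *p = msg; *p; p++)
        put_translated(*p, stdout);
    return 0;
}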
 
N

Nick Keighley

On Nov 9, 12:48 pm, 88888 Dihedral <[email protected]>

Your attributions are screwed; I didn't write that.
Furthermore, ASCII is a 7-bit code, not 8.

I thought there was a parity bit?
 It is usually extended in some
fashion to become eight bits in actual use as opposed to an abstraction.

ASCII wasn't "an abstraction", it was a widely used code. Those codes
using the top bit for data encoding aren't ASCII

<snip>
 
N

Nick Keighley

Anyone who's used a teletype will know that both carriage return and
linefeed are usually necessary!

Sometimes two carriage returns, to give the teletype the time it needs
to do this.
So it made sense to include both of those characters in text files, and this
was long before Windows.

Yes, but some terminals need \r\n, some need \r\r\n, and glass ttys don't
need both. The Unix decision to use a single logical end-of-line
character is the sane answer. The driver sorts out the display
problems.

I think I've seen \r used as a line terminator on DEC(?) machines.
 
B

BartC

Ben Bacarisse said:
Some needed *more* time. A logical newline was sometimes translated
into \n\r\0\0\0\0\0\0\0\0\0. If I'm not mistaken, some systems had to
send more nulls after long lines than after short ones.

Didn't a teletype use any handshaking to indicate when it was ready for more
data?

Each extra \0 took another 1/10 second to send (on an ASR33 anyway),
accompanied by a lot of extra clatter. I think it would be noticed if it
happened on every line!
What the hardware needs is now, thankfully, kept entirely separate
from what marks logical lines in input or output.

But CR and LF are sometimes still considered to be positioning codes, rather
than logical separators. (Although Windows consoles seem to have dropped
support for LF, so printf("abc%cdef",10); shows:

abc
def

instead of:

abc
   def

but printf("abc%cdef",13); does still display:

def

)
 
K

Keith Thompson

BartC said:
Didn't a teletype use any handshaking to indicate when it was ready for more
data?

Not necessarily.
Each extra \0 took another 1/10 second to send (on an ASR33 anyway),
accompanied by a lot of extra clatter. I think it would be noticed if it
happened on every line!

Even today, the man page for the "stty" command has the following:

* [-]ofdel
use delete characters for fill instead of null characters

* [-]ofill
use fill (padding) characters instead of timing for delays

I seem to recall using systems where "stty" has a setting to control how
many null characters are transmitted after a carriage return, to allow
time for the cursor to reach the beginning of the line.
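
Assuming a POSIX/XSI system where OFILL and OFDEL are still available
in <termios.h>, the same request can be made from C; a sketch:

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Sketch: ask the tty driver to send fill characters (DEL rather than
   NUL) instead of using timed delays -- the C-level equivalent of
   "stty ofill ofdel".  Only meaningful if the driver actually inserts
   delays after CR/NL in the first place. */
int main(void)
{
    struct termios t;

    if (tcgetattr(STDOUT_FILENO, &t) != 0) {
        perror("tcgetattr");
        return 1;
    }
    t.c_oflag |= OFILL;   /* use fill (padding) characters instead of timing */
    t.c_oflag |= OFDEL;   /* fill with DEL (0x7f) instead of NUL             */
    if (tcsetattr(STDOUT_FILENO, TCSANOW, &t) != 0) {
        perror("tcsetattr");
        return 1;
    }
    puts("fill characters enabled");
    return 0;
}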
But CR and LF are sometimes still considered to be positioning codes, rather
than logical separators.

Yes, which is why I suggested that it would have made more sense to have
a *separate* logical separator.
 
B

Ben Bacarisse

BartC said:
Didn't a teletype use any handshaking to indicate when it was ready
for more data?

In the case I am dimly remembering, data readiness was not the issue.
The interface could take more data and it would be printed wherever the
print head happened to be at the time. I am sure this was regarded as
daft even then.

But CR and LF are sometimes still considered to be positioning codes,
rather than logical separators. (Although Windows consoles seem to
have dropped support for LF, so printf("abc%cdef",10); shows:

abc
def

instead of:

abc
   def

Your example is in C, and in C the intent of printing '\n' to a display
device is to move "the active position to the initial position of the
next line". This is tied in to C's adoption of '\n' as marking the end
of a line of text -- \n is a line end, and the IO system must do what it
takes to make it so.
but printf("abc%cdef",13); does still display:

def

I'd write \r for portability and clarity but, yes, that is the common
behaviour and what the C standard intends: \r "[m]oves the active
position to the initial position of the current line".
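
That \r behaviour is also what makes the familiar one-line progress
display work; a small sketch (sleep() is POSIX and is only there to
slow the demo down):

#include <stdio.h>
#include <unistd.h>

/* Each iteration returns to the start of the line with '\r' and
   overwrites the previous count in place. */
int main(void)
{
    for (int i = 0; i <= 10; i++) {
        printf("\rprogress: %2d/10", i);
        fflush(stdout);            /* no '\n', so flush explicitly */
        sleep(1);
    }
    putchar('\n');                 /* finally move to the next line */
    return 0;
}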
 
O

osmium

Nick said:
your attributions are screwed I didn't write that

Sorry about that. I didn't notice you were using google to respond. My
email program considers google newsgroup responses as "broken" and I use a
special program (this one) to respond to those messages. But I just didn't
notice that you were using google.
I thought there was a parity bit?

We all have misinformation in our head.
ASCII wasn't "an abstraction", it was a widely used code. Those codes
using the top bit for data encoding aren't ASCII

I used "abstraction" in the sense that I don't know of any place where pure,
honest-to-god-ASCII is used. The interchange hardware I know of, optical
disks, tapes, external hard drives, are all based on 8-bit bytes. I guess I
would grudgingly accept a code where the top bit was guaranteed to be zero
or used for parity. If you can provide a sample or three I would be very
interested. Perhaps there is an obscure code point in MS-DOS that you know
of?

Clinton mode on. I now note you say "was" rather than "is". Where _was_
real ASCII used? Was there perhaps a magnetic tape format for 7-bit code
that two or more organizations agreed to use?
 
N

Nick Keighley

Nick said:
[...] ASCII is a 7-bit code, not 8.
I thought there was a parity bit?

We all have misinformation in our head.

From other posters I think the 7-bit stuff is correct. It may have
been transmitted with parity. Ah, Wikipedia describes it as a 7-bit
code.
I used "abstraction" in the sense that I don't know of any place where pure,
honest-to-god-ASCII is used.

it won't be used on anything modern. But there's a lot of old stuff
around.
 The interchange hardware I know of, optical
disks, tapes, external hard drives, are all based on 8-bit bytes.

think teleprinters and paper tapes.
 I guess I
would grudgingly  accept a code where the top bit was guaranteed to be zero
or used for parity.  If you can provide a sample or three I would be very
interested.

Old teleprinter and telex systems worked like this.
 Perhaps there is an obscure code point in MS-DOS that you know
of?

oh, nothing as modern as a PC! Though I've seen PCs linked to such
legacy systems.

Consider:-
http://en.wikipedia.org/wiki/Aeronautical_Fixed_Telecommunication_Network

Parts of it are probably still running and using shitty telegraph
lines, because to upgrade would involve upgrading everyone at once.
Clinton mode on.  I now note you say "was" rather than "is".  Where _was_
real ASCII used?  Was there perhaps a magnetic tape format for 7-bit code
that two or more organizations agreed to use?

ASCII was designed for serial data exchange. Take a look at the codes
below 32. They don't code for playing card suits, line drawing or
obscure European letters, but for SOH and ETX and so on.
 
L

lawrence.jones

osmium said:
I used "abstraction" in the sense that I don't know of any place where pure,
honest-to-god-ASCII is used. The interchange hardware I know of, optical
disks, tapes, external hard drives, are all based on 8-bit bytes. I guess I
would grudgingly accept a code where the top bit was guaranteed to be zero
or used for parity. If you can provide a sample or three I would be very
interested.

Lots of serial devices (like genuine Teletypes) used 7-bit ASCII with
parity, as did most punched paper tape.
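
For concreteness, generating the parity bit for a 7-bit code looks
something like this (even parity shown; the helper is invented for the
example):

#include <stdio.h>

/* Hypothetical helper: given a 7-bit ASCII code, return the 8-bit
   value with the top bit set so that the total number of 1 bits is
   even ("even parity"). */
static unsigned char add_even_parity(unsigned char c)
{
    unsigned char ones = 0;

    c &= 0x7f;                      /* keep only the 7 data bits */
    for (int i = 0; i < 7; i++)
        ones += (c >> i) & 1;
    return (ones % 2) ? (unsigned char)(c | 0x80) : c;
}

int main(void)
{
    unsigned char c = 'A';          /* 0x41 has two 1 bits: parity bit stays 0    */
    unsigned char d = 'C';          /* 0x43 has three 1 bits: parity bit becomes 1 */
    printf("%c -> 0x%02x\n", c, add_even_parity(c));
    printf("%c -> 0x%02x\n", d, add_even_parity(d));
    return 0;
}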
 
J

Joachim Schmitz

Nick Keighley wrote:
ASCII was designed for serial data exchange. Take a look at the codes
below 32. They don't code for playing card suits, line drawing or
obscure European letters, but for SOH and ETX and so on.

Obscure European letters??? As opposed to an entirely useless $ sign
(useless this side of the big pond at least), [], {} and @?

Bye, Jojo

PS: ;-)
 
K

Keith Thompson

Lots of serial devices (like genuine Teletypes) used 7-bit ASCII with
parity, as did most punched paper tape.

I remember reading about one system that used 7-bit ASCII in 8-bit bytes
with the high-order bit set to 1. It made interoperability fun.
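
Recovering plain ASCII from data like that is just a matter of masking
off the top bit; a tiny sketch:

#include <stdio.h>

/* Strip the high-order bit from each byte of input, recovering the
   7-bit ASCII codes from data stored with the top bit forced to 1. */
int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c & 0x7f);
    return 0;
}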
 
B

Ben Bacarisse

Kenneth Brody said:
Nick Keighley wrote:
ASCII was designed for serial data exchange. Take a look at the codes
below 32. They don't code for playing card suits, line drawing or
obscure European letters, but for SOH and ETX and so on.

Obscure European letters??? As opposed to an entirely useless $ sign
(useless this side of the big pond at least), [], {} and @?

Well, given that the "A" in "ASCII" stands for "American", I think
that "obscure European letters" is appropriate, as your side of the
Big Pond is irrelevant. :)

Given that Spanish is moderately widely spoken in the US, it seems odd
(from this side of the Pond) that ñ, á, é, í, ó and ú are considered
to be either obscure or European :)
 
N

Nick Keighley

Nick Keighley wrote:

ASCII was designed for serial data exchange. Take a look at the codes
below 32. They don't code for playing card suits, line drawing or
obscure European letters, but for SOH and ETX and so on.

Obscure European letters??? As opposed to an entirely useless $ sign
(useless this side of the big pond at least), [], {} and @?


How could we write Perl if there were no $ sign? Some places reuse it
as an alternative currency sign.
:)
 
J

Jens Thoms Toerring

Lots of serial devices (like genuine Teletypes) used 7-bit ASCII with
parity, as did most punched paper tape.

As do lots of devices used in the lab today. They are connected
via the serial port (you can often set whether parity is to be
used), and there are still lots of them; even quite a number of
new devices can still be controlled via the serial port, although
they normally also have a GPIB or USB connector or are networked.
That can be quite useful for debugging: for USB, GPIB or network
you typically need to write programs to talk to them, while with
the serial interface you just plug them in, start a program like
'minicom', and then you can communicate directly with the device
by typing in commands and seeing what the reply is.
Regards, Jens
 
D

David Thompson

Nick said:
[ASCII] is usually extended in some
fashion to become eight bits in actual use as opposed to an
abstraction.

ASCII wasn't "an abstraction", it was a widely used code. Those codes
using the top bit for data encoding aren't ASCII

I used "abstraction" in the sense that I don't know of any place where pure,
honest-to-god-ASCII is used. The interchange hardware I know of, optical
disks, tapes, external hard drives, are all based on 8-bit bytes. I guess I
would grudgingly accept a code where the top bit was guaranteed to be zero
or used for parity. If you can provide a sample or three I would be very
interested. Perhaps there is an obscure code point in MS-DOS that you know
of?

I would consider any 8-bit (or bigger) format of which a given 7-bits
is always ASCII to be using ASCII. E.g. ASCII + high bit 0, or 1, or
even parity, or odd, or flag, or even just out-of-band noise.
8-bit codes where 0x00-7f are ASCII but 0x80-ff are something else,
like line-drawing or math or dingbats or European or Greek or Russian
or Arabic or ... are no longer ASCII, although they are related.

But to answer the exact question, DEC PDP-10 computers had a 36-bit
word with a few special 'byte' instructions that could access *any*
number of bits within a word (and historically on a range of machines
bytes are not always 8 bits). Most software on 10's either divided
each word into 6 characters of 6-bits using the printable uppercase
subset of ASCII (i.e. columns 2 to 5), or into 5 characters of 7-bit
ASCII with 1 bit left over. (In those days nearly all terminals and
printers were uppercase-only anyway.) Some software did use 8-bit
bytes with 4 bits left, or 'spanned' 9 bytes of 8-bits over 2 words,
or had 4 bytes of 9-bits per word. I suppose some software may even
have done 7 bytes of 5-bits leaving 1 using Baudot/ITA2 or similar.
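
As I understand the usual 5-character layout (characters packed from
the high end of the word, one spare bit at the low end), it can be
sketched with a 64-bit integer standing in for the 36-bit word:

#include <stdio.h>
#include <stdint.h>

/* Sketch: pack five 7-bit ASCII characters into the low 36 bits of a
   uint64_t, leaving the lowest bit of the 36-bit word unused.  This
   only illustrates the layout; it is not a faithful PDP-10 model. */
static uint64_t pack5(const char s[5])
{
    uint64_t word = 0;
    for (int i = 0; i < 5; i++)
        word |= (uint64_t)(s[i] & 0x7f) << (29 - 7 * i);
    return word;
}

/* Unpack back into a buffer of at least 6 chars (including '\0'). */
static void unpack5(uint64_t word, char s[6])
{
    for (int i = 0; i < 5; i++)
        s[i] = (char)((word >> (29 - 7 * i)) & 0x7f);
    s[5] = '\0';
}

int main(void)
{
    char out[6];
    uint64_t w = pack5("HELLO");
    unpack5(w, out);
    printf("packed:   %09llx (36 bits)\n", (unsigned long long)w);
    printf("unpacked: %s\n", out);
    return 0;
}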
Clinton mode on. I now note you say "was" rather than "is". Where _was_
real ASCII used? Was there perhaps a magnetic tape format for 7-bit code
that two or more organizations agreed to use?

Early IBM computers did have 7-track magtape, but used with IBM's
'BCDIC' 6-bit code plus parity (formally called Vertical Redundancy
Check). S/360 and later switched to 8-bit EBCDIC, on 9-track tape.

Technically it would have been possible for other (>=7-bit) equipment
to write 7-bit ASCII to 7-track tape with some escaping if necessary
to avoid NULs (because all 0-bits would screw up decode clocking).
But I never heard of anyone doing so. (There were standard formats for
ASCII on 9-track, including labels, but the overwhelming majority of
systems were IBM or wanted IBM compatibility and used EBCDIC.)
 
