Multiple versions of "Extended ASCII characters" (No. 128 to 255)


osmium

wob said:
Many thanks to those who responded to my question about "putting greek char
into C string". In searching for a solution, I noticed that there is
more than one version of "Extended ASCII characters" (No. 128 to 255),
e.g., in one version No. 224 is the symbol alpha, in another it's an "a"
with a ` on it... How come?

The phrase "extended ASCII" has come to mean that the new character set
contains ASCII as a subset. There are probably hundreds of these. ISTM
there should have been a better way to express that thought, but it doesn't
leap out at me. Related words that might help you pursue this subject in
google: font, code page.

There is now, and always has been, only one ASCII, and it contains 128
characters: basically the American version of the Latin alphabet, plus
digits, punctuation and control characters. There is no established
graphic to identify the control characters.
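
Since ASCII proper stops at code 127, a quick way to tell whether a string has strayed into some "extended" set is to look for bytes above 127. A minimal sketch; the function name is_all_ascii is just an illustrative choice, not a standard library routine:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every byte of the NUL-terminated string s is a 7-bit
   ASCII code (0..127), and 0 as soon as a byte above 127 is found. */
int is_all_ascii(const char *s)
{
    assert(s != NULL);
    /* Go through unsigned char: plain char may be signed, in which
       case bytes above 127 would compare as negative values. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if (*p > 127)
            return 0;
    }
    return 1;
}
```

For example, is_all_ascii("hello") yields 1, while is_all_ascii("\xE0") yields 0, because byte 224 lies outside ASCII and its meaning depends on the character set in use.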
 

Lew Pitcher

The phrase "extended ASCII" has come to mean that the new character set
contains ASCII as a subset.

Precisely!

IMHO, the phrase "Extended ASCII" should be banned from any discussion. People
too often say "Extended ASCII" when they mean "some unknown character set that
shares a common set of characters with ASCII", and expect a precise answer
relating to ASCII.

There are probably hundreds of these.

One of the ISO working committees keeps a website just as a catalog of
character sets. The URL is http://anubis.dkuug.dk/i18n/charmaps/

ISTM
there should have been a better way to express that thought, but it doesn't
leap out at me. Related words that might help you pursue this subject in
google: font, code page.

"coded character set" or "coded characterset"
Also, related to "characterset translation"

There is now, and always has been only one ASCII and it contains 128
characters, basically the American version of the latin alphabet, plus
digits and punctuation and control characters. There is no established
graphic to identify the control characters.

See http://anubis.dkuug.dk/i18n/charmaps/ASCII for an ASCII-to-Unicode table.
While you /can/ purchase the specs from ISO (ISO/IEC 646, 6429 and 2022),
ECMA provides identical specs (ECMA-6, ECMA-48 and ECMA-35) for free at
http://www.ecma-international.org/publications/files/ecma-st/ECMA-006.pdf,
http://www.ecma-international.org/publications/files/ecma-st/ECMA-048.pdf, and
http://www.ecma-international.org/publications/files/ecma-st/ECMA-035.pdf

--
Lew Pitcher
IT Specialist, Enterprise Data Systems,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')
 

Chris Croughton

Many thanks to those who responded to my question about "putting greek char
into C string". In searching for a solution, I noticed that there is more
than one version of "Extended ASCII characters" (No. 128 to 255), e.g., in
one version No. 224 is the symbol alpha, in another it's an "a" with a ` on
it... How come?

There is no such thing as "Extended ASCII" in any meaningful form. It's
like "C with extensions": the extended parts are done by whoever wants
them.

ASCII defines /only/ characters using the bottom 7 bits, thus the
characters numbered 0 to 127. Various people have decided that they
want more, so they allocated them to codes above 127 as they felt like
it. Line drawing characters, European accented characters (at least
four versions used commonly in Europe), mathematical symbols, Cyrillic
(Russian) characters, Greek, funny faces, you name it. And of course
Microsoft came up with its own ones different from any others.
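
To make the original poster's observation concrete, here is a small sketch that records what the single byte 224 (0xE0) means in three coded character sets. The table and the function name decode_byte_e0 are illustrative only (they come from no library); the code points are those published in the respective mapping tables:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per coded character set: its name, and the Unicode code
   point that byte 0xE0 (decimal 224) stands for in that set. */
struct charset_entry {
    const char *name;
    unsigned long code_point;
};

static const struct charset_entry byte_e0_table[] = {
    { "CP437",      0x03B1 }, /* GREEK SMALL LETTER ALPHA */
    { "ISO-8859-1", 0x00E0 }, /* LATIN SMALL LETTER A WITH GRAVE */
    { "ISO-8859-7", 0x03B0 }, /* GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS */
};

/* Returns the code point for byte 0xE0 in the named character set,
   or 0 if the set is not in this (deliberately tiny) table. */
unsigned long decode_byte_e0(const char *charset)
{
    assert(charset != NULL);
    size_t n = sizeof byte_e0_table / sizeof byte_e0_table[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(byte_e0_table[i].name, charset) == 0)
            return byte_e0_table[i].code_point;
    }
    return 0;
}
```

So the same byte is alpha on an old IBM PC (code page 437) and an "a" with a grave accent under ISO-8859-1, which is exactly the discrepancy described in the question.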

Recently (i.e. in the last 20 years) there have been attempts to
standardise, but because all of the characters can't fit into the
'spare' 128 available positions there are lots of variants in the
ISO-8859 standard (at least 10 variants). See for instance

http://czyborra.com/charsets/iso8859.html

It was realised that what was really wanted was a much expanded
character space, to allow for the thousands of Chinese characters and
other languages to be added, so Unicode was born. This assigns each
character exactly one code point (some of the characters look alike but
are in different national or specific sets so they are treated as
different characters). Code points are commonly stored as fixed-width
32 bit units (UTF-32) or as 16 bit units (UTF-16, where characters
beyond the first 65,536 code points take two units).

Because much software still uses 8 bit strings (and 8 bit transport
paths), Unicode also specifies a method of converting a 'wide' (16 or
32 bit) character into a string of 8 bit characters. This system,
UTF-8 (Unicode Transformation Format, 8 bit), keeps each ASCII character
as a single byte with the top bit zero, so it is compatible with 7 bit
ASCII; bytes with the top bit set are not valid on their own, only as
part of a "multi-byte character" sequence.
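
The UTF-8 scheme described above can be sketched in a few lines of C. This is a hand-rolled illustration, not a library routine; the name utf8_encode and its signature are my own choice:

```c
#include <assert.h>
#include <stddef.h>

/* Encodes Unicode code point cp as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp cannot be encoded
   (a UTF-16 surrogate, or a value beyond U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                 /* ASCII: one byte, top bit zero */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* surrogates are not characters */
    if (cp < 0x10000) {              /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* four bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this, 'A' (U+0041) comes out as the single byte 0x41, while alpha (U+03B1) becomes the two bytes 0xCE 0xB1. That also gives one answer to the earlier "greek char in a C string" question: if the output device expects UTF-8, the literal "\xCE\xB1" is an alpha.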

The web page above has descriptions of the ISO 8859 variants, and also
points to articles and descriptions of Unicode, UTF-8 and other matters.

This is relevant to C in the support for 'wide' characters and multibyte
characters, and the functions which transform and output them.
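
As a rough illustration of those functions, the sketch below converts a multibyte string into a wide string with the standard mbsrtowcs(). The helper name to_wide is hypothetical, and which encodings are accepted depends entirely on the current locale, so a real program would normally call setlocale(LC_ALL, "") first:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts the multibyte string mb (interpreted in the current locale's
   encoding) into a newly allocated wide string. Returns NULL if mb is
   not valid in that encoding or if allocation fails. The caller must
   free() the result. */
wchar_t *to_wide(const char *mb)
{
    assert(mb != NULL);

    mbstate_t state;
    const char *src = mb;

    /* First pass: a NULL destination makes mbsrtowcs only count the
       wide characters needed (excluding the terminator). */
    memset(&state, 0, sizeof state);
    size_t n = mbsrtowcs(NULL, &src, 0, &state);
    if (n == (size_t)-1)
        return NULL;

    wchar_t *wide = malloc((n + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* Second pass: do the conversion, including the terminating L'\0'. */
    memset(&state, 0, sizeof state);
    src = mb;
    if (mbsrtowcs(wide, &src, n + 1, &state) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

In the default "C" locale only plain ASCII input is guaranteed to convert; under a UTF-8 locale the bytes 0xCE 0xB1 would come back as a single wide character. The reverse direction is handled by wcsrtombs().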

Chris C
 

Dave Thompson

The phrase "extended ASCII" has come to mean that the new character set
contains ASCII as a subset. There are probably hundreds of these. ISTM
there should have been a better way to express that thought, but it doesn't
leap out at me. Related words that might help you pursue this subject in
google: font, code page.

Right.

There is now, and always has been only one ASCII and it contains 128
characters, basically the American version of the latin alphabet, plus
digits and punctuation and control characters. There is no established
graphic to identify the control characters.

There is only one ASCII now, but it has changed significantly at least
once, when lowercase and the other 6/x and 7/x characters were added, IIRC about 1968.
And to be pedantic it went through periods of being designated USASCII
and ANSCII as the name of the organization changed, but this did not
imply any substantive change. The American alphabet is the (modern)
English alphabet, at least for America = US plus most of CA; there are
other American countries (primarily) using other languages.

There _is_ a standard for graphical representations for control
characters, albeit at least mostly just two-letter mnemonics jammed
together, not "graphical" in the common sense of pictorial or iconic:
ISO 2047, IIRC based on and superseding an X3.n like 646 versus ASCII;
but it certainly hasn't been widely used or even known. I have seen
what I believe(d) were displays obeying it on various datascopes, and
a few (real) terminals back-in-the-day in "show controls" mode.

- David.Thompson1 at worldnet.att.net
 
