multiple versions of "Extended ASCII characters"(No. 128 to 255)

Discussion in 'C Programming' started by wob, Jul 21, 2005.

  1. wob

    wob Guest

    wob, Jul 21, 2005
    #1

  2. wob

    osmium Guest

    "wob" writes:

    > Many thanks for those who responded to my question of "putting greek char
    > into C string". In searching for an solution, I noticed that there are
    > more than one version of "Extended ASCII characters"(No. 128 to 255) .
    > e.g., in one version No. 224 is the symbol alpha, in another, it's a "a"
    > with a ` on it... How come?


    The phrase "extended ASCII" has come to mean that the new character set
    contains ASCII as a subset. There are probably hundreds of these. ISTM
    there should have been a better way to express that thought, but it doesn't
    leap out at me. Related words that might help you pursue this subject in
    google: font, code page.

    There is now, and always has been, only one ASCII, and it contains 128
    characters: basically the American version of the Latin alphabet, plus
    digits, punctuation and control characters. There is no established
    graphic to identify the control characters.
    osmium, Jul 21, 2005
    #2

  3. wob

    Lew Pitcher Guest

    osmium wrote:
    > "wob" writes:
    >
    >
    >>Many thanks for those who responded to my question of "putting greek char
    >>into C string". In searching for an solution, I noticed that there are
    >>more than one version of "Extended ASCII characters"(No. 128 to 255) .
    >>e.g., in one version No. 224 is the symbol alpha, in another, it's a "a"
    >>with a ` on it... How come?

    >
    >
    > The phrase "extended ASCII" has come to mean that the new character set
    > contains ASCII as a subset.


    Precisely!

    IMHO, the phrase "Extended ASCII" should be banned from any discussion. People
    too often say "Extended ASCII" when they mean "some unknown characterset that
    shares a common set of characters with ASCII", and expect a precise answer
    relating to ASCII.

    > There are probably hundreds of these.


    One of the ISO working committees keeps a website just as a catalog of
    charactersets. The URL is http://anubis.dkuug.dk/i18n/charmaps/

    > ISTM
    > there should have been a better way to express that thought, but it doesn't
    > leap out at me. Related words that might help you pursue this subject in
    > google: font, code page.


    "coded character set" or "coded characterset"
    These terms are also related to "characterset translation".


    > There is now, and always has been only one ASCII and it contains 128
    > characters, basically the American version of the latin alphabet, plus
    > digits and punctuation and control characters. There is no established
    > graphic to identify the control characters.


    See http://anubis.dkuug.dk/i18n/charmaps/ASCII for an ASCII-to-Unicode table.
    While you /can/ purchase the ASCII specs from ISO, the ECMA provides identical
    specs for free at
    http://www.ecma-international.org/publications/files/ecma-st/ECMA-006.pdf,
    http://www.ecma-international.org/publications/files/ecma-st/ECMA-048.pdf, and
    http://www.ecma-international.org/publications/files/ecma-st/ECMA-035.pdf

    - --
    Lew Pitcher
    IT Specialist, Enterprise Data Systems,
    Enterprise Technology Solutions, TD Bank Financial Group

    (Opinions expressed are my own, not my employers')
    Lew Pitcher, Jul 21, 2005
    #3
  4. On Thu, 21 Jul 2005 10:59:45 -0500, wob wrote:

    > Many thanks for those who responded to my question of "putting greek char
    > into C string". In searching for an solution, I noticed that there are more
    > than one version of "Extended ASCII characters"(No. 128 to 255) . e.g., in
    > one version No. 224 is the symbol alpha, in another, it's a "a" with a ` on
    > it... How come?


    There is no such thing as "Extended ASCII" in any meaningful form. It's
    like "C with extensions", the extended parts are done by whoever wants
    them.

    ASCII defines /only/ characters using the bottom 7 bits, thus the
    characters numbered 0 to 127. Various people have decided that they
    want more, so they allocated them to codes above 127 as they felt like
    it. Line drawing characters, European accented characters (at least
    four versions used commonly in Europe), mathematical symbols, Cyrillic
    (Russian) characters, Greek, funny faces, you name it. And of course
    Microsoft came up with its own ones different from any others.
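
    As a minimal sketch of why code 224 shows up as alpha in one table and
    as an "a" with a grave accent in another, the function below maps a
    byte to a Unicode code point under two of these character sets. The
    names byte_to_unicode, CS_CP437 and CS_ISO_8859_1 are made up for this
    illustration, and only a couple of CP437 entries are filled in.

```c
/* Hypothetical names for illustration; not part of any standard library. */
enum charset { CS_CP437, CS_ISO_8859_1 };

/*
 * Returns the Unicode code point that `byte` stands for under `cs`,
 * or -1 if this sketch does not cover that byte.
 */
long byte_to_unicode(enum charset cs, unsigned char byte)
{
    if (byte < 128)
        return byte;                /* the ASCII range is shared by both sets */

    if (cs == CS_ISO_8859_1) {
        if (byte >= 0xA0)
            return byte;            /* Latin-1 graphics map 1:1 to U+00A0..U+00FF */
        return -1;                  /* 0x80..0x9F: no graphic characters */
    }

    /* CP437 (the original IBM PC set): only two sample entries here. */
    switch (byte) {
    case 0xE0: return 0x03B1;       /* GREEK SMALL LETTER ALPHA */
    case 0xE3: return 0x03C0;       /* GREEK SMALL LETTER PI */
    default:   return -1;           /* not covered by this sketch */
    }
}
```

    With this, byte 224 (0xE0) comes back as U+03B1 (alpha) for CS_CP437
    and as U+00E0 (a with grave) for CS_ISO_8859_1, while any byte below
    128 comes back unchanged for both.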

    Recently (i.e. in the last 20 years) there have been attempts to
    standardise, but because all of the characters can't fit into the
    'spare' 128 available positions there are lots of variants in the
    ISO-8859 standard (at least 10 variants). See for instance

    http://czyborra.com/charsets/iso8859.html

    It was realised that what was really wanted was a much expanded
    character space, to allow for the thousands of Chinese characters and
    other languages to be added, so Unicode was born. This assigns each
    character to only one position (code point), commonly stored in 16 bit
    units (UTF-16, where some characters need two units) or fixed-width 32
    bit units (UTF-32). Some of the characters look alike but are in
    different national or specific sets, so they are treated as different
    characters.

    Because much software still uses 8 bit strings (and 8 bit transport
    paths), Unicode also specifies a method of converting a 'wide' (16 or
    32 bit) character into a string of 8 bit characters. This system,
    UTF-8 (Unicode Transformation Format, 8 bit), keeps each ASCII
    character as a single byte with the top bit zero, so it is compatible
    with 7 bit ASCII, and bytes with the top bit set are not valid on
    their own, only as part of a "multi-byte character" sequence.
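
    The UTF-8 scheme described above can be sketched in a few lines of C.
    This is an illustration, not a library routine: the name utf8_encode
    and its interface are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Encodes the Unicode code point `cp` as UTF-8 into `out` (room for 4 bytes).
 * Returns the number of bytes written, or 0 if `cp` cannot be encoded
 * (a UTF-16 surrogate, or a value above U+10FFFF).
 */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                          /* ASCII: one byte, top bit clear */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)         /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                 /* beyond the Unicode range */
}
```

    Every byte of a multi-byte sequence has its top bit set, which is why
    such bytes can never be mistaken for ASCII: 'A' (U+0041) stays the
    single byte 0x41, while Greek alpha (U+03B1) becomes the two bytes
    0xCE 0xB1.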

    The web page above has descriptions of the ISO 8859 variants, and also
    points to articles and descriptions of Unicode, UTF-8 and other matters.

    This is relevant to C in the support for 'wide' characters and multibyte
    characters, and the functions which transform and output them.
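
    As a rough sketch of those functions, the wrapper below uses the
    standard wcstombs() to turn a wide string into a multibyte string in
    the current locale's encoding. The wrapper name wide_to_multibyte and
    its error convention are assumptions made for this example.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Converts the wide string `src` to a multibyte string in `dst`, using the
 * encoding of the current locale. Returns the number of bytes written (not
 * counting the terminating '\0'), or (size_t)-1 if a character cannot be
 * represented in that encoding or the result does not fit in `dstsize`.
 */
size_t wide_to_multibyte(const wchar_t *src, char *dst, size_t dstsize)
{
    size_t n = wcstombs(dst, src, dstsize);

    if (n == (size_t)-1)
        return n;               /* unrepresentable character */
    if (n >= dstsize)
        return (size_t)-1;      /* buffer filled: no room for the terminator */
    return n;
}
```

    Plain ASCII text converts in any locale. For something like L"\u03b1"
    the outcome depends on the locale, so a program would normally call
    setlocale(LC_ALL, "") first (from <locale.h>) to pick up the user's
    encoding. In the default "C" locale the conversion may simply fail.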

    Chris C
    Chris Croughton, Jul 21, 2005
    #4
  5. On Thu, 21 Jul 2005 09:32:50 -0700, "osmium" wrote:
    <snip>
    > The phrase "extended ASCII" has come to mean that the new character set
    > contains ASCII as a subset. There are probably hundreds of these. ISTM
    > there should have been a better way to express that thought, but it doesn't
    > leap out at me. Related words that might help you pursue this subject in
    > google: font, code page.
    >

    Right.

    > There is now, and always has been only one ASCII and it contains 128
    > characters, basically the American version of the latin alphabet, plus
    > digits and punctuation and control characters. There is no established
    > graphic to identify the control characters.
    >

    There is only one ASCII now, but it has changed significantly at least
    once, when lowercase and the other 6/x and 7/x characters were added,
    IIRC about 1968.
    And to be pedantic it went through periods of being designated USASCII
    and ANSCII as the name of the organization changed, but this did not
    imply any substantive change. The American alphabet is the (modern)
    English alphabet, at least for America = US plus most of CA; there are
    other American countries (primarily) using other languages.

    There _is_ a standard for graphical representations of control
    characters, albeit at least mostly just two-letter mnemonics jammed
    together, not "graphical" in the common sense of pictorial or iconic:
    ISO 2047, IIRC based on and superseding an X3.n standard much as ISO 646
    relates to ASCII. It certainly hasn't been widely used or even known. I
    have seen what I believe(d) were displays obeying it on various
    datascopes, and a few (real) terminals back-in-the-day in "show
    controls" mode.

    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Aug 1, 2005
    #5