Re: C++ way to convert ASCII digits to Integer?

Discussion in 'C++' started by andreas.koestler@googlemail.com, May 27, 2009.

  1. Guest

    On May 27, 9:18 am, "Peter Olcott" <> wrote:
    > I remember that there is a clean C++ way to do this, but, I
    > forgot what it was.


    I don't know what you mean by 'clean C++ way' but one way to do it is:
    int ascii_digit_to_int ( const char asciidigit ) {
        if ( asciidigit < '0' || asciidigit > '9' ) {
            throw NotADigitException();
        }
        return (int) asciidigit - 48; // 48 => '0'
    }
    Or you can use atoi or similar.
    Or you use the std::stringstream:

    std::stringstream sstr("3");
    int value;
    sstr >> value;
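
    If you need to detect malformed input, check the stream state; a
    minimal sketch (parse_int is just an illustrative name,
    NotADigitException as above):

    #include <sstream>
    #include <string>

    int parse_int( const std::string& text ) {
        std::stringstream sstr( text );
        int value;
        if ( !(sstr >> value) ) {
            throw NotADigitException();
        }
        return value;
    }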

    Hope that helps
    Andreas
     
    , May 27, 2009
    #1

  2. wrote:

    > On May 27, 9:18 am, "Peter Olcott" <> wrote:
    >> I remember that there is a clean C++ way to do this, but, I
    >> forgot what it was.

    >
    > I don't know what you mean by 'clean C++ way' but one way to do it is:
    > int ascii_digit_to_int ( const char asciidigit ) {
    >     if ( asciidigit < '0' || asciidigit > '9' ) {
    >         throw NotADigitException();
    >     }
    >     return (int) asciidigit - 48; // 48 => '0'


    To make the function usable also on non-ASCII implementations, and to
    remove the need for a comment, you should write that last line as:
    return asciidigit - '0';

    > }
    > Or you can use atoi or similar.


    Better use strtol rather than atoi. It provides much better behaviour in
    error situations.
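
    For example, a minimal sketch of what that error checking can look
    like (to_int is just an illustrative name):

    #include <cerrno>
    #include <cstdlib>
    #include <stdexcept>

    long to_int( const char* text ) {
        char* end = 0;
        errno = 0;
        long value = std::strtol( text, &end, 10 );
        // Reject "no digits at all", trailing junk, and out-of-range values.
        if ( end == text || *end != '\0' || errno == ERANGE ) {
            throw std::runtime_error( "not a valid number" );
        }
        return value;
    }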

    > Or you use the std::stringstream:
    >
    > std::stringstream sstr("3");
    > int value;
    > sstr >> value;


    Or you use boost::lexical_cast<> (which uses stringstreams internally,
    but with proper error handling).
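
    For instance (a sketch, assuming Boost is available; to_int_or_zero
    is just an illustrative name):

    #include <boost/lexical_cast.hpp>
    #include <string>

    int to_int_or_zero( const std::string& text ) {
        try {
            return boost::lexical_cast<int>( text );
        }
        catch ( const boost::bad_lexical_cast& ) {
            return 0; // "3x", "abc" and "" all end up here
        }
    }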

    >
    > Hope that helps
    > Andreas


    Bart v Ingen Schenau
    --
    a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
    c.l.c FAQ: http://c-faq.com/
    c.l.c++ FAQ: http://www.parashift.com/c -faq-lite/
     
    Bart van Ingen Schenau, May 27, 2009
    #2

  3. Guest

    > Not if the machine doesn't use ASCII; only a function like yours above
    > is fully portable. But perhaps the original poster wanted a function
    > that would convert from the host's native textual representation to an
    > integer, in which case the above function would not be a good idea, and
    > atoi or stringstream would.

    Blargg, please explain... :)
     
    , May 27, 2009
    #3
  4. "" <> writes:

    > From: blargg <>
    > andreas.koestler wrote:
    > > On May 27, 9:18 am, "Peter Olcott" <> wrote:
    > > > I remember that there is a clean C++ way to do this [convert
    > > > ASCII digits to Integer], but, I forgot what it was.

    > >
    > > I don't know what you mean by 'clean C++ way' but one way to do it is:
    > >
    > > int ascii_digit_to_int ( const char asciidigit ) {
    > >     if ( asciidigit < '0' || asciidigit > '9' ) {
    > >         throw NotADigitException();
    > >     }
    > >     return (int) asciidigit - 48; // 48 => '0'
    > > }
    > >
    > > Or you can use atoi or similar.
    > > Or you use the std::stringstream:
    >> Not if the machine doesn't use ASCII; only a function like yours above
    >> is fully portable. But perhaps the original poster wanted a function
    >> that would convert from the host's native textual representation to an
    >> integer, in which case the above function would not be a good idea, and
    >> atoi or stringstream would.

    > Blargg, please explain... :)


    Actually the code quoted above is wrong. On a machine using EBCDIC, '0' is 240, not 48.

    #include <iso646.h>

    struct ASCII {
    enum Code {
    NUL = 0, SOH, STX, ETX, EOT, ENQ, ACK, BELL, BACKSPACE, TAB,
    NEWLINE, VT, PAGE, RETURN, SO, SI, DLE, DC1, DC2, DC3, DC4, NAK,
    SYN, ETB, CAN, EM, SUB, ESCAPE, FS, GS, RS, US, SPACE,
    EXCLAMATION_MARK, QUOTATION_MARK, NUMBER_SIGN, DOLLAR_SIGN,
    PERCENT_SIGN, AMPERSAND, APOSTROPHE, LEFT_PARENTHESIS,
    RIGHT_PARENTHESIS, ASTERISK, PLUS_SIGN, COMMA, HYPHEN_MINUS,
    FULL_STOP, SOLIDUS, DIGIT_ZERO, DIGIT_ONE, DIGIT_TWO,
    DIGIT_THREE, DIGIT_FOUR, DIGIT_FIVE, DIGIT_SIX, DIGIT_SEVEN,
    DIGIT_EIGHT, DIGIT_NINE, COLON, SEMICOLON, LESS_THAN_SIGN,
    EQUALS_SIGN, GREATER_THAN_SIGN, QUESTION_MARK, COMMERCIAL_AT,
    LATIN_CAPITAL_LETTER_A, LATIN_CAPITAL_LETTER_B,
    LATIN_CAPITAL_LETTER_C, LATIN_CAPITAL_LETTER_D,
    LATIN_CAPITAL_LETTER_E, LATIN_CAPITAL_LETTER_F,
    LATIN_CAPITAL_LETTER_G, LATIN_CAPITAL_LETTER_H,
    LATIN_CAPITAL_LETTER_I, LATIN_CAPITAL_LETTER_J,
    LATIN_CAPITAL_LETTER_K, LATIN_CAPITAL_LETTER_L,
    LATIN_CAPITAL_LETTER_M, LATIN_CAPITAL_LETTER_N,
    LATIN_CAPITAL_LETTER_O, LATIN_CAPITAL_LETTER_P,
    LATIN_CAPITAL_LETTER_Q, LATIN_CAPITAL_LETTER_R,
    LATIN_CAPITAL_LETTER_S, LATIN_CAPITAL_LETTER_T,
    LATIN_CAPITAL_LETTER_U, LATIN_CAPITAL_LETTER_V,
    LATIN_CAPITAL_LETTER_W, LATIN_CAPITAL_LETTER_X,
    LATIN_CAPITAL_LETTER_Y, LATIN_CAPITAL_LETTER_Z,
    LEFT_SQUARE_BRACKET, REVERSE_SOLIDUS, RIGHT_SQUARE_BRACKET,
    CIRCUMFLEX_ACCENT, LOW_LINE, GRAVE_ACCENT, LATIN_SMALL_LETTER_A,
    LATIN_SMALL_LETTER_B, LATIN_SMALL_LETTER_C, LATIN_SMALL_LETTER_D,
    LATIN_SMALL_LETTER_E, LATIN_SMALL_LETTER_F, LATIN_SMALL_LETTER_G,
    LATIN_SMALL_LETTER_H, LATIN_SMALL_LETTER_I, LATIN_SMALL_LETTER_J,
    LATIN_SMALL_LETTER_K, LATIN_SMALL_LETTER_L, LATIN_SMALL_LETTER_M,
    LATIN_SMALL_LETTER_N, LATIN_SMALL_LETTER_O, LATIN_SMALL_LETTER_P,
    LATIN_SMALL_LETTER_Q, LATIN_SMALL_LETTER_R, LATIN_SMALL_LETTER_S,
    LATIN_SMALL_LETTER_T, LATIN_SMALL_LETTER_U, LATIN_SMALL_LETTER_V,
    LATIN_SMALL_LETTER_W, LATIN_SMALL_LETTER_X, LATIN_SMALL_LETTER_Y,
    LATIN_SMALL_LETTER_Z, LEFT_CURLY_BRACKET, VERTICAL_LINE,
    RIGHT_CURLY_BRACKET, TILDE, RUBOUT
    };
    };

    int ascii_digit_to_int ( const char asciidigit ) {
        if ( (asciidigit < ASCII::DIGIT_ZERO) or (ASCII::DIGIT_NINE < asciidigit) ) {
            throw NotADigitException();
        } else {
            return (int)asciidigit - ASCII::DIGIT_ZERO;
        }
    }

    --
    __Pascal Bourguignon__
     
    Pascal J. Bourguignon, May 27, 2009
    #4
  5. James Kanze Guest

    On May 27, 3:30 am, ""
    <> wrote:
    > On May 27, 9:18 am, "Peter Olcott" <> wrote:


    > > I remember that there is a clean C++ way to do this, but, I
    > > forgot what it was.


    > I don't know what you mean by 'clean C++ way' but one way to
    > do it is:
    > int ascii_digit_to_int ( const char asciidigit ) {
    >     if ( asciidigit < '0' || asciidigit > '9' ) {
    >         throw NotADigitException();
    >     }
    >     return (int) asciidigit - 48; // 48 => '0'


    That's wrong. There's no guarantee that '0' is 48. I've worked
    on machines where it is 240. (Of course, the term asciidigit is
    very misleading on such machines, because you're really dealing
    with an ebcdicdigit.)

    You are guaranteed that the decimal digits are consecutive, so
    digit - '0' works. Of course, as soon as you do that, someone
    will ask for support for hexadecimal. The simplest solution is
    just to create a table, correctly initialize it, and then:

    return table[ digit ] < 0
        ? throw NotADigitException()
        : table[ digit ];
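
    A self-contained sketch of that approach (the names are
    illustrative; NotADigitException as in the original post):

    #include <cctype>
    #include <climits>

    int table[ UCHAR_MAX + 1 ];

    void init_table()
    {
        for ( int i = 0; i <= UCHAR_MAX; ++ i ) {
            table[ i ] = -1;
        }
        char const digits[] = "0123456789abcdef";
        for ( int i = 0; digits[ i ] != '\0'; ++ i ) {
            table[ (unsigned char)digits[ i ] ] = i;
            table[ std::toupper( (unsigned char)digits[ i ] ) ] = i;
        }
    }

    int digit_to_int( char digit )
    {
        return table[ (unsigned char)digit ] < 0
            ? throw NotADigitException()
            : table[ (unsigned char)digit ];
    }
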
    > }


    > Or you can use atoi or similar.
    > Or you use the std::stringstream:


    > std::stringstream sstr("3");
    > int value;
    > sstr >> value;


    That's the normal way of converting a stream of digits into a
    number in internal format.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 27, 2009
    #5
  6. On May 27, 12:04, James Kanze <> wrote:
    > On May 27, 3:30 am, ""
    >
    > <> wrote:
    > > On May 27, 9:18 am, "Peter Olcott" <> wrote:
    > > > I remember that there is a clean C++ way to do this, but, I
    > > > forgot what it was.

    [snip]
    > > Or you can use atoi or similar.


    atoi cannot report errors and should be avoided if the format has not
    been previously validated.

    > > Or you use the std::stringstream:
    > > std::stringstream sstr("3");
    > > int value;
    > > sstr >> value;

    >
    > That's the normal way of converting a stream of digits into a
    > number in internal format.


    An equivalent, if less digestible, way to do it is to use the get()
    member of the num_get facet directly:

    #include <iterator>
    #include <locale>
    #include <sstream>

    using namespace std;

    istringstream is("3");
    ios_base::iostate state = ios_base::goodbit;
    long value = 0; // num_get has overloads for long, but not for plain int

    // Only the istreambuf_iterator instantiation of num_get is
    // guaranteed to be installed in a locale.
    use_facet<num_get<char> >(is.getloc()).get(
        istreambuf_iterator<char>(is), istreambuf_iterator<char>(),
        is, state, value);

    You can modify the formatting of the number by setting fmtflags in
    'is'.

    You can wrap it in something like Boost's lexical_cast<>.

    --
    Michael
     
    Michael Doubez, May 27, 2009
    #6
  7. Default User Guest

    blargg wrote:

    > blargg wrote:
    > > andreas.koestler wrote:
    > > > On May 27, 9:18 am, "Peter Olcott" <> wrote:
    > > > > I remember that there is a clean C++ way to do this [convert
    > > > > ASCII digits to Integer], but, I forgot what it was.
    > > >
    > > > I don't know what you mean by 'clean C++ way' but one way to do
    > > > it is:
    > > >
    > > > int ascii_digit_to_int ( const char asciidigit ) {
    > > >     if ( asciidigit < '0' || asciidigit > '9' ) {
    > > >         throw NotADigitException();
    > > >     }
    > > >     return (int) asciidigit - 48; // 48 => '0'
    > > > }

    > [...]
    > > Not if the machine doesn't use ASCII; only a function like yours
    > > above is fully portable.

    >
    > Whoops, that's wrong too, as the above function uses '0' and '9',
    > which won't be ASCII on a non-ASCII machine. So the above should
    > really use 48 and 57 in place of those character constants, to live
    > up to its name. Otherwise, on a machine using ASCII, it'll work, but
    > on another, it'll be broken and neither convert from ASCII nor the
    > machine's native character set!


    The requirements for the decimal digits in the character set specify
    that their values be consecutive and increasing.

    So digit - '0' will always give you the numeric value of the numeral
    the character represents, regardless of whether it is ASCII or not. The
    same is not true for digit - 48.

    The original problem specified conversion from ASCII, but that's not
    likely what the OP really wanted. If it were, a preliminary step to
    convert from the native encoding to ASCII could be performed first.



    Brian
     
    Default User, May 27, 2009
    #7
  8. Default User Guest

    Pete Becker wrote:

    > Default User wrote:
    > >
    > > The original problem specified conversion from ASCII, but that's not
    > > likely what the OP really wanted.

    >
    > If you write code that you think your boss wants instead of the code
    > your boss said he wants you won't last long in your job. If you think
    > the specification is wrong, ask the person who is responsible for it.


    That really depends on circumstances. For the most part in my career as
    a software engineer, supervisors have not created specifications to
    that level.




    Brian
     
    Default User, May 28, 2009
    #8
  9. Default User Guest

    Pete Becker wrote:

    > Default User wrote:
    > > Pete Becker wrote:


    > > That really depends on circumstances. For the most part in my
    > > career as a software engineer, supervisors have not created
    > > specifications to that level.

    >
    > If you ignore a specification because you think it's wrong you're
    > simply wrong. If you're winging it without specifications you have a
    > completely different set of issues.



    I'm not following. My supervisors have not typically set requirements
    at that level. That is to say, the broad strokes of the task are set,
    and the engineers will define and implement the lower-level
    requirements.




    Brian
     
    Default User, May 28, 2009
    #9
  10. James Kanze Guest

    On May 28, 1:59 pm, Pete Becker <> wrote:
    > Default User wrote:


    > > The original problem specified conversion from ASCII, but
    > > that's not likely what the OP really wanted.


    > If you write code that you think your boss wants instead of
    > the code your boss said he wants you won't last long in your
    > job.


    In most places I've worked, writing what the boss wants, rather
    than what he says, is good for your career. Of course...

    > If you think the specification is wrong, ask the person
    > who is responsible for it.


    If you think that what he is actually asking for is not what he
    wants, you are better off asking, just to be sure.

    In the end, it depends on the boss.

    And in this case, Default User is probably right: far too many
    newbies use "ascii" to mean text. (Given that ASCII is for all
    intents and purposes dead, it's highly unlikely that they really
    want ASCII.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 29, 2009
    #10
  11. James Kanze wrote:

    > (Given that ASCII is for all intents and purposes dead, it's highly
    > unlikely that they really want ASCII.)


    I'm not sure, but I think in the USA there is quite a number of
    programmers who don't think beyond ASCII when thinking of text
    manipulation.

    Gerhard
     
    Gerhard Fiedler, May 29, 2009
    #11
  12. Joe Greer Guest

    Gerhard Fiedler <> wrote in news:2ijwirmpswzq$:

    > James Kanze wrote:
    >
    >> (Given that ASCII is for all intents and purposes dead, it's highly
    >> unlikely that they really want ASCII.)

    >
    > I'm not sure, but I think in the USA there is quite a number of
    > programmers who don't think beyond ASCII when thinking of text
    > manipulation.
    >
    > Gerhard
    >


    I believe you are correct for shops developing in house software or
    possibly just using text for debugging/logging (that is, internal use
    only). However, the last couple of places I worked were certainly at least
    trying to develop for an international market. :)

    joe
     
    Joe Greer, May 29, 2009
    #12
  13. James Kanze Guest

    On May 29, 3:08 pm, Gerhard Fiedler <> wrote:
    > James Kanze wrote:
    > > (Given that ASCII is for all intents and purposes dead, it's
    > > highly unlikely that they really want ASCII.)


    > I'm not sure, but I think in the USA there is quite a number
    > of programmers who don't think beyond ASCII when thinking of
    > text manipulation.


    In just about every country, there are quite a number of
    programmers who don't think:). The fact remains that the
    default encoding used by the system, even when configured for
    the US, is not ASCII. Even if you're not "thinking" beyond
    ASCII, your program must be capable of reading non-ASCII
    characters (if only to recognize them and signal the error).

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 31, 2009
    #13
  14. osmium Guest

    James Kanze wrote:

    > On May 29, 3:08 pm, Gerhard Fiedler <> wrote:
    >> James Kanze wrote:
    >>> (Given that ASCII is for all intents and purposes dead, it's
    >>> highly unlikely that they really want ASCII.)

    >
    >> I'm not sure, but I think in the USA there is quite a number
    >> of programmers who don't think beyond ASCII when thinking of
    >> text manipulation.

    >
    > In just about every country, there are quite a number of
    > programmers who don't think:). The fact remains that the
    > default encoding used by the system, even when configured for
    > the US, is not ASCII. Even if you're not "thinking" beyond
    > ASCII, your program must be capable of reading non-ASCII
    > characters (if only to recognize them and signal the error).


    Is it your point that an ASCII compliant environment would have to signal an
    error if the topmost bit in a byte was something other than 0? Or do you
    have something else in mind? I don't have the *actual* ASCII standard
    available but I would be surprised if that was expressed as a *requirement*.
    After all, the people that wrote the standard were well aware that there was
    no such thing as a seven-bit machine.
     
    osmium, May 31, 2009
    #14
  15. * osmium:
    > James Kanze wrote:
    >
    >> On May 29, 3:08 pm, Gerhard Fiedler <> wrote:
    >>> James Kanze wrote:
    >>>> (Given that ASCII is for all intents and purposes dead, it's
    >>>> highly unlikely that they really want ASCII.)
    >>> I'm not sure, but I think in the USA there is quite a number
    >>> of programmers who don't think beyond ASCII when thinking of
    >>> text manipulation.

    >> In just about every country, there are quite a number of
    >> programmers who don't think:). The fact remains that the
    >> default encoding used by the system, even when configured for
    >> the US, is not ASCII. Even if you're not "thinking" beyond
    >> ASCII, your program must be capable of reading non-ASCII
    >> characters (if only to recognize them and signal the error).

    >
    > Is it your point that an ASCII compliant environment would have to signal an
    > error if the topmost bit in a byte was something other than 0?


    I think James is perhaps referring to routines like isdigit family.

    Some of them take int argument and have UB if the argument value is outside
    0...(unsigned char)(-1).

    So with most implementations you get UB if you simply pass a char directly as
    argument and that char is beyond ASCII range, because then it will be negative.
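
    The usual workaround is to cast before calling, e.g. for the C
    library's isdigit (a minimal sketch; safe_isdigit is just an
    illustrative name):

    #include <cctype>

    inline bool safe_isdigit( char c ) {
        // Map a possibly negative char into unsigned char range first.
        return std::isdigit( static_cast<unsigned char>( c ) ) != 0;
    }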


    > Or do you
    > have something else in mind? I don't have the *actual* ASCII standard
    > available but I would be surprised if that was expressed as a *requirement*.


    See above.


    > After all, the people that wrote the standard were well aware that there was
    > no such thing as a seven-bit machine.


    On the contrary, the seven bit nature of ASCII was to facilitate communication
    over e.g. serial links with software parity check, where each byte was
    effectively seven bits (since one bit was used for parity).


    Cheers & hth.,

    - Alf

    --
    Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
    No ads, and there is some C++ stuff! :) Just going there is good. Linking
    to it is even better! Thanks in advance!
     
    Alf P. Steinbach, May 31, 2009
    #15
  16. * Alf P. Steinbach:
    > * osmium:
    >> James Kanze wrote:
    >>
    >>> On May 29, 3:08 pm, Gerhard Fiedler <> wrote:
    >>>> James Kanze wrote:
    >>>>> (Given that ASCII is for all intents and purposes dead, it's
    >>>>> highly unlikely that they really want ASCII.)
    >>>> I'm not sure, but I think in the USA there is quite a number
    >>>> of programmers who don't think beyond ASCII when thinking of
    >>>> text manipulation.
    >>> In just about every country, there are quite a number of
    >>> programmers who don't think:). The fact remains that the
    >>> default encoding used by the system, even when configured for
    >>> the US, is not ASCII. Even if you're not "thinking" beyond
    >>> ASCII, your program must be capable of reading non-ASCII
    >>> characters (if only to recognize them and signal the error).

    >>
    >> Is it your point that an ASCII compliant environment would have to
    >> signal an error if the topmost bit in a byte was something other than 0?

    >
    > I think James is perhaps referring to routines like isdigit family.
    >
    > Some of them take int argument and have UB if the argument value is
    > outside 0...(unsigned char)(-1).
    >
    > So with most implementations you get UB if you simply pass a char
    > directly as argument and that char is beyond ASCII range, because then
    > it will be negative.
    >
    >
    >> Or do you have something else in mind? I don't have the *actual*
    >> ASCII standard available but I would be surprised if that was
    >> expressed as a *requirement*.

    >
    > See above.
    >
    >
    >> After all, the people that wrote the standard were well aware that
    >> there was no such thing as a seven-bit machine.

    >
    > On the contrary, the seven bit nature of ASCII was to facilitate
    > communication over e.g. serial links with software parity check, where
    > each byte was effectively seven bits (since one bit was used for parity).


    Forgot to add, one of the early PDPs had, as I recall, configurable byte size... :)

    --
    Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
    No ads, and there is some C++ stuff! :) Just going there is good. Linking
    to it is even better! Thanks in advance!
     
    Alf P. Steinbach, May 31, 2009
    #16
  17. James Kanze Guest

    On May 31, 4:22 pm, "osmium" <> wrote:
    > James Kanze wrote:
    > > On May 29, 3:08 pm, Gerhard Fiedler <> wrote:
    > >> James Kanze wrote:
    > >>> (Given that ASCII is for all intents and purposes dead, it's
    > >>> highly unlikely that they really want ASCII.)


    > >> I'm not sure, but I think in the USA there is quite a number
    > >> of programmers who don't think beyond ASCII when thinking of
    > >> text manipulation.


    > > In just about every country, there are quite a number of
    > > programmers who don't think:). The fact remains that the
    > > default encoding used by the system, even when configured for
    > > the US, is not ASCII. Even if you're not "thinking" beyond
    > > ASCII, your program must be capable of reading non-ASCII
    > > characters (if only to recognize them and signal the error).


    > Is it your point that an ASCII compliant environment would
    > have to signal an error if the topmost bit in a byte was
    > something other than 0?


    My point is that the actual bytes you'll be reading may contain
    non-ASCII characters, whether you like it or not, and that your
    program has to handle them in order to be correct. (Of course,
    lots of programs limit their input. None of my programs which
    deal with text, for example, allow control characters like STX
    or DC1; such a character in the input will trigger an error. As
    will an illegal UTF-8 sequence, if the program is inputting
    UTF-8.)
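
    (As an illustration only, a sketch of the control-character part of
    such an input filter; UTF-8 validation is more involved:)

    #include <stdexcept>
    #include <string>

    void check_no_control_chars( const std::string& input ) {
        for ( std::string::size_type i = 0; i < input.size(); ++ i ) {
            unsigned char c = static_cast<unsigned char>( input[ i ] );
            if ( c < 0x20 && c != '\n' && c != '\t' && c != '\r' ) {
                throw std::runtime_error( "control character in input" );
            }
        }
    }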

    > Or do you have something else in mind? I don't have the
    > *actual* ASCII standard available but I would be surprised if
    > that was expressed as a *requirement*. After all, the people
    > that wrote the standard were well aware that there was no such
    > thing as a seven-bit machine.


    ASCII defined code points in the range 0-127. Any other value
    is not ASCII. (And the usual arrangement on a PDP-10 was 5
    seven bit bytes in a 36 bit word, with one bit left over.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 1, 2009
    #17
  18. James Kanze Guest

    On May 31, 4:52 pm, "Alf P. Steinbach" <> wrote:
    > * osmium:
    > > James Kanze wrote:


    > >> On May 29, 3:08 pm, Gerhard Fiedler <> wrote:
    > >>> James Kanze wrote:
    > >>>> (Given that ASCII is for all intents and purposes dead, it's
    > >>>> highly unlikely that they really want ASCII.)
    > >>> I'm not sure, but I think in the USA there is quite a number
    > >>> of programmers who don't think beyond ASCII when thinking of
    > >>> text manipulation.
    > >> In just about every country, there are quite a number of
    > >> programmers who don't think:). The fact remains that the
    > >> default encoding used by the system, even when configured for
    > >> the US, is not ASCII. Even if you're not "thinking" beyond
    > >> ASCII, your program must be capable of reading non-ASCII
    > >> characters (if only to recognize them and signal the error).


    > > Is it your point that an ASCII compliant environment would
    > > have to signal an error if the topmost bit in a byte was
    > > something other than 0?


    > I think James is perhaps referring to routines like isdigit
    > family.


    > Some of them take int argument and have UB if the argument
    > value is outside 0...(unsigned char)(-1).


    s/Some/All/

    The standard says 0...UCHAR_MAX or EOF. But UCHAR_MAX and
    (unsigned char)(-1) are, of course, guaranteed to be equal. And
    EOF is guaranteed to be negative, so there can never be any
    ambiguity between one of the characters and EOF.

    > So with most implementations you get UB if you simply pass a
    > char directly as argument and that char is beyond ASCII range,
    > because then it will be negative.


    That is, of course, something that you always have to consider.
    The "official" answer, in C++, is to use the corresponding
    functions in <locale>. Which have been carefully designed to be
    even more verbose than the C function with the cast, and to run
    several orders of magnitude slower. (But other than that,
    they're fine.)
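
    For reference, the <locale> spelling is along these lines (a sketch;
    the default-constructed locale stands in for whichever locale
    actually applies):

    #include <locale>

    bool is_digit( char c )
    {
        return std::isdigit( c, std::locale() );
    }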

    > > After all, the people that wrote the standard were well
    > > aware that there was no such thing as a seven-bit machine.


    > On the contrary, the seven bit nature of ASCII was to
    > facilitate communication over e.g. serial links with software
    > parity check, where each byte was effectively seven bits
    > (since one bit was used for parity).


    I'm not sure what the original rationale was. One mustn't
    forget that at the time, six-bit codes were quite common.
    Moving to seven bits probably seemed to be the minimum solution
    to support both upper case and lower. And that given the
    transmission speeds at the time (110 baud for a teletype), every
    bit gained helped. But the fact that you could put the
    character with parity into an octet was probably a consideration
    as well.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 1, 2009
    #18
  19. James Kanze Guest

    On May 31, 4:54 pm, "Alf P. Steinbach" <> wrote:
    > * Alf P. Steinbach:


    > Forgot to add, one of the early PDPs had, as I recall,
    > configurable byte size... :)


    Programmable. The PDP-10 was word addressed, with special
    instructions to access bytes. The "byte address" used some of
    the high order bits to specify the bit offset of the byte in the
    word, and the number of bits in the byte. Incrementing a "byte
    pointer" added the byte size to the bit offset and, if the result
    exceeded 36 minus the byte size, incremented the base address and
    reset the bit offset to 0.

    The fact that the bit offset was in the high bits led to another
    interesting effect:
    assert( (unsigned)( p + 1 ) > (unsigned)( p ) ) ;
    would often fail if p was a char*.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 1, 2009
    #19
  20. * James Kanze:
    > On May 31, 4:52 pm, "Alf P. Steinbach" <> wrote:
    >> * osmium:
    >>> James Kanze wrote:

    >
    >>>> On May 29, 3:08 pm, Gerhard Fiedler <> wrote:
    >>>>> James Kanze wrote:
    >>>>>> (Given that ASCII is for all intents and purposes dead, it's
    >>>>>> highly unlikely that they really want ASCII.)
    >>>>> I'm not sure, but I think in the USA there is quite a number
    >>>>> of programmers who don't think beyond ASCII when thinking of
    >>>>> text manipulation.
    >>>> In just about every country, there are quite a number of
    >>>> programmers who don't think:). The fact remains that the
    >>>> default encoding used by the system, even when configured for
    >>>> the US, is not ASCII. Even if you're not "thinking" beyond
    >>>> ASCII, your program must be capable of reading non-ASCII
    >>>> characters (if only to recognize them and signal the error).

    >
    >>> Is it your point that an ASCII compliant environment would
    >>> have to signal an error if the topmost bit in a byte was
    >>> something other than 0?

    >
    >> I think James is perhaps referring to routines like isdigit
    >> family.

    >
    >> Some of them take int argument and have UB if the argument
    >> value is outside 0...(unsigned char)(-1).

    >
    > s/Some/All/
    >
    > The standard says 0...UCHAR_MAX or EOF. But UCHAR_MAX and
    > (unsigned char)(-1) are, of course, guaranteed to be equal. And
    > EOF is guaranteed to be negative, so there can never be any
    > ambiguity between one of the characters and EOF.
    >
    >> So with most implementations you get UB if you simply pass a
    >> char directly as argument and that char is beyond ASCII range,
    >> because then it will be negative.

    >
    > That is, of course, something that you always have to consider.
    > The "official" answer, in C++, is to use the corresponding
    > functions in <locale>. Which have been carefully designed to be
    > even more verbose than the C function with the cast, and to run
    > several orders of magnitude slower. (But other than that,
    > they're fine.)


    It's the first time I can say we're completely on the same wavelength. He he. :)


    >>> After all, the people that wrote the standard were well
    >>> aware that there was no such thing as a seven-bit machine.

    >
    >> On the contrary, the seven bit nature of ASCII was to
    >> facilitate communication over e.g. serial links with software
    >> parity check, where each byte was effectively seven bits
    >> (since one bit was used for parity).

    >
    > I'm not sure what the original rationale was. One mustn't
    > forget that at the time, six-bit codes were quite common.
    > Moving to seven bits probably seemed to be the minimum solution
    > to support both upper case and lower. And that given the
    > transmission speeds at the time (110 baud for a teletype), every
    > bit gained helped. But the fact that you could put the
    > character with parity into an octet was probably a consideration
    > as well.


    Yeah, I think perhaps that was 20-20 hindsight rationalization. Now that I try
    to shake and rattle old memory cubes a little, they spit out some vague
    recollection of using 7-bit serial comms without any parity.


    Cheers,

    - Alf

    --
    Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
    No ads, and there is some C++ stuff! :) Just going there is good. Linking
    to it is even better! Thanks in advance!
     
    Alf P. Steinbach, Jun 1, 2009
    #20
