|| putchar(ch == '\177' ? '?' : ch | 0100) == EOF)

Discussion in 'C Programming' started by c gordon liddy, Mar 28, 2008.

  1. 2 different cats.

    I've been going through chp 8 of K&R and wanted to write a standard cat
    function with a little more functionality than existing solns: I want to
    code behavior for the -v switch. It occurs to me that there "should" be
    source out there for this and googled for "cat.c unix source" . The second
    hit I got was this:

    /*
    * Concatenate files.
    */

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    char stdbuf[BUFSIZ];

    main(argc, argv)
    char **argv;
    {
    int fflg = 0;
    register FILE *fi;
    register c;
    int dev, ino = -1;
    struct stat statb;

    setbuf(stdout, stdbuf);
    for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
    switch(argv[1][1]) {

    Holy smokes! This must count as archeology for unix systems. Funky-looking
    main call and register as a type as opposed to storage specifier. This gave
    me a pretty good idea what I wasn't looking for.

    I then hit on:
    http://www.openbsd.org/cgi-bin/cvsweb/src/bin/cat/cat.c?rev=1.14&content-type=text/plain

    The first thing a person notices is the stack of non-standard headers.
    Their inclusion is the usual reason for lack of topicality of unix
    questions. My platform and my target consist of my non-unix machine; not
    only do I not know what's in those headers, I don't have them.

    Past that is the main control:
    while ((ch = getopt(argc, argv, "benstuv")) != -1)
        switch (ch) {
        case 'b':
            bflag = nflag = 1; /* -b implies -n */
            break;
        case 'e':
            eflag = vflag = 1; /* -e implies -v */
            break;
        case 'n':
            nflag = 1;
            break;
        case 's':
            sflag = 1;
            break;
        case 't':
            tflag = vflag = 1; /* -t implies -v */
            break;
        case 'u':
            setbuf(stdout, NULL);
            break;
        case 'v':
            vflag = 1;
            break;
        default:
            (void)fprintf(stderr,
                "usage: %s [-benstuv] [-] [file ...]\n", __progname);
            exit(1);
            /* NOTREACHED */
        }
    argv += optind;

    The only case I'm to consider is 'v', so I won't need all of this. getopt
    will be something that I have to code from scratch. Out of curiosity, what
    header is it defined in?

    Moving along is:
    if (bflag || eflag || nflag || sflag || tflag || vflag)
    cook_args(argv);
    , so if any flag gets set we cook the args. Maybe instead, we cook with the
    args. In this process we traverse through:

    } else if (vflag) {
        if (!isascii(ch)) {
            if (putchar('M') == EOF || putchar('-') == EOF)
                break;
            ch = toascii(ch);
        }
        if (iscntrl(ch)) {
            if (putchar('^') == EOF ||
                putchar(ch == '\177' ? '?' :
                        ch | 0100) == EOF)
                break;
            continue;
        }
    I did my best to get this on the screen. The parts I don't understand here
    follow the double pipe, which I read as "inclusive or." In the first if
    clause, it would appear that 'M' is substituted for non-ascii chars. What
    does
    || putchar('-') == EOF)
    do beyond this?

    Similarly, I'm out of my depth with what follows the double pipe in the
    second if clause.
    || putchar(ch == '\177' ? '?' : ch | 0100) == EOF)

    Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
    that it's beyond the scope of my present endeavor and can be omitted.

    Grateful for your thoughtful comment.
    --
    C. Gordon Liddy

    "Virile, vigorous, potent"
    c gordon liddy, Mar 28, 2008
    #1

  2. "c gordon liddy" <> writes:
    > 2 different cats.
    >
    > I've been going through chp 8 of K&R and wanted to write a standard cat
    > function with a little more functionality than existing solns: I want to
    > code behavior for the -v switch. It occurs to me that there "should" be
    > source out there for this and googled for "cat.c unix source" . The second
    > hit I got was this:
    >
    > /*
    > * Concatenate files.
    > */
    >
    > #include <stdio.h>
    > #include <sys/types.h>
    > #include <sys/stat.h>
    >
    > char stdbuf[BUFSIZ];
    >
    > main(argc, argv)
    > char **argv;
    > {
    > int fflg = 0;
    > register FILE *fi;
    > register c;
    > int dev, ino = -1;
    > struct stat statb;
    >
    > setbuf(stdout, stdbuf);
    > for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
    > switch(argv[1][1]) {
    >
    > Holy smokes! This must count as archeology for unix systems. Funky-looking
    > main call and register as a type as opposed to storage specifier. This gave
    > me a pretty good idea what I wasn't looking for.


    Yes, that code is archaic, but it's of some historical interest (to
    show how much the language has improved if nothing else).

    It makes heavy use of "implicit int", which is discouraged for C90
    (though the standard doesn't say so) and dropped completely in C99.
    In "register c;", register is a storage specifier, just as it is in
    modern C; the declaration is equivalent to "register int c;".

    > I then hit on:
    > http://www.openbsd.org/cgi-bin/cvsweb/src/bin/cat/cat.c?rev=1.14&content-type=text/plain
    >
    > The first thing a person notices is the stack of non-standard headers.
    > Their inclusion is the usual reason for lack of topicality of unix
    > questions. My platform and my target consist of my non-unix machine; not
    > only do I not know what's in those headers, I don't have them.


    Most of the non-standard stuff is probably not strictly necessary, but
    it can be used to improve performance or to provide various bells and
    whistles.

    > Past that is the main control:
    > while ((ch = getopt(argc, argv, "benstuv")) != -1)

    [...]

    getopt is non-standard, as I'm sure you know.

    [...]
    > The only case I'm to consider is 'v', so I won't need all of this. getopt
    > will be something that I have to code from scratch. Out of curiosity, what
    > header is it defined in?


    It varies. Consult your system's documentation, or Google it. If
    your system doesn't provide it, there are open-source implementations
    out there. (For that matter, there are plenty of open-source
    implementations of "cat", but I suppose that would defeat your
    purpose.)
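
    For what it's worth, POSIX declares getopt in <unistd.h>. If only -v
    matters and the platform has no getopt, the check can be done in
    standard C alone. The following is a minimal sketch, not a getopt
    replacement; the name parse_v_flag and its return convention are
    invented here for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any library): scan the leading
 * arguments for "-v". Sets *vflag to 1 if it is seen. Returns the index
 * of the first argument that is not an option, or -1 if an unknown
 * option is found. A lone "-" is treated as a file name, not an option.
 */
int parse_v_flag(int argc, char **argv, int *vflag)
{
    int i;

    assert(argv != NULL && vflag != NULL);
    *vflag = 0;
    for (i = 1; i < argc && argv[i][0] == '-' && argv[i][1] != '\0'; i++) {
        if (strcmp(argv[i], "-v") == 0)
            *vflag = 1;
        else
            return -1;      /* unknown option */
    }
    return i;
}
```

    A caller would then open argv[i] onward as files, much as the BSD
    code does after "argv += optind".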

    > Moving along is:
    > if (bflag || eflag || nflag || sflag || tflag || vflag)
    > cook_args(argv);
    > , so if any flag gets set we cook the args. Maybe instead, we cook with the
    > args. In this process we traverse through:
    >
    > } else if (vflag) {
    >     if (!isascii(ch)) {
    >         if (putchar('M') == EOF || putchar('-') == EOF)
    >             break;
    >         ch = toascii(ch);
    >     }
    >     if (iscntrl(ch)) {
    >         if (putchar('^') == EOF ||
    >             putchar(ch == '\177' ? '?' :
    >                     ch | 0100) == EOF)
    >             break;
    >         continue;
    >     }
    > I did my best to get this on the screen. The parts I don't understand here
    > follow the double pipe, which I read as "inclusive or." In the first if
    > clause, it would appear that 'M' is substituted for non-ascii chars. What
    > does
    > || putchar('-') == EOF)
    > do beyond this?


    It uses a common convention (at least it's common on Unix) for
    displaying non-printable characters. Control characters in the range
    0 to 31 are represented as a '^' followed by another character,
    usually an uppercase letter; it's determined by adding 64 to the
    value. (On old keyboards, the control key actually worked by clearing
    a bit in the 7-bit or 8-bit value that was transmitted.) The DEL
    character, 127, is represented as ^?; this is a special case.
    Characters with the high bit set, in the range 128 to 255, are called
    "meta" characters (some old keyboards had a "meta" key that set this
    bit), and are represented as "M-" followed by the representation of
    the corresponding 7-bit character. For example, character 129 would
    be printed as M-^A.

    putchar() returns EOF on failure.

    All this (except the EOF part) is very specific to the ASCII character
    set, something that's not specified by the C standard, but it should
    give you enough information to understand what the code is doing (with
    a bit of work).
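
    The convention described above can be sketched as a small function.
    This is an illustration under stated assumptions, not the BSD code:
    it assumes ASCII, the name visible is made up, and unlike a real
    cat -v it makes no exception for tab or newline.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the "cat -v" style spelling of byte c into
 * out, which must have room for at least 5 chars ("M-^?" plus '\0').
 * Assumes ASCII. Returns the number of characters written, not counting
 * the terminating '\0'.
 */
size_t visible(unsigned char c, char *out)
{
    size_t n = 0;

    assert(out != NULL);
    if (c >= 128) {                     /* "meta": high bit set */
        out[n++] = 'M';
        out[n++] = '-';
        c = (unsigned char)(c - 128);   /* corresponding 7-bit character */
    }
    if (c < 32) {                       /* control character: ^@ .. ^_ */
        out[n++] = '^';
        out[n++] = (char)(c + 64);
    } else if (c == 127) {              /* DEL is the special case */
        out[n++] = '^';
        out[n++] = '?';
    } else {                            /* printable: copy as-is */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

    With this sketch, byte 1 comes out as "^A", byte 127 as "^?", and
    byte 129 as "M-^A", matching the examples in the text.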

    > Similarly, I'm out of my depth with what follows the double pipe in the
    > second if clause.
    > || putchar(ch == '\177' ? '?' : ch | 0100) == EOF)
    >
    > Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
    > that it's beyond the scope of my present endeavor and can be omitted.


    No, it's not a trigraph; trigraphs are introduced by a double question
    mark. It's a character constant that uses an escape sequence. '\177'
    is the character whose integer value is 177 in octal, or 127 in
    decimal; it's the ASCII DEL character. "ch | 0100" yields the value
    of ch with a certain bit forced on; it's a terse way of mapping
    control-A (1) to 'A' and so forth. The conditional expression is used
    to handle the fact that mapping DEL to "^?" is a special case.

    --
    Keith Thompson (The_Other_Keith) <>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 28, 2008
    #2

  3. "Keith Thompson" <> wrote in message news:
    ...
    > "c gordon liddy" <> writes:


    > Yes, that code is archaic, but it's of some historical interest (to
    > show how much the language has improved if nothing else).

    Thanks for your generous response. I understood the above, but about
    right here is where I hit the peter principle:

    > It uses a common convention (at least it's common on Unix) for
    > displaying non-printable characters. Control characters in the range
    > 0 to 31 are represented as a '^' followed by another character,
    > usually an uppercase letter; it's determined by adding 64 to the
    > value. (On old keyboards, the control key actually worked by clearing
    > a bit in the 7-bit or 8-bit value that was transmitted.) The DEL
    > character, 127, is represented as ^?; this is a special case.
    > Characters with the high bit set, in the range 128 to 255, are called
    > "meta" characters (some old keyboards had a "meta" key that set this
    > bit), and are represented as "M-" followed by the representation of
    > the corresponding 7-bit character. For example, character 129 would
    > be printed as M-^A.
    >
    > putchar() returns EOF on failure.
    >
    > All this (except the EOF part) is very specific to the ASCII character
    > set, something that's not specified by the C standard, but it should
    > give you enough information to understand what the code is doing (with
    > a bit of work).
    >
    >> Similarly, I'm out of my depth with what follows the double pipe in the
    >> second if clause.
    >> || putchar(ch == '\177' ? '?' : ch | 0100) == EOF)
    >>
    >> Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
    >> that it's beyond the scope of my present endeavor and can be omitted.

    >
    > No, it's not a trigraph; trigraphs are introduced by a double question
    > mark. It's a character constant that uses an escape sequence. '\177'
    > is the character whose integer value is 177 in octal, or 127 in
    > decimal; it's the ASCII DEL character. "ch | 0100" yields the value
    > of ch with a certain bit forced on; it's terse way of mapping
    > control-A (1) to 'A" and so forth. The conditional expression is used
    > to handle the fact that mapping DEL to "^?" is a special case.


    I think I could study the above for a long time and not really get
    it. It's interesting but not germane to something that can be done in
    standard C. I have a double problem with the double pipe here. Not
    only is that which is on the right hand side of it obfuscated C, I
    don't get the control mechanism. To me, it looks like
    if this then that or the other.

    I would suspect that there's another K&R exercise that speaks to this
    in the realm of my own OS.

    --
    c gordon liddy, Mar 28, 2008
    #3
  4. c gordon liddy <> writes:
    > "Keith Thompson" <> wrote in message news:
    > ...
    >> "c gordon liddy" <> writes:

    [...]
    >>> Similarly, I'm out of my depth with what follows the double pipe in the
    >>> second if clause.
    >>> || putchar(ch == '\177' ? '?' : ch | 0100) == EOF)
    >>>
    >>> Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
    >>> that it's beyond the scope of my present endeavor and can be omitted.

    >>
    >> No, it's not a trigraph; trigraphs are introduced by a double question
    >> mark. It's a character constant that uses an escape sequence. '\177'
    >> is the character whose integer value is 177 in octal, or 127 in
    >> decimal; it's the ASCII DEL character. "ch | 0100" yields the value
    >> of ch with a certain bit forced on; it's terse way of mapping
    >> control-A (1) to 'A" and so forth. The conditional expression is used
    >> to handle the fact that mapping DEL to "^?" is a special case.

    >
    > I think I could study the above for a long time and not really get
    > it. It's interesting but not germane to something that can be done in
    > standard C. I have a double problem with the double pipe here. Not
    > only is that which is on the right hand side of it obfuscated C, I
    > don't get the control mechanism. To me, it looks like
    > if this then that or the other.


    The quoted code, as far as I can tell, *is* standard C. I don't
    believe it's deliberately obfuscated; rather, it's unusually terse,
    written in a style that favors packing lots of information into
    complex expressions rather than breaking it down into separate
    statements.

    You can skip it and go on to something easier if you like, but you
    might consider taking one more stab at it.

    Let's take a look at the statement:

    if (iscntrl(ch)) {
        if (putchar('^') == EOF ||
            putchar(ch == '\177' ? '?' :
                    ch | 0100) == EOF)
            break;
        continue;
    }

    if ch is a control character then
        if printing '^' fails *or* printing another character fails then
            break out of the loop (give up)
        end if
        Printing succeeded; nothing more to do here: "continue"
    end if

    iscntrl(ch) returns true if ch is a "control character". In this
    context, it tells us that it's a non-printable character that we want
    to represent as a '^' followed by another character (^G for the ASCII
    BEL character, ^? for DEL).

    Within the if statement we see two calls to putchar(), one to print
    the '^' character and one to print whatever follows it. Both results
    are compared against EOF (which indicates failure); if either
    putchar() fails, we break out of the loop.

    The part before the "||" is reasonably clear: try to print a '^'
    character and check whether the attempt failed. "||" is a
    short-circuit operator, evaluating its right operand only if the left
    operand is false, so if the first putchar call fails we won't attempt
    the second one.
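
    The short-circuit rule is easy to check in isolation. In this small
    sketch, probe is a made-up stand-in for a "putchar(x) == EOF" test;
    it only counts how often it is called.

```c
#include <assert.h>

static int calls;   /* how many times probe() has run */

/* Hypothetical stand-in for "putchar(x) == EOF": records the call and
 * returns the given truth value. */
static int probe(int result)
{
    calls++;
    return result;
}

/* Returns how many operands of "a || b" were actually evaluated. */
int operands_evaluated(int a, int b)
{
    calls = 0;
    if (probe(a) || probe(b)) {
        /* at least one operand was true */
    }
    assert(calls >= 1);     /* the left operand is always evaluated */
    return calls;
}
```

    operands_evaluated(1, 0) reports one evaluation, because a true left
    operand settles the result; operands_evaluated(0, 1) reports two.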

    Now let's look at the part after the "||":

    putchar(ch == '\177' ? '?' : ch | 0100) == EOF

    We've covered the higher level control flow, so we're down to figuring
    out what the heck

    ch == '\177' ? '?' : ch | 0100

    means. Some parentheses might make it clearer:

    (ch == '\177') ? ('?') : (ch | 0100)

    If ch is equal to '\177' (character 177 octal, 127 decimal, ASCII
    DEL), the expression yields '?'. The result is that we print a '?'
    after the '^'.

    Otherwise (for any other control character), the result is (ch |
    0100). 0100, since it begins with '0', is an octal constant, equal to
    64, a power of 2. "|" is the bitwise "or" operator.

    The binary value of 0100 is 01000000. Suppose the value of ch is 7
    (ASCII BEL, which we're going to want to print as "^G"). 7 is
    00000111. Applying bitwise or to these two operands gives us
    01000111, which is 0107 in octal, or 71 in decimal, or 'G' in ASCII.

    0100 (octal) is being used as a bit mask; it has a single bit set to
    1, and all others set to 0. (ch | 0100) yields the value of ch with
    the bit in that particular position turned on. As it happens, that's
    a terse way to specify a transformation from a control character to
    the corresponding letter.

    Note that (ch + 64) would have worked just as well in this context
    (since we know the bit we want to turn on isn't already on). The
    author probably chose to write "ch | 0100" because he thought of the
    operation as setting a bit, not as the equivalent addition.
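
    The claim that OR and addition agree here can be checked over the
    whole range of control codes. A minimal sketch, assuming ASCII; the
    function names are invented for the example.

```c
#include <assert.h>

/* Map an ASCII control code (0..31) to the character shown after '^'. */
int caret_letter(int ch)
{
    assert(ch >= 0 && ch < 32);
    return ch | 0100;       /* set bit 6; same as ch + 64 in this range */
}

/* Returns 1 if OR and addition agree for every control code 0..31. */
int or_matches_add(void)
{
    int ch;

    for (ch = 0; ch < 32; ch++)
        if ((ch | 0100) != ch + 64)
            return 0;
    return 1;
}
```

    The two only part ways once the bit is already set: ('A' | 0100) is
    still 'A', whereas 'A' + 64 is 129.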

    Here's a much more verbose chunk of code that does the same thing.
    I've kept the "ch | 0100" idiom, but expanded everything else. The
    original code is more terse than I tend to like; the following is much
    too verbose for my taste, but it might be clearer. (I've compiled it,
    but I haven't tested it.)

    if (iscntrl(ch)) {
        /* ch is a control character */
        int result;

        /*
         * The two characters we want to print. The first is '^';
         * we don't know yet what the second is.
         */
        int ch1 = '^';
        int ch2;

        /* Try to print the first character. */
        result = putchar(ch1);
        if (result == EOF) {
            /* Failed, terminate the loop */
            break;
        }

        if (ch == '\177') {
            /* ch is DEL, we want "^?" */
            ch2 = '?';
        }
        else {
            /*
             * ch is another control character.
             * Transform 1 to 'A', 2 to 'B', etc. using
             * our intimate knowledge of ASCII encoding.
             */
            ch2 = ch | 0100;
        }

        /* Print as above */
        result = putchar(ch2);
        if (result == EOF) {
            break;
        }
    }

    --
    Keith Thompson (The_Other_Keith) <>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 28, 2008
    #4
  5. On Mar 28, 8:11 am, Keith Thompson <> wrote:
    > [...]

    This certainly puts my task into sharper relief. I think with what
    I've got, I can write this thing now.

    It was the double-whammy for me that I couldn't understand the double
    pipe in two different ways. That short-circuit stuff makes sense.

    I'll need to write parts of this from scratch. I've decided to forget
    about getopt for now and instead write catv.c to expect a single file
    and to convert characters as above.

    The final kink I've got to work out is that all the available
    literature shows the output coming out buffered instead of one char at
    a time. I'll hope to use Heathfield's safegets and have a robust
    result. I think if I pay attention to the above, it will obtain.

    Thanks for the enlightening keystrokes.
    --
    c gordon liddy, Mar 30, 2008
    #5
  6. On Mar 28, 8:11 am, Keith Thompson <> wrote:
    > [...]


    K&R 7.5 has the text that includes the cat function that is alluded to
    in 8.1. The filecopy there uses characters instead of buffers to do
    its business. I believe it is better suited to my current task than
    using buffers. The part needing revision to account for the -v
    behavior appears as an external, void function. Main makes the
    adjustment for the output to go to stdout.

    /*filecopy */
    void filecopy(FILE *ifp, FILE *ofp)
    {
    int c;

    while((c=getc(ifp)) != EOF)
    putc(c, ofp);
    }

    I don't know whether I'll be able to get the job done with one int, so
    I'll put c in reserve and use ch to match the source I snipped from
    the bsd site. I've further added symbols to match Keith's verbose
    version.

    /*filecopy */
    void filecopy(FILE *ifp, FILE *ofp)
    {
    int c;
    int ch;
    int result;

    while((ch=getc(ifp)) != EOF)
    putc(ch, ofp);
    }

    So, I've got to exchange this for the putc statement:

    if (iscntrl(ch)) {
    /* ch is a control character */
    int result;

    /*
    * The two characters we want to print. The first is '^';
    * we don't know yet what the second is.
    */
    int ch1 = '^';
    int ch2;

    /* Try to print the first character. */
    result = putchar(ch1);
    if (result == EOF) {
    /* Failed, terminate the loop */
    break;
    }

    if (ch == '\177') {
    /* ch is DEL, we want "^?" */
    ch2 = '?';
    }
    else {
    /*
    * ch is another control character.
    * Transform 1 to 'A', 2 to 'B', etc. using
    * our intimate knowledge of ASCII encoding.
    */
    ch2 = ch | 0100;
    }

    /* Print as above */
    result = putchar(ch2);
    if (result == EOF) {
    break;
    }
    }


    So I think I'm ready to take this to a compiler. I'm on someone
    else's laptop. It probably does have a compiler, but its owner is in
    an online naval battle. Our girlfriends are at the theatre. I love
    theatre when I don't have to go.

    Because I have to make the keystrokes, I'll finish with the caller.
    No non-standard headers here:
    #include <stdio.h>

    int main(int argc, char **argv)
    {

    FILE *fp;
    void filecopy(FILE *, FILE *);

    if (argc < 2) printf("die");
    else
    while (--argc > 0)
    if ((fp = fopen(*++argv, "r")) == NULL)
    {
    printf("catv can't open %s\n", *argv);
    return 1;
    }
    else
    {
    filecopy(fp, stdout);
    fclose(fp);
    }

    return 0;
    }
    Since the google portal is the only way for me to get this back to my
    own machine, I include reference material after the sig.

    --
    c gordon liddy



    if (iscntrl(ch)) {
    if (putchar('^') == EOF ||
    putchar(ch == '\177' ? '?' :
    ch | 0100) == EOF)
    break;
    continue;
    }

    if ch is a control character then
    if printing '^' fails *or* printing another character fails then
    break out of the loop (give up)
    end if
    Printing succeeded; nothing more to do here: "continue"
    end if

    iscntrl(ch) returns true if ch is a "control character". In this
    context, it tells us that it's a non-printable character that we want
    to represent as a '^' followed by another character (^G for the ASCII
    BEL character, ^? for DEL).

    Within the if statement we see two calls to putchar(), one to print
    the '^' character and one to print whatever follows it. Both results
    are compared against EOF (which indicates failure); if either
    putchar() fails, we break out of the loop.

    The part before the "||" is reasonably clear: try to print a '^'
    character and check whether the attempt failed. "||" is a
    short-circuit operator, evaluating its right operand only if the left
    operand is false, so if the first putchar call fails we won't attempt
    the second one.

    Now let's look at the part after the "||":

    putchar(ch == '\177' ? '?' : ch | 0100) == EOF

    We've covered the higher level control flow, so we're down to figuring
    out what the heck

    ch == '\177' ? '?' : ch | 0100

    means. Some parentheses might make it clearer:

    (ch == '\177') ? ('?') : (ch | 0100)

    If ch is equal to '\177' (character 177 octal, 127 decimal, ASCII
    DEL), the expression yields '?'. The result is that we print a '?'
    after the '^'.

    Otherwise (for any other control character), the result is (ch |
    0100). 0100, since it begins with '0', is an octal constant, equal to
    64, a power of 2. "|" is the bitwise "or" operator.

    The binary value of 0100 is 01000000. Suppose the value of ch is 7
    (ASCII BEL, which we're going to want to print as "^G"). 7 is
    00000111. Applying bitwise or to these two operands gives us
    01000111, which is 0107 in octal, or 71 in decimal, or 'G' in ASCII.

    0100 (octal) is being used as a bit mask; it has a single bit set to
    1, and all others set to 0. (ch | 0100) yields the value of ch with
    the bit in that particular position turned on. As it happens, that's
    a terse way to specify a transformation from a control character to
    the corresponding letter.
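
    As a small sketch of that mask in isolation (the helper name
    ctrl_to_letter is made up for illustration, and the mapping assumes
    an ASCII character set, which the C standard does not guarantee):

```c
/* Hypothetical helper: map an ASCII control code (0..31) to the
 * character printed after '^'. Assumes ASCII encoding. */
static int ctrl_to_letter(int ch)
{
    return ch | 0100;   /* 0100 octal == 64 decimal == 0x40: turn that one bit on */
}
```

    With ASCII, ctrl_to_letter(7) gives 'G' and ctrl_to_letter(1) gives
    'A', matching the worked example above.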

    Note that (ch + 64) would have worked just as well in this context
    (since we know the bit we want to turn on isn't already on). The
    author probably chose to write "ch | 0100" because he thought of the
    operation as setting a bit, not as the equivalent addition.

    Here's a much more verbose chunk of code that does the same thing.
    I've kept the "c | 0100" idiom, but expanded everything else. The
    original code is more terse than I tend to like; the following is much
    too verbose for my taste, but it might be clearer. (I've compiled it,
    but I haven't tested it.)
    c gordon liddy, Mar 30, 2008
    #6
  7. On Sat, 29 Mar 2008 22:36:19 -0700 (PDT), c gordon liddy
    <> wrote:


    snip 160 lines of obsolete commentary

    Please trim your posts when responding

    >K&R 7.5 has the text that includes the cat function that is alluded to


    snip code you don't intend to use

    >
    >/*filecopy */
    >void filecopy(FILE *ifp, FILE *ofp)
    >{
    > int c;
    > int ch;
    > int result;
    >
    > while((ch=getc(ifp)) != EOF)
    > putc(ch, ofp);
    >}
    >
    >So, I've got to exchange this for the putc statement:
    >
    > if (iscntrl(ch)) {
    > /* ch is a control character */
    > int result;
    >
    > /*
    > * The two characters we want to print. The first is '^';
    > * we don't know yet what the second is.
    > */
    > int ch1 = '^';
    > int ch2;
    >
    > /* Try to print the first character. */
    > result = putchar(ch1);


    The putc call this is replacing used ofp. This is forced to stdout.
    Was that deliberate?

    > if (result == EOF) {
    > /* Failed, terminate the loop */
    > break;


    Your main calls this function in a loop. A failure here is probably
    permanent. How do you tell the caller that things are broken and it
    should stop calling you?

    You may want to have the function return a status and let the calling
    function evaluate that status before iterating the loop.
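
    A minimal sketch of what I mean, leaving out the -v handling. The
    name filecopy_status and the choice of 0/EOF as return values are
    only my assumptions for illustration, not anything from K&R:

```c
#include <stdio.h>

/* Sketch: copy ifp to ofp one character at a time.
 * Returns 0 on success, EOF if a read or write error occurred. */
int filecopy_status(FILE *ifp, FILE *ofp)
{
    int ch;

    while ((ch = getc(ifp)) != EOF) {
        if (putc(ch, ofp) == EOF)
            return EOF;               /* write failed; tell the caller */
    }
    return ferror(ifp) ? EOF : 0;     /* distinguish a read error from end of file */
}
```

    The loop in main can then test the result and stop (for example
    with return EXIT_FAILURE) instead of opening the next file.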

    > }
    >
    > if (ch == '\177') {
    > /* ch is DEL, we want "^?" */
    > ch2 = '?';
    > }
    > else {
    > /*
    > * ch is another control character.
    > * Transform 1 to 'A', 2 to 'B', etc. using
    > * our intimate knowledge of ASCII encoding.
    > */
    > ch2 = ch | 0100;


    Why limit yourself to ASCII? Why use an octal constant to obfuscate
    the code? If you want to transform integers to letters, build a
    static array and select the character using the integer as the index.
    Make it constant and give it file scope. Something along the lines of
    const char transform[] = "@ABCD...XYZ~";
    ch2 = transform[ch];
    You will need to add a range check on ch since I'm not aware of any
    guarantee that all control characters have an integer value <= 26.
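
    Roughly like this. The table contents and the name ctrl_lookup are
    only for illustration; indexing by the character value still assumes
    the control codes are 0 through 31, as they are in ASCII:

```c
#include <stddef.h>

/* Hypothetical table: index = control code 0..31,
 * value = character shown after '^' (conventional ^@ .. ^_ notation). */
static const char ctrl_names[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";

/* Returns the character to print after '^' for ch,
 * or 0 if ch is outside the table. */
int ctrl_lookup(int ch)
{
    if (ch < 0 || (size_t)ch >= sizeof ctrl_names - 1)
        return 0;                 /* out of range: caller decides what to do */
    return ctrl_names[ch];
}
```

    The range check is what keeps something like DEL (127) from
    indexing past the end of the array; it still needs its own case.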

    > }
    >
    > /* Print as above */
    > result = putchar(ch2);
    > if (result == EOF) {
    > break;
    > }
    > }
    >

    snip commentary

    >
    >int main(int argc, char **argv)
    >{
    >
    > FILE *fp;
    > void filecopy(FILE *, FILE *);
    >
    >if (argc < 2) printf("die");


    Avoid portability issues and include a \n in your print string.

    > else
    > while (--argc > 0)


    As a matter of style, the absence of braces will eventually cause you
    problems. Right now your if and else are close enough to be visually
    obvious. That will not always be the case. Many adopt the style of
    using braces even when the range of the loop is a simple one-line
    statement. I'm not terribly consistent myself in that situation but I
    always use braces when the range occupies multiple lines.

    > if ((fp = fopen(*++argv, "r")) == NULL)


    Since you expect the file to contain control characters, you should
    open it for binary input, not text. The reason is that some operating
    systems will cause getc to return EOF when reading the control
    character that they think marks the end of text.

    > {
    > printf("catv can't open %s\n", *argv);
    > return 1;


    Use EXIT_FAILURE instead of 1 for portability.

    > }
    > else
    > {
    > filecopy(fp, stdout);
    > fclose(fp);
    > }


    Avoid portability issues and
    putchar('\n');
    when you finish.

    If you run in Windows, a
    getchar();
    here will keep the window open long enough for you to read the output.

    >
    >return 0;
    >}



    Remove del for email
    Barry Schwarz, Mar 31, 2008
    #7
  8. "Barry Schwarz" <> wrote in message
    news:...
    > On Sat, 29 Mar 2008 22:36:19 -0700 (PDT), c gordon liddy


    > Your main calls this function in a loop. A failure here is probably
    > permanent. How do you tell the caller that things are broken and it
    > should stop calling you?

    Don't know.

    I would hope that an OS would just ignore a main call from the
    same place a million times a second. My guess is that you have a means for
    an OS to decide without burning the chip.


    >
    > You may want to have the function return a status and let the calling
    > function evaluate that status before iterating the loop.
    >
    >> }
    >>
    >> if (ch == '\177') {
    >> /* ch is DEL, we want "^?" */
    >> ch2 = '?';
    >> }
    >> else {
    >> /*
    >> * ch is another control character.
    >> * Transform 1 to 'A', 2 to 'B', etc. using
    >> * our intimate knowledge of ASCII encoding.
    >> */
    >> ch2 = ch | 0100;

    >
    > Why limit yourself to ASCII? Why use an octal constant to obfuscate
    > the code? If you want to transform integers to letters, build a
    > static array and select the character using the integer as the index.
    > Make it constant and give it file scope. Something along the lines of
    > const char transform[] = "@ABCD...XYZ~";
    > ch2 = transform[ch];
    > You will need to add a range check on ch since I'm not aware of any
    > guarantee that all control characters have an integer value <= 26.


    How do I do that? Are ctrl chars defined by ascii?



    >
    >> }
    >>
    >> /* Print as above */
    >> result = putchar(ch2);
    >> if (result == EOF) {
    >> break;
    >> }
    >> }
    >>

    > snip commentary
    >
    >>
    >>int main(int argc, char **argv)
    >>{
    >>
    >> FILE *fp;
    >> void filecopy(FILE *, FILE *);
    >>
    >>if (argc < 2) printf("die");

    >
    > Avoid portability issues and include a \n in your print string.
    >
    >> else
    >> while (--argc > 0)

    >
    > As a matter of style, the absence of braces will eventually cause you
    > problems. Right now your if and else are close enough to be visually
    > obvious. That will not always be the case. Many adopt the style of
    > using braces even when the range of the loop is a simple one-line
    > statement. I'm not terribly consistent myself in that situation but I
    > always use braces when the range occupies multiple lines.
    >
    >> if ((fp = fopen(*++argv, "r")) == NULL)

    >
    > Since you expect the file to contain control characters, you should
    > open it for binary input, not text. The reason is that some operating
    > systems will cause getc to return EOF when reading the control
    > character that they think marks the end of text.
    >
    >> {
    >> printf("catv can't open %s\n", *argv);
    >> return 1;

    >
    > Use EXIT_FAILURE instead of 1 for portability.
    >
    >> }
    >> else
    >> {
    >> filecopy(fp, stdout);
    >> fclose(fp);
    >> }

    >
    > Avoid portability issues and
    > putchar('\n');
    > when you finish.
    >
    > If you run in Windows, a
    > getchar();
    > here will keep the window open long enough for you to read the output.

    We'll C. It's time for me to step away.


    >
    >>
    >>return 0;
    >>}

    >
    >
    > Remove del for email

    #include <stdio.h>
    #include <ctype.h>

    int main(int argc, char **argv)
    {

    FILE *fp;
    void filecopy(FILE *, FILE *);

    if (argc < 2) printf("die\n");
    else
    while (--argc > 0)
    if ((fp = fopen(*++argv, "rb")) == NULL)
    {
    printf("catv can't open %s\n", *argv);
    return 1;
    }
    else
    {
    filecopy(fp, stdout);
    fclose(fp);
    }

    return 0;
    }

    /*filecopy */
    void filecopy(FILE *ifp, FILE *ofp)
    {
    int c;
    int ch;
    int result;
    int ch1 = '^';
    int ch2;
    int ch3 = 'M';

    while((ch=getc(ifp)) != EOF)
    {
    if (iscntrl(ch))
    {

    result = putchar(ch1);
    if (result == EOF)
    {
    /* Failed, terminate the loop */
    break;
    }

    if (ch == '\177')
    {
    /* ch is DEL, we want "^?" */
    ch2 = '?';
    }
    else if(ch == 26)
    {
    /* we don't want ctrl-z coming out of here */
    ch2 = '#';
    }

    else
    {
    /*
    * ch is another control character.
    * Transform 1 to 'A', 2 to 'B', etc. using
    * our intimate knowledge of ASCII encoding.
    */
    ch2 = ch | 0100;
    }

    /* Print as above */
    result = putchar(ch2);
    if (result == EOF)
    {
    break;
    }


    // outer brace of if (iscntrl(ch))
    }


    else if (!isascii(ch))
    {
    if (putchar('M') == EOF || putchar('-') == EOF)
    break;
    ch = toascii(ch);
    putchar(ch);
    }



    else putchar(ch);


    // outer brace of while control
    }

    putchar('\n');
    // ready to exit

    getchar();



    // outer brace of function
    }
    // gcc -o catv catv9.c >text22.txt 2>text23.txt
    // catv text42.txt >text43.txt



    --

    "I am waiting for them to prove that God is really American."

    ~~ Lawrence Ferlinghetti
    C. Gordon Liddy, Mar 31, 2008
    #8
  9. On Thu, 27 Mar 2008 23:22:52 -0700, Keith Thompson <>
    wrote:

    > "c gordon liddy" <> writes:

    <snip: from some version of 'cat'>
    > > } else if (vflag)
    > > { if (!isascii(ch))
    > > { if (putchar('M') == EOF || putchar('-') == EOF)
    > > break;
    > > ch = toascii(ch);
    > > }
    > > if (iscntrl(ch)) {
    > > if (putchar('^') == EOF ||
    > > putchar(ch == '\177' ? '?' :
    > > ch | 0100) == EOF)
    > > break;
    > > continue;
    > > }
    > > I did my best to get this on the screen. The parts I don't understand here
    > > follow the double pipe, which I read as "inclusive or." In the first if
    > > clause, it would appear that 'M' is substituted for non-ascii chars. What
    > > does
    > > || putchar('-') == EOF)
    > > do beyond this?

    >
    > It uses a common convention (at least it's common on Unix) for
    > displaying non-printable characters. Control characters in the range
    > 0 to 31 are represented as a '^' followed by another character,
    > usually an uppercase letter; it's determined by adding 64 to the
    > value. (On old keyboards, the control key actually worked by clearing
    > a bit in the 7-bit or 8-bit value that was transmitted.) The DEL
    > character, 127, is represented as ^?; this is a special case.


    ^x was common on many ASCII systems. Control clears the 0x40 bit IF in
    0x40-0x5F usually (after) ignoring shift=0x20 for letters 0x41-0x5A.

    > Characters with the high bit set, in the range 128 to 255, are called
    > "meta" characters (some old keyboards had a "meta" key that set this
    > bit), and are represented as "M-" followed by the representation of
    > the corresponding 7-bit character. For example, character 129 would
    > be printed as M-^A.
    >

    Few keyboards had meta, but some programs, notably Emacs (developed at
    MIT), used it, and on others you use prefix ESCape instead. meta
    'characters' are conventionally displayed as M-x, but since they are
    primarily used interactively and not stored I wouldn't have thought
    putting them in cat -v is very useful, more just for completeness.
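
    For reference, a sketch of the whole M- and ^ notation in one place.
    The function name visible_form and the buffer interface are invented
    for this example, and it assumes ASCII and byte values 0 to 255:

```c
#include <stddef.h>

/* Hypothetical helper: write the cat -v style form of byte value c (0..255)
 * into buf, which must have room for at least 5 chars ("M-^?" plus '\0').
 * Assumes ASCII. Returns the number of characters written, not counting '\0'. */
size_t visible_form(int c, char *buf)
{
    size_t n = 0;

    if (c >= 0200) {              /* high bit set: "M-" prefix, then drop the bit */
        buf[n++] = 'M';
        buf[n++] = '-';
        c &= 0177;
    }
    if (c < 040 || c == 0177) {   /* control character or DEL: caret notation */
        buf[n++] = '^';
        c = (c == 0177) ? '?' : (c | 0100);
    }
    buf[n++] = (char)c;
    buf[n] = '\0';
    return n;
}
```

    So 129 comes out as "M-^A" and 127 as "^?". Real implementations
    typically pass newline (and usually tab) through unchanged, which
    this sketch does not bother with.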

    > putchar() returns EOF on failure.
    >
    > All this (except the EOF part) is very specific to the ASCII character
    > set, something that's not specified by the C standard, but it should
    > give you enough information to understand what the code is doing (with
    > a bit of work).
    >

    Yes.

    > > Similarly, I'm out of my depth with what follows the double pipe in the
    > > second if clause.
    > > || putchar(ch == '\177' ? '?' : ch | 0100) == EOF)
    > >
    > > Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
    > > that it's beyond the scope of my present endeavor and can be omitted.

    >
    > No, it's not a trigraph; trigraphs are introduced by a double question
    > mark. It's a character constant that uses an escape sequence. '\177'
    > is the character whose integer value is 177 in octal, or 127 in
    > decimal; it's the ASCII DEL character. "ch | 0100" yields the value
    > of ch with a certain bit forced on; it's a terse way of mapping
    > control-A (1) to 'A' and so forth. The conditional expression is used
    > to handle the fact that mapping DEL to "^?" is a special case.


    It needn't be; (in ASCII) ch ^ 0100 /* or 0x40 */ works for both the
    'normal control' (!) characters 0x00-0x1F and DEL 0x7F. This is not an
    accident; the ^? display was chosen because of this.
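
    As a sketch (caret_char is just a name picked for this example, and
    it assumes ASCII):

```c
/* Assumes ASCII. Flipping bit 0100 maps 0..31 to '@'..'_'
 * and 0177 (DEL) to '?', so no special case is needed. */
int caret_char(int ch)
{
    return ch ^ 0100;
}
```

    Here 0x7F ^ 0x40 is 0x3F, which is '?', and 0x01 ^ 0x40 is 0x41,
    which is 'A'.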

    - formerly david.thompson1 || achar(64) || worldnet.att.net
    David Thompson, Apr 7, 2008
    #9