|| putchar(ch == '\177' ? '?' : ch | 0100) == EOF)


c gordon liddy

2 different cats.

I've been going through chp 8 of K&R and wanted to write a standard cat
function with a little more functionality than existing solns: I want to
code behavior for the -v switch. It occurs to me that there "should" be
source out there for this and googled for "cat.c unix source" . The second
hit I got was this:

/*
* Concatenate files.
*/

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

char stdbuf[BUFSIZ];

main(argc, argv)
char **argv;
{
int fflg = 0;
register FILE *fi;
register c;
int dev, ino = -1;
struct stat statb;

setbuf(stdout, stdbuf);
for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
switch(argv[1][1]) {

Holy smokes! This must count as archeology for unix systems. Funky-looking
main call and register as a type as opposed to storage specifier. This gave
me a pretty good idea what I wasn't looking for.

I then hit on:
http://www.openbsd.org/cgi-bin/cvsweb/src/bin/cat/cat.c?rev=1.14&content-type=text/plain

The first thing a person notices is the stack of non-standard headers.
Their inclusion is the usual reason for lack of topicality of unix
questions. My platform and my target consist of my non-unix machine; not
only do I not know what's in those headers, I don't have them.

Past that is the main control:
while ((ch = getopt(argc, argv, "benstuv")) != -1)
switch (ch) {
case 'b':
bflag = nflag = 1; /* -b implies -n */
break;
case 'e':
eflag = vflag = 1; /* -e implies -v */
break;
case 'n':
nflag = 1;
break;
case 's':
sflag = 1;
break;
case 't':
tflag = vflag = 1; /* -t implies -v */
break;
case 'u':
setbuf(stdout, NULL);
break;
case 'v':
vflag = 1;
break;
default:
(void)fprintf(stderr,
"usage: %s [-benstuv] [-] [file ...]\n", __progname);
exit(1);
/* NOTREACHED */
}
argv += optind;

The only case I'm to consider is 'v', so I won't need all of this. getopt
will be something that I have to code from scratch. Out of curiosity, what
header is it defined in?

Moving along is:
if (bflag || eflag || nflag || sflag || tflag || vflag)
cook_args(argv);
, so if any flag gets set we cook the args. Maybe instead, we cook with the
args. In this process we traverse through:

} else if (vflag)
{ if (!isascii(ch))
{ if (putchar('M') == EOF || putchar('-') == EOF)
break;
ch = toascii(ch);
}
if (iscntrl(ch)) {
if (putchar('^') == EOF ||
putchar(ch == '\177' ? '?' :
ch | 0100) == EOF)
break;
continue;
}
I did my best to get this on the screen. The parts I don't understand here
follow the double pipe, which I read as "inclusive or." In the first if
clause, it would appear that 'M' is substituted for non-ascii chars. What
does
|| putchar('-') == EOF)
do beyond this?

Similarly, I'm out of my depth with what follows the double pipe in the
second if clause.
|| putchar(ch == '\177' ? '?' : ch | 0100) == EOF)

Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
that it's beyond the scope of my present endeavor and can be omitted.

Grateful for your thoughtful comment.
 

Keith Thompson

c gordon liddy said:
2 different cats.

I've been going through chp 8 of K&R and wanted to write a standard cat
function with a little more functionality than existing solns: I want to
code behavior for the -v switch. It occurs to me that there "should" be
source out there for this and googled for "cat.c unix source" . The second
hit I got was this:

/*
* Concatenate files.
*/

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

char stdbuf[BUFSIZ];

main(argc, argv)
char **argv;
{
int fflg = 0;
register FILE *fi;
register c;
int dev, ino = -1;
struct stat statb;

setbuf(stdout, stdbuf);
for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
switch(argv[1][1]) {

Holy smokes! This must count as archeology for unix systems. Funky-looking
main call and register as a type as opposed to storage specifier. This gave
me a pretty good idea what I wasn't looking for.

Yes, that code is archaic, but it's of some historical interest (to
show how much the language has improved if nothing else).

It makes heavy use of "implicit int", which is discouraged for C90
(though the standard doesn't say so) and dropped completely in C99.
In "register c;", register is a storage specifier, just as it is in
modern C; the declaration is equivalent to "register int c;".
I then hit on:
http://www.openbsd.org/cgi-bin/cvsweb/src/bin/cat/cat.c?rev=1.14&content-type=text/plain

The first thing a person notices is the stack of non-standard headers.
Their inclusion is the usual reason for lack of topicality of unix
questions. My platform and my target consist of my non-unix machine; not
only do I not know what's in those headers, I don't have them.

Most of the non-standard stuff is probably not strictly necessary, but
it can be used to improve performance or to provide various bells and
whistles.
Past that is the main control:
while ((ch = getopt(argc, argv, "benstuv")) != -1)
[...]

getopt is non-standard, as I'm sure you know.

[...]
The only case I'm to consider is 'v', so I won't need all of this. getopt
will be something that I have to code from scratch. Out of curiosity, what
header is it defined in?

It varies. Consult your system's documentation, or Google it. If
your system doesn't provide it, there are open-source implementations
out there. (For that matter, there are plenty of open-source
implementations of "cat", but I suppose that would defeat your
purpose.)
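
On POSIX systems getopt is declared in <unistd.h>. Where it is not available and only -v matters, the option handling can be done in standard C. The following is a minimal sketch under that assumption; parse_v is a hypothetical name, not something from the OpenBSD source.

```c
#include <string.h>

/*
 * Hypothetical replacement for getopt when only -v is supported.
 * Sets *vflag to 1 if the first argument is exactly "-v" and returns
 * the index in argv of the first file name.
 */
int parse_v(int argc, char **argv, int *vflag)
{
    int first = 1;              /* argv[0] is the program name */

    *vflag = 0;
    if (argc > 1 && strcmp(argv[1], "-v") == 0) {
        *vflag = 1;
        first = 2;              /* skip over the switch */
    }
    return first;
}
```

A caller would loop over argv from the returned index up to argc. Combined switches such as -vt are deliberately not handled here.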
Moving along is:
if (bflag || eflag || nflag || sflag || tflag || vflag)
cook_args(argv);
, so if any flag gets set we cook the args. Maybe instead, we cook with the
args. In this process we traverse through:

} else if (vflag)
{ if (!isascii(ch))
{ if (putchar('M') == EOF || putchar('-') == EOF)
break;
ch = toascii(ch);
}
if (iscntrl(ch)) {
if (putchar('^') == EOF ||
putchar(ch == '\177' ? '?' :
ch | 0100) == EOF)
break;
continue;
}
I did my best to get this on the screen. The parts I don't understand here
follow the double pipe, which I read as "inclusive or." In the first if
clause, it would appear that 'M' is substituted for non-ascii chars. What
does
|| putchar('-') == EOF)
do beyond this?

It uses a common convention (at least it's common on Unix) for
displaying non-printable characters. Control characters in the range
0 to 31 are represented as a '^' followed by another character,
usually an uppercase letter; it's determined by adding 64 to the
value. (On old keyboards, the control key actually worked by clearing
a bit in the 7-bit or 8-bit value that was transmitted.) The DEL
character, 127, is represented as ^?; this is a special case.
Characters with the high bit set, in the range 128 to 255, are called
"meta" characters (some old keyboards had a "meta" key that set this
bit), and are represented as "M-" followed by the representation of
the corresponding 7-bit character. For example, character 129 would
be printed as M-^A.

putchar() returns EOF on failure.

All this (except the EOF part) is very specific to the ASCII character
set, something that's not specified by the C standard, but it should
give you enough information to understand what the code is doing (with
a bit of work).
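
The convention described above can be sketched as a small function that builds the visible form of one byte. This is an illustration only: visible is a hypothetical helper, it assumes ASCII, and it ignores the fact that cat implementations normally leave newline and tab alone.

```c
/*
 * Hypothetical helper: write the "cat -v" style representation of the
 * byte value c (0..255) into buf, which must hold at least 5 chars
 * (the longest result is "M-^?" plus the terminating '\0').
 * Assumes ASCII.  Returns buf.
 */
char *visible(int c, char *buf)
{
    char *p = buf;

    if (c >= 128) {             /* "meta" character: high bit set */
        *p++ = 'M';
        *p++ = '-';
        c -= 128;               /* carry on with the 7-bit value */
    }
    if (c == 127) {             /* DEL is the special case */
        *p++ = '^';
        *p++ = '?';
    } else if (c < 32) {        /* control character: add 64 */
        *p++ = '^';
        *p++ = (char)(c + 64);
    } else {                    /* ordinary printable character */
        *p++ = (char)c;
    }
    *p = '\0';
    return buf;
}
```

With a 5-byte buffer, visible(129, buf) produces "M-^A" and visible(7, buf) produces "^G".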
Similarly, I'm out of my depth with what follows the double pipe in the
second if clause.
|| putchar(ch == '\177' ? '?' : ch | 0100) == EOF)

Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
that it's beyond the scope of my present endeavor and can be omitted.

No, it's not a trigraph; trigraphs are introduced by a double question
mark. It's a character constant that uses an escape sequence. '\177'
is the character whose integer value is 177 in octal, or 127 in
decimal; it's the ASCII DEL character. "ch | 0100" yields the value
of ch with a certain bit forced on; it's a terse way of mapping
control-A (1) to 'A' and so forth. The conditional expression is used
to handle the fact that mapping DEL to "^?" is a special case.
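
The values mentioned above can be checked directly. A small sketch, assuming an ASCII execution character set; check_octal_and_mask is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical check function; every assertion assumes ASCII. */
void check_octal_and_mask(void)
{
    assert('\177' == 127);      /* octal escape: 177 octal is 127 decimal */
    assert(0100 == 64);         /* a leading 0 marks an octal constant */
    assert((1 | 0100) == 'A');  /* control-A maps to 'A' */
    assert((7 | 0100) == 'G');  /* BEL maps to 'G' */
}
```

If any of these did not hold on a given system, the assert would abort the program, which is one quick way to find out whether the ASCII assumption applies.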
 
c gordon liddy

Keith Thompson said:
"c gordon liddy" <[email protected]> writes:
Yes, that code is archaic, but it's of some historical interest (to
show how much the language has improved if nothing else).
Thanks for your generous response. I understood the above, but about
right here is where I hit the peter principle:
It uses a common convention (at least it's common on Unix) for
displaying non-printable characters. Control characters in the range
0 to 31 are represented as a '^' followed by another character,
usually an uppercase letter; it's determined by adding 64 to the
value. (On old keyboards, the control key actually worked by clearing
a bit in the 7-bit or 8-bit value that was transmitted.) The DEL
character, 127, is represented as ^?; this is a special case.
Characters with the high bit set, in the range 128 to 255, are called
"meta" characters (some old keyboards had a "meta" key that set this
bit), and are represented as "M-" followed by the representation of
the corresponding 7-bit character. For example, character 129 would
be printed as M-^A.

putchar() returns EOF on failure.

All this (except the EOF part) is very specific to the ASCII character
set, something that's not specified by the C standard, but it should
give you enough information to understand what the code is doing (with
a bit of work).


No, it's not a trigraph; trigraphs are introduced by a double question
mark. It's a character constant that uses an escape sequence. '\177'
is the character whose integer value is 177 in octal, or 127 in
decimal; it's the ASCII DEL character. "ch | 0100" yields the value
of ch with a certain bit forced on; it's a terse way of mapping
control-A (1) to 'A' and so forth. The conditional expression is used
to handle the fact that mapping DEL to "^?" is a special case.

I think I could study the above for a long time and not really get
it. It's interesting but not germane to something that can be done in
standard C. I have a double problem with the double pipe here. Not
only is that which is on the right hand side of it obfuscated C, I
don't get the control mechanism. To me, it looks like
if this then that or the other.

I would suspect that there's another K&R exercise that speaks to this
in the realm of my own OS.

--
 
Keith Thompson

c gordon liddy said:
I think I could study the above for a long time and not really get
it. It's interesting but not germane to something that can be done in
standard C. I have a double problem with the double pipe here. Not
only is that which is on the right hand side of it obfuscated C, I
don't get the control mechanism. To me, it looks like
if this then that or the other.

The quoted code, as far as I can tell, *is* standard C. I don't
believe it's deliberately obfuscated; rather, it's unusually terse,
written in a style that favors packing lots of information into
complex expressions rather than breaking it down into separate
statements.

You can skip it and go on to something easier if you like, but you
might consider taking one more stab at it.

Let's take a look at the statement:

    if (iscntrl(ch)) {
        if (putchar('^') == EOF ||
            putchar(ch == '\177' ? '?' :
                    ch | 0100) == EOF)
            break;
        continue;
    }

    if ch is a control character then
        if printing '^' fails *or* printing another character fails then
            break out of the loop (give up)
        end if
        Printing succeeded; nothing more to do here: "continue"
    end if

iscntrl(ch) returns true if ch is a "control character". In this
context, it tells us that it's a non-printable character that we want
to represent as a '^' followed by another character (^G for the ASCII
BEL character, ^? for DEL).

Within the if statement we see two calls to putchar(), one to print
the '^' character and one to print whatever follows it. Both results
are compared against EOF (which indicates failure); if either
putchar() fails, we break out of the loop.

The part before the "||" is reasonably clear: try to print a '^'
character and check whether the attempt failed. "||" is a
short-circuit operator, evaluating its right operand only if the left
operand is false, so if the first putchar call fails we won't attempt
the second one.
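
The short-circuit behaviour can be observed with a stand-in for putchar() that counts how often it is called. This is only a sketch; step and evaluated are hypothetical names.

```c
static int calls;   /* how many times step() has run */

/* Hypothetical stand-in for putchar(): nonzero means "failed". */
static int step(int fail)
{
    calls++;
    return fail;
}

/* Returns how many operands of || were actually evaluated. */
int evaluated(int first_fails, int second_fails)
{
    calls = 0;
    if (step(first_fails) || step(second_fails)) {
        /* at least one step failed; the cat code does "break" here */
    }
    return calls;
}
```

evaluated(1, 0) returns 1 because the right operand is never reached, while evaluated(0, 0) and evaluated(0, 1) both return 2.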

Now let's look at the part after the "||":

putchar(ch == '\177' ? '?' : ch | 0100) == EOF

We've covered the higher level control flow, so we're down to figuring
out what the heck

ch == '\177' ? '?' : ch | 0100

means. Some parentheses might make it clearer:

(ch == '\177') ? ('?') : (ch | 0100)

If ch is equal to '\177' (character 177 octal, 127 decimal, ASCII
DEL), the expression yields '?'. The result is that we print a '?'
after the '^'.

Otherwise (for any other control character), the result is (ch |
0100). 0100, since it begins with '0', is an octal constant, equal to
64, a power of 2. "|" is the bitwise "or" operator.

The binary value of 0100 is 01000000. Suppose the value of ch is 7
(ASCII BEL, which we're going to want to print as "^G"). 7 is
00000111. Applying bitwise or to these two operands gives us
01000111, which is 0107 in octal, or 71 in decimal, or 'G' in ASCII.

0100 (octal) is being used as a bit mask; it has a single bit set to
1, and all others set to 0. (ch | 0100) yields the value of ch with
the bit in that particular position turned on. As it happens, that's
a terse way to specify a transformation from a control character to
the corresponding letter.

Note that (ch + 64) would have worked just as well in this context
(since we know the bit we want to turn on isn't already on). The
author probably chose to write "ch | 0100" because he thought of the
operation as setting a bit, not as the equivalent addition.
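
The equivalence holds only while the 64 bit is clear, which is true for every value from 0 to 31. A small sketch that compares the two forms; or_equals_add is a hypothetical name.

```c
/*
 * Hypothetical helper: returns 1 if setting bit 0100 in c gives the
 * same result as adding 64 to c, and 0 otherwise.
 */
int or_equals_add(int c)
{
    return (c | 0100) == (c + 64);
}
```

For c from 0 through 31 this returns 1. For 'A' (65 in ASCII) it returns 0: the bit is already set, so the "or" leaves 65 unchanged while the addition gives 129.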

Here's a much more verbose chunk of code that does the same thing.
I've kept the "ch | 0100" idiom, but expanded everything else. The
original code is more terse than I tend to like; the following is much
too verbose for my taste, but it might be clearer. (I've compiled it,
but I haven't tested it.)

    if (iscntrl(ch)) {
        /* ch is a control character */
        int result;

        /*
         * The two characters we want to print.  The first is '^';
         * we don't know yet what the second is.
         */
        int ch1 = '^';
        int ch2;

        /* Try to print the first character. */
        result = putchar(ch1);
        if (result == EOF) {
            /* Failed, terminate the loop */
            break;
        }

        if (ch == '\177') {
            /* ch is DEL, we want "^?" */
            ch2 = '?';
        }
        else {
            /*
             * ch is another control character.
             * Transform 1 to 'A', 2 to 'B', etc. using
             * our intimate knowledge of ASCII encoding.
             */
            ch2 = ch | 0100;
        }

        /* Print as above */
        result = putchar(ch2);
        if (result == EOF) {
            break;
        }

        /* Both characters printed; go on to the next input character. */
        continue;
    }
 
c gordon liddy

This certainly puts my task into sharper relief. I think with what
I've got, I can write this thing now.

It was the double-whammy for me that I couldn't understand the double
pipe in two different ways. That short-circuit stuff makes sense.

I'll need to write parts of this from scratch. I've decided to forget
about getopt for now and instead write catv.c to expect a single file
and to convert characters as above.

The final kink I've got to work out is that all the available
literature shows the output coming out buffered instead of one char at
a time. I'll hope to use Heathfield's safegets and have a robust
result. I think if I pay attention to the above, it will obtain.

Thanks for the enlightening keystrokes.
--
 
c gordon liddy

K&R 7.5 has the text that includes the cat function that is alluded to
in 8.1. The filecopy there uses characters instead of buffers to do
its business. I believe it is better suited to my current task than
using buffers. The part needing revision to account for the -v
behavior appears as an external, void function. Main makes the
adjustment for the output to go to stdout.

/*filecopy */
void filecopy(FILE *ifp, FILE *ofp)
{
int c;

while((c=getc(ifp)) != EOF)
putc(c, ofp);
}

I don't know whether I'll be able to get the job done with one int, so
I'll put c in reserve and use ch to match the source I snipped from
the bsd site. I've further added symbols to match Keith's verbose
version.

/*filecopy */
void filecopy(FILE *ifp, FILE *ofp)
{
int c;
int ch;
int result;

while((ch=getc(ifp)) != EOF)
putc(ch, ofp);
}

So, I've got to exchange this for the putc statement:

if (iscntrl(ch)) {
/* ch is a control character */
int result;

/*
* The two characters we want to print. The first is '^';
* we don't know yet what the second is.
*/
int ch1 = '^';
int ch2;

/* Try to print the first character. */
result = putchar(ch1);
if (result == EOF) {
/* Failed, terminate the loop */
break;
}

if (ch == '\177') {
/* ch is DEL, we want "^?" */
ch2 = '?';
}
else {
/*
* ch is another control character.
* Transform 1 to 'A', 2 to 'B', etc. using
* our intimate knowledge of ASCII encoding.
*/
ch2 = ch | 0100;
}

/* Print as above */
result = putchar(ch2);
if (result == EOF) {
break;
}
}


So I think I'm ready to take this to a compiler. I'm on someone
else's laptop. It probably does have a compiler, but its owner is in
an online naval battle. Our girlfriends are at the theatre. I love
theatre when I don't have to go.

Because I have to make the keystrokes, I'll finish with the caller.
No non-standard headers here:
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp;
    void filecopy(FILE *, FILE *);

    if (argc < 2) printf("die");
    else
        while (--argc > 0)
            if ((fp = fopen(*++argv, "r")) == NULL)
            {
                printf("catv can't open %s\n", *argv);
                return 1;
            }
            else
            {
                filecopy(fp, stdout);
                fclose(fp);
            }

    return 0;
}
Since the google portal is the only way for me to get this back to my
own machine, I include reference material after the sig.

--
c gordon liddy



[Keith Thompson's explanation of the iscntrl block, quoted in full above, snipped]
 

Barry Schwarz

On Sat, 29 Mar 2008 22:36:19 -0700 (PDT), c gordon liddy


snip 160 lines of obsolete commentary

Please trim your posts when responding
K&R 7.5 has the text that includes the cat function that is alluded to

snip code you don't intend to use
/*filecopy */
void filecopy(FILE *ifp, FILE *ofp)
{
int c;
int ch;
int result;

while((ch=getc(ifp)) != EOF)
putc(ch, ofp);
}

So, I've got to exchange this for the putc statement:

if (iscntrl(ch)) {
/* ch is a control character */
int result;

/*
* The two characters we want to print. The first is '^';
* we don't know yet what the second is.
*/
int ch1 = '^';
int ch2;

/* Try to print the first character. */
result = putchar(ch1);

The putc call this is replacing used ofp. This is forced to stdout.
Was that deliberate?
if (result == EOF) {
/* Failed, terminate the loop */
break;

Your main calls this function in a loop. A failure here is probably
permanent. How do you tell the caller that things are broken and it
should stop calling you?

You may want to have the function return a status and let the calling
function evaluate that status before iterating the loop.
}

if (ch == '\177') {
/* ch is DEL, we want "^?" */
ch2 = '?';
}
else {
/*
* ch is another control character.
* Transform 1 to 'A', 2 to 'B', etc. using
* our intimate knowledge of ASCII encoding.
*/
ch2 = ch | 0100;

Why limit yourself to ASCII? Why use an octal constant to obfuscate
the code? If you want to transform integers to letters, build a
static array and select the character using the integer as the index.
Make it constant and give it file scope. Something along the lines of
const char transform[] = "@ABCD...XYZ~";
ch2 = transform[ch];
You will need to add a range check on ch since I'm not aware of any
guarantee that all control characters have an integer value <= 26.
}

/* Print as above */
result = putchar(ch2);
if (result == EOF) {
break;
}
}
snip commentary
int main(int argc, char **argv)
{

FILE *fp;
void filecopy(FILE *, FILE *);

if (argc < 2) printf("die");

Avoid portability issues and include a \n in your print string.
else
while (--argc > 0)

As a matter of style, the absence of braces will eventually cause you
problems. Right now your if and else are close enough to be visually
obvious. That will not always be the case. Many adopt the style of
using braces even when the range of the loop is a simple one-line
statement. I'm not terribly consistent myself in that situation but I
always use braces when the range occupies multiple lines.
if ((fp = fopen(*++argv, "r")) == NULL)

Since you expect the file to contain control characters, you should
open it for binary input, not text. The reason is that some operating
systems will cause getc to return EOF when reading the control
character that they think marks the end of text.
{
printf("catv can't open %s\n", *argv);
return 1;

Use EXIT_FAILURE instead of 1 for portability.
}
else
{
filecopy(fp, stdout);
fclose(fp);
}

Avoid portability issues and
putchar('\n');
when you finish.

If you run in Windows, a
getchar();
here will keep the window open long enough for you to read the output.
return 0;
}
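
Several of the points above (write to ofp rather than stdout, return a status the caller can test) can be combined in one version of filecopy. This is a sketch, not the OpenBSD code: filecopy_v is a hypothetical name, ASCII is assumed, and passing newline and tab through unchanged is an assumption based on the usual cat -v behaviour.

```c
#include <stdio.h>

/*
 * Sketch of a -v style filecopy, assuming ASCII.  Returns 0 on
 * success and EOF if a read or write failed, so the caller can stop
 * looping over files.
 */
int filecopy_v(FILE *ifp, FILE *ofp)
{
    int ch;

    while ((ch = getc(ifp)) != EOF) {
        if (ch != '\n' && ch != '\t') {     /* assumed pass-through */
            if (ch >= 128) {                /* meta: print "M-" first */
                if (putc('M', ofp) == EOF || putc('-', ofp) == EOF)
                    return EOF;
                ch -= 128;
            }
            if (ch < 32 || ch == 127) {     /* control: "^X" or "^?" */
                if (putc('^', ofp) == EOF ||
                    putc(ch == 127 ? '?' : (ch | 0100), ofp) == EOF)
                    return EOF;
                continue;
            }
        }
        if (putc(ch, ofp) == EOF)           /* ordinary character */
            return EOF;
    }
    return ferror(ifp) ? EOF : 0;
}
```

The caller would open each file with "rb", stop at the first EOF status, and return EXIT_FAILURE from <stdlib.h> in that case.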


Remove del for email
 
C. Gordon Liddy

Your main calls this function in a loop. A failure here is probably
permanent. How do you tell the caller that things are broken and it
should stop calling you?
Don't know.

I would hope that an OS would just ignore a main call from the
same place a million times a second. My guess is that you have a means for
an OS to decide without burning the chip.

You may want to have the function return a status and let the calling
function evaluate that status before iterating the loop.
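A minimal sketch of that status-return idea, for illustration only. The name filecopy_status and the 0 / -1 convention are assumptions, not something from the thread, and the copy is a plain byte copy without the -v handling:

```c
#include <stdio.h>

/*
 * Hypothetical variant of filecopy: copies ifp to ofp and reports
 * whether it worked. Returns 0 on success, -1 on a read or write error.
 */
int filecopy_status(FILE *ifp, FILE *ofp)
{
    int ch;

    while ((ch = getc(ifp)) != EOF) {
        if (putc(ch, ofp) == EOF) {
            return -1;          /* write failed: tell the caller to stop */
        }
    }
    /* getc returns EOF for both end of file and error; ferror tells them apart */
    return ferror(ifp) ? -1 : 0;
}
```

The caller's loop can then test the result, for example `if (filecopy_status(fp, stdout) != 0) break;`, instead of calling the function again after a failure that is probably permanent.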
}

if (ch == '\177') {
/* ch is DEL, we want "^?" */
ch2 = '?';
}
else {
/*
* ch is another control character.
* Transform 1 to 'A', 2 to 'B', etc. using
* our intimate knowledge of ASCII encoding.
*/
ch2 = ch | 0100;

Why limit yourself to ASCII? Why use an octal constant to obfuscate
the code? If you want to transform integers to letters, build a
static array and select the character using the integer as the index.
Make it constant and give it file scope. Something along the lines of
const char transform[] = "@ABCD...XYZ~";
ch2 = transform[ch];
You will need to add a range check on ch since I'm not aware of any
guarantee that all control characters have an integer value <= 26.

How do I do that? Are ctrl chars defined by ascii?
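
On the second question: ASCII does define its control characters, as the values 0 to 31 plus DEL (127); the C standard itself does not fix those values. A minimal sketch of the table-plus-range-check idea follows. The table contents are written out here under the assumption of ASCII caret notation (the string quoted above is elided and may differ), and caret_letter is a hypothetical name:

```c
#include <stddef.h>

/* Caret letters for the values 0..31, assuming ASCII caret notation. */
static const char transform[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";

/*
 * Hypothetical helper: return the character to print after '^' for ch,
 * or 0 when ch falls outside the table.
 */
int caret_letter(int ch)
{
    size_t count = sizeof transform - 1;    /* exclude the terminating '\0' */

    if (ch < 0 || (size_t)ch >= count) {
        return 0;                           /* range check failed */
    }
    return transform[ch];
}
```

A caller would treat a return of 0 as "no table entry" and handle that value separately, which is where DEL would go.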

We'll C. It's time for me to step away.

#include <stdio.h>
#include <ctype.h> /* iscntrl; isascii and toascii are POSIX extensions, not ISO C */

int main(int argc, char **argv)
{

FILE *fp;
void filecopy(FILE *, FILE *);

if (argc < 2) printf("die\n");
else
while (--argc > 0)
if ((fp = fopen(*++argv, "rb")) == NULL)
{
printf("catv can't open %s\n", *argv);
return 1;
}
else
{
filecopy(fp, stdout);
fclose(fp);
}

return 0;
}

/*filecopy */
void filecopy(FILE *ifp, FILE *ofp)
{
int ch;
int result;
int ch1 = '^';
int ch2;

while((ch=getc(ifp)) != EOF)
{
if (iscntrl(ch))
{

result = putchar(ch1);
if (result == EOF)
{
/* Failed, terminate the loop */
break;
}

if (ch == '\177')
{
/* ch is DEL, we want "^?" */
ch2 = '?';
}
else if(ch == 26)
{
/* we don't want ctrl-z coming out of here */
ch2 = '#';
}

else
{
/*
* ch is another control character.
* Transform 1 to 'A', 2 to 'B', etc. using
* our intimate knowledge of ASCII encoding.
*/
ch2 = ch | 0100;
}

/* Print as above */
result = putchar(ch2);
if (result == EOF)
{
break;
}


// outer brace of if (iscntrl(ch))
}


else if (!isascii(ch))
{
if (putchar('M') == EOF || putchar('-') == EOF)
break;
ch = toascii(ch);
putchar(ch);
}



else putchar(ch);

// outer brace of while control
}

putchar('\n');
// ready to exit

getchar();

// outer brace of function
}
// gcc -o catv catv9.c >text22.txt 2>text23.txt
// catv text42.txt >text43.txt
 

David Thompson

"c gordon liddy" writes:

It uses a common convention (at least it's common on Unix) for
displaying non-printable characters. Control characters in the range
0 to 31 are represented as a '^' followed by another character,
usually an uppercase letter; it's determined by adding 64 to the
value. (On old keyboards, the control key actually worked by clearing
a bit in the 7-bit or 8-bit value that was transmitted.) The DEL
character, 127, is represented as ^?; this is a special case.

^x was common on many ASCII systems. Control clears the 0x40 bit if the
character is in 0x40-0x5F, usually after ignoring shift (0x20) for the
letters 0x41-0x5A.
Characters with the high bit set, in the range 128 to 255, are called
"meta" characters (some old keyboards had a "meta" key that set this
bit), and are represented as "M-" followed by the representation of
the corresponding 7-bit character. For example, character 129 would
be printed as M-^A.
Few keyboards had meta, but some programs, notably Emacs (developed at
MIT), used it, and on others you use an ESCape prefix instead. Meta
'characters' are conventionally displayed as M-x, but since they are
primarily used interactively and not stored, I wouldn't have thought
putting them in cat -v is very useful; it's more just for completeness.
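
The caret and M- conventions described above can be sketched roughly as follows. This assumes ASCII, the name visible is hypothetical, and unlike a real cat -v it makes no exception for newline or tab:

```c
/*
 * Hypothetical helper: write the display form of one byte into out
 * (which must hold at least 5 chars) and return out. Assumes ASCII.
 */
char *visible(unsigned char byte, char *out)
{
    char *p = out;
    unsigned ch = byte;

    if (ch >= 128) {                /* "meta": high bit set */
        *p++ = 'M';
        *p++ = '-';
        ch -= 128;                  /* carry on with the 7-bit character */
    }
    if (ch < 32 || ch == 127) {     /* ASCII control character or DEL */
        *p++ = '^';
        *p++ = (ch == 127) ? '?' : (char)(ch + 64);
    } else {
        *p++ = (char)ch;            /* printable: shown as itself */
    }
    *p = '\0';
    return out;
}
```

With a `char buf[5];`, `visible(129, buf)` produces the "M-^A" spelling from the example above.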
putchar() returns EOF on failure.

All this (except the EOF part) is very specific to the ASCII character
set, something that's not specified by the C standard, but it should
give you enough information to understand what the code is doing (with
a bit of work).
Yes.


No, it's not a trigraph; trigraphs are introduced by a double question
mark. It's a character constant that uses an escape sequence. '\177'
is the character whose integer value is 177 in octal, or 127 in
decimal; it's the ASCII DEL character. "ch | 0100" yields the value
of ch with a certain bit forced on; it's a terse way of mapping
control-A (1) to 'A' and so forth. The conditional expression is used
to handle the fact that mapping DEL to "^?" is a special case.

It needn't be; (in ASCII) ch ^ 0100 /* or 0x40 */ works for both the
'normal control' (!) characters 0x00-0x1F and DEL 0x7F. This is not an
accident; the ^? display was chosen because of this.
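
A small sketch of that XOR point, assuming ASCII; caret_partner is a hypothetical name used only for illustration:

```c
/*
 * Hypothetical helper: flip the 0x40 bit (0100 in octal).
 * For 0x00-0x1F this sets the bit, giving '@', 'A', ..., '_'.
 * For DEL (0x7F) it clears the bit, giving 0x3F, which is '?'.
 */
int caret_partner(int ch)
{
    return ch ^ 0x40;
}
```

So with XOR the DEL case needs no separate branch: `caret_partner(1)` is 'A' and `caret_partner(0x7F)` is '?'.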

- formerly david.thompson1 || achar(64) || worldnet.att.net
 
