cat


Jag

I've read parts of K&R's ANSI C v2 and this is what their cat looked
like but when I compared the speed of this code to gnu cat, it seems
very slow. How do I optimize this for greater speed? Is there an
alternative algorithm?

void catfile(FILE *in, FILE *out) {
    register int num_char;

    /*Get characters*/
    while ((num_char = getc(in)) != EOF) {
        /*Print to standard output*/
        putc(num_char, out);
    }
}

Thanks.
 

CBFalconer

Jag said:
I've read parts of K&R's ANSI C v2 and this is what their cat
looked like but when I compared the speed of this code to gnu
cat, it seems very slow. How do I optimize this for greater
speeds? is there an alternative algorithm?

void catfile(FILE *in, FILE *out) {
    register int num_char;

    /*Get characters*/
    while ((num_char = getc(in)) != EOF) {
        /*Print to standard output*/
        putc(num_char, out);
    }
}

You may find the following faster:

void catfile(FILE *in, FILE *out) {
    int ch;

    while (EOF != (ch = getc(in))) putc(out, ch);
}

If so, consider why.
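As written, putc(out, ch) has its arguments reversed: the C standard declares int putc(int c, FILE *stream), so a conforming compiler must diagnose that call. A minimal sketch of the same one-line loop with only the argument order corrected:

```c
#include <stdio.h>

/* Same loop as above; the only change is putc's argument order,
 * which the C standard specifies as putc(int c, FILE *stream). */
void catfile(FILE *in, FILE *out)
{
    int ch; /* int, not char, so EOF can be told apart from data */

    while (EOF != (ch = getc(in)))
        putc(ch, out);
}
```

Apart from that fix it does the same work as the K&R loop, so there is no particular reason to expect it to run faster.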
 

Micah Cowan

Jag said:
I've read parts of K&R's ANSI C v2 and this is what their cat looked
like but when I compared the speed of this code to gnu cat, it seems
very slow. How do I optimize this for greater speeds? is there an
alternative algorithm?

void catfile(FILE *in, FILE *out) {
    register int num_char;

    /*Get characters*/
    while ((num_char = getc(in)) != EOF) {
        /*Print to standard output*/
        putc(num_char, out);
    }
}

This example was intended to be just that: an example. It gives a
clear, concise method for copying in to out.

If you look at GNU cat's source code, you'll see that it's nowhere
near as clear and concise (this is due in no small part to the fact
that it has to do a good deal more than the above example does,
including some options that require a bit of processing on the input).

POSIX systems that support the thread-safety option have to lock the
I/O streams every time you call getc() or putc(). On such systems,
getc_unlocked() and putc_unlocked() (POSIX options, not Standard C)
will usually be notably faster. Some implementations also provide
other means for using the "unlocked" versions.
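A minimal sketch of that idea, assuming a POSIX system: lock each stream once with flockfile() for the whole copy, then use getc_unlocked() and putc_unlocked() inside the loop. The name catfile_unlocked is made up for this example, and how much faster it is (if at all) depends on the implementation.

```c
#define _POSIX_C_SOURCE 200809L /* expose flockfile() and the *_unlocked calls */
#include <stdio.h>

/* POSIX only, not Standard C: take each stream's lock once for the
 * whole copy instead of once per getc()/putc() call. POSIX stream
 * locks are recursive, so in == out would not deadlock here. */
void catfile_unlocked(FILE *in, FILE *out)
{
    int ch;

    flockfile(in);
    flockfile(out);
    while ((ch = getc_unlocked(in)) != EOF)
        putc_unlocked(ch, out);
    funlockfile(out);
    funlockfile(in);
}
```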

Other common techniques would be to read in larger blocks at a time,
or perhaps to avoid the Standard C buffered I/O calls and use POSIX
read() and write() (it's unclear to me how much better that is than
using setvbuf() to turn off buffering and then calling fread() and fwrite(),
which is a good option if you'd like to stick to Standard C).
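A sketch of the Standard C block-copy approach with fread() and fwrite(). The function name catfile_blocks and the 64 KiB buffer size are arbitrary choices for illustration, not taken from GNU cat. setvbuf() is left out because it may only be called before any other operation on the stream, which a function handed already-open streams cannot guarantee.

```c
#include <stdio.h>

/* Hypothetical buffer size; tune for your system. */
#define BUFFER_SIZE 65536

/* Copy in to out in large blocks using only Standard C.
 * Returns 0 on success, -1 on a read or write error. */
int catfile_blocks(FILE *in, FILE *out)
{
    static char buf[BUFFER_SIZE]; /* static keeps it off the stack (not reentrant) */
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;              /* a short write means an error */
    }
    /* fread returned 0: either end-of-file or a read error */
    return ferror(in) ? -1 : 0;
}
```

Each library call now moves up to BUFFER_SIZE bytes instead of one character, which cuts the per-call overhead (including any per-call locking).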

Take a look at the source code for GNU cat and similar
utilities like dd for (Unix-specific) ideas on how to improve
efficiency.
 

Micah Cowan

CBFalconer said:
You may find the following faster:

void catfile(FILE *in, FILE *out) {
    int ch;

    while (EOF != (ch = getc(in))) putc(out, ch);
}

If so, consider why.

I'm having a heck of a time figuring out why that should be any faster
(note that the order of your arguments for putc() is wrong).

The only differences I see are:
1. You changed some names
2. You elided the "register" keyword (which the compiler was free to
ignore anyway)
3. You put EOF first in the comparison
4. You elided some braces and a comment.
 

Eric Sosman

CBFalconer said:
You may find the following faster:

void catfile(FILE *in, FILE *out) {
    int ch;

    while (EOF != (ch = getc(in))) putc(out, ch);
}

If so, consider why.

I don't see why it would be faster. As far as I can
tell, the only substantive change is the removal of
`register', which is unlikely to make a difference --
and if it does make a difference, it's likely to make
the revised code slower, not faster.

(Well, there's one other "speed improvement," and it
could be a large one: Code that doesn't compile uses very
little execution time! Have another look at the putc()
call ...)
 

Richard

Micah Cowan said:
I'm having a heck of a time figuring out why that should be any faster
(note that the order of your arguments for putc() is wrong).

The only differences I see are:
1. You changed some names
2. You elided the "register" keyword (which the compiler was free to
ignore anyway)
3. You put EOF first in the comparison
4. You elided some braces and a comment.

It is highly unlikely to be faster. It is, though, very ugly and unlikely
to pass any half-decent code review, because its condition and statement
are on one line, which makes it nigh on impossible to set a watchpoint
and/or breakpoint on the putc line if you were trying to trace or debug the
program. Falconer's continual disregard for programming niceties for
teamwork is somewhat surprising considering his vocal insistence on 100%
standards compliance. Unless (and it wouldn't surprise me) he considers
the removal of register and the chopping out of some white space as
improving the compilation time... And on that subject, I often find
it pays dividends in the maintenance stakes to always bracket off the
body of conditions, even if they are only one line, e.g.:

,----
| while(c){
|     c=do(c);
| }
`----
 

Gerry Ford

Ohimigod!

I am recently given to understand that cat means something other than
one-half of the dead felines in arbitrary buckets.

I sense that you drank of the forbidden K&R chapter, forbidden, not by these
jerks but by the standard, Dan Pop, Elvis, and me.
 

Micah Cowan

Richard said:
And on that subject I often find
it pays dividends in the maintenance stakes to always bracket off the
body of conditions even if they are only one line e.g

,----
| while(c){
|     c=do(c);
| }
`----

You state that it pays dividends, but neglect to state how/why.

I'm guessing you mean that, if you have:

while(c)
    c=do(c);

And then later want to add another statement in the while's body, then
you have to go through the tedium of adding braces first.

If that's what you mean, then my answer is:
- It's not appreciably harder to add braces later than it is to put
them in in the first place.
- In a decent editor, such as emacs or vim, it's simple to define a
new macro to automatically add (or remove) braces for one-line
bodies.
 

Richard Heathfield

Micah Cowan said:

I'm guessing you mean that, if you have:

while(c)
    c=do(c);

And then later want to add another statement in the while's body, then
you have to go through the tedium of adding braces first.

That isn't the issue. If the compound statement has two statements, you
need the braces anyway, and it doesn't make a lot of odds whether you add
them now or later. In fact, if that *were* the only issue, deferring them
could save you (a minuscule amount of) work.

If that's what you mean, then my answer is:
- It's not appreciably harder to add braces later than it is to put
them in in the first place.

Agreed. BUT - it is appreciably harder to remember to add them later on
special occasions than to put them in every time as a matter of habit.

- In a decent editor, such as emacs or vim, it's simple to define a
new macro to automatically add (or remove) braces for one-line
bodies.

That's as may be, but it isn't the work of adding such tools - it's the work
of remembering to use them. Unless, of course, by "automatic" you really
do mean automatic!! :)
 

Micah Cowan

Richard Heathfield said:
Agreed. BUT - it is appreciably harder to remember to add them later on
special occasions than to put them in every time as a matter of habit.

Hm. I haven't found it to be so.

while (c)
    c=do_it(c);
    c=do_another_thing(c);

looks too broken right away for me not to notice it (though, perhaps
now that I'm doing more Python coding work these days, that may
change?).

I used to actually always put the braces in. I've fallen out of that
practice, just because I find it slightly more readable without, for
one-line bodies.
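To make the pitfall in that example concrete: without braces, only the first statement after while (...) is the loop body, however it is indented. The two functions below are a hypothetical illustration (the counter stands in for do_another_thing()). GCC 6 and later can warn about the first pattern with -Wmisleading-indentation, which -Wall enables.

```c
/* Without braces: only i++ is the loop body. The indented second
 * statement runs once, after the loop ends. */
int without_braces(int *second_calls)
{
    int i = 0;

    *second_calls = 0;
    while (i < 3)
        i++;
        (*second_calls)++;   /* misleading indentation: NOT in the loop */
    return i;
}

/* With braces: both statements run on every iteration. */
int with_braces(int *second_calls)
{
    int i = 0;

    *second_calls = 0;
    while (i < 3) {
        i++;
        (*second_calls)++;   /* in the loop: runs three times */
    }
    return i;
}
```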
 

Jag

Micah Cowan said:
This example was intended to be just that: an example. It gives a
clear, concise method for copying in to out.

If you look at GNU cat's source code, you'll see that it's nowhere
near as clear and concise (this is due in no small part to the fact
that it has to do a good deal more than the above example does,
including some options that require a bit of processing on the input).

POSIX systems that support the thread-safety option have to lock the
I/O streams every time you call getc() or putc(). On such systems,
getc_unlocked() and putc_unlocked() (POSIX options, not Standard C)
will usually be notably faster. Some implementations also provide
other means for using the "unlocked" versions.

Other common techniques would be to read in larger blocks at a time,
or perhaps to avoid the Standard C buffered I/O calls and use POSIX
read() and write() (it's unclear to me how much better that is than
using setvbuf() to turn off buffering and then calling fread() and fwrite(),
which is a good option if you'd like to stick to Standard C).

Take a look at the source code for GNU cat and similar
utilities like dd for (Unix-specific) ideas on how to improve
efficiency.

Thanks!
 

Richard

Micah Cowan said:
You state that it pays dividends, but neglect to state how/why.

I assumed it would be obvious. I was wrong.

I'm guessing you mean that, if you have:

while(c)
    c=do(c);

And then later want to add another statement in the while's body, then
you have to go through the tedium of adding braces first.

It's not tedium. It's just easier at code-writing time, IMO. It also helps guard
against any silly mistakes with body lines not being placed in
brackets. It does no harm, so why not?

If that's what you mean, then my answer is:
- It's not appreciably harder to add braces later than it is to put
them in in the first place.

No, it's not. But why bother later when you can do it then and ensure that
the body is correctly guarded?

- In a decent editor, such as emacs or vim, it's simple to define a
new macro to automatically add (or remove) braces for one-line
bodies.

So what? How you add the brace isn't the issue.

It's just a small "style" thing that has served me well over the years.

Like not having multiple statements on one line other than trivial
constant assignments, I find it helps debugging and maintenance. Others
are free to disagree. As they will ;-)
 

Richard

Micah Cowan said:
Hm. I haven't found it to be so.

Yes. But it wasn't addressed to just you. I assume you can see why the
code that follows WOULD occur? And how adding that bracket considerably
reduces the chances of it slipping through?

while (c)
    c=do_it(c);
    c=do_another_thing(c);

looks too broken right away for me not to notice it (though, perhaps
now that I'm doing more Python coding work these days, that may
change?).

And yet
while (c){
    c=do_it(c);
    c=do_another_thing(c);
}

is absolutely obvious to anyone. Python or not.

K&R were clever guys. I like their indentation. A lot.

I used to actually always put the braces in. I've fallen out of that
practice, just because I find it slightly more readable without, for
one-line bodies.

Nothing really wrong with it for sure.
 

regis

Micah said:
Hm. I haven't found it to be so.

while (c)
    c=do_it(c);
    c=do_another_thing(c);

looks too broken right away for me not to notice it (though, perhaps
now that I'm doing more Python coding work these days, that may
change?).


while (c);
    c= do_it(c);

the error above is harder to find than:

while (c);{
    c= do_it(c);
}

you cannot miss the ';' between ')' and '{':
it is not a natural location for it, and in fact,
for this very reason, you'll rarely find such code.

But the first mistake is often found in the code of
my students. And when they ask why their code does not work
as expected, you may have a hard time catching
the extra semicolon.
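A small sketch of the stray-semicolon mistake. It uses a for loop rather than while (c); because, with c nonzero, while (c); simply never terminates. The function name is made up for the example.

```c
/* The ';' right after the for header is the entire (empty) loop body.
 * The braced block that follows is an ordinary compound statement, so
 * it runs exactly once, after the loop has finished. */
int count_with_stray_semicolon(void)
{
    int i, count = 0;

    for (i = 0; i < 5; i++);   /* BUG: loop body is just ';' */
    {
        count++;               /* runs once, not five times */
    }
    return count;
}
```

Calling count_with_stray_semicolon() returns 1, not 5, which is the kind of symptom students run into.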
 

CBFalconer

Micah said:
I'm having a heck of a time figuring out why that should be any
faster (note that the order of your arguments for putc() is wrong).

I think I just misread the original. Partly due to the extra
erroneous comments.
 

Charlton Wilbur

r> while (c);
r>     c= do_it(c);

r> the error above is harder to find than:

r> while (c);{
r>     c= do_it(c);
r> }

r> But the first mistake is often found in the code of my
r> students. And when they ask why their code does not work as
r> expected, you may have a hard time before catching the extra
r> semi-colon.

And then, after a semester where they've made this mistake a
half-dozen times, either they start recognizing the symptoms of that
mistake, they have learned to single-step through suspicious code in a
debugger, or they need to switch majors to something less
intellectually taxing.

Charlton
 

Micah Cowan

CBFalconer said:
I think I just misread the original.

Ah! I figured it might be something like that (or else that you left
something out that you didn't mean to), but wasn't sure.
Partly due to the extra
erroneous comments.

Er, erroneous? They look fine (though certainly spurious) to me. After
all, it does "Get characters", and then "Print to standard output".
 

SM Ryan

# I've read parts of K&R's ANSI C v2 and this is what their cat looked
# like but when I compared the speed of this code to gnu cat, it seems
# very slow. How do I optimize this for greater speeds? is there an
# alternative algorithm?

The system cat exploits features of the specific system that
are not available in ANSI C. For example on unix, you can
avoid stdio altogether, and do something like
read -> shared buffer -> write

You can also use asynchronous I/O and multiple buffers to
have overlapping reads and writes.

But all this is system specific.
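A minimal sketch of the read -> buffer -> write idea on a POSIX system, working on file descriptors rather than FILE streams. It retries when a call is interrupted by a signal (EINTR) and loops on short writes. The name copy_fd and the 64 KiB buffer are assumptions for illustration, and it does not attempt the asynchronous, multi-buffer variant.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* POSIX only: copy everything from in_fd to out_fd.
 * Returns 0 at end of input, -1 on error (errno is set). */
int copy_fd(int in_fd, int out_fd)
{
    static char buf[65536];   /* hypothetical size; not reentrant */

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                 /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted: retry the read */
            return -1;
        }

        /* write() may accept fewer bytes than requested; keep going */
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out_fd, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;         /* interrupted: retry the write */
                return -1;
            }
            p += w;
            n -= w;
        }
    }
}
```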
 

CBFalconer

Micah said:
Ah! I figured it might be something like that (or else that you left
something out that you didn't mean to), but wasn't sure.


Er, erroneous? They look fine (though certainly spurious) to me. After
all, it does "Get characters", and then "Print to standard output".

You can make a motion to accept "get chars", but not "print to
standard output". It makes its output to the stream *out. What it
gets are chars encoded as integers, from the stream *in.
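Taking that point on board, here is a sketch of the original function with comments that describe what it actually does, plus a check for read and write errors. Returning an int status instead of void is an addition for this example, not part of the K&R code.

```c
#include <stdio.h>

/* Copy every character from the stream in to the stream out
 * (which need not be stdout). Returns 0 on success, -1 if a read
 * or write error occurred. */
int catfile(FILE *in, FILE *out)
{
    int ch; /* int, not char: must hold every unsigned char value and EOF */

    /* Read characters (unsigned char values converted to int) from in */
    while ((ch = getc(in)) != EOF) {
        /* Write each one to out */
        if (putc(ch, out) == EOF)
            return -1;
    }
    /* getc() returns EOF at end-of-file and on error; ferror() tells which */
    return ferror(in) ? -1 : 0;
}
```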
 
