Bug/Gross InEfficiency in HeathField's fgetline program

John Bode · Oct 8, 2007

[snip]

Facts not in evidence. As others have pointed out, strcpy() may be
implemented in such a way that is faster than copying individual
characters in a loop. And elsethread RH has pointed out that this
code gets called *once*, that he has benchmarked it for obnoxiously
long inputs (tens of megabytes), and that the performance is perfectly
acceptable (< 1 us according to his profiler).

Click to expand...

Regardless, unnecessary call. Sorry.

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Keith Thompson · Oct 8, 2007

Richard said:
You don't have to defend Heathfield. His fairly unique style makes it
very clear, as did his followup, exactly what he meant.

I think I've misunderstood what *you* meant. I've been assuming that
you thought RH was claiming that the code doesn't work, and that your
"Word games don't even begin to cover it" remark was aimed at RH.
Looking back at this subthread, I see that this may have been a
misjudgement on my part.

In my defense, your defending Richard Heathfield is so out of
character for you that it didn't occur to me that you were doing so.
And I still don't know what "word games" is supposed to refer to.

He didn't even bother to read the code as he made clear in a later post.

Well, he didn't read it in enough depth to determine whether it works
(I don't recall his exact words). And that's why it's not obvious to
him that ... well, see above.

Mark McIntyre · Oct 8, 2007

What is hilarious? It should detect the failure of malloc quite
reliably. Of course the lack of blanks is rather foul.

in a post complaining about lack of clarity...

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

CBFalconer · Oct 8, 2007

Thad said:
What happens when malloc returns a null pointer?

While I am basically in favor of the 'tauter code' group, my
rewrite would have been different. For one thing, I don't like the
? coding. My solution:

char *dot_to_underscore(const char *s) {
char *t, *u;

if (!(t = u = malloc(strlen(s) + 1))) {
while (*u = *s++) {
if (*u == '.') *u = '_';
++u;
}
}
return t;
}

However, I dislike taking multiple scans of the same strings, so I
would probably have arranged for the routine to return strlen, and
a negative length if malloc fails. This decision depends heavily
on the use to which the routine is put.

All this is a non-factor, and basically tests individual styles and
preferances. While worthy of a discussion, it is not worth an
argument.

santosh · Oct 8, 2007

John said:
John Bode said:

news:[email protected]...

Click to expand...

[snip]

and inefficient.

Click to expand...

Facts not in evidence. As others have pointed out, strcpy() may be
implemented in such a way that is faster than copying individual
characters in a loop. And elsethread RH has pointed out that this
code gets called *once*, that he has benchmarked it for obnoxiously
long inputs (tens of megabytes), and that the performance is
perfectly acceptable (< 1 us according to his profiler).

Click to expand...

Regardless, unnecessary call. Sorry.

Click to expand...

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.

Mark McIntyre · Oct 8, 2007

Well, I agree, but you appear to have misunderstood it nonetheless. I said
*precisely* what I meant to say

You are the Cheshire Cat and I claim my five pounds.
gd&r
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Mark McIntyre · Oct 8, 2007

Wading through 20 lines of code

"Wading"?

that do a 4-line job in a roundabout way can also be expensive.

This is why we invented computers - to do tricky things like reading
20 lines of code....
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Malcolm McLean · Oct 8, 2007

Willem said:
This is another, much more suited for reading by non-C programmers:
char *dot_to_underscore(const char *src)
{
return substitute(copy_string(src), '.', '_');
}

My thought too. Here's my enhancement to the code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/*
replace all instances of target string with destination
Params: src - the input string
tar - target string to search for
rep - replacement string
Returns: malloced pointer to substituted string
Notes: greedy replacement, overlapping targets replaced only once.
*/
char *replace(const char *src, const char *tar, const char *rep)
{
char *sub;
char *answer;
const char *ptr;
char *out;
size_t srclen;
size_t tarlen;
size_t replen;
int Nmatches = 0;

srclen = strlen(src);
tarlen = strlen(tar);
replen = strlen(rep);

assert(tarlen > 0);

ptr = src;
while(sub = strstr(ptr, tar))
{
Nmatches++;
ptr = sub + tarlen;
}
answer = malloc(srclen + Nmatches * replen - Nmatches * tarlen + 1);
if(!answer)
return 0;

ptr = src;
out = answer;
while(sub = strstr(ptr, tar))
{
memcpy(out, ptr, sub - ptr);
out += sub - ptr;
memcpy(out, rep, replen);
out += replen;
ptr = sub + tarlen;
}
strcpy(out, ptr);

return answer;
}

/*
Let's unit test it.
*/
int main(int argc, char **argv)
{
char *result = 0;

printf("Replace all instances of arg2 with arg2 in arg1\n");
if(argc != 4)
exit(EXIT_FAILURE);

result = replace(argv[1], argv[2], argv[3]);
if(result)
printf("%s\n", result);
else
{
fprintf(stderr, "Out of memory\n");
exit(EXIT_FAILURE);
}
free(result);
return 0;
}

The OP's solution was a hacker's approach to the problem. Sometimes that is
what you need, for instance if the replacement function were a bottleneck.
But generally you don't. It is hard to read, hard to translate to other
programming languages.

But is my function actually an improvement?

result = replace(str, ".", "_");

is certainly as readable as the original call, in fact more so, because in
replace() we have a reusable function that can be called many times, and its
behaviour memorised.

However it is substantially more code. It is quite probably over-engineering
the answer. The there are certian problems - what is meant to happen if
target strings overlap, for instance, or if caller passes the empty string
as a target? If it is in a time critical portion of the code, it is
unacceptably slow. Whilst Richard Heathfield's code was lackadisical, it was
bug free and did work.

At the end of the day, it depends on your project. If code quality is
paramount, use my approach, if it is just knock-up code use one of the
others.

Richard Heathfield · Oct 8, 2007

John Bode said:

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

I couldn't possibly comment.

Richard Heathfield · Oct 8, 2007

Keith Thompson said:

Well, he didn't read it in enough depth to determine whether it works

Quite so. I'm not sure why Richard Riley finds this hard to understand.

Old Wolf · Oct 8, 2007

I am astonished that people claiming to be professional programmers
could be in any way "confused" or "unsure" about your concise
replacement for Heathfield's version.

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

That is NOT the replacement code suggested by
"Antoninus Twink". Perhaps you should get one of
those threaded newsreaders you keep whingeing
that everybody else should have, so that you can
check your facts before posting.

For the actual code, see:
http://groups.google.co.nz/group/comp.lang.c/msg/f2e171fc9691cd7d

Antoninus Twink · Oct 8, 2007

That is NOT the replacement code suggested by
"Antoninus Twink". Perhaps you should get one of
those threaded newsreaders you keep whingeing
that everybody else should have, so that you can
check your facts before posting.

You are splitting hares. It is functionally equivalent - its advantage
is a slightly shorter total length, but in exchange the loop doesn't
have an empty body.

And I find it amusing that you object to me posting under a Usenet
handle, when your own posts are from "Old Wolf".

Tor Rustad · Oct 8, 2007

John Bode wrote:

[...]

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

ROFL!

However, the optimizer can be playing tricks with you here. Anyway, RH
code was 96% more readable and maintainable, so OP was Trolling.

santosh · Oct 8, 2007

CBFalconer said:
While I am basically in favor of the 'tauter code' group, my
rewrite would have been different. For one thing, I don't like the
? coding. My solution:

char *dot_to_underscore(const char *s) {
char *t, *u;

if (!(t = u = malloc(strlen(s) + 1))) {

You mean you commence copy when malloc fails?

Tor Rustad · Oct 8, 2007

santosh said:
John Bode wrote:
[...]

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Click to expand...

Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.

I don't think so. My guess would be, either the measurement is
misleading, or strcpy() help alignment and execute some code in parallel.

CBFalconer · Oct 8, 2007

Antoninus said:
.... snip ...

You are splitting hares. It is functionally equivalent - its
advantage is a slightly shorter total length, but in exchange the
loop doesn't have an empty body.

Splitting hares will not reduce the poor beasts length, only its
width. The emptiness of the body depends (at least in part) on how
much of the interior spills after the split.

You might want to consider 'splitting hairs'.

Tor Rustad · Oct 8, 2007

CBFalconer said:
Thad Smith wrote:
[...]

What happens when malloc returns a null pointer?

Click to expand...

While I am basically in favor of the 'tauter code' group, my
rewrite would have been different. For one thing, I don't like the
? coding. My solution:

char *dot_to_underscore(const char *s) {
char *t, *u;

if (!(t = u = malloc(strlen(s) + 1))) {

That should run very fast!

Richard Tobin · Oct 9, 2007

Well I did a small comparison test between the two version on a string
of length 209,715,200 bytes, with the '.' character just before the
terminating null. I used clock to time the functions. For four runs for
each version here are the averages:

RH's version = 1.480000s
AT's version = 1.212500s

If you can be bothered, what happens if you replace the strcpy() in
RH's code with a memcpy() (the length being already known)?

-- Richard

santosh · Oct 9, 2007

Tor said:
santosh said:

John Bode wrote:
[...]

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Click to expand...

Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.

Click to expand...

I don't think so. My guess would be, either the measurement is
misleading, or strcpy() help alignment and execute some code in
parallel.

Well I did a small comparison test between the two version on a string
of length 209,715,200 bytes, with the '.' character just before the
terminating null. I used clock to time the functions. For four runs for
each version here are the averages:

RH's version = 1.480000s
AT's version = 1.212500s

[system is Pentium Dual Core 1.6 GHz with 1 Gb RAM]
So for this system at least, AT's version is significantly faster.

If anyone wants, I'll post the driver code.

Keith Thompson · Oct 9, 2007

Willem said:
By the way, the perl equivalent would be:
(ret = str) =~ s/\./_/g;
But that's off-topic.

Indeed, which is why I won't mention that the *correct* perl
equivalent is:

($ret = $str) =~ s/\./_/g;

Fibonacci	0	May 13, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
C language. work with text	3	Dec 10, 2021
code review	26	Feb 6, 2004
Can't solve problems! please Help	0	Sep 26, 2022
compressing charatcers	35	Apr 2, 2014
Strange bug	65	Nov 19, 2010
K&R exercise 5-5	10	Feb 20, 2007

Bug/Gross InEfficiency in HeathField's fgetline program

John Bode

Keith Thompson

Mark McIntyre

CBFalconer

santosh

Mark McIntyre

Mark McIntyre

Malcolm McLean

Richard Heathfield

Richard Heathfield

Old Wolf

Antoninus Twink

Tor Rustad

santosh

Tor Rustad

CBFalconer

Tor Rustad

Richard Tobin

santosh

Keith Thompson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads