Bug/Gross InEfficiency in HeathField's fgetline program

  • Thread starter Antoninus Twink
  • Start date
J

John Bode

[snip]
Facts not in evidence. As others have pointed out, strcpy() may be
implemented in such a way that is faster than copying individual
characters in a loop. And elsethread RH has pointed out that this
code gets called *once*, that he has benchmarked it for obnoxiously
long inputs (tens of megabytes), and that the performance is perfectly
acceptable (< 1 us according to his profiler).

Regardless, unnecessary call. Sorry.

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.
 
K

Keith Thompson

Richard said:
You don't have to defend Heathfield. His fairly unique style makes it
very clear, as did his followup, exactly what he meant.

I think I've misunderstood what *you* meant. I've been assuming that
you thought RH was claiming that the code doesn't work, and that your
"Word games don't even begin to cover it" remark was aimed at RH.
Looking back at this subthread, I see that this may have been a
misjudgement on my part.

In my defense, your defending Richard Heathfield is so out of
character for you that it didn't occur to me that you were doing so.
And I still don't know what "word games" is supposed to refer to.
He didn't even bother to read the code as he made clear in a later post.

Well, he didn't read it in enough depth to determine whether it works
(I don't recall his exact words). And that's why it's not obvious to
him that ... well, see above.
 
M

Mark McIntyre

What is hilarious? It should detect the failure of malloc quite
reliably. Of course the lack of blanks is rather foul.

in a post complaining about lack of clarity...

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
C

CBFalconer

Thad said:
What happens when malloc returns a null pointer?

While I am basically in favor of the 'tauter code' group, my
rewrite would have been different. For one thing, I don't like the
? coding. My solution:

char *dot_to_underscore(const char *s) {
char *t, *u;

if (!(t = u = malloc(strlen(s) + 1))) {
while (*u = *s++) {
if (*u == '.') *u = '_';
++u;
}
}
return t;
}

However, I dislike taking multiple scans of the same strings, so I
would probably have arranged for the routine to return strlen, and
a negative length if malloc fails. This decision depends heavily
on the use to which the routine is put.

All this is a non-factor, and basically tests individual styles and
preferances. While worthy of a discussion, it is not worth an
argument.
 
S

santosh

John said:
John Bode said:

[snip]
and inefficient.
Facts not in evidence. As others have pointed out, strcpy() may be
implemented in such a way that is faster than copying individual
characters in a loop. And elsethread RH has pointed out that this
code gets called *once*, that he has benchmarked it for obnoxiously
long inputs (tens of megabytes), and that the performance is
perfectly acceptable (< 1 us according to his profiler).

Regardless, unnecessary call. Sorry.

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.
 
M

Mark McIntyre

Well, I agree, but you appear to have misunderstood it nonetheless. I said
*precisely* what I meant to say

You are the Cheshire Cat and I claim my five pounds.
gd&r
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
M

Mark McIntyre

Wading through 20 lines of code

"Wading"?
that do a 4-line job in a roundabout way can also be expensive.

This is why we invented computers - to do tricky things like reading
20 lines of code....
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
M

Malcolm McLean

Willem said:
This is another, much more suited for reading by non-C programmers:
char *dot_to_underscore(const char *src)
{
return substitute(copy_string(src), '.', '_');
}

My thought too. Here's my enhancement to the code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/*
replace all instances of target string with destination
Params: src - the input string
tar - target string to search for
rep - replacement string
Returns: malloced pointer to substituted string
Notes: greedy replacement, overlapping targets replaced only once.
*/
char *replace(const char *src, const char *tar, const char *rep)
{
char *sub;
char *answer;
const char *ptr;
char *out;
size_t srclen;
size_t tarlen;
size_t replen;
int Nmatches = 0;

srclen = strlen(src);
tarlen = strlen(tar);
replen = strlen(rep);

assert(tarlen > 0);

ptr = src;
while(sub = strstr(ptr, tar))
{
Nmatches++;
ptr = sub + tarlen;
}
answer = malloc(srclen + Nmatches * replen - Nmatches * tarlen + 1);
if(!answer)
return 0;

ptr = src;
out = answer;
while(sub = strstr(ptr, tar))
{
memcpy(out, ptr, sub - ptr);
out += sub - ptr;
memcpy(out, rep, replen);
out += replen;
ptr = sub + tarlen;
}
strcpy(out, ptr);

return answer;
}

/*
Let's unit test it.
*/
int main(int argc, char **argv)
{
char *result = 0;

printf("Replace all instances of arg2 with arg2 in arg1\n");
if(argc != 4)
exit(EXIT_FAILURE);

result = replace(argv[1], argv[2], argv[3]);
if(result)
printf("%s\n", result);
else
{
fprintf(stderr, "Out of memory\n");
exit(EXIT_FAILURE);
}
free(result);
return 0;
}

The OP's solution was a hacker's approach to the problem. Sometimes that is
what you need, for instance if the replacement function were a bottleneck.
But generally you don't. It is hard to read, hard to translate to other
programming languages.

But is my function actually an improvement?

result = replace(str, ".", "_");

is certainly as readable as the original call, in fact more so, because in
replace() we have a reusable function that can be called many times, and its
behaviour memorised.

However it is substantially more code. It is quite probably over-engineering
the answer. The there are certian problems - what is meant to happen if
target strings overlap, for instance, or if caller passes the empty string
as a target? If it is in a time critical portion of the code, it is
unacceptably slow. Whilst Richard Heathfield's code was lackadisical, it was
bug free and did work.

At the end of the day, it depends on your project. If code quality is
paramount, use my approach, if it is just knock-up code use one of the
others.
 
R

Richard Heathfield

Keith Thompson said:
Well, he didn't read it in enough depth to determine whether it works

Quite so. I'm not sure why Richard Riley finds this hard to understand.
 
O

Old Wolf

I am astonished that people claiming to be professional programmers
could be in any way "confused" or "unsure" about your concise
replacement for Heathfield's version.

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

That is NOT the replacement code suggested by
"Antoninus Twink". Perhaps you should get one of
those threaded newsreaders you keep whingeing
that everybody else should have, so that you can
check your facts before posting.

For the actual code, see:
http://groups.google.co.nz/group/comp.lang.c/msg/f2e171fc9691cd7d
 
A

Antoninus Twink

That is NOT the replacement code suggested by
"Antoninus Twink". Perhaps you should get one of
those threaded newsreaders you keep whingeing
that everybody else should have, so that you can
check your facts before posting.

You are splitting hares. It is functionally equivalent - its advantage
is a slightly shorter total length, but in exchange the loop doesn't
have an empty body.

And I find it amusing that you object to me posting under a Usenet
handle, when your own posts are from "Old Wolf".
 
T

Tor Rustad

John Bode wrote:

[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.


ROFL!

However, the optimizer can be playing tricks with you here. Anyway, RH
code was 96% more readable and maintainable, so OP was Trolling.
 
S

santosh

CBFalconer said:
While I am basically in favor of the 'tauter code' group, my
rewrite would have been different. For one thing, I don't like the
? coding. My solution:

char *dot_to_underscore(const char *s) {
char *t, *u;

if (!(t = u = malloc(strlen(s) + 1))) {

You mean you commence copy when malloc fails?
 
T

Tor Rustad

santosh said:
John Bode wrote:
[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.

I don't think so. My guess would be, either the measurement is
misleading, or strcpy() help alignment and execute some code in parallel.
 
C

CBFalconer

Antoninus said:
.... snip ...

You are splitting hares. It is functionally equivalent - its
advantage is a slightly shorter total length, but in exchange the
loop doesn't have an empty body.

Splitting hares will not reduce the poor beasts length, only its
width. The emptiness of the body depends (at least in part) on how
much of the interior spills after the split. :)

You might want to consider 'splitting hairs'.
 
T

Tor Rustad

CBFalconer said:
Thad Smith wrote:
[...]
What happens when malloc returns a null pointer?

While I am basically in favor of the 'tauter code' group, my
rewrite would have been different. For one thing, I don't like the
? coding. My solution:

char *dot_to_underscore(const char *s) {
char *t, *u;

if (!(t = u = malloc(strlen(s) + 1))) {

That should run very fast! :)
 
R

Richard Tobin

Well I did a small comparison test between the two version on a string
of length 209,715,200 bytes, with the '.' character just before the
terminating null. I used clock to time the functions. For four runs for
each version here are the averages:

RH's version = 1.480000s
AT's version = 1.212500s

If you can be bothered, what happens if you replace the strcpy() in
RH's code with a memcpy() (the length being already known)?

-- Richard
 
S

santosh

Tor said:
santosh said:
John Bode wrote:
[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.

I don't think so. My guess would be, either the measurement is
misleading, or strcpy() help alignment and execute some code in
parallel.

Well I did a small comparison test between the two version on a string
of length 209,715,200 bytes, with the '.' character just before the
terminating null. I used clock to time the functions. For four runs for
each version here are the averages:

RH's version = 1.480000s
AT's version = 1.212500s

[system is Pentium Dual Core 1.6 GHz with 1 Gb RAM]
So for this system at least, AT's version is significantly faster.

If anyone wants, I'll post the driver code.
 
K

Keith Thompson

Willem said:
By the way, the perl equivalent would be:
(ret = str) =~ s/\./_/g;
But that's off-topic.

Indeed, which is why I won't mention that the *correct* perl
equivalent is:

($ret = $str) =~ s/\./_/g;
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

Fibonacci 0
Adding adressing of IPv6 to program 1
C language. work with text 3
code review 26
Can't solve problems! please Help 0
compressing charatcers 35
Strange bug 65
K&R exercise 5-5 10

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,576
Members
45,054
Latest member
LucyCarper

Latest Threads

Top