santosh said:
Richard said: If you can be bothered, what happens if you replace the strcpy() in
RH's code with a memcpy() (the length being already known)?
The average time drops to 1.320000s. Not much.
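For reference, a sketch of the substitution under discussion: RH's function with strcpy() replaced by memcpy(), reusing the length already computed for the malloc() call. The thread does not show the exact code santosh timed, so treat this as an illustration, not his harness.

```c
#include <stdlib.h>
#include <string.h>

/* RH's dot_to_underscore with strcpy() swapped for memcpy(). The
   length is computed once for malloc() and reused, since memcpy()
   needs it anyway. A sketch of the change being timed, not
   necessarily the exact edit that produced the 1.32 s figure. */
char *dot_to_underscore(const char *s)
{
    size_t len = strlen(s) + 1;     /* includes the terminating null */
    char *t = malloc(len);
    if (t != NULL)
    {
        char *u;
        memcpy(t, s, len);          /* copies the null terminator too */
        u = t;
        while (*u)
        {
            if (*u == '.')
            {
                *u = '_';
            }
            ++u;
        }
    }
    return t;
}
```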
Richard said: If you can be bothered, what happens if you replace the strcpy() in
RH's code with a memcpy() (the length being already known)?
RH's version = 1.480000s
AT's version = 1.212500s
[system is Pentium Dual Core 1.6 GHz with 1 GB RAM]
So for this system at least, AT's version is significantly faster.
Richard said: santosh said:
RH's version = 1.480000s
AT's version = 1.212500s
[system is Pentium Dual Core 1.6 GHz with 1 GB RAM]
So for this system at least, AT's version is significantly faster.
No, it isn't significantly faster. The function is called *once* by
the program of which it is a part. On my system, it takes less than a
microsecond to run. Let's round up and call it one microsecond.
According to your timings, AT's version saves (1.48 - 1.2125)/1.48 =
approximately 0.180743 of the time. This equates to 181 nanoseconds
per program run.
To save as much as a single second, you'd have to run the program well
over five *million* times. If it only took me one minute to verify the
change, update the source, recompile, and re-test, I'd still have to
run the code three hundred million times just to break even. I type at
40 wpm, so I can probably manage to add one error message identifier
and meaningful message text in about five seconds. 300,000,000 * 5 =
1,500,000,000 seconds. So breaking even would take 47 years (assuming
I did nothing else for the next 47 years, such as eating and sleeping
and actual programming and watching LOTR and so on).
So in fact it would be counter-productive to adopt the change, even if
I thought it were an improvement, which I don't.
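Richard's back-of-the-envelope numbers do check out. As a sketch (C, with the thread's measured times and his one-microsecond estimate hard-coded), the calculation can be replayed like this:

```c
/* Replaying the break-even arithmetic from the post above. The two
   timings and the ~1 microsecond per real call are taken from the
   thread; everything else is plain arithmetic. */

static const double rh_time  = 1.48;    /* measured, seconds */
static const double at_time  = 1.2125;  /* measured, seconds */
static const double one_call = 1e-6;    /* one real call: about a microsecond */

/* Fraction of the run time AT's version saves: about 0.180743. */
double fraction_saved(void)
{
    return (rh_time - at_time) / rh_time;
}

/* Absolute saving per program run: about 181 nanoseconds. */
double saving_per_run(void)
{
    return fraction_saved() * one_call;
}

/* Program runs needed before the change saves the given time. */
double runs_to_save(double seconds)
{
    return seconds / saving_per_run();
}

/* Years consumed if each of `runs` runs costs secs_per_run seconds. */
double years_spent(double runs, double secs_per_run)
{
    return runs * secs_per_run / (3600.0 * 24.0 * 365.25);
}
```

Plugging in: over five million runs to save one second, about 332 million runs to pay back one minute of editing time, and 300 million runs at five seconds each comes to roughly 47.5 years, matching the figures in the post.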
santosh said: Tor said: santosh said: John Bode wrote:
[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.
Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.
I don't think so. My guess would be: either the measurement is
misleading, or strcpy() helps alignment and executes some code in
parallel.
Well, I did a small comparison test between the two versions on a string
of length 209,715,200 bytes, with the '.' character just before the
terminating null. I used clock() to time the functions. For four runs of
each version, here are the averages:
RH's version = 1.480000s
AT's version = 1.212500s
[system is Pentium Dual Core 1.6 GHz with 1 GB RAM]
So for this system at least, AT's version is significantly faster.
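santosh's harness is not shown in the thread; a minimal clock()-based sketch along the lines he describes might look like the following. The fill byte 'a' and the helper name `time_one_run` are my assumptions, as is the use of RH's version as the function under test.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* RH's version, as quoted downthread, used here as the code under test. */
char *dot_to_underscore(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t != NULL)
    {
        char *u;
        strcpy(t, s);
        u = t;
        while (*u)
        {
            if (*u == '.')
                *u = '_';
            ++u;
        }
    }
    return t;
}

/* Build an n-byte string of 'a's with a '.' just before the terminating
   null (santosh used n = 209,715,200), time one call with clock(), and
   return the elapsed seconds, or -1.0 on failure. */
double time_one_run(size_t n)
{
    char *s, *t;
    clock_t start, stop;
    double elapsed;

    if (n < 2 || (s = malloc(n + 1)) == NULL)
        return -1.0;
    memset(s, 'a', n);
    s[n - 1] = '.';                 /* dot just before the null */
    s[n] = '\0';

    start = clock();
    t = dot_to_underscore(s);
    stop = clock();

    elapsed = (t != NULL) ? (double)(stop - start) / CLOCKS_PER_SEC : -1.0;
    free(t);
    free(s);
    return elapsed;
}
```

Averaging four such runs per version, as santosh did, is then a matter of calling `time_one_run` in a loop.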
John Bode wrote:
[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.
ROFL!
However, the optimizer can be playing tricks with you here.
Anyway, RH's code was 96% more readable and maintainable, so the OP was
trolling.
santosh said: CBFalconer wrote:
[... snip ...]
You mean you commence copying when malloc() fails?
John Bode said: John Bode wrote:
[...]
FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.
ROFL!
However, the optimizer can be playing tricks with you here.
Oh, no doubt. But that kind of reinforces the idea that writing code
in a clear, straightforward way (that is, clear to anyone who *isn't*
an expert in C) is better than trying to save screen real estate.
I'm rewriting the test harness and running it on my home box (Gentoo);
the box I ran it on earlier was RHEL 3 running in a VMware session on
an XP Pro box that's running eleventy billion server processes. I'm
also going to try it with various string lengths and optimization
options.
Anyway, RH's code was 96% more readable and maintainable, so the OP was
trolling.
Again, no doubt.
You are splitting hairs. It is functionally equivalent - its advantage
is a slightly shorter total length, but in exchange the loop doesn't
have an empty body.
And I find it amusing that you object to me posting under a Usenet
handle, when your own posts are from "Old Wolf".
John Bode said:
[...]
Again, no doubt.
If you prefer RH's code I am astonished. But it takes all sorts, I
suppose. It's not "bad code" by any stretch of the imagination.
As was pointed out earlier, that kind of speed increase/decrease is
generally immaterial; however, I know for a fact that the time saved in
debugging and reading it is worth its weight in gold. If you cannot
understand such a bog-standard two lines then there are issues in your
understanding of C.
while(*u++=(*s=='.' ? '_' : *s))
s++;
It couldn't be easier.
Richard said: Why would you post that?
Off topic for a start, and NOTHING to do with C.
The poster was quite correct in his improvements of RH's code, regardless
of who or what he is.
¬a\/b said: On Tue, 09 Oct 2007 02:39:06 +0200, Richard wrote:
[...]
while(*u++=(*s=='.' ? '_' : *s))
s++;
It couldn't be easier.
This is easier:
goto l2;
l1: ++s; ++u;
l2: *u=(*s=='.'? '_': *s);
if(*s) goto l1;
in 2 lines:
goto l2;
l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s) goto l1;
in one line with #==goto
#l2; l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s)#l1;
The function below is from Richard Heathfield's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.
char *dot_to_underscore(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if(t != NULL)
    {
        char *u;
        strcpy(t, s);
        u = t;
        while(*u)
        {
            if(*u == '.')
            {
                *u = '_';
            }
            ++u;
        }
    }
    return t;
}
On Tue, 09 Oct 2007 02:39:06 +0200, Richard wrote:
[...]
#.2
.0: B*i='_'; #.1;
.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1
I have not seen the fgetline() code, but if the function in question is
called only *once*, then the OP "optimized" the *outer* loop; a
professional would go for the *inner* loop.
Your measurement showed that the OP posted nonsense, an insignificant
micro-optimization.
¬a\/b said: On Tue, 09 Oct 2007 10:06:04 +0200, ¬a\/b wrote:
There is one jmp more:
#.2
.0: B*i='_'
.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1
The point was that Mr Heathfield's code was a micro-anti-optimization.
If making the code harder to read in exchange for a small increase in
speed is bad,
how much worse is it to make the code harder to read by
implementing a convoluted two-pass algorithm that's more complex and
slower than the natural, idiomatic one?
Or, if someone makes the small decisions badly, does that give us any
faith that they'll do better on the big decisions?
[...]
Proposed solution:
char *dot_to_underscore(const char *s)
{
    char *t, *u;
    if(t=u=malloc(strlen(s)+1))
        while(*u++=(*s=='.' ? s++, '_' : *s++));
    return t;
}
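Given how often the two versions are compared above, it is worth a quick check that they really are functionally equivalent. This is a sketch, not from the thread: the `_rh`/`_at` names are mine, and extra parentheses have been added to AT's version to quiet compiler warnings about assignment-in-condition.

```c
#include <stdlib.h>
#include <string.h>

/* RH's version, as quoted above. */
char *dot_to_underscore_rh(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t != NULL)
    {
        char *u;
        strcpy(t, s);
        u = t;
        while (*u)
        {
            if (*u == '.')
                *u = '_';
            ++u;
        }
    }
    return t;
}

/* AT's proposed version (extra parentheses added; logic unchanged). */
char *dot_to_underscore_at(const char *s)
{
    char *t, *u;
    if ((t = u = malloc(strlen(s) + 1)))
        while ((*u++ = (*s == '.' ? s++, '_' : *s++)))
            ;
    return t;
}

/* Returns 1 when both versions yield identical strings for s. */
int versions_agree(const char *s)
{
    char *a = dot_to_underscore_rh(s);
    char *b = dot_to_underscore_at(s);
    int ok = a != NULL && b != NULL && strcmp(a, b) == 0;
    free(a);
    free(b);
    return ok;
}
```

Checking a handful of edge cases (empty string, all dots, no dots) is cheap insurance before arguing about nanoseconds.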
John Bode said: It could, except that on my system (Gentoo Linux, 2.6.22
kernel, gcc 4.1.2), it doesn't. In fact, it leads to overall *better*
performance on very long strings. I wrote a test harness that generates
random [...]
It's time to re-learn those two important lessons:
1. Terse code does not necessarily equate to faster code
2. The only way to know which version of code will be more efficient
is to profile all versions under consideration.
More bad code is written in the name of "efficiency" than anything
else.
Richard said: Antoninus Twink said:
[...]
Whether it is bad depends on the relative merits of performance and
clarity. I have already shown how I would have to use the program 24/7 for
almost half a century before your suggested change could save me so much
as a nanosecond (once the time cost of making that change is factored in),
and quite frankly I have better things to do with my life. So in this
case, the performance increase is meaningless, whereas the loss of clarity
is significant.
We have already demonstrated why "slower" is unimportant. If you seriously
think the original code is hard to read, then words fail me. It's
terribly, terribly simple C. It's astoundingly easy for most C people. I
simply cannot understand why you would find it difficult or complex.