jl_post
Hi,
I've recently been toying around with Inline::C, benchmarking
certain variations of C and Perl and examining the results.
I was curious about the performance penalty of calling an extra
function, so I created three different (but similar) C functions. All
of them use the Pythagorean theorem to calculate the average distance
from the origin for all points with integer coordinates from 1 to
100. One function calls distance() to compute the distance, another
is the same except that it calls a function declared inline, and the
third doesn't call a function at all -- it just uses the "unrolled"
code, with the distance logic written in place of the call.
I theorized that the "unrolled" code would be the fastest, followed
by the code that called the inlined function, followed by the code
that called the non-inlined function.
However, to my surprise, when I benchmarked the code, I saw that
the code with the non-inlined function ran consistently faster, while
the other two sets of code ran at about the same speed, with the code
that doesn't call the function being a little faster than the code
that called the inline function.
If you're curious, here is the code I used:
#!/usr/bin/perl
use strict;
use warnings;

use Inline 'C' => <<'END_OF_C_CODE';

/* Given a set of integer coordinates, this function
 * calculates the distance from the origin. */
double distance(int x, int y)
{
    return sqrt(x*x + y*y);
}

/* Same as distance(), but declared inline. */
inline double inline_distance(int x, int y)
{
    return sqrt(x*x + y*y);
}

/* This function loops through all integer coordinates
 * from (1,1) to (100,100) and returns the average
 * distance from the origin. The distance is computed
 * without calling distance() nor inline_distance(). */
double c_unrolled()
{
    int x, y;
    int numEntries = 0;
    double total = 0;

    for (x = 1; x <= 100; x++)
    {
        for (y = 1; y <= 100; y++)
        {
            total += sqrt(x*x + y*y);
            numEntries++;
        }
    }
    return total/numEntries;
}

/* Same as c_unrolled(), except that the distance
 * from the origin is computed with distance(). */
double c_with_function()
{
    int x, y;
    int numEntries = 0;
    double total = 0;

    for (x = 1; x <= 100; x++)
    {
        for (y = 1; y <= 100; y++)
        {
            total += distance(x, y);
            numEntries++;
        }
    }
    return total/numEntries;
}

/* Same as c_with_function(), except that the distance
 * from the origin is computed with inline_distance(). */
double c_with_inline()
{
    int x, y;
    int numEntries = 0;
    double total = 0;

    for (x = 1; x <= 100; x++)
    {
        for (y = 1; y <= 100; y++)
        {
            total += inline_distance(x, y);
            numEntries++;
        }
    }
    return total/numEntries;
}

END_OF_C_CODE

die "Usage: perl $0 <NUM_TIMES_TO_TEST>\n",
    "Sample usage: perl $0 10_000\n"
    unless @ARGV == 1;

my ($count) = @ARGV;
$count =~ tr/_//d;  # remove all '_' characters

use Benchmark ':all', ':hireswallclock';

my $results = timethese($count, {
    'C unrolled'      => 'c_unrolled()',
    'C with function' => 'c_with_function()',
    'C with inline'   => 'c_with_inline()',
});
cmpthese($results);

__END__
When I ran this code with the following command:
perl extra_function_c.pl 100_000
I got the following as output:
                   Rate C with inline   C unrolled C with function
C with inline   11090/s            --          -0%            -11%
C unrolled      11130/s            0%           --            -10%
C with function 12422/s           12%          12%              --
So calling the C code that made use of an extra function call was
actually faster! But why is this so?
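For anyone reading the table: as far as I can tell, each cell is the rate of the row's entry relative to the rate of the column's entry. Here is a small C sketch that recomputes the percentages from the three rates above. The names percent_diff and print_comparison are made up for illustration; they are not part of Benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which `rate` exceeds (positive) or trails (negative)
 * `other_rate`. Hypothetical helper, not part of Benchmark. */
double percent_diff(double rate, double other_rate)
{
    assert(other_rate > 0.0);
    return 100.0 * (rate / other_rate - 1.0);
}

/* Prints every row-versus-column comparison for the three measured rates. */
void print_comparison(void)
{
    const char *names[] = { "C with inline", "C unrolled", "C with function" };
    const double rates[] = { 11090.0, 11130.0, 12422.0 };
    int row, col;

    for (row = 0; row < 3; row++)
    {
        for (col = 0; col < 3; col++)
        {
            if (row == col)
                continue;
            printf("%-15s vs %-15s: %+.1f%%\n", names[row], names[col],
                   percent_diff(rates[row], rates[col]));
        }
    }
}
```

For example, percent_diff(12422.0, 11090.0) comes out at roughly 12, which matches the bottom-left cell of the table.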
In case anyone wants to know, my "gcc -v" output is:
Reading specs from C:/strawberry/c/bin/../lib/gcc/mingw32/3.4.5/specs
Configured with: ../gcc-3.4.5-20060117-3/configure --with-gcc
  --with-gnu-ld --with-gnu-as --host=mingw32 --target=mingw32
  --prefix=/mingw --enable-threads --disable-nls
  --enable-languages=c,c++,f77,ada,objc,java --disable-win32-registry
  --disable-shared --enable-sjlj-exceptions --enable-libgcj
  --disable-java-awt --without-x --enable-java-gc=boehm
  --disable-libgcj-debug --enable-interpreter
  --enable-hash-synchronization --enable-libstdcxx-debug
Thread model: win32
gcc version 3.4.5 (mingw-vista special r3)
Still curious, I created an all-C file that contained the C code in
my Perl script (on the same platform). When I compiled, ran, and
timed it, I saw that the C code without the function call was the
fastest (while the C code that used the inline function was the
slowest). This is in contrast to the Perl Benchmark findings, which
say that the function with the non-inlined function call was the
fastest.
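A standalone timing test of this kind can be sketched roughly as follows. This is only an illustration, not the exact file used: it assumes clock() from <time.h> for timing, and time_calls, report, and sample_workload are hypothetical names. The stand-in workload leaves out sqrt() so the sketch builds without the math library; the real test would pass c_unrolled, c_with_function, and c_with_inline instead.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Written to on every call so the compiler cannot discard the work. */
static volatile double sink;

/* Calls fn() reps times and returns the elapsed CPU time in seconds. */
double time_calls(double (*fn)(void), long reps)
{
    clock_t start, end;
    long i;

    assert(fn != NULL && reps > 0);
    start = clock();
    for (i = 0; i < reps; i++)
        sink = fn();
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload (an assumption for illustration): replace it with
 * c_unrolled, c_with_function, or c_with_inline from the script. */
double sample_workload(void)
{
    int x, y;
    double total = 0;

    for (x = 1; x <= 100; x++)
        for (y = 1; y <= 100; y++)
            total += x*x + y*y;
    return total / 10000.0;
}

/* Times one function and prints its rate in calls per second. */
void report(const char *name, double (*fn)(void), long reps)
{
    double secs = time_calls(fn, reps);

    if (secs > 0)
        printf("%-16s %ld calls in %.3f s (%.0f/s)\n",
               name, reps, secs, reps / secs);
    else
        printf("%-16s %ld calls, too fast to measure\n", name, reps);
}
```

Calling something like report("C unrolled", c_unrolled, 100000) from main() for each of the three functions gives rates that can be set side by side with the Benchmark output. Storing each result in a volatile variable matters here: without it, an optimizing compiler is free to drop calls whose results are never used, which would skew the comparison.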
(Incidentally, I searched the web to find out why the code that called
the inlined function might be the slowest, and I discovered that
inlining functions doesn't necessarily make the code any faster. This
might explain why it's always the slowest when benchmarking.)
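Since the inline keyword is only a request that the compiler may ignore, one way to make this comparison more controlled is to use GCC's function attributes to forbid or force inlining explicitly. The sketch below assumes GCC; dist2_call, dist2_inline, sum_with_call, and sum_with_inline are hypothetical names, and it uses the squared distance (no sqrt) purely to keep the example small.

```c
#include <assert.h>

/* GCC-specific: never expand this function at its call sites. */
static __attribute__((noinline))
long dist2_call(int x, int y)
{
    return (long)x * x + (long)y * y;
}

/* GCC-specific: always expand this function at its call sites. */
static inline __attribute__((always_inline))
long dist2_inline(int x, int y)
{
    return (long)x * x + (long)y * y;
}

/* Sums the squared distances for (1,1)..(n,n) through a real call. */
long sum_with_call(int n)
{
    int x, y;
    long total = 0;

    assert(n > 0);
    for (x = 1; x <= n; x++)
        for (y = 1; y <= n; y++)
            total += dist2_call(x, y);
    return total;
}

/* Same loop, but the helper is expanded in place by the compiler. */
long sum_with_inline(int n)
{
    int x, y;
    long total = 0;

    assert(n > 0);
    for (x = 1; x <= n; x++)
        for (y = 1; y <= n; y++)
            total += dist2_inline(x, y);
    return total;
}
```

Both functions return the same value for the same n, so any timing difference between them comes from the call itself rather than from the arithmetic. Comparing the assembly that gcc -S produces for the two is a direct way to confirm which version actually contains a call instruction.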
So I'm curious whether other people see results similar to mine
when running the above Perl script. And if so, why would the C code
with the extra function call be consistently faster in Perl (while not
in straight C)? The fact that the code with the extra function call
is faster seems counter-intuitive to me, no matter what platform I'm
using.
Thanks in advance for any advice, tips, or general wisdom.
-- Jean-Luc