Benchmark C vs C++

J

jacob navia

Consider this benchmark

--------------------------------------------------------C++
#include <iostream>
#include <list>
using namespace std;

// Simple example uses type int
#define MAXIT 50000000
int main(void)
{
list<int> L;
int i;

for (i=0; i<MAXIT;i++)
L.push_back(i); // Insert a new element at the end
list<int>::iterator it;
int sum=0;
for (it=L.begin(); it != L.end(); ++it)
sum += *it;
printf("%d\n",sum);
return 0;
}

--------------------------------------------------------C
/* compile with list.c and containererror.c */
#include "containers.h"
#include <stdio.h>
int main(void)
{
#define MAX_IT 50000000

List *l = newList(sizeof(int));
list_element *le;
int i,sum=0;
for (i=0; i<MAX_IT; i++) {
l->lpVtbl->Add(l,&i); // Insert at the end
}
for (le = GetFirst(l); le != NULL; le = GetNext(le)) {
sum += *(int *)le->Data;
}
printf("%d\n",sum);
return 0;
}

---------------------------------------------------------

I think it would be interesting to see the results in as much
machines as possible.

Thanks
 
I

Ike Naar

Consider this benchmark

--------------------------------------------------------C++
#include <iostream>
#include <list>
using namespace std;

// Simple example uses type int
#define MAXIT 50000000
int main(void)
{
list<int> L;
int i;

for (i=0; i<MAXIT;i++)
L.push_back(i); // Insert a new element at the end
list<int>::iterator it;
int sum=0;
for (it=L.begin(); it != L.end(); ++it)
sum += *it;
printf("%d\n",sum);
return 0;
}

--------------------------------------------------------C
/* compile with list.c and containererror.c */
#include "containers.h"
#include <stdio.h>
int main(void)
{
#define MAX_IT 50000000

List *l = newList(sizeof(int));
list_element *le;
int i,sum=0;
for (i=0; i<MAX_IT; i++) {
l->lpVtbl->Add(l,&i); // Insert at the end
}
for (le = GetFirst(l); le != NULL; le = GetNext(le)) {
sum += *(int *)le->Data;
}
printf("%d\n",sum);
return 0;
}

On a SUNW,UltraSPARC-IIIi, using the Sun C++ 5.9 and Sun C 5.9 compilers,
the C++ version runs in 8.1 seconds CPU and needs 622 MB RAM,
the C version (with list.c and containererror.c) runs in 16.4 seconds CPU
and needs 1065 MB RAM.
 
J

jacob navia

For a PC with intel 8 core CPU+12GB RAM
I see:

cl -Ox -EHsc l1.cpp
timethis l1.cpp
TimeThis : Command Line : l1
TimeThis : Start Time : Mon Oct 05 00:38:01 2009

1283106752

TimeThis : Command Line : l1
TimeThis : Start Time : Mon Oct 05 00:38:01 2009
TimeThis : End Time : Mon Oct 05 00:38:05 2009
TimeThis : Elapsed Time : 00:00:04.167

For the C version I see:
timethis listtest

TimeThis : Command Line : listtest
TimeThis : Start Time : Mon Oct 05 00:36:46 2009

1283106752

TimeThis : Command Line : listtest
TimeThis : Start Time : Mon Oct 05 00:36:46 2009
TimeThis : End Time : Mon Oct 05 00:36:50 2009
TimeThis : Elapsed Time : 00:00:04.049

That version is compiled with lcc-win (lcc -O)
 
G

Giorgos Keramidas

Consider this benchmark

--------------------------------------------------------C++
#include <iostream>
#include <list>
using namespace std;

// Simple example uses type int
#define MAXIT 50000000
int main(void)
{
list<int> L;
int i;

for (i=0; i<MAXIT;i++)
L.push_back(i); // Insert a new element at the end
list<int>::iterator it;
int sum=0;
for (it=L.begin(); it != L.end(); ++it)
sum += *it;
printf("%d\n",sum);
return 0;
}

--------------------------------------------------------C
/* compile with list.c and containererror.c */
#include "containers.h"
#include <stdio.h>
int main(void)
{
#define MAX_IT 50000000

List *l = newList(sizeof(int));
list_element *le;
int i,sum=0;
for (i=0; i<MAX_IT; i++) {
l->lpVtbl->Add(l,&i); // Insert at the end
}
for (le = GetFirst(l); le != NULL; le = GetNext(le)) {
sum += *(int *)le->Data;
}
printf("%d\n",sum);
return 0;
}

Hi Jacob,

Can you please post a link where we can fetch the container files too?
I've seen the related thread in c.l.c a few days ago, but the posts have
expired from my newsreader cache and are no longer easy to find.

A download link with a ZIP or plain tar(1) archive would be very nice.
 
J

jacob navia

Hi Jacob,
Can you please post a link where we can fetch the container files too?
I've seen the related thread in c.l.c a few days ago, but the posts have
expired from my newsreader cache and are no longer easy to find.

A download link with a ZIP or plain tar(1) archive would be very nice.

I posted the files again 4-5 hours ago.

They should still be there.

jacob
 
J

jacob navia

Ike Naar a écrit :
On a SUNW,UltraSPARC-IIIi, using the Sun C++ 5.9 and Sun C 5.9 compilers,
the C++ version runs in 8.1 seconds CPU and needs 622 MB RAM,
the C version (with list.c and containererror.c) runs in 16.4 seconds CPU
and needs 1065 MB RAM.

The benchmark makes 50 million calls to malloc.

If malloc is well done, this is the same as C++. If not, it will be quite
bad performance.

The performance of the C version hasn't been optimized at all.
 
B

Ben Bacarisse

jacob navia said:
Consider this benchmark
C++:
#include <list>
C:
#include "containers.h"

Your list is singly-linked is it not? The STL's list is a
doubly-linked list and is therefore more flexible. Comparing them for
speed alone is maybe a little biased.
 
G

Giorgos Keramidas

I posted the files again 4-5 hours ago.
They should still be there.

Ok, I tracked these down and I just finished running 30 iterations of
each test program on my laptop:

: Machine: Thinkpad X61s
: OS: FreeBSD 9.0-CURRENT [svn rev /head@197735]
: Arch: i386
: -- dmesg output --
: CPU: Intel(R) Core(TM)2 Duo CPU L7700 @ 1.80GHz (1795.51-MHz
: 686-class CPU)
: Origin = "GenuineIntel" Id = 0x6fb Stepping = 11
: Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
: Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
: AMD Features=0x20100000<NX,LM>
: AMD Features2=0x1<LAHF>
: TSC: P-state invariant
: real memory = 3221225472 (3072 MB)
: avail memory = 3119816704 (2975 MB)

Comparing the real/user/sys times of each program with the `ministat'
utility shows that there is a marked difference in wall clock times and
userland/user-level times:

: Real Times
: ==========
:
: x real-cpp
: + real-stdc
: +------------------------------------------------------------------------------+
: | ++ x |
: | ++ xx |
: | ++ xx |
: | ++ xx |
: | ++ xx |
: | ++ xx |
: | ++ xx |
: | ++ xx |
: | +++ xx |
: | +++ xx |
: | +++ xxx |
: | ++++ + xxx x x|
: ||_MA_| |___MA____| |
: +------------------------------------------------------------------------------+
: N Min Max Median Avg Stddev
: x 30 12.116 13.127 12.17 12.2203 0.20618943
: + 30 9.833 10.526 9.907 9.9256667 0.11781205
: Difference at 95.0% confidence
: -2.29463 +/- 0.0867998
: -18.7772% +/- 0.710292%
: (Student's t, pooled s = 0.167919)


: User Times
: ==========
:
: x user-cpp
: + user-stdc
: +------------------------------------------------------------------------------+
: | + |
: | + |
: | ++ |
: | ++ x |
: | +++ xx |
: | +++ xxx |
: | +++++ x xxxx |
: | +++++ x x xxxxxx |
: |+ ++++++ + x xx xxxxxxx x|
: | |__MA__| |____A___| |
: +------------------------------------------------------------------------------+
: N Min Max Median Avg Stddev
: x 30 10.241 11.347 10.579 10.569833 0.1915676
: + 30 8.044 8.98 8.35 8.3697333 0.14194388
: Difference at 95.0% confidence
: -2.2001 +/- 0.0871474
: -20.8149% +/- 0.824491%
: (Student's t, pooled s = 0.168592)

But a much smaller difference in system/kernel times:

: System Times
: ============
:
: x sys-cpp
: + sys-stdc
: +------------------------------------------------------------------------------+
: | + ++ + |
: | + x + ++ x ++ + xx x |
: |x +* x + +* x xx**+* **++** + xx* x +x xxx x + xx|
: | |_|_________MA_____M_A____|_____________| |
: +------------------------------------------------------------------------------+
: N Min Max Median Avg Stddev
: x 30 1.279 1.786 1.484 1.4974333 0.12388724
: + 30 1.309 1.735 1.438 1.4421 0.084948808
: Difference at 95.0% confidence
: -0.0553333 +/- 0.0549054
: -3.69521% +/- 3.66663%
: (Student's t, pooled s = 0.106218)
 
G

gwowen

The benchmark makes 50 million calls to malloc.

If malloc is well done, this is the same as C++. If not, it will be quite
bad performance.

What happens in the case of memory exhaustion?
In the C++ case, push_back() throws std::bad_alloc.
For similar semantics, do you need to check a return value from
l->lpVtbl->Add() ?
 
J

jacob navia

gwowen a écrit :
What happens in the case of memory exhaustion?
In the C++ case, push_back() throws std::bad_alloc.
For similar semantics, do you need to check a return value from
l->lpVtbl->Add() ?

There are two mechanisms.
First, a callback will be called with the name of the function that failed, and
a numeric code indicating which error happened.

Second, (if the callback returns) a NULL value will be returned.
 
I

Ike Naar

Ike Naar a écrit :

The benchmark makes 50 million calls to malloc.
If malloc is well done, this is the same as C++. If not, it will be quite
bad performance.

In the above implementation, the C++ Standard Library uses malloc
under the hood.
 
J

jacob navia

Ike Naar a écrit :
In the above implementation, the C++ Standard Library uses malloc
under the hood.

Sure, but not 50 million times. I am implementing a simple heap manager
(that will be used by all containers) to avoid calling malloc 50
million times.

It was very instructive to see where the weak points are. For instance
I tried to use a little memory as possible, but since I am allocating
50 million small blocks of 8 bytes, overhead for malloc goes up to 100%!

It is better to allocate bigger blocks and not try to optimize memory as
this version does.

Thanks for your interest

jacob
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top