bench: binary trees

M

Melzzzzz

I have played a little tree with benchmark from computer language game
site:
http://shootout.alioth.debian.org/u64q/performance.php?test=binarytrees

Interested in gc vs manual I have been very surprised with Java
performance ;)
Seems that Java gc is blazingly fast in comparison with standard
new/delete.
I think this is because gc deallocates in chunks rather piece by
piece, perhaps allocates from pool. Also there are no destructors
calls.
fastest C++ program on this site uses boost pool which calls
destructors and can deallocate, therefore, is slower then C variant
which uses apache's pool, which just deallocate.

So I have coded simple custom pool that does not keeps track of
allocated objects and doesn't call destructors and result is
that it is slightly faster than C ;)

What is clear is that Java garbage collector cannot be bitten
in this case if manual deletion of objects is required.
(as seen here)
http://shootout.alioth.debian.org/u64/performance.php?test=binarytrees

Timings for cpp with my custom alloc
bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m4.188s
user 0m7.036s
sys 0m0.112s
bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m7.077s
user 0m6.976s
sys 0m0.084s

Timings for C with apache pool:
bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m4.625s
user 0m8.085s
sys 0m0.140s
bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m8.009s
user 0m7.888s
sys 0m0.116s

code follows if someone is interested:
/* The Computer Language Benchmarks Game
* http://shootout.alioth.debian.org/
*
* contributed by Jon Harrop
* modified by Alex Mizrahi
* modified by Andreas Schäfer
* very minor omp tweak by The Anh Tran
* custom pool Branimir Maksimovic
*/

#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>
#include <new>
#include <pthread.h>
#include <boost/pool/object_pool.hpp>

const size_t LINE_SIZE = 64;

template <class T>
class Pool{
static const unsigned NELEM = 10000*sizeof(T);
public:
Pool()
: pos_(0)
{
pool_.push_back(alloc());
}
Pool(const Pool&)=delete;
Pool& operator=(const Pool&)=delete;
~Pool()
{
for(auto i : pool_)
{
free(i);
}
}
template <class ...T1>
T* allocate(T1... args)
{
if(NELEM <= pos_)
{
pool_.push_back(alloc());
pos_=0;
}
void* p = &pool_.back()[pos_];
pos_+=sizeof(T);
return new(p)T(args...);
}
private:
static char* alloc()
{
Inner* next = (Inner*)pthread_getspecific(k.key);
if(next == nullptr)
{
next = (Inner*)::eek:perator new(NELEM);
next->next = nullptr;
}
char* p = (char*)next;
next = next->next;
pthread_setspecific(k.key,next);
return p;
}
static void free(void* p)
{
Inner* next = (Inner*)pthread_getspecific(k.key);
Inner* tmp = (Inner*)p;
tmp->next = next;
next = tmp;
pthread_setspecific(k.key,next);
}
std::vector<char*> pool_;
unsigned pos_;
struct Inner{
Inner* next;
};
static struct Key{
Key()
{
pthread_key_create(&key,destroy);
}
~Key()
{
pthread_key_delete(key);
}
static void destroy(void* p)
{
Inner* next = (Inner*)p;
while(next)
{
Inner* tmp = next->next;
::eek:perator delete(next);
next = tmp;
}
}
pthread_key_t key;
}k;
};

template <class T>
typename Pool<T>::Key Pool<T>::k;

struct Node
{
Node *l, *r;
int i;

Node(int i2) :l(0),r(0), i(i2)
{}
Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)
{}
Node(const Node&)=delete;
Node& operator=(const Node&)=delete;
int check() const
{
if (l)
return l->check() + i - r->check();
else return i;
}
};

typedef boost::eek:bject_pool<Node> NodePool;

Node *make(int i, int d, Pool<Node>& p)
{
if (d > 0)
return p.allocate( make(2*i-1, d-1,p),
i,
make(2*i, d-1,p));
return p.allocate(i);
}

Node *make(int i, int d, NodePool& p)
{
if (d > 0)
return p.construct( make(2*i-1, d-1,p),
i,
make(2*i, d-1,p));
return p.construct(i);
}

int main(int argc, char *argv[])
{
int min_depth = 4;
int max_depth = std::max(min_depth+2,
(argc == 2 ? atoi(argv[1]) : 10));
int stretch_depth = max_depth+1;

// Alloc then dealloc stretchdepth tree
{
Pool<Node> c_pool;
//NodePool c_pool;
Node* c (make(0, stretch_depth,c_pool));
std::cout << "stretch tree of depth " << stretch_depth << "\t "
<< "check: " << c->check() << '\n';
}

Pool<Node> long_lived_tree_pool;
//NodePool long_lived_tree_pool;
Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));

char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));

#pragma omp parallel for
for (int d = min_depth; d <= max_depth; d += 2)
{
int iterations = 1 << (max_depth - d + min_depth);
int c = 0;

for (int i = 1; i <= iterations; ++i)
{
Pool<Node> pool;
//NodePool pool;
Node *a(make(i, d,pool)), *b(make(-i, d,pool));
c += a->check() + b->check();
}

sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);
}

for (int d = min_depth; d <= max_depth; d += 2)
printf("%s", outputstr + (d * LINE_SIZE) );
free(outputstr);

std::cout << "long lived tree of depth " << max_depth << "\t "
<< "check: " << (long_lived_tree->check()) << '\n';

return 0;
}
 
W

W Karas

I have played a little tree with benchmark from computer language game

site:

http://shootout.alioth.debian.org/u64q/performance.php?test=binarytrees



Interested in gc vs manual I have been very surprised with Java

performance ;)

Seems that Java gc is blazingly fast in comparison with standard

new/delete.

Is it possible that the benchmark never causes GC to run? Is so, that would also mean any time it took to run destructors was also ignored in the Java case.
I think this is because gc deallocates in chunks rather piece by

piece, perhaps allocates from pool. Also there are no destructors

calls.

fastest C++ program on this site uses boost pool which calls

destructors and can deallocate, therefore, is slower then C variant

which uses apache's pool, which just deallocate.



So I have coded simple custom pool that does not keeps track of

allocated objects and doesn't call destructors and result is

that it is slightly faster than C ;)



What is clear is that Java garbage collector cannot be bitten

in this case if manual deletion of objects is required.

(as seen here)

http://shootout.alioth.debian.org/u64/performance.php?test=binarytrees



Timings for cpp with my custom alloc

bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m4.188s

user 0m7.036s

sys 0m0.112s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m7.077s

user 0m6.976s

sys 0m0.084s



Timings for C with apache pool:

bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m4.625s

user 0m8.085s

sys 0m0.140s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m8.009s

user 0m7.888s

sys 0m0.116s



code follows if someone is interested:

/* The Computer Language Benchmarks Game

* http://shootout.alioth.debian.org/

*

* contributed by Jon Harrop

* modified by Alex Mizrahi

* modified by Andreas Schäfer

* very minor omp tweak by The Anh Tran

* custom pool Branimir Maksimovic

*/



#include <iostream>

#include <stdlib.h>

#include <stdio.h>

#include <vector>

#include <new>

#include <pthread.h>

#include <boost/pool/object_pool.hpp>



const size_t LINE_SIZE = 64;



template <class T>

class Pool{

static const unsigned NELEM = 10000*sizeof(T);

public:

Pool()

: pos_(0)

{

pool_.push_back(alloc());

}

Pool(const Pool&)=delete;

Pool& operator=(const Pool&)=delete;

~Pool()

{

for(auto i : pool_)

{

free(i);

}

}

template <class ...T1>

T* allocate(T1... args)

{

if(NELEM <= pos_)

{

pool_.push_back(alloc());

pos_=0;

}

void* p = &pool_.back()[pos_];

pos_+=sizeof(T);

return new(p)T(args...);

}

private:

static char* alloc()

{

Inner* next = (Inner*)pthread_getspecific(k.key);

if(next == nullptr)

{

next = (Inner*)::eek:perator new(NELEM);

next->next = nullptr;

}

char* p = (char*)next;

next = next->next;

pthread_setspecific(k.key,next);

return p;

}

static void free(void* p)

{

Inner* next = (Inner*)pthread_getspecific(k.key);

Inner* tmp = (Inner*)p;

tmp->next = next;

next = tmp;

pthread_setspecific(k.key,next);

}

std::vector<char*> pool_;

unsigned pos_;

struct Inner{

Inner* next;

};

static struct Key{

Key()

{

pthread_key_create(&key,destroy);

}

~Key()

{

pthread_key_delete(key);

}

static void destroy(void* p)

{

Inner* next = (Inner*)p;

while(next)

{

Inner* tmp = next->next;

::eek:perator delete(next);

next = tmp;

}

}

pthread_key_t key;

}k;

};



template <class T>

typename Pool<T>::Key Pool<T>::k;



struct Node

{

Node *l, *r;

int i;



Node(int i2) :l(0),r(0), i(i2)

{}

Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)

{}

Node(const Node&)=delete;

Node& operator=(const Node&)=delete;

int check() const

{

if (l)

return l->check() + i - r->check();

else return i;

}

};



typedef boost::eek:bject_pool<Node> NodePool;



Node *make(int i, int d, Pool<Node>& p)

{

if (d > 0)

return p.allocate( make(2*i-1, d-1,p),

i,

make(2*i, d-1,p));

return p.allocate(i);

}



Node *make(int i, int d, NodePool& p)

{

if (d > 0)

return p.construct( make(2*i-1, d-1,p),

i,

make(2*i, d-1,p));

return p.construct(i);

}



int main(int argc, char *argv[])

{

int min_depth = 4;

int max_depth = std::max(min_depth+2,

(argc == 2 ? atoi(argv[1]) : 10));

int stretch_depth = max_depth+1;



// Alloc then dealloc stretchdepth tree

{

Pool<Node> c_pool;

//NodePool c_pool;

Node* c (make(0, stretch_depth,c_pool));

std::cout << "stretch tree of depth " << stretch_depth << "\t "

<< "check: " << c->check() << '\n';

}



Pool<Node> long_lived_tree_pool;

//NodePool long_lived_tree_pool;

Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));



char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));



#pragma omp parallel for

for (int d = min_depth; d <= max_depth; d += 2)

{

int iterations = 1 << (max_depth - d + min_depth);

int c = 0;



for (int i = 1; i <= iterations; ++i)

{

Pool<Node> pool;

//NodePool pool;

Node *a(make(i, d,pool)), *b(make(-i, d,pool));

c += a->check() + b->check();

}



sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);

}



for (int d = min_depth; d <= max_depth; d += 2)

printf("%s", outputstr + (d * LINE_SIZE) );

free(outputstr);



std::cout << "long lived tree of depth " << max_depth << "\t "

<< "check: " << (long_lived_tree->check()) << '\n';



return 0;

}



I have played a little tree with benchmark from computer language game

site:

http://shootout.alioth.debian.org/u64q/performance.php?test=binarytrees



Interested in gc vs manual I have been very surprised with Java

performance ;)

Seems that Java gc is blazingly fast in comparison with standard

new/delete.

I think this is because gc deallocates in chunks rather piece by

piece, perhaps allocates from pool. Also there are no destructors

calls.

fastest C++ program on this site uses boost pool which calls

destructors and can deallocate, therefore, is slower then C variant

which uses apache's pool, which just deallocate.



So I have coded simple custom pool that does not keeps track of

allocated objects and doesn't call destructors and result is

that it is slightly faster than C ;)



What is clear is that Java garbage collector cannot be bitten

in this case if manual deletion of objects is required.

(as seen here)

http://shootout.alioth.debian.org/u64/performance.php?test=binarytrees



Timings for cpp with my custom alloc

bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m4.188s

user 0m7.036s

sys 0m0.112s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m7.077s

user 0m6.976s

sys 0m0.084s



Timings for C with apache pool:

bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m4.625s

user 0m8.085s

sys 0m0.140s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m8.009s

user 0m7.888s

sys 0m0.116s



code follows if someone is interested:

/* The Computer Language Benchmarks Game

* http://shootout.alioth.debian.org/

*

* contributed by Jon Harrop

* modified by Alex Mizrahi

* modified by Andreas Schäfer

* very minor omp tweak by The Anh Tran

* custom pool Branimir Maksimovic

*/



#include <iostream>

#include <stdlib.h>

#include <stdio.h>

#include <vector>

#include <new>

#include <pthread.h>

#include <boost/pool/object_pool.hpp>



const size_t LINE_SIZE = 64;



template <class T>

class Pool{

static const unsigned NELEM = 10000*sizeof(T);

public:

Pool()

: pos_(0)

{

pool_.push_back(alloc());

}

Pool(const Pool&)=delete;

Pool& operator=(const Pool&)=delete;

~Pool()

{

for(auto i : pool_)

{

free(i);

}

}

template <class ...T1>

T* allocate(T1... args)

{

if(NELEM <= pos_)

{

pool_.push_back(alloc());

pos_=0;

}

void* p = &pool_.back()[pos_];

pos_+=sizeof(T);

return new(p)T(args...);

}

private:

static char* alloc()

{

Inner* next = (Inner*)pthread_getspecific(k.key);

if(next == nullptr)

{

next = (Inner*)::eek:perator new(NELEM);

next->next = nullptr;

}

char* p = (char*)next;

next = next->next;

pthread_setspecific(k.key,next);

return p;

}

static void free(void* p)

{

Inner* next = (Inner*)pthread_getspecific(k.key);

Inner* tmp = (Inner*)p;

tmp->next = next;

next = tmp;

pthread_setspecific(k.key,next);

}

std::vector<char*> pool_;

unsigned pos_;

struct Inner{

Inner* next;

};

static struct Key{

Key()

{

pthread_key_create(&key,destroy);

}

~Key()

{

pthread_key_delete(key);

}

static void destroy(void* p)

{

Inner* next = (Inner*)p;

while(next)

{

Inner* tmp = next->next;

::eek:perator delete(next);

next = tmp;

}

}

pthread_key_t key;

}k;

};



template <class T>

typename Pool<T>::Key Pool<T>::k;



struct Node

{

Node *l, *r;

int i;



Node(int i2) :l(0),r(0), i(i2)

{}

Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)

{}

Node(const Node&)=delete;

Node& operator=(const Node&)=delete;

int check() const

{

if (l)

return l->check() + i - r->check();

else return i;

}

};



typedef boost::eek:bject_pool<Node> NodePool;



Node *make(int i, int d, Pool<Node>& p)

{

if (d > 0)

return p.allocate( make(2*i-1, d-1,p),

i,

make(2*i, d-1,p));

return p.allocate(i);

}



Node *make(int i, int d, NodePool& p)

{

if (d > 0)

return p.construct( make(2*i-1, d-1,p),

i,

make(2*i, d-1,p));

return p.construct(i);

}



int main(int argc, char *argv[])

{

int min_depth = 4;

int max_depth = std::max(min_depth+2,

(argc == 2 ? atoi(argv[1]) : 10));

int stretch_depth = max_depth+1;



// Alloc then dealloc stretchdepth tree

{

Pool<Node> c_pool;

//NodePool c_pool;

Node* c (make(0, stretch_depth,c_pool));

std::cout << "stretch tree of depth " << stretch_depth << "\t "

<< "check: " << c->check() << '\n';

}



Pool<Node> long_lived_tree_pool;

//NodePool long_lived_tree_pool;

Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));



char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));



#pragma omp parallel for

for (int d = min_depth; d <= max_depth; d += 2)

{

int iterations = 1 << (max_depth - d + min_depth);

int c = 0;



for (int i = 1; i <= iterations; ++i)

{

Pool<Node> pool;

//NodePool pool;

Node *a(make(i, d,pool)), *b(make(-i, d,pool));

c += a->check() + b->check();

}



sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);

}



for (int d = min_depth; d <= max_depth; d += 2)

printf("%s", outputstr + (d * LINE_SIZE) );

free(outputstr);



std::cout << "long lived tree of depth " << max_depth << "\t "

<< "check: " << (long_lived_tree->check()) << '\n';



return 0;

}



I have played a little tree with benchmark from computer language game

site:

http://shootout.alioth.debian.org/u64q/performance.php?test=binarytrees



Interested in gc vs manual I have been very surprised with Java

performance ;)

Seems that Java gc is blazingly fast in comparison with standard

new/delete.

I think this is because gc deallocates in chunks rather piece by

piece, perhaps allocates from pool. Also there are no destructors

calls.

fastest C++ program on this site uses boost pool which calls

destructors and can deallocate, therefore, is slower then C variant

which uses apache's pool, which just deallocate.



So I have coded simple custom pool that does not keeps track of

allocated objects and doesn't call destructors and result is

that it is slightly faster than C ;)



What is clear is that Java garbage collector cannot be bitten

in this case if manual deletion of objects is required.

(as seen here)

http://shootout.alioth.debian.org/u64/performance.php?test=binarytrees



Timings for cpp with my custom alloc

bmaxa@maxa:~/shootout/trees$ time ./binarytreescpp 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m4.188s

user 0m7.036s

sys 0m0.112s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreescpp 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m7.077s

user 0m6.976s

sys 0m0.084s



Timings for C with apache pool:

bmaxa@maxa:~/shootout/trees$ time ./binarytreesc 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m4.625s

user 0m8.085s

sys 0m0.140s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20

stretch tree of depth 21 check: -1

2097152 trees of depth 4 check: -2097152

524288 trees of depth 6 check: -524288

131072 trees of depth 8 check: -131072

32768 trees of depth 10 check: -32768

8192 trees of depth 12 check: -8192

2048 trees of depth 14 check: -2048

512 trees of depth 16 check: -512

128 trees of depth 18 check: -128

32 trees of depth 20 check: -32

long lived tree of depth 20 check: -1



real 0m8.009s

user 0m7.888s

sys 0m0.116s



code follows if someone is interested:

/* The Computer Language Benchmarks Game

* http://shootout.alioth.debian.org/

*

* contributed by Jon Harrop

* modified by Alex Mizrahi

* modified by Andreas Schäfer

* very minor omp tweak by The Anh Tran

* custom pool Branimir Maksimovic

*/



#include <iostream>

#include <stdlib.h>

#include <stdio.h>

#include <vector>

#include <new>

#include <pthread.h>

#include <boost/pool/object_pool.hpp>



const size_t LINE_SIZE = 64;



template <class T>

class Pool{

static const unsigned NELEM = 10000*sizeof(T);

public:

Pool()

: pos_(0)

{

pool_.push_back(alloc());

}

Pool(const Pool&)=delete;

Pool& operator=(const Pool&)=delete;

~Pool()

{

for(auto i : pool_)

{

free(i);

}

}

template <class ...T1>

T* allocate(T1... args)

{

if(NELEM <= pos_)

{

pool_.push_back(alloc());

pos_=0;

}

void* p = &pool_.back()[pos_];

pos_+=sizeof(T);

return new(p)T(args...);

}

private:

static char* alloc()

{

Inner* next = (Inner*)pthread_getspecific(k.key);

if(next == nullptr)

{

next = (Inner*)::eek:perator new(NELEM);

next->next = nullptr;

}

char* p = (char*)next;

next = next->next;

pthread_setspecific(k.key,next);

return p;

}

static void free(void* p)

{

Inner* next = (Inner*)pthread_getspecific(k.key);

Inner* tmp = (Inner*)p;

tmp->next = next;

next = tmp;

pthread_setspecific(k.key,next);

}

std::vector<char*> pool_;

unsigned pos_;

struct Inner{

Inner* next;

};

static struct Key{

Key()

{

pthread_key_create(&key,destroy);

}

~Key()

{

pthread_key_delete(key);

}

static void destroy(void* p)

{

Inner* next = (Inner*)p;

while(next)

{

Inner* tmp = next->next;

::eek:perator delete(next);

next = tmp;

}

}

pthread_key_t key;

}k;

};



template <class T>

typename Pool<T>::Key Pool<T>::k;



struct Node

{

Node *l, *r;

int i;



Node(int i2) :l(0),r(0), i(i2)

{}

Node(Node *l2, int i2, Node *r2) : l(l2), r(r2), i(i2)

{}

Node(const Node&)=delete;

Node& operator=(const Node&)=delete;

int check() const

{

if (l)

return l->check() + i - r->check();

else return i;

}

};



typedef boost::eek:bject_pool<Node> NodePool;



Node *make(int i, int d, Pool<Node>& p)

{

if (d > 0)

return p.allocate( make(2*i-1, d-1,p),

i,

make(2*i, d-1,p));

return p.allocate(i);

}



Node *make(int i, int d, NodePool& p)

{

if (d > 0)

return p.construct( make(2*i-1, d-1,p),

i,

make(2*i, d-1,p));

return p.construct(i);

}



int main(int argc, char *argv[])

{

int min_depth = 4;

int max_depth = std::max(min_depth+2,

(argc == 2 ? atoi(argv[1]) : 10));

int stretch_depth = max_depth+1;



// Alloc then dealloc stretchdepth tree

{

Pool<Node> c_pool;

//NodePool c_pool;

Node* c (make(0, stretch_depth,c_pool));

std::cout << "stretch tree of depth " << stretch_depth << "\t "

<< "check: " << c->check() << '\n';

}



Pool<Node> long_lived_tree_pool;

//NodePool long_lived_tree_pool;

Node* long_lived_tree(make(0, max_depth,long_lived_tree_pool));



char *outputstr = (char*)malloc(LINE_SIZE * (max_depth +1) * sizeof(char));



#pragma omp parallel for

for (int d = min_depth; d <= max_depth; d += 2)

{

int iterations = 1 << (max_depth - d + min_depth);

int c = 0;



for (int i = 1; i <= iterations; ++i)

{

Pool<Node> pool;

//NodePool pool;

Node *a(make(i, d,pool)), *b(make(-i, d,pool));

c += a->check() + b->check();

}



sprintf(outputstr + LINE_SIZE * d, "%d\t trees of depth %d\t check: %d\n", (2 * iterations), d, c);

}



for (int d = min_depth; d <= max_depth; d += 2)

printf("%s", outputstr + (d * LINE_SIZE) );

free(outputstr);



std::cout << "long lived tree of depth " << max_depth << "\t "

<< "check: " << (long_lived_tree->check()) << '\n';



return 0;

}
 
M

Melzzzzz

Is it possible that the benchmark never causes GC to run? Is so,
that would also mean any time it took to run destructors was also
ignored in the Java case.
GC runs heavily in this case as memory will be exhausted pretty
quickly if not.
Take for example comparison between C on single CPU and Java
(non threaded version, as threaded version suffers some on single CPU).
C uses fast pool from apache while Java uses GC!
What is also interesting is that with openmp it is actually easier
to parallelize with C/C++ than Java ;)

bmaxa@maxa:~/shootout/trees$ time java binarytrees 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m8.537s
user 0m8.949s
sys 0m0.236s

bmaxa@maxa:~/shootout/trees$ time taskset -c 0 ./binarytreesc 20
stretch tree of depth 21 check: -1
2097152 trees of depth 4 check: -2097152
524288 trees of depth 6 check: -524288
131072 trees of depth 8 check: -131072
32768 trees of depth 10 check: -32768
8192 trees of depth 12 check: -8192
2048 trees of depth 14 check: -2048
512 trees of depth 16 check: -512
128 trees of depth 18 check: -128
32 trees of depth 20 check: -32
long lived tree of depth 20 check: -1

real 0m8.054s
user 0m7.936s
sys 0m0.112s
 
J

Juha Nieminen

Melzzzzz said:
Seems that Java gc is blazingly fast in comparison with standard
new/delete.

Yes, new/delete in C++ is very slow (as well as malloc()/free() in C).
That's the reason why a savvy C++ programmer will avoid doing tons of
them if it's possible to avoid them. Depending on the algorith, this is,
in fact, a relatively frequent occurrence (ie. that you can avoid using
tons of news and deletes).
 
J

James Kanze

I have played a little tree with benchmark from computer language game
site:

Interested in gc vs manual I have been very surprised with Java
performance ;)

Beware of any benchmark you haven't falsified yourself.
Seems that Java gc is blazingly fast in comparison with standard
new/delete.

Typically, the classical algorithms for garbage collection will
be faster than manual allocation in some applications, slower
than manual in other---the classical gc algorithms shine when
there are a lot of small, short lived objects, for example. So
people interested in showing garbage collection in its best
light write benchmarks which make use of a lot of small,
short-lived objects: trees and graphs are very popular for this.
If your application makes extensive use of trees or graphs with
a large number of very small nodes, you'd benefit,
performance-wise, by using garbage collection in C++. (The
Boehm collector works very well, for example.)

Of course, the real argument for garbage collection isn't
performance; it's safety. There's no way to get a dangling
pointer (to dynamically allocated memory) with garbage
collection. So when you destruct an object, you can mark it as
destructed, and be sure that it will stay marked until there are
no pointers to it.
I think this is because gc deallocates in chunks rather piece by
piece, perhaps allocates from pool. Also there are no destructors
calls.
fastest C++ program on this site uses boost pool which calls
destructors and can deallocate, therefore, is slower then C variant
which uses apache's pool, which just deallocate.
So I have coded simple custom pool that does not keeps track of
allocated objects and doesn't call destructors and result is
that it is slightly faster than C ;)
What is clear is that Java garbage collector cannot be bitten
in this case if manual deletion of objects is required.

I'm not sure what you mean by "manual deletion". If the objects
have a non-trivial destructor even in the absense of memory
management issues, then it must be called. Regardless of the
language. (Not calling it is a frequent error in Java code.)
 
J

Juha Nieminen

James Kanze said:
Of course, the real argument for garbage collection isn't
performance; it's safety. There's no way to get a dangling
pointer (to dynamically allocated memory) with garbage
collection. So when you destruct an object, you can mark it as
destructed, and be sure that it will stay marked until there are
no pointers to it.

Depends on your definition of "safety". Garbage collection makes the
execution of destructors ("finalizers") non-deterministic and, in fact,
they might not be executed *at all* in Java. Thus destructors are
basically useless. AFAIK Java does not offer any alternative (other
than C-style manual resource management, other than memory).
 
J

James Kanze

Depends on your definition of "safety". Garbage collection makes the
execution of destructors ("finalizers") non-deterministic and, in fact,
they might not be executed *at all* in Java. Thus destructors are
basically useless. AFAIK Java does not offer any alternative (other
than C-style manual resource management, other than memory).

I don't think you understand how garbage collection works.
Finalizers have nothing to do with destructors; destructors work
exactly as they always have, even with garbage collection.
Garbage collection has no impact with regards to destructors.

Note that the safety aspect doesn't mean that you don't "delete"
objects. What it means, above all, is that you don't reuse
"deleted" memory until there are no more pointers to it. So if
you somehow mark the memory as "deleted", you can detect any
attempt to use it---that mark won't be overwritten by a later
allocation. And that if you forget too delete the object before
the last pointer to it disappears, the finalize method can tell
you.

There's also a convenience aspect, that you don't need to
destruct objects with trivial destructors, and that most
destructors do become trivial once they don't have to worry
about memory management. But this is secondary to the safety
issue.
 
J

Juha Nieminen

James Kanze said:
I don't think you understand how garbage collection works.
Finalizers have nothing to do with destructors; destructors work
exactly as they always have, even with garbage collection.
Garbage collection has no impact with regards to destructors.

You are right. I don't understand what you are talking about. I don't
even understand what you mean by "finalizer" and "destructor" here.

Java classes have destructors, but due to GC they get called some time
in the future, maybe not at all. That makes them practically useless.
You better not rely on them to free important non-memory resources.
And Java doesn't really provide any alternative either.
 
J

James Kanze

You are right. I don't understand what you are talking about. I don't
even understand what you mean by "finalizer" and "destructor" here.
Java classes have destructors,

Java classes don't have destructors. At least not according to
the Java language specification.

In practice, "destructors" (in the largest sense of the word)
are essential for some classes. There is a relatively
widespread (but far from universal) convention in Java to call
them "dispose": if your class has a "dispose" function, clients
are expected to call it whenever the lifetime of the object ends
(logically, of course). Typically using a try ... finally,
since the language has no mechanism for automating calling it.
but due to GC they get called some time in the future, maybe
not at all. That makes them practically useless.

Those aren't destructors, but finalizer methods. And they can
be useful: if the object requires destruction (most don't), you
assert that the destructor has been called.
You better not rely on them to free important non-memory resources.

Of course not. That's not what they're there for,
And Java doesn't really provide any alternative either.

The "alternative" is try ... finally. In my opinion, not a
particularly good alternative, since it leaves the
responsibility in the hands of the client, rather than having
the author of the class enforce it.

But the argument try...finally vs. destructors is orthogonaly to
arguments concerning garbage collection.
 
J

Juha Nieminen

James Kanze said:
The "alternative" is try ... finally. In my opinion, not a
particularly good alternative, since it leaves the
responsibility in the hands of the client, rather than having
the author of the class enforce it.

Besides, that only works for objects that do not outlive the scope where
they are created, nor are shared. (In other words, it works only for
strictly scope-bound, non-shared objects.) Since it's next to impossible
to implement any kind of reference counting in Java (because you can't
overload a reference assignment nor destruction), and it's overall difficult
to know if an object is being currently shared or not, it's likewise very
difficult to implement any kind of automatic "when the last reference to
this object dies, execute this function".
 
G

goran.pusic

I have played a little tree with benchmark from computer language game

site:

http://shootout.alioth.debian.org/u64q/performance.php?test=binarytrees



Interested in gc vs manual I have been very surprised with Java

performance ;)

Seems that Java gc is blazingly fast in comparison with standard

new/delete.

I think this is because gc deallocates in chunks rather piece by

piece, perhaps allocates from pool. Also there are no destructors

calls.

fastest C++ program on this site uses boost pool which calls

destructors and can deallocate, therefore, is slower then C variant

which uses apache's pool, which just deallocate.



So I have coded simple custom pool that does not keeps track of

allocated objects and doesn't call destructors and result is

that it is slightly faster than C ;)



What is clear is that Java garbage collector cannot be bitten

in this case if manual deletion of objects is required.

Absolutely. Perhaps "cannot be beaten" are big words, but when most of whatyou do is allocation/deallocation, GC is certainly hard to beat with C or C++ (as far as allocation goes, the two are the same, really).

As said else-thread, C and C++ tend to be faster overall because in C and C++ it's easier to avoid allocation altogether (there's more control). Often, that speed comes at the expense of flexibility (fixed buffer sizes here and there), and safety (direct access to memory).

Goran.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top