Do you use a garbage collector?

J

Jerry Coffin

[ ... ]
Good thing I neither said nor implied that it is, then. However,
all Java VMs that I know of support it. for the obvious reason that
GCs which don't compact would fail pretty quickly with heap
fragmentation [1]. Most C++ implementations don't support it, because
their pointers are implemented as machine addresses.and they lack a
mechanism to fix up the pointers after a compaction.

1. Plus the fact that if you have enough information to do GC, you
have more than enough to make compaction work.

Not really. A typical gc in C++ is added on after the fact, so it does
conservative collection -- since it doesn't know what is or isn't a
pointer, it treats everything as if it was a pointer, and assumes that
whatever if would point at if it was as pointer is live memory. Of
course, some values wouldn't be valid pointers and are eliminated.

It does NOT, however, know with any certainty that a particular value IS
a pointer -- some definitely aren't (valid) pointers, but others might
or might not be. Since it doesn't know for sure which are pointers and
which are just integers (or whatever) that hold values that could be
pointers, it can't modify any of them. It has enough information to do
garbage collection, but NOT enough to support compacting the heap.
 
J

Jerry Coffin

Erik- said:
Not quite sure, seems to be something like mmap, perhaps you can use it
to allocate memory for a heap, but there are other functions available
to manage heaps in Windows.

VirtualAlloc is (nearly) the lowest level memory allocation routine
available to normal user-mode programs in Win32. Nearly everything else
(e.g. HeapAlloc) is built on top of VirtualAlloc.

It's NOT much like mmap -- the Windows analog of mmap would be
MapViewOfFile (along with CreateFileMapping and OpenFileMapping).
 
M

Mike Schilling

Jerry Coffin said:
Not really. A typical gc in C++ is added on after the fact, so it
does
conservative collection -- since it doesn't know what is or isn't a
pointer, it treats everything as if it was a pointer, and assumes
that
whatever if would point at if it was as pointer is live memory. Of
course, some values wouldn't be valid pointers and are eliminated.

It does NOT, however, know with any certainty that a particular
value IS
a pointer -- some definitely aren't (valid) pointers, but others
might
or might not be. Since it doesn't know for sure which are pointers
and
which are just integers (or whatever) that hold values that could be
pointers, it can't modify any of them. It has enough information to
do
garbage collection, but NOT enough to support compacting the heap.

You make a good point. I overlooked conservative collectors.
 
E

Erik Wikström

VirtualAlloc is (nearly) the lowest level memory allocation routine
available to normal user-mode programs in Win32. Nearly everything else
(e.g. HeapAlloc) is built on top of VirtualAlloc.

It's NOT much like mmap -- the Windows analog of mmap would be
MapViewOfFile (along with CreateFileMapping and OpenFileMapping).

I was thinking of using mmap to map anonymous memory (mapping physical
RAM (or swap-backed memory) into the virtual address space), but I see
now that it is a platform-specific extension.
 
B

Bo Persson

Razii said:
How does std::vector keeps growing? Doesn't it allocates larger and
larger blocks of memory? Doesn't vector call the allocate method of
an instance object you are adding to get hold of memory? You didn't
remove 'new' from your code. the 'new' and 'delete' are in the
vector source code.

Isn't that what I said? The new and delete operators are not used very
often in the code. Perhaps only once each in std::allocator, where
they don't allocate int-sized objects, and never 10 million times in a
loop.

That's why the benchmark is silly - you would never do anything like
that in real C++ code.


Bo Persson
 
I

Ian Collins

Razii said:
std::vector has new and delete. They are still there and must be there
every time memory is allocated dynamically.
Didn't you read the above? std::vector is very unlikely to allocate
objects one at a time. If the programmer knows he or she is going to
place a large number of objects he or she will pre-allocate them.
It's not silly. It's a benchmark that tests dynamic memory allocation
and GC performance.
It's silly because it tests something you will never see in C++ code.
 
G

gpderetta

Yes, I am sure..

g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o "new.exe"

No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?
 
G

gpderetta

No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?

BTW, could you benchmark this version:

http://pastebin.com/m16980424

This is nothing a decent C++ programmer would ever write [1], but so
it is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two time faster than
-O2).

[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.
 
M

Mirek Fidler

OFFTOPIC: Chris, I have tried to send you an email concerning AppCore.
Have you got it?

Mirek
 
B

Bo Persson

Razii said:
Yes I tried -O3. There was no difference.


Why? At least commercial C++ software will have to target the
least-common-denominator processor. The flags we use must target the
least-common-denominator processor.

Of course not, now you are being silly again.

If the program runs fast enough on mid-sized hardware, optimize for
that. It will run even better on the big iron.

If you need the lastest hardware, optimize for that. Don't bother with
the rest.
In any case, I added

-march=athlon-xp

You're kidding! :)
there was no change.

Time: 26656 ms

Java version (with the flags I suggested) is at

Time: 1789 ms

14 (or is that 15?) times faster.

Ok, so for Java you must optmize for the actual test machine? :))


Bo Persson
 
B

Bo Persson

Razii said:
There is no such java flag that optimizes for test machine. Java
code is compiled to bytecode, not machine code. JIT would compile
it at run time to machine code. JIT compiler knows what processor
it is running on, and can generate code specifically for that
processor. It knows whether the processor is a PIV or athlon, and
how big the caches are. A C++ compiler must target the
least-common-denominator processor.

No, the C++ can also target the appropriate system. You optimize for
the minimum target that is fast enough. On anything bigger or faster,
it just runs even better. Targeting x86 does not mean 386!


JIT doesn't buy you anything here. We have been through this before!


Bo Persson
 
G

gpderetta

BTW, could you benchmark this version:

This is nothing a decent C++ programmer would ever write [1], but so
it is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two time faster than
-O2).
[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.

On my old machine (P4 3.06 HyperThread)

This version <http://pastebin.com/m16980424> outputs:

I literally waited for about two minutes, and finally hit Ctrl-C... There is
something wrong. I have not studided your code yet.

A typo: the 'heap' arrays should have type 'Tree', not 'Tree*' (I was
playing with dynamically allocating the heap and forgot the '*' when I
reverted the change back).

try:
http://pastebin.com/m3a18a8e1
And my newest version <http://pastebin.com/m3a18a8e1> outputs:

Time: 7015 ms

What times are you getting?

On my laptop (pentium M-2.0gz).

Time: 670 ms

I think that this version is memory bandwidth limited.

If you really want to see how fast a smart compiler could optimize
this program,
refactor the call to DestroyTree(CreateTree()) to another function and
mark it with __attribute__((pure)) (which is legal as the function has
no side effect). In my tests it prints Time: 0. This in practice shows
how useless is this benchmark.
 
G

gpderetta

BTW, could you benchmark this version:

You have a bug. I fixed it. Here is the code:

http://pastebin.com/m67c0ee86

The bug was in line 17:

static Tree * heap[HEAP_SIZE];

You need to define it as:

static Tree heap[HEAP_SIZE];


Ah, ok, you found the bug :)
Sorry for posting a buggy program, by chance that version worked fine
here. :)

Fortunately I'm in good company here at posting buggy programs on the
first try ;)
:^)

I am getting:

Time 1935 ms

IMHO, there is one major flaw... This is not an example of dynamic memory.
This is basically static memory. Any thoughts?

So what? What's get the job done is fine for me. :)
I wouldn't be surprised if the java compiler was transforming the code
in something similar to my program.
Usually c++ compilers are bad at optimizing away allocation: for
example gcc thread malloc as having side effects and won't optimize
call to it.

Region allocation is still dynamic memory allocation (albeit a very
simple one).
It can be only used on limited circumstances (this test being a very
good one), but when it works, it works beautifully.
There are general purposes allocators that use regions when they can.
You probably know about 'reaps'

At most, what this test can show, is that C++ is bad at being Java
(which is what one would have expected).
 
G

gpderetta

Yes, this not dynamic memory.

Yes it is. Google for region allocator.
Ian Collins played the same trick. If
you change CreateTree(22)to CreateTree(23) (needs well over 100 MB),
the program will break.

Add the appropriate -DHEAP_SIZE=n on the command line.
By the way, I can do the same in Java too by Object pooling. Basically
make a big ol' List of Test objects, and then pick one from the list

Of course. The point being if you really are allocation limited (which
is *very* rarely the case, especially in C++),
you are better off doing hand optimizations, instead of relying on the
compiler to do it.

BTW, could you benchmark a Java version using the same trick?
 
G

gpderetta

On Sun, 13 Apr 2008 20:15:44 -0700, "Chris Thomasson"
Are you sure that you have been compiling the C++ code in non-debug
mode?
Yes, I am sure..
g++ -O2 -fomit-frame-pointer -finline-functions "new.cpp" -o
"new.exe"
No, you are not. Add -DNDEBUG. Also did you measure with -O3? Did you
try to tune -march for your architecture (this can make a *lot* of
difference - or not, depending of the program)?
BTW, could you benchmark this version:
http://pastebin.com/m16980424
This is nothing a decent C++ programmer would ever write [1], but so
it is the benchmark itself.
Be sure to use -O3 (on my machine, with gcc it is two time faster than
-O2).
[1] my version uses a very simple region allocator, which in some
extreme cases (HPC, embedded devices, benchmarks :) ) might actually
make sense.
On my old machine (P4 3.06 HyperThread)
This version <http://pastebin.com/m16980424> outputs:
I literally waited for about two minutes, and finally hit Ctrl-C... There
is
something wrong. I have not studided your code yet.
A typo: the 'heap' arrays should have type 'Tree', not 'Tree*' (I was
playing with dynamically allocating the heap and forgot the '*' when I
reverted the change back).
On my laptop (pentium M-2.0gz).
Time: 670 ms
I think that this version is memory bandwidth limited.

Are you referring to my version here:

http://pastebin.com/m3a18a8e1

If so, why do you think that its "memory bandwidth limited"? The code is
ad-hoc, I typed it out and tested it all in about 10 minutes, and it only
really works correctly within this benchmark. It can be improved so much.

No, to my version. It basically it is just filling memory with random
bytes and incrementing a variable.
It *has* to be memory bandwidth limited :)
 
T

Thomas J. Gritzan

Chris said:
Well, IMHO, Razii makes a fairly good point here... A _dynamic_
allocator _should_ be able to cope with the simple change he made (e.g.,
CreateTree(23)) without having to add special flags to the command line.
That is the allocator should be able to dynamically adapt to various
allocation needs. Think if your program took an argument on the command
line that can set the recursion depth of CreateTree. E.g.:


your_app 22


I would expect it to at least attempt to work if I types:

your_app 23
your_app 24
your_app 25


without having to recompile it each time in order to adjust the
'-DHEAP_SIZE=n'. Does that make any sense?

Not that I would want to interfere with your benchmark, but you could try
the boost pool allocator:
http://www.boost.org/doc/libs/1_35_0/libs/pool/doc/index.html

You could allocate/deallocate the objects one by one or even drop all
objects at once after each loop (at the point where System.gc() is in the
Java code)
 
M

Mirek Fidler

The benchmark is not useless. You failed to meet the requirement that
objects must be created dynamically. Your version doesn't. If
CreateTree(5) .. your version is wasting memory. If CreateTree(23),
your versions breaks down.

That's not dynamic memory.

Well, now you see, this is nearly the same problem as with your Java
flags...

Mirek
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,767
Messages
2,569,572
Members
45,046
Latest member
Gavizuho

Latest Threads

Top