Tremendous slowdown due to garbage collection


Martin v. Löwis

Dieter Maurer said:
| Martin said it, but nevertheless it might not be true.
| We observed similar very bad behaviour -- in a Web application server.
| Apparently, the standard behaviour is far from optimal when the
| system contains a large number of objects and, occasionally, large
| numbers of objects are created in a short time.
| We have seen such behaviour during parsing of larger XML documents, for
| example (in our Web application).

I don't want to claim that the *algorithm* works well for all typical
applications. I just claim that its *parameters* are fine.
The OP originally proposed to change the parameters, making garbage
collection run less frequently. This would a) have bad consequences
in terms of memory consumption on programs that do have allocation
spikes, and b) have no effect on the asymptotic complexity of the
algorithm in the case discussed.

It may well be that other algorithms would perform better, but
none have been contributed.

Regards,
Martin
 

Terry Reedy

| We observed similar very bad behaviour -- in a Web application server.
| Apparently, the standard behaviour is far from optimal when the
| system contains a large number of objects and, occasionally, large
| numbers of objects are created in a short time.
| We have seen such behaviour during parsing of larger XML documents, for
| example (in our Web application).

Does the standard alternative behavior of temporarily turning cyclic gc off
solve your problem?

Can this alternative be made easier by adding a context manager to the gc
module, for use with 'with' statements? Something like

with gc.delay() as dummy:
<block>

with exit invoking the collector (or make that a delay keyword param?)
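
For illustration, a minimal sketch of what such a context manager could look
like, written against the existing gc API (gc.delay itself does not exist;
gc_delay and collect_on_exit below are made-up names):

import contextlib
import gc

@contextlib.contextmanager
def gc_delay(collect_on_exit=True):
    # Disable cyclic GC for the duration of the with-block and,
    # optionally, run one collection when the block exits.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        if collect_on_exit:
            gc.collect()

# usage: build a large structure without intermediate collections
# with gc_delay():
#     data = [(i,) for i in range(2000000)]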

Do the docs need some improvement in this area?

tjr
 

Paul Rubin

Terry Reedy said:
| Can this alternative be made easier by adding a context manager to the gc
| module, for use with 'with' statements? Something like
|
| with gc.delay() as dummy:
|     <block>

That sounds worth adding as a hack, but really I hope there can be
an improved gc someday.
 

Dieter Maurer

Martin v. Löwis said:
| I don't want to claim that the *algorithm* works well for all typical
| applications. I just claim that its *parameters* are fine.
| The OP originally proposed to change the parameters, making garbage
| collection run less frequently. This would a) have bad consequences
| in terms of memory consumption on programs that do have allocation
| spikes, and b) have no effect on the asymptotic complexity of the
| algorithm in the case discussed.

In our case, it helped to change the parameters:

As usual in Python, in our case cyclic garbage is very rare.
On the other hand, we have large caches with lots of objects,
i.e. a large number of long-lived objects.
Each generation 2 garbage collection visits the complete
object set. Thus, if it is called too often, matters can
deteriorate drastically.

In our case, the main problem was not the runtime
but the fact that the GIL is held during GC (understandably).
This meant that every few minutes we had a scheduling
distortion on the order of 10 to 20 s (the duration of
our generation 2 GC).

We changed the parameters to let generation 2 GC happen
at about 1/1000 of its former frequency.
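
In code, such a reconfiguration looks roughly like this (the concrete numbers
below are illustrative; the post does not say which values were actually used):

import gc

# Default thresholds are (700, 10, 10): very roughly one full
# (generation 2) collection per 700 * 10 * 10 = 70,000 net allocations
# of container objects.
print(gc.get_threshold())

# Raising the second and third thresholds multiplies the interval
# between full collections; (700, 100, 1000) makes them roughly
# 1000 times rarer than the default.
gc.set_threshold(700, 100, 1000)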


I do not argue that Python's default GC parameters must change -- only
that applications with lots of objects may want to consider a
reconfiguration.
 

John Nagle

Dieter Maurer said:
| We observed similar very bad behaviour -- in a Web application server.
| Apparently, the standard behaviour is far from optimal when the
| system contains a large number of objects and, occasionally, large
| numbers of objects are created in a short time.
| We have seen such behaviour during parsing of larger XML documents, for
| example (in our Web application).

Our solution to that was to modify BeautifulSoup to use weak pointers.
All the pointers towards the root and towards previous parts of the
document are "weak". As a result, reference counting alone is sufficient
to manage the tree. We still keep GC enabled, but it doesn't find much
to collect.
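
Roughly, the idea looks like this (a generic sketch, not the actual
BeautifulSoup code):

import weakref

class Node:
    # Child links are strong references, so the root keeps the tree alive;
    # the parent link is weak, so there is no reference cycle and plain
    # reference counting reclaims the whole tree once the root goes away.
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self._parent = weakref.ref(parent) if parent is not None else None

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

root = Node("html")
body = Node("body", parent=root)
root.children.append(body)
assert body.parent is root   # behaves like a normal back-pointer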

John Nagle
SiteTruth
 

Martin v. Löwis

Dieter Maurer said:
| I do not argue that Python's default GC parameters must change -- only
| that applications with lots of objects may want to consider a
| reconfiguration.

That's exactly what I was trying to say: it's not that the parameters
are useful for *all* applications (that's why they are tunable
parameters), just that the defaults shouldn't change (as the OP was
requesting).

Regards,
Martin
 

s0suk3

Andreas said:

I should have been more specific about possible fixes.
python2.5 -m timeit -s 'import gc' 'gc.disable();l=[(i,) for i in range(2000000)]'
10 loops, best of 3: 662 msec per loop
python2.5 -m timeit -s 'import gc' 'gc.enable();l=[(i,) for i in range(2000000)]'
10 loops, best of 3: 15.2 sec per loop
In the latter case, the garbage collector apparently consumes about
97% of the overall time.

In a related thread on http://bugs.python.org/issue2607
Amaury Forgeot d'Arc suggested a setting of the GC
thresholds that actually solves the problem for me:
| Disabling the gc may not be a good idea in a real application; I suggest
| you to play with the gc.set_threshold function and set larger values, at
| least while building the dictionary. (700, 1000, 10) seems to yield good
| results.
python2.5 -m timeit -s 'import gc' 'gc.set_threshold(700,1000,10);l=[(i,) for i in range(2000000)]'

10 loops, best of 3: 658 msec per loop

which made me suggest to use these as defaults, but then
Martin v. Löwis wrote that
| No, the defaults are correct for typical applications.

At that point I felt lost, and as the general wish in that thread was
to move the discussion to comp.lang.python, I brought it up here in a
modified and simplified form.
I would suggest to configure the default behaviour of the garbage
collector in such a way that this squared complexity is avoided
without requiring specific knowledge and intervention by the user. Not
being an expert in these details I would like to ask the gurus how
this could be done.
I hope this should be at least technically possible; whether it is
really desirable or important for a default installation of Python
could then be discussed once the disadvantages of such a setting
become apparent.

I still don't see what is so good about defaults that lead to O(N*N)
computation for an O(N) problem, and I like Amaury's suggestion a lot,
so I would like to see comments on its disadvantages. Please don't
tell me that O(N*N) is good enough. For N>1E7 it isn't.
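
To make the complexity claim concrete, here is a rough timing sketch (not
from the thread). On the Python 2.5 interpreters discussed here the
gc-enabled column grows much faster than linearly; newer CPython releases
soften the effect with additional heuristics, so the numbers will differ:

import gc
import time

def build(n, with_gc):
    # Time the creation of n small gc-tracked containers.
    gc.collect()
    if with_gc:
        gc.enable()
    else:
        gc.disable()
    start = time.time()
    data = [(i,) for i in range(n)]
    elapsed = time.time() - start
    gc.enable()
    return elapsed

for n in (500000, 1000000, 2000000):
    print("%8d  gc on: %6.2f s   gc off: %6.2f s"
          % (n, build(n, True), build(n, False)))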

About some other remarks made here:
| I think the linguists of the world should write better automated
| translation systems. Not being an expert in these details I would like
| to ask the gurus how it could be done.

I fully agree, and that is why I (as someone involved with MT) would
prefer to focus on MT and not on memory allocation issues, and by the
way, this is why I normally prefer to work in Python, as it normally
lets me focus on the problem instead of the tool.
| There are going to be pathological cases in any memory allocation
| scheme. The fact that you have apparently located one calls for you to
| change your approach, not for the finely-tuned, well-conceived garbage
| collection scheme to be abandoned for your convenience.

I do not agree at all. Having to deal with N>1E7 objects is IMHO not
pathological, it is just daily practice in data-oriented work (we're
talking about deriving models from Gigawords, not about toy systems).
| If you had made a definite proposal that could have been evaluated,
| your request would perhaps have seemed a little more approachable.

I feel it is ok to describe a generic problem without having found
the answer yet.
(If you disagree: Where are your definite proposals wrt. MT ? ;-)
But I admit, I should have brought up Amaury's definite proposal
right away.
| A question often asked ... of people who are creating
| very large data structures in Python is "Why are you doing that?"
| That is, you should consider whether some kind of database solution
| would be better. You mention lots of dicts--it sounds like some
| balanced B-trees with disk loading on demand could be a good choice.

I do it because I have to count frequencies of pairs of things that
appear in real data. Extremely simple stuff, only interesting because
of the size of the data set. After Amaury's hint to switch GC off
temporarily, I can count 100M pair tokens (18M pair types) in about
15 minutes, which is very cool, and I can do it in only a few lines
of code, which is even cooler.
Any approach that would touch the disk would be too slow for me, and
bringing in complicated data structures myself would distract me
too much from what I need to do. That's exactly why I use Python;
it has lots of highly fine-tuned, complicated stuff behind the scenes
that is just doing the right thing, so I should not have to (and
could not) re-invent this myself.
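
A few lines in this spirit (a sketch only; pair_stream is a made-up stand-in
for the real data, which the thread does not show):

import gc
from collections import defaultdict

def count_pairs(pair_stream):
    # Count frequencies of pair tokens with cyclic GC switched off while
    # the dictionary grows, then re-enable it afterwards.
    counts = defaultdict(int)
    gc.disable()
    try:
        for pair in pair_stream:
            counts[pair] += 1
    finally:
        gc.enable()
    return counts

counts = count_pairs([("a", "b"), ("a", "b"), ("b", "c")])
print(counts[("a", "b")])   # -> 2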

Thanks for the helpful comments,
Andreas

In the issue at bugs.python.org, why do you say "Do I have to switch
to Perl or C to get this done???", as if Perl and C were similar
languages? If we were to look at Python, Perl, and C together, Python
and Perl would be similar languages, and C would be quite something
different, *especially* regarding the issue being discussed: speed. So what
do you mean by saying "Perl or C"?
 

Aaron Watters

Dieter Maurer said:
| I do not argue that Python's default GC parameters must change -- only
| that applications with lots of objects may want to consider a
| reconfiguration.

I would argue that changing the GC to some sort
of adaptive strategy should at least be investigated.
Having an app which doesn't need gc spending most
of its time doing gc is annoying, to say the least.

Of course any such change should be tested and
analysed thoroughly before it goes into Python 2.6.1
or whatever, and it would be great if several
alternatives were considered and compared.

Hmmm. I think
this would make an excellent computer science Master's
Thesis topic. Anybody looking for a topic?
-- Aaron Watters

====
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=use+while+fresh
 

Dieter Maurer

John Nagle said:
| Our solution to that was to modify BeautifulSoup to use weak pointers.
| All the pointers towards the root and towards previous parts of the
| document are "weak". As a result, reference counting alone is sufficient
| to manage the tree. We still keep GC enabled, but it doesn't find much
| to collect.

It will not help in our setup.

We, too, have almost no cycles -- but the GC does not know this:

If a large number of objects are created temporarily and not released
before the generation 1 threshold is reached, then
the garbage collector will start collections -- even if there
are no or very few cycles.
A generation 2 garbage collection takes time proportional
to the total number of (GC-aware) objects -- independent of
the number of cycles.
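
A small sketch of that last point (an example, not Dieter's code): the cost
of one full collection grows with the number of live container objects it
has to visit, even though none of them form cycles:

import gc
import time

def full_collection_time(n):
    # Keep n gc-tracked objects alive (no cycles anywhere) and time a
    # single full collection, which must still visit all of them.
    live = [[i] for i in range(n)]
    start = time.time()
    gc.collect()
    elapsed = time.time() - start
    del live
    return elapsed

for n in (100000, 1000000, 4000000):
    print("%8d live objects: %.3f s per full collection"
          % (n, full_collection_time(n)))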

Dieter
 
