Documenting builtin methods

Joshua Landau

I have this innocent and simple code:

from collections import deque
exhaust_iter = deque(maxlen=0).extend
exhaust_iter.__doc__ = ("Exhaust an iterator efficiently without "
                        "caching any of its yielded values.")

Obviously it does not work. Is there a way to get it to work simply
and without creating a new scope (which would be a rather inefficient
way to set documentation, and would hamper introspection)?

How about dropping the "simply" requirement?
 
alex23

I have this innocent and simple code:

from collections import deque
exhaust_iter = deque(maxlen=0).extend
exhaust_iter.__doc__ = ("Exhaust an iterator efficiently without "
                        "caching any of its yielded values.")

Obviously it does not work. Is there a way to get it to work simply
and without creating a new scope (which would be a rather inefficient
way to set documentation, and would hamper introspection)?

I would just go with the most obvious approach:

def exhaust_iter(iter):
    """
    Exhaust an iterator efficiently without caching
    any of its yielded values
    """
    deque(maxlen=0).extend(iter)

It's not going to be that inefficient unless you're calling it in a long
inner loop.
 
Steven D'Aprano

I have this innocent and simple code:

from collections import deque
exhaust_iter = deque(maxlen=0).extend

At this point, exhaust_iter is another name for the bound instance method
"extend" of one specific deque instance.

Other implementations may do otherwise[1], but CPython optimizes built-in
methods and functions. E.g. they have no __dict__ so you can't add
attributes to them. When you look up exhaust_iter.__doc__, you are
actually looking up (type(exhaust_iter)).__doc__, which is a descriptor:

py> type(exhaust_iter).__doc__
<attribute '__doc__' of 'builtin_function_or_method' objects>
py> type(type(exhaust_iter).__doc__)
<class 'getset_descriptor'>


Confused yet? Don't worry, you will be...

So, calling exhaust_iter.__doc__:

1) looks up '__doc__' on the class "builtin_function_or_method", not the
instance;

2) which looks up '__doc__' on the class __dict__:

py> type(exhaust_iter).__dict__['__doc__']
<attribute '__doc__' of 'builtin_function_or_method' objects>

3) This is a descriptor with __get__ and __set__ methods. Because the
actual method is written in C, you can't access its internals except via
the API: even the class __dict__ is not really a dict, it's a wrapper
around a dict:

py> type(type(exhaust_iter).__dict__)
<class 'mappingproxy'>


Anyway, we have a descriptor that returns the doc string:

py> descriptor = type(exhaust_iter).__doc__
py> descriptor.__get__(exhaust_iter)
'Extend the right side of the deque with elements from the iterable'

My guess is that it is fetching this from some private C member, which
you can't get to from Python except via the descriptor. And you can't set
it:

py> descriptor.__set__(exhaust_iter, '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute '__doc__' of 'builtin_function_or_method' objects is not writable


which is probably because if you could write to it, it would change the
docstring for *every* deque. And that would be bad.
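
The behaviour described above can be checked directly. This is only a minimal sketch, assuming CPython 3; the names `has_instance_dict` and `error_message` are illustrative and not from the thread:

```python
from collections import deque

exhaust_iter = deque(maxlen=0).extend

# Built-in methods carry no per-instance __dict__, so there is nowhere
# to store a new attribute on them.
has_instance_dict = hasattr(exhaust_iter, "__dict__")

# Assigning __doc__ goes through the descriptor on the type, which
# refuses the write.
try:
    exhaust_iter.__doc__ = "Exhaust an iterator."
except AttributeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(has_instance_dict)
print(error_message)
```

The exact wording of the error message may vary between Python versions, so the sketch only records it rather than comparing against a fixed string.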

If this were a pure-Python method, you could probably bypass the
descriptor, but it's a C-level built-in. I think you're out of luck.

I think the right solution here is the trivial:

def exhaust(it):
    """Doc string here."""
    deque(maxlen=0).extend(it)


which will be fast enough for all but the tightest inner loops. But if
you really care about optimizing this:


def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory


which will be about as efficient as you can get while still having a
custom docstring.

But really, I'm having trouble understanding what sort of application
would have "run an iterator to exhaustion without doing anything with the
values" as the performance bottleneck :)


exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"

Obviously it does not work.

Even if it did work, it would not do what you hope. Because __doc__ is a
dunder attribute (double leading and trailing underscores), help()
currently looks it up on the class, not the instance:


class Spam:
    "Spam spam spam"

x = Spam()
help(x)
=> displays "Spam spam spam"

x.__doc__ = "Yummy spam"
help(x)
=> still displays "Spam spam spam"
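
Independently of help(), the part that can be verified is that the assignment only shadows the class attribute on that one instance and leaves the class docstring untouched. A small sketch (the variable names are illustrative; which of the two strings help() displays may depend on the Python version, so no help() output is claimed here):

```python
class Spam:
    "Spam spam spam"

x = Spam()
x.__doc__ = "Yummy spam"

# The instance now carries its own __doc__ in its __dict__ ...
instance_doc = x.__doc__
# ... while the class docstring is unchanged.
class_doc = type(x).__doc__

print(instance_doc)
print(class_doc)
```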


Is there a way to get it to work simply and
without creating a new scope (which would be a rather inefficient way
to set documentation, and would hamper introspection)?

How about dropping the "simply" requirement?

I don't believe so.





[1] IronPython and Jython both currently do the same thing as CPython, so
even if this is not explicitly language-defined behaviour, it looks like
it may be de facto standard behaviour.
 
Chris Angelico

I think the right solution here is the trivial:

def exhaust(it):
    """Doc string here."""
    deque(maxlen=0).extend(it)


which will be fast enough for all but the tightest inner loops. But if
you really care about optimizing this:


def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory


which will be about as efficient as you can get while still having a
custom docstring.

Surely no reason to go for the factory function:

def exhaust(it, eatit=deque(maxlen=0).extend):
    eatit(it)

ChrisA
 
Steven D'Aprano

Surely no reason to go for the factory function:

def exhaust(it, eatit=deque(maxlen=0).extend):
    eatit(it)

Now you have the function accept a second argument, which is public, just
to hold a purely internal reference to something that you don't want the
caller to replace.
 
Chris Angelico

Now you have the function accept a second argument, which is public, just
to hold a purely internal reference to something that you don't want the
caller to replace.

True, but doesn't that happen fairly often with default args? Usually
it's in the "int=int" notation to snapshot for performance.

ChrisA
 
Joshua Landau

But really, I'm having trouble understanding what sort of application
would have "run an iterator to exhaustion without doing anything with the
values" as the performance bottleneck :)

Definitely not this one. Heck, there's even no real reason something
as appropriately-named as "exhaust_iter" needs documentation.

Largely I was asking because I'd felt I'd missed something more
obvious; it seems there was not. I'm also doing some more functools
stuff than usual -- this method also applies to functions generated
with, say, functools.partial I had guessed. Only it does not, as you
show below -- and functools.partial objects allow you to ineffectively
set .__doc__ anyway.

I also feel that:

def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory

is a very unobvious way to change a docstring and hides what I'm doing
very effectively. Chris Angelico's method is a fair bit better in this
regard, but I'm not sure it's worth it in this case. One
recommendation with Chris's method is to make it keyword-only (with
"*") which should keep the interface a touch cleaner.

exhaust_iter.__doc__ = "Exhaust an iterator efficiently [...]"

Obviously it does not work.

Even if it did work, it would not do what you hope. Because __doc__ is a
dunder attribute (double leading and trailing underscores), help()
currently looks it up on the class, not the instance:

I'd not considered that, and it seems to have doomed me from the start.
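
For reference, the keyword-only form of Chris's method mentioned above might look roughly like this. It is only a sketch: the parameter name `_eatit` and the docstring text are my own choices for illustration.

```python
from collections import deque

def exhaust_iter(it, *, _eatit=deque(maxlen=0).extend):
    """Exhaust an iterator efficiently without caching any of its yielded values."""
    # _eatit is bound once, when the function is defined; the bare *
    # makes it keyword-only, so it cannot be replaced positionally by
    # accident.
    _eatit(it)

it = iter(range(5))
exhaust_iter(it)
remaining = list(it)  # nothing should be left to consume
print(remaining)
```

The internal reference is still visible in the signature, so this only tidies the interface; it does not hide the parameter.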
 
alex23

I also feel that:

def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        """Doc string goes here"""
        eatit(it)
    return exhaust_iter

exhaust_it = factory()
del factory

is a very unobvious way to change a docstring and hides what I'm doing
very effectively.

My last post seems to have been eaten by either Thunderbird or the
EternalSeptember servers, but it contained an erroneous claim that the
straight function version performed as well as the factory one. However,
in the interim a co-worker has come up with a slightly faster variant:

from functools import partial
from collections import deque

class exhaust_it(partial):
    """custom doc string"""

exhaust_it = exhaust_it(deque(maxlen=0).extend)

Shadowing the class name with the partial instance will ensure it has
the same name when accessed via help(), and it's a simple way to avoid
needing to clean up the namespace, as well.
 
Joshua Landau

My last post seems to have been eaten by either Thunderbird or the
EternalSeptember servers, but it contained an erroneous claim that the
straight function version performed as well as the factory one. However, in
the interim a co-worker has come up with a slightly faster variant:

from functools import partial
from collections import deque

class exhaust_it(partial):
    """custom doc string"""

exhaust_it = exhaust_it(deque(maxlen=0).extend)

Shadowing the class name with the partial instance will ensure it has the
same name when accessed via help(), and it's a simple way to avoid needing
to clean up the namespace, as well.

That's beautiful. You could even trivially make a wrapper function:

def wrap_docstring(function, docstring, *, name=None):
    class Wrapper(partial):
        pass
    Wrapper.__name__ = function.__name__ if name is None else name
    Wrapper.__doc__ = docstring
    return Wrapper(function)

which is no slower. You get great introspection through the "func"
attribute, too :).
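
A rough usage sketch, with the helper repeated so it runs on its own (the docstring text and the `name` value are just illustrative):

```python
from collections import deque
from functools import partial

def wrap_docstring(function, docstring, *, name=None):
    # Same helper as above, repeated so the example is self-contained.
    class Wrapper(partial):
        pass
    Wrapper.__name__ = function.__name__ if name is None else name
    Wrapper.__doc__ = docstring
    return Wrapper(function)

exhaust_iter = wrap_docstring(
    deque(maxlen=0).extend,
    "Exhaust an iterator efficiently without caching any of its yielded values.",
    name="exhaust_iter",
)

it = iter(range(5))
exhaust_iter(it)  # consumes everything, keeps nothing

print(exhaust_iter.__doc__)
print(type(exhaust_iter).__name__)
print(exhaust_iter.func)  # the wrapped bound method is still reachable
```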

Also:
times = time_raw(), time_function(), time_factory(), time_argument_hack(), time_partial()
[round(time/times[0], 1) for time in times]
[1.0, 16.8, 3.1, 3.0, 1.8]

This times almost purely the constant overhead by calling
each variant on an empty iterable. So your friend wins the
premature optimisation test, too.
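
The timing functions themselves are not shown above, so the following is only a guess at an equivalent harness using timeit; every name in it (`raw`, `plain`, `from_factory`, `argument_hack`, `via_partial`) is hypothetical. The ratios it prints will differ by machine and Python version.

```python
from collections import deque
from functools import partial
from timeit import timeit

# The bare bound method, used as the baseline.
raw = deque(maxlen=0).extend

def plain(it):
    # Builds a fresh deque on every call.
    deque(maxlen=0).extend(it)

def factory():
    eatit = deque(maxlen=0).extend
    def exhaust_iter(it):
        eatit(it)
    return exhaust_iter

from_factory = factory()

def argument_hack(it, eatit=deque(maxlen=0).extend):
    eatit(it)

class exhaust_it(partial):
    """custom doc string"""

via_partial = exhaust_it(deque(maxlen=0).extend)

candidates = [raw, plain, from_factory, argument_hack, via_partial]

# Calling each candidate on an empty tuple measures call overhead only.
times = [timeit("f(())", globals={"f": f}, number=100_000) for f in candidates]
ratios = [round(t / times[0], 1) for t in times]
print(ratios)
```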
 
