The opener parameter of Python 3 open() built-in

S

Steven D'Aprano

Am 03.09.2012 14:32, schrieb Marco:

The opener argument is a new 3.3 feature. For example you can use the
feature to implement exclusive creation of a file to avoid symlink
attacks.

import os

def opener(file, flags):
return os.open(file, flags | os.O_EXCL)

open("newfile", "w", opener=opener)


Why does the open builtin need this added complexity? Why not just call
os.open directly? Or for more complex openers, just call the opener
directly?

What is the rationale for complicating open instead of telling people to
just call their opener directly?
 
D

Dennis Lee Bieber

Why does the open builtin need this added complexity? Why not just call
os.open directly? Or for more complex openers, just call the opener
directly?
Because os.open() returns a low-level file descriptor, not a Python
file object?
What is the rationale for complicating open instead of telling people to
just call their opener directly?

To avoid the new syntax would mean coding the example as

f = os.fdopen(os.open("newfile", flags | os.O_EXCL), "w")

which does NOT look any cleaner to me... Especially not if "opener" is
to be used in more than one location. Furthermore, using "opener" could
allow for a localized change to affect all open statements in the module
-- change file path, open for string I/O rather than file I/O, etc.
 
S

Steven D'Aprano

Because os.open() returns a low-level file descriptor, not a
Python file object?

Good point.

But you can wrap the call to os.open, as you mention below. The only
complication is that you have to give the mode twice, converting between
low-level O_* integer modes and high-level string modes:

a = os.open('/tmp/foo', os.O_WRONLY | os.O_CREAT)
b = os.fdopen(a, 'w')


But to some degree, you still have to do that with the opener argument,
at least in your own head.

To avoid the new syntax would mean coding the example as

f = os.fdopen(os.open("newfile", flags | os.O_EXCL), "w")

which does NOT look any cleaner to me...

Well, I don't know about that. Once you start messing about with low-
level O_* flags, it's never going to exactly be clean no matter what you
do. But I think a one-liner like the above *is* cleaner than a three-
liner like the original:

def opener(file, flags):
return os.open(file, flags | os.O_EXCL)

open("newfile", "w", opener=opener)

although I accept that this is a matter of personal taste.

Particularly if the opener is defined far away from where you eventually
use it. A lambda is arguably better from that perspective:

open("newfile", "w",
opener=lambda file, flags: os.open(file, flags | os.O_EXCL)
)

but none of these solutions are exactly neat or clean. You still have to
mentally translate between string modes and int modes, and make sure
you're not passing the wrong mode:

py> open('junk', 'w').write('hello world')
11
py> open('junk', 'r', opener=lambda file, flags: os.open(file, flags |
os.O_TRUNC)).read() # oops
''

so it's not exactly a high-level interface.

In my opinion, a cleaner, more Pythonic interface would be either:

* allow built-in open to take numeric modes:

open(file, os.O_CREAT | os.O_WRONLY | os.O_EXCL)

* or even more Pythonic, expose those numeric modes using strings:

open(file, 'wx')


That's not as general as an opener, but it covers the common use-case and
for everything else, write a helper function.

Especially not if "opener" is to be used in more than one location.

The usual idiom for fixing the "used more than once" is "write a helper",
not "add a callback function to a builtin" :)

Furthermore, using "opener" could
allow for a localized change to affect all open statements in the module
-- change file path, open for string I/O rather than file I/O, etc.


A common idiom for that is to shadow open in the module, like this:

_open = open
def open(file, *args):
file = file.lowercase()
return _open(file, *args)
 
D

Dennis Lee Bieber

why not call that directly?

f = opener(file, flags)

It certainly is cleaner than either of the alternatives so far, and it
doesn't add a parameter to the builtin.
But it returns an OS file descriptor... It doesn't return a Python
file object. From what I can tell, (I've just upgraded to Python 2.7
<G>) the opener is meant to replace the low-level function normally used
by Python's open(), and supplies an fd which gets wrapped by Python's
open().
<type 'file'>

The two are not compatible except by using os.fdopen(fd) to get a
file object, or fo.fileno() to get the low-level file descriptor
I don't know of any real-life code which would be significantly improved
by that. Can you point us to some?
Not really -- but if they went one step further and supplied
"reader" and "writer" operations too, they'd get close to what I once
had to do in FORTRAN 77 under DEC VMS (by hooking in code to do double
buffering when reading data from magtape, while keeping the program
using regular F77 I/O statements; the open statement would do a pre-read
of one buffer and return; subsequent read statements would find a
pre-filled buffer, and issue an non-blocking read to fill the other
buffer -- cut the runtime for the program into a third or less as it was
no longer stuck waiting for slow mag-tape operations each time it did a
read). Implementing something like this in Python would likely require
"opener" to spawn a reader thread to do the I/O asynchronously, using a
limited Queue (1 buffer worth -- the reader thread would be the second
buffer, blocked on Q.put()), and a "reader" that would do Q.get() and
return the result to the Python read() logic for any parsing.

Okay, in Python, one could probably subclass "file", and override
the read methods -- but one would not be able to use the Python
open()... You'd have to do something like f = myFile(normal, open, args)
instead...
 
T

Terry Reedy


io.open depends on a function the returns an open file descriptor.
opener exposes that dependency so it can be replaced. (Obviously, one
could go crazily overboard with this idea.) I believe this is a simple
form of dependency injection, though it might be hard to discern from
the Java-inspired verbiage of the Wikipedia article. Part of the
rationale in the issue is to future-proof io.open from any future needs
for alternate fd fetching. It could also be used to decouple a test of
io.open from os.open
 
C

Chris Angelico

io.open depends on a function the returns an open file descriptor. opener
exposes that dependency so it can be replaced.

I skimmed the bug report comments but didn't find an answer to this:
Why not just monkey-patch? When a module function calls on a support
function and you want to change that support function's behaviour,
isn't monkey-patching the most usual?

Several possibilities come to mind, but without knowledge of
internals, I have no idea what's actually the case.
* Patching builtins is too confusing or dangerous, and should be avoided?
* You want to narrow the scope of the patch rather than do it globally?
* Explicit is better than implicit?

It just strikes me as something where an API change may not be necessary.

ChrisA
 
T

Terry Reedy

I skimmed the bug report comments but didn't find an answer to this:
Why not just monkey-patch?

As far as I know, one can only use normal Python code to monkey patch
modules written in Python. Even then, one can only rebind names stored
in writable dicts -- the module dict and class attribute dicts. The
attributes of function code objects are readonly. Replacing a code
object is not for the faint of heart.

io.py mostly loads _io compiled from C.
 
A

Antoine Pitrou

Chris Angelico said:
I skimmed the bug report comments but didn't find an answer to this:
Why not just monkey-patch? When a module function calls on a support
function and you want to change that support function's behaviour,
isn't monkey-patching the most usual?

Monkey-patching globals is not thread-safe: other threads will see your
modification, which is risky and fragile.

Regards

Antoine.
 
S

Steven D'Aprano

Monkey-patching globals is not thread-safe: other threads will see your
modification, which is risky and fragile.

Isn't that assuming that you don't intend the other threads to see the
modification?

If I have two functions in my module that call "open", and I monkey-patch
the global (module-level) name "open" to intercept that call, I don't see
that there is more risk of breakage just because one function is called
from a thread.

Obviously monkey-patching the builtin module itself is much riskier,
because it doesn't just effect code in my module, it affects *everything*.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top