PEP 304 - is anyone really interested?


Skip Montanaro

I wrote PEP 304, "Controlling Generation of Bytecode Files":

http://www.python.org/peps/pep-0304.html

quite a while ago. The first version appeared in January 2003 in response to
questions from people about controlling/suppressing bytecode generation in
certain situations. It sat idle for a long while, though from time to time
people would ask about the functionality and I'd respond or update the PEP.
In response to another recent question about this topic:

http://mail.python.org/pipermail/python-list/2005-June/284775.html

and a wave of recommendations by Raymond Hettinger regarding several other
PEPs, I updated the patch to work with current CVS. Aside from one response
by Thomas Heller noting that my patch upload failed (which has since been
corrected), I've seen no response either on python-dev or
comp.lang.python.

I really have no personal use for this functionality. I control all the
computers on which I use Python and don't use any exotic hardware (which
includes Windows, with its multi-rooted file system, as far as I'm
concerned), don't run from read-only media, and don't think that in-memory
file systems are much of an advantage over OS caching. The best I will ever do
with it is respond to people's inputs. I'd hate to see it sit for another
two years. If someone out there is interested in this functionality and
would benefit more from its incorporation into the core, I'd be happy to
hand it off to you.

So speak up folks, otherwise my recommendation is that it be put out of its
misery.

Skip
 

Patrick Maupin

Skip said:
I wrote PEP 304, "Controlling Generation of Bytecode Files": ....
If someone out there is interested in this functionality
and would benefit more from its incorporation into the
core, I'd be happy to hand it off to you.

I am quite interested in this PEP.

What, exactly, would the recipient of this "hand-off" have to do? Does
it primarily have to do with resolution of the issues listed in the
PEP?

BTW, my use case for this PEP is different from the PEP's given
rationale -- in general, I (and the people I work with) keep our source
trees completely separate from (and parallel to) our object trees.
This obviates the need for .cvsignore, makes "make clean" a much easier
proposition, makes it easier to generate and test/compare slightly
different object sets (just use an environment variable to change the
build target directory), and reduces the filtration required to get
usable output from simple source file greps. The main fly in the
ointment right now is .pyc files -- they tend to pop up wherever there
is a little script (or more accurately, of course, wherever there is a
multiple-module script :), and the current choices for dealing with
them as special cases, e.g. zip files, copying all sub-modules over to
the build tree before importing them, manually reading the files and
compiling them, etc., are all rather unappealing.
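
For reference, the "manually reading and compiling" workaround amounts to
something like the following minimal sketch (module and path names are
illustrative, and it does none of the package or reload bookkeeping that
real import does):

import os
import sys
import types

def import_without_pyc(name, directory):
    """Compile name.py from directory in memory and execute it,
    so no .pyc file is ever written next to the source."""
    path = os.path.join(directory, name + ".py")
    source = open(path).read()
    code = compile(source, path, "exec")
    module = types.ModuleType(name)
    module.__file__ = path
    sys.modules[name] = module          # register first, as real import does
    exec(code, module.__dict__)         # Python 2: exec code in module.__dict__
    return module

# hypothetical usage:
# utils = import_without_pyc("utils", "/home/me/src/tools")
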
From my perspective, the lack of PEP 304 functionality in the standard
distribution is hampering World Domination, since the littering of
source directories with compiled .pyc files is yet another excuse for
some of my compatriots to keep using Perl. In fact, I have to be very
careful how I introduce Python scripts into this environment lest I
anger someone by polluting their source tree.

Note that for my current purposes (as described above, and in contrast
to the original rationale behind the PEP) sys.bytecodebase is much more
important than PYTHONBYTECODEBASE (because my few scripts can set up
sys.bytecodebase quite easily themselves). Since it would seem that
most of the security concerns would derive from the PYTHONBYTECODEBASE
environment variable, it would be an interesting exercise to try to
figure out how many potential users of this PEP want it for my purposes
and how many want it for the original purpose. I would think not
adding the environment variable would be less contentious from a
security/backward-compatibility standpoint, but possibly more
contentious from a
lack-of-new-useful-functionality-that-could-not-be-easily-implemented-in-an-add-on-package
standpoint.


Regards,
Pat
 

Thomas Guettler

On Wed, 22 Jun 2005 18:01:51 -0500, Skip Montanaro wrote:
I wrote PEP 304, "Controlling Generation of Bytecode Files":

http://www.python.org/peps/pep-0304.html

....

Hi,

I am interested in a small subset: I want to import a file without
a '.pyc' being generated.

Background: I sometimes misuse Python for config files. For example,
there is a file $MYAPP/etc/debuglog.py. This file contains simple
assignments:

search=0
indexing=1
.....

In the code I use it like this:

sys.path.append(...) # Put $MYAPP/etc into the path
import debuglog

....
if debuglog.search:
print "Searching for ...."

I don't want pyc files in the etc directory.

Up to now I do it like this:

import os
import debuglog
try:
    os.unlink("...debuglog.pyc")
except OSError:
    pass                # the .pyc may not exist yet
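
An alternative sketch that avoids creating the .pyc in the first place is to
execute the config file into a fresh namespace instead of importing it (the
install root below is purely illustrative):

import os

MYAPP_ROOT = "/path/to/MYAPP"            # hypothetical install root

def load_config(path):
    """Run a config file in a fresh namespace; no import, so no .pyc."""
    namespace = {}
    execfile(path, namespace)            # or: exec(open(path).read(), namespace)
    return namespace

cfg = load_config(os.path.join(MYAPP_ROOT, "etc", "debuglog.py"))
if cfg["search"]:
    print "Searching for ...."           # dict access; a small wrapper class
                                         # would restore attribute access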


Thomas
 

Thomas Heller

Thomas Guettler said:
On Wed, 22 Jun 2005 18:01:51 -0500, Skip Montanaro wrote:


...

Hi,

I am interested in a small subset: I want to import a file without
a '.pyc' being generated.

Background: I sometimes misuse Python for config files. For example

Although I was not interested originally, I think that's a use case I
also have. Optional config files, which should not be compiled to .pyc
or .pyo. Removing only the .py file doesn't have the expected effect
if a .pyc and/or .pyo file is left behind.

I don't think the PEP supports such a use case.

BTW: While I'm reading the PEP to check the above, I encountered this:

Add a new environment variable, PYTHONBYTECODEBASE, to the mix of
environment variables which Python understands. PYTHONBYTECODEBASE is
interpreted as follows:

If not defined, Python bytecode is generated in exactly the same
way as is currently done. sys.bytecodebase is set to the root
directory (either / on Unix and Mac OSX or the root directory of
the startup (installation???) drive -- typically C:\ -- on
Windows).

If defined and it refers to an existing directory to which the
user has write permission, sys.bytecodebase is set to that
directory and bytecode files are written into a directory
structure rooted at that location.

If defined but empty, sys.bytecodebase is set to None and
generation of bytecode files is suppressed altogether.

AFAIK, it is not possible to define empty env vars on Windows.

c:\>set PYTHONBYTECODEBASE=

would remove this env var instead of setting it to an empty value.
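
For what it's worth, the lookup described in the quoted text might be
sketched roughly like this (the names and the final fallback are
illustrative, not the PEP's reference patch):

import os

def resolve_bytecodebase(environ=os.environ):
    """Interpret PYTHONBYTECODEBASE roughly as the quoted text describes."""
    if "PYTHONBYTECODEBASE" not in environ:
        # Not defined: behave exactly as today; base is the filesystem root.
        return os.path.abspath(os.sep)
    base = environ["PYTHONBYTECODEBASE"]
    if base == "":
        # Defined but empty: suppress bytecode files altogether.
        return None
    if os.path.isdir(base) and os.access(base, os.W_OK):
        # Existing, writable directory: root the bytecode tree here.
        return base
    # Anything else: suppress bytecode (one possible reading of the PEP).
    return None

# the interpreter would then expose the result as sys.bytecodebase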

Thomas
 

Patrick Maupin

Thomas said:
Although I was not interested originally, I think that's
a use case I also have. Optional config files, which
should not be compiled to .pyc or .pyo. Removing only
the .py file doesn't have the expected effect
if a .pyc and/or .pyo file is left behind.


I also think that if nobody has the same use-case as envisioned by the
original PEP, we should probably step back and rethink how it should be
implemented. The original PEP envisioned the installation of
third-party .py file libraries without their corresponding .pyc files
into directories for which the eventual script user had no
write-access. As more people become cognizant of the correct way to
install Python libraries, I think this use-case fades in importance.

The key distinction between this older use-case and the currently
envisioned use-case is that the former assumes third-party code, and
the latter assumes that the control over .pyc generation is desired by
the _author_ of some python code, rather than merely the _installer_ of
some python code. In this latter case, environment variables are not
strictly necessary, because the top-level script could do whatever it
needs to control .pyc generation, from inside Python itself.

Thomas also said:
AFAIK, it is not possible to define empty env vars on Windows.

You make a good point about null environment variables. I think the
original PEP was fairly *nix-centric, both in that aspect, and in the
aspect of requiring the value of bytecodebase to be the "root" of the
file system. This might not have the desired results in all cases on
Windows.

Regards,
Pat
 

John Roth

Patrick Maupin said:
I also think that if nobody has the same use-case as envisioned by the
original PEP, we should probably step back and rethink how it should be
implemented. The original PEP envisioned the installation of
third-party .py file libraries without their corresponding .pyc files
into directories for which the eventual script user had no
write-access. As more people become cognizant of the correct way to
install Python libraries, I think this use-case fades in importance.

See below.

Patrick also said:
The key distinction between this older use-case and the currently
envisioned use-case is that the former assumes third-party code, and
the latter assumes that the control over .pyc generation is desired by
the _author_ of some python code, rather than merely the _installer_ of
some python code. In this latter case, environment variables are not
strictly necessary, because the top-level script could do whatever it
needs to control .pyc generation, from inside Python itself.

Well, after thinking this over, I'm finding a significant problem with
the envisioned mechanism: it specifies a _single_ directory tree that
shadows the source tree. I'd like to suggest a different mechanism,
at least for packages (top-level scripts don't generate .pyc files
anyway). Put a system variable in the __init__.py file. Something
like __obj__ = path would do nicely. Then when Python created
the __init__.pyc file, it would insert a back link __src__ entry.

That way, either the source or the object directory could be
specified in the Pythonpath, the import machinery could then do the
right thing by checking the appropriate directory for the .py and .pyc
files.
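
As a purely hypothetical illustration, the package author would write
something like:

# mypkg/__init__.py
__obj__ = "/home/me/build/objcache/mypkg"   # where this package's bytecode goes

and the generated __init__.pyc would then carry a matching __src__ entry
pointing back at the source directory, so that whichever tree appears on
the path, the import machinery can locate the other half.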

I will say that I would personally find it very useful to have the
.py and .pyc (.pyo) files in separate directories for development
work.

Patrick also said:
You make a good point about null environment variables. I think the
original PEP was fairly *nix-centric, both in that aspect, and in the
aspect of requiring the value of bytecodebase to be the "root" of the
file system. This might not have the desired results in all cases on
Windows.

Actually, the PEP states that if the environment variable does not specify
a directory then it does not generate a .pyc file. Any entry that is not a
directory would do, such as some special characters that are illegal in
file and directory names.

John Roth
 

Patrick Maupin

John said:
I'd like to suggest a different mechanism, at least for packages
(top level scripts don't generate .pyc files anyway.) Put a system
variable in the __init__.py file. Something like __obj__ = path
would do nicely. Then when Python created the __init__.pyc file,
it would insert a back link __src__ entry.

I like the kernel of that proposal. One down-side is that you might
wind up compiling at least __init__.py on every execution which imports
the package (to extract the object directory variable __obj__). If
you had automatic .pyc directory creation and updating of __obj__ for
sub-packages, you would only take the hit of recompiling the topmost
__init__.py in a package tree.

I'm not as sure about the use of the backlink -- it seems to introduce
a little bit of a chicken/egg problem to be able to import from a .pyc
directory which knows where its .py directory is -- how did the .pyc
files get there in the first place? There must have been a separate
compilation phase at some point, and while I can envision a use case
for that, I don't have a pressing need for it.

I'm also not sure about the extent of the changes required to the
import/compile mechanism. It seems the importer would have to be able
to defer writing out the .pyc file until after the execution of the
body of the __init__.py module.

Finally, I think you're dismissing one of the discussed use-cases out
of hand :) Although it is true that a top-level script will not
generate a .pyc file, a slightly more generic use of the term "script"
could encompass a top-level module and one or more sub-modules in the
same directory. If the script is run infrequently enough and the
sub-modules are small enough, in some cases I would certainly love to
be able to tell Python "Please don't litter this directory with .pyc
files."

Assuming the deferral of writing .pyc files until after module
execution is not a problem (for all I know it already works this
way, but I wouldn't know why that would be), I think a slightly more
fleshed out (but still very green) proposal might be:

1) When a module is first imported, the importing module's globals are
searched for an __obj__ identifier. The importer will make a local
copy of this variable during the import:

objdir = passed_globals.get('__obj__', None)

2) If a .pyc/.pyo file is found in the same directory as the
corresponding .py file:
a) If the .pyc/.pyo is newer, it is loaded and executed and we
are done; or
b) objdir is set to None to indicate that we should regenerate
the .pyc/.pyo in situ.

Step b) could be debated, but I think it would be very confusing to
have an out-of-date .pyc file in a directory, with the "real" .pyc file
elsewhere...

3) If this is a package import and objdir is a non-null string, objdir
is updated to include the package name. Something like:

if is_package and objdir: objdir = os.path.join(objdir, package_name)

4) If objdir is a non-null string and there is a newer readable
.pyc/.pyo file at the directory indicated in objdir, that file is
loaded and executed and we are done.

5) The source file is compiled into memory.

6) If objdir is not None the globals of the newly created module are
updated such that __obj__= objdir.

7) The module body is executed, including performing any sub-imports.

8) The module's __obj__ is now examined to determine if and where to
write the module's .pyc/.pyo file:

if __obj__ does not exist -- write to same directory as .py (same as
current behavior)

if __obj__ exists and is a non-empty string and is equal to objdir
(e.g. not modified during initial module body execution) -- write to
the named directory. Create the leaf directory if necessary (e.g. for
package imports).

if __obj__ exists and is the empty string, do not create a .pyc file.
This allows author suppression of writing .pyc files.

if __obj__ exists but is not equal to objdir, create the leaf directory
if it does not exist, but do not write the .pyc file. This is an
optimization for the case where __init__.py specifies a new package
subdirectory -- why write out a .pyc file if you won't know where it is
until you re-compile/re-execute the source .py file? You'll never be
able to make use of the .pyc file and maybe you don't even have write
privileges in the directory. Even though this is true, we still want
to create the directory, so that sub-package imports don't have to
create multiple directory levels.
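
To make step 8 concrete, the final decision might be sketched like this
(names are purely illustrative; the real logic would live inside the
import machinery):

import os

def pyc_destination(objdir, module_globals, py_dir):
    """Step 8: decide where (or whether) to write the module's .pyc/.pyo."""
    if "__obj__" not in module_globals:
        return py_dir                    # unchanged behavior: next to the .py
    obj = module_globals["__obj__"]
    if obj == "":
        return None                      # author suppressed bytecode generation
    if obj == objdir:                    # not modified during module execution
        if not os.path.isdir(obj):
            os.makedirs(obj)             # create the leaf directory if needed
        return obj
    # __obj__ was changed while the body ran (e.g. a package __init__ naming
    # a new subdirectory): create it for sub-package imports, but skip
    # writing this module's own .pyc.
    if not os.path.isdir(obj):
        os.makedirs(obj)
    return None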

I think this mechanism would allow for the following:

- Control of package/subpackage .pyc file location with a single line
of code in the topmost __init__ file, at the small cost of recompiling
the __init__ file on every program execution.

- Control of _all_ .pyc file location (unless overridden for a given
package as described above) from a top-level script. In this case,
regular .pyc files would wind up _inside_ the __obj__ directory of the
package which first imported them, and package .pyc files would wind up
in a subdirectory underneath the directory of the package which first
imported them. This is not a major issue for scripts which follow
consistent import execution flow, but could be surprising to a
first-time user of the control mechanism, and could increase disk space
usage a tiny bit.

- Suppression of writing of .pyc files from a top-level script by
setting __obj__= ''. Preexisting .pyc files would be used (and
overwritten if the corresponding .py is newer), but .pyc files would
not be generated if they did not already exist.

In addition, it might be possible to implement this mechanism to enable
generation of .pyc files for zip-imported packages (as with using this
mechanism with regular packages, the __init__.pyc file would not be
written out, but it could define where to look for the other .pyc
files). This could potentially be a huge performance gain for some zip
imports.

John also said:
Actually, the PEP states that if the environment variable does
not specify a directory then it does not generate a .pyc file.
Any entry that is not a directory would do, such as some special
characters that are illegal in file and directory names.

It is my understanding (the capabilities of bash and other shells
notwithstanding) that the only truly "illegal" filename characters
under POSIX are the slash and the null character. So, for a full path
(as opposed to a single name), the only illegal character would be the
null character, which is probably pretty hard/ugly to inject into an
environment variable from a shell under either Windows or Linux. I
suppose you could try to usurp special combinations, such as "//",
except I think Microsoft has already done that :) Even if you found a
combination that Microsoft didn't claim, I would personally find it
very unPythonic to document the required use of something like
"badname//".

In any case, if we are re-thinking the PEP, and we find a use-case for
something like your __obj__ proposal, but don't find such a good
use-case for the environment variable, I would recommend leaving the
environment variable out of the PEP, since its presence would add a
security concern for people who run Python scripts under sudo when
they upgrade to a version of Python which honors the environment
variable.

Regards,
Pat
 
