__pycache__, one more good reason to stick with Python 2?

jmfauth

As a scientist who uses computer tools, and not as a computer
scientist, I discovered Python a long time ago (it was at
version 1.5.6) and I have remained a happy user to this day.
Yesterday, I was happy to download and test Python 3.2rc1.
Python is still the same powerful and pleasant language, but...

Then I ran into this directory of cached .pyc files, __pycache__.
Without too many explanations (I think they will be obvious to
any end user), one word: a nightmare.
 
Steven D'Aprano

jmfauth said:
As a scientist who uses computer tools, and not as a computer scientist, I
discovered Python a long time ago (it was at version 1.5.6) and I have
remained a happy user to this day. Yesterday, I was happy to download
and test Python 3.2rc1. Python is still the same powerful and pleasant
language, but...

Then I ran into this directory of cached .pyc files, __pycache__. Without
too many explanations (I think they will be obvious to any end user), one
word: a nightmare.

No, I'm sorry, they're not obvious at all. I too have been using Python
since version 1.5 (although I don't remember the minor point release),
and I've also been testing Python 3.2, but I'm afraid I have no idea why
you think that __pycache__ is a nightmare.
 
jmfauth

Steven said:
No, I'm sorry, they're not obvious at all.

These reasons become obvious as soon as you start working.

Let's take a practical point of view. It did not take me long
to realize that it is much simpler to delete the __pycache__
directory every time I compile my scripts than to visit it just
because I deleted or renamed a .py file in my working directory.

How long will it be before tools to find and delete
orphan .pyc files on a hard disk turn up on the web?

If I get (stupidly, I agree) a .pyc file and want to test
it, should I manually create a cache alongside my test.py
script?

If I wish to delete the numerous auxiliary files a TeX
document produces, I just del /rm .* to keep a clean working
dir. With Python now? Impossible! The files are spread across two
dirs (at least).

....

That's life, unfortunately.
 
Terry Reedy

No. The benefit of, for instance, not adding 200 .pyc files to a
directory with 200 .py files is immediately obvious to most people.

These reasons become obvious as soon as you start working.

Let's take a practical point of view. It did not take me long
to realize that it is much simpler to delete the __pycache__
directory every time I compile my scripts than to visit it just
because I deleted or renamed a .py file in my working directory.

Deleting the subdirectory is at least as easy as searching through the
directory to find one or more files. In any case, the obsolete, misnamed
.pyc files hurt very little. Delete them once a year or so if the space is
an issue.

In 13 years, I have hardly ever worried about deleting .pyc files.

How long will it be before tools to find and delete
orphan .pyc files on a hard disk turn up on the web?

If I get (stupidly, I agree) a .pyc file and want to test
it, should I manually create a cache alongside my test.py
script?

Since this is stupid (your word), it should be rare ;-).
Since it can be dangerous, it should be more difficult.
If you get a zip or tar file from a trusted source,
get the __pycache__ dir with the file.

If I wish to delete the numerous auxiliary files a TeX
document produces, I just del /rm .* to keep a clean working
dir. With Python now? Impossible! The files are spread across two
dirs (at least).

I do not know what TeX has to do with Python.
 
Alice Bevan–McGregor

find . -name \*.pyc -exec rm -f {} \;

vs.

rm -rf __pycache__

I do not see how this is more difficult, but I may be missing something.

— Alice.
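Both of Alice's commands have close equivalents in Python's standard library, which can be handy on a system without find. This is a minimal sketch assuming Python 3; the two function names are hypothetical. Like `rm -rf __pycache__`, the second function only touches the top-level cache directory.

```python
import shutil
from pathlib import Path


def remove_pyc_files(root: str) -> int:
    r"""Roughly `find ROOT -name \*.pyc -exec rm -f {} \;`: delete every
    .pyc file anywhere under ROOT, including inside __pycache__ dirs.
    Returns the number of files removed."""
    count = 0
    # Materialize the list first so we are not deleting while still globbing.
    for path in list(Path(root).rglob("*.pyc")):
        if path.is_file():
            path.unlink()
            count += 1
    return count


def remove_top_pycache(root: str) -> bool:
    """Roughly `rm -rf ROOT/__pycache__`: remove only the top-level cache
    directory. Returns True if there was one to remove."""
    cache = Path(root) / "__pycache__"
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```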
 
Terry Reedy

That's why I disagree with (and hate) the automatic compilation of code: my
project directory becomes full of object files.

That is one point of stashing them all in a __pycache__ directory.

After reading some articles about it, I've come to think Python depends
a lot on writing bytecode to the filesystem.

A purely interpreted Python would be much slower. Saving module code to
the filesystem speeds startup, which most find slow as it is.

I wonder if it's good or bad. I find it so invasive, and I think it
should not be the default behaviour. But that's me; I'm sure most
Python users don't mind at all.

Seems so. Complaints are rare.
 
Carl Banks

        find . -name \*.pyc -exec rm -f {} \;

vs.

        rm -rf __pycache__

I do not see how this is more difficult, but I may be missing something.


Well, the former deletes all the .pyc files in the directory tree,
whereas the latter only deletes the top-level __pycache__, not the
__pycache__ directories of subpackages. To delete all the __pycache__s you'd
have to do something like this:

find . -name __pycache__ -prune -exec rm -rf {} \;

or, better,

find . -name __pycache__ -prune | xargs rm -rf

Still not anything really difficult. (I don't think a lot of people
know about -prune; it tells find not to descend into the matching
directory.)


Carl Banks
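The same recursive cleanup can be sketched in Python with `os.walk`, where removing a name from the `dirnames` list plays the role of `-prune`: in the default top-down mode, `os.walk` does not descend into directories dropped from that list. The function name is hypothetical, and the sketch assumes Python 3.9 or later for the `list[str]` annotation.

```python
import os
import shutil


def remove_all_pycache(root: str) -> list[str]:
    r"""Delete every __pycache__ directory under root, roughly what
    `find ROOT -name __pycache__ -prune -exec rm -rf {} \;` does.
    Returns the paths that were removed."""
    removed = []
    for dirpath, dirnames, _filenames in os.walk(root):  # top-down by default
        if "__pycache__" in dirnames:
            target = os.path.join(dirpath, "__pycache__")
            shutil.rmtree(target)
            removed.append(target)
            # The -prune step: drop it from dirnames so os.walk does not try
            # to descend into the directory we just deleted.
            dirnames.remove("__pycache__")
    return removed
```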
 
Carl Banks

These reasons become obvious as soon as you start working.

Let's take a practical point of view. It did not take me long
to realize that it is much simpler to delete the __pycache__
directory every time I compile my scripts than to visit it just
because I deleted or renamed a .py file in my working directory.

According to PEP 3147, stale *.pyc files in the __pycache__
directories are ignored. So it's no longer necessary to delete the
*.pyc files when renaming a *.py file. This is a big improvement, and
easily justifies __pycache__ IMO, even without the distro
considerations.
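The PEP 3147 naming scheme can be inspected from Python 3 itself. `importlib.util.cache_from_source()` maps a source path to the cache file the import system would use, and `importlib.util.source_from_cache()` maps it back; neither needs the file to exist, so the path `pkg/spam.py` below is purely hypothetical. (In Python 3.2 the same mappings were exposed as `imp.cache_from_source()` and `imp.source_from_cache()`, the names PEP 3147 uses.) The interpreter tag in the file name, such as `cpython-32` under Python 3.2, comes from `sys.implementation.cache_tag`; this sketch assumes a modern Python 3.

```python
import importlib.util
import sys

# Hypothetical source path; the file does not need to exist.
src = "pkg/spam.py"

# With no PYTHONPYCACHEPREFIX and no -O flag, this is
# pkg/__pycache__/spam.<tag>.pyc, where <tag> is the interpreter's cache tag.
cached = importlib.util.cache_from_source(src)
print("cache tag:  ", sys.implementation.cache_tag)
print("cached file:", cached)

# The reverse mapping: from a cache file back to the source it belongs to.
print("source file:", importlib.util.source_from_cache(cached))
```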

How long will it be before tools to find and delete
orphan .pyc files on a hard disk turn up on the web?

Probably under a month. (Updating old tools to work with new scheme
will take a bit longer.)
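As a rough indication of how little code such a tool needs, here is a minimal sketch (hypothetical function name, modern Python 3 assumed) that lists cache files under a directory whose source .py has disappeared. It relies on `importlib.util.source_from_cache()` to map `pkg/__pycache__/spam.<tag>.pyc` back to `pkg/spam.py`, and it assumes the default in-tree `__pycache__` layout (no PYTHONPYCACHEPREFIX). It deliberately ignores legacy .pyc files sitting beside the sources, since those may be intentional sourceless modules.

```python
import importlib.util
from pathlib import Path


def find_orphan_caches(root: str) -> list[Path]:
    """Return cached .pyc files under root whose source .py no longer exists."""
    orphans = []
    for pyc in Path(root).rglob("__pycache__/*.pyc"):
        try:
            # Maps .../__pycache__/spam.<tag>.pyc to .../spam.py.
            source = Path(importlib.util.source_from_cache(str(pyc)))
        except ValueError:
            continue  # not a PEP 3147-style cache file name; skip it
        if not source.exists():
            orphans.append(pyc)
    return sorted(orphans)
```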

If I get (stupidly, I agree) a .pyc file and want to test
it, should I manually create a cache alongside my test.py
script?

Nope: according to PEP 3147 a standalone *.pyc should not be put in
same directory where the source file would have been, not in the
__pycache__ directory (it'll be considered stale otherwise).

It says this is for backwards compatibility, but I think there are
valid reasons you don't want to deliver source so it's good that we
can still do that.
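Carl's point can be checked directly: Python 3 still imports a module from a bare .pyc as long as it sits where the .py would have been. The sketch below (hypothetical helper and module names, modern Python 3 assumed) compiles a one-line module with `py_compile.compile(..., cfile=...)` so the bytecode lands beside the source rather than in `__pycache__`, deletes the .py, and imports what is left. For whole trees, `python -m compileall -b` writes .pyc files to these legacy locations.

```python
import importlib
import py_compile
import sys
import tempfile
from pathlib import Path


def import_sourceless(name: str, source: str):
    """Import module `name` from a legacy-location .pyc with no .py present."""
    workdir = Path(tempfile.mkdtemp())  # left on disk; fine for a demo
    src = workdir / f"{name}.py"
    src.write_text(source)
    # cfile= writes NAME.pyc next to NAME.py instead of into __pycache__/.
    py_compile.compile(str(src), cfile=str(workdir / f"{name}.pyc"), doraise=True)
    src.unlink()  # keep only the bytecode, as when shipping without source
    sys.path.insert(0, str(workdir))
    importlib.invalidate_caches()  # make sure no stale directory listing is used
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(str(workdir))


mod = import_sourceless("hypothetical_sourceless_demo", "ANSWER = 42\n")
print(mod.ANSWER, mod.__file__)
```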

If I wish to delete the numerous auxiliary files a TeX
document produces, I just del /rm .* to keep a clean working
dir. With Python now? Impossible! The files are spread across two
dirs (at least).

...

That's life, unfortunately.

Give yourself a little time.

The one little non-temporary drawback I see for __pycache__ is if you
have a directory with lots of stuff and one or two python files in the
mix; and then you add that directory to sys.path and import the
files. It creates the __pycache__ in that directory. It's a bit of a
shock compared to the *.pyc files because it's at a very different
place in the listings, and is a directory and not a file. But that's
a minor thing.
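For the situation Carl describes, CPython can be asked not to write bytecode caches at all: `python -B`, the PYTHONDONTWRITEBYTECODE environment variable, or setting `sys.dont_write_bytecode = True` before the import. A minimal sketch, with hypothetical helper and module names and a temporary directory standing in for the mixed-content directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_without_cache(name: str, source: str):
    """Import `name` from a fresh directory on sys.path while preventing
    CPython from creating __pycache__ there. Returns (module, directory)."""
    workdir = Path(tempfile.mkdtemp())
    (workdir / f"{name}.py").write_text(source)
    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # same effect as `python -B`
    sys.path.insert(0, str(workdir))
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(name)
    finally:
        sys.path.remove(str(workdir))
        sys.dont_write_bytecode = previous  # restore the caller's setting
    return module, workdir


mod, where = import_without_cache("hypothetical_nocache_demo", "VALUE = 'ok'\n")
print(mod.VALUE, (where / "__pycache__").exists())
```

The trade-off is the one Terry mentions: with no cache, the source is recompiled on every start-up.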


Carl Banks
 
Terry Reedy

That conclusion isn't valid; the behaviour is (AIUI) only in Python 3.2
and later. You can't presume that a lack of complaints means anything
about “most Python users” until those users are on a Python that shows
this behaviour.

The person I was responding to was complaining, I believe, about .pyc
files in the project directory, which I take to be the same directory as
.py files. This is the 21-year-old behavior now changed.
 
Carl Banks

That's life, unfortunately.

Also, an earlier version of the proposal was to create a *.pyr
directory for each *.py file. That was a real mess; be thankful they
worked on it and came up with a much cleaner method.


Carl Banks
 
jmfauth

...
This is the 21-year-old behavior now changed.
...


Yes, you summarized the situation very well. The way of
working has changed, and probably more deeply than one
may think.

It is now practically impossible to launch a Python
application via a .pyc file. (For fun, try adding
the "parent directory" of a cached file to sys.path.)

About the caches, I just fear that they will
end up as garbage dumps for the orphan .pyc files
Python has seeded. The .pyc files may not be
very pleasant, but at least you can use them and you
have that "feeling" of their existence.
In my "computer experience", once you start to cache/hide
something for simplicity, the problems start.

Maybe it is a good move for Python, but I do not feel very
comfortable with all this stuff.
 
Peter Otten

Carl said:
Well, the former deletes all the .pyc files in the directory tree,
whereas the latter only deletes the top-level __pycache__, not the
__pycache__ directories of subpackages. To delete all the __pycache__s you'd
have to do something like this:

find . -name __pycache__ -prune -exec rm -rf {} \;

or, better,

find . -name __pycache__ -prune | xargs rm -rf

Still not anything really difficult. (I don't think a lot of people
know about -prune; it tells find not to descend into the matching
directory.)

What's the advantage of 'find ... | xargs ...' over 'find ... -exec ...'?
 
Stefan Behnel

Terry Reedy, 18.01.2011 04:39:
Saving module code to the
filesystem speeds startup, which most find slow as it is.

I've been using Jython recently, which, in addition to the huge JVM startup
time, must also fire up the Jython runtime before actually doing anything
useful.

I must say that I never found CPython's startup time to be slow, quite the
contrary.

Stefan
 
Stefan Behnel

jmfauth, 18.01.2011 09:58:
About the caches, I just fear that they will
end up as garbage dumps for the orphan .pyc files
Python has seeded

I can't see how that is supposed to be any different from before. If you
rename a file without deleting the .pyc file, you will end up with an
orphaned .pyc file, now just as for the last 21 years. The same
applies to the new cache directory.

Stefan
 
Stefan Behnel

Peter Otten, 18.01.2011 10:04:
What's the advantage of 'find ... | xargs ...' over 'find ... -exec ...'?

The former runs in parallel, the latter runs sequentially.

Stefan
 
Ethan Furman

Carl said:
Nope: according to PEP 3147 a standalone *.pyc should not be put in
same directory where the source file would have been, not in the
__pycache__ directory (it'll be considered stale otherwise).

Typo?

According to PEP 3147 a standalone *.pyc *should* (not should not) be
put in the same directory where the source file would have been.

~Ethan~
 
Peter Otten

Stefan said:
Peter Otten, 18.01.2011 10:04:

The former runs in parallel, the latter runs sequentially.

This may sometimes be relevant, but I doubt that it matters in this
particular case.

Peter
 
Peter Otten

Sherm said:
Exec launches a new instance of 'rm' for each found file, while xargs
launches a single instance, and passes the list of found files as
arguments.

Probably not a big deal in this case, but if you're passing a long list
of files to a script that has a long startup time, it can make a big
difference.

You can avoid that:

$ touch {1..10}.txt
$ find . -exec python -c'import sys; print sys.argv' {} \;
['-c', '.']
['-c', './10.txt']
['-c', './1.txt']
['-c', './7.txt']
['-c', './8.txt']
['-c', './4.txt']
['-c', './6.txt']
['-c', './3.txt']
['-c', './5.txt']
['-c', './9.txt']
['-c', './2.txt']
$ find . -exec python -c'import sys; print sys.argv' {} \+
['-c', '.', './10.txt', './1.txt', './7.txt', './8.txt', './4.txt',
'./6.txt', './3.txt', './5.txt', './9.txt', './2.txt']

Peter
 
Dan Stromberg

This may sometimes be relevant, but I doubt that it matters in this
particular case.

I don't think xargs is ever parallel, but GNU parallel is supposed to
be a parallel tool with options and usage similar to those/that of
xargs:
http://www.gnu.org/software/parallel/

xargs' main advantages are:
1) Simpler quoting (correctness), especially if you use (GNU) "find
-print0" with "xargs -0"
2) Far fewer exec's, which usually means much better performance
 
H₂0.py

What's the advantage of 'find ... | xargs ...' over 'find ... -exec ...'?

Portability. Running the '-exec' version will work fine in a directory
with a relatively small number of files, but will fail on a large one.
'xargs', which is designed to handle exactly that situation, splits
the returned output into chunks that 'rm' and similar commands can handle.
'|xargs' is always the preferred option when you don't know how large
the output is going to be.
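The difference being debated (one `rm` per file with `-exec ... \;`, versus a few processes each handed many file names with `xargs` or `-exec ... +`) can be illustrated in Python. This is a sketch of the batching idea only, not of how xargs is implemented: the helper names are hypothetical, the sketch assumes Python 3.9 or later, and the small fixed character budget stands in for the operating system's much larger argument-length limit.

```python
import subprocess
import sys
from typing import Iterator


def batch_args(args: list[str], max_chars: int) -> Iterator[list[str]]:
    """Yield consecutive groups of args whose total length (counting one
    separator per argument) stays within max_chars. An argument longer than
    max_chars still gets a group of its own."""
    batch: list[str] = []
    size = 0
    for arg in args:
        cost = len(arg) + 1
        if batch and size + cost > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(arg)
        size += cost
    if batch:
        yield batch


def run_batched(cmd: list[str], args: list[str], max_chars: int) -> int:
    """Run cmd once per batch of args and return the number of processes started."""
    launches = 0
    for group in batch_args(args, max_chars):
        # Passing argv as a list bypasses the shell, so spaces in file names
        # need no quoting (the problem `find -print0 | xargs -0` addresses).
        subprocess.run(cmd + group, check=True)
        launches += 1
    return launches


if __name__ == "__main__":
    files = [f"./{i}.txt" for i in range(1, 11)]
    show_args = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]
    print("processes started:", run_batched(show_args, files, max_chars=30))
```

With the 30-character budget used here, the ten names from Peter's example go out in four batches instead of ten separate launches.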
 
