python24.zip


Robin Becker

Investigating a query about the Python path, I see that my win32 installation has
c:/windows/system32/python24.zip (which does not exist) second on sys.path,
before the actual python24/lib etc. etc.

Firstly, should Python start up with non-existent entries on the path?
Secondly, is this entry the default for some other kind of Python installation?
 

Martin v. Löwis

Robin said:
Firstly, should Python start up with non-existent entries on the path?

Yes, this is by design.

Secondly, is this entry the default for some other kind of Python
installation?

Yes. People can package everything they want in python24.zip (including
site.py). This can only work if python24.zip is already on the path
(and I believe it will always be sought in the directory where
python24.dll lives).

Regards,
Martin
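To check this on a given machine, one can look for the pythonXY.zip name in sys.path and test whether the file is really there. A minimal sketch, assuming a Python 3 interpreter and the usual naming (python24.zip for 2.4, python312.zip for 3.12); stdlib_zip_entries is a hypothetical helper, not part of the standard library:

```python
import os
import sys


def stdlib_zip_entries(paths, version=sys.version_info):
    """Return (entry, exists) pairs for the pythonXY.zip entries in `paths`.

    `version` only needs major and minor at indexes 0 and 1, so a plain
    tuple such as (2, 4) also works.
    """
    zip_name = "python{}{}.zip".format(version[0], version[1])
    return [
        (entry, os.path.exists(entry))
        for entry in paths
        if os.path.basename(entry) == zip_name
    ]


if __name__ == "__main__":
    for entry, exists in stdlib_zip_entries(sys.path):
        print(entry, "exists" if exists else "does not exist")
```

On many installations this prints a single entry that does not exist, which is the situation Robin describes.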
 

Dieter Maurer

Martin v. Löwis said:
Yes, this is by design.


Yes. People can package everything they want in python24.zip (including
site.py). This can only work if python24.zip is already on the path
(and I believe it will always be sought in the directory where
python24.dll lives).

The question was:

"should python start up with **non-existent** objects on the path".

I think there is no reason why path needs to contain an object
which does not exist (at the time the interpreter starts).

In your use case, "python24.zip" does exist and therefore may
be on the path. When "python24.zip" does not exist, it does
not contain anything and especially not "site.py".


I recently analysed excessive import times and
saw thousands of costly and unnecessary filesystem operations due to:

* long "sys.path", especially containing non-existent objects

Although these objects do not exist, about 5 filesystem operations are
tried on each of them for any module not yet located.

* a severe weakness in Python's import hook treatment

When there is an importer "i" for a path "p" and
this importer cannot find module "m", then "p" is
treated as a directory and 5 file system operations
are tried to locate "p/m". Of course, all of them fail
when "p" happens to be a zip archive.


Dieter
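Dieter's second point can be modelled in a few lines. This is a simplified sketch, not the actual logic of import.c: the suffix list is a hypothetical stand-in for his "about 5 filesystem operations", failed_probes is a made-up helper, and only the paths tried for a module that is found nowhere are counted.

```python
import os

# Hypothetical stand-in for Dieter's "about 5 filesystem operations":
# a package directory, two extension-module spellings, source and bytecode.
PROBE_SUFFIXES = ["", ".so", "module.so", ".py", ".pyc"]


def failed_probes(name, entries, archives, fall_back_to_fs=True):
    """List the filesystem paths probed for a module `name` found nowhere.

    `archives` holds the entries served by an importer (e.g. zip files), whose
    own lookup uses an in-memory index and costs no filesystem operations.
    With fall_back_to_fs=True, an archive whose importer answers "not found"
    is then probed as if it were a directory, as Dieter describes; with
    False, the importer's answer is trusted.
    """
    probes = []
    for entry in entries:
        if entry in archives and not fall_back_to_fs:
            continue
        probes.extend(os.path.join(entry, name + suffix) for suffix in PROBE_SUFFIXES)
    return probes
```

With two zip archives and one missing directory, the fallback triples the number of probes (15 instead of 5) for every module not yet located.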
 

Robin Becker

Dieter Maurer wrote:
......
The question was:

"should python start up with **non-existent** objects on the path".

I think there is no reason why path needs to contain an object
which does not exist (at the time the interpreter starts).

In your use case, "python24.zip" does exist and therefore may
be on the path. When "python24.zip" does not exist, it does
not contain anything and especially not "site.py".

I think this was my intention, but I also have some concern about
having two possible locations for the standard library. It seems unpythonic
and liable to cause confusion if some package should manage to install
python24.zip while I believe that python24\lib is being used.

I recently analysed excessive import times and
saw thousands of costly and unnecessary filesystem operations due to:

* long "sys.path", especially containing non-existent objects

Although these objects do not exist, about 5 filesystem operations are
tried on each of them for any module not yet located.

* a severe weakness in Python's import hook treatment

When there is an importer "i" for a path "p" and
this importer cannot find module "m", then "p" is
treated as a directory and 5 file system operations
are tried to locate "p/m". Of course, all of them fail
when "p" happens to be a zip archive.


Dieter

I suppose that's a reason for eliminating duplicates and non-existent entries.
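Robin's suggestion can be sketched as an order-preserving filter over sys.path. This is a hypothetical helper, assuming that a plain existence check is good enough; as Martin points out later in the thread, entries served by path hooks need not exist on disk, and pruning also breaks code that adds a directory to sys.path before creating it.

```python
import os


def prune_path(entries, exists=os.path.exists):
    """Drop duplicate and non-existent entries from a sys.path-like list.

    Order is preserved and the first occurrence of a duplicate wins.
    `exists` can be swapped for another check (or a stub in tests).
    """
    seen = set()
    pruned = []
    for entry in entries:
        key = os.path.normcase(entry)
        if key in seen:
            continue  # duplicate of an earlier entry
        seen.add(key)
        # "" stands for the current directory, which os.path.exists("") rejects.
        if entry == "" or exists(entry):
            pruned.append(entry)
    return pruned
```

A program could apply it early with sys.path[:] = prune_path(sys.path).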
 

Martin v. Löwis

Dieter said:
The question was:

"should python start up with **non-existent** objects on the path".

I think there is no reason why path needs to contain an object
which does not exist (at the time the interpreter starts).

There is. When the interpreter starts, it doesn't know which objects
do or do not exist. So it must put python24.zip on the path
just in case.

In your use case, "python24.zip" does exist and therefore may
be on the path. When "python24.zip" does not exist, it does
not contain anything and especially not "site.py".

Yes, but the interpreter cannot know in advance whether
python24.zip will be there when it starts.

I recently analysed excessive import times and
saw thousands of costly and unnecessary filesystem operations due to:

Hmm. In my Python 2.4 installation, I only get 154 open calls and
63 stat calls when running an empty Python file. So somebody must have messed
with sys.path really badly if you saw thousands of file operations
(although I wonder what operating system you use, if failing
open operations are costly there; most operating systems should handle them
very efficiently).

Regards,
Martin
 

Steve Holden

Robin said:
Dieter Maurer wrote: [...]

I think this was my intention, but I also have some concern about
having two possible locations for the standard library. It seems unpythonic
and liable to cause confusion if some package should manage to install
python24.zip while I believe that python24\lib is being used.

I recently analysed excessive import times and
saw thousands of costly and unnecessary filesystem operations due to:

* long "sys.path", especially containing non-existent objects

Although these objects do not exist, about 5 filesystem operations are
tried on each of them for any module not yet located.

* a severe weakness in Python's import hook treatment

When there is an importer "i" for a path "p" and
this importer cannot find module "m", then "p" is
treated as a directory and 5 file system operations
are tried to locate "p/m". Of course, all of them fail
when "p" happens to be a zip archive.


Dieter


I suppose that's a reason for eliminating duplicates and non-existent entries.

There are some aspects of Python's initialization that are IMHO a bit
too filesystem-dependent. I mentioned one in


http://sourceforge.net/tracker/index.php?func=detail&aid=1116520&group_id=5470&atid=105470

but I'd appreciate further support. Ideally there should be some means
for hooked import mechanisms to provide answers that are currently
sought from the filestore.

regards
Steve
 

Dieter Maurer

Martin v. Löwis said:
Dieter Maurer wrote:
...

There is. When the interpreter starts, it doesn't know which objects
do or do not exist. So it must put python24.zip on the path
just in case.

Really?

Is the interpreter unable to call "C" functions ("stat" for example)
to determine whether an object exists before it puts it on "path"?

Yes, but the interpreter cannot know in advance whether
python24.zip will be there when it starts.

Thus, it can check dynamically when it starts.

Hmm. In my Python 2.4 installation, I only get 154 open calls and
63 stat calls when running an empty Python file. So somebody must have messed
with sys.path really badly if you saw thousands of file operations
(although I wonder what operating system you use, if failing
open operations are costly there; most operating systems should handle them
very efficiently).

The application was Zope importing about 2,500 modules
from 2 zip files "zope.zip" and "python24.zip".
This resulted in about 12,500 opens, about 4 times more
than would be expected, of which about 10,000 were failing opens.


Dieter
 

Martin v. Löwis

Dieter said:
Really?

Is the interpreter unable to call "C" functions ("stat" for example)
to determine whether an object exists before it puts it on "path"?

What do you mean, "unable to"? It just doesn't.

Could it? Perhaps, if somebody wrote a patch.
Would the patch be accepted? Perhaps, if it didn't break something
else.

In the past, there was a silent guarantee that you could add
items to sys.path, and only later create the directories behind
these items. I don't know whether people rely on this guarantee.

The application was Zope importing about 2,500 modules
from 2 zip files "zope.zip" and "python24.zip".
This resulted in about 12,500 opens, about 4 times more
than would be expected, of which about 10,000 were failing opens.

I see. Out of curiosity: how much startup time was saved
when sys.path was explicitly stripped to only contain these
two zip files?

I would expect that importing 2,500 modules takes *way*
more time than doing 10,000 failed opens.

Regards,
Martin
 

Steve Holden

Dieter said:
There are such hooks. See e.g. the "meta_path" hooks as
described by PEP 302.

Indeed I have written PEP 302-based code to import from a relational
database, but I still don't believe there's any satisfactory way to have
[such a hooked import mechanism] be a first-class component of an
architecture that specifically requires an os.py to exist in the file
store during initialization.

I wasn't asking for an import hook mechanism (since I already knew these
to exist), but for a way to allow such mechanisms to be the sole import
support for certain implementations.

regards
Steve
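For readers who have not used the hooks Dieter mentions, here is a minimal sys.meta_path importer that serves modules from an in-memory dict, standing in for Steve's relational database. It is a sketch for Python 3.4+ using find_spec/exec_module, not the find_module/load_module protocol PEP 302 defined for 2.4; DictImporter and the module names are hypothetical. Because the hook is installed by Python code, it only takes effect after interpreter initialization, which is the limitation Steve is describing.

```python
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a {module name: source text} dict.

    The dict stands in for a non-filesystem store such as a database table.
    """

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self, origin="dict:" + fullname)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        exec(compile(source, module.__spec__.origin, "exec"), module.__dict__)


if __name__ == "__main__":
    sys.meta_path.insert(0, DictImporter({"dbmodule": "ANSWER = 42\n"}))
    import dbmodule
    print(dbmodule.ANSWER)  # prints 42
```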
 

Scott David Daniels

Martin said:
What do you mean, "unable to"? It just doesn't.

In fact, the interpreter doesn't necessarily know when it is
affecting the path.

Could it? Perhaps, if somebody wrote a patch.
Would the patch be accepted? Perhaps, if it didn't break something
else.

In the past, there was a silent guarantee that you could add
items to sys.path, and only later create the directories behind
these items. I don't know whether people rely on this guarantee.

If you only re-checked "lost" files/directories on the path once a few
seconds had passed since the last check, you might be able
to drive this "failed open" time down drastically without seriously
affecting those who rely on that guarantee. Such an implementation should have a
call which allows you to "clear" the timestamps for the "known bad"
entries.

--Scott David Daniels
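Scott's idea amounts to a negative cache with a time-to-live plus a reset call. A minimal sketch, assuming an injectable existence check and clock so the behaviour can be tested; MissingEntryCache and its methods are hypothetical, and this is not how CPython's import system is implemented:

```python
import time


class MissingEntryCache:
    """Remember path entries found missing; re-check each only after `ttl` seconds."""

    def __init__(self, exists, ttl=5.0, clock=time.monotonic):
        self.exists = exists    # e.g. os.path.exists, or any other check
        self.ttl = ttl
        self.clock = clock
        self._missing = {}      # entry -> time it was last found missing

    def is_missing(self, entry):
        now = self.clock()
        checked_at = self._missing.get(entry)
        if checked_at is not None and now - checked_at < self.ttl:
            return True         # known bad and checked recently: skip the filesystem
        if self.exists(entry):
            self._missing.pop(entry, None)
            return False
        self._missing[entry] = now
        return True

    def clear(self):
        """Forget all known-bad entries, as in Scott's "clear" call."""
        self._missing.clear()
```

A finder would consult is_missing() before probing an entry, so each missing entry costs at most one check per ttl window instead of several per import.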
 

Martin v. Löwis

Scott said:
In fact, the interpreter doesn't necessarily know when it is
affecting the path.

Now I remember what makes this stuff really difficult: PEP 302
introduces path hooks (sys.path_hooks), allowing imports from
other sources than files. So the items on sys.path don't have
to be directory or file names at all, and importing from them
may still succeed even though stat fails.

Regards,
Martin
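Martin's point can be seen directly with a path hook that claims a sys.path entry which is neither a file nor a directory. A sketch using the Python 3.4+ PathEntryFinder protocol rather than the 2.4 API; the memory:// entry, MEMORY_STORES and the class names are all hypothetical:

```python
import importlib.abc
import importlib.util
import os
import sys

# Hypothetical store: sys.path entry -> {module name: source text}.
MEMORY_STORES = {"memory://demo": {"memhello": "GREETING = 'hello from memory'\n"}}


class StringLoader(importlib.abc.Loader):
    """Execute a module from a source string instead of a file."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        code = compile(self.source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)


class MemoryPathFinder(importlib.abc.PathEntryFinder):
    """Finder for one memory:// entry on sys.path."""

    def __init__(self, modules):
        self.modules = modules

    def find_spec(self, fullname, target=None):
        if fullname not in self.modules:
            return None
        loader = StringLoader(self.modules[fullname])
        return importlib.util.spec_from_loader(fullname, loader, origin="memory:" + fullname)


def memory_path_hook(entry):
    """Path hook: claim known memory:// entries, refuse everything else."""
    if entry in MEMORY_STORES:
        return MemoryPathFinder(MEMORY_STORES[entry])
    raise ImportError("not a memory entry: %r" % (entry,))  # means "not mine"


if __name__ == "__main__":
    sys.path_hooks.insert(0, memory_path_hook)
    sys.path.append("memory://demo")
    print(os.path.exists("memory://demo"))  # nothing on disk to stat
    import memhello
    print(memhello.GREETING)
```

os.path.exists() reports the entry as missing, yet the import succeeds, so a stat at startup cannot tell on its own whether such an entry is useful.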
 

Robin Becker

Martin v. Löwis wrote:
.....
Now I remember what makes this stuff really difficult: PEP 302
introduces path hooks (sys.path_hooks), allowing imports from
other sources than files. So the items on sys.path don't have
to be directory or file names at all, and importing from them
may still succeed even though stat fails.
.....

So is there an implication of multiplicative behaviour?

I.e. if we have N importers and F failing sys.path entries ahead of the
correct one, do we get on the order of N*F failed stats/opens, etc.?
 

Martin v. Löwis

Robin said:
I.e. if we have N importers and F failing sys.path entries ahead of
the correct one, do we get on the order of N*F failed stats/opens, etc.?

No. Each path hook is supposed to provide a decision as to whether this
is a useful item on sys.path only once; the importer objects themselves
are then cached (with some operation to clear the cache). Each path hook
may apply its own algorithm, e.g. looking at the syntactical structure
or the type of the sys.path item, so not all of them need stat/open
to determine whether they support the item.

The multiplicative behaviour rather results from the different types of
modules: each path item may carry .py, .pyc, .so, module.so, etc.

Regards,
Martin
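A toy model of Martin's answer, assuming every hook declines every entry: the hooks are consulted N*F times in total, because the verdict per entry is cached, while the filesystem probes grow with the number of lookups times F times the number of file suffixes. count_operations is hypothetical, and its hooks return None to decline, whereas real path hooks raise ImportError.

```python
def count_operations(entries, hooks, suffixes, lookups):
    """Count hook calls and filesystem probes for `lookups` failed imports.

    Each entry is offered to the hooks only once and the verdict is cached
    (the role of sys.path_importer_cache); None means "no hook took it, use
    the plain filesystem search", which then tries every suffix on every lookup.
    """
    importer_cache = {}
    hook_calls = 0
    probes = 0
    for _ in range(lookups):
        for entry in entries:
            if entry not in importer_cache:
                importer = None
                for hook in hooks:
                    hook_calls += 1
                    importer = hook(entry)  # simplified: None means "declined"
                    if importer is not None:
                        break
                importer_cache[entry] = importer
            if importer_cache[entry] is None:
                probes += len(suffixes)  # one failed stat/open per candidate file
    return hook_calls, probes
```

With N = 4 declining hooks, F = 3 entries, 5 suffixes and 10 lookups this returns (12, 150): the hooks add a one-off N*F cost, while the suffixes multiply the cost of every lookup.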
 

Robin Becker

Martin said:
No. Each path hook is supposed to provide a decision as to whether this
is a useful item on sys.path only once; the importer objects themselves
are then cached (with some operation to clear the cache). Each path hook
may apply its own algorithm, e.g. looking at the syntactical structure
or the type of the sys.path item, so not all of them need stat/open
to determine whether they support the item.

The multiplicative behaviour rather results from the different types of
modules: each path item may carry .py, .pyc, .so, module.so, etc.

Regards,
Martin

If the importers are tested statically, how does a filesystem path ever manage
to get back into the loop once it has been found missing? In other words, if
things (e.g. python24.zip) are found not importable/usable in one pass, how do
they get reinstated?
 

Martin v. Löwis

Robin said:
If the importers are tested statically, how does a filesystem path ever
manage to get back into the loop once it has been found missing? In other
words, if things (e.g. python24.zip) are found not importable/usable in one
pass, how do they get reinstated?

I think (but see the code yourself) that only the successful importers
are cached.

Regards,
Martin
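Robin's scenario (an entry that is missing at first and created later) can be tried on a current Python 3 interpreter. Rather than relying on exactly what is cached for a missing entry, which may differ between Python versions, this sketch discards that cache entry and calls importlib.invalidate_caches() before retrying; import_after_creating and the module name are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_after_creating(module_name, source):
    """Put a missing directory on sys.path, try an import, then create it and retry.

    Returns (outcome of the first attempt, module from the second attempt).
    """
    entry = os.path.join(tempfile.mkdtemp(), "created-later")  # does not exist yet
    sys.path.append(entry)
    try:
        try:
            importlib.import_module(module_name)
            first_attempt = "succeeded"
        except ImportError:
            first_attempt = "failed"

        os.mkdir(entry)
        with open(os.path.join(entry, module_name + ".py"), "w") as f:
            f.write(source)

        # Forget whatever was cached for this entry, and let the remaining
        # finders refresh their directory caches, before looking again.
        sys.path_importer_cache.pop(entry, None)
        importlib.invalidate_caches()
        return first_attempt, importlib.import_module(module_name)
    finally:
        sys.path.remove(entry)
        sys.path_importer_cache.pop(entry, None)
```

Dropping the cache entry by hand is roughly analogous to the "clear" call Scott suggests above.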
 

Dieter Maurer

Steve Holden said:
...
Indeed I have written PEP 302-based code to import from a relational
database, but I still don't believe there's any satisfactory way to
have [such a hooked import mechanism] be a first-class component of an
architecture that specifically requires an os.py to exist in the file
store during initialization.


I wasn't asking for an import hook mechanism (since I already knew
these to exist), but for a way to allow such mechanisms to be the sole
import support for certain implementations.

We do not have "os.py" (directly) on the file system.
It lives (like everything else) in a zip archive.

This works because the "zipimporter" is put on
"sys.path_hooks" before the interpreter starts executing Python code.

Thus, all you have to do is use a different Python startup
and ensure that your special importer (able to import e.g. "os")
is already set up before you start executing Python code.


Dieter
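The zip importer Dieter relies on is still registered on sys.path_hooks by default. A minimal sketch for Python 3 that packs a module into a fresh zip archive, puts the archive on sys.path and imports from it; the archive and module names are hypothetical:

```python
import importlib
import os
import sys
import tempfile
import zipfile
import zipimport


def import_from_zip(module_name, source):
    """Pack `source` into a new zip archive, put it on sys.path and import it.

    Returns the module and whether the cached importer for the archive is a
    zipimport.zipimporter.
    """
    archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")  # hypothetical name
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(module_name + ".py", source)
    sys.path.insert(0, archive)
    try:
        module = importlib.import_module(module_name)
        importer = sys.path_importer_cache.get(archive)
        return module, isinstance(importer, zipimport.zipimporter)
    finally:
        sys.path.remove(archive)
```

The second value confirms that the path entry was claimed by zipimport.zipimporter rather than treated as a directory.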
 

Dieter Maurer

Martin v. Löwis said:
...
What do you mean, "unable to"? It just doesn't.

The original question was: "why does Python put non-existing
entries on 'sys.path'".

Your answer seems to be: "it just does not check, but that might
change if someone does the work".

This is fine with me.
...
In the past, there was a silent guarantee that you could add
items to sys.path, and only later create the directories behind
these items. I don't know whether people rely on this guarantee.

I do not argue that Python should prevent adding non-existent
items to "path". This would not work, as Python may not
know what "existing" means (due to "path_hooks").

I only argue that it should not *itself* (automatically) put items on path
where it knows the responsible importers and knows (or can
easily determine) that the items do not exist for them.
...

I see. Out of curiosity: how much startup time was saved
when sys.path was explicitly stripped to only contain these
two zip files?

I cannot tell you precisely because it is very time-consuming
to analyse cold start timing behavior (it requires a reboot for
each measurement).

We essentially have the following numbers only:

                    warm start            cold start
                    (filled OS caches)    (empty OS caches)

from file system    5s                    13s
from ZIP archives   4s                    8s
frozen              3s                    5s

The ZIP archive time was measured after a patch to "import.c"
that prevents Python from viewing a ZIP archive member as a directory
when it cannot find the module currently being looked up (of course,
this lookup also fails when the archive member is viewed as a directory).
Furthermore, all C extensions were loaded via a "meta_path" hook (and
not "sys.path"), and "sys.path" contained just the two ZIP archives.
These optimizations led to about 3,000 opens (down from the original 12,500).

I would expect that importing 2,500 modules takes *way*
more time than doing 10,000 failed opens.

You may be wrong: searching for non-existent files may cause
disk I/O, which is several orders of magnitude slower than
CPU activity.

The comparison between warm start (little disk I/O) and cold start
(much disk I/O) tells you that the import process is highly
I/O-dominated (for cold starts).

I know that this does not prove that the failing opens contribute
significantly. However, a colleague reported that the
"import.c" patch (essential for the reduction of the number of opens)
resulted in significant (but not specified) improvements.


Dieter
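Dieter's narrower proposal, dropping only entries that the responsible importers reject, can be approximated by offering each entry to the registered path hooks in turn. A sketch for Python 3, where a hook signals "not mine" by raising ImportError; accepted_by_path_hooks is hypothetical, and calling third-party hooks eagerly like this may have side effects:

```python
import sys


def accepted_by_path_hooks(entry, hooks=None):
    """Return True if some path hook is willing to handle `entry`.

    Hooks are tried in order; raising ImportError means "not mine", which is
    how the path-based import system treats them as well.
    """
    for hook in (sys.path_hooks if hooks is None else hooks):
        try:
            hook(entry)
        except ImportError:
            continue
        return True
    return False
```

On a default installation a missing python24.zip is rejected by both the zip importer and the directory finder, while an existing directory is accepted.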
 

Martin v. Löwis

Dieter said:
The comparison between warm start (little disk I/O) and cold start
(much disk I/O) tells you that the import process is highly
I/O-dominated (for cold starts).

Correct. However, I would expect that the contents of existing
directories are cached, and it might be that the absence of a directory
on sys.path is also cached (I know Linux does negative dentry caching).

I know that this does not prove that the failing opens contribute
significantly. However, a colleague reported that the
"import.c" patch (essential for the reduction of the number of opens)
resulted in significant (but not specified) improvements.

When I experimented with startup time for 2.4, I found that these
calls don't matter in any significant way (at least not for
warm starts). Instead, I found that reducing the size of .pyc files,
by sharing interned strings, gives more speedup (and indeed, 2.4
changed the marshal format to accommodate shared interned strings).

So I would agree that IO makes a significant part of startup, but
I doubt it is directory reading (unless perhaps you have an
absent NFS server or some such).

Regards,
Martin
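The idea Martin mentions, writing a repeated string once and referring back to it afterwards, can still be observed with the marshal module. This relies on the general back-references of marshal format version 3 and later, which differ from the shared interned strings introduced in 2.4, so treat it as an illustration of the idea rather than of the 2.4 format:

```python
import marshal
import sys

# One interned name repeated many times, much as identifiers repeat in code objects.
name = sys.intern("frequently_used_name")
data = tuple(name for _ in range(100))

plain = marshal.dumps(data, 0)                 # format 0: every copy written in full
shared = marshal.dumps(data, marshal.version)  # current format: repeats become back-references

print(len(plain), len(shared))
```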
 
