Error with co_filename when loading modules from zip file

Bob

Hi,

I'm using a program that distributes python in a zip
file and ran into an issue with the logging package.
It seems to return the wrong filename/line number when
loading python from a zip file. Please help!

I'm using Python 3.1, and have copied the lib directory to
/home/user/python3.1
and have created a zip of that directory and placed it in
/home/user/python3.1/python31.zip

The logging package gets the filename and line number
of the calling function by looking at two variables, the filename
of the frame in the stack trace and the variable logging._srcfile.
The comparison is done in logging/__init__.py:findCaller.
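
To make that comparison concrete, here is a minimal sketch of the idea, not the actual logging source. It walks up the stack and skips every frame whose code object reports a co_filename equal to the path treated as "the library itself". The names find_caller, srcfile and library_function, and the path passed in, are made up for this illustration.

```python
import os
import sys


def find_caller(srcfile):
    """Return (filename, lineno, function) for the first frame not in srcfile.

    Simplified illustration of the comparison described above; the real
    logic lives in logging/__init__.py:findCaller.
    """
    frame = sys._getframe(1)  # start at whoever called find_caller()
    result = ("(unknown file)", 0, "(unknown function)")
    while frame is not None:
        code = frame.f_code
        # co_filename was fixed when the code object was compiled.
        if os.path.normcase(code.co_filename) == srcfile:
            frame = frame.f_back  # looks like "our own" module, keep walking
            continue
        result = (code.co_filename, frame.f_lineno, code.co_name)
        break
    return result


def library_function():
    # Pretend this function lives in the library whose frames should be skipped.
    return find_caller("/some/other/path/library.py")


# The path passed in does not match this file's co_filename, so the
# "library" frame is not skipped and is reported as the caller.
print(library_function()[2])
```

When the two paths disagree, as in the zip case below, the library's own frame is reported instead of the real caller.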

In the situation above, when I run,
PYTHONPATH=/home/user/python3.1 ./myexe run.py
I see
filename=/home/user/python3.1/logging/__init__.py
_srcfile=/home/user/python3.1/logging/__init__.py
Here, filename and _srcfile are the same, so the logger correctly
outputs the filename of run.py.

When I run,
PYTHONPATH=/home/user/python3.1/python31.zip ./myexe run.py
I see
filename=/home/user/python3.1/logging/__init__.py
_srcfile=/home/user/python3.1/python31.zip/logging/__init__.py
Here, filename and _srcfile are different, so the logger incorrectly
outputs the filename of /home/user/python3.1/logging/__init__.py

I've noticed this:
- the filename seems to be set when you compile the module
- it seems to be set when you load the module (even after moving it)
- it does not seem to get set when you load the module from
the pyc in the zip file!
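
These observations fit how code objects work: the filename is recorded when the source is compiled and travels with the bytecode. A small sketch, using made-up paths and a marshal round trip as a stand-in for writing a .pyc and loading it from somewhere else:

```python
import marshal

source = "def where():\n    return where.__code__.co_filename\n"

# The second argument to compile() becomes co_filename on every code
# object produced from this source. The path is hypothetical.
module_code = compile(source, "/build/dir/mod.py", "exec")

# A .pyc is essentially a small header plus the marshalled code object, so a
# marshal round trip stands in for "write the .pyc, move it, load it later".
restored = marshal.loads(marshal.dumps(module_code))

# Run the restored code with a __file__ that points somewhere else.
namespace = {"__file__": "/somewhere/else/archive.zip/mod.py"}
exec(restored, namespace)

print(restored.co_filename)   # the path recorded at compile time
print(namespace["where"]())   # the same path, regardless of __file__
```

Nothing in the round trip rewrites the recorded path, so __file__ and co_filename can disagree once the bytecode is loaded from a new location.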

I've tried putting only the pyc files, only the py files
and both in the zip file.

Any help?

Thanks,
Bob

run.py:
import logging

class Handler(logging.Handler):
    def __init__(self):
        logging.Handler.__init__(self)

    def emit(self, record):
        print('message: ' + record.msg)
        print('filename: ' + record.pathname)
        print('line: ' + str(record.lineno))

logging.getLogger().addHandler(Handler())
logging.error('hi')
 
Vinay Sajip

> The logging package gets the filename and line number
> of the calling function by looking at two variables, the filename
> of the frame in the stack trace and the variable logging._srcfile.
> The comparison is done in logging/__init__.py:findCaller.

The _srcfile is computed in logging/__init__.py - can you see which of
the paths it takes when computing _srcfile?

> I've tried putting only the pyc files, only the py files
> and both in the zip file.

I think the filename info might be stored in the .pyc from when you
ran it outside the .zip. If you delete all .pyc files and only
have .py in the .zip, what happens?

Regards,

Vinay Sajip
 
Bob Rossi

> The _srcfile is computed in logging/__init__.py - can you see which of
> the paths it takes when computing _srcfile?

Yes, it's this one,
elif __file__[-4:].lower() in ['.pyc', '.pyo']:
    _srcfile = __file__[:-4] + '.py'

_srcfile, I believe, is correct. It's filename that isn't, IMHO.

> I think the filename info might be stored in the .pyc from when you
> ran it outside the .zip. If you delete all .pyc files and only
> have .py in the .zip, what happens?

Nice one.

I tried putting only pyc file, only py files and both
in the zip. But I never tried putting the py files in the zip
and deleting the pyc files. That makes it work as I'm guessing
it has to recompile the python bytecode, making the filename
and _srcfile match.
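
A minimal sketch of that source-only case on a current Python 3 (not verified on 3.1): a throwaway zip holding only a .py file is put on sys.path, and the imported module's __file__ is compared with the co_filename of one of its functions. The archive and module names are invented for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip that contains only source, no .pyc files.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo_lib.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zip_demo_mod.py", "def hello():\n    return 'hi'\n")

sys.path.insert(0, archive)
try:
    mod = importlib.import_module("zip_demo_mod")
finally:
    sys.path.remove(archive)

# Compiled at import time from inside the archive, so both values
# name the same location within the zip.
print(mod.__file__)
print(mod.hello.__code__.co_filename)
```

Because the source is compiled during the import, the recorded filename already includes the archive path, which is why the comparison in the logging package succeeds.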

The problem with this approach is that it's less efficient
than storing the pyc files, since I've got to recompile them
on startup, right?
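
One possible way to keep precompiled files while still recording the in-zip path is the dfile argument of py_compile.compile(), which replaces the real source path as the filename stored in the compiled code. The sketch below runs on a current Python 3; whether the zip importer in 3.1 accepts the resulting .pyc (for example, its timestamp check) is an assumption that has not been tested here. The module name mymod is hypothetical.

```python
import importlib.machinery
import os
import py_compile
import tempfile

# Hypothetical location the module will have once it is inside the zip.
path_in_zip = "/home/user/python3.1/python31.zip/mymod.py"

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "mymod.py")
with open(source_path, "w") as fh:
    fh.write("def hello():\n    return 'hi'\n")

# dfile replaces the real source path as the filename recorded in the
# compiled code; cfile is where the .pyc is written.
pyc_path = py_compile.compile(
    source_path,
    cfile=os.path.join(workdir, "mymod.pyc"),
    dfile=path_in_zip,
    doraise=True,
)

# Load the code object back from the .pyc alone to inspect it.
loader = importlib.machinery.SourcelessFileLoader("mymod", pyc_path)
code = loader.get_code("mymod")
print(code.co_filename)
```

compileall offers a similar ddir option for whole directory trees.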

I found this issue,
http://bugs.python.org/issue6811
and this related patch,
http://hg.python.org/cpython/rev/5deb2094f033
that I think might address this issue. Although that's using 3.3
which isn't released yet.

This is probably an issue that could be addressed in the logging
library. Comparing the compiled in filename
(which is determined at compile time) and the source file name
(which is determined at module load time) doesn't seem to
play well when you are moving the interpreter around in a zip file.
I don't think one would expect it to.

One solution would be to look to see if the filename ends in
pythonNM[.zip]/logging/__init__.py
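
A rough sketch of what such a suffix check could look like. The helper name looks_like_logging_module is hypothetical and this is not how the logging package behaves today; it only illustrates comparing the tail of the path rather than the whole path.

```python
def looks_like_logging_module(filename):
    """Hypothetical check: does this path point at logging/__init__.py?"""
    # Normalise Windows separators so one suffix test covers both styles.
    normalized = filename.replace("\\", "/")
    return normalized.endswith("/logging/__init__.py")


# Both paths reported earlier in the thread pass the check.
print(looks_like_logging_module(
    "/home/user/python3.1/logging/__init__.py"))
print(looks_like_logging_module(
    "/home/user/python3.1/python31.zip/logging/__init__.py"))
# An ordinary script does not.
print(looks_like_logging_module("/home/user/run.py"))
```

The obvious drawback is that any unrelated package that happens to be named logging would match as well.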

Any suggestions?

Thanks,
Bob
 
Bob Rossi

> The _srcfile is computed in logging/__init__.py - can you see which of
> the paths it takes when computing _srcfile?
>
> I think the filename info might be stored in the .pyc from when you
> ran it outside the .zip. If you delete all .pyc files and only
> have .py in the .zip, what happens?

Darn it, this was reported in 2007
http://bugs.python.org/issue1180193
and it was mentioned the logging package was affected.

Yikes.

Any resolutions?

Bob
 
Vinay Sajip

> Darn it, this was reported in 2007
> http://bugs.python.org/issue1180193
> and it was mentioned the logging package was affected.
>
> Yikes.

I will think about this, but don't expect any quick resolution :-( I
think the right fix would be not in the logging package, but in the
module loading machinery (as mentioned on that issue).

I wouldn't worry about the performance aspect - once the logging
package is loaded, there's no performance impact. That's a tiny one-
off hit which you will probably not notice at all.

Regards,

Vinay Sajip
 
Bob Rossi

> I will think about this, but don't expect any quick resolution :-( I
> think the right fix would be not in the logging package, but in the
> module loading machinery (as mentioned on that issue).
>
> I wouldn't worry about the performance aspect - once the logging
> package is loaded, there's no performance impact. That's a tiny
> one-off hit which you will probably not notice at all.

OK.

Do you know where the bytecode gets stored when you load a py
file from a zip?

My program can potentially run for hours, from an embedded context,
and could call into the logger and other py files over and over.

Are the bytecode files stored in RAM one time, or recomputed each
time they are needed?

Thanks,
Bob
 
Peter Otten

Bob said:
> OK.
>
> Do you know where the bytecode gets stored when you load a py
> file from a zip?
>
> My program can potentially run for hours, from an embedded context,
> and could call into the logger and other py files over and over.
>
> Are the bytecode files stored in RAM one time, or recomputed each
> time they are needed?

The bytecode is generated once when the module is loaded and kept as part of
the module object in the sys.modules cache unless you explicitly reload the
module (imp.reload() in Python 3). For a long-running program the compilation
overhead is negligible.
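
A small sketch of that caching behaviour, using the logging package itself: a second import returns the object already stored in sys.modules, and the functions keep the code objects they were given at first load.

```python
import importlib
import logging
import sys

# After the first import the module object lives in sys.modules.
cached = sys.modules["logging"]
code_before = logging.getLogger.__code__

# Importing again is a dictionary lookup; nothing is recompiled.
again = importlib.import_module("logging")

print(again is cached)
# The function still carries the very same code object (its bytecode).
print(again.getLogger.__code__ is code_before)
```

So repeated calls into the logger over hours of runtime reuse bytecode that is already in memory.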
 
