would be nice: import from archive

Just

Jorge Godoy said:
It would be great to have one example with more than one file.

From the discussion I got curious and tested it here and -- since
Python's so efficient I wasn't surprised that -- it worked.



$ cat test.py
def test():
    print "Test from file 1"

$ cat test2.py
def test():
    print "Test from file 2"




I also noticed that there was no '.pyc' created for that import, as is
usually done for uncompressed modules.

The zipimport module will never write to the zip archive, so for most
efficient imports, you have to store .pyc data in there yourself.
zipimport is mostly meant as a repackaging tool, and typical zip files
only contain .pyc files.

Just
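
A minimal sketch of the multi-file case, assuming Python 3. The module names `zdemo_one` and `zdemo_two` and the archive name `demo.zip` are made up for illustration (a module called `test` would clash with the standard library's `test` package), and the functions return their message instead of printing it. It also checks the point about bytecode: the import leaves the archive untouched and writes no `.pyc` next to it.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical module names and contents, chosen only for this sketch.
SOURCES = {
    "zdemo_one.py": "def test():\n    return 'Test from file 1'\n",
    "zdemo_two.py": "def test():\n    return 'Test from file 2'\n",
}

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.zip")

# Store both source files at the top level of the archive.
with zipfile.ZipFile(archive, "w") as zf:
    for name, source in SOURCES.items():
        zf.writestr(name, source)

size_before = os.path.getsize(archive)

# Putting the archive itself on sys.path is all zipimport needs.
sys.path.insert(0, archive)
importlib.invalidate_caches()
import zdemo_one
import zdemo_two

print(zdemo_one.test())
print(zdemo_two.test())

# The import neither changed the archive nor wrote bytecode beside it.
archive_unchanged = os.path.getsize(archive) == size_before
files_in_workdir = sorted(os.listdir(workdir))
print(archive_unchanged, files_in_workdir)
```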
 
Paul Rubin

Paul Rubin said:
I think jar files are just zip files containing an extra file (called
"manifest") that has signatures in it. So you can import from a jar
as if it were a zip.

But to add to that, if module zipfile is going to eventually expect
jar files to be signed, the first patch needed is that if it doesn't
have code to actually check the signatures, it should refuse to load
jar files.

I guess I better check into what Java does about this. It's been a
while since I've used Java, but I seem to remember that signing is not
mandatory.
 
Alex Martelli

Paul Rubin said:
I think this is reasonable, except what does the import statement look
like? Do you say something like "import frob from bar.jar"?

No, you say, as always:

import frob

Importing looks at each item on sys.path, and each item can be:

1. a directory X -- then import looks for X/frob.py or a subdirectory
X/frob/ containing an __init__.py (or in either case .pyc or .pyo)

2. a zipfile X.zip -- then import looks inside (unsigned) file X.zip for
a frob.py, frob.pyc, etc

3. [only novelty...] a signed zipfile X.jar -- then import verifies the
signature then if valid proceed as in 2
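
A small sketch of how the first two cases look in Python 3, using `pkgutil.get_importer` to ask which finder is responsible for a given `sys.path` entry. The archive and module names are hypothetical. Case 3 is only a proposal in this thread: as far as I know CPython's zipimport looks neither at the file extension nor at any signature, so a `.jar` is handled exactly like a `.zip`.

```python
import os
import pkgutil
import tempfile
import zipfile
import zipimport
from importlib.machinery import FileFinder

workdir = tempfile.mkdtemp()

# Hypothetical archives: the same content under a .zip and a .jar name.
zip_path = os.path.join(workdir, "X.zip")
jar_path = os.path.join(workdir, "X.jar")
for path in (zip_path, jar_path):
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("frob_demo.py", "VALUE = 1\n")

dir_finder = pkgutil.get_importer(workdir)   # case 1: a directory
zip_finder = pkgutil.get_importer(zip_path)  # case 2: a zip archive
jar_finder = pkgutil.get_importer(jar_path)  # case 3: treated like case 2

for finder in (dir_finder, zip_finder, jar_finder):
    print(type(finder).__name__)
```

In every case the statement in user code stays a plain `import frob`; only the `sys.path` entry decides where the module comes from.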

Paul Rubin said:
I think jar files are just zip files containing an extra file (called
"manifest") that has signatures in it. So you can import from a jar
as if it were a zip.

But it might be nice to check signatures automatically if reading such
files is a common task.


Alex
 
Alex Martelli

Paul Rubin said:
But to add to that, if module zipfile is going to eventually expect
jar files to be signed, the first patch needed is that if it doesn't
have code to actually check the signatures, it should refuse to load
jar files.

Presumably that would be an optional argument on the ZipFile constructor
specifying what to do about signatures -- defaulting to 'ignore' for
backwards compatibility, I guess, but possibly 'strict' or 'optional' or
something.
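
To make the idea concrete, here is a rough sketch of what such a policy argument could look like. Everything in it is an assumption: `zipfile.ZipFile` has no such parameter, `open_archive` and `has_signature_files` are hypothetical helpers, and the meaning given to 'optional' is a guess. It only looks for JAR-style signature files (`META-INF/*.SF`) and performs no cryptographic verification.

```python
import io
import zipfile

POLICIES = ("ignore", "optional", "strict")


def has_signature_files(zf):
    """Report whether the archive carries JAR-style signature files."""
    return any(
        name.upper().startswith("META-INF/") and name.upper().endswith(".SF")
        for name in zf.namelist()
    )


def open_archive(file, signatures="ignore"):
    """Hypothetical wrapper; zipfile.ZipFile itself has no such option."""
    if signatures not in POLICIES:
        raise ValueError("unknown signature policy: %r" % (signatures,))
    zf = zipfile.ZipFile(file)
    if signatures == "strict" and not has_signature_files(zf):
        zf.close()
        raise ValueError("unsigned archive refused under 'strict' policy")
    # 'ignore' and 'optional' both accept unsigned archives. A real
    # implementation would verify any signature it finds under
    # 'optional' and 'strict'; this sketch only checks for presence.
    return zf


def make_archive(signed):
    """Build an in-memory demo archive, optionally with a placeholder .SF."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("frob.py", "VALUE = 1\n")
        if signed:
            # Placeholder only: this is not a valid signature file.
            zf.writestr("META-INF/DEMO.SF", "Signature-Version: 1.0\r\n")
    buf.seek(0)
    return buf
```

With the default of 'ignore', existing callers would see no change in behaviour, which is the backwards-compatibility point made above.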

Paul Rubin said:
I guess I better check into what Java does about this. It's been a
while since I've used Java, but I seem to remember that signing is not
mandatory.

OK, but it might make for a nice optional feature anyway.


Alex
 
Paul Rubin

Alex Martelli said:
OK, but it might make for a nice optional feature anyway.

Well, in Java, jars are the only thing you can import from, and you
need to be able to import from unsigned ones or else you have to sign
them all the time. If we use a naming convention, then if you want an
unsigned archive you can just name it .zip instead of .jar. But
still, I better check. I do remember that in JavaScript (which also
used jar files under the Netscape browsers of the day), you could load
code from unsigned jars and the code could do normal operations. But
code that did "dangerous" operations wouldn't run unless it was loaded
from a signed jar file.
 
Benjamin Niemann

Alex said:
I think that would be an excellent idea. If it was just about allowing
import from signed zipfiles it might not be needed, but how best to let
the user optionally DIS-allow imports from UN-signed files does appear
to be something requiring a little debate. An environment variable
would have the advantage of letting the disallowing work even for the
early imports that Python does before application code gets control, but
some people dislike relying on environment variables particularly for
security-related configuration tasks. Would it make sense to rely on a
naming convention instead? I.e. foo.zip would be unsigned but bar.jar
would have to be signed or else no go. This would have the advantage of
allowing substantial granularity in controlling this.

Isn't the purpose of signatures that the importing program can trust the
module? If it's implemented as you suggest, an attacker could just
inject a path to an unsigned module into PYTHONPATH to fool a program. How
about something like

require_signature('mymodule')
import mymodule

or

import mymodule
verify_module(mymodule)

Another question is, where to place (require|verify)_signature() (that
could also take a CA key (or list of) as optional argument to only allow
modules signed by this CA). It must not be imported from an untrusted
module.
The whole signing thing probably only makes sense if Python and its
stdlib can be trusted (= signed).

Or am I missing other useful applications of signed archives?
 
Paul Rubin

Benjamin Niemann said:
import mymodule
verify_module(mymodule)

This is no good. The import runs any code in the module, so the sig
has to verify BEFORE the module loads.

Benjamin Niemann said:
Another question is, where to place (require|verify)_signature() (that
could also take a CA key (or list of) as optional argument to only
allow modules signed by this CA). It must not be imported from an
untrusted module.

Correct, that's the messy infrastructure I mentioned. My basic idea is
"do whatever Java does".
 
Alex Martelli

Benjamin Niemann said:
Isn't the purpose of signatures that the importing program can trust the
module? If it's implemented as you suggest, an attacker could just
inject a path to an unsigned module into PYTHONPATH to fool a program.

If the attacker is able to alter sys.path then it does not matter
whether zipfiles are even considered -- the attacker could simply
position a .pyc file early on the path.

Benjamin Niemann said:
How about something like

require_signature('mymodule')
import mymodule

This could be made to work, but only if _every_ module was so checked
before importing it; otherwise, even just one unchecked module could
easily subvert __import__ or other aspects of the import hook mechanism.

So, if you're considering this approach, it makes more sense to switch
on module checking globally in an early phase of Python's startup
(because Python starts importing modules pretty early indeed). New
conventions will also be needed for signature of .py, .pyc, .pyo, and
.so (or other binary DLLoid files containing Python extensions).

It appears to me that this is a project of orders of magnitude more work
than the original idea, which didn't assume the attacker could freely
alter sys.path, and protected only against altered or replaced zipfiles
specifically -- presumably files that have been legitimately placed on
the path by authorized agents.
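
A short sketch, assuming Python 3, of why one unchecked module is enough: anything that runs can replace `builtins.__import__`, and from then on every `import` statement goes through the replacement. The `spying_import` function is made up for this example and only records names, but it could just as well return a different module than the one requested.

```python
import builtins

seen = []
original_import = builtins.__import__


def spying_import(name, globals=None, locals=None, fromlist=(), level=0):
    # A hostile module could do anything here, including handing back
    # a different module than the one that was asked for.
    seen.append(name)
    return original_import(name, globals, locals, fromlist, level)


builtins.__import__ = spying_import
try:
    import json  # the plain import statement now goes through the hook
finally:
    builtins.__import__ = original_import  # always restore the real one

print("json" in seen)
```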

Benjamin Niemann said:
or

import mymodule
verify_module(mymodule)

Too late: 'import mymodule' runs code in mymodule; shutting the barn door
after mymodule has tramped all over your system is of little use.

Benjamin Niemann said:
Another question is, where to place (require|verify)_signature() (that
could also take a CA key (or list of) as optional argument to only allow
modules signed by this CA). It must not be imported from an untrusted
module.
The whole signing thing probably only makes sense if Python and its
stdlib can be trusted (= signed).

The stdlib Python (.pyc) parts could be moved into a .jar (signed) just
as easily as into a .zip (unsigned). The EXE and DLL's involved may be
quite a problem, though, since for those you're in the hands of the
operating system -- what could Python itself possibly do to stop an
altered Python.Exe from running?!

Benjamin Niemann said:
Or am I missing other useful applications of signed archives?

Remote distribution of code. My program's startup checks if an updated
version of foobar.jar purports to be available, and if so downloads it
and places it where the previous version used to be. Admittedly, in
this case, checking once and for all right after the download might work
better than checking after each and every import. (Not sure why but
this reminds me of the old 'end to end approach' issue;-).

Unfortunately, Python currently doesn't have a working 'sandbox'
mechanism where code might run in a restricted way if it hadn't passed
all needed checks. This lack, among other things, may certainly lessen
the usefulness of checks performed at (or, rather, just before) import
time.


Alex
 
Benjamin Niemann

Alex said:
If the attacker is able to alter sys.path then it does not matter
whether zipfiles are even considered -- the attacker could simply
position a .pyc file early on the path.




This could be made to work, but only if _every_ module was so checked
before importing it; otherwise, even just one unchecked module could
easily subvert __import__ or other aspects of the import hook mechanism.

So, if you're considering this approach, it makes more sense to switch
on module checking globally in an early phase of Python's startup
(because Python starts importing modules pretty early indeed). New
conventions will also be needed for signature of .py, .pyc, .pyo, and
.so (or other binary DLLoid files containing Python extensions).

It appears to me that this is a project of orders of magnitude more work
than the original idea, which didn't assume the attacker could freely
alter sys.path, and protected only against altered or replaced zipfiles
specifically -- presumably files that have been legitimately placed on
the path by authorized agents.

Mmmm, seems I missed this point...




Alex said:
Too late: 'import mymodule' runs code in mymodule; shutting the barn door
after mymodule has tramped all over your system is of little use.

Correct.



Alex said:
The stdlib Python (.pyc) parts could be moved into a .jar (signed) just
as easily as into a .zip (unsigned). The EXE and DLL's involved may be
quite a problem, though, since for those you're in the hands of the
operating system -- what could Python itself possibly do to stop an
altered Python.Exe from running?!

A use-case that came to my mind was a suid program that wants to verify
that everything that it imports is what it expects, specifically not to
allow the non-root user to inject any malicious replacement modules.
Doing this with module signatures is probably not the easiest way...

Alex said:
Remote distribution of code. My program's startup checks if an updated
version of foobar.jar purports to be available, and if so downloads it
and places it where the previous version used to be. Admittedly, in
this case, checking once and for all right after the download might work
better than checking after each and every import. (Not sure why but
this reminds me of the old 'end to end approach' issue;-).

Yes, this could be handled by a generic 'file-signature-verification'
mechanism. No need to delay verification until import.

Alex said:
Unfortunately, Python currently doesn't have a working 'sandbox'
mechanism where code might run in a restricted way if it hadn't passed
all needed checks. This lack, among other things, may certainly lessen
the usefulness of checks performed at (or, rather, just before) import
time.

Yep. Python treats us as adults. But adults often have to deal with
children and the environment should give adults some kind of authority
and places where the kiddies can play without causing damage... ;) This
is currently not the case for Python.
 
Terry Reedy

Paul Rubin said:
This is no good. The import runs any code in the module, so the sig
has to verify BEFORE the module loads.

'import x' is syntactic sugar for 'x = __import__('x')'. I do not see it
as necessary that sugar for the common case need cover every possible case.
So, how about giving __import__ an optional param 'signed', defaulting to
False, to allow signed=True or signed=CA?

Terry J. Reedy
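
For reference, a brief sketch of the equivalence in Python 3. The `signed` parameter discussed here is a proposal only; the real `__import__` has no such argument. Note the wrinkle with dotted names, where `__import__` returns the top-level package and `importlib.import_module` returns the submodule that was named.

```python
import importlib
import sys

# Roughly what the statement `import json` does.
json = __import__("json")
same_object = json is sys.modules["json"]

# For dotted names __import__ returns the top-level package...
top = __import__("os.path")
# ...whereas importlib.import_module returns the named submodule.
leaf = importlib.import_module("os.path")

print(same_object, top.__name__, leaf is sys.modules["os.path"])
```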
 
Paul Rubin

Terry Reedy said:
'import x' is syntactic sugar for 'x = __import__('x')'. I do not see it
as necessary that sugar for the common case need cover every possible case.
So, how about giving __import__ an optional param 'signed', defaulting to
False, to allow signed=True or signed=CA?

Man, that __import__ thing is ugly. I think it's better to extend the
syntax, e.g.
import x(a,b) => __import__('x', {'a':None, 'b':None})
import x(a=v1,b=v2)=> __import__('x', {'a':v1, 'b':v2})

so you could say
import x(signed)
or
import x(signed, certfile='mycerts.pem')

or whatever.
 
Terry Reedy

Paul Rubin said:
Man, that __import__ thing is ugly.

Yes... but importing from signed zips is sufficiently rare and esoteric
that I would not see surface ugliness that accompanies using current syntax
as the most important consideration.

Paul Rubin said:
I think it's better to extend the syntax, e.g.
import x(a,b) => __import__('x', {'a':None, 'b':None})
import x(a=v1,b=v2)=> __import__('x', {'a':v1, 'b':v2})

Identifier(args) is currently a call of identifier with args and
overloading that syntax to mean something similar but different is, to me,
even uglier in a different sort of way.

Terry J. Reedy
 
Paul Rubin

Terry Reedy said:
Identifier(args) is currently a call of identifier with args and
overloading that syntax to mean something similar but different is, to me,
even uglier in a different sort of way.

Ok, use brackets instead: import x[a,b].
Then it's just a matter of overloading the index operator on __import__.
 
Peter Otten

Paul said:
Terry Reedy said:
Identifier(args) is currently a call of identifier with args and
overloading that syntax to mean something similar but different is, to me,
even uglier in a different sort of way.

Ok, use brackets instead: import x[a,b].
Then it's just a matter of overloading the index operator on __import__.

@signed
@certfile('mycerts.pem')
import x

anybody?


Peter
 
Jorge Godoy

Just said:
The zipimport module will never write to the zip archive, so for most
efficient imports, you have to store .pyc data in there yourself.
zipimport is mostly meant as a repackaging tool, and typical zip files
only contain .pyc files.

They aren't created even outside of the zip archive, this is what I
meant ;-)
 
Jorge Godoy

Just said:
But since .pyc's are always generated in the same directory as the .py
files, where else would you expect them to be generated?

At the directory where the zip archive is stored.
 
Just

Jorge Godoy said:
At the directory where the zip archive is stored.

How does that follow? The zip archive _itself_ is the "directory" where
the .py files are, why would Python suddenly choose to write .pyc files
one level up? And what about packages? It simply doesn't work that way.

Just
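
Since the archive is never written to at import time, the bytecode has to go in when the archive is built. One way to do that, sketched here for Python 3, is `zipfile.PyZipFile.writepy`, which byte-compiles a source file and stores the resulting `.pyc` in the archive. The module name `zdemo_compiled` and the file names are hypothetical.

```python
import importlib
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "zdemo_compiled.py")  # hypothetical module
with open(source_path, "w") as f:
    f.write("def test():\n    return 'Test from bytecode'\n")

archive = os.path.join(workdir, "compiled.zip")
# writepy() byte-compiles the source and adds the .pyc to the archive.
with zipfile.PyZipFile(archive, "w") as zf:
    zf.writepy(source_path)

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)

# Only the archive goes on sys.path, so the import is served from it.
sys.path.insert(0, archive)
importlib.invalidate_caches()
import zdemo_compiled

print(zdemo_compiled.test())
print(zdemo_compiled.__file__)
```

The compile step does write a `.pyc` on the real filesystem next to the source, but that happens at packaging time, not when the archive is imported.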
 
Martin v. Löwis

Paul said:
so you could say
import x(signed)
or
import x(signed, certfile='mycerts.pem')

or whatever.

I believe that import is the wrong point in time for checking
signatures. You want to check the signature when the file is
added to sys.path, i.e.

imp.verify_signature(filename)
sys.path.append(filename)

or

imp.verify_all_signatures(sys.path)

That way, you can guarantee that trusted code is on sys.path
all the time. Then, you can also trust any import statement.

Regards,
Martin
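
A rough sketch of the check-before-adding-to-`sys.path` idea, assuming Python 3. The `imp.verify_signature` call above is a proposal, not an existing function, so this example substitutes something much simpler: comparing the archive's SHA-256 digest against a pinned value. That is a stand-in for real signature verification, and `append_verified`, `file_sha256` and the file names are all hypothetical.

```python
import hashlib
import hmac
import os
import sys
import tempfile
import zipfile


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def append_verified(path, expected_sha256):
    """Hypothetical helper: add path to sys.path only if its digest matches."""
    actual = file_sha256(path)
    if not hmac.compare_digest(actual, expected_sha256):
        raise ImportError("digest mismatch for %s; not added to sys.path" % path)
    sys.path.append(path)


# Demo with a throwaway archive and a hypothetical module name.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "trusted.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zdemo_trusted.py", "ANSWER = 42\n")

# In practice the expected digest would come from a trusted source.
pinned = file_sha256(archive)
append_verified(archive, pinned)
import zdemo_trusted

print(zdemo_trusted.ANSWER)
```

The check is only as good as the protection of the file afterwards: if the archive can be replaced between verification and import, the guarantee is gone.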
 
Jorge Godoy

Just said:
How does that follow? The zip archive _itself_ is the "directory" where
the .py files are, why would Python suddenly choose to write .pyc files
one level up? And what about packages? It simply doesn't work that way.

Because the implementation that made it possible to unzip the "directory"
and find the files in there could also place the '.pyc' in a
different place, such as the real directory where the zip file resides.

Python didn't use to open zip files either, and now it does. I don't see
anything wrong with making it also write the '.pyc' files. Do you?
 
