GC and security

Les Schaffer

i am working on a python application that uses encryption as part of its
security features. so then at some point someone has to enter a
passphrase into the system, and it gets passed into a decryption function (we
are using gpg via subprocess).

so i am curious. so long as i drop all reference to the passphrase
string(s), eventually it gets garbage collected and the memory recycled.
so "before long" the phrase is gone from memory.

is there a best practice way to do this?

thanks

Les Schaffer
 
Felipe Almeida Lessa

2006/8/30, Les Schaffer said:
is there a best practice way to do this?

I'm not a cryptographer, but you should really try the function
collect() inside the gc module.
 
Paul Rubin

Les Schaffer said:
so i am curious. so long as i drop all reference to the passphrase
string(s), eventually it gets garbage collected and the memory recycled.
so "before long" the phrase is gone from memory.

is there a best practice way to do this?

You can't rely on anything like that, either on the Python GC side or
from the OS (which might have long since written the passphrase out to
the swap disk) without special arrangement. Some OS's have system
calls to lock user pages in memory and prevent swapping, and GPG tries
to use them. "Best practice" if you're doing a high security app
involves using special hardware modules to wrap the keys. The
relevant standard is FIPS 140-2, with FIPS-140-3 in preparation:

http://csrc.nist.gov/cryptval/140-2.htm
http://csrc.nist.gov/cryptval/140-3.htm

For most purposes (e.g. some random web service), this stuff is
overkill, though.
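
The page-locking calls mentioned above can be reached from Python through ctypes on POSIX systems. What follows is only a rough sketch and not what GPG itself does: `lock_buffer` is a made-up helper name, it assumes a libc that exports `mlock`, and it reports failure (for example when the locked-memory limit is exceeded, or on Windows) instead of raising.

```python
import ctypes
import ctypes.util


def lock_buffer(buf: bytearray) -> bool:
    """Ask the OS to keep the pages behind `buf` out of swap (hypothetical helper).

    Returns True if mlock() succeeded, False if it failed or is unavailable.
    """
    if len(buf) == 0:
        return False
    try:
        # Assumption: a POSIX libc that exports mlock(); not available on Windows.
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        mlock = libc.mlock
    except (OSError, TypeError, AttributeError):
        return False
    mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    mlock.restype = ctypes.c_int
    # Borrow the bytearray's storage just long enough to read its address.
    view = (ctypes.c_char * len(buf)).from_buffer(buf)
    try:
        return mlock(ctypes.addressof(view), len(buf)) == 0
    finally:
        del view  # release the export so the bytearray is usable as normal


passphrase = bytearray(b"not a real passphrase")
locked = lock_buffer(passphrase)  # True or False depending on platform and limits
```

Locking only covers this one buffer. Resizing the bytearray may move its storage, and any `str` copy of the passphrase made elsewhere is not protected.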
 
Les Schaffer

Paul said:
You can't rely on anything like that, either on the Python GC side or
from the OS (which might have long since written the passphrase out to
the swap disk) without special arrangement.

we offered to disable swap for this app (its not memory intensive) but
this level of precaution was beyond what is currently desired. i
recently learned that Windows can be asked to zero the swap file during
shutdown, though i know there are ways around this one pass write.

Some OS's have system
calls to lock user pages in memory and prevent swapping, and GPG tries
to use them. "Best practice" if you're doing a high security app
involves using special hardware modules to wrap the keys.

understood, i meant best practice in terms of the less rigorous garbage
collection. if the collect() function hastens garbage collection for
unreferenced strings like a passphrase, it costs us nothing and buys us
a wee bit.

The
relevant standard is FIPS 140-2, with FIPS-140-3 in preparation:

http://csrc.nist.gov/cryptval/140-2.htm
http://csrc.nist.gov/cryptval/140-3.htm

thanks for these. we may be called upon to up the security level at some
point.

For most purposes (e.g. some random web service), this stuff is
overkill, though.

we're more sensitive than a web service, but not at the level of
hardware protection. it is health data related, and for the moment we
exceed the OMB's latest on laptop security:

http://www.whitehouse.gov/omb/memoranda/fy2006/m06-16.pdf

i don't see a mention of swap files on there, but maybe i missed it. and
the OMB doc exceeds the security level required by the client app.

les schaffer
 
Aahz

so i am curious. so long as i drop all reference to the passphrase
string(s), eventually it gets garbage collected and the memory recycled.
so "before long" the phrase is gone from memory.

Assuming you're talking about CPython, strings don't really participate
in garbage collection. Keep in mind that the primary mechanism for
reaping memory is reference counting, and generally as soon as the
refcount for an object goes to zero, it gets deleted from memory.
Garbage collection only gets used for objects that refer to other
objects, so it would only apply if string refcounts are being held by
GC-able objects.

Also keep in mind, of course, that deleting objects has nothing to do
with whether the memory gets overwritten...
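
Both halves of this can be checked from the interpreter. The sketch below assumes CPython (other implementations reclaim objects differently); `Holder` is a made-up stand-in class, needed only because plain `str` objects cannot be weakly referenced.

```python
import gc
import weakref


class Holder:
    """Hypothetical stand-in object that carries a secret string."""

    def __init__(self, secret):
        self.secret = secret


# Strings hold no references to other objects, so the cycle collector ignores them.
string_tracked = gc.is_tracked("my passphrase")
list_tracked = gc.is_tracked(["my passphrase"])

gc.disable()  # rule out the cycle collector for this demonstration
try:
    holder = Holder("my passphrase")
    probe = weakref.ref(holder)
    del holder  # drop the last reference
    # In CPython the object is already gone, without any call to gc.collect().
    reclaimed_by_refcount = probe() is None
finally:
    gc.enable()
```

Calling `gc.collect()` therefore adds nothing for an unreferenced string, and none of this says anything about whether the freed bytes get overwritten.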
 
Les Schaffer

Aahz said:
Assuming you're talking about CPython, strings don't really participate
in garbage collection. Keep in mind that the primary mechanism for
reaping memory is reference counting, and generally as soon as the
refcount for an object goes to zero, it gets deleted from memory.

ok so far ...

Garbage collection only gets used for objects that refer to other
objects, so it would only apply if string refcounts are being held by
GC-able objects.

you lost me by here ... is there something different about string
objects than other objects in Python? are you saying EVERY string in
Python stays in memory for the lifetime of the app?

Also keep in mind, of course, that deleting objects has nothing to do
with whether the memory gets overwritten...

no chance of being overwritten until deleted, no? and once deleted over
time there is some probability of being overwritten, no? and i am
curious how that works. it sounds like you are saying once a string,
always the same string, in python. if thats true, i am glad i know that.

Les Schaffer
 
Paul Rubin

Les Schaffer said:
understood, i meant best practice in terms of the less rigorous garbage
collection. if the collect() function hastens garbage collection for
unreferenced strings like a passphrase, it costs us nothing and buys us
a wee bit.

GC simply releases the memory for other uses in the application. It
doesn't necessarily zero the memory.

Just what attack are you trying to protect against, if swap space is
less of a problem than leaving keys around in ram?

Keep in mind that the weakest part of this application is likely to be
the passphrase itself. Is there a way to get rid of it?

we're more sensitive than a web service, but not at the level of
hardware protection. it is health data related, and for the moment we
exceed the OMB's latest on laptop security:

Is this data on a laptop? Why do you want to do encryption in the
application, instead of using an encrypted file system? Is there some
obstacle to using a token (like a smart card) to hold the key?
 
Tim Peters

[Aahz]
[Les Schaffer]
ok so far ...

Not really ;-) Refcounting is /a/ "garbage collection" technique, and
drawing a distinction between refcount-based reclamation and other
ways CPython reclaims memory has no bearing on your question.

you lost me by here ...

That tends to happen when an irrelevant distinction gets made ;-)

is there something different about string objects than other objects in
Python?

Not anything relevant wrt garbage collection. It may be relevant that
strings are immutable, since that prevents you from overwriting a
string's contents yourself.

are you saying EVERY string in Python stays in memory for the lifetime
of the app?

He wasn't, and they don't.

no chance of being overwritten until deleted, no?

True.

and once deleted over time there is some probability of being
overwritten, no?

True. Also true if you add the intended "non-zero" qualifier to
"probability" ;-)

and i am curious how that works.

Purely accidental -- nothing guaranteed -- details can (& do) change
across releases. Read obmalloc.c for a tour of the accidents du jour.

it sounds like you are saying once a string, always the same string, in python.
if thats true, i am glad i know that.

Not true, so be glad to forget it.

A curious possibility: if you do a debug build of Python, obmalloc.c
arranges to overwrite all of an object's memory as soon as the object
is reclaimed (by any means, refcounting or otherwise). That wasn't
for "security" (faux or otherwise), it's to make it easier to detect
buggy C code referencing freed memory.
 
Fredrik Lundh

Les said:
i am working on a python application that uses encryption as part of its
security features. so then at some point someone has to enter a
passphrase into the system, and it gets passed into a decryption function (we
are using gpg via subprocess).

so i am curious. so long as i drop all reference to the passphrase
string(s), eventually it gets garbage collected and the memory recycled.
so "before long" the phrase is gone from memory.

Since Python uses reference counting, if you drop all references, the
object is garbage collected immediately, and the associated memory is
freed. However, freeing memory doesn't mean that the memory is cleared,
so the passphrase will still be visible in memory, until some other part
of your program allocates the same memory area and overwrites it.

you could obscure things a bit by storing the passphrase as a list of
characters, or a list of integers, and write it to gpg one character at
a time (if that's possible; if not, you may need to write a custom
extension that builds a command string in a C-level buffer, runs the
command, and then overwrites the buffer before returning).

</F>
 
Fredrik Lundh

Fredrik said:
a time (if that's possible; if not, you may need to write a custom
extension that builds a command string in a C-level buffer, runs the
command, and then overwrites the buffer before returning).
>>> import array, subprocess
>>> cmd = [101, 99, 104, 111, 32, 39, 104, 101, 108, 108, 111, 39]
>>> cmd = array.array("b", cmd) # build mutable buffer
>>> subprocess.call([buffer(cmd)], shell=True)
'hello'
>>> for i in range(len(cmd)): cmd[i] = 0 # nuke it


the secret text will be visible in memory during the subprocess call,
but it won't linger around once the for-loop has finished.

(don't forget to put a try/finally clause around the critical part)

</F>
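
On current Python 3 the `buffer` built-in is gone, but a `bytearray` can play the same role of a mutable buffer. The sketch below is one possible adaptation and not Fredrik's code: it feeds the secret to the child's stdin instead of building a command line, it uses a small Python child process as a stand-in for gpg, and `run_with_secret` is a made-up name. Copies made by the pipe, the OS, or the child process are outside its control.

```python
import subprocess
import sys


def run_with_secret(argv, secret: bytearray) -> bytes:
    """Send `secret` to the child's stdin, then zero the buffer in place.

    Hypothetical helper. The try/finally wipes the buffer even if the call fails.
    """
    try:
        with subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        ) as proc:
            out, _ = proc.communicate(input=secret)
        return out
    finally:
        for i in range(len(secret)):
            secret[i] = 0  # nuke it


# Stand-in child: reports how many bytes it received on stdin.
# Assumption: with gpg one would presumably pass --passphrase-fd and pipe the
# passphrase in the same way; that invocation is not shown or tested here.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))",
]
secret = bytearray(b"correct horse")
result = run_with_secret(child, secret)
```

After the call, `secret` holds only zero bytes. This only helps if the passphrase was collected into the `bytearray` in the first place and never existed as a `str`.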
 
Tim N. van der Leeuw

Fredrik said:
Since Python uses reference counting, if you drop all references, the
object is garbage collected immediately, and the associated memory is
freed. However, freeing memory doesn't mean that the memory is cleared,
so the passphrase will still be visible in memory, until some other part
of your program allocates the same memory area and overwrites it.

you could obscure things a bit by storing the passphrase as a list of
characters, or a list of integers, and write it to gpg one character at
a time (if that's possible; if not, you may need to write a custom
extension that builds a command string in a C-level buffer, runs the
command, and then overwrites the buffer before returning).

Storing the passphrase as a list of something has the advantage that
you could set all list-entries to zero, None or random values before
the list goes out of scope. The individual characters from the
passphrase can of course still be snooped from memory, somehow, in
theory -- but without any coherence. (At most the coherence of the
order of allocation).

However, such obfuscation does not make any real sense unless the
passphrase is always stored in a list and in a list only; if it enters
your program in the form of a string somehow then basically such
obfuscations seem very meaningless to me.


Perhaps the Python interpreter should be extended with a new C type,
'secure_string', which clears all its bytes before being freed. (Just
fantasizing out loud here, not being in any way serious!)


Cheers,

--Tim
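
Short of a new C type, the "clear before freeing" idea can be approximated in pure Python with a `bytearray` wrapped in a context manager. `SecretBytes` below is a hypothetical name, not an existing class, and the caveat above still applies: if the passphrase ever existed as an immutable `str` or `bytes` object (as the literal in this demo does), that copy is not wiped.

```python
class SecretBytes:
    """Hypothetical wrapper that zeroes its buffer when the with-block ends."""

    def __init__(self, data):
        # Copies `data` into mutable storage; the caller's original is untouched.
        self._buf = bytearray(data)

    def __enter__(self):
        return self._buf

    def __exit__(self, exc_type, exc, tb):
        self.wipe()
        return False  # never swallow exceptions

    def wipe(self):
        # Overwrite in place; the length stays the same, so nothing is reallocated.
        for i in range(len(self._buf)):
            self._buf[i] = 0


with SecretBytes(b"s3cret") as buf:
    seen_inside = bytes(buf)  # demo only: this creates another immutable copy
seen_after = bytes(buf)  # all zero bytes once the block has exited
```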
 
Les Schaffer

Paul said:
GC simply releases the memory for other uses in the application. It
doesn't necessarily zero the memory.


release is followed by some probability of eventual overwrite; check.

Just what attack are you trying to protect against, if swap space is
less of a problem than leaving keys around in ram?

keys are on a USB drive key ring. gpg accesses the key ring as needed,
but in a separate process. and gpg is done with its work early on in our
app lifetime. comes back at end to encrypt and then app is done.

Keep in mind that the weakest part of this application is likely to be
the passphrase itself. Is there a way to get rid of it?

we got some suggestions from other parts of this thread. or do you mean
getting rid of the need for a passphrase? the passphrase protects the
private key on the USB drive.

Is this data on a laptop? Why do you want to do encryption in the
application, instead of using an encrypted file system?

i looked at EFS and TrueCrypt.

There were some questions (from a MySQL pro) about how MySQL writes would
interact with EFS. also, EFS seems to store key pairs on disk, and the
passphrase was limited to the Windows user login password.

i remember looking at TrueCrypt and deciding against it. couldn't see
how to interact with it via python script. vaguely recall being
dissatisfied with keys and passphrase.

but the main reason? we were asked to encrypt the MySQL tables carrying
sensitive information.

Is there some
obstacle to using a token (like a smart card) to hold the key?

USB drive holds the GPG key. the drive must be inserted at start of
application, and must be pulled after authentication otherwise the app
warns and shuts down. The USB drive carries a digital signature, and
also encrypted identifying information for the user.

Les Schaffer
 
Les Schaffer

Tim said:
Purely accidental -- nothing guaranteed -- details can (& do) change
across releases. Read obmalloc.c for a tour of the accidents du jour.

cool. thanks for the pointer!

Not true, so be glad to forget it.

forget what??? ;-)

A curious possibility: if you do a debug build of Python, obmalloc.c
arranges to overwrite all of an object's memory as soon as the object
is reclaimed (by any means, refcounting or otherwise). That wasn't
for "security" (faux or otherwise), it's to make it easier to detect
buggy C code referencing freed memory.

i liked the other Tim's suggestion of a secure string ;-)

Les Schaffer
 
Les Schaffer

myself, i enjoy building C extensions, but would rather skip it for this
app.

cmd = [101, 99, 104, 111, 32, 39, 104, 101, 108, 108, 111, 39]
cmd = array.array("b", cmd) # build mutable buffer
subprocess.call([buffer(cmd)], shell=True) # prints 'hello'
for i in range(len(cmd)): cmd[i] = 0 # nuke it


i'll see if we can fit this into our subprocess scheme. if so, this is
good enough for now. think we'll use this for the mysql password too.

the secret text will be visible in memory during the subprocess call,
but it won't linger around once the for-loop has finished.

good enough for current rock and roll.

(don't forget to put a try/finally clause around the critical part)

okey doky.

Les Schaffer
 
Paul Rubin

Les Schaffer said:
keys are on a USB drive key ring. gpg accesses the key ring as needed,
but in a separate process. and gpg is done with its work early on in our
app lifetime. comes back at end to encrypt and then app is done.

gpg is fairly careful about passphrases. Why are you collecting the
passphrase in the Python app instead of letting gpg handle it?

we got some suggestions from other parts of this thread. or do you mean
getting rid of the need for a passphrase? the passphrase protects the
private key on the USB drive.

Yes, I mean get rid of the need for a passphrase, though since the
encrypted key is accessible on the USB drive, there's no way around
it. With smart cards it's generally considered ok to use a short PIN
instead of a passphrase; the card itself enforces a maximum # of
incorrect guesses.

but the main reason? we were asked to encrypt the MySQL tables carrying
sensitive information.

Does using an encrypted FS not take care of that? Also, I think there
are some FS's that use the Windows Crypto API (CAPI) either for bulk
encryption or for key management, so you can use secure passphrases,
hardware tokens, or whatever.

USB drive holds the GPG key. the drive must be inserted at start of
application, and must be pulled after authentication otherwise the app
warns and shuts down. The USB drive carries a digital signature, and
also encrypted identifying information for the user.

This is better than nothing but it's very easy to duplicate a USB key,
either intentionally or by spilling the contents through a routine
backup procedure, etc. A crypto token (USB dongle or smart card) is
way preferable for this type of thing. GPG has smart card support
that you might be able to use:

http://www.g10code.com/p-card.html
http://www.gnupg.org/(en)/howtos/card-howto/en/smartcard-howto-single.html

You might want to discuss this on sci.crypt, where specialists hang
out. As is fairly typical in these situations, it would help a lot if
you could describe the application in more detail.
 
Les Schaffer

Paul said:
gpg is fairly careful about passphrases. Why are you collecting the
passphrase in the Python app instead of letting gpg handle it?

as i recall we did that because we needed the passphrase more than once,
and didnt want to have to make the users type in something that long
that many times within a minute or so.

i forget whether gpg can be given a list of files to decrypt. but cuz of
what we are doing, i still believe we would need to call gpg more than
once.

Fred Lundh's scheme for blanking the passphrase looks good enough for now.

Yes, I mean get rid of the need for a passphrase, though since the
encrypted key is accessible on the USB drive, there's no way around
it. With smart cards it's generally considered ok to use a short PIN
instead of a passphrase; the card itself enforces a maximum # of
incorrect guesses.

by any chance, do you have any experience with these USB/fingerprint
things?

i think the PI on this project will consider smart cards overkill, but
we will suggest these as alternatives before we get out into the field.

Does using an encrypted FS not take care of that?

yea but ...

we are being asked to backup the MySQL tables onto the USBKey, so they
need to be encrypted there as well. which means we need some kind of EFS
on there as well. i wouldnt want to use more than one kind of encryption
in this app, or better said, i dont want more than one set of
keys/passes in this app. so we'd need an EFS on the Windows machines and
on the USB keys that can utilize the same encryption keys.

Also, I think there
are some FS's that use the Windows Crypto API (CAPI) either for bulk
encryption or for key management, so you can use secure passphrases,
hardware tokens, or whatever.

FS's other than Microsoft's EFS? i'll take a look at their capabilities.
and there is still MySQL's comment about dealing with encrypted file
systems. we were advised by one of their people to test to make sure the
writes are not interfered with on an EFS.

This is better than nothing but it's very easy to duplicate a USB key,
either intentionally or by spilling the contents through a routine
backup procedure, etc.

but they still need the passphrase, hence keeping our eye on that silly
string.

not worried about accidents at the moment. if the USB keys had drive
serial numbers we could validate on that. but i just checked mine and it
has none. WMI reports that physical drives have a "Signature", not sure
what that is.


A crypto token (USB dongle or smart card) is
way preferable for this type of thing. GPG has smart card support
that you might be able to use:

http://www.g10code.com/p-card.html
http://www.gnupg.org/(en)/howtos/card-howto/en/smartcard-howto-single.html



would definitely consider this for Gen II.

You might want to discuss this on sci.crypt, where specialists hang
out. As is fairly typical in these situations, it would help a lot if
you could describe the application in more detail.

need to get permission from PI on the project.

thanks for the comments, they validate my concerns. if you know, or are,
a pro in python and security, we might be able to manage a small
consulting gig. but if its not python-relevant, lets talk offlist. in
any case, many thanks.

Les Schaffer
 
Paul Rubin

Les Schaffer said:
i forget whether gpg can be given a list of files to decrypt. but cuz of
what we are doing, i still believe we would need to call gpg more than
once.

Yes, gpg --batch if I remember correctly.

Fred Lundh's scheme for blanking the passphrase looks good enough for now.

Again, think hard about why you even want to blank the passphrase in
ram. If you sense that there's a reason for it, then maybe there is
one; but in that case, try to identify exactly what problem blanking
the passphrase is supposed to solve, and determine if blanking
actually does solve it. The issue I'm thinking of is swap space.
Leaking key or passphrase material there where it might be recovered
from a repurposed disk sometime in the indefinite future seems like a
more serious threat than the likelihood of some hostile process
scanning user ram while the application runs and knowing what to look
for.

by any chance, do you have any experience with these USB/fingerprint
things?

Unfortunately not; I've been interested in looking into them, though
fingerprint readers are generally not considered that great an idea
among the security crowd.

we are being asked to backup the MySQL tables onto the USBKey, so they
need to be encrypted there as well. which means we need some kind of EFS
on there as well. i wouldnt want to use more than one kind of encryption
in this app, or better said, i dont want more than one set of
keys/passes in this app. so we'd need an EFS on the Windows machines and
on the USB keys that can utilize the same encryption keys.

The idea of EFS is to keep the keys and encryption out of the app. If
you have EFS's on both the main drive and the USB key, there's no
problem with using separate keys for them. They can be controlled by
the same user PIN or password.

FS's other than Microsoft's EFS? i'll take a look at their capabilities.

Unfortunately I'm not that familiar with Windows EFS's; I'm mainly a
Unix developer. I guess I can check into this.

and there is still MySQL's comment about dealing with encrypted file
systems. we were advised by one of their people to test to make sure the
writes are not interfered with on an EFS.

Well, EFS's are supposed to be transparent, but if they say to test it
then you better test it ;-). I'd hope there'd be nothing worse than a
tolerable performance hit.

but they still need the passphrase, hence keeping our eye on that silly
string.

Right, secure passphrases (i.e. with enough entropy to protect an
encryption key) are a big usability problem--users forget them, or
write them down on the token, etc. If you're doing some early test
with just a few users who are security-conscious, maybe it's ok to
rely on passphrases, but for a wide deployment with non-technical
users, I think it's worth looking for alternatives.

thanks for the comments, they validate my concerns. if you know, or are,
a pro in python and security, we might be able to manage a small
consulting gig. but if its not python-relevant, lets talk offlist. in
any case, many thanks.

I do a lot of Python security stuff, though not much Windows stuff.
In any case, I'd be happy to talk offlist (email being sent).

Paul
 
Dennis Lee Bieber

by any chance, do you have any experience with these USB/fingerprint
things?

This is one, of three (and one of the others is from the same
company), which passed my employer's specifications.

http://www.beyondifsolutions.com/html/stealthmxp_.html

OTOH -- did anyone watch Mythbusters a few weeks ago... When they
spoofed a biometric door lock using a photocopy of a valid fingerprint
(held by a warm finger after licking the surface -- the door lock was
supposed to work by sensing: fingerprint, perspiration [conductivity],
body temperature).

This is after they'd made an 11x14 enlargement of the "stolen"
print, cleaned it up with a marker, reduced to life size, and etched
onto printed circuit board to form a mold for making a latex "skin".
Which, BTW, also fooled the lock.

They obtained the fingerprint master by leaving a CD case on the
victim's computer, then dusting it after he'd moved it.

They were also able to log into his computer using the attached
fingerprint reader.

Granted, the door may also have required a numeric passcode which --
for purposes of the test -- they'd been given in advance. I believe the
USB flash memory units are designed to "lock" if too many false attempts
are made.
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
Paul Rubin

Dennis Lee Bieber said:
This is after they'd made an 11x14 enlargement of the "stolen"
print, cleaned it up with a marker, reduced to life size, and etched
onto printed circuit board to form a mold for making a latex "skin".
Which, BTW, also fooled the lock.

I remember copying my own fingerprints with Elmer's glue when I was in
third grade or so (just put a layer of glue on your fingertip, let it
dry, and peel it off). I've always wanted to test that method on
today's electronic scanners, but it worked (at least somewhat) for
leaving visible fake prints on things. The prints would have had the
wrong oils and stuff so maybe Sherlock Holmes-level forensics could
have picked up the difference.
 
Fredrik Lundh

Dennis said:
This is after they'd made an 11x14 enlargement of the "stolen"
print, cleaned it up with a marker, reduced to life size, and etched
onto printed circuit board to form a mold for making a latex "skin".
Which, BTW, also fooled the lock.

http://www.diva-portal.org/liu/abstract.xsql?dbid=2397

Sandström, Marie: Liveness Detection in Fingerprint Recognition Systems

"Nine different systems were tested at the CeBIT trade fair in Germany and
all were deceived. Three other different systems were put up against more
extensive tests with three different subjects. All systems were circumvented
with all subjects' artificial fingerprints, but with varying results."

</F>
 
