Securely overwrite files with Python

Bart Nessux

Is there a shred module in Python? You know, the kind that overwrites
files that one doesn't want others to see? I can call the unix program
like this:

x = os.popen("/usr/bin/shred -uvz NAME_OF_FILE")
x.read()
x.close()
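
One note: os.popen reports failure only through close(), which returns the
command's exit status (None on success), so a status check is cheap:

import os

x = os.popen("/usr/bin/shred -uvz NAME_OF_FILE")
output = x.read()        # note: shred -v writes its progress to stderr,
                         # so little or nothing is captured here
status = x.close()       # exit status, or None on success
if status is not None:
    print "shred failed with status", status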

But, I'd like a platform-independent (portable) way of doing this if
possible.

Thanks,
Bart
 
Skip Montanaro

Bart> Is there a shred module in Python? You know, the kind that
Bart> overwrites files that one doesn't want others to see?

I've never used shred before, but here's an essentially untested stab at the
problem:

#!/usr/bin/env python

import os
import random
import sys
import md5

def shred(f, npasses=5):
    """Overwrite f with shuffled byte patterns npasses times, then delete it."""
    sz = os.path.getsize(f)
    for n in range(npasses):
        dig = md5.new(file(f).read()).hexdigest()
        print >> sys.stderr, "pass:", n+1,
        print >> sys.stderr, "digest:", dig
        # build a 128-byte pattern, freshly shuffled for each pass
        chars = [chr(i) for i in range(128)]
        random.shuffle(chars)
        chars = "".join(chars)
        bytesleft = sz
        fp = file(f, "wb")
        while bytesleft:
            nbytes = min(bytesleft, 128)
            fp.write(chars[:nbytes])
            bytesleft -= nbytes
        fp.close()
    dig = md5.new(file(f).read()).hexdigest()
    print >> sys.stderr, "last digest:", dig
    os.unlink(f)

if __name__ == "__main__":
    # exercise it on a throwaway copy of /etc/hosts
    tmpf = "dummyf"
    file(tmpf, "wb").write(file("/etc/hosts").read()*5)
    shred(tmpf)

Note that it does no error checking, nor does it have any sort of
force-write argument to push the bytes out to disk.
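
A minimal error-checking wrapper is easy to bolt on, though; something
like this untested sketch (safe_shred is just an illustrative name):

def safe_shred(path):
    # report failures instead of letting IOError/OSError propagate
    try:
        shred(path)
    except (IOError, OSError), e:
        print >> sys.stderr, "could not shred %s: %s" % (path, e)
        return False
    return True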

Skip
 
Mathias Waack

Bart said:
Is there a shred module in Python? You know, the kind that
overwrites files that one doesn't want others to see? I can call
the unix program like this:

x = os.popen("/usr/bin/shred -uvz NAME_OF_FILE")
x.read()
x.close()

But, I'd like a platform-independent (portable) way of doing this
if possible.

First: It is not possible. Let me cite some sentences from shred(1):

CAUTION: Note that shred relies on a very important
assumption: that the filesystem overwrites data in place.
This is the traditional way to do things, but many modern
filesystem designs do not satisfy this assumption.

But you can make recovery a bit harder by simply overwriting the file
(in fact that's just what shred does):

(just to sketch the idea;)

import os

f = file("file_to_shred", "r+b")   # "a+" would append; "r+b" overwrites in place
size = os.stat("file_to_shred").st_size
f.seek(0)
f.write("x" * size)                # first pass: all 'x' bytes
f.seek(0)
f.write("\0" * size)               # second pass: all zero bytes
f.flush()
os.fsync(f.fileno())               # ask the OS to push the bytes to disk
f.close()
os.unlink("file_to_shred")

To achieve real security you need detailed knowledge of the
underlying hardware and filesystem.

Mathias
 
Bob Ippolito

First: It is not possible. Let me cite some sentences from shred(1):

CAUTION: Note that shred relies on a very important
assumption: that the filesystem overwrites data in place.
This is the traditional way to do things, but many modern
filesystem designs do not satisfy this assumption.

Somewhat OT, Mac OS X 10.3 is one of the operating systems where this
assumption is false. Files smaller than a certain size get
automatically moved around on the disk when it makes sense to do so in
order to reduce fragmentation.

-bob
 
Skip Montanaro

Bob> Somewhat OT, Mac OS X 10.3 is one of the operating systems where
Bob> this assumption is false. Files smaller than a certain size get
Bob> automatically moved around on the disk when it makes sense to do so
Bob> in order to reduce fragmentation.

I'm not sure I understand how that can work. Suppose I have multiple (hard)
links to a small file named "small". If the OS moves it around to reduce
fragmentation (implying it will have a different inode next time it's
opened), how does it efficiently track down and change all inode references
to it? In theory it could keep a cache mapping inode numbers back to the
directories which reference them, but that could consume a fairly large
chunk of memory to maintain.

Skip
 
Peter Hansen

Mathias said:
First: It is not possible. Let me cite some sentences from shred(1):

CAUTION: Note that shred relies on a very important
assumption: that the filesystem overwrites data in place.
This is the traditional way to do things, but many modern
filesystem designs do not satisfy this assumption.

I'm fairly sure that at least some "journalling" or "logging" file
systems on Linux, such as ReiserFS or maybe ext3, violate this assumption.

-Peter
 
Bob Ippolito

Bob> Somewhat OT, Mac OS X 10.3 is one of the operating systems where
Bob> this assumption is false. Files smaller than a certain size get
Bob> automatically moved around on the disk when it makes sense to do so
Bob> in order to reduce fragmentation.

Skip> I'm not sure I understand how that can work. Suppose I have
Skip> multiple (hard) links to a small file named "small". If the OS
Skip> moves it around to reduce fragmentation (implying it will have a
Skip> different inode next time it's opened), how does it efficiently
Skip> track down and change all inode references to it? In theory it
Skip> could keep a cache mapping inode numbers back to the directories
Skip> which reference them, but that could consume a fairly large chunk
Skip> of memory to maintain.

I can't speak to what it does exactly, as I'm no HFS+ or xnu expert, but
I know it only applies to files under 20 MB on journaled HFS+ file
systems. I believe that HFS+ has a level of indirection between the
file's "inode" (HFS+ probably calls it something else) and the set of
blocks it is represented with on disk, so I don't believe that moving
the blocks around really has anything to do with hard links or creating
a new inode.

You can look for yourself if you're particularly interested; it's part
of the APSL-licensed Darwin 7.x kernel (xnu):

(should be mountable by WebDAV)
http://www.opensource.apple.com/darwinsource/10.3.2/xnu-517.3.7/bsd/hfs/

files of interest would be:
hfs_hotfiles.c
hfs_readwrite.c
hfs_vnops.c

In particular, you would be interested in the hfs_relocate function in
hfs_readwrite.c.

-bob
 
Bart Nessux

Skip said:
Bart> Is there a shred module in Python? You know, the kind that
Bart> overwrites files that one doesn't want others to see?

I've never used shred before, but here's an essentially untested stab
at the problem: [code snipped; see Skip's post above]

Thanks Skip! I'll give this a go.
 
Thomas Bellman

Skip Montanaro said:
I'm not sure I understand how that can work. Suppose I have multiple (hard)
links to a small file named "small". If the OS moves it around to reduce
fragmentation (implying it will have a different inode next time it's
opened), how does it efficiently track down and change all inode references
to it? In theory it could keep a cache mapping inode numbers back to the
directories which reference them, but that could consume a fairly large
chunk of memory to maintain.

I think you have misunderstood how Unix file systems work.

A directory is a list of directory entries, each entry consisting
of a name and an inode number. There may be several directory
entries in a file system that point to the same inode, and the
entries can be in different directories, and need not have the
same name. The names are also called "hard links". All names
for a file are equal in status -- none is worth more than any of
the others.

The inode is the central point of information for a file. It
holds information like:

- file type (regular file, directory, device file, ...)
- file permissions
- file ownership
- timestamps (data modification, inode modification, read)
- number of names (hard links) the file has
- file size
- list of data blocks for the file
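
Most of these fields are visible from Python through os.stat(), which
also makes the hard-link bookkeeping easy to watch (a small
demonstration sketch, Unix-only):

import os

# create a file, then give it a second name (hard link)
file("demo", "w").write("some data")
os.link("demo", "alias")

s1 = os.stat("demo")
s2 = os.stat("alias")
print s1.st_ino == s2.st_ino   # True: both names point at the same inode
print s1.st_nlink              # 2: the inode knows it has two names
print s1.st_size, s1.st_uid    # size and ownership live in the inode too

os.unlink("alias")             # removes one name; the data survives
os.unlink("demo")              # link count drops to 0; inode is freed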

The actual location of the inode on the storage device can
typically be calculated from the inode number, and from a small
index of inode clusters in the file system.

Finally there are the actual data blocks for the file. They are
*not* part of the inode, and they do not need to be placed near
the inode -- they can be scattered around at random places on the
storage device. The list of data blocks in the inode holds only
pointers to the data blocks.

In typical Unix file systems, like the Fast File System of BSD
ancestry (in common use in many Unices; it is called UFS in
SunOS, for example), or the 2nd and 3rd Extended File System in
Linux (ext2 and ext3), the inode only contains pointers to the
first few data blocks (10 is a common number). There is also a
pointer to a single indirect block, which in turn holds pointers
to data blocks 10-1034 (or something). And there is a pointer to
a single indirect-indirect block, containing pointers to indirect
blocks, containing pointers to actual data blocks. Depending on
the implementation, the inode may also contain a pointer to an
indirect-indirect-indirect block.
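
The numbers work out roughly like this; a back-of-the-envelope sketch
assuming ext2-like parameters (4096-byte blocks and 4-byte block
pointers, so 1024 pointers per indirect block):

block = 4096                  # bytes per data block (assumed)
ptrs = block / 4              # pointers that fit in one indirect block: 1024

direct = 10 * block           # 10 direct pointers:          40 KB
single = ptrs * block         # one single-indirect block:    4 MB
double = ptrs * ptrs * block  # one double-indirect block:    4 GB

print "max file size, roughly:", (direct + single + double) / 2**20, "MB"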

A file system that moves around files when you overwrite them,
will only move the data blocks, not the inode. The inode will
stay the same, and in the same position on the storage device.
 
Paul Rubin

Thomas Bellman said:
A file system that moves around files when you overwrite them,
will only move the data blocks, not the inode. The inode will
stay the same, and in the same position on the storage device.

If the old blocks get moved to new by copying them and then updating
the block pointers in place, it may be impossible to find the old
blocks by normal means to overwrite them. But data recovery could
still find them, so you haven't securely deleted the file by
overwriting just the new blocks.

There's really no way to securely delete info from a hard drive. The
best you can do is encrypt the data so only ciphertext is stored.
Then if you manage to securely destroy the decryption key (a much
smaller piece of data than the whole file), the file is unrecoverable.
In fact you only need enough securely erasable media to hold one key,
yet you can still maintain destroyable keys for any number N of
files, where securely erasing a file takes O(log N) operations. I
have a Usenet post with further details and a pointer to some Python
code at:

http://www.google.com/[email protected]
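
The core idea fits in a few lines (an illustrative sketch only, not
real crypto: an MD5 counter-mode keystream stands in for a proper
cipher, and "destroying" the key here just rebinds a variable):

import md5

def keystream(key, nbytes):
    # stretch the key into a pseudo-random byte stream (counter mode)
    chunks = []
    counter = 0
    while len(chunks) * 16 < nbytes:
        chunks.append(md5.new(key + str(counter)).digest())
        counter += 1
    return "".join(chunks)[:nbytes]

def crypt(data, key):
    # XOR with the keystream; the same call encrypts and decrypts
    ks = keystream(key, len(data))
    return "".join([chr(ord(a) ^ ord(b)) for a, b in zip(data, ks)])

key = open("/dev/urandom", "rb").read(16)    # Unix-only key source
ciphertext = crypt("the secret contents", key)
# store only the ciphertext; "deleting" the file means destroying the key
key = "\0" * 16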
 
Skip Montanaro

Thomas> A file system that moves around files when you overwrite them,
Thomas> will only move the data blocks, not the inode. The inode will
Thomas> stay the same, and in the same position on the storage device.

Thanks. It's been years since I looked at any file system structures. I
was indeed confusing inodes and data blocks.

Skip
 
Mathias Waack

Peter said:
I'm fairly sure that at least some "journalling" or "logging" file
systems, such as ReiserFS, or maybe ext3, on Linux, violate this
assumption.

shred(1) knows that; its manpage lists the filesystems named above,
and a lot of others, as examples.

Mathias
 
