Secure delete with Python


Boris Genc

Hi everybody.
I was wondering: is there a method or function already implemented in
Python that supports secure deletion of data?

I'm interested in something that can securely wipe data (from a
single file to a bunch of MBs), and it should run on both Linux and
Windows.

I tried Google, but I didn't find anything useful.

Thank you very much in advance.

Boris Genc
 

Roy Smith

Boris Genc said:
Hi everybody.
I was wondering: is there a method or function already implemented in
Python that supports secure deletion of data?

I'm interested in something that can securely wipe data (from a
single file to a bunch of MBs), and it should run on both Linux and
Windows.

When people talk about secure deletion of data, they generally mean
things like over-writing the physical disk blocks that used to hold the
file with random data. The details of how you do this are extremely
operating-system dependent (and probably also depend on the kind of file
system, hardware, etc.). Not to mention that the definition of "secure"
will vary with the type of data and who's doing it (i.e. what I
consider secure probably doesn't pass muster with the military).
 

Benjamin Niemann

Boris said:
Hi everybody.
I was wondering: is there a method or function already implemented in
Python that supports secure deletion of data?

I'm interested in something that can securely wipe data (from a
single file to a bunch of MBs), and it should run on both Linux and
Windows.

I tried Google, but I didn't find anything useful.

Thank you very much in advance.

Boris Genc
something like

import os

size = os.path.getsize(path)  # read the size before "wb" truncates the file
fp = open(path, "wb")
for i in range(size):
    fp.write(b"*")
fp.close()
os.unlink(path)

is probably all you can do in a portable way (multiple write phases with
different data could improve the 'security'). But a problem that cannot be
solved in a portable way is that the data might exist at other locations on the
disk (e.g. temporary file, backup, swapfile...). Unless you know *exactly* that
there *cannot* be another copy of the data, you would have to erase all unused
parts of the filesystem, too - a process that heavily depends on which
filesystem is used.
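
A minimal sketch of the multi-pass idea mentioned above, under the same
portability caveats (the function name and the pass patterns are arbitrary
illustrative choices, not any particular standard):

import os

def overwrite_passes(path, patterns=(b"\x00", b"\xff", b"\xaa")):
    # Best-effort portable overwrite: one full pass per pattern.
    size = os.path.getsize(path)   # read the size before "wb" truncates
    for pattern in patterns:
        fp = open(path, "wb")      # caveat: "wb" may not reuse the old blocks
        fp.write(pattern * size)
        fp.close()
    os.unlink(path)                # finally drop the directory entry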
 

Benjamin Niemann

Benjamin said:
something like

import os

size = os.path.getsize(path)  # read the size before "wb" truncates the file
fp = open(path, "wb")
for i in range(size):
    fp.write(b"*")
fp.close()
os.unlink(path)

and there is no guarantee that this actually overwrites the old file. The
filesystem may choose to write the new content to another location on the
disk, leaving the original data untouched.
 

Boris Genc

When people talk about secure deletion of data, they generally mean
things like over-writing the physical disk blocks that used to hold the
file with random data. The details of how you do this are extremely
operating-system dependent (and probably also depend on the kind of file
system, hardware, etc.). Not to mention that the definition of "secure"
will vary with the type of data and who's doing it (i.e. what I
consider secure probably doesn't pass muster with the military).

Yes, I was thinking about overwriting the data I want deleted with
random data. I know that things like that are OS specific. I wasn't
thinking about the Gutmann method and its 35 passes; it's more like a
simple utility, more of a "hide it from your sister" than a "hide it
from the government" type :)

Anyway, thank you guys. Benjamin, I think your method will suit me.
 

Ville Vainio

Benjamin> and there is no guarantee that this actually overwrites
Benjamin> the old file. The filesystem may choose to write the new
Benjamin> content to another location on the disk, leaving the
Benjamin> original data untouched.

Seriously? What OSen are known for doing this? I'd have thought that if
the file size is unchanged, the data is always written over the old
data...

Also, when overwriting a file, it's better to do it several times,
with alternating bit patterns and "syncing" the disk after each
pass. Of course even that is not going to guarantee anything because
it may just go to the hardware cache in the disk unit, but it's
reasonable if you are overwriting lots of data at once.

Performing these steps, you'll at least get a good false sense of
security ;-).
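
A rough sketch of the sync-after-each-pass approach Ville describes,
assuming os.fsync is available (it is on both Linux and Windows); as
noted, the drive's own hardware cache can still defeat it:

import os

def overwrite_synced(path, patterns=(b"\x00", b"\xff", b"\xaa", b"\x55")):
    # Alternating bit patterns, forcing each pass out to the device.
    size = os.path.getsize(path)
    for pattern in patterns:
        fp = open(path, "wb")
        fp.write(pattern * size)
        fp.flush()                 # flush Python's buffer to the OS
        os.fsync(fp.fileno())      # ask the OS to flush to the device
        fp.close()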
 

Dennis Lee Bieber

fp = open(path, "wb")

Opening for "w", on many systems I've used, basically creates a
new file that may or may not use the same disk region (it definitely
wouldn't on the UCSD p-System -- when I used that, all files opened for
output were created in the largest contiguous free space on the disk).

Opening the file for "r+" is probably better; since it indicates
one may wish to read from the file along with writing to it, the
original file must be available -- and I've not heard of any OS that
makes complete copies of a file during updates (I'm not counting the
behavior of editors/word processors that read the entire file into
memory and create a temporary backup copy).
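
A sketch of the "r+" suggestion: opening in update mode ("r+b" for binary
data) rewrites the existing file in place instead of creating a new one,
which at least gives the filesystem a chance to reuse the same blocks:

import os

size = os.path.getsize(path)
fp = open(path, "r+b")         # update mode: same file, no truncation
fp.seek(0)                     # start overwriting from the first byte
fp.write(b"\x00" * size)
fp.flush()
os.fsync(fp.fileno())          # push the overwrite towards the platters
fp.close()
os.unlink(path)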

 

Andrew Dalke

Ville said:
Seriously? What OSen are known for [writing new content at
another location of the disk]? I'd have thought that if
the file size is unchanged, the data is always written over the old
data...

It can even be filesystem specific. Back in the days
of WORM drives (do people still use those?) you could write
once to a place on the drive, but read it many times.
(Write Once Read Many). Changing a file meant writing a
new copy of it and writing a new index to point to the
new file, ignoring the old. That is, all copies of the
file would stay on the disk.


The VMS systems always kept an old copy of the file around
unless you explicitly deleted it. By default a directory
listing would only show the most recent copy of the file,
but you could tell it to show all the versions, which
would look like (roughly, been 15 years since I last saw VMS)
MYFILE;1
MYFILE;2
..
MYFILE;94

It was believed this feature was a deliberate ploy of
DEC to sell more hard drives. ;)


If you read a file and then wait a while, and during that time
the OS decides to defragment the drive, the location of the file
could easily be changed from underneath you.


Andrew
 

Roel Schroeven

Ville said:
Benjamin> and there is no guarantee that this actually overwrites
Benjamin> the old file. The filesystem may choose to write the new
Benjamin> content to another location on the disk, leaving the
Benjamin> original data untouched.

Seriously? What OSen are known for doing this? I'd have thought that if
the file size is unchanged, the data is always written over the old
data...

VMS, I believe, has a versioning system built into the file system. Each
time a file is saved, a new version is created while the old versions
are still there. All hearsay, though; I have never used or seen VMS
myself.
 

Benjamin Niemann

Ville said:
Benjamin> and there is no guarantee that this actually overwrites
Benjamin> the old file. The filesystem may choose to write the new
Benjamin> content to another location on the disk, leaving the
Benjamin> original data untouched.

Seriously? What OSen are known for doing this? I'd have thought that if
the file size is unchanged, the data is always written over the old
data...
I don't know if there actually is a filesystem that does this, but
there is no rule (that comes to mind right now, at least) that forbids
it. E.g. I could imagine some kind of transactional FS that doesn't
change the original file until the transaction is finished (i.e. the
file is closed), to avoid file corruption if a program crashes while
writing...

Modern filesystems do lots of things most people (including me) can't
imagine. ReiserFS, e.g., packs several small files into one block. If
such a file grows, the data is (perhaps) moved to a block of its own -
and the old data stays (unreferenced) on disk, although you never
consciously made a copy of the file...

But I'm just thinking aloud - I don't know if any of this is true.
I do expect the task of a "secure delete" to be pretty difficult, though.
 

Paul Rubin

Ville Vainio said:
Benjamin> and there is no guarantee that this actually overwrites
Benjamin> the old file. The filesystem may choose to write the new
Benjamin> content to another location on the disk, leaving the
Benjamin> original data untouched.

Seriously? What OSen are known for doing this? I'd have thought that if
the file size is unchanged, the data is always written over the old
data...

That's what log-structured file systems do, for example.

Ville also said:
Also, when overwriting a file, it's better to do it several times,
with alternating bit patterns and "syncing" the disk after each
pass. Of course even that is not going to guarantee anything because
it may just go to the hardware cache in the disk unit, but it's
reasonable if you are overwriting lots of data at once.

It may never get written to the same sector of the disk as the
original file, even if the OS has tried to overwrite those sectors.
Disk drives themselves will sometimes remap sectors from one place to
another.
 

Dennis Lee Bieber

VMS, I believe, has a versioning system built into the file system. Each
time a file is saved, a new version is created while the old versions
are still there.

The keyword is "saved"... If opened in an "update" mode, one is
working with just the original file. Things like editors, however,
typically duplicate the contents (with modifications) into a NEW file
-- incrementing the version number.

 

Matthew K Jensen

Paul Rubin said:
That's what log-structured file systems do, for example.


It may never get written to the same sector of the disk as the
original file, even if the OS has tried to overwrite those sectors.
Disk drives themselves will sometimes remap sectors from one place to
another.

I had this idea once, back when I assumed that the OS wrote to the
first free blocks nearest the beginning of the disk: just write a
whole bunch of junk files to fill the blocks where recently deleted
files used to be, then defragment the filesystem, then delete the
junk files.

I'm just thinking aloud here, in case any of this helps.
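
A hypothetical sketch of the free-space-filling idea (the function name,
file name, and chunk size are arbitrary); as the follow-up explains, there
is no guarantee which blocks actually get reused:

import os

def fill_free_space(directory, chunk=1024 * 1024):
    # Fill the filesystem holding `directory` with junk, then delete it.
    path = os.path.join(directory, "wipe.tmp")
    fp = open(path, "wb", 0)           # unbuffered, so a full disk fails fast
    try:
        while True:                    # keep writing until the disk is full
            fp.write(b"\x00" * chunk)
    except (IOError, OSError):         # "No space left on device"
        pass
    finally:
        fp.close()
        os.unlink(path)                # give the space back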
 

Paul Rubin

I had this idea once, back when I assumed that the OS wrote to the
first free blocks nearest the beginning of the disk: just write a
whole bunch of junk files to fill the blocks where recently deleted
files used to be, then defragment the filesystem, then delete the
junk files.

I'm just thinking aloud here, in case any of this helps.

If you're 1) in control of what the OS does; and 2) not concerned
about securing the data against serious recovery attempts, then ok,
there's all kinds of stuff you can do that gives reasonable protection.

In practice, 1) you're usually not in control of the OS and so you
can't assume what order blocks are written in; and 2) if you're
writing a security application for use by other people, you don't
necessarily know what kinds of opponents your users will have or what
will happen if their data escapes, so you have to guard against
powerful data recovery techniques (including as-yet-uninvented ones)
as well as casual ones.

I think you're best off assuming that short of melting the platters,
there's no way to ever erase data from a hard drive, i.e. that a
sufficiently powerful attacker can recover every state that the drive
has ever been in. The solution is to write only encrypted data to the
drive, and don't store the key on the drive.
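
A small illustration of the encrypt-everything approach, assuming the
(much later) third-party cryptography package, which is not part of this
thread (pip install cryptography); the file name is arbitrary:

from cryptography.fernet import Fernet

key = Fernet.generate_key()            # keep this key OFF the drive

# Only ciphertext ever touches the disk...
with open("data.enc", "wb") as fp:
    fp.write(Fernet(key).encrypt(b"sensitive data"))

# ...so "secure deletion" reduces to forgetting the key: without it,
# whatever blocks the drive retains are unreadable.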
 

Duncan Booth

Seriously? What OSen are known for doing this? I'd have thought that if
the file size is unchanged, the data is always written over the old
data...

I don't know for certain, but I think it is a pretty safe bet that NTFS
allocates new disc blocks instead of updating the existing ones.

NTFS is a transaction-based file system, i.e. it guarantees that any
particular disc operation either completes or doesn't; you can never get
file-system corruption due to a power loss part way through updating a
file. Transactions are written to two transaction logs (in case one is
corrupted on failure), and every few seconds the outstanding transactions
are committed. Once committed there is sufficient information in the
transaction log that even if power is lost the transaction can be
completed, and likewise any transaction that has not been committed has
sufficient information stored that it can be rolled back.

There isn't very much published information on the NTFS internals (any
useful references gratefully received), but so far as I can see writing
updates to a fresh disc block would be the only realistic way to implement
this (otherwise you would need to write the data three times: once to each
transaction log then again to the actual file). If the data is written
separately, the transaction log only needs to store the location of the
new data (so it can be wiped if the transaction is rolled back) and then
update the pointers when it is committed.

The other reason why I'm sure overwriting an existing file must allocate
new disc blocks is that NTFS supports compression on files, so if you start
off with a compressed file containing essentially random data and overwrite
it with repeated data (e.g. nulls) it will occupy less disc space.
 

Peter Otten

Paul said:
I think you're best off assuming that short of melting the platters,
there's no way to ever erase data from a hard drive, i.e. that a
sufficiently powerful attacker can recover every state that the drive
has ever been in.

The German PC magazine c't sent hard disks that had been overwritten once
with zeros to data recovery firms. No data was recovered. So unless your
opponent has secret-service connections, I'd say you are safe. He would
rather watch your screen or log your keystrokes than mess with the hard
disk - if he's not already in your WLAN, that is.

Paul also said:
The solution is to write only encrypted data to the
drive, and don't store the key on the drive.

As a special case, avoid letting the OS write the key to disk while swapping.

Peter
 

John Lenton

As a special case, avoid letting the OS write the key to disk while swapping.

or encrypt the swapfile. In fact, encrypt the disk, then partition it;
this is easily done with the device mapper in Linux 2.6...

--
John Lenton ([email protected]) -- Random fortune:
Everything that is born deserves to perish. -- Goethe

 
