Secure delete with Python

Discussion in 'Python' started by Boris Genc, Sep 6, 2004.

  1. Boris Genc

    Boris Genc Guest

    Hi everybody.
    I was wondering: is there a method or function already implemented in
    Python that supports secure deletion of data?

    I'm interested in something that can securely wipe data (anything from a
    single file to a bunch of megabytes), and it should run on both Linux and
    Windows.

    I tried Google, but I haven't found anything useful.

    Thank you very much in advance.

    Boris Genc
     
    Boris Genc, Sep 6, 2004
    #1

  2. Roy Smith

    Roy Smith Guest

    Boris Genc <boris.genc@REMOVE_mindless_ME.com> wrote:

    > Hi everybody.
    > I was wondering: is there a method or function already implemented in
    > Python that supports secure deletion of data?
    >
    > I'm interested in something that can securely wipe data (anything from a
    > single file to a bunch of megabytes), and it should run on both Linux and
    > Windows.


    When people talk about secure deletion of data, they generally mean
    things like overwriting the physical disk blocks that used to hold the
    file with random data. The details of how you do this are extremely
    operating-system dependent (and probably also depend on the kind of file
    system, hardware, etc.). Not to mention that the definition of "secure"
    will vary with the type of data and who's doing it (i.e. what I
    consider secure probably doesn't pass muster with the military).
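
    A best-effort sketch of the portable part of that idea, using only the
    standard library (the function name is made up, and nothing here can
    defeat filesystem relocation or drive-level sector remapping):

    import os

    def overwrite_once(path):
        # Replace the file's bytes with random data in place ("r+b" does
        # not truncate), push it toward the platters, then unlink it.
        size = os.path.getsize(path)
        with open(path, "r+b") as fp:
            fp.write(os.urandom(size))
            fp.flush()
            os.fsync(fp.fileno())   # flush OS caches; the drive may still buffer
        os.remove(path)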
     
    Roy Smith, Sep 6, 2004
    #2

  3. Boris Genc wrote:
    > Hi everybody.
    > I was wondering: is there a method or function already implemented in
    > Python that supports secure deletion of data?
    >
    > I'm interested in something that can securely wipe data (anything from a
    > single file to a bunch of megabytes), and it should run on both Linux and
    > Windows.
    >
    > I tried Google, but I haven't found anything useful.
    >
    > Thank you very much in advance.
    >
    > Boris Genc

    something like

    import os

    size = os.path.getsize(path)   # get the size first: opening "wb" truncates
    fp = open(path, "wb")
    for i in range(size):
        fp.write(b"*")
    fp.close()
    os.unlink(path)

    is probably all you can do in a portable way (multiple write passes with
    different data could improve the "security"). But a problem that cannot be
    solved portably is that the data might also exist at other locations on the
    disk (e.g. a temporary file, a backup, the swap file...). Unless you know
    *exactly* that there *cannot* be another copy of the data, you would have to
    erase all unused parts of the filesystem, too - a process that depends
    heavily on which filesystem is used.
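
    For what it's worth, a minimal sketch of the multi-pass variant (the
    patterns are arbitrary illustrative choices, and the same caveats apply):

    import os

    PATTERNS = [b"\x00", b"\xff", b"\x55", b"\xaa"]

    def multipass_wipe(path, patterns=PATTERNS):
        size = os.path.getsize(path)
        with open(path, "r+b") as fp:      # update in place, no truncation
            for pattern in patterns:
                fp.seek(0)
                fp.write(pattern * size)   # fine for small files; chunk large ones
                fp.flush()
                os.fsync(fp.fileno())      # try to force each pass to disk
        os.unlink(path)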
     
    Benjamin Niemann, Sep 6, 2004
    #3
  4. Benjamin Niemann wrote:

    > Boris Genc wrote:
    >
    >> Hi everybody.
    >> I was wondering: is there a method or function already implemented in
    >> Python that supports secure deletion of data?
    >>
    >> I'm interested in something that can securely wipe data (anything from a
    >> single file to a bunch of megabytes), and it should run on both Linux and
    >> Windows.
    >>
    >> I tried Google, but I haven't found anything useful.
    >>
    >> Thank you very much in advance.
    >>
    >> Boris Genc

    >
    > something like
    >
    > size = os.path.getsize(path)
    > fp = open(path, "wb")
    > for i in range(size):
    >     fp.write(b"*")
    > fp.close()
    > os.unlink(path)


    and there is no guarantee that this actually overwrites the old file. The
    filesystem may choose to write the new content at another location of the disk,
    leaving the original data untouched.
     
    Benjamin Niemann, Sep 6, 2004
    #4
  5. Boris Genc

    Boris Genc Guest

    On Mon, 06 Sep 2004 09:10:49 -0400, Roy Smith wrote:

    > When people talk about secure deletion of data, they generally mean
    > things like overwriting the physical disk blocks that used to hold the
    > file with random data. The details of how you do this are extremely
    > operating-system dependent (and probably also depend on the kind of file
    > system, hardware, etc.). Not to mention that the definition of "secure"
    > will vary with the type of data and who's doing it (i.e. what I
    > consider secure probably doesn't pass muster with the military).


    Yes, I was thinking about overwriting the data I want deleted with
    random data. I know that things like that are OS specific. I wasn't
    thinking about all those Gutmann methods and 35 passes; it's more like a
    simple utility, more of a "hide from your sister" than a "hide from the
    government" type :)

    Anyway, thank you guys. Benjamin, I think your method will suit me. Thank
    you.
     
    Boris Genc, Sep 6, 2004
    #5
  6. Paul Rubin

    Paul Rubin Guest

    Paul Rubin, Sep 6, 2004
    #6
  7. Ville Vainio

    Ville Vainio Guest

    >>>>> "Benjamin" == Benjamin Niemann <> writes:

    >> size = os.path.getsize(path)
    >> fp = open(path, "wb")
    >> for i in range(size):
    >>     fp.write(b"*")
    >> fp.close()
    >> os.unlink(path)


    Benjamin> and there is no guarantee that this actually overwrites
    Benjamin> the old file. The filesystem may choose to write the new
    Benjamin> content at another location of the disk, leaving the
    Benjamin> original data untouched.

    Seriously? What OSen are known for doing this? I'd have thought that if
    the file size is unchanged, the data is always written over the old
    data...

    Also, when overwriting a file, it's better to do it several times,
    with alternating bit patterns and "syncing" the disk after each
    pass. Of course even that is not going to guarantee anything because
    it may just go to the hardware cache in the disk unit, but it's
    reasonable if you are overwriting lots of data at once.

    Performing these steps, you'll at least get a good false sense of
    security ;-).

    --
    Ville Vainio http://tinyurl.com/2prnb
     
    Ville Vainio, Sep 6, 2004
    #7
  8. On Mon, 06 Sep 2004 15:25:51 +0200, Benjamin Niemann
    <> declaimed the following in comp.lang.python:

    >
    > fp = open(path, "wb")


    Opening for "w", on many systems I've used, basically creates a
    new file that may or may not use the same disk region (it definitely
    wouldn't on the UCSD p-System -- when I used that, all files opened for
    output were placed in the largest contiguous free space on the disk).

    Opening the file for "r+" is probably better; since it indicates
    one may wish to read from the file as well as write to it, the
    original file must remain available -- and I've not heard of any OS that
    makes complete copies of a file during updates (I'm not counting the
    behavior of editors/word-processors that read the entire file into
    memory and create a temporary backup copy).
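
    A tiny demonstration of the truncation difference (the file name is made
    up; whether "r+" really reuses the same disk blocks is still up to the OS):

    import os

    path = "demo.dat"
    with open(path, "wb") as fp:
        fp.write(b"x" * 1024)

    fp = open(path, "wb")            # "w" truncates immediately...
    print(os.path.getsize(path))     # -> 0: the old contents are already gone
    fp.close()

    with open(path, "wb") as fp:
        fp.write(b"x" * 1024)
    fp = open(path, "r+b")           # "r+" keeps the existing contents
    print(os.path.getsize(path))     # -> 1024
    fp.close()
    os.remove(path)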

    --
    > ============================================================== <
    > | Wulfraed Dennis Lee Bieber KD6MOG <
    > | Bestiaria Support Staff <
    > ============================================================== <
    > Home Page: <http://www.dm.net/~wulfraed/> <
    > Overflow Page: <http://wlfraed.home.netcom.com/> <
     
    Dennis Lee Bieber, Sep 6, 2004
    #8
  9. Andrew Dalke

    Andrew Dalke Guest

    Ville Vainio wrote:
    > Seriously? What OSen are known for [writing new content at
    > another location of the disk]? I'd have thought that if
    > the file size is unchanged, the data is always written over the old
    > data...


    It can even be filesystem specific. Back in the days
    of WORM drives (do people still use those?) you could write
    once to a place on the drive, but read it many times.
    (Write Once Read Many). Changing a file meant writing a
    new copy of it and writing a new index to point to the
    new file, ignoring the old. That is, all copies of the
    file would stay on the disk.


    The VMS systems always kept an old copy of the file around
    unless you explicitly deleted it. By default a directory
    listing would only show the most recent copy of the file,
    but you could tell it to show all the versions, which
    would look like (roughly, been 15 years since I last saw VMS)
    MYFILE;1
    MYFILE;2
    ..
    MYFILE;94

    It was believed this feature was a deliberate ploy of
    DEC to sell more hard drives. ;)


    If you read a file and then wait a while, and during that time
    the OS decides to defragment the drive, the location
    of the file could easily be changed from underneath you.


    Andrew
     
    Andrew Dalke, Sep 6, 2004
    #9
  10. Ville Vainio wrote:

    >>>>>>"Benjamin" == Benjamin Niemann <> writes:

    >
    >
    > >> size = os.path.getsize(path)
    > >> fp = open(path, "wb")
    > >> for i in range(size):
    > >>     fp.write(b"*")
    > >> fp.close()
    > >> os.unlink(path)

    >
    > Benjamin> and there is no guarantee that this actually overwrites
    > Benjamin> the old file. The filesystem may choose to write the new
    > Benjamin> content at another location of the disk, leaving the
    > Benjamin> original data untouched.
    >
    > Seriously? What OSen are known for doing this? I'd have thought that if
    > the file size is unchanged, the data is always written over the old
    > data...


    VMS, I believe, has a versioning system built into the file system. Each
    time a file is saved, a new version is created while the old versions
    are still there. All from hearsay, though; I have never used or seen VMS
    myself.

    --
    "Codito ergo sum"
    Roel Schroeven
     
    Roel Schroeven, Sep 6, 2004
    #10
  11. Ville Vainio wrote:
    >>>>>>"Benjamin" == Benjamin Niemann <> writes:

    >
    >
    > >> size = os.path.getsize(path)
    > >> fp = open(path, "wb")
    > >> for i in range(size):
    > >>     fp.write(b"*")
    > >> fp.close()
    > >> os.unlink(path)

    >
    > Benjamin> and there is no guarantee that this actually overwrites
    > Benjamin> the old file. The filesystem may choose to write the new
    > Benjamin> content at another location of the disk, leaving the
    > Benjamin> original data untouched.
    >
    > Seriously? What OSen are known for doing this? I'd have thought that if
    > the file size is unchanged, the data is always written over the old
    > data...

    I don't know if there actually is a filesystem that does this, but
    there is no rule (that comes to mind now, at least) that forbids it. E.g.
    I could imagine some kind of transactional FS that doesn't touch the
    original file until the transaction is finished (= the file is closed), to
    avoid file corruption if a program crashes while writing...

    Modern filesystems do lots of things most people (including me) can't
    imagine. ReiserFS, e.g., packs several small files into one block. If such
    a file grows, the data is (perhaps) moved to a block of its own - and the
    old data stays (unreferenced) on disk although you didn't consciously
    make a copy of the file...

    But I'm just thinking aloud - I don't know if any of this is true.
    I do expect the task of a "secure delete" to be pretty difficult, though.

    > Also, when overwriting a file, it's better to do it several times,
    > with alternating bit patterns and "syncing" the disk after each
    > pass. Of course even that is not going to guarantee anything because
    > it may just go to the hardware cache in the disk unit, but it's
    > reasonable if you are overwriting lots of data at once.
    >
    > Performing these steps, you'll at least get a good false sense of
    > security ;-).
    >
     
    Benjamin Niemann, Sep 6, 2004
    #11
  12. Paul Rubin

    Paul Rubin Guest

    Ville Vainio <> writes:
    > Benjamin> and there is no guarantee that this actually overwrites
    > Benjamin> the old file. The filesystem may choose to write the new
    > Benjamin> content at another location of the disk, leaving the
    > Benjamin> original data untouched.
    >
    > Seriously? What OSen are known for doing this? I'd have thought that if
    > the file size is unchanged, the data is always written over the old
    > data...


    That's what log-structured file systems do, for example.

    > Also, when overwriting a file, it's better to do it several times,
    > with alternating bit patterns and "syncing" the disk after each
    > pass. Of course even that is not going to guarantee anything because
    > it may just go to the hardware cache in the disk unit, but it's
    > reasonable if you are overwriting lots of data at once.


    It may never get written to the same sector of the disk as the
    original file, even if the OS has tried to overwrite those sectors.
    Disk drives themselves will sometimes remap sectors from one place to
    another.
     
    Paul Rubin, Sep 6, 2004
    #12
  13. On Mon, 06 Sep 2004 20:40:50 GMT, Roel Schroeven
    <> declaimed the following in
    comp.lang.python:

    >
    > VMS, I believe, has a versioning system built into the file system. Each
    > time a file is saved, a new version is created while the old versions


    The keyword is "saved"... If opened in an "update" mode, one is
    working with just the original file. Things like editors, however,
    typically duplicated the contents (with modifications) into a NEW file
    -- incrementing the version number.

    --
    > ============================================================== <
    > | Wulfraed Dennis Lee Bieber KD6MOG <
    > | Bestiaria Support Staff <
    > ============================================================== <
    > Home Page: <http://www.dm.net/~wulfraed/> <
    > Overflow Page: <http://wlfraed.home.netcom.com/> <
     
    Dennis Lee Bieber, Sep 7, 2004
    #13
  14. Paul Rubin <http://> wrote in message news:<>...
    > Ville Vainio <> writes:
    > > Benjamin> and there is no guarantee that this actually overwrites
    > > Benjamin> the old file. The filesystem may choose to write the new
    > > Benjamin> content at another location of the disk, leaving the
    > > Benjamin> original data untouched.
    > >
    > > Seriously? What OSen are known for doing this? I'd have thought that if
    > > the file size is unchanged, the data is always written over the old
    > > data...

    >
    > That's what log-structured file systems do, for example.
    >
    > > Also, when overwriting a file, it's better to do it several times,
    > > with alternating bit patterns and "syncing" the disk after each
    > > pass. Of course even that is not going to guarantee anything because
    > > it may just go to the hardware cache in the disk unit, but it's
    > > reasonable if you are overwriting lots of data at once.

    >
    > It may never get written to the same sector of the disk as the
    > original file, even if the OS has tried to overwrite those sectors.
    > Disk drives themselves will sometimes remap sectors from one place to
    > another.


    I had this idea once, back when I assumed that the OS wrote to the free
    blocks nearest the beginning of the disk: just write a whole bunch of crap
    files to fill in the blocks where recently deleted files used to be, then
    defrag the filesystem, then delete the crap files.

    I'm just thinking aloud here, in case any of this helps.
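
    A rough sketch of that idea (the file name and chunk size are arbitrary,
    and which freed blocks actually get reused is entirely up to the OS, so
    this is "hide from your sister" grade at best):

    import os

    def fill_free_space(directory, chunk=1024 * 1024):
        junk_path = os.path.join(directory, "junk.fill")   # hypothetical name
        try:
            with open(junk_path, "wb") as fp:
                while True:                  # write until the disk is full
                    fp.write(b"\x00" * chunk)
                    fp.flush()
                    os.fsync(fp.fileno())
        except OSError:                      # ENOSPC: the filesystem is full
            pass
        finally:
            if os.path.exists(junk_path):
                os.remove(junk_path)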
     
    Matthew K Jensen, Sep 7, 2004
    #14
  15. Paul Rubin

    Paul Rubin Guest

    (Matthew K Jensen) writes:
    > I had this idea once, back when I assumed that the OS wrote to the free
    > blocks nearest the beginning of the disk: just write a whole bunch of crap
    > files to fill in the blocks where recently deleted files used to be, then
    > defrag the filesystem, then delete the crap files.
    >
    > I'm just thinking aloud here, in case any of this helps.


    If you're 1) in control of what the OS does; and 2) not concerned
    about securing the data against serious recovery attempts, then ok,
    there's all kinds of stuff you can do that gives reasonable protection.

    In practice, 1) you're usually not in control of the OS and so you
    can't assume what order blocks are written in; and 2) if you're
    writing a security application for use by other people, you don't
    necessarily know what kinds of opponents your users will have or what
    will happen if their data escapes, so you have to guard against
    powerful data recovery techniques (including as-yet-uninvented ones)
    as well as casual ones.

    I think you're best off assuming that short of melting the platters,
    there's no way to ever erase data from a hard drive, i.e. that a
    sufficiently powerful attacker can recover every state that the drive
    has ever been in. The solution is to write only encrypted data to the
    drive, and don't store the key on the drive.
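
    A sketch of that approach with the third-party "cryptography" package
    (just one possible choice; the file name and data here are made up):

    from cryptography.fernet import Fernet   # pip install cryptography

    key = Fernet.generate_key()              # keep this OFF the drive
    f = Fernet(key)

    # Only ciphertext ever touches the disk...
    with open("secrets.bin", "wb") as fp:
        fp.write(f.encrypt(b"the actual sensitive data"))

    # ...so "secure deletion" reduces to destroying the key: without it,
    # the ciphertext (and any stray copies of it) is useless.
    with open("secrets.bin", "rb") as fp:
        plaintext = f.decrypt(fp.read())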
     
    Paul Rubin, Sep 7, 2004
    #15
  16. Duncan Booth

    Duncan Booth Guest

    Ville Vainio <> wrote in
    news::

    > Seriously? What OSen are known for doing this? I'd have thought that if
    > the file size is unchanged, the data is always written over the old
    > data...


    I don't know for certain, but I think it is a pretty safe bet that NTFS
    allocates new disc blocks instead of updating the existing ones.

    NTFS is a transaction-based file system, i.e. it guarantees that any
    particular disc operation either completes or doesn't; you can never get
    file-system corruption due to a power loss partway through updating a
    file. Transactions are written to two transaction logs (in case one is
    corrupted on failure), and every few seconds the outstanding transactions
    are committed. Once committed, there is sufficient information in the
    transaction log that even if power is lost the transaction can be
    completed; likewise, any transaction that has not been committed has
    sufficient information stored that it can be rolled back.

    There isn't very much published information on the NTFS internals (any
    useful references gratefully received), but so far as I can see, writing
    updates to a fresh disc block would be the only realistic way to implement
    this (otherwise you would need to write the data three times: once to each
    transaction log, then again to the actual file). If the data is written
    separately, then the transaction log only needs to store the location of the
    new data (so it can be wiped if the transaction is rolled back) and update
    the pointers when the transaction is committed.

    The other reason why I'm sure overwriting an existing file must allocate
    new disc blocks is that NTFS supports compression on files, so if you start
    off with a compressed file containing essentially random data and overwrite
    it with repeated data (e.g. nulls) it will occupy less disc space.
     
    Duncan Booth, Sep 7, 2004
    #16
  17. Peter Otten

    Peter Otten Guest

    Paul Rubin wrote:

    > I think you're best off assuming that short of melting the platters,
    > there's no way to ever erase data from a hard drive, i.e. that a
    > sufficiently powerful attacker can recover every state that the drive
    > has ever been in.


    The German PC magazine c't sent hard disks that had been overwritten once
    with zeros to data-recovery firms. No data was recovered. So unless your
    opponent has secret-service connections, I'd say you are safe. He will
    sooner watch your screen or log your keystrokes than mess with the hard
    disk - if he's not already in your WLAN, that is.

    > The solution is to write only encrypted data to the
    > drive, and don't store the key on the drive.


    As a special case, make sure the OS doesn't write the key to disk while swapping.
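
    One Linux-specific way to do that is to lock the process's memory with
    mlockall(2) before creating the key; a ctypes sketch (the constants are
    the Linux values from <sys/mman.h>, and it needs sufficient privileges
    or a high enough RLIMIT_MEMLOCK):

    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    MCL_CURRENT, MCL_FUTURE = 1, 2   # from <sys/mman.h> on Linux

    # Pin current and future pages in RAM so key material can't be swapped out.
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        raise OSError(ctypes.get_errno(), "mlockall failed")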

    Peter
     
    Peter Otten, Sep 7, 2004
    #17
  18. Neil Hodgson

    Neil Hodgson Guest

    For example source code and discussion for Windows see
    http://www.sysinternals.com/ntw2k/source/sdelete.shtml

    Neil Hodgson, Sep 7, 2004
    #18
  19. John Lenton

    John Lenton Guest

    On Tue, Sep 07, 2004 at 10:40:07AM +0200, Peter Otten wrote:
    >
    > > The solution is to write only encrypted data to the
    > > drive, and don't store the key on the drive.

    >
    > As a special case, make sure the OS doesn't write the key to disk while swapping.


    or encrypt the swap file. In fact, encrypt the disk, then partition it;
    this is easily done with the device mapper in Linux 2.6...

    --
    John Lenton () -- Random fortune:
    Everything that is born is worthy of dying. -- Goethe --

     
    John Lenton, Sep 7, 2004
    #19
  20. Paul Rubin

    Paul Rubin Guest

    "Neil Hodgson" <> writes:
    > For example source code and discussion for Windows see
    > http://www.sysinternals.com/ntw2k/source/sdelete.shtml


    See also the stuff at briggsoft.com. Apparently a lot of so-called
    secure deletion products on Windows don't work nearly as well as
    claimed. Kent Briggs hangs out on sci.crypt and has evaluated a lot
    of them besides marketing some of his own.
     
    Paul Rubin, Sep 7, 2004
    #20
