emptying files

Bill Cunningham

I have written this short program to empty out files. It works great
except that it truncates. My guess was that the problem was in the fopen
mode somewhere, but I have played with that and got the same results: an
empty file of zero bytes. If I have a 512-byte file of data, I want 512
bytes of '\0'. Pardon the exit(1)'s; it's shorthand on my implementation
for exit(EXIT_FAILURE). The macro is defined as 1 on my implementation.

Bill

/* se, secure erase */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        puts("se usage error");
        exit(1);
    }
    int i, j;
    FILE *fo, *fw;
    if ((fo = fopen(argv[1], "ab")) == NULL) {
        printf("%i\n", ferror(fo));
        clearerr(fo);
        fclose(fo);
        exit(1);
    }
    if ((fw = fopen(argv[1], "ab")) == NULL) {
        printf("%i\n", ferror(fw));
        clearerr(fw);
        fclose(fw);
        exit(1);
    }
    while ((i = getc(fo)) != EOF)
        putc(j = 0, fw);
    fclose(fo);
    fclose(fw);
    return 0;
}
 
vippstar

Pardon the exit(1)'s; it's shorthand on my implementation for
exit(EXIT_FAILURE). The macro is defined as 1 on my implementation.

Bill Cunningham on portability. Priceless
 
Bill Cunningham

By the way, this isn't a very secure erase. You'd need to write at least
seven different bit patterns over the entire file sequentially.

I kind of figured it wouldn't be. Thanks. I'll figure out something with
for() for that.

Bill
 
viza

Hi

I kind of figured it wouldn't be. Thanks. I'll figure out something
with for() for that.

This only works if the file system overwrites in-place. If not you are
just wasting effort.

See the documentation and source of your platform's shred(1) for pointers.

viza
 
viza

Hi

I have written this short program to empty out files. It works great
except that it truncates. My guess was that the problem was in the fopen
mode somewhere, but I have played with that and got the same results: an
empty file of zero bytes. If I have a 512-byte file of data, I want 512
bytes of '\0'. Pardon the exit(1)'s; it's shorthand on my implementation
for exit(EXIT_FAILURE). The macro is defined as 1 on my implementation.

if ((fo = fopen(argv[1], "ab")) == NULL) { ...
if ((fw = fopen(argv[1], "ab")) == NULL) { ...

while ((i = getc(fo)) != EOF)
putc(j = 0, fw);

Opening the file twice is a bad idea.

"a" is append. You want "r+b", and then fseek() to the end and then
ftell() to get the length and then rewind() and fwrite() in large-ish
blocks for some semblance of efficiency. Also, see my other reply.
 
Bill Cunningham

This only works if the file system overwrites in-place. If not you are
just wasting effort.
[snip]

What do you mean? If the filesystem is mounted in rw mode? I guess if it's
mounted ro it would do no good.

Bill
 
viza

This only works if the file system overwrites in-place. If not you are
just wasting effort.
[snip]

What do you mean? If the filesystem is mounted in rw mode? I guess if
it's mounted ro it would do no good.

If you open a file and write over its contents, sometimes the operating
system writes the new data to the same blocks where the old data was and
does not change which blocks make up the file. In other systems it first
writes the data to a new block, then changes the file to refer to the new
block. The old block is then released back to be reused, without ever
being overwritten whether you try to overwrite it or not.

Someone with low-level access to the disk can look for recently released
blocks and get access to the data that a naive shredding program like
this has left behind, in just the same way as if you had deleted the file
and recreated it.

HTH
viza
 
Bill Cunningham

If you open a file and write over its contents, sometimes the operating
system writes the new data to the same blocks where the old data was and
does not change which blocks make up the file. In other systems it first
writes the data to a new block, then changes the file to refer to the new
block. The old block is then released back to be reused, without ever
being overwritten whether you try to overwrite it or not.

Someone with low-level access to the disk can look for recently released
blocks and get access to the data that a naive shredding program like
this has left behind, in just the same way as if you had deleted the file
and recreated it.

Sounds like WFP. I hate not being allowed to erase my system-critical
files. At least they say they are.
 
Keith Thompson

Bill Cunningham said:
Bill Cunningham on portability. Priceless

I only plan to use this on my system. Or else believe me it would be
exit(EXIT_FAILURE);

Why on Earth don't you just write "exit(EXIT_FAILURE);" in the first
place? You've wasted far more time trying to justify your
non-portable code than it would take to make it more portable.

If there were some system-specific advantage to using the less
portable code, then that would be fine -- but by your own admission,
exit(1) does exactly the same thing as exit(EXIT_FAILURE) on your
platform.
 
Bill Cunningham

See the documentation and source of your platform's shred(1) for pointers.

This shred is pretty cool. But it doesn't create an empty file. Only
secure erase which is what I want.

Bill
 
viza

Sounds like WFP. I hate not being allowed to erase my system-critical
files. At least they say they are.

No, it is a good idea. Suppose the write operation was interrupted by a
power failure. If you write new blocks and then switch over the pointer,
then you can ensure that you always have one good copy of the file,
either before or after the write. If you try to modify the blocks in
place you are much more likely to get junk.

It does mean however that you need a filesystem aware shred program and
appropriate administrator permissions to run it.
 
Keith Thompson

pete said:
If you want to create a file with the same name and size as an original,
but which reads as though it is full of null bytes, then getc reading a
byte at a time would be a way to count how many bytes you need to write.

Unless the implementation pads the end of binary files with zero
bytes, which the standard specifically allows.
 
Richard Tobin

If you want to create a file with the same name and size as an original,
but which reads as though it is full of null bytes, then getc reading a
byte at a time would be a way to count how many bytes you need to write.
Unless the implementation pads the end of binary files with zero
bytes, which the standard specifically allows.

Surely the point of that is for implementations that only record the
number of blocks in a file, and mark the "real" end of text files with
a character such as ^Z. In such an implementation, you'd just
unnecessarily write a few extra bytes, and there would be no change in
the length of the file - the unpadded length exists only in the mind
of the user.

-- Richard
 
Keith Thompson

Unless the implementation pads the end of binary files with zero
bytes, which the standard specifically allows.

Surely the point of that is for implementations that only record the
number of blocks in a file, and mark the "real" end of text files with
a character such as ^Z. In such an implementation, you'd just
unnecessarily write a few extra bytes, and there would be no change in
the length of the file - the unpadded length exists only in the mind
of the user.

It's for binary files, not text files. If you write, say, 10 bytes to
a file in binary mode, then read it back, you can get back the 10
bytes you wrote plus, say, another 502 null bytes. Since ^Z (ASCII
character 26, assuming an ASCII-based implementation) is a valid
character in a binary file, it can't be used as an end-of-file marker.
Reading from a text file in such an implementation will give you EOF
when it hits the logical end-of-file marker (^Z or whatever), and you
won't see any appended null bytes unless you read the file in binary
mode.

The standard's wording is (C99 7.19.2p3):

A binary stream is an ordered sequence of characters that can
transparently record internal data. Data read in from a binary
stream shall compare equal to the data that were earlier written
out to that stream, under the same implementation. Such a stream
may, however, have an implementation-defined number of null
characters appended to the end of the stream.

But yes, you make a good point. If I write 10 bytes to a file and end
up with a file that's indistinguishable from a file to which I wrote
512 bytes, the last 502 of which are null bytes, then the actual size
of the file is 512 bytes. (If I care, I can always encode the logical
file size in the file's data.)
 
Richard Tobin

Surely the point of that is for implementations that only record the
number of blocks in a file, and mark the "real" end of text files with
a character such as ^Z. In such an implementation, you'd just
unnecessarily write a few extra bytes, and there would be no change in
the length of the file - the unpadded length exists only in the mind
of the user.
It's for binary files, not text files.

I must have expressed myself badly. I meant to say that these systems
have a way of giving a precise length to text files - using a marker
character - but for binary they only express length to block
granularity.

-- Richard
 
Chad

[email protected] (Richard Tobin) said:
It's for binary files, not text files. If you write, say, 10 bytes to
a file in binary mode, then read it back, you can get back the 10
bytes you wrote plus, say, another 502 null bytes. Since ^Z (ASCII
character 26, assuming an ASCII-based implementation) is a valid
character in a binary file, it can't be used as an end-of-file marker.
Reading from a text file in such an implementation will give you EOF
when it hits the logical end-of-file marker (^Z or whatever), and you
won't see any appended null bytes unless you read the file in binary
mode.

The standard's wording is (C99 7.19.2p3):

A binary stream is an ordered sequence of characters that can
transparently record internal data. Data read in from a binary
stream shall compare equal to the data that were earlier written
out to that stream, under the same implementation. Such a stream
may, however, have an implementation-defined number of null
characters appended to the end of the stream.

But yes, you make a good point. If I write 10 bytes to a file and end
up with a file that's indistinguishable from a file to which I wrote
512 bytes, the last 502 of which are null bytes, then the actual size
of the file is 512 bytes. (If I care, I can always encode the logical
file size in the file's data.)

How do you encode the logical file size in the file's data?


Chad
 
Keith Thompson

Chad said:
How do you encode the logical file size in the file's data?

Any way you like, depending on the file format.

For example, you might write the file's logical size in the first,
say, 8 bytes of the file -- as long as all writers and readers agree
on the format (including how the size is encoded in those 8 bytes).

I *think* that most image file formats, for example, include this kind
of information, though not typically at the very beginning of the file
(which is usually a marker indicating what kind of file it is).
 
CBFalconer

Chad said:
[snip]

How do you encode the logical file size in the file's data?

For example, you can write "size=10;" in the first 8 bytes of the
file, followed by the 10 binary bytes.
 
Bill Cunningham

Off-topic, I know, but how well do any erase algorithms do on flash
file devices?

A flash file system, like ffs, deliberately incorporates
wear-leveling, and so most likely will actually write those seven
files full of different patterns to seven distinct locations in the
flash, none of them actually overwriting the original.

And even if the file system has hooks to bypass this for erasing, many
of the flash devices, like compact flash, SD, and others, have the
wear leveling built into the controller micro in the device, and it
can't be bypassed.

Very good question. Even if OT I would wonder the same thing.

Bill
 
