Telling an empty binary file from a "full" one

Michel Rouzic

I have a binary file used to store the values of variables in order to
use them again. I easily know whether the file exists or not, but the
problem is, in case the program has been earlier interrupted before it
could write the variables to the file, the file is gonna be empty, and
then it's gonna load a load of crap into the variables, which I want to
avoid.

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and I'd like to be able to test whether it is 36 bytes
long or not, but it seems to be quite a problem to do that in a
portable way.

I thought that using fseek and ftell could work if the end of file
could be found, but I read that "Setting the file position indicator to
end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior
for a binary stream (because of possible trailing null characters)".

My file has lots of zero bytes in it, so I guess it means it can't tell
the end of file reliably, right? I'd just like to know how I can, in a
reliable and portable way, tell the size of my binary file, and if not,
tell whether my file is empty or not.
 
Walter Roberson

I have a binary file used to store the values of variables in order to
use them again. I easily know whether the file exists or not, but the
problem is, in case the program has been earlier interrupted before it
could write the variables to the file, the file is gonna be empty, and
then it's gonna load a load of crap into the variables, which I want to
avoid.

In that case, your load routine is programmed without due regard
to the circumstances.

Each fread() or fgetc() or fscanf() that you perform returns a
status. If there is any serious chance that the file might not
be of the proper size, you should be testing those return statuses,
and taking appropriate steps if you do not get enough data.

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer)

Probably the easiest portable way to tell if the file is the right
size would be to attempt to fread() 37 bytes, and see whether you were
handed fewer bytes (file truncated), 36 bytes (right size), or 37 bytes
(file is too long).
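
A minimal sketch of that "read one byte more than you expect" test follows. The names classify_size and EXPECTED_SIZE are made up for illustration, and the literal 36 is only the size assumed in the question; later replies suggest using sizeof on the real type instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical payload size taken from the question. */
#define EXPECTED_SIZE 36u

/* Tries to read EXPECTED_SIZE + 1 bytes from fp.
   Returns -1 if fewer than EXPECTED_SIZE bytes were available (truncated),
   0 if exactly EXPECTED_SIZE bytes were available (right size),
   1 if at least one extra byte was available (too long).
   On a return of 0 or 1 the first EXPECTED_SIZE bytes are copied to out. */
int classify_size(FILE *fp, unsigned char out[EXPECTED_SIZE])
{
    unsigned char buf[EXPECTED_SIZE + 1];
    size_t got = fread(buf, 1, sizeof buf, fp);  /* item size 1, so got is a byte count */

    if (got < EXPECTED_SIZE)
        return -1;
    memcpy(out, buf, EXPECTED_SIZE);
    return got == EXPECTED_SIZE ? 0 : 1;
}
```

Because the item size passed to fread is 1, its return value is a byte count, which is what makes the three-way comparison possible.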

I would, though, make the point that you have emphasized portability
for the test, but the size of double-precision floats is not certain
to be 8 bytes, and integers are not certain to be 4 bytes.
It also appears that you might not have left room for any flags
to indicate representation format and to indicate which "endian"
the data is in. Portably stitching together a double from a binary
number is no fun -- fixed point or printable text or XDR are easier
to deal with in that regard.
 
Christopher Benson-Manica

Michel Rouzic said:
That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.
 
Kenneth Brody

Christopher said:
If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

s/36 bytes/4*sizeof(double)/

In my mind, there's a world of difference between "portable code" and
"code that generates portable data files". (One could also argue that
writing the file as text isn't 100% portable, as ASCII files won't read
correctly on an EBCDIC system.)

In any case, the "real" answer here is to check the return values from
the fread() calls to make sure the data was there to be read.

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
 
Anonymous 7843

My file has lots of zero bytes in it, so I guess it means it can't tell
the end of file reliably, right? I'd just like to know how I can, in a
reliable and portable way, tell the size of my binary file, and if not,
tell whether my file is empty or not.

I'm curious as to what existing OS's do not accurately
report the lengths of binary files. Does anyone
have any examples?

The only situation like this that I've encountered is
that systems with unix-like device mapping can often
be coerced into opening a raw disk partition. Since
there is no file system present, fseek(SEEK_END) will
often position to the end of the partition whether
you have written meaningful data or not.
 
Gordon Burditt

My file has lots of zero bytes in it, so I guess it means it can't tell
I'm curious as to what existing OS's do not accurately
report the lengths of binary files. Does anyone
have any examples?

Under CP/M, the length of a file is a multiple of 128 bytes (the
size of a single-density floppy disk sector). A text file used ^Z
as an end-of-file marker, which was used when the file was opened
in text mode, to give a more fine-grained end-of-file. No such
marker could be used in binary mode, as ^Z is a legitimate binary
value that the file could contain. The file size was a number of
disk sectors.

C implementations (and there were some, although many of them left
out stuff like floating point) had to deal with the imprecise size
of binary files. A few used highly non-standard implementation
decisions (like int = 8 bits, on a z80 or 8080 processor). Others
were much closer to Standard C, but rather cramped as the machine
generally had only a 64k address space total (although some had
memory paging setups).

The only situation like this that I've encountered is
that systems with unix-like device mapping can often
be coerced into opening a raw disk partition. Since
there is no file system present, fseek(SEEK_END) will
often position to the end of the partition whether
you have written meaningful data or not.

That's another example.

Gordon L. Burditt
 
Skarmander

Kenneth said:
s/36 bytes/4*sizeof(double)/

In my mind, there's a world of difference between "portable code" and
"code that generates portable data files". (One could also argue that
writing the file as text isn't 100% portable, as ASCII files won't read
correctly on an EBCDIC system.)

And by extension of that argument, "100% portable data" does not exist.
There is only data that is read more and less easily by various
languages on various platforms. But the intent is probably to literally
convey the sense of effort one has to expend to hoist it from one end to
another. ASCII-encoded integers are portable. Doubles encoded by
someone's C implementation are too heavy.

If your program is writing files, it's doing so because it needs to
communicate something across process boundaries. By some Rule or Law
someone no doubt coined, the mere potential to do things inspires the
desire to have them done. Therefore, it's wise to accommodate as broad a
range of processes as you can afford.

Writing binary data in the native format of your C implementation is
probably the narrowest range possible, and only justifiable by laziness.
It may be justifiable laziness, of course, but it's still laziness. Know
that it only works if the process on the other side of the boundary is
a program compiled by the exact same C implementation, running on the
exact same platform. Even upgrading your C library is taking chances --
very mild ones, but you should nevertheless be aware of them.

What am I saying? Oh yes, right. What the other posters said. Use a text
file. It's really not much more involved and saves you ever so much
potential grief.

S.
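
For someone who has not used text files before, a minimal sketch of the text approach might look like the following. The struct and function names are invented for illustration. The "%.17g" format is chosen on the assumption that double is an IEEE 754 binary64, where 17 significant digits are enough to read back the same value.

```c
#include <stdio.h>

/* Hypothetical settings: four doubles and one int, as in the question. */
struct settings {
    double v[4];
    int n;
};

/* Writes the settings as one line of text.
   Returns 0 on success, -1 if fprintf reported an error. */
int save_text(FILE *fp, const struct settings *s)
{
    int rc = fprintf(fp, "%.17g %.17g %.17g %.17g %d\n",
                     s->v[0], s->v[1], s->v[2], s->v[3], s->n);
    return rc < 0 ? -1 : 0;
}

/* Reads the settings back. fscanf returns the number of items it
   converted (or EOF), so anything other than 5 means the file is
   empty, truncated or damaged; *s is left untouched in that case. */
int load_text(FILE *fp, struct settings *s)
{
    struct settings tmp;
    int got = fscanf(fp, "%lf %lf %lf %lf %d",
                     &tmp.v[0], &tmp.v[1], &tmp.v[2], &tmp.v[3], &tmp.n);
    if (got != 5)
        return -1;
    *s = tmp;
    return 0;
}
```

An empty file makes fscanf return EOF, so the original "is the file empty?" question is answered by the same return-value check.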
 
Keith Thompson

Kenneth Brody said:
s/36 bytes/4*sizeof(double)/

s/4*sizeof(double)/4*sizeof(double)+sizeof(int)/

(assuming that "one integer" means "one int").

The most sensible approach *if* you don't care about portability of
the file is probably to declare a struct type

    struct foo {
        double a;
        double b;
        double c;
        double d;
        int e;
    };

and use fread/fwrite to read and write values of type struct foo
directly to the file (in binary mode, of course). Never refer to
"36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
type struct foo.

The code should be portable to other platforms, but the data file will
not be; it will only be usable on the system where it was created.
That's likely to be good enough. (If it isn't, use some portable
external representation of the data; plain text is a good choice.)

And, of course, choose more descriptive names than a, b, c, d, e, and
foo.
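
As a rough sketch of the struct approach (with invented names config, save_config and load_config), the whole object is written and read as a single item, so fwrite and fread each return 1 on success and 0 when the data is missing or short:

```c
#include <stdio.h>

/* Hypothetical stand-in for "struct foo" with the same members. */
struct config {
    double a, b, c, d;
    int e;
};

/* Writes one struct config to a stream opened in binary mode.
   Returns 0 on success, -1 if the item could not be written. */
int save_config(FILE *fp, const struct config *cfg)
{
    return fwrite(cfg, sizeof *cfg, 1, fp) == 1 ? 0 : -1;
}

/* Reads one struct config from a stream opened in binary mode.
   fread returns the number of complete items read, so an empty or
   truncated file yields 0 here and *cfg is left untouched. */
int load_config(FILE *fp, struct config *cfg)
{
    struct config tmp;
    if (fread(&tmp, sizeof tmp, 1, fp) != 1)
        return -1;
    *cfg = tmp;
    return 0;
}
```

The streams would be opened with fopen(name, "wb") and fopen(name, "rb") respectively; no literal 36 appears anywhere.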
 
Michel Rouzic

Skarmander said:
Writing binary data in the native format of your C implementation is
probably the narrowest range possible, and only justifiable by laziness.
It may be justifiable laziness, of course, but it's still laziness. Know
that it only works if the process on the other side of the boundary is
a program compiled by the exact same C implementation, running on the
exact same platform.

No, I don't have this problem. The reason is that it's a
configuration file: the program writes what's in memory to a file in
order to use it later. So it works on both big-endian and little-endian
machines, and indeed it can cope with absolutely any way of writing
double-precision floats, since it only reads what it wrote.

And then, it's not laziness but rather ignorance: I have never dealt
with text files yet (I'm only on my second C program).
 
Michel Rouzic

Keith said:
The most sensible approach *if* you don't care about portability of
the file is probably to declare a struct type

    struct foo {
        double a;
        double b;
        double c;
        double d;
        int e;
    };

and use fread/fwrite to read and write values of type struct foo
directly to the file (in binary mode, of course). Never refer to
"36"; always use "sizeof(struct foo)" or "sizeof obj" where obj is of
type struct foo.

The code should be portable to other platforms, but the data file will
not be; it will only be usable on the system where it was created.
That's likely to be good enough. (If it isn't, use some portable
external representation of the data; plain text is a good choice.)

And, of course, choose more descriptive names than a, b, c, d, e, and
foo.

Thanks, but that's rather off topic. I just want to test the size of
the file. I already know how to write my data to it, and even how to
read it; my problem is that I don't want to read the file if it's empty.
 
Michel Rouzic

Walter said:
In that case, your load routine is programmed without due regard
to the circumstances.

Each fread() or fgetc() or fscanf() that you perform returns a
status. If there is any serious chance that the file might not
be of the proper size, you should be testing those return statuses,
and taking appropriate steps if you do not get enough data.


Probably the easiest portable way to tell if the file is the right
size would be to attempt to fread() 37 bytes, and see whether you were
handed fewer bytes (file truncated), 36 bytes (right size), or 37 bytes
(file is too long).

I would, though, make the point that you have emphasized portability
for the test, but the size of double-precision floats is not certain
to be 8 bytes, and integers are not certain to be 4 bytes.
It also appears that you might not have left room for any flags
to indicate representation format and to indicate which "endian"
the data is in. Portably stitching together a double from a binary
number is no fun -- fixed point or printable text or XDR are easier
to deal with in that regard.

That's getting helpful, but I don't really know how to deal with what
fread returns (indeed I have never dealt with size_t before, nor
included stddef.h).

Anyway, my file has no risk of being over the right size, only
under, so I guess I should try to read
(4*sizeof(double)+sizeof(int32_t)) bytes and see what it returns (once
I've figured out what to do with what fread returns).

By the way, right now that file is empty, uneditable and undeletable,
and `ls -l` in cygwin tells me "ls: freq.cfg: No such file or
directory". Is it because I killed the process before it fclosed the
file?
 
Michel Rouzic

Christopher said:
If you're assuming it will always be 36 bytes in size, you've already
left the realms of strict portability. Is there any particular reason
you're unable to simply store what sounds like a small amount of data
as text? It would make many things easier, in any case.

Yeah, I know, I should rather say that its size is
(4*sizeof(double)+sizeof(int32_t)) bytes.

The reason why I don't "simply store a small amount of data as text" is
that I'm a beginner, and until now I have only dealt with binary files,
so I wouldn't know how to read and write a text file. I guess that's
what I should try then, even though I wanted to avoid having to do
that.
 
Michel Rouzic

Kenneth said:
s/36 bytes/4*sizeof(double)/

In my mind, there's a world of difference between "portable code" and
"code that generates portable data files". (One could also argue that
writing the file as text isn't 100% portable, as ASCII files won't read
correctly on an EBCDIC system.)

In any case, the "real" answer here is to check the return values from
the fread() calls to make sure the data was there to be read.

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+

The file is not meant to be portable. It's created in the first place
by the program and only used by the program. But I'll try to act like
it has to be portable then...
 
Keith Thompson

Michel Rouzic said:
thx, but it's quite off topic. I just want to test what's the size of
the file like, I already know how to write my data to it, or even read
it, my problem, is that I don't want to read a file if it's empty.

Sure, but the best way to do this is to attempt to read it (using
fread() with appropriate arguments) and detect whether it succeeded.
If you use "sizeof(struct foo)" rather than magic numbers like 36, the
code that does this will be easier to read and more maintainable.
 
Michel Rouzic

Keith said:
Sure, but the best way to do this is to attempt to read it (using
fread() with appropriate arguments) and detect whether it succeeded.
If you use "sizeof(struct foo)" rather than magic numbers like 36, the
code that does this will be easier to read and more maintainable.

OK, I managed to do it. And indeed, it looks like there is a way to
really determine the size of the file: do an fread of one char in a loop,
stop the loop when fread returns 0, and take the size of the file from
the iteration at which the loop stops.

Isn't that a reliable and relatively portable way of telling the size of
a file?
 
Jack Klein

I have a binary file used to store the values of variables in order to
use them again. I easily know whether the file exists or not, but the
problem is, in case the program has been earlier interrupted before it
could write the variables to the file, the file is gonna be empty, and
then it's gonna load a load of crap into the variables, which I want to
avoid.

That file is always 36 bytes big (it contains 4 double-precision floats
and one integer) and i'd like to be able to test whether it is 36 bytes
long or not, but it seems like quite a big problem to get to do it in a
portable way.

I thought that using fseek and ftell could work if the end of file
could be told but i read that "Setting the file position indicator to
end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior
for a binary stream (because of possible trailing null characters)"

My file has lots of zero bytes in it, so I guess it means it can't tell
the end of file reliably, right? I'd just like to know how I can, in a
reliable and portable way, tell the size of my binary file, and if not,
tell whether my file is empty or not.

You are only addressing one part of the issue. If you have a reason
to verify that your data is truly valid, then make sure it is truly
valid. That means verifying more than just the size, but the actual
contents, by means of a checksum or some such similar mechanism.

Consider:

struct my_data
{
double v1;
double v2;
double v3;
double v4;
int i1;
};

struct my_storage
{
struct my_data data;
unsigned long checksum;
};

When your structure is filled with data and you are ready to write it
to file, pass its address to a function that will compute a checksum
on the inner structure and return it. Assign it to the checksum
member of the outer structure and store it with fwrite() into a binary
file.

To read it back, use fread() on the binary file to read it into a
my_storage structure. Call the checksum function again and compare
the value it returns to the checksum value in the outer structure.

If you read the correct number of bytes from the file, you verify that
the file has no more bytes by getc() returning EOF, and the checksum
matches, then you may have a high degree of confidence that you have
recovered valid data.
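
The three checks above can be sketched roughly as follows. The checksum shown is a simple multiply-and-add over the bytes of each member, picked only for illustration (the post does not name an algorithm), and checksum_data, store_checked and load_checked are hypothetical names. Members are summed one by one so that padding bytes inside the struct never enter the checksum.

```c
#include <stdio.h>
#include <string.h>

struct my_data { double v1, v2, v3, v4; int i1; };
struct my_storage { struct my_data data; unsigned long checksum; };

/* Folds n bytes into a running 32-bit sum. */
static unsigned long add_bytes(unsigned long sum, const void *p, size_t n)
{
    const unsigned char *b = p;
    while (n--)
        sum = (sum * 31UL + *b++) & 0xFFFFFFFFUL;
    return sum;
}

/* Illustrative checksum over the members only; seed is non-zero so
   an all-zero record does not produce a checksum of zero. */
unsigned long checksum_data(const struct my_data *d)
{
    unsigned long sum = 1;
    sum = add_bytes(sum, &d->v1, sizeof d->v1);
    sum = add_bytes(sum, &d->v2, sizeof d->v2);
    sum = add_bytes(sum, &d->v3, sizeof d->v3);
    sum = add_bytes(sum, &d->v4, sizeof d->v4);
    sum = add_bytes(sum, &d->i1, sizeof d->i1);
    return sum;
}

/* Writes data plus checksum as one item. Returns 0 on success. */
int store_checked(FILE *fp, const struct my_data *d)
{
    struct my_storage st;
    memset(&st, 0, sizeof st);
    st.data = *d;
    st.checksum = checksum_data(&st.data);
    return fwrite(&st, sizeof st, 1, fp) == 1 ? 0 : -1;
}

/* Returns 0 only if the record is complete, nothing follows it,
   and the stored checksum matches the recomputed one. */
int load_checked(FILE *fp, struct my_data *d)
{
    struct my_storage st;
    if (fread(&st, sizeof st, 1, fp) != 1)
        return -1;                      /* too short */
    if (getc(fp) != EOF)
        return -1;                      /* too long */
    if (checksum_data(&st.data) != st.checksum)
        return -1;                      /* contents damaged */
    *d = st.data;
    return 0;
}
```

A real program might prefer a stronger check such as a CRC; the structure of the test stays the same.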
 
Keith Thompson

Please snip signatures when you quote.

OK, I managed to do it. And indeed, it looks like there is a way to
really determine the size of the file: do an fread of one char in a loop,
stop the loop when fread returns 0, and take the size of the file from
the iteration at which the loop stops.

Isn't it a reliable and relatively portable way of telling the size of
a file?

Yeah, that's one way to do it. In your case, though, you probably
don't need to care exactly how big the file is, just whether it's big
enough.

Given the type "struct foo" above, you could just do this:

    FILE *config_file = fopen("filename", "rb");
    /* Insert error checking here */
    struct foo buffer;
    /* fread returns the number of complete items read, not bytes */
    size_t items_read = fread(&buffer, sizeof buffer, 1, config_file);
    if (items_read == 1) {
        /* ok */
    }
    else {
        /* read failed */
    }

This is untested code.
 
Michel Rouzic

Keith said:
Please snip signatures when you quote.


Yeah, that's one way to do it. In your case, though, you probably
don't need to care exactly how big the file is, just whether it's big
enough.

Given the type "struct foo" above, you could just do this:

    FILE *config_file = fopen("filename", "rb");
    /* Insert error checking here */
    struct foo buffer;
    /* fread returns the number of complete items read, not bytes */
    size_t items_read = fread(&buffer, sizeof buffer, 1, config_file);
    if (items_read == 1) {
        /* ok */
    }
    else {
        /* read failed */
    }

This is untested code.

Yeah, that's right, and I didn't actually use it to determine the size,
but to see whether the last one-char fread returns 0 or 1. I was just
saying it could be used to tell the size too.
 
Lawrence Kirby

On Mon, 19 Sep 2005 18:56:28 -0700, Michel Rouzic wrote:

....
OK, I managed to do it. And indeed, looks like a way to really
determine the size of the file, by doing a fread of one char in loop,
stopping the loop when the output of fread is 0, and telling the size
of the file by the iteration at which the loop stops.

It would be simpler to use getc().

Isn't it a reliable and relatively portable way of telling the size of
a file?

Depends what you mean by "size of a file". It will tell you how much data
you can read from a file, which is a reasonable definition. It won't
necessarily tell you how much data was written to the file in the first
place. Others have mentioned systems where the "file size" stored by the
system is a number of blocks and not a byte count.

Lawrence
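
A small sketch of the getc() version, with the hypothetical name count_bytes. As noted above, what it measures is how many bytes can be read from the stream, which on some systems may be more than was originally written.

```c
#include <stdio.h>

/* Counts the bytes that can be read from fp, starting at the current
   position. Returns the count, or -1 if a read error occurred. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;  /* int rather than char, so EOF stays distinct from any byte value */

    while ((c = getc(fp)) != EOF)
        n++;
    return ferror(fp) ? -1L : n;
}
```

Zero bytes in the file are counted like any other byte, because getc() reports end of file through the separate EOF value rather than through the data.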
 
