accessing binary data

Chris Roth

I have written a program that uses pre-calculated data that is currently
in a binary file. The program needs to access about 1 Mb of data in the
binary file that is scattered across the 500 Mb file.

Should the program read piecewise from the file to get all the data it
needs, or load the entire contents into memory and then read the bits it
needs?

Maybe more importantly, is the binary file technique the best one to use
given the circumstances or is there a better technique out there?
 
Victor Bazarov

Chris said:
I have written a program that uses pre-calculated data that is
currently in a binary file. The program needs to access about 1 Mb
of data in the binary file that is scattered across the 500 Mb file.

Should the program read piecewise from the file to get all the data it
needs, or load the entire contents into memory and then read the bits
it needs?

Read piecewise sounds better.

Maybe more importantly, is the binary file technique the best one to
use given the circumstances or is there a better technique out there?

Not enough information. Also, why did you post your general programming
question (or perhaps a system-specific one) to a language newsgroup?

V
 
Jim Langston

Chris Roth said:
I have written a program that uses pre-calculated data that is currently in
a binary file. The program needs to access about 1 Mb of data in the
binary file that is scattered across the 500 Mb file.

Should the program read piecewise from the file to get all the data it
needs, or load the entire contents into memory and then read the bits it
needs?

Maybe more importantly, is the binary file technique the best one to use
given the circumstances or is there a better technique out there?

It seems it would be much faster to read it piecewise, as long as you know
where the information is located in the binary file so you can seek to
it.
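
A minimal sketch of that seek-and-read approach in standard C++, assuming the offsets and sizes of the wanted pieces are already known. The Chunk struct and the read_chunks function are hypothetical names made up for illustration; nothing here comes from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical description of one wanted piece of the file:
// where it starts and how many bytes it spans.
struct Chunk {
    std::streamoff offset;
    std::size_t    size;
};

// Read each chunk by seeking to its offset. Returns the bytes of all
// chunks concatenated in the order given, or an empty vector on failure.
std::vector<char> read_chunks(const std::string& path,
                              const std::vector<Chunk>& chunks)
{
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    std::vector<char> out;
    if (!in) return out;

    for (const Chunk& c : chunks) {
        assert(c.offset >= 0);  // offsets are measured from the start of the file
        const std::size_t old_size = out.size();
        out.resize(old_size + c.size);
        in.seekg(c.offset, std::ios::beg);
        in.read(out.data() + old_size, static_cast<std::streamsize>(c.size));
        if (!in) {              // seek failed or the read ran past end of file
            out.clear();
            return out;
        }
    }
    return out;
}
```

A call such as read_chunks("data.bin", {Chunk{4096, 512}, Chunk{1048576, 256}}) would pull only those two pieces. Whether this beats loading the whole file depends on how many pieces there are and how the OS caches the file, so it is worth measuring.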
 
Chris Roth

Victor said:
Read piecewise sounds better.


Not enough information. Also, why did you post your general programming
question (or perhaps a system-specific one) to a language newsgroup?

V

I am programming in C++. I thought there might be some clever tricks
that are C++ specific.
 
Victor Bazarov

Chris said:
I am programming in C++. I thought there might be some clever tricks
that are C++ specific.

I/O is often system-specific. What standard C++ provides is the lowest
common denominator across all platforms that offer I/O. If
there are any "clever tricks", the best place to ask is the newsgroup
for your OS.

V
 
James Kanze

Chris Roth said:
I have written a program that uses pre-calculated data that is currently
in a binary file. The program needs to access about 1 Mb of data in the
binary file that is scattered across the 500 Mb file.

Should the program read piecewise from the file to get all the data it
needs, or load the entire contents into memory and then read the bits it
needs?

Yes.

Which is just a way of saying: it depends. The general rule
would be to write the data as simply formatted text, and parse
it. If it's 500 Mb binary, however, that's likely to be a
little slow. And you can't seek to an arbitrary position in a
text file. A binary format might help; it could be faster, and
depending on the format, you may or may not be able to
effectively use seek to only read the relevant parts.

If the data has no historical value (i.e. you don't have to save
it---it's only used for communicating between these two
programs), and you can ensure that the two programs are running
on the same machine, and have been compiled with the same
compiler (and version), using the same options, then you can
consider using a binary dump of the memory. In that case, the
"best" solution is probably implementation specific: mmap under
Unix, CreateFileMapping under Windows.
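
For the Unix case, the mmap route might look roughly like the sketch below. It assumes a POSIX system, and mapped_read is a hypothetical helper name. The Windows equivalent (CreateFileMapping plus MapViewOfFile) is not shown. With a mapping, the OS typically pages in only the parts of the file that are actually touched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Copy `size` bytes starting at `offset` out of a read-only mapping of the
// whole file. Returns an empty vector on any failure or out-of-range request.
std::vector<char> mapped_read(const std::string& path,
                              std::size_t offset, std::size_t size)
{
    assert(size > 0);  // an empty result is reserved for signalling failure

    std::vector<char> out;
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return out;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return out;
    }
    const std::size_t file_size = static_cast<std::size_t>(st.st_size);

    void* base = mmap(nullptr, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return out;

    // Only copy when the requested range lies entirely inside the file.
    if (offset <= file_size && size <= file_size - offset) {
        const char* p = static_cast<const char*>(base) + offset;
        out.assign(p, p + size);
    }
    munmap(base, file_size);
    return out;
}
```

A real program would more likely keep the mapping open for its whole run and read through the pointer directly, rather than mapping and unmapping on every call as this sketch does.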

Chris Roth said:
Maybe more importantly, is the binary file technique the best
one to use given the circumstances or is there a better
technique out there?

It depends a lot on how long the data have to persist. If
there's even the slightest risk that you'll have to read them
with a future version of your program, or even a recompiled
version, then you need to define a format, and use it.
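
As one illustration of what defining a format can mean for a binary file: fix the width and byte order of every field and decode byte by byte, instead of dumping structs as the compiler happens to lay them out. The header layout below (a version and a record count, both 32-bit little-endian) is invented for the example, as are all the names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a 32-bit unsigned integer stored as 4 bytes, least significant first.
// The result does not depend on the host's byte order or struct padding.
std::uint32_t load_u32_le(const unsigned char* p)
{
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Encode a 32-bit unsigned integer as 4 bytes, least significant first.
void store_u32_le(std::uint32_t v, unsigned char* p)
{
    assert(p != nullptr);
    p[0] = static_cast<unsigned char>(v & 0xFFu);
    p[1] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    p[2] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    p[3] = static_cast<unsigned char>((v >> 24) & 0xFFu);
}

// Hypothetical file header: bytes 0-3 hold the format version,
// bytes 4-7 hold the number of records that follow.
struct Header {
    std::uint32_t version;
    std::uint32_t record_count;
};
constexpr std::size_t kHeaderBytes = 8;

// Parse the header from the first kHeaderBytes bytes of the file.
Header parse_header(const unsigned char* bytes)
{
    Header h;
    h.version      = load_u32_le(bytes);
    h.record_count = load_u32_le(bytes + 4);
    return h;
}
```

Carrying a version field in the header is one common way to let a future version of the program recognise older files.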

The format may be binary: binary formats are a lot harder to
debug, but generally end up with smaller files and faster
formatting and parsing. Although the difference isn't always as
much as one might think. Note too that it's possible to read
and write a file containing text in binary mode, to allow
seeking. If you want to go that route, you'll probably want to
ensure that all "records" have a fixed length. (If there are
different record types, consider storing them in separate
files.)
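
A small sketch of the fixed-length record idea, assuming text records padded with spaces to 16 bytes. The record size and the names kRecordSize, write_record and read_record are made up for the example. Because every record has the same length, record n starts at byte n * kRecordSize and can be reached with a single seek.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

constexpr std::size_t kRecordSize = 16;  // hypothetical fixed record length

// Append one record, padded with spaces to exactly kRecordSize bytes.
// The stream should have been opened in binary mode.
void write_record(std::ostream& out, const std::string& text)
{
    assert(text.size() <= kRecordSize);  // longer text would break the layout
    std::string rec = text;
    rec.resize(kRecordSize, ' ');
    out.write(rec.data(), static_cast<std::streamsize>(rec.size()));
}

// Read record number `index` (0-based), padding included.
// Returns an empty string if the record does not exist.
std::string read_record(std::istream& in, std::size_t index)
{
    std::string rec(kRecordSize, '\0');
    in.clear();  // drop any error state left over from an earlier failed read
    in.seekg(static_cast<std::streamoff>(index * kRecordSize), std::ios::beg);
    in.read(&rec[0], static_cast<std::streamsize>(kRecordSize));
    if (!in) rec.clear();
    return rec;
}
```

Opening both streams with std::ios::binary matters here: in text mode some platforms translate line endings, which would make the byte offsets unreliable.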
 
