Storing/processing binary file input help needed

Arnold

I need to read a binary file and store it into a buffer in memory (system
has large amount of RAM, 2GB+) then pass it to a function. The function
accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
words to it at a time. So I would pass them in chunks of 512 words until the
whole file has been processed. I haven't worked with binary files before so
I'm confused with how to store the binary file into memory. What sort of
array do I use? Does C allow char only? Can I declare a DWORD buffer since
that's what the function is taking as input? Or do I need to know the format
of the original data that binary file is encoding and store it in that?
That's the part that is really confusing me.

I believe I'll need to use fread to copy the file to that array. I plan on
getting the size of the file, then determining how many DWORDs are present in it
(for example 9000), and using that as my number-of-objects parameter in fread. So
in this case:

fread(buffer, 4, 9000, fp); //each DWORD is 4 bytes, 900 DWORDs in my binary file

Is that right?

Once I get the file into the buffer, I can then do a loop where I pass 512
elements of the array to a function until all 9000 elements are processed. I
hope that's right. Any other tips on improving speed and efficiency would be
appreciated. Thanks.
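
The whole-file approach described in the question could be sketched as follows. It assumes that `uint32_t` from `<stdint.h>` is an acceptable stand-in for the Windows `DWORD` type (a 32-bit unsigned integer), that the words in the file are already in the host's byte order, and that `process_chunk` is a hypothetical placeholder for the real function. Instead of asking for the file size up front, it grows the buffer with `realloc` until `fread` comes up short.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_WORDS 512 /* most words the processing function accepts per call */

/* Hypothetical stand-in for the real processing function: it only sums
   the words so that the example produces something observable. */
static uint64_t total;

static void process_chunk(const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        total += words[i];
}

/* Reads every complete 32-bit word from fp into a malloc'd array, growing
   it with realloc so the file size does not have to be known in advance.
   Stores the word count in *out_count and returns the array (the caller
   frees it), or NULL on allocation failure or read error. Trailing bytes
   that do not fill a whole word are ignored. */
static uint32_t *read_all_words(FILE *fp, size_t *out_count)
{
    size_t cap = 4096, n = 0;
    uint32_t *buf = malloc(cap * sizeof *buf);
    if (buf == NULL)
        return NULL;
    for (;;) {
        if (n == cap) {
            uint32_t *bigger = realloc(buf, 2 * cap * sizeof *buf);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        size_t want = cap - n;
        size_t got = fread(buf + n, sizeof *buf, want, fp);
        n += got;
        if (got < want) /* end of file or read error */
            break;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_count = n;
    return buf;
}

/* Hands the array to process_chunk in pieces of at most CHUNK_WORDS words;
   only the final piece can be shorter. */
static void process_all(const uint32_t *words, size_t count)
{
    for (size_t off = 0; off < count; off += CHUNK_WORDS) {
        size_t left = count - off;
        process_chunk(words + off, left < CHUNK_WORDS ? left : CHUNK_WORDS);
    }
}
```

Because the element size passed to `fread` is `sizeof *buf` (4 bytes here), its return value is a number of words, not bytes, which is what the 512-word loop needs.
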
 
Martijn Lievaart

> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.

As an alternative to the mmap solution from Glanni, the easiest way to do
this would be to read 512 words, process them, write back the result, and repeat
until end-of-file. There is no need to read the whole file into memory.

You can write back results in place, if they should occupy the same
storage, or to some other file. If the data has to be replaced, it is
often best to write the output to a new file, then move the new file over
the old file. That way you will not corrupt the original file if your
program crashes halfway through.
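
The read-process-write loop, with the output going to a new file that replaces the old one only after everything succeeded, might look like this. `transform_chunk` is a hypothetical placeholder (it simply inverts the bits), and the three paths are parameters chosen for the example. ISO C leaves it implementation-defined what `rename` does when the destination already exists: POSIX systems replace it, while Microsoft's C runtime reports an error, so on Windows the old file would have to be removed first (or a Win32 call such as `MoveFileEx` used).

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_WORDS 512

/* Hypothetical transformation; replace with the real processing step. */
static void transform_chunk(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        words[i] = ~words[i];
}

/* Reads in_path CHUNK_WORDS words at a time, transforms each chunk and
   writes it to tmp_path, then renames tmp_path to out_path. Returns 0 on
   success, -1 on failure; out_path is not touched unless all writes worked. */
static int process_file(const char *in_path, const char *tmp_path,
                        const char *out_path)
{
    uint32_t buf[CHUNK_WORDS];
    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(tmp_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int ok = 1;
    size_t got;
    while ((got = fread(buf, sizeof buf[0], CHUNK_WORDS, in)) > 0) {
        transform_chunk(buf, got);
        if (fwrite(buf, sizeof buf[0], got, out) != got) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0) /* flushing can fail too, e.g. disk full */
        ok = 0;
    if (!ok) {
        remove(tmp_path);
        return -1;
    }
    /* Behaviour when out_path already exists is implementation-defined. */
    return rename(tmp_path, out_path) == 0 ? 0 : -1;
}
```

Only one 2 KB buffer is ever in memory, whatever the size of the input file.
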

HTH,
M4
 
sathyashrayan

I am not a C wizard but I have some suggestions.
> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed.

By "words", do you mean runs of chars delimited by an ASCII space? Or is each
"word" 512 bytes in size?
> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input? Or do I need to know the format
> of the original data that binary file is encoding and store it in that?
> That's the part that is really confusing me.

By "binary file" and "file format", are you talking about the signature in
the first two bytes of a file (for example, MZ at the start of a DOS .exe) or
the format of the data in the file (fields and records separated by some kind
of delimiter)? If it is the second, then it is more a question of the file's
record design.

> I believe I'll need to use fread to copy the file to that array. I plan on
> getting the size of the file, then determining how many DWORDs are present in it
> (for example 9000), and using that as my number-of-objects parameter in fread. So
> in this case:
>
> fread(buffer, 4, 9000, fp); //each DWORD is 4 bytes, 900 DWORDs in my binary file
>
> Is that right?


Is it always exactly 512 elements, or is the count unknown until run time? If
it is unknown, isn't this the time to use a linked list rather than an array?


> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.

Optimizing in C is not a matter of "instruction management" the way it is in
asm.
 
Arnold

Martijn Lievaart said:
> As an alternative to the mmap solution from Glanni, the easiest way to do
> this would be to read 512 words, process them, write back the result, and repeat
> until end-of-file. There is no need to read the whole file into memory.

I thought of that but speed is a concern so I want to keep the number of
disk accesses at a minimum.

> You can write back results in place, if they should occupy the same
> storage, or to some other file. If the data has to be replaced, it is
> often best to write the output to a new file, then move the new file over
> the old file. That way you will not corrupt the original file if your
> program crashes halfway through.

In my case, I don't have to write any data back to the original file. Thanks
for the suggestions.
 
Arnold

sathyashrayan said:
> I am not a C wizard but I have some suggestions.


> By "words", do you mean runs of chars delimited by an ASCII space? Or is each
> "word" 512 bytes in size?

Each word is a DWORD, so each one is 32 bits. I can pass a maximum of 512
DWORDs at a time to the function.

> By "binary file" and "file format", are you talking about the signature in
> the first two bytes of a file (for example, MZ at the start of a DOS .exe) or
> the format of the data in the file (fields and records separated by some kind
> of delimiter)? If it is the second, then it is more a question of the file's
> record design.

It is the second.

> Is it always exactly 512 elements, or is the count unknown until run time? If
> it is unknown, isn't this the time to use a linked list rather than an array?

512 is the maximum the function can handle at a time, so that is fixed,
except for the last iteration, since the file won't necessarily contain a
multiple of 512 DWORDs.
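
The bounds for the shorter last iteration come down to a ceiling division. For the 9000-DWORD example in the thread this gives 18 calls: 17 full chunks (17 * 512 = 8704 words) and a final chunk of 296 words. The helper names below are hypothetical.

```c
#include <stddef.h>

#define CHUNK_WORDS 512 /* per-call limit stated in the thread */

/* Number of calls needed for `total` words (ceiling of total / CHUNK_WORDS). */
static size_t chunk_count(size_t total)
{
    return (total + CHUNK_WORDS - 1) / CHUNK_WORDS;
}

/* Size of the chunk that starts at word offset `off` (off < total):
   CHUNK_WORDS for every chunk except possibly the last one. */
static size_t chunk_size(size_t total, size_t off)
{
    size_t left = total - off;
    return left < CHUNK_WORDS ? left : CHUNK_WORDS;
}
```

A plain array is enough here: the per-call limit is fixed, and only the length of the final chunk depends on the file.
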
 
Sean Kenwrick

Arnold said:
> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed. I haven't worked with binary files before so
> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input? Or do I need to know the format
> of the original data that binary file is encoding and store it in that?
> That's the part that is really confusing me.
>
> I believe I'll need to use fread to copy the file to that array. I plan on
> getting the size of the file, then determining how many DWORDs are present in it
> (for example 9000), and using that as my number-of-objects parameter in fread. So
> in this case:
>
> fread(buffer, 4, 9000, fp); //each DWORD is 4 bytes, 900 DWORDs in my binary file
>
> Is that right?

You don't need to read the whole file; you can read 512 bytes at a time into
a buffer of appropriate size:

char buffer[512];
x = fread(buffer, 1, 512, fp); // don't forget to check the value of x,
                               // which is the number of bytes actually read
...

You can then pass a pointer to this buffer to your function, which has been
prototyped to accept an array of DWORD and the number of elements to process
(which will be x/4 from the fread above), e.g.

int process_buf(DWORD *my_array, int number_of_elements);

Then your function can iterate across this array as follows:

int process_buf(DWORD *my_array, int no_elements)
{
    int i;
    DWORD next_val;
    for (i = 0; i < no_elements; i++) {
        next_val = my_array[i]; // You might need to convert from
                                // big-endian to little-endian here (see below)
        /* ... process next_val ... */
    }
    return 0;
}
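
A variant of this idea declares the buffer as 32-bit words instead of `char`, so `fread` reports whole words read (no `x/4` step) and no `char *` to `DWORD *` conversion is needed; such a conversion can yield a misaligned pointer, because a `char` array has no particular alignment. This sketch assumes `uint32_t` as the `DWORD` stand-in and native byte order; the XOR inside `process_buf` is only a placeholder so the result can be checked.

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_WORDS 512

/* Placeholder for the real function: takes an array of 32-bit words and
   how many of them are valid, and returns the XOR of those words. */
static uint32_t process_buf(const uint32_t *my_array, size_t no_elements)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < no_elements; i++)
        acc ^= my_array[i]; /* index each element, not the array itself */
    return acc;
}

/* Reads fp in chunks of up to CHUNK_WORDS words and feeds each chunk to
   process_buf. Since the buffer holds uint32_t, fread's return value is
   already an element count. Returns the XOR of all per-chunk results. */
static uint32_t process_stream(FILE *fp)
{
    uint32_t buf[CHUNK_WORDS];
    uint32_t acc = 0;
    size_t got;
    while ((got = fread(buf, sizeof buf[0], CHUNK_WORDS, fp)) > 0)
        acc ^= process_buf(buf, got);
    return acc;
}
```

`fread` only counts complete elements, so if the file length is not a multiple of 4 its last 1 to 3 bytes are silently left unprocessed.
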


Of course this makes an assumption that the data in the file is stored in
the same byte order as the processor you are running your program on (most
likely you are using an Intel Pentium so Little-Endian is the byte order you
are assuming). If the file uses another byte order, then you can write
(or google for) a macro that will do the conversion for you.
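
A byte-swapping helper of the kind mentioned above could be as small as this; it is written as a function rather than a macro so that its argument is evaluated only once. Whether it is needed at all depends on the file's byte order, which the thread does not state. GCC and Clang also provide `__builtin_bswap32`, and MSVC provides `_byteswap_ulong`, which usually compile to a single instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverses the byte order of one 32-bit value (big-endian <-> little-endian). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Converts a whole buffer in place, e.g. just before passing it on. */
static void swap32_buf(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        words[i] = swap32(words[i]);
}
```

For example, `swap32(0x11223344u)` yields `0x44332211u`.
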

Hope this helps
Sean
 
Barry Schwarz

> I need to read a binary file and store it into a buffer in memory (system
> has large amount of RAM, 2GB+) then pass it to a function. The function
> accepts input as 32 bit unsigned longs (DWORD). I can pass a max of 512
> words to it at a time. So I would pass them in chunks of 512 words until the
> whole file has been processed. I haven't worked with binary files before so
> I'm confused with how to store the binary file into memory. What sort of
> array do I use? Does C allow char only? Can I declare a DWORD buffer since
> that's what the function is taking as input? Or do I need to know the format
> of the original data that binary file is encoding and store it in that?
> That's the part that is really confusing me.

The I/O function (fread as you suggest below) does not care how you
define the buffer. However, how you use the buffer may make a
difference. If you define the buffer as unsigned char, then you are
guaranteed that all possible 256 values are acceptable (unsigned char
cannot have trap values) and the buffer will be portable (at least for
systems which have CHAR_BIT defined as 8). If you define the buffer
as DWORD, are you sure that all 4 billion plus possible values that
could come from a binary file are acceptable and your program will
never execute on a machine with a different sizeof(unsigned long)?
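
Following this suggestion, the data can be read into an `unsigned char` buffer and the 32-bit values assembled explicitly, which also makes the result independent of the host's byte order. This sketch assumes `CHAR_BIT` is 8 and that the file stores each word least significant byte first; that layout is an assumption, not something the thread states, and a big-endian file would need the shifts reversed.

```c
#include <stddef.h>
#include <stdint.h>

/* Builds a 32-bit value from four bytes stored least significant first.
   It uses shifts on values rather than reinterpreting memory, so it gives
   the same result on any host. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decodes nbytes / 4 words from a raw byte buffer into out[]; leftover
   bytes (nbytes % 4) are not decoded. Returns the number of words written. */
static size_t decode_le32(const unsigned char *bytes, size_t nbytes,
                          uint32_t *out)
{
    size_t n = nbytes / 4;
    for (size_t i = 0; i < n; i++)
        out[i] = load_le32(bytes + 4 * i);
    return n;
}
```

Using `uint32_t` rather than `unsigned long` also avoids depending on `sizeof(unsigned long)`, although `uint32_t` is itself only available where the platform has an exact 32-bit type.
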

> I believe I'll need to use fread to copy the file to that array. I plan on
> getting the size of the file, then determining how many DWORDs are present in it
> (for example 9000), and using that as my number-of-objects parameter in fread. So
> in this case:

There is no portable way to get the file size (unless you read the
entire file) so you probably need to use a system specific extension
or function for this.
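
The usual, but not guaranteed, approach for a file opened in binary mode is to seek to the end and call `ftell`. ISO C does not require `SEEK_END` to be meaningful for binary streams, and `ftell` returns a `long`, which is 32 bits on Windows, so files of 2 GB or more need system-specific calls such as POSIX `fstat` (`st_size`) or Win32 `GetFileSizeEx`. The function name `stream_size` is hypothetical.

```c
#include <stdio.h>

/* Returns the size in bytes of an open binary stream, or -1 on failure.
   Relies on fseek(SEEK_END) working for binary streams, which ISO C does
   not guarantee, and on the size fitting in a long. */
static long stream_size(FILE *fp)
{
    long here = ftell(fp);
    if (here < 0 || fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    if (fseek(fp, here, SEEK_SET) != 0) /* restore the original position */
        return -1;
    return size;
}
```

Dividing the result by 4 gives the DWORD count for the question's `fread` call; a 36000-byte file would hold 9000 DWORDs.
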

> fread(buffer, 4, 9000, fp); //each DWORD is 4 bytes, 900 DWORDs in my binary file

You meant 9000.

> Is that right?

> Once I get the file into the buffer, I can then do a loop where I pass 512
> elements of the array to a function until all 9000 elements are processed. I
> hope that's right. Any other tips on improving speed and efficiency would be
> appreciated. Thanks.

How you pass a quantity of array elements will determine the
suitability of your design. (Actually, the method of passing the
argument(s) should drive the design.) What is the prototype for the
receiving function?

The odds of the file containing an exact multiple of 512 DWORDs are
about 1 in 500, so you may want to be able to handle the last set as a
smaller quantity.



<<Remove the del for email>>
 
Martijn Lievaart

> I thought of that but speed is a concern so I want to keep the number of
> disk accesses at a minimum.

Memory mapping the file is probably still the best way, but it suffers from a
size limit. To get around this, you can also read the file in large chunks.
Instead of 512 words, read a few hundred KB at a time and operate on
that. Experiment with buffer sizes to see what gives the best result.
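
The large-chunk idea can be sketched as one big `fread` per 256 KB, followed by 512-word calls out of that buffer. The 256 KB figure (`BIG_WORDS`) is an arbitrary starting point to tune by measurement, as suggested above, and `process_chunk` is again a hypothetical placeholder that only sums the words.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_WORDS 512               /* per-call limit of the function */
#define BIG_WORDS   (64u * 1024u)     /* 64K words = 256 KB per fread; tune this */

static uint64_t checksum; /* hypothetical result, so there is something to check */

static void process_chunk(const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        checksum += words[i];
}

/* One large fread per BIG_WORDS words, then CHUNK_WORDS-sized calls out of
   the in-memory buffer. Returns 0 on success, -1 on allocation or read error. */
static int process_large_reads(FILE *fp)
{
    uint32_t *big = malloc(BIG_WORDS * sizeof *big); /* heap: too big for some stacks */
    if (big == NULL)
        return -1;
    size_t got;
    while ((got = fread(big, sizeof *big, BIG_WORDS, fp)) > 0) {
        for (size_t off = 0; off < got; off += CHUNK_WORDS) {
            size_t left = got - off;
            process_chunk(big + off, left < CHUNK_WORDS ? left : CHUNK_WORDS);
        }
    }
    int err = ferror(fp);
    free(big);
    return err ? -1 : 0;
}
```

A related option is to keep the 512-word `fread` calls and instead enlarge the stream's own buffer with `setvbuf(fp, NULL, _IOFBF, size)` before the first read.
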

I'm not sure what will be faster. Large buffers reduce the number of
system calls slightly (good), but decrease locality of reference (bad).
The mmap solution does not suffer either of these disadvantages I think.
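
Glanni's mmap post is not part of this excerpt, so the following is only a generic POSIX sketch of the idea: map the whole file read-only, then walk it in 512-word pieces. It assumes a POSIX system (`mmap`, `fstat`, `fileno`); on Windows the counterparts are `CreateFileMapping` and `MapViewOfFile`. In a 32-bit process the mapping must fit into the address space, which is presumably the size limit referred to above. `process_chunk` is a hypothetical placeholder.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() under strict ISO C modes */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define CHUNK_WORDS 512

static uint64_t mapped_sum; /* hypothetical result, so there is something to check */

static void process_chunk(const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        mapped_sum += words[i];
}

/* Maps the file behind fp read-only and hands it to process_chunk in pieces
   of at most CHUNK_WORDS words. Returns 0 on success, -1 on failure.
   Trailing bytes that do not form a whole word are skipped. */
static int process_mapped(FILE *fp)
{
    int fd = fileno(fp);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0)
        return -1;
    size_t len = (size_t)st.st_size;
    if (len == 0)
        return 0; /* mmap rejects a zero length; there is nothing to do */
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    const uint32_t *words = p; /* mappings start on a page boundary */
    size_t count = len / sizeof *words;
    for (size_t off = 0; off < count; off += CHUNK_WORDS) {
        size_t left = count - off;
        process_chunk(words + off, left < CHUNK_WORDS ? left : CHUNK_WORDS);
    }
    munmap(p, len);
    return 0;
}
```

No explicit read calls appear at all; the operating system pages the file in as the loop touches it.
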

Note that the number of disk accesses will be the same whatever solution
you choose. You have to read the whole file, period. I guess the main speed
factors are the number of system calls and how effectively you use your
memory. Also, you should try to do some useful work while waiting for the
disk, maybe asynchronous I/O or multithreading can be of help?

(If you look into multithreading, be sure you know which synchronisation
mechanisms are lightweight and which are heavyweight; the difference is huge.)

I would just try a simple solution. If it isn't fast enough, try others.
Profile to see where your program spends its time. If most of the time is
spent on calculations, all of the above will give only very marginal
speedups. If run on a fast machine, maybe a naive implementation will be
fast enough for your needs. Remember the old truism about optimizing:
Don't (until you have proven you need it).

HTH,
M4
 
