Binary File I/O


Mr. Mxyztplk

(supposing file fp has been opened in binary mode, and m = n/p)

Just wondering if this approach

for (i = 0; i < p; i++) {
    for (j = 0; j < q; j++) {
        fseek(fp, i, SEEK_SET);              /* start of "column" i */
        for (k = 0; k < m; k++) {
            fread(&ch, sizeof(char), 1, fp);
            fseek(fp, p - 1, SEEK_CUR);      /* skip to the next byte of this column */
            /* do whatever */
        }
    }
}

actually does run more efficiently (less moving back and forth between
locations??) than the following

for (i = 0; i < p; i++) {
    for (j = 0; j < q; j++) {
        for (k = 0; k < n - i + 1; k += p) {
            fseek(fp, k + i, SEEK_SET);
            fread(&ch, sizeof(char), 1, fp);
            /* do whatever */
        }
    }
}

?

For my purposes it has not made any difference one way or the other. So
far, that is.
 

Richard Bos

Mr. Mxyztplk said:
actually does run more efficiently (less moving back and forth between
locations??) than the following

Measure, measure, measure. When it comes to efficiency, don't just
theorise, always measure.

Mr. Mxyztplk said:
For my purposes it has not made any difference one way or the other. So
far, that is.

So, no, then. You _did_ measure and the measurement came up blank.

Richard
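
(Editorial sketch, not part of the original post.) A measurement along these lines could look roughly like the following. The function names sum_relative, sum_absolute and seconds_for are hypothetical, the j loop from the question is left out, and both variants assume the file holds at least p * m bytes. Wall-clock time is taken with C11 timespec_get; results will depend heavily on whether the file is already in the OS cache.

```c
#include <stdio.h>
#include <time.h>

/* Variant A: one absolute seek per column, then relative seeks of p - 1. */
static unsigned long sum_relative(FILE *fp, long p, long m)
{
    unsigned long sum = 0;
    for (long i = 0; i < p; i++) {
        if (fseek(fp, i, SEEK_SET) != 0)
            return sum;
        for (long k = 0; k < m; k++) {
            int ch = fgetc(fp);
            if (ch == EOF)
                break;
            sum += (unsigned char)ch;
            if (fseek(fp, p - 1, SEEK_CUR) != 0)
                break;
        }
    }
    return sum;
}

/* Variant B: an absolute seek before every single read. */
static unsigned long sum_absolute(FILE *fp, long p, long m)
{
    unsigned long sum = 0;
    for (long i = 0; i < p; i++) {
        for (long k = 0; k < m; k++) {
            if (fseek(fp, i + k * p, SEEK_SET) != 0)
                return sum;
            int ch = fgetc(fp);
            if (ch == EOF)
                break;
            sum += (unsigned char)ch;
        }
    }
    return sum;
}

/* Wall-clock seconds taken by one call of fn; the checksum goes to *out. */
static double seconds_for(unsigned long (*fn)(FILE *, long, long),
                          FILE *fp, long p, long m, unsigned long *out)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    *out = fn(fp, p, m);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling seconds_for once with each variant on the same open file, and checking that both checksums agree, gives a first comparison. Repeating the runs in both orders helps to separate cache effects from the access pattern itself.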
 

BGB

Richard Bos said:
Measure, measure, measure. When it comes to efficiency, don't just
theorise, always measure.

So, no, then. You _did_ measure and the measurement came up blank.

yeah.


it likely won't make much of a difference on a typical modern OS, as
a "seek" doesn't really do a whole lot in the first place.

usually, the seek calls essentially just update a value holding the
current offset, and the underlying read/write calls are where all the
magic happens.

typically, then, things are done at the level of disk-blocks:
copying data from disk-blocks held in buffers into the applications'
buffers, or copying application memory into the buffered disk-block and
marking the block dirty.

as needed, disk-blocks are pulled in to memory, or written back to disk.
if this is needed for a given read/write operation to complete, the
current-process is blocked, and set to resume once the block in question
becomes available (in the meantime, the OS goes off and does whatever else).


except in extreme cases, access patterns to file data won't really make
all that much of a difference.
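
(Editorial sketch, not part of the original post.) One way to see that the offset is just a number handed to the read call: on POSIX systems, pread takes the offset as an argument, so no seek call is needed at all. This assumes a POSIX environment and a raw file descriptor rather than a FILE *; the helper name sum_column_pread is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to m bytes at offsets i, i + p, i + 2p, ... and add them to *sum.
 * Returns the number of bytes actually read (fewer than m at end of file
 * or on error). The file position of fd is never changed.
 */
static long sum_column_pread(int fd, off_t i, off_t p, long m,
                             unsigned long *sum)
{
    long count = 0;
    for (long k = 0; k < m; k++) {
        unsigned char ch;
        ssize_t got = pread(fd, &ch, 1, i + (off_t)k * p);
        if (got != 1)
            break;
        *sum += ch;
        count++;
    }
    return count;
}
```

Each call here is still one system call per byte, so this illustrates the point about offsets rather than being a fast way to do strided reads.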
 

Mr. Mxyztplk

BGB said:
yeah. [...]
typically, then, things are done at the level of disk-blocks:
copying data from disk-blocks held in buffers into the applications'
buffers, or copying application memory into the buffered disk-block
and marking the block dirty. [...]
as needed, disk-blocks are pulled in to memory, or written back to
disk. if this is needed for a given read/write operation to complete,
the current-process is blocked, and set to resume once the block in
question becomes available (in the meantime, the OS goes off and does
whatever else).
[...]
yeah

This is basically the answer I was expecting. Nowadays you have
something like pages of memory, sitting in some buffer, that stand in
for the locations on disk (my vague way of describing it), so it's not
like the days when you were really moving back and forth along a reel
of tape.
 

Jorgen Grahn

Richard Bos said:
Measure, measure, measure. When it comes to efficiency, don't just
theorise, always measure.

So, no, then. You _did_ measure and the measurement came up blank.

Like he said: so far. Perhaps if the files are accessed over a
networked file system it makes a difference? Or something.

"Measure" is sound advice, but one must be allowed to /reason/ about
efficiency too.

/Jorgen
 

Kaz Kylheku

There are many things that will mess up your experiment:

- The hardware is likely to do I/O only in multiples of a hardware
block size (e.g. 512 bytes, although some hard disks made since 2010
use 4096 bytes, and floppies may have hardware block sizes of
128, 256, 512, or 1024 bytes).

- In large disks, the amount of storage that can be accessed *without*
seeking is pretty big, e.g. 63 sectors/track * 255 tracks/cylinder
* 512 bytes/sector = about 8 megabytes. To see seeking delay,
the file needs to be considerably larger than that.
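
(Editorial sketch, not part of the original post.) The arithmetic in the second point can be checked directly. With the figures quoted there, one cylinder holds 63 * 255 * 512 = 8,225,280 bytes, which is about 7.84 MiB. The helper name cylinder_bytes is hypothetical.

```c
/* Bytes reachable without moving the heads, for a given disk geometry. */
static unsigned long cylinder_bytes(unsigned long sectors_per_track,
                                    unsigned long tracks_per_cylinder,
                                    unsigned long bytes_per_sector)
{
    return sectors_per_track * tracks_per_cylinder * bytes_per_sector;
}
```

cylinder_bytes(63, 255, 512) evaluates to 8225280.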

To see seeking delays while repeatedly accessing some amount of storage,
on a caching operating system, you have to exceed the operating system's
cache size.

As far as system call overhead goes (calling OS functions to fill the
FILE * stream's buffer), triggering that overhead probably doesn't require a
large position delta.

If you're doing randomly-accessed one byte reads in a binary stream,
it might be faster if they are clustered together, if the library
optimizes small-delta fseeks (by not discarding the buffer).
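
(Editorial sketch, not part of the original post.) If the library does discard its buffer on every fseek, one alternative is to avoid seeking inside the loop entirely: read the file sequentially in chunks and pick out every p-th byte from memory. This is a swapped-in technique, not something proposed in the thread. It assumes p > 0 and i >= 0; the helper name sum_column_buffered and the 4096-byte chunk size are arbitrary choices.

```c
#include <stdio.h>

/*
 * Sum the bytes at offsets i, i + p, i + 2p, ... up to end of file,
 * using sequential chunked reads instead of one fseek per byte.
 */
static unsigned long sum_column_buffered(FILE *fp, long i, long p)
{
    unsigned char buf[4096];
    unsigned long sum = 0;
    long next = i;   /* absolute offset of the next byte we want */
    long base = 0;   /* absolute offset of buf[0] */
    size_t got;

    if (fseek(fp, 0, SEEK_SET) != 0)
        return 0;
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        /* Consume every wanted byte that falls inside this chunk. */
        while (next < base + (long)got) {
            sum += buf[next - base];
            next += p;
        }
        base += (long)got;
    }
    return sum;
}
```

For a large stride p this reads many bytes that are never used, so whether it beats per-byte seeking is again something to measure rather than assume.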
 

glen herrmannsfeldt

(snip)
Kaz Kylheku said:
To see seeking delays while repeatedly accessing some amount of storage,
on a caching operating system, you have to exceed the operating system's
cache size.

About 15 years ago, I was timing the I/O of a program and noticed
that it was faster than the network (using NFS) that the file was
stored on. The file was about 400MB, but then I realized that the
disk cache on a 4GB (unusual at the time) server was big enough
to cache the whole file.

(Our project manager tried to order us a 16GB server, but Dell
wouldn't sell him one.)

The disk cache complicates any timing related to file and disk
access.

-- glen
 

Jorgen Grahn

(snip)

glen herrmannsfeldt said:
About 15 years ago, I was timing the I/O of a program and noticed
that it was faster than the network (using NFS) that the file was
stored on. The file was about 400MB, but then I realized that the
disk cache on a 4GB (unusual at the time) server was big enough
to cache the whole file.

My gut feeling about NFS is that it's often the other way around --
you get fewer benefits of caching, and a lot of things that would have
been cached for a local disk instead need a network roundtrip.

E.g. the Git version control utility is rather painful to use on a NFS
disk.

A profiling utility for such things would be useful.

/Jorgen
 

glen herrmannsfeldt

(snip, I wrote)

Jorgen Grahn said:
My gut feeling about NFS is that it's often the other way around --
you get fewer benefits of caching, and a lot of things that would have
been cached for a local disk instead need a network roundtrip.

I am not so sure in the general case of mixing read and write
what it does. In this case, it was one very large file that,
presumably, was in the disk cache on the client. It might have
verified that it hadn't changed on the server, and so believed
that it could just supply the data.

Jorgen Grahn said:
E.g. the Git version control utility is rather painful to use on
a NFS disk.

At that time, we were using CVS. I don't remember if this file
was part of the CVS tree, though.

Jorgen Grahn said:
A profiling utility for such things would be useful.

More recently, I had some other unexpected results from NFS.
With the client running OS X 10.6.8, I wrote a file, renamed
it on the server, then tried to open it with the new name on
the client. The message that came back was (old name) does
not exist. I never tracked down why it did that, but possibly
it is again related to disk caching.

-- glen
 

Jorgen Grahn

(snip, I wrote)



I am not so sure in the general case of mixing read and write
what it does. In this case, it was one very large file that,
presumably, was in the disk cache on the client. It might have
verified that it hadn't changed on the server, and so believed
that it could just supply the data.

Yeah, well, I'm not saying I understand NFS. Perhaps I should have
said "prejudice" instead of "gut feeling".

[...]

/Jorgen
 

BGB

Jorgen Grahn said:
Yeah, well, I'm not saying I understand NFS. Perhaps I should have
said "prejudice" instead of "gut feeling".

going OT:

this brings up a thought:
why, after so many years of people doing this sort of thing,
are there still no really "good" network filesystems?

ex, issues with existing network filesystems:

- NFS: kind of funky, not really well supported on Windows;

- SMB / CIFS: not very well-behaved in general, getting Samba and
different Windows versions to play well together is often a bit of a
pain, but can mostly look like a native FS on the various targets, when
it works at least;

- FTP: generally doesn't behave much like a native filesystem, on either
Windows or Linux, but does at least work on both and works ok over the
internet;

- HTTP: ok for unidirectional downloads, WebDAV exists but has similar
issues to FTP (*);

- ....

*: one can access things through Windows Explorer, but it doesn't quite
behave correctly, and file-associations / open-with, ... don't work
correctly. stuff only really works well if the volume can be mapped to a
drive-letter, but with Win7 this seems to only work with SMB/CIFS.


I guess the hope would be if there would be something with the
generality of FTP or HTTP, but could behave pretty much like a native
filesystem on both Windows and Linux (I guess it could require something
like a FUSE analogue for Windows or similar though).


as for disk caching:
yes, lots of stuff may be cached in the disk cache;
on 64-bit systems, one can almost get away with using files as a sort of
expanded memory (relying on them tending to stick around in the
disk-cache), although one can just as easily build the program as 64
bits and get the same effect.

more practically though, files can often be used as file-backed
persistent memory (with or without file-mappings, there are tradeoffs here).
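
(Editorial sketch, not part of the original post.) As an illustration of the file-mapping variant, the following keeps a single counter in a file and updates it through a shared mapping, so the value survives across program runs. It assumes a POSIX system with mmap; the helper name bump_persistent_counter is hypothetical, and the on-disk format is simply one native long, which is not portable across machines.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Increment a counter stored in the file at `path` through a shared
 * mapping. Returns the new value, or -1 on failure.
 */
static long bump_persistent_counter(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Make sure the file is large enough; a new file is zero-filled. */
    if (ftruncate(fd, (off_t)sizeof(long)) != 0) {
        close(fd);
        return -1;
    }

    long *counter = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (counter == MAP_FAILED)
        return -1;

    long value = ++*counter;       /* an ordinary memory write */
    msync(counter, sizeof(long), MS_SYNC);   /* push the change to the file */
    munmap(counter, sizeof(long));
    return value;
}
```

The tradeoff mentioned above shows up here: the mapped version reads like plain memory access, while a version without mappings would use explicit read and write calls and have more direct control over when data reaches the file.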

 

Ian Collins

glen said:
(snip, I wrote)



I am not so sure in the general case of mixing read and write
what it does. In this case, it was one very large file that,
presumably, was in the disk cache on the client. It might have
verified that it hadn't changed on the server, and so believed
that it could just supply the data.


At that time, we were using CVS. I don't remember if this file
was part of the CVS tree, though.

There are so many variables at play, the NFS version in use, the quality
of the client and server implementations and the (especially synchronous
write) performance of the storage on the server. A large part of my day
job is getting different NFS clients and servers to play well together
and optimising server performance.

The tasks that produce copious small (on the server) synchronous writes
like updating from GIT, extracting zip or tar archives and hosting a
virtual machine are pathological use cases for NFS...
 
