Binary File I/O

Discussion in 'C Programming' started by Mr. Mxyztplk, Jun 3, 2014.

  1. Mr. Mxyztplk

    Mr. Mxyztplk Guest

    (supposing file fp has been opened in binary mode, and m = n/p)

    Just wondering if this approach

    for (i = 0; i < p; i++) {
        for (j = 0; j < q; j++) {
            fseek(fp, i, SEEK_SET);
            for (k = 0; k < m; k++) {
                fread(&ch, sizeof(char), 1, fp);
                fseek(fp, p - 1, SEEK_CUR);   /* skip p - 1 bytes of the same stream */
                /* do whatever */
            }
        }
    }

    actually does run more efficiently (less moving back and forth between
    locations??) than the following

    for (i = 0; i < p; i++) {
        for (j = 0; j < q; j++) {
            for (k = 0; k + i < n; k += p) {  /* offsets i, i + p, ... up to n - 1 */
                fseek(fp, k + i, SEEK_SET);
                fread(&ch, sizeof(char), 1, fp);
                /* do whatever */
            }
        }
    }

    ?

    For my purposes it has not made any difference one way or the other. So
    far, that is.
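    For anyone who wants to try the two loops above, here is a
    self-contained sketch of both access patterns. It rests on some
    assumptions: `n` is the file length in bytes, `p > 0`, the `j`/`q`
    loop is dropped because it only repeats each pass, and the
    "do whatever" step is replaced by summing the bytes so the two
    versions can be checked against each other. The function names
    `sum_relative` and `sum_absolute` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical pattern 1: position once per column i, then hop
 * p - 1 bytes forward between single-byte reads. Reads m = n / p
 * bytes per column, so any trailing n % p bytes are skipped.
 * Returns the byte sum, or -1 on error. */
long sum_relative(FILE *fp, long n, long p)
{
    long m, sum = 0;
    unsigned char ch;

    if (p <= 0)
        return -1;
    m = n / p;
    for (long i = 0; i < p; i++) {
        if (fseek(fp, i, SEEK_SET) != 0)
            return -1;
        for (long k = 0; k < m; k++) {
            /* after the first sample, skip the p - 1 bytes in between */
            if (k > 0 && fseek(fp, p - 1, SEEK_CUR) != 0)
                return -1;
            if (fread(&ch, 1, 1, fp) != 1)
                return -1;
            sum += ch;                  /* the "do whatever" step */
        }
    }
    return sum;
}

/* Hypothetical pattern 2: seek to every absolute offset i + k. */
long sum_absolute(FILE *fp, long n, long p)
{
    long sum = 0;
    unsigned char ch;

    if (p <= 0)
        return -1;
    for (long i = 0; i < p; i++) {
        for (long k = 0; k + i < n; k += p) {
            if (fseek(fp, k + i, SEEK_SET) != 0)
                return -1;
            if (fread(&ch, 1, 1, fp) != 1)
                return -1;
            sum += ch;
        }
    }
    return sum;
}
```

    When `n` is a multiple of `p`, both functions visit every byte
    exactly once, so their sums agree and any timing difference is
    down to the seeks alone.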
     
    Mr. Mxyztplk, Jun 3, 2014
    #1

  2. Mr. Mxyztplk

    Richard Bos Guest

    Measure, measure, measure. When it comes to efficiency, don't just
    theorise, always measure.
    So, no, then. You _did_ measure and the measurement came up blank.
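    In the spirit of "measure", a minimal timing sketch, assuming a
    POSIX system with `clock_gettime` and `CLOCK_MONOTONIC`. Wall-clock
    time is used because time spent waiting on I/O does not show up in
    the CPU time reported by `clock()`. The helper `time_strided_read`
    is hypothetical: it reads all `n` bytes of a stream in stride-`p`
    order with one absolute seek per byte and reports elapsed seconds.
    Run it more than once, since later runs are likely to be served
    from the OS disk cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Elapsed wall-clock seconds between two CLOCK_MONOTONIC readings. */
static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical helper: read all n bytes of fp in stride-p order
 * (offsets 0, p, 2p, ..., then 1, 1 + p, ...), one fseek per byte.
 * Stores the byte sum in *checksum so callers can confirm that
 * different strides read the same data. Returns elapsed seconds,
 * or -1.0 on error. */
double time_strided_read(FILE *fp, long n, long p, long *checksum)
{
    struct timespec start, end;
    unsigned char ch;
    long sum = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (long i = 0; i < p; i++) {
        for (long off = i; off < n; off += p) {
            if (fseek(fp, off, SEEK_SET) != 0 || fread(&ch, 1, 1, fp) != 1)
                return -1.0;
            sum += ch;
        }
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    *checksum = sum;
    return seconds_between(start, end);
}
```

    Comparing stride 1 (plain sequential order) with a larger stride,
    on the same file and after the same warm-up, is one way to put
    numbers on the original question.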

    Richard
     
    Richard Bos, Jun 4, 2014
    #2

  3. Mr. Mxyztplk

    BGB Guest

    yeah.


    it likely won't make much of a difference on a typical modern OS, as
    there isn't really a whole lot that "seek" does in the first place.

    usually, the seek calls essentially just update a value holding the
    current offset, and the underlying read/write calls are where all the
    magic happens.

    typically, then, things are done at the level of disk-blocks:
    copying data from disk-blocks held in buffers into the applications'
    buffers, or copying application memory into the buffered disk-block and
    marking the block dirty.

    as needed, disk-blocks are pulled in to memory, or written back to disk.
    if this is needed for a given read/write operation to complete, the
    current process is blocked and set to resume once the block in question
    becomes available (in the meantime, the OS goes off and does whatever else).


    except in extreme cases, access patterns to file data won't really make
    all that much of a difference.
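    The point that a seek mostly just updates a stored offset can be
    seen directly. A minimal sketch, assuming a POSIX-like C library
    (POSIX allows setting the file offset beyond end-of-file; ISO C does
    not spell this out for binary streams): seeking far past the end of
    a short file succeeds and `ftell` reports the new position, and it
    is only the following read that runs into end-of-file. The function
    name `seek_is_just_an_offset` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical demo: returns 1 if seeking to 'far' (beyond the end of
 * fp) succeeded, ftell reports 'far', and a following one-byte fread
 * returned nothing with the end-of-file indicator set; 0 otherwise. */
int seek_is_just_an_offset(FILE *fp, long far)
{
    unsigned char ch;

    if (fseek(fp, far, SEEK_SET) != 0)   /* typically just records the position */
        return 0;
    if (ftell(fp) != far)                /* the stored offset has moved */
        return 0;
    if (fread(&ch, 1, 1, fp) != 0)       /* the read is where data is fetched */
        return 0;
    return feof(fp) != 0;
}
```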
     
    BGB, Jun 4, 2014
    #3
  4. Mr. Mxyztplk

    Mr. Mxyztplk Guest

    elision
    yeah

    This is basically the answer I was expecting. Nowadays you have
    something like pages of memory referring to the locations sitting around
    in some buffer or something (my vague way of describing it), so it's not
    like the days when you were really moving back-up-and-down a reel of
    tape.
     
    Mr. Mxyztplk, Jun 4, 2014
    #4
  5. Mr. Mxyztplk

    Jorgen Grahn Guest

    Like he said: so far. Perhaps if the files are accessed over a
    networked file system it makes a difference? Or something.

    "Measure" is sound advice, but one must be allowed to /reason/ about
    efficiency too.

    /Jorgen
     
    Jorgen Grahn, Jun 4, 2014
    #5
  6. Mr. Mxyztplk

    Kaz Kylheku Guest

    To see seeking delays while repeatedly accessing some amount of storage,
    on a caching operating system, you have to exceed the operating system's
    cache size.

    As far as system call overhead goes (calling OS functions to fill the
    FILE * stream's buffer), triggering that overhead probably doesn't require a
    large position delta.

    If you're doing randomly accessed one-byte reads in a binary stream,
    they might be faster if they are clustered together, if the library
    optimizes small-delta fseeks (by not discarding the buffer).
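    The small-delta optimization can also be done in the program itself
    rather than relying on the C library. A sketch with hypothetical
    names (`struct block_reader`, `br_init`, `br_get`) and an assumed
    4096-byte block size: keep one aligned block of the file in a
    private buffer, and only seek and refill when a requested offset
    falls outside it, so clustered one-byte reads cost one `fread` per
    block instead of an `fseek`/`fread` pair per byte.

```c
#include <stdio.h>

#define BR_BLOCK 4096  /* assumed block size */

/* Hypothetical one-block cache over a binary FILE * stream. */
struct block_reader {
    FILE *fp;
    unsigned char buf[BR_BLOCK];
    long base;        /* file offset of buf[0] */
    size_t len;       /* number of valid bytes in buf */
    long refills;     /* seek + fread refills performed so far */
};

void br_init(struct block_reader *br, FILE *fp)
{
    br->fp = fp;
    br->base = 0;
    br->len = 0;      /* empty, so the first br_get refills */
    br->refills = 0;
}

/* Return the byte at offset off (0..255), or -1 if off is negative,
 * at or past end-of-file, or the seek fails. */
int br_get(struct block_reader *br, long off)
{
    if (off < 0)
        return -1;
    if (off < br->base || off >= br->base + (long)br->len) {
        br->base = off - off % BR_BLOCK;          /* align to a block */
        if (fseek(br->fp, br->base, SEEK_SET) != 0) {
            br->len = 0;
            return -1;
        }
        br->len = fread(br->buf, 1, BR_BLOCK, br->fp);
        br->refills++;
        if (off >= br->base + (long)br->len)
            return -1;                            /* past end-of-file */
    }
    return br->buf[off - br->base];
}
```

    With a small stride, each block is then fetched once per pass
    rather than once per byte; whether that beats the library's own
    buffering is, again, something to measure.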
     
    Kaz Kylheku, Jun 5, 2014
    #6
  7. (snip)
    About 15 years ago, I was timing the I/O of a program and noticed
    that it was faster than the network (using NFS) that the file was
    stored on. The file was about 400MB, but then I realized that the
    disk cache on a 4GB (unusual at the time) server was big enough
    to cache the whole file.

    (Our project manager tried to order us a 16GB server, but Dell
    wouldn't sell him one.)

    The disk cache complicates any timing related to file and disk
    access.
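    One way to take some of the local disk cache out of such timings is
    to ask the kernel to drop a file's cached pages before each timed
    run. A sketch, assuming a system that implements `posix_fadvise`
    (Linux does). The advice is only a hint that the kernel may ignore,
    and dirty pages cannot be dropped until they are written, hence the
    `fsync` first. The helper name `drop_cached_pages` is hypothetical,
    and on an NFS client it says nothing about caching on the server.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush fp's stdio buffer and the file's dirty
 * pages, then advise the kernel that its cached pages are not needed.
 * Returns 0 on success, -1 on failure. */
int drop_cached_pages(FILE *fp)
{
    int fd = fileno(fp);

    if (fd < 0 || fflush(fp) != 0)
        return -1;
    if (fsync(fd) != 0)               /* dirty pages are not dropped */
        return -1;
    /* offset 0 with len 0 covers the whole file; posix_fadvise
     * returns an error number directly rather than setting errno */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0 ? 0 : -1;
}
```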

    -- glen
     
    glen herrmannsfeldt, Jun 5, 2014
    #7
  8. Mr. Mxyztplk

    Jorgen Grahn Guest

    My gut feeling about NFS is that it's often the other way around:
    you get fewer benefits of caching, and a lot of things that would have
    been cached for a local disk instead need a network round trip.

    E.g. the Git version control utility is rather painful to use on an NFS
    disk.

    A profiling utility for such things would be useful.

    /Jorgen
     
    Jorgen Grahn, Jun 6, 2014
    #8
  9. (snip, I wrote)
    I am not so sure what it does in the general case of mixed reads
    and writes. In this case, it was one very large file that,
    presumably, was in the disk cache on the client. It might have
    verified that it hadn't changed on the server, and so believed
    that it could just supply the data.
    At that time, we were using CVS. I don't remember if this file
    was part of the CVS tree, though.
    More recently, I had some other unexpected results from NFS.
    With the client running OS X 10.6.8, I wrote a file, renamed
    it on the server, then tried to open it under the new name on
    the client. The message that came back was that (old name) does
    not exist. I never tracked down why it did that, but possibly
    it is again related to disk caching.

    -- glen
     
    glen herrmannsfeldt, Jun 6, 2014
    #9
  10. Mr. Mxyztplk

    Jorgen Grahn Guest

    Yeah, well, I'm not saying I understand NFS. Perhaps I should have
    said "prejudice" instead of "gut feeling".

    [...]

    /Jorgen
     
    Jorgen Grahn, Jun 7, 2014
    #10
  11. Mr. Mxyztplk

    BGB Guest

    going OT:

    this brings up a thought:
    why, after so many years of people doing this sort of thing, there
    still aren't any really "good" network filesystems.

    ex, issues with existing network filesystems:
    NFS, kind of funky, not really well supported on Windows;
    SMB / CIFS, not very well-behaved in general, getting Samba and
    different Windows versions to play well together is often a bit of a
    pain, but can mostly look like a native FS on the various targets, when
    it works at least;
    FTP, generally doesn't behave much like a native filesystem, on either
    Windows or Linux, but does at least work on both and works ok over the
    internet;
    HTTP, ok for unidirectional downloads, WebDAV exists but has similar
    issues to FTP (*);
    ....

    *: one can access things through Windows Explorer, but it doesn't quite
    behave correctly, and file associations / open-with, ... don't work
    correctly. stuff only really works well if the volume can be mapped to a
    drive letter, but with Win7 this seems to only work with SMB/CIFS.


    I guess the hope would be for something with the generality of FTP or
    HTTP that could behave pretty much like a native filesystem on both
    Windows and Linux (I guess it could require something like a FUSE
    analogue for Windows or similar, though).


    as for disk caching:
    yes, lots of stuff may be cached in the disk cache;
    on 64-bit systems, one can almost get away with using files as a sort of
    expanded memory (relying on them tending to stick around in the
    disk cache), although one can just as easily build the program as
    64-bit and get the same effect.

    more practically though, files can often be used as file-backed
    persistent memory (with or without file-mappings, there are tradeoffs here).
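    As an example of the file-mapping flavour of file-backed persistent
    memory, a sketch assuming a POSIX system with `mmap`: the
    hypothetical `bump_counter` maps a small file `MAP_SHARED`, so an
    increment made through an ordinary pointer lands in the file and
    survives into the next run. Error handling is minimal, and
    concurrent writers are not considered.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: treat the file at path as one persistent long,
 * increment it, and return the new value (or -1 on error). The file is
 * set to exactly sizeof(long) bytes; a new or empty file starts at 0
 * because ftruncate zero-fills the extended region. */
long bump_counter(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)sizeof(long)) != 0) {
        close(fd);
        return -1;
    }

    long *counter = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (counter == MAP_FAILED)
        return -1;

    long value = ++*counter;          /* a plain memory write, no fwrite/fseek */
    msync(counter, sizeof(long), MS_SYNC);   /* push the change to the file now */
    munmap(counter, sizeof(long));
    return value;
}
```

    The "without file-mappings" alternative would read the value with
    `fread`, update it, and write it back with `fseek`/`fwrite`; the
    mapping trades those explicit calls for page faults and `msync`.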

     
    BGB, Jun 7, 2014
    #11
  12. Mr. Mxyztplk

    Ian Collins Guest

    There are so many variables at play: the NFS version in use, the
    quality of the client and server implementations, and the (especially
    synchronous write) performance of the storage on the server. A large
    part of my day job is getting different NFS clients and servers to
    play well together and optimising server performance.

    Tasks that produce copious small synchronous writes (on the server),
    like updating from Git, extracting zip or tar archives and hosting a
    virtual machine, are pathological use cases for NFS...
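    When an application controls its own writes, one mitigation for the
    many-small-synchronous-writes case is to let stdio batch small
    records and force them to stable storage once at the end, instead of
    flushing and syncing after every record. A sketch, assuming a POSIX
    `fsync`; the helper name `write_records_batched` is hypothetical, and
    how an NFS client turns this into server-side writes still depends
    on the mount and export options.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append n NUL-terminated records to fp, one per
 * line, letting stdio accumulate them in its buffer, then flush and
 * fsync once. Returns 0 on success, -1 on failure. */
int write_records_batched(FILE *fp, const char *const *recs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(recs[i]);
        if (fwrite(recs[i], 1, len, fp) != len || fputc('\n', fp) == EOF)
            return -1;
    }
    /* hand the buffered records to the OS in as few write() calls
     * as the buffer size allows ... */
    if (fflush(fp) != 0)
        return -1;
    /* ... and ask for stable storage once, not once per record */
    return fsync(fileno(fp)) == 0 ? 0 : -1;
}
```

    Calling `setvbuf(fp, NULL, _IOFBF, 1 << 16)` right after opening the
    stream, before any other I/O on it, gives stdio a larger buffer so
    each `write()` carries more records.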
     
    Ian Collins, Jun 7, 2014
    #12