fast multiple file access

Discussion in 'C Programming' started by Cable, Jun 29, 2005.

  1. Cable

    Cable Guest

    Hello,

    I am hoping that someone can answer a question or two regarding file
    access. I have created an app that reads an image from a file then
    displays it (using OpenGL). It works well using fopen() with fgetc()
    to access each byte. I have decided to move further with this app and
    allow the user to select the first file of an image sequence and it
    will play the sequence back at 24 frames per second. I have almost
    everything worked out but am curious if using fgetc() is the fastest
    and most efficient means of reading in data. Each image is at least
    1.1 MB in file size. My main question is: When one uses fopen() does
    this just provide a Pointer (via a stream) to the file on disk or does
    this actually load the image into RAM and return a pointer to that?
    Also, can anyone recommend a decent workflow for reading in files very
    quickly (the files have to be parsed to retrieve the image data as well
    as the header info)? My drives are fast enough but I want to make sure
    that I am not slowing things down with poor file access. Thanks for
    any advice.

    Cable
    Cable, Jun 29, 2005
    #1

  2. In article <>,
    Cable <> wrote:
    >I have almost
    >everything worked out but am curious if using fgetc() is the fastest
    >and most efficient means of reading in data.



    It depends on how good the optimizer is.

    Generally speaking, when you know you are reading a number of bytes,
    fread() is faster, as it avoids the overhead of invoking fgetc()
    each time. However, if you have a good optimizer then it might all
    come out the same.
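    A minimal sketch of the two approaches, assuming the only job is to
    touch every byte once (a running byte sum stands in for real header and
    pixel parsing). The function names and the 64 KiB block size are
    illustrative choices, not tuned values.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative block size for the fread() version; not a tuned value. */
#define READ_CHUNK 65536

/* Touch every byte with one fgetc() call per byte. */
unsigned long sum_fgetc(FILE *f)
{
    unsigned long sum = 0;
    int c;

    while ((c = fgetc(f)) != EOF)
        sum += (unsigned long)c;   /* c holds an unsigned char value here */
    return sum;
}

/* Touch every byte, fetching up to READ_CHUNK bytes per fread() call. */
unsigned long sum_fread(FILE *f)
{
    static unsigned char buf[READ_CHUNK];
    unsigned long sum = 0;
    size_t got, i;

    while ((got = fread(buf, 1, sizeof buf, f)) > 0)
        for (i = 0; i < got; i++)
            sum += buf[i];
    return sum;
}
```

    Both functions should return the same sum for the same rewound stream;
    which one is faster can only be settled by timing them on the target
    system.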


    > Each image is at least
    >1.1 MB in file size. My main question is: When one uses fopen() does
    >this just provide a Pointer (via a stream) to the file on disk or does
    >this actually load the image into RAM and return a pointer to that?


    fopen() does NOT read any of the file. The first fgetc() or fread()
    or equivalents will read the first bufferful into memory, according to
    the size of buffer that has been configured. Subsequent fgetc()
    or fread() read out of the in-memory buffer until they get to the
    end of it, then read another bufferful, then go back to reading
    out of memory, and so on.
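    Standard C lets the program choose that buffer with setvbuf(), provided
    it is called after fopen() and before any other operation on the
    stream. A minimal sketch, assuming a larger-than-default buffer is
    wanted for each image file; open_buffered() is a hypothetical helper,
    and the buffer is allocated explicitly because, with a null buffer
    argument, the standard only says the requested size *may* be used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Open PATH for binary reading with a SIZE-byte, fully buffered stdio
   buffer allocated here. On success *BUFP receives that buffer (or NULL
   if the default buffering had to be kept); the caller must free() it
   only AFTER fclose(). Returns NULL if the file cannot be opened. */
FILE *open_buffered(const char *path, size_t size, char **bufp)
{
    FILE *f = fopen(path, "rb");
    char *buf;

    *bufp = NULL;
    if (f == NULL)
        return NULL;
    buf = malloc(size);
    /* setvbuf() must precede every other operation on the stream. */
    if (buf != NULL && setvbuf(f, buf, _IOFBF, size) == 0)
        *bufp = buf;
    else
        free(buf);   /* keep the library's default buffering */
    return f;
}
```

    For example, open_buffered(name, 2 * 1024 * 1024, &buf) would request
    a 2 MiB buffer, enough to hold one 1.1 MB frame file (the size is a
    guess to be tuned by measurement); call fclose() before free(buf).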


    The rest of this message gets into non-portable extensions.

    >Also, can anyone recommend a decent workflow for reading in files very
    >quickly (the files have to be parsed to retrieve the image data as well
    >as the header info)? My drives are fast enough but I want to make sure
    >that I am not slowing things down with poor file access.


    > I have created an app that reads an image from a file then
    > displays it (using OpenGL)


    The OpenGL part is not part of the C standard, so you are already
    using non-portable constructs. You need to decide how far into
    non-portability you are willing to go. If you find that your
    current fgetc() scheme isn't fast enough, and fread() isn't either,
    then you should consider using system extensions such as:

    - read() -- implemented on all Unix systems and many others

    - open( O_DIRECT ) -- in association with read(), allows direct I/O
    bypassing system buffers; not supported in all Unixes

    - mmap() -- allows a file to be mapped into memory -- possibly more
    common than O_DIRECT

    - readv() -- allows scatter/gather I/O -- probably not particularly
    common

    - real-time filesystems, such as those provided by SGI's GRIO
    (guaranteed-rate I/O) extensions and XFS real-time volumes

    - placing the files into a raw partition and handling the filesystem
    management yourself

    - writing your own device driver

    - turning on Tagged Command Queuing (TCQ) on SCSI devices

    - pre-processing the files into raw data files that can be DMA'd
    directly into a buffer suitable for passing to OpenGL

    - read the files through once so as to bring their contents into
    the system file cache, before starting the graphics process

    - figuring out which part of your disk delivers data most quickly,
    and ensuring that the files are written to that part of the disk

    - when writing the files, figure out about how big they are
    going to be, seek to that position, write a byte, and seek
    back to the beginning and fill in the data. On many systems,
    this will result in contiguous blocks being allocated for the
    storage, whereas if you did the standard write of a buffer at
    a time, the buffers could end up fragmented all over the disk

    - pay attention to time needed to finish processing one file
    and open the next, and to the relative positions on disk.
    Ideally, when you issue the next read to disk, the disk block
    you need should be the very next one that spins under the
    head of the current track, so that there is no track-to-track
    seek time and no time spent waiting for the appropriate sector
    to spin around. This may require fetching information about the
    drive geometry -- and for most SCSI disks, geometry is only
    an approximation because there is a variable number of sectors
    per track (outer tracks hold more).

    - issue the largest read request that you can, so that the disk
    can read as many consecutive blocks as practical

    - for SCSI disks, examine the bad-block information so as to
    ensure that you aren't seeking wildly to a replacement block
    in the middle of an important stream

    - if you really get cramped for time, use a solid-state disk
    if you can
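    As a sketch of the mmap() option on a POSIX system (outside standard
    C), the function below maps a whole file read-only and walks it as an
    ordinary array. sum_mapped() is a hypothetical name, summing the bytes
    stands in for real parsing, and error handling is kept to a minimum.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map PATH read-only and add up its bytes into *SUM.
   Returns 0 on success, -1 on any error. POSIX only. */
int sum_mapped(const char *path, unsigned long *sum)
{
    struct stat st;
    const unsigned char *p;
    size_t i, len;
    int fd = open(path, O_RDONLY);

    *sum = 0;
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;
    if (len == 0) {              /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    for (i = 0; i < len; i++)    /* pages are faulted in on demand */
        *sum += p[i];
    munmap((void *)p, len);
    return 0;
}
```

    A zero-length file is handled separately because mmap() fails when
    asked for a length of 0.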

    I'm sure there are many additional disk optimization methods.
    See if your OS has a tool named 'diskperf' available for it.

    You may have noticed that nearly all of these optimizations are
    system and/or hardware specific. The C language itself is
    not concerned with filesystem representations or I/O
    optimization.
    --
    "Never install telephone wiring during a lightning storm." -- Linksys
    Walter Roberson, Jun 29, 2005
    #2

  3. Cable

    Chris Torek Guest

    >In article <>,
    >Cable <> wrote:
    >>... My main question is: When one uses fopen() does
    >>this just provide a Pointer (via a stream) to the file on disk or does
    >>this actually load the image into RAM and return a pointer to that?


    In article <d9t7gt$ras$>
    Walter Roberson <-cnrc.gc.ca> wrote:
    >fopen() does NOT read any of the file.


    Unless, of course, it does. (Consider, e.g., FTP-based file
    systems, that act as an FTP client to an FTP server. Wind River
    has one in VxWorks, so they do exist. There are some catches
    though; and merely opening the file does not always read the
    entire thing.)

    >The first fgetc() or fread() or equivalents will read the first
    >bufferful into memory, according to the size of buffer that has
    >been configured.


    Ideally, anyway. If your C implementation has been reasonably
    optimized, it should choose "reasonably good" buffer sizes
    automatically as well, and your input should proceed at something
    approaching maximum possible speed without any foolery at all.

    Of course, there are always exceptions ... but then you have to:

    >The rest of this message gets into non-portable extensions.


    ... get into those non-portable extensions. You also have to
    experiment, as many attempts to go faster will prove to go slower
    instead. Things like interrupt latency, DMA, and overlapping
    read transactions can have surprising interactions. For instance,
    this trick stands to reason, and does work on some systems:

    >- issue the largest read request that you can, so that the disk
    > can read as many consecutive blocks as practical


    but on others it backfires badly since a request for (say) one
    megabyte has to wait for the entire megabyte, while 16 requests
    for 64K each can process each 64K chunk "on the fly" while the
    next chunk arrives.
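    A sketch of the chunked alternative in portable C, assuming a
    caller-supplied buffer and a hypothetical callback that parses each
    piece as it arrives. With ordinary blocking fread() the program itself
    does not overlap I/O and processing; any overlap comes from the
    operating system's read-ahead, which is why the chunk size is a
    parameter to experiment with rather than a fixed constant.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-chunk callback type: receives each piece as it arrives. */
typedef void (*chunk_fn)(const unsigned char *data, size_t len, void *ctx);

/* Read F in pieces of at most CHUNK bytes into BUF (which must hold CHUNK
   bytes) and pass each piece to PROCESS before reading the next.
   Returns the total number of bytes read. */
size_t read_in_chunks(FILE *f, unsigned char *buf, size_t chunk,
                      chunk_fn process, void *ctx)
{
    size_t got, total = 0;

    while ((got = fread(buf, 1, chunk, f)) > 0) {
        process(buf, got, ctx);
        total += got;
    }
    return total;
}

/* Example callback (hypothetical): just counts chunks and bytes. */
struct chunk_stats {
    size_t chunks;
    size_t bytes;
};

void count_chunk(const unsigned char *data, size_t len, void *ctx)
{
    struct chunk_stats *s = ctx;

    (void)data;
    s->chunks++;
    s->bytes += len;
}
```

    For example, passing a 64 KiB buffer reads a 1 MiB file as 16 pieces;
    comparing that against a single 1 MiB piece on the target system shows
    which of the two behaviours described above applies there.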

    >- if you really get cramped for time, use a solid-state disk
    > if you can


    Be aware, however, that flash memory is significantly *slower*
    than rotating media (though reads are not as bad as writes).
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Jun 29, 2005
    #3
  4. Cable

    CBFalconer Guest

    Walter Roberson wrote:
    >

    ... snip ...
    >
    > Generally speaking, when you know you are reading a number of
    > bytes, fread() is faster, as it avoids the overhead of invoking
    > fgetc() each time. However, if you have a good optimizer then it
    > might all come out the same.


    Not necessarily so. getc may be faster if the purpose is to scan
    for something. Using it may well provide controlled access to the
    file buffer, without ever moving any data, and with no function
    call overhead (getc can be a macro, while fgetc may not).

    Typical expansion of a getc call might be the inline equivalent of:

    if (f->n--) fillbuffer(f);
    return *f->p++;

    which the implementation can do, because it knows the structure of
    a FILE.

    --
    "A man who is right every time is not likely to do very much."
    -- Francis Crick, co-discoverer of the structure of DNA
    "There is nothing more amazing than stupidity in action."
    -- Thomas Matthews
    CBFalconer, Jun 29, 2005
    #4
  5. Cable

    Michael Mair Guest

    CBFalconer wrote:
    > Walter Roberson wrote:
    >
    > ... snip ...
    >
    >>Generally speaking, when you know you are reading a number of
    >>bytes, fread() is faster, as it avoids the overhead of invoking
    >>fgetc() each time. However, if you have a good optimizer then it
    >>might all come out the same.

    >
    >
    > Not necessarily so. getc may be faster if the purpose is to scan
    > for something. Using it may well provide controlled access to the
    > file buffer, without ever moving any data, and with no function
    > call overhead (getc can be a macro, while fgetc may not).
    >
    > Typical expansion of a getc call might be the inline equivalent of:
    >
    > if (f->n--) fillbuffer(f);

    ITYM
    if (!(f->n--))

    > return *f->p++;
    >
    > which the implementation can do, because it knows the structure of
    > a FILE.
    >
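    A self-contained sketch of the same idea, using a private buffer
    instead of real FILE internals (which differ between implementations).
    The names struct breader, br_init(), br_fill() and br_getc() are
    hypothetical. The refill is taken only when the count of buffered bytes
    has run out; otherwise a byte is returned with no function call.

```c
#include <stdio.h>

/* Hypothetical reader with a private buffer, standing in for a FILE whose
   layout the implementation knows. Real FILE internals differ. */
struct breader {
    FILE *src;                /* underlying stream, opened by the caller */
    unsigned char buf[4096];  /* private buffer (size chosen arbitrarily) */
    unsigned char *p;         /* next unread byte in buf */
    size_t n;                 /* number of unread bytes left in buf */
};

void br_init(struct breader *r, FILE *src)
{
    r->src = src;
    r->p = r->buf;
    r->n = 0;                 /* empty: the first br_getc() refills */
}

/* Slow path, taken only when the buffer is empty: refill it and return
   its first byte, or EOF if nothing more could be read. */
int br_fill(struct breader *r)
{
    r->n = fread(r->buf, 1, sizeof r->buf, r->src);
    r->p = r->buf;
    if (r->n == 0)
        return EOF;
    r->n--;                   /* the byte returned below is consumed */
    return *r->p++;
}

/* Fast path: no function call while bytes remain. Like a getc() macro,
   this evaluates its argument more than once, so pass a plain pointer. */
#define br_getc(r) ((r)->n > 0 ? ((r)->n--, (int)*(r)->p++) : br_fill(r))
```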



    --
    E-Mail: Mine is an /at/ gmx /dot/ de address.
    Michael Mair, Jun 29, 2005
    #5
  6. On Wed, 29 Jun 2005 17:05:00 +0000, CBFalconer wrote:

    > Walter Roberson wrote:
    >>

    > ... snip ...
    >>
    >> Generally speaking, when you know you are reading a number of
    >> bytes, fread() is faster, as it avoids the overhead of invoking
    >> fgetc() each time. However, if you have a good optimizer then it
    >> might all come out the same.

    >
    > Not necessarily so. getc may be faster if the purpose is to scan
    > for something. Using it may well provide controlled access to the
    > file buffer, without ever moving any data, and with no function
    > call overhead (getc can be a macro, while fgetc may not).


    The standard requires both getc() and fgetc() to be available as a
    function; e.g. (getc)(stdin) is valid. It also allows <stdio.h> to provide
    macro definitions for both. However it allows a getc() macro to violate
    normal function-call-like semantics by evaluating its FILE * argument more
    than once. fgetc() must evaluate it exactly once so possibilities for
    implementing it as a macro are limited.
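    A small illustration of the two points above; the helper names are
    hypothetical. When the stream argument has a side effect, fgetc() is
    the safe choice, and putting the name in parentheses suppresses any
    function-like macro so the real library function is called.

```c
#include <stddef.h>
#include <stdio.h>

/* Read the next character from each of COUNT streams into OUT.
   The argument *s++ has a side effect, so fgetc() is used: a getc()
   macro may evaluate its FILE * argument more than once, which could
   advance s twice per call. */
void next_from_each(FILE **s, size_t count, int *out)
{
    size_t i;

    for (i = 0; i < count; i++)
        out[i] = fgetc(*s++);
}

/* Parenthesising the name suppresses a function-like getc macro,
   so this always calls the actual library function. */
int getc_function(FILE *f)
{
    return (getc)(f);
}
```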

    Lawrence
    Lawrence Kirby, Jun 30, 2005
    #6
  7. Cable

    CBFalconer Guest

    Lawrence Kirby wrote:
    > On Wed, 29 Jun 2005 17:05:00 +0000, CBFalconer wrote:
    >> Walter Roberson wrote:
    >>>

    >> ... snip ...
    >>>
    >>> Generally speaking, when you know you are reading a number of
    >>> bytes, fread() is faster, as it avoids the overhead of invoking
    >>> fgetc() each time. However, if you have a good optimizer then it
    >>> might all come out the same.

    >>
    >> Not necessarily so. getc may be faster if the purpose is to scan
    >> for something. Using it may well provide controlled access to the
    >> file buffer, without ever moving any data, and with no function
    >> call overhead (getc can be a macro, while fgetc may not).

    >
    > The standard requires both getc() and fgetc() to be available as a
    > function e.g. (getc)(stdin) is valid. It also allows <stdio.h> to
    > provide macro definitions for both. However it allows a getc()
    > macro to violate normal function-call-like semantics by evaluating
    > its FILE * argument more than once. fgetc() must evaluate it
    > exactly once so possibilities for implementing it as a macro are
    > limited.


    Yes, your exposition is more accurate than mine, and better exposes
    cases where you should or should not prefer one over the other.
    However, in the context of efficiency vis-à-vis fread, the point is
    that getc may well be the better choice, may break even, and in
    some cases may be the poorer choice. It depends on the
    implementation. If it matters, test.
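    A minimal timing harness in standard C, as one way to run such a test;
    the reader and harness names are hypothetical. clock() measures
    processor time rather than wall-clock time, so time spent blocked on
    the disk is not counted. Run each reader several times over the same
    file, since a single run can be dominated by whether the file is
    already in the system cache.

```c
#include <stdio.h>
#include <time.h>

/* Two readers to compare. getc() may be a macro working directly on the
   stream's buffer; fgetc() is normally a real function call. */
unsigned long count_with_getc(FILE *f)
{
    unsigned long n = 0;

    while (getc(f) != EOF)
        n++;
    return n;
}

unsigned long count_with_fgetc(FILE *f)
{
    unsigned long n = 0;

    while (fgetc(f) != EOF)
        n++;
    return n;
}

/* Rewind F, run READER over it once, store its result in *RESULT and
   return the processor time used, in seconds, as measured by clock(). */
double time_reader(FILE *f, unsigned long (*reader)(FILE *),
                   unsigned long *result)
{
    clock_t start;

    rewind(f);
    start = clock();
    *result = reader(f);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```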

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
    CBFalconer, Jun 30, 2005
    #7
