fast multiple file access


Cable

Hello,

I am hoping that someone can answer a question or two regarding file
access. I have created an app that reads an image from a file then
displays it (using OpenGL). It works well using fopen() with fgetc()
to access each byte. I have decided to move further with this app and
allow the user to select the first file of an image sequence and it
will play the sequence back at 24 frames per second. I have almost
everything worked out but am curious if using fgetc() is the fastest
and most efficient means of reading in data. Each image is at least
1.1 MB in file size. My main question is: when one uses fopen(), does
this just provide a pointer (via a stream) to the file on disk, or does
this actually load the image into RAM and return a pointer to that?
Also, can anyone recommend a decent workflow for reading in files very
quickly (the files have to be parsed to retrieve the image data as well
as the header info)? My drives are fast enough but I want to make sure
that I am not slowing things down with poor file access. Thanks for
any advice.

Cable
 

Walter Roberson

I have almost
everything worked out but am curious if using fgetc() is the fastest
and most efficient means of reading in data.


It depends on how good the optimizer is.

Generally speaking, when you know you are reading a number of bytes,
fread() is faster, as it avoids the overhead of invoking fgetc()
each time. However, if you have a good optimizer then it might all
come out the same.
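
For illustration, a minimal sketch of the fread() route
(read_whole_file is just a made-up name; note that seeking a binary
stream to SEEK_END is not strictly guaranteed by the C standard,
though it works on common systems):

#include <stdio.h>
#include <stdlib.h>

/* Slurp a whole file with one fread() call instead of a
 * byte-at-a-time fgetc() loop.  Returns NULL on failure;
 * *size receives the byte count. */
static unsigned char *read_whole_file(const char *path, long *size)
{
    FILE *fp;
    long n;
    unsigned char *buf;

    fp = fopen(path, "rb");           /* binary mode matters for images */
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0L, SEEK_END) != 0 || (n = ftell(fp)) < 0L) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    buf = malloc((size_t)n);
    if (buf == NULL || fread(buf, 1, (size_t)n, fp) != (size_t)n) {
        free(buf);                    /* free(NULL) is harmless */
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *size = n;
    return buf;
}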

Each image is at least
1.1 MB in file size. My main question is: when one uses fopen(), does
this just provide a pointer (via a stream) to the file on disk, or does
this actually load the image into RAM and return a pointer to that?

fopen() does NOT read any of the file. The first fgetc() or fread()
or equivalents will read the first bufferful into memory, according to
the size of buffer that has been configured. Subsequent fgetc()
or fread() read out of the in-memory buffer until they get to the
end of it, then read another bufferful, then go back to reading
out of memory, and so on.
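
If the default buffer turns out to be small on your system, the
standard setvbuf() call lets you supply a larger one before the first
read. A minimal sketch, with an arbitrary example size rather than a
recommendation:

#include <stdio.h>

/* Open a file with a caller-supplied stdio buffer.  setvbuf() must
 * be called before any other operation on the stream; if it fails,
 * the stream keeps its default buffer and remains usable. */
static FILE *open_buffered(const char *path, char *buf, size_t bufsize)
{
    FILE *fp = fopen(path, "rb");
    if (fp != NULL)
        setvbuf(fp, buf, _IOFBF, bufsize);
    return fp;
}

Usage would be along the lines of: static char big[256 * 1024];
FILE *fp = open_buffered("frame0001.img", big, sizeof big); where the
file name and size are placeholders. Measure before tuning.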


The rest of this message gets into non-portable extensions.
Also, can anyone recommend a decent workflow for reading in files very
quickly (the files have to be parsed to retrieve the image data as well
as the header info)? My drives are fast enough but I want to make sure
that I am not slowing things down with poor file access.
I have created an app that reads an image from a file then
displays it (using OpenGL)

The OpenGL part is not part of the C standard, so you are already
using non-portable constructs. You need to decide how far into
non-portability you are willing to go. If you find that your
current fgetc() scheme isn't fast enough, and fread() isn't either,
then you should consider using system extensions such as:

- read() -- implemented on all Unix systems and many others

- open( O_DIRECT ) -- in association with read(), allows direct I/O
bypassing system buffers; not supported in all Unixes

- mmap() -- allows a file to be mapped into memory -- possibly more
common than O_DIRECT; see the first sketch after this list

- readv() -- allows scatter/gather I/O -- probably not particularly
common
- real-time filesystems such as via SGI's grio extensions and
XFS real-time volumes
- placing the files into a raw partition and handling the filesystem
management yourself

- writing your own device driver

- turning on Command Tag Queuing on SCSI devices

- pre-processing the files into raw data files that can be DMA'd
directly into a buffer suitable for passing to OpenGL

- read the files through once so as to bring their contents into
the system file cache, before starting the graphics process

- figuring out which part of your disk delivers data most quickly,
and ensuring that the files are written to that part of the disk

- when writing the files, figure out roughly how big they are
going to be, seek to that position, write a byte, and seek
back to the beginning and fill in the data (sketched after
this list). On many systems, this will result in contiguous
blocks being allocated for the storage, whereas if you did the
standard write of a buffer at a time, the buffers could end up
fragmented all over the disk

- pay attention to time needed to finish processing one file
and open the next, and to the relative positions on disk.
Ideally, when you issue the next read to disk, the disk block
you need should be the very next one that spins under the
head of the current track, so that there is no track-to-track
seek time and no time spent waiting for the appropriate sector
to spin around. This may require fetching information about the
drive geometry -- and for most SCSI disks, geometry is only
an approximation because there is a variable number of sectors
per track (outer tracks hold more.)

- issue the largest read request that you can, so that the disk
can read as many consecutive blocks as practical

- for SCSI disks, examine the bad-block information so as to
ensure that you aren't seeking wildly to a replacement block
in the middle of an important stream

- if you really get cramped for time, use a solid-state disk
if you can
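
For illustration, a minimal sketch of the mmap() route mentioned
above (POSIX-only, with error handling pared to the bone):

#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Map a whole file read-only into memory; the kernel pages data
 * in on demand, with no explicit read() calls.  The caller
 * munmap()s the region when done. */
static const unsigned char *map_file(const char *path, size_t *size)
{
    struct stat st;
    void *p;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping survives the close */
    if (p == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return p;
}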
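And a sketch of the seek-ahead allocation trick, assuming you control
how the files get written in the first place (whether it actually
yields contiguous blocks is entirely filesystem-dependent):

#include <stdio.h>

/* Pre-extend a file to its expected final size before writing the
 * real data, in the hope that the filesystem allocates contiguous
 * blocks.  expected_size must be > 0.  Returns 0 on success. */
static int preallocate(FILE *fp, long expected_size)
{
    if (fseek(fp, expected_size - 1, SEEK_SET) != 0)
        return -1;
    if (fputc(0, fp) == EOF)           /* write one byte at the far end */
        return -1;
    if (fseek(fp, 0L, SEEK_SET) != 0)  /* back to start for the real data */
        return -1;
    return 0;
}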

I'm sure there are many additional disk optimization methods.
See if your OS has a tool named 'diskperf' available for it.

You may have noticed that nearly all of these optimizations are
system and/or hardware specific. The C language itself is
not concerned with filesystem representations or I/O
optimization.
 

Chris Torek

fopen() does NOT read any of the file.

Unless, of course, it does. (Consider, e.g., FTP-based file
systems that act as an FTP client to an FTP server. Wind River
has one in VxWorks, so they do exist. There are some catches
though; and merely opening the file does not always read the
entire thing.)
The first fgetc() or fread() or equivalents will read the first
bufferful into memory, according to the size of buffer that has
been configured.

Ideally, anyway. If your C implementation has been reasonably
optimized, it should choose "reasonably good" buffer sizes
automatically as well, and your input should proceed at something
approaching maximum possible speed without any foolery at all.

Of course, there are always exceptions ... but then you have to:
The rest of this message gets into non-portable extensions.

.... get into those non-portable extensions. You also have to
experiment, as many attempts to go faster will prove to go slower
instead. Things like interrupt latency, DMA, and overlapping
read transactions can have surprising interactions. For instance,
this trick stands to reason, and does work on some systems:
- issue the largest read request that you can, so that the disk
can read as many consecutive blocks as practical

but on others it backfires badly since a request for (say) one
megabyte has to wait for the entire megabyte, while 16 requests
for 64K each can process each 64K chunk "on the fly" while the
next chunk arrives.
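
That is, something along these lines, where the 64K figure is just
the example from above and needs tuning per system:

#include <stdio.h>

#define CHUNK (64 * 1024)             /* example figure; tune per system */

/* Consume a file in moderate-sized chunks so processing of one
 * chunk can overlap the transfer of the next (on systems where
 * the stdio/OS layers read ahead).  Returns bytes read, or -1. */
static long process_in_chunks(FILE *fp)
{
    static unsigned char buf[CHUNK];
    size_t got;
    long total = 0;

    while ((got = fread(buf, 1, sizeof buf, fp)) > 0) {
        /* ... parse/copy this chunk here ... */
        total += (long)got;
    }
    return ferror(fp) ? -1L : total;
}
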
- if you really get cramped for time, use a solid-state disk
if you can

Be aware, however, that flash memory is significantly *slower*
than rotating media (though reads are not as bad as writes).
 

CBFalconer

Walter said:
.... snip ...

Generally speaking, when you know you are reading a number of
bytes, fread() is faster, as it avoids the overhead of invoking
fgetc() each time. However, if you have a good optimizer then it
might all come out the same.

Not necessarily so. getc may be faster if the purpose is to scan
for something. Using it may well provide controlled access to the
file buffer, without ever moving any data, and with no function
call overhead (getc can be a macro, while fgetc may not).

Typical expansion of a getc call might be the inline equivalent of:

if (f->n--) fillbuffer(f);
return *f->p++;

which the implementation can do, because it knows the structure of
a FILE.
 

Michael Mair

CBFalconer said:
Walter Roberson wrote:

... snip ...

Not necessarily so. getc may be faster if the purpose is to scan
for something. Using it may well provide controlled access to the
file buffer, without ever moving any data, and with no function
call overhead (getc can be a macro, while fgetc may not).

Typical expansion of a getc call might be the inline equivalent of:

if (f->n--) fillbuffer(f);
ITYM (I think you mean)
if (!(f->n--))
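
Putting the correction together, a toy version of the whole idea; the
struct members and fillbuffer() are inventions for illustration, not
any real implementation's layout:

#include <stdio.h>              /* for EOF */

/* Toy stream structure, purely for illustration.  Start with
 * n == 0 so the first call triggers a refill. */
struct toyfile {
    unsigned char *p;           /* next byte to hand out */
    int n;                      /* bytes remaining before a refill */
};

/* Assumed helper: reloads the buffer, points p at its start, sets n
 * to (bytes_read - 1) to cover the byte about to be consumed, and
 * returns 0, or returns EOF (leaving n at 0) when the file is done. */
extern int fillbuffer(struct toyfile *f);

static int toy_getc(struct toyfile *f)
{
    if (!(f->n--)) {            /* the corrected test */
        if (fillbuffer(f) == EOF)
            return EOF;
    }
    return *f->p++;
}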
 

Lawrence Kirby

Not necessarily so. getc may be faster if the purpose is to scan
for something. Using it may well provide controlled access to the
file buffer, without ever moving any data, and with no function
call overhead (getc can be a macro, while fgetc may not).

The standard requires both getc() and fgetc() to be available as a
function, e.g. (getc)(stdin) is valid. It also allows <stdio.h> to provide
macro definitions for both. However it allows a getc() macro to violate
normal function-call-like semantics by evaluating its FILE * argument more
than once. fgetc() must evaluate it exactly once so possibilities for
implementing it as a macro are limited.
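
Concretely, that means the stream argument to getc() should be free
of side effects, while fgetc() is always safe (fp_for_frame and i
below are invented for the example):

#include <stdio.h>

extern FILE *fp_for_frame[24];       /* invented for the example */

void demo(void)
{
    int i = 0;
    int c;

    c = fgetc(fp_for_frame[i++]);    /* fine: fgetc evaluates its
                                        argument exactly once */
    c = (getc)(fp_for_frame[i++]);   /* fine: the parentheses suppress
                                        any macro, forcing the function */
    /* c = getc(fp_for_frame[i++]);     risky: a getc() macro is allowed
                                        to evaluate this more than once */
    (void)c;
}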

Lawrence
 

CBFalconer

Lawrence said:
The standard requires both getc() and fgetc() to be available as a
function e.g. (getc)(stdin) is valid. It also allows <stdio.h> to
provide macro definitions for both. However it allows a getc()
macro to violate normal function-call-like semantics by evaluating
its FILE * argument more than once. fgetc() must evaluate it
exactly once so possibilities for implementing it as a macro are
limited.

Yes, your exposition is more accurate than mine, and better exposes
cases where you should or should not prefer one over the other.
However, in the context of efficiency vis-à-vis fread, the point is
that getc may well be the better choice, may break even, and in
some cases may be the poorer choice. It depends on the
implementation. If it matters, test.
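
In that spirit, a crude harness for such a test; clock() measures
processor time rather than elapsed time, and the second pass benefits
from the system file cache, so run each variant separately and treat
the numbers as rough. The file name is a placeholder:

#include <stdio.h>
#include <time.h>

int main(void)
{
    const char *path = "frame0001.img";   /* placeholder name */
    unsigned char buf[64 * 1024];
    long total;
    clock_t t0, t1;
    int c;
    FILE *fp;

    /* Pass 1: byte at a time with getc(). */
    fp = fopen(path, "rb");
    if (fp == NULL) return 1;
    t0 = clock();
    total = 0;
    while ((c = getc(fp)) != EOF)
        total++;
    t1 = clock();
    printf("getc : %ld bytes, %.3f s\n", total,
           (double)(t1 - t0) / CLOCKS_PER_SEC);
    fclose(fp);

    /* Pass 2: bulk reads with fread(); cache-warm, so only a
     * rough comparison. */
    fp = fopen(path, "rb");
    if (fp == NULL) return 1;
    t0 = clock();
    total = 0;
    {
        size_t got;
        while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
            total += (long)got;
    }
    t1 = clock();
    printf("fread: %ld bytes, %.3f s\n", total,
           (double)(t1 - t0) / CLOCKS_PER_SEC);
    fclose(fp);
    return 0;
}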
 
