An exercise in fread optimisation

Khookie · Dec 10, 2007

Hi everyone

BTW, thanks to everyone in this group, your collective advice has been
very helpful. I have to say, the C guys are definitely much nicer
than the Lisp guys ;-).

Tonight, I was thinking about freads, and how to get them faster. I
initially wrote a program like this:

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
memset(buf, '\0', BUFSIZE + 1);

FILE *file = fopen("stuff.txt", "rb");

while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
//printf("%s", buf);
memset(buf, '\0', BUFSIZE);
}
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
-- END CODE --

Average CPU time: 140-156

BTW, stuff.txt is a 200MB binary file of randomness.

I looked at the memsets, and thought - this could maybe be faster.

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

FILE *file = fopen("stuff.txt", "rb");

// Prepare
fseek(file, 0 , SEEK_END);
int size = ftell(file);
int iterations = size / BUFSIZE;
int remaining = size % BUFSIZE;
rewind(file);

/*
printf("Size: %d\n", size);
printf("Iterations: %d\n", iterations);
printf("Remainder: %d\n", remaining);
*/

// Iterate
int i;
for (i = 0; i < iterations; i++) {
fread(buf, BUFSIZE, CHAR_SIZE, file);
//printf("%s", buf);
}

fread(buf, BUFSIZE, CHAR_SIZE, file);
buf[remaining] = '\0';
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
-- END CODE --

Average CPU time: 125

What does everyone think? Could it be better?

Hope this was useful to someone.

Chris

Mark Bluemel · Dec 10, 2007

Khookie said:
Hi everyone

BTW, thanks to everyone in this group, your collective advice has been
very helpful. I have to say, the C guys are definitely much nicer
than the Lisp guys ;-).

Tonight, I was thinking about freads, and how to get them faster. I
initially wrote a program like this:

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
memset(buf, '\0', BUFSIZE + 1);

FILE *file = fopen("stuff.txt", "rb");

while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {

That's a perverse way of expressing it, IMHO... You are specifying you
want one item (CHAR_SIZE is used where a count is expected) of size
BUFSIZE. I'm not sure whether fread is guaranteed to produce consistent
data when the file is a fractional multiple of BUFSIZE bytes long.

//printf("%s", buf);
memset(buf, '\0', BUFSIZE);
} [snip]
BTW, stuff.txt is a 200MB binary file of randomness.

I looked at the memsets, and thought - this could maybe be faster.

Why are you memset()ing at all?

If it's truly random, then it could contain a '\0' anywhere, surely?

So there's no point in trying to use '\0' as a terminator.

fread() will tell you how many items it read. If you used
fread(buf,CHAR_SIZE,BUFSIZE,file)
you would get a useful return code (the number of bytes read) from it.

user923005 · Dec 10, 2007

On Dec 10 said:
#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];

This is useless:
/*

memset(buf, '\0', BUFSIZE + 1); */

FILE *file = fopen("stuff.txt", "rb");

/* Create a 16K I/O buffer: */
setvbuf ( file , NULL , _IOFBF , 1024*16 );

/* We should check the return of setvbuf, as well as the file pointer
itself, of course. */
/* I will leave that to you. */

while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
//printf("%s", buf);
memset(buf, '\0', BUFSIZE);
}
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);}
[snip]
What does everyone think? Could it be better?

1. The memset() calls are totally pointless. You set the data buffer
to zero and then set it to the wanted value via reads. This is not
different than just setting the value via reads.
2. If you want to read faster, then enlarge the read buffer via
setvbuff(). It makes a bigger difference for writing rather than
reading, but it should reduce the total number of reads.

William Pursell · Dec 10, 2007

Hi everyone

BTW, thanks to everyone in this group, your collective advice has been
very helpful. I have to say, the C guys are definitely much nicer
than the Lisp guys ;-).

Tonight, I was thinking about freads, and how to get them faster. I
initially wrote a program like this:

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

BUFSIZ is defined in stdio.h. It is selected (in part)
as the size that is most efficient for I/O on the
implementation. Unless you have a really good
reason not to, you should probably use it instead
of randomly selecting your own value.

Khookie · Dec 10, 2007

BUFSIZ is defined in stdio.h. It is selected (in part)
as the size that is most efficient for I/O on the
implementation. Unless you have a really good
reason not to, you should probably use it instead
of randomly selecting your own value.

whoops - thanks everyone for pointing mistakes out & giving me
suggestions... will definitely fix it

Chris

Khookie · Dec 11, 2007

whoops - thanks everyone for pointing mistakes out & giving me
suggestions... will definitely fix it

Chris

Here is a hopefully more sane version.

#include <stdio.h>
#include <time.h>

#define BUFSIZE BUFSIZ

int main() {
clock_t start = clock();

FILE* file = fopen("stuff.txt", "rb");

char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

int length;
while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {
printf(buf);
}
buf[length] = '\0';
printf(buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}

Keith Thompson · Dec 11, 2007

Khookie said:
Here is a hopefully more sane version.

#include <stdio.h>
#include <time.h>

#define BUFSIZE BUFSIZ

That's a bit silly. Why not use BUFSIZ directly?

int main() {

Better: int main(void)

clock_t start = clock();

FILE* file = fopen("stuff.txt", "rb");

You open the file (in binary mode), but you don't check whether the
open succeeded. (It seems odd to open a file named "stuff.txt" in
binary mode, but it's not necessarily wrong.)

char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

int length;
while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {

fread() returns a size_t result. Why not assign that result to a
size_t variable?

printf(buf);

This is dangerous. You said before that the input file contains
random data (though that's not very specific; it could be random
bytes, random printable characters, random letters, or random
Shakespeare quotations). You pass buf as the format string to printf.
If buf happens to contain a '%' character, it will likely be
interpreted as a format specifier. Kaboom.

And if "stuff.txt" contains random binary data, it likely contains
null bytes ('\0'), which will terminate the string early.

In other words, you're reading data as if it were binary, and writing
it as if it were text (with no '%' characters). Be consistent.

}
buf[length] = '\0';
printf(buf);

See above.

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);

"%d" expects an argument of type int. ``finish - start'' is of type
clock_t. And since the value returned by clock() is scaled by
CLOCKS_PER_SEC, the value printed may not mean much anyway.

I'd add a "return 0;" here.

}

You mix declarations and statements within a block. This is a
C99-specific feature, not supported by all compilers.

Richard Heathfield · Dec 11, 2007

[Yo, Keith. Just a little nit, here.]

Keith Thompson said:

That's a bit silly.

No, it isn't.

Why not use BUFSIZ directly?

Because he might want to change it later.

<snip>

Keith Thompson · Dec 11, 2007

Richard Heathfield said:
[Yo, Keith. Just a little nit, here.]

Keith Thompson said:

That's a bit silly.

Click to expand...

No, it isn't.

Why not use BUFSIZ directly?

Click to expand...

Because he might want to change it later.

Yes, good point. But in that case, I'd really want to use a name
other than BUFSIZE, something easier to distinguish from BUFSIZ.

Khookie · Dec 11, 2007

Richard Heathfield said:
Richard Heathfield said:

[Yo, Keith. Just a little nit, here.]

Click to expand...

Keith Thompson said:

Click to expand...

No, it isn't.

Click to expand...

Because he might want to change it later.

Click to expand...

Yes, good point. But in that case, I'd really want to use a name
other than BUFSIZE, something easier to distinguish from BUFSIZ.

--
Keith Thompson (The_Other_Keith) <[email protected]>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Apologies - the above code is meant for a sockets application, hence
"#define BUFSIZE BUFSIZ" might look odd, especially in the context of
printf. I'm still trying to determine what makes an optimal buffer
size for sending data via sockets.

Chris

user923005 · Dec 11, 2007

Richard Heathfield said:
Richard Heathfield said:

[Yo, Keith. Just a little nit, here.]
Keith Thompson said:
#define BUFSIZE BUFSIZ
That's a bit silly.
No, it isn't.
Why not use BUFSIZ directly?
Because he might want to change it later.

Click to expand...

Click to expand...

Yes, good point. But in that case, I'd really want to use a name
other than BUFSIZE, something easier to distinguish from BUFSIZ.

Click to expand...

--
Keith Thompson (The_Other_Keith) <[email protected]>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Click to expand...

Apologies - the above code is meant for a sockets application, hence
"#define BUFSIZE BUFSIZ" might look odd, especially in the context of
printf. I'm still trying to determine what makes an optimal buffer
size for sending data via sockets.

Ludicrously off-topic, but this is what you want for that:
http://dast.nlanr.net/Projects/Iperf/

Q for a source code in an exercise	1	Dec 15, 2023
Fibonacci	0	May 13, 2023
An interesting thing about fread().	6	Apr 13, 2006
fread/fwrite 2	18	Jun 23, 2012
fread()	6	Jan 23, 2008
Please help with C programming to save GPS reception data in Raspberry Pi.	0	Dec 8, 2022
fread()	2	Jul 12, 2004
regarding lseek and fread	2	May 27, 2007

An exercise in fread optimisation

Khookie

Mark Bluemel

user923005

William Pursell

Khookie

Khookie

Keith Thompson

Richard Heathfield

Keith Thompson

Khookie

user923005

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads