An exercise in fread optimisation

Discussion in 'C Programming' started by Khookie, Dec 10, 2007.

  1. Khookie

    Khookie Guest

    Hi everyone

    BTW, thanks to everyone in this group, your collective advice has been
    very helpful. I have to say, the C guys are definitely much nicer
    than the Lisp guys ;-).

    Tonight, I was thinking about freads, and how to get them faster. I
    initially wrote a program like this:

    -- START CODE --
    #include <stdio.h>
    #include <time.h>

    #define BUFSIZE 32768
    #define CHAR_SIZE 1

    int main(int argc, char **argv) {
    clock_t start = clock();

    char buf[BUFSIZE + 1];
    memset(buf, '\0', BUFSIZE + 1);

    FILE *file = fopen("stuff.txt", "rb");

    while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
    //printf("%s", buf);
    memset(buf, '\0', BUFSIZE);
    }
    //printf("%s", buf);

    fclose(file);

    clock_t finish = clock();
    printf("Total CPU time: %d\n", finish - start);
    }
    -- END CODE --

    Average CPU time: 140-156

    BTW, stuff.txt is a 200MB binary file of randomness.

    I looked at the memsets, and thought - this could maybe be faster.

    -- START CODE --
    #include <stdio.h>
    #include <time.h>

    #define BUFSIZE 32768
    #define CHAR_SIZE 1

    int main(int argc, char **argv) {
    clock_t start = clock();

    char buf[BUFSIZE + 1];
    buf[BUFSIZE] = '\0';

    FILE *file = fopen("stuff.txt", "rb");

    // Prepare
    fseek(file, 0 , SEEK_END);
    int size = ftell(file);
    int iterations = size / BUFSIZE;
    int remaining = size % BUFSIZE;
    rewind(file);

    /*
    printf("Size: %d\n", size);
    printf("Iterations: %d\n", iterations);
    printf("Remainder: %d\n", remaining);
    */

    // Iterate
    int i;
    for (i = 0; i < iterations; i++) {
    fread(buf, BUFSIZE, CHAR_SIZE, file);
    //printf("%s", buf);
    }

    fread(buf, BUFSIZE, CHAR_SIZE, file);
    buf[remaining] = '\0';
    //printf("%s", buf);

    fclose(file);

    clock_t finish = clock();
    printf("Total CPU time: %d\n", finish - start);
    }
    -- END CODE --

    Average CPU time: 125

    What does everyone think? Could it be better?

    Hope this was useful to someone.

    Chris
     
    Khookie, Dec 10, 2007
    #1
    1. Advertising

  2. Khookie

    Mark Bluemel Guest

    Khookie wrote:
    > Hi everyone
    >
    > BTW, thanks to everyone in this group, your collective advice has been
    > very helpful. I have to say, the C guys are definitely much nicer
    > than the Lisp guys ;-).
    >
    > Tonight, I was thinking about freads, and how to get them faster. I
    > initially wrote a program like this:
    >
    > -- START CODE --
    > #include <stdio.h>
    > #include <time.h>
    >
    > #define BUFSIZE 32768
    > #define CHAR_SIZE 1
    >
    > int main(int argc, char **argv) {
    > clock_t start = clock();
    >
    > char buf[BUFSIZE + 1];
    > memset(buf, '\0', BUFSIZE + 1);
    >
    > FILE *file = fopen("stuff.txt", "rb");
    >
    > while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {


    That's a perverse way of expressing it, IMHO... You are specifying you
    want one item (CHAR_SIZE is used where a count is expected) of size
    BUFSIZE. I'm not sure whether fread is guaranteed to produce consistent
    data when the file is a fractional multiple of BUFSIZE bytes long.

    > //printf("%s", buf);
    > memset(buf, '\0', BUFSIZE);
    > }

    [snip]
    > BTW, stuff.txt is a 200MB binary file of randomness.
    >
    > I looked at the memsets, and thought - this could maybe be faster.


    Why are you memset()ing at all?

    If it's truly random, then it could contain a '\0' anywhere, surely?

    So there's no point in trying to use '\0' as a terminator.

    fread() will tell you how many items it read. If you used
    fread(buf,CHAR_SIZE,BUFSIZE,file)
    you would get a useful return code (the number of bytes read) from it.
     
    Mark Bluemel, Dec 10, 2007
    #2
    1. Advertising

  3. Khookie

    user923005 Guest

    On Dec 10, 6:55 am, Khookie <> wrote:
    [snip]
    > #define BUFSIZE 32768
    > #define CHAR_SIZE 1
    >
    > int main(int argc, char **argv) {
    > clock_t start = clock();
    >
    > char buf[BUFSIZE + 1];


    This is useless:
    /*
    > memset(buf, '\0', BUFSIZE + 1);

    */

    >
    > FILE *file = fopen("stuff.txt", "rb");


    /* Create a 16K I/O buffer: */
    setvbuf ( file , NULL , _IOFBF , 1024*16 );

    /* We should check the return of setvbuf, as well as the file pointer
    itself, of course. */
    /* I will leave that to you. */

    > while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
    > //printf("%s", buf);
    > memset(buf, '\0', BUFSIZE);
    > }
    > //printf("%s", buf);
    >
    > fclose(file);
    >
    > clock_t finish = clock();
    > printf("Total CPU time: %d\n", finish - start);}
    >

    [snip]
    > What does everyone think? Could it be better?


    1. The memset() calls are totally pointless. You set the data buffer
    to zero and then set it to the wanted value via reads. This is not
    different than just setting the value via reads.
    2. If you want to read faster, then enlarge the read buffer via
    setvbuff(). It makes a bigger difference for writing rather than
    reading, but it should reduce the total number of reads.
     
    user923005, Dec 10, 2007
    #3
  4. On 10 Dec, 14:55, Khookie <> wrote:
    > Hi everyone
    >
    > BTW, thanks to everyone in this group, your collective advice has been
    > very helpful. I have to say, the C guys are definitely much nicer
    > than the Lisp guys ;-).
    >
    > Tonight, I was thinking about freads, and how to get them faster. I
    > initially wrote a program like this:
    >
    > -- START CODE --
    > #include <stdio.h>
    > #include <time.h>
    >
    > #define BUFSIZE 32768
    > #define CHAR_SIZE 1


    BUFSIZ is defined in stdio.h. It is selected (in part)
    as the size that is most efficient for I/O on the
    implementation. Unless you have a really good
    reason not to, you should probably use it instead
    of randomly selecting your own value.
     
    William Pursell, Dec 10, 2007
    #4
  5. Khookie

    Khookie Guest

    On Dec 11, 6:40 am, William Pursell <> wrote:
    > On 10 Dec, 14:55, Khookie <> wrote:
    >
    > > Hi everyone

    >
    > > BTW, thanks to everyone in this group, your collective advice has been
    > > very helpful. I have to say, the C guys are definitely much nicer
    > > than the Lisp guys ;-).

    >
    > > Tonight, I was thinking about freads, and how to get them faster. I
    > > initially wrote a program like this:

    >
    > > -- START CODE --
    > > #include <stdio.h>
    > > #include <time.h>

    >
    > > #define BUFSIZE 32768
    > > #define CHAR_SIZE 1

    >
    > BUFSIZ is defined in stdio.h. It is selected (in part)
    > as the size that is most efficient for I/O on the
    > implementation. Unless you have a really good
    > reason not to, you should probably use it instead
    > of randomly selecting your own value.


    whoops - thanks everyone for pointing mistakes out & giving me
    suggestions... will definitely fix it

    Chris
     
    Khookie, Dec 10, 2007
    #5
  6. Khookie

    Khookie Guest

    On Dec 11, 9:29 am, Khookie <> wrote:
    > On Dec 11, 6:40 am, William Pursell <> wrote:
    >
    >
    >
    > > On 10 Dec, 14:55, Khookie <> wrote:

    >
    > > > Hi everyone

    >
    > > > BTW, thanks to everyone in this group, your collective advice has been
    > > > very helpful. I have to say, the C guys are definitely much nicer
    > > > than the Lisp guys ;-).

    >
    > > > Tonight, I was thinking about freads, and how to get them faster. I
    > > > initially wrote a program like this:

    >
    > > > -- START CODE --
    > > > #include <stdio.h>
    > > > #include <time.h>

    >
    > > > #define BUFSIZE 32768
    > > > #define CHAR_SIZE 1

    >
    > > BUFSIZ is defined in stdio.h. It is selected (in part)
    > > as the size that is most efficient for I/O on the
    > > implementation. Unless you have a really good
    > > reason not to, you should probably use it instead
    > > of randomly selecting your own value.

    >
    > whoops - thanks everyone for pointing mistakes out & giving me
    > suggestions... will definitely fix it
    >
    > Chris


    Here is a hopefully more sane version.

    #include <stdio.h>
    #include <time.h>

    #define BUFSIZE BUFSIZ

    int main() {
    clock_t start = clock();

    FILE* file = fopen("stuff.txt", "rb");

    char buf[BUFSIZE + 1];
    buf[BUFSIZE] = '\0';

    int length;
    while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {
    printf(buf);
    }
    buf[length] = '\0';
    printf(buf);

    fclose(file);

    clock_t finish = clock();
    printf("Total CPU time: %d\n", finish - start);
    }
     
    Khookie, Dec 11, 2007
    #6
  7. Khookie <> writes:
    > On Dec 11, 9:29 am, Khookie <> wrote:

    [...]
    > Here is a hopefully more sane version.
    >
    > #include <stdio.h>
    > #include <time.h>
    >
    > #define BUFSIZE BUFSIZ


    That's a bit silly. Why not use BUFSIZ directly?

    > int main() {


    Better: int main(void)

    > clock_t start = clock();
    >
    > FILE* file = fopen("stuff.txt", "rb");


    You open the file (in binary mode), but you don't check whether the
    open succeeded. (It seems odd to open a file named "stuff.txt" in
    binary mode, but it's not necessarily wrong.)

    > char buf[BUFSIZE + 1];
    > buf[BUFSIZE] = '\0';
    >
    > int length;
    > while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {


    fread() returns a size_t result. Why not assign that result to a
    size_t variable?

    > printf(buf);


    This is dangerous. You said before that the input file contains
    random data (though that's not very specific; it could be random
    bytes, random printable characters, random letters, or random
    Shakespeare quotations). You pass buf as the format string to printf.
    If buf happens to contain a '%' character, it will likely be
    interpreted as a format specifier. Kaboom.

    And if "stuff.txt" contains random binary data, it likely contains
    null bytes ('\0'), which will terminate the string early.

    In other words, you're reading data as if it were binary, and writing
    it as if it were text (with no '%' characters). Be consistent.

    > }
    > buf[length] = '\0';
    > printf(buf);


    See above.

    > fclose(file);
    >
    > clock_t finish = clock();
    > printf("Total CPU time: %d\n", finish - start);


    "%d" expects an argument of type int. ``finish - start'' is of type
    clock_t. And since the value returned by clock() is scaled by
    CLOCKS_PER_SEC, the value printed may not mean much anyway.

    I'd add a "return 0;" here.

    > }


    You mix declarations and statements within a block. This is a
    C99-specific feature, not supported by all compilers.

    --
    Keith Thompson (The_Other_Keith) <>
    Looking for software development work in the San Diego area.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Dec 11, 2007
    #7
  8. [Yo, Keith. Just a little nit, here.]

    Keith Thompson said:

    > Khookie <> writes:
    >>
    >> #define BUFSIZE BUFSIZ

    >
    > That's a bit silly.


    No, it isn't.

    > Why not use BUFSIZ directly?


    Because he might want to change it later.

    <snip>

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Dec 11, 2007
    #8
  9. Richard Heathfield <> writes:
    > [Yo, Keith. Just a little nit, here.]
    >
    > Keith Thompson said:
    >
    >> Khookie <> writes:
    >>>
    >>> #define BUFSIZE BUFSIZ

    >>
    >> That's a bit silly.

    >
    > No, it isn't.
    >
    >> Why not use BUFSIZ directly?

    >
    > Because he might want to change it later.


    Yes, good point. But in that case, I'd really want to use a name
    other than BUFSIZE, something easier to distinguish from BUFSIZ.

    --
    Keith Thompson (The_Other_Keith) <>
    Looking for software development work in the San Diego area.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Dec 11, 2007
    #9
  10. Khookie

    Khookie Guest

    On Dec 11, 8:45 pm, Keith Thompson <> wrote:
    > Richard Heathfield <> writes:
    > > [Yo, Keith. Just a little nit, here.]

    >
    > > Keith Thompson said:

    >
    > >> Khookie <> writes:

    >
    > >>> #define BUFSIZE BUFSIZ

    >
    > >> That's a bit silly.

    >
    > > No, it isn't.

    >
    > >> Why not use BUFSIZ directly?

    >
    > > Because he might want to change it later.

    >
    > Yes, good point. But in that case, I'd really want to use a name
    > other than BUFSIZE, something easier to distinguish from BUFSIZ.
    >
    > --
    > Keith Thompson (The_Other_Keith) <>
    > Looking for software development work in the San Diego area.
    > "We must do something. This is something. Therefore, we must do this."
    > -- Antony Jay and Jonathan Lynn, "Yes Minister"


    Apologies - the above code is meant for a sockets application, hence
    "#define BUFSIZE BUFSIZ" might look odd, especially in the context of
    printf. I'm still trying to determine what makes an optimal buffer
    size for sending data via sockets.

    Chris
     
    Khookie, Dec 11, 2007
    #10
  11. Khookie

    user923005 Guest

    On Dec 11, 3:08 am, Khookie <> wrote:
    > On Dec 11, 8:45 pm, Keith Thompson <> wrote:
    >
    >
    >
    >
    >
    > > Richard Heathfield <> writes:
    > > > [Yo, Keith. Just a little nit, here.]

    >
    > > > Keith Thompson said:

    >
    > > >> Khookie <> writes:

    >
    > > >>> #define BUFSIZE BUFSIZ

    >
    > > >> That's a bit silly.

    >
    > > > No, it isn't.

    >
    > > >> Why not use BUFSIZ directly?

    >
    > > > Because he might want to change it later.

    >
    > > Yes, good point. But in that case, I'd really want to use a name
    > > other than BUFSIZE, something easier to distinguish from BUFSIZ.

    >
    > > --
    > > Keith Thompson (The_Other_Keith) <>
    > > Looking for software development work in the San Diego area.
    > > "We must do something. This is something. Therefore, we must do this."
    > > -- Antony Jay and Jonathan Lynn, "Yes Minister"

    >
    > Apologies - the above code is meant for a sockets application, hence
    > "#define BUFSIZE BUFSIZ" might look odd, especially in the context of
    > printf. I'm still trying to determine what makes an optimal buffer
    > size for sending data via sockets.


    Ludicrously off-topic, but this is what you want for that:
    http://dast.nlanr.net/Projects/Iperf/
     
    user923005, Dec 11, 2007
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Fredrik Ramsberg

    Optimisation of regexps in Perl?

    Fredrik Ramsberg, Oct 14, 2003, in forum: Perl
    Replies:
    2
    Views:
    477
    Fredrik Ramsberg
    Oct 15, 2003
  2. Roedy Green

    boolean loop optimisation

    Roedy Green, Sep 11, 2003, in forum: Java
    Replies:
    8
    Views:
    2,830
    Chris Uppal
    Sep 12, 2003
  3. sorry.no.email@post_NG.com

    Search Engine Optimisation

    sorry.no.email@post_NG.com, May 8, 2006, in forum: HTML
    Replies:
    0
    Views:
    359
    sorry.no.email@post_NG.com
    May 8, 2006
  4. Oliver Batchelor
    Replies:
    1
    Views:
    377
    Frank Schmitt
    Jul 22, 2003
  5. Agent Mulder

    Re: Code optimisation

    Agent Mulder, Aug 27, 2003, in forum: C++
    Replies:
    1
    Views:
    314
    Peter van Merkerk
    Aug 27, 2003
Loading...

Share This Page