std::istream slowness vs. std::fgetc

Discussion in 'C++' started by Jason K, May 9, 2005.

  1. Jason K

    Jason K Guest

    Let me preface this by saying this obviously isn't a C++ *language*
    issue per se; rather probably an issue relating to quality of
    implementation, unless I'm just misusing iostream...

    I wrote a function to count lines in a large file that start with
    a particular pattern. The file can contain large amounts of non-text
    crap, which may make some lines very long (so using getline() with
    a std::string isn't feasible). So I dug around looking at what
    else is available in terms of the unformatted io functions and such,
    and found istream::ignore, but the performance was crap compared
    to an implementation using <cstdio>. (I tried it on both GNU C++
    and MSVC++).

    I'm kinda new to iostream, so I'm guessing (*hoping* is more like
    it, because I'd rather use them for this) that I'm going about this
    in the wrong way (and that that is part of why the performance is
    so much worse). The other possibility is that compiler vendors
    have spent more time making the C-style stdio functions fast, and
    only implement iostream to be able to say they support the language
    standard. Still another possibility (that I hope isn't the case)
    is that iostreams are intrinsically unable to perform as well for
    some reason relating to the design (lots of creation/destruction
    of sentry objects maybe??).
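
    (For what it's worth, my possibly-wrong mental model is that every
    unformatted call -- get(), ignore(), read() -- has to do roughly the
    following before it ever touches the buffer, which is what makes me
    suspicious:)

    ----

    #include <istream>

    // Not real library code -- just my sketch of what I think every
    // unformatted input call has to do per invocation:
    void some_unformatted_call(std::istream &in)
    {
        std::istream::sentry guard(in, true); // check stream state, flush tie()
        if (guard) {
            // ... the actual work happens on in.rdbuf() ...
        }
    }

    ----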

    Before anyone mentions it, this isn't on cin, so the sync_with_stdio
    crap isn't the issue...

    Anyway, here's the implementation I came up with using iostream.
    (Btw, this is coming from memory and I'm not compiling it. So go
    easy on the nitpicks if you grok the general idea).

    ----

    // I was mildly surprised there's not something like this
    // in <algorithm>, btw.

    #include <cstddef>
    #include <fstream>
    #include <istream>
    #include <iterator>
    #include <limits>
    #include <string>

    using namespace std;

    namespace {

    template<class InputIterator, class OutputIterator, class T>
    void
    copy_n_until(InputIterator in, InputIterator end, OutputIterator out,
                 size_t limit, T until)
    {
        // Copy at most 'limit' elements, stopping early at 'end' or 'until'.
        while (limit-- && in != end && *in != until)
            *out++ = *in++;
    }

    int
    count_special_lines(istream &in)
    {
        int cnt = 0;

        do {
            // Ok, I wanted to use istream::get(char_type *, size_type),
            // but unfortunately it sets failbit if it stores no characters
            // into the output array. Which kinda sucks for me, because
            // adjacent newlines is not a real failure (and how can I tell
            // this apart from a propagated failure on the low-level read()
            // or ReadFile() or whatever system-specific function?).
            // Maybe I'm not understanding something about the state bits...
            //
            // istream::getline also didn't seem so nice because it fails
            // when it stores the max chars you told it it was allowed to
            // read, but I'm deliberately limiting that number so I don't
            // have to read a whole (potentially large) line into core.
            //
            // This iterator version seemed to be the best way to avoid
            // having to screw with failure states that aren't real I/O
            // failures, and its syntax is pretty nice. I'm not so sure
            // about the performance implications, however. I'd note,
            // though, that this certainly isn't the slowest part; just
            // running a loop on the ignore is ridiculously slow...
            //
            string buf;
            istreambuf_iterator<char> it(in), end;
            copy_n_until(it, end, back_inserter(buf), 5, '\n');
            if (buf == "Magic")
                cnt++;
        } while (in.ignore(numeric_limits<streamsize>::max(), '\n'));

        return cnt;
    }

    } // anonymous namespace

    int
    count_file(const string &file)
    {
        // A temporary stream can't bind to the non-const istream& parameter,
        // so open it as a named local.
        ifstream in(file.c_str(), ios::binary | ios::in);
        return count_special_lines(in);
    }

    ----

    The above approach worked, but it goes way slow for what it is doing
    (essentially nothing). Using cstdio runs *many* times faster:

    ----

    #include <cstdio>
    #include <cstring>
    #include <string>

    using namespace std;

    int
    count_file(const string &file)
    {
        FILE *fp;

        if (!(fp = fopen(file.c_str(), "r")))
            return 0;

        int cnt = 0;
        for (;;) {
            // fgets() stores at most 5 chars plus the terminator, so a
            // long line never gets pulled into core all at once.
            char buf[6];
            if (!fgets(buf, sizeof buf, fp))
                break;
            if (!strcmp(buf, "Magic"))
                cnt++;

            // If we already finished a line, we don't need to
            // skip to the next one.
            if (strrchr(buf, '\n'))
                continue;

            // Skip to the next line.
            int c;
            while ((c = fgetc(fp)) != EOF && c != '\n')
                ;
            if (c == EOF)
                break;
        }

        fclose(fp);
        return cnt;
    }

    ----

    Anyone know how I can rewrite the iostreams version to not suck (if
    it is possible)? Or is the iostreams lib (at least, in the GNU and
    MS implementations) not really useful for this sort of (very simple)
    real world task, if speed is even slightly an issue?
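
    The one other thing on my to-try list (untested; just a sketch of the
    idea, so the name and exact loop below are mine, not what I benchmarked)
    is to skip the istream layer for the hot loop and talk to the streambuf
    directly, since streambuf calls don't construct sentries or touch the
    stream state:

    ----

    #include <istream>
    #include <streambuf>
    #include <string>

    // Untested sketch: same counting logic as count_special_lines(), but
    // done entirely through the streambuf.
    int
    count_special_lines_sb(std::istream &in)
    {
        std::streambuf *sb = in.rdbuf();
        const int eof = std::char_traits<char>::eof();
        int cnt = 0;
        int c = sb->sgetc();

        while (c != eof) {
            // Grab at most the first 5 chars of the line.
            std::string buf;
            while (buf.size() < 5 && c != eof && c != '\n') {
                buf += static_cast<char>(c);
                c = sb->snextc();
            }
            if (buf == "Magic")
                cnt++;

            // Skip the rest of the line, including the newline itself.
            while (c != eof && c != '\n')
                c = sb->snextc();
            if (c == '\n')
                c = sb->snextc();
        }

        return cnt;
    }

    ----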

    Jason K
     
    Jason K, May 9, 2005
    #1

  2. Sorry I don't have the answer to this question, but am also interested
    in the answer. I'm kind of surprised no one has replied to this post.
    Is it that this question is better suited to a different newsgroup, since
    it could be compiler/vendor-specific or OS-specific? Or is it just that
    no one really knows the answer?

    Perhaps a simpler question would be whether the i/o streams for C++
    should perform as well as C i/o. If so, are there any tricks or preferred
    usage?
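
    (The only "tricks" I've come across are the usual ones for the standard
    streams, and I don't know whether they apply to file streams at all --
    which is part of why I'm asking. For the record, they look roughly like
    this:)

    #include <iostream>

    int main()
    {
        std::ios_base::sync_with_stdio(false); // unhook iostreams from C stdio
        std::cin.tie(0);                       // don't flush cout before reads
        // ... formatted I/O on cin/cout is usually faster after this ...
        return 0;
    }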

    Thanks,
    Kyle


    "Jason K" <> wrote in message
    news:inRfe.57245$...
    > [original post and code snipped]
     
    Kyle Kolander, May 11, 2005
    #2

  3. Jason K wrote:
    > [original post and code snipped]


    No comment on your code example.

    Most C++ stream implementations are layered on top of the STDIO
    implementation (open/close on top of fopen/fclose, etc.). This adds
    additional overhead to most I/O calls.

    For example, I wrote a quick program that reads a 16MB file
    of binary data in 8-byte chunks (intentionally inefficient).
    The program reads the binary file twice, once using an fread()
    loop, and then again using an ifstream.read() loop. The fread()
    loop finished in 0.84 seconds, and the ifstream.read() loop
    finished in 1.37 seconds. In both cases the file was opened
    in 'binary' mode (this only matters on Windows). These
    timings are only valid for my machine, which is:

    pentium 3, 450 mhz, 384 MB RAM
    SuSE Linux Pro v9.2
    GNU g++ v3.3.4 (pre 3.3.5 20040809)
    libstdc++ v3.3.4-11
    glibc v2.3.3-118

    Here's the ifstream function:

    void
    count_stream(const char * file)
    {
        int len;
        char buf[9];

        // *** open file for reading IN BINARY MODE ***
        std::ifstream dmy(file,
                          std::ios_base::in |
                          std::ios_base::binary);

        len = 8;
        buf[len] = '\0'; // just because...

        while (dmy)
        {
            // read 8 bytes of binary data
            dmy.read(buf, len);
        }

        dmy.close();

        return;
    }

    Here's the stdio version:

    void
    count_file(const char * file)
    {
        FILE *fp;
        int len;
        char buf[9];

        len = 8;
        buf[len] = '\0'; // just because...

        // *** open file for reading IN BINARY MODE ***
        if (!(fp = fopen(file, "rb")))
            return;

        for (;;)
        {
            // read 8 bytes of binary data
            if (0 == fread(buf, len, 1, fp))
                break;
        }

        fclose(fp);

        return;
    }
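
    If you want to squeeze the iostream side, two things sometimes help
    (no promises -- the buffer handling here is implementation-defined, and
    I haven't timed this variant): read through the streambuf directly,
    which skips the per-call sentry, and hand the filebuf a larger buffer
    before the file is opened. An untested sketch:

    #include <fstream>

    void
    count_stream2(const char * file)
    {
        char iobuf[64 * 1024];
        char buf[8];

        std::ifstream dmy;

        // Whether pubsetbuf() is honored is up to the implementation,
        // and it has to be called before open() to have any chance.
        dmy.rdbuf()->pubsetbuf(iobuf, sizeof iobuf);
        dmy.open(file, std::ios_base::in | std::ios_base::binary);

        // sgetn() goes straight to the buffer: no sentry construction,
        // no gcount() bookkeeping per call.
        while (dmy.rdbuf()->sgetn(buf, sizeof buf) == (std::streamsize)sizeof buf)
        {
            // ... use the 8 bytes ...
        }
    }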

    Regards,
    Larry

     
    Larry I Smith, May 12, 2005
    #3
  4. Alex Vinokur

    Alex Vinokur Guest

    "Kyle Kolander" <> wrote in message news:Kfuge.3$...

    [snip]
    > Perhaps a simpler question would be whether the i/o streams for C++
    > should perform as well as C i/o. If so, are there any tricks or preferred
    > usage?

    [snip]

    For instance, see

    "<Summary> Simple C/C++ Perfometer: Reading file to string (Versions 1.x)" at
    http://groups-beta.google.com/group/perfo/msg/530fae8e5e065030

    "<Release> Simple C/C++ Perfometer: Copying Files (Versions 4.x)" at
    http://groups-beta.google.com/group/perfo/msg/8a74465da4c4e9bb


    --
    Alex Vinokur
    email: alex DOT vinokur AT gmail DOT com
    http://mathforum.org/library/view/10978.html
    http://sourceforge.net/users/alexvn
     
    Alex Vinokur, May 12, 2005
    #4
  5. Abecedarian

    Abecedarian Guest

    > [Alex Vinokur's Perfometer links snipped]

    Impressive - thank you!

    Abecedarian, May 12, 2005
    #5
  6. Jeff Flinn

    Jeff Flinn Guest

    "Abecedarian" <> wrote in message
    news:...
    > Alex Vinokur wrote:
    >> For instance, see
    >>
    >> "<Summary> Simple C/C++ Perfometer: Reading file to string (Versions 1.x)" at
    >> http://groups-beta.google.com/group/perfo/msg/530fae8e5e065030
    >>
    >> "<Release> Simple C/C++ Perfometer: Copying Files (Versions 4.x)" at
    >> http://groups-beta.google.com/group/perfo/msg/8a74465da4c4e9bb

    >
    > Impressive - thank you!


    Hardly. Just looking at the last line of the summary:


    ###############
    ### Summary ###
    ###############


    =========================================
    * Performance
    * Comparative Performance Measurement
    -----------------------------------------
    * Tool : Simple C/C++ Perfometer
    * Algorithm: Reading file into string
    * Language : C++
    * Version : F2S-1.0
    -----------------------------------------
    * Environment : Windows 2000 Professional
    Intel(R) Celeron(R) CPU 1.70 GHz
    Cygwin
    * Compilers : GNU g++ 3.3.3
    * Optimization : No optimization
    ~~~~~~~~~~~~~~~

    Comparing C to un-optimized C++ is not a valid comparison.

    Jeff Flinn
     
    Jeff Flinn, May 12, 2005
    #6
  7. Jeff Flinn

    Jeff Flinn Guest

    "Jason K" <> wrote in message
    news:inRfe.57245$...
    > Let me preface this by saying this obviously isn't a C++ *language*
    > issue per se; rather probably an issue relating to quality of
    > implementation, unless I'm just misusing iostream...
    > IMO, yes. IOStreams performance is awful, and always will be, thanks to
    > the requirements imposed on it.


    Below is a reply to a similar posting in microsoft.public.vc.stl by Stephen
    Howe.

    Jeff Flinn

    -----------------------------------------------------------------------------

    Several things:

    1. Have you seen this Carl?
    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1666.pdf
    It talks about efficient IOStreams

    2. Part of the problem is the fact that the C portions of the library are by
    Microsoft and the C++ portions are by Dinkumware. And the C++ portions "sit"
    on top of the C portions.
    Inevitably, C is faster because the design favours it. I would not have
    designed things this way.

    Instead, fopen() and fstream would call a common internal function to do the
    opening of files.
    I would try and make it so it is possible to share the same buffer (unless the
    standards make that impossible).

    Stephen Howe



    >
    > > b) Are there any getarounds?

    >
    > Don't use IOStreams.
    >
    > -cd
     
    Jeff Flinn, May 12, 2005
    #7
