File Read Performance Issue

Discussion in 'C++' started by Mike Copeland, Aug 17, 2013.

  1. I am developing an application that reads and stores data from 2
    large text files. One file (5750kb) has ~160,000 records and the other
    (23,000kb) has ~330,000 records. The program reads both files and
    converts/stores their data into vectors. Also, some indices to unique
    records are developed (much like a telephone book), so that searches can
    be efficiently done.
    The reads at program start take a long time (~2 minutes), and
    although I use buffers of 4096 size, I can't see other ways to improve
    this aspect of the program's performance. The data this program uses
    will continue to grow, so I am concerned about its viability. Any
    thoughts? TIA
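    A minimal sketch of this kind of loader and "telephone book" index
    (the Record layout, the fixed-width key field, and the file name are
    illustrative assumptions, not the actual program):

        #include <cstddef>
        #include <fstream>
        #include <string>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        struct Record {              // assumed layout; the real format isn't shown
            std::string key;         // unique key for the "telephone book" index
            std::string rest;        // remainder of the line
        };

        // Read every line of one file into a vector and build an index from
        // key to position, so later searches are O(1) on average.
        bool loadFile(const char* path,
                      std::vector<Record>& records,
                      std::unordered_map<std::string, std::size_t>& index)
        {
            std::ifstream in(path);
            if (!in)
                return false;

            std::string line;
            while (std::getline(in, line)) {
                Record r;
                r.key  = line.substr(0, 10);   // assumption: first 10 chars are the key
                r.rest = line.size() > 10 ? line.substr(10) : std::string();
                index.emplace(r.key, records.size());
                records.push_back(std::move(r));
            }
            return true;
        }
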
    Mike Copeland, Aug 17, 2013
    #1

  2. Mike Copeland

    Ian Collins Guest

    Mike Copeland wrote:
    > I am developing an application that reads and stores data from 2
    > large text files. One file (5750kb) has ~160,000 records and the other
    > (23,000kb) has ~330,000 records. The program reads both files and
    > converts/stores their data into vectors. Also, some indices to unique
    > records are developed (much like a telephone book), so that searches can
    > be efficiently done.
    > The reads at program start take a long time (~2 minutes), and
    > although I use buffers of 4096 size, I can't see other ways to improve
    > this aspect of the program's performance. The data this program uses
    > will continue to grow, so I am concerned about its viability. Any
    > thoughts? TIA


    Where does your profiler tell you it is spending its time? There are
    too many different possibilities to give a decent answer without knowing
    where the bottleneck is.

    --
    Ian Collins
    Ian Collins, Aug 17, 2013
    #2

  3. In article <>,
    says...
    > > I am developing an application that reads and stores data from 2
    > > large text files. One file (5750kb) has ~160,000 records and the other
    > > (23,000kb) has ~330,000 records. The program reads both files and
    > > converts/stores their data into vectors. Also, some indices to unique
    > > records are developed (much like a telephone book), so that searches can
    > > be efficiently done.
    > > The reads at program start take a long time (~2 minutes), and
    > > although I use buffers of 4096 size, I can't see other ways to improve
    > > this aspect of the program's performance. The data this program uses
    > > will continue to grow, so I am concerned about its viability. Any
    > > thoughts? TIA

    >
    > Where does your profiler tell you it is spending its time? There are
    > too many different possibilities to give a decent answer without knowing
    > where the bottleneck is.


    I don't have a profiler. However, I have tried 3 different "read"
    techniques (getline, fgets, a hybrid fgets) and have found that there is
    a base I/O factor that doesn't change. This test of mine eliminated all
    of the "storing" process, and the raw I/O numbers seem to indicate the
    problem is in the I/O runtime area.
    FWIW, using "getline" doubles the raw test times of the "fgets"
    techniques - there's a real extra overhead with this technique.
    I'm seeking here some sort of dramatic improvement to the I/O
    process, and I don't know if there's a better/faster way to read lots of
    text data records.
    Mike Copeland, Aug 17, 2013
    #3
  4. Mike Copeland

    Ian Collins Guest

    Mike Copeland wrote:
    > In article <>,
    > says...
    >>> I am developing an application that reads and stores data from 2
    >>> large text files. One file (5750kb) has ~160,000 records and the other
    >>> (23,000kb) has ~330,000 records. The program reads both files and
    >>> converts/stores their data into vectors. Also, some indices to unique
    >>> records are developed (much like a telephone book), so that searches can
    >>> be efficiently done.
    >>> The reads at program start take a long time (~2 minutes), and
    >>> although I use buffers of 4096 size, I can't see other ways to improve
    >>> this aspect of the program's performance. The data this program uses
    >>> will continue to grow, so I am concerned about its viability. Any
    >>> thoughts? TIA

    >>
    >> Where does your profiler tell you it is spending its time? There are
    >> too many different possibilities to give a decent answer without knowing
    >> where the bottleneck is.

    >
    > I don't have a profiler.


    Doesn't every tool-set have one?

    > However, I have tried 3 different "read"
    > techniques (getline, fgets, a hybrid fgets) and have found that there is
    > a base I/O factor that doesn't change. This test of mine eliminated all
    > of the "storing" process, and the raw I/O numbers seem to indicate the
    > problem is in the I/O runtime area.
    > FWIW, using "getline" doubles the raw test times of the "fgets"
    > techniques - there's a real extra overhead with this technique.
    > I'm seeking here some sort of dramatic improvement to the I/O
    > process, and I don't know if there's a better/faster way to read lots of
    > text data records.


    You're doing exactly what I hinted not to do: random guessing. As I
    said, there are many different possible causes of your slowness. For
    example, are you resizing vectors too often and running into allocation
    and copy overheads? Is your I/O subsystem slow?

    Without measuring, you are taking stabs in the dark and most likely
    wasting time fixing something that isn't broken.

    If you can't profile (which I'm sure you can), try something like just
    reading and not saving the data. Then try reading all the data into a
    temporary buffer and then loading up your real structures. Time both.

    --
    Ian Collins
    Ian Collins, Aug 17, 2013
    #4
  5. Mike Copeland

    Nobody Guest

    On Fri, 16 Aug 2013 20:45:30 -0700, Mike Copeland wrote:

    > I don't have a profiler. However, I have tried 3 different "read"
    > techniques (getline, fgets, a hybrid fgets) and have found that there is a
    > base I/O factor that doesn't change. This test of mine eliminated all of
    > the "storing" process, and the raw I/O numbers seem to indicate the
    > problem is in the I/O runtime area.
    > FWIW, using "getline" doubles the raw test times of the "fgets"
    > techniques - there's a real extra overhead with this technique.
    > I'm seeking here some sort of dramatic improvement to the I/O
    > process, and I don't know if there's a better/faster way to read lots of
    > text data records.


    Other things to try include (a rough sketch of combining 4 and 1
    follows the list):

    1. Using significantly larger buffers.
    2. Using the OS' "native" I/O functions e.g. read() or ReadFile().
    3. Both 1 and 2.
    4. Using C's fread().
    5. 4 and 1.
    6. 4, with setvbuf(fp, NULL, _IONBF, 0);
    7. Memory mapping, e.g. mmap() or CreateFileMapping().
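    A rough, non-authoritative sketch of combining options 4 and 1 (plain
    fread() into a large application-side buffer; the 1 MB chunk size and the
    byte-counting body are arbitrary placeholders, and splitting chunks back
    into records is left out):

        #include <cstddef>
        #include <cstdio>
        #include <vector>

        // Read the whole file in large chunks with C stdio and report how
        // many bytes were read; returns -1 if the file can't be opened.
        long long readInChunks(const char* path)
        {
            std::FILE* fp = std::fopen(path, "rb");
            if (!fp)
                return -1;

            std::vector<char> buf(1 << 20);   // 1 MB application buffer
            long long total = 0;
            std::size_t n;
            while ((n = std::fread(buf.data(), 1, buf.size(), fp)) > 0)
                total += static_cast<long long>(n);   // parse buf[0..n) here

            std::fclose(fp);
            return total;
        }
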
    Nobody, Aug 17, 2013
    #5
  6. Mike Copeland

    Ike Naar Guest

    On 2013-08-17, Mike Copeland <> wrote:
    > In article <>,
    > says...
    >> > I am developing an application that reads and stores data from 2
    >> > large text files. One file (5750kb) has ~160,000 records and the other
    >> > (23,000kb) has ~330,000 records. The program reads both files and
    >> > converts/stores their data into vectors. Also, some indices to unique
    >> > records are developed (much like a telephone book), so that searches can
    >> > be efficiently done.
    >> > The reads at program start take a long time (~2 minutes), and
    >> > although I use buffers of 4096 size, I can't see other ways to improve
    >> > this aspect of the program's performance. The data this program uses
    >> > will continue to grow, so I am concerned about its viability. Any
    >> > thoughts? TIA

    >>
    >> Where does your profiler tell you it is spending its time? There are
    >> too many different possibilities to give a decent answer without knowing
    >> where the bottleneck is.

    >
    > I don't have a profiler. However, I have tried 3 different "read"
    > techniques (getline, fgets, a hybrid fgets) and have found that there is
    > a base I/O factor that doesn't change. This test of mine eliminated all
    > of the "storing" process, and the raw I/O numbers seem to indicate the
    > problem is in the I/O runtime area.
    > FWIW, using "getline" doubles the raw test times of the "fgets"
    > techniques - there's a real extra overhead with this technique.
    > I'm seeking here some sort of dramatic improvement to the I/O
    > process, and I don't know if there's a better/faster way to read lots of
    > text data records.


    Is your input file stored on a very slow medium?
    On the machine I'm using right now, reading a 23MB file from disk,
    using getline in a loop, takes less than a second.
    You could try to time some other utility program, such as

    time wc -l <your_input_file

    or

    time cat <your_input_file >/dev/null

    to get an idea how long reading your input file should reasonably take.
    Ike Naar, Aug 17, 2013
    #6
  7. On 17.08.13 03.41, Mike Copeland wrote:
    > I am developing an application that reads and stores data from 2
    > large text files. One file (5750kb) has ~160,000 records and the other
    > (23,000kb) has ~330,000 records. The program reads both files and
    > converts/stores their data into vectors. Also, some indices to unique
    > records are developed (much like a telephone book), so that searches can
    > be efficiently done.
    > The reads at program start take a long time (~2 minutes), and
    > although I use buffers of 4096 size,


    4096 bytes is no buffer nowadays. The break-even point of modern HDDs is
    above 1 MB. So if you think that your task is I/O bound (low CPU usage),
    then significantly increase the buffer size.

    If your task is CPU bound (high CPU load) then this won't help at all.
    Depending on your access pattern and the file system driver, many
    platforms detect that you intend to read the entire file and implicitly
    use a large read-ahead cache. In that case you need no larger buffer.

    > I can't see other ways to improve
    > this aspect of the program's performance. The data this program uses
    > will continue to grow, so I am concerned about its viability. Any
    > thoughts? TIA


    Do you have linear access patterns on your data structures? That will
    likely cause O(n^2) performance, which is really bad. This happens e.g.
    if you insert into sorted vectors.

    What about allocations? I guess each record is stored in a separate
    vector instance. Is each instance allocated separately? Depending on the
    performance of your runtime's allocators, different results are
    possible. I have seen very smart implementations as well as very bad ones.

    You could further improve performance if you read the two files in parallel.
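
    A rough sketch of that last idea with C++11 std::async (the loadLines()
    helper and the file names are placeholders; whether this helps at all
    depends on whether the load is disk bound or CPU bound):

        #include <fstream>
        #include <future>
        #include <string>
        #include <vector>

        // Placeholder loader: read all lines of one file. In the real program
        // this would do the parsing/conversion as well.
        static std::vector<std::string> loadLines(const char* path)
        {
            std::vector<std::string> lines;
            std::ifstream in(path);
            std::string line;
            while (std::getline(in, line))
                lines.push_back(line);
            return lines;
        }

        int main()
        {
            // std::launch::async forces each load into its own thread.
            auto f1 = std::async(std::launch::async, loadLines, "file1.txt");
            auto f2 = std::async(std::launch::async, loadLines, "file2.txt");

            std::vector<std::string> a = f1.get();   // wait for both results
            std::vector<std::string> b = f2.get();
            // ... build indices from a and b as before ...
        }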


    Marcel
    Marcel Müller, Aug 17, 2013
    #7
  8. Mike Copeland wrote:
    > I am developing an application that reads and stores data from 2
    > large text files. One file (5750kb) has ~160,000 records
    > and the other
    > (23,000kb) has ~330,000 records. The program reads both files and
    > converts/stores their data into vectors. Also, some indices to unique
    > records are developed (much like a telephone book),
    > so that searches can
    > be efficiently done.
    > The reads at program start take a long time (~2 minutes), and
    > although I use buffers of 4096 size, I can't see other ways to improve
    > this aspect of the program's performance. The data this program uses
    > will continue to grow, so I am concerned about its viability.


    Probably your program has a bug or uses a very slow medium, because
    modern hard disks and flash memories can be read at tens of MB per second.
    However, memory-mapped files may improve the performance of your program:
    http://en.wikibooks.org/wiki/Optimi...on_techniques/Input/Output#Memory-mapped_file
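
    A POSIX-flavoured sketch of the memory-mapping approach (Windows would
    use CreateFileMapping()/MapViewOfFile() instead; error handling is
    minimal, empty files aren't handled, and the line-counting loop is just
    a stand-in for real parsing):

        #include <fcntl.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <sys/types.h>
        #include <unistd.h>

        // Map the whole file read-only and walk it as one big char array.
        long countLinesMapped(const char* path)
        {
            int fd = open(path, O_RDONLY);
            if (fd < 0)
                return -1;

            struct stat st;
            if (fstat(fd, &st) != 0) { close(fd); return -1; }

            void* mem = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            if (mem == MAP_FAILED) { close(fd); return -1; }

            const char* p = static_cast<const char*>(mem);
            long lines = 0;
            for (off_t i = 0; i < st.st_size; ++i)
                if (p[i] == '\n')
                    ++lines;

            munmap(mem, st.st_size);
            close(fd);
            return lines;
        }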

    --

    Carlo Milanesi
    http://carlomilanesi.wordpress.com/
    Carlo Milanesi, Aug 17, 2013
    #8
  9. Mike Copeland

    Jorgen Grahn Guest

    On Sat, 2013-08-17, Marcel Müller wrote:
    > On 17.08.13 03.41, Mike Copeland wrote:
    >> I am developing an application that reads and stores data from 2
    >> large text files. One file (5750kb) has ~160,000 records and the other
    >> (23,000kb) has ~330,000 records. The program reads both files and
    >> converts/stores their data into vectors. Also, some indices to unique
    >> records are developed (much like a telephone book), so that searches can
    >> be efficiently done.
    >> The reads at program start take a long time (~2 minutes), and
    >> although I use buffers of 4096 size,

    >
    > 4096 bytes is no buffer nowadays. The break-even point of modern HDDs is
    > above 1 MB. So if you think that your task is I/O bound (low CPU usage),
    > then significantly increase the buffer size.


    Depends on what he means by "buffers of 4096 size". If I naively use
    iostreams to read a text file line by line, it maps to reading 8191
    bytes via the kernel interface, read(2). I hope the kernel at that
    point has read /more/ than 8191 bytes into its cache.

    > If your task is CPU bound (high CPU load) then this won't help at all.


    Yeah. As we all know, his problem lies somewhere else. His 2 minutes
    is four hundred times slower than the 280 ms I measure on my ancient
    hardware -- no misuse of the I/O facilities can make such a radical
    difference.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 17, 2013
    #9
  10. On Fri, 16 Aug 2013 18:41:48 -0700
    (Mike Copeland) wrote:

    > I am developing an application that reads and stores data from 2
    > large text files. One file (5750kb) has ~160,000 records and the
    > other (23,000kb) has ~330,000 records. The program reads both files
    > and converts/stores their data into vectors. Also, some indices to
    > unique records are developed (much like a telephone book), so that
    > searches can be efficiently done.
    > The reads at program start take a long time (~2 minutes), and
    > although I use buffers of 4096 size, I can't see other ways to
    > improve this aspect of the program's performance. The data this
    > program uses will continue to grow, so I am concerned about its
    > viability. Any thoughts? TIA


    This data is tiny by today's standards and should easily be read
    within seconds (lower single digits, if that). If you have a slowdown
    of this magnitude, your problems are pretty much guaranteed to be
    on the algorithmic side, not IO-related. You most likely have a
    quadratic algorithm hidden in there somewhere and that's the one
    you have to replace with something sane.

    Without access to your code, it's impossible to be more specific.
    Your "indices to unique records" sound suspicious, for instance,
    because dumb "unique" algorithms are quadratic; try whether temporarily
    disabling those fixes the problem and continue from there. Apart
    from that, if you read into vectors (as in std::vector) via push_back,
    this is not quadratic and should be fine, but beware if you do something
    like resize() or reserve() per record, because this _will_ degrade
    filling the vector into quadratic complexity. A final candidate would
    be your record structures: if these are very expensive to copy and
    you're allowed to use C++11 features, you might consider adding
    move semantics (if your records are just a bunch of POD, don't
    bother).
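
    To illustrate the reserve()-per-record point, a small sketch (the Record
    type and input are made up; on typical implementations the first loop
    reallocates on every insertion and degrades to quadratic time, while the
    second stays linear):

        #include <string>
        #include <vector>

        struct Record { std::string line; };   // made-up record type

        // Anti-pattern: reserve(size() + 1) per record defeats the vector's
        // amortized growth -- each call typically triggers a reallocation
        // plus a full copy/move of everything stored so far.
        void fillSlowly(std::vector<Record>& v,
                        const std::vector<std::string>& input)
        {
            for (const std::string& s : input) {
                v.reserve(v.size() + 1);        // don't do this
                v.push_back(Record{s});
            }
        }

        // Fine: reserve once (if the count is known or can be estimated),
        // then let push_back grow the vector in amortized constant time.
        void fillQuickly(std::vector<Record>& v,
                         const std::vector<std::string>& input)
        {
            v.reserve(input.size());
            for (const std::string& s : input)
                v.push_back(Record{s});
        }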



    Andreas
    --
    Dr. Andreas Dehmel Ceterum censeo
    FLIPME(ed.enilno-t@nouqraz) Microsoft esse delendam
    http://www.zarquon.homepage.t-online.de (Cato the Much Younger)
    Andreas Dehmel, Aug 17, 2013
    #10
  11. Mike Copeland

    Jorgen Grahn Guest

    On Mon, 2013-08-19, Juha Nieminen wrote:

    > Unfortunately C++ streams tend to be significantly slower than the
    > C equivalents in the vast majority of implementations (every single
    > one I have ever used.) With small amounts of data it doesn't matter,
    > but when the data amount increases, it starts to show.
    >
    > If you are using C++ streams, switching to (properly used) C I/O
    > functions could well cut that time to 1 minute or less. (Further
    > optimization will have to be done at a higher level.)


    I was going to challenge that "but iostreams are slower!" idea but it
    looks like that here, too. On my Linux box, a line counting loop is
    about twice as fast with fgets() compared to std::getline(stream,
    string&). I was expecting a 20% difference or something.

    On the other hand, both are pretty fast: 100--200 nanoseconds per line.

    Interestingly, the fgets() version makes twice as many read() calls.
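
    For reference, the two counting loops being compared look roughly like
    this (a sketch; the timing harness is omitted and the 4096-byte fgets
    buffer is an arbitrary choice):

        #include <cstdio>
        #include <cstring>
        #include <fstream>
        #include <string>

        // Count lines with C stdio.
        long countWithFgets(const char* path)
        {
            std::FILE* fp = std::fopen(path, "r");
            if (!fp) return -1;
            char buf[4096];
            long n = 0;
            while (std::fgets(buf, sizeof buf, fp))
                if (std::strchr(buf, '\n'))   // count only complete lines
                    ++n;
            std::fclose(fp);
            return n;
        }

        // Count lines with iostreams.
        long countWithGetline(const char* path)
        {
            std::ifstream in(path);
            if (!in) return -1;
            std::string line;
            long n = 0;
            while (std::getline(in, line))
                ++n;
            return n;
        }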

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 19, 2013
    #11
  12. On 17.08.13 15.12, Jorgen Grahn wrote:
    >>> The reads at program start take a long time (~2 minutes), and
    >>> although I use buffers of 4096 size,

    >>
    >> 4096 bytes is no buffer nowadays. The break-even point of modern HDDs is
    >> above 1 MB. So if you think that your task is I/O bound (low CPU usage),
    >> then significantly increase the buffer size.

    >
    > Depends on what he means by "buffers of 4096 size". If I naively use
    > iostreams to read a text file line by line, it maps to reading 8191
    > bytes via the kernel interface, read(2). I hope the kernel at that
    > point has read /more/ than 8191 bytes into its cache.


    Yes, but for read-ahead the kernel needs to estimate which blocks you
    are going to read next. This works best on platforms where the disk
    cache is located in the filesystem driver. It works less well when the
    cache is part of the block device driver, because the next LBA is not
    necessarily the next block of your file. And it usually works very badly
    if you do seek operations.

    >> If your task is CPU bound (high CPU load) then this won't help at all.

    >
    > Yeah. As we all know, his problem lies somewhere else. His 2 minutes
    > is four hundred times slower than the 280 ms I measure on my ancient
    > hardware -- no misuse of the I/O facilities can make such a radical
    > difference.


    Anything is possible, but where did you get the numbers?


    Marcel
    Marcel Müller, Aug 19, 2013
    #12
  13. Mike Copeland

    Jorgen Grahn Guest

    On Mon, 2013-08-19, Marcel Müller wrote:
    > On 17.08.13 15.12, Jorgen Grahn wrote:
    >>>> The reads at program start take a long time (~2 minutes), and
    >>>> although I use buffers of 4096 size,
    >>>
    >>> 4096 bytes is no buffer nowadays. The break-even point of modern HDDs is
    >>> above 1 MB. So if you think that your task is I/O bound (low CPU usage),
    >>> then significantly increase the buffer size.

    >>
    >> Depends on what he means by "buffers of 4096 size". If I naively use
    >> iostreams to read a text file line by line, it maps to reading 8191
    >> bytes via the kernel interface, read(2). I hope the kernel at that
    >> point has read /more/ than 8191 bytes into its cache.

    >
    > Yes, but for read-ahead the kernel needs to estimate which blocks you
    > are going to read next. This works best on platforms where the disk
    > cache is located in the filesystem driver. It works less well when the
    > cache is part of the block device driver, because the next LBA is not
    > necessarily the next block of your file. And it usually works very badly
    > if you do seek operations.
    >
    >>> If your task is CPU bound (high CPU load) then this won't help at all.

    >>
    >> Yeah. As we all know, his problem lies somewhere else. His 2 minutes
    >> is four hundred times slower than the 280 ms I measure on my ancient
    >> hardware -- no misuse of the I/O facilities can make such a radical
    >> difference.

    >
    > Anything is possible, but where did you get the numbers?


    280ms? Why -- do you doubt it? I simply found a file of the right size
    and (in Unix) timed 'cat file >/dev/null' or 'wc -l file' or
    something. The file was in cache at that point.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 20, 2013
    #13
  14. Mike Copeland

    Jorgen Grahn Guest

    On Mon, 2013-08-19, Paavo Helde wrote:
    > Jorgen Grahn <> wrote in
    > news::
    >
    >> I was going to challenge that "but iostreams are slower!" idea but it
    >> looks like that here, too. On my Linux box, a line counting loop is
    >> about twice as fast with fgets() compared to std::getline(stream,
    >> string&). I was expecting a 20% difference or something.

    >
    > Yes, the iostreams slowdown factor is about 2 also in my experience, also
    > for other usage than std::getline(). For example, concatenating large
    > strings from pieces with std::string::operator+= seems to be about twice
    > as fast as using std::ostringstream. My guess is this is so because of
    > massive virtual function calls.


    Hm, they always say iostreams are designed so that you're not doing a
    virtual function call per character -- that buffer thing. So that
    theory seems a bit unlikely to me. But I don't have a better one.

    I suspected a performance hit from locale usage, but if that's involved
    switching to the plain "C" locale didn't help.

    Lastly: ten years ago I could believe my C++ (libstdc++, Gnu/Linux)
    wasn't optimized. But by now surely someone has tried to fix at least
    this common case: reading std::cin line by line? Strange.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 20, 2013
    #14
  15. Mike Copeland

    Melzzzzz Guest

    On 20 Aug 2013 08:26:47 GMT
    Jorgen Grahn <> wrote:

    > On Mon, 2013-08-19, Paavo Helde wrote:
    > > Jorgen Grahn <> wrote in
    > > news::
    > >
    > >> I was going to challenge that "but iostreams are slower!" idea but
    > >> it looks like that here, too. On my Linux box, a line counting
    > >> loop is about twice as fast with fgets() compared to
    > >> std::getline(stream, string&). I was expecting a 20% difference or
    > >> something.

    > >
    > > Yes, the iostreams slowdown factor is about 2 also in my
    > > experience, also for other usage than std::getline(). For example,
    > > concatenating large strings from pieces with
    > > std::string::operator+= seems to be about twice as fast as using
    > > std::ostringstream. My guess is this is so because of massive
    > > virtual function calls.

    >
    > Hm, they always say iostreams are designed so that you're not doing a
    > virtual function call per character -- that buffer thing. So that
    > theory seems a bit unlikely to me. But I don't have a better one.
    >
    > I suspected a performance hit from locale usage, but if that's
    > involved switching to the plain "C" locale didn't help.
    >
    > Lastly: ten years ago I could believe my C++ (libstdc++, Gnu/Linux)
    > wasn't optimized. But by now surely someone has tried to fix at least
    > this common case: reading std::cin line by line? Strange.


    With the g++ standard library implementation you have to call
    std::ios_base::sync_with_stdio(false) in order to get good
    performance.
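
    In code that is just one call before the reading loop, e.g. (a minimal
    line-counting example):

        #include <iostream>
        #include <string>

        int main()
        {
            // Decouple C++ streams from C stdio; without this, reading
            // std::cin line by line can be dramatically slower with libstdc++.
            std::ios_base::sync_with_stdio(false);

            std::string line;
            long n = 0;
            while (std::getline(std::cin, line))
                ++n;
            std::cout << n << " lines\n";
        }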

    --
    Sig.
    Melzzzzz, Aug 20, 2013
    #15
  16. Mike Copeland

    Jorgen Grahn Guest

    On Tue, 2013-08-20, Melzzzzz wrote:
    > On 20 Aug 2013 08:26:47 GMT
    > Jorgen Grahn <> wrote:
    >
    >> On Mon, 2013-08-19, Paavo Helde wrote:
    >> > Jorgen Grahn <> wrote in
    >> > news::
    >> >
    >> >> I was going to challenge that "but iostreams are slower!" idea but
    >> >> it looks like that here, too. On my Linux box, a line counting
    >> >> loop is about twice as fast with fgets() compared to
    >> >> std::getline(stream, string&). I was expecting a 20% difference or
    >> >> something.
    >> >
    >> > Yes, the iostreams slowdown factor is about 2 also in my
    >> > experience, also for other usage than std::getline(). For example,
    >> > concatenating large strings from pieces with
    >> > std::string::operator+= seems to be about twice as fast as using
    >> > std::ostringstream. My guess is this is so because of massive
    >> > virtual function calls.

    >>
    >> Hm, they always say iostreams are designed so that you're not doing a
    >> virtual function call per character -- that buffer thing. So that
    >> theory seems a bit unlikely to me. But I don't have a better one.
    >>
    >> I suspected a performance hit from locale usage, but if that's
    >> involved switching to the plain "C" locale didn't help.
    >>
    >> Lastly: ten years ago I could believe my C++ (libstdc++, Gnu/Linux)
    >> wasn't optimized. But by now surely someone has tried to fix at least
    >> this common case: reading std::cin line by line? Strange.

    >
    > With the g++ standard library implementation you have to call
    > std::ios_base::sync_with_stdio(false) in order to get good
    > performance.


    I was just about to say "of course I had that one; it's in main() in
    all my programs!" ... but I might have forgotten this time. I'll
    look into it tonight.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 20, 2013
    #16
  17. Mike Copeland

    Melzzzzz Guest

    On Tue, 20 Aug 2013 07:13:57 -0500, Paavo Helde wrote:

    > Melzzzzz <> wrote in news:kuv9bt$4l2$:
    >
    >> With g++ stdlib implementation you have to
    >> std::ios_base::sync_with_stdio(false) in order to get good performance.

    >
    > This would not explain the difference between std::ostringstream vs
    > std::string concatenation, would it?
    >


    No. This is for fast cin/cout operations, e.g. reading a file from
    std::cin with getline.

    --
    Sig.
    Melzzzzz, Aug 20, 2013
    #17
  18. Mike Copeland

    Jorgen Grahn Guest

    On Tue, 2013-08-20, Jorgen Grahn wrote:
    > On Tue, 2013-08-20, Melzzzzz wrote:
    >> On 20 Aug 2013 08:26:47 GMT
    >> Jorgen Grahn <> wrote:
    >>
    >>> On Mon, 2013-08-19, Paavo Helde wrote:
    >>> > Jorgen Grahn <> wrote in
    >>> > news::
    >>> >
    >>> >> I was going to challenge that "but iostreams are slower!" idea but
    >>> >> it looks like that here, too. On my Linux box, a line counting
    >>> >> loop is about twice as fast with fgets() compared to
    >>> >> std::getline(stream, string&). I was expecting a 20% difference or
    >>> >> something.
    >>> >
    >>> > Yes, the iostreams slowdown factor is about 2 also in my
    >>> > experience, also for other usage than std::getline(). For example,
    >>> > concatenating large strings from pieces with
    >>> > std::string::operator+= seems to be about twice as fast as using
    >>> > std::ostringstream. My guess is this is so because of massive
    >>> > virtual function calls.
    >>>
    >>> Hm, they always say iostreams are designed so that you're not doing a
    >>> virtual function call per character -- that buffer thing. So that
    >>> theory seems a bit unlikely to me. But I don't have a better one.
    >>>
    >>> I suspected a performance hit from locale usage, but if that's
    >>> involved switching to the plain "C" locale didn't help.
    >>>
    >>> Lastly: ten years ago I could believe my C++ (libstdc++, Gnu/Linux)
    >>> wasn't optimized. But by now surely someone has tried to fix at least
    >>> this common case: reading std::cin line by line? Strange.

    >>
    >> With the g++ standard library implementation you have to call
    >> std::ios_base::sync_with_stdio(false) in order to get good
    >> performance.

    >
    > I was just about to say "of course I had that one; it's in main() in
    > all my programs!" ... but I might have forgotten this time. I'll
    > look into it tonight.


    Turns out I hadn't forgotten. So, a factor 2 difference not caused by
    the stdin/std::cin connection.

    But when I now try omitting std::cin::sync_with_stdio(false),
    I get surprising results: a slowdown by another factor 30! To
    summarize the three results, for a file of 500MB and 6.7 million
    lines:

    0.7s (fgets) - 1.2s (std::getline, sync_with_stdio(false)) - 34s
    (std::getline, default sync with stdio)

    The 30x slowdown is even harder to find an excuse for. Ok, so it's
    avoidable -- but the workaround is obscure. I think I learned about
    it here rather recently.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 20, 2013
    #18