Reading from a stream til EOF

Discussion in 'C++' started by Hendrik Schober, Feb 25, 2004.

  1. Hi,

    I have a 'std::istream' and need to read
    its whole contents into a string. How can
    I do this?

    TIA;

    Schobi

    --
    is never read
    I'm Schobi at suespammers dot org

    "Sometimes compilers are so much more reasonable than people."
    Scott Meyers
     
    Hendrik Schober, Feb 25, 2004
    #1

  2. Well, I'm not an expert on the STL, but here are some examples:

    example 1:

    char c;
    while (your_istream.get(c))
        your_string.push_back(c);

    example 2:

    char c;
    // note: operator>> skips whitespace by default
    while (your_istream >> c)
        your_string.push_back(c);


    example 3:

    string your_string;
    // reads one whitespace-delimited word per iteration
    while (your_istream >> your_string)
        foo();
     
    Rodrigo Dominguez, Feb 25, 2004
    #2

  3. Actually I was hoping for something
    that would promise more performance.

    Schobi

     
    Hendrik Schober, Feb 25, 2004
    #3
  4. I'm afraid making a copy at some point is unavoidable. I wish you
    could call reserve() and then write directly into the underlying
    storage, as with vector -- at least if the string had never been
    copied.

    Jonathan
     
    Jonathan Turkanis, Feb 25, 2004
    #4
  5. I suppose you mean 'resize()', where you
    say 'reserve()'? The problem is, I don't
    see how I can find out how much there is
    to read from the stream in advance.
    What I'm doing right now is this:

    std::string f(std::istream& is)
    {
        return std::string( std::istream_iterator<char>(is)
                          , std::istream_iterator<char>() );
    }

    However, I suppose this goes through all
    the sentries etc. for each and every char?
    One other thing I was thinking about is
    that 'operator>>' seems to be overloaded
    for a stream buffer on the RHS. So should
    this

    std::stringstream ss;
    is >> ss.rdbuf();
    return ss.str();

    do what I think? And if so, can I expect
    better performance from this compared to
    copying the char myself?
    Schobi

     
    Hendrik Schober, Feb 25, 2004
    #5
  6. Right. That's unavoidable. An exponential growth strategy is the way
    to go. You should get this automatically with string, or you can do it
    yourself.
    You definitely don't want to do this if you're concerned with
    efficiency. At the very least, you should extract the underlying
    streambuf using is.rdbuf(), and read into a char array using sgetn.
    I would have guessed that a good implementation would implement this
    as I described above, but I checked dinkumware and it does a
    character-by-character extraction. So I would use a char buffer.

    (In my first response, I thought you were mainly interested in avoiding
    the final copy when you call ss.str())

    Jonathan
     
    Jonathan Turkanis, Feb 25, 2004
    #6
  7. I planned to let 'std::string' take care
    of this. :)
    I see. I was expecting this. I suppose
    using streambuf iterators wouldn't help
    much with this?
    As this avoids creating/destroying any
    sentries and all the formatting?
    Thanks for checking. We are indeed using
    Dinkumware on two platforms. So this would
    not help much. I should probably ask about
    this in MS's std lib newsgroup, as PJP and PB
    are reading and posting there.
    I am not sure what you mean here. Can you
    elaborate?
    Well, actually, I would need to istream
    the content later anyway. However, first
    I need the size of it. (The real task is
    to parse the data, which is a rather
    lengthy process. OTOH the raw data itself
    usually is not very big. So I thought it
    would be better to lose some performance
    on copying to get the size, as this would
    give me a real progress bar for visual
    feedback to the users.)
    Schobi

     
    Hendrik Schober, Feb 26, 2004
    #7
  8. This is not at all what you want to do, I guess: among other things, this will
    strip all whitespace from the input before putting it into the string!
    Yes, this goes through the sentries and the preparation etc. What you
    probably want to do is this:

    std::string f(std::istream& is) {
        return std::string( std::istreambuf_iterator<char>(is),
                            std::istreambuf_iterator<char>() );
    }

    This does not go through the sentries. However, for this to be efficient,
    the library has either to implement the general segmented iterator
    optimization or it has to special case this particular use in some form.
    My implementation has a special case (which is pretty close to the general
    optimization but is not quite there) and this is the fastest method to
    read a string, especially for a file with the "C" facet: in this case it
    essentially amounts to a memcpy() from a memory mapped file to the string.
    I would expect this to be the fastest approach with typical implementations:
    this may bypass certain internal buffers, etc. For buffered input streams
    this should at the very least process blocks of characters from buffers
    directly.
    Go measure... I would expect the 'rdbuf()' to be significantly faster than
    processing individual characters. Here is something which should also be
    faster than processing individual characters:

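    enum { bufsize = 8192 };
    char buf[bufsize];
    std::string s;
    // read() sets failbit on the final short block, so test gcount()
    // rather than the stream state to pick up the last partial buffer
    while (is.read(buf, bufsize), is.gcount() > 0)
        s.append(buf, static_cast<std::string::size_type>(is.gcount()));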

    (this code is untested and I'm somewhat humble with respect to the string
    interface...).
     
    Dietmar Kuehl, Feb 26, 2004
    #8
  9. Yes, I found this out by now. :eek:>
    This was the next thing I was about to try.
    Could I do this the other way around, too?

    std::stringstream ss;
    ss << is.rdbuf();
    return ss.str();

    And if so, is there anything different in
    principle or is it just down to the
    particular library?
    The problem is, I need to find a way to do
    this which most likely is fast on a couple
    of platforms without being able to profile
    it on each one.
    I see.
    The good old char buf read functions. I
    wonder why it is so hard to do something
    efficiently without having to go back to
    C-ish ways.
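
    For reference, the 'ss << is.rdbuf()' variant from this post can be
    wrapped into a complete function along these lines. The name
    read_via_rdbuf is an assumption; the body is just the three lines
    above with the necessary includes.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper: let the library copy everything that the input
// stream's buffer can deliver into a string stream, then take the string.
std::string read_via_rdbuf(std::istream& is)
{
    std::ostringstream ss;
    ss << is.rdbuf();   // inserts characters until the source reports EOF
    return ss.str();
}
```

    One detail worth knowing: if the source is already empty, this
    operator<< inserts nothing and sets failbit on ss. The returned string
    is still empty, so the function behaves sensibly in that case.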

    Schobi

     
    Hendrik Schober, Feb 26, 2004
    #9
  10. I've posted a few solutions to this in the past:

    http://www.google.com/groups?selm=

    There are lots more ways, and the most efficient somewhat depends on
    the library implementation in question.

    Tom
     
    tom_usenet, Feb 26, 2004
    #10
  11. I didn't think of seeking through a
    stream to get its size! Of all the
    reasons I wanted to do this I did
    manage to eliminate all except that
    I need the size of the data to be
    read from the stream. Since you just
    showed me how to get this, I won't
    even need to read the whole thing
    into a string anymore!
    Yes. What I wanted was a solution
    that has good performance on most
    platforms. However, I think I don't
    need it anymore. :)
    Schobi

     
    Hendrik Schober, Feb 26, 2004
    #11
  12. There are a couple of provisos.

    Firstly, opening the stream in binary mode is likely to give you a
    better result (e.g. the number of bytes in the file) - text mode
    sometimes has funny ideas about where a file ends on some OSes.

    Secondly, it won't work for files whose length won't fit in a
    std::streamoff (e.g. bigger than, say, 2GB).

    Finally, don't forget you can just use a std::filebuf and cut out the
    fstream entirely.
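
    A sketch of the last suggestion, combined with the binary-mode advice:
    query a file's size with a bare std::filebuf. The function name
    file_size and the -1 error value are assumptions for illustration.

```cpp
#include <fstream>
#include <ios>

// Hypothetical helper: size of a file in bytes, obtained by seeking a
// std::filebuf opened in binary mode to the end. Returns -1 on failure.
std::streamoff file_size(const char* path)
{
    std::filebuf fb;
    if (!fb.open(path, std::ios_base::in | std::ios_base::binary))
        return -1;                       // file could not be opened

    typedef std::filebuf::pos_type pos_type;
    typedef std::filebuf::off_type off_type;
    const pos_type end =
        fb.pubseekoff(0, std::ios_base::end, std::ios_base::in);
    if (end == pos_type(off_type(-1)))
        return -1;                       // positioning failed

    return static_cast<std::streamoff>(end);
}                                        // fb's destructor closes the file
```

    The 2GB proviso still applies wherever std::streamoff is a 32-bit type.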

    Tom
     
    tom_usenet, Feb 26, 2004
    #12
  13. Is there anything worse to be expected than
    the "\r\n" problem? As this is just for
    progress indication for the users, accuracy
    is not as important.
    Yes. But I wouldn't have thought of loading
    these into a string anyway. :)
    How do I read a line from a streambuf?
    Schobi

     
    Hendrik Schober, Feb 26, 2004
    #13
  14. Thanks for the enlightenment!
    I see.
    Yes, but then there are all the different
    versions of these libraries. And once a
    piece of code works, nobody will go into
    it and check whether with the newest
    version this or that could be optimized
    using another technique...
    Why is that, actually?

    But I wonder whether it is a flaw in the
    design if something like reading into a
    string cannot easily be done fast with
    the recommended approach.

    Schobi

     
    Hendrik Schober, Feb 26, 2004
    #14
  15. Well, essentially, a streambuf iterator iterates over buffers of
    characters. Sure, it is always the same buffer, but just envision each
    fill of the buffer as a separate one. Now, each of these buffers can be
    processed in a chunk making up a segment of the overall sequence.
    Taking advantage of this view results in faster code because rather
    than making two checks in each iteration, there is just one. Also, it
    is possible to unroll the loop even further because the sizes of the
    segments are known in advance, so a check is needed only for
    something like every 100th character. Without this optimization,
    processing the stream buffers directly will work more efficiently
    because that processing does just this, only more naturally (at least,
    I would expect it from most implementations).

    The general principle can also be applied to other kinds of sequences
    which are similarly segmented. 'std::deque's and hashes using lists
    for each bucket come to mind.

    This is how I'm normally writing it. The direction should not really
    matter and the same function should be used underneath.
    But you should get a general feeling which things work fast and which
    don't by trying out a couple. Actually, I'm aware of only five
    different libraries being in wider use:
    - Dinkumware (e.g. shipping with MSVC++)
    - libstdc++ (shipping with gcc)
    - Metrowerks' library, shipping with their compiler
    - Rogue Wave (used to ship e.g. with Sun CC)
    - STLport (a free drop-in library)

    I'm unaware of any other standard C++ library shipping with a commercial
    compiler. (ObjectSpace dropped their library and mine was never shipping
    with anything; is there any other reasonably complete standard library
    implementation still in use?)
    Well, the segmented iterator optimization requires quite a bit of
    machinery to work. It gives a nice abstract interface to an efficient
    implementation. Just, nobody does it because the library implementers are
    kept busy with all kinds of other stuff and optimizations. The low-level
    stuff is some wiring you can apply yourself...
     
    Dietmar Kuehl, Feb 26, 2004
    #15
  16. This is fine depending on the stream type. As I'm sure you know, an
    arbitrary stream doesn't have to be arbitrarily-positional. If you
    know that the streams you will be using are arbitrarily-positional,
    you're all set.

    You could try seeking, and then testing whether the result is a valid
    stream position. If it's not, you could then use another method.
    However, I'm not sure it's guaranteed that a stream will be in a valid
    state after a failed seek.

    Jonathan
     
    Jonathan Turkanis, Feb 26, 2004
    #16
  17. How do I detect a failed positioning?

    Mhmm. Right now it will be file streams
    and string streams only which I assume
    to be positional. I think I will try
    this and put an assert to be triggered
    if anything fails.
    Schobi

     
    Hendrik Schober, Feb 26, 2004
    #17
  18. Test it against -1.

    Jonathan
     
    Jonathan Turkanis, Feb 26, 2004
    #18
  19. Schobi

     
    Hendrik Schober, Feb 26, 2004
    #19
  20. FTR, I just found another one:

    const std::istream::char_type chEof = std::istream::traits_type::eof();
    std::string f( std::istream& is )
    {
        std::string tmp;
        std::getline( is, tmp, chEof );
        return tmp;
    }


    Schobi

     
    Hendrik Schober, Feb 29, 2004
    #20