Reading from a stream til EOF

Hendrik Schober

Hi,

I have a 'std::istream' and need to read
its whole contents into a string. How can
I do this?

TIA;

Schobi

--
(e-mail address removed) is never read
I'm Schobi at suespammers dot org

"Sometimes compilers are so much more reasonable than people."
Scott Meyers
 
Rodrigo Dominguez

Hendrik said:
Hi,

I have a 'std::istream' and need to read
its whole contents into a string. How can
I do this?

TIA;

Schobi
well, I'm not an expert on STL, but here are some examples

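example 1:

char c;
while (your_istream.get(c))            // unformatted: keeps whitespace
    your_string.push_back(c);

example 2:

char c;
while (your_istream >> c)              // formatted: skips whitespace by default
    your_string.push_back(c);


example 3:

string your_string;
while (your_istream >> your_string)    // one whitespace-delimited word per iteration
    foo();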
 
Hendrik Schober

Rodrigo Dominguez said:
Hendrik said:
Hi,

I have a 'std::istream' and need to read
its whole contents into a string. How can
I do this?

TIA;

Schobi
well, I'm not an expert on STL, but here are some examples
[...]

Actually I was hoping for something
that would promise more performance.

Schobi


Jonathan Turkanis

Hendrik Schober said:
Hi,

I have a 'std::istream' and need to read
its whole contents into a string. How can
I do this?

I'm afraid making a copy at some point is unavoidable. I wish you
could call reserve() and then write directly into the underlying
storage, as with vector -- at least if the string had never been
copied.

Jonathan
 
Hendrik Schober

Jonathan Turkanis said:
I'm afraid making a copy at some point is unavoidable. I wish you
could call reserve() and then write directly into the underlying
storage, as with vector -- at least if the string had never been
copied.

I suppose you mean 'resize()', where you
say 'reserve()'? The problem is, I don't
see how I can find out how much there is
to read from the stream in advance.
What I'm doing right now is this:

std::string f(std::istream& is)
{
return std::string( std::istream_iterator<char>(is)
, std::istream_iterator<char>() );
}

However, I suppose this goes through all
the sentries etc. for each and every char?
One other thing I was thinking about is
that 'operator>>' seems to be overloaded
for a stream buffer on the RHS. So should
this

std::stringstream ss;
is >> ss.rdbuf();
return ss.str();

do what I think? And if so, can I expect
better performance from this compared to
copying the char myself?

Schobi


Jonathan Turkanis

Hendrik Schober said:
I suppose you mean 'resize()', where you
Yes.

say 'reserve()'? The problem is, I don't
see how I can find out how much there is
to read from the stream in advance.

Right. That's unavoidable. An exponential growth strategy is the way
to go. You should get this automatically with string, or you can do it
yourself.
What I'm doing right now is this:

std::string f(std::istream& is)
{
return std::string( std::istream_iterator<char>(is)
, std::istream_iterator<char>() );
}

You definitely don't want to do this if you're concerned with
efficiency. At the very least, you should extract the underlying
streambuf using is.rdbuf(), and read into a char array using sgetn.
However, I suppose this goes through all
the sentries etc. for each and every char?
One other thing I was thinking about is
that 'operator>>' seems to be overloaded
for a stream buffer on the RHS. So should
this

std::stringstream ss;
is >> ss.rdbuf();
return ss.str();

I would have guessed that a good implementation would implement this
as I described above, but I checked dinkumware and it does a
character-by-character extraction. So I would use a char buffer.

(In my first response, I thought you were mainly interested in avoiding
the final copy when you call ss.str())

Jonathan
 
Hendrik Schober

Jonathan Turkanis said:
[...]
say 'reserve()'? The problem is, I don't
see how I can find out how much there is
to read from the stream in advance.

Right. That's unavoidable. An exponential growth strategy is the way
to go. You should get this automatically with string, or you can do it
yourself.

I planned to let 'std::string' take care
of this. :)
You definitely don't want to do this if you're concerned with
efficiency.

I see. I was expecting this. I suppose
using streambuf iterators wouldn't help
much with this?
At the very least, you should extract the underlying
streambuf using is.rdbuf(), and read into a char array using sgetn.

As this avoids creating/destroying any
sentries and all the formatting?
I would have guessed that a good implementation would implement this
as I described above, but I checked dinkumware and it does a
character-by-character extraction.

Thanks for checking. We are indeed using
Dinkumware on two platforms. So this would
not help much. I should probably ask about
this in MS' std lib newsgroup, as PJP and PB
are reading and posting there.
So I would use a char buffer.

I am not sure what you mean here. Can you
elaborate?
(In my first response, I thought you were mainly interested in avoiding
the final copy when you call ss.str())

Well, actually, I would need to istream
the content later anyway. However, first
I need the size of it. (The real task is
to parse the data, which is a rather
lengthy process. OTOH the raw data itself
usually is not very big. So I thought it
would be better to lose some performance
on copying to get the size, as this would
give me a real progress bar for visual
feedback to the users.)

Schobi


Dietmar Kuehl

Hendrik Schober said:
What I'm doing right now is this:

std::string f(std::istream& is)
{
return std::string( std::istream_iterator<char>(is)
, std::istream_iterator<char>() );
}

This is not at all what you want to do, I guess: among other things, this will
strip all whitespace from the input before putting it into the string!
However, I suppose this goes through all
the sentries etc. for each and every char?

Yes, this goes through the sentries and the preparation etc. What you
probably want to do is this:

std::string f(std::istream& is) {
return std::string( std::istreambuf_iterator<char>(is),
std::istreambuf_iterator<char>() );
}

This does not go through the sentries. However, for this to be efficient,
the library has either to implement the general segmented iterator
optimization or it has to special case this particular use in some form.
My implementation has a special case (which is pretty close to the general
optimization but is not quite there) and this is the fastest method to
read a string, especially for a file with the "C" facet: in this case it
essentially amounts to a memcpy() from a memory mapped file to the string.
One other thing I was thinking about is
that 'operator>>' seems to be overloaded
for a stream buffer on the RHS. So should
this

std::stringstream ss;
is >> ss.rdbuf();
return ss.str();

I would expect this to be the fastest approach with typical implementations:
this may bypass certain internal buffers, etc. For buffered input streams
this should at the very least process blocks of characters from buffers
directly.
do what I think? And if so, can I expect
better performance from this compared to
copying the char myself?

Go measure... I would expect the 'rdbuf()' to be significantly faster than
processing individual characters. Here is something which should also be
faster than processing individual characters:

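enum { bufsize = 8192 };
char buf[bufsize];
std::string s;
// read() returns the stream, so take the count from gcount();
// the final, partial block sets failbit but still has gcount() > 0
while (is.read(buf, bufsize) || is.gcount() > 0)
    s.append(buf, is.gcount());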

(this code is untested and I'm somewhat humble with respect to the string
interface...).
 
Hendrik Schober

Dietmar Kuehl said:
This is not at all what you want to do, I guess: among other things, this will
strip all whitespace from the input before putting it into the string!

Yes, I found this out by now. :o>
Yes, this goes through the sentries and the preparation etc. What you
probably want to do is this:

std::string f(std::istream& is) {
return std::string( std::istreambuf_iterator<char>(is),
std::istreambuf_iterator<char>() );
}

This does not go through the sentries.

This was the next thing I was about to try.
However, for this to be efficient,
the library has either to implement the general segmented iterator
optimization [...]
???
[...]
std::stringstream ss;
is >> ss.rdbuf();
return ss.str();

I would expect this to be the fastest approach with typical implementations:
this may bypass certain internal buffers, etc. For buffered input streams
this should at the very least process blocks of characters from buffers
directly.

Could I do this the other way around, too?

std::stringstream ss;
ss << is.rdbuf();
return ss.str();

And if so, is there anything different in
principle or is it just down to the
particular library?
[...]
Go measure...

The problem is, I need to find a way to do
this which most likely is fast on a couple
of platforms without being able to profile
it on each one.
I would expect the 'rdbuf()' to be significantly faster than
processing individual characters.

I see.
Here is something which should also be
faster than processing individual characters:

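enum { bufsize = 8192 };
char buf[bufsize];
std::string s;
// read() returns the stream, so take the count from gcount();
// the final, partial block sets failbit but still has gcount() > 0
while (is.read(buf, bufsize) || is.gcount() > 0)
    s.append(buf, is.gcount());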

(this code is untested and I'm somewhat humble with respect to the string
interface...).

The good old char buf read functions. I
wonder why it is so hard to do something
efficiently without having to go back to
C-ish ways.

Schobi


Hendrik Schober

tom_usenet said:
[...]
I've posted a few solutions to this in the past:

http://www.google.com/[email protected]

I didn't think of seeking through a
stream to get its size! Of all the
reasons I wanted to do this I did
manage to eliminate all except that
I need the size of the data to be
read from the stream. Since you just
showed me how to get this, I won't
even need to read the whole thing
into a string anymore!
There are lots more ways, and the most efficient somewhat depends on
the library implementation in question.

Yes. What I wanted was a solution
that has good performance on most
platforms. However, I think I don't
need it anymore. :)

Schobi


tom_usenet

tom_usenet said:
[...]
I've posted a few solutions to this in the past:

http://www.google.com/[email protected]

I didn't think of seeking through a
stream to get its size! Of all the
reasons I wanted to do this I did
manage to eliminate all except that
I need the size of the data to be
read from the stream. Since you just
showed me how to get this, I won't
even need to read the whole thing
into a string anymore!

There are a couple of provisos.

Firstly, opening the stream in binary mode is likely to give you a
better result (e.g. the number of bytes in the file) - text mode
sometimes has funny ideas about where a file ends on some OSes.

Secondly, it won't work for files whose length won't fit in a
std::streamoff (e.g. bigger than, say, 2GB).

Finally, don't forget you can just use a std::filebuf and cut out the
fstream entirely.
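
A minimal sketch of the std::filebuf idea, assuming the only goal is a byte count for a progress bar. The helper name `file_size` is hypothetical. The file is opened in binary mode, as suggested above, and the result is only meaningful for sizes that fit in a std::streamoff.

```cpp
#include <fstream>
#include <ios>

// Hypothetical helper: size of a file in bytes, or -1 if it cannot be
// opened or its end position cannot be determined.
std::streamoff file_size(const char* path)
{
    std::filebuf fb;
    if (!fb.open(path, std::ios_base::in | std::ios_base::binary))
        return -1;

    // Seek to the end; the returned position is the size in binary mode
    // on common platforms (an assumption, not a guarantee of the standard).
    const std::streampos end =
        fb.pubseekoff(0, std::ios_base::end, std::ios_base::in);
    if (end == std::streampos(std::streamoff(-1)))
        return -1;

    return static_cast<std::streamoff>(end);
}
```

The filebuf closes the file in its destructor, so no explicit close() is needed in this sketch.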

Tom
 
Hendrik Schober

tom_usenet said:
[...]
Firstly, opening the stream in binary mode is likely to give you a
better result (e.g. the number of bytes in the file) - text mode
sometimes has funny ideas about where a file ends on some OSes.

Is there anything worse to be expected than
the "\r\n" problem? As this is just for
progress indication for the users, accuracy
is not as important.
Secondly, it won't work for files whose length won't fit in a
std::streamoff (e.g. bigger than, say, 2GB).

Yes. But I wouldn't have thought of loading
these into a string anyway. :)
Finally, don't forget you can just use a std::filebuf and cut out the
fstream entirely.

How do I read a line from a streambuf?

Schobi


Hendrik Schober

Dietmar Kuehl said:
[...]
Well, essentially, a streambuf iterator [...]

Thanks for the enlightenment!
This is how I'm normally writing it. The direction should not really
matter and the same function should be used underneath.

I see.
But you should get a general feeling which things work fast and which
don't by trying out a couple. Actually, I'm aware of only five
different libraries being in wider use:
- Dinkumware (eg. shipping with MSVC++)
- libstdc++ (shipping with gcc)
- Metrowerks' library shipping with their compiler
- Rogue Wave (used to ship eg. with Sun CC)
- STLport (a free drop in place library)

Yes, but then there are all the different
versions of these libraries. And once a
piece of code works, nobody will go into
it and check whether with the newest
version this or that could be optimized
using another technique...
I'm unaware of any other standard C++ library shipping with a commercial
compiler (ObjectSpace dropped their library and mine was never shipping
with anything;

Why is that, actually?
is there any other reasonably complete standard library
implementation still in use?)


Well, the segmented iterator optimization requires quite a bit of
machinery to work. It gives a nice abstract interface to an efficient
implementation. Just, nobody does it because the library implementers are
kept busy with all kinds of other stuff and optimizations. The low-level
stuff is some wiring you can apply yourself...


But I wonder whether it is a flaw in the
design if something like reading into a
string cannot easily be done fast with
the recommended approach.

Schobi


Dietmar Kuehl

Hendrik said:
Dietmar Kuehl said:
std::string f(std::istream& is) {
return std::string( std::istreambuf_iterator<char>(is),
std::istreambuf_iterator<char>() );
}
However, for this to be efficient,
the library has either to implement the general segmented iterator
optimization [...]

Well, essentially, a streambuf iterator iterates over buffers of
characters. Sure, it is always the same buffer but just envision each
fill of the buffer a separate one. Now, each of these buffers can be
processed in a chunk making up a segment of the overall sequence.
Taking advantage of this view results in faster code because rather
than making two checks in each iteration, there is just one. Also, it
is possible to unroll the loop even further because the sizes of the
segments are known in advance, allowing to make a check only for
something like every 100th character. Even without this optimization,
processing the stream buffers directly (as with 'rdbuf()') will be more
efficient, because that processing works on whole chunks naturally (at
least, I would expect this from most implementations).

The general principle can also be applied to other kinds of sequences
which are similarly segmented. 'std::deque's and hashes using lists
for each bucket come to mind.
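
To illustrate the idea on something simpler than a stream buffer, here is a rough sketch over a hypothetical segmented sequence (a vector of chunks). The type and function names are made up for the example. It only shows the difference in the number of checks per character and is not how any particular library implements the optimization.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Segments;   // hypothetical segmented sequence

// Flat, iterator-style traversal: two checks for every character.
std::string copy_flat(const Segments& segs)
{
    std::string out;
    std::size_t seg = 0, idx = 0;
    while (seg != segs.size()) {             // check 1: end of whole sequence?
        if (idx == segs[seg].size()) {       // check 2: end of current segment?
            ++seg;
            idx = 0;
            continue;
        }
        out.push_back(segs[seg][idx++]);
    }
    return out;
}

// Segment-aware traversal: one check per segment, then a bulk copy.
std::string copy_segmented(const Segments& segs)
{
    std::string out;
    for (std::size_t seg = 0; seg != segs.size(); ++seg)
        out.append(segs[seg].data(), segs[seg].size());
    return out;
}
```

Both functions produce the same string. The second one hands each segment to append() in one piece, which is where something like a memcpy() per chunk becomes possible.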

Could I do this the other way around, too?

std::stringstream ss; // or: std::ostringstream ss;
ss << is.rdbuf();
return ss.str();

This is how I'm normally writing it. The direction should not really
matter and the same function should be used underneath.
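
Wrapped into a function, the 'ss << is.rdbuf()' form might look like the sketch below. The name `read_all_rdbuf` is hypothetical, and how fast this is depends on the library's implementation of the streambuf inserter.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copy everything left in 'is' into a string by
// inserting its stream buffer into an ostringstream.
std::string read_all_rdbuf(std::istream& is)
{
    std::ostringstream ss;
    ss << is.rdbuf();   // copies until the source buffer reports end-of-input
    return ss.str();    // str() makes one more copy of the collected data
}
```

If the input is empty, the inserter sets failbit on the temporary 'ss', which is harmless here because str() still returns an empty string.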
The problem is, I need to find a way to do
this which most likely is fast on a couple
of platforms without being able to profile
it on each one.

But you should get a general feeling which things work fast and which
don't by trying out a couple. Actually, I'm aware of only five
different libraries being in wider use:
- Dinkumware (eg. shipping with MSVC++)
- libstdc++ (shipping with gcc)
- Metrowerks' library shipping with their compiler
- Rogue Wave (used to ship eg. with Sun CC)
- STLport (a free drop in place library)

I'm unaware of any other standard C++ library shipping with a commercial
compiler (ObjectSpace dropped their library and mine was never shipping
with anything; is there any other reasonably complete standard library
implementation still in use?)
The good old char buf read functions. I
wonder why it is so hard to do something
efficiently without having to go back to
C-ish ways.

Well, the segmented iterator optimization requires quite a bit of
machinery to work. It gives a nice abstract interface to an efficient
implementation. Just, nobody does it because the library implementers are
kept busy with all kinds of other stuff and optimizations. The low-level
stuff is some wiring you can apply yourself...
 
Jonathan Turkanis

Hendrik Schober said:
tom_usenet said:
I didn't think of seeking through a
stream to get its size! Of all the
reasons I wanted to do this I did
manage to eliminate all except that
I need the size of the data to be
read from the stream. Since you just
showed me how to get this, I won't
even need to read the whole thing
into a string anymore!

This is fine depending on the stream type. As I'm sure you know, an
arbitrary stream doesn't have to be arbitrarily-positional. If you
know that the streams you will be using are arbitrarily-positional,
you're all set.

You could try seeking, and then testing whether the result is a valid
stream position. If it's not, you could then use another method.
However, I'm not sure it's guaranteed that a stream will be in a valid
state after a failed seek.

Jonathan
 
Hendrik Schober

Jonathan Turkanis said:
[...]
I didn't think of seeking through a
stream to get its size! Of all the
reasons I wanted to do this I did
manage to eliminate all except that
I need the size of the data to be
read from the stream. Since you just
showed me how to get this, I won't
even need to read the whole thing
into a string anymore!

This is fine depending on the stream type. As I'm sure you know, an
arbitrary stream doesn't have to be arbitrarily-positional. If you
know that the streams you will be using are arbitrarily-positional,
you're all set.

You could try seeking, and then testing whether the result is a valid
stream position. If it's not, you could then use another method.
However, I'm not sure it's guaranteed that a stream will be in a valid
state after a failed seek.

How do I detect a failed positioning?

Mhmm. Right now it will be file streams
and string streams only which I assume
to be positional. I think I will try
this and put an assert to be triggered
if anything fails.

Schobi


Jonathan Turkanis

Hendrik Schober said:
Jonathan Turkanis said:
[...]
I didn't think of seeking through a
stream to get its size! Of all the
reasons I wanted to do this I did
manage to eliminate all except that
I need the size of the data to be
read from the stream. Since you just
showed me how to get this, I won't
even need to read the whole thing
into a string anymore!

This is fine depending on the stream type. As I'm sure you know, an
arbitrary stream doesn't have to be arbitrarily-positional. If you
know that the streams you will be using are arbitrarily-positional,
you're all set.

You could try seeking, and then testing whether the result is a valid
stream position. If it's not, you could then use another method.
However, I'm not sure it's guaranteed that a stream will be in a valid
state after a failed seek.

How do I detect a failed positioning?

Test it against -1.

Jonathan
 
Hendrik Schober

Jonathan Turkanis said:
[...]
Test it against -1.
Thanks!

Jonathan

Schobi


Hendrik Schober

tom_usenet said:
I've posted a few solutions to this in the past:

http://www.google.com/[email protected]

There are lots more ways, and the most efficient somewhat depends on
the library implementation in question.

FTR, I just found another one:

const std::istream::char_type chEof = std::istream::traits_type::eof();
std::string f( std::istream& is )
{
std::string tmp;
std::getline( is, tmp, chEof );
return tmp;
}



Schobi
