Reading unformatted text from stdin

Lionel B

Greetings,

I need to read (unformatted text) from stdin up to EOF into a char
buffer; of course I cannot allocate my buffer until I know how much
text is available, and I do not know how much text is available until I
have read it... which seems to imply that multiple reads of the input
stream will be inevitable.

Now I can correctly find the number of characters available by:
|
| #include <iostream>
|
| std::cin.ignore(std::numeric_limits<int>::max());
| const int num_chars = std::cin.gcount();
|
Then I would like to do:
|
| char* const text = new char[num_chars+1];
| std::cin.read(text,num_chars);
| text[num_chars]='\0';
|
but the read() won't work because, as I understand it, ignore() has (as
its name implies) `thrown away' all characters in the stream...!

I am sure this must be a stock situation, and wonder if there is an
(efficient, elegant) stock way of approaching it.
Any tips appreciated,
 
Mike Wahler

Lionel B said:
Greetings,

I need to read (unformatted text) from stdin up to EOF into a char
buffer; of course I cannot allocate my buffer until I know how much
text is available, and I do not know how much text is available until I
have read it... which seems to imply that multiple reads of the input
stream will be inevitable.

Now I can correctly find the number of characters available by:
|
| #include <iostream>
|
| std::cin.ignore(std::numeric_limits<int>::max());
| const int num_chars = std::cin.gcount();
|
Then I would like to do:
|
| char* const text = new char[num_chars+1];
| std::cin.read(text,num_chars);
| text[num_chars]='\0';
|
but the read() won't work because, as I understand it, ignore() has (as
its name implies) `thrown away' all characters in the stream...!

I am sure this must be a stock situation, and wonder if there is an
(efficient, elegant) stock way of approaching it.
Any tips appreciated,

#include <algorithm>
#include <iostream>
#include <string>

int main()
{
std::cout << "Enter text: ";
std::string line;
std::getline(std::cin, line);

if(!line.empty())
{
char * const text = new char[line.size() + 1];
std::copy(line.begin(), line.end(), text);
text[line.size()] = 0;
std::cout << text << '\n';
delete[] text;
}

return 0;
}

-Mike
 
Karl Heinz Buchegger

Lionel said:
Greetings,

I need to read (unformatted text) from stdin up to EOF into a char
buffer; of course I cannot allocate my buffer until I know how much
text is available, and I do not know how much text is available until I
have read it... which seems to imply that multiple reads of the input
stream will be inevitable.

Now I can correctly find the number of characters available by:
|
| #include <iostream>
|
| std::cin.ignore(std::numeric_limits<int>::max());
| const int num_chars = std::cin.gcount();
|
Then I would like to do:
|
| char* const text = new char[num_chars+1];
| std::cin.read(text,num_chars);
| text[num_chars]='\0';
|
but the read() won't work because, as I understand it, ignore() has (as
its name implies) `thrown away' all characters in the stream...!

I don't understand.
Why did you do ignore() at the stream?
What should be the purpose of it?
I am sure this must be a stock situation, and wonder if there is an
(efficient, elegant) stock way of approaching it.
Any tips appreciated,

Well the tip is: If you want to read then it is unwise to first
throw away everything you want to read :)

Besides:
Did you know that std::string can hold a very long text?
Did you know that there is a function getline for strings?
Did you know that you can tell getline() what it should use as
delimiter for 'lines'?
 
Dietmar Kuehl

Lionel said:
I need to read (unformatted text) from stdin up to EOF into a char
buffer;

What's wrong with this?

| std::ifstream in("some file", std::ios_base::binary);
| std::ostringstream tmp;
| tmp << in.rdbuf();
| std::string const& contents = tmp.str();

... or:

| std::ifstream in("some file", std::ios_base::binary);
| std::istreambuf_iterator<char> beg(in), end;
| std::string contents(beg, end);

Actually, I like the latter better but it is probably considerably
slower than the first alternative on most implementations.
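Both alternatives work on any `std::istream`, so they can be applied to `std::cin` as well as to a file. A minimal sketch, assuming the hypothetical helper names `slurp` and `slurp_iter`:

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: first alternative, copies everything left in `in`
// by streaming its stream buffer into an ostringstream.
std::string slurp(std::istream& in)
{
    std::ostringstream tmp;
    tmp << in.rdbuf();   // newlines are copied like any other character
    return tmp.str();
}

// Hypothetical helper: second alternative, builds the string directly
// from a pair of stream buffer iterators.
std::string slurp_iter(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling `slurp(std::cin)` would read standard input up to EOF without knowing its length in advance.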
 
Lionel B

Mike said:
#include <algorithm>
#include <iostream>
#include <string>

int main()
{
std::cout << "Enter text: ";
std::string line;
std::getline(std::cin, line);

I guess that might work, but my input might contain '\n' chars (i.e.
the default eol chars), so I'd have to give getline() an eol char that
would never actually occur (I think '\0' would probably do it). Shall
try it.

Cheers,
 
Mike Wahler

Lionel B said:
I guess that might work, but my input might contain '\n' chars (i.e.
the default eol chars),

With the above code, not possible.
so I'd have to give getline() an eol char that

The default 'termination' character for 'std::getline()' is '\n'
(this can be overridden by supplying a different one as the
third argument to 'std::getline()' ).
would never actually occur (I think '\0' would probably do it).

On many systems, it's not possible to input a '\0' character.
Shall
try it.

Did you try what I wrote?

-Mike
 
Lionel B

Mike said:
With the above code, not possible.


The default 'termination' character for 'std::getline()' is '\n'
(this can be overridden by supplying a different one as the
third argument to 'std::getline()' ).

That's exactly what I was thinking of...
On many systems, it's not possible to input a '\0' character.

In my case input may be redirected from a file or piped from another
program, so input is not predictable (that's what I meant by
"unformatted" in my original post). However, a '\0' could be considered
pathological.
Did you try what I wrote?

Yes. As expected, it fails if the input comprises multiple lines (only
the first line is read). However, it works ok if I replace:

std::getline(std::cin, line);

with:

std::getline(std::cin, line, '\0');


Cheers,
 
Lionel B

Karl said:
Lionel said:
Greetings,

I need to read (unformatted text) from stdin up to EOF into a char
buffer; of course I cannot allocate my buffer until I know how much
text is available, and I do not know how much text is available
until I have read it... which seems to imply that multiple reads of
the input stream will be inevitable.

Now I can correctly find the number of characters available by:
|
| #include <iostream>
|
| std::cin.ignore(std::numeric_limits<int>::max());
| const int num_chars = std::cin.gcount();
|
Then I would like to do:
|
| char* const text = new char[num_chars+1];
| std::cin.read(text,num_chars);
| text[num_chars]='\0';
|
but the read() won't work because, as I understand it, ignore()
has (as its name implies) `thrown away' all characters in the
stream...!

I don't understand.
Why did you do ignore() at the stream?
What should be the purpose of it?

It was just a first cut attempt at calculating the number of characters
in the stream. The "advantage" is that it will, if invoked as above,
read to the end of the stream and enable gcount() to return the number
of characters correctly. Of course it has a fatal disadvantage...
Well the tip is: If you want to read then it is unwise to first
throw away everything you want to read :)

... of course ;-) What I really need is an ignore() that doesn't
ignore; i.e. an unformatted read call which reads to the end of the
stream but doesn't actually extract any chars. As far as I know no
such call exists; I suspect it may be possible to do something along
these lines with peek() in a loop. Might give that a try.
Besides:
Did you know that std::string can hold a very long text?
Did you know that there is a function getline for strings?
Did you know that you can tell getline() what it should use as
delimiter for 'lines'?

Yep. See Mike Wahler's suggestion and my reply.
Regards,
 
Lionel B

Dietmar said:
What's wrong with this?

| std::ifstream in("some file", std::ios_base::binary);
| std::ostringstream tmp;
| tmp << in.rdbuf();
| std::string const& contents = tmp.str();

Works well. I have also tried:

| const int nchars= in.rdbuf()->in_avail()-1;
| char* const text = new char[nchars+1];
| in.read(text,nchars);
| text[nchars]='\0';

(needs the "-1" in the 1st line; I guess in_avail() counts the EOF too)
which should be pretty efficient, although I am not quite sure whether
in_avail() will always give me what I expect (i.e. the entire input up
to EOF). Seems to work, though.

Cheers,
 
Lionel B

Lionel said:
Dietmar said:
What's wrong with this?

| std::ifstream in("some file", std::ios_base::binary);
| std::ostringstream tmp;
| tmp << in.rdbuf();
| std::string const& contents = tmp.str();

Works well. I have also tried:

| const int nchars= in.rdbuf()->in_avail()-1;
| char* const text = new char[nchars+1];
| in.read(text,nchars);
| text[nchars]='\0';

(needs the "-1" in the 1st line; I guess in_avail() counts the
EOF too) which should be pretty efficient, although I am not quite
sure whether in_avail() will always give me what I expect (i.e. the
entire input up to EOF). Seems to work, though.

Correction: seems to work *sometimes* :-/ If I pipe input in from
another program (my prog reads stdin; i.e. in = cin) then sometimes
in_avail() returns 0... maybe some synching/flushing issue? Or cruddy
implementation of pipes (this is Win2k)? Tricky to replicate exact
conditions under which it doesn't work.
So I'm sticking with Dietmar's first method for now.
 
Dietmar Kuehl

Lionel said:
Works well. I have also tried:

| const int nchars= in.rdbuf()->in_avail()-1;
| char* const text = new char[nchars+1];
| in.read(text,nchars);
| text[nchars]='\0';

(needs the "-1" in the 1st line; I guess in_avail() counts the EOF too)
which should be pretty efficient, although I am not quite sure whether
in_avail() will always give me what I expect (i.e. the entire input up
to EOF). Seems to work, though.

This does not work: 'in_avail()' returns the number of characters
in the stream buffer's buffer. However, this number is not at all
related to the number of characters to be expected from the stream.
The use of 'in_avail()' is actually very limited - personally I had
no good use for 'in_avail()', yet.
 
Mike Wahler

Lionel B said:
That's exactly what I was thinking of...


In my case input may be redirected from a file or piped from another
program, so input is not predictable (that's what I meant by
"unformatted" in my original post). However, a '\0' could be considered
pathological.


Yes. As expected, it fails if the input comprises multiple lines (only
the first line is read). However, it works ok if I replace:

std::getline(std::cin, line);

with:

std::getline(std::cin, line, '\0');

Alternatively you can read line-by-line in a loop:

while(std::getline(std::cin, line))
{
/* do stuff */
}
if(!std::cin.eof())
/* error occurred while reading */

This form will allow you to do error checking for
each line, if that helps at all (e.g. it could let
you spot a possible stray '\0' 'mid-stream')

-Mike
 
Alex Vinokur

Dietmar Kuehl said:
What's wrong with this?

| std::ifstream in("some file", std::ios_base::binary);
| std::ostringstream tmp;
| tmp << in.rdbuf();
| std::string const& contents = tmp.str();

... or:

| std::ifstream in("some file", std::ios_base::binary);
| std::istreambuf_iterator<char> beg(in), end;
| std::string contents(beg, end);

Actually, I like the latter better but it is probably considerably
slower than the first alternative on most implementations.
[snip]


Testing "Reading contents from file into one string" using the C/C++ Perfometer.

Summary
http://groups-beta.google.com/group/perfo/msg/5801f89772746dc3
http://groups-beta.google.com/group/perfo/attach/5801f89772746dc3/perfo_summary_file2str.txt?part=2

=========================================================
|                            |        File size         |
| Testsuite                  |--------------------------|
|                            |   1000 :  10000 : 100000 |
|-------------------------------------------------------|
| getline                    |    125 :    839 :   7729 |
| vector, reading char       |     75 :    592 :   5816 |
| string, reading char       |     71 :    570 :   5626 |
| vector, reading whole file |     16 :     30 :    146 |
| mmap (UNIX)                |     13 :     18 :     30 |
| istream_iterator           |     80 :    597 :   6028 |
| ostringstream, rdbuf       |     15 :     20 :     66 |
| istreambuf_iterator        |     30 :     71 :    624 |
=========================================================

We can see that speed differences between the testsuites are sizeable.

Full raw run log:
http://groups-beta.google.com/group/log-files/msg/1f7160243fd793cb
http://groups-beta.google.com/group/log-files/attach/1f7160243fd793cb/perfo_log_file2str.txt?part=2
 
Lionel B

Alex said:
/snip/

Testing "Reading contents from file into one string" using
the C/C++ Perfometer.

Summary
http://groups-beta.google.com/group/perfo/msg/5801f89772746dc3
http://groups-beta.google.com/group/perfo/attach/5801f89772746dc3/perfo_summary_file2str.txt?part=2

=========================================================
|                            |        File size         |
| Testsuite                  |--------------------------|
|                            |   1000 :  10000 : 100000 |
|-------------------------------------------------------|
| getline                    |    125 :    839 :   7729 |
| vector, reading char       |     75 :    592 :   5816 |
| string, reading char       |     71 :    570 :   5626 |
| vector, reading whole file |     16 :     30 :    146 |
| mmap (UNIX)                |     13 :     18 :     30 |
| istream_iterator           |     80 :    597 :   6028 |
| ostringstream, rdbuf       |     15 :     20 :     66 |
| istreambuf_iterator        |     30 :     71 :    624 |
=========================================================

We can see that speed differences between the testsuites
are sizeable.

Thanks, that's interesting... I'll probably stick with the
ostringstream/rdbuf version.
 
Alex Vinokur

Alex Vinokur said:
Testing "Reading contents from file into one string" using the C/C++ Perfometer.

Summary
http://groups-beta.google.com/group/perfo/msg/5801f89772746dc3
http://groups-beta.google.com/group/perfo/attach/5801f89772746dc3/perfo_summary_file2str.txt?part=2

=========================================================
|                            |        File size         |
| Testsuite                  |--------------------------|
|                            |   1000 :  10000 : 100000 |
|-------------------------------------------------------|
| getline                    |    125 :    839 :   7729 |
| vector, reading char       |     75 :    592 :   5816 |
| string, reading char       |     71 :    570 :   5626 |
| vector, reading whole file |     16 :     30 :    146 |
| mmap (UNIX)                |     13 :     18 :     30 |
| istream_iterator           |     80 :    597 :   6028 |
| ostringstream, rdbuf       |     15 :     20 :     66 |
| istreambuf_iterator        |     30 :     71 :    624 |
=========================================================

We can see that speed differences between the testsuites are sizeable.

Full raw run log:
http://groups-beta.google.com/group/log-files/msg/1f7160243fd793cb
http://groups-beta.google.com/group/log-files/attach/1f7160243fd793cb/perfo_log_file2str.txt?part=2
[snip]


Extended set of testsuites for "Reading contents from file into one string"
contains 29 testsuites:
* 4 - for C language,
* 1 - for UNIX system calls,
* 24 - for C++ language.


Testing "Reading contents from file into one string"
using the Simple C/C++ Perfometer.
---------------------------
* http://groups-beta.google.com/group/perfo/msg/8273f4d1a05cfbd1
* http://article.gmane.org/gmane.comp.lang.c++.perfometer/110
* http://permalink.gmane.org/gmane.comp.lang.c++.perfometer/110
* http://comments.gmane.org/gmane.comp.lang.c++.perfometer/110
* http://cache.gmane.org/gmane/comp/lang/c++/perfometer/110
---------------------------


Environment: Windows 2000, Cygwin
File size = 10000

Test results sorted by ascending time used
(the best of time used in binary and text openmode).

===========================================================================
| Testsuite                                             | File OpenMode   |
|-------------------------------------------------------|-----------------|
| Code      : Name                                      | binary : text   |
|-------------------------------------------------------|-----------------|
| C-04      : C-function fread, max size buffer         |     10 :   10   |
| C-03      : C-function fread, const size buffer       |     10 :   13   |
| CPP-24    : std::string and istream::read             |     10 :   16   |
| Unix-C-05 : UNIX system call mmap                     |     13 :   53   |
| CPP-05    : istream::read, ostream::write,            |     20 :   30   |
|           : const size buffer                         |        :        |
| CPP-06    : istream::read, ostream::write,            |     20 :   30   |
|           : ostringstream, const size buffer          |        :        |
| CPP-04    : ifstream::rdbuf, ostream::operator<<      |     20 :   33   |
| CPP-08    : istream::read, ostream::write             |     23 :   30   |
|           : max size buffer                           |        :        |
| CPP-03    : streambuf::sbumpc, ostream::operator<<    |     26 :   30   |
| CPP-23    : std::vector, istream::read                |     33 :   43   |
| CPP-07    : istream::readsome, ostream::write         |     53 :   60   |
|           : const size buffer                         |        :        |
| CPP-11    : istream::getline, ostringstream           |     56 :   57   |
|           : ostream::operator<<                       |     56 :   57   |
| CPP-14    : istream::get(char*, streamsize),          |     56 :   57   |
|           : ostream::operator<<, const size           |        :        |
| CPP-15    : istream::get(streambuf&), streambuf,      |    163 :   76   |
|           : ostream::operator<<                       |        :        |
| CPP-20    : istreambuf_iterator, std::string          |    193 :  167   |
| CPP-18    : istreambuf_iterator, ostreambuf_iterator, |    187 :  370   |
|           : std::copy                                 |        :        |
| CPP-19    : istreambuf_iterator, ostreambuf_iterator, |    270 :  190   |
|           : std::transform                            |        :        |
| CPP-13    : istream::get(char)                        |   1442 : 1428   |
| CPP-22    : std::vector, push_back()                  |   1502 : 1452   |
| CPP-17    : istream_iterator, std::string             |   1542 : 1592   |
| CPP-09    : std::getline, ostringstream,              |   1656 : 1619   |
|           : ostream::operator<<                       |   1656 : 1619   |
| CPP-02    : streambuf::sbumpc                         |   1625 : 1652   |
| CPP-10    : std::getline, std::string,                |   1665 : 1652   |
|           : ostream::operator<<                       |        :        |
| C-02      : C-function fgetc                          |   1939 : 1966   |
| C-01      : C-function getc                           |   2002 : 1983   |
| CPP-21    : std::vector, std::copy                    |   2827 : 2890   |
| CPP-12    : istream::get(char), ostream::put          |   2830 : 2841   |
| CPP-01    : istream::operator>>                       |   2951 : 2957   |
| CPP-16    : istream_iterator, ostream_iterator,       |   4232 : 4269   |
|           : std::copy                                 |        :        |
===========================================================================


We can see that the best method of the C++ methods is CPP-24
that uses std::string and istream::read()

====== CPP-24 method ======
string str (infile_size, '0');
infile.read(&str[0], infile_size);
return str;
===========================
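The excerpt does not show where infile_size comes from. A minimal sketch of one way to fill that in, assuming the size is obtained by seeking to the end of a file opened in binary mode (the function name `read_whole_file` is hypothetical). Seeking generally does not work on pipes or an interactive stdin, so this would not cover the stdin case from the start of the thread.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns the file's contents, or an empty string
// if the file cannot be opened or its size cannot be determined.
std::string read_whole_file(const char* path)
{
    std::ifstream infile(path, std::ios_base::binary);
    if (!infile)
        return std::string();

    // Assumption: the size is taken from the end position of the file.
    infile.seekg(0, std::ios_base::end);
    const std::streamoff infile_size = infile.tellg();
    infile.seekg(0, std::ios_base::beg);
    if (infile_size <= 0)
        return std::string();

    std::string str(static_cast<std::size_t>(infile_size), '\0');
    infile.read(&str[0], static_cast<std::streamsize>(infile_size));
    // Keep only what was actually delivered, in case the read came up short.
    str.resize(static_cast<std::size_t>(infile.gcount()));
    return str;
}
```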
 
Victor Bazarov

Alex said:
[...]
We can see that the best method of the C++ methods is CPP-24
that uses std::string and istream::read()

====== CPP-24 method ======
string str (infile_size, '0');
infile.read(&str[0], infile_size);
return str;
===========================

... which probably falls back onto C method for reading a buffer of
a known size using fread. You could have just asked...
 
Alex Vinokur

Victor Bazarov said:
Alex said:
[...]
We can see that the best method of the C++ methods is CPP-24
that uses std::string and istream::read()

====== CPP-24 method ======
string str (infile_size, '0');
infile.read(&str[0], infile_size);
return str;
===========================

... which probably falls back onto C method for reading a buffer of
a known size using fread. You could have just asked...

Does ifstream have access to file-pointer of the same file?
 
Victor Bazarov

Alex said:
Victor Bazarov said:
Alex said:
[...]
We can see that the best method of the C++ methods is CPP-24
that uses std::string and istream::read()

====== CPP-24 method ======
string str (infile_size, '0');
infile.read(&str[0], infile_size);
return str;
===========================

... which probably falls back onto C method for reading a buffer of
a known size using fread. You could have just asked...


Does ifstream have access to file-pointer of the same file?

It probably does. Why?
 
Alex Vinokur

Victor Bazarov said:
Alex said:
Victor Bazarov said:
Alex Vinokur wrote:

[...]
We can see that the best method of the C++ methods is CPP-24
that uses std::string and istream::read()

====== CPP-24 method ======
string str (infile_size, '0');
infile.read(&str[0], infile_size);
return str;
===========================

... which probably falls back onto C method for reading a buffer of
a known size using fread. You could have just asked...


Does ifstream have access to file-pointer of the same file?

It probably does. Why?

But we can't do that in our C++ programs (?).
 
Victor Bazarov

Alex said:
Victor Bazarov said:
Alex said:
Alex Vinokur wrote:


[...]
We can see that the best method of the C++ methods is CPP-24
that uses std::string and istream::read()

====== CPP-24 method ======
string str (infile_size, '0');
infile.read(&str[0], infile_size);
return str;
===========================

... which probably falls back onto C method for reading a buffer of
a known size using fread. You could have just asked...


Does ifstream have access to file-pointer of the same file?

It probably does. Why?


But we can't do that in our C++ programs (?).

Do what? You can have your own file-pointer, can't you? No, you cannot
obtain the inner data of an ifstream if that's what you're asking. It is
called the "data hiding" or "data abstraction" principle. You're not
supposed to care how ifstream deals with opening files and reading its
data. It does not have to be a file-pointer. It probably is, but it does
not have to be.
 
