Reading columns in a text file

C++ Newbie

Suppose I have a text file with the input:
1 2 3 4 5 6 7 8 9 10 ! Comment: Integers 1 - 10

How do I write a C++ program that reads this line into a 10-element
vector and ignores the comment?

Thanks.
 
Christian Hackl

C++ Newbie said:
Suppose I have a text file with the input:
1 2 3 4 5 6 7 8 9 10 ! Comment: Integers 1 - 10

How do I write a C++ program that reads this line into a 10-element
vector and ignores the comment?

First of all, you need to read the line from a file, preferably using
std::ifstream and std::getline.

Now that you've got a std::string that contains the line, use
std::string's member functions to get the substring up to the start of
the comment. Hint: find() and substr() will probably be useful.
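A minimal sketch of that find()/substr() step, assuming '!' is the only comment marker. The helper name stripComment is hypothetical, not something from the thread:

```cpp
#include <string>

// Hypothetical helper: return the part of `line` before the first marker.
// If there is no marker, find() returns std::string::npos and
// substr(0, npos) yields the whole line, so no special case is needed.
std::string stripComment(const std::string& line, char marker = '!')
{
    std::string::size_type pos = line.find(marker);
    return line.substr(0, pos);
}
```

For the example line, stripComment returns "1 2 3 4 5 6 7 8 9 10 " (note the trailing blank that preceded the '!').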

In order to tokenize the remaining substring by spaces, use
boost::tokenizer and just push the tokens into a std::vector.
boost::lexical_cast can be used for string->int conversion.

(If you've never used the Boost libraries before, now is the perfect
moment to start.)
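If Boost isn't available, the tokenize-and-convert step can be sketched with the standard library alone. This is a swapped-in technique, not boost::tokenizer: it splits on blanks with find_first_not_of()/find_first_of() and uses std::stoi in place of boost::lexical_cast. It assumes the comment has already been cut off, and the name splitToInts is hypothetical:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split `text` on spaces and tabs and convert each token to int.
// std::stoi throws std::invalid_argument (or std::out_of_range) for a
// bad token; the extra check also rejects tokens such as "12abc".
std::vector<int> splitToInts(const std::string& text)
{
    const char* const blanks = " \t";
    std::vector<int> result;
    std::string::size_type start = text.find_first_not_of(blanks);
    while (start != std::string::npos) {
        std::string::size_type end = text.find_first_of(blanks, start);
        // When end is npos, substr() simply takes the rest of the string.
        std::string token = text.substr(start, end - start);
        std::size_t used = 0;
        int value = std::stoi(token, &used);
        if (used != token.size()) {
            throw std::invalid_argument("not an integer: " + token);
        }
        result.push_back(value);
        start = text.find_first_not_of(blanks, end);
    }
    return result;
}
```

Throwing on a bad token mirrors what boost::lexical_cast does (it throws boost::bad_lexical_cast), so the error handling stays explicit rather than silently stopping.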

Stay away from strtok(), atoi() and arrays.
 
Martin York

Now that you've got a std::string that contains the line, use
std::string's member functions to get the substring up to the start of
the comment. Hint: find() and substr() will probably be useful.

That sounds like an awful lot of work when streams will do all that
for you automatically.

In order to tokenize the remaining substring by spaces, use
boost::tokenizer and just push the tokens into a std::vector.
boost::lexical_cast can be used for string->int conversion.


Wow. More work, when again the streams do it automatically.

(If you've never used the Boost libraries before, now is the perfect
moment to start.)


Yep. Learn how to use Boost.
But first learn how to use the STL and the stream operators.


--------------------
// If your data file is just one line long
#include <fstream>

std::ifstream file("file");

// Read one integer.
int x;
file >> x;
// Repeat 10 times (probably in a loop).

-------------------------------------
// If your data file is line based,
// read one line into a string stream, then use the stream operators.
#include <fstream>
#include <sstream>
#include <string>

std::ifstream file("file");

// Repeat this for each line.
std::string line;
std::getline(file, line);

std::istringstream lineStream(line);

// Repeat the code above to get the integers; just use lineStream
// rather than file.
int x;
lineStream >> x;

Now learn how to use the STL to do all the above nearly automatically.
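Martin doesn't spell out the STL version, so here is one possible sketch: std::istream_iterator<int> calls operator>> repeatedly and stops at the first failure, which for the example line is the '!'. The function name readLineOfInts is hypothetical:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Read every integer on one line into a vector. Extraction stops at the
// first token that is not an integer (for example a '!' comment), so
// anything after that point, numbers included, is ignored.
std::vector<int> readLineOfInts(const std::string& line)
{
    std::istringstream lineStream(line);
    return std::vector<int>(std::istream_iterator<int>(lineStream),
                            std::istream_iterator<int>());
}
```

The catch is that it stops silently: "1 2 x 4" yields just {1, 2} with no error, so it only fits input where stopping early is acceptable.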



Stay away from strtok(), atoi() and arrays.

Yep.
 
James Kanze

How do you know there are 10 elements?

Or more to the point, does he know? And what determines what is
a comment, and what isn't?
Basically, you read integers and stuff them into a vector
until you get an error or the end of the line. If you get an
error, ignore the rest of the line.

Maybe. Until we know what the actual specification is, any
suggestions are just guesswork. If the specification says that
the '!' character starts a comment, then the simplest solution
might be to use a filtering streambuf, so that characters from
the '!' to the line end simply don't show up in the input.
Although if the syntax is otherwise line oriented, this might be
overkill, since you can use getline, as you propose later. If,
on the other hand, the syntax is 10 elements, and anything else
is a comment, you need some other approach.
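A rough sketch of the filtering streambuf idea, assuming '!' starts a comment that runs to the end of the line. The class CommentFilterBuf and the helper readIntsIgnoringComments are hypothetical names; the class overrides std::streambuf::underflow() and hands out one filtered character at a time:

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it out
#include <streambuf>
#include <vector>

// Forwards characters from another streambuf, but drops everything from
// the comment character up to (not including) the '\n', so the newline
// still arrives and the line structure is kept.
class CommentFilterBuf : public std::streambuf
{
public:
    explicit CommentFilterBuf(std::streambuf* source, char commentChar = '!')
        : source_(source), commentChar_(commentChar) {}

protected:
    // Called when the get area is empty: fetch one filtered character.
    int_type underflow() override
    {
        int_type c = source_->sbumpc();
        if (c == traits_type::to_int_type(commentChar_)) {
            // Skip the body of the comment, stopping at '\n' or EOF.
            do {
                c = source_->sbumpc();
            } while (c != traits_type::eof()
                     && c != traits_type::to_int_type('\n'));
        }
        if (c == traits_type::eof()) {
            return traits_type::eof();
        }
        buffer_ = traits_type::to_char_type(c);
        setg(&buffer_, &buffer_, &buffer_ + 1);
        return c;
    }

private:
    std::streambuf* source_;
    char commentChar_;
    char buffer_ = '\0';
};

// Read every integer from `raw`, never seeing the '!' comments.
std::vector<int> readIntsIgnoringComments(std::istream& raw)
{
    CommentFilterBuf filter(raw.rdbuf());
    std::istream in(&filter);
    std::vector<int> values;
    int x;
    while (in >> x) {
        values.push_back(x);
    }
    return values;
}
```

With the input "1 2 3 ! Comment: Integers 1 - 10\n4 5\n", the numbers inside the comment never reach operator>>, and the result is {1, 2, 3, 4, 5}.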
I strongly suggest two step processing: first, read a line
from your file, then, second, process the line you just read
to extract the individual integers (until the end of the line
or an error which should mean the end of the vector).

That's generally a good solution if the syntax is line oriented.
If the syntax says that anything following a '!' is a comment,
then it is trivial to trim anything after the first '!' from the
input line. It can be made to work in more complicated cases as
well.
 
Jim Langston

Victor said:
How do you know there are 10 elements?

Basically, you read integers and stuff them into a vector until you
get an error or the end of the line. If you get an error, ignore the
rest of the line. I strongly suggest two step processing: first,
read a line from your file, then, second, process the line you just
read to extract the individual integers (until the end of the line or
an error which should mean the end of the vector).

To read a line use the 'std::getline' function. Then define an
istringstream from the line you just read into a string, and loop
while it's "good". Read an individual int, and if successful, stuff
it into your vector. Once your istringstream is no good, proceed to
reading the next line from the file. Do that until the file has no
more lines.

One addition I would make is to peek to see whether the next character to
read is a '!' or not. If it isn't, and you get an error, I would produce a
diagnostic stating that I was expecting a number or a '!', but received
something else.
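A sketch of the two-step loop with the peek check added, assuming '!' is the only comment marker. The names parseLine and parseFile are hypothetical, and throwing std::runtime_error is just one possible way to report the diagnostic:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse one line: integers, optionally followed by a '!' comment.
std::vector<int> parseLine(const std::string& line, int lineNumber)
{
    std::istringstream in(line);
    std::vector<int> values;
    for (;;) {
        in >> std::ws;                        // skip blanks before peeking
        int next = in.peek();
        if (next == std::char_traits<char>::eof() || next == '!') {
            break;                            // end of line or start of comment
        }
        int value;
        if (!(in >> value)) {
            throw std::runtime_error("line " + std::to_string(lineNumber)
                                     + ": expected a number or '!'");
        }
        values.push_back(value);
    }
    return values;
}

// Step one: read the file line by line; step two: parse each line.
std::vector<std::vector<int>> parseFile(std::istream& file)
{
    std::vector<std::vector<int>> rows;
    std::string line;
    int lineNumber = 0;
    while (std::getline(file, line)) {
        ++lineNumber;
        rows.push_back(parseLine(line, lineNumber));
    }
    return rows;
}
```

Because each line gets its own istringstream, an error on one line cannot swallow numbers from the next, and the line number is available for the message.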
 
C++ Newbie

Martin said:
That sounds like an awful lot of work when streams will do all that
for you automatically.

Hi everyone, thanks for the replies. How does this look?

inputfile.txt
3 ! Rows
5 ! Column entries
1 2 3 4 5
6 7 8 9 0
1 2 3 4 5

fstream myfile;
myfile.open("inputfile.txt");
string inputline;
string comment_starts("!"); // Comments flagged by "!"
unsigned int offset;
getline(myfile, inputline);
offset = inputline.find(comment_starts); // Find location of comment
inputline = inputline.substr(0,offset); // Trim string
unsigned int rows = atoi(inputline.c_str());

[Repeated for columns. Sorry about using atoi; I thought it would be
OK given that there should be only 1 integer in the first two lines of
the file.]

// Read in the 2D data
int i, j;
int x[columns][rows];
for (j = 0; j < rows; j++)
{
    for (i = 0; i < columns; i++)
    {
        myfile >> x[j][i]; // Line #
    }
}

How is it that the line # correctly reads in the columns and advances
to the next row of the array when the inputfile.txt's line hits a
carriage return? By analogy, if we were writing the contents of x[j][i]
out to myfile, we would have to explicitly specify a carriage return,
i.e.:
// Write out the 2D data
for (j = 0; j < rows; j++)
{
    myfile << "\n";
    for (i = 0; i < columns; i++)
    {
        myfile << x[j][i]; // Line #
    }
}

Why is it a bad idea to use arrays? I need to store the 2D data in a
2D array for later manipulation.
 
James Kanze

Hi everyone, thanks for the replies. How does this look?
inputfile.txt
3 ! Rows
5 ! Column entries
1 2 3 4 5
6 7 8 9 0
1 2 3 4 5

Do the comments really mean what they seem to mean? That is: is
the format of the file fixed so that the first line contains a
single integer with the number of rows, the second a single
integer with the number of columns, and then come 'rows' lines,
each with 'columns' integers? And what determines what is a
comment? Anything after a '!'?

Until we know this, it's impossible to say whether your code is
right or not. If I suppose the above, however (and that empty
lines or lines containing only a comment are not allowed; IMHO,
not a good idea), then your code has a number of problems.
fstream myfile;
myfile.open("inputfile.txt");
string inputline;
string comment_starts("!"); // Comments flagged by "!"
unsigned int offset;
getline(myfile, inputline);
offset = inputline.find(comment_starts); // Find location of comment
inputline = inputline.substr(0,offset); // Trim string

Since you have to do this for every line, it really needs to be
in a separate function:

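#include <algorithm>
#include <istream>
#include <string>

// Read the next line from source and strip everything from the
// first '!' on. On failure, dest is left unchanged.
std::istream&
getInputLine( std::istream& source, std::string& dest )
{
    std::string line ;
    std::getline( source, line ) ;
    if ( source ) {
        dest = std::string(
            line.begin(),
            std::find( line.begin(), line.end(), '!' ) ) ;
    }
    return source ;
}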

(I'd actually probably have it returning a Fallible, but the
above corresponds closest to the standard idiom.)
unsigned int rows = atoi(inputline.c_str());
[Repeated for columns. Sorry about using atoi; I thought it
would be OK given that there should be only 1 integer in the
first two lines of the file.]

Except that it doesn't allow for any error handling. What
happens if the line doesn't contain an integer?

Again, I'd go with a separate function:

#include <istream>
#include <sstream>
#include <string>
#include <vector>

std::istream&
getIntegers(
    std::istream& source,
    std::vector< int >& dest,
    int count )
{
    std::string line ;
    if ( getInputLine( source, line ) ) {
        // To support "blank" lines, insert a loop with the
        // getInputLine, reading until you get either a line
        // with at least one non-blank character or an error.
        // Alternatively, the loop could be in
        // getInputLine().
        std::istringstream s( line ) ;
        std::vector< int > tmp( count ) ;
        for ( int i = 0 ; s && i < count ; ++ i ) {
            s >> tmp[ i ] ;
        }
        // All count values must have been read, and only white
        // space may follow them. Check eof first: if the last
        // value ended the line, std::ws at end of input may set
        // failbit.
        if ( s && ( s.eof() || ( s >> std::ws ).eof() ) ) {
            dest = tmp ;
        } else {
            source.setstate( std::ios::failbit ) ;
        }
    }
    return source ;
}

If you don't mind partially mangling the vector if there is an
error, you can skip the intermediate `tmp', resize dest, and
read directly to it.
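A sketch of that variant, reading straight into dest. To keep it self-contained, the comment stripping is done inline instead of calling getInputLine, and the name getIntegersInPlace is hypothetical:

```cpp
#include <algorithm>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Like getIntegers, but without the temporary: on a bad line, dest may
// be left partially overwritten.
std::istream&
getIntegersInPlace( std::istream& source, std::vector< int >& dest, int count )
{
    std::string line ;
    if ( std::getline( source, line ) ) {
        // Drop everything from the first '!' on.
        line.erase( std::find( line.begin(), line.end(), '!' ), line.end() ) ;
        std::istringstream s( line ) ;
        dest.resize( count ) ;
        for ( int i = 0 ; s && i < count ; ++ i ) {
            s >> dest[ i ] ;
        }
        // Same check as before: all values read, only blanks after them.
        if ( ! ( s && ( s.eof() || ( s >> std::ws ).eof() ) ) ) {
            source.setstate( std::ios::failbit ) ;
        }
    }
    return source ;
}
```

On a line such as "4 x 6" the call fails, but dest[0] has already been overwritten with 4; that is the mangling the temporary avoids.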
// Read in the 2D data
int i, j;
int x[columns][rows];

This isn't legal C++, and shouldn't compile. For it to be
legal, both columns and rows must be compile-time constants.
for (j = 0; j < rows; j++)
{
    for (i = 0; i < columns; i++)
    {
        myfile >> x[j][i]; // Line #
    }
}

How is it that the line # correctly reads in the columns and
advances to the next row of the array when the inputfile.txt's
line hits a carriage return?

It doesn't. By default, end of line is just white space, like
any other white space. Between each read, you skip blank space.
Your code doesn't care if the structure of the file is correct
or not.
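To see that the nested loop ignores line boundaries, here is a sketch of the same loop using std::vector instead of an array (readGrid is a hypothetical name). It reads identical values whether the numbers are on one line or spread over several:

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it out
#include <vector>

// operator>> skips every kind of white space, newlines included, so
// this loop never notices where the lines of the file end.
std::vector<std::vector<int>> readGrid(std::istream& in, int rows, int columns)
{
    std::vector<std::vector<int>> grid(rows, std::vector<int>(columns));
    for (int j = 0; j < rows; ++j) {
        for (int i = 0; i < columns; ++i) {
            in >> grid[j][i];
        }
    }
    return grid;
}
```

Reading "1 2 3 4 5 6" and "1 2\n3 4\n5\n6" as 2 rows of 3 gives the same grid; checking the layout requires reading line by line.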
By analogy, if we were writing the contents of x[j][i] out to
myfile, we would have to explicitly specify a carriage return,
i.e.:


If that's what you wanted. On output, you have to manually
insert white space; on input, it is skipped (but some separator
had better be there, or you won't be able to read the file).
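A sketch of output that writes the separators explicitly, so the file can be read back; writeGrid is a hypothetical name, and a blank between values plus a '\n' per row is just one reasonable choice:

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, handy for trying it out
#include <vector>

// On output nothing is inserted for you: write a blank between values
// and a '\n' after each row.
void writeGrid(std::ostream& out, const std::vector<std::vector<int>>& grid)
{
    for (const std::vector<int>& row : grid) {
        for (std::vector<int>::size_type i = 0; i < row.size(); ++i) {
            if (i != 0) {
                out << ' ';
            }
            out << row[i];
        }
        out << '\n';
    }
}
```

Writing {{1, 2, 3}, {4, 5, 6}} produces "1 2 3\n4 5 6\n". Without the blanks, the values would run together as "123456" and could not be read back.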
// Write out the 2D data
for (j = 0; j < rows; j++)
{
    myfile << "\n";
    for (i = 0; i < columns; i++)
    {
        myfile << x[j][i]; // Line #
    }
}

Why is it a bad idea to use arrays?

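Because they're broken in the language. They're second-class
objects, which don't behave like other objects: you can't assign
one array to another or return one from a function, and an array
passed to a function decays to a pointer to its first element.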

In your case, also, because they must have compile-time constant
dimensions.
I need to store the 2D data in a 2D array for later
manipulation.

What's wrong with `std::vector< std::vector< int > >'? If
nothing else, it will make input an order of magnitude simpler.
Using the above functions:

std::vector< int > line ;
if ( ! getIntegers( source, line, 1 )
        || line[ 0 ] < 1 ) {
    // Fatal error...
}
std::vector< int >::size_type rows = line[ 0 ] ;
if ( ! getIntegers( source, line, 1 )
        || line[ 0 ] < 1 ) {
    // Fatal error...
}
int columns = line[ 0 ] ;
std::vector< std::vector< int > >
        data ;
while ( source && data.size() != rows ) {
    getIntegers( source, line, columns ) ;
    if ( source ) {
        data.push_back( line ) ;
    }
}
if ( data.size() != rows ) {
    // Error, not enough data...
}
// Only white space may follow the data. Check eof first:
// std::ws at end of input may set failbit.
if ( ! source.eof() && ! ( source >> std::ws ).eof() ) {
    // Error, unexpected garbage at end of file
}

Of course, this all supposes that my assumptions concerning your
file format are correct. Before writing a single line of code,
you should specify the file format exactly, and program to that.
 
