Tokenize string, homework

W

WP

Hello! I need some help with my program...it's supposed to read infix
expressions line by line from stdin and each expression should be
divided into operands and operators and added to a vector of strings. So
if we read one line that holds "1+2" the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and / meaning they are of length 1.
Valid operands are ints >= 0 meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped.
It can be assumed no invalid input occurs in the string (something that
is not an operand or an operator (according to the above rules) and is
not whitespace but we must check that before each operator we have an
operand.

I made program that tests four hard-coded strings (one valid and three
invalid) and it seems to work for that. I'm posting here because I want
to see if I could get some improvement tips. Note that I have an
assertion for invalid chars which is not needed for the assignment but I
added it to catch bugs.

Here's the program, it's a bit long but I wanted to post something
complete. Oh, also, I may not use anything that is not available in the
C++ standard.

#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <vector>

using namespace std;

string char_to_string(char c)
{
string s;
s += c;

return s;
}

bool is_operator(char c)
{
static const string valid_operators("+-*/");

return valid_operators.find(c) != string::npos;
}

bool is_operand(char c)
{
istringstream iss(char_to_string(c));
int n;

return iss >> n;
}

void divide_and_print(const string& infix);

int main()
{
const size_t dim = 4;
string infix_expressions[dim] =
{"1+222+ 32", "1++2", "1+ +2", "+1"};

for (size_t i = 0; i < dim; ++i)
{
divide_and_print(infix_expressions);
}
}

void divide_and_print(const string& infix)
{
// This variable is used to build up the current operand char by char
string current_operand;
enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
vector<string> tokens;

for (string::const_iterator itr = infix.begin();
itr != infix.end(); ++itr)
{
if (isspace(*itr))
{
// The current char is some sort of whitespace, we skip over
// it. Note that we don't update the variable last_was,
// that's because we want to detect if we have two operators
// with no intervening operand.
;
}
else if (is_operand(*itr))
{
// The current char is an operand. We add it to current_operand
// (remember operands may stretch over several chars).
current_operand += *itr;

if ((itr + 1) == infix.end() || !is_operand(*(itr + 1)))
{
// This was the last char in the current operand so we add
// it to our vector of tokens and we also make sure we
// clear current_operand.
tokens.push_back(current_operand);
current_operand.clear();
}

last_was = OPERAND;
}
else if (is_operator(*itr))
{
// An operator may only follow non-operators.
if (last_was == OPERATOR || last_was == NOTHING)
{
cout << "There must be an operand before each operator!\n";
cout << "Invalid expression was: " << infix << endl;

return;
}

last_was = OPERATOR;

// operators are only one char long so we convert it to string
// and add it to our vector.
tokens.push_back(char_to_string(*itr));
}
else
// We encountered something that was not an operator, operand
// or ws.
assert(0);
}

for (vector<string>::size_type i = 0; i < tokens.size(); ++i)
{
cout << tokens.at(i) << '\n';
}

cout << endl;
}

If no one wants to look at all that I can understand it. :) Anyway,
thanks for reading and I appreciate any suggestions on improvement or if
anyone spots any bugs.

/WP
 
A

alan

Hello! I need some help with my program...it's supposed to read infix
expressions line by line from stdin and each expression should be
divided into operands and operators and added to a vector of strings. So
if we read one line that holds "1+2" the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and / meaning they are of length 1.
Valid operands are ints >= 0 meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped.
It can be assumed no invalid input occurs in the string (something that
is not an operand or an operator (according to the above rules) and is
not whitespace but we must check that before each operator we have an
operand.

I made program that tests four hard-coded strings (one valid and three
invalid) and it seems to work for that. I'm posting here because I want
to see if I could get some improvement tips. Note that I have an
assertion for invalid chars which is not needed for the assignment but I
added it to catch bugs.

Here's the program, it's a bit long but I wanted to post something
complete. Oh, also, I may not use anything that is not available in the
C++ standard.

#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <vector>

using namespace std;
the above is generally a bad idea, because in the future, the c++
standard may define some new function that will conflict with a
function name in your program.
string char_to_string(char c)
{
string s;
s += c;

return s;

}
string s(1,c) would do as well.
bool is_operator(char c)
{
static const string valid_operators("+-*/");

return valid_operators.find(c) != string::npos;

}
I would have used switch instead, but personal preference I guess.
bool is_operand(char c)
{
istringstream iss(char_to_string(c));
int n;

return iss >> n;

}
I would recommend using the same method as is_operator, whatever
method you decide to eventually use. In fact I would suggest using
templates with a function is_a, although sadly C++ does not support
constant strings in templated instances.
void divide_and_print(const string& infix);

int main()
{
const size_t dim = 4;
string infix_expressions[dim] =
{"1+222+ 32", "1++2", "1+ +2", "+1"};

for (size_t i = 0; i < dim; ++i)
{
divide_and_print(infix_expressions);
}

}

Why not just have it input from the user? That way you can test more
without having to keep recompiling.
void divide_and_print(const string& infix)
{
// This variable is used to build up the current operand char by char
string current_operand;
enum {NOTHING, OPERATOR, OPERAND} last_was = NOTHING;
vector<string> tokens;

for (string::const_iterator itr = infix.begin();
itr != infix.end(); ++itr)
You will eventually get problems in the future when your program needs
to process multicharacter operators said:
{
if (isspace(*itr))
{
// The current char is some sort of whitespace, we skip over
// it. Note that we don't update the variable last_was,
// that's because we want to detect if we have two operators
// with no intervening operand.
;
}
else if (is_operand(*itr))
{
// The current char is an operand. We add it to current_operand
// (remember operands may stretch over several chars).
current_operand += *itr;

if ((itr + 1) == infix.end() || !is_operand(*(itr + 1)))
{
// This was the last char in the current operand so we add
// it to our vector of tokens and we also make sure we
// clear current_operand.
tokens.push_back(current_operand);
current_operand.clear();
}

last_was = OPERAND;
}
else if (is_operator(*itr))
{
// An operator may only follow non-operators.
if (last_was == OPERATOR || last_was == NOTHING)
{
cout << "There must be an operand before each operator!\n";
cout << "Invalid expression was: " << infix << endl;

return;
}

last_was = OPERATOR;

// operators are only one char long so we convert it to string
// and add it to our vector.
tokens.push_back(char_to_string(*itr));
}
else
// We encountered something that was not an operator, operand
// or ws.
assert(0);
}

for (vector<string>::size_type i = 0; i < tokens.size(); ++i)
{
cout << tokens.at(i) << '\n';
}

cout << endl;

}

If no one wants to look at all that I can understand it. :) Anyway,
thanks for reading and I appreciate any suggestions on improvement or if
anyone spots any bugs.

/WP
You may want to also study lexing and parsing.
 
T

Tadeusz B. Kopec

Hello! I need some help with my program...it's supposed to read infix
expressions line by line from stdin and each expression should be
divided into operands and operators and added to a vector of strings. So
if we read one line that holds "1+2" the vector should afterwards hold
the strings "1", "+" and "2".
Valid operators are +, -, * and / meaning they are of length 1. Valid
operands are ints >= 0 meaning they can stretch over several chars.
There may be whitespace in the string, which should be skipped. It can
be assumed no invalid input occurs in the string (something that is not
an operand or an operator (according to the above rules) and is not
whitespace but we must check that before each operator we have an
operand.

Alan already wrote even more remarks than I would have :)
I would only add that it's strange to me that two adjacent operands are
valid (2 + 3 4).
 
D

David Harmon

On Fri, 23 Nov 2007 00:39:48 +0100 in comp.lang.c++, WP
If no one wants to look at all that I can understand it. :) Anyway,
thanks for reading and I appreciate any suggestions on improvement or if
anyone spots any bugs.

I did not spot any bugs, but if it compiles and passes the tests
then it ought to do for homework. There are many ways to approach
something like this, and I expect that many of the things I might
suggest are of no use to you; if they have not been covered in the
course so far then you are probably not supposed to use them.

Other than nitpicking, the main things I would do differently is,
where you are parsing the input, think of it more in terms of
token-by-token looping rather than char-by-char. Thus something
like:

string::const_iterator itr = infix.begin();
while (itr != infix.end()) {
char ch = *itr++;
if (!isspace(ch)) {
string current(ch, 1);
if (isdigit(ch)) {
while(itr != infix.end() && isdigit(*itr))
current += (ch = *itr++);
}
tokens.push_back(current);
}
}


Or even more likely, a get_token() function that returns the next
token from the input, one by one. That is more likely to fit into a
larger parsing scheme. Either way, I would want to separate the
tokenizing from the other functions such as the checking for
alternating operators and operands.

Review the Desk Calculator example developed in chapter 6.1 of
Stroustrup _The C++ Programming Language, Third Ed._
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,169
Latest member
ArturoOlne
Top