David Rubin
I searched Google for an answer, but found nothing short of using Boost
that sufficiently answers my question: what is a good way to tokenize a
string? (Note: I cannot use Boost.) For example, I have tried this:
#include <algorithm>
#include <cctype>
#include <climits>
#include <deque>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;

int
main()
{
    string delim;
    int c;

    /* fill delim */
    for(c=0; c < CHAR_MAX; c++){ // I tried #include <limits>, but failed...
        if((isspace(c) || ispunct(c))
           && !(c == '_' || c == '#'))
            delim += static_cast<char>(c);
    }

    string buf;
    string::size_type op, np;
    deque<string> tok;

    while(std::getline(cin, buf) && !cin.fail()){
        op = 0;
        while((np=buf.find_first_of(delim, op)) != buf.npos){
            tok.push_back(string(&buf[op], np-op));
            if((op=buf.find_first_not_of(delim, np)) == buf.npos)
                break;
        }
        tok.push_back(string(&buf[op]));
        cout << buf << endl;
        copy(tok.begin(), tok.end(), ostream_iterator<string>(cout, "\n"));
        cout << endl;
        tok.clear();
    }
    return 0;
}
The inner loop finds tokens delimited by any character in delim, where
multiple delimiters may appear between tokens (the algorithm follows some
advice found on clc++). However, the method seems a little clumsy,
especially with respect to temporary objects. It also does not seem to
work correctly: for example, the last token gets corrupted in the second
iteration of the outer loop.
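A possible explanation for the corruption, offered as a guess rather than a confirmed diagnosis: string(&buf[op]) treats the buffer as a null-terminated C string, which std::string did not guarantee through operator[] before C++11, and when a line ends in a delimiter, op is npos, so &buf[op] points outside the string. The sketch below sidesteps both by using substr instead of raw pointers. The function name split_line is hypothetical and not part of the original program.

```cpp
#include <deque>
#include <string>

// Hypothetical helper (not from the original post): splits `line` on any
// character in `delim`, treating a run of delimiters as one separator.
std::deque<std::string> split_line(const std::string& line,
                                   const std::string& delim)
{
    std::deque<std::string> tok;
    // Start at the first non-delimiter so leading delimiters
    // do not produce an empty token.
    std::string::size_type op = line.find_first_not_of(delim);
    while (op != std::string::npos) {
        // End of the current token, or npos if it runs to the end of the line.
        std::string::size_type np = line.find_first_of(delim, op);
        // substr(op, npos) means "everything from op to the end".
        tok.push_back(line.substr(op, np == std::string::npos ? np : np - op));
        if (np == std::string::npos)
            break;
        // Skip the delimiters that follow the token just stored.
        op = line.find_first_not_of(delim, np);
    }
    return tok;
}
```

Because op is checked against npos before every use, a line that is empty, or that begins or ends with delimiters, never indexes outside the string.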
Also, it would be very nice to have a function like
int tokenize(const string& s, container<string>& c);
which returns the number of tokens inserted into the container.
However, how do you write this so that c can be any container model? I'm
not sure you can, since the containers don't share a base class. Is there
a better way?
Certainly, this is easy to do with a mix of C and C++:
for(char *t=strtok(buf, delim); t != 0; t=strtok(0, delim))
    tok.push_back(t);
where buf and delim are essentially char*'s. However, this seems
unsatisfactory as well.
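For reference, here is a sketch of how the strtok loop might be wrapped so it can be fed std::string arguments. The name strtok_split is hypothetical. Since strtok overwrites delimiters in its input with '\0', the sketch tokenizes a private copy of the line rather than the string itself.

```cpp
#include <cstring>
#include <deque>
#include <string>
#include <vector>

// Hypothetical wrapper around the strtok loop shown above.
std::deque<std::string> strtok_split(const std::string& line,
                                     const std::string& delim)
{
    // strtok modifies its input, so copy the line into a writable,
    // null-terminated buffer first.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');

    std::deque<std::string> tok;
    for (char* t = std::strtok(&buf[0], delim.c_str()); t != 0;
         t = std::strtok(0, delim.c_str()))
        tok.push_back(t);  // each token is copied into a std::string
    return tok;
}
```

Part of what makes this unsatisfactory is that strtok keeps its position in hidden static state, so two tokenizations cannot be interleaved.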
/david