Jacek Dziedzic
Hi!
I need a routine like:
std::string nth_word(const std::string &s, unsigned int n) {
// return n-th word from the string, n is 0-based
// if 's' contains too few words, return ""
// 'words' are any sequences of non-whitespace characters
// leading, trailing and multiple whitespace characters
// should be ignored.
// e.g. "These are four\t\twords\t\t".
}
I am currently using something like:
std::string nth_word(const std::string& source, unsigned int n) {
// the addition of " " allows for the extraction of the last
// word, after which ss would go eof() below if not for the space
std::stringstream ss(source+" "); // needs <sstream>
std::string s;
for(unsigned int k=0;k<=n;k++) {
ss >> s;
if(!ss.good()) return ""; // eof
}
return s;
}
which is fine, except it performs poorly. Before I'm flamed
with accusations of premature optimization, let me tell you
that I profiled my code and over 50% of time is spent in this
routine. This does not surprise me -- I am extracting words
from text files on the order of gigabytes and it takes annoyingly
long...
I'm thinking of a combination of find_first_not_of and
find_first_of, but before I code it, perhaps somebody can
comment on this? I have a gut feeling that some nasty
strtok hack would be even faster, wouldn't it? Or is there
perhaps some other, performance-oriented way like traversing
s.c_str() with a pointer and memcpying out the relevant part?
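To make the first idea concrete, here is a rough, unbenchmarked sketch of the find_first_not_of / find_first_of approach. The name nth_word_find and the exact whitespace set are assumptions made for illustration only, and whether this actually beats the stringstream version would have to be measured:

```cpp
#include <string>

// Hypothetical sketch (name and whitespace set are assumptions):
// returns the n-th (0-based) whitespace-delimited word of s,
// or "" if s contains too few words.
std::string nth_word_find(const std::string& s, unsigned int n) {
    static const char* const ws = " \t\n\r\f\v";  // assumed whitespace characters
    const std::string::size_type npos = std::string::npos;
    std::string::size_type pos = 0;
    for (unsigned int k = 0; ; ++k) {
        // skip leading or repeated whitespace to find the start of word k
        const std::string::size_type start = s.find_first_not_of(ws, pos);
        if (start == npos) return "";  // ran out of words
        // find the first whitespace character after word k (npos if none)
        const std::string::size_type end = s.find_first_of(ws, start);
        if (k == n) {
            return s.substr(start, end == npos ? npos : end - start);
        }
        if (end == npos) return "";  // word k was the last one, and k < n
        pos = end;
    }
}
```

Unlike the stringstream version, this copies only the word that is returned, and it does not need the trailing " " appended to the source.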
TIA,
- J.