Iterating through a string

S

Steve

Hi Guys,

I have a string which contains data elements separated by spaces. I also
have a function which returns the number of characters from the beginning of
the string for a given number of spaces. I am using a loop with strchr for
the number of spaces, and I was just wondering if there was a more efficient
(ie faster) way of achieving this. Any ideas please? Current function code
below:

int GetNewPosition(const std::string& DataStr, const int Spaces)
{
int i = 0; // space counter
char* b = const_cast<char*>(DataStr.c_str()); // beginning of string

do {
b = strchr(b, ' '); // find next space
++i; ++b; // increment space counter and pointer
} while(i<Spaces && b != NULL);

if(b) return b - DataStr.c_str();
else return -1;
}


Any help is appreciated,
Thanks for reading,
Steve.
 
D

David Harmon

On Tue, 8 Jun 2004 22:42:36 +0100 in comp.lang.c++, "Steve"
the string for a given number of spaces. I am using a loop with strchr for
the number of spaces, and I was just wondering if there was a more efficient
(ie faster) way of achieving this.

For efficiency, avoid going over the same ground more than once, i.e.
continue from the previous position and do not restart at the beginning
of the string.

Use string::find() functions rather than .c_str() and dropping down to
low-level old library strchr().

Do not subtract one .c_str() from another. There is no guarantee that
..c_str() will point to the same buffer every time.

See if using a string splitter function would suit you better?
http://groups.google.com/[email protected]
 
D

Dave Townsend

There's probably a standard library input stream class which you can use to
suck out the pieces of data you want.


If you want to roll your own, you could create a specialized iterator object
which takes your string and lets you iterate over it. It seems that you
are always starting from the beginning when you want the next position and
count through the string to find the next one. By using an object to wrap
the
string, you could maintain this state information in the object and wouldn't
need to compute it from scratch every time.


class FooBar
{
public:
FooBar( const std::string& DataStr )
:_DataStr(DataStr), _position(0)
{
// move position to point to the first data item.
}

std::string GetDataItem( ) const
{
// return a string representing the next data item.
}
bool IsValid() const ; // return true if there are more items to return,
false otherwise.
bool Next(); // advance through datastring until end of string or next
data item is found,
// update position counter.

bool First() ; // wind back position to be at the first data item, so we
can iterate again over
// the string.

private:
const std::string _DataStr
int _position;

};

.......

FooBar myfoobar( DataStr);

for ( myfoobar.First() ; myfoobar.IsValid( DataItem); myfoobar.Next() )
{
string ds = myfoobar.GetDataItem();

}
 
D

Daniel T.

Steve said:
Hi Guys,

I have a string which contains data elements separated by spaces. I also
have a function which returns the number of characters from the beginning of
the string for a given number of spaces. I am using a loop with strchr for
the number of spaces, and I was just wondering if there was a more efficient
(ie faster) way of achieving this. Any ideas please? Current function code
below:

int GetNewPosition(const std::string& DataStr, const int Spaces)
{
int i = 0; // space counter
char* b = const_cast<char*>(DataStr.c_str()); // beginning of string

do {
b = strchr(b, ' '); // find next space
++i; ++b; // increment space counter and pointer
} while(i<Spaces && b != NULL);

if(b) return b - DataStr.c_str();
else return -1;
}

I don't think your code does what you want it to do, but since you
didn't show what you wanted, I'm not sure.

I wrote this:

int main() {
string elements( "aaa bbb ccc ddd" );
for ( int x = 0; x < 10; ++x ) {
int pos = GetNewPosition( elements, x );
cout << pos << endl;
}
}

and got this output:

4
4
8
12
-654603

*crash*

I'm not entirely sure what your code is supposed to do, but this
function may be more in line with what you are looking for:

int GetNewPosition(const std::string& DataStr, int Spaces)
{
assert( Spaces >= 0 );
int result = -1;
unsigned working = DataStr.find_first_not_of( ' ' );
// strictly speaking 'working != string::npos' in the
// below isn't necessary. It makes the code more efficient.
while ( working != string::npos && Spaces-- ) {
working = DataStr.find_first_not_of( ' ',
DataStr.find( ' ', working ) );
}
assert( working == string::npos || working <= INT_MAX );
if ( working != string::npos )
result = working;
return result;
}

On most computers, I could have done without the 'working' variable,
instead using the 'result' variable. However, from what I understand -1
== string::npos is not guaranteed to be true.

Also note the two asserts in the function. They check the assumptions
made in the algorighm (a) Spaces isn't negitive (it may work with a
negitive value for Spaces, but I didn't test it, (b) that the position
of the char we are looking for is small enough to fit in an int (what
would it take to generate a string that would cause this assert to fire?)
 
D

Daniel T.

Dave Townsend said:
If you want to roll your own, you could create a specialized iterator object
which takes your string and lets you iterate over it.

If the OP decides to do this, I think he would be better off if he
followed the standard iterator concept.

class FooBar
{
public:
FooBar( const std::string& DataStr )
:_DataStr(DataStr), _position(0)
{
// move position to point to the first data item.
}

std::string GetDataItem( ) const
{
// return a string representing the next data item.
}
bool IsValid() const ; // return true if there are more items to return,
false otherwise.
bool Next(); // advance through datastring until end of string or next
data item is found,
// update position counter.

bool First() ; // wind back position to be at the first data item, so we
can iterate again over
// the string.

private:
const std::string _DataStr
int _position;

};


The class below is a standard input iterator:

class word_iterator:
public std::iterator<std::input_iterator_tag, std::string>
{
friend bool operator==( const word_iterator& lhs,
const word_iterator& rhs );
friend std::eek:stream& operator<<( std::eek:stream& lhs,
const word_iterator& rhs ); // for testing only

const std::string* str;
std::string::size_type pos;

public:
word_iterator( const std::string* s = 0 ):
str( s && !s->empty() ? s: 0 ),
pos(std::string::npos) {
if ( s )
pos = s->find_first_not_of( ' ' );
}

const std::string operator*() const {
assert( pos != std::string::npos );
unsigned len = str->find( ' ', pos );
if ( len < std::string::npos ) len -= pos;
return str->substr( pos, len );
}

const std::string* operator->() const {
// is there a better way to write this function?
static std::string bit;
bit = **this;
return &bit;
}

const word_iterator& operator++() {
assert( pos != std::string::npos );
if ( pos != std::string::npos )
pos = str->find_first_not_of( ' ', str->find( ' ', pos ) );
return *this;
}

const word_iterator operator++(int) {
word_iterator tmp( *this );
++(*this);
return tmp;
}
};

bool operator==( const word_iterator& lhs, const word_iterator& rhs ) {
// comparing iterators from two different containers is undefined
return lhs.pos == rhs.pos;
}

bool operator!=( const word_iterator& lhs, const word_iterator& rhs ) {
return !( lhs == rhs );
}

std::eek:stream& operator<<( std::eek:stream& lhs, const word_iterator& rhs ) {
// for testing purposes only
lhs << "Pos: " << rhs.pos << " of string: '"
<< ((rhs.str) ? *rhs.str : std::string("NULL")) << "'";
return lhs;
}
FooBar myfoobar( DataStr);

for ( myfoobar.First() ; myfoobar.IsValid( DataItem); myfoobar.Next() )
{
string ds = myfoobar.GetDataItem();

}

Mine would be used like this:

string s;
//...
for ( word_iterator x( &s ); x != word_iterator(); ++x )
cout << "word: " << *x
<< " is " << x->length() << "long.\n";


Or you could use it in the standard algorithms:

string s;
//...
copy( word_iterator(&s), word_iterator(),
ostream_iterator<string>(cout, "\n"));
 
S

Siemel Naran

David Harmon said:
Use string::find() functions rather than .c_str() and dropping down to
low-level old library strchr().

The strcmp, strchr, memcmp functions may be more efficient than std::equal,
std::find because of block optimizations (ie. builtin CPU instructions to
analyze an entire block of bytes at once). In principle an implementation
may specialize std::find etc to call strchr, but I have yet to find one.
 
S

Siemel Naran

int GetNewPosition(const std::string& DataStr, const int Spaces)
{
int i = 0; // space counter
char* b = const_cast<char*>(DataStr.c_str()); // beginning of string

do {
b = strchr(b, ' '); // find next space
++i; ++b; // increment space counter and pointer
} while(i<Spaces && b != NULL);

if(b) return b - DataStr.c_str();
else return -1;
}

In addition to the other points, what happens when strchr returns NULL?
Your code does ++b, which may (and should) crash the program.

As Dave points out, you should store the result of the first DataStr.c_str
in a pointer, and avoid calling the function a second time as it is allowed
to return a different pointer.
 
J

John Harrison

Steve said:
Hi Guys,

I have a string which contains data elements separated by spaces. I also
have a function which returns the number of characters from the beginning of
the string for a given number of spaces. I am using a loop with strchr for
the number of spaces, and I was just wondering if there was a more efficient
(ie faster) way of achieving this. Any ideas please? Current function code
below:

int GetNewPosition(const std::string& DataStr, const int Spaces)
{
int i = 0; // space counter
char* b = const_cast<char*>(DataStr.c_str()); // beginning of string

do {
b = strchr(b, ' '); // find next space
++i; ++b; // increment space counter and pointer
} while(i<Spaces && b != NULL);

if(b) return b - DataStr.c_str();
else return -1;
}


Your current function is bugged. If there are no spaces, then b == NULL,
then you increment b, then you test it for NULL. That isn't going to work.
You also have a completely gratuitous const_cast in your code.

It's quite a good example of how newbies worry about efficiency without
other more important considerations such as writing working code, and
writing clear code.

However strchr is probably the fastest way to do it, however as with all
these questions there is no official answer as to which method is fastest.
The C++ standard makes no mention of which functions or techniques are
faster than others, the only way to tell for sure is try different methods
and time them.

Here's my version, completely untested, so apologies in advance for any
bugs.

int GetNewPosition(const std::string& DataStr, int Spaces)
{
const char* start = DataStr.c_str();
const char* p = start;
for (int i = 0; i < Spaces; ++i)
{
if (p == NULL)
return -1;
p = strchr(p + 1, ' ');
}
return p - start;
}

john
 
S

Siemel Naran

You also have a completely gratuitous const_cast in your code.

This last point seems fine to me. The original object was obviously
non-const, as it was created through new char[N] or allocator.allocate(N),
so it seems safe to use const_cast. I could be wrong though. Let me know.
I think some libraries provide only strchr(char *, char), but they should
also have strchr(const char *, char).
 
J

John Harrison

Siemel Naran said:
string
You also have a completely gratuitous const_cast in your code.

This last point seems fine to me. The original object was obviously
non-const, as it was created through new char[N] or allocator.allocate(N),
so it seems safe to use const_cast. I could be wrong though. Let me know.
I think some libraries provide only strchr(char *, char), but they should
also have strchr(const char *, char).

I agree it's safe, but its also unnecessary. By writing const_cast the OP
gives the impression that something tricky is going on in the code, which
isn't the case. Its a completely normal function which can be written
without any casts.

C++ should provide two overloads for strchr

const char* strchr(const char*, int);

and

char* strchr(char*, int);

Because C doesn't have function overloading it can only provide one version
which is the non-const correct

char* strchr(const char*, int);

Maybe that is what persuaded the OP that he needed to use a cast.

john
 
S

Steve

Thanks folks,

This exercise just tells me no matter how long I think I've been doing C++,
I'll always be a n00b! Fancy overlooking the "case of NULL", D'Oh!!

Thanks for the code advice too, it's appreciated...though I'm beginning to
think It'd be better for the world if I gave up programming ;)

Steve.
 
D

Dave Townsend

With regard to the first comment I originally posted, I thought there might
be a solution
using streams to parse the words out of a string, I looked a bit closer,
below is
my attempt. This works fine apart from one detail, perhaps somebody might
jump in, there doesn't seem to be a way to specify the separator chars of
the original stream, in the example, the last word is returned as "dog." not
"dog"
Other than that, its a wonderfully simple snippet of code, praise to STL!

I've not benchmarked this code, but I presume its going to be relatively
efficient
since the iterator will maintain the position in the stringstream as the
words are pulled out,
there will be some extra overhead I suspect from using the string class as
opposed to
plain characters, that's life.

The question is then, how to specify the separator/whitespace for the
stream.


#include <iostream>
#include <sstream>
#include <iterator>

using namespace std;


int main(int argc, char* argv[])
{
istringstream is("the quick brown fox jumped over the lazy dog.");

istream_iterator<string> is_iter(is);
istream_iterator<string> end;

for ( ; is_iter != end; ++is_iter)
{
string nextword = *is_iter;
cout << "next word is \"" << nextword << "\"\n";
}
return 0;
}
 
D

Daniel T.

John Harrison said:
Your current function is bugged. If there are no spaces, then b == NULL,
then you increment b, then you test it for NULL. That isn't going to work.
You also have a completely gratuitous const_cast in your code.

It's quite a good example of how newbies worry about efficiency without
other more important considerations such as writing working code, and
writing clear code.

However strchr is probably the fastest way to do it, however as with all
these questions there is no official answer as to which method is fastest.
The C++ standard makes no mention of which functions or techniques are
faster than others, the only way to tell for sure is try different methods
and time them.

Here's my version, completely untested, so apologies in advance for any
bugs.

int GetNewPosition(const std::string& DataStr, int Spaces)
{
const char* start = DataStr.c_str();
const char* p = start;
for (int i = 0; i < Spaces; ++i)
{
if (p == NULL)
return -1;
p = strchr(p + 1, ' ');
}
return p - start;
}

I suspect your code above doesn't do what the OP wanted...

GetNewPosition("", 0) should return -1 if I understand what he is trying
to do, yours returns 0. Care to try again? :)
 
D

Daniel T.

Dave Townsend said:
With regard to the first comment I originally posted, I thought there might
be a solution
using streams to parse the words out of a string, I looked a bit closer,
below is
my attempt. This works fine apart from one detail, perhaps somebody might
jump in, there doesn't seem to be a way to specify the separator chars of
the original stream, in the example, the last word is returned as "dog." not
"dog"

That wasn't a requirement anyway.

Other than that, its a wonderfully simple snippet of code, praise to STL!

I've not benchmarked this code, but I presume its going to be relatively
efficient
since the iterator will maintain the position in the stringstream as the
words are pulled out,
there will be some extra overhead I suspect from using the string class as
opposed to
plain characters, that's life.

The question is then, how to specify the separator/whitespace for the
stream.

Good question...

#include <iostream>
#include <sstream>
#include <iterator>

using namespace std;


int main(int argc, char* argv[])
{
istringstream is("the quick brown fox jumped over the lazy dog.");

istream_iterator<string> is_iter(is);
istream_iterator<string> end;

for ( ; is_iter != end; ++is_iter)
{
string nextword = *is_iter;
cout << "next word is \"" << nextword << "\"\n";
}
return 0;
}

Your idea skips over all whitespace, not just spaces as the OP's code
did. I don't know if he would have a problem with that or not...

Otherwise, this works just as well as my custom iterator. What I did
would only be useful if one also implemented op-- (so you could go both
forward and backward) and/or a non-const version of op* (so you could
replace words.)
 
S

Siemel Naran

C++ should provide two overloads for strchr

const char* strchr(const char*, int);

and

char* strchr(char*, int);

Good point. Because the OP is using std::string it's clear he's using C++.
 
J

John Harrison

Siemel Naran said:
Good point. Because the OP is using std::string it's clear he's using C++.

Posting to comp.lang.c++ should also mean C++, although that isn't always
the case. Some posters profess to be coding in a mysterious language called
C/C++.

john
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,043
Latest member
CannalabsCBDReview

Latest Threads

Top