Tokens in a string

gervaz

Hi all, I need to search for tokens in a string, can you suggest some
function?
Basically, in a string like "This is a test string. It's a test ok?",
given some token, I would like to split the string into "This is a ",
" string. It's a " and " ok?", using "test" as the delimiter.

Any help?
PS no regex thanks,
Mattia
 
Victor Bazarov

Hi all, I need to search for tokens in a string, can you suggest some
function?
Basically, in a string like "This is a test string. It's a test ok?",
given some token, I would like to split the string into "This is a ",
" string. It's a " and " ok?", using "test" as the delimiter.

Any help?

Have you looked at the interface of 'std::string' (std::basic_string
template specialization on char)? There is 'find', there is
'find_first_of', 'find_last_of'... Those could be just what you need.
Essentially you just let the standard library find the token for you and
then split the original string (by using 'substr') into pieces.
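
A minimal sketch of how those members differ, since it matters when the token is longer than one character: 'find' looks for the whole substring, while 'find_first_of' stops at any single character from the set. The helper names below are made up for illustration and are not part of std::string.

```cpp
#include <string>

// Hypothetical helper: index of the first occurrence of the whole token,
// or std::string::npos if the token does not occur.
std::string::size_type whole_token_pos(const std::string& s, const std::string& token)
{
    return s.find(token);            // "test" is matched as a unit
}

// Hypothetical helper: index of the first character that is ANY of `chars`.
std::string::size_type any_char_pos(const std::string& s, const std::string& chars)
{
    return s.find_first_of(chars);   // 't', 'e' or 's' each match on their own
}

// Hypothetical helper: the text in front of the first occurrence of the token.
// substr(0, npos) yields the whole string, so a missing token returns `s`.
std::string before_token(const std::string& s, const std::string& token)
{
    return s.substr(0, s.find(token));
}
```

With the string from the question and the token "test", whole_token_pos gives 10 (the start of the first "test"), whereas any_char_pos gives 3 (the 's' in "This"), so 'find' is probably the one wanted for a word-sized delimiter.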

There are probably ready-to-wear tokenizers out there, too. I don't
have a link ('cept for www.google.com <g>).

V
 
Saeed Amrollahi

Hi all, I need to search for tokens in a string, can you suggest some
function?
Basically, in a string like "This is a test string. It's a test ok?",
given some token, I would like to split the string into "This is a ",
" string. It's a " and " ok?", using "test" as the delimiter.

Any help?
PS no regex thanks,
Mattia

Hi
Besides what Victor wrote, there is strtok, which was inherited from
C Programming language. Also, there is a string tokenizer from Boost
library.
For strtok, check MSDN on Windows; on Linux use the "man 3 strtok"
command.
For the Boost tokenizer, check www.boost.org

Regards,
-- Saeed Amrollahi
 
gervaz

Thanks for your help, I've come up with:

std::string ex = "This is a very simple string a test what a very "
                 "simple test! simple";
std::string f = "simple";

auto pos = ex.find(f);

while (pos != std::string::npos)
{
    // Print the piece in front of the delimiter, then drop it from ex.
    std::string split = ex.substr(0, pos);
    std::cout << split << std::endl;
    ex = ex.substr(pos + f.size(), ex.size());
    pos = ex.find(f);
}

// Whatever is left after the last delimiter.
if (ex.size() != 0)
    std::cout << ex;

--
 
Default User

Besides what Victor wrote, there is strtok, which was inherited from
C Programming language.

strtok() isn't going to work with a string as a delimiter. It TAKES a
C-string, but uses the individual characters as delimiters.
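
To make that concrete, here is a small sketch wrapping std::strtok. The wrapper name strtok_split is hypothetical; it copies the input first because strtok writes terminators into the buffer it is given.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenise with std::strtok. `delims` is a SET of delimiter characters,
// not a delimiter string. strtok modifies its input, so work on a copy.
// (strtok also keeps hidden static state, so this is not reentrant.)
std::vector<std::string> strtok_split(const std::string& input, const char* delims)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling strtok_split("It's a test ok?", "test") yields "I", "'", " a " and " ok?": the string is cut at every 't', 'e' or 's', not only at the word "test".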



Brian
 
gervaz

BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

Thanks,
Mattia
 
Default User

BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

Please don't top-post; replies belong after the quotes.


Here's a routine I wrote some time back. It creates a vector of strings.

void Explode(const std::string &inString,
             std::vector<std::string> &outVector,
             const std::string &separator)
{
    std::string::size_type start = 0;
    std::string::size_type end = 0;

    while ((end=inString.find(separator, start)) != std::string::npos)
    {
        outVector.push_back(inString.substr (start, end-start));
        start = end+separator.size();
    }

    // std::cout << start << "\n";
    outVector.push_back(inString.substr (start));
}
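
One edge case worth keeping in mind with a find-based loop like this: an empty separator matches at `start` without advancing, so the loop would never terminate. Below is a sketch of the same loop with a guard. The name explode_checked, the return-by-value signature, and the choice to treat an empty separator as "no split" are assumptions of this sketch, not part of the routine above.

```cpp
#include <string>
#include <vector>

// Same find/substr loop, returning the vector instead of filling a parameter.
// Assumption: an empty separator means "do not split at all".
std::vector<std::string> explode_checked(const std::string& in, const std::string& separator)
{
    std::vector<std::string> out;
    if (separator.empty()) {          // find("") matches at `start` forever
        out.push_back(in);
        return out;
    }

    std::string::size_type start = 0;
    std::string::size_type end = 0;
    while ((end = in.find(separator, start)) != std::string::npos) {
        out.push_back(in.substr(start, end - start));
        start = end + separator.size();
    }
    out.push_back(in.substr(start));  // trailing piece, possibly empty
    return out;
}
```

Adjacent or trailing separators produce empty strings here, so "a,,b," split on "," gives four pieces: "a", "", "b" and "".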



Brian
 
Paul Bibbings

gervaz said:
BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

To adapt the code you include in your previous post so that you "do not
every time have to create/shrink the previous string" you might do
something like this:

08:47:49 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $cat tok_in_string.cpp
// file: tok_in_string.cpp

#include <string>
#include <iostream>

int main()
{
    std::string ex = "This is a very simple string a test what a "
                     "very simple test! simple";
    std::string f = "simple";

    std::size_t old_pos = 0;
    std::size_t pos = ex.find(f);

    while (pos != std::string::npos)
    {
        std::cout << ex.substr(old_pos, pos - old_pos) << std::endl;
        old_pos = pos + f.size();
        pos = ex.find(f, old_pos);
    }

    if (old_pos < ex.size())
        std::cout << ex.substr(old_pos) << std::endl;
}

08:47:55 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $./tok_in_string
This is a very
string a test what a very
test!

08:48:00 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $

Regards

Paul Bibbings
 
James Kanze

On 6/7/2010 12:09 PM, gervaz wrote:
Have you looked at the interface of 'std::string'
(std::basic_string template specialization on char)? There is
'find', there is 'find_first_of', 'find_last_of'... Those
could be just what you need. Essentially you just let the
standard library find the token for you and then split the
original string (by using 'substr') into pieces.

I generally prefer to use algorithms in the standard library for
this: something less to learn, since they're what I also use
with std::vector, etc.; std::find, std::find_if,
std::find_first_of and std::search are probably the most useful.
Once you've got iterators to the start and end of whatever
you're looking for, then the two iterator constructor of
std::string gives you the string. (I'd also create a series of
predicate objects around std::locale, for testing character
categories.)
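
As a rough illustration of that approach for a multi-character token, here is a sketch built on std::search and the two-iterator std::string constructor. The function name split_search and the handling of an empty separator are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Split `input` at every occurrence of the whole string `separ`,
// using std::search to locate it and iterator pairs to build the pieces.
std::vector<std::string> split_search(const std::string& input, const std::string& separ)
{
    std::vector<std::string> results;
    if (separ.empty()) {              // assumption: empty separator means no split
        results.push_back(input);
        return results;
    }

    const std::string::const_iterator end = input.end();
    std::string::const_iterator current = input.begin();
    std::string::const_iterator next =
        std::search(current, end, separ.begin(), separ.end());
    while (next != end) {
        results.push_back(std::string(current, next));
        current = next + separ.size();   // step over the separator itself
        next = std::search(current, end, separ.begin(), separ.end());
    }
    results.push_back(std::string(current, end));
    return results;
}
```

For the string from the original question and the separator "test", this returns "This is a ", " string. It's a " and " ok?". No intermediate copies of the input are made; only the resulting pieces are constructed.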
There are probably ready-to-wear tokenizers out there, too.
I don't have a link ('cept for www.google.com <g>).

Boost::regex is a pretty powerful tool for this sort of thing,
and will be part of the next release of the standard.
Otherwise, with a little bit of work, you can use flex pretty
effectively.
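
For readers who do not share the original poster's "no regex" constraint, a sketch of the regex route follows. It uses the standard <regex> header that later shipped with C++11 rather than Boost, which is a substitution made here so the example has no external dependency; the function name regex_split is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split `input` at every match of `pattern` (an ECMAScript regex by default).
std::vector<std::string> regex_split(const std::string& input, const std::string& pattern)
{
    const std::regex re(pattern);     // must outlive the iterators below
    std::vector<std::string> pieces;

    // Submatch index -1 selects the text BETWEEN matches instead of the matches.
    for (std::sregex_token_iterator it(input.begin(), input.end(), re, -1), last;
         it != last; ++it)
    {
        pieces.push_back(it->str());
    }
    return pieces;
}
```

With the pattern "test" this gives the same three pieces as a plain substring search on the example string; the regex only pays off once the delimiter is a real pattern, such as "\\s+".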
 
James Kanze

BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

Yes. Something along the lines of the following, for example:

std::vector<std::string>
tokenize(std::string const& input, char separ)
{
    std::vector<std::string> results;
    std::string::const_iterator end = input.end();
    std::string::const_iterator current = input.begin();
    std::string::const_iterator next = std::find(current, end, separ);
    while (next != end) {
        results.push_back(std::string(current, next));
        current = next + 1;
        next = std::find(current, end, separ);
    }
    results.push_back(std::string(current, end));
    return results;
}
 
gervaz

Yes.  Something along the lines of the following, for example:

        std::vector<std::string>
        tokenize(std::string const& input, char separ)
        {
                std::vector<std::string> results;
                std::string::const_iterator end = input.end();
                std::string::const_iterator current = input.begin();
                std::string::const_iterator next = std::find(current, end, separ);
                while (next != end) {
                        results.push_back(std::string(current, next));
                        current = next + 1;
                        next = std::find(current, end, separ);
                }
                results.push_back(std::string(current, end));
                return results;
        }

Thanks all, I've come up with the following (although I'll have
problems handling encodings where characters are not 8 bits...):

template<typename T>
void split(const std::string& str, T container)
{
    typedef std::string::const_iterator iter;
    const std::locale loc("");

    iter i = str.begin();

    while (i != str.end())
    {
        i = std::find_if(i, str.end(),
                         [&loc](char c) { return !std::isspace(c, loc); });

        iter j = std::find_if(i, str.end(),
                              [&loc](char c) { return std::isspace(c, loc); });

        if (i != str.end())
            *container++ = std::string(i, j);

        i = j;
    }
}

template<typename T>
void split(const std::string& str, const std::string& sep, T container)
{
    std::size_t start = 0;
    std::size_t end = 0;

    while ((end = str.find(sep, start)) != std::string::npos)
    {
        *container++ = str.substr(start, end - start);
        start = end + sep.size();
    }

    if (start < str.size())
        *container++ = str.substr(start);
}
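
Note that the parameter named `container` is used as an output iterator (it is dereferenced and incremented), so a caller would pass something like std::back_inserter rather than the container itself. A sketch of a call site follows, with the separator overload repeated so the block compiles alone; the template parameter is renamed OutputIt for clarity, and split_to_vector is a hypothetical convenience wrapper.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// The separator-based split from the post, repeated here so this compiles alone.
template<typename OutputIt>
void split(const std::string& str, const std::string& sep, OutputIt out)
{
    std::size_t start = 0;
    std::size_t end = 0;

    while ((end = str.find(sep, start)) != std::string::npos)
    {
        *out++ = str.substr(start, end - start);
        start = end + sep.size();
    }

    if (start < str.size())           // a trailing empty piece is dropped
        *out++ = str.substr(start);
}

// Hypothetical wrapper: collect the pieces into a vector.
std::vector<std::string> split_to_vector(const std::string& str, const std::string& sep)
{
    std::vector<std::string> pieces;
    split(str, sep, std::back_inserter(pieces));
    return pieces;
}
```

Because the sink is an output iterator, the same function could write into a std::list or a std::deque through std::back_inserter, or straight to a stream through std::ostream_iterator, without any change to split itself.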
 
