Tokens in a string

gervaz · Jun 7, 2010

Hi all, I need to search for tokens in a string. Can you suggest a
function?
Basically, given a string like "This is a test string. It's a test ok?"
and some token, I would like to split the string into "This is a ",
" string. It's a " and " ok?", using "test" as the delimiter.

Any help?
PS: no regex, thanks,
Mattia

Victor Bazarov · Jun 7, 2010

Have you looked at the interface of 'std::string' (std::basic_string
template specialization on char)? There is 'find', there is
'find_first_of', 'find_last_of'... Those could be just what you need.
Essentially you just let the standard library find the token for you and
then split the original string (by using 'substr') into pieces.
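Because the original question uses a multi-character delimiter, the choice among these members matters: 'find' searches for the whole substring, while 'find_first_of' stops at the first character that appears anywhere in its argument. A minimal sketch of the difference (the helper names 'delimiter_positions' and 'head_before' are hypothetical):

```cpp
#include <string>
#include <utility>

// Hypothetical helper: where each member function thinks the delimiter starts.
//   first  -> std::string::find, which matches the whole substring `delim`
//   second -> std::string::find_first_of, which matches ANY single character of `delim`
std::pair<std::string::size_type, std::string::size_type>
delimiter_positions(const std::string& text, const std::string& delim)
{
    return { text.find(delim), text.find_first_of(delim) };
}

// Hypothetical helper: the piece of `text` before the first full occurrence of
// `delim`, cut out with substr(). If `delim` is absent, find() returns npos and
// substr(0, npos) yields the whole string.
std::string head_before(const std::string& text, const std::string& delim)
{
    return text.substr(0, text.find(delim));
}
```

For the example sentence, 'find' reports position 10 (the start of "test"), while 'find_first_of' reports 3, the 's' in "This".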

There are probably ready-to-wear tokenizers out there, too. I don't
have a link ('cept for www.google.com <g>).

V

Saeed Amrollahi · Jun 7, 2010

Hi
Besides what Victor wrote, there is strtok, which was inherited from
the C programming language. There is also a string tokenizer in the
Boost library.
For strtok, check MSDN on Windows, or use the "man 3 strtok"
command on Linux.
For the Boost tokenizer, check www.boost.org

Regards,
-- Saeed Amrollahi

gervaz · Jun 7, 2010

Thanks for your help, I've come up with:

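#include <iostream>
#include <string>

int main()
{
    // Adjacent string literals are concatenated; a single literal
    // cannot span a line break.
    std::string ex = "This is a very simple string a test what a very "
                     "simple test! simple";
    std::string f = "simple";

    auto pos = ex.find(f);

    while (pos != std::string::npos)
    {
        std::string split = ex.substr(0, pos);
        std::cout << split << std::endl;
        ex = ex.substr(pos + f.size(), ex.size()); // drop piece + delimiter
        pos = ex.find(f);
    }

    if (ex.size() != 0)
        std::cout << ex;
}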

Default User · Jun 7, 2010

Saeed Amrollahi said:
Besides what Victor wrote, there is strtok, which was inherited from
the C programming language.

strtok() isn't going to work with a string as a delimiter. It TAKES a
C-string, but uses the individual characters as delimiters.
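A small sketch illustrating this with the original example. 'strtok_pieces' is a hypothetical helper; it copies the input first because strtok writes '\0' characters into the buffer it is given:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: collect every token std::strtok returns for `delims`.
// strtok modifies its input, so it works on a private, writable copy.
std::vector<std::string> strtok_pieces(const std::string& text, const char* delims)
{
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');                         // strtok needs a NUL-terminated buffer

    std::vector<std::string> pieces;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        pieces.emplace_back(tok);                // each of 't', 'e', 's' ends a token
    }
    return pieces;
}
```

With "test" as the delimiter set, the sentence comes back as eight pieces: "Thi", " i", " a ", " ", "ring. I", "'", " a ", " ok?". Every 't', 'e' and 's' ends a token, and runs of them are collapsed.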

Brian

gervaz · Jun 7, 2010

BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

Thanks,
Mattia

Default User · Jun 8, 2010

gervaz said:
BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

Please don't top-post; replies belong after the quoted text.

Here's a routine I wrote some time back. It creates a vector of strings.

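#include <string>
#include <vector>

// Splits inString on every occurrence of separator (which must not be
// empty, or the loop never advances) and appends the pieces to outVector.
void Explode(const std::string &inString, std::vector<std::string> &outVector,
             const std::string &separator)
{
    std::string::size_type start = 0;
    std::string::size_type end = 0;

    // Each find() resumes after the previous separator; the input is never re-copied.
    while ((end = inString.find(separator, start)) != std::string::npos)
    {
        outVector.push_back(inString.substr(start, end - start));
        start = end + separator.size();
    }

    // Whatever follows the last separator (possibly an empty string).
    outVector.push_back(inString.substr(start));
}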

Brian

Paul Bibbings · Jun 8, 2010

gervaz said:
BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

To adapt the code you included in your previous post so that you "do not
every time have to create/shrink the previous string", you might do
something like this:

08:47:49 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $cat tok_in_string.cpp
// file: tok_in_string.cpp

#include <string>
#include <iostream>

int main()
{
    std::string ex = "This is a very simple string a test what a "
                     "very simple test! simple";
    std::string f = "simple";

    std::size_t old_pos = 0;
    std::size_t pos = ex.find(f);

    while (pos != std::string::npos)
    {
        std::cout << ex.substr(old_pos, pos - old_pos) << std::endl;
        old_pos = pos + f.size();
        pos = ex.find(f, old_pos);
    }

    if (old_pos < ex.size())
        std::cout << ex.substr(old_pos) << std::endl;
}

08:47:55 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $./tok_in_string
This is a very
string a test what a very
test!

08:48:00 Paul Bibbings@JIJOU
/cygdrive/d/CPPProjects/CLCPP $

Regards

Paul Bibbings

James Kanze · Jun 8, 2010

Victor Bazarov said (replying to gervaz's post of 6/7/2010 12:09 PM):

Have you looked at the interface of 'std::string'
(std::basic_string template specialization on char)? There is
'find', there is 'find_first_of', 'find_last_of'... Those
could be just what you need. Essentially you just let the
standard library find the token for you and then split the
original string (by using 'substr') into pieces.

I generally prefer to use algorithms in the standard library for
this: something less to learn, since they're what I also use
with std::vector, etc.; std::find, std::find_if,
std::find_first_of and std::search are probably the most useful.
Once you've got iterators to the start and end of whatever
you're looking for, then the two-iterator constructor of
std::string gives you the string. (I'd also create a series of
predicate objects around std::locale, for testing character
categories.)
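A sketch of that approach applied to the original question's multi-character separator, using std::search and the two-iterator constructor. The name 'split_on' is hypothetical, and treating an empty separator as "no split" is an assumption, not something discussed in the thread:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: split `input` on every occurrence of the multi-character
// separator `separ`, using std::search to find it and the two-iterator
// std::string constructor to copy each piece. Adjacent separators give empty pieces.
std::vector<std::string> split_on(const std::string& input, const std::string& separ)
{
    std::vector<std::string> results;
    if (separ.empty()) {                 // assumption: an empty separator means "no split"
        results.push_back(input);
        return results;
    }

    std::string::const_iterator current = input.begin();
    const std::string::const_iterator end = input.end();
    for (;;) {
        std::string::const_iterator next =
            std::search(current, end, separ.begin(), separ.end());
        results.push_back(std::string(current, next));   // piece before the separator
        if (next == end)
            break;                                        // no separator left
        current = next + separ.size();                    // step over the separator
    }
    return results;
}
```

Splitting "This is a test string. It's a test ok?" on "test" yields "This is a ", " string. It's a " and " ok?"; adjacent separators produce empty strings, so "a--b----c" split on "--" gives "a", "b", "", "c".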

There are probably ready-to-wear tokenizers out there, too.
I don't have a link ('cept for www.google.com <g>).

Boost.Regex is a pretty powerful tool for this sort of thing,
and will be part of the next release of the standard.
Otherwise, with a little bit of work, you can use flex pretty
effectively.
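For readers who can use regular expressions (the original poster asked not to), a minimal sketch with std::regex, which C++11 standardized on the basis of Boost.Regex. 'split_regex' is a hypothetical name; a submatch index of -1 makes std::sregex_token_iterator return the text between matches rather than the matches themselves:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `input` wherever the ECMAScript regex `pattern` matches.
// Submatch -1 asks sregex_token_iterator for the pieces BETWEEN matches.
std::vector<std::string> split_regex(const std::string& input, const std::string& pattern)
{
    const std::regex re(pattern);
    std::sregex_token_iterator first(input.begin(), input.end(), re, -1);
    std::sregex_token_iterator last;             // end-of-sequence iterator
    return std::vector<std::string>(first, last); // each sub_match converts to std::string
}
```

Splitting the example sentence on "test" returns "This is a ", " string. It's a " and " ok?".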

James Kanze · Jun 8, 2010

Saeed Amrollahi said:
Besides what Victor wrote, there is strtok,

No. The strtok function is hopelessly broken, and should not be
used in any code you expect to maintain.
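The post doesn't list the problems. One commonly cited one is that strtok keeps its current position in hidden static state shared by every caller (it also writes into its argument, and the C standard does not require it to avoid data races). The hypothetical function below shows the shared-state problem with a nested loop over "key=value;key=value" records:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical demo: count how many ';'-separated records a nested strtok loop
// visits. The inner loop over '=' overwrites strtok's single hidden position,
// so the outer loop cannot resume where it left off.
int records_seen_with_nested_strtok(const std::string& records)
{
    std::vector<char> buf(records.begin(), records.end());
    buf.push_back('\0');                                 // writable C string for strtok

    int seen = 0;
    for (char* rec = std::strtok(buf.data(), ";"); rec != nullptr;
         rec = std::strtok(nullptr, ";")) {              // resumes from the INNER loop's state
        ++seen;
        for (char* field = std::strtok(rec, "="); field != nullptr;
             field = std::strtok(nullptr, "=")) {
            // ... a real program would process `field` here ...
        }
    }
    return seen;
}
```

For "a=1;b=2" it returns 1, not 2: once the inner loop reaches the end of "a=1", the shared state records "end of string", so the outer loop's next strtok(nullptr, ";") returns a null pointer and "b=2" is never visited.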

James Kanze · Jun 8, 2010

gervaz said:
BTW, is there a way to use iterators and do not every time have to
create/shrink the previous string?

Yes. Something along the lines of the following, for example:

#include <algorithm>
#include <string>
#include <vector>

std::vector<std::string>
tokenize(std::string const& input, char separ)
{
    std::vector<std::string> results;
    std::string::const_iterator end = input.end();
    std::string::const_iterator current = input.begin();
    std::string::const_iterator next = std::find(current, end, separ);
    while (next != end) {
        results.push_back(std::string(current, next));
        current = next + 1;
        next = std::find(current, end, separ);
    }
    results.push_back(std::string(current, end));
    return results;
}
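In later standards (C++17 and up, so not available when this thread was written), std::string_view lets the same splitting logic hand back views into the input instead of copies. A minimal sketch; 'tokenize_views' is a hypothetical name, and the views are only valid while the underlying string is alive:

```cpp
#include <string_view>
#include <vector>

// Hypothetical C++17 variant of tokenize(): each token is a view into `input`,
// so no characters are copied. Same results as tokenize(), including empty
// tokens between adjacent separators and after a trailing separator.
std::vector<std::string_view> tokenize_views(std::string_view input, char separ)
{
    std::vector<std::string_view> results;
    std::string_view::size_type start = 0;
    for (;;) {
        std::string_view::size_type next = input.find(separ, start);
        if (next == std::string_view::npos) {
            results.push_back(input.substr(start));   // last (possibly empty) token
            break;
        }
        results.push_back(input.substr(start, next - start));
        start = next + 1;                              // skip the separator
    }
    return results;
}
```

Like tokenize, splitting "a,b,,c" on ',' gives four tokens, the third one empty.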

gervaz · Jun 13, 2010

Thanks all, I've come up with the following (although I'll have
problems handling encodings where characters are not 8-bit...):

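#include <algorithm>
#include <cstddef>
#include <locale>
#include <string>

// Splits on whitespace as classified by the user's default locale.
// T is used as an output iterator (e.g. std::back_inserter(v)), not a container.
template<typename T>
void split(const std::string& str, T container)
{
    typedef std::string::const_iterator iter;
    const std::locale loc(""); // throws std::runtime_error if the environment locale is invalid

    iter i = str.begin();

    while (i != str.end())
    {
        // skip leading whitespace
        i = std::find_if(i, str.end(), [&loc](char c) { return !std::isspace(c, loc); });

        // find the end of the current word
        iter j = std::find_if(i, str.end(), [&loc](char c) { return std::isspace(c, loc); });

        if (i != str.end())
            *container++ = std::string(i, j);

        i = j;
    }
}

// Splits on the (non-empty) separator sep; a trailing empty piece is not emitted.
template<typename T>
void split(const std::string& str, const std::string& sep, T container)
{
    std::size_t start = 0;
    std::size_t end = 0;

    while ((end = str.find(sep, start)) != std::string::npos)
    {
        *container++ = str.substr(start, end - start);
        start = end + sep.size();
    }

    if (start < str.size())
        *container++ = str.substr(start);
}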
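A variation on the whitespace split above that follows James Kanze's suggestion of predicate objects around std::locale instead of lambdas. 'IsSpace' and 'split_ws' are hypothetical names; this sketch uses the global locale by default (std::locale() rather than std::locale(""), which avoids a possible exception) and returns the output iterator so callers can keep appending:

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>
#include <vector>

// Hypothetical predicate object wrapping std::isspace(c, loc), so the locale is
// chosen once and the same test can be reused with any algorithm.
class IsSpace
{
public:
    explicit IsSpace(const std::locale& loc = std::locale()) : loc_(loc) {}
    bool operator()(char c) const { return std::isspace(c, loc_); }

private:
    std::locale loc_;
};

// Hypothetical whitespace split: writes each word to the output iterator `out`
// (e.g. std::back_inserter(v)) and returns the advanced iterator.
template <typename OutIt>
OutIt split_ws(const std::string& str, OutIt out, const IsSpace& is_space = IsSpace())
{
    std::string::const_iterator i = str.begin();
    const std::string::const_iterator end = str.end();

    while (i != end) {
        i = std::find_if_not(i, end, is_space);                         // skip whitespace
        std::string::const_iterator j = std::find_if(i, end, is_space);  // end of the word
        if (i != end)
            *out++ = std::string(i, j);
        i = j;
    }
    return out;
}
```

For example, with std::vector<std::string> words, calling split_ws("  This is\ta  test\n", std::back_inserter(words)) leaves "This", "is", "a" and "test" in words.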

Similar Threads

Converting an Array to a String in JavaScript	7	Sep 22, 2023
Can't solve problems! please Help	0	Sep 26, 2022
Measuring a string of text	1	Sep 15, 2022
Trouble accessing a value within a JSON string.	1	Jun 16, 2023
Stripping tokens in the C preprocessor	12	Aug 5, 2012
Copy string from 2D array to a 1D array in C	1	Nov 1, 2023
tokens concat	8	Dec 6, 2008
Checking the available range while iterating through a string	13	Feb 16, 2011
