Best way to tokenize a string

sravanreddy001

Hi,
What is an efficient way to tokenize a string, splitting it into
words based on a delimiter?

I've looked at strtok() in string.h.
Is there a better way to do this?
A more efficient one?
 
Ian Collins

sravanreddy001 said:
What is an efficient way to tokenize a string, splitting it into
words based on a delimiter?

The simplest way is to create an istringstream from the string and
stream out the words.
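
A minimal sketch of that approach, assuming whitespace is the delimiter. The helper name split_whitespace is hypothetical and not from the thread; it only wraps the usual extraction loop.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on runs of whitespace.
// operator>> skips leading whitespace and stops at the next whitespace,
// so consecutive spaces, tabs and newlines never produce empty words.
std::vector<std::string> split_whitespace(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
    {
        words.push_back(word);
    }
    return words;
}
```

Calling split_whitespace("the quick\tbrown fox") should give the four words in order. Returning a vector is only one choice; the same loop can process each word as it is read.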
 
red floyd

Paul said:
But is that the most efficient way?

Screw "efficiency". The amount of time parsing should be minimal
compared to either your I/O or processing time. Until you benchmark
and show that the tokenizing is a bottleneck, go for the simplicity.
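
If you do want numbers before deciding, a rough timing harness might look like the sketch below. It assumes the istringstream approach and a C++11 compiler; count_words and time_tokenizing_us are hypothetical names, and a real benchmark would also need realistic input and several runs.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words in text.
std::size_t count_words(const std::string& text)
{
    std::istringstream in(text);
    std::string word;
    std::size_t n = 0;
    while (in >> word)
    {
        ++n;
    }
    return n;
}

// Hypothetical helper: tokenizes text `repeats` times and returns the
// elapsed wall-clock time in microseconds. The word count is accumulated
// into total_words so the work cannot be optimized away.
long long time_tokenizing_us(const std::string& text, int repeats,
                             std::size_t& total_words)
{
    typedef std::chrono::steady_clock clock;
    total_words = 0;
    const clock::time_point start = clock::now();
    for (int i = 0; i < repeats; ++i)
    {
        total_words += count_words(text);
    }
    const clock::time_point stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
               stop - start).count();
}
```

Compare the result against the time spent on I/O and processing for the same data; only if tokenizing dominates is it worth replacing.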
 
Ben Cottrell

Paul said:
But is that the most efficient way?

However, simplicity is sometimes preferred over efficiency, and the stream
method is quite handy. It is described here, at the bottom of the page:
http://www.oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

Is there a way to use this stream method when the delimiter is not
whitespace?

Yes, there's a technique called the "whitespace redefinition approach"
which will let you do that:

https://groups.google.com/group/comp.lang.c++/msg/4de0be2e4eb8e0ba?output=gplain&hl=de&pli=1


Personally I like the ability to use 'cin >> foo >> bar' style to read
delimited data from a stream, although most C++ programmers I know don't
really know/care much about locales, ctypes and facets, and
unfortunately I would expect they'd probably consider it to be "a bit
too weird" to use in their code.
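
The linked post is not reproduced here, so the following is only a minimal sketch of the general technique, not necessarily the code it contains. It assumes the delimiters are ',' and ';' and that ordinary whitespace should keep working; delimiter_ctype and split_with_delimiters are hypothetical names.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: a ctype<char> whose classification table also marks
// ',' and ';' as whitespace, so operator>> treats them as separators.
struct delimiter_ctype : std::ctype<char>
{
    delimiter_ctype() : std::ctype<char>(make_table()) {}

    static const mask* make_table()
    {
        // Start from the classic "C" table so normal whitespace still counts.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[static_cast<unsigned char>(',')] |= space;
        table[static_cast<unsigned char>(';')] |= space;
        return &table[0];
    }
};

// Hypothetical helper: reads words from text using the facet above.
std::vector<std::string> split_with_delimiters(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new delimiter_ctype));

    std::vector<std::string> words;
    std::string word;
    while (in >> word)
    {
        words.push_back(word);
    }
    return words;
}
```

With this facet imbued, "a,b;c d" reads as four words. Because the delimiters are treated exactly like whitespace, runs of them collapse, so "a,,b" yields two words rather than an empty field in between.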




Another one which I like, using TR1/Boost RegEx:

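#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string str = "the\t quick brown\n-\n- fox"
                      " jumped..over,the,lazy,.dog";
    // std::regex is the C++11 spelling of std::tr1::regex. The '-' is placed
    // last in the class so it is a literal, not a range operator.
    std::regex re("[\\s,.-]+");

    // Submatch index -1 selects the text between delimiter matches.
    std::sregex_token_iterator
        iter(str.begin(), str.end(), re, -1),
        end;

    while (iter != end)
    {
        std::cout << *iter++ << std::endl;
    }
}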
 
Asger-P

Hi sravanreddy001

sravanreddy001 said:
What is an efficient way to tokenize a string, splitting it into
words based on a delimiter?

You write "delimiter" and not "delimiters". Is that because you
have only one delimiter to consider?

If that's the case, or if you have only a few known delimiters,
then you can do it a lot faster than strtok if you write
your own tokenizer.
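
A minimal sketch of such a hand-written tokenizer for the single-delimiter case, assuming the input is a std::string. The name split_on is hypothetical, and whether it actually beats strtok on your data would still need measuring.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text at every occurrence of delim.
// Unlike strtok, it does not modify the input and it keeps empty fields,
// so "a,,b" yields three tokens: "a", "" and "b".
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    for (;;)
    {
        const std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos)
        {
            // No more delimiters: the rest of the string is the last token.
            tokens.push_back(text.substr(start));
            break;
        }
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return tokens;
}
```

Keeping empty fields is a design choice that suits data such as CSV lines; skip tokens of length zero if you want strtok-like behaviour instead.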


Best regards
Asger-P
 
Paul

Ben Cottrell said:
Yes, there's a technique called the "whitespace redefinition approach"
which will let you do that:

https://groups.google.com/group/comp.lang.c++/msg/4de0be2e4eb8e0ba?ou...

Personally I like the ability to use 'cin >> foo >> bar' style to read
delimited data from a stream, although most C++ programmers I know don't
really know/care much about locales, ctypes and facets, and
unfortunately I would expect they'd probably consider it to be "a bit
too weird" to use in their code.

This looks pretty useful, especially if your source comes from a file.
I have to admit I don't know much about locales and facets, as I've
never really delved into that part of the language. It looks quite
advanced, and I think it might take a couple of days of studying to get a
good understanding of it all. I'll add it to my list of things to do. :)

Ben Cottrell said:
Another one which I like, using TR1/Boost RegEx:

This looks very tidy too, and more useful if the input is in the form of a
string. Again, I've never really delved into Boost RegEx; another one for
the list :)
 
Moshbear dot Net

sravanreddy001 said:
What is an efficient way to tokenize a string, splitting it into
words based on a delimiter?

I've looked at strtok() in string.h.
Is there a better way to do this?
A more efficient one?

I've had good results with Boost.Tokenizer.
 
