Best way to tokenize in String

Discussion in 'C++' started by sravanreddy001, Sep 10, 2011.

  1. Hi,
    What is an efficient way to tokenize a string, splitting it into
    words based on a delimiter?

    I've looked at strtok() in string.h.
    Is there a better way to do this?
    A more efficient one?
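
    For reference, a minimal sketch of the strtok() approach mentioned in
    the question. The helper name split_strtok is hypothetical. Because
    strtok() writes terminators into the buffer it scans and keeps hidden
    state between calls, the sketch works on a private copy of the input.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character found in `delims`.
std::vector<std::string> split_strtok(const std::string& text, const char* delims) {
    // strtok() modifies its input, so tokenize a NUL-terminated copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

    Note that strtok() treats a run of delimiters as a single separator,
    so it never yields empty tokens.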
     
    sravanreddy001, Sep 10, 2011
    #1

  2. sravanreddy001

    Ian Collins Guest

    On 09/10/11 02:25 PM, sravanreddy001 wrote:
    > Hi,
    > what is the efficient way to tokenize the string and splitting into
    > words based on the delimiter?


    The simplest way is to create an istringstream from the string and
    stream out the words.
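
    A minimal sketch of that idea, assuming whitespace-separated words.
    The function name split_words is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the whitespace-separated words of `text`.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    // operator>> skips leading whitespace and reads up to the next whitespace.
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

    Calling split_words("the quick brown fox") would give four strings;
    leading, trailing and repeated whitespace is skipped.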

    --
    Ian Collins
     
    Ian Collins, Sep 10, 2011
    #2

  3. sravanreddy001

    Paul Guest

    On Sep 10, 4:14 am, Ian Collins <> wrote:
    > On 09/10/11 02:25 PM, sravanreddy001 wrote:
    >
    > > Hi,
    > > what is the efficient way to tokenize the string and splitting into
    > > words based on the delimiter?

    >
    > The simplest way is to create an istringstream from the string and
    > stream out the words.
    >

    But is that the most efficient way?

    However, simplicity is sometimes preferred over efficiency, and the stream
    method is quite handy. It is described here, at the bottom of the page:
    http://www.oopweb.com/CPP/Documents/CPPHOWTO/Volume/C Programming-HOWTO-7.html

    Is there a way to use this stream method when the delimiter is not
    whitespace?
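
    One possible answer, as a sketch: std::getline accepts a delimiter
    character, so the same stream can be read field by field. The name
    split_on is hypothetical, and this handles a single delimiter only.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every occurrence of `delim`.
std::vector<std::string> split_on(const std::string& text, char delim) {
    std::istringstream in(text);
    std::vector<std::string> fields;
    std::string field;
    // getline reads up to (and discards) the next `delim`.
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

    Unlike operator>>, this keeps empty fields between adjacent
    delimiters, although an empty field after a trailing delimiter is
    not reported.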
     
    Paul, Sep 10, 2011
    #3
  4. sravanreddy001

    red floyd Guest

    On 9/9/2011 10:30 PM, Paul wrote:
    > On Sep 10, 4:14 am, Ian Collins<> wrote:
    >> On 09/10/11 02:25 PM, sravanreddy001 wrote:
    >>
    >>> Hi,
    >>> what is the efficient way to tokenize the string and splitting into
    >>> words based on the delimiter?

    >>
    >> The simplest way is to create an istringstream from the string and
    >> stream out the words.
    >>

    > But is that the most efficient way?


    Screw "efficiency". The amount of time parsing should be minimal
    compared to either your I/O or processing time. Until you benchmark
    and show that the tokenizing is a bottleneck, go for the simplicity.
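
    A rough sketch of how such a measurement might look, using
    std::chrono. Both function names are hypothetical, and a serious
    benchmark would need realistic input and more careful methodology.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical workload: counts whitespace-separated words via a stream.
std::size_t count_words(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    std::size_t n = 0;
    while (in >> word) {
        ++n;
    }
    return n;
}

// Hypothetical helper: average time per count_words call, in microseconds.
double average_microseconds(const std::string& text, int repeats) {
    using clock = std::chrono::steady_clock;
    std::size_t total = 0;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        total += count_words(text);
    }
    const auto stop = clock::now();
    // Store the result in a volatile so the loop is not optimized away.
    volatile std::size_t keep = total;
    (void)keep;
    return std::chrono::duration<double, std::micro>(stop - start).count() / repeats;
}
```

    Comparing that figure against the time spent on I/O and processing
    shows whether tokenizing is worth optimizing at all.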
     
    red floyd, Sep 10, 2011
    #4
  5. sravanreddy001

    Ben Cottrell Guest

    Paul wrote:
    > On Sep 10, 4:14 am, Ian Collins <> wrote:
    >
    >>On 09/10/11 02:25 PM, sravanreddy001 wrote:
    >>
    >>
    >>>Hi,
    >>>what is the efficient way to tokenize the string and splitting into
    >>>words based on the delimiter?

    >>
    >>The simplest way is to create an istringstream from the string and
    >>stream out the words.
    >>

    >
    > But is that the most efficient way?
    >
    > However simplicty is sometime preferred over efficiency and the stream
    > method is quite handy, it is described here, bottom of page:
    > http://www.oopweb.com/CPP/Documents/CPPHOWTO/Volume/C Programming-HOWTO-7.html
    >
    > Is there a way to use this stream method when the delimiter is not
    > whitespace?
    >
    >

    Yes, there's a technique called the "whitespace redefinition approach"
    which will let you do that:

    https://groups.google.com/group/comp.lang.c /msg/4de0be2e4eb8e0ba?output=gplain&hl=de&pli=1


    Personally I like the ability to use 'cin >> foo >> bar' style to read
    delimited data from a stream, although most C++ programmers I know don't
    really know/care much about locales, ctypes and facets, and
    unfortunately I would expect they'd probably consider it to be "a bit
    too weird" to use in their code.
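
    For readers who have not seen it, here is a sketch of the general
    idea, not the code from the linked post. It derives from
    std::ctype<char> and marks the comma as whitespace in the
    classification table. The names comma_is_space and
    split_commas_and_spaces are made up for illustration.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// A ctype facet whose table classifies ',' as whitespace (illustrative name).
struct comma_is_space : std::ctype<char> {
    explicit comma_is_space(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}

    static const mask* make_table() {
        // Start from the classic "C" table, then add the space bit to ','.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table[','] |= space;
        return table.data();
    }
};

// Hypothetical helper: reads tokens separated by commas and/or whitespace.
std::vector<std::string> split_commas_and_spaces(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when unused.
    in.imbue(std::locale(in.getloc(), new comma_is_space));

    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

    Once the stream is imbued, the usual 'in >> foo >> bar' style skips
    commas exactly as it skips spaces, so repeated delimiters collapse.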




    Another one which I like, using TR1/Boost RegEx:

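    #include <iostream>
    #include <regex>
    #include <string>

    int main()
    {
        // Input mixing whitespace, hyphens, commas and periods as delimiters.
        std::string str = "the\t quick brown\n-\n- fox"
                          " jumped..over,the,lazy,.dog";

        // The hyphen goes last in the class so it is read as a literal
        // rather than as a range operator.
        // C++11 spelling; with TR1 or Boost use std::tr1:: or boost:: instead.
        std::regex re("[\\s,.-]+");

        // Submatch index -1 selects the text between delimiter matches.
        std::sregex_token_iterator
            iter(str.begin(), str.end(), re, -1),
            end;

        while (iter != end)
        {
            std::cout << *iter++ << std::endl;
        }
    }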
     
    Ben Cottrell, Sep 10, 2011
    #5
  6. sravanreddy001

    Asger-P Guest

    Hi sravanreddy001

    On the: 10. of september-2011 At: 04:25 sravanreddy001 wrote:

    > Hi,
    > what is the efficient way to tokenize the string and splitting into
    > words based on the delimiter?


    You write "delimiter" and not "delimiters". Is that because you
    have only one delimiter to consider?

    If that's the case, or if you have only a few known delimiters,
    then you can do it a lot faster than strtok if you write
    your own tokenizer.
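
    A minimal sketch of such a hand-written tokenizer for one known
    delimiter, built on std::string::find. The name split_single is
    hypothetical, and whether it actually beats strtok for a given
    workload would have to be measured.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at each `delim`, keeping empty tokens.
std::vector<std::string> split_single(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string::npos) {
            // No more delimiters: the rest of the string is the last token.
            tokens.push_back(text.substr(start));
            break;
        }
        tokens.emplace_back(text, start, pos - start);
        start = pos + 1;  // Continue just past the delimiter.
    }
    return tokens;
}
```

    Unlike strtok, this leaves the input untouched and reports empty
    tokens between adjacent delimiters.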


    Best regards
    Asger-P
     
    Asger-P, Sep 10, 2011
    #6
  7. sravanreddy001

    Paul Guest

    On Sep 10, 1:05 pm, Ben Cottrell <> wrote:
    > Paul wrote:
    > > On Sep 10, 4:14 am, Ian Collins <> wrote:

    >
    > >>On 09/10/11 02:25 PM, sravanreddy001 wrote:

    >
    > >>>Hi,
    > >>>what is the efficient way to tokenize the string and splitting into
    > >>>words based on the delimiter?

    >
    > >>The simplest way is to create an istringstream from the string and
    > >>stream out the words.

    >
    > > But is that the most efficient way?

    >
    > > However simplicty is sometime preferred over efficiency and the stream
    > > method is quite handy, it is described here, bottom of page:
    > >http://www.oopweb.com/CPP/Documents/CPPHOWTO/Volume/C Programming-HO...

    >
    > > Is there a way to use this stream method when the delimiter is not
    > > whitespace?

    >
    > Yes, there's a technique called the "whitespace redefinition approach"
    > which will let you do that:
    >
    > https://groups.google.com/group/comp.lang.c /msg/4de0be2e4eb8e0ba?ou...
    >
    > Personally I like the ability to use 'cin >> foo >> bar' style to read
    > delimited data from a stream, although most C++ programmers I know don't
    > really know/care much about locales, ctypes and facets, and
    > unfortunately I would expect they'd probably consider it to be "a bit
    > too weird" to use in their code.

    This looks pretty useful, especially if your source comes from a file.
    I have to admit I don't know much about locales and facets, as I've
    never really delved into that part of the language. It looks quite
    advanced, and I think it might take a couple of days of study to get a
    good understanding of it all. I'll add it to my list of things to do. :)

    >
    > Another one which I like, using TR1/Boost RegEx:
    >
    > #include <string>
    > #include <regex>
    > #include <iostream>
    >
    > int main()
    > {
    >      std::string str = "the\t    quick  brown\n-\n- fox"
    >                        " jumped..over,the,lazy,.dog";
    >      std::tr1::regex re("[\\s-,.]+");
    >
    >      std::tr1::sregex_token_iterator
    >          iter(str.begin(), str.end(), re, -1),
    >          end;
    >
    >      while(iter != end)
    >      {
    >          std::cout << *iter++ << std::endl;
    >      }
    >
    >
    >

    This looks very tidy too. It is more useful if the input is in the form
    of a string. Again, I've never really delved into Boost RegEx, so that's
    another one for the list :)
     
    Paul, Sep 10, 2011
    #7
  8. On Sep 9, 10:25 pm, sravanreddy001 <> wrote:
    > Hi,
    > what is the efficient way to tokenize the string and splitting into
    > words based on the delimiter?
    >
    > i've looked at the strtok() in string.h
    > is there any other better way to do this?
    > A more efficient one?


    I've had good results with boost.tokenizer.
     
    Moshbear dot Net, Sep 11, 2011
    #8
