C++ strtok

Discussion in 'C++' started by abcd, Apr 24, 2012.

  1. abcd

    abcd Guest

    Hello C++ users,
    Greetings.

    I have the following question about the strtok function used for string
    tokenizing. As I understand it, strtok internally uses a static variable
    to keep track of the string passed to it so that tokens can be searched
    for based on the delimiter.
    When strtok returns NULL, it means that no more tokens are available.

    What if strtok is now invoked with another string to search for
    tokens? What happens to the internal static buffer that was
    initialized to the previous string, and when is it released?

    Best Regards
    Sumit
    abcd, Apr 24, 2012
    #1

  2. abcd

    gwowen Guest

    On Apr 24, 10:46 am, abcd <> wrote:
    >
    > What if now strtok is invoked with another string to search for
    > tokens?? What happens to the internal static buffer which was
    > initialized to the previous string, when is that released??


    There is no static buffer. strtok() modifies the string passed to it
    as an argument, overwriting the delimiter that ends each token with
    '\0', so that the returned pointer points to a token inside the
    (modified) input C-string. There is static state between calls
    (i.e. where the last tokenization got to), but no dynamic buffer is
    needed.
    gwowen, Apr 24, 2012
    #2

  3. On 24 Apr, 13:46, abcd <> wrote:
    > What if now strtok is invoked with another string to search for
    > tokens?? What happens to the internal static buffer which was
    > initialized to the previous string, when is that released??


    As I understand it, strtok does not have any internal static buffer, so
    there is nothing to release. It has a static variable of type pointer
    to char. When you supply another string to process, that static pointer
    is set to this string: at the very beginning strtok checks whether the
    supplied string is NULL, and if it is not, the static pointer is set to
    this new value.
    Vlad from Moscow, Apr 24, 2012
    #3
  4. On 24.04.2012 11:46, abcd wrote:
    > I have a following question regarding strtok function used for string
    > tokenizing. As I understand, strtok internally uses static variable to
    > keep track of the string passed to it so that tokens can be searched
    > based on delimiter.
    > After the strtok returns NULL, it means that no tokens are available.
    >
    > What if now strtok is invoked with another string to search for
    > tokens??


    In this case the static state is discarded. Once you have passed another
    string, you can no longer continue tokenizing the first one.

    > What happens to the internal static buffer which was
    > initialized to the previous string, when is that released??


    There is nothing to release. The internal state has a fixed size and
    refers to the string buffer you supplied in the first call. The state is
    allocated statically in the data segment of the C++ runtime.

    More precisely, thread-safe C++ runtimes may keep the internal state of
    strtok in thread-local storage; otherwise strtok would be almost useless
    in multithreaded programs. The C standard does not require this, however.


    In practice, I avoid using strtok altogether.

    Firstly, because it is not re-entrant, i.e. you must not start parsing
    another string before you have finished the first one. This splits the
    functions you may call from within the parser loop into those that never
    call strtok and those that might. While this is trivial to decide for
    runtime library functions, it becomes error prone for your own code: an
    object method you call might internally call methods that use strtok
    without your being aware of it.

    Secondly, strtok modifies the original string in a C-style way. C-like
    string manipulation should not be used in C++ programs because it is
    error prone and often a backdoor for security vulnerabilities. As long
    as you avoid writable char* in C++ and use only const char*, the
    probability of security vulnerabilities is significantly reduced.

    Use strspn and strcspn for C-style parsing in C++. They easily achieve
    the same behavior as strtok without its disadvantages, i.e. they do not
    modify the input buffer, and the parsing state is kept in local
    variables on the stack.

    strtok is supported by the C++ runtime mainly for C compatibility.


    Marcel
    Marcel Müller, Apr 24, 2012
    #4
  5. abcd

    none Guest

    In article <4f968b83$0$7620$-online.net>,
    Marcel Müller <> wrote:
    >On 24.04.2012 11:46, abcd wrote:
    ><snip>
    >In practice I avoid to use strtok at all.
    ><snip>
    >strtok is mainly supported for C compatibility by the C++ runtime.


    Totally agree with Marcel here. strtok is not a very good function to
    use anywhere, neither in C nor in C++. Even the manual says so:

    ---------------------------------------
    man strtok
    <snip snip>
    BUGS
    Be cautious when using these functions. If you do use them,
    note that:

    * These functions modify their first argument.

    * These functions cannot be used on constant strings.

    * The identity of the delimiting character is lost.

    * The strtok() function uses a static buffer while parsing, so
    it's not thread safe. Use strtok_r() if this matters to you.
    -----------------------------------------

    std::string::find() and std::string::substr() can do pretty much
    everything strtok does, much more safely. I typically use a
    template function that tokenizes a string and returns a vector of strings
    containing the individual tokens. It is much more usable at the cost of
    copying a few strings, and it is very rarely a performance bottleneck.

    Yannick
    none, Apr 24, 2012
    #5
  6. abcd

    Dan McLeran Guest

    > What if now strtok is invoked with another string to search for
    > tokens?? What happens to the internal static buffer which was
    > initialized to the previous string, when is that released??
    >
    > Best Regards
    > Sumit


    Have a look at Boost's excellent libraries: http://www.boost.org/doc/libs/1_49_0/libs/tokenizer/
    Dan McLeran, Apr 24, 2012
    #6

Similar Threads
  1. Adam Balgach
    Replies:
    2
    Views:
    559
    news-east
    Nov 28, 2004
  2. Alex Vinokur

    strtok() and std::string

    Alex Vinokur, Apr 14, 2005, in forum: C++
    Replies:
    6
    Views:
    4,897
    Pete Becker
    Apr 14, 2005
  3. strtok problem

    , Aug 28, 2003, in forum: C Programming
    Replies:
    4
    Views:
    499
  4. Robert

    strtok trouble

    Robert, Sep 5, 2003, in forum: C Programming
    Replies:
    17
    Views:
    1,212
    Jalapeno
    Sep 6, 2003
  5. Fatih Gey

    segfault on strtok

    Fatih Gey, Oct 23, 2003, in forum: C Programming
    Replies:
    40
    Views:
    1,439
    nobody
    Nov 1, 2003