tokenizing a string

Discussion in 'C++' started by Amit Gupta, Jan 24, 2007.

  1. Amit Gupta

    Amit Gupta Guest

    Hi -

    I get a seg-fault when I compile and run this simple program.
    (seg-fault in first call to strtok). Any clues?
    My gcc is "gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)"

    #include <string.h>


    int main()
    {
    char *token;
    char *line = "LINE TO BE SEPARATED";
    char *search = " ";


    /* Token will point to "LINE". */
    token = strtok(line, search);


    /* Token will point to "TO". */
    token = strtok(NULL, search);
    }
    Amit Gupta, Jan 24, 2007
    #1

  2. Amit Gupta wrote:
    > Hi -
    >
    > I get a seg-fault when I compile and run this simple program.
    > (seg-fault in first call to strtok). Any clues?
    > My gcc is "gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)"
    >
    > #include <string.h>
    >
    >
    > int main()
    > {
    > char *token;
    > char *line = "LINE TO BE SEPARATED";
    > char *search = " ";
    >
    >
    > /* Token will point to "LINE". */
    > token = strtok(line, search);
    >
    >
    > /* Token will point to "TO". */
    > token = strtok(NULL, search);
    > }
    >


    Yes, strtok modifies the string it operates on, but string literals are
    read-only.

    Change your code to use an array instead of a pointer

    char line[] = "LINE TO BE SEPARATED";

    and it will work.
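
    A minimal sketch of that fix. The helper name SplitOnSpaces is made up
    for illustration and is not from the original post. The array is a
    writable copy of the literal, so strtok is free to overwrite the
    delimiters in it:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a fixed sentence with strtok.
std::vector<std::string> SplitOnSpaces()
{
    // An array initialized from a literal is a modifiable copy,
    // unlike a pointer to the literal itself.
    char line[] = "LINE TO BE SEPARATED";
    const char *search = " ";

    std::vector<std::string> tokens;
    // The first call passes the buffer; later calls pass NULL to continue.
    for (char *token = std::strtok(line, search);
         token != NULL;
         token = std::strtok(NULL, search)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

    Looping until strtok returns NULL avoids hard-coding the number of
    tokens.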
    John Harrison, Jan 24, 2007
    #2

  3. Amit Gupta

    Guest

    1. It's not C++, it's C.
    2. If you are using C++, then try to find a std library function.
    3. If you are using C, then don't use strtok, or use it with caution
    and some extra checks for NULL and related memory issues.

    --raxit




    On Jan 24, 12:38 pm, "Amit Gupta" <> wrote:
    > Hi -
    >
    > I get a seg-fault when I compile and run this simple program.
    > (seg-fault in first call to strtok). Any clues?
    > My gcc is "gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)"
    >
    > #include <string.h>
    >
    > int main()
    > {
    > char *token;
    > char *line = "LINE TO BE SEPARATED";
    > char *search = " ";
    >
    > /* Token will point to "LINE". */
    > token = strtok(line, search);
    >
    > /* Token will point to "TO". */
    > token = strtok(NULL, search);
    >
    >
    >
    > }
    , Jan 24, 2007
    #3
  4. Amit Gupta

    Guest

    Yes Sir,

    It should be NULL in the last line below.

    --raxit

    On Jan 24, 2:40 pm, wrote:
    > 1. Its not C++, Its C.
    > 2.If you are using C++ then try to find std library function.
    > 3.if you are using C then -- Dont use strtok, or use it with caution
    > and some Extra Checks for Null and related memory stuff.
    >
    > --raxit
    >
    > On Jan 24, 12:38 pm, "Amit Gupta" <> wrote:
    >
    >
    >
    > > Hi -

    >
    > > I get a seg-fault when I compile and run this simple program.
    > > (seg-fault in first call to strtok). Any clues?
    > > My gcc is "gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)"

    >
    > > #include <string.h>

    >
    > > int main()
    > > {
    > > char *token;
    > > char *line = "LINE TO BE SEPARATED";
    > > char *search = " ";

    >
    > > /* Token will point to "LINE". */
    > > token = strtok(line, search);

    >
    > > /* Token will point to "TO". */
    > > token = strtok(NULL, search);

    >
    > > }
    , Jan 24, 2007
    #4
  5. Amit Gupta

    Chris Theis Guest

    > On Jan 24, 12:38 pm, "Amit Gupta" <> wrote:
    [SNIP]
    >> char *token;
    >> char *line = "LINE TO BE SEPARATED";
    >> char *search = " ";
    >>
    >> /* Token will point to "LINE". */
    >> token = strtok(line, search);
    >>
    >> /* Token will point to "TO". */
    >> token = strtok(NULL, search);
    >>


    > 1. Its not C++, Its C.


    Please do not top post!

    Even though strtok is not part of the standard library it's still a valid
    function call in C++.

    > 2.If you are using C++ then try to find std library function.


    But which std library function would that be?

    > 3.if you are using C then -- Dont use strtok, or use it with caution
    > and some Extra Checks for Null and related memory stuff.


    You're absolutely right on this one, although it does not help the OP a bit
    with his problem.

    Cheers
    Chris
    Chris Theis, Jan 24, 2007
    #5
  6. Amit Gupta

    Chris Theis Guest

    "Amit Gupta" <> wrote in message
    news:...
    [SNIP]
    >
    > #include <string.h>
    > int main()
    > {
    > char *token;
    > char *line = "LINE TO BE SEPARATED";
    > char *search = " ";
    >
    >
    > /* Token will point to "LINE". */
    > token = strtok(line, search);
    >
    >
    > /* Token will point to "TO". */
    > token = strtok(NULL, search);
    > }


    Hello,

    John already pointed out what to do but I just want to add a general remark.
    Using strtok() might not be the best solution for tokenizing especially if
    you use C++. You might be tempted, for example, to call strtok()
    with a string object and thus end up doing something like this:

    strtok( line.c_str(), search);

    Even though the line above causes undefined behavior, as strtok will attempt
    to modify the data of the string object, I have seen it working on some
    platforms. This easily leads to false confidence that it "works", and
    you can find yourself in trouble one day when it breaks.

    Thus, you might consider doing a simple tokenizer using stringstreams if
    applicable:

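    #include <iterator>
    #include <sstream>
    #include <string>
    #include <vector>

    // Usage:
    //   std::string str = "1 2 3 4";
    //   std::vector<int> vec = StringToVector<int>(str);

    template <class T> std::vector<T> StringToVector( const std::string& Str )
    {
        std::istringstream iss( Str );
        // Read whitespace-separated values of type T until extraction fails.
        return std::vector<T>( std::istream_iterator<T>(iss),
                               std::istream_iterator<T>() );
    }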

    or a more sophisticated version:

    //////////////////////////////////////////////////////////////////////////////

    inline std::vector<std::string> TokenizeString( const std::string& Text,
                                                    const std::string& Delimiters )
    // Tokenize a passed string with respect to the provided delimiters
    //
    // e.g.
    // string Line = "this_dog_is mine";
    // string Delimiters = " ,:_;#";
    // vector<string> WordList = TokenizeString( Line, Delimiters );
    //////////////////////////////////////////////////////////////////////////////
    {
        std::vector<std::string> WordList;
        std::string::size_type Begin, End;
        std::string Word;

        // skip blanks or whatever one finds at the beginning
        Begin = Text.find_first_not_of( Delimiters );
        while( Begin != std::string::npos ) {
            End = Text.find_first_of( Delimiters, Begin );
            if( End == std::string::npos ) {
                // we've reached the end without finding another delimiter
                End = Text.length();
            }

            Word.assign( Text.begin() + Begin, Text.begin() + End );
            WordList.push_back( Word );
            Begin = Text.find_first_not_of( Delimiters, End );
        }

        return WordList;
    }

    Cheers
    Chris
    Chris Theis, Jan 24, 2007
    #6
  7. Amit Gupta

    Rolf Magnus Guest

    Chris Theis wrote:


    > Even though strtok is not part of the standard library it's still a valid
    > function call in C++.


    Actually, strtok _is_ part of the standard library.
    Rolf Magnus, Jan 24, 2007
    #7
  8. Amit Gupta

    Pete Becker Guest

    wrote:
    > 1. Its not C++, Its C.


    Which part of the code is not valid C++?

    > 2.If you are using C++ then try to find std library function.


    strtok is part of the C++ standard library.

    > 3.if you are using C then -- Dont use strtok, or use it with caution
    > and some Extra Checks for Null and related memory stuff.
    >


    Yup. Know what the requirements are for any function you call, and be
    sure that you've satisfied them.

    --

    -- Pete
    Roundhouse Consulting, Ltd. (www.versatilecoding.com)
    Author of "The Standard C++ Library Extensions: a Tutorial and
    Reference." (www.petebecker.com/tr1book)
    Pete Becker, Jan 24, 2007
    #8
  9. Amit Gupta

    MrAsm Guest

    On Wed, 24 Jan 2007 12:33:52 +0100, "Chris Theis"
    <> wrote:


    >or a more sophisticated version:
    >
    >//////////////////////////////////////////////////////////////////////////////
    >
    >inline std::vector<std::string> TokenizeString( const std::string& Text,
    >const std::string& Delimiters )


    Very interesting code.

    But may I ask:

    1. Why are you defining the function as "inline"?
    Is "inline" just for simple stuff like a simple accessor (Get/Set) and
    similar...?

    2. Why you are returning the string vector?
    Would be better to return the string vector as reference in parameter
    list, to avoid copy constructors calls?

    e.g.

    void TokenizeString(
    <<< your params >>>
    /* out */ std::vector< std::string > & Tokens
    );


    Thanks in advance,
    MrAsm
    MrAsm, Jan 24, 2007
    #9
  10. Amit Gupta

    red floyd Guest

    Chris Theis wrote:
    > You might be tempted for example to attempt to call strtok()
    > with string object and thus end up doing somethin like this
    >
    > strtok( line.c_str(), search);
    >


    actually, the above should fail to compile, since strtok() takes a char*
    as its first argument, and string::c_str() returns a const char *.
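
    If strtok really has to be used on the contents of a std::string, one
    way around both the const problem and the undefined behavior is to
    tokenize a writable copy. A rough sketch, where the helper name
    TokenizeCopy is hypothetical and not from this thread:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a std::string with strtok without
// touching the string's own storage.
std::vector<std::string> TokenizeCopy(const std::string& line, const char* search)
{
    // Copy the characters, including the terminating '\0', into a
    // writable buffer that strtok is allowed to modify.
    std::vector<char> buffer(line.c_str(), line.c_str() + line.size() + 1);

    std::vector<std::string> tokens;
    for (char* token = std::strtok(&buffer[0], search);
         token != NULL;
         token = std::strtok(NULL, search)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

    The original string is left unchanged, at the cost of one extra copy.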
    red floyd, Jan 24, 2007
    #10
  11. Amit Gupta

    Chris Theis Guest

    "Rolf Magnus" <> wrote in message
    news:ep7g7b$jvb$01$-online.com...
    > Chris Theis wrote:
    >
    >
    >> Even though strtok is not part of the standard library it's still a valid
    >> function call in C++.

    >
    > Actually, strtok _is_ part of the standard library.


    You're of course right! That was a glitch on my part, as I assumed the OP was
    referring to what used to be called the Standard Template Library.

    Cheers
    Chris
    Chris Theis, Jan 24, 2007
    #11
  12. In article <>,
    MrAsm <> wrote:
    >On Wed, 24 Jan 2007 12:33:52 +0100, "Chris Theis"
    ><> wrote:
    >
    >
    >>or a more sophisticated version:
    >>
    >>//////////////////////////////////////////////////////////////////////////////
    >>
    >>inline std::vector<std::string> TokenizeString( const std::string& Text,
    >>const std::string& Delimiters )

    >
    >Very interesting code.
    >
    >But may I ask:
    >
    >1. Why are you defining the function as "inline"?
    >Is "inline" just for simple stuff like a simple accessor (Get/Set) and
    >similar...?


    I would agree that defining it as inline is at best optional.

    >2. Why you are returning the string vector?
    >Would be better to return the string vector as reference in parameter
    >list, to avoid copy constructors calls?


    This is most likely not the case. You just think it will not be as efficient.

    Read about return value optimization (RVO).
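
    One way to check this on your own compiler is to count copies. The
    sketch below assumes a C++11 or later compiler (newer than this thread),
    and the Tracked type and function names are invented for illustration.
    Returning a local by value is either elided (NRVO) or falls back to a
    move, so the copy constructor should not run:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type that counts how often it is copied.
struct Tracked {
    static int copies;
    std::vector<std::string> words;

    Tracked() {}
    Tracked(const Tracked& other) : words(other.words) { ++copies; }
    Tracked(Tracked&& other) noexcept : words(std::move(other.words)) {}
};
int Tracked::copies = 0;

Tracked MakeTracked()
{
    Tracked result;
    result.words.push_back("LINE");
    result.words.push_back("TO");
    return result;  // eligible for NRVO; otherwise moved, never copied
}

// Returns the number of copy-constructor calls made while returning
// a Tracked by value, or -1 if the contents did not arrive intact.
int CopiesAfterReturnByValue()
{
    Tracked::copies = 0;
    Tracked t = MakeTracked();
    return t.words.size() == 2 ? Tracked::copies : -1;
}
```

    On a pre-C++11 compiler the result depends on whether the optimizer
    applies the elision, which most do.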


    > e.g.
    >
    > void TokenizeString(
    > <<< your params >>>
    > /* out */ std::vector< std::string > & Tokens
    > );


    Personally, I prefer the other style. It is clearer, easier to write, easier
    to read, easier to maintain:

    output_t theOutput = theFunct(theInput);
    or
    output_t theOutput(theFunct(theInput));
    vs

    output_t theOutput; // Argh an uninitialised variable !
    theFunct(theInput, theOutput);

    The first case is kind of natural, easy to read. The compiler probably turns
    it into the second case. The second case is second nature to C++ programmers
    but not quite as natural to read to others. The last case is by far the worst.
    Artificially creating an empty object in an undesirable state, then no way to
    immediately know if the following line is a bug:

    theFunct(theOutput, theInput);

    which will compile if for example this is a string transformation function
    that takes a string as input and a string as output.
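
    A small illustration of that trap, using made-up functions ToUpperOut
    and ToUpper (not from this thread). With two std::string arguments the
    swapped call compiles without complaint and silently wipes the input:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical out-parameter version: input first, output second.
void ToUpperOut(const std::string& input, std::string& output)
{
    output.clear();
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        output += static_cast<char>(
            std::toupper(static_cast<unsigned char>(input[i])));
    }
}

// Hypothetical return-by-value version of the same transformation.
std::string ToUpper(const std::string& input)
{
    std::string output;
    ToUpperOut(input, output);
    return output;
}
```

    Calling ToUpperOut(out, in) by mistake, with out still empty, leaves
    both strings empty, whereas ToUpper(in) has only one argument to get
    wrong.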

    No, even if RVO didn't exist, 99% of the time I would prefer the first style.

    Yan
    Yannick Tremblay, Jan 24, 2007
    #12
  13. Amit Gupta

    Guest

    On Jan 24, 4:24 pm, "Chris Theis" <>
    wrote:
    > > On Jan 24, 12:38 pm, "Amit Gupta" <> wrote:

    > [SNIP]
    > >> char *token;
    > >> char *line = "LINE TO BE SEPARATED";
    > >> char *search = " ";

    >
    > >> /* Token will point to "LINE". */
    > >> token = strtok(line, search);

    >
    > >> /* Token will point to "TO". */
    > >> token = strtok(NULL, search);

    >
    > > 1. Its not C++, Its C.
    >
    > Please do not top post!


    I will take care to break my bad habit of top-posting.
    The only point I want to make is "think twice when you are using strtok
    in code."
    I went through some very bad debugging hours (at that time I knew
    very little about multithreaded code and hadn't read and understood the
    man page of strtok carefully).
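
    The underlying issue is that strtok keeps its position in hidden static
    state, so two scans cannot be in progress at once, whether in two
    threads or in nested loops. A sketch assuming a POSIX system, where
    strtok_r (not part of standard C++) takes the state as an explicit
    argument. The helper name ParsePairs is hypothetical:

```cpp
#include <cassert>
#include <string.h>   // strtok_r is declared here on POSIX systems
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits text such as "a=1;b=2" into (key, value) pairs.
std::vector<std::pair<std::string, std::string> > ParsePairs(const std::string& text)
{
    // strtok_r modifies its input, so work on a writable copy.
    std::vector<char> buffer(text.c_str(), text.c_str() + text.size() + 1);
    std::vector<std::pair<std::string, std::string> > pairs;

    char* outer_state = NULL;
    for (char* item = strtok_r(&buffer[0], ";", &outer_state);
         item != NULL;
         item = strtok_r(NULL, ";", &outer_state)) {
        // The inner scan keeps its own state, so it does not disturb
        // the outer one.
        char* inner_state = NULL;
        char* key = strtok_r(item, "=", &inner_state);
        char* value = strtok_r(NULL, "=", &inner_state);
        if (key != NULL && value != NULL) {
            pairs.push_back(std::make_pair(std::string(key), std::string(value)));
        }
    }
    return pairs;
}
```

    With plain strtok the inner scan would overwrite the saved position of
    the outer one, and the outer loop would stop after the first item.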

    >
    > Even though strtok is not part of the standard library it's still a valid
    > function call in C++.
    >
    > > 2.If you are using C++ then try to find std library function.
    >
    > But which std library function would that be?

    >
    > > 3.if you are using C then -- Dont use strtok, or use it with caution
    > > and some Extra Checks for Null and related memory stuff.
    >
    > You're absolutely right on this one, although it does not help the OP a bit
    > with his problem.
    >
    > Cheers
    > Chris
    , Jan 25, 2007
    #13
  14. Amit Gupta

    Chris Theis Guest

    "red floyd" <> wrote in message
    news:FTKth.53286$...
    > Chris Theis wrote:
    >> You might be tempted for example to attempt to call strtok()
    >> with string object and thus end up doing somethin like this
    >>
    >> strtok( line.c_str(), search);
    >>

    >
    > actually, the above should fail to compile, since strtok() takes a char*
    > as its first argument, and string::c_str() returns a const char *.


    You're absolutely right, but I've (unfortunately) seen compilers silently
    accept it.

    Cheers
    Chris
    Chris Theis, Jan 25, 2007
    #14
  15. Amit Gupta

    Chris Theis Guest

    "MrAsm" <> wrote in message
    news:...
    > On Wed, 24 Jan 2007 12:33:52 +0100, "Chris Theis"
    > <> wrote:
    >
    >
    >>or a more sophisticated version:
    >>
    >>//////////////////////////////////////////////////////////////////////////////
    >>
    >>inline std::vector<std::string> TokenizeString( const std::string& Text,
    >>const std::string& Delimiters )

    >
    > Very interesting code.
    >
    > But may I ask:
    >
    > 1. Why are you defining the function as "inline"?
    > Is "inline" just for simple stuff like a simple accessor (Get/Set) and
    > similar...?


    Actually the reason was somehow historical 'cause I simply copied and pasted
    from my personal toolbox file. However, "inline" is simply a hint for the
    compiler and it is actually not bound to follow it. With respect to
    optimization the compiler is free to decide whether to inline the code or
    not as long as it is guaranteed that the behavior does not change. For
    example the rumor that virtual functions are never inlined is still going
    around, although this doesn't really hold true as it is up to the compiler
    and depends on the context
    (http://msdn.microsoft.com/msdnmag/issues/0600/c/).

    >
    > 2. Why you are returning the string vector?
    > Would be better to return the string vector as reference in parameter
    > list, to avoid copy constructors calls?


    Yan already answered this in more detail.

    Cheers
    Chris
    Chris Theis, Jan 25, 2007
    #15
  16. Amit Gupta

    Amit Gupta Guest

    On Jan 24, 3:33 am, "Chris Theis" <>
    wrote:
    > "Amit Gupta" <> wrote in

    [...]

    > or a more sophisticated version:
    >
    > //////////////////////////////////////////////////////////////////////////////
    >
    > inline std::vector<std::string> TokenizeString( const std::string& Text,
    > const std::string& Delimiters )
    > // Tokenize a passed string with respect to the provided delimiters
    > //
    > // e.g.
    > // string Line = "this_dog_is mine";
    > // string Delimiters = " ,:_;#";
    > // vector<string> WordList = TokenizeString( Line, Delimiters );
    > //////////////////////////////////////////////////////////////////////////////
    > {
    > std::vector<std::string> WordList;
    > std::string::size_type Begin, End;
    > std::string Word;
    >
    > Begin = Text.find_first_not_of( Delimiters ); // skip blanks or whatever
    > one finds at the beginning
    > while( Begin != std::string::npos ) {
    > End = Text.find_first_of( Delimiters, Begin );
    > if( End == std::string::npos ) { // we'v reached the end without
    > finding another delimiter
    > End = Text.length();
    > }
    >
    > Word.assign( Text.begin() + Begin, Text.begin() + End );
    > WordList.push_back( Word );
    > Begin = Text.find_first_not_of( Delimiters, End);
    > }
    >
    > return WordList;


    -Thanks
    Amit Gupta, Jan 27, 2007
    #16
  17. Amit Gupta

    Jerry Coffin Guest

    In article <ep7g71$2rc$>,
    says...

    [ ... ]

    > Thus, you might consider doing a simple tokenizer using stringstreams if
    > applicable:


    [ code using stringstream elided ]

    > or a more sophisticated version:


    [ code using find_first_of and find_first_not_of elided ]

    You can combine the two, using stringstreams with delimiters of your
    choice. A stream considers something as a delimiter if its associated
    locale says that character is whitespace.
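
    A rough sketch of that idea, not necessarily the code behind the link
    below. The class DelimiterCtype and the function SplitWithLocale are
    hypothetical names. The facet starts from the classic "C" table and
    additionally marks the chosen delimiters as whitespace:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: treats the given characters as whitespace.
class DelimiterCtype : public std::ctype<char> {
public:
    explicit DelimiterCtype(const std::string& delimiters)
        // 'true' asks the base class to delete[] the table on destruction.
        : std::ctype<char>(MakeTable(delimiters), true, 0) {}

private:
    static const mask* MakeTable(const std::string& delimiters)
    {
        mask* table = new mask[table_size];
        std::copy(classic_table(), classic_table() + table_size, table);
        for (std::string::size_type i = 0; i < delimiters.size(); ++i) {
            table[static_cast<unsigned char>(delimiters[i])] |= space;
        }
        return table;
    }
};

std::vector<std::string> SplitWithLocale(const std::string& text,
                                         const std::string& delimiters)
{
    std::istringstream iss(text);
    // The locale takes ownership of the facet.
    iss.imbue(std::locale(iss.getloc(), new DelimiterCtype(delimiters)));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

    Ordinary whitespace still separates tokens here, because the table is
    copied from the classic one instead of being built from scratch.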

    One example is at:

    http://groups.google.com/group/comp.lang.c /msg/c181e95c03be9041

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
    Jerry Coffin, Jan 27, 2007
    #17