Tokenizing a string

Amit Gupta

Hi -

I get a seg-fault when I compile and run this simple program
(the seg-fault occurs in the first call to strtok). Any clues?
My gcc is "gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)"

#include <string.h>


int main()
{
char *token;
char *line = "LINE TO BE SEPARATED";
char *search = " ";


/* Token will point to "LINE". */
token = strtok(line, search);


/* Token will point to "TO". */
token = strtok(NULL, search);
}
 
John Harrison

Amit said:
[ original post quoted in full; elided ]

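Yes, strtok modifies the string it operates on, but string literals are
read-only. (Modifying a string literal is undefined behavior; typically the
literal is placed in read-only memory, hence the seg-fault.)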

Change your code to use an array instead of a pointer

char line[] = "LINE TO BE SEPARATED";

and it will work.
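For completeness, here is a sketch of the OP's program with John's fix applied, wrapped in a hypothetical TokenizeOpLine() function that collects every token instead of stopping after the second. It also shows the usual strtok loop: pass the buffer on the first call, a null pointer on later calls, and stop when strtok returns a null pointer.

```cpp
#include <cstring>  // std::strtok
#include <string>
#include <vector>

// Hypothetical helper: the OP's program with the array fix applied.
std::vector<std::string> TokenizeOpLine()
{
    // An array, unlike a pointer to a literal, is a writable copy of the
    // characters, so strtok may overwrite each delimiter with '\0'.
    char line[] = "LINE TO BE SEPARATED";
    const char *search = " ";

    std::vector<std::string> tokens;
    // First call passes the buffer; later calls pass a null pointer to
    // continue where the previous call stopped. A null result means
    // no tokens remain.
    for (char *token = std::strtok(line, search); token != nullptr;
         token = std::strtok(nullptr, search)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Called once, this yields the four tokens "LINE", "TO", "BE" and "SEPARATED".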
 
raxitsheth2000

1. It's not C++, it's C.
2. If you are using C++, then try to find a std library function.
3. If you are using C, then don't use strtok, or use it with caution
and some extra checks for NULL and related memory stuff.

--raxit
 
Chris Theis

1. It's not C++, it's C.

Please do not top post!

Even though strtok is not part of the standard library, it's still a valid
function call in C++.
2. If you are using C++, then try to find a std library function.

But which std library function would that be?
3. If you are using C, then don't use strtok, or use it with caution
and some extra checks for NULL and related memory stuff.

You're absolutely right on this one, although it does not help the OP a bit
with his problem.

Cheers
Chris
 
Chris Theis

[SNIP]

Hello,

John already pointed out what to do, but I just want to add a general remark.
Using strtok() might not be the best solution for tokenizing, especially if
you use C++. For example, you might be tempted to call strtok() with a
string object and end up doing something like this:

strtok( line.c_str(), search);

Even though the line above causes undefined behavior, because strtok will
attempt to modify the data of the string object, I have seen it work on some
platforms. This easily leads to a false confidence that it "works", and you
can find yourself in trouble one day, when it breaks.
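If you do want strtok with a std::string, one option (a sketch, not the only way) is to give strtok a writable, null-terminated copy of the characters instead of the string's own buffer. SplitCopyWithStrtok is a hypothetical name. This only solves the const/modification problem; strtok still keeps hidden state between calls.

```cpp
#include <cstring>  // std::strtok
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a std::string with strtok without touching
// the string's own data. strtok needs a writable, null-terminated char
// array, so the characters are copied into a vector<char> first.
std::vector<std::string> SplitCopyWithStrtok(const std::string& text, const char *delims)
{
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // terminate the copy

    std::vector<std::string> tokens;
    for (char *tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);  // copy each token out of the buffer
    }
    return tokens;
}
```

Note that strtok treats a run of delimiters as one, so "a,,b" split on "," gives two tokens, not three.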

Thus, you might consider writing a simple tokenizer using stringstreams, if
applicable:

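#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// std::string str = "1 2 3 4";
// std::vector<int> vec = StringToVector<int>(str);

template <class T> std::vector<T> StringToVector( const std::string& Str )
{
    std::istringstream iss( Str );
    // Read whitespace-separated values of type T until the stream runs out.
    return std::vector<T>( std::istream_iterator<T>(iss),
                           std::istream_iterator<T>() );
}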

or a more sophisticated version:

#include <string>
#include <vector>

//////////////////////////////////////////////////////////////////////////////
// Tokenize a passed string with respect to the provided delimiters
//
// e.g.
// string Line = "this_dog_is mine";
// string Delimiters = " ,:_;#";
// vector<string> WordList = TokenizeString( Line, Delimiters );
//////////////////////////////////////////////////////////////////////////////
inline std::vector<std::string> TokenizeString( const std::string& Text,
                                                const std::string& Delimiters )
{
    std::vector<std::string> WordList;
    std::string::size_type Begin, End;
    std::string Word;

    // skip blanks or whatever one finds at the beginning
    Begin = Text.find_first_not_of( Delimiters );
    while( Begin != std::string::npos ) {
        End = Text.find_first_of( Delimiters, Begin );
        if( End == std::string::npos ) {
            // we've reached the end without finding another delimiter
            End = Text.length();
        }

        Word.assign( Text.begin() + Begin, Text.begin() + End );
        WordList.push_back( Word );
        Begin = Text.find_first_not_of( Delimiters, End );
    }

    return WordList;
}

Cheers
Chris
 
Rolf Magnus

Chris Theis wrote:

Even though strtok is not part of the standard library, it's still a valid
function call in C++.

Actually, strtok _is_ part of the standard library.
 
Pete Becker

1. It's not C++, it's C.

Which part of the code is not valid C++?
2. If you are using C++, then try to find a std library function.

strtok is part of the C++ standard library.
3. If you are using C, then don't use strtok, or use it with caution
and some extra checks for NULL and related memory stuff.

Yup. Know what the requirements are for any function you call, and be
sure that you've satisfied them.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 
MrAsm

or a more sophisticated version:

//////////////////////////////////////////////////////////////////////////////

inline std::vector<std::string> TokenizeString( const std::string& Text,
const std::string& Delimiters )

Very interesting code.

But may I ask:

1. Why are you defining the function as "inline"?
Is "inline" just for simple stuff like a simple accessor (Get/Set) and
similar...?

2. Why are you returning the string vector?
Wouldn't it be better to return the string vector through a reference in the
parameter list, to avoid copy constructor calls?

e.g.

void TokenizeString(
    const std::string& Text,
    const std::string& Delimiters,
    /* out */ std::vector< std::string > & Tokens
);


Thanks in advance,
MrAsm
 
red floyd

Chris Theis wrote:
You might be tempted for example to attempt to call strtok()
with a string object and thus end up doing something like this

strtok( line.c_str(), search);

Actually, the above should fail to compile, since strtok() takes a char*
as its first argument, and string::c_str() returns a const char*.
 
Chris Theis

Rolf Magnus said:
Chris Theis wrote:



Actually, strtok _is_ part of the standard library.

You're right, of course! That was a glitch on my part, as I assumed the OP
was referring to what used to be called the standard template library.

Cheers
Chris
 
Yannick Tremblay

Very interesting code.

But may I ask:

1. Why are you defining the function as "inline"?
Is "inline" just for simple stuff like a simple accessor (Get/Set) and
similar...?

I would agree that defining it as inline is at best optional.
2. Why are you returning the string vector?
Wouldn't it be better to return the string vector through a reference in the
parameter list, to avoid copy constructor calls?

This is most likely not the case; you just assume that returning by value
will be less efficient.

Read about return value optimization (RVO).
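The RVO point can be checked with a small instrumented type. This is a sketch with a hypothetical CountedWords type that counts its copy-constructor calls. One caveat: the "never copied" guarantee shown here relies on C++11 move semantics, which postdate this thread; with the compilers of the time, avoiding the copy depended on the compiler actually applying (N)RVO.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical instrumented type: counts how often it is copy-constructed.
struct CountedWords {
    static int copies;
    std::vector<std::string> words;

    CountedWords() = default;
    CountedWords(const CountedWords& other) : words(other.words) { ++copies; }
    CountedWords(CountedWords&& other) noexcept : words(std::move(other.words)) {}
};
int CountedWords::copies = 0;

// Returns a local object by value, the style Yan prefers below.
CountedWords MakeWords()
{
    CountedWords result;
    result.words.push_back("this");
    result.words.push_back("dog");
    return result;  // elided (NRVO) or, since C++11, moved; never copied
}
```

After `CountedWords w = MakeWords();`, `CountedWords::copies` is still zero.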

e.g.

void TokenizeString(
    const std::string& Text,
    const std::string& Delimiters,
    /* out */ std::vector< std::string > & Tokens
);

Personally, I prefer the other style. It is clearer, easier to write, easier
to read, and easier to maintain:

output_t theOutput = theFunct(theInput);
or
output_t theOutput(theFunct(theInput));
vs

output_t theOutput; // Argh, an uninitialised variable!
theFunct(theInput, theOutput);

The first case is kind of natural and easy to read. The compiler probably turns
it into the second case. The second case is second nature to C++ programmers
but not quite as natural for others to read. The last case is by far the worst:
it artificially creates an empty object in an undesirable state, and there is
no way to know immediately whether the following line is a bug:

theFunct(theOutput, theInput);

which will compile if, for example, this is a string transformation function
that takes a string as input and a string as output.

No, even if RVO didn't exist, 99% of the time I would prefer the first style.

Yan
 
raxitsheth2000

I will really try to break my bad habit of top posting.
The only point I want to make is: "think twice when you are using strtok
in code."
I went through some very bad debugging hours (at that time I knew
very little about multithreaded code and hadn't read or understood the man
page of strtok carefully).
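The multithreading trouble comes from strtok remembering where it stopped in hidden internal state, so two tokenizations that interleave (in different threads, or simply in nested loops) overwrite each other's position. On POSIX systems, strtok_r takes that position as an explicit extra argument. A sketch, assuming a POSIX platform (strtok_r is not part of standard C++), with a hypothetical ParsePairs helper:

```cpp
#include <string.h>  // strtok_r: POSIX, not ISO C++
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses "key=value;key=value" into pairs.
// With plain strtok the inner calls would overwrite the single hidden
// position used by the outer loop; strtok_r keeps each position in a
// caller-supplied pointer instead.
std::vector<std::pair<std::string, std::string>> ParsePairs(const std::string& text)
{
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');  // strtok_r also needs a writable, terminated buffer

    std::vector<std::pair<std::string, std::string>> result;
    char *outerSave = nullptr;
    for (char *item = strtok_r(buffer.data(), ";", &outerSave); item != nullptr;
         item = strtok_r(nullptr, ";", &outerSave)) {
        char *innerSave = nullptr;
        char *key = strtok_r(item, "=", &innerSave);
        char *value = strtok_r(nullptr, "=", &innerSave);
        if (key != nullptr && value != nullptr) {  // skip malformed items
            result.emplace_back(key, value);
        }
    }
    return result;
}
```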
 
Chris Theis

red floyd said:
Chris Theis wrote:
You might be tempted for example to attempt to call strtok()

Actually, the above should fail to compile, since strtok() takes a char*
as its first argument, and string::c_str() returns a const char*.

You're absolutely right, but I've (unfortunately) seen compilers silently
getting away with it.

Cheers
Chris
 
Chris Theis

MrAsm said:
Very interesting code.

But may I ask:

1. Why are you defining the function as "inline"?
Is "inline" just for simple stuff like a simple accessor (Get/Set) and
similar...?

Actually, the reason was somewhat historical, because I simply copied and
pasted it from my personal toolbox file. However, "inline" is simply a hint to
the compiler, and the compiler is not bound to follow it. With respect to
optimization, the compiler is free to decide whether to inline the code or
not, as long as the behavior is guaranteed not to change. For example, the
rumor that virtual functions are never inlined is still going around,
although it doesn't really hold true, as it is up to the compiler and depends
on the context
(http://msdn.microsoft.com/msdnmag/issues/0600/c/).
2. Why are you returning the string vector?
Wouldn't it be better to return the string vector through a reference in the
parameter list, to avoid copy constructor calls?

Yan already answered this in more detail.

Cheers
Chris
 
Amit Gupta

[ TokenizeString code quoted in full; elided ]

-Thanks
 
Jerry Coffin

[ ... ]
Thus, you might consider doing a simple tokenizer using stringstreams if
applicable:

[ code using stringstream elided ]
or a more sophisticated version:

[ code using find_first_of and find_first_not_of elided ]

You can combine the two, using stringstreams with delimiters of your
choice. A stream considers something as a delimiter if its associated
locale says that character is whitespace.

One example is at:

http://groups.google.com/group/comp.lang.c++/msg/c181e95c03be9041
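Jerry's suggestion can be sketched as follows. This is not necessarily the code behind his link: it derives from std::ctype<char>, supplies a classification table in which only the chosen delimiter characters are marked as whitespace, and imbues an istringstream with it so that operator>> splits on them. DelimiterCtype and SplitWithLocale are hypothetical names.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies exactly the characters in `delims` as
// whitespace, so stream extraction stops at them.
class DelimiterCtype : public std::ctype<char> {
public:
    explicit DelimiterCtype(const std::string& delims)
        : std::ctype<char>(MakeTable(delims), /* del = */ true) {}  // base delete[]s the table

private:
    static const mask* MakeTable(const std::string& delims)
    {
        mask* table = new mask[table_size]();  // all zero: nothing is a space yet
        for (unsigned char c : delims) {
            table[c] = std::ctype_base::space;
        }
        return table;
    }
};

// Splits `text` on any character in `delims` using the facet above.
std::vector<std::string> SplitWithLocale(const std::string& text, const std::string& delims)
{
    std::istringstream iss(text);
    // The locale takes ownership of the facet (reference count starts at 0).
    iss.imbue(std::locale(iss.getloc(), new DelimiterCtype(delims)));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

Because the table replaces the default classification entirely, a blank or newline only counts as a delimiter if it is listed in `delims`; with Chris's example delimiters " ,:_;#", "this_dog_is mine" splits into four words.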
 
