strtok/strtok_r woes

K

kbhat

I had written some code using strtok_r for parsing strings with two
"levels" of delimiters. The code was working like a charm, until it
suddenly broke one day. You see, I was applying strtok on the input
string itself, and as long as a variable string was passed to the
function, everything was hunky dory. Then one day somebody decided to
pass a constant string to my code ---- and everything came collapsing
down like a domino.

I believe that this is a pitfall into which many programmers may fall.
I have now changed the signature of my function to treat the input
string as a constant string, and I now making a local copy of the
string and operating on that. Of course, I now have to ensure that I
free up the dynamically allocated memory.
Ciao
KB

[I have posted my source listing to comp.sources.d]
 
K

kbhat

Modified source listing for my program using strtok_r in which I make a
local copy of the input string and operate strtok on that


// Program that parses strings of the form:
// Name, SS#, Emp#, Other#::Name, SS#, Emp#, Other#::Name, SS#, Emp#,
Other#
// and extracts the name and other attributes associated with an
employee


#include <stdio.h>
#include <iostream.h>
#include <fstream.h>

#include <list.h>

extern "C" char *strtok_r(char *s1, const char *s2, char **lasts);

#define MAX_DATA_LEN 128
#define EMPLOYEE_DELIMITER_STR ":"
#define ATTR_DELIMITER_STR ","
#define ATTR_DELIMITER_CHAR ','


struct EmployeeInfo
{
char _name[MAX_DATA_LEN];
size_t _attributeCount;
size_t *_attribute;

EmployeeInfo(const char *str):_attributeCount(0), _attribute(NULL)
{
if(!strstr(str,ATTR_DELIMITER_STR))
strcpy(_name, str);
else
{
char *temp1 = new char[strlen(str) + 1];
strcpy(temp1, str);
char *temp2 = temp1;

char* holdingBuf[1];

strcpy(_name, strtok_r(temp1, (const char *) ATTR_DELIMITER_STR,
holdingBuf));
temp1 += (strlen(temp1) + 1);
_attributeCount = numChars(temp1, ATTR_DELIMITER_CHAR) + 1;
_attribute = new size_t[_attributeCount];
memset(_attribute, '\0', _attributeCount*sizeof(*_attribute));

char* stringWithTokens = temp1;
char* token;
int i;
for(i =0;(token = strtok_r(stringWithTokens, (const char *)
ATTR_DELIMITER_STR, holdingBuf)) != NULL;
i++)
{
stringWithTokens = NULL;
_attribute = atoi(token);
}

_attributeCount = i;

delete [] temp2;
}
}

virtual ~EmployeeInfo()
{
if(_attributeCount) delete [] _attribute;
}

void toString()
{
cout << "\nEmployee Name: " << _name << endl;
for(size_t i = 0; i < _attributeCount; i++)
cout << "Attribute " << i << ": " << _attribute << endl;
}

protected:
static size_t numChars(const char* str, const char delimiter)
{
size_t retVal = 0;
char ch;
for(size_t i = 0; ch = str; i++)
{
if(ch == delimiter)
retVal++;
}
return retVal;
}
};


void
parseNameString(const char* nameString)
{
char *temp1 = new char[strlen(nameString) + 1];
strcpy(temp1, nameString);
char *temp2 = temp1;
char* holdingBuf[1];

list<EmployeeInfo*> empInfoList;

char* stringWithTokens = temp1;
char* token;
while((token = strtok_r(stringWithTokens, (const char *)
EMPLOYEE_DELIMITER_STR, holdingBuf)) != NULL)
{
stringWithTokens = NULL;
EmployeeInfo *newInfo = new EmployeeInfo(token);
empInfoList.push_back(newInfo);
}

delete [] temp2;

list<EmployeeInfo*>::iterator itor, endOfList=empInfoList.end();

for(itor = empInfoList.begin(); itor != endOfList; itor++)
{
EmployeeInfo* thisInfo = *itor;
thisInfo->toString();
}

for(itor = empInfoList.begin(); itor != endOfList; itor++)
{

EmployeeInfo* thisInfo = *itor;
delete thisInfo;
}
}

int
main(int argc, char* argv[])
{
if(argc < 2)
{
cerr << "usage: " << argv[0] << " <inputfile>. Exiting ...." <<
endl;
exit(-1);
}

ifstream inFile(argv[1]);
if(!inFile)
{
cerr << "Error reading file " << argv[1] << " Exiting ...." <<
endl;
exit(-2);
}

char inputStr[MAX_DATA_LEN + 1];

/*
while(inFile >> inputStr)
{
cout << "** Parsing string " << inputStr << " **\n\n";
parseNameString(inputStr);
cout << "\n\n\n";
}
*/

while(inFile.getline(inputStr, MAX_DATA_LEN))
{
cout << "** Parsing string " << inputStr << " **\n\n";
parseNameString(inputStr);
cout << "\n\n\n";
}

exit(0);
}
 
K

kbhat

The posting of the entire source listing to the comp.lang.c newsgroup
was an inadvertent mistake which is deeply regretted.
 
J

Jonathan Burd

I had written some code using strtok_r for parsing strings with two
"levels" of delimiters. The code was working like a charm, until it
suddenly broke one day. You see, I was applying strtok on the input
string itself, and as long as a variable string was passed to the
function, everything was hunky dory. Then one day somebody decided to
pass a constant string to my code ---- and everything came collapsing
down like a domino.

I believe that this is a pitfall into which many programmers may fall.
I have now changed the signature of my function to treat the input
string as a constant string, and I now making a local copy of the
string and operating on that. Of course, I now have to ensure that I
free up the dynamically allocated memory.
Ciao
KB

[I have posted my source listing to comp.sources.d]

char* strtok (char*, const char*);

strtok works by inserting '\0' in place of the delimiter each
successive call. Passing a string literal will definitely
cause trouble. Pass a string buffer. ;)

In fact, Herb-the-infamous-clown's book does what you did too!
Stay away from his books, unless you want an exercise
in debugging. ;)

Regards,
Jonathan.
 
M

Mike Wahler

Jonathan Burd said:
strtok works by inserting '\0' in place of the delimiter each
successive call. Passing a string literal will definitely
cause trouble. Pass a string buffer. ;)

In fact, Herb-the-infamous-clown's book does what you did too!

Wow, I knew he was spreading much misinformation, but
I didn't think it was *that* bad.

Stay away from his books, unless you want an exercise
in debugging. ;)

Too true.

-Mike
 
J

Jonathan Burd

Mike said:
Wow, I knew he was spreading much misinformation, but
I didn't think it was *that* bad.
<snip>

I wonder if he even compiled the example he has given.
I've put large crosses in that book wherever I've discovered
such errors. (That's the only book that I've marked so far.
Heh.)

Regards,
Jonathan.
 
M

Mike Wahler

Jonathan Burd said:
<snip>

I wonder if he even compiled the example he has given.
I've put large crosses in that book wherever I've discovered
such errors. (That's the only book that I've marked so far.
Heh.)

Depending upon the implementation(s) used, compiling and
testing might not have caught it. Especially on older
systems I've used, I've seen them happily accept modifying
literals, and 'seem to work'. I think stuff like this
(depending upon behavior of particular implementation(s)
rather than the language rules for determining correctness)
could easily be the cause of many of the errors in his work.

-Mike
 
N

nimmi_srivastav

In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

--NS

#include "stdio.h" /* NS: Shouldn't this be <stdio.h>? */
#include "string.h" /* NS: Shouldn't this be <string.h>? */

void main(void)
{
char *p;

p = strtok("The summer soldier, the sunshine patriot", " ");
/* NS: Passing a constant string to strtok */

printf(p); /* NS: Isn't %s missing here? */
do {
p = strtok('\0', ", ");
if(p) printf("|%s", p);
} while(p);
}
 
J

Jack Klein

In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

--NS

#include "stdio.h" /* NS: Shouldn't this be <stdio.h>? */
#include "string.h" /* NS: Shouldn't this be <string.h>? */

Yes, they both should. The #include "sometext" format is for
void main(void)

This of course is just plain undefined behavior.
{
char *p;

p = strtok("The summer soldier, the sunshine patriot", " ");
/* NS: Passing a constant string to strtok */

No, your comment is in correct. He is passing the address of a string
literal, but the type of a string literal in C is "array of char" and
NOT "array of const char". Attempting to modify a string literal in C
does indeed produce undefined behavior, but not because the characters
are const, just because the C standard specifically says so.

Because attempting to modify a string literal is undefined, compilers
are free to actually make them const, but are not required to do so.
No strictly conforming program can tell one way or the other.
printf(p); /* NS: Isn't %s missing here? */

Not necessarily. If you pass a pointer to char to printf, assuming
that the pointer points to a valid string, printf will just output the
string. Unless of course the string happens to contain a conversion
specifier. If p points to "%s" for example, yet another incident of
undefined behavior occurs.
do {
p = strtok('\0', ", ");
if(p) printf("|%s", p);
} while(p);
}

Let it never be said that Schildt lacks the talent to cram multiple
examples of undefined behavior in such a small piece of code.
 
J

Jonathan Burd

In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

--NS

#include "stdio.h" /* NS: Shouldn't this be <stdio.h>? */
#include "string.h" /* NS: Shouldn't this be <string.h>? */

void main(void)
{
char *p;

p = strtok("The summer soldier, the sunshine patriot", " ");
/* NS: Passing a constant string to strtok */

printf(p); /* NS: Isn't %s missing here? */

No.

I prefer using puts() for testing certain string functions
instead of printf() though. Take, for example, a function
that generates a string consisting of printable characters when you
specify a range, say "^A-Z" (all printable characters except those in
the range A-Z; ^ negates), the string will contain '%' and if you use
printf to test such a routine, it will result in UB. It is
easy to forget that your string may contain '%', so when
testing string functions, generally avoid using printf() unless
you really need it and know what you are doing.
do {
p = strtok('\0', ", ");
if(p) printf("|%s", p);
} while(p);
}

Another error is his incorrect usage of feof().

See his usage in his book and then read this:
http://www.drpaulcarter.com/cs/common-c-errors.php#4.2

Regards,
Jonathan.
 
J

Jonathan Burd

Jonathan said:
(e-mail address removed) wrote:



No.

I prefer using puts() for testing certain string functions
instead of printf() though. Take, for example, a function
that generates a string consisting of printable characters when you
specify a range, say "^A-Z" (all printable characters except those in
the range A-Z; ^ negates), the string will contain '%' and if you use
printf to test such a routine, it will result in UB. It is
easy to forget that your string may contain '%', so when
testing string functions, generally avoid using printf() unless
you really need it and know what you are doing.

In short, use printf ("%s", p); when you need to use it.
printf (p); may be cause palpitation. ;)

<snip>

Regards,
Jonathan.
 
C

CBFalconer

Ben said:
.... snip ...

Do you? Speaking of whom, where has he taken his diplomatic
skills? When last seen around here he had a sig intimating that he
was job hunting.

Another sorely missing regular is Richard Heathfield.
 
R

Richard Bos

do {
p = strtok('\0', ", ");

Apart from what the others have written, that is a misleading strtok()
call. The first parameter to strtok() is a char pointer, either to a
(modifiable) string, or a null pointer. Now it so happens that, through
a quirk of the C Standard, a null character is (unfortunately) also a
null pointer constant. However, to use it as such is misguided, and
could be confusing.

Richard
 
K

Keith Thompson

Apart from what the others have written, that is a misleading strtok()
call. The first parameter to strtok() is a char pointer, either to a
(modifiable) string, or a null pointer. Now it so happens that, through
a quirk of the C Standard, a null character is (unfortunately) also a
null pointer constant. However, to use it as such is misguided, and
could be confusing.

I'm sure you meant to say that a null character *constant* is a null
pointer constant. A character variable whose current value happens to
be '\0' won't work.
 
R

Richard Bos

Keith Thompson said:
I'm sure you meant to say that a null character *constant* is a null
pointer constant.

Yes, so I did.
A character variable whose current value happens to be '\0' won't work.

No more than an int object whose value is 0, no.

Richard
 
D

Daniel Haude

On 24 Jan 2005 18:37:09 -0800,
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

[ mind-boggling piece of shi^h^h^hcode deleted ]

Is this really Schildt? I've never seen his book, but this is worse than I
would have ever dreamed.

--Daniel
 
L

Lawrence Kirby

On 24 Jan 2005 18:37:09 -0800,
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

[ mind-boggling piece of shi^h^h^hcode deleted ]

Is this really Schildt? I've never seen his book, but this is worse than I
would have ever dreamed.

Try a Google search on the word "bullschildt"

Lawrence
 
R

Randy Howard

On 24 Jan 2005 18:37:09 -0800,
In the interest of "public disclosure" I am reproducing the coding
sample that I see in my copy of "C: The Complete Reference"

[ mind-boggling piece of shi^h^h^hcode deleted ]

Is this really Schildt? I've never seen his book, but this is worse than I
would have ever dreamed.

Yes, he's really that bad. I would have used the couple books of his I
bought (years ago before I realized the problem) to start fires with, but
now I save them in case somebody doesn't believe how bad he is. I keep
them out of sight, it's embarrassing to even have a copy, you
definitely don't want another programmer seeing you with one.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top