strtok

  • Thread starter Bill Cunningham

Bill Cunningham

Why does my man page for strtok and an extension for a thread safe
version of this function say not to use these functions? Are they buggy or
deprecated? What should one use instead?

Bill
 
Nick Keighley

    Why does my man page for strtok and an extension for a thread safe
version of this function say not to use these functions? Are they buggy or
deprecated?

Neither. What does the man page say? Some people don't like what
strtok() does: it modifies the string you pass to it and doesn't
handle empty fields in the way you might like.

    What should one use instead?

Good question. There's no portable answer, so write your own, I
suppose. I've used strtok(); it's OK, it does what it says on the can.
 
Johann Klammer

Bill said:
Why does my man page for strtok and an extension for a thread safe
version of this function say not to use these functions? Are they buggy or
deprecated? What should one use instead?

Bill

To my knowledge, it uses internal state to maintain the current position,
which can cause race conditions if it is used from multiple threads
concurrently.

eglibc has strtok_r(), which is a reentrant function (it is specified by
POSIX, so it is not specific to eglibc).
It allows the user to specify the address of a variable to use for
storing position.
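
To make the point about caller-supplied state concrete, here is a minimal sketch, assuming a POSIX system where strtok_r() is available. The function name count_pairs and the "key=value;key=value" input format are made up for illustration. Two tokenizations run at the same time, each with its own save pointer, which plain strtok() cannot do.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX system, strtok_r in <string.h> */
#include <assert.h>
#include <string.h>

/*
 * Count the "key=value" pairs in a string such as "a=1;b=2;c=3".
 * The outer loop splits on ';' and the inner calls split each piece on '='.
 * Each tokenization keeps its position in its own save pointer, so the
 * two do not disturb each other. The input buffer is modified.
 * (count_pairs is a hypothetical helper, not part of any library.)
 */
int count_pairs(char *text)
{
    assert(text != NULL);

    char *outer_state;
    int pairs = 0;

    for (char *item = strtok_r(text, ";", &outer_state);
         item != NULL;
         item = strtok_r(NULL, ";", &outer_state)) {
        char *inner_state;
        char *key = strtok_r(item, "=", &inner_state);
        char *value = strtok_r(NULL, "=", &inner_state);
        if (key != NULL && value != NULL)
            pairs++;
    }
    return pairs;
}
```

Called on a writable buffer holding "a=1;b=2;c=3", this should count three pairs. The same nesting written with strtok() would lose the outer position as soon as the inner call started.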

[OT]
This is similar to the errno races observed with threaded code.
Some C libraries therefore #define errno as a function call returning a
pointer, which is then dereferenced, effectively creating a thread-local
errno.
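
The errno trick described above can be sketched roughly as follows, assuming a C11 compiler with _Thread_local. Every name here (my_errno, my_errno_location, checked_divide) is hypothetical and chosen so as not to clash with the real errno; this shows the general shape of the idea, not any particular library's source.

```c
#include <assert.h>
#include <stddef.h>

/* One copy of the error code per thread (requires C11 _Thread_local). */
static _Thread_local int my_errno_value;

/* Return the address of the calling thread's own error code. */
int *my_errno_location(void)
{
    return &my_errno_value;
}

/* Callers keep writing "my_errno" as if it were a plain variable. */
#define my_errno (*my_errno_location())

/* A hypothetical routine that reports failure through my_errno. */
int checked_divide(int a, int b, int *result)
{
    assert(result != NULL);
    if (b == 0) {
        my_errno = 1;   /* hypothetical error code */
        return -1;
    }
    *result = a / b;
    return 0;
}
```

Because the macro expands to a dereferenced function call, both reads and assignments such as my_errno = 0 still compile, yet each thread sees its own value.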
 
osmium

Bill Cunningham said:
Why does my man page for strtok and an extension for a thread safe
version of this function say not to use these functions? Are they buggy or
deprecated? What should one use instead?

Bill, I suggest you put any concerns about thread safe operations on the
back burner, next to "How will my program accommodate quantum computers?"

If you would quit trying to learn everything you might possibly learn
*something*.
 
Bill Cunningham

Nick said:
neither,what does the man page say? Some people don't like what
strtok() does. It modifies the string you pass to it and doesn't
handle empty fields in the way you might like.


good question. There's no portable answer. So write your own I
suppose. I've used strtok() its ok it does what it says on the can

Under bugs I have:
"Never use these functions. If you do, note:
These functions modify their first argument.
These functions can't be used with const strings.
The identity of the delimiting character is lost.
strtok uses a static buffer while parsing, so it is not thread safe. If using
threads, use strtok_r."

The man pages online don't say this.

Bill
 
Ben Bacarisse

Bill Cunningham said:
Why does my man page for strtok and an extension for a thread safe
version of this function say not to use these functions? Are they buggy or
deprecated? What should one use instead?

If your question is still about reading your market data (as you posted
on comp.programming) you don't need strtok.

If you are sure that the data will be in the format you previously
described, fscanf can do the job perfectly well. If you want a little
more control, read each line using fgets and use sscanf to "parse" the
data you want.

Example:

#include <stdio.h>

int main(void)
{
    double price;
    int line_no, day, month, year;
    char line[100];
    while (fgets(line, sizeof line, stdin) &&
           sscanf(line, "%d %lf %2d%2d%2d",
                  &line_no, &price, &day, &month, &year) == 5)
        printf("line number: %d, price=%f on %d/%02d/%02d\n",
               line_no, price, day, month, year);
    return 0;
}
 
Bill Cunningham

Ben said:
If your question is still about reading your market data (as you
posted on comp.programming) you don't need strtok.

Ok

If you are sure that the data will be in the format you previously
described, fscanf can do the job perfectly well.

Ok

If you want a little
more control, read each line using fgets and use sscanf to "parse" the
data you want.

Example:

#include <stdio.h>

int main(void)
{
    double price;
    int line_no, day, month, year;
    char line[100];
    while (fgets(line, sizeof line, stdin) &&
           sscanf(line, "%d %lf %2d%2d%2d",
                  &line_no, &price, &day, &month, &year) == 5)
        printf("line number: %d, price=%f on %d/%02d/%02d\n",
               line_no, price, day, month, year);
    return 0;
}

Thanks. That sscanf is a little hard to read, but then again I'm not
familiar with that function.

Bill
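
For anyone else who finds that sscanf call hard to read, here is the same format string split into one piece per directive, with a comment on each. This is only a sketch: the struct quote type, the parse_quote name, and the sample line "7 12.50 010112" are assumptions made for illustration, since the actual data format was described in another thread.

```c
#include <assert.h>
#include <stdio.h>

struct quote {              /* hypothetical record type, not from the thread */
    int line_no;
    double price;
    int day, month, year;
};

/*
 * Parse one line such as "7 12.50 010112".
 * Returns 1 if all five fields were converted, 0 otherwise.
 * Adjacent string literals are concatenated by the compiler, so the
 * format below is exactly "%d %lf %2d%2d%2d".
 */
int parse_quote(const char *line, struct quote *q)
{
    assert(line != NULL && q != NULL);

    int converted = sscanf(line,
        "%d"    /* line number: skips leading white space, reads an int */
        " %lf"  /* the space matches any white space; %lf reads a double */
        " %2d"  /* first date field: at most two characters */
        "%2d"   /* second date field: the next two characters */
        "%2d",  /* third date field: the last two characters */
        &q->line_no, &q->price, &q->day, &q->month, &q->year);

    /* sscanf returns how many fields it converted and stored. */
    return converted == 5;
}
```

Comparing the return value of sscanf with the number of expected fields is what lets the loop in the example above stop at the first malformed line.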
 
jacob navia

On 01/01/12 22:10, osmium wrote:
Bill, I suggest you put any concerns about thread safe operations on the
back burner, next to "How will my program accommodate quantum computers?"

If you would quit trying to learn everything you might possibly learn
*something*.

Good words!
 
John Tsiombikas

Under bugs I have :
"Never use these functions. If you do note

That's an overzealous man page author. Never mind that; use it if it does
what you need, just be aware of its limitations, as others pointed out.
 
Ben Pfaff

Bill Cunningham said:
Why does my man page for strtok and an extension for a thread safe
version of this function say not to use these functions? Are they buggy or
deprecated? What should one use instead?

strtok() has at least these problems:

* It merges adjacent delimiters. If you use a comma as your
delimiter, then "a,,b,c" will be divided into three tokens,
not four. This is often the wrong thing to do. In fact, it
is only the right thing to do, in my experience, when the
delimiter set contains white space (for dividing a string
into "words") or it is known in advance that there will be
no adjacent delimiters.

* The identity of the delimiter is lost, because it is
changed to a null terminator.

* It modifies the string that it tokenizes. This is bad
because it forces you to make a copy of the string if
you want to use it later. It also means that you can't
tokenize a string literal with it; this is not
necessarily something you'd want to do all the time but
it is surprising.

* It can only be used once at a time. If a sequence of
strtok() calls is ongoing and another one is started,
the state of the first one is lost. This isn't a
problem for small programs but it is easy to lose track
of such things in hierarchies of nested functions in
large programs. In other words, strtok() breaks
encapsulation.
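
The first and third points in the list above are easy to see in a small sketch. The helper name split_with_strtok is made up for illustration; it only wraps the usual strtok() loop and records the token pointers.

```c
#include <assert.h>
#include <string.h>

/*
 * Split buf on commas with strtok, storing up to max token pointers
 * in tokens. Returns the number of tokens stored.
 * buf is modified in place: delimiters are overwritten with '\0'.
 * (split_with_strtok is a hypothetical helper, not a library function.)
 */
size_t split_with_strtok(char *buf, char *tokens[], size_t max)
{
    assert(buf != NULL && tokens != NULL);

    size_t n = 0;
    for (char *t = strtok(buf, ",");
         t != NULL && n < max;
         t = strtok(NULL, ","))
        tokens[n++] = t;
    return n;
}
```

Given a writable array initialized with "a,,b,c", this stores three tokens ("a", "b" and "c"), because the empty field between the two commas is skipped. Afterwards the array reads as just "a" when treated as a C string, since the first comma has been replaced by a null terminator.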
 
David Thompson

strtok() has at least these problems:

* It merges adjacent delimiters. If you use a comma as your
delimiter, then "a,,b,c" will be divided into three tokens,
not four. This is often the wrong thing to do. In fact, it
is only the right thing to do, in my experience, when the
delimiter set contains white space (for dividing a string
into "words") or it is known in advance that there will be
no adjacent delimiters.

Yes.

* The identity of the delimiter is lost, because it is
changed to a null terminator.

Yes, but in my experience at least half the time the delimiter is unique
(so its identity doesn't matter). There are a few cases where even
nonunique delimiters don't matter; one I personally like (for vanishingly
small values of "like") is the punctuation in US telephone numbers.

* It modifies the string that it tokenizes. This is bad
because it forces you to make a copy of the string if
you want to use it later. It also means that you can't
tokenize a string literal with it; this is not
necessarily something you'd want to do all the time but
it is surprising.

Yes, but if you want to parse nondestructively and use the tokens as C
strings, you have to copy anyway, and often allocate each token
separately, which is almost certainly costlier. That second condition
isn't a given, though: I worked on one largish project that religiously
used {ptr,len} into existing/shared buffers. It required a rigid policy
on the lifetime of the shared buffers (which for this app was fairly
easy) and a substantial custom library (basically duplicating string.h
plus more), but once that cost was paid it worked nicely.

And that literals are not (safely) modifiable should be surprising at
most once, and needs to be learned anyway even without strtok();
consider mktemp and mkstemp, strupr and strlwr, or doing parity or
rot13 or other simple ciphering in place.

* It can only be used once at a time. If a sequence of
strtok() calls is ongoing and another one is started,
the state of the first one is lost. This isn't a
problem for small programs but it is easy to lose track
of such things in hierarchies of nested functions in
large programs. In other words, strtok() breaks
encapsulation.

Yes. Although it's often a good idea (sometimes even a requirement) to
separate the parsing from the 'real' processing, and if the parsing by
itself is so complicated you can't grok all the code in one fwoop,
your input syntax is in danger of being unusable anyway.
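
As a rough illustration of the {ptr,len} approach mentioned above, here is a minimal sketch of a non-destructive splitter built on strcspn(). The struct span type and the split_spans name are hypothetical and are not taken from the project described; a real library along those lines would need much more than this.

```c
#include <assert.h>
#include <string.h>

struct span {               /* hypothetical {ptr,len} token type */
    const char *ptr;        /* start of the field inside the original text */
    size_t len;             /* number of characters in the field */
};

/*
 * Split text on any character in delims without modifying it.
 * Empty fields are kept, so "a,,b,c" yields four spans.
 * Stores up to max spans in out and returns the number stored.
 */
size_t split_spans(const char *text, const char *delims,
                   struct span out[], size_t max)
{
    assert(text != NULL && delims != NULL && out != NULL);

    size_t n = 0;
    const char *p = text;
    while (n < max) {
        size_t len = strcspn(p, delims);  /* length of the field at p */
        out[n].ptr = p;
        out[n].len = len;
        n++;
        if (p[len] == '\0')
            break;                        /* that was the last field */
        p += len + 1;                     /* step over the delimiter */
    }
    return n;
}
```

Since nothing is written to the input, this works on a string literal, and the delimiter that ended each field is still there to inspect at ptr[len]. The price is that the spans are not null-terminated, so they cannot be handed directly to functions that expect C strings.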
 
