strtok and strsep

Keith Thompson

Bill Cunningham said:
I just read that some new function called strsep that I've never heard
of has replaced strtok and strtok is now evidently deprecated. Is this C99
or some other standard? strsep...got me.

strsep() is a *proposed* replacement for strtok() (and I don't think
it's particularly new). It differs in its handling of empty fields.

strsep(), unlike strtok(), is not defined by the C standard (nor is it
defined by POSIX). My man page says it conforms to 4.4BSD.
 
Bill Cunningham

I just read that some new function called strsep that I've never heard
of has replaced strtok and strtok is now evidently deprecated. Is this C99
or some other standard? strsep...got me.


Bill
 
Seebs

strsep() is a *proposed* replacement for strtok() (and I don't think
it's particularly new).

It's not.

It differs in its handling of empty fields.

Also in that it is inherently thread-safe, etcetera.

strsep(), unlike strtok(), is not defined by the C standard (nor is it
defined by POSIX). My man page says it conforms to 4.4BSD.

And to this day I think it should have been added in C99 when we had the
proposal. It's trivial to write, and it's a VERY useful function... The
latter being more than I feel I can say for strtok().

-s
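
To illustrate the point that the function is short to write, here is a minimal sketch of a strsep()-like function built on strcspn(). The name sep_token is made up for this example so that it cannot clash with a system strsep(); the behavior follows the description in the BSD man page, and a real library implementation may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/*
 * sep_token is a hypothetical name, not a library function.
 * It returns the next field of *stringp, overwrites the delimiter
 * that ends the field with '\0', and advances *stringp past that
 * delimiter.  After the last field, *stringp becomes NULL; calling
 * with a NULL *stringp returns NULL.
 */
char *sep_token(char **stringp, const char *delim)
{
    char *start = *stringp;
    char *end;

    if (start == NULL)
        return NULL;

    /* Find the first delimiter, or the terminating '\0' if none. */
    end = start + strcspn(start, delim);

    if (*end == '\0') {
        *stringp = NULL;        /* this was the last field */
    } else {
        *end = '\0';            /* terminate the field in place */
        *stringp = end + 1;     /* resume just past the delimiter */
    }
    return start;
}
```

All state lives in the caller's pointer, so nothing is shared between calls. Given a writable buffer holding "a,,b", successive calls with "," as the delimiter return "a", an empty string, and "b", then a null pointer.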
 
Kaz Kylheku

strsep() is a *proposed* replacement for strtok() (and I don't think
it's particularly new). It differs in its handling of empty fields.

strsep(), unlike strtok(), is not defined by the C standard (nor is it
defined by POSIX). My man page says it conforms to 4.4BSD.

People working with strings in C using the standard library
should familiarize themselves with the Snobol-inspired functions
strspn, strcspn and strpbrk.

(Snobol has the pattern matching primitives SPAN and BREAK,
where we get the terminology for strspn and strpbrk: the pointer
to the break!)

Time and time again I have seen awful ad-hoc tokenizing C code that could have
been reduced to like 1/5 the number of lines with strspn and strcspn, and made
easy to understand at the same time.

In 2001 I posted the following to comp.lang.c: a strtok function which lets you
maintain your context to avoid the internal global variable.
It retains the disadvantage of poking zeros in the original string.

You can see that the task of pulling a token based on a
set of separator characters is very easy: a call to strspn,
a call to strcspn, and taking care of some cases.

#include <string.h>
#include <stdio.h>

/*
 * To use this function, initialize a pointer P
 * to point to the start of the string. Then extract
 * tokens T like this:
 *     T = get_next_token(&P, delimiters);
 * When it returns a null pointer, there are no more,
 * and P is set to null value as well.
 */

char *get_next_token(char **context, const char *delim)
{
    char *ret;

    /* A null context indicates no more tokens. */
    if (*context == 0)
        return 0;

    /* Skip delimiters to find start of token */
    ret = (*context += strspn(*context, delim));

    /* skip to end of token */
    *context += strcspn(*context, delim);

    /* If the token has zero length, we just
       skipped past a run of trailing delimiters, or
       were at the end of the string already.
       There are no more tokens. */

    if (ret == *context) {
        *context = 0;
        return 0;
    }

    /* If the character past the end of the token is the end of the string,
       set context to 0 so next time we will report no more tokens.
       Otherwise put a 0 there, and advance one character past. */

    if (**context == 0) {
        *context = 0;
    } else {
        **context = 0;
        (*context)++;
    }

    return ret;
}

/*
 * Handy macro wrapper for get_next_token
 */

#define FOR_EACH_TOKEN(CTX, I, S, D) \
    for (CTX = (S), (I) = get_next_token(&(CTX), D); \
         (I) != 0; \
         (I) = get_next_token(&(CTX), D))

int main(int argc, char **argv)
{
    char *context, *iter;

    if (argc >= 2)
        FOR_EACH_TOKEN (context, iter, argv[1], ":")
            puts(iter);

    return 0;
}
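
The disadvantage mentioned above, writing zeros into the original string, can be avoided by reporting each token as a pointer plus a length. The following is only a sketch of that idea, using the same strspn/strcspn pair; next_span is a hypothetical name and is not part of the code posted above.

```c
#include <stddef.h>
#include <string.h>

/*
 * next_span is a hypothetical helper.  Starting at *cursor, it skips
 * leading delimiters, then measures the token that follows.  It returns
 * a pointer to the token and stores its length in *len, or returns NULL
 * (with *len set to 0) when no token remains.  The input string is
 * never modified, so it may be a string literal.
 */
const char *next_span(const char **cursor, const char *delim, size_t *len)
{
    /* Skip any run of delimiters in front of the token. */
    const char *start = *cursor + strspn(*cursor, delim);

    /* Measure up to the next delimiter or the end of the string. */
    *len = strcspn(start, delim);
    *cursor = start + *len;

    return *len != 0 ? start : NULL;
}
```

Because the tokens are not null-terminated, a caller would print one with something like printf("%.*s\n", (int)len, tok), or copy it out with memcpy.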
 
Charles Richmond

Bill Cunningham said:
I just read that some new function called strsep that I've never heard
of has replaced strtok and strtok is now evidently deprecated. Is this C99
or some other standard? strsep...got me.

From my limited use of strsep(), one major difference between strsep() and
strtok()... is that strtok() will skip over a run of characters that are
delimiter characters, and strsep() will return a null string for each
additional delimiter character. Also, as pointed out by others, strtok()
saves internal state and thus is *not* thread safe.
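
The difference in field handling can be made concrete by counting fields both ways. This is a rough sketch: both function names are invented for the example, and the second function imitates strsep()-style splitting with strpbrk() so that the code stays within standard C rather than calling strsep() itself.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count tokens the way strtok() sees them: a run of delimiters is
 * treated as one separator.  Modifies s, as strtok() always does.
 */
size_t count_strtok_fields(char *s, const char *delim)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok(s, delim); tok != NULL; tok = strtok(NULL, delim))
        n++;
    return n;
}

/*
 * Count fields the way a strsep()-style splitter sees them: every
 * delimiter ends a field, so adjacent delimiters produce empty fields.
 * Does not modify s.
 */
size_t count_separated_fields(const char *s, const char *delim)
{
    size_t n = 1;               /* even an empty string is one field */
    const char *p;

    for (p = strpbrk(s, delim); p != NULL; p = strpbrk(p + 1, delim))
        n++;
    return n;
}
```

For the input "a,,b," with "," as the delimiter, the strsep()-style count is 4 (the fields "a", "", "b", ""), while the strtok() count is 2.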
 
Malcolm McLean

From my limited use of strsep(), one major difference between strsep() and
strtok()... is that strtok() will skip over a run of characters that are
delimiter characters, and strsep() will return a null string for each
additional delimiter character.  Also, as pointed out by others, strtok()
saves internal state and thus is *not* thread safe.

The problem is that a run of spaces almost certainly means the same
thing as just a single space, a run of commas almost certainly
indicates missing data, unless it's a trailing comma on a newline, and
a run of non-space whitespace, like tabs, could mean anything,
depending on context.

These rules are hard to code in a single function.
 
Seebs

The problem is that a run of spaces almost certainly means the same
thing as just a single space, a run of commas almost certainly
indicates missing data, unless it's a trailing comma on a newline, and
a run of non-space whitespace, like tabs, could mean anything,
depending on context.

These rules are hard to code in a single function.

But unnecessary, because it's easy to handle that using strsep() and
knowing what rules you're using at any given time. Skipping the null
substrings is trivial.

-s
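
As a sketch of how skipping the empty substrings might look in practice, the loop below calls strsep() and ignores zero-length fields. It assumes a system that provides strsep() (glibc and the BSDs do); the _DEFAULT_SOURCE define is the usual way to ask glibc to declare it and may be unnecessary or different elsewhere. The helper name split_nonempty is made up for this example.

```c
/* Ask glibc to declare strsep(); BSD-derived systems usually
   declare it without any feature-test macro. */
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>

/*
 * split_nonempty is a hypothetical helper.  It splits s in place on
 * any character in delim, stores up to max non-empty fields in out[],
 * and returns how many it stored.  Empty fields are skipped.
 */
size_t split_nonempty(char *s, const char *delim, char **out, size_t max)
{
    size_t n = 0;
    char *field;

    while (n < max && (field = strsep(&s, delim)) != NULL) {
        if (*field == '\0')
            continue;           /* empty field between adjacent delimiters */
        out[n++] = field;
    }
    return n;
}
```

With spaces as delimiters, this treats a run of spaces like a single space; leaving out the empty-field check would instead preserve the missing-data fields of comma-separated input.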
 
