Re: Deficiency of strtok

Discussion in 'C Programming' started by CBFalconer, Aug 8, 2008.

  1. CBFalconer

    CBFalconer Guest

    Pilcrow wrote:
    >
    > Here is a quick program, together with its output, that illustrates
    > what I consider to be a deficiency of the standard function strtok
    > from <string.h>:


    Here is my solution to that problem.

    /* ------- file tknsplit.h ----------*/
    #ifndef H_tknsplit_h
    # define H_tknsplit_h

    # ifdef __cplusplus
    extern "C" {
    # endif

    #include <stddef.h>

    /* copy over the next tkn from an input string, after
    skipping leading blanks (or other whitespace?). The
    tkn is terminated by the first appearance of tknchar,
    or by the end of the source string.

    The caller must supply sufficient space in tkn to
    receive any tkn, Otherwise tkns will be truncated.

    Returns: a pointer past the terminating tknchar.

    This will happily return an infinity of empty tkns if
    called with src pointing to the end of a string. Tokens
    will never include a copy of tknchar.

    released to Public Domain, by C.B. Falconer.
    Published 2006-02-20. Attribution appreciated.
    revised 2007-05-26 (name)
    */

    const char *tknsplit(const char *src, /* Source of tkns */
    char tknchar, /* tkn delimiting char */
    char *tkn, /* receiver of parsed tkn */
    size_t lgh); /* length tkn can receive */
    /* not including final '\0' */

    # ifdef __cplusplus
    }
    # endif
    #endif
    /* ------- end file tknsplit.h ----------*/

    /* ------- file tknsplit.c ----------*/
    #include "tknsplit.h"

    /* copy over the next tkn from an input string, after
    skipping leading blanks (or other whitespace?). The
    tkn is terminated by the first appearance of tknchar,
    or by the end of the source string.

    The caller must supply sufficient space in tkn to
    receive any tkn, Otherwise tkns will be truncated.

    Returns: a pointer past the terminating tknchar.

    This will happily return an infinity of empty tkns if
    called with src pointing to the end of a string. Tokens
    will never include a copy of tknchar.

    A better name would be "strtkn", except that is reserved
    for the system namespace. Change to that at your risk.

    released to Public Domain, by C.B. Falconer.
    Published 2006-02-20. Attribution appreciated.
    Revised 2006-06-13 2007-05-26 (name)
    */

    const char *tknsplit(const char *src, /* Source of tkns */
    char tknchar, /* tkn delimiting char */
    char *tkn, /* receiver of parsed tkn */
    size_t lgh) /* length tkn can receive */
    /* not including final '\0' */
    {
    if (src) {
    while (' ' == *src) src++;

    while (*src && (tknchar != *src)) {
    if (lgh) {
    *tkn++ = *src;
    --lgh;
    }
    src++;
    }
    if (*src && (tknchar == *src)) src++;
    }
    *tkn = '\0';
    return src;
    } /* tknsplit */

    #ifdef TESTING
    #include <stdio.h>

    #define ABRsize 6 /* length of acceptable tkn abbreviations */

    /* ---------------- */

    static void showtkn(int i, char *tok)
    {
    putchar(i + '1'); putchar(':');
    puts(tok);
    } /* showtkn */

    /* ---------------- */

    int main(void)
    {
    char teststring[] = "This is a test, ,, abbrev, more";

    const char *t, *s = teststring;
    int i;
    char tkn[ABRsize + 1];

    puts(teststring);
    t = s;
    for (i = 0; i < 4; i++) {
    t = tknsplit(t, ',', tkn, ABRsize);
    showtkn(i, tkn);
    }

    puts("\nHow to detect 'no more tkns' while truncating");
    t = s; i = 0;
    while (*t) {
    t = tknsplit(t, ',', tkn, 3);
    showtkn(i, tkn);
    i++;
    }

    puts("\nUsing blanks as tkn delimiters");
    t = s; i = 0;
    while (*t) {
    t = tknsplit(t, ' ', tkn, ABRsize);
    showtkn(i, tkn);
    i++;
    }
    return 0;
    } /* main */

    #endif
    /* ------- end file tknsplit.c ----------*/

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.
     
    CBFalconer, Aug 8, 2008
    #1
    1. Advertising

  2. CBFalconer

    Richard Guest

    CBFalconer <> writes:

    > Pilcrow wrote:
    >>
    >> Here is a quick program, together with its output, that illustrates
    >> what I consider to be a deficiency of the standard function strtok
    >> from <string.h>:

    >
    > Here is my solution to that problem.
    >
    > /* ------- file tknsplit.h ----------*/
    > #ifndef H_tknsplit_h
    > # define H_tknsplit_h
    >
    > # ifdef __cplusplus
    > extern "C" {
    > # endif
    >
    > #include <stddef.h>
    >
    > /* copy over the next tkn from an input string, after
    > skipping leading blanks (or other whitespace?). The
    > tkn is terminated by the first appearance of tknchar,
    > or by the end of the source string.
    >
    > The caller must supply sufficient space in tkn to
    > receive any tkn, Otherwise tkns will be truncated.
    >
    > Returns: a pointer past the terminating tknchar.
    >
    > This will happily return an infinity of empty tkns if
    > called with src pointing to the end of a string. Tokens
    > will never include a copy of tknchar.
    >
    > released to Public Domain, by C.B. Falconer.
    > Published 2006-02-20. Attribution appreciated.
    > revised 2007-05-26 (name)
    > */
    >
    > const char *tknsplit(const char *src, /* Source of tkns */
    > char tknchar, /* tkn delimiting char */
    > char *tkn, /* receiver of parsed tkn */
    > size_t lgh); /* length tkn can receive */
    > /* not including final '\0' */
    >
    > # ifdef __cplusplus
    > }
    > # endif
    > #endif
    > /* ------- end file tknsplit.h ----------*/
    >
    > /* ------- file tknsplit.c ----------*/
    > #include "tknsplit.h"
    >
    > /* copy over the next tkn from an input string, after
    > skipping leading blanks (or other whitespace?). The
    > tkn is terminated by the first appearance of tknchar,
    > or by the end of the source string.
    >
    > The caller must supply sufficient space in tkn to
    > receive any tkn, Otherwise tkns will be truncated.
    >
    > Returns: a pointer past the terminating tknchar.
    >
    > This will happily return an infinity of empty tkns if
    > called with src pointing to the end of a string. Tokens
    > will never include a copy of tknchar.
    >
    > A better name would be "strtkn", except that is reserved
    > for the system namespace. Change to that at your risk.
    >
    > released to Public Domain, by C.B. Falconer.
    > Published 2006-02-20. Attribution appreciated.
    > Revised 2006-06-13 2007-05-26 (name)
    > */
    >
    > const char *tknsplit(const char *src, /* Source of tkns */
    > char tknchar, /* tkn delimiting char */
    > char *tkn, /* receiver of parsed tkn */
    > size_t lgh) /* length tkn can receive */
    > /* not including final '\0' */
    > {
    > if (src) {
    > while (' ' == *src) src++;
    >
    > while (*src && (tknchar != *src)) {
    > if (lgh) {
    > *tkn++ = *src;
    > --lgh;
    > }
    > src++;
    > }
    > if (*src && (tknchar == *src)) src++;
    > }


    I would replace the function with this more or less for the following
    reasons:

    Back to front comparisons are used in a minority of code and most people
    hate them. And yes I do know "most" is not "all". Naming of variables
    seems almost meaningles - lgh is what? it doesnt save much compiling
    time to use meaningful names in a publicly released library. Oh and like
    other functions assume it is called with meaningful data.

    ,----
    |
    |/* remove leading white space */
    |while(*source==' ')
    | *source++;
    |
    |/* store string up to next token */
    |while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
    | *savedToken++=tokenChar;
    |
    |*savedToken='\0';
    |
    `----





    > *tkn = '\0';
    > return src;
    > } /* tknsplit */
    >
    > #ifdef TESTING
    > #include <stdio.h>
    >
    > #define ABRsize 6 /* length of acceptable tkn abbreviations */
    >
    > /* ---------------- */
    >
    > static void showtkn(int i, char *tok)
    > {
    > putchar(i + '1'); putchar(':');
    > puts(tok);
    > } /* showtkn */
    >
    > /* ---------------- */
    >
    > int main(void)
    > {
    > char teststring[] = "This is a test, ,, abbrev, more";
    >
    > const char *t, *s = teststring;
    > int i;
    > char tkn[ABRsize + 1];
    >
    > puts(teststring);
    > t = s;
    > for (i = 0; i < 4; i++) {
    > t = tknsplit(t, ',', tkn, ABRsize);
    > showtkn(i, tkn);
    > }
    >
    > puts("\nHow to detect 'no more tkns' while truncating");
    > t = s; i = 0;
    > while (*t) {
    > t = tknsplit(t, ',', tkn, 3);
    > showtkn(i, tkn);
    > i++;
    > }
    >
    > puts("\nUsing blanks as tkn delimiters");
    > t = s; i = 0;
    > while (*t) {
    > t = tknsplit(t, ' ', tkn, ABRsize);
    > showtkn(i, tkn);
    > i++;
    > }
    > return 0;
    > } /* main */
    >
    > #endif
    > /* ------- end file tknsplit.c ----------*/


    --
     
    Richard, Aug 8, 2008
    #2
    1. Advertising

  3. CBFalconer

    Richard Guest

    Richard<> writes:

    > CBFalconer <> writes:
    >
    > ,----
    > |
    > |/* remove leading white space */
    > |while(*source==' ')
    > | *source++;


    Whoops! ^ error above left to the inquisitive to spot and pick on ....

    > |
    > |/* store string up to next token */
    > |while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
    > | *savedToken++=tokenChar;
    > |
    > |*savedToken='\0';
    > |
    > `----
    >
    >
    >
    >
    >
    >> *tkn = '\0';
    >> return src;
    >> } /* tknsplit */
    >>
    >> #ifdef TESTING
    >> #include <stdio.h>
    >>
    >> #define ABRsize 6 /* length of acceptable tkn abbreviations */
    >>
    >> /* ---------------- */
    >>
    >> static void showtkn(int i, char *tok)
    >> {
    >> putchar(i + '1'); putchar(':');
    >> puts(tok);
    >> } /* showtkn */
    >>
    >> /* ---------------- */
    >>
    >> int main(void)
    >> {
    >> char teststring[] = "This is a test, ,, abbrev, more";
    >>
    >> const char *t, *s = teststring;
    >> int i;
    >> char tkn[ABRsize + 1];
    >>
    >> puts(teststring);
    >> t = s;
    >> for (i = 0; i < 4; i++) {
    >> t = tknsplit(t, ',', tkn, ABRsize);
    >> showtkn(i, tkn);
    >> }
    >>
    >> puts("\nHow to detect 'no more tkns' while truncating");
    >> t = s; i = 0;
    >> while (*t) {
    >> t = tknsplit(t, ',', tkn, 3);
    >> showtkn(i, tkn);
    >> i++;
    >> }
    >>
    >> puts("\nUsing blanks as tkn delimiters");
    >> t = s; i = 0;
    >> while (*t) {
    >> t = tknsplit(t, ' ', tkn, ABRsize);
    >> showtkn(i, tkn);
    >> i++;
    >> }
    >> return 0;
    >> } /* main */
    >>
    >> #endif
    >> /* ------- end file tknsplit.c ----------*/


    --
     
    Richard, Aug 8, 2008
    #3
  4. CBFalconer

    santosh Guest

    Richard wrote:
    > CBFalconer <> writes:


    <snip>

    >> const char *tknsplit(const char *src, /* Source of tkns */
    >> char tknchar, /* tkn delimiting char */
    >> char *tkn, /* receiver of parsed tkn */
    >> size_t lgh) /* length tkn can receive */
    >> /* not including final '\0' */
    >> {
    >> if (src) {
    >> while (' ' == *src) src++;
    >>
    >> while (*src && (tknchar != *src)) {
    >> if (lgh) {
    >> *tkn++ = *src;
    >> --lgh;
    >> }
    >> src++;
    >> }
    >> if (*src && (tknchar == *src)) src++;
    >> }

    >
    > I would replace the function with this more or less for the following
    > reasons:
    >
    > Back to front comparisons are used in a minority of code and most
    > people hate them. And yes I do know "most" is not "all". Naming of
    > variables seems almost meaningles - lgh is what? it doesnt save much
    > compiling time to use meaningful names in a publicly released library.
    > Oh and like other functions assume it is called with meaningful data.
    >
    > ,----
    > |
    > |/* remove leading white space */
    > |while(*source==' ')


    Wouldn't you be better of using isspace()?

    > | *source++;


    I suppose this should be source++?

    > |/* store string up to next token */
    > |while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
    > | *savedToken++=tokenChar;
    > |
    > |*savedToken='\0';
    > |
    > `----


    <snip>
     
    santosh, Aug 8, 2008
    #4
  5. Richard<> writes:

    > CBFalconer <> writes:

    <snip>
    >> const char *tknsplit(const char *src, /* Source of tkns */
    >> char tknchar, /* tkn delimiting char */
    >> char *tkn, /* receiver of parsed tkn */
    >> size_t lgh) /* length tkn can receive */
    >> /* not including final '\0' */
    >> {
    >> if (src) {
    >> while (' ' == *src) src++;
    >>
    >> while (*src && (tknchar != *src)) {
    >> if (lgh) {
    >> *tkn++ = *src;
    >> --lgh;
    >> }
    >> src++;
    >> }
    >> if (*src && (tknchar == *src)) src++;
    >> }

    >
    > I would replace the function with this more or less for the following
    > reasons:
    >
    > Back to front comparisons are used in a minority of code and most people
    > hate them. And yes I do know "most" is not "all". Naming of variables
    > seems almost meaningles - lgh is what? it doesnt save much compiling
    > time to use meaningful names in a publicly released library. Oh and like
    > other functions assume it is called with meaningful data.
    >
    > ,----
    > |
    > |/* remove leading white space */
    > |while(*source==' ')
    > | *source++;


    This erroneous * has been corrected elsewhere, but..

    > |
    > |/* store string up to next token */
    > |while(maxLength-- && ((tokenChar = *source++) != endOfTokenChar))
    > | *savedToken++=tokenChar;
    > |
    > |*savedToken='\0';
    > |
    > `----


    You can't be serious? You seem to be since you corrected one mistake
    but you've left plenty more. Hardly a re-write to recommend.

    >> *tkn = '\0';
    >> return src;
    >> } /* tknsplit */


    --
    Ben.
     
    Ben Bacarisse, Aug 8, 2008
    #5
  6. Pilcrow <> writes:

    <snip>
    > int main(void)
    > {
    > char line[256];
    > while(fgets(line, sizeof line,
    > stdin)!= NULL && strlen(line) >1) { /* exit on empty line or ^Z
    > */
    > const char *t = line;
    > int i;
    > char tkn[1]; /*** ?? ***/
    > i = 0;
    >
    > printf("\nINPUT: %s\n",line);
    > puts("OUTPUT:");
    > while (*t) {
    > t = tknsplit(t, ',', tkn, 100);
    > showtkn(i, tkn);
    > i++;
    > }
    > putchar('\n');
    > }
    > return 0;
    > } /* main */
    >
    > When I test, I get:
    >
    > C:\>gcc -pedantic -O -Wall -Wextra tknsplit.c -o tknsplit
    >
    > C:\>tknsplit.exe
    > ghghghghghg,hhhhhh,, ,kk
    >
    > INPUT: ghghghghghg,hhhhhh,, ,kk
    >
    > OUTPUT:
    > 0:[ghghghghghg]
    > 1:[hhhhhh]
    > 2:[]
    > 3:[]
    > 4:[kk]
    >
    > Notice the definition
    > char tkn[1];
    >
    > This should not work according to Mr. Falconer. Why does it?


    Since you need a '\0' at the end you can't store any useful tokens in
    an array of size 1. You tell the function that the array can hold 100
    characters and at that point you enter the realms of undefined
    behaviour. You were just unlucky that the program ran without obvious
    fault -- you must not lie about the size of receiving array.

    --
    Ben.
     
    Ben Bacarisse, Aug 17, 2008
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. PenguinOfDoom

    distutils deficiency

    PenguinOfDoom, Jun 25, 2003, in forum: Python
    Replies:
    0
    Views:
    739
    PenguinOfDoom
    Jun 25, 2003
  2. Gary Feldman

    Deficiency in urllib/socket for https?

    Gary Feldman, Aug 21, 2003, in forum: Python
    Replies:
    4
    Views:
    497
    John J. Lee
    Aug 23, 2003
  3. KMyers1
    Replies:
    2
    Views:
    418
    KMyers1
    Jul 20, 2007
  4. KMyers1
    Replies:
    0
    Views:
    323
    KMyers1
    Jul 20, 2007
  5. robic0
    Replies:
    15
    Views:
    207
    robic0
    Dec 29, 2005
Loading...

Share This Page