strsup - supplementary string functions

Discussion in 'C Programming' started by Malcolm McLean, Jan 22, 2013.

  1. I'm putting together a little library of supplementary string functions,
    strsup.c.

    The functions are intended to be fairly short, and to operate on char *s.
    They should ideally be implementable as a single function which can be snipped
    and pasted.

    Unlike the string.h functions they can depend on malloc().

    The obvious first function is strdup().

    I've also got

    strcount - count the instances of character ch in str.
    trim - remove leading and trailing whitespace from a string
    singlespace - replace all runs of whitespace in a string with a single
    space character.
    split - return an array of fields, split on a delimiter character.
    compnumeric - compare two strings with embedded numbers.
    replace - replace all substrings with new passed string.

    Any ideas for more?
    Malcolm McLean, Jan 22, 2013
    #1

  2. Malcolm McLean <> writes:
    > I'm putting together a little library of supplementary string functions,
    > strsup.c.
    >
    > The functions are intended to be fairly short, and to operate on char *s.
    > They should ideally be implementable as a single function which can be snipped
    > and pasted.
    >
    > Unlike the string.h functions they can depend on malloc().
    >
    > The obvious first function is strdup().
    >
    > I've also got
    >
    > strcount - count the instances of character ch in str.
    > trim - remove leading and trailing whitespace from a string
    > singlespace - replace all runs of whitespace in a string with a single
    > space character.
    > split - return an array of fields, split on a delimiter character.
    > compnumeric - compare two strings with embedded numbers.
    > replace - replace all substrings with new passed string.


    Just one comment: the name "strcount" is reserved. And the other names
    could easily conflict with existing identifiers in user code.

    You might consider inventing a consistent naming scheme for your
    functions.

    If the purpose is to copy-and-paste the definitions, the user can change
    the name easily enough, but if they're going to be part of a library
    (which would make sense), then naming becomes more important.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jan 22, 2013
    #2
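To illustrate the naming point, here is a minimal sketch of the character-counting function under a project prefix. The `sup_` prefix and the name `sup_count_char` are hypothetical; they are chosen only so the identifier does not begin with `str` followed by a lowercase letter, a pattern the C standard reserves for future additions to string.h.

```c
#include <stddef.h>

/* Count occurrences of ch in the nul-terminated string s.
 * "sup_" is an assumed project prefix, not a name from the thread.
 * The terminating nul itself is never counted. */
size_t sup_count_char(const char *s, int ch)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if ((unsigned char)*s == (unsigned char)ch)
            n++;
    return n;
}
```

Any other consistent prefix would do equally well; the point is only that every function in the library shares it.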

  3. On 01/22/2013 02:23 PM, Keith Thompson wrote:
    > --
    > Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    > Working, but not speaking, for JetHead Development, Inc.


    I just noticed that you've dropped your "Will write code for food" sig.
    Congratulations, and good luck!
    James Kuyper, Jan 22, 2013
    #3
  4. James Kuyper <> writes:
    > On 01/22/2013 02:23 PM, Keith Thompson wrote:
    >> --
    >> Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    >> Working, but not speaking, for JetHead Development, Inc.

    >
    > I just noticed that you've dropped your "Will write code for food" sig.
    > Congratulations, and good luck!


    Thanks!

    I'm actually doing C programming, something I haven't done
    professionally in quite a while. (Or, How I Learned To Stop Worrying
    And Love Undefined Behavior.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jan 22, 2013
    #4
  5. Malcolm McLean wrote:
    >
    > Any ideas for more?
    >

    I've been using those...

    ///replace non ascii printable chars with spaces
    void utlAsciiSanitize(char *p);
    /**
    *\brief concat+realloc
    *
    *Will fail if called with str_p, size_p or str2 equal NULL or if
    *allocation fails.
    *
    *\param str_p points to a char pointer pointing to a c string.
    *May point to a char * holding NULL when calling, in which case
    *memory will be allocated to hold the string to append.
    *space may be reallocated for the string.
    *\param size_p points to an unsigned holding allocated size.
    *May be modified.
    *\param str2 string to append. Must be NUL terminated.
    *Must not be NULL.
    *\returns 1 on error, 0 on success.
    */
    int utlSmartCat(char **str_p, unsigned *size_p, char *str2);
    ///concat+realloc unterminated second string. _Always_ uses size2 bytes from str2.
    int utlSmartCatUt(char **str_p, unsigned *size_p, char *str2, unsigned size2);
    ///realloc to strlen+1 (not actually necessary)
    int utlSmartCatDone(char **str_p);

    ///smart cat for general arrays
    int utlSmartMemmove(uint8_t **str_p, unsigned *size_p,
    unsigned *alloc_p, uint8_t *str2, unsigned size2);

    extern char * UTL_UNDEF;
    /**
    *\brief string lookup
    *
    *look up string at idx
    *\param idx index to look up. Must be some integer type.
    *\param tbl _has_ to be of type char *tbl[]
    *\return a char * from tbl or UTL_UNDEF if out of bounds.
    *UTL_UNDEF is a printable char *.
    */
    #define utlGetStr(tbl,idx) \
    (((idx)>=(sizeof(tbl)/sizeof(char*)))?UTL_UNDEF:(tbl)[(idx)])
    Johann Klammer, Jan 22, 2013
    #5
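The post above gives only declarations. As a rough sketch of how a concat+realloc helper with that documented contract might be written (returns 1 on error, 0 on success, accepts a NULL `*str_p`), here is one possibility. The name `smart_cat`, the initial size of 16 and the doubling growth policy are all assumptions and are not taken from the poster's library.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Append str2 to *str_p, growing the buffer with realloc as needed.
 * *size_p tracks the allocated size. Returns 1 on error, 0 on success.
 * Hypothetical sketch: the growth policy (start at 16, then double)
 * is an assumption. */
int smart_cat(char **str_p, unsigned *size_p, const char *str2)
{
    size_t oldlen, addlen, need;

    if (str_p == NULL || size_p == NULL || str2 == NULL)
        return 1;

    oldlen = (*str_p != NULL) ? strlen(*str_p) : 0;
    addlen = strlen(str2);
    need = oldlen + addlen + 1;

    if (*str_p == NULL || need > *size_p) {
        size_t newsize = (*str_p != NULL && *size_p > 0) ? *size_p : 16;
        char *tmp;

        while (newsize < need)
            newsize *= 2;
        if (newsize > UINT_MAX)
            return 1;               /* would not fit in *size_p */
        tmp = realloc(*str_p, newsize);
        if (tmp == NULL)
            return 1;               /* *str_p is still valid */
        *str_p = tmp;
        *size_p = (unsigned)newsize;
    }
    memcpy(*str_p + oldlen, str2, addlen + 1);  /* copies the nul too */
    return 0;
}
```

A caller would start with a NULL pointer and a size of 0, append repeatedly, and free the string once at the end.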
  6. On Tue, 2013-01-22, Malcolm McLean wrote:
    > I'm putting together a little library of supplementary string functions,
    > strsup.c.
    >
    > The functions are intended to be fairly short, and to operate on char *s.
    > They should ideally be implementable as a single function which can be snipped
    > and pasted.
    >
    > Unlike the string.h functions they can depend on malloc().
    >
    > The obvious first function is strdup().
    >
    > I've also got
    >
    > strcount - count the instances of character ch in str.
    > trim - remove leading and trailing whitespace from a string


    trim_right() might often be what people want, and it has nicer
    semantics. Provide that one too.

    > singlespace - replace all runs of whitespace in a string with a single
    > space character.
    > split - return an array of fields, split on a delimiter character.


    There are a number of variations on that theme, like:
    - split at most in N places
    - split on whitespace (one or more SP, TAB etc)
    - split and trim whitespace
    - ...

    I usually find that the simplest kind of split() leaves too much work
    for me to do, so I've implemented some of those. On the other hand,
    I also use Perl and that may influence my perceived needs ...

    > Any ideas for more?


    - an is_empty(), so I don't have to see a myriad variations like
    if(!s[0]); if(!*s); if(s[0]==0); if(s[0]=='\0'); if(!strlen(s)) ...

    - an equality test so I don't have to see all the variations on
    strcmp() ...

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Jan 23, 2013
    #6
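The three small helpers suggested in the post above could be sketched as follows. The names `sup_is_empty`, `sup_equal` and `sup_trim_end` are hypothetical, and none of the functions checks for a NULL argument; that is a design assumption, not something the thread settles.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if s is the empty string. s must not be NULL. */
bool sup_is_empty(const char *s)
{
    return s[0] == '\0';
}

/* True if a and b hold the same characters. */
bool sup_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Remove trailing whitespace in place; returns s.
 * Only the terminator moves, so the result still starts at s. */
char *sup_trim_end(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';
    return s;
}
```

Trimming only the end never has to move characters, which is presumably what gives it the "nicer semantics": the returned pointer is always the one passed in.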
  7. On 22.01.2013 19:14, Malcolm McLean wrote:
    > I'm putting together a little library of supplementary string functions,
    > strsup.c.
    >
    > The functions are intended to be fairly short, and to operate on char *s.
    > They should ideally be implementable as a single function which can be snipped
    > and pasted.
    >
    > Unlike the string.h functions they can depend on malloc().
    >
    > The obvious first function is strdup().
    >
    > I've also got
    >
    > strcount - count the instances of character ch in str.
    > trim - remove leading and trailing whitespace from a string
    > singlespace - replace all runs of whitespace in a string with a single
    > space character.
    > split - return an array of fields, split on a delimiter character.
    > compnumeric - compare two strings with embedded numbers.
    > replace - replace all substrings with new passed string.
    >
    > Any ideas for more?


    You may find some ideas here ;-)

    http://libclc.cvs.sourceforge.net/viewvc/libclc/libclc/src/clc_string.h?view=markup
    Bjorn Augestad, Jan 23, 2013
    #7
  8. Jorgen Grahn <> writes:

    > On Tue, 2013-01-22, Malcolm McLean wrote:

    <snip>
    >> trim - remove leading and trailing whitespace from a string

    >
    > trim_right() might often be what people want, and it has nicer
    > semantics. Provide that one too.


    A small point: when considering strings that are likely to contain text
    the meaning of left and right is ambiguous. Even if this library is
    currently all about ASCII strings (I don't recall) consistency with
    future names might be worth considering. Terms like start and end (as
    well as leading and trailing) are neutral in this sense.

    <snip>
    --
    Ben.
    Ben Bacarisse, Jan 23, 2013
    #8
  9. On Thursday, January 24, 2013 7:44:16 AM UTC, Dr Nick wrote:
    > Malcolm McLean <> writes:
    >
    >
    > Unless you change your mind and create (yet another, we've all done it)
    > structure based string type, here's a suggestion: have a very clear
    > convention in the naming as to which of your functions return the
    > original string, perhaps modified (convert case, trim the
    > right-hand-end) and which return a new string. In fact, think hard
    > about this all over (should replace always create a new string, create a
    > new string when lengthening but not shortening, or realloc when
    > necessary?).
    >

    The functions will all take nul-terminated strings passed as char *s or
    const char *s. The idea is to write functions that should have been in
    string.h but weren't, either because of perceived need, or because string.h
    isn't allowed to depend on malloc.

    I think it's probably best to return a malloced char * where the return may be larger than the input. Either way is a burden on the caller: a malloced result has to be tested for null and freed, while buffer sizes have to be calculated. But the pass-buffer-and-length method has points in its favour also. Resolving that issue was one reason for starting this thread.
    Malcolm McLean, Jan 24, 2013
    #9
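As one illustration of the "return a malloced char *" option for a function whose output can grow, a replace function might look like the sketch below. The name `sup_replace`, the two-pass approach and the choice to return NULL for an empty search string are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'ed copy of str with every occurrence of
 * from replaced by to. Returns NULL if allocation fails or if from
 * is the empty string. The caller frees the result. */
char *sup_replace(const char *str, const char *from, const char *to)
{
    size_t fromlen = strlen(from);
    size_t tolen = strlen(to);
    size_t count = 0;
    const char *p;
    char *out, *dst;

    if (fromlen == 0)
        return NULL;

    /* First pass: count matches so the output size is known. */
    for (p = str; (p = strstr(p, from)) != NULL; p += fromlen)
        count++;

    out = malloc(strlen(str) - count * fromlen + count * tolen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    dst = out;
    while ((p = strstr(str, from)) != NULL) {
        size_t gap = (size_t)(p - str);

        memcpy(dst, str, gap);
        dst += gap;
        memcpy(dst, to, tolen);
        dst += tolen;
        str = p + fromlen;
    }
    strcpy(dst, str);   /* remaining tail, including the nul */
    return out;
}
```

The caller pays for this convenience by testing the result for NULL and calling free() on it, which is exactly the burden described in the post.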
  10. Malcolm McLean <> wrote:
    > I'm putting together a little library of supplementary string functions,
    > strsup.c.
    >
    > The functions are intended to be fairly short, and to operate on char *s.

    <snip>
    >
    > Any ideas for more?
    >


    maybe one which would change 2 or more spaces to 1 space in
    a string.

    John
    John McCue, Jan 24, 2013
    #10
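This request matches the singlespace function from the original list. A minimal in-place sketch, assuming the hypothetical name `sup_singlespace` and assuming that "whitespace" means whatever isspace() accepts in the current locale:

```c
#include <ctype.h>

/* Collapse each run of whitespace in s to a single space, in place.
 * Returns s. The result is never longer than the input, so no
 * allocation is needed. */
char *sup_singlespace(char *s)
{
    char *src = s, *dst = s;

    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            *dst++ = ' ';
            while (isspace((unsigned char)*src))
                src++;          /* skip the rest of the run */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return s;
}
```

Leading and trailing runs are reduced to one space rather than removed; combining this with a trim function would be a separate step.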
  11. Malcolm McLean <> wrote:
    > I'm putting together a little library of supplementary string functions,
    > strsup.c.
    >
    > The functions are intended to be fairly short, and to operate on char *s.
    > They should ideally be implementable as a single function which can be
    > snipped and pasted.
    >
    > Unlike the string.h functions they can depend on malloc().
    >
    > The obvious first function is strdup().
    >
    > I've also got
    >
    > strcount - count the instances of character ch in str.
    > trim - remove leading and trailing whitespace from a string
    > singlespace - replace all runs of whitespace in a string with a single
    > space character.
    > split - return an array of fields, split on a delimiter character.
    > compnumeric - compare two strings with embedded numbers.
    > replace - replace all substrings with new passed string.
    >
    > Any ideas for more?


    Some might be implemented as macros, especially the snipped-and-pasted
    variety. Several snipped-and-pasted examples from my programs are below.
    Some do little more than arg-checking, e.g., to avoid segfaulting if
    used with a NULL ptr.
    Want about ten zillion other ideas (some arguably a bit over-the-top)?
    Look at the VMS documentation about lexical string functions in DCL
    (DEC's shell DCL="digital command language"),
    openvms.compaq.com/doc/73final/6489/6489pro_047.html#66_manipulatingstrings

    /* ---
    * macro to skip whitespace
    * ------------------------ */
    #define WHITESPACE " \t\n\r\f\v" /* skipped whitespace chars */
    #define skipwhite(thisstr) if ( (thisstr) != NULL ) \
    thisstr += strspn(thisstr,WHITESPACE)
    /* ---
    * macros to check if a string is empty
    * ------------------------------------ */
    #define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))
    /* ---
    * macro to strip leading and trailing whitespace
    * ---------------------------------------------- */
    #define trimwhite(thisstr) if ( (thisstr) != NULL ) { \
    int thislen = strlen(thisstr); \
    while ( --thislen >= 0 ) \
    if ( isthischar((thisstr)[thislen],WHITESPACE) ) \
    (thisstr)[thislen] = '\000'; \
    else break; \
    if ( (thislen = strspn((thisstr),WHITESPACE)) > 0 ) \
    {strsqueeze((thisstr),thislen);} } else /*user adds ;*/
    /* ---
    * macro to remove all 'c' chars from s
    * ------------------------------------ */
    #define compress(s,c) if(!isempty(s)) /* remove embedded c's from s */ \
    { char *p; while((p=strchr((s),(c)))!=NULL) {strsqueeze(p,1);} } else
    /* ---
    * macro to strcpy(s,s+n) using memmove() (also works for negative n)
    * ------------------------------------------------------------------ */
    #define strsqueeze(s,n) if((n)!=0) { if(!isempty((s))) { \
    int thislen3=strlen(s); \
    if ((n) >= thislen3) *(s) = '\000'; \
    else memmove((s),((s)+(n)),(1+thislen3-(n))); }} else /*user adds ;*/
    /* ---
    * macro to strncpy() n bytes and make sure it's null-terminated
    * ------------------------------------------------------------- */
    #define strninit(target,source,n) \
    if( (target)!=NULL && (n)>=0 ) { \
    char *thissource = (source); \
    (target)[0] = '\000'; \
    if ( (n)>0 && thissource!=NULL ) { \
    strncpy((target),thissource,(n)); \
    (target)[(n)] = '\000'; } }
    /* ---
    * macro to check for thischar inthisstr
    * ------------------------------------- */
    #define isthischar(thischar,inthisstr) \
    ( (thischar)!='\000' && *(inthisstr)!='\000' \
    && strchr(inthisstr,(thischar))!=(char *)NULL )
    /* ---
    * macro for last char of a string
    * ------------------------------- */
    #define lastchar(s) (isempty(s)?'\000':*((s)+(strlen(s)-1)))

    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Jan 25, 2013
    #11
  12. "Malcolm McLean" <> wrote in message
    news:...
    > On Thursday, January 24, 2013 7:44:16 AM UTC, Dr Nick wrote:


    >> Unless you change your mind and create (yet another, we've all done it)
    >> structure based string type, here's a suggestion: have a very clear
    >> convention in the naming as to which of your functions return the
    >> original string, perhaps modified (convert case, trim the
    >> right-hand-end) and which return a new string. In fact, think hard
    >> about this all over (should replace always create a new string, create a
    >> new string when lengthening but not shortening, or realloc when
    >> necessary?).
    >>

    > The function will all take nul-terminated strings passed as char *s or
    > const char *s. The idea is to write functions that should have been in
    > string.h but weren't, either because of perceived need, or because
    > string.h
    > isn't allowed to depend on malloc.
    >
    > I think it's probably best to return a malloced char * where the return
    > may be larger than the input.


    How will the caller know that? And in the case of a trim function, the
    result will never be bigger than the input, but the output may need to be
    stored elsewhere because of the need to zero-terminate a sub-string of the
    input.

    As was mentioned, there are just too many ways of dealing with such
    functions: results can be in-place, in a caller-supplied destination, or in
    allocated memory.

    Sometimes also, the caller has useful length information for a string, but
    the standard library often has no way to impart that information to
    the function; that would be handy to be able to do.

    For ideas about new string functions, I mainly use the following:

    o Convert a string to upper/lower case (some Cs will have this already). I
    also allow just the first N characters to be modified.

    o Return the leftmost or rightmost N characters. When N is negative, then
    all *except* the abs(N) rightmost or leftmost characters are returned. When
    N is longer than the string, then it's padded with spaces, or a
    caller-supplied fill character (or string). (Some of this can be done, with
    a bit more trouble, with sprintf().)

    o Split or join strings as I think you've already mentioned.

    o Several functions to deal with filespecs: extract a path, filename,
    basefile, or extension; or to change or add an extension to a filespec.

    --
    Bartc
    BartC, Jan 25, 2013
    #12
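The "leftmost N characters" function described in the post above could be sketched like this. The name `sup_left`, the choice of `long` for N, a single fill character rather than a fill string, and returning malloc'ed memory are all assumptions for the purpose of illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed string holding the leftmost n characters of s.
 * If n is negative, return all but the last -n characters instead.
 * If n exceeds the length of s, pad on the right with fill.
 * Returns NULL if allocation fails; the caller frees the result. */
char *sup_left(const char *s, long n, char fill)
{
    size_t len = strlen(s);
    size_t want, copy;
    char *out;

    if (n < 0) {
        size_t drop = (size_t)(-(n + 1)) + 1;   /* |n| without overflow */

        want = drop < len ? len - drop : 0;
    } else {
        want = (size_t)n;
    }

    out = malloc(want + 1);
    if (out == NULL)
        return NULL;

    copy = want < len ? want : len;
    memcpy(out, s, copy);
    memset(out + copy, fill, want - copy);  /* no-op when nothing to pad */
    out[want] = '\0';
    return out;
}
```

A matching rightmost-N function would mirror this, copying from the end of the string and padding on the left.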
  13. On Friday, January 25, 2013 at 6:57:27 PM UTC+8, JohnF wrote:
    > Malcolm McLean <> wrote:
    > > I'm putting together a little library of supplementary string functions,
    > > strsup.c.
    > >
    > > The functions are intended to be fairly short, and to operate on char *s.
    > > They should ideally be implementable as a single function which can be
    > > snipped and pasted.
    > >
    > > Unlike the string.h functions they can depend on malloc().
    > >
    > > The obvious first function is strdup().
    > >
    > > I've also got
    > >
    > > strcount - count the instances of character ch in str.
    > > trim - remove leading and trailing whitespace from a string
    > > singlespace - replace all runs of whitespace in a string with a single
    > > space character.
    > > split - return an array of fields, split on a delimiter character.
    > > compnumeric - compare two strings with embedded numbers.
    > > replace - replace all substrings with new passed string.
    > >
    > > Any ideas for more?
    >
    > Some might be implemented as macros, especially the snipped-and-pasted
    > variety. Several snipped-and-pasted examples from my programs are below.
    > Some do little more than arg-checking, e.g., to avoid segfaulting if
    > used with a NULL ptr.
    > Want about ten zillion other ideas (some arguably a bit over-the-top)?
    > Look at the VMS documentation about lexical string functions in DCL
    > (DEC's shell DCL="digital command language"),
    > openvms.compaq.com/doc/73final/6489/6489pro_047.html#66_manipulatingstrings
    >
    > /* ---
    > * macro to skip whitespace
    > * ------------------------ */
    > #define WHITESPACE " \t\n\r\f\v" /* skipped whitespace chars */
    > #define skipwhite(thisstr) if ( (thisstr) != NULL ) \
    > thisstr += strspn(thisstr,WHITESPACE)
    > /* ---
    > * macros to check if a string is empty
    > * ------------------------------------ */
    > #define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))
    > /* ---

    I have to make one subtle point about
    the type char *str=NULL in C.

    str=NULL; // not even allocated

    char str2[10]; // 10 bytes allocated
    str2[0]='\0' ; // string length is zero

    // But I worked out my own string library before,
    // using my own format to deal with the BIG5 encoding.

    > * macro to strip leading and trailing whitespace
    > * ---------------------------------------------- */
    > #define trimwhite(thisstr) if ( (thisstr) != NULL ) { \
    > int thislen = strlen(thisstr); \
    > while ( --thislen >= 0 ) \
    > if ( isthischar((thisstr)[thislen],WHITESPACE) ) \
    > (thisstr)[thislen] = '\000'; \
    > else break; \
    > if ( (thislen = strspn((thisstr),WHITESPACE)) > 0 ) \
    > {strsqueeze((thisstr),thislen);} } else /*user adds ;*/
    > /* ---
    > * macro to remove all 'c' chars from s
    > * ------------------------------------ */
    > #define compress(s,c) if(!isempty(s)) /* remove embedded c's from s */ \
    > { char *p; while((p=strchr((s),(c)))!=NULL) {strsqueeze(p,1);} } else
    > /* ---
    > * macro to strcpy(s,s+n) using memmove() (also works for negative n)
    > * ------------------------------------------------------------------ */
    > #define strsqueeze(s,n) if((n)!=0) { if(!isempty((s))) { \
    > int thislen3=strlen(s); \
    > if ((n) >= thislen3) *(s) = '\000'; \
    > else memmove((s),((s)+(n)),(1+thislen3-(n))); }} else /*user adds;*/
    > /* ---
    > * macro to strncpy() n bytes and make sure it's null-terminated
    > * ------------------------------------------------------------- */
    > #define strninit(target,source,n) \
    > if( (target)!=NULL && (n)>=0 ) { \
    > char *thissource = (source); \
    > (target)[0] = '\000'; \
    > if ( (n)>0 && thissource!=NULL ) { \
    > strncpy((target),thissource,(n)); \
    > (target)[(n)] = '\000'; } }
    > /* ---
    > * macro to check for thischar inthisstr
    > * ------------------------------------- */
    > #define isthischar(thischar,inthisstr) \
    > ( (thischar)!='\000' && *(inthisstr)!='\000' \
    > && strchr(inthisstr,(thischar))!=(char *)NULL )
    > /* ---
    > * macro for last char of a string
    > * ------------------------------- */
    > #define lastchar(s) (isempty(s)?'\000':*((s)+(strlen(s)-1)))
    >
    > --
    > John Forkosh ( mailto: where j=john and f=forkosh )
    88888 Dihedral, Jan 25, 2013
    #13
  14. On Friday, January 25, 2013 6:27:16 PM UTC, Bart wrote:
    > "Malcolm McLean" <> wrote in message
    >
    >
    > As was mentioned, there are just too many ways of dealing with such
    > functions: results can be in-place, in a caller-supplied destination, or in
    > allocated memory.
    >

    That's one big issue.

    trim() could be reasonably an in-place trim function, one that took a const
    char * and a buffer (with length supplied or not supplied), or return an
    allocated pointer.
    I'd say the first option is best because of the way the function is likely
    to be used. No-one is going to want to pass it a string literal, and only
    rarely will you want both a trimmed string and the original retained.
    But with replace() you can't easily calculate the output size before calling
    it, and it can't be in-place as it's as likely to expand as to shrink the
    buffer.

    --
    Visit Malcolm's website
    http://www.malcolmmclean.site11.com
    Malcolm McLean, Jan 28, 2013
    #14
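The in-place option for trim(), which the post above argues is the best fit, might be sketched as follows. The name `sup_trim` is hypothetical, and the function assumes it is given a modifiable buffer, never a string literal.

```c
#include <ctype.h>
#include <string.h>

/* Remove leading and trailing whitespace from s in place. Returns s.
 * s must point to writable storage. */
char *sup_trim(char *s)
{
    char *start = s;
    size_t len;

    while (isspace((unsigned char)*start))
        start++;

    len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;

    memmove(s, start, len);   /* regions may overlap, so not memcpy */
    s[len] = '\0';
    return s;
}
```

Because the trimmed text is moved to the front of the buffer, the caller can keep using the original pointer and, if the buffer came from malloc(), can still pass it to free().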
  15. On Tuesday, January 22, 2013 1:14:18 PM UTC-5, Malcolm McLean wrote:
    > I'm putting together a little library of supplementary string functions,
    > strsup.c.
    >
    > The functions are intended to be fairly short, and to operate on char *s.
    > They should ideally be implementable as a single function which can be
    > snipped and pasted.
    >
    > Unlike the string.h functions they can depend on malloc().
    >
    > The obvious first function is strdup().
    >
    > I've also got
    >
    > strcount - count the instances of character ch in str.
    > trim - remove leading and trailing whitespace from a string
    > singlespace - replace all runs of whitespace in a string with a single
    > space character.
    > split - return an array of fields, split on a delimiter character.
    > compnumeric - compare two strings with embedded numbers.
    > replace - replace all substrings with new passed string.
    >
    > Any ideas for more?


    Besides the obvious candidates, I've also found useful on a number of occasions a function to strip unwanted characters (such as commas or dollar signs in numeric values) from a string before further processing. My own version is simply called strip(), but for a library you would probably want to rename it:

    /*
    * remove unwanted characters from string
    *
    * call as strip(char *str, *unwanted)
    *
    * where "str" is the string to process and "unwanted" is a null-
    * terminated string containing the characters to be stripped.
    *
    * returns pointer to modified string
    */

    char *strip(char *str, char *unwanted) {
    char *cp, *savptr;
    savptr = str;
    cp = str - 1;
    while (*++cp = *str++)
    if (strchr(unwanted, *cp) != NULL)
    --cp;
    return savptr;
    {
    , Jan 29, 2013
    #15
  16. On Tuesday, January 29, 2013 4:17:22 PM UTC, wrote:
    > On Tuesday, January 22, 2013 1:14:18 PM UTC-5, Malcolm McLean wrote:
    >
    > My own version is simply called strip(), but for a library you would probably want to rename it:
    >

    It's got the str prefix which indicates a standard library string function.
    Malcolm McLean, Jan 29, 2013
    #16
  17. writes:
    <snip>
    > Besides the obvious candidates, I've also found useful on a number of
    > occasions a function to strip unwanted characters (such as commas or
    > dollar signs in numeric values) from a string before further
    > processing. My own version is simply called strip(), but for a
    > library you would probably want to rename it:
    >
    > /*
    > * remove unwanted characters from string
    > *
    > * call as strip(char *str, *unwanted)
    > *
    > * where "str" is the string to process and "unwanted" is a null-
    > * terminated string containing the characters to be stripped.
    > *
    > * returns pointer to modified string
    > */
    >
    > char *strip(char *str, char *unwanted) {
    > char *cp, *savptr;
    > savptr = str;
    > cp = str - 1;


    That's, technically, problematic. If str points to the start of an
    array, the standard does not permit you to form the pointer str - 1,
    even if you do nothing with it!

    > while (*++cp = *str++)
    > if (strchr(unwanted, *cp) != NULL)
    > --cp;


    I think the fix is simpler than the original:

    cp = str;
    while (*cp = *str++)
    if (strchr(unwanted, *cp) == NULL)
    cp++;

    In such cases I tend to write:

    while (*cp = *str++)
    cp += strchr(unwanted, *cp) == NULL;

    but similar things have caused me to be accused of all sorts of barbarism,
    so I won't suggest you do likewise!

    > return savptr;
    > {


    } I think.


    --
    Ben.
    Ben Bacarisse, Jan 30, 2013
    #17
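Putting the corrected loop from the post above into a complete function gives something like the sketch below. The name `strip_chars` and the `const` qualifier on `unwanted` are additions made for this example; the loop body is the first of the two fixes shown in the post.

```c
#include <string.h>

/* Remove every character found in unwanted from str, in place.
 * Returns str. Both pointers start at str, so no pointer before
 * the start of the array is ever formed. */
char *strip_chars(char *str, const char *unwanted)
{
    char *dst = str;
    const char *src = str;

    while ((*dst = *src++) != '\0')
        if (strchr(unwanted, *dst) == NULL)
            dst++;          /* keep this character */
    return str;
}
```

The loop ends as soon as the terminator has been copied, so strchr() is never asked to look for '\0' in `unwanted`.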
  18. Ben Bacarisse <> writes:

    > writes:
    > <snip>
    >> Besides the obvious candidates, I've also found useful on a number of
    >> occasions a function to strip unwanted characters (such as commas or
    >> dollar signs in numeric values) from a string before further
    >> processing. My own version is simply called strip(), but for a
    >> library you would probably want to rename it:
    >>
    >> /*
    >> * remove unwanted characters from string
    >> *
    >> * call as strip(char *str, *unwanted)
    >> *
    >> * where "str" is the string to process and "unwanted" is a null-
    >> * terminated string containing the characters to be stripped.
    >> *
    >> * returns pointer to modified string
    >> */
    >>
    >> char *strip(char *str, char *unwanted) {
    >> char *cp, *savptr;
    >> savptr = str;
    >> cp = str - 1;

    >
    > That's, technically, problematic. If str points to the start of
    > an array, the standard does not permit you to form the pointer
    > str - 1, even if you do nothing with it!
    >
    >> while (*++cp = *str++)
    >> if (strchr(unwanted, *cp) != NULL)
    >> --cp;

    >
    > I think the fix is simpler than the original:
    >
    > cp = str;
    > while (*cp = *str++)
    > if (strchr(unwanted, *cp) == NULL)
    > cp++;
    >
    > In such cases I tend to write:
    >
    > while (*cp = *str++)
    > cp += strchr(unwanted, *cp) == NULL;
    >
    > but similar things have caused me to be accused of all sorts of
    > barbarism, so I won't suggest you do likewise!


    I was inspired by your examples to look for a short and simple
    implementation. I came up with this:

    char *
    eliminate( char *to_shrink, const char *unwanted ){
    char *p = to_shrink, *q = p;
    do q += strspn( q, unwanted ); while( *p++ = *q++ );
    return to_shrink;
    }

    I think it's easy to see that all 'unwanted' bytes are skipped
    and only values not in 'unwanted' are copied.

    And now let the barbarism accusers say what they will!
    Tim Rentsch, Jan 31, 2013
    #18
  19. Tim Rentsch <> writes:

    > Ben Bacarisse <> writes:
    >
    >> writes:
    >> <snip>
    >>> Besides the obvious candidates, I've also found useful on a number of
    >>> occasions a function to strip unwanted characters (such as commas or
    >>> dollar signs in numeric values) from a string before further
    >>> processing. My own version is simply called strip(), but for a
    >>> library you would probably want to rename it:
    >>>
    >>> /*
    >>> * remove unwanted characters from string
    >>> *
    >>> * call as strip(char *str, *unwanted)
    >>> *
    >>> * where "str" is the string to process and "unwanted" is a null-
    >>> * terminated string containing the characters to be stripped.
    >>> *
    >>> * returns pointer to modified string
    >>> */
    >>>
    >>> char *strip(char *str, char *unwanted) {
    >>> char *cp, *savptr;
    >>> savptr = str;
    >>> cp = str - 1;

    >>
    >> That's, technically, problematic. If str points to the start of
    >> an array, the standard does not permit you to form the pointer
    >> str - 1, even if you do nothing with it!
    >>
    >>> while (*++cp = *str++)
    >>> if (strchr(unwanted, *cp) != NULL)
    >>> --cp;

    >>
    >> I think the fix is simpler than the original:
    >>
    >> cp = str;
    >> while (*cp = *str++)
    >> if (strchr(unwanted, *cp) == NULL)
    >> cp++;
    >>
    >> In such cases I tend to write:
    >>
    >> while (*cp = *str++)
    >> cp += strchr(unwanted, *cp) == NULL;
    >>
    >> but similar things have caused me to be accused of all sorts of
    >> barbarism, so I won't suggest you do likewise!

    >
    > I was inspired by your examples to look for a short and simple
    > implementation. I came up with this:


    Well, I was suggesting a simple fix rather than a simple alternative.

    > char *
    > eliminate( char *to_shrink, const char *unwanted ){
    > char *p = to_shrink, *q = p;
    > do q += strspn( q, unwanted ); while( *p++ = *q++ );
    > return to_shrink;
    > }
    >
    > I think it's easy to see that all 'unwanted' bytes are skipped
    > and only values not in 'unwanted' are copied.
    >
    > And now let the barbarism accusers say what they will!


    That's nice (except for the layout!) and I don't think there is any
    barbarism involved (where could it be?).

    --
    Ben.
    Ben Bacarisse, Feb 2, 2013
    #19
  20. Ben Bacarisse <> writes:

    > Tim Rentsch <> writes:
    >
    >> Ben Bacarisse <> writes:
    >>>> [.. discussing a function to remove unwanted characters from
    >>>> a string ..]
    >>>
    >>> I think the fix is simpler than the original:
    >>> cp = str;
    >>> while (*cp = *str++)
    >>> if (strchr(unwanted, *cp) == NULL)
    >>> cp++;
    >>>
    >>> In such cases I tend to write:
    >>>
    >>> while (*cp = *str++)
    >>> cp += strchr(unwanted, *cp) == NULL;
    >>>
    >>> but similar things have caused me to be accused of all sorts of
    >>> barbarism, so I won't suggest you do likewise!

    >>
    >> I was inspired by your examples to look for a short and simple
    >> implementation. I came up with this:

    >
    > Well, I was suggesting a simple fix rather than a simple
    > alternative.


    Right. I didn't mean to imply anything different.

    >> char *
    >> eliminate( char *to_shrink, const char *unwanted ){
    >> char *p = to_shrink, *q = p;
    >> do q += strspn( q, unwanted ); while( *p++ = *q++ );
    >> return to_shrink;
    >> }
    >>
    >> [snip]

    >
    > That's nice (except for the layout!) [snip]


    Those who find the single-line do/while unattractive might
    prefer this instead:

    while( q += strspn( q, unwanted ), *p++ = *q++ ) {}
    Tim Rentsch, Feb 3, 2013
    #20