strsup - supplementary string functions

Malcolm McLean · Jan 22, 2013

I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?
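
To make two of the listed functions concrete, here is a minimal sketch of a character counter and an in-place single-spacer. The names sup_count and sup_singlespace are placeholders chosen for this example, not the names used in strsup.c, and treating "whitespace" as whatever isspace() accepts is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the occurrences of ch in str (hypothetical name). */
size_t sup_count(const char *str, int ch)
{
    size_t n = 0;

    for (; *str != '\0'; str++)
        if ((unsigned char)*str == (unsigned char)ch)
            n++;
    return n;
}

/* Collapse every run of whitespace in str to one space, in place.
   The result is never longer than the input, so no allocation is needed.
   Returns str. */
char *sup_singlespace(char *str)
{
    const char *in = str;   /* read position */
    char *out = str;        /* write position */

    while (*in != '\0') {
        if (isspace((unsigned char)*in)) {
            *out++ = ' ';
            while (isspace((unsigned char)*in))
                in++;       /* skip the rest of the run */
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return str;
}
```

Both functions fit the "single function which can be snipped and pasted" goal; neither needs malloc() because the output cannot grow.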

Keith Thompson · Jan 22, 2013

Malcolm McLean said:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Just one comment: the name "strcount" is reserved. And the other names
could easily conflict with existing identifiers in user code.

You might consider inventing a consistent naming scheme for your
functions.

If the purpose is to copy-and-paste the definitions, the user can change
the name easily enough, but if they're going to be part of a library
(which would make sense), then naming becomes more important.

James Kuyper · Jan 22, 2013

I just noticed that you've dropped your "Will write code for food" sig.
Congratulations, and good luck!

Keith Thompson · Jan 22, 2013

James Kuyper said:
I just noticed that you've dropped your "Will write code for food" sig.
Congratulations, and good luck!

Thanks!

I'm actually doing C programming, something I haven't done
professionally in quite a while. (Or, How I Learned To Stop Worrying
And Love Undefined Behavior.)

Johann Klammer · Jan 22, 2013

Malcolm said:
Any ideas for more?

I've been using those...

///replace non ascii printable chars with spaces
void utlAsciiSanitize(char *p);
/**
*\brief concat+realloc
*
*Will fail if called with str_p, size_p or str2 equal NULL or if
*allocation fails.
*
*\param str_p points to a char pointer pointing to a c string.
*May point to a char * holding NULL when calling, in which case
*memory will be allocated to hold the string to append.
*space may be reallocated for the string.
*\param size_p points to an unsigned holding allocated size.
*May be modified.
*\param str2 string to append. Must be NUL terminated.
*Must not be NULL.
*\returns 1 on error, 0 on success.
*/
int utlSmartCat(char **str_p, unsigned *size_p, char *str2);
///concat+realloc unterminated second string. _Always_ uses size2 bytes from str2.
int utlSmartCatUt(char **str_p, unsigned *size_p, char *str2, unsigned size2);
///realloc to strlen+1 (not actually necessary)
int utlSmartCatDone(char **str_p);

///smart cat for general arrays
int utlSmartMemmove(uint8_t **str_p, unsigned *size_p,
unsigned *alloc_p, uint8_t *str2, unsigned size2);

extern char * UTL_UNDEF;
/**
*\brief string lookup
*
*look up string at idx
*\param idx index to look up. Must be some integer type.
*\param tbl _has_ to be of type char *tbl[]
*\return a char * from tbl or UTL_UNDEF if out of bounds.
*UTL_UNDEF is a printable char *.
*/
#define utlGetStr(tbl,idx) \
(((idx)>=(sizeof(tbl)/sizeof(char*)))?UTL_UNDEF:tbl[(idx)])
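
Only the declarations are posted above, so the following is a sketch of how a concat+realloc function with the documented contract (1 on error, 0 on success, *str_p may hold NULL on entry) might look. The name smart_cat, the const on str2, and the doubling growth policy are assumptions made for this example; it is not the utl implementation.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Append str2 to the heap string *str_p, growing the buffer if needed.
   *size_p tracks the allocated size. str2 must not point into *str_p.
   Returns 1 on error, 0 on success; on error *str_p is left untouched. */
int smart_cat(char **str_p, unsigned *size_p, const char *str2)
{
    size_t have, old_len, add_len, need, new_size;
    char *buf;

    if (str_p == NULL || size_p == NULL || str2 == NULL)
        return 1;

    have    = (*str_p != NULL) ? *size_p : 0;
    old_len = (*str_p != NULL) ? strlen(*str_p) : 0;
    add_len = strlen(str2);
    need    = old_len + add_len + 1;        /* +1 for the terminator */
    if (need > UINT_MAX)
        return 1;                           /* would not fit in *size_p */

    if (need > have) {
        new_size = have * 2;                /* assumed growth policy */
        if (new_size < need || new_size > UINT_MAX)
            new_size = need;
        buf = realloc(*str_p, new_size);
        if (buf == NULL)
            return 1;
        *str_p = buf;
        *size_p = (unsigned)new_size;
    }
    memcpy(*str_p + old_len, str2, add_len + 1);
    return 0;
}
```

A caller would start with a NULL pointer and a size of 0, call the function repeatedly, and free() the string when done.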

Jorgen Grahn · Jan 23, 2013

I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string

trim_right() might often be what people want, and it has nicer
semantics. Provide that one too.

singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.

There are a number of variations on that theme, like:
- split at most in N places
- split on whitespace (one or more SP, TAB etc)
- split and trim whitespace
- ...

I usually find that the simplest kind of split() leaves too much work
for me to do, so I've implemented some of those. On the other hand,
I also use Perl and that may influence my perceived needs ...

Any ideas for more?

- an is_empty(), so I don't have to see a myriad variations like
if(!s[0]); if(!*s); if(s[0]==0); if(s[0]=='\0'); if(!strlen(s)) ...

- an equality test so I don't have to see all the variations on
strcmp() ...

/Jorgen
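
The two helpers suggested above are one-liners, which is rather the point: they give the test a name. A minimal sketch, with the names str_is_empty and str_equal chosen for this example, and assuming callers never pass NULL:

```c
#include <string.h>

/* Nonzero if s is the empty string. s must not be NULL. */
int str_is_empty(const char *s)
{
    return s[0] == '\0';
}

/* Nonzero if a and b hold the same characters. Neither may be NULL. */
int str_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Whether a NULL pointer should count as "empty" is a separate design decision; this sketch deliberately leaves it to the caller.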

Bjorn Augestad · Jan 23, 2013

On 22.01.2013 19:14, Malcolm McLean wrote:

I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?

You may find some ideas here ;-)

http://libclc.cvs.sourceforge.net/viewvc/libclc/libclc/src/clc_string.h?view=markup

Ben Bacarisse · Jan 23, 2013

Jorgen Grahn said:
trim_right() might often be what people want, and it has nicer
semantics. Provide that one too.

A small point: when considering strings that are likely to contain text
the meaning of left and right is ambiguous. Even if this library is
currently all about ASCII strings (I don't recall) consistency with
future names might be worth considering. Terms like start and end (as
well as leading and trailing) are neutral in this sense.
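
As an illustration of the start/end naming, here is a sketch of the two halves of trim as separate in-place functions. The names str_trim_end and str_trim_start are hypothetical, and whitespace is assumed to mean whatever isspace() accepts. Trimming the end only writes a terminator; trimming the start has to shift the remaining characters, which may be what "nicer semantics" refers to.

```c
#include <ctype.h>
#include <string.h>

/* Remove trailing whitespace from s in place. Returns s. */
char *str_trim_end(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';
    return s;
}

/* Remove leading whitespace from s in place by shifting the rest down.
   Returns s. */
char *str_trim_start(char *s)
{
    const char *p = s;

    while (isspace((unsigned char)*p))
        p++;
    if (p != s)
        memmove(s, p, strlen(p) + 1);   /* +1 copies the terminator */
    return s;
}
```

A full trim is then str_trim_start(str_trim_end(s)); doing the end first means fewer bytes are moved.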

<snip>

Malcolm McLean · Jan 24, 2013

Unless you change your mind and create (yet another, we've all done it)
structure based string type, here's a suggestion: have a very clear
convention in the naming as to which of your functions return the
original string, perhaps modified (convert case, trim the
right-hand-end) and which return a new string. In fact, think hard
about this all over (should replace always create a new string, create a
new string when lengthening but not shortening, or realloc when
necessary?).

The functions will all take nul-terminated strings passed as char *s or
const char *s. The idea is to write functions that should have been in
string.h but weren't, either because of perceived need, or because string.h
isn't allowed to depend on malloc.

I think it's probably best to return a malloced char * where the return may be
larger than the input. Either way is a burden on the caller: malloc has to be
tested for null and freed, buffers have to be calculated. But the
buffer-plus-length method has points in its favour also. Resolving that issue
was one reason for starting this thread.

John McCue · Jan 24, 2013

Malcolm McLean said:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.

Any ideas for more?

maybe one which would change 2 or more spaces to 1 space in
a string.

John

JohnF · Jan 25, 2013

Malcolm McLean said:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be
snipped and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?

Some might be implemented as macros, especially the snipped-and-pasted
variety. Several snipped-and-pasted examples from my programs are below.
Some do little more than arg-checking, e.g., to avoid segfaulting if
used with a NULL ptr.
Want about ten zillion other ideas (some arguably a bit over-the-top)?
Look at the VMS documentation about lexical string functions in DCL
(DEC's shell DCL="digital command language"),
openvms.compaq.com/doc/73final/6489/6489pro_047.html#66_manipulatingstrings

/* ---
* macro to skip whitespace
* ------------------------ */
#define WHITESPACE " \t\n\r\f\v" /* skipped whitespace chars */
#define skipwhite(thisstr) if ( (thisstr) != NULL ) \
thisstr += strspn(thisstr,WHITESPACE)
/* ---
* macros to check if a string is empty
* ------------------------------------ */
#define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))
/* ---
* macro to strip leading and trailing whitespace
* ---------------------------------------------- */
#define trimwhite(thisstr) if ( (thisstr) != NULL ) { \
int thislen = strlen(thisstr); \
while ( --thislen >= 0 ) \
if ( isthischar((thisstr)[thislen],WHITESPACE) ) \
(thisstr)[thislen] = '\000'; \
else break; \
if ( (thislen = strspn((thisstr),WHITESPACE)) > 0 ) \
{strsqueeze((thisstr),thislen);} } else /*user adds ;*/
/* ---
* macro to remove all 'c' chars from s
* ------------------------------------ */
#define compress(s,c) if(!isempty(s)) /* remove embedded c's from s */ \
{ char *p; while((p=strchr((s),(c)))!=NULL) {strsqueeze(p,1);} } else
/* ---
* macro to strcpy(s,s+n) using memmove() (also works for negative n)
* ------------------------------------------------------------------ */
#define strsqueeze(s,n) if((n)!=0) { if(!isempty((s))) { \
int thislen3=strlen(s); \
if ((n) >= thislen3) *(s) = '\000'; \
else memmove((s),((s)+(n)),(1+thislen3-(n))); }} else /*user adds ;*/
/* ---
* macro to strncpy() n bytes and make sure it's null-terminated
* ------------------------------------------------------------- */
#define strninit(target,source,n) \
if( (target)!=NULL && (n)>=0 ) { \
char *thissource = (source); \
(target)[0] = '\000'; \
if ( (n)>0 && thissource!=NULL ) { \
strncpy((target),thissource,(n)); \
(target)[(n)] = '\000'; } }
/* ---
* macro to check for thischar inthisstr
* ------------------------------------- */
#define isthischar(thischar,inthisstr) \
( (thischar)!='\000' && *(inthisstr)!='\000' \
&& strchr(inthisstr,(thischar))!=(char *)NULL )
/* ---
* macro for last char of a string
* ------------------------------- */
#define lastchar(s) (isempty(s)?'\000':*((s)+(strlen(s)-1)))

BartC · Jan 25, 2013

Malcolm McLean said:
The functions will all take nul-terminated strings passed as char *s or
const char *s. The idea is to write functions that should have been in
string.h but weren't, either because of perceived need, or because
string.h isn't allowed to depend on malloc.

I think it's probably best to return a malloced char * where the return
may be larger than the input.

How will the caller know that? And in the case of a trim function, the
result will never be bigger than the input, but the output may need to be
stored elsewhere because of the need to zero-terminate a sub-string of the
input.

As was mentioned, there are just too many ways of dealing with such
functions: results can be in-place, in a caller-supplied destination, or in
allocated memory.

Sometimes also, the caller has useful length information for a string, but
the standard library often has no way to impart that information to
the function; that would be handy to be able to do.

For ideas about new string functions, I mainly use the following:

o Convert a string to upper/lower case (some Cs will have this already). I
also allow just the first N characters to be modified.

o Return the leftmost or rightmost N characters. When N is negative, then
all *except* the abs(N) rightmost or leftmost characters are returned. When
N is longer than the string, then it's padded with spaces, or a
caller-supplied fill character (or string). (Some of this can be done, with
a bit more trouble, with sprintf().)

o Split or join strings as I think you've already mentioned.

o Several functions to deal with filespecs: extract a path, filename,
basefile, or extension; or to change or add an extension to a filespec.
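
The "leftmost N characters" item above reads almost like a specification, so here is a sketch of it using the caller-supplied-destination style. The name str_left, the parameter order, and the 0 / -1 return convention are all assumptions made for this example. A negative n keeps everything except the abs(n) rightmost characters; an n longer than the source pads with fill.

```c
#include <stddef.h>
#include <string.h>

/* Write the leftmost n characters of src into dst (capacity dstsize).
   n < 0: all of src except its abs(n) rightmost characters.
   n > strlen(src): result is padded on the right with fill.
   dst and src must not overlap.
   Returns 0 on success, -1 if dst is too small (dst is then untouched). */
int str_left(char *dst, size_t dstsize, const char *src, long n, char fill)
{
    size_t len = strlen(src);
    size_t want, copy;

    if (n >= 0) {
        want = (size_t)n;
    } else {
        size_t drop = (size_t)(-(n + 1)) + 1;   /* abs(n), safe for LONG_MIN */
        want = (drop < len) ? len - drop : 0;
    }
    if (want >= dstsize)
        return -1;                              /* no room for terminator */

    copy = (want < len) ? want : len;
    memcpy(dst, src, copy);
    memset(dst + copy, fill, want - copy);      /* padding, possibly zero bytes */
    dst[want] = '\0';
    return 0;
}
```

For example, str_left(buf, sizeof buf, "hello", 8, '.') leaves "hello..." in buf, and n = -2 leaves "hel".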

88888 Dihedral · Jan 25, 2013

On Friday, January 25, 2013 at 6:57:27 PM UTC+8, JohnF wrote:

Some might be implemented as macros, especially the snipped-and-pasted
variety. Several snipped-and-pasted examples from my programs are below.
Some do little more than arg-checking, e.g., to avoid segfaulting if
used with a NULL ptr.

[snip]

/* ---
* macros to check if a string is empty
* ------------------------------------ */
#define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))

I have to make one subtle point about
the type char *str=NULL in C.

str=NULL; // not even allocated

char str2[10]; // 10 bytes allocated
str2[0]='\0' ; // string length is zero

// But I worked out my own string library before,
// with my own format, to deal with the BIG5 encoding.

[snip]

Malcolm McLean · Jan 28, 2013

As was mentioned, there are just too many ways of dealing with such
functions: results can be in-place, in a caller-supplied destination, or in
allocated memory.

That's one big issue.

trim() could be reasonably an in-place trim function, one that took a const
char * and a buffer (with length supplied or not supplied), or return an
allocated pointer.
I'd say the first option is best because of the way the function is likely
to be used. No-one is going to want to pass it a string literal, and only
rarely will you want both a trimmed string and the original retained.
But with replace() you can't easily calculate the output size before calling
it, and it can't be in-place as it's as likely to expand as to shrink the
buffer.
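
One way to handle the replace() case described above is to do the size calculation inside the function with a counting pass, then allocate once. This is a minimal sketch under assumptions: the name str_replace, returning NULL for an empty search string, and ignoring size overflow are choices made for the example, not the strsup.c behaviour.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of str with every non-overlapping occurrence of
   from replaced by to. Returns NULL if from is empty or allocation fails.
   The caller frees the result. */
char *str_replace(const char *str, const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t count = 0, out_len;
    const char *p, *hit;
    char *result, *out;

    if (from_len == 0)
        return NULL;                    /* an empty pattern matches everywhere */

    /* Pass 1: count matches so the output size is known up front. */
    for (p = str; (hit = strstr(p, from)) != NULL; p = hit + from_len)
        count++;

    out_len = strlen(str) - count * from_len + count * to_len;
    result = malloc(out_len + 1);
    if (result == NULL)
        return NULL;

    /* Pass 2: copy the text between matches, substituting as we go. */
    out = result;
    for (p = str; (hit = strstr(p, from)) != NULL; p = hit + from_len) {
        size_t gap = (size_t)(hit - p);
        memcpy(out, p, gap);
        out += gap;
        memcpy(out, to, to_len);
        out += to_len;
    }
    strcpy(out, p);                     /* tail after the last match */
    return result;
}
```

The cost is scanning the input twice; the benefit is that the caller never has to guess a buffer size.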

ssmitch · Jan 29, 2013

Malcolm McLean said:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?

Besides the obvious candidates, I've also found useful on a number of occasions a function to strip unwanted characters (such as commas or dollar signs in numeric values) from a string before further processing. My own version is simply called strip(), but for a library you would probably want to rename it:

/*
* remove unwanted characters from string
*
* call as strip(char *str, *unwanted)
*
* where "str" is the string to process and "unwanted" is a null-
* terminated string containing the characters to be stripped.
*
* returns pointer to modified string
*/

char *strip(char *str, char *unwanted) {
char *cp, *savptr;
savptr = str;
cp = str - 1;
while (*++cp = *str++)
if (strchr(unwanted, *cp) != NULL)
--cp;
return savptr;
{

Malcolm McLean · Jan 29, 2013

On Tuesday, January 22, 2013 1:14:18 PM UTC-5, Malcolm McLean wrote:

My own version is simply called strip(), but for a library you would probably want to rename it:

It's got the str prefix which indicates a standard library string function.

Ben Bacarisse · Jan 30, 2013

Besides the obvious candidates, I've also found useful on a number of
occasions a function to strip unwanted characters (such as commas or
dollar signs in numeric values) from a string before further
processing. My own version is simply called strip(), but for a
library you would probably want to rename it:

/*
* remove unwanted characters from string
*
* call as strip(char *str, *unwanted)
*
* where "str" is the string to process and "unwanted" is a null-
* terminated string containing the characters to be stripped.
*
* returns pointer to modified string
*/

char *strip(char *str, char *unwanted) {
char *cp, *savptr;
savptr = str;
cp = str - 1;

That's, technically, problematic. If str points to the start of an
array, the standard does not permit you to form the pointer str - 1,
even if you do nothing with it!

while (*++cp = *str++)
if (strchr(unwanted, *cp) != NULL)
--cp;

I think the fix is simpler than the original:

cp = str;
while (*cp = *str++)
if (strchr(unwanted, *cp) == NULL)
cp++;

In such cases I tend to write:

while (*cp = *str++)
cp += strchr(unwanted, *cp) == NULL;

but similar things have caused me to be accused of all sorts of barbarism,
so I won't suggest you do likewise!

return savptr;
{

} I think.

Tim Rentsch · Jan 31, 2013

Ben Bacarisse said:
That's, technically, problematic. If str points to the start of
an array, the standard does not permit you to form the pointer
str - 1, even if you do nothing with it!

I think the fix is simpler than the original:

cp = str;
while (*cp = *str++)
if (strchr(unwanted, *cp) == NULL)
cp++;

In such cases I tend to write:

while (*cp = *str++)
cp += strchr(unwanted, *cp) == NULL;

but similar things have caused me to be accused of all sorts of
barbarism, so I won't suggest you do likewise!

I was inspired by your examples to look for a short and simple
implementation. I came up with this:

char *
eliminate( char *to_shrink, const char *unwanted ){
char *p = to_shrink, *q = p;
do q += strspn( q, unwanted ); while( *p++ = *q++ );
return to_shrink;
}

I think it's easy to see that all 'unwanted' bytes are skipped
and only values not in 'unwanted' are copied.

And now let the barbarism accusers say what they will!

Ben Bacarisse · Feb 2, 2013

Tim Rentsch said:
I was inspired by your examples to look for a short and simple
implementation. I came up with this:

Well, I was suggesting a simple fix rather than a simple alternative.

char *
eliminate( char *to_shrink, const char *unwanted ){
char *p = to_shrink, *q = p;
do q += strspn( q, unwanted ); while( *p++ = *q++ );
return to_shrink;
}

I think it's easy to see that all 'unwanted' bytes are skipped
and only values not in 'unwanted' are copied.

And now let the barbarism accusers say what they will!

That's nice (except for the layout!) and I don't think there is any
barbarism involved (where could it be?).

Tim Rentsch · Feb 3, 2013

Ben Bacarisse said:
Tim Rentsch said:

Ben Bacarisse said:

[.. discussing a function to remove unwanted characters from
a string ..]

I think the fix is simpler than the original:
cp = str;
while (*cp = *str++)
if (strchr(unwanted, *cp) == NULL)
cp++;

In such cases I tend to write:

while (*cp = *str++)
cp += strchr(unwanted, *cp) == NULL;

but similar things have caused me to be accused of all sorts of
barbarism, so I won't suggest you do likewise!

I was inspired by your examples to look for a short and simple
implementation. I came up with this:

Well, I was suggesting a simple fix rather than a simple
alternative.

Right. I didn't mean to imply anything different.

char *
eliminate( char *to_shrink, const char *unwanted ){
char *p = to_shrink, *q = p;
do q += strspn( q, unwanted ); while( *p++ = *q++ );
return to_shrink;
}

[snip]

That's nice (except for the layout!) [snip]

Those who find the single-line do/while unattractive might
prefer this instead:

while( q += strspn( q, unwanted ), *p++ = *q++ ) {}
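
For readers who want to try the strspn() approach discussed above, here is the do/while version from the thread laid out over several lines with comments added. The logic is unchanged; only the layout, the comments, and the explicit comparison against '\0' are additions.

```c
#include <string.h>

/* Remove from to_shrink, in place, every character that appears in
   unwanted. Returns to_shrink. */
char *eliminate(char *to_shrink, const char *unwanted)
{
    char *p = to_shrink;    /* write position */
    char *q = to_shrink;    /* read position */

    do {
        q += strspn(q, unwanted);       /* skip a run of unwanted bytes */
    } while ((*p++ = *q++) != '\0');    /* copy one byte; stop after the
                                           terminator has been copied */
    return to_shrink;
}
```

Because strspn() stops at the terminating nul, the copy step always lands on either a wanted character or the terminator, so the loop never reads past the end of the string. Called on a writable buffer holding "$1,234.56" with unwanted set to "$,", it leaves "1234.56".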
