strsup - supplementary string functions

Malcolm McLean

I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?
 
Keith Thompson

Malcolm McLean said:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Just one comment: the name "strcount" is reserved. And the other names
could easily conflict with existing identifiers in user code.

You might consider inventing a consistent naming scheme for your
functions.

If the purpose is to copy-and-paste the definitions, the user can change
the name easily enough, but if they're going to be part of a library
(which would make sense), then naming becomes more important.
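
To make the naming point concrete, here is a minimal sketch of the character-counting function under a prefix outside the reserved namespace. The "xs_" prefix and the name xs_count are hypothetical, chosen only for illustration; they are not from Malcolm's library.

```c
#include <stddef.h>

/* Hypothetical "xs_" prefix, used here only as an example.
 * Names that begin with "str" followed by a lowercase letter are
 * reserved for future <string.h> additions, so "strcount" is avoided. */
size_t xs_count(const char *str, int ch)
{
    size_t n = 0;

    /* Walk to the terminator, comparing as unsigned char the way
     * the standard string functions do. */
    for (; *str != '\0'; str++) {
        if ((unsigned char)*str == (unsigned char)ch)
            n++;
    }
    return n;
}
```

Any short prefix would do, as long as it is applied to every function in the file.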
 
James Kuyper

I just noticed that you've dropped your "Will write code for food" sig.
Congratulations, and good luck!
 
Keith Thompson

James Kuyper said:
I just noticed that you've dropped your "Will write code for food" sig.
Congratulations, and good luck!

Thanks!

I'm actually doing C programming, something I haven't done
professionally in quite a while. (Or, How I Learned To Stop Worrying
And Love Undefined Behavior.)
 
Johann Klammer

Malcolm said:
Any ideas for more?

I've been using these...

///replace non ascii printable chars with spaces
void utlAsciiSanitize(char *p);
/**
*\brief concat+realloc
*
*Will fail if called with str_p, size_p or str2 equal NULL or if
*allocation fails.
*
*\param str_p points to a char pointer pointing to a c string.
*May point to a char * holding NULL when calling, in which case
*memory will be allocated to hold the string to append.
*space may be reallocated for the string.
*\param size_p points to an unsigned holding allocated size.
*May be modified.
*\param str2 string to append. Must be NUL terminated.
*Must not be NULL.
*\returns 1 on error, 0 on success.
*/
int utlSmartCat(char **str_p, unsigned *size_p, char *str2);
///concat+realloc unterminated second string. _Always_ uses size2 bytes from str2.
int utlSmartCatUt(char **str_p, unsigned *size_p, char *str2, unsigned size2);
///realloc to strlen+1 (not actually necessary)
int utlSmartCatDone(char **str_p);

///smart cat for general arrays
int utlSmartMemmove(uint8_t **str_p, unsigned *size_p,
unsigned *alloc_p, uint8_t *str2, unsigned size2);

extern char * UTL_UNDEF;
/**
*\brief string lookup
*
*look up string at idx
*\param idx index to look up. Must be some integer type.
*\param tbl _has_ to be of type char *tbl[]
*\return a char * from tbl or UTL_UNDEF if out of bounds.
*UTL_UNDEF is a printable char *.
*/
#define utlGetStr(tbl,idx) \
(((idx)>=(sizeof(tbl)/sizeof(char*)))?UTL_UNDEF:(tbl)[(idx)])
 
Jorgen Grahn

I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string

trim_right() might often be what people want, and it has nicer
semantics. Provide that one too.

singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.

There are a number of variations on that theme, like:
- split at most in N places
- split on whitespace (one or more SP, TAB etc)
- split and trim whitespace
- ...

I usually find that the simplest kind of split() leaves too much work
for me to do, so I've implemented some of those. On the other hand,
I also use Perl and that may influence my perceived needs ...
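
For reference, the simplest kind of split() mentioned above could look roughly like this. It is only a sketch under assumptions: the names xs_split and xs_split_free are hypothetical, every field is a separate malloc()ed copy, and adjacent delimiters produce empty fields. The variations in the list (limit of N, whitespace runs, trimming) would be layered on top of something like it.

```c
#include <stdlib.h>
#include <string.h>

/* Free an array returned by xs_split(). */
void xs_split_free(char **fields)
{
    size_t i;

    if (fields == NULL)
        return;
    for (i = 0; fields[i] != NULL; i++)
        free(fields[i]);
    free(fields);
}

/* Split str on delim. Returns a NULL-terminated, malloc()ed array of
 * malloc()ed fields, or NULL if allocation fails. If count is not
 * NULL it receives the number of fields. */
char **xs_split(const char *str, char delim, size_t *count)
{
    size_t n = 1, i;
    const char *p;
    char **fields;

    /* One field more than there are delimiters. */
    for (p = str; *p != '\0'; p++)
        if (*p == delim)
            n++;

    fields = malloc((n + 1) * sizeof *fields);
    if (fields == NULL)
        return NULL;

    p = str;
    for (i = 0; i < n; i++) {
        const char *end = strchr(p, delim);
        size_t len = end ? (size_t)(end - p) : strlen(p);

        fields[i] = malloc(len + 1);
        if (fields[i] == NULL) {
            while (i > 0)               /* undo the partial result */
                free(fields[--i]);
            free(fields);
            return NULL;
        }
        memcpy(fields[i], p, len);
        fields[i][len] = '\0';
        p += len + (end != NULL);       /* step over the delimiter */
    }
    fields[n] = NULL;
    if (count != NULL)
        *count = n;
    return fields;
}
```

With this sketch, splitting "a,,b" on ',' gives three fields: "a", "" and "b".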
Any ideas for more?

- an is_empty(), so I don't have to see a myriad variations like
if(!s[0]); if(!*s); if(s[0]==0); if(s[0]=='\0'); if(!strlen(s)) ...

- an equality test so I don't have to see all the variations on
strcmp() ...
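
Both of these are one-liners; a minimal sketch follows. The names xs_is_empty and xs_equal are hypothetical, and the sketch assumes the caller never passes NULL (a NULL-tolerant variant would be just as defensible).

```c
#include <stdbool.h>
#include <string.h>

/* True if s is the empty string. s must not be NULL. */
static inline bool xs_is_empty(const char *s)
{
    return s[0] == '\0';
}

/* True if a and b hold the same characters. Neither may be NULL. */
static inline bool xs_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```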

/Jorgen
 
Bjorn Augestad

On 22.01.2013 19:14, Malcolm McLean wrote:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?

You may find some ideas here ;-)

http://libclc.cvs.sourceforge.net/viewvc/libclc/libclc/src/clc_string.h?view=markup
 
Ben Bacarisse

Jorgen Grahn said:
trim_right() might often be what people want, and it has nicer
semantics. Provide that one too.

A small point: when considering strings that are likely to contain text
the meaning of left and right is ambiguous. Even if this library is
currently all about ASCII strings (I don't recall) consistency with
future names might be worth considering. Terms like start and end (as
well as leading and trailing) are neutral in this sense.
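
As an illustration of the neutral naming, a trailing-whitespace trim might be sketched like this. The name xs_trim_end is hypothetical, and whitespace is taken to mean whatever isspace() accepts in the current locale.

```c
#include <ctype.h>
#include <string.h>

/* Remove trailing whitespace from s in place and return s.
 * "end" rather than "right" keeps the name direction-neutral. */
char *xs_trim_end(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';
    return s;
}
```

Because the start of the string never moves, the result is always the pointer that was passed in, which is part of what makes this variant simpler than a two-sided trim.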

<snip>
 
Malcolm McLean

Unless you change your mind and create (yet another, we've all done it)
structure based string type, here's a suggestion: have a very clear
convention in the naming as to which of your functions return the
original string, perhaps modified (convert case, trim the
right-hand-end) and which return a new string. In fact, think hard
about this all over (should replace always create a new string, create a
new string when lengthening but not shortening, or realloc when
necessary?).

The functions will all take nul-terminated strings passed as char *s or
const char *s. The idea is to write functions that should have been in
string.h but weren't, either because of perceived need, or because string.h
isn't allowed to depend on malloc.

I think it's probably best to return a malloced char * where the return may be larger than the input. Either way is a burden on the caller: a malloced result has to be tested for null and freed, while buffer sizes have to be calculated. But the pass-a-buffer-and-length method has points in its favour too. Resolving that issue was one reason for starting this thread.
 
John McCue

Malcolm McLean said:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
Any ideas for more?

Maybe one which would change 2 or more spaces to 1 space in
a string.
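
Since the result can never be longer than the input, this one can work in place. A minimal sketch, assuming "space" means anything isspace() accepts and using the hypothetical name xs_singlespace:

```c
#include <ctype.h>

/* Collapse every run of whitespace in s to a single ' ', in place.
 * Returns s. Leading and trailing runs are collapsed, not removed. */
char *xs_singlespace(char *s)
{
    char *dst = s;
    const char *src = s;

    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            *dst++ = ' ';
            /* Skip the rest of the run; stops at the terminator. */
            while (isspace((unsigned char)*src))
                src++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return s;
}
```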

John
 
JohnF

Malcolm McLean said:
I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be
snipped and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?

Some might be implemented as macros, especially the snipped-and-pasted
variety. Several snipped-and-pasted examples from my programs are below.
Some do little more than arg-checking, e.g., to avoid segfaulting if
used with a NULL ptr.
Want about ten zillion other ideas (some arguably a bit over-the-top)?
Look at the VMS documentation about lexical string functions in DCL
(DEC's shell DCL="digital command language"),
openvms.compaq.com/doc/73final/6489/6489pro_047.html#66_manipulatingstrings

/* ---
* macro to skip whitespace
* ------------------------ */
#define WHITESPACE " \t\n\r\f\v" /* skipped whitespace chars */
#define skipwhite(thisstr) if ( (thisstr) != NULL ) \
thisstr += strspn(thisstr,WHITESPACE)
/* ---
* macros to check if a string is empty
* ------------------------------------ */
#define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))
/* ---
* macro to strip leading and trailing whitespace
* ---------------------------------------------- */
#define trimwhite(thisstr) if ( (thisstr) != NULL ) { \
int thislen = strlen(thisstr); \
while ( --thislen >= 0 ) \
if ( isthischar((thisstr)[thislen],WHITESPACE) ) \
(thisstr)[thislen] = '\000'; \
else break; \
if ( (thislen = strspn((thisstr),WHITESPACE)) > 0 ) \
{strsqueeze((thisstr),thislen);} } else /*user adds ;*/
/* ---
* macro to remove all 'c' chars from s
* ------------------------------------ */
#define compress(s,c) if(!isempty(s)) /* remove embedded c's from s */ \
{ char *p; while((p=strchr((s),(c)))!=NULL) {strsqueeze(p,1);} } else
/* ---
* macro to strcpy(s,s+n) using memmove() (also works for negative n)
* ------------------------------------------------------------------ */
#define strsqueeze(s,n) if((n)!=0) { if(!isempty((s))) { \
int thislen3=strlen(s); \
if ((n) >= thislen3) *(s) = '\000'; \
else memmove((s),((s)+(n)),(1+thislen3-(n))); }} else /*user adds ;*/
/* ---
* macro to strncpy() n bytes and make sure it's null-terminated
* ------------------------------------------------------------- */
#define strninit(target,source,n) \
if( (target)!=NULL && (n)>=0 ) { \
char *thissource = (source); \
(target)[0] = '\000'; \
if ( (n)>0 && thissource!=NULL ) { \
strncpy((target),thissource,(n)); \
(target)[(n)] = '\000'; } }
/* ---
* macro to check for thischar inthisstr
* ------------------------------------- */
#define isthischar(thischar,inthisstr) \
( (thischar)!='\000' && *(inthisstr)!='\000' \
&& strchr(inthisstr,(thischar))!=(char *)NULL )
/* ---
* macro for last char of a string
* ------------------------------- */
#define lastchar(s) (isempty(s)?'\000':*((s)+(strlen(s)-1)))
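
For comparison, the job done by the strsqueeze macro can also be written as an ordinary function, which avoids the trailing-else trick and evaluates each argument once. This is only a sketch: the name xs_squeeze is hypothetical, and unlike the macro's comment it takes an unsigned count, so negative n is not supported.

```c
#include <string.h>

/* Delete the first n characters of s in place and return s.
 * A NULL s is passed through untouched, like the macro's check. */
char *xs_squeeze(char *s, size_t n)
{
    size_t len;

    if (s == NULL)
        return NULL;
    len = strlen(s);
    if (n >= len)
        s[0] = '\0';
    else
        memmove(s, s + n, len - n + 1);   /* +1 moves the terminator too */
    return s;
}
```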
 
BartC

Malcolm McLean said:
The function will all take nul-terminated strings passed as char *s or
const char *s. The idea is to write functions that should have been in
string.h but weren't, either because of perceived need, or because
string.h
isn't allowed to depend on malloc.

I think it's probably best to return a malloced char * where the return
may be larger than the input.

How will the caller know that? And in the case of a trim function, the
result will never be bigger than the input, but the output may need to be
stored elsewhere because of the need to zero-terminate a sub-string of the
input.

As was mentioned, there are just too many ways of dealing with such
functions: results can be in-place, in a caller-supplied destination, or in
allocated memory.
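
To illustrate the caller-supplied-destination style for trim, here is a sketch that leaves the input untouched and reports the needed length the way snprintf() does. The name xs_trim_copy and the return convention are assumptions made for the example, not anything proposed in the thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy src, minus leading and trailing whitespace, into dst, which
 * has room for dstsize bytes. The result is always terminated when
 * dstsize > 0. Returns the length of the complete trimmed text, so a
 * return value >= dstsize means the output was truncated. */
size_t xs_trim_copy(char *dst, size_t dstsize, const char *src)
{
    size_t len, n;

    while (isspace((unsigned char)*src))
        src++;
    len = strlen(src);
    while (len > 0 && isspace((unsigned char)src[len - 1]))
        len--;

    if (dstsize > 0) {
        n = len < dstsize ? len : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

This form accepts string literals as input, at the cost of making the caller size the buffer.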

Sometimes also, the caller has useful length information for a string, but
the standard library often has no way to impart that information to
the function; that would be handy to be able to do.

For ideas about new string functions, I mainly use the following:

o Convert a string to upper/lower case (some Cs will have this already). I
also allow just the first N characters to be modified.

o Return the leftmost or rightmost N characters. When N is negative, then
all *except* the abs(N) rightmost or leftmost characters are returned. When
N is longer than the string, then it's padded with spaces, or a
caller-supplied fill character (or string). (Some of this can be done, with
a bit more trouble, with sprintf().)

o Split or join strings as I think you've already mentioned.

o Several functions to deal with filespecs: extract a path, filename,
basefile, or extension; or to change or add an extension to a filespec.
 
88888 Dihedral

On Friday, January 25, 2013 at 6:57:27 PM UTC+8, JohnF wrote:
Some might be implemented as macros, especially the snipped-and-pasted
variety. Several snipped-and-pasted examples from my programs are below.
Some do little more than arg-checking, e.g., to avoid segfaulting if
used with a NULL ptr.

<snip>

/* ---
* macros to check if a string is empty
* ------------------------------------ */
#define isempty(s) ((s)==NULL?1:(*(s)=='\000'?1:0))

I have to make one subtle point about char *str = NULL in C: a null
pointer and an empty string are different things.

char *str = NULL; // not even allocated

char str2[10]; // 10 bytes allocated
str2[0] = '\0'; // string length is zero

I have also written my own string library before, with my own format,
to deal with the BIG5 encoding.

<snip>
 
Malcolm McLean

As was mentioned, there are just too many ways of dealing with such
functions: results can be in-place, in a caller-supplied destination, or in
allocated memory.

That's one big issue.

trim() could reasonably be an in-place function, one that takes a const
char * and a buffer (with length supplied or not supplied), or one that
returns an allocated pointer.
I'd say the first option is best because of the way the function is likely
to be used. No-one is going to want to pass it a string literal, and only
rarely will you want both a trimmed string and the original retained.
But with replace() you can't easily calculate the output size before calling
it, and it can't be in-place as it's as likely to expand as to shrink the
buffer.
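
A sketch of what the allocating form of replace() might look like, to make the trade-off concrete. The name xs_replace, the argument order, and the choice to return NULL for an empty search string are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed copy of str with every non-overlapping
 * occurrence of `from` replaced by `to`. Returns NULL if allocation
 * fails or if `from` is empty. The caller frees the result. */
char *xs_replace(const char *str, const char *from, const char *to)
{
    size_t fromlen = strlen(from), tolen = strlen(to);
    size_t count = 0, outlen;
    const char *p, *hit;
    char *out, *dst;

    if (fromlen == 0)
        return NULL;

    /* First pass: count matches so the result can be sized exactly. */
    for (p = str; (hit = strstr(p, from)) != NULL; p = hit + fromlen)
        count++;

    outlen = strlen(str) - count * fromlen + count * tolen;
    out = malloc(outlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the gaps and substitute the matches. */
    dst = out;
    for (p = str; (hit = strstr(p, from)) != NULL; p = hit + fromlen) {
        size_t gap = (size_t)(hit - p);

        memcpy(dst, p, gap);
        dst += gap;
        memcpy(dst, to, tolen);
        dst += tolen;
    }
    strcpy(dst, p);     /* tail after the last match, with terminator */
    return out;
}
```

The two passes are the price of not knowing the output size up front; a buffer-and-length version would need the same counting pass to tell the caller how much room to provide.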
 
ssmitch

I'm putting together a little library of supplementary string functions,
strsup.c.

The functions are intended to be fairly short, and to operate on char *s.
They should ideally be implementable as a single function which can be snipped
and pasted.

Unlike the string.h functions they can depend on malloc().

The obvious first function is strdup().

I've also got

strcount - count the instances of character ch in str.
trim - remove leading and trailing whitespace from a string
singlespace - replace all runs of whitespace in a string with a single
space character.
split - return an array of fields, split on a delimiter character.
compnumeric - compare two strings with embedded numbers.
replace - replace all substrings with new passed string.

Any ideas for more?

Besides the obvious candidates, I've also found useful on a number of occasions a function to strip unwanted characters (such as commas or dollar signs in numeric values) from a string before further processing. My own version is simply called strip(), but for a library you would probably want to rename it:

/*
* remove unwanted characters from string
*
* call as strip(char *str, char *unwanted)
*
* where "str" is the string to process and "unwanted" is a null-
* terminated string containing the characters to be stripped.
*
* returns pointer to modified string
*/

char *strip(char *str, char *unwanted) {
char *cp, *savptr;
savptr = str;
cp = str - 1;
while (*++cp = *str++)
if (strchr(unwanted, *cp) != NULL)
--cp;
return savptr;
{
 
Malcolm McLean

On Tuesday, January 22, 2013 1:14:18 PM UTC-5, Malcolm McLean wrote:

My own version is simply called strip(), but for a library you would probably want to rename it:
It's got the str prefix which indicates a standard library string function.
 
Ben Bacarisse

Besides the obvious candidates, I've also found useful on a number of
occasions a function to strip unwanted characters (such as commas or
dollar signs in numeric values) from a string before further
processing. My own version is simply called strip(), but for a
library you would probably want to rename it:

/*
* remove unwanted characters from string
*
* call as strip(char *str, char *unwanted)
*
* where "str" is the string to process and "unwanted" is a null-
* terminated string containing the characters to be stripped.
*
* returns pointer to modified string
*/

char *strip(char *str, char *unwanted) {
char *cp, *savptr;
savptr = str;
cp = str - 1;

That's, technically, problematic. If str points to the start of an
array, the standard does not permit you to form the pointer str - 1,
even if you do nothing with it!
while (*++cp = *str++)
if (strchr(unwanted, *cp) != NULL)
--cp;

I think the fix is simpler than the original:

cp = str;
while (*cp = *str++)
if (strchr(unwanted, *cp) == NULL)
cp++;

In such cases I tend to write:

while (*cp = *str++)
cp += strchr(unwanted, *cp) == NULL;

but similar things have caused me to be accused of all sorts of barbarism,
so I won't suggest you do likewise!
return savptr;
{

} I think.
 
Tim Rentsch

Ben Bacarisse said:
That's, technically, problematic. If str points to the start of
an array, the standard does not permit you to form the pointer
str - 1, even if you do nothing with it!


I think the fix is simpler than the original:

cp = str;
while (*cp = *str++)
if (strchr(unwanted, *cp) == NULL)
cp++;

In such cases I tend to write:

while (*cp = *str++)
cp += strchr(unwanted, *cp) == NULL;

but similar things have caused me to be accused of all sorts of
barbarism, so I won't suggest you do likewise!

I was inspired by your examples to look for a short and simple
implementation. I came up with this:

char *
eliminate( char *to_shrink, const char *unwanted ){
char *p = to_shrink, *q = p;
do q += strspn( q, unwanted ); while( *p++ = *q++ );
return to_shrink;
}

I think it's easy to see that all 'unwanted' bytes are skipped
and only values not in 'unwanted' are copied.

And now let the barbarism accusers say what they will!
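
A small usage sketch for the numeric clean-up case that started this sub-thread. The function eliminate() is restated, with the loop written out, so that the example compiles on its own; parse_amount() is a hypothetical helper added purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Restated from the post above so the sketch is self-contained. */
char *eliminate(char *to_shrink, const char *unwanted)
{
    char *p = to_shrink, *q = p;

    do
        q += strspn(q, unwanted);       /* skip a run of unwanted bytes */
    while ((*p++ = *q++) != '\0');      /* copy one byte, stop after NUL */
    return to_shrink;
}

/* Hypothetical helper: strip currency punctuation in place, then
 * convert what is left to a long. */
long parse_amount(char *text)
{
    return strtol(eliminate(text, "$,"), NULL, 10);
}
```

Called on a writable buffer holding "$1,234,567", parse_amount() leaves "1234567" in the buffer and returns that value.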
 
Ben Bacarisse

Tim Rentsch said:
I was inspired by your examples to look for a short and simple
implementation. I came up with this:

Well, I was suggesting a simple fix rather than a simple alternative.
char *
eliminate( char *to_shrink, const char *unwanted ){
char *p = to_shrink, *q = p;
do q += strspn( q, unwanted ); while( *p++ = *q++ );
return to_shrink;
}

I think it's easy to see that all 'unwanted' bytes are skipped
and only values not in 'unwanted' are copied.

And now let the barbarism accusers say what they will!

That's nice (except for the layout!) and I don't think there is any
barbarism involved (where could it be?).
 
Tim Rentsch

Ben Bacarisse said:
Tim Rentsch said:
Ben Bacarisse said:
[.. discussing a function to remove unwanted characters from
a string ..]

I think the fix is simpler than the original:
cp = str;
while (*cp = *str++)
if (strchr(unwanted, *cp) == NULL)
cp++;

In such cases I tend to write:

while (*cp = *str++)
cp += strchr(unwanted, *cp) == NULL;

but similar things have caused me to be accused of all sorts of
barbarism, so I won't suggest you do likewise!

I was inspired by your examples to look for a short and simple
implementation. I came up with this:

Well, I was suggesting a simple fix rather than a simple
alternative.

Right. I didn't mean to imply anything different.
char *
eliminate( char *to_shrink, const char *unwanted ){
char *p = to_shrink, *q = p;
do q += strspn( q, unwanted ); while( *p++ = *q++ );
return to_shrink;
}

[snip]

That's nice (except for the layout!) [snip]

Those who find the single-line do/while unattractive might
prefer this instead:

while( q += strspn( q, unwanted ), *p++ = *q++ ) {}
 
