Comment on trim string function please

S

swengineer001

Just looking for a few eyes on this code other than my own.

void TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t j;

while(isspace(str))
{
i++;
}
if(i > 0)
{
for(j = 0; i < strlen(str); j++, i++)
{
str[j] = str;
}
str[j] = '\0';
}

// Trim whitespace from end:
i = strlen(str) - 1;

while(isspace(str))
{
i--;
}
if(i < (strlen(str) - 1))
{
str[i + 1] = '\0';
}
}
 
J

Jens Thoms Toerring

Just looking for a few eyes on this code other than my own.
void TrimCString(char *str)
{
// Trim whitespace from beginning:

I guess I would trim from the end first since then you have less
copying to do afterwards.
size_t i = 0;
size_t j;
while(isspace(str))


isspace() expects an int and not a char as it's argument.
{
i++;
}
if(i > 0)
{
for(j = 0; i < strlen(str); j++, i++)
{
str[j] = str;
}
str[j] = '\0';
}


An alternative would be to use memmove() here, so you don't
have to do it byte by byte. Also callling strlen() each time
through the loop is a bit of a waste of time - it doesn't
change and can be replaced by a check if str is '\0'.
// Trim whitespace from end:
i = strlen(str) - 1;

Careful: This could set 'i' to -1 (if the string consistet of white
space only) and then the rest won't work anymore.
while(isspace(str))
{
i--;
}

if(i < (strlen(str) - 1))
{
str[i + 1] = '\0';
}
}

Here's an alternative version using pointers (and trying to
minimize the number of calls of strlen()):

void
TrimCString( char *str )
{
char *p,
*q;

/* Check that we've got something that looks like a string */

if ( ! str || ! * str )
return;

/* Trim from end */

for ( p = str + strlen( str ) - 1; p != str && isspace( ( int ) *p ); p-- )
/* empty */ ;

if ( p == str ) /* only white space in string */
{
*str = '\0';
return;
}

*++p = '\0';

/* Trim from start */

for ( q = str; isspace( ( int ) *q ); q++ )
/* empty */ ;

if ( q != str )
memmove( str, q, p - q + 1 );
}
Regards, Jens
 
B

badc0de4

Just looking for a few eyes on this code other than my own.

void TrimCString(char *str)

Why not return a char *, like most other string functions?

char *TrimCString(char *str)

using char * enables you to piggyback your function in the
middle of other functions, eg
printf("%s\n", TrimCString(someString));
{
// Trim whitespace from beginning:
size_t i = 0;
size_t j;

while(isspace(str))
{
i++;
}
if(i > 0)
{
for(j = 0; i < strlen(str); j++, i++)


move the strlen() outside the loop.
Maybe use memmove() instead of the loop.
{
str[j] = str;
}
str[j] = '\0';
}

// Trim whitespace from end:
i = strlen(str) - 1;


Use the strlen you computed before, when
you moved it out of the for loop above :)
while(isspace(str))
{
i--;
}
if(i < (strlen(str) - 1))
{
str[i + 1] = '\0';


No need for the test.
when i >= (strlen(str) - 1) -- it can only be equal, anyway -- the
assignment overwrites a '\0' with a brand new '\0'.
Anyway, if you want to keep the test, use the computed strlen.


A couple what-if's

* what if a pass NULL to the function?
TrimCString(NULL);

* what if I pass a constant string literal to the function?
TrimCString(" 4 spaces at both ends ");
 
S

swengineer001

Just looking for a few eyes on this code other than my own.
void TrimCString(char *str)
{
        // Trim whitespace from beginning:

I guess I would trim from the end first since then you have less
copying to do afterwards.
        size_t i = 0;
        size_t j;
        while(isspace(str))


isspace() expects an int and not a char as it's argument.

Doesn't this promotion happen implicitly and without loss of
information since I am actually dealing with characters and not using
the char as a holder for small integers?
        {
                i++;
        }
        if(i > 0)
        {
                for(j = 0; i < strlen(str); j++, i++)
            {
                        str[j] = str;
                }
                str[j] = '\0';
        }


An alternative would be to use memmove() here, so you don't
have to do it byte by byte. Also callling strlen() each time
through the loop is a bit of a waste of time - it doesn't
change and can be replaced by a check if str is '\0'.
        // Trim whitespace from end:
        i = strlen(str) - 1;

Careful: This could set 'i' to -1 (if the string consistet of white
space only) and then the rest won't work anymore.

i is of type size_t which I believe can't be negative?
        while(isspace(str))
        {
                i--;
        }
        if(i < (strlen(str) - 1))
        {
                str[i + 1] = '\0';
        }
}


Here's an alternative version using pointers (and trying to
minimize the number of calls of strlen()):

void
TrimCString( char *str )
{
    char *p,
             *q;

    /* Check that we've got something that looks like a string */

        if ( ! str || ! * str )
                return;

        /* Trim from end */

    for ( p = str + strlen( str ) - 1; p != str && isspace( ( int ) *p ); p-- )
            /* empty */ ;

        if ( p == str )      /* only white space in string */
        {
            *str = '\0';
                return;
    }

        *++p = '\0';

    /* Trim from start */

        for ( q = str; isspace( ( int ) *q ); q++ )
        /* empty */ ;

    if ( q != str )
        memmove( str, q, p - q + 1 );}

Thanks for the code and the comments. This is useful.
 
S

swengineer001

Why not return a char *, like most other string functions?

char *TrimCString(char *str)

        using char * enables you to piggyback your function in the
        middle of other functions, eg
        printf("%s\n", TrimCString(someString));
Good point, I had not thought of it.
{
   // Trim whitespace from beginning:
   size_t i = 0;
   size_t j;
   while(isspace(str))
   {
           i++;
   }
   if(i > 0)
   {
           for(j = 0; i < strlen(str); j++, i++)


move the strlen() outside the loop.
Maybe use memmove() instead of the loop.
       {
                   str[j] = str;
           }
           str[j] = '\0';
   }

   // Trim whitespace from end:
   i = strlen(str) - 1;

Use the strlen you computed before, when
you moved it out of the for loop above :)

the length of the string has changed since then, possibly.
   while(isspace(str))
   {
           i--;
   }
   if(i < (strlen(str) - 1))
   {
           str[i + 1] = '\0';


No need for the test.
when i >= (strlen(str) - 1) -- it can only be equal, anyway -- the
assignment overwrites a '\0' with a brand new '\0'.
Anyway, if you want to keep the test, use the computed strlen.

A couple what-if's

* what if a pass NULL to the function?
  TrimCString(NULL); I added a check for this.

* what if I pass a constant string literal to the function?
  TrimCString("    4 spaces at both ends    ");

How can I account for this?
 
S

swengineer001

Just looking for a few eyes on this code other than my own.

     This pair of eyes sees three bugs, two occurring twice each
and the other perhaps due to too much snippage.  There are also
some opportunities to improve speed and style.  So, here we go:

     Bug: Since you're using isspace() and strlen(), you need to
#include <ctype.h> and <string.h> to get their declarations.
Without the declarations, a compiler operating under C99 rules
must generate a diagnostic.  Under C89 rules the code will work,
but might not be as fast as if the vendor's "magic" declarations
were present.
void TrimCString(char *str)
{
   // Trim whitespace from beginning:
   size_t i = 0;

     Style: Instead of initializing `i' at its declaration and then
relying on the initial value later on, consider initializing it
closer to its use.  The `for' statement is convenient for such
purposes.
   size_t j;
   while(isspace(str))


     Bug: If `str' contains negative-valued characters, this use
may violate the "contract" of isspace() by passing an out-of-range
argument.  Use `while (isspace( (unsigned char)str ))'.  (This
is one of those occasions where a cast is almost always required
and almost always omitted, as opposed to the more usual situation
where a cast is almost always inserted and almost always wrong.)
   {
           i++;
   }
   if(i > 0)
   {
           for(j = 0; i < strlen(str); j++, i++)

     Speed: This loop calculates strlen(str) on every iteration.
Since it will return the same value each time, all calls after the
first are wasted effort.  Call strlen() once before the loop and
store the result in a variable, and use that variable to control
the loop.
       {
                   str[j] = str;
           }
           str[j] = '\0';
   }

   // Trim whitespace from end:
   i = strlen(str) - 1;

     Bug: If `str' is the empty string (either because it was
empty to begin with or because it contained only white space),
this calculation will produce a very large `i' that is almost
assuredly wrong, R-O-N-G.
   while(isspace(str))


     Bug: Same missing cast as above.
   {
           i--;
   }
   if(i < (strlen(str) - 1))

     Bug: Same mishandling of the empty string as above.

     Speed: strlen(str) is still the same as it was a few lines
ago, so there's no point in computing it again.
   {
           str[i + 1] = '\0';

     Speed: It's probably quicker -- and certainly less verbose --
to do this assignment unconditionally than to test whether it's
needed.  If it isn't, you'll just store a zero on top of an
existing zero, which is harmless.

     Summary: Not quite ready for prime time, but not as bad as
some attempts I've seen.

     Challenge: See if you can think of a way to do the job in
just one pass over the string (calling strlen() counts as a
"pass").  Hint: During the copy-to-front operation, can you
somehow figure out where the final '\0' should land without
going back and inspecting the moved characters a second time?


Thanks for the input. Someone else had pointed out one of the issues
you mentioned but I didn't understsnd what he was saying until I read
your description.
 
S

swengineer001

Here is my second go at it after reading the comments provided.

//This was in my original file but just not immediately before this
function
#include <ctype.h>
#include <string.h>

//Added char* return as suggested
char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t j;

//Added validation of the argument for NULL
if(str == NULL)
{
return str;
}

//Added cast as suggested. I have always avoided casts for various
reasons
//discussed in the FAQ and elsewhere but it is good to know this
case
while(isspace((unsigned char)str))
{
i++;
}
if(i > 0)
{
//Calculate length once
size_t length = strlen(str);
//Decided to leave the bytewise copy I just think I can
understand it a little better
for(j = 0; i < length; j++, i++)
{
str[j] = str;
}
str[j] = '\0';
}

// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);

while(isspace((unsigned char)str))
{
i--;
}
//Removed check that was not needed
str[i + 1] = '\0';
}
 
B

badc0de4

How can I account for this?

Well ... you can't!
But you can do (at least) one of two things:

a) add a comment forbidding the caller to pass a string literal.
The comment goes near the function in the file with the code,
it also goes in the header file and in the documentation
provided for TrimCString()

b) Rewrite TrimCString to write the trimmed string to a
different memory area (variable, array, whatever), a bit
like strcpy() works

char *TrimCString(char *dest, const char *source)
/* caller must ensure `dest` has enough space for
the trimmed string */
/* if source is NULL, trims `dest` in-place. `dest`
*MUST* be writable */
 
B

Ben Bacarisse

Eric Sosman said:
Style: Instead of initializing `i' at its declaration and then
relying on the initial value later on, consider initializing it
closer to its use. The `for' statement is convenient for such
purposes.

If you are suggesting replacing the while (isspace(str) with a
for (int i = 0; isspace((unsigned char)str; i++); then this will not
allow i to tested outside the for:
while(isspace(str))
{
i++;
}
if(i > 0)


here.
 
B

badc0de4

Here is my second go at it after reading the comments provided.
//Added char* return as suggested
char* TrimCString(char *str)
{
[...]

//Calculate length once
size_t length = strlen(str);
//Decided to leave the bytewise copy [...]
for(j = 0; i < length; j++, i++)
{
str[j] = str;
}
str[j] = '\0';
}

// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);


Here, strlen(str) is the same as j.
You can replace the call to strlen() with j :)

i = j ? j - 1 : 0;
while(isspace((unsigned char)str))
{
i--;
}
//Removed check that was not needed
str[i + 1] = '\0';


And, since you changed the function to return a char *,
do return a char * to the caller.

return str;
 
M

Michael Brennan

Here is my second go at it after reading the comments provided.

// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);

while(isspace((unsigned char)str))
{
i--;
}
//Removed check that was not needed
str[i + 1] = '\0';
}


2 things I found:

1. You are missing a return statement at the end.

2. This will not be correct if someone calls your function with an array
of length one (with its only element set to '\0'). The index in your
assignment at the last line will then be 1.
 
S

swengineer001

Here is my second go at it after reading the comments provided.
//Added char* return as suggested
char* TrimCString(char *str)
{
[...]

        //Calculate length once
        size_t length = strlen(str);
        //Decided to leave the bytewise copy [...]
        for(j = 0; i < length; j++, i++)
        {
            str[j] = str;
        }
        str[j] = '\0';
    }

    // Trim whitespace from end:
    //Added check to catch the empty string
    i = (strlen(str) ? (strlen(str) - 1) : 0);

Here, strlen(str) is the same as j.
You can replace the call to strlen() with j :)

This is only true if there was whitespace to trim from the front and
not in the more general case.
 
B

badc0de4

Here is my second go at it after reading the comments provided.
//Added char* return as suggested
char* TrimCString(char *str)
{

[...]

missing in my previous post
if (i > 0)
{
//Calculate length once
size_t length = strlen(str);
//Decided to leave the bytewise copy [...]
for(j = 0; i < length; j++, i++)
{
str[j] = str;
}
str[j] = '\0';
}

// Trim whitespace from end:
//Added check to catch the empty string
i = (strlen(str) ? (strlen(str) - 1) : 0);

Here, strlen(str) is the same as j.
You can replace the call to strlen() with j :)

This is only true if there was whitespace to trim from the front and
not in the more general case.


Oops ... you're right, thanks for pointing it out.
But I still think you could do without some of those strlen() calls.
 
W

Willem

Ben Bacarisse wrote:
) If you are suggesting replacing the while (isspace(str) with a
) for (int i = 0; isspace((unsigned char)str; i++); then this will not
) allow i to tested outside the for:

I assume he meant C89 style:
for (i = 0; isspace((unsigned char)str; i++);


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
W

Willem

Eric Sosman wrote:
) Why move the characters at all? The original version needed
) to because it returned no value, so the only way it could report
) the position of the first non-space was to squeeze out the leading
) spaces. But now that you're returning a pointer, why not just
) point at the first non-space no matter where you find it?

Because then the subsequent call to free() would bomb.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
S

swengineer001

I sort of took a different approach based on the comments here.

#include <ctype.h>
#include <string.h>

char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t length = strlen(str);
char* new_start;

if((str == NULL) || (length == 0))
{
return str;
}

while(isspace((unsigned char)str))
{
i++;
}

new_start = str + i;
length -= i;

// Trim whitespace from end:
i = (length ? (length - 1) : length);

if(i != 0)
{
while(isspace((unsigned char)str))
{
i--;
length--;
}
str[i + 1] = '\0';
}
memmove(str, new_start, (length + 1));
return str;
}
 
B

badc0de4

I sort of took a different approach based on the comments here.

It probably isn't relevant, but
I profiled 1,000,000 calls of your first and last versions:

index % time self children called name
[2] 94.2 6.26 0.00 1000000 TrimCString1 [2]
[3] 4.7 0.31 0.00 1000000 TrimCString2 [3]

Increase in speed was ( 6.26 / 0.31 = 20+ ) twentyfold :)
 
F

Flash Gordon

Eric Sosman wrote, On 10/07/08 18:41:
This pair of eyes sees three bugs, two occurring twice each
and the other perhaps due to too much snippage. There are also
some opportunities to improve speed and style. So, here we go:

Bug: Since you're using isspace() and strlen(), you need to
#include <ctype.h> and <string.h> to get their declarations.
Without the declarations, a compiler operating under C99 rules
must generate a diagnostic. Under C89 rules the code will work,
but might not be as fast as if the vendor's "magic" declarations
were present.

Actually, as the functions do not return an int (which under C89 rules
the compiler is required to assume they return) it invokes undefined
behaviour. So although I am not aware of any implementations where it
would fail it still *could* fail.
 
B

Bill Reid

Jens Thoms Toerring said:
Just looking for a few eyes on this code other than my own.
for(j = 0; i < strlen(str); j++, i++)
{
str[j] = str;
}

An alternative would be to use memmove() here, so you don't
have to do it byte by byte.

What would be the actual metrics to make a decision here
one way or the other? I myself just assume for this kind of stuff
that it will only be done on relatively small strings, just a few
characters, so it seems that a call to memmove() might be overkill.
Or is it? DOES it depend on the size of the block you'll
be moving? How many cycles does it take to assign one
character to a previous array position, versus the overhead of
a call to memmove(), and the operation of the function itself?
Also callling strlen() each time
through the loop is a bit of a waste of time - it doesn't
change and can be replaced by a check if str is '\0'.


Or just do a very simple while() loop that compares pointers:

char *start=str;
char *end=str+strlen(str);

while(start!=end) {
/* do something */
/* start++ for going forward from start */
/* end-- for going backwards from end */
}

Can't go wrong there...I will admit that I use to always use array
sub-scripting such as the OP because I found it easier conceptually
to understand, but then abandoned it largely for the equivalent
pointers as part of my paranoia about compiler "optimizations"
(I turn them ALL off, which means sub-scripting is not converted
to pointers, which I believe takes longer, so to retain the speed
I've manually converted most of the stuff like this).
 
V

voidpointer

another version, i don't know if it's correct, but work fine here

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

char *stringtrim(char *str)
{
char *res, *ini, *end, *ptr;
int size;

/* safe mode to NULL string */
if (!str)
return NULL;

size = strlen(str);

if (!size)
return NULL;

ini = str;

/* go to the first non blank character */
while (isspace((unsigned char)*ini)) {
++ini;
--size;
}

if (size <= 0)
return NULL;

end = str + strlen(str) - 1; /* point to the last character */
/* go to the last non blank character */
while (isspace((unsigned char)*end)) {
--end;
--size;
}

if (size <= 0)
return NULL;

res = malloc(sizeof(char) * size);
for (ptr = res; ini != (end+1); ini++)
*ptr++ = *ini;
*ptr = '\0';

return res;
}

int main(void)
{
printf("|%s|\n", stringtrim(" hello, world"));
printf("|%s|\n", stringtrim(" 1234 5678 90 "));
char teste[] = " good bye, cruel world ";
printf("|%s|\n", stringtrim(teste));
return 0;
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top