Trim string

Chris M. Thomasson

Eric Sosman said:
Chris said:
[...]
typed directly in newsreader, try and forgive any typos ;^o
[...]
#define xisspace(c) isspace((int)(c))

This isn't a typo; it's a thinko.
char* cur = buffer;
while (*cur && xisspace(*cur))
:^)



... and this is still wrong, for all the same old reasons.
Also redundant, as well in addition to boot also.

I have a habit of explicitly checking for the NUL character. It's
wrong in the sense of the `xisspace()' macro being defined as taking a
signed value. I already corrected that nonsense:

http://groups.google.com/group/comp.lang.c/msg/d9d69bc065ee78bf



See my detailed explanations to DavidRF, elsethread.

The ones that deal with negative values being passed to `isspace()'?
 
David RF

Chris M. Thomasson said:
David RF said:
Hi, anybody knows a better way to (right-left) trim a string? [...]

You could try doing it in place; something like this:
typed directly in newsreader, try and forgive any typos  ;^o
_______________________________________________________________
#include <string.h>
#include <ctype.h>
#define xisspace(c) isspace((int)(c))

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That should be:

#define xisspace(c) isspace((unsigned char)(c))

ARGHG!%$#!@

;^o


Is there a way to forbid the use of isspace() in *.c files while still
allowing xisspace()?
 
Bart

Bart said:



In which case you'd be well advised to pass in a buffer size too. The
result string can never be longer than the input string, so telling
the function how much space is available gives it the opportunity to
complain if that space is insufficient.

Yes, possibly. This is a special case where the destination *must* be
at least as large as the input string, otherwise it can't be
guaranteed to work, unless the caller has some knowledge of the
expected amount of white space to be trimmed.

Putting in a buffer size would require an error return scheme and
requires the caller to check for the error. All in all, you might as
well stick with the allocation method!
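
To make that trade-off concrete, here is a minimal sketch of the buffer-size variant with an error return. The name `trim_into', its parameter order and the 0 / -1 return convention are assumptions for illustration; none of them come from the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: copy the trimmed form of src
   into dst, which has room for dst_size bytes.
   Returns 0 on success, -1 if the trimmed text plus its NUL does not fit. */
int trim_into(char *dst, size_t dst_size, const char *src)
{
    const char *end;
    size_t len;

    assert(dst != NULL && src != NULL);

    /* Skip leading white space; isspace('\0') is false, so this stops at NUL. */
    while (isspace((unsigned char)*src))
        ++src;

    /* Walk back over trailing white space. */
    end = src + strlen(src);
    while (end > src && isspace((unsigned char)end[-1]))
        --end;

    len = (size_t)(end - src);
    if (len + 1 > dst_size)
        return -1;              /* the caller has to check for this */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

A caller that sizes dst like the source array (for example `char dst[sizeof src];') can never hit the -1 case, which is why the check looks like pure overhead in the common situation.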
 
David RF

Richard Heathfield said:

If there were, and if you availed yourself of that interdict, how
could xisspace possibly work? It's a macro. Wherever xisspace(x) is
seen in the code, the preprocessor will replace it with
isspace((unsigned char)(x)), which would fall foul of the interdict.

The best you can do, I think, is to forbid it not via your
implementation but via your project coding standards. You can then
search for unwrapped isspace calls in the source (grep is your friend
here), and raise them as project coding standard violations at code
review time.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Yes, it is a stupid question
 
Moi

On Fri, 28 Aug 2009 04:42:26 -0700, David RF wrote:

A different approach: */

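#include <string.h>
#include <stdlib.h>

/* Reverse the first len characters of str in place (len == 0: whole string). */
void rev(char *str, size_t len)
{
    char *end;

    if (!len) len = strlen(str);
    if (len < 2) return;

    for (end = str + len - 1; end > str; end--, str++) {
        char tmp = *str;
        *str = *end;
        *end = tmp;
    }
}

/* Return a newly allocated, trimmed copy of str, or NULL if malloc fails.
   len is the length of str; pass 0 to have it measured with strlen(). */
char *trim(const char *str, size_t len)
{
    char *new;
    size_t cnt;

    /* skip leading white space */
    cnt = strspn(str, " \t\n\r\f\v");
    str += cnt;
    if (!len) len = strlen(str);
    else len -= cnt;

    new = malloc(len + 1);
    if (!new) return NULL;

    if (len) memcpy(new, str, len);
    new[len] = 0;
    /* if (!len) we could return early here */

    /* reverse the copy so that trailing white space becomes leading */
    rev(new, len);
    cnt = strspn(new, " \t\n\r\f\v");

    /* copy the text again in forward order, without the trailing white space */
    len -= cnt;
    memcpy(new, str, len);
    new[len] = 0;

    /* if (cnt) we could realloc() here */
    return new;
}

#include <stdio.h>
int main(void)
{
    const char *src = "\v Hello,\tWorld !\f\n";
    char *dst;

    dst = trim(src, 0);
    if (!dst) return EXIT_FAILURE;
    printf("[%s] -> [%s]\n", src, dst);
    free(dst);

    return 0;
}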


/*
It does some more scanning and copying, but has the advantage that
both leading and trailing white space are scanned using strspn().


HTH,
AvK
 
Guest

| Hi, anybody knows a better way to (right-left) trim a string?

Can you define what you mean by this? I've learned through the years to
never trust implementation code as a description of a problem.

For nul-terminated strings, which are the standard representation in C
(though you can freely do it in other ways, as I have), here is how I would
get a substring from the middle of a string that I may modify in place:
advance a pointer to the beginning of the substring, and store a nul
character in the position just past the substring. If modification is not
possible, allocate a new string (unless space has already been set aside
for it), copy the substring, and add the termination at the end of the copy.
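
As a rough sketch of the in-place case described above (the name `substr_inplace' and its offset/length parameters are made up for illustration, not taken from any post in the thread):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the substring of s that starts at offset
   start and is at most len characters long. It works by storing a NUL in s
   just past the substring, so s must be writable. */
char *substr_inplace(char *s, size_t start, size_t len)
{
    size_t total;

    assert(s != NULL);

    /* Clamp the request to the actual string. */
    total = strlen(s);
    if (start > total)
        start = total;
    if (len > total - start)
        len = total - start;

    s[start + len] = '\0';   /* terminate just past the substring */
    return s + start;        /* advance to its beginning */
}
```

Trimming is the special case where start and len are found by scanning for white space instead of being passed in.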
 
Jorgen Grahn

David RF wrote: ....
/* Trim left and right spaces returning a new allocated string or NULL
*/
char *sstrip(const char *s);
char *sstrip(const char *s)
{ ....
if (*s) for (len = strlen(s); isspace(s[len - 1]); len--);
value = malloc(len + 1);
if (value == NULL) {
fprintf(stderr, "%s\n", "Buy more RAM!!");
exit(0);

Okay in a toy program. A production-quality function
would most likely return NULL to let the caller decide what
to do; a low-level function like this one is in a poor position
to make global decisions about the program's life or death.

IMHO it's not a good idea to malloc in a function like this. The user
should supply the destination buffer (which he can malloc if he wants).

Things like this are usually used for parsing text files, and may be
called millions of times. Typically the trimmed string is thrown away
almost immediately. The huge overhead from a million malloc/free pairs
serves no purpose in such a scenario.

/Jorgen
 
Chris M. Thomasson

You could try doing it in place; something like this:

typed directly in newsreader, try and forgive any typos ;^o
[...]

Cool!!

:^)




Here is a little simplification in the adjustment phase:
_____________________________________________________________
if (start != buffer)
{
  memmove(buffer, start, cur - start);

  if (! end)
  {
    buffer[cur - start] = '\0';
  }
}

if (end)
{
  buffer[end - start] = '\0';
}
_____________________________________________________________




One other improvement/simplification could be to skip the call to
`memmove()' altogether and just return a pointer to the `start' location:
_____________________________________________________________
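char*
trim_string(char* const buffer)
{
  char* cur = buffer;

  /* skip leading white space */
  while (*cur && xisspace(*cur))
  {
    ++cur;
  }

  if (*cur)
  {
    char* end = NULL;
    char* start = cur;

    ++cur;

    /* `end' tracks the start of the last run of white space */
    while (*cur)
    {
      if (xisspace(*cur))
      {
        if (! end)
        {
          end = cur;
        }
      }

      else if (end)
      {
        end = NULL;
      }

      ++cur;
    }

    if (end)
    {
      *end = '\0';
    }

    return start;
  }

  /* empty or all white space: `cur' points at the terminating NUL, so
     returning it yields an empty string instead of the untrimmed input */
  return cur;
}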
_____________________________________________________________




This would limit the number of mutations to the `buffer' down to a single
store of a terminating NUL character.
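
One consequence worth spelling out: the returned pointer may point into the middle of the buffer, so if the buffer came from malloc() the caller has to keep the original pointer around for free(). A minimal sketch, assuming a compact stand-in called `trim_start' in place of the function above and a made-up demo function `trimmed_length_of_copy':

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Compact stand-in for the thread's trim_string(): stores a single NUL
   after the last non-space character and returns a pointer into buffer. */
static char *trim_start(char *buffer)
{
    char *end;

    assert(buffer != NULL);
    while (isspace((unsigned char)*buffer))
        ++buffer;
    end = buffer + strlen(buffer);
    while (end > buffer && isspace((unsigned char)end[-1]))
        --end;
    *end = '\0';
    return buffer;
}

/* Hypothetical demo: copies text into a malloc'd block, trims the copy and
   returns the trimmed length, or (size_t)-1 if malloc fails. */
size_t trimmed_length_of_copy(const char *text)
{
    size_t n;
    char *block;
    char *view;

    assert(text != NULL);
    block = malloc(strlen(text) + 1);   /* this pointer owns the memory */
    if (block == NULL)
        return (size_t)-1;
    strcpy(block, text);

    view = trim_start(block);   /* may point past the start of block */
    n = strlen(view);

    free(block);                /* free(view) is undefined if view != block */
    return n;
}
```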
 
Richard Bos

David RF said:
Must I cast all params passed to the ctype functions, or is it better to
write my own safe functions?

That depends on where you get it from. If you have a char, cast it to
unsigned char. If, OTOH, you have an int that you just got from
getchar() or something related, you can (and should) pass it directly to
the <ctype.h> functions.

Richard
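
The two cases described above can be sketched side by side. The function names are invented for this example; only isspace() and getc() are standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Values read from char objects may be negative where plain char is signed,
   so each one is converted to unsigned char before calling isspace(). */
size_t count_space_in_string(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; ++s)
        if (isspace((unsigned char)*s))
            ++n;
    return n;
}

/* getc() already returns an unsigned char value converted to int, or EOF,
   which is exactly what the <ctype.h> functions accept: no cast needed. */
size_t count_space_in_stream(FILE *fp)
{
    size_t n = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        if (isspace(c))
            ++n;
    return n;
}
```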
 
David RF

IMHO it's not a good idea to malloc in a function like this. The user
should supply the destination buffer (which he can malloc if he wants).

Things like this are usually used for parsing text files, and may be
called millions of times. Typically the trimmed string is thrown away
almost immediately. The huge overhead from a million malloc/free pairs
serves no purpose in such a scenario.

How can you trim (left and right trim) a (const char *) if you don't
allocate a new string?

glib does this (without malloc):

gchar*
g_strchug (gchar *string)
{
  guchar *start;

  g_return_val_if_fail (string != NULL, NULL);

  for (start = (guchar*) string; *start && g_ascii_isspace (*start); start++)
    ;

  g_memmove (string, start, strlen ((gchar *) start) + 1);

  return string;
}

gchar*
g_strchomp (gchar *string)
{
  gsize len;

  g_return_val_if_fail (string != NULL, NULL);

  len = strlen (string);
  while (len--)
    {
      if (g_ascii_isspace ((guchar) string[len]))
        string[len] = '\0';
      else
        break;
    }

  return string;
}

/* and finally trim */
#define g_strstrip( string ) g_strchomp (g_strchug (string))

Not tested, but these functions fail when a const char* (read-only string)
is passed, because they write to it via memmove and string[len] = '\0';

Please excuse my poor English
 
James Kuyper

David said:
How can you trim (left and right trim) a (const char *) if you don't
allocate a new string?

Put the trimmed string into memory provided by the caller of the
function, rather than by the trimming function itself. This might be
dynamically allocated memory, but it doesn't have to be.
 
David RF

Put the trimmed string into memory provided by the caller of the
function, rather than by the trimming function itself. This might be
dynamically allocated memory, but it doesn't have to be.

Yes!!

char *trim(const char *orig, char *dest)

Thanks to all posters of this thread
 
Keith Thompson

David RF said:
Yes!!

char *trim(const char *orig, char *dest)

Thanks to all posters of this thread

One disadvantage of this is that the caller has to ensure that
dest points to (the first element of) an array big enough to hold
the result. In this case, making the destination array the same
size as the source array is (more than) sufficient.

(BTW, for symmetry, I'd probably call the parameters src and dest
rather than orig and dest.)
 
Chris M. Thomasson

David RF said:
Yes!!

char *trim(const char *orig, char *dest)

Thanks to all posters of this thread

You could try something like this:
__________________________________________________________________
#include <string.h>
#include <assert.h>
#include <ctype.h>


#define xisspace(c) isspace((unsigned char)(c))


char*
trim_string(char* const buffer,
            char** const end_buffer)
{
  char* cur = buffer;

  while (*cur && xisspace(*cur))
  {
    ++cur;
  }

  if (*cur)
  {
    char* end = NULL;
    char* start = cur;

    ++cur;

    while (*cur)
    {
      if (xisspace(*cur))
      {
        if (! end)
        {
          end = cur;
        }
      }

      else if (end)
      {
        end = NULL;
      }

      ++cur;
    }

    if (end_buffer)
    {
      if (end)
      {
        *end_buffer = end;
      }

      else
      {
        *end_buffer = cur;
      }
    }

    return start;
  }

  else if (end_buffer)
  {
    *end_buffer = buffer;
  }

  return buffer;
}




#include <stdio.h>


void
print_trim_string(char* buffer)
{
  char* end;
  char const* start = trim_string(buffer, &end);
  char tmp = *end;

  *end = '\0';
  printf("(%lu)->%s<-\n\n",
         (unsigned long int)(end - start),
         start);
  *end = tmp;
}


int main(void)
{
  char name1[] = " Hello World! ";
  char name2[] = "123 - 456 - 768 ";
  char name3[] = " a b c d";
  char name4[] = "a b c d";
  char name5[] = "Hello";
  char name6[] = " ";
  char name7[] = "";

  trim_string(name1, NULL);
  trim_string(name2, NULL);
  trim_string(name3, NULL);
  trim_string(name4, NULL);
  trim_string(name5, NULL);
  trim_string(name6, NULL);
  trim_string(name7, NULL);

  print_trim_string(name1);
  print_trim_string(name2);
  print_trim_string(name3);
  print_trim_string(name4);
  print_trim_string(name5);
  print_trim_string(name6);
  print_trim_string(name7);

  return 0;
}

__________________________________________________________________




In this version `trim_string()' does not mutate the caller-provided
`buffer'. Instead, it simply returns information on exactly where the
trimmed string starts and ends. The caller can do whatever she/he wants with
this information. As you can see, `print_trim_string()' temporarily swaps the
`end' character with a NUL, prints the trimmed string along with its length
and restores the character it previously swapped out.
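
Once the start and end of the trimmed text are known, the temporary NUL can also be avoided altogether by giving printf-style functions a precision with `%.*s', which works even on a read-only source. A small sketch of that idea; `trim_span' and `format_trimmed' are hypothetical names used only here, and snprintf() assumes a C99 library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: locate the trimmed part of s without writing to it.
   Returns a pointer to its first character and stores its length in *len. */
const char *trim_span(const char *s, size_t *len)
{
    const char *end;

    assert(s != NULL && len != NULL);
    while (isspace((unsigned char)*s))
        ++s;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        --end;
    *len = (size_t)(end - s);
    return s;
}

/* Format the trimmed text using a precision instead of a temporary NUL.
   The (int) cast assumes the trimmed length fits in an int. */
int format_trimmed(char *out, size_t out_size, const char *s)
{
    size_t len;
    const char *start = trim_span(s, &len);

    return snprintf(out, out_size, "(%lu)->%.*s<-",
                    (unsigned long)len, (int)len, start);
}
```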


This also allows you to easily create another dynamically created buffer if
you wish:
__________________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


char*
create_trim_string(char const* buffer)
{
  char const* end;
  char const* start = trim_string((char*)buffer, (char**)&end);
  char* new_buffer = malloc(end - start + 1);

  if (new_buffer)
  {
    memcpy(new_buffer, start, end - start);
    new_buffer[end - start] = '\0';
  }

  return new_buffer;
}


int main(void)
{
  char const name1[] = " Hello World! ";
  char* buffer = create_trim_string(name1);

  if (buffer)
  {
    puts(buffer);
    free(buffer);
  }

  return 0;
}
__________________________________________________________________




Any thoughts?
 
Chris M. Thomasson

[...]
This also allows you to easily create another dynamically created buffer
if you wish:
__________________________________________________________________
#include <stdio.h>
#include <stdlib.h>


char*
create_trim_string(char const* buffer)
{
  char* const end;
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

That should be `char const* end;' of course!

;^/

  char const* start = trim_string((char*)buffer, (char**)&end);
  char* new_buffer = malloc(end - start + 1);

  if (new_buffer)
  {
    memcpy(new_buffer, start, end - start);
    new_buffer[end - start] = '\0';
  }

  return new_buffer;
}


int main(void)
{
  char const name1[] = " Hello World! ";
  char* buffer = create_trim_string(name1);

  puts(buffer);
  free(buffer);

  return 0;
}
__________________________________________________________________
 
Chris M. Thomasson

Chris M. Thomasson said:
David RF said:
Yes!!

char *trim(const char *orig, char *dest)

Thanks to all posters of this thread

You could try something like this:
__________________________________________________________________ [...]
__________________________________________________________________




In this version `trim_string()' does not mutate the caller-provided
`buffer'. Instead, it simply returns information on exactly where the
trimmed string starts and ends. The caller can do whatever she/he wants
with this information. As you can see, `print_trim_string()' temporarily
swaps the `end' character with a NUL, prints the trimmed string along with
its length and restores the character it previously swapped out.


This also allows you to easily create another dynamically created buffer
if you wish:
__________________________________________________________________ [...]
__________________________________________________________________

Or even return the size of the newly allocated string to perhaps allow the
caller to avoid a call to `strlen()', or whatever:
__________________________________________________________________
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


char*
create_trim_string(char const* buffer, size_t* size)
{
  char const* end;
  char const* start = trim_string((char*)buffer, (char**)&end);
  char* new_buffer = malloc((end - start) + 1);

  if (new_buffer)
  {
    memcpy(new_buffer, start, end - start);
    new_buffer[end - start] = '\0';
  }

  /* the trimmed length is reported even if the allocation failed */
  if (size) *size = end - start;

  return new_buffer;
}


int main(void)
{
  size_t size;
  char const name1[] = " Hello World! ";
  char* buffer = create_trim_string(name1, &size);

  if (buffer)
  {
    printf("(%lu)->%s<-\n", (unsigned long int)size, buffer);
    free(buffer);
  }

  return 0;
}

__________________________________________________________________
 
Nick Keighley

/* Trim (right-left) and returns a new allocated string or NULL */
char *sstrip(const char *s);

This is a declaration of the function. It acts as a prototype
for the function.
char *sstrip(const char *s)

This is a definition of the function. It also acts as
a prototype for the function. You don't need both of them.
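
A small sketch of when the separate declaration is and is not needed. The function names are invented for this example and have nothing to do with sstrip().

```c
#include <assert.h>
#include <stddef.h>

/* Case 1: the definition comes before any call. It serves as its own
   prototype, so a separate declaration would be redundant. */
static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/* Case 2: a call appears before the definition (or in another file, usually
   via a header), so a declaration is needed ahead of the call. */
size_t count_blanks(const char *s);

size_t count_blanks_in_pair(const char *a, const char *b)
{
    /* These calls are checked against the declaration above. */
    return count_blanks(a) + count_blanks(b);
}

size_t count_blanks(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; ++s)
        if (is_blank(*s))   /* checked against the definition of is_blank */
            ++n;
    return n;
}
```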
 
David Thompson

<OT!>
On Fri, 28 Aug 2009 09:56:47 -0400, Eric Sosman wrote:
1a) Negative character values sometimes go unnoticed because
the characters in the "basic execution set" -- roughly speaking,
those that the Standard requires -- are all non-negative. Merkuns
are particularly likely to forget about negative characters, since
their impoverished repertoire of characters is mostly covered by
the basic execution set, and hence non-negative.

Damn! As a voting and newswatching merkin (is that an oxymoron?)
I knew there was something left out of all the recent bailouts and
stimulus packages, but I just couldn't remember what. It was those
poor friendless character sets! We need to strengthen our character
sets! <Strangelove> We must not allow a character set gap!

<G>
 
