Right trim string function in C

peter

Hello C programmers,
I was wondering whether anybody knows of
a right trim string function available in C.

E.g. suppose I have the following:

char *str = "Hello Dolly \0";

Is there a right trim function that will remove the
trailing spaces and make *str look like "Hello Dolly\0"?

In Python there is a useful function for this: str.rstrip();
 
James Kuyper

peter said:
Hello C programmers,
I was wondering does anybody knows how or is there
a right trim string function available in C?

Eg. Suppose I have the following:

char *str = "Hello Dolly \0";

Is there a right trim function that will remove the
trailing spaces and make *str look like "Hello Dolly\0"?

Sorry, there's no standard library function that does this. If it's
something you need a lot, you should write your own routine - it won't
be hard.
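
A minimal sketch of such a routine, assuming the string lives in a modifiable array and that only the space character ' ' counts as trailing space. The name rtrim_spaces is made up for illustration; it is not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove trailing ' ' characters from s in place.
   s must point to a modifiable array, not a string literal. */
void rtrim_spaces(char *s)
{
    size_t n = strlen(s);

    /* Step back over trailing spaces; the n > 0 test stops at the start. */
    while (n > 0 && s[n - 1] == ' ')
        n--;

    s[n] = '\0';   /* new end of string */
}
```

Note that char *str = "Hello Dolly \0"; points at a string literal, and attempting to modify a string literal is undefined behavior. Declaring char str[] = "Hello Dolly "; gives a modifiable array that can be passed to a routine like this.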
 
Stefan Ram

peter said:
char *str = "Hello Dolly \0"; (...)
In PYTHON there is a useful function for this: str.rstrip();

Untested and assuming that the string is in a mutable buffer:

#include <ctype.h>
#include <string.h>

void rstrip( char * const string )
{ char * last = string + strlen( string )- 1;
while( last >= string && isspace( last ))*last-- = 0; }
 
Barry Schwarz

Stefan Ram said:
Untested and assuming that the string is in a mutable buffer:

#include <ctype.h>
#include <string.h>

void rstrip( char * const string )
{ char * last = string + strlen( string )- 1;
while( last >= string && isspace( last ))*last-- = 0; }

If the string consists entirely of blanks, is it undefined behavior
for last to end up pointing before the start of the string? This
value is not dereferenced but it is used one last time in the first
relational expression.

Why have all the solutions so far changed all the trailing blanks to
nuls? Wouldn't it be sufficient and more efficient to change only the
first one?

while( last > string && isspace( last-- ));
*++last = 0; }

which has the "advantage" of changing a string of all blanks to one
with a single blank instead of the empty string.
 
Morris Keesan

Stefan Ram said:
Untested and assuming that the string is in a mutable buffer:

#include <ctype.h>
#include <string.h>

void rstrip( char * const string )
{ char * last = string + strlen( string )- 1;
while( last >= string && isspace( last ))*last-- = 0; }

I believe that this invokes undefined behavior if strlen(string) == 0.
 
Stefan Ram

Morris Keesan said:
I believe that this invokes undefined behavior if strlen(string) == 0.

#include <ctype.h>
#include <string.h>

void rstrip( char * const string )
{ char * p = string + strlen( string );
while( p-- > string && isspace( p ))*p = 0; }
 
Stefan Ram

void rstrip( char * const string )
{ char * p = string + strlen( string );
while( p-- > string && isspace( p ))*p = 0; }

or, more careful:

void rstrip( char * const string )
{ char * p = string + strlen( string ); if( p > string )
while( p-- > string && isspace( p ))*p = 0; }
 
Ben Bacarisse

Both here and below you meant to write isspace(*p).

Stefan Ram said:
or, more careful:

void rstrip( char * const string )
{ char * p = string + strlen( string ); if( p > string )
while( p-- > string && isspace( p ))*p = 0; }

No, that has a similar problem. Unfortunately you've cut the context so
it won't be clear what you were correcting. The problem was
constructing an invalid pointer that points before the start of the
string and this code can also do that when the string is all spaces.

In addition to being careful about the pointers, you need to finesse the
mess that is isspace (and friends) when char might be signed. It's a
shame that what should be a simple function is really quite tricky.

char *rstrip(unsigned char *string)
{
char *ep = strchr(0);
while (ep > string && isspace(ep[-1])) --ep;
*ep = 0;
return string;
}

(The unsigned char * just is to avoid cluttering the code with a cast or
Tim's exotic compound literal union.)
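
A sketch of the same approach with a plain char * parameter, so that the cast appears at the isspace call. It assumes strchr(string, '\0') is used to find the terminator; the name rstrip_ws is made up for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical variant: strip trailing whitespace in place and
   return the original pointer. */
char *rstrip_ws(char *string)
{
    char *ep = strchr(string, '\0');   /* points at the terminator */

    /* ep[-1] is examined only while ep > string, so no pointer before
       the start of the array is ever formed. The cast keeps the
       argument to isspace representable as unsigned char. */
    while (ep > string && isspace((unsigned char)ep[-1]))
        --ep;

    *ep = '\0';
    return string;
}
```

With an all-whitespace or empty string, ep ends up equal to string and the result is the empty string.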
 
Ben Bacarisse

James Kuyper said:
Because that's what the OP requested (see above).

Only one blank needs to be changed to a null to satisfy the OP's
request. So far, the solutions have done more than that. It's not
wrong, but it is noteworthy.
 
James Kuyper

Ben Bacarisse said:
Only one blank needs to be changed to a null to satisfy the OP's
request. So far, the solutions have done more than that. It's not
wrong, but it is noteworthy.

He asked to have the trailing blanks removed. That's "blanks", plural -
removing only a single blank doesn't do the job. It's possible that what
you're suggesting is what he actually meant, but I've worked in
situations where it was important to zero-out the unused portion of a
buffer containing a string, and that's what I assumed he was looking
for. Several others seem to agree.
 
Ben Bacarisse

James Kuyper said:
On 02/10/2012 04:56 PM, Ben Bacarisse wrote:
...

strchr(string, 0)?

Yes, thanks. Glad I followed protocol in posting an error in a
correction :)
 
Ben Bacarisse

James Kuyper said:
James Kuyper said:
On 02/10/2012 02:30 PM, Barry Schwarz wrote:
On 10 Feb 2012 18:59:36 GMT, Stefan Ram wrote:

[snippage restored:]
Is there a right trim function that will remove the
trailing spaces and make *str look like "Hello Dolly\0"?
...
Why have all the solutions so far changed all the trailing blanks to
nuls? ...

Because that's what the OP requested (see above).

Only one blank needs to be changed to a null to satisfy the OP's
request. So far, the solutions have done more than that. It's not
wrong, but it is noteworthy.

He asked to have the trailing blanks removed. That's "blanks", plural -
removing only a single blank doesn't do the job. It's possible that what
you're suggesting is what he actually meant, but I've worked in
situations where it was important to zero-out the unused portion of a
buffer containing a string, and that's what I assumed he was looking
for. Several others seem to agree.

That seems unlikely. For one thing, people have not followed the
request literally. The OP said "spaces" and yet everyone has happily
ignored that and tested for isspace. Why take one thing literally and
ignore another?
 
Keith Thompson

James Kuyper said:
On 02/10/2012 05:15 PM, Ben Bacarisse wrote: [...]
Only one blank needs to be changed to a null to satisfy the OP's
request. So far, the solutions have done more than that. It's not
wrong, but it is noteworthy.

He asked to have the trailing blanks removed. That's "blanks", plural -
removing only a single blank doesn't do the job. It's possible that what
you're suggesting is what he actually meant, but I've worked in
situations where it was important to zero-out the unused portion of a
buffer containing a string, and that's what I assumed he was looking
for. Several others seem to agree.

Setting just the first trailing blank to '\0' does remove all the
trailing blanks *from the string*, but not from the array.
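
The difference between the two readings can be sketched as follows. Both function names are made up for illustration, and both assume a modifiable buffer.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical: write a single '\0' after the last non-space character.
   The string gets shorter; the rest of the array is left untouched. */
void rstrip_one_nul(char *s)
{
    size_t n = strlen(s);

    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--;
    s[n] = '\0';
}

/* Hypothetical: overwrite every trailing whitespace character with '\0',
   so the unused tail of the array is zeroed as well. */
void rstrip_zero_fill(char *s)
{
    size_t n = strlen(s);

    while (n > 0 && isspace((unsigned char)s[n - 1]))
        s[--n] = '\0';
}
```

Both give the same result as far as strlen, printf("%s") and the other string functions are concerned. They differ only in what the bytes after the new terminator contain.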

This is an argument for clear problem statements. This sometimes
requires a bit of back-and-forth to clarify ambiguities; it's not
realistic to expect that every problem statement will be completely
unambiguous in its initial form.
 
Stefan Ram

Ben Bacarisse said:
No, that has a similar problem. Unfortunately you've cut the context so
it won't be clear what you were correcting. The problem was
constructing an invalid pointer that points before the start of the
string and this code can also do that when the string is all spaces.

Not addressing the isspace issue here, I now have tried to
debug my code. I got the idea to use a language called
»C++«, which is in large parts like C, but allows me to
»overwrite« the decrement operator of my pointer, so as to
be sure to stay within the buffer. Using this, I found the
error you have described and corrected it:

#include <cstring>
#include <cctype>
#include <cstdio>

char text[] = " ";

class pointer
{ char * p; public:
pointer( char * const q ):p( q ){}
int operator>( char * const q )
{ return p > q; }
char * operator--(int)
{ char * old = p; p--;
printf( "%p %p\n", text, p );
return old; }
char & operator*(){ return *p; }};

void rstrip( char * const string )
{ pointer p = string + strlen( string ); if( p > string )
while( p > string && p-- && isspace( *p ))*p = 0; }

int main()
{ rstrip( text ); fputs( text, stdout ); puts( "|" ); }

00652010 00652012
00652010 00652011
00652010 00652010
|

The left column is the address of the first byte in the text
buffer and the right column is the address of p directly
after a decrement.
 
Shao Miller

peter said:
Hello C programmers,
I was wondering does anybody knows how or is there
a right trim string function available in C?

Eg. Suppose I have the following:

char *str = "Hello Dolly \0";

Is there a right trim function that will remove the
trailing spaces and make *str look like "Hello Dolly\0"?

In PYTHON there is a useful function for this: str.rstrip();

Maybe you would like this?:

/* Also available at http://ideone.com/JBT2Y */

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static int in_set(
register const int c,
register const char * chars
) {
register int test;

while ((test = *chars)) {
if (c == test)
return 1;
++chars;
continue;
}
return 0;
}

char * rstrip(
register const char * string,
register const char * const chars
) {
register char c;
register const char * cur_pos = string;
register const char * one_past_end = string + 1;
register ptrdiff_t diff;
register char * copy;

if (!chars) {
while ((c = *cur_pos)) {
if (!isspace(c))
one_past_end = cur_pos + 2;
++cur_pos;
continue;
}
} else {
while ((c = *cur_pos)) {
if (!in_set(c, chars))
one_past_end = cur_pos + 2;
++cur_pos;
continue;
}
}
diff = one_past_end - string;
copy = malloc(diff);
if (!copy)
return NULL;

--diff;
copy[diff] = '\0';
return memcpy(copy, string, diff);
}

int main(void) {
char *tests[] = {
"",
" ",
" ",
"f",
" f",
" f",
" f ",
" f ",
"f ",
"f ",
"foo bar baz",
" foo bar baz",
" foo bar baz",
" foo bar baz ",
" foo bar baz ",
"foo bar baz ",
"foo bar baz ",
"mississippi",
};
int i;
char * stripped;

for (i = 0; i < (sizeof tests / sizeof *tests) - 1; ++i) {
stripped = rstrip(tests[i], NULL);
if (stripped)
printf("[%s]\n", stripped);
free(stripped);
continue;
}
stripped = rstrip(tests[i], "ipz");
if (stripped)
printf("[%s]\n", stripped);
free(stripped);

return EXIT_SUCCESS;
}
 
Willem

peter wrote:
) Hello C programmers,
) I was wondering does anybody knows how or is there
) a right trim string function available in C?
)
) Eg. Suppose I have the following:
)
) char *str = "Hello Dolly \0";
)
) Is there a right trim function that will remove the
) trailing spaces and make *str look like "Hello Dolly\0"?
)
) In PYTHON there is a useful function for this: str.rstrip();

There's been a lot of code posted, all of which seemed to be using
pointers, so just for shits I'll post one that uses indexes:

i = strlen(s);
while (i > 0 && s[i-1] == ' ') i--;
s2 = malloc(i+1);
memcpy(s2, s, i);
s2[i] = 0;
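
Wrapped up as a complete function, the index version might look like the sketch below. The name rtrim_copy and the NULL-on-failure convention are assumptions added for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: return a newly allocated copy of s without
   trailing spaces, or NULL if allocation fails. The caller frees it. */
char *rtrim_copy(const char *s)
{
    size_t i = strlen(s);
    char *s2;

    /* i is the length of the part to keep. */
    while (i > 0 && s[i - 1] == ' ')
        i--;

    s2 = malloc(i + 1);
    if (s2 == NULL)
        return NULL;

    memcpy(s2, s, i);
    s2[i] = '\0';   /* terminate the copy */
    return s2;
}
```

Because the input is only read, this version also works on string literals.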


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Ben Bacarisse

Shao Miller said:
Hello C programmers,
I was wondering does anybody knows how or is there
a right trim string function available in C?

Eg. Suppose I have the following:

char *str = "Hello Dolly \0";

Is there a right trim function that will remove the
trailing spaces and make *str look like "Hello Dolly\0"?

In PYTHON there is a useful function for this: str.rstrip();

Maybe you would like this?:

/* Also available at http://ideone.com/JBT2Y */

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static int in_set(
register const int c,
register const char * chars
) {
register int test;

while ((test = *chars)) {
if (c == test)
return 1;
++chars;
continue;
}
return 0;
}

char * rstrip(
register const char * string,
register const char * const chars
) {
register char c;
register const char * cur_pos = string;
register const char * one_past_end = string + 1;
register ptrdiff_t diff;
register char * copy;

if (!chars) {
while ((c = *cur_pos)) {
if (!isspace(c))
one_past_end = cur_pos + 2;
++cur_pos;
continue;
}
} else {
while ((c = *cur_pos)) {
if (!in_set(c, chars))
one_past_end = cur_pos + 2;
++cur_pos;
continue;
}
}
diff = one_past_end - string;
copy = malloc(diff);
if (!copy)
return NULL;

--diff;
copy[diff] = '\0';
return memcpy(copy, string, diff);
}

This is a very good effort, but, I am sorry to say, I was able to
understand it eventually. You might want to study the code posted by
Stefan Ram. He is undoubtedly ahead of you in matters of pure layout,
but your use of a little-used statement in a position where it is both
suggestive and pointless is a stroke of genius!

If you want to develop your own layout further, I'd humbly suggest that
indenting a closing } by half the indent used for the body could be
improved. Yes, there's nothing to line the } up with visually, but an
indent of 2 is just about enough for the human eye to follow without
that clue.

As for the semantics, I thought the idea of keeping "one_past_end" one
greater than it needs to be (thereby toying, teasingly, with UB all the
time) was a very nice touch -- it's always nice to see a '+ 2' in a
string-walking loop. And what can I say about the omission of #include
<ctype.h>? I had to check the standard to see if the implied
declaration, together with the default argument promotions, made the
code valid. A masterful flourish.

Some things are low-blows, though. Did you really think anyone would be
confused by the fact that only one of the parameters to rstrip is const?
Shame on you! The register declarations hide it, but it's a fair bet
the people won't even notice!

<snip>
 
Shao Miller

Ben Bacarisse said:
This is a very good effort, but, I am sorry to say, I was able to
understand it eventually.

No need to apologize. I'm glad that you eventually understood it
without any comments to follow along with. If it had been a more
instructive example, it would have included comments.

Ben Bacarisse said:
You might want to study the code posted by
Stefan Ram.

I studied it before typing my code.

His code walks to the end of the string via 'strlen' and then walks
backwards to the first non-space. My code does not walk over elements
of the original string twice until the 'memcpy'.

His code does not return a trimmed copy of the original string. Mine
attempts to, akin to the OP's Python reference, which the OP apparently
enjoys.

His code does not accept a set of characters to be used for trimming the
right side of a string. Mine attempts to, akin to the OP's Python
reference, which the OP apparently enjoys.

Ben Bacarisse said:
He is undoubtedly ahead of you in matters of pure layout,

What do you mean? Stylistically? In regards to program flow? In
regards to object usage? His program and mine do different things, so...

Ben Bacarisse said:
but your use of a little-used statement in a position where it is both
suggestive and pointless is a stroke of genius!

Are you referring to the keyword (not statement) 'register'? If so,
it's a storage-class specifier. While it suggests that access to
so-specified objects "be as fast as possible," it also ensures that the
address of a so-specified object cannot be taken. In this fashion, it
is used in my code to prevent accidentally taking the address of the
corresponding objects. I believe that this is similar to how 'const'
can be removed from a well-behaved program without changing the
behaviour. Am I mistaken?

Ben Bacarisse said:
If you want to develop your own layout further, I'd humbly suggest that
indenting a closing } by half the indent used for the body could be
improved. Yes, there's nothing to line the } up with visually, but an
indent of 2 is just about enough for the human eye to follow without
that clue.

If you don't like it, I'm sorry about that. That's currently my
preference and maybe that will change some day. My rationale is that
any continuation of a statement be indented further than the beginning
line of the statement, as I think you noticed. I also sometimes do:

int ok;

ok = (
condition1 &&
condition2 &&
condition3 ||
condition4
);
if (ok)
foo();
else
bar();

instead of:

if (condition1 && condition2 &&
condition3 || condition4) {
foo();
}
else {
bar();
}

Whose 'else' is that, anyway?

If the conditions are someday changed, the 'diff' for the former looks a
bit different than the 'diff' of the latter. I enjoy the former more.

Ben Bacarisse said:
As for the semantics, I thought the idea of keeping "one_past_end" one
greater than it needs to be (thereby toying, teasingly, with UB all the
time) was a very nice touch -- it's always nice to see a '+ 2' in a
string-walking loop.

Why is it one greater than it needs to be? If 'c' is not the null
terminator, then one past the end of the string is at least two
characters away. That distance seems useful for the 'malloc'.

Ben Bacarisse said:
And what can I say about the omission of #include
<ctype.h>? I had to check the standard to see if the implied
declaration, together with the default argument promotions, made the
code valid. A masterful flourish.

Actually, it was absolutely unintentional. That's what I get for
hastily pasting code at 3:52 am. Thank you for pointing that out. I'm
glad that it worked out anyway. ;)

Ben Bacarisse said:
Some things are low-blows, though. Did you really think anyone would be
confused by the fact that only one of the parameters to rstrip is const?
Shame on you! The register declarations hide it, but it's a fair bet
the people won't even notice!

Another unintentional. At one point I had the 'const' in, but was
trying to save objects and considered re-purposing 'string', so I took
it out. I forgot to put it back in. Thank you for pointing that out,
too. Fortunately, 'const' and 'register' can be removed from the
entirety, but can be used to prevent accidents... Except for forgetting
to use them. ;)

Why didn't you point out the redundancy of my 'continue's? Doesn't it
seem like another "extra" like 'register' and 'const'?

I originally started writing the whole thing as a 'while' with no body
but a complex controlling expression. Then I thought it'd be clearer
without that.

Here is the corrected code:

/* Also available at http://ideone.com/Y5sz6 */

#include <stddef.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static int in_set(
register const int c,
register const char * chars
) {
register int test;

while ((test = *chars)) {
if (c == test)
return 1;
++chars;
continue;
}
return 0;
}

char * rstrip(
register const char * const string,
register const char * const chars
) {
register char c;
register const char * cur_pos = string;
register const char * one_past_end = string + 1;
register ptrdiff_t diff;
register char * copy;

if (!chars) {
while ((c = *cur_pos)) {
if (!isspace(c))
one_past_end = cur_pos + 2;
++cur_pos;
continue;
}
} else {
while ((c = *cur_pos)) {
if (!in_set(c, chars))
one_past_end = cur_pos + 2;
++cur_pos;
continue;
}
}
diff = one_past_end - string;
copy = malloc(diff);
if (!copy)
return NULL;

--diff;
copy[diff] = '\0';
return memcpy(copy, string, diff);
}

int main(void) {
char *tests[] = {
"",
" ",
" ",
"f",
" f",
" f",
" f ",
" f ",
"f ",
"f ",
"foo bar baz",
" foo bar baz",
" foo bar baz",
" foo bar baz ",
" foo bar baz ",
"foo bar baz ",
"foo bar baz ",
"mississippi",
};
int i;
char * stripped;

for (i = 0; i < (sizeof tests / sizeof *tests) - 1; ++i) {
stripped = rstrip(tests[i], NULL);
if (stripped)
printf("[%s]\n", stripped);
free(stripped);
continue;
}
stripped = rstrip(tests[i], "ipz");
if (stripped)
printf("[%s]\n", stripped);
free(stripped);

return EXIT_SUCCESS;
}
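
For comparison, a shorter sketch of the same Python-like interface (a returned copy, with an optional set of characters to strip). Unlike the code above, it calls strlen and then walks backwards, so it does pass over the trailing part of the string twice. The name rstrip_copy is made up for illustration, and strchr stands in for in_set.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alternative: return a malloc'd copy of string with
   trailing characters removed. If chars is NULL, whitespace is
   stripped; otherwise any character found in chars is stripped.
   Returns NULL if allocation fails. The caller frees the result. */
char *rstrip_copy(const char *string, const char *chars)
{
    size_t n = strlen(string);
    char *copy;

    if (chars == NULL) {
        while (n > 0 && isspace((unsigned char)string[n - 1]))
            n--;
    } else {
        /* string[n - 1] is never '\0' here, so strchr cannot
           match the terminator of chars by accident. */
        while (n > 0 && strchr(chars, string[n - 1]) != NULL)
            n--;
    }

    copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, string, n);
    copy[n] = '\0';
    return copy;
}
```

Calling rstrip_copy("mississippi", "ipz") yields "mississ", which matches what Python's "mississippi".rstrip("ipz") returns.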
 
