Right trim string function in C


Shao Miller

Are you referring to the keyword (not statement) 'register'? If so, it's
a storage-class specifier. While it suggests that access to so-specified
objects "be as fast as possible," it also ensures that the address of a
so-specified object cannot be taken. In this fashion, it is used in my
code to prevent accidentally taking the address of the corresponding
objects. I believe that this is similar to how 'const' can be removed
from a well-behaved program without changing the behaviour. Am I mistaken?
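
A rough sketch of the address-taking point. The function name `sum_to` is made up for illustration and does not come from the code under discussion. Applying `&` to an object declared `register` is a constraint violation, so a conforming compiler has to diagnose the commented-out line if it is enabled.

```c
/* Hypothetical example: sum the integers 1..n. */
int sum_to(int n)
{
    register int total = 0;   /* the address of 'total' may not be taken */
    register int i;

    for (i = 1; i <= n; i++)
        total += i;

    /* int *p = &total; */    /* enabling this line must produce a diagnostic */
    return total;
}
```

Removing both `register` keywords leaves the behaviour of this function unchanged, which is the sense in which the specifier acts only as a guard.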


Why didn't you point out the redundancy of my 'continue's? Doesn't it
seem like another "extra" like 'register' and 'const'?

It just occurred to me that you were talking about the jump statement
'continue', rather than the storage-class specifier 'register'. It's in
there as a habit because of the ability to hook it with a macro for
debugging purposes.

#if DEBUG_ITERATIONS
#define continue \
if (1) { \
puts("Continuing iteration"); \
continue; \
} else do ; while (0)
#endif
 

August Karlstrom

Hello C programmers,
I was wondering does anybody knows how or is there
a right trim string function available in C?

No, not in the standard library, but it is not hard to implement.
Remember that since strings are constant objects you first need to copy
the string to a character array.

#include <ctype.h>
#include <string.h>

void trim_right(char *s)
{
    int i;

    i = strlen(s) - 1;
    while ((i >= 0) && isspace(s[i])) {
        i--;
    }
    s[i + 1] = '\0';
}


August
 

Willem

Willem wrote:
) peter wrote:
) ) Hello C programmers,
) ) I was wondering does anybody knows how or is there
) ) a right trim string function available in C?
) )
) ) Eg. Suppose I have the following:
) )
) ) char *str = "Hello Dolly \0";
) )
) ) Is there a right trim function that will remove the
) ) trailing spaces and make *str look like "Hello Dolly\0"?
) )
) ) In PYTHON there is a useful function for this: str.rstrip();
)
) There's been a lot of code posted, all of which seemed to be using
) pointers, so just for shits I'll post one that uses indexes:
)
) i = strlen(s);
) while (i > 0 && s[i-1] == ' ') i--;
) s2 = malloc(i+1);
) memcpy(s2, s, i);
) s2[i] = 0;

And if you really want to avoid the forwards/backwards seeking,
(but note that strlen is probably a lot faster at seeking forward than
a manual loop, so for long strings this could very well be slower)
use this:

i = 0;
for (j = 0; s[j]; j++) if (s[j] != ' ') i = j+1;
s2 = malloc(i+1);
memcpy(s2, s, i);
s2[i] = 0;

But like I said, that will probably be slower in most cases.
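
For reference, a sketch of the first index-based fragment wrapped into a complete function. The name `rtrim_copy` and the NULL-on-failure convention are assumptions added here, and the final assignment is assumed to be meant as `s2[i] = 0`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns a malloc'd copy of s without trailing
   spaces, or NULL if allocation fails. The caller frees the result. */
char *rtrim_copy(const char *s)
{
    size_t i = strlen(s);
    char *s2;

    while (i > 0 && s[i - 1] == ' ')
        i--;                    /* i ends up as the trimmed length */

    s2 = malloc(i + 1);
    if (s2 == NULL)
        return NULL;

    memcpy(s2, s, i);
    s2[i] = 0;                  /* terminate the copy */
    return s2;
}
```

Because the input is only read, this version can be called directly on a string literal.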


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

Stefan Ram

Ben Bacarisse said:
In addition to being careful about the pointers, you need to finesse the
mess that is isspace (and friends) when char might be signed.

Ok, I admit that I do not understand this. After

#include <ctype.h>

, and in the scope of something like

char c;

, the expression

isspace( c )

will be nonzero, when c is a standard white-space character
(or is one of a locale-specific set of characters).

Where is the problem Ben refers to?
 

Ben Bacarisse

August Karlstrom said:
Hello C programmers,
I was wondering does anybody knows how or is there
a right trim string function available in C?

No, not in the standard library, but it is not hard to
implement. Remember that since strings are constant objects you first
need to copy the string to a character array.

#include <ctype.h>
#include <string.h>

void trim_right(char *s)
{
int i;

i = strlen(s) - 1;
while ((i >= 0) && isspace(s[i])) {
i--;
}
s[i + 1] = '\0';
}


Odd things can happen when strlen(s) == 0 because strlen(s) - 1 is a
large positive number which may or may not fit into an int (it's worse
if it does, though that is very unlikely). Of course, since the
conversion to int is implementation defined, you may get -1 exactly as
you expect, but that just means the problem might not show up in
testing. You can avoid the problem by sticking to unsigned arithmetic:

size_t i = strlen(s);
while (i-- > 0 && isspace(s[i]))
    /* do nothing */;
s[i + 1] = '\0';

Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).
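
A small sketch of both halves of this point. The names `length_minus_one` and `trim_right_unsigned` are made up for illustration, the loop is assumed to test `s[i]`, and the `(unsigned char)` cast is an addition that the fragment above does not have.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* strlen returns size_t, so for an empty string this yields the largest
   size_t value rather than -1. */
size_t length_minus_one(const char *s)
{
    return strlen(s) - 1;
}

/* The unsigned loop wrapped in a function (hypothetical name). */
void trim_right_unsigned(char *s)
{
    size_t i = strlen(s);

    /* When the loop stops, i is the index of the last character to keep,
       or has wrapped around to the largest size_t value if there is none.
       Either way i + 1 is where the terminator goes. */
    while (i-- > 0 && isspace((unsigned char)s[i]))
        /* do nothing */;
    s[i + 1] = '\0';
}
```

For an empty or all-whitespace string, `i` wraps on the last decrement and `i + 1` wraps back to 0, so the terminator lands at `s[0]`.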
 

Keith Thompson

Ok, I admit that I do not understand this. After

#include <ctype.h>

, and in the scope of something like

char c;

, the expression

isspace( c )

will be nonzero, when c is a standard white-space character
(or is one of a locale-specific set of characters).

Where is the problem Ben refers to?

N1570 7.4p1:

... the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of
the macro EOF. If the argument has any other value, the behavior
is undefined.

If plain char is signed and the value of c is negative (and not
equal to EOF), then the behavior of isspace(c) is undefined.

This can be avoided by writing:

isspace((unsigned char)c)
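
One way to keep the cast in a single place is a small wrapper. This is only a sketch, and the name `is_space_char` is hypothetical.

```c
#include <ctype.h>

/* Hypothetical helper: classify a plain char safely even when char is signed. */
int is_space_char(char c)
{
    /* Convert to unsigned char first so the int passed to isspace is
       always in the range 0..UCHAR_MAX. */
    return isspace((unsigned char)c) != 0;
}
```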
 

Ben Bacarisse

Ok, I admit that I do not understand this. After

#include <ctype.h>

, and in the scope of something like

char c;

, the expression

isspace( c )

will be nonzero, when c is a standard white-space character
(or is one of a locale-specific set of characters).

Where is the problem Ben refers to?

7.4 p1:

The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the value
of which shall be representable as an unsigned char or shall equal the
value of the macro EOF. If the argument has any other value, the
behavior is undefined.

If char is signed, not all values of (int)c (the effect of the argument
promotion) are representable as an unsigned char. Also, there may be
a character for which (int)c == EOF.

If you know that char is unsigned or that all values of c are positive,
then there's no problem, but in portable code you need to take some
evasive action. The "usual" solution is to write

isspace((unsigned char)c);

but there was a recent thread where it was suggested that is not always
100% portable and correct depending on where the characters come from
and what representation the machine uses.
 

Ben Bacarisse

Shao Miller said:
No need to apologize. I'm glad that you eventually understood it
without any comments to follow along with. If it had been a more
instructive example, it would have included comments.

My post was an attempt at humour that seems to have failed. Sorry about
that. Because style discussions are generally unproductive (people who
use non-standard styles always know that the reasons they do it are
worth it) I wanted to say "Personally, I don't like your choices and BTW
you forgot to include ctype.h" in a more interesting way than that.

I'm sorry if it was not funny.
What do you mean? Stylistically? In regards to program flow? In
regards to object usage? His program and mine do different things,
so...

Yes, stylistically. I find his highly non-standard layout very hard to
read.
Are you referring to the keyword (not statement) 'register'?

No, as you later posted it was the continue at the bottom of every loop.
If so,
it's a storage-class specifier. While it suggests that access to
so-specified objects "be as fast as possible," it also ensures that
the address of a so-specified object cannot be taken. In this
fashion, it is used in my code to prevent accidentally taking the
address of the corresponding objects.

You see? I was sure you'd have a reason for that. Just chuckle at my
Luddite failure to perceive the value of it.

Why is it one greater than it needs to be? If 'c' is not the null
terminator, then one past the end of the string is at least two
characters away. That distance seems useful for the 'malloc'.

It's swings and roundabouts. You need to --diff to set the null and do
the memcpy and if you keep the pointer "nearer" you need to +1 when you
malloc. The latter is, at least, a common idiom. I.e.:

char c;
const char * cur_pos = string;
const char * one_past_end = string;
ptrdiff_t diff;
char * copy;

if (!chars) {
    while ((c = *cur_pos++))
        if (!isspace(c))
            one_past_end = cur_pos;
}
else {
    while ((c = *cur_pos++))
        if (!in_set(c, chars))
            one_past_end = cur_pos;
}
diff = one_past_end - string;
copy = malloc(diff + 1);
if (!copy)
    return NULL;

copy[diff] = '\0';
return memcpy(copy, string, diff);

Personally, I'd then change "diff" to "length". I've also moved the ++
of cur_pos to the loop, but if you don't like that you can go back to

while ((c = *cur_pos)) {
    if (!isspace(c))
        one_past_end = cur_pos + 1;
    ++cur_pos;
}

but it's then clear that the ++cur_pos could go above the "if" and thus
into the loop condition.

<snip>
 

Malcolm McLean

No, not in the standard library, but it is not hard to
implement. Remember that since strings are constant objects you first
need to copy the string to a character array.
#include <ctype.h>
#include <string.h>
void trim_right(char *s)
{
   int i;
   i = strlen(s) - 1;
   while ((i >= 0) && isspace(s[i])) {
           i--;
   }
   s[i + 1] = '\0';
}


Odd things can happen when strlen(s) == 0 because strlen(s) - 1 is a
large positive number which may or may not fit into an int (it's worse
if it does, though that is very unlikely).  Of course, since the
conversion to int is implementation defined, you may get -1 exactly as
you expect, but that just means the problem might not show up in
testing.  You can avoid the problem by sticking to unsigned arithmetic:

  size_t i = strlen(s);
  while (i-- > 0 && isspace(s[i]))
      /* do nothing */;
  s[i + 1] = '\0';

Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).
 

August Karlstrom

August Karlstrom said:
#include<ctype.h>
#include<string.h>

void trim_right(char *s)
{
int i;

i = strlen(s) - 1;
while ((i>= 0)&& isspace(s[i])) {
i--;
}
s[i + 1] = '\0';
}


Odd things can happen when strlen(s) == 0 because strlen(s) - 1 is a
large positive number which may or may not fit into an int (it's worse
if it does, though that is very unlikely). Of course, since the
conversion to int is implementation defined, you may get -1 exactly as
you expect, but that just means the problem might not show up in
testing.


Thanks for pointing that out. I somewhat carelessly assumed that strlen
returns an int.
You can avoid the problem by sticking to unsigned arithmetic:

size_t i = strlen(s);
while (i--> 0&& isspace(s[i]))
/* do nothing */;
s[i + 1] = '\0';

Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).


For empty strings and strings containing only whitespace you rely on two
unsigned integer wrap-arounds. Nothing wrong with that, though I find
the side-effectful loop guard hard to understand - could you maybe read
its meaning aloud. ;-)

Sometimes unsigned types create more problems than they solve so I would
probably just make sure that the string is not insanely long:

#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>

void trim_right(char *s)
{
    size_t len;
    int i;

    len = strlen(s);
    assert(len <= INT_MAX);
    i = len - 1;
    while ((i >= 0) && isspace(s[i])) {
        i--;
    }
    s[i + 1] = '\0';
}


August
 

Barry Schwarz

No, not in the standard library, but it is not hard to implement.
Remember that since strings are constant objects you first need to copy
the string to a character array.

While a string literal should be treated as if it were const, a string
in general need not be constant in any sense of the word.
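
A short sketch of the distinction, using a hypothetical function `trimmed_length_demo`: an array initialised from a literal is an ordinary modifiable object, while the literal itself must not be written to.

```c
#include <string.h>

/* Hypothetical demonstration: trim trailing spaces in a modifiable array
   and return the resulting length. */
size_t trimmed_length_demo(void)
{
    char buf[] = "Hello Dolly   ";   /* array initialised from a literal: writable */
    /* const char *lit = "Hello Dolly   ";   writing through lit would be undefined */
    size_t n = strlen(buf);

    while (n > 0 && buf[n - 1] == ' ')
        buf[--n] = '\0';             /* in-place write is fine on an array */

    return strlen(buf);
}
```

So no copy is needed when the caller already holds the string in an array or in allocated memory.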
 

Shao Miller

My post was an attempt at humour that seems to have failed. Sorry about
that.

I suspected, but wasn't 100% sure. Now that it's certain, I'll say that
the post was pretty funny, even if thoroughly punishing. :)
Because style discussions are generally unproductive (people who
use non-standard styles always know that the reasons they do it are
worth it) I wanted to say "Personally, I don't like your choices and BTW
you forgot to include ctype.h" in a more interesting way than that.

I'm sorry if it was not funny.

It was. It's still too bad that it doesn't appeal to everyone, though,
and is worth consideration in a group project.
Yes, stylistically. I find his highly non-standard layout very hard to
read.

')*p ' threw me off for a few seconds.
No, as you later posted it was the continue at the bottom of every loop.

++oops; --oops;
If so,
it's a storage-class specifier. While it suggests that access to
so-specified objects "be as fast as possible," it also ensures that
the address of a so-specified object cannot be taken. In this
fashion, it is used in my code to prevent accidentally taking the
address of the corresponding objects.

You see? I was sure you'd have a reason for that. Just chuckle at my
Luddite failure to perceive the value of it.

*Chuckle*
Why is it one greater than it needs to be? If 'c' is not the null
terminator, then one past the end of the string is at least two
characters away. That distance seems useful for the 'malloc'.

It's swings and roundabouts. You need to --diff to set the null and do
the memcpy and if you keep the pointer "nearer" you need to +1 when you
malloc. The latter is, at least, a common idiom. I.e.:

char c;
const char * cur_pos = string;
const char * one_past_end = string;
ptrdiff_t diff;
char * copy;

if (!chars) {
    while ((c = *cur_pos++))
        if (!isspace(c))
            one_past_end = cur_pos;
}
else {
    while ((c = *cur_pos++))
        if (!in_set(c, chars))
            one_past_end = cur_pos;
}
diff = one_past_end - string;
copy = malloc(diff + 1);
if (!copy)
    return NULL;

copy[diff] = '\0';
return memcpy(copy, string, diff);

Yes, that does seem to be a more common approach that I've seen. I
guess that I figured this 'diff + 1' idiom above would be implemented in
one of two ways:

increment the register for 'diff'
use that with 'malloc'
decrement it back to what it was
use that with the last two lines

or:

copy 'diff' into another register B
increment that other register B
use that B with 'malloc'
register B is free for repurposing

Neither of which I particularly care for, versus:

use 'diff' with 'malloc'
decrement 'diff'
use that with the last two lines

Since both flavours use 'one_past_end' and 'cur_pos', I figured the
earlier 'one_past_end = cur_pos + 2' would probably go:

copy 'cur_pos' into 'one_past_end'
increment 'one_past_end'
increment 'one_past_end'

But of course, these are totally wild assumptions.
Personally, I'd then change "diff" to "length". I've also moved the ++
of cur_pos to the loop, but if you don't like that you can go back to

while ((c = *cur_pos)) {
    if (!isspace(c))
        one_past_end = cur_pos + 1;
    ++cur_pos;
}

but it's then clear that the ++cur_pos could go above the "if" and thus
into the loop condition.

Actually, I prefer '*cur_pos++' (and had that originally), but thought
it might be too wild for the OP, for whatever reason. However, since
there aren't any comments to follow along with, it's weak for any
instructional consideration, anyway.

Pleasant discussion, though! :)
 

Ben Bacarisse

August Karlstrom said:
On 2012-02-11 20:58, Ben Bacarisse wrote:
You can avoid the problem by sticking to unsigned arithmetic:

size_t i = strlen(s);
while (i--> 0&& isspace(s[i]))
/* do nothing */;
s[i + 1] = '\0';

Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).


For empty strings and strings containing only whitespace you rely on
two unsigned integer wrap-arounds. Nothing wrong with that, though I
find the side-effectful loop guard hard to understand - could you
maybe read its meaning aloud. ;-)


Some spaces have got lost. My post had 'i-- > 0 && isspace(s[i])' and
no, I can't read it out loud. That's interesting, because I can see
that it might be an advantage to be able to, but I just think of complex
conditions involving && as chains of actions and side effects and so it
looks quite normal to me.
Sometimes unsigned types create more problems than they solve so I
would probably just make sure that the string is not insanely long:

#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>

void trim_right(char *s)
{
size_t len;
int i;

len = strlen(s);
assert(len <= INT_MAX);
i = len - 1;

That won't help because the problem comes when strlen(s) == 0. Maybe
you intended to write:

len = strlen(s);
assert(len <= INT_MAX);
i = len;
i -= 1;

? It's odd enough to deserve a comment, though.
while ((i >= 0) && isspace(s[i])) {
i--;
}
s[i + 1] = '\0';
}
 

August Karlstrom

Some spaces have got lost. My post had 'i--> 0&& isspace(s[i])' and
no,


Sorry about that, it is caused by a bug in my email client (Thunderbird
9.0) though I have not noticed it before.

https://bugzilla.mozilla.org/show_bug.cgi?id=448198
I can't read it out loud. That's interesting, because I can see
that it might be an advantage to be able to, but I just think of complex
conditions involving&& as chains of actions and side effects and so it
looks quite normal to me.

I see, like shell commands in Bash. My own ideal is to build
abstractions from very simple parts (expressions and statements).
That won't help because the problem comes when strlen(s) == 0.

Yes, you are right. I forgot to cast len to int, the statement should be

i = (int) len - 1;
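
A sketch of why the position of the cast matters, using a hypothetical helper `last_index` that assumes the caller has already checked `len <= INT_MAX`:

```c
#include <stddef.h>

/* Hypothetical helper: index of the last character, or -1 for length 0.
   Assumes the caller has already checked len <= INT_MAX. */
int last_index(size_t len)
{
    /* The cast is applied before the subtraction, so the arithmetic is
       done in int and len == 0 gives exactly -1 with no unsigned
       wrap-around. */
    return (int)len - 1;
}
```

Written as `len - 1` without the cast, the subtraction is performed in `size_t` first and only then converted to `int`.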


August
 

Stefan Ram

August Karlstrom said:
Sorry about that, it is caused by a bug in my email client (Thunderbird
9.0) though I have not noticed it before.
https://bugzilla.mozilla.org/show_bug.cgi?id=448198

There is a moderated C++ newsgroup in Usenet, and IIRC (I am
not sure whether it was the moderated C++ or the moderated C
newsgroup, or both), the moderation process always changed
some parts of my posts (like the indentation, but IIRC not
uniformly) before publication, and the moderators were not
willing to release a post of mine that contained a paragraph
that I wrote to demonstrate and to investigate the behavior
of the moderation software.

I believe it is worse to publish something the attributed
author did not write (even if it is just changed with regard
to the indentation), but still attribute it to him, than to
not publish it at all (this is not addressed at you, August,
but at that moderation), and so I refrained from posting to the
moderated C and C++ newsgroups.
 

Malcolm McLean

August Karlstrom said:
On 2012-02-11 20:58, Ben Bacarisse wrote:
You can avoid the problem by sticking to unsigned arithmetic:
   size_t i = strlen(s);
   while (i-->  0&&  isspace(s[i]))
       /* do nothing */;
   s[i + 1] = '\0';
Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).

For empty strings and strings containing only whitespace you rely on
two unsigned integer wrap-arounds. Nothing wrong with that, though I
find the side-effectful loop guard hard to understand - could you
maybe read its meaning aloud. ;-)

Some spaces have got lost.  My post had 'i-- > 0 && isspace(s[i])' and
no, I can't read it out loud.  That's interesting, because I can see
that it might be an advantage to be able to, but I just think of complex
conditions involving && as chains of actions and side effects and so it
looks quite normal to me.




Sometimes unsigned types create more problems than they solve so I
would probably just make sure that the string is not insanely long:
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>
void trim_right(char *s)
{
   size_t len;
   int i;
   len = strlen(s);
   assert(len <= INT_MAX);
   i = len - 1;

That won't help because the problem comes when strlen(s) == 0.  Maybe
you intended to write:

        len = strlen(s);
        assert(len <= INT_MAX);
        i = len;
        i -= 1;

?  It's odd enough to deserve a comment, though.
   while ((i >= 0) && isspace(s[i])) {
           i--;
   }
   s[i + 1] = '\0';
}


i = strlen(s);
while(i--)
  if(!isspace(s[i]))
    break;
s[i+1] = '\0';
 

August Karlstrom

While a string literal should be treated as if it were const, a string
in general need not be constant in any sense of the word.

Right, in C a string is any sequence of chars ending with the null
character.


August
 

Ben Bacarisse

Malcolm McLean said:
On 2012-02-11 20:58, Ben Bacarisse wrote:
You can avoid the problem by sticking to unsigned arithmetic:
size_t i = strlen(s);
while (i-- > 0 && isspace(s[i]))
/* do nothing */;
s[i + 1] = '\0';

Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).

i = strlen(s);
while(i--)
if(!isspace(s[i]))
break;
s[i+1] = '\0';


As you can see that's what I was advocating. Are you saying that
putting

if (!C) break;

in a loop is better than just adding '&& C' to the loop condition? I
prefer to have an explicit loop condition. I'm neutral about your other
change (from 'i-- > 0' to 'i--' on its own).
 

Malcolm McLean

<snip>




On 2012-02-11 20:58, Ben Bacarisse wrote:
<snip>
You can avoid the problem by sticking to unsigned arithmetic:
   size_t i = strlen(s);
   while (i-- > 0 && isspace(s[i]))
       /* do nothing */;
   s[i + 1] = '\0';
Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).

i = strlen(s);
while(i--)
  if(!isspace(s[i]))
    break;
s[i+1] = '\0';


As you can see that's what I was advocating.  Are you saying that
putting

  if (!C) break;

in a loop is better than just adding '&& C' to the loop condition?

while(i-- > 0 && isspace(s[i]))
/* do nothing*/;

is a bit confusing. First, you have an empty loop. Then in fact i is
decremented before the test s[i], and it's safe because of intelligent
and guarding. That's exactly the sort of detail that people who only
use C occasionally often get wrong.
i = N;
while(i--)
is idiomatic C for counting down.
 

Ben Bacarisse

Malcolm McLean said:
On 2012-02-11 20:58, Ben Bacarisse wrote:
<snip>
You can avoid the problem by sticking to unsigned arithmetic:
   size_t i = strlen(s);
   while (i-- > 0 && isspace(s[i]))
       /* do nothing */;
   s[i + 1] = '\0';

Using unsigned wrap-around like this is a little unusual but it does
work (unless I've messed-up of course).
i = strlen(s);
while(i--)
  if(!isspace(s[i]))
    break;
s[i+1] = '\0';


As you can see that's what I was advocating.  Are you saying that
putting

  if (!C) break;

in a loop is better than just adding '&& C' to the loop condition?

while(i-- > 0 && isspace(s[i]))
/* do nothing*/;

is a bit confusing. First, you have an empty loop. Then in fact i is
decremented before the test s[i], and it's safe because of intelligent
and guarding. That's exactly the sort of detail that people who only
use C occasionally often get wrong.


Your version also depends on exactly the same details of guarding so if
it's wrong in my version, it will be wrong in yours. That point can't
be an argument either way.

You've exchanged an empty loop for another exit from the loop. I don't
think that's a good exchange but I accept that's partly just a matter of
taste. You don't (presumably) write

while (1)
if (A)
break;

instead of while (A), but you do want to write

while (A)
if (B)
break;

instead of while (A && B) just because A has a side effect. && is both
short-circuiting and has a sequence point precisely so that A && B works
in cases like this.
i = N;
while(i--)
is idiomatic C for counting down.

and while (i-- && other_condition(i)) is idiomatic for counting while
something else is true. Both rely on starting with i one more than the
top value for which we are interested, but they are both idiomatic to
me.
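
A sketch of the counting-while-something-holds idiom on an array rather than a string. The function `count_trailing_zeros` is a made-up example, not code from the thread.

```c
#include <stddef.h>

/* Hypothetical example: count the zero elements at the end of an array. */
size_t count_trailing_zeros(const int *a, size_t n)
{
    size_t i = n;
    size_t count = 0;

    /* && evaluates its left operand first, with a sequence point after it,
       so a[i] is read only when the old value of i was greater than zero,
       and only after i has been decremented. */
    while (i-- > 0 && a[i] == 0)
        count++;

    return count;
}
```

With `n == 0` the right operand is never evaluated, so no element is read.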
 
