Right trim string function in C


Stefan Ram

Ben Bacarisse said:
and while (i-- & other_condition(i)) is idiomatic for counting while

There might be a typo above: »&« possibly was intended to be »&&«.
 

Ben Bacarisse

There might be a typo above: »&« possibly was intended to be »&&«.

Yup, thanks. Fortunately there were loads of other examples that had the
&& so I don't think there will be too much confusion.
 

BartC

Which is basically something like
n := strlen(s)
while n>0 and isspace(s[n-1]),
    s[--n] := 0

However, an important gotcha is that the string has to be writable, and all
clients of the string will see the trimming. You may want to make a copy of
the string and modify that.

Alternatively, the function can just calculate what the trimmed portion is,
without doing any of the actual work of constructing a new string; the
caller can do that, if it needs to.

Example prototype:

str: pointer to char-sequence (need not be zero-terminated
when length is provided)
length: -1 when unknown, or 0 or more when caller provides the length

return: length of the string after trimming

When no length info is provided, this needs one pass through the string.
With a length, it's mainly just the trimmed portion that needs to be
scanned.

(Or just forget the length; I'm working with unterminated strings at present,
so I find it more natural.)
 

Kaz Kylheku

Which is basically something like
n := strlen(s)
while n>0 and isspace(s[n-1]),
    s[--n] := 0

However, an important gotcha is that the string has to be writable, and all clients of
the string will see the trimming. You may want to make a copy of the string
and modify that.

Oh yeah, you wouldn't want to spoil the functional purity of C programming with
an underhanded destructive hack, such as trimming a string in place!
 

BartC

Example prototype:

Too much editing and the prototype disappeared! It might look like:

int trimstringright(char* str, int length);

(size_t can be used, if an alternative to -1 for the length can be found.)

Here's some code, although not as efficient as it could be when no length is
provided:

int trimstringright(char* str, int length){
    char *p;

    if (str==NULL)return 0;
    if (length==-1) length=strlen(str); //being lazy..
    if (length==0) return 0;

    p=str+length-1;

    while (p>=str && *p-- == ' ') --length;

    return length;
}
 

Malcolm McLean

Oh yeah, you wouldn't want to spoil the functional purity of C programming with
an underhanded destructive hack, such as trimming a string in place!
I do sometimes use my own little library that accepts const strings and
returns the result in an allocated string. But I find the advantage is
small.

Ike Naar


Consider trimstringright(" ",1);
Just before the while loop, length==1 and p==str.
In the first iteration, the loop condition holds,
and p is decremented and now points outside the string.
In the second iteration, the access to p in "p>=str"
has undefined behaviour.
 

BartC

[Scanning a string backwards]
Consider trimstringright(" ",1);
Just before the while loop, length==1 and p==str.
In the first iteration, the loop condition holds,
and p is decremented and now points outside the string.
In the second iteration, the access to p in "p>=str"
has undefined behaviour.

OK, let's suppose that on some machine, it is meaningless to point to an
address before 'str'; why would the problem occur on the next access to p,
rather than in the p-- operation, or in the >= comparison?

And, on a machine which doesn't have such problems (eg. flat address space,
str not located at the beginning of that space, and the ability to compare
pointers even when the memory locations involved do not exist), would it
still be undefined behaviour?
 

Ben Bacarisse

BartC said:
[Scanning a string backwards]
Consider trimstringright(" ",1);
Just before the while loop, length==1 and p==str.
In the first iteration, the loop condition holds,
and p is decremented and now points outside the string.
In the second iteration, the access to p in "p>=str"
has undefined behaviour.

OK, let's suppose that on some machine, it is meaningless to point to
an address before 'str'; why would the problem occur on the next
access to p, rather than in the p-- operation, or in the >=
comparison?

You are right. The first (main?) problem is p-- (actually the implied
p-1). Simply calculating this pointer is undefined. Referring to p is
also undefined, as is the comparison p>=str, but the root problem is
subtracting one from a pointer equal to str.

[This assumes that str points to the start of an object. You can't
tell by looking at the code if this is true or not, but it's clear from
the purpose of it that it might be, so the code should avoid this
problem.]
And, on a machine which doesn't have such problems (eg. flat address space,

"Flatness" is not enough on its own. One can imagine some form of bound
checking that actually enforces the rule even when the address space is
flat.
str not located at the beginning of that space, and the ability
to compare pointers even when the memory locations involved do not
exist), would it still be undefined behaviour?

Yes, because in this context, undefined does not mean "goes wrong", it
just means that C does not define what happens. That does not change
even in the most forgiving environment. In practice what happens is
that other things step in and define what happens to code like this
where the language specification has nothing more to say. The compiler,
the machine, the OS all conspire to extend the C standard so that code
such as the above may well have a well-defined meaning.

The trouble is that you lose portability, and the assurances that the
code will work are often simply assumed rather than being explicit.
 

Malcolm McLean

And, on a machine which doesn't have such problems (eg. flat address space,
str not located at the beginning of that space, and the ability to compare
pointers even when the memory locations involved do not exist), would it
still be undefined behaviour?
Yes. "It works" is in fact the most dangerous form of undefined
behaviour.
 

Kenny McCormack

Yes. "It works" is in fact the most dangerous form of undefined
behaviour.

Hah hah. Of course, I understand this phrase in the CLC context, but I have
to point out that I think most working code in the real world does, in fact,
have undefined behavior coursing through its veins. But the PHBs don't care
- as long as it works (*), and the money keeps coming in, they don't care.

(*) Or at least, as is usually the case, "seems to work".


Kaz Kylheku

And, on a machine which doesn't have such problems (eg. flat address space,
str not located at the beginning of that space, and the ability to compare
pointers even when the memory locations involved do not exist), would it
still be undefined behaviour?

"Undefined behavior" means that the C standard doesn't have an opinion on the
meaning of a construct (a requirement is not imposed).

On the machine that you describe, by golly, the C standard still doesn't
impose a requirement. The document hasn't changed.

On that machine/compiler you may have a behavior that is defined, but that
doesn't come from any requirements in standard C, that is all.

Defined by your tools, not defined by standard C.

(That kind of "defined" works in practice and is often good enough.)
 

Kaz Kylheku

Hah hah. Of course, I understand this phrase in the CLC context, but I have
to point out that I think most working code in the real world does, in fact,
have undefined behavior coursing through its veins.

Simply including a nonstandard header which is not present in the program is
undefined behavior. Calling external functions that are not in ISO C and not
defined in the program is also undefined behavior.

A program that includes <unistd.h> and calls open is undefined according to
ISO C, but may be a well-defined POSIX C program.

This is why obsessing over undefined behavior is silly. Undefined behavior
just means, "you're leaving that province of C programming where ISO C
provides a useful opinion".

It's all just opinions, because at the end of the day, the compiler and library
may do something else, and you still want to ship something.

Only what actually happens in the machine you're programming is "behavior".
 

David Thompson

unsigned char * ep = (unsigned char *) strchr ( (char*)string, 0)
to achieve the stated simplification of isspace() call.
Which at this point might not really be worth it.

Or change string back to plain char *, don't cast the strchr argument,
but do cast in the comparison. Not really a whole lot easier.
Yes, thanks. Glad I followed protocol in posting an error in a
correction :)

Is an incomplete correction a new error?

<ObControversy> Maybe if it's volatile. </>
 

Tim Rentsch

Ben Bacarisse said:
Both here and below you meant to write isspace(*p).


No, that has a similar problem. Unfortunately you've cut the context so
it won't be clear what you were correcting. The problem was
constructing an invalid pointer that points before the start of the
string and this code can also do that when the string is all spaces.

In addition to being careful about the pointers, you need to finesse the
mess that is isspace (and friends) when char might be signed. It's a
shame that what should be a simple function is really quite tricky.

My solution had two statements and one single-line macro definition.

char *rstrip(unsigned char *string)
{
    char *ep = strchr(string, 0);
    while (ep > string && isspace(ep[-1])) --ep;
    *ep = 0;
    return string;
}

(The unsigned char * just is to avoid cluttering the code with a cast

I think you've got your unsigned-ness in the wrong place.
or Tim's exotic compound literal union.)

Exotic?!??? Why that's ridiculous. :) :) :)

More seriously, this sort of thing should be bundled up in a macro
definition, and with C11 now coming to the fore, a macro around
a _Generic expression should work quite nicely.
 
