trim whitespace, bullet proof version

J

John Kelly

trim whitespace, bullet proof version



/*

Define author
John Kelly, August 20, 2010

Define copyright
Copyright John Kelly, 2010. All rights reserved.

Define license
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this work except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0

Define symbols and (words)
exam ......... temporary char *
hast ......... temporary char * to alpha !isspace
keep ......... temporary char * to omega !isspace
ts ........... temporary string
tz ........... temporary ssize_t


*/

# include <ctype.h>
# include <errno.h>
# include <limits.h>
# include <stdio.h>
# include <string.h>

static int
trim (char **ts)
{
ssize_t tz;
unsigned char *exam;
unsigned char *hast;
unsigned char *keep;

if (!ts || !*ts) {
errno = EINVAL;
return -1;
}
tz = 0;
exam = (unsigned char *) *ts;
while (++tz < SSIZE_MAX && isspace (*exam)) {
++exam;
}
if (tz == SSIZE_MAX) {
errno = EOVERFLOW;
return -1;
}
tz = 0;
hast = keep = exam;
while (++tz < SSIZE_MAX && *exam) {
if (!isspace (*exam)) {
keep = exam;
}
++exam;
}
if (tz == SSIZE_MAX) {
errno = EOVERFLOW;
return -1;
}
if (*keep) {
*++keep = '\0';
}
if (keep - hast < 0) {
errno = EOVERFLOW;
return -1;
}
tz = keep - hast;
if (hast != (unsigned char *) *ts) {
(void) memmove (*ts, hast, tz + 1);
}
return tz;
}

void
testme (char *ts)
{
int tn;
tn = trim (&ts);
printf ("%3d characters in string=[%s]\n", tn, ts);
}

int
main (void)
{
char s1[] = " abc";
char s2[] = "def ";
char s3[] = " ghi ";
char s4[] = " ";
char s5[] = "";
char s6[] = " \n such a \n\t\t\t\t funny thing ";

printf ("trim whitespace\n");

testme (NULL);

testme (&s1[1]);
printf ("string=[%s]\n", s1);

testme (s2);
testme (s3);
testme (s4);
testme (s5);
testme (s6);

testme (s1);
testme (s2);
testme (s3);
testme (s4);
testme (s5);
testme (s6);

return 0;
}
 
J

John Kelly

trim whitespace, bullet proof version

Get rid of unnecessary test:


--- xc.c Sat Aug 21 07:03:25 2010
+++ xc.c Sat Aug 21 07:06:59 2010
@@ -64,10 +64,6 @@
if (*keep) {
*++keep = '\0';
}
- if (keep - hast < 0 || keep - hast == SSIZE_MAX) {
- errno = EOVERFLOW;
- return -1;
- }
tz = keep - hast;
if (hast != (unsigned char *) *ts) {
(void) memmove (*ts, hast, tz + 1);
 
S

Shao Miller

John said:
<sys/types.h>

Which is included by one of my listed headers, at least on my test box.
Thanks.

By the way, what about looping for multiple 'memmove's, each one with an
upper bound of 'SIZE_MAX'? That is, loop through everything with a
'SIZE_MAX' upper bound, but as many times as needed?

Should 'trim' return a 'ssize_t'?

Are you using 'unsigned char *' for a particular reason?

That's a neat trick for a wrap-around pointer arithmetic check.
 
J

John Kelly

Get rid of unnecessary test:


--- xc.c Sat Aug 21 07:03:25 2010
+++ xc.c Sat Aug 21 07:06:59 2010
@@ -64,10 +64,6 @@
if (*keep) {
*++keep = '\0';
}
- if (keep - hast < 0 || keep - hast == SSIZE_MAX) {
- errno = EOVERFLOW;
- return -1;
- }
tz = keep - hast;
if (hast != (unsigned char *) *ts) {
(void) memmove (*ts, hast, tz + 1);

That patch is against the wrong version. It's getting late. I think
this is the right one. Just kill the test of keep - hast < 0. That's
impossible, I see.


--- xc.c Sat Aug 21 07:03:25 2010
+++ xc.c Sat Aug 21 07:06:59 2010
@@ -64,10 +64,6 @@
if (*keep) {
*++keep = '\0';
}
- if (keep - hast < 0) {
- errno = EOVERFLOW;
- return -1;
- }
tz = keep - hast;
if (hast != (unsigned char *) *ts) {
(void) memmove (*ts, hast, tz + 1);
 
J

John Kelly

By the way, what about looping for multiple 'memmove's, each one with an
upper bound of 'SIZE_MAX'? That is, loop through everything with a
'SIZE_MAX' upper bound, but as many times as needed?

Maybe. I just wanted to avoid calling memmove with a negative length,
so I used SSIZE_MAX to limit how far I scan, from hast to keep.

It's open source. If you want to hack on it, go for it. Seems to me
like the brainiacs in this place ought to get organized, and develop a
canonical function library. But they better use my favorite license!

Should 'trim' return a 'ssize_t'?

ssize_t is typedef to a signed int. You think an explicit cast is
better?

Are you using 'unsigned char *' for a particular reason?

I wrote that part so long ago, I don't remember why.

That's a neat trick for a wrap-around pointer arithmetic check.

What is?

Time for sleep ...
 
E

Eric Sosman

[...]
Define author
John Kelly, August 20, 2010

Define copyright
Copyright John Kelly, 2010. All rights reserved.

Define license
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this work except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
[...]

Are you the same "John Kelly" who contributed
> On piratebay you can find many ebooks to download. More than you can
> shake a stick at.

.... to the "Where next?" thread? If so, I spit on your copyright,
on your hoity-toity license, and on your double standard.
 
B

Ben Bacarisse

John Kelly said:
trim whitespace, bullet proof version

Since you can't dodge all the bullets, it would be more helpful to what
ones you are proofing against. I.e. what is the contract between the
caller and called program? What are you trying to guard against?
Define symbols and (words)
exam ......... temporary char *
hast ......... temporary char * to alpha !isspace
keep ......... temporary char * to omega !isspace
ts ........... temporary string
tz ........... temporary ssize_t

I.e. rather than describe the internal temporary objects, what do you
assume about the environment and about the pointer that is passed to
generate what guarantees from the function.
*/

# include <ctype.h>
# include <errno.h>
# include <limits.h>
# include <stdio.h>
# include <string.h>

static int
trim (char **ts)
{
ssize_t tz;

I think ptrdiff_t is a better choice for this. If it is not a big
enough type you are sunk anyway (since keep - hast has this type no
matter what you assign the result to) and if it is big enough you make
the code use only C types.

<snip>
 
J

John Kelly

Since you can't dodge all the bullets, it would be more helpful to what
ones you are proofing against. I.e. what is the contract between the
caller and called program? What are you trying to guard against?

The instances of setting errno and returning -1 are the contract terms.
You could add comments about that, but code and comments always get out
of sync. Documenting the errors would be good, but I think a man page
is a better place to list them. Any volunteers?

I.e. rather than describe the internal temporary objects, what do you
assume about the environment and about the pointer that is passed to
generate what guarantees from the function.

I prefer few comments. Just enough to hint at what's going on. Code
that's well thought out should have a natural interface needing little
explanation. If there are gotchas in the code or interface, the code
should be refined until they're gone.

The purpose of the function is to trim leading and trailing whitespace
from any string found at the pointer location, whether data or garbage.
That could be spelled out, but the less said, the better.

I think ptrdiff_t is a better choice for this.

ssize_t tz counts the iterations looking for \0. SSIZE_MAX limits the
iterations looking \0. To be sure of calling memmove with a positive
number, the chosen limit must be such that (keep - hast) will always be
positive.

If it is not a big
enough type you are sunk anyway (since keep - hast has this type no
matter what you assign the result to) and if it is big enough you make
the code use only C types.

From what I understood of the ptrdiff discussion, ssize_t and SSIZE_MAX
satisfy the requirement for the memmove length to be positive . Or did
I misunderstand?
 
J

John Kelly

Are you the same "John Kelly" who contributed


... to the "Where next?" thread? If so, I spit on your copyright,
on your hoity-toity license, and on your double standard.

Yeah those cargo cult newbies are the devil aren't they.

:-D
 
J

James Waldby

Define symbols and (words)
exam ......... temporary char *
hast ......... temporary char * to alpha !isspace
....
... rather than describe the internal temporary objects, [tell] what
you assume about the environment and about the pointer that is passed
to generate what guarantees from the function.

I prefer few comments. Just enough to hint at what's going on. Code
that's well thought out should have a natural interface needing little
explanation. If there are gotchas in the code or interface, the code
should be refined until they're gone.

The purpose of the function is to trim leading and trailing whitespace
from any string found at the pointer location, whether data or garbage.
That could be spelled out, but the less said, the better.[/QUOTE]
....

One purpose of comments is to let maintainers know what the code is
supposed to be doing. The test cases that you wrote don't invoke
the errors that your code attempts to detect. Your comments ought
perhaps to say what you are trying to detect, in case the code needs
fixing in the future after it reports errors.

It's a good idea to say, at the beginning of a function, what the
intent of that function is. For example, you could add the following
at front:
/* Function static int trim(char **ts) trims leading and trailing
whitespace from garbage or whatever at location *ts. It returns the
trimmed string length if no error was detected, else EOVERFLOW or EINVAL.
*/

Since your new version doesn't modify *ts one could use parameter
list (char *ts) rather than (char **ts) [or could use (char const *ts)
if you add a (char *)ts cast in memmove call].
 
J

John Kelly

From what I understood of the ptrdiff discussion, ssize_t and SSIZE_MAX
satisfy the requirement for the memmove length to be positive . Or did
I misunderstand?

Uh-oh. That assumes SSIZE_MAX will always be <= PTRDIFF_MAX / 2.

Normally it will be, but I should check to be sure.

The maximum value (>= 0) that memmove can accept, is what defines the
limit for the number of iterations looking for \0.

One problem is, my test platform (Interix) does not have PTRDIFF_MAX or
SIZE_MAX. Only SSIZE_MAX. One other platform I looked at, linux, also
has SSIZE_MAX, so that's what I went with.

I suppose you could have a macro to test what's available and use the
largest that satisfies <= PTRDIFF_MAX / 2.

Or should I say <= ((PTRDIFF_MAX / 2) - 1)

Or to be really sure how about <= ((PTRDIFF_MAX - 1) / 2)

Right now I can't think hard enough to know which is correct to avoid an
off-by-one error.
 
J

John Kelly

One purpose of comments is to let maintainers know what the code is
supposed to be doing. The test cases that you wrote don't invoke
the errors that your code attempts to detect.

Testing for overflow was too much work. And it will probably crash
Interix. Any volunteers?

Your comments ought
perhaps to say what you are trying to detect, in case the code needs
fixing in the future after it reports errors.
It's a good idea to say, at the beginning of a function, what the
intent of that function is. For example, you could add the following
at front:
/* Function static int trim(char **ts) trims leading and trailing
whitespace from garbage or whatever at location *ts. It returns the
trimmed string length if no error was detected, else EOVERFLOW or EINVAL.
*/
OK.


Since your new version doesn't modify *ts one could use parameter
list (char *ts) rather than (char **ts) [or could use (char const *ts)
if you add a (char *)ts cast in memmove call].

I didn't think of that. I'll review it again, for improvements. When I
posted trim(), I didn't know how much work I was getting into. :-/
 
K

Keith Thompson

John Kelly said:
Testing for overflow was too much work. And it will probably crash
Interix. Any volunteers?
[...]

Wasn't testing for overflow the main point? Do you still believe
it's possible?
 
S

Seebs

John Kelly said:
Testing for overflow was too much work. And it will probably crash
Interix. Any volunteers? [...]

Wasn't testing for overflow the main point? Do you still believe
it's possible?

Haven't you noticed? He's going to do things that are amazing and
prove the naysayers wrong, which makes him feel important. If he can't
achieve devastating success in a few days, he'll give an excuse for
why it wasn't important, which doesn't make him feel bad because he
explained why it doesn't matter.

This allows him to get the thrill of being a revolutionary without any
actual point at which he has to think about what people say or do any
hard work.

Read up on pathological narcissism. It's not impossible to derive value
from responding to his posts, but you have to do so with a clear
understanding of what he's doing and why or it will be very frustrating.

-s
 
J

John Kelly

John Kelly said:
Testing for overflow was too much work. And it will probably crash
Interix. Any volunteers?
[...]

Wasn't testing for overflow the main point?

We have different ideas of what overflow means in this context. Have
you tested the code, or are you having fun pontificating?

Do you still believe it's possible?

What? Overflow, or testing for it?
 
K

Keith Thompson

John Kelly said:
John Kelly said:
Testing for overflow was too much work. And it will probably crash
Interix. Any volunteers?
[...]

Wasn't testing for overflow the main point?

We have different ideas of what overflow means in this context.

So what's your idea of what overflow means in this context?
Have
you tested the code, or are you having fun pontificating?

I haven't tested the code. I'm not sure exactly what it's supposed to
do. Provide some documentation, and I might try it.

In any case, testing isn't necessarily useful in the presence of
undefined behavior.
What? Overflow, or testing for it?

Testing for it.

Your stated goal was to create a "bullet proof" version of your
trim() function. Now you say testing for "overflow" is too much
work.

Just what error conditions do you intend to handle? What error
conditions, if any, can you not handle?

My position is this: It is not possible, given your specification
for trim(), to implement it in a way that avoids undefined behavior
unless you're able to control all possible callers. You can define
the circumstances in which its behavior is undefined, and define
the behavior in cases where it is defined. You cannot entirely
eliminate undefined behavior. Do you believe that you can? If not,
what *exactly* does "bullet proof" mean?
 
J

John Kelly

Testing for it.

Your stated goal was to create a "bullet proof" version of your
trim() function. Now you say testing for "overflow" is too much
work.

Just what error conditions do you intend to handle? What error
conditions, if any, can you not handle?

My position is this: It is not possible, given your specification
for trim(), to implement it in a way that avoids undefined behavior
unless you're able to control all possible callers. You can define
the circumstances in which its behavior is undefined, and define
the behavior in cases where it is defined. You cannot entirely
eliminate undefined behavior. Do you believe that you can? If not,
what *exactly* does "bullet proof" mean?

I appreciate your concern and welcome what help I can get. Right now
I'm focusing on one specific item.

void *memmove(void *dest, const void *src, size_t n);

It expects size_t for the length. It appears universally true that

typedef unsigned int size_t

Thus to me it makes no sense to call memmove with a negative value. Am
I wrong?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,007
Latest member
obedient dusk

Latest Threads

Top