Strange bug

S

Super Ted

Hi I am trying to write a function to trim white-space from the beginning
and end of a string. I have decided to capitalize on the recursive nature
of this task and also to subdivide it into two simpler functions (trim
left, trim right).

Here is my code with a test-harness:

#include <stdio.h>

static int TrimWSpaceL(char *s)
{
if(isspace(*s)) {
strcpy(s,s+1);
return 1+TrimWSpaceL(s);
}
else
return 0;
}

static int TrimWSpaceR(char *s)
{
if(isspace(*(s+strlen(s)-1))) {
*(s+strlen(s)-1)='\0';
return 1+TrimWSpaceR(s);
}
else
return 0;
}

/* trims white-space, returning #spaces trimmed */
int TrimWSpace(char *s)
{
return TrimWSpaceL(s) + TrimWSpaceR(s);
}

int main()
{
char s[] = " \t *Hello world* \n ";
int i = TrimWSpace(s);
printf("%s\n(deleted %d spaces)\n", s,i);
return 0;
}

However I get an unexpected output:

*Hello world
(deleted 11 spaces)

Both lines are wrong! In the first line, a * has disappeared. In the 2nd
line, 11 spaces are reported, not 10!

Can anyone help?

Cheers.
 
S

Super Ted

Ian said:
You haven't included <ctype.h> or <string.h>, so you don't have
prototypes for strcpy, strlen or isspace.

Sorry, but adding those includes does not fix the problem!
 
I

Ian Collins

Sorry, but adding those includes does not fix the problem!

It should, the code appears fine (and works for me).

../a.out
*Hello world*
(deleted 10 spaces)
 
K

Keith Thompson

Ian Collins said:
It should, the code appears fine (and works for me).

./a.out
*Hello world*
(deleted 10 spaces)

strcpy(s,s+1);

C99 7.21.2.3:

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by
s1. If copying takes place between objects that overlap, the
behavior is undefined.

If you really wanted to do it this way, you could use memmove(),
which works for overlapping arguments. But the approach
is quite inefficient; each character is copied multiple times.
 
I

Ian Collins

strcpy(s,s+1);

C99 7.21.2.3:

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by
s1. If copying takes place between objects that overlap, the
behavior is undefined.

If you really wanted to do it this way, you could use memmove(),
which works for overlapping arguments. But the approach
is quite inefficient; each character is copied multiple times.

Silly me - the prototype should have given that away:

char *strcpy(char *restrict s1, const char *restrict s2);
 
M

Mark Wooding

Super Ted said:
#include <stdio.h>

The `strcpy' and `strlen' functions you use are declared in <string.h>.
The `isspace' function is declared in <ctype.h>. You should include
these headers if you use the functions. I believe that the implicit
declaration of `isspace' if you omit <ctype.h> is sufficient to be able
to call it in a well-defined way; but this is not the case for `strcpy'
or `strlen', neither of which return `int' -- the former returns a
`char *' and the latter a `size_t'.

My compiler gives me warnings about the above. You should probably turn
on your compiler's warnings.

#include <ctype.h>
#include said:
static int TrimWSpaceL(char *s)
{
if(isspace(*s)) {

The ismumble functions from <ctype.h> expect to receive their argument
in a rather peculiar form: an `unsigned char' converted to an `int'. If
the `char' type is signed on your system (as it is in many), and the
first character of the string is negative (and not EOF, a special
exemption) then the behaviour is undefined.
strcpy(s,s+1);

The source and destination arguments for `strcpy' mustn't overlap. This
has undefined behaviour. Also, you do this copy for every leading
whitespace character you find: if the string consists entirely of
n whitespace then you move a total of n (n + 1)/2 characters, taking
quadratic time to solve a linear-time problem.
return 1+TrimWSpaceL(s);

This is not a tail position. You have potentially unbounded non-tail
recursion, and therefore run the risk of exhausting the stack. There is
no reason to use more than a constant amount of memory to solve this
problem.

Also, the type `int' may be too small to store the number of whitespace
characters removed from the string. I'd use `size_t' instead.
}
else
return 0;
}

static size_t trimspace_l(char *p)
{
const char *q;

for (q = p; *q && isspace((unsigned char)*q); q++);
if (q > p) memmove(p, q, strlen(q) + 1);
return (q - p);
}
static int TrimWSpaceR(char *s)
{
if(isspace(*(s+strlen(s)-1))) {

Same remarks about isspace again. Many people would find
`s[strlen(s) - 1]' clearer, though the two are entirely equivalent.

More seriously, this has a bug if the string is empty: strlen(s) is
zero, so this is s[-1], probably outside the bounds of the string, and
undefined behaviour.
*(s+strlen(s)-1)='\0';

The `strlen' function has to scan the entire string trying to find a
zero byte. For long strings, this takes a fair amount of time. You
call it twice for each trailing whitespace character. Again, this leads
to quadratic running time for ought to be a linear-time operation.
return 1+TrimWSpaceR(s);

And again the unbounded recursion, using linear space for what ought to
be a constant-space operation. And again, `int' may not be large
enough.
}
else
return 0;
}

static size_t trimspace_r(char *p)
{
char *q = p + strlen(p);
const char *r = q;

while (q > p && isspace((unsigned char)q[-1])) *--q = 0;
return (r - q);
}
/* trims white-space, returning #spaces trimmed */
int TrimWSpace(char *s)
{
return TrimWSpaceL(s) + TrimWSpaceR(s);

This performs the two trimmings in an arbitrary order. Either order is
correct, but it's slightly (negligibly!) more efficient to trim the
right hand side first (why?).

static size_t trimspace(char *p)
{
size_t n = 0;

n += trimspace_r(p);
n += trimspace_l(p);
return (n);
}
int main()

It's more usual nowadays to write `int main(void)' in C. The above
doesn't provide a `prototype', which probably doesn't matter for `main'
because calling `main' is usually only done in IOCCC submissions.
{
char s[] = " \t *Hello world* \n ";
int i = TrimWSpace(s);
printf("%s\n(deleted %d spaces)\n", s,i);
return 0;
}

This seems OK.
However I get an unexpected output:

*Hello world
(deleted 11 spaces)

Both lines are wrong! In the first line, a * has disappeared. In the 2nd
line, 11 spaces are reported, not 10!

I don't know which of the mistakes I've pointed above causes this. (On
my system, I get the expected output for your program. That doesn't
mean that the mistakes aren't serious -- only that they don't cause
observable failures on one particular machine with one particular
compiler version using one particular collection of compiler options.

-- [mdw]
 
S

Super Ted

Keith said:
strcpy(s,s+1);

C99 7.21.2.3:

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1. If
copying takes place between objects that overlap, the behavior is
undefined.

If you really wanted to do it this way, you could use memmove(), which
works for overlapping arguments. But the approach is quite inefficient;
each character is copied multiple times.

This is a truely stupid restriction. I have now written a strcpy_safe
function by hand, where all string copies are valid, and from now on I
will use only that in all my future code.
 
S

Super Ted

More seriously, this has a bug if the string is empty: strlen(s) is
zero, so this is s[-1], probably outside the bounds of the string, and
undefined behaviour.

I do not check argument validity for efficiency reasons. Like with
standard functions eg strlen, the caller of my function is responsible
for ensuring it doesn't pass a null string.
 
S

Seebs

This is a truely stupid restriction.

Is it?

Because it seems to me that if it were all that stupid, surely someone
would have pointed this out and it wouldn't be there.

Is it possible that there is a reason for it which you don't understand?
I have now written a strcpy_safe
function by hand, where all string copies are valid, and from now on I
will use only that in all my future code.

All string copies? That's an accomplishment.

-s
 
K

Keith Thompson

Super Ted said:
More seriously, this has a bug if the string is empty: strlen(s) is
zero, so this is s[-1], probably outside the bounds of the string, and
undefined behaviour.

I do not check argument validity for efficiency reasons. Like with
standard functions eg strlen, the caller of my function is responsible
for ensuring it doesn't pass a null string.

By "null string", you mean a string with a length of zero, right?
The usual term for that is "empty string", which avoids confusion
with null pointers. (Pointers are not strings, of course, but
passing a string is usually accomplished by passing a pointer to
the first character.)

Note that strlen is perfectly happy with an empty string (it
returns 0).

Does the documentation for your function state that its behavior is
undefined when it's given an empty string? If so, then passing an
empty string is caller's fault; if not, failing to handle an emtpy
string gracefully is your fault.
 
S

Seebs

More seriously, this has a bug if the string is empty: strlen(s) is
zero, so this is s[-1], probably outside the bounds of the string, and
undefined behaviour.
I do not check argument validity for efficiency reasons. Like with
standard functions eg strlen, the caller of my function is responsible
for ensuring it doesn't pass a null string.

You are confused. An empty string is not the same as a null string.

You should absolutely handle empty strings correctly.

-s
 
K

Keith Thompson

Super Ted said:
Keith Thompson writes: [...]
C99 7.21.2.3:

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1. If
copying takes place between objects that overlap, the behavior is
undefined.

If you really wanted to do it this way, you could use memmove(), which
works for overlapping arguments. But the approach is quite inefficient;
each character is copied multiple times.

This is a truely stupid restriction. I have now written a strcpy_safe
function by hand, where all string copies are valid, and from now on I
will use only that in all my future code.

Note that the name "strcpy_safe", like any identifier starting with
"str" and a lowercase letter, is reserved.

The point of the restriction is that the vast majority of string
copies take place between non-overlapping objects. strcpy() can be
made just a little faster by not having to check for an overlap.
We give up defined behavior in 0.1% of cases for greater speed in
99.9% of cases (yes, I pulled those numbers of thin air).

You might be giving up even more performance if the implementation's
strcpy() uses compiler-specific tricks to get better speed, tricks
that you can't use in portable code.

Your strcpy_safe function might still be useful. If it were in the
standard, strmove() would probably be a good name for it (by analogy
with memcpy() vs. memmove()).

A suggestion: don't call a language feature or restriction "truly
stupid" until you understand why it's there. Sometimes you might
be right, but I think you'll often find there are good reasons you
haven't yet thought about.
 
J

J. J. Farrell

Super said:
More seriously, this has a bug if the string is empty: strlen(s) is
zero, so this is s[-1], probably outside the bounds of the string, and
undefined behaviour.

I do not check argument validity for efficiency reasons. Like with
standard functions eg strlen, the caller of my function is responsible
for ensuring it doesn't pass a null string.

I'm assuming that by "null string" you mean what's usually called an
empty string, or a zero-length string, the type of string Mark was
talking about. That's a perfectly valid string; you're wrong about the
standard functions, they're required to handle such a string correctly.
 
F

Francois Grieu

Hi I am trying to write a function to trim white-space from the beginning
and end of a string. I have decided to capitalize on the recursive nature
of this task and also to subdivide it into two simpler functions (trim
left, trim right).

The reasons I know to use C rather than (insert other language) are
- fast executable code.
- source code understandable by a wide human audience.
- source code portable to many platforms.

The approach used
- resulted in inherently slow code (execution time is proportional
to the square of the length of the input for a string consisting
of spaces and final char).
- resulted in verbose code, that is thus harder to check than
a straight non-recursive version.
- gave code that was not portable (non-conformant use of strcpy;
missing headers;then use of a reserved name for a strcpy substitute).

I conclude the decision "to capitalize on the recursive nature of
this task" and to do it in C are inconsistent, except perhaps as a
learning exercise
- on how to code recursive tasks in C.
- on why NOT to code thing recursively when they can be coded more
simply without recursion.

Also, all four embedded C compilers I use have two modes: one that
supports recursion, and another that generates much faster and
compact code by analyzing the call tree and assigning static addresses
to all automatic variables. Guess which one is used, and the resulting
practical portability of recursive code.

I note that the archetypal learning use case for recursion are
factorials and the "Tower of Hanoi", although both happen to have
a non-recursive solution which is much simpler:
http://en.wikipedia.org/wiki/Tower_of_Hanoi#Non-recursive_solution
This should be a giveaway that recursion is seldom useful, since
instructors fail to find a good use case for it.

Francois Grieu
 
B

BartC

Francois Grieu said:
The reasons I know to use C rather than (insert other language) are
- fast executable code.
- source code understandable by a wide human audience.
- source code portable to many platforms.

(Which is also a bit of myth (from what I've seen). Ignoring the mess of
conditional macros that seem to be needed to make this happen, 'portability'
also seems to depend on using, for example, the right compiler, the exact
sequence of compiler options, and the right makefile syntax. And a lot of
luck)
The approach used
- resulted in inherently slow code (execution time is proportional
to the square of the length of the input for a string consisting
of spaces and final char).
- resulted in verbose code, that is thus harder to check than
a straight non-recursive version.
- gave code that was not portable (non-conformant use of strcpy;
missing headers;then use of a reserved name for a strcpy substitute).

I think this exercise was just an experiment, rather than a serious
implementation.
I note that the archetypal learning use case for recursion are
factorials and the "Tower of Hanoi", although both happen to have
a non-recursive solution which is much simpler:
http://en.wikipedia.org/wiki/Tower_of_Hanoi#Non-recursive_solution
This should be a giveaway that recursion is seldom useful, since
instructors fail to find a good use case for it.

It tends to be useful when the problem itself, or the data, has a recursive
nature...
 
E

Eric Sosman

Keith Thompson writes:
[...]
C99 7.21.2.3:

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1. If
copying takes place between objects that overlap, the behavior is
undefined.
[...]
This is a truely stupid restriction. I have now written a strcpy_safe
function by hand, where all string copies are valid, and from now on I
will use only that in all my future code.

Super Ted, we all owe you a tremendous debt of gratitude for
pointing out the stupidity of a restriction that has plagued the C
library for more than two decades. In all that time, you're the
first to have seen that the restriction is pointless and could be
removed easily. Thank you, thank you, thank you for being so much
smarter than all the C practitioners that have gone before.

Just a few minor matters: (1) using strcpy_safe as the name of
your function isn't a good idea, since all external names beginning
with str... are reserved, (2) there's a faint chance that the preceding
paragraph might be insincere, and that there's something you haven't
realized yet, and (3) there's no "e" in "truly."
 
B

BartC

Eric Sosman said:
Keith Thompson writes:
[...]
C99 7.21.2.3:

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1. If
copying takes place between objects that overlap, the behavior is
undefined.
[...]
This is a truely stupid restriction. I have now written a strcpy_safe
function by hand, where all string copies are valid, and from now on I
will use only that in all my future code.
realized yet, and (3) there's no "e" in "truly."

No, but there is one in "truely".
 
E

Eric Sosman

(Which is also a bit of myth (from what I've seen). Ignoring the mess of
conditional macros that seem to be needed to make this happen,
'portability'
also seems to depend on using, for example, the right compiler, the exact
sequence of compiler options, and the right makefile syntax. And a lot
of luck)

Topic drift: It used to be said that "C combines the power of
assembly language with the portability of assembly language," and
this was (sort of) true twenty-odd years ago. Prior to the ANSI
standard, C code had to be larded up with all manner of #ifdef's
and the like if it was to be ported at all. C wasn't portable; it
was luggable.

But the ANSI standard improved matters vastly, which I think
was the main reason for its rapid uptake (contrast the reluctant
adoption of C99). With ANSI, it became not only possible but
practical to write portable code, with few or even no "feature test"
macros lying around. Read some C from a quarter-century ago and
contrast it with more recent C, and the improvement will be obvious.

Unfortunately, a C Standard can only legislate C itself, not the
environments in which C runs. Programs that use networking or window
systems or shared memory or ... still need to indulge in feature-
testing -- but it's for the beyond-C stuff, not for C per se. (Indeed,
some parts of POSIX are devoted to standardizing feature tests!) You
no longer see #ifdef's governing whether <string.h> or <strings.h> is
included, you see #ifdef's governing whether to use fork() or spawn().

... except, of course, when you run across C code written by
people who have not learned how to write portably. It remains possible
to write non-portable C (indeed, the Rationale treats the possibility
as an objective), and there are lots of bad examples to lead beginners
astray. Perhaps that's why this group attaches so much importance to
describing what is and isn't portable: It's not about prohibiting code
that is intentionally and constructively non-portable, but about raising
consciousness so that code isn't non-portable out of carelessness.

As for Makefiles, compiler options, and the like: Again, you're
talking about things outside the realm of the C language. It is, I
guess, conceivable that the C Standard might have tried to specify the
nature and form of the tools used with C, but I doubt the effort would
have been successful; we might still be waiting for the first Standard
if it had tried to specify all the linker options for Unix, Windows,
OpenVMS, MVS, ...
I think this exercise was just an experiment, rather than a serious
implementation.

Probably. In which case it's odd that the O.P. uses "efficiency"
as his excuse for not validating function arguments. (It's also odd
that he prates of "efficiency" while using an inherently inefficient
algorithm.)
It tends to be useful when the problem itself, or the data, has a recursive
nature...

Francois' point remains, though: If such problems are commonplace,
why do the teachers always assign exercises about Fibonacci numbers?
Can't they think of a problem for which recursion is useful, and not
just possible? I'd have thought Quicksort would make a good vehicle,
both for teaching the idea of recursion and for illustrating some of
its pitfalls -- but no, we keep seeing Fibonacci computations that
are described as "elegant" and take exponential running time. Pfui!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top