Comment on trim string function please

B

Bill Reid

Eric Sosman said:
Right. Or, "Hundreds of times don't nuthin' happen a-tall,"
says an old joke about the unimportance of contraception.

The REALLY important thing, is what are the defineable cases
for the "hundreds of times" for this "issue"...is it characters that
may appear on a "system", or the "systems" themselves?
[...]
I WANT MY COOKIE!!!!!!
You haven't yet shown that char is in fact signed on your
system.

Anything to weasel out of a bet...

Although you fail to answer the crucial question, by careful
reading between the lines I am now assuming that this "major
bug" is something that only applies to a small, perhaps imaginary,
subset of computer systems...
 
B

Bill Reid

Then you need to practice believing more things. :)

I try to keep an open mind, but not so open my brains fall out...remember,
we actually have a claim on this thread that moving a single character one
space forward can take "a billion" cycles (perhaps Carl Sagan's ghost is
posting here contributing to the pointless obfuscatory pedantry)...
On some systems, a call to memmove() will generate an actual call to a
library function that moves one byte at a time just like your loop, and
thus will always be slower.

OK, we're STILL not up to a "billion" cycles, and I specifically covered
this possibility later in this post. All you're really doing here is
agreeing
with my post (remember, I'm the guy who doesn't want to call memmove()
at all except for large amounts of text). Unfortunately, what you write
is just technical "common sense", and thus is completely inappropriate
for THIS forum, in which there are imaginary "systems" that require a
"billion" cycles to move a single character one space forward...try to
conform to the "rules of the newsgroup", or face the most dire of
consequences...
On other systems, a call to memmove() will generate an actual call to a
library function that uses a few (maybe even just one) machine
instructions to move the bytes much faster than your byte at a time
loop. Which is faster depends on the number of bytes moved, how much
overhead there is in a function call, and how much faster the machine
instructions are than your loop. It is likely that your loop will be
faster for "short" strings and slower for "long" strings, but exactly
where the break-even point is will vary from system to system. It might
be just a few bytes, it might be tens of bytes, it might even be
hundreds of bytes (although probably not).

OK, I warned you about "common sense", and then you post a
whole bunch more of it...
On still other systems, a call to memmove() will generate those few (or
maybe even just one) machine instructions inline, which will almost
certainly be faster than your byte at a time loop all the time.

Does my turning off ALL optimizations factor into this? I think it
does, since the documentation specifically mentions inlining "intrinsic"
functions as one of the configurable optimization, and I'm pretty
sure memmove() is one of them...
Although the C standard describes fread()/fwrite() as working as if they
called fgetc()/fputc() for each byte, I've never seen an implementation
where they actually do that; real implementations nearly always *do* do
something special instead. On the other hand, I've seen actual
implementations use all three of the above methods of implementing
memmove().

OK, I actually use fread() and fwrite() anyway, so I guess I have
wasted any precious cycles THAT way...
 
B

Bill Reid

Mark McIntyre said:
Bill Reid wrote:

Take it how you like. It means what it says.
For the record, I have several answers, depending on which system you're
interested in. However you have zero chance of getting any of them from
me now.

I'll take that as "I made a silly statement about it taking 'a billion'
cycles to move a single character one space forward on some 'systems',
and couldn't possibly provide any evidence to support that buffoonery,
therefore I'm now playing the 'temper tantrum card'", one of the more
popular behavior patterns here, where the newsgroup regular takes
faux offense and "stomps off mad" to avoid having to deal with
simple technical issues...
And had you asked the question over in a system-specific group, I'd have
answered with a relevant answer.

However its not topical here, so given your abusive attitude I suggest
you swivel on it.

Brilliant. The best thing about this childish behavior is, of course,
their purported belief that they are somehow withdrawing their valuable
"help" from me and the newsgroup...they're taking their ball and running
home crying, but the ball was deflated in the first place...
 
B

Bill Reid

santosh said:
(e-mail address removed) wrote:

this is my code:

$ cat a.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *str_reverse(char *p)
{
char *p1, *p2, ch;

for (p1 = p; *(p1 + 1); p1++) ;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}

char *str_trim(char *s)
{
char *p;

p = s + strlen(s) - 1;
while (isspace(*p)) *p-- = '\0';
return s;
}

int main (int argc, char *argv[])
{
printf("\"%s\"\n", argv[1]);
printf("\"%s\"\n", str_trim(argv[1]));
printf("\"%s\"\n",
str_reverse(str_trim(str_reverse(argv[1]))));
return EXIT_SUCCESS;
}
[ ... ]

Your program misbehaves when invoked with no arguments or with an empty
string argument.

I'm back with the most-efficient code to do this, except of course
it depends on the "system", with all recommended fixes added in:

char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;isspace(*beg);beg++);

if(*beg=='\0') {

*text='\0';
return text;
}

length=strlen(beg);

while(length>0)
if(!isspace(*(beg+(--length)))) break;

*(beg+(++length))='\0';

return memmove(text,beg,length+1);
}
 
B

Ben Bacarisse

rio said:
I sort of took a different approach based on the comments here.

#include <ctype.h>
#include <string.h>

char* TrimCString(char *str)
{
// Trim whitespace from beginning:
size_t i = 0;
size_t length = strlen(str);
char* new_start;

if((str == NULL) || (length == 0))
{
return str;
}

while(isspace((unsigned char)str))
{
i++;
}

new_start = str + i;
length -= i;

// Trim whitespace from end:
i = (length ? (length - 1) : length);

if(i != 0)
{
while(isspace((unsigned char)str))
{
i--;
length--;
}
str[i + 1] = '\0';
}
memmove(str, new_start, (length + 1));
return str;
}


what happen on this " "
in the last memmove?


Can't you work it out?
 
L

lovecreatesbea...

this is my code:
$ cat a.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char *str_reverse(char *p)
{
    char *p1, *p2, ch;
    for (p1 = p; *(p1 + 1); p1++) ;
    for (p2 = p; p2 < p1; p2++, p1--)
        ch = *p2, *p2 = *p1, *p1 = ch;
    return p;
}
char *str_trim(char *s)
{
        char *p;
        p = s + strlen(s) - 1;
        while (isspace(*p)) *p-- = '\0';
        return s;
}
int main (int argc, char *argv[])
{
        printf("\"%s\"\n", argv[1]);
        printf("\"%s\"\n", str_trim(argv[1]));
        printf("\"%s\"\n",
str_reverse(str_trim(str_reverse(argv[1]))));
        return EXIT_SUCCESS;
}

[ ... ]

Your program misbehaves when invoked with no arguments or with an empty
string argument.

It was. I've revised the main() function.

$ cat a.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *reverse(char *p)
{
char *p1, *p2, ch;

for (p1 = p; *(p1 + 1); p1++) ;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}

char *trim(char *s)
{
char *p;

p = s + strlen(s) - 1;
while (isspace(*p)) *p-- = '\0';
return s;
}

int main (int argc, char *argv[])
{
char *s;

if (argc != 2){
printf("Usage: %s <string>\n", argv[0]);
return EXIT_FAILURE;
}
s = malloc(strlen(argv[1]) + 1);
if (!s){
printf("Failed to allocate memory!\n");
return EXIT_FAILURE;
}
strcpy(s, argv[1]);
printf("\"%s\"\n", s);
printf("\"%s\"\n", trim(s));
printf("\"%s\"\n", reverse(trim(reverse(s))));
free(s);
return EXIT_SUCCESS;
}
$ make
gcc -ansi -pedantic -Wall -W -c -o a.o a.c
gcc a.o -o a.out
$ ./a.out
Usage: ./a.out <string>
$ ./a.out ""
""
""
""
$ ./a.out " hello world ! "
" hello world ! "
" hello world !"
"hello world !"
$
 
B

Ben Bacarisse

Bill Reid said:
I'm back with the most-efficient code to do this, except of course
it depends on the "system", with all recommended fixes added in:

char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;isspace(*beg);beg++);

I am curious about why you don't put in the cast? Do you not accept
that it should be there?
if(*beg=='\0') {

*text='\0';
return text;
}

length=strlen(beg);

while(length>0)

This test is now redundant
if(!isspace(*(beg+(--length)))) break;

and this is overly complex. At this point you know for certain that
length > 0 and that there is a non-space character "in front" of
beg[length]. There is no need to test while (length > 0) or to break
when you find it:

while (isspace((unsigned char)beg[--length]));

(I prefer [] to * and + and I want to put the cast in).
 
B

Bill Reid

Ben Bacarisse said:
I am curious about why you don't put in the cast? Do you not accept
that it should be there?

Yeah, I haven't received a RATIONAL explanation yet as to why
it MUST be there...it doesn't seem to be needed on MY "system"...
This test is now redundant

I realized that exactly one picosecond after I pressed "Send"...
if(!isspace(*(beg+(--length)))) break;

and this is overly complex. At this point you know for certain that
length > 0 and that there is a non-space character "in front" of
beg[length]. There is no need to test while (length > 0) or to break
when you find it:

while (isspace((unsigned char)beg[--length]));

(I prefer [] to * and + and I want to put the cast in).

"Preferences" and "wants" aside, you are correct, sir...

So that now leaves us with:

char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;isspace(*beg);beg++);
if(*beg=='\0') { *text='\0'; return text; }
length=strlen(beg);
while(!isspace(*(beg+(--length))));
*(beg+(++length))='\0';
return memmove(text,beg,length+1);
}

....with some preferential whitespace adjustments to reduce it
to what appears to be six lines of code...
 
B

Ben Bacarisse

Bill Reid said:
Yeah, I haven't received a RATIONAL explanation yet as to why
it MUST be there...it doesn't seem to be needed on MY "system"...

What I am missing is a rational explanation of why you would leave it
out. What do you gain in exchange for this loss of portability?
 
K

Kaz Kylheku

I'm back with the most-efficient code to do this, except of course
it depends on the "system", with all recommended fixes added in:

I can't imagine a useful program whose overall performance can be measurably
affected by the performance of a string trimming function.
char *remove_beg_end_non_text(char *text) {
char *beg;
size_t length;

for(beg=text;isspace(*beg);beg++);

if(*beg=='\0') {

*text='\0';
return text;
}

length=strlen(beg);

while(length>0)
if(!isspace(*(beg+(--length)))) break;

*(beg+(++length))='\0';

return memmove(text,beg,length+1);
}

Programer efficiency first. Know thy standard library.

#include <string.h>

char *trim_destructively(char *str)
{
const char *space_chars = " \t\n\v\f\b";
char *skip_space = str + strspn(str, space_chars);
char *skip_nonspace = skip_space + strcspn(skip_space, space_chars);
*skip_nonspace = 0;
return skip_space;
}

memmoving alternative:

char *trim_destructively(char *str)
{
const char *space_chars = " \t\n\v\f\r";
char *skip_space = str + strspn(str, space_chars);
int nonspace_len = strcspn(skip_space, space_chars);
skip_space[nonspace_len] = 0;
return memmove(str, skip_space, nonspace_len + 1);
}

Doh!
 
L

lovecreatesbea...

But the problem is in the str_trim function, not main:




You can't do this is strlen(s) is 0.


and this makes matter even worse.

Both functions had problems. Revised:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *reverse(char *p)
{
char *p1, *p2, ch;

if (!*p || !p) return p;
for (p1 = p; *(p1 + 1); p1++) ;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}

char *trim(char *s)
{
char *p;

if (!*s || !s) return s;
p = s + strlen(s) - 1;
while (isspace(*p)) *p-- = '\0';
return s;
}

int main (int argc, char *argv[])
{
char *s;

if (argc != 2){
printf("Usage: %s <string>\n", argv[0]);
return EXIT_FAILURE;
}
s = malloc(strlen(argv[1]) + 1);
if (!s){
printf("Failed to allocate memory!\n");
return EXIT_FAILURE;
}
strcpy(s, argv[1]);
printf("\"%s\"\n", s);
printf("\"%s\"\n", trim(s));
printf("\"%s\"\n", reverse(trim(reverse(s))));
free(s);
return EXIT_SUCCESS;
}
 
I

Ike Naar

Programer efficiency first. Know thy standard library.

#include <string.h>

char *trim_destructively(char *str)
{
const char *space_chars = " \t\n\v\f\b";
char *skip_space = str + strspn(str, space_chars);
char *skip_nonspace = skip_space + strcspn(skip_space, space_chars);
*skip_nonspace = 0;
return skip_space;
}

That would transform " foo bar " to "foo".

Ike
 
C

CBFalconer

.... snip ...

Both functions had problems. Revised:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *reverse(char *p) {
char *p1, *p2, ch;

if (!*p || !p) return p;
for (p1 = p; *(p1 + 1); p1++) ;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}

char *trim(char *s) {
char *p;

if (!*s || !s) return s;
p = s + strlen(s) - 1;
while (isspace(*p)) *p-- = '\0';
return s;
}

int main (int argc, char *argv[]) {
char *s;

if (argc != 2){
printf("Usage: %s <string>\n", argv[0]);
return EXIT_FAILURE;
}
s = malloc(strlen(argv[1]) + 1);
if (!s) {
printf("Failed to allocate memory!\n");
return EXIT_FAILURE;
}
strcpy(s, argv[1]);
printf("\"%s\"\n", s);
printf("\"%s\"\n", trim(s));
printf("\"%s\"\n", reverse(trim(reverse(s))));
free(s);
return EXIT_SUCCESS;
}

Now turn it around. Make the trim function remove the leading
blanks, rather than trailing blanks (much simpler), and adjust the
rest accordingly. The basic action will be:

if (s)
while (' ' == *p) p++;
return p;

This is an exercise, since in practice the added efficiency won't
matter.
 
B

Ben Bacarisse

Both functions had problems. Revised:

They both still do.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *reverse(char *p)
{
char *p1, *p2, ch;

if (!*p || !p) return p;

There is no point in testing p after *p. You want:

if (!p || !*p) return p;

or NULL will still not be handled correctly. (You are not obliged to
handle NULL -- I often leave it as part of the contract between caller
and callee that NULL will not be passed -- but you seem to want to
handle a NULL correctly or you would not have tested for it at all.)
for (p1 = p; *(p1 + 1); p1++) ;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}

char *trim(char *s)
{
char *p;

if (!*s || !s) return s;

See above.
p = s + strlen(s) - 1;
while (isspace(*p)) *p-- = '\0';

Here you can still generate in invalid pointer. All pointer/array
loops that run backwards need extra care in C. Think of --
(particularly postfix --) as a red flag the signals danger.
 
L

lovecreatesbea...

Thank santosh, Ben and the Group

There is no point in testing p after *p. You want:

if (!p || !*p) return p;

The order doesn't matter if the test against p is trivial:) I'd like
to keep it and take you suggestion. My code then becomes the same as
the one in Jens' post at the beginning of this thread.
or NULL will still not be handled correctly. (You are not obliged to
handle NULL -- I often leave it as part of the contract between caller
and callee that NULL will not be passed -- but you seem to want to
handle a NULL correctly or you would not have tested for it at all.)

Yes, even p1 doesn't equal to NULL, it's not a valid pointer
immediately.

char *p1, *p2 = NULL;
See above.


Here you can still generate in invalid pointer. All pointer/array
loops that run backwards need extra care in C. Think of --
(particularly postfix --) as a red flag the signals danger.

Do you mean pointing a pointer at an address doesn't belong to one's
own code isn't guaranteed even if no dereference occured. Code
Revised below,

#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* reverse a string */
char *reverse(char *s)
{
char *p1, *p2, c;

if (!s || !*s) return s;
for (p1 = s; *(p1 + 1); p1++) ;
for (p2 = s; p2 < p1; p2++, p1--)
c = *p2, *p2 = *p1, *p1 = c;
return s;
}

/* trim the trailing spaces of a string */
char *trim(char *s)
{
char *p;

if (!s || !*s) return s;
for (p = s + strlen(s) - 1; p != s; p--)
if (isspace(*p)) *p = '\0';
if (isspace(*p)) *p = '\0';
return s;
}
 
B

Barry Schwarz

Thank santosh, Ben and the Group



The order doesn't matter if the test against p is trivial:) I'd like
to keep it and take you suggestion. My code then becomes the same as
the one in Jens' post at the beginning of this thread.

The order certainly does matter. If p is NULL, which is one of the
things you are checking, then *p invokes undefined behavior. Ben's
code takes advantage of the short circuit nature of || and if p is
NULL then *p is never evaluated.



Remove del for email
 
L

lovecreatesbea...

void TrimCString(char *str)
{
size_t i = 0;
while(isspace(str))


Bug: If `str' contains negative-valued characters, this use
may violate the "contract" of isspace() by passing an out-of-range
argument.


The N1124 May 2005 draft shows that isspace() accepts an int. How is
an negative value an out-of-range argument?
Use `while (isspace( (unsigned char)str ))'. (This
is one of those occasions where a cast is almost always required
and almost always omitted, as opposed to the more usual situation
where a cast is almost always inserted and almost always wrong.)


The C-FAQ 13.6 <http://c-faq.com/lib/strtok.html> shows two example
calls on isspace() with argument type of char without any cast.

I still don't understand why a cast is needed! stupid me. Can someone
please inspire me?
 
V

vippstar

Use `while (isspace( (unsigned char)str ))'. (This
is one of those occasions where a cast is almost always required
and almost always omitted, as opposed to the more usual situation
where a cast is almost always inserted and almost always wrong.)


The C-FAQ 13.6 <http://c-faq.com/lib/strtok.html> shows two example
calls on isspace() with argument type of char without any cast.


A bug perhaps?
I still don't understand why a cast is needed! stupid me. Can someone
please inspire me?

Imagine that isspace is implemented as this in an ASCII system

char _Table[257] = { EOF, 0, 0, 0, .... };
char *_Ptable = _Table + 1;
int isspace(int c) { return _Ptable[c]; }
#define isspace(c) _Ptable[c]

In such case, assuming char is signed, c = getchar(); might be a
negative value, let's assume -12. isspace(-12) is _Ptable[-12], which
is _Table[-11]. Undefined behavior is invoked.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,480
Members
44,900
Latest member
Nell636132

Latest Threads

Top