Highly efficient string reversal code

R

Richard

Richard said:
Which is?

And which now makes me ask if this is UDB too:

unsigned int i=0;
while(i--){
}

While I would not normally do such things I know they exist all over the
place. Keeping in mind that the -- value is never actually used.
 
K

Keith Thompson

Richard said:
Which is?

Described in detail in C99 6.5.6p8.

Given:

int arr[5];
int *ptr = &arr[0];

ptr+0 is the address of the first element of the array.
ptr+4 is the address of the last element of the array.
ptr+5 points just past the last element. This address may legally be
computed, compared, etc., but it cannot be dereferenced (more
precisely, attempting to dereference it invokes UB).

Attempting to compute ptr+N, for N outside the range 0..5, invokes
undefined behavior, whether you subsequently attempt to dereference
the resulting pointer value or not.
 
K

Keith Thompson

Richard said:
And which now makes me ask if this is UDB too:

(UDB meaning Undefined Behavior; the more common abbreviation around
here is UB.)
unsigned int i=0;
while(i--){
}

While I would not normally do such things I know they exist all over the
place. Keeping in mind that the -- value is never actually used.

No, why would you think the behavior might be undefined? Decrementing
an unsigned object with the value 0 simply sets it to UINT_MAX.
 
B

Ben Bacarisse

Richard said:
Why wouldn't it? (and I acknowledge the correction btw).

Had I run this through a debugger with s as an empty string (I must
admit I had assumed s had to be non null and a non empty string but fair
enough point you raise) I would spot it in about 1 second when e is
calculated.

I don't see how, but maybe your debugger is clever. If you take the
view that a pointer is a number (it's not but you have suggested that
simplification here recently) e will be an address one less that s,
the while won't trigger and nothing will (correctly) be done. How can
testing show that such an e is invalid and can't even be constructed
let alone compared to s?
But correction (still no null check):


#include <stdio.h>
#include <string.h>

char * my_strrev(char *s){
char t;
char *e = s+strlen(s);
while(e-->s){
t = *s;
*s++ = *e;
*e = t;
}
}
Is that now defined? or, after a 14 hour day, am I talking rubbish here?

Can e not be decremented to a value "smaller" than s?

It can't be and your code still does so. You could write:

char *end = str + strlen(str);
while (end > str) {
char tmp = *str;
*str++ = *--end;
*end = tmp;
}

although this swaps characters that don't need to be swapped. I think
CBFalconer's test against length being > 1 is a good one:

size_t length = strlen(str);
if (length > 1) {
char *end = str + length;
while (--end > str) {
char tmp = *str;
*str++ = *end;
*end = tmp;
}
}

I've put declarations where I prefer and used my own pet names but it
is essentially the same code.
 
R

Richard

Ben Bacarisse said:
I don't see how, but maybe your debugger is clever. If you take the
view that a pointer is a number (it's not but you have suggested that
simplification here recently) e will be an address one less that s,

I pointed out that on every single system I have used (and this is one)
a pointer is indeed represented as a number of sorts in the
debugger.

In addition, since the pointers are char * the default is also to
display the strings they point to.

What I would have noticed in the debugger was that e was initialised to
something less than s in the case of empty strings. I also posted the
caveat that I (foolishly) did not consider empty strings and this was
indeed an error on my part. This I would have considered an error
despite me being far more liberal on UDB than many here and that being
something I know I can improve on.
the while won't trigger and nothing will (correctly) be done. How can
testing show that such an e is invalid and can't even be constructed
let alone compared to s?



It can't be and your code still does so. You could write:

char *end = str + strlen(str);
while (end > str) {
char tmp = *str;
*str++ = *--end;
*end = tmp;
}

I think I did this in another reply. But did the "--" in the while.

I like the code above.
although this swaps characters that don't need to be swapped. I think
CBFalconer's test against length being > 1 is a good one:

Personally I still dont see the point : the code above is concise.
 
C

CBFalconer

Ben said:
Not "magic" but it has undefined behaviour on empty strings.

It also has the fatal error (to me) of returning a char*. This
encourages the silly mistake of trying to print a string and its
reverse with:

printf("%s & %s\n", s, strrev(s));

In fact it is even worse than that - it fails to return the
declared object. My version returns strlen, which it needs to
evaluate to find the end of the string, and thus avoids the caller
needing to call it again. Also note that my code returns a NULL if
the passed string is NULL, and correctly handles strings of length
0 and 1.
 
L

lovecreatesbea...

It's undefined when attempting to reverse a zero length string.

How to reverse a zero length string for it has only a single NULL
terminating character : ) The reverse function quits if this is the
case.
 
L

lovecreatesbea...

How to reverse a zero length string for it has only a single NULL
terminating character : ) The reverse function quits if this is the
case.

My version

char *reverse(char *p)
{
char *p1, *p2, ch;

for (p1 = p; *(p1 + 1); p1++) ;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}
 
B

Ben Bacarisse

Yes, but that is not the problem.
My version

char *reverse(char *p)
{
char *p1, *p2, ch;

for (p1 = p; *(p1 + 1); p1++) ;

If p points to an empty string this, too, is probably UB. It may not
be -- one would have to see the call, but you should not even risk
this. A call like: char test[10]; test[0] = 0; reverse(test); is bad
news!
 
R

Richard

pete said:

OK, I learnt something there. I have been too lazy with certain aspects
(and I'm not referring to the silly zero length error there) of my C
coding and I never, ever considered a line like

while(p-- > s)


to have UDB risks. Probably one of the problems from associating C too
much with being a low level language designed to produce efficient code
tied to the underlying CPU HW and never really giving too much of a hoot
about platform independence.

To get me started on the path of righteousness can you point out where I
should learn/read that in the n1256.txt? As, frankly, I find the
standard document quite daunting and confusing.
 
C

CBFalconer

.... snip ...


How to reverse a zero length string for it has only a single
NULL terminating character : ) The reverse function quits if
this is the case.

Not mine, which was posted earlier. It also considers a NULL
passed in as indicating an empty string, and returns 0 (for input
length).
 
C

CBFalconer

.... snip ...

My version

char *reverse(char *p) {
char *p1, *p2, ch;

for (p1 = p; *(p1 + 1); p1++) ;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}

Much better. It still doesn't handle a NULL input, and has the
error of returning the string pointer rather than the string
length. Which is only an error in optimization, not coding.
 
C

CBFalconer

Ben said:
.... snip ...
My version

char *reverse(char *p) {
char *p1, *p2, ch;

for (p1 = p; *(p1 + 1); p1++) ;

If p points to an empty string this, too, is probably UB. It may
not be -- one would have to see the call, but you should not even
risk this. A call like: char test[10]; test[0] = 0; reverse(test);
is bad news!

I think that is OK. The above for does nothing except set p1 == p.
and this one does nothing because p2 is not < p1. Both are at p.
 
P

Peter Nilsson

5 Address of a function
OK, I learnt something there. I have been too lazy with
certain aspects (and I'm not referring to the silly zero
length error there) of my C coding and I never, ever
considered a line like

while(p-- > s)

to have UDB risks. ...
<snip>
To get me started on the path of righteousness can
you point out where I should learn/read that in the
n1256.txt? As, frankly, I find the standard document
quite daunting and confusing.

Your path to righteousness begins with not slagging off
people who are prepared to do what you can't or won't.

That said, look at 6.5.6 (additive operators),
specifically p8.

"...If both the pointer operand and the result point
to elements of the same array object, or one past the
last element of the array object, the evaluation shall
not produce an overflow; otherwise, the behavior is
undefined."
 
A

arnuld

This is so horrible and faulty that I have to attach the
following. Don't forget to #include <strings.h>. Note that it
returns strlen.

/* ======================= */
/* reverse string in place */
size_t revstring(char *stg)
{
char *last, temp;
size_t lgh;

lgh = strlen(stg);
if (lgh > 1) {
last = stg + lgh; /* points to '\0' */
while (last-- > stg) {
temp = *stg; *stg++ = *last; *last = temp;
}
}
return lgh;
} /* revstring */


I have edited it a little bit to get pointer to pointer as function
argument and combined it with my get_single_word program and it works
little strange though. Also, your function is much easier to read IMHO:


#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>


enum { WORD_SIZE = 28 };
enum { GSW_OK, GSW_ENOMEM, GSW_ENORESIZE } ;


int get_single_word( char** );
size_t revstring( char** );


int main(void)
{
char *word;

while( (GSW_OK == get_single_word(&word)) && (*word) )
{
revstring( &word );
printf("Reversed: [ %s ]\n", word);
}

return 0;
}



/* reverse string in place */
size_t revstring(char **stg)
{
char *last, temp;
size_t lgh;

lgh = strlen(*stg);
if (lgh > 1) {
last = *stg + lgh; /* points to '\0' */
while (last-- > *stg)
{
temp = **stg;
**stg++ = *last;
*last = temp;
}
}
return lgh;
}



int get_single_word( char** ppc )
{
unsigned ele_num;
int ch;
size_t word_length, word_length_interval;
char* word_begin;
char* new_mem;

ele_num = 0;

word_length = WORD_SIZE;
word_length_interval = 2;



*ppc = malloc(word_length * sizeof(**ppc));
word_begin = *ppc;


if( NULL == ppc )
{
return GSW_ENOMEM;

}


while( (EOF != (ch = getchar())) && isspace(ch) )
{
continue; /* trailing whitespace */
}


if( EOF != ch )
{
*word_begin++ = ch;
ele_num = 1;
}


while( (EOF != (ch = getchar())) && (! isspace(ch)) )
{
if( (word_length - 1) == ele_num++ )
{
new_mem = realloc( *ppc, (word_length_interval * word_length * sizeof **ppc) );
*ppc = new_mem;

if( new_mem )
{
word_begin = new_mem + (word_begin - *ppc);
word_length *= word_length_interval;
*ppc = new_mem;
}
else
{
*word_begin = '\0';
return GSW_ENORESIZE;
}
}

*word_begin++ = ch;
}


*word_begin = '\0';

return GSW_OK;
}

=================== OUTPUT =====================
[arnuld@dune ztest]$ gcc -ansi -pedantic -Wall -Wextra reverse-string.c
[arnuld@dune ztest]$ ./a.out
Ben
Reversed: [ neB ]
CBFalconer
Reversed: [ rBFalconeC ]
comp.kang.c++
Reversed: [ +omp.kang.c+c ]
[arnuld@dune ztest]$
 
R

Richard

Peter Nilsson said:
5 Address of a function


Your path to righteousness begins with not slagging off
people who are prepared to do what you can't or won't.

Hold on - I still dont hold with the other version from cbf. Frankly any
prodding he gets is his own fault - he spends 90% of his posts doing
much the same. And I dont slag off correct code either. I do however
slag off ridiculous bullying and silly preening with such claims as "I
dont need a debugger and neither do you" or "your debugger uses numbers
for pointers? Really"? etc etc etc . It makes this group a very tense
and strange place to be at times.

Anyway I appreciate the link.
That said, look at 6.5.6 (additive operators),
specifically p8.

"...If both the pointer operand and the result point
to elements of the same array object, or one past the
last element of the array object, the evaluation shall
not produce an overflow; otherwise, the behavior is
undefined."

Now back to "undefined".

And my code.

Where does "undefined" in any shape or form indicate that there will be
knock on effects outside of the value of the variable in question
considering it is not used anymore? And if that variable is not used
anymore does it really matter? Its like using (or not using rather) an
uninitialised pointer.

Yes, I know "undefined" can mean "anything", but lets stay real here for
a moment. Possibly "undefined" is "defined" to mean "undefined for that
variable in question".

Or?
 
K

Keith Thompson

Richard said:
Now back to "undefined".

And my code.


Where does "undefined" in any shape or form indicate that there will be
knock on effects outside of the value of the variable in question
considering it is not used anymore? And if that variable is not used
anymore does it really matter? Its like using (or not using rather) an
uninitialised pointer.

Yes, I know "undefined" can mean "anything", but lets stay real here for
a moment. Possibly "undefined" is "defined" to mean "undefined for that
variable in question".

Or?

No, "undefined behavior" means exactly that:

behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard
imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving
during translation or program execution in a documented manner
characteristic of the environment (with or without the issuance of
a diagnostic message), to terminating a translation or execution
(with the issuance of a diagnostic message).

In the case of constructing a pointer just before the beginning of an
object, it will very often be harmless. But suppose an implementation
uses segmented memory, and the object just happens to be start at the
very beginning of a segment. And suppose the machine has special
instructions that operate on addresses, and that trap if you attempt
to construct a pointer outside the segment you started from (which
would be quite useful for catching errors). The point is that the
standard is specifically designed to allow such an implementation.

I feel safe in assuming that decrementing a pointer won't actually
make demons fly out of my nose. And if it did, I would complain
bitterly to the the implementer for making the implementation behave
in such a rude manner. But among my many complaints, I would not be
able to claim that the implementation fails to conform to the C
standard.
 
F

Flash Gordon

Richard wrote, On 23/09/08 07:40:
Now back to "undefined".

And my code.


Where does "undefined" in any shape or form indicate that there will be
knock on effects outside of the value of the variable in question
considering it is not used anymore? And if that variable is not used
anymore does it really matter? Its like using (or not using rather) an
uninitialised pointer.

Yes, I know "undefined" can mean "anything", but lets stay real here for
a moment. Possibly "undefined" is "defined" to mean "undefined for that
variable in question".

Or?

It means the behaviour of the program, not just that variable, is undefined.

One reason it might cause the program to abort is that the object
pointed to might be at the beginning of a page or section or whatever
and the processor could trap on decrementing past the beginning of the
page. This would be eminently sensible behaviour and help on detecting
certain classes of bugs. Depending on the processor design it might be
easier to trap on the calculation than the dereference. As to why the
trap might occur at some unexpected time, the optimiser might have
rearranged the code with the assumption that you don't invoke undefined
behaviour by calculating a pointer to before the beginning of an object.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,148
Latest member
ElizbethDa
Top