Highly efficient string reversal code

N

Nick Keighley


[creating a pointer to one-before the start of an array is UB]
(UDB meaning Undefined Behavior; the more common abbreviation around
here is UB.)



No, why would you think the behavior might be undefined?  Decrementing
an unsigned object with the value 0 simply sets it to UINT_MAX.

Richard's post is easier to understand if you remember that he
thinks pointers are integers and that his debugger confirms it.

:)
 
N

Nick Keighley

Hold on - I still dont hold with the other version from cbf. Frankly any
prodding he gets is his own fault - he spends 90% of his posts doing
much the same. And I dont slag off correct code either.

was cbf's code incorrect?
I do however
slag off ridiculous bullying and silly preening with such claims as "I
dont need a debugger and neither do you" or "your debugger uses numbers
for pointers? Really"? etc etc etc . It makes this group a very tense
and strange place to be at times.

well you don't exactly help


<snip>
 
N

Nick Keighley

:


Much better.  It still doesn't handle a NULL input,

since NULL isn't a valid string nor should it.
At best NULL should assert
and has the
error of returning the string pointer rather than the string
length.  Which is only an error in optimization, not coding.

....is only an error of opinion
 
N

Nick Keighley

... snip ...

note: NULL is not a character. '\0' *is* a character.

Not mine, which was posted earlier.  It also considers a NULL
passed in as indicating an empty string, and returns 0 (for input
length).

which I consider a bug
 
N

Nick Keighley

Hello, I'm a highly experienced expert C programmer and I've written
this code to reverse a string in place. I think you could all learn
something from it!

int reverse(char* reverseme){
   int retval=-1;
   if(retval!=NULL){
      int len=strlen(retval){
      if(len>0){
         int half=len>>1;
         for(;retval<half;retval++){
            reverseme[retval]^=reverseme[len-(retval+1)];
            reverseme[len-(retval+1)]=reverseme[retval];
            reverseme[retval]^=reverseme[len-(retval+1)];

for newbies: don't do this. It is a very stupid way to swap two
integers.

         }
      }
   }
   return retval;

}

missing }
 
B

Bartc

Since the subject has come up, here's one I've been waiting for an
excuse to post:
[...]

I tested your reverse function against the simplest reverse function of my
own I could knock up. Mine was 6 to 20 times faster that yours, to reverse
a
77-character string one million times on my machine.

So if there is a point to your version, I haven't quite grasped it.

I'd rather give my version to somebody who's posting an obvious
homework problem than yours.

Nobody mentioned homework. Anyway there are plenty of versions of this
around; my code is practically identical to that in K&R2.
Once you untangle it, it also demonstrates a few code-structure ideas
that are quite useful to have wrapped your brain around (not that
reversing a string in C is really a _good_ way to illustrate those),
and I like to think it provides a nifty example of how to build working
approximations to some high-level abstractions that C doesn't provide
natively.

I've had a brief look, although I really think you need to lay out your code
better.
 
V

vippstar

which I consider a bug

Depends on the documentation. Do you consider it a bug if strlen(NULL)
returns 0?
I'd just document that NULL shouldn't be passed to such function; what
it does after that is not important. (ie returning 0 or dereferencing
a null pointer)
 
J

James Kuyper

Richard said:
Where does "undefined" in any shape or form indicate that there will be
knock on effects outside of the value of the variable in question
considering it is not used anymore? And if that variable is not used
anymore does it really matter? Its like using (or not using rather) an
uninitialised pointer.

Yes, I know "undefined" can mean "anything", but lets stay real here for
a moment. Possibly "undefined" is "defined" to mean "undefined for that
variable in question".

Or?

Or, it means that the compiler is allowed to perform optimizations
elsewhere in your program, based upon the assumption that your code does
not have undefined behavior.

Specifically, your use of 'p--' at that point in the program can be used
by the compiler as an indication that p does NOT point at the start of
the array. As a result, if there is a previous statement that uses 'p',
and the compiler can verify that the value of 'p' should not be changed
between that previous statement and the entry to your while() loop, then
the compiler is free to optimize that previous use of 'p', by making
that same assumption.

For example, if that previous statement were

q = (p+3)-1;

the compiler would be free to rearrange that as

q = (p-1)+3;

Now, I'm not claiming that this is actually an optimization. There's no
obvious benefit to that rearrangement, it's just the simplest example I
could think of, of a legal rearrangement with potentially fatal
consequences. The point is, that because you use 'p--' later on in the
code, the compiler is free to investigate such rearrangements at this
earlier point in the code. If it finds a rearrangement that it prefers
over the original code, it is permitted to use it.

This is true even if the compiler is compiling for a platform where
calculating p-1 would abort the program if p happens to point at the
beginning of a memory segment. The original version of the code is
carefully written to avoid that problem, but because of the later use of
'p--', the compiler is allowed to rearrange it on the assumption that
p-1 has defined behavior. If p-1 and p+3 both have defined behavior,
then the behavior of (p+3)-1 and (p-1)+3 are both guaranteed to be
identical to p+2.

This is an example of why it's a mistake to assume that the consequences
of writing a code construct with undefined behavior are restricted to
the execution of that particular code construct. This is why the
standard standard specifies that it is the behavior of your ENTIRE
program that is undefined, not just the behavior of the particular
construct that makes it undefined.

And yes, this is a "real world" issue; modern compiler technology does
include the ability to perform optimizations at locations in the code
that are far away from the site of the code construct that gives them
permission to perform the optimization.
 
C

Chris Dollin

Depends on the documentation. Do you consider it a bug if strlen(NULL)
returns 0?
I'd just document that NULL shouldn't be passed to such function; what
it does after that is not important. (ie returning 0 or dereferencing
a null pointer)

Or exploding.
 
J

James Kuyper

CBFalconer said:
Ben said:
... snip ...
My version

char *reverse(char *p) {
char *p1, *p2, ch;

for (p1 = p; *(p1 + 1); p1++) ;
If p points to an empty string this, too, is probably UB. It may
not be -- one would have to see the call, but you should not even
risk this. A call like: char test[10]; test[0] = 0; reverse(test);
is bad news!

I think that is OK. The above for does nothing except set p1 == p.

No, it dereferences p+1. If p points at an empty string "", then p+1
points one past the end of the array, and the behavior of dereferencing
p+1 is undefined. Furthermore, if p points at "\0Hello, world!", the
loop given above will leave p1 pointing at the '!', which is not where
it should be pointing.
 
B

Ben Bacarisse

Richard Heathfield said:
Nick Keighley said:


Well, it doesn't exactly matter, since nobody with any sense pays the
slightest attention to what he says.

For the moment (unless you correct me) I'll assume that did not intend
to imply that I (and a few others) don't have any sense. Maybe
reading his posts and correcting as yet uncorrected technical errors
falls somehow below the radar of paying "the slightest attention", or
maybe it was slip of the fingers.
 
B

Ben Bacarisse

Ben Bacarisse said:
Odd definition of "better". You don't mind the undefined behaviour
then?

I should have read the thread to the end (I usually do) since after
pete and James's comments I am sure you understand the UB, but there
is another way it can show up:

char string[100];
string[0] = 0;
reverse(string);

There is no guarantee that *(p1 + 1) is not, in fact, non-zero for all
the remaining 99 characters if the string. The loop will run off the
end.
 
R

Richard

Ben Bacarisse said:
Ben Bacarisse said:
Odd definition of "better". You don't mind the undefined behaviour
then?

I should have read the thread to the end (I usually do) since after
pete and James's comments I am sure you understand the UB, but there
is another way it can show up:

char string[100];
string[0] = 0;
reverse(string);

There is no guarantee that *(p1 + 1) is not, in fact, non-zero for all
the remaining 99 characters if the string. The loop will run off the
end.

I'm surprised this condition raised its head again in the same
thread. Not so much about what is hiding at *p+1 but the case
of zero length strings in general.

I must admit to still being a bit shell shocked to find while(p-- > s) can
cause undefined behaviour. I doubt there's anywhere it actually does
cause an issue but its interesting none the less.
 
J

jameskuyper

Richard wrote:
....
I must admit to still being a bit shell shocked to find while(p-- > s) can
cause undefined behaviour. I doubt there's anywhere it actually does
cause an issue but its interesting none the less.

In another thread recently, at least one real machine was identified
by name as having the characteristic of aborting a program when an
invalid pointer was loaded into a pointer register (not just when the
pointer is dereferenced). p-- when p points to the beginning of a
array is just another way of creating a pointer that might trigger
such behavior. Machines with that characteristic may be rare, but I
would hope by now that you're at least willing to concede that they
exist.
 
L

lovecreatesbea...

char *reverse(char *p)
{
        char *p1, *p2, ch;
        for (p1 = p; *(p1 + 1); p1++) ;

If p points to an empty string this, too, is probably UB.  It may not
be -- one would have to see the call, but you should not even risk
this.  A call like: char test[10]; test[0] = 0; reverse(test); is bad
news!
        for (p2 = p; p2 < p1; p2++, p1--)
                ch = *p2, *p2 = *p1, *p1 = ch;
        return p;
}

Thank you very much. I correct it as follow:

char *reverse(char *p)
{
char *p1, *p2, ch;

for (p1 = p; *p1; p1++) ;
p1--;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}
 
L

lovecreatesbea...

Much better.  It still doesn't handle a NULL input, and has the
error of returning the string pointer rather than the string
length.  Which is only an error in optimization, not coding.

Thanks for read. And I made another mistake. The object one step past
the the end of the array was dereferenced in my old code. I post a
revision and comments are welcome as usual.
 
K

Keith Thompson

Nick Keighley said:
note: NULL is not a character. '\0' *is* a character.

Even though NULL may legally be #define'd as '\0', and even if it's
#define'd as 0 it has the same value and type as '\0'.

But yes, the point is that NULL is *intended* to be used as a pointer
value, not as a character value, and it *can* be defined in such a way
((void*)0) as to make the latter illegal.

[snip]
 
K

Keith Thompson

Depends on the documentation. Do you consider it a bug if strlen(NULL)
returns 0?

Depends on what you mean by "bug".

Anything strlen(NULL) does is consistent with the standard's
requirements. But as a quality-of-implementation issue, it *should*
either be written for maximum efficiency for non-NULL arguments (which
typically means that strlen(NULL) will trap), or it should explicitly
check for NULL and abort (or possibly use some other system-specific
error reporting mechanism).

If I call strlen(NULL), that's a bug in my program. If the
implementation deliberately goes out of its way to hide that from me,
it's not being user-friendly, it's being programmer-hostile.
I'd just document that NULL shouldn't be passed to such function; what
it does after that is not important. (ie returning 0 or dereferencing
a null pointer)

In theory, you're right. In practice, returning 0 is unhelpful.
 
K

Keith Thompson

Nick Keighley said:
Hello, I'm a highly experienced expert C programmer and I've written
this code to reverse a string in place. I think you could all learn
something from it!

int reverse(char* reverseme){
   int retval=-1;
   if(retval!=NULL){
      int len=strlen(retval){
      if(len>0){
         int half=len>>1;
         for(;retval<half;retval++){
            reverseme[retval]^=reverseme[len-(retval+1)];
            reverseme[len-(retval+1)]=reverseme[retval];
            reverseme[retval]^=reverseme[len-(retval+1)];

for newbies: don't do this. It is a very stupid way to swap two
integers.

The xor trick is a stupid way to swap two integers. But the quoted
code doesn't even get the xor trick right (and I suspect it's
deliberate).
missing }

Actually there's an extra '{', and a missing ';', on the declaration
of "len".

Newbies should disregard all the code posted by this person calling
himself "dominantubergeek". More experienced programmers might amuse
themselves by cataloging all the errors (I've seen worse, BTW), but I
think we've covered it suffiently here.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,152
Latest member
LorettaGur
Top