Highly efficient string reversal code

N

Nick Keighley

Bug: an instance of program behaviour that is not in the user's
reasonable expectation.

I think it's a bad idea to encourage users to expect defined
behaviour when a functions reasonable pre-condictions are
violated. reverse() operates on strings. NULL is not a string.
reverse(NULL) shouldn't have definend behaviour. This only
encourages people to rely on the behaviour.

There is no defined behaviour so I don't think
anyone should write code that relies on strlen(NULL)
doing anything in particular.

Depends on what you mean by "bug".

Anything strlen(NULL) does is consistent with the standard's
requirements.  But as a quality-of-implementation issue, it *should*
either be written for maximum efficiency for non-NULL arguments (which
typically means that strlen(NULL) will trap),

but may not on a system without memory protection


or it should explicitly
check for NULL and abort (or possibly use some other system-specific
error reporting mechanism).

If I call strlen(NULL), that's a bug in my program.
yes

 If the
implementation deliberately goes out of its way to hide that from me,
it's not being user-friendly, it's being programmer-hostile.

yes. But on a non-memory protected system it may be expensive
to detect.


and it could then return the length of the string it finds
at address zero.

In theory, you're right.  In practice, returning 0 is unhelpful.

yes. And I think reverse(NULL) is equally odd
 
B

Ben Bacarisse

Malcolm McLean said:
Ben Bacarisse said:
There are two problems with a pointer that points "before" the first
element of an array (I have to put "before" in quotes because since
the pointer is ill-formed it is not possible to say where it really
points). The first is that the pointer value might trap. The second
is that you are not permitted to compare it with other pointers -- or
to be more accurate, the comparison is undefined.
Consider this one.

void reverse(char *str)
{
char temp;
int start = 0;
int end = strlen(str) - 1;
while(start < end)
{
temp = str[start];
str[start] = str[end];
str[end] = temp;
start++;
end--;
}
}

Of course strlen() returns a size_t. So we'll replace those ints with
size_t's. What horrible thing have I now done to this function?

I don't get your point. You know the above is wrong and that
switching to size_ t does not make it right, so what are you driving
at?
 
B

Bartc

Ben Bacarisse said:
Malcolm McLean said:
Ben Bacarisse said:
There are two problems with a pointer that points "before" the first
element of an array (I have to put "before" in quotes because since
the pointer is ill-formed it is not possible to say where it really
points). The first is that the pointer value might trap. The second
is that you are not permitted to compare it with other pointers -- or
to be more accurate, the comparison is undefined.
Consider this one.

void reverse(char *str)
{
char temp;
int start = 0;
int end = strlen(str) - 1;
while(start < end)
{
temp = str[start];
str[start] = str[end];
str[end] = temp;
start++;
end--;
}
}

Of course strlen() returns a size_t. So we'll replace those ints with
size_t's. What horrible thing have I now done to this function?

I don't get your point. You know the above is wrong and that
switching to size_ t does not make it right, so what are you driving
at?

What's wrong about it? It works fine, provided str points to a valid string.
 
O

oksid

Bartc said:
Ben Bacarisse said:
Malcolm McLean said:
void reverse(char *str)
{
char temp;
int start = 0;
int end = strlen(str) - 1;
while(start < end)
{
temp = str[start];
str[start] = str[end];
str[end] = temp;
start++;
end--;
}
}

Of course strlen() returns a size_t. So we'll replace those ints with
size_t's. What horrible thing have I now done to this function?

I don't get your point. You know the above is wrong and that
switching to size_ t does not make it right, so what are you driving
at?

What's wrong about it? It works fine, provided str points to a valid
string.

If "str" points to an empty string "strlen()" will return 0...
"end" will be equal to -1 or 0xFFFFFFFF
 
B

Bartc

oksid said:
Bartc said:
Ben Bacarisse said:
void reverse(char *str)
{
char temp;
int start = 0;
int end = strlen(str) - 1;
while(start < end)
{
temp = str[start];
str[start] = str[end];
str[end] = temp;
start++;
end--;
}
}

Of course strlen() returns a size_t. So we'll replace those ints with
size_t's. What horrible thing have I now done to this function?

I don't get your point. You know the above is wrong and that
switching to size_ t does not make it right, so what are you driving
at?

What's wrong about it? It works fine, provided str points to a valid
string.

If "str" points to an empty string "strlen()" will return 0...
"end" will be equal to -1 or 0xFFFFFFFF

So? In that case start<end will be false, and the loop isn't executed.

Unless you're saying int can be unsigned? That would be news to me.

How can anyone write any sort of portable, reliable code when you don't even
know whether 'int' is signed or unsigned.
 
B

Ben Bacarisse

Bartc said:
Ben Bacarisse said:
"Malcolm McLean" <[email protected]> writes:
void reverse(char *str)
{
char temp;
int start = 0;
int end = strlen(str) - 1;
while(start < end)
{
temp = str[start];
str[start] = str[end];
str[end] = temp;
start++;
end--;
}
}

Of course strlen() returns a size_t. So we'll replace those ints with
size_t's. What horrible thing have I now done to this function?

I don't get your point. You know the above is wrong and that
switching to size_ t does not make it right, so what are you driving
at?

What's wrong about it? It works fine, provided str points to a valid string.

There are two cases. If all of the values of size_t (the result of
strlen) can be represented in an int (not likely, but possible) then
the code works. If not, then strlen("") - 1 is a large positive
number that is out of range for an int, and the conversion to int is
implementation defined (which may include raising a signal).
 
L

Lew Pitcher

My contribution to the cause

char *reverse(char *str)
{
char *start, *end, temp;

for (end=str;*end;++end);
--end; /* A */

for (start=str; start < end; ++start, --end) /* B */
{
temp = *start;
*start = *end;
*end = temp;
}
return str;
}

NB: For the degenerate case of a zero-length string, the computation of the
end character ( /* A */ ) produces a pointer that, *if dereferenced* would
encounter UB because it points out-of-bounds to the intended array.
*However*, the reversal loop ( /* B */ ) tests the value of this pointer
(without dereferencing it), and will discontinue the loop immediately
(again, without dereferencing the pointer).

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
B

Ben Bacarisse

Lew Pitcher said:
My contribution to the cause

char *reverse(char *str)
{
char *start, *end, temp;

for (end=str;*end;++end);
--end; /* A */

for (start=str; start < end; ++start, --end) /* B */
{
temp = *start;
*start = *end;
*end = temp;
}
return str;
}

NB: For the degenerate case of a zero-length string, the computation of the
end character ( /* A */ ) produces a pointer that, *if dereferenced* would
encounter UB because it points out-of-bounds to the intended array.

You need to say why you dispute the claim that 6.5.6 paragraph 8 says
the result is undefined behaviour:

"... If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined."
*However*, the reversal loop ( /* B */ ) tests the value of this pointer
(without dereferencing it), and will discontinue the loop immediately
(again, without dereferencing the pointer).

And then you'd have to say why the < operation is defined when most
people read 6.5.8 paragraph 5 as saying that it is not.
 
R

Richard

Lew Pitcher said:
My contribution to the cause

char *reverse(char *str)
{
char *start, *end, temp;

for (end=str;*end;++end);
--end; /* A */

Did you not read the thread section where this is UDB in the case of
zero length strings? A surprise to me too I must admit. And thats even
before you use the illegal value in the comparison below.
for (start=str; start < end; ++start, --end) /* B */
{
temp = *start;
*start = *end;
*end = temp;
}
return str;
}

NB: For the degenerate case of a zero-length string, the computation of the
end character ( /* A */ ) produces a pointer that, *if dereferenced* would
encounter UB because it points out-of-bounds to the intended array.
*However*, the reversal loop ( /* B */ ) tests the value of this pointer
(without dereferencing it), and will discontinue the loop immediately
(again, without dereferencing the pointer).

--
 
B

brix99luftballons

(e-mail address removed) ha scritto:
Hello, I'm a highly experienced expert C programmer and I've written
this code to reverse a string in place. I think you could all learn
something from it!

int reverse(char* reverseme){
int retval=-1;
if(retval!=NULL){
int len=strlen(retval){
if(len>0){
int half=len>>1;
for(;retval<half;retval++){
reverseme[retval]^=reverseme[len-(retval+1)];
reverseme[len-(retval+1)]=reverseme[retval];
reverseme[retval]^=reverseme[len-(retval+1)];
}
}
}
return retval;
}

-Phil/CERisE
Just another little variation :

void reverse(char* pStart)
{
char* pEnd;
int len;

if((NULL==pStart) || (1 >= (len=(strlen(pStart)))))
return;

pEnd= pStart+len;
do{
char t;

pEnd--;
t = *pStart;
*pStart++ = *pEnd;
*pEnd = t;
}while(pStart < pEnd);
}

B.
 
L

Lew Pitcher

You need to say why you dispute the claim that 6.5.6 paragraph 8 says
the result is undefined behaviour:

"... If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined."


And then you'd have to say why the < operation is defined when most
people read 6.5.8 paragraph 5 as saying that it is not.

Oops. My bad. You are right, of course.

So, I'll try again - how about this variation?

char *reverse(char str[])
{
long int start, end;
char temp;

for (end=0;str[end];++end);
--end;

for (start=0; start < end; ++start, --end)
{
temp = str[start];
str[start] = str[end];
str[end] = temp;
}
return &str[0];
}


--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
F

Flash Gordon

Lew Pitcher wrote, On 24/09/08 18:28:
You need to say why you dispute the claim that 6.5.6 paragraph 8 says
the result is undefined behaviour:

"... If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined."

And then you'd have to say why the < operation is defined when most
people read 6.5.8 paragraph 5 as saying that it is not.

Oops. My bad. You are right, of course.

So, I'll try again - how about this variation?

char *reverse(char str[])
{
long int start, end;
char temp;

for (end=0;str[end];++end);

A problem is the string is longer than can be represented by a long int ;-)
--end;

for (start=0; start < end; ++start, --end)
{
temp = str[start];
str[start] = str[end];
str[end] = temp;
}
return &str[0];

Why not "return str"?

char *reverse(char *str)
{
if (str && *str) {
char *end;
char *start;
/* subtract safe as already caught 0 length */
for (start = str, end = strchr(str,0) - 1;
end > start;
start++, end--) {
char temp = *start;
*start = *end;
*end = temp;
}
}
return str;
}

I think the only limits on this would be the limits on the C
implementation and, of course, needing the pointer passed to be a valid
pointer to the first[1] character modifiable string.

[1] Well, the first character of the sub-string you want reversed, you
could leave the start of a string unreversed by passing a pointer to a
later character in the string.
 
F

Flash Gordon

pete wrote, On 25/09/08 00:16:
Flash Gordon wrote:
[1] Well, the first character of the sub-string you want reversed, you
could leave the start of a string unreversed by passing a pointer to a
later character in the string.

[1]Footnote not needed:
Every byte in a string,
is the first byte of some string.

I new that, but without it someone would have complained that it was
valid to pass a pointer to part way through a string :)
My turn!

#include <string.h>

char *str_rev(char *s)
{
char *const p = s;
char *t;
char swap;

if (*s != '\0') {

You've decided to leave s==NULL as undefined behaviour. This *is* a
perfectly valid design decision.
t = s + strlen(s + 1);

Clever, saved yourself a subtraction :)
It will still fail with a really stupidly long string so long that
strlen fails ;-) Now I've just got to find a way to generate such a
string :) Not a problem in practice, but one I deliberately sidestepped
by using strchr.

<snip>
 
C

CBFalconer

Flash said:
pete wrote:
.... snip ...


I new that, but without it someone would have complained that it
was valid to pass a pointer to part way through a string :)

I doubt it. I suspect you will get more 'fix your k key' help.
 
F

Flash Gordon

CBFalconer wrote, On 25/09/08 16:10:
I doubt it.

I don't.
I suspect you will get more 'fix your k key' help.

Nothing wrong with it. There is also a perfectly good reason why my
spelling is sometimes less than perfect, and a spell checker enabled in
my Usenet client and all my email clients to minimise the errors.
 
L

lovecreatesbeauty

*(p1 + 1) is undefined for a zero length string.

It existed in my old reply. Been revised as follow:

char *reverse(char *p)
{
char *p1, *p2, ch;

p1 = p;
if (!*p) return p;
while (*p1) p1++;
p1--;
for (p2 = p; p2 < p1; p2++, p1--)
ch = *p2, *p2 = *p1, *p1 = ch;
return p;
}
 
L

lovecreatesbeauty

unsigned my_strrev(char *s)
{unsigned h;
char *e, t;
if(s==0) return 0;
h=strlen(s);
if((int)h<=0) return 0;
e=s+h-1;
for( ;e>s ;--e,++s)
{t=*s; *s=*e; *e=t;}
return h;

}

not tested
don't know if "if((int)h<=0) return 0;" is right

The length of a string won't be a negative. The cast also isn't
required. The rest of it seems fine :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,789
Messages
2,569,634
Members
45,342
Latest member
Sicuro

Latest Threads

Top