squeeze function

Nick Keighley

use some white-space!
In this case, yes. Everyone knows s and t are used as names for
strings, assuming they've read K&R. And it's not a huge leap to
interpret r as "returned string".

In general, I am not in favour of one-letter identifiers, but there
are times when verbosity gets in the way.

I worked with someone once who was a great believer in absolutes
("this is the worst thing that could possibly happen!" was my
favourite) and he once told me "never use a single letter variable!".
So I have used them as much as reasonably possible ever since.

Block *block_clear (Block *b)
{
/* code here */
}
 
Phil Carmody

Ben Bacarisse said:
Very neat, but there is a nit to pick: you need to change the type of
squeeze or you need to return something.

Oops - well spotted! It was rather stream-of-consciousness; I'd
forgotten how I'd started by the time I got to the end.

Phil
 
Peter Nilsson

user923005 said:
....

char *squeeze(char *r, char c)
{
    char *w = r;
    char *s = w;
    do {
        *w = *r;
        w += *r != c;
    } while (*r++);
    return s;
}

I'd be surprised if that was optimal for 68K platforms.
The code circumvents a natural addressing mode in a way
that may be very difficult for an optimiser to pick up.
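For comparison, here is a sketch of the more conventional loop, in which each pointer is advanced only by a plain post-increment; that is the pattern the 68K's (An)+ postincrement addressing mode suits. The name squeeze_postinc is hypothetical, and whether a particular 68K compiler really emits (An)+ for it is an assumption, not a measured result.

```c
/* Hypothetical name: squeeze_postinc. A conventional squeeze in which
   both pointers move only by post-increment, the shape that maps onto
   the 68K (An)+ addressing mode. Code generation is not verified here. */
char *squeeze_postinc(char *s, char c)
{
    char *r = s;          /* read pointer */
    char *w = s;          /* write pointer */
    char ch;

    while ((ch = *r++) != '\0') {
        if (ch != c)
            *w++ = ch;    /* keep every character that is not c */
    }
    *w = '\0';            /* terminate the squeezed string */
    return s;
}
```

The write pointer advances only on a kept character, so the conditional branch replaces the computed increment `w += *r != c`.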
 
sfuerst

char *squeeze(char *s, char c)
{
  char *t = s;
  char *r = s;
  for(; *s != '\0'; s++)
  {
    if(*s != c)
    {
      *t++ = *s;
    }
  }
  *t = '\0';
  return r;
}

Or, even more compressed, and extremely non-readable:

char *squeeze(char *s, char c)
{
char *t=s,*r=s;
while((t+=c!=(*t=*s))&&*s++);
return r;
}

Steven
 
Tim Rentsch

sfuerst said:
Or, even more compressed, and extremely non-readable:

char *squeeze(char *s, char c)
{
char *t=s,*r=s;
while((t+=c!=(*t=*s))&&*s++);
return r;
}

Can be made shorter and clearer --

while(*t=*s,t+=c!=*s,*s++);
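Dropped into sfuerst's function, the loop reads as below. The comma operator evaluates its operands left to right with a sequence point between them, so each pass copies a character, advances t only if that character is kept, and then tests the copied character while stepping s. The name squeeze_comma is only for illustration.

```c
/* sfuerst's function with Tim Rentsch's loop (hypothetical name
   squeeze_comma). Each comma-separated step runs left to right. */
char *squeeze_comma(char *s, char c)
{
    char *t = s, *r = s;
    while (*t = *s,        /* 1. copy the current character to t        */
           t += c != *s,   /* 2. keep it (advance t) unless it equals c  */
           *s++)           /* 3. stop once '\0' was copied; advance s    */
        ;
    return r;
}
```

After the terminator is copied, both t and s can end up one past the last char of the array, which is exactly the question raised further down the thread.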
 
user923005

Can be made shorter and clearer --

          while(*t=*s,t+=c!=*s,*s++);

This one is pretty. I would have to say things like this are one of
the reasons that I still love C.
It's almost mathematical in its simplicity at times.
 
Chris M. Thomasson

Tim Rentsch said:
Can be made shorter and clearer --

while(*t=*s,t+=c!=*s,*s++);

Please excuse my ignorance of the C standard, but is it perfectly okay to
increment `s' past its terminating '\0' character? `s' will point to
garbage at the end of the while-loop.
 
user923005

Please excuse my ignorance of the C standard, but is it perfectly okay to
increment `s' past its terminating '\0' character? `s' will point to
garbage at the end of the while-loop.

You can increment one past the end, but you cannot dereference the
resulting pointer (neither read nor write through it).

These are listed as undefined behavior:
- Addition or subtraction of a pointer into, or just beyond, an array
object and an integer type produces a result that does not point into,
or just beyond, the same array object (6.5.6).
- Addition or subtraction of a pointer into, or just beyond, an array
object and an integer type produces a result that points just beyond
the array object and is used as the operand of a unary * operator that
is evaluated (6.5.6).

So I think it is OK as long as you do not produce a pointer outside
the array (or more than one past its end) and do not dereference the
one-past-the-end pointer. I cannot find the exact verbiage, but I seem
to remember it from somewhere.

When you think about it, many for() and while() loops are going to do
this in actual code.
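A minimal sketch of the kind of loop meant here, assuming a hypothetical helper count_char that scans n chars: the loop pointer is compared against a + n, the one-past-the-end address, which may be formed and compared but is never dereferenced. The declaration inside the for statement needs C99 or later.

```c
#include <stddef.h>

/* Hypothetical helper: counts how many of the n chars starting at a
   equal c. end is the one-past-the-end address a + n: forming it and
   comparing against it is allowed, dereferencing it is not. */
size_t count_char(const char *a, size_t n, char c)
{
    const char *end = a + n;
    size_t count = 0;

    /* The loop stops as soon as p == end, so *p is only ever read while
       p still points at one of the n elements. */
    for (const char *p = a; p != end; p++)
        count += (*p == c);

    return count;
}
```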
 
Phil Carmody

Chris M. Thomasson said:
Please excuse my ignorance of the C standard, but is it perfectly okay
to increment `s' past its terminating '\0' character? `s' will point
to garbage at the end of the while-loop.

It's valid to point to an address just beyond the end of any
object. As the \0 is within the object (presumably a char array),
the pointer may point one beyond it. The pointer may not be
dereferenced.

6.5.6 Additive operators
....
[the quoted text of 6.5.6 came through as unreadable characters]

What the heck just happened there with my copy/paste?

Grab n869.txt yourself - at least the 6.5.6 was hand-typed and
preserved.

Phil
 
Flash Gordon

sfuerst said:
Or, even more compressed, and extremely non-readable:

char *squeeze(char *s, char c)
{
char *t=s,*r=s;
while((t+=c!=(*t=*s))&&*s++);
return r;
}

If you want compressed, here is a one liner for you...

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <limits.h>

char *squeeze(char *s,char c)
{
return *s?squeeze(s+1,c):s,*s==c?memmove(s,s+1,strlen(s)),s:s;
}

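int main(void)
{
    char s[] = "---The quick brown fox jumped over the lazy dogs!!!~~";
    char c;
    /* squeeze every char value out of s in turn, printing after each */
    for (c=CHAR_MIN; c<CHAR_MAX; c++)
        printf("%4d (%c): %s\n",c,isprint((unsigned char)c)?c:' ',squeeze(s,c));
    /* c == CHAR_MAX here; done outside the loop so c never overflows */
    printf("%4d (%c): %s\n",c,isprint((unsigned char)c)?c:' ',squeeze(s,c));
    return 0;
}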

OK, so it might be just a tad less efficient...

If you could have a recursive macro it would be even more fun.
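Unrolled into statements, the one-liner does the following (squeeze_rec is a hypothetical name for the expanded copy). It recurses to the terminating '\0' first and only deletes characters on the way back out, which is why it costs a memmove per deleted character and a stack frame per character of input.

```c
#include <string.h>

/* Expanded form of the recursive one-liner (hypothetical name). */
char *squeeze_rec(char *s, char c)
{
    if (*s != '\0')
        squeeze_rec(s + 1, c);          /* squeeze the tail first */

    if (*s == c)
        /* When *s is not '\0', strlen(s) counts s[0] plus the tail, so
           copying strlen(s) bytes from s + 1 also moves the tail's '\0'.
           When c is '\0', strlen(s) is 0 and nothing is copied. */
        memmove(s, s + 1, strlen(s));

    return s;
}
```

memmove rather than memcpy is required because source and destination overlap.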
 
Chris M. Thomasson

Phil Carmody said:
It's valid to point to an address just beyond the end of any
object. As the \0 is within the object (presumably a char array),
the pointer may point one beyond it. The pointer may not be
dereferenced.

Thank you both.

[...]
 
Tim Rentsch

Chris M. Thomasson said:
Please excuse my ignorance of the C standard, but is it perfectly okay
to increment `s' past its terminating '\0' character? `s' will point
to garbage at the end of the while-loop.

It's always legal to form a pointer "just past" any legally
accessible object. The resulting pointer can be used in any of
the regular pointer ways (conversion, comparison, (in)equality
tests, pointer arithmetic) as long as there is no attempt to
access the memory it addresses. The Standard has this footnote
to 6.5.6p9, which I find expresses the principle in a memorable
way (the "one extra byte" sentence):

Another way to approach pointer arithmetic is first to
convert the pointer(s) to character pointer(s): In this
scheme the integer expression added to or subtracted from
the converted pointer is first multiplied by the size of the
object originally pointed to, and the resulting pointer is
converted back to the original type. For pointer
subtraction, the result of the difference between the
character pointers is similarly divided by the size of the
object originally pointed to.

When viewed in this way, an implementation need only provide
one extra byte (which may overlap another object in the
program) just after the end of the object in order to
satisfy the ``one past the last element'' requirements.
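The footnote's model can be written out directly. In this sketch (hypothetical names add_via_char and diff_via_char, with int chosen arbitrarily as the element type) the offset is scaled by sizeof *p on a char pointer and the result is converted back; adding the array length gives the one-past-the-end pointer, which is compared but never dereferenced. It only mirrors the footnote's description; ordinary code should simply write p + n.

```c
#include <stddef.h>

/* p + n computed the footnote's way: convert to char *, add
   n * sizeof(int) bytes, convert back to int *. */
int *add_via_char(int *p, ptrdiff_t n)
{
    char *cp = (char *)p;                /* view the ints as bytes */
    cp += n * (ptrdiff_t)sizeof *p;      /* scale the offset */
    return (int *)cp;                    /* back to the original type */
}

/* a - b the same way: byte difference divided by the element size. */
ptrdiff_t diff_via_char(const int *a, const int *b)
{
    return ((const char *)a - (const char *)b) / (ptrdiff_t)sizeof *a;
}
```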
 
Tim Rentsch

Flash Gordon said:
sfuerst said:
Or, even more compressed, and extremely non-readable:

char *squeeze(char *s, char c)
{
char *t=s,*r=s;
while((t+=c!=(*t=*s))&&*s++);
return r;
}

If you want compressed, here is a one liner for you...

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <limits.h>

char *squeeze(char *s,char c)
{
return *s?squeeze(s+1,c):s,*s==c?memmove(s,s+1,strlen(s)),s:s;
}

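int main(void)
{
    char s[] = "---The quick brown fox jumped over the lazy dogs!!!~~";
    char c;
    /* squeeze every char value out of s in turn, printing after each */
    for (c=CHAR_MIN; c<CHAR_MAX; c++)
        printf("%4d (%c): %s\n",c,isprint((unsigned char)c)?c:' ',squeeze(s,c));
    /* c == CHAR_MAX here; done outside the loop so c never overflows */
    printf("%4d (%c): %s\n",c,isprint((unsigned char)c)?c:' ',squeeze(s,c));
    return 0;
}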

OK, so it might be just a tad less efficient...

Very neat. *applause*

Can be made just a tiny bit shorter, with a somewhat larger tad:

return *s&&squeeze(s+1,c)&&memmove(s,s+(*s==c),strlen(s)),s;

This version has the advantage that the performance is more
predictable. ;)

Of course, the above solutions aren't meant as serious
suggestions, but if we allow ourselves a three-argument
helper function there is a nice compact expression that
also has good run-time performance --

int
squeez3( char *d, const char *s, char c ){
return (*d=*s) && squeez3( d+(*s!=c), s+1, c );
}

char *
squeeze( char *s, char c ){
return squeez3( s, s, c ), s;
}

Unfortunately gcc doesn't (yet?) realize the expression in
squeez3 can be computed using tail recursion. We can do that
using this form instead:

return (*d=*s) ? squeez3( d+(*s!=c), s+1, c ) : 0;
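For reference, this is roughly the loop that tail-call elimination turns squeez3 into: the arguments are updated in place and control returns to the top instead of making a new call, so stack use no longer grows with the length of the string. It is a hand-written equivalent, not gcc output, and the names squeez3_loop and squeeze_loop are hypothetical.

```c
/* squeez3 with the self-call replaced by a jump back to the top. */
int squeez3_loop(char *d, const char *s, char c)
{
    for (;;) {
        if ((*d = *s) == '\0')   /* copied the terminator: done */
            return 0;
        d += *s != c;            /* advance d only if the char is kept */
        s += 1;                  /* always move on to the next char */
    }
}

char *squeeze_loop(char *s, char c)
{
    return squeez3_loop(s, s, c), s;
}
```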

In case there are any gcc developers out there who might be
reading this -- in any integer-type function known to always
return either 0 or 1, an expression such as this one

<condition> && <this-function>( <arguments> )

can be seen as a proper tail call, since it is clearly equivalent
to

<condition> ? <this-function>( <arguments> ) : 0

which gcc already recognizes as tail recursion. (It's
easy to construct sufficient conditions for a function
always returning 0 or 1, looking at all its return
expressions, seeing if the top-most operator is a
comparison or equality operator, &&, ||, ?: with
both branches satisfying the 0/1 property, etc.)
 
Tim Rentsch

io_x said:
with the for loop

void squeeze(char *s, char c)
{char *t;
for(t=s; *t=*s; t+=(*s!=c), ++s);
}

The for() loop variation has the nice property
that the declaration can be folded into the
loop statement:

void
squeeze( char *s, char c ){
for( char *t = s; *t = *s; s++ ) t += *s != c;
}
 
