Quandary with the following C code (Intermediate)


BMarsh

Hi all,

I have a slight problem understanding the following code that I saw on
a Unix-PAM tutorial (not OT!)

The following code will compare an old string to a new one, bombing
out if 'max' similar chars is exceeded.


------8<------

static
int compare(unsigned char *old, unsigned char *new, int max)
{
    unsigned char in_old[256];
    int equal = 0;

    (void)memset(in_old, 0, sizeof (in_old));

    while (*old)
        in_old[*(old++)]++;

    while (*new) {
        if (in_old[*new])
            equal++;
        new++;
    }

    if (equal > max)
        return (1);

    return (0);
}
------->8---------

I fail to see how the 2 strings are compared for character equality,
especially in how the

in_old[*(old++)]++;

line is used.
Could anyone please shed some light on this for me?

cheers

Bry
 

Rob van der Leek

Hi Bry,

> I fail to see how the 2 strings are compared for character equality,
> especially in how the
>
> in_old[*(old++)]++;

The numerical character value of each character in the first input
string is used as an index for an array that counts the occurrences of
that character. Think about it like this: when the input string is "aab",
the first while loop does: in_old['a']++, in_old['a']++, in_old['b']++.

The second while loop checks, for each character in the second input
string, whether it occurred in the first input string.

The first while loop could also be written as:

while (*old) {
    in_old[*old]++;
    old++;
}

Regards,
 

BMarsh

Hi Rob,

Many thanks for your answer; it's cleared it up for me! I was totally
thrown off by the way the loop was written.

Thanks again,

Bryan.
 

Richard Bos

BMarsh said:
> The following code will compare an old string to a new one, bombing
> out if 'max' similar chars is exceeded.

It doesn't do a compare the usual way. That is, it does something
completely different from strcmp().

(Oh, btw, if you insist on posting through Google-Broken-Beta, it would
be a good thing if you could get it not to strip all indentation. Your
code is hard to read this way.)
> static
> int compare(unsigned char *old, unsigned char *new, int max)
> {
> unsigned char in_old[256];

First of all, you need to use UCHAR_MAX here, instead of 256. If you
don't, you may try to run this code on a Unicode system some day, and be
surprised when your function scribbles all over memory when you pass it
a string with Unicode characters over 256 in it.
> int equal = 0;
>
> (void)memset(in_old, 0, sizeof (in_old));

Lose the cast. It does no good, and clutters up the code.
> while (*old)
> in_old[*(old++)]++;

This tallies the number of occurrences of each separate character value
in the first string. There's a bug in it: what happens if you pass it a
string of UCHAR_MAX + 1 'a's?
> while (*new) {
> if (in_old[*new])
> equal++;
> new++;

(See what I mean about the indentation?)

This checks each character in the second string, and if there were any
of the same character at all in the first string, counts it as "equal".
> }
>
> if (equal > max)
> return (1);
>
> return (0);

If the number of "equal" characters, that is, the number of chars in the
second string of which there was at least one in the first string,
exceeds the passed-in maximum, return 1, else 0. This could be more
easily written as

return (equal>max);
> I fail to see how the 2 strings are compared for character equality,

So do I; they're not.

Note, in particular, the different treatment of "old" and "new".

For example, try to explain the discrepancy between

compare("abc", "dbbbe", 2)

and

compare("dbbbe", "abc", 2)

Then, when you want an exercise I can't solve, try to explain _why_
someone would write a function like that, and then call it, sec,
"compare". The logic escapes me, I'm afraid. It's reasonably clear to me
_what_ this function does, but not why.
> especially in how the
>
> in_old[*(old++)]++;

The index entry corresponding to the character at the _current_ value of
old is increased (that is, the character now under the old pointer is
tallied); and old is moved to the next character. Not necessarily in
that order, or in any order at all, but since (old++) returns the old
value of old (so to speak) no matter which order is chosen, it doesn't
matter for the result.

Richard
 

infobahn

Richard said:
>> static
>> int compare(unsigned char *old, unsigned char *new, int max)
>> {
>> unsigned char in_old[256];

> First of all, you need to use UCHAR_MAX here, instead of 256.

I think you mean "UCHAR_MAX + 1"
> If you
> don't, you may try to run this code on a Unicode system some day, and be
> surprised when your function scribbles all over memory when you pass it
> a string with Unicode characters over 256 in it.

I think you mean "over 255"

<snip>
 

Richard Bos

Francois Grieu said:
> infobahn said:
>> Richard said:
>>>> unsigned char in_old[256];
>>>
>>> First of all, you need to use UCHAR_MAX here, instead of 256.
>>
>> I think you mean "UCHAR_MAX + 1"

Yes (and yes).
> Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
> equal to UINT_MAX, say both 0xFFFF ?

In theory, yes. In practice, systems where SCHAR_MAX == INT_MAX or
UCHAR_MAX==UINT_MAX have so many problems that I wouldn't bother to
cater for them. Anyone porting code to that kind of implementation knows
he's getting into a hornets' (or mare's <g>) nest, and should take all
necessary precautions himself.
(And why stop there? What if UCHAR_MAX==ULONG_MAX? Could happen
(probably does happen) on a 32-bit embedded processor.)

Richard
 

infobahn

Francois said:
> Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
> equal to UINT_MAX, say both 0xFFFF ?

Good spot, although I think we'd have to lump such an implementation
in with the DS9K. :)

Actually, this really is a problem on CSILP32 systems such as
(some) DSPs, and the L suffix doesn't help on such systems.
 
