Quandary with the following C code (Intermediate)

BMarsh · Jan 12, 2005

Hi all,

I have a slight problem understanding the following code that I saw on
a Unix-PAM tutorial (not OT!)

The following code will compare an old string to a new one, bombing
out if 'max' similar chars is exceeded.

------8<------

static
int compare(unsigned char *old, unsigned char *new, int max)
{
unsigned char in_old[256];
int equal = 0;

(void)memset(in_old, 0, sizeof (in_old));

while (*old)
in_old[*(old++)]++;

while (*new) {
if (in_old[*new])
equal++;
new++;
}

if (equal > max)
return (1);

return (0);
}
------->8---------

I fail to see how the 2 strings are compared for character equality,
especially in how the

in_old[*(old++)]++;

line is used.
Could anyone please shed some light on this for me?

cheers

Bry

Rob van der Leek · Jan 12, 2005

Hi Bry,

I have a slight problem understanding the following code that I saw on
a Unix-PAM tutorial (not OT!)

The following code will compare an old string to a new one, bombing
out if 'max' similar chars is exceeded.

------8<------

static
int compare(unsigned char *old, unsigned char *new, int max)
{
unsigned char in_old[256];
int equal = 0;

(void)memset(in_old, 0, sizeof (in_old));

while (*old)
in_old[*(old++)]++;

while (*new) {
if (in_old[*new])
equal++;
new++;
}

if (equal > max)
return (1);

return (0);
}
------->8---------

I fail to see how the 2 strings are compared for character equality,
especially in how the

in_old[*(old++)]++;

The numerical character value of each character in the first input
string is used as an index for an array that counts the occurrences of
that character. Think about it like this: when the input string is "aab"
the first while loop does: in_old['a']++, in_old['a']++, in_old['b']++.

The second while loop checks for each character in the second input
string if it occurred in the first input string.

The first while loop could also be written as:

while (*old) {
    in_old[*old]++;
    old++;
}
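
To make the "aab" example concrete, here is a minimal sketch of the tallying step on its own. The helper name tally and the use of size_t counters are assumptions made for illustration; they are not part of the tutorial code.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in the original code): fill counts[], which
   must have UCHAR_MAX + 1 entries, with the number of times each
   character value occurs in s. */
static void tally(const char *s, size_t counts[UCHAR_MAX + 1])
{
    const unsigned char *p = (const unsigned char *)s;

    memset(counts, 0, (UCHAR_MAX + 1) * sizeof counts[0]);
    while (*p) {
        counts[*p]++;   /* the character's value is the array index */
        p++;            /* then move on to the next character */
    }
}
```

After tally("aab", counts), counts['a'] is 2, counts['b'] is 1, and every other entry is still 0.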

Regards,

BMarsh · Jan 12, 2005

Hi Rob,

Many thanks for your answer; it's cleared it up for me! I was totally
thrown off by the way the loop was written.

Thanks again,

Bryan.

Richard Bos · Jan 13, 2005

BMarsh said:
The following code will compare an old string to a new one, bombing
out if 'max' similar chars is exceeded.

It doesn't do a compare the usual way. That is, it does something
completely different from strcmp().

(Oh, btw, if you insist on posting through Google-Broken-Beta, it would
be a good thing if you could get it not to strip all indentation. Your
code is hard to read this way.)

static
int compare(unsigned char *old, unsigned char *new, int max)
{
unsigned char in_old[256];

First of all, you need to use UCHAR_MAX here, instead of 256. If you
don't, you may try to run this code on a Unicode system some day, and be
surprised when your function scribbles all over memory when you pass it
a string with Unicode characters over 256 in it.

int equal = 0;

(void)memset(in_old, 0, sizeof (in_old));

Lose the cast. It does no good, and clutters up the code.

while (*old)
in_old[*(old++)]++;

This tallies the number of occurrences of each separate character value
in the first string. There's a bug in it: what happens if you pass it a
string of UCHAR_MAX + 1 'a's?
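
A rough sketch of that bug, and one possible way around it. Each tally is an unsigned char, so it wraps back to 0 once a character has been seen UCHAR_MAX + 1 times, and the second loop then treats that character as absent. The names tally_repeats and mark_seen are hypothetical, and recording presence instead of a count is only one option, chosen here because the second loop only tests for non-zero.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: the value an unsigned char tally ends up with
   after the same character has been counted n times. */
static unsigned char tally_repeats(size_t n)
{
    unsigned char counter = 0;
    size_t i;

    for (i = 0; i < n; i++)
        counter++;      /* wraps from UCHAR_MAX back to 0 */
    return counter;
}

/* Possible fix (an assumption, not the tutorial's code): store a
   presence flag rather than a count, so nothing can wrap. */
static void mark_seen(unsigned char seen[UCHAR_MAX + 1],
                      const unsigned char *s)
{
    memset(seen, 0, UCHAR_MAX + 1);
    while (*s)
        seen[*s++] = 1;
}
```

tally_repeats(UCHAR_MAX) still returns UCHAR_MAX, but tally_repeats((size_t)UCHAR_MAX + 1) returns 0.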

while (*new) {
if (in_old[*new])
equal++;
new++;

(See what I mean about the indentation?)

This checks each character in the second string, and if there were any
of the same character at all in the first string, counts it as "equal".

}

if (equal > max)
return (1);

return (0);

If the number of "equal" characters, that is, the number of chars in the
second string of which there was at least one in the first string,
exceeds the passed-in maximum, return 1, else 0. This could be more
easily written as

return (equal>max);

I fail to see how the 2 strings are compared for character equality,

So do I; they're not.

Note, in particular, the different treatment of "old" and "new".

For example, try to explain the discrepancy between

compare("abc", "dbbbe", 2)

and

compare("dbbbe", "abc", 2)

Then, when you want an exercise I can't solve, try to explain _why_
someone would write a function like that, and then call it simply
"compare". The logic escapes me, I'm afraid. It's reasonably clear to me
_what_ this function does, but not why.
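
For anyone who wants to check the two calls above, here is a sketch with the posted function re-indented (same logic, const added, the final test folded into the return). The wrapper compare_str is hypothetical and exists only so that plain string literals can be passed without a pointer-type warning.

```c
#include <string.h>

/* The function from the original post, re-indented; same logic. */
static int compare(const unsigned char *old, const unsigned char *new,
                   int max)
{
    unsigned char in_old[256];
    int equal = 0;

    memset(in_old, 0, sizeof in_old);

    while (*old)                /* tally the characters of old */
        in_old[*old++]++;

    while (*new) {              /* count chars of new that occur in old */
        if (in_old[*new])
            equal++;
        new++;
    }

    return equal > max;
}

/* Hypothetical wrapper for calling compare() with string literals. */
static int compare_str(const char *old, const char *new, int max)
{
    return compare((const unsigned char *)old,
                   (const unsigned char *)new, max);
}
```

With old = "abc" and new = "dbbbe", each of the three 'b's in new counts once, so equal is 3 and the result is 1. With the arguments swapped, only the single 'b' of "abc" is found in "dbbbe", so equal is 1 and the result is 0.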

especially in how the

in_old[*(old++)]++;

The index entry corresponding to the character at the _current_ value of
old is increased (that is, the character now under the old pointer is
tallied); and old is moved to the next character. Not necessarily in
that order, or in any order at all, but since (old++) returns the old
value of old (so to speak) no matter which order is chosen, it doesn't
matter for the result.
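
A small sketch of one step of that expression, for illustration only. The helper step is hypothetical; it takes a pointer to the caller's pointer so that the effect on both the table and the pointer can be observed afterwards.

```c
#include <stddef.h>

/* Hypothetical helper: perform one step of in_old[*(old++)]++ on the
   caller's pointer and return the index that was incremented. */
static unsigned char step(const unsigned char **old, unsigned char *in_old)
{
    const unsigned char *before = *old;  /* where old pointed on entry */

    in_old[*((*old)++)]++;  /* index uses the value old had before ++ */
    return *before;
}
```

Starting with the pointer at the 'a' of "ab", one call increments in_old['a'], leaves in_old['b'] untouched, and advances the pointer to the 'b'.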

Richard

infobahn · Jan 13, 2005

Richard said:
static
int compare(unsigned char *old, unsigned char *new, int max)
{
unsigned char in_old[256];

First of all, you need to use UCHAR_MAX here, instead of 256.

I think you mean "UCHAR_MAX + 1"

If you
don't, you may try to run this code on a Unicode system some day, and be
surprised when your function scribbles all over memory when you pass it
a string with Unicode characters over 256 in it.

I think you mean "over 255"

<snip>
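
A minimal sketch of the corrected sizing, assuming the usual case where UCHAR_MAX + 1 does not overflow. An unsigned char can hold any value from 0 to UCHAR_MAX inclusive, so the table needs UCHAR_MAX + 1 entries. The helper occurs_in is hypothetical.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: returns 1 if character value c occurs in s,
   otherwise 0. */
static int occurs_in(const unsigned char *s, unsigned char c)
{
    unsigned char seen[UCHAR_MAX + 1];  /* valid indices: 0..UCHAR_MAX */

    memset(seen, 0, sizeof seen);
    while (*s)
        seen[*s++] = 1;
    return seen[c];         /* c == UCHAR_MAX is still in bounds */
}
```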

Francois Grieu · Jan 14, 2005

infobahn said:
Richard said:

BMarsh said:

unsigned char in_old[256];

First of all, you need to use UCHAR_MAX here, instead of 256.

I think you mean "UCHAR_MAX + 1"

Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
equal to UINT_MAX, say both 0xFFFF ?

Francois Grieu

Richard Bos · Jan 14, 2005

Francois Grieu said:
infobahn said:

Richard said:

unsigned char in_old[256];

First of all, you need to use UCHAR_MAX here, instead of 256.

I think you mean "UCHAR_MAX + 1"

Yes (and yes).

Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
equal to UINT_MAX, say both 0xFFFF ?

In theory, yes. In practice, systems where SCHAR_MAX == INT_MAX or
UCHAR_MAX==UINT_MAX have so many problems that I wouldn't bother to
cater for them. Anyone porting code to that kind of implementation knows
he's getting into a hornets' (or mare's <g>) nest, and should take all
necessary precautions himself.
(And why stop there? What if UCHAR_MAX==ULONG_MAX? Could happen
(probably does happen) on a 32-bit embedded processor.)

Richard

infobahn · Jan 14, 2005

Francois said:
Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
equal to UINT_MAX, say both 0xFFFF ?

Good spot, although I think we'd have to lump such an implementation
in with the DS9K.

Actually, this really is a problem on CSILP32 systems such as
(some) DSPs, and the L suffix doesn't help on such systems.
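
Since no suffix rescues UCHAR_MAX + 1 on such systems, one option is to refuse to compile there rather than let the table size wrap to 0. This is only a sketch of that idea, not something proposed in the thread; the macro TABLE_SIZE and the function table_size are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption for this sketch: a table indexed by character value is
   only sensible when UCHAR_MAX + 1 fits in an unsigned int. The
   preprocessor evaluates this in its widest integer type. */
#if UCHAR_MAX >= UINT_MAX
#error "UCHAR_MAX + 1 would wrap: unsigned char is too wide for this table"
#endif

#define TABLE_SIZE (UCHAR_MAX + 1)  /* cannot wrap, thanks to the guard */

/* Hypothetical accessor: number of entries such a table needs. */
static size_t table_size(void)
{
    return TABLE_SIZE;
}
```

On an implementation where UCHAR_MAX equals UINT_MAX the build stops at the #error, which at least makes the porting problem visible.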
