Quandary with the following C code (Intermediate)

Discussion in 'C Programming' started by BMarsh, Jan 12, 2005.

  1. BMarsh

    BMarsh Guest

    Hi all,

    I have a slight problem understanding the following code that I saw on
    a Unix-PAM tutorial (not OT!)

    The following code compares an old string to a new one, bombing
    out if more than 'max' similar chars are found.


    ------8<------

    static
    int compare(unsigned char *old, unsigned char *new, int max)
    {
    unsigned char in_old[256];
    int equal = 0;

    (void)memset(in_old, 0, sizeof (in_old));

    while (*old)
    in_old[*(old++)]++;

    while (*new) {
    if (in_old[*new])
    equal++;
    new++;
    }

    if (equal > max)
    return (1);

    return (0);
    }
    ------->8---------

    I fail to see how the 2 strings are compared for character equality,
    especially in how the

    in_old[*(old++)]++;

    line is used.
    Could anyone please shed some light on this for me?

    cheers

    Bry
    BMarsh, Jan 12, 2005
    #1

  2. Hi Bry,

    In article <>,
    BMarsh wrote:
    > I have a slight problem understanding the following code that I saw on
    > a Unix-PAM tutorial (not OT!)
    >
    > The following code compares an old string to a new one, bombing
    > out if more than 'max' similar chars are found.
    >
    > ------8<------
    >
    > static
    > int compare(unsigned char *old, unsigned char *new, int max)
    > {
    > unsigned char in_old[256];
    > int equal = 0;
    >
    > (void)memset(in_old, 0, sizeof (in_old));
    >
    > while (*old)
    > in_old[*(old++)]++;
    >
    > while (*new) {
    > if (in_old[*new])
    > equal++;
    > new++;
    > }
    >
    > if (equal > max)
    > return (1);
    >
    > return (0);
    > }
    > ------->8---------
    >
    > I fail to see how the 2 strings are compared for character equality,
    > especially in how the
    >
    > in_old[*(old++)]++;


    The numerical character value of each character in the first input
    string is used as an index for an array that counts the occurrences of
    that character. Think about it like this: when the input string is "aab"
    the first while loop does: in_old['a']++, in_old['a']++, in_old['b']++.

    The second while loop checks for each character in the second input
    string if it occurred in the first input string.

    The first while loop could also be written as:

    while (*old) {
    in_old[*old]++;
    old++;
    }
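The tallying step can be tried in isolation. The following is only a sketch: the helper name tally and the unsigned int counter type are assumptions made for illustration, not part of the tutorial code.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: count how often each byte value occurs in s. */
static void tally(const unsigned char *s, unsigned int counts[UCHAR_MAX + 1])
{
    /* Start with every counter at zero. */
    memset(counts, 0, (UCHAR_MAX + 1) * sizeof counts[0]);

    while (*s) {
        counts[*s]++;   /* the character value itself is the array index */
        s++;
    }
}
```

Called with "aab", this should leave counts['a'] at 2, counts['b'] at 1, and every other slot at 0.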

    Regards,
    --
    Rob van der Leek | rob(at)ricardis(dot)tudelft(dot)nl
    Rob van der Leek, Jan 12, 2005
    #2

  3. BMarsh

    BMarsh Guest

    Hi Rob,

    Many thanks for your answer; it's cleared it up for me! I was totally
    thrown off by the way the loop was written.

    Thanks again,

    Bryan.
    BMarsh, Jan 12, 2005
    #3
  4. Richard Bos

    Richard Bos Guest

    "BMarsh" <> wrote:

    > The following code compares an old string to a new one, bombing
    > out if more than 'max' similar chars are found.


    It doesn't do a compare the usual way. That is, it does something
    completely different from strcmp().

    (Oh, btw, if you insist on posting through Google-Broken-Beta, it would
    be a good thing if you could get it not to strip all indentation. Your
    code is hard to read this way.)

    > static
    > int compare(unsigned char *old, unsigned char *new, int max)
    > {
    > unsigned char in_old[256];


    First of all, you need to use UCHAR_MAX here, instead of 256. If you
    don't, you may try to run this code on a Unicode system some day, and be
    surprised when your function scribbles all over memory when you pass it
    a string with Unicode characters over 256 in it.

    > int equal = 0;
    >
    > (void)memset(in_old, 0, sizeof (in_old));


    Lose the cast. It does no good, and clutters up the code.

    > while (*old)
    > in_old[*(old++)]++;


    This tallies the number of occurrences of each separate character value
    in the first string. There's a bug in it: what happens if you pass it a
    string of UCHAR_MAX 'a's?
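The wrap-around can be observed with a single counter. This is a sketch with a hypothetical helper, count_after. As far as I can tell, the counter only returns to zero on the (UCHAR_MAX + 1)th increment, so the string that triggers the bug is one character longer than the one suggested above.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: value of an unsigned char counter after n increments. */
static unsigned char count_after(size_t n)
{
    unsigned char counter = 0;
    size_t i;

    for (i = 0; i < n; i++)
        counter++;   /* storing the result back reduces it modulo UCHAR_MAX + 1 */

    return counter;
}
```

Here count_after(UCHAR_MAX) is still UCHAR_MAX, while count_after((size_t)UCHAR_MAX + 1) is 0, which the second loop of the posted function would read as "this character never occurred".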

    > while (*new) {
    > if (in_old[*new])
    > equal++;
    > new++;


    (See what I mean about the indentation?)

    This checks each character in the second string, and if there were any
    of the same character at all in the first string, counts it as "equal".

    > }
    >
    > if (equal > max)
    > return (1);
    >
    > return (0);


    If the number of "equal" characters, that is, the number of chars in the
    second string of which there was at least one in the first string,
    exceeds the passed-in maximum, return 1, else 0. This could be more
    easily written as

    return (equal>max);

    > I fail to see how the 2 strings are compared for character equality,


    So do I; they're not.

    Note, in particular, the different treatment of "old" and "new".

    For example, try to explain the discrepancy between

    compare("abc", "dbbbe", 2)

    and

    compare("dbbbe", "abc", 2)

    Then, when you want an exercise I can't solve, try to explain _why_
    someone would write a function like that, and then call it, sec,
    "compare". The logic escapes me, I'm afraid. It's reasonably clear to me
    _what_ this function does, but not why.
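To check the two calls by hand, here is the posted function again with its indentation restored. The parameter names old_s and new_s and the const qualifiers are assumptions added for this sketch; the logic is meant to be unchanged.

```c
#include <string.h>

/* The posted function, re-indented; parameters renamed and const-qualified. */
static int compare(const unsigned char *old_s, const unsigned char *new_s, int max)
{
    unsigned char in_old[256];
    int equal = 0;

    memset(in_old, 0, sizeof in_old);

    while (*old_s)
        in_old[*old_s++]++;     /* tally each character of the old string */

    while (*new_s) {
        if (in_old[*new_s])     /* did this character occur in the old string? */
            equal++;
        new_s++;
    }

    return equal > max;
}
```

With old "abc" and new "dbbbe", each of the three 'b's in the new string is counted, so equal is 3 and the result is 1. With the arguments swapped, only the single 'b' of "abc" matches, so equal is 1 and the result is 0.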

    > especially in how the
    >
    > in_old[*(old++)]++;


    The index entry corresponding to the character at the _current_ value of
    old is increased (that is, the character now under the old pointer is
    tallied); and old is moved to the next character. Not necessarily in
    that order, or in any order at all, but since (old++) returns the old
    value of old (so to speak) no matter which order is chosen, it doesn't
    matter for the result.
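Putting the remarks in this post together, one possible revision looks like the sketch below. The name shares_more_than, the renamed parameters, and the use of a 0/1 flag instead of a counter are all assumptions of this sketch, not something taken from the PAM tutorial. The table is sized UCHAR_MAX + 1 so that every unsigned char value has a slot.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical revision of the posted function. */
static int shares_more_than(const unsigned char *old_s,
                            const unsigned char *new_s, int max)
{
    unsigned char in_old[UCHAR_MAX + 1];   /* one slot per possible value */
    int equal = 0;

    memset(in_old, 0, sizeof in_old);

    while (*old_s)
        in_old[*old_s++] = 1;              /* a flag cannot wrap back to zero */

    while (*new_s) {
        if (in_old[*new_s])
            equal++;
        new_s++;
    }

    return equal > max;
}
```

Because only presence is recorded, an old string containing UCHAR_MAX + 1 copies of one character no longer makes that character look absent.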

    Richard
    Richard Bos, Jan 13, 2005
    #4
  5. infobahn

    infobahn Guest

    Richard Bos wrote:
    >
    > "BMarsh" <> wrote:
    >


    <snip>

    > > static
    > > int compare(unsigned char *old, unsigned char *new, int max)
    > > {
    > > unsigned char in_old[256];

    >
    > First of all, you need to use UCHAR_MAX here, instead of 256.


    I think you mean "UCHAR_MAX + 1"

    > If you
    > don't, you may try to run this code on a Unicode system some day, and be
    > surprised when your function scribbles all over memory when you pass it
    > a string with Unicode characters over 256 in it.


    I think you mean "over 255"

    <snip>
    infobahn, Jan 13, 2005
    #5
  6. infobahn <> wrote:

    > Richard Bos wrote:
    > >
    > > "BMarsh" <> wrote:
    > > > unsigned char in_old[256];

    > >
    > > First of all, you need to use UCHAR_MAX here, instead of 256.

    >
    > I think you mean "UCHAR_MAX + 1"


    Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
    equal to UINT_MAX, say both 0xFFFF ?

    Francois Grieu
    Francois Grieu, Jan 14, 2005
    #6
  7. Richard Bos

    Richard Bos Guest

    Francois Grieu <> wrote:

    > infobahn <> wrote:
    >
    > > Richard Bos wrote:
    > > >
    > > > "BMarsh" <> wrote:
    > > > > unsigned char in_old[256];
    > > >
    > > > First of all, you need to use UCHAR_MAX here, instead of 256.

    > >
    > > I think you mean "UCHAR_MAX + 1"


    Yes (and yes).

    > Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
    > equal to UINT_MAX, say both 0xFFFF ?


    In theory, yes. In practice, systems where SCHAR_MAX == INT_MAX or
    UCHAR_MAX==UINT_MAX have so many problems that I wouldn't bother to
    cater for them. Anyone porting code to that kind of implementation knows
    he's getting into a hornets' (or mare's <g>) nest, and should take all
    necessary precautions himself.
    (And why stop there? What if UCHAR_MAX==ULONG_MAX? Could happen
    (probably does happen) on a 32-bit embedded processor.)

    Richard
    Richard Bos, Jan 14, 2005
    #7
  8. infobahn

    infobahn Guest

    Francois Grieu wrote:
    > infobahn <> wrote:
    > > Richard Bos wrote:
    > > > First of all, you need to use UCHAR_MAX here, instead of 256.

    > >
    > > I think you mean "UCHAR_MAX + 1"

    >
    > Do we need "UCHAR_MAX + 1L" to cover the case of UCHAR_MAX
    > equal to UINT_MAX, say both 0xFFFF ?


    Good spot, although I think we'd have to lump such an implementation
    in with the DS9K. :)

    Actually, this really is a problem on CSILP32 systems such as
    (some) DSPs, and the L suffix doesn't help on such systems.
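One way to sidestep the question, rather than answer it, is to refuse to compile where UCHAR_MAX + 1 cannot be computed in an int. This is only a sketch; the names TABLE_SIZE and in_old_table are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Reject implementations where UCHAR_MAX + 1 would overflow or wrap. */
#if UCHAR_MAX >= INT_MAX
#error "unsigned char is as wide as int; a lookup table is not practical here"
#endif

/* Safe on every implementation that passes the check above. */
enum { TABLE_SIZE = UCHAR_MAX + 1 };

static unsigned char in_old_table[TABLE_SIZE];
```

On the DSP-style systems mentioned above the table would be far too large to allocate in any case, so failing at compile time seems a reasonable trade-off.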
    infobahn, Jan 14, 2005
    #8