Re: Finding out if a struct is in an array of structs

Discussion in 'C Programming' started by Keith Thompson, Aug 11, 2009.

  1. (Richard Harter) writes:
    > I would like to know the "best" way to find out if a struct is in
    > an array of structs given the address of the struct and the
    > address of the array of structs. I want to do this cleanly,
    > i.e., with no potential undefined behaviour.
    >
    > The obvious way is to subtract the address of the struct from the
    > address of the array and see if the difference is in range;
    > however this is a no-no.
    >
    > Is there any clean way of doing this other than looping through
    > the array and comparing structs field by field?


    Not without avoiding undefined behavior.

    But if you're willing to assume that the undefined behavior of pointer
    comparison doesn't do anything worse than give you an incorrect
    result, you can do something like this:

    Use "<=" to determine whether the struct's address is within the
    range of addresses of the array's element.

    If it isn't, you're done; the struct isn't in the array.

    If it is, use "-" to compute the index i, and check whether the
    address of arr matches the address of the struct.

    The last step is unnecessary if you can assume an ordinary linear
    addressing scheme, but could be necessary if, for example, each
    pointer is a base+offset pair and "<=" compares only the offsets
    (because the base will be the same within an object).

    The tricky part is finding a system with a non-linear addressing
    scheme so you can test the code.

    Of course it will fail on the DS9K.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 11, 2009
    #1
    1. Advertising

  2. Keith Thompson

    Phil Carmody Guest

    Keith Thompson <> writes:
    > (Richard Harter) writes:
    >> I would like to know the "best" way to find out if a struct is in
    >> an array of structs given the address of the struct and the
    >> address of the array of structs. I want to do this cleanly,
    >> i.e., with no potential undefined behaviour.
    >>
    >> The obvious way is to subtract the address of the struct from the
    >> address of the array and see if the difference is in range;
    >> however this is a no-no.
    >>
    >> Is there any clean way of doing this other than looping through
    >> the array and comparing structs field by field?

    >
    > Not without avoiding undefined behavior.
    >
    > But if you're willing to assume that the undefined behavior of pointer
    > comparison doesn't do anything worse than give you an incorrect
    > result, you can do something like this:
    >
    > Use "<=" to determine whether the struct's address is within the
    > range of addresses of the array's element.
    >
    > If it isn't, you're done; the struct isn't in the array.
    >
    > If it is, use "-" to compute the index i, and check whether the
    > address of arr matches the address of the struct.
    >
    > The last step is unnecessary if you can assume an ordinary linear
    > addressing scheme, but could be necessary if, for example, each
    > pointer is a base+offset pair and "<=" compares only the offsets
    > (because the base will be the same within an object).
    >
    > The tricky part is finding a system with a non-linear addressing
    > scheme so you can test the code.
    >
    > Of course it will fail on the DS9K.


    I should fling this question to c.s.c, but I sometimes wonder
    why issues like this weren't made implementation defined rather
    than undefined?

    Phil
    --
    If GML was an infant, SGML is the bright youngster far exceeds
    expectations and made its parents too proud, but XML is the
    drug-addicted gang member who had committed his first murder
    before he had sex, which was rape. -- Erik Naggum (1965-2009)
     
    Phil Carmody, Aug 11, 2009
    #2
    1. Advertising

  3. Kenneth Brody <> writes:
    > Phil Carmody wrote:
    >> Keith Thompson <> writes:
    >>> (Richard Harter) writes:

    > [... comparing addresses in different objects ...]
    >>> The last step is unnecessary if you can assume an ordinary linear
    >>> addressing scheme, but could be necessary if, for example, each
    >>> pointer is a base+offset pair and "<=" compares only the offsets
    >>> (because the base will be the same within an object).
    >>>
    >>> The tricky part is finding a system with a non-linear addressing
    >>> scheme so you can test the code.

    >
    > BTDT. Real-mode x86 CPU, using "large model" addressing.
    >
    >>> Of course it will fail on the DS9K.

    >>
    >> I should fling this question to c.s.c, but I sometimes wonder
    >> why issues like this weren't made implementation defined rather
    >> than undefined?

    >
    > Can an implementation define implementation-defined behavior as "it
    > doesn't work"? If not, then that's probably your answer.


    Not in general, no. For each instance of unspecified or
    implementation-defined behavior, the standard specifies a set
    of possible behaviors, and the implementation gets to choose
    from that set. (The difference between the two is that if it's
    implementation-defined, the implementation has to document its
    choice; I-D behavior is actually a subset of unspecified behavior.)

    In the case of relational operators on pointers, the standard says
    that the comparison (p0 <= p1) invokes undefined behavior if p0 and
    p1 don't point into the same object (stated imprecisely for brevity).
    It *could* have said that the result is unspecified, but must be
    either 0 or 1. But how much should, or could, the standard specify?
    Must the result be consistent from one comparison to the next?

    And for the problem at hand, my proposed not-quite-portable solution
    invokes UB both for the comparison and for pointer arithmetic.

    The problem, paraphrased is:

    Given a pointer ptr to a struct foo and an array arr of struct
    foo, does ptr point to an element of arr?

    The portable solution is to use "==" to compare ptr to each &arr,
    but that's inefficient (O(N)).

    My non-portable solution is (untested):

    if (ptr >= &arr[0] && ptr < &arr[ARR_MAX]) {
    /* ptr *might* point to an element of arr */
    ptrdiff_t i = ptr - &arr[0];
    if (ptr == &arr) {
    /* ptr definitely points to an element of arr */
    }
    else {
    /* ptr doesn't point to an element of arr;
    ">=" and "<" gave us a false positive */
    }
    }
    else {
    /* ptr doesn't point to an element of arr */
    }

    But for this to work, the behavior of ``ptr - &arr[0]'' has to be
    sufficiently benign. If the ">=" and "<" produce a false positive
    result, then ``ptr - &arr[0]'' yields a pointer to some invalid
    location, with (in principle) arbitrarily bad results either from
    computing it or from using it.

    Probably the authors of the standard gave this some thought
    and decided that covering all the possible corner cases for
    implementation-defined behavior was too difficult, given the wide
    potential variety of addressing schemes (linear and x86-style
    segmented aren't the only possibilities) and the relatively small
    benefit.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 12, 2009
    #3
  4. Keith Thompson

    Eric Sosman Guest

    Keith Thompson wrote:
    > [...] For each instance of unspecified or
    > implementation-defined behavior, the standard specifies a set
    > of possible behaviors, and the implementation gets to choose
    > from that set. (The difference between the two is that if it's
    > implementation-defined, the implementation has to document its
    > choice; I-D behavior is actually a subset of unspecified behavior.)


    Are you sure of the "choose from that set" part? For
    example, when attempting to convert an out-of-range value to
    a signed integer type "either the result is implementation-
    defined or an implementation-defined signal is raised." The
    I-D result could be viewed as a choice from the set of all
    possible values of the target type, but where does the Standard
    enumerate the I-D signals that might be raised?

    (I reject the use of "choice from a set" when the set is
    "everything you can imagine, plus all the things you can't,
    not enumerated here." That way lies Russel's Paradox, not a
    fit topic for a Standard.)

    > The problem, paraphrased is:
    >
    > Given a pointer ptr to a struct foo and an array arr of struct
    > foo, does ptr point to an element of arr?
    >
    > The portable solution is to use "==" to compare ptr to each &arr,
    > but that's inefficient (O(N)).


    It's O(1) with a good choice of the first `i'. :)

    (There's actually a germ of truth behind the smart-ass
    joke: It's what makes hash tables fast. Unfortunately, I
    can't think of a 100% foolproof way to apply hashing to the
    problem. You could develop a hash code from the bytes of
    `ptr', but then what? The Standard does not require that
    equal pointer values have identical representations, so two
    pointers that compare equal could have different hashes.)

    --
    Eric Sosman
    lid
     
    Eric Sosman, Aug 12, 2009
    #4
  5. Eric Sosman <> writes:
    > Keith Thompson wrote:
    >> [...] For each instance of unspecified or
    >> implementation-defined behavior, the standard specifies a set
    >> of possible behaviors, and the implementation gets to choose
    >> from that set. (The difference between the two is that if it's
    >> implementation-defined, the implementation has to document its
    >> choice; I-D behavior is actually a subset of unspecified behavior.)

    >
    > Are you sure of the "choose from that set" part? For
    > example, when attempting to convert an out-of-range value to
    > a signed integer type "either the result is implementation-
    > defined or an implementation-defined signal is raised." The
    > I-D result could be viewed as a choice from the set of all
    > possible values of the target type, but where does the Standard
    > enumerate the I-D signals that might be raised?
    >
    > (I reject the use of "choice from a set" when the set is
    > "everything you can imagine, plus all the things you can't,
    > not enumerated here." That way lies Russel's Paradox, not a
    > fit topic for a Standard.)


    The set of signals is finite, and each signal is represented by an int
    value (see the signal and raise functions in <signal.h>).

    And for a given implementation, the set of signals is likely to be
    much smaller than that.

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 12, 2009
    #5
  6. Keith Thompson

    jameskuyper Guest

    Eric Sosman wrote:
    > Keith Thompson wrote:
    > > [...] For each instance of unspecified or
    > > implementation-defined behavior, the standard specifies a set
    > > of possible behaviors, and the implementation gets to choose
    > > from that set. (The difference between the two is that if it's
    > > implementation-defined, the implementation has to document its
    > > choice; I-D behavior is actually a subset of unspecified behavior.)

    >
    > Are you sure of the "choose from that set" part? For
    > example, when attempting to convert an out-of-range value to
    > a signed integer type "either the result is implementation-
    > defined or an implementation-defined signal is raised." The
    > I-D result could be viewed as a choice from the set of all
    > possible values of the target type, but where does the Standard
    > enumerate the I-D signals that might be raised?


    It specifies that signal are identified by numbers that are ints and
    must be positive so the set is those signals identified by numbers in
    the range [1, INT_MAX].
     
    jameskuyper, Aug 12, 2009
    #6
  7. Keith Thompson

    Eric Sosman Guest

    jameskuyper wrote:
    >
    > Eric Sosman wrote:
    >> Keith Thompson wrote:
    >>> [...] For each instance of unspecified or
    >>> implementation-defined behavior, the standard specifies a set
    >>> of possible behaviors, and the implementation gets to choose
    >>> from that set. (The difference between the two is that if it's
    >>> implementation-defined, the implementation has to document its
    >>> choice; I-D behavior is actually a subset of unspecified behavior.)

    >> Are you sure of the "choose from that set" part? For
    >> example, when attempting to convert an out-of-range value to
    >> a signed integer type "either the result is implementation-
    >> defined or an implementation-defined signal is raised." The
    >> I-D result could be viewed as a choice from the set of all
    >> possible values of the target type, but where does the Standard
    >> enumerate the I-D signals that might be raised?

    >
    > It specifies that signal are identified by numbers that are ints and
    > must be positive so the set is those signals identified by numbers in
    > the range [1, INT_MAX].


    ... and INT_MAX is in the range [32767, um...]

    --
    Eric Sosman
    lid
     
    Eric Sosman, Aug 12, 2009
    #7
  8. Keith Thompson

    jameskuyper Guest

    Eric Sosman wrote:
    > jameskuyper wrote:
    > >
    > > Eric Sosman wrote:
    > >> Keith Thompson wrote:
    > >>> [...] For each instance of unspecified or
    > >>> implementation-defined behavior, the standard specifies a set
    > >>> of possible behaviors, and the implementation gets to choose
    > >>> from that set. (The difference between the two is that if it's
    > >>> implementation-defined, the implementation has to document its
    > >>> choice; I-D behavior is actually a subset of unspecified behavior.)
    > >> Are you sure of the "choose from that set" part? For
    > >> example, when attempting to convert an out-of-range value to
    > >> a signed integer type "either the result is implementation-
    > >> defined or an implementation-defined signal is raised." The
    > >> I-D result could be viewed as a choice from the set of all
    > >> possible values of the target type, but where does the Standard
    > >> enumerate the I-D signals that might be raised?

    > >
    > > It specifies that signal are identified by numbers that are ints and
    > > must be positive so the set is those signals identified by numbers in
    > > the range [1, INT_MAX].

    >
    > ... and INT_MAX is in the range [32767, um...]


    You see that as a problem? It's no more of a problem than the fact
    that the value of INT_MAX itself is implementation-defined, and has no
    upper limit. The set of options permitted when an implementation
    feature is unspecified need not be finite, or even countable, and
    choices with respect to different unspecified features of an
    implementation might interact with each other. None of that matters,
    so long as it is possible to determine whether a given choice is in
    the permitted set.

    Another way to look at it is that the set of permitted values for both
    INT_MAX and the implementation-defined signal is the positive
    integers, with the constraints that 65535 <= INT_MAX and the signal
    number <= INT_MAX.
     
    jameskuyper, Aug 12, 2009
    #8
  9. Keith Thompson

    Richard Bos Guest

    Eric Sosman <> wrote:

    > jameskuyper wrote:
    > >
    > > Eric Sosman wrote:


    > >> Are you sure of the "choose from that set" part? For
    > >> example, when attempting to convert an out-of-range value to
    > >> a signed integer type "either the result is implementation-
    > >> defined or an implementation-defined signal is raised." The
    > >> I-D result could be viewed as a choice from the set of all
    > >> possible values of the target type, but where does the Standard
    > >> enumerate the I-D signals that might be raised?

    > >
    > > It specifies that signal are identified by numbers that are ints and
    > > must be positive so the set is those signals identified by numbers in
    > > the range [1, INT_MAX].

    >
    > ... and INT_MAX is in the range [32767, um...]


    Where um is large, but for each possible implementation, some finite
    number. It's an algorithmically specified set, not an enumerated one;
    but it is a set.

    Richard
     
    Richard Bos, Aug 15, 2009
    #9
  10. Keith Thompson

    Tim Rentsch Guest

    Keith Thompson <> writes:

    > Kenneth Brody <> writes:
    >>
    >> [Is an implementation allowed complete freedom of choice
    >> for implementation-defined behavior.]

    >
    > Not in general, no. For each instance of unspecified or
    > implementation-defined behavior, the standard specifies a set
    > of possible behaviors, and the implementation gets to choose
    > from that set.


    That's right in principle, but the principle isn't always
    upheld. For example, what constitutes an access to a
    volatile-qualified object -- it's implementation-defined,
    but no set is identified.
     
    Tim Rentsch, Sep 29, 2009
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Patricia  Van Hise

    structs with fields that are structs

    Patricia Van Hise, Apr 5, 2004, in forum: C Programming
    Replies:
    5
    Views:
    641
    Al Bowers
    Apr 5, 2004
  2. Chris Hauxwell

    const structs in other structs

    Chris Hauxwell, Apr 23, 2004, in forum: C Programming
    Replies:
    6
    Views:
    560
    Chris Hauxwell
    Apr 27, 2004
  3. Paminu
    Replies:
    5
    Views:
    645
    Eric Sosman
    Oct 11, 2005
  4. Antoninus Twink

    Re: Finding out if a struct is in an array of structs

    Antoninus Twink, Aug 12, 2009, in forum: C Programming
    Replies:
    13
    Views:
    618
    Nobody
    Aug 19, 2009
  5. Tuan  Bui
    Replies:
    14
    Views:
    477
    it_says_BALLS_on_your forehead
    Jul 29, 2005
Loading...

Share This Page