Re: Finding out if a struct is in an array of structs

Discussion in 'C Programming' started by Antoninus Twink, Aug 12, 2009.

  1. On 11 Aug 2009 at 16:35, Eric Sosman wrote:
    > for (p = &array_of_foo[0]; p < &array_of_foo[N]; ++p) {
    > if (p == unknown)
    > return FOUND_IT;
    > }
    > return NOT_FOUND;
    >
    > As far as I can tell, this is the only 100% bullet-proof method.


    Come on Eric, leave that stupidity to Heathfield and his ilk.

    Why pay the price of a completely unnecessary loop through a possibly
    huge array, just to satisfy your paranoia that a simple pointer
    comparison will fail on some hypothetical outlandish platform? It
    doesn't make any sense in the real world.
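
    In other words, replace the whole loop with a single range test -
    something like this, keeping Eric's names:

    /* Sketch: array_of_foo, N and unknown as in Eric's fragment.      */
    if (unknown >= &array_of_foo[0] && unknown < &array_of_foo[N])
        return FOUND_IT;   /* unknown points at one of the N elements  */
    return NOT_FOUND;      /* note this leans on >= and < being        */
                           /* meaningful even when unknown points      */
                           /* outside the array - the very case the    */
                           /* "paranoid" loop is written to avoid      */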
     
    Antoninus Twink, Aug 12, 2009
    #1

  2. bartc Guest

    "Kenneth Brody" <> wrote in message
    news:...
    > Antoninus Twink wrote:
    >> On 11 Aug 2009 at 16:35, Eric Sosman wrote:
    >>> for (p = &array_of_foo[0]; p < &array_of_foo[N]; ++p) {
    >>> if (p == unknown)
    >>> return FOUND_IT;
    >>> }
    >>> return NOT_FOUND;
    >>>
    >>> As far as I can tell, this is the only 100% bullet-proof method.

    >>
    >> Come on Eric, leave that stupidity to Heathfield and his ilk.
    >>
    >> Why pay the price of a completely unnecessary loop through a possibly
    >> huge array, just to satisfy your paranoia that a simple pointer
    >> comparison will fail on some hypothetical outlandish platform? It
    >> doesn't make any sense in the real world.

    >
    > You've obviously never heard of "segmented memory architecture", such as
    > the Intel x86 line of processors. Two objects may be in different
    > segments, and any comparison other than "==" and "!=" does not need to use
    > the segment when doing the compare. It is perfectly legal, for example,
    > that "pt1 < pt2" will only compare the offsets, and ignore the segments.


    That doesn't sound right. What good does it do to compare two offsets, if
    the segments they belong to could be different?

    I would expect the addresses to be normalised in some way before comparing.

    --
    Bartc
     
    bartc, Aug 13, 2009
    #2

  3. "bartc" <> writes:

    > "Kenneth Brody" <> wrote in message
    > news:...
    >> Antoninus Twink wrote:
    >>> On 11 Aug 2009 at 16:35, Eric Sosman wrote:
    >>>> for (p = &array_of_foo[0]; p < &array_of_foo[N]; ++p) {
    >>>> if (p == unknown)
    >>>> return FOUND_IT;
    >>>> }
    >>>> return NOT_FOUND;
    >>>>
    >>>> As far as I can tell, this is the only 100% bullet-proof method.
    >>>
    >>> Come on Eric, leave that stupidity to Heathfield and his ilk.
    >>>
    >>> Why pay the price of a completely unnecessary loop through a possibly
    >>> huge array, just to satisfy your paranoia that a simple pointer
    >>> comparison will fail on some hypothetical outlandish platform? It
    >>> doesn't make any sense in the real world.

    >>
    >> You've obviously never heard of "segmented memory architecture",
    >> such as the Intel x86 line of processors. Two objects may be in
    >> different segments, and any comparison other than "==" and "!=" does
    >> not need to use the segment when doing the compare. It is perfectly
    >> legal, for example, that "pt1 < pt2" will only compare the offsets,
    >> and ignore the segments.

    >
    > That doesn't sound right. What good does it do to compare two offsets,
    > if the segments they belong to could be different?


    That's the point. If the implementation ensures that no object spans
    more than one segment, the standard permits a cheap offset-only
    comparison, because relationally comparing pointers into two
    different objects is undefined behaviour anyway.

    > I would expect the addresses to be normalised in some way before
    > comparing.


    And this is often done, but it does not have to be done in all cases.

    Part of the problem is the term "platform". Even on a
    segmented-memory machine, the implementation may choose to normalise
    all pointers prior to comparison simply because the idiom is so
    common. As a result, a "platform" that exhibits the problem may
    consist of a CPU (and memory) architecture, a compiler /and/ a set of
    compiler switches, rather than just a CPU family.
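
    To make the failure mode concrete, here is a small self-contained
    sketch - the struct and the numbers are made up, it just models a
    16-bit far pointer as a segment:offset pair:

    #include <stdio.h>

    /* Toy model of a real-mode far pointer: 16-bit segment, 16-bit offset. */
    struct farptr { unsigned short seg, off; };

    /* What a cheap large-model "p < q" might compile to: offsets only. */
    static int far_less(struct farptr p, struct farptr q)
    {
        return p.off < q.off;
    }

    int main(void)
    {
        struct farptr a = { 0x2000, 0x0010 };   /* linear 0x20010 */
        struct farptr b = { 0x1000, 0x0FF0 };   /* linear 0x10FF0 */

        /* a is higher in linear memory, yet the offset-only test says
           a < b.  Harmless within one object (one segment); meaningless
           across objects - exactly the case the standard leaves
           undefined. */
        printf("far_less(a, b) = %d\n", far_less(a, b));
        return 0;
    }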

    --
    Ben.
     
    Ben Bacarisse, Aug 13, 2009
    #3
  4. "bartc" <> writes:
    > "Kenneth Brody" <> wrote in message
    > news:...
    >> Antoninus Twink wrote:
    >>> On 11 Aug 2009 at 16:35, Eric Sosman wrote:
    >>>> for (p = &array_of_foo[0]; p < &array_of_foo[N]; ++p) {
    >>>> if (p == unknown)
    >>>> return FOUND_IT;
    >>>> }
    >>>> return NOT_FOUND;
    >>>>
    >>>> As far as I can tell, this is the only 100% bullet-proof method.
    >>>

    [snip]
    >>
    >> You've obviously never heard of "segmented memory architecture",
    >> such as the Intel x86 line of processors. Two objects may be in
    >> different segments, and any comparison other than "==" and "!=" does
    >> not need to use the segment when doing the compare. It is perfectly
    >> legal, for example, that "pt1 < pt2" will only compare the offsets,
    >> and ignore the segments.

    >
    > That doesn't sound right. What good does it do to compare two offsets,
    > if the segments they belong to could be different?


    It doesn't do much good, which is why the standard doesn't define
    the behavior.

    > I would expect the addresses to be normalised in some way before comparing.


    An implementation could do that, but it's not required to. And it's
    not clear that the normalization would give you a meaningful result.

    Suppose comparison of just the offset portion of two addresses is
    significantly more efficient than normalizing and comparing the
    entire base+offset (or whatever). Then by *not* normalizing, the
    compiler can make valid comparisons (within an array) faster, at the
    expense of bad results for comparisons that aren't within an object.
    The standard deliberately allows an implementation to make that
    tradeoff.

    Note that "==" and "!=" comparisons, on segmented architectures,
    *do* have to normalize before comparing.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 13, 2009
    #4
  5. Nobody Guest

    On Wed, 12 Aug 2009 23:12:09 +0000, bartc wrote:

    >> You've obviously never heard of "segmented memory architecture", such as
    >> the Intel x86 line of processors. Two objects may be in different
    >> segments, and any comparison other than "==" and "!=" does not need to use
    >> the segment when doing the compare. It is perfectly legal, for example,
    >> that "pt1 < pt2" will only compare the offsets, and ignore the segments.

    >
    > That doesn't sound right. What good does it do to compare two offsets, if
    > the segments they belong to could be different?


    If the pointers are in different segments, they don't refer to the same
    array, therefore a comparison is undefined. So there's no need to use a
    significantly more computationally expensive comparison.

    > I would expect the addresses to be normalised in some way before comparing.


    Why? Comparing offsets suffices and is far quicker. This is the whole
    point of having separate "large" and "huge" memory models. The huge model
    allows arrays larger than 64KiB, at the expense of requiring pointer
    normalisation.
     
    Nobody, Aug 13, 2009
    #5
  6. bartc Guest

    "Nobody" <> wrote in message
    news:p...
    > On Wed, 12 Aug 2009 23:12:09 +0000, bartc wrote:
    >
    >>> You've obviously never heard of "segmented memory architecture", such as
    >>> the Intel x86 line of processors. Two objects may be in different
    >>> segments, and any comparison other than "==" and "!=" does not need to
    >>> use
    >>> the segment when doing the compare. It is perfectly legal, for example,
    >>> that "pt1 < pt2" will only compare the offsets, and ignore the segments.

    >>
    >> That doesn't sound right. What good does it do to compare two offsets, if
    >> the segments they belong to could be different?

    >
    > If the pointers are in different segments, they don't refer to the same
    > array, therefore a comparison is undefined. So there's no need to use a
    > significantly more computationally expensive comparison.
    >
    >> I would expect the addresses to be normalised in some way before
    >> comparing.

    >
    > Why? Comparing offsets suffices and is far quicker. This is the whole
    > point of having separate "large" and "huge" memory models. The huge model
    > allows arrays larger than 64KiB, at the expense of requiring pointer
    > normalisation.


    I can hardly remember how these used to work. But, given a pointer
    consisting only of an offset, how would you dereference it? It seems that
    something somewhere needs to know what segment it's in.

    In that case that's not far from being able to do:

    if ((hugeptr)offseta == (hugeptr)offsetb)

    which is at the programmer's request so he knows it will take a bit longer
    than offseta==offsetb. (Although there's still the problem of normalising
    the large pointers so that they have the same value when pointing at the
    same thing.)

    --
    Bart
     
    bartc, Aug 13, 2009
    #6
  7. James Kuyper Guest

    bartc wrote:
    >
    > "Nobody" <> wrote in message
    > news:p...
    >> On Wed, 12 Aug 2009 23:12:09 +0000, bartc wrote:

    ....
    >>> That doesn't sound right. What good does it do to compare two
    >>> offsets, if
    >>> the segments they belong to could be different?

    >>
    >> If the pointers are in different segments, they don't refer to the same
    >> array, therefore a comparison is undefined. So there's no need to use a
    >> significantly more computationally expensive comparison.
    >>
    >>> I would expect the addresses to be normalised in some way before
    >>> comparing.

    >>
    >> Why? Comparing offsets suffices and is far quicker. This is the whole
    >> point of having separate "large" and "huge" memory models. The huge model
    >> allows arrays larger than 64KiB, at the expense of requiring pointer
    >> normalisation.

    >
    > I can hardly remember how these used to work. But, given a pointer
    > consisting only of an offset, how would you dereference it? It seems that
    > something somewhere needs to know what segment it's in.


    I barely remember it, but I apparently remember it a little better than
    you. There was a default segment that was used for ordinary near
    pointers, thereby restricting the total amount of memory that could be
    referred to by all such pointers to the amount that could be stored in
    one segment. I've written many programs for which that restriction would
    not be a serious problem.

    There were also extensions to C which allowed a program to identify a
    particular near pointer as being based upon a specified segment, which
    was always used when dereferencing that particular pointer. As I
    remember it, the segment could be identified numerically, or
    symbolically, or "same as this other pointer". I never used any of these
    options; I just marveled at their complexity - I tried and failed to
    imagine writing code where execution speed would be sufficiently more
    important than development time that it would be worthwhile dealing with
    that complexity.
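
    For anyone who never saw those dialects, the simpler cousins of the
    extensions I'm describing were the near/far/huge pointer qualifiers.
    Roughly like this, from memory, and of course not standard C - it
    only meant anything to the old DOS compilers (exact spellings such as
    __far varied between them):

    char near *np;   /* 16-bit offset, implicitly into the default data segment */
    char far  *fp;   /* explicit 16-bit segment plus 16-bit offset              */
    char huge *hp;   /* like far, but arithmetic is normalised so objects       */
                     /* larger than 64KiB can be handled                        */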

    The point is that, while a segment number could be identified and used
    to normalize the pointers, any relational comparison of two pointers
    had undefined behavior unless they pointed into the same segment.
    Therefore, there's no need to perform such normalization. Allowing such
    optimizations is the reason why the standard says that the behavior is
    undefined.
     
    James Kuyper, Aug 13, 2009
    #7
  8. "bartc" <> writes:
    > "Nobody" <> wrote in message
    > news:p...
    >> On Wed, 12 Aug 2009 23:12:09 +0000, bartc wrote:

    [...]
    >>> I would expect the addresses to be normalised in some way before
    >>> comparing.

    >>
    >> Why? Comparing offsets suffices and is far quicker. This is the whole
    >> point of having separate "large" and "huge" memory models. The huge model
    >> allows arrays larger than 64KiB, at the expense of requiring pointer
    >> normalisation.

    >
    > I can hardly remember how these used to work. But, given a pointer
    > consisting only of an offset, how would you dereference it? It seems that
    > something somewhere needs to know what segment it's in.


    Yes. Either something needs to establish which segment is being used
    before the dereference, or the compiler needs to confirm at compile
    time that the currently used segment is the same as the one for the
    pointer.

    > In that case that's not far from being able to do:
    >
    > if ((hugeptr)offseta == (hugeptr)offsetb)
    >
    > which is at the programmer's request so he knows it will take a bit
    > longer than offseta==offsetb. (Although there's still the problem of
    > normalising the large pointers so that they have the same value when
    > pointing at the same thing.)


    Equality comparison is guaranteed to work; in other words, if two
    pointers have different segments but the same offset, they're
    guaranteed to compare unequal. It's only relational operators
    (<, <=, >, >=) that invoke UB for pointers to unrelated objects.


    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Aug 13, 2009
    #8
  9. Nobody Guest

    On Thu, 13 Aug 2009 08:18:41 +0000, bartc wrote:

    >>> I would expect the addresses to be normalised in some way before
    >>> comparing.

    >>
    >> Why? Comparing offsets suffices and is far quicker. This is the whole
    >> point of having separate "large" and "huge" memory models. The huge model
    >> allows arrays larger than 64KiB, at the expense of requiring pointer
    >> normalisation.

    >
    > I can hardly remember how these used to work. But, given a pointer
    > consisting only of an offset, how would you dereference it? It seems that
    > something somewhere needs to know what segment it's in.


    "near" pointers only contain an offset. They refer to the default (or
    sole) data segment (or code segment for function pointers).

    I was referring to the distinction between "far" and "huge" pointers.

    A far pointer contains a segment and offset, but the pointer is treated as
    pointing into an array which is completely contained within the specified
    segment, so all pointer arithmetic (and inequality comparisons)
    manipulates the offset alone, leaving the segment untouched.

    A huge pointer also contains a segment and offset, but is treated as
    pointing into an array which can span more than one segment, so any
    pointer arithmetic or comparison considers both segment and offset. E.g.
    when adding/subtracting a 16-bit value to/from the pointer, the segment is
    adjusted in the event of carry/borrow; adding/subtracting a 32-bit value
    to/from a pointer or subtracting pointers involves converting to a 32-bit
    linear address, etc.

    The "large" memory uses far pointers by default, while the "huge" memory
    model uses huge pointers. The latter is more compatible with code which
    assumes a flat address space, but the necessary pointer conversions have a
    significant performance cost.
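
    If it helps, here is a rough, portable sketch of the difference -
    just modelling segment:offset in a struct; the real thing was
    generated by the compiler, not written by hand:

    #include <stdio.h>

    /* Toy model only: 16-bit segment and offset, 20-bit linear addresses. */
    struct segoff { unsigned short seg, off; };

    /* Far-style addition: the offset wraps, the segment is never touched. */
    static struct segoff far_add(struct segoff p, unsigned short n)
    {
        p.off = (unsigned short)(p.off + n);   /* may silently wrap at 64KiB */
        return p;
    }

    /* Huge-style addition: convert to a linear address, add, re-normalise
       so the result's offset is always in 0..15. */
    static struct segoff huge_add(struct segoff p, unsigned long n)
    {
        unsigned long linear = ((unsigned long)p.seg << 4) + p.off + n;
        struct segoff r;
        r.seg = (unsigned short)(linear >> 4);
        r.off = (unsigned short)(linear & 0xFu);
        return r;
    }

    int main(void)
    {
        struct segoff p = { 0x1000, 0xFFF0 };   /* linear 0x1FFF0          */
        struct segoff f = far_add(p, 0x20);     /* offset wraps to 0x0010  */
        struct segoff h = huge_add(p, 0x20);    /* normalises to 2001:0000 */
        printf("far:  %04X:%04X\n", (unsigned)f.seg, (unsigned)f.off);
        printf("huge: %04X:%04X\n", (unsigned)h.seg, (unsigned)h.off);
        return 0;
    }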
     
    Nobody, Aug 14, 2009
    #9
  10. Sorry for the late reply. I wouldn't post now if I didn't feel I could
    contribute something. I wish to clear up a thing or two, however off
    topic.

    Groovy hepcat Nobody was jivin' in comp.lang.c on Fri, 14 Aug 2009 2:20
    pm. It's a cool scene! Dig it.

    > On Thu, 13 Aug 2009 08:18:41 +0000, bartc wrote:


    <OT> Re. the segmented memory architecture of the 8086 processor
    family...

    >>>> I would expect the addresses to be normalised in some way before
    >>>> comparing.
    >>>
    >>> Why? Comparing offsets suffices and is far quicker. This is the
    >>> whole point of having separate "large" and "huge" memory models. The
    >>> huge model allows arrays larger than 64KiB, at the expense of
    >>> requiring pointer normalisation.

    >>
    >> I can hardly remember how these used to work. But, given a pointer
    >> consisting only of an offset, how would you dereference it? It seems
    >> that something somewhere needs to know what segment it's in.

    >
    > "near" pointers only contain an offset. They refer to the default (or
    > sole) data segment (or code segment for function pointers).


    Right. Here's how this works. The 8086 can address up to 64kB at a
    time, but can address a total of 1MB. It does this by segmenting its
    memory address range into a number of overlapping regions, called
    segments. It has a number of 16 bit segment registers, eg. CS (code
    segment), DS (data segment), SS (stack segment) and ES (extra segment),
    to distinguish among these segments. In this segmented architecture
    there are several types of pointer.
    The smallest of these is the "near" or "short" pointer. (Note that
    this has nothing to do with C's short data type.) This consists of a 16
    bit offset value. An address is determined by combining this offset
    with the value of one of the segment registers. This results in a 20
    bit linear address (hence the 2^20 byte, or 1MB, address
    range of the 8086). A program using a near pointer only has to
    manipulate a 16 bit offset. Near pointers are always relative to one of
    the segment registers (usually DS or SS).
    Next comes the "far" or "long" pointer (nothing to do with C's long
    data type). This is a 32 bit quantity consisting of a 16 bit segment
    and a 16 bit offset. When a far pointer is used, the offset is relative
    to the segment portion. Pointer arithmetic applies to the offset only.
    The final pointer type is the "huge" pointer. This is normalised to a
    single 20 bit value (contained in a 32 bit word). This gives a linear
    address. This means you can have an object (array or whatever) larger
    than 64kB. When used, the address is internally converted to a
    segment:offset (far pointer) by the CPU. Pointer arithmetic applies to
    the entire 32 (20) bit pointer. Only the low 20 bits of a huge pointer
    are used.
    The default pointer type depends on the "memory model" used. For
    example, in "small" memory model, all pointers are near unless they are
    explicitly made far or huge. In "large" memory model, all pointers are
    far unless explicitly made near or huge.
    Anyhow, getting to the point... Near pointers are the most efficient,
    because they consist of only an offset value. Far pointers are less
    efficient, but still only manipulate the offset in many operations, to
    improve efficiency. Huge pointers are monsters. They are inefficient
    because they must be kludged to and from segment:offset pointers within
    the CPU, ignoring the upper 12 bits, and arithmetic is 32 bit (on a 16
    bit CPU). So (and this is the bit that relates to C), comparing
    pointers in C using the relational operators will not give meaningful
    results if your compiler has used near or far pointers to refer to the
    objects whose addresses are being compared and these objects reside in
    different segments. This is likely if, for example, one object is in
    the default data segment (addressed by DS) and the other is on the
    stack (addressed by SS).
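
    In C terms, the address calculation above is just this (a sketch of
    the arithmetic, not of the silicon):

    /* 8086 real-mode address calculation: segment register shifted left
       four bits, offset added, result wrapped at the 1MB boundary.       */
    unsigned long linear(unsigned short seg, unsigned short off)
    {
        return (((unsigned long)seg << 4) + off) & 0xFFFFFUL;
    }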

    > I was referring to the distinction between "far" and "huge"
    > pointers.
    >
    > A far pointer contains a segment and offset, but the pointer is
    > treated as pointing into an array which is completely contained within
    > the specified segment, so all pointer arithmetic (and inequality
    > comparisons) manipulates the offset alone, leaving the segment
    > untouched.
    >
    > A huge pointer also contains a segment and offset, but is treated as


    No. A huge pointer is normalised. It doesn't contain a segment and
    offset. Its value is a 20 bit linear memory address. This is converted
    to a segment and offset inside the CPU.

    > pointing into an array which can span more than one segment, so any


    Right.

    > pointer arithmetic or comparison considers both segment and offset.


    No, it considers the huge pointer as a whole (modulo the low 20 bits).
    </OT>

    --
    Dig the sig!

    ----------- Peter 'Shaggy' Haywood ------------
    Ain't I'm a dawg!!
     
    Peter 'Shaggy' Haywood, Aug 17, 2009
    #10
  11. Peter 'Shaggy' Haywood wrote:
    >> I was referring to the distinction between "far" and "huge"
    >> pointers.
    >>
    >> A far pointer contains a segment and offset, but the pointer is
    >> treated as pointing into an array which is completely contained within
    >> the specified segment, so all pointer arithmetic (and inequality
    >> comparisons) manipulates the offset alone, leaving the segment
    >> untouched.
    >>
    >> A huge pointer also contains a segment and offset, but is treated as

    >
    > No. A huge pointer is normalised. It doesn't contain a segment and
    > offset. Its value is a 20 bit linear memory address. This is converted
    > to a segment and offset inside the CPU.


    A huge pointer does _not_ contain a linear address in the normal sense;
    if you want to look at them that way, it is bits 15:4 that are empty,
    not bits 31:20 as you say.

    Far and huge pointers both contain a segment and offset; the major
    differences are that (A) a huge pointer is normalized by the compiler so
    that the offset is never more than 0xF, and (B) relational comparisons
    between huge pointers use both the segment and offset while relational
    comparisons between far pointers use only the offset (which can result
    in both false negatives and false positives). Equality comparisons
    between equivalent far pointers will also fail if they point to the same
    linear address via different segment:offset combinations, but the
    normalization of huge pointers makes such a case impossible.

    Far Pointer:    SSSSSSSS SSSSSSSS:OOOOOOOO OOOOOOOO
    Huge Pointer:   SSSSSSSS SSSSSSSS:ZZZZZZZZ ZZZZOOOO
    Linear Address: ZZZZZZZZ ZZZZLLLL LLLLLLLL LLLLLLLL

    Note: "Z" used for Zero because "0" and "O" may be indistinguishable to
    some readers.

    (For those not familiar with x86 real mode, a 20-bit linear address is
    formed by computing (segment << 4) + offset. However, the CPU does not
    expose such addresses to software, and there were no registers that
    could hold such a beast before the i386 anyway.)
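
    A small sketch of that normalisation, using a toy struct rather than
    real far pointers (values picked to show the aliasing):

    #include <stdio.h>

    /* Toy model: push everything except the low four bits of the offset
       into the segment, as described above.                              */
    struct segoff { unsigned short seg, off; };

    static struct segoff normalise(struct segoff p)
    {
        unsigned long linear = ((unsigned long)p.seg << 4) + p.off;
        struct segoff r;
        r.seg = (unsigned short)(linear >> 4);
        r.off = (unsigned short)(linear & 0xFu);
        return r;
    }

    int main(void)
    {
        /* Two different far representations of the same byte (linear 0xB8000): */
        struct segoff a = { 0xB800, 0x0000 };
        struct segoff b = { 0xB000, 0x8000 };

        /* A raw far "==" (segment and offset compared bitwise) says they
           differ; after normalisation both become B800:0000 and compare
           equal. */
        struct segoff na = normalise(a), nb = normalise(b);
        printf("normalised: %04X:%04X and %04X:%04X\n",
               (unsigned)na.seg, (unsigned)na.off,
               (unsigned)nb.seg, (unsigned)nb.off);
        return 0;
    }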

    S

    --
    Stephen Sprunk "Stupid people surround themselves with smart
    CCIE #3723 people. Smart people surround themselves with
    K5SSS smart people who disagree with them." --Isaac Jaffe
     
    Stephen Sprunk, Aug 17, 2009
    #11
  12. Nobody Guest

    On Mon, 17 Aug 2009 05:17:19 +0000, Peter 'Shaggy' Haywood wrote:

    >> A huge pointer also contains a segment and offset, but is treated as

    >
    > No. A huge pointer is normalised. It doesn't contain a segment and
    > offset. Its value is a 20 bit linear memory address. This is converted
    > to a segment and offset inside the CPU.


    Huge pointers are stored as segment and offset, so that you can e.g. pass
    a "huge char **" to a function expecting a "far char **". The linear
    representation is transient, and isn't made visible.

    >> pointer arithmetic or comparison considers both segment and offset.

    >
    > No, it considers the huge pointer as a whole (modulo the low 20 bits).


    That doesn't contradict what I wrote. The arithmetic may well be
    performed using a 32-bit linear address (calculated from the segment and
    offset).

    That said, an inequality comparison on huge pointers will typically
    use a 16-bit comparison on the segments and only compare the offsets if
    the segments are equal, rather than converting both to linear addresses
    and performing a 32-bit comparison.
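
    Roughly this, in the same toy segment:offset struct notation used
    earlier in the thread:

    struct segoff { unsigned short seg, off; };   /* toy model again */

    /* Relational compare of two *normalised* huge pointers without forming
       a 32-bit linear address: 16-bit segment compare first, offsets only
       if the segments tie.  This matches linear order because a normalised
       offset is always 0..15.                                              */
    static int huge_less(struct segoff p, struct segoff q)
    {
        if (p.seg != q.seg)
            return p.seg < q.seg;
        return p.off < q.off;
    }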
     
    Nobody, Aug 17, 2009
    #12
  13. Phil Carmody Guest

    Nobody <> writes:

    > On Mon, 17 Aug 2009 05:17:19 +0000, Peter 'Shaggy' Haywood wrote:
    >
    >>> A huge pointer also contains a segment and offset, but is treated as

    >>
    >> No. A huge pointer is normalised. It doesn't contain a segment and
    >> offset. Its value is a 20 bit linear memory address. This is converted
    >> to a segment and offset inside the CPU.

    >
    > Huge pointers are stored as segment and offset, so that you can e.g. pass
    > a "huge char **" to a function expecting a "far char **". The linear
    > representation is transient, and isn't made visible.


    void fuHP(far char **pp)
    {
        *pp += 16;
    }

    The not-C-compiler may permit you to do this, but it's inviting you
    to hang yourself as it hands you a shotgun and a jar of tranquilisers.

    Phil
    --
    If GML was an infant, SGML is the bright youngster far exceeds
    expectations and made its parents too proud, but XML is the
    drug-addicted gang member who had committed his first murder
    before he had sex, which was rape. -- Erik Naggum (1965-2009)
     
    Phil Carmody, Aug 18, 2009
    #13
  14. Nobody Guest

    On Tue, 18 Aug 2009 10:49:37 +0300, Phil Carmody wrote:

    >> Huge pointers are stored as segment and offset, so that you can e.g. pass
    >> a "huge char **" to a function expecting a "far char **". The linear
    >> representation is transient, and isn't made visible.

    >
    > void fuHP(far char**pp)
    > {
    > *pp+=16;
    > }
    >
    > The not-C-compiler may permit you to do this, but it's inviting you
    > to hang yourself as it hands you a shotgun and a jar of tranquilisers.


    It's more useful for const-qualified pointers.
     
    Nobody, Aug 19, 2009
    #14
