STL: how to get the sequence number of a newly added item into a set

Discussion in 'C++' started by bilgekhan, May 25, 2008.

  1. bilgekhan

    bilgekhan Guest

    After doing a succcessful insert() or find() on a set<T> container
    is it possible to get the item number of this item in the set?
    (ie. its zero-based sequence number (position/location/rank/index)
    within the sorted collection)

    Additional info:
    insert() returns a pair<T> and find() returns an iterator.
    Using any of these two objects is it possible to get the
    sequence number of the item these objects point to?
    After the insert() or find() the container won't be modified.
     
    bilgekhan, May 25, 2008
    #1
    1. Advertising

  2. bilgekhan

    Jerry Coffin Guest

    In article <g1c3fs$ts8$>,
    says...
    > After doing a succcessful insert() or find() on a set<T> container
    > is it possible to get the item number of this item in the set?
    > (ie. its zero-based sequence number (position/location/rank/index)
    > within the sorted collection)
    >
    > Additional info:
    > insert() returns a pair<T> and find() returns an iterator.
    > Using any of these two objects is it possible to get the
    > sequence number of the item these objects point to?


    your_iterator = yourset.find(whatever);
    std::distance(yourset.begin(), your_iterator);

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, May 25, 2008
    #2
    1. Advertising

  3. bilgekhan

    bilgekhan Guest

    "Jerry Coffin" <> wrote:
    > says...
    > >
    > > After doing a succcessful insert() or find() on a set<T> container
    > > is it possible to get the item number of this item in the set?
    > > (ie. its zero-based sequence number (position/location/rank/index)
    > > within the sorted collection)
    > >
    > > Additional info:
    > > insert() returns a pair<T> and find() returns an iterator.
    > > Using any of these two objects is it possible to get the
    > > sequence number of the item these objects point to?

    >
    > your_iterator = yourset.find(whatever);
    > std::distance(yourset.begin(), your_iterator);


    Yes, distance() does what I wanted, but it is VERY slow!
    It seems each time it walks all items from the beginning to the
    item in question, ie. it practically does a find().
    That's unfortunately too inefficient for my needs.
    Isn't there a better (a direct) method without traversing the set ?

    FYI: If I program this without STL (and then even using an unordered array
    --> ie. each time doing a sequential search) then it is about 7 times
    faster than the STL version that uses set<T> and distance() !
    How come? :)
    The culprit is IMO the distance() call. One needs a better, a direct,
    method without traversing the tree.

    A better way would have been if the insert() and find() functions would
    optionally deliver the distance value on the fly;
    ie. STL needs to extend these 2 functions with a direct 'distance' option.
     
    bilgekhan, May 26, 2008
    #3
  4. bilgekhan

    bilgekhan Guest

    "Daniel T." <> wrote:
    > "bilgekhan" <> wrote:
    >
    > > After doing a succcessful insert() or find() on a set<T> container
    > > is it possible to get the item number of this item in the set?
    > >
    > > (ie. its zero-based sequence number (position/location/rank/index)
    > > within the sorted collection)
    > >
    > > Additional info:
    > > insert() returns a pair<T> and find() returns an iterator.
    > > Using any of these two objects is it possible to get the
    > > sequence number of the item these objects point to?
    > > After the insert() or find() the container won't be modified.

    >
    > set<int>::iterator it = s.insert( x ).first;
    > // or = s.find( x );
    >
    > int seqNumb = distance( s.begin(), it );



    Yes, distance() does what I wanted, but it is VERY slow!
    It seems each time it walks all items from the beginning to the
    item in question, ie. it practically does a find().
    That's unfortunately too inefficient for my needs.
    Isn't there a better (a direct) method without traversing the set ?

    FYI: If I program this without STL (and then even using an unordered array
    --> ie. each time doing a sequential search) then it is about 7 times
    faster than the STL version that uses set<T> and distance() !
    How come? :)
    The culprit is IMO the distance() call. One needs a better, a direct,
    method without traversing the tree.

    A better way would have been if the insert() and find() functions would
    optionally deliver the distance value on the fly;
    ie. STL needs to extend these 2 functions with a direct 'distance' option.
     
    bilgekhan, May 26, 2008
    #4
  5. bilgekhan

    Jerry Coffin Guest

    In article <g1di7e$8ib$>,
    says...

    [ ... ]

    > Yes, distance() does what I wanted, but it is VERY slow!


    That's not terribly surprising.

    > It seems each time it walks all items from the beginning to the
    > item in question, ie. it practically does a find().


    Yes, that's correct.

    > That's unfortunately too inefficient for my needs.
    > Isn't there a better (a direct) method without traversing the set ?


    None of which I'm aware.

    > FYI: If I program this without STL (and then even using an unordered array
    > --> ie. each time doing a sequential search) then it is about 7 times
    > faster than the STL version that uses set<T> and distance() !
    > How come? :)


    A set is composed of nodes connected by pointers. Each node is
    separately allocated via the allocator for the set (which will use new,
    unless you supply an allocator of your own). Those nodes will usually
    have fairly poor locality of reference, which leads to poor cache usage
    (among other things). The result is relatively slow traversal.

    An array is required to be allocated as a single, contiguous block of
    storage. This leads to relatively good locality of reference, better
    cache usage, and (generally) relatively fast traversal.

    > The culprit is IMO the distance() call. One needs a better, a direct,
    > method without traversing the tree.
    >
    > A better way would have been if the insert() and find() functions would
    > optionally deliver the distance value on the fly;
    > ie. STL needs to extend these 2 functions with a direct 'distance' option.


    This requirement is sufficiently unusual that I doubt that'll happen.
    Anything that supports random access iterators can do distance in
    constant time.

    Since a set is typically implemented as a balanced tree, you could
    theoretically improve the speed of distance() by having each pointer in
    a node also record the number of nodes in the subtree at which it points
    (or at least in its left sub-tree). This would avoid traversing at least
    part of the tree most of the time -- OTOH, maintaining those counts
    would slow down insertions and deletions. TANSTAAFL.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, May 26, 2008
    #5
  6. bilgekhan

    James Kanze Guest

    Re: STL: how to get the sequence number of a newly added item into aset

    On May 25, 7:05 pm, Sam <> wrote:
    > bilgekhan writes:
    > > After doing a succcessful insert() or find() on a set<T> container
    > > is it possible to get the item number of this item in the set?
    > > (ie. its zero-based sequence number (position/location/rank/index)
    > > within the sorted collection)


    > There are no "sequence numbers" for members of a set.
    > Individual members of sets are accessed via iterators.


    A set has a defined order, so there is a defined and guaranteed
    bijection between elements of the set and the set of integers in
    [0...s.size()).

    > > Additional info: insert() returns a pair<T> and find()
    > > returns an iterator. Using any of these two objects is it
    > > possible to get the sequence number of the item these
    > > objects point to?


    > No, because there is no such a concept as a "sequence number".


    Anytime you have a defined order, there is a concept of sequence
    number. Set doesn't support it directly, for reasons explained
    by Jerry Coffin, but the concept definitly exists.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 26, 2008
    #6
  7. bilgekhan

    James Kanze Guest

    Re: STL: how to get the sequence number of a newly added item into aset

    On May 26, 8:04 am, Jerry Coffin <> wrote:
    > In article <g1di7e$>,
    > says...


    > [ ... ]


    > > Yes, distance() does what I wanted, but it is VERY slow!


    > That's not terribly surprising.


    > > It seems each time it walks all items from the beginning to the
    > > item in question, ie. it practically does a find().


    > Yes, that's correct.


    > > That's unfortunately too inefficient for my needs.
    > > Isn't there a better (a direct) method without traversing the set ?


    > None of which I'm aware.


    Perhaps the solution to his problem would be to use an
    std::vector, and keep it sorted (using std::lower_bound to find
    the insertion point). If insertions aren't too frequent, or
    copying is fairly cheap, this can be even faster than an
    std::set. And of course, given an iterator to an element, it's
    trivial and very fast to find the elements sequence number.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 26, 2008
    #7
  8. bilgekhan

    Jerry Coffin Guest

    Re: STL: how to get the sequence number of a newly added item into a set

    In article <611c02c0-da39-45cc-95aa-767706d1c0a8
    @a70g2000hsh.googlegroups.com>, says...

    [ ... ]

    > Perhaps the solution to his problem would be to use an
    > std::vector, and keep it sorted (using std::lower_bound to find
    > the insertion point). If insertions aren't too frequent, or
    > copying is fairly cheap, this can be even faster than an
    > std::set. And of course, given an iterator to an element, it's
    > trivial and very fast to find the elements sequence number.


    Right -- I'd add, however, that my mention of random access iterators
    instead of vectors was intentional. Under the circumstances at hand, a
    deque might well be a better choice, since it halves the number of
    elements that need to be shuffled for an average insertion.


    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, May 26, 2008
    #8
  9. bilgekhan

    bilgekhan Guest

    "Daniel T." <> wrote:
    > "bilgekhan" <> wrote:
    > > "Daniel T." <> wrote:
    > > > "bilgekhan" <> wrote:
    > > >
    > > > > After doing a succcessful insert() or find() on a set<T> container
    > > > > is it possible to get the item number of this item in the set?
    > > > >
    > > > > (ie. its zero-based sequence number (position/location/rank/index)
    > > > > within the sorted collection)
    > > > >
    > > > > Additional info:
    > > > > insert() returns a pair<T> and find() returns an iterator.
    > > > > Using any of these two objects is it possible to get the
    > > > > sequence number of the item these objects point to?
    > > > > After the insert() or find() the container won't be modified.
    > > >
    > > > set<int>::iterator it = s.insert( x ).first;
    > > > // or = s.find( x );
    > > >
    > > > int seqNumb = distance( s.begin(), it );

    > >
    > > Yes, distance() does what I wanted, but it is VERY slow!
    > > It seems each time it walks all items from the beginning to the
    > > item in question, ie. it practically does a find().

    >
    > How else could it be done?


    See below.

    > > FYI: If I program this without STL (and then even using an unordered array
    > > --> ie. each time doing a sequential search) then it is about 7 times
    > > faster than the STL version that uses set<T> and distance() !
    > > How come? :)

    >
    > Your comparing apples to oranges. In one implementation you are using an
    > unordered array where all the elements are contiguous, in the other
    > implementation the elements are ordered but not contiguous. But before
    > abandoning the set idea, make sure you have all optimizations turned on.
    > Many standard libraries have extensive error checking built in and make
    > a lot of calls that can be inlined, but the former doesn't get removed
    > and the latter doesn't get inlined unless optimizations are turned on.
    >
    > > The culprit is IMO the distance() call. One needs a better, a direct,
    > > method without traversing the tree.

    >
    > Like how?


    See below.

    > > A better way would have been if the insert() and find() functions would
    > > optionally deliver the distance value on the fly

    >
    > They would have to traverse the tree to do that.


    Yes, of course! Therefore the traversion does not need
    to be done again in distance() ! That's the point I'm trying
    to make clear. If you reread my inital posting then you will see that
    I said that I need the distance value immediately after an insert()
    or find() call... Do you see what I mean? :)
    Currently the tree is traversed twice: once in insert() and find(),
    and the second time in distance(). The second traversion is
    IMO unneccessary if insert() and find() would store this value
    somewhere internally, and distance() then just returns that value
    w/o traversing the tree.
    Maybe for this one can make a new function member like
    "distance_last_added()" or something that. Another option would be
    if insert() and find() would deliver it in a overloaded version...

    > As others have pointed out, "sequence numbers" aren't all that useful in
    > sets. It sounds like you would be better served by using a std::vector
    > and sorting it (using std::sort,) then you can do lookups with
    > std::lower_bound, upper_bound, equal_range, and binary_search and
    > getting the "sequence number" is simply a matter of a little pointer
    > math.


    I need to get the seqnbr for each item that gets added to
    the sorted collection, so then especially the sorting would be an overkill.

    FYI: I'm doing this for the table-lookup routine (ie. a dictionary) of
    a special data compressor for data records with common structure;
    ie. it's not general purpose.

    I just want to make this last point clear: I just have pointed out
    IMO a shortcoming, performance issues or missing features
    of the language that I in the field have encountered.
    It's of course up to the language and library designers to
    understand the problem and to fix or implement the functionality.
    I'm just informing the people who are more involved in such issues.
    If I were the designer of the library I would do it the way I described above.
     
    bilgekhan, May 26, 2008
    #9
  10. bilgekhan

    Kai-Uwe Bux Guest

    bilgekhan wrote:

    > "Daniel T." <> wrote:
    >> "bilgekhan" <> wrote:
    >> > "Daniel T." <> wrote:
    >> > > "bilgekhan" <> wrote:
    >> > >
    >> > > > After doing a succcessful insert() or find() on a set<T> container
    >> > > > is it possible to get the item number of this item in the set?
    >> > > >
    >> > > > (ie. its zero-based sequence number (position/location/rank/index)
    >> > > > within the sorted collection)
    >> > > >
    >> > > > Additional info:
    >> > > > insert() returns a pair<T> and find() returns an iterator.
    >> > > > Using any of these two objects is it possible to get the
    >> > > > sequence number of the item these objects point to?
    >> > > > After the insert() or find() the container won't be modified.
    >> > >
    >> > > set<int>::iterator it = s.insert( x ).first;
    >> > > // or = s.find( x );
    >> > >
    >> > > int seqNumb = distance( s.begin(), it );
    >> >
    >> > Yes, distance() does what I wanted, but it is VERY slow!
    >> > It seems each time it walks all items from the beginning to the
    >> > item in question, ie. it practically does a find().

    >>
    >> How else could it be done?

    >
    > See below.
    >
    >> > FYI: If I program this without STL (and then even using an unordered
    >> > array --> ie. each time doing a sequential search) then it is about 7
    >> > times faster than the STL version that uses set<T> and distance() !
    >> > How come? :)

    >>
    >> Your comparing apples to oranges. In one implementation you are using an
    >> unordered array where all the elements are contiguous, in the other
    >> implementation the elements are ordered but not contiguous. But before
    >> abandoning the set idea, make sure you have all optimizations turned on.
    >> Many standard libraries have extensive error checking built in and make
    >> a lot of calls that can be inlined, but the former doesn't get removed
    >> and the latter doesn't get inlined unless optimizations are turned on.
    >>
    >> > The culprit is IMO the distance() call. One needs a better, a direct,
    >> > method without traversing the tree.

    >>
    >> Like how?

    >
    > See below.
    >
    >> > A better way would have been if the insert() and find() functions would
    >> > optionally deliver the distance value on the fly

    >>
    >> They would have to traverse the tree to do that.


    Notice the "would".

    > Yes, of course! Therefore the traversion does not need
    > to be done again in distance() ! That's the point I'm trying
    > to make clear. If you reread my inital posting then you will see that
    > I said that I need the distance value immediately after an insert()
    > or find() call... Do you see what I mean? :)
    > Currently the tree is traversed twice: once in insert() and find(),


    No. Insert and find do _not_ traverse the tree. They only run through a path
    from the root to a leave in the tree. During this process, the sequence
    number is not established as a by-product.

    > and the second time in distance().


    It's the first time.

    [snip]


    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, May 26, 2008
    #10
  11. bilgekhan

    Jerry Coffin Guest

    In article <g1fabi$rc$>,
    says...

    [ ... ]

    > > > A better way would have been if the insert() and find() functions would
    > > > optionally deliver the distance value on the fly

    > >
    > > They would have to traverse the tree to do that.

    >
    > Yes, of course! Therefore the traversion does not need
    > to be done again in distance() ! That's the point I'm trying
    > to make clear. If you reread my inital posting then you will see that
    > I said that I need the distance value immediately after an insert()
    > or find() call... Do you see what I mean? :)


    I see what you mean, but I think you're wrong. Consider (for example),
    inserting an item that's going to be in the right subtree. insert()
    compares the new item to the root node, finds that the new item is
    larger than the root, and proceeds to the node to the right of the root.
    In doing so, it has ignored the _entire_ left sub-tree.

    When you use distance to find the position of the new node in the tree,
    it cannot ignore the left subtree. Instead, it traverses the entire left
    subtree counting each node as it goes. It then proceeds to count through
    the right subtree until it reaches the point at which the new node was
    inserted.

    > Currently the tree is traversed twice: once in insert() and find(),
    > and the second time in distance(). The second traversion is
    > IMO unneccessary if insert() and find() would store this value
    > somewhere internally, and distance() then just returns that value
    > w/o traversing the tree.


    See above. The traversals are entirely _different_ from each other. The
    traversal involved in an insertion or deletion is NOT sufficient to
    determine the overall position of the new node in the tree.

    You _could_ determine an _extremely_ rough estimate of the overall
    position in the tree with only a little extra work. A set, map, etc., is
    normally implemented as a balanced binary tree (e.g. red-black or AVL
    tree). To maintain balancing, such a tree stores some information about
    the _relative_ depths of the left and right subtrees in each node. They
    limit the difference in height between the left and right subtrees
    (though r-b trees and AVL trees impose slightly different restrictions).

    Based upon that height difference, and the depth of the tree you _did_
    traverse, you could very quickly compute upper and lower bounds for the
    overall position of the new node in the tree -- but those bounds may
    easily differ from each other by a factor of 2 or so. Getting the
    correct number is going to be linear.

    [ ... ]

    > I just want to make this last point clear: I just have pointed out
    > IMO a shortcoming, performance issues or missing features
    > of the language that I in the field have encountered.
    > It's of course up to the language and library designers to
    > understand the problem and to fix or implement the functionality.
    > I'm just informing the people who are more involved in such issues.
    > If I were the designer of the library I would do it the way I described above.


    Maybe so -- but if you did, I doubt much of anybody would use your
    library. In fact, except for this specific situation, even YOU probably
    wouldn't use it. Making things work as you seem to want would involve a
    _large_ slowdown in common operations in the hope of achieving a _small_
    speedup for a relatively rare sequence of operations.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, May 27, 2008
    #11
  12. Re: STL: how to get the sequence number of a newly added item intoa set

    Hi!

    bilgekhan schrieb:
    >> As others have pointed out, "sequence numbers" aren't all that useful in
    >> sets. It sounds like you would be better served by using a std::vector
    >> and sorting it (using std::sort,) then you can do lookups with
    >> std::lower_bound, upper_bound, equal_range, and binary_search and
    >> getting the "sequence number" is simply a matter of a little pointer
    >> math.

    >
    > I need to get the seqnbr for each item that gets added to
    > the sorted collection, so then especially the sorting would be an overkill.


    Your answer doesn't make sense to me. I guess you misunderstood the
    advise. The advise propses what I had done if I were you:

    DO NOT USE A STD::SET.

    It doesn't fit your needs. The fact that you need the "index" of some
    element shows it.

    Replace the std::set<Whatever> by std::vector<Whatever> and try if you
    can get along.

    > FYI: I'm doing this for the table-lookup routine (ie. a dictionary) of
    > a special data compressor for data records with common structure;
    > ie. it's not general purpose.

    [snip]
    > If I were the designer of the library I would do it the way I described above.


    Maybe all people writing a non general purpose data compressor with a
    table-lookup routine will use your library then.

    I probably wouldn't. Don't be offensed. It is just that std::set will
    never have a "index_last_added" method. It's by design. If you need a
    function like that you need to implement a new set class of your own.
    Having done that come back and tell us how easy it was to determine the
    exact size of a tree branch which is never visited or how to keep the
    counts up-to-date which tell you the exact size, how you then managed to
    keep the insert in O(log n). I think this is too complicated compared to
    a sorted vector solution.

    Frank
     
    Frank Birbacher, May 27, 2008
    #12
  13. bilgekhan

    Daniel Pitts Guest

    Re: STL: how to get the sequence number of a newly added item intoa set

    Frank Birbacher wrote:
    > Hi!
    >
    > bilgekhan schrieb:
    >>> As others have pointed out, "sequence numbers" aren't all that useful
    >>> in sets. It sounds like you would be better served by using a
    >>> std::vector and sorting it (using std::sort,) then you can do lookups
    >>> with std::lower_bound, upper_bound, equal_range, and binary_search
    >>> and getting the "sequence number" is simply a matter of a little
    >>> pointer math.

    >>
    >> I need to get the seqnbr for each item that gets added to
    >> the sorted collection, so then especially the sorting would be an
    >> overkill.

    >
    > Your answer doesn't make sense to me. I guess you misunderstood the
    > advise. The advise propses what I had done if I were you:
    >
    > DO NOT USE A STD::SET.
    >
    > It doesn't fit your needs. The fact that you need the "index" of some
    > element shows it.
    >
    > Replace the std::set<Whatever> by std::vector<Whatever> and try if you
    > can get along.
    >
    >> FYI: I'm doing this for the table-lookup routine (ie. a dictionary) of
    >> a special data compressor for data records with common structure;
    >> ie. it's not general purpose.

    > [snip]
    >> If I were the designer of the library I would do it the way I
    >> described above.

    >
    > Maybe all people writing a non general purpose data compressor with a
    > table-lookup routine will use your library then.
    >
    > I probably wouldn't. Don't be offensed. It is just that std::set will
    > never have a "index_last_added" method. It's by design. If you need a
    > function like that you need to implement a new set class of your own.
    > Having done that come back and tell us how easy it was to determine the
    > exact size of a tree branch which is never visited or how to keep the
    > counts up-to-date which tell you the exact size, how you then managed to
    > keep the insert in O(log n). I think this is too complicated compared to
    > a sorted vector solution.
    >
    > Frank

    I agree with your point, but feel like pointing out that it is possible
    (almost trivial) to have every node on a tree keep track of how many
    children it has. An insert operation would increment the value on the
    return-trip if the insert was successful.

    My question to the OP is, do you need to update the insert-position of
    the elements *after* the last inserted element? say I have an existing set:

    <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
    I insert 3:
    <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
    the associated data with 5 and 6 to say they are now at position 2 and 3
    respectively?

    Also to the OP:
    If you design your library for your specific use-cases, but in such a
    way that it can easily be extended, you'll find yourself much happier.
    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
     
    Daniel Pitts, May 28, 2008
    #13
  14. bilgekhan

    Daniel Pitts Guest

    Re: STL: how to get the sequence number of a newly added item intoa set

    Daniel T. wrote:
    > Daniel Pitts <> wrote:
    >
    >> ...it is possible (almost trivial) to have every node on a tree
    >> keep track of how many children it has. An insert operation would
    >> increment the value on the return-trip if the insert was successful.

    >
    > I'm not so sure if an extra unsigned long per element can be considered
    > "trivial".

    I said "almost trivial", which refers to difficulty, not space complexity.

    Also, it could easily be a compile-time decision if you needed it or not.

    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
     
    Daniel Pitts, May 28, 2008
    #14
  15. bilgekhan

    bilgekhan Guest

    "Daniel Pitts" wrote:
    > Frank Birbacher wrote:
    > >
    > > Don't be offensed. It is just that std::set will never have a
    > > "index_last_added" method. It's by design. If you need a
    > > function like that you need to implement a new set class of your own.
    > > Having done that come back and tell us how easy it was to determine the
    > > exact size of a tree branch which is never visited or how to keep the
    > > counts up-to-date which tell you the exact size, how you then managed to
    > > keep the insert in O(log n). I think this is too complicated compared to
    > > a sorted vector solution.
    > >

    > I agree with your point, but feel like pointing out that it is possible
    > (almost trivial) to have every node on a tree keep track of how many
    > children it has. An insert operation would increment the value on the
    > return-trip if the insert was successful.
    >
    > My question to the OP is, do you need to update the insert-position of
    > the elements *after* the last inserted element? say I have an existing set:
    >
    > <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
    > I insert 3:
    > <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
    > the associated data with 5 and 6 to say they are now at position 2 and 3
    > respectively?


    From point of application code it is sufficient to know the position of
    just only the *last* successful insert() operation and of the
    *last* successful find() operation.
    Ie. in the application code there is no need to keep track of the positions
    of the other items; just that of the latest successful insert() or find().
     
    bilgekhan, May 28, 2008
    #15
  16. bilgekhan

    bilgekhan Guest

    "Daniel Pitts" wrote:
    > Frank Birbacher wrote:
    > >
    > > Don't be offensed. It is just that std::set will never have a
    > > "index_last_added" method. It's by design. If you need a
    > > function like that you need to implement a new set class of your own.
    > > Having done that come back and tell us how easy it was to determine the
    > > exact size of a tree branch which is never visited or how to keep the
    > > counts up-to-date which tell you the exact size, how you then managed to
    > > keep the insert in O(log n). I think this is too complicated compared to
    > > a sorted vector solution.
    > >

    > I agree with your point, but feel like pointing out that it is possible
    > (almost trivial) to have every node on a tree keep track of how many
    > children it has. An insert operation would increment the value on the
    > return-trip if the insert was successful.
    >
    > My question to the OP is, do you need to update the insert-position of
    > the elements *after* the last inserted element? say I have an existing set:
    >
    > <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
    > I insert 3:
    > <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
    > the associated data with 5 and 6 to say they are now at position 2 and 3
    > respectively?


    From point of application code it is sufficient to know the position of
    just only the *last* successful insert() operation and of the
    *last* successful find() operation.
    Ie. in the application code there is no need to keep track of the positions
    of the other items; just that of the latest successful insert() or find().

    In the example above, after inserting 3 into this set
    it just shall be possible to get the position
    of this latest new item (ie. here the position of 3, ie. 1).
    There is no need for the library to keep track or to update
    anything defined in the application code that references items of the set.
    Ie. there are no such references (at least not in my code).
     
    bilgekhan, May 28, 2008
    #16
  17. bilgekhan

    Kai-Uwe Bux Guest

    Daniel T. wrote:

    > Daniel Pitts <> wrote:
    >
    >> ...it is possible (almost trivial) to have every node on a tree
    >> keep track of how many children it has. An insert operation would
    >> increment the value on the return-trip if the insert was successful.

    >
    > I'm not so sure if an extra unsigned long per element can be considered
    > "trivial".


    It's not really "extra": Any balanced tree has to store some additional
    information. This information can be _traded_ for the unsigned long since
    size information can be used to rebalance. Then, alignment and memory
    quantization from the allocator will very likely imply that

    2 pointers + 1 unsigned long + user data

    is simply indistinguishable from

    2 pointers + balancing info + user data



    Best

    Kai-Uwe Bux
     
    Kai-Uwe Bux, May 28, 2008
    #17
  18. bilgekhan

    bilgekhan Guest

    "bilgekhan" wrote:
    > "Daniel Pitts" wrote:
    > > Frank Birbacher wrote:
    > > >
    > > > Don't be offensed. It is just that std::set will never have a
    > > > "index_last_added" method. It's by design. If you need a
    > > > function like that you need to implement a new set class of your own.
    > > > Having done that come back and tell us how easy it was to determine the
    > > > exact size of a tree branch which is never visited or how to keep the
    > > > counts up-to-date which tell you the exact size, how you then managed to
    > > > keep the insert in O(log n). I think this is too complicated compared to
    > > > a sorted vector solution.
    > > >

    > > I agree with your point, but feel like pointing out that it is possible
    > > (almost trivial) to have every node on a tree keep track of how many
    > > children it has. An insert operation would increment the value on the
    > > return-trip if the insert was successful.
    > >
    > > My question to the OP is, do you need to update the insert-position of
    > > the elements *after* the last inserted element? say I have an existing set:
    > >
    > > <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
    > > I insert 3:
    > > <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
    > > the associated data with 5 and 6 to say they are now at position 2 and 3
    > > respectively?

    >
    > From point of application code it is sufficient to know the position of
    > just only the *last* successful insert() operation and of the
    > *last* successful find() operation.
    > Ie. in the application code there is no need to keep track of the positions
    > of the other items; just that of the latest successful insert() or find().
    >
    > In the example above, after inserting 3 into this set
    > it just shall be possible to get the position
    > of this latest new item (ie. here the position of 3, ie. 1).
    > There is no need for the library to keep track or to update
    > anything defined in the application code that references items of the set.
    > Ie. there are no such references (at least not in my code).


    Besides the above functionality a furthergoing super solution
    would be if the traversing of the container would be possible
    for all these 3 cases:
    1) traverse in default (sorted) order
    2) traverse in raw (physical) order
    3) traverse in insert order

    The 1st one is of course already possible.

    The 2nd case could also efficiently be realized in application code
    if only the above mentioned functionality would be possible. Ie. one then
    would need to store and update the returned pos of the inserted item
    in a std::vector<int> at position pos. But before doing this one would need
    to move (reversely) all items from pos to the end by 1 position,
    and while doing this incrementing by 1 those items that have a
    value >= the pos to insert...

    The 3rd could be done for example like this:
    cursor c; // stores last walked pos etc.
    while ((it = s.next(c)) != s.end())
    do_something with *it

    Ie. one would need to implement a "next(cursor& c)" method in the library...

    These are just some ideas of mine...
    At the moment I unfortunately don't have time to implement these
    IMO useful functionalities myself, besides this I don't know the STL
    that well, much less its internals. It seems one would need to
    extend the underlying code of the RB or AVL tree and add
    a uint data member to each block and update it with each insert, delete etc.
    Long time ago I had worked on Btree implementations and I can imagine
    that the most complicated case would be when a block would need
    to be splitted or when two blocks need to be joined...

    People interessted in implementing the above features
    can also take a look at this interessting project:

    "STL containers map, set, and list with fast array access" by Ray Virzi
    "Source code for STL compliant container classes that add
    fast indexing capability to existing container types":
    http://www.codeproject.com/KB/stl/indexlist.aspx
     
    bilgekhan, May 28, 2008
    #18
  19. bilgekhan

    bilgekhan Guest

    <This a fix to my prev. posting>

    "bilgekhan" wrote:
    > "Daniel Pitts" wrote:
    > > Frank Birbacher wrote:
    > > >
    > > > Don't be offensed. It is just that std::set will never have a
    > > > "index_last_added" method. It's by design. If you need a
    > > > function like that you need to implement a new set class of your own.
    > > > Having done that come back and tell us how easy it was to determine the
    > > > exact size of a tree branch which is never visited or how to keep the
    > > > counts up-to-date which tell you the exact size, how you then managed to
    > > > keep the insert in O(log n). I think this is too complicated compared to
    > > > a sorted vector solution.
    > > >

    > > I agree with your point, but feel like pointing out that it is possible
    > > (almost trivial) to have every node on a tree keep track of how many
    > > children it has. An insert operation would increment the value on the
    > > return-trip if the insert was successful.
    > >
    > > My question to the OP is, do you need to update the insert-position of
    > > the elements *after* the last inserted element? say I have an existing set:
    > >
    > > <1, 5, 6> so, 5 and 6 are at position 1 and 2 respectively...
    > > I insert 3:
    > > <1, 3, 5, 6> and determine that 3 is at position 1, do I need to update
    > > the associated data with 5 and 6 to say they are now at position 2 and 3
    > > respectively?

    >
    > From point of application code it is sufficient to know the position of
    > just only the *last* successful insert() operation and of the
    > *last* successful find() operation.
    > Ie. in the application code there is no need to keep track of the positions
    > of the other items; just that of the latest successful insert() or find().
    >
    > In the example above, after inserting 3 into this set
    > it just shall be possible to get the position
    > of this latest new item (ie. here the position of 3, ie. 1).
    > There is no need for the library to keep track or to update
    > anything defined in the application code that references items of the set.
    > Ie. there are no such references (at least not in my code).


    Besides the above functionality a furthergoing super solution
    would be if the traversing of the container would be possible
    for all these 3 cases:
    1) traverse in default (sorted) order
    2) traverse in insert order
    3) traverse in raw (physical) order

    The 1st one is of course already possible.

    The 2nd case could also efficiently be realized in application code
    if only the above mentioned functionality would be possible. Ie. one then
    would need to store and update the returned pos of the inserted item
    in a std::vector<uint> at position pos. But before doing this one would need
    to move (reversely) all items from pos to the end by 1 position,
    and while doing this incrementing by 1 those items that have a
    value >= the pos to insert...
    But one would need a direct access (array access) to the items of the set via a uint handle...

    The 3rd could be done for example like this:
    cursor c; // stores last walked pos etc.
    while ((it = s.next(c)) != s.end())
    do_something with *it

    Ie. one would need to implement a "next(cursor& c)" method in the library...

    These are just some ideas of mine...
    At the moment I unfortunately don't have time to implement these
    IMO useful functionalities myself, besides this I don't know the STL
    that well, much less its internals. It seems one would need to
    extend the underlying code of the RB or AVL tree and add
    a uint data member to each block and update it with each insert, delete etc.
    Long time ago I had worked on Btree implementations and I can imagine
    that the most complicated case would be when a block would need
    to be splitted or when two blocks need to be joined...

    People interessted in implementing the above features
    can also take a look at this interessting project:

    "STL containers map, set, and list with fast array access" by Ray Virzi
    "Source code for STL compliant container classes that add
    fast indexing capability to existing container types":
    http://www.codeproject.com/KB/stl/indexlist.aspx
     
    bilgekhan, May 28, 2008
    #19
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Do
    Replies:
    1
    Views:
    455
    Dmitriy Lapshin [C# / .NET MVP]
    Oct 22, 2003
  2. Dica
    Replies:
    1
    Views:
    347
    Scott M.
    Apr 21, 2006
  3. Johannes Zellner
    Replies:
    1
    Views:
    512
    Alex Martelli
    Jan 17, 2006
  4. =?Utf-8?B?UHJhdmlu?=

    how to fetch only newly added/edited record

    =?Utf-8?B?UHJhdmlu?=, Mar 29, 2007, in forum: ASP .Net
    Replies:
    9
    Views:
    421
    =?Utf-8?B?UGFsbGF2aQ==?=
    Apr 13, 2007
  5. Martin
    Replies:
    2
    Views:
    114
    Rob B
    Nov 19, 2004
Loading...

Share This Page