Fast STL data structure

Discussion in 'C++' started by tim.lino@gmail.com, May 1, 2007.

  1. Guest

    Hello,

    I would like to use C++ STL to store a set of Object's which is as
    follows:

    class Object
    {
    public:
    int value;
    ......
    }

    I need to perform the following actions:

    1. Sort Objects in the increasing order of their "value".
    2. After sorting, I need to randomly access some of the Objects.

    As I know, vector is very efficient in random access. However, sorting
    vector of size n takes O(nlog n) time which is not good.

    Is there a good data structure (or a combination of some) such that it
    is efficient in random access and sorting?

    Thank You.
    , May 1, 2007
    #1
    1. Advertising

  2. wrote:
    > I would like to use C++ STL to store a set of Object's which is as
    > follows:
    >
    > class Object
    > {
    > public:
    > int value;
    > ......
    > }
    >
    > I need to perform the following actions:
    >
    > 1. Sort Objects in the increasing order of their "value".
    > 2. After sorting, I need to randomly access some of the Objects.
    >
    > As I know, vector is very efficient in random access. However, sorting
    > vector of size n takes O(nlog n) time which is not good.
    >
    > Is there a good data structure (or a combination of some) such that it
    > is efficient in random access and sorting?


    Collect them in a set, then copy the set into a vector.

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, May 1, 2007
    #2
    1. Advertising

  3. Guest

    their are sorts that are better than O(n lg n), but they are general,
    and hence not easily templatized. try implementing radix/counting sort
    for your own data stored within a vector.
    wrote:
    > Hello,
    >
    > I would like to use C++ STL to store a set of Object's which is as
    > follows:
    >
    > class Object
    > {
    > public:
    > int value;
    > ......
    > }
    >
    > I need to perform the following actions:
    >
    > 1. Sort Objects in the increasing order of their "value".
    > 2. After sorting, I need to randomly access some of the Objects.
    >
    > As I know, vector is very efficient in random access. However, sorting
    > vector of size n takes O(nlog n) time which is not good.
    >
    > Is there a good data structure (or a combination of some) such that it
    > is efficient in random access and sorting?
    >
    > Thank You.
    , May 1, 2007
    #3
  4. Guest

    collecting in a set is still O(n lg n)
    Victor Bazarov wrote:
    > wrote:
    > > I would like to use C++ STL to store a set of Object's which is as
    > > follows:
    > >
    > > class Object
    > > {
    > > public:
    > > int value;
    > > ......
    > > }
    > >
    > > I need to perform the following actions:
    > >
    > > 1. Sort Objects in the increasing order of their "value".
    > > 2. After sorting, I need to randomly access some of the Objects.
    > >
    > > As I know, vector is very efficient in random access. However, sorting
    > > vector of size n takes O(nlog n) time which is not good.
    > >
    > > Is there a good data structure (or a combination of some) such that it
    > > is efficient in random access and sorting?

    >
    > Collect them in a set, then copy the set into a vector.
    >
    > V
    > --
    > Please remove capital 'A's when replying by e-mail
    > I do not respond to top-posted replies, please don't ask
    , May 1, 2007
    #4
  5. Guest

    their are sorts that are better than O(n lg n), but they are general,
    and hence not easily templatized. try implementing radix/counting sort
    for your own data stored within a vector.
    wrote:
    > Hello,
    >
    > I would like to use C++ STL to store a set of Object's which is as
    > follows:
    >
    > class Object
    > {
    > public:
    > int value;
    > ......
    > }
    >
    > I need to perform the following actions:
    >
    > 1. Sort Objects in the increasing order of their "value".
    > 2. After sorting, I need to randomly access some of the Objects.
    >
    > As I know, vector is very efficient in random access. However, sorting
    > vector of size n takes O(nlog n) time which is not good.
    >
    > Is there a good data structure (or a combination of some) such that it
    > is efficient in random access and sorting?
    >
    > Thank You.
    , May 1, 2007
    #5
  6. peter koch Guest

    On 1 Maj, 05:09, wrote:
    > Hello,
    >
    > I would like to use C++ STL to store a set of Object's which is as
    > follows:
    >
    > class Object
    > {
    > public:
    > int value;
    > ......
    >
    > }
    >
    > I need to perform the following actions:
    >
    > 1. Sort Objects in the increasing order of their "value".
    > 2. After sorting, I need to randomly access some of the Objects.
    >
    > As I know, vector is very efficient in random access. However, sorting
    > vector of size n takes O(nlog n) time which is not good.
    >
    > Is there a good data structure (or a combination of some) such that it
    > is efficient in random access and sorting?
    >
    > Thank You.


    Vector is probably the correct choice. Sorting is inherently n log n
    if you sort by comparing, and the specialised sorts are only useful in
    special cases.

    /Peter
    peter koch, May 1, 2007
    #6
  7. > As I know, vector is very efficient in random access. However, sorting
    > vector of size n takes O(nlog n) time which is not good.


    What is not good about it?
    Andrew Koenig, May 1, 2007
    #7
  8. Dave Steffen Guest

    writes:

    > Hello,
    >
    > I would like to use C++ STL to store a set of Object's which is as
    > follows:
    >
    > class Object
    > {
    > public:
    > int value;
    > ......
    > }
    >
    > I need to perform the following actions:
    >
    > 1. Sort Objects in the increasing order of their "value".
    > 2. After sorting, I need to randomly access some of the Objects.
    >
    > As I know, vector is very efficient in random access. However, sorting
    > vector of size n takes O(nlog n) time which is not good.


    A point: _all_ sorting operations are O(n log n). You just can't do
    any better. What you _can_ do is reduce the constants in front of
    the n log n, by using well-implemented libraries, and being clever.
    For example, in Effective STL, Scott Meyers points out an
    alternative to using sets (and maps) that, given certain usage
    patterns, can provide significant run-time advantages. (If you
    haven't read Effective STL, you should.)


    ----------------------------------------------------------------------
    Dave Steffen, Ph.D. Disobey this command!
    Software Engineer IV - Douglas Hofstadter
    Numerica Corporation
    dg@steffen a@t numerica d@ot us (remove @'s to email me)
    Dave Steffen, May 1, 2007
    #8
  9. Dave Steffen Guest

    [Don't top-post. Reordered.]

    writes:

    > Victor Bazarov wrote:
    > > wrote:
    > > > I would like to use C++ STL to store a set of Object's which is as
    > > > follows:
    > > >
    > > > class Object
    > > > {
    > > > public:
    > > > int value;
    > > > ......
    > > > }
    > > >
    > > > I need to perform the following actions:
    > > >
    > > > 1. Sort Objects in the increasing order of their "value".
    > > > 2. After sorting, I need to randomly access some of the Objects.
    > > >
    > > > As I know, vector is very efficient in random access. However, sorting
    > > > vector of size n takes O(nlog n) time which is not good.
    > > >
    > > > Is there a good data structure (or a combination of some) such that it
    > > > is efficient in random access and sorting?

    > >
    > > Collect them in a set, then copy the set into a vector.

    >
    > collecting in a set is still O(n lg n)


    Indeed. Again, you can't beat n log n for sorting; IIRC that's been
    definitively proven.

    While I wouldn't do exactly what Victor suggests, he's got a point,
    which is effectively the same as Scott Meyer's point in Effective
    STL. Sets (and maps) are really optimized for the use case "insert,
    read, add or delete, read, add or delete..." In other words, the data
    in the container is frequently changed.

    Putting your data in a vector (which provides fast access) and sorting
    it _once_ up front can provide substantial speed increases; we've used
    that approach several times.

    My only quibble with Victor's solution is the extra memory management
    and copying that will go on behind the scenes. I'd say put your data
    in a vector (or maybe deque?), sort, and then use. This is probably
    the fastest "general purpose" solution there is.

    ----------------------------------------------------------------------
    Dave Steffen, Ph.D. Disobey this command!
    Software Engineer IV - Douglas Hofstadter
    Numerica Corporation
    dg@steffen a@t numerica d@ot us (remove @'s to email me)
    Dave Steffen, May 1, 2007
    #9
  10. * Dave Steffen:
    > writes:
    >
    >> Hello,
    >>
    >> I would like to use C++ STL to store a set of Object's which is as
    >> follows:
    >>
    >> class Object
    >> {
    >> public:
    >> int value;
    >> ......
    >> }
    >>
    >> I need to perform the following actions:
    >>
    >> 1. Sort Objects in the increasing order of their "value".
    >> 2. After sorting, I need to randomly access some of the Objects.
    >>
    >> As I know, vector is very efficient in random access. However, sorting
    >> vector of size n takes O(nlog n) time which is not good.

    >
    > A point: _all_ sorting operations are O(n log n). You just can't do
    > any better.


    Although off-topic in clc++m, this urban myth (an over-generalization of
    a basic result) should not be propagated, so:

    The O(n log n) limit is for random data, with enough of it, on a
    sequential machine. Theoretically it stems from the fact that n items
    can be arranged in n! ways, and n^n <= n!^2 <= (n^2)^n. Now just take
    logs to find the number of bits needed to represent a permutation, which
    is the same as the number of binary choices needed to select one.

    As soon as you can make assumptions about the data, or can establish
    properties of the data, you can do better (let's not discuss advanced
    computer architectures). The most trivial example is when you know the
    data is already sorted, yielding O(1) sorting time. As a more
    practically useful example, a least significant digit radix sort has
    sorting time O(nk), where k is the length of a key.


    > What you _can_ do is reduce the constants in front of
    > the n log n, by using well-implemented libraries, and being clever.


    Being clever is in general not very clever...


    > For example, in Effective STL, Scott Meyers points out an
    > alternative to using sets (and maps) that, given certain usage
    > patterns, can provide significant run-time advantages. (If you
    > haven't read Effective STL, you should.)


    The OP's problem is trivially solved by using std::map. He just needs
    to provide a strict weak ordering, operator<, for his Object class.
    Before optimizing or even thinking about it, measure.


    > ----------------------------------------------------------------------
    > Dave Steffen, Ph.D. Disobey this command!
    > Software Engineer IV - Douglas Hofstadter
    > Numerica Corporation
    > dg@steffen a@t numerica d@ot us (remove @'s to email me)


    Please use a proper signature delimiter (two dashes followed by space,
    with nothing else on that line).

    Oh well, I'll never get a PhD, nor a job with Numerica Corporation, I'm
    sure. :)


    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
    Alf P. Steinbach, May 1, 2007
    #10
  11. Dave Steffen Guest

    "Alf P. Steinbach" <> writes:

    > * Dave Steffen:

    [...]
    > >> As I know, vector is very efficient in random access. However, sorting
    > >> vector of size n takes O(nlog n) time which is not good.

    > > A point: _all_ sorting operations are O(n log n). You just can't
    > > do
    > > any better.

    >
    > Although off-topic in clc++m, this urban myth (an over-generalization
    > of a basic result) should not be propagated, so:
    >
    > The O(n log n) limit is for random data, with enough of it, on a
    > sequential machine. Theoretically it stems from the fact that n items
    > can be arranged in n! ways, and n^n <= n!^2 <= (n^2)^n. Now just take
    > logs to find the number of bits needed to represent a permutation,
    > which is the same as the number of binary choices needed to select one.
    >
    > As soon as you can make assumptions about the data, or can establish
    > properties of the data, you can do better (let's not discuss advanced
    > computer architectures). The most trivial example is when you know
    > the data is already sorted, yielding O(1) sorting time. As a more
    > practically useful example, a least significant digit radix sort has
    > sorting time O(nk), where k is the length of a key.


    Of course. O(n log n) isn't so much an urban myth as the best
    answer you can give if you _don't_ know anything more about the
    input data. Or, put it another way, it's really an _upper bound_ on
    how long it takes to sort.

    The OP wanted better than O(n log n) but didn't specify any
    assumptions about the data. Arguably the correct answer should have
    been not "n log n is as good as you can do" but "your question is
    underspecified, no answer possible". :)

    > > What you _can_ do is reduce the constants in front of
    > > the n log n, by using well-implemented libraries, and being clever.

    >
    > Being clever is in general not very clever...


    Well, being clever is clever only if you can do it in general. :)

    What I meant by that, but didn't elaborate on, is what you said
    above; you can do better, but only under specific circumstances
    (e.g. when assumptions about the nature of the data are valid).

    > > For example, in Effective STL, Scott Meyers points out an
    > > alternative to using sets (and maps) that, given certain usage
    > > patterns, can provide significant run-time advantages. (If you
    > > haven't read Effective STL, you should.)

    >
    > The OP's problem is trivially solved by using std::map. He just needs
    > to provide a strict weak ordering, operator<, for his Object
    > class.


    Well, that's still O(n log n) which he wanted to improve on. My
    advice was more about usage patterns; if you're stuck with n log n,
    you can still (sometimes) do a lot to file down the constants involved.

    > Before optimizing or even thinking about it, measure.


    Absolutely, and always.

    > > Dave Steffen, Ph.D. Disobey this command!
    > > Software Engineer IV - Douglas Hofstadter
    > > Numerica Corporation

    [...]

    > Oh well, I'll never get a PhD, nor a job with Numerica Corporation,
    > I'm sure. :)


    Dunno, we tend to be chronically short on C++ people. :) (Sig
    fixed.)

    --
    Dave Steffen, Ph.D. Disobey this command!
    Software Engineer IV - Douglas Hofstadter
    Numerica Corporation
    dg@steffen a@t numerica d@ot us (remove @'s to email me)
    Dave Steffen, May 1, 2007
    #11
  12. Dave Steffen wrote:
    > Again, you can't beat n log n for sorting; IIRC that's been
    > definitively proven.


    You can't sort faster than nlogn when using *comparison* sorting.
    You can achieve linear sorting time by other means, but those are
    not based on comparisons between elements and thus don't work on
    all possible element types. If the element is an int, a faster
    sorting algorithm can be used (such as radix sort).

    OTOH one has to question if it's worth the efforts. nlogn is not
    all that more slower than linear, after all.
    Juha Nieminen, May 1, 2007
    #12
  13. Alf P. Steinbach wrote:
    > Although off-topic in clc++m, this urban myth (an over-generalization of
    > a basic result) should not be propagated, so:


    It's not an urban myth. It's the upper limit for generic sorting when
    only less-than comparison between elements is available. Any algorithm
    faster than that (in the general case) needs something more than just
    a less-than comparison.
    Juha Nieminen, May 1, 2007
    #13
  14. * Juha Nieminen:
    > Alf P. Steinbach wrote:
    >> Although off-topic in clc++m, this urban myth (an over-generalization of
    >> a basic result) should not be propagated, so:

    >
    > It's not an urban myth.


    It is. Note: you selected to snip the "it" referred to here. Please be
    a bit more conscientious about quoting.


    > It's the upper limit for generic sorting when
    > only less-than comparison between elements is available. Any algorithm
    > faster than that (in the general case) needs something more than just
    > a less-than comparison.


    Presumably you're here referring to a different "it" than above, namely
    just the expression O(n log n).

    Even so, your statement, as it is formulated, is incorrect or
    meaningless, depending on how it's interpreted. My earlier posting in
    this thread may provide some guidance, but I think I wrote what you
    intended to write above. However, that is better discussed in e.g.
    [comp.programming], if at all. Follow-ups set.

    Cheers,

    - Alf

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
    Alf P. Steinbach, May 1, 2007
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    635
  2. Michele Simionato

    Python is darn fast (was: How fast is Python)

    Michele Simionato, Aug 23, 2003, in forum: Python
    Replies:
    13
    Views:
    544
  3. puzzlecracker
    Replies:
    4
    Views:
    558
    John Harrison
    Sep 19, 2005
  4. Sigmathaar
    Replies:
    3
    Views:
    320
    Bob Hairgrove
    Dec 6, 2005
  5. Juha Nieminen
    Replies:
    22
    Views:
    980
    Kai-Uwe Bux
    Oct 12, 2007
Loading...

Share This Page