Fast STL data structure

tim.lino · May 1, 2007

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Thank You.

Victor Bazarov · May 1, 2007

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Collect them in a set, then copy the set into a vector.

V

cpunerd · May 1, 2007

their are sorts that are better than O(n lg n), but they are general,
and hence not easily templatized. try implementing radix/counting sort
for your own data stored within a vector.

cpunerd · May 1, 2007

collecting in a set is still O(n lg n)

Victor said:
Collect them in a set, then copy the set into a vector.

V

cpunerd · May 1, 2007

their are sorts that are better than O(n lg n), but they are general,
and hence not easily templatized. try implementing radix/counting sort
for your own data stored within a vector.

peter koch · May 1, 2007

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......

}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Thank You.

Vector is probably the correct choice. Sorting is inherently n log n
if you sort by comparing, and the specialised sorts are only useful in
special cases.

/Peter

Andrew Koenig · May 1, 2007

As I know, vector is very efficient in random access. However, sorting

vector of size n takes O(nlog n) time which is not good.

What is not good about it?

Dave Steffen · May 1, 2007

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

A point: _all_ sorting operations are O(n log n). You just can't do
any better. What you _can_ do is reduce the constants in front of
the n log n, by using well-implemented libraries, and being clever.
For example, in Effective STL, Scott Meyers points out an
alternative to using sets (and maps) that, given certain usage
patterns, can provide significant run-time advantages. (If you
haven't read Effective STL, you should.)

Dave Steffen · May 1, 2007

[Don't top-post. Reordered.]

collecting in a set is still O(n lg n)

Indeed. Again, you can't beat n log n for sorting; IIRC that's been
definitively proven.

While I wouldn't do exactly what Victor suggests, he's got a point,
which is effectively the same as Scott Meyer's point in Effective
STL. Sets (and maps) are really optimized for the use case "insert,
read, add or delete, read, add or delete..." In other words, the data
in the container is frequently changed.

Putting your data in a vector (which provides fast access) and sorting
it _once_ up front can provide substantial speed increases; we've used
that approach several times.

My only quibble with Victor's solution is the extra memory management
and copying that will go on behind the scenes. I'd say put your data
in a vector (or maybe deque?), sort, and then use. This is probably
the fastest "general purpose" solution there is.

Alf P. Steinbach · May 1, 2007

* Dave Steffen:

A point: _all_ sorting operations are O(n log n). You just can't do
any better.

Although off-topic in clc++m, this urban myth (an over-generalization of
a basic result) should not be propagated, so:

The O(n log n) limit is for random data, with enough of it, on a
sequential machine. Theoretically it stems from the fact that n items
can be arranged in n! ways, and n^n <= n!^2 <= (n^2)^n. Now just take
logs to find the number of bits needed to represent a permutation, which
is the same as the number of binary choices needed to select one.

As soon as you can make assumptions about the data, or can establish
properties of the data, you can do better (let's not discuss advanced
computer architectures). The most trivial example is when you know the
data is already sorted, yielding O(1) sorting time. As a more
practically useful example, a least significant digit radix sort has
sorting time O(nk), where k is the length of a key.

What you _can_ do is reduce the constants in front of
the n log n, by using well-implemented libraries, and being clever.

Being clever is in general not very clever...

For example, in Effective STL, Scott Meyers points out an
alternative to using sets (and maps) that, given certain usage
patterns, can provide significant run-time advantages. (If you
haven't read Effective STL, you should.)

The OP's problem is trivially solved by using std::map. He just needs
to provide a strict weak ordering, operator<, for his Object class.
Before optimizing or even thinking about it, measure.

----------------------------------------------------------------------
Dave Steffen, Ph.D. Disobey this command!
Software Engineer IV - Douglas Hofstadter
Numerica Corporation
dg@steffen a@t numerica d@ot us (remove @'s to email me)

Please use a proper signature delimiter (two dashes followed by space,
with nothing else on that line).

Oh well, I'll never get a PhD, nor a job with Numerica Corporation, I'm
sure.

Dave Steffen · May 1, 2007

Alf P. Steinbach said:
* Dave Steffen: [...]

A point: _all_ sorting operations are O(n log n). You just can't
do
any better.

Click to expand...

Although off-topic in clc++m, this urban myth (an over-generalization
of a basic result) should not be propagated, so:

The O(n log n) limit is for random data, with enough of it, on a
sequential machine. Theoretically it stems from the fact that n items
can be arranged in n! ways, and n^n <= n!^2 <= (n^2)^n. Now just take
logs to find the number of bits needed to represent a permutation,
which is the same as the number of binary choices needed to select one.

As soon as you can make assumptions about the data, or can establish
properties of the data, you can do better (let's not discuss advanced
computer architectures). The most trivial example is when you know
the data is already sorted, yielding O(1) sorting time. As a more
practically useful example, a least significant digit radix sort has
sorting time O(nk), where k is the length of a key.

Of course. O(n log n) isn't so much an urban myth as the best
answer you can give if you _don't_ know anything more about the
input data. Or, put it another way, it's really an _upper bound_ on
how long it takes to sort.

The OP wanted better than O(n log n) but didn't specify any
assumptions about the data. Arguably the correct answer should have
been not "n log n is as good as you can do" but "your question is
underspecified, no answer possible".

Being clever is in general not very clever...

Well, being clever is clever only if you can do it in general.

What I meant by that, but didn't elaborate on, is what you said
above; you can do better, but only under specific circumstances
(e.g. when assumptions about the nature of the data are valid).

The OP's problem is trivially solved by using std::map. He just needs
to provide a strict weak ordering, operator<, for his Object
class.

Well, that's still O(n log n) which he wanted to improve on. My
advice was more about usage patterns; if you're stuck with n log n,
you can still (sometimes) do a lot to file down the constants involved.

Before optimizing or even thinking about it, measure.

Absolutely, and always.

Dave Steffen, Ph.D. Disobey this command!
Software Engineer IV - Douglas Hofstadter
Numerica Corporation

Click to expand...

[...]

Oh well, I'll never get a PhD, nor a job with Numerica Corporation,
I'm sure.

Dunno, we tend to be chronically short on C++ people.

(Sig
fixed.)

Juha Nieminen · May 1, 2007

Dave said:
Again, you can't beat n log n for sorting; IIRC that's been
definitively proven.

You can't sort faster than nlogn when using *comparison* sorting.
You can achieve linear sorting time by other means, but those are
not based on comparisons between elements and thus don't work on
all possible element types. If the element is an int, a faster
sorting algorithm can be used (such as radix sort).

OTOH one has to question if it's worth the efforts. nlogn is not
all that more slower than linear, after all.

Juha Nieminen · May 1, 2007

Alf said:
Although off-topic in clc++m, this urban myth (an over-generalization of
a basic result) should not be propagated, so:

It's not an urban myth. It's the upper limit for generic sorting when
only less-than comparison between elements is available. Any algorithm
faster than that (in the general case) needs something more than just
a less-than comparison.

Alf P. Steinbach · May 1, 2007

* Juha Nieminen:

It's not an urban myth.

It is. Note: you selected to snip the "it" referred to here. Please be
a bit more conscientious about quoting.

It's the upper limit for generic sorting when
only less-than comparison between elements is available. Any algorithm
faster than that (in the general case) needs something more than just
a less-than comparison.

Presumably you're here referring to a different "it" than above, namely
just the expression O(n log n).

Even so, your statement, as it is formulated, is incorrect or
meaningless, depending on how it's interpreted. My earlier posting in
this thread may provide some guidance, but I think I wrote what you
intended to write above. However, that is better discussed in e.g.
[comp.programming], if at all. Follow-ups set.

Cheers,

- Alf

Alternative STL Structure?	10	Nov 25, 2010
Fast Container	8	Nov 2, 2011
STL Container Choice	11	Mar 28, 2012
Sorting an STL map	1	Oct 19, 2013
which data structure to use?	11	Jan 21, 2014
C++ design/over desiging : stl style vs OOPS, OOAD	15	Sep 20, 2011
Data Structure Issue	5	Jul 4, 2007
New comer's question on STL/vector::data() over Dev-Cpp	4	Jul 1, 2013

Fast STL data structure

tim.lino

Victor Bazarov

cpunerd

cpunerd

cpunerd

peter koch

Andrew Koenig

Dave Steffen

Dave Steffen

Alf P. Steinbach

Dave Steffen

Juha Nieminen

Juha Nieminen

Alf P. Steinbach

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads