need an efficient data structure

R

Raghavendra Mahuli

i need to store a lot of integers(say 10,000) in a datastructure.
The constraints are:
1. It should be fast.
2. It should be orderded or sorted.
3.Insterting and deleting should be fast.
4. The no. of elements can change at runtime and hence it should be dynamic.
Right now i am using SortedVector , but it is very slow.
So, can u pls suggest a suitable Datastructure.
 
J

John Harrison

Raghavendra Mahuli said:
i need to store a lot of integers(say 10,000) in a datastructure.
The constraints are:
1. It should be fast.
2. It should be orderded or sorted.
3.Insterting and deleting should be fast.
4. The no. of elements can change at runtime and hence it should be dynamic.
Right now i am using SortedVector , but it is very slow.
So, can u pls suggest a suitable Datastructure.

std::set or std::multiset if you want duplicates.

john
 
T

tom_usenet

std::set or std::multiset if you want duplicates.

And make sure you have a fast memory allocator! A pool allocator would
help a lot here, if your std::allocator isn't already optimized in
that way.

Tom
 
J

Jeff Schwab

Raghavendra said:
i need to store a lot of integers(say 10,000) in a datastructure.
The constraints are:
1. It should be fast.

There will be trade-offs. Which operations should be fastest?
2. It should be orderded or sorted.

Must that property be maintained at all times, or do you plan to do a
bunch of insertions or deletions between reads? What I mean is, if you
will insert 100 elements in a row, does the collection really need to
re-sort itself after each insertion?
3.Insterting and deleting should be fast.

Inserting and deleting where? If you need to maintain the ordering
after absolutely every insertion, a std::set or std::multiset may be
best, as JohnH suggested. If you only need it to be sorted once all
elements have been inserted, a std::vector may be more efficient by a
constant factor in both time and space.
4. The no. of elements can change at runtime and hence it should be dynamic.

The standard library has dynamically growable collections for almost all
your containment needs. :)
 
S

Siemel Naran

Raghavendra Mahuli said:
i need to store a lot of integers(say 10,000) in a datastructure.
The constraints are:
1. It should be fast.
2. It should be orderded or sorted.
3.Insterting and deleting should be fast.
4. The no. of elements can change at runtime and hence it should be dynamic.
Right now i am using SortedVector , but it is very slow.
So, can u pls suggest a suitable Datastructure.

In addition to the fine points raised by everyone else, will the integers be
in a range (say 0 to 65535), and can each number occur more than once?
 
R

Raghavendra Mahuli

I'll clarify my situation further.....
1. Having a fast structure, what i mean is insertion of elements anywhere
(and deletion anywhere) should not take much time....Also access time, i.e.
time required to read a particular element should also be fast....I know it
is a tough requirement (But thats why i posted the question to this group).
2. Also each integer can occur more than once but not more than twice.....

If no standard structure satisfies these constraints, can u suggest a new
idea that would satisfy the needs.....
 
R

Raghavendra Mahuli

Jeff Schwab said:
There will be trade-offs. Which operations should be fastest?
--------- Insertion , deletion & searching should be fast
Must that property be maintained at all times, or do you plan to do a
bunch of insertions or deletions between reads? What I mean is, if you
will insert 100 elements in a row, does the collection really need to
re-sort itself after each insertion?

------ At a time, i'll be inserting only 2 elemnts in a row.....it should
reorder itself immediately.
Inserting and deleting where? If you need to maintain the ordering
after absolutely every insertion, a std::set or std::multiset may be
best, as JohnH suggested. If you only need it to be sorted once all
elements have been inserted, a std::vector may be more efficient by a
constant factor in both time and space.
--------I need to maintain ordering after every 2 insertions...
 
J

Jeff Schwab

Raghavendra said:
--------- Insertion , deletion & searching should be fast

Is O(ln2 n) good enough for all of those operations?
------ At a time, i'll be inserting only 2 elemnts in a row.....it should
reorder itself immediately.

Sounds like std::multiset might be your best bet.
--------I need to maintain ordering after every 2 insertions...

Is the number of possible values bounded? If so, how many possible
values of elements are there? 10,000? 2^32? I don't mean how many
elements you need to store in your container, but how many different
values might exist for the type of your integer; for example, if you're
using 16-bit integers, there are 2^16=65536 possible values.
 
R

Raghavendra Mahuli

Is O(ln2 n) good enough for all of those operations?
--------I think that is sufficient

Is the number of possible values bounded? If so, how many possible
values of elements are there? 10,000? 2^32? I don't mean how many
elements you need to store in your container, but how many different
values might exist for the type of your integer; for example, if you're
using 16-bit integers, there are 2^16=65536 possible values.

-----i am storing memory addresses....So all values in the range are
possible.....but they'll generally be multiples of 4...........And
maximum
possible value is 2^32.
 
S

Siemel Naran

-----i am storing memory addresses....So all values in the range are
possible.....but they'll generally be multiples of 4...........And
maximum
possible value is 2^32.

Try the map idea suggested by John, along with the allocator suggested by
tom_usenet.

A direct address table, hinted at in my post, is out of the question since
the range of values is so big (basically you have an array whose size equals
to the number of different possible elements).

You can use a hybrid approach, which could very well be a hashtable.
Something like: all of the numbers in [0, 2^32] hash to one of 32 different
values with equal probability, which still means many numbers could hash to
the same value, and so you store the different numbers in a map (with custom
pool allocator) data structure. See if your implementation has a
hash_table.
 
J

Jeff Schwab

Raghavendra said:
--------I think that is sufficient





-----i am storing memory addresses....So all values in the range are
possible.....but they'll generally be multiples of 4...........And
maximum
possible value is 2^32.

For the values that aren't multiples of 4, stick them in a std::set.
The access time for them will be logarithmic with the cardinality of the
set, but since that cardinality is small, the performance should be good.

This leaves 2^(32-2) = 2^30 values to store; you could use a sequence of
bits, initialized to zero, to represent whether each value is in the
collection. Since you allow up to two of each element, you'll probably
want an extra bit per element to show whether it has been inserted more
than once. This means you need 2^31 bits. I would recommend using two
std::vector<bool>: one to represent whether each value had been
inserted at least once, and another to represent whether each value had
been inserted more than once. You'll need at least 256 MB of RAM, plus
space for the std::set. The machines where I work typically have
multiple GB of RAM, so this is probably the solution I would choose if
performance were critical.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,763
Messages
2,569,562
Members
45,038
Latest member
OrderProperKetocapsules

Latest Threads

Top