Efficient data structures

  • Thread starter: Christian Christmann

Christian Christmann

Hi,

I'm looking for a data structure where I can store
arbitrary elements and then efficiently check if an
element (of the same type as the stored elements) is
contained in the list. The latter operation is performed
pretty frequently, so the data structure I require must
handle it with low complexity.

My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers the "find" function
has a logarithmic complexity. Are there any approaches
with linear complexity?
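
(For concreteness, a minimal, self-contained version of that check
might look like this -- the element type int is just a placeholder:)

  #include <set>

  int main()
  {
      std::set<int> myset;
      myset.insert(42);

      int element2Search = 42;
      // O(log n) lookup in a sorted associative container
      bool contained = myset.find(element2Search) != myset.end();
      return contained ? 0 : 1;
  }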

Regards,
Chris
 

mlimber

[changed followup to c.l.c++]

Christian said:
Hi,

I'm looking for a data structure where I can store
arbitrary elements and then efficiently check if an
element (of the same type as the stored elements) is
contained in the list. The latter operation is performed
pretty frequently, so the data structure I require must
handle it with low complexity.

My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers the "find" function
has a logarithmic complexity. Are there any approaches
with linear complexity?

A heterogeneous container calls for boost::any. You would have to write
your own compare function, however, for ordering in the set (do you
really want uniqueness, and if so, in what sense -- per type or per
value? If not, you might consider a sorted std::vector<boost::any> and
then the standard search functions).
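
(A minimal sketch of such a compare function, assuming the set only
ever holds int and std::string -- a real one must handle every type
the container may contain:)

  #include <boost/any.hpp>
  #include <set>
  #include <string>
  #include <typeinfo>

  struct AnyLess {
      bool operator()(const boost::any& a, const boost::any& b) const {
          // Order by type first (type_info::before gives a consistent,
          // implementation-defined ordering), then by value within a type.
          if (a.type() != b.type())
              return a.type().before(b.type());
          if (a.type() == typeid(int))
              return boost::any_cast<int>(a) < boost::any_cast<int>(b);
          if (a.type() == typeid(std::string))
              return boost::any_cast<std::string>(a)
                   < boost::any_cast<std::string>(b);
          return false; // unknown types: treat as equivalent (sketch only)
      }
  };

  std::set<boost::any, AnyLess> myset;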

Cheers! --M
 

Christian Christmann

A heterogeneous container calls for boost::any. You would have to write
your own compare function, however, for ordering in the set (do you
really want uniqueness, and if so, in what sense -- per type or per
value? If not, you might consider a sorted std::vector<boost::any> and
then the standard search functions).

Maybe I didn't specify my requirements precisely enough. What I need is
a "template-based" data structure where all stored elements are of the
same type. The order of the elements in the data structure is
irrelevant.
 

Victor Bazarov

Christian said:
Maybe I didn't specify my requirements precisely enough. What I need is
a "template-based" data structure where all stored elements are of the
same type. The order of the elements in the data structure is
irrelevant.

If you don't use the member notation (.find), you can use 'std::find'
with any container:

#include <algorithm> // for std::find

std::blah<mytype> mycontainer; // replace 'blah' with 'vector'
// or 'list' or 'set' or 'deque' or ...
// linear scan; compare the result against mycontainer.end()
std::find(mycontainer.begin(), mycontainer.end(), myvalue);

V
 

mlimber

Christian said:
Maybe I didn't specify my requirements precisely enough. What I need is
a "template-based" data structure where all stored elements are of the
same type. The order of the elements in the data structure is
irrelevant.

You could use any of the containers in the STL. Which one you choose
will depend on how you will use it (e.g., will there be a lot of
insertions and deletions, or just a single population at startup?). If
uniqueness is necessary, you probably want std::set. If not, you might
consider std::multiset or std::vector (which you'd have to keep sorted
yourself). You can't beat logarithmic complexity in searching (unless
you already know the index into the data array, but then it's not
really a "search").
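
(A minimal sketch of the sorted-vector alternative, with int as a
placeholder element type:)

  #include <algorithm>
  #include <vector>

  std::vector<int> v;              // populate once at startup...
  std::sort(v.begin(), v.end());   // ...then keep it sorted
  // O(log n) membership test on the sorted range
  bool contained = std::binary_search(v.begin(), v.end(), 42);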

Cheers! --M
 

Jerry Coffin

[ ... ]
You could use any of the containers in the STL. Which one you choose
will depend on how you will use it (e.g., will there be a lot of
insertions and deletions, or just a single population at startup?). If
uniqueness is necessary, you probably want std::set. If not, you might
consider std::multiset or std::vector (which you'd have to keep sorted
yourself). You can't beat logarithmic complexity in searching (unless
you already know the index into the data array, but then it's not
really a "search").

Actually, you often can beat logarithmic. Hash tables have constant
expected complexity. You can also use an interpolating search, which
typically has substantially lower complexity than logarithmic as well.

A binary search ignores much of the information that's really available
-- it blindly assumes that the best guess it can make at the location is
the middle of the available range.

An interpolating search is much closer to the way most people would (for
example) look something up in a dictionary. If you're looking up 'cat',
you know you can start fairly close to the beginning. If you're looking
up 'victory', you know you can start fairly close to the end. Your first
attempt usually won't be the right page, but you can (again) usually
make a considerably better guess than simply the middle of the range.

Obviously, this _can_ backfire -- its worst-case complexity is quite
poor. You can, however, do something like an Introsort, and switch to a
plain binary search if you find that it's not working well for a
particular search.

Note that an interpolating search requires a random access iterator -- it
generally doesn't work in something like a tree structure.
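
(A minimal interpolation-search sketch over a sorted std::vector<int>;
it assumes roughly uniformly distributed keys, and its worst case is
O(n) -- hence the fallback idea above:)

  #include <cstddef>
  #include <vector>

  bool interp_search(const std::vector<int>& v, int key)
  {
      std::size_t lo = 0, hi = v.size();   // half-open range [lo, hi)
      while (lo < hi) {
          if (v[lo] == v[hi - 1])          // all keys equal: avoid /0
              return v[lo] == key;
          // Guess a position proportional to where key falls in the range
          double frac = double(key - v[lo]) / double(v[hi - 1] - v[lo]);
          if (frac < 0.0 || frac > 1.0)    // key outside [v[lo], v[hi-1]]
              return false;
          std::size_t mid = lo + std::size_t(frac * double(hi - 1 - lo));
          if (v[mid] == key) return true;
          if (v[mid] < key)  lo = mid + 1;
          else               hi = mid;
      }
      return false;
  }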
 

tolgaceylanus

Besides hashes, you can beat logarithmic complexity in
searches depending on the specifics of your data.
Radix sort, bucket sort, and counting sort are all linear.

Tolga Ceylan
 

Howard

Besides hashes, you can beat logarithmic complexity in
searches depending on the specifics of your data.
Radix sort, bucket sort, and counting sort are all linear.

Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
mean constant time (O(1)) when you say "linear"?

-Howard
 

mlimber

Jerry said:
[ ... ]
You could use any of the containers in the STL. Which one you choose
will depend on how you will use it (e.g., will there be a lot of
insertions and deletions, or just a single population at startup?). If
uniqueness is necessary, you probably want std::set. If not, you might
consider std::multiset or std::vector (which you'd have to keep sorted
yourself). You can't beat logarithmic complexity in searching (unless
you already know the index into the data array, but then it's not
really a "search").

Actually, you often can beat logarithmic. Hash tables have constant
expected complexity.

Ok, I should have said: "You can't beat logarithmic complexity with the
current standard library." (Hashes are part of TR1, however.) In any
case, while hashes can certainly be helpful, their O(n) worst case is
obviously worse than logarithmic performance. In addition, creating a
good hash function isn't a trivial task (a bad one can severely degrade
performance), and the computations required to evaluate the hash
function can be slow. Less theoretically, hashes can decrease locality
of reference, which may degrade performance on particular systems.
Suffice it to say, there are trade-offs involved in choosing data
structures and algorithms.
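
(A minimal sketch of the TR1 hash-set lookup; the header location and
namespace vary by implementation -- C++11 and later use <unordered_set>
and std::unordered_set instead:)

  #include <tr1/unordered_set>
  #include <string>

  std::tr1::unordered_set<std::string> myset;
  // average O(1) lookup, worst case O(n) if many keys land in one bucket
  bool contained = myset.find("element2Search") != myset.end();
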
You can also use an interpolating search, which
typically has substantially lower complexity than logarithmic as well.

A binary search ignores much of the information that's really available
-- it blindly assumes that the best guess it can make at the location is
the middle of the available range.

An interpolating search is much closer to the way most people would (for
example) look something up in a dictionary. If you're looking up 'cat',
you know you can start fairly close to the beginning. If you're looking
up 'victory', you know you can start fairly close to the end. Your first
attempt usually won't be the right page, but you can (again) usually
make a considerably better guess than simply the middle of the range.

Obviously, this _can_ backfire -- its worst-case complexity is quite
poor. You can, however, do something like an Introsort, and switch to a
plain binary search if you find that it's not working well for a
particular search.

Note that an interpolating search requires a random access iterator -- it
generally doesn't work in something like a tree structure.

You can beat logarithmic complexity on average but, as far as I know,
not in the worst case or with standard library functions.

Cheers! --M
 

tolgaceylanus

Howard said:
Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
mean constant time (O(1)) when you say "linear"?

Oops... "logarithmic" is the wrong word here. It should have been
N log2 N.

By linear, I mean O(N).

Thanks for the fix.

Tolga Ceylan
 

tolgaceylanus

Also, I meant "in sorting", not "in searching". My posting is not very
related to the original question. I guess in this case the O(1) of
hashes can beat the O(log N) complexity of the sets for the 'lookup'
operation.

(assuming sets are typically implemented with red-black trees.)

(Shouldn't have posted at all.. :-}} )

Tolga Ceylan
 

Mark P

Christian said:
Hi,

I'm looking for a data structure where I can store
arbitrary elements and then efficiently check if an
element (of the same type as the stored elements) is
contained in the list. The latter operation is performed
pretty frequently, so the data structure I require must
handle it with low complexity.

My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers the "find" function
has a logarithmic complexity. Are there any approaches
with linear complexity?

Why would you want linear complexity when log is clearly faster? Or did
you mean constant time complexity? The canonical data structure for
this sort of problem is a hash table. Unfortunately this is not (yet)
part of standard C++ but it is a commonly provided extension.
 

Marcus Kwok

Oops... "logarithmic" is the wrong word here. It should have been
N log2 N.

Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
does not care about constant factors.

N log2 N == N (log N / log 2) == (1/log 2) * N log N

1/log 2 is a constant so it can be dropped.
 

Jerry Coffin

[ ... ]
Ok, I should have said: "You can't beat logarithmic complexity with the
current standard library." (Hashes are part of TR1, however.) In any
case, while hashes can certainly be helpful, their worst-case guarantee
O(n) is obviously worse than logarithmic performance.

While the hash tables included in TR1 may have O(N) complexity in the
worst case, a hash table can be designed to provide logarithmic worst-
case complexity (for example, by resolving collisions with balanced
search trees instead of linked lists).
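
(A hypothetical minimal sketch of that idea: the buckets are balanced
trees (std::set), so even a pathological hash function degrades to
O(log n) lookups; no resizing, int keys only:)

  #include <cstddef>
  #include <set>
  #include <vector>

  class TreeBucketHash {
      std::vector<std::set<int> > buckets_;
  public:
      explicit TreeBucketHash(std::size_t n) : buckets_(n) {}
      void insert(int x) { buckets_[bucket(x)].insert(x); }
      bool contains(int x) const {
          const std::set<int>& b = buckets_[bucket(x)];
          return b.find(x) != b.end();
      }
  private:
      std::size_t bucket(int x) const {
          return std::size_t(x) % buckets_.size();
      }
  };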

[ ... ]
You can beat logarithmic complexity on average but, as far as I know,
not in the worst case or with standard library functions.

That sounds reasonable to me.
 

red floyd

Marcus said:
Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
does not care about constant factors.

N log2 N == N (log N / log 2) == (1/log 2) * N log N

1/log 2 is a constant so it can be dropped.

I suspect that O(N log2 N) actually means O(N log^2 N)
 

mlimber

Jerry said:
While the hash tables included in TR1 may have O(N) complexity in the
worst case, a hash table can be designed to provide logarithmic worst-
case complexity.

Right you are, though of course it comes with trade-offs of its own.

Cheers! --M
 

Marcus Kwok

red floyd said:
I suspect that O(N log2 N) actually means O(N log^2 N)

You could be right, I was just using the common convention that people
use to indicate the base. E.g., log10 is base-10 logarithm, so log2
would be base-2 logarithm.
 
