Efficient data structures

Discussion in 'C++' started by Christian Christmann, Aug 15, 2006.

  1. Hi,

    I'm looking for a data structure where I can store
    arbitrary elements and then efficiently check if an
    element (of the same type as the stored elements) is contained
    in the list. The latter operation is performed pretty
    frequently, so the data structure I require must handle
    it with low complexity.

    My idea was to use a STL set and then do something like

    myset.find( element2Search ) != myset.end()

    For sorted associative containers the "find" function
    has a logarithmic complexity. Are there any approaches
    with linear complexity?
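
    For reference, a minimal sketch of the std::set membership check
    described above (the int element type is purely illustrative):

    #include <set>

    std::set<int> myset;
    myset.insert(42);
    bool contained = myset.find(42) != myset.end(); // O(log N) lookup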

    Regards,
    Chris
     
    Christian Christmann, Aug 15, 2006
    #1

  2. Christian Christmann

    mlimber Guest

    [changed followup to c.l.c++]

    Christian Christmann wrote:
    > Hi,
    >
    > I'm looking for a data structure where I can store
    > arbitrary elements and then efficiently check if an
    > element (of the same type as the stored elements) is contained
    > in the list. The latter operation is performed pretty
    > frequently, so the data structure I require must handle
    > it with low complexity.
    >
    > My idea was to use a STL set and then do something like
    >
    > myset.find( element2Search ) != myset.end()
    >
    > For sorted associative containers the "find" function
    > has a logarithmic complexity. Are there any approaches
    > with linear complexity?


    A heterogeneous container calls for boost::any. You would have to write
    your own compare function, however, for ordering in the set (do you
    really want uniqueness and if so, in what sense -- per type or per
    value? if not, you might consider a sorted std::vector<boost::any> and
    then the standard search functions).
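
    A minimal sketch of a linear membership test over a
    std::vector<boost::any> when the element type is known at lookup time
    (the helper name contains_int is illustrative only; a sorted vector
    plus the standard binary-search functions would additionally need a
    user-written ordering for boost::any):

    #include <vector>
    #include <boost/any.hpp>

    // Returns true if the vector holds an int equal to 'value'.
    bool contains_int(const std::vector<boost::any>& v, int value)
    {
        for (std::vector<boost::any>::const_iterator it = v.begin();
             it != v.end(); ++it)
        {
            // the pointer form of any_cast yields 0 if the element is not an int
            const int* p = boost::any_cast<int>(&*it);
            if (p && *p == value)
                return true;
        }
        return false;
    }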

    Cheers! --M
     
    mlimber, Aug 15, 2006
    #2

  3. On Tue, 15 Aug 2006 07:24:13 -0700, mlimber wrote:

    >>
    >> My idea was to use a STL set and then do something like
    >>
    >> myset.find( element2Search ) != myset.end()
    >>
    >> For sorted associative containers the "find" function
    >> has a logarithmic complexity. Are there any approaches
    >> with linear complexity?

    >
    > A heterogeneous container calls for boost::any. You would have to write
    > your own compare function, however, for ordering in the set (do you
    > really want uniqueness and if so, in what sense -- per type or per
    > value? if not, you might consider a sorted std::vector<boost::any> and
    > then the standard search functions).


    Maybe I didn't specify my requirements precisely enough. What I need is
    a "template-based" data structure where all stored elements are of the
    same type. The order of the elements in the data structure is
    irrelevant.
     
    Christian Christmann, Aug 15, 2006
    #3
  4. Christian Christmann wrote:
    > On Tue, 15 Aug 2006 07:24:13 -0700, mlimber wrote:
    >
    >>>
    >>> My idea was to use a STL set and then do something like
    >>>
    >>> myset.find( element2Search ) != myset.end()
    >>>
    >>> For sorted associative containers the "find" function
    >>> has a logarithmic complexity. Are there any approaches
    >>> with linear complexity?

    >>
    >> A heterogeneous container calls for boost::any. You would have to
    >> write your own compare function, however, for ordering in the set
    >> (do you really want uniqueness and if so, in what sense -- per type
    >> or per value? if not, you might consider a sorted
    >> std::vector<boost::any> and then the standard search functions).

    >
    > Maybe I didn't specify my requirements precisely enough. What I need is
    > a "template-based" data structure where all stored elements are of the
    > same type. The order of the elements in the data structure is
    > irrelevant.


    If you don't use the member notation (.find), you can use 'std::find'
    with any container:

    #include <algorithm> // for std::find

    std::blah<mytype> mycontainer; // replace 'blah' with 'vector'
    // or 'list' or 'set' or 'deque' or ...
    std::find(mycontainer.begin(), mycontainer.end(), myvalue);

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Aug 15, 2006
    #4
  5. Christian Christmann

    mlimber Guest

    Christian Christmann wrote:
    > Maybe I didn't specify my requirements too precise. What I need is
    > a "template-based" data structure where all stored elements are of the
    > same type. The order of the elements in the data structure is
    > irrelevant.


    You could use any of the containers in the STL. Which one you choose
    will depend on how you will use it (e.g., will there be a lot of
    insertions or deletions, or just one populating on startup). If
    uniqueness is necessary, you probably want std::set. If not, you might
    consider std::multiset or std::vector (which you'd have to keep sorted
    yourself). You can't beat logarithmic complexity in searching (unless
    you already know the index into the data array, but then it's not
    really a "search").
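
    A minimal sketch of the sorted-std::vector alternative mentioned
    above: populate, sort once, then use std::binary_search for O(log N)
    membership tests (int elements are just for illustration):

    #include <algorithm>
    #include <vector>

    std::vector<int> v;
    v.push_back(7);
    v.push_back(42);
    v.push_back(3);
    std::sort(v.begin(), v.end()); // must stay sorted for binary_search
    bool found = std::binary_search(v.begin(), v.end(), 42);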

    Cheers! --M
     
    mlimber, Aug 15, 2006
    #5
  6. Christian Christmann

    Jerry Coffin Guest

    In article <>,
    says...

    [ ... ]

    > You could use any of the containers in the STL. Which one you choose
    > will depend on how you will use it (e.g., will there be a lot of
    > insertions or deletions, or just one populating on startup). If
    > uniqueness is necessary, you probably want std::set. If not, you might
    > consider std::multiset or std::vector (which you'd have to keep sorted
    > yourself). You can't beat logarithmic complexity in searching (unless
    > you already know the index into the data array, but then it's not
    > really a "search").


    Actually, you often can beat logarithmic. Hash tables have constant
    expected complexity. You can also use an interpolating search, which
    typically has substantially lower complexity than logarithmic as well.

    A binary search ignores much of the information that's really available
    -- it blindly assumes that the best guess it can make at the location is
    the middle of the available range.

    An interpolating search is much closer to the way most people would (for
    example) look something up in a dictionary. If you're looking up 'cat',
    you know you can start fairly close to the beginning. If you're looking
    up 'victory', you know you can start fairly close to the end. Your first
    attempt usually won't be the right page, but you can (again) usually
    make quite a bit better of a guess than simply the middle of the range.

    Obviously, this _can_ backfire -- its worst-case complexity is quite
    poor. You can, however, do something like an Introsort, and switch to a
    plain binary search if you find that it's not working well for a
    particular search.

    Note that an interpolating search requires a random access iterator -- it
    generally doesn't work in something like a tree structure.
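
    A minimal hand-rolled sketch of an interpolation search over a sorted
    std::vector<int> (not a standard library algorithm); it probes where
    the key is expected to lie within the current value range instead of
    always probing the middle:

    #include <vector>

    bool interpolation_search(const std::vector<int>& v, int key)
    {
        int low = 0;
        int high = static_cast<int>(v.size()) - 1;
        while (low <= high && key >= v[low] && key <= v[high])
        {
            if (v[high] == v[low])          // all remaining values equal
                return v[low] == key;
            // estimate the key's position between v[low] and v[high]
            int pos = low + static_cast<int>(
                (static_cast<double>(key) - v[low]) * (high - low)
                / (v[high] - v[low]));
            if (v[pos] == key)
                return true;
            if (v[pos] < key)
                low = pos + 1;
            else
                high = pos - 1;
        }
        return false;
    }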

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Aug 15, 2006
    #6
  7. Christian Christmann

    Guest

    Besides hashes, you can beat logarithmic complexity in
    searches depending on the specifics of your data.
    Radix sort, bucket sort, and counting sort are all linear.

    Tolga Ceylan
     
    , Aug 15, 2006
    #7
  8. Christian Christmann

    Howard Guest

    <> wrote in message
    news:...
    >
    > Besides hashes, you can beat logarithmic complexity in
    > searches depending on the specifics of your data.
    > Radix sort, bucket sort, and counting sort are all linear.
    >


    Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
    you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
    mean constant time (O(1)) when you say "linear"?

    -Howard
     
    Howard, Aug 15, 2006
    #8
  9. Christian Christmann

    mlimber Guest

    Jerry Coffin wrote:
    > In article <>,
    > says...
    >
    > [ ... ]
    >
    > > You could use any of the containers in the STL. Which one you choose
    > > will depend on how you will use it (e.g., will there be a lot of
    > > insertions or deletions, or just one populating on startup). If
    > > uniqueness is necessary, you probably want std::set. If not, you might
    > > consider std::multiset or std::vector (which you'd have to keep sorted
    > > yourself). You can't beat logarithmic complexity in searching (unless
    > > you already know the index into the data array, but then it's not
    > > really a "search").

    >
    > Actually, you often can beat logarithmic. Hash tables have constant
    > expected complexity.


    Ok, I should have said: "You can't beat logarithmic complexity with the
    current standard library." (Hashes are part of TR1, however.) In any
    case, while hashes can certainly be helpful, their worst-case guarantee
    O(n) is obviously worse than logarithmic performance. In addition,
    creating a good hash function isn't a trivial task (a bad one can
    severely degrade performance), and the computations required to evaluate
    the hash function can be slow. Less theoretically, hashes can decrease
    locality of reference, which may degrade performance on particular
    systems. Suffice it to say, there are trade-offs involved in choosing
    data structures and algorithms.

    > You can also use an interpolating search, which
    > typically has substantially lower complexity than logarithmic as well.
    >
    > A binary search ignores much of the information that's really available
    > -- it blindly assumes that the best guess it can make at the location is
    > the middle of the available range.
    >
    > An interpolating search is much closer to the way most people would (for
    > example) look something up in a dictionary. If you're looking up 'cat',
    > you know you can start fairly close to the beginning. If you're looking
    > up 'victory', you know you can start fairly close to the end. Your first
    > attempt usually won't be the right page, but you can (again) usually
    > make quite a bit better of a guess than simply the middle of the range.
    >
    > Obviously, this _can_ backfire -- its worst-case complexity is quite
    > poor. You can, however, do something like an Introsort, and switch to a
    > plain binary search if you find that it's not working well for a
    > particular search.
    >
    > Note that an interpolating search requires a random access iterator -- it
    > generally doesn't work in something like a tree structure.


    You can beat logarithmic complexity in average complexity but, as far
    as I know, not in worst-case complexity or with standard library
    functions.

    Cheers! --M
     
    mlimber, Aug 15, 2006
    #9
  10. Christian Christmann

    Guest

    Oops... "logarithmic" is the wrong word here. It should have been
    N log2 N

    By linear, I mean O(N).

    Howard wrote:
    >
    > Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
    > you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
    > mean constant time (O(1)) when you say "linear"?
    >
    > -Howard


    Thanks for the fix.

    Tolga Ceylan
     
    , Aug 15, 2006
    #10
  11. Christian Christmann

    Guest

    Also, I meant "in sorting", not "in searching". My posting is not very
    closely related to the original question. I guess in this case the O(1)
    of hashes can beat the O(log N) complexity of the sets for the 'lookup'
    operation.

    (assuming sets are typically implemented with red-black trees.)

    (Shouldn't have posted at all.. :-}} )

    Tolga Ceylan

    Howard wrote:
    >
    > Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
    > you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
    > mean constant time (O(1)) when you say "linear"?
    >
    > -Howard
     
    , Aug 15, 2006
    #11
  12. Christian Christmann

    Mark P Guest

    Christian Christmann wrote:
    > Hi,
    >
    > I'm looking for a data structure where I can store
    > arbitrary elements and then efficiently check if an
    > element (of the same type as the stored elements) is contained
    > in the list. The latter operation is performed pretty
    > frequently, so the data structure I require must handle
    > it with low complexity.
    >
    > My idea was to use a STL set and then do something like
    >
    > myset.find( element2Search ) != myset.end()
    >
    > For sorted associative containers the "find" function
    > has a logarithmic complexity. Are there any approaches
    > with linear complexity?
    >


    Why would you want linear complexity when log is clearly faster? Or did
    you mean constant time complexity? The canonical data structure for
    this sort of problem is a hash table. Unfortunately this is not (yet)
    part of standard C++ but it is a commonly provided extension.
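
    A minimal sketch assuming a TR1-style hash container is available
    (std::tr1::unordered_set here; the header's name and location vary by
    implementation, and some compilers instead ship the older hash_set
    extension):

    #include <tr1/unordered_set> // header location varies by compiler
    #include <string>

    std::tr1::unordered_set<std::string> words;
    words.insert("cat");
    words.insert("victory");
    bool found = words.find("cat") != words.end(); // expected O(1) lookup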
     
    Mark P, Aug 15, 2006
    #12
  13. Christian Christmann

    Marcus Kwok Guest

    wrote:
    > Howard wrote:
    >> Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
    >> you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
    >> mean constant time (O(1)) when you say "linear"?

    >
    > Oops... "logarithmic" is the wrong word here. It should have been
    > N log2 N


    Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
    does not care about constant factors.

    N log2 N == N (log N / log 2) == (1/log 2) * N log N

    1/log 2 is a constant so it can be dropped.

    --
    Marcus Kwok
    Replace 'invalid' with 'net' to reply
     
    Marcus Kwok, Aug 15, 2006
    #13
  14. Christian Christmann

    Jerry Coffin Guest

    In article <>,
    says...

    [ ... ]

    > Ok, I should have said: "You can't beat logarithmic complexity with the
    > current standard library." (Hashes are part of TR1, however.) In any
    > case, while hashes can certainly be helpful, their worst-case guarantee
    > O(n) is obviously worse than logarithmic performance.


    While the hash tables included in TR1 may have O(N) complexity in the
    worst case, a hash table can be designed to provide logarithmic
    worst-case complexity.

    [ ... ]

    > You can beat logarithmic complexity in average complexity but, as far
    > as I know, not in worst-case complexity or with standard library
    > functions.


    That sounds reasonable to me.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Aug 16, 2006
    #14
  15. Christian Christmann

    red floyd Guest

    Marcus Kwok wrote:
    >
    > Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
    > does not care about constant factors.
    >
    > N log2 N == N (log N / log 2) == (1/log 2) * N log N
    >
    > 1/log 2 is a constant so it can be dropped.


    I suspect that O(N log2 N) actually means O(N log^2 N)
     
    red floyd, Aug 16, 2006
    #15
  16. Christian Christmann

    mlimber Guest

    Jerry Coffin wrote:
    > While the hash tables included in TR1 may have O(N) complexity in the
    > worst case, a hash table can be designed to provide logarithmic worst-
    > case complexity.


    Right you are, though of course it comes as another trade-off.

    Cheers! --M
     
    mlimber, Aug 16, 2006
    #16
  17. Christian Christmann

    Marcus Kwok Guest

    red floyd <> wrote:
    > Marcus Kwok wrote:
    >> Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
    >> does not care about constant factors.
    >>
    >> N log2 N == N (log N / log 2) == (1/log 2) * N log N
    >>
    >> 1/log 2 is a constant so it can be dropped.

    >
    > I suspect that O(N log2 N) actually means O(N log^2 N)


    You could be right; I was just using the common convention that people
    use to indicate the base. E.g., log10 is base-10 logarithm, so log2
    would be base-2 logarithm.

    --
    Marcus Kwok
    Replace 'invalid' with 'net' to reply
     
    Marcus Kwok, Aug 16, 2006
    #17
