Binary tree search vs Binary search

Discussion in 'C Programming' started by Bogdan, Oct 18, 2010.

  1. Bogdan

    Bogdan Guest

    Hi

    I would like to check this result:
    I have tested the binary tree search API from Linux (search.h) by
    searching for a given word in a binary tree built from an unsorted
    dictionary. The second program uses the same dictionary, from which an
    array is initialized and then sorted with qsort(); the same word is
    then looked up with the binary search function bsearch().
    Running both programs shows that the binary search (the second
    program) is almost twice as fast as the binary tree search.
    I have also tried using the already-sorted dictionary directly (so
    qsort() should be at its highest complexity), but the second program
    is still the fastest.
    Is this the correct result? What advantage does binary tree search
    have over qsort() followed by binary search?

    thanks
    Bogdan
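
    A minimal sketch of the two approaches described above -- tsearch()/
    tfind() from the POSIX <search.h> API versus qsort() + bsearch() -- with
    a tiny hard-coded word list standing in for the dictionary; the data and
    names here are illustrative, not the OP's code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <search.h>

    /* Comparator for tsearch()/tfind(): the stored "keys" are the
     * char * pointers themselves. */
    static int cmp_tree(const void *a, const void *b)
    {
        return strcmp(a, b);
    }

    /* Comparator for qsort()/bsearch(): the array elements are char *,
     * so the comparator receives pointers to char *. */
    static int cmp_array(const void *a, const void *b)
    {
        return strcmp(*(const char * const *)a, *(const char * const *)b);
    }

    int main(void)
    {
        const char *words[] = { "pear", "apple", "orange", "banana", "kiwi" };
        size_t n = sizeof words / sizeof words[0];
        const char *target = "orange";

        /* Program 1: build a binary tree with tsearch(), look up with tfind(). */
        void *root = NULL;
        for (size_t i = 0; i < n; i++)
            tsearch(words[i], &root, cmp_tree);
        printf("tree search: %s\n",
               tfind(target, &root, cmp_tree) ? "found" : "not found");

        /* Program 2: sort the array with qsort(), look up with bsearch(). */
        qsort(words, n, sizeof words[0], cmp_array);
        printf("bsearch:     %s\n",
               bsearch(&target, words, n, sizeof words[0], cmp_array)
                   ? "found" : "not found");
        return 0;
    }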
     
    Bogdan, Oct 18, 2010
    #1

  2. Tom St Denis

    Guest

    On Oct 18, 1:14 pm, Bogdan <> wrote:
    > [full quote of the original question snipped]


    Binary search (e.g. ordered array) is O(log(n)) search but O(n)
    insert. Binary tree is O(log(n)) search *and* O(log(n)) insert.

    The overhead you're finding by a constant is acceptable (and falls
    below the noise in big-Oh notation). They should in theory take
    roughly the same time as n => oo, but for small values they'll skew.
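
    The O(n) insert for the sorted array is the easy part to picture in
    code: everything past the insertion point has to shift. A small sketch
    with a hypothetical sorted_insert() helper (not any standard API); the
    caller is assumed to have room for one more element:

    #include <stdio.h>
    #include <string.h>

    /* Insert 'value' at index 'pos' of a sorted array currently holding
     * n elements; shifting a[pos..n-1] up one slot costs O(n - pos). */
    static void sorted_insert(int *a, size_t n, size_t pos, int value)
    {
        memmove(&a[pos + 1], &a[pos], (n - pos) * sizeof a[0]);
        a[pos] = value;
    }

    int main(void)
    {
        int a[8] = { 10, 20, 30, 40, 50 };   /* 5 used, room to grow */
        sorted_insert(a, 5, 2, 25);          /* shifts 30, 40, 50 up */
        for (size_t i = 0; i < 6; i++)
            printf("%d ", a[i]);             /* 10 20 25 30 40 50    */
        putchar('\n');
        return 0;
    }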

    Tom
     
    Tom St Denis, Oct 18, 2010
    #2

  3. On Oct 19, 12:43 am, Tom St Denis <> wrote:
    > On Oct 18, 1:14 pm, Bogdan <> wrote:
    > >   Running both programs shows that the binary search (the second
    > > program) is almost twice as fast as the binary tree search.


    > Binary search (e.g. ordered array) is O(log(n)) search but O(n)
    > insert.  Binary tree is O(log(n)) search *and* O(log(n)) insert.


    Correct.

    > The overhead you're finding by a constant is acceptable


    How in heaven's name do you know that?? I was once called
    in to fix a poorly designed program that took 10 hours
    to process data from an 8-hour shift. Customer was running
    two shifts per day OK, but couldn't start a graveyard
    shift till the software was sped up! Perhaps I should have
    just told them "No problem! Some guy on Usenet says
    this is 'acceptable'." :) :)

    > (and falls below the noise in big-Oh notation).


    Meaningless gibberish.

    >  They should in theory take
    > roughly the same time as n => oo, but for small values they'll skew.


    Wrong again. They take respective approx. times of
    a_1 + b_1 log N and a_2 + b_2 log N
    What *is* correct is that you want to choose large N to
    measure (b_1 / b_2) accurately. To assume a priori this ratio
    will be unity shows much confusion.

    Hope this helps.
    James Dow Allen
     
    James Dow Allen, Oct 18, 2010
    #3
  4. Bogdan <> writes:

    > [full quote of the original question snipped]


    I won't repeat what's been said already, but I will add that, to get a
    better understanding of what's happening, you'll probably have to post
    the code. It may be that matters unrelated to the abstract algorithms,
    such as IO or memory allocation, are having a significant impact.
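
    A rough timing sketch along the lines suggested above: do all the setup
    (here, filling and qsort()ing an int array) before starting the clock,
    then time only the repeated lookups, so IO, sorting and allocation don't
    pollute the number being compared. The sizes and repeat count are
    arbitrary, and clock_gettime() is POSIX rather than standard C:

    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        enum { N = 100000, REPEATS = 1000000 };
        static int data[N];
        for (int i = 0; i < N; i++)
            data[i] = rand();

        qsort(data, N, sizeof data[0], cmp_int);   /* setup: not timed */

        struct timespec t0, t1;
        long hits = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < REPEATS; i++) {
            int key = data[i % N];
            if (bsearch(&key, data, N, sizeof data[0], cmp_int))
                hits++;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d lookups (%ld hits) took %.3f s\n", REPEATS, hits, secs);
        return 0;
    }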

    --
    Ben.
     
    Ben Bacarisse, Oct 18, 2010
    #4
  5. Guest

    On Oct 18, 12:14 pm, Bogdan <> wrote:
    > [full quote of the original question snipped]



    In addition to what others have said, the pointer computation for a
    binary search may be a bit quicker than the pointer chasing you get
    with a binary tree. The binary search may be a bit more cache friendly
    as well, although that's all implementation dependent and there are
    few absolutes.

    A significant difference is that a binary tree can be unbalanced,
    which may make certain leaf items considerably deeper in the tree than
    the best case lg(n) depth. A worst case is often generated by
    inserting items in the binary tree in ascending or descending order,
    which may reduce the binary tree to a sequential linked list. There
    are tree structures that remain balanced (FSVO "balance") during
    insertion and deletion, but they tend to be more complex than a simple
    binary tree.
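
    A toy, deliberately unbalanced BST showing that worst case: keys
    inserted in ascending order make the depth grow linearly (a linked list
    in all but name), while a shuffled insertion order stays within a small
    multiple of log2(n). This is not the <search.h> implementation (glibc's
    tsearch(), for instance, reportedly uses a red-black tree and would not
    degenerate like this):

    #include <stdio.h>
    #include <stdlib.h>

    struct node { int key; struct node *left, *right; };

    static struct node *insert(struct node *t, int key)
    {
        if (t == NULL) {
            struct node *n = malloc(sizeof *n);
            if (n == NULL)
                exit(EXIT_FAILURE);
            n->key = key;
            n->left = n->right = NULL;
            return n;
        }
        if (key < t->key)
            t->left = insert(t->left, key);
        else
            t->right = insert(t->right, key);    /* duplicates go right */
        return t;
    }

    static int depth(const struct node *t)
    {
        if (t == NULL)
            return 0;
        int l = depth(t->left), r = depth(t->right);
        return 1 + (l > r ? l : r);
    }

    int main(void)
    {
        enum { N = 1000 };
        struct node *sorted = NULL, *shuffled = NULL;

        for (int i = 0; i < N; i++)
            sorted = insert(sorted, i);               /* ascending keys */
        for (int i = 0; i < N; i++)
            shuffled = insert(shuffled, rand() % N);  /* roughly random */

        printf("depth after sorted insertions:   %d\n", depth(sorted));
        printf("depth after shuffled insertions: %d\n", depth(shuffled));
        /* typically N for the first, a few dozen for the second */
        return 0;
    }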
     
    , Oct 18, 2010
    #5
  6. Gene

    Guest

    On Oct 18, 1:14 pm, Bogdan <> wrote:
    > [full quote of the original question snipped]


    As has been said, pre-sorting with qsort() is an average-case O(n log n)
    operation, and then looking up O(n) entries with binary search will
    again require O(n log n) time. Inserting O(n) items into e.g. a
    red-black or similar balanced tree requires O(n log n) time, and looking
    up O(n) items is again O(n log n). So the two solutions have comparable
    asymptotic performance as long as the dictionary isn't modified once
    sorted. If modifications can occur, the tree works better because
    each update requires only O(log n) time; the array requires O(n) time
    per update, as has been explained.

    Or of course if the tree implementation is not self-balancing, then
    search length is a random phenomenon.

    You said you're looking up "a given word." If you are looking up only
    a single word, then chance difference between path length in the
    balanced tree and number of probes in the binary search table alone
    could account for a factor of 2.

    Otherwise, it's that constant factors related to the implementations
    are different. Inserting a string into a tree requires allocating a
    node in addition to the string. There is space overhead for pointers
    in the tree that isn't there in the sorted list. Extra space and the
    way it falls in memory can affect cache and paging performance as has
    been mentioned. Rebalancing the tree is overhead that isn't present
    in the array, etc. A factor of 2 or 3 difference is not hard to
    believe.
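
    To put rough numbers on that space overhead, a quick sketch for a
    typical 64-bit (LP64) platform; the three-pointer node layout is an
    assumption, not necessarily what a given tsearch() implementation
    allocates internally, and per-node allocator bookkeeping is not counted:

    #include <stdio.h>

    struct tree_node { void *key; struct tree_node *left, *right; };

    int main(void)
    {
        printf("sorted-array slot: %zu bytes per dictionary word (one char *)\n",
               sizeof(char *));                   /* typically 8  */
        printf("tree node:         %zu bytes per dictionary word (key + 2 links)\n",
               sizeof(struct tree_node));         /* typically 24 */
        return 0;
    }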

    FWIW, I just happened to read this
    http://ieeexplore.ieee.org/Xplore/l...530308.pdf?arnumber=5530308&authDecision=-203
    , a nicely done article about how merely realigning critical code
    segments with respect to page / cache line boundaries (by e.g.
    changing the size of the OS shell environment when starting the
    program) can in some cases affect run time by 20%.
     
    Gene, Oct 19, 2010
    #6
  7. Eric Sosman

    Guest

    On 10/18/2010 9:44 PM, Gene wrote:
    > On Oct 18, 1:14 pm, Bogdan<> wrote:
    >> [full quote of the original question snipped]

    >
    > [...] So the two solutions have comparable asymptotic performance as
    > long as the dictionary isn't modified once sorted.


    If by "comparable" you mean "essentially the same except for
    negligible fiddly bits," you are repeating Tom St Denis' error: The
    fact that two processes have the same order-of-magnitude asymptotic
    performance does not mean they have "comparable" performance. For
    example, the time (or memory or comparisons or whatever) used by two
    processes A and B, both of O(1), can differ by an arbitrarily large
    amount.

    Brutally simplistic example: Take an O(N log N) sorting method S
    that uses 100 nsec for each comparison and negligible time for other
    operations. Add a 2.7-month delay loop to each comparison to arrive
    at a new sorting method S2. Assertion: Both S and S2 have O(N log N)
    performance, even though S can sort 1000 items in a millisecond while
    S2 takes 2243 years. A millisecond is "comparable" to two-plus
    millennia only for people who have way too much time on their hands.

    --
    Eric Sosman
     
    Eric Sosman, Oct 19, 2010
    #7
  8. Tom St Denis

    Guest

    On Oct 18, 3:04 pm, James Dow Allen <> wrote:
    > > The overhead you're finding by a constant is acceptable

    >
    > How in heaven's name do you know that??  I was once called
    > in to fix a poorly designed program that took 10 hours
    > to process data from an 8-hour shift.  Customer was running
    > two shifts per day OK, but couldn't start a graveyard
    > shift till the software was sped up!  Perhaps I should have
    > just told them "No problem!  Some guy on Usenet says
    > this is 'acceptable'."    :)   :)


    Well I didn't say it was optimal, I just said it's not a big deal. If
    he's running into a timing hazard then it is a problem. See how that
    works [read on...]

    > > (and falls below the noise in big-Oh notation).

    >
    > Meaningless gibberish.


    No, it's not. See the operation is log(n) time which is greater than
    a constant multiple c. O(c*log(n)) is equiv to O(log(n)) therefore a
    small constant factor difference is "below the noise" or irrelevant.

    > >  They should in theory take
    > > roughly the same time as n => oo, but for small values they'll skew.

    >
    > Wrong again.  They take respective approx. times of
    >      a_1 + b_1 log N   and   a_2 + b_2 log N
    > What *is* correct is that you want to choose large N to
    > measure (b_1 / b_2) accurately.  To assume a priori this ratio
    > will be unity shows much confusion.


    You really don't understand big-Oh notation do you? ...

    Tom
     
    Tom St Denis, Oct 19, 2010
    #8
  9. Tom St Denis <> writes:

    > On Oct 18, 3:04 pm, James Dow Allen <> wrote:
    >> Tom St Denis <> writes: [attribution restored]


    <snip talking about two O(log n) algorithms>

    >> >  They should in theory take
    >> > roughly the same time as n => oo, but for small values they'll skew.

    >>
    >> Wrong again.  They take respective approx. times of
    >>      a_1 + b_1 log N   and   a_2 + b_2 log N
    >> What *is* correct is that you want to choose large N to
    >> measure (b_1 / b_2) accurately.  To assume a priori this ratio
    >> will be unity shows much confusion.

    >
    > You really don't understand big-Oh notation do you? ...


    Which bit makes you think that?

    Maybe you should define what you mean by "they should in theory take
    roughly the same time as n => oo"?

    James Dow Allen is making the reasonable assumption that this means

    lim [n->oo] T1(n)/T2(n) = 1

    where T1 and T2 are functions that describe the two algorithms' running
    times. How do you define "taking roughly the same time" so that it
    covers the example given by James?

    --
    Ben.
     
    Ben Bacarisse, Oct 19, 2010
    #9
  10. On Oct 19, 12:43 am, Tom St Denis <> wrote:
    > [OP reported on a speed comparison]:
    >> [bsearch presents twice the speed of bin tree search]

    > The overhead you're finding by a constant is acceptable (and falls
    > below the noise in big-Oh notation). They should in theory take
    > roughly the same time as n => oo, but for small values they'll skew.


    And now, he tries to defend this foolishness! :)

    First note that OP presents a speed-doubling and Tom implies
    that it will go away for large n. :) :)
    Tom: Do you still believe that, or did Googling "big-Oh" help?

    Tom St Denis <> wrote, in
    news::

    > On Oct 18, 3:04 pm, James Dow Allen <> wrote:
    >> > The overhead you're finding by a constant is acceptable


    Tom, did your newsreader remove the attribution on the above
    line? It's by Tom St Denis. If you now understand it was a foolish
    remark, good for you. But don't remove the attribution, please.

    >>
    >> How in heaven's name do you know that?? ... Perhaps I should have
    >> just told them "No problem!  Some guy on Usenet says
    >> this is 'acceptable'."    :)   :)

    >
    > Well I didn't say it was optimal, I just said it's not a big deal. If
    > he's running into a timing hazard then it is a problem.


    No. It's true that you don't say it's "optimal". But it's
    false that you said "it's no big deal." What you wrote is
    that it's "acceptable" and my question remains:
    How do you know that???

    If you meant it "might be acceptable" or "is probably fast enough"
    you should have written that. I'm not a stickler for "dotting every i",
    but in this case the way you phrased your answer was, to be blunt,
    inane.

    It's not unusual for such a search to dominate speed performance.
    Speedup by a factor of two is, as a matter of fact, not unlikely to be
    important! True, it's "often (or even usually) unimportant" but that's
    not what you wrote, is it? BTW, since you're wrong on *every single
    point* in your message here, I doubt you'll respond again, but if you
    do, kindly have the courtesy to quote and correctly attribute your own
    incorrect remarks which are under review.

    >> > (and falls below the noise in big-Oh notation).

    >>
    >> Meaningless gibberish.

    >
    > No, it's not.


    Yes it is. There is no "noise" in big-Oh notation for
    it to "fall below." If this isn't clear, please present
    an example which falls *above* your alleged "noise" in
    big-Oh notation. :)

    > O(c*log(n)) is equiv to O(log(n))


    So you *did* know this much after all!
    (Or did you Google for it just now?)


    >> >  They should in theory take
    >> > roughly the same time as n => oo, but for small values they'll
    >> > skew.

    >>
    >> Wrong again.  They take respective approx. times of
    >>      a_1 + b_1 log N   and   a_2 + b_2 log N
    >> What *is* correct is that you want to choose large N to
    >> measure (b_1 / b_2) accurately.  To assume a priori this ratio
    >> will be unity shows much confusion.

    >
    > You really don't understand big-Oh notation do you? ...


    :) :). We don't mind when ignorant newbies post here.
    Even when they act like insolent jerks, they're amusing.

    James Dow Allen
     
    James Dow Allen, Oct 19, 2010
    #10
  11. James Dow Allen <> might have writ, in
    news:Xns9E16D98DFA353jamesdowallen@78.46.73.112:
    >> You really don't understand big-Oh notation do you? ...

    >
    >:) :). We don't mind when ignorant newbies post here.
    > Even when they act like insolent jerks, they're amusing.


    Since I tend to be *more* patient than others
    with newbies in Usenet, readers may wonder why I went out of
    my way to criticise this pompous imbecile. It's in part because
    of another pompous and imbecilic post he made recently:

    On Sep 7, 3:31 am, Tom St Denis <> wrote:
    > > In article
    > > <>,
    > > James Dow Allen <> wrote:
    > > >What I AM asking is: Why aren't the rest of you doing this?
    > > > :)

    >
    > usually when a newb comes into a group [any group] and says things
    > like "why not we do everything while standing on our heads?" it's
    > because they don't know what they're talking about.
    >
    > This doesn't mean we don't listen to the newbs, it just means we don't
    > follow them. It's not our fault you can't tell the difference.
    >
    > As to the specifics here, the solution to the OPs problem is called a
    > "processor cache" and "stack opcode optimizations" [something modern
    > high performance processors have] making the need to inline your
    > compare function rather moot.


    What was especially amusing here is that his final paragraph, perhaps
    intended to be helpful, was a complete non sequitur regardless of
    how the OP was misconstrued. With a little more Googling, Denis may
    soon be up to speed on the simpler facts about big-Oh notation,
    but I think he needs to take up Remedial Reading for Comprehension.

    Hope this helps.
    James Dow Allen
     
    James Dow Allen, Oct 19, 2010
    #11
  12. Tom St Denis

    Guest

    On Oct 19, 10:23 am, James Dow Allen <>
    wrote:

    <snip>

    This is my last reply to you since you're intent on being openly
    hostile.

    > First note that OP presents a speed-doubling and Tom implies
    > that it will go away for large n.  :)  :)
    > Tom:  Do you still believe that, or did Googling "big-Oh" help?


    Do you know what asymptotic convergence is? Maybe a 2x speedup when
    n=10 is present but disappears when n=2^40?

    > Tom, did your newsreader remove the attribution on the above
    > line?  It's by Tom St Denis.  If you now understand it was a foolish
    > remark, good for you.  But don't remove the attribution, please.


    I removed the excess quotation to trim the reply down. People can see
    the attribution in the headers.

    > No.  It's true that you don't say it's "optimal".  But it's
    > false that you said "it's no big deal."  What you wrote is
    > that it's "acceptable" and my question remains:
    > How do you know that???


    It does depend on the requirements whether it's "acceptable" or not.
    But premature optimization is the root of all stupid. If the existing
    code uses a tree, porting it all over to a linear array might not be
    worth the 2x improvement if the current performance isn't a problem.

    You don't know that. I don't know that.

    > It's not unusual for such a search to dominate speed performance.
    > Speedup by a factor of two is, as a matter of fact, not unlikely to be
    > important!  True, it's "often (or even usually) unimportant" but that's
    > not what you wrote, is it? BTW, since you're wrong on *every single
    > point* in your message here, I doubt you'll respond again, but if you
    > do, kindly have the courtesy to quote and correctly attribute your own
    > incorrect remarks which are under review.


    You're being pedantic to basically have something to argue over. I
    maintain I don't care one way or the other.

    > Yes it is.  There is no "noise" in big-Oh notation for
    > it to "fall below."  If this isn't clear, please present
    > an example which falls *above* your alleged "noise" in
    > big-Oh notation.  :)


    Um ... you clearly don't understand big-Oh notation. I'm sorry but if
    you write something like this you just don't get what big-Oh
    accomplishes.

    > >  O(c*log(n)) is equiv to O(log(n))

    >
    > So you *did* know this much after all!
    > (Or did you Google for it just now?)


    I didn't google it. I went to college.

    > > You really don't understand big-Oh notation do you? ...

    >
    > :)  :).   We don't mind when ignorant newbies post here.
    > Even when they act like insolent jerks, they're amusing.


    How am I acting like an insolent jerk? Just because you disagree with
    what I wrote doesn't mean I'm a jerk.

    Anyways, please don't reply to this post, I won't reply to you in
    kind. You're clearly trying to pick a fight, and truth be told I
    don't really care about the outcome of this thread.

    Tom
     
    Tom St Denis, Oct 19, 2010
    #12
  13. On Oct 19, 9:58 pm, Tom St Denis <> wrote:
    > Do you know what asymptotic convergence is?  Maybe a 2x speedup when
    > n=10 is present but disappears when n=2^40?


    It's comments like this that make us wonder whether you
    indeed understand big-Oh or not. You claim to have Googled
    to learn O(c log N) = O(log N), but the above comment
    implies you think c = 1 ??? I'll admit to having been
    somewhat ironic in my replies -- I suspect you've understood
    the rudiments of big-Oh all along -- but why this strange insistence
    that c should be 1 ?

    > > Tom, did your newsreader remove the attribution on the above
    > > line?  It's by Tom St Denis.  If you now understand it was a foolish
    > > remark, good for you.  But don't remove the attribution, please.

    >
    > I removed the excess quotation to trim the reply down.  People can see
    > the attribution in the headers.


    False. The References list does not show clearly who wrote
    what. You unnecessarily snipped a single line; the most logical
    reason for that was to obfuscate the attribution of your own
    incorrect comment.

    > > There is no "noise" in big-Oh notation for
    > > it to "fall below."  If this isn't clear, please present
    > > an example which falls *above* your alleged "noise" in
    > > big-Oh notation.  :)

    >
    > Um ... you clearly don't understand big-Oh notation.  I'm sorry but if
    > you write something like this you just don't get what big-Oh
    > accomplishes.


    Despite my sarcasm, I actually suspect you are more-or-less
    familiar with big-Oh. I'm mildly curious whether you're just being
    nasty, or honestly think I'm not familiar with it.

    And, if you insist that your nonsense about "noise" was
    meaningful after all, why didn't you do what you were told and
    "present an example which falls *above* your alleged noise."
    :)

    > Anyways, please don't reply to this post,


    *You* pick a fight, post more insulting and useless drivel
    and then ask *me* not to respond!! :) :)

    I don't give a shit if you respond to this one or not.

    James
     
    James Dow Allen, Oct 19, 2010
    #13
  14. Sebastian

    Guest

    On 18 Okt., 19:43, Tom St Denis wrote:
    >
    > Binary search (e.g. ordered array) is O(log(n)) search but O(n)
    > insert.  Binary tree is O(log(n)) search *and* O(log(n)) insert.
    >
    > The overhead you're finding by a constant is acceptable (and falls
    > below the noise in big-Oh notation).  They should in theory take
    > roughly the same time as n => oo, but for small values they'll skew.


    Tom, I know you are a smart guy. But this time you are wrong.

    Here's a simple counter example:

    Algorithm A requires log2(n)+8 "costly operations".
    Algorithm B requires 2*log2(n) "costly operations".
    Both are O(log n).
    For small n B is faster than A.
    But if you let n approach infinity, A will be faster than B.
      The ratio between the two will approach 2, not 1.
    So, they won't require "roughly the same time".

    Cheers!
    SG
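
    A quick numeric check of this counterexample (link with -lm): the ratio
    (2*log2 n) / (log2 n + 8) does indeed head toward 2, not 1, as n grows:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        for (double n = 1e2; n <= 1e12; n *= 100.0) {
            double lg = log2(n);
            printf("n = %.0e   2*log2(n) / (log2(n) + 8) = %.3f\n",
                   n, (2.0 * lg) / (lg + 8.0));
        }
        return 0;
    }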
     
    Sebastian, Oct 20, 2010
    #14
  15. Tom St Denis

    Guest

    On Oct 20, 6:02 am, Sebastian <> wrote:
    > Tom, I know you are a smart guy.  But this time you are wrong.
    >
    > Here's a simple counter example:
    >
    >   Algorithm A requires log2(n)+8 "costly operations".
    >   Algorithm B requires 2*log2(n) "costly operations".
    >   Both are O(log n).
    >   For small n B is faster than A.
    >   But if you let n approach infinity, A will be faster than B.
    >   The ratio between the two will approach 2, not 1.
    >   So, they won't require "roughly the same time".


    Except that for all we [likely] know the 2:1 ratio appears only at small
    sample set sizes. But that's a comment about cache performance, not
    big-Oh. For example, when the sample set size gets so big it doesn't fit
    in memory, disk access will essentially destroy any advantage and
    they'll both be, for all intents and purposes, the same speed.

    In the big-Oh there is no such thing as O(2log n) as a "final
    answer." So in both cases I stand by my guess [which is all we're
    doing here] that the 2x difference is probably nothing to write home
    about.

    I suspect a binary search to be faster overall [with the advantage
    going down as size goes up], but the more important thing I said in my
    post was that a) it's in the noise and b) Insert is slower than with a
    tree. Granted I should have said "it's *probably* in the noise" but
    the general essence of the thought remains mostly valid.

    All we know from the OP is he searched a dictionary and one was
    "almost" twice as fast as the other. Does the table have 50
    elements? 50,000 elements? 50M elements? What sort of architecture
    is it? etc...

    Now why this turned into a flamewar between another poster and me, I
    don't know. I never said anything derogatory to them. Alas, this is
    USENET ...

    Tom
     
    Tom St Denis, Oct 20, 2010
    #15
  16. Eric Sosman

    Guest

    On 10/20/2010 6:48 AM, Tom St Denis wrote:
    > On Oct 20, 6:02 am, Sebastian<> wrote:
    >> [Sebastian's counterexample snipped]

    >
    > Except that for all we [likely] know the 2:1 ratio appears only at small
    > sample set sizes. But that's a comment about cache performance, not
    > big-Oh. For example, when the sample set size gets so big it doesn't fit
    > in memory, disk access will essentially destroy any advantage and
    > they'll both be, for all intents and purposes, the same speed.


    All you're saying is that real computers are finite, meaning
    that we'll always have N <= Nmax. That, in turn, implies that all
    real implementations of all algorithms, even bogosort, have O(1)
    performance because f(N) <= f(Nmax) == constant == O(1).

    So, yes: You can, sort of, defend your misunderstanding of O()
    notation, at the cost of making O() notation useless.

    --
    Eric Sosman
     
    Eric Sosman, Oct 20, 2010
    #16
  17. Tom St Denis

    Guest

    On Oct 20, 7:32 am, Eric Sosman <> wrote:
    > On 10/20/2010 6:48 AM, Tom St Denis wrote:
    > [re-quoted counterexample and earlier reply snipped]

    >
    >      All you're saying is that real computers are finite, meaning
    > that we'll always have N <= Nmax.  That, in turn, implies that all
    > real implementations of all algorithms, even bogosort, have O(1)
    > performance because f(N) <= f(Nmax) == constant == O(1).
    >
    >      So, yes: You can, sort of, defend your misunderstanding of O()
    > notation, at the cost of making O() notation useless.


    It's not useless though...

    Compare Comba vs. Karatsuba multiplication...

    Comba is basically n^2 * (work of MULADD) + n * (work of carry
    prop) ... or O(n^2)

    Where Karatsuba is

    3x * comba of n/2 size + various 2n-length additions [with carry
    prop], etc.. or O(n^log_2(3))

    So even though Comba is FASTER at small values of n we still say
    "Karatsuba is more efficient." [even though at small sizes we'd use
    Comba instead]. Now if there was another technique that was
    2*Karatsuba work you'd still write it as O(n^log_2(3)) even though
    you'd know that at most reasonable sizes it'd be slower. I didn't
    invent big-Oh notation, so stop harping on me. Nor am I saying "all
    algorithms of equal big-Oh work are equivalent."

    But we don't know for a fact that the 2:1 ratio carries through to
    larger [possibly practical] sizes. All I was saying is a small constant
    factor isn't an eye opener. If the OP had written that it was 10x
    faster then yes, clearly something is afoot, but 2x faster on, say, a
    500 word dictionary doesn't exactly scream FUNDAMENTAL FLAW to me. It
    could just be that there are more cache hits with a dataset that
    small. Which would mean if your application solely deals with
    searches [and no inserts] of 500 word datasets that a binary search is
    ideal. Nobody is arguing that [certainly I'm not]. But as to the
    concept of broad generalizations I'd have a hard time extrapolating
    whether a 2x speed difference FROM ===ONE=== SAMPLE SET is significant
    or not since it IS below the asymptotic work level threshold.

    IOW, stop putting words into my mouth and stop assuming that you know
    more about the OP's problem than anyone else does (since the OP hasn't
    explained their problem in enough detail).

    Tom
     
    Tom St Denis, Oct 20, 2010
    #17
  18. Sebastian

    Guest

    On 20 Okt., 13:46, Tom St Denis wrote:
    > On Oct 20, 7:32 am, Eric Sosman wrote:
    > > [re-quoted discussion snipped]
    >
    > So even though Comba is FASTER at small values of n we still say
    > "Karatsuba is more efficient."


    ... in terms of the asymptotic runtime behaviour, yes.

    > Now if there was another technique that was
    > 2*Karatsuba work you'd still write it as O(n^log_2(3)) even though
    > you'd know that at most reasonable sizes it'd be slower.  I didn't
    > invent big-Oh notation, so stop harping on me.  Nor am I saying "all
    > algorithms of equal big-Oh work are equivalent."


    No, you just suggested that algorithms of the same time complexity
    class "should in theory take roughly the same time" as n approaches
    infinity. For some definition of "take roughly the same time" this
    might be the case. But IMHO the most sensible interpretation of this
    claim was -- as Ben pointed out -- that the runtime ratio between both
    algorithms will approach 1 as n approaches infinity. As pointed out,
    this is not necessarily the case. If you meant to say that the
    algorithms have the same asymptotic runtime behaviour, you should have
    said just that.

    > But we don't know for a fact that the 2:1 ratio carries through to
    > larger [possibly practical] sizes.


    Right. But we also don't know that it doesn't. ;-)

    Cheers!
    SG
     
    Sebastian, Oct 20, 2010
    #18
  19. Tom St Denis

    Guest

    On Oct 20, 10:27 am, Sebastian <> wrote:
    > On 20 Okt., 13:46, Tom St Denis wrote:
    > [re-quoted material snipped]
    >
    > No, you just suggested that algorithms of the same time complexity
    > class "should in theory take roughly the same time" as n approaches
    > infinity.  For some definition of "take roughly the same time" this
    > might be the case.  But IMHO the most sensible interpretation of this
    > claim was -- as Ben pointed out -- that the runtime ratio between both
    > algorithms will approach 1 as n approaches infinity.  As pointed out,
    > this is not necessarily the case.  If you meant to say that the
    > algorithms have the same asymptotic runtime behaviour, you should have
    > said just that.


    I'm saying it's lost in the noise because for a single sample set with
    a difference BELOW the asymptotic rate it's lost.

    It's like saying

    strlen("hello") and strlen_mmx_super_optimized("hello") each take 15
    and 20 cycles. Therefore, the latter is always slower than the
    former. In reality, the setup time for short strings might eat into
    the performance gains. So we can say "it's lost in the noise."

    You don't know that array search is ALWAYS 2X faster than a tree
    search. You know that for a SINGLE test case it was 2x faster. In
    single test case I can show that qsort is no better than bubble sort.
    What does that prove?

    > > But we don't know for a fact that the 2:1 ratio carries through to
    > > larger [possibly practical] sizes.

    >
    > Right. But we also don't know that it doesn't. ;-)


    Hence my comment, thrice now, that I should have said "probably lost in
    the noise." But since you diehard pedants won't let it be, just stop
    replying. I acknowledged the mistake that you're pointing out. There
    is no further correction needed.

    Tom
     
    Tom St Denis, Oct 20, 2010
    #19
  20. Eric Sosman

    Guest

    On 10/20/2010 7:46 AM, Tom St Denis wrote:
    > [...]
    > IOW, stop putting words into my mouth and stop assuming that you know
    > more about the OP's problem than anyone else does (since the OP hasn't
    > explained their problem in enough detail).


    Nobody knows enough about the O.P.'s problem to say anything
    useful about it; I don't think anyone disputes that. We won't know
    enough until (and unless) he reveals a whole lot more than he has
    thus far.

    As for the words I have put into your mouth, they are

    > The overhead you're finding by a constant is acceptable (and falls
    > below the noise in big-Oh notation). They should in theory take
    > roughly the same time as n => oo, but for small values they'll skew.


    ... and if they are not your words, you are being impersonated by
    someone who does not understand big-Oh.

    --
    Eric Sosman
     
    Eric Sosman, Oct 21, 2010
    #20