Math/CompSci Interview Question - Thoughts?

Discussion in 'C++' started by Andrew Tomazos, Nov 21, 2009.

  1. I was posed the following question in a technical interview for a
    Software Engineering position by a major multinational NASDAQ company:

    [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    One int value x occurs 500,001 times or more in the array. Specify an
    algorithm to determine x."

    The assumption being extra points for efficiency.

    I initially came up with a linear space, linear time solution. With
    some prompting and hints from the interviewer we worked our way to a
    smaller constant space and linear time algorithm. That being
    basically:

    int findmode(int* p, int n)
    {
        int count[32];
        for (int i = 0; i < 32; i++)
            count[i] = 0;

        for (int i = 0; i < n; i++)
            for (int j = 0; j < 32; j++)
                if (p[i] & (1 << j)) // if bit j is on
                    count[j]++;
                else
                    count[j]--;

        int x = 0;
        for (int i = 0; i < 32; i++)
            if (count[i] > 0)   // bit i is set in more than half the elements
                x = x | (1 << i);

        return x;
    }

    The idea here is to count the frequency of each of the 32 bits in the
    array separately, knowing that these bit counts will be dominated by
    the mode.

    The interviewer already knew the answer, so I can only assume the test
    was based on how much help he had to give me to arrive at it.

    My question about this is as follows. I, perhaps boldly, claim to
    employers to have a "masters-equivilant" experience/knowledge of
    computer science and math. Should I have been able to come up with
    this solution without prompting or help?

    Would you expect someone with a CompSci Masters or PhD from some major
    ivy league university to be able to come up with this solution without
    help?

    If so in what course or from which book would you expect to learn the
    required knowledge to come up with the above solution?

    Is the skill to be able to come up with such an algorithm something
    that is learned by studying lots of algorithms and being able to mix
    and match the "tricks"? If so, what is a similar commonly known
    algorithm(s) on which the above solution could have been based?

    Or, is the ability to invent such a solution simply a matter of IQ?
    Some people have the talent to solve the puzzle, see the pattern and
    come up with the solution - and some just don't?

    Thanks,
    Andrew.
     
    Andrew Tomazos, Nov 21, 2009
    #1

  2. Andrew Tomazos <> writes:

    > I was posed the following question in a technical interview for a
    > Software Engineering position by a major multinational NASDAQ company:
    >
    > [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    > One int value x occurs 500,001 times or more in the array. Specify an
    > algorithm to determine x."
    >
    > The assumption being extra points for efficiency.
    >
    > I initially came up with a linear space, linear time solution. With
    > some prompting and hints from the interviewer we worked our way to a
    > smaller constant space and linear time algorithm. That being
    > basically:
    >
    > int findmode(int* p, int n)
    > {
    > int count[32];
    > for(int i = 0; i < 32; i++)
    > count[i] = 0;
    >
    > for (int i = 0; i < n; i++)
    > for (int j = 0; j < 32; j++)
    > if (p[i] & (1 << j)) // if bit j is on
    > count[j]++;
    > else
    > count[j]--;
    >
    > int x = 0;
    > for (int i = 0; i < 32; i++)
    > if (count[i] > 0)
    > x = x | (1 << i);
    >
    > return x;
    > }
    >
    > The idea here is to count the frequency of each of the 32 bits in the
    > array separately, knowing that these bit counts will be dominated by
    > the mode.
    >
    > The interviewer already knew the answer, so I can only assume the test
    > was based on how much help he had to give me to arrive at it.
    >
    > My question about this is as follows. I, perhaps boldly, claim to
    > employers to have a "masters-equivalent" experience/knowledge of
    > computer science and math. Should I have been able to come up with
    > this solution without prompting or help?


    If what you're asking is whether anybody with a master's in CS and
    maths would have been able to come up with this solution within the
    interview time, I think we can answer that with a definite no;
    otherwise there would be no point in using this test to select candidates.

    Obviously, it's precisely because some people get it and some don't
    (with or without a master's diploma; that really isn't relevant) that
    this test is useful to the recruiter.



    Now, if you want this kind of job, then yes, you had better be able to
    come up with smart solutions to little puzzles like this in
    interviews.




    > Would you expect someone with a CompSci Masters or PhD from some major
    > ivy league university to be able to come up with this solution without
    > help?
    >
    > If so in what course or from which book would you expect to learn the
    > required knowledge to come up with the above solution?


    Not from any single one. You have to develop your knowledge of
    algorithms and mathematics, your symbolic thinking, and your
    imagination in these matters.


    > Is the skill to be able to come up with such an algorithm something
    > that is learned by studying lots of algorithms and being able to mix
    > and match the "tricks"?


    That could help, yes. I'd tend to think that imagination is the most
    important ingredient here: being able to come up with a possible
    solution fast enough.

    Also, don't limit yourself to CS and maths, there are ideas to be
    taken from other domains too.


    > If so, what is a similar commonly known
    > algorithm(s) on which the above solution could have been based?
    >
    > Or, is the ability to invent such a solution simply a matter of IQ?


    CS IQ, yes, if such an IQ test exists.


    > Some people have the talent to solve the puzzle, see the pattern and
    > come up with the solution - and some just don't?


    It seems so. At least, within a given time.

    --
    __Pascal Bourguignon__
     
    Pascal J. Bourguignon, Nov 21, 2009
    #2

  3. Andrew Tomazos

    GJ Woeginger Guest

    In comp.theory Andrew Tomazos <> wrote:
    # I was posed the following question in a technical interview for a
    # Software Engineering position by a major multinational NASDAQ company:
    #
    # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    # One int value x occurs 500,001 times or more in the array. Specify an
    # algorithm to determine x."
    #
    # The assumption being extra points for efficiency.

    There is an old analysis of this problem by Fischer and Salzberg.
    M.J. Fischer and S.L. Salzberg (1982)
    Finding a majority among n votes.
    Journal of Algorithms 3, pp 143-152.

    If 2k elements contain a majority element (= an element that occurs at
    least k+1 times), then you can find it with 3k-2 element comparisons
    (and some small overhead). The basic idea in their algorithm is that
    whenever you find two distinct elements, then you can destroy both without
    changing the majority element among the remaining elements. This yields:

    Run once through the array, and maintain a majority-candidate and a counter.
    The majority-candidate is initialized as the first element, and
    the counter (counting how often you have seen the candidate) is
    initialized at 1.
    If you hit the current candidate again, increment the counter.
    If you hit another element, decrement the counter.
    If the counter becomes 0, drop the current candidate and start from
    scratch with the remaining part of the array.

    Fischer and Salzberg also show that this bound 3k-2 is best possible in
    the worst case (and that's the main part of their paper).

    --Gerhard

    ___________________________________________________________
    Gerhard J. Woeginger http://www.win.tue.nl/~gwoegi/
     
    GJ Woeginger, Nov 21, 2009
    #3
  4. Andrew Tomazos

    A.G.McDowell Guest

    Andrew Tomazos <> writes:
    >I was posed the following question in a technical interview for a
    >Software Engineering position by a major multinational NASDAQ company:
    >
    >[Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    >One int value x occurs 500,001 times or more in the array. Specify an
    >algorithm to determine x."
    >
    >The assumption being extra points for efficiency.
    >
    >I initially came up with a linear space, linear time solution. With
    >some prompting and hints from the interviewer we worked our way to a
    >smaller constant space and linear time algorithm. That being
    >basically:
    >

    This sort of problem is covered by articles on Data Stream Processing in
    CACM Oct 2009. (CACM is a lot more interesting these days than it was
    some years ago). There are some very neat ideas in there, of which the
    algorithm "MAJORITY" matches the question reasonably well. Proving that
    it works under interview conditions would be extremely impressive,
    though.

    This is not the first time that I have heard of interview questions that
    discuss issues recently covered in the computing literature. I am unable
    to tell whether these come from a desire to know if the candidate keeps
    themselves abreast of the subject, or from the interviewer grasping the
    first thing that comes to hand when they are trying to think up a
    question. The few times that I have posed interview questions, I have
    tried to find evidence in the candidate of a knowledge of basic theory
    or mathematics that I could show was relevant to the job for which we
    were trying to recruit.
    --
    A.G.McDowell
     
    A.G.McDowell, Nov 21, 2009
    #4
  5. Andrew Tomazos

    Bill Dubuque Guest

    (Richard Harter) wrote:
    >Andrew Tomazos <> wrote:
    >>
    >>I was posed the following question in a technical interview for a
    >>Software Engineering position by a major multinational NASDAQ company:
    >>
    >>[Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    >>One int value x occurs 500,001 times or more in the array. Specify an
    >>algorithm to determine x."
    >>
    >>The assumption being extra points for efficiency. [snip code] The idea
    >>is to count the frequency of each of the 32 bits in the array separately,
    >>knowing that these bit counts will be dominated by the mode. [...]
    >>
    >>My question about this is as follows. I, perhaps boldly, claim to
    >>employers to have a "masters-equivalent" experience/knowledge of
    >>computer science and math. Should I have been able to come up with
    >>this solution without prompting or help?
    >>Would you expect someone with a CompSci Masters or PhD from some major
    >>ivy league university to be able to come up with this solution without
    >>help? If so in what course or from which book would you expect to learn
    >>the required knowledge to come up with the above solution?

    >
    > That's an interesting question with an interesting presupposition.
    > The first thing to understand is that this is a puzzle rather than
    > a programming problem.


    I disagree. Saying that it's a puzzle rather than a programming problem
    seems to imply that the solution is ad-hoc, rather than a special case
    of some more universal technique. But that is not true here. Namely,
    the proposed solution follows immediately from the obvious fact that
    majority elts on product sets are preserved by component projections.
    So the solution is simply an instance of well-known divide-and-conquer
    techniques for product objects. Such reductions are ubiquitous in both
    mathematics and computer science. So I would expect a good student
    to find this solution given enough time. I'd also expect a student
    from a top-tier school to discover the more optimal solution, viz.

    bi != a => Maj({a^k b1 b2... bk} \/ S) = Maj(S)

    via m/n > 1/2 => (m-k)/(n-2k) > 1/2 via mediants/arithmetic

    again, assuming enough time. But that assumption is highly problematic
    when it comes to designing tests that quickly measure various skills.
    It's impossible to test such in the short time span of an interview.
    It remains difficult even in much longer timed tests. For example
    many of the most successful mathematicians did not perform well on
    the Putnam competition, and some of those who did well didn't turn
    out to be exceptional mathematicians. Thus there isn't necessarily
    much correlation between intelligence and random test scores.

    Marilyn vos Savant is a prime example. The woman who supposedly
    has the "world's highest IQ" published a Parade magazine column [1]
    and book [2] on Wiles proof of FLT. Her nonsensical arguments proved
    beyond a doubt that she has absolutely no clue about higher math
    and, worse, doesn't have the intelligence to know that (even after
    many experts pointed out gaping flaws in her very naive arguments).
    So much for IQ tests.

    --Bill Dubuque

    [1] sci.math, 11/29/1993, Full Text of Savant FLT Parade Article
    http://google.com/group/sci.math/msg/8cb44534c63372ad
    http://google.com/groups?selm=

    [2] Boston, Nigel; Granville, Andrew.
    Review of: The World's Most Famous Math Problem (The Proof
    of Fermat's Last Theorem and Other Mathematical Mysteries)
    by Marilyn vos Savant
    The American Mathematical Monthly, 102, 5, 1995, 470-473
    http://www.dms.umontreal.ca/~andrew/PDF/VS.pdf
    http://www.jstor.org/stable/2975048
     
    Bill Dubuque, Nov 21, 2009
    #5
  6. * Bill Dubuque:
    >
    > I disagree. Saying that it's a puzzle rather than a programming problem
    > seems to imply that the solution is ad-hoc, rather than a special case
    > of some more universal technique. But that is not true here. Namely,
    > the proposed solution follows immediately from the obvious fact that
    > majority elts on product sets are preserved by component projections.
    > So the solution is simply an instance of well-known divide-and-conquer
    > techniques for product objects. Such reductions are ubiquitous in both
    > mathematics and computer science. So I would expect a good student
    > to find this solution given enough time. I'd also expect a student
    > from a top-tier school to discover the more optimal solution, viz.
    >
    > bi != a => Maj({a^k b1 b2... bk} \/ S) = Maj(S)
    >
    > via m/n > 1/2 => (m-k)/(n-2k) > 1/2 via mediants/arithmetic


    Hiya. Could you please explain the above more optimal solution? I'm unfamiliar
    with the notation and not from a top-tier school, but in my experience anything
    that's not nonsense can be visualized or explained in simple terms. (E.g., Albert
    Einstein did that beautifully in his book on special relativity, which -- except
    for one little proof in an appendix, since I didn't yet know anything about
    solving sets of equations with multiple variables -- I found eminently
    grokkable as a teenager.) So the same should be possible for the above, yes?

    Cheers & TIA.,

    - Alf
     
    Alf P. Steinbach, Nov 21, 2009
    #6
  7. Andrew Tomazos

    Axel Vogt Guest

    Andrew Tomazos wrote:
    > I was posed the following question in a technical interview for a
    > Software Engineering position by a major multinational NASDAQ company:
    >
    > [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    > One int value x occurs 500,001 times or more in the array. Specify an
    > algorithm to determine x."
    >
    > The assumption being extra points for efficiency.

    <snipped>

    Being a bit stupid, I first would ask "Why? What do you do with it?"
    and then I would pick at random. I am almost certain that, even with
    a low number of draws, the chance of getting the right integer is
    higher than that of implementing an algorithm without coding errors.
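
    For concreteness, a sketch of how the random-pick idea could be made
    precise (assuming the stated majority guarantee; the function name is
    just for illustration): pick a random index, then verify that value
    with one counting pass. Each pick hits the majority value with
    probability greater than 1/2, so fewer than two full passes are
    needed on average.

    #include <cstdlib>   // std::rand

    // Assumes a strict majority value exists in p[0..n-1].
    // Expected O(n) time, O(1) space -- an untested sketch.
    int findmode_random(const int* p, int n)
    {
        for (;;)
        {
            int candidate = p[std::rand() % n];  // pick a value at random

            int count = 0;                       // verify with one full pass
            for (int i = 0; i < n; i++)
                if (p[i] == candidate)
                    count++;

            if (count > n / 2)
                return candidate;                // majority confirmed
        }
    }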
     
    Axel Vogt, Nov 21, 2009
    #7
  8. On Nov 21, 10:53 pm, "Alf P. Steinbach" <> wrote:
    > * Bill Dubuque:
    >
    >
    >
    > > I disagree. Saying that it's a puzzle rather than a programming problem
    > > seems to imply that the solution is ad-hoc, rather than a special case
    > > of some more universal technique. But that is not true here. Namely,
    > > the proposed solution follows immediately from the obvious fact that
    > > majority elts on product sets are preserved by component projections.
    > > So the solution is simply an instance of well-known divide-and-conquer
    > > techniques for product objects. Such reductions are ubiquitous in both
    > > mathematics and computer science. So I would expect a good student
    > > to find this solution given enough time. I'd also expect a student
    > > from a top-tier school to discover the more optimal solution, viz.

    >
    > >         bi != a  =>  Maj({a^k b1 b2... bk} \/ S)  =  Maj(S)

    >
    > >  via  m/n > 1/2  =>  (m-k)/(n-2k) > 1/2   via mediants/arithmetic

    >
    > Hiya. Could you please explain the above more optimal solution.


    Dubuque is referring to the solution that Woeginger more lucidly
    described above. Both it and the bit counting method are
    asymptotically equivalent solutions to the original problem. I'm sure
    either of these solutions provided on the spot would have received
    "full marks".

    I guess what I am curious about is exactly what percentage of, say,
    CS PhD students at tier-one universities would be able to come up
    with either of these solutions on the spot: 80%, 50% or 20%? I
    guess only someone who has interviewed many people with this sort of
    problem has the necessary data to answer my question.
    -Andrew.
     
    Andrew Tomazos, Nov 21, 2009
    #8
  9. On Nov 21, 6:29 pm, -graz.ac.at (GJ Woeginger)
    wrote:
    > In comp.theory Andrew Tomazos <> wrote:
    > # I was posed the following question in a technical interview for a
    > # Software Engineering position by a major multinational NASDAQ company:
    > #
    > # [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
    > # One int value x occurs 500,001 times or more in the array.  Specify an
    > # algorithm to determine x."
    > #
    > # The assumption being extra points for efficiency.
    >
    > There is an old analysis of this problem by Fischer and Salzberg.
    >   M.J. Fisher and S.L. Salzberg  (1982)
    >   Finding a majority among n votes.
    >   Journal of Algorithms 3, pp 143-152.
    >
    > If 2k elements contain a majority element (= an element that occurs at
    > least k+1 times), then you can find it with 3k-2 element comparisons
    > (and some small overhead).  The basic idea in their algorithm is that
    > whenever you find two distinct elements, then you can destroy both without
    > changing the majority element among the remaining elements.  This yields:
    >
    >  Run once through the array, and maintain a majority-candidate and a counter.
    >  The majority-candidate is initialized as the first element, and
    >    the counter (counting how often you have seen the candidate) is
    >    initialized at 1.
    >  If you hit the current candidate again, increment the counter.
    >  If you hit another element, decrement the counter.
    >  If the counter becomes 0, drop the current candidate and start from
    >    scratch with the remaining part of the array.
    >
    > Fischer and Salzberg also show that this bound 3k-2 is best possible in
    > the worst case (and that's the main part of their paper).


    If I understand your description then it would look like:

    int findmode(int* p, int n)
    {
        int x = p[0];   // current majority candidate
        int c = 1;      // counter for the candidate

        for (int i = 1; i < n; i++)
        {
            if (c == 0)
            {
                x = p[i];
                c = 1;
            }
            else if (p[i] == x)
                c++;
            else
                c--;
        }

        return x;
    }

    It seems that this could only produce at maximum (2k-1) comparisons
    in the worst case, not (3k-2) as you claim?
    -Andrew.
     
    Andrew Tomazos, Nov 22, 2009
    #9
  10. On Nov 22, 1:01 am, Andrew Tomazos <> wrote:
    > On Nov 21, 6:29 pm, -graz.ac.at (GJ Woeginger)
    > wrote:
    >
    >
    >
    > > In comp.theory Andrew Tomazos <> wrote:
    > > # I was posed the following question in a technical interview for a
    > > # Software Engineering position by a major multinational NASDAQ company:
    > > #
    > > # [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
    > > # One int value x occurs 500,001 times or more in the array.  Specify an
    > > # algorithm to determine x."
    > > #
    > > # The assumption being extra points for efficiency.

    >
    > > There is an old analysis of this problem by Fischer and Salzberg.
    > >   M.J. Fisher and S.L. Salzberg  (1982)
    > >   Finding a majority among n votes.
    > >   Journal of Algorithms 3, pp 143-152.

    >
    > > If 2k elements contain a majority element (= an element that occurs at
    > > least k+1 times), then you can find it with 3k-2 element comparisons
    > > (and some small overhead).  The basic idea in their algorithm is that
    > > whenever you find two distinct elements, then you can destroy both without
    > > changing the majority element among the remaining elements.  This yields:

    >
    > >  Run once through the array, and maintain a majority-candidate and a counter.
    > >  The majority-candidate is initialized as the first element, and
    > >    the counter (counting how often you have seen the candidate) is
    > >    initialized at 1.
    > >  If you hit the current candidate again, increment the counter.
    > >  If you hit another element, decrement the counter.
    > >  If the counter becomes 0, drop the current candidate and start from
    > >    scratch with the remaining part of the array.

    >
    > > Fischer and Salzberg also show that this bound 3k-2 is best possible in
    > > the worst case (and that's the main part of their paper).

    >
    > If I understand your description then it would look like:
    >
    > int findmode(int* p, int n)
    > {
    >     int x = p[0];
    >     int c = 1;
    >
    >     for (int i = 1; i < n; i++)
    >     {
    >        if (c == 0)
    >        {
    >           x = p[i];
    >           c = 1;
    >        }
    >        else if (p[i] == x)
    >            c++;
    >        else
    >            c--;
    >     }
    >
    >     return x;
    >
    > }
    >
    > It seems that this could only produce at maximum (2k-1) comparisons
    > in the worst case, not (3k-2) as you claim?
    >   -Andrew.


    Oh wait... the (c==0) counts as a comparison. I see it now. Ignore
    me.
    -Andrew.
     
    Andrew Tomazos, Nov 22, 2009
    #10
  11. Andrew Tomazos

    Bill Dubuque Guest

    Andrew Tomazos <> wrote:
    >"Alf P. Steinbach" <> wrote:
    >>Bill Dubuque <> wrote:
    >>> (Richard Harter) wrote:
    >>>>Andrew Tomazos <> wrote:
    >>>>>
    >>>>>I was posed the following question in a technical interview for a
    >>>>>Software Engineering position by a major multinational NASDAQ company:
    >>>>>
    >>>>>[Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    >>>>>One int value x occurs 500,001 times or more in the array. Specify an
    >>>>>algorithm to determine x."
    >>>>>
    >>>>>The assumption being extra points for efficiency. [snip code] The idea
    >>>>>is to count the frequency of each of the 32 bits in the array separately,
    >>>>>knowing that these bit counts will be dominated by the mode. [...]
    >>>>>
    >>>>>My question about this is as follows. I, perhaps boldly, claim to
    >>>>>employers to have a "masters-equivalent" experience/knowledge of
    >>>>>computer science and math. Should I have been able to come up with
    >>>>>this solution without prompting or help?
    >>>>>Would you expect someone with a CompSci Masters or PhD from some major
    >>>>>ivy league university to be able to come up with this solution without
    >>>>>help? If so in what course or from which book would you expect to learn
    >>>>>the required knowledge to come up with the above solution?
    >>>>
    >>>> That's an interesting question with an interesting presupposition.
    >>>> The first thing to understand is that this is a puzzle rather than
    >>>> a programming problem.
    >>>
    >>> I disagree. Saying that it's a puzzle rather than a programming problem
    >>> seems to imply that the solution is ad-hoc, rather than a special case
    >>> of some more universal technique. But that is not true here. Namely,
    >>> the proposed solution follows immediately from the obvious fact that
    >>> majority elts on product sets are preserved by component projections.
    >>> So the solution is simply an instance of well-known divide-and-conquer
    >>> techniques for product objects. Such reductions are ubiquitous in both
    >>> mathematics and computer science. So I would expect a good student
    >>> to find this solution given enough time. I'd also expect a student
    >>> from a top-tier school to discover the more optimal solution, viz.
    >>>
    >>> bi != a => Maj({a^k b1 b2... bk} \/ S) = Maj(S)
    >>>
    >>> via m/n > 1/2 => (m-k)/(n-2k) > 1/2 via mediants/arithmetic
    >>>
    >>> again, assuming enough time. But that assumption is highly problematic
    >>> when it comes to designing tests that quickly measure various skills.
    >>> It's impossible to test such in the short time span of an interview.
    >>> It remains difficult even in much longer timed tests. For example
    >>> many of the most successful mathematicians did not perform well on
    >>> the Putnam competition, and some of those who did well didn't turn
    >>> out to be exceptional mathematicians. Thus there isn't necessarily
    >>> much correlation between intelligence and random test scores.
    >>>
    >>> Marilyn vos Savant is a prime example. The woman who supposedly
    >>> has the "world's highest IQ" published a Parade magazine column [1]
    >>> and book [2] on Wiles proof of FLT. Her nonsensical arguments proved
    >>> beyond a doubt that she has absolutely no clue about higher math
    >>> and, worse, doesn't have the intelligence to know that (even after
    >>> many experts pointed out gaping flaws in her very naive arguments).
    >>> So much for IQ tests.
    >>>
    >>> [1] sci.math, 11/29/1993, Full Text of Savant FLT Parade Article
    >>> http://google.com/group/sci.math/msg/8cb44534c63372ad
    >>> http://google.com/groups?selm=
    >>>
    >>> [2] Boston, Nigel; Granville, Andrew.
    >>> Review of: The World's Most Famous Math Problem (The Proof
    >>> of Fermat's Last Theorem and Other Mathematical Mysteries)
    >>> by Marilyn vos Savant
    >>> The American Mathematical Monthly, 102, 5, 1995, 470-473
    >>> http://www.dms.umontreal.ca/~andrew/PDF/VS.pdf

    >>
    >> Hiya. Could you please explain the above more optimal solution.


    I'll be happy to elaborate if you say what isn't clear to you.

    > Dubuque is referring to the solution that Woeginger more lucidly
    > described above.


    No, I hadn't yet seen Woeginger's post when I posted that. Note
    that what one finds "more lucid" may depend on one's background.
    A computer scientist might find W's post more lucid, whereas a
    mathematician may find the above closer to the essence of the matter,
    esp. since the mediant viewpoint makes the underlying math trivial
    (W's post completely omits discussion of any math or proof).

    --Bill Dubuque
     
    Bill Dubuque, Nov 22, 2009
    #11
  12. These questions/puzzles are to see how you approach the problem, but
    if you've seen the puzzle before it is a useless practice.

    --
    Regards,
    Casey
     
    Casey Hawthorne, Nov 22, 2009
    #12
  13. Andrew Tomazos

    Willem Guest

    Virgil wrote:
    )> >>>I was posed the following question in a technical interview for a
    )> >>>Software Engineering position by a major multinational NASDAQ company:
    )> >>>
    )> >>>[Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    )> >>>One int value x occurs 500,001 times or more in the array. Specify an
    )> >>>algorithm to determine x."
    )
    ) Find, to the nearest integer, the average value.

    Doesn't work.
    Consider the case of 500,001 times 0 and 499,999 times 2^31-1

    Perhaps you are thinking of 'find the median value'?


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Nov 22, 2009
    #13
  14. Andrew Tomazos

    Willem Guest

    Andrew Tomazos wrote:
    ) Dubuque is referring to the solution that Woeginger more lucidly
    ) described above. Both it and the bit counting method are
    ) asymptotically equivalent solutions to the original problem. I'm sure
    ) either of these solutions provided on the spot would have received
    ) "full marks".

    I'm not so sure. I wouldn't be surprised if you only get "full marks"
    for the solution that the interviewer has in mind, even if there are
    other solutions that are equivalent or better.

    NB: The counting method is actually better, because you don't need
    O(k * log n) space (where k is the size of an individual key).
    Or, practically speaking, it also works if the items are strings.


    A very simple proof of the 'other' method would be that whenever
    c reaches 0, the part of the array you processed can't have more
    than half of the 'majority' item. (Exactly half of it is one single
    item, and the other half is not that single item) So the remaining
    part *must* have more than half, i.e. the problem is reduced to
    the same problem for the remaining part.
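
    For example, a minimal sketch of the candidate method written
    generically (my own illustration, assuming a strict majority element
    exists; any item type with an equality operator works, e.g.
    std::string):

    #include <cstddef>
    #include <string>
    #include <vector>

    // Candidate-and-counter majority vote over any equality-comparable type.
    template <typename T>
    T findmode(const std::vector<T>& v)
    {
        T x = v[0];         // current majority candidate
        std::size_t c = 1;  // counter for the candidate

        for (std::size_t i = 1; i < v.size(); i++)
        {
            if (c == 0)
            {
                x = v[i];
                c = 1;
            }
            else if (v[i] == x)
                c++;
            else
                c--;
        }
        return x;
    }

    // Works unchanged for std::vector<std::string>, where the bit-counting
    // approach does not apply directly.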


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Nov 22, 2009
    #14
  15. Andrew Tomazos

    Moi Guest

    On Sun, 22 Nov 2009 08:47:19 +0000, Richard Heathfield wrote:

    > In <>, Axel Vogt wrote:
    >
    >> Andrew Tomazos wrote:
    >>> I was posed the following question in a technical interview for a
    >>> Software Engineering position by a major multinational NASDAQ company:
    >>>
    >>> [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    >>> One int value x occurs 500,001 times or more in the array. Specify an
    >>> algorithm to determine x."
    >>>
    >>> The assumption being extra points for efficiency.

    >> <snipped>
    >>
    >> Being a bit stupid I first would ask "Why? What do you do with it?" and
    >> then I would pick on random. I am almost certain, that even at a low
    >> number of draws the chance to get the very integer is higher than
    >> implementing an algo without coding errors.

    >
    > I thought of that too, but quickly saw the flaw - if for large N you
    > have (N/2)+1 occurrences of X, and (N/2)-1 occurrences of Y, you can't
    > settle the matter (is it X, or is it Y?) satisfactorily without reading
    > every element.


    With a kind of weight-balanced tree (i.e. rotate in such a way that
    the node with the higher reference count is closer to the root), you
    can stop reading and inserting values once there is a clear "winner".

    I think something similar can be done for the bit-counting method:
    if every bin satisfies abs(value) > number_of_remaining_items,
    you can terminate the inspection/insertion early.
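
    Roughly, a sketch of that early-exit variant (untested; checking after
    every element adds overhead, so in practice you would test the
    condition only every so often):

    #include <cstdlib>   // std::abs

    // Bit-counting with early termination: stop scanning once every bit's
    // count is larger in magnitude than the number of items left, since no
    // later items can flip any count's sign.
    int findmode_early(const int* p, int n)
    {
        int count[32] = { 0 };

        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < 32; j++)
                if (p[i] & (1 << j)) count[j]++; else count[j]--;

            int remaining = n - (i + 1);
            bool decided = true;
            for (int j = 0; j < 32; j++)
                if (std::abs(count[j]) <= remaining) { decided = false; break; }
            if (decided)
                break;
        }

        int x = 0;
        for (int j = 0; j < 32; j++)
            if (count[j] > 0)
                x |= (1 << j);
        return x;
    }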



    AvK
     
    Moi, Nov 22, 2009
    #15
  16. On Nov 22, 9:50 am, Willem <> wrote:
    > ) Dubuque is referring to the solution that Woeginger more lucidly
    > ) described above.  Both it and the bit counting method are
    > ) asymptotically equivalent solutions to the original problem.  I'm sure
    > ) either of these solutions provided on the spot would have received
    > ) "full marks".
    >
    > I'm not so sure.  I wouldn't be surprised if you only get "full marks"
    > for the solution that the interviewer has in mind, even if there are
    > other solutions that are equivalent or better.


    Ummm. No, I don't think so. It was my impression that performance is
    assessed according to the asymptotic equivalence class of time and
    space requirements. At least in the interview setting I am talking
    about.

    > NB: The counting method is actually better, because you don't need
    > O(k * log n) space (where k is the size of an individual key).
    > Or, practically speaking, it also works if the items are strings.


    Normally, k (the number of bits in an int) would be considered
    constant (I haven't encountered many 1,000,000 bit integers as yet),
    therefore both solutions are asymptotically equivalent in space and
    time. There is usually a very large number of ways to make
    nonasymptotic improvements to any algorithm. None of this precludes
    one solution being "better" than the other by some other measure
    though, but it gets harder to measure performance at a finer grain.
    You need to start considering the impact of the memory hierarchy,
    locality of reference, what the optimizer is going to do, and so on.
    To be honest I would expect both algorithms to be memory bound by the
    single pass of the large array. But it would be interesting to
    compare anyway. Maybe I'll do a performance comparison.
    -Andrew
     
    Andrew Tomazos, Nov 22, 2009
    #16
  17. Andrew Tomazos

    Willem Guest

    Andrew Tomazos wrote:
    ) On Nov 22, 9:50 am, Willem <> wrote:
    )> I'm not so sure.  I wouldn't be surprised if you only get "full marks"
    )> for the solution that the interviewer has in mind, even if there are
    )> other solutions that are equivalent or better.
    )
    ) Ummm. No, I don't think so. It was my impression that performance is
    ) assessed according to the asymptotic equivalence class of time and
    ) space requirements. At least in the interview setting I am talking
    ) about.

    My comment is based on my general impression of interviewers, not on the
    way the OP worded his question. I find it more likely that an interviewer
    posing such a question 'knows the trick' and wants the interviewee to find
    the same trick. Also note that in this case, the interviewer goaded the
    interviewee to the less general solution (see below).

    )> NB: The counting method is actually better, because you don't need
    )> O(k * log n) space (where k is the size of an individual key).
    )> Or, practically speaking, it also works if the items are strings.

    Did you read the last sentence? You know, the one with 'strings'?

    ) Normally, k (the number of bits in an int) would be considered
    ) constant (I haven't encountered many 1,000,000 bit integers as yet),
    ) therefore both solutions are asymptotically equivalent in space and
    ) time.

    We're looking at the difference between two algorithms. With one view,
    they're equivalent, with another view one of them is better. To me, that
    means that that one is better overall.

    Furthermore, the counting solution is more general, because it
    can work on any type of item with an equivalence operator.

    All of this has nothing to do with nonasymptotic behaviour.

    ) <snip>


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
     
    Willem, Nov 22, 2009
    #17
  18. Andrew Tomazos

    Chip Eastham Guest

    On Nov 21, 7:04 pm, Andrew Tomazos <> wrote:
    > On Nov 22, 1:01 am, Andrew Tomazos <> wrote:
    >
    >
    >
    > > On Nov 21, 6:29 pm, -graz.ac.at (GJ Woeginger)
    > > wrote:

    >
    > > > In comp.theory Andrew Tomazos <> wrote:
    > > > # I was posed the following question in a technical interview for a
    > > > # Software Engineering position by a major multinational NASDAQ company:
    > > > #
    > > > # [Paraphrasing]  "You are given an array of 1,000,000 32-bit integers.
    > > > # One int value x occurs 500,001 times or more in the array.  Specify an
    > > > # algorithm to determine x."
    > > > #
    > > > # The assumption being extra points for efficiency.

    >
    > > > There is an old analysis of this problem by Fischer and Salzberg.
    > > >   M.J. Fisher and S.L. Salzberg  (1982)
    > > >   Finding a majority among n votes.
    > > >   Journal of Algorithms 3, pp 143-152.

    >
    > > > If 2k elements contain a majority element (= an element that occurs at
    > > > least k+1 times), then you can find it with 3k-2 element comparisons
    > > > (and some small overhead).  The basic idea in their algorithm is that
    > > > whenever you find two distinct elements, then you can destroy both without
    > > > changing the majority element among the remaining elements.  This yields:

    >
    > > >  Run once through the array, and maintain a majority-candidate and a counter.
    > > >  The majority-candidate is initialized as the first element, and
    > > >    the counter (counting how often you have seen the candidate) is
    > > >    initialized at 1.
    > > >  If you hit the current candidate again, increment the counter.
    > > >  If you hit another element, decrement the counter.
    > > >  If the counter becomes 0, drop the current candidate and start from
    > > >    scratch with the remaining part of the array.

    >
    > > > Fischer and Salzberg also show that this bound 3k-2 is best possible in
    > > > the worst case (and that's the main part of their paper).

    >
    > > If I understand your description then it would look like:

    >
    > > int findmode(int* p, int n)
    > > {
    > >     int x = p[0];
    > >     int c = 1;

    >
    > >     for (int i = 1; i < n; i++)
    > >     {
    > >        if (c == 0)
    > >        {
    > >           x = p[i];
    > >           c = 1;
    > >        }
    > >        else if (p[i] == x)
    > >            c++;
    > >        else
    > >            c--;
    > >     }

    >
    > >     return x;

    >
    > > }

    >
    > > It seems that this could only produce at maximum (2k-1) comparisons
    > > in the worst case, not (3k-2) as you claim?
    > >   -Andrew.

    >
    > Oh wait... the (c==0) counts as a comparison.  I see it now.  Ignore
    > me.
    >   -Andrew.


    I don't think the (c==0) counts as a comparison.
    I think the other k-1 comparisons are for the 2nd
    part of the algorithm, when you make one more pass
    through the array verifying the candidate found in
    the 1st part. [You already know the index i for
    one occurrence of the candidate, so only the other
    k-1 indices have to be checked.]
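
    In code, that verification pass is just a straightforward counting
    loop (a sketch; the paper's comparison accounting is tighter than
    this naive version, and for the interview problem as stated the
    majority is guaranteed, so the check always succeeds):

    // Second pass: confirm that the candidate x from the first pass really
    // is a majority element of p[0..n-1].
    bool verify_majority(const int* p, int n, int x)
    {
        int count = 0;
        for (int i = 0; i < n; i++)
            if (p[i] == x)
                count++;
        return count > n / 2;
    }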

    You can see how badly I (mathtalk-ga) stumbled
    through this same puzzle here:

    [Google Answers: algorithms]
    http://answers.google.com/answers/threadview/id/582711.html

    regards, chip
     
    Chip Eastham, Nov 22, 2009
    #18
  19. -graz.ac.at (GJ Woeginger) writes:

    > In comp.theory Andrew Tomazos <> wrote:
    > # I was posed the following question in a technical interview for a
    > # Software Engineering position by a major multinational NASDAQ company:
    > #
    > # [Paraphrasing] "You are given an array of 1,000,000 32-bit integers.
    > # One int value x occurs 500,001 times or more in the array. Specify an
    > # algorithm to determine x."
    > #
    > # The assumption being extra points for efficiency.
    >
    > There is an old analysis of this problem by Fischer and Salzberg.
    > M.J. Fisher and S.L. Salzberg (1982)
    > Finding a majority among n votes.
    > Journal of Algorithms 3, pp 143-152.


    I can't track this down. Have I got the right Journal of Algorithms?

    http://www.informatik.uni-trier.de/~ley/db/journals/jal/jal3.html

    does not list that paper. Of course, this table of contents might be
    wrong. The horrendous link:

    http://www.sciencedirect.com/scienc...serid=10&md5=2c8bc9013d6025d08b7d24d3207dfe18

    appears to be for the journal itself. The paper does not appear
    there, either.

    > If 2k elements contain a majority element (= an element that occurs at
    > least k+1 times), then you can find it with 3k-2 element comparisons
    > (and some small overhead). The basic idea in their algorithm is that
    > whenever you find two distinct elements, then you can destroy both without
    > changing the majority element among the remaining elements. This yields:
    >
    > Run once through the array, and maintain a majority-candidate and a counter.
    > The majority-candidate is initialized as the first element, and
    > the counter (counting how often you have seen the candidate) is
    > initialized at 1.
    > If you hit the current candidate again, increment the counter.
    > If you hit another element, decrement the counter.
    > If the counter becomes 0, drop the current candidate and start from
    > scratch with the remaining part of the array.
    >
    > Fischer and Salzberg also show that this bound 3k-2 is best possible in
    > the worst case (and that's the main part of their paper).


    I can find a 1982 technical report from Yale:

    FINDING A MAJORITY AMONG N VOTES Solution to Problem 81-5 (Journal
    of Algorithms, June 1981)

    by the same two authors. It describes what seems to me to be a
    different algorithm, for which the 3k - 2 bound is proved. I
    suspect the algorithm was modified after the technical report, but
    I can't find a good reference that matches the algorithm you describe.

    <snip>
    --
    Ben.
     
    Ben Bacarisse, Nov 22, 2009
    #19
  20. Andrew Tomazos

    mjc Guest

    I was also given this problem as part of an interview quiz. My answer
    was: use one of the known O(n) median-selection algorithms and return
    that value. This was accepted by the interviewer (but I didn't get the job).
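
    Presumably something along these lines -- a sketch using
    std::nth_element, which is linear on average (a worst-case-linear
    selection such as median-of-medians would also do). Since the
    majority value occupies more than half the positions, the median
    must equal it:

    #include <algorithm>
    #include <vector>

    // Assumes a strict majority value exists, so the median equals it.
    // Note: std::nth_element rearranges the input.
    int findmode_median(std::vector<int>& v)
    {
        std::vector<int>::iterator mid = v.begin() + v.size() / 2;
        std::nth_element(v.begin(), mid, v.end());
        return *mid;
    }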
     
    mjc, Nov 22, 2009
    #20
