# PLS HELP w/ Interview Question!!

Discussion in 'C++' started by Nobody, Nov 18, 2006.

1. ### NobodyGuest

I've been looking for a job for a while now, and have run into this
interview question twice now... and have stupidly kind of blown it twice...
(although I've gotten better)... time to finally figure this thing out...

basically the interview question is: given an unsorted listed such as:

3,1,3,7,1,2,4,4,3

find the FIRST UNIQUE number, in this case 7... and of course the list can
be millions long...

the first time I got this question, my solution was something like:

1) declare a second array A2 with n elements
2) loop through original array A1 and insert the element in its sorted
position in A2, if I find a match eliminate, mark all copies of that element
as "duplicate"
3) loop through original array A1 again and find the first number that is
not listed as a duplicate in A2

I think at the time of that interview, I pointed out that this wasnt a very
good algorithm as it would be something like O(2*(n^2)), but it will get the
job done (I did get offered that job actually, but stupidly turned it
down)...

I then offered up another solution at that interview which I thought was
better:

1) build a copy of A1 that has the original position of each element
2) sort A1
3) loop through sorted A1 deleting any duplicate items
4) now A1 will only contain unique items... so loop through it again looking
for the smallest position that was marked in step 1

this was kind of the algorithm I started out with in interview #2 (which I
really wanted, but didnt get)...as the interviewer asked me to give him the
run time of that algorithm which I said was:

step #1: n
step #2: n log n
step #3: n
step #4: n

resulting in 3n + n log n...

this was a lot better then the original algorithm, but still not too great
because of the 3n term...

I got my rejection from the 2nd place yesterday, so I kind of was up all
night trying to figure out how to get it faster...

I came up with something like this:

1) loop through A1 building a customized AVL tree... an AVL tree insertion
is O(log n), so building this tree should take O(n log n).. the customized
part is... each node inserted will bring along its original position in the
array (0..n)... if I run across a duplicate node, I will mark the position
of the node thats already in the tree as "-1" and dump the node I'm trying
to insert

so basically I've spent O(n log n) to combine steps 1, 2 & 3 of my last
solution... basically taking 2n + n log n down to O(n log n)...

2) but now I need to visit every node in the tree to find out which one has
the lowest position which is O(n) obviously ignoring any nodes I marked
as -1.

so this final "optimized" solution I came up with last night would still
require O(n + n log n).

Is there any way to optimize this further? or a better algorithm all
together? I'm thinking perhaps to keep track of the first unique node as I'm
creating the tree? but I couldn't nail down an idea that would work...

I'm sort of invisioning the tree having a member called m_nPos initialized
to 0... and as I'm building the tree, if I find a duplicate element at
m_nPos, to increment it. So basically I insert element 0, then later on, I
find a duplicate for it, so the first unique item can't possibly be at zero,
it has to be at least at position 1. But then as I did a few test cases, I
poke holes in this idea very easily.

Any ideas?

Nobody, Nov 18, 2006

2. ### Guest

> basically the interview question is: given an unsorted listed such as:
>
> 3,1,3,7,1,2,4,4,3

Where there any other constraints?

, Nov 18, 2006

3. ### NobodyGuest

<> wrote in message
news:...
>
>
>> basically the interview question is: given an unsorted listed such as:
>>
>> 3,1,3,7,1,2,4,4,3

>
> Where there any other constraints?
>

Nope, just given an unsorted list of n numbers (n could be millions), find
the first unique number. Obviously you can't use a built in
"FindFirstUnique()" number function, the exercise was to write that
function. I thought there may be some tricks with using a database, but
thought the overhead of the database would kill any performance gains in the
algorithm.

Obviously the point is to find the first unique number in the shortest time
possible... the algorithm I described in the original post was O (n + n log
n)... I'm wondering if it can be done faster.

Nobody, Nov 18, 2006
4. ### Guest

No idea for solving that. The AVL tree is unclear for me, but a
brute-force solution is simple, though time-consuming

, Nov 18, 2006
5. ### Wayne MarshGuest

Nobody wrote:
> I've been looking for a job for a while now, and have run into this
> interview question twice now... and have stupidly kind of blown it twice...
> (although I've gotten better)... time to finally figure this thing out...
>
> basically the interview question is: given an unsorted listed such as:
>
> 3,1,3,7,1,2,4,4,3

Here's the first method that springs to mind for me:

Create a vector of bools initialised to false the same size as the list,
and then run through the list, setting the item in the vector at the
position of the number encountered to true, and removing ones that are
already true. Then simply look at the first element of the vector.
Bingo. One pass, right?

I bet I've made some glaring error in process or efficiency.

Wayne Marsh, Nov 18, 2006
6. ### Wayne MarshGuest

Wayne Marsh wrote:
> Nobody wrote:
>> I've been looking for a job for a while now, and have run into this
>> interview question twice now... and have stupidly kind of blown it
>> twice... (although I've gotten better)... time to finally figure this
>> thing out...
>>
>> basically the interview question is: given an unsorted listed such as:
>>
>> 3,1,3,7,1,2,4,4,3

>
> Here's the first method that springs to mind for me:
>
> Create a vector of bools initialised to false the same size as the list,
> and then run through the list, setting the item in the vector at the
> position of the number encountered to true, and removing ones that are
> already true. Then simply look at the first element of the vector.
> Bingo. One pass, right?
>
> I bet I've made some glaring error in process or efficiency.

This doesn't even make sense at all.

Sorry.

Wayne Marsh, Nov 18, 2006
7. ### Frederick GothamGuest

Nobody:

> given an unsorted listed such as:
>
> 3,1,3,7,1,2,4,4,3
>
> find the FIRST UNIQUE number, in this case 7... and of course the list
> can be millions long...

I've only recently begun reading a book on the C++ Standard Library, so I'm
not very familiar with all the built-in iterator routines and so forth, so
I can't provide an optimal solution.

However, I could give you a inefficient solution that gets the job done by
simply putting thought into it:

(This code won't be Grade A because I'm only starting to get the hang of
iterators and so forth. Expect mistakes.)

#include <cstddef>

template<class Iterator,class T>
std::size_t QuantOccur(Iterator begin,Iterator const end,T const &obj)
{
size_t quant = 0;

while (end!=begin) if (*begin++==obj) ++quant;

return quant;
}

template<class Iterator>
Iterator FindFirstUnique(Iterator begin,Iterator const end)
{
for (;end!=begin;++begin) if (1==QuantOccur(begin,end,*begin)) break;

return begin;
}

#include <iostream>
#include <ostream>

int main()
{
int arr[] = {3,1,3,7,1,2,4,4,3};

int i = *FindFirstUnique(arr,arr+sizeof arr/sizeof*arr);

std::cout << i << std::endl;
}

For all I know, there could be a Standard Library facility that does this
with a single line of code!

--

Frederick Gotham

Frederick Gotham, Nov 18, 2006
8. ### Frederick GothamGuest

Frederick Gotham:

> size_t quant = 0;

Once again my retard compiler failed to spot the error.

Frederick Gotham, Nov 18, 2006
9. ### NobodyGuest

<> wrote in message
news:...
> No idea for solving that. The AVL tree is unclear for me, but a
> brute-force solution is simple, though time-consuming
>

Hehe... that was kind of the point of my question. In the interview, I got
it down to O(3n + n log n), later on at home I got it down to O(n + n log
n). I'm trying to see if there is a faster way.

Nobody, Nov 18, 2006
10. ### NobodyGuest

"Wayne Marsh" <> wrote in message
news:455f6b31\$0\$2440\$...
> Wayne Marsh wrote:
>> Nobody wrote:
>>> I've been looking for a job for a while now, and have run into this
>>> interview question twice now... and have stupidly kind of blown it
>>> twice... (although I've gotten better)... time to finally figure this
>>> thing out...
>>>
>>> basically the interview question is: given an unsorted listed such as:
>>>
>>> 3,1,3,7,1,2,4,4,3

>>
>> Here's the first method that springs to mind for me:
>>
>> Create a vector of bools initialised to false the same size as the list,
>> and then run through the list, setting the item in the vector at the
>> position of the number encountered to true, and removing ones that are
>> already true. Then simply look at the first element of the vector. Bingo.
>> One pass, right?
>>
>> I bet I've made some glaring error in process or efficiency.

>
> This doesn't even make sense at all.
>
> Sorry.

Probably not... you'd have to be able to map the bools to the items and the
position which would start getting to the order of my original first pass
2*(n^2) algorithm.

Same thing with a hash table, you'd need the original order somehow and you
can't really remove the first and second item, because then when a 3rd item
comes along, you wont know its a duplicate.

Nobody, Nov 18, 2006
11. ### NobodyGuest

"Frederick Gotham" <> wrote in message
news:S8K7h.16143\$...
> Nobody:
>
>> given an unsorted listed such as:
>>
>> 3,1,3,7,1,2,4,4,3
>>
>> find the FIRST UNIQUE number, in this case 7... and of course the list
>> can be millions long...

>
>
> I've only recently begun reading a book on the C++ Standard Library, so
> I'm
> not very familiar with all the built-in iterator routines and so forth, so
> I can't provide an optimal solution.
>
> However, I could give you a inefficient solution that gets the job done by
> simply putting thought into it:
>
> (This code won't be Grade A because I'm only starting to get the hang of
> iterators and so forth. Expect mistakes.)
>
> #include <cstddef>
>
> template<class Iterator,class T>
> std::size_t QuantOccur(Iterator begin,Iterator const end,T const &obj)
> {
> size_t quant = 0;
>
> while (end!=begin) if (*begin++==obj) ++quant;
>
> return quant;
> }
>
> template<class Iterator>
> Iterator FindFirstUnique(Iterator begin,Iterator const end)
> {
> for (;end!=begin;++begin) if (1==QuantOccur(begin,end,*begin)) break;
>
> return begin;
> }
>
> #include <iostream>
> #include <ostream>
>
> int main()
> {
> int arr[] = {3,1,3,7,1,2,4,4,3};
>
> int i = *FindFirstUnique(arr,arr+sizeof arr/sizeof*arr);
>
> std::cout << i << std::endl;
> }
>
>
> For all I know, there could be a Standard Library facility that does this
> with a single line of code!
>
> --
>
> Frederick Gotham

If I understand the algorithm, you go through the list and get the occurence
count of each item, and stop on the first one where count = 1.

Hmm... Big-O wise.. its something like

for 0 .. n // loop through list
for 0 .. n // get occurence count for outer loop

So the runtime here appears to be n^2. Its much cleaner and simpler then my
original 2*(n^2) algorithm.. Hehe.... no to sound picky here, but I didn't
get the job because my algorithm was apperently not good enough... so I'm
trying to nail this down further then even the AVL tree approach with O(n +
n log n)...

I'm wondering now if building an AVL tree is really n log n... since the
outer n is constant, but the inner n is increasing...

Nobody, Nov 18, 2006
12. ### Alf P. SteinbachGuest

* Nobody:
> I've been looking for a job for a while now, and have run into this
> interview question twice now... and have stupidly kind of blown it twice...
> (although I've gotten better)... time to finally figure this thing out...
>
> basically the interview question is: given an unsorted listed such as:
>
> 3,1,3,7,1,2,4,4,3
>
> find the FIRST UNIQUE number, in this case 7... and of course the list can
> be millions long...

[snip]
> Any ideas?

Yeah, post in [comp.programming], it's off-topic in clc++, and don't
post questions that have better than 50% chance of being HOMEWORK.

Hints: (1) you can't do it in less than linear time, (2) it's trivial to
do it in linear time with linear memory consumption, (3) with more
stringent conditions it becomes a challenge just to find out whether
it's doable.

Follow-ups to [comp.programming].

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Alf P. Steinbach, Nov 18, 2006
13. ### NobodyGuest

"Alf P. Steinbach" <> wrote in message
news:...
>* Nobody:
>> I've been looking for a job for a while now, and have run into this
>> interview question twice now... and have stupidly kind of blown it
>> twice... (although I've gotten better)... time to finally figure this
>> thing out...
>>
>> basically the interview question is: given an unsorted listed such as:
>>
>> 3,1,3,7,1,2,4,4,3
>>
>> find the FIRST UNIQUE number, in this case 7... and of course the list
>> can be millions long...

>
> [snip]
>> Any ideas?

>
> Yeah, post in [comp.programming], it's off-topic in clc++, and don't post
> questions that have better than 50% chance of being HOMEWORK.
>
> Hints: (1) you can't do it in less than linear time, (2) it's trivial to
> do it in linear time with linear memory consumption, (3) with more
> stringent conditions it becomes a challenge just to find out whether it's
> doable.
>
>
>
> Follow-ups to [comp.programming].
>

Thanks, I've taken it there. By the way, I've been out of school for 12
years now .

Nobody, Nov 18, 2006
14. ### Gianni MarianiGuest

Nobody wrote:
....
> Any ideas?

You could use an "index" sort and this would be O(n) - this only works
if the elements are "indexable" i.e. a limited range otherwise you're
limited to nlogn.

i.e.
using an index sort:

a) create a "tally" vector<int> sized to the max element in the set.
b) scan through the elements incrementing the tally[index] in the vector.
c) scan through the elements a second time and stop on the first one
with tally[index] ==1.

If you can't use an index, you can sort the elements through a double
indirection:

a) create a "flags" vector<pair<bool,iterator>> setting first to true
and second to each consecutive iterator of the element list
b) create a "pointers" initialized to point to each element in flags
c) use a special compare routine that compares two * pair<bool,iterator>
if the (*second) are equal then set the first to false on both
operands.
d) scan through the "flags" vector looking for the first element with
first as "true".

This uses a property of sort algorithms (except for index sorts) that is
tricky to describe but I'll give it a go. While performing any sort on
any list of elements, any element that is duplicated must be compared to
at least one other duplicated element. Hence if you instrument the
compare operator of a sort algorithm, you can eliminate duplicate
elements. The corollary is that removing duplicates from a list of
elements need not be more complex that sorting. It's probably not any
easier than sorting either.

G

Here is an implementation of the second algorithm:

// ======== FirstUniqueDupeEliminator =================================
/**
* Helper for FirstUnique().
*
*/

template <
typename w_iterator
>

class FirstUniqueDupeEliminator
{
public:

typedef
std:air<
bool,
w_iterator
> Element;

std::vector< Element > m_values;
std::vector< Element * > m_values_ptrs;

w_iterator m_result;

void Remove( Element * i_listiter )
{
i_listiter->first = false;
}

bool operator()( Element * i_rhs, Element * i_lhs )
{
// if we're pointing to the same element
// we don't do anything special
if ( i_lhs == i_rhs )
{
return false;
}

if ( *(i_lhs->second) < *(i_rhs->second) )
{
return true;
}

if ( *(i_rhs->second) < *(i_lhs->second) )
{
return false;
}

// oops they're the same eliminate the elements from the list
//
Remove( i_rhs );
Remove( i_lhs );

return false;
}

FirstUniqueDupeEliminator(
const w_iterator & i_begin,
const w_iterator & i_end
)
{
if ( i_begin == i_end )
{
m_result = i_end;
return;
}

w_iterator l_tmp = i_begin;

while ( l_tmp != i_end )
{
m_values.push_back( Element( true, l_tmp ) );

++ l_tmp;
}

m_values_ptrs.resize( m_values.size() );

for ( std::size_t i = 0; i < m_values.size(); ++i )
{
m_values_ptrs[ i ] = & m_values[ i ];
}

std::sort( m_values_ptrs.begin(), m_values_ptrs.end(), *this );

for ( std::size_t i = 0; i < m_values.size(); ++i )
{
if ( m_values[ i ].first )
{
m_result = m_values[ i ].second;
return;
}
}

m_result = i_end;
}

operator w_iterator() const
{
return m_result;
}

};

// ======== FirstUnique ===============================================
/**
* FirstUnique returns the first unique element in a container
*
* @param i_begin The first element in the continer
* @param i_end The last element in the container
*
* @return The iterator to the first element in the
* range that is unique or i_end if no elements are unique
*/

template <
typename w_iterator
>

w_iterator FirstUnique(
const w_iterator & i_begin,
const w_iterator & i_end
) {

return FirstUniqueDupeEliminator<w_iterator>( i_begin, i_end );
}

const int array[] = { 3,1,3,7,1,2,4,4,3 };

int main()
{

const int * l_val = FirstUnique(
&array[0], at::ArrayEnd( array )
);
}

Gianni Mariani, Nov 19, 2006
15. ### Andrew KoenigGuest

"Nobody" <> wrote in message
news:IiK7h.216\$...

> Hehe... that was kind of the point of my question. In the interview, I got
> it down to O(3n + n log n), later on at home I got it down to O(n + n log
> n). I'm trying to see if there is a faster way.

If someone told me that an algorithm was O(3n+n log n), I would be wondering
whether that person really understood what big-O notations means, because
O(3n+n log n) means exactly the same as O(n log n).

Andrew Koenig, Nov 19, 2006
16. ### Andrew KoenigGuest

"Wayne Marsh" <> wrote in message
news:455f6937\$0\$2446\$...

> Create a vector of bools initialised to false the same size as the list,
> and then run through the list, setting the item in the vector at the
> position of the number encountered to true, and removing ones that are
> already true. Then simply look at the first element of the vector. Bingo.
> One pass, right?

> I bet I've made some glaring error in process or efficiency.

Doesn't the vector need to be as large as the number of possible values for
an element in the original vector? That is, if the original vector elements
are int, and an int on your machine has a range of (-2^31) <= n < (2^31),
then don't you need 2^32 elements in your boolean vector?

Andrew Koenig, Nov 19, 2006
17. ### NobodyGuest

"Andrew Koenig" <> wrote in message
news:FyN7h.303573\$...
> "Nobody" <> wrote in message
> news:IiK7h.216\$...
>
>> Hehe... that was kind of the point of my question. In the interview, I
>> got it down to O(3n + n log n), later on at home I got it down to O(n + n
>> log n). I'm trying to see if there is a faster way.

>
> If someone told me that an algorithm was O(3n+n log n), I would be
> wondering whether that person really understood what big-O notations
> means, because O(3n+n log n) means exactly the same as O(n log n).
>
>

Yeah, that was another one of my mistakes during the interview ... I also
stumbled on the fact that a quick sort was n log n and not log n... I asked
a friend who is very good at algorithms, and he mistakenly blurted out log n
as well til I pointed it out to him .

Honestly, I didn't really study up on algorithms per say, because I was
going in for a user interface type position and focused on that... you live
and learn I guess...

Nobody, Nov 19, 2006
18. ### NobodyGuest

"Gianni Mariani" <> wrote in message
news:455fa4f1\$0\$28188\$...
> Nobody wrote:
> ...
>> Any ideas?

>
> You could use an "index" sort and this would be O(n) - this only works if
> the elements are "indexable" i.e. a limited range otherwise you're limited
> to nlogn.
>
> i.e.
> using an index sort:
>
> a) create a "tally" vector<int> sized to the max element in the set.
> b) scan through the elements incrementing the tally[index] in the vector.
> c) scan through the elements a second time and stop on the first one
> with tally[index] ==1.
>
> If you can't use an index, you can sort the elements through a double
> indirection:
>
> a) create a "flags" vector<pair<bool,iterator>> setting first to true
> and second to each consecutive iterator of the element list
> b) create a "pointers" initialized to point to each element in flags
> c) use a special compare routine that compares two * pair<bool,iterator>
> if the (*second) are equal then set the first to false on both
> operands.
> d) scan through the "flags" vector looking for the first element with
> first as "true".

Thanks for the response and the code Gianni. I took the thread over to
comp.lang.algorithms as a poster requested. The AVL tree solution is
actually O (n log n) as well, but as per Alf, he pointed out that there is a
trivial O (n) solution.

I was totally stumped til someone mentioned the key word: hash table.

basically the O (n) "trivial" solution was to loop through the list like so:

for (0..n)
{
hash[element[n]] = 1;
else
hash[element[n]] = hash[element[n]] + 1;
}

that'll build a hash table of the occurance count of each item in the array.

then loop through like this to find the first unique element...

for (0..n)
{
if (hash.Find(element[n]) == 1)
FOUND IT
}

that ends up being O(2n) = O(n).

no sorting or deletions needed ...

another poster on this thread actually came close to this solution using ref
counts, but he basically looped through the list to count them O (n^2) for
each element rather then build the table.

hash tables would have solved in O(n) a few of the other questions they
asked me as well. I obviously knew about hash tables going in, but didn't
have it on the tip of my brain that operations were considered O(1). And I
don't know why I only used a hash table on the "2nd pass" while they were
grilling me...

But now I'm starting to see the wide array of uses for them .

Nobody, Nov 19, 2006
19. ### Rud1ger Sch1erzGuest

"Nobody" <> writes:

> basically the interview question is: given an unsorted listed such as:
>
> 3,1,3,7,1,2,4,4,3

Well, my solution takes 10 compares to find the first unique number (7
at position 3). The number of compares depends on the position, where
there first unique value is. In this example it takes
2 compares for value 0
2 compares for value 1
0 compares for value 2 (skipped, since already marked as not unique)
6 compares for value 3 (4 in rest of list, 2 in notuniquevalues)

Where can I get the job?

Sigh, I just *have to* fiddle around with those puzzles.

Cheers,
Rudiger

main.cc:
#include <iostream>

const size_t len = 9;
int values[len] = {3,1,3,7,1,2,4,4,3};
int notuniquevalues[len]; // array to remember already found not uniques
size_t ni = 0; // count of already found not unique values
bool stillunique[len] = {true, true, true, true, true, true, true, true, true};
int compare_count = 0;

bool IsUnique(int value, size_t pos, size_t len)
{
bool retval = false;

if(pos == len) {
retval = true;
for(int i = 0; i < ni; i++) {
compare_count++;
if(value == notuniquevalues) {
retval = false;
break;
}
}
} else {
if(stillunique[pos]) {
compare_count++;
if(value == values[pos]) {
stillunique[pos] = false;
notuniquevalues[ni++] = value;
} else {
retval = IsUnique(value, pos+1, len);
}
} else {
retval = IsUnique(value, pos+1, len);
}
}

return retval;
}

int main(int argc, char** argv)
{
size_t pos =0;

while(pos < len) {
if(stillunique[pos]) {
if(IsUnique(values[pos], pos+1, len)) {
cout << "Found first unique " << values[pos] << " at position " << pos << endl;
cout << "Count of compares: " << compare_count << endl;
break;
}
}
pos++;
}

return 0;
}

Rud1ger Sch1erz, Nov 28, 2006