Filter unique elements in an array algorithm

C

ck

Hi,
I would like to get opinion from the group members about efficiency of
the program below.

I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}
The condition being I cannot use Collection framework, just use array
to get the most efficient algorithm.
Could you guys please tell me if the following algorithm would be
considered efficient?

public class ArrayUniqueElements {

public static int [] filter (int [] input) {
int [] output = new int[input.length];
output[0]=input[0];
boolean flag=true;
int pos=0;
// iteration for each element in input
for (int i=1;i<input.length;i++){
for(int j=0;j<=i;j++){
if(input==output[j]){
flag=false;
break; // exit out of inner loop on first duplicate element
}
}
if(flag){
output[++pos]=input;
} else
flag=true;
}
int [] tempOutput = new int[pos];
// This for loop is required to filter out default values from the
output
for (int i = 0; i < pos; i++) {
tempOutput=output;
}
return tempOutput;
}
}
 
P

Patricia Shanahan

ck said:
Hi,
I would like to get opinion from the group members about efficiency of
the program below.

I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}
The condition being I cannot use Collection framework, just use array
to get the most efficient algorithm.
Could you guys please tell me if the following algorithm would be
considered efficient?

"guys"?

For long arrays, the Omega(N*N) performance is not very good. A hashing
solution would be O(N).

Are you allowed to use System.arraycopy instead of the final loop?

Patricia
 
K

Kai Schwebke

Hello,

Hi,
I would like to get opinion from the group members about efficiency of
the program below.

The program is not efficient - it's complexity is O(n^2). The task could
be solved with O(n*log n) using some sorting algorithm or even faster
with hashing techniques.
The condition being I cannot use Collection framework, just use array
to get the most efficient algorithm.

Don't exclude Collection framework just for performance reasons.
A HashSet or TreeSet will solve the task much more efficient than
your code.



Kai
 
C

Chris Uppal

ck said:
I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}
The condition being I cannot use Collection framework, just use array
to get the most efficient algorithm.

If you are allowed to assume that the range of values to be checked is fairly
small, then you can get a linear algorithm by using a boolean[] array. How big
"fairly small" is depends on your instructor.

-- chris
 
W

Wojtek

ck wrote :
Hi,
I would like to get opinion from the group members about efficiency of
the program below.

I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}

Are the values in the int[] always positive?

If so, you could create a boolean array with a size of the maximum
positive value of an int (all elements will be false by default), then
directly indice the boolean array with the values in the int array
setting the value to true. Next traverse the boolean array and count
the number of true elements (to get the size of the final array). The
final step would be to traverse the boolean array again and store the
indice in your int array.

This requires n*3 passes.
 
C

ck

Are the values in the int[] always positive?

Thank you everyone for sharing opinion about the problem. I was asked
to solve this in an interview with time limit of 5 minutes. Had never
thought about some problem like this in real life.
The length of the list could be anything from units to thousands, and
I was not told that the numbers are going to be positive only
If so, you could create a boolean array with a size of the maximum
positive value of an int (all elements will be false by default), then
directly indice the boolean array with the values in the int array
setting the value to true. Next traverse the boolean array and count
the number of true elements (to get the size of the final array). The
final step would be to traverse the boolean array again and store the
indice in your int array.

This requires n*3 passes.

I actually did not understand this part, I am not sure what is indice.

@ Patricia - Yes I think arraycopy would be a better idea. I am not
sure what is "Hashing solution" that you have mentioned.

@ Kai - Not using Collection was a constraint provided by the person
who was taking the interview.

I will look for more information on different approaches that you all
have suggested.

Thanks
 
P

Patricia Shanahan

ck said:
Are the values in the int[] always positive?

Thank you everyone for sharing opinion about the problem. I was asked
to solve this in an interview with time limit of 5 minutes. Had never
thought about some problem like this in real life.
The length of the list could be anything from units to thousands, and
I was not told that the numbers are going to be positive only
If so, you could create a boolean array with a size of the maximum
positive value of an int (all elements will be false by default), then
directly indice the boolean array with the values in the int array
setting the value to true. Next traverse the boolean array and count
the number of true elements (to get the size of the final array). The
final step would be to traverse the boolean array again and store the
indice in your int array.

This requires n*3 passes.

I actually did not understand this part, I am not sure what is indice.

I would have written "index". Incidentally, this approach is a good idea
if the integers are known to all fall in any narrow range, regardless of
whether the limits are positive or negative. Just add an offset to each
integer before using it as an index.

It may not work so well if the maximum possible range of the integers is
large, even if they are all positive.
@ Patricia - Yes I think arraycopy would be a better idea. I am not
sure what is "Hashing solution" that you have mentioned.

Do the equivalent of using HashSet, but write it yourself to avoid
java.util. In general, I think it is worth understanding the main
algorithms used in java.util. It contributes to understanding their use,
and you may sometimes need to code without a nice collections package.
@ Kai - Not using Collection was a constraint provided by the person
who was taking the interview.

I will look for more information on different approaches that you all
have suggested.

Rather than focusing on our answers to this particular question, which
you will probably never meet again, it might be better to spend the time
on general study of algorithms and data structures. That would equip you
to answer arbitrary questions of this general type.

Patricia
 
J

John W. Kennedy

ck said:
Hi,
I would like to get opinion from the group members about efficiency of
the program below.

I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}
The condition being I cannot use Collection framework, just use array
to get the most efficient algorithm.
Could you guys please tell me if the following algorithm would be
considered efficient?

No. It will take about .5*n^2 trips through the inner loop, and it's
always bad to have a "squared" or "cubed" there.

Patricia has mentioned hashing, which is a good all-round solution. If
your array is known to have reasonably small values, though, an even
better way is to make an array of boolean, one for each possible value.
If it is known to have medium-sized values, it is better yet to make an
array of ints with one int for every 32 possible values, each value
getting a single bit. (The Bitset class from the collections framework
does this automatically.) If the values range up into the tens of
millions, though, this usually isn't such a good idea.
 
D

Daniel Pitts

Are the values in the int[] always positive?

Thank you everyone for sharing opinion about the problem. I was asked
to solve this in an interview with time limit of 5 minutes. Had never
thought about some problem like this in real life.
The length of the list could be anything from units to thousands, and
I was not told that the numbers are going to be positive only


If so, you could create a boolean array with a size of the maximum
positive value of an int (all elements will be false by default), then
directly indice the boolean array with the values in the int array
setting the value to true. Next traverse the boolean array and count
the number of true elements (to get the size of the final array). The
final step would be to traverse the boolean array again and store the
indice in your int array.
This requires n*3 passes.

I actually did not understand this part, I am not sure what is indice.

@ Patricia - Yes I think arraycopy would be a better idea. I am not
sure what is "Hashing solution" that you have mentioned.

@ Kai - Not using Collection was a constraint provided by the person
who was taking the interview.

I will look for more information on different approaches that you all
have suggested.

Thanks

My solution (in under 5 minuets):
// Operational complexity: O(N log N)
public int[] uniques(int[] array) {
if (array == null || array.length < 2) {
return array;
}
Arrays.sort(array);
int top = 1;
int last = array[0];
for (int i : array) {
if (i != last) {
array[top++] = last;
}
last = i;
}
final int[] result = new int[top];
System.arraycopy(array, 0, result, 0, top);
return result;
}
 
D

Daniel Pitts

Are the values in the int[] always positive?
Thank you everyone for sharing opinion about the problem. I was asked
to solve this in an interview with time limit of 5 minutes. Had never
thought about some problem like this in real life.
The length of the list could be anything from units to thousands, and
I was not told that the numbers are going to be positive only
I actually did not understand this part, I am not sure what is indice.
@ Patricia - Yes I think arraycopy would be a better idea. I am not
sure what is "Hashing solution" that you have mentioned.
@ Kai - Not using Collection was a constraint provided by the person
who was taking the interview.
I will look for more information on different approaches that you all
have suggested.

My solution (in under 5 minuets):
// Operational complexity: O(N log N)
public int[] uniques(int[] array) {
if (array == null || array.length < 2) {
return array;
}
Arrays.sort(array);
int top = 1;
int last = array[0];
for (int i : array) {
if (i != last) {
array[top++] = last;
}
last = i;
}
final int[] result = new int[top];
System.arraycopy(array, 0, result, 0, top);
return result;

}

Oops, there's a bug in that code. Should have been:

public static int[] uniques(int[] array) {
if (array == null || array.length < 2) {
return array;
}
Arrays.sort(array);
int top = 1;
int last = array[0];
for (int i : array) {
if (i != last) {
// array[top++] = last; // <-- BUG
array[top++] = i;
}
last = i;
}
final int[] result = new int[top];
System.arraycopy(array, 0, result, 0, top);
return result;
}
 
C

ck

Oops, there's a bug in that code. Should have been:

public static int[] uniques(int[] array) {
if (array == null || array.length < 2) {
return array;
}
Arrays.sort(array);
int top = 1;
int last = array[0];
for (int i : array) {
if (i != last) {
// array[top++] = last; // <-- BUG
array[top++] = i;
}
last = i;
}
final int[] result = new int[top];
System.arraycopy(array, 0, result, 0, top);
return result;
}


Thanks Daniel,
I think there is something wrong with Arrays.sort() method. Let me
know if I am wrong instead
Consider the code below
--- Code ---
public void someMethod(){
int [] input = {1,4,-2,3,4,-1};
System.out.println("just input");
printArray(input);
System.out.println("Clone input before sorting");
printArray(inputClone);
Arrays.sort(inputClone);
System.out.println("Clone input after sorting");
printArray(inputClone);
}
public static void printArray(int [] input){
System.out.print("[");
for (int i=0;i<input.length;i++)
System.out.print(input+ ", ");
System.out.println("]");
}
--- Code End ---

I get output as below
-- Output --
just input
[1, 4, -2, 3, 4, -1, ]
Clone input before sorting
[1, 4, -2, 3, 4, -1, ]
Clone input after sorting
[-2, -1, 1, 3, 4, 4, ] <-- bug??
-- output End --
"<-- bug??" is of course not the part of output.
Hence I wrote filter2 method as follow which actually fails to give
the right output. I think that is due to this probable bug?
--- Code ---
public static int [] filter2(int [] input){
int [] output= new int[input.length];
int [] inputClone = input.clone();
int pos =0;
output[0]=inputClone[0];
for (int i=1;i<inputClone.length;i++){
if (inputClone!=inputClone[i-1]){
output[++pos]=inputClone;
}
}
int [] tempout=new int[pos];
System.arraycopy(output, 0, tempout, 0, pos);
return tempout;
}
--- Code End ---

@ John and @ Particia
Any pointers (like recommended online articles or book to brush up
these things quickly?) Or at least about "hashing"? I am looking at
JDK source code, but haven't been able to make much sense out of the
algorithms that they have used. I probably need to practice more, and
clear up some fundamentals.

Thanks
 
C

ck

Oops, there's a bug in that code. Should have been:
public static int[] uniques(int[] array) {
if (array == null || array.length < 2) {
return array;
[Clipped ]
I think there is something wrong with Arrays.sort() method. Let me

I think I goofed up by thinking that Arrays.sort() is not giving right
output. Sorry about that. I would need to review filter2() method that
I have written.
 
C

ck

Oops, there's a bug in that code. Should have been:
public static int[] uniques(int[] array) {
if (array == null || array.length < 2) {
return array;
}
Arrays.sort(array);
int top = 1;
int last = array[0];
for (int i : array) {
if (i != last) {
// array[top++] = last; // <-- BUG
array[top++] = i;
}
last = i;
}
final int[] result = new int[top];
System.arraycopy(array, 0, result, 0, top);
return result;
}

Thanks Daniel,
I think there is something wrong with Arrays.sort() method. Let me
know if I am wrong instead
Consider the code below
--- Code ---
public void someMethod(){
int [] input = {1,4,-2,3,4,-1};
System.out.println("just input");
printArray(input);
System.out.println("Clone input before sorting");
printArray(inputClone);
Arrays.sort(inputClone);
System.out.println("Clone input after sorting");
printArray(inputClone);}

public static void printArray(int [] input){
System.out.print("[");
for (int i=0;i<input.length;i++)
System.out.print(input+ ", ");
System.out.println("]");}

--- Code End ---

I get output as below
-- Output --
just input
[1, 4, -2, 3, 4, -1, ]
Clone input before sorting
[1, 4, -2, 3, 4, -1, ]
Clone input after sorting
[-2, -1, 1, 3, 4, 4, ] <-- bug??
-- output End --
"<-- bug??" is of course not the part of output.
Hence I wrote filter2 method as follow which actually fails to give
the right output. I think that is due to this probable bug?
--- Code ---
public static int [] filter2(int [] input){
int [] output= new int[input.length];
int [] inputClone = input.clone();
int pos =0;
output[0]=inputClone[0];
for (int i=1;i<inputClone.length;i++){
if (inputClone!=inputClone[i-1]){
output[++pos]=inputClone;
}
}
int [] tempout=new int[pos];
System.arraycopy(output, 0, tempout, 0, pos);
return tempout;}

--- Code End ---

@ John and @ Particia
Any pointers (like recommended online articles or book to brush up
these things quickly?) Or at least about "hashing"? I am looking at
JDK source code, but haven't been able to make much sense out of the
algorithms that they have used. I probably need to practice more, and
clear up some fundamentals.

Thanks

** correction **
Filter2() would be as given below
public static int [] filter2(int [] input){
int [] output= new int[input.length];
int [] inputClone = input.clone();
Arrays.sort(inputClone);
int pos =0;
output[0]=inputClone[0];
for (int i=1;i<inputClone.length;i++){
if (inputClone!=inputClone[i-1]){
output[++pos]=inputClone;
}
}
++pos;
int [] tempout=new int[pos];
System.arraycopy(output, 0, tempout, 0, pos);
return tempout;
}
 
W

Wojtek

Wojtek wrote :
ck wrote :
Hi,
I would like to get opinion from the group members about efficiency of
the program below.

I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}

Are the values in the int[] always positive?

Thought about this some more....

Use two arrays, one for positive integers, the other for negative
integers. This would allow for the full range of possible integer
values.

And you could do a first pass to grab the highest/lowest numbers in the
source array, to limit the size of the arrays you are creating. The
extra time to do this may be saved if the values are small (so the
compiler does not need to initialize the entire integer range).
 
P

Patricia Shanahan

Wojtek said:
Wojtek wrote :
ck wrote :
Hi,
I would like to get opinion from the group members about efficiency of
the program below.

I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}

Are the values in the int[] always positive?

Thought about this some more....

Use two arrays, one for positive integers, the other for negative
integers. This would allow for the full range of possible integer values.

And you could do a first pass to grab the highest/lowest numbers in the
source array, to limit the size of the arrays you are creating. The
extra time to do this may be saved if the values are small (so the
compiler does not need to initialize the entire integer range).

Remember this is supposed to be efficient. Even on machines with enough
memory for two maximum sized arrays, multiple sequential scans of 4GB is
going to give the cache replacement algorithms quite a lot of work.

And even a two element input array could contain both extremes.

If the arrays were the only O(N) solution going, I might consider it. In
practice, for this problem, I would use the OP's original solution for
small input array, and a hash set for large.

Patricia
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,754
Messages
2,569,528
Members
45,000
Latest member
MurrayKeync

Latest Threads

Top