Is Set faster than List

C

ck

Hello everyone,
is Set faster than List for most of the operations? I implemented
prime number Generation by "Sieve of Eratosthenes" algorithm. One
method uses List other Set. After Benchmarking both the algorithms, I
find that List are faster than Set for if the number of elements to be
generated is small. But for larger it is Set that is significantly
faster than List.
Is this due to, how Set and List are implemented? and is it a general
rule to use Set over List where element repetition is not significant?
Why benchmarking in Eclipse are slower as compared to console(while
running the benchmark in console, I keep eclipse open, and vice
versa)? I presume that foreign process that could effect the benchmark
in both the cases remain the same.

Thanks
 
C

Christian

ck said:
Hello everyone,
is Set faster than List for most of the operations? I implemented
prime number Generation by "Sieve of Eratosthenes" algorithm. One
method uses List other Set. After Benchmarking both the algorithms, I
find that List are faster than Set for if the number of elements to be
generated is small. But for larger it is Set that is significantly
faster than List.
Is this due to, how Set and List are implemented? and is it a general
rule to use Set over List where element repetition is not significant?
Why benchmarking in Eclipse are slower as compared to console(while
running the benchmark in console, I keep eclipse open, and vice
versa)? I presume that foreign process that could effect the benchmark
in both the cases remain the same.

Thanks

totally based on the implementation.

List is for example an ArrayList then the contains Operation is in O(n)

Set is for example an HashSet then the contains Operation is in O(1)

there you go .. with few elements you may be faster and don't care about
the Asymptotic runtime ..

Just have a look at what you implementation is .

General Rule for me is: use what fits you most! If two things fit
equally well, look at the asymptotic runtime of the operations you need.
A Set is not more than a List that checks for contains() before it
inputs an element.

Christian
 
E

Eric Sosman

ck wrote On 03/08/07 14:54,:
Hello everyone,
is Set faster than List for most of the operations?

No. Both Set and List are interfaces, and interfaces
have no executable code, so neither is "fast" or "slow"
or anything in between.
I implemented
prime number Generation by "Sieve of Eratosthenes" algorithm. One
method uses List other Set. After Benchmarking both the algorithms, I
find that List are faster than Set for if the number of elements to be
generated is small. But for larger it is Set that is significantly
faster than List.
Is this due to, how Set and List are implemented?

It might be, depending on which Set and List implementations
you used and on how you made your measurements.
and is it a general
rule to use Set over List where element repetition is not significant?

No. Use whichever type of container is the best fit for
the problem at hand.
Why benchmarking in Eclipse are slower as compared to console(while
running the benchmark in console, I keep eclipse open, and vice
versa)? I presume that foreign process that could effect the benchmark
in both the cases remain the same.

Insufficient information, and an unwarranted presumption.
 
M

Mark Rafn

ck said:
is Set faster than List for most of the operations?

Neither Set nor List specify any speed or complexity guarantees for any
operation. It will depend on the actual implementation you use.
prime number Generation by "Sieve of Eratosthenes" algorithm. One
method uses List other Set.

ArrayList and HashSet? Or LinkedList and TreeSet? Or something else?
Is this due to, how Set and List are implemented?

Set and List are interfaces, not implementations. So "yes and no". It's due
to how the objects you're using are implemented, not how Set and List are
implemented in general.
and is it a general rule to use Set over List where element repetition
is not significant?

They're different things. It's a general rule to use the correct container
for your needs. Sets are for use where you want to have no repeats and no
indexed access (i.e. "4th element"). Lists are for when you care about index
or order, or want repeats.

Each of them has a number of different implementations to pick from depending
on your exact needs. TreeSet, for instance, can be iterated based on a sort
order, where HashSet has no order at all. ArrayList has fast lookup by
position but can be expensive to add elements if it's not sized properly.
LinkedList has slow lookup by position.
 
C

ck

Neither Set nor List specify any speed or complexity guarantees for any
operation. It will depend on the actual implementation you use.


ArrayList and HashSet? Or LinkedList and TreeSet? Or something else?


Set and List are interfaces, not implementations. So "yes and no". It's due
to how the objects you're using are implemented, not how Set and List are
implemented in general.

I do know that Set and List are interface that collectively fall under
Collection. I meant to question that, in general any Set
implementation like HashSet or a TreeSet (I have used HashSet) would
be faster than List like ArrayList or LinkedList or Vector(I used
ArrayList)

They're different things. It's a general rule to use the correct container
for your needs. Sets are for use where you want to have no repeats and no
indexed access (i.e. "4th element"). Lists are for when you care about index
or order, or want repeats.

For that matter for a small collection(where elements are less than 0)
ArrayList (sorted) seems to be faster with or without repeated values.
Where as its inverse for a bigger Collection.
Each of them has a number of different implementations to pick from depending
on your exact needs. TreeSet, for instance, can be iterated based on a sort
order, where HashSet has no order at all. ArrayList has fast lookup by
position but can be expensive to add elements if it's not sized properly.
LinkedList has slow lookup by position.

Would it be wise to consider a collection on basis of its size?

@ Eric
Insufficient information, and an unwarranted presumption.

The benchmarking for following code on windows xp AMD 64 1.8 GHz 1 GB
RAM.
Applications running - Firefox, Eclipse, Command prompt instance.
Background services like Etrust Antivirus, Network services etc..

--------- code ------------
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class Prime {

// TreeSet implementation
private static Set calculatePrime1(int number){
Set<Integer> listA = null;
Set<Integer> finalList = null;
int count =0;
listA=new TreeSet<Integer>();
finalList= new TreeSet<Integer>();
for(int i=2;i<number;i+=2)
listA.add(i);
while (count<number){
Iterator<Integer> it = listA.iterator();
count=it.next();
finalList.add(count);
listA.remove(count);
int internalCount=count;
while(internalCount<number){
internalCount += count;
listA.remove(internalCount);
}
if(listA.isEmpty())
break;
}
return finalList;
}

// ArrayList implementation algorithm same as Treeset

private static List calculatePrime2(int count){
List<Integer> list = new ArrayList<Integer>(count);
List<Integer> primeFound = new ArrayList<Integer>();
list.add((Integer)2);
for (Integer i = 3; i <= count; i+=2) {
list.add(i);
}
int divisor=3;
while (divisor<count/2) {
primeFound.add(list.get(0));
divisor=list.get(0);
for (int i = divisor; i<=count; i+=divisor) {
list.remove((Object)i);
}
}
primeFound.addAll(list);
return (List<Integer>)primeFound;
}

// ArrayList with slight difference from previous approach

private static List calculatePrime3(int count){
List<Integer> list = new ArrayList<Integer>(count);
List<Integer> primeFound = new ArrayList<Integer>();
for (Integer i = 3; i <= count; i+=2) {
list.add(i);
}
int divisor=3;
while (divisor<count) {
Iterator<Integer> it = list.iterator();
divisor=it.next();
primeFound.add(divisor);
int internalCount=divisor;
list.remove(0);
while(internalCount<count) {
internalCount = internalCount+ divisor;
list.remove((Integer)internalCount);
}
if (list.isEmpty())
break;
}
primeFound.addAll(list);
return (List<Integer>)primeFound;
}

public static void main(String [] args){
BenchMark b = new BenchMark();
int count=100;
calculatePrime1(count);
System.out.println(b.finish());
b = new BenchMark();
calculatePrime2(count);
System.out.println(b.finish());
b= new BenchMark();
calculatePrime3(count);
System.out.println(b.finish());
}
}

--------- Benchmark code -----------

public class BenchMark {
private long start;
public BenchMark() {
start = System.nanoTime();
}
public long finish(){
return System.nanoTime()-start;
}
}

--------- Code ends ------------

Bench mark output in eclipse

first run -->

1917283
2136026
331327

second run -->

1920076
2136584
331607

third run -->

1909740
2231848
330488

Benchmark output in Command prompt

first run -->
2121220
1596851
338032

second run -->
2189943
1593778
336076

third run -->
2107251
1597968
339150

is there some more information that would be helpful?
 
C

ck

In my previous post
For that matter for a small collection(where elements are less than 0)

I meant to type For that matter for a small collection(where elements
are less than 1000)
Sorry about the typo
 
J

Joshua Cranmer

ck said:
I do know that Set and List are interface that collectively fall under
Collection. I meant to question that, in general any Set
implementation like HashSet or a TreeSet (I have used HashSet) would
be faster than List like ArrayList or LinkedList or Vector(I used
ArrayList)
Collection doesn't mean much other than "this is an aggregate of data."
The answer to your question: Any Set implementation is not necessarily
faster than List. The reason there are different implementations is that
different implementations have different times: e.g., ArrayList has
O(1) random access, but O(N) deletion. LinkedList has O(N) random access
but O(1) deletion, if you are using an iterator on that node. Computer
science rarely has XXX is /always/ better than YYY, but has XXX is
better for a, b, and c, whereas YYY outperforms on d, e, and f.
For that matter for a small collection(where elements are less than 0)
ArrayList (sorted) seems to be faster with or without repeated values.
Where as its inverse for a bigger Collection.


Would it be wise to consider a collection on basis of its size?

@ Eric
Not really, no. Collections should be decided on how they are used. If
you need an aggregate dump of unique objects, HashSet or TreeSet work
fine. If you need a list of numbers, than ArrayList or LinkedList is better.
The benchmarking for following code on windows xp AMD 64 1.8 GHz 1 GB
RAM.
Applications running - Firefox, Eclipse, Command prompt instance.
Background services like Etrust Antivirus, Network services etc..

<scissor happy>

is there some more information that would be helpful?

P.S. The only code I've seen for Sieve of Erasthiones uses BitSet.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top