Merging Linked Lists

D

Damo

Hi,

I (will) have anythng up to 6 Linked Lists of strings. I want to merge
them and remove duplicate entries at the same time. So that I end up
with one Linked List with every node containing a distinct string. I
dont really want(need) to sort the list. Does anyone know how I would
go about doing this efficiently.
Any advice would be much appreciated.
 
D

Daniel Dyer

Hi,

I (will) have anythng up to 6 Linked Lists of strings. I want to merge
them and remove duplicate entries at the same time. So that I end up
with one Linked List with every node containing a distinct string. I
dont really want(need) to sort the list. Does anyone know how I would
go about doing this efficiently.
Any advice would be much appreciated.

Is LinkedList the best data structure for your needs? Perhaps a Set would
be better if you need no duplicates and don't care about sorting?
 
B

ballpointpenthief

Damo said:
Hi,

I (will) have anythng up to 6 Linked Lists of strings. I want to merge
them and remove duplicate entries at the same time. So that I end up
with one Linked List with every node containing a distinct string. I
dont really want(need) to sort the list. Does anyone know how I would
go about doing this efficiently.
Any advice would be much appreciated.

Read Knuth Vol 3 (I think).
I expect sorting before merging, then merging if no duplicates will be
more efficient.
 
D

Damo

ballpointpenthief said:
Read Knuth Vol 3 (I think).
I expect sorting before merging, then merging if no duplicates will be
more efficient.

But its possible that two of the Linked lists will have the same item.
so even if theyre sorted i only want to merge distinct items from the
second to the first. The only thing I'm sure of is that each list will
not have duplicates within them.
 
D

Daniel Pitts

Damo said:
But its possible that two of the Linked lists will have the same item.
so even if theyre sorted i only want to merge distinct items from the
second to the first. The only thing I'm sure of is that each list will
not have duplicates within them.

Is there any reason not to use a Set instead?
 
D

Damo

Is there any reason not to use a Set instead?


Ye, I was just reading about Sets , there and the more I read the more
inclined I am to use them. How are they as regards efficiency as
opposed to linked lists? Efficiency is my main prioity for this project.
 
D

Daniel Pitts

Damo said:
Ye, I was just reading about Sets , there and the more I read the more
inclined I am to use them. How are they as regards efficiency as
opposed to linked lists? Efficiency is my main prioity for this project.

Well, it certainly depends on what your using them for. However,
efficiency should NEVER be the main priority of any project. Creating
something that WORKS should be the main priority.
<soap-box>
You need to make something that works and is easy to read. After that,
if it runs to slowly, you can run it through a profiler and figure out
what parts of it are taking the most amount of time. You'd often be
surprised.

If you start with something that works and is easy to maintain, then
tweaking the speed will be a lot easier. If you worry about it too
soon (its called premature optimization), then you might end up with a
very slow program anyway, but it would be harder to optimize the RIGHT
parts.
</soap-box>
In any case:
HashSets are often more effecient when adding and removing arbitrary
elements if you need to ensure uniqueness. They are of comparable
speed when iterating through the contained values.
ArrayLists are generally faster than LinkedLists if you only add/remove
from the end of the list, but look up arbitrary elements in the middle
of the list.

If you need to maintain order, then look into LinkedHashSet.
While it has a little more overhead than HashSet, it can be faster when
iterating over the entire set.

So. My professional suggestion is to make all of your types "Set" (or
"Collection" is better, depending on what you need to do), and
instantiate with "new HashSet()". If you find, once you finish the
working project, that the HashSet is too slow for one reason or
another, it will be easy to modify your code to use a different type of
Collection, better suited to your domain.
 
L

Lee Weiner

"Damo" said:
Ye, I was just reading about Sets , there and the more I read the more
inclined I am to use them. How are they as regards efficiency as
opposed to linked lists? Efficiency is my main prioity for this project.

Efficiency measured how? Do Sets use more resources than LinkedLists? Yes.
The whole reason for a Set to exist is to eliminate duplicates. That's what
they do, that's all they do. AFAIC, a Set is the most efficient tool for your
requirement.

Lee Weiner
lee AT leeweiner DOT org
 
D

Daniel Pitts

Lee said:
Efficiency measured how? Do Sets use more resources than LinkedLists? Yes.
The whole reason for a Set to exist is to eliminate duplicates. That's what
they do, that's all they do. AFAIC, a Set is the most efficient tool for your
requirement.

Lee Weiner
lee AT leeweiner DOT org
Actually, a set is likely more effecient in some ways. Also, they are
for much more than just eliminating duplicates.

For example, they are also for effecient "contains" checks.
 
D

Damo

Thanks for the words of advice. I think i'll put some more thought into
which one I use as I need some of the advantages of several of them.
With Linked lists it takes O(1) to insert into a list at an arbirtrary
position, right?
My plan was to iterate through the lists , if the first list contains
the current item in the second list,skip it or else insert it into the
corrsponding position in the first list.
this should I think return one List with no duplicates
 
P

Patricia Shanahan

Damo said:
Thanks for the words of advice. I think i'll put some more thought into
which one I use as I need some of the advantages of several of them.
With Linked lists it takes O(1) to insert into a list at an arbirtrary
position, right?

If you do anything with a specified index, LinkedList scans from the
nearer end of the list to find the place. Finding the next element from
an iterator should be faster.
My plan was to iterate through the lists , if the first list contains
the current item in the second list,skip it or else insert it into the
corrsponding position in the first list.
this should I think return one List with no duplicates

That sounds like an O(n*n) method for merging two lists, each containing
n elements, even if you use iterators.

Sort-and-merge is O(n log n). You only have to look at the head elements
of each list, move the smaller to the result, dropping any duplicates
from either list.

Dumping all the data into a HashSet is O(n). It is also VERY simple to
code, using addAll.

Patricia
 
J

Jhair Tocancipa Triana

Damo said:
Hi,
I (will) have anythng up to 6 Linked Lists of strings. I want to merge
them and remove duplicate entries at the same time.

If you don't want duplicates, what about using Sets instead of a
Lists?
 
L

Lew

Daniel said:
> Is LinkedList the best data structure for your needs? Perhaps a Set
> would be better if you need no duplicates and don't care about sorting?

Daniel said:
> Is there any reason not to use a Set instead?

Daniel said:
So. My professional suggestion is to make all of your types "Set" (or
"Collection" is better, depending on what you need to do), and
instantiate with "new HashSet()".

Lee said:
> The whole reason for a Set to exist is to eliminate duplicates. That's what
> they do, that's all they do. AFAIC, a Set is the most efficient tool for
> your requirement.

Daniel said:
> Actually, a set is likely more efficient in some ways. Also, they are
> for much more than just eliminating duplicates.
>
> For example, they are also for efficient "contains" checks.

Patricia said:
>
> That sounds like an O(n*n) method for merging two lists, each containing
> n elements, even if you use iterators.
>
> Sort-and-merge is O(n log n). You only have to look at the head elements
> of each list, move the smaller to the result, dropping any duplicates
> from either list.
>
> Dumping all the data into a HashSet is O(n). It is also VERY simple to
> code, using addAll.
If you don't want duplicates, what about using Sets instead of a
Lists?

Get the hint??

- Lew
 
O

Oliver Wong

Damo said:
Thanks for the words of advice. I think i'll put some more thought into
which one I use as I need some of the advantages of several of them.
With Linked lists it takes O(1) to insert into a list at an arbirtrary
position, right?

Are we talking about linked lists, or are we talking about Sun's class,
java.util.LinkedList? Inserting into java.util.LinkedList at an arbitrary
position is O(n), the implementation has to "seek" to that arbitrary
position. A different implementation of linked list might have different
asymptotic runtime performance.
My plan was to iterate through the lists , if the first list contains
the current item in the second list,skip it or else insert it into the
corrsponding position in the first list.

Given that you said the list are unsorted, how would you determine what
the "corresponding" position is?
this should I think return one List with no duplicates

Yes, but using a HashSet will probably be faster.

- Oliver
 
L

Lew

Remon said:
Sarcasm is a wonderful thing ;)

Is it sarcasm? I thought it was emphasis that many folks kept repeating the
same advice over and over again, with varying degrees of supporting and
compelling evidence, and that the advice was not being absorbed.

If one person calls you a horse, they're a kook.
If two people call you a horse, it's a joke or a conspiracy.
If three people call you a horse, it's time to buy a saddle.

People kept responding to the OP with good advice. More people kept responding
with the same good advice. And still more people kept responding with still
the same good advice. People put themselves out in good faith trying to help
the OP; their efforts seemed to be in vain. There was no sarcasm in pointing
that out.

- Lew
 
J

John Ersatznom

Lew said:
If one person calls you a horse, they're a kook.
If two people call you a horse, it's a joke or a conspiracy.
If three people call you a horse, it's time to buy a saddle.

No, if three people call you a horse, it's time to feed your killfile,
if need be first saying sayonara to google groups or anything of the
kind. Who has time for anyone that can't even correctly identify you to
species? :)
 
D

Daniel Pitts

Lew said:
Is it sarcasm? I thought it was emphasis that many folks kept repeating the
same advice over and over again, with varying degrees of supporting and
compelling evidence, and that the advice was not being absorbed.

If one person calls you a horse, they're a kook.
If two people call you a horse, it's a joke or a conspiracy.
If three people call you a horse, it's time to buy a saddle.

People kept responding to the OP with good advice. More people kept responding
with the same good advice. And still more people kept responding with still
the same good advice. People put themselves out in good faith trying to help
the OP; their efforts seemed to be in vain. There was no sarcasm in pointing
that out.

- Lew

I'm not so sure it was in vain...
Damo hasn't argued with the point (he's no Twisted after all). There
just isn't an evidence that Damo has tried with the Set and succeeded
or otherwise. Perhaps a follow up with be nice Damo?
 
B

bugbear

Damo said:
Hi,

I (will) have anythng up to 6 Linked Lists of strings. I want to merge
them and remove duplicate entries at the same time. So that I end up
with one Linked List with every node containing a distinct string. I
dont really want(need) to sort the list. Does anyone know how I would
go about doing this efficiently.
Any advice would be much appreciated.

1) bear in mind what other have said about not
using lists at all.

2) If you don't mind some temporary storage,

LinkedList merge(List multipleLists) {
LinkedList newList = new LinkedList();
Set newListSet = new HashSet();

for(Iterator li = multipleLists.iterator(); li.hasNext();) {
List l = (List)li.next();
for(Iterator i = l.iterator(); i.hasNext();) {
Object o = i.next();
if(!newListSet.contains(o)) {
newListSet.add(o);
newList.add(o);
}
}
}
return newList;
}

may serve (untested code)

I'm assuming your multiple lists are held in a list...
I think that's O(N).

It also kind of retains the order of the input lists

BugBear
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top