Help with Searching TreeMaps containing range objects

T

tim felton

Hi, I am somewhat of a noob when it comes to Java but some assistance would
be great.

I have two TreeSet objects. They each hold instances of objects that
implement the following interface :

abstract interface IBioFeature extends Comparable, Serializable
{
public String getUID();
public int getStart();
public int getStop();
public int getStrand();
public String getSequence() throws SQLException,
ClassNotFoundException;
public String getContigIdentifier();
}

Note that this interface represents a range in a genomic sequence - that for
all purposes
may be regarded as a large String.

I would like to find elements in each TreeSet whose ranges partially
overlap. I do not want to search both collections and compare values from
getStart() and getStop() as this is very inefficient for large collections
of IBioFeature objects.

Any ideas would be greatly appreciated
 
B

Boris Stumm

tim said:
I have two TreeSet objects. They each hold instances of objects that
implement the following interface :
abstract interface IBioFeature extends Comparable, Serializable {
public String getUID();
public int getStart();
public int getStop();
public int getStrand();
public String getSequence() throws SQLException,
ClassNotFoundException;
public String getContigIdentifier();
}

I would like to find elements in each TreeSet whose ranges partially
overlap. I do not want to search both collections and compare values from
getStart() and getStop() as this is very inefficient for large collections
of IBioFeature objects.

Do the elements in _one_ of the TreeSets overlap (1), or can this only happen
if I take two elements of different TreeSets (2)? What is the sort key?

In the first case, it would be helpful to know the maximum element length to be
able to optimize and get an O(n) performance. In the second case thats much
easier. If in the first case (overlapping in _one_ TreeSet) there is no
maximum element length, i doubt you will get better than O(n^2) or O(n log n),
but i may be wrong.

So, for an efficient solution there is some more information required, as
stated above.

Boris Stumm
 
R

Ray Tayek

tim said:
Hi, I am somewhat of a noob when it comes to Java but some assistance would
be great.

I have two TreeSet objects. They each hold instances of objects...
abstract interface IBioFeature extends Comparable, Serializable
public String getUID(); public int getStart(); public int getStop(); public int getStrand(); public String getSequence() .. public String getContigIdentifier();

how are you sorting them? the same way?
I would like to find elements in each TreeSet whose ranges partially
overlap. I do not want to search both collections and compare values from
getStart() and getStop() as this is very inefficient for large collections
of IBioFeature objects. ...

consider combining the two into another with perhaps another sort order.
you could maybe define < if max 1 < min 2 and > if max 2 < min 1
otherwise consider them equal. this would get the overlapping ones into
one set perhaps? or refine the above ordering into a total ordering some
way.

or, since the collections just hold refrences, maybe make lists (from/or
sorted sets) of these guys in different sort orders? or find a useful
total ordering on the ones that do overlap, and put others in an other
collection?

just some thoughts off the top of my head.

thanks
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top