Overlapping region resolution

Discussion in 'Python' started by psaffrey@googlemail.com, May 21, 2009.

  1. Guest

    This may be an algorithmic question, but I'm trying to code it in
    Python, so...

    I have a list of pairwise regions, each with an integer start and end
    and a float data point. There may be overlaps between the regions. I
    want to resolve this into an ordered list with no overlapping
    regions.

    My initial solution was to sort the list by the start point, and then
    compare each adjacent region, clipping any overlapping section in
    half. I've attached code at the bottom. Unfortunately, this does not
    work well if you have sections that have three or more overlapping
    regions.

    A more general solution is to work out where all the overlaps are
    before I start. Then I can break up the region space based on what
    regions overlap each section and take averages of all the datapoints
    that are present in a particular section. Devising an algorithm to do
    this is making my brain hurt. Any ideas?

    Peter


    # also validates the data
    def clipRanges(regions):
    for i in range(len(regions) - 1):
    thispoint = regions
    nextpoint = regions[i+1]
    assert thispoint[1] > thispoint[0] and nextpoint[1] > nextpoint[0],
    "point read not valid"
    thisend = thispoint[1]
    nextstart = nextpoint[0]
    diff = thisend - nextstart
    # a difference of zero is too close together
    if diff > -1:
    if diff % 2 == 1:
    diff += 1
    correction = diff / 2
    newend = thisend - correction
    newstart = newend + 1
    assert newend > thispoint[0] and nextpoint[1] > newstart, "new
    range not valid!"
    newthispoint = (thispoint[0], newend, thispoint[2])
    newnextpoint = (newstart, nextpoint[1], nextpoint[2])
    regions = newthispoint
    regions[i+1] = newnextpoint
    return regions

    regions = [ (0,10,2.5), (12,22,3.5), (15,25,1.2), (23, 30,0.01), (27,
    37,1.23), (30, 35, 1.45) ]
    regions2 = [ (0,10,2.5), (1,11,1.1), (2,12,1.2) ]

    # works fine, produces [(0, 10, 2.5), (12, 18, 3.5), (19, 24, 1.2),
    (25, 28, 0.01), (29, 33, 1.23), (34, 35, 1.45)]
    print clipRanges(regions)
    # violates "new range not valid" assertion
    print clipRanges(regions2)
    , May 21, 2009
    #1
    1. Advertising

  2. MRAB Guest

    wrote:
    > This may be an algorithmic question, but I'm trying to code it in
    > Python, so...
    >
    > I have a list of pairwise regions, each with an integer start and end
    > and a float data point. There may be overlaps between the regions. I
    > want to resolve this into an ordered list with no overlapping
    > regions.
    >
    > My initial solution was to sort the list by the start point, and then
    > compare each adjacent region, clipping any overlapping section in
    > half. I've attached code at the bottom. Unfortunately, this does not
    > work well if you have sections that have three or more overlapping
    > regions.
    >
    > A more general solution is to work out where all the overlaps are
    > before I start. Then I can break up the region space based on what
    > regions overlap each section and take averages of all the datapoints
    > that are present in a particular section. Devising an algorithm to do
    > this is making my brain hurt. Any ideas?
    >
    > Peter
    >
    >
    > # also validates the data
    > def clipRanges(regions):
    > for i in range(len(regions) - 1):
    > thispoint = regions
    > nextpoint = regions[i+1]
    > assert thispoint[1] > thispoint[0] and nextpoint[1] > nextpoint[0],
    > "point read not valid"
    > thisend = thispoint[1]
    > nextstart = nextpoint[0]
    > diff = thisend - nextstart
    > # a difference of zero is too close together
    > if diff > -1:
    > if diff % 2 == 1:
    > diff += 1
    > correction = diff / 2
    > newend = thisend - correction
    > newstart = newend + 1
    > assert newend > thispoint[0] and nextpoint[1] > newstart, "new
    > range not valid!"
    > newthispoint = (thispoint[0], newend, thispoint[2])
    > newnextpoint = (newstart, nextpoint[1], nextpoint[2])
    > regions = newthispoint
    > regions[i+1] = newnextpoint
    > return regions
    >
    > regions = [ (0,10,2.5), (12,22,3.5), (15,25,1.2), (23, 30,0.01), (27,
    > 37,1.23), (30, 35, 1.45) ]
    > regions2 = [ (0,10,2.5), (1,11,1.1), (2,12,1.2) ]
    >
    > # works fine, produces [(0, 10, 2.5), (12, 18, 3.5), (19, 24, 1.2),
    > (25, 28, 0.01), (29, 33, 1.23), (34, 35, 1.45)]
    > print clipRanges(regions)
    > # violates "new range not valid" assertion
    > print clipRanges(regions2)


    Is the upper bound of a range inclusive or exclusive? If it's exclusive
    (like in Python) then it's empty only if lower == upper, and an overlap
    occurs only if first.upper > second.lower (and you can have newstart ==
    newend in your code).
    MRAB, May 21, 2009
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. M West
    Replies:
    1
    Views:
    473
  2. Andrew FPGA
    Replies:
    0
    Views:
    968
    Andrew FPGA
    Sep 26, 2005
  3. JakeC
    Replies:
    1
    Views:
    2,738
    Cowboy \(Gregory A. Beamer\)
    Dec 15, 2003
  4. SAL

    #Region #End Region issue

    SAL, Aug 29, 2008, in forum: ASP .Net
    Replies:
    1
    Views:
    355
    Alexey Smirnov
    Aug 29, 2008
  5. Bubu
    Replies:
    0
    Views:
    113
Loading...

Share This Page