medians for degree measurements

S

Steve Howell

I just saw the thread for medians, and it reminded me of a problem
that I need to solve. We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg. Calculating arithmetic medians is
straightforward, but compass bearings add a twist.

The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4. But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.

Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6. And you'd be good.

But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.

Has anybody solved this in Python, either for compass bearings or a
different domain? I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.
 
R

Robert Kern

I just saw the thread for medians, and it reminded me of a problem
that I need to solve. We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg. Calculating arithmetic medians is
straightforward, but compass bearings add a twist.

The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4. But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.

Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6. And you'd be good.

But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.

Has anybody solved this in Python, either for compass bearings or a
different domain? I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.

Brute force doesn't suck too much when using numpy to do the heavy lifting.

In [158]: import numpy as np

# Worst case. A von Mises distribution "centered" around the "South pole"
# at pi/-pi.
In [159]: theta = np.random.vonmises(np.pi, 1.0, size=1000)

# Complex division makes circular distances easier to compute.
In [160]: z = np.exp(1j * theta)

# The newaxis bit plus numpy's broadcasting yields a 1000x1000 array of
# the quotient of each pair of points in the dataset.
In [161]: zdiv = z / z[:,np.newaxis]

# Convert back to angular distances. Take the absolute value. Sum along one
# dimension. Find the index of the element with the smallest mean absolute
# circular deviation when used as the putative median.
In [162]: i = abs(np.arctan2(zdiv.imag, zdiv.real)).sum(axis=1).argmin()

# As expected, the result is close to the mode of the distribution.
In [163]: theta
Out[163]: 3.0886753423606521

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
M

MRAB

Steve said:
I just saw the thread for medians, and it reminded me of a problem
that I need to solve. We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg. Calculating arithmetic medians is
straightforward, but compass bearings add a twist.

The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4. But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.

Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6. And you'd be good.

But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.

Has anybody solved this in Python, either for compass bearings or a
different domain? I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.
When you read the headings clockwise the values normally increase and
you pick the middle one. However, when you cross north the values
decrease. You can spot whether the set of headings crosses north by
checking whether the difference between the minimum and maximum is
greater than 180. If it is then make the westerly headings negative,
sort the values, and pick the middle one, adding 180 if it's negative.

def compass_median(points):
points = sorted(points)
if points[-1] - points[0] > 180:
points = [(p - 360 if p > 180 else p) for p in points]
points.sort()
median = points[len(points) // 2]
return median + 360 if median < 0 else median
 
S

Steve Howell

Steve said:
I just saw the thread for medians, and it reminded me of a problem
that I need to solve.  We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg.  Calculating arithmetic medians is
straightforward, but compass bearings add a twist.
The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4.  But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.
Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6.  And you'd be good.
But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.
Has anybody solved this in Python, either for compass bearings or a
different domain?  I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.

When you read the headings clockwise the values normally increase and
you pick the middle one. However, when you cross north the values
decrease. You can spot whether the set of headings crosses north by
checking whether the difference between the minimum and maximum is
greater than 180. If it is then make the westerly headings negative,
sort the values, and pick the middle one, adding 180 if it's negative.

def compass_median(points):
     points = sorted(points)
     if points[-1] - points[0] > 180:
         points = [(p - 360 if p > 180 else p) for p in points]
         points.sort()
     median = points[len(points) // 2]
     return median + 360 if median < 0 else median

I like this implementation, and it would probably work 99.9999% of the
time for my particular use case. The only (very contrived) edge case
that I can think of is when you have 10 bearings to SSW, 10 bearings
to SSE, and the two outliers are unfortunately in the NE and NW
quadrants. It seems like the algorithm above would pick one of the
outliers.

Maybe the refinement to the algorithm above would be to find the mean
first, by summing sines and cosines of the bearings, taking the
quotient, and applying the arctangent. Then use the resulting angle
as the equivalent of "due north" and adjust angles to be within (-180,
180) respect to the mean, pretty much as you do in the code above,
with minor modifications.

I realize the problem as I stated it as sort of ill-defined.

The way you stated the solution made me realize more deeply that you
basically have a list that needs to be rotated either clockwise or
counterclockwise in certain situations. Instead of using the 180-
degree check to coarsely (but usually effectively) rotate your frame
of reference, you could instead use the mean to eliminate even more
edge cases.
 
N

Nobody

I just saw the thread for medians, and it reminded me of a problem
that I need to solve. We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg. Calculating arithmetic medians is
straightforward, but compass bearings add a twist.

The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4. But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.

Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6. And you'd be good.

But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.

Has anybody solved this in Python, either for compass bearings or a
different domain? I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.

First, there isn't always a solution; what would you consider to be the
median of [0, 90, 180, 270]?

In the case where the bearings are clustered, one approach is to
convert each bearing from polar to cartesian coordinates, compute the
centroid, then convert back to polar coordinates, i.e.:

from math import degrees, radians, sin, cos, atan2

def mean(bearings):
x = sum(sin(radians(a)) for a in bearings)
y = sum(cos(radians(a)) for a in bearings)
return degrees(atan2(x, y))

Then, subtract the mean from each bearing, coerce all angles into the
range -180..+180, calculate the median, add the mean, coerce back to
0..360.

def median(bearings):
m = mean(bearings)
bearings = [(a - m + 180) % 360 - 180 for a in bearings]
bearings.sort()
median = bearings[len(bearings) / 2]
median += m
median %= 360
return median
 
S

Steve Howell

I just saw the thread for medians, and it reminded me of a problem
that I need to solve.  We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg.  Calculating arithmetic medians is
straightforward, but compass bearings add a twist.
The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4.  But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.
Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6.  And you'd be good.
But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.
Has anybody solved this in Python, either for compass bearings or a
different domain?  I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.

First, there isn't always a solution; what would you consider to be the
median of [0, 90, 180, 270]?

You probably posted before seeing my recent post. I agree that the
problem is ill-defined for certain cases.
In the case where the bearings are clustered, one approach is to
convert each bearing from polar to cartesian coordinates, compute the
centroid, then convert back to polar coordinates, i.e.:

        from math import degrees, radians, sin, cos, atan2

        def mean(bearings):
                x = sum(sin(radians(a)) for a in bearings)
                y = sum(cos(radians(a)) for a in bearings)
                return degrees(atan2(x, y))

Then, subtract the mean from each bearing, coerce all angles into the
range -180..+180, calculate the median, add the mean, coerce back to
0..360.

        def median(bearings):
                m = mean(bearings)
                bearings = [(a - m + 180) % 360 - 180 for a in bearings]
                bearings.sort()
                median = bearings[len(bearings) / 2]
                median += m
                median %= 360
                return median

Yep, that's exactly the solution I'm leaning toward now.
 
T

Terry Reedy

On Jan 22, 5:12 pm, MRAB<[email protected]> wrote:
I like this implementation, and it would probably work 99.9999% of the
time for my particular use case. The only (very contrived) edge case
that I can think of is when you have 10 bearings to SSW, 10 bearings
to SSE, and the two outliers are unfortunately in the NE and NW
quadrants. It seems like the algorithm above would pick one of the
outliers.

Maybe the refinement to the algorithm above would be to find the mean
first, by summing sines and cosines of the bearings, taking the
quotient, and applying the arctangent. Then use the resulting angle
as the equivalent of "due north" and adjust angles to be within (-180,
180) respect to the mean, pretty much as you do in the code above,
with minor modifications.

I was going to suggest this. Let us know if it seems to work.
I realize the problem as I stated it as sort of ill-defined.

Yes, I think this one reason stat books typically ignore directional
data. I think it is an unfortunate omission.

Terry Jan Reedy
 
S

Steven D'Aprano

[...]
I like this implementation, and it would probably work 99.9999% of the
time for my particular use case. The only (very contrived) edge case
that I can think of is when you have 10 bearings to SSW, 10 bearings to
SSE, and the two outliers are unfortunately in the NE and NW quadrants.
It seems like the algorithm above would pick one of the outliers.

The trouble is that median of angular measurements is not a meaningful
concept. The median depends on the values being ordered, but angles can't
be sensibly ordered. Which is larger, 1 degree north or 359 degrees? Is
the midpoint between them 0 degree or 180 degree?

The median of the number line 0 <= x <= 360 is 180, but what's the median
of the circle 0 deg <= theta <= 360 deg? With no end points, it's
meaningless to talk about the middle.

For this reason, I expect that taking the median of bearing measurements
isn't well defined. You can probably get away with it so long as the
bearings are "close", where "close" itself is ill-defined.

In any case, this isn't a programming problem, it's a mathematics
problem. I think you're better off taking it to a maths or statistics
forum, and see what they say.
 
A

Arnaud Delobelle

Steve Howell said:
I just saw the thread for medians, and it reminded me of a problem
that I need to solve. We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg. Calculating arithmetic medians is
straightforward, but compass bearings add a twist.

The numerical median of 1, 2, 3, 4, 5, 6, 359 is 4. But for
navigational purposes you would actually order the numbers 359, 1, 2,
3, 4, 5, 6, so the desired median heading of the boat is actually 3.

Of course, you could express 359 better as -1 degrees to north, then
the sequence would be -1, 1, 2, 3, 4, 5, and 6. And you'd be good.

But that trick does not generalize if you go south instead, as you
have similar issues with -179, 174, 175, 176, 177, 178, and 179.

Has anybody solved this in Python, either for compass bearings or a
different domain? I can think of kind of a brute force solution where
you keep rotating the sequence until the endpoints are closest
together mod 360, but I wonder if there is something more elegant.

Here's a simple (too simple?) way to do it:

1. sort the bearings in ascending order
2. find the largest gap between two consecutive bearings
3. cut the circle at this point and now find the median the
normal way

In Python:
.... bearings = sorted(bearings)
.... n = len(bearings)
.... i = max(xrange(n), key=lambda i: (bearings[(i+1)%n] - bearings)%360)
.... return bearings[(i + (n+1)//2)%n]
....
median_bearing([1,2,3,4,5,6,359]) 3
median_bearing([-179, 174, 175, 176, 177, 178, 179])
177

I guess there may be cases where the results that it gives are still not
very good, as in general there isn't a good notion of median for cyclic
data. E.g. what would be the median of [0, 180] - it could equally be
090 or 270. Or the median of [0, 120, 240] has the same problem. But I
suppose your data couldn't be like this as then it would be ill-advised
to try to apply a notion of median to it.

But it will work well I believe with quite localized data set
with a few, even wildly inaccurate, outliers. E.g.
median_bearing([123, 124, 125, 125, 126, 10, 240])
125

HTH
 
R

Robert Kern

Steve Howell wrote:
I just saw the thread for medians, and it reminded me of a problem
that I need to solve. We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg. Calculating arithmetic medians is
straightforward, but compass bearings add a twist.
[...]
I like this implementation, and it would probably work 99.9999% of the
time for my particular use case. The only (very contrived) edge case
that I can think of is when you have 10 bearings to SSW, 10 bearings to
SSE, and the two outliers are unfortunately in the NE and NW quadrants.
It seems like the algorithm above would pick one of the outliers.

The trouble is that median of angular measurements is not a meaningful
concept. The median depends on the values being ordered, but angles can't
be sensibly ordered. Which is larger, 1 degree north or 359 degrees? Is
the midpoint between them 0 degree or 180 degree?

Then don't define the median that way. Instead, define the median as the point
that minimizes the sum of the absolute deviations of the data from that point
(the L1 norm of the deviations, for those familiar with that terminology). For
1-D data on the real number line, that corresponds to sorting the data and
taking the middle element (or the artithmetic mean of the middle two in the case
of even-numbered data). My definition applies to other spaces, too, that don't
have a total order attached to them including the space of angles.

The "circular median" is a real, well-defined statistic that is used for exactly
what the OP intends to use it for.
The median of the number line 0<= x<= 360 is 180, but what's the median
of the circle 0 deg<= theta<= 360 deg? With no end points, it's
meaningless to talk about the middle.

For this reason, I expect that taking the median of bearing measurements
isn't well defined. You can probably get away with it so long as the
bearings are "close", where "close" itself is ill-defined.
>
In any case, this isn't a programming problem, it's a mathematics
problem. I think you're better off taking it to a maths or statistics
forum, and see what they say.

<puts on statistics hat>
Same answer with my statistics hat on or off. :)
</puts on statistics hat>

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
B

Bas

[snip problem with angle data wrapping around at 360 degrees]

Hi,

This problem is trivial to solve if you can assume that you that your
data points are measured consecutively and that your boat does not
turn by more than 180 degrees between two samples, which seems a
reasonable use case. If you cannot make this assumption, the answer
seems pretty arbitrary to me anyhow. The standard trick in this
situation is to 'unwrap' the data (fix > 180 deg jumps by adding or
subtracting 360 to subsequent points), do your thing and then 'rewrap'
to your desired interval ([0-355] or [-180,179] degrees).

In [1]: from numpy import *

In [2]: def median_degree(degrees):
...: return mod(rad2deg(median(unwrap(deg2rad(degrees)))),360)
...:

In [3]: print(median_degree([1, 2, 3, 4, 5, 6, 359]))
3.0

In [4]: print(median_degree([-179, 174, 175, 176, 177, 178, 179]))
177.0

If the deg2rad and rad2deg bothers you, you should write your own
unwrap function that handles data in degrees.

Hope this helps,
Bas

P.S.
Slightly off-topic rant against both numpy and matlab implementation
of unwrap: They always assume data is in radians. There is some option
to specify the maximum jump size in radians, but to me it would be
more useful to specify the interval of a complete cycle, so that you
can do

unwrapped_radians = unwrap(radians)
unwrapped_degrees = unwrap(degrees, 360)
unwrapped_32bit_counter = unwrap(overflowing_counter, 2**32)
 
R

Robert Kern

P.S.
Slightly off-topic rant against both numpy and matlab implementation
of unwrap: They always assume data is in radians. There is some option
to specify the maximum jump size in radians, but to me it would be
more useful to specify the interval of a complete cycle, so that you
can do

unwrapped_radians = unwrap(radians)
unwrapped_degrees = unwrap(degrees, 360)
unwrapped_32bit_counter = unwrap(overflowing_counter, 2**32)

Rants accompanied with patches are more effective. :)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
B

Bas

Rants accompanied with patches are more effective. :)

As you wish (untested):

def unwrap(p, cycle=2*pi, axis=-1):
"""docstring to be updated"""
p = asarray(p)
half_cycle = cycle / 2
nd = len(p.shape)
dd = diff(p, axis=axis)
slice1 = [slice(None, None)]*nd # full slices
slice1[axis] = slice(1, None)
ddmod = mod(dd+half_cycle, cycle)-half_cycle
_nx.putmask(ddmod, (ddmod==-half_cycle) & (dd > 0), half_cycle)
ph_correct = ddmod - dd;
_nx.putmask(ph_correct, abs(dd)<half_cycle, 0)
up = array(p, copy=True, dtype='d')
up[slice1] = p[slice1] + ph_correct.cumsum(axis)
return up

I never saw a use case for the discontinuity argument, so in my
preferred version it would be removed. Of course this breaks old code
(by who uses this option anyhow??) and breaks compatibility between
matlab and numpy.

Chears,
Bas
 
R

Robert Kern

Rants accompanied with patches are more effective. :)

As you wish (untested):

def unwrap(p, cycle=2*pi, axis=-1):
"""docstring to be updated"""
p = asarray(p)
half_cycle = cycle / 2
nd = len(p.shape)
dd = diff(p, axis=axis)
slice1 = [slice(None, None)]*nd # full slices
slice1[axis] = slice(1, None)
ddmod = mod(dd+half_cycle, cycle)-half_cycle
_nx.putmask(ddmod, (ddmod==-half_cycle)& (dd> 0), half_cycle)
ph_correct = ddmod - dd;
_nx.putmask(ph_correct, abs(dd)<half_cycle, 0)
up = array(p, copy=True, dtype='d')
up[slice1] = p[slice1] + ph_correct.cumsum(axis)
return up

I never saw a use case for the discontinuity argument, so in my
preferred version it would be removed. Of course this breaks old code
(by who uses this option anyhow??) and breaks compatibility between
matlab and numpy.

Sometimes legitimate features have phase discontinuities greater than pi. If you
want your feature to be accepted, please submit a patch that does not break
backwards compatibility and which updates the docstring and tests appropriately.
I look forward to seeing the complete patch! Thank you.

http://projects.scipy.org/numpy

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
B

Bas

[snip]
I never saw a use case for the discontinuity argument, so in my
preferred version it would be removed. Of course this breaks old code
(by who uses this option anyhow??) and breaks compatibility between
matlab and numpy.

Sometimes legitimate features have phase discontinuities greater than pi.

We are dwelling more and more off-topic here, but anyhow: According to
me, the use of unwrap is inherently related to measurement instruments
that wrap around, like rotation encoders, interferometers or up/down
counters. Say you have a real phase step of +1.5 pi, how could you
possibly discern if from a real phase step of -pi/2? This is like an
aliasing problem, so the only real solution would be to increase the
sampling speed of your system. To me, the discontinuity parameter is
serving some hard to explain corner case (see matlab manual), which is
better left to be solved by hand in cases it appears. I regret matlab
ever added the feature.
If you want your feature to be accepted, please submit a patch that does not break
backwards compatibility and which updates the docstring and tests appropriately.
I look forward to seeing the complete patch! Thank you.

I think my 'cycle' argument does have real uses, like the degrees in
this thread and the digital-counter example (which comes from own
experience and required me to write my own unwrap). I'll try to submit
a non-breaking patch if I ever have time.

Bas
 
S

Steve Howell

Steve Howell wrote:
I just saw the thread for medians, and it reminded me of a problem
that I need to solve.  We are writing some Python software for
sailing, and we need to detect when we've departed from the median
heading on the leg.  Calculating arithmetic medians is
straightforward, but compass bearings add a twist. [...]
I like this implementation, and it would probably work 99.9999% of the
time for my particular use case.  The only (very contrived) edge case
that I can think of is when you have 10 bearings to SSW, 10 bearings to
SSE, and the two outliers are unfortunately in the NE and NW quadrants..
It seems like the algorithm above would pick one of the outliers.
The trouble is that median of angular measurements is not a meaningful
concept. The median depends on the values being ordered, but angles can't
be sensibly ordered. Which is larger, 1 degree north or 359 degrees? Is
the midpoint between them 0 degree or 180 degree?

Then don't define the median that way. Instead, define the median as the point
that minimizes the sum of the absolute deviations of the data from that point
(the L1 norm of the deviations, for those familiar with that terminology).. For
1-D data on the real number line, that corresponds to sorting the data and
taking the middle element (or the artithmetic mean of the middle two in the case
of even-numbered data). My definition applies to other spaces, too, that don't
have a total order attached to them including the space of angles.

The "circular median" is a real, well-defined statistic that is used for exactly
what the OP intends to use it for.

I admitted pretty early in the thread that I did not define the
statistic with much rigor, although most people got the gist of the
problem, and as Robert points out, you can more clearly the define the
problem, although I think under any definition, some inputs will have
multiple solutions, such as (0, 90, 180, 270) and (0, 120, 240). If
you've ever done lake sailing, you probably have encountered days
where the wind seems to be coming from those exact angles.

This is the code that I'll be using (posted by "Nobody"). I'll report
back it if it has any issues.


def mean(bearings):
x = sum(sin(radians(a)) for a in bearings)
y = sum(cos(radians(a)) for a in bearings)
return degrees(atan2(x, y))

def median(bearings):
m = mean(bearings)
bearings = [(a - m + 180) % 360 - 180 for a in
bearings]
bearings.sort()
median = bearings[len(bearings) / 2]
median += m
median %= 360
return median
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,014
Latest member
BiancaFix3

Latest Threads

Top