[QUIZ] The Smallest Circle (#157)

Bill Kelly

From: "Lionel Bouton said:
Glad to see that I'm not the sole old-school minded :)

Hehe... if Wikipedia is to be believed, gnuplot is 22 years old, and
OpenGL is 16 years old. Practically siblings. ;)


Regards,

Bill
 
Xavier Noria

Without looking for literature (on purpose), I thought that an exact
solution could perhaps be computed from the convex hull of the set of
points. I wrote my own algorithm to compute the convex hull, but ran
out of time to continue with the problem.

Anyone?

-- fxn
 
Lionel Bouton

Lionel said:
Hi,

here's my solution. I initially thought that the best circle would be
defined either by a diameter (2 points) or by a combination of 3
points through which it passes. The idea was that, given three points
defining a circle, a fourth point is either inside that circle or
defines, with two of the original three, a circle that includes the
remaining point.

This would be neat, as I wasn't so happy about a gradient-walking
solution. In this case it can be proven that you can deterministically
get the best solution by gradient walking (I don't think there are
local maxima), but that's not the general case. Unfortunately, it can be
proven that such a circle encircles all points, but not that it's
the smallest (and my experiments showed that it isn't).

Update: it seems I was wrong and probably had a bug in this
implementation, as I've just read that this approach was the best known
solution until 1983 :) See the paper referenced by Douglas:

http://www.cs.mcgill.ca/~cs507/projects/1998/jacob/

On a purely random set, my gradient walking is 27 times faster than
Megiddo's algorithm on 10000 points (but certainly slower in some
cases; Douglas' solution gets its result in fairly constant time for a
given set size).

Hmm, after testing, modifying the initial step for the gradient walking
from

max width of the set / point_number
to
max width of the set / 4

makes it competitive in all cases (12 s on 10000 points instead of 270 s
for Megiddo's), and not only on balanced data sets (which random ones
are).

Lionel
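
[A minimal sketch of the gradient-walking idea Lionel describes, assuming a
Point struct and the "max width / 4" initial step; this is not his actual
submission.]

Point = Struct.new(:x, :y)

def distance(a, b)
  Math.hypot(a.x - b.x, a.y - b.y)
end

def encircle_by_walking(points)
  # Start at the centroid and step towards the farthest point; when a step
  # no longer shrinks the radius, halve the step until it is negligible.
  center = Point.new(points.sum(&:x) / points.size.to_f,
                     points.sum(&:y) / points.size.to_f)
  xs, ys = points.map(&:x), points.map(&:y)
  width  = [xs.max - xs.min, ys.max - ys.min].max  # one reading of "max width of the set"
  step   = width / 4.0                             # the revised initial step
  radius = points.map { |p| distance(center, p) }.max
  while step > 1e-12
    far = points.max_by { |p| distance(center, p) }
    d   = distance(center, far)
    break if d.zero?                               # all points coincide
    candidate = Point.new(center.x + (far.x - center.x) / d * step,
                          center.y + (far.y - center.y) / d * step)
    r = points.map { |p| distance(candidate, p) }.max
    if r < radius
      center, radius = candidate, r                # the move helped: keep it
    else
      step /= 2.0                                  # overshoot: refine the step
    end
  end
  [center, radius]
end

[The loop stops once the step is too small to change the result, which is the
floating-point argument Lionel makes further down in the thread.]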
 
Lionel Bouton

Bill said:
Hehe... if Wikipedia is to be believed, gnuplot is 22 years old, and
OpenGL is 16 years old. Practically siblings. ;)

True, but using OpenGL for rendering data sets was only recently made
easy thanks to Ruby ;-)

Lionel
 
Bill Kelly

From: "Frank Fischer said:
Had a lot of fun with this one :) My solution works by taking the average of
all points as the center, and the distance to the farthest outlying point as
the radius. An optimal solution is then found by moving the circle center
towards the outlier while the radius is reduced, as long as all points remain
within the circle. The same principle could be applied in higher dimensions
as well.

Nice solution; this was also one of my first ideas. The problem with
this approach arises when many points are close to each other and only a
few points are far away. In this case the center is near that mass of
points and the line search may fail to find an optimal solution. For example:

puts encircle(([[0.0, 0.0]] * 100 + [[0.0, 1.0], [1.0, 0.0]]).map { |p| Point.new(*p) })

(100 points at (0,0), one at (1,0) and one at (0,1)) gives a solution of
{center: (0.00980392156862745, 0.00980392156862745), radius: 0.990244611507173},
but the optimum would be center (1/2, 1/2) and radius 1/sqrt(2).

My variation, which uses an axis-aligned bounding box to pick the
initial center point, appears to handle the above case.

However, I would not be surprised if it still finds some
sub-optimal solutions... :-/


Regards,

Bill
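
[For reference, the two starting heuristics under discussion can be sketched
like this; an illustration under assumptions, not the exact code from Frank's
or Bill's submissions.]

Point = Struct.new(:x, :y)

# Frank-style start: the centroid, which a dense cluster can drag far
# away from the true center.
def centroid_start(points)
  Point.new(points.sum(&:x) / points.size.to_f,
            points.sum(&:y) / points.size.to_f)
end

# Bill-style start: the center of the axis-aligned bounding box, which
# ignores how many points pile up in one spot.
def bounding_box_start(points)
  xs, ys = points.map(&:x), points.map(&:y)
  Point.new((xs.min + xs.max) / 2.0, (ys.min + ys.max) / 2.0)
end

[For the 100-points-at-(0,0) example above, the centroid start lands near
(0.0098, 0.0098), while the bounding-box start lands at (0.5, 0.5), the
optimal center for that set.]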
 
ThoML

And again I reply to my own post.

This is becoming a bad habit. Sorry. This slightly modified version
uses the center of the two points with maximum distance as the starting
point. For random sets, the results fit quite well most of the time --
on certain sets it's farther off than the mass-center version, though.

Regards,
Thomas.


def encircle(points) # takes an array of Point objects
  return if points.nil? || points.empty?
  points = points.uniq

  return Circle.new(points[0], 0) if points.size == 1

  # Start from the midpoint of the two most distant points.
  m, n = points.combination(2).sort_by { |a, b| -distance(a, b) }[0]
  a = Point.new(m.x / 2 + n.x / 2, m.y / 2 + n.y / 2)
  return Circle.new(a, distance(a, m)) unless points.size > 2

  # f is the point farthest from the starting center.
  points = points.sort_by { |p| -distance(a, p) }
  f = points[0]
  b = med(f, a)
  e = 1e-10
  1000.times do
    db = distance(f, b)
    if points.all? { |p| distance(b, p) <= db + e }
      # b encloses everything; stop once a and b have converged,
      # otherwise keep bisecting towards f.
      da = distance(f, a)
      if (da - db).abs <= e
        return Circle.new(b, db)
      else
        a, b = b, med(f, b)
      end
    else
      # Some point escaped; back off towards a.
      b = med(a, b)
    end
  end
  raise RuntimeError # did not converge
end

# Midpoint of a and b.
def med(a, b)
  Point.new((b.x - a.x) / 2.0 + a.x, (b.y - a.y) / 2.0 + a.y)
end

# Euclidean distance between two points.
def distance(p1, p2)
  Math.sqrt((p1.x - p2.x) ** 2 + (p1.y - p2.y) ** 2)
end
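
[To run the snippet you need the Point and Circle objects from the quiz
scaffolding; minimal stand-ins (an assumption, not part of Thomas's post)
would be:]

Point  = Struct.new(:x, :y)
Circle = Struct.new(:center, :radius)

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]].map { |x, y| Point.new(x, y) }
p encircle(pts)   # => center near (0.5, 0.5), radius near 0.707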
 
Alex Shulgin

As mentioned, this one may be more difficult than it seems. Feel free
to estimate the smallest circle, if you get stuck on getting the
exact solution.

This is really way more difficult than it seemed to me...

I was approaching a geometrical solution, only to be shocked that
there are cases where not both ends of the longest chord lie on the
smallest circle!

Check out these (found by pure luck and help of random):

-0.385628 -0.435444
-0.319182 +0.478400
+0.257012 +0.333913
-0.243603 -0.496925

The longest chord runs from points[0] to points[2] (length 1.002445),
but the correct solution appears to be the circle around the triangle
[1], [2], [3]. It includes the second-longest chord [1]->[3], of length
0.978249.
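
[The claim is easy to check numerically: compute the circumcircle of points
[1], [2], [3] and see that point [0] falls inside it, while the circle built
on the longest chord misses point [1]. A sketch using the standard
circumcenter formula, not code from the thread:]

def circumcircle(a, b, c)
  # Standard determinant formula for the circumcenter of three points.
  d  = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
  ux = ((a[0]**2 + a[1]**2) * (b[1] - c[1]) +
        (b[0]**2 + b[1]**2) * (c[1] - a[1]) +
        (c[0]**2 + c[1]**2) * (a[1] - b[1])) / d
  uy = ((a[0]**2 + a[1]**2) * (c[0] - b[0]) +
        (b[0]**2 + b[1]**2) * (a[0] - c[0]) +
        (c[0]**2 + c[1]**2) * (b[0] - a[0])) / d
  [[ux, uy], Math.hypot(ux - a[0], uy - a[1])]
end

pts = [[-0.385628, -0.435444], [-0.319182, 0.478400],
       [ 0.257012,  0.333913], [-0.243603, -0.496925]]

center, r = circumcircle(pts[1], pts[2], pts[3])
# r is roughly 0.511; the distance from center to pts[0] is roughly 0.506,
# so pts[0] is inside. The longest-chord circle (radius ~0.501 around the
# midpoint of pts[0] and pts[2]) leaves pts[1] outside at distance ~0.587.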
 
Lionel Bouton

Philipp said:
Here is my solution. Although it uses a brute force approach if the
first attempts (driven by obvious constraints) of determining the
circle fail, it scales on larger sets, because the given equal
distibution of random points form an approximate square on these
sets.

Wow, that was awfully fast for the random distribution; it takes more
time to display the result than to compute it, even on 10000 points!

On badly shaped point clouds it suffers greatly; I couldn't get it to
compute the circle for 1000 points when I fed it with:

class Point < Struct.new(:x, :y)
  def self.random
    ro = rand / 2
    theta = Math::PI * 2 * rand
    Point.new(ro * Math::cos(theta) + 0.5, ro * Math::sin(theta) + 0.5)
  end
  [...]
end

The ruby process happily took more than 800MB and I had to kill it while
I could still get some feedback from my poor laptop :)


Note:
I found it suspicious that I could get better performance than the
accepted best mathematical solution in the general case, and I reflected
on that for a while.
I think the reason the gradient walking is faster is that it is
inherently inaccurate. Megiddo's algorithm defines the exact position of
the center using a geometrical process. Unfortunately, computers must
manipulate floating-point data when describing points, segments,
vectors, circles, ... so any computed result is inherently inaccurate.
The gradient-walking method profits from this inaccuracy: it stops when
the step becomes too small to move the current best position because of
rounding errors. This happens in O(n) steps, where n is the number of
bits used to represent the floating-point mantissa (53 for IEEE 754
doubles, so well under 64), which is a huge win compared to the O(n) of
Megiddo's, where n is the number of *points*.



I benched the hull-computing method and I believe we can get the
best of both worlds: on any distribution of points I threw at it
(random in a square, in a disk or on a circle), computing the convex hull
took 0.4 s for 10000 points on my laptop. To be as fast as possible, we
could run your algorithm up to the point where it switches to the brute
force method. Then, instead of looking for the best circle based on
three points of the hull, we'd use the gradient-walking method. Using
only the hull points saves time in the most common cases, and the
filtering is quick enough to be negligible in the worst case, when no
point can be filtered out.

My test, which only filters the points through your ConvexHull class before
applying gradient walking, returns in 0.5 s for the initial distribution
(on a square) or on a disk, and falls back to 12-13 s in the worst case:
a distribution on a circle.

Lionel.
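
[A sketch of the hull-filtering step Lionel describes; Philipp's ConvexHull
class isn't reproduced in this thread, so this stand-in uses Andrew's
monotone chain.]

# Keep only the points on the convex hull; the smallest enclosing circle
# of the hull is the same as that of the full set.
def convex_hull(points)            # points: array of [x, y] pairs
  pts = points.uniq.sort
  return pts if pts.size <= 2
  cross = ->(o, a, b) { (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0]) }
  chain = lambda do |seq|
    hull = []
    seq.each do |p|
      hull.pop while hull.size >= 2 && cross.(hull[-2], hull[-1], p) <= 0
      hull << p
    end
    hull
  end
  lower = chain.(pts)
  upper = chain.(pts.reverse)
  lower[0...-1] + upper[0...-1]    # drop the duplicated endpoints
end

[Feeding convex_hull(points) instead of points to the gradient walker is the
"filter first" combination; in the worst case (all points on a circle)
nothing gets filtered out, which matches the 12-13 s figure above.]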
 
Lionel Bouton

Philipp said:
What you call 'badly shaped point clouds' is in fact an approximate
circle.

In fact, I initially tested with a disk (which is an intermediate case
before the circle in terms of the number of points on the convex hull);
the worst shape is indeed a circle (when you replace ro by a constant in
my random method, for people still following): the number of polygon
vertices is then the total number of points.
I'm confident it doesn't do any harm to your poor laptop this time. ;)

The little thing survived! Clarkson's is good for the worst case (every
point on the same circle), which fits common sense (any sampled circle
already encircles all the points, so the first iteration should win,
minus rounding errors in floating-point computations); it's nearly 3x as
fast as the gradient walking in this case. The speed doesn't change too
much across various random shapes (between 2 and 4 s on my laptop). The
gradient walking still wins when a small portion of the points are on
the convex hull (random on a disk, for example: 0.5 s vs 4 s).

I believe gradient walking should be faster when less than one third of
the points are on the convex hull and slower otherwise, but that's a
rough estimate based on observed behaviour.

One thing that I don't like, though, is that the total time is difficult
to predict. Probabilistic algorithms often have good performance, but I'm
always uneasy when I'm not sure how much time they can spend before
returning, or even when they will exhibit bad behaviour. It's a matter
of the constraints the code must fit, and in most of my projects I have
to return results in a timely manner :)

Is there any public paper describing the performance behavior of the
algorithm? All scientific papers returned by Google point to
paid-subscription materials.

Lionel

PS: is it me or is this becoming more an algorithm-Quiz than a
Ruby-Quiz? It's my first Quiz and I was expecting to see some clever use
of Ruby, but in this kind of Quiz I don't see much place for elegant code.
Not that I complain: I love to study the performance of algorithms too :)
 
Philipp Hofmann

One thing that I don't like, though, is that the total time is difficult
to predict. Probabilistic algorithms often have good performance, but I'm
always uneasy when I'm not sure how much time they can spend before
returning, or even when they will exhibit bad behaviour. It's a matter
of the constraints the code must fit, and in most of my projects I have
to return results in a timely manner :)

Is there any public paper describing the performance behavior of the
algorithm? All scientific papers returned by Google point to
paid-subscription materials.

Sorry, I haven't come across a public paper on Clarkson's algorithm
either. But it's proven that the algorithm needs only O(log(n))
iterations (in expectation), n of course being the number of given points.

I'm not going to do a full proof here, but here's an outline.

You need at most 3 points to describe the smallest enclosing circle. If
you haven't found it yet, then each turn you increase the chance of every
point that lies outside the circle determined by 13 random points, by
doubling its relative frequency.

After k iterations, at least one of the (at most) three points you are
looking for has a relative frequency of at least 2**(k/3).

On average, the sum of relative frequencies increases by a factor of at
most 1 + 3/13 per iteration.

For increasing k this leads to a contradiction: at some point the
relative frequency of at least one point, 2**(k/3), would exceed the
overall sum of relative frequencies, n * (1 + 3/13)**k, if the algorithm
hadn't come to an end yet.

The only reference I found on the net where it is at least partially
explained, and where I got the outline for the proof from, is this (but
be warned, it's in German):

http://www-i1.informatik.rwth-aachen.de/~algorithmus/algo42.php
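
[A minimal sketch of the sampling loop that argument is about, assuming
helper methods exact_circle (a brute-force smallest circle of at most 13
points) and Circle#contains?; this is not Philipp's actual code.]

# Weighted-sampling loop in the spirit of Clarkson's algorithm.
def smallest_circle_clarkson(points, sample_size = 13)
  weights = Hash.new(1)                      # relative frequency of each point
  loop do
    sample  = weighted_sample(points, weights, sample_size)
    circle  = exact_circle(sample)           # assumed helper: exact circle of <= 13 points
    outside = points.reject { |p| circle.contains?(p) }
    return circle if outside.empty?          # every point enclosed: done
    outside.each { |p| weights[p] *= 2 }     # double the violators' frequencies
  end
end

# Draw k points with probability proportional to their weight (with replacement).
def weighted_sample(points, weights, k)
  total = points.sum { |p| weights[p] }
  Array.new(k) do
    r = rand * total
    points.find { |p| (r -= weights[p]) <= 0 } || points.last
  end
end

[Each failed round doubles the weight of every violator, which is exactly the
"relative frequency" bookkeeping in the proof outline above.]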

PS: is it me or is this becoming more an algorithm-Quiz than a
Ruby-Quiz? It's my first Quiz and I was expecting to see some clever use
of Ruby, but in this kind of Quiz I don't see much place for elegant
code.
Not that I complain: I love to study the performance of algorithms too :)

Even if this one seems to be more about the algorithm, we still do it
in Ruby, right? ;) Anyway, we'll see what our brand-new quizmaster will
focus on. ;)

g phil
 
Matthew Moss

PS: is it me or is this becoming more an algorithm-Quiz than a
Ruby-Quiz? It's my first Quiz and I was expecting to see some clever use
of Ruby, but in this kind of Quiz I don't see much place for elegant code.
Not that I complain: I love to study the performance of algorithms too :)


I aim to have a mix... things that deal with algorithms, to see how
people will approach them with Ruby, and things that focus more on
particular aspects of Ruby. And then just some fun things.

Because of the nature of the problem, this quiz probably leaned very
heavily towards the algorithm side... I admit I wanted to see what
folks not familiar with the problem might do to estimate a solution.
 
Bill Kelly

From: "Matthew Moss said:
I admit I wanted to see what folks not familiar with the problem
might do to estimate a solution.

Hmm, yeah - I deliberately avoided researching existing algorithms.

Would be interesting to calculate how much my crude hack tends to
deviate from the perfect solution(s).


Regards,

Bill
 
ThoML

@bill
Would be interesting to calculate how much my crude hack tends to
deviate from the perfect solution(s).

You might be more interested in speed.


@matthew
I aim to have a mix...

I hope (most of) the quizzes will leave some room for "creativity".
Maybe there shouldn't be a proven set of best solutions. But a
comparison of well thought-out approaches is interesting of course.
I'm looking forward to the write-up.

Regards,
Thomas.
 
Bill Kelly

These are the times I get running the other solutions,
10 runs of 1000 points each. Each run, every solution is
fed the exact same point data.

Auuughhh!!

Sorry for another self-reply........ :-/

The benchmark script I attached to the earlier post did not,
in fact, pass the same point data to each solution per run.
(Maybe that'll teach me to make last minute changes without
verifying...)

The benchmark script at
http://tastyspleen.net/~billk/ruby/quiz/157-smallest-circle/benchmark/
now does correctly pass the same coordinate data to each script
per run.

. . .

I had hoped that by averaging multiple runs (10) - each run having
different point data - the rankings would be pretty stable.

However, JUSTIN and LIONEL still tend to trade places on
different invocations:

user system total real
FRANK 24.421000 0.297000 24.718000 ( 24.734000)
JUSTIN 6.110000 0.000000 6.110000 ( 6.109000)
LIONEL 6.078000 0.000000 6.078000 ( 6.094000)
DOUG 3.750000 0.000000 3.750000 ( 3.766000)
PHILIPP 1.609000 0.000000 1.609000 ( 1.609000)
BILL 0.328000 0.000000 0.328000 ( 0.328000)

user system total real
FRANK 24.656000 0.156000 24.812000 ( 24.828000)
LIONEL 6.047000 0.000000 6.047000 ( 6.047000)
JUSTIN 4.375000 0.000000 4.375000 ( 4.375000)
DOUG 3.281000 0.016000 3.297000 ( 3.297000)
PHILIPP 1.500000 0.000000 1.500000 ( 1.500000)
BILL 0.297000 0.000000 0.297000 ( 0.297000)

user system total real
FRANK 24.594000 0.172000 24.766000 ( 24.781000)
LIONEL 6.109000 0.000000 6.109000 ( 6.109000)
JUSTIN 5.922000 0.000000 5.922000 ( 5.922000)
DOUG 4.031000 0.016000 4.047000 ( 4.047000)
PHILIPP 1.344000 0.000000 1.344000 ( 1.344000)
BILL 0.281000 0.000000 0.281000 ( 0.282000)

user system total real
FRANK 24.203000 0.391000 24.594000 ( 24.672000)
LIONEL 6.109000 0.000000 6.109000 ( 6.125000)
JUSTIN 5.500000 0.000000 5.500000 ( 5.500000)
DOUG 4.188000 0.016000 4.204000 ( 4.219000)
PHILIPP 1.500000 0.000000 1.500000 ( 1.500000)
BILL 0.344000 0.000000 0.344000 ( 0.344000)

user system total real
FRANK 24.546000 0.219000 24.765000 ( 24.765000)
JUSTIN 6.047000 0.015000 6.062000 ( 6.062000)
LIONEL 5.828000 0.000000 5.828000 ( 5.828000)
DOUG 3.641000 0.000000 3.641000 ( 3.641000)
PHILIPP 1.688000 0.000000 1.688000 ( 1.688000)
BILL 0.296000 0.000000 0.296000 ( 0.296000)

(I have verified the rankings are stable if I hardwire
the random seed, though.)
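
[Hardwiring the seed just means something along these lines in the benchmark
script; a sketch, the real script lives at the URL above.]

srand(1234)   # fixed seed: identical point data on every run
points = Array.new(1000) { [rand - 0.5, rand - 0.5] }
# ...hand the same `points` array to every solution being timed...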


Regards,

Bill
 
ThoML

Array#combination is a Ruby 1.9 method. But it's probably a good thing
you didn't get my "solution" to run. If I remove irrelevant points in
advance, I get somewhat close to Frank's solution with respect to
speed, but it still lags behind.

Anyway, I can offer some accuracy tests with random point sets (size
1..100).

        s_tml2.rb | s_billk.rb | s_billk2.rb | s_lionel.rb | s_frank.rb | s_justin.rb
Min:    0.0000    | 0.0000     | 0.0000      | 0.0000      | 0.0004     | 0.0000
Max:    0.0225    | 0.0795     | 0.0422*     | 0.0140      | 0.0063*    | 0.0817
Avg:    0.0027    | 0.0297     | -------     | 0.0017      | 0.0058     | 0.0115

[*] Slightly invalid results, maybe the center isn't quite where it
should be?

x/y-range is -0.5..0.5

Since the number of test runs is rather small, you should take into
account a margin of 30% or so.

s_lionel is Lionel's very first submission, because I had difficulties
integrating the later solutions.

s_tml2 is my second solution, which starts at the center of the longest
chord. The first solution yields results mostly identical to Justin's
submission.

Regards,
Thomas.
 
Bill Kelly

From: "Lionel Bouton said:
Here's a 157_benchmark_2.rb which benchs on several point distributions
(square, disk, circle and gaussian) and my code adapted to the benchmark
(the convex hull was done outside of encircle so it wasn't active for
the benchmark).

Neat! :)

-- Random on disk --
user system total real
FRANK 23.407000 0.313000 23.720000 ( 23.734000)
JUSTIN 4.359000 0.000000 4.359000 ( 4.360000)
DOUG 3.719000 0.031000 3.750000 ( 3.750000)
PHILIPP 2.703000 0.000000 2.703000 ( 2.704000)
LIONEL 0.734000 0.031000 0.765000 ( 0.765000)
BILL 0.282000 0.000000 0.282000 ( 0.281000)
-- Random on square --
user system total real
FRANK 27.734000 0.282000 28.016000 ( 28.016000)
JUSTIN 7.422000 0.000000 7.422000 ( 7.422000)
DOUG 3.703000 0.000000 3.703000 ( 3.703000)
PHILIPP 1.422000 0.000000 1.422000 ( 1.422000)
LIONEL 0.750000 0.000000 0.750000 ( 0.750000)
BILL 0.344000 0.000000 0.344000 ( 0.343000)
-- 2D Gaussian --
user system total real
FRANK 27.297000 0.093000 27.390000 ( 27.406000)
LIONEL 16.141000 0.000000 16.141000 ( 16.219000)
JUSTIN 8.047000 0.000000 8.047000 ( 8.094000)
PHILIPP 0.422000 0.000000 0.422000 ( 0.422000)
BILL 0.172000 0.000000 0.172000 ( 0.172000)
DOUG 0.140000 0.000000 0.140000 ( 0.140000)
-- Random on circle --
user system total real
FRANK 28.187000 0.079000 28.266000 ( 28.344000)
LIONEL 16.547000 0.000000 16.547000 ( 16.657000)
JUSTIN 7.297000 0.000000 7.297000 ( 7.328000)
PHILIPP 0.468000 0.000000 0.468000 ( 0.469000)
DOUG 0.172000 0.000000 0.172000 ( 0.171000)
BILL 0.141000 0.000000 0.141000 ( 0.141000)


Regards,

Bill
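
[For readers who want to reproduce these shapes, the four distributions can
be generated roughly as follows; a sketch, Lionel's actual 157_benchmark_2.rb
may use different constants and scaling.]

def random_on_square
  [rand - 0.5, rand - 0.5]
end

def random_on_disk                  # mirrors Lionel's Point.random above
  r, t = rand / 2.0, 2 * Math::PI * rand
  [r * Math.cos(t), r * Math.sin(t)]
end

def random_on_circle                # worst case: every point ends up on the hull
  t = 2 * Math::PI * rand
  [0.5 * Math.cos(t), 0.5 * Math.sin(t)]
end

def gaussian_2d                     # Box-Muller transform, arbitrary 0.1 spread
  r = Math.sqrt(-2 * Math.log(1 - rand))
  t = 2 * Math::PI * rand
  [0.1 * r * Math.cos(t), 0.1 * r * Math.sin(t)]
end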
 
Bill Kelly

From: "ThoML said:
Anyway, I can offer some accuracy tests with random point sets (size
1..100).

s_tml2.rb | s_billk.rb | s_billk2.rb | s_lionel.rb | s_frank.rb | s_justin.rb
Min: 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0004 | 0.0000
Max: 0.0225 | 0.0795 | 0.0422* | 0.0140 | 0.0063* | 0.0817
Avg: 0.0027 | 0.0297 | ------- | 0.0017 | 0.0058 | 0.0115

[*] Slightly invalid results, maybe the center isn't quite where it
should be?

I don't quite understand how to interpret the numbers?

But I have to ask.. which is billk vs. billk2 ? If billk2 is less
accurate, would I be correct in assuming it is the naive bounding
box version?


Thanks,

Bill
 
Lionel Bouton

Lionel said:
Oops: the 2D Gaussian was the same as the circle one (stupid me)...
The gradient walking couldn't possibly be so slow on such a
distribution :)

I updated my submission with:
- a new heuristic for the initial step,
- an optimized step ratio for the gradient algorithm,
- a bug fix (it didn't run out of the benchmark).

Lionel

Note that if you filter the points with the convex hull, Justin's
algorithm becomes competitive with the gradient walking with roughly the
same performance profile:

-- Random on disk --
user system total real
FRANK 36.730000 0.040000 36.770000 ( 37.330030)
DOUG 6.570000 0.010000 6.580000 ( 6.680385)
PHILIPP 5.290000 0.000000 5.290000 ( 5.377509)
JUSTIN 1.090000 0.010000 1.100000 ( 1.107219)
LIONEL 0.900000 0.010000 0.910000 ( 0.915325)
BILL 0.610000 0.000000 0.610000 ( 0.622287)
-- Random on square --
user system total real
FRANK 35.420000 0.030000 35.450000 ( 36.056951)
DOUG 6.090000 0.010000 6.100000 ( 6.147446)
PHILIPP 2.720000 0.000000 2.720000 ( 2.766508)
JUSTIN 1.060000 0.000000 1.060000 ( 1.081301)
LIONEL 1.810000 0.010000 1.820000 ( 1.857297)
BILL 0.540000 0.000000 0.540000 ( 0.536318)
-- 2D Gaussian --
user system total real
FRANK 36.370000 0.050000 36.420000 ( 37.079492)
DOUG 4.730000 0.000000 4.730000 ( 4.816418)
PHILIPP 3.000000 0.010000 3.010000 ( 3.036339)
JUSTIN 1.270000 0.000000 1.270000 ( 1.279535)
LIONEL 0.810000 0.010000 0.820000 ( 0.828373)
BILL 0.540000 0.000000 0.540000 ( 0.544507)
-- Random on circle --
user system total real
FRANK 36.540000 0.030000 36.570000 ( 37.410887)
DOUG 0.260000 0.000000 0.260000 ( 0.255157)
PHILIPP 0.390000 0.000000 0.390000 ( 0.395307)
JUSTIN 11.350000 0.010000 11.360000 ( 11.568574)
LIONEL 12.730000 0.010000 12.740000 ( 13.040802)
BILL 0.170000 0.000000 0.170000 ( 0.166975)


Lionel
 
ThoML

I don't quite understand how to interpret the numbers?
But I have to ask.. which is billk vs. billk2 ? If billk2 is less
accurate, would I be correct in assuming it is the naive bounding
box version?

billk2 is the revised version. The maximum distance is lower, thus
it's more accurate. Basically, you'd want a 0.0 in all cells.

min ... minimal distance to the winner (smallest circle)
max ... maximal distance to the winner
avg ... average (arithmetic mean) distance to the winner

The table doesn't contain info about the diameter. I suppose a
relative measure would be more useful. Hm.

Here are two runs with relative numbers. The relevant measure is the
circle radius. The numbers are the distances as a percentage of the
correct solution's radius.

        s_tml.rb | s_billk.rb | s_billk2.rb | s_lionel.rb | s_frank.rb | s_justin.rb
Min:    0.00     | 0.00       | 0.00        | 0.00        | 0.15       | 0.00
Max:    15.77    | 17.72      | 7.81        | 4.98        | 2.13       | 9.28
Avg:    0.54     | 4.79       | 1.18        | 0.43        | 1.03       | 1.38

        s_tml.rb | s_billk.rb | s_billk2.rb | s_lionel.rb | s_frank.rb | s_justin.rb
Min:    0.00     | 0.00       | 0.00        | 0.00        | 0.11       | 0.00
Max:    5.37     | 15.99      | 7.39        | 5.80        | 2.01       | 12.10
Avg:    0.36     | 5.07       | 1.07        | 0.35        | 1.01       | 1.38

Okay, now with Lionel's current version:

        s_tml.rb | s_billk.rb | s_billk2.rb | s_lionel.rb | s_lionel4.rb | s_frank.rb | s_justin.rb
Min:    0.00     | 0.00       | 0.00        | 0.00        | 0.00         | 0.26       | 0.00
Max:    4.79     | 18.35      | 9.15        | 3.33        | 4.66         | 1.95       | 17.57
Avg:    0.54     | 5.44       | 1.20        | 0.22        | 0.36         | 0.99       | 1.76

As you can see, my solution fails noticeably on certain point sets but
does more or less well on average, at least for random point sets. I
think Justin's solution suffers from the same problem, since his
approach is quite similar, except that he moves the center gradually,
bit by bit, while my "solution" ... well, let's not talk about that.

Frank's solution is quite consistent, albeit slightly imprecise.

I didn't include Douglas's and Philip's solutions since they are
precise.

Regards,
Thomas.
 
