Distance normalized TSP algorithm


JSH

I came up with this idea for figuring out a path with the Traveling
Salesman Problem and posted about it, and an objection that came up
was that the algorithm required the distance between each of the nodes,
while the standard problem does not require this information.

In reaction to that objection I got upset, but now after taking some
time to think and ponder the problem I found I could handle the issue
easily so that a new distance normalized algorithm could handle any
TSP setup.

The simple idea is that if there are m nodes, and there is no distance
information between nodes given, then they are to be considered to be
embedded in an m-1 dimensional space equidistant from each other.

Then the algorithm proceeds as before, where you have two travelers
T_1 and T_2, where I'll talk through a single weight, but you can have
multiple weights. With a single weight between nodes, I'll use cost
to keep the idea closer to the standard way of stating the problem,
and also so that they complete a round trip circuit, I'll say they
begin from the same node N_1.

To use the algorithm and take them to the next nodes, T_1 considers
moving to another node, which I'll say is N_2.

Then T_2 considers moving to all other nodes available, not currently
under consideration by T_1, and not already visited, so T_2 might
start with N_3 and proceed from there, and calculates the
cost*distance from each path BACK to N_1 because T_2 is moving
BACKWARDS, while T_1 is moving forwards.

Notice that with the distance normalized algorithm you just have cost
considered as distance=1. Also if there are multiple weights you
simply multiply by each weight, so if there were a cost and a time for
each path, you'd have cost*time*distance and so forth.
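With made-up numbers, the value attached to one candidate path is just the product of its weights:

```python
# Made-up edge values, purely to illustrate the combined weight rule:
cost, time, distance = 7, 3, 1      # distance normalized to 1
value = cost * time * distance      # 7 * 3 * 1 = 21
```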

T_2 considers every available move back to N_1, and selects the one
with the least value, and returns that information.

T_1 stores that info and now considers moving to N_3, and T_2 now
calculates for all available nodes again, excluding the one under
consideration or already visited, so now T_2 could start with N_2 and
loop through all available, and again T_2 picks the lowest cost move
BACK to N_1 and returns that to T_1.

In this way, all possible moves by T_1 and T_2 are considered, and
after T_1 has gotten values back from T_2 for each possible move, T_1
selects the lowest cost move, and then BOTH T_1 and T_2 make their
moves, and T_1 moves forward to the node that fits that least path,
while T_2 moves BACKWARD to the node that does, where those will be
different nodes and there is a rule that T_1 and T_2 can never move to
the same node.

A special case occurs if there are only 3 nodes left, meaning only one
is unvisited, and in this case the algorithm arbitrarily has T_1 move
forward to that node.

Notice that when the algorithm exits you have a path taken by T_1 and
a path taken by T_2, but you collapse that into a single journey by a
single traveler moving along the path of T_1, and then continuing
forward along the path of T_2, where to conceptualize it, T_2 is the
same traveler moving backwards in time to meet himself moving
forwards.

I am prepared to step through a simple example if that is necessary
but would not mind if someone else might help out in that regard, as
I'm curious to know if ANYONE else in the world can understand this
algorithm!!!

It seems so simple to me but often it's hard to explain something new
to people so I understand that it may take some time and patience for
others to know how to do the algorithm and I apologize for getting
frustrated and angry before, versus just coming up with the normalized
algorithm earlier.

Oh yeah, I'm still angling for coders if I can get some people
interested in this thing, as I have a Google Code project called
optimalpathengine.


James Harris
 

Patricia Shanahan

JSH wrote:
....
The simple idea is that if there are m nodes, and there is no distance
information between nodes given, then they are to be considered to be
embedded in an m-1 dimensional space equidistant from each other.

I believe that my original example is well-formulated according to the
rules you are now using. To save everyone digging back through the
depths of the previous thread, I'll repeat it here:

"Cities A, B, C, D, E. I'm assuming symmetrical costs - getting from A
to B is the same cost as B to A.

C,A is 1000. C,B is 1000. C,D is 2000. C,E is 2000. All other costs are 1."
Then the algorithm proceeds as before, where you have two travelers
T_1 and T_2, where I'll talk through a single weight, but you can have
multiple weights. With a single weight between nodes, I'll use cost
to keep the idea closer to the standard way of stating the problem,
and also so that they complete a round trip circuit, I'll say they
begin from the same node N_1.

You don't specify any criteria for picking the first node, so I assume
the algorithm is supposed to work regardless of that choice. Suppose
they start, in my example, at A.
To use the algorithm and take them to the next nodes, T_1 considers
moving to another node, which I'll say is N_2.

Then T_2 considers moving to all other nodes available, not currently
under consideration by T_1, and not already visited, so T_2 might
start with N_3 and proceed from there, and calculates the
cost*distance from each path BACK to N_1 because T_2 is moving
BACKWARDS, while T_1 is moving forwards.

Notice that with the distance normalized algorithm you just have cost
considered as distance=1. Also if there are multiple weights you
simply multiply by each weight, so if there were a cost and a time for
each path, you'd have cost*time*distance and so forth.

T_2 considers every available move back to N_1, and selects the one
with the least value, and returns that information.

T_1 stores that info and now considers moving to N_3, and T_2 now
calculates for all available nodes again, excluding the one under
consideration or already visited, so now T_2 could start with N_2 and
loop through all available, and again T_2 picks the lowest cost move
BACK to N_1 and returns that to T_1.

As far as I can tell, this procedure would result in T_1 and T_2 moving
from A to two distinct nodes chosen from {B, D, E}. Each of those moves
has cost 1. However, any path that contains e.g. both AB and DA would
not be able to use AC. Since C has only two cost 1000 edges, not using
one of them forces use of at least one of the cost 2000 edges. Such a
path would have cost either 3003 or 4003.

If I am misinterpreting your algorithm, please supply pseudo-code or
other specification, including how you choose the starting node.

An optimal path, such as ACBDEA, has total cost 2003, but that can only
be achieved if T_1 or T_2, starting at A, chooses to go to C, at a cost
of 1000, rather than to B, D, or E at a cost of 1. How would either T_1
or T_2 know that a cost 1000 move is better for the overall solution
than a cost 1 move?
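These numbers are easy to check by brute force over the 24 round trips from A (a quick throwaway sketch, not part of any proposed algorithm):

```python
from itertools import permutations

# Symmetric costs: default 1, with C's four edges as given above.
special = {frozenset("CA"): 1000, frozenset("CB"): 1000,
           frozenset("CD"): 2000, frozenset("CE"): 2000}

def edge(a, b):
    return special.get(frozenset((a, b)), 1)

def tour_cost(tour):
    return sum(edge(a, b) for a, b in zip(tour, tour[1:]))

# All round trips starting and ending at A
tours = [("A",) + p + ("A",) for p in permutations("BCDE")]
best = min(tours, key=tour_cost)
# tour_cost(tuple("ACBDEA")) == 2003, and no tour does better; any
# tour whose two edges at A both cost 1 comes out to 3003 or 4003.
```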

The core difficulty in the NP-complete problems is that making a series
of locally optimal choices cannot be depended on to lead to a globally
optimal solution.

Patricia
 

Joshua Cranmer

JSH said:
In this way, all possible moves by T_1 and T_2 are considered, and
after T_1 has gotten values back from T_2 for each possible move, T_1
selects the lowest cost move, and then BOTH T_1 and T_2 make their
moves, and T_1 moves forward to the node that fits that least path,
while T_2 moves BACKWARD to the node that does, where those will be
different nodes and there is a rule that T_1 and T_2 can never move to
the same node.

As Patricia points out, this breaks down in at least one case. The case
is this (graphically put):
        ¢
    A ----- B
   $ \     / $
      \   /
       \ /
 * ---- C ---- *
   $$$     $$$

In short, if C is a node with expensive connections to all but A and B
(which are of moderate expense), while A and B both have cheap
connections to everywhere but C, and even cheaper between the two of
them, then the traveler will see get to one of A or B, find the cheapest
one to be the other one, move to it, and then move back out into the
rest of the graph, cutting off the cheapest connections to C.

The example Patricia gave violated the Triangle Inequality, but looking
at the example I have, it is very likely that it fails even if you
assume the Triangle Inequality (I haven't done any analysis as
extensive as Patricia's).

This, by the way, illustrates the danger of NP-hard problems: when you
come to a choice, the "best" solution at the onset can lead you into
pitfalls. Which means that you have to take into account not only the
cost of moving from one node to another but the loss of savings in not
moving to a third one (if that makes any sense).
A special case occurs if there are only 3 nodes left, meaning only one
is unvisited, and in this case the algorithm arbitrarily has T_1 move
forward to that node.

Three unvisited nodes or three nodes including the ones T_1 and T_2 are on?
I am prepared to step through a simple example if that is necessary
but would not mind if someone else might help out in that regard, as
I'm curious to know if ANYONE else in the world can understand this
algorithm!!!

One exclamation point suffices. I understand it sufficiently to reason
on it, and I could probably code it up reasonably fast if I cared to.
It's still ill-formed: you don't mention what to do in case of ties.
 

JSH

JSH wrote:

...


I believe that my original example is well-formulated according to the
rules you are now using. To save everyone digging back through the
depths of the previous thread, I'll repeat it here:
Thanks!

"Cities A, B, C, D, E. I'm assuming symmetrical costs - getting from A
to B is the same cost as B to A.

C,A is 1000. C,B is 1000. C,D is 2000. C,E is 2000. All other costs are 1."


You don't specify any criteria for picking the first node, so I assume
the algorithm is supposed to work regardless of that choice. Suppose
they start, in my example, at A.

Good catch. That is just a miss on my part, but it's easily fixable,
by saying that a path is calculated for each possible starting node in
turn. The algorithm is still polynomial time with that change.

As that's fairly easy I'd like to focus first on getting a full path
from an arbitrary start at just any node, say, chosen randomly.

I realized an omission in my primary algorithm, as I simply forgot
about the cost for T_1 from N_1 to the node being considered, which is
kind of an odd thing to forget.

The solution and correction to the algorithm is that cost is
multiplied as well.

So if cost from N_1 to N_2 is cost_1 for T_1 and cost from N_1 to N_3
is cost_2 for T_2, then the calculation is cost_1*cost_2*distance,
where here distance is 1, so it is cost_1*cost_2.

With the correction you'd have two weight values for each, so

cost_1*cost_2*time_1*time_2*distance

If more than one path has the least value then you randomly pick.
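Putting the corrected rule together, in rough Python the whole procedure looks something like this (the function name, data layout, and tie-breaking here are my own choices, and distance is normalized to 1):

```python
def two_traveler_tour(cost, start):
    """Sketch of the two-traveler heuristic: T_1 walks forward from
    `start` while T_2 walks backward toward it; at each step the pair
    of moves with the least cost_1*cost_2 product is taken."""
    nodes = set(cost)                   # cost: symmetric dict of dicts
    t1, t2 = start, start               # both travelers begin at N_1
    path1, path2 = [start], []          # T_2's path is built backwards
    unvisited = nodes - {start}
    while unvisited:
        if len(unvisited) == 1:         # special case: 3 nodes left,
            path1.append(unvisited.pop())   # T_1 moves forward to it
            break
        best = None
        for a in unvisited:             # T_1's candidate forward move
            for b in unvisited - {a}:   # T_2's candidate backward move
                value = cost[t1][a] * cost[b][t2]   # cost_1 * cost_2
                if best is None or value < best[0]:
                    best = (value, a, b)    # ties broken arbitrarily
        _, a, b = best
        path1.append(a)
        path2.append(b)
        t1, t2 = a, b
        unvisited -= {a, b}
    # Collapse into one journey: forward along T_1's path, then on
    # along T_2's path (reversed), and back to the start.
    return path1 + path2[::-1] + [start]
```

If I've transcribed it faithfully, on Patricia's five-city example it falls into exactly the trap she describes: the first step always pairs two of the cost-1 edges at A, so the tour comes out at 3003 or 4003 rather than the optimal 2003.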
As far as I can tell, this procedure would result in T_1 and T_2 moving
from A to two distinct nodes chosen from {B, D, E}. Each of those moves
has cost 1. However, any path that contains e.g. both AB and DA would
not be able to use AC. Since C has only two cost 1000 edges, not using
one of them forces use of at least one of the cost 2000 edges. Such a
path would have cost either 3003 or 4003.

Oh, that's not good. I'll need to consider this carefully later.
Kind of in a rush this morning...
If I am misinterpreting your algorithm, please supply pseudo-code or
other specification, including how you choose the starting node.

An optimal path, such as ACBDEA, has total cost 2003, but that can only
be achieved if T_1 or T_2, starting at A, chooses to go to C, at a cost
of 1000, rather than to B, D, or E at a cost of 1. How would either T_1
or T_2 know that a cost 1000 move is better for the overall solution
than a cost 1 move?

The core difficulty in the NP-complete problems is that making a series
of locally optimal choices cannot be depended on to lead to a globally
optimal solution.

Patricia

Thanks for the reply! I'll have to consider it in detail later as I'm
in a rush this morning, but wanted to add the correction.

If you are correct then that kills the usefulness of the idea for TSP,
but doesn't mean I won't still program the algorithm out of curiosity.


James Harris
 

Joshua Cranmer

JSH said:
Good catch. That is just a miss on my part, but it's easily fixable,
by saying that a path is calculated for each possible starting node in
turn. The algorithm is still polynomial time with that change.

Keeping score, that change should make it O(N^4):

    for each starting node
        for each node in path (actually N/2, but ignoring constants)
            for each possible next node for T_1
                for each possible previous node for T_2
So if cost from N_1 to N_2 is cost_1 for T_1 and cost from N_1 to N_3
is cost_2 for T_2, then the calculation is cost_1*cost_2*distance,
where here distance is 1, so it is cost_1*cost_2.

Forgive me for asking, but why multiply the two costs instead of adding
them together?
If more than one path has the least value then you randomly pick.

Answers my question in other post, but still a disconcerting answer,
since (as I explained again other there) you lose something by moving to
a node, unlike other working greedy algorithms where "moving" to a node
is only expanding a search tree.

Taking this time to expound on what I think the flaw in the algorithm
is: you're ignoring the opportunity cost of making a move.
 

Christian

JSH said:
I came up with this idea for figuring out a path with the Traveling
Salesman Problem and posted about it, and an objection that came up
was that the algorithm required the distance between each of the nodes,
while the standard problem does not require this information.

In reaction to that objection I got upset, but now after taking some
time to think and ponder the problem I found I could handle the issue
easily so that a new distance normalized algorithm could handle any
TSP setup.

The simple idea is that if there are m nodes, and there is no distance
information between nodes given, then they are to be considered to be
embedded in an m-1 dimensional space equidistant from each other.

Don't do this ... it's just getting more complicated.
Just take the Euclidean TSP ...
Embed your m nodes in a 2-dimensional plane ...

You get your distances as weights and everything is fine ...
The Euclidean TSP is still NP-hard, so it will solve the normal TSP if
you can solve it.

I know people said you would have to create this data yourself in the
previous thread... but that was only correct in general, not in
particular for this problem... there is no need to, as the Euclidean
TSP was proven to be NP-hard 30 years ago (also some good
approximations exist to solve it).



Christian
 

Patricia Shanahan

JSH wrote:
....
As that's fairly easy I'd like to focus first on getting a full path
from an arbitrary start at just any node, say, chosen randomly.
....

Unless you are aiming for a probabilistic algorithm, which may get
different results for the same problem on different runs, don't pick the
start node at random. Make up some rule, no matter how arbitrary.

The one I used was "First vertex mentioned in the problem description".

Patricia
 

JSH

Don't do this ... it's just getting more complicated.
Just take the Euclidean TSP ...
Embed your m nodes in a 2-dimensional plane ...

You get your distances as weights and everything is fine ...
The Euclidean TSP is still NP-hard, so it will solve the normal TSP if
you can solve it.

That's what I thought to do at first as well, which was last week, but
after pondering the problem for some days I decided that an equal
weight for distance across all nodes was preferable: if no distance
between nodes is given, then one can assume that distance is equally
weighted; otherwise, wouldn't it have been included?

And besides it's easy.

The m-1 dimensional space provides an easy solution to the issue, with
equally weighted distances.
I know people said you would have to create this data yourself in the
previous thread... but that was only correct in general, not in
particular for this problem... there is no need to, as the Euclidean
TSP was proven to be NP-hard 30 years ago (also some good
approximations exist to solve it).

Christian

I think it's more fun now to go after the general problem, though the
concept I'm using is easier I think to understand with the Euclidean,
as distance is so key to it, so I'll babble on a bit here on that
issue though now I'll get more speculative.

I think it's provable to solve the Euclidean TSP by considering in a 2-
D space a ring going outward from a start of just two nodes with the
forward traveler on one and the backwards traveler on another, where
these nodes are well behaved in that you can expand a circle out and
get only 4 new nodes at a time, and the weight is the distance, and
then consider the decision points of picking the best path, where also
the best path tends to go in one direction--so I'm cheating a bit to
make it easier to conceptualize.

Then the idea I have is like chewing at the problem from both ends, as
each traveler gets two nodes at a time and there must be a best path
between those two, so you move the travelers out and expand your
circle to get 4 more nodes and do so until you have all nodes. Key
here is that one traveler is moving forward in time while the other is
moving backwards so at the end you collapse your solution to one
traveler moving forward from the starting point to the end.

At each decision point you picked the shortest, so the final full path
is the shortest one.

Lots of problems with the specifics of that thought experiment as it's
very well behaved, but maybe it can give an idea of why using two
travelers--one going forward in time while the other goes backwards in
time--is so key of an idea!

Without that, if you have one traveler in the middle of your graph on
one node and expand your circle to have just two more nodes, how do
you get to an end? How do you get to your beginning?

My idea allows you to start in the middle, and work your way out from
BOTH sides.


James Harris
 

JSH

JSH wrote:

... As that's fairly easy I'd like to focus first on getting a full path

...

Unless you are aiming for a probabilistic algorithm, which may get
different results for the same problem on different runs, don't pick the
start node at random. Make up some rule, no matter how arbitrary.

Oh, the thing is you made an excellent point of where to start, and
gave a great example which forced me to consider how a given start
could break down with the basic algorithm, which caused me to consider
a solution to that problem.

That solution is to try EACH node in turn as a starting point, and
calculate the path, so it doesn't matter which one you start with, so
yeah, you could do it randomly.

I hadn't quite reached that solution this morning but realized it a
little later in the day.
The one I used was "First vertex mentioned in the problem description".

Patricia

Your example revealed though that the problem with an approach from a
particular node was that there could be kind of a trap hidden far back
with a steep cost, so the path from THAT node wouldn't be so great
using my algorithm.

But what about starting at another node?

So I made the leap to just starting at each node in turn, using my
algorithm to get a full path, and then comparing between to pick the
best one, which would solve your simple example, but would it work on
a bigger one?

I think it would, as I'm conjecturing now that you cannot set up a case
where from EACH node the algorithm will fail to give the optimal path,
where for a graph where the costs match perfectly with distance it
would give the optimal path from every node and I've designated such a
graph, a perfect one.

So now I'm moving to inventing terminology.

I call a graph where costs match well with distance a well correlated
graph, and notice your example is not one, as distances are all units,
so I found a way to get my distance argument back in, with
correlation.

So in my terminology your example is a dis-correlated graph, where the
ratio of starting points that give the optimal solution to points that
do not gives the degree of correlation.


James Harris
 

Patricia Shanahan

JSH wrote:
....
At each decision point you picked the shortest, so the final full path
is the shortest one.
....

This is the key difference between the NP-complete problems and problems
with known polynomial time algorithms.

There are a lot of problems that do have the property that the globally
optimal solution must be a composition of a series of locally optimal
solutions. For example, if the shortest path from X to Y includes node
A, the path from X to A must be one of the shortest X to A paths.
Polynomial time algorithms for shortest path calculation can use that
fact to prune the search.

Similarly, any subset of elements in a sorted list appear in sorted
order. Sort algorithms such as quick sort and merge sort use that fact.

In my favorite TSP example, an optimal solution must go between A and B
via C, even though that route is far more expensive than the AB edge,
and uses the most expensive edge connected to each of A and B. It has
to be done because it is the least bad way of visiting C.

A globally optimal TSP solution often requires choices that are not
locally optimal.
Lots of problems with the specifics of that thought experiment as it's
very well behaved, but maybe it can give an idea of why using two
travelers--one going forward in time while the other goes backwards in
time--is so key of an idea!

Many NP-complete problems, including TSP, have easy, well behaved cases
that can be solved in polynomial time using locally optimal decisions. I
don't think you should spend much time on those cases when designing a
general algorithm. In general, an algorithm design that is not based on
thinking about difficult cases of the problem it is supposed to solve
probably won't work for them.

The problem I have with understanding the utility of the two travelers
is seeing how they influence each other in anything other than tiny
problems. If you have e.g. 10,000 nodes, for the first 50 iterations
each traveler bars less than 1% of the choices the other traveler could
have made if you ran only one traveler. They may have already made at
least one globally suboptimal choice long before the time the number of
unvisited nodes is small enough that each traveler significantly
constrains the other.

At a more basic level, you seem to be running a simple greedy algorithm
for each traveler, and I don't see anything to backtrack, look ahead, or
otherwise ensure that locally suboptimal choices win if they are needed
for the globally optimal solution.

Patricia
 

Joshua Cranmer

That solution is to try EACH node in turn as a starting point, and
calculate the path, so it doesn't matter which one you start with, so
yeah, you could do it randomly.

That doesn't quite solve the problem: if going to node A and node B have
the same cost, what criteria should I use to choose between them?
Patricia would probably be satisfied with an answer like "the last one
you looked at" instead of saying "choose randomly".
But what about starting at another node?

That doesn't work either. Make a larger graph containing two of these
"traps" and you'll note that you can't start from both of them at the
same time. I haven't tested it myself, but your solution gets around the
problem by (essentially) stating "start in the central node of the
subgraph", which can't work if there are two subgraphs like that.
I call a graph where costs match well with distance a well correlated
graph, and notice your example is not one, as distances are all units,
so I found a way to get my distance argument back in, with
correlation.

Define "match well with distance," please.
So in my terminology your example is a dis-correlated graph, where the
ratio of starting points that give the optimal solution to points that
do not gives the degree of correlation.

In this example, the number, I suppose, is 1/4. I'm pretty sure a graph
matching my description comes out to be 0 under this scheme.
 

Joshua Cranmer

Patricia said:
At a more basic level, you seem to be running a simple greedy algorithm
for each traveler, and I don't see anything to backtrack, look ahead, or
otherwise ensure that locally suboptimal choices win if they are needed
for the globally optimal solution.

I wish to expound on this idea:

NP-hard problems are such because of the fact that one needs to look at
the entire problem to be able to infer which next step is best. To date,
the best correct algorithms more or less rely on "try all, but we know
these paths won't work."

To find a P algorithm, one would have to condense the entire global
solution space--in polynomial time--into some higher-order information
that one can then use to pick the next node in Ω(N) time (I'm tempted
to say Θ(N), but it could very well be true that the act of generating
the information produces output in such a way that the next step is Θ(1)).

If you want my opinion, I think a very good line of thought would focus
on cataloging what one gives up by NOT choosing to visit a particular
node next.

In any case, I'm beginning to also get the suspicion that I've been
killfiled by JSH... Oh well, Patricia is probably more experienced than
I in this area anyways.
 

JSH

That doesn't quite solve the problem: if going to node A and node B have
the same cost, what criteria should I use to choose between them?
Patricia would probably be satisfied with an answer like "the latter you
looked at" instead of saying "choose randomly".

That works for me!

The idea is to be as complete as possible, as I assume that there
would be cases where paths came out with equal values.
That doesn't work either. Make a larger graph containing two of these
"traps" and you'll note that you can't start from both of them at the
same time. I haven't tested it myself, but your solution gets around the
problem by (essentially) stating "start in the central node of the
subgraph", which can't work if there are two subgraphs like that.

That's a more general assumption than what she made and sounds good
intuitively but you haven't even begun to give any kind of proof.

But it is a route to a counterexample to this idea: use two "traps"
and prove that the algorithm as described cannot give an optimal
path.
Define "match well with distance," please.

Goes back to when I was arguing that wrapped up in the real world with
any TSP problem is some notion of distance. For instance, long trips
tend to cost more, so it costs more to fly to a distant city than to
drive to the supermarket.

But, as was noted you can have situations where cost has no connection
with distance, and you can have a TSP problem where no distances are
given, which lead me to the distance normalized algorithm where all
nodes are assumed to be equidistant from each other.

However, the algorithm I use at its heart is about distance
correlation, where the longer the distance between nodes the greater
the cost.

So if I'm right, it easily solves TSP when cost matches well with
distance, and has problems if it does not, like with the "traps" you
mention.

I call a graph where distance matches perfectly with cost a perfect
graph.
In this example, the number, I suppose is 1/4. I'm pretty sure a graph
matching my description comes out to be 0 under this scheme.

That would be an interesting counterexample.

I have suspicions for why your intuition is wrong in this case, but of
course, I want you to be wrong, so that doesn't say much.

But my thinking has to do with rotations in space (don't ask the
dimension), so that there is always an angle--if an optimal path
exists--that cuts through optimally, or maybe I'm just babbling.

Trouble with from scratch problem solving is it can take time, which
is why I sound differently this week than last, as last week, I'd just
come up with this approach. Kind of just came fully formed--why not
use two travelers with one going forward and one going backwards in
time?

So this entire algorithm is like a baby, and everything is fresh
ground as I slowly work my way through theories, implications, rough
guesses and complete b.s. in the hopes of getting an answer.


James Harris
 

JSH

I wish to expound on this idea:

NP-hard problems are such because of the fact that one needs to look at
the entire problem to be able to infer which next step is best. To date,
the best correct algorithms more or less rely on "try all, but we know
these paths won't work."

To find a P algorithm, one would have to condense the entire global
solution space--in polynomial time--into some higher-order information
that one can then use to pick the next node in Ù(N) of time (I'm tempted
to say è(N), but it could very well be true that the act of generating
the information produces output in such a way that the next step is è(1)).

This idea is an attempt to do just that by slicing the problem up into
a series of decision points and unraveling it, kind of like peeling an
orange.

It's sort of like, if you hit the problem at just the right angle--
here the main idea is solving the problem from TWO directions at
once!--do the decisions have a certain simple order to them?
If you want my opinion, I think a very good line of thought would focus
on cataloging what one gives up by NOT choosing to visit a particular
node next.

In any case, I'm beginning to also get the suspicion that I've been
killfiled by JSH... Oh well, Patricia is probably more experienced than
I in this area anyways.

Nope. I stay around as long as things are interesting. And read
people who make sense (and even annoying people who don't but I don't
like it). To me the discussing and arguing is a lot of the fun which
helps me figure things out very, very, rapidly, even if what I learn
is that I'm wrong.

I have a new idea. It came to me as a clever trick, and as I ponder
it more, I like it more, but it still might be crap.

But it's a great opportunity to ponder and play with something, and
that means that I may read without replying, as I find it a waste of
time to just reply and reply anyway.

The point here really is puzzling out this approach, where I'm still
telling myself, hey, at least this thing will give a round trip
solution and hit every node, so maybe down the line I program it and
see what kind of funny little graphs it makes anyway.

Ok. I've replied a lot in this thread so it's time for me to shut-up
and ponder.

I'm still not quite to programming as I keep finding myself fleshing
out the algorithm!

But I think it's done now, so maybe by next week or within a couple of
weeks I'll throw down some code.


James Harris
 

JSH

That would be a huge mistake on his part.  You're the rare bird who hasn't
killfiled JSH, never mind who takes his arguments seriously enough to respond
thoughtfully and respectfully.  If he killfiles everyone who disagrees with
him, no one will be left.

Why are you preaching to this group?

There has been a nice discussion, where yes, other thoughtful people
are disagreeing with me, and maybe you think I'm some horrible person,
but why did you choose to be one of the people who comes in and pisses
in the Kool-Aid at the party?

I am curious, why would you choose to be the one to crap in public so
that other unsuspecting people have to come here and get a dose of
hostility in the morning like I just did, when I was hoping to come in
and do a little further discussion, even if it meant agreeing that
maybe this idea isn't worth anything.

What motivates people like you to spread pain and misery around the
world?

Can you answer? Do you get off on it? Somewhere giggling to yourself
now imagining how many annoyed people are suddenly turned off from the
discussion?

People like me? I am turned off by this idiot crap from people like
you, and I know I will not killfile you as I rant instead and ponder
why, why, why?

What is wrong with you and people like you?


James Harris
 
J

Junoexpress

Why is your response so disproportionate to Lew's comment? Everything
that Lew said was true.
1) It would be a mistake to killfile someone who took your arguments
seriously,
2) there aren't a lot of these people around, and
3) it is true that if you killfiled everyone who disagreed with you, no
one would be left.
All true (albeit unpleasant to you) statements. And then you start in
with him pissing on this and crapping on that ... Methinks thou dost
protest too much.

No, the reality is that you were about to blow up anyways, and just
looking for the first opportunity. You have Patricia saying she
doesn't believe your idea will work anyways and asking why you don't
actually study the problem. But this is something you know you will
never do past a cursory Wikipedia search, so how to get out of it? Oh
yeah, start blaming *others*: how convenient.

But, hey, at least the post *was* pretty funny.

M
 
J

Joshua Cranmer

JSH said:
> That's a more general assumption than what she made and sounds good
> intuitively but you haven't even begun to give any kind of proof.
>
> But it is a route to a counterexample to this idea: use two "traps"
> and prove that the algorithm as described cannot give an optimal
> path.

In working this out in my head, I realized there are two cases where the
algorithm will generate correct pathing for one of these traps (or you
could view it as one case): either the trap node has to be the start
position, or both walkers have to hit the entrance nodes at the same
time, i.e., the trap node is the halfway-point in the path.

Let me do my best to explain the graph. All weights are assumed to be
100 if not explicitly stated. This is also going to be an undirected
graph, so all weights are symmetrical.

A "trap" is defined as a triad (1, 2, 3) where 1 has weight 10000 to all
other points (unless otherwise stated), 1000 to 2 and 3, and the 2-3
weight is 1.

The graph has two traps, (A, B, C), and (D, E, F), as well as two more
nodes G and H. The A-to-D connection has 30000. The connection between B
and E is 5, as is the connection between C and G, G and H, and H and F.
If I have appropriately assigned the weights, the best cycle is
ABEDFHGCA, which has a total cost of 4020.
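
Since the weights above are easy to mistranscribe, here is a small Java
sketch that encodes my reading of them (defaults of 100, the two trap
centers, the 5-weight chain, and the 30000 A-D edge; the class and
helper names are my own, not code from this thread) and totals the
edges of a candidate cycle:

```java
import java.util.HashMap;
import java.util.Map;

public class TrapGraphCost {
    static final String NODE_IDS = "ABCDEFGH";
    static final Map<String, Double> weights = new HashMap<>();

    // Record a symmetric (undirected) edge weight.
    static void setW(char a, char b, double w) {
        weights.put("" + a + b, w);
        weights.put("" + b + a, w);
    }

    // Look up a weight, falling back to the stated default of 100.
    static double w(char a, char b) {
        Double v = weights.get("" + a + b);
        return v != null ? v : 100.0;
    }

    static {
        // Trap centers A and D: 10000 to everything...
        for (char n : NODE_IDS.toCharArray()) {
            if (n != 'A') setW('A', n, 10000);
            if (n != 'D') setW('D', n, 10000);
        }
        // ...except 1000 to their own trap entrances.
        setW('A', 'B', 1000); setW('A', 'C', 1000);
        setW('D', 'E', 1000); setW('D', 'F', 1000);
        // Cheap edges inside and between the traps.
        setW('B', 'C', 1); setW('E', 'F', 1);
        setW('B', 'E', 5); setW('C', 'G', 5);
        setW('G', 'H', 5); setW('H', 'F', 5);
        // Tripled long edge between the trap centers.
        setW('A', 'D', 30000);
    }

    // Sum the edge weights along a cycle given as a string of node ids.
    static double cycleCost(String cycle) {
        double total = 0;
        for (int i = 0; i + 1 < cycle.length(); i++) {
            total += w(cycle.charAt(i), cycle.charAt(i + 1));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(cycleCost("ABEDFHGCA")); // prints 4020.0
    }
}
```

Under this encoding the quoted cycle pays four 1000-weight trap edges
plus four 5-weight edges.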

Should the algorithm start at A, the travelers will move to B and C,
then E and G respectively. At that point, the travelers will move to F
and H, forcing them to take the expensive 2 * 10000 to get to G, which
is obviously much more than the first path. Starting at B or C produces
worse results, as the travelers forgo the cheap connections to A.
Starting at G will move the travelers to H and C, then F and B, forgoing
the cheap connections to both A and D. Graph reflection will show that
similar results occur for the other nodes D, E, F, and H.

As a good lesson for generating similar counterexamples, this is how I
generated the graph:

I first noticed the problem of the trap, aided by Patricia's posts. Your
solution to this was to start at the central node, so a counterexample
involving traps would need to find somewhere to force you to start
elsewhere, i.e., using two traps. This was the source of my "intuition."

To find a concrete algorithm, I visualized what the counterexample graph
should look like: two traps with a minimal number of nodes. I realized
at this point that putting the traps diametrically opposed would not
work, so I inserted a node on one side to force one traveler to get
there first.

Here, I realized that might not work, since the two travelers would then
be considering the node at the same time, and I wasn't sure I could
adjust node weights properly to get them to miss in the right way, so I
added another node on the longer side to ensure that the
simultaneously-considered node was not part of the cheap connection to
the trap. Note that arbitrary numbers of nodes could be added to avoid
longer lookahead loops.

At that point, I added the weights to complete the graph. Factors of
ten, for this small a graph, were sufficient. The conditions that make
up a trap are by now well-documented, so those weights were easy; to be
sure that no one would shortcut the traps I tripled the A-D line
probably unnecessarily. I forced the traveler onto the path by lowering
those connections (the factor of ten was probably unnecessary; I
originally used two, but added more to be on the safe side).

For simple algorithms, the qualitative analysis tends to be sufficient
(unless the constraints are such that I need to watch the balance of
weights); I included quantitative values mostly to be complete and to
head off any rebuttals that it wasn't sufficient.

So if I'm right, it easily solves TSP when cost matches well with
distance, and has problems if it does not, like with the "traps" you
mention.

If I restricted myself to Euclidean graphs, I could probably come up
with counterexamples. However, I am (at this point) still working with
only my innate visualization skills, which means I am heavily relying on
qualitative analysis to reach conclusions, whereas Euclidean
constraints, most notably the triangle inequality, require me to be more
precise with numbers, as I explained above.

I have suspicions for why your intuition is wrong in this case, but of
course, I want you to be wrong, so that doesn't say much.

Everyone would love for counterexamples to be wrong. :)
The trouble with from-scratch problem solving is that it can take time,
which is why I sound different this week than last, as last week I'd
just come up with this approach. It kind of just came fully
formed--why not use two travelers, with one going forward and one
going backwards in time?

Having the two travelers move in tandem doesn't seem to be gaining much
at this point; in fact, it's probably producing the same results as a
one-traveler algorithm except as O(N^4) instead of O(N^3). It certainly
seems to be acting as such in these heavily-massaged edge cases.

I'm not sure having two travelers will ever add much; the most common
case where an algorithm involves two such travelers is the Bidirectional
Search, where its improvement over BFS (or other corresponding
unidirectional searches) comes from the fact that the algorithm is
solved by local optimization, so having two steps toward the solution
improves time by a constant factor by not considering as many next
steps. TSP, however, needs to meet each node, so no benefit can be
derived in the same way by looking at only half (very rough estimate)
the nodes.
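
For concreteness, the bidirectional search being contrasted here can be
sketched as two BFS frontiers that alternate a level at a time and stop
on first contact. This is my own illustrative Java code (the graph and
the `pathGraph` helper are invented for the example), not part of the
proposed TSP algorithm:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BiBfs {
    /**
     * Bidirectional BFS on an unweighted, undirected graph: one frontier
     * grows from the start, one from the goal, and they alternate a level
     * at a time until they meet.
     */
    static int shortestPath(Map<Integer, List<Integer>> adj, int start, int goal) {
        if (start == goal) {
            return 0;
        }
        Map<Integer, Integer> distA = new HashMap<>();
        Map<Integer, Integer> distB = new HashMap<>();
        distA.put(start, 0);
        distB.put(goal, 0);
        Deque<Integer> qA = new ArrayDeque<>();
        Deque<Integer> qB = new ArrayDeque<>();
        qA.add(start);
        qB.add(goal);
        while (!qA.isEmpty() && !qB.isEmpty()) {
            int levelSize = qA.size();
            for (int i = 0; i < levelSize; i++) {
                int u = qA.poll();
                for (int v : adj.getOrDefault(u, List.of())) {
                    if (distB.containsKey(v)) {
                        // The two frontiers have met.
                        return distA.get(u) + 1 + distB.get(v);
                    }
                    if (!distA.containsKey(v)) {
                        distA.put(v, distA.get(u) + 1);
                        qA.add(v);
                    }
                }
            }
            // Swap roles so the two searches alternate.
            Deque<Integer> tq = qA; qA = qB; qB = tq;
            Map<Integer, Integer> td = distA; distA = distB; distB = td;
        }
        return -1; // no path
    }

    /** Builds a simple path graph 0-1-...-n. */
    static Map<Integer, List<Integer>> pathGraph(int n) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int i = 0; i < n; i++) {
            adj.computeIfAbsent(i, k -> new ArrayList<>()).add(i + 1);
            adj.computeIfAbsent(i + 1, k -> new ArrayList<>()).add(i);
        }
        return adj;
    }

    public static void main(String[] args) {
        System.out.println(shortestPath(pathGraph(5), 0, 5)); // prints 5
    }
}
```

The constant-factor saving comes from each side only ever exploring
about half the path's levels, which is exactly the benefit the
paragraph above argues does not transfer to a visit-every-node problem
like TSP.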

If you wish to continue on in these probings, my recommendations would
be to explore making the two interact in more dynamic ways. For example,
one could theoretically have one traveler somehow "warn" the other
traveler that what looks like a nice downhill path is actually the
steepest way up the mountain. This would probably mean dropping the
notion that one traveler is forwards in time and the other backwards, or
dropping that the two make moves at the same time.

So this entire algorithm is like a baby, and everything is fresh
ground as I slowly work my way through theories, implications, rough
guesses and complete b.s. in the hopes of getting an answer.

A good way to continue in probing algorithms is to look at the failures
and mull over why they fail. If P=NP, that would imply that the global
nature of the problem can be reduced to some statements about local
moves. That means that every class of graph that fails the greedy TSP
would have some step in the algorithm that redistributes expected
weights in such a way that the class succeeds.

The task, then, of proving P = NP in this manner would require us to
construct a list of all cases that fail the greedy (locally optimal)
TSP, construct a second graph whose edge weights are assigned in such a
manner that a greedy solution on /that/ graph solves the original graph,
and then prove that such an algorithm works for all failure cases and
that the list of failure cases is complete. This approach is similar to
how the four color theorem was proved.

So, the list to date:

Greedy TSP Failure Classes:
1. The smallest-weight outgoing edge from node A goes to a node B, while
node C's smallest-weight incoming and outgoing edges come from node A
and go to node B, respectively, i.e., the "traps" we have repeatedly
discussed.
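
Failure class 1 is easy to reproduce with a plain nearest-neighbor
greedy (my own illustrative sketch, not the two-traveler algorithm from
this thread, and the 4-node weight matrix is a toy instance I made up):
the locally cheapest first moves lure the tour into three 1-weight
edges and then charge it a 20-weight closing edge, while the optimal
tour pays 12.

```java
public class GreedyTrap {
    // Symmetric 4-node instance where cheap-looking early edges force an
    // expensive closing edge (toy numbers, not a graph from the thread).
    static final double[][] W = {
        {  0,  1,  5, 20 },
        {  1,  0,  1,  5 },
        {  5,  1,  0,  1 },
        { 20,  5,  1,  0 },
    };

    // Greedy nearest-neighbor tour starting at node 0.
    static double nearestNeighborCost() {
        boolean[] visited = new boolean[W.length];
        int current = 0;
        visited[0] = true;
        double total = 0;
        for (int step = 1; step < W.length; step++) {
            int next = -1;
            for (int v = 0; v < W.length; v++) {
                if (!visited[v] && (next == -1 || W[current][v] < W[current][next])) {
                    next = v;
                }
            }
            total += W[current][next];
            visited[next] = true;
            current = next;
        }
        return total + W[current][0]; // close the cycle
    }

    // Exact answer: try the three distinct undirected tours from node 0.
    static double bruteForceCost() {
        int[][] orders = { {1, 2, 3}, {1, 3, 2}, {2, 1, 3} };
        double best = Double.POSITIVE_INFINITY;
        for (int[] mid : orders) {
            double c = W[0][mid[0]] + W[mid[0]][mid[1]]
                     + W[mid[1]][mid[2]] + W[mid[2]][0];
            best = Math.min(best, c);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("greedy: " + nearestNeighborCost());  // greedy: 23.0
        System.out.println("optimal: " + bruteForceCost());      // optimal: 12.0
    }
}
```

Any proposed redistribution of weights for this class would have to
make the greedy at node 2 prefer the 5-weight jump over the 1-weight
edge, which is the kind of local-move rewriting the paragraph above
describes.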
 
P

Patricia Shanahan

Joshua said:
In working this out in my head, I realized there are two cases where the
algorithm will generate correct pathing for one of these traps (or you
could view it as one case): either the trap node has to be the start
position, or both walkers have to hit the entrance nodes at the same
time, i.e., the trap node is the halfway-point in the path.
....

Here's another example, and a brute force program for finding the
minimum cost. In this case, to avoid arguments about distance, I'm
specifying only coordinates. The cost of each edge is the Java double
approximation to the Euclidean distance between the vertices.

To make everything as self-contained and simple as possible, I specify
the problem inside the program. Here's the definition for my latest example:

private static final double L = 2;
private static final double M = 3;

private static final Node[] NODES = new Node[] {
    new Node("A", -L/2, M+L/2), new Node("B", L/2, M+L/2),
    new Node("C", M+L/2, L/2), new Node("D", M+L/2, -L/2),
    new Node("E", L/2, -M-L/2), new Node("F", -L/2, -M-L/2),
    new Node("G", -M-L/2, -L/2), new Node("H", -M-L/2, L/2),
    new Node("W", -L/2, L/2), new Node("X", L/2, L/2),
    new Node("Y", L/2, -L/2), new Node("Z", -L/2, -L/2)
};

W, X, Y, and Z are the corners of an L by L square. The remaining points
are corners of L by M rectangles attached to each edge of the square.

With L=2 and M=3 the results are:

Best Score 32.0000
Best Cycle: A W H G Z F E Y D C X B A

The rest of this message is a simple TSP solving program. Although the
12-vertex case only takes a few seconds, it runs in exponential time, so
I don't recommend using it for much larger problems.

import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

public class BruteForce {
    /*
     * Problem definition.
     */
    private static final double L = 2;
    private static final double M = 3;

    private static final Node[] NODES = new Node[] {
        new Node("A", -L/2, M+L/2), new Node("B", L/2, M+L/2),
        new Node("C", M+L/2, L/2), new Node("D", M+L/2, -L/2),
        new Node("E", L/2, -M-L/2), new Node("F", -L/2, -M-L/2),
        new Node("G", -M-L/2, -L/2), new Node("H", -M-L/2, L/2),
        new Node("W", -L/2, L/2), new Node("X", L/2, L/2),
        new Node("Y", L/2, -L/2), new Node("Z", -L/2, -L/2)
    };

    public static void main(String[] args) {
        BruteForce test = new BruteForce(NODES);
        long start = System.nanoTime();
        test.solve();
        long end = System.nanoTime();
        System.out.printf("Best Score %g%n", test.bestScore);
        System.out.print("Best Cycle:");
        for (Node n : test.bestCycle) {
            System.out.print(" " + n.getId());
        }
        System.out.println();
        System.out.printf("Elapsed time: %g seconds%n", (end - start) / 1e9);
    }

    private final Set<Node> nodes = new HashSet<Node>();

    private double bestScore = Double.POSITIVE_INFINITY;

    private Deque<Node> bestCycle;

    private final Node startNode;

    public BruteForce(Node[] nodes) {
        this.nodes.addAll(Arrays.asList(nodes));
        startNode = nodes[0];
    }

    /**
     * Calculate bestScore and bestCycle.
     */
    private void solve() {
        /* Do special case handling for getting started with startNode as
         * the first and last node of the cycle.
         */
        nodes.remove(startNode);
        Deque<Node> prefix = new LinkedList<Node>();
        prefix.add(startNode);
        solve(nodes, 0, prefix);
        nodes.add(startNode);
    }

    /**
     * Calculate bestScore and bestCycle starting from a
     * specified prefix path.
     * @param available The set of nodes that are not in the prefix path
     * @param prefixScore The score for the prefix path
     * @param prefix The prefix path
     */
    private void solve(Set<Node> available,
            double prefixScore, Deque<Node> prefix) {
        if (available.isEmpty()) {
            /* Finished; the path is complete except for closing the cycle. */
            double score = prefixScore
                    + prefix.getLast().distance(prefix.getFirst());
            if (score < bestScore) {
                /* This cycle is better than the best so far. */
                bestCycle = new LinkedList<Node>();
                bestCycle.addAll(prefix);
                bestCycle.addLast(prefix.getFirst());
                bestScore = score;
            }
            return;
        }
        /* Need two copies of the available set, one to provide an iterator
         * and another that can be temporarily modified without
         * disturbing the iterator.
         */
        Set<Node> workingAvailable = new HashSet<Node>(available);
        for (Node n : available) {
            double score = prefixScore + prefix.getLast().distance(n);
            if (score < bestScore) {
                /* There is a possibility that this path will lead to the
                 * best cycle.
                 */
                workingAvailable.remove(n);
                prefix.addLast(n);
                solve(workingAvailable, score, prefix);
                prefix.removeLast();
                workingAvailable.add(n);
            }
        }
    }

    static class Node {
        private String id;
        private double x;
        private double y;

        /**
         * Create node.
         * @param id Identifier-like string labeling the node.
         * @param x X coordinate.
         * @param y Y coordinate.
         */
        Node(String id, double x, double y) {
            this.id = id;
            this.x = x;
            this.y = y;
        }

        String getId() {
            return id;
        }

        double distance(Node o) {
            double xDist = x - o.x;
            double yDist = y - o.y;
            return Math.sqrt(xDist * xDist + yDist * yDist);
        }
    }
}
 
