Distance normalized TSP algorithm

Discussion in 'Java' started by JSH, Jul 25, 2008.

  1. JSH

    JSH Guest

    I came up with this idea for figuring out a path with the Traveling
    Salesman Problem and posted about it, and an objection that came up
    was the the algorithm required the distance between each of the nodes,
    while the standard problem does not require this information.

    In reaction to that objection I got upset, but now after taking some
    time to think and ponder the problem I found I could handle the issue
    easily so that a new distance normalized algorithm could handle any
    TSP setup.

    The simple idea is that if there are m nodes, and there is no distance
    information between nodes given, then they are to be considered to be
    embedded in an m-1 dimensional space equidistant from each other.

    Then the algorithm proceeds as before, where you have two travelers
    T_1 and T_2, where I'll talk through a single weight, but you can have
    multiple weights. With a single weight between nodes, I'll use cost
    to keep the idea closer to the standard way of stating the problem,
    and also so that they complete a round trip circuit, I'll say they
    begin from the same node N_1.

    To use the algorithm and take them to the next nodes, T_1 considers
    moving to another node, which I'll say is N_2.

    Then T_2 considers moving to all other nodes available, not currently
    under consideration by T_1, and not already visited, so T_2 might
    start with N_3 and proceed from there, and calculates the
    cost*distance from each path BACK to N_1 because T_2 is moving
    BACKWARDS, while T_1 is moving forwards.

    Notice that with the distance normalized algorithm you just have cost
    considered as distance=1. Also if there are multiple weights you
    simply multiply by each weight, so if there were a cost and a time for
    each path, you'd have cost*time*distance and so forth.

    T_2 considers every available move back to N_1, and selects the one
    with the least value, and returns that information.

    T_1 stores that info and now considers moving to N_3, and T_2 now
    calculates for all available nodes again, excluding the one under
    consideration or already visited, so now T_2 could start with N_2 and
    loop through all available, and again T_2 picks the lowest cost move
    BACK to N_1 and returns that to T_1.

    In this way, all possible moves by T_1 and T_2 are considered, and
    after T_1 has gotten values back from T_2 for each possible move, T_1
    selects the lowest cost move, and then BOTH T_1 and T_2 make their
    moves, and T_1 moves forward to the node that fits that least path,
    while T_2 moves BACKWARD to the node that does, where those will be
    different nodes and there is a rule that T_1 and T_2 can never move to
    the same node.

    A special case occurs if there are only 3 nodes left, meaning only one
    is unvisited, and in this case the algorithm arbitrarily has T_1 move
    forward to that node.

    Notice that when the algorithm exits you have a path taken by T_1 and
    a path taken by T_2, but you collapse that into a single journey by a
    single traveler moving along the path of T_1, and then continuing
    forward along the path of T_2, where to conceptualize it, T_2 is the
    same traveler moving backwards in time to meet himself moving
    forwards.

    I am prepared to step through a simple example if that is necessary
    but would not mind if someone else might help out in that regard, as
    I'm curious to know if ANYONE else in the world can understand this
    algorithm!!!

    It seems so simple to me but often it's hard to explain something new
    to people so I understand that it may take some time and patience for
    others to know how to do the algorithm and I apologize for getting
    frustrated and angry before, versus just coming up with the normalized
    algorithm earlier.

    Oh yeah, I'm still angling for coders if I can get some people
    interested in this thing, as I have a Google Code project called
    optimalpathengine.


    James Harris
    JSH, Jul 25, 2008
    #1
    1. Advertising

  2. JSH wrote:
    ....
    > The simple idea is that if there are m nodes, and there is no distance
    > information between nodes given, then they are to be considered to be
    > embedded in an m-1 dimensional space equidistant from each other.


    I believe that my original example is well-formulated according to the
    rules you are now using. To save everyone digging back through the
    depths of the previous thread, I'll repeat it here:

    "Cities A, B, C, D, E. I'm assuming symmetrical costs - getting from A
    to B is the same cost as B to A.

    C,A is 1000. C,B is 1000. C,D is 2000. C,E is 2000. All other costs are 1."

    > Then the algorithm proceeds as before, where you have two travelers
    > T_1 and T_2, where I'll talk through a single weight, but you can have
    > multiple weights. With a single weight between nodes, I'll use cost
    > to keep the idea closer to the standard way of stating the problem,
    > and also so that they complete a round trip circuit, I'll say they
    > begin from the same node N_1.


    You don't specify any criteria for picking the first node, so I assume
    the algorithm is supposed to work regardless of that choice. Suppose
    they start, in my example, at A.

    >
    > To use the algorithm and take them to the next nodes, T_1 considers
    > moving to another node, which I'll say is N_2.
    >
    > Then T_2 considers moving to all other nodes available, not currently
    > under consideration by T_1, and not already visited, so T_2 might
    > start with N_3 and proceed from there, and calculates the
    > cost*distance from each path BACK to N_1 because T_2 is moving
    > BACKWARDS, while T_1 is moving forwards.
    >
    > Notice that with the distance normalized algorithm you just have cost
    > considered as distance=1. Also if there are multiple weights you
    > simply multiply by each weight, so if there were a cost and a time for
    > each path, you'd have cost*time*distance and so forth.
    >
    > T_2 considers every available move back to N_1, and selects the one
    > with the least value, and returns that information.
    >
    > T_1 stores that info and now considers moving to N_3, and T_2 now
    > calculates for all available nodes again, excluding the one under
    > consideration or already visited, so now T_2 could start with N_2 and
    > loop through all available, and again T_2 picks the lowest cost move
    > BACK to N_1 and returns that to T_1.


    As far as I can tell, this procedure would result in T_1 and T_2 moving
    from A to two distinct nodes chosen from {B, D, E}. Each of those moves
    has cost 1. However, any path that contains e.g. both AB and DA would
    not be able to use AC. Since C has only two cost 1000 edges, not using
    one of them forces use of at least one of the cost 2000 edges. Such a
    path would have cost either 3003 or 4003.

    If I am misinterpreting your algorithm, please supply pseudo-code or
    other specification, including how you choose the starting node.

    An optimal path, such as ACBDEA, has total cost 2003, but that can only
    be achieved if T_1 or T_2, starting at A, choses to go to C, at a cost
    of 1000, rather than to B, D, or E at a cost of 1. How would either T_1
    or T_2 know that a cost 1000 move is better for the overall solution
    than a cost 1 move?

    The core difficulty in the NP-complete problems is that making a series
    of locally optimal choices cannot be depended on to lead to a globally
    optimal solution.

    Patricia
    Patricia Shanahan, Jul 25, 2008
    #2
    1. Advertising

  3. JSH wrote:
    > In this way, all possible moves by T_1 and T_2 are considered, and
    > after T_1 has gotten values back from T_2 for each possible move, T_1
    > selects the lowest cost move, and then BOTH T_1 and T_2 make their
    > moves, and T_1 moves forward to the node that fits that least path,
    > while T_2 moves BACKWARD to the node that does, where those will be
    > different nodes and there is a rule that T_1 and T_2 can never move to
    > the same node.


    As Patricia points out, this breaks down in at least one case. The case
    is this (graphically put):
    ¢
    A --- B
    $ / \ $ / \ $
    / \$/ \
    * --- C --- *
    $$$ $$$

    In short, if C is a node with expensive connections to all but A and B
    (which are of moderate expense), while A and B both have cheap
    connections to everywhere but C, and even cheaper between the two of
    them, then the traveler will see get to one of A or B, find the cheapest
    one to be the other one, move to it, and then move back out into the
    rest of the graph, cutting off the cheapest connections to C.

    The example Patricia gave violated the Triangle Inequality, but looking
    at the example I have, it is very likely that it fails even if you
    assume the Triangle Inequality (I haven't done any analysis as extensive
    of Patricia).

    This, by the way, illustrates the danger of NP-hard problems: when you
    come to a choice, the "best" solution at the onset can lead you into
    pitfalls. Which means that you have to take into account not only the
    cost of moving from one node to another but the loss of savings in not
    moving to a third one (if that makes any sense).

    > A special case occurs if there are only 3 nodes left, meaning only one
    > is unvisited, and in this case the algorithm arbitrarily has T_1 move
    > forward to that node.


    Three unvisited nodes or three nodes including the ones T_1 and T_2 are on?

    > I am prepared to step through a simple example if that is necessary
    > but would not mind if someone else might help out in that regard, as
    > I'm curious to know if ANYONE else in the world can understand this
    > algorithm!!!


    One exclamation point suffices. I understand it sufficiently to reason
    on it, and I could probably code it up reasonably fast if I cared to.
    It's still ill-formed: you don't mention what to do in case of ties.

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 25, 2008
    #3
  4. JSH

    JSH Guest

    On Jul 24, 9:05 pm, Patricia Shanahan <> wrote:
    > JSH wrote:
    >
    > ...
    >
    > > The simple idea is that if there are m nodes, and there is no distance
    > > information between nodes given, then they are to be considered to be
    > > embedded in an m-1 dimensional space equidistant from each other.

    >
    > I believe that my original example is well-formulated according to the
    > rules you are now using. To save everyone digging back through the
    > depths of the previous thread, I'll repeat it here:


    Thanks!

    > "Cities A, B, C, D, E. I'm assuming symmetrical costs - getting from A
    > to B is the same cost as B to A.
    >
    > C,A is 1000. C,B is 1000. C,D is 2000. C,E is 2000. All other costs are 1.."
    >
    > > Then the algorithm proceeds as before, where you have two travelers
    > > T_1 and T_2, where I'll talk through a single weight, but you can have
    > > multiple weights.  With a single weight between nodes, I'll use cost
    > > to keep the idea closer to the standard way of stating the problem,
    > > and also so that they complete a round trip circuit, I'll say they
    > > begin from the same node N_1.

    >
    > You don't specify any criteria for picking the first node, so I assume
    > the algorithm is supposed to work regardless of that choice. Suppose
    > they start, in my example, at A.


    Good catch. That is just a miss on my part, but it's easily fixable,
    by saying that a path is calculated for each possible starting node in
    turn. The algorithm is still polynomial time with that change.

    As that's fairly easy I'd like to focus first on getting a full path
    from an arbitrary start at just any node, say, chosen randomly.

    >
    > > To use the algorithm and take them to the next nodes, T_1 considers
    > > moving to another node, which I'll say is N_2.

    >
    > > Then T_2 considers moving to all other nodes available, not currently
    > > under consideration by T_1, and not already visited, so T_2 might
    > > start with N_3 and proceed from there, and calculates the
    > > cost*distance from each path BACK to N_1 because T_2 is moving
    > > BACKWARDS, while T_1 is moving forwards.


    I realized an omission in my primary algorithm, as I simply forget
    about the cost for T_1 from N_1 to the node being considered, which is
    kind of an odd thing to forget.

    The solution and correction to the algorithm is that cost is
    multiplied as well.

    So if cost from N_1 to N_2 is cost_1 for T_1 and cost from N_1 to N_3
    is cost_2 for T_2, then the calculation is cost_1*cost_2*distance,
    where here distance is 1, so it is cost_1*cost_2.

    >
    > > Notice that with the distance normalized algorithm you just have cost
    > > considered as distance=1.  Also if there are multiple weights you
    > > simply multiply by each weight, so if there were a cost and a time for
    > > each path, you'd have cost*time*distance and so forth.


    With the correction you'd have two weight values for each, so

    cost_1*cost_2*time_1*time_2*distance

    > > T_2 considers every available move back to N_1, and selects the one
    > > with the least value, and returns that information.


    If more than one path has the least value then you randomly pick.

    > > T_1 stores that info and now considers moving to N_3, and T_2 now
    > > calculates for all available nodes again, excluding the one under
    > > consideration or already visited, so now T_2 could start with N_2 and
    > > loop through all available, and again T_2 picks the lowest cost move
    > > BACK to N_1 and returns that to T_1.

    >
    > As far as I can tell, this procedure would result in T_1 and T_2 moving
    > from A to two distinct nodes chosen from {B, D, E}. Each of those moves
    > has cost 1. However, any path that contains e.g. both AB and DA would
    > not be able to use AC. Since C has only two cost 1000 edges, not using
    > one of them forces use of at least one of the cost 2000 edges. Such a
    > path would have cost either 3003 or 4003.


    Oh, that's not good. I'll need to consider this carefully later.
    Kind of in a rush this morning...

    > If I am misinterpreting your algorithm, please supply pseudo-code or
    > other specification, including how you choose the starting node.
    >
    > An optimal path, such as ACBDEA, has total cost 2003, but that can only
    > be achieved if T_1 or T_2, starting at A, choses to go to C, at a cost
    > of 1000, rather than to B, D, or E at a cost of 1. How would either T_1
    > or T_2 know that a cost 1000 move is better for the overall solution
    > than a cost 1 move?
    >
    > The core difficulty in the NP-complete problems is that making a series
    > of locally optimal choices cannot be depended on to lead to a globally
    > optimal solution.
    >
    > Patricia


    Thanks for the reply! I'll have to consider it in detail later as I'm
    in a rush this morning, but wanted to add the correction.

    If you are correct then that kills the usefulness of the idea for TSP,
    but doesn't mean I won't still program the algorithm out of curiosity.


    James Harris
    JSH, Jul 25, 2008
    #4
  5. JSH wrote:
    > Good catch. That is just a miss on my part, but it's easily fixable,
    > by saying that a path is calculated for each possible starting node in
    > turn. The algorithm is still polynomial time with that change.


    Keeping score, that change should make it O(N^4):
    for each starting node
    for each node in path (actually N/2, but ignoring constants)
    for each possible next node for T_1
    for each possible previous node for T_2

    > So if cost from N_1 to N_2 is cost_1 for T_1 and cost from N_1 to N_3
    > is cost_2 for T_2, then the calculation is cost_1*cost_2*distance,
    > where here distance is 1, so it is cost_1*cost_2.


    Forgive me for asking, but why multiply the two costs instead of adding
    them together?

    > If more than one path has the least value then you randomly pick.


    Answers my question in other post, but still a disconcerting answer,
    since (as I explained again other there) you lose something by moving to
    a node, unlike other working greedy algorithms where "moving" to a node
    is only expanding a search tree.

    Taking this time to expound on what I think the flaw in the algorithm
    is: you're ignoring the opportunity cost of making a move.

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 25, 2008
    #5
  6. JSH

    Christian Guest

    JSH schrieb:
    > I came up with this idea for figuring out a path with the Traveling
    > Salesman Problem and posted about it, and an objection that came up
    > was the the algorithm required the distance between each of the nodes,
    > while the standard problem does not require this information.
    >
    > In reaction to that objection I got upset, but now after taking some
    > time to think and ponder the problem I found I could handle the issue
    > easily so that a new distance normalized algorithm could handle any
    > TSP setup.
    >
    > The simple idea is that if there are m nodes, and there is no distance
    > information between nodes given, then they are to be considered to be
    > embedded in an m-1 dimensional space equidistant from each other.


    Don't do this ... its just getting more complicated..
    Just take Euklidean TSP ...
    Embed you m nodes in a 2 dimensional plane ..

    You get your distances as weights and everything is fine ...
    The Euclidean TSP is still NP hard. So it will solve normal TSP if you
    can solve it.

    I know people said you would have to create this data yourself in the
    previous thread... but that was only in general correct not in
    particular for this problem... there is no need to as Euclidean TSP has
    been prooven to be NP hard 30 years ago .. (also some good
    approximations exist to solve it)



    Christian
    Christian, Jul 25, 2008
    #6
  7. JSH wrote:
    ....
    > As that's fairly easy I'd like to focus first on getting a full path
    > from an arbitrary start at just any node, say, chosen randomly.

    ....

    Unless you are aiming for a probabilistic algorithm, which may get
    different results for the same problem on different runs, don't pick the
    start node at random. Make up some rule, no matter how arbitrary.

    The one I used was "First vertex mentioned in the problem description".

    Patricia
    Patricia Shanahan, Jul 25, 2008
    #7
  8. JSH

    JSH Guest

    On Jul 25, 8:21 am, Christian <> wrote:
    > JSH schrieb:
    >
    > > I came up with this idea for figuring out a path with the Traveling
    > > Salesman Problem and posted about it, and an objection that came up
    > > was the the algorithm required the distance between each of the nodes,
    > > while the standard problem does not require this information.

    >
    > > In reaction to that objection I got upset, but now after taking some
    > > time to think and ponder the problem I found I could handle the issue
    > > easily so that a new distance normalized algorithm could handle any
    > > TSP setup.

    >
    > > The simple idea is that if there are m nodes, and there is no distance
    > > information between nodes given, then they are to be considered to be
    > > embedded in an m-1 dimensional space equidistant from each other.

    >
    > Don't do this ... its just getting more complicated..
    > Just take Euklidean TSP ...
    > Embed you m nodes in a 2 dimensional plane ..
    >
    > You get your distances as weights and everything is fine ...
    > The Euclidean TSP is still NP hard. So it will solve normal TSP if you
    > can solve it.


    That's what I thought to do at first as well, which was last week, but
    after pondering the problem for some days I decided that an equal
    weight for distance along all nodes was preferable, if no distance
    between nodes is given as then one can assume that distance is equally
    weighted or wouldn't it be included?

    And besides it's easy.

    The m-1 dimensional space provides an easy solution to the issue, and
    equal weights distance.

    > I know people said you would have to create this data yourself in the
    > previous thread... but that was only in general correct not in
    > particular for this problem... there is no need to as Euclidean TSP has
    > been prooven to be NP hard 30 years ago .. (also some good
    > approximations exist to solve it)
    >
    > Christian


    I think it's more fun now to go after the general problem, though the
    concept I'm using is easier I think to understand with the Euclidean,
    as distance is so key to it, so I'll babble on a bit here on that
    issue though now I'll get more speculative.

    I think it's provable to solve the Euclidean TSP by considering in a 2-
    D space a ring going outward from a start of just two nodes with the
    forward traveler on one and the backwards traveler on another, where
    these nodes are well behaved in that you can expand a circle out and
    get only 4 new nodes at a time, and the weight is the distance, and
    then consider the decision points of picking the best path, where also
    the best path tends to go in one direction--so I'm cheating a bit to
    make it easier to conceptualize.

    Then the idea I have is like chewing at the problem from both ends, as
    each traveler gets two nodes at a time and there must be a best path
    between those two, so you move the travelers out and expand your
    circle to get 4 more nodes and do so until you have all nodes. Key
    here is that one traveler is moving forward in time while the other is
    moving backwards so at the end you collapse your solution to one
    traveler moving forward from the starting point to the end.

    AT each decision point you picked the shortest so the final full path
    is the shortest one.

    Lots of problems with the specifics of that thought experiment as it's
    very well behaved, but maybe it can give an idea of why using two
    travelers--one going forward in time while the other goes backwards in
    time--is so key of an idea!

    Without that, if you have one traveler in the middle of your graph on
    one node and expand your circle to have just two more nodes, how do
    you get to an end? How do you get to your beginning?

    My idea allows you to start in the middle, and work your way out from
    BOTH sides.


    James Harris
    JSH, Jul 26, 2008
    #8
  9. JSH

    JSH Guest

    On Jul 25, 12:40 pm, Patricia Shanahan <> wrote:
    > JSH wrote:
    >
    > ...> As that's fairly easy I'd like to focus first on getting a full path
    > > from an arbitrary start at just any node, say, chosen randomly.

    >
    > ...
    >
    > Unless you are aiming for a probabilistic algorithm, which may get
    > different results for the same problem on different runs, don't pick the
    > start node at random. Make up some rule, no matter how arbitrary.


    Oh, the thing is you made an excellent point of where to start, and
    gave a great example which forced me to consider how a given start
    could break down with the basic algorithm, which caused me to consider
    a solution to that problem.

    That solution is to try EACH node in turn as a starting point, and
    calculate the path, so it doesn't matter which one your start with, so
    yeah, you could do it randomly.

    I wasn't exactly to that solution this morning but realized it a
    little bit later on in the day.

    > The one I used was "First vertex mentioned in the problem description".
    >
    > Patricia


    Your example revealed though that the problem with an approach from a
    particular node was that there could be kind of a trap hidden far back
    with a steep cost, so the path from THAT node wouldn't be so great
    using my algorithm.

    But what about starting at another node?

    So I made the leap to just starting at each node in turn, using my
    algorithm to get a full path, and then comparing between to pick the
    best one, which would solve your simple example, but would it work on
    a bigger one?

    I think it would, as I'm conjecturing now that you cannot setup a case
    where from EACH node the algorithm will fail to give the optimal path,
    where for a graph where the costs match perfectly with distance it
    would give the optimal path from every node and I've designated such a
    graph, a perfect one.

    So now I'm moving to inventing terminology.

    I call a graph where costs match well with distance a well correlated
    graph, and notice your example is not one, as distances are all units,
    so I found a way to get my distance argument back in, with
    correlation.

    So in my terminology your example is a dis-correlated graph, where the
    ratio of starting points that give the optimal solution to points that
    do not gives the degree of correlation.


    James Harris
    JSH, Jul 26, 2008
    #9
  10. JSH wrote:
    ....
    > AT each decision point you picked the shortest so the final full path
    > is the shortest one.

    ....

    This is the key difference between the NP-complete problems and problems
    with known polynomial time algorithms.

    There are a lot of problems that do have the property that the globally
    optimal solution must be a composition of a series of locally optimal
    solutions. For example, if the shortest path from X to Y includes node
    A, the path from X to A must be one of the shortest X to A paths.
    Polynomial time algorithms for shortest path calculation can use that
    fact to prune the search.

    Similarly, any subset of elements in a sorted list appear in sorted
    order. Sort algorithms such as quick sort and merge sort use that fact.

    In my favorite TSP example, an optimal solution must go between A and B
    via C, even though that route is far more expensive than the AB edge,
    and uses the most expensive edge connected to each of A and B. It has
    to be done because it is the least bad way of visiting C.

    A globally optimal TSP solution often requires choices that are not
    locally optimal.

    > Lots of problems with the specifics of that thought experiment as it's
    > very well behaved, but maybe it can give an idea of why using two
    > travelers--one going forward in time while the other goes backwards in
    > time--is so key of an idea!


    Many NP-complete problems, including TSP, have easy, well behaved cases
    that can be solved in polynomial time using locally optimal decisions. I
    don't think you should spend much time on those cases when designing a
    general algorithm. In general, an algorithm design that is not based on
    thinking about difficult cases of the problem it is supposed to solve
    probably won't work for them.

    The problem I have with understanding the utility of the two travelers
    is seeing how they influence each other in anything other than tiny
    problems. If you have e.g. 10,000 nodes, for the first 50 iterations
    each traveler bars less than 1% of the choices the other traveler could
    have made if you ran only one traveler. They may have already made at
    least one globally suboptimal choice long before the time the number of
    unvisited nodes is small enough that each traveler significantly
    constrains the other.

    At a more basic level, you seem to be running a simple greedy algorithm
    for each traveler, and I don't see anything to backtrack, look ahead, or
    otherwise ensure that locally suboptimal choices win if they are needed
    for the globally optimal solution.

    Patricia
    Patricia Shanahan, Jul 26, 2008
    #10
  11. JSH

    Guest

    On Jul 25, 8:54 pm, JSH <> wrote:
    >
    > AT each decision point you picked the shortest so the final full path
    > is the shortest one.
    >

    Hi, James! I see you're still very stupid!
    , Jul 26, 2008
    #11
  12. JSH wrote:
    > On Jul 25, 12:40 pm, Patricia Shanahan <> wrote:
    >> JSH wrote:
    >>
    >> ...> As that's fairly easy I'd like to focus first on getting a full path
    >>> from an arbitrary start at just any node, say, chosen randomly.

    >> ...
    >>
    >> Unless you are aiming for a probabilistic algorithm, which may get
    >> different results for the same problem on different runs, don't pick the
    >> start node at random. Make up some rule, no matter how arbitrary.


    > That solution is to try EACH node in turn as a starting point, and
    > calculate the path, so it doesn't matter which one your start with, so
    > yeah, you could do it randomly.


    That doesn't quite solve the problem: if going to node A and node B have
    the same cost, what criteria should I use to choose between them?
    Patricia would probably be satisfied with an answer like "the latter you
    looked at" instead of saying "choose randomly".

    > But what about starting at another node?


    That doesn't work either. Make a larger graph containing two of these
    "traps" and you'll note that you can't start from both of them at the
    same time. I haven't tested it myself, but your solution gets around the
    problem by (essentially) stating "start in the central node of the
    subgraph", which can't work if there are two subgraphs like that.

    > I call a graph where costs match well with distance a well correlated
    > graph, and notice your example is not one, as distances are all units,
    > so I found a way to get my distance argument back in, with
    > correlation.


    Define "match well with distance," please.

    > So in my terminology your example is a dis-correlated graph, where the
    > ratio of starting points that give the optimal solution to points that
    > do not gives the degree of correlation.


    In this example, the number, I suppose is 1/4. I'm pretty sure a graph
    matching my description comes out to be 0 under this scheme.

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 26, 2008
    #12
  13. Patricia Shanahan wrote:
    > At a more basic level, you seem to be running a simple greedy algorithm
    > for each traveler, and I don't see anything to backtrack, look ahead, or
    > otherwise ensure that locally suboptimal choices win if they are needed
    > for the globally optimal solution.


    I wish to expound on this idea:

    NP-hard problems are such because of the fact that one needs to look at
    the entire problem to be able to infer which next step is best. To date,
    the best correct algorithms more or less rely on "try all, but we know
    these paths won't work."

    To find a P algorithm, one would have to condense the entire global
    solution space--in polynomial time--into some higher-order information
    that one can then use to pick the next node in Ω(N) of time (I'm tempted
    to say θ(N), but it could very well be true that the act of generating
    the information produces output in such a way that the next step is θ(1)).

    If you want my opinion, I think a very good line of thought would focus
    on cataloging what one gives up by NOT choosing to visit a particular
    node next.

    In any case, I'm beginning to also get the suspicion that I've been
    killfiled by JSH... Oh well, Patricia is probably more experienced than
    I in this area anyways.
    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 26, 2008
    #13
  14. JSH

    JSH Guest

    On Jul 25, 6:46 pm, Joshua Cranmer <> wrote:
    > JSH wrote:
    > > On Jul 25, 12:40 pm, Patricia Shanahan <> wrote:
    > >> JSH wrote:

    >
    > >> ...> As that's fairly easy I'd like to focus first on getting a full path
    > >>> from an arbitrary start at just any node, say, chosen randomly.
    > >> ...

    >
    > >> Unless you are aiming for a probabilistic algorithm, which may get
    > >> different results for the same problem on different runs, don't pick the
    > >> start node at random. Make up some rule, no matter how arbitrary.

    > > That solution is to try EACH node in turn as a starting point, and
    > > calculate the path, so it doesn't matter which one your start with, so
    > > yeah, you could do it randomly.

    >
    > That doesn't quite solve the problem: if going to node A and node B have
    > the same cost, what criteria should I use to choose between them?
    > Patricia would probably be satisfied with an answer like "the latter you
    > looked at" instead of saying "choose randomly".


    That works for me!

    The idea is to be as complete as possible, as I assume that there
    would be cases where paths came out with equal values.

    > > But what about starting at another node?

    >
    > That doesn't work either. Make a larger graph containing two of these
    > "traps" and you'll note that you can't start from both of them at the
    > same time. I haven't tested it myself, but your solution gets around the
    > problem by (essentially) stating "start in the central node of the
    > subgraph", which can't work if there are two subgraphs like that.


    That's a more general assumption than what she made and sounds good
    intuitively but you haven't even begun to give any kind of proof.

    But it is a route to a counterexample to this idea: use two "traps"
    and prove that the algorithm as described cannot give an optimal
    path.

    > > I call a graph where costs match well with distance a well correlated
    > > graph, and notice your example is not one, as distances are all units,
    > > so I found a way to get my distance argument back in, with
    > > correlation.

    >
    > Define "match well with distance," please.


    Goes back to when I was arguing that wrapped up in the real world with
    any TSP problem is some notion of distance. For instance, long trips
    tend to cost more, so it costs more to fly to a distant city than to
    drive to the supermarket.

    But, as was noted you can have situations where cost has no connection
    with distance, and you can have a TSP problem where no distances are
    given, which lead me to the distance normalized algorithm where all
    nodes are assumed to be equidistant from each other.

    However, the algorithm I use at its heart is about distance
    correlation, where the longer the distance between nodes the greater
    the cost.

    So if I'm right, it easily solves TSP when cost matches well with
    distance, and has problems if it does not, like with the "traps" you
    mention.

    I call a graph where distance matches perfectly with cost, a perfect
    graph.

    > > So in my terminology your example is a dis-correlated graph, where the
    > > ratio of starting points that give the optimal solution to points that
    > > do not gives the degree of correlation.

    >
    > In this example, the number, I suppose is 1/4. I'm pretty sure a graph
    > matching my description comes out to be 0 under this scheme.


    That would be an interesting counterexample.

    I have suspicions for why your intuition is wrong in this case, but of
    course, I want you to be wrong, so that doesn't say much.

    But my thinking has to do with rotations in space (don't ask the
    dimension), so that there is always an angle--if an optimal path
    exists--that cuts through optimally, or maybe I'm just babbling.

    Trouble with from scratch problem solving is it can take time, which
    is why I sound differently this week than last, as last week, I'd just
    come up with this approach. Kind of just came fully formed--why not
    use two travelers with one going forward and one going backwards in
    time?

    So this entire algorithm is like a baby, and everything is fresh
    ground as I slowly work my way through theories, implications, rough
    guesses and complete b.s. in the hopes of getting an answer.


    James Harris
    JSH, Jul 26, 2008
    #14
  15. JSH

    JSH Guest

    On Jul 25, 6:57 pm, Joshua Cranmer <> wrote:
    > Patricia Shanahan wrote:
    > > At a more basic level, you seem to be running a simple greedy algorithm
    > > for each traveler, and I don't see anything to backtrack, look ahead, or
    > > otherwise ensure that locally suboptimal choices win if they are needed
    > > for the globally optimal solution.

    >
    > I wish to expound on this idea:
    >
    > NP-hard problems are such because of the fact that one needs to look at
    > the entire problem to be able to infer which next step is best. To date,
    > the best correct algorithms more or less rely on "try all, but we know
    > these paths won't work."
    >
    > To find a P algorithm, one would have to condense the entire global
    > solution space--in polynomial time--into some higher-order information
    > that one can then use to pick the next node in Ù(N) of time (I'm tempted
    > to say è(N), but it could very well be true that the act of generating
    > the information produces output in such a way that the next step is è(1)).


    This idea is an attempt to do just that by slicing the problem up into
    a series of decision points and unraveling it, kind of like peeling an
    orange.

    It's sort of like, if you hit the problem at just the right angle--
    here the main idea is solving the problem from TWO directions at
    once!--do the decisions have a certain simple order to them?

    > If you want my opinion, I think a very good line of thought would focus
    > on cataloging what one gives up by NOT choosing to visit a particular
    > node next.
    >
    > In any case, I'm beginning to also get the suspicion that I've been
    > killfiled by JSH... Oh well, Patricia is probably more experienced than
    > I in this area anyways.


    Nope. I stay around as long as things are interesting. And read
    people who make sense (and even annoying people who don't but I don't
    like it). To me the discussing and arguing is a lot of the fun which
    helps me figure things out very, very, rapidly, even if what I learn
    is that I'm wrong.

    I have a new idea. It came to me as a clever trick, and as I ponder
    it more, I like it more, but it still might be crap.

    But it's a great opportunity to ponder and play with something, and
    that means that I may read without replying, as I find it a waste of
    time to just reply and reply anyway.

    The point here really is puzzling out this approach, where I'm still
    telling myself, hey, at least this thing will give a round trip
    solution and hit every node, so maybe down the line I program it and
    see what kind of funny little graphs it makes anyway.

    Ok. I've replied a lot in this thread so it's time for me to shut-up
    and ponder.

    I'm still not quite to programming as I keep finding myself fleshing
    out the algorithm!

    But I think it's done now, so maybe by next week or within a couple of
    weeks I'll throw down some code.


    James Harris
    JSH, Jul 26, 2008
    #15
  16. JSH

    JSH Guest

    On Jul 26, 6:57 am, Lew <com.lewscanon@lew> wrote:
    > Joshua Cranmer wrote:
    > > In any case, I'm beginning to also get the suspicion that I've been
    > > killfiled by JSH...

    >
    > That would be a huge mistake on his part.  You're the rare bird who hasn't
    > killfiled JSH, never mind who takes his arguments seriously enough to respond
    > thoughtfully and respectfully.  If he killfiles everyone who disagrees with
    > him, no one will be left.
    >
    > --
    > Lew


    Why are you preaching to this group?

    There has been a nice discussion, where yes, other thoughtful people
    are disagreeing with me, and maybe you think I'm some horrible person,
    but why did you choose to be one of the people who comes in and pisses
    in the Kool-Aid at the party?

    I am curious, why would you choose to be the one to crap in public so
    that other unsuspecting people have to come here and get a dose of
    hostility in the morning like I just did, when I was hoping to come in
    and do a little further discussion, even if it meant agreeing that
    maybe this idea isn't worth anything.

    What motivates people like you to spread pain and misery around the
    world?

    Can you answer? Do you get off on it? Somewhere giggling to yourself
    now imagining how many annoyed people are suddenly turned off from the
    discussion?

    People like me? I am turned off from this idiot crap from people like
    you and know I will not killfile you as I rant instead and ponder why,
    why, why?

    What is wrong with you and people like you?


    James Harris
    JSH, Jul 26, 2008
    #16
  17. JSH

    Junoexpress Guest

    On Jul 26, 10:06 am, JSH <> wrote:
    > On Jul 26, 6:57 am, Lew <com.lewscanon@lew> wrote:
    >
    > > Joshua Cranmer wrote:
    > > > In any case, I'm beginning to also get the suspicion that I've been
    > > > killfiled by JSH...

    >
    > > That would be a huge mistake on his part. You're the rare bird who hasn't
    > > killfiled JSH, never mind who takes his arguments seriously enough to respond
    > > thoughtfully and respectfully. If he killfiles everyone who disagrees with
    > > him, no one will be left.

    >
    > > --
    > > Lew

    >

    Why is your response so disproportionately out of response with Lew's
    comment? Everything that Lew said was true.
    1) It would be a mistake to killfile someone who took your arguments
    seriously,
    2) there aren't a lot of these people around
    3) it is true that if you killfiled eveyone who disagreed with you, no
    one would be left.
    All true (albeit, unpleasant to you) statements. And then you start in
    with him pissing on this and crapping on that ... Methinks thou dost
    protest too much.

    No, the reality is that you were about to blow up anyways, and just
    looking for the first opportunity. You have Patricia saying she
    doesn't believe your idea will work anyways and asking why you don't
    actually study the problem. But this is something you know you will
    never do past a cursory Wikipedia search, so how to get out of it? Oh
    yeah, start blaming *others*: how convenient.

    But, hey, at least the post *was* pretty funny.

    M
    Junoexpress, Jul 27, 2008
    #17
  18. JSH wrote:
    > That's a more general assumption than what she made and sounds good
    > intuitively but you haven't even begun to give any kind of proof.
    >
    > But it is a route to a counterexample to this idea: use two "traps"
    > and prove that the algorithm as described cannot give an optimal
    > path.


    In working this out in my head, I realized there are two cases where the
    algorithm will generate correct pathing for one of these traps (or you
    could view it as one case): either the trap node has to be the start
    position, or both walkers have to hit the entrance nodes at the same
    time, i.e., the trap node is the halfway-point in the path.

    Let me do the best to explain the graph. All weights are assumed to be
    100 if not explicitly stated. This is also going to be an undirected
    graph, so all weights are symmetrical.

    A "trap" is defined as a triad (1, 2, 3) where 1 has weight 10000 to all
    other points (unless otherwise stated), 1000 to 2 and 3, and the 2-3
    weight is 1.

    The graph has two traps, (A, B, C), and (D, E, F), as well as two more
    nodes G and H. The A-to-D connection has 30000. The connection between B
    and E is 5, as is the connection between C and G, G and H, and H and F.
    If I have appropriately assigned the weights, the best cycle is
    ABEDFHGCA, which has a total cost of 4040.

    Should the algorithm start at A, the travelers will move to B and C,
    then E and G respectively. At that point, the travelers will move to F
    and H, forcing them to take the expensive 2 * 10000 to get to G, which
    is obviously much more than the first path. Starting at B or C produces
    worse results, as the travelers forgo the cheap connections to A.
    Starting at G will move the travelers to H and C, then F and B, forgoing
    the cheap connections to both A and D. Graph reflection will show that
    similar results occur for the other nodes D, E, F, and H.

    As a good lesson for generating similar counterexamples, this is how I
    generated the graph:

    I first noticed the problem of the trap, aided by Patricia's posts. Your
    solution to this was to start at the central node, so a counterexample
    involving traps would need to find somewhere to force you to start
    elsewhere, i.e., using two traps. This was the source of my "intuition."

    To find a concrete algorithm, I visualized what the counterexample graph
    should look like: two traps with a minimal number of nodes. I realized
    at this point that putting the traps diametrically opposed would not
    work, so I inserted a node on one side to force one traveler to get
    there first.

    Here, I realized that might not work, since the two travelers would then
    be considering the node at the same time, and I wasn't sure I could
    adjust node weights properly to get them to miss in the right way, so I
    added another node on the longer side to ensure that the
    simultaneously-considered node was not part of the cheap connection to
    the trap. Note that arbitrary numbers of nodes could be added to avoid
    longer lookahead loops.

    At that point, I added the weights to complete the graph. Factors of
    ten, for this small a graph, were sufficient. The conditions that make
    up a trap are by now well-documented, so those weights were easy; to be
    sure that no one would shortcut the traps I tripled the A-D line
    probably unnecessarily. I forced the traveler onto the path by lowering
    those connections (the factor of ten was probably unnecessary; I
    originally used two, but added more to be on the safe side).

    For simple algorithms, the qualitative analysis tends to be sufficient
    (unless the constraints are such that I need to watch the balance of
    weights); I included quantitative values mostly to be complete and to
    head off any rebuttals that it wasn't sufficient.

    > So if I'm right, it easily solves TSP when cost matches well with
    > distance, and has problems if it does not, like with the "traps" you
    > mention.


    If I restricted myself to Euclidean graphs, I could probably come up
    with counterexamples. However, I am (at this point) still working with
    only my innate visualization skills, which means I am heavily relying on
    qualitative analysis to reach conclusions, whereas Euclidean
    constraints, most notably the triangle inequality, require me to be more
    precise with numbers, as I explained above.

    > I have suspicions for why your intuition is wrong in this case, but of
    > course, I want you to be wrong, so that doesn't say much.


    Everyone would love for counterexamples to be wrong. :)

    > Trouble with from scratch problem solving is it can take time, which
    > is why I sound differently this week than last, as last week, I'd just
    > come up with this approach. Kind of just came fully formed--why not
    > use two travelers with one going forward and one going backwards in
    > time?


    Having the two travelers move in tandem doesn't seem to be gaining much
    at this point; in fact, it's probably producing the same results as a
    one-traveler algorithm except as O(N^4) instead of O(N^3). It certainly
    seems to be acting as such in these heavily-massaged edge cases.

    I'm not sure having two travelers will ever add much; the most common
    case where an algorithm involves such two travelers is the Bidirectional
    Search, where its improvement over BFS (or other corresponding
    unidirectional searches) comes from the fact that the algorithm is
    solved by local optimization, so having two steps toward the solution
    improves time by a constant factor by not considering as many next
    steps. TSP, however, needs to meet each node, so no benefit can be
    derived in the same way by looking at only half (very rough estimate)
    the nodes.

    If you wish to continue on in these probings, my recommendations would
    be to explore making the two interact in more dynamic ways. For example,
    one could theoretically have one traveler somehow "warn" the other
    traveler that what looks like a nice downhill path is actually the
    steepest way up the mountain. This would probably mean dropping the
    notion that one traveler is forwards in time and the other backwards, or
    dropping that the two make moves at the same time.

    > So this entire algorithm is like a baby, and everything is fresh
    > ground as I slowly work my way through theories, implications, rough
    > guesses and complete b.s. in the hopes of getting an answer.


    A good way to continue in probing algorithms is to look at the failures
    and mull over why they fail. If P=NP, that would imply that the global
    nature of the problem can be reduced to some statements about local
    moves. That means that every class of graph that fails the greedy TSP
    would have some step in the algorithm that redistributes expected
    weights in such a way that the class succeeds.

    The task, then, of proving P = NP in this manner would require us to
    construct a list of all cases that fail the greedy (locally optimal)
    TSP, construct a second graph whose edge weights are assigned in such a
    manner that a greedy solution on /that/ graph solves the original graph,
    and then prove that such an algorithm works for all failure cases and
    that the list of failure cases is complete. This approach is similar to
    how the four color theorem was proved.

    So, the list to date:

    Greedy TSP Failure Classes:
    1. The smallest weight outgoing edge from node A goes to a node B, while
    the smallest weight ingoing and outgoing edge from node C is node A and
    node B, respectively, i.e., the "traps" we have repeatedly discussed.

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Jul 27, 2008
    #18
  19. Joshua Cranmer wrote:
    > JSH wrote:
    > > That's a more general assumption than what she made and sounds good
    > > intuitively but you haven't even begun to give any kind of proof.
    > >
    > > But it is a route to a counterexample to this idea: use two "traps"
    > > and prove that the algorithm as described cannot give an optimal
    > > path.

    >
    > In working this out in my head, I realized there are two cases where the
    > algorithm will generate correct pathing for one of these traps (or you
    > could view it as one case): either the trap node has to be the start
    > position, or both walkers have to hit the entrance nodes at the same
    > time, i.e., the trap node is the halfway-point in the path.

    ....

    Here's another example, and a brute force program for finding the
    minimum cost. In this case, to avoid arguments about distance, I'm
    specifying only coordinates. The cost of each edge is the Java double
    approximation to the Euclidean distance between the vertices.

    To make everything as self-contained and simple as possible, I specify
    the problem inside the program. Here's the definition for my latest example:

    private static final double L = 2;
    private static final double M = 3;

    private static final Node[] NODES = new Node[] {
    new Node("A", -L/2,M+L/2), new Node("B", L/2, M+L/2),
    new Node("C", M+L/2, L/2), new Node("D", M+L/2, -L/2),
    new Node("E", L/2, -M-L/2), new Node("F", -L/2, -M-L/2),
    new Node("G", -M-L/2, -L/2), new Node("H", -M-L/2, L/2),
    new Node("W", -L/2, L/2), new Node("X", L/2, L/2),
    new Node("Y", L/2, -L/2), new Node("Z", -L/2, -L/2)

    };

    W, X, Y, and Z are the corners of an L by L square. The remaining points
    are corners of L by M rectangles attached to each edge of the square.

    With L=2 and M=3 the results are:

    Best Score 32.0000
    Best Cycle: A W H G Z F E Y D C X B A

    The rest of this message is a simple TSP solving program. Although the
    12 vertex case only takes a few seconds, it is exponential time so I
    don't recommend using it for much larger problems.

    import java.util.Arrays;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.LinkedList;
    import java.util.Set;

    public class BruteForce {
    /*
    * Problem definition.
    */
    private static final double L = 2;
    private static final double M = 3;

    private static final Node[] NODES = new Node[] {
    new Node("A", -L/2,M+L/2), new Node("B", L/2, M+L/2),
    new Node("C", M+L/2, L/2), new Node("D", M+L/2, -L/2),
    new Node("E", L/2, -M-L/2), new Node("F", -L/2, -M-L/2),
    new Node("G", -M-L/2, -L/2), new Node("H", -M-L/2, L/2),
    new Node("W", -L/2, L/2), new Node("X", L/2, L/2),
    new Node("Y", L/2, -L/2), new Node("Z", -L/2, -L/2)

    };

    public static void main(String[] args) {
    BruteForce test = new BruteForce(NODES);
    long start = System.nanoTime();
    test.solve();
    long end = System.nanoTime();
    System.out.printf("Best Score %g%n", test.bestScore);
    System.out.print("Best Cycle:");
    for (Node n : test.bestCycle) {
    System.out.print(" "+n.getId());
    }
    System.out.println();
    System.out.printf("Elapsed time: %g seconds%n", (end-start)/1e9);
    }

    private final Set<Node> nodes = new HashSet<Node>();

    private double bestScore = Double.POSITIVE_INFINITY;

    private Deque<Node> bestCycle;

    private final Node startNode;

    public BruteForce(Node[] nodes) {
    this.nodes.addAll(Arrays.asList(nodes));
    startNode = nodes[0];
    }

    /**
    * Calculate bestScore and bestCycle.
    */
    private void solve() {
    /* Do special case handling for getting started with startNode as the
    * first and last node of the cycle.
    */
    nodes.remove(startNode);
    Deque<Node> prefix = new LinkedList<Node>();
    prefix.add(startNode);
    nodes.remove(startNode);
    solve(nodes, 0, prefix);
    nodes.add(startNode);
    }

    /**
    * Calculate bestScore and bestCycle starting from a
    * specified prefix path.
    * @param available The set of nodes that are not in the prefix path
    * @param prefixScore The score for the prefix path
    * @param prefix The prefix path
    */
    private void solve(Set<Node> available,
    double prefixScore, Deque<Node> prefix) {
    if (available.isEmpty()) {
    /* Finished, the path is complete except for closing the cycle*/
    double score = prefixScore
    + prefix.getLast().distance(prefix.getFirst());
    if (score < bestScore) {
    /* This cycle is better than the best so far */
    bestCycle = new LinkedList<Node>();
    bestCycle.addAll(prefix);
    bestCycle.addLast(prefix.getFirst());
    bestScore = score;
    }
    return;
    }
    /* Need two copies of available set, one to provide an iterator
    * and another that can be temporarily modified without
    * disturbing the iterator.
    */
    Set<Node> workingAvailable = new HashSet<Node>(
    available);
    for (Node n : available) {
    double score = prefixScore + prefix.getLast().distance(n);
    if (score < bestScore) {
    /* There is a possibility that this path will lead to the best
    * cycle.
    */
    workingAvailable.remove(n);
    prefix.addLast(n);
    solve(workingAvailable, score, prefix);
    prefix.removeLast();
    workingAvailable.add(n);
    }
    }
    }
    static class Node {
    private String id;
    private double x;
    private double y;

    /**
    * Create node.
    * @param id Identifier-like string labeling the node.
    * @param x X coordinate.
    * @param y Y coordinate.
    */
    Node(String id, double x, double y) {
    this.id = id;
    this.x = x;
    this.y = y;
    }

    String getId() {
    return id;
    }

    double distance(Node o) {
    double xDist = x - o.x;
    double yDist = y - o.y;
    return Math.sqrt((xDist * xDist + yDist * yDist));
    }
    }
    }
    Patricia Shanahan, Jul 27, 2008
    #19
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Steven Bethard

    aligning text with space-normalized text

    Steven Bethard, Jun 30, 2005, in forum: Python
    Replies:
    6
    Views:
    376
    Steven Bethard
    Jul 1, 2005
  2. mike
    Replies:
    1
    Views:
    299
  3. Glenn
    Replies:
    1
    Views:
    353
    Joseph Kesselman
    Aug 24, 2006
  4. Replies:
    12
    Views:
    642
  5. JSH
    Replies:
    38
    Views:
    1,008
    Lits O'Hate
    Aug 29, 2008
Loading...

Share This Page