Many optimisation problems are really hard, for instance the NP-complete ones. Exact (but slow) branch and bound procedures and unreliable (but quick) heuristics are two standard ways to find exact or approximate solutions. When the task is to generate several alternative solutions it is possible to make a virtue of necessity: there are normally many more good solutions than perfect ones – and different heuristics or heuristics with random elements will not always return the same good solution.

So, a simple strategy is to apply one or several heuristics repeatedly to the same problem, and to record the solutions generated during this process. Either, exactly as many solutions as needed are generated. Or a larger preliminary set of solutions is produced, giving the chance for improvement by shortlisting. Natural shortlisting criteria are quality and spread. Concerning spread, distance measures on the set of admissible solutions may be a helpful tool in conjunction with clustering algorithms.

The normal situation is that a heuristic contains randomness to a certain extent. Then no additional efforts are necessary: the heuristic is simply executed in independent runs, until enough different good solutions have been generated. Here we use the Travelling Salesperson Problem (TSP) for points as an example to demonstrate the approaches. For exchange heuristics and insertion heuristics on the TSP we give one example each, highlighting the probabilistic elements.

In the TSP with symmetric distances between the points local search with 2-exchanges is a standard exchange heuristic. In the following pseudo-code denote the -th component of vector .

*
Local-Search-with-2-Exchanges-for-TSP(
)
*

1 Generate a random starting tour . 2there exist indices with and , and For the special case we take . 3`WHILE`

4 compute the length of tour 5`DO`

`RETURN`

Random elements in this heuristic are the starting tour and the order in which edge pairs are checked in step 2. Different settings will lead to different local minima. In large problems, for instance with 1,000 random points in the unit square with Euclidean distance it is quite normal when 100 independent runs of the 2-exchange heuristic lead to almost 100 different local minima.

The next pseudo-code describes a standard insertion heuristic.

*
Insertion-Heuristic-for-TSP(
)
*

1 generate a random permutation from the elements of 2 3`FOR`

4`TO`

find the minimum of for Here again for . let the minimum be at 5 6 compute the length of tour 7`DO`

`RETURN`

So the elements are inserted one by one, always at the place where insertion results at minimal new length.

The random element is the permutation of the points. Like for the 2-exchanges, different settings will typically lead to different local minima. Sometimes an additional chance for random choice occurs when for some step the optimal insertion place is not unique.

Many modern heuristics are based on analogies to nature. In such cases the user has even more choices: In Simulated Annealing several (good) intermediate solutions from each single run may be taken; or from each single run of a Genetic Algorithm several solutions may be taken, either representing different generations or multiple solutions of some selected generation.

A special technique for repeated exchange heuristics is based on the perturbation of local optima: First make a run to find a local optimum. Then randomise this first optimum by a sequence of random local changes. From the resulting solution start local search anew to find a second local optimum. Randomise this again and so on. The degree of randomisation steers how different the local optima in the sequence will become.

Even in case of a deterministic heuristic there may be chances to collect more than only one candidate solution: In tiebreak situations different choices may lead to different outcomes, or the heuristic may be executed with different precisions (=number of decimals) or with different rounding rules. In Subsection 23.2.4 penalty methods are described, with artificial modification of problem parameters (for instance increased edge lengths) in repeated runs. In anytime algorithms —like iterative deepening in game tree search—also intermediate solutions (for preliminary search depths) may be used as alternative candidates.

When several heuristics for the same problem are available, each one of them may contribute one or several candidate solutions. The 3-Hirn setting, as described in item (5) of Subsection 23.1.1, is an extreme example of a multiple-choice system with more than one computer program: the two programs should be independent of each other, and they are running on distinct computers. (Tournament chess is played under strict time limits at a rate of three minutes per move. Wasting computational resources by having two programs run on a single machine in multi-tasking mode costs 60 to 80 rating points [ 161 ]). The chess programs used in 3-Hirn are standard of-the-shelf products, not specifically designed for use in a multiple-choice setting.

Every real world software has errors. Multiple-choice systems with independent programs have a clear advantage with respect to catastrophic failures. When two independent programs are run, each with the same probability for catastrophic errors, then the probability for a simultaneous failure reduces to . A human controller in a multiple-choice system will typically recognise when candidate solutions have catastrophic failures. So the “mixed” case (one normal and one catastrophic solution) with probability will not result in a catastrophe. Another advantage is that the programs do not need to have special -best or -choice mechanisms. Coinciding computer proposals may be taken as an indication that this solution is just really good.

However, multiple-choice systems with independent programs may also have weak spots:

When the programs are of clearly different strength, this may bring the human selector in moral conflicts when he prefers a solution from the less qualified program.

In multistep actions the proposals of different programs may be incompatible.

For a human it costs extra time and mental energy to operate more than one program simultaneously.

Not seldom – depending on programs and operating systems – a PC will run unstably in multi-tasking mode.

And of course it is not always guaranteed that the programs are really independent. For instance, in the late 1990's dozens of vehicle routing programs were available in Germany, all with different names and interfaces. However, they all were based on only four independent program kernels and data bases.

A more controlled way to find different candidate solutions is given by the **penalty method**. The main idea of this method is illustrated on the route planning example. Starting with an optimal (or good) route we are looking for an alternative solution which fulfills the following two criteria as much as possible.

(i) should be good with respect to the objective function. Otherwise it is not worthwhile to choose it. In our example we have the length (or needed time) of the route as first objective.

(ii) should have not too much in common with the original solution. Otherwise it is not a *true* alternative. In case of so called *micro mutations* the risk is high that all these similar candidates have the same weak spots. In our example a “measure for similarity” may be the length of the parts shared by and .

This means should have a short length but it should also have only little in common with . Therefore we use a combination of the two objective functions – the length of the route and the length of the road sections shared by and . This can be done by punishing the sections used by and solving the shortest path problem with this modified lengths to get the solution .

By the size of the penalties different weightings of criteria (i) and (ii) can be modelled.

A most natural approach is to use **relative penalty factors**. This means that the length of each section belonging to is multiplied by a factor .

*
Penalty-Method-with-Relative-Penalty-Factors(
)
*

1 find the shortest path from node to node in the weighted graph 2all 3`FOR`

`DO`

belongs to 4`IF`

5`THEN`

6 find the the shortest path from node to node in the modified graph 7 compute its unmodified length 8`ELSE`

and`RETURN`

Consider the following example.

**Example 23.1 **Given is a graph with weighted edge lengths. In Figure 23.2 the numbers denote the length of the according edges. The shortest path from to is via - - - - - with length . Multiplying all edge lengths of by and solving the obtained shortest path problem gives the alternative solution via - - - - with modified length and normal length . The shared parts of and are - and - with total length .

The size of has to be appropriate for the situation. In the commercial vehicle routing program [ 16 ] all sections of a shortest (or fastest) route were multiplied by , i.e., . Then the alternative route was computed. In [ 39 ] recognition of linear structures (streets, rivers, airport lanes) in satellite images was done by shortest path methods. Here turned out to be an appropriate choice for getting interesting alternative candidates.

Instead of relative penalty factors **additive penalties** might be used. That means we add a constant term to all edges we want to punish. The only modification of the algorithm above is in step 4.

```
```**
**`THEN`

**Example 23.2 **Given is the graph from Example 23.1 (see Figure 23.2). The shortest path from to is still via - - - - - with length . Adding to all edges of and solving the resulting shortest path problem gives the alternative solution via - - - - - with modified length and normal length . and have three edges in common.

In principle this approach with additive penalties is not worse in comparison with multiplicative penalties. However, the method with multiplicative penalties has the advantage to be immune against artificial splits of edges.

For a generalisation of the penalty method from routing problems the following definition of optimisation problems is helpful.

**Definition 23.1 **
*Let be an arbitrary finite set and a set of subsets of . is called the base set and the elements of are feasible subsets of . Let be a real valued weight function on . For every we set .*

*The optimisation problem is a Sum Type Optimisation Problem or in short “
-type problem”*

Remarks:

The elements are also called

**feasible solutions.**By substitution of by every maximisation problem can be formulated as a minimisation problem. Therefore we will also call maximisation problems -type problems.

Shortest Path Problem

Assignment Problem

Travelling Salesperson Problem (TSP)

Knapsack Problem

Sequence Alignment Problem

**Example 23.3 **Consider the Knapsack Problem. Given a set of items , a weight function , a value function , and a knapsack capacity . What is the most valuable collection of items whose weight sum does not exceed the knapsack capacity?

Choosing as base set and as the family of all subsets whose weight sum is smaller or equal to gives a representation as a -type problem: maximise over all .

**Definition 23.2 **
*Let be an arbitrary set and the set of feasible subsets of . Let be a real-valued and a non-negative real-valued function on .*

*For every , let be one of the optimal solutions of the problem*

*With an algorithm which is able to solve the unpunished problem we can also find the solutions . We just have to modify the function by replacing by for all . is called an
penalty solution or an
alternative.*

*Additionally we define the solution of the problem*

*which has a minimal value and among all such solutions a minimal value .*

**Remark**. If both and are positive real-valued functions, there is a symmetry in the optimal solutions: is an penalty solution () for the function pair , if and only if is a penalty solution for the pair .

To preserve this symmetry it makes sense to define as an optimal solution of the problem

That means is not only an optimal solution for the objective function , but among all such solutions it has also a minimal -value.

**Example 23.4 **We formulate the concrete Example 23.1 in this abstract -type formulation. We know the shortest path from to and search for a “good” alternative solution. The penalty function is defined by

Often it is a priori not clear which choice of the penalty parameter produces good and interesting alternative solutions. With a “divide-and-conquer” algorithm one is able to find *all* solutions which can be produced by *any* parameter .

For finite sets we give an efficient algorithm which generates a “small” set of solutions with the following properties.

For each element there exists an such that is an optimal solution for the penalty parameter .

For each there exists an element such that is optimal for the penalty parameter .

has a minimal number of elements among all systems of sets which have the two properties above.

We call a solution which is optimal for at least one penalty parameter **penalty-optimal**. The following algorithm finds a set of penalty-optimal solutions which covers all .

For easier identification we arrange the elements of the set in a fixed order , with .

The algorithm has to check that for there is no with such that for this penalty parameter neither nor is optimal. Otherwise it has to identify such an and an -penalty solution . In step 11 of the pseudo code below the variable Border is set to if it turns out that such an intermediate does not exist.

We present the pseudocode and give some remarks.

*
Divide-and-Cover(
)
*

1 compute , which minimises and has a -value as small as possible. 2 compute , which minimises and has a -value as small as possible. 34`IF`

; ; Border ( minimises the functions`THEN`

andand is optimal for all .) 5; ; Border ; . 6`ELSE`

There is an with Border . 7`WHILE`

8 Find an optimal solution for the parameter . 9`DO`

10`IF`

Border 11`THEN`

12 13 Border (Border ,Border Border , , Border ) 14 15`ELSE`

, Border`RETURN`

At the end is a sequence of different penalty-optimal solutions and the vector includes consecutive epsilons.

**This algorithm is based on the following properties:**

(1) If is an -optimal solution then there exists an interval , , such that is optimal for all penalty parameters and for no other parameters.

(2) For two different solutions and with nonempty optimality intervals and , only three cases are possible.

. This happens iff and .

and are disjoint.

, this means the intersection contains only a single epsilon. This happens if and are neighbouring intervals.

By finiteness of there are only finitely many feasible solutions . So, there can be only finitely many optimality intervals. Hence, (1) and (2) show that the interval can be decomposed into a set of intervals . For each interval we have a different solution which is optimal for all in this interval. We call such a solution an **interval representative**.

(3) The aim of the algorithm is to find the

*borders*of such optimality intervals and for each interval a representing solution. In every iteration step an interval representative of a new interval or a new border between two different intervals will be found (in steps 7–13). When there are optimality intervals with it is sufficient to solve problems of the type to detect all of them and to find representing solutions.

When only one -alternative shall be computed the question comes up which penalty parameter should be used to produce a “best possible” alternative solution. If the penalty parameter is too small the optimal and the alternative solution are too similar and offer no real choice. If the parameter is too large the alternative solution becomes too poor. The best choice is to take some “medium” .

We illustrate this effect in the following route planning example.

**Example 23.5 **Assume that we have to plan a route from a given starting point to a given target. We know the standard travel times of all road sections and are allowed to plan for *two* different routes. In last minute we learn about the real travel times and can choose the fastest of our two candidate routes.

Let the first route be the route with the smallest standard travel time and the second one a route found by the penalty method. Question: Which penalty parameter should we use to minimise the real travel time of the fastest route?

Concretely, consider randomly generated instances of the shortest path problem in a weighted directed grid graph of dimension . The weights of the arcs are independently uniformly distributed in the unit interval . We compute , a path from the lower left corner to the upper right with minimal weight. Afterwards we punish the edges of path by multiplying by and calculate a whole set of -penalty solutions for . We get 30 solution pairs and can compare these.

The weight of an arc is the *standard travel time without time lags*, i.e. the minimal needed travel time on a free road without any traffic jam. The *real* travel time of this arc may differ from as follows:

independently for all edges . Here the are independent random numbers, uniformly distributed in the interval . The parameter is called **failure probability** and the parameter is called **failure width**.

For every pair we calculate the minimum of the two function values and . To get a direct impression of the benefit of having two solutions instead of one we scale with respect to the real value of the optimal solution .

We computed the values for 100,000 randomly generated grid graphs with failure probability and failure width . Figure 23.3 shows the averages for .

As seen in Figure 23.3, the expected quality of the solution pairs is unimodal in . That mean that first decreases and then increases for growing . In this example is the optimal penalty parameter.

In further experiments it was observed that the optimal parameter is decreasing in the problem size (e.g. for shortest paths in -grids, for and for grid graphs).

Independently whether all -penalty solutions are generated or only a single one (as in the prevoius pages), the following structural properties are provable: With increasing penalty factor we get solutions where

the penalty part of the objective function is fitted monotonically better (the solution contains less punished parts),

the original objective function is getting monotonically worse, in compensation for the improvement with respect to the penalty part.

These facts are formalised in the following theorem.

**Theorem 23.3 **
*Let be a real-valued function and a positive real-valued function on . Let be defined for according to Definition 23.2. The following four statements hold:*

*(i) is weakly monotonically decreasing in .*

*(ii) is weakly monotonically increasing in .*

*(iii) The difference is weakly monotonically increasing in .*

*(iv) is weakly monotonically increasing in .*

**Proof. **Let and be two arbitrary nonnegative real numbers with .

Because of the definition of and the following inequalities hold.

*(i)* In case we have

Subtracting (23.2) from (23.1) we get

In case inequality (23.3) follows directly from the definition of .

*(ii)* Subtracting (23.3) multiplied with from (23.2) we get

*(iii)* Subtracting (23.3) from (23.4) we get

*(iv)* With (23.2) and we have

If we have a solution and want to get *more than one* alternative solution we can use the penalty method several times by punishing with different penalty parameters , getting alternative solutions . This method has a big disadvantage, because only the shared parts of the main solution and each alternative solution are controlled by the values . But there is no direct control of the parts shared by two different alternative solutions. So, and may be rather similar for some .

To avoid this effect the penalty method may be used *iteratively* for the same .

*
Iterative-Penalty-Method(
)
*

1 solve the original problem and find the optimal solution . 2 define the penalty function as . 3 solve the modified problem and find the solution . 4`FOR`

5`TO`

6 solve the modified problem and find the solution . 7`DO`

`RETURN`

Step 5 may be replaced by the variant

```
```**
**`DO`

In the first case (5) a part of a solution belonging to of the solutions and is punished by the factor . In the second case () a part of a solution is punished with multiplicity one if it belongs to at least one of or .

The differences in performance are marginal. However, in shortest path problems with three solutions and setting (5) seemed to give slightly better results.

**Example 23.6 **Take the graph from Figure 23.2. For penalty parameter we want to find three solutions. The shortest path from to is via - - - - - with length . Multiplying all edges of by and solving the obtained shortest path problem gives the alternative solution via - - - - .

Applying setting we have to multiply the edge lengths of , , , and by penalty factor 1.1. The edges and have to be multiplied by factor 1.2 (double penalty). The optimal solution is path via - - - .

Applying setting we have to multiply the edge lengths , , , , , and by penalty factor 1.1. The optimal solution of this modified problem is path via - - - - - .

It is well known that shortest path problems as well as many other network flow problems can be solved with **Linear Programming**. Linear Programming may also be used to generate alternative solutions. We start with the description of Linear Programming for the basic shortest path problem.

Consider a directed graph and a function assigning a length to every arc of the graph. Let and be two distinguished nodes of .

Which is the shortest simple path from to in ?

For every arc we introduce a variable . Here shall be if is part of the shortest path and shall be otherwise.

With we denote the set of the successors of node and with we denote the set of the predecessors of node . The linear program is formulated as follows:

By the starting and target conditions node is a source and node is a sink. Because of the *
Kirchhoff
* conditions there are no other sources or sinks. Therefore there must be a “connection” from to .

It is not obvious that this connection is a simple path. The variables might have non-integer values or there could be circles anywhere. But there is a basic theorem for network flow problems [ 7 ][p. 318] that the linear program has an optimal solution where all have the value . The according arcs with represent a simple path from to .

**Example 23.7 **Consider the graph in Figure 23.4. The linear program for the shortest path problem in this graph contains six equality constraints (one for each node) and seven pairs of inequality constraints (one pair for each arc).

The optimal solution has .

Here we give an -representation for the task to find *two* alternative routes from to .

For every arc we introduce two variables and . If the arc is used in both routes, then both and shall have the value . If is a part of only one route, shall be and shall be . Otherwise and shall both be . is a penalty parameter to punish arcs used by both routes.

With this in mind we can formulate the linear program

**Example 23.8 **Consider again the graph in Figure 23.4. The linear program for the -alternative-paths problem in this graph contains six equality constraints (one for each node) and pairs of inequality constraints.

This linear program can be interpreted as a minimal cost flow problem.

Where is the connection between the linear program and the problem to find two candidate routes from to ?

**Theorem 23.4 **
*If the linear program has an optimal solution then it has also an optimal solution with the following properties.*

*There are disjoint sets with*

*(i) , and ,*

*(ii) , for all ,*

*(iii) , for all ,*

*(iv) , for all .*

*(v) represents a path P from to and represents a path P from to . is the set of arcs used by both paths.*

*(vi) No other pair of paths is better than , i.e.,*

*That means the sum of the lengths of P and P plus a penalty for arcs used twice is minimal.*

We conclude with some remarks.

For each arc there are two variables and . This can be interpreted as a street with a

**normal lane**and an additional**passing lane**. Using the passing lane is more expensive than using the normal lane. If a solution uses an arc only once, it takes the cheaper normal lane. But if a solution uses an arc twice, it has to take both the normal and the passing lane.The decomposition of the solution into two paths from the starting node to the target node is in most cases not unique. With the arcs in Figure 23.4 we can build two different pairs of paths from to , namely and . Both pairs are equi-optimal in the sense of Theorem 23.4. So the user has the chance to choose between them according to other criteria.

The penalty method and the LP-penalty method generally lead to different results. The penalty method computes the best single solution and a suitable alternative. The LP-penalty method computes a pair of good solutions with relatively small overlap. Figure 23.5 shows that this pair not necessarily contains the best single solution. The shortest path from to is – – – with length . For all the -penalty solution is – – – . The path pair has a total lengths of and a shared length of . But for the LP-Penalty method produces the pair with a total length of and a shared length of zero.

Finding candidate routes for some larger number is possible, if we introduce variables , for each arc and set the supply of and the demand of to . As objective function we can use for instance

The LP-penalty method does not only work for shortest path problems. It can be generalised to arbitrary problems solvable by linear programming.

Furthermore an analogous method – the

*Integer Linear Programming Penalty Method*– can be applied to problems solvable by integer linear programming.

In Subsection 23.2.2 we discussed the penalty method in combination with exact solving algorithms (e.g. Dijkstra-algorithm or dynamic programming for the shortest path problem). But also in case of heuristics (instead of exact algorithms) the penalty method can be used to find multiple candidate solutions.

**Example 23.9 **A well known heuristic for the TSP-problem is local search with 2-exchange steps (see Subsection 23.2.1).

*
Penalty-Method-for-the-TSP-Problem-with-2-Exchange-Heuristic
*

1 apply the 2-exchange heuristic to the unpunished problem getting a locally (but not necessarily globally) optimal solution 2 punish the edges belonging to by multiplying their lengths with 3 apply the 2-exchange heuristic to the punished problem getting an alternative solution 4 compute the unmodified length of 5 the pair is the output

Question: Which penalty parameter should be used to minimise the travel time of the fastest route?

An experiment analogous to the one described in Example 23.5 was executed for TSP instances with random cities in the unit square. Figure 23.6 shows the scaled averages for .

So, the expected quality of the solution pairs is (again) unimodal in the penalty factor . That means that first decreases and then increases for growing . In this example is the optimal penalty parameter.

In further experiments it was observed that the optimal penalty parameter is decreasing in the problem size.

**Exercises**

23.2-1 The following programming exercise on the Travelling Salesperson Problem helps to get a feeling for the great variety of local optima. Generate 200 random points in the 2-dimensional unit-square. Compute distances with respect to the Euclidean metric. Make 100 runs of local search with random starting tours and 2-exchanges. Count how many different local minima have been found.

23.2-2 Enter the same key words into different internet search engines. Compare the hit lists and their diversities.

23.2-3 Formulate the Travelling Salesperson Problem as a -type problem.

23.2-4 Proof the assertion of the remark on page.

23.2-5 How does the penalty function look like in case of additive penalties like in Example 23.2?

23.2-6 Prove the properties (1) and (2) after the algorithm *
Divide-and-Cover(
)
*.

23.2-7 Apply the *
Divide and cover
* algorithm to the shortest path problem in Figure 23.2 with starting node and target node . Set length of for each road section, and length of for the road sections belonging to the shortest path via - - - - - and for all other sections. So, the penalty value of a whole path is the length of this part shared with .

23.2-8 Find a penalty parameter for Example 23.6 such that the first setting (5) produces three different paths but the second setting () only two different paths for .