4.4. 4.4 Safeguarding intermediate results

In this section, we will see ways to introduce fault-tolerance that scale up better. Namely, we will show:

Theorem 4.17 There are constants such that for

for all , , for every deterministic computation of size there is an -resilient computation of size with the same result.

Let us introduce a concept that will simplify the error analysis of our circuits, making it independent of the input vector .

Definition 4.18 In a Boolean circuit , let us call a majority gate at a node a correcting majority gate if for every input vector of , all input wires of node have the same value. Consider a computation of such a circuit . This computation will make some nodes and wires of tainted. We define taintedness by the following rules:

– The input nodes are untainted.

– If a node is tainted then all of its output wires are tainted.

– A correcting majority gate is tainted if either it fails or a majority of its inputs are tainted.

– Any other gate is tainted if either it fails or one of its inputs is tainted.

Clearly, if for all -admissible random configurations the output is tainted with probability then the circuit is -resilient.

4.4.1. 4.4.1 Cables

So far, we have only made use of redundancy idea (ii) of the introduction to the present chapter: repeating computation steps. Let us now try to use idea (i) (keeping information in redundant form) in Boolean circuits. To protect information traveling from gate to gate, we replace each wire of the noiseless circuit by a “cable” of wires (where will be chosen appropriately). Each wire within the cable is supposed to carry the same bit of information, and we hope that a majority will carry this bit even if some of the wires fail.

Definition 4.19 In a Boolean circuit , a certain set of edges is allowed to be called a cable if in a noise-free computation of this circuit, each edge carries the same Boolean value. The width of the cable is its number of elements. Let us fix an appropriate constant threshold . Consider any possible computation of the noisy version of the circuit , and a cable of width in . This cable will be called -safe if at most of its wires are tainted.

Figure 4.5.  An executive organ.

An executive organ.


Let us take a Boolean circuit that we want to make resilient. As we replace wires of with cables of containing wires each, we will replace each noiseless 2-argument gate at a node by a module called the executive organ of gates, which for each , passes the th wire both incoming cables into the th node of the organ. Each of these nodes contains a gate of one and the same type . The wires emerging from these nodes form the output cable of the executive organ.

The number of tainted wires in this output cable may become too high: indeed, if there were tainted wires in the cable and also in the cable then there could be as many as such wires in the cable (not even counting the possible new taints added by faults in the executive organ). The crucial part of the construction is to attach to the executive organ a so-called restoring organ: a module intended to decrease the taint in a cable.

Figure 4.6.  A restoring organ.

A restoring organ.

4.4.2. 4.4.2 Compressors

How to build a restoring organ? Keeping in mind that this organ itself must also work in noise, one solution is to build (for an approriate ) a special -resilient circuit that computes the near-majority of its inputs in independent copies. Theorem 4.16 provides a circuit of size to do this.

It turns out that, at least asymptotically, there is a better solution. We will look for a very simple restoring organ: one whose own noise we can analyse easily. What could be simpler than a circuit having only one level of gates? We fix an odd positive integer constant (for example, ). Each gate of our organ will be a -input majority gate.

Definition 4.20 A multigraph is a graph in which between any two vertices there may be several edges, not just 0 or 1. Let us call a bipartite multigraph with inputs and outputs, -half-regular. if each output node has degree . Such a graph is a -compressor if it has the following property: for every set of at most inputs, the number of those output points connected to at least elements of (with multiplicity) is at most .

The compressor property is interesting generally when . For example, in an -compressor the outputs have degree , and the majority operation in these nodes decreases every error set confined to 10% of all input to just 5% of all outputs. A compressor with the right parameters could serve as our restoring organ: it decreases a minority to a smaller minority and may in this way restore the safety of a cable. But, are there compressors?

Theorem 4.21 For all , all integers with

there is an such that for all integer there is a -compressor.

Note that for , the theorem does not guarantee a compressor with .

Proof. We will not give an explicit construction for the multigraph, we will just show that it exists. We will select a -half-regular multigraph randomly (each such multigraph with the same probability), and show that it will be a -compressor with positive probability. This proof method is called the probabilistic method. Let

Our construction will be somewhat more general, allowing outputs. Let us generate a random bipartite -half-regular multigraph with inputs and outputs in the following way. To each output, we draw edges from random input nodes chosen independently and with uniform distribution over all inputs. Let be an input set of size , let be an output node and let be the event that has or more edges from . Then we have

On the average (in expected value), the event will occur for different output nodes . For an input set , let be the event that the set of nodes for which holds has size . By inequality (4.6) we have

The number of sets of inputs with elements is, using inequality (4.7),

The probability that our random graph is not a compressor is at most as large as the probability that there is at least one input set for which event holds. This can be bounded by

where

As we decrease the first term of this expression dominates. Its coefficient is positive according to the assumption (4.17). We will have if

Example 4.4 Choosing , , the value will work.

We turn a -compressor into a restoring organ , by placing -input majority gates into its outputs. If the majority elements sometimes fail then the output of is random. Assume that at most inputs of are tainted. Then outputs can only be tainted if majority gates fail. Let

be the probability of this event. Assuming that the gates of fail independently with probability , inequality 4.6 gives

Example 4.5 Choose , , as in Example 4.4, further (this will satisfy the inequality (4.19) needed later). With , we get .

The attractively small degree led to an extremely unattractive probability bound on the failure of the whole compressor. This bound does decrease exponentially with cable width , but only an extremely large would make it small.

Example 4.6 Choosing again , but (voting in each gate of the compressor over 41 wires instead of 7), leads to somewhat more realistic results. This choice allows . With , again, we get .

These numbers look less frightening, but we will still need many scores of wires in the cable to drive down the probability of compression failure. And although in practice our computing components fail with frequency much less than , we may want to look at the largest that still can be tolerated.

Figure 4.7.  An executive organ followed by a restoring organ.

An executive organ followed by a restoring organ.

4.4.3. 4.4.3 Propagating safety

Compressors allow us to construct a reliable Boolean circuit all of whose cables are safe.

Definition 4.22 Given a Boolean circuit with a single bit of output (for simplicity), a cable width and a Boolean circuit with inputs and outputs, let

be the Boolean circuit that we obtain as follows. The input nodes of are the same as those of . We replace each wire of with a cable of width , and each gate of with an executive organ followed by a restoring organ that is a copy of the circuit . The new circuit has outputs: the outputs of the restoring organ of belonging to the last gate of .

In noise-free computations, on every input, the output of is the same as the output of , but in identical copies.

Lemma 4.23 There are constants and for every cable width a circuit of size and gate size with the following property. For every Boolean circuit of gate size and number of nodes , for every , for every -admissible configuration of , the probability that not every cable of is -safe is .

Proof. We know that there are and with the property that for all a -compressor exists. Let be chosen to satisfy

and define

Let be a restoring organ built from a -compressor. Consider a gate of circuit , and the corresponding executive organ and restoring organ in . Let us estimate the probability of the event that the input cables of this combined organ are -safe but its output cable is not. Assume that the two incoming cables are safe: then at most of the outputs of the executive organ are tainted due to the incoming cables: new taint can still occur due to failures. Let be the event that the executive organ taints at least more of these outputs. Then , using the estimate (4.18). The outputs of the executive organ are the inputs of the restoring organ. If no more than of these are tainted then, in case the organ operates perfectly, it would decrease the number of tainted wires to . Let be the event that the restoring organ taints an additional of these wires. Then again, . If neither nor occur then at most (see (4.19)) tainted wires emerge from the restoring organ, so the outgoing cable is safe. Therefore and hence .

Let be the nodes of the circuit . Since the incoming cables of the whole circuit are safe, the event that there is some cable that is not safe is contained in ; hence the probability is bounded by .

4.4.4. 4.4.4 Endgame

Figure 4.8.  Reliable circuit from a fault-free circuit.

Reliable circuit from a fault-free circuit.

Proof. [Proof of Theorem 4.17] We will prove the theorem only for the case when our computation is a Boolean circuit with a single bit of output. The generalization with more bits of output is straightforward. The proof of Lemma 4.23 gives us a circuit whose output cable is safe except for an event of probability . Let us choose in such a way that this becomes :

It remains to add a little circuit to this output cable to extract from it the majority reliably. This can be done using Theorem 4.16, adding a small extra circuit of size that can be called the coda to . Let us call the resulting circuit .

The probability that the output cable is unsafe is . The probability that the output cable is safe but the “coda” circuit fails is bounded by . So, the probability that fails is , by the assumption .

Let us estimate the size of . By 4.21, we can choose cable width . We have , hence

Example 4.7 Take the constants of Example 4.6, with defined in equation (4.20): then , , , , , , giving

so making as small as possible (ignoring that it must be integer), we get . With , this allows . In addition to this truly unpleasant cable size, the size of the “coda” circuit is , which dominates the size of the rest of (though as it becomes asymptotically negligible).

As Example 4.7 shows, the actual price in redundancy computable from the proof is unacceptable in practice. The redundancy sounds good, since it is only logarithmic in the size of the computation, and by choosing a rather large majority gate (41 inputs), the factor in the here also does not look bad; still, we do not expect the final price of reliability to be this high. How much can this redundancy improved by optimization or other methods? Problem 4-6 shows that in a slightly more restricted error model (all faults are independent and have the same probability), with more randomization, better constants can be achieved. Exercises 4.4-1, 4.4-2 and 4.4-5 are concerned with an improved construction for the “coda” circuit. Exercise 4.5-2 shows that the coda circuit can be omitted completely. But none of these improvements bring redundancy to acceptable level. Even aside from the discomfort caused by their random choice (this can be helped), concentrators themselves are rather large and unwieldy. The problem is probably with using circuits as a model for computation. There is no natural way to break up a general circuit into subunits of non-constant size in order to deal with the reliability problem in modular style.

4.4.5. 4.4.5 The construction of compressors

This subsection is sketchier than the preceding ones, and assumes some knowledge of linear algebra.

We have shown that compressors exist. How expensive is it to find a -compressor, say, with , , , as in Example 4.6? In a deterministic algorithm, we could search through all the approximately -half-regular bipartite graphs. For each of these, we could check all possible input sets of size : as we know, their number is . The cost of checking each subset is , so the total number of operations is . Though this number is exponential in , recall that in our error-correcting construction, for the size of the noiseless circuit: therefore the total number of operations needed to find a compressor is polynomial in .

The proof of Theorem 4.21 shows that a randomly chosen -half-regular bipartite graph is a compressor with large probability. Therefore there is a faster, randomized algorithm for finding a compressor. Pick a random -half-regular bipartite graph, check if it is a compressor: if it is not, repeat. We will be done in a constant expected number of repetititons. This is a faster algorithm, but is still exponential in , since each checking takes operations.

Is it possible to construct a compressor explicitly, avoiding any search that takes exponential time in ? The answer is yes. We will show here only, however, that the compressor property is implied by a certain property involving linear algebra, which can be checked in polynomial time. Certain explicitly constructed graphs are known that possess this property. These are generally sought after not so much for their compressor property as for their expander property (see the section on reliable storage).

For vectors , let denote their inner product. A -half-regular bipartite multigraph with nodes can be defined by an incidence matrix , where is the number of edges connecting input to output . Let be the vector . Then , so is an eigenvector of with eigenvalue . Moreover, is the largest eigenvalue of . Indeed, denoting by for any row vector , we have .

Theorem 4.24 Let be a multigraph defined by the matrix . For all , and

there is an such that if the second largest eigenvalue of the matrix is then is a -compressor.

Proof. The matrix has largest eigenvalue . Since it is symmetric, it has a basis of orthogonal eigenvectors of unit length with corresponding nonnegative eigenvalues

where and . Recall that in the orthonormal basis , any vector can be written as . For an arbitrary vector , we can estimate as follows.

Let now be a set of size and where for and 0 otherwise. Then, coordinate of counts the number of edges coming from the set to the node . Also, , the number of elements of . We get

Suppose that there are nodes with , then this says

Since (4.22) implies , it follows that is a -compressor for small enough .

It is actually sufficient to look for graphs with large and where are constants. To see this, let us define the product of two bipartite multigraphs with vertices by the multigraph belonging to the product of the corresponding matrices.

Suppose that is symmetric: then its second largest eigenvalue is and the ratio of the two largest eigenvalues of is . Therefore using for a sufficiently large as our matrix, the condition (4.22) can be satisfied. Unfortunately, taking the power will increase the degree , taking us probably even farther away from practical realizability.

We found that there is a construction of a compressor with the desired parameters as soon as we find multigraphs with arbitrarily large sizes , with symmetric matrices and with a ratio of the two largest eigenvalues of bounded by a constant independent of . There are various constructions of such multigraphs (see the references in the historical overview). The estimation of the desired eigenvalue quotient is never very simple.

Exercises

4.4-1 The proof of Theorem 4.17 uses a “coda” circuit of size . Once we proved this theorem we could, of course, apply it to the computation of the final majority itself: this would reduce the size of the coda circuit to . Try out this approach on the numerical examples considered above to see whether it results in a significant improvement.

4.4-2 The proof of Theorem 4.21 provided also bipartite graphs with the compressor property, with inputs and outputs. An idea to build a smaller “coda” circuit in the proof of Theorem 4.17 is to concatenate several such compressors, decreasing the number of cables in a geometric series. Explore this idea, keeping in mind, however, that as decreases, the “exponential” error estimate in inequality 4.18 becomes weaker.

4.4-3 In a noisy Boolean circuit, let if the gate at vertex fails and otherwise. Further, let if is tainted, and 0 otherwise. Suppose that the distribution of the random variables does not depend on the Boolean input vector. Show that then the joint distribution of the random variables is also independent of the input vector.

4.4-4 This exercise extends the result of Exercise 4.3-1 to random input vectors: it shows that if a random input vector has only a small number of errors, then the iterated majority vote of Exercise 4.2-5 may still work for it, if we rearrange the input wires randomly. Let , and let be a vector of integers . We define a Boolean circuit as follows. This circuit takes input vector , computes the vector where (in other words, just leads a wire from input node to an “intermediate node” ) and then inputs into the circuit .

Denote the (possibly random) output bit of by . For any fixed input vector , assuming that our majority gates can fail with probability independently, denote . Assume that the input is a vector of (not necessarily independent) Boolean random variables, with . Denoting , assume . Prove that there is a choice of the vector for which

The choice may depend on the distribution of the random vector .

Hint. Choose the vector (and hence the circuit ) randomly, as a random vector where the variables are independent and uniformly distributed over , and denote . Then prove

For this, interchange the averaging over and . Then note that is the probability of when the “wires” are chosen randomly “on the fly” during the computation of the circuit.

4.4-5 Taking the notation of Exercise 4.4-3 suppose, like there, that the random variables are independent of each other, and their distribution does not depend on the Boolean input vector. Take the Boolean circuit introduced in Definition 4.22, and define the random Boolean vector where if and only if the th output node is tainted. Apply Exercise 4.4-4 to show that there is a circuit that can be attached to the output nodes to play the role of the “coda” circuit in the proof of Theorem 4.17. The size of is only linear in , not as for the coda circuit in the proof there. But, we assumed a little more about the fault distribution, and also the choice of the “wiring”' depends on the circuit .