4.5. 4.5 The reliable storage problem

There is hardly any simpler computation than not doing anything, just keeping the input unchanged. This task does not fit well, however, into the simple model of Boolean circuits as introduced above.

4.5.1. 4.5.1 Clocked circuits

Figure 4.9.  A shift register.

A shift register.

An obvious element of ordinary computations is missing from the above described Boolean circuit model: repetition. If we want to repeat some computation steps, then we need to introduce timing into the work of computing elements and to store the partial results between consecutive steps. Let us look at the drawings of the circuit designer again. We will see components like in Figure 4.9, with one ingoing edge and no operation associated with them; these will be called shift registers. The shift registers are controlled by one central clock (invisible on the drawing). At each clock pulse, the assignment value on the incoming edge jumps onto the outgoing edges and “stays in” the register. Figure 4.10 shows how shift registers may be used inside a circuit.

Definition 4.25 A clocked circuit over a complete basis is given by a tuple just like a Boolean circuit in 4.10. Also, the circuit defines a graph similarly. Recall that we identified nodes with the natural numbers . To each non-input node either a gate is assigned as before, or a shift register: in this case (there is only one argument). We do not require the graph to be acyclic, but we do require every directed cycle (if there is any) to pass through at least one shift register.

Figure 4.10.  Part of a circuit which computes the sum of two binary numbers Part of a circuit which computes the sum of two binary numbers x,y . We feed the digits of x and y beginning with the lowest-order ones, at the input nodes. The digits of the sum come out on the output edge. A shift register holds the carry.. We feed the digits of Part of a circuit which computes the sum of two binary numbers x,y . We feed the digits of x and y beginning with the lowest-order ones, at the input nodes. The digits of the sum come out on the output edge. A shift register holds the carry. and Part of a circuit which computes the sum of two binary numbers x,y . We feed the digits of x and y beginning with the lowest-order ones, at the input nodes. The digits of the sum come out on the output edge. A shift register holds the carry. beginning with the lowest-order ones, at the input nodes. The digits of the sum come out on the output edge. A shift register holds the carry.

Part of a circuit which computes the sum of two binary numbers x,y . We feed the digits of x and y beginning with the lowest-order ones, at the input nodes. The digits of the sum come out on the output edge. A shift register holds the carry.

The circuit works in a sequence of clock cycles. Let us denote the input vector at clock cycle by , the shift register states by , and the output vector by . The part of the circuit going from the inputs and the shift registers to the outputs and the shift registers defines two Boolean vector functions and . The operation of the clocked circuit is described by the following equations (see Figure 4.11, which does not show any inputs and outputs).

Figure 4.11.  A “computer” consists of some memory (shift registers) and a Boolean circuit operating on it. We can define the size of computation as the size of the computer times the number of steps.

A “computer” consists of some memory (shift registers) and a Boolean circuit operating on it. We can define the size of computation as the size of the computer times the number of steps.

Frequently, we have no inputs or outputs during the work of the circuit, so the equations (4.23) can be simplified to

How to use a clocked circuit described by this equation for computation? We write some initial values into the shift registers, and propagate the assignment using the gates, for the given clock cycle. Now we send a clock pulse to the register, causing it to write new values to their output edges (which are identical to the input edges of the circuit). After this, the new assignment is computed, and so on.

How to compute a function with the help of such a circuit? Here is a possible convention. We enter the input (only in the first step), and then run the circuit, until it signals at an extra output edge when desired result can be received from the other output nodes.

Example 4.8 This example uses a convention different from the above described one: new input bits are supplied in every step, and the output is also delivered continuously. For the binary adder of Figure 4.10, let and be the two input bits in cycle , let be the content of the carry, and be the output in the same cycle. Then the equations (4.23) now have the form

where is the majority operation.

4.5.2. 4.5.2 Storage

A clocked circuit is an interesting parallel computer but let us pose now a task for it that is trivial in the absence of failures: information storage. We would like to store a certain amount of information in such a way that it can be recovered after some time, despite failures in the circuit. For this, the transition function introduced in (4.24) cannot be just the identity: it will have to perform some error-correcting operations. The restoring organs discussed earlier are natural candidates. Indeed, suppose that we use memory cells to store a bit of information. We can call the content of this -tuple safe when the number of memory cells that dissent from the correct value is under some treshold . Let the rest of the circuit be a restoring organ built on a -compressor with . Suppose that the input cable is safe. Then the probability that after the transition, the new output cable (and therefore the new state) is not safe is for some constant . Suppose we keep the circuit running for steps. Then the probability that the state is not safe in some of these steps is which is small as long as is significantly smaller than . When storing bits of information, the probability that any of the bits loses its safety in some step is .

To make this discussion rigorous, an error model must be introduced for clocked circuits. Since we will only consider simple transition functions like the majority vote above, with a single computation step between times and , we will make the model also very simple.

Definition 4.26 Consider a clocked circuit described by equation (4.24), where at each time instant , the configuration is described by the bit vector . Consider a sequence of random bit vectors for . Similarly to (4.13) we define

Thus, says that a failure occurs at the space-time point . The sequence will be called -admissible if (4.14) holds for every set of space-time points with .

By the just described construction, it is possible to keep bits of information for steps in

memory cells. More precisely, the cable will be safe with large probability in any admissible evolution ().

Cannot we do better? The reliable information storage problem is related to the problem of information transmission: given a message , a sender wants to transmit it to a receiver throught a noisy channel. Only now sender and receiver are the same person, and the noisy channel is just the passing of time. Below, we develop some basic concepts of reliable information transmission, and then we will apply them to the construction of a reliable data storage scheme that is more economical than the above seen naive, repetition-based solution.

4.5.3. 4.5.3 Error-correcting codes  Error detection.

To protect information, we can use redundancy in a way more efficient than repetition. We might even add only a single redundant bit to our message. Let , be the word we want to protect. Let us create the error check bit

For example, . Our codeword will be subject to noise and it turns into a new word, . If differs from in a single changed (not deleted or added) bit then we will detect this, since then violates the error check relation

We will not be able to correct the error, since we do not know which bit was corrupted.  Correcting a single error.

To also correct corrupted bits, we need to add more error check bits. We may try to add two more bits:

Then an uncorrupted word must satisfy the error check relations

or, in matrix notation , where

Note . The matrix is called the error check matrix, or parity check matrix.

Another way to write the error check relations is

Now if is corrupted, even if only in a single position, unfortunately we still cannot correct it: since , the error could be in position 1 or 5 and we could not tell the difference. If we choose our error-check matrix in such a way that the colum vectors are all different (of course also from 0), then we can always correct an error, provided there is only one. Indeed, if the error was in position 3 then

Since all vectors are different, if we see the vector we can imply that the bit is corrupted. This code is called the Hamming code. For example, the following error check matrix defines the Hamming code of size 7:

In general, if we have error check bits then our code can have size , hence the number of bits left to store information, the information bits is . So, to protect bits of information from a single error, the Hamming code adds error check bits. This is much better than repeating every bit 3 times.  Codes.

Let us summarize the error-correction scenario in general terms.

Figure 4.12.  Transmission through a noisy channel.

Transmission through a noisy channel.

In order to fight noise, the sender encodes the message by an encoding function into a longer string which, for simplicity, we also assume to be binary. This codeword will be changed by noise into a string . The receiver gets and applies to it a decoding function .

Definition 4.27 The pair of functions and is called a code if holds for all . The strings are called messages, words of the form are called codewords. (Sometimes the set of all codewords by itself is also called a code.) For every message , the set of words is called the decoding set of . (Of course, different decoding sets are disjoint.) The number

is called the rate of the code.

We say that our code that corrects errors if for all possible messages , if the received word differs from the codeword in at most positions, then .

If the rate is then the -bit codewords carry bits of useful information. In terms of decoding sets, a code corrects errors if each decoding set contains all words that differ from in at most symbols (the set of these words is a kind of “ball” of radius ).

The Hamming code corrects a single error, and its rate is close to 1. One of the important questions connected with error-correcting codes is how much do we have to lower the rate in order to correct more errors.

Having a notion of codes, we can formulate the main result of this section about information storage.

Theorem 4.28 (Network information storage) There are constants with the following property. For all sufficiently large , there is a code with message length and codeword length , and a Boolean clocked circuit of size with inputs and outputs, such that the following holds. Suppose that at time 0, the memory cells of the circuit contain string . Suppose further that the evolution of the circuit has -admissible failures. Then we have

This theorem shows that it is possible to store bits information for time , in a clocked circuit of size

As long as the storage time is below the exponential bound for a certain constant , this circuit size is only a constant times larger than the amount of information it stores. (In contrast, in (4.26) we needed an extra factor when we used a separate restoring organ for each bit.)

The theorem says nothing about how difficult it is to compute the codeword at the beginning and how difficult it is to carry out the decoding at the end. Moreover, it is desireable to perform these two operations also in a noise-tolerant fashion. We will return to the problem of decoding later.  Linear algebra.

Since we will be dealing more with bit matrices, it is convenient to introduce the algebraic structure

which is a two-element field. Addition and multiplication in are defined modulo 2 (of course, for multiplication this is no change). It is also convenient to vest the set of binary strings with the structure of an -dimensional vector space over the field . Most theorems and algorithms of basic linear algebra apply to arbitrary fields: in particular, one can define the row rank of a matrix as the maximum number of linearly independent rows, and similarly the column rank. Then it is a theorem that the row rank is equal to the colum rank. From now on, in algebraic operations over bits or bit vectors, we will write in place of unless this leads to confusion. To save space, we will frequently write column vectors horizontally: we write

where denotes the transpose of matrix . We will write

for the identity matrix over the vector space .  Linear codes.

Let us generalize the idea of the Hamming code.

Definition 4.29 A code with message length and codeword length is linear if, when viewing the message and code vectors as vectors over the field , the encoding function can be computed according to the formula

with an matrix called the generator matrix of the code. The number is called the the number of information bits in the code, the number

the number of error-check bits.

Example 4.9 The matrix in (4.27) can be written as , wit

Then the error check relation can be written as

This shows that the bits can be taken to be the message bits, or “information bits”, of the code, making the Hamming code a linear code with the generator matrix . (Of course, over the field .)

The following statement is proved using standard linear algebra, and it generalizes the relation between error check matrix and generator matrix seen in Example 4.9.

Claim 4.30 Let be given with .

(a) For every matrix of rank over there is a matrix of rank with the property

(b) For every matrix of rank over there is an matrix of rank with property (4.28).

Definition 4.31 For a vector , let denote the number of its nonzero elements: we will also call it the weight of .

In what follows it will be convenient to define a code starting from an error-check matrix . If the matrix has rank then the code has rate

We can fix any subset of linearly independent columns, and call the indices error check bits and the indices the information bits. (In Example 4.9, we chose .) Important operations can performed over a code, however, without fixing any separation into error-check bits and information bits.

4.5.4. 4.5.4 Refreshers

Correcting a single error was not too difficult; finding a similar scheme to correct 2 errors is much harder. However, in storing bits, typically (much more than 2) of those bits will be corrupted in every step. There are ingenious and quite efficient codes of positive rate (independent of ) correcting even this many errors. When applied to information storage, however, the error-correction mechanism itself must also work in noise, so we are looking for a particularly simple one. It works in our favor, however, that not all errors need to be corrected: it is sufficient to cut down their number, similarly to the restoring organ in reliable Boolean circuits above.

For simplicity, as gates of our circuit we will allow certain Boolean functions with a large, but constant, number of arguments. On the other hand, our Boolean circuit will have just depth 1, similarly to a restoring organ of Section 4.4. The output of each gate is the input of a memory cell (shift register). For simplicity, we identify the gate and the memory cell and call it a cell. At each clock tick, a cell reads its inputs from other cells, computes a Boolean function on them, and stores the result (till the next clock tick). But now, instead of majority vote among the input values cells, the Boolean function computed by each cell will be slightly more complicated.

Our particular restoring operations will be defined, with the help of a certain parity check matrix . Let be a vector of bits. For some , let (from “vertical”) be the set of those indices with . For integer , let (from “horizontal”) be the set of those indices with . Then the condition can also be expressed by saying that for all , we have . The sets are called the parity check sets belonging to the matrix . From now on, the indices will be called checks, and the indices locations.

Definition 4.32 A linear code is a low-density parity-check code with bounds if the following conditions are satisfied:

(a) For each we have ;

(b) For each we have .

In other words, the weight of each row is at most and the weight of each column is at most .

In our constructions, we will keep the bounds constant while the length of codewords grows. Consider a situation when is a codeword corrupted by some errors. To check whether bit is incorrect we may check all the sums

for all . If all these sums are 0 then we would not suspect to be in error. If only one of these is nonzero, we will know that has some errors but we may still think that the error is not in bit . But if a significant number of these sums is nonzero then we may suspect that is a culprit and may want to change it. This idea suggests the following definition.

Definition 4.33 For a low-density parity-check code with bounds , the refreshing operation associated with the code is the following, to be performed simultaneously for all locations :

Find out whether more than of the sums are nonzero among the ones for . If this is the case, flip .

Let denote the vector obtained from by this operation. For parameters , let us call a -refresher if for each vector of length with weight the weight of the resulting vector decreases thus:

Notice the similarity of refreshers to compressors. The following lemma shows the use of refreshers, and is an example of the advantages of linear codes.

Lemma 4.34 For an -refresher , let be an -vector and a codeword of length with . Then .

Proof. Since is a codeword, , implying . Therefore the error correction flips the same bits in as in : , giving . So, if , then .

Theorem 4.35 There is a parameter and integers such that for all sufficiently large codelength and there is a -refresher with at least information bits.

In particular, we can choose , , .

Figure 4.13.  Using a refresher.

Using a refresher.

We postpone the proof of this theorem, and apply it first.

Proof. Proof of Theorem 4.28 Theorem 4.35 provides us with a device for information storage. Indeed, we can implement the operation using a single gate of at most inputs for each bit of . Now as long as the inequality holds for some codeword , the inequality follows with . Of course, some gates will fail and introduce new deviations resulting in some rather than . Let and . Then just as earlier, the probability that there are more than failures is bounded by the exponentially decreasing expression . With fewer than new deviations, we will still have . The probability that at any time the number of failures is more than is bounded by

Example 4.10 Let . Using the sample values in Theorem 4.35 we can take , , so the information rate is . With the corresponding values of , and , we have . The probability that there are more than failures is bounded by

This is exponentially decreasing with , albeit initially very slowly: it is not really small until . Still, for , it gives .  Decoding?

In order to use a refresher for information storage, first we need to enter the encoded information into it, and at the end, we need to decode the information from it. How can this be done in a noisy environment? We have nothing particularly smart to say here about encoding besides the reference to the general reliable computation scheme discussed earlier. On the other hand, it turns out that if is sufficiently small then decoding can be avoided altogether.

Recall that in our codes, it is possible to designate certain symbols as information symbols. So, in principle it is sufficient to read out these symbols. The question is only how likely it is that any one of these symbols will be corrupted. The following theorem upperbounds the probability for any symbol to be corrupted, at any time.

Theorem 4.36 For parameters , integers , codelength , with , consider a -refresher. Build a Boolean clocked circuit of size with inputs and outputs based on this refresher, just as in the proof of Theorem 4.28. Suppose that at time 0, the memory cells of the circuit contain string . Suppose further that the evolution of the circuit has -admissible failures. Let be the bits stored at time . Then implies

for some depending on .

Remark 4.37 What we are bounding is only the probability of a corrupt symbol in the particular position . Some of the symbols will certainly be corrupt, but any one symbol one points to will be corrupt only with probability .

The upper bound on required in the condition of the theorem is very severe, underscoring the theoretical character of this result.

Proof. As usual, it is sufficient to assume . Let , and let be the set of circuit elements which fail at time . Let us define the following sequence of integers:

It is easy to see by induction

The first members of the sequence are 1,2,3,4,6,8,11,15,18,24,32, and for they are 1,1,1,2,2,3,4,5,6,8,11.

Lemma 4.38 Suppose that . Then either there is a time at which circuit elements failed, or there is a sequence of sets for and with the following properties.

(a) For , every element of shares some error-check with some element of . Also every element of shares some error-check with some element of .

(b) We have for , on the other hand .

(c) We have , , for all , and .

Proof. We will define the sequence recursively, and will say when to stop. If then we set , , and stop. Suppose that is already defined. Let us define (or if ). Let be the set of those which share some error-check with an element of , and let . The refresher property implies that either or

In the former case, there must have been some time with , otherwise could never become larger than . In the latter case, the property implies

Now if then let be any subset of with size (there is one), else let and a set of size (there is one). This construction has the required properties.

For a given , the number of different choices for is bounded by

where we used (4.9). Similarly, the number of different choices for is bounded by

It follows that the number of choices for the whole sequence is bounded by

On the other hand, the probability for a fixed to have is . This way, we can bound the probability that the sequence ends exactly at by

where we used (4.29). For small this gives


where we used and the property for . We can bound the last expression by with an appropriate constantb .

We found that the event happens either if there is a time at which circuit elements failed (this has probability bound ) or an event of probability occurs.  Expanders.

We will construct our refreshers from bipartite multigraphs with a property similar to compressors: expanders.

Definition 4.39 Here, we will distinguish the two parts of the bipartite (multi) graphs not as inputs and outputs but as left nodes and right nodes. A bipartite multigraph is -regular if the points of the left set have degree and the points in the right set have degree . Consider such a graph, with the left set having nodes (then the right set has nodes). For a subset of the left set of , let consist of the points connected by some edge to some element of . We say that the graph expands by a factor if we have . For , our graph is an -expander if expands every subset of size of the left set by a factor .

Figure 4.14.  A regular expander.

A regular expander.

Definition 4.40 Given an -regular bipartite multigraph , with left set and right set , we assign to it a parity-check code as follows: if is connected to , and 0 otherwise.

Now for every possible error set , the set describes the set of parity checks that the elements of participate in. Under some conditions, the lower bound on the size of guarantees that a sufficient number of errors will be corrected.

Theorem 4.41 Let be an -expander with integer . Let . Then is a -refresher.

Proof. More generally, for any , let be an -expander with integer . We will prove that is a -refresher. For an -dimensional bit vector with , , assume

Our goal is to show : in other words, that in the corrected vector the number of errors decreases at least by a factor of .

Let be the set of bits in that the error correction operation fails to flip, with , and the set of of bits that were 0 but the operation turns them to 1, with . Our goal is to bound . The key observation is that each element of shares at least half of its neighbors with elements of , and similarly, each element of shares at least half of its neighbors with other elements of . Therefore both and contribute relatively weakly to the expansion of . Since this expansion is assumed strong, the size of must be limited.


By expansion, .

First we show . Assume namely that, on the contrary, , and let be a subset of such that (an integer, according to the assumptions of the theorem). By expansion,

Each bit in has at most neighbors that are not neighbors of ; so,

Combining these:

since . This contradiction with 4.30 shows .

Now implies (recalling that each element of contributes at most new neighbors):

Each must share at least half of its neighbors with others in . Therefore contributes at most neighbors on its own; the contribution of the other must be divided by 2, so the the total contribution of to the neighbors of is at most :

Combining with (4.31):  Random expanders.

Are there expanders good enough for Theorem 4.41? The maximum expansion factor is the degree and we require a factor of It turns out that random choice works here, too, similarly to the one used in the “construction” of compressors.

The choice has to be done in a way that the result is an -regular bipartite multigraph of left size . We will start with left nodes and right nodes . Now we choose a random matching, that is a set of edges with the property that every left node is connected by an edge to exactly one right node. Let us call the resulting graph . We obtain now as follows: we collapse each group of left nodes into a single node: into one node, into another node, and so on. Similarly, we collapse each group of right nodes into a single node: into one node, into another node, and so on. The edges between any pair of nodes in are inherited from the ancestors of these nodes in . This results in a graph with left nodes of degree and right nodes of degree . The process may give multiple edges between nodes of , this is why is a multigraph. Two nodes of will be called cluster neighbors if they are collapsed to the same node of .

Theorem 4.42 Suppose

Then the above random choice gives an -expander with positive probability.

Example 4.11 If , then the inequality in the condition of the theorem becomes

Proof. Let be a set of size in the left set of . We will estimate the probability that has too few neighbors. In the above choice of the graph we might as well start with assigning edges to the nodes of , in some fixed order of the nodes of the preimage of in . There are edges to assign. Let us call a node of the right set of occupied if it has a cluster neighbor already reached by an earlier edge. Let be a random variable that is 1 if the th edge goes to an occupied node and 0 otherwise. There are

choices for the th edge and at most of these are occupied. Therefore

Using the large deviations theorem in the generalization given in Exercise 4.1-3, we have, for :

Now, the number of different neighbors of is , hence

Let us now multiply this with the number

of sets of size :

where in the last step we assumed . This is if

Substituting gives the formula of the theorem.

Proof. Proof of Theorem 4.35 Theorem 4.41 shows how to get a refresher from an expander, and Theorem 4.42 shows the existence of expanders for certain parameters. Example 4.11 shows that the parameters can be chosen as needed for the refreshers.


4.5-1 Prove Proposition 4.30.

4.5-2 Apply the ideas of the proof of Theorem 4.36 to the proof of Theorem 4.17, showing that the “coda” circuit is not needed: each wire of the output cable carries the correct value with high probability.


4-1 Critical value

Consider a circuit like in Exercise 4.2-5, assuming that each gate fails with probability independently of all the others and of the input. Assume that the input vector is all 0, and let be the probability that the circuit outputs a 1. Show that there is a value with the property that for all we have , and for , we have have . Estimate also the speed of convergence in both cases.

4-2 Regular compressor

We defined a compressor as a -halfregular bipartite multigraph. Let us call a compressor regular if it is a -regular multigraph (the input nodes also have degree ). Prove a theorem similar to Theorem 4.21: for each there is an integer and an such that for all integer there is a regular -compressor.

Hint. Choose a random -regular bipartite multigraph by the following process: (1. Replace each vertex by a group of vertices. 2. Choose a random complete matching betwen the new input and output vertices. 3. Merge each group of vertices into one vertex again.) Prove that the probability, over this choice, that a -regular multigraph is a not a compressor is small. For this, express the probability with the help of factorials and estimate the factorials using Stirling's formula.

4-3 Two-way expander

Recall the definition of expanders. Call a -expander regular if it is a -regular multigraph (the input nodes also have degree ). We will call this multigraph a two-way expander if it is an expander in both directions: from to and from to . Prove a theorem similar to the one in Problem 4-2: for all there is an such that for all integers there is a two-way regular -expander.

4-4 Restoring organ from 3-way voting

The proof of Theorem 4.21 did not guarantee a -compressor with any , . If we only want to use 3-way majority gates, consider the following construction. First create a 3-halfregular bipartite graph with inputs and outputs , with a 3-input majority gate in each . Then create new nodes , with a 3-input majority gate in each . The gate of computes the majority of , the gate of computes the majority of , and so on. Calculate whether a random choice of the graph will turn the circuit with inputs and outputs into a restoring organ. Then consider three stages instead of two, where has outputs and see what is gained.

4-5 Restoring organ from NOR gates

The majority gate is not the only gate capable of strengthening the majority. Recall the NOR gate introduced in Exercise 4.2-2, and form . Show that a construction similar to Problem 4-4 can be carried out with used in place of 3-way majority gates.

4-6 More randomness, smaller restoring organs

Taking the notation of Exercise 4.4-3, suppose like there, that the random variables are independent of each other, and their distribution does not depend on the Boolean input vector. Apply the idea of Exercise 4.4-5 to the construction of each restoring organ. Namely, construct a different restoring organ for each position: the choice depends on the circuit preceding this position. Show that in this case, our error estimates can be significantly improved. The improvement comes, just as in Exercise 4.4-5, since now we do not have to multiply the error probability by the number of all possible sets of size of tainted wires. Since we know the distribution of this set, we can average over it.

4-7 Near-sorting with expanders

In this problem, we show that expanders can be used for “near-sorting”. Let be a regular two-way -expander, whose two parts of size are and . According to a theorem of Kőnig, (the edge-set of) every -regular bipartite multigraph is the disjoint union of (the edge-sets of) complete matchings . To such an expander, we assign a Boolean circuit of depth as follows. The circuit's nodes are subdivide into levels . On level we have two disjoint sets of size of nodes , (). The Boolean value on will be and respectively. Denote the vector of values at stage by . If is an edge in the matching , then we put an gate into , and a gate into :

This network is trying to “sort” the 0's to and the 1's to in stages. More generally, the values in the vectors could be arbitrary numbers. Then if still means and means then each vector is a permutation of the vector . Let . Prove that is -sorted in the sense that for all , at least among the smallest values of is in its left half and at least among the largest values are in its right half.

4-8 Restoring organ from near-sorters

Develop a new restoring organ using expanders, as follows. First, split each wire of the input cable , to get two sets . Attach the -sorter of Problem 4-7, getting outputs . Now split the wires of into two sets . Attach the -sorter again, getting outputs . Keep only for the output cable. Show that the Boolean vector circuit leading from to can be used as a restoring organ.


The large deviation theorem (Theorem 4.1), or theorems similar to it, are sometimes attributed to Chernoff or Bernstein. One of its frequently used variants is given in Exercise 4.1-2.

The problem of reliable computation with unreliable components was addressed by John von Neumann in [ 192 ] on the model of logic circuits. A complete proof of the result of that paper (with a different restoring organ) appeare first in the paper [ 63 ] of R. L. Dobrushin and S. I. Ortyukov. Our presentation relied on parts of the paper [ 205 ] of N. Pippenger.

The lower-bound result of Dobrushin and Ortyukov in the paper [ 62 ] (corrected in [ 203 ], [ 211 ] and [ 85 ]), shows that reduncancy of is unavoidable for a general reliable computation whose complexity is . However, this lower bound only shows the necessity of putting the input into a redundantly encoded form (otherwise critical information may be lost in the first step). As shown in [ 205 ], for many important function classes, linear redundancy is achievable.

It seems natural to separate the cost of the initial encoding: it might be possible to perform the rest of the computation with much less redundancy. An important step in this direction has been made by D. Spielman in the paper [ 248 ] in (essentially) the clocked-circuit model. Spielman takes a parallel computation with time running on elementary components and makes it reliable using only times more processors and running it times longer. The failure probability will be . This is small as long as is not much larger than . So, the redundancy is bounded by some power of the logarithm of the space requirement; the time requirement does not enter explictly. In Boolean circuits no time- and space- complexity is defined separately. The size of the circuit is analogous to the quantity obtained in other models by taking the product of space and time complexity.

Questions more complex than Problem 4-1 have been studied in [ 204 ]. The method of Problem 4-2, for generating random -regular multigraphs is analyzed for example in [ 27 ]. It is much harder to generate simple regular graphs (not multigraphs) uniformly. See for example [ 143 ].

The result of Exercise 4.2-4 is due to C. Shannon, see [ 234 ]. The asymptotically best circuit size for the worst functions was found by Lupanov in [ 170 ]. Exercise 4.3-1 is based on [ 63 ], and Exercise 4.3-2 is based on [ 62 ] (and its corrections).

Problem 4-7 is based on the starting idea of the depth sorting networks in [ 9 ].

For storage in Boolean circuits we partly relied on A. V. Kuznietsov's paper [ 156 ] (the main theorem, on the existence of refreshers is from M. Pinsker). Low density parity check codes were introduced by R. G. Gallager in the book[ 80 ], and their use in reliable storage was first suggested by M. G. Taylor in the paper [ 257 ]. New, constructive versions of these codes were developed by M. Sipser and D. Spielman in the paper [ 247 ], with superfast coding and decoding.

Expanders, invented by Pinsker in [ 202 ] have been used extensively in theoretical computer science: see for example[ 188 ] for some more detail. This book also gives references on the construction of graphs with large eigenvalue-gap. Exercise 4.4-4 and Problem 4-6 are based on [ 63 ].

The use of expanders in the role of refreshers was suggested by Pippenger (private communication): our exposition follows Sipser and Spielman in [ 243 ]. Random expanders were found for example by Pinsker. The needed expansion rate ( times the left degree) is larger than what can be implied from the size of the eigenvalue gap. As shown in [ 202 ] (see the proof in Theorem 4.42) random expanders have the needed expansion rate. Lately, constructive expanders with nearly maximal expansion rate were announced by Capalbo, Reingold, Vadhan and Wigderson in [ 39 ].

Reliable computation is also possible in a model of parallel computation that is much more regular than logic circuits: in cellular automata. We cannot present those results here: see for example the papers [ 84 ], [ 86 ].