Any planned computation will be subject to different kinds of unpredictable influences during execution. Here are some examples:

(1) Loss or change of stored data during execution.

(2) Random, physical errors in the computer.

(3) Unexpected interactions between different parts of the system working simultaneously, or loss of connections in a network.

(4) Bugs in the program.

(5) Malicious attacks.

Up to now, it does not seem that the problem of bugs can be solved just with the help of appropriate algorithms. The discipline of software engineering addresses this problem by studying and improving the structure of programs and the process of their creation.

Malicious attacks are addressed by the discipline of computer security. A large part of the recommended solutions involves cryptography.

Problems of kind (3) are very important and a whole discipline, distributed computing has been created to deal with them.

The problem of storage errors is similar to the problems of reliable communication, studied in information theory: it can be viewed as communication from the present to the future. In both cases, we can protect against noise with the help of **error-correcting codes** (you will see some examples below).

In this chapter, we will discuss some sample problems, mainly from category (2). In this category, distinction should also be made between permanent and transient errors. An error is **permanent** when a part of the computing device is damaged physically and remains faulty for a long time, until some outside intervention by repairmen to it. It is **transient** if it happens only in a single step: the part of the device in which it happened is not damaged, in the next step it operates correctly again. For example, if a position in memory turns from 0 to 1 by accident, but a subsequent write operation can write a 0 again then a transient error happened. If the bit turned to 1 and the computer cannot change it to 0 again, this is a permanent error.

Some of these problems, especially the ones for transient errors, are as old as computing. The details of any physical errors depend on the kind of computer it is implemented on (and, of course, on the kind of computation we want to carry out). But after abstracting away from a lot of distracting details, we are left with some clean but challenging theoretical formulations, and some rather pleasing solutions. There are also interesting connections to other disciplines, like statistical physics and biology.

The computer industry has been amazingly successful over the last five decades in making the computer components smaller, faster, and at the same time more reliable. Among the daily computer horror stories seen in the press, the one conspicuously missing is where the processor wrote a 1 in place of a 0, just out of caprice. (It undisputably happens, but too rarely to become the identifiable source of some visible malfunction.) On the other hand, the generality of some of the results on the correction of transient errors makes them applicable in several settings. Though individual physical processors are very reliable (error rate is maybe once in every executions), when considering a whole network as performing a computation, the problems caused by unreliable network connections or possibly malicious network participants is not unlike the problems caused by unreliable processors.

The key idea for making a computation reliable is *redundancy*, which might be formulated as the following two procedures:

(i) Store information in such a form that losing any small part of it is not fatal: it can be restored using the rest of the data. For example, store it in multiple copies.

(ii) Perform the needed computations repeatedly, to make sure that the faulty results can be outvoted.

Our chapter will only use these methods, but there are other remarkable ideas which we cannot follow up here. For example, method (ii) seems especially costly; it is desireable to avoid a lot of repeated computation. The following ideas target this dilemma.

(A) Perform the computation directly on the information in its redundant form: then maybe recomputations can be avoided.

(B) Arrange the computation into “segments” such a way that those partial results that are to be used later, can be cheaply checked at each “milestone” between segments. If the checking finds error, repeat the last segment.

The present chapter does not require great sophistication in probability theory but there are some facts coming up repeatedly which I will review here. If you need additional information, you will find it in any graduate probability theory text.

A **probability space** is a triple where is the set of **elementary events**, is a set of subsets of called the set of **events** and is a function. For , the value is called the **probability** of event . It is required that and that implies . Further, if a (possibly infinite) sequence of sets is in then so is their union. Also, it is assumed that and that if are disjoint then

For , the **conditional probability** of given is defined as

Events are **independent** if for any sequence we have

**Example 4.1 **Let where is the set of all subsets of and . This is an example of a **discrete** probability space: one that has a countable number of elements.

More generally, a discrete probability space is given by a countable set , and a sequence with , . The set of events is the set of all subsets of , and for an event we define .

A **random variable** over a probability space is a function from to the real numbers, with the property that every set of the form is an event: it is in . Frequently, random variables are denoted by capital letters , possibly with indices, and the argument is omitted from . The event is then also written as . This notation is freely and informally extended to more complicated events. The **distribution** of a random variable is the function . We will frequently only specify the distribution of our variables, and not mention the underlying probability space, when it is clear from the context that it can be specified in one way or another. We can speak about the **joint distribution** of two or more random variables, but only if it is assumed that they can be defined as functions on a common probability space. Random variables with a joint distribution are **independent** if every -tuple of events of the form is independent.

The **expected value** of a random variable taking values with probabilities is defined as

It is easy to see that the expected value is a linear function of the random variable:

even if are not independent. On the other hand, if variables are independent then the expected values can also be multiplied:

There is an important simple inequality called the **Markov inequality**, which says that for an arbitrary nonnegative random variable and any value we have

In what follows the bounds

will be useful. Of these, the well-known upper bound holds since the graph of the function is below its tangent line drawn at the point . The lower bound is obtained from the identity and

Consider independent random variables that are identically distributed, with

Let

We want to estimate the probability for any constant . The “law of large numbers” says that if then this probability converges fast to 0 as while if then it converges fast to 1. Let

where the inequality (useful for small and ) comes via and from (4.3). Using the concavity of logarithm, it can be shown that is always nonnegative, and is 0 only if (see Exercise 4.1-1).

**Theorem 4.1 **
**(Large deviations for coin-toss)**
*If then*

This theorem shows that if then converges to 0 exponentially fast. Inequality (4.5) will allow the following simplification:

useful for small and .

**Proof. **For a certain real number (to be chosen later), let be the random variable that is if and if , and let : then

Applying the Markov inequality (4.2) and (4.1), we get

where . Let us choose , this is if . Then we get , and hence

This theorem also yields some convenient estimates for binomial coefficients. Let

This is sometimes called the **entropy** of the probability distribution (measured in logarithms over base instead of base 2). From inequality (4.3) we obtain the estimate

which is useful for small .

*In particular, taking with gives*

**Proof. **Theorem 4.1 says for the case :

Substituting , and noting the symmetries , and (4.7) gives (4.8).

**Remark 4.3 **
*Inequality (4.6) also follows from the trivial estimate combined with (4.9).*

**Exercises**

4.1-1 Prove that the statement made in the main text that is always nonnegative, and is 0 only if .

4.1-2 For , derive from Theorem 4.1 the useful bound

*Hint*. Let , and use the Taylor formula: , where .

4.1-3 Prove that in Theorem 4.1, the assumption that are independent and identically distributed can be weakened: replaced by the single inequality