In a distributed system it is often useful to compute a global state that consists of the states of all processors. Having access to the global can allows us to reason about the system properties that depend on all processors, for example to be able to detect a deadlock. One may attempt to compute global state by stopping all processors, and then gathering their states to a central location. Such a method is will-suited for many distributed systems that must continue computation at all times. This section discusses how one can compute global state that is quite intuitive, yet consistent, in a precise sense. We first discuss a distributed algorithm that imposes a global order on instructions of processors. This algorithm creates the illusion of a global clock available to processors. Then we introduce the notion of one instruction causally affecting other instruction, and an algorithm for computing which instruction affects which. The notion turns out to be very useful in defining a consistent global state of distributed system. We close the section with distributed algorithms that compute a consistent global state of distributed system.
The design of distributed algorithms is easier when processors have access to (Newtonian) global clock, because then each event that occurs in the distributed system can be labeled with the reading of the clock, processors agree on the ordering of any events, and this consensus can be used by algorithms to make decisions. However, construction of a global clock is difficult. There exist algorithms that approximate the ideal global clock by periodically synchronising drifting local hardware clocks. However, it is possible to totally order events without using hardware clocks. This idea is called the logical clock.
Recall that an execution is an interleaving of instructions of the programs. Each instruction can be either a computational step of a processor, or sending a message, or receiving a message. Any instruction is performed at a distinct point of global time. However, the reading of the global clock is not available to processors. Our goal is to assign values of the logical clock to each instruction, so that these values appear to be readings of the global clock. That is, it possible to postpone or advance the instants when instructions are executed in such a way, that each instruction that has been assigned a value of the logical clock, is executed exactly at the instant of the global clock, and that the resulting execution is a valid one, in the sense that it can actually occur when the algorithm is run with the modified delays.
algorithm assigns logical time to each instruction. Each processor has a local variable called counter. This variable is initially zero and it gets incremented every time processor executes an instruction. Specifically, when a processor executes any instruction other than sending or receiving a message, the variable counter gets incremented by one. When a processor sends a message, it increments the variable by one, and attaches the resulting value to the message. When a processor receives a message, then the processor retrieves the value attached to the message, then calculates the maximum of the value and the current value of counter, increments the maximum by one, and assigns the result to the counter variable. Note that every time instruction is executed, the value of counter is incremented by at least one, and so it grows as processor keeps on executing instructions. The value of logical time assigned to instruction is defined as the pair , where counter is the value of the variable counter right after the instruction has been executed, and id is the identifier of the processor. The values of logical time form a total order, where pairs are compared lexicographically. This logical time is also called Lamport time. We define to be a quotient , which is an equivalent way to represent the pair.
(i) if an instruction is performed by a processor before an instruction is performed by the same processor, then the logical time of is strictly smaller than that of ,
(ii) any two distinct instructions of any two processors get assigned different logical times,
(iii) if instruction sends a message and instruction receives this message, then the logical time of is strictly smaller than that of .
Our goal now is to argue that logical clock provides to processors the illusion of global clock. Intuitively, the reason why such an illusion can be created is that we can take any execution of a deterministic algorithm, compute the logical time of each instruction , and run the execution again delaying or speeding up processors and messages in such a way that each instruction is executed at the instant of the global clock. Thus, without access to a hardware clock or other external measurements not captured in our model, the processors cannot distinguish the reading of logical clock from the reading of a real global clock. Formally, the reason why the re-timed sequence is a valid execution that is indistinguishable from the original execution, is summarised in the subsequent corollary that follows directly from Remark 13.21.
Corollary 13.22 For any execution , let be the assignment of logical time to instructions, and let be the sequence of instructions ordered by their logical time in . Then for each processor, the subsequence of instructions executed by the processor in is the same as the subsequence in . Moreover, each message is received in after it is sent in .
In a system execution, an instruction can affect another instruction by altering the state of the computation in which the second instruction executes. We say that one instruction can causally affect (or influence) another, if the information that one instruction produces can be passed on to the other instruction. Recall that in our model of distributed system, each instruction is executed at a distinct instant of global time, but processors do not have access to the reading of the global clock. Let us illustrate causality. If two instructions are executed by the same processor, then we could say that the instruction executed earlier can causally affect the instruction executed later, because it is possible that the result of executing the former instruction was used when the later instruction was executed. We stress the word possible, because in fact the later instruction may not use any information produced by the former. However, when defining causality, we simplify the problem of capturing how processors influence other processors, and focus on what is possible. If two instructions and are executed by two different processors, then we could say that instruction can causally affect instruction , when the processor that executes sends a message when or after executing , and the message is delivered before or during the execution of at the other processor. It may also be the case that influence is passed on through intermediate processors or multiple instructions executed by processors, before reaching the second processor.
We will formally define the intuition that one instruction can causally affect another in terms of a relation called happens before, and that relates pairs of instructions. The relation is defined for a given execution, i.e., we fix a sequence of instructions executed by the algorithm and instances of global clock when the instructions were executed, and define which pairs of instructions are related by the happens before relation. The relation is introduced in two steps. If instructions and are executed by the same processor, then we say that happens before if and only if is executed before . When and are executed by two different processors, then we say that happens before if and only if there is a chain of instructions and messages
for , such that is either equal to or is executed after by the same processor that executes ; is either equal to or is executed before by the same processor that executes ; is executed before by the same processor, ; and sends a message that is received by , . Note that no instruction happens before itself. We write when happens before . We omit the reference to the execution for which the relation is defined, because it will be clear from the context which execution we mean. We say that two instructions and are concurrent when neither nor . The question stands how processors can determine if one instruction happens before another in a given execution according to our definition. This question can be answered through a generalisation of the
algorithm presented earlier. This generalisation is called vector clocks.
algorithm allows processors to relate instructions, and this relation is exactly the happens before relation. Each processor maintains a vector of integers. The -th coordinate of the vector is denoted by . The vector is initialised to the zero vector . A vector is modified each time processor executes an instruction, in a way similar to the way counter was modified in the
algorithm. Specifically, when a processor executes any instruction other than sending or receiving a message, the coordinate gets incremented by one, and other coordinates remain intact. When a processor sends a message, it increments by one, and attaches the resulting vector to the message. When a processor receives a message, then the processor retrieves the vector attached to the message, calculates coordinate-wise maximum of the current vector and the vector , except for coordinate that gets incremented by one, and assigns the result to the variable .
We label each instruction executed by processor with the value of the vector right after the instruction has been executed. The label is denoted by and is called vector timestamp of instruction . Intuitively, represents the knowledge of processor about how many instructions each processor has executed at the moment when has executed instruction . This knowledge may be obsolete.
Vector timestamps can be used to order instructions that have been executed. Specifically, given two instructions and , and their vector timestamps and , we write that when the vector is majorised by the vector i.e., for all , the coordinate is at most the corresponding coordinate . We write when but .
The next theorem explains that the
algorithm indeed implements the happens before relation, because we can decide if two instructions happen or not before each other, just by comparing the vector timestamps of the instructions.
Proof. We first show the forward implication. Suppose that . Hence and are two different instructions. If the two instructions are executed on the same processor, then must be executed before . Only finite number of instructions have been executed by the time has been executed. The
algorithm increases a coordinate by one as it calculates vector timestamps of instructions from until inclusive, and no coordinate is ever decreased. Thus . If and were executed on different processors, then by the definition of happens before relation, there must be a finite chain of instructions and messages leading from to . But then by the
algorithm, the value of a coordinate of vector timestamp gets increased at each move, as we move along the chain, and so again .
Now we show the reverse implication. Suppose that it is not the case that . We consider a few subcases always concluding that it is not that case that . First, it could be the case that and are the same instruction. But then obviously vector clocks assigned to and are the same, and so it cannot be the case that . Let us, therefore, assume that and are different instructions. If they are executed by the same processor, then cannot be executed before , and so is executed after . Thus, by monotonicity of vector timestamps, , and so it is not the case that . The final subcase is when and are executed by two distinct processors and . Let us focus on the component of vector clock of processor right after was executed. Let its value be . Recall that other processors can only increase the value of their components by adopting the value sent by other processors. Hence, in order for the value of component of processor to be or more at the moment is executed, there must be a chain of instructions and messages that passes a value at least , originating at processor . This chain starts at or at an instruction executed by subsequent to . But the existence of such chain would imply that happens before , which we assumed was not the case. So the component of vector clock is strictly smaller than the component of vector clock . Thus it cannot be the case that .
This theorem tells us that we can decide if two distinct instructions and are concurrent, by checking that it is not the case that nor is it the case that .
The happens before relation can be used to compute a global state of distributed system, such that this state is in some sense consistent. Shortly, we will formally define the notion of consistency. Each processor executes instructions. A cut is defined as a vector of non-negative integers. Intuitively, the vector denotes the states of processors. Formally, denotes the number of instructions that processor has executed. Not all cuts correspond to collections of states of distributed processors that could be considered natural or consistent. For example, if a processor has received a message from and we record the state of in the cut by making appropriately large, but make so small that the cut contains the state of the sender before the moment when the message was sent, then we could say that such cut is not natural—there are instructions recorded in the cut that are causally affected by instructions that are not recorded in the cut. Such cuts we consider not consistent and so undesirable. Formally, a cut is inconsistent when there are processors and such that the instruction number of processor is causally affected by an instruction subsequent to instruction number of processor . So in an inconsistent cut there is a message that “crosses” the cut in a backward direction. Any cut that is not inconsistent is called a consistent cut.
algorithm uses vector timestamps to find a consistent cut. We assume that each processor is given the same cut as an input. Then processors must determine a consistent cut that is majorised by . Each processor has an infinite table of vectors. Processor executes instructions, and stores vector timestamps in consecutive entries of the table. Specifically, entry of the table is the vector timestamp of the -th instruction executed by the processor; we define to be the zero vector. Processor begins calculating a cut right after the moment when the processor has executed instruction number . The processor determines the largest number that is at most , such that the vector is majorised by . The vector that processors collectively find turns out to be a consistent cut.
Proof. First observe that there is no need to consider entries of further than . Each of these entries is not majorised by , because the -th coordinate of any of these vectors is strictly larger than . So we can indeed focus on searching among the first entries of . Let be the largest entry such that the vector is majorised by the vector . We know that such vector exists, because is a zero vector, and such vector is majorised by any cut .
We argue that is a consistent cut by way of contradiction. Suppose that the vector is an inconsistent cut. Then, by definition, there are processors and such that there is an instruction of processor subsequent to instruction number , such that happens before instruction number of processor . Recall that is the furthest entry of majorised by . So entry is not majorised by , and since all subsequent entries, including the one for instruction , can have only larger coordinates, the entries are not majorised by either. But, happens before instruction number , so entry can only have larger coordinates than respective coordinates of the entry corresponding to , and so cannot be majorised by either. This contradicts the assumption that is majorised by . Therefore, must be a consistent cut.
There is a trivial algorithm for finding a consistent cut. The algorithm picks . However, the
algorithm is better in the sense that the consistent cut found is maximal. That this is indeed true, is left as an exercise.
There is an alternative way to find a consistent cut. The Consistent Cut algorithm requires that we attach vector timestamps to messages and remember vector timestamps for all instructions executed so far by the algorithm which consistent cut we want to compute. This may be too costly. The algorithm called
avoids this cost. In the algorithm, a processor initiates the calculation of consistent cut by flooding the network with a special message that acts like a sword that cuts the execution of algorithm consistently. In order to prove that the cut is indeed consistent, we require that messages are received by the recipient in the order they were sent by the sender. Such ordering can be implemented using sequence number.
algorithm, each processor has a variable called counter that counts the number of instructions of algorithm executed by the processor so far. In addition the processor has a variable that will store the -th coordinate of the cut. This variable is initialised to . Since the variables counter only count the instructions of algorithm , the instructions of
algorithm do not affect the counter variables. In some sense the snapshot algorithm runs in the “background”. Suppose that there is exactly one processor that can decide to take a snapshot of the distributed system. Upon deciding, the processor “floods” the network with a special message ≤Snapshot≥. Specifically, the processor sends the message to all its neighbours and assigns counter to . Whenever a processor receives the message and the variable is still , then the processor sends ≤Snapshot≥ message to all its neighbours and assigns current to . The sending of ≤Snapshot≥ messages and assignment are done by the processor without executing any instruction of (we can think of
algorithm as an “interrupt”). The algorithm calculates a consistent cut.
Let for any processors and , the messages sent from to be received in the order they are sent. The
algorithm eventually finds a consistent cut . The algorithm sends messages, where is the number of edges in the graph.
Proof. The fact that each variable is eventually different from follows from our model, because we assumed that instructions are eventually executed and messages are eventually received, so the ≤Snapshot≥ messages will eventually reach all nodes.
Suppose that is not a consistent cut. Then there is a processor such that instruction number or later sends a message ≤ ≥ other than ≤Snapshot≥, and the message is received on or before a processor executes instruction number . So the message ≤ ≥ must have been sent after the message ≤Snapshot≥ was sent from to . But messages are received in the order they are sent, so processes ≤Snapshot≥ before it processes ≤ ≥. But then message ≤ ≥ arrives after snapshot was taken at . This is a desired contradiction.
13.5-1 Show that logical time preserves the happens before () relation. That is, show that if for events and it is the case that , then , where is the logical time of an event.
13.5-2 Show that any vector clock that captures concurrency between processors must have at least coordinates.
13.5-3 Show that the vector calculated by the algorithm
is in fact a maximal consistent cut majorised by . That is that there is no that majorises and is different from , such that is majorised by .