One often needs to coordinate the activities of processors in a distributed system. This task can frequently be simplified when there is a single processor that acts as a coordinator. Initially, the system may not have any coordinator, or an existing coordinator may fail, and so another may need to be elected. This gives rise to the problem in which the processors must elect exactly one among them, a leader. In this section we study the problem for a special type of network—the ring. We will develop an asynchronous algorithm for the problem and, as we shall demonstrate, the algorithm has asymptotically optimal message complexity. We will also see a distributed analogue of the well-known divide-and-conquer technique often used in sequential algorithms to keep their time complexity low; in distributed systems, the technique helps reduce the message complexity.
The leader election problem is to elect exactly one leader among a set of processors. Formally, each processor has a local variable leader, initially equal to nil. An algorithm is said to solve the leader election problem if it satisfies the following conditions:
(i) in any execution, exactly one processor eventually assigns true to its leader variable, and all other processors eventually assign false to their leader variables; and
(ii) in any execution, once a processor has assigned a value to its leader variable, the variable remains unchanged.
We study the leader election problem on a special type of network—the ring. Formally, the graph that models a distributed system consists of n nodes that form a simple cycle; no other edges exist in the graph. The two links adjacent to a processor are labeled CW (Clock-Wise) and CCW (Counter Clock-Wise). Processors agree on the orientation of the ring, i.e., if a message is passed on in the CW direction n times, then it visits all processors and comes back to the one that initially sent the message; the same holds for the CCW direction. Each processor has a unique identifier that is a natural number, i.e., the identifier of each processor is different from the identifier of any other processor; the identifiers do not have to be consecutive numbers. Initially, no processor knows the identifier of any other processor. Also, processors do not know the size n of the ring.
The Hirschberg–Sinclair algorithm elects a leader among n asynchronous processors arranged in a ring. Identifiers of processors are used by the algorithm in a crucial way. Briefly speaking, each processor tries to become the leader; the processor that has the largest identifier among all processors blocks the attempts of the other processors, declares itself to be the leader, and forces the others to declare themselves not to be leaders.
Let us begin with a simpler version of the algorithm to exemplify some of its ideas. Suppose that each processor sends a message around the ring containing the identifier of the processor. A processor passes on such a message only if the identifier that the message carries is strictly larger than the identifier of the processor. Thus the message sent by the processor that has the largest identifier among the processors of the ring will always be passed on, and so it will eventually travel around the ring and come back to the processor that initially sent it. The processor can detect that such a message has come back, because no other processor sends a message with this identifier (identifiers are distinct). We observe that no other message will make it all around the ring, because the processor with the largest identifier will not pass it on. We could say that the processor with the largest identifier "swallows" the messages that carry smaller identifiers. Then the processor becomes the leader and sends a special message around the ring forcing all others to decide not to be leaders. The algorithm has O(n^2) message complexity, because each processor induces at most n messages, and the leader induces n extra messages; moreover, one can assign identifiers to processors, and delay processors and messages, in such a way that the messages sent by a constant fraction of the processors are passed on around the ring for a constant fraction of n hops. The algorithm can be improved so as to reduce the message complexity to O(n lg n), and such an improved algorithm is presented in the remainder of the section.
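To make the quadratic worst case concrete, here is a small Python sketch of the simplified algorithm; the function name and the convention that identifier messages travel only clock-wise are our own assumptions for illustration.

```python
def simple_election_messages(ids):
    """Count messages generated by the simplified algorithm: each processor
    sends its identifier CW around the ring, and a processor passes a message
    on only if the carried identifier exceeds its own identifier.
    (The final termination messages are not counted.)"""
    n = len(ids)
    messages = 0
    for start in range(n):
        pos = (start + 1) % n
        messages += 1                # the initial send is one message (hop)
        while ids[pos] < ids[start]:
            pos = (pos + 1) % n
            messages += 1            # each pass along a link is one message
        # here the message is swallowed, or has returned to its sender
    return messages

# Identifiers decreasing in the direction of travel force long journeys:
n = 16
worst = list(range(n - 1, -1, -1))       # 15, 14, ..., 0
print(simple_election_messages(worst))   # n + n*(n-1)/2 = 136 hops
```

With this assignment, the message carrying identifier k travels past all smaller identifiers before being swallowed, so the total is n + n(n-1)/2, a Θ(n^2) count.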
The key idea of the Hirschberg–Sinclair algorithm is to make sure that not too many messages travel far, which will ensure O(n lg n) message complexity. Specifically, the activity of any processor is divided into phases. At the beginning of a phase, a processor sends "probe" messages in both directions: CW and CCW. These messages carry the identifier of the sender and a certain "time-to-live" value that limits the number of hops that each message can make. A probe message may be passed on by a processor provided that the identifier carried by the message is larger than the identifier of the processor. When the message reaches the limit without having been swallowed, it is "bounced back". Hence when the initial sender receives two bounced-back messages, one from each direction, the processor is certain that there is no processor with a larger identifier up until the limit in either the CW or the CCW direction, because otherwise such a processor would swallow a probe message. Only then does the processor enter the next phase by sending probe messages again, this time with the time-to-live value roughly doubled, in an attempt to find out whether there is a processor with a larger identifier in a twice as large neighbourhood. As a result, a probe message that the processor sends will make many hops only when there is no processor with a larger identifier in a large neighbourhood of the processor. Therefore, fewer and fewer processors send messages that can travel longer and longer distances. Consequently, as we will soon argue in detail, the message complexity of the algorithm is O(n lg n).
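The doubling of the neighbourhood can be read off directly from the time-to-live schedule. Assuming, as in the analysis below, that a probe in phase i carries ttl = 2^i - 1 (so it visits 2^i processors in each direction), a two-line check shows the reach doubling per phase:

```python
# Assumed ttl schedule: a phase-i probe carries ttl = 2**i - 1.
ttls = [2 ** i - 1 for i in range(6)]
reach = [t + 1 for t in ttls]        # processors visited in each direction
print(ttls)    # [0, 1, 3, 7, 15, 31]
print(reach)   # [1, 2, 4, 8, 16, 32] -- the neighbourhood doubles per phase
```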
We now detail the Hirschberg–Sinclair algorithm. Each processor has five local variables. The variable id stores the unique identifier of the processor. The variable leader stores true when the processor decides to be the leader, and false when it decides not to be the leader. The remaining three variables are used for bookkeeping: asleep determines if the processor has ever sent a ⟨probe,id,0,0⟩ message that carries the identifier id of the processor. Any processor may send ⟨probe,id,phase,ttl⟩ messages in both directions (CW and CCW) for different values of phase, where the time-to-live value ttl equals 2^phase - 1. Each time such a message is sent, a ⟨reply,id,phase⟩ message may be sent back to the processor. The variables CWreplied and CCWreplied are used to remember whether the replies have already been processed by the processor.
The code of each processor is composed of five segments. The first segment (lines 1–5) initialises the local variables of the processor. The second segment (lines 6–8) can only be executed when the local variable asleep is true. The remaining three segments (lines 9–17, 18–26, and 27–31) describe the actions that the processor takes when it processes each of the three types of messages: ⟨probe,ids,phase,ttl⟩, ⟨reply,ids,phase⟩ and ⟨terminate⟩, respectively. The messages carry parameters ids, phase and ttl that are natural numbers.
We now describe how the algorithm works. Recall that we assume that the local variables of each processor have been initialised before time 0 of the global clock. Each processor eventually sends a ⟨probe,id,0,0⟩ message carrying the identifier id of the processor. At that time we say that the processor enters phase number zero. In general, when a processor sends a message ⟨probe,id,phase,2^phase-1⟩, we say that the processor enters phase number phase. The message ⟨probe,id,0,0⟩ is never sent again, because false is assigned to asleep in line 7. It may happen that by the time this message is sent, some other messages have already been processed by the processor.
When a processor processes a message ⟨probe,ids,phase,ttl⟩ that has arrived on link CW (the link leading in the clock-wise direction), its actions depend on the relationship between the parameter ids and the identifier id of the processor. If ids is smaller than id, then the processor does nothing else (the processor swallows the message). If ids is equal to id and the processor has not yet decided, then, as we shall see, the probe message that the processor sent has circulated around the entire ring. The processor then sends a ⟨terminate⟩ message, decides to be the leader, and terminates (the processor may still process messages after termination). If ids is larger than id, then the actions of the processor depend on the value of the parameter ttl (time-to-live). When the value is strictly larger than zero, the processor passes on the probe message with ttl decreased by one. If, however, the value of ttl is already zero, then the processor sends back (in the CW direction) a reply message. Symmetric actions are executed when the ⟨probe,ids,phase,ttl⟩ message has arrived on link CCW, in the sense that the directions of sending messages are respectively reversed – see the code for details.
Code for any processor:

 1  id := identifier of the processor
 2  leader := nil
 3  asleep := true
 4  CWreplied := false
 5  CCWreplied := false
 6  IF asleep THEN
 7    asleep := false
 8    SEND ⟨probe,id,0,0⟩ to links CW and CCW
 9  PROCESS MESSAGE ⟨probe,ids,phase,ttl⟩ that has arrived on link CW (resp. CCW)
10    IF ids = id and leader = nil THEN
11      SEND ⟨terminate⟩ to link CCW
12      leader := true
13      terminate
14    IF ids > id and ttl > 0 THEN
15      SEND ⟨probe,ids,phase,ttl-1⟩ to link CCW (resp. CW)
16    IF ids > id and ttl = 0 THEN
17      SEND ⟨reply,ids,phase⟩ to link CW (resp. CCW)
18  PROCESS MESSAGE ⟨reply,ids,phase⟩ that has arrived on link CW (resp. CCW)
19    IF ids ≠ id THEN
20      SEND ⟨reply,ids,phase⟩ to link CCW (resp. CW)
21    ELSE
22      CWreplied := true (resp. CCWreplied := true)
23      IF CWreplied and CCWreplied THEN
24        CWreplied := false
25        CCWreplied := false
26        SEND ⟨probe,id,phase+1,2^(phase+1)-1⟩ to links CW and CCW
27  PROCESS MESSAGE ⟨terminate⟩ that has arrived on link CW
28    IF leader = nil THEN
29      SEND ⟨terminate⟩ to link CCW
30      leader := false
31    terminate
When a processor processes a message ⟨reply,ids,phase⟩ that has arrived on link CW, the processor first checks if ids is different from the identifier id of the processor. If so, the processor merely passes on the message. However, if ids = id, then the processor records the fact that a reply has been received from direction CW, by assigning true to CWreplied. Next, the processor checks if both the CWreplied and CCWreplied variables are true. If so, the processor has received replies from both directions. Then the processor assigns false to both variables. Next, the processor sends a probe message. This message carries the identifier id of the processor, the next phase number phase+1, and an increased time-to-live parameter 2^(phase+1) - 1. Symmetric actions are executed when ⟨reply,ids,phase⟩ has arrived on link CCW.
The last type of message that a processor can process is ⟨terminate⟩. The processor checks if it has already decided to be or not to be the leader. When no decision has been made so far, the processor passes on the ⟨terminate⟩ message and decides not to be the leader. This message eventually reaches a processor that has already decided, and then the message is no longer passed on.
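The message-processing rules described above can be exercised with a small Python simulation. The event loop, the data representation, and the ttl schedule 2^phase - 1 are our own modelling choices, not part of the pseudocode; since the algorithm is asynchronous, any processing order elects the same leader.

```python
from collections import deque

def hs_election(ids):
    """Toy simulation of the ring algorithm on asynchronous processors.
    ids[p] is processor p's unique identifier; p's CW link leads to (p+1) % n.
    Returns (index of the elected leader, leader flags, total messages)."""
    n = len(ids)
    leader = [None] * n                   # nil / true / false
    cw_replied = [False] * n
    ccw_replied = [False] * n
    queue = deque()
    msgs = 0

    def send(frm, link, msg):             # one message = one hop on one link
        nonlocal msgs
        msgs += 1
        if link == 'CW':
            queue.append(((frm + 1) % n, 'CCW', msg))
        else:
            queue.append(((frm - 1) % n, 'CW', msg))

    for p in range(n):                    # every processor wakes up (lines 6-8)
        send(p, 'CW', ('probe', ids[p], 0, 0))
        send(p, 'CCW', ('probe', ids[p], 0, 0))

    elected = None
    while queue:                          # FIFO order; any order works
        p, link, msg = queue.popleft()
        onward = 'CW' if link == 'CCW' else 'CCW'
        if msg[0] == 'probe':
            _, ids_, phase, ttl = msg
            if ids_ == ids[p] and leader[p] is None:
                send(p, onward, ('terminate',))   # my probe went full circle
                leader[p] = True
                elected = p
            elif ids_ > ids[p] and ttl > 0:
                send(p, onward, ('probe', ids_, phase, ttl - 1))
            elif ids_ > ids[p] and ttl == 0:
                send(p, link, ('reply', ids_, phase))   # bounce the probe back
            # if ids_ < ids[p]: swallow the message
        elif msg[0] == 'reply':
            _, ids_, phase = msg
            if ids_ != ids[p]:
                send(p, onward, msg)                    # pass the reply on
            else:
                if link == 'CW':
                    cw_replied[p] = True
                else:
                    ccw_replied[p] = True
                if cw_replied[p] and ccw_replied[p]:    # survived this phase
                    cw_replied[p] = ccw_replied[p] = False
                    ttl = 2 ** (phase + 1) - 1          # double the reach
                    send(p, 'CW', ('probe', ids[p], phase + 1, ttl))
                    send(p, 'CCW', ('probe', ids[p], phase + 1, ttl))
        else:                              # ('terminate',)
            if leader[p] is None:
                send(p, onward, ('terminate',))
                leader[p] = False
    return elected, leader, msgs

winner, flags, msgs = hs_election([3, 7, 1, 11, 5, 2, 9, 4])
print(winner, flags.count(True))   # processor 3 (identifier 11) wins alone
```

Running the simulation on any assignment of distinct identifiers elects the processor with the maximum identifier, with every other processor's leader flag set to false.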
We begin the analysis by showing that the Hirschberg–Sinclair algorithm solves the leader election problem.
Proof. We need to show that the two conditions listed at the beginning of the section are satisfied. The key idea that simplifies the argument is to focus on one processor. Consider the processor ℓ with the maximum id among all processors in the ring. This processor eventually executes lines 6–8. Then the processor sends ⟨probe,id,0,0⟩ messages in the CW and CCW directions. Note that whenever the processor sends ⟨probe,id,phase,2^phase-1⟩ messages, each such message is always passed on by the other processors, until the ttl parameter of the message drops down to zero, or the message travels around the entire ring and arrives at ℓ. If the message never arrives at ℓ, then a processor eventually receives the probe message with ttl equal to zero, and that processor sends a reply back to ℓ. Then ℓ eventually receives ⟨reply,id,phase⟩ messages, one from each direction, and enters phase number phase+1 by sending probe messages ⟨probe,id,phase+1,2^(phase+1)-1⟩ in both directions. These messages carry a larger time-to-live value compared to the value from the previous phase number phase. Since the ring is finite, eventually ttl becomes so large that processor ℓ receives a probe message that carries the identifier of ℓ. Note that ℓ will eventually receive two such messages. The first time ℓ processes such a message, the processor sends a ⟨terminate⟩ message and terminates as the leader. The second time ℓ processes such a message, lines 11–13 are not executed, because the variable leader is no longer nil. Note that no other processor can execute lines 11–13, because a probe message originated at a processor other than ℓ cannot travel around the entire ring, since ℓ is on the way and would swallow the message; and since identifiers are distinct, no other processor sends a probe message that carries the identifier of processor ℓ. Thus no processor other than ℓ can assign true to its leader variable. Any processor other than ℓ will receive the ⟨terminate⟩ message, assign false to its leader variable, and pass on the message. Finally, the ⟨terminate⟩ message will arrive at ℓ, and ℓ will not pass it on anymore. The argument presented thus far ensures that eventually exactly one processor assigns true to its leader variable, all other processors assign false to their leader variables, and once a processor has assigned a value to its leader variable, the variable remains unchanged.
Our next task is to give an upper bound on the number of messages sent by the algorithm. The subsequent lemma shows that the number of processors that can enter a phase decays exponentially as the phase number increases.

Lemma 13.9 For any i ≥ 0, the number of processors that enter phase number i is at most n / 2^(i-1).

Proof. There are exactly n processors that enter phase number 0, because each processor eventually sends a ⟨probe,id,0,0⟩ message. The bound stated in the lemma says that the number of processors that enter phase 0 is at most 2n, so the bound evidently holds for i = 0. Let us consider any of the remaining cases, i.e., let us assume that i ≥ 1. Suppose that a processor v enters phase number i, and so by definition it sends the message ⟨probe,id,i,2^i-1⟩. In order for a processor to send such a message, each of the two probe messages ⟨probe,id,i-1,2^(i-1)-1⟩ that the processor sent in the previous phase in both directions must have made 2^(i-1) hops, always arriving at a processor with a strictly lower identifier than the identifier of v (because otherwise, if a probe message arrives at a processor with a strictly larger or the same identifier, then the message is swallowed, and so a reply message is not generated, and consequently v cannot enter phase number i). As a result, if a processor enters phase number i, then there is no other processor within 2^(i-1) hops in either direction that can ever enter the phase. Suppose that there are k processors that enter phase i. We can associate with each such processor v the 2^(i-1) consecutive processors that follow v in the CW direction. This association assigns 2^(i-1) distinct processors to each of the k processors, because any two processors that enter phase i are more than 2^(i-1) hops apart. So there must be at least k·(1 + 2^(i-1)) distinct processors in the ring. Hence k·(1 + 2^(i-1)) ≤ n, and so we can weaken this bound by dropping the 1, and conclude that k ≤ n / 2^(i-1), as desired.
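The counting argument can be checked numerically. The helper below (our own, not from the text) counts processors whose identifier beats every identifier within 2^(i-1) hops on both sides, which is a necessary condition for entering phase i; the count never exceeds n / (1 + 2^(i-1)).

```python
import random

def phase_i_candidates(ids, i):
    """Count processors whose id exceeds every id within 2**(i-1) hops
    in both directions -- a necessary condition for entering phase i."""
    n, r = len(ids), 2 ** (i - 1)
    count = 0
    for p in range(n):
        window = [ids[(p + d) % n] for d in range(1, r + 1)]
        window += [ids[(p - d) % n] for d in range(1, r + 1)]
        if all(ids[p] > x for x in window):
            count += 1
    return count

random.seed(13)
n = 64
ids = list(range(n))
random.shuffle(ids)
for i in range(1, 7):
    r = 2 ** (i - 1)
    assert phase_i_candidates(ids, i) <= n // (r + 1)   # k*(1+2^(i-1)) <= n
```

For i = 6 the window of 2^5 = 32 hops in both directions covers the whole ring, so exactly one processor (the one with the globally maximal identifier) remains a candidate.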
Note that any processor entering phase i sends probe messages that are intended to travel 2^i hops away, and whose replies travel 2^i hops back, in each direction (CW and CCW). This contributes at most 4·2^i messages per processor that enters phase number i. The contribution may be smaller than 4·2^i if a probe message gets swallowed on the way away from the processor. Lemma 13.9 provides an upper bound on the number of processors that enter phase number i. What is the highest phase that a processor can ever enter? The number of processors that can be in phase i is at most n / 2^(i-1). So when n / 2^(i-1) < 1, then there can be no processor that ever enters phase i. Thus no processor can enter any phase beyond phase number 1 + ⌈lg n⌉, because n / 2^(1+⌈lg n⌉) < 1. Finally, a single processor sends one termination message that travels around the ring once. So for the total number of messages sent by the algorithm we get the bound

n + Σ_{i=0}^{1+⌈lg n⌉} (n / 2^(i-1)) · 4·2^i = n + Σ_{i=0}^{1+⌈lg n⌉} 8n = O(n lg n).
Burns furthermore showed that the asynchronous leader election algorithm is asymptotically optimal: any uniform algorithm solving the leader election problem in an asynchronous ring must send a number of messages at least proportional to n lg n.
The proof, for any algorithm, is based on constructing certain executions of the algorithm on rings of size n/2. Then two rings of size n/2 are pasted together in such a way that the constructed executions on the smaller rings are combined, and additional messages, proportional in number to n, are forced to be received. This construction strategy yields the desired logarithmic multiplicative overhead.
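To see where the n lg n growth comes from, one can solve the recurrence that this pasting strategy suggests. The additive term c·n with c = 1 below is an illustrative assumption, not the constant from Burns' actual argument.

```python
def burns_lower_bound(n):
    """Illustrative recurrence M(n) = 2*M(n/2) + n, M(1) = 0:
    pasting two rings of size n/2 forces about n extra messages."""
    if n <= 1:
        return 0
    return 2 * burns_lower_bound(n // 2) + n

# For powers of two this solves to exactly n * lg n:
print([burns_lower_bound(2 ** k) for k in range(1, 5)])   # [2, 8, 24, 64]
```

Each doubling of the ring size adds one more "level" contributing about n messages, and there are lg n levels, giving the Ω(n lg n) bound.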
13.3-1 Show that the simplified algorithm presented at the beginning of the section has Ω(n^2) message complexity, by appropriately assigning identifiers to processors on a ring of size n, and by determining how to delay processors and messages.
13.3-2 Show that the Hirschberg–Sinclair algorithm has Ω(n lg n) message complexity.