13.7. Rumor collection algorithms

Reliable multicast services can be used as building blocks in constructing algorithms for more advanced communication problems. In this section we illustrate this method for the problem of collecting rumors by synchronous processors prone to crashes. (Since we consider only fair executions, we assume that at least one processor remains operational to the end of the computation).

13.7.1. Rumor collection problem and requirements

The classic problem of collecting rumors, or gossip, is defined as follows:

At the beginning, each processor $p_i$ has its distinct piece of information $rumor_i$, called a rumor; the goal is to make every processor know all the rumors.

However, in the model with processor crashes we need to re-define the gossip problem to respect crash failures of processors. Both the Integrity and No-Duplicates properties are the same as in the reliable broadcast service; the only difference (which follows from the specification of the gossip problem) is in the Liveness requirements:

  • Non-faulty Liveness: the rumor of every non-faulty processor must be known by each non-faulty processor.

  • Faulty Liveness: if processor $p_i$ has crashed during the execution, then each non-faulty processor either knows the rumor $rumor_i$ of $p_i$ or knows that $p_i$ has crashed.

The efficiency of gossip algorithms is measured in terms of time and message complexity. Time complexity measures the number of (synchronous) steps from the beginning to the termination. Message complexity measures the total number of point-to-point messages sent (more precisely, if a processor sends a message to three other processors in one synchronous step, it contributes three to the message complexity).

The following simple algorithm completes gossip in just one synchronous step: each processor broadcasts its rumor to all processors. The algorithm is correct, because each message received contains a rumor, and a message not received means the failure of its sender. A drawback of such a solution is that a quadratic number of messages could be sent, which is quite inefficient.
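
To make the accounting concrete, the following sketch (ours, not part of the original text) simulates this one-round algorithm for a small system; for simplicity it assumes that a crashed processor sends nothing in the round, although in the crash model a processor may also fail in the middle of its send step.

    def one_round_gossip(n, crashed):
        """Each operational processor broadcasts its rumor to all others."""
        delivered = {}                      # (sender, receiver) -> rumor
        messages = 0
        for i in set(range(n)) - crashed:   # send step
            for j in range(n):
                if j != i:
                    delivered[(i, j)] = "rumor_%d" % i
                    messages += 1
        views = {}                          # receive step
        for j in set(range(n)) - crashed:
            view = {j: "rumor_%d" % j}
            for i in range(n):
                if i != j:
                    # A missing message can only mean that the sender crashed.
                    view[i] = delivered.get((i, j), "crashed")
            views[j] = view
        return views, messages

    views, messages = one_round_gossip(5, crashed={2})
    print(messages)        # 16 = 4 senders * 4 recipients, i.e. Theta(n^2)
    print(views[0][2])     # 'crashed'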

We would like to perform gossip not only quickly, but also with fewer point-to-point messages. There is a natural trade-off between time and communication. Note that in the system without processor crashes such a trade-off may be achieved, e.g., by sending messages over the (almost) complete binary tree; then the time complexity is $O(\log n)$, while the message complexity is $O(n \log n)$. Hence by slightly increasing time complexity we may achieve an almost linear improvement in message complexity. However, if the underlying communication network is prone to failures of components, then irregular failure patterns disturb the flow of information and make gossiping last longer. The question we address in this section is: what is the best trade-off between time and message complexity in the model with processor crashes?
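
As a back-of-the-envelope check of these bounds (our calculation, not part of the original analysis), consider flooding over a complete binary tree: in every round each processor sends its current knowledge to all of its tree neighbours. Any two vertices of the tree are within distance $2\lceil\log_2 n\rceil$ of each other, and each of the $n-1$ tree edges carries at most two messages per round, hence

$$\text{time} \le 2\lceil\log_2 n\rceil = O(\log n), \qquad \text{messages} \le 2(n-1)\cdot 2\lceil\log_2 n\rceil = O(n\log n).$$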

13.7.2. Efficient gossip algorithms

In this part we describe a family of gossip algorithms, among which we can find some efficient ones. They are all based on the same generic code, and their efficiency depends on the quality of two data structures put into the generic algorithm. Our goal is to prove that one can find data structures such that the obtained algorithm is always correct, and efficient if the number of crashes in the execution is at most $f$, where $f \le n-1$ is a parameter.

We start with a description of these structures: communication graphs and communication schedules.

13.7.2.1.  Communication graph.

A graph $G = (V, E)$ consists of a set $V$ of vertices and a set $E$ of edges. Graphs in this section are always simple, which means that edges are pairs of vertices, with no direction associated with them. Graphs are used to describe communication patterns. The set $V$ of vertices of a graph consists of the processors of the underlying distributed system. Edges in $E$ determine the pairs of processors that communicate directly by exchanging messages, but this does not necessarily mean the existence of a physical link between them. We abstract from the communication mechanism: messages that are exchanged between two vertices connected by an edge in $E$ may need to be routed and traverse a possibly long path in the underlying physical communication network. Graph topologies we use, for a given number $n$ of processors, vary depending on an upper bound $f$ on the number of crashes we would like to tolerate in an execution. A graph that matters, at a given point in an execution, is the one induced by the processors that have not crashed till this step of the execution.

To obtain an efficient gossip algorithm, communication graphs should satisfy some suitable properties, for example the following property $\mathcal{R}(n,f)$:

Definition 13.27 Let $n$ and $f < n$ be a pair of positive integers. Graph $G$ is said to satisfy property $\mathcal{R}(n,f)$ if $G$ has $n$ vertices and if, for each subgraph $R \subseteq G$ of size at least $n - f$, there is a subgraph $P(R)$ of $G$ such that the following hold:

1: $P(R) \subseteq R$

2: $|P(R)| \ge |R|/7$

3: The diameter of $P(R)$ is at most $2 + 30 \ln n$

4: If $R_1 \subseteq R_2$, then $P(R_1) \subseteq P(R_2)$

In the above definition, clause (1) requires the existence of subgraphs whose vertices have the potential of (informally) inheriting the properties of the vertices of $R$, clause (2) requires the subgraphs to be sufficiently large, linear in size, clause (3) requires the existence of paths in the subgraphs, of at most logarithmic length, that can be used for communication, and clause (4) imposes monotonicity on the required subgraphs. Observe that graph $P(R)$ is connected, even if $R$ is not, since its diameter is finite. The following result shows that graphs satisfying property $\mathcal{R}(n,f)$ can be constructed, and that their degree is not too large.

Theorem 13.28 For each $f < n$, there exists a graph $G(n,f)$ satisfying property $\mathcal{R}(n,f)$. The maximum degree of graph $G(n,f)$ is $O\left((n/(n-f))^{1.837}\right)$.
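
To make property $\mathcal{R}(n,f)$ concrete, here is a brute-force checker (an illustrative sketch of ours; the function names and the candidate mapping P are not from the text). Given a graph, the parameter $f$, a candidate mapping $R \mapsto P(R)$ and a diameter bound, it verifies clauses (1)-(4) by enumerating all vertex subsets of size at least $n-f$, so it is only feasible for very small $n$; distances in clause (3) are measured inside the subgraph of $G$ induced by $P(R)$.

    from itertools import combinations

    def diameter(vertices, edges):
        """Diameter of the subgraph induced by `vertices`; inf if disconnected."""
        adj = {v: [u for u in vertices if frozenset((u, v)) in edges]
               for v in vertices}
        worst = 0
        for s in vertices:
            dist, queue = {s: 0}, [s]
            for v in queue:                     # breadth-first search from s
                for u in adj[v]:
                    if u not in dist:
                        dist[u] = dist[v] + 1
                        queue.append(u)
            if len(dist) < len(vertices):
                return float("inf")
            worst = max(worst, max(dist.values()))
        return worst

    def check_property_R(n, f, edges, P, diam_bound):
        subsets = [frozenset(R) for size in range(n - f, n + 1)
                   for R in combinations(range(n), size)]
        for R in subsets:
            PR = P(R)
            assert PR <= R                            # clause (1)
            assert 7 * len(PR) >= len(R)              # clause (2): |P(R)| >= |R|/7
            assert diameter(PR, edges) <= diam_bound  # clause (3)
        for R1 in subsets:                            # clause (4): monotonicity
            for R2 in subsets:
                if R1 <= R2:
                    assert P(R1) <= P(R2)
        return True

    # The complete graph with P(R) = R satisfies all clauses with diameter 1.
    n, f = 5, 2
    edges = {frozenset((u, v)) for u in range(n) for v in range(u + 1, n)}
    print(check_property_R(n, f, edges, lambda R: R, diam_bound=1))  # True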

13.7.2.2.  Communication schedules.

A local permutation is a permutation of all the integers in the range $[0 \ldots n-1]$. We assume that prior to the computation there is a given set $\Pi$ of $n$ local permutations. Each processor $p_i$ has such a permutation $\pi_i$ from $\Pi$. For simplicity we assume that $\pi_i(0) = i$. The local permutation is used to collect rumors in a systematic way, according to the order given by this permutation, while communication graphs are rather used to exchange already collected rumors within a large and compact non-faulty graph component.
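
A set $\Pi$ of schedules can be produced as follows (a minimal sketch of ours, using the normalisation $\pi_i(0) = i$ assumed above; random permutations are exactly what Theorems 13.30 and 13.31 below use):

    import random

    def make_schedules(n, seed=0):
        """One local permutation of 0..n-1 per processor, with pi_i(0) = i."""
        rng = random.Random(seed)
        schedules = []
        for i in range(n):
            rest = [j for j in range(n) if j != i]
            rng.shuffle(rest)               # random order of the other ids
            schedules.append([i] + rest)
        return schedules

    print(make_schedules(4))    # e.g. [[0, 2, 1, 3], [1, 0, 3, 2], ...]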

13.7.2.3.  Generic algorithm.

We start with specifying a goal that gossiping algorithms need to achieve. We say that processor $p_i$ has heard about processor $p_j$ if either $p_i$ knows the original input rumor of $p_j$ or $p_i$ knows that $p_j$ has already failed. We may reformulate the correctness of a gossiping algorithm in terms of hearing about other processors: the algorithm is correct if the Integrity and No-Duplicates properties are satisfied, and if each processor has heard about every other processor by the termination of the algorithm.

The code of a gossiping algorithm includes objects that depend on the number $n$ of processors in the system, and also on the bound $f$ on the number of failures which are “efficiently tolerated” (if the number of failures is at most $f$, then the message complexity of the designed algorithm is small). The additional parameter is a termination threshold $\tau$, which influences the time complexity of the specific implementation of the generic gossip scheme. Our goal is to construct a generic gossip algorithm that is correct for any parameters $n$, $f$, $\tau$, any communication graph $G$ and any set of schedules $\Pi$, while being efficient for some specific values $n$, $f$, $\tau$ and structures $G(n,f)$ and $\Pi$.

Each processor starts gossiping as a collector. Collectors actively seek information about the rumors of the other processors, by sending direct inquiries to some of them. A collector becomes a disseminator after it has heard about all the processors. Processors with this status disseminate their knowledge by sending local views to selected other processors.

Local views. Each processor $p_i$ starts with knowing only its ID $i$ and its input information $rumor_i$. To store incoming data, processor $p_i$ maintains the following arrays:

  $Rumor_i$, $Active_i$ and $Pending_i$,

each of size $n$. All these arrays are initialised to store the value nil. For an array $X_i$ of processor $p_i$, we denote its $j$th entry by $X_i[j]$ - intuitively, this entry contains some information about processor $p_j$. The array $Rumor$ is used to store all the rumors that a processor knows. At the start, processor $p_i$ sets $Rumor_i[i]$ to its own input $rumor_i$. Each time processor $p_i$ learns some $rumor_j$, it immediately sets $Rumor_i[j]$ to this value. The array $Active$ is used to store the set of all the processors that the owner of the array knows to have crashed. Once processor $p_i$ learns that some processor $p_j$ has failed, it immediately sets $Active_i[j]$ to failed. Notice that processor $p_i$ has heard about processor $p_j$ if one among the values $Rumor_i[j]$ and $Active_i[j]$ is not equal to nil.

The purpose of using the array $Pending$ is to facilitate dissemination. Each time processor $p_i$ learns that some other processor $p_j$ is fully informed, that is, it is either a disseminator itself or has been notified by a disseminator, then $p_i$ marks this information in $Pending_i[j]$. Processor $p_i$ uses the array $Pending_i$ to send dissemination messages in a systematic way, by scanning $Pending_i$ to find those processors that possibly still have not heard about some processor.

The following is useful terminology about the current contents of the arrays $Active$ and $Pending$. Processor $p_j$ is said to be active according to $p_i$ if $p_i$ has not yet received any information implying that $p_j$ has crashed, which is the same as having nil in $Active_i[j]$. Processor $p_j$ is said to need to be notified by $p_i$ if it is active according to $p_i$ and $Pending_i[j]$ is equal to nil.
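
This bookkeeping is captured by the following sketch (ours; the class and method names are illustrative, not the book's code). It stores the three arrays and answers the predicates just defined: heard about, active according to me, and needs to be notified.

    NIL = None

    class LocalView:
        """Arrays Rumor, Active and Pending of one processor p_i."""

        def __init__(self, i, rumor, n):
            self.i = i
            self.rumor = [NIL] * n    # Rumor_i[j]: rumor of p_j, once learnt
            self.active = [NIL] * n   # Active_i[j]: "failed" once p_j is known crashed
            self.pending = [NIL] * n  # Pending_i[j]: "done" once p_j is fully informed
            self.rumor[i] = rumor

        def heard_about(self, j):
            return self.rumor[j] is not NIL or self.active[j] is not NIL

        def has_heard_about_all(self):
            return all(self.heard_about(j) for j in range(len(self.rumor)))

        def active_according_to_me(self, j):
            return self.active[j] is NIL

        def needs_to_be_notified(self, j):
            return self.active_according_to_me(j) and self.pending[j] is NIL

    view = LocalView(0, "rumor_0", 3)
    view.active[2] = "failed"
    print(view.heard_about(2), view.needs_to_be_notified(1))   # True True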

Phases. An execution of a gossiping algorithm starts with the processors initialising all the local objects. Processor $p_i$ initialises its list $Rumor_i$ with nil at all the locations, except for the $i$th one, which is set equal to $rumor_i$. The remaining part of the execution is structured as a loop, in which phases are iterated. Each phase consists of three parts: receiving messages, local computation, and multicasting messages. Phases are of two kinds: regular phases and ending phases. During a regular phase a processor receives messages, updates its local knowledge, checks its status, and sends its knowledge to its neighbours in the communication graph, as well as inquiries about rumors and replies about its own rumor. During an ending phase a processor receives messages, sends inquiries to all processors from which it has not heard yet, and replies about its own rumor. The regular phases are performed $\tau$ times; the number $\tau$ is the termination threshold. After this, the ending phase is performed four times. This defines a generic gossiping algorithm.

Generic-Gossip

       Code for any processor $p_i$, $0 \le i \le n-1$

       01  INITIALISATION
       02    processor $p_i$ becomes a collector
       03    initialisation of arrays $Rumor_i$, $Active_i$ and $Pending_i$
       11  REPEAT $\tau$ times
       12    PERFORM regular phase
       20  REPEAT 4 times
       21    PERFORM ending phase

Now we describe the communication pattern and the kinds of messages used in regular and ending phases.

Graph and range messages used during regular phases. A processor $p_i$ may send a message to its neighbour in the graph $G$, provided that the neighbour is still active according to $p_i$. Such a message is called a graph one. Sending these messages only is not sufficient to complete gossiping, because the communication graph may become disconnected as a result of node crashes. Hence other messages are also sent, to cover all the processors in a systematic way. In this kind of communication processor $p_i$ considers the processors as ordered by its local permutation $\pi_i$, that is, in the order $\pi_i(0), \pi_i(1), \ldots, \pi_i(n-1)$. Some of the additional messages sent in this process are called range ones.

During regular phases processors send the following kinds of range messages: inquiring, reply and notifying messages. A collector $p_i$ sends an inquiring message to the first processor about which $p_i$ has not heard yet. Each recipient of such a message sends back a range message that is called a reply one.

Disseminators also send range messages to selected processors. Such messages are called notifying ones. The target processor selected by disseminator $p_i$ is the first one that still needs to be notified by $p_i$; a sketch of this selection is given after this paragraph. Notifying messages need not be replied to: the sender already knows the rumors of all the processors that are active according to it, and the purpose of the message is to disseminate this knowledge.
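
The selection of the inquiry or notification target is a simple scan along the local permutation (our sketch; a plain predicate function stands in for the LocalView predicates of the earlier sketch):

    def first_matching(schedule, predicate):
        """First processor in p_i's schedule satisfying `predicate`, or None."""
        for j in schedule:
            if predicate(j):
                return j
        return None

    # A collector inquires the first processor it has not heard about yet;
    # a disseminator notifies the first processor that needs to be notified.
    heard = {0: True, 1: False, 2: True, 3: False}
    print(first_matching([0, 2, 1, 3], lambda j: not heard[j]))   # 1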

Regular-Phase

       Code for any processor $p_i$, $0 \le i \le n-1$

       01  RECEIVE messages
       11  PERFORM local computation
       12    UPDATE the local arrays
       13    IF $p_i$ is a collector that has already heard about all the processors
       14      THEN $p_i$ becomes a disseminator
       15    COMPUTE set of destination processors: FOR each processor $p_j$
       16      IF $p_j$ is active according to $Active_i$ and $p_j$ is a neighbour of $p_i$ in graph $G$
       17        THEN add $p_j$ to the destination set for a graph message
       18      IF $p_i$ is a collector and $p_j$ is the first processor about which $p_i$ has not heard yet
       19        THEN send an inquiring message to $p_j$
       20      IF $p_i$ is a disseminator and $p_j$ is the first processor that needs to be notified by $p_i$
       21        THEN send a notifying message to $p_j$
       22      IF $p_j$ is a collector from which an inquiring message was received in the receiving step of this phase
       23        THEN send a reply message to $p_j$
       30  SEND graph/inquiring/notifying/reply messages to corresponding destination sets

Last-resort messages used during ending phases. Messages sent during the ending phases are called last-resort ones. These messages are categorised into inquiring, replying, and notifying messages, similarly to the corresponding range ones, because they serve a similar purpose. Collectors that have not heard about some processors yet send direct inquiries to all of these processors simultaneously. Such messages are called inquiring ones. They are replied to by the non-faulty recipients in the next step, by way of sending reply messages. This phase converts all the collectors into disseminators. In the next phase, each disseminator sends a message to all the processors that need to be notified by it. Such messages are called notifying ones.

The number of graph messages, sent by a processor in a step of a regular phase, is at most as large as the maximum node degree in the communication graph. The number of range messages, sent by a processor in a step of a regular phase, is at most as large as the number of inquiries received plus a constant - hence the global number of point-to-point range messages sent by all processors during regular phases may be accounted as a constant times the number of inquiries sent (which is one per processor per phase). In contrast to that, there is no a priori upper bound on the number of messages sent during the ending phases. By choosing the termination threshold $\tau$ to be large enough, one may control how many rumors still need to be collected during the ending phases.
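
As a sanity check on this accounting (our arithmetic, with $\Delta$ denoting the maximum degree of the communication graph and $\tau$ the termination threshold), the regular phases altogether contribute

$$\underbrace{\tau\, n\, \Delta}_{\text{graph messages}} + \underbrace{O(\tau\, n)}_{\text{range messages}} = O(\tau\, n\, \Delta)$$

point-to-point messages, whereas a single ending phase may cost as many as $\Theta(n^2)$ messages in the worst case; this is why $\tau$ must be chosen large enough that few processors remain unheard-of when the ending phases begin.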

Updating local view. A message sent by a processor carries its current local knowledge. More precisely, a message sent by processor $p_i$ brings the following: the ID $i$, the arrays $Rumor_i$, $Active_i$ and $Pending_i$, and a label to notify the recipient about the character of the message. A label is selected from the following: graph_message, inquiry_from_collector, notification_from_disseminator, this_is_a_reply; their meaning is self-explanatory. A processor $p_i$ scans a newly received message from some processor $p_j$ to learn about rumors, failures, and the current status of other processors. It copies each rumor from the received copy of $Rumor_j$ into $Rumor_i$, unless it is already there. It sets $Active_i[k]$ to failed if this value is at $Active_j[k]$. It sets $Pending_i[k]$ to done if this value is at $Pending_j[k]$. It sets $Pending_i[j]$ to done if $p_j$ is a disseminator and the received message is a range one. If $p_i$ is itself a disseminator, then it sets $Pending_i[j]$ to done immediately after sending a range message to $p_j$. If a processor $p_i$ expects a message to come from processor $p_j$, for instance a graph one from a neighbour in the communication graph, or a reply one, and the message does not arrive, then $p_i$ knows that processor $p_j$ has failed, and it immediately sets $Active_i[j]$ to failed.
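
These rules translate almost line by line into code (our sketch; the Message record mirrors the contents and the four labels listed above, and the arrays are plain lists as in the LocalView sketch):

    from dataclasses import dataclass

    @dataclass
    class Message:
        sender: int
        rumor: list                 # copy of the sender's Rumor array
        active: list                # copy of the sender's Active array
        pending: list               # copy of the sender's Pending array
        label: str                  # graph_message / inquiry_from_collector /
                                    # notification_from_disseminator / this_is_a_reply

    def update_from_message(rumor, active, pending, msg):
        """Merge one received message into the local arrays of p_i."""
        for k in range(len(rumor)):
            if rumor[k] is None and msg.rumor[k] is not None:
                rumor[k] = msg.rumor[k]         # copy rumors not known yet
            if msg.active[k] == "failed":
                active[k] = "failed"            # learn about crashed processors
            if msg.pending[k] == "done":
                pending[k] = "done"             # learn who is fully informed
        if msg.label == "notification_from_disseminator":
            pending[msg.sender] = "done"        # the sender is a disseminator

    def note_missing_message(active, j):
        """An expected message from p_j did not arrive, so p_j has crashed."""
        active[j] = "failed"

    r, a, p = ["rumor_0", None, None], [None] * 3, [None] * 3
    msg = Message(1, [None, "rumor_1", None], [None, None, "failed"],
                  [None] * 3, "notification_from_disseminator")
    update_from_message(r, a, p, msg)
    print(r[1], a[2], p[1])     # rumor_1 failed done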

Ending-Phase

       Code for any processor $p_i$, $0 \le i \le n-1$

       01  RECEIVE messages
       11  PERFORM local computation
       12    UPDATE the local arrays
       13    IF $p_i$ is a collector that has already heard about all the processors
       14      THEN $p_i$ becomes a disseminator
       15    COMPUTE set of destination processors: FOR each processor $p_j$
       16      IF $p_i$ is a collector and it has not heard about $p_j$ yet
       17        THEN send an inquiring message to $p_j$
       18      IF $p_i$ is a disseminator and $p_j$ needs to be notified by $p_i$
       19        THEN send a notifying message to $p_j$
       20      IF an inquiring message was received from $p_j$ in the receiving step of this phase
       21        THEN send a reply message to $p_j$
       30  SEND inquiring/notifying/reply messages to corresponding destination sets

Correctness. Ending phases guarantee correctness, as is stated in the next fact.

Lemma 13.29 Generic-Gossip is correct for every communication graph $G$ and set of schedules $\Pi$.

Proof. The Integrity and No-Duplicates properties follow directly from the code and the multicast service in the synchronous message-passing system. It remains to prove that each processor has heard about all processors. Consider the step just before the first ending phase. If a processor $p_i$ has not heard about some other processor $p_j$ yet, then it sends a last-resort message to $p_j$ in the first ending phase. It is replied to in the second ending phase, unless processor $p_j$ has crashed already. In any case, in the third ending phase, processor $p_i$ either learns the input rumor of $p_j$ or gets to know that $p_j$ has failed. The fourth ending phase provides an opportunity to receive notifying messages, by all the processors that such messages were sent to by $p_i$.

The choice of the communication graph $G$, the set of schedules $\Pi$ and the termination threshold $\tau$ influences, however, the time and message complexities of the specific implementation of the Generic Gossip algorithm. First consider the case when $G$ is a communication graph satisfying property $\mathcal{R}(n,f)$ from Definition 13.27, $\Pi$ contains $n$ random permutations, and $\tau = c \log^2 n$ for a sufficiently large positive constant $c$. Using Theorem 13.28 we get the following result.

Theorem 13.30 For every $n$ and $f \le c \cdot n$, for some constant $0 < c < 1$, there is a graph $G(n,f)$ such that the implementation of the generic gossip scheme with $G(n,f)$ as the communication graph and a set $\Pi$ of random permutations completes gossip in expected time $O(\log^2 n)$ and with expected message complexity $O(n \log^2 n)$, if the number of crashes is at most $f$.

Consider a small modification of the Generic Gossip scheme: during a regular phase every processor sends an inquiring message to the first $\Delta$ (instead of one) processors according to its permutation $\pi_i$, where $\Delta$ is the maximum degree of the used communication graph $G(n,f)$. Note that this does not influence the asymptotic message complexity, since besides the inquiring messages, in every regular phase each processor sends $\Delta$ graph messages.

Theorem 13.31 For every $n$ there are parameters $f$ and $\tau = O(\log^2 n)$ and there is a graph $G(n,f)$ such that the implementation of the modified Generic Gossip scheme with $G(n,f)$ as the communication graph and a set $\Pi$ of random permutations completes gossip in expected time $O(\log^2 n)$ and with expected message complexity $O(n^{1.838})$, for any number of crashes.

Since in the above theorem the set $\Pi$ is selected prior to the computation, we obtain the following existential deterministic result.

Theorem 13.32 For every $n$ there are parameters $f$ and $\tau = O(\log^2 n)$, a graph $G(n,f)$ and a set of schedules $\Pi$ such that the implementation of the modified Generic Gossip scheme with $G(n,f)$ as the communication graph and the schedules $\Pi$ completes gossip in time $O(\log^2 n)$ and with message complexity $O(n^{1.838})$, for any number of crashes.

Exercises

13.7-1 Design executions showing that there is no relation between Causal Order and Total Order and between Single-Source FIFO and Total Order broadcast services. For simplicity consider two processors and two messages sent.

13.7-2 Does a broadcast service satisfying the Single-Source FIFO and Causal Order requirements satisfy the Total Order property? Does a broadcast service satisfying the Single-Source FIFO and Total Order requirements satisfy the Causal Order property? If yes, provide a proof; if not, show a counterexample.

13.7-3 Show that if we use reliable Basic Broadcast instead of Basic Broadcast in the implementation of the Single-Source FIFO service, then we obtain reliable Single-Source FIFO broadcast.

13.7-4 Prove that the Ordered Broadcast algorithm implements the Causal Order service on top of the Single-Source FIFO one.

13.7-5 What is the total number of point-to-point messages sent in the algorithm Ordered-Broadcast in case of $k$ broadcasts?

13.7-6 Estimate the total number of point-to-point messages sent during the execution of Reliable-Causally-Ordered-Broadcast, if it performs $k$ broadcasts and there are $f$ processor crashes during the execution.

13.7-7 Show an execution of the algorithm Reliable-Causally-Ordered-Broadcast which violates the Total Order requirement.

13.7-8 Write the code of an implementation of a reliable Sub-Total Order multicast service.

13.7-9 Show that the described method of implementing multicast services on top of the corresponding broadcast services is correct.

13.7-10 Show that the random graph $G(n,f)$ - in which each node selects independently at random $\Theta\left(\frac{n}{n-f}\log n\right)$ edges from itself to other processors - satisfies property $\mathcal{R}(n,f)$ from Definition 13.27 and has degree $O\left(\frac{n}{n-f}\log n\right)$ with probability at least $1 - 1/n$.

13.7-11 The leader election problem is defined as follows: all non-faulty processors must elect one non-faulty processor in the same synchronous step. Show that leader election cannot be solved faster than the gossip problem in a synchronous message-passing system with processor crashes.