Among the fundamental problems in distributed systems where processors communicate by message passing are the tasks of spreading and gathering information. Many distributed algorithms for communication networks can be constructed using building blocks that implement various broadcast and multicast services. In this section we present some basic communication services in the message-passing model. Such services typically need to satisfy some quality of service requirements dealing with ordering of messages and reliability. We first focus on broadcast services, then we discuss more general multicast services.
In the broadcast problem, a selected processor , called a source or a sender, has the message , which must be delivered to all processors in the system (including the source). The interface of the broadcast service is specified as follows:
bc-send : an event of processor that sends a message to all processors.
bc-recv : an event of processor that receives a message sent by processor .
In above definitions qos denotes the quality of service provided by the system. We consider two kinds of quality service:
Ordering: how the order of received messages depends on the order of messages sent by the source?
Reliability: how the set of received messages depends on the failures in the system?
The basic model of a message-passing distributed system normally does not guarantee any ordering or reliability of messaging operations. In the basic model we only assume that each pair of processors is connected by a link, and message delivery is independent on each link — the order of received messages may not be related to the order of the sent messages, and messages may be lost in the case of crashes of senders or receivers.
We present some of the most useful requirements for ordering and reliability of broadcast services. The main question we address is how to implement a stronger service on top of the weaker service, starting with the basic system model.
Applying the definition of happens before to messages, we say that message happens before message if either and are sent by the same processor and is sent before , or the bc-recv event for happens before the bc-send event for .
We identify four common broadcast services with respect to the message ordering properties:
Basic Broadcast: no order of messages is guaranteed.
Single-Source FIFO (first-in-first-out): messages sent by one processor are received by each processor in the same order as sent; more precisely, for all processors and messages , if processor sends before it sends then processor does not receive message before message .
Causal Order: messages are received in the same order as they happen; more precisely, for all messages and every processor , if happens before then does not receive before .
Total Order: the same order of received messages is preserved in each processor; more precisely, for all processors and messages , if processor receives before it receives then processor does not receive message before message .
It is easy to see that Causal Order implies Single-Source FIFO requirements (since the relation “happens before” for messages includes the order of messages sent by one processor), and each of the given services trivially implies Basic Broadcast. There are no additional relations between these four services. For example, there are executions that satisfy Single-Source FIFO property, but not Causal Order. Consider two processors and . In the first event broadcasts message , next processor receives , and then broadcasts message . It follows that happens before . But if processor receives before , which may happen, then this execution violates Causal Order. Note that trivially Single-Source FIFO requirement is preserved, since each processor broadcasts only one message.
We denote by the Basic Broadcast service, by ssf the Single-Source FIFO, by the Causal Order and by the Total Order service.
In the model without failures we would like to guarantee the following properties of broadcast services:
Integrity: each message received in event bc-recv has been sent in some bc-send event.
No-Duplicates: each processor receives a message not more than once.
Liveness: each message sent is received by all processors.
In the model with failures we define the notion of reliable broadcast service, which satisfies Integrity, No-Duplicates and two kinds of Liveness properties:
Nonfaulty Liveness: each message sent by non-faulty processor must be received by every non-faulty processor.
Faulty Liveness: each message sent by a faulty processor is either received by all non-faulty processors or by none of them.
We denote by rbb the Reliable Basic Broadcast service, by rssf the Reliable Single-Source FIFO, by the Reliable Causal Order, and by rto the Reliable Total Order service.
We now describe implementations of algorithms for various broadcast services.
The bb service is implemented as follows. If event occurs then processor sends message via every link from to , where . If a message comes to processor then it enables event .
To provide reliability we do the following. We build the reliable broadcast on the top of basic broadcast service. When occurs, processor enables event . If event occurs and message-coordinate appears for the first time then processor first enables event (to inform other non-faulty processors about message in case when processor is faulty), and next enables event .
We prove that the above algorithm provides reliability for the basic broadcast service. First observe that Integrity and No-Duplicates properties follow directly from the fact that each processor enables only if message-coordinate is received for the first time. Nonfaulty liveness is preserved since links between non-faulty processors enables events correctly. Faulty Liveness is guaranteed by the fact that if there is a non-faulty processor which receives message from the faulty source , then before enabling processor sends message using event. Since is non-faulty, each non-faulty processor gets message in some event, and then accepts it (enabling event ) during the first such event.
Each processor has its own counter (timestamp), initialised to . If event occurs then processor sends message with its current timestamp attached, using . If an event occurs then processor enables event just after events have been enabled, where are the messages such that events have been enabled.
Note that if we use reliable Basic Broadcast instead of Basic Broadcast as the background service, the above implementation of Single-Source FIFO becomes Reliable Single-Source FIFO service. We leave the proof to the reader as an exercise.
We present an ordered broadcast algorithm which works in the asynchronous message-passing system providing single-source FIFO broadcast service. It uses the idea of timestamps, but in more advanced way than in the implementation of ssf. We denote by cto the service satisfying causal and total orders requirements.
Each processor maintains in a local array its own increasing counter (timestamp), and the estimated values of timestamps of other processors. Timestamps are used to mark messages before sending—if is going to broadcast a message, it increases its timestamp and uses it to tag this message (lines 11-13). During the execution processor estimates values of timestamps of other processors in the local vector —if processor receives a message from processor with a tag (timestamp of ), it puts into (lines 23–32). Processor sets its current timestamp to be the maximum of the estimated timestamps in the vector plus one (lines 24–26). After updating the timestamp processor sends an update message. Processor accepts a message with associated timestamp from processor if pair is the smallest among other received messages (line 42), and each processor has at least as large a timestamp as known by processor (line 43). The details are given in the code below.
Code for any processor , ≤01≥
INITIALISATION≤/01≥≤02≥ for every ≤/02≥ 11
ADDtriple to pending 23 24
IF42 is the pending triple with the smallest and for every 43
REMOVEtriple from pending
satisfies the causal order requirement. We leave the proof to the reader as an exercise (in the latter part we show how to achieve stronger reliable causal order service and provide the proof for that stronger case).
Proof. Integrity follows from the fact that each processor can enable event only if the triple is pending (lines 41–45), which may happen after receiving a message from processor (lines 21–22). No-Duplicates property is guaranteed by the fact that there is at most one pending triple containing message sent by processor (lines 13 and 21–22).
Liveness follows from the fact that each pending triple satisfies conditions in lines 42–43 in some moment of the execution. The proof of this fact is by induction on the events in the execution — suppose to the contrary that is the triple with smallest which does not satisfy conditions in lines 42–43 at any moment of the execution. It follows that there is a moment from which triple has smallest coordinates among pending triples in processor . Hence, starting from this moment, it must violate condition in line 43 for some . Note that , by updating rules in lines 23–25. It follows that processor never receives a message from with timestamp greater than , which by updating rules in lines 24–26 means that processor never receives a message from , which contradicts the liveness property of broadcast service.
To prove Total Order property it is sufficient to prove that for every processor and messages sent by processors with timestamps respectively, each of the triples , are accepted according to the lexicographic order of . There are two cases.
Case 1. Both triples are pending in processor at some moment of the execution. Then condition in line 42 guarantees acceptance in order of .
Case 2. Triple (without loss of generality) is accepted by processor before triple is pending. If then still the acceptance is according to the order of . Otherwise , and by condition in line 43 we get in particular that , and consequently . This can not happen because of the ssf requirement and the assumption that processor has not yet received message from via the broadcast service.
Now we address reliable versions of Causal Order and Total Order services. A Reliable Causal Order requirements can be implemented on the top of Reliable Basic Broadcast service in asynchronous message-passing system with processor crashes using the following algorithm. It uses the same data structures as previous
. The main difference between reliable
are as follows: instead of using integer timestamps processors use vector timestamps , and they do not estimate timestamps of other processors, only compare in lexicographic order their own (vector) timestamps with received ones. The intuition behind vector timestamp of processor is that it stores information how many messages have been sent by and how many have been accepted by from every , where .
In the course of the algorithm processor increases corresponding position in its vector timestamp before sending a new message (line 12), and increases th position of its vector timestamp after accepting new message from processor (line 38). After receiving a new message from processor together with its vector timestamp , processor adds triple to pending and accepts this triple if it is first not accepted message received from processor (condition in line 33) and the number of accepted messages (from each processor ) by processor was not bigger in the moment of sending than it is now in processor (condition in line 34). Detailed code of the algorithm follows.
Code for any processor , ≤01≥
INITIALISATION≤/01≥≤02≥ for every ≤/02≥≤03≥ pending list is empty≤/03≥ 11
ADDtriple to pending 31
IFis the pending triple, and 32 , and 33 for every 34
REMOVEtriple from pending 36
We argue that the algorithm
provides Reliable Causal Order broadcast service on the top of the system equipped with the Reliable Basic Broadcast service. Integrity and No-Duplicate properties are guaranteed by rbb broadcast service and facts that each message is added to pending at most once and non-received message is never added to pending. Nonfaulty and Faulty Liveness can be proved by one induction on the execution, using facts that non-faulty processors have received all messages sent, which guarantees that conditions in lines 33–34 are eventually satisfied. Causal Order requirement holds since if message happens before message then each processor accepts messages according to the lexicographic order of , and these vector-arrays are comparable in this case. Details are left to the reader.
Note that Reliable Total Order broadcast service can not be implemented in the general asynchronous setting with processor crashes, since it would solve consensus in this model — first accepted message would determine the agreement value (against the fact that consensus is not solvable in the general model).
Multicast services are similar to the broadcast services, except each multicast message is destined for a specified subset of all processors.In the multicast service we provide two types of events, where qos denotes a quality of service required:
: an event of processor which sends a message together with its id to all processors in a destination set .
: an event of processor which receives a message sent by processor .
Note that the event mc-recv is similar to bc-recv.
As in case of a broadcast service, we would like to provide useful ordering and reliable properties of the multicast services. We can adapt ordering requirements from the broadcast services. Basic Multicast does not require any ordering properties. Single-Source FIFO requires that if one processor multicasts messages (possibly to different destination sets), then the messages received in each processors (if any) must be received in the same order as sent by the source. Definition of Causal Order remains the same. Instead of Total Order, which is difficult to achieve since destination sets may be different, we define another ordering property:
Sub-Total Order: orders of received messages in all processors may be extended to the total order of messages; more precisely, for any messages and processors , if and receives both messages then they are received in the same order by and .
The reliability conditions for multicast are somewhat different from the conditions for reliable broadcast.
Integrity: each message received in event was sent in some mc-send event with destination set containing processor .
No Duplicates: each processor receives a message not more than once.
Nonfaulty Liveness: each message sent by non-faulty processor must be received in every non-faulty processor in the destination set.
Faulty Liveness: each message sent by a faulty processor is either received by all non-faulty processors in the destination set or by none of them.
One way of implementing ordered and reliable multicast services is to use the corresponding broadcast services (for Sub-Total Order the corresponding broadcast requirement is Total Order). More precisely, if event occurs processor enables event . When an event occurs, processor enables event if , otherwise it ignores this event. The proof that such method provides required multicast quality of service is left as an exercise.