15.4. 15.4 Computational models

15.4.1. 15.4.1 PRAM

The most popular computational model is the Parallel Random Access Machine (PRAM) which is a natural generalisation of the Random Access Machine (RAM).

The PRAM model consists of synchronised processors , a shared memory with memory cells and memories of the processors. Figure 15.9. shows processors and the shared random access memory

There are variants of this model. They differ in whether multiple processors are allowed to access the same memory cell in a step, and in how the resulting conflicts are resolved. In particular the following variants are distinguished:

Figure 15.9.  Parallel random access machine.

Parallel random access machine.

Types on the base of the properties of read/write operations are

  • EREW (Exclusive-Read Exclusive-Write) PRAM,

  • ERCW (Exclusive-Read Concurrent-Write) PRAM,

  • CREW (Concurrent-Read Exclusive-Write) PRAM,

  • CRCW (Concurrent-Read Concurrent-Write) PRAM.

Figure 15.10(a) shows the case when at most one processor has access a memory cell (ER), and Figure 15.10(d) shows, when multiple processors have access the same cell (CW).

Figure 15.10.  Types of parallel random access machines.

Types of parallel random access machines.

Types of concurrent writing are common, priority, arbitrary, combined.

15.4.2. 15.4.2 BSP, LogP and QSM

Here we consider the models BSP, LogP and QSM.

Bulk-synchronous Parallel Model (BSP) describes a computer as a collection of nodes, each consisting of a processor and memory. BSP supposes the existence of a router and a barrier synchronisation facility. The router transfers messages between the nodes, the barrier synchronises all or a subset of nodes. According to BSP computation is partitioned into supersteps. In a superstep each processor independently performs computations on data in its own memory, and initiates communications with other processors. The communication is guaranteed to complete until the beginning of the next superstep.

is defined such that is the time that is takes to route an -relation under continuous traffic conditions. An -relation is a communication pattern in which each processor sends and receives up to messages.

The cost of a superstep is determined as , where is the maximum number of communications initiated by any processor. The cost of a program is the sum of the costs of the individual supersteps.

BSP contains a cost model that involves three parameters: the number of processors , the cost of a barrier synchronisation and a characteristics of the available bandwidth .

LogP model was motivated by inaccuracies of BSP and the restrictive requirement to follow the superstep structure.

While LogP improves on BSP with respect to reflectivity, QSM improves on it with respect to simplicity. In contrast to BSP, QSM is a shared-memory model. As in BSP, the computation is structured into supersteps, and each processor has its own local memory. In a superstep, a processor performs computations on values in the local memory, and initiates read/write operations to the shared memory. All shared-memory accesses complete until the beginning of the next superstep. QSM allows for concurrent reads and writes. Let the maximum number of accesses to any cell in a superstep be . Then QSM charges costs , with , and being defined in BSP.

15.4.3. 15.4.3 Mesh, hypercube and butterfly

Mesh also is a popular computational model. A -dimensional mesh is an sized grid having a processor in each grid point. The edges are the communication lines, working in two directions. Processors are labelled by -tuples, as .

Each processor is a RAM, having a local memory. The local memory of the processor is . Each processor can execute in one step such basic operations as adding, subtraction, multiplication, division, comparison, read and write from/into the local memory, etc. Processors work in synchronised way, according to a global clock.

The simplest mesh is the chain, belonging to the value . Figure 15.11 shows a chain consisting of 6 processors.

Figure 15.11.  A chain consisting of six processors.

A chain consisting of six processors.

The processors of a chain are . is connected with , is connected with , the remaining processors are connected with and .

If , then we get a rectangle. If now , then we get a square. Figure 15.12 shows a square of size .

Figure 15.12.  A square of size A square of size 4\times4 ..

A square of size 4\times4 .

A square contains several chains consisting of processors. The processors having identical first index, form a row of processors, and the processors having the same second index form a column of processors. Algorithms running on a square often consists of such operations, executed only by processors of some rows or columns.

If , then the corresponding mesh is a brick. In the special case the mesh is called cube. Figure 15.13 shows a cube of size .

Figure 15.13.  A 3-dimensional cube of size A 3-dimensional cube of size 2\times2\times2 ..

A 3-dimensional cube of size 2\times2\times2 .

The next model of computation is the d-dimensional hypercube . This model can be considered as the generalisation of the square and cube: the square represented on Figure 15.12 is a 2-dimensional, and the cube, represented on Figure 15.13 is a 3-dimensional hypercube. The processors of can be labelled by a binary number consisting of bits. Two processors of are connected iff the Hamming-distance of their labels equals to 1. Therefore each processors of has neighbours, and the of is . Figure 15.14 represents .

Figure 15.14.  A 4-dimensional hypercube A 4-dimensional hypercube \mathcal{H}_{4} ..

A 4-dimensional hypercube \mathcal{H}_{4} .

The butterfly model consists of processors and edges. The processors can be labelled by a pair , where is the columnindex and is the level of the given processor. Figure 15.15 shows a butterfly model containing 32 processors in 8 columns and in 4 levels.

Figure 15.15.  A butterfly model.

A butterfly model.

Finally Figure 15.16 shows a ring containing 6 processors.

Figure 15.16.  A ring consisting of 6 processors.

A ring consisting of 6 processors.