## 15.6. PRAM algorithms

In this section we consider parallel algorithms solving simple problems such as prefix computation, ranking of the elements of an array, merging, selection and sorting.

In the analysis of the algorithms we try to give the exact order of the worst-case running time and to decide whether the presented algorithm is work-optimal, or at least work-efficient. When a parallel algorithm is compared with sequential algorithms, the comparison is always made against the best known sequential algorithm.

To describe these algorithms we use the following pseudocode conventions.

    IN PARALLEL FOR i ← 1 TO p
      DO ⟨the instructions to be executed by processor P_i⟩

For a PRAM ordered into a square grid of size a × a the instruction begins with

    IN PARALLEL FOR i ← 1 TO a, j ← 1 TO a
      DO ⟨the instructions to be executed by processor P_{i,j}⟩

For a d-dimensional mesh of size a_1 × a_2 × ⋯ × a_d the similar instruction begins with

    IN PARALLEL FOR i_1 ← 1 TO a_1, …, i_d ← 1 TO a_d
      DO ⟨the instructions to be executed by processor P_{i_1,…,i_d}⟩

It is allowed that in these commands an index represents a group of processors.

### 15.6.1. Prefix

Let ⊕ be a binary associative operation defined over a set Σ. We suppose that the operation needs only one set and that Σ is closed under ⊕.

A binary operation ⊕ is associative on a set Σ if for all a, b, c ∈ Σ

    (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)

holds.

Let the elements of the sequence X = x_1, x_2, …, x_p be elements of the set Σ. Then the input data are the elements of the sequence X, and the prefix problem is the computation of the elements y_k = x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_k (1 ≤ k ≤ p). These elements are called prefixes.

It is worth remarking that in other topics of parallel computations the starting subsequences x_1, x_2, …, x_k of the sequence X are called prefixes.

Example 15.1 Associative operations. If Σ is the set of integer numbers, ⊕ means addition and the sequence of the input data is 3, −5, 8, 2, 5, 4, then the sequence of the prefixes is 3, −2, 6, 8, 13, 17. If the alphabet and the input data are the same, but the operation is multiplication, then the prefixes are 3, −15, −120, −240, −1200, −4800. If the operation is minimum (it is also an associative operation), then the prefixes are 3, −5, −5, −5, −5, −5. In this case the last prefix is the minimum of the input data.

The prefix problem can be solved by sequential algorithms in Θ(p) time. Any sequential algorithm A requires Ω(p) time to solve the prefix problem. There are parallel algorithms for different models of computation resulting in a work-optimal solution of the prefix problem.
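As a reference point, the sequential solution is a single left-to-right sweep. The following Python sketch (the helper name `prefixes` is our own) takes the associative operation as a two-argument function and reproduces the three operations of Example 15.1:

```python
# Sequential prefix computation in a single left-to-right sweep.
# `prefixes` is a hypothetical helper name; the associative operation
# is passed in as a two-argument function.
from itertools import accumulate
import operator

def prefixes(xs, op):
    """Return the prefixes x1, x1*x2, ..., x1*...*xn of xs under op."""
    return list(accumulate(xs, op))

# The three operations of Example 15.1 on the data 3, -5, 8, 2, 5, 4:
data = [3, -5, 8, 2, 5, 4]
print(prefixes(data, operator.add))  # [3, -2, 6, 8, 13, 17]
print(prefixes(data, operator.mul))  # [3, -15, -120, -240, -1200, -4800]
print(prefixes(data, min))           # [3, -5, -5, -5, -5, -5]
```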

In this subsection at first the algorithm CREW-Prefix is introduced, which solves the prefix problem in Θ(lg p) time, using p CREW PRAM processors.

Next comes algorithm EREW-Prefix, having similar quantitative characteristics, but running even on the EREW PRAM model.

These algorithms solve the prefix problem faster than the sequential algorithms, but the order of the necessary work is larger.

Therefore algorithm Optimal-Prefix is of special interest: it uses only n/lg n CREW PRAM processors and makes only Θ(lg n) steps. The work of this algorithm is only Θ(n), therefore its efficiency is Θ(1), and so it is work-optimal. The speedup of this algorithm equals Θ(n/lg n).

For the sake of simplicity, in what follows we usually write x_1 x_2 ⋯ x_k instead of x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_k.

#### 15.6.1.1.  A CREW PRAM algorithm.

As the first parallel algorithm a recursive algorithm is presented, which runs on the CREW PRAM model of computation, uses p processors and Θ(lg p) time. In the design of parallel algorithms the divide-and-conquer principle is often used, as we will see in the case of the next algorithm too.

The input is the number of processors p and the array X[1..p]; the output is the array Y[1..p] containing the prefixes. We suppose that p is a power of 2. Since we always use the algorithms with the same number of processors, we omit the number of processors from the list of input parameters. In the mathematical descriptions we prefer to consider X and Y as sequences, while in the pseudocodes sometimes as arrays.

    CREW-Prefix(X)

     1  IF p = 1
     2    THEN Y[1] ← X[1]
     3         RETURN Y
     4  IF p > 1
     5    THEN IN PARALLEL FOR i ← 1 TO p/2
                DO compute recursively Y[1..p/2],
                   the prefixes belonging to X[1..p/2]
              IN PARALLEL FOR i ← p/2 + 1 TO p
                DO compute recursively Y[p/2+1..p],
                   the prefixes belonging to X[p/2+1..p]
     6  IN PARALLEL FOR i ← p/2 + 1 TO p
          DO read Y[p/2] from the global memory
             and compute Y[i] ← Y[p/2] ⊕ Y[i]
     7  RETURN Y

Example 15.2 Calculation of prefixes of 8 elements on 8 processors. Let p = 8. The input data of the prefix calculation are 12, 3, 6, 8, 11, 4, 5 and 7, and the associative operation is addition.

At the top level the run of the recursive algorithm consists of two rounds. In the first round (line 5) the first four processors get the input data 12, 3, 6, 8 and compute recursively the prefixes 12, 15, 21, 29 as output. At the same time the other four processors get the input data 11, 4, 5, 7 and compute the prefixes 11, 15, 20, 27.

According to the recursive structure, P_1, P_2 and P_3, P_4 work as follows. P_1 and P_2 get 12 and 3, resp., P_3 and P_4 get 6 and 8 as input. Recursivity means for P_1 and P_2 that P_1 gets 12 and P_2 gets 3; they compute at first y_1 = 12 and y_2 = 3, then P_2 updates y_2 = 12 + 3 = 15. In the same way P_3 and P_4 at first compute y_3 = 6 and y_4 = 8, then P_4 updates y_4 = 6 + 8 = 14.

While P_1 and P_2 have thus computed the final values y_1 = 12 and y_2 = 15, P_3 and P_4 obtain the final values only in the combining step, computing y_3 = 15 + 6 = 21 and y_4 = 15 + 14 = 29; meanwhile P_5, …, P_8 have computed the local provisional values 11, 15, 20 and 27 for the second half.

In the second round (line 6) the first four processors stay idle, while the second four processors compute the final values of y_5, y_6, y_7 and y_8, adding y_4 = 29 to the provisional values 11, 15, 20 and 27 and receiving 40, 44, 49 and 56.

In the remaining part of the section we use the notation n instead of p and give the number of used processors in verbal form. If p = n, then we usually prefer to use n.

Theorem 15.1 Algorithm CREW-Prefix uses Θ(lg p) time on p CREW PRAM processors to compute the prefixes of p elements.

Proof. Lines 4–6 require T(p/2) + Θ(1) steps, line 7 requires Θ(1) steps. So we get the following recurrence:

    T(p) = T(p/2) + Θ(1).

The solution of this recurrence is T(p) = Θ(lg p).

CREW-Prefix is not work-optimal, since its work is Θ(p lg p) and we know a sequential algorithm requiring only O(p) time, but it is work-efficient, since all sequential prefix algorithms require Ω(p) time.
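The divide-and-conquer step of CREW-Prefix can be simulated sequentially. In the following Python sketch the two recursive calls stand for the two processor halves working in parallel, and the list comprehension models the concurrent read of the middle prefix:

```python
# Sequential simulation of the divide-and-conquer idea of CREW-Prefix:
# solve both halves "in parallel", then combine every prefix of the right
# half with the last prefix of the left half (the concurrent-read step).
def crew_prefix(xs, op):
    n = len(xs)
    if n == 1:
        return list(xs)
    left = crew_prefix(xs[:n // 2], op)
    right = crew_prefix(xs[n // 2:], op)
    # every "processor" of the right half reads left[-1] concurrently
    return left + [op(left[-1], y) for y in right]

print(crew_prefix([12, 3, 6, 8, 11, 4, 5, 7], lambda a, b: a + b))
# [12, 15, 21, 29, 40, 44, 49, 56] -- the prefixes of Example 15.2
```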

#### 15.6.1.2.  An EREW PRAM algorithm.

In the following algorithm we use exclusive read instead of the concurrent one, therefore it can be implemented on the EREW PRAM model. Its input is the number of processors p and the sequence X, and its output is the sequence Y containing the prefixes.

    EREW-Prefix(X)

     1  Y[1] ← X[1]
     2  IN PARALLEL FOR i ← 2 TO p
     3    DO Y[i] ← X[i]
     4  j ← 1
     5  WHILE j < p
     6    DO IN PARALLEL FOR i ← j + 1 TO p
     7         DO Y[i] ← Y[i − j] ⊕ Y[i]
     8       j ← 2j
     9  RETURN Y

Theorem 15.2 Algorithm EREW-Prefix computes the prefixes of p elements on p EREW PRAM processors in Θ(lg p) time.

Proof. The commands in lines 1–3 and 9 are executed in constant time. The body of the while loop (lines 6–8) is executed as many times as the assignment in line 8, that is, lg p times.
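The doubling rounds of EREW-Prefix can be simulated sequentially as follows; building a whole new list per round models one synchronous parallel step:

```python
# Sequential simulation of the doubling rounds of EREW-Prefix: in round
# j = 1, 2, 4, ... every position i >= j combines the value at distance j
# with its own value.
def erew_prefix(xs, op):
    x = list(xs)
    n, j = len(x), 1
    while j < n:
        # one parallel round, realised as a fresh list
        x = [x[i] if i < j else op(x[i - j], x[i]) for i in range(n)]
        j *= 2
    return x

print(erew_prefix([12, 3, 6, 8], lambda a, b: a + b))   # [12, 15, 21, 29]
```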

#### 15.6.1.3.  A work-optimal algorithm.

Next we consider a recursive work-optimal algorithm which uses n/lg n CREW PRAM processors. The input is the length n of the input sequence and the sequence X = x_1, x_2, …, x_n; the output is the sequence Y = y_1, y_2, …, y_n containing the computed prefixes.

    Optimal-Prefix(X, n)

     1  IN PARALLEL FOR i ← 1 TO n/lg n
          DO compute the prefixes y[(i − 1) lg n + 1], …, y[i lg n]
             of the following lg n input data:
             x[(i − 1) lg n + 1], …, x[i lg n]
     2  using CREW-Prefix compute the prefixes z[1], z[2], …, z[n/lg n]
          of the following n/lg n elements:
          y[lg n], y[2 lg n], …, y[n]
     3  IN PARALLEL FOR i ← 2 TO n/lg n
          DO FOR j ← (i − 1) lg n + 1 TO i lg n
               DO y[j] ← z[i − 1] ⊕ y[j]
     4  RETURN Y

This algorithm runs in logarithmic time. The following two formulas help to show it:

    z[i] = x[1] ⊕ x[2] ⊕ ⋯ ⊕ x[i lg n]

and

    y[j] = z[i − 1] ⊕ (x[(i − 1) lg n + 1] ⊕ ⋯ ⊕ x[j])   for (i − 1) lg n < j ≤ i lg n,

where the summing goes using the corresponding associative operation.

Theorem 15.3 (parallel prefix computation in Θ(lg n) time) Algorithm Optimal-Prefix computes the prefixes of n elements on n/lg n CREW PRAM processors in Θ(lg n) time.

Proof. Line 1 runs in Θ(lg n) time, line 2 runs in Θ(lg(n/lg n)) = O(lg n) time, and line 3 runs in Θ(lg n) time.

This theorem implies that the work of Optimal-Prefix is Θ(n), therefore Optimal-Prefix is a work-optimal algorithm.
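The three steps of Optimal-Prefix can be simulated sequentially; in this sketch the block length is taken as ⌈lg n⌉:

```python
# Sequential simulation of the three steps of Optimal-Prefix: local
# prefixes per block of ~lg n elements, prefixes of the block totals,
# then shifting every later block by the total of all preceding blocks.
from itertools import accumulate
from math import ceil, log2

def optimal_prefix(xs, op):
    n = len(xs)
    k = max(1, ceil(log2(n)))                            # block length ~ lg n
    blocks = [xs[i:i + k] for i in range(0, n, k)]
    local = [list(accumulate(b, op)) for b in blocks]    # step 1
    totals = list(accumulate((b[-1] for b in local), op))  # step 2
    out = list(local[0])                                 # step 3
    for t, b in zip(totals, local[1:]):
        out += [op(t, y) for y in b]
    return out

print(optimal_prefix([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))
# [1, 3, 6, 10, 15, 21, 28, 36]
```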


### 15.6.2. Ranking

The input of the list ranking problem is a list represented by an array A[1..n]: each element contains the index of its right neighbour (and possibly further data). The task is to determine the rank of each element, where the rank is defined as the number of right neighbours of the given element.

Since the further data are not necessary for the solution, for simplicity we suppose that the elements of the array contain only the index of the right neighbour. This index is called a pointer. The pointer of the rightmost element equals zero.

Example 15.4 Input of list ranking. Let A be the array represented in the first row of Figure 15.18, in which each element contains the index of its right neighbour. The last element of the list has rank 0. The element whose right neighbour is the last one has rank 1, since only one element is to the right of it, and so on. The second row of Figure 15.18 shows the elements of A in decreasing order of their ranks.

The list ranking problem can be solved in linear time using a sequential algorithm. At first we determine the head of the list, which is the unique element A[h] such that no index i exists whose pointer equals h. The head of the list has rank n − 1, its right neighbour has rank n − 2, and finally the rank of the last element is zero.

In this subsection we present a deterministic list ranking algorithm which uses n EREW PRAM processors and Θ(lg n) time in the worst case. The pseudocode of algorithm Det-Ranking is as follows.

The input of the algorithm is the number n of the elements to be ranked and the array N[1..n], where N[i] contains the index of the right neighbour of the i-th element of A; the output is the array R[1..n] containing the computed ranks.

    Det-Ranking(N, n)

     1  IN PARALLEL FOR i ← 1 TO n
     2    DO IF N[i] = 0
     3         THEN R[i] ← 0
     4         ELSE R[i] ← 1
     5  FOR j ← 1 TO ⌈lg n⌉
     6    DO IN PARALLEL FOR i ← 1 TO n
     7         DO IF N[i] ≠ 0
     8              THEN R[i] ← R[i] + R[N[i]]
     9                   N[i] ← N[N[i]]
    10  RETURN R

The basic idea behind the algorithm Det-Ranking is pointer jumping. At the beginning each element contains the index of its right neighbour, and accordingly its provisional rank equals 1 (with the exception of the last element of the list, whose rank equals zero). This initial state is represented in the first row of Figure 15.19.

Then the algorithm modifies the elements so that each element points to the right neighbour of its right neighbour (if it exists, otherwise to the end of the list). This state is represented in the second row of Figure 15.19.

If we have n processors, then this can be done in constant time. After this each element (with the exception of the last one) points to the element whose distance was originally two. In the next step of the pointer jumping the elements point to the element whose distance was originally 4 (if there is no such element, then to the last one), as shown in the third row of Figure 15.19.

In the next step the pointer part of the elements points to the neighbour at distance 8 (or to the last element, if there is no element at distance 8), according to the last row of Figure 15.19.

In each step of the algorithm each element updates its information on the number of elements between itself and the element pointed to by its pointer. Let R[i], resp. N[i], be the rank, resp. neighbour, field of the i-th element. The initial value of R[i] is 1 for the majority of the elements, but it is 0 for the rightmost element (the last element in the first line of Figure 15.19). During the pointer jumping R[i] gets the new value R[i] + R[N[i]], if N[i] ≠ 0: the previous rank is increased by the rank of the right neighbour. After this N[i] is modified to N[N[i]], so that it points to the right neighbour of the right neighbour.

Theorem 15.4 Algorithm Det-Ranking computes the ranks of an array consisting of n elements on n EREW PRAM processors in Θ(lg n) time.

Since the work of Det-Ranking is Θ(n lg n), this algorithm is not work-optimal, but it is work-efficient.

The list ranking problem corresponds to a prefix problem on a list, in which the value of each element is 1, except the last element of the list, whose value is 0. One can easily modify Det-Ranking to get a prefix algorithm.
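The pointer-jumping rounds of Det-Ranking can be simulated sequentially; in this sketch each round builds fresh arrays, modelling one synchronous parallel step:

```python
# Sequential simulation of Det-Ranking's pointer jumping. nxt[i] is the
# index of the right neighbour of element i (1-based, 0 marks the list
# end); slot 0 of the arrays is unused.
def det_ranking(nxt):
    n = len(nxt) - 1
    rank = [0] * (n + 1)
    for i in range(1, n + 1):
        rank[i] = 0 if nxt[i] == 0 else 1
    nxt = list(nxt)
    for _ in range(max(1, n).bit_length()):       # ~ceil(lg n) rounds
        new_rank, new_nxt = list(rank), list(nxt)
        for i in range(1, n + 1):                 # all i "in parallel"
            if nxt[i] != 0:
                new_rank[i] = rank[i] + rank[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        rank, nxt = new_rank, new_nxt
    return rank

# The list 1 -> 3 -> 2: the ranks are 2, 0 and 1.
print(det_ranking([0, 3, 0, 2]))   # [0, 2, 0, 1]
```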

### 15.6.3. Merge

The input of the merging problem is two sorted sequences X and Y, and the output is one sorted sequence Z containing all elements of the input.

If the length of the input sequences is n, then the merging problem can be solved in O(n) time using one sequential processor. Since we have to investigate all elements and write them into the corresponding element of Z, the running time of any algorithm is Ω(n). We get this lower bound even in the case when we count only the number of necessary comparisons.

#### 15.6.3.1.  Merge in logarithmic time.

Let X = x_1, x_2, …, x_n and Y = y_1, y_2, …, y_n be the input sequences. For the sake of simplicity let n be a power of two and let the elements be different.

To merge two sequences of length n it is enough to know the ranks of the keys, since then we can write the keys, using 2n processors, into the corresponding memory locations of Z with one parallel write operation. The running time of the following algorithm is logarithmic, therefore it is called Logarithmic-Merge.

Theorem 15.5 Algorithm Logarithmic-Merge merges two sequences of length n on 2n CREW PRAM processors in Θ(lg n) time.

Proof. Let the rank of an element z be r_1 in X and r_2 in Y. If z = x_i, then let r_1 = i. If we assign a single processor to the element z, then it can determine, using binary search, the number r_2 of elements in Y which are smaller than z. If r_1 and r_2 are known, then the processor computes the rank of z in the union of X and Y as r_1 + r_2. If z belongs to Y, the method is the same.

Summarising the time requirements we get that, using one CREW PRAM processor per element, that is, 2n processors in total, the running time is Θ(lg n).

This algorithm is not work-optimal, only work-efficient.
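The rank idea can be sketched in a few lines of Python: one "processor" per key computes the key's rank in the other sequence by binary search (`bisect_left`), and the key is written directly to its final position. Distinct keys are assumed, as in the text:

```python
# Rank-based merging: for each key, its rank in its own sequence is its
# index, and its rank in the other sequence is found by binary search.
from bisect import bisect_left

def rank_merge(a, b):
    c = [None] * (len(a) + len(b))
    for i, x in enumerate(a):          # one "processor" per element of a
        c[i + bisect_left(b, x)] = x
    for j, y in enumerate(b):          # one "processor" per element of b
        c[j + bisect_left(a, y)] = y
    return c

print(rank_merge([1, 5, 8, 11], [3, 9, 12, 18]))
# [1, 3, 5, 8, 9, 11, 12, 18]
```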

#### 15.6.3.2.  Odd-even merging algorithm.

The following recursive algorithm Odd-Even-Merge follows the classical divide-and-conquer principle.

Let X = x_1, x_2, …, x_n and Y = y_1, y_2, …, y_n be the two input sequences. We suppose that n is a power of 2 and the elements of the arrays are different. The output of the algorithm is the sequence Z = z_1, z_2, …, z_{2n}, containing the merged elements. This algorithm requires 2n EREW PRAM processors.

    Odd-Even-Merge(X, Y)

     1  IF n = 1
     2    THEN get Z = z_1, z_2 by merging x_1 and y_1 with one comparison
     3         RETURN Z
     4  IF n > 1
     5    THEN IN PARALLEL FOR i ← 1 TO n/2
     6           DO merge recursively X^odd = x_1, x_3, …, x_{n−1}
                    and Y^odd = y_1, y_3, …, y_{n−1}
     7              to get U = u_1, u_2, …, u_n
     8         IN PARALLEL FOR i ← 1 TO n/2
     9           DO merge recursively X^even = x_2, x_4, …, x_n
                    and Y^even = y_2, y_4, …, y_n
    10              to get V = v_1, v_2, …, v_n
    11  IN PARALLEL FOR i ← 1 TO n
    12    DO z[2i − 1] ← u_i
    13       z[2i] ← v_i
    14  IN PARALLEL FOR i ← 1 TO n − 1
    15    DO IF z[2i] > z[2i + 1]
    16         THEN w ← z[2i]
    17              z[2i] ← z[2i + 1], z[2i + 1] ← w
    18  RETURN Z

Example 15.5 Merge of twice eight numbers. Let X = 1, 5, 8, 11, 13, 16, 21, 26 and Y = 3, 9, 12, 18, 23, 27, 31, 65. Figure 15.20 shows the merge of these 16 numbers.

At first the elements of X with odd indices form the sequence X^odd and the elements with even indices form the sequence X^even; in the same way we get the sequences Y^odd and Y^even. Then comes the recursive merge of the two odd sequences, resulting in U, and the recursive merge of the even sequences, resulting in V.

After this Odd-Even-Merge shuffles U and V, resulting in the sequence Z: the elements of Z with odd indices come from U and the elements with even indices come from V.

Finally we compare each element of Z having an even index with the next element (that is, z_2 with z_3, z_4 with z_5, etc.), and if they are not in the right order, then they are exchanged.

Theorem 15.6 (merging in Θ(lg n) time) Algorithm Odd-Even-Merge merges two sequences of length n in Θ(lg n) time using 2n EREW PRAM processors.

Proof. Let W(n) denote the running time of the algorithm. The recursive merge in lines 5–10 requires W(n/2) time, the shuffle and the compare-exchange steps require Θ(1) time. Therefore we get the recursive equation

    W(n) = W(n/2) + Θ(1),

having the solution W(n) = Θ(lg n).

We prove the correctness of this algorithm using the zero-one principle.

A comparison-based sorting algorithm is oblivious if the sequence of comparisons is fixed (the elements of a comparison do not depend on the results of the earlier comparisons). This definition means that the sequence of the pairs of elements to be compared is given in advance.

Theorem 15.7 (zero-one principle) If an oblivious comparison-based sorting algorithm correctly sorts an arbitrary 0-1 sequence of length n, then it also correctly sorts any sequence of length n consisting of arbitrary keys.

Proof. Let A be an oblivious comparison-based sorting algorithm and let x_1, x_2, …, x_n be a sequence of elements sorted incorrectly by A. Suppose that A sorts the elements in increasing order. Then the incorrectly sorted output contains an element y at the k-th position in spite of the fact that the input contains at least k keys smaller than y.

Let y be the first (having the smallest index) such element of the output. Substitute in the input sequence the elements smaller than y by 0's and the remaining elements by 1's. This modified sequence is a 0-1 sequence, therefore A sorts it correctly. This observation implies that in the sorted 0-1 sequence at least k 0's precede the 1 written in the place of y.

Now colour the elements of the input sequence smaller than y red, and the remaining elements blue (in the original and in the transformed sequence too). We can show by induction that the coloured sequences are identical at the start and remain identical after each comparison. According to the colours we have three types of comparisons: blue-blue, red-red and red-blue. If the compared elements have the same colour, then in both sequences (whether they are exchanged or not) the colours remain unchanged. If we compare elements of different colours, then in both sequences the red element ends up at the position with the smaller index. So finally we get a contradiction, proving the assertion of the theorem.

Example 15.6 A non-comparison-based sorting algorithm. Let x_1, x_2, …, x_n be a bit sequence. We can sort this sequence simply by counting the zeros: if we count z zeros, then we write z zeros followed by n − z ones. Of course, the correctness of this algorithm is not guaranteed for sequences of arbitrary keys. Since this algorithm is not comparison-based, this fact does not contradict the zero-one principle.

But merging is a special case of sorting, and Odd-Even-Merge is an oblivious sorting algorithm, so the zero-one principle can be applied to it.

Theorem 15.8 Algorithm Odd-Even-Merge correctly merges sequences consisting of arbitrary numbers.

Proof. Let X and Y be sorted 0-1 sequences of length n, let q_1 be the number of zeros in X and q_2 the number of zeros in Y. Then the number of zeros in X^odd equals ⌈q_1/2⌉, while the number of zeros in X^even is ⌊q_1/2⌋. Therefore the number of zeros in U equals ⌈q_1/2⌉ + ⌈q_2/2⌉, and the number of zeros in V equals ⌊q_1/2⌋ + ⌊q_2/2⌋.

The difference of these two numbers is at most 2. The difference is exactly 2 if and only if q_1 and q_2 are both odd numbers; otherwise the difference is at most 1. Suppose that the difference is 2 (the proof in the other cases is similar). In this case U contains two more zeros than V. When the algorithm shuffles U and V, the sequence Z begins with zeros and ends with ones, and between them there is a short “dirty” part 1, 0. After the comparison and exchange in the last step of the algorithm the whole sequence becomes sorted.
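The recursive structure of Odd-Even-Merge translates directly into Python; here the slices pick the odd- and even-indexed subsequences (0-based), and the final loop is the compare-exchange of each even position of Z with its successor:

```python
# Batcher's odd-even merge of two sorted sequences of equal
# power-of-two length, with distinct keys as assumed in the text.
def odd_even_merge(a, b):
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    u = odd_even_merge(a[0::2], b[0::2])   # merge the "odd" subsequences
    v = odd_even_merge(a[1::2], b[1::2])   # merge the "even" subsequences
    z = [w for pair in zip(u, v) for w in pair]   # shuffle u and v
    for i in range(1, len(z) - 1, 2):             # compare-exchange step
        if z[i] > z[i + 1]:
            z[i], z[i + 1] = z[i + 1], z[i]
    return z

x = [1, 5, 8, 11, 13, 16, 21, 26]
y = [3, 9, 12, 18, 23, 27, 31, 65]
print(odd_even_merge(x, y) == sorted(x + y))   # True
```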

#### 15.6.3.3.  A work-optimal merge algorithm.

Algorithm Work-Optimal-Merge uses only 2n/lg n processors, but still solves the merging in logarithmic time. This algorithm divides the original problem into parts such that each part contains approximately lg n elements.

Let X = x_1, x_2, …, x_n and Y = y_1, y_2, …, y_n be the input sequences. Divide X into parts such that each part contains at most ⌈lg n⌉ elements. Let the parts be denoted by A_1, A_2, …, A_k, where k = ⌈n/lg n⌉. Let the largest element of A_i be a_i.

Assign a processor to each element a_i. These processors determine (by binary search) the correct place (according to the sorting) of a_i in Y. These places divide Y into k parts (some of these parts can be empty). Denote these parts by B_1, B_2, …, B_k. We call B_i the subset corresponding to A_i in Y (see Figure 15.21).

The algorithm gets the merged sequence by merging at first A_1 with B_1, A_2 with B_2 and so on, and then joining these merged sequences.

Theorem 15.9 Algorithm Work-Optimal-Merge merges two sorted sequences of length n in O(lg n) time on 2n/lg n CREW PRAM processors.

Proof. We use the previous algorithm.

The length of the parts A_i is at most ⌈lg n⌉, but the length of the parts B_i can be much larger. Therefore we repeat the partition. Let (A_i, B_i) be an arbitrary pair. If B_i contains O(lg n) elements, then A_i and B_i can be merged using one processor in O(lg n) time. But if B_i is longer, then divide B_i into ⌈|B_i|/lg n⌉ parts, so that each part contains at most ⌈lg n⌉ keys. Assign a processor to each part. This assigned processor finds (by binary search) the subset corresponding to its part in A_i: O(lg lg n) time is sufficient to do this. So the merge of A_i and B_i can be reduced to subproblems, where each subproblem is the merge of two sequences of length O(lg n).

The number of the used processors is O(n/lg n), since the total number of the parts of X and of the parts of the sequences B_i is at most 2⌈n/lg n⌉, which is not larger than the number of available processors.

This theorem implies that Work-Optimal-Merge is work-optimal.

Corollary 15.10 Algorithm Work-Optimal-Merge is work-optimal.

### 15.6.4. Selection

In the selection problem n elements and a positive integer i (1 ≤ i ≤ n) are given, and the i-th smallest element is to be selected. Since selection requires the investigation of all the elements, and our operations can handle at most two elements, the running time of any algorithm is Ω(n).

Since a sequential algorithm A is known which requires only O(n) time, A is asymptotically optimal.

The search problem is similar: there the algorithm has to decide whether a given element appears in the given sequence, and if yes, then where. Here a negative answer is also possible, and the properties of a single element decide whether it meets the requirements or not.

We investigate three special cases and present algorithms solving them.

#### 15.6.4.1.  Selection in constant time using n² processors.

Let i = n, that is, we wish to select the largest key. Algorithm Quadratic-Select solves this task in Θ(1) time using n² CRCW processors.

The input (n different keys) is the sequence X = x_1, x_2, …, x_n, and the selected largest element is returned as y.

    Quadratic-Select(X)

     1  IF n = 1
     2    THEN y ← x_1
     3         RETURN y
     4  IN PARALLEL FOR i ← 1 TO n, j ← 1 TO n
          DO IF x_i < x_j
     5         THEN x[i, j] ← FALSE
     6         ELSE x[i, j] ← TRUE
     7  IN PARALLEL FOR i ← 1 TO n
     8    DO y_i ← TRUE
     9  IN PARALLEL FOR i ← 1 TO n, j ← 1 TO n
    10    DO IF x[i, j] = FALSE
    11         THEN y_i ← FALSE
    12  IN PARALLEL FOR i ← 1 TO n
    13    DO IF y_i = TRUE
    14         THEN y ← x_i
    15  RETURN y

In the first round (lines 4–6) the keys are compared in a parallel manner, using all the n² processors, so that processor P_{i,j} computes the logical value x[i, j], which is TRUE exactly when x_i ≥ x_j. We supposed that the keys are different. If the elements are not necessarily different, then we can use the pair (x_i, i) instead of x_i (this solution requires an additional number of length lg n bits). Since there is a unique key for which all comparisons x_i < x_j result in FALSE, this unique key can be found in lines 7–11, where the concurrent common writes realise a logical OR operation.

Theorem 15.11 (selection in Θ(1) time) Algorithm Quadratic-Select determines the largest key of n different keys in Θ(1) time using n² common CRCW PRAM processors.

Proof. The first and third rounds require unit time, and the second round also requires only constant time, so the total running time is Θ(1).

The speedup of this algorithm is Θ(n). The work of the algorithm is Θ(n²), so the efficiency is Θ(1/n). It follows that this algorithm is not work-optimal, and it is not even work-efficient.
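The comparison table of Quadratic-Select is easy to simulate; in the sketch below the nested loops stand for the n² processors, and the repeated assignments of FALSE model the common concurrent writes:

```python
# Simulation of Quadratic-Select: all n*n comparisons are evaluated,
# and the unique key that is never smaller than another key wins.
def quadratic_select(keys):
    n = len(keys)
    winner = [True] * n                 # y_i <- TRUE for all i
    for i in range(n):                  # the n*n "processors"
        for j in range(n):
            if keys[i] < keys[j]:
                winner[i] = False       # common concurrent write of FALSE
    return keys[winner.index(True)]

print(quadratic_select([3, 7, 1, 5]))   # 7
```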

#### 15.6.4.2.  Selection in O(lg lg n) time using n processors.

Now we show that the maximal element among n keys can be found even using only n common CRCW PRAM processors and O(lg lg n) time. The used technique is divide-and-conquer. For the sake of simplicity let n be a square number.

The input and the output are the same as for the previous algorithm.

    Quick-Selection(X, n)

     1  IF n = 1
     2    THEN y ← x_1
     3         RETURN y
     4  IF n > 1
     5    THEN divide the input into √n groups G_1, G_2, …, G_{√n} and
              divide the processors into √n groups P_1, P_2, …, P_{√n}
     6  IN PARALLEL FOR i ← 1 TO √n
     7    DO the group P_i recursively determines the maximal element y_i
          of the group G_i
     8  y ← Quadratic-Select(y_1, y_2, …, y_{√n})
     9  RETURN y

The algorithm divides the input into √n groups so that each group contains √n elements, and divides the processors into √n groups so that group P_i contains √n processors. Then the group P_i of processors computes recursively the maximum y_i of the group G_i. Finally the previous algorithm Quadratic-Select gets as input the sequence y_1, y_2, …, y_{√n} and finds the maximum y of the input sequence X.

Theorem 15.12 (selection in O(lg lg n) time) Algorithm Quick-Selection determines the largest of n different elements in O(lg lg n) time using n common CRCW PRAM processors.

Proof. Let the running time of the algorithm be denoted by T(n). The recursive calls in lines 5–7 require T(√n) time, and the call of Quadratic-Select in line 8 requires Θ(1) time, since the √n group maxima can be handled by the (√n)² = n processors. Therefore T(n) satisfies the recursive equation

    T(n) = T(√n) + Θ(1),

having the solution T(n) = O(lg lg n).

The total work of algorithm Quick-Selection is Θ(n lg lg n), so its efficiency is Θ(1/lg lg n); therefore Quick-Selection is not work-optimal, it is only work-efficient.
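The doubly logarithmic recursion can be simulated as follows; the recursive calls stand for the √n processor groups, and the final `max` stands in for the Quadratic-Select call on the √n candidates:

```python
# Simulation of Quick-Selection: split the n keys into groups of about
# sqrt(n), find each group's maximum recursively, then take the maximum
# of the candidates (done by Quadratic-Select in the parallel algorithm).
import math

def quick_selection(keys):
    n = len(keys)
    if n <= 2:
        return max(keys)
    g = math.isqrt(n)
    if g * g < n:
        g += 1                         # group size ceil(sqrt(n))
    maxima = [quick_selection(keys[i:i + g]) for i in range(0, n, g)]
    return max(maxima)

print(quick_selection([4, 9, 2, 7, 5, 1, 8, 3, 6]))   # 9
```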

#### 15.6.4.3.  Selection from integer numbers.

If the problem is to find the maximum of n keys, each consisting of one bit, then the problem can be solved using a logical OR operation, and so requires only constant time using n processors. Now we try to extend this observation. Let c be a given positive integer constant, and suppose the keys are integer numbers belonging to the interval [0, n^c). Then the keys can be represented using at most c lg n bits. For the sake of simplicity we suppose that all the keys are given as binary numbers of length c lg n bits.

The following algorithm Integer-Selection requires only constant time and n CRCW PRAM processors to find the maximum.

The basic idea is to partition the bits of the numbers into parts of length (lg n)/2. The j-th part contains the bits (j − 1)(lg n)/2 + 1, …, j(lg n)/2; the number of the parts is 2c. Figure 15.22 shows the partition.

The input of Integer-Selection is the number of processors n and the sequence X containing n different integer numbers; the output is the maximal number y.

    Integer-Selection(n, X)

     1  FOR j ← 1 TO 2c
     2    DO compute the maximum m of the remaining numbers on the base of
             their j-th part
     3       delete the numbers whose j-th part is smaller than m
     4  y ← one of the remaining numbers
     5  RETURN y

The algorithm starts with searching the maximum on the base of the first (most significant) part of the numbers. Then it deletes the numbers whose first part is smaller than the maximum. Then this is repeated for the second, …, last part of the numbers. Any of the non-deleted numbers is maximal.

Theorem 15.13 (selection from integer numbers) If the keys are integer numbers drawn from the interval [0, n^c), then algorithm Integer-Selection determines the largest key among n keys for any positive constant c in Θ(1) time using n common CRCW PRAM processors.

Proof. Suppose that we start with the selection of the numbers whose first part is maximal. Let this maximum in the first part be denoted by m. It is sure that the numbers whose first part is smaller than m are not maximal, therefore they can be deleted. If we execute this basic operation for all the parts (that is, a constant number of times), then exactly those numbers are deleted which are not maximal, and all the maximal elements remain.

Since a part contains at most (lg n)/2 bits, its value is smaller than √n. So in one round algorithm Integer-Selection determines the maximum of n integer numbers taken from the interval [0, √n). To do this, the algorithm assigns a processor to each number and uses √n common memory locations M_1, M_2, …, M_{√n}, containing initially 0. In one step processor P_i writes a 1 into the memory cell addressed by the current part of its number. After this the maximum of the at most √n occupied cells can be determined by the n = (√n)² processors using Theorem 15.11 in constant time.
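The part-by-part filtering can be simulated directly; in the sketch below the part length and the number of parts are explicit parameters (our own names `part_bits` and `num_parts`), and the scan goes from the most significant part downward:

```python
# Simulation of Integer-Selection: scan the fixed-width bit parts of the
# keys from the most significant part downward; in each round keep only
# the numbers whose current part is maximal.
def integer_selection(keys, part_bits, num_parts):
    alive = list(keys)
    mask = (1 << part_bits) - 1
    for t in range(num_parts - 1, -1, -1):   # most significant part first
        shift = t * part_bits
        m = max((k >> shift) & mask for k in alive)
        alive = [k for k in alive if (k >> shift) & mask == m]
    return alive[0]                          # any remaining number is maximal

print(integer_selection([0b1010, 0b1001, 0b0111], part_bits=2, num_parts=2))  # 10
```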

#### 15.6.4.4.  General selection.

Let the sequence X = x_1, x_2, …, x_n contain n different numbers and let the problem be the selection of the i-th smallest element of X. Suppose we have p = n²/lg n CREW processors.

    General-Selection(X, i)

     1  divide the p processors into n groups P_1, P_2, …, P_n so that group
        P_j contains the processors P_{j,1}, …, P_{j,n/lg n}, and divide
        the n elements into n/lg n groups X_1, X_2, …, X_{n/lg n} so that
        group X_k contains the elements x_{(k−1) lg n + 1}, …, x_{k lg n}
     2  IN PARALLEL FOR j ← 1 TO n, k ← 1 TO n/lg n
     3    DO P_{j,k} determines m_{j,k} (how many elements of X_k are
          smaller than x_j)
     4  IN PARALLEL FOR j ← 1 TO n
     5    DO using Optimal-Prefix determine
          m_j = m_{j,1} + m_{j,2} + ⋯ + m_{j,n/lg n}
          (how many elements of X are smaller than x_j)
     6  IN PARALLEL FOR j ← 1 TO n
     7    DO IF m_j = i − 1
     8       THEN RETURN x_j

Theorem 15.14 (general selection) Algorithm General-Selection determines the i-th smallest of n different numbers in Θ(lg n) time using n²/lg n CREW PRAM processors.

Proof. In lines 2–3 each processor works as a sequential processor on lg n elements, therefore these lines require O(lg n) time. Lines 4–5 require O(lg n) time according to Theorem 15.3. Lines 6–8 can be executed in constant time, so the total running time is Θ(lg n).

The work of General-Selection is Θ(n²), therefore this algorithm is not work-efficient.

### 15.6.5. Sorting

Given a sequence X = x_1, x_2, …, x_n, the sorting problem is to rearrange the elements of X, e.g., in increasing order.

It is well known that any sequential comparison-based sorting algorithm needs Ω(n lg n) comparisons, and there are comparison-based sorting algorithms with O(n lg n) running time.

There are also algorithms, using special operations or sorting numbers with special features, which solve the sorting problem in linear time. If we have to investigate all elements of X and the permitted operations can handle at most 2 elements, then we get the lower bound Ω(n). So it is true that among the comparison-based and also among the non-comparison-based sorting algorithms there are asymptotically optimal sequential algorithms. In this subsection we consider three different parallel sorting algorithms.

#### 15.6.5.1.  Sorting in logarithmic time using n² processors.

Using the ideas of the algorithms Quadratic-Select and Optimal-Prefix we can sort n elements using n² processors in Θ(lg n) time.

    Quadratic-Sort(X)

     1  IF n = 1
     2    THEN y_1 ← x_1
     3         RETURN Y
     4  IN PARALLEL FOR i ← 1 TO n, j ← 1 TO n
          DO IF x_i > x_j
     5         THEN x[i, j] ← 1
     6         ELSE x[i, j] ← 0
     7  divide the processors into n groups P_1, P_2, …, P_n so that group
        P_i contains the processors P_{i,1}, …, P_{i,n}
     8  IN PARALLEL FOR i ← 1 TO n
     9    DO the group P_i computes r_i = x[i, 1] + x[i, 2] + ⋯ + x[i, n]
    10  IN PARALLEL FOR i ← 1 TO n
    11    DO y_{r_i + 1} ← x_i
    12  RETURN Y

In lines 4–6 the algorithm compares all pairs of the elements (as Quadratic-Select does), then in lines 7–9 (in a similar way as Optimal-Prefix works) it counts how many elements of X are smaller than the investigated x_i, and finally in lines 10–12 one processor of each group writes the final result into the corresponding memory cell.

Theorem 15.15 (sorting in Θ(lg n) time) Algorithm Quadratic-Sort sorts n elements using n² CRCW PRAM processors in Θ(lg n) time.

Proof. Lines 8–9 require Θ(lg n) time, and the remaining lines require only constant time.

Since the work of Quadratic-Sort is Θ(n² lg n), this algorithm is not work-efficient.
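The counting idea behind Quadratic-Sort (enumeration sort) looks as follows in Python; the inner generator stands for one processor group summing its comparison row, and distinct keys are assumed as in the text:

```python
# Enumeration sort: for each key count how many keys are smaller (its
# rank), then write the key directly to the position given by the rank.
def quadratic_sort(keys):
    n = len(keys)
    out = [None] * n
    for i in range(n):                       # one processor group per key
        rank = sum(1 for j in range(n) if keys[j] < keys[i])
        out[rank] = keys[i]
    return out

print(quadratic_sort([5, 2, 9, 1]))   # [1, 2, 5, 9]
```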

#### 15.6.5.2.  Odd-even algorithm with Θ(lg² n) running time.

The next algorithm uses the Odd-Even-Merge algorithm and the classical divide-and-conquer principle. The input is the sequence X = x_1, x_2, …, x_n containing the numbers to be sorted, and the output is the sequence Y = y_1, y_2, …, y_n containing the sorted numbers.

    Odd-Even-Sort(X)

     1  IF n = 1
     2    THEN RETURN X
     3  IF n > 1
     4    THEN let X′ = x_1, …, x_{n/2} and X″ = x_{n/2+1}, …, x_n
     5  IN PARALLEL FOR i ← 1 TO n/2
     6    DO sort recursively X′ to get Y′
     7  IN PARALLEL FOR i ← n/2 + 1 TO n
     8    DO sort recursively X″ to get Y″
     9  IN PARALLEL FOR i ← 1 TO n
    10    DO merge Y′ and Y″ using Odd-Even-Merge(Y′, Y″)
    11  RETURN Y

The running time of this EREW PRAM algorithm is Θ(lg² n).

Theorem 15.16 (sorting in Θ(lg² n) time) Algorithm Odd-Even-Sort sorts n elements in Θ(lg² n) time using n EREW PRAM processors.

Proof. Let T(n) be the running time of the algorithm. Lines 3–4 require Θ(1) time, lines 5–8 require T(n/2) time, lines 9–10 require Θ(lg n) time, and line 11 requires Θ(1) time. Therefore T(n) satisfies the recurrence

    T(n) = T(n/2) + Θ(lg n),

having the solution T(n) = Θ(lg² n).

Example 15.7 Sorting on 16 processors. Sort using 16 processors the following numbers: 62, 19, 8, 5, 1, 13, 11, 16, 23, 31, 9, 3, 18, 12, 27, 34. At first the sequence is cut into two halves: the first 8 processors get the subsequence 62, 19, 8, 5, 1, 13, 11, 16, while the other 8 processors get 23, 31, 9, 3, 18, 12, 27, 34. The output of the first 8 processors is 1, 5, 8, 11, 13, 16, 19, 62, while the output of the second 8 processors is 3, 9, 12, 18, 23, 27, 31, 34. The merged final result is 1, 3, 5, 8, 9, 11, 12, 13, 16, 18, 19, 23, 27, 31, 34, 62.

The work of the algorithm is Θ(n lg² n), its efficiency is Θ(1/lg n), and its speedup is Θ(n/lg n). The algorithm is not work-optimal, but it is work-efficient.
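Odd-Even-Sort is the recursive halving on top of Batcher's merge; the following sketch repeats the merge routine so that it is self-contained, and reproduces Example 15.7:

```python
# Batcher's odd-even merge, as used by the sorting algorithm below.
def odd_even_merge(a, b):
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    u = odd_even_merge(a[0::2], b[0::2])
    v = odd_even_merge(a[1::2], b[1::2])
    z = [w for pair in zip(u, v) for w in pair]
    for i in range(1, len(z) - 1, 2):
        if z[i] > z[i + 1]:
            z[i], z[i + 1] = z[i + 1], z[i]
    return z

# Odd-Even-Sort: sort both halves recursively ("in parallel"), then
# combine them with the odd-even merge. Length must be a power of two.
def odd_even_sort(xs):
    if len(xs) == 1:
        return list(xs)
    h = len(xs) // 2
    return odd_even_merge(odd_even_sort(xs[:h]), odd_even_sort(xs[h:]))

data = [62, 19, 8, 5, 1, 13, 11, 16, 23, 31, 9, 3, 18, 12, 27, 34]
print(odd_even_sort(data) == sorted(data))   # True
```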

#### 15.6.5.3.  Algorithm of Preparata with Θ(lg n) running time.

If we have more processors, then the running time can be decreased. The following recursive algorithm due to Preparata uses n lg n CREW PRAM processors and Θ(lg n) time. The input is the sequence X = x_1, x_2, …, x_n, and the output is the sequence Y = y_1, y_2, …, y_n containing the sorted elements.

    Preparata(X)

     1  IF n < 4
     2    THEN sort X using n processors and Odd-Even-Sort
     3         RETURN Y
     4  divide the n elements into lg n parts X_1, X_2, …, X_{lg n} so that
        each part contains n/lg n elements, and divide the processors into
        lg n groups so that each group contains n processors
     5  IN PARALLEL FOR i ← 1 TO lg n
     6    DO sort the part X_i recursively to get a sorted sequence Y_i
     7  divide the processors into (lg n)² groups Q_{i,j} (1 ≤ i, j ≤ lg n),
        each containing n/lg n processors
     8  IN PARALLEL FOR i ← 1 TO lg n, j ← 1 TO lg n
     9    DO merge Y_i and Y_j to get the rank of each element of Y_i in Y_j
    10  divide the processors into n groups so that each group contains
        lg n processors
    11  IN PARALLEL FOR i ← 1 TO n
    12    DO determine the rank r_i of the element x_i in X, summing the
          local ranks received in line 9 using the algorithm Optimal-Prefix
    13       y_{r_i + 1} ← x_i
    14  RETURN Y

This algorithm uses the divide-and-conquer principle. It divides the input into lg n parts, then merges each pair of sorted parts. These merges result in the local ranks of the elements. The global rank of an element can be computed by summing up its local ranks.

Theorem 15.17 (sorting in Θ(lg n) time) Algorithm Preparata sorts n elements in Θ(lg n) time using n lg n CREW PRAM processors.

Proof. Let the running time be T(n). Lines 4–6 require T(n/lg n) time, lines 7–13 together require Θ(lg lg n) time, using a doubly logarithmic time merging procedure. Therefore T(n) satisfies the equation

    T(n) = T(n/lg n) + Θ(lg lg n),

having the solution T(n) = Θ(lg n).

The work of Preparata is the same as the work of Odd-Even-Sort, that is, Θ(n lg² n), but the speedup is better: Θ(n). The efficiency of both algorithms is Θ(1/lg n).

Exercises

15.6-1 The memory cell M[1] of the global memory contains some data. Design an algorithm which copies this data to the memory cells M[2], M[3], …, M[n] in O(lg n) time, using n EREW PRAM processors.

15.6-2 Design an algorithm which solves the previous Exercise 15.6-1 using only n/lg n EREW PRAM processors, preserving the running time.

15.6-3 Design an algorithm having O(lg lg n) running time and determining the maximum of n numbers using n common CRCW PRAM processors.

15.6-4 Let X = x_1, x_2, …, x_n be a sequence containing n keys. Design an algorithm to determine the rank of any key x using n/lg n CREW PRAM processors and O(lg n) time.

15.6-5 Design an algorithm having O(1) running time which decides, using n common CRCW PRAM processors, whether element 5 is contained in a given array A[1..n], and if it is contained, then gives the largest index i for which A[i] = 5 holds.

15.6-6 Design an algorithm to merge two sorted sequences of length n in O(lg n) time, using 2n/lg n CREW PRAM processors.

15.6-7 Determine the running time, speedup, work and efficiency of all the algorithms discussed in this section.