In this section we consider parallel algorithms solving such basic problems as prefix computation, ranking of the elements of a list, merging, selection, and sorting.

In the analysis of the algorithms we try to give the exact order of the worst-case running time and try to decide whether the presented algorithm is work-optimal, or at least work-efficient. When parallel algorithms are compared with sequential algorithms, the best known sequential algorithm is always chosen as the basis of comparison.

To describe these algorithms we use the following pseudocode conventions. The parallel instructions of a PRAM with p processors P_1, …, P_p begin with

P_i `IN PARALLEL FOR` i ← 1 `TO` p
   `DO` . . .

For a PRAM ordered into a square grid of size a × a the instruction begins with

P_{i,j} `IN PARALLEL FOR` i ← 1 `TO` a, j ← 1 `TO` a
   `DO` . . .

For a d-dimensional mesh of size a_1 × a_2 × ⋯ × a_d the similar instruction begins with

P_{i_1,…,i_d} `IN PARALLEL FOR` i_1 ← 1 `TO` a_1, …, i_d ← 1 `TO` a_d
   `DO` . . .

It is allowed that in these commands P_i represents a group of processors.

Let ⊕ be a binary associative operator defined over a set Σ. We suppose that the operation uses only this one set, and that Σ is closed under ⊕.

A binary operation ⊕ is *associative* on a set Σ if for all a, b, c ∈ Σ

(a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)

holds.

Let the elements of the sequence x_1, x_2, …, x_p be elements of the set Σ. Then the input data are the elements of the sequence X = x_1, x_2, …, x_p, and the **prefix problem** is the computation of the elements y_k = x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_k (1 ≤ k ≤ p). These elements are called **prefixes**.

It is worth remarking that in other topics of parallel computations the starting subsequences x_1, x_2, …, x_k of the sequence X are called prefixes.

**Example 15.1 **
*Associative operations.* If Σ is the set of integer numbers, ⊕ means addition and the sequence of the input data is 3, −5, 8, 2, 5, 4, then the sequence of the prefixes is 3, −2, 6, 8, 13, 17. If the set and the input data are the same, but the operation is the multiplication, then the prefixes are 3, −15, −120, −240, −1200, −4800. If the operation is the minimum (it is also an associative operation), then the prefixes are 3, −5, −5, −5, −5, −5. In this case the last prefix is the minimum of the input data.

The prefix problem can be solved by sequential algorithms in O(p) time, and any sequential algorithm requires Ω(p) time to solve it. There are parallel algorithms for different models of computation resulting in a work-optimal solution of the prefix problem.
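The sequential solution can be modelled in a few lines of Python (a minimal sketch; `sequential_prefix` is our own helper name, the data comes from the associative-operations example):

```python
from operator import add

def sequential_prefix(x, op):
    """Compute the p prefixes of x with p - 1 applications of the operation."""
    y = [x[0]]
    for xi in x[1:]:
        y.append(op(y[-1], xi))  # y_k = y_{k-1} (op) x_k
    return y

print(sequential_prefix([3, -5, 8, 2, 5, 4], add))  # [3, -2, 6, 8, 13, 17]
print(sequential_prefix([3, -5, 8, 2, 5, 4], min))  # [3, -5, -5, -5, -5, -5]
```

Any associative binary operation can be passed as `op`, which mirrors the abstract operator ⊕ of the text.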

In this subsection at first the algorithm *CREW-Prefix* is introduced, which solves the prefix problem in Θ(log p) time, using p CREW PRAM processors.

Next comes algorithm *EREW-Prefix*, having similar quantitative characteristics, but requiring only the weaker EREW PRAM model with p processors.

These algorithms solve the prefix problem quicker than the sequential algorithms, but the order of the necessary work is larger.

Therefore algorithm *Optimal-Prefix* is interesting: it uses only p/log p CREW PRAM processors and makes only Θ(log p) steps. The work of this algorithm is only Θ(p), therefore its efficiency is Θ(1), and so it is work-optimal. The speedup of this algorithm equals Θ(p/log p).

For the sake of simplicity, in the following we usually write log p instead of ⌈log p⌉.

As the first parallel algorithm, a recursive algorithm is presented which runs on the CREW PRAM model of computation, uses p processors and Θ(log p) time. In designing parallel algorithms the principle of *divide-and-conquer* is often used, as we will see in the case of the next algorithm too.

The input is the number of processors p and the array X = x_1, …, x_p; the output is the array Y = y_1, …, y_p containing the computed prefixes. We suppose that p is a power of 2. Since we always use the algorithms with the same number of processors, we omit the number of processors from the list of input parameters. In the mathematical descriptions we prefer to consider X and Y as sequences, while in the pseudocodes sometimes as arrays.

*CREW-Prefix(X)*

 1  `IF` p = 1
 2    `THEN` y_1 ← x_1
 3      `RETURN` Y
 4  `IF` p > 1
 5    `THEN` P_1, …, P_{p/2} `IN PARALLEL FOR` i ← 1 `TO` p/2
        `DO` compute recursively y_1, …, y_{p/2}, the prefixes belonging to x_1, …, x_{p/2}
 6      P_{p/2+1}, …, P_p `IN PARALLEL FOR` i ← p/2 + 1 `TO` p
        `DO` compute recursively the prefixes belonging to x_{p/2+1}, …, x_p
 7  P_{p/2+1}, …, P_p `IN PARALLEL FOR` i ← p/2 + 1 `TO` p
      `DO` read y_{p/2} from the global memory and compute y_i ← y_{p/2} ⊕ y_i
 8  `RETURN` Y

**Example 15.2 **
*Calculation of prefixes of 8 elements on 8 processors.* Let the number of processors be p = 8. The input data of the prefix calculation are 12, 3, 6, 8, 11, 4, 5 and 7, and the associative operation is the addition.

The run of the *recursive algorithm* consists of two *rounds*. In the first round (the recursive step) the first four processors get the input data 12, 3, 6, 8, and compute recursively the prefixes 12, 15, 21, 29 as output. At the same time the other four processors get the input data 11, 4, 5, 7, and compute the prefixes 11, 15, 20, 27.

According to the recursive structure P_1, P_2, P_3 and P_4 work as follows. P_1 and P_2 get 12 and 3, resp. P_3 and P_4 get 6 and 8 as input. Recursivity means for P_1 and P_2 that P_1 gets 12 and P_2 gets 3, computing at first y_1 = 12 and y_2 = 3; then P_2 updates y_2 = 12 + 3 = 15. In the same way P_3 and P_4 compute the provisional values y_3 = 6 and y_4 = 6 + 8 = 14.

While P_1 and P_2 have at this point computed the final values y_1 = 12 and y_2 = 15, P_3 and P_4 have computed only local provisional values of y_3 and y_4; in the correction step they read y_2 = 15 and compute the final values y_3 = 15 + 6 = 21 and y_4 = 15 + 14 = 29.

In the second round the first four processors stay idle, while the second four processors compute the final values of y_5, y_6, y_7 and y_8, adding y_4 = 29 to the provisional values 11, 15, 20 and 27 and receiving 40, 44, 49 and 56.
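The divide-and-conquer scheme of the example can be modelled sequentially (a sketch; `crew_prefix` is our own name, and the two recursive calls stand for the two processor groups that run in parallel on the PRAM):

```python
from operator import add

def crew_prefix(x, op):
    """Recursive prefix computation in the style of CREW-Prefix."""
    p = len(x)
    if p == 1:
        return list(x)
    left = crew_prefix(x[:p // 2], op)    # first processor group
    right = crew_prefix(x[p // 2:], op)   # second processor group, in parallel
    # combining step: the second group reads y_{p/2} and corrects its prefixes
    return left + [op(left[-1], r) for r in right]

print(crew_prefix([12, 3, 6, 8, 11, 4, 5, 7], add))
# [12, 15, 21, 29, 40, 44, 49, 56]
```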

In the remaining part of the section we use the notation n instead of p for the size of the input and give the number of used processors in verbal form. If the number of processors equals the size of the input, then we usually prefer to use n.

**Theorem 15.1 ***Algorithm *`CREW-Prefix`* uses Θ(log p) time on p *`CREW PRAM`* processors to compute the prefixes of p elements.*

**Proof. **Lines 1–3 require O(1) steps, lines 4–6 require T(p/2) steps, and line 7 does O(1) steps. So we get the following recurrence: T(p) = T(p/2) + O(1).

The solution of this recursive equation is T(p) = Θ(log p).

*CREW-Prefix* is not work-optimal, since its work is Θ(p log p) and we know a sequential algorithm requiring only O(p) time, but it is work-efficient, since all sequential prefix algorithms require Ω(p) time.

In the following algorithm we use exclusive reads instead of concurrent ones, therefore it can be implemented on the EREW PRAM model. Its input is the number of processors p and the sequence X = x_1, …, x_p, and its output is the sequence Y = y_1, …, y_p containing the prefixes.

*EREW-Prefix(X)*

 1  j ← 1
 2  P_i `IN PARALLEL FOR` i ← 1 `TO` p
 3    `DO` y_i ← x_i
 4  `WHILE` j < p
 5    `DO` P_i `IN PARALLEL FOR` i ← j + 1 `TO` p
 6      `DO` z ← y_{i−j}
 7         y_i ← z ⊕ y_i
 8    j ← 2j
 9  `RETURN` Y

**Theorem 15.2 ***Algorithm *`EREW-Prefix`* computes the prefixes of p elements on p *`EREW PRAM`* processors in Θ(log p) time.*

**Proof. **The commands in lines 1–3 and 9 are executed in O(1) time. Lines 4–7 are executed as many times as the assignment in line 8, that is, ⌈log p⌉ times.
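The doubling technique of EREW-Prefix can be simulated round by round (a sketch; each list comprehension models one synchronous parallel step, so after round t every y_i covers up to 2^t input elements):

```python
from operator import add

def erew_prefix(x, op):
    """Prefix computation by doubling the combined distance in each round."""
    y = list(x)
    j = 1
    while j < len(y):
        # processors i > j read y[i - j] and update y[i] simultaneously
        y = [y[i] if i < j else op(y[i - j], y[i]) for i in range(len(y))]
        j *= 2
    return y

print(erew_prefix([12, 3, 6, 8, 11, 4, 5, 7], add))
# [12, 15, 21, 29, 40, 44, 49, 56]
```

Building the new list from the old one models the fact that all processors read before anybody writes.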

Next we consider a recursive work-optimal algorithm, which uses ⌈n/log n⌉ CREW PRAM processors. The input is the length n of the input sequence and the sequence X = x_1, …, x_n; the output is the sequence Y = y_1, …, y_n, containing the computed prefixes.

*Optimal-Prefix(n, X)*

 1  P_i `IN PARALLEL FOR` i ← 1 `TO` n/log n
      `DO` compute sequentially y_{(i−1)log n+1}, …, y_{i log n}, the prefixes of the input data x_{(i−1)log n+1}, …, x_{i log n}
 2  P_i `IN PARALLEL FOR` i ← 1 `TO` n/log n
      `DO` using `CREW-Prefix` compute z_1, …, z_{n/log n}, the prefixes of the elements y_{log n}, y_{2 log n}, …, y_n
 3  P_i `IN PARALLEL FOR` i ← 2 `TO` n/log n
      `DO FOR` j ← (i − 1)log n + 1 `TO` i·log n
        `DO` y_j ← z_{i−1} ⊕ y_j
 4  `RETURN` Y

This algorithm runs in logarithmic time. The following two formulas help to show it:

z_i = y_{log n} ⊕ y_{2 log n} ⊕ ⋯ ⊕ y_{i log n}   (1 ≤ i ≤ n/log n)

and

y_j = z_{i−1} ⊕ (x_{(i−1)log n+1} ⊕ ⋯ ⊕ x_j)   ((i − 1)log n < j ≤ i·log n, i > 1),

where the summing goes using the corresponding associative operation.
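The three phases can be modelled as follows (a sketch with block length ⌊log n⌋; all names are ours, and every loop over blocks stands for one group of processors working in parallel):

```python
from math import log2
from operator import add

def optimal_prefix(x, op):
    """Blocked prefix computation: one processor per block of ~log n elements."""
    n = len(x)
    b = max(1, int(log2(n)))                      # block length ~ log n
    blocks = [x[i:i + b] for i in range(0, n, b)]
    # phase 1: each processor computes the prefixes of its own block sequentially
    local = []
    for blk in blocks:
        pref = [blk[0]]
        for v in blk[1:]:
            pref.append(op(pref[-1], v))
        local.append(pref)
    # phase 2: prefixes of the block totals (done by CREW-Prefix in the text)
    carry = [local[0][-1]]
    for pref in local[1:]:
        carry.append(op(carry[-1], pref[-1]))
    # phase 3: each block adds the combined total of all preceding blocks
    out = list(local[0])
    for i in range(1, len(local)):
        out.extend(op(carry[i - 1], v) for v in local[i])
    return out

print(optimal_prefix([3, -5, 8, 2, 5, 4], add))   # [3, -2, 6, 8, 13, 17]
```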

**Theorem 15.3 ****(parallel prefix computation in Θ(log n) time)**
*Algorithm *`Optimal-Prefix`* computes the prefixes of n elements on ⌈n/log n⌉ *`CREW PRAM`* processors in Θ(log n) time.*

**Proof. **Line 1 runs in O(log n) time, line 2 runs in O(log(n/log n)) = O(log n) time, and line 3 runs in O(log n) time, so the total running time is Θ(log n).

This theorem implies that the work of *Optimal-Prefix* is Θ(log n · n/log n) = Θ(n), therefore `Optimal-Prefix` is work-optimal.


The input of the **list ranking problem** is a list represented by an array A[1..n]: each element contains the index of its **right neighbour** (and maybe further data). The task is to determine the rank of the elements. The **rank** of an element is defined as the number of its right neighbours, that is, the number of elements following it in the list.

Since the further data are not necessary to find the solution, for simplicity we suppose that the elements of the array contain only the index of the right neighbour. This index is called a **pointer**. The pointer of the rightmost element equals zero.

**Example 15.4 **
*Input of list ranking.* Let A be the array represented in the first row of Figure 15.18; each entry gives the index of the right neighbour of the corresponding element. The last element of the list has rank 0, its left neighbour has rank 1, and so on. The second row of Figure 15.18 shows the elements of A in decreasing order of their ranks.

The list ranking problem can be solved in linear time by a sequential algorithm. At first we determine the **head of the list**, which is the unique element A[i] having the property that there does not exist an index j with A[j] = i. Starting from the head we can follow the pointers: the head of the list has the rank n − 1, its right neighbour has the rank n − 2, and finally the rank of the last element is zero.

In this subsection we present a deterministic list ranking algorithm which uses n EREW PRAM processors and Θ(log n) worst-case time. The pseudocode of algorithm *Det-Ranking* is as follows.

The input of the algorithm is the number n of the elements to be ranked and the array N[1..n] containing the indices of the right neighbours of the elements of A; the output is the array R[1..n] containing the computed ranks.

*Det-Ranking(n, N)*

 1  P_i `IN PARALLEL FOR` i ← 1 `TO` n
 2    `DO IF` N[i] = 0
 3      `THEN` R[i] ← 0
 4      `ELSE` R[i] ← 1
 5  `FOR` j ← 1 `TO` ⌈log n⌉
 6    `DO` P_i `IN PARALLEL FOR` i ← 1 `TO` n
 7      `DO IF` N[i] ≠ 0
 8        `THEN` R[i] ← R[i] + R[N[i]]
 9           N[i] ← N[N[i]]
10  `RETURN` R

The basic idea behind the algorithm *Det-Ranking* is **pointer jumping**. In each round the algorithm modifies the elements so that each element points to the right neighbour of its right neighbour (if it exists, otherwise to the end of the list). This state is represented in the second row of Figure 15.19.

If we have n processors, then one such step can be done in O(1) time. After this step each element (with the exception of the last one) points to the element whose original distance was two. In the next step of the pointer jumping the elements point to an element whose original distance was 4 (if there is no such element, then to the last one), as shown in the third row of Figure 15.19.

In the next step the pointer part of the elements points to the neighbour of distance 8 (or to the last element, if there is no element of distance 8), according to the last row of Figure 15.19.

In each step of the algorithm each element updates the information on the number of elements between itself and the element pointed to by its pointer. Let R[i], resp. N[i] denote the rank, resp. neighbour field of the i-th element. The initial value of R[i] is 1 for the majority of the elements, but it is 0 for the rightmost element (first row of Figure 15.19). During the pointer jumping R[i] gets the new value R[i] + R[N[i]], provided N[i] ≠ 0: the previous rank of the element and the rank of its current right neighbour are added. After this N[i] is modified to N[N[i]], so that the element points to the right neighbour of its right neighbour, as the second row of Figure 15.19 shows.
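The rounds of pointer jumping can be simulated directly (a sketch; indices are 1-based and 0 marks the end of the list, as in the text; the copied dictionaries model the synchronous parallel updates):

```python
def det_ranking(nxt):
    """Pointer jumping: nxt[i-1] is the index of the right neighbour of element i,
    0 marks the list end. Returns the rank of every element."""
    n = len(nxt)
    N = {i + 1: nxt[i] for i in range(n)}
    R = {i: (0 if N[i] == 0 else 1) for i in N}
    for _ in range(n.bit_length()):        # ~ceil(log n) rounds suffice
        newR, newN = dict(R), dict(N)
        for i in N:                        # all updates happen in parallel on a PRAM
            if N[i] != 0:
                newR[i] = R[i] + R[N[i]]   # add the rank of the right neighbour
                newN[i] = N[N[i]]          # jump to the neighbour's neighbour
        R, N = newR, newN
    return [R[i + 1] for i in range(n)]

print(det_ranking([2, 3, 0]))  # list 1 -> 2 -> 3, ranks [2, 1, 0]
```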

**Theorem 15.4 ***Algorithm *`Det-Ranking`* computes the ranks of an array consisting of n elements on n *`EREW PRAM`* processors in Θ(log n) time.*

Since the work of *Det-Ranking* is Θ(n log n), this algorithm is not work-optimal, but it is work-efficient.

The list ranking problem corresponds to a prefix problem in which each element equals 1, except the last element of the list, which equals 0. One can easily modify *Det-Ranking* to get a prefix algorithm.

The input of the **merging problem** is two sorted sequences and and the output is one sorted sequence containing the elements of the input.

If the length of the input sequences is n, then the merging problem can be solved in O(n) time using one sequential processor. Since we have to investigate all elements and write them into the corresponding elements of the output, the running time of any merging algorithm is Ω(n). We get this lower bound even in the case when we count only the number of necessary comparisons.

Let X = x_1, …, x_n and Y = y_1, …, y_n be the input sequences. For the sake of simplicity let n be a power of two and let the elements be different.

To merge two sequences of length n it is enough to know the ranks of the keys, since then we can write the keys—using 2n processors—into the corresponding memory locations with one parallel write operation. The running time of the following algorithm is logarithmic, therefore it is called *Logarithmic-Merge*.

**Theorem 15.5 ***Algorithm *`Logarithmic-Merge`* merges two sequences of length n on 2n *`CREW PRAM`* processors in Θ(log n) time.*

**Proof. **Let the rank of the element x_i be i in X. If we assign a single processor to the element x_i, then it can determine, using binary search, the number r_i of elements of Y which are smaller than x_i. If r_i is known, this processor computes the rank of x_i in the union of X and Y as i + r_i. If the element belongs to Y, the method is the same.

Summarising the time requirements we get that, using one CREW PRAM processor per element, that is, 2n processors altogether, the running time is Θ(log n).

This algorithm is not work-optimal, only work-efficient.
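The rank computation of the proof can be written down directly (a sketch assuming distinct keys; `bisect_left` plays the role of the binary search performed by each processor):

```python
from bisect import bisect_left

def rank_merge(x, y):
    """Merge two sorted sequences of distinct keys by computing ranks."""
    z = [None] * (len(x) + len(y))
    # one (simulated) processor per element: the final position equals the own
    # index plus the number of smaller elements in the other sequence
    for i, key in enumerate(x):
        z[i + bisect_left(y, key)] = key
    for j, key in enumerate(y):
        z[j + bisect_left(x, key)] = key
    return z

print(rank_merge([1, 5, 8], [3, 9, 12]))  # [1, 3, 5, 8, 9, 12]
```

On the PRAM all 2n binary searches run concurrently, which is where the Θ(log n) bound comes from.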

The following recursive algorithm *Odd-Even-Merge* follows the classical divide-and-conquer principle due to Batcher.

Let X = x_1, …, x_n and Y = y_1, …, y_n be the two input sequences. We suppose that n is a power of 2 and the elements of the arrays are different. The output of the algorithm is the sequence Z = z_1, …, z_{2n}, containing the merged elements. This algorithm requires 2n EREW PRAM processors.

*Odd-Even-Merge(X, Y)*

 1  `IF` n = 1
 2    `THEN` get Z = z_1, z_2 by merging x_1 and y_1 with one comparison
 3      `RETURN` Z
 4  `IF` n > 1
 5    `THEN IN PARALLEL` merge recursively X^odd = x_1, x_3, …, x_{n−1} and Y^odd = y_1, y_3, …, y_{n−1}
 6      to get U = u_1, …, u_n
 7    `IN PARALLEL` merge recursively X^even = x_2, x_4, …, x_n and Y^even = y_2, y_4, …, y_n
 8      to get V = v_1, …, v_n
 9  P_i `IN PARALLEL FOR` i ← 1 `TO` n
10    `DO` z_{2i−1} ← u_i
11       z_{2i} ← v_i
12  P_i `IN PARALLEL FOR` i ← 1 `TO` n − 1
13    `DO IF` z_{2i} > z_{2i+1}
14      `THEN` exchange z_{2i} ↔ z_{2i+1}
15  `RETURN` Z

**Example 15.5 **
*Merge of twice eight numbers.* Let X = 1, 5, 8, 11, 13, 16, 21, 26 and Y = 3, 9, 12, 18, 23, 27, 31, 65. Figure 15.20 shows the merge of these 16 numbers.

At first the elements of X with odd indices form the sequence X^odd = 1, 8, 13, 21 and the elements with even indices form the sequence X^even = 5, 11, 16, 26, and in the same way we get Y^odd = 3, 12, 23, 31 and Y^even = 9, 18, 27, 65. Then comes the recursive merge of the two odd sequences resulting in U = 1, 3, 8, 12, 13, 21, 23, 31 and the recursive merge of the even sequences resulting in V = 5, 9, 11, 16, 18, 26, 27, 65.

After this *Odd-Even-Merge* shuffles U and V, resulting in the sequence Z = 1, 5, 3, 9, 8, 11, 12, 16, 13, 18, 21, 26, 23, 27, 31, 65: the elements of Z with odd indices come from U and the elements with even indices come from V.

Finally we compare the elements of Z with even index and the next element (that is, z_2 with z_3, z_4 with z_5 etc.), and if they are not in the proper order, they are exchanged; this results in the sorted sequence 1, 3, 5, 8, 9, 11, 12, 13, 16, 18, 21, 23, 26, 27, 31, 65.
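The recursion can be modelled compactly (a sketch; the slices with step 2 select the odd- and even-indexed elements, 0-based in Python):

```python
def odd_even_merge(x, y):
    """Batcher's odd-even merge of two sorted sequences of equal power-of-two length."""
    n = len(x)
    if n == 1:
        return [min(x[0], y[0]), max(x[0], y[0])]
    u = odd_even_merge(x[0::2], y[0::2])   # merge the odd-indexed elements
    v = odd_even_merge(x[1::2], y[1::2])   # merge the even-indexed elements
    z = [None] * (2 * n)
    z[0::2], z[1::2] = u, v                # shuffle: odd positions from U, even from V
    for i in range(1, 2 * n - 1, 2):       # compare-exchange the inner pairs
        if z[i] > z[i + 1]:
            z[i], z[i + 1] = z[i + 1], z[i]
    return z
```

Running it on the example data of the text yields the fully sorted sequence of all 16 numbers.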

**Theorem 15.6 ****(merging in Θ(log n) time)**
*Algorithm *`Odd-Even-Merge`* merges two sequences of length n in Θ(log n) time using 2n *`EREW PRAM`* processors.*

**Proof. **Denote the running time of the algorithm by T(n). The recursive merges require T(n/2) time (they run in parallel), while the shuffle and the compare-exchange steps require O(1) time. Therefore we get the recursive equation T(n) = T(n/2) + O(1), having the solution T(n) = Θ(log n).

We prove the correctness of this algorithm using the **zero-one principle**.

A comparison-based sorting algorithm is **oblivious** if the sequence of comparisons is fixed, that is, the pairs of elements to be compared do not depend on the results of the earlier comparisons.

**Theorem 15.7 **
**(zero-one principle)**
*If an oblivious comparison-based sorting algorithm correctly sorts an arbitrary 0-1 sequence of length n, then it also correctly sorts any sequence of length n consisting of arbitrary keys.*

**Proof. **Let A be a comparison-based oblivious sorting algorithm and let X be a sequence of n elements sorted incorrectly by A. Suppose A has to sort the elements of X in increasing order. Then the incorrectly sorted output contains an element x on the k-th position in spite of the fact that X contains at least k keys smaller than x.

Let x be the first (having the smallest index) such element of the output. Substitute in the input sequence the elements smaller than x by 0's and the remaining elements by 1's. This modified sequence is a 0-1 sequence, therefore A sorts it correctly. This observation implies that in the sorted 0-1 sequence at least k 0's precede the 1 written in the place of x.

Now denote the elements of the input sequence smaller than x by red colour and the remaining elements by blue colour (in the original and the transformed sequence too). We can show by induction that the coloured sequences are identical at the start and remain identical after each comparison. According to colours we have three types of comparisons: blue-blue, red-red and blue-red. If the compared elements have the same colour, then in both sequences the colours remain unchanged (whether there is an exchange or not). If we compare elements of different colours, then in both sequences the red element occupies the position with the smaller index. So finally we get a contradiction, proving the assertion of the theorem.

**Example 15.6 **
*A non-comparison-based sorting algorithm.* Let X be a bit sequence of length n. We can sort this sequence simply by counting the zeros: if we count z zeros, then we write z zeros followed by n − z ones. Of course, the correctness of this algorithm on arbitrary inputs is not guaranteed. Since this algorithm is not comparison-based, this fact does not contradict the zero-one principle.

But merging is a special case of sorting, and *Odd-Even-Merge* is an oblivious sorting algorithm, so the zero-one principle can be applied to it.

**Theorem 15.8 ***Algorithm *`Odd-Even-Merge`* sorts correctly sequences consisting of arbitrary numbers.*

**Proof. **Let X and Y be sorted 0-1 sequences of length n. Let q_1 denote the number of zeros in X and q_2 the number of zeros in Y. Then the number of zeros in X^odd equals ⌈q_1/2⌉, while the number of zeros in X^even is ⌊q_1/2⌋. Therefore the number of zeros in U equals ⌈q_1/2⌉ + ⌈q_2/2⌉, and the number of zeros in V equals ⌊q_1/2⌋ + ⌊q_2/2⌋.

The difference of these two numbers is at most 2. The difference is exactly 2 if and only if q_1 and q_2 are both odd numbers; otherwise the difference is at most 1. Suppose that the difference is 2 (the proof in the other cases is similar). In this case U contains two additional zeros. When the algorithm shuffles U and V, the sequence Z begins with a run of zeros and ends with a run of ones, and between them there is a short "dirty" part 1, 0, starting at an even position. The comparison and exchange in the last step of the algorithm repairs exactly this pair, so the whole sequence becomes sorted.
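The combining claim of the proof can be checked exhaustively for short 0-1 sequences (a self-contained sketch; `shuffle_then_fix` is a hypothetical helper modelling the last two steps of the algorithm, and the condition on the zero counts reflects that U never has fewer zeros than V):

```python
def shuffle_then_fix(u, v):
    """Shuffle two sorted sequences and repair with one compare-exchange pass."""
    z = [None] * (len(u) + len(v))
    z[0::2], z[1::2] = u, v
    for i in range(1, len(z) - 1, 2):
        if z[i] > z[i + 1]:
            z[i], z[i + 1] = z[i + 1], z[i]
    return z

# exhaustive 0-1 check for length 4: if U has at most two more zeros than V,
# one compare-exchange pass sorts the shuffled sequence
for zu in range(5):
    for zv in range(5):
        if 0 <= zu - zv <= 2:
            u = [0] * zu + [1] * (4 - zu)
            v = [0] * zv + [1] * (4 - zv)
            assert shuffle_then_fix(u, v) == sorted(u + v)
print("all 0-1 cases sorted")
```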

Algorithm *Work-Optimal-Merge* uses only 2n/log n processors, and still solves the merging in logarithmic time. This algorithm divides the original problem into parts so that each part contains approximately log n elements.

Let X = x_1, …, x_n and Y = y_1, …, y_n be the input sequences. Divide X into parts so that each part contains at most log n elements. Let the parts be denoted by X_1, …, X_{n/log n}, and let the largest element of X_i be l_i.

Assign a processor to each part. These processors determine (by binary search) the correct place (according to the sorting) of l_i in Y. These places divide Y into n/log n parts (some of these parts can be empty). Denote these parts by Y_1, …, Y_{n/log n}. We call Y_i the subset corresponding to X_i in Y (see Figure 15.21).

The algorithm gets the merged sequence by merging at first X_1 with Y_1, X_2 with Y_2 and so on, and then joining these merged sequences.

**Theorem 15.9 ***Algorithm *`Optimal-Merging`* merges two sorted sequences of length n in O(log n) time on 2n/log n *`CREW PRAM`* processors.*

**Proof. **We use the previous partition.

The length of the parts X_i is O(log n), but the length of the parts Y_i can be much larger. Therefore we repeat the partition. Let (X_i, Y_i) be an arbitrary pair. If Y_i contains O(log n) keys, then X_i and Y_i can be merged using one processor in O(log n) time. But if Y_i is longer, then divide Y_i into parts of length at most log n. Assign a processor to each of these parts. Such a processor finds (by binary search) the subsequence corresponding to its part in X_i; O(log log n) time is sufficient for this. So the merge of X_i and Y_i is reduced to subproblems in which two sequences of length O(log n) have to be merged.

The number of the used processors is proportional to the total number of parts, that is, O(n/log n), which is not larger than the available 2n/log n.

This theorem implies that *Optimal-Merging* is work-optimal.

In the **selection problem**
n elements and a positive integer k (1 ≤ k ≤ n) are given, and the k-th smallest element is to be selected. Since selection requires the investigation of all elements, and our operations can handle at most two elements, any sequential selection algorithm requires Ω(n) time.

Since a sequential algorithm A requiring only O(n) time is known, A is asymptotically optimal.

The **search problem** is similar: there the algorithm has to decide whether a given element appears in the given sequence, and if it does, then where. Here a negative answer is also possible, and the features of the sought element decide whether a given element of the sequence corresponds to the requirements or not.

We investigate three special cases and present work-efficient algorithms to solve them.

Let k = n, that is, we wish to select the largest key. Algorithm *Quadratic-Select* solves this task in O(1) time using n² CRCW processors.

The input (n different keys) is the sequence X = x_1, …, x_n, and the selected largest element is returned as y.

*Quadratic-Select(X)*

 1  `IF` n = 1
 2    `THEN` y ← x_1
 3      `RETURN` y
 4  P_{i,j} `IN PARALLEL FOR` i ← 1 `TO` n, j ← 1 `TO` n
 5    `DO IF` x_i < x_j
 6      `THEN` b[i, j] ← `FALSE`
 7      `ELSE` b[i, j] ← `TRUE`
 8  P_i `IN PARALLEL FOR` i ← 1 `TO` n
 9    `DO` max[i] ← `TRUE`
10  P_{i,j} `IN PARALLEL FOR` i ← 1 `TO` n, j ← 1 `TO` n
11    `DO IF` b[i, j] = `FALSE`
12      `THEN` max[i] ← `FALSE`
13  P_i `IN PARALLEL FOR` i ← 1 `TO` n
14    `DO IF` max[i] = `TRUE`
15      `THEN` y ← x_i
16  `RETURN` y

In the first round (lines 4–7) the keys are compared in a parallel manner, using all the n² processors, so that processor P_{i,j} computes the logical value x_i < x_j. We supposed that the keys are different; if the elements are not necessarily different, then we can use the pair (x_i, i) instead of x_i (this solution requires an additional index of length log n bits). Since there is a unique key for which all these comparisons result in *FALSE*, this unique key can be found with a logical `OR` operation in the remaining rounds.

**Theorem 15.11 ****(selection in constant time)**
*Algorithm *`Quadratic-Select`* determines the largest of n different keys in O(1) time using n² common *`CRCW PRAM`* processors.*

**Proof. **The first and the third rounds require unit time, and the second round (the concurrent writes realising the logical OR) also requires O(1) time, so the total running time is O(1).

The speedup of this algorithm is Θ(n). The work of the algorithm is Θ(n²), so the efficiency is Θ(1/n). It follows that this algorithm is not work-optimal; it is not even work-efficient.

Now we show that the maximal element among n keys can be found using only n common CRCW PRAM processors and O(log log n) time. The technique used is *divide-and-conquer*. For the sake of simplicity let n be a square number.

The input and the output are the same as at the previous algorithm.

*Quick-Selection(X)*

 1  `IF` n = 1
 2    `THEN` y ← x_1
 3      `RETURN` y
 4  `IF` n > 1
 5    `THEN` divide the input into √n groups G_1, …, G_{√n} and divide the processors into √n groups P^1, …, P^{√n}
 6  P^i `IN PARALLEL FOR` i ← 1 `TO` √n
 7    `DO` group P^i recursively determines the maximal element m_i of the group G_i
 8  y ← `Quadratic-Select(`m_1, …, m_{√n}`)`
 9  `RETURN` y

The algorithm divides the input into √n groups so that each group G_i contains √n elements, and divides the processors into √n groups so that group P^i contains √n processors. Then the processor group P^i computes recursively the maximum m_i of the group G_i. Finally the previous algorithm *Quadratic-Select* gets as input the sequence m_1, …, m_{√n} and finds the maximum y of the input sequence X; this is possible in constant time, since (√n)² = n processors are available.

**Theorem 15.12 ****(selection in O(log log n) time)**
*Algorithm *`Quick-Select`* determines the largest of n different elements in O(log log n) time using n common *`CRCW PRAM`* processors.*

**Proof. **Let the running time of the algorithm be denoted by T(n). The recursive step requires T(√n) time, the final call of `Quadratic-Select` requires O(1) time. Therefore T(n) satisfies the recursive equation T(n) = T(√n) + O(1), having the solution T(n) = O(log log n).

The total work of algorithm *Quick-Select* is Θ(n log log n), so its efficiency is Θ(1/log log n); therefore `Quick-Select` is not work-optimal, only work-efficient.

If the problem is to find the maximum of n keys consisting of one bit each, then the problem can be solved by a logical *OR* operation, and so requires only constant time using n processors. Now we try to extend this observation. Let c be a given positive integer constant, and suppose the keys are integer numbers belonging to the interval [0, n^c). Then the keys can be represented using at most c log n bits. For the sake of simplicity we suppose that all the keys are given as binary numbers of length c log n bits.

The following algorithm *Integer-Selection* requires only constant time and n CRCW PRAM processors to find the maximum.

The basic idea is to partition the bits of the numbers into c parts of length log n. The j-th part contains the bits (j − 1)log n + 1, …, j log n; the number of the parts is c. Figure 15.22 shows the partition.

The input of *Integer-Selection* is the number of processors n and the sequence X = x_1, …, x_n containing n different integer numbers; the output is the maximal number y.

*Integer-Selection(n, X)*

 1  `FOR` j ← 1 `TO` c
 2    `DO` compute the maximum M of the remaining numbers on the basis of their j-th part
 3       delete the numbers whose j-th part is smaller than M
 4  y ← one of the remaining numbers
 5  `RETURN` y

The algorithm starts with searching the maximum on the basis of the first (most significant) part of the numbers. Then it deletes the numbers whose first part is smaller than this maximum. This is repeated for the second, ..., last part of the numbers. Any of the non-deleted numbers is maximal.
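The part-by-part maximum search can be modelled with shifts and masks (a sketch assuming n is a power of two; on the PRAM each per-part maximum is found in constant time, while here an ordinary `max` stands in for it):

```python
def integer_select_max(x, c, n):
    """Maximum of distinct integers below n**c: scan the c digit-groups of
    log n bits from the most significant one, keeping only the group maxima."""
    bits = max(1, n.bit_length() - 1)          # part length ~ log n
    alive = list(x)
    for j in range(c, 0, -1):                  # j-th part, most significant first
        part = lambda v: (v >> ((j - 1) * bits)) & ((1 << bits) - 1)
        m = max(part(v) for v in alive)        # constant time on a CRCW PRAM
        alive = [v for v in alive if part(v) == m]
    return alive[0]

print(integer_select_max([5, 63, 17, 42], c=2, n=8))  # 63
```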

**Theorem 15.13 ****(selection from integer numbers)**
*If the keys are integers drawn from the interval [0, n^c), then algorithm *`Integer-Selection`* determines the largest number among n numbers for any positive constant c in O(1) time using n *`CRCW PRAM`* processors.*

**Proof. **Suppose that we start with the selection of the numbers whose most significant log n bits are maximal. Let this maximum of the first parts be denoted by M. It is sure that the numbers whose first part is smaller than M are not maximal, therefore they can be deleted. If we execute this basic operation for all parts (that is, c times), then exactly those numbers will be deleted that are not maximal, and all maximal elements remain.

If a part contains at most log n bits, then its value is smaller than n. So algorithm *Integer-Selection* in its first step determines the maximum of n integer numbers taken from the interval [0, n). The algorithm assigns a processor to each number and uses n common memory locations m_0, …, m_{n−1}, containing initially 0. In one step processor P_i writes 1 into the cell indexed by the first part of its number. Then the maximum of the marked cells can be determined from the n memory cells using the n processors, by the technique of Theorem 15.11, in constant time.

Let the sequence X = x_1, …, x_n contain n different numbers, and let the problem be to select the k-th smallest element of X. Suppose that we have n² CREW processors.

*General-Selection(X, k)*

 1  divide the processors into n groups P^1, …, P^n so that group P^i contains the processors P_{(i−1)n+1}, …, P_{in}
 2  P_{(i−1)n+j} `IN PARALLEL FOR` i ← 1 `TO` n, j ← 1 `TO` n
 3    `DO` determine r_{ij} (1, if x_j is smaller than x_i, and 0 otherwise)
 4  P^i `IN PARALLEL FOR` i ← 1 `TO` n
 5    `DO` using `Optimal-Prefix` determine r_i = r_{i1} + r_{i2} + ⋯ + r_{in} (how many elements of X are smaller than x_i)
 6  P_i `IN PARALLEL FOR` i ← 1 `TO` n
 7    `DO IF` r_i = k − 1
 8      `THEN RETURN` x_i

**Theorem 15.14 ****(general selection)**
*The algorithm *`General-Selection`* determines the k-th smallest of n different numbers in O(log n) time using n² processors.*

**Proof. **Lines 2–3 require O(1) time. Lines 4–5 require O(log n) time according to Theorem 15.3. Lines 6–8 can be executed in constant time, so the total running time is O(log n).

The work of *General-Selection* is Θ(n² log n), therefore this algorithm is not work-efficient.

Given a sequence X = x_1, …, x_n, the **sorting problem** is to rearrange the elements of X, e.g., in increasing order.

It is well-known that any sequential comparison-based sorting algorithm needs Ω(n log n) comparisons, and there are comparison-based sorting algorithms with O(n log n) running time.

There are also algorithms, using special operations or sorting numbers with special features, which solve the sorting problem in linear time. Since we have to investigate all elements of X, and the permitted operations can handle at most 2 elements, any sequential sorting algorithm requires Ω(n) time. So it is true that among the comparison-based, and also among the non-comparison-based, sorting algorithms there are asymptotically optimal sequential algorithms. In this subsection we consider three different parallel sorting algorithms.

Using the ideas of the algorithms *Quadratic-Selection* and `Optimal-Prefix` we get the following sorting algorithm.

*Quadratic-Sort(X)*

 1  `IF` n = 1
 2    `THEN` y_1 ← x_1
 3      `RETURN` Y
 4  P_{i,j} `IN PARALLEL FOR` i ← 1 `TO` n, j ← 1 `TO` n
 5    `DO IF` x_i > x_j
 6      `THEN` c[i, j] ← 1
 7      `ELSE` c[i, j] ← 0
 8  divide the processors into n groups P^1, …, P^n so that group P^i contains the processors P_{(i−1)n+1}, …, P_{in}
 9  P^i `IN PARALLEL FOR` i ← 1 `TO` n
10    `DO` compute the rank r_i = c[i, 1] + c[i, 2] + ⋯ + c[i, n]
11  P_i `IN PARALLEL FOR` i ← 1 `TO` n
12    `DO` y_{r_i + 1} ← x_i
13  `RETURN` Y

In lines 4–7 the algorithm compares all pairs of the elements (as *Quadratic-Selection* does), then in lines 8–10 (in a similar way as `Optimal-Prefix` works) it computes the rank of each element, and finally in lines 11–12 the elements are written into the corresponding places of Y.

**Theorem 15.15 ****(sorting in Θ(log n) time)**
*Algorithm *`Quadratic-Sort`* sorts n elements using n² *`CRCW PRAM`* processors in Θ(log n) time.*

**Proof. **The computation of the ranks requires Θ(log n) time, and the remaining lines require only constant time.

Since the work of *Quadratic-Sort* is Θ(n² log n), this algorithm is not work-efficient.

The next algorithm uses the *Odd-Even-Merge* algorithm and follows the classical principle of merge sort.

*Odd-Even-Sort(X)*

 1  `IF` n = 1
 2    `THEN` y_1 ← x_1
 3      `RETURN` Y
 4  `IF` n > 1
 5    `THEN` let X' = x_1, x_3, …, x_{n−1} and X'' = x_2, x_4, …, x_n
 6  P_i `IN PARALLEL FOR` i ← 1 `TO` n/2
 7    `DO` sort X' recursively to get Y'
 8  P_i `IN PARALLEL FOR` i ← n/2 + 1 `TO` n
 9    `DO` sort X'' recursively to get Y''
10  P_i `IN PARALLEL FOR` i ← 1 `TO` n
11    `DO` merge Y' and Y'' into Y using `Odd-Even-Merge(`Y', Y''`)`
12  `RETURN` Y

The running time of this EREW PRAM algorithm is Θ(log² n).

**Theorem 15.16 ****(sorting in Θ(log² n) time)**
*Algorithm *`Odd-Even-Sort`* sorts n elements in Θ(log² n) time using n *`EREW PRAM`* processors.*

**Proof. **Let T(n) be the running time of the algorithm. The halving requires O(1) time, the recursive sorting of the two parts (executed in parallel) requires T(n/2) time, and the merging requires Θ(log n) time. Therefore T(n) satisfies the recurrence

T(n) = T(n/2) + Θ(log n),

having the solution T(n) = Θ(log² n).

**Example 15.7 **
*Sorting on 16 processors.* Sort using 16 processors the following numbers: 62, 19, 8, 5, 1, 13, 11, 16, 23, 31, 9, 3, 18, 12, 27, 34. At first we get the odd and even parts: the first 8 processors get the sequence 62, 8, 1, 11, 23, 9, 18, 27, while the other 8 processors get 19, 5, 13, 16, 31, 3, 12, 34. The output of the first 8 processors is 1, 8, 9, 11, 18, 23, 27, 62, while the output of the second 8 processors is 3, 5, 12, 13, 16, 19, 31, 34. The merged final result is 1, 3, 5, 8, 9, 11, 12, 13, 16, 18, 19, 23, 27, 31, 34, 62.
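The whole sorting scheme of the example can be sketched in Python (our own names; `_oem` is the odd-even merge of the previous subsection, and the two recursive sorts model the two processor groups):

```python
def odd_even_sort(x):
    """Batcher-style merge sort: sort the odd- and even-indexed parts recursively,
    then combine them with odd-even merge. Assumes a power-of-two length."""
    if len(x) == 1:
        return list(x)
    return _oem(odd_even_sort(x[0::2]), odd_even_sort(x[1::2]))

def _oem(a, b):
    # odd-even merge of two sorted sequences of equal power-of-two length
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    z = [None] * (len(a) + len(b))
    z[0::2] = _oem(a[0::2], b[0::2])
    z[1::2] = _oem(a[1::2], b[1::2])
    for i in range(1, len(z) - 1, 2):      # compare-exchange the inner pairs
        if z[i] > z[i + 1]:
            z[i], z[i + 1] = z[i + 1], z[i]
    return z

data = [62, 19, 8, 5, 1, 13, 11, 16, 23, 31, 9, 3, 18, 12, 27, 34]
print(odd_even_sort(data) == sorted(data))  # True
```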

The work of the algorithm is Θ(n log² n), its efficiency is Θ(1/log n), and its speedup is Θ(n/log n). The algorithm is not work-optimal, but it is work-efficient.

If we have more processors, then the running time can be decreased. The following recursive algorithm due to Preparata uses n log n CREW PRAM processors and Θ(log n) time. The input is the sequence X = x_1, …, x_n, and the output is the sequence Y containing the sorted elements.

*Preparata(X)*

 1  `IF` n < 4
 2    `THEN` sort X using `Odd-Even-Sort` and n processors
 3      `RETURN` Y
 4  divide the n elements into log n parts X^1, …, X^{log n} so that each part contains n/log n elements, and divide the processors into log n groups so that each group contains n processors
 5  P^i `IN PARALLEL FOR` i ← 1 `TO` log n
 6    `DO` sort the part X^i recursively to get a sorted sequence Y^i
 7  divide the processors into (log n)² groups Q^{i,j} so that each group contains n/log n processors
 8  Q^{i,j} `IN PARALLEL FOR` i ← 1 `TO` log n, j ← 1 `TO` log n
 9    `DO` merge Y^i and Y^j to get the local ranks of the elements of Y^i in Y^j
10  divide the processors into n groups so that each group contains log n processors
11  `IN PARALLEL FOR` i ← 1 `TO` n
12    `DO` determine the global rank of the element x_i by summing its local ranks received in line 9, using the algorithm `Optimal-Prefix`
13  write the elements into Y according to their global ranks
14  `RETURN` Y

This algorithm uses the *divide-and-conquer* principle. It divides the input into log n parts, then merges each pair of parts. These merges result in the local ranks of the elements. The global rank of an element can be computed by summing up its local ranks.
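The rank-summing idea can be modelled sequentially (a sketch assuming distinct keys; the nested counting loop stands for the pairwise merges and the Optimal-Prefix summation of the PRAM algorithm):

```python
from math import ceil, log2

def preparata_sort(x):
    """Rank sorting in the style of Preparata: split into ~log n parts, sort the
    parts, obtain each element's local rank in every part, sum them up."""
    n = len(x)
    k = max(1, int(log2(n)))                 # number of parts ~ log n
    size = ceil(n / k)
    parts = [sorted(x[i:i + size]) for i in range(0, n, size)]
    out = [None] * n
    for v in x:
        # the global rank of v is the sum of its local ranks in all parts
        rank = sum(sum(1 for w in part if w < v) for part in parts)
        out[rank] = v
    return out

print(preparata_sort([5, 1, 4, 2, 3, 8, 7, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```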

**Theorem 15.17 ****(sorting in Θ(log n) time)**
*Algorithm *`Preparata`* sorts n elements in Θ(log n) time using n log n *`CREW PRAM`* processors.*

**Proof. **Let the running time be T(n). Lines 4–6 require T(n/log n) time, lines 7–13 together require O(log log n) time. Therefore T(n) satisfies the equation

T(n) = T(n/log n) + O(log log n),

having the solution T(n) = Θ(log n).

The work of *Preparata* is Θ(n log² n), the same as the work of `Odd-Even-Sort`; therefore *Preparata* is not work-optimal, only work-efficient, but it achieves a smaller running time.

**Exercises**

15.6-1 The memory cell M[1] of the global memory contains some data. Design an algorithm which copies this data to the memory cells M[2], …, M[n] in O(log n) time, using n EREW PRAM processors.

15.6-2 Design an algorithm which solves the previous Exercise 15.6-1 using only n/log n EREW PRAM processors, saving the order of the running time.

15.6-3 Design an algorithm having O(log log n) running time and determining the maximum of n numbers, using n common CRCW PRAM processors.

15.6-4 Let X = x_1, x_2, …, x_n be a sequence containing n keys. Design an algorithm to determine the rank of any key x using n CREW PRAM processors and O(log n) time.

15.6-5 Design an algorithm having O(1) running time, which decides, using n common CRCW PRAM processors, whether element 5 is contained in a given array A[1..n], and if it is contained, then gives the largest index i for which A[i] = 5 holds.

15.6-6 Design an algorithm to merge two sorted sequences of length n in O(log n) time, using 2n/log n CREW PRAM processors.

15.6-7 Determine the running time, speedup, work, and efficiency of all algorithms discussed in this section.