Notes on Concurrency and Parallelism
based on Chapter 10 of "Introduction to Computing and Algorithms,"
Russell L. Shackelford, Addison-Wesley, 1998.

DEFINITIONS
-----------

Definition: CONCURRENCY is the execution of multiple tasks "at the same
time".  "At the same time" can have several interpretations:

  (1) there is a single processor (CPU) that is rapidly taking turns and
      doing little pieces of each task in turn
  (2) there are several processors, and each one is devoted to one task,
      executing at the same time
  (3) there are several processors, each of which is taking turns
      executing pieces of several tasks

Example: When you are cooking Thanksgiving dinner, you
  . start the bread dough rising
  . put the turkey in the oven
  . punch down the dough and shape it
  . baste the turkey
  . put the rolls in the oven
  . take out the turkey and carve it
  . take the rolls out of the oven

Example: On a time-shared (multiuser) computer, the processor takes turns
giving all the jobs a little bit of processing time.

In these two examples, there is only one processing unit (the cook / the
single CPU), but the illusion is given that multiple tasks are being done
simultaneously.

Definition: PARALLELISM is the deployment of multiple processors on a
single task.  Some degree of cooperation is required, often overseen by
one program.

Example: You want to discover whether a large number n is prime (has no
factors other than 1 and itself).  Apply the "trial division" algorithm
(divide n by 2, then by 3, then by 4, ..., up to the square root of n,
and for each divisor check whether there is a remainder) in parallel:
one computer can do the checks for divisors 2 through 100, another for
101 through 200, etc.  Cooperation is needed to divide up the trial
divisors and to collate the answers, but the computations themselves are
independent and go on in parallel.

Definition: A DISTRIBUTED SYSTEM is a computer system that consists of
multiple computers, each with its own memory and each with one or more
processors, that reside at different sites.

Examples: an airline reservation system, a bank's automatic teller
machine system.

Usually no one program is "in charge".  Instead, control is distributed
to many programs that have to cooperate with one another on "equal"
footing.

ISSUES IN CONCURRENCY
---------------------

* Protection: make sure that one task does not interfere with another.
  Need to partition the memory storing the different tasks' data.  Need
  to make sure that only one task at a time is trying to use devices
  like printers.

* Fairness: guarantee that each task gets its "fair share" of CPU cycles
  and other resources (like the printer).  There are various ways to
  define what is fair.  One option is first-come, first-served, but this
  can make small jobs wait a long time behind big jobs.  So divide time
  into slices, and perhaps give higher priority to small jobs, at least
  at peak times.  Numerous schemes have been proposed and their
  performance analyzed.

* Deadlock: we already saw this issue with transactions.  Various
  schemes have been proposed and analyzed.

ISSUES IN PARALLELISM
---------------------

Using a parallel system will not reduce the total amount of work done.
Think about the example that uses the trial division algorithm to decide
whether a number is prime.  You still have to do about sqrt(n) divisions
to decide if n is prime (in the worst case).  It's just that instead of
doing the divisions one after the other, you do (at least some of) them
in parallel.  At one extreme, you could have sqrt(n) processors and each
one does one division.
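Here is a minimal sketch of the trial division example in Python (not
from the text): the chunk size of 100 mirrors the informal description
above, and the helper names, worker count, and test number are purely
illustrative.

    import math
    from concurrent.futures import ProcessPoolExecutor

    def has_divisor(args):
        # Check the trial divisors lo..hi; True if any of them divides n.
        n, lo, hi = args
        return any(n % d == 0 for d in range(lo, hi + 1))

    def is_prime_parallel(n, chunk=100, workers=4):
        if n < 2:
            return False
        limit = math.isqrt(n)
        # Divide up the trial divisors 2..sqrt(n) into chunks of 100.
        tasks = [(n, lo, min(lo + chunk - 1, limit))
                 for lo in range(2, limit + 1, chunk)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # Collate the answers: n is prime iff no chunk found a divisor.
            return not any(pool.map(has_divisor, tasks))

    if __name__ == "__main__":
        print(is_prime_parallel(1_000_003))

Notice that building the list of chunks and collating the workers'
answers is exactly the cooperation overhead discussed next.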
There will also be extra work to do: overhead to partition the problem
and to collate the results.  So, in fact, the total work done will
usually be *more* than in the sequential case.  Furthermore, the actual
number of processors available will almost never scale with the problem
size.  The number of processors is a property of the hardware, whereas
problem sizes can often grow arbitrarily large.

Dependencies
------------

Another issue that limits what you can do with parallelism is inherent
dependencies between different steps of an algorithm.  Sometimes you
cannot do step i+1 until you know the result of doing step i, and thus
you cannot do those two steps in parallel.

Example:
    Step 1: read a from the user
    Step 2: b := a*3
    Step 3: c := b*a

This 3-step algorithm will have to be done serially.  Dependencies can
be represented with a dependency graph: here Step 2 depends on Step 1,
and Step 3 depends on Steps 1 and 2, so the graph is a chain.  (Draw it.
Then draw the graph if Step 3 becomes "c := a*4"; now Step 3 depends
only on Step 1, so Steps 2 and 3 could run in parallel.)

Another example program:
    Step 1: read a
    Step 2: read b
    Step 3: c := a*4
    Step 4: d := b/3
    Step 5: e := c*d
    Step 6: f := d+8

Dependency graph: Step 3 depends on Step 1, Step 4 on Step 2, Step 5 on
Steps 3 and 4, and Step 6 on Step 4.  This indicates that two processors
is the most we can exploit here: using two processors will result in an
elapsed time of 3, vs. an elapsed time of 6 with one processor, but more
than two processors will not speed things up.

Yet another example:
    for i := 1 to MAX do
        Step 1: read a[i]
        Step 2: b[i] := a[i]+4
        Step 3: c[i] := a[i]/3
        Step 4: d[i] := b[i]/c[i]
    endfor

(The step numbering ignores the overhead required to manage the loop.)

Dependency graph for a single iteration of the for loop: Steps 2 and 3
each depend on Step 1, and Step 4 depends on Steps 2 and 3.  This
indicates we can reduce a single iteration from four time steps to three
by using two processors (Steps 2 and 3 in parallel).  But notice that
each iteration of the for loop is independent of every other iteration!
So if we have 2*MAX processors, we can compute the entire for loop in
three time steps.

Precedence:
-----------

Like dependency, precedence places a requirement on the relative order
in which two steps are executed, but the reason is different.  In a
dependency relationship, S1 must occur before S2 because S2 needs the
result of S1.  In a precedence relationship, S1 must occur before S2
because the execution of S2 would contaminate the data needed by S1.

Example:
    for x := 1 to 3 do
        Step 1: read a
        Step 2: print a
        Step 3: a := a*7
        Step 4: print a
    endfor

Step 2 must execute before Step 3 does; otherwise it will print the
wrong value.  We can indicate both dependency and precedence relations
in a single graph (dependency relations are indicated by an arrow, while
precedence relations are indicated by an arrow with a line through it).

Example:
    for i := 1 to n do
        Step 1: read a[i]
        Step 2: a[i] := a[i]*7
        Step 3: c := a[i]/3
        Step 4: print c
    endfor

Here's the graph: within an iteration, Step 2 depends on Step 1, Step 3
on Step 2, and Step 4 on Step 3; across iterations, the print of c in
iteration i must precede the assignment to c in iteration i+1.  We can
exploit at most two processors and finish the loop in 2n+2 time steps,
as opposed to 4n time steps with just one processor (the iterations can
be pipelined: while one processor does Steps 3 and 4 of iteration i, the
other does Steps 1 and 2 of iteration i+1).  The problem here is that
variable c is a bottleneck for parallelism, since there is a precedence
relation between the printing of c in iteration i and the assignment to
c in iteration i+1.  In this case, we can make c into an array as well
and speed things up: now each iteration is independent of every other
iteration, and we can use n processors to execute the entire loop in
four time steps.
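As a hedged illustration of that last transformation (not from the
text), here is a Python sketch in which c has become an array, so the
compute part of every iteration is independent and can be handed to a
parallel map.  The input list, worker count, and function names are just
illustrative stand-ins.

    from concurrent.futures import ProcessPoolExecutor

    def one_iteration(a_i):
        # Steps 2 and 3 of one iteration:  a[i] := a[i]*7 ;  c[i] := a[i]/3
        a_i = a_i * 7
        return a_i / 3

    if __name__ == "__main__":
        a = [5, 10, 15, 20]          # stands in for "read a[i]" (Step 1)
        with ProcessPoolExecutor(max_workers=4) as pool:
            c = list(pool.map(one_iteration, a))   # c is now an array too
        for c_i in c:                # Step 4: print c[i], in order
            print(c_i)

In this sketch the printing is done afterwards, in order, since a single
output device is itself a shared resource (compare the protection issue
under ISSUES IN CONCURRENCY above).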