Notes on Concurrency and Parallelism
based on Chapter 10 of "Introduction to Computing and Algorithms,"
Russell L. Shackelford, Addison-Wesley, 1998.

DEFINITIONS
-----------

Definition: CONCURRENCY is the execution of multiple tasks "at the same
time".  "At the same time" can have several interpretations:

  (1) there is a single processor (CPU) that is rapidly taking turns and
      doing little pieces of each task in turn
  (2) there are several processors, and each one is devoted to one task,
      executing at the same time
  (3) there are several processors, each of which is taking turns
      executing pieces of several tasks

Example: When you are cooking Thanksgiving dinner, you
  . start the bread dough rising
  . put the turkey in the oven
  . punch down the dough and shape it
  . baste the turkey
  . put the rolls in the oven
  . take out the turkey and carve it
  . take the rolls out of the oven

Example: On a time-shared (multiuser) computer, the processor takes turns
giving all the jobs a little bit of processing time.

In these two examples, there is only one processing unit (the cook / the
single CPU), but the illusion is given that multiple tasks are being done
simultaneously.

Definition: PARALLELISM is the deployment of multiple processors on a
single task.  Some degree of cooperation is required, often overseen by
one program.

Example: You want to discover whether a large number n is prime (has no
factors other than 1 and itself).  Apply the "trial division" algorithm
(divide n by 2, then by 3, then by 4, ..., up to the square root of n,
and for each divisor check whether there is a remainder) in parallel:
one computer can do the checks for divisors 2 through 100, another for
101 through 200, etc.  Cooperation is needed to divide up the trial
divisors and to collate the answers, but the computations themselves are
independent and go on in parallel.

Definition: A DISTRIBUTED SYSTEM is a computer system that consists of
multiple computers, each with its own memory and each with one or more
processors, that reside at different sites.

Examples: an airline reservation system, a bank's automatic teller
machine system.

Usually no one program is "in charge".  Instead, control is distributed
to many programs that have to cooperate with one another on "equal"
footing.

ISSUES IN CONCURRENCY
---------------------

* Protection: make sure that one task does not interfere with another.
  Need to partition the memory storing the different tasks' data.  Need
  to make sure that only one task at a time is trying to use devices
  like printers.

* Fairness: guarantee that each task gets its "fair share" of CPU cycles
  and other resources (like the printer).  There are various ways to
  define what is fair.  One option is first-come, first-served, but this
  can make small jobs wait a long time behind big jobs.  So divide time
  into slices, and perhaps give higher priority to small jobs, at least
  at peak times.  Numerous schemes have been proposed and their
  performance analyzed.

* Deadlock: we already saw this issue with transactions.  Various
  schemes have been proposed and analyzed.

ISSUES IN PARALLELISM
---------------------

Using a parallel system will not reduce the total amount of work done.
Think about the example that uses the trial division algorithm to decide
whether a number is prime.  You still have to do about sqrt(n) divisions
to decide if n is prime (in the worst case).  It's just that instead of
doing the divisions one after the other, you do (at least some of) them
in parallel.  At one extreme, you could have sqrt(n) processors and each
one does one division.
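Here is a minimal sketch of the trial division example in Python (not
from the text): the chunk size of 100 mirrors the informal description
above, and the helper names, worker count, and test number are purely
illustrative.

    import math
    from concurrent.futures import ProcessPoolExecutor

    def has_divisor(args):
        # Check the trial divisors lo..hi; True if any of them divides n.
        n, lo, hi = args
        return any(n % d == 0 for d in range(lo, hi + 1))

    def is_prime_parallel(n, chunk=100, workers=4):
        if n < 2:
            return False
        limit = math.isqrt(n)
        # Divide up the trial divisors 2..sqrt(n) into chunks of 100.
        tasks = [(n, lo, min(lo + chunk - 1, limit))
                 for lo in range(2, limit + 1, chunk)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            # Collate the answers: n is prime iff no chunk found a divisor.
            return not any(pool.map(has_divisor, tasks))

    if __name__ == "__main__":
        print(is_prime_parallel(1_000_003))

Notice that building the list of chunks and collating the workers'
answers is exactly the cooperation overhead discussed next.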
There will also be extra work to do: overhead to partition the problem
and to collate the results.  So, in fact, the total work done will
usually be *more* than in the sequential case.  Furthermore, the actual
number of processors available will almost never scale with the problem
size.  The number of processors is a property of the hardware, whereas
problem sizes can often grow arbitrarily large.

Dependencies
------------

Another issue that limits what you can do with parallelism is inherent
dependencies between different steps of an algorithm.  Sometimes you
cannot do step i+1 until you know the result of doing step i, and thus
you cannot do those two steps in parallel.

Example:
    Step 1: read a from the user
    Step 2: b := a*3
    Step 3: c := b*a

This 3-step algorithm will have to be done serially.  Dependencies can
be represented with a dependency graph: here Step 2 depends on Step 1,
and Step 3 depends on Steps 1 and 2, so the graph is a chain.  (Draw it.
Then draw the graph if Step 3 becomes "c := a*4"; now Step 3 depends
only on Step 1, so Steps 2 and 3 could run in parallel.)

Another example program:
    Step 1: read a
    Step 2: read b
    Step 3: c := a*4
    Step 4: d := b/3
    Step 5: e := c*d
    Step 6: f := d+8

Dependency graph: Step 3 depends on Step 1, Step 4 on Step 2, Step 5 on
Steps 3 and 4, and Step 6 on Step 4.  This indicates that two processors
is the most we can exploit here: using two processors will result in an
elapsed time of 3, vs. an elapsed time of 6 with one processor, but more
than two processors will not speed things up.

Yet another example:
    for i := 1 to MAX do
        Step 1: read a[i]
        Step 2: b[i] := a[i]+4
        Step 3: c[i] := a[i]/3
        Step 4: d[i] := b[i]/c[i]
    endfor

(The step numbering ignores the overhead required to manage the loop.)

Dependency graph for a single iteration of the for loop: Steps 2 and 3
each depend on Step 1, and Step 4 depends on Steps 2 and 3.  This
indicates we can reduce a single iteration from four time steps to three
by using two processors (Steps 2 and 3 in parallel).  But notice that
each iteration of the for loop is independent of every other iteration!
So if we have 2*MAX processors, we can compute the entire for loop in
three time steps.

Precedence:
-----------

Like dependency, precedence places a requirement on the relative order
in which two steps are executed, but the reason is different.  In a
dependency relationship, S1 must occur before S2 because S2 needs the
result of S1.  In a precedence relationship, S1 must occur before S2
because the execution of S2 would contaminate the data needed by S1.

Example:
    for x := 1 to 3 do
        Step 1: read a
        Step 2: print a
        Step 3: a := a*7
        Step 4: print a
    endfor

Step 2 must execute before Step 3 does; otherwise it will print the
wrong value.  We can indicate both dependency and precedence relations
in a single graph (dependency relations are indicated by an arrow, while
precedence relations are indicated by an arrow with a line through it).

Example:
    for i := 1 to n do
        Step 1: read a[i]
        Step 2: a[i] := a[i]*7
        Step 3: c := a[i]/3
        Step 4: print c
    endfor

Here's the graph: within an iteration, Step 2 depends on Step 1, Step 3
on Step 2, and Step 4 on Step 3; across iterations, the print of c in
iteration i must precede the assignment to c in iteration i+1.  We can
exploit at most two processors and finish the loop in 2n+2 time steps,
as opposed to 4n time steps with just one processor (the iterations can
be pipelined: while one processor does Steps 3 and 4 of iteration i, the
other does Steps 1 and 2 of iteration i+1).  The problem here is that
variable c is a bottleneck for parallelism, since there is a precedence
relation between the printing of c in iteration i and the assignment to
c in iteration i+1.  In this case, we can make c into an array as well
and speed things up: now each iteration is independent of every other
iteration, and we can use n processors to execute the entire loop in
four time steps.
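As a hedged illustration of that last transformation (not from the
text), here is a Python sketch in which c has become an array, so the
compute part of every iteration is independent and can be handed to a
parallel map.  The input list, worker count, and function names are just
illustrative stand-ins.

    from concurrent.futures import ProcessPoolExecutor

    def one_iteration(a_i):
        # Steps 2 and 3 of one iteration:  a[i] := a[i]*7 ;  c[i] := a[i]/3
        a_i = a_i * 7
        return a_i / 3

    if __name__ == "__main__":
        a = [5, 10, 15, 20]          # stands in for "read a[i]" (Step 1)
        with ProcessPoolExecutor(max_workers=4) as pool:
            c = list(pool.map(one_iteration, a))   # c is now an array too
        for c_i in c:                # Step 4: print c[i], in order
            print(c_i)

In this sketch the printing is done afterwards, in order, since a single
output device is itself a shared resource (compare the protection issue
under ISSUES IN CONCURRENCY above).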