1   Introduction

1.1   Motivation for Scan Detection

Most automatic parallelizers, whether industrial or academic, are based on the detection of independent operations. This is a purely syntactic process, which relies on identifying the read and modified sets of memory cells and then checking the Bernstein conditions. There is another source of parallelism, which relies on semantic information, namely the detection of reductions. A reduction is the application of an associative binary operator to a vector of values, the result being a single scalar. Reductions are quite frequent in scientific codes and are implemented efficiently on most parallel computers. Scans are similar to reductions, but the intermediate results are kept, thus giving a vector. Since there is no single rule for detecting associative operators, scan detection must use pattern matching. Our method is based on a representation of the program as a system of recurrence equations. This allows the detection of reductions in loop nests of arbitrary depth and on multi-dimensional arrays. Scans and reductions are represented in symbolic form with the help of the Scan operator. There are many relations between scans, which allow the design of a Scan "algebra". Some of the rules of this algebra are presented here, with examples showing their use in program normalization. Development of this algebra may lead in the future to algorithm recognition.
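The distinction between a reduction and a scan can be illustrated by a minimal sketch (illustrative only; the function names are not part of the PAF system, and addition stands in for an arbitrary associative operator):

```python
def reduce_add(values):
    """Reduction: apply the associative operator (+) across the
    vector, producing a single scalar."""
    acc = 0
    for x in values:
        acc = acc + x
    return acc

def scan_add(values):
    """Scan: same computation, but every intermediate result is
    kept, producing a vector (here, the prefix sums)."""
    acc = 0
    out = []
    for x in values:
        acc = acc + x
        out.append(acc)
    return out
```

Because the operator is associative, both forms can be evaluated in parallel (e.g. by a tree of partial sums), which is why their detection is a source of parallelism.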

1.2   The Dataflow Graph

In the area of automatic parallelization, a very useful structure is the DataFlow Graph (DFG). This structure is more accurate than the classical Dependence Graph (DG), which records the pairs of statements in memory conflict. Some memory conflicts are due only to the reuse of memory cells; such a conflict may be satisfied either by enforcing sequential execution, or by avoiding memory reuse through expansion. To find more parallelism, a compiler should base its analysis on the data flow.

In fact the DFG gives, for each reference to a scalar or array element in an operation, the source operation, that is, the operation in which the scalar or the array element was computed. The PAF parallelizer builds such a DFG. To be more precise, for a set of operations ⟨i, v⟩, v ∈ D, PAF computes a Quasi Affine Selection Tree (QuAST) of source operations. This is necessary because the source operation depends on v. Each condition of the selection tree is a term on loop counters and constants, using additions, multiplications by a constant, and Euclidean divisions by a constant. In the context of the PAF compiler, the DFG can be computed only for static control programs, which have only assignment statements and DO loops with linear subscripts and bounds.

Consider, for example, the following code:
DO i=1,n
 s(i)=a(i)*b(i)     (v1)
END DO
s(0)=0              (v2)
DO i=1,n
 s(i)=s(i-1)+s(i)   (v3)
END DO
Its Dataflow Graph is:
Source of s(i-1) in v3:
   * if [ i-2>=0 ] then (v3,i-1)
   * if [ i-1=0 ] then (v2)
Source of s(i) in v3:
   * if [ true ] then (v1,i)
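A QuAST can be read as a small decision procedure: given the iteration vector, it selects the source operation by testing affine conditions. The entry for s(i-1) above can be sketched as follows (a reading aid only, not PAF output; the function name is invented, and operations are encoded as tuples of a statement label and an iteration):

```python
def source_of_s_im1(i):
    """Evaluate the QuAST giving the source of the reference
    s(i-1) in statement v3, at iteration i."""
    if i - 2 >= 0:
        # Previous iteration of v3 wrote s(i-1).
        return ("v3", i - 1)
    if i - 1 == 0:
        # First iteration: s(0) was written by statement v2.
        return ("v2",)
```

For i = 1 it selects (v2), and for every later iteration it selects (v3, i-1), exactly as in the listing above.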
The DFG allows the translation of a static control program into a single assignment program (see 3.1). In such a program there is no memory cell reuse. An important property of single assignment programs is that a reference to a memory cell can be replaced by its definition: this is the usual notion of forward substitution in systems of equations. Another method for computing information equivalent to the DFG has been devised by Pugh and Wonnacott [18].
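As a sketch of what this translation yields for the example program, the following Python version gives each statement its own array, so every cell is written exactly once and each read is replaced by its source as given by the DFG (0-based indexing; array names s1 and s3 are invented for the expanded instances of s):

```python
def single_assignment_example(a, b, n):
    """Single assignment form of the example: no memory cell
    is written twice, so references equal their definitions."""
    # v1: each s1[i] is written once.
    s1 = [a[i] * b[i] for i in range(n)]
    # v2/v3: the read of s(i-1) is resolved by the DFG —
    # the constant 0 (v2) when i-1 = 0, else s3[i-1] (v3).
    s3 = [0] * n
    for i in range(n):
        prev = 0 if i == 0 else s3[i - 1]
        s3[i] = prev + s1[i]
    return s3
```

In this form the loop is exactly the recurrence s3(i) = s3(i-1) + s1(i), i.e. a scan with operator +, which is what the detection method of this paper recovers symbolically.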
