6 Conclusion
6.1 Implementation
Our scan detector prototype is written in C and Lisp.
The most heavily used tools are written in C and can be accessed
via an ASCII interface using a Lisp-like syntax. This interpreter
combines the following tools:
- the PIP software for solving integer programming problems,
- the convex calculator from IRISA [17],
- a set of operations on systems of equations (mainly forward substitution).
For example, consider the system of equations below.
v1 : { i | 1 <= i <= n } of real ;
v2 : { i | 1 <= i <= n } of real ;
v1[i] = case
{ i | i = 1 } : b[0] + a[1] ;
{ i | i >= 2 } : v2[i-1] + a[i] ;
esac ;
v2[i] = case
{ i | i = 1 } : a[0] + b[1] ;
{ i | i >= 2 } : v1[i-1] + b[i] ;
esac ;
The following text asks the interpreter to substitute
the definition of v2 into the second clause of v1
(note the keyword subst in this clause):
(print_sys (system (n)
(equation v1 (i)
(clause (constraints (i n) (i = 1)) (add (aref b 0) (aref a 1)))
(clause (constraints (i n) (i >= 2) (i <= n))
(subst v2 (add (eref v2 (function (i n) ( - i 1) n)) (aref a i)) 0)))
(equation v2 (i)
(clause (constraints (i n) (i = 1) ) (add (aref a 0) (aref b 1)))
(clause (constraints (i n) (i >= 2) (i <= n))
(add (eref v1 (function (i1 n) ( - i1 1) n)) (aref b i))))))
The interpreter's answer is a bit hard to read because convex
domains are represented as matrices of constraints:
(system (n) (equation v1 (i)
(clause ((#[#[0 1 0 -1] #[1 0 0 1]].#[#[0 0 1 0] #[1 1 0 1]]))
(add (aref b 0) (aref a 1)))
(clause ((#[#[0 1 0 -2] #[1 0 1 -2] #[1 0 0 1]].#[#[1 0 1 0] #[1 2 2 1]]))
(add (add (aref a 0) (aref b 1)) (aref a i)))
(clause ((#[#[1 -1 1 0] #[1 1 0 -3] #[1 0 0 1]].#[#[1 1 1 0] #[1 0 1 0] #[1 3 3 1]]))
(add (add (eref v1 #[#[1 0 -2] #[0 1 0]]) (aref b (- i 1))) (aref a i))))
(equation v2 (i)
(clause ((#[#[0 1 0 -1] #[1 0 0 1]].#[#[0 0 1 0] #[1 1 0 1]]))
(add (aref a 0) (aref b 1)))
(clause ((#[#[1 1 0 -2] #[1 -1 1 0] #[1 0 0 1]].#[#[1 0 1 0] #[1 1 1 0] #[1 2 2 1]]))
(add (eref v1 #[#[1 0 -1] #[0 1 0]]) (aref b i)))))
Yet one can recognize the system of equations given explicitly below, which
is indeed the expected result.
v1[i] = case
{ i | i = 1 } : b[0] + a[1] ;
{ i | i = 2 } : a[0] + b[1] + a[2] ;
{ i | i >= 3 } : v1[i-2] + b[i-1] + a[i] ;
esac ;
v2[i] = case
{ i | i = 1 } : a[0] + b[1] ;
{ i | i >= 2 } : v1[i-1] + b[i] ;
esac ;
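The effect of this forward substitution can be checked numerically. The following Python sketch (illustrative only, not part of the prototype; the data a[0..n] and b[0..n] are hypothetical) computes v1 both from the original mutually recursive system and from the substituted one, and verifies that the values agree:

```python
# Original mutually recursive system: v1 and v2 defined in terms of each other.
def original(a, b, n):
    v1, v2 = {}, {}
    for i in range(1, n + 1):
        v1[i] = b[0] + a[1] if i == 1 else v2[i - 1] + a[i]
        v2[i] = a[0] + b[1] if i == 1 else v1[i - 1] + b[i]
    return v1, v2

# v1 after substituting the definition of v2 into its second clause.
def substituted(a, b, n):
    v1 = {}
    for i in range(1, n + 1):
        if i == 1:
            v1[i] = b[0] + a[1]
        elif i == 2:
            v1[i] = a[0] + b[1] + a[2]
        else:
            v1[i] = v1[i - 2] + b[i - 1] + a[i]
    return v1

n = 6
a = [float(k) for k in range(n + 1)]        # hypothetical input data
b = [float(2 * k + 1) for k in range(n + 1)]
v1_ref, _ = original(a, b, n)
v1_sub = substituted(a, b, n)
assert all(v1_ref[i] == v1_sub[i] for i in range(1, n + 1))
```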
Our current prototype has two main modules: one normalizes a
system at a given pseudo-depth, and one detects scans at a given
pseudo-depth. The canonical way of using these modules is to apply
normalization at the greatest pseudo-depth and then to apply the
detection module at the same pseudo-depth. If no scan is found, the
process may be iterated at a lower pseudo-depth, and so on. Most of the
operations are executed by the C interpreter; the only major
part coded in Lisp is the extraction of the binary operation
and the associated data from the propagation function of the
recurrences. At present, the rules presented in section
5 are not used in the prototype, not even the enhancement
of scans to multi-directional ones. Implementing these
rules will lead to a third module, whose aim is to enlarge the
accumulation domain of the detected scans, either by enhancement
or by combination of scans.
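The canonical iteration over pseudo-depths can be sketched as follows. This is a hypothetical Python driver: the names normalize_at and detect_scans_at are illustrative stand-ins for the two modules, not the prototype's actual entry points.

```python
# Hypothetical driver: normalize and detect at the greatest pseudo-depth,
# then retry at lower pseudo-depths until some scan is found.
def detect(system, max_pseudo_depth, normalize_at, detect_scans_at):
    for depth in range(max_pseudo_depth, -1, -1):
        normalized = normalize_at(system, depth)   # normalization module
        scans = detect_scans_at(normalized, depth)  # detection module
        if scans:
            return depth, scans
    return None, []

# Toy stand-ins, just to exercise the control flow:
found = detect("sys", 2,
               normalize_at=lambda s, d: s,
               detect_scans_at=lambda s, d: ["scan"] if d == 0 else [])
assert found == (0, ["scan"])
```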
Consider the following program extracted from the Argonne benchmarks
[2]:
      j = 0                          (v1)
      do 90 i = 1,n
        if (b(i).gt.x) then
          x = b(i)                   (v2)
          j = i                      (v3)
        endif
90    continue
      write(*,*) x                   (v4)
      write(*,*) j                   (v5)
The system found after applying the detection module at
pseudo-depth 0 (the normalization module is not needed here) is:
v2[i] = case
{ i | i >= 2 } :
Scan( { i' | 0<=i'<=n }, ( [1] ),
max, b[i'], x )[i] ;
{ i | i = 1 } :
if (b[1] > x) then b[1] else x ;
esac ;
v4 = v2[n] ;
v5 = Scan( { i' | 1<=i'<=n }, ( [1] ),
search, (b[i'] > v2[i'-1]) x i',
true x (if (b[1] > x)
then 1 else 0) )[n] ;
Both scans are detected: the one which computes the maximum of the
array b, and the one which computes the subscript of this maximum.
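The semantics of these two detected scans can be rendered as follows in Python. This is an illustrative sketch, not prototype output: a single pass over b (1-indexed conceptually, with x and j = 0 as initial values) yields both the running maximum and the "search" scan tracking its subscript.

```python
# Running maximum of b together with the subscript of the maximum so far.
def max_and_argmax_scans(b, x):
    v2, j = x, 0            # accumulators: current max and its subscript
    maxes, subscripts = [], []
    for i in range(1, len(b)):      # b[1..n]; b[0] is an unused placeholder
        if b[i] > v2:               # propagation step of both recurrences
            v2, j = b[i], i
        maxes.append(v2)            # Scan(..., max, b[i'], x)[i]
        subscripts.append(j)        # Scan(..., search, ...)[i]
    return maxes, subscripts

maxes, subs = max_and_argmax_scans([None, 3.0, 7.0, 5.0, 9.0, 1.0], x=0.0)
assert maxes[-1] == 9.0 and subs[-1] == 4
```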
The results for the other Argonne benchmarks can be found in [16].
6.2 Future Work
The present version of our software fulfills one of the aims we set
ourselves when we began this research: to build a much more powerful
scan detector than those found in current parallelizers and vectorizers.
We detect many more scans than other tools do, and with a very
small knowledge base. Our detector is almost insensitive to
variations in the syntax of the original loop nest. Last but not least, we are
able to recognize scans on arrays and arrays of scans. We believe our
software is the first to implement this facility.
This work is obviously just the beginning of the road toward efficient
implementations of scans and reductions. The first problem that suggests
itself is how to use the results of our analyzer. These results are not
an imperative program, but a set of recurrence equations embellished
with Scan operators. We still have to convert them back to
our object language, e.g. some sort of parallel or data-parallel Fortran.
We gave preliminary solutions and experimental results in [15],
but much more work is needed in that direction.
There are, however, other applications of scan detection besides parallelization.
One of them is program checking (or reverse engineering). After scan detection
--- and possibly some pretty printing --- a program is brought into a form which is
much nearer to mathematical notation than the original. It should be
easier to detect errors --- e.g. a summation which is short one term ---
on the mathematical representation than on the imperative version.
There is another, possibly much more interesting application:
algorithm recognition. Toward this aim, we need a complete set of
operators on scans, and some way of organizing them into a semblance
of a normalization algorithm, it being understood that a full normalization
algorithm is probably impossible. We could then compare the result to a
base of normal forms for standard algorithms, e.g. all those for
which we have an efficient implementation in our library. We could then replace
part of the original program by a call to the corresponding routine. If the user
cares to supply directives, we could even select a version which is well
adapted to the task at hand, for instance a vector, parallel,
or even sparse version.
This proposal is not as farfetched as it seems. Consider, for example,
the following code for matrix multiplication:
do 1 i=1,n
do 1 j=1,n
s=0. (v1)
do 2 k=1,n
2 s = s + a(i,k) * b(k,j) (v2)
1 c(i,j) = s (v3)
Our present scan detector translates this program into a very small
system of equations:
v4[i,j] = Scan( { i',j',k' | 1<=i'<=n, 1<=j'<=n, 0<=k'<=n },
( [0 0 1] ), +, a[i',k'] * b[k',j'], 0 )[i,j,n] ;
Besides, it seems probable --- but this has to be investigated ---
that most variants of this code (e.g. all those obtained by permuting
the loops) will normalize to the same or to very similar equations.
It thus seems that we are at the threshold of being able to recognize
simple algorithms from linear algebra.
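The correspondence between the loop nest and the Scan equation can be made concrete with a small Python sketch (illustrative only): the innermost k-loop is a +-scan along direction [0 0 1], and c(i,j) is the last element of that scan, exactly as the [i,j,n] subscript in the equation above states.

```python
# Matrix multiplication with the k-loop written as an explicit +-scan.
def matmul_via_scan(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0                        # initial value of the scan
            for k in range(n):
                s = s + a[i][k] * b[k][j]  # one scan step along direction k
            c[i][j] = s                    # keep the scan's final element
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
assert matmul_via_scan(a, b, 2) == [[19.0, 22.0], [43.0, 50.0]]
```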
To achieve this ``semantic parallelization'', a whole algebra
must be built around the Scan operator. The basis of this algebra
is given in section 5. We must now find a way to
apply its rules to systems of equations in order to obtain a
better normalization. This will be the subject of future work.