
6   Conclusion

6.1   Implementation

Our scan detector prototype is written in C and Lisp. The most heavily used tools are written in C and can be accessed through an ASCII interface with a Lisp-like syntax. The current prototype comprises two main modules: one for normalizing a system at a given pseudo-depth and one for detecting scans at a given pseudo-depth. The canonical way of using these modules is to apply normalization at the greatest pseudo-depth and then to apply the detection module at the same pseudo-depth. If no scan is found, the process may be iterated at a lower pseudo-depth, and so on. Most of the operations are executed by the C interpreter; the only major part coded in Lisp is the extraction of the binary operation and the associated data from the propagation function of recurrences. In the present version of the software, the rules for detecting multi-directional scans have not yet been implemented; their implementation will lead to the creation of a third module.

Consider the following extract from a real world program:
DO i=1,n
  b(i)=a(i)
  a(i-1)=a(i-1)+b(i)
  a(i)=a(i)+b(i)
  a(i+1)=a(i+1)+b(i)
END DO
print *,(a(i),i=1,n)

 
The system found after applying the normalization and detection modules at pseudo-depth 0 is:
x[i] =
   case
      i | i=1  : a[1]+a[2] ;  
      i | i>=2  :
         Scan(  i' | 0<=i'<=n , ( [1] ),
               +, a[i'+1], a[1] ) [i] ;
   esac ;

 
print
   case
      i | i=1  : a[1] + a[1] + x[1] ;
      i | 2<=i<=n-1  : x[i-1] + x[i-1] + x[i] ;
      i | i=n  : x[n-1] + x[n-1] ;
   esac ;

 
A scan hidden by manipulations of the elements of array a is detected. Results for other examples (mostly from the Argonne benchmarks [3]) can be found in [19]. All but one of the 18 loops classified as reductions in the Argonne benchmarks can be solved by our software; for the remaining one, the DFG analysis cannot be done because of a goto in the main loop.

6.2   Future Work

The present version of our software fulfills one of the aims we set ourselves when we began this research: to build a much more powerful scan detector than those found in current parallelizers and vectorizers. We detect many more scans than other tools do, and with a very small knowledge base. Our detector is almost insensitive to variations in the syntax of the original loop nest. Last but not least, we are able to recognize scans on arrays and arrays of scans. We believe our software is the first to implement this facility.

This work is obviously just the beginning on the road toward efficient implementations of scans and reductions. The first problem that suggests itself is how to use the results of our analyzer. These results are not an imperative program, but a set of recurrence equations embellished with Scan operators. We still have to convert them back to our object language, e.g. some sort of parallel or data-parallel Fortran. We gave preliminary solutions and experimental results in [18], but much more work is needed in that direction.

There are, however, other applications of scan detection besides parallelization. One of them is program checking (or reverse engineering). After scan detection --- and possibly some pretty-printing --- a program is brought into a form which is much nearer to mathematical notation than the original. It should be easier to detect errors --- e.g. a summation which is short one term --- in the mathematical representation than in the imperative version.

There is another, possibly much more interesting application: algorithm recognition. Toward this aim, we need a complete set of operators on scans, and some way of organizing them into a semblance of a normalization algorithm, it being understood that a full normalization algorithm is probably impossible. We could then compare the result to a base of normal forms for standard algorithms, e.g. all those for which we have an efficient implementation in our library, and replace part of the original program by a call to the corresponding routine. If the user cares to supply directives, we could even select a version well adapted to the task at hand, for instance a vector or parallel version, or even a sparse version.

This proposal is not as far-fetched as it seems. Consider, for example, the following code for matrix multiplication:
      do 1 i=1,n
        do 1 j=1,n
          s=0.
          do 2 k=1,n
   2        s = s + a(i,k) * b(k,j)
   1      c(i,j) = s

 
Our present scan detector translates this program into one recurrence equation:
c[i,j] = Scan(  i',j',k' | 1<=i'<=n, 1<=j'<=n, 0<=k'<=n ,
               ( [0 0 1] ), +, a[i',k'] * b[k',j'], 0 )[i,j,n] ;

 
Besides, it seems probable --- though this remains to be investigated --- that most variants of this code (e.g. all those obtained by permuting the loops) will normalize to the same or to very similar equations. It thus seems that we are at the threshold of being able to recognize simple algorithms from linear algebra.

To achieve this ``semantic parallelization'', a whole algebra must be built around the Scan operator, and we must find a way to translate its rules into a normalization algorithm. This will be the subject of future work.

