6 Conclusion
6.1 Implementation
Our scan detector prototype is written in C and Lisp.
The most heavily used tools are written in C and can be accessed
via an ASCII interface using a Lisp-like syntax. This interpreter
combines the following tools:
- the PIP software for solving integer programming problems,
- the convex calculator from IRISA [17],
- a set of operations on systems of equations (mainly forward substitution).
For example, consider the system of equations below.
v1 : { i | 1 <= i <= n } of real ;
v2 : { i | 1 <= i <= n } of real ;
v1[i] = case
{ i | i = 1 } : b[0] + a[1] ;
{ i | i >= 2 } : v2[i-1] + a[i] ;
esac ;
v2[i] = case
{ i | i = 1 } : a[0] + b[1] ;
{ i | i >= 2 } : v1[i-1] + b[i] ;
esac ;
The following text asks the interpreter to substitute
the definition of v2 into the second clause of v1
(note the keyword subst in this clause):
(print_sys (system (n)
(equation v1 (i)
(clause (constraints (i n) (i = 1)) (add (aref b 0) (aref a 1)))
(clause (constraints (i n) (i >= 2) (i <= n))
(subst v2 (add (eref v2 (function (i n) ( - i 1) n)) (aref a i)) 0)))
(equation v2 (i)
(clause (constraints (i n) (i = 1) ) (add (aref a 0) (aref b 1)))
(clause (constraints (i n) (i >= 2) (i <= n))
(add (eref v1 (function (i1 n) ( - i1 1) n)) (aref b i))))))
The interpreter's answer is a bit hard to read because convex
domains are represented as matrices of constraints:
(system (n) (equation v1 (i)
(clause ((#[#[0 1 0 -1] #[1 0 0 1]].#[#[0 0 1 0] #[1 1 0 1]]))
(add (aref b 0) (aref a 1)))
(clause ((#[#[0 1 0 -2] #[1 0 1 -2] #[1 0 0 1]].#[#[1 0 1 0] #[1 2 2 1]]))
(add (add (aref a 0) (aref b 1)) (aref a i)))
(clause ((#[#[1 -1 1 0] #[1 1 0 -3] #[1 0 0 1]].#[#[1 1 1 0] #[1 0 1 0] #[1 3 3 1]]))
(add (add (eref v1 #[#[1 0 -2] #[0 1 0]]) (aref b (- i 1))) (aref a i))))
(equation v2 (i)
(clause ((#[#[0 1 0 -1] #[1 0 0 1]].#[#[0 0 1 0] #[1 1 0 1]]))
(add (aref a 0) (aref b 1)))
(clause ((#[#[1 1 0 -2] #[1 -1 1 0] #[1 0 0 1]].#[#[1 0 1 0] #[1 1 1 0] #[1 2 2 1]]))
(add (eref v1 #[#[1 0 -1] #[0 1 0]]) (aref b i)))))
Yet one can recognize the system of equations given explicitly below, which
is indeed the expected result.
v1[i] = case
{ i | i = 1 } : b[0] + a[1] ;
{ i | i = 2 } : a[0] + b[1] + a[2] ;
{ i | i >= 3 } : v1[i-2] + b[i-1] + a[i] ;
esac ;
v2[i] = case
{ i | i = 1 } : a[0] + b[1] ;
{ i | i >= 2 } : v1[i-1] + b[i] ;
esac ;
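The effect of this forward substitution can be checked numerically. The following Python sketch (illustrative only, not part of the prototype; the data a[0..n] and b[0..n] are hypothetical) computes v1 both from the original mutually recursive system and from the substituted one, and verifies that the values agree:

```python
# Original mutually recursive system: v1 and v2 defined in terms of each other.
def original(a, b, n):
    v1, v2 = {}, {}
    for i in range(1, n + 1):
        v1[i] = b[0] + a[1] if i == 1 else v2[i - 1] + a[i]
        v2[i] = a[0] + b[1] if i == 1 else v1[i - 1] + b[i]
    return v1, v2

# v1 after substituting the definition of v2 into its second clause.
def substituted(a, b, n):
    v1 = {}
    for i in range(1, n + 1):
        if i == 1:
            v1[i] = b[0] + a[1]
        elif i == 2:
            v1[i] = a[0] + b[1] + a[2]
        else:
            v1[i] = v1[i - 2] + b[i - 1] + a[i]
    return v1

n = 6
a = [float(k) for k in range(n + 1)]        # hypothetical input data
b = [float(2 * k + 1) for k in range(n + 1)]
v1_ref, _ = original(a, b, n)
v1_sub = substituted(a, b, n)
assert all(v1_ref[i] == v1_sub[i] for i in range(1, n + 1))
```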
Our current prototype has two main modules: one normalizes a
system at a given pseudo-depth, and one detects scans at a given
pseudo-depth. The canonical way of using these modules is to apply
normalization at the greatest pseudo-depth and then to apply the
detection module at the same pseudo-depth. If no scan is found, the
process may be iterated at a lower pseudo-depth, and so on. Most of the
operations are executed by the C interpreter; the only major
part coded in Lisp is the extraction of the binary operation
and the associated data from the propagation function of the
recurrences. At present, the rules presented in section
5 are not used in the prototype, not even the enhancement
of scans to multi-directional ones. Implementing these
rules will lead to a third module, whose aim is to enlarge the
accumulation domain of the detected scans, either by enhancement
or by combination of scans.
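The canonical iteration over pseudo-depths can be sketched as follows. This is a hypothetical Python driver: the names normalize_at and detect_scans_at are illustrative stand-ins for the two modules, not the prototype's actual entry points.

```python
# Hypothetical driver: normalize and detect at the greatest pseudo-depth,
# then retry at lower pseudo-depths until some scan is found.
def detect(system, max_pseudo_depth, normalize_at, detect_scans_at):
    for depth in range(max_pseudo_depth, -1, -1):
        normalized = normalize_at(system, depth)   # normalization module
        scans = detect_scans_at(normalized, depth)  # detection module
        if scans:
            return depth, scans
    return None, []

# Toy stand-ins, just to exercise the control flow:
found = detect("sys", 2,
               normalize_at=lambda s, d: s,
               detect_scans_at=lambda s, d: ["scan"] if d == 0 else [])
assert found == (0, ["scan"])
```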
Consider the following program extracted from the Argonne benchmarks
[2]:
      j = 0                          (v1)
      do 90 i = 1,n
        if (b(i).gt.x) then
          x = b(i)                   (v2)
          j = i                      (v3)
        endif
90    continue
      write(*,*) x                   (v4)
      write(*,*) j                   (v5)
The system found after applying the detection module at
pseudo-depth 0 (the normalization module is not needed here) is:
v2[i] = case
{ i | i >= 2 } :
Scan( { i' | 0<=i'<=n }, ( [1] ),
max, b[i'], x )[i] ;
{ i | i = 1 } :
if (b[1] > x) then b[1] else x ;
esac ;
v4 = v2[n] ;
v5 = Scan( { i' | 1<=i'<=n }, ( [1] ),
search, (b[i'] > v2[i'-1]) x i',
true x (if (b[1] > x)
then 1 else 0) )[n] ;
Both scans are detected: the one which computes the maximum of the
array b, and the one which computes the subscript of this maximum.
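The semantics of these two detected scans can be rendered as follows in Python. This is an illustrative sketch, not prototype output: a single pass over b (1-indexed conceptually, with x and j = 0 as initial values) yields both the running maximum and the "search" scan tracking its subscript.

```python
# Running maximum of b together with the subscript of the maximum so far.
def max_and_argmax_scans(b, x):
    v2, j = x, 0            # accumulators: current max and its subscript
    maxes, subscripts = [], []
    for i in range(1, len(b)):      # b[1..n]; b[0] is an unused placeholder
        if b[i] > v2:               # propagation step of both recurrences
            v2, j = b[i], i
        maxes.append(v2)            # Scan(..., max, b[i'], x)[i]
        subscripts.append(j)        # Scan(..., search, ...)[i]
    return maxes, subscripts

maxes, subs = max_and_argmax_scans([None, 3.0, 7.0, 5.0, 9.0, 1.0], x=0.0)
assert maxes[-1] == 9.0 and subs[-1] == 4
```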
The results for the other Argonne benchmarks can be found in [16].
6.2 Future Work
The present version of our software fulfills one of the aims we set
ourselves when we began this research: to build a much more powerful
scan detector than those found in current parallelizers and vectorizers.
We detect many more scans than other tools do, and with a very
small knowledge base. Our detector is almost insensitive to
variations in the syntax of the original loop nest. Last but not least, we are
able to recognize scans on arrays and arrays of scans. We believe our
software is the first to implement this facility.
This work is obviously just the beginning of the road toward efficient
implementations of scans and reductions. The first problem that suggests
itself is how to use the results of our analyzer. These results are not
an imperative program, but a set of recurrence equations embellished
with Scan operators. We still have to convert them back to
our object language, e.g. some sort of parallel or data-parallel Fortran.
We gave preliminary solutions and experimental results in [15],
but much more work is needed in that direction.
There are, however, other applications of scan detection besides parallelization.
One of them is program checking (or reverse engineering). After scan detection
--- and possibly some pretty printing --- a program is brought into a form which is
much nearer to mathematical notation than the original. It should be
easier to detect errors --- e.g. a summation which is short one term ---
on the mathematical representation than on the imperative version.
There is another, possibly much more interesting application:
algorithm recognition. Toward this aim, we need a complete set of
operators on scans, and some way of organizing them into a semblance
of a normalization algorithm, it being understood that a full normalization
algorithm is probably impossible. We could then compare the result to a
base of normal forms for standard algorithms, e.g. all those for
which we have an efficient implementation in our library. We could then replace
part of the original program by a call to the corresponding routine. If the user
cares to supply directives, we could even select a version which is well
adapted to the task at hand, for instance a vector, parallel,
or even sparse version.
This proposal is not as farfetched as it seems. Consider, for example,
the following code for matrix multiplication:
do 1 i=1,n
do 1 j=1,n
s=0. (v1)
do 2 k=1,n
2 s = s + a(i,k) * b(k,j) (v2)
1 c(i,j) = s (v3)
Our present scan detector translates this program into a very small
system of equations:
v4[i,j] = Scan( { i',j',k' | 1<=i'<=n, 1<=j'<=n, 0<=k'<=n },
( [0 0 1] ), +, a[i',k'] * b[k',j'], 0 )[i,j,n] ;
Besides, it seems probable --- but this has to be investigated ---
that most variants of this code (e.g. all those obtained by permuting
the loops) will normalize to the same or to very similar equations.
It thus seems that we are at the threshold of being able to recognize
simple algorithms from linear algebra.
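The correspondence between the loop nest and the Scan equation can be made concrete with a small Python sketch (illustrative only): the innermost k-loop is a +-scan along direction [0 0 1], and c(i,j) is the last element of that scan, exactly as the [i,j,n] subscript in the equation above states.

```python
# Matrix multiplication with the k-loop written as an explicit +-scan.
def matmul_via_scan(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0                        # initial value of the scan
            for k in range(n):
                s = s + a[i][k] * b[k][j]  # one scan step along direction k
            c[i][j] = s                    # keep the scan's final element
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
assert matmul_via_scan(a, b, 2) == [[19.0, 22.0], [43.0, 50.0]]
```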
To achieve this ``semantic parallelization'', a whole algebra
must be built around the Scan operator. The basis of this algebra
is given in section 5. We must now find a way to
apply its rules to systems of equations in order to obtain a
better normalization. This will be the subject of future work.