Previous Contents Next

5   Scan Algebra

The Scan operator obeys several identities which can be used as transformation rules. These transformations include change of basis, combination between several operators, normalization of data and more. In this section we present some of these transformations and show their usefulness on examples . Our long term aim is to apply, in an automatic way, these transformations on systems of equations with Scan operators to obtain a normal form. The problem of deciding if two programs are equivalent is undecidable, but it may be possible to detect cases where expressions including Scan operators are equivalent.

5.1   Combination of Scans

It may be interesting to modify the scan directions, for instance to align these directions on the canonical directions of the parallel machine geometry. Indeed some scan primitives work only along these directions. The following theorem 5 show under which conditions such a change of basis can take place.

Theorem 5  [Change of basis for the Scan operator]   Let T be an affine one-to-one transformation. We denote by Tl the linear transformation T-T(0). The following equivalence stands:

Scan( D ,(ei)
 
iÎNk*
,b,d,g)
º
Scan(T( D ),(Tl(ei))
 
iÎNk*
,b,d° T-1, g° T-1)° T  .

Proof:
One may remark that, for a one-to-one transformation T, the expression

min
T( D)
 
Tl(e1),...,Tl(ek)
(T(z))  ,

is equivalent to
T æ
ç
ç
è
min
D
 
e1,...,ek
(z) ö
÷
÷
ø
 .
By applying this identity on the definition of the array computed by the first Scan operator, one obtain the definition of the array computed by the second Scan operator.
The previous theorem proves itself very useful for scan combination, i.e. for merging scans operators present in an expression into a single Scan operator. In the sequel of this section we show that, under some conditions, an expression of the form Scan(...)o Scan(...) (with o a classical operator like addition) can be rewritten as a single Scan operator. At some point in the proof we need to enlarge the accumulation domain of the scans. Enlarging the accumulation domain increase the number of operations needed to compute a scan and hence is not a recommended operation. But when carefully done, this operation does not change the scan result and is useful in scan rewriting processes.

Lemma 6  [Enlarging an accumulation domain]   Let S be a scan expression of the following form:
Scan( D ,(ei)
 
iÎ{1,...,k}
,b,d,g)  .
The binary operator b is supposed to have a neutral element ^. Let D' be a convex including D, a scan expression S' with D' as accumulation domain may be built such that the restriction of its results on D is equal to S. The scan expression S' is:
Scan( D ',(ei)
 
iÎ{1,...,k}
,b,d',g')  .
with
Proof:
If two integer vectors z and z-1 of D are such that z-1 is the immediate predecessor of z with respect to the directions of the scan S, then z-1 is a predecessor of z with respect to the directions of the scan S'. Moreover, between z and z-1 only integer vectors of D'- D can be encountered. These vectors are associated to the neutral element for b, hence the two scan expressions S and S' give the same result for integer vectors of D.
Corollary 7  [Scan with Neutral Initial Values]   From the preceding theorem it is clear that one can always obtain, from any Scan, a Scan with initial values ^. It suffices to enlarge the accumulation domain D of the scan to the convex hull of DÈmine1,...,ek D( D).

The previous theorems for changing basis and enlarging domain give the necessary background to combine Scan operators of a clause expression. The method to combine these operators consists in applying to them the change of basis in order to obtain scans computed along compatible paths. Then it is possible to enlarge the Scan domains to the convex hull of their union. At this point the Scan operators can be merged, the resulting Scan operator working on vectors of data. Moreover, to obtain the same multi-dimensional array a final stage of general communications is necessary. This superposition of Scan operators is not very interesting from a practical point of view. Indeed the merging is only a syntactical trick and the resulting Scan is more difficult to compute on a parallel machine (these machines does not provide optimized primitives to compute complex scans on vectors and a general communication is a high cost operation). But sometime the merging can lead to a simpler (in term of computation complexity) Scan expression. The following theorem illustrate such a situation.

Theorem 8  [Combination of Scans]   Let S be an expression of the following form:
Scan( D 1,(ei1)
 
iÎ{1,...,k1}
,o,d1,g1)°f1 o Scan( D 2,(ei2)
 
iÎ{1,...,k2}
,o,d2,g2)°f2  .
We denote by y1 and y2 the linear transformations f1-f1(0) and f2-f2(0).

Provided that the following conditions are fulfilled
then expression S is rewritable as a single scan expression.
Proof:
After a change of basis and an accumulation domain enlarging the expression is of the form
Scan( D ,(ei)
 
iÎ{1,...,k}
,o,d1',g1') o Scan( D ,(ei)
 
iÎ{1,...,k}
,o,d2',g2')  .
Because of the commutativity of o it is valid to merge the two scans into a single one :
Scan( D ,(ei)
 
iÎ{1,...,k}
,o,l z.d1'(z)o d2'(z), l z.g1'(z)o g2'(z))  .

The expression corresponding to program A in section (2.2) is
s = Scan({i|1<=i<=n},([1]),+,a,a[1])[n]
For program B the expression is a bit more complex.
s = Scan({i|2<=i<=n},([1]),+,a,a[2])[n]+a[1]
The value a[1] can be considered as a degenerate Scan.
s = Scan({i|2<=i<=n},([1]),+,a,a[2])[n] +
    Scan({i|1<=i<=n},([1]),+,0,a[1])[n]
  = ( Scan({i|2<=i<=n},([1]),+,a,a[2]) + 
      Scan({i|1<=i<=n},([1]),+,0,a[1])
      )[n]
The accumulation domain of the first Scan operator must be enlarged.
s = ( Scan({i|1<=i<=n},([1]),+,if i=2 a[2] else a[i],0) + 
      Scan({i|1<=i<=n},([1]),+,0,a[1])
      )[n]
Now a merging is possible.
s = Scan({i|1<=i<=n},([1]),+,if i=2 a[2]+0 else a[i]+0,a[1]+0)[n]
  = Scan({i|1<=i<=n},([1]),+,a,a[1])[n]
This last expression is exactly the one computed for program A. Note that the result may differ according to the domain you choose for the degenerated scan. In the sequel of this paper we present a normalization for the data field of Scan operators used to compute a reduction. With this tool the resulting expression is the same whatever the domain of the degenerated scan is chosen.
A condition for applying the Scan combination theorem is that the transformations on the Scans must be one-to-one. This restrict considerably the field of application of this theorem. In fact, it is often possible to split a transformation into a one-to-one transformation and another transformation depending only on the kernel of the original transformation. This help to apply the combination theorem on a fair amount of cases.

Theorem 9  [Transformation Factorization]   Let f be an affine transformation from C to D (two affine sub-spaces of Zp 6 ) such that dimD³dimC. There exists a one-to-one transformation f' and a transformation p depending only on kerf such that f=f'°p.

Proof:
This proof use only classical linear algebra results.
Let us consider the following clause expression.
v[i] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0)[i->n,i] +
       Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,b,0)[i->i,n]
The accumulation domains are two-dimensional and the clause domain is one-dimensional. The first step (a technical one) is to add a dummy dimension to the clause domain.
v[i] = v'[i,0]
v'[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0)[i,j->n,i] +
          Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,b,0)[i,j->i,n]
Note that the dimension of the new clause is again 1 but it is defined in Z2 not in Z. The two transformations have the same kernel generated by (
0
1
). So the function p can be factorized; a necessary condition for the combination of the two Scans of the expression.
v'[i,j] = ( Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0)[i,j->j+n,i] +
            Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,b,0)[i,j->i,j+n]
            ) [i,j->i,0]
A change of basis is applied on the Scans.
v'[i,j] = ( Scan({i,j|1<=i<=n,-n<=j<=0},([0 1]),+,a[j,i+n],0) +
            Scan({i,j|1<=i<=n,-n<=j<=0},([0 1]),+,b[i,j+n],0)
            ) [i,j->i,0]
No domain enlarging is needed, the merging of the two Scans is immediate.
v'[i,j] = Scan({i,j|1<=i<=n,-n<=j<=0},([0 1]),+,
               a[j,i+n]+b[i,j+n],0)[i,j->i,0]
One may apply some simplifications to obtain the expected result.
v[i] = Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,a[j,i]+b[i,j],0)
Corollary 10  [Single Reductions Combination]   When a Scan operator compute a single reduction then the image of the associated transformation is an integer vector. Hence, the combination of two Scan operators computing a single reduction is possible provided that they have a compatible path and that they are combined using a commutative binary operator which is also the binary operator of the two Scan operators.

When the dimension of the clause is greater than the dimension of the Scan operator accumulation domain the only solution is to duplicate the Scan operator data.

Theorem 11  [Scan Data Replication]   Let S be the Scan expression below.
Scan( D ,(ei)
 
iÎNk*
,o,d,g)°f  .
The transformation f is from the affine sub-space C of Zp generated by C to the affine sub-space D of Zp generated by D (sub-spaces of Zp). Let us denote by z0 the vector of Zp such that D-z0 is the linear sub-space Dl associated to D. Moreover, let (ki)iÎNs* be an orthonormal basis of kerf, let (ki)iÎ{s+1,...,t} be an orthonormal basis of an orthogonal supplementary of kerf in C, let k'iiÎNs'* be an orthonormal basis of D and let (k'i)iÎ{s'+1,...,p} be an orthonormal basis of an orthogonal supplementary of D in Zp. The expression S is equivalent to the following Scan expression.
S' º Scan( D ',(ei)
 
iÎNk*
,o,d°p,g°p) °f'  ,
with
Proof:
Since the directions of the Scans are in Dl, a value S'(z) is computed iterating integer vectors translated by (z.k1).k's'+1 from the integer vectors used to compute S(z). Moreover, the application p°f' is the identity on D'. Indeed, for z in C the image f(z) is in D, so f(z)-z0 is in Dl and hence orthogonal to k's'+1. In conclusion, for a given z element of C, the list of values used to compute S'(z) is the same that the list of values used to compute S(z).
In few words, the previous theorem means that replicating the data of the Scan S on directions orthogonal to D allows to access its result using a one-to-one transformation. Another view of this theorem is that the replication of a Scan result may be avoided by the replication of the Scan data but at the expense of redundant computations.

Let us consider the following clause expression
forall i,j in {i,j|1<=i<=n,1<=j<=n},
  v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0) +
           Scan({i|0<=i<=n},([1]),+,b,0)[i,j->i]
Again the vectors are considered as vectors of Z2.
forall i,j in {i,j|1<=i<=n,1<=j<=n},
  v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0) +
           Scan({i,j|0<=i<=n,j=0},([1 0]),+,b[i],0)[i,j->i,0]
The dimension of the accumulation space of the second Scan is less than the dimension of the clause domain. Let us rewrite this Scan using the previous theorem. The kernel of its transformation is generated by (
0
1
). Moreover a supplementary of the space generated by the domain {i,j|0<=i<=n,j=0} has as basis the same vector. Using this vector it is easy to compute the projection p.
p æ
è
i
j
ö
ø
= æ
è
i
j
ö
ø
- é
ë
æ
è
0
1
ö
ø
. æ
è
i
j
ö
ø
ù
û
* æ
è
0
1
ö
ø
= æ
è
i
0
ö
ø
 .
The bounds for the new dimension of the Scan accumulation domain are
 
min
æ
è
i
j
ö
ø
Î(Nn*)2
æ
è
æ
è
0
1
ö
ø
. æ
è
i
j
ö
ø
ö
ø
=1,             
 
max
æ
è
i
j
ö
ø
Î(Nn*)2
æ
è
æ
è
0
1
ö
ø
. æ
è
i
j
ö
ø
ö
ø
=n
The new transformation is
f' æ
è
i
j
ö
ø
= f æ
è
i
j
ö
ø
- é
ë
æ
è
0
1
ö
ø
. æ
è
i
j
ö
ø
ù
û
* æ
è
0
1
ö
ø
= æ
è
i
j
ö
ø
 .
In conclusion the original expression become:
forall i,j in {i,j|1<=i<=n,1<=j<=n},
  v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0) +
           Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,b[i,0],0)[i,j->i,j]
The last stage is the merging of the two Scans.
forall i,j in {i,j|1<=i<=n,1<=j<=n},
  v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a[i,j]+b[i,0],0)
Some other rules of rewriting exist concerning the Scan operator. For example when the Scan is used to compute a reduction, it is sometime possible to reorganize its data to simplify the Scan expression.

Theorem 12  [Reduction Data Normalization]   Let S be a multi-directional Scan for the commutative operator o (its neutral element being ^):
Scan( D ,(ei)
 
iÎ{1,...,k}
, o,l z.(d1(I1(z))o...o dn(In(z))), l z.^)  .
Let Di be Ii( D-mine1,...,ek D( D)), let D0 be the convex hull of the union of the Di, let D' be the convex hull of D0Èmine1,...,ek D0( D0) and let S' be another Scan built from S:

Scan æ
ç
è
D ', (ei)
 
iÎ{1,...,k}
, o, l z. ì
í
î
d1(z) if zÎ D1
^  else
ü
ý
þ
o...o ì
í
î
dn(z) if zÎ Dn
^  else
ü
ý
þ
, ^ ö
÷
ø
 ,

Provided that
" iÎ Nn*, " zÎ D, $ (l1,...,lk)ÎNk  such that  I i(z)=z+
k
å
l=1
ll.ek  ,
then
" zÎ max
D
 
e1,...,ek
( D),  S[z]= S'[max
D'
 
e1,...,ek
(z)]
Proof:
The proof consists in verifying that the data used to compute S'[maxe1,...,ek D'(z)] (with z element of maxe1,...,ek D( D)) are the same as the data used to compute S[z]. The order of accumulation is unimportant since o is commutative. Because of constraint (10) we are sure that the integer vectors Ii(z) are on the same accumulation path than z. Moreover, by definition of our maximum operator on convex polyhedra, maxe1,...,ek D'(z) is located after z on their common accumulation path. Hence the values used to compute S[z] are used to compute S'[maxe1,...,ek D'(z)]. Since two accumulation paths of a Scan do not have a common integer point, the two expressions compute exactly the same value.
To illustrate the previous theorem consider the following Scan
Scan({i,j|0<=i<=n,0<=j<=n},([1 1]),+,a[i,i]+a[i+1,j],0)
We suppose that this Scan is used like a reduction. This means that only the last line (n,j) and the last column (i,n) of the resulting matrix are used. To normalize the data of this reduction a verification must take place: does there exists an integer l such that (
i+1
j
)= (
i
j
)+l (
1
1
) ? The answer is no. In this case, no normalization can occurs because some values of the array a are used in more than one accumulation path. Now, let us consider a variation of the example.
Scan({i,j|0<=i<=n,0<=j<=n},([1 1]),+,a[i,i]+a[i+1,j+1],0)
The new constraint for the normalization is (
i+1
j+1
)= (
i
j
)+l (
1
1
) . Since 1 is a solution, the normalization is possible. Using the theorem notations, the domain D1 is {i,j|1<=i<=n,1<=j<=n} and the domain D2 is {i,j|2<=i<=n+1,2<=j<=n+1}. The application of the theorem leads to the new Scan domain {i,j|0<=i<=n+1,0<=j<=n+1}. Moreover the new Scan must be accessed using the integer vectors max (
1
1
) D'([n,i])=[n+1,i+1] and max (
1
1
) D'([j,n])=[j+1,n+1]. Finally, the new Scan is, after simplification:
Scan({i,j|0<=i<=n+1,0<=j<=n+1},([1 1]),+,
     if (2<=i<=n and 2<=j<=n) then 2*a[i,j] else a[i,j],0)
The report [RF:97] presents more examples of Scan algebra application to programs transformations.

5.2   Multi-directional Scans

Our first observation is that multi-directional scans are more efficient than multiple uni-directional scans.
Take the example of the sum of the elements of a n× n matrix. With only uni-directional scans, one must compute n sequential scans operating on n data. With P processors, the run-time is of the order of n(n/P+log2(P)). With a multi-directional scan, a run-time of the order of n2/P+log2(P) is expected. In the best case (when n=P) the speed-up is about 1+log2(P).
The multi-dimensional scans are not detected directly. They can be found using multistage elimination. When working at pseudo-depth p, a direction may be added to a scan if its initial value references the clause c in which it lays. To be more formal, let us list the conditions under which a general scan may be enhanced to a scan with one more direction. The scan may already have several directions:
Scan( D ,(ei)
 
iÎ{1,...,k}
,b,d,g)° f  ,
Let the initial value g (remember that g is a spatial value) be a disjunction of s clauses of domains ( Djg)jÎ Ns* and of expressions (expjg)jÎ Ns*. In this context the conditions are: The new direction e0 can be found by resolving the integer problem below:
" jÎ Ns'*" zÎ Djg,  max
Dj
 
e1,...,ek
(z-e0)=f(Ij(z))
Proof:
In the definition of the multi-dimensional array v computed by the Scan operator (11) the first clause explode into two new clauses which are the first clauses of a new Scan operator:
Scan( D ,(ei)
 
iÎ{0,...,k}
,b,d,g')° f  .

Practically, the problem (12) may be solved using the PIP software [5], since the definition of maxe1,...,ek Dj is based on a lexicographic maximum.
Let us consider again the summation of the elements of a matrix. The corresponding system is:
v2[i,j] =
   case
    { i,j | i=1, j=1 }        : a[i,j] ;
    { i,j | 2<=i<=n, j=1 }    : v2[i-1,n] + a[i,j] ;
    { i,j | 1<=i<=n, 2<=j<=n }: v2[i,j-1] + a[i,j] ;
   esac ;
Pattern matching is applied for the detection of scans at pseudo-depth 1. A closed form is introduced for the clause v2.2:
v2[i,j] =
   case
    { i,j | i=1, j=1 }        : a[i,j] ;
    { i,j | 2<=i<=n, j=1 }    : v2[i-1,n] + a[i,j] ;
    { i,j | 1<=i<=n, 2<=j<=n }: Scan( {i',j' | 1<=i'<=n, 1<=j'<=n},
                                      ([0 1]), +, a[i',j'], v2[i',j'] );
   esac ;
Removing inter-components references leads to include the clauses v2.0 and v2.1 into the initial value of the Scan operator.
v2[i,j] =
   case
    { i,j | 1<=i<=n, 2<=j<=n }:
          Scan( {i',j' | 1<=i'<=n, 1<=j'<=n}, ([0 1]), +, a[i',j'],
                 case 
                   { k,l | k=1, l=1 }        : a[1,1] ;
                   { k,l | 2<=k<=n, l=1 }    : v2[k-1,n] + a[k,l] ;
                 esac ; )
   esac ;
This system is already normalized for pseudo-depth 0, a pattern-matching can be applied. No new uni-directional scan can be detected. But we can try to add a direction to the previous detected scan. The first two conditions for scan enhancement are fulfilled: To verify the last condition, we must find the candidate vector for the new direction. This vector e0 is the solution of the following problem:

" zÎ{i',j' |  2£ i'£ n, j'=1},  max
Dc
 
æ
è
0
1
ö
ø
(z-e0)= æ
è
z0-1
n
ö
ø

It is possible to transform this problem in a mere maximization under constraints. A solution is the integer vector (
1
0
). Moreover the domain

min
D
 
æ
è
0
1
ö
ø
, æ
è
1
0
ö
ø
( D)  ,

(where D represents the accumulation domain of the scan) is equal to the domain of the first clause of the initial value. Indeed, these two domains are equal to the single integer vector (
1
1
). The scan enhancement is successful, we obtain a two-directional scan:
v2[i,j] =
   case
    { i,j | 1<=i<=n, 2<=j<=n }:
          Scan( {i',j' | 1<=i'<=n, 1<=j'<=n}, ([1 0] [0 1]), +,
                a[i',j'], a[1,1] ) ;
   esac ;

5.3   Tactical Considerations

The rules in the preceding sections can be used in many ways and at many places in a system of recurrence equations. We need a set of guidelines in order to direct their application in fruitful directions. Clearly, these guidelines depends on the ultimate aim of the transformations. Our main objective here is to write efficient programs for parallel computers. Other objectives can be contemplated, like program debugging and checking; these applications are left for future work.

All parallel computers have the property that the larger the parallel computation, the better the efficiency of the system. Hence, our aim now is to decrease the number of scans, by using either scan combination or multi-dimensional scans. Other rules (change of basis, domain enlarging, and so on) are to be used mainly in order to enable the use of the two principal transformations.

The heuristics for scan combination is patterned on the method for multi-directional Scan operators introduction: Note that combinaison of Scan operators is attempted for each pseudo-depth for two reasons. The obvious reason is that the system of equations may be modified at each step, thus creating new opportunities, but the more important reason is that combination of Scan operators can lead to the introduction of a greater number of multi-directional operators.

The keystone for scan combination is theorem (8) but its application is not a trivial matter. First, the two Scan operators to be merged must be identified. For instance, in an expression more than two Scan operators can be eligible for combination. It seems that the choice of the couples of Scan operators is important for the final result. Considering that scalars are nothing but degenerated cases of Scan operators make the problem of identification even worse. However, the possible choices are somewhat restricted by the fact that Scans to be combined must have the same operator.

After the identification, the combination theorem (8) can be applied directly only if the transformations associated to the Scan operators are one-to-one. In other cases, the result can be obtained by combining several transformations. For instance if the transformations have the same kernel and if the dimension of the accumulation domains is greater than the dimension of the clause domain, it is enough to apply the factorization theorem (9). Another tractable case is when the dimension of an accumulation domain of a Scan operator is less than the dimension of the clause domain. In this case the application of the scan replication theorem (11) leads to an equivalent Scan operator with an associated one-to-one transformation. This theorem must be used with caution since it increases the amount of computation in the system of equations. In the other cases it is impossible to perform a combination of the identified Scan operators. A last verification must take place after the change of basis: the paths of the resulting Scan operators must be compatible.

We are aware that these proposals are preliminary; some of our heuristics are highly target computer sensitive. Hence, it is time to implement a version of the scan algebra, by enlarging on the work which has been presented in [16] (see also section 6.1). To facilitate experimenting with heuristics, the implementation is to be interactive, the user being required to direct the selection of transformations. More automated versions can then be designed when sufficient experience has been gathered.


Previous Contents Next