5 Scan Algebra
The Scan operator obeys several identities which can be used as
transformation rules. These transformations include change of basis,
combination between several operators, normalization of data and
more. In this section we present some of these transformations and show
their usefulness on examples . Our long term aim is to apply, in an automatic
way, these transformations on systems of equations with Scan
operators to obtain
a normal form. The problem of deciding if two programs are equivalent
is undecidable, but it may be possible to detect cases where expressions
including Scan operators are equivalent.
5.1 Combination of Scans
It may be interesting to modify the scan directions, for instance to
align these directions on the canonical directions of the parallel
machine geometry. Indeed some scan primitives work only along these
directions. The following theorem
5
show under which conditions such a change of basis can take place.
Theorem 5 [Change of basis for the Scan operator]
Let T be an affine one-to-one transformation. We denote by
Tl the linear transformation T-T(0). The following equivalence stands:
|
|
º |
Scan(T( D |
),(Tl(ei)) |
|
,b,d° T-1,
g° T-1)° T
. |
|
|
-
Proof:
-
One may remark that, for a one-to-one transformation T,
the expression
is equivalent to
By applying this identity on the definition of the array computed
by the first Scan operator, one obtain the definition of the array
computed by the second Scan operator.
The previous theorem proves itself very useful for scan combination,
i.e. for merging scans operators present in an expression into a
single Scan operator. In the sequel of this section we show that, under
some conditions, an expression of the form Scan(...)o
Scan(...) (with o a classical operator like addition)
can be rewritten as a single Scan operator. At some point in the proof
we need to enlarge the accumulation domain of the scans. Enlarging the
accumulation domain increase the number of operations needed to compute
a scan and hence is not a recommended operation. But when carefully done,
this operation does not change the scan result and is useful in
scan rewriting processes.
Lemma 6 [Enlarging an accumulation domain]
Let S be a scan expression of the following form:
The binary operator b is supposed to have a neutral element ^.
Let D' be a convex including D, a scan expression S'
with D' as accumulation domain may be built such that the restriction
of its results on D is equal to S. The scan expression
S' is:
Scan(
D |
',(ei) |
|
,b,d',g')
.
|
with
-
d' identical to d on D and neutral elsewhere :
" zÎ D'-
min |
|
( D'), d'(z)=
|
ì ï ï í ï ï î |
if |
|
then |
g(z) |
else if |
zÎ D |
then |
d(z) |
else |
^
; |
|
|
- g' is the neutral element excepted on the old initial data
space :
" zÎmin |
|
( D'), g'(z)=
|
ì ï í ï î |
|
|
-
Proof:
-
If two integer vectors z and z-1 of D are such that z-1
is the immediate predecessor of z with respect to the directions of the
scan S, then z-1 is a predecessor of z with respect to the
directions of the scan S'. Moreover, between z and z-1 only
integer vectors of D'- D can be encountered. These vectors
are associated to the neutral element for b, hence the two scan expressions
S and S' give the same result for integer vectors of D.
Corollary 7 [Scan with Neutral Initial Values]
From the preceding theorem it is clear that one can always obtain,
from any Scan, a Scan with initial values ^. It suffices to
enlarge the accumulation domain D of the scan to the convex
hull of DÈmine1,...,ek D( D).
The previous theorems for changing basis and enlarging domain
give the necessary background to combine Scan operators of
a clause expression. The method to combine these operators
consists in applying to them the change of basis in order
to obtain scans computed along compatible paths. Then it is
possible to enlarge the Scan domains to the convex hull of
their union. At this point the Scan operators can be merged,
the resulting Scan operator working on vectors of data.
Moreover, to obtain the same multi-dimensional array a final
stage of general communications is necessary.
This superposition of Scan operators is not very interesting
from a practical point of view. Indeed the merging is only a
syntactical trick and the resulting Scan is more difficult to
compute on a parallel machine (these machines does not provide
optimized primitives to compute complex scans on vectors and a general
communication is a high cost operation).
But sometime the merging can lead to a simpler (in term of
computation complexity) Scan expression. The following theorem
illustrate such a situation.
Theorem 8 [Combination of Scans]
Let S be an expression of the following form:
Scan(
D |
1,(ei1) |
|
,o,d1,g1)°f1
o
Scan(
D |
2,(ei2) |
|
,o,d2,g2)°f2
.
|
We denote by y1 and y2 the linear transformations
f1-f1(0) and f2-f2(0).
Provided that the following conditions are fulfilled
-
the binary operator o is commutative and has a neutral
element ^ ;
- the transformations y1 and y2 are one-to-one ;
- the paths P1=(y1-1(ei1))iÎ{1,...,k1} and
P2=(y2-1(ei2))iÎ{1,...,k2} are compatible (i.e.
there exists a path P=(ei)iÎ{1,...,k} such that the
paths P1 and P2 are the restrictions of P
respectively on D1 and D2),
then expression S is rewritable as a single scan expression.
-
Proof:
-
After a change of basis and an accumulation domain enlarging the expression
is of the form
Scan(
D |
,(ei) |
|
,o,d1',g1')
o
Scan(
D |
,(ei) |
|
,o,d2',g2')
.
|
Because of the commutativity of o it is valid to merge the two
scans into a single one :
Scan(
D |
,(ei) |
|
,o,l z.d1'(z)o d2'(z),
l z.g1'(z)o g2'(z))
.
|
The expression corresponding to program A in section (2.2)
is
s = Scan({i|1<=i<=n},([1]),+,a,a[1])[n]
For program B the expression is a bit more complex.
s = Scan({i|2<=i<=n},([1]),+,a,a[2])[n]+a[1]
The value a[1] can be considered as a degenerate Scan.
s = Scan({i|2<=i<=n},([1]),+,a,a[2])[n] +
Scan({i|1<=i<=n},([1]),+,0,a[1])[n]
= ( Scan({i|2<=i<=n},([1]),+,a,a[2]) +
Scan({i|1<=i<=n},([1]),+,0,a[1])
)[n]
The accumulation domain of the first Scan operator must be enlarged.
s = ( Scan({i|1<=i<=n},([1]),+,if i=2 a[2] else a[i],0) +
Scan({i|1<=i<=n},([1]),+,0,a[1])
)[n]
Now a merging is possible.
s = Scan({i|1<=i<=n},([1]),+,if i=2 a[2]+0 else a[i]+0,a[1]+0)[n]
= Scan({i|1<=i<=n},([1]),+,a,a[1])[n]
This last expression is exactly the one computed for program A.
Note that the result may differ according to the domain you choose
for the degenerated scan. In the sequel of this paper we present
a normalization for the data field of Scan operators used to compute
a reduction. With this tool the resulting expression is the same
whatever the domain of the degenerated scan is chosen.
A condition for applying the Scan combination theorem is that the
transformations on the Scans must be one-to-one. This restrict
considerably the field of application of this theorem. In fact, it is often
possible to split a transformation into a one-to-one transformation and another
transformation depending only on the kernel of the original transformation.
This help to apply the combination theorem on a fair amount of cases.
Theorem 9 [Transformation Factorization]
Let f be an affine transformation from C to D
(two affine sub-spaces of Zp
6
) such that dimD³dimC.
There exists a one-to-one transformation f' and a transformation
p depending only on kerf such that f=f'°p.
-
Proof:
-
This proof use only classical linear algebra results.
Let us consider the following clause expression.
v[i] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0)[i->n,i] +
Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,b,0)[i->i,n]
The accumulation domains are two-dimensional and the clause domain
is one-dimensional. The first step (a technical one) is to add a dummy
dimension to the clause domain.
v[i] = v'[i,0]
v'[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0)[i,j->n,i] +
Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,b,0)[i,j->i,n]
Note that the dimension of the new clause is again 1 but it is defined
in Z2 not in Z. The two transformations have the same kernel
generated by (). So the
function p can be factorized; a necessary condition for the combination
of the two Scans of the expression.
v'[i,j] = ( Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0)[i,j->j+n,i] +
Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,b,0)[i,j->i,j+n]
) [i,j->i,0]
A change of basis is applied on the Scans.
v'[i,j] = ( Scan({i,j|1<=i<=n,-n<=j<=0},([0 1]),+,a[j,i+n],0) +
Scan({i,j|1<=i<=n,-n<=j<=0},([0 1]),+,b[i,j+n],0)
) [i,j->i,0]
No domain enlarging is needed, the merging of the two Scans is immediate.
v'[i,j] = Scan({i,j|1<=i<=n,-n<=j<=0},([0 1]),+,
a[j,i+n]+b[i,j+n],0)[i,j->i,0]
One may apply some simplifications to obtain the expected result.
v[i] = Scan({i,j|1<=i<=n,0<=j<=n},([0 1]),+,a[j,i]+b[i,j],0)
Corollary 10 [Single Reductions Combination]
When a Scan operator compute a single reduction then
the image of the associated transformation is an integer vector.
Hence, the combination of two Scan operators computing a single
reduction is possible provided that they have a compatible path
and that they are combined using a commutative binary operator
which is also the binary operator of the two Scan operators.
When the dimension of the clause is greater than the dimension of the
Scan operator accumulation domain the only solution is to duplicate
the Scan operator data.
Theorem 11 [Scan Data Replication]
Let S be the Scan expression below.
Scan( D |
,(ei) |
|
,o,d,g)°f
.
|
The transformation f is from the affine sub-space C of Zp
generated by C to the affine sub-space D of Zp generated
by D (sub-spaces of Zp). Let us denote by z0 the vector
of Zp such that D-z0 is the linear sub-space Dl associated to D.
Moreover, let (ki)iÎNs* be an orthonormal basis of kerf,
let (ki)iÎ{s+1,...,t} be an orthonormal basis of an orthogonal
supplementary of kerf in C, let k'iiÎNs'* be
an orthonormal basis of D and let (k'i)iÎ{s'+1,...,p}
be an orthonormal basis of an orthogonal supplementary of D in Zp.
The expression S is equivalent to the following Scan expression.
S' º
Scan( D |
',(ei) |
|
,o,d°p,g°p)
°f'
,
|
with
-
the function p a projection on the vectors
(k'i)i¹ s'+1
" zÎZp, p(z)=z-((z-z0).k's'+1).k's'+1
,
- the convex D' defined by
" zÎZp, zÎ D' Þ
p(z)Î D |
Ù
|
|
(x.k1)£ z.k's'+1 £ |
|
(x.k1)
,
|
- the function f' is a variant of f with a smaller kernel
" zÎ C, f'(z)=f(z)+(z.k1).k's'+1
.
-
Proof:
-
Since the directions of the Scans are in Dl, a value S'(z)
is computed iterating integer vectors translated by (z.k1).k's'+1
from the integer vectors used to compute S(z). Moreover,
the application p°f' is the identity on D'.
Indeed, for z in C the image f(z) is in D,
so f(z)-z0 is in Dl and hence orthogonal to k's'+1.
In conclusion, for a given z element of C, the list of
values used to compute S'(z) is the same that the list of
values used to compute S(z).
In few words, the previous theorem means that replicating the data of
the Scan S on directions orthogonal to D allows
to access its result using a one-to-one transformation. Another
view of this theorem is that the replication of a Scan result may be
avoided by the replication of the Scan data but at the expense of
redundant computations.
Let us consider the following clause expression
forall i,j in {i,j|1<=i<=n,1<=j<=n},
v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0) +
Scan({i|0<=i<=n},([1]),+,b,0)[i,j->i]
Again the vectors are considered as vectors of Z2.
forall i,j in {i,j|1<=i<=n,1<=j<=n},
v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0) +
Scan({i,j|0<=i<=n,j=0},([1 0]),+,b[i],0)[i,j->i,0]
The dimension of the accumulation space of the second Scan is less
than the dimension of the clause domain. Let us rewrite this Scan using
the previous theorem. The kernel of its transformation is generated
by (). Moreover a
supplementary of the space generated by the domain
{i,j|0<=i<=n,j=0}
has as basis the same vector.
Using this vector it is easy to compute the projection p.
p
|
æ è |
|
ö ø |
=
|
æ è |
|
ö ø |
-
|
é ë |
|
æ è |
|
ö ø |
.
|
æ è |
|
ö ø |
ù û |
*
|
æ è |
|
ö ø |
=
|
æ è |
|
ö ø |
.
|
The bounds for the new dimension of the Scan accumulation domain
are
|
|
|
æ è |
|
æ è |
|
ö ø |
.
|
æ è |
|
ö ø |
ö ø |
=1,
|
|
|
æ è |
|
æ è |
|
ö ø |
.
|
æ è |
|
ö ø |
ö ø |
=n
|
The new transformation is
f'
|
æ è |
|
ö ø |
=
f |
æ è |
|
ö ø |
-
|
é ë |
|
æ è |
|
ö ø |
.
|
æ è |
|
ö ø |
ù û |
*
|
æ è |
|
ö ø |
=
|
æ è |
|
ö ø |
.
|
In conclusion the original expression become:
forall i,j in {i,j|1<=i<=n,1<=j<=n},
v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a,0) +
Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,b[i,0],0)[i,j->i,j]
The last stage is the merging of the two Scans.
forall i,j in {i,j|1<=i<=n,1<=j<=n},
v[i,j] = Scan({i,j|0<=i<=n,1<=j<=n},([1 0]),+,a[i,j]+b[i,0],0)
Some other rules of rewriting exist concerning the Scan operator.
For example when the Scan is used to compute a reduction, it is
sometime possible to reorganize its data to simplify the Scan
expression.
Theorem 12 [Reduction Data Normalization]
Let S be a multi-directional Scan for the commutative operator
o (its neutral element being ^):
Scan(
D |
,(ei) |
|
,
o,l z.(d1(I1(z))o...o dn(In(z))),
l z.^)
.
|
Let Di be
Ii( D-mine1,...,ek D( D)),
let D0 be the convex hull of the union of the Di,
let D' be the convex hull of
D0Èmine1,...,ek D0( D0)
and let S' be another Scan built from S:
Scan |
æ ç è |
D |
',
(ei) |
|
,
o,
l z.
|
ì í î |
|
|
ü ý þ |
o...o
|
ì í î |
|
|
ü ý þ |
,
^ |
ö ÷ ø |
,
|
Provided that
|
" iÎ Nn*, " zÎ D,
$ (l1,...,lk)ÎNk such that
I |
i(z)=z+ |
|
ll.ek
, |
|
then
" zÎ max |
|
( D),
S[z]= S'[max |
|
(z)]
|
-
Proof:
-
The proof consists in verifying that the data used to compute
S'[maxe1,...,ek D'(z)]
(with z element of maxe1,...,ek D( D))
are the same as the data used to compute S[z]. The order of
accumulation is unimportant since o is commutative.
Because of constraint (10) we are sure that the integer
vectors Ii(z) are on the same accumulation path than z.
Moreover, by definition of our maximum operator on convex polyhedra,
maxe1,...,ek D'(z) is located after
z on their common accumulation path. Hence the values
used to compute S[z] are used to compute
S'[maxe1,...,ek D'(z)].
Since two accumulation paths of a Scan do not have a common
integer point, the two expressions compute exactly the same value.
To illustrate the previous theorem consider the following Scan
Scan({i,j|0<=i<=n,0<=j<=n},([1 1]),+,a[i,i]+a[i+1,j],0)
We suppose that this Scan is used like a reduction. This means that
only the last line (n,j) and the last column (i,n) of the
resulting matrix are used. To normalize the data of this reduction
a verification must take place: does there exists an integer l
such that
()=
()+l
()
? The answer is no. In this case, no normalization can
occurs because some values of the array a are used in more than
one accumulation path. Now, let us consider a variation of the example.
Scan({i,j|0<=i<=n,0<=j<=n},([1 1]),+,a[i,i]+a[i+1,j+1],0)
The new constraint for the normalization is
()=
()+l
()
. Since 1 is a solution, the normalization is possible.
Using the theorem notations, the domain D1 is
{i,j|1<=i<=n,1<=j<=n}
and the domain D2
is {i,j|2<=i<=n+1,2<=j<=n+1}
. The application of the
theorem leads to the new Scan domain {i,j|0<=i<=n+1,0<=j<=n+1}
.
Moreover the new Scan must be accessed using the integer vectors
max
(
)
D'([n,i])=[n+1,i+1]
and
max
(
)
D'([j,n])=[j+1,n+1].
Finally, the new Scan is, after simplification:
Scan({i,j|0<=i<=n+1,0<=j<=n+1},([1 1]),+,
if (2<=i<=n and 2<=j<=n) then 2*a[i,j] else a[i,j],0)
The report [RF:97] presents more examples of Scan algebra
application to programs transformations.
5.2 Multi-directional Scans
Our first observation is that multi-directional scans are more efficient
than multiple uni-directional scans.
Take the example of the sum of the elements of a n× n matrix.
With only uni-directional scans, one must compute n sequential scans
operating on n data. With P processors, the run-time is of the order of
n(n/P+log2(P)). With a multi-directional scan, a run-time of the
order of n2/P+log2(P) is expected. In the best case (when n=P)
the speed-up is about 1+log2(P).
The multi-dimensional scans are not detected directly. They can be found
using multistage elimination. When working at pseudo-depth p,
a direction may be added to a scan if its initial value references the clause
c in which it lays. To be more formal, let us list the conditions under which
a general scan may be enhanced to a scan with one more direction.
The scan may already have several directions:
|
Scan( D |
,(ei) |
|
,b,d,g)° f
, |
|
Let the initial value g (remember that g is a spatial value) be a
disjunction of s clauses of domains ( Djg)jÎ Ns*
and of expressions (expjg)jÎ Ns*. In this
context the conditions are:
-
The scan is the clause c expression (if it is included in an
expression, it cannot reference itself).
- A number s' (between 1 and s-1) of clauses of g are of
the form f(z,vc(Ij(z))). The function f is l zx.b(x,d(z)),
vc represents the multi-dimensional array computed by the clause c
and Ij is a subscript function of pseudo-depth p.
- A new direction can be extracted from these s' clauses.
- The union of the domains of the others clauses of g must
be equal to the domain of the initial value:
The new direction e0 can be found by resolving the integer
problem below:
|
|
" jÎ Ns'*, " zÎ Djg,
max |
|
(z-e0)=f(Ij(z)) |
|
|
|
-
Proof:
-
In the definition of the multi-dimensional array v computed
by the Scan operator (11) the first clause explode
into two new clauses which are the first clauses of a new Scan operator:
Scan( D |
,(ei) |
|
,b,d,g')° f
.
|
Practically, the problem (12) may be solved using the
PIP software [5], since the definition of
maxe1,...,ek Dj is based on a
lexicographic maximum.
Let us consider again the summation of the elements of a matrix.
The corresponding system is:
v2[i,j] =
case
{ i,j | i=1, j=1 } : a[i,j] ;
{ i,j | 2<=i<=n, j=1 } : v2[i-1,n] + a[i,j] ;
{ i,j | 1<=i<=n, 2<=j<=n }: v2[i,j-1] + a[i,j] ;
esac ;
Pattern matching is applied for the detection of scans at pseudo-depth
1. A closed form is introduced for the clause v2.2:
v2[i,j] =
case
{ i,j | i=1, j=1 } : a[i,j] ;
{ i,j | 2<=i<=n, j=1 } : v2[i-1,n] + a[i,j] ;
{ i,j | 1<=i<=n, 2<=j<=n }: Scan( {i',j' | 1<=i'<=n, 1<=j'<=n},
([0 1]), +, a[i',j'], v2[i',j'] );
esac ;
Removing inter-components references leads to include the clauses
v2.0 and v2.1 into the initial value of the Scan operator.
v2[i,j] =
case
{ i,j | 1<=i<=n, 2<=j<=n }:
Scan( {i',j' | 1<=i'<=n, 1<=j'<=n}, ([0 1]), +, a[i',j'],
case
{ k,l | k=1, l=1 } : a[1,1] ;
{ k,l | 2<=k<=n, l=1 } : v2[k-1,n] + a[k,l] ;
esac ; )
esac ;
This system is already normalized for pseudo-depth 0, a pattern-matching
can be applied. No new uni-directional scan can be detected. But we can
try to add a direction to the previous detected scan. The first two conditions
for scan enhancement are fulfilled:
-
the scan is the clause expression;
- the second clause of the initial value is referencing v2.0
at pseudo-depth 0 and its expression used the scan binary operation
(here an addition).
To verify the last condition, we must find the candidate vector for the
new direction. This vector e0 is the solution of the following problem:
" zÎ{i',j' | 2£ i'£ n, j'=1},
max |
|
(z-e0)= |
æ è |
|
ö ø |
It is possible to transform this problem in a mere maximization
under constraints. A solution is the integer vector
().
Moreover the domain
(where D represents the accumulation domain of the scan)
is equal to the domain of the first clause of the initial value.
Indeed, these two domains are equal to the single integer vector
().
The scan enhancement is successful, we obtain a two-directional
scan:
v2[i,j] =
case
{ i,j | 1<=i<=n, 2<=j<=n }:
Scan( {i',j' | 1<=i'<=n, 1<=j'<=n}, ([1 0] [0 1]), +,
a[i',j'], a[1,1] ) ;
esac ;
5.3 Tactical Considerations
The rules in the preceding sections can be used in many ways and at many
places in a system of recurrence equations. We need a set of guidelines
in order to direct their application in fruitful directions. Clearly, these
guidelines depends on the ultimate aim of the transformations. Our main
objective here is to write efficient programs for parallel computers.
Other objectives can be contemplated, like program debugging and checking;
these applications are left for future work.
All parallel computers have the property that the larger the parallel
computation, the better the efficiency of the system.
Hence, our aim now is to decrease the number of scans, by using either
scan combination or multi-dimensional scans. Other rules (change of basis,
domain enlarging, and so on) are to be used mainly in order to enable
the use of the two principal transformations.
The heuristics for scan combination is patterned on
the method for multi-directional Scan operators introduction:
-
the algorithm starts executing at the largest pseudo-depth and
goes down to 0.
- clause expressions are scanned and Scan operators combination
performed when possible;
- multi-directional Scan operators for the current pseudo-depth
are introduced when possible.
Note that combinaison of Scan operators is attempted for each
pseudo-depth for two reasons. The obvious reason is that the system
of equations may be modified at each step, thus creating new opportunities,
but the more important reason is that combination of Scan operators can
lead to the introduction of a greater number of multi-directional operators.
The keystone for scan combination is theorem (8) but
its application is not a trivial matter. First, the two Scan operators
to be merged must be identified. For instance, in an expression more than
two Scan operators can be eligible for combination. It seems
that the choice of the couples of Scan operators is important for the
final result. Considering that scalars are nothing but degenerated
cases of Scan operators make the problem of identification even worse.
However, the possible choices are somewhat restricted by the fact that
Scans to be combined must have the same operator.
After the identification, the combination theorem (8) can be
applied directly only if the transformations associated to the Scan
operators are one-to-one. In other cases, the result can be obtained
by combining several transformations.
For instance if the transformations have the same kernel and if
the dimension of the accumulation domains is greater than the dimension
of the clause domain, it is enough to apply the factorization theorem
(9). Another tractable case is when the dimension of an
accumulation domain of a Scan operator is less than the dimension of
the clause domain. In this case the application of the scan replication
theorem (11) leads to an equivalent Scan operator with an
associated one-to-one transformation. This theorem must be used with
caution since it increases the amount of computation in the system of
equations. In the other cases it is impossible to perform a combination
of the identified Scan operators. A last verification must take place
after the change of basis: the paths of the resulting Scan operators
must be compatible.
We are aware that these proposals are preliminary; some of our heuristics
are highly target computer sensitive. Hence, it is time to implement a version
of the scan algebra, by enlarging on the work which has been presented in
[16] (see also section 6.1). To facilitate experimenting
with heuristics, the implementation is to be interactive, the user being
required to direct the selection of transformations. More automated versions
can then be designed when sufficient experience has been gathered.