The sum of the elements of a vector v can be computed by the mere expression sum(v(1:n)). (HPF also includes more modern reduction primitives with names like xxx_SCATTER, and some scan primitives are provided with names like xxx_SUFFIX and xxx_PREFIX; the HPF reduction primitives can use very complex patterns thanks to indirection arrays.) With our operator the syntax is a bit more complex; such a reduction needs the following arguments:

Scan({ i | 0 <= i <= n }, ( [1] ), +, v, 0)

This operator is described in more detail later, but one can note that the binary operation to be applied (here the + operation) is given as a parameter. The first two parameters (the accumulation domain and the direction) state that the reduction takes into account the elements of v from subscript 1 to n with a step of 1. Subscript 0 is for the initial value, which is the last parameter. Moreover, the Scan operator computes the whole set of partial results. In the example, only the last value of the resulting vector is needed:

Scan({ i | 0 <= i <= n }, ( [1] ), +, v, 0)[n]
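In imperative terms, the vector of partial results denoted by this Scan expression could be obtained by a loop like the following sketch (the array s that holds the partial sums is ours, not part of the operator):

s(0) = 0
do i = 1, n
   s(i) = s(i-1) + v(i)     ! s(i) holds the partial sum v(1)+...+v(i)
end do
! s(n) is the value selected by the [n] subscript above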
The classical example of a multi-directional scan is the sum of the elements of a matrix:

s=0
DO i=1,n
   DO j=1,n
      s=s+a(i,j)
   END DO
END DO

In the equivalent system of equations, the piecewise rectilinear path appears clearly:

v2[i,j] = case
   { i,j | i=1, j=1 }         : a[i,j];
   { i,j | 2<=i<=n, j=1 }     : v2[i-1,n] + a[i,j];
   { i,j | 1<=i<=n, 2<=j<=n } : v2[i,j-1] + a[i,j];
esac;

This path may be defined by two vectors (the two directions of the multi-directional scan), [1 0] and [0 1]. The second direction is used until the last element of a row, and then the first direction is used to skip to the next row, which must be scanned from the beginning to the end, and so on. Since the path iterates over the whole array, only one scan is defined here (and not an array of scans). Hence, the domain of the initial value is the single point (1,1). To summarize, the Scan expression to compute the sum of the elements of a matrix a is:

Scan( { i,j | 1<=i<=n, 1<=j<=n }, ( [1 0] [0 1] ), +, a, a[1,1] )
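For illustration, the partial sums v2 defined by this case system can be computed by a loop nest that follows the same path (the loop below is only a sketch; the array v2 is the one of the equations above):

do i = 1, n
   do j = 1, n
      if (i == 1 .and. j == 1) then
         v2(i,j) = a(i,j)                  ! initial value at (1,1)
      else if (j == 1) then
         v2(i,j) = v2(i-1,n) + a(i,j)      ! first direction: skip to the next row
      else
         v2(i,j) = v2(i,j-1) + a(i,j)      ! second direction: move along the row
      end if
   end do
end do
! v2(n,n) is the sum of all the elements of a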
At this point we can give a more formal definition of the Scan operator. First we need to define a variant of the lexicographic minimum. Let D be a convex of Z^p and let the e_i be vectors of the same Z^p space. This minimum, written

D_g = min_(e_i)(D) ,

is the subset of D where the accumulation paths defined by the directions e_i start; it is the domain of the initial values (in the matrix example above, the single point (1,1)). The operator is then written

Scan(D, (e_i), b, d, g) ,

where b is the binary operation, d the data and g the initial values. For every z in D it defines a value v(z) by cases: on D_g, v(z) is the initial value given by g; elsewhere, v(z) is obtained by combining with b the value at the preceding point of the path and the data d(z).
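For instance, on the vector example above, D = { i | 0 <= i <= n }, the single direction is [1] and D_g reduces to the point 0: there the result takes the initial value 0, and at every other point i the result is the value at i-1 combined by + with v(i), which is exactly the sequence of partial sums.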
A homographic function is written:

f_abcd(x) = (ax + b) / (cx + d) .
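The key property of homographic functions is that they are closed under composition; composing f_abcd with a second homographic function f_a'b'c'd' (the primed coefficients are used here only for illustration) gives

f_abcd(f_a'b'c'd'(x)) = ((aa' + bc') x + (ab' + bd')) / ((ca' + dc') x + (cb' + dd')) ,

i.e. the coefficients of the composition are those of the product of the two 2x2 coefficient matrices. The remark on linear functions below is a special case of this observation.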
Let us take as O the set {+, -, *} with the usual properties: associativity and commutativity of + and *, the rule of signs, distributivity of * with respect to +, and the familiar rules of arithmetic (2 + 2 = 4 and so on). If t is tractable for this set of operators, its normal form is a polynomial in v_{i-1} whose coefficients are combinations of its elementary terms. As we have seen earlier, such a polynomial is associated with a scan if it is linear:

v_i = a_i v_{i-1} + b_i .

There are two simple cases: a_i = 1, which gives a sum, and b_i = 0, which gives a product. In the general case, we may compose two linear functions and show that the result is still linear:

a(cx + d) + b = (ac) x + (ad + b) .

This gives the associative companion function:
lin( (a, b), (c, d) ) = (ac, ad + b) .
In fact, since linear functions are a subfamily of the homographic functions which is closed under composition, this is a special case of an earlier observation.
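To illustrate how the companion function is used (the variables ca, cb, v0 and vn below are ours), the recurrence v_i = a_i v_{i-1} + b_i can be evaluated by accumulating the pairs (a_i, b_i) with lin and then applying the resulting pair to the initial value; since lin is associative, the same accumulation can also be done pairwise in a tree, which is what allows a parallel evaluation:

ca = 1
cb = 0
do i = 1, n
   ! accumulate lin((a(i), b(i)), (ca, cb)) = (a(i)*ca, a(i)*cb + b(i))
   cb = a(i)*cb + b(i)
   ca = a(i)*ca
end do
vn = ca*v0 + cb      ! vn equals v_n as defined by the recurrence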
For functions tabulated over a finite domain of size n, composition is computed by the loop:

do i = 1, n
   f3(i) = f2(f1(i))     ! tabulate the composition f3 = f2 o f1
end do
which takes 2n table accesses. The total time for the reduction is thus
2n(m/P + log2 P). The last step takes m/P time units,
for a total of 2nm/P + 2n log2 P + m/P.
In the case of a reduction, the last step takes only
time 1, and we have to compare m to 2nm/P + 2n log2 P + 1.
In both cases, and supposing that m is large enough that we can neglect
other terms, we have to compare m to 2nm/P. The conclusion is
that the method is advantageous provided that the size of the finite
domain is small compared to the number of processors.
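As an order of magnitude (the figures are purely illustrative), with m = 10^6 terms, a table of size n = 4 and P = 64 processors, the sequential cost is m = 1 000 000 while the tabulated method costs about 2nm/P = 125 000 plus a negligible 2n log2 P = 48; with n = 64 and the same P, 2nm/P already reaches 2 000 000 and the method loses.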