Previous Contents Next

4   Algebraic Method to Evaluate HPF Communications

As said in the previous section, we evaluate the communication cost in a HPF program by counting the number of communications between template elements.

4.1   Formula for Communication Cost Evaluation

There is a communication between two template elements if they are not distributed onto the same virtual processor and if the computation of a value to be stored in the first template element uses a value stored in the second one.

To be more precise, let us consider a storage statement S (as defined in the previous section) and a template T. We also need to define some functions: The number of template communications generated by statement S is equal to the number of elements in the union of the sets CJ( S):
C( S ) =
 
J DT
CJ( S ) =
 
J DT


(J,o)  |  o O
S
 
T
(J) ,  pT(J) pT(t(o))

  .     (2)
Note that C( S) is a set of couples and not of mere operations. Indeed we may have to count several times the same operation, so we added the template subscript to distinguish different occurrences of the same operation. Computing the set C( S) implies the application of a change of basis on the program loop nests. Indeed, in the original program, the loops are used to enumerate the operations in the program execution order and the set C( S) enumerates them following the template iteration space.

4.2   Formal Representation of HPF Alignments

To perform this change of basis the only informations needed are the HPF ALIGN directives. Basically HPF alignments are a sub-set of linear alignments but a replication symbol * has been added. Because of the replication symbol, a HPF alignment cannot be defined by a linear transformation from the array space to the template space.

A convenient way to represent a HPF alignment a is to use two linear transformations g and d. Transformation d defines the replication part of the alignment. Let I be a subscript of an array A aligned on a template T using a. The set of the subscripts of T on which the data A(I) is stored is:
{ J | J DTg(I)=d(J) } = d-1(g(I))   .

Let us consider the alignment:
!HPF$ ALIGN A(i) WITH T(i,*)
The first transformation from the array space to an intermediate template space is obtained by removing the replication symbols in the directive: g : i | i . The second transformation is the projection from the template space to the intermediate template space: d :  ( ij ) | i .

With this representation of a HPF alignment one can explicit the function t of Sect. (4.1). Let us consider a statement S as defined in (1) and an alignment tR S for array R S on the template T defined by (gR S,dR S). In this context, the following holds:
" I DT,  t( S (I))= t
 
R
 
S
(f
 
R
 
S
(I))= d
-1
 
R
 
S
(g
 
R
 
S
(f
 
R
 
S
(I)))   .

4.3   Enumeration of Operations in the Template Iteration Space

The main problem of enumerating operations in the template space is that there may be several operations corresponding to a template element T(J). I.e. more than one operation may produce a value to be stored in T(J). In fact, the solution lies again in the formal representation of HPF alignments. Let us denote by tW S the alignment for array W S. According to Sect. 4.2, the alignment can be written as tW S=gW SdW S-1. Therefore, the set OT S(J) introduced in the beginning of this section is defined by:
O
S
 
T
(J) = { S(I)  |  I D
 
S
f
 
W
 
S
(I) g
-1
 
W
 
S
(d
 
W
 
S
(J))}   .

Let us consider the code fragment below:
!HPF$ ALIGN M(i,j) WITH T(i,*)
DO i=1,n
  DO j=1,m
    M(i,j) = ...               (s1)
  END DO
END DO
The aligment of M on T is defined by: ( g :  ( ij ) | id :  ( ij ) | i )   . Since the subscript function for M is the identity, the set OTs1 is such that:
O
s1
 
T

i'
j'

=

s1
i
j

 |  g
i
j

= d
i'
j'



=

s1
i'
j

 |  j [1,m]

  .
This means that a template element T(i,j) owns the full row of rank i of array A.

In conclusion, the set C( S) may be computed using the subscript functions in S for W S and R S, the alignments of W S and R S on the template T and the distribution of T on the virtual processors:
C( S ) =
 
J DT




(J, S (I))  |  IF
-1
 
W
 
S
(J) ,  pT(J)pT(F
 
R
 
S
(I))



    (3)
with the functions FW S and FR S defined as follows:
F
 
W
 
S
=
t
 
W
 
S
f
 
W
 
S
=d
-1
 
W
 
S
g
 
W
 
S
f
 
W
 
S
  ,
F
 
R
 
S
=
t
 
R
 
S
f
 
R
 
S
=d
-1
 
R
 
S
g
 
R
 
S
f
 
R
 
S
  .

4.4   Formal Representation of HPF Distributions

A HPF distribution directive for a template T can be represented using a projection rT and an integer vector kT of size the dimension of the virtual processors grid P. Projection rT selects the dimensions of T to be distributed on P. The dimension of T projected on the ith dimension of P is distributed according to the pattern CYCLIC((kT)i). We denote by Amin (respectively by Amax) the vector of lower bounds (respectively the vector of upper bounds) for array A. In this context the distribution pT can be explicited:
pT(J) = P
min
 
+ (rT(J-T
min
 
)kT)% (P
max
 
-P
min
 
+1)   .     (4)
In the previous expression, the operator (respectively the operator %) represents an element-wise integer division (respectively modulo) on vectors. One may remark that a BLOCK distribution can be achieved on the ith dimension of P using a relevant (kT)i value:
(kT)i=





T
max
 
i
-T
min
 
i
+1
P
max
 
i
-P
min
 
i
+1






  .


Previous Contents Next