4 Algebraic Method to Evaluate HPF Communications

As stated in the previous section, we evaluate the communication cost of an HPF program by counting the number of communications between template elements.

4.1 Formula for Communication Cost Evaluation

There is a communication between two template elements if they are not distributed onto the same virtual processor and if the computation of a value to be stored in the first template element uses a value stored in the second one.

To be more precise, let us consider a storage statement S (as defined in the previous section) and a template T. We also need to define some functions:

Function O_T^S gives, for an element J of D_T, the set of storage operations of S that compute values to be stored in T(J);
Function t gives, for an operation o of S, the set of template subscripts from which o may read its data;
Function p_T is the distribution function, which maps the template T onto the virtual processors.

The number of template communications generated by statement S is equal to the number of elements in the union of the sets C_J(S):

$$
C(S) = \bigcup_{J \in D_T} C_J(S) = \bigcup_{J \in D_T} \left\{ (J, o) \mid o \in O_T^S(J),\ p_T(J) \notin p_T(t(o)) \right\} . \qquad (2)
$$

Note that C(S) is a set of pairs, not of mere operations: the same operation may have to be counted several times, so the template subscript is added to distinguish its different occurrences. Computing the set C(S) requires a change of basis on the program loop nests. Indeed, in the original program the loops enumerate the operations in the program execution order, whereas the set C(S) enumerates them following the template iteration space.
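As a minimal sketch of equation (2), the set C(S) can be enumerated by brute force once the three functions are available. The Python below is illustrative only: the names `ops_writing`, `reads_from` and `proc_of` stand for O_T^S, t and p_T, and the loop `A(i) = B(i-1)` with identity alignments and a CYCLIC(2) distribution on two processors is a hypothetical example, not one taken from the text.

```python
def count_communications(template_domain, ops_writing, reads_from, proc_of):
    """Return the set C(S) of equation (2) by direct enumeration."""
    comms = set()
    for J in template_domain:
        for o in ops_writing(J):
            # Processors holding the template elements read by operation o.
            owners = {proc_of(K) for K in reads_from(o)}
            if proc_of(J) not in owners:
                comms.add((J, o))
    return comms


# Hypothetical example: DO i = 1, 7 : A(i) = B(i-1),
# with A(i) and B(i) both aligned with T(i), T declared as T(0:7).
N = 8
template_domain = range(N)


def ops_writing(J):
    # O_T^S(J): iteration J writes A(J), which is stored in T(J).
    return {J} if 1 <= J <= N - 1 else set()


def reads_from(o):
    # t(o): iteration o reads B(o-1), which is stored in T(o-1).
    return {o - 1}


def proc_of(J):
    # p_T: CYCLIC(2) distribution on two virtual processors.
    return (J // 2) % 2


C = count_communications(template_domain, ops_writing, reads_from, proc_of)
print(sorted(C))  # [(2, 2), (4, 4), (6, 6)]
```

Under these assumptions only the iterations that cross a block boundary (i = 2, 4, 6) generate a template communication.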

4.2 Formal Representation of HPF Alignments

To perform this change of basis, the only information needed is given by the HPF ALIGN directives. Basically, HPF alignments are a subset of linear alignments to which a replication symbol * has been added. Because of this replication symbol, an HPF alignment cannot be defined by a single linear transformation from the array space to the template space.

A convenient way to represent an HPF alignment a is to use two linear transformations g and d. Transformation d defines the replication part of the alignment. Let I be a subscript of an array A aligned on a template T using a. The set of the subscripts of T on which the data A(I) is stored is:

$$
\{ J \mid J \in D_T,\ g(I) = d(J) \} = d^{-1}(g(I)) .
$$

Let us consider the alignment:

!HPF$ ALIGN A(i) WITH T(i,*)

The first transformation, from the array space to an intermediate template space, is obtained by removing the replication symbols from the directive: g : i ↦ i. The second transformation is the projection from the template space onto the intermediate template space: d : (i, j) ↦ i.
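The pair (g, d) for this directive can be checked on a small case. The sketch below is only an illustration: the template extents T(1:3, 1:4) and the helper name `aligned_elements` are assumptions, and the preimage d^-1(g(I)) is computed by scanning the template domain rather than symbolically.

```python
from itertools import product


def aligned_elements(I, g, d, template_domain):
    """Return {J in D_T | g(I) == d(J)}, i.e. d^-1(g(I))."""
    return {J for J in template_domain if d(J) == g(I)}


# Hypothetical template extents: T(1:3, 1:4).
n, m = 3, 4
D_T = list(product(range(1, n + 1), range(1, m + 1)))


def g(i):
    # g : i -> i (directive with the replication symbol removed)
    return i


def d(J):
    # d : (i, j) -> i (projection of the template space)
    return J[0]


copies = sorted(aligned_elements(2, g, d, D_T))
print(copies)  # [(2, 1), (2, 2), (2, 3), (2, 4)]
```

The element A(2) is thus replicated along the whole second dimension of T, as the * in the directive requires.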

With this representation of an HPF alignment, the function t of Sect. 4.1 can be made explicit. Let us consider a statement S as defined in (1) and an alignment t_{R_S} for array R_S on the template T, defined by (g_{R_S}, d_{R_S}). In this context, the following holds for every iteration vector I of S:

$$
\forall I \in D_S,\quad t(S(I)) = t_{R_S}(f_{R_S}(I)) = d_{R_S}^{-1}(g_{R_S}(f_{R_S}(I))) .
$$

4.3 Enumeration of Operations in the Template Iteration Space

The main difficulty in enumerating operations in the template space is that several operations may correspond to a template element T(J), i.e. more than one operation may produce a value to be stored in T(J). The solution lies again in the formal representation of HPF alignments. Let us denote by t_{W_S} the alignment for array W_S. According to Sect. 4.2, this alignment can be written as t_{W_S} = d_{W_S}^-1 ° g_{W_S}. Therefore, the set O_T^S(J) introduced at the beginning of this section is defined by:

$$
O_T^S(J) = \left\{ S(I) \mid I \in D_S,\ f_{W_S}(I) \in g_{W_S}^{-1}(d_{W_S}(J)) \right\} .
$$

Let us consider the code fragment below:

!HPF$ ALIGN M(i,j) WITH T(i,*)
DO i=1,n
  DO j=1,m
    M(i,j) = ...               (s1)
  END DO
END DO

The alignment of M on T is defined by (g : (i, j) ↦ i, d : (i, j) ↦ i). Since the subscript function for M is the identity, the set O_T^s1 is such that:

$$
O_T^{s1}(i, j) = \left\{ s1(i', j') \mid g(i', j') = d(i, j) \right\} = \left\{ s1(i, j') \mid j' \in [1, m] \right\} .
$$

This means that a template element T(i,j) owns the full row of rank i of array M.
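The same set can be obtained by enumeration, which gives a quick check of the formula. This is a sketch under assumed loop bounds n = 2 and m = 3; the name `ops_writing` is hypothetical and stands for O_T^s1.

```python
from itertools import product

# Hypothetical loop bounds for the fragment above.
n, m = 2, 3
D_S = list(product(range(1, n + 1), range(1, m + 1)))  # iterations (i, j) of s1


def f(I):
    # Subscript function of M in s1: the identity.
    return I


def g(A):
    # g : (i, j) -> i
    return A[0]


def d(J):
    # d : (i, j) -> i
    return J[0]


def ops_writing(J):
    """O_T^s1(J): iterations whose written element of M is stored in T(J)."""
    return {I for I in D_S if g(f(I)) == d(J)}


row = sorted(ops_writing((1, 2)))
print(row)  # [(1, 1), (1, 2), (1, 3)]
```

Whatever the second template subscript is, the result is the whole row i = 1 of the iteration space, which matches the closed form above.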

In conclusion, the set C(S) may be computed using the subscript functions in S for W_S and R_S, the alignments of W_S and R_S on the template T, and the distribution of T on the virtual processors:

$$
C(S) = \bigcup_{J \in D_T} \left\{ (J, S(I)) \mid I \in F_{W_S}^{-1}(J),\ p_T(J) \notin p_T(F_{R_S}(I)) \right\} \qquad (3)
$$

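with the functions F_{W_S} and F_{R_S} defined as follows:

$$
F_{W_S} = t_{W_S} \circ f_{W_S} = d_{W_S}^{-1} \circ g_{W_S} \circ f_{W_S} , \qquad F_{R_S} = d_{R_S}^{-1} \circ g_{R_S} \circ f_{R_S} .
$$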
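Equation (3) can be exercised end to end on a toy program. The sketch below is not the evaluation method itself (which is algebraic) but a brute-force check; the program, the alignments, the BLOCK-like distribution `p_T` and the helper names `phi` and `communications` are all assumptions chosen for illustration. Each array is described by a triple (f, g, d): subscript function and the two alignment transformations.

```python
from itertools import product


def phi(I, f, g, d, D_T):
    """F(I) = d^-1(g(f(I))): template elements holding the array element accessed at I."""
    target = g(f(I))
    return {J for J in D_T if d(J) == target}


def communications(D_S, D_T, write, read, p_T):
    """Enumerate C(S) as in equation (3); a pair (J, I) stands for (J, S(I))."""
    C = set()
    for I in D_S:
        owners = {p_T(K) for K in phi(I, *read, D_T)}
        # J ranges over F_W(I), which is the same as I in F_W^-1(J).
        for J in phi(I, *write, D_T):
            if p_T(J) not in owners:
                C.add((J, I))
    return C


# Hypothetical program:
#   ALIGN A(i,j) WITH T(i,j)     ALIGN B(i) WITH T(i,*)
#   DO i = 0, 2 ; DO j = 0, 1 ; A(i,j) = B(i+1)
D_T = list(product(range(4), range(2)))   # T(0:3, 0:1)
D_S = list(product(range(3), range(2)))   # iterations (i, j)
write = (lambda I: I, lambda a: a, lambda J: J)            # f_W, g_W, d_W
read = (lambda I: I[0] + 1, lambda b: b, lambda J: J[0])   # f_R, g_R, d_R


def p_T(J):
    # Blocks of two rows of T on two processors; second dimension not distributed.
    return J[0] // 2


C = communications(D_S, D_T, write, read, p_T)
print(sorted(C))  # [((1, 0), (1, 0)), ((1, 1), (1, 1))]
```

Only the iterations with i = 1 read a row of T owned by the other processor, so two template communications are counted.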

4.4 Formal Representation of HPF Distributions

An HPF distribution directive for a template T can be represented using a projection r_T and an integer vector k_T whose size is the number of dimensions of the virtual processor grid P. Projection r_T selects the dimensions of T to be distributed on P. The dimension of T projected onto the i^th dimension of P is distributed according to the pattern CYCLIC((k_T)_i). We denote by A^min (respectively A^max) the vector of lower bounds (respectively upper bounds) of an array A. In this context, the distribution p_T can be written explicitly:

$$
p_T(J) = P^{min} + \left( r_T(J - T^{min}) \div k_T \right) \mathbin{\%} \left( P^{max} - P^{min} + 1 \right) . \qquad (4)
$$

In the previous expression, the operator ÷ (respectively %) denotes element-wise integer division (respectively modulo) on vectors. Note that a BLOCK distribution can be achieved on the i^th dimension of P by choosing a suitable (k_T)_i value:

$$
(k_T)_i = \left\lceil \frac{\left( r_T(T^{max} - T^{min} + 1) \right)_i}{\left( P^{max} - P^{min} + 1 \right)_i} \right\rceil
$$