3 Description of the Problem
3.1 General Remarks
The problem we study in this paper is the evaluation of the
communication volume of an HPF program before compilation. The goal is
to help the programmer understand the communications generated by his
program and find a good data distribution (or a more efficient way to
code his algorithm).
The communications we consider are at the PROCESSORS level:
as in HPF, we use an abstract target machine. We stay at the language
level, which allows us to find a data distribution that is well suited
to the problem while retaining portability. Our aim is not to find
the best data distribution for a given machine, but a good one for any
machine (and compiler).
In the future, if compiler optimization techniques such as
overlap areas for vectorizing communications become common to most
compilers, we could adapt our evaluation techniques to these
optimizations. As a first step, though, we remain at the language
level to validate our approach. Indeed, we believe that a data
distribution that is good at the language level should not be a bad
one for any compiler, regardless of the compilation techniques used.
3.2 Modelization
We consider that the only statements generating communications are
storage statements; we do not take I/O statements into account. For
each of these storage statements, the surrounding loops define an
iteration domain that ``shapes'' the communication pattern.
Given the mathematical tools we use in the following, we have to
restrict ourselves to loop bounds that define a polyhedron, that is,
loop bounds defined as extrema of affine functions of the surrounding
loop indices and parameters. For the same reason, we restrict the
array access functions to affine functions. This class of loops
covers most of the linear algebra routines.
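For instance, the following loop nest belongs to this class (the
array names, bounds and computation are ours, purely for
illustration): the loop bounds are affine in the surrounding index I
and in the parameter N, and both access functions are affine.

      SUBROUTINE TRIANG(A, B, N)
      INTEGER N, I, J
      REAL A(N,N), B(N,N)
C     Triangular iteration domain: the lower bound of J is an
C     affine function of the surrounding index I.
      DO I = 1, N
         DO J = I, N
C           Affine access functions: (I,J) for the write,
C           (J,I) for the read.
            A(I,J) = 2.0 * B(J,I)
         ENDDO
      ENDDO
      END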
To simplify the following discussion, we make the hypotheses below
without loss of generality:
- Scalar values are described as arrays with one element.
- Storage statements read only one array reference, which may not be
aligned with the result value. Indeed, any more complex statement
can be decomposed into a sequence of such statements with temporary
storage [3, 2].
- All the arrays of the considered storage statement are aligned
with respect to the same template T, as in the fragment shown after
this list. This restriction makes sense since the distributions of
arrays related by a computation should themselves be related in some
way. In fact, most HPF programs contain only one template, onto which
all the arrays are aligned. The domain of this template T is
noted $D_T$.
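In HPF syntax, this single-template hypothesis corresponds to
fragments like the one below (the processor grid, the template size
and the BLOCK distribution are illustrative choices of ours, not
requirements of the model):

      PROGRAM ALIGNED
      REAL A(100,100), B(100,100)
!HPF$ PROCESSORS P(4,4)
!HPF$ TEMPLATE T(100,100)
!HPF$ ALIGN A(I,J) WITH T(I,J)
!HPF$ ALIGN B(I,J) WITH T(J,I)
!HPF$ DISTRIBUTE T(BLOCK,BLOCK) ONTO P
      A = 0.0
      B = 0.0
      END

Both arrays are aligned with the single template T; choosing a
transposed alignment for B, as here, is one way the alignments of
related arrays may differ while still referring to the same template.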
Given the previous restrictions, a storage statement S is
represented by:
$W_S(f_{W_S}(I)) = f(R_S(f_{R_S}(I)))$    (1)
where the iteration vector I takes its values in the iteration domain
$D_S$ defined by the surrounding loops, $W_S$ and $R_S$ are arrays,
$f_{W_S}$ and $f_{R_S}$ are affine access functions, and f denotes the
elementary computation performed by the statement.
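As a concrete instance of form (1) (names and bounds are again only
illustrative), consider:

      SUBROUTINE SHIFT(A, B, N)
      INTEGER N, I, J
      REAL A(N,N), B(N,N)
C     For the statement below: W_S = A, R_S = B,
C     f_WS(I,J) = (I,J), f_RS(I,J) = (I,J-1),
C     D_S = { (I,J) | 1 <= I <= N, 2 <= J <= N },
C     and the addition of 1.0 plays the role of f.
      DO I = 1, N
         DO J = 2, N
            A(I,J) = B(I,J-1) + 1.0
         ENDDO
      ENDDO
      END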
An operation is an instance S(I) of a statement S for a given value
I of the iteration vector.
The number of communications generated by a given statement is the
number of array elements that need to be communicated for some
operation generated by this statement. There is a communication when
an array element being read does not reside on the same virtual
processor as the element being written. Since array elements can be
replicated, we will focus on the template elements.
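As a small, hedged example of this counting (the alignments, the
BLOCK distribution and the sizes are ours): with the identity
alignments below, the template cell owning A(I) and the one owning
B(I-1) lie on different virtual processors only when I-1 is the last
cell of a block, so the statement communicates one element per block
boundary.

      PROGRAM COMM
      INTEGER N, I
      PARAMETER (N = 100)
      REAL A(N), B(N)
!HPF$ PROCESSORS P(4)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN A(I) WITH T(I)
!HPF$ ALIGN B(I) WITH T(I)
!HPF$ DISTRIBUTE T(BLOCK) ONTO P
C     Blocks of 25 template cells: T(1:25), T(26:50), T(51:75),
C     T(76:100).  A(I) and B(I-1) are on different virtual
C     processors only for I = 26, 51 and 76: 3 communicated
C     elements for the whole statement.
      B = 1.0
      DO I = 2, N
         A(I) = B(I-1)
      ENDDO
      END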