3 Description of the Problem
3.1 General Remarks
The problem we study in this paper is the evaluation of the
communication volume of an HPF program before compilation. The goal is
to help the programmer understand the communications generated by his
program and find a good data distribution (or a more efficient way to
code his algorithm).
The communications we consider are at the PROCESSORS level:
as in HPF, we use an abstract target machine. We stay at the language
level, which allows us to find a data distribution that is well suited
to the problem while retaining portability. Our aim is not to find
the best data distribution for a given machine, but a good one for any
machine (and compiler).
In the future, if compiler optimization techniques such as
overlap areas for vectorizing communications become common to most
compilers, we could adapt our evaluation techniques to these
optimizations. As a first step, though, we remain at the language
level to validate our approach. Indeed, we believe that a data
distribution that is good at the language level should not be a bad
one for any compiler, regardless of the compilation techniques used.
3.2 Modelization
We consider that the only statements generating communications are
storage statements; we do not take I/O statements into account. For
each of these storage statements, the surrounding loops define an
iteration domain that ``shapes'' the communication pattern.
Given the mathematical tools we use in the following, we have to
restrict ourselves to loop bounds that define a polyhedron, that is,
loop bounds defined as extrema of affine functions of the surrounding
loop indices and parameters. For the same reason, we restrict the
array access functions to affine functions. This class of loops
covers most of the linear algebra routines.
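For instance, the following loop nest belongs to this class (the
array names, bounds and computation are ours, purely for
illustration): the loop bounds are affine in the surrounding index I
and in the parameter N, and both access functions are affine.

      SUBROUTINE TRIANG(A, B, N)
      INTEGER N, I, J
      REAL A(N,N), B(N,N)
C     Triangular iteration domain: the lower bound of J is an
C     affine function of the surrounding index I.
      DO I = 1, N
         DO J = I, N
C           Affine access functions: (I,J) for the write,
C           (J,I) for the read.
            A(I,J) = 2.0 * B(J,I)
         ENDDO
      ENDDO
      END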
To simplify the following discussion, we make the hypotheses below
without loss of generality:
- Scalar values are described as arrays with one element.
- Storage statements read only one array reference, which may not be
aligned with the result value. Indeed, any more complex statement
can be decomposed into a sequence of such statements with temporary
storage [3, 2].
- All the arrays of the considered storage statement are aligned
with respect to the same template T, as in the fragment shown after
this list. This restriction makes sense since the distributions of
arrays related by a computation should themselves be related in some
way. In fact, most HPF programs contain only one template, onto which
all the arrays are aligned. The domain of this template T is
noted $D_T$.
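In HPF syntax, this single-template hypothesis corresponds to
fragments like the one below (the processor grid, the template size
and the BLOCK distribution are illustrative choices of ours, not
requirements of the model):

      PROGRAM ALIGNED
      REAL A(100,100), B(100,100)
!HPF$ PROCESSORS P(4,4)
!HPF$ TEMPLATE T(100,100)
!HPF$ ALIGN A(I,J) WITH T(I,J)
!HPF$ ALIGN B(I,J) WITH T(J,I)
!HPF$ DISTRIBUTE T(BLOCK,BLOCK) ONTO P
      A = 0.0
      B = 0.0
      END

Both arrays are aligned with the single template T; choosing a
transposed alignment for B, as here, is one way the alignments of
related arrays may differ while still referring to the same template.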
Given the previous restrictions, a storage statement S is
represented by:
$W_S(f_{W_S}(I)) = f(R_S(f_{R_S}(I)))$    (1)
where the iteration vector I takes its values in the iteration domain
$D_S$ defined by the surrounding loops, $W_S$ and $R_S$ are arrays,
$f_{W_S}$ and $f_{R_S}$ are affine access functions, and f denotes the
elementary computation performed by the statement.
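As a concrete instance of form (1) (names and bounds are again only
illustrative), consider:

      SUBROUTINE SHIFT(A, B, N)
      INTEGER N, I, J
      REAL A(N,N), B(N,N)
C     For the statement below: W_S = A, R_S = B,
C     f_WS(I,J) = (I,J), f_RS(I,J) = (I,J-1),
C     D_S = { (I,J) | 1 <= I <= N, 2 <= J <= N },
C     and the addition of 1.0 plays the role of f.
      DO I = 1, N
         DO J = 2, N
            A(I,J) = B(I,J-1) + 1.0
         ENDDO
      ENDDO
      END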
An operation is an instance S(I) of a statement S for a given value
I of the iteration vector.
The number of communications generated by a given statement is the
number of array elements that need to be communicated for some
operation generated by this statement. There is a communication when
an array element being read does not reside on the same virtual
processor as the element being written. Since array elements can be
replicated, we will focus on the template elements.
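As a small, hedged example of this counting (the alignments, the
BLOCK distribution and the sizes are ours): with the identity
alignments below, the template cell owning A(I) and the one owning
B(I-1) lie on different virtual processors only when I-1 is the last
cell of a block, so the statement communicates one element per block
boundary.

      PROGRAM COMM
      INTEGER N, I
      PARAMETER (N = 100)
      REAL A(N), B(N)
!HPF$ PROCESSORS P(4)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN A(I) WITH T(I)
!HPF$ ALIGN B(I) WITH T(I)
!HPF$ DISTRIBUTE T(BLOCK) ONTO P
C     Blocks of 25 template cells: T(1:25), T(26:50), T(51:75),
C     T(76:100).  A(I) and B(I-1) are on different virtual
C     processors only for I = 26, 51 and 76: 3 communicated
C     elements for the whole statement.
      B = 1.0
      DO I = 2, N
         A(I) = B(I-1)
      ENDDO
      END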