MPI (Message Passing Interface) is a paradigm of parallel programming in which multiple processes
communicate with each other by exchanging messages; we will
also use this name for the library that provides the tools for writing
programs in this paradigm.
There are (known to me) implementations of MPI that allow coding in
FORTRAN, C and C++. These notes deal with the C++ bindings only.
There is a single program that is executed by all processes; the control flow within the code is differentiated according to the process ID, and arranging this is the programmer's job.
A group is an ordered set of processes; each process has its own ID, called its rank. Ranks are contiguous and start from 0. Groups are objects of class MPI::Group. While it is possible to access some parameters of the group directly, it is much more common to operate through an intragroup communicator associated with the group.
The processes communicate with each other through communicators, which are objects of either the intragroup communicator class MPI::Intracomm (point-to-point communication within a single group) or the intergroup communicator class MPI::Intercomm (communication between two groups of processes). Both classes are derived from the abstract base class MPI::Comm.
The total number of processes in the group associated with an
MPI::Intracomm object is returned by
int MPI::Comm::Get_size() const,
and the rank of a process within that group, which is between 0
and Get_size()-1, is returned by
int MPI::Comm::Get_rank() const.
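To illustrate the single-program model, here is a minimal sketch (mine, not from any particular source; the printed messages are arbitrary) in which every process executes the same code and branches on its rank:

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
    MPI::Init(argc, argv);                    // start the MPI environment

    int size = MPI::COMM_WORLD.Get_size();    // number of processes in the group
    int rank = MPI::COMM_WORLD.Get_rank();    // this process's rank, 0..size-1

    if (rank == 0)
        std::cout << "running on " << size << " processes" << std::endl;
    else
        std::cout << "process " << rank << " reporting" << std::endl;

    MPI::Finalize();                          // shut down MPI
    return 0;
}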
Given an intracommunicator, the
corresponding group can be partitioned
into disjoint subgroups by calling
MPI::Intracomm MPI::Intracomm::Split(int color, int key) const.
For each value of color, a new subgroup is created that
contains all the processes that supplied that color. The second argument,
key, specifies the rank that should be assigned to the process
within the new group (ties are broken by comparing the ranks in the
"old" group). A new communicator is created for each new group and is
returned by Split(). If the value of color supplied
by a process is MPI::UNDEFINED, that process will not be
included in any of the new groups.
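As an illustration, the following sketch (mine; it assumes MPI has already been initialized) splits MPI::COMM_WORLD into a subgroup of even-ranked processes and a subgroup of odd-ranked ones, keeping the original relative order:

int world_rank = MPI::COMM_WORLD.Get_rank();

// color 0 for even world ranks, 1 for odd ones; key = world_rank preserves the order
MPI::Intracomm subcomm = MPI::COMM_WORLD.Split(world_rank % 2, world_rank);

int sub_rank = subcomm.Get_rank();   // rank within the new (even or odd) group
int sub_size = subcomm.Get_size();   // size of the new group

A process that passes MPI::UNDEFINED as the color receives MPI::COMM_NULL instead of a valid communicator.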
Every point-to-point message carries an envelope that identifies it:

Field | Meaning |
---|---|
source | the rank of the sending process |
destination | the rank of the intended receiving process |
tag | an integer identifying the type of the message |
communicator | the communicator within which the message is sent |
The basic tool for sending a message is
void MPI::Comm::Send(const void* buf, int count, const MPI::Datatype& datatype, int dest, int tag) const;
where
buf is the address of the buffer with the data to be sent;
count is the number of instances of datatype to be sent;
dest is the rank of the process to which the message is sent;
tag is the tag (message type); it is an integer between 0 and MPI::TAG_UB.
The basic tool for receiving messages is
void MPI::Comm::Recv(void* buf, int max_count, const MPI::Datatype& datatype, int source, int tag, MPI::Status& status) const;
where
buf is the address where the received data should be placed (it must be able to contain at least max_count elements of type datatype);
source is the rank of the sending process;
tag is the tag of the message (Recv will block until a message with an envelope matching the values of tag and source arrives);
status (optional) contains the information on the result of the operation.
Both source and tag values allow a wildcard
specification: MPI::ANY_SOURCE and MPI::ANY_TAG, respectively.
An object of class MPI::Status allows the following queries:
The number of entries received in the call to Recv is
available through a call to
int MPI::Status::Get_count(const MPI::Datatype& datatype) const.
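Putting Send, Recv and the status queries together, here is a sketch (mine; a fragment assuming MPI has been initialized and at least two processes are running) in which process 1 sends 100 integers to process 0, which receives them with wildcards and inspects the status (Get_source() and Get_tag() are further MPI::Status queries):

int data[100];
MPI::Status status;
int rank = MPI::COMM_WORLD.Get_rank();

if (rank == 1) {
    for (int i = 0; i < 100; ++i) data[i] = i;
    // send 100 ints to process 0 with tag 7
    MPI::COMM_WORLD.Send(data, 100, MPI::INT, 0, 7);
} else if (rank == 0) {
    // receive at most 100 ints from any source, with any tag
    MPI::COMM_WORLD.Recv(data, 100, MPI::INT, MPI::ANY_SOURCE, MPI::ANY_TAG, status);
    int n   = status.Get_count(MPI::INT);   // actual number of ints received
    int src = status.Get_source();          // rank of the sender
    int tag = status.Get_tag();             // tag of the received message
    std::cout << "got " << n << " ints from " << src << " with tag " << tag << std::endl;
}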
Synchronization between the processes in a group is achieved
through a call to
void MPI::Comm::Barrier(void) const
- the calling process is blocked until all the group members have
called Barrier.
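A typical use (a sketch of mine, not from the notes; do_local_work() is a placeholder for the code being timed) is to line the processes up before and after a timed section:

MPI::COMM_WORLD.Barrier();          // make sure everyone starts timing together
double t0 = MPI::Wtime();

do_local_work();                    // placeholder for the computation being timed

MPI::COMM_WORLD.Barrier();          // wait until the slowest process is done
double elapsed = MPI::Wtime() - t0;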
The collective communications usually involve a single process (the
root) communicating with all processes in the group (itself
included). There are a number of different functionalities in such
communications. First, the root might need to "broadcast" a value to
all processes. This is done by
void MPI::Intracomm::Bcast(void* buffer, int count, const MPI::Datatype& datatype, int root) const;
(the value of root must be identical at all processes). E.g., after the
call
comm.Bcast(int_arr, 100, MPI::INT, 0)
the 100 elements of the integer array int_arr in all
processes are copies of those elements in process 0.
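A slightly fuller sketch of the same idea (mine; the variable name is arbitrary), in which only the root knows the value initially and every process has it after the call:

double tolerance;
if (MPI::COMM_WORLD.Get_rank() == 0)
    tolerance = 1.0e-6;             // only process 0 sets the value

// afterwards, tolerance equals 1.0e-6 in every process of the group
MPI::COMM_WORLD.Bcast(&tolerance, 1, MPI::DOUBLE, 0);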
In "gathering", each process (root itself included) sends a value to the root, which stores
all the incoming values:
void MPI::Intracomm::Gather(const void* sendbuf, int sendcount, const MPI::Datatype& sendtype, void* recvbuf, int recvcount, const MPI::Datatype& recvtype, int root) const;
- the semantics being that each process sends a message
(sendcount elements of type sendtype) from its
sendbuf, and the root concatenates, in rank order, all the received messages
(recvcount elements of recvtype from each process) at its recvbuf. Note that the root must
have recvbuf large enough to contain all the messages
(usually the return value of Comm::Get_size() is used in the
allocation), while the rest of the processes ignore recvbuf
and thus need not allocate it at all.
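For example, in the following sketch (mine; some_local_computation() is a placeholder) every process contributes a single double and the root collects them, in rank order, into an array sized with Get_size():

int rank = MPI::COMM_WORLD.Get_rank();
int size = MPI::COMM_WORLD.Get_size();

double my_result = some_local_computation(rank);   // placeholder per-process value

double* all_results = 0;
if (rank == 0)
    all_results = new double[size];     // only the root needs the receive buffer

// all_results[i] ends up holding the value sent by process i
MPI::COMM_WORLD.Gather(&my_result, 1, MPI::DOUBLE, all_results, 1, MPI::DOUBLE, 0);

if (rank == 0) {
    // ... use all_results ...
    delete[] all_results;
}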
An alternative, more flexible variant is provided in the form of
vector gather:
void MPI::Intracomm::Gatherv(const void* sendbuf, int sendcount, const MPI::Datatype& sendtype, void* recvbuf, const int recvcounts[], const int displs[], const MPI::Datatype& recvtype, int root) const;
Here each process can send a message of a different length: recvcounts[i] is the number of elements expected from process i, and the message from process i is placed in recvbuf starting at offset displs[i] (in units of recvtype).
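A sketch of a possible use (mine; cleanup of the allocated arrays is omitted), in which process i contributes i+1 doubles and the root packs the contributions one after another:

int rank = MPI::COMM_WORLD.Get_rank();
int size = MPI::COMM_WORLD.Get_size();

int mycount = rank + 1;                          // process i sends i+1 values
double* sendbuf = new double[mycount];
for (int i = 0; i < mycount; ++i) sendbuf[i] = rank;

int* recvcounts = 0;
int* displs = 0;
double* recvbuf = 0;
if (rank == 0) {
    recvcounts = new int[size];
    displs     = new int[size];
    int total = 0;
    for (int i = 0; i < size; ++i) {
        recvcounts[i] = i + 1;                   // how many elements to expect from process i
        displs[i]     = total;                   // where in recvbuf to place them
        total        += recvcounts[i];
    }
    recvbuf = new double[total];
}

MPI::COMM_WORLD.Gatherv(sendbuf, mycount, MPI::DOUBLE,
                        recvbuf, recvcounts, displs, MPI::DOUBLE, 0);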
These predefined operations can be used in the reduction calls (e.g., MPI::Intracomm::Reduce(), which combines a value contributed by every process in the group and delivers the result to the root):

Operation | Meaning |
---|---|
MPI::MAX | Maximum |
MPI::MIN | Minimum |
MPI::SUM | Sum |
MPI::PROD | Product |
MPI::LAND | Logical (boolean) and |
MPI::LOR | Logical or |
MPI::MINLOC | Minimum, and the location of the minimum (the rank of the process that has the min). Similar to MATLAB's [minval,minind]=min(...) |
MPI::MAXLOC | Maximum, and the location of the maximum |
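The reduction call itself is not spelled out above; a sketch (mine; compute_partial_sum() is a placeholder, and the Reduce signature follows the MPI-2 C++ bindings) of a global sum delivered to process 0:

double local_sum = compute_partial_sum();   // placeholder for a per-process contribution
double global_sum = 0.0;

// process 0 receives the sum of all local_sum values; the other processes ignore global_sum
MPI::COMM_WORLD.Reduce(&local_sum, &global_sum, 1, MPI::DOUBLE, MPI::SUM, 0);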
Some of the predefined MPI datatypes and the C++ types they correspond to are:

MPI datatype | C++ datatype |
---|---|
MPI::CHAR | signed char |
MPI::INT | signed int |
MPI::DOUBLE | double |
MPI::BOOL | bool |
It is possible to derive a new datatype from the existing ones, using
one of the constructors:
MPI::Datatype MPI::Datatype::Create_contiguous(int count) const
creates a type whose values are a contiguous replication of values of
the old type. For example,
MPI::Datatype newtype = MPI::DOUBLE.Create_contiguous(4);
makes newtype an array of 4 doubles.
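Before the new type can be used it must be committed (see below); a sketch (mine; destination rank and tag are arbitrary) of sending one such block of 4 doubles to process 1:

double block[4] = {1.0, 2.0, 3.0, 4.0};

MPI::Datatype newtype = MPI::DOUBLE.Create_contiguous(4);
newtype.Commit();                               // required before the type can be used

// one instance of newtype corresponds to four contiguous doubles
MPI::COMM_WORLD.Send(block, 1, newtype, 1, 0);
newtype.Free();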
The most general type constructor is
static MPI::Datatype MPI::Datatype::Create_struct(int count, const int array_of_blocklengths[], const MPI::Aint array_of_displacements[], const MPI::Datatype array_of_types[])
For example, let B = [2,1,4], D = [0,16,24], T = [MPI::DOUBLE, MPI::INT, MPI::CHAR]. Then the call
MPI::Datatype::Create_struct(3,B,D,T) creates a layout in memory
with a block of 2 doubles at displacement 0, one int at displacement 16, and 4 chars at displacement 24.
It is much easier to compute the displacements using the function
MPI::Aint MPI::Get_address(void* location)
e.g., in the example above it can be done as follows:
struct S {
    double a;
    double b;
    int x;
    char c[4];
};

MPI::Datatype type[3] = {MPI::DOUBLE, MPI::INT, MPI::CHAR};
int blocklen[3] = {2, 1, 4};
MPI::Aint disp[3];
struct S data[10];

// displacements are measured from the beginning of the structure
MPI::Aint base = MPI::Get_address(&data[0]);
disp[0] = MPI::Get_address(&data[0].a) - base;   // the two doubles a, b
disp[1] = MPI::Get_address(&data[0].x) - base;   // the int
disp[2] = MPI::Get_address(&data[0].c) - base;   // the four chars
MPI::Datatype newtype = MPI::Datatype::Create_struct(3, blocklen, disp, type);
newtype.Commit();
...
Before a derived type can be used in MPI communications, it should be
committed:
void MPI::Datatype::Commit(void)
and the handle can be freed, after the program is done using the type,
by calling
void MPI::Datatype::Free(void)
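Continuing the structure example above, a sketch (mine; the destination rank and tag are arbitrary) of how the committed type might be used and then released:

// send the first structure (one instance of newtype) to process 1 with tag 0
MPI::COMM_WORLD.Send(&data[0], 1, newtype, 1, 0);

// release the datatype handle once it is no longer needed
newtype.Free();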