Message Passing Interface (MPI) is a language-independent communications protocol used to program parallel computers. It supports both point-to-point communication (one process sending a message to another) and collective communication (operations such as broadcasts and reductions that involve a whole group of processes). MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation." MPI is therefore a specification, not an implementation.
MPI is not sanctioned by any major standards body; nevertheless, it has become the de facto standard for communication among processes that model a parallel program running on a distributed memory system. Such programs typically run on distributed memory supercomputers, such as computer clusters. The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs are regularly run on shared memory computers.
Designing programs around the MPI model (as opposed to explicit shared memory models) has advantages on NUMA (non-uniform memory access) architectures, because programming for MPI encourages memory locality: each process works mainly on data held in its own memory.
Most MPI implementations consist of a specific set of routines (API) callable from Fortran, C, or C++ and from any language capable of interfacing with such routine libraries. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).
MPI is often compared with PVM (Parallel Virtual Machine), a popular distributed environment and message passing system developed in 1989, which was one of the systems that motivated the need for a standard parallel message passing system.
Threaded shared memory programming models (such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be considered complementary programming approaches.
OpenMP is an application programming interface for multithreading, a method of parallelization whereby the master thread (a series of instructions executed consecutively) forks a specified number of slave threads and a task is divided among them. The threads then run concurrently, with the runtime environment allocating threads to different processors.
The section of code that is meant to run in parallel is marked with a compiler directive (#pragma omp parallel in C/C++, !$OMP PARALLEL in Fortran) that causes the threads to form before the section is executed. Each thread has an id, which can be obtained with a function call (omp_get_thread_num() in C/C++ and OMP_GET_THREAD_NUM() in Fortran). The thread id is an integer, and the master thread has an id of 0. After the parallelized code has executed, the threads join back into the master thread, which continues onward to the end of the program. The number of threads can be set either statically, through environment variables such as OMP_NUM_THREADS, or dynamically, through a function call such as omp_set_num_threads().
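The fork-join model and thread ids can be sketched in C as follows. This assumes a compiler with OpenMP enabled (for example, gcc -fopenmp); the #ifdef _OPENMP guards let the same file also build as an ordinary serial program. The function name fork_join_demo is hypothetical; omp_set_num_threads, omp_get_thread_num and omp_get_num_threads are standard OpenMP runtime routines.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: forks a team of up to `requested` threads
 * (requested must be positive), lets each thread print its id, and
 * returns the size of the team that actually ran. */
int fork_join_demo(int requested)
{
    int team_size = 1;               /* serial build: a single thread   */
#ifdef _OPENMP
    omp_set_num_threads(requested);  /* dynamic choice via function call */
#else
    (void)requested;
#endif

    #pragma omp parallel             /* fork: the team forms here */
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();   /* the master thread has id 0 */
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        printf("hello from thread %d\n", id);  /* order is unspecified */
    }                                /* join: implicit barrier, then only
                                        the master thread continues      */
    return team_size;
}
```

Without the omp_set_num_threads call, the OMP_NUM_THREADS environment variable would decide the team size instead. The runtime may create fewer threads than requested, but not more.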
By default, each thread executes the parallelized section of code independently. Work-sharing constructs, such as the loop construct (omp for in C/C++) and omp sections, divide a task among the threads so that each thread executes only its allocated part of the code. In this way, OpenMP supports both task parallelism and data parallelism.
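Both kinds of work-sharing can be sketched with the standard omp for and omp sections constructs; the function names scale and min_and_max are hypothetical. Built with OpenMP enabled (for example, gcc -fopenmp), the work is split across threads; built without it, the pragmas are ignored and both functions run serially with the same results.

```c
/* Data parallelism: inside the parallel region, `omp for` splits the
 * loop iterations among the team, so each thread scales its own slice. */
void scale(double *x, int n, double factor)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            x[i] *= factor;
    }
}

/* Task parallelism: `omp sections` hands each independent section to
 * (possibly) a different thread. Assumes n >= 1. */
void min_and_max(const int *a, int n, int *min_out, int *max_out)
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                int m = a[0];
                for (int i = 1; i < n; i++)
                    if (a[i] < m) m = a[i];
                *min_out = m;
            }
            #pragma omp section
            {
                int m = a[0];
                for (int i = 1; i < n; i++)
                    if (a[i] > m) m = a[i];
                *max_out = m;
            }
        }
    }
}
```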
Pros and Cons of OpenMP
Pros:
• considered by some to be easier to program and debug than MPI
• data layout and decomposition are handled automatically by directives
• allows incremental parallelism: directives can be added to one portion of the program after another, so no dramatic change to the code is needed
• unified code for both serial and parallel applications: OpenMP constructs are ignored (treated as comments) when a compiler without OpenMP support is used
• original (serial) code statements need not, in general, be modified when parallelized with OpenMP, which reduces the chance of inadvertently introducing bugs and eases maintenance
• both coarse-grained and fine-grained parallelism are possible
Cons:
• currently runs efficiently only on shared-memory multiprocessor platforms
• requires a compiler that supports OpenMP
• scalability is limited by the memory architecture
• lacks reliable error handling
• lacks fine-grained mechanisms to control thread-processor mapping
• does not allow synchronization between subsets of threads
• mostly used for loop parallelization
• can be difficult to debug, because threads communicate implicitly through shared variables (for example, an unprotected shared variable can cause a race condition)
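Two of the points above, unified serial/parallel code and the shared-variable pitfall, show up in a small sketch (the function name sum_array is hypothetical). The only OpenMP-specific line is the pragma, so a compiler without OpenMP support simply builds the serial loop.

```c
/* Sums n integers. With OpenMP enabled the loop is split across threads;
 * without it, the pragma is ignored and the code is an ordinary loop. */
long sum_array(const int *a, int n)
{
    long sum = 0;
    /* Without reduction(+:sum), all threads would update the single
     * shared `sum` concurrently: a data race that can silently produce
     * wrong totals. The reduction clause gives each thread a private
     * copy and adds the copies together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```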
Pros and Cons of MPI
Pros:
• does not require shared memory architectures, which are more expensive than distributed memory architectures
• can be used on a wider range of problems, since it exploits both task parallelism and data parallelism
• can run on both shared memory and distributed memory architectures
• highly portable, with implementations specifically optimized for most hardware
Cons:
• requires more programming changes to go from a serial to a parallel version
• can be harder to debug