
Parallel Programming with MPI
- 1st Edition - October 1, 1996
- Imprint: Morgan Kaufmann
- Author: Peter Pacheco
- Language: English
- Paperback ISBN: 978-1-55860-339-4
- eBook ISBN: 978-0-08-051354-6
A hands-on introduction to parallel programming based on the Message-Passing Interface (MPI) standard, the de facto industry standard adopted by major vendors of commercial parallel systems. This textbook/tutorial, based on the C language, contains many fully-developed examples and exercises. The complete source code for the examples is available in both C and Fortran 77. Students and professionals will find that the portability of MPI, combined with a thorough grounding in parallel programming principles, will allow them to program any parallel system, from a network of workstations to a parallel supercomputer.
* Proceeds from basic blocking sends and receives to the most esoteric aspects of MPI (a brief C sketch of the blocking style appears after this list).
* Includes extensive coverage of performance and debugging.
* Discusses a variety of approaches to the problem of basic I/O on parallel machines.
* Provides exercises and programming assignments.
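The first bullet refers to the blocking MPI_Send/MPI_Recv calls on which the early chapters (in particular Chapter 3, "Greetings!") are built. As a rough illustration only, here is a minimal C sketch in that style; it is written against the MPI standard rather than reproduced from the book's source code, and the variable names are illustrative.

    /* Minimal blocking send/receive sketch (illustrative only, not the book's code).
     * Every process with nonzero rank sends a greeting to process 0,
     * which receives the messages in rank order and prints them. */
    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char* argv[]) {
        int         my_rank;        /* rank of this process        */
        int         p;              /* total number of processes   */
        int         source;         /* rank of the sending process */
        int         tag = 0;        /* message tag                 */
        char        message[100];   /* storage for the message     */
        MPI_Status  status;         /* return status for receive   */

        MPI_Init(&argc, &argv);                    /* start up MPI           */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);   /* find out my rank       */
        MPI_Comm_size(MPI_COMM_WORLD, &p);         /* find out process count */

        if (my_rank != 0) {
            sprintf(message, "Greetings from process %d!", my_rank);
            /* blocking send to process 0 */
            MPI_Send(message, strlen(message) + 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
        } else {
            for (source = 1; source < p; source++) {
                /* blocking receive from each other process in turn */
                MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
                printf("%s\n", message);
            }
        }

        MPI_Finalize();                            /* shut down MPI */
        return 0;
    }

With a typical MPI installation, a program like this is compiled with mpicc and launched on several processes with mpirun or mpiexec; Chapter 3 of the book walks through the program, its execution, and the meaning of each call.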
Foreword
Preface
Chapter 1 Introduction
1.1 The Need for More Computational Power
1.2 The Need for Parallel Computing
1.3 The Bad News
1.4 MPI
1.5 The Rest of the Book
1.6 Typographic Conventions
Chapter 2 An Overview of Parallel Computing
2.1 Hardware
2.1.1 Flynn's Taxonomy
2.1.2 The Classical von Neumann Machine
2.1.3 Pipeline and Vector Architectures
2.1.4 SIMD Systems
2.1.5 General MIMD Systems
2.1.6 Shared-Memory MIMD
2.1.7 Distributed-Memory MIMD
2.1.8 Communication and Routing
2.2 Software Issues
2.2.1 Shared-Memory Programming
2.2.2 Message Passing
2.2.3 Data-Parallel Languages
2.2.4 RPC and Active Messages
2.2.5 Data Mapping
2.3 Summary
2.4 References
2.5 Exercises
Chapter 3 Greetings!
3.1 The Program
3.2 Execution
3.3 MPI
3.3.1 General MPI Programs
3.3.2 Finding Out about the Rest of the World
3.3.3 Message: Data + Envelope
3.3.4 Sending Messages
3.4 Summary
3.5 References
3.6 Exercises
3.7 Programming Assignment
Chapter 4 An Application: Numerical Integration
4.1 The Trapezoidal Rule
4.2 Parallelizing the Trapezoidal Rule
4.3 I/O on Parallel Systems
4.4 Summary
4.5 References
4.6 Exercises
4.7 Programming Assignments
Chapter 5 Collective Communication
5.1 Tree-Structured Communication
5.2 Broadcast
5.3 Tags, Safety, Buffering, and Synchronization
5.4 Reduce
5.5 Dot Product
5.6 Allreduce
5.7 Gather and Scatter
5.9 Summary
5.10 References
5.11 Exercises
5.12 Programming Assignments
Chapter 6 Grouping Data for Communication
6.1 The Count Parameter
6.2 Derived Types and MPI_Type_struct
6.3 Other Derived Datatype Constructors
6.4 Type Matching
6.5 Pack/Unpack
6.6 Deciding Which Method to Use
6.7 Summary
6.8 References
6.9 Exercises
6.10 Programming Assignments
Chapter 7 Communicators and Topologies
7.1 Matrix Multiplication
7.2 Fox's Algorithm
7.3 Communicators
7.4 Working with Groups, Contexts, and Communicators
7.5 MPI_Comm_split
7.6 Topologies
7.7 MPI_Cart_sub
7.8 Implementation of Fox's Algorithm
7.9 Summary
7.10 References
7.11 Exercises
7.12 Programming Assignments
Chapter 8 Dealing with I/O
8.1 Dealing with stdin, stdout, and stderr
8.1.1 Attribute Caching
8.1.2 Callback Functions
8.1.3 Identifying the I/O Process Rank
8.1.4 Caching an I/O Process Rank
8.1.5 Retrieving the I/O Process Rank
8.1.6 Reading from stdin
8.1.7 Writing to stdout
8.1.8 Writing to stderr and Error Checking
8.2 Limited Access to stdin
8.3 File I/O
8.4 Array I/O
8.4.1 Data Distributions
8.4.2 Model Problem
8.4.3 Distribution of the Input
8.4.4 Derived Datatypes
8.4.5 The Extent of a Derived Datatype
8.4.6 The Input Code
8.4.7 Printing the Array
8.4.8 An Example
8.5 Summary
8.6 References
8.7 Exercises
8.8 Programming Assignments
Chapter 9 Debugging Your Program
9.1 Quick Review of Serial Debugging
9.1.1 Examine the Source Code
9.1.2 Add Debugging Output
9.1.3 Use a Debugger
9.2 More on Serial Debugging
9.3 Parallel Debugging
9.4 Nondeterminism
9.5 An Example
9.5.1 The Program?
9.5.2 Debugging The Program
9.5.3 A Brief Discussion of Parallel Debuggers
9.5.4 The Old Standby: printf/fflush
9.5.5 The Classical Bugs in Parallel Programs
9.5.6 First Fix
9.5.7 Many Parallel Programming Bugs Are Really Serial Programming Bugs
9.5.8 Different Systems, Different Errors
9.5.9 Moving to Multiple Processes
9.5.10 Confusion about I/O
9.5.11 Finishing Up
9.6 Error Handling in MPI
9.7 Summary
9.8 References
9.9 Exercises
9.10 Programming Assignments
Chapter 10 Design and Coding of Parallel Programs
10.1 Data-Parallel Programs
10.2 Jacobi's Method
10.3 Parallel Jacobi's Method
10.4 Coding Parallel Programs
10.5 An Example: Sorting
10.5.1 Main Program
10.5.2 The "Input" Functions
10.5.3 All-to-all Scatter/Gather
10.5.4 Redistributing the Keys
10.5.5 Pause to Clean Up
10.5.6 Find_alltoall_send_params
10.5.7 Finishing Up
10.6 Summary
10.7 References
10.8 Exercises
10.9 Programming Assignments
Chapter 11 Performance
11.1 Serial Program Performance
11.2 An Example: The Serial Trapezoidal Rule
11.3 What about the I/O?
11.4 Parallel Program Performance Analysis
11.5 The Cost of Communication
11.6 An Example: The Parallel Trapezoidal Rule
11.7 Taking Timings
11.8 Summary
11.9 References
11.10 Exercises
11.11 Programming Assignments
Chapter 12 More on Performance
12.1 Amdahl's Law
12.2 Work and Overhead
12.3 Sources of Overhead
12.4 Scalability
12.5 Potential Problems in Estimating Performance
12.5.1 Networks of Workstations and Resource Contention
12.5.2 Load Balancing and Idle Time
12.5.3 Overlapping Communication and Computation
12.5.4 Collective Communication
12.6 Performance Evaluation Tools
12.6.1 MPI's Profiling Interface
12.6.2 Upshot
12.7 Summary
12.8 References
12.9 Exercises
12.10 Programming Assignments
Chapter 13 Advanced Point-to-Point Communication
13.1 An Example: Coding Allgather
13.1.1 Function Parameters
13.1.2 Ring Pass Allgather
13.2 Hypercubes
13.2.1 Additional Issues in the Hypercube Exchange
13.2.2 Details of the Hypercube Algorithm
13.3 Send-receive
13.4 Null Processes
13.5 Nonblocking Communication
13.5.1 Ring Allgather with Nonblocking Communication
13.5.2 Hypercube Allgather with Nonblocking Communication
13.6 Persistent Communication Requests
13.7 Communication Modes
13.7.1 Synchronous Mode
13.7.2 Ready Mode
13.7.3 Buffered Mode
13.8 The Last Word on Point-to-Point Communication
13.9 Summary
13.10 References
13.11 Exercises
13.12 Programming Assignments
Chapter 14 Parallel Algorithms
14.1 Designing a Parallel Algorithm
14.2 Sorting
14.3 Serial Bitonic Sort
14.4 Parallel Bitonic Sort
14.5 Tree Searches and Combinatorial Optimization
14.6 Serial Tree Search
14.7 Parallel Tree Search
14.7.1 Par_dfs
14.7.2 Service_requests
14.7.3 Work_remains
14.7.4 Distributed Termination Detection
14.8 Summary
14.9 References
14.10 Exercises
14.11 Programming Assignments
Chapter 15 Parallel Libraries
15.1 Using Libraries: Pro and Con
15.2 Using More than One Language
15.3 ScaLAPACK
15.4 An Example of a ScaLAPACK Program
15.5 PETSc
15.6 A PETSc Example
15.7 Summary
15.8 References
15.9 Exercises
15.10 Programming Assignments
Chapter 16 Wrapping Up
16.1 Where to Go from Here
16.2 The Future of MPI
Appendix A Summary of MPI Commands
A.1 Point-to-Point Communication Functions
A.1.1 Blocking Sends and Receives
A.1.2 Communication Modes
A.1.3 Buffer Allocation
A.1.4 Nonblocking Communication
A.1.5 Probe and Cancel
A.1.6 Persistent Communication Requests
A.1.7 Send-receive
A.2 Derived Datatypes and MPI_Pack/Unpack
A.2.1 Derived Datatypes
A.2.2 MPI_Pack and MPI_Unpack
A.3 Collective Communication Functions
A.3.1 Barrier and Broadcast
A.3.2 Gather and Scatter
A.3.3 Reduction Operations
A.4 Groups, Contexts, and Communicators
A.4.1 Group Management
A.4.2 Communicator Management
A.4.3 Inter-communicators
A.4.4 Attribute Caching
A.5 Process Topologies
A.5.1 General Topology Functions
A.5.2 Cartesian Topology Management
A.5.3 Graph Topology Management
A.6 Environmental Management
A.6.1 Implementation Information
A.6.2 Error Handling
A.6.3 Timers
A.6.4 Startup
A.7 Profiling
A.8 Constants
A.9 Type Definitions
Appendix B MPI on the Internet
B.1 Implementations of MPI
B.2 The MPI FAQ
B.3 MPI Web Pages
B.4 MPI Newsgroup
B.5 MPI-2 and MPI-IO
B.6 Parallel Programming with MPI
Bibliography
Index
- Edition: 1
- Published: October 1, 1996
- Imprint: Morgan Kaufmann
- No. of pages: 500
- Language: English
- Paperback ISBN: 9781558603394
- eBook ISBN: 9780080513546
Peter Pacheco
Peter Pacheco received a PhD in mathematics from Florida State University. After completing graduate school, he became one of the first professors in UCLA’s “Program in Computing,” which teaches basic computer science to students at the College of Letters and Sciences there. Since leaving UCLA, he has been on the faculty of the University of San Francisco. At USF Peter has served as chair of the computer science department and is currently chair of the mathematics department.
His research is in parallel scientific computing. He has worked on the development of parallel software for circuit simulation, speech recognition, and the simulation of large networks of biologically accurate neurons. Peter has been teaching parallel computing at both the undergraduate and graduate levels for nearly twenty years. He is the author of Parallel Programming with MPI, published by Morgan Kaufmann Publishers.
Affiliations and expertise
University of San Francisco, USA