LIMITED OFFER
Save 50% on book bundles
Immediately download your ebook while waiting for your print delivery. No promo code needed.
Structured Parallel Programming offers the simplest way for developers to learn patterns for high-performance parallel programming. Written by parallel computing experts and indus… Read more
LIMITED OFFER
Immediately download your ebook while waiting for your print delivery. No promo code needed.
Structured Parallel Programming offers the simplest way for developers to learn patterns for high-performance parallel programming. Written by parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders, this book explains how to design and implement maintainable and efficient parallel algorithms using a composable, structured, scalable, and machine-independent approach to parallel computing. It presents both theory and practice, and provides detailed concrete examples using multiple programming models.
The examples in this book are presented using two of the most popular and cutting edge programming models for parallel programming: Threading Building Blocks, and Cilk Plus. These architecture-independent models enable easy integration into existing applications, preserve investments in existing code, and speed the development of parallel applications. Examples from realistic contexts illustrate patterns and themes in parallel algorithm design that are widely applicable regardless of implementation technology.
Software developers, computer programmers, and software architects will find this book extremely helpful.
Software Developers, Computer Programmers, Software Architects
Listings
Preface
Preliminaries
Chapter 1. Introduction
1.1 Think Parallel
1.2 Performance
1.3 Motivation: Pervasive Parallelism
1.4 Structured Pattern-Based Programming
1.5 Parallel Programming Models
1.6 Organization of this Book
1.7 Summary
Chapter 2. Background
2.1 Vocabulary and Notation
2.2 Strategies
2.3 Mechanisms
2.4 Machine Models
2.5 Performance Theory
2.6 Pitfalls
2.7 Summary
PART I. Patterns
Chapter 3. Patterns
3.1 Nesting Pattern
3.2 Structured Serial Control Flow Patterns
3.3 Parallel Control Patterns
3.4 Serial Data Management Patterns
3.5 Parallel Data Management Patterns
3.6 Other Parallel Patterns
3.7 Non-Deterministic Patterns
3.8 Programming Model Support for Patterns
3.9 Summary
Chapter 4. Map
4.1 Map
4.2 Scaled Vector Addition (SAXPY)
4.3 Mandelbrot
4.4 Sequence of Maps versus Map of Sequence
4.5 Comparison of Parallel Models
4.6 Related Patterns
4.7 Summary
Chapter 5. Collectives
5.1 Reduce
5.2 Fusing Map and Reduce
5.3 Dot Product
5.4 Scan
5.5 Fusing Map and Scan
5.6 Integration
5.7 Summary
Chapter 6. Data Reorganization
6.1 Gather
6.2 Scatter
6.3 Converting Scatter to Gather
6.4 Pack
6.5 Fusing Map and Pack
6.6 Geometric Decomposition and Partition
6.7 Array of Structures vs. Structures of Arrays
6.8 Summary
Chapter 7. Stencil and Recurrence
7.1 Stencil
7.2 Implementing Stencil with Shift
7.3 Tiling Stencils for Cache
7.4 Optimizing Stencils for Communication
7.5 Recurrence
7.6 Summary
Chapter 8. Fork–Join
8.1 Definition
8.2 Programming Model Support for Fork–Join
8.3 Recursive Implementation of Map
8.4 Choosing Base Cases
8.5 Load Balancing
8.6 Complexity of Parallel Divide-and-Conquer
8.7 Karatsuba Multiplication of Polynomials
8.8 Cache Locality and Cache-Oblivious Algorithms
8.9 Quicksort
8.10 Reductions and Hyperobjects
8.11 Implementing Scan with Fork–Join
8.12 Applying Fork–Join to Recurrences
8.13 Summary
Chapter 9. Pipeline
9.1 Basic Pipeline
9.2 Pipeline with Parallel Stages
9.3 Implementation of a Pipeline
9.4 Programming Model Support for Pipelines
9.5 More General Topologies
9.6 Mandatory versus Optional Parallelism
9.7 Summary
PART II. Examples
Chapter 10. Forward Seismic Simulation
10.1 Background
10.2 Stencil Computation
10.3 Impact of Caches on Arithmetic Intensity
10.4 Raising Arithmetic Intensity with Space–Time Tiling
10.5 Cilk Plus Code
10.6 ArBB Implementation
10.7 Summary
Chapter 11. K-Means Clustering
11.1 Algorithm
11.2 K-Means with Cilk Plus
11.3 K-Means with TBB
11.4 Summary
Chapter 12. Bzip2 Data Compression
12.1 The Bzip2 Algorithm
12.2 Three-Stage Pipeline Using TBB
12.3 Four-Stage Pipeline Using TBB
12.4 Three-Stage Pipeline Using Cilk Plus
12.5 Summary
Chapter 13. Merge Sort
13.1 Parallel Merge
13.2 Parallel Merge Sort
13.3 Summary
Chapter 14. Sample Sort
14.1 Overall Structure
14.2 Choosing the Number of Bins
14.3 Binning
14.4 Repacking and Subsorting
14.5 Performance Analysis of Sample Sort
14.6 For C++ Experts
14.7 Summary
Chapter 15. Cholesky Factorization
15.1 Fortran Rules!
15.2 Recursive Cholesky Decomposition
15.3 Triangular Solve
15.4 Symmetric Rank Update
15.5 Where is the Time Spent?
15.6 Summary
APPENDIX A. Further Reading
APPENDIX B. Cilk Plus
APPENDIX C. TBB
APPENDIX D. C++11
APPENDIX E. Glossary
Bibliography
Index
MM
JR
AR