Programming Massively Parallel Processors
A Hands-on Approach
- 5th Edition - February 27, 2026
- Latest edition
- Authors: Wen-mei W. Hwu, David B. Kirk, Izzat El Hajj
- Language: English
Programming Massively Parallel Processors: A Hands-on Approach, Fifth Edition, shows students and professionals alike the basic concepts of parallel programming and GPU architecture. Concise, intuitive, and practical, it is based on years of road-testing in the authors' own parallel computing courses. Various techniques for constructing and optimizing parallel programs are explored in detail, while case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs. This new edition has been updated with an expanded repertoire of optimizations, new patterns and applications, and more coverage of important CUDA features.
· Expanded optimization checklist with a more comprehensive demonstration of essential optimizations across patterns
· New pattern and application chapters including: filtering, wavefront parallelism, advanced optimizations for matrix multiplication, and large language models (LLMs)
· More coverage of important CUDA features including warp-level programming, cooperative groups, CUDA C++ atomics, and multi-GPU programming with NCCL and NVSHMEM
Upper-level undergraduate through graduate level students studying parallel computing within computer science or engineering
1. Introduction
Part I. Fundamental Concepts
2. Heterogeneous data parallel computing
3. Multidimensional grids and data
4. Compute architecture and scheduling
5. Memory architecture and data locality
6. Performance considerations
Part II. Parallel Patterns
7. Convolution
8. Stencil
9. Parallel histogram
10. Reduction
11. Prefix sum (scan)
12. Merge
Part III. Advanced Patterns and Applications
13. Sorting
14. Filtering (new)
15. Sparse matrix computation
16. Wavefront algorithms (new)
17. Graph traversal
18. Deep learning
19. Multi-GPU API (new)
20. Electrostatic potential map
21. Parallel programming and computational thinking
Part IV. Advanced Practices
22. Programming a heterogeneous computing cluster
23. Advanced optimizations for matrix multiplication (new)
24. Advanced practices and future evolution
25. Conclusion and outlook
Wen-mei W. Hwu
Wen-mei W. Hwu is a Senior Director of Research at NVIDIA and the Sanders-AMD Endowed Chair Professor Emeritus of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. His work focuses on parallel computing, covering architecture, implementation, compilers, and algorithms. Dr. Hwu has received numerous honors, including the ACM/IEEE Eckert-Mauchly Award, the ACM Grace Murray Hopper Award, and the IEEE B.R. Rau Award. He is an IEEE and ACM Fellow. He earned his Ph.D. in Computer Science from UC Berkeley.
Affiliations and expertise
CTO, MulticoreWare and professor specializing in compiler design, computer architecture, microarchitecture, and parallel processing, University of Illinois at Urbana-Champaign, USA
David B. Kirk
David B. Kirk is known for major contributions to graphics, hardware, and algorithms. Before pursuing his Ph.D. at Caltech, he earned B.S. and M.S. degrees in mechanical engineering from MIT and worked at Raster Technologies and Hewlett-Packard’s Apollo Systems Division. After completing his doctorate, he served as chief scientist and head of technology at Crystal Dynamics. In 1997, he became Chief Scientist at NVIDIA. Dr. Kirk has received numerous honors, including the IEEE Seymour Cray Computer Engineering Award and the ACM SIGGRAPH Computer Graphics Achievement Award. He is a member of the U.S. National Academy of Engineering.
Affiliations and expertise
NVIDIA Fellow
Izzat El Hajj
Izzat El Hajj is an Assistant Professor of Computer Science at the American University of Beirut. His research focuses on leveraging accelerator architectures to tackle challenging computations, with a focus on GPU computing, processing-in-memory, and performance modeling. He earned his Ph.D. in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He has received the Dan Vivoli Endowed Fellowship (UIUC) and the Distinguished Graduate Award from the American University of Beirut.
Affiliations and expertise
Assistant Professor, Department of Computer Science, American University of Beirut, Lebanon