GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book offers the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video and image processing.
This book is intended to help those who face the challenge of programming systems that use GPUs effectively to meet efficiency and performance goals. It offers developers a window into diverse application areas, and the opportunity to gain insights from others' algorithm work that they may apply to their own projects. Readers will learn from the leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. The insights, ideas, and practical hands-on skills in the book can be put to use immediately.
Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource. For useful source code discussed throughout the book, the editors invite readers to the following website: http://gpugems.hwu-server2.crhc.illinois.edu…
Editors, Reviewers, and Authors
Introduction
Introduction
Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals
1.1. Introduction, Problem Statement, and Context
1.2. Core Method
1.3. Algorithms, Implementations, and Evaluations
1.4. Final Evaluation
1.5. Future Directions
Chapter 2. Large-Scale Chemical Informatics on GPUs
2.1. Introduction, Problem Statement, and Context
2.2. Core Methods
2.3. Gaussian Shape Overlay: Parallelization and Arithmetic Optimization
2.4. LINGO: Algorithmic Transformation and Memory Optimization
2.5. Final Evaluation
2.6. Future Directions
Chapter 3. Dynamical Quadrature Grids
3.1. Introduction
3.2. Core Method
3.3. Implementation
3.4. Performance Improvement
3.5. Future Work
Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs
4.1. Introduction, Problem Statement, and Context
4.2. Core Method
4.3. Algorithms, Implementations, and Evaluations
4.4. Final Evaluation
4.5. Future Directions
Chapter 5. Quantum Chemistry
5.1. Problem Statement
5.2. Core Technology and Algorithm
5.3. The Key Insight on the Implementation—the Choice of Building Blocks
5.4. Final Evaluation and Benefits
5.5. Conclusions and Future Directions
Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm
6.1. Introduction, Problem Statement, and Context
6.2. Core Methods
6.3. Algorithms and Implementations
6.4. Evaluation and Validation of Results, Total Benefits, and Limitations
6.5. Future Directions
Chapter 7. Leveraging the Untapped Computation Power of GPUs
7.1. Background and Problem Statement
7.2. Flux Calculation and Aggregation
7.3. The GRASSY Platform
7.4. Initial Testing
7.5. Impact and Future Directions
Chapter 8. Black Hole Simulations with CUDA
8.1. Introduction
8.2. The Post-Newtonian Approximation
8.3. Numerical Algorithm
8.4. GPU Implementation
8.5. Performance Results
8.6. GPU Supercomputing Clusters
8.7. Statistical Results for Black Hole Inspirals
8.8. Conclusion
Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA
9.1. Introduction
9.2. Fast N-Body Simulation
9.3. CUDA Implementation of the Fast N-Body Algorithms
9.4. Improvements of Performance
9.5. Detailed Description of the GPU Kernels
9.6. Overview of Advanced Techniques
9.7. Conclusions
Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures
10.1. Introduction, Problem Statement, and Context
10.2. Core Method
10.3. Algorithms, Implementations, and Evaluations
10.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
10.5. Conclusions and Future Directions
Introduction
Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm
11.1. Introduction, Problem Statement, and Context
11.2. Core Method
11.3. CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins
11.4. Discussion
11.5. Final Evaluation
Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching
12.1. Introduction, Problem Statement, and Context
12.2. Core Methods
12.3. Algorithms, Implementations, and Evaluations
12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
12.5. Future Directions
Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching
13.1. Introduction, Problem Statement, and Context
13.2. Core Method
13.3. Algorithms, Implementations, and Evaluations
13.4. Final Evaluation
13.5. Future Direction
Chapter 14. GPU Accelerated RNA Folding Algorithm
14.1. Problem Statement
14.2. Core Method
14.3. Algorithms, Implementations, and Evaluations
14.4. Final Evaluation
14.5. Future Directions
Chapter 15. Temporal Data Mining for Neuroscience
15.1. Introduction
15.2. Core Methodology
15.3. GPU Parallelization: Algorithms and Implementations
15.4. Experimental Results
15.5. Discussion
Introduction
Chapter 16. Parallelization Techniques for Random Number Generators
16.1. Introduction
16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a
16.3. Sobol Generator
16.4. Mersenne Twister MT19937
16.5. Performance Benchmarks
Chapter 17. Monte Carlo Photon Transport on the GPU
17.1. Physics of Photon Transport
17.2. Photon Transport on the GPU
17.3. The Complete System
17.4. Results and Evaluation
17.5. Future Directions
Chapter 18. High-Performance Iterated Function Systems
18.1. Problem Statement and Mathematical Background
18.2. Core Technology
18.3. Implementation
18.4. Final Evaluation
18.5. Conclusion
Introduction
Chapter 19. Large-Scale Machine Learning
19.1. Introduction
19.2. Core Technology
19.3. GPU Algorithm and Implementation
19.4. Improvements of Performance
19.5. Conclusions and Future Work
Chapter 20. Multiclass Support Vector Machine
20.1. Introduction, Problem Statement, and Context
20.2. Core Method
20.3. Algorithms, Implementations, and Evaluations
20.4. Final Evaluation
20.5. Future Direction
Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA
21.1. Introduction, Problem Statement, and Context
21.2. Final Evaluation and Validation of Results
21.3. Conclusions, Benefits and Limitations, and Future Work
Chapter 22. GPU-Accelerated Ant Colony Optimization
22.1. Introduction, Problem Statement, and Context
22.2. Core Method
22.3. Algorithms, Implementations, and Evaluations
22.4. Final Evaluation
22.5. Future Direction
Introduction
Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs
23.1. Introduction
23.2. Simulator Overview
23.3. Compilation and Simulation
23.4. Experimental Results
23.5. Future Directions
Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization
24.1. Introduction, Problem Statement, and Context
24.2. Core Method
24.3. Algorithms, Implementations, and Evaluations
24.4. Final Evaluation
24.5. Future Direction
Introduction
Chapter 25. Lattice Boltzmann Lighting Models
25.1. Introduction, Problem Statement, and Context
25.2. Core Methods
25.3. Algorithms, Implementation, and Evaluation
25.4. Final Evaluation
25.5. Future Directions
25.6. Derivation of the Diffusion Equation
Chapter 26. Path Regeneration for Random Walks
26.1. Introduction
26.2. Path Tracing as Case Study
26.3. Random Walks in Path Tracing
26.4. Implementation Details
26.5. Results
26.6. Discussion
Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation
27.1. System Overview
27.2. Background
27.3. Core Technology and Algorithms
27.4. Future Directions
Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency
28.1. Introduction, Problem Statement, and Context
28.2. Core Method
28.3. Algorithms, Implementations, and Evaluations
28.4. Final Evaluation
28.5. Future Direction
Introduction
Chapter 29. Fast Graph Cuts for Computer Vision
29.1. Introduction, Problem Statement, and Context
29.2. Core Method
29.3. Algorithms, Implementations, and Evaluations
29.4. Final Evaluation and Validation of Results
29.5. Multilabel Graph Cuts
Chapter 30. Visual Saliency Model on Multi-GPU
30.1. Introduction
30.2. Visual Saliency Model
30.3. GPU Implementation
30.4. Results
30.5. Conclusion
Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows
31.1. Introduction, Problem Statement, and Context
31.2. Core Method
Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU
32.1. Introduction
32.2. Methods
32.3. Implementation
32.4. Results and Discussion
32.5. Conclusion and Future Work
Chapter 33. Haar Classifiers for Object Detection with CUDA
33.1. Introduction
33.2. Viola-Jones Object Detection Retrospective
33.3. Object Detection Pipeline with NVIDIA CUDA
33.4. Benchmarking and Implementation Details
33.5. Future Direction
33.6. Conclusion
Introduction
Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL
34.1. Introduction, Problem Statement, and Background
34.2. Core Technology or Algorithm
34.3. Key Insights from Implementation and Evaluation
34.4. Final Evaluation
Chapter 35. Connected Component Labeling in CUDA
35.1. Introduction
35.2. Core Algorithm
35.3. CUDA Algorithm and Implementation
35.4. Final Evaluation and Results
Chapter 36. Image De-Mosaicing
36.1. Introduction, Problem Statement, and Context
36.2. Core Method
36.3. Algorithms, Implementations, and Evaluations
36.4. Final Evaluation
Introduction
Chapter 37. Efficient Automatic Speech Recognition on the GPU
37.1. Introduction, Problem Statement, and Context
37.2. Core Methods
37.3. Algorithms, Implementations, and Evaluations
37.4. Conclusion and Future Directions
Chapter 38. Parallel LDPC Decoding
38.1. Introduction, Problem Statement, and Context
38.2. Core Technology
38.3. Algorithms, Implementations, and Evaluations
38.4. Final Evaluation
38.5. Future Directions
Chapter 39. Large-Scale Fast Fourier Transform
39.1. Introduction
39.2. Memory Hierarchy of GPU Clusters
39.3. Large-Scale Fast Fourier Transform
39.4. Algebraic Manipulation of Array Dimensions
39.5. Performance Results
39.6. Conclusion and Future Work
Introduction
Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis
40.1. Introduction
40.2. Digital Breast Tomosynthesis
40.3. Accelerating Iterative DBT Using GPUs
40.4. Conclusions
Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU
41.1. Introduction, Problem, and Context
41.2. Core Methods
41.3. Algorithms, Implementations, and Evaluations
41.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
41.5. Related Work
41.6. Future Directions
41.7. Summary
Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA
42.1. Introduction
42.2. Core Methods
42.3. Implementation
42.4. Evaluation and Validation of Results, Total Benefits, and Limitations
42.5. Future Directions
Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms
43.1. Introduction, Problem Statement, and Context
43.2. Core Method(s)
43.3. Algorithms, Implementations, and Evaluations
43.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
43.5. Future Directions
Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation
44.1. Introduction
44.2. Core Method: Advanced Image Reconstruction Toolbox for MRI
44.3. MRI Reconstruction Algorithms and Implementation on GPUs
44.4. Final Results and Evaluation
44.5. Conclusion and Future Directions
Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction
45.1. Introduction, Problem Statement, and Context
45.2. Core Methods (High Level Description)
45.3. Algorithms, Implementations, and Evaluations (Detailed Description)
45.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
45.5. Discussion and Conclusion
Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters
46.1. Introduction
46.2. Core Methods
46.3. Implementation
46.4. Results
46.5. Future Directions
46.6. Acknowledgments
Chapter 47. Deformable Volumetric Registration Using B-Splines
47.1. Introduction
47.2. An Overview of B-Spline Registration
47.3. Implementation Details
47.4. Results
47.5. Conclusions
Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs
48.1. Introduction, Problem Statement, and Context
48.2. Core Methods
48.3. Algorithms, Implementations, and Evaluations
48.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
48.5. Future Directions
Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs
49.1. Introduction
49.2. Core Methods
49.3. Implementation
49.4. Results
49.5. Future Directions
Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA
50.1. Introduction, Problem Statement, and Context
50.2. Core Methods
50.3. Algorithms, Implementations, and Evaluations
50.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
50.5. Future Directions
Index