GPU Computing Gems, Jade Edition, offers hands-on, proven techniques for general purpose GPU programming based on the successful application experiences of leading researchers and developers. One of the few resources available that distills the best practices of the community of CUDA programmers, this second edition contains 100% new material of interest across industry, including finance, medicine, imaging, engineering, gaming, environmental science, and green computing. It covers new tools and frameworks for productive GPU computing application development and provides immediate benefit to researchers developing improved programming environments for GPUs.
Divided into six sections, this book explains how GPU execution is achieved with algorithm implementation techniques and approaches to data structure layout. More specifically, it considers three general requirements: a high level of parallelism, coherent memory access by threads within warps, and coherent control flow within warps. Chapters explore topics such as accelerating database searches; how to leverage the Fermi GPU architecture to further accelerate prefix operations; and GPU implementation of hash tables. There are also discussions on the state of GPU computing in interactive physics and artificial intelligence; programming tools and techniques for GPU computing; and the edge and node parallelism approach for computing graph centrality metrics. In addition, the book proposes an alternative approach that balances computation regardless of node degree variance.
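The three requirements above translate directly into kernel-level coding patterns. As a rough illustration only (a sketch of ours, not code from the book), the following CUDA example keeps each warp's memory accesses coalesced, by having consecutive threads touch consecutive array elements, and keeps its data-dependent branch simple enough that warp divergence costs little; the kernel and variable names (scaleIfPositive, dIn, dOut) are hypothetical.

// Minimal sketch (not from the book): a kernel whose memory accesses are
// coalesced within each warp and whose control flow stays largely coherent.
// All names here are illustrative, not the book's.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void scaleIfPositive(const float* in, float* out, int n, float factor)
{
    // Coalesced access: consecutive threads of a warp read consecutive
    // elements, so each warp's loads map to a few wide memory transactions.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = in[i];

    // A data-dependent branch can diverge within a warp; expressing it as a
    // simple select keeps both paths cheap and the warp largely coherent.
    out[i] = (v > 0.0f) ? v * factor : v;
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (i % 2 == 0) ? float(i) : -float(i);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    scaleIfPositive<<<grid, block>>>(dIn, dOut, n, 2.0f);
    cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[2] = %f, out[3] = %f\n", h[2], h[3]);  // expect 4.0 and -3.0

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}

The per-element conditional typically compiles to a predicated select, so divergence is negligible here; the chapters in Section 1 apply the same warp-level principles to far less regular workloads such as large-scale search, hashing, and graph traversal.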
Software engineers, programmers, hardware engineers, and advanced students will find this book extremely useful. For the source code discussed throughout the book, the editors invite readers to the following website: http://gpugems.hwu-server2.crhc.illinois.edu
Editors, Reviewers, and Authors
Editor-In-Chief
Managing Editor
NVIDIA Editor
Area Editors
Reviewers
Authors
Introduction
State of GPU Computing
Section 1: Parallel Algorithms and Data Structures
Introduction
In this Section
Chapter 1. Large-Scale GPU Search
1.1 Introduction
1.2 Memory Performance
1.3 Searching Large Data Sets
1.4 Experimental Evaluation
1.5 Conclusion
References
Chapter 2. Edge v. Node Parallelism for Graph Centrality Metrics
2.1 Introduction
2.2 Background
2.3 Node v. Edge Parallelism
2.4 Data Structure
2.5 Implementation
2.6 Analysis
2.7 Results
2.8 Conclusions
References
Chapter 3. Optimizing Parallel Prefix Operations for the Fermi Architecture
3.1 Introduction to Parallel Prefix Operations
3.2 Efficient Binary Prefix Operations on Fermi
3.3 Conclusion
References
Chapter 4. Building an Efficient Hash Table on the GPU
4.1 Introduction
4.2 Overview
4.3 Building and Querying a Basic Hash Table
4.4 Specializing the Hash Table
4.5 Analysis
4.6 Conclusion
Acknowledgments
References
Chapter 5. Efficient CUDA Algorithms for the Maximum Network Flow Problem
5.1 Introduction, Problem Statement, and Context
5.2 Core Method
5.3 Algorithms, Implementations, and Evaluations
5.4 Final Evaluation
5.5 Future Directions
References
Chapter 6. Optimizing Memory Access Patterns for Cellular Automata on GPUs
6.1 Introduction, Problem Statement, and Context
6.2 Core Methods
6.3 Algorithms, Implementations, and Evaluations
6.4 Final Results
6.5 Future Directions
References
Chapter 7. Fast Minimum Spanning Tree Computation
7.1 Introduction, Problem Statement, and Context
7.2 The MST Algorithm: Overview
7.3 CUDA Implementation of MST
7.4 Evaluation
7.5 Conclusions
References
Chapter 8. Comparison-Based In-Place Sorting with CUDA
8.1 Introduction
8.2 Bitonic Sort
8.3 Implementation
8.4 Evaluation
8.5 Conclusion
References
Section 2: Numerical Algorithms
Introduction
State of GPU-Based Numerical Algorithms
In this Section
Chapter 9. Interval Arithmetic in CUDA
9.1 Interval Arithmetic
9.2 Importance of Rounding Modes
9.3 Interval Operators in CUDA
9.4 Some Evaluations: Synthetic Benchmark
9.5 Application-Level Benchmark
9.6 Conclusion
References
Chapter 10. Approximating the erfinv Function
10.1 Introduction
10.2 New erfinv Approximations
10.3 Performance and Accuracy
10.4 Conclusions
References
Chapter 11. A Hybrid Method for Solving Tridiagonal Systems on the GPU
11.1 Introduction
11.3 Algorithms
11.4 Implementation
11.5 Results and Evaluation
11.6 Future Directions
Source code
References
Chapter 12. Accelerating CULA Linear Algebra Routines with Hybrid GPU and Multicore Computing
12.1 Introduction, Problem Statement, and Context
12.2 Core Methods
12.3 Algorithms, Implementations, and Evaluations
12.4 Final Evaluation and Validation of Results, Total Benefits, and Limitations
12.5 Future Directions
References
Chapter 13. GPU Accelerated Derivative-Free Mesh Optimization
13.1 Introduction, Problem Statement, and Context
13.2 Core Method
13.3 Algorithms, Implementations, and Evaluations
13.4 Final Evaluation
13.5 Future Direction
References
Section 3: Engineering Simulation
Introduction
State of GPU Computing in Engineering Simulations
In this Section
Chapter 14. Large-Scale Gas Turbine Simulations on GPU Clusters
14.1 Introduction, Problem Statement, and Context
14.2 Core Method
14.3 Algorithms, Implementations, and Evaluations
14.4 Final Evaluation
14.5 Test Case and Parallel Performance
14.6 Future Directions
References
Chapter 15. GPU Acceleration of Rarefied Gas Dynamic Simulations
15.1 Introduction, Problem Statement, and Context
15.2 Core Methods
15.3 Algorithms, Implementations, and Evaluations
15.4 Final Evaluation
15.5 Future Directions
References
Chapter 16. Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics
16.1 Introduction, Problem Statement, and Context
16.2 Core Method
16.3 Algorithms, Implementations, and Evaluations
16.4 Evaluation and Validation of Results, Total Benefits, Limitations
16.5 Future Directions
Acknowledgments
References
Chapter 17. CUDA Implementation of Vertex-Centered, Finite Volume CFD Methods on Unstructured Grids with Flow Control Applications
17.1 Introduction, Problem Statement, and Context
17.2 Core (CFD and Optimization) Methods
17.3 Implementations and Evaluation
17.4 Applications to Flow Control — Optimization
References
Chapter 18. Solving Wave Equations on Unstructured Geometries
18.1 Introduction, Problem Statement, and Context
18.2 Core Method
18.3 Algorithms, Implementations, and Evaluations
18.4 Final Evaluation
18.5 Future Directions
Acknowledgments
References
Chapter 19. Fast Electromagnetic Integral Equation Solvers on Graphics Processing Units
19.1 Problem Statement and Background
19.2 Algorithms Introduction
19.3 Algorithm Description
19.4 GPU Implementations
19.5 Results
19.6 Integrating the GPU NGIM Algorithms with Iterative IE Solvers
19.7 Future Directions
References
Section 4: Interactive Physics and AI for Games and Engineering Simulation
Introduction
State of GPU Computing in Interactive Physics and AI
In this Section
Chapter 20. Solving Large Multibody Dynamics Problems on the GPU
20.1 Introduction, Problem Statement, and Context
20.2 Core Method
20.3 The Time-Stepping Scheme
20.4 Algorithms, Implementations, and Evaluations
20.5 Final Evaluation
20.6 Future Directions
Acknowledgments
References
Chapter 21. Implicit FEM Solver on GPU for Interactive Deformation Simulation
21.1 Problem Statement and Context
21.2 Core Method
21.3 Algorithms and Implementations
21.4 Results and Evaluation
21.5 Future Directions
Acknowledgements
References
Chapter 22. Real-Time Adaptive GPU Multiagent Path Planning
22.1 Introduction
22.2 Core Method
22.3 Implementation
22.4 Results
References
Section 5: Computational Finance
Introduction
State of GPU Computing in Computational Finance
In this Section
Chapter 23. Pricing Financial Derivatives with High Performance Finite Difference Solvers on GPUs
23.1 Introduction, Problem Statement, and Context
23.2 Core Method
23.3 Algorithms, Implementations, and Evaluations
23.4 Final Evaluation
23.5 Future Directions
References
Chapter 24. Large-Scale Credit Risk Loss Simulation
24.1 Introduction, Problem Statement, and Context
24.2 Core Methods
24.3 Algorithms, Implementations, Evaluations
24.4 Results and Conclusions
24.5 Future Developments
Acknowledgements
References
Chapter 25. Monte Carlo–Based Financial Market Value-at-Risk Estimation on GPUs
25.1 Introduction, Problem Statement, and Context
25.2 Core Methods
25.3 Algorithms, Implementations, and Evaluations
25.4 Final Results
25.5 Conclusion
References
Section 6: Programming Tools and Techniques
Introduction
Programming Tools and Techniques for GPU Computing
In this Section
Chapter 26. Thrust: A Productivity-Oriented Library for CUDA
26.1 Motivation
26.2 Diving In
26.3 Generic Programming
26.4 Benefits of Abstraction
26.5 Best Practices
References
Chapter 27. GPU Scripting and Code Generation with PyCUDA
27.1 Introduction, Problem Statement, and Context
27.2 Core Method
27.3 Algorithms, Implementations, and Evaluations
27.4 Evaluation
27.5 Availability
27.6 Future Directions
Acknowledgment
References
Chapter 28. Jacket: GPU Powered MATLAB Acceleration
28.1 Introduction
28.2 Jacket
28.3 Benchmarking Procedures
28.4 Experimental Results
28.5 Future Directions
References
Chapter 29. Accelerating Development and Execution Speed with Just-in-Time GPU Code Generation
29.1 Introduction, Problem Statement, and Context
29.2 Core Methods
29.3 Algorithms, Implementations, and Evaluations
29.4 Final Evaluation
29.5 Future Directions
References
Chapter 30. GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot
30.1 Introduction
30.2 Core Technology
30.3 Algorithm, Implementation, and Benefits
30.4 Future Directions
Acknowledgements
References
Chapter 31. Abstraction for AoS and SoA Layout in C++
31.1 Introduction, Problem Statement, and Context
31.2 Core Method
31.3 Implementation
31.4 ASA in Practice
31.5 Final Evaluation
Acknowledgments
References
Chapter 32. Processing Device Arrays with C++ Metaprogramming
32.1 Introduction, Problem Statement, and Context
32.2 Core Method
32.3 Implementation
32.4 Evaluation
32.5 Future Directions
References
Chapter 33. GPU Metaprogramming: A Case Study in Biologically Inspired Machine Vision
33.1 Introduction, Problem Statement, and Context
33.2 Core Method
33.3 Algorithms, Implementations, and Evaluations
33.4 Final Evaluation
33.5 Future Directions
References
Chapter 34. A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs
34.1 Introduction, Problem Statement, and Context
34.2 Core Method
34.3 Algorithms, Implementations, and Evaluations
34.4 Final Evaluation
34.5 Future Directions
References
Chapter 35. Dynamic Load Balancing Using Work-Stealing
35.1 Introduction
35.2 Core Method
35.3 Algorithms and Implementations
35.4 Case Studies and Evaluation
35.5 Future Directions
Acknowledgments
References
Chapter 36. Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads
36.1 Introduction, Problem Statement, and Context
36.2 Core Method
36.3 Algorithms, Implementations, and Evaluations
36.4 Final Evaluation
References
Index