
Scalable Shared-Memory Multiprocessing
- 1st Edition - June 1, 1995
- Imprint: Morgan Kaufmann
- Authors: Daniel E. Lenoski, Wolf-Dietrich Weber
- Language: English
- Paperback ISBN: 978-1-4933-0614-5
- eBook ISBN: 978-1-4832-9601-2

Dr. Lenoski and Dr. Weber have experience with leading-edge research and the practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved with commercializing scalable shared-memory technology.
Scalable Shared-Memory Multiprocessing
by Daniel E. Lenoski and Wolf-Dietrich Weber
- Foreword
- Preface
- Part 1 General Concepts
- Chapter 1 Multiprocessing and Scalability
  - 1.1 Multiprocessor Architecture
    - 1.1.1 Single versus Multiple Instruction Streams
    - 1.1.2 Message-Passing versus Shared-Memory Architectures
  - 1.2 Cache Coherence
    - 1.2.1 Uniprocessor Caches
    - 1.2.2 Multiprocessor Caches
  - 1.3 Scalability
    - 1.3.1 Scalable Interconnection Networks
    - 1.3.2 Scalable Cache Coherence
    - 1.3.3 Scalable I/O
    - 1.3.4 Summary of Hardware Architecture Scalability
    - 1.3.5 Scalability of Parallel Software
  - 1.4 Scaling and Processor Grain Size
  - 1.5 Chapter Conclusions
- Chapter 2 Shared-Memory Parallel Programs
  - 2.1 Basic Concepts
  - 2.2 Parallel Application Set
    - 2.2.1 MP3D
    - 2.2.2 Water
    - 2.2.3 PTHOR
    - 2.2.4 LocusRoute
    - 2.2.5 Cholesky
    - 2.2.6 Barnes-Hut
  - 2.3 Simulation Environment
    - 2.3.1 Basic Program Characteristics
  - 2.4 Parallel Application Execution Model
  - 2.5 Parallel Execution under a PRAM Memory Model
  - 2.6 Parallel Execution with Shared Data Uncached
  - 2.7 Parallel Execution with Shared Data Cached
  - 2.8 Summary of Results with Different Memory System Models
  - 2.9 Communication Behavior of Parallel Applications
  - 2.10 Communication-to-Computation Ratios
  - 2.11 Invalidation Patterns
    - 2.11.1 Classification of Data Objects
    - 2.11.2 Average Invalidation Characteristics
    - 2.11.3 Basic Invalidation Patterns for Each Application
    - 2.11.4 MP3D
    - 2.11.5 Water
    - 2.11.6 PTHOR
    - 2.11.7 LocusRoute
    - 2.11.8 Cholesky
    - 2.11.9 Barnes-Hut
    - 2.11.10 Summary of Individual Invalidation Distributions
    - 2.11.11 Effect of Problem Size
    - 2.11.12 Effect of Number of Processors
    - 2.11.13 Effect of Finite Caches and Replacement Hints
    - 2.11.14 Effect of Cache Line Size
    - 2.11.15 Invalidation Patterns Summary
  - 2.12 Chapter Conclusions
- Chapter 3 System Performance Issues
  - 3.1 Memory Latency
  - 3.2 Memory Latency Reduction
    - 3.2.1 Nonuniform Memory Access (NUMA)
    - 3.2.2 Cache-Only Memory Architecture (COMA)
    - 3.2.3 Direct Interconnect Networks
    - 3.2.4 Hierarchical Access
    - 3.2.5 Protocol Optimizations
    - 3.2.6 Latency Reduction Summary
  - 3.3 Latency Hiding
    - 3.3.1 Weak Consistency Models
    - 3.3.2 Prefetch
    - 3.3.3 Multiple-Context Processors
    - 3.3.4 Producer-Initiated Communications
    - 3.3.5 Latency Hiding Summary
  - 3.4 Memory Bandwidth
    - 3.4.1 Hot Spots
    - 3.4.2 Synchronization Support
  - 3.5 Chapter Conclusions
- Chapter 4 System Implementation
  - 4.1 Scalability of System Costs
    - 4.1.1 Directory Storage Overhead
    - 4.1.2 Sparse Directories
    - 4.1.3 Hierarchical Directories
    - 4.1.4 Summary of Directory Storage Overhead
  - 4.2 Implementation Issues and Design Correctness
    - 4.2.1 Unbounded Number of Requests
    - 4.2.2 Distributed Memory Operations
    - 4.2.3 Request Starvation
    - 4.2.4 Error Detection and Fault Tolerance
    - 4.2.5 Design Verification
  - 4.3 Chapter Conclusions
- Chapter 5 Scalable Shared-Memory Systems
  - 5.1 Directory-Based Systems
    - 5.1.1 DASH
    - 5.1.2 Alewife
    - 5.1.3 S3.mp
    - 5.1.4 IEEE Scalable Coherent Interface
    - 5.1.5 Convex Exemplar
  - 5.2 Hierarchical Systems
    - 5.2.1 Encore GigaMax
    - 5.2.2 ParaDiGM
    - 5.2.3 Data Diffusion Machine
    - 5.2.4 Kendall Square Research KSR-1 and KSR-2
  - 5.3 Reflective Memory Systems
    - 5.3.1 Plus
    - 5.3.2 Merlin and Sesame
  - 5.4 Non-Cache-Coherent Systems
    - 5.4.1 NYU Ultracomputer
    - 5.4.2 IBM RP3 and BBN TC2000
    - 5.4.3 Cray Research T3D
  - 5.5 Vector Supercomputer Systems
    - 5.5.1 Cray Research Y-MP C90
    - 5.5.2 Tera Computer MTA
  - 5.6 Virtual Shared-Memory Systems
    - 5.6.1 Ivy and Munin/Treadmarks
    - 5.6.2 J-Machine
    - 5.6.3 MIT/Motorola *T and *T-NG
  - 5.7 Chapter Conclusions
- Part 2 Experience with DASH
- Chapter 6 DASH Prototype System
  - 6.1 System Organization
    - 6.1.1 Cluster Organization
    - 6.1.2 Directory Logic
    - 6.1.3 Interconnection Network
  - 6.2 Programmer's Model
  - 6.3 Coherence Protocol
    - 6.3.1 Nomenclature
    - 6.3.2 Basic Memory Operations
    - 6.3.3 Prefetch Operations
    - 6.3.4 DMA/Uncached Operations
  - 6.4 Synchronization Protocol
    - 6.4.1 Granting Locks
    - 6.4.2 Fetch&Op Variables
    - 6.4.3 Fence Operations
  - 6.5 Protocol General Exceptions
  - 6.6 Chapter Conclusions
- Chapter 7 Prototype Hardware Structures
  - 7.1 Base Cluster Hardware
    - 7.1.1 SGI Multiprocessor Bus (MPBUS)
    - 7.1.2 SGI CPU Board
    - 7.1.3 SGI Memory Board
    - 7.1.4 SGI I/O Board
  - 7.2 Directory Controller
  - 7.3 Reply Controller
  - 7.4 Pseudo-CPU
  - 7.5 Network and Network Interface
  - 7.6 Performance Monitor
  - 7.7 Logic Overhead of Directory-Based Coherence
  - 7.8 Chapter Conclusions
- Chapter 8 Prototype Performance Analysis
  - 8.1 Base Memory Performance
    - 8.1.1 Overall Memory System Bandwidth
    - 8.1.2 Other Memory Bandwidth Limits
    - 8.1.3 Processor Issue Bandwidth and Latency
    - 8.1.4 Interprocessor Latency
    - 8.1.5 Summary of Memory System Bandwidth and Latency
  - 8.2 Parallel Application Performance
    - 8.2.1 Application Run-time Environment
    - 8.2.2 Application Speedups
    - 8.2.3 Detailed Case Studies
    - 8.2.4 Application Speedup Summary
  - 8.3 Protocol Effectiveness
    - 8.3.1 Base Protocol Features
    - 8.3.2 Alternative Memory Operations
  - 8.4 Chapter Conclusions
- Part 3 Future Trends
- Chapter 9 TeraDASH
  - 9.1 TeraDASH System Organization
    - 9.1.1 TeraDASH Cluster Structure
    - 9.1.2 Intracluster Operations
    - 9.1.3 TeraDASH Mesh Network
    - 9.1.4 TeraDASH Directory Structure
  - 9.2 TeraDASH Coherence Protocol
    - 9.2.1 Required Changes for the Scalable Directory Structure
    - 9.2.2 Enhancements for Increased Protocol Robustness
    - 9.2.3 Enhancements for Increased Performance
  - 9.3 TeraDASH Performance
    - 9.3.1 Access Latencies
    - 9.3.2 Potential Application Speedup
  - 9.4 Chapter Conclusions
- Chapter 10 Conclusions and Future Directions
  - 10.1 SSMP Design Conclusions
  - 10.2 Current Trends
  - 10.3 Future Trends
- Appendix Multiprocessor Systems
- References
- Index
- No. of pages (eBook): 341
Daniel E. Lenoski
Dr. Lenoski has experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. He was a key contributor to the architecture and design of the DASH multiprocessor. Currently, he is involved with commercializing scalable shared-memory technology.
Wolf-Dietrich Weber
Dr. Weber has experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. He was a key contributor to the architecture and design of the DASH multiprocessor. Currently, he is involved with commercializing scalable shared-memory technology.