Computer Architecture
A Quantitative Approach
- 4th Edition - November 3, 2006
- Authors: John L. Hennessy, David A. Patterson
- Language: English
The era of seemingly unlimited growth in processor performance is over: single-chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors: chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelism as the key to unlocking the power of multiple-processor architectures. Additionally, the new edition has expanded and updated coverage of design topics beyond processor performance, including power, reliability, availability, and dependability.
- Increased coverage of achieving parallelism with multiprocessors.
- Case studies of the latest industry technology, including the Sun Niagara multiprocessor, AMD Opteron, and Intel Pentium 4.
- Three appendices, included in the printed volume, review the basic and intermediate principles the main text relies upon.
Audience: computer architects, computer system designers, compiler and system software developers, programmers, and application developers
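The "quantitative approach" of the title shows up immediately in the design principles of Section 1.8, chief among them Amdahl's Law, which also explains why the multi-core shift described above makes parallelism the central concern. A minimal Python sketch of the law follows; the 90% parallel fraction and the core counts are illustrative assumptions, not figures from the book.

```python
# Amdahl's Law: overall speedup from accelerating a fraction of a workload.
# speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution
# time that benefits and s is the speedup applied to that fraction.

def amdahl_speedup(parallel_fraction: float, num_cores: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the work
    scales perfectly across `num_cores` and the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / num_cores)

if __name__ == "__main__":
    # Illustrative values (assumptions, not from the book): even with 90%
    # of a program parallelized, 16 cores yield well under 16x speedup.
    for cores in (2, 4, 8, 16):
        print(f"{cores:2d} cores: {amdahl_speedup(0.90, cores):.2f}x")
```

A 90%-parallel program tops out at 10x no matter how many cores are added, which is why the book treats exposing more parallelism as the key design problem rather than a solved one.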
1 Fundamentals of Computer Design
1.1 Introduction
1.2 The Changing Face of Computing and the Task of the Computer Designer
1.3 Technology Trends
1.4 Power in Integrated Circuits
1.5 Trends in Cost
1.6 Reliability, Availability, and Dependability
1.7 Measuring and Reporting Performance
1.8 Quantitative Principles of Computer Design
1.9 Putting It All Together: Performance and Price-Performance
1.10 Fallacies and Pitfalls
1.11 Concluding Remarks
2 Instruction-Level Parallelism and Its Exploitation
2.1 Instruction-Level Parallelism: Concepts and Challenges
2.2 Basic Compiler Techniques for Exposing ILP
2.3 Reducing Branch Costs with Prediction
2.4 Overcoming Data Hazards with Dynamic Scheduling
2.5 Dynamic Scheduling: Examples and the Algorithm
2.6 Hardware-Based Speculation
2.7 Exploiting ILP Using Multiple Issue and Static Scheduling
2.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
2.9 Advanced Techniques for Instruction Delivery and Speculation
2.10 Putting It All Together: The Intel Pentium 4
2.11 Fallacies and Pitfalls
2.12 Concluding Remarks
3 Advanced Techniques for Exploiting Instruction-Level Parallelism and Their Limits
3.1 Introduction
3.2 Studies of the Limitations of ILP
3.3 Limitations on ILP for Realizable Processors
3.4 Crosscutting Issues: Hardware versus Software Speculation
3.5 Multithreading: Using ILP Support to Exploit Thread-Level Parallelism
3.6 Putting It All Together: Performance and Efficiency in Advanced Multiple-Issue Processors
3.7 Fallacies and Pitfalls
3.8 Concluding Remarks
4 Multiprocessors and Thread-Level Parallelism
4.1 Introduction
4.2 Symmetric Shared-Memory Architectures
4.3 Performance of Symmetric Shared-Memory Multiprocessors
4.4 Distributed Shared Memory and Directory-Based Coherence
4.5 Synchronization: The Basics
4.6 Models of Memory Consistency: An Introduction
4.7 Crosscutting Issues
4.8 Putting It All Together: The Sun T1 Multiprocessor
4.9 Fallacies and Pitfalls
4.10 Concluding Remarks
5 Memory Hierarchy Design
5.1 Introduction
5.2 Eleven Advanced Optimizations of Cache Performance
5.3 Memory Technology and Optimizations
5.4 Protection: Virtual Memory and Virtual Machines
5.5 Crosscutting Issues: The Design of Memory Hierarchies
5.6 Putting It All Together: AMD Opteron Memory Hierarchy
5.7 Fallacies and Pitfalls
5.8 Concluding Remarks
6 Storage Systems
6.1 Introduction
6.2 Advanced Topics in Disk Storage
6.3 Definition and Examples of Real Faults and Failures
6.4 I/O Performance, Reliability Measures, and Benchmarks
6.5 A Little Queuing Theory
6.6 Crosscutting Issues
6.7 Designing and Evaluating an I/O System: The Internet Archive Cluster
6.8 Putting It All Together: NetApp FAS6000 Filer
6.9 Fallacies and Pitfalls
6.10 Concluding Remarks
Appendix A: Pipelining: Basic and Intermediate Concepts
A.1 Introduction
A.2 The Major Hurdle of Pipelining—Pipeline Hazards
A.3 How Is Pipelining Implemented?
A.4 What Makes Pipelining Hard to Implement?
A.5 Extending the MIPS Pipeline to Handle Multicycle Operations
A.6 Putting It All Together: The MIPS R4000 Pipeline
A.7 Crosscutting Issues
A.8 Fallacies and Pitfalls
A.9 Concluding Remarks
Appendix B: Instruction Set Principles and Examples
B.1 Introduction
B.2 Classifying Instruction Set Architectures
B.3 Memory Addressing
B.4 Addressing Modes for Signal Processing
B.5 Type and Size of Operands
B.6 Operations in the Instruction Set
B.7 Instructions for Control Flow
B.8 Encoding an Instruction Set
B.9 Crosscutting Issues: The Role of Compilers
B.10 Putting It All Together: The MIPS Architecture
B.11 Fallacies and Pitfalls
B.12 Concluding Remarks
Appendix C: Introduction to Memory Hierarchy
C.1 Introduction
C.2 Cache Performance
C.3 Seven Basic Cache Optimizations
C.4 Virtual Memory
C.5 Protection and Examples of Virtual Memory
C.6 Fallacies and Pitfalls
C.7 Concluding Remarks
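The memory-hierarchy material above (the eleven advanced optimizations of 5.2, the cache performance analysis of C.2) builds on one recurring model: average memory access time (AMAT). A minimal sketch of that formula follows; the cache parameters are illustrative assumptions, not values from the text.

```python
# Average memory access time (AMAT), the basic cache-performance model
# used throughout the memory-hierarchy chapters:
#   AMAT = hit_time + miss_rate * miss_penalty

def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """All quantities in the same time unit (e.g., clock cycles)."""
    return hit_time + miss_rate * miss_penalty

if __name__ == "__main__":
    # Illustrative two-level hierarchy (assumed numbers, not from the book):
    # the L2 cache is seen on L1 misses, DRAM is seen on L2 misses.
    l2_amat = amat(hit_time=10, miss_rate=0.20, miss_penalty=200)
    l1_amat = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_amat)
    print(f"L2 AMAT: {l2_amat:.1f} cycles, overall AMAT: {l1_amat:.2f} cycles")
```

The formula nests naturally: a multilevel hierarchy is analyzed by treating each level's AMAT as the miss penalty of the level above it.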
John L. Hennessy
ACM named John L. Hennessy a recipient of the 2017 ACM A.M. Turing Award for pioneering a systematic, quantitative approach to the design and evaluation of computer architectures with enduring impact on the microprocessor industry. John L. Hennessy is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been a member of the faculty since 1977 and was, from 2000 to 2016, its tenth President. Prof. Hennessy is a Fellow of the IEEE and ACM; a member of the National Academy of Engineering, the National Academy of Sciences, and the American Philosophical Society; and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Medal, which he shared with David Patterson. He has also received seven honorary doctorates.
Affiliations and expertise
Departments of Electrical Engineering and Computer Science, Stanford University, USA
David A. Patterson
David Patterson is the Pardee Professor of Computer Science, Emeritus at the University of California at Berkeley, which he joined after graduating from UCLA in 1977. His teaching has been honored by the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Prof. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Prof. Patterson is a Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information Technology Advisory Committee to the U.S. President, as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to Distinguished Service Awards from ACM, CRA, and SIGARCH.
Affiliations and expertise
Pardee Professor of Computer Science, Emeritus, University of California at Berkeley, USA