Transaction Processing: Concepts and Techniquesby Jim Gray and Andreas ReuterFOREWORD BY BRUCE LINDSAYPREFACEPART ONE - The Basics of Transaction Processing1 INTRODUCTION1.1 Historical Perspective1.2 What Is a Transaction Processing System?1.2.1 The End User's View of a Transaction Processing System1.2.2 The Administrator/Operator's View of a TP System1.2.3 Application Designer's View of a TP System1.2.4 The Resource Manager's View of a TP System1.2.5 TP System Core Services1.3 A Transaction Processing System Feature List1.3.1 Application Development Features1.3.2 Repository Features1.3.3 TP Monitor Features1.3.4 Data Communications Features1.3.5 Database Features1.3.6. Operations Features1.3.7 Education and Testing Features1.3.8 Features Summary1.4 Summary1.5 Historical NotesExercisesAnswers2 BASIC COMPUTER SCIENCE TERMINOLOGY2.1 Introduction2.1.1 Units2.2 Basic Hardware2.2.1 Memories2.2.2 Processors2.2.3 Communications Hardware2.2.4 Hardware Architectures2.3 Basic Software - Address Spaces, Processes, Sessions2.3.1 Address Spaces2.3.2 Processes, Protection Domains, and Threads2.3.3 Messages and Sessions2.4 Generic System Issues2.4.1 Clients and Servers2.4.2 Naming2.4.3 Authentication2.4.4 Authorization2.4.5 Scheduling and Performance2.4.6 Summary2.5 Files2.5.1 File Operations2.5.2 File Organizations2.5.3 Distributed Files2.5.4 SQL2.6 Software Performance2.7 Transaction Processing Standards2.7.1 Portability versus Interoperability Standards2.7.2 APIs and FAPs2.7.3 LU6.2, a de facto Standard2.7.4 OSI-TP with X/Open DTP, a de jure Standard2.8 SummaryExercisesAnswersPART TWO - The Basics of Fault Tolerance3 FAULT TOLERANCE3.1 Introduction3.1.1 A Crash Course in Simple Probability3.1.2 An External View of Fault Tolerance3.2 Definitions3.2.1. Fault, Failure, Availability, Reliability3.2.2 Taxonomy of Fault Avoidance and Fault Tolerance3.2.3 Repair, Failfast, Modularity, Recursive Design3.3 Empirical Studies3.3.1 Outages Are Rare Events3.3.2 Studies of Conventional Systems3.3.3 A Study of Fault-Tolerant System3.4 Typical Module Failure Rates3.5 Hardware Approaches to Fault Tolerance3.5.1 The Basic N-Plex Idea: How to Build Failfast Modules3.5.2 Failfast versus Failvote Voters in an N-Plex3.5.3 N-Plex plus Repair Results in High Availability3.5.4 The Voter's Problem3.5.5 Summary3.6 Software Is the Problem3.6.1 N-Version Programming and Software Fault Tolerance3.6.2 Transactions and Software Fault Tolerance3.6.3 Summary3.7 Fault Modeal and Software Fault Masking3.7.1 An Overview of the Model3.7.2 Building Highly Available Storage3.7.3 Highly Available Processes3.7.4 Reliable Messages via Sessions and Process Pairs3.7.5 Summary of the Process-Message-Storage Model3.8 General Principles3.9 A Cautionary Tale - System Delusion3.10 Summary3.11 Historical NotesExercisesAnswersPART THREE - Transaction-Oriented Computing4 TRANSACTION MODELS4.1 Introduction4.1.1 About this Chapter4.2 Atomic Actions and Flat Transaction4.2.1 Disk Writes as Atomic Actions4.2.2 A Classification of Action Types4.2.3 Flat Transactions4.2.4 Limitations of Flat Transactions4.3 Spheres of Control4.3.1 Definition of Spheres of Control4.3.2 Dynamic Behavior of Spheres of Control4.3.3 Summary4.4 A Notation for Explaining Transaction Models4.4.1 What is Required to Describe Transaction Models?4.4.2 Elements of the Notation4.4.3 Defining Transaction Models by a Set of Simple Rules4.5 Flat Transactions with Savepoints4.5.1 About Savepoints4.5.2 Developing the Rules for the Savepoint Model4.5.3 Persistent Savepoints4.6 Chained Transactions4.7 Nested Transactions4.7.1 Definition of the Nexting Structure4.7.2 Using Nested Transactions4.7.3 Emulating Nested Transactions by Savepoints4.8 Distributed Transactions4.9 Multi-Level Transactions4.9.1 The Role of a Compensating Transaction4.9.2 The Use of Multi-Level Transactions4.10 Open Nested Transactions4.11 Long-Lived transactions4.11.1 Transaction Processing Context4.11.2 The Mini-Batch4.11.3 Sagas4.12 Exotics4.13 Summary4.14 Historical NotesExercisesAnswers5 TRANSACTION PROCESSING MONITORS - An Overview5.1 Introduction5.2 The Role of TP Monitors in Transaction Systems5.2.1 The Transaction -Oriented Computing Style5.2.2 The Transaction Processing Services5.2.3 TP System Process Structure5.2.4 Summary5.3 The Structure of a TP Monitor5.3.1 The TP Monitor Components5.3.2 Components of the Transaction Services5.3.3 TP Monitor Support for the Resource Manager Interfaces5.4 Transactional Remote Procedure Calls: The Basic Idea5.4.1 Who Participates in Remote Procedure Calls?5.4.2 Address Space Structure Required for RPC Handling5.4.3 The Dynamics of Remote Procedure Calls5.4.4 Summary5.5 Examples of the Transaction-Oriented Programming Style5.5.1 The Basic Processing Loop5.5.2 Attaching Resource Managers to Transactions: The Simple Cases5.5.3 Attaching Resource Managers To Transactions: The Sophisticated Case5.5.4 Using Persistent Savepoints5.6 Terminological Wrap-Up5.7 Historical NotesExercisesAnswers6 TRANSACTION PROCESSING MONITORS6.1 Introduction6.2 Transactional Remote Procedures Calls6.2.1 The Resource Manager Interface6.2.2 What the Resource Manager Has to Do in Support of Transactions6.2.3 Interfaces between resource Managers and the TP Monitor6.2.4 Resource Manager Calls versus Resource Manager Sessions6.2.5 Summary6.3. Functional Principles of the TP Monitor6.3.1 The Central Data Structures of the TPOS6.3.2 Data Structures Owned by the TP Monitor6.3.3. A Guided Tour Along the TRPC Path6.3.4 Aborts Racing TRPCs6.3.5 Summary6.4 Managing Request and Response Queues6.4.1 Short-Term Queues for Mapping Resource Manager Invocations6.4.2 Durable Request Queues for Asynchronous Transaction Processing6.4.3 Summary6.5 Other Tasks of the TP Monitor6.5.1 Load Balancing6.5.2 Authentication and Authorization 6.5.3 Restart Processing6.6 Summary6.7 Historical NotesExercisesAnswersPART FOUR - Concurrency Control7 ISOLATION CONCEPTS7.1 Overview7.2 Introduction to Isolation7.3 The Dependency Model of Isolation7.3.1 Static versus Dynamic Allocation7.3.2 Transaction Dependencies7.3.3 The Three Bad Dependencies7.3.4 The Case for a Formal Model of Isolation7.4 Isolation: The Application Programmer's View7.5 Isolation Theorems7.5.1 Actions and Transactions7.5.2 Well-Formed and Two-Phased Transactions7.5.3 Transaction Histories7.5.4 Legal Histories and Lock Compatibility7.5.5 Versions, Dependencies, and the Dependency Graph7.5.6 Equivalent and Isolated Histories: BEFORE, AFTER, and Wormholes7.5.7 Wormholes Are Not Isolated7.5.8 Summary of Definitions7.5.9 Summary of the Isolation Theorems7.6 Degrees of Isolation7.6.1 Degrees of Isolation Theorem7.6.2 SQL and Degrees of Isolation7.6.3 Pros and Cons of Low Degrees of Isolation7.6.4 Exotic SQL Isolation - Read-Past and Notify Locks7.7 Phantoms and Predicate Locks7.7.1 The Problem with Predicate Locks7.8 Granular Locks7.8.1 Three Locking and Intent Lock Modes7.8.2 Update Mode Locks7.8.3 Granular Locking Summary7.8.4 Key-Range Locking7.8.5 Dynamic Key-Range Locks: Previous-Key and Next-Key Locking7.8.6 Key-Range Locks Need DAG Locking7.8.7 The DAG Locking Protocol7.8.8 Formal Definition of Granular Locks on a DAG7.9 Locking Heuristics7.10 Nested Transaction Locking7.11 Scheduling and Deadlock7.11.1 The Convoy Phenomenon7.11.2 Deadlock Avoidance versus Toleration7.11.3 The Wait-for Graph and Deadlock Detector7.11.4 Distributed Deadlock7.11.5 Probability of Deadlock7.12 Exotics7.12.1 Field Calls7.12.2 Escrow Locking and Other Field Call Refinements7.12.3 Optimistic and Timestamp Locking7.12.4 Time Domain Addressing7.13 Summary7.14 Historical NotesExercisesAnswers8 LOCK IMPLEMENTATION8.1 About This Chapter8.1.2 The Need for Parallelism within the Lock Manager8.1.3 The Resource Manager and Lock Manager Address Space8.2 Atomic Machine Instructions8.3 Semaphores8.3.1 Exclusive Semaphores8.3.2 Crabbing: Traversing Shared Data Structures8.3.3 Shared Semaphores8.3.4 Allocating Shared Storage8.3.5 Semaphores and Exceptions8.4 Lock Manager8.4.1 Lock Names8.4.2 Lock Queues and Scheduling8.4.3 Lock Duration and Lock Counts8.4.4 Lock Manager Interface and Data Structures8.4.5 Lock Manager Internal Logic8.4.6 Lock Escalation and Generic Unlock, Notify Locks8.4.7 Transaction Savepoints, Commit, and Rollback8.4.8 Locking at System Restart8.4.9 Phoenix Transactions8.4.10 Lock Manager Configuration and Complexity8.4.11 Lock Manager Summary8.5 Deadlock Detection8.6 Locking for Parallel and Parallel Nested Transactions8.7 Summary8.8 Historical NotesExercisesAnswersPART FIVE- Recovery9 LOG MANAGER9.1 Introduction9.1.1 Uses of the Log9.1.2 Log Manager Overview9.1.3 The Log Manager's Relationship to Other Services9.1.4 Why Have a Log Manager?9.2 Log Tables9.2.1 Mapping the Log Table onto Files9.2.2 Log Sequence Numbers9.3 Public Interface to the Log9.3.1 Authorization to Access the Log Table9.3.2 Reading the Log Table9.3.4 Summary9.4 Implementation Details of Log Reads and Writes9.4.1 Reading the Log9.4.2 Log Anchor9.4.3 Transaction Related Anchors9.4.4 Log Insert9.4.5 Allocate and Flush Log Daemons9.4.6 Careful Writes: Serial of Ping-Pong9.4.7 Group Commit, Batching, Boxcarring9.4.8 WADS Writes9.4.9 Multiple Logs per Transaction Manager9.4.10 Summary9.5 Log Restart Logic9.5.1 Saving the Transaction Manager Anchor9.5.2 Preparing for Restart: Careful Writes of the Log Anchor9.5.3 Finding the Anchor and Log End at Restart9.6 Archiving the Log9.6.1 How Much of the Log Table Should Be Online?9.6.2 Low-Water Marks for Rollback, Restart, Archive9.6.3 Dynamic Logs: Copy Aside versus Copy Forward9.6.4 Archiving the Log Without Impacting Concurrent Transactions9.6.5 Electronic Vaulting and Change Accumulation9.6.6 Dealing with Log Manager-Archive Circularity9.7 Logging in a Client-Server Architecture9.8 Summary9.9 Historical NotesExercisesAnswers10 TRANSACTION MANAGER CONCEPTS10.1 Introduction10.2 Transaction Manager Interfaces10.2.1 The Application Interface to Transactions10.2.2 The Resource Manager Interface to Transactions10.2.3 Transaction Manager Functions10.3 Transactional Resource10.3.1 The DO-UNDO-REDO Protocol10.3.2 The Log Table and Log Records10.3.3. Communication Session Recovery10.3.4 Value Logging10.3.5 Logical Logging10.3.6 Physiological Logging10.3.7 Physiological Logging Rules: FIX, WAL, and Force-Log-at-Commit10.3.8 Compensation Log Records10.3.9 Idempotence of Physiological REDO10.3.10 Summary10.4 Two-Phase Commit: Making Computations Atomic10.4.1 Two-Phase Commit in a Centralized System10.4.2 Distributed Transactions and Two-Phase Commit10.5 Summary10.6 Historical NotesExercisesAnswers11 TRANSACTION MANAGER STRUCTURE11.1 Introduction11.2 Normal Processing11.2.1 Transaction Identifiers11.2.2 Transaction Manager Data Structures11.2.3 MyTrid( ), Status_Transaction( ), Leave_Transaction( ), Resume_Transaction( )11.2.4 Savepoint Log Records11.2.5 Begin Work( )11.2.6 Local Commit_Work( ).11.2.7 Remote Commit_Work( ): Prepare( ) and Commit( )11.2.8 Save_Work( ) and Read_Context( )11.2.9 Rollback_Work( )11.3 Checkpoint11.3.1 Sharp Checkpoints11.3.2 Fuzzy Checkpoints11.3.3 Transaction Manager Checkpoint11.4 System Restart11.4.1 Transaction States at Restart11.4.2 Transaction Manager Restart Logic11.4.3 Resource Manager Restart Logic, Identify( )11.4.4 Summary of the Restart Design11.4.5 Independent Resource Managers11.4.6 The Two-Checkpoint Approach: A Different Strategy11.4.7 Why Restart Works11.4.8 Distributed Transaction Resolution: Two Phase Commit at Restart11.4.9 Accelerating Restart11.4.10 Other Restart Issues11.5 Resource Manager Failure and Restart11.6 Archive Recovery11.7 Configuring the Transaction Manager11.7.1 Transaction Manager Size and Complexity11.8 SummaryExercisesAnswers12 ADVANCED TRANSACTION MANAGER TOPICS12.1 Introduction12.2 Heterogeneous Commit Coordinators12.2.1 Closes versus Open Transaction Managers12.2.2 Interoperating with a Closed Transaction Manager12.2.3 Writing A Gateway to an Open Transaction Manager12.2.4 Summary of Transaction Gateways12.3 Highly Available (Non-Blocking) Commit Coordinators12.3.1 Heuristic Decisions Resolve Blocked Transaction Commit12.4 Transfer-of-Commit12.5 Optimizations of Two-Phase Commit12.5.1 Read-Only Commit Optimization12.5.2 Lazy Commit Optimization12.5.3 Linear Commit Optimization12.6 Disaster Recovery at a Remote Site12.6.1 System Pair Takeover12.6.2 Session Switching at Takeover12.6.3 Configuration Options: 1-Safe, 2-Safe, and Very Safe12.6.4 Catch-up After Failure12.6.5 Summary of System Pair Designs12.7 Summary12.8 Historical NotesExercisesAnswersPART SIX - Transactional File System: A Sample Resource Manager13 FILE AND BUFFER MANAGEMENT13.1 Introduction13.2 The File System as a Basis for Transactional Durable Storage13.2.1 External Storage versus Main Memory13.2.3 Levels of Abstraction in a Transactional File and Database Manager13.3 Media and File Management13.3.1 Objects and Operations of the Basic File System13.3.2 Managing Disk Space13.3.3 Catalog Management for Low-Level File Systems13.4 Buffer Management13.4.1 Functional Principles of the Database Buffer13.4.2 Implementation Issues of a Buffer Manager13.4.3 Logging and Recovery from the Buffer's Perspective13.4.4 Optimizing Buffer Manager Performance13.5 Exotics13.5.1 Side Files13.5.2 Single-Level Storage13.6 Summary13.7 Historical NotesExercisesAnswers14 THE TUPLE-ORIENTED FILE SYSTEM14.1 Introduction14.2 Mapping Tuples into Pages14.2.1 Internal Organization of Pages14.2.2 Free Space Administration in a File14.2.3 Tuple Identification14.3 Physical Tuple Management14.3.1 Physical Representation of Attribute Values14.3.2 Physical Representation of Short Tuples14.3.3 Special Aspects of Representing Attribute Values in Tuples14.3.4 Physical Representation of Long Tuples14.3.5 Physical Representation of Complex Tuples and Very Long Attributes14.4 File Organization14.4.1 Administrative Operations14.4.2 An Abstract View of Different File Organizations via Scans14.4.3 Entry-Sequenced Files14.4.4 System-Sequenced Files14.4.5 Relative Files14.4.6 Key-Sequenced Files and Hash Files14.4.7 Summary14.5 Exotics14.5.1 Cluster Files14.5.2 Partitioned Files14.5.3 Using Transactions to Maintain the File System14.5.4 The Tuple-Oriented File System in Current Database Systems14.6 SummaryExercisesAnswers15 ACCESS PATHS15.1 Introduction15.2 Overview of Techniques to Implement Associative Access Paths15.2.1 Summary15.3 Associative Access By Hashing15.3.1 Folding the Key Value into a Numerical Data Type15.3.2 Criteria for a Good Hash Function15.3.3 Overflow Handling in Hash Files15.3.4 Local Administration of Pages in Hash File15.3.5 Summary of Associative Access Based of Hashing15.4 B-Trees15.4.1 B-Trees: The Basic Idea15.4.2 Performance Aspects of B-Trees15.4.3 Synchronization on B-Trees: The Page-Oriented View15.4.4 Synchronization on B-Trees: The Tuple-Oriented View15.4.5 Recovering Operations on B-Trees15.5. Sample Implementation of Some Operations on B-Trees15.5.1 Declarations of Data Structures Assumed in All Programs15.5.2 Implementation of the readkey Operation on a B-Tree15.5.3 Key-Range Locking in a B-Tree15.5.4 Implementation of the Insert Operation for a B-Tree: The Simple Case15.5.5 Implementing B-Tree Insert: The Split Case15.5.6 Summary15.6 Exotics15.6.1 Extendible Hashing15.6.2 The Grid File15.6.3 Holey Brick B-Trees15.7 Summary15.8 Historical NotesExercisesAnswersPART SEVEN - System Surveys16 SURVEY OF TP SYSTEMS16.1 Introduction16.2 IMS16.2.1 Hardware and Operating System Environment16.2.2 Workflow Model16.2.3 Program Isolation16.2.4 Main Storage Databases and Field Calls 16.2.5 Data Sharing16.2.6 Improved Availability and duplexed systems16.2.7 DB216.2.8 Recent Evolution of IMS16.3 CICS and LU6.216.3.1 CICS Overview16.3.2 CICS Services16.3.3. CICS Workflow16.3.4 CICS Distributed Transaction Processing16.3.5 LU6.216.4 Guardian 9016.4.1 Guardian: The Operating System and Hardware16.4.2 Pathway, Terminal Context, and Server Class Management16.4.3 Transaction Management16.4.4 Other Interesting Features16.5 DECdta16.5.1 ACMS's Three-Ball Workflow Model of Transaction Processing16.5.2 ACMS Services16.5.3 ACMS Summary16.5.4 VMS Transaction Management Support16.5.5 Summary of DECdta16.5.6 Reliable Transaction Router (RTR)16.6 X/Open DTP, OSI-TP, CCR16.6.1 The Local Case16.6.2 The Distributed Case: Services and Servers16.6.3 Summary16.7 Other Systems16.7.1 Universal Transaction Manager (UTM)16.7.2 ADABAS TPF16.7.3 Encina16.7.4 Tuxedo16.8 SummaryPART EIGHT - Addenda17 REFERENCES18 DATA STRUCTURES AND INTERFACES19 GLOSSARYINDEX