Contents

Preface
Color Plates

PART I - The Parallel Computing Environment

Chapter 1 - Parallel Computing Architectures
Alice E. Koniges, David C. Eder, Margaret Cahir
  1.1 Historical Parallel Computing Architectures
  1.2 Contemporary Parallel Computing Architectures
    1.2.1 MPP Processors
    1.2.2 MPP Memory
    1.2.3 MPP Interconnect Network
  References

Chapter 2 - Parallel Application Performance
Alice E. Koniges
  2.1 Defining Performance
  2.2 Measuring Performance
    2.2.1 MPP Application Speedup
  References

Chapter 3 - Programming Models and Methods
Margaret Cahir, Robert Moench, Alice E. Koniges
  3.1 Message-Passing Models
    3.1.1 PVM
    3.1.2 MPI
    3.1.3 SHMEM
  3.2 Data-Parallel Models
    3.2.1 High-Performance Fortran
  3.3 Parallel Programming Methods
    3.3.1 Nested- and Mixed-Model Methods
    3.3.2 POSIX Threads and Mixed Models
    3.3.3 Compiler Extensions for Explicit Parallelism with Distributed Objects
    3.3.4 Work-Sharing Models
  References

Chapter 4 - Parallel Programming Tools
Margaret Cahir, Robert Moench, Alice E. Koniges
  4.1 The Apprentice Performance Analysis Tool
  4.2 Debuggers
    4.2.1 Process Control
    4.2.2 Data Viewing

Chapter 5 - Optimizing for Single-Processor Performance
Jeff Brooks, Sara Graffunder, Alice E. Koniges
  5.1 Using the Functional Units Effectively
  5.2 Hiding Latency with the Cache
  5.3 Stream Buffer Optimizations
  5.4 E-Register Operations
  5.5 How Much Performance Can Be Obtained on a Single Processor?
  References

Chapter 6 - Scheduling Issues
Morris A. Jette
  6.1 Gang Scheduler Implementation
  6.2 Gang Scheduler Performance
  References

PART II - The Applications

Chapter 7 - Ocean Modeling and Visualization
Yi Chao, P. Peggy Li, Ping Wang, Daniel S. Katz, Benny N. Cheng, Scott Whitman
  7.1 Introduction
  7.2 Model Description
  7.3 Computational Considerations
    7.3.1 Parallel Software Tools
    7.3.2 Compiler Options
    7.3.3 Memory Optimization and Arithmetic Pipelines
    7.3.4 Optimized Libraries
    7.3.5 Replacement of If/Where Statements by Using Mask Arrays
    7.3.6 Computational Performance
  7.4 Visualization on MPP Machines
  7.5 Scientific Results
  7.6 Summary and Future Challenges
  Acknowledgments
  References

Chapter 8 - Impact of Aircraft on Global Atmospheric Chemistry
Douglas A. Rotman, John R. Tannahill, Steven L. Baughcum
  8.1 Introduction
  8.2 Industrial Considerations
  8.3 Project Objectives and Application Code
  8.4 Computational Considerations
    8.4.1 Why Use an MPP?
    8.4.2 Programming Considerations
    8.4.3 Algorithm Considerations
  8.5 Computational Results
    8.5.1 Performance
    8.5.2 Subsidiary Technology
  8.6 Industrial Results
  8.7 Summary
  References

Chapter 9 - Petroleum Reservoir Management
Michael DeLong, Allyson Gajraj, Wayne Joubert, Olaf Lubeck, James Sanderson, Robert E. Stephenson, Gautam S. Shiralkar, Bart van Bloemen Waanders
  9.1 Introduction
  9.2 The Need for Parallel Simulations
  9.3 Basic Features of the Falcon Simulator
  9.4 Parallel Programming Model and Implementation
  9.5 IMPES Linear Solver
  9.6 Fully Implicit Linear Solver
  9.7 Falcon Performance Results
  9.8 Amoco Field Study
  9.9 Summary
  Acknowledgments
  References

Chapter 10 - An Architecture-Independent Navier-Stokes Code
Johnson C. T. Wang, Stephen Taylor
  10.1 Introduction
  10.2 Basic Equations
    10.2.1 Nomenclature
  10.3 A Navier-Stokes Solver
  10.4 Parallelization of a Navier-Stokes Solver
    10.4.1 Domain Decomposition
    10.4.2 Parallel Algorithm
  10.5 Computational Results
    10.5.1 Supersonic Flow over Two Wedges
    10.5.2 Titan IV Launch Vehicle
    10.5.3 Delta II 7925 Vehicle
  10.6 Summary
  Acknowledgments
  References

Chapter 11 - Gaining Insights into the Flow in a Static Mixer
Olivier Byrde, Mark L. Sawley
  11.1 Introduction
    11.1.1 Overview
    11.1.2 Description of the Application
  11.2 Computational Aspects
    11.2.1 Why Use an MPP?
    11.2.2 Flow Computation
    11.2.3 Particle Tracking
  11.3 Performance Results
    11.3.1 Flow Computation
    11.3.2 Particle Tracking
  11.4 Industrial Results
    11.4.1 Numerical Solutions
    11.4.2 Optimization Results
    11.4.3 Future Work
  11.5 Summary
  Acknowledgments
  References

Chapter 12 - Modeling Groundwater Flow and Contaminant Transport
William J. Bosil, Steven F. Ashby, Chuck Baldwin, Robert D. Falgout, Steven G. Smith, Andrew F. B. Tompson
  12.1 Introduction
  12.2 Numerical Simulation of Groundwater Flow
    12.2.1 Flow and Transport Model
    12.2.2 Discrete Solution Approach
  12.3 Parallel Implementation
    12.3.1 Parallel Random Field Generation
    12.3.2 Preconditioned Conjugate Gradient Solver
    12.3.3 Gridding and Data Distribution
    12.3.4 Parallel Computations in ParFlow
    12.3.5 Scalability
  12.4 The MGCG Algorithm
    12.4.1 Heuristic Semicoarsening Strategy
    12.4.2 Operator-Induced Prolongation and Restriction
    12.4.3 Definition of Coarse Grid Operator
    12.4.4 Smoothers
    12.4.5 Coarsest Grid Solvers
    12.4.6 Stand-Alone Multigrid versus Multigrid As a Preconditioner
  12.5 Numerical Results
    12.5.1 The Effect of Coarsest Grid Solver Strategy
    12.5.2 Increasing the Spatial Resolution
    12.5.3 Enlarging the Size of the Domain
    12.5.4 Increasing the Degree of Heterogeneity
    12.5.5 Parallel Performance on the Cray T3D
  12.6 Summary
  Acknowledgments
  References

Chapter 13 - Simulation of Plasma Reactors
Stephen Taylor, Marc Rieffel, Jerrell Watts, Sadasivan Shankar
  13.1 Introduction
  13.2 Computational Considerations
    13.2.1 Grid Generation and Partitioning Techniques
    13.2.2 Concurrent DSMC Algorithm
    13.2.3 Grid Adaption Technique
    13.2.4 Library Technology
  13.3 Simulation Results
  13.4 Performance Results
  13.5 Summary
  Acknowledgments
  References

Chapter 14 - Electron-Molecule Collisions for Plasma Modeling
Carl Winstead, Chuo-Han Lee, Vincent McKoy
  14.1 Introduction
  14.2 Computing Electron-Molecule Cross Sections
    14.2.1 Theoretical Outline
    14.2.2 Implementation
    14.2.3 Parallel Organization
  14.3 Performance
  14.4 Summary
  Acknowledgments
  References

Chapter 15 - Three-Dimensional Plasma Particle-in-Cell Calculations of Ion Thruster Backflow Contamination
Robie I. Samanta Roy, Daniel E. Hastings, Stephen Taylor
  15.1 Introduction
  15.2 The Physical Model
    15.2.1 Beam Ions
    15.2.2 Neutral Efflux
    15.2.3 CEX Propellant Ions
    15.2.4 Electrons
  15.3 The Numerical Model
  15.4 Parallel Implementation
    15.4.1 Partitioning
    15.4.2 Parallel PIC Algorithm
  15.5 Results
    15.5.1 3D Plume Structure
    15.5.2 Comparison of 2D and 3D Results
  15.6 Parallel Study
  15.7 Summary
  Acknowledgments
  References

Chapter 16 - Advanced Atomic-Level Materials Design
Lin H. Yang
  16.1 Introduction
  16.2 Industrial Considerations
  16.3 Computational Considerations and Parallel Implementations
  16.4 Applications to Grain Boundaries in Polycrystalline Diamond
  16.5 Summary
  Acknowledgments
  References

Chapter 17 - Solving Symmetric Eigenvalue Problems
David C. O'Neal, Raghurama Reddy
  17.1 Introduction
  17.2 Jacobi's Method
  17.3 Classical Jacobi Method
  17.4 Serial Jacobi Method
  17.5 Tournament Orderings
  17.6 Parallel Jacobi Method
  17.7 Macro Jacobi Method
  17.8 Computational Experiments
    17.8.1 Test Problems
    17.8.2 Convergence
    17.8.3 Scaling
  17.9 Summary
  Acknowledgments
  References

Chapter 18 - Nuclear Magnetic Resonance Simulations
Alan J. Benesi, Kenneth M. Merz, James J. Vincent, Ravi Subramanya
  18.1 Introduction
  18.2 Scientific Considerations
  18.3 Description of the Application
  18.4 Computational Considerations
    18.4.1 Algorithmic Considerations
    18.4.2 Programming Considerations
  18.5 Computational Results
  18.6 Scientific Results
    18.6.1 Validation of Simulation
    18.6.2 Interesting Scientific Results
  18.7 Summary
  Acknowledgments
  References

Chapter 19 - Molecular Dynamics Simulations Using Particle-Mesh Ewald Methods
Michael F. Crowley, David W. Deerfield II, Tom A. Darden, Thomas E. Cheatham III
  19.1 Introduction: Industrial Considerations
    19.1.1 Overview
    19.1.2 Cutoff Problem for Long-Distance Forces
    19.1.3 Particle-Mesh Ewald Method
  19.2 Computational Considerations
    19.2.1 Why Use an MPP?
    19.2.2 Parallel PME
    19.2.3 Coarse-Grain Parallel PME
  19.3 Computational Results
    19.3.1 Performance
    19.3.2 Parallel 3D FFT and Groups
  19.4 Industrial Strength Results
  19.5 The Future
  19.6 Summary
  References

Chapter 20 - Radar Scattering and Antenna Modeling
Tom Cwik, Cinzia Zuffada, Daniel S. Katz, Jay Parker
  20.1 Introduction
  20.2 Electromagnetic Scattering and Radiation
    20.2.1 Formulation of the Problem
    20.2.2 Why This Formulation Addresses the Problem
  20.3 Finite Element Modeling
    20.3.1 Discretization of the Problem
    20.3.2 Why Use a Scalable MPP?
  20.4 Computational Formulation and Results
    20.4.1 Constructing the Matrix Problem
    20.4.2 Beginning the Matrix Solution
    20.4.3 Completing the Solution of the Matrix Problem
    20.4.4 The Three Stages of the Application
  20.5 Results for Radar Scattering and Antenna Modeling
    20.5.1 Anisotropic Scattering
    20.5.2 Patch Antennas-Modeling Conformal Antennas with PHOEBE
  20.6 Summary and Future Challenges
  Acknowledgments
  References

Chapter 21 - Functional Magnetic Resonance Imaging Dataset Analysis
Nigel H. Goddard, Greg Hood, Jonathan D. Cohen, Leigh E. Nystrom, William F. Eddy, Christopher R. Genovese, Douglas C. Noll
  21.1 Introduction
  21.2 Industrial Considerations
    21.2.1 Overview
    21.2.2 Description of the Application
    21.2.3 Parallelization and the Online Capability
  21.3 Computational Considerations
    21.3.1 Why Use an MPP?
    21.3.2 Programming Considerations
    21.3.3 Algorithm Considerations
  21.4 Computational Results
    21.4.1 Performance
    21.4.2 Subsidiary Technologies
  21.5 Clinical and Scientific Results
    21.5.1 Supercomputing '96 Demonstration
    21.5.2 Science Application
    21.5.3 What Are the Next Problems to Tackle?
  21.6 Summary
  Acknowledgments
  References

Chapter 22 - Selective and Sensitive Comparison of Genetic Sequence Data
Alexander J. Ropelewski, Hugh B. Nicholas, Jr., David W. Deerfield II
  22.1 Introduction
  22.2 Industrial Considerations
    22.2.1 Overview/Statement of the Problem
  22.3 Approaches Used to Compare Sequences
    22.3.1 Visualization of Sequence Comparison
    22.3.2 Basic Sequence-Sequence Comparison Algorithm
    22.3.3 Basic Sequence-Profile Comparison Algorithm
    22.3.4 Other Approaches to Sequence Comparison
  22.4 Computational Considerations
    22.4.1 Why Use an MPP?
    22.4.2 Programming Considerations
    22.4.3 Algorithm Considerations
  22.5 Computational Results
    22.5.1 Performance
  22.6 Industrial and Scientific Considerations
  22.7 The Next Problems to Tackle
  22.8 Summary
  Acknowledgments
  References

Chapter 23 - Interactive Optimization of Video Compression Algorithms
Henri Nicolas, Fred Jordan
  23.1 Introduction
  23.2 Industrial Considerations
  23.3 General Description of the System
    23.3.1 General Principle
    23.3.2 Main Advantages Offered by Direct View
  23.4 Parallel Implementation
    23.4.1 Remarks
  23.5 Description of the Compression Algorithm
  23.6 Experimental Results
  23.7 Summary
  Acknowledgments
  References

PART III - Conclusions and Predictions

Chapter 24 - Designing Industrial Parallel Applications
Alice E. Koniges, David C. Eder, Michael A. Heroux
  24.1 Design Lessons from the Applications
    24.1.1 Meso- to Macroscale Environmental Modeling
    24.1.2 Applied Fluid Dynamics
    24.1.3 Applied Plasma Dynamics
    24.1.4 Material Design and Modeling
    24.1.5 Data Analysis
  24.2 Design Issues
    24.2.1 Code Conversion Issues
    24.2.2 The Degree of Parallelism in the Application
  24.3 Additional Design Issues

Chapter 25 - The Future of Industrial Parallel Computing
Michael A. Heroux, Horst Simon, Alice E. Koniges
  25.1 The Role of Parallel Computing in Industry
  25.2 Microarchitecture Issues
    25.2.1 Prediction
    25.2.2 Discussion
  25.3 Macroarchitecture Issues
    25.3.1 Prediction
    25.3.2 Discussion
  25.4 System Software Issues
    25.4.1 Prediction
    25.4.2 Discussion
  25.5 Programming Environment Issues
    25.5.1 Prediction
    25.5.2 Discussion
  25.6 Applications Issues
    25.6.1 Prediction
    25.6.2 Discussion
  25.7 Parallel Computing in Industry
    25.7.1 Area 1: Parallel Execution of a Single Analysis: Incompressible CFD Analysis
    25.7.2 Area 2: Design Optimization: Noise, Vibration, Harshness (NVH) Analysis
    25.7.3 Area 3: Design Studies: Crash Design Optimization
    25.7.4 Area 4: Interactive, Intuitive, Immersive Simulation Environments: Large-Scale Particle Tracing
  25.8 Looking Forward: The Role of Parallel Computing in the Digital Information Age
    25.8.1 Increasing the Demand for Parallel Computing
    25.8.2 The Importance of Advanced User Interfaces
    25.8.3 Highly Integrated Computing
    25.8.4 The Future
  References

Appendix: Mixed Models with Pthreads and MPI
Vijay Sonnad, Chary G. Tamirisa, Gyan Bhanot

Glossary
Index
Contributors