Parallel Computing Works!
- 1st Edition - May 1, 1994
- Authors: Geoffrey C. Fox, Roy D. Williams, Paul C. Messina
- Language: English
- Hardback ISBN: 978-1-55860-253-3
- eBook ISBN: 978-0-08-051351-5

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture.
The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop algorithms for frequently used mathematical computations. They also devise performance models, measure the performance characteristics of several computers, and create a high-performance computing facility based exclusively on parallel computers. By addressing all issues involved in scientific problem solving, Parallel Computing Works! provides valuable insight into computational science for large-scale parallel architectures. For those in the sciences, the findings reveal the usefulness of an important experimental tool. Anyone in supercomputing and related computational fields will gain a new perspective on the potential contributions of parallelism. Includes over 30 full-color illustrations.
by Geoffrey C. Fox, Roy D. Williams, Paul C. Messina
- Preface
- 1 Introduction
- 1.1 Introduction
- 1.2 The National Vision for Parallel Computation
- 1.3 Caltech Concurrent Computation Program
- 1.4 How Parallel Computing Works
- 2 Technical Backdrop
- 2.2 Hardware
- 2.2.1 Parallel Scientific Computers Before 1980
- 2.2.2 Early 1980s
- 2.2.3 Birth of the Hypercube
- 2.2.4 Mid-1980s
- 2.2.5 Late 1980s
- 2.2.6 Parallel Systems-1992
- 2.3 Software
- 2.3.1 Languages and Compilers
- 2.3.2 Tools
- 2.4 Summary
- 3 A Methodology for Computation
- 3.1 Introduction
- 3.2 The Process
- 3.3 Complex Systems and Space-Time Structure
- 3.4 The Temporal Properties of Complex Systems
- 3.5 Spatial Properties of Complex Systems
- 3.6 Compound Complex Systems
- 3.7 Mapping Complex Systems
- 3.8 Parallel Computing Works?
- 4 Synchronous Applications I
- 4.1 QCD and the Beginning of C3P
- 4.2 Synchronous Applications
- 4.3 Quantum Chromodynamics
- 4.3.1 Introduction
- 4.3.2 Monte Carlo
- 4.3.3 QCD
- 4.3.4 Lattice QCD
- 4.3.5 Concurrent QCD Machines
- 4.3.6 QCD on the Caltech Hypercubes
- 4.3.7 QCD on the Connection Machine
- 4.3.8 Status and Prospects
- 4.4 Spin Models
- 4.4.1 Introduction
- 4.4.2 Ising Model
- 4.4.3 Potts Model
- 4.4.4 XY Model
- 4.4.5 O(3) Model
- 4.5 An Automata Model of Granular Materials
- 4.5.1 Introduction
- 4.5.2 Comparison to Particle Dynamics Models
- 4.5.3 Comparison to Lattice Gas Models
- 4.5.4 The Rules for the Lattice Grain Model
- 4.5.5 Implementation on a Parallel Computer
- 4.5.6 Simulations
- 4.5.7 Conclusion
- 5 Express and CrOS
- 5.1 Multicomputer Operating Systems
- 5.2 A "Packet" History of Message-passing Systems
- 5.2.1 Prehistory
- 5.2.2 Application-driven Development
- 5.2.3 Collective Communication
- 5.2.4 Automated Decomposition-whoami
- 5.2.5 "Melting"-A Non-crystalline Problem
- 5.2.6 The Mark III
- 5.2.7 Host Programs
- 5.2.8 A Ray Tracer - and an "Operating System"
- 5.2.9 The Crystal Router
- 5.2.10 Portability
- 5.2.11 Express
- 5.2.12 Other Message-passing Systems
- 5.2.13 What Did We Learn?
- 5.2.14 Conclusions
- 5.3 Parallel Debugging
- 5.3.1 Introduction and History
- 5.3.2 Designing a Parallel Debugger
- 5.3.3 Conclusions
- 5.4 Parallel Profiling
- 5.4.1 Missing Point
- 5.4.2 Visualization
- 5.4.3 Goals in Performance Analysis
- 5.4.4 Overhead Analysis
- 5.4.5 Event Tracing
- 5.4.6 Data Distribution Analysis
- 5.4.7 CPU Usage Analysis
- 5.4.8 Why So Many Separate Tools?
- 5.4.9 Conclusions
- 6 Synchronous Applications II
- 6.1 Computational Issues in Synchronous Problems
- 6.2 Convectively-Dominated Flows
- 6.2.1 An Overview of the FCT Technique
- 6.2.2 Mathematics and the FCT Algorithm
- 6.2.3 Parallel Issues
- 6.2.4 Example Problem
- 6.2.5 Performance and Results
- 6.2.6 Summary
- 6.3 Magnetism
- 6.3.1 Introduction
- 6.3.2 The Computational Algorithm
- 6.3.3 Parallel Implementation and Performance
- 6.3.4 Physics Results
- 6.3.5 Conclusions
- 6.4 Phase Transitions
- 6.4.1 The Case of h < J: Antiferromagnetic Transitions
- 6.4.2 The Case of h = -J: Quantum XY Model and Topological Transition
- 6.5 Surface Reconstruction
- 6.5.1 Multigrid Method with Discontinuities
- 6.5.2 Interacting Line Processes
- 6.5.3 Generic Look-up Table and Specific Parametrization
- 6.5.4 Pyramid on a 2D Mesh of Processors
- 6.5.5 Results for Orientation Constraints
- 6.5.6 Results for Depth Constraints
- 6.5.7 Conclusions
- 6.6 Character Recognition by Neural Nets
- 6.6.1 MLP in General
- 6.6.2 Character Recognition using MLP
- 6.6.3 The Multiscale Technique
- 6.6.4 Results
- 6.6.5 Comments and Variants on the Method
- 6.7 An Adaptive Multiscale Scheme
- 6.7.1 Errors in Computing the Motion Field
- 6.7.2 Adaptive Multiscale Scheme on a Multicomputer
- 6.7.3 Conclusions
- 6.8 Collective Stereopsis
- 7 Independent Parallelism
- 7.1 Embarrassingly Parallel Problem Structure
- 7.2 Dynamically Triangulated Random Surfaces
- 7.2.1 Introduction
- 7.2.2 Discretized Strings
- 7.2.3 Computational Aspects
- 7.2.4 Performance of String Program
- 7.2.5 Conclusion
- 7.3 Numerical Study of High-Tc Spin Systems
- 7.4 Statistical Gravitational Lensing
- 7.5 Parallel Random Number Generators
- 7.6 The GENESIS Project
- 7.6.1 What is Computational Neurobiology?
- 7.6.2 Parallel Computers?
- 7.6.3 Problems with Most Present Day Parallel Computers
- 7.6.4 What is GENESIS?
- 7.6.5 Task Farming
- 7.6.6 Distributed Modelling via the Postmaster Element
- 8 Full Matrix Algorithms
- 8.1 Full and Banded Matrix Algorithms
- 8.1.1 Matrix Decomposition
- 8.1.2 Basic Matrix Arithmetic
- 8.1.3 Matrix Multiplication for Banded Matrices
- 8.1.4 Systems of Linear Equations
- 8.1.5 The Gauss-Jordan Method
- 8.1.6 Other Matrix Algorithms
- 8.1.7 Concurrent Linear Algebra Libraries
- 8.1.8 Problem Structure
- 8.1.9 Conclusions
- 8.2 Quantum Mechanical Reactive Scattering
- 8.2.1 Introduction
- 8.2.2 Methodology
- 8.2.3 Parallel Algorithm
- 8.2.4 Results and Discussion
- 8.3 Electron-Molecule Collisions
- 8.3.1 Introduction
- 8.3.2 The SMC Method and Its Implementation
- 8.3.3 Parallel Implementation
- 8.3.4 Performance
- 8.3.5 Selected Results
- 8.3.6 Conclusion
- 9 Loosely Synchronous Problems
- 9.1 Problem Structure
- 9.2 Micromechanical Simulations
- 9.3 Plasma Particle-in-Cell Simulation
- 9.3.1 Introduction
- 9.3.2 GCPIC Algorithm
- 9.3.3 Electron Beam Plasma Instability
- 9.3.4 Performance Results for 1D Electrostatic Code
- 9.3.5 One-Dimensional Electromagnetic Code
- 9.3.6 Dynamic Load Balancing
- 9.3.7 Summary
- 9.4 Computational Electromagnetics
- 9.5 LU Factorization
- 9.5.1 Introduction
- 9.5.2 Design Overview
- 9.5.3 Reduced-Communication Pivoting
- 9.5.4 New Data Distributions
- 9.5.5 Performance Versus Scattering
- 9.5.6 Performance
- 9.5.7 Conclusions
- 9.6 Concurrent DASSL
- 9.6.1 Introduction
- 9.6.2 Mathematical Formulation
- 9.6.3 proto-Cdyn - Simulation Layer
- 9.6.4 Concurrent Formulation
- 9.6.5 Chemical Engineering Example
- 9.6.6 Conclusions
- 9.7 Adaptive Multigrid
- 9.7.1 Introduction
- 9.7.2 The Basic Algorithm
- 9.7.3 The Adaptive Algorithm
- 9.7.4 The Concurrent Algorithm
- 9.7.5 Summary
- 9.8 Munkres Algorithm for Assignment
- 9.8.1 Introduction
- 9.8.2 The Sequential Algorithm
- 9.8.3 The Concurrent Algorithm
- 9.9 Optimization Methods for Neural Nets
- 9.9.1 Deficiencies of Steepest Descent
- 9.9.2 The "Bold Driver" Network
- 9.9.3 The Broyden-Fletcher-Goldfarb-Shanno One-Step Memoryless Quasi-Newton Method
- 9.9.4 Parallel Optimization
- 9.9.5 Experiment: The Dichotomy Problem
- 9.9.6 Experiment: Time Series Prediction
- 9.9.7 Summary
- 10 DIME Programming Environment
- 10.1 DIME
- 10.1.1 Applications and Extensions
- 10.1.2 The Components of DIME
- 10.1.3 Domain Definition
- 10.1.4 Mesh Structure
- 10.1.5 Refinement
- 10.1.6 Load Balancing
- 10.1.7 Summary
- 10.2 DIMEFEM
- 10.2.1 Memory Allocation
- 10.2.2 Operations and Elements
- 10.2.3 Navier-Stokes Solver
- 10.2.4 Results
- 10.2.5 Summary
- 11 Load Balancing and Optimization
- 11.1 Load Balancing as an Optimization Problem
- 11.1.1 Load Balancing a Finite-Element Mesh
- 11.1.2 The Optimization Problem and Physical Analogy
- 11.1.3 Algorithms for Load Balancing
- 11.1.4 Simulated Annealing
- 11.1.5 Recursive Bisection
- 11.1.6 Eigenvalue Recursive Bisection
- 11.1.7 Testing Procedure
- 11.1.8 Test Results
- 11.1.9 Conclusions
- 11.2 Physical Analogy
- 11.3 Physical Optimization
- 11.4 The Travelling Salesman Problem
- 11.4.1 Background on Local Search Heuristics
- 11.4.2 Background on Markov Chains and Simulated Annealing
- 11.4.3 The New Algorithms-Large-Step Markov Chains
- 11.4.4 Results
- 12 Irregular LS Problems
- 12.1 Irregular Loosely Synchronous Problems are Hard
- 12.2 The Electrosensory System of the Fish
- 12.2.1 Physical Model
- 12.2.2 Mathematical Theory
- 12.2.3 Results
- 12.2.4 Summary
- 12.3 Transonic Flow
- 12.3.1 Compressible Flow Algorithm
- 12.3.2 Adaptive Refinement
- 12.3.3 Examples
- 12.3.4 Performance
- 12.3.5 Summary
- 12.4 Tree Codes for N-body Simulations
- 12.4.1 Oct-Trees
- 12.4.2 Computing Forces
- 12.4.3 Parallelism in Tree Codes
- 12.4.4 Acquiring Locally Essential Data
- 12.4.5 Comments on Performance
- 12.5 Fast Vortex Algorithm
- 12.5.1 Vortex Methods
- 12.5.2 Fast Algorithms
- 12.5.3 Hypercube Implementation
- 12.5.4 Efficiency of Parallel Implementation
- 12.5.5 Results
- 12.6 Cluster Algorithms for Spin Models
- 12.6.1 Monte Carlo Calculations of Spin Models
- 12.6.2 Cluster Algorithms
- 12.6.3 Parallel Cluster Algorithms
- 12.6.4 Self-labelling
- 12.6.5 Global Equivalencing
- 12.6.6 Other Algorithms
- 12.6.7 Summary
- 12.7 Sorting
- 12.7.1 The Merge Strategy
- 12.7.2 The Bitonic Algorithm
- 12.7.3 Shellsort or Diminishing Increment Algorithm
- 12.7.4 Quicksort or Samplesort Algorithm
- 12.8 Hierarchical Tree-Structures
- 12.8.1 Introduction
- 12.8.2 Adaptive Structures
- 12.8.3 Tree as Grid
- 12.8.4 Conclusion
- 13 Data Parallel C and Fortran
- 13.1 High-Level Languages
- 13.1.1 High Performance Fortran Perspective
- 13.1.2 Problem Architecture and Message-Passing Fortran
- 13.1.3 Problem Architecture and Fortran 77
- 13.2 A Software Tool
- 13.2.1 Is Any Assistance Really Needed?
- 13.2.2 Overview of the Tool
- 13.2.3 Dependence-based Data Partitioning
- 13.2.4 Mapping Data to Processors
- 13.2.5 Communication Analysis and Performance Improvement Transformations
- 13.2.6 Communication Analysis Algorithm
- 13.2.7 Static Performance Estimator
- 13.2.8 Conclusion
- 13.3 Fortran 90 Experiments
- 13.4 Optimizing Compilers by Neural Networks
- 13.5 ASPAR
- 13.5.1 Degrees of Difficulty
- 13.5.2 Various Parallelizing Technologies
- 13.5.3 The Local View
- 13.5.4 The "Global" View
- 13.5.5 Global Strategies
- 13.5.6 Dynamic Data Distribution
- 13.5.7 Conclusions
- 13.6 Coherent Parallel C
- 13.7 Hierarchical Memory
- 14 Asynchronous Applications
- 14.1 Asynchronous Problems
- 14.2 Melting in Two Dimensions
- 14.2.1 Problem Description
- 14.2.2 Solution Method
- 14.2.3 Concurrent Update Procedure
- 14.2.4 Performance Analysis
- 14.3 Computer Chess
- 14.3.1 Sequential Computer Chess
- 14.3.2 Parallel Computer Chess: The Hardware
- 14.3.3 Parallel Alpha-Beta Pruning
- 14.3.4 Load Balancing
- 14.3.5 Speedup Measurements
- 14.3.6 Real-time Graphical Performance Monitoring
- 14.3.7 Speculation
- 14.3.8 Summary
- 15 High-Level Asynchronous Software
- 15.1 Asynchronous Software Paradigms
- 15.2 MOOS II
- 15.2.1 Design of MOOSE
- 15.2.2 Dynamic Load-Balancing Support
- 15.2.3 What We Learned
- 15.3 Time Warp
- 16 The Zipcode Message-Passing System
- 16.1 Overview of Zipcode
- 16.2 Low-Level Primitives
- 16.2.1 CE/RK Overview
- 16.2.2 Interface with the CE/RK system
- 16.2.3 CE Functions
- 16.2.4 RK Calls
- 16.2.5 Zipcode Calls
- 16.3 High-Level Primitives
- 16.3.1 Invoices
- 16.3.2 Packing and Unpacking
- 16.3.3 The Packed-Message Functions
- 16.3.4 Fortran Interface
- 16.4 Details of Execution
- 16.4.1 Initialization/Termination
- 16.4.2 Process Creation/Destruction
- 16.5 Conclusions
- 17 MOVIE
- 17.1 Introduction
- 17.1.1 The Beginning
- 17.1.2 Towards the MOVIE System
- 17.1.3 Current Status and Outlook
- 17.2 System Overview
- 17.2.1 The MOVIE System in a Nutshell
- 17.2.2 MovieScript as Virtual Machine Language
- 17.2.3 Data-Parallel Computing
- 17.2.4 Model for MIMD-parallelism
- 17.2.5 Distributed Computing
- 17.2.6 Object Orientation
- 17.2.7 Integrated Visualization Model
- 17.2.8 "In Large" Extensibility Model
- 17.2.9 CASE Tools
- 17.2.10 Planned MOVIE Applications
- 17.3 Map Separates
- 17.3.1 Problem Specification
- 17.3.2 Test Case
- 17.3.3 Segmentation via RGB Clustering
- 17.3.4 Comparison with JPL Neural Net Results
- 17.3.5 Edge Detection via Zero Crossing
- 17.3.6 Towards the Map Expert System
- 17.3.7 Summary
- 18 Simulation and Analysis
- 18.1 MetaProblems and MetaSoftware
- 18.1.1 Applications
- 18.1.2 Asynchronous versus Loosely Synchronous?
- 18.1.3 Software for Compound Problems
- 18.2 ISIS: An Interactive Seismic Imaging System
- 18.2.1 Introduction
- 18.2.2 Concepts of Interactive Imaging
- 18.2.3 Geologist-As-Analyst
- 18.2.4 Why Interactive Imaging?
- 18.2.5 System Design
- 18.2.6 Performance Considerations
- 18.2.7 Trace Manager
- 18.2.8 Display Manager
- 18.2.9 User Interface
- 18.2.10 Computation
- 18.2.11 Prototype System
- 18.2.12 Conclusions
- 18.3 Parallel Simulations that Emulate Function
- 18.3.1 The Basic Simulation Structure
- 18.3.2 The Run-time Environment-the Centaur Operating System
- 18.3.3 SDI Simulation Evolution
- 18.3.4 Simulation Framework and Synchronization Control
- 18.4 Multitarget Tracking
- 18.4.1 Nature of the Problem
- 18.4.2 Tracking Techniques
- 18.4.3 Algorithm Overview
- 18.4.4 Two-dimensional Mono Tracking
- 18.4.5 Three-dimensional Tracking
- 19 Parallel Computing in Industry
- 19.1 Motivation
- 19.2 Examples of Industrial Applications
- 20 Computational Science
- 20.1 Lessons
- 20.2 Computational Science
- Appendices
- A C3P Reports
- B Selected Biographic Information
- Bibliography
- Index
- No. of pages: 977
- Language: English
- Edition: 1
- Published: May 1, 1994
- Imprint: Morgan Kaufmann
- Hardback ISBN: 9781558602533
- eBook ISBN: 9780080513515