ChatGPT: Principles and Architecture
- 1st Edition - June 1, 2025
- Author: Cheng Ge
- Language: English
- Paperback ISBN: 978-0-443-27436-7
- eBook ISBN: 978-0-443-27437-4
ChatGPT: Principles and Architecture is an important, comprehensive book that brings readers up to date with the latest developments in large language models such as ChatGPT, fulfilling the need for a resource that not only explains the theory but also provides insights into the implementation of AI technologies. The book bridges the knowledge gap between theoretical AI concepts and their practical applications, equipping industry professionals and researchers with a deeper understanding of large language models so that they can leverage these technologies effectively in their respective fields.

This book tackles the complexity of understanding large language models and their practical applications. It demystifies the underlying technologies and strategies used in developing ChatGPT and similar models, offering readers a clear roadmap from conceptual understanding to practical implementation. By combining theoretical knowledge with real-world examples, it enables readers to grasp the nuances of AI technologies, paving the way for innovative applications and solutions in their professional domains.

In exploring the intricacies of large language models, the book focuses on the principles, architecture, pretraining, transfer learning, and middleware programming techniques of ChatGPT, providing a useful resource for the research and academic communities.

The book addresses the needs of industry professionals, researchers, and students in AI and computer science who face daily challenges in understanding and implementing complex large language model technologies. It provides the theoretical knowledge and practical insights needed to apply these technologies effectively in their work, and offers guidance on navigating the rapidly evolving landscape of AI.
- Offers comprehensive insights into the principles and architecture of ChatGPT, helping readers to understand the intricacies of large language models
- Provides a detailed analysis of large language model technologies, covering key aspects such as pretraining, transfer learning, and middleware programming in a thorough and accessible manner
- Includes real-world examples and case studies illustrating how large language models can be applied in various industries and professional settings
- Discusses future developments and potential innovations in the field of large language models, preparing readers for upcoming changes and technological advancements
Researchers in artificial intelligence (AI) and related disciplines using large language models, engineers dealing with large-scale data processing and analysis, and AI product managers seeking an up-to-date, in-depth yet accessible understanding of the principles, mechanisms, and architecture of large language models such as ChatGPT and their application to everyday life and work. Also IT professionals, software developers, those working to enhance security and privacy in data management, and technical managers in industries such as healthcare, education, and finance, and in the governmental, administrative, legal, and technology sectors, where AI and machine learning are becoming increasingly relevant and where a resource is needed to support understanding of current AI and its implementation in various professional contexts.
Chapter 1 The New Milestone in AI - ChatGPT
1.1 Development History of ChatGPT
1.2 Capability Level of ChatGPT
1.3 Evolution of Large Language Models
1.3.1 From Symbolism to Connectionism
1.3.2 Transformer Architecture
1.3.3 Unsupervised Pretraining
1.3.4 Supervised Fine-tuning
1.3.5 Reinforcement Learning from Human Feedback
1.4 Technology Stack of Large Language Models
1.5 Impact of Large Language Models
1.6 Barriers to Replication of Large Language Models
1.6.1 Computational Bottlenecks
1.6.2 Data Bottlenecks
1.6.3 Engineering Bottlenecks
1.7 Limitations and Improvement Directions for Large Language Models
1.8 Summary
Chapter 2 In-Depth Understanding of Transformer Architecture
2.1 Introduction to Transformer Architecture
2.2 Self-attention Mechanism
2.2.1 Calculation Process of Self-attention
2.2.2 Essence of Self-attention Mechanism
2.2.3 Advantages and Limitations of Self-attention
2.3 Multi-head Attention
2.3.1 Implementation of Multi-head Attention
2.3.2 Role of Multi-head Attention
2.3.3 Optimization of Multi-head Attention
2.4 Feed-forward Neural Networks
2.5 Residual Connections
2.6 Layer Normalization
2.7 Position Encoding
2.7.1 Design and Implementation of Position Encoding
2.7.2 Variants of Position Encoding
2.7.3 Advantages and Limitations of Position Encoding
2.8 Training and Optimization
2.8.1 Loss Functions
2.8.2 Optimizers
2.8.3 Learning Rate Adjustment Strategies
2.8.4 Regularization
2.8.5 Other Training and Optimization Techniques
2.9 Summary
Chapter 3 Generative Pretraining
3.1 Introduction to Generative Pretraining
3.2 GPT's Transformer Architecture
3.3 Process of Generative Pretraining
3.3.1 Objectives of Generative Pretraining
3.3.2 Error Backpropagation in Generative Pretraining
3.4 Supervised Fine-tuning
3.4.1 Principles of Supervised Fine-tuning
3.4.2 Specific Tasks for Supervised Fine-tuning
3.4.3 Steps of Fine-tuning for Specific Tasks
3.5 Summary
Chapter 4 Unsupervised Multi-task and Zero-shot Learning
4.1 Encoder and Decoder in GPT-2
4.2 Transformer Architecture of GPT-2
4.2.1 Layer Normalization
4.2.2 Orthogonal Initialization
4.2.3 Reversible Tokenization Methods
4.2.4 Learnable Relative Position Encoding
4.3 Unsupervised Multi-task Learning
4.4 Relationship between Multi-task and Zero-shot Learning
4.5 Autoregressive Generation Process in GPT-2
4.5.1 Token Embedding Matrices
4.5.2 Autoregressive Process
4.6 Summary
Chapter 5 Sparse Attention and Content-based Learning in GPT-3
5.1 Architecture of GPT-3
5.2 Sparse Attention Mechanism
5.2.1 Characteristics of Sparse Transformer
5.2.2 Local Band Attention
5.2.3 Cross-layer Sparse Connections
5.3 Meta-learning and Content-based Learning
5.3.1 Meta-learning
5.3.2 Content-based Learning
5.4 Bayesian Inference of Concept Distribution
5.4.1 Implicit Fine-tuning
5.4.2 Bayesian Reasoning
5.5 Reasoning Capability of Thought Chains
5.6 Summary
Chapter 6 Pretraining Strategies for Large Language Models
6.1 Pretraining Datasets
6.2 Data Processing for Pretraining
6.3 Distributed Training Modes
6.3.1 Data Parallelism
6.3.2 Model Parallelism
6.4 Distributed Training Architectures
6.4.1 Pathways
6.4.2 Megatron-LM
6.4.3 ZeRO
6.5 Examples of Training Strategies
6.5.1 Training Frameworks
6.5.2 Parameter Stability
6.5.3 Adjustments in Training Settings
6.5.4 BF16 Optimization
6.5.5 Other Factors
6.6 Summary
Chapter 7 Proximal Policy Optimization Algorithms
7.1 Policy Gradient Methods
7.1.1 Basic Principles of Policy Gradient Methods
7.1.2 Importance Sampling
7.1.3 Introduction of Advantage Functions
7.2 Actor-critic Algorithms
7.2.1 Basic Steps of the Algorithm
7.2.2 Value Functions and Policy Updates
7.2.3 Challenges and Problems in Actor-critic Algorithms
7.3 Trust Region Policy Optimization
7.3.1 Policy Optimization Objectives in TRPO
7.3.2 Limitations of TRPO Algorithm
7.4 Principles of PPO Algorithm
7.4.1 Objective Functions
7.4.2 Loss and Reward Functions
7.5 Summary
Chapter 8 Human Feedback Reinforcement Learning
8.1 Role of Reinforcement Learning in ChatGPT Iterations
8.2 Training Datasets for InstructGPT/ChatGPT
8.2.1 Sources of Fine-tuning Dataset
8.2.2 Annotation Standards
8.2.3 Data Analysis
8.3 Stages of Human Feedback Reinforcement Learning
8.3.1 Supervised Fine-tuning Stage
8.3.2 Reward Modeling Stage
8.3.3 Reinforcement Learning Stage
8.4 Reward Modeling Algorithms
8.4.1 Algorithm Concepts
8.4.2 Loss Functions
8.5 Application of PPO in InstructGPT
8.6 Multi-turn Dialogue Capabilities
8.7 Necessity of Human Feedback Reinforcement Learning
8.8 Summary
Chapter 9 Low-Compute Domain Transfer for Large Language Models
9.1 Bootstrapping Instructions
9.2 AI Feedback in Low-compute Environments
9.3 Low-Rank Adaptation
9.3.1 Training and Deployment of Low-rank Adaptation Models
9.3.2 Choice of Rank in Low-rank Adaptation
9.4 Quantization - Reducing Compute Requirements for Deployment
9.5 SparseGPT Pruning Algorithm
9.6 Low-compute Transfer Cases of Open-source Large Language Models
9.6.1 Baseline Models
9.6.2 Bootstrapped Instruction Fine-tuning of the Llama Series
9.6.3 Chinese Solutions
9.6.4 Medical Domain Transfer Cases
9.6.5 Legal Domain Transfer Cases
9.7 Summary
Chapter 10 Middleware Programming
10.1 Filling the Gap - LangChain Comes at the Right Time
10.2 Multimodal Fusion Middleware
10.2.1 Task Planning
10.2.2 Model Selection
10.2.3 Task Execution
10.2.4 Response Generation
10.3 AutoGPT Autonomous Agents and Task Planning
10.4 Competitive Middleware Frameworks
10.5 Summary
Chapter 11 The Future Path of Large Language Models
11.1 Path to Strong AI
11.2 Data Resource Depletion
11.3 Limitations of Autoregressive Models
11.4 Embodied Intelligence
11.4.1 Challenges of Embodied Intelligence
11.4.2 PaLM-E
11.4.3 ChatGPT for Robotics
11.5 Summary
- No. of pages: 300
- Language: English
- Edition: 1
- Published: June 1, 2025
- Imprint: Elsevier
- Paperback ISBN: 9780443274367
- eBook ISBN: 9780443274374
Cheng Ge
Dr. Cheng Ge is the Deputy Director of the Technology Transfer Center at Xiangtan University and the Vice Dean of the JD Intelligent City and Big Data Research Institute in Xiangtan. He is a professor at the School of Computer Science and Cyberspace at Xiangtan University and a committee member of the CCF Law and Computing Society. His primary research interests include knowledge representation learning and content security. An experienced entrepreneur, Dr. Ge has founded several software companies and provided AI industry insights as a technology consultant.
Affiliations and expertise
Xiangtan University, Xiangtan, China