ChatGPT
Principles and Architecture
- 1st Edition - June 1, 2025
- Author: Ge Cheng
- Language: English
- Paperback ISBN: 978-0-443-27436-7
- eBook ISBN: 978-0-443-27437-4
ChatGPT: Principles and Architecture bridges the knowledge gap between theoretical AI concepts and their practical applications. It equips industry professionals and researchers with a deeper understanding of large language models, enabling them to effectively leverage these technologies in their respective fields. In addition, it tackles the complexity of understanding large language models and their practical applications by demystifying underlying technologies and strategies used in developing ChatGPT and similar models. By combining theoretical knowledge with real-world examples, the book enables readers to grasp the nuances of AI technologies, thus paving the way for innovative applications and solutions in their professional domains.
Sections focus on the principles, architecture, pretraining, transfer learning, and middleware programming techniques of ChatGPT, providing a useful resource for the research and academic communities. It is ideal for the needs of industry professionals, researchers, and students in the field of AI and computer science who face daily challenges in understanding and implementing complex large language model technologies.
- Offers comprehensive insights into the principles and architecture of ChatGPT, helping readers understand the intricacies of large language models
- Details large language model technologies, covering key aspects such as pretraining, transfer learning, and middleware programming, and addresses technical aspects in an accessible manner
- Includes real-world examples and case studies, illustrating how large language models can be applied in various industries and professional settings
- Discusses future developments and potential innovations in the field of large language models, preparing readers for upcoming changes and technological advancements
Researchers in artificial intelligence (AI) and related disciplines who use large language models, engineers dealing with large-scale data processing and analysis, and AI product managers seeking an up-to-date, in-depth yet accessible understanding of the principles, mechanisms, and architecture of large language models such as ChatGPT and their application to everyday life and work; IT professionals, software developers, those working to enhance security and privacy in data management, and technical managers in industries such as healthcare, education, and finance, and in the governmental, administrative, legal, and technology sectors, where AI and machine learning are becoming increasingly relevant, who need a resource to help them understand current AI and its implementation in various professional contexts.
1 The New Milestone in AI - ChatGPT
1.1 Development History of ChatGPT
1.2 Capability Level of ChatGPT
1.3 Evolution of Large Language Models
1.3.1 From Symbolism to Connectionism
1.3.2 Transformer Architecture
1.3.3 Unsupervised Pretraining
1.3.4 Supervised Fine-tuning
1.3.5 Reinforcement Learning from Human Feedback
1.4 Technology Stack of Large Language Models
1.5 Impact of Large Language Models
1.6 Barriers to Replication of Large Language Models
1.6.1 Computational Bottlenecks
1.6.2 Data Bottlenecks
1.6.3 Engineering Bottlenecks
1.7 Limitations and Improvement Directions for Large Language Models
1.8 Summary
2 In-Depth Understanding of Transformer Architecture
2.1 Introduction to Transformer Architecture
2.2 Self-attention Mechanism
2.2.1 Calculation Process of Self-attention
2.2.2 Essence of Self-attention Mechanism
2.2.3 Advantages and Limitations of Self-attention
2.3 Multi-head Attention
2.3.1 Implementation of Multi-head Attention
2.3.2 Role of Multi-head Attention
2.3.3 Optimization of Multi-head Attention
2.4 Feed-forward Neural Networks
2.5 Residual Connections
2.6 Layer Normalization
2.7 Position Encoding
2.7.1 Design and Implementation of Position Encoding
2.7.2 Variants of Position Encoding
2.7.3 Advantages and Limitations of Position Encoding
2.8 Training and Optimization
2.8.1 Loss Functions
2.8.2 Optimizers
2.8.3 Learning Rate Adjustment Strategies
2.8.4 Regularization
2.8.5 Other Training and Optimization Techniques
2.9 Summary
3 Generative Pretraining
3.1 Introduction to Generative Pretraining
3.2 GPT's Transformer Architecture
3.3 Process of Generative Pretraining
3.3.1 Objectives of Generative Pretraining
3.3.2 Error Backpropagation in Generative Pretraining
3.4 Supervised Fine-tuning
3.4.1 Principles of Supervised Fine-tuning
3.4.2 Specific Tasks for Supervised Fine-tuning
3.4.3 Steps of Fine-tuning for Specific Tasks
3.5 Summary
4 Unsupervised Multi-task and Zero-shot Learning
4.1 Encoder and Decoder in GPT-2
4.2 Transformer Architecture of GPT-2
4.2.1 Layer Normalization
4.2.2 Orthogonal Initialization
4.2.3 Reversible Tokenization Methods
4.2.4 Learnable Relative Position Encoding
4.3 Unsupervised Multi-task Learning
4.4 Relationship between Multi-task and Zero-shot Learning
4.5 Autoregressive Generation Process in GPT-2
4.5.1 Token Embedding Matrices
4.5.2 Autoregressive Process
4.6 Summary
5 Sparse Attention and Content-based Learning in GPT-3
5.1 Architecture of GPT-3
5.2 Sparse Attention Mechanism
5.2.1 Characteristics of Sparse Transformer
5.2.2 Local Band Attention
5.2.3 Cross-layer Sparse Connections
5.3 Meta-learning and Content-based Learning
5.3.1 Meta-learning
5.3.2 Content-based Learning
5.4 Bayesian Inference of Concept Distribution
5.4.1 Implicit Fine-tuning
5.4.2 Bayesian Reasoning
5.5 Reasoning Capability of Thought Chains
5.6 Summary
6 Pretraining Strategies for Large Language Models
6.1 Pretraining Datasets
6.2 Data Processing for Pretraining
6.3 Distributed Training Modes
6.3.1 Data Parallelism
6.3.2 Model Parallelism
6.4 Distributed Training Architectures
6.4.1 Pathways
6.4.2 Megatron-LM
6.4.3 ZeRO
6.5 Examples of Training Strategies
6.5.1 Training Frameworks
6.5.2 Parameter Stability
6.5.3 Adjustments in Training Settings
6.5.4 BF16 Optimization
6.5.5 Other Factors
6.6 Summary
7 Proximal Policy Optimization Algorithms
7.1 Policy Gradient Methods
7.1.1 Basic Principles of Policy Gradient Methods
7.1.2 Importance Sampling
7.1.3 Introduction of Advantage Functions
7.2 Actor-critic Algorithms
7.2.1 Basic Steps of the Algorithm
7.2.2 Value Functions and Policy Updates
7.2.3 Challenges and Problems in Actor-critic Algorithms
7.3 Trust Region Policy Optimization
7.3.1 Policy Optimization Objectives in TRPO
7.3.2 Limitations of TRPO Algorithm
7.4 Principles of PPO Algorithm
7.4.1 Objective Functions
7.4.2 Loss and Reward Functions
7.5 Summary
8 Human Feedback Reinforcement Learning
8.1 Role of Reinforcement Learning in ChatGPT Iterations
8.2 Training Datasets for InstructGPT/ChatGPT
8.2.1 Sources of Fine-tuning Dataset
8.2.2 Annotation Standards
8.2.3 Data Analysis
8.3 Stages of Human Feedback Reinforcement Learning
8.3.1 Supervised Fine-tuning Stage
8.3.2 Reward Modeling Stage
8.3.3 Reinforcement Learning Stage
8.4 Reward Modeling Algorithms
8.4.1 Algorithm Concepts
8.4.2 Loss Functions
8.5 Application of PPO in InstructGPT
8.6 Multi-turn Dialogue Capabilities
8.7 Necessity of Human Feedback Reinforcement Learning
8.8 Summary
9 Low-Compute Domain Transfer for Large Language Models
9.1 Bootstrapping Instructions
9.2 AI Feedback in Low-compute Environments
9.3 Low-Rank Adaptation
9.3.1 Training and Deployment of Low-rank Adaptation Models
9.3.2 Choice of Rank in Low-rank Adaptation
9.4 Quantization - Reducing Compute Requirements for Deployment
9.5 SparseGPT Pruning Algorithm
9.6 Low-compute Transfer Cases of Open-source Large Language Models
9.6.1 Baseline Models
9.6.2 Bootstrapped Instruction Fine-tuning of the Llama Series
9.6.3 Chinese Solutions
9.6.4 Medical Domain Transfer Cases
9.6.5 Legal Domain Transfer Cases
9.7 Summary
10 Middleware Programming
10.1 Filling the Gap - LangChain Comes at the Right Time
10.2 Multimodal Fusion Middleware
10.2.1 Task Planning
10.2.2 Model Selection
10.2.3 Task Execution
10.2.4 Response Generation
10.3 AutoGPT Autonomous Agents and Task Planning
10.4 Competitive Middleware Frameworks
10.5 Summary
11 The Future Path of Large Language Models
11.1 Path to Strong AI
11.2 Data Resource Depletion
11.3 Limitations of Autoregressive Models
11.4 Embodied Intelligence
11.4.1 Challenges of Embodied Intelligence
11.4.2 PaLM-E
11.4.3 ChatGPT for Robotics
11.5 Summary
- No. of pages: 300
- Language: English
- Edition: 1
- Published: June 1, 2025
- Imprint: Elsevier
- Paperback ISBN: 9780443274367
- eBook ISBN: 9780443274374
Ge Cheng
Affiliations and expertise
Dr Cheng is the Deputy Director of the Technology Transfer Center at Xiangtan University and the Vice Dean of the JD Intelligent City and Big Data Research Institute in Xiangtan, China.