ChatGPT: Principles and Architecture
- 1st Edition - June 1, 2025
- Author: Cheng Ge
- Language: English
- Paperback ISBN: 978-0-443-27436-7
- eBook ISBN: 978-0-443-27437-4
ChatGPT: Principles and Architecture is an important, comprehensive book that brings readers up to date with the latest developments in large language models such as ChatGPT, fulfilling the need for a resource that not only explains the theory but also provides insights into the implementation of AI technologies. The book bridges the knowledge gap between theoretical AI concepts and their practical applications, equipping industry professionals and researchers with a deeper understanding of large language models so that they can leverage these technologies effectively in their respective fields.

This book tackles the complexity of understanding large language models and their practical applications. It demystifies the underlying technologies and strategies used in developing ChatGPT and similar models, offering readers a clear roadmap from conceptual understanding to practical implementation. By combining theoretical knowledge with real-world examples, it enables readers to grasp the nuances of AI technologies, paving the way for innovative applications and solutions in their professional domains.

In exploring the intricacies of large language models, the book focuses on the principles, architecture, pretraining, transfer learning, and middleware programming techniques of ChatGPT, providing a useful resource for the research and academic communities.

The book addresses the needs of industry professionals, researchers, and students in AI and computer science who face daily challenges in understanding and implementing complex large language model technologies. It provides the theoretical knowledge and practical insights needed to apply these technologies effectively in their work, and offers guidance on navigating the rapidly evolving landscape of AI.
- Offers comprehensive insights into the principles and architecture of ChatGPT, helping readers to understand the intricacies of large language models
- Provides a detailed analysis of large language model technologies, covering key aspects such as pretraining, transfer learning, and middleware programming in a thorough and accessible manner
- Includes real-world examples and case studies illustrating how large language models can be applied in various industries and professional settings
- Discusses future developments and potential innovations in the field of large language models, preparing readers for upcoming changes and technological advancements
Researchers in artificial intelligence (AI) and related disciplines using large language models, engineers dealing with large-scale data processing and analysis, and AI product managers seeking an up-to-date, in-depth yet accessible understanding of the principles, mechanisms, and architecture of large language models such as ChatGPT and their application to everyday life and work. Also IT professionals, software developers, those working to enhance security and privacy in data management, and technical managers in industries such as healthcare, education, and finance, and in the governmental, administrative, legal, and technology sectors, where AI and machine learning are becoming increasingly relevant and where a resource is needed to support understanding of current AI and its implementation in various professional contexts.
Chapter 1 The New Milestone in AI - ChatGPT
1.1 Development History of ChatGPT
1.2 Capability Level of ChatGPT
1.3 Evolution of Large Language Models
1.3.1 From Symbolism to Connectionism
1.3.2 Transformer Architecture
1.3.3 Unsupervised Pretraining
1.3.4 Supervised Fine-tuning
1.3.5 Reinforcement Learning from Human Feedback
1.4 Technology Stack of Large Language Models
1.5 Impact of Large Language Models
1.6 Barriers to Replication of Large Language Models
1.6.1 Computational Bottlenecks
1.6.2 Data Bottlenecks
1.6.3 Engineering Bottlenecks
1.7 Limitations and Improvement Directions for Large Language Models
1.8 Summary
Chapter 2 In-Depth Understanding of Transformer Architecture
2.1 Introduction to Transformer Architecture
2.2 Self-attention Mechanism
2.2.1 Calculation Process of Self-attention
2.2.2 Essence of Self-attention Mechanism
2.2.3 Advantages and Limitations of Self-attention
2.3 Multi-head Attention
2.3.1 Implementation of Multi-head Attention
2.3.2 Role of Multi-head Attention
2.3.3 Optimization of Multi-head Attention
2.4 Feed-forward Neural Networks
2.5 Residual Connections
2.6 Layer Normalization
2.7 Position Encoding
2.7.1 Design and Implementation of Position Encoding
2.7.2 Variants of Position Encoding
2.7.3 Advantages and Limitations of Position Encoding
2.8 Training and Optimization
2.8.1 Loss Functions
2.8.2 Optimizers
2.8.3 Learning Rate Adjustment Strategies
2.8.4 Regularization
2.8.5 Other Training and Optimization Techniques
2.9 Summary
Chapter 3 Generative Pretraining
3.1 Introduction to Generative Pretraining
3.2 GPT's Transformer Architecture
3.3 Process of Generative Pretraining
3.3.1 Objectives of Generative Pretraining
3.3.2 Error Backpropagation in Generative Pretraining
3.4 Supervised Fine-tuning
3.4.1 Principles of Supervised Fine-tuning
3.4.2 Specific Tasks for Supervised Fine-tuning
3.4.3 Steps of Fine-tuning for Specific Tasks
3.5 Summary
Chapter 4 Unsupervised Multi-task and Zero-shot Learning
4.1 Encoder and Decoder in GPT-2
4.2 Transformer Architecture of GPT-2
4.2.1 Layer Normalization
4.2.2 Orthogonal Initialization
4.2.3 Reversible Tokenization Methods
4.2.4 Learnable Relative Position Encoding
4.3 Unsupervised Multi-task Learning
4.4 Relationship between Multi-task and Zero-shot Learning
4.5 Autoregressive Generation Process in GPT-2
4.5.1 Token Embedding Matrices
4.5.2 Autoregressive Process
4.6 Summary
Chapter 5 Sparse Attention and Content-based Learning in GPT-3
5.1 Architecture of GPT-3
5.2 Sparse Attention Mechanism
5.2.1 Characteristics of Sparse Transformer
5.2.2 Local Band Attention
5.2.3 Cross-layer Sparse Connections
5.3 Meta-learning and Content-based Learning
5.3.1 Meta-learning
5.3.2 Content-based Learning
5.4 Bayesian Inference of Concept Distribution
5.4.1 Implicit Fine-tuning
5.4.2 Bayesian Reasoning
5.5 Reasoning Capability of Thought Chains
5.6 Summary
Chapter 6 Pretraining Strategies for Large Language Models
6.1 Pretraining Datasets
6.2 Data Processing for Pretraining
6.3 Distributed Training Modes
6.3.1 Data Parallelism
6.3.2 Model Parallelism
6.4 Distributed Training Architectures
6.4.1 Pathways
6.4.2 Megatron-LM
6.4.3 ZeRO
6.5 Examples of Training Strategies
6.5.1 Training Frameworks
6.5.2 Parameter Stability
6.5.3 Adjustments in Training Settings
6.5.4 BF16 Optimization
6.5.5 Other Factors
6.6 Summary
Chapter 7 Proximal Policy Optimization Algorithms
7.1 Policy Gradient Methods
7.1.1 Basic Principles of Policy Gradient Methods
7.1.2 Importance Sampling
7.1.3 Introduction of Advantage Functions
7.2 Actor-critic Algorithms
7.2.1 Basic Steps of the Algorithm
7.2.2 Value Functions and Policy Updates
7.2.3 Challenges and Problems in Actor-critic Algorithms
7.3 Trust Region Policy Optimization
7.3.1 Policy Optimization Objectives in TRPO
7.3.2 Limitations of TRPO Algorithm
7.4 Principles of PPO Algorithm
7.4.1 Objective Functions
7.4.2 Loss and Reward Functions
7.5 Summary
Chapter 8 Human Feedback Reinforcement Learning
8.1 Role of Reinforcement Learning in ChatGPT Iterations
8.2 Training Datasets for InstructGPT/ChatGPT
8.2.1 Sources of Fine-tuning Dataset
8.2.2 Annotation Standards
8.2.3 Data Analysis
8.3 Stages of Human Feedback Reinforcement Learning
8.3.1 Supervised Fine-tuning Stage
8.3.2 Reward Modeling Stage
8.3.3 Reinforcement Learning Stage
8.4 Reward Modeling Algorithms
8.4.1 Algorithm Concepts
8.4.2 Loss Functions
8.5 Application of PPO in InstructGPT
8.6 Multi-turn Dialogue Capabilities
8.7 Necessity of Human Feedback Reinforcement Learning
8.8 Summary
Chapter 9 Low-Compute Domain Transfer for Large Language Models
9.1 Bootstrapping Instructions
9.2 AI Feedback in Low-compute Environments
9.3 Low-Rank Adaptation
9.3.1 Training and Deployment of Low-rank Adaptation Models
9.3.2 Choice of Rank in Low-rank Adaptation
9.4 Quantization - Reducing Compute Requirements for Deployment
9.5 SparseGPT Pruning Algorithm
9.6 Low-compute Transfer Cases of Open-source Large Language Models
9.6.1 Baseline Models
9.6.2 Bootstrapped Instruction Fine-tuning of the Llama Series
9.6.3 Chinese Solutions
9.6.4 Medical Domain Transfer Cases
9.6.5 Legal Domain Transfer Cases
9.7 Summary
Chapter 10 Middleware Programming
10.1 Filling the Gap - LangChain Comes at the Right Time
10.2 Multimodal Fusion Middleware
10.2.1 Task Planning
10.2.2 Model Selection
10.2.3 Task Execution
10.2.4 Response Generation
10.3 AutoGPT Autonomous Agents and Task Planning
10.4 Competitive Middleware Frameworks
10.5 Summary
Chapter 11 The Future Path of Large Language Models
11.1 Path to Strong AI
11.2 Data Resource Depletion
11.3 Limitations of Autoregressive Models
11.4 Embodied Intelligence
11.4.1 Challenges of Embodied Intelligence
11.4.2 PaLM-E
11.4.3 ChatGPT for Robotics
11.5 Summary
- No. of pages: 300
- Language: English
- Edition: 1
- Published: June 1, 2025
- Imprint: Elsevier
- Paperback ISBN: 9780443274367
- eBook ISBN: 9780443274374
Cheng Ge
Dr. Cheng Ge is the Deputy Director of the Technology Transfer Center at Xiangtan University and the Vice Dean of the JD Intelligent City and Big Data Research Institute in Xiangtan. He is a professor at the School of Computer Science and Cyberspace at Xiangtan University and a committee member of the CCF Law and Computing Society. His primary research interests include knowledge representation learning and content security. An experienced entrepreneur, Dr. Ge has founded several software companies and provided AI industry insights as a technology consultant.
Affiliations and expertise
Xiangtan University, Xiangtan, China