
Data Compression for Data Mining Algorithms
- 1st Edition - May 1, 2026
- Latest edition
- Author: Xiaochun Wang
- Language: English
- Paperback ISBN: 978-0-443-40541-9
- eBook ISBN: 978-0-443-40542-6

Data Compression for Data Mining Algorithms addresses the design of more efficient data mining algorithms through data compression techniques, and provides the first systematic and comprehensive description of the relationships between data compression mechanisms and the computations involved in data mining algorithms. Data mining algorithms are powerful analytical techniques used across disciplines including business, engineering, and science. In the big data era, however, tasks such as association rule mining and classification often require multiple scans of a database, while clustering and outlier detection methods typically rely on Euclidean distance as a similarity measure, leading to high computational costs. The book addresses these challenges by focusing on the scalarization of data mining algorithms, using data compression techniques to reduce dataset sizes and applying information theory principles to minimize the computations involved in tasks such as feature selection and similarity computation. It covers the latest developments in both lossless and lossy data compression and examines, from multiple points of view, how these methods inform data mining algorithm design. Key discussions include Huffman coding, scalar and vector quantization, transforms, subbands, wavelet-based compression for scalable algorithms, and the role of neural networks, particularly deep learning, in feature selection and dimensionality reduction. The contents balance theoretical analysis with real-world applications, and the chapters are organized to give a solid overview of data compression techniques for data mining. To give the reader a more complete understanding of the material, projects and problems solved with Python are interspersed throughout the text.
- Covers popular data compression methods and their solutions to aid in the development and application of data mining algorithms.
- Includes projects and problems solved with Python to help readers create programs for both data compression and data mining problems.
- Focuses on the scalarization of data mining algorithms, leveraging data compression techniques to reduce dataset sizes and applying information theory principles to minimize computations.
- Simplifies the content of the field of data compression by covering topics that are widely useful from a data mining perspective.
Computer science, data science, and data analysis researchers in academia and industry. The primary audience also includes researchers and professionals in mathematics, AI, ML, and deep learning, as well as those who want to enhance their skills in data mining and analysis.
Part I: Foundation
1. Overview and Contributions
1.1 Overview
1.2 Introduction
1.3 Developments in Data Compression Techniques for Data Mining Algorithm Design
1.4 Overview of the Book
1.5 Contributions
1.6 Conclusions
2. Introduction to Data Mining Algorithms
2.1 Introduction
2.2 Association Rule Mining
2.2.1 Frequent Itemsets
2.2.2 Association Rules
2.3 Classification
2.3.1 Decision Tree
2.3.2 Support Vector Machine
2.4 Clustering
2.4.1 k-Means Algorithm
2.4.2 Single-Link Algorithm
2.4.3 DBSCAN Algorithm
2.4.4 Minimum Spanning Tree Algorithm
2.5 Outlier Detection
2.5.1 Probability Based Algorithm
2.5.2 Proximity Based Algorithm
2.5.3 Classification Based Algorithm
2.5.4 Clustering Based Algorithm
2.6 Mining Large Datasets
2.6.1 Overview
2.6.2 Issues and Challenges
2.7 Summary
2.8 Bibliographies
3. Introduction to Data Compression Methods
3.1 Feature Extraction and Data Representation
3.2 Lossless Data Compression Methods
3.2.1 Huffman Coding
3.2.2 Arithmetic Coding
3.2.3 Run-length Coding
3.3 Lossy Data Compression Methods
3.3.1 Quantization
3.3.2 Dictionary Techniques
3.3.3 Differential Encoding
3.3.4 Transform Coding, Subband Coding, and Wavelets
3.4 Data Compression for Data Preprocessing
3.4.1 Data Reduction and Transformation
3.4.2 Sampling
3.4.3 Dimensionality Reduction
3.5 Summary
3.6 Bibliographic Notes
Part II: Association Rule Mining
4. Huffman Coding for Association Rule Mining
4.1 Introduction
4.2 Frequent Itemset and Association Rule Mining
4.3 The Apriori Algorithm
4.4 The FP-tree Algorithm
4.5 The Proposed Huffman Coding for Frequent Itemset Mining
4.6 Experiments and Results
4.7 Conclusions
4.8 References
5. Arithmetic Coding for Maximal Frequent Itemsets Mining
5.1 Introduction
5.2 Maximal Frequent Itemsets Mining
5.3 Arithmetic Coding
5.4 The Proposed Arithmetic Coding for Maximal Frequent Itemset Mining
5.5 Experiments and Results
5.6 Conclusions
5.7 References
Part III: Classification
6. Feature Subset Selection for Decision Tree Construction
6.1 Introduction
6.2 Decision Tree for Classification
6.3 Feature Subset Selection
6.4 The Proposed Feature Subset Selection for Decision Tree Construction
6.5 Experiments and Results
6.6 Conclusions
6.7 References
7. Neural Networks for Decision Tree Construction
7.1 Introduction
7.2 Neural Networks
7.3 Deep Neural Networks
7.4 The Proposed NN-Based Feature Subset Selection for Decision Tree Construction
7.5 Experiments and Results
7.6 Conclusions
7.7 References
8. Principal Component Analysis for Decision Tree Construction
8.1 Introduction
8.2 Principal Component Analysis
8.3 The Proposed PCA-Based Decision Tree Construction
8.4 Experiments and Results
8.5 Conclusions
8.6 References
9. Dictionary Techniques for Support Vector Machine
9.1 Introduction
9.2 Support Vector Machine for Classification
9.3 Dictionary Techniques
9.4 The Proposed Dictionary Techniques for Support Vector Machine
9.5 Experiments and Results
9.6 Conclusions
9.7 References
10. Quantization for Support Vector Machine
10.1 Introduction
10.2 Scalar Quantization
10.3 Vector Quantization
10.4 The Proposed Quantization Method for Support Vector Machine
10.5 Experiments and Results
10.6 Conclusions
10.7 References
Part IV: Clustering and Outlier Detection
11. A Sparse Data Representation for Clustering
11.1 Introduction
11.2 Background
11.3 The Proposed Data Compression Method
11.4 Experiments and Results
11.5 Conclusions
11.6 References
12. Dictionary Coding Based Compression for Clustering
12.1 Introduction
12.2 Background
12.3 The Proposed Dictionary Coding Method for Efficient Clustering
12.4 Experiments and Results
12.5 Conclusions
12.6 References
13. Nearest Neighbor Based Compression for Outlier Detection
13.1 Introduction
13.2 Background
13.3 The Proposed Data Compression Method for Efficient Outlier Detection
13.4 Experiments and Results
13.5 Conclusions
13.6 References
14. Huffman Coding for Outlier Detection
14.1 Introduction
14.2 Background
14.3 The Proposed Multi-dimensional Data Compression by Huffman Coding
14.4 Experiments and Results
14.5 Conclusions
14.6 References
15. Arithmetic Coding for Outlier Detection
15.1 Introduction
15.2 Background
15.3 The Proposed Multi-dimensional Data Compression by Arithmetic Coding
15.4 Experiments and Results
15.5 Conclusions
15.6 References
Xiaochun Wang
Dr. Xiaochun Wang received her BS degree from Beijing University and her MS degree in data compression and PhD degree in mobile robotics from the Department of Electrical Engineering and Computer Science at Vanderbilt University. She was an associate professor at the School of Software Engineering at Xi’an Jiaotong University and taught Database Management and Data Mining courses from 2010 to 2021. She currently works as a senior scientist at Xi’an Tuowei Hi-Tech Corporation. Her research interests include data mining, pattern recognition, signal processing, and computer vision.
Affiliations and expertise
Xi’an Tuowei-High-Tech Corporation, Xi'an, China