Data Compression for Data Mining Algorithms

1st Edition - May 1, 2026
Latest edition
Author: Xiaochun Wang
Language: English

Data Compression for Data Mining Algorithms addresses the design of more efficient data mining algorithms through data compression techniques, and provides the first systematic and comprehensive description of the relationships between data compression mechanisms and the computations involved in data mining algorithms. Data mining algorithms are powerful analytical techniques used across disciplines including business, engineering, and science. In the big data era, however, tasks such as association rule mining and classification often require multiple scans of a database, while clustering and outlier detection methods typically rely on Euclidean distance as the similarity measure; both lead to high computational costs.
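To make the "multiple scans" cost concrete, here is a minimal Python sketch of a level-wise (Apriori-style) frequent itemset search that counts how many passes it makes over the transactions. It is an illustration only, not code from the book; the function name `frequent_itemsets` and the toy data are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search. Returns (itemset -> support count, number of scans)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the database for each candidate size k.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates and prune any
        # candidate that has an infrequent k-item subset.
        k += 1
        next_candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return frequent, scans


# Hypothetical toy database of five transactions.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
itemsets, scans = frequent_itemsets(toy, min_support=3)
```

On this toy input the search passes over the data once per itemset size it tries, which is the cost that grows painful when the database is large.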

Data Compression for Data Mining Algorithms addresses these challenges by focusing on the scalability of data mining algorithms: it uses data compression techniques to reduce dataset sizes and applies information theory principles to minimize the computations involved in tasks such as feature selection and similarity computation. The book covers the latest developments in both lossless and lossy data compression methods and presents data compression methods for data mining algorithm design from multiple points of view.
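As a rough illustration of how lossy compression can shrink a dataset before mining, the following Python sketch applies a uniform scalar quantizer to one numeric attribute. It is a generic textbook quantizer under stated assumptions, not the method proposed in the book; the names `uniform_quantize` and `dequantize` and the sample values are hypothetical.

```python
def uniform_quantize(values, levels=16):
    """Map each value to an integer code in [0, levels - 1]."""
    lo, hi = min(values), max(values)
    # Fall back to a step of 1.0 when all values are identical.
    step = (hi - lo) / levels or 1.0
    # The maximum value would land in bin `levels`, so clamp it to the last bin.
    codes = [min(int((v - lo) / step), levels - 1) for v in values]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct each value as the midpoint of its bin."""
    return [lo + (c + 0.5) * step for c in codes]


# Hypothetical attribute values.
values = [0.0, 0.1, 0.5, 0.9, 1.0]
codes, lo, step = uniform_quantize(values, levels=4)
approx = dequantize(codes, lo, step)
max_error = max(abs(v - a) for v, a in zip(values, approx))
```

Each value is replaced by a small integer code, and the reconstruction error is at most half the bin width. Fewer levels means a smaller representation and a larger error, which is the trade-off any lossy scheme used ahead of a distance computation has to manage.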

Key topics include Huffman coding, scalar and vector quantization, transform and subband coding, wavelet-based compression for scalable algorithms, and the role of neural networks, particularly deep learning, in feature selection and dimensionality reduction. The book balances theoretical analysis with real-world applications, and its chapters are organized to give a solid overview of data compression techniques for data mining. To give the reader a more complete understanding of the material, projects and problems solved in Python are interspersed throughout the text.
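Huffman coding is the first technique named above. For readers unfamiliar with it, here is a minimal Python sketch of the standard algorithm, built on the standard library's `heapq` and `collections.Counter`. It is not taken from the book; the function name `huffman_codes` and the sample string are assumptions.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return a prefix-free binary code (symbol -> bit string)."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, partial code table). The unique
    # tie_breaker keeps Python from ever comparing the dictionaries.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical input: 'a' is most frequent, so it gets the shortest code.
text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
```

For this input the frequent symbol `a` receives a 1-bit code and the rarer `b` and `c` receive 2-bit codes, so the seven symbols encode to 10 bits.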

Part I: Foundation

1. Overview and Contributions

1.1 Overview

1.2 Introduction

1.3 Developments in Data Compression Techniques for Data Mining Algorithm Design

1.4 Overview of the Book

1.5 Contributions

1.6 Conclusions

2. Introduction to Data Mining Algorithms

2.1 Introduction

2.2 Association Rule Mining

2.2.1 Frequent Itemsets

2.2.2 Association Rules

2.3 Classification

2.3.1 Decision Tree

2.3.2 Support Vector Machine

2.4 Clustering

2.4.1 k-Means Algorithm

2.4.2 Single-Link Algorithm

2.4.3 DBSCAN Algorithm

2.4.4 Minimum Spanning Tree Algorithm

2.5 Outlier Detection

2.5.1 Probability Based Algorithm

2.5.2 Proximity Based Algorithm

2.5.3 Classification Based Algorithm

2.5.4 Clustering Based Algorithm

2.6 Mining Large Datasets

2.6.1 Overview

2.6.2 Issues and Challenges

2.7 Summary

2.8 Bibliographies

3. Introduction to Data Compression Methods

3.1 Feature Extraction and Data Representation

3.2 Lossless Data Compression Methods

3.2.1 Huffman Coding

3.2.2 Arithmetic Coding

3.2.3 Run-length Coding

3.3 Lossy Data Compression Methods

3.3.1 Quantization

3.3.2 Dictionary Techniques

3.3.3 Differential Encoding

3.3.4 Transform Coding, Subband Coding, and Wavelets

3.4 Data Compression for Data Preprocessing

3.4.1 Data Reduction and Transformation

3.4.2 Sampling

3.4.3 Dimensionality Reduction

3.5 Summary

3.6 Bibliographic Notes

Part II: Association Rule Mining

4. Huffman Coding for Association Rule Mining

4.1 Introduction

4.2 Frequent Itemset and Association Rule Mining

4.3 The Apriori Algorithm

4.4 The FP-tree Algorithm

4.5 The Proposed Huffman Coding for Frequent Itemset Mining

4.6 Experiments and Results

4.7 Conclusions

4.8 References

5. Arithmetic Coding for Maximal Frequent Itemsets Mining

5.1 Introduction

5.2 Maximal Frequent Itemsets Mining

5.3 Arithmetic Coding

5.4 The Proposed Arithmetic Coding for Maximal Frequent Itemset Mining

5.5 Experiments and Results

5.6 Conclusions

5.7 References

Part III: Classification

6. Feature Subset Selection for Decision Tree Construction

6.1 Introduction

6.2 Decision Tree for Classification

6.3 Feature Subset Selection

6.4 The Proposed Feature Subset Selection for Decision Tree Construction

6.5 Experiments and Results

6.6 Conclusions

6.7 References

7. Neural Networks for Decision Tree Construction

7.1 Introduction

7.2 Neural Networks

7.3 Deep Neural Networks

7.4 The Proposed NN-Based Feature Subset Selection for Decision Tree Construction

7.5 Experiments and Results

7.6 Conclusions

7.7 References

8. Principal Component Analysis for Decision Tree Construction

8.1 Introduction

8.2 Principal Component Analysis

8.3 The Proposed PCA-Based Decision Tree Construction

8.4 Experiments and Results

8.5 Conclusions

8.6 References

9. Dictionary Techniques for Support Vector Machine

9.1 Introduction

9.2 Support Vector Machine for Classification

9.3 Dictionary Techniques

9.4 The Proposed Dictionary Techniques for Support Vector Machine

9.5 Experiments and Results

9.6 Conclusions

9.7 References

10. Quantization for Support Vector Machine

10.1 Introduction

10.2 Scalar Quantization

10.3 Vector Quantization

10.4 The Proposed Quantization Method for Support Vector Machine

10.5 Experiments and Results

10.6 Conclusions

10.7 References

Part IV: Clustering and Outlier Detection

11. A Sparse Data Representation for Clustering

11.1 Introduction

11.2 Background

11.3 The Proposed Data Compression Method

11.4 Experiments and Results

11.5 Conclusions

11.6 References

12. Dictionary Coding Based Compression for Clustering

12.1 Introduction

12.2 Background

12.3 The Proposed Dictionary Coding Method for Efficient Clustering

12.4 Experiments and Results

12.5 Conclusions

12.6 References

13. Nearest Neighbor Based Compression for Outlier Detection

13.1 Introduction

13.2 Background

13.3 The Proposed Data Compression Method for Efficient Outlier Detection

13.4 Experiments and Results

13.5 Conclusions

13.6 References

14. Huffman Coding for Outlier Detection

14.1 Introduction

14.2 Background

14.3 The Proposed Multi-dimensional Data Compression by Huffman Coding

14.4 Experiments and Results

14.5 Conclusions

14.6 References

15. Arithmetic Coding for Outlier Detection

15.1 Introduction

15.2 Background

15.3 The Proposed Multi-dimensional Data Compression by Arithmetic Coding

15.4 Experiments and Results

15.5 Conclusions

15.6 References
