Data Mining Applications with R is a great resource for researchers and professionals to understand the wide use of R, a free software environment for statistical computing and graphics, in solving different problems in industry. R is widely used in leveraging data mining techniques across many different industries, including government, finance, insurance, medicine, scientific research and more. This book presents 15 different real-world case studies illustrating various techniques in rapidly growing areas. It is an ideal companion for data mining researchers in academia and industry looking for ways to turn this versatile software into a powerful analytic tool.
R code, data, and color figures for the book are provided at the RDataMining.com website.
Researchers in academia and industry working in the field of data mining, postgraduate students interested in data mining, and data miners and analysts from industry, including government agencies, banks, insurance, retail, telecom, medicine, and scientific research.
Preface
Background
Objectives and Significance
Target Audience
Acknowledgments
Review Committee
Additional Reviewers
Foreword
References
Chapter 1. Power Grid Data Analysis with R and Hadoop
Abstract
1.1 Introduction
1.2 A Brief Overview of the Power Grid
1.3 Introduction to MapReduce, Hadoop, and RHIPE
1.4 Power Grid Analytical Approach
1.5 Discussion and Conclusions
Appendix
References
Chapter 2. Picturing Bayesian Classifiers: A Visual Data Mining Approach to Parameters Optimization
Abstract
Acknowledgments
2.1 Introduction
2.2 Related Works
2.3 Motivations and Requirements
2.4 Probabilistic Framework of NB Classifiers
2.5 Two-Dimensional Visualization System
2.6 A Case Study: Text Classification
2.7 Conclusions
References
Chapter 3. Discovery of Emergent Issues and Controversies in Anthropology Using Text Mining, Topic Modeling, and Social Network Analysis of Microblog Content
Abstract
3.1 Introduction
3.2 How Many Messages and How Many Twitter-Users in the Sample?
3.3 Who Is Writing All These Twitter Messages?
3.4 Who Are the Influential Twitter-Users in This Sample?
3.5 What Is the Community Structure of These Twitter-Users?
3.6 What Were Twitter-Users Writing About During the Meeting?
3.7 What Do the Twitter Messages Reveal About the Opinions of Their Authors?
3.8 What Can Be Discovered in the Less Frequently Used Words in the Sample?
3.9 What Are the Topics That Can Be Algorithmically Discovered in This Sample?
3.10 Conclusion
References
Chapter 4. Text Mining and Network Analysis of Digital Libraries in R
Abstract
4.1 Introduction
4.2 Dataset Preparation
4.3 Manipulating the Document-Term Matrix
4.4 Clustering Content by Topics Using the LDA
4.5 Using Similarity Between Documents to Explore Document Cohesion
4.6 Social Network Analysis of Authors
4.7 Conclusion
References
Chapter 5. Recommender Systems in R
Abstract
5.1 Introduction
5.2 Business Case
5.3 Evaluation
5.4 Collaborative Filtering Methods
5.5 Latent Factor Collaborative Filtering
5.6 Simplified Approach
5.7 Roll Your Own
5.8 Final Thoughts
References
Chapter 6. Response Modeling in Direct Marketing: A Data Mining-Based Approach for Target Selection
Abstract
6.1 Introduction/Background
6.2 Business Problem
6.3 Proposed Response Model
6.4 Modeling Detail
6.5 Prediction Result
6.6 Model Evaluation
6.7 Conclusion
References
Chapter 7. Caravan Insurance Customer Profile Modeling with R
Abstract
7.1 Introduction
7.2 Data Description and Initial Exploratory Data Analysis
7.3 Classifier Models of Caravan Insurance Holders
7.4 Discussion of Results and Conclusion
Appendix A Details of the Full Data Set Variables
Appendix B Customer Profile Data-Frequency of Binary Values
Appendix C Proportion of Caravan Insurance Holders vis-à-vis other Customer Profile Variables
Appendix D LR Model Details
Appendix E R Commands for Computation of ROC Curves for Each Model Using Validation Dataset
Appendix F Commands for Cross-Validation Analysis of Classifier Models
References
Chapter 8. Selecting Best Features for Predicting Bank Loan Default
Abstract
8.1 Introduction
8.2 Business Problem
8.3 Data Extraction
8.4 Data Exploration and Preparation
8.5 Missing Imputation
8.6 Modeling
8.7 Model Evaluation
8.8 Finding and Model Deployment
8.9 Lessons and Discussions
Appendix Selecting Best Features for Predicting Bank Loan Default
References
Chapter 9. A Choquet Integral Toolbox and Its Application in Customer Preference Analysis
Abstract
9.1 Introduction
9.2 Background
9.3 Rfmtool Package
9.4 Case Study
9.5 Conclusions
References
Chapter 10. A Real-Time Property Value Index Based on Web Data
Abstract
Acknowledgments
10.1 Introduction
10.2 Housing Prices and Indices
10.3 A Data Mining Approach
10.4 Real Estate Pricing Models
10.5 Conclusion
References
Chapter 11. Predicting Seabed Hardness Using Random Forest in R
Abstract
Acknowledgments
11.1 Introduction
11.2 Study Region and Data Processing
11.3 Dataset Manipulation and Exploratory Analyses
11.4 Application of RF for Predicting Seabed Hardness
11.5 Model Validation Using rfcv
11.6 Optimal Predictive Model
11.7 Application of the Optimal Predictive Model
11.8 Discussion and Conclusions
Appendix A Dataset of Seabed Hardness and 15 Predictors
Appendix B R Function, rf.cv, Shows the Cross-Validated Prediction Performance of a Predictive Model
References
Chapter 12. Supervised Classification of Images, Applied to Plankton Samples Using R and Zooimage
Abstract
Acknowledgments
12.1 Background
12.2 Challenges
12.3 Data Extraction and Exploration
12.4 Data Preprocessing
12.5 Modeling
12.6 Model Evaluation
12.7 Model Deployment
12.8 Lessons, Discussion, and Conclusions
References
Chapter 13. Crime Analyses Using R
Abstract
13.1 Introduction
13.2 Problem Definition
13.3 Data Extraction
13.4 Data Exploration and Preprocessing
13.5 Visualizations
13.6 Modeling
13.7 Model Evaluation
13.8 Discussions and Improvements
References
Chapter 14. Football Mining with R
Abstract
Acknowledgments
14.1 Introduction to the Case Study and Organization of the Analysis
14.2 Background of the Analysis: The Italian Football Championship
14.3 Data Extraction and Exploration
14.4 Data Preprocessing
14.5 Model Development: Building Classifiers
14.6 Model Deployment
14.7 Concluding Remarks
References
Chapter 15. Analyzing Internet DNS(SEC) Traffic with R for Resolving Platform Optimization
Abstract
15.1 Introduction
15.2 Data Extraction from PCAP to CSV File
15.3 Data Importation from CSV File to R
15.4 Dimension Reduction Via PCA
15.5 Initial Data Exploration Via Graphs
15.6 Variables Scaling and Samples Selection
15.7 Clustering for Segmenting the FQDN
15.8 Building Routing Table Thanks to Clustering
15.9 Building Routing Table Thanks to Mixed Integer Linear Programming
15.10 Building Routing Table Via a Heuristic
15.11 Final Evaluation
15.12 Conclusion
References
Index
YZ
Before joining the public sector, he was an Australian Postdoctoral Fellow (Industry) in the Faculty of Engineering & Information Technology at the University of Technology, Sydney, Australia. His research interests include clustering, association rules, time series, outlier detection, and data mining applications, and he has published over forty papers in journals and conference proceedings. He is a member of the IEEE and of the Institute of Analytics Professionals of Australia, and has served as a program committee member for more than thirty international conferences.