R and Data Mining introduces researchers, post-graduate students, and analysts to data mining using R, a free software environment for statistical computing and graphics. The bo…
Dedication
List of Figures
List of Abbreviations
Chapter 1. Introduction
1.1 Data Mining
1.2 R
1.3 Datasets
References
Chapter 2. Data Import and Export
2.1 Save and Load R Data
2.2 Import from and Export to .CSV Files
2.3 Import Data from SAS
2.4 Import/Export via ODBC
References
Chapter 3. Data Exploration
3.1 Have a Look at Data
3.2 Explore Individual Variables
3.3 Explore Multiple Variables
3.4 More Explorations
3.5 Save Charts into Files
References
Chapter 4. Decision Trees and Random Forest
4.1 Decision Trees with Package party
4.2 Decision Trees with Package rpart
4.3 Random Forest
References
Chapter 5. Regression
5.1 Linear Regression
5.2 Logistic Regression
5.3 Generalized Linear Regression
5.4 Non-Linear Regression
Chapter 6. Clustering
6.1 The k-Means Clustering
6.2 The k-Medoids Clustering
6.3 Hierarchical Clustering
6.4 Density-Based Clustering
References
Chapter 7. Outlier Detection
7.1 Univariate Outlier Detection
7.2 Outlier Detection with LOF
7.3 Outlier Detection by Clustering
7.4 Outlier Detection from Time Series
7.5 Discussions
References
Chapter 8. Time Series Analysis and Mining
8.1 Time Series Data in R
8.2 Time Series Decomposition
8.3 Time Series Forecasting
8.4 Time Series Clustering
8.5 Time Series Classification
8.6 Discussions
8.7 Further Readings
References
Chapter 9. Association Rules
9.1 Basics of Association Rules
9.2 The Titanic Dataset
9.3 Association Rule Mining
9.4 Removing Redundancy
9.5 Interpreting Rules
9.6 Visualizing Association Rules
9.7 Discussions and Further Readings
References
Chapter 10. Text Mining
10.1 Retrieving Text from Twitter
10.2 Transforming Text
10.3 Stemming Words
10.4 Building a Term-Document Matrix
10.5 Frequent Terms and Associations
10.6 Word Cloud
10.7 Clustering Words
10.8 Clustering Tweets
10.9 Packages, Further Readings, and Discussions
References
Chapter 11. Social Network Analysis
11.1 Network of Terms
11.2 Network of Tweets
11.3 Two-Mode Network
11.4 Discussions and Further Readings
References
Chapter 12. Case Study I: Analysis and Forecasting of House Price Indices
12.1 Importing HPI Data
12.2 Exploration of HPI Data
12.3 Trend and Seasonal Components of HPI
12.4 HPI Forecasting
12.5 The Estimated Price of a Property
12.6 Discussion
Chapter 13. Case Study II: Customer Response Prediction and Profit Optimization
13.1 Introduction
13.2 The Data of KDD Cup 1998
13.3 Data Exploration
13.4 Training Decision Trees
13.5 Model Evaluation
13.6 Selecting the Best Tree
13.7 Scoring
13.8 Discussions and Conclusions
Reference
Chapter 14. Case Study III: Predictive Modeling of Big Data with Limited Memory
14.1 Introduction
14.2 Methodology
14.3 Data and Variables
14.4 Random Forest
14.5 Memory Issue
14.6 Train Models on Sample Data
14.7 Build Models with Selected Variables
14.8 Scoring
14.9 Print Rules
14.10 Conclusions and Discussion
Chapter 15. Online Resources
15.1 R Reference Cards
15.2 R
15.3 Data Mining
15.4 Data Mining with R
15.5 Classification/Prediction with R
15.6 Time Series Analysis with R
15.7 Association Rule Mining with R
15.8 Spatial Data Analysis with R
15.9 Text Mining with R
15.10 Social Network Analysis with R
15.11 Data Cleansing and Transformation with R
15.12 Big Data and Parallel Computing with R
R Reference Card for Data Mining
Bibliography
General Index
Package Index
Function Index
YZ
Before joining the public sector, he was an Australian Postdoctoral Fellow (Industry) in the Faculty of Engineering & Information Technology at the University of Technology, Sydney, Australia. His research interests include clustering, association rules, time series, outlier detection, and data mining applications, and he has published over forty papers in journals and conference proceedings. He is a member of the IEEE and of the Institute of Analytics Professionals of Australia, and has served as a program committee member for more than thirty international conferences.