Handbook of Statistical Analysis
AI and ML Applications
- 3rd Edition - September 16, 2024
- Authors: Robert Nisbet, Gary D. Miner, Keith McCormick
- Language: English
- Hardback ISBN: 978-0-443-13273-5
- Paperback ISBN: 978-0-443-15845-2
- eBook ISBN: 978-0-443-15846-9
Handbook of Statistical Analysis: AI and ML Applications, third edition, is a comprehensive introduction to all stages of data analysis, data preparation, model building, and model evaluation. This valuable resource is useful to students and professionals across a variety of fields and settings: business analysts, scientists, engineers, and researchers in academia and industry. General descriptions of algorithms together with case studies help readers understand technical and business problems, weigh the strengths and weaknesses of modern data analysis algorithms, and employ the right analytical methods for practical application.
This resource is an ideal guide for users who want to address massive and complex datasets with many standard analytical approaches and be able to evaluate analyses and solutions objectively. It includes clear, intuitive explanations of the principles and tools for solving problems using modern analytic techniques; offers accessible tutorials; and discusses their application to real-world problems.
- Brings together, in a single resource, all the information a beginner needs to understand the tools and issues in data analytics to build successful predictive analytic solutions
- Provides in-depth descriptions and directions for performing many data preparation operations necessary to generate data sets in the proper form and format for submission to modeling algorithms
- Features clear, intuitive explanations of standard analytical tools and techniques and their practical applications
- Provides a number of case studies to guide practitioners in the design of analytical applications to solve real-world problems in their data domain
- Offers valuable tutorials on the book webpage with step-by-step instructions on how to use suggested tools to build models
- Provides predictive insights into the rapidly expanding “Intelligence Age” as it takes over from the “Information Age,” enabling readers to easily transition the book’s content into the tools of the future
- Cover image
- Title page
- Table of Contents
- Copyright
- Foreword 1 for first edition
- Foreword 2 for first edition
- Foreword—to this third edition
- Preface
- Overall organization of this book
- Acknowledgments
- Praise for the first edition of this book
- Biographies of the primary authors of this book
- Part I: Introduction
- 1. The background and history of predictive analytics
- Abstract
- Preamble
- Phases in the history of statistics and predictive analytics
- Current trends of development in predictive analytics
- Postscript
- References
- 2. Theoretical considerations for data analytics
- Abstract
- Preamble
- The scientific method
- What is the process of data analytics?
- A theoretical framework for the data analytics process
- Strengths of the data analytics process
- Customer centric versus account centric: a new way to look at your data
- The data paradigm shift
- Creation of the CAR
- Major activities of predictive analytics
- Major challenges of data analytics and predictive analytics
- General examples of predictive analytics applications
- Major issues in data analytics
- General requirements for success in a data analytics project
- Example of a data analytics project: classify a bat’s species by its sound
- The importance of domain knowledge
- Postscript
- References
- 3. The data analytics process
- Abstract
- Preamble
- Two approaches to understanding and problem solving
- The role of process in data analytics
- Business understanding phase (mostly art)
- Data understanding phase (mostly science)
- Data preparation phase (a mixture of art and science)
- Modeling (a mixture of art and science)
- Deployment (mostly art)
- Closing the information loop* (art)
- The art of data analytics
- Postscript
- References
- 4. Tools
- Abstract
- Prelude
- Accessory tools
- Pure coding languages
- Menu-based procedural platforms
- Multipersona data science and machine learning (DSML) platforms with graphical programming canvasses and AutoML capabilities
- Advice on tool selection and professional development
- Postscript
- Reference
- Part II: Data preparation
- 5. Data access
- Abstract
- Preamble
- Knowledge from data: predictive analytics
- Overall structure of the database – data modeling
- Account-centric format of data in early computer databases
- The customer-centric analytical data mart
- The ETL process
- Issues in data mart design
- Current trends in data access and data structure design
- References
- 6. Data Understanding
- Abstract
- Prelude
- Four Data Understanding tasks in CRISP-DM
- KNIME examples in Data Understanding
- IBM SPSS Modeler examples
- Reference
- 7. Visualizing data for understanding
- Abstract
- Foundations of data visualization: John Tukey
- Importance of the audience of visualizations
- Seeing is believing
- Case Study: visualizations of the Ames Housing data set
- Postscript
- References
- 8. Data cleaning
- Abstract
- Preamble
- Data cleaning operation classification
- Types of data cleaning operations
- Technical correction of data
- String manipulation
- Data filtering
- Time-series filtering
- Outlier handling
- Postscript
- References
- 9. Data conditioning
- Abstract
- Preamble
- Data conditioning operations that generate consistent data sets
- Three types of data conditioning operations are discussed in this chapter
- Postscript
- References
- 10. Data and feature engineering
- Abstract
- Prelude
- Definitions
- Topics of discussion in Chapter 10
- References
- 11. Feature selection
- Abstract
- Preamble
- Modeling and sculpting
- Some introductory concepts
- Feature selection versus dimensionality reduction
- Classification of dimensionality reduction methods
- Decision tree processing elements useful as feature reduction filters
- KNIME examples of feature selection methods
- Correlation filtering (a filter method)
- References
- Additional bibliography for feature selection
- 12. A data preparation cookbook
- Abstract
- Preamble
- Introduction
- 1 CRISP-DM phase I: business understanding
- 2 CRISP-DM phase II: data understanding
- 3 CRISP-DM phase III: data preparation
- 4 CRISP-DM phase IV: modeling
- 18 common mistakes in data preparation in predictive analytics projects
- References
- Part III: Modeling
- 13. A taxonomy of algorithms
- Abstract
- Preamble
- Introduction
- The top five modeling algorithms
- More advanced algorithms
- A brief introduction to the top five algorithms
- Keywords and phrases
- References
- 14. Modeling methods
- Abstract
- Prelude
- Scope of modeling operations
- General modeling issues
- Types of analytical models
- Two approaches to building analytical models
- Two basic types of predictive models
- Keywords and phrases
- References
- 15. Model evaluation and enhancement
- Abstract
- Preamble
- Evaluation and enhancement: part of the modeling process
- Types of errors in analytical models
- Assessment of random error
- Model enhancement techniques
- Model enhancement checklist
- References
- 16. Ensembles and complexity: a practical approach
- Abstract
- Preamble
- Modeling: How to choose the best modeling option?
- Do ensembles violate Occam’s razor?
- The power of ensembles
- Types of ensemble modeling techniques
- General structure of a fused ensemble model
- References
- 17. Deep learning versus traditional machine learning
- Abstract
- Preamble
- Introduction
- Differences between deep learning and traditional machine learning
- The journey to deep learning
- Why is feature engineering so important
- Interactions
- Interaction representation in artificial neural networks
- Postscript
- References
- 18. Interpretable machine learning and explainable AI
- Abstract
- Preamble
- Introduction
- Comparing interpretable machine learning and XAI
- Comparing two types of XAI: global and local explanations
- Techniques for global explanations
- Techniques for local explanations
- LIME
- Regulatory considerations
- The proposed AI Act in the EU
- Postscript
- References
- 19. Human in the loop machine learning and data annotation
- Abstract
- Preamble
- Introduction
- Defining human in the loop
- Concurrent rise of human in the loop and deep learning
- Common human in the loop use cases
- The data annotation ecosystem
- The ethics of data annotation
- Issues associated with managing a data annotation project
- Becoming “data centric”
- Postscript
- References
- Part IV: Applications
- 20. Overview of healthcare delivery and medical informatics
- Abstract
- Prelude
- Introduction
- Chapter 1: The purpose of the book on medical data analytics
- Chapter 2: History of predictive analytics in medicine and health care
- Chapter 3: Bioinformatics
- Chapter 4: Data and process models in medical informatics
- Chapter 5: Access to data for analytics – the initial issue in medical and healthcare predictive analytics
- Chapter 6: Precision (personalized) medicine
- Chapter 7: Patient-directed health care
- Chapter 8: Regulatory measures – agencies and data issues in medicine and health care
- Chapter 9: Predictive analytics with multiomics data
- Chapter 10: Artificial intelligence and genomics
- Chapter 11: Glaucoma (eye disease): a case study with suggested predictive analytics modeling for identifying an individual patient’s best diagnosis and best treatment
- Chapter 12: Practical application example: using data science algorithms in predicting ICU patient urine output in response to diuretics to aid clinicians and healthcare providers in clinical decision making
- Chapter 13: Practical application example: prediction tool development
- Chapter 14: Modeling precancerous colon polyps with OMOP data
- Chapter 15: Prediction of pancreatic and lung cancer from metabolomics data
- Chapter 16: COVID-19 descriptive analytics visualization of pandemic and hospitalization data
- Chapter 17: Disseminated intravascular coagulation predictive analytics with pediatric ICU admissions
- Chapter 18: Challenges for healthcare administration and delivery: integrating predictive and prescriptive modeling into personalized precision health care
- Chapter 19: Challenges of medical research in incorporating modern data analytics in studies
- Chapter 20: The nature of insight from data and implications for automated decisioning
- Chapter 21: Model management and ModelOps: managing an artificial intelligence-driven enterprise
- Chapter 22: Forecasts for advances in predictive and prescriptive analytics
- Chapter 23: Sampling and data analysis
- Chapter 24: Analytics architectures for the 21st century
- Chapter 25: Predictive models versus prescriptive models; causal inference and Bayesian networks
- Chapter 26: The future: 21st-century health care and wellness in the digital age
- Postscript
- References
- 21. Customer response modeling with temporal abstractions
- Abstract
- Preamble
- Early CRM issues in business
- Knowing how customers behaved before they acted
- Transforming corporations into business ecosystems: The path to customer fulfillment
- CRM modeling in business ecosystems
- Modeling with different sets of predictor variables
- Discussion
- Conclusions
- Postscript
- References
- 22. Academic Analytics: a case study in student retention
- Abstract
- Preamble
- Introduction
- Academic Analytics
- Student retention: a case study
- Methodology
- Results
- Key word list
- Discussion
- Postscript
- References
- Key word list
- 23. Medical informatics case study: predicting type 1 precancerous colon polyps
- Abstract
- Prelude
- Introduction
- The University of California, Irvine Colon Polyp Risk Modeling Project
- Modeling
- Results
- Discussion
- Postscript
- References
- 24. Credit risk case study
- Abstract
- Prelude
- Introduction
- Credit scoring
- Credit risk models
- Case study: credit risk score for a loan application
- Postscript
- Reference
- 25. The intelligence revolution
- Abstract
- Preamble
- Introduction
- Large language models (ChatGPT, BARD, etc.)
- TinyML
- HUMANE
- Tiny to Large AI-FAB LABS
- Data and Predictive Analytics as a “Personal AI” or “Company AI” that functions automatically with AI and AGI – the future is only 3–7 years away!!!
- Postscript
- References
- Part V: Right models – luck & ethics of analytics
- 26. The “Right Model” for the “Right Purpose” – when less is good enough
- ABSTRACT
- Preamble
- More is not necessarily better: Lessons from nature and engineering
- References
- 27. Ethics and data analytics
- ABSTRACT
- Preamble
- The birthday party – a practical example for ethical action
- Secular ethics
- Michael Sandel on “Doing the Right Thing” with data analytics
- Alignment of ethical perspectives: The ethical road map
- What is generative artificial intelligence?
- Postscript
- References
- 28. The significance of luck
- ABSTRACT
- Preamble
- Introduction
- The problem of significance in traditional p-value statistical analysis
- The response of the American Statistical Association
- Postscript
- References
- Part VI: Tutorials and case studies
- Tutorial A. Example of data mining recipes using Windows 10 and STATISTICA 13
- Open the data set
- Open the Data Miner Recipes interface
- Opening the data set
- Fixing the text label problem
- Data mining recipes (again)
- Tutorial B. Analysis of Hurricane data (Hurrdata.sta) using the STATISTICA Data Miner 13
- Discussion
- Tutorial C. Case study – Predicting student success at high-stakes nursing examinations (NCLEX) using IBM SPSS Modeler and STATISTICA Data Miner 13
- Introduction
- Decision management in nursing education
- Case study
- Research question
- Literature review
- Dataset and expected strength of predictors
- Data mining with IBM SPSS Modeler
- Improving model accuracy and stability: boosting and bagging
- Data mining with STATISTICA
- Conclusion
- References
- Uncited references for Additional Information
- Tutorial D. Constructing a Histogram using MidWest Company Personality Data with KNIME
- Changing to the KNIME classical interface
- Opening KNIME
- Tutorial E. Feature selection using KNIME
- Why select features?
- Occam’s razor – simple, but not simplistic
- Local minimum error
- Moving out of the local minimum
- Strategies for reduction of dimensionality in predictive analytics available in KNIME
- Tutorial F. Medical/business tutorial using STATISTICA Data Miner 13 determining possible predictors for days with hospice service for patients with dementia
- Tutorial G. Modeling hospice residence stay with KNIME
- Introduction
- Modeling objective
- Data
- Adding an extension to KNIME
- Creating the Tutorial G workflow
- The XGBoost Tree Ensemble modeling algorithm
- Model evaluation
- Discussion of model accuracy results
- Feature importance
- Tutorial H. Data preparation: Merging data sources
- Tutorial I. Data description
- Tutorial J. Data cleaning and recoding
- Tutorial K. Dummy coding category variables
- Tutorial L. Outlier handling
- Tutorial M. Filling missing values with constants
- Tutorial N. Filling missing values with formulas
- Tutorial O. Filling missing values with a model
- APPENDIX A. Listing of tutorials and other resources on this book’s companion web page
- Appendix B
- Index
- No. of pages: 650
- Language: English
- Edition: 3
- Published: September 16, 2024
- Imprint: Academic Press
- Hardback ISBN: 9780443132735
- Paperback ISBN: 9780443158452
- eBook ISBN: 9780443158469
Robert Nisbet
Gary D. Miner