Synthetic Data and Generative AI
- 1st Edition - January 9, 2024
- Author: Vincent Granville
- Language: English
- Paperback ISBN: 978-0-443-21857-6
- eBook ISBN: 978-0-443-21856-9
Synthetic Data and Generative AI covers the foundations of machine learning with modern approaches to solving complex problems and the systematic generation and use of synthetic data. Emphasis is on scalability, automation, testing, optimizing, and interpretability (explainable AI). For instance, regression techniques – including logistic and Lasso – are presented as a single method without using advanced linear algebra. Confidence regions and prediction intervals are built using parametric bootstrap without statistical models or probability distributions. Models (including generative models and mixtures) are mostly used to create rich synthetic data to test and benchmark various methods.
- Emphasizes numerical stability and performance of algorithms (computational complexity)
- Focuses on explainable AI/interpretable machine learning, with heavy use of synthetic data and generative models, a new trend in the field
- Includes a new, easier construction of confidence regions that does not rely on statistical models, as well as a simple alternative to the powerful, well-known XGBoost technique
- Covers automation of data cleaning, favoring easier solutions when possible
- Includes chapters dedicated fully to synthetic data applications: fractal-like terrain generation with the diamond-square algorithm, and synthetic star clusters evolving over time and bound by gravity
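As a rough illustration of the diamond-square algorithm named in the last item above, here is a minimal textbook-style sketch in pure Python. It is not the book's code: the function name `diamond_square` and the parameters `n`, `roughness`, and `seed` are hypothetical choices made for this example, and the uniform noise with geometric decay is only one common variant.

```python
import random


def diamond_square(n, roughness=0.5, seed=0):
    """Return a (2**n + 1) x (2**n + 1) height map as a list of lists.

    Hypothetical helper for illustration only (not taken from the book).
    """
    rng = random.Random(seed)
    size = 2 ** n + 1
    h = [[0.0] * size for _ in range(size)]

    # Seed the four corners with random heights.
    for r in (0, size - 1):
        for c in (0, size - 1):
            h[r][c] = rng.uniform(-1.0, 1.0)

    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2

        # Diamond step: the center of each square is the mean of its
        # four corners plus random noise.
        for r in range(half, size, step):
            for c in range(half, size, step):
                avg = (h[r - half][c - half] + h[r - half][c + half]
                       + h[r + half][c - half] + h[r + half][c + half]) / 4.0
                h[r][c] = avg + rng.uniform(-scale, scale)

        # Square step: each edge midpoint is the mean of its in-grid
        # neighbors (3 on the border, 4 inside) plus random noise.
        for r in range(0, size, half):
            start = half if (r // half) % 2 == 0 else 0
            for c in range(start, size, step):
                total, count = 0.0, 0
                for dr, dc in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < size and 0 <= cc < size:
                        total += h[rr][cc]
                        count += 1
                h[r][c] = total / count + rng.uniform(-scale, scale)

        # Halve the grid spacing and shrink the noise amplitude.
        step, scale = half, scale * roughness

    return h


terrain = diamond_square(3, roughness=0.5, seed=1)
```

In this sketch, smaller values of `roughness` make the noise decay faster and give smoother terrain, while values near 1 give a more jagged surface.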
The book is intended for computer scientists and researchers in artificial intelligence and machine learning, as well as analytics practitioners in fields such as quant, engineering, statistics, and operations research, including biostatisticians, data scientists, data engineers, CTOs, and other decision makers. It also targets academics, researchers, and professionals who work with AI, algorithms, big data, and machine learning and apply them to real-world research and application problems, along with upper-level undergraduate and graduate students in computer science, AI, ML, applied mathematics, and data science.
- Cover image
- Title page
- Table of Contents
- Copyright
- Chapter 1: Machine learning cloud regression and optimization
- Abstract
- 1.1. Introduction: circle fitting
- 1.2. Methodology, implementation details, and caveats
- 1.3. Case studies
- 1.4. Connection to synthetic data: meteorites, ocean tides
- References
- Chapter 2: A simple, robust, and efficient ensemble method
- Abstract
- 2.1. Introduction
- 2.2. Methodology
- 2.3. Implementation details
- 2.4. Model-free confidence intervals and perfect nodes
- References
- Chapter 3: Gentle introduction to linear algebra – synthetic time series
- Abstract
- 3.1. Power of a matrix
- 3.2. Examples, generalization, and matrix inversion
- 3.3. Application to machine learning problems
- 3.4. Mathematics of autoregressive time series
- 3.5. Math for machine learning: must-read books
- References
- Chapter 4: Image and video generation
- Abstract
- 4.1. Introduction
- 4.2. Applications
- 4.3. Python code
- 4.4. Visualizations
- References
- Chapter 5: Synthetic clusters and alternative to GMM
- Abstract
- 5.1. Introduction
- 5.2. Generating the synthetic data
- 5.3. Classification and unsupervised clustering
- 5.4. Python code
- References
- Chapter 6: Shape classification and synthetization via explainable AI
- Abstract
- 6.1. Introduction
- 6.2. Mathematical foundations
- 6.3. Shape signature
- 6.4. Shape comparison
- 6.5. Application
- 6.6. Exercises
- References
- Chapter 7: Synthetic data, interpretable regression, and submodels
- Abstract
- 7.1. Introduction
- 7.2. Synthetic data sets and the spreadsheet
- 7.3. Damping schedule and convergence acceleration
- 7.4. Performance assessment on synthetic data
- 7.5. Feature selection
- 7.6. Conclusion
- References
- Chapter 8: From interpolation to fuzzy regression
- Abstract
- 8.1. Introduction
- 8.2. Original version
- 8.3. Full, nonlinear model in higher dimensions
- 8.4. Results
- 8.5. Exercises
- 8.6. Python source code and data sets
- References
- Chapter 9: New interpolation methods for synthetization and prediction
- Abstract
- 9.1. First method
- 9.2. Second method
- 9.3. Python code
- References
- Chapter 10: Synthetic tabular data: copulas vs enhanced GANs
- Abstract
- 10.1. Sensitivity analysis, bias reduction and other uses of synthetic data
- 10.2. Using copulas to generate synthetic data
- 10.3. Synthetization: GAN versus copulas
- 10.4. Deep dive into generative adversarial networks (GAN)
- 10.5. Comparing GANs with the copula method
- 10.6. Data synthetization explained in one picture
- 10.7. Python code: GAN to synthesize medical data
- References
- Chapter 11: High quality random numbers for data synthetization
- Abstract
- 11.1. Introduction
- 11.2. Pseudorandom numbers
- 11.3. Python code
- 11.4. Military-grade PRNG based on quadratic irrationals
- References
- Chapter 12: Some unusual random walks
- Abstract
- 12.1. Symmetric unbiased constrained random walks
- 12.2. Related stochastic processes
- 12.3. Python code
- References
- Chapter 13: Divergent optimization algorithm and synthetic functions
- Abstract
- 13.1. Introduction
- 13.2. Nonconverging fixed-point algorithm
- 13.3. Generalization with synthetic random functions
- 13.4. Smoothing highly chaotic curves
- 13.5. Connection to synthetic data: random functions
- References
- Chapter 14: Synthetic terrain generation and AI-generated art
- Abstract
- 14.1. Introduction
- 14.2. Terrain generation and the evolutionary process
- 14.3. Python code
- 14.4. AI-generated art with 3D contours
- References
- Chapter 15: Synthetic star cluster generation with collision graphs
- Abstract
- 15.1. Introduction
- 15.2. Model parameters and simulation results
- 15.3. Analysis of star collisions and collision graph
- 15.4. Animated data visualizations
- 15.5. Python code and computational issues
- Chapter 16: Perturbed lattice point process: alternative to GMM
- Abstract
- 16.1. Perturbed lattices: definition and properties
- 16.2. Cluster processes and nearest neighbor graphs
- 16.3. Statistical inference for point processes
- 16.4. Special topics
- References
- Chapter 17: Synthetizing multiplicative functions in number theory
- Abstract
- 17.1. Introduction
- 17.2. Euler products
- 17.3. Finite Dirichlet series and generalizations
- 17.4. Exercises
- 17.5. Python code
- References
- Chapter 18: Text, sound generation, and other topics
- Abstract
- 18.1. Sound generation: let your data sing!
- 18.2. Data videos and enhanced visualizations in R
- 18.3. Dual confidence regions
- 18.4. Fast feature selection based on predictive power
- 18.5. NLP: taxonomy creation and text generation
- 18.6. Automated detection of outliers and number of clusters
- 18.7. Advice to beginners
- References
- Glossary
- Index
- No. of pages: 408
- Language: English
- Edition: 1
- Published: January 9, 2024
- Imprint: Morgan Kaufmann
- Paperback ISBN: 9780443218576
- eBook ISBN: 9780443218569
Vincent Granville
Dr. Vincent Granville is a pioneering data scientist and machine learning expert, co-founder of Data Science Central (acquired by TechTarget in 2020), founder of MLTechniques.com, former VC-funded executive, author, and patent owner. Dr. Granville’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. He is also a former post-doc at Cambridge University and the National Institute of Statistical Sciences (NISS). He has published in the Journal of Number Theory, the Journal of the Royal Statistical Society, and IEEE Transactions on Pattern Analysis and Machine Intelligence, and he is the author of Developing Analytic Talent: Becoming a Data Scientist (Wiley). Dr. Granville lives in Washington state and enjoys doing research on stochastic processes, dynamical systems, experimental math, and probabilistic number theory. He has been listed among the Forbes magazine Top 20 Big Data Influencers.
Affiliations and expertise
Author and Publisher, MLTechniques.com, USA