
Big Data Analytics in Chemoinformatics and Bioinformatics
With Applications to Computer-Aided Drug Design, Cancer Biology, Emerging Pathogens and Computational Toxicology
- 1st Edition - December 6, 2022
- Imprint: Elsevier
- Editors: Subhash C. Basak, Marjan Vračko
- Language: English
- Paperback ISBN: 978-0-323-85713-0
- eBook ISBN: 978-0-323-85714-7
Big Data Analytics in Chemoinformatics and Bioinformatics: With Applications to Computer-Aided Drug Design, Cancer Biology, Emerging Pathogens and Computational Toxicology provides an up-to-date presentation of big data analytics methods and their applications in diverse fields. The proper management of big data for decision-making in scientific and social issues is of paramount importance. This book gives researchers the tools they need to solve big data problems in these fields. It begins with a section on general topics that all readers will find useful and continues with specific sections covering a range of interdisciplinary applications.
Here, an international team of leading experts review their respective fields and present their latest research findings, with case studies used throughout to analyze and present key information.
- Brings together the current knowledge on the most important aspects of big data, including analysis using deep learning and fuzzy logic, transparency and data protection, disparate data analytics, and scalability of the big data domain
- Covers many applications of big data analysis in diverse fields such as chemistry, chemoinformatics, bioinformatics, computer-assisted drug/vaccine design, characterization of emerging pathogens, and environmental protection
- Highlights the considerable benefits that big data analytics offers to science, biomedicine, and industry
Researchers involved in the management and practical use of big data in chemistry, biology, chemoinformatics, bioinformatics, computational chemistry, new drug discovery, drug design, and surveillance of emerging pathogens. Students and young researchers interested in techniques and applications of big data analytics.
- Cover image
- Title page
- Table of Contents
- Copyright
- List of contributors
- Preface
- Section 1: General section
- 1. Chemoinformatics and bioinformatics by discrete mathematics and numbers: an adventure from small data to the realm of emerging big data
- Abstract
- 1.1 Introduction
- 1.2 Chemobioinformatics—a confluence of disciplines?
- 1.3 Bioinformatics: quantitative informatics in the age of big biology
- 1.4 Major pillars of model building
- 1.5 Discussion
- 1.6 Conclusion
- Acknowledgment
- References
- 2. Robustness concerns in high-dimensional data analyses and potential solutions
- Abstract
- 2.1 Introduction
- 2.2 Sparse estimation in high-dimensional regression models
- 2.3 Robustness concerns for the penalized likelihood methods
- 2.4 Penalized M-estimation for robust high-dimensional analyses
- 2.5 Robust minimum divergence methods for high-dimensional regressions
- 2.6 A real-life application: identifying important descriptors of amines for explaining their mutagenic activity
- 2.7 Concluding remarks
- Appendix: A list of useful R-packages for high-dimensional data analysis
- Acknowledgments
- References
- 3. Fairness, explainability, privacy, and robustness for trustworthy algorithmic decision-making
- Abstract
- 3.1 Introduction
- 3.2 Fairness in machine learning
- 3.3 Explainable artificial intelligence
- 3.4 Notions of algorithmic privacy
- 3.5 Robustness
- 3.6 Discussion
- References
- Section 2: Chemistry & chemoinformatics section
- 4. How to integrate the “small and big” data into a complex adverse outcome pathway?
- Abstract
- 4.1 Introduction
- 4.2 State and review
- 4.3 Binding affinity to androgen nuclear receptor evaluated with respect to carcinogenic potency data
- 4.4 Conclusion and future directions
- References
- 5. Big data and deep learning: extracting and revising chemical knowledge from data
- Abstract
- 5.1 Introduction
- 5.2 Basic methods in neural networks and deep learning
- 5.3 Neural networks for quantitative structure–activity relationship: input, output, and parameters
- 5.4 Deep learning models for mutagenicity prediction
- 5.5 Interpreting deep neural network models
- 5.6 Discussion and conclusions
- References
- 6. Retrosynthetic space modeled by big data descriptors
- Abstract
- 6.1 Introduction
- 6.2 Computer-assisted organic synthesis
- 6.3 Quantitative structure–activity relationship model
- 6.4 Dimensionality reduction using retrosynthetic analysis
- 6.5 Discussion
- References
- 7. Approaching history of chemistry through big data on chemical reactions and compounds
- Abstract
- 7.1 Introduction
- 7.2 Computational history of chemistry
- 7.3 The expanding chemical space, a case study for computational history of chemistry
- 7.4 Conclusions
- Acknowledgments
- References
- 8. Combinatorial and quantum techniques for large data sets: hypercubes and halocarbons
- Abstract
- 8.1 Introduction
- 8.2 Combinatorial techniques for isomer enumerations to generate large datasets
- 8.3 Quantum chemical techniques for large data sets
- 8.4 Hypercubes and large datasets
- 8.5 Conclusion
- References
- 9. Development of quantitative structure–activity relationship models based on electrophilicity index: a conceptual DFT-based descriptor
- Abstract
- 9.1 Introduction
- 9.2 Theoretical background
- 9.3 Computational details
- 9.4 Methodology
- 9.5 Results and discussion
- 9.6 Conclusion
- Acknowledgments
- Conflict of interest
- References
- 10. Pharmacophore-based virtual screening of large compound databases can aid “big data” problems in drug discovery
- Abstract
- 10.1 Introduction
- 10.2 Background of data analytics, machine learning, intelligent augmentation methods and applications in drug discovery
- 10.3 Pharmacophore modeling
- 10.4 Concluding remarks
- References
- 11. A new robust classifier to detect hot-spots and null-spots in protein–protein interface: validation of binding pocket and identification of inhibitors in in vitro and in vivo models
- Abstract
- 11.1 Introduction
- 11.2 Training and testing of the classifier
- 11.3 Technical details to develop novel protein–protein interaction hotspot prediction program
- 11.4 A case study
- 11.5 Discussion
- Author contribution
- Acknowledgment
- Conflicts of interest
- References
- 12. Mining big data in drug discovery—triaging and decision trees
- Abstract
- 12.1 Introduction
- 12.2 Big data in drug discovery
- 12.3 Triaging
- 12.4 Decision trees
- 12.5 Recursive partitioning
- 12.6 PhyloGenetic-like trees
- 12.7 Multidomain classification
- 12.8 Fuzzy trees and clustering
- Acknowledgments
- References
- Section 3: Bioinformatics and computational toxicology section
- 13. Use of proteomics data and proteomics-based biodescriptors in the estimation of bioactivity/toxicity of chemicals and nanosubstances
- Abstract
- 13.1 Introduction
- 13.2 Proteomics technologies and their toxicological applications
- 13.3 Discussion
- Acknowledgment
- References
- 14. Mapping interaction between big spaces; active space from protein structure and available chemical space
- Abstract
- 14.1 Introduction
- 14.2 Background
- 14.3 Protein topology for exploring structure space
- 14.4 Scaffolds curve the functional and catalytic sites
- 14.5 Protein interactive sites and designing of inhibitor
- 14.6 Intrinsically unstructured regions and protein function
- 14.7 Conclusions
- Acknowledgments
- References
- 15. Artificial intelligence, big data and machine learning approaches in genome-wide SNP-based prediction for precision medicine and drug discovery
- Abstract
- 15.1 Introduction
- 15.2 Role of artificial intelligence and machine learning in medicine
- 15.3 Genome-wide SNP prediction
- 15.4 Artificial intelligence, precision medicine and drug discovery
- 15.5 Applications of artificial intelligence in disease prediction and analysis oncology
- 15.6 Cardiology
- 15.7 Neurology
- 15.8 Conclusion
- Abbreviations
- References
- 16. Applications of alignment-free sequence descriptors in the characterization of sequences in the age of big data: a case study with Zika virus, SARS, MERS, and COVID-19
- Abstract
- 16.1 Introduction
- 16.2 Section 1—bioinformatics today: problems now
- 16.3 Section 2—bioinformatics today and tomorrow: sustainable solutions
- 16.4 Summary
- References
- 17. Scalable quantitative structure–activity relationship systems for predictive toxicology
- Abstract
- 17.1 Background
- 17.2 Scalability in quantitative structure–activity relationship modeling
- 17.3 Summary
- References
- 18. From big data to complex network: a navigation through the maze of drug–target interaction
- Abstract
- 18.1 Introduction
- 18.2 Databases
- 18.3 Prediction, construction, and analysis of drug–target network
- 18.4 Conclusion and perspectives
- Acknowledgments
- References
- 19. Dissecting big RNA-Seq cancer data using machine learning to find disease-associated genes and the causal mechanism
- Abstract
- 19.1 Introduction
- 19.2 Bird’s eye view of the analysis of cancer RNA-Seq data using machine learning
- 19.3 Materials and methods
- 19.4 Hand-in-hand walk with RNA-Seq data
- 19.5 Conclusion
- References
- Index
- Edition: 1
- Published: December 6, 2022
- Imprint: Elsevier
- No. of pages: 502
- Language: English
- Paperback ISBN: 9780323857130
- eBook ISBN: 9780323857147
Subhash C. Basak