All About Bioinformatics

All About Bioinformatics: From Beginner to Expert provides readers with an overview of the fundamentals and advances in the field of bioinformatics, as well as some future directions. Each chapter is didactically organized and includes an introduction, applications, tools, and future directions, so that each topic is covered thoroughly.

The book covers traditional topics such as biological databases, algorithms, genetic variations, statistical methods, and structural bioinformatics, as well as contemporary advanced topics such as high-throughput technologies, drug informatics, systems and network biology, and machine learning. It is a valuable resource for researchers and graduate students who want to learn more about bioinformatics and apply it in their research.

CHAPTER 1 What is bioinformatics?
1.1 Introduction
1.2 History
1.3 Biological databases
1.4 Algorithms in computational biology
1.5 Genetic variation and bioinformatics
1.6 Structural bioinformatics
1.7 High-throughput technology
1.8 Drug informatics
1.9 System and network biology
1.10 Machine learning in bioinformatics
1.11 Bioinformatics workflow management systems
1.12 Application of bioinformatics
References

CHAPTER 2 Introduction to biological databases
2.1 Introduction
  2.1.1 Characteristics of biological data
2.2 Types of databases
  2.2.1 Primary database
  2.2.2 Secondary database
  2.2.3 Composite database
2.3 Models of databases
  2.3.1 Flat file
  2.3.2 Hierarchical model
  2.3.3 Network model
  2.3.4 Entity relationship model
  2.3.5 Relational database model
  2.3.6 Other models
2.4 Primary nucleic acid databases
  2.4.1 EMBL
  2.4.2 GenBank
  2.4.3 DDBJ
2.5 Primary protein databanks
  2.5.1 PDB
  2.5.2 SWISS-PROT
2.6 Secondary protein databases
  2.6.1 CATH
  2.6.2 SCOP
  2.6.3 Prostate
2.7 Composite sequence databases
  2.7.1 Meta-databases
2.8 Genomic, proteomic, and other databases
  2.8.1 The search engines for literature
2.9 Genome projects and genomic databases of humans, animals, fungi, and microorganisms
  2.9.1 Humans
  2.9.2 Animals
  2.9.3 Fungi
  2.9.4 Microorganisms
  2.9.5 Plant and crop genomic database
  2.9.6 Organelle database
  2.9.7 Pathway databases
References

CHAPTER 3 Statistical methods in bioinformatics
3.1 Introduction
3.2 Statistics at the interface of bioinformatics
3.3 Measures of central tendency
  3.3.1 Mean
  3.3.2 Median
  3.3.3 Mode
  3.3.4 Percentiles, quartiles and interquartile range
3.4 Skewness and kurtosis
3.5 Variability and its measures
  3.5.1 Variance
  3.5.2 Standard deviation
  3.5.3 Standard error
  3.5.4 Coefficient of variation
3.6 Different types of distributions and their significance
  3.6.1 Probability distributions
  3.6.2 Continuous probability function
  3.6.3 Discrete probability function
  3.6.4 Normal distribution and normal curve
  3.6.5 Normal curve
  3.6.6 Asymmetrical distribution
3.7 Sampling
3.8 Probability
  3.8.1 Laws of probability
3.9 Comparing the means of two or more data variables or groups
  3.9.1 Independent samples t-test
  3.9.2 One sample t-test
  3.9.3 Paired samples t-test
  3.9.4 ANOVA
  3.9.5 The Chi-square tests
  3.9.6 Test of independence
  3.9.7 Test of goodness of fit
  3.9.8 Correlation and regression
  3.9.9 A look into correlation and regression
3.10 Platforms employed for statistical analysis
  3.10.1 Downstream analysis and visualization
3.11 Gene ontology & pathway analysis
  3.11.1 Singular enrichment analysis (SEA)
  3.11.2 Gene set enrichment analysis (GSEA)
  3.11.3 Modular enrichment analysis (MEA)
  3.11.4 Correlation networks
3.12 Future prospects and conclusion
References

CHAPTER 4 Algorithms in computational biology
4.1 Sequence alignment
  4.1.1 Local alignment
  4.1.2 Global alignment
  4.1.3 Gap penalty
4.2 Pair-wise alignment
4.3 Dot-matrix method
4.4 Dynamic programming
  4.4.1 Needleman-Wunsch
  4.4.2 Smith Waterman algorithm
4.5 Scoring matrices
  4.5.1 Scoring matrices for amino acids
  4.5.2 PAM (point accepted mutation)
  4.5.3 BLOcks SUbstitution matrix (BLOSUM)
4.6 Word methods
4.7 Multiple sequence alignment
  4.7.1 Progressive alignment
  4.7.2 Iterative method
  4.7.3 MSA filtering
  4.7.4 Filtering techniques’ fundamental principles
  4.7.5 Programs and methods for multiple sequence alignment
  4.7.6 Representation and structural inference
4.8 Phylogenetics
  4.8.1 Molecular phylogenetics
  4.8.2 Phylogenetics trees
  4.8.3 Properties
  4.8.4 Building methods
  4.8.5 Distance matrix method
  4.8.6 Bayesian inference
References

CHAPTER 5 Genetic variations
5.1 Introduction
5.2 Types of variations
5.3 Effects of genetic variation
5.4 Biological database
  5.4.1 Database of human genetic variation
  5.4.2 Predicting the clinical significance of human genetic variation
5.5 Phenotype-genotype association
5.6 Pharmacogenomics
  5.6.1 Drug receptors
  5.6.2 Drug uptake
  5.6.3 Drug breakdown
5.7 Pharmacogenomics and targeted drug development
  5.7.1 Personalized medicine
  5.7.2 Personalized medicine drivers
  5.7.3 Future aspects of pharmacogenomics in personalized medicine
5.8 Computational biology methods for decision support in personalized medicine
  5.8.1 Pharmacogenomics information
References

CHAPTER 6 Structural bioinformatics
6.1 Introduction
6.2 Viewing protein structures
6.3 Alignment of protein structures
6.4 Structural prediction
  6.4.1 Use of sequence patterns for protein structure prediction
  6.4.2 Prediction of protein secondary structure from the amino acid sequence
  6.4.3 Chou Fasman method
  6.4.4 GOR method
  6.4.5 Prediction of three-dimensional protein structure
  6.4.6 Evaluating the success of structure predictions
References

CHAPTER 7 High throughput technology
7.1 Omics theory
7.2 High-throughput technologies
7.3 Genomics
  7.3.1 What is DNA?
  7.3.2 DNA microarray
  7.3.3 DNA sequencing
  7.3.4 Whole exome sequencing (WES)
  7.3.5 Single cell DNA-SEQ (sc-DNA-seq)
7.4 Epigenomics
  7.4.1 ChIP-seq
  7.4.2 Whole-genome shotgun bisulfite sequencing (WGSBS)
7.5 Transcriptomics
  7.5.1 RNA-seq
7.6 Proteomics
  7.6.1 Reverse phase protein microarrays (RPPA)
7.7 Metabolomics
  7.7.1 Different methods for studying metabolomics
References

CHAPTER 8 Drug informatics
8.1 Introduction
8.2 Computational drug designing and discovery
8.3 Structure based drug designing
  8.3.1 Homology modeling
  8.3.2 Molecular docking
  8.3.3 Molecular simulation
8.4 Ligand-based drug designing
  8.4.1 Pharmacophore modeling
8.5 ADMET
  8.5.1 Adsorption
  8.5.2 Distribution
  8.5.3 Metabolism
  8.5.4 Excretion
  8.5.5 Toxicity
8.6 Drug repurposing
References

CHAPTER 9 A machine learning approach to bioinformatics
9.1 Introduction to machine learning?
9.2 Types of machine learning systems
  9.2.1 Supervised learning
  9.2.2 The below are the most commonly used supervised algorithms
  9.2.3 Logistic regression
  9.2.4 K-nearest neighbor
  9.2.5 Decision trees
  9.2.6 Support vector machines
  9.2.7 Neural networks
  9.2.8 Neural networks architecture
  9.2.9 Convolutional neural network
  9.2.10 Unsupervised learning
  9.2.11 K-means clustering
  9.2.12 Reinforcement learning
9.3 Evaluation of machine learning models
  9.3.1 Accuracy
  9.3.2 Cross-validation
  9.3.3 Testing and validating
9.4 Optimization of models
  9.4.1 Parameter searching
  9.4.2 Ensemble methods
9.5 Main challenges of machine learning
  9.5.1 Insufficient quantity of training data
  9.5.2 Non-representative training data
  9.5.3 Quality of data
  9.5.4 Irrelevant features
  9.5.5 Overfitting or underfitting on training data
References

CHAPTER 10 Systems and network biology
10.1 Introduction
10.2 Network theory
10.3 Graph theory
10.4 Features of biological networks
  10.4.1 The various types of network edges
  10.4.2 Network measures
  10.4.3 Network models
10.5 Types of biological networks
  10.5.1 Cell signaling networks
  10.5.2 Gene/transcription regulation networks
  10.5.3 Genetic interaction networks
  10.5.4 Metabolic networks
  10.5.5 Protein-protein interaction networks
10.6 Sources of data for biological networks
10.7 Gene ontology for network analysis
10.8 Analysis of biological networks and interactomes
10.9 Interaction network construction using a gene list
10.10 Data analysis tools
  10.10.1 The InnateDB
  10.10.2 Visualization and download of networks
  10.10.3 Enrichr
  10.10.4 Babelomics 5
10.11 Network visualization tools
  10.11.1 Cytoscape
  10.11.2 NAViGaTOR
  10.11.3 VisANT
  10.11.4 CellDesigner
  10.11.5 Pathway Studio
  10.11.6 Gephi
10.12 Important properties to be inferred from networks
  10.12.1 Hubs
  10.12.2 Bottlenecks
  10.12.3 Modules
  10.12.4 Bioinformatics tools to detect modules, bottlenecks and hubs
References

CHAPTER 11 Bioinformatics workflow management systems
11.1 Introduction to workflow management systems
11.2 Galaxy
11.3 Gene pattern
11.4 KNIME: The Konstanz information miner
11.5 LINCS tools
  11.5.1 The program’s overall goal
  11.5.2 Test performed under LINCS
11.6 Anduril bioinformatics and image analysis
  11.6.1 Anduril image analysis: ANIMA
11.7 NextFlow
References

CHAPTER 12 Data handling using Python
12.1 Introduction
12.2 Datatypes and operators
  12.2.1 Datatypes
  12.2.2 Operators
12.3 Variables
12.4 Strings
  12.4.1 String indexing
  12.4.2 Operations on strings
  12.4.3 Methods in strings
12.5 Python lists and tuples
  12.5.1 Accessing values in list
  12.5.2 Methods with lists
  12.5.3 Tuples
12.6 Dictionary in Python
12.7 Conditional statements
  12.7.1 Logical operators
  12.7.2 If and else statements
12.8 Loops in Python
  12.8.1 While loop
  12.8.2 “For” loop
  12.8.3 Breaking a loop
12.9 File handling in Python
  12.9.1 Specify file mode
12.10 Importing functions
  12.10.1 Running a t-test in Python
  12.10.2 Make a simple scatterplot in matplotlib
  12.10.3 Running a simple linear regression in Python
12.11 Data handling
References

Index

About the author

Yasha Hasija