
Big Data Analytics
- 1st Edition, Volume 33 - July 7, 2015
- Imprint: North Holland
- Editors: Venu Govindaraju, Vijay Raghavan, C.R. Rao
- Language: English
- Hardback ISBN: 978-0-444-63492-4
- eBook ISBN: 978-0-444-63497-9
While the term Big Data is open to varying interpretations, it is clear that the Volume, Velocity, and Variety (3Vs) of data have impacted every aspect of computational science and its applications. The volume of data is increasing at a phenomenal rate, and a majority of it is unstructured. With big data, the volume is so large that processing it with traditional database and software techniques is difficult, if not impossible. The drivers are ubiquitous sensors, devices, social networks, and the all-pervasive web. Scientists are increasingly looking to derive insights from this massive quantity of data to create new knowledge. In common usage, Big Data has come to refer simply to the use of predictive analytics or certain other advanced methods to extract value from data, regardless of the size of the data. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. Alongside these challenges, huge opportunities are emerging in Machine Learning, Data Mining, Statistics, Human-Computer Interfaces, and Distributed Systems to address ways to analyze and reason with this data. This edited volume focuses on the challenges and opportunities posed by "Big Data" in a variety of domains and on how statistical techniques and innovative algorithms can help glean insights and accelerate discovery. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.
- Review of big data research challenges from diverse areas of scientific endeavor
- Rich perspective on a range of data science issues from leading researchers
- Insight into the mathematical and statistical theory underlying the computational methods used to address big data analytics problems in a variety of domains
Computer scientists, statisticians, data scientists, and Artificial Intelligence researchers
A: Modeling and Analytics
Chapter 1: Document Informatics for Scientific Learning and Accelerated Discovery
- Abstract
- 1 Introduction
- 2 How Document Informatics Will Aid Materials Discovery
- 3 The General Research Framework
- 4 Pilot Implementation
Chapter 2: An Introduction to Rare Event Simulation and Importance Sampling
- Abstract
- 1 Introduction: Monte Carlo Methods, Rare Event Simulation, and Variance Reduction Techniques
- 2 MC Methods and the Problem of Rare Events
- 3 Importance Sampling
- 4 Multiple IS
- 5 The Cross-Entropy Method
- 6 MCMC: Rejection Sampling, the Metropolis Method, and Gibbs Sampling
- 7 Applications of VRTs to Error Estimation in Optical Fiber Communication Systems
- 8 Large Deviations Theory, Asymptotic Efficiency, and Final Remarks
Chapter 3: A Large-Scale Study of Language Usage as a Cognitive Biometric Trait
- Abstract
- 1 Introduction
- 2 Cognitive Fingerprints: Problem Description
- 3 Data Description
- 4 Methodology
- 5 Results
- 6 Discussions
- 7 Related Work
- 8 Conclusions and Future Work
- Acknowledgment
Chapter 4: Customer Selection Utilizing Big Data Analytics
- Abstract
- 1 Introduction
- 2 Methodology
- 3 Experiments
- 4 Conclusion
Chapter 5: Continuous Model Selection for Large-Scale Recommender Systems
- Abstract
- 1 Introduction
- 2 Related Work
- 3 Preference Prediction
- 4 Proposed Continuous Modeling
- 5 Experimental Evaluations
- 6 Conclusion and Future Work
Chapter 6: Zero-Knowledge Mechanisms for Private Release of Social Graph Summarization
- Abstract
- 1 Introduction
- 2 Related Work
- 3 Graph Summarization
- 4 Background on ε-Zero-Knowledge Privacy
- 5 ZKP Mechanism for Graph Summarization
- 6 Evaluation
- 7 From Privacy Level to Noise Scale
- 8 Private Probabilistic A-GS
- 9 Conclusions
Chapter 7: Distributed Confidence-Weighted Classification on Big Data Platforms
- Abstract
- 1 Introduction
- 2 Classification with Linear SVM Models
- 3 MapReduce Framework for Distributed Computations
- 4 CW Classification Using MapReduce
- 5 Experiments
- 6 Conclusion
- Acknowledgments
B: Applications and Infrastructure
Chapter 8: Big Data Applications in Health Sciences and Epidemiology
- Abstract
- 1 Introduction
- 2 Mathematical Framework for Epidemiology
- 3 Dynamics and Analysis Problems
- 4 Inference Problems
- 5 Disease Surveillance, Molecular Epidemiology, and Pathogen Phylodynamics
- 6 High-Performance Synthetic Information Environments and Tools
- 7 Summary
- Acknowledgments
Chapter 9: Big Data Driven Natural Language Processing Research and Applications
- Abstract
- 1 Introduction
- 2 NLP Core Tasks
- 3 NLP Applications
- 4 Data Sources for NLP Research
- 5 Big Data Driven NLP Research and Applications
- 6 Trends and Future Research Directions
- 7 Conclusions
Chapter 10: Analyzing Big Spatial and Big Spatiotemporal Data: A Case Study of Methods and Applications
- Abstract
- 1 Introduction
- 2 Algorithms
- 3 Applications
- 4 Conclusions
Chapter 11: Experimental Computational Simulation Environments for Big Data Analytic in Social Sciences
- Abstract
- 1 Introduction
- 2 Big Data Analytics
- 3 Sociofinancial-Economic Simulations
- 4 Software Infrastructure for Social Sciences
- 5 Market Simulators for Financial Economics Modeling
- 6 Statistical Simulations of AT Models
- 7 DRACUS
- 8 Summary
Chapter 12: Terabyte-Scale Image Similarity Search
- Abstract
- 1 Introduction
- 2 Big-Data Processing
- 3 Application Workload (Distributed Indexing + Searching)
- 4 Hadoop in Practice
- 5 Large-Scale Hadoop
- 6 Conclusion
- Acknowledgments
Chapter 13: Measuring Inter-site Engagement in a Network of Sites
- Abstract
- 1 Introduction
- 2 Related Work
- 3 Data, Networks, and Metrics
- 4 Evaluating Inter-site Metrics
- 5 Studying Inter-site Engagement
- 6 The Network Effect
- 7 Hyperlink Performance
- 8 Conclusions
- 9 Future Work
- Acknowledgments
Chapter 14: Scaling RDF Triple Stores in Size and Performance: Modeling SPARQL Queries as Graph Homomorphism Routines
- Abstract
- 1 Introduction
- 2 SPARQL Queries as Graph Homomorphism Routines
- 3 GEMS: Graph Database Engine for Multithreaded Systems
- 4 Related Work
- 5 Experimental Results
- 6 Conclusions
- Edition: 1
- Volume: 33
- Published: July 7, 2015
- Imprint: North Holland
- No. of pages: 390
- Language: English
- Hardback ISBN: 9780444634924
- eBook ISBN: 9780444634979
Venu Govindaraju
Vijay Raghavan
C.R. Rao
Rao retired from the Indian Statistical Institute (ISI) in 1980 at the mandatory age of 60, after working there for 40 years, during which he developed ISI into an international center for statistical education and research. He also took an active part in establishing state statistical bureaus to collect local statistics and transmit them to the Central Statistical Organization in New Delhi. Rao played a pivotal role in launching undergraduate and postgraduate courses at ISI. He is the author of 475 research publications, including several breakthrough papers contributing to statistical theory and methodology for applications to problems in all areas of human endeavor. A number of classical statistical terms are named after him, the best known of which are the Cramér-Rao inequality, Rao-Blackwellization, Rao’s orthogonal arrays used in quality control, Rao’s score test, Rao’s quadratic entropy used in ecological work, and Rao’s metric and distance, which are covered in most statistical books.
He is the author of 10 books, two of the most important being Linear Statistical Inference, which has been translated into German, Russian, Czech, Polish, and Japanese, and Statistics and Truth, which has been translated into French, German, Japanese, Mainland Chinese, Taiwan Chinese, Turkish, and Korean.
He supervised the doctoral research of 50 students, who in turn produced 500 Ph.D.s. Rao received 38 honorary doctoral degrees from universities in 19 countries spanning 6 continents. He received the highest awards in statistics in the USA, the UK, and India: the National Medal of Science, awarded by the President of the USA; the Indian National Medal of Science, awarded by the Prime Minister of India; and the Guy Medal in Gold, awarded by the Royal Statistical Society, UK. Rao was a recipient in the first batch of Bhatnagar awards, in 1959, for mathematical sciences, and of numerous medals from science academies in India and abroad. He is a Fellow of the Royal Society (FRS), UK, and a member of the National Academy of Sciences, USA, and of academies of sciences in Lithuania and Europe. In his honor, a research institute named the C.R. Rao Advanced Institute of Mathematics, Statistics and Computer Science was established on the campus of Hyderabad University.