
The Art and Science of Analyzing Software Data
- 1st Edition - August 27, 2015
- Imprint: Morgan Kaufmann
- Editors: Christian Bird, Tim Menzies, Thomas Zimmermann
- Language: English
- Paperback ISBN: 978-0-12-411519-4
- eBook ISBN: 978-0-12-411543-9
The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science.
The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and generation of source code comments. It includes stories from the trenches from expert data scientists illustrating how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.
- Presents best practices, hints, and tips to analyze data and apply tools in data science projects
- Presents research methods and case studies that have emerged over the past few years to further understanding of software data
- Shares stories from the trenches of successful data science initiatives in industry
Practicing software engineers, researchers and graduate software engineering students with an interest in data science.
- List of Contributors
- Chapter 1: Past, Present, and Future of Analyzing Software Data
- Abstract
- Acknowledgments
- 1.1 Definitions
- 1.2 The Past: Origins
- 1.3 Present Day
- 1.4 Conclusion
- Part 1: Tutorial-Techniques
- Chapter 2: Mining Patterns and Violations Using Concept Analysis
- Abstract
- Acknowledgments
- 2.1 Introduction
- 2.2 Patterns and Blocks
- 2.3 Computing All Blocks
- 2.4 Mining Shopping Carts with Colibri
- 2.5 Violations
- 2.6 Finding Violations
- 2.7 Two Patterns or One Violation?
- 2.8 Performance
- 2.9 Encoding Order
- 2.10 Inlining
- 2.11 Related Work
- 2.12 Conclusions
- Chapter 3: Analyzing Text in Software Projects
- Abstract
- 3.1 Introduction
- 3.2 Textual Software Project Data and Retrieval
- 3.3 Manual Coding
- 3.4 Automated Analysis
- 3.5 Two Industrial Studies
- 3.6 Summary
- Chapter 4: Synthesizing Knowledge from Software Development Artifacts
- Abstract
- 4.1 Problem Statement
- 4.2 Artifact Lifecycle Models
- 4.3 Code Review
- 4.4 Lifecycle Analysis
- 4.5 Other Applications
- 4.6 Conclusion
- Chapter 5: A Practical Guide to Analyzing IDE Usage Data
- Abstract
- Acknowledgments
- 5.1 Introduction
- 5.2 Usage Data Research Concepts
- 5.3 How to Collect Data
- 5.4 How to Analyze Usage Data
- 5.5 Limits of What You Can Learn from Usage Data
- 5.6 Conclusion
- 5.7 Code Listings
- Chapter 6: Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
- Abstract
- 6.1 Introduction
- 6.2 Applications of LDA in Software Analysis
- 6.3 How LDA Works
- 6.4 LDA Tutorial
- 6.5 Pitfalls and Threats to Validity
- 6.6 Conclusions
- Chapter 7: Tools and Techniques for Analyzing Product and Process Data
- Abstract
- 7.1 Introduction
- 7.2 A Rational Analysis Pipeline
- 7.3 Source Code Analysis
- 7.4 Compiled Code Analysis
- 7.5 Analysis of Configuration Management Data
- 7.6 Data Visualization
- 7.7 Concluding Remarks
- Part 2: Data/Problem Focussed
- Chapter 8: Analyzing Security Data
- Abstract
- 8.1 Vulnerability
- 8.2 Security Data “Gotchas”
- 8.3 Measuring Vulnerability Severity
- 8.4 Method of Collecting and Analyzing Vulnerability Data
- 8.5 What Security Data has Told Us Thus Far
- 8.6 Summary
- Chapter 9: A Mixed Methods Approach to Mining Code Review Data: Examples and a Study of Multicommit Reviews and Pull Requests
- Abstract
- 9.1 Introduction
- 9.2 Motivation for a Mixed Methods Approach
- 9.3 Review Process and Data
- 9.4 Quantitative Replication Study: Code Review on Branches
- 9.5 Qualitative Approaches
- 9.6 Triangulation
- 9.7 Conclusion
- Chapter 10: Mining Android Apps for Anomalies
- Abstract
- Acknowledgments
- 10.1 Introduction
- 10.2 Clustering Apps by Description
- 10.3 Identifying Anomalies by APIs
- 10.4 Evaluation
- 10.5 Related Work
- 10.6 Conclusion and Future Work
- Chapter 11: Change Coupling Between Software Artifacts: Learning from Past Changes
- Abstract
- 11.1 Introduction
- 11.2 Change Coupling
- 11.3 Change Coupling Identification Approaches
- 11.4 Challenges in Change Coupling Identification
- 11.5 Change Coupling Applications
- 11.6 Conclusion
- Part 3: Stories from the Trenches
- Chapter 12: Applying Software Data Analysis in Industry Contexts: When Research Meets Reality
- Abstract
- 12.1 Introduction
- 12.2 Background
- 12.3 Six Key Issues when Implementing a Measurement Program in Industry
- 12.4 Conclusions
- Chapter 13: Using Data to Make Decisions in Software Engineering: Providing a Method to our Madness
- Abstract
- 13.1 Introduction
- 13.2 Short History of Software Engineering Metrics
- 13.3 Establishing Clear Goals
- 13.4 Review of Metrics
- 13.5 Challenges with Data Analysis on Software Projects
- 13.6 Example of Changing Product Development Through the Use of Data
- 13.7 Driving Software Engineering Processes with Data
- Chapter 14: Community Data for OSS Adoption Risk Management
- Abstract
- Acknowledgments
- 14.1 Introduction
- 14.2 Background
- 14.3 An Approach to OSS Risk Adoption Management
- 14.4 OSS Communities Structure and Behavior Analysis: The XWiki Case
- 14.5 A Risk Assessment Example: The Moodbile Case
- 14.6 Related Work
- 14.7 Conclusions
- Chapter 15: Assessing the State of Software in a Large Enterprise: A 12-Year Retrospective
- Abstract
- Acknowledgments
- 15.1 Introduction
- 15.2 Evolution of the Process and the Assessment
- 15.3 Impact Summary of the State of Avaya Software Report
- 15.4 Assessment Approach and Mechanisms
- 15.5 Data Sources
- 15.6 Examples of Analyses
- 15.7 Software Practices
- 15.8 Assessment Follow-up: Recommendations and Impact
- 15.9 Impact of the Assessments
- 15.10 Conclusions
- 15.11 Appendix
- Author Biographies
- Chapter 16: Lessons Learned from Software Analytics in Practice
- Abstract
- 16.1 Introduction
- 16.2 Problem Selection
- 16.3 Data Collection
- 16.4 Descriptive Analytics
- 16.5 Predictive Analytics
- 16.6 Road Ahead
- Part 4: Advanced Topics
- Chapter 17: Code Comment Analysis for Improving Software Quality
- Abstract
- 17.1 Introduction
- 17.2 Text Analytics: Techniques, Tools, and Measures
- 17.3 Studies of Code Comments
- 17.4 Automated Code Comment Analysis for Specification Mining and Bug Detection
- 17.5 Studies and Analysis of API Documentation
- 17.6 Future Directions and Challenges
- Chapter 18: Mining Software Logs for Goal-Driven Root Cause Analysis
- Abstract
- 18.1 Introduction
- 18.2 Approaches to Root Cause Analysis
- 18.3 Root Cause Analysis Framework Overview
- 18.4 Modeling Diagnostics for Root Cause Analysis
- 18.5 Log Reduction
- 18.6 Reasoning Techniques
- 18.7 Root Cause Analysis for Failures Induced by Internal Faults
- 18.8 Root Cause Analysis for Failures due to External Threats
- 18.9 Experimental Evaluations
- 18.10 Conclusions
- Chapter 19: Analytical Product Release Planning
- Abstract
- Acknowledgments
- 19.1 Introduction and Motivation
- 19.2 Taxonomy of Data-intensive Release Planning Problems
- 19.3 Information Needs for Software Release Planning
- 19.4 The Paradigm of Analytical Open Innovation
- Analysis phase
- Synthesize phase
- 19.5 Analytical Release Planning—A Case Study
- 19.6 Summary and Future Research
- 19.7 Appendix: Feature Dependency Constraints
- Part 5: Data Analysis at Scale (Big Data)
- Chapter 20: Boa: An Enabling Language and Infrastructure for Ultra-Large-Scale MSR Studies
- Abstract
- 20.1 Objectives
- 20.2 Getting Started with Boa
- 20.3 Boa’s Syntax and Semantics
- 20.4 Mining Project and Repository Metadata
- 20.5 Mining Source Code with Visitors
- 20.6 Guidelines for Replicable Research
- 20.7 Conclusions
- 20.8 Practice Problems
- Project and Repository Metadata Problems
- Source Code Problems
- Chapter 21: Scalable Parallelization of Specification Mining Using Distributed Computing
- Abstract
- 21.1 Introduction
- 21.2 Background
- 21.3 Distributed Specification Mining
- 21.4 Implementation and Empirical Evaluation
- 21.5 Related Work
- 21.6 Conclusion and Future Work
- No. of pages (Paperback): 672
Christian Bird
Tim Menzies