The Art and Science of Analyzing Software Data

The Art and Science of Analyzing Software Data describes analysis techniques commonly used to derive insight from software data. It shares best practices from leading data scientists, drawn from their experience training software engineering students and practitioners to master data science.

Topics include the analysis of security data, code reviews, app stores, log files, and user telemetry. Techniques include co-change analysis, text analysis, topic analysis, and concept analysis, along with advanced topics such as release planning and generation of source code comments. Stories from the trenches by expert data scientists illustrate how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.

List of Contributors
Chapter 1: Past, Present, and Future of Analyzing Software Data
- Abstract
- Acknowledgments
- 1.1 Definitions
- 1.2 The Past: Origins
- 1.3 Present Day
- 1.4 Conclusion
Part 1: Tutorial-Techniques
- Chapter 2: Mining Patterns and Violations Using Concept Analysis
  - Abstract
  - Acknowledgments
  - 2.1 Introduction
  - 2.2 Patterns and Blocks
  - 2.3 Computing All Blocks
  - 2.4 Mining Shopping Carts with Colibri
  - 2.5 Violations
  - 2.6 Finding Violations
  - 2.7 Two Patterns or One Violation?
  - 2.8 Performance
  - 2.9 Encoding Order
  - 2.10 Inlining
  - 2.11 Related Work
  - 2.12 Conclusions
- Chapter 3: Analyzing Text in Software Projects
  - Abstract
  - 3.1 Introduction
  - 3.2 Textual Software Project Data and Retrieval
  - 3.3 Manual Coding
  - 3.4 Automated Analysis
  - 3.5 Two Industrial Studies
  - 3.6 Summary
- Chapter 4: Synthesizing Knowledge from Software Development Artifacts
  - Abstract
  - 4.1 Problem Statement
  - 4.2 Artifact Lifecycle Models
  - 4.3 Code Review
  - 4.4 Lifecycle Analysis
  - 4.5 Other Applications
  - 4.6 Conclusion
- Chapter 5: A Practical Guide to Analyzing IDE Usage Data
  - Abstract
  - Acknowledgments
  - 5.1 Introduction
  - 5.2 Usage Data Research Concepts
  - 5.3 How to Collect Data
  - 5.4 How to Analyze Usage Data
  - 5.5 Limits of What You Can Learn from Usage Data
  - 5.6 Conclusion
  - 5.7 Code Listings
- Chapter 6: Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
  - Abstract
  - 6.1 Introduction
  - 6.2 Applications of LDA in Software Analysis
  - 6.3 How LDA Works
  - 6.4 LDA Tutorial
  - 6.5 Pitfalls and Threats to Validity
  - 6.6 Conclusions
- Chapter 7: Tools and Techniques for Analyzing Product and Process Data
  - Abstract
  - 7.1 Introduction
  - 7.2 A Rational Analysis Pipeline
  - 7.3 Source Code Analysis
  - 7.4 Compiled Code Analysis
  - 7.5 Analysis of Configuration Management Data
  - 7.6 Data Visualization
  - 7.7 Concluding Remarks
Part 2: Data/Problem Focussed
- Chapter 8: Analyzing Security Data
  - Abstract
  - 8.1 Vulnerability
  - 8.2 Security Data “Gotchas”
  - 8.3 Measuring Vulnerability Severity
  - 8.4 Method of Collecting and Analyzing Vulnerability Data
  - 8.5 What Security Data has Told Us Thus Far
  - 8.6 Summary
- Chapter 9: A Mixed Methods Approach to Mining Code Review Data: Examples and a Study of Multicommit Reviews and Pull Requests
  - Abstract
  - 9.1 Introduction
  - 9.2 Motivation for a Mixed Methods Approach
  - 9.3 Review Process and Data
  - 9.4 Quantitative Replication Study: Code Review on Branches
  - 9.5 Qualitative Approaches
  - 9.6 Triangulation
  - 9.7 Conclusion
- Chapter 10: Mining Android Apps for Anomalies
  - Abstract
  - Acknowledgments
  - 10.1 Introduction
  - 10.2 Clustering Apps by Description
  - 10.3 Identifying Anomalies by APIs
  - 10.4 Evaluation
  - 10.5 Related Work
  - 10.6 Conclusion and Future Work
- Chapter 11: Change Coupling Between Software Artifacts: Learning from Past Changes
  - Abstract
  - 11.1 Introduction
  - 11.2 Change Coupling
  - 11.3 Change Coupling Identification Approaches
  - 11.4 Challenges in Change Coupling Identification
  - 11.5 Change Coupling Applications
  - 11.6 Conclusion
Part 3: Stories from the Trenches
- Chapter 12: Applying Software Data Analysis in Industry Contexts: When Research Meets Reality
  - Abstract
  - 12.1 Introduction
  - 12.2 Background
  - 12.3 Six Key Issues when Implementing a Measurement Program in Industry
  - 12.4 Conclusions
- Chapter 13: Using Data to Make Decisions in Software Engineering: Providing a Method to our Madness
  - Abstract
  - 13.1 Introduction
  - 13.2 Short History of Software Engineering Metrics
  - 13.3 Establishing Clear Goals
  - 13.4 Review of Metrics
  - 13.5 Challenges with Data Analysis on Software Projects
  - 13.6 Example of Changing Product Development Through the Use of Data
  - 13.7 Driving Software Engineering Processes with Data
- Chapter 14: Community Data for OSS Adoption Risk Management
  - Abstract
  - Acknowledgments
  - 14.1 Introduction
  - 14.2 Background
  - 14.3 An Approach to OSS Risk Adoption Management
  - 14.4 OSS Communities Structure and Behavior Analysis: The XWiki Case
  - 14.5 A Risk Assessment Example: The Moodbile Case
  - 14.6 Related Work
  - 14.7 Conclusions
- Chapter 15: Assessing the State of Software in a Large Enterprise: A 12-Year Retrospective
  - Abstract
  - Acknowledgments
  - 15.1 Introduction
  - 15.2 Evolution of the Process and the Assessment
  - 15.3 Impact Summary of the State of Avaya Software Report
  - 15.4 Assessment Approach and Mechanisms
  - 15.5 Data Sources
  - 15.6 Examples of Analyses
  - 15.7 Software Practices
  - 15.8 Assessment Follow-up: Recommendations and Impact
  - 15.9 Impact of the Assessments
  - 15.10 Conclusions
  - 15.11 Appendix
  - Author Biographies
- Chapter 16: Lessons Learned from Software Analytics in Practice
  - Abstract
  - 16.1 Introduction
  - 16.2 Problem Selection
  - 16.3 Data Collection
  - 16.4 Descriptive Analytics
  - 16.5 Predictive Analytics
  - 16.6 Road Ahead
Part 4: Advanced Topics
- Chapter 17: Code Comment Analysis for Improving Software Quality
  - Abstract
  - 17.1 Introduction
  - 17.2 Text Analytics: Techniques, Tools, and Measures
  - 17.3 Studies of Code Comments
  - 17.4 Automated Code Comment Analysis for Specification Mining and Bug Detection
  - 17.5 Studies and Analysis of API Documentation
  - 17.6 Future Directions and Challenges
- Chapter 18: Mining Software Logs for Goal-Driven Root Cause Analysis
  - Abstract
  - 18.1 Introduction
  - 18.2 Approaches to Root Cause Analysis
  - 18.3 Root Cause Analysis Framework Overview
  - 18.4 Modeling Diagnostics for Root Cause Analysis
  - 18.5 Log Reduction
  - 18.6 Reasoning Techniques
  - 18.7 Root Cause Analysis for Failures Induced by Internal Faults
  - 18.8 Root Cause Analysis for Failures due to External Threats
  - 18.9 Experimental Evaluations
  - 18.10 Conclusions
- Chapter 19: Analytical Product Release Planning
  - Abstract
  - Acknowledgments
  - 19.1 Introduction and Motivation
  - 19.2 Taxonomy of Data-intensive Release Planning Problems
  - 19.3 Information Needs for Software Release Planning
  - 19.4 The Paradigm of Analytical Open Innovation
    - Analysis phase
    - Synthesize phase
  - 19.5 Analytical Release Planning—A Case Study
  - 19.6 Summary and Future Research
  - 19.7 Appendix: Feature Dependency Constraints
Part 5: Data Analysis at Scale (Big Data)
- Chapter 20: Boa: An Enabling Language and Infrastructure for Ultra-Large-Scale MSR Studies
  - Abstract
  - 20.1 Objectives
  - 20.2 Getting Started with Boa
  - 20.3 Boa’s Syntax and Semantics
  - 20.4 Mining Project and Repository Metadata
  - 20.5 Mining Source Code with Visitors
  - 20.6 Guidelines for Replicable Research
  - 20.7 Conclusions
  - 20.8 Practice Problems
    - Project and Repository Metadata Problems
    - Source Code Problems
- Chapter 21: Scalable Parallelization of Specification Mining Using Distributed Computing
  - Abstract
  - 21.1 Introduction
  - 21.2 Background
  - 21.3 Distributed Specification Mining
  - 21.4 Implementation and Empirical Evaluation
  - 21.5 Related Work
  - 21.6 Conclusion and Future Work

About the editors

Christian Bird

Tim Menzies

Thomas Zimmermann
