Measuring Data Quality for Ongoing Improvement
A Data Quality Assessment Framework
- 1st Edition - December 31, 2012
- Author: Laura Sebastian-Coleman
- Language: English
- Paperback ISBN: 978-0-12-397033-6
- eBook ISBN: 978-0-12-397754-0
The Data Quality Assessment Framework shows you how to measure and monitor data quality, ensuring quality over time. You’ll start with general concepts of measurement and work your way through a detailed framework of more than three dozen measurement types related to five objective dimensions of quality: completeness, timeliness, consistency, validity, and integrity. Ongoing measurement, rather than one-time activities, will help your organization reach a new level of data quality.

This plain-language approach to measuring data can be understood by both business and IT. It provides practical guidance on how to apply the DQAF within any organization, enabling you to prioritize measurements and report effectively on results. Strategies for using data measurement to govern and improve the quality of data, and guidelines for applying the framework within a data asset, are included. You’ll come away able to prioritize which measurement types to implement, knowing where to place them in a data flow and how frequently to measure.

Also included are common conceptual models for defining and storing data quality results for trend analysis, as well as generic business requirements for ongoing measurement and monitoring, including the calculations and comparisons that make the measurements meaningful and help you understand trends and detect anomalies.
- Demonstrates how to leverage a technology-independent data quality measurement framework for your specific business priorities and data quality challenges
- Enables discussions between business and IT with a non-technical vocabulary for data quality measurement
- Describes how to measure data quality on an ongoing basis with generic measurement types that can be applied to any situation
Data quality engineers, managers, and analysts; application program managers and developers; data stewards; data managers and analysts; compliance analysts; business intelligence professionals; database designers and administrators; business and IT managers
Dedication
Acknowledgments
Foreword
Author Biography
Introduction: Measuring Data Quality for Ongoing Improvement
Data Quality Measurement: the Problem we are Trying to Solve
Recurring Challenges in the Context of Data Quality
DQAF: the Data Quality Assessment Framework
Overview of Measuring Data Quality for Ongoing Improvement
Intended Audience
What Measuring Data Quality for Ongoing Improvement Does Not Do
Why I Wrote Measuring Data Quality for Ongoing Improvement
Section 1. Concepts and Definitions
Chapter 1. Data
Purpose
Data
Data as Representation
Data as Facts
Data as a Product
Data as Input to Analyses
Data and Expectations
Information
Concluding Thoughts
Chapter 2. Data, People, and Systems
Purpose
Enterprise or Organization
IT and the Business
Data Producers
Data Consumers
Data Brokers
Data Stewards and Data Stewardship
Data Owners
Data Ownership and Data Governance
IT, the Business, and Data Owners, Redux
Data Quality Program Team
Stakeholder
Systems and System Design
Concluding Thoughts
Chapter 3. Data Management, Models, and Metadata
Purpose
Data Management
Database, Data Warehouse, Data Asset, Dataset
Source System, Target System, System of Record
Data Models
Types of Data Models
Physical Characteristics of Data
Metadata
Metadata as Explicit Knowledge
Data Chain and Information Life Cycle
Data Lineage and Data Provenance
Concluding Thoughts
Chapter 4. Data Quality and Measurement
Purpose
Data Quality
Data Quality Dimensions
Measurement
Measurement as Data
Data Quality Measurement and the Business/IT Divide
Characteristics of Effective Measurements
Data Quality Assessment
Data Quality Dimensions, DQAF Measurement Types, Specific Data Quality Metrics
Data Profiling
Data Quality Issues and Data Issue Management
Reasonability Checks
Data Quality Thresholds
Process Controls
In-line Data Quality Measurement and Monitoring
Concluding Thoughts
Section 2. DQAF Concepts and Measurement Types
Chapter 5. DQAF Concepts
Purpose
The Problem the DQAF Addresses
Data Quality Expectations and Data Management
The Scope of the DQAF
DQAF Quality Dimensions
Defining DQAF Measurement Types
Metadata Requirements
Objects of Measurement and Assessment Categories
Functions in Measurement: Collect, Calculate, Compare
Concluding Thoughts
Chapter 6. DQAF Measurement Types
Purpose
Consistency of the Data Model
Ensuring the Correct Receipt of Data for Processing
Inspecting the Condition of Data upon Receipt
Assessing the Results of Data Processing
Assessing the Validity of Data Content
Assessing the Consistency of Data Content
Comments on the Placement of In-line Measurements
Periodic Measurement of Cross-table Content Integrity
Assessing Overall Database Content
Assessing Controls and Measurements
The Measurement Types: Consolidated Listing
Concluding Thoughts
Section 3. Data Assessment Scenarios
Purpose
Assessment Scenarios
Metadata: Knowledge before Assessment
Chapter 7. Initial Data Assessment
Purpose
Initial Assessment
Input to Initial Assessments
Data Expectations
Data Profiling
Column Property Profiling
Structure Profiling
Profiling an Existing Data Asset
From Profiling to Assessment
Deliverables from Initial Assessment
Concluding Thoughts
Chapter 8. Assessment in Data Quality Improvement Projects
Purpose
Data Quality Improvement Efforts
Measurement in Improvement Projects
Chapter 9. Ongoing Measurement
Purpose
The Case for Ongoing Measurement
Example: Health Care Data
Inputs for Ongoing Measurement
Criticality and Risk
Automation
Controls
Periodic Measurement
Deliverables from Ongoing Measurement
In-Line versus Periodic Measurement
Concluding Thoughts
Section 4. Applying the DQAF to Data Requirements
Context
Chapter 10. Requirements, Risk, Criticality
Purpose
Business Requirements
Data Quality Requirements and Expected Data Characteristics
Data Quality Requirements and Risks to Data
Factors Influencing Data Criticality
Specifying Data Quality Metrics
Concluding Thoughts
Chapter 11. Asking Questions
Purpose
Asking Questions
Understanding the Project
Learning about Source Systems
Your Data Consumers’ Requirements
The Condition of the Data
The Data Model, Transformation Rules, and System Design
Measurement Specification Process
Concluding Thoughts
Section 5. A Strategic Approach to Data Quality
Chapter 12. Data Quality Strategy
Purpose
The Concept of Strategy
Systems Strategy, Data Strategy, and Data Quality Strategy
Data Quality Strategy and Data Governance
Decision Points in the Information Life Cycle
General Considerations for Data Quality Strategy
Concluding Thoughts
Chapter 13. Directives for Data Quality Strategy
Purpose
Directive 1: Obtain Management Commitment to Data Quality
Directive 2: Treat Data as an Asset
Directive 3: Apply Resources to Focus on Quality
Directive 4: Build Explicit Knowledge of Data
Directive 5: Treat Data as a Product of Processes that can be Measured and Improved
Directive 6: Recognize Quality is Defined by Data Consumers
Directive 7: Address the Root Causes of Data Problems
Directive 8: Measure Data Quality, Monitor Critical Data
Directive 9: Hold Data Producers Accountable for the Quality of their Data (and Knowledge about that Data)
Directive 10: Provide Data Consumers with the Knowledge they Require for Data Use
Directive 11: Data Needs and Uses will Evolve—Plan for Evolution
Directive 12: Data Quality Goes beyond the Data—Build a Culture Focused on Quality
Concluding Thoughts: Using the Current State Assessment
Section 6. The DQAF in Depth
Functions for Measurement: Collect, Calculate, Compare
Features of the DQAF Measurement Logical Data Model
Facets of the DQAF Measurement Types
Chapter 14. Functions of Measurement: Collection, Calculation, Comparison
Purpose
Functions in Measurement: Collect, Calculate, Compare
Collecting Raw Measurement Data
Calculating Measurement Data
Comparing Measurements to Past History
Statistics
The Control Chart: A Primary Tool for Statistical Process Control
The DQAF and Statistical Process Control
Concluding Thoughts
Chapter 15. Features of the DQAF Measurement Logical Model
Purpose
Metric Definition and Measurement Result Tables
Optional Fields
Denominator Fields
Automated Thresholds
Manual Thresholds
Emergency Thresholds
Manual or Emergency Thresholds and Results Tables
Additional System Requirements
Support Requirements
Concluding Thoughts
Chapter 16. Facets of the DQAF Measurement Types
Purpose
Facets of the DQAF
Organization of the Chapter
Measurement Type #1: Dataset Completeness—Sufficiency of Metadata and Reference Data
Measurement Type #2: Consistent Formatting in One Field
Measurement Type #3: Consistent Formatting, Cross-table
Measurement Type #4: Consistent Use of Default Value in One Field
Measurement Type #5: Consistent Use of Default Values, Cross-table
Measurement Type #6: Timely Delivery of Data for Processing
Measurement Type #7: Dataset Completeness—Availability for Processing
Measurement Type #8: Dataset Completeness—Record Counts to Control Records
Measurement Type #9: Dataset Completeness—Summarized Amount Field Data
Measurement Type #10: Dataset Completeness—Size Compared to Past Sizes
Measurement Type #11: Record Completeness—Length
Measurement Type #12: Field Completeness—Non-Nullable Fields
Measurement Type #13: Dataset Integrity—De-Duplication
Measurement Type #14: Dataset Integrity—Duplicate Record Reasonability Check
Measurement Type #15: Field Content Completeness—Defaults from Source
Measurement Type #16: Dataset Completeness Based on Date Criteria
Measurement Type #17: Dataset Reasonability Based on Date Criteria
Measurement Type #18: Field Content Completeness—Received Data is Missing Fields Critical to Processing
Measurement Type #19: Dataset Completeness—Balance Record Counts Through a Process
Measurement Type #20: Dataset Completeness—Reasons for Rejecting Records
Measurement Type #21: Dataset Completeness Through a Process—Ratio of Input to Output
Measurement Type #22: Dataset Completeness Through a Process—Balance Amount Fields
Measurement Type #23: Field Content Completeness—Ratio of Summed Amount Fields
Measurement Type #24: Field Content Completeness—Defaults from Derivation
Measurement Type #25: Data Processing Duration
Measurement Type #26: Timely Availability of Data for Access
Measurement Type #27: Validity Check, Single Field, Detailed Results
Measurement Type #28: Validity Check, Roll-up
Measurement Logical Data Model
Measurement Type #29: Validity Check, Multiple Columns within a Table, Detailed Results
Measurement Type #30: Consistent Column Profile
Measurement Type #31: Consistent Dataset Content, Distinct Count of Represented Entity, with Ratios to Record Counts
Measurement Type #32: Consistent Dataset Content, Ratio of Distinct Counts of Two Represented Entities
Measurement Type #33: Consistent Multicolumn Profile
Measurement Type #34: Chronology Consistent with Business Rules within a Table
Measurement Type #35: Consistent Time Elapsed (hours, days, months, etc.)
Measurement Type #36: Consistent Amount Field Calculations Across Secondary Fields
Measurement Type #37: Consistent Record Counts by Aggregated Date
Measurement Type #38: Consistent Amount Field Data by Aggregated Date
Measurement Type #39: Parent/Child Referential Integrity
Measurement Type #40: Child/Parent Referential Integrity
Measurement Type #41: Validity Check, Cross Table, Detailed Results
Measurement Type #42: Consistent Cross-table Multicolumn Profile
Measurement Type #43: Chronology Consistent with Business Rules across Tables
Measurement Type #44: Consistent Cross-table Amount Column Calculations
Measurement Type #45: Consistent Cross-table Amount Columns by Aggregated Dates
Measurement Type #46: Consistency Compared to External Benchmarks
Measurement Type #47: Dataset Completeness—Overall Sufficiency for Defined Purposes
Measurement Type #48: Dataset Completeness—Overall Sufficiency of Measures and Controls
Concluding Thoughts: Know Your Data
Glossary
Bibliography
Index
Online Materials
Appendix A. Measuring the Value of Data
Appendix B. Data Quality Dimensions
Purpose
Richard Wang’s and Diane Strong’s Data Quality Framework, 1996
Thomas Redman’s Dimensions of Data Quality, 1996
Larry English’s Information Quality Characteristics and Measures, 1999
Appendix C. Completeness, Consistency, and Integrity of the Data Model
Purpose
Process Input and Output
High-Level Assessment
Detailed Assessment
Quality of Definitions
Summary
Appendix D. Prediction, Error, and Shewhart’s Lost Disciple, Kristo Ivanov
Purpose
Limitations of the Communications Model of Information Quality
Error, Prediction, and Scientific Measurement
What Do We Learn from Ivanov?
Ivanov’s Concept of the System as Model
Appendix E. Quality Improvement and Data Quality
Purpose
A Brief History of Quality Improvement
Process Improvement Tools
Implications for Data Quality
Limitations of the Data as Product Metaphor
Concluding Thoughts: Building Quality in Means Building Knowledge in
- No. of pages: 376
- Imprint: Morgan Kaufmann