TRANSCRIPT
7061 S. University Blvd Centennial, CO 80122 303-790-0919 www.energyiq.info
© 2013. EnergyIQ, Inc. All rights reserved.
Data Quality and the PPDM Business Rules
Steve Cooper: President
Background
• The PPDM Business Rules initiative provides a platform for sharing data quality rules
Background
• The rules by themselves are only part of the solution
• We need to also establish a consistent process for applying the rules:
– Identify the most valuable data based upon an analysis of workflows and decisions
– Apply the dimensions of data quality
– Develop the rules to provide a quantitative assessment of the quality of the data that we care about
– Manage and run the rules effectively
– Present the results so that critical trends and problems can be easily identified
Background
• Establishing a consistent process for applying data quality rules is the focus of this presentation
• Most companies do not follow an established process
Data Value
• Data quality initiatives are expensive and can be overwhelming
• We need to focus resources on delivering the most value to the organization
• This can be achieved by assigning a value to data based upon an assessment of business needs:
– Workflows
– Processes
– Decisions
Data Value
• Assign a value to data based upon a scale:
– Level 1: Critical
– Level 2: Important
– Level 3: Useful
– Level 4: Supportive
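One way to make the scale operational is to record it as an ordered type, so that data objects can be ranked before any quality work starts. The sketch below is illustrative only: the `DataValue` name and the object-to-level assignments are assumptions, not values from the presentation.

```python
from enum import IntEnum


class DataValue(IntEnum):
    """Value scale from the slide; a lower number means higher value."""
    CRITICAL = 1
    IMPORTANT = 2
    USEFUL = 3
    SUPPORTIVE = 4


# Hypothetical assignments, for illustration only.
data_values = {
    "directional_survey": DataValue.IMPORTANT,
    "well_location": DataValue.CRITICAL,
    "well_remarks": DataValue.SUPPORTIVE,
}

# Rank data objects so quality effort goes to the most valuable first.
ranked = sorted(data_values, key=lambda name: data_values[name])
print(ranked)
```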
Data Quality Dimensions
• We need to be clear on what we mean by data quality
• Typically data quality is defined and measured along a number of different dimensions:
– Accuracy
– Timeliness
– Completeness
– Currency
– Consistency
– Standards
• We can establish quality requirements for the most valuable data along these dimensions
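A value and quality matrix of this kind could be held as a simple lookup from data object and dimension to a target. This is a minimal sketch under stated assumptions: the object names, the target numbers, and the `requirement` helper are hypothetical and only show the shape of the structure.

```python
from typing import Optional

# Dimensions listed on the slide.
DIMENSIONS = ("accuracy", "timeliness", "completeness",
              "currency", "consistency", "standards")

# Hypothetical minimum quality targets (fractions of 1.0) per data object.
# Dimensions left out are treated as having no stated requirement.
requirements = {
    "well_location": {"accuracy": 0.99, "completeness": 0.98},
    "well_test": {"completeness": 0.90, "timeliness": 0.80},
}


def requirement(data_object: str, dimension: str) -> Optional[float]:
    """Return the target for one cell of the matrix, or None if unset."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    return requirements.get(data_object, {}).get(dimension)
```

For example, `requirement("well_location", "accuracy")` looks up a single cell, and an unset cell comes back as `None` rather than a default target.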
Data Quality Rules
• Once the data value and quality matrix has been established it provides the framework for building the rules library
• The PPDM Business Rules initiative will provide a comprehensive list of quality rules in the form of a definition and supporting information
• The rules need to be translated into a format that can be executed to return a quantitative assessment of data quality:
– Combine rules
– Target different databases, subsets of a database
– Process automatically or manually
Data Quality Rules
• Individual rules can be created in standard SQL and stored in the PPDM data model
• Rules should return a Quality % and list of exceptions:
  Quality % = 1 - (Exception Count / Population Count)
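As a rough illustration of a rule written in standard SQL that returns both a quality figure and its exceptions, the sketch below runs against an in-memory SQLite database. The `well` table, its columns, the latitude rule, and the `run_rule` helper are simplified assumptions for the example and are not the actual PPDM schema or rule definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for a well table; not the actual PPDM schema.
conn.execute("CREATE TABLE well (uwi TEXT PRIMARY KEY, surface_latitude REAL)")
conn.executemany(
    "INSERT INTO well VALUES (?, ?)",
    [("W1", 39.6), ("W2", None), ("W3", 95.2), ("W4", 40.1)],
)

# Hypothetical rule: latitude must be present and between -90 and 90.
population_sql = "SELECT COUNT(*) FROM well"
exception_sql = """
    SELECT uwi FROM well
    WHERE surface_latitude IS NULL
       OR surface_latitude NOT BETWEEN -90 AND 90
    ORDER BY uwi
"""


def run_rule(conn, population_sql, exception_sql):
    """Return (quality fraction, list of exception keys) for one rule."""
    population = conn.execute(population_sql).fetchone()[0]
    exceptions = [row[0] for row in conn.execute(exception_sql)]
    if population == 0:
        return None, exceptions  # quality is undefined for an empty population
    return 1 - len(exceptions) / population, exceptions


quality, exceptions = run_rule(conn, population_sql, exception_sql)
print(f"Quality: {quality:.0%}, exceptions: {exceptions}")
```

Here two of the four wells fail the rule (one missing latitude, one out of range), so the quality fraction is 1 - 2/4 = 0.5. Keeping the population query separate from the exception query means the same denominator can be reused when a rule is restricted to a subset.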
Data Quality Rules
• Rule Sets combine individual rules:
– Well Header
– Well Test
– Dates and Elevations
– …
• They can be run against a target subset of the database:
– State or County
– Formation
– Rig Operator
– …
• This combination of Rule, Rule Set, and Target enables sophisticated data quality analysis to be performed:
– Results can be stored in the PPDM database
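The Rule, Rule Set, and Target combination can be sketched in a few lines. Everything named here (`Rule`, `run_rule_set`, the two example rules, and the record fields) is a hypothetical illustration working on in-memory records, not the PPDM storage model or the presenter's implementation.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict  # one well record as a plain dict, for illustration


@dataclass
class Rule:
    name: str
    passes: Callable[[Record], bool]


def run_rule_set(rules, records, target=lambda r: True):
    """Run every rule against the records selected by the target filter."""
    population = [r for r in records if target(r)]
    results = {}
    for rule in rules:
        exceptions = [r["uwi"] for r in population if not rule.passes(r)]
        quality = 1 - len(exceptions) / len(population) if population else None
        results[rule.name] = (quality, exceptions)
    return results


# Hypothetical "Well Header" rule set.
well_header = [
    Rule("has_operator", lambda r: bool(r.get("operator"))),
    Rule("has_spud_date", lambda r: r.get("spud_date") is not None),
]

wells = [
    {"uwi": "W1", "state": "CO", "operator": "Acme", "spud_date": "2010-05-01"},
    {"uwi": "W2", "state": "CO", "operator": "", "spud_date": "2011-02-11"},
    {"uwi": "W3", "state": "TX", "operator": "Acme", "spud_date": None},
]

# Target: only wells in one state.
results = run_rule_set(well_header, wells, target=lambda r: r["state"] == "CO")
```

Because the target is just a filter applied before the rules run, the same rule set can be reused unchanged for a different state, county, or formation.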
Data Quality Results
• Establish acceptable thresholds and ranges
• Set meaningful targets for data vendors
• Assign a value to data in an acquisition
• Begin to treat data as an asset
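Thresholds and ranges could be applied to a rule's quality figure with a small helper such as the one below. The `classify` function, the band names, and the default cut-offs are placeholders chosen for the example; the presentation does not state specific values.

```python
def classify(quality, acceptable=0.95, warning=0.85):
    """Map a quality fraction onto a simple status band.

    The default thresholds are placeholders, not recommended values.
    """
    if quality >= acceptable:
        return "acceptable"
    if quality >= warning:
        return "warning"
    return "below target"


for score in (0.97, 0.90, 0.50):
    print(score, classify(score))
```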
Summary
[Diagram labels: Business Workflows, Decision Points, Data Requirements, Data Value, Data Quality, Quality Rules, Metrics; Business Analysis, Data Analysis (PPDM), Fix/Audit]
Summary
• The PPDM Business Rules initiative provides a great foundation for any data quality initiative
• To be successful, however, a consistent and robust process must be adopted for developing and executing data quality rules
• The process must include an analysis of the needs of the business and the corresponding value of the data
• We must be able to effectively manage large numbers of rules and how they are executed
• Data quality visualization is important
• The PPDM data model is a great place to store the data quality rules, exceptions, and results
7061 S. University Blvd Centennial, CO 80122 303-790-0919 www.energyiq.info
Steve Cooper Ph.D. Principal EnergyIQ
Questions?
Data Value: Data Objects
• It is difficult to think about data attributes in isolation
• It makes more sense to think in terms of data objects about related information:
– Well location
– Depths and elevations
– Directional surveys
– Tests
• This ties in to the concept of the Common Object Model
– See PPDM Houston conference