Testing Data with Data Quality Services (MS SQL 12)
Presentation of Dmitriy Romanov's talk at the SQADays-14 conference, Lviv, 8-9 November 2013
Quality Assurance for Data with Data Quality Services (MS SQL 12)
Dmitriy Romanov, Itera Consulting, Kiev
Dmitriy Romanov
Areas of expertise:
Test automation for various projects in:
• Business Intelligence
• RIA
• Billing systems
Agenda
• Intro
  – Data Quality: what is it about?
  – Data Quality in Business Intelligence projects
  – Tool selection
• Data Quality Services
  – Structure
  – Project component
  – Data Quality routine
• Conclusions
Typical information flow
Data Quality Components
• Validity
• Accuracy
• Consistency
• Integrity
• Timeliness
• Completeness
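Several of these components can be expressed as simple ratios over a data set. The sketch below is a minimal plain-Python illustration, not part of DQS: it measures completeness (share of non-empty values) and validity (share of values that pass a rule). The field names, the sample records, and the e-mail pattern are hypothetical.

```python
import re
from typing import Callable, Optional

# A record maps field names to values; None marks a missing value (assumption).
Record = dict[str, Optional[str]]

def is_filled(value: Optional[str]) -> bool:
    """A value counts as present when it is neither None nor an empty string."""
    return value is not None and value != ""

def completeness(records: list[Record], field: str) -> float:
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 1.0  # an empty data set has nothing missing
    return sum(1 for r in records if is_filled(r.get(field))) / len(records)

def validity(records: list[Record], field: str, rule: Callable[[str], bool]) -> float:
    """Share of the non-empty values of `field` that satisfy `rule`."""
    values = [r[field] for r in records if is_filled(r.get(field))]
    if not values:
        return 1.0  # no values that could violate the rule
    return sum(1 for v in values if rule(v)) / len(values)

# Hypothetical validity rule: a deliberately simple e-mail shape check.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_email(value: str) -> bool:
    return EMAIL.fullmatch(value) is not None

if __name__ == "__main__":
    rows: list[Record] = [
        {"name": "Ann", "email": "ann@example.com"},
        {"name": "Bob", "email": "bob@example"},
        {"name": "Eve", "email": None},
        {"name": None, "email": "eve@example.org"},
    ]
    print("email completeness:", completeness(rows, "email"))  # 0.75
    print("email validity:", round(validity(rows, "email", is_email), 3))  # 0.667
```

Keeping each dimension as a separate ratio makes it easy to track them independently over time, which is the kind of measurement the talk argues DQ work needs.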
Data Quality Issues
Before QA:
After QA:
Data Quality: What is it?
Business intelligence (BI) is a set of methodologies, processes, and technologies that
transform raw data into meaningful and useful information for business purposes.
Data Quality represents the degree to which data is suitable for business use.
Data Quality: Tool selection
Custom tools
PROS:
• Variety of technologies
• Flexibility
• Accuracy
CONS:
• Requires a higher competence level in the business area / tech stack
• Lots of development effort
3rd-party software
PROS:
• Established methods, standards, algorithms
• Open / expandable / reusable
• Lower entry level for newcomers
CONS:
• Scalability / performance issues
• Limitations
Gartner Magic Quadrant for BI platforms (figure: quadrants Challengers, Leaders, Niche Players, Visionaries; axes: Completeness of Vision, Ability to Execute)
Data Quality: tasks
Data Quality Services (DQS) is a knowledge-driven data quality solution that enables data stewards to easily improve the quality of their data.
• Cleansing
• Matching
• Profiling
• Monitoring
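Profiling usually starts with a few per-column statistics. Below is a minimal plain-Python sketch (not the DQS profiler) that reports the row count, null count, number of distinct values, and the most frequent value. The sample column is hypothetical; note how two spellings of the same city inflate the distinct count, which is exactly what cleansing and matching later address.

```python
from collections import Counter
from typing import Any, Iterable, Optional

def profile_column(values: Iterable[Any]) -> dict[str, Any]:
    """Basic profile of one column; None is treated as NULL."""
    vals = list(values)
    non_null = [v for v in vals if v is not None]
    counts = Counter(non_null)
    top: Optional[tuple[Any, int]] = counts.most_common(1)[0] if counts else None
    return {
        "rows": len(vals),
        "nulls": len(vals) - len(non_null),
        "distinct": len(counts),
        "top": top,  # (most frequent value, its count), or None for an all-NULL column
    }

if __name__ == "__main__":
    city = ["Kiev", "Kyiv", "Kiev", None, "Lviv"]
    print(profile_column(city))
    # {'rows': 5, 'nulls': 1, 'distinct': 3, 'top': ('Kiev', 2)}
```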
DQS: Knowledge base creation process
(Diagram: a Build → Use cycle around the Knowledge Base. Knowledge Management: Discover / Explore Data, Manage Knowledge, Matching, Reference Data. DQ Projects: Correct & Standardize, Match & De-dupe. Connect: Enterprise Data, Reference Data, Cloud Services. Feedback: Notifications, Progress, Status.)
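The "Match & De-dupe" activity pairs up records that probably describe the same entity. DQS does this with a matching policy defined in the knowledge base; the sketch below swaps in a much simpler technique, pairwise string similarity from Python's standard `difflib`, purely to show the idea. The 0.85 threshold and the sample names are arbitrary assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_duplicates(names: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return every pair whose similarity reaches `threshold`, most similar first.

    All pairs are compared (O(n^2)), which is fine for a sketch but not for large tables.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs

if __name__ == "__main__":
    companies = ["Itera Consulting", "ITERA Consulting", "Itera Consultng", "Acme Ltd"]
    for a, b, score in candidate_duplicates(companies):
        print(f"{a!r} ~ {b!r}: {score:.2f}")
```

The output is only a list of candidates; in a DQS-style process a data steward would still review them before records are merged.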
DQS Structure
(Diagram. DQ Clients: DQS User Interface, SSIS DQ Component, 3rd Party / Internal. DQ Server: DQ Engine (Knowledge Discovery, Data Profiling & Exploration, Cleansing), DQ Projects Store (DQ Active Projects), Knowledge Base Store (Published KBs, Data Domains), Common Knowledge Store. Reference data: Azure Market Place, Reference Data API (Browse, Get, Update…), RD Services API (Browse, Set, Validate…).)
DQS Usage
(Diagram. Knowledge Base: Reference Data Definition, Values/Rules. Cleansing result statuses: New, Suggestions, Correct & Corrected, Invalid. SSIS Package: Source → DQS Cleansing Component → Destination, using the DQS Server and Reference Data Services. Lifecycle: Design → Run → Monitor → Review & Manage, supported by Activity Monitoring and the Interactive Cleansing Project.)
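The cleansing statuses on this slide (Correct, Corrected, Suggestions, New, Invalid) can be illustrated with a toy knowledge-base domain. This is a simplified, hypothetical model in plain Python, not how the DQS engine works internally: a domain holds known valid values, synonyms that map variants to a leading value, known invalid values, and one domain rule. Unknown values get a fuzzy suggestion via `difflib.get_close_matches` or are reported as new.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches
from typing import Callable

def _always_valid(value: str) -> bool:
    return True

@dataclass
class Domain:
    """Simplified, hypothetical stand-in for one knowledge-base domain."""
    valid_values: set[str]
    synonyms: dict[str, str] = field(default_factory=dict)  # variant -> leading value
    invalid_values: set[str] = field(default_factory=set)
    rule: Callable[[str], bool] = _always_valid  # domain rule every value must pass

def cleanse(value: str, domain: Domain, cutoff: float = 0.8) -> tuple[str, str]:
    """Return (status, output value) for one source value."""
    if value in domain.invalid_values or not domain.rule(value):
        return "Invalid", value
    if value in domain.valid_values:
        return "Correct", value
    if value in domain.synonyms:
        return "Corrected", domain.synonyms[value]
    # Fuzzy lookup against known values: only a proposal for a data steward to review.
    close = get_close_matches(value, sorted(domain.valid_values), n=1, cutoff=cutoff)
    if close:
        return "Suggested", close[0]
    return "New", value  # unknown value: a candidate for extending the knowledge base

if __name__ == "__main__":
    city = Domain(
        valid_values={"Kyiv", "Lviv", "Kharkiv"},
        synonyms={"Kiev": "Kyiv"},
        invalid_values={"N/A"},
        rule=lambda v: v.strip() != "",  # hypothetical rule: no blank values
    )
    for raw in ["Lviv", "Kiev", "Kharkov", "Odesa", "N/A", " "]:
        print(repr(raw), cleanse(raw, city))
```

Values that end up as "New" or "Suggested" are what a steward reviews in an interactive cleansing project; accepted ones feed back into the knowledge base, which is the iterative loop the slides describe.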
Real Examples
Business Case – Source Data Quality Assurance
(Diagram. Flow: Source Data (Oracle, DB2, csv) → Screen (DQS) → Confirm status “Ready to load” → Load (ETL) → KDVH. Supporting roles: DQ Reports; data steward requesting source data fixes; data steward managing the data KB and monitoring the DQ process.)
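The "Ready to load" confirmation in this flow can be thought of as a gate over per-record screening results. The talk does not say what criterion was used, so the sketch below assumes one: a batch passes when its share of "Invalid" records stays at or below a threshold. The function name, the 1% default, and the "Needs source fix" label are all hypothetical.

```python
from collections import Counter
from typing import Iterable

def screen_batch(statuses: Iterable[str], max_invalid_ratio: float = 0.01) -> tuple[str, Counter]:
    """Decide whether a screened batch may be passed on to ETL.

    `statuses` holds one cleansing status per record (e.g. "Correct", "Invalid").
    The default threshold and the "Needs source fix" label are assumptions.
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    # An empty batch has no invalid records, so its ratio is 0.
    invalid_ratio = counts["Invalid"] / total if total else 0.0
    decision = "Ready to load" if invalid_ratio <= max_invalid_ratio else "Needs source fix"
    return decision, counts

if __name__ == "__main__":
    batch = ["Correct"] * 95 + ["Corrected"] * 3 + ["Invalid"] * 2
    decision, counts = screen_batch(batch)
    print(decision, dict(counts))
```

Returning the status counts together with the decision gives the data steward the numbers for a DQ report when a source fix has to be requested.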
How can DQS help a QA engineer?
• In general, it brings QA closer to the things data analytics usually deals with
• Helps to understand the underlying data better
• Introduces measurement and manageability to DQ matters
• Increases reuse / decreases rework
• An open and extendable proposal for a new standard for storing and maintaining knowledge bases on an iterative basis
Thank you