…optimise your IT investments
Data Discovery
Understanding data relationships
Philip Howard
Research Director – Bloor Research
Confidential © Bloor Research 2009
telling the Information Management story
Agenda
What are data relationships and why are they important?
Different approaches to discovering data relationships
Features you might look for in a data discovery tool
What is a data relationship?
1. A relationship between database tables, either within or across databases
2. A relationship within or across non-relational data sources
3. A relationship between a relational and non-relational source
Note that relationships may be complex and/or involve more than 2 elements
Why are data relationships important?
1. Data migration
2. Data archival
3. Master data management
4. Data governance
5. Data modelling
6. Business intelligence
7 & 8 & 9 & …
Data integration
Legacy migration
Data warehousing
…
Why are data relationships difficult?
No definition exists across multiple sources
Within a source many relationships are not explicit
Ownership of relationships is diverse
Many relationships are defined within application software and not in the data source
Data relationships in place
Different issues arise when you consider relationships within systems versus across systems
Data relationships within systems
Typical functions:
Identification of primary-foreign key pairs
Dependency analysis
Redundant columns
Usually provided through data profiling, which also provides error statistics
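As an illustration of how primary-foreign key pairs might be identified from the data alone, the following is a minimal sketch that assumes each table is held in memory as a dictionary of column name to list of values. The function names and the sample tables are hypothetical, and the test used (every non-null child value appears in a unique parent column) is only one simple heuristic, not a description of how any particular profiling tool works.

```python
from itertools import permutations


def is_unique(values):
    """A column is a candidate key if it has no nulls and no duplicates."""
    return None not in values and len(set(values)) == len(values)


def candidate_key_pairs(tables):
    """Return (child_table, child_col, parent_table, parent_col) tuples where
    every non-null child value is found in a unique column of another table."""
    columns = [(t, c, vals) for t, cols in tables.items() for c, vals in cols.items()]
    pairs = []
    for (ct, cc, cvals), (pt, pc, pvals) in permutations(columns, 2):
        if ct == pt or not is_unique(pvals):
            continue
        child_values = {v for v in cvals if v is not None}
        if child_values and child_values <= set(pvals):
            pairs.append((ct, cc, pt, pc))
    return pairs


# Hypothetical sample data: the column names give no hint of the relationship.
tables = {
    "customer": {"cust_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {"order_no": [10, 11, 12, 13], "cust": [1, 1, 3, 2]},
}
print(candidate_key_pairs(tables))
```

On real data a check like this would return candidates for a person to confirm, since small integer columns often satisfy the inclusion test by coincidence.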
Data relationships across systems
Requirement for relationship discovery
No requirement for error statistics
Requirement to report rule violations where these represent a violation of a cross-source relationship
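One simple kind of cross-source rule violation is a record in one system whose key has no counterpart in another. The sketch below assumes both sources have already been extracted into lists of dictionaries. The source names, field names and the function itself are hypothetical and chosen only for illustration.

```python
def cross_source_violations(child_rows, child_key, parent_rows, parent_key):
    """Return child rows whose key value does not exist in the parent source."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[child_key] not in parent_keys]


# Hypothetical extracts from two separate systems.
crm_customers = [{"custID": "C1"}, {"custID": "C2"}]
billing_invoices = [
    {"invoice": 1, "cust#": "C1"},
    {"invoice": 2, "cust#": "C9"},  # no such customer in the CRM extract
]

violations = cross_source_violations(billing_invoices, "cust#", crm_customers, "custID")
print(violations)
```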
Specific requirements
For MDM – overlap & precedence analysis, transformation & business rules and exceptions, outlier analysis, matching keys
For data migration & archival – business entities
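To make the MDM overlap analysis above more concrete, here is a small sketch that counts how many key values two sources share and how many are unique to each. It assumes the keys have already been matched to a common format. The function name and sample keys are hypothetical, and precedence analysis (deciding which source wins for shared records) is not covered.

```python
def overlap_report(source_a_keys, source_b_keys):
    """Count keys found only in A, only in B, and in both sources."""
    a, b = set(source_a_keys), set(source_b_keys)
    return {"only_a": len(a - b), "only_b": len(b - a), "both": len(a & b)}


# Hypothetical customer keys from two systems.
report = overlap_report(["C1", "C2", "C3"], ["C2", "C3", "C4", "C5"])
print(report)
```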
General functions
Automation of MDM and profiling functions
Visualisation of relationships
Semantics: the semantic type of the data, e.g. zip code
Context-free discovery – e.g. recognising that cust# is equivalent to custID
Data classification: recognising the relationship between a pre-defined, business-user-maintained domain of values and the actual content of a field in order to identify the content of a field as well as unexpected values.
Business glossary
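The data classification function listed above can be sketched as a comparison between a field's content and a set of pre-defined value domains. In this minimal example the domains, the 0.8 threshold and the function name are all assumptions made for illustration. A real tool would let business users maintain the domains and would likely use richer matching.

```python
def classify_field(values, domains, threshold=0.8):
    """Return (best_domain_name, unexpected_values) for a field, or (None, [])
    if no domain covers at least `threshold` of the non-null values."""
    non_null = [v for v in values if v is not None]
    best_name, best_share = None, 0.0
    for name, domain in domains.items():
        share = sum(v in domain for v in non_null) / len(non_null) if non_null else 0.0
        if share > best_share:
            best_name, best_share = name, share
    if best_name is None or best_share < threshold:
        return None, []
    unexpected = [v for v in non_null if v not in domains[best_name]]
    return best_name, unexpected


# Hypothetical business-maintained domains of values.
domains = {
    "country_code": {"GB", "US", "FR", "DE"},
    "currency": {"GBP", "USD", "EUR"},
}
field_values = ["GB", "US", "FR", "GB", "DE", "XX"]
print(classify_field(field_values, domains))
```

The same call identifies the content of the field (here, country codes) and flags the values that fall outside the domain.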
Tools Landscape
Conclusion
Understanding data relationships across data sources is important in many data management disciplines
There are relatively few tools that are good at discovering such relationships – moreover, data discovery is a broad discipline and no one tool is good at all aspects of relationship discovery.