Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Upload: pointmarc

Post on 10-May-2015

DESCRIPTION

We live at the intersection of data and people. Data integrity is a function of the decisions that people make throughout the data lifecycle. Dave De Noia, Pointmarc lead solution architect in data management, gives his take on the processes and people that affect data integrity throughout organizations at DRIVE 2014 (Data, Reporting, Intelligence, and Visualization Exchange). Whether you're a retailer merging web analytics data with offline numbers or a healthcare company adding new data management software, De Noia explains how to avoid logic wobble and establish shared data structures.

About Dave: Dave De Noia lives in the balance of chaos and order inherent to working with data. Starting his career at Microsoft building analyses in both SQL and big data environments, Dave later moved on to Redfin, where he created and managed data infrastructure for analysis and reporting projects. Dave now serves as the senior solution and data architect at Pointmarc, a Bellevue-based digital analytics consultancy, where he helps some of the world’s largest brands get value from their data. Naturally functioning as a bridge between business and technical teams, Dave’s professional passion lies at the intersection of data and people.

About Pointmarc: Pointmarc is a leading digital analytics agency providing actionable marketing insight and analytics platform instrumentation services for Fortune 500 clients within retail, technology, financial, media and pharmaceutical industries. With offices in Seattle, Boston, San Francisco and Portland, Pointmarc’s immersive approach to analytics empowers businesses to dive deeper into their data. Email [email protected] for more information on data management or analytics instrumentation, and follow @pointmarc on Twitter for the latest in analytics.

TRANSCRIPT

Page 1: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle
Page 2: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

MANAGING DATA INTEGRITY THROUGHOUT THE DATA LIFECYCLE

[Diagram: the data lifecycle stages (Source Data, Analytics Root, Access, Reporting), repeated several times]

Page 3: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

With so many people making so many decisions, how do you maintain integrity in this environment?

Page 4: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

We live at the intersection of data and people.

Data integrity is a function of the decisions that people make throughout the data lifecycle.

Page 5: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle
Page 6: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Arc of an analytics team.

An analyst, a blob of data and a question

An analyst, a blob of data and 2 questions

Two analysts, a blob of data and 5 questions

Three analysts, 2 blobs of data and 10 questions

Five analysts, 4 blobs of data and countless questions

[Chart labels: complexity, frustration]

Page 7: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Eventually, your questions and answers become a worldview.

How consistent is that worldview?

Page 8: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Sample Question: How many clients are we working with?

• Must create and employ logic.
• Three dimensions of your logic
  • Value creation
  • Attribution*
  • Filtering

CLIENT REPORT
Clients: 5,788

Page 9: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Now, let’s add some complexity to our question.

• Clients can be active in more than one product or service line
• New logic is created and employed
• How many people is the company working with now? Which report are you looking at?

CLIENT REPORT
Clients:
  Product 1: 4,563
  Product 2: 2,127
  Product 3: 1,294

Best Practice: Make data accessible in multiple environments, but be sure to maintain your logic across contexts.
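A minimal sketch of why the two reports can disagree: the three product-line figures above sum to 7,984, more than the 5,788 in the earlier report, which is consistent with some clients being active in more than one line. The records and function names below are hypothetical and are not data from the talk.

```python
from collections import defaultdict

# Hypothetical toy records of (client_id, product); not data from the talk.
activity = [
    ("c1", "Product 1"),
    ("c1", "Product 2"),
    ("c2", "Product 1"),
    ("c3", "Product 3"),
    ("c3", "Product 1"),
]


def clients_per_product(rows):
    """Count distinct clients within each product line."""
    seen = defaultdict(set)
    for client_id, product in rows:
        seen[product].add(client_id)
    return {product: len(ids) for product, ids in seen.items()}


def distinct_clients(rows):
    """Count each client once, no matter how many product lines they use."""
    return len({client_id for client_id, _ in rows})


per_product = clients_per_product(activity)
total_distinct = distinct_clients(activity)
sum_of_products = sum(per_product.values())

# The per-product sum double counts clients active in several lines.
print(per_product, total_distinct, sum_of_products)
```

Both numbers are defensible answers to "how many clients?"; they simply apply different logic, which is why the logic has to travel with the number.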

Page 10: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Add in even more complexity to the original question

• An analyst creates a new dimension (‘client_need_segment’) that segments clients into ‘high need’ and ‘low need’ buckets.
• Only ‘high need’ clients are served by the customer service department…
• Customer service filters their reports to ‘high need’ clients
• This becomes their new definition of ‘clients’
• How far does their influence spread?

CLIENT REPORT
Clients:
  Product 1: 4,563 (High-need: 1,748, Low-need: 2,815)
  Product 2: 2,127 (High-need: 851, Low-need: 1,276)
  Product 3: 1,294 (High-need: 354, Low-need: 940)

Best Practice: Take taxonomy seriously. Terms with general, broadly categorical meanings like clients, customers, and conversion make poor metric names.
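One way to read the taxonomy advice is to give every filtered variant of "clients" its own explicit name, so a filtered count never travels under the general label. The sketch below assumes hypothetical rows and metric names; only the 'client_need_segment' dimension and its two values come from the slide.

```python
# Hypothetical client rows; 'client_need_segment' is the dimension named on the slide.
clients = [
    {"client_id": "c1", "client_need_segment": "high need"},
    {"client_id": "c2", "client_need_segment": "low need"},
    {"client_id": "c3", "client_need_segment": "low need"},
    {"client_id": "c4", "client_need_segment": "high need"},
    {"client_id": "c5", "client_need_segment": "low need"},
]


def clients_all(rows):
    """Distinct clients with no filter applied."""
    return len({r["client_id"] for r in rows})


def clients_high_need(rows):
    """Distinct clients in the 'high need' segment only."""
    return len(
        {r["client_id"] for r in rows if r["client_need_segment"] == "high need"}
    )


# Each report label states its filter instead of the bare word 'clients'.
report = {
    "clients_all": clients_all(clients),
    "clients_high_need": clients_high_need(clients),
}
print(report)
```

With names like these, a customer service report showing `clients_high_need` cannot be mistaken for the company-wide client count.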

Page 11: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Problems of process: Logic Vetting & Logic Wobble

• Logic Vetting
  • Does the concept/metric make sense?
  • Is the logic implemented correctly?
  • How broadly applicable is the new concept/metric?
  • Best Practice: Metrics committees & code reviews.

• Logic Wobble
  • What environment did the analyst work in?
  • Where does the logic live?
  • What is the process to translate the logic to other scenarios and environments?
  • Best Practice: Move established logic into shared data structures.
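As an illustration of moving established logic into a shared structure, the sketch below keeps one metric definition in one place and has two different outputs call it, so the logic cannot drift between environments. The class, the metric name, the `is_active` field and both report functions are assumptions made for this example, not something described in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    """A single, shared home for a metric's name, meaning and logic."""

    name: str
    description: str
    predicate: Callable[[dict], bool]

    def count(self, rows):
        # Distinct clients among the rows that satisfy the metric's filter.
        return len({r["client_id"] for r in rows if self.predicate(r)})


# Hypothetical vetted metric; every consumer imports this one definition.
ACTIVE_CLIENTS = MetricDefinition(
    name="active_clients",
    description="Distinct clients flagged as active",
    predicate=lambda r: r["is_active"],
)

SAMPLE_ROWS = [
    {"client_id": "c1", "is_active": True},
    {"client_id": "c2", "is_active": False},
    {"client_id": "c3", "is_active": True},
]


def dashboard_tile(rows):
    """Dashboard output built from the shared definition."""
    return f"{ACTIVE_CLIENTS.name}: {ACTIVE_CLIENTS.count(rows)}"


def weekly_report(rows):
    """Report output built from the same shared definition."""
    return {"metric": ACTIVE_CLIENTS.name, "value": ACTIVE_CLIENTS.count(rows)}


print(dashboard_tile(SAMPLE_ROWS), weekly_report(SAMPLE_ROWS))
```

In practice the shared structure might be a database view or a governed table rather than Python code; the point of the sketch is only that the dashboard and the report cannot disagree, because neither one re-implements the filter.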

Page 12: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

DRIVE 2014 Conference

Page 13: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Optimizing your relationship with IT/Engineering/Vendor

• IT/Engineering/Vendor controls the data lifecycle until you access the data

• Decisions often have large downstream effects

• Your data lifecycle needs will change over time

[Diagram labels: Source Data, Analytics Root, Access, Dashboarding, General Analytics, Reporting]

Page 14: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Changes that I’ve seen flow downstream (with suboptimal results)

Changes you can blame engineering/IT/Vendor for
• Cookie synch changes
• Naming convention changes
• Logging changes
  • Event firing scenarios
  • Format
  • Human vs. system generated values

Changes that you should have known were coming
• Hacks
• Piggybacks

Changes in a less engineering-focused scenario
• Changes to the forms that feed your data
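Naming and format changes of the kind listed above can be caught early with a lightweight check at the point where analytics reads the data. This is a rough sketch under assumed field names and an assumed YYYY-MM-DD date format; none of these specifics come from the talk.

```python
import re

# Hypothetical contract for incoming records; adjust to the real feed.
EXPECTED_FIELDS = {"client_id", "event_name", "event_date"}
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    fields = set(record)

    missing = EXPECTED_FIELDS - fields
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")

    unexpected = fields - EXPECTED_FIELDS
    if unexpected:
        problems.append(f"unexpected fields: {sorted(unexpected)}")

    date = record.get("event_date")
    if date is not None and not DATE_PATTERN.match(str(date)):
        problems.append(f"event_date not in YYYY-MM-DD format: {date!r}")

    return problems


good = {"client_id": "c1", "event_name": "signup", "event_date": "2015-05-10"}
# A renamed field and a reformatted date, as might follow an upstream release.
changed = {"clientId": "c1", "event_name": "signup", "event_date": "05/10/2015"}

print(check_record(good))
print(check_record(changed))
```

A check like this does not prevent the upstream change, but it turns a silent shift in the numbers into a visible failure that can be raised with engineering.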

Page 15: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Differences that lead to barriers: Engineers vs. Analysts

Analysts
• Often work in silos.
• Processes are earlier in maturity curve.
• Analytics’ negotiations with execs remain asymmetrical.

Engineers
• Trends towards deep integration.
• Processes are known & accepted throughout most organizations.
• Engineering’s process requirements provide an effective counter when negotiating with executives.

Page 16: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

How do you play nice?

Page 17: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

1. Don’t just work together when there are problems.

2. Learn each other’s successes, pain points and processes.

3. Schedule regular ‘bridge’ meetings

4. Read the release emails

5. Listen to rants at the water cooler

6. Where their process will work for analytics, borrow shamelessly!

Page 18: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Examples of Working Together

Testing

• Engineering priority: It must work every time.
• Analytics priority: Solve with most rigorous precision
• Solution: Integrate analytics’ tribal knowledge into unit testing
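A small sketch of what "tribal knowledge in a unit test" could look like: analysts know that the high-need and low-need segments must add up to each product total, so that rule is written down as a test. The figures are the ones from the client report slide earlier in the deck; the dictionary layout, function name and test class are assumptions for illustration.

```python
import unittest

# Figures from the client report slide; the structure is hypothetical.
REPORT = {
    "Product 1": {"total": 4563, "high_need": 1748, "low_need": 2815},
    "Product 2": {"total": 2127, "high_need": 851, "low_need": 1276},
    "Product 3": {"total": 1294, "high_need": 354, "low_need": 940},
}


def segment_mismatches(report):
    """Return product lines whose need segments do not add up to the total."""
    return [
        product
        for product, row in report.items()
        if row["high_need"] + row["low_need"] != row["total"]
    ]


class SegmentTotalsTest(unittest.TestCase):
    def test_segments_sum_to_product_total(self):
        # Analyst rule: every client belongs to exactly one need segment.
        self.assertEqual(segment_mismatches(REPORT), [])
```

Saved as a module, this can be run with `python -m unittest`, which lets the analysts' rule be checked alongside engineering's own tests on every release.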

Data Access

• Engineering perspective: What are you going to do with the data?

• Analytics perspective: Just give us access and get out of the way?!?!

• Solution: Concrete example with executive support that ladders into larger access goals.

Page 19: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

DRIVE 2014 Conference

Page 20: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Taming the beast with DATA PRODUCT MANAGEMENT

• Transparency

• Consistency

• Managing links between logics

• Managing Access

• Formalize roles w/ engineering

Page 21: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Transparency & Consistency.

• Must move from discovery environment to a transparent state.

• Transparent state enables distribution & sharing.

• Consistency flows from transparency.

• Best Practice: Logic must live in shared data structures.

Page 22: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Managing links between logic.

• Establish standards for how concepts & logic implementations interact by context.

• Going through this exercise often shows you where your logic is incomplete or inconsistent.

• Best Practice: Create, manage & maintain appropriately grained data structures to the logical concepts that people employ.

Page 23: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Managing Access

• Different levels of access for different levels of data literacy

• Data scientist: Just label it.

• Analyst: Apply standardized filters to data elements

• Operational Managers: Drillable reports to line graphs

• Executive: Ranges from analyst-level to pictures only.

• Best Practice: Manage data literacy throughout your org.

Page 24: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Formalize roles: Analytics versus Engineering

Analytics owns the ‘abstract’
• Scripting/querying for data
• Logical Data Modeling
• Taxonomy
• Data lifecycle management

Engineering owns the ‘concrete’
• Systems management
• Testing
• Project management
• Implementation
• Scheduling and integration

Page 25: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Best Practice Summation

• Prioritize taxonomy and data governance

• Create shared data structures at appropriate grains

• Manage your logic(s) through metrics committees and code reviews

• Plan for your stack to be dynamic.

• Get to know your engineers!

Page 26: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

In Closing.

Maintaining data integrity is a cultural and organizational challenge.

Partner with Engineering/IT/Vendor as deeply as possible

Manage your data as a product to ensure integrity throughout the data lifecycle.

Page 27: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle
Page 28: Umm, how did you get that number? Managing Data Integrity throughout the Data Lifecycle

Let’s continue the conversation.

Presentation to be available on Slideshare.

[email protected]

Follow @pointmarc on Twitter