Build Big Data Products at LinkedIn


Upload: lili-wu

Post on 25-Jul-2015

Building Big Data Products at LinkedIn
Lili Wu

[email protected]
LinkedIn Corporation

What is a Data Product

Data Product: a product that facilitates an end goal through the use of data.
-- DJ Patil, Greylock Partners

From raw data …
• Derive insights/actions
• Improve decision making
• Be more efficient

Example

Three Phases
Phase I: Inception
Phase II: Growth
Phase III: Maintenance

Lessons Learned

Clean & standardize data
• 80% of the work in any data project is in cleaning the data
• Humans can enter erroneous data
• Different data sources have different representations
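
The cleaning and standardization step above can be illustrated with a minimal sketch. It assumes a free-text company-name field and a small hand-written alias table; `CANONICAL_NAMES` and `standardize_company` are hypothetical names introduced here, not part of any LinkedIn system.

```python
import re

# Hypothetical alias table; a real one would be curated or learned from data.
CANONICAL_NAMES = {
    "linkedin": "LinkedIn",
    "linkedin corp": "LinkedIn",
    "linkedin corporation": "LinkedIn",
}


def standardize_company(raw):
    """Map a free-text company name to a canonical form, or None if empty."""
    if raw is None:
        return None
    # Lowercase, drop punctuation, and collapse whitespace before the lookup.
    key = re.sub(r"[^\w\s]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    if not key:
        return None
    # Fall back to the cleaned text in title case when no alias is known.
    return CANONICAL_NAMES.get(key, key.title())
```

With this sketch, differently typed variants such as "  LinkedIn Corp. " and "LINKEDIN" resolve to the same canonical value, which is the point of standardizing before any downstream computation.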

Track everything
• Daily / monthly active users
• Impressions, clicks
• User growth
• Revenue
• Coverage
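
As a rough illustration of tracking, the sketch below derives daily active users and click-through rate from a list of event tuples. The `(user_id, day, event_type)` layout, the function names, and the sample events are assumptions made up for this example.

```python
from collections import defaultdict
from datetime import date


def daily_active_users(events):
    """Count distinct users per day from (user_id, day, event_type) tuples."""
    users_by_day = defaultdict(set)
    for user_id, day, _event_type in events:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


def click_through_rate(events):
    """Clicks divided by impressions; None when there are no impressions."""
    impressions = sum(1 for _, _, kind in events if kind == "impression")
    clicks = sum(1 for _, _, kind in events if kind == "click")
    return clicks / impressions if impressions else None


# Made-up sample events, for illustration only.
events = [
    ("u1", date(2014, 10, 8), "impression"),
    ("u1", date(2014, 10, 8), "click"),
    ("u2", date(2014, 10, 8), "impression"),
    ("u2", date(2014, 10, 9), "impression"),
    ("u3", date(2014, 10, 9), "impression"),
]

dau = daily_active_users(events)
ctr = click_through_rate(events)
```

Monthly active users follow the same pattern with the key changed from the day to the (year, month) pair.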

Build Incrementally
• Minimum Viable Product (MVP)
• Leverage existing systems
• Fastest way to deploy the product
• Listen to user feedback, and observe user engagement

E.g.: the first version of PYMK (People You May Know):
- A few intuitive heuristics
- Computed by simple DB queries
- Three results for each user

Overwhelming user engagement
Then we built sophisticated algorithms / infrastructure
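
The slide does not say which heuristics the first version used. As one plausible example of a simple heuristic, the sketch below ranks candidates by the number of mutual connections and returns three results per user. The mutual-connection rule, the in-memory `graph` dictionary (standing in for a DB query), and all names are assumptions for illustration.

```python
from collections import Counter


def suggest_connections(graph, user, limit=3):
    """Rank non-connections of `user` by their number of mutual connections."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone the user is already connected to.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Sort by count descending, then by name for a deterministic order.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:limit]]


# Made-up connection graph, for illustration only.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}

suggestions = suggest_connections(graph, "ann")
```

A heuristic like this is easy to explain and cheap to compute, which fits the MVP approach of validating engagement before investing in more sophisticated models.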

Experimentation
• Apply the scientific method
• Run A/B tests to learn user preferences
• Measure impact on real users
• Iterate on models
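
One common way to judge an A/B test on click-through rate is a two-proportion z-test. The sketch below is a generic statistical example, not the method described in the talk; the function name and the sample counts in the usage note are assumptions.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the CTR difference of B versus A."""
    if views_a <= 0 or views_b <= 0:
        raise ValueError("each variant needs at least one impression")
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both variants share one CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    if pooled in (0.0, 1.0):
        raise ValueError("pooled CTR of 0 or 1 gives zero variance")
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` compares a control with 100 clicks in 1000 impressions against a variant with 150 clicks in 1000 impressions; a small p-value suggests the difference is unlikely to be noise.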

Keep the lights on

• Minimum human intervention

• Solidify monitoring and automatic alerting throughout the systems

• Build a comprehensive dashboard for all aspects of the business (growth, engagement, revenue)
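
Automatic alerting can be as simple as comparing today's value of a metric with its recent average. The sketch below assumes a list of recent daily values and a 50% drop threshold; `check_metric`, `max_drop`, and the numbers in the usage note are hypothetical.

```python
from statistics import mean


def check_metric(name, history, today, max_drop=0.5):
    """Return an alert message if `today` fell too far below the recent average."""
    if not history:
        return None  # No baseline yet, so there is nothing to compare against.
    baseline = mean(history)
    # Alert when today's value is below (1 - max_drop) of the baseline.
    if baseline > 0 and today < baseline * (1 - max_drop):
        return f"ALERT: {name} is {today}, baseline {baseline:.1f}"
    return None
```

Calling `check_metric("recommendations", [100, 110, 90], 0)` returns an alert string, while a value of 95 returns `None`. In practice the returned message would be routed to email or a paging system so that no human has to watch the dashboard.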

Figure axes: Reward, Time. Phases shown: Phase I: Inception, Phase II: Growth, Phase III: Maintenance.

Insights:
• Profile views by time, location, etc.
Goal/Actions:
• Connect, message, search
• Follow influencers, companies

Phase I: Inception

Time/effort spent compared with reward attained for the three phases of data product development.

1. User experience matters:

2. Don’t build a black box:

• Can we interpret the recommendation easily?

• Can we explain the results to the CEO in 5 minutes?

3. Data quality is crucial:

• Mistakes can be hard to detect/debug

• Extremely simple mistakes can have a big impact

E.g.: a change from “jobid” to “id” caused the number of recommendations to drop to 0

• Need a prevention mechanism
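
One possible prevention mechanism is a validation gate that runs before a batch of recommendations is published. The sketch below rejects empty batches and records that lack an expected field, which would catch a “jobid” versus “id” mismatch early. The field set in `REQUIRED_FIELDS` (apart from "jobid", which appears above) and the function name are assumptions.

```python
REQUIRED_FIELDS = {"jobid", "memberid", "score"}  # hypothetical schema


def validate_records(records, required=REQUIRED_FIELDS):
    """Raise ValueError if the batch is empty or a record lacks a required field."""
    if not records:
        raise ValueError("empty batch: refusing to publish zero recommendations")
    for index, record in enumerate(records):
        missing = set(required) - set(record)
        if missing:
            raise ValueError(f"record {index} is missing fields: {sorted(missing)}")
    return records
```

Failing loudly at this point keeps the previous good output in place instead of silently serving zero recommendations.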

Formulation

• What interesting data/insights can we surface? Is there enough coverage?

• Who would want this product?

Define goal / metrics

• What is the goal for this product? e.g.: increase user base, engagement, revenue.

• How do we measure success? E.g.: revenue increase by x%, click-through rate increase by y%
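
A success criterion of the form "increase by y%" can be checked with a relative-change calculation. The sketch below is a minimal illustration; the function names and the numbers in the usage note are made up and are not results from the talk.

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`; 0.5 means a 50% increase."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before


def goal_met(before, after, target):
    """True when the relative increase reaches the target (e.g. 0.1 for 10%)."""
    return relative_increase(before, after) >= target
```

For instance, a click-through rate that moves from 0.02 to 0.03 is a relative increase of about 0.5, so `goal_met(0.02, 0.03, 0.1)` is true for a 10% target.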

Validate Product

• Analyze subset of data

• Create a simple test app

• Solicit feedback from teams and friends

50% CTR increase

The 2014 Grace Hopper Celebration of Women in Computing October 8-10, 2014 Phoenix, Arizona