Building Big Data Products at LinkedIn
Lili Wu
[email protected] Corporation
What is a Data Product
Data Product: a product that facilitates an end goal through the use of data. -- DJ Patil, Greylock Partners
From raw data …
• Derive insights/actions
• Improve decision making
• Be more efficient
Example
Three Phases
Phase II: Growth
Phase III: Maintenance
Lessons Learned
Clean & standardize data
• 80% of the work in any data project is in cleaning the data
• Humans can enter erroneous data
• Different data sources have different representations
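A minimal sketch of what standardizing one field might look like, assuming a small hand-maintained alias table. The table contents and the function name `standardize_company` are hypothetical and only illustrate the idea of mapping different representations to one canonical value.

```python
import re

# Hypothetical alias table: maps known variants to one canonical name.
CANONICAL_COMPANIES = {
    "linkedin": "LinkedIn",
    "linkedin corp": "LinkedIn",
    "linkedin corporation": "LinkedIn",
    "linked in": "LinkedIn",
}


def standardize_company(raw: str) -> str:
    """Normalize punctuation, whitespace, and case, then look up a canonical name."""
    key = re.sub(r"[^\w\s]", "", raw)               # drop punctuation such as "." and ","
    key = re.sub(r"\s+", " ", key).strip().lower()  # collapse whitespace, lowercase
    # Fall back to the trimmed input when no alias is known.
    return CANONICAL_COMPANIES.get(key, raw.strip())
```

Unknown values pass through unchanged, so the alias table can grow as new variants are observed.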
Track everything
• Daily / monthly active users
• Impression, click
• User growth
• Revenue
• Coverage
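As a rough illustration of the tracked metrics above, the sketch below derives daily active users and click-through rate from a tiny in-memory event log. The event tuple layout and the sample records are assumptions made up for this example, not a real tracking schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records: (day, user_id, event_type)
events = [
    (date(2014, 10, 8), "u1", "impression"),
    (date(2014, 10, 8), "u1", "click"),
    (date(2014, 10, 8), "u2", "impression"),
    (date(2014, 10, 9), "u1", "impression"),
]


def daily_active_users(events):
    """Count distinct users seen on each day."""
    users_by_day = defaultdict(set)
    for day, user_id, _ in events:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


def click_through_rate(events):
    """Clicks divided by impressions; 0.0 when there are no impressions."""
    impressions = sum(1 for _, _, kind in events if kind == "impression")
    clicks = sum(1 for _, _, kind in events if kind == "click")
    return clicks / impressions if impressions else 0.0
```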
Build Incrementally
• Minimum Viable Product (MVP)
• Leverage existing systems
• Fastest way to deploy the product
• Listen to user feedback, and observe user engagement
E.g.: PYMK’s first version:
- A few intuitive heuristics
- Computed by simple DB queries
- Three results for each user
Overwhelming user engagement
Then we built sophisticated algorithms / infrastructure
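The slides do not say which heuristics the first PYMK version used. As one plausible example of an "intuitive heuristic", the sketch below ranks people by the number of mutual connections and returns at most three results per user. The graph data and the function name `suggest_connections` are hypothetical.

```python
from collections import Counter


def suggest_connections(graph, user, limit=3):
    """Rank non-connections by their number of mutual connections with `user`."""
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone already connected to them.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Sort by count descending, then by name, so ties are deterministic.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:limit]]


# Hypothetical connection graph: member -> set of direct connections.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
```

Here `suggest_connections(graph, "ann")` puts "dan" (two mutual connections) ahead of "eve" (one). A heuristic this simple can be expressed as a self-join in a database query, which fits the MVP approach of leveraging existing systems.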
Experimentation
• Apply scientific method
• Run A/B testing to learn user preference
• Measure impact on real users
• Iterate on models
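One common way to judge an A/B test on click-through rate is a two-proportion z-test. The sketch below is a generic textbook version, not the method described in the talk; it assumes both groups have impressions, that the control group has at least one click, and that the pooled rate is strictly between 0 and 1.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare CTR of control (a) and treatment (b).

    Returns (z, two_sided_p_value, relative_lift_of_b_over_a).
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both groups share one CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (rate_b - rate_a) / rate_a
    return z, p_value, relative_lift


# Made-up counts for illustration only.
z, p_value, lift = two_proportion_z_test(200, 10_000, 260, 10_000)
```

A small p-value suggests the difference is unlikely to be noise; the relative lift is the number to compare against the success metric defined for the product.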
Keep the lights on
• Minimum human intervention
• Solidify monitoring and auto alerting throughout the systems
• Build comprehensive dashboard for all aspects of business (growth, engagement, revenue)
[Figure: reward (vertical axis) versus time (horizontal axis) across Phase I: Inception, Phase II: Growth, and Phase III: Maintenance]
Insights:
• Profile views by time, location, etc.
Goal/Actions:
• Connect, message, search
• Follow influencers, companies
Phase I: Inception
Time/effort spent compared with reward attained for the three phases of data product development.
1. User experience matters:
2. Don’t build a blackbox:
• Can we interpret the recommendation easily?
• Can we explain the results to CEO in 5 minutes?
3. Data quality is crucial:
• Mistakes can be hard to detect/debug
• Extremely simple mistakes can have big impact
E.g.: a field name mismatch (“jobid” vs. “id”) caused the number of recommendations to drop to 0
• Need prevention mechanism
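One possible prevention mechanism is a validation gate that runs before a new batch of recommendations is published. The sketch below checks for required fields and for a sudden drop in record count. Apart from "jobid", the field names, the 50% threshold, and the function name `validate_batch` are assumptions chosen for illustration.

```python
# Hypothetical schema for one recommendation record.
REQUIRED_FIELDS = {"member_id", "jobid", "score"}


def validate_batch(records, previous_count, max_drop=0.5):
    """Return a list of problems; an empty list means the batch may be published."""
    problems = []
    if not records:
        problems.append("batch is empty")
    for index, record in enumerate(records):
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            problems.append(f"record {index} is missing fields: {sorted(missing)}")
            break  # one example is enough to block the release
    # Flag a sharp fall in volume relative to the previous run.
    if previous_count and len(records) < previous_count * (1 - max_drop):
        problems.append(f"record count fell from {previous_count} to {len(records)}")
    return problems
```

A renamed field such as "id" in place of "jobid" would be reported by this check before the batch goes live, instead of silently producing zero recommendations.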
Formulation
• What interesting data/insights to surface? Enough coverage?
• Who would want this product?
Define goal / metrics
• What is the goal for this product? e.g.: increase user base, engagement, revenue.
• How do we measure success: revenue increase by x%, click-through-rate increase by y%
Validate Product
• Analyze subset of data
• Create a simple test app
• Solicit feedback from teams and friends
50% CTR increase
The 2014 Grace Hopper Celebration of Women in Computing October 8-10, 2014 Phoenix, Arizona