![Page 1: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/1.jpg)
#TDPARTNERS16 #datacatalog GEORGIA WORLD CONGRESS CENTER
LinkedIn Links Analysts to Collaborate on Analysis
Rohit Jonnalagadda, Business Operations, LinkedIn
Stephanie McReynolds, VP of Marketing @Alation
![Page 2: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/2.jpg)
A little about me & LinkedIn
• Investment banker turned data junkie
• Work as part of a cross-functional team to support our Marketing Solutions (advertising) business
• Expert in manipulating data with SQL but goal is always to deliver insights that actually drive business decisions
![Page 3: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/3.jpg)
Supported by a robust environment
• Mix of numerous home-grown, open source, & procured products
• Offline analytics starts with Kafka → Hadoop
• Distributed team of users (“anyone” can learn SQL)

• Primary DW Environment
• Used by hundreds of data analysts across Finance, Operations, Product, HR, etc.
• (Most) ETL in Hadoop

• Built at LinkedIn
• Hundreds of Billions of Messages Routed Daily

• Petabytes of storage across the grid
• Writing > 75TB+ Daily
• Spread across 3 DC’s
• Thousands of Nodes
• Hive, Presto, Spark
![Page 4: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/4.jpg)
The analytics team at LinkedIn
• Different types of “data discovery” happen in different teams of analysts
• All analysts need access to data, but individual workflows can be quite different
• Data catalogs are a central point of reference for all data consumers
1000s of Data Consumers:
• Executive Reporting: 10s
• Business Ops: 100s
• Ad Hoc Analysis: 1000s
![Page 5: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/5.jpg)
Linking Analysts to Collaborate
• 3 data industry trends are driving Data Catalogs in the enterprise
• Challenges of linking analysts & data
• How data cataloging helps
• LinkedIn example
![Page 6: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/6.jpg)
Trend #1: Data Proliferation
Data-driven organizations demand data proliferation:
• All new products released with new data structures
• A new data set every week
• Deeper and wider data is being produced than ever before
  - Typical weblog has hundreds of attributes/columns
![Page 7: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/7.jpg)
“Big” data’s challenge is human
Volume is not our challenge; the speed of analysis is
• Impossible for any one analyst to keep up with the continual stream of new data updates
• Documentation is often light by design
• Rough conclusions are easy; accurate insights are hard
Remember: insights come from analysis, not from keeping up with the data
![Page 8: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/8.jpg)
Trend #2: Data Discovery
36% of end-users now preparing their own data
- Late-binding/discovery-oriented style of analysis wins over predictable/well-structured BI queries
Source: TDWI Best Practices Report, Improving Data Preparation for Business Analytics, Q3 2016
![Page 9: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/9.jpg)
What can be cataloged for re-use?
86% of organizations looking for re-use options to make data prep efficient – data catalogs help immensely with re-use & consistency
Source: TDWI Best Practices Report, Improving Data Preparation for Business Analytics, Q3 2016
![Page 10: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/10.jpg)
Trend #3: Collaboration
Analysis has become a team sport
“According to data we have collected over the past two decades, the time spent by managers and employees in collaborative activities has ballooned by 50% or more.”
Source: Harvard Business Review, Collaborative Overload, January 2016
![Page 11: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/11.jpg)
But Collaborative Overload is a Risk
Data on leaders across 20 organizations show that those regarded by colleagues as the best information sources & most desirable collaborators have the lowest career satisfaction.
Source: Harvard Business Review, Collaborative Overload, January 2016
![Page 12: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/12.jpg)
Challenges of the new era of analysis
Data-driven orgs drive Data Proliferation
• A new product, a new dataset
• Every product launches new datasets
• Unboxing process is often one of discovery without documentation

More ad-hoc analysis challenges Human Productivity & System Performance
• Performance: cost of trying a query out for the first time
• Analysts & tools must be productive cross-system

Analysis is now a team sport where Collaborative Efficiency & Overload must be managed
• Effective collaboration requires some organizing structure/documentation
• Best analysts are overloaded/burnt out
• New analysts take 6 months to learn LinkedIn data
![Page 13: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/13.jpg)
Scenario: Onboarding new users
Scenario: New employee needs to learn about our vast data footprint
• No single place to learn
• Unlike “source code”, queries are decentralized and live on a mix of desktops and servers
• Difficult to discern “source of truth” when questions can have multiple answers
• Need to come up to speed quickly due to rapid growth and constant product innovation
![Page 14: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/14.jpg)
Step 1: Build an Inventory
• What data sources exist?
• What data is available?
• What do the columns mean?
• Where does the data come from (ETL, lineage)?
• What is sensitive/protected?
• What promises do we make to our users about their private data and how we can use it for advertising purposes?
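The inventory questions above can be pictured as fields on a catalog record. The following is a minimal sketch under stated assumptions, not how Alation or LinkedIn store this information: the class names, field names, and the sample table `ad_clicks` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    """One column in an inventoried table (hypothetical structure)."""
    name: str
    description: str = ""     # answers "What do the columns mean?"
    sensitive: bool = False   # answers "What is sensitive/protected?"


@dataclass
class TableEntry:
    """One table in the inventory (hypothetical structure)."""
    source: str                                         # which data source holds the table
    name: str
    columns: List[Column] = field(default_factory=list)
    upstream: List[str] = field(default_factory=list)   # lineage: tables this one is built from

    def undocumented(self) -> List[str]:
        """Names of columns that still lack a description."""
        return [c.name for c in self.columns if not c.description]

    def sensitive_columns(self) -> List[str]:
        """Names of columns flagged as sensitive/protected."""
        return [c.name for c in self.columns if c.sensitive]


# Illustrative entry; table and column names are invented for this example.
entry = TableEntry(
    source="warehouse",
    name="ad_clicks",
    columns=[
        Column("member_id", "Hashed member identifier", sensitive=True),
        Column("campaign_id", "Advertising campaign that served the ad"),
        Column("ts"),  # no description yet
    ],
    upstream=["raw_click_events"],
)

print(entry.undocumented())
print(entry.sensitive_columns())
```

A record like this makes gaps visible: columns without descriptions and columns flagged as sensitive can be listed directly instead of being discovered during analysis.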
![Page 15: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/15.jpg)
Step 2: Enrich the Catalog
An inventory without a sense of usage is not very informative; need to know:
• Who used it?
• How was it used?
• Why was that data helpful?

Samples of common queries:
• What is the growth rate in Country X?
• How is our sales pipeline tracking for the quarter?
• What customers are at risk for churning?
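One way to picture "who used it" and "how was it used" is to summarize a query log per table. The sketch below is an assumption-laden illustration, not a description of how any catalog product works: the log format, analyst names, and table names are hypothetical, and the regular expression is a deliberately naive stand-in for a real SQL parser.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query log: (analyst, SQL text) pairs.
QUERY_LOG = [
    ("ana", "SELECT country, COUNT(*) FROM member_signups GROUP BY country"),
    ("ben", "SELECT * FROM sales_pipeline WHERE quarter = 'Q3'"),
    ("ana", "SELECT s.country FROM member_signups s "
            "JOIN sales_pipeline p ON s.account_id = p.account_id"),
]

# Naive table extraction: the identifier following FROM or JOIN.
# This misses subqueries, CTEs, and quoted names; it is only a sketch.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def summarize_usage(log):
    """Return (queries per table, set of analysts per table)."""
    query_counts = Counter()
    users = defaultdict(set)
    for analyst, sql in log:
        # Count each table once per query, even if referenced several times.
        for table in set(TABLE_REF.findall(sql)):
            query_counts[table] += 1
            users[table].add(analyst)
    return query_counts, users


counts, users = summarize_usage(QUERY_LOG)
for table in sorted(counts):
    print(table, counts[table], sorted(users[table]))
```

Counts like these can serve as a rough popularity signal, and the stored SQL text itself shows how a table was used, which helps a new analyst find a starting point.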
![Page 16: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/16.jpg)
Step 3: Support Human Adoption
• Training
• Support
• Adapting
![Page 17: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/17.jpg)
Value of Alation for LinkedIn Analysts
Productivity:
• Collaboration: Teams around the world can quickly share insights with one another

ROI:
• Teams are spending more time disseminating knowledge and less time writing queries. This shortens product release cycles, drives faster deal closings, and increases overall productivity.

Benefits:
• Onboarding has been greatly simplified as Alation has generated an organic repository of up-to-date knowledge
![Page 18: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/18.jpg)
Alation Delivers a GPS for Analysts
Data Catalog links data and analysts together for collaboration
• Automates the inventory
• Maintains a rich catalog based on actual analyst behaviors
• Reinforces best practices
  - SmartSuggest recommendations
  - Behavioral interventions for governance
  - Monitors wide & deep usage
![Page 19: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/19.jpg)
Table Explorer
![Page 20: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/20.jpg)
Popularity Indicators
![Page 21: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/21.jpg)
Data Profiling
![Page 22: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/22.jpg)
Lineage
![Page 23: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/23.jpg)
Articles to Collaborate on Definitions
![Page 24: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/24.jpg)
Data Catalogs address complexity
A platform for efficient & effective human collaboration
• Proactive recommendations
• Inline documentation
• Details to navigate Data proliferation
  - Table Explorer
  - Data Profiling
  - Interactive Query Editor
  - Lineage
![Page 25: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/25.jpg)
Find out more about Data Catalogs
alation.com/resources
• TDWI Best Practices Report, Improving Data Preparation for Business Analytics, Q3 2016
Alation Booth #729
![Page 26: LinkedIn Links Analysts to Collaborate on Analysis · 2016. 9. 13. · The analytics team at LinkedIn 4 • Different types of “data discovery” happen in different teams of analysts](https://reader035.vdocuments.mx/reader035/viewer/2022071008/5fc5e92ecf249f730f2e4015/html5/thumbnails/26.jpg)
Thank You
Questions/Comments, Email: [email protected] & [email protected]
Join Us At Alation Booth #729
Follow Us on Twitter: @slangenfeld @alation
Rate This Session #739 with the PARTNERS Mobile App
Remember To Share Your Virtual Passes