Big Data Usage in LinkedIn

Recruiting Solutions, Harvesting Information Excellence, Information Excellence 2012 Sep Session

Upload: information-excellence

Post on 23-Jan-2015

Category: Technology


DESCRIPTION

Information Excellence presentation, September 2012, by Hari Shankar, Big Data Engineer at LinkedIn, on Big Data usage at LinkedIn

TRANSCRIPT

Page 1: Big Data Usage in Linkedin

Recruiting Solutions

Harvesting Information Excellence

Information Excellence 2012 Sep Session

Page 2: Big Data Usage in Linkedin

Information Excellence (informationexcellence.wordpress.com)

Today’s Speakers

Big Data Usage and Implementation in LinkedIn

Hari Shankar, Big Data Engineer, LinkedIn

Thank you for hosting us today

Page 3: Big Data Usage in Linkedin

Big data and Hadoop

September 2012

Hari Shankar Menon, Software engineer, LinkedIn

Page 4: Big Data Usage in Linkedin

About me

• LinkedIn Engineering Data warehouse team
• Previously, Software engineer @Clickable
– Worked on building the reporting and analytics platform on Hadoop and HBase
• Hadoop and open-source enthusiast

Page 5: Big Data Usage in Linkedin

Agenda

• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges

Page 6: Big Data Usage in Linkedin

Our mission

Connect the world’s professionals to make them more productive and successful

Page 7: Big Data Usage in Linkedin

LinkedIn by numbers

• LinkedIn members: 175M+
– Members by year (millions): 2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90
• 85% of Fortune 100 companies use LinkedIn to hire
• Company Pages: >2M **
• New members joining: ~2/sec
• Professional searches in 2011: ~4.2B

* as of Nov 4, 2011
** as of June 30, 2011

Page 8: Big Data Usage in Linkedin

• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges

Page 9: Big Data Usage in Linkedin

What is big data?

* Chart from Philip Russom, Research Director, TDWI

Page 10: Big Data Usage in Linkedin

Infrastructure technologies

• Primary data store (Front-end)
• Distributed key-value store
• Document-oriented store
• Distributed PubSub messaging
• Database change replication
• Search technologies
– SenseiDB
– Zoie
– Bobo

Page 11: Big Data Usage in Linkedin

Open source

http://data.linkedin.com/opensource

Page 12: Big Data Usage in Linkedin

• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges

Page 13: Big Data Usage in Linkedin

• What is Hadoop
• Evolution of Hadoop
• Impact

Page 14: Big Data Usage in Linkedin

Hadoop@LinkedIn

• Recommendation systems
– Generating recommendations
– Modeling
– A/B Testing
– Grandfathering
• Data warehouse/ETL
– Raw data storage
– Aggregations
– Heavy lifting
• Data sciences
– Strategic analyses
– Experimentation sandbox
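
As a rough illustration of the "Aggregations" item under Data warehouse/ETL, the map, shuffle, and reduce steps that Hadoop runs across a cluster can be sketched in single-process Python. This is only a sketch: the event records, their field names, and the helper functions `map_event`, `shuffle`, `reduce_counts`, and `run_job` are hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical raw events; field names are assumptions for illustration only.
EVENTS = [
    {"member": "a", "page": "profile"},
    {"member": "b", "page": "jobs"},
    {"member": "a", "page": "jobs"},
    {"member": "c", "page": "profile"},
]

def map_event(event):
    """Map step: emit one (key, 1) pair per raw event."""
    yield event["page"], 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)

def run_job(events):
    """Run map, shuffle, and reduce in sequence and return the totals."""
    pairs = (pair for event in events for pair in map_event(event))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())

page_views = run_job(EVENTS)
```

On a real cluster the map and reduce functions run on many machines and the shuffle moves data over the network; the sketch only shows the shape of the computation.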

Page 15: Big Data Usage in Linkedin

The Recommendations opportunity

[Figure labels: Pandora Search for People; Events You May Be Interested In; Groups browse maps]

• Relevance/Latency

• Offline computation

• Caching
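
One way to read the "Offline computation" and "Caching" bullets together is: compute recommendations in a batch job, then answer requests with a plain lookup so that request latency does not depend on the cost of the model. The sketch below assumes a toy co-view scoring rule and invented data; `VIEWS`, `build_recommendations`, `CACHE`, and `recommend` are hypothetical names, not LinkedIn's implementation.

```python
from collections import Counter

# Hypothetical co-view data: member -> items they viewed. Not from the talk.
VIEWS = {
    "alice": ["job1", "job2"],
    "bob": ["job2", "job3"],
    "carol": ["job1", "job2", "job3"],
}

def build_recommendations(views, top_n=2):
    """Offline batch step: for each member, rank unseen items by how many
    members with at least one item in common have viewed them."""
    recs = {}
    for member, seen in views.items():
        scores = Counter()
        for other, other_seen in views.items():
            if other == member or not set(seen) & set(other_seen):
                continue
            for item in other_seen:
                if item not in seen:
                    scores[item] += 1
        recs[member] = [item for item, _ in scores.most_common(top_n)]
    return recs

# The batch output plays the role of a read-only cache or key-value store.
CACHE = build_recommendations(VIEWS)

def recommend(member):
    """Online step: a dictionary lookup, with no scoring at request time."""
    return CACHE.get(member, [])
```

The trade-off is freshness: results are only as recent as the last batch run, which is the relevance versus latency tension named in the first bullet.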

Page 16: Big Data Usage in Linkedin

Improving recommendations

• Mathematical modeling

• A/B Testing

• Grandfathering
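
For the "A/B Testing" bullet, a common building block is a deterministic split of members into control and treatment groups. The sketch below uses hash-based bucketing, which is a general technique and an assumption here, since the talk does not say how LinkedIn assigns members; `assign_variant` and its parameters are hypothetical.

```python
import hashlib

def assign_variant(member_id, experiment, treatment_share=0.5):
    """Deterministically assign a member to 'treatment' or 'control'.

    Hashing the experiment name together with the member id keeps the
    assignment stable across requests and uncorrelated across experiments.
    """
    key = f"{experiment}:{member_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 32 bits of the hash, scaled to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"
```

Because the assignment depends only on its inputs, an offline job and an online service can both recompute it without sharing state.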

Page 17: Big Data Usage in Linkedin

Hadoop in the Data warehouse

• Source of truth
• Lower retention
• Ad-hoc analysis

• Longer retention
• Complex transformations
• Algorithmic computations

Page 18: Big Data Usage in Linkedin

Hadoop in Data Sciences

• Deep dives

• Sandbox

• Hackday projects

Page 19: Big Data Usage in Linkedin

Data Insights - 1

Job migration after financial collapse

Page 20: Big Data Usage in Linkedin

Data Insights - 2

Page 21: Big Data Usage in Linkedin

Data Insights - 3

Page 22: Big Data Usage in Linkedin

• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges

Page 23: Big Data Usage in Linkedin

Challenges

1. User adoption of new technologies
2. Real-time processing
3. Graph/Network algorithms
4. Making data accessible

Page 24: Big Data Usage in Linkedin

User adoption

Page 25: Big Data Usage in Linkedin

Real-time processing

• Challenges
– Random reads/writes
– Warm-up time
• Solutions
– Parts of the problem that can be moved offline?
– HBase, Voldemort

Page 26: Big Data Usage in Linkedin

Map-reduce-incompatible problems

• Graph problems
• Traditional joins
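
To make the "Traditional joins" point concrete: MapReduce groups records by a single key, so a join has to be expressed by tagging each record with its source, shuffling on the join key, and pairing records inside the reducer (often called a reduce-side join). The in-memory sketch below illustrates that pattern under invented data; `MEMBERS`, `POSITIONS`, and the function names are hypothetical and not from the talk.

```python
from collections import defaultdict

# Hypothetical tables; names and fields are assumptions for illustration.
MEMBERS = [(1, "Asha"), (2, "Ravi")]
POSITIONS = [(1, "Engineer"), (1, "Manager"), (3, "Analyst")]

def map_tagged(members, positions):
    """Map step: key every record by member id and tag it with its source."""
    for member_id, name in members:
        yield member_id, ("M", name)
    for member_id, title in positions:
        yield member_id, ("P", title)

def reduce_join(tagged_values):
    """Reduce step: pair each member record with each position record."""
    names = [value for tag, value in tagged_values if tag == "M"]
    titles = [value for tag, value in tagged_values if tag == "P"]
    return [(name, title) for name in names for title in titles]

def reduce_side_join(members, positions):
    """Inner join of the two tables on member id."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for key, value in map_tagged(members, positions):
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(groups[key]))
    return joined

rows = reduce_side_join(MEMBERS, POSITIONS)
```

In this pattern every record from both tables passes through the shuffle, which is one reason a join that is a single SQL clause in a relational database becomes a comparatively heavy job here.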

Page 27: Big Data Usage in Linkedin

Making data accessible

• Hadoop Tons of data

Page 28: Big Data Usage in Linkedin

Finally!

No Silver bullet

Hadoop Offline processing

Scalability by design

Page 29: Big Data Usage in Linkedin

www.linkedin.com/in/harisreekumar

www.linkedin.com/company/linkedin/careers

Page 30: Big Data Usage in Linkedin

About Information Excellence Group

Progress Information Excellence

Towards an Enriched Profession, Business and Society

• Community Focused
• Volunteer Driven
• Knowledge Share
• Accelerated Learning
• Collective Excellence
• Distilled Knowledge
• Shared, Non Conflicting Goals
• Validation / Brainstorm platform
• Mentor, Guide, Coach
• Satisfied, Empowered Professional
• Richer Industry and Academia