Scale, Structure, and Semantics

Daniel Tunkelang, Principal Data Scientist at LinkedIn

Upload: daniel-tunkelang

Post on 09-May-2015


DESCRIPTION

Keynote at the 2012 Semantic Technology and Business Conference: "Scale, Structure, and Semantics" by Daniel Tunkelang, LinkedIn.

Science fiction has a mixed track record when it comes to anticipating technological innovations. While Jules Verne fared well with his predictions of submarine and space technology, artificial intelligence hasn't produced anything like Arthur C. Clarke's HAL 9000. Instead, we've managed to elicit intelligence from machines through unexpected means. Search engines have achieved remarkable success in organizing the world's information by crawling the web, indexing documents, and exploiting link structure to establish authoritativeness. At LinkedIn, we apply large-scale analytics to terabytes of semistructured data to deliver products and insights that serve our 150M+ members. Semantics emerge when we apply the right analytical techniques to a sufficient quality and quantity of data.

In this talk, I will describe how LinkedIn's huge and rich graph of relationship data powers the products our users love. I believe that the lessons we have learned apply broadly to other semantic applications. While quantity and quality of data are the main challenges in delivering a semantically rich experience, the key is to create the right ecosystem, one that incents people to give you good data, which then forms the basis for great data products.

TRANSCRIPT

Page 1: Scale, Structure, and Semantics

Scale, Structure, and Semantics

Daniel Tunkelang
Principal Data Scientist at LinkedIn

Page 2: Scale, Structure, and Semantics

Take-Aways


Communication trumps knowledge representation.

Communication is the problem and the solution.

Page 3: Scale, Structure, and Semantics

Overview

1.  Knowledge representation is overrated.
2.  Computation is underrated.
3.  We have a communication problem.

Page 4: Scale, Structure, and Semantics

The Bad News

1.  Knowledge representation is overrated.
2.  Computation is underrated.
3.  We have a communication problem.

Page 5: Scale, Structure, and Semantics

AI: a dream deferred.


Page 6: Scale, Structure, and Semantics

Memex: the Computer Science Version


Page 7: Scale, Structure, and Semantics

Cyc


Page 8: Scale, Structure, and Semantics

Freebase


Page 9: Scale, Structure, and Semantics

Wolfram Alpha


Page 10: Scale, Structure, and Semantics

Knowledge representation is overrated.

Today’s knowledge repositories are:
§  incomplete
§  inconsistent
§  inscrutable
§  and not sustained by economic incentives.

1986 estimate of effort to complete Cyc:
§  250,000 rules + 350 person-years

Page 11: Scale, Structure, and Semantics

The Good News

1.  Knowledge representation is overrated.
2.  Computation is underrated.
3.  We have a communication problem.

Page 12: Scale, Structure, and Semantics

Deep Blue


vs.

Page 13: Scale, Structure, and Semantics

Watson


Page 14: Scale, Structure, and Semantics

Plain Old Search Engines are Pretty Good Too


http://blog.stephenwolfram.com/2011/01/jeopardy-ibm-and-wolframalpha/

Page 15: Scale, Structure, and Semantics

The Unreasonable Effectiveness of Data

§  simple models + lots of data >> elaborate models + less data

§  machine translation: parallel corpora >> elaborate rules for syntactic and semantic patterns

§  semantic web formalism just means semantic interpretation on shorter strings between angle brackets

Alon Halevy, Peter Norvig, and Fernando Pereira (2009)


Page 16: Scale, Structure, and Semantics

Today’s Challenge

1.  Knowledge representation is overrated.
2.  Computation is underrated.
3.  We have a communication problem.

Page 17: Scale, Structure, and Semantics

Semi-structured Data


Michael K. Bergman, http://www.mkbergman.com/

Page 18: Scale, Structure, and Semantics

Semi-structured Data at LinkedIn

<person>
  <id />
  <first-name />
  <last-name />
  <location>
    <name />
    <country>
      <code />
    </country>
  </location>
  <industry />
  …
</person>

Summary

I lead a data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn’s members. Prior to LinkedIn, I led a local search quality team at Google and was a founding employee of faceted search pioneer Endeca (acquired by Oracle in 2010), where…
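The slide pairs structured fields (location, industry) with a free-text summary. As a minimal sketch of why that mix is useful, the Python below parses a hypothetical profile record and filters it with an exact match on a structured field plus a keyword match on the free text. The element names follow the slide, but the `<summary>` element, all field values, and the helper names `parse_profile` and `matches` are assumptions made for illustration, not LinkedIn's actual schema or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names follow the slide, every value is invented,
# and the <summary> element is an assumption added for this example.
PROFILE_XML = """
<person>
  <id>123</id>
  <first-name>Ada</first-name>
  <last-name>Example</last-name>
  <location>
    <name>San Francisco Bay Area</name>
    <country><code>us</code></country>
  </location>
  <industry>Internet</industry>
  <summary>I lead a data science team that analyzes terabytes of data.</summary>
</person>
"""


def parse_profile(xml_text):
    """Flatten one <person> record into a dict of structured and free-text fields."""
    person = ET.fromstring(xml_text)
    return {
        "id": person.findtext("id"),
        "name": f'{person.findtext("first-name")} {person.findtext("last-name")}',
        "location": person.findtext("location/name"),
        "country": person.findtext("location/country/code"),
        "industry": person.findtext("industry"),
        "summary": person.findtext("summary", default=""),
    }


def matches(profile, industry, keyword):
    """Exact match on a structured field, case-insensitive keyword match on free text."""
    return (
        profile["industry"] == industry
        and keyword.lower() in profile["summary"].lower()
    )


profile = parse_profile(PROFILE_XML)
print(profile["name"], "|", profile["country"], "|", profile["industry"])
print(matches(profile, "Internet", "data science"))
```

The structured fields support precise filters and facets, while the summary still has to be searched as text, which is the sense in which the record is semi-structured.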

Page 19: Scale, Structure, and Semantics

Semi-structured Search is a Killer App


Page 20: Scale, Structure, and Semantics

Another Example: Helping a Friend

Dear Daniel, I'm attaching the resume of an old friend who just moved up to the Bay Area.

He has a very strong background in:
§  mobile / wireless applications
§  start-ups and new product launches
§  international expansion

Best regards, XXX

Page 21: Scale, Structure, and Semantics

Company Search


Page 22: Scale, Structure, and Semantics

Semi-structured Data Empowers Users


Page 23: Scale, Structure, and Semantics

Data-Driven Recommendations


Page 24: Scale, Structure, and Semantics

Data-Driven Computation Serves Communication

for i in [1..n]
    s ← w1 w2 … wi
    if Pc(s) > 0
        a ← new Segment()
        a.segs ← {s}
        a.prob ← Pc(s)
        B[i] ← {a}
    for j in [1..i-1]
        for b in B[j]
            s ← wj wj+1 … wi
            if Pc(s) > 0
                a ← new Segment()
                a.segs ← b.segs U {s}
                a.prob ← b.prob * Pc(s)
                B[i] ← B[i] U {a}
    sort B[i] by prob
    truncate B[i] to size k
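The pseudocode on this slide keeps, for each prefix of the query, the k most probable ways to split it into segments. A minimal Python sketch of the same idea follows, under stated assumptions: `Segmentation`, `segment`, `toy_pc`, and the `TOY_PC` table are hypothetical names with invented numbers, and `pc` stands in for the slide's Pc, taken here to be the probability that a word sequence is a single concept. The sketch seeds position 0 with an empty segmentation so the slide's first case folds into the main loop, and it starts each new segment at the word after position j, assuming B[j] covers the first j words.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segmentation:
    segs: Tuple[str, ...]  # segments in order, each a space-joined run of words
    prob: float            # product of the segment probabilities


def segment(words: List[str], pc: Callable[[str], float], k: int = 5) -> List[Segmentation]:
    """Return up to k most probable segmentations of `words` (beam search)."""
    n = len(words)
    # beams[i] holds the best segmentations of the first i words.
    beams: Dict[int, List[Segmentation]] = {0: [Segmentation((), 1.0)]}
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        for j in range(i):
            s = " ".join(words[j:i])  # candidate last segment: words j+1..i (1-based)
            p = pc(s)
            if p <= 0:
                continue
            for b in beams[j]:
                candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        # Sort by probability and truncate to the beam size k.
        candidates.sort(key=lambda c: c.prob, reverse=True)
        beams[i] = candidates[:k]
    return beams[n]


# Invented concept probabilities, for illustration only.
TOY_PC = {
    "new": 0.05, "york": 0.02, "times": 0.03, "square": 0.04,
    "new york": 0.10, "times square": 0.08, "new york times": 0.06,
}


def toy_pc(s: str) -> float:
    return TOY_PC.get(s, 0.0)


best = segment("new york times square".split(), toy_pc, k=3)
print(best[0].segs)  # ('new york', 'times square')
```

With a beam of size k the search stays cheap, at the cost of possibly pruning a prefix that would have led to the best full segmentation.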

Page 25: Scale, Structure, and Semantics

Recommendations Leverage Semi-structured Data

Diagram labels: Corpus Stats, User Base, Filtered

Job: title, geo, company, industry, description, functional area

Candidate:
§  General: expertise, specialties, education, headline, geo, experience
§  Current Position: title, summary, tenure length, industry, functional area, …

Derived:
§  Similarity (candidate expertise, job description): 0.56
§  Similarity (candidate specialties, job description): 0.2
§  Transition probability (candidate industry, job industry): 0.43
§  Title Similarity: 0.8
§  Similarity (headline, title): 0.7
§  …

Matching:
§  Binary: exact matches (geo, industry, …)
§  Soft: transition probabilities, similarity, …
§  Text

Transition probabilities, connectivity, yrs of experience to reach title, education needed for this title, …
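The slide names two kinds of derived feature, text similarity and transition probability, without saying how they are computed. As a rough illustration only, the sketch below assumes bag-of-words cosine similarity for the text features and a simple count-based estimate for industry transitions. The function names and sample data are hypothetical and are not taken from LinkedIn's system.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two texts, using raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def transition_probability(
    transitions: Iterable[Tuple[str, str]], source: str, target: str
) -> float:
    """Fraction of observed moves out of `source` that went to `target`."""
    from_source = [t for s, t in transitions if s == source]
    return from_source.count(target) / len(from_source) if from_source else 0.0


# Invented data: (previous industry, next industry) pairs from career histories.
moves = [
    ("Telecom", "Internet"),
    ("Telecom", "Internet"),
    ("Telecom", "Finance"),
    ("Finance", "Internet"),
]

print(cosine_similarity("machine learning search", "search ranking and machine learning"))
print(transition_probability(moves, "Telecom", "Internet"))
```

Features like these could then feed a ranking model alongside the exact-match features; how the scores are combined is not stated on the slide.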

Page 26: Scale, Structure, and Semantics

Skills: A Practical Knowledge Representation


Page 27: Scale, Structure, and Semantics

Data-Driven Query Expansion for Recall


Page 28: Scale, Structure, and Semantics

Data-Driven Query Refinement for Precision


Page 29: Scale, Structure, and Semantics

There is no perfect schema or vocabulary.

§  And even if there were, not everyone would use it.

§  Knowledge representation has succeeded only within a narrow scope.

§  Brute force is surprisingly effective but does not leverage the user as an intelligent partner.


Page 30: Scale, Structure, and Semantics

Communication is the problem and the solution.

§  Rich communication channel fills gaps in system’s knowledge representation and in user’s knowledge.

§  Use data science to make the system smart, but be humble and empower the human user.

You've got the brawn
I've got the brains
Let's make lots of money

Pet Shop Boys, “Opportunities”

Page 31: Scale, Structure, and Semantics

The Future is Upon Us


Page 32: Scale, Structure, and Semantics

One More Thing

“More data beats clever algorithms but better data beats more data.”

Monica Rogati @ Strata 2012


Page 33: Scale, Structure, and Semantics

Questions?

Contact:

[email protected]

We’re Hiring!

Thank You!
