kevin teh insight presentation

Disambiguating Twitter Search Kevin Teh [email protected] Insight Data Science Fellows Program March 2013 Tuesday, February 26, 13

Upload: kevin-teh

Post on 15-Jun-2015


TRANSCRIPT

Page 1: Kevin teh   insight presentation

Disambiguating Twitter Search

Kevin Teh

[email protected]

Insight Data Science Fellows Program

March 2013

Tuesday, February 26, 13

Page 2: Kevin teh   insight presentation

That’s not the python that I meant...


Page 3: Kevin teh   insight presentation

The solution? cluster-pluck.


Page 4: Kevin teh   insight presentation

cluster-pluck disambiguates Twitter search in real time


Page 5: Kevin teh   insight presentation

It works in Spanish too!


Page 6: Kevin teh   insight presentation


Page 7: Kevin teh   insight presentation

Tools

300,000 Tweets

User

Filter

Word Filter Web Application


Page 8: Kevin teh   insight presentation

Algorithm

read query and download corpus of 1500 tweets

select potentially meaningful words

count words

cluster candidates into groups

assign tweets to clusters

filter out common words

rank remaining words by rate of capitalization and select top 10

rank remaining words by number of occurrences and select top 10

link two candidates if their relative proportion of co-occurrence is greater than 0.25

rank connected components by total occurrences and take top 3
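
The steps on this slide can be sketched in Python roughly as follows. This is a minimal illustration and not the cluster-pluck source. Several details are assumptions because the slide does not specify them: the stop-word list `COMMON_WORDS`, excluding the query word itself from the candidates, reading "rate of capitalization" as the fraction of a word's occurrences that start with an uppercase letter, and reading "relative proportion of co-occurrence" as the number of tweets containing both words divided by the number of tweets containing the rarer word. All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stop-word list; the slides do not say which common words were filtered.
COMMON_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
                "i", "my", "that", "for", "on", "with", "this", "not"}


def tokenize(tweet):
    """Split a tweet into alphabetic tokens, keeping the original case."""
    return re.findall(r"[A-Za-z][A-Za-z']*", tweet)


def select_candidates(tweets, query, top_n=10):
    """Count words, drop common ones, and keep the top words by two rankings."""
    total = Counter()        # occurrences of each word, case-folded
    capitalized = Counter()  # occurrences that start with an uppercase letter
    for tweet in tweets:
        for token in tokenize(tweet):
            word = token.lower()
            # Assumption: the query word appears everywhere, so it is skipped.
            if word in COMMON_WORDS or word == query.lower():
                continue
            total[word] += 1
            if token[0].isupper():
                capitalized[word] += 1
    by_caps = sorted(total, key=lambda w: (capitalized[w] / total[w], total[w]),
                     reverse=True)[:top_n]
    by_count = [w for w, _ in total.most_common(top_n)]
    return set(by_caps) | set(by_count), total


def cluster_candidates(tweets, candidates, counts, threshold=0.25, top_k=3):
    """Link co-occurring candidates and return the largest connected components."""
    containing = defaultdict(set)  # candidate -> indices of tweets containing it
    for i, tweet in enumerate(tweets):
        for word in {t.lower() for t in tokenize(tweet)} & candidates:
            containing[word].add(i)

    parent = {w: w for w in candidates}  # union-find forest over candidates

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(sorted(candidates), 2):
        both = len(containing[a] & containing[b])
        rarer = min(len(containing[a]), len(containing[b]))
        # Assumed definition of "relative proportion of co-occurrence".
        if rarer and both / rarer > threshold:
            parent[find(a)] = find(b)

    components = defaultdict(set)
    for w in candidates:
        components[find(w)].add(w)
    ranked = sorted(components.values(),
                    key=lambda group: sum(counts[w] for w in group),
                    reverse=True)
    return ranked[:top_k]


def assign_tweets(tweets, clusters):
    """Assign each tweet to the cluster whose words it shares the most."""
    assignment = defaultdict(list)
    for tweet in tweets:
        words = {t.lower() for t in tokenize(tweet)}
        overlaps = [len(words & cluster) for cluster in clusters]
        if overlaps and max(overlaps) > 0:
            assignment[overlaps.index(max(overlaps))].append(tweet)
    return assignment
```

A caller would run `select_candidates` on the downloaded tweets, pass the resulting candidate set and counts to `cluster_candidates`, and then hand the clusters to `assign_tweets`. Tweets that share no word with any cluster are left unassigned in this sketch. Other definitions of the co-occurrence proportion (for example, Jaccard overlap) would fit the slide's wording equally well and would change which candidates get linked.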


Page 9: Kevin teh   insight presentation

Kevin Teh

[email protected]

Math PhD -- May ’13

Topic: Noncommutative Geometry (Whatever that is)

B.A.Sc. -- April ’07

Engineering Science (Whatever that is)


Page 10: Kevin teh   insight presentation
