Kevin Teh Insight Presentation
TRANSCRIPT
Disambiguating Twitter Search
Kevin Teh
[email protected]
Insight Data Science Fellows Program
March 2013
Tuesday, February 26, 13
That’s not the python that I meant...
The solution? cluster-pluck.
cluster-pluck disambiguates Twitter search in real time
It works in Spanish too!
Tools
300,000 Tweets
User
Filter
Word Filter Web Application
Algorithm
read query and download corpus of 1500 tweets
select potentially meaningful words
count words
cluster candidates into groups
assign tweets to clusters
filter out common words
rank remaining words by rate of capitalization and select top 10
rank remaining words by number of occurrences and select top 10
link two candidates if their relative proportion of co-occurrence is greater than 0.25
rank connected components by total occurrences and take top 3
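The steps on this slide could be sketched roughly as follows. This is a minimal illustration, not the cluster-pluck source, and it leans on several assumptions the slides do not confirm: the stop-word list COMMON_WORDS is a made-up placeholder, the query word itself is excluded from the candidates, the candidate set is the union of the two top-10 rankings, "relative proportion of co-occurrence" is read as the number of tweets containing both words divided by the number containing the rarer word, and a tweet is assigned to the cluster whose words it shares the most. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; the slides do not say which common words were filtered.
COMMON_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
                "i", "my", "for", "on", "at", "that", "this", "with", "rt"}


def tokenize(tweet):
    # Runs of Unicode letters, so accented (e.g. Spanish) words stay whole.
    return re.findall(r"[^\W\d_]+", tweet)


def select_candidates(tweets, query, top_n=10):
    """Count words, drop common ones, keep the top_n by count and by capitalization rate."""
    total, capitalized = Counter(), Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            word = token.lower()
            # Assumption: the query word is skipped, since it appears in every tweet.
            if word in COMMON_WORDS or word == query.lower():
                continue
            total[word] += 1
            if token[0].isupper():
                capitalized[word] += 1
    by_count = [w for w, _ in total.most_common(top_n)]
    by_caps = sorted(total, key=lambda w: capitalized[w] / total[w], reverse=True)[:top_n]
    # Assumption: the two rankings are merged into one candidate set.
    return set(by_count) | set(by_caps), total


def cluster_candidates(tweets, candidates, total, threshold=0.25, top_k=3):
    """Link co-occurring candidates, then return the top_k connected components."""
    containing = {w: set() for w in candidates}
    for i, tweet in enumerate(tweets):
        for word in {t.lower() for t in tokenize(tweet)}:
            if word in containing:
                containing[word].add(i)

    parent = {w: w for w in candidates}  # union-find forest over candidates

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(sorted(candidates), 2):
        both = len(containing[a] & containing[b])
        rarer = min(len(containing[a]), len(containing[b]))
        # Assumed definition of "relative proportion of co-occurrence".
        if rarer and both / rarer > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in candidates:
        groups.setdefault(find(w), set()).add(w)
    ranked = sorted(groups.values(), key=lambda g: sum(total[w] for w in g), reverse=True)
    return ranked[:top_k]


def assign_tweets(tweets, clusters):
    """Map tweet index -> index of the cluster sharing the most words (assumed rule)."""
    assignment = {}
    for i, tweet in enumerate(tweets):
        words = {t.lower() for t in tokenize(tweet)}
        overlaps = [len(words & cluster) for cluster in clusters]
        if overlaps and max(overlaps) > 0:
            assignment[i] = overlaps.index(max(overlaps))
    return assignment
```

Calling select_candidates on the downloaded tweets, passing its result to cluster_candidates, and then calling assign_tweets mirrors the flow on the slide. Tweets that share no word with any cluster are left unassigned in this sketch.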
Kevin Teh
[email protected]
Math PhD -- May ’13
Topic: Noncommutative Geometry (Whatever that is)
B.A.Sc. -- April ’07
Engineering Science (Whatever that is)