H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge - Xavier Amatriain
Machine Learning to Grow the World's Knowledge
Xavier Amatriain (@xamat)
11/10/2015
Our Mission
“To share and grow the world’s knowledge”
• Millions of questions & answers
• Millions of users
• Thousands of topics
• ...
What we care about
• Demand
• Quality
• Relevance
Data@Quora
Lots of data relations
Complex network propagation effects
Importance of topics & semantics
Machine Learning@Quora
Ranking - Answer ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanation
• well formatted
• ...
Ranking - Answer ranking
How are those dimensions translated into features?
• Features that relate to the text quality itself
• Interaction features (upvotes/downvotes, clicks, comments…)
• User features (e.g. expertise in topic)
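The slide lists the three feature families but not how they are combined. As a minimal sketch, assume each answer is turned into a small feature dict and scored by a logistic model; the `Answer` fields, the feature names, and the weights are all hypothetical and stand in for values that would be learned from labelled data.

```python
import math
from dataclasses import dataclass

@dataclass
class Answer:
    # Hypothetical fields standing in for the three feature families.
    text: str
    upvotes: int
    downvotes: int
    author_topic_expertise: float  # assumed precomputed, in [0, 1]

def answer_features(a: Answer) -> dict[str, float]:
    """Build text-quality, interaction, and user features for one answer."""
    words = a.text.split()
    return {
        # Text quality: log length as a crude proxy for "provides explanation".
        "log_length": math.log1p(len(words)),
        # Text quality: paragraph breaks as a crude proxy for "well formatted".
        "has_paragraphs": 1.0 if "\n\n" in a.text else 0.0,
        # Interaction: smoothed upvote ratio, so answers with few votes are not extreme.
        "upvote_ratio": (a.upvotes + 1) / (a.upvotes + a.downvotes + 2),
        # User: the author's expertise in the question's topic.
        "author_expertise": a.author_topic_expertise,
    }

# Hypothetical weights; a real system would learn these.
WEIGHTS = {"log_length": 0.3, "has_paragraphs": 0.2,
           "upvote_ratio": 1.5, "author_expertise": 1.0}
BIAS = -2.0

def answer_score(a: Answer) -> float:
    """Logistic score in (0, 1); higher means the answer is ranked higher."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in answer_features(a).items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_answers(answers: list[Answer]) -> list[Answer]:
    return sorted(answers, key=answer_score, reverse=True)
```

Keeping features in a named dict makes it easy to add new signals (for example a truthfulness or reusability estimate) without touching the scoring code.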
Ranking - Feed
• Goal: Present the most interesting stories for a user at a given time
• Interesting = topical relevance + social relevance + timeliness
• Stories = questions + answers
• ML: Personalized learning-to-rank approach
• Relevance-ordered vs. time-ordered = big gains in engagement
• Challenges:
  • potentially many candidate stories
  • real-time ranking
  • optimize for relevance
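The "interesting = topical + social + timeliness" definition can be sketched as an additive score and contrasted with plain time ordering. The fixed weights, the 24-hour half-life, and the assumption that topical and social relevance arrive precomputed are all illustrative; the slide describes a learned, personalized learning-to-rank model rather than hand-set weights.

```python
from dataclasses import dataclass

@dataclass
class Story:
    story_id: str
    topical_relevance: float  # assumed precomputed, in [0, 1]
    social_relevance: float   # assumed precomputed, in [0, 1]
    age_hours: float

HALF_LIFE_HOURS = 24.0  # hypothetical decay half-life

def timeliness(age_hours: float) -> float:
    """Exponential decay: 1.0 for a brand-new story, 0.5 after one half-life."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)

def interestingness(s: Story, w_topic: float = 1.0, w_social: float = 1.0,
                    w_time: float = 1.0) -> float:
    # Additive combination mirroring the three terms on the slide.
    return (w_topic * s.topical_relevance
            + w_social * s.social_relevance
            + w_time * timeliness(s.age_hours))

def time_ordered(stories: list[Story]) -> list[Story]:
    """Newest first, ignoring relevance."""
    return sorted(stories, key=lambda s: s.age_hours)

def relevance_ordered(stories: list[Story]) -> list[Story]:
    """Most interesting first; an older but highly relevant story can win."""
    return sorted(stories, key=interestingness, reverse=True)
```

Because timeliness is only one term, a twelve-hour-old story that is highly relevant can outrank a fresh but irrelevant one, which is exactly where time ordering and relevance ordering diverge.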
Feed dataset: impression logs
[Figure: feed stories annotated with logged user actions: click, upvote, downvote, expand, share, answer pass, follow]
What is relevance?
● Value of showing a story to a user, e.g. weighted sum of actions:
$v = \sum_a v_a \, \mathbb{1}\{y_a = 1\}$
● Goal: predict this value for new stories. 2 possible approaches:
○ predict value directly:
$v_{\text{pred}} = f(x)$
■ pros: single regression model
■ cons: can be ambiguous, coupled
○ predict probabilities for each action, then compute expected value:
$v_{\text{pred}} = \mathbb{E}[V \mid x] = \sum_a v_a \, p(a \mid x)$
■ pros: better use of supervised signal, decouples action models from action values
■ cons: more costly, one classifier per action
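The second approach, one classifier per action followed by an expected value, can be sketched in a few lines. The action values $v_a$ and the per-action logistic coefficients below are hypothetical placeholders, not Quora's; the code only shows how $v = \sum_a v_a \mathbb{1}\{y_a = 1\}$ and $\mathbb{E}[V \mid x] = \sum_a v_a \, p(a \mid x)$ fit together.

```python
import math

# Hypothetical action values v_a: positive actions add value, negative ones subtract.
ACTION_VALUES = {"click": 1.0, "upvote": 3.0, "share": 5.0, "downvote": -4.0}

def observed_value(actions_taken: set[str]) -> float:
    """v = sum_a v_a * 1{y_a = 1}: the value realized by one logged impression."""
    return sum(v for a, v in ACTION_VALUES.items() if a in actions_taken)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# One hypothetical logistic-regression classifier per action: (bias, weights).
ACTION_MODELS = {
    "click":    (-1.0, [2.0, 0.5]),
    "upvote":   (-3.0, [1.5, 1.0]),
    "share":    (-4.0, [1.0, 1.0]),
    "downvote": (-3.0, [-1.0, 0.2]),
}

def action_probability(action: str, x: list[float]) -> float:
    """p(a | x) from the classifier trained for action a."""
    bias, w = ACTION_MODELS[action]
    return sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))

def expected_value(x: list[float]) -> float:
    """v_pred = E[V | x] = sum_a v_a * p(a | x)."""
    return sum(v * action_probability(a, x) for a, v in ACTION_VALUES.items())
```

Since the action values live outside the classifiers, re-weighting actions (say, valuing shares more) only changes `ACTION_VALUES` and needs no retraining; that is the decoupling listed as a pro, while the four separate models illustrate the extra cost.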
Feature engineering
● Essential for getting good rankings
● Better if updated in real time (more reactive)
● Main sets of features:
○ user (e.g. age, country, recent activity)
○ story (e.g. popularity, trendiness, quality)
○ interactions between the two (e.g. topic or author affinity)
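To make the three feature groups concrete, here is a sketch of one user-story feature vector. The dict keys (`recent_actions`, `topic_interest`, `author_interactions`, and so on) are hypothetical; topic affinity is computed here as cosine similarity between sparse topic vectors, which is one common choice rather than the method described in the talk.

```python
import math

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def feature_vector(user: dict, story: dict) -> dict[str, float]:
    """Combine user, story, and user-story interaction features (all hypothetical)."""
    return {
        # User feature: recent activity level.
        "user_recent_actions": float(user["recent_actions"]),
        # Story feature: log-scaled popularity.
        "story_upvotes_log": math.log1p(story["upvotes"]),
        # Interaction features: topic affinity and author affinity.
        "topic_affinity": cosine(user["topic_interest"], story["topics"]),
        "author_affinity": user["author_interactions"].get(story["author"], 0.0),
    }
```

Interaction features like these depend on both sides at once, so unlike pure user or story features they cannot be precomputed per entity; that is one reason keeping them updated in real time is harder.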
Models
● Linear
○ simple, fast to train
○ need manual non-linear transforms for a richer representation (buckets, n-grams)
● Decision trees
○ learn non-linear representations
● Tree ensembles
○ Random forests
○ Gradient boosted decision trees
● In-house C++ training code; third-party libraries for prototyping new models
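The "manual, non-linear transforms" needed by linear models can be illustrated with the two examples named on the slide, bucketing and n-grams. The bucket boundaries for an answer-length feature are hypothetical.

```python
import bisect

# Hypothetical bucket boundaries for an "answer length in words" feature.
LENGTH_BOUNDARIES = [10, 50, 200, 1000]

def bucketize(value: float, boundaries: list[float]) -> list[int]:
    """One-hot encode `value` into len(boundaries) + 1 buckets.

    A linear model then learns one weight per bucket, i.e. a piecewise-constant
    (non-linear) function of the raw value.
    """
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[bisect.bisect_right(boundaries, value)] = 1
    return one_hot

def word_ngrams(text: str, n: int = 2) -> list[str]:
    """Word n-grams, a manual transform that lets a linear model see word order."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Tree-based models learn such splits on their own, which is why the slide contrasts them with linear models.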
Scalability: feed backend system
[Figure: feed backend architecture: requests from the web tier (Python), a layer of aggregators (Aggregator 1, 2, 3, ...) and a layer of leaves (Leaf 1, 2, 3, ...); labels: user_id, object_id]
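The diagram shows only the aggregator/leaf topology. A common reading of this pattern is scatter-gather, sketched here under the assumption that each leaf holds a shard of candidate stories with scores and returns its local top k, while an aggregator queries its leaves in parallel and merges the results. The class names mirror the diagram labels; everything else is hypothetical, and Python is used only for illustration (the talk mentions in-house C++ for training code).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

class Leaf:
    """Holds one shard of candidate stories as {object_id: score}."""
    def __init__(self, candidates: dict[str, float]):
        self.candidates = candidates

    def top_k(self, user_id: str, k: int) -> list[tuple[float, str]]:
        # A real leaf would score its candidates for user_id; here the
        # scores are static placeholders and user_id is unused.
        return heapq.nlargest(k, ((s, oid) for oid, s in self.candidates.items()))

class Aggregator:
    """Fans a request out to its leaves in parallel and merges the partial results."""
    def __init__(self, leaves: list[Leaf]):
        self.leaves = leaves

    def top_k(self, user_id: str, k: int) -> list[str]:
        with ThreadPoolExecutor(max_workers=max(1, len(self.leaves))) as pool:
            partials = list(pool.map(lambda leaf: leaf.top_k(user_id, k), self.leaves))
        merged = heapq.nlargest(k, (item for part in partials for item in part))
        return [oid for _, oid in merged]
```

Each leaf only needs to return its local top k: any item in the global top k must also be in the top k of the shard that holds it, so the merge is exact while the data sent to the aggregator stays small.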
Recommendations - Topics
Goal: Recommend new topics for the user to follow
• Based on:
  • Other topics followed
  • Users followed
  • User interactions
  • Topic-related features
  • ...
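As an illustration of the first signal ("other topics followed"), here is a simple co-follow sketch: topics the user does not follow are scored by how often other users follow them together with the user's current topics. This item-to-item counting is an assumed stand-in; the slide does not say which model combines the listed signals.

```python
from collections import Counter
from itertools import combinations

def cofollow_counts(follows: dict[str, set[str]]) -> Counter:
    """Count, for each ordered topic pair (a, b), how many users follow both."""
    counts = Counter()
    for topics in follows.values():
        for a, b in combinations(sorted(topics), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend_topics(user: str, follows: dict[str, set[str]], k: int = 3) -> list[str]:
    """Score unfollowed topics by co-follows with the topics the user already follows."""
    counts = cofollow_counts(follows)
    mine = follows.get(user, set())
    scores = Counter()
    for (a, b), c in counts.items():
        if a in mine and b not in mine:
            scores[b] += c
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:k]]
```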
Recommendations - Users
Goal: Recommend new users to follow
• Based on:
  • Other users followed
  • Topics followed
  • User interactions
  • User-related features
  • ...
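The "other users followed" signal is often approximated by a follows-of-follows count. This sketch assumes a simple `following` adjacency dict and uses only that one signal; the other listed inputs (topics, interactions, user features) would be additional features in a real model.

```python
from collections import Counter

def recommend_users(user: str, following: dict[str, set[str]], k: int = 3) -> list[str]:
    """Rank users followed by the people `user` follows ("follows of follows")."""
    mine = following.get(user, set())
    scores = Counter()
    for followee in mine:
        for candidate in following.get(followee, set()):
            # Skip the user themself and anyone they already follow.
            if candidate != user and candidate not in mine:
                scores[candidate] += 1
    # Break ties alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [u for u, _ in ranked[:k]]
```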
Related Questions
• Given interest in question A (the source), what other questions will be interesting?
• Not only about similarity, but also “interestingness”
• Features such as:
  • Textual
  • Co-visit
  • Topics
  • …
• Important for the logged-out use case
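The point that relatedness is "not only about similarity" can be shown by blending two of the listed feature types. In this sketch, textual similarity is a bag-of-words cosine and co-visits come from a hypothetical `covisits` dict keyed by (source, candidate); the blend weights are assumptions, not learned values.

```python
import math
from collections import Counter

def text_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words term counts."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(c * tb[t] for t, c in ta.items())
    na = math.sqrt(sum(c * c for c in ta.values()))
    nb = math.sqrt(sum(c * c for c in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

def related_questions(source: str, candidates: list[str],
                      covisits: dict[tuple[str, str], int],
                      w_text: float = 1.0, w_covisit: float = 0.5,
                      k: int = 3) -> list[str]:
    """Blend textual similarity with a log-scaled co-visit count."""
    def score(q: str) -> float:
        return (w_text * text_cosine(source, q)
                + w_covisit * math.log1p(covisits.get((source, q), 0)))
    return sorted((q for q in candidates if q != source), key=score, reverse=True)[:k]
```

A frequently co-visited question can outrank a near-paraphrase, which captures "interestingness" beyond textual similarity; a near-paraphrase is also more likely to be a duplicate than a useful next read.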
Duplicate Questions
• Important issue for Quora
• Want to make sure we don’t disperse knowledge across duplicates of the same question
• Solution: binary classifier trained with labelled data
• Features:
  • Textual vector space models
  • Usage-based features
  • ...
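The slide does not name the classifier. As an illustration, assume logistic regression over two hypothetical textual features (word Jaccard overlap and relative length difference), trained by plain gradient descent on a tiny labelled set. A production system would use richer vector-space and usage-based features and far more data.

```python
import math

def pair_features(q1: str, q2: str) -> list[float]:
    """Hypothetical textual features for a question pair, plus a bias term."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0
    len_diff = abs(len(a) - len(b)) / max(len(a), len(b), 1)
    return [1.0, jaccard, len_diff]  # leading 1.0 multiplies the bias weight

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X: list[list[float]], y: list[int],
                   lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Batch gradient descent on the average logistic (log) loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def p_duplicate(w: list[float], q1: str, q2: str) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, pair_features(q1, q2))))

# Toy labelled pairs (1 = duplicate), made up for illustration.
pairs = [
    ("how do i learn python", "how can i learn python", 1),
    ("what is machine learning", "what is machine learning exactly", 1),
    ("best way to learn guitar", "what is the best way to learn guitar", 1),
    ("how do i learn python", "what is the capital of france", 0),
    ("what is machine learning", "how tall is mount everest", 0),
    ("best way to learn guitar", "why is the sky blue", 0),
]
X = [pair_features(q1, q2) for q1, q2, _ in pairs]
y = [label for _, _, label in pairs]
w = train_logistic(X, y)
```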
User Trust/Expertise Inference
Goal: Infer a user’s trustworthiness in relation to a given topic
• We take into account:
  • Answers written on the topic
  • Upvotes/downvotes received
  • Endorsements
  • ...
• Trust/expertise propagates through the network
• Must be taken into account by other algorithms
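One way to let "trust/expertise propagate through the network" is a personalized-PageRank-style iteration. This is an assumed technique, not one the slide names: each user's direct topic evidence (answers, upvotes) forms a prior, and endorsements pass a damped share of the endorser's trust to the endorsed user. The graph, prior, and damping factor are hypothetical, and the prior must have positive total mass.

```python
def propagate_trust(endorsements: dict[str, list[str]], prior: dict[str, float],
                    damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Personalized-PageRank-style propagation of topic expertise.

    endorsements[u] lists the users that u endorses. `prior` holds each user's
    direct, topic-specific evidence and is normalized to sum to 1.
    """
    users = (set(prior) | set(endorsements)
             | {v for vs in endorsements.values() for v in vs})
    total = sum(prior.values())
    base = {u: prior.get(u, 0.0) / total for u in users}
    trust = dict(base)
    for _ in range(iterations):
        # Teleport step: every user keeps (1 - damping) of their prior.
        nxt = {u: (1 - damping) * base[u] for u in users}
        for u in users:
            targets = endorsements.get(u, [])
            if targets:
                # Split u's damped trust evenly among the users u endorses.
                share = damping * trust[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Users who endorse nobody return their mass via the prior.
                for v in users:
                    nxt[v] += damping * trust[u] * base[v]
        trust = nxt
    return trust
```

Total trust stays at 1 on every iteration, so scores are comparable across topics; a user endorsed by trusted users ends up above one with the same direct evidence but no endorsements.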
Trending Topics
Goal: Highlight current events that are interesting for the user
• We take into account:
  • Global “Trendiness”
  • Social “Trendiness”
  • User’s interest
  • ...
• Trending topics are a great discovery mechanism
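A sketch of combining the three listed signals, assuming "trendiness" is measured as a smoothed lift of recent activity over a baseline, globally and within the user's own network. The `TopicActivity` fields, smoothing constant, and weights are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TopicActivity:
    name: str
    global_recent: float    # activity across all users in a recent window
    global_baseline: float  # typical activity for a window of the same length
    social_recent: float    # same, restricted to people the user follows
    social_baseline: float
    user_interest: float    # assumed precomputed affinity in [0, 1]

def lift(recent: float, baseline: float, smoothing: float = 5.0) -> float:
    """Smoothed ratio of recent to baseline activity (1.0 means no change).

    The smoothing term stops tiny topics (2 mentions instead of 1) from
    looking like large spikes.
    """
    return (recent + smoothing) / (baseline + smoothing)

def trending_for_user(topics: list[TopicActivity], k: int = 3,
                      w_global: float = 1.0, w_social: float = 1.0,
                      w_interest: float = 1.0) -> list[str]:
    def score(t: TopicActivity) -> float:
        return (w_global * lift(t.global_recent, t.global_baseline)
                + w_social * lift(t.social_recent, t.social_baseline)
                + w_interest * t.user_interest)
    return [t.name for t in sorted(topics, key=score, reverse=True)[:k]]
```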
Spam Detection/Moderation
• Very important for Quora to keep content quality high
• Purely manual approaches do not scale
• Hard to get algorithms 100% right
• ML algorithms detect content/user issues
• The algorithms’ output feeds manually curated moderation queues
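The hand-off from classifier to human moderators can be sketched as a priority queue. This assumes the classifier emits a spam probability, items below a hypothetical threshold are not queued, and reviewers see the most suspicious items first. Nothing is removed automatically, matching the point that algorithms are hard to get 100% right.

```python
import heapq
from itertools import count
from typing import Optional

class ModerationQueue:
    """Human-review queue fed by a spam/quality classifier (hypothetical threshold)."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self._heap: list[tuple[float, int, str]] = []
        self._tiebreak = count()  # FIFO order among items with equal scores

    def submit(self, content_id: str, spam_probability: float) -> bool:
        """Enqueue content whose predicted spam probability reaches the threshold."""
        if spam_probability < self.threshold:
            return False
        # heapq is a min-heap, so negate the score to pop the most suspicious item first.
        heapq.heappush(self._heap, (-spam_probability, next(self._tiebreak), content_id))
        return True

    def next_for_review(self) -> Optional[str]:
        """Return the next content_id for a human moderator, or None if empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```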
Content Creation Prediction
• Quora’s algorithms optimize not only for the probability of reading
• Also important to predict the probability of a user answering a question
• Parts of our system rely entirely on that prediction
• E.g. A2A (Ask to Answer) suggestions
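An A2A suggestion list can be read as "rank candidate answerers by predicted probability of answering". This sketch assumes an already trained logistic model with hypothetical coefficients and features (topic expertise, recent answering activity, and open A2A requests as a fatigue signal); the talk does not describe the actual model.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    user_id: str
    topic_expertise: float     # assumed in [0, 1]
    answers_last_30d: int
    pending_a2a_requests: int  # open requests already sent to this user

# Hypothetical coefficients of an already trained logistic model.
COEF = {"bias": -3.0, "expertise": 2.5, "activity": 0.8, "pending": -0.4}

def p_answer(c: Candidate) -> float:
    """Predicted probability that the candidate writes an answer if asked."""
    z = (COEF["bias"]
         + COEF["expertise"] * c.topic_expertise
         + COEF["activity"] * math.log1p(c.answers_last_30d)
         + COEF["pending"] * c.pending_a2a_requests)
    return 1.0 / (1.0 + math.exp(-z))

def a2a_suggestions(candidates: list[Candidate], k: int = 2) -> list[str]:
    """Suggest the k users most likely to actually write an answer."""
    return [c.user_id for c in sorted(candidates, key=p_answer, reverse=True)[:k]]
```

Optimizing for probability of answering rather than expertise alone means a top expert who is already swamped with requests may rank below a slightly less expert but active user.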
Models
Models
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● (Deep) Neural Networks
● LambdaMART
● Matrix Factorization
● LDA
● ...
Experimentation
Experimentation
⚫ Extensive A/B testing, data-driven decision-making
⚫ Separate, orthogonal “layers” for different parts of the system
⚫ Experiment framework showing comparisons for various metrics
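The slide does not say how the orthogonal layers are implemented. A common approach, assumed here, is to hash the user ID together with a per-layer salt, so each layer's assignment is deterministic for a given user but statistically independent of the others. The layer names and 50/50 split are hypothetical.

```python
import hashlib

def bucket(user_id: str, layer: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket within one experiment layer.

    Salting the hash with the layer name makes assignments in different layers
    (approximately) independent, so experiments in separate layers don't confound.
    """
    digest = hashlib.sha256(f"{layer}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

def assign(user_id: str, layer: str, arms: dict[str, int]) -> str:
    """Pick an arm given per-arm bucket allocations that sum to 100."""
    b = bucket(user_id, layer)
    upper = 0
    for arm, share in arms.items():
        upper += share
        if b < upper:
            return arm
    raise ValueError("arm allocations must sum to 100")
```

A user lands in the same arm on every request within a layer, while their feed-ranking arm reveals nothing about their arm in another layer, so one experiment per layer can run on the full user population at once.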
Conclusions
Conclusions
• At Quora we have not only Big, but also “rich” data
• Our algorithms need to understand and optimize complex aspects such as quality, interestingness, or user expertise
• We believe ML will be one of the keys to our success
• We have many interesting problems, and many unsolved challenges