Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16

Upload: mlconf

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16

Sharing and growing the world's knowledge with machine learning

Lei Yang ([email protected])

April 2016

Page 2:

Our mission

“To share and grow the world’s knowledge”

● Millions of questions & answers

● Millions of users

● Thousands of topics

● ...

Page 3:

What we care about

● Demand

● Quality

● Relevance

Page 4:

Data@Quora

Page 5:

[Diagram of data entities: Topic, Question, User, Answer, Actions]

Page 6:

Lots of data relations

Page 7:

Complex network propagation effects

Page 8:

Importance of topics & semantics

Page 9:

Machine Learning@Quora

Page 10:

Ranking - Answer ranking

What is a good Quora answer?

● Truthful

● Reusable

● Provides explanation

● Well formatted

● ...

Page 11:

Ranking - Answer ranking

How are those criteria translated into features?

● Features that relate to the text quality itself

● Interaction features (upvotes/downvotes, clicks, comments…)

● User features (e.g. expertise in topic)
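As a rough illustration of how the three feature groups above could be combined into one ranking score, here is a minimal sketch. The feature names, the weights, and the logistic form are all assumptions made for the example; the slides do not say which model or features Quora actually uses for answer ranking.

```python
import math

# Hypothetical weights for illustration; a real system would learn them from data.
WEIGHTS = {
    "text_quality": 1.5,      # text-quality feature
    "net_upvotes": 0.8,       # interaction feature
    "author_expertise": 1.2,  # user feature: expertise in the topic
}
BIAS = -1.0

def answer_score(features):
    """Combine the features with a logistic function into a score in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_answers(answers):
    """Return answer ids sorted from highest to lowest score."""
    return sorted(answers, key=lambda a: answer_score(answers[a]), reverse=True)

answers = {
    "a1": {"text_quality": 0.9, "net_upvotes": 2.0, "author_expertise": 0.7},
    "a2": {"text_quality": 0.2, "net_upvotes": 0.1, "author_expertise": 0.1},
}
ranking = rank_answers(answers)
```

In this toy setup the well-written, upvoted answer from a knowledgeable author ranks first.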

Page 12:

Ranking - Feed

Present the most interesting stories for a user at a given time

● Interesting = topical relevance + social relevance + timeliness

● Stories = questions + answers

● Personalized learning-to-rank approach

● Relevance-ordered (vs. time-ordered) feed = big gains in engagement

● Challenges

○ Potentially many candidate stories

○ Real-time ranking

○ Objective function
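The "interesting = topical relevance + social relevance + timeliness" idea on this slide can be sketched as a weighted sum with a time decay. The weights, the 24-hour half-life, and the function names below are assumptions chosen for illustration, not values from the talk.

```python
def timeliness(age_hours, half_life_hours=24.0):
    """Exponential decay: 1.0 for a brand-new story, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)

def interest_score(topical, social, age_hours, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of topical relevance, social relevance, and timeliness.

    `topical` and `social` are assumed to be relevance values in [0, 1].
    """
    w_topic, w_social, w_time = weights
    return w_topic * topical + w_social * social + w_time * timeliness(age_hours)

fresh = interest_score(topical=0.8, social=0.5, age_hours=1.0)
stale = interest_score(topical=0.8, social=0.5, age_hours=72.0)
```

With identical relevance, the fresher story scores higher, which is the behaviour a relevance-ordered feed would need to balance against pure recency.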

Page 13:

Ranking - Feed

● Personalized LTR model

● Features

○ Quality of question/answer

○ Topics the user is interested in or knows about

○ Users the user is following

○ What is trending/popular

○ ...

● Different temporal windows

● Multi-stage solution with different “streams”
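One plausible reading of a multi-stage solution with "streams" is a cheap candidate-gathering stage followed by a more expensive ranking stage. The sketch below assumes that reading; the stream names, the per-stream limit, and the score table are hypothetical.

```python
import heapq

def gather_candidates(streams, per_stream_limit):
    """Stage 1: take a bounded number of stories from each stream, deduplicated."""
    seen, candidates = set(), []
    for stories in streams.values():
        for story in stories[:per_stream_limit]:
            if story not in seen:
                seen.add(story)
                candidates.append(story)
    return candidates

def rank(candidates, score, k):
    """Stage 2: apply the (more expensive) scoring function and keep the top k."""
    return heapq.nlargest(k, candidates, key=score)

# Hypothetical streams, each a list of story ids in stream-specific order.
streams = {
    "followed_topics": ["q1", "q2", "q3"],
    "followed_users": ["q2", "q4"],
    "trending": ["q5", "q1"],
}
scores = {"q1": 0.4, "q2": 0.9, "q4": 0.7, "q5": 0.1}  # stand-in for a ranking model

candidates = gather_candidates(streams, per_stream_limit=2)
top = rank(candidates, score=lambda s: scores.get(s, 0.0), k=2)
```

Bounding each stream keeps the candidate set small enough for real-time ranking, which is one of the challenges listed for the feed.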

Page 14:

Recommendations - Topics

Recommend new topics for the user to follow, based on:

● Topics you already follow

● Users you already follow

● Interactions with questions/answers

● Topic-related features

● ...
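To make the "topics you already follow" signal concrete, here is a small co-follow sketch: topics followed by users with overlapping interests are suggested first. This is a simple neighbourhood heuristic chosen for illustration, not the method described in the talk, and the data is made up.

```python
from collections import Counter

def recommend_topics(user, follows, k=3):
    """Suggest topics followed by users who share at least one topic with `user`."""
    mine = follows[user]
    counts = Counter()
    for other, topics in follows.items():
        if other == user:
            continue
        overlap = len(mine & topics)
        if overlap == 0:
            continue
        # Weight each new topic by how similar the other user's follows are.
        for topic in topics - mine:
            counts[topic] += overlap
    return [topic for topic, _ in counts.most_common(k)]

follows = {
    "alice": {"ml", "python"},
    "bob": {"ml", "python", "statistics"},
    "carol": {"ml", "cooking"},
    "dave": {"travel"},
}
suggestions = recommend_topics("alice", follows)
```

The same pattern would apply to user recommendations by swapping followed topics for followed users.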

Page 15:

Recommendations - Users

Recommend new users for the user to follow, based on:

● Users you already follow

● Topics you already follow

● Interactions with users

● User-related features

● ...

Page 16:

Related questions

Given interest in a question, what other questions are interesting?

● Not only about similarity, but also “interestingness”

● Features such as:

○ Textual

○ Co-visit

○ Topics

○ …

● Important for logged-out use case
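The co-visit feature mentioned above can be illustrated by counting how often two questions are viewed in the same session. The session data and the ranking by raw count are assumptions for the example; a production system would presumably combine this with the textual and topic features.

```python
from collections import Counter
from itertools import combinations

def covisit_counts(sessions):
    """Count how often each pair of questions is viewed in the same session."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts

def related(question, counts, k=2):
    """Return the k questions most often co-visited with `question`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == question:
            scores[b] += n
        elif b == question:
            scores[a] += n
    return [q for q, _ in scores.most_common(k)]

sessions = [["q1", "q2", "q3"], ["q1", "q2"], ["q2", "q4"]]
counts = covisit_counts(sessions)
related_to_q1 = related("q1", counts)
```

Co-visit counts need no login, which fits the logged-out use case noted on the slide.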

Page 17:

Duplicate questions

● Important issue for Quora

○ Want to make sure we don’t disperse knowledge across duplicates of the same question

● Binary classifier trained with labelled data

● Features

○ Textual vector space models

○ Usage-based features

○ ...
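A textual vector space feature for duplicate detection can be as simple as the cosine similarity between bag-of-words vectors. The sketch below computes that one feature and applies a fixed threshold as a stand-in for the trained binary classifier the slide describes; the tokenizer and the 0.8 threshold are assumptions.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector: lowercase alphanumeric tokens mapped to counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def looks_duplicate(q1, q2, threshold=0.8):
    """Threshold on a single feature; a trained classifier would replace this."""
    return cosine(bow(q1), bow(q2)) >= threshold

same = looks_duplicate("How do I learn Python?", "How do I learn Python quickly?")
different = looks_duplicate("How do I learn Python?", "What is the capital of France?")
```

In practice this similarity would be one input among several, alongside the usage-based features, to a classifier trained on labelled pairs.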

Page 18:

User expertise inference

Infer a user’s trustworthiness in relation to a given topic

● We take into account:

○ Answers written on topic

○ Upvotes/downvotes received

○ Endorsements

○ ...

● Trust/expertise propagates through the network

● Useful as input/features in other models
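The statement that trust propagates through the network can be sketched with a PageRank-style iteration over endorsements: each user keeps part of a base score and passes a share of their current trust to the users they endorse. The update rule, the damping value, and the data are assumptions for illustration, not Quora's algorithm.

```python
def propagate_trust(endorsements, base, damping=0.5, iterations=20):
    """Iteratively spread trust along endorsement edges.

    endorsements: dict mapping an endorser to the list of users they endorse.
    base: dict mapping every user to a base score (e.g. from answers and votes).
    """
    trust = dict(base)
    for _ in range(iterations):
        # Every user keeps a fraction of their own base score.
        updated = {user: (1.0 - damping) * base[user] for user in base}
        # Each endorser splits a fraction of their current trust among endorsees.
        for endorser, endorsed in endorsements.items():
            if not endorsed:
                continue
            share = damping * trust[endorser] / len(endorsed)
            for user in endorsed:
                updated[user] += share
        trust = updated
    return trust

base = {"expert": 1.0, "a": 0.2, "b": 0.2}
endorsements = {"expert": ["a"]}
trust = propagate_trust(endorsements, base)
```

Users "a" and "b" start with the same base score, but "a" ends up with more trust because a highly trusted user endorsed them.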

Page 19:

Spam detection and moderation

● Very important for Quora to keep quality of content

● Pure manual approaches do not scale

● Hard to get algorithms 100% right

● ML algorithms detect content/user issues

○ Output of the algorithms feeds manually curated moderation queues
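The hand-off from algorithm to human moderators might look like the sketch below: a model scores each item, and items above a review threshold are queued for manual review, most suspicious first. The keyword-based scorer is a toy stand-in for a trained model, and the threshold is an assumption.

```python
def build_moderation_queue(items, spam_score, review_threshold=0.5):
    """Return items whose score meets the threshold, most suspicious first."""
    scored = [(spam_score(item), item) for item in items]
    flagged = [(score, item) for score, item in scored if score >= review_threshold]
    flagged.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in flagged]

def toy_spam_score(text):
    """Stand-in for a trained model: fraction of tokens that look suspicious."""
    suspicious = {"free", "click", "buy"}
    tokens = text.lower().split()
    return sum(t in suspicious for t in tokens) / len(tokens) if tokens else 0.0

items = [
    "buy free pills click here",
    "what is gradient boosting",
    "click here",
]
queue = build_moderation_queue(items, toy_spam_score)
```

Nothing is removed automatically here; the model only prioritizes what humans look at, which matches the point that algorithms are hard to get 100% right.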

Page 20:

Content creation prediction

● Quora’s algorithms do not optimize only for probability of reading

● Important to predict probability of a user answering a question

● Some product features completely rely on that prediction

○ E.g. A2A (ask to answer) suggestions
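An A2A suggestion list could be produced by ranking candidate users by their predicted probability of answering. The probability function below (topic overlap times past answer rate) is a toy stand-in for a trained model, and all field names are hypothetical.

```python
def answer_probability(user, question_topics):
    """Toy stand-in for a trained model: topic overlap times past answer rate."""
    if not question_topics:
        return 0.0
    overlap = len(user["topics"] & question_topics) / len(question_topics)
    return overlap * user["answer_rate"]

def suggest_a2a(users, question_topics, k=2):
    """Return the k user names most likely to answer the question."""
    ranked = sorted(
        users,
        key=lambda name: answer_probability(users[name], question_topics),
        reverse=True,
    )
    return ranked[:k]

users = {
    "ann": {"topics": {"ml", "python"}, "answer_rate": 0.5},
    "ben": {"topics": {"ml"}, "answer_rate": 0.9},
    "cat": {"topics": {"cooking"}, "answer_rate": 0.9},
}
suggested = suggest_a2a(users, {"ml", "python"})
```

The ranking depends entirely on the predicted probability, which is why a feature like A2A relies so heavily on that prediction being good.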

Page 21:

Trending topics

Highlight current events that are interesting to the user

● We take into account:

○ Global “Trendiness”

○ Social “Trendiness”

○ User’s interest

○ ...

● Trending topics are a great discovery mechanism
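The three signals listed above (global trendiness, social trendiness, and the user's interest) could be combined as in the following sketch. Measuring trendiness as a smoothed ratio of recent activity to a baseline, and multiplying the signals together, are assumptions made for the example; the talk does not specify either.

```python
def trendiness(recent, baseline, smoothing=1.0):
    """Smoothed ratio of recent activity to the usual level; > 1 is above normal."""
    return (recent + smoothing) / (baseline + smoothing)

def trending_score(stats, interest):
    """Combine global trendiness, social trendiness, and the user's interest."""
    global_part = trendiness(stats["global_recent"], stats["global_baseline"])
    social_part = trendiness(stats["social_recent"], stats["social_baseline"])
    return interest * global_part * social_part

# Hypothetical activity counts: overall, and among the people the user follows.
topics = {
    "election": {"global_recent": 99, "global_baseline": 9,
                 "social_recent": 4, "social_baseline": 1},
    "gardening": {"global_recent": 10, "global_baseline": 10,
                  "social_recent": 1, "social_baseline": 1},
}
interest = {"election": 0.5, "gardening": 0.9}

scores = {name: trending_score(stats, interest[name]) for name, stats in topics.items()}
```

Here a topic the user cares about less can still surface when it is spiking both globally and among the people they follow.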

Page 22:

Models & Experimentation

Page 23:

Models

● Logistic Regression

● Elastic Nets

● Gradient Boosted Decision Trees

● Random Forests

● (Deep) Neural Networks

● LambdaMART

● Matrix Factorization

● LDA

● ...

Page 24:

Open source project -- QMF

Quora Matrix Factorization

https://github.com/quora/qmf

● Currently BPR and WALS

● Multithreaded implementation in C++14
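For readers unfamiliar with BPR (Bayesian Personalized Ranking), the sketch below shows one stochastic gradient step of the standard BPR objective for a (user, preferred item, other item) triple. It is a plain-Python illustration of the textbook update, not QMF's C++ implementation, and the learning rate, regularization, and starting vectors are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_step(p_u, q_i, q_j, lr=0.1, reg=0.01):
    """One in-place SGD step; item i is preferred over item j by user u.

    Maximizes ln(sigmoid(x_uij)) minus L2 regularization, where
    x_uij = p_u . q_i - p_u . q_j. Returns x_uij from before the update.
    """
    x_uij = dot(p_u, q_i) - dot(p_u, q_j)
    g = 1.0 - sigmoid(x_uij)  # derivative of ln(sigmoid(x)) with respect to x
    for f in range(len(p_u)):
        pu, qi, qj = p_u[f], q_i[f], q_j[f]
        p_u[f] += lr * (g * (qi - qj) - reg * pu)
        q_i[f] += lr * (g * pu - reg * qi)
        q_j[f] += lr * (-g * pu - reg * qj)
    return x_uij

p_u, q_i, q_j = [0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]
initial = dot(p_u, q_i) - dot(p_u, q_j)
for _ in range(200):
    bpr_step(p_u, q_i, q_j)
final = dot(p_u, q_i) - dot(p_u, q_j)
```

After repeated steps the preferred item scores higher than the other item for this user, which is the pairwise ordering BPR optimizes.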

Page 25:

ML platform

● Allow ML Engineers and Data Scientists to collaborate within the same ML framework

● Easy integration with well known tools and open source libraries

● Offline evaluation and debugging

● User friendly Python frontend

● High performance and scalable C++/CUDA backend

[Platform diagram labels: Redshift, MySQL, S3, Python User Interface, Trainer Box, Session, CPU, GPU, Disk, WALS, BPR, ...]

Page 26:

Experimentation

● Extensive A/B testing, data-driven decision-making

● Separate, orthogonal “layers” for different parts of the system

● Experiment framework showing comparisons for various metrics
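A common way to get orthogonal experiment layers is to hash the user id together with a layer name, so that assignment in one layer is independent of assignment in another. The sketch below assumes that approach; the layer names, bucket count, and 50/50 split are hypothetical and not taken from the talk.

```python
import hashlib

def bucket(user_id, layer, num_buckets=100):
    """Deterministically map a user to a bucket within a given layer."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def variant(user_id, layer, treatment_percent=50):
    """Assign the user to treatment or control for this layer's experiment."""
    return "treatment" if bucket(user_id, layer) < treatment_percent else "control"

# The same user always gets the same assignment within a layer...
feed_variant = variant(42, "feed_ranking")
# ...while a different layer hashes independently.
email_variant = variant(42, "email_digest")
```

Because each layer uses its own hash input, roughly half of the users in one layer's treatment group land in the other layer's treatment group, so the experiments do not systematically confound each other.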

Page 27:

Conclusions

Page 28:

Conclusions

● At Quora we have not only Big, but also “rich” data

● Our algorithms need to understand and optimize complex aspects such as quality, interestingness, relevance, or user expertise

● We believe ML will be one of the keys to our success

● We have many interesting problems, and many unsolved challenges

Page 29:

We are hiring! www.quora.com/careers