Efficient crowd-sourcing

David Karger, Sewoong Oh, Devavrat Shah (MIT + UIUC)

Post on 24-Feb-2016


Page 1: Efficient crowd-sourcing

David Karger Sewoong Oh Devavrat Shah

MIT + UIUC

Efficient crowd-sourcing

Page 2: Efficient crowd-sourcing

A classical example

o A patient is asked: rate your pain on a scale of 1-10
  o Medical student gets answer: 5
  o Intern gets answer: 8
  o Fellow gets answer: 4.5
  o Doctor gets answer: 6

o So what is the “right” amount of pain?

o Crowd-sourcing
  o Pain of patient = task
  o Answer of patient = completion of task by a worker

Page 3: Efficient crowd-sourcing

Contemporary example

Page 4: Efficient crowd-sourcing

Contemporary example

Page 5: Efficient crowd-sourcing

Contemporary example

Page 6: Efficient crowd-sourcing

Contemporary example

o Goal: reliably estimate the tasks at minimal cost
o Key operational questions:
  o Task assignment
  o Inferring the “answers”

Page 7: Efficient crowd-sourcing

Model à la Dawid and Skene ’79

o N tasks
  o Denote by $t_1, t_2, \dots, t_N$ the “true” values, each in $\{1, \dots, K\}$

o M workers
  o Denote by $w_1, w_2, \dots, w_M$ the workers, each with a “confusion” matrix
  o Worker j: confusion matrix $P^j = [P^j_{kl}]$
  o Worker j answers l on a task with true value k with probability $P^j_{kl}$

o Binary symmetric case
  o K = 2: each task takes value +1 or -1
  o Worker j answers correctly with probability $p_j$
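The binary symmetric case can be simulated in a few lines. This is a minimal sketch, not the authors' code: the function name `simulate_answers`, the dictionary layout keyed by (task, worker), and the reliabilities 1.0, 0.5 and 0.9 are all assumptions made for illustration.

```python
import random

def simulate_answers(true_values, reliabilities, edges, rng):
    """For each assigned (task i, worker j), return t_i with prob. p_j, else -t_i."""
    answers = {}
    for i, j in edges:
        correct = rng.random() < reliabilities[j]
        answers[(i, j)] = true_values[i] if correct else -true_values[i]
    return answers

rng = random.Random(0)
true_values = [rng.choice([+1, -1]) for _ in range(5)]   # t_1..t_N (illustrative)
reliabilities = [1.0, 0.5, 0.9]                          # p_1..p_M (illustrative)
edges = [(i, j) for i in range(5) for j in range(3)]     # here: every worker sees every task
answers = simulate_answers(true_values, reliabilities, edges, rng)
```

A worker with reliability 0.5 answers like a coin flip and carries no information, while a worker with reliability 1.0 always reports the true value.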

Page 8: Efficient crowd-sourcing

Model a la Dawid and Skene ‘79

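[Figure: bipartite graph of tasks t1, t2, …, tN and workers w1, w2, …, wM; the edge between task i and worker j is labeled with the answer Aij, e.g. A11, AN2, A2M]

o Binary tasks:
o Worker reliability:

o Necessary assumption: we know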

Page 9: Efficient crowd-sourcing

Question

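o Goal: given N tasks
  o To obtain the answers correctly w.p. at least 1-ε
  o What is the minimal number of questions (edges) needed?
  o How to assign them, and how to infer task values?

[Figure: bipartite graph of tasks t1, t2, …, tN and workers w1, w2, …, wM; the edge between task i and worker j is labeled with the answer Aij, e.g. A11, AN2, A2M]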

Page 10: Efficient crowd-sourcing

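Task assignment

o Task assignment graph
  o Random regular graph
  o Or, regular graph with large girth

[Figure: bipartite graph of tasks t1, t2, …, tN and workers w1, w2, …, wM; the edge between task i and worker j is labeled with the answer Aij, e.g. A11, AN2, A2M]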
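One common way to draw a random regular bipartite assignment graph is the configuration model: give every task and every worker a fixed number of edge "stubs", then match the stubs at random. The sketch below assumes that construction (the slides do not say how the graph is sampled). The name `random_regular_bipartite` and the retry-on-duplicate rule are illustrative choices.

```python
import random

def random_regular_bipartite(n_tasks, n_workers, task_deg, worker_deg, rng, max_tries=1000):
    """Sample (task, worker) edges via the configuration model, rejecting multi-edges."""
    if n_tasks * task_deg != n_workers * worker_deg:
        raise ValueError("edge counts on the two sides must match")
    # One stub per edge endpoint: task i appears task_deg times.
    task_stubs = [i for i in range(n_tasks) for _ in range(task_deg)]
    for _ in range(max_tries):
        worker_stubs = [j for j in range(n_workers) for _ in range(worker_deg)]
        rng.shuffle(worker_stubs)
        edges = set(zip(task_stubs, worker_stubs))
        if len(edges) == len(task_stubs):   # no (task, worker) pair was drawn twice
            return sorted(edges)
    raise RuntimeError("no simple graph found; increase max_tries")

edges = random_regular_bipartite(20, 20, 3, 3, random.Random(1))
```

Every task ends up with exactly `task_deg` workers and every worker with exactly `worker_deg` tasks, so the per-task budget is fixed in advance.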

Page 11: Efficient crowd-sourcing

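Inferring answers

o Majority:

o Oracle:

[Figure: bipartite graph of tasks t1, t2, …, tN and workers w1, w2, …, wM; the edge between task i and worker j is labeled with the answer Aij, e.g. A11, AN2, A2M]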
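The formulas for the two baselines did not survive in the transcript. The sketch below uses their textbook forms as an assumption: majority voting takes the sign of the sum of answers, and the oracle is taken to know each worker's reliability and to weight answers by the log-odds log(p/(1-p)). Function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def sign(x):
    return 1 if x >= 0 else -1   # ties go to +1 (arbitrary choice)

def majority_vote(answers, n_tasks):
    """Estimate each task as the sign of the unweighted sum of its answers."""
    totals = defaultdict(float)
    for (i, j), a in answers.items():
        totals[i] += a
    return [sign(totals[i]) for i in range(n_tasks)]

def oracle_vote(answers, n_tasks, reliabilities):
    """Assumed oracle: weight worker j's answer by log(p_j / (1 - p_j))."""
    totals = defaultdict(float)
    for (i, j), a in answers.items():
        p = min(max(reliabilities[j], 1e-9), 1 - 1e-9)   # clip to avoid log(0)
        totals[i] += a * math.log(p / (1 - p))
    return [sign(totals[i]) for i in range(n_tasks)]

# Toy data: one reliable worker (0) is outvoted by two near-random workers (1, 2).
answers = {(0, 0): +1, (0, 1): -1, (0, 2): -1}
by_majority = majority_vote(answers, 1)
by_oracle = oracle_vote(answers, 1, [0.99, 0.55, 0.55])
```

On this toy input the two rules disagree: majority follows the two weak workers, while the reliability-weighted vote follows the single strong one.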

Page 12: Efficient crowd-sourcing

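Inferring answers

o Majority:

o Oracle:

o Our Approach:

[Figure: bipartite graph of tasks t1, t2, …, tN and workers w1, w2, …, wM; the edge between task i and worker j is labeled with the answer Aij, e.g. A11, AN2, A2M]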

Page 13: Efficient crowd-sourcing

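Inferring answers

o Iteratively learn
o Message-passing
  o O(# edges) operations
o Approximation of Maximum Likelihood

[Figure: bipartite graph of tasks t1, t2, …, tN and workers w1, w2, …, wM; the edge between task i and worker j is labeled with the answer Aij, e.g. A11, AN2, A2M]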
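The slide does not spell out the update rule. The sketch below assumes the form used in the authors' published iterative algorithm: each task-to-worker message sums the answer-weighted messages from the task's other workers, each worker-to-task message does the same over the worker's other tasks, and the final estimate is the sign of the weighted vote. The names `kos_iterate`, `x` and `y` are hypothetical, and the random initialisation around 1 is an assumption.

```python
import random
from collections import defaultdict

def kos_iterate(answers, n_tasks, n_iters=10, rng=None):
    """Message passing on the assignment graph; each sweep costs O(# edges)."""
    rng = rng or random.Random(0)
    workers_of, tasks_of = defaultdict(list), defaultdict(list)
    for (i, j) in answers:
        workers_of[i].append(j)
        tasks_of[j].append(i)
    # Worker-to-task messages: a running belief in worker j's reliability.
    y = {edge: rng.gauss(1.0, 1.0) for edge in answers}
    for _ in range(n_iters):
        # Task -> worker: full sum at the task minus the recipient's own term.
        task_sum = {i: sum(answers[(i, j)] * y[(i, j)] for j in js)
                    for i, js in workers_of.items()}
        x = {(i, j): task_sum[i] - answers[(i, j)] * y[(i, j)] for (i, j) in answers}
        # Worker -> task: full sum at the worker minus the recipient's own term.
        worker_sum = {j: sum(answers[(i, j)] * x[(i, j)] for i in ts)
                      for j, ts in tasks_of.items()}
        y = {(i, j): worker_sum[j] - answers[(i, j)] * x[(i, j)] for (i, j) in answers}
    # Final decision: reliability-weighted vote per task.
    scores = defaultdict(float)
    for (i, j), a in answers.items():
        scores[i] += a * y[(i, j)]
    return [1 if scores[i] >= 0 else -1 for i in range(n_tasks)]
```

Subtracting the recipient's own term from a precomputed total keeps each sweep linear in the number of edges. Messages are not normalised here, so their magnitude grows with the number of iterations; a longer run would want rescaling.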

Page 14: Efficient crowd-sourcing

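Inferring answers

[Figure: bipartite graph of tasks t1, t2, …, tN and workers w1, w2, …, wM; the edge between task i and worker j is labeled with the answer Aij, e.g. A11, AN2, A2M]

o Theorem (Karger-Oh-Shah).
  o Let n tasks be assigned to n workers as per an (l, l) random regular graph
  o Let $ql > \sqrt{2}$, where q is the crowd quality
  o Then, for all n large enough (i.e. $n = \Omega(l^{O(\log(1/q))} e^{lq})$), after $O(\log(1/q))$ iterations of the algorithm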

Page 15: Efficient crowd-sourcing

How good?

o To achieve target $P_{\text{error}} \le \varepsilon$, we need
  o Per-task budget $l = \Theta\left(\frac{1}{q} \log\frac{1}{\varepsilon}\right)$

o And this is minimax optimal

o Under majority voting (with any graph choice)
  o Per-task budget required is $l = \Omega\left(\frac{1}{q^2} \log\frac{1}{\varepsilon}\right)$

No significant gain from knowing side-information (golden questions, reputation, …)!
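To get a feel for the gap between the two scalings, the sketch below evaluates both expressions with all hidden constants set to 1. That is an assumption, so only the ratio is meaningful, not the absolute budgets. The function names and the values q = 0.1, ε = 0.01 are illustrative.

```python
import math

def budget_message_passing(q, eps):
    """(1/q) * log(1/eps): the Theta(.) scaling with constants dropped."""
    return (1.0 / q) * math.log(1.0 / eps)

def budget_majority(q, eps):
    """(1/q^2) * log(1/eps): the Omega(.) scaling with constants dropped."""
    return (1.0 / q ** 2) * math.log(1.0 / eps)

q, eps = 0.1, 0.01
ratio = budget_majority(q, eps) / budget_message_passing(q, eps)   # equals 1/q
```

The ratio is 1/q regardless of ε, so the lower the crowd quality, the more majority voting overspends relative to the optimal scaling.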

Page 16: Efficient crowd-sourcing

Adaptive solution

Theorem (Karger-Oh-Shah). Given any adaptive algorithm, let Δ be the average number of workers required per task to achieve desired $P_{\text{error}} \le \varepsilon$. Then there exists $\{p_j\}$ with quality q so that

Gain through adaptivity is limited

Page 17: Efficient crowd-sourcing

Model from Dawid-Skene ’79

Theorem (Karger-Oh-Shah). To achieve reliability 1-ε, per-task redundancy scales as

$\frac{K}{q} \left( \log\frac{1}{\varepsilon} + \log K \right)$

By reducing the K-ary problem to K binary problems (and dealing with a few asymmetries)
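The slide does not say which reduction is used. Purely as an illustration, the sketch below assumes a one-vs-rest split: for each class k, an answer becomes +1 if it equals k and -1 otherwise, giving K binary answer sets. The name `one_vs_rest` and the toy answers are hypothetical.

```python
def one_vs_rest(answers, n_classes):
    """Split K-ary answers into K binary (+1/-1) answer sets, one per class."""
    return [
        {edge: (1 if label == k else -1) for edge, label in answers.items()}
        for k in range(1, n_classes + 1)
    ]

answers = {(0, 0): 1, (0, 1): 3, (1, 0): 2}   # (task, worker) -> label in {1, ..., K}
binary = one_vs_rest(answers, n_classes=3)
```

Under this split the binary problems are not symmetric (most answers map to -1), which is one plausible reading of the "asymmetries" the slide mentions.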

Page 18: Efficient crowd-sourcing

Experiments: Amazon MTurk

o Learning similarities
o Recommendations
o Searching, …

Page 19: Efficient crowd-sourcing

Experiments: Amazon MTurk

o Learning similarities
o Recommendations
o Searching, …

Page 20: Efficient crowd-sourcing

Experiments: Amazon MTurk

Page 21: Efficient crowd-sourcing

Task Assignment: Why Random Graph

Page 22: Efficient crowd-sourcing

Remarks

o Crowd-sourcing
  o Regular graph + message passing
  o Useful for designing surveys/taking polls

o Algorithmically
  o Iterative algorithm is like power iteration

o Beyond stand-alone tasks
  o Learning global structure, e.g. ranking
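The power-iteration remark can be made concrete. The sketch below is an illustration under an assumption, not the algorithm from the talk: it alternates x = A y and y = Aᵀ x on the answer matrix A (entries +1/-1, or 0 where a worker was not assigned the task), which is plain power iteration towards the leading singular vectors. The name `power_iteration_labels` and the toy matrix are hypothetical.

```python
import math

def power_iteration_labels(A, n_iters=50):
    """Alternate x = A y, y = A^T x; return sign(x) per task and worker weights y."""
    n, m = len(A), len(A[0])
    y = [1.0] * m   # start by trusting every worker equally
    for _ in range(n_iters):
        x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]   # x = A y
        y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]   # y = A^T x
        norm = math.sqrt(sum(v * v for v in y)) or 1.0
        y = [v / norm for v in y]   # rescale so values stay bounded
    x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    return [1 if v >= 0 else -1 for v in x], y

# Toy matrix: true values (1, -1, 1, -1); workers 0 and 1 are correct, worker 2 flips.
A = [[1, 1, -1], [-1, -1, 1], [1, 1, -1], [-1, -1, 1]]
labels, weights = power_iteration_labels(A)
```

Starting from equal positive weights fixes the global sign when the crowd is better than random on average. In this toy case the worker who always flips ends up with a negative weight.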