analytic methods for optimizing realtime crowdsourcing

91
MIT HUMAN-COMPUTER INTERACTION Analytic Methods for Optimizing Realtime Crowdsourcing Michael Bernstein, David Karger, Rob Miller, and Joel Brandt MIT CSAIL and Adobe Systems Tuesday, May 8, 12

Upload: michael-bernstein

Post on 24-May-2015

367 views

Category:

Technology


0 download

DESCRIPTION

Collective Intelligence 2012 Realtime crowdsourcing research has demonstrated that it is possible to recruit paid crowds within seconds by managing a small, fast-reacting worker pool. Realtime crowds enable crowd-powered systems that respond at interactive speeds: for example, cameras, robots and instant opinion polls. So far, these techniques have mainly been proof-of-concept prototypes: research has not yet attempted to understand how they might work at large scale or optimize their cost/performance trade-offs. In this paper, we use queueing theory to analyze the retainer model for realtime crowdsourcing, in particular its expected wait time and cost to requesters. We provide an algorithm that allows requesters to minimize their cost subject to performance requirements. We then propose and analyze three techniques to improve performance: push notifications, shared retainer pools, and precruitment, which involves recalling retainer workers before a task actually arrives. An experimental validation finds that precruited workers begin a task 500 milliseconds after it is posted, delivering results below the one-second cognitive threshold for an end-user to stay in flow.

TRANSCRIPT

Page 1: Analytic Methods for Optimizing Realtime Crowdsourcing

MIT HUMAN-COMPUTER INTERACTION

Analytic Methods for Optimizing Realtime Crowdsourcing

Michael Bernstein, David Karger, Rob Miller, and Joel BrandtMIT CSAIL and Adobe Systems

Tuesday, May 8, 12

Page 2: Analytic Methods for Optimizing Realtime Crowdsourcing

MIT HUMAN-COMPUTER INTERACTION

Use queueing theory to understand and optimize performance of a paid, realtime crowdsourcing platform.

•Relationship between crowd size and response time

•Algorithm for optimizing crowd size & cost vs. response time

• Improvements to the platform: 500 millisecond feedback

Tuesday, May 8, 12

Page 3: Analytic Methods for Optimizing Realtime Crowdsourcing

Realtime CrowdsAnswering visual questions for blind users[Bigham et al. 2010]

Tuesday, May 8, 12

Page 4: Analytic Methods for Optimizing Realtime Crowdsourcing

Realtime CrowdsAnswering visual questions for blind users[Bigham et al. 2010]

Crowd-assisted photography[Bernstein et al. 2011]

Tuesday, May 8, 12

Page 5: Analytic Methods for Optimizing Realtime Crowdsourcing

Realtime CrowdsAnswering visual questions for blind users[Bigham et al. 2010]

Crowd-assisted photography[Bernstein et al. 2011]

Tuesday, May 8, 12

Page 6: Analytic Methods for Optimizing Realtime Crowdsourcing

Paid CrowdsourcingPay small amounts of money for short tasks

Amazon Mechanical Turk: Roughly five million tasks completed per year at 1-5¢ each [Ipeirotis 2010]

Label an imageRequester: Matt C.

Reward: $0.01

Transcribe short audio clipRequester: Gordon L.

Reward: $0.04

Tuesday, May 8, 12

Page 7: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer RecruitmentWorkers sign up in advance½¢ per minute to remain on callAlert when the task is ready

Wait at most: 5 minutesTask:Click on the verbs in the paragraph

He leapt the fence anddashed toward the door.

[Bernstein et al. 2011]Tuesday, May 8, 12

Page 8: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer RecruitmentWorkers sign up in advance½¢ per minute to remain on callAlert when the task is ready

Wait at most: 5 minutesTask:Click on the verbs in the paragraph

alert()

Start now! OK

He leapt the fence anddashed toward the door.

[Bernstein et al. 2011]Tuesday, May 8, 12

Page 9: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer RecruitmentWorkers sign up in advance½¢ per minute to remain on callAlert when the task is ready

50% of workers return in two seconds, and 75% of workers return in three seconds.

[Bernstein et al. 2011]Tuesday, May 8, 12

Page 10: Analytic Methods for Optimizing Realtime Crowdsourcing

State of the LiteratureRealtime Crowds

• Recruit crowds in two seconds, execute traditional tasks (e.g., votes) in five seconds

• Maintain continuous control of remote interfaces

• Opportunities in deployable, intelligently reactive software

[Bigham et al. 2010, Bernstein et al. 2011, Lasecki et al. 2011]Tuesday, May 8, 12

Page 11: Analytic Methods for Optimizing Realtime Crowdsourcing

The ChallengeRunning Out of Retainer Workers

Tuesday, May 8, 12

Page 12: Analytic Methods for Optimizing Realtime Crowdsourcing

The ChallengeRunning Out of Retainer Workers

Tuesday, May 8, 12

Page 13: Analytic Methods for Optimizing Realtime Crowdsourcing

The ChallengeRunning Out of Retainer Workers

Tuesday, May 8, 12

Page 14: Analytic Methods for Optimizing Realtime Crowdsourcing

The ChallengeRunning Out of Retainer Workers

Tuesday, May 8, 12

Page 15: Analytic Methods for Optimizing Realtime Crowdsourcing

The ChallengeRunning Out of Retainer Workers

Tuesday, May 8, 12

Page 16: Analytic Methods for Optimizing Realtime Crowdsourcing

The ChallengeRunning Out of Retainer Workers

Tuesday, May 8, 12

Page 17: Analytic Methods for Optimizing Realtime Crowdsourcing

The ChallengeRunning Out of Retainer Workers

LossNon-realtime response

Tuesday, May 8, 12

Page 18: Analytic Methods for Optimizing Realtime Crowdsourcing

The TradeoffMissed tasks, non-realtime results

Extra retainer workers,extra cost

Tuesday, May 8, 12

Page 19: Analytic Methods for Optimizing Realtime Crowdsourcing

The Goal

Optimize the tradeoff betweenrecruiting too many workers anddropping too many tasks.

Tuesday, May 8, 12

Page 20: Analytic Methods for Optimizing Realtime Crowdsourcing

The Goal

Optimize the tradeoff betweenrecruiting too many workers anddropping too many tasks.

Budget-optimal crowdsourcing is possible in non-realtime scenarios [Dai, Mausam and Weld 2010; Kamar, Hacker and Horvitz 2012; Karger, Oh, and Shah 2011]

Tuesday, May 8, 12

Page 21: Analytic Methods for Optimizing Realtime Crowdsourcing

1 Model

2 Optimization

3 PlatformOut

line

Tuesday, May 8, 12

Page 22: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing Theory

• Formal framework for stochastic arrival and service processes

• Basic idea: random task arrivals and random processing times for workers

• Quantify how long tasks will need to waitin line

ModelOptimizePlatformO

utlin

e

Queueing theory for completion times: [Ipeirotis 2010]Tuesday, May 8, 12

Page 23: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�

Server

MM1

Tuesday, May 8, 12

Page 24: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 25: Analytic Methods for Optimizing Realtime Crowdsourcing

ServerTask

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 26: Analytic Methods for Optimizing Realtime Crowdsourcing

ServerTask

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 27: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 28: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 29: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 30: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 31: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 32: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 33: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 34: Analytic Methods for Optimizing Realtime Crowdsourcing

Server

Queueing TheoryM/M/1 queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate One server

µ�M

M1

Tuesday, May 8, 12

Page 35: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 36: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 37: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 38: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 39: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 40: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 41: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 42: Analytic Methods for Optimizing Realtime Crowdsourcing

All servers busy

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 43: Analytic Methods for Optimizing Realtime Crowdsourcing

Queueing TheoryM/M/c/c queue

Markovian (Poisson process) task arrivals, rateMarkovian (Poisson process) server work time, rate c serversc max tasks in servers and queue

µ�M

Mcc

Tuesday, May 8, 12

Page 44: Analytic Methods for Optimizing Realtime Crowdsourcing

Modeling Retainer Recruitment

Tuesday, May 8, 12

Page 45: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueM/M/c/c queue

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Crowd

Tuesday, May 8, 12

Page 46: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueM/M/c/c queue

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Crowd

Tuesday, May 8, 12

Page 47: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueM/M/c/c queue

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Crowd

Tuesday, May 8, 12

Page 48: Analytic Methods for Optimizing Realtime Crowdsourcing

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Retainer QueueM/M/c/c queue

Crowd

Tuesday, May 8, 12

Page 49: Analytic Methods for Optimizing Realtime Crowdsourcing

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Retainer QueueM/M/c/c queue

Crowd

Tuesday, May 8, 12

Page 50: Analytic Methods for Optimizing Realtime Crowdsourcing

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Retainer QueueM/M/c/c queue

Crowd

Tuesday, May 8, 12

Page 51: Analytic Methods for Optimizing Realtime Crowdsourcing

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Retainer QueueLoss

Crowd

Tuesday, May 8, 12

Page 52: Analytic Methods for Optimizing Realtime Crowdsourcing

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Retainer QueueLoss

Crowd

Tuesday, May 8, 12

Page 53: Analytic Methods for Optimizing Realtime Crowdsourcing

c workers, no waiting queueTask arrivals: Poisson process, rateWorker recruitment time: Poisson process, rate µ

Retainer QueueLoss

Crowd

All servers busy

Tuesday, May 8, 12

Page 54: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueLoss

Crowd

All servers busy

Tuesday, May 8, 12

Page 55: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueLoss

Crowd

All servers busy

Tuesday, May 8, 12

Page 56: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueLoss

Crowd

All servers busy

P (i servers busy) = ⇡(i)

P (all servers busy) = ⇡(c)

Tuesday, May 8, 12

Page 57: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueLoss

Crowd

All servers busy

P (i servers busy) = ⇡(i)

P (all servers busy) = ⇡(c)

Tuesday, May 8, 12

Page 58: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueLoss

Crowd

All servers busy

P (i servers busy) = ⇡(i)

P (all servers busy) = ⇡(c)

Tuesday, May 8, 12

Page 59: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer QueueLoss

Crowd

All servers busy

P (i servers busy) = ⇡(i)

P (all servers busy) = ⇡(c)

Tuesday, May 8, 12

Page 60: Analytic Methods for Optimizing Realtime Crowdsourcing

Model Predictions

1. Probability that all workers are busy:→ the task has to wait for expected time

2. Cost of keeping a retainer pool of size c→ cost depends on number of idle servers

1/µ⇡(c)

Tuesday, May 8, 12

Page 61: Analytic Methods for Optimizing Realtime Crowdsourcing

Probability of Loss

• Draw on Erlang’s Loss Formula from queueing theory: probability of a rejected request in an M/M/c/c queue

• Let be the traffic intensity:

(roughly, the number of new tasks that will arrive in the time it takes to recruit a worker)

⇢⇢ = �/µ

Tuesday, May 8, 12

Page 62: Analytic Methods for Optimizing Realtime Crowdsourcing

Probability of Loss

Erlang’s Loss Formula says:

Remarkably, this result makes no assumptionsabout the arrival distribution.

⇡(c) = P (c servers busy)

=⇢c/c!Pci=0 ⇢

i/i!

Tuesday, May 8, 12

Page 63: Analytic Methods for Optimizing Realtime Crowdsourcing

Probability of Loss

●●

●●

●●

2 4 6 8 10

0.1

0.2

0.5

1.0

2.0

5.0

(a) Cost of retainer

Size of retainer pool

Paym

ent u

nits

per

uni

t tim

e

●●

●●

●●

●●

●●

●●

●●

●●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

1

(b) Probability of waiting

Size of retainer pool

Prob

abilit

y of

wai

ting

●●

● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

11e

+02

(c) Expected wait time

Size of retainer pool

Expe

cted

wai

t tim

e (s

econ

ds)

●●

● ● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

●●

●●

●●

2 4 6 8 10

0.1

0.2

0.5

1.0

2.0

5.0

(a) Cost of retainer

Size of retainer pool

Paym

ent u

nits

per

uni

t tim

e

●●

●●

●●

●●

●●

●●

●●

●●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

1

(b) Probability of waiting

Size of retainer pool

Prob

abilit

y of

wai

ting

●●

● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

11e

+02

(c) Expected wait time

Size of retainer pool

Expe

cted

wai

t tim

e (s

econ

ds)

●●

● ● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

Tuesday, May 8, 12

Page 64: Analytic Methods for Optimizing Realtime Crowdsourcing

Expected Waiting Time

P (c servers busy)⇥ (expected recruitment time)

= ⇡(c)1

µ

=⇢

c/c!Pc

i=0 ⇢i/i!

1

µ

P (c servers busy)⇥ (expected recruitment time)

= ⇡(c)1

µ

=⇢

c/c!Pc

i=0 ⇢i/i!

1

µ

P (c servers busy)⇥ (expected recruitment time)

= ⇡(c)1

µ

=⇢

c/c!Pc

i=0 ⇢i/i!

1

µ

Tuesday, May 8, 12

Page 65: Analytic Methods for Optimizing Realtime Crowdsourcing

Expected CostHow much do we pay in steady-state?

Depends on how many workers are usually waiting on retainer.

Tuesday, May 8, 12

Page 66: Analytic Methods for Optimizing Realtime Crowdsourcing

Expected CostProbability of i busy servers in an M/M/c/c queue is a more general version of Erlang’s Loss Formula:

Derive the expected number of busy workers:

⇡(i) =⇢i/i!Pci=0 ⇢

i/i!

E[i] = ⇢[1� ⇡(c)]

Tuesday, May 8, 12

Page 67: Analytic Methods for Optimizing Realtime Crowdsourcing

Expected CostProbability of i busy servers in an M/M/c/c queue is a more general version of Erlang’s Loss Formula:

Derive the expected number of busy workers:

Total cost is the number of idle workers:

⇡(i) =⇢i/i!Pci=0 ⇢

i/i!

E[i] = ⇢[1� ⇡(c)]

c� ⇢[1� ⇡(c)]

Tuesday, May 8, 12

Page 68: Analytic Methods for Optimizing Realtime Crowdsourcing

Expected Cost

●●

●●

●●

2 4 6 8 10

0.1

0.2

0.5

1.0

2.0

5.0

(a) Cost of retainer

Size of retainer pool

Paym

ent u

nits

per

uni

t tim

e

●●

●●

●●

●●

●●

●●

●●

●●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

1

(b) Probability of waiting

Size of retainer pool

Prob

abilit

y of

wai

ting

●●

● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

11e

+02

(c) Expected wait time

Size of retainer pool

Expe

cted

wai

t tim

e (s

econ

ds)

●●

● ● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

Cost goes down when ,but performance suffers.

●●

●●

●●

2 4 6 8 10

0.1

0.2

0.5

1.0

2.0

5.0

(a) Cost of retainer

Size of retainer pool

Paym

ent u

nits

per

uni

t tim

e

●●

●●

●●

●●

●●

●●

●●

●●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

1

(b) Probability of waiting

Size of retainer pool

Prob

abilit

y of

wai

ting

●●

● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

11e

+02

(c) Expected wait time

Size of retainer pool

Expe

cted

wai

t tim

e (s

econ

ds)

●●

● ● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

c < ⇢

Tuesday, May 8, 12

Page 69: Analytic Methods for Optimizing Realtime Crowdsourcing

Expected Cost

●●

●●

●●

2 4 6 8 10

0.1

0.2

0.5

1.0

2.0

5.0

(a) Cost of retainer

Size of retainer pool

Paym

ent u

nits

per

uni

t tim

e

●●

●●

●●

●●

●●

●●

●●

●●

Traffic intensity ρ0.10.51510

2 4 6 8 101e−1

01e−0

71e−0

41e−0

1

(b) Probability of waiting

Size of retainer pool

Prob

abilit

y of

wai

ting

●●

● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

11e

+02

(c) Expected wait time

Size of retainer pool

Expe

cted

wai

t tim

e (s

econ

ds)

●●

● ● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

Cost goes down when ,but performance suffers.

●●

●●

●●

2 4 6 8 10

0.1

0.2

0.5

1.0

2.0

5.0

(a) Cost of retainer

Size of retainer pool

Paym

ent u

nits

per

uni

t tim

e

●●

●●

●●

●●

●●

●●

●●

●●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

1

(b) Probability of waiting

Size of retainer pool

Prob

abilit

y of

wai

ting

●●

● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

11e

+02

(c) Expected wait time

Size of retainer pool

Expe

cted

wai

t tim

e (s

econ

ds)

●●

● ● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

c < ⇢

●●

●●

●●

2 4 6 8 10

0.1

0.2

0.5

1.0

2.0

5.0

(a) Cost of retainer

Size of retainer pool

Paym

ent u

nits

per

uni

t tim

e

●●

●●

●●

●●

●●

●●

●●

●●

Traffic intensity ρ0.10.51510

2 4 6 8 101e−1

01e−0

71e−0

41e−0

1

(b) Probability of waiting

Size of retainer pool

Prob

abilit

y of

wai

ting

●●

● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

2 4 6 8 10

1e−1

01e−0

71e−0

41e−0

11e

+02

(c) Expected wait time

Size of retainer pool

Expe

cted

wai

t tim

e (s

econ

ds)

●●

● ● ● ● ● ● ●●

●●

● ● ● ● ● ● ● ● ● ●

Traffic intensity ρ0.10.51510

Tuesday, May 8, 12

Page 70: Analytic Methods for Optimizing Realtime Crowdsourcing

Optimal Retainer Size• Size of retainer pool is typically the only

value that requesters can manipulate

• Minimize costs by keeping the retainer pool small while keeping low

ModelOptimizePlatformO

utlin

e

⇡(c)

Tuesday, May 8, 12

Page 71: Analytic Methods for Optimizing Realtime Crowdsourcing

Optimal Retainer SizeBased on Maximum Miss Probability

Given a maximum desired probability of a miss :

Minimize c subject to

0 2 4 6 8

1e−1

01e−0

4

Cost vs. probability of waiting

Expected payments per unit time

Prob

abilit

y of

wai

ting

●●

●●

●● ● ● ● ● ● ●●

●●

●●

●●●●●●● ● ● ● ● ● ● ● ● ● ●●

Traffic intensity ρ0.10.51510

pmax

⇡(c) pmax

Tuesday, May 8, 12

Page 72: Analytic Methods for Optimizing Realtime Crowdsourcing

Optimal Retainer SizeBased on Maximum Miss Probability

Given a maximum desired probability of a miss :

Minimize c subject to

0 2 4 6 8

1e−1

01e−0

4

Cost vs. probability of waiting

Expected payments per unit time

Prob

abilit

y of

wai

ting

●●

●●

●● ● ● ● ● ● ●●

●●

●●

●●●●●●● ● ● ● ● ● ● ● ● ● ●●

Traffic intensity ρ0.10.51510

pmax

⇡(c) pmax

Tuesday, May 8, 12

Page 73: Analytic Methods for Optimizing Realtime Crowdsourcing

Optimal Retainer SizeBased on Joint Cost

If the “pizza delivery” property holds: we can quantify the cost of loss

●●

●●

●●

● ● ● ●

2 4 6 8 10

15

2010

050

0

Total cost of task and retainer

Size of retainer pool

Com

bine

d co

st o

f ret

aine

r and

mis

sed

task

●●

●●

●● ● ● ●

● ●●

● ● ● ●

●● ● ● ● ●

Cost of a missed task1101001000

Tuesday, May 8, 12

Page 74: Analytic Methods for Optimizing Realtime Crowdsourcing

Improving the Retainer Model

ModelOptimizePlatformO

utlin

e

1 Subscriptions2 Shared Pools3 Predictive

Recruitment

Tuesday, May 8, 12

Page 75: Analytic Methods for Optimizing Realtime Crowdsourcing

Retainer Subscriptions

• Proposal: increase by allowing workers to subscribe to realtime tasks

• Instead of posting to the global task list, the platform sends a message to subscribers

• Change crowdsourcing from a pull model to a push model

µ

Tuesday, May 8, 12

Page 76: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Pools• Sharing one global retainer pool across

requesters improves performance

• Intuition: Most workers are padding for unlikely runs of arrivals)

Time

Task 1

Tuesday, May 8, 12

Page 77: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Pools• Sharing one global retainer pool across

requesters improves performance

• Intuition: Most workers are padding for unlikely runs of arrivals)

Time

Task 1Task 2

Tuesday, May 8, 12

Page 78: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Pools• Sharing one global retainer pool across

requesters improves performance

• Intuition: Most workers are padding for unlikely runs of arrivals)

Time

Combined

Tuesday, May 8, 12

Page 79: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Pools• Sharing one global retainer pool across

requesters improves performance

• Intuition: Most workers are padding for unlikely runs of arrivals)

Time

Combined

Tuesday, May 8, 12

Page 80: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Pools• Through approximation, individual pools:

• Shared pools across k requesters:

• Loss rate declines exponentially with the number of bundled retainer pools

⇡(c) ⇡p2⇡c

�e�⇢(e⇢/c)c

⇡(c) ⇡p2⇡kc

�e�⇢(e⇢/c)c

�k⇡(c) ⇡p2⇡c

�e�⇢(e⇢/c)c

⇡(c) ⇡p2⇡kc

�e�⇢(e⇢/c)c

�k

⇡(c) ⇡

⇡(c) ⇡

Tuesday, May 8, 12

Page 81: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Pools• Through approximation, individual pools:

• Shared pools across k requesters:

• Loss rate declines exponentially with the number of bundled retainer pools

⇡(c) ⇡p2⇡c

�e�⇢(e⇢/c)c

⇡(c) ⇡p2⇡kc

�e�⇢(e⇢/c)c

�k⇡(c) ⇡p2⇡c

�e�⇢(e⇢/c)c

⇡(c) ⇡p2⇡kc

�e�⇢(e⇢/c)c

�k

⇡(c) ⇡

⇡(c) ⇡

Tuesday, May 8, 12

Page 82: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Pools

Cost dramatically decreases as you combine retainers: k dollars to log(k) dollars

Tuesday, May 8, 12

Page 83: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Routing

• Not every worker in a global retainer pool is good at every task

• If we assigned each worker to any task they could do, some tasks would starve

Tuesday, May 8, 12

Page 84: Analytic Methods for Optimizing Realtime Crowdsourcing

Global Retainer Routing

• We want to maintain a buffer of workers to respond to all kinds of tasks

• A linear programming technique can balance the traffic intensities across all tasks

Tuesday, May 8, 12

Page 85: Analytic Methods for Optimizing Realtime Crowdsourcing

Precruitment• Predictive Recruitment: notify workers

before the task arrives

• Recall workers in expectation of having a task by the time they arrive 2–3 seconds later

Tuesday, May 8, 12

Page 86: Analytic Methods for Optimizing Realtime Crowdsourcing

PrecruitmentFormative Study, N=373 tasks

• 3¢ for 3-minute retainer task: whack-a-mole

• ‘Loading...’ screen for randomly-selected time [0, 20] seconds after worker returns

• Click on randomly-placed mole

Tuesday, May 8, 12

Page 87: Analytic Methods for Optimizing Realtime Crowdsourcing

PrecruitmentFormative Study, N=373 tasks

• 3¢ for 3-minute retainer task: whack-a-mole

• ‘Loading...’ screen for randomly-selected time [0, 20] seconds after worker returns

• Click on randomly-placed mole

Tuesday, May 8, 12

Page 88: Analytic Methods for Optimizing Realtime Crowdsourcing

PrecruitmentResults

• Median time to mouse move: 0.50 seconds

• Standard retainer model (start timer @ alert): median mouse move in 1.36 seconds

Time waiting for task (sec)

Res

pons

e tim

e to

mov

e m

ouse

(sec

)

0.51.0

5.010.0

●●

●●

●●

● ● ●●

● ● ●●

● ●●

●●

●● ●

●●

●●●

●●●●

●●

●●

●●●

●●

●●

● ●

●● ●

●●●

●●

●●●

●●

●● ●

●●

● ●

●●

● ● ●●●

●●

●●

● ● ● ●●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ● ●●

●●

●●

●● ●●●

●●

●●

●●●

●●

● ●●

● ●

●●

●●

●●●

●●

●●

●●

●●●

● ●● ●●

●● ●●●

●●

●● ●

●●

●●

●●●

● ●

●● ●

●●

●●

●●

● ●●

●●

● ●

●●

●●

● ●●

●●

●●

● ● ●●

●●

●●

●●

●●

●● ●

●●●●

0 5 10 15 20

Tuesday, May 8, 12

Page 89: Analytic Methods for Optimizing Realtime Crowdsourcing

Discussion• Empirics: Can deployed crowdsourcing

platforms support lots of realtime tasks?

• Theory: Crowds as queueing systems

• Reputation: median response time, overall response rate

Tuesday, May 8, 12

Page 90: Analytic Methods for Optimizing Realtime Crowdsourcing

MIT HUMAN-COMPUTER INTERACTION

Use queueing theory to understand and optimize performance of a paid, realtime crowdsourcing platform.

•Relationship between crowd size and response time

•Algorithm for optimizing crowd size vs. response time

• Improvements to the platform: 500 millisecond feedback

Tuesday, May 8, 12

Page 91: Analytic Methods for Optimizing Realtime Crowdsourcing

MIT HUMAN-COMPUTER INTERACTION

Analytic Methods for Optimizing Realtime Crowdsourcing

Michael Bernstein, David Karger, Rob Miller, and Joel BrandtMIT CSAIL and Adobe Systems

Tuesday, May 8, 12