CrowdFill: Collecting Structured Data from the Crowd

Hyunjung Park Jennifer Widom

Stanford University

Goal

•Collect high-quality structured data from the crowd, while capping total monetary cost and keeping latency low

6/25/2014 Hyunjung Park 2

name     nationality  position  caps     goals
(empty)  Brazil       (empty)   (empty)  (empty)
Messi    (empty)      FW        (empty)  (empty)
Klose    Germany      (empty)   133      (empty)

Traditional Microtask-based Approach

1. Decompose the data collection task into a set of microtasks, e.g., “What position does Klose play?” or “How many goals has Messi scored?”

2. Each worker provides specific pieces of data via microtasks

3. Assemble the collected pieces of data into the final table

CrowdFill’s Table-filling Approach

1. Present an entire partially-filled table to all participating workers

2. Each worker contributes what they know to the table by filling in empty cells, and voting on data entered by others

3. Propagate worker actions in real time to synchronize the table across all workers

Outline

•Formal model

•Overall architecture

•Concurrent operations

•Satisfying values constraint

•Compensation scheme

•Experimental evaluation

•Related work

Formal Model: Schema

•Table Specification
  Column definitions and primary key
  SoccerPlayer(name, nationality, position, caps, goals)

•Scoring Function
  Accept a row r if and only if $f(u_r, d_r) > 0$, where $u_r$ and $d_r$ are its upvote and downvote counts
  e.g., “majority of three or more”:
  $f(u_r, d_r) = u_r - d_r$ if $u_r + d_r \ge 2$, and $0$ otherwise
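The example scoring function can be written down directly. This is a minimal sketch, not CrowdFill's code: the names `majority_score` and `is_accepted` are hypothetical, and the threshold of 2 votes is the value as read from the slide's formula.

```python
def majority_score(upvotes: int, downvotes: int, min_votes: int = 2) -> int:
    """Return upvotes - downvotes once enough votes are in, otherwise 0.

    min_votes=2 is an assumption taken from the slide's example formula.
    """
    if upvotes + downvotes >= min_votes:
        return upvotes - downvotes
    return 0


def is_accepted(upvotes: int, downvotes: int) -> bool:
    """A row is accepted if and only if its score is strictly positive."""
    return majority_score(upvotes, downvotes) > 0


# A single upvote is not enough; two upvotes are.
print(is_accepted(1, 0), is_accepted(2, 0), is_accepted(1, 2))
```

Keeping the threshold as a parameter makes it easy to try stricter acceptance rules without touching the acceptance test itself.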

Formal Model: Constraints

•Values Constraint
  Final table S must “match” template T (a partially-filled table)

•Cardinality Constraint
  Final table S must contain at least N rows
  Special case of values constraint

Template T:

name     nationality  position
(empty)  Argentina    (empty)
(empty)  (empty)      FW

Final table S:

name    nationality  position
Messi   Argentina    FW
Rooney  England      FW

Formal Model: Candidate Table

•Candidate table R
  Exposed to clients
  Primary key not enforced
  Each row annotated with its upvote and downvote counts

name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil       (empty)   0        1

Formal Model: Operations

•Primitive Operations on R
  Insert a new empty row into R
  Fill in an empty column of a row with a value
  Upvote a complete row
  Downvote a non-empty row

Example, starting from this candidate table R:

name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil       (empty)   0        1

A new row is inserted, filled in, and voted on, one operation at a time (the four rows above stay unchanged):

operation         name     nationality  position  upvotes  downvotes
insert            (empty)  (empty)      (empty)   0        0
fill name         Klose    (empty)      (empty)   0        0
fill nationality  Klose    Germany      (empty)   0        0
fill position     Klose    Germany      DF        0        0
upvote            Klose    Germany      DF        1        0
downvote          Klose    Germany      DF        1        1
downvote          Klose    Germany      DF        1        2
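The four primitive operations can be sketched as a small in-memory structure. This is an illustrative model only: the class and method names are hypothetical, rows are mutated in place for simplicity, and the real system's handling of concurrent edits (described in the paper) is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

COLUMNS = ("name", "nationality", "position")  # columns used on the slide


@dataclass
class Row:
    values: Dict[str, Optional[str]] = field(
        default_factory=lambda: {c: None for c in COLUMNS})
    upvotes: int = 0
    downvotes: int = 0

    def is_complete(self) -> bool:
        return all(v is not None for v in self.values.values())

    def is_empty(self) -> bool:
        return all(v is None for v in self.values.values())


class CandidateTable:
    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self) -> Row:
        """Insert a new empty row."""
        row = Row()
        self.rows.append(row)
        return row

    def fill(self, row: Row, column: str, value: str) -> None:
        """Fill in an empty column of a row with a value."""
        if row.values[column] is not None:
            raise ValueError(f"{column} is already filled")
        row.values[column] = value

    def upvote(self, row: Row) -> None:
        """Upvote a complete row."""
        if not row.is_complete():
            raise ValueError("only complete rows can be upvoted")
        row.upvotes += 1

    def downvote(self, row: Row) -> None:
        """Downvote a non-empty row."""
        if row.is_empty():
            raise ValueError("only non-empty rows can be downvoted")
        row.downvotes += 1


# Replay the Klose example from the slide.
table = CandidateTable()
klose = table.insert()
table.fill(klose, "name", "Klose")
table.fill(klose, "nationality", "Germany")
table.fill(klose, "position", "DF")
table.upvote(klose)
table.downvote(klose)
table.downvote(klose)
print(klose.values, klose.upvotes, klose.downvotes)
```

The preconditions in `upvote` and `downvote` mirror the wording of the operation list: upvotes need a complete row, downvotes only need a non-empty one.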

Formal Model: Final Table

•Final table S
  Derived from candidate table R
  Contains each complete row r in R such that $f(u_r, d_r) > 0$, and $f(u_r, d_r)$ is the highest score of any row with the same primary key as r

Candidate table R:

name     nationality  position  upvotes  downvotes
Messi    Argentina    FW        2        0
Ronaldo  Portugal     FW        3        0
Ronaldo  Portugal     MF        2        1
Neymar   Brazil       (empty)   0        1
Klose    Germany      DF        1        2

Final table S:

name     nationality  position
Messi    Argentina    FW
Ronaldo  Portugal     FW
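The derivation of the final table from the candidate table can be sketched as follows. Several things here are assumptions for illustration: `name` is taken to be the primary key (the slide does not say which column is), the score uses a threshold of 2 votes as in the example scoring function, and ties between rows with the same key are broken by keeping the earlier row.

```python
from typing import Dict, List, Optional, Tuple

# (column values, upvotes, downvotes); None marks an empty cell
Row = Tuple[Dict[str, Optional[str]], int, int]


def score(upvotes: int, downvotes: int) -> int:
    """Example scoring function; the threshold of 2 is an assumption."""
    return upvotes - downvotes if upvotes + downvotes >= 2 else 0


def final_table(candidate: List[Row], key=("name",)) -> List[Dict[str, str]]:
    """Keep, per primary key, the complete row with the highest positive score."""
    best = {}  # key values -> (score, row values)
    for values, up, down in candidate:
        if any(v is None for v in values.values()):
            continue  # incomplete rows never reach the final table
        s = score(up, down)
        if s <= 0:
            continue  # row not accepted
        k = tuple(values[c] for c in key)
        if k not in best or s > best[k][0]:
            best[k] = (s, values)
    return [values for _, values in best.values()]


def row(name, nationality, position):
    return {"name": name, "nationality": nationality, "position": position}


candidate = [
    (row("Messi", "Argentina", "FW"), 2, 0),
    (row("Ronaldo", "Portugal", "FW"), 3, 0),
    (row("Ronaldo", "Portugal", "MF"), 2, 1),
    (row("Neymar", "Brazil", None), 0, 1),
    (row("Klose", "Germany", "DF"), 1, 2),
]
result = final_table(candidate)
for r in result:
    print(r["name"], r["nationality"], r["position"])
```

On the slide's candidate table this keeps Messi and the higher-scoring Ronaldo row, and drops the incomplete Neymar row and the negatively scored Klose row.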

CrowdFill Architecture

[Architecture diagram]
Components: Web Interface, Front-end Server, Back-end Server, Database, Execution Server, Central Client, Worker Clients, Crowdsourcing Marketplace
Labeled flows: table specs, payment; results collection; task setup, payment; task acceptance; data entry

Concurrent Operations

•Model designed to minimize effects of concurrency (details in paper)
  Operations are easily merged
  Conflicts are resolved seamlessly

•Convergence theorem
  Architecture ensures server and all clients apply the same operations, possibly in different orders
  Theorem guarantees that server and all clients converge to the same candidate table whenever the system quiesces
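A small illustration of why order need not matter, restricted to votes only. This is not the paper's model or proof: it simply shows that if votes are kept as per-row counters, every ordering of the same set of vote operations yields the same state. The row identifiers and the `apply_votes` helper are hypothetical, and conflict handling for fill operations is not covered.

```python
from collections import Counter
from itertools import permutations


def apply_votes(ops):
    """Apply (kind, row_id) vote operations in the given order."""
    state = Counter()
    for kind, row_id in ops:
        state[(row_id, kind)] += 1
    return state


ops = [
    ("upvote", "r1"),
    ("downvote", "r1"),
    ("upvote", "r2"),
    ("upvote", "r1"),
]

# Every replica applies the same operations, each in a different order.
results = [apply_votes(order) for order in permutations(ops)]
all_converge = all(r == results[0] for r in results)
print(all_converge)
```

Counter increments commute, which is the property this sketch relies on; operations that do not commute naturally need the merging and conflict resolution rules that the paper describes.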

Satisfying Values Constraint

•Values constraint
  Final table S must match template T

•Worker clients
  Perform fill, upvote, and downvote operations
  Need not be aware of the template T

•Special “Central client”
  Automatically populates new rows to guide the final table S towards the template T

•Probable Row Invariant (PRI)
  R always contains just enough “probable” rows matching template T
  PRI maintained based on maximum bipartite matching
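To make the role of bipartite matching concrete, here is a sketch of checking whether a table matches a template. It is not the PRI maintenance algorithm from the paper. It assumes that "match" means every template row can be paired with a distinct table row that agrees on all of the template row's filled cells, and it uses a standard augmenting-path matching; the function names are hypothetical.

```python
from typing import Dict, List, Optional

TemplateRow = Dict[str, Optional[str]]  # None marks an unconstrained cell
TableRow = Dict[str, str]


def row_matches(template_row: TemplateRow, row: TableRow) -> bool:
    """A row matches if it agrees with every filled template cell."""
    return all(v is None or row.get(c) == v for c, v in template_row.items())


def matches_template(template: List[TemplateRow], table: List[TableRow]) -> bool:
    """True if each template row can be assigned its own matching table row."""
    match_of_row: Dict[int, int] = {}  # table row index -> template row index

    def try_assign(t: int, seen: set) -> bool:
        for r, row in enumerate(table):
            if r in seen or not row_matches(template[t], row):
                continue
            seen.add(r)
            # Take a free row, or re-route the template row currently using it.
            if r not in match_of_row or try_assign(match_of_row[r], seen):
                match_of_row[r] = t
                return True
        return False

    return all(try_assign(t, set()) for t in range(len(template)))


template = [
    {"name": None, "nationality": None, "position": "FW"},
    {"name": None, "nationality": "Argentina", "position": None},
]
table = [
    {"name": "Messi", "nationality": "Argentina", "position": "FW"},
    {"name": "Rooney", "nationality": "England", "position": "FW"},
]
print(matches_template(template, table))
```

The example needs the re-routing step: the FW template row first claims Messi, and is moved to Rooney when the Argentina row turns out to have no other option.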

Compensation Scheme: Overview

•After data collection
  Allocate a total monetary budget based on each worker’s overall contribution to the final table
  Encourage workers to submit useful work
  Make total monetary cost predictable

•During data collection
  Provide estimated compensation for individual actions to keep workers engaged

Compensation Scheme: Contribution

•Given final table S, operation op contributed to S if:
  op filled in a cell in S (“direct” contribution)
  op first provided a value for S while creating a subset of a row in S (“indirect” contribution)
  op upvoted a row in S
  op downvoted a combination of values not present in S

Compensation Scheme: Allocation

•Uniform allocation
  Each cell and contributing vote has the same compensation
  Each cell divided into direct and indirect contributions

•Column-weighted allocation
  Take into account varying difficulty of filling in different columns and casting votes

•Dual-weighted allocation
  Also take into account that entering new key values can get progressively more difficult as the table fills up
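A rough sketch of the uniform scheme, under stated assumptions: every contributing unit (a final-table cell or a contributing vote) receives an equal share of the budget, and a cell's share is split between its direct and indirect contributors by a fraction supplied by the caller. The 50/50 split in the example, the unit labels, and the function name are all hypothetical; the paper defines the actual split and the weighted variants.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (unit id, worker, fraction of that unit credited to the worker)
Credit = Tuple[str, str, float]


def allocate_uniform(budget: float, credits: List[Credit]) -> Dict[str, float]:
    """Give every unit an equal share of the budget, then pay out by fraction."""
    units = {unit for unit, _, _ in credits}
    if not units:
        return {}
    per_unit = budget / len(units)
    payout: Dict[str, float] = defaultdict(float)
    for unit, worker, fraction in credits:
        payout[worker] += per_unit * fraction
    return dict(payout)


credits = [
    ("cell:Messi.nationality", "alice", 1.0),  # direct only
    ("cell:Messi.position", "alice", 0.5),     # direct (assumed half)
    ("cell:Messi.position", "bob", 0.5),       # indirect (assumed half)
    ("vote:Messi.up1", "bob", 1.0),
    ("vote:Messi.up2", "carol", 1.0),
]
payouts = allocate_uniform(10.0, credits)
print(payouts)
```

Because each unit's fractions sum to 1, the payouts always add up to the budget, which is what makes the total cost predictable in this sketch.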

Experimental Evaluation: Setting

•SoccerPlayer(name, nationality, position, caps, goals, date-of-birth)

•Scoring function: “majority of three or more”

•Goal: information about 20 players with caps between 80 and 99

•Five volunteer workers

•Total monetary budget: $10

•Dual-weighted allocation scheme

Experimental Evaluation: Summary

•In our representative run
  Overall latency: 10m 44s
  #Rows in the candidate table: 23
  Final compensations: $0.51, $1.68, $2.08, $2.24, $3.49
  No “slowdown” in obtaining new primary keys
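As a quick arithmetic check against the $10 total budget from the experimental setting, the five final compensations in this run add up exactly to the budget:

\[
0.51 + 1.68 + 2.08 + 2.24 + 3.49 = 10.00
\]

This is consistent with the scheme's aim of a predictable total monetary cost.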

Accuracy of Estimated Compensation

Related Work

•Crowdsourcing structured data
  CrowdDB [Franklin et al. 2011]
  Deco [Park et al. 2012]

•Real-time cooperative editing systems
  Convergence [Ellis and Gibbs 1989]
  Intention preservation [Sun et al. 1998]

•Monetary compensation for crowdsourcing
  Incentive designs [Shaw et al. 2011]

Summary

•CrowdFill’s novel table-filling approach
  Real-time collaboration among workers
  Intuitive data entry interface
  Compensation based on contribution

•In the paper:
  Full description of the formal model
  PRI maintenance algorithm with examples
  More details about the compensation scheme
  More experimental results

Thank you