
CS 5306 / INFO 5306:
Crowdsourcing and Human Computation

Lecture 21
11/14/17

Haym Hirsh


Long-term goal: Integrating human and machine intelligence

Using human computation in artificial intelligence

Using artificial intelligence in human computation

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Why AI for human computation
  • Difficult to manage a large number of tasks across diverse workers
  • Allowing for more complex workflows
  • Ease of use
  • Efficiency gains
  • Making sense of differing inputs

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Modeling worker skill
  • Given “gold standard” questions for which answers are known
    • “Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing”, Oleson, D., Sorokin, A., Laughlin, G.P., Hester, V., Le, J. and Biewald, L., HComp 2011

“Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing”, Oleson, D., Sorokin, A., Laughlin, G.P., Hester, V., Le, J. and Biewald, L., HComp 2011

• Gold questions
  • Must be disguised to look like other tasks (even as those tasks change, such as in wording)
  • There must be enough of them; otherwise workers learn to recognize the gold questions, resulting in incorrect accuracy estimates
  • Can be used as a tool to teach workers, by giving the correct answer when they give a wrong one
  • Can be created to target common errors
• Gold question creation can be automated (“pyrite” questions; see the sketch after this list)
  • Mutate questions so that answers change in known ways (such as yes -> no)
  • Take questions with a strong consensus for a single answer as new gold questions
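To make the mechanics concrete, here is a minimal Python sketch (my own illustration, not code from the Oleson et al. paper) of the two bookkeeping steps above: scoring workers against embedded gold, and “minting” pyrite questions from strong-consensus items. The data structures (`responses`, `gold`, `question_answers`) are hypothetical.

```python
from collections import defaultdict

def score_workers_on_gold(responses, gold):
    """Estimate each worker's accuracy from embedded gold questions.
    `responses`: iterable of (worker, question, answer) triples.
    `gold`: maps question -> known correct answer. (Hypothetical formats.)"""
    correct, seen = defaultdict(int), defaultdict(int)
    for worker, question, answer in responses:
        if question in gold:
            seen[worker] += 1
            correct[worker] += (answer == gold[question])
    return {w: correct[w] / seen[w] for w in seen}

def mint_pyrite(question_answers, min_votes=10, min_agreement=0.9):
    """Promote strong-consensus questions to new ('pyrite') gold.
    `question_answers`: maps question -> list of crowd answers."""
    pyrite = {}
    for question, answers in question_answers.items():
        if len(answers) < min_votes:
            continue
        top = max(set(answers), key=answers.count)
        if answers.count(top) / len(answers) >= min_agreement:
            pyrite[question] = top  # consensus answer becomes the gold key
    return pyrite
```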

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Modeling worker skill
  • Given “gold standard” questions for which answers are known (“Programmatic Gold”, Oleson et al., HComp 2011)
  • Collective assessment using Expectation Maximization
    • “Maximum likelihood estimation of observer error-rates using the EM algorithm”, Dawid, A.P. and Skene, A.M., Applied Statistics, 1979, pp. 20-28.

“Maximum likelihood estimation of observer error-rates using the EM algorithm”, Dawid, A.P. and Skene, A.M., Applied Statistics, 1979

• Goal: Estimate P_w(r | a), the probability that worker w gives response r when the correct answer is a
• Approach:
  • Set P_w(r | a) at random
  • Iteratively improve P_w(r | a):
    • Compute a weighted majority vote to estimate the correct answers
    • Compare workers’ responses to those estimated answers
    • Adjust P_w(r | a)
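A compact Python sketch of this loop (my own rendering of the Dawid-Skene EM scheme; for simplicity it initializes from an unweighted majority vote rather than at random):

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """EM estimation of per-worker response models P_w(r | a).
    `labels` maps (item, worker) -> response in {0, ..., n_classes-1}.
    Returns (per-item posteriors over the true answer, confusion matrices)."""
    items = sorted({i for i, _ in labels})
    workers = sorted({w for _, w in labels})
    ii = {i: k for k, i in enumerate(items)}
    wi = {w: k for k, w in enumerate(workers)}

    # Initialize answer posteriors T[item, a] with a majority vote.
    T = np.zeros((len(items), n_classes))
    for (i, w), r in labels.items():
        T[ii[i], r] += 1.0
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and confusion matrices conf[w, a, r]
        # from the current posteriors (with a little smoothing).
        prior = T.mean(axis=0)
        conf = np.full((len(workers), n_classes, n_classes), 1e-6)
        for (i, w), r in labels.items():
            conf[wi[w], :, r] += T[ii[i]]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: re-estimate each item's answer posterior, weighting
        # every response by that worker's estimated reliability.
        logT = np.tile(np.log(prior), (len(items), 1))
        for (i, w), r in labels.items():
            logT[ii[i]] += np.log(conf[wi[w], :, r])
        logT -= logT.max(axis=1, keepdims=True)
        T = np.exp(logT)
        T /= T.sum(axis=1, keepdims=True)

    return T, conf
```

`np.argmax(T, axis=1)` then gives the inferred answer per item, and `conf[k]` is worker k’s estimated P_w(r | a).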

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Collective Assessment using Expectation Maximization• “Maximum likelihood estimation of observer error-rates using the EM algorithm”, Dawid AP,

Skene AM. Applied statistics. 1979 Jan 1:20-8.

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Collective assessment using Expectation Maximization
  • “Maximum likelihood estimation of observer error-rates using the EM algorithm”, Dawid, A.P. and Skene, A.M., Applied Statistics, 1979, pp. 20-28.
  • “Whose vote should count more: Optimal integration of labels from labelers of unknown expertise”, Whitehill, J., Ruvolo, P., Bergsma, J., Wu, T. and Movellan, J., NIPS 2009.
    • Learn task difficulties
  • “The multidimensional wisdom of crowds”, Welinder, P., Branson, S., Belongie, S. and Perona, P., NIPS 2010.
    • Learn other task parameters
  • “An algorithm that finds truth even if most people are wrong”, Prelec, D. and Seung, S., 2007.
    • Elicit “meta-knowledge”: weight workers by how well they predict the crowd (Bayesian Truth Serum)
  • “Crowdsourcing control: Moving beyond multiple choice”, Lin, C.H., Mausam and Weld, D.S., UAI 2012.
    • Allow tasks without a fixed set of answers

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Collective Assessment using Expectation Maximization• “Maximum likelihood estimation of observer error-rates using the EM algorithm”,

Dawid AP, Skene AM. Applied statistics. 1979 Jan 1:20-8.• “Whose vote should count more: Optimal integration of labels from labelers of

unknown expertise”, Whitehill, J.; Ruvolo, P.; Bergsma, J.; Wu, T.; and Movellan, J., NIPS 2009.• Learn task difficulties

• “The multidimensional wisdom of crowds”, Welinder, P.; Branson, S.; Belongie, S.; and Perona, NIPS 2010.• Learn other task parameters

• “An algorithm that finds truth even if most people are wrong”, Prelec D, Seung S., 2007.• Elicit “meta-knowledge: Weight workers by how well they predict the crowd (Bayesian Truth

Serum)• “Crowdsourcing control: Moving beyond multiple choice”, Lin, C. H.; Mausam; and

Weld, D. S. UAI 2012.• Allow tasks without a fixed set of answers

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Collective assessment using Expectation Maximization
  • “Learning from crowds”, Raykar, V.C., Yu, S., Zhao, L.H. and Valadez, G., Journal of Machine Learning Research, 11:1297-1322, 2010.
    • Weight workers not by label accuracy but by the accuracy of the outcome of machine learning on the data
  • “Bayesian bias mitigation for crowdsourcing”, Wauthier, F.L. and Jordan, M.I., NIPS 2011.
    • Learn worker biases (and how to weight them) from the outcomes of machine learning on the data
  • “Vox populi: Collecting high-quality labels from a crowd”, Dekel, O. and Shamir, O., COLT 2009.
    • Prune workers whose responses change the outcomes of machine learning
  • “Good learners for evil teachers”, Dekel, O. and Shamir, O., ICML 2009.
    • Integrate labeler fallibility into the machine learning algorithm
  • “False discovery rate control and statistical quality assessment of annotators in crowdsourced ranking”, ICML 2016.
    • Learn systematic biases of workers (such as preferring to click answers on the left side)
  • “Optimality of Belief Propagation for Crowdsourced Classification”, Ok, J., Oh, S., Shin, J. and Yi, Y., ICML 2016.
    • Optimality of algorithms when each worker is given only two tasks
  • Better learning algorithms
  • Better analyses

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Workflow optimization
  • By hand
  • How many votes: estimating (a, d), where a is the answer and d is the task difficulty
    • “Decision-theoretic control of crowd-sourced workflows”, Dai, P., Mausam and Weld, D.S., AAAI 2010.
    • “POMDP-based control of workflows for crowdsourcing”, Dai, P., Lin, C.H., Mausam and Weld, D.S., Artificial Intelligence, 202:52-85, 2013.
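As a toy version of the “how many votes?” decision, the sketch below (my own simplification, not the POMDP policy from the Dai et al. papers) keeps requesting binary votes until the posterior for one answer clears a confidence threshold, assuming every worker is independently correct with a fixed probability:

```python
import math

def posterior_yes(votes, p_correct=0.7):
    """P(answer = yes | votes) under a uniform prior, where `votes` is a
    list of booleans (True = yes) and each worker is assumed to be
    independently correct with probability p_correct."""
    step = math.log(p_correct / (1 - p_correct))
    log_odds = sum(step if v else -step for v in votes)
    return 1 / (1 + math.exp(-log_odds))

def needs_another_vote(votes, threshold=0.95):
    """Request one more label while neither answer is confident enough."""
    p = posterior_yes(votes)
    return max(p, 1 - p) < threshold
```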

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Workflow optimization
  • How many votes: “Exact exponent in optimal rates for crowdsourcing”, Gao, C., Lu, Y. and Zhou, D., ICML 2016.
  • Accuracy/cost tradeoff in a database setting: “Crowdscreen: Algorithms for filtering data with humans”, Parameswaran, A.G., Garcia-Molina, H., Park, H., Polyzotis, N., Ramesh, A. and Widom, J., SIGMOD 2012.
  • Value of a worker’s judgment based on the information it gives about the answer: “Pay by the bit: an information-theoretic metric for collective human judgment”, Waterhouse, T.P., CSCW 2013. (A toy version is sketched after this list.)
  • Also assess the value of machine intelligence: “Combining human and machine intelligence in large-scale crowdsourcing”, Kamar, E., Hacker, S. and Horvitz, E., AAMAS 2012.
  • Switching workflows: “Dynamically switching between synergistic workflows for crowdsourcing”, Lin, C.H., Mausam and Weld, D.S., AAAI 2012.
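In the spirit of the “pay by the bit” idea, one can value a single binary judgment by the expected reduction in entropy about the answer; the sketch below is my own toy calculation, not Waterhouse’s actual metric:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) distribution."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def info_value_of_vote(prior_yes, p_correct=0.7):
    """Expected information (bits) gained from one more binary vote by a
    worker assumed to be independently correct with probability p_correct."""
    # Chance the vote comes back 'yes', marginalizing over the true answer.
    p_vote_yes = prior_yes * p_correct + (1 - prior_yes) * (1 - p_correct)
    # Posterior that the answer is 'yes' after a yes / no vote (Bayes rule).
    post_if_yes = prior_yes * p_correct / p_vote_yes
    post_if_no = prior_yes * (1 - p_correct) / (1 - p_vote_yes)
    expected_posterior_entropy = (p_vote_yes * entropy(post_if_yes)
                                  + (1 - p_vote_yes) * entropy(post_if_no))
    return entropy(prior_yes) - expected_posterior_entropy
```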

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Active learning: what to label
  • “Get another label? Improving data quality and data mining using multiple, noisy labelers”, Sheng, V.S., Provost, F. and Ipeirotis, P.G., KDD 2008.
    • Which items to relabel to improve learning (see the sketch after this list)
  • “To re(label), or not to re(label)”, Lin, C.H., Mausam and Weld, D.S., HComp 2014.
    • Is it better to relabel an item or to label something new?
  • “Proactive learning: cost-sensitive active learning with multiple imperfect oracles”, Donmez, P. and Carbonell, J.G., CIKM 2008.
    • Weigh the tradeoff between worker accuracy and cost
  • “Efficiently learning the accuracy of labeling sources for selective sampling”, Donmez, P., Carbonell, J.G. and Schneider, J., KDD 2009.
    • Weigh the value of learning more about worker accuracy (“exploration vs. exploitation”)
  • “Bayesian bias mitigation for crowdsourcing”, Wauthier, F.L. and Jordan, M.I., NIPS 2011.
    • Learn worker biases (and how to weight them) from the outcomes of machine learning on the data
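A minimal sketch of the relabeling choice in the spirit of Sheng et al. (my own illustration): send back to the crowd whichever item’s current votes are least decisive, measured by the margin between its top two answers:

```python
def pick_item_to_relabel(label_counts):
    """Pick the already-labeled item whose votes are least decisive.
    `label_counts` maps item -> {answer: vote count} (hypothetical format)."""
    def margin(counts):
        top_two = sorted(counts.values(), reverse=True)[:2] + [0]
        return top_two[0] - top_two[1]  # small margin = high disagreement
    return min(label_counts, key=lambda item: margin(label_counts[item]))
```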

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Selecting the best worker for the task
  • “Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing”, Chen, X., Lin, Q. and Zhou, D., ICML 2013.
    • Adapt assignments to learned worker accuracy (with no limit on how many tasks a worker can be given, though; a generic bandit sketch follows this list)
  • “Budget-optimal crowdsourcing using low-rank matrix approximations”, Karger, D.R., Oh, S. and Shah, D., Allerton Conference on Communication, Control, and Computing 2011.
    • How to allocate tasks based on probability of error and cost
  • “Online task assignment in crowdsourcing markets”, Ho, C.-J. and Vaughan, J.W., AAAI 2012; “Adaptive task assignment for crowdsourced classification”, Ho, C.-J., Jabbari, S. and Vaughan, J.W., ICML 2013.
    • Tasks fall into categories; workers have an (initially unknown) ability on each type of task and a maximum number of tasks they can be given
  • “Bayesian bias mitigation for crowdsourcing”, Wauthier, F.L. and Jordan, M.I., NIPS 2011.
    • Learn worker biases (and how to weight them) from the outcomes of machine learning on the data
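The exploration/exploitation flavor of these assignment papers can be illustrated with a generic UCB-style rule (a bandit sketch of my own, not the algorithm of any one paper above): prefer workers with high estimated accuracy, but keep some pressure to try workers we have scored rarely:

```python
import math

def choose_worker(stats, total_assignments, c=1.0):
    """Pick the next worker UCB-style. `stats` maps worker ->
    (n_correct, n_scored) from gold or agreement checks;
    `total_assignments` is the number of tasks assigned so far (>= 1)."""
    def ucb(worker):
        correct, n = stats[worker]
        if n == 0:
            return float("inf")  # always try unscored workers first
        return correct / n + c * math.sqrt(math.log(total_assignments) / n)
    return max(stats, key=ucb)
```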

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Selecting the best worker for the task
  • “Parallel task routing for crowdsourcing”, Bragg, J., Kolobov, A., Mausam and Weld, D.S., HComp 2014.
    • Assign batches of tasks to workers given known task difficulties and worker abilities
  • “Generalized Task Markets for Human and Machine Computation”, Shahaf, D. and Horvitz, E., AAAI 2010.
    • Assign tasks to workers based on known ability and cost

"Artificial intelligence and collective intelligence", Weld, D.S., Lin, C.H. and Bragg, J., 2015

• Minimizing latency
  • Retainer model
    • “VizWiz: nearly real-time answers to visual questions”, Bigham, J.P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R.C., Miller, R., Tatarowicz, A., White, B., White, S. and Yeh, T., UIST 2010.
  • Queueing theory to model the arrival of workers and plan accordingly
    • “Analytic methods for optimizing realtime crowdsourcing”, Bernstein, M., Karger, D., Miller, R. and Brandt, J., Collective Intelligence 2012.
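A back-of-the-envelope sizing rule for a retainer pool (my own simplification, not the queueing model of Bernstein et al.): if each retained worker is busy independently with probability `busy_prob`, choose the smallest pool where the chance that everyone is busy at once stays below a tolerable miss rate:

```python
import math

def retainer_pool_size(busy_prob, max_miss=0.01):
    """Smallest k with busy_prob**k < max_miss, i.e. the chance that all
    k retained workers are simultaneously unavailable stays below max_miss.
    Assumes independence; requires 0 < busy_prob < 1."""
    return math.ceil(math.log(max_miss) / math.log(busy_prob))

# Example: workers busy half the time, 1% tolerable miss rate -> pool of 7.
assert retainer_pool_size(0.5, 0.01) == 7
```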


Long-term goal: Integrating human and machine intelligence

Using human computation in artificial intelligence

Using artificial intelligence in human computation

“artificial artificial intelligence”: Human Computation as “Fake” AI


AI = Machine Learning

Machine Learning = Human-Labeled Data

Human-Labeled Data = Human Computation

AI = Human Computation

(just get me more data)

Human Computation to Label Data

• Main application areas:

• Computer vision

• Natural language processing

Human Computation in Machine Learning

• Clustering
  • “The crowd-median algorithm”, Heikinheimo, H. and Ukkonen, A., HComp 2013. (A loose sketch follows.)
  • “Adaptively learning the crowd kernel”, Tamuz, O., Liu, C., Belongie, S., Shamir, O. and Kalai, A.T., ICML 2011.
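To give a feel for the crowd-median idea, here is a loose Python sketch (mine, not the paper’s algorithm): items that are rarely named the odd one out of a random triple tend to be central, so the least-picked item approximates a “median”. The `ask_outlier` query function is an assumed crowd interface:

```python
import random
from collections import Counter

def crowd_median(items, ask_outlier, n_queries=500):
    """Approximate the most central item via crowd outlier queries.
    `items`: list of at least 3 hashable items.
    `ask_outlier(a, b, c)` returns whichever of the three a worker
    considers the odd one out (hypothetical crowd interface)."""
    outlier_counts = Counter({item: 0 for item in items})
    for _ in range(n_queries):
        a, b, c = random.sample(items, 3)
        outlier_counts[ask_outlier(a, b, c)] += 1
    # Central items are rarely the outlier of a random triple.
    return min(items, key=lambda item: outlier_counts[item])
```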

Machine Learning with Humans “In the Loop”

• Iterate between learning and human labeling
  • “Visual recognition with humans in the loop”, Branson, S., Wah, C., Schroff, F., Babenko, B., Welinder, P., Perona, P. and Belongie, S., ECCV 2010.
• A game does machine learning behind the scenes, guiding the game
  • “Game-powered machine learning”, Barrington, L., Turnbull, D. and Lanckriet, G., Proceedings of the National Academy of Sciences, 109(17), pp. 6411-6416, 2012.
• Iterate between learning and feature elicitation
  • “Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons”, Chaudhuri, K. and Kalai, A., HComp 2015.

Computer Vision/Robotics with Human Computation

• A game with two players, one the human controller and the other the “robot”; learn from how they interact
  • “Crowdsourcing human-robot interaction: New methods and system evaluation in a public environment”, Breazeal, C., DePalma, N., Orkin, J., Chernova, S. and Jung, M., Journal of Human-Robot Interaction, 2013.
• Robot reinforcement learning, with humans providing the feedback signal via clicks
  • “Robot reinforcement learning using crowdsourced rewards”, Penaloza, C.I., Chernova, S., Mae, Y. and Arai, T., Proceedings of the IROS Workshop on Cloud Robotics, 2013.
• Use the crowd for dialog authoring, dialog editing, and nonverbal behavior authoring
  • “Semi-situated learning of verbal and nonverbal content for repeated human-robot interaction”, Leite, I., Pereira, A., Funkhouser, A., Li, B. and Lehman, J.F., Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016.
• Get samples of natural language commands using AMT; get judgments of robot success using AMT
  • “Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation”, Tellex, S. et al., AAAI 2011.


Long-term goal: Integrating human and machine intelligence

Using human computation in artificial intelligence

Using artificial intelligence in human computation

• “Leveraging Complementary Contributions of Different Workers for Efficient Crowdsourcing of Video Captions”, Huang, Y., Huang, Y., Xue, N. and Bigham, J.P., Proceedings of the CHI Conference on Human Factors in Computing Systems, 2017.
  • Integrates automated speech recognition with human speakers of differing proficiencies

Next Time

• “Generalized Task Markets for Human and Machine Computation”, Shahaf, D. and Horvitz, E., AAAI 2010.

Question: Where are the markets?