CHI 2014
TRANSCRIPT: STRUCTURED LABELING TO FACILITATE CONCEPT EVOLUTION IN MACHINE LEARNING
Presenter: Hillol Sarker
Authors: Todd Kulesza, Saleema Amershi, Rich Caruana, Danyel Fisher, Denis Charles
Motivation: Machine Learning
We want to train a machine to recognize some target concept.
Supervised machine learning needs consistent labeled data (e.g., spam filtering, email prioritization), which is difficult to obtain.
Introduction Preliminary Study Incorporate Feedback Study Result Conclusion
Problem: Labeling consistency is compromised
- Labeler expertise: familiarity with the concept, judgment ability
- Data: contains ambiguity, changing distribution
- Concept: changes over time
Example: Semantic Location
Concept Evolution
Existing Approaches: Machine learning approaches
- Noise-tolerant algorithms
- Multiple labelers: majority voting, weighting schemes
- Pairwise comparison ("A is a better fit than B")
Problem: these leave out human judgment
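The multiple-labeler idea mentioned above, majority voting, can be sketched in a few lines. This is an illustrative example, not code from the paper:

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Resolve disagreement among multiple labelers by majority vote.

    labels_per_item: labels one item received, e.g. ["yes", "yes", "no"].
    Returns the most common label (ties broken by first-seen order).
    """
    return Counter(labels_per_item).most_common(1)[0][0]

print(majority_vote(["yes", "yes", "no"]))  # -> yes
```

Note that the vote mechanically resolves disagreement; it cannot tell whether labelers disagree because one is wrong or because the concept itself is ambiguous or evolving, which is the problem this paper targets.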
Approach
- Conducted a series of formative studies to investigate concept evolution in practice
- Observations and feedback from these studies informed the final prototype; feedback was incorporated into the initial labeling software
- Designed a study to evaluate the proposed structured labeling
Preliminary Study 1
- Interviewed 2 researchers/practitioners who create guidelines for labelers
- Feedback: the guideline creation process is iterative; guidelines evolve as new data is observed (e.g., examples with multiple interpretations)
Preliminary Study 2
- Recruited 11 machine learning experts
- Binary choice task using prototype software
Preliminary Study 3
- Conducted with 9 of the previous 11 participants, 4 weeks apart, using the same prototype software
- Same content, but shuffled order
[Chart: some comparisons showed no significant difference, others a significant difference]
Incorporating Feedback into the Study Software
Study Software Interface
The experiment tested 3 interface conditions:
- Baseline: traditional mutually exclusive labels ("Yes", "No", "Could be")
- Structured: structured labeling with manual structuring
- Assisted: structured labeling + automated assistance
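As a rough sketch of what "structured labeling" means here (an assumption for illustration, not the study software's actual code), each top-level label holds named sub-groups that the labeler creates and reorganizes as their concept evolves:

```python
from collections import defaultdict

class StructuredLabels:
    """Sketch of a structured-labeling data model: items are filed into
    named sub-groups under each top-level label ("yes"/"no"/"could be"),
    instead of receiving a single mutually exclusive label."""

    def __init__(self):
        # label -> group name -> list of items
        self.groups = defaultdict(lambda: defaultdict(list))

    def add(self, item, label, group="default"):
        self.groups[label][group].append(item)

    def move(self, item, src_label, src_group, dst_label, dst_group):
        """Revising a decision: relocate an item between groups."""
        self.groups[src_label][src_group].remove(item)
        self.groups[dst_label][dst_group].append(item)

# Hypothetical usage on the study's web-page labeling task:
sl = StructuredLabels()
sl.add("recipe blog", "yes", "cooking instructions")
sl.add("restaurant review", "could be")
# Later, the labeler's concept sharpens and the item is re-filed:
sl.move("restaurant review", "could be", "default", "no", "reviews")
print(dict(sl.groups["no"]))  # -> {'reviews': ['restaurant review']}
```

The Assisted condition would add machine suggestions on top of such a structure (e.g., recommending a group for a new item); that logic is not sketched here.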
Study Procedure
- 15 participants
- 108 items to label
- Fixed task order: cooking, travel, and gardening
- Brief introduction, then time to practice
- Interactions logged in each interface
- Questionnaire after completing each task, and another after completing all 3 tasks
Result: Groups
- Group count: Structured > Baseline (p<0.001); Manual > Baseline (p<0.001); Assisted > Baseline (p<0.001)
- Pages per group: "Could be" < "Yes" or "No"; "Yes" < "No"
Result: Revision
- Revisited count: Manual > Baseline (p<0.005); Assisted > Baseline (p<0.005)
- Revised count: Structured > Baseline (p<0.011); Manual > Baseline (p<0.006); Assisted > Baseline (p<0.024)
[Chart compares first half vs. last half of the task]
Result: Label Quality
Metric: ARI (Adjusted Rand Index)
- Measures agreement: pairs of items that end up grouped together, over all possible pairs
Label quality: Manual > Baseline (p=0.02); Assisted > Baseline (p=0.02); no significant difference between Manual and Assisted (p=0.394)
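The ARI used above can be computed from pair counts between two labelings, corrected for chance agreement. A minimal from-scratch sketch (equivalent in spirit to `sklearn.metrics.adjusted_rand_score`):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same items.

    Counts item pairs that agree (grouped together in both labelings,
    or apart in both) and subtracts the agreement expected by chance.
    1.0 = identical groupings; ~0 = chance-level agreement.
    """
    n = len(labels_a)
    pair = Counter(zip(labels_a, labels_b))   # contingency-table cells
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    sum_comb = sum(comb(c, 2) for c in pair.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total = comb(n, 2)

    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one single group
        return 1.0
    return (sum_comb - expected) / (max_index - expected)

# Identical groupings (group names may differ) give ARI = 1.0:
print(adjusted_rand_index([0, 0, 1, 1], ["a", "a", "b", "b"]))  # -> 1.0
```

Because ARI compares groupings rather than label names, it suits this study: two labelers "agree" if they partition the items the same way, even if their group names differ.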
Result: Labeling Speed
Manual < Baseline (p=0.003); Assisted < Baseline (p<0.001)
Feedback
- Participants ranked each tool from favorite to least favorite
- "How often did your concept change?" (Likert scale)
Summary: Structured Labeling
- Helps people evolve their concepts
- Increases label consistency at the cost of speed
- Can help machine learning algorithms: weight groups differently (e.g., "definitely yes" vs. "yes")
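One way the group weighting suggested above could feed a learner is by flattening structured labels into (label, sample-weight) pairs. The group names and weight values below are illustrative assumptions, not values from the paper:

```python
# Hypothetical mapping from structured-label groups to binary training
# labels and per-example weights: confident groups count more than
# tentative ones. These weights are an assumption for illustration.
GROUP_TO_LABEL_WEIGHT = {
    "definitely yes": (1, 1.0),
    "yes":            (1, 0.7),
    "could be":       (1, 0.2),  # ambiguous: heavily down-weighted
    "no":             (0, 0.7),
    "definitely no":  (0, 1.0),
}

def to_weighted_examples(structured_labels):
    """Flatten (item, group) pairs into parallel lists X, y, w that any
    learner accepting a sample_weight argument could consume."""
    X, y, w = [], [], []
    for item, group in structured_labels:
        label, weight = GROUP_TO_LABEL_WEIGHT[group]
        X.append(item)
        y.append(label)
        w.append(weight)
    return X, y, w

items = [("page1", "definitely yes"), ("page2", "could be"), ("page3", "no")]
X, y, w = to_weighted_examples(items)
print(y, w)  # -> [1, 1, 0] [1.0, 0.2, 0.7]
```

The resulting `w` could be passed as `sample_weight` to common classifiers so that tentative groups like "could be" influence the model less than confident ones.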
Contribution
- Concept evolution causes inconsistent labeling
- First work to show its importance