CHI 2014
TRANSCRIPT: STRUCTURED LABELING TO FACILITATE CONCEPT EVOLUTION IN MACHINE LEARNING
Presenter: Hillol Sarker
Authors: Todd Kulesza, Saleema Amershi, Rich Caruana, Danyel Fisher, Denis Charles
Motivation: Machine Learning
We want to train a machine to recognize some target concept.
Supervised machine learning needs consistent labeled data (e.g., spam filtering, email prioritization), which is difficult to obtain.
Introduction Preliminary Study Incorporate Feedback Study Result Conclusion
Problem: Labeling consistency is compromised
- Labeler expertise: familiarity with the concept, judgment ability
- Data: contains ambiguity, changing distribution
- Concept: changes over time
Example: Semantic Location
Concept Evolution
Existing Approaches: Machine learning approaches
- Noise-tolerant algorithms
- Multiple labelers: majority voting, weighting schemes
- Pairwise comparison ("A is a better fit than B")
Problem: these leave out human judgment
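The multiple-labeler idea mentioned above, majority voting, can be sketched in a few lines. This is an illustrative example, not code from the paper:

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Resolve disagreement among multiple labelers by majority vote.

    labels_per_item: labels one item received, e.g. ["yes", "yes", "no"].
    Returns the most common label (ties broken by first-seen order).
    """
    return Counter(labels_per_item).most_common(1)[0][0]

print(majority_vote(["yes", "yes", "no"]))  # -> yes
```

Note that the vote mechanically resolves disagreement; it cannot tell whether labelers disagree because one is wrong or because the concept itself is ambiguous or evolving, which is the problem this paper targets.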
Approach
- Conducted a series of formative studies to investigate concept evolution in practice
- Observations and feedback from these studies informed the final prototype; feedback was incorporated into the initial labeling software
- Designed a study to evaluate the proposed structured labeling
Preliminary Study 1
- Interviewed 2 researchers/practitioners who create guidelines for labelers
- Feedback: the guideline creation process is iterative; guidelines evolve as new data is observed (e.g., examples with multiple interpretations)
Preliminary Study 2
- Recruited 11 machine learning experts
- Binary choice task using prototype software
Preliminary Study 3
- Conducted with 9 of the previous 11 participants, 4 weeks apart, using the same prototype software
- Same content, but shuffled order
[Chart: some comparisons showed no significant difference, others a significant difference]
Incorporating Feedback into the Study Software
Study Software Interface
The experiment tested 3 interface conditions:
- Baseline: traditional mutually exclusive labels ("Yes", "No", "Could be")
- Structured: structured labeling with manual structuring
- Assisted: structured labeling + automated assistance
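As a rough sketch of what "structured labeling" means here (an assumption for illustration, not the study software's actual code), each top-level label holds named sub-groups that the labeler creates and reorganizes as their concept evolves:

```python
from collections import defaultdict

class StructuredLabels:
    """Sketch of a structured-labeling data model: items are filed into
    named sub-groups under each top-level label ("yes"/"no"/"could be"),
    instead of receiving a single mutually exclusive label."""

    def __init__(self):
        # label -> group name -> list of items
        self.groups = defaultdict(lambda: defaultdict(list))

    def add(self, item, label, group="default"):
        self.groups[label][group].append(item)

    def move(self, item, src_label, src_group, dst_label, dst_group):
        """Revising a decision: relocate an item between groups."""
        self.groups[src_label][src_group].remove(item)
        self.groups[dst_label][dst_group].append(item)

# Hypothetical usage on the study's web-page labeling task:
sl = StructuredLabels()
sl.add("recipe blog", "yes", "cooking instructions")
sl.add("restaurant review", "could be")
# Later, the labeler's concept sharpens and the item is re-filed:
sl.move("restaurant review", "could be", "default", "no", "reviews")
print(dict(sl.groups["no"]))  # -> {'reviews': ['restaurant review']}
```

The Assisted condition would add machine suggestions on top of such a structure (e.g., recommending a group for a new item); that logic is not sketched here.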
Study Procedure
- 15 participants
- 108 items to label
- Fixed task order: cooking, travel, and gardening
- Brief introduction, then time to practice
- Interactions logged in each interface
- Questionnaire after completing each task, and another after completing all 3 tasks
Result: Groups
- Group count: Structured > Baseline (p<0.001); Manual > Baseline (p<0.001); Assisted > Baseline (p<0.001)
- Pages per group: "Could be" < "Yes" or "No"; "Yes" < "No"
Result: Revision
- Revisited count: Manual > Baseline (p<0.005); Assisted > Baseline (p<0.005)
- Revised count: Structured > Baseline (p<0.011); Manual > Baseline (p<0.006); Assisted > Baseline (p<0.024)
[Chart compares first half vs. last half of the task]
Result: Label Quality
Metric: ARI (Adjusted Rand Index)
- Measures agreement: pairs of items that end up grouped together, over all possible pairs
Label quality: Manual > Baseline (p=0.02); Assisted > Baseline (p=0.02); no significant difference between Manual and Assisted (p=0.394)
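The ARI used above can be computed from pair counts between two labelings, corrected for chance agreement. A minimal from-scratch sketch (equivalent in spirit to `sklearn.metrics.adjusted_rand_score`):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same items.

    Counts item pairs that agree (grouped together in both labelings,
    or apart in both) and subtracts the agreement expected by chance.
    1.0 = identical groupings; ~0 = chance-level agreement.
    """
    n = len(labels_a)
    pair = Counter(zip(labels_a, labels_b))   # contingency-table cells
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    sum_comb = sum(comb(c, 2) for c in pair.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total = comb(n, 2)

    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one single group
        return 1.0
    return (sum_comb - expected) / (max_index - expected)

# Identical groupings (group names may differ) give ARI = 1.0:
print(adjusted_rand_index([0, 0, 1, 1], ["a", "a", "b", "b"]))  # -> 1.0
```

Because ARI compares groupings rather than label names, it suits this study: two labelers "agree" if they partition the items the same way, even if their group names differ.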
Result: Labeling Speed
Manual < Baseline (p=0.003); Assisted < Baseline (p<0.001)
Feedback
- Participants ranked each tool from favorite to least favorite
- "How often did your concept change?" (Likert scale)
Summary: Structured Labeling
- Helps people evolve their concepts
- Increases label consistency at the cost of speed
- Can help machine learning algorithms: weight groups differently (e.g., "definitely yes" vs. "yes")
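One way the group weighting suggested above could feed a learner is by flattening structured labels into (label, sample-weight) pairs. The group names and weight values below are illustrative assumptions, not values from the paper:

```python
# Hypothetical mapping from structured-label groups to binary training
# labels and per-example weights: confident groups count more than
# tentative ones. These weights are an assumption for illustration.
GROUP_TO_LABEL_WEIGHT = {
    "definitely yes": (1, 1.0),
    "yes":            (1, 0.7),
    "could be":       (1, 0.2),  # ambiguous: heavily down-weighted
    "no":             (0, 0.7),
    "definitely no":  (0, 1.0),
}

def to_weighted_examples(structured_labels):
    """Flatten (item, group) pairs into parallel lists X, y, w that any
    learner accepting a sample_weight argument could consume."""
    X, y, w = [], [], []
    for item, group in structured_labels:
        label, weight = GROUP_TO_LABEL_WEIGHT[group]
        X.append(item)
        y.append(label)
        w.append(weight)
    return X, y, w

items = [("page1", "definitely yes"), ("page2", "could be"), ("page3", "no")]
X, y, w = to_weighted_examples(items)
print(y, w)  # -> [1, 1, 0] [1.0, 0.2, 0.7]
```

The resulting `w` could be passed as `sample_weight` to common classifiers so that tentative groups like "could be" influence the model less than confident ones.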
Contribution
- Concept evolution causes inconsistent labeling
- First work to show its importance