INCEpTION workshop Active Learning for Text Annotation March 12th, 2018 | TU Darmstadt | Computer Science Department | UKP Lab | Ji-Ung Lee | INCEpTION Workshop | 1


Page 1

INCEpTION workshop

Active Learning for Text Annotation

March 12th, 2018 | TU Darmstadt | Computer Science Department | UKP Lab | Ji-Ung Lee | INCEpTION Workshop | 1

Page 2

Outline

- Motivation
- Active Learning in a Nutshell
- Active Learning Scenarios
- Sampling Strategies, Advantages and Disadvantages
- Conclusion

Page 3

Motivation

A Supervised Machine Learning (ML) Approach

1. Annotate documents

2. Train a model

3. Evaluate it

[Figure: the static supervised ML process: (1) human annotators label data, (2) the ML model is trained, (3) the model is evaluated.]

Page 4

Motivation

A Supervised Machine Learning (ML) Approach

What if the model performs poorly?

1. Try out different models
   - Tune hyper-parameters
   - Use different features
   - . . .

2. Annotate more data for training
   - Resource consuming
   - Not necessarily helpful

Page 5

Motivation

Example – WSD (bass)

Task: Classify sentences containing bass into their correct sense

I like playing the bass guitar. → bass (instrument)

I caught a big bass yesterday. → bass (fish)

Turn down the bass. → bass (tone)

Page 6

Motivation

Example – WSD (bass)

Perfect Model:

[Figure: a perfect model assigns every example sentence its correct sense: bass (fish), bass (tone), bass (instrument).]

Page 7

Motivation

Example – WSD (bass)

Imperfect Model:

[Figure: an imperfect model misclassifies some of the example sentences; it is only reliably correct for bass (instrument).]

Page 8

Motivation

Example – WSD (bass)

- True labels are unknown before annotation
- More annotated sentences for bass (instrument) may not help, since the model is already good for bass (instrument)

Annotating more data:

[Figure: sampling data randomly for human annotators may mostly yield more bass (instrument) examples, which the ML model already handles well.]

How to assess the helpfulness of unlabeled data? → Active Learning

Page 9

Active Learning in a Nutshell

Active learning hypothesis: Machine Learning (ML) algorithms can learn faster (and better) if they may choose the training data themselves [1]

1. Sample the most informative example(s)

2. Query those example(s) for labeling from an oracle (a human annotator)

3. Improve the model iteratively

Active Learning:

[Figure: the iterative active-learning process: (1) sampling selects examples, (2) human annotators label them, (3) the model is re-trained and evaluated, eventually yielding the best model.]
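The three steps above can be sketched as a minimal pool-based loop in Python. This is a hypothetical sketch: the `train`, `predict_proba`, and `ask_oracle` callables are placeholders for a real model and annotation interface, and least-confidence sampling stands in for the sampling step.

```python
def active_learning_loop(labeled, unlabeled, train, predict_proba, ask_oracle, rounds=10):
    """Minimal pool-based active-learning loop (least-confidence sampling)."""
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # 1. Sample the most informative example: lowest prediction confidence
        probs = [predict_proba(model, x) for x in unlabeled]
        idx = min(range(len(unlabeled)), key=lambda i: max(probs[i]))
        x = unlabeled.pop(idx)
        # 2. Query the oracle (human annotator) for a label
        y = ask_oracle(x)
        labeled.append((x, y))
        # 3. Re-train the model on the grown labeled set
        model = train(labeled)
    return model
```

Each iteration moves one example from the unlabeled pool to the labeled set, so annotation effort is spent where the model is least certain.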

Page 10

Active Learning Scenarios

Pool-based Sampling Scenario

- Pool-based Sampling Scenario [8]
  - Small pool of labeled data, large pool of unlabeled data
  - AL model samples examples which are assumed to be most helpful
  - Fitting scenario for annotating large corpora

Page 11

Active Learning Scenarios

Stream-based Sampling Scenario


- Stream-based Sampling Scenario [3]
  - Continuous stream of unlabeled data
  - AL model decides whether or not to sample each incoming example
  - Useful for online learning setups

Page 12

Active Learning Scenarios

Membership Query Synthesis


- Membership Query Synthesis [2]
  - AL model constructs examples for sampling
  - May lead to nonsensical data
  - Less suited for textual data

Page 13

Sampling Strategies

How to determine the usefulness of unlabeled data?

- Uncertainty Sampling [8]
- Query-by-Committee [4]
- Expected Error Reduction [7]
- Variance Reduction [6]

Page 14

Sampling Strategies

Uncertainty Sampling

Idea: Sample the example the model is most uncertain about

Measure uncertainty by:

- Prediction confidence
- Margin
- Entropy

For binary classification, all three measures are equivalent.

Page 15

Sampling Strategies

Prediction confidence

Input | Tone | Instrument | Fish | Confidence
I sing bass in our choir. | 0.8 | 0.15 | 0.05 | 0.8
I like playing the bass guitar. | 0.49 | 0.36 | 0.15 | 0.49
I caught a big bass yesterday. | 0.5 | 0.45 | 0.05 | 0.5
Turn down the bass. | 0.5 | 0.25 | 0.25 | 0.5

- Sample the sentence with the lowest prediction confidence
- Only takes into account the confidence of the predicted class (e.g. tone)

Page 16

Sampling Strategies

Margin

Input | Tone | Instrument | Fish | Margin
I sing bass in our choir. | 0.8 | 0.15 | 0.05 | 0.65
I like playing the bass guitar. | 0.49 | 0.36 | 0.15 | 0.13
I caught a big bass yesterday. | 0.5 | 0.45 | 0.05 | 0.05
Turn down the bass. | 0.5 | 0.25 | 0.25 | 0.25

- Sample the sentence with the smallest margin between the most confident and second most confident prediction

Page 17

Sampling Strategies

Entropy

Input | Tone | Instrument | Fish | Entropy
I sing bass in our choir. | 0.8 | 0.15 | 0.05 | 0.61
I like playing the bass guitar. | 0.49 | 0.36 | 0.15 | 1.00
I caught a big bass yesterday. | 0.5 | 0.45 | 0.05 | 0.85
Turn down the bass. | 0.5 | 0.25 | 0.25 | 1.04

- Entropy measures the amount of disorder, somewhat similar to measuring the uncertainty over all classes
- Sample the sentence with the highest entropy
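All three uncertainty measures can be computed directly from a model's class probabilities. A small sketch (using the natural logarithm, which reproduces the entropy values in the tables above):

```python
import math

def confidence(probs):
    """Confidence of the predicted class; sample the sentence with the LOWEST value."""
    return max(probs)

def margin(probs):
    """Gap between the two most confident classes; sample the SMALLEST value."""
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]

def entropy(probs):
    """Uncertainty over all classes; sample the HIGHEST value."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# "I like playing the bass guitar." with probabilities for [tone, instrument, fish]
p = [0.49, 0.36, 0.15]
print(confidence(p), round(margin(p), 2), round(entropy(p), 2))  # 0.49 0.13 1.0
```

Confidence only looks at the top class, margin at the top two, and entropy at the whole distribution, which is why they can rank the same sentences differently.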

Page 18

Sampling Strategies

Query-by-Committee (QbC)

Idea: Learn a set of classifiers with different hypotheses

- Every classifier predicts (votes) for an unlabeled candidate example
- Sample the example with the most disagreement
- Popular measures of disagreement:
  - Vote entropy
  - KL divergence

- Can be seen as a search through the hypothesis space

Page 19

Sampling Strategies

(Soft) Vote Entropy

Prediction probabilities of two different models for [tone, instrument, fish]

Input | Model 1 | Model 2 | Entropy
I sing bass in our choir. | [0.8, 0.1, 0.1] | [0.6, 0.3, 0.1] | 0.80
I like playing the bass guitar. | [0.2, 0.7, 0.1] | [0.2, 0.6, 0.2] | 0.89
I caught a big bass yesterday. | [0.5, 0.3, 0.2] | [0.1, 0.1, 0.8] | 1.03
Turn down the bass. | [0.4, 0.1, 0.5] | [0.3, 0.6, 0.1] | 1.10

- QbC generalization of entropy-based uncertainty sampling
- Compute the entropy over the averaged prediction confidences
- Sample the sentence with the highest vote entropy
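The computation sketched above (average the committee members' distributions, then take the entropy of the consensus; natural log, matching the table values):

```python
import math

def soft_vote_entropy(distributions):
    """Entropy of the committee's averaged class-probability distribution."""
    n = len(distributions)
    consensus = [sum(d[c] for d in distributions) / n
                 for c in range(len(distributions[0]))]
    return -sum(p * math.log(p) for p in consensus if p > 0)

# "Turn down the bass." with two models' probabilities for [tone, instrument, fish]
print(round(soft_vote_entropy([[0.4, 0.1, 0.5], [0.3, 0.6, 0.1]]), 2))  # 1.1
```

Note that strong individual predictions can cancel out: the two models disagree sharply on the last sentence, so their average is nearly uniform and the entropy is high.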

Page 20

Sampling Strategies

KL Divergence (KLD)

Prediction probabilities of two different models for [tone, instrument, fish]

Input | Model 1 | Model 2 | KLD
I sing bass in our choir. | [0.8, 0.1, 0.1] | [0.6, 0.3, 0.1] | 0.033
I like playing the bass guitar. | [0.2, 0.7, 0.1] | [0.2, 0.6, 0.2] | 0.010
I caught a big bass yesterday. | [0.5, 0.3, 0.2] | [0.1, 0.1, 0.8] | 0.195
Turn down the bass. | [0.4, 0.1, 0.5] | [0.3, 0.6, 0.1] | 0.175

- Kullback–Leibler divergence (relative entropy) compares probability distributions
- Sample the sentence with the highest KL divergence
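A common QbC formulation, and the one that matches the table values above, averages each committee member's KL divergence from the committee's mean distribution:

```python
import math

def kl_divergence(p, q):
    """Relative entropy of distribution p from distribution q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def qbc_kl_divergence(distributions):
    """Average KL divergence of each committee member from the consensus."""
    n = len(distributions)
    consensus = [sum(d[c] for d in distributions) / n
                 for c in range(len(distributions[0]))]
    return sum(kl_divergence(d, consensus) for d in distributions) / n

# "I sing bass in our choir." with two models' probabilities for [tone, instrument, fish]
print(round(qbc_kl_divergence([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]), 3))  # 0.033
```

Unlike soft vote entropy, this score is driven purely by disagreement between members, not by how uncertain the consensus itself is.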

Page 21

Sampling Strategies

Expected Error / Variance Reduction

Uncertainty Sampling:
- Assumes the most uncertain example gives the largest improvement in prediction performance
- This is not necessarily true

Expected Error Reduction [7]:
- Minimize the expected future error directly

Variance Reduction [6]:
- Computing the expected future error is costly
- Minimize it indirectly by minimizing the output variance

Page 22

Expected Error Reduction

Algorithm 1: Expected Error Reduction
Require: model M, labeled data L, unlabeled data X, labels Y, expected loss E(M)

for x ∈ X do
    for y ∈ Y do
        L̂ ← L ∪ {(x, y)}
        M̂ ← train(L̂)
        loss_{x,y} ← E(M̂)
    end for
    loss_x ← avg_y(loss_{x,y})
end for
x̂ ← argmin_x(loss_x)
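Algorithm 1 translates almost line-for-line into Python. This is a sketch with hypothetical `train(data) -> model` and `expected_loss(model) -> float` callables standing in for a real learner and loss estimate:

```python
def expected_error_reduction(labeled, unlabeled, labels, train, expected_loss):
    """Return the unlabeled example whose annotation minimizes the expected loss."""
    best_x, best_loss = None, float("inf")
    for x in unlabeled:
        # Average the re-trained model's loss over every possible label for x
        losses = [expected_loss(train(labeled + [(x, y)])) for y in labels]
        avg_loss = sum(losses) / len(losses)
        if avg_loss < best_loss:
            best_x, best_loss = x, avg_loss
    return best_x
```

Note the cost: one model is re-trained per (example, label) pair, which is why this strategy is computationally expensive in practice.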

Page 23

Advantages and Disadvantages

Uncertainty Sampling

Pros:
- Simple, fast
- Easy to implement
- Usable with any probabilistic model

Cons:
- Does not account for outliers
- Confident wrong predictions may never get sampled:

Input | Tone | Instrument | Fish
I sing bass in our choir. | 0.1 | 0.1 | 0.8

Page 24

Advantages and Disadvantages

Query-by-Committee

Pros:
- Simple
- Usable with any learning algorithm, or sets of different algorithms

Cons:
- Difficult to train
- Difficult to maintain

If using different algorithms:
- Make sure to normalize their outputs, if necessary
- Consider using weighted voting to account for different model performances

Page 25

Advantages and Disadvantages

Expected Error / Variance Reduction

Pros:
- Directly minimizes the expected error / variance

Cons:
- Computationally expensive
- Difficult to implement
- Limited to the pool-based sampling scenario
- Variance reduction is limited to regression models

Page 26

Conclusion

Active Learning for Text Annotation

In general:
- Allows iterative training of a model, requiring less training data
- Gives a good estimate of how models may perform later on
- May sample data which is hard to annotate (increases annotation time):

Input | Tone | Instrument | Fish
Turn down the bass. | 0.4 | 0.3 | 0.3

Watch out for:
- Skewed label distributions (QbC can help)
- Unreliable oracles, e.g. crowd-sourcing (estimate annotator performance)
- Outliers (use cluster-based extensions of active learning)

Page 27

Thank you for your attention!

Page 28

Other Query Strategies

Cluster-based Approaches:
- Density Weighting
- Hierarchical Sampling

Advantages, Disadvantages:
- Pros: Model the actual input distribution, less prone to outliers
- Cons: The actual input distribution may not relate to the actual labels

Page 29

References I

[1] Burr Settles. Active Learning Literature Survey. Computer Sciences Technical Report, University of Wisconsin–Madison, 2010.

[2] Dana Angluin. Queries and Concept Learning. Machine Learning, Vol. 2, Issue 4, pages 319–342, April 1988. doi:10.1023/A:1022821128753

[3] Les E. Atlas, David A. Cohn, and Richard E. Ladner. Training Connectionist Networks with Queries and Selective Sampling. In: Advances in Neural Information Processing Systems 2, pages 566–573. Morgan Kaufmann, 1990.

[4] H. S. Seung, M. Opper, and H. Sompolinsky. Query by Committee. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory (COLT '92), Pittsburgh, Pennsylvania, USA, pages 287–294. ACM, 1992. doi:10.1145/130385.130417

Page 30

References II

[5] Burr Settles, Mark Craven, and Soumya Ray. Multiple-Instance Active Learning. In: Advances in Neural Information Processing Systems 20, pages 1289–1296. Curran Associates Inc., 2008.

[6] David A. Cohn. Neural Network Exploration Using Optimal Experiment Design. In: Advances in Neural Information Processing Systems 6, pages 679–686. Morgan Kaufmann, 1994.

[7] Nicholas Roy and Andrew McCallum. Toward Optimal Active Learning Through Sampling Estimation of Error Reduction. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01), pages 441–448. Morgan Kaufmann, 2001.

Page 31

References III

[8] David D. Lewis and William A. Gale. A Sequential Algorithm for Training Text Classifiers. In: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '94), Dublin, Ireland, pages 3–12. Springer-Verlag New York Inc., 1994.
