Page 1 NAACL-HLT 2010 Los Angeles, CA Training Paradigms for Correcting Errors in Grammar and Usage Alla Rozovskaya and Dan Roth University of Illinois at Urbana-Champaign

Page 1

NAACL-HLT 2010

Los Angeles, CA

Training Paradigms for Correcting Errors in Grammar and Usage

Alla Rozovskaya and Dan Roth

University of Illinois at Urbana-Champaign

Page 2

Error correction tasks

Context-sensitive spelling mistakes
  I would like a peace*/piece of cake.

English as a Second Language (ESL) mistakes
  Mistakes involving prepositions
    To*/in my mind, this is a serious problem.
  Mistakes involving articles
    Nearly 30000 species of plants are under the*/a serious threat of disappearing.
    Laziness is the engine of the*/<NONE> progress.

Page 3

The standard training paradigm for error correction

Example: Correcting article mistakes [Izumi et al., ’03; Han et al., ’06; De Felice and Pulman, ’08; Gamon et al., ’08]

Cast the problem as a classification task
  Provide a set of candidates: {a,the,NONE}
  Task: select the appropriate candidate in context
Define features based on the surrounding context and train a classifier on correct (native) data

Laziness is the engine of progress[the]

Features: w1B=of, w1A=progress, w2Bw1B=engine-of, …

Page 4

The standard training paradigm for error correction

Correcting article mistakes [Izumi et al., ’03; Han et al., ’06; De Felice and Pulman, ’08; Gamon et al., ’08]

Correcting preposition mistakes [Eeg-Olofsson and Knutsson, ’03; Gamon et al., ’08; Tetreault and Chodorow, ’08, others]

Context-sensitive spelling correction [Golding and Roth, ’96, ’99; Carlson et al., ’01, others]

Page 5

But this is a paradigm for a selection task!

Selection task (e.g. WSD):
  We have a set of candidates
  Task: select the correct candidate from a set of candidates

The selection paradigm is appropriate for WSD, because there is no proposed candidate in context

Page 6

The typical error correction training paradigm is the paradigm of a selection task!

Why?

Easy to obtain training data – can use correct text
No need for annotation

Page 7

Outline

The error correction task: Problem statement
The typical training paradigm – does selection rather than correction
Selection versus correction
What is the appropriate training paradigm for the correction task?
The ESL corpus
Training paradigms for the error correction task
Key idea
Methods of error generation
Experiments
Conclusions

Page 8

Selection tasks versus error correction tasks

Article selection task
  Nearly 30000 species of plants are under ___ serious threat of disappearing.
  Set of candidates: {a,the,NONE}

Article correction task
  Nearly 30000 species of plants are under the serious threat of disappearing.
  Source: the
  Set of candidates: {a,the,NONE}

Page 9

Correction versus selection

Article selection classifier
  Accuracy on native English data: 87-90%
  Baseline for the article selection task: 60-70% (use the most common article)

Non-native data: accuracy >90%
  Error rate = 10%
  If we use the writer’s selection, the results are very good already!

Conclusion: Need to use the proposed candidate (or will make more mistakes than there are in the data)
  With a selection model – can use it as a threshold

Can we do better if we use the proposed candidate in training?

Page 10

The proposed article is a useful resource

We want to use the proposed article in training
  90% of articles are used correctly
  Article mistakes are not random

Selection paradigm: Can we use the proposed candidate in training?

- No: In native data, the proposed article always corresponds to the label

Page 11

How can we use the proposed article in training?

Using annotated data for training
  Laziness is the engine of <the,NONE> progress.
  (<the,NONE> = <source,label>)

Annotating data for training is expensive

*Need a method to generate training data for the error correction task without expensive annotation.

Page 12

Contributions of this work

We propose a method to generate training data for the error correction task
  Avoid expensive data annotation

We use the generated data to train classifiers in the paradigm of correction
  With the proposed candidate in training

We show that error correction training paradigms are superior to the selection paradigm of training

Page 13

Outline

The error correction task: Problem statement
The typical training paradigm – does selection rather than correction
Selection versus correction
What is the appropriate training paradigm for correction?
The ESL corpus
Training paradigms for the error correction task
Key idea
Methods of error generation
Experiments
Conclusions

Page 14

The annotated ESL corpus

Annotated a corpus of ESL sentences (60K words)
Extracted from two corpora of ESL essays:
  ICLE [Granger et al., ’02]
  CLEC [Gui and Yang, ’03]
Sentences written by ESL students of 9 first languages
Each sentence is fully corrected and error tagged
Annotated by native English speakers
Experiments: Chinese, Czech, Russian

Page 15

The annotated ESL corpus

Annotating ESL sentences with an annotation tool

[Screenshot of the annotation tool: sentence for annotation]

Page 16

The annotated ESL corpus

Each sentence is fully corrected and error-tagged
For details about the annotation, please see [Rozovskaya and Roth, ’10, NAACL-BEA5]

Before annotation
“This time asks for looking at things with our eyes opened.”

With annotation comments
“This time @period, age, time@ asks $us$ for <to> looking *look* at things with our eyes opened .”

After annotation
“This period asks us to look at things with our eyes opened.”

Page 17

Outline

The error correction task: Problem statement
The typical training paradigm – does selection rather than correction
Selection versus correction
What is the appropriate training paradigm for correction?
The ESL data used in the evaluation
Training paradigms for the error correction task
Key idea
Methods of error generation
Experiments
Conclusions

Page 18

Training paradigms for the error correction task

Generate artificial article errors in native training data
  The source article can be used in training as a feature

Constraint: We want training data to be similar to non-native text
  Other works that use artificial errors do not take into account error patterns in non-native data [Sjöbergh and Knutsson, ’05; Brockett et al., ’06; Foster and Andersen, ’09]

Key idea: We want to be able to use the proposed candidate in training

Page 19

Training paradigms for the error correction task

We examine article errors in the annotated data:

Add errors selectively

Mimic the article distribution, the error rate, and the error patterns of the non-native text

Page 20

Error rates in article usage

Very common mistakes made by non-native speakers of English

TOEFL essays by Russian, Chinese, and Japanese speakers: 13% of noun phrases have article mistakes [Han et al., ’06]

Essays by advanced Chinese, Czech, Russian learners of ESL: 10% of noun phrases have article mistakes.

Page 21

Distribution of articles in the annotated ESL data

Source language      Examples total   Error rate   Errors total   Classes (%)
                                                                  a      the    None
Chinese              1713             9.2%         158            8.5    28.2   63.3
Czech                1061             9.6%         102            9.1    22.9   68.0
Russian              2146             10.4%        224            10.5   21.7   67.9
English Wikipedia                                                 9.6    29.1   61.4

This error rate sets the baseline for the task around 90%
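
The "around 90%" baseline follows directly from the counts in the table: a system that leaves every article as the writer produced it is right wherever the writer was right. The sketch below recomputes the error rates from the example and error totals; the function names are introduced here for illustration only.

```python
# (examples total, errors total) per source language, copied from the table.
COUNTS = {
    "Chinese": (1713, 158),
    "Czech": (1061, 102),
    "Russian": (2146, 224),
}

def error_rate(examples, errors):
    """Fraction of article slots that the writer got wrong."""
    return errors / examples

def do_nothing_accuracy(examples, errors):
    """Accuracy of a system that keeps every article the writer chose."""
    return 1.0 - error_rate(examples, errors)

for language, (examples, errors) in COUNTS.items():
    rate = 100 * error_rate(examples, errors)
    baseline = 100 * do_nothing_accuracy(examples, errors)
    print(f"{language}: error rate {rate:.1f}%, do-nothing accuracy {baseline:.1f}%")
```

Any classifier whose accuracy falls below this figure introduces more mistakes than it removes, which is why the writer's own choice is worth keeping as evidence.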

Page 22

Distribution of article errors in the annotated ESL text

[Bar chart: Distribution of errors by type (Missing the, Missing a, Extr. the, Extr. a, Conf.(a,the)) for Chinese, Czech, and Russian; vertical axis 0 to 60]

Not all confusions are equally likely

Errors are dependent on the first language of the writer

Page 23

Characteristics of the non-native data: Summary

Article distribution
Error rates
Error patterns of the non-native text

We use this knowledge to generate errors for error correction training paradigms

Page 24

Error correction training paradigm 1: General

Add errors uniformly at random with error rate conf, where conf ∈ {5%, 10%, 12%, 14%, 16%, 18%}

Example: Let error rate = 10%

the:  replace(the, a, 0.05)   replace(the, NONE, 0.05)
a:    replace(a, the, 0.05)   replace(a, NONE, 0.05)
NONE: replace(NONE, a, 0.05)  replace(NONE, the, 0.05)
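
A minimal sketch of the General method, assuming each training example is reduced to its correct article: with probability equal to the error rate, the source article is swapped for one of the other two candidates chosen uniformly, so a 10% rate gives 0.05 per replacement as in the example. The function name and the (source, label) pair layout are illustrative, not taken from the talk.

```python
import random

ARTICLES = ("a", "the", "NONE")

def add_uniform_errors(labels, error_rate, rng):
    """Return (source, label) pairs with uniformly random artificial errors.

    labels:     correct articles taken from native text
    error_rate: probability that a slot receives a wrong source article
    rng:        a random.Random instance, passed in for reproducibility
    """
    examples = []
    for label in labels:
        source = label
        if rng.random() < error_rate:
            # Pick one of the two other articles with equal probability.
            source = rng.choice([a for a in ARTICLES if a != label])
        examples.append((source, label))
    return examples

native_labels = ["the", "NONE", "a", "NONE", "the"] * 2000
training = add_uniform_errors(native_labels, 0.10, random.Random(0))
observed = sum(source != label for source, label in training) / len(training)
print(f"observed error rate: {observed:.3f}")
```

The label is left untouched, so the generated source article can be added as a feature while the classifier is still trained to predict the correct article.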

Page 25

Error correction training paradigm 2: ArticleDistr

Mimic the distribution of the ESL source articles in training

                     a      the    NONE
Czech                9.1    22.9   68.0
English Wikipedia    9.6    29.1   61.4

Example (the):
  replace(the, a, p1)
  replace(the, NONE, p2)

Constraints:
  (1) ProbTrain(the) = ProbCzech(the)
  (2) p1, p2 ≥ minConf, where minConf ∈ {0.02, 0.03, 0.04, 0.05}

A linear program is set up to find p1 and p2
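
The two constraints can be written out for all three articles at once. What follows is one possible formalization, not necessarily the program the authors solved. The symbols are introduced here for illustration: q is the article distribution of the native training text (for instance the English Wikipedia row), r is the source-article distribution of the ESL text (for instance the Czech row), and p_{ij} is the probability of replacing article i with article j, so that p1 = p_{the,a} and p2 = p_{the,NONE}. The slide does not state an objective function, so none is given.

```latex
\begin{align*}
  \sum_{i \in A} q_i \, p_{ij} &= r_j
      && \text{for all } j \in A = \{\textit{a}, \textit{the}, \textit{NONE}\} \\
  p_{ij} &\ge \mathit{minConf}
      && \text{for all } i \ne j \\
  \sum_{j \in A} p_{ij} &= 1
      && \text{for all } i \in A
\end{align*}
```

Under this reading, the first line with j = the is constraint (1), ProbTrain(the) = ProbCzech(the), and the second line is constraint (2). All three are linear in the unknowns p_{ij}, which is what makes a linear program applicable.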

Page 26

Error correction training paradigm 3: ErrorDistr

Add article mistakes to mimic the error rate and confusion patterns observed in the ESL data.

Example: Chinese
Error rate: 9.2%

[Bar chart: Article confusions by error type (Missing the, Missing a, Extr. the, Extr. a, Conf.(a,the)); vertical axis 0 to 60]
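
To turn an overall error rate and a breakdown by error type into per-article replacement probabilities, one option is to divide each error type's share of the error mass by how often its correct article occurs. The sketch below assumes the chart values are shares of all errors, reads "Extr." as an extra (unneeded) article, and uses the English Wikipedia row as the label distribution. The shares in `EXAMPLE_SHARES` are made up for illustration, since the chart values are not recoverable from the transcript.

```python
def replacement_probs(error_rate, error_shares, label_prior):
    """Return P(source | label) for each (label, source) confusion.

    error_rate:   fraction of article slots that should end up wrong
    error_shares: {(label, source): share of all errors}; shares sum to 1
    label_prior:  {label: fraction of slots whose correct article is label}
    """
    return {
        (label, source): error_rate * share / label_prior[label]
        for (label, source), share in error_shares.items()
    }

# Label distribution: the English Wikipedia row (percentages as fractions).
WIKI_PRIOR = {"a": 0.096, "the": 0.291, "NONE": 0.614}

# HYPOTHETICAL shares, keyed by (correct article, erroneous source article).
EXAMPLE_SHARES = {
    ("the", "NONE"): 0.40,  # missing "the"
    ("a", "NONE"): 0.20,    # missing "a"
    ("NONE", "the"): 0.25,  # extra "the"
    ("NONE", "a"): 0.05,    # extra "a"
    ("a", "the"): 0.05,     # confusion: "the" written where "a" is correct
    ("the", "a"): 0.05,     # confusion: "a" written where "the" is correct
}

probs = replacement_probs(0.092, EXAMPLE_SHARES, WIKI_PRIOR)
for (label, source), p in probs.items():
    print(f"replace({label}, {source}, {p:.4f})")
```

Weighting each probability by its label frequency and summing recovers the target error rate, which is a quick sanity check on any set of shares.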

Page 27

Error correction training paradigms: Summary

Key idea: generate artificial errors in native training data
  We can use the source article in training as a feature

Important constraints: Errors mimic the error patterns of the ESL text
  Error rate
  Distribution of different article confusions

Page 28

Error correction training paradigms: Costs

3 error generation methods

Use different knowledge (and have different costs)
  Paradigm 1 (error rate in the data)
  Paradigm 2 (distribution of articles in the ESL data) – no annotation required
  Paradigm 3 (error rate and article confusions) – requires annotated data (the most costly method)

Page 29

Outline

The error correction task: Problem statement
The typical training paradigm – does selection rather than correction
Selection versus correction
What is the appropriate training paradigm for correction?
The ESL data used in the evaluation
Training paradigms for the error correction task
Key idea
Methods of error generation
Experiments
Conclusions

Page 30

Experimental setup

Train a TrainClean classifier using the selection paradigm

3 classifiers are Trained With artificial Errors (TWE classifiers)

Online learning paradigm and the Averaged Perceptron Algorithm.
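
For readers unfamiliar with the learner, here is a minimal multiclass averaged perceptron over sparse binary features. It is a textbook sketch, not the authors' implementation: the class name, the toy examples, and the naive averaging (summing the full weight table after every example, which is simple but slow) are all choices made for illustration.

```python
from collections import defaultdict

class AveragedPerceptron:
    def __init__(self, classes):
        self.classes = list(classes)
        self.weights = defaultdict(float)  # (class, feature) -> current weight
        self.totals = defaultdict(float)   # sum of weights after each example
        self.seen = 0                      # number of examples processed

    def predict(self, feats, averaged=True):
        # The totals are the averaged weights times `seen`; the positive
        # scale factor does not change which class scores highest.
        table = self.totals if averaged and self.seen else self.weights
        scores = {
            c: sum(table.get((c, f), 0.0) for f in feats) for c in self.classes
        }
        return max(self.classes, key=lambda c: scores[c])

    def update(self, feats, gold):
        guess = self.predict(feats, averaged=False)
        if guess != gold:
            # Standard perceptron step: promote gold, demote the wrong guess.
            for f in feats:
                self.weights[(gold, f)] += 1.0
                self.weights[(guess, f)] -= 1.0
        for key, value in self.weights.items():
            self.totals[key] += value
        self.seen += 1

# Toy (features, label) examples, invented for illustration.
toy_data = [
    ({"source=the", "w1A=progress"}, "NONE"),
    ({"source=a", "w1A=cake"}, "a"),
    ({"source=the", "w1A=sun"}, "the"),
]

model = AveragedPerceptron(["a", "the", "NONE"])
for _ in range(5):
    for feats, gold in toy_data:
        model.update(feats, gold)

print([model.predict(feats) for feats, _ in toy_data])
```

In the TrainClean setting the `source=...` feature would simply be absent; the TWE classifiers differ only in the training data and in having that feature available.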

Page 31

Features

Features are based on the 3-word window around the target.

If we take [a] brief look back
if-IN we-PRP take-VBP [a] brief-JJ look-NN back-RB

Word features: headWord=look, w3B=if, w2B=we, w1B=take, w1A=brief, etc.
Tag features: p3B=IN, p2B=PRP, etc.
Composite features: w2Bw1B=we-take, w1Bw1A=take-brief, etc.

Feature type                     Features
Simple lexical features          w3B, w2B, w1B, w1A, w2A, w3A, headWord
Simple part-of-speech features   p3B, p2B, p1B, p1A, p2A, p3A, headPos, headNumber
Composite features               w2Bw1B, w1Bw1A, w1Aw2A, p2Bp1B, p1Bp1A, p1Ap2A, p1Bw1B, p2Bw2B, p2Aw2A, p1Aw1A, p1Ap2Ap3A, headWordheadPos, w1BheadWord, w1AheadWord

source feature – TWE systems only
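
The window features can be sketched as a small function over a tokenized, tagged sentence with the article removed. The function name and arguments are hypothetical; the head-word index is passed in because finding the noun-phrase head needs a chunker or parser that is outside this sketch, and headNumber plus most composite features are omitted for brevity.

```python
def window_features(words, tags, gap, head, size=3):
    """Extract window features around an article slot.

    words, tags: the sentence without the article, and its POS tags
    gap:         index of the first word after the article slot
    head:        index of the head word of the noun phrase
    size:        number of words to look at on each side
    """
    feats = {}
    for k in range(1, size + 1):
        before = gap - k       # k-th word before the slot
        after = gap + k - 1    # k-th word after the slot
        if before >= 0:
            feats[f"w{k}B"] = words[before].lower()
            feats[f"p{k}B"] = tags[before]
        if after < len(words):
            feats[f"w{k}A"] = words[after].lower()
            feats[f"p{k}A"] = tags[after]
    feats["headWord"] = words[head].lower()
    feats["headPos"] = tags[head]
    # A few of the composite features: adjacent word pairs around the slot.
    for left, right in (("w2B", "w1B"), ("w1B", "w1A"), ("w1A", "w2A")):
        if left in feats and right in feats:
            feats[left + right] = f"{feats[left]}-{feats[right]}"
    return feats

words = ["If", "we", "take", "brief", "look", "back"]
tags = ["IN", "PRP", "VBP", "JJ", "NN", "RB"]
feats = window_features(words, tags, gap=3, head=4)
print(feats)
```

For a TWE classifier, one more entry such as `feats["source"] = "a"` would carry the writer's article; a TrainClean classifier has no such feature.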

Page 32

Performance on the data by Russian speakers

Training paradigm       Accuracy   Error reduction
TrainClean              90.62%     5.92%
TWE (General)           91.25%     12.24%
TWE (Article Distr.)    91.52%     14.94%
TWE (Error Distr.)      91.63%     16.05%
Baseline                90.03%

All TWEs outperform the selection paradigm TrainClean for all languages

On average, TWE (Error Distr.) provides the best improvement

Page 33

Improvement due to training with errors

Source language   Baseline   TrainClean   TWE      Error reduction
Chinese           92.03%     91.85%       92.67%   10.06%
Czech             90.88%     91.82%       92.22%   4.89%
Russian           90.03%     90.62%       91.63%   10.77%
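
The error reduction figures are consistent with the usual relative definition: the drop in error divided by the error of the reference system. The slides do not spell the formula out, so the function below is an inference from the reported numbers. On the Russian results slide the reference is the baseline; in this table it is TrainClean.

```python
def error_reduction(reference_acc, new_acc):
    """Relative error reduction in percent; both accuracies are in percent."""
    reference_error = 100.0 - reference_acc
    new_error = 100.0 - new_acc
    return 100.0 * (reference_error - new_error) / reference_error

# Russian, TWE (Error Distr.) against the baseline.
print(round(error_reduction(90.03, 91.63), 2))

# This table: best TWE against TrainClean for each source language.
for language, train_clean, twe in [
    ("Chinese", 91.85, 92.67),
    ("Czech", 91.82, 92.22),
    ("Russian", 90.62, 91.63),
]:
    print(language, round(error_reduction(train_clean, twe), 2))
```

Because the reference error is already below 10%, an absolute gain of about one accuracy point corresponds to a relative reduction of roughly 10%.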

Page 34

Conclusions

We argued that the error correction task should be studied in the error correction paradigm rather than the current selection paradigm
  The baseline for the error correction task is high
  Mistakes are not random

We have proposed a method to generate training data for error correction tasks using artificial errors
  The artificial errors mimic error rates and error patterns in the non-native text
  The method allows us to train with the proposed candidate, in the paradigm of error correction

The error correction training paradigms are superior to the typical selection training paradigm

Page 35

Thank you! Questions?