NICK PENDAR AND ELENA COTOS IOWA STATE UNIVERSITY THE 3RD WORKSHOP ON INNOVATIVE USE OF NLP FOR BUILDING EDUCATIONAL APPLICATIONS JUNE 19, 2008 Automatic Identification of Discourse Moves in Scientific Article Introductions

Page 1

NICK PENDAR AND ELENA COTOS

IOWA STATE UNIVERSITY

THE 3RD WORKSHOP ON INNOVATIVE USE OF NLP FOR BUILDING EDUCATIONAL APPLICATIONS

JUNE 19, 2008

Automatic Identification of Discourse Moves

in Scientific Article Introductions

Page 2

Outline

Background and motivation

Discourse move identification
    Data and annotation scheme
    Feature selection
    Sentence representation
    Classifier
    Evaluation
    Inter-annotator agreement

Further work

Page 3

Automated evaluation: Background

Automated essay scoring (AES) in performance-based and high-stakes standardized tests (e.g., ACT, GMAT, TOEFL)

Automated error detection in L2 output (Burstein and Chodorow, 1999; Chodorow et al., 2007; Han et al., 2006; Leacock and Chodorow, 2003)

Assessment of various constructs, e.g., topical content, grammar, style, mechanics, syntactic complexity, and deviance or plagiarism (Burstein, 2003; Elliott, 2003; Landauer et al., 2003; Mitchell et al., 2002; Page, 2003; Rudner and Liang, 2002)

Text organization limited to recognizing the five-paragraph essay format, thesis, and topic sentences

AntMover (Anthony and Lashkia, 2003)

Page 4

Automated evaluation: CALI Motivation

Wide range of possibilities for high-quality evaluation and feedback (Criterion; Burstein, Chodorow, & Leacock, 2004)

Potential in formative assessment, but the effects of intelligent formative feedback have not been fully investigated

Warschauer and Ware (2006) call for the development of a classroom research agenda that would help evaluate and guide the application of AES in writing pedagogy

“the potential of automated essay evaluation for improving student writing is an empirical question, and virtually no peer-reviewed research has yet been published” (Hyland and Hyland, 2006, p. 109)

Page 5

Automated evaluation: EAP Motivation

EAP pedagogical approaches (Cortes, 2006; Levis & Levis-Muller, 2003; Vann & Myers, 2001) fail to provide NNSs with sufficient academic writing practice and remedial guidance

Problem of disciplinarity

An NLP-based academic discourse evaluation software application could account for this drawback

Such an application has not yet been developed

Page 6

Automated evaluation: Research Motivation

Long-term research goals:

design and implementation of IADE (Intelligent Academic Discourse Evaluator)

analysis of IADE effectiveness for formative assessment purposes

Page 7

IADE

Evaluates students’ research article introductions in terms of moves/steps (Swales, 1990, 2004)

Draws from SLA models: interactionist views (Carroll, 1999; Gass, 1997; Long, 1996; Long & Robinson, 1998; Mackey, Gass, & McDonough, 2000; Swain, 1993) and Systemic Functional Linguistics (Martin, 1992; Halliday, 1985)

Skill Acquisition Theory of learning (DeKeyser, 2007)

Is informed by empirical research on the provision of feedback

Is informed by Evidence Centered Design principles (Mislevy et al., 2006)

Page 8

Discourse Move Identification

Approached as a classification problem (similar to Burstein et al., 2003)

Given a sentence and a finite set of moves and steps, which move/step does the sentence signify?

ISUAW corpus: 1,623 articles; 1,322,089 words; average length of articles 814.09 words

Stratified sampling of 401 introduction sections representative of 20 academic disciplines

Sub-corpus: 267,029 words; average length 665.91 words; 11,149 sentences

Manual annotation

Page 9

Discourse Move Identification

Annotation scheme (Swales, 1990; Swales, 2004)

Page 10

Discourse Move Identification

Multiple layers of annotation for cases when the same sentence signified more than one move or more than one step

Page 11

Feature Selection

Features that reliably indicate a move/step

Text-categorization approach (see Sebastiani, 2002)

Each sentence treated as a data item to be classified and represented as an n-dimensional vector in Euclidean space

The task of the learning algorithm is to find a function F : S → M that maps the sentences in the corpus S to classes in M = {m1, m2, m3}

Identification of moves, not yet steps

Page 12

Feature Selection

Extraction of word unigrams, bigrams, and trigrams from the annotated corpus

Preprocessing:

All tokens stemmed using the NLTK port of the Porter Stemmer algorithm (Porter, 1980)

All numbers in the texts replaced by the string _number_

For bigrams and trigrams, the tokens inside each n-gram alphabetized

All n-grams with a frequency of less than five excluded
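The preprocessing steps above can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names, the tokenizing regular expression, and the order of the steps are assumptions. The `stem` argument defaults to simple lowercasing so the sketch runs with the standard library alone; passing `nltk.stem.PorterStemmer().stem` instead would bring it closer to the setup described on the slide.

```python
import re
from collections import Counter

NUMBER_TOKEN = "_number_"  # placeholder string named on the slide


def tokenize(sentence, stem=str.lower):
    """Split a sentence into tokens, replace numbers, and stem words.

    The regular expression is an assumption; the slides do not say how
    the texts were tokenized.
    """
    tokens = re.findall(r"\d+(?:[.,]\d+)*|[A-Za-z]+", sentence)
    return [NUMBER_TOKEN if tok[0].isdigit() else stem(tok) for tok in tokens]


def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples.

    Tokens inside bigrams and trigrams are alphabetized, so that
    ("the", "aim") and ("aim", "the") count as the same feature.
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return [tuple(sorted(g)) if n > 1 else g for g in grams]


def frequent_ngrams(sentences, n, min_count=5, stem=str.lower):
    """Count n-grams over all sentences and drop those seen fewer than min_count times."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(tokenize(sentence, stem), n))
    return {gram: c for gram, c in counts.items() if c >= min_count}
```

For example, `frequent_ngrams(sentences, 2)` would return the alphabetized bigrams that occur at least five times in `sentences`.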

Page 13

Feature Selection

Odds ratio

Conditional probabilities are calculated as maximum likelihood estimates

N-grams with maximum odds ratios selected as features
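The formula on this slide did not survive in the transcript. As a hedged illustration, the sketch below uses the odds ratio as it is commonly defined in the text-categorization literature (e.g., Sebastiani, 2002): OR(t, m) = P(t|m) (1 − P(t|not m)) / ((1 − P(t|m)) P(t|not m)), with each conditional probability estimated as the fraction of sentences containing the term. The function names, the data layout, and the small `eps` clamp that avoids division by zero are assumptions, not details from the presentation.

```python
def odds_ratio(term, move, labeled_sentences, eps=1e-6):
    """Odds ratio of `term` for `move`.

    labeled_sentences is a list of (set_of_terms, move_label) pairs and is
    assumed to contain sentences both inside and outside `move`.
    Probabilities are maximum likelihood estimates: the fraction of
    sentences in (or outside) the move that contain the term.
    """
    in_move = [terms for terms, label in labeled_sentences if label == move]
    out_move = [terms for terms, label in labeled_sentences if label != move]
    p_in = sum(term in terms for terms in in_move) / len(in_move)
    p_out = sum(term in terms for terms in out_move) / len(out_move)
    # Clamp to (0, 1) so the ratio stays finite; this smoothing is an assumption.
    p_in = min(max(p_in, eps), 1 - eps)
    p_out = min(max(p_out, eps), 1 - eps)
    return (p_in * (1 - p_out)) / ((1 - p_in) * p_out)


def top_features(vocabulary, move, labeled_sentences, k):
    """Return the k terms with the highest odds ratio for `move`."""
    ranked = sorted(
        vocabulary,
        key=lambda t: odds_ratio(t, move, labeled_sentences),
        reverse=True,
    )
    return ranked[:k]
```

A term that appears in half of the Move 1 sentences but only a quarter of the remaining sentences gets an odds ratio of 3, so it would rank above a term distributed the other way round.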

Page 14

Sentence Representation

Each sentence represented as a vector

Presence or absence of terms in sentences recorded as Boolean values (1 for the presence of the corresponding term, 0 for its absence)
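A minimal sketch of this representation, assuming the selected features are kept in a fixed, ordered list (the function name and the example features are hypothetical):

```python
def to_boolean_vector(sentence_terms, features):
    """Map a sentence to a 0/1 vector with one position per selected feature."""
    present = set(sentence_terms)
    return [1 if feature in present else 0 for feature in features]


# Hypothetical feature list and sentence, for illustration only.
features = [("aim",), ("studi",), ("howev",)]
sentence_terms = [("the",), ("aim",), ("of",), ("thi",), ("studi",)]
vector = to_boolean_vector(sentence_terms, features)  # [1, 1, 0]
```

The vector length equals the number of selected features, so every sentence lands in the same n-dimensional space regardless of its own length.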

Page 15

Classifier

Support Vector Machines (SVM) (Basu et al., 2003; Burges, 1998; Cortes and Vapnik, 1995; Joachims, 1998; Vapnik, 1995)

Five-fold cross validation

Machine learning environment: RAPIDMINER (Mierswa et al., 2006)

RBF kernel, with parameters found by trying a set of different parameter settings on the feature set with 3,000 unigrams

Parameters not necessarily the best; exhaustive searches will be performed on the other feature sets
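To make the five-fold cross validation concrete, the sketch below shows one plain way of splitting sentence indices into five folds, each used once as the test set. It is a generic illustration with assumed names; the slides do not say how RapidMiner sampled the folds (for example, whether the split was stratified by move), and the SVM training step itself is left out.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)      # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold
```

Each sentence appears in exactly one test fold, so averaging the scores of the five runs uses every annotated sentence for evaluation once.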

Page 16

Evaluation

Five-fold cross validation was performed on 14 different feature sets

Page 17

Evaluation

Accuracy - the proportion of classifications that agreed with the manually assigned labels

Page 18

Evaluation

Precision - what proportion of the items assigned to a given category actually belonged to it

Recall - what proportion of the items actually belonging to a category were labeled correctly
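The three measures defined on these slides can be computed from paired lists of manual and system labels, as in the sketch below. The function names and the toy labels are assumptions made for illustration; returning 0.0 when a category is never assigned or never occurs is also a convention chosen here, not something stated in the presentation.

```python
def accuracy(gold, predicted):
    """Proportion of system labels that agree with the manual labels."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall(gold, predicted, category):
    """Precision and recall for one category (here, one move)."""
    true_pos = sum(g == category and p == category for g, p in zip(gold, predicted))
    assigned = sum(p == category for p in predicted)  # items the system put in the category
    actual = sum(g == category for g in gold)         # items that really belong to it
    precision = true_pos / assigned if assigned else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Toy labels: one Move 2 sentence is misclassified as Move 1.
gold = ["m1", "m1", "m2", "m2", "m3", "m3"]
predicted = ["m1", "m1", "m1", "m2", "m3", "m3"]
```

With these toy labels, the single error lowers the precision of Move 1 and the recall of Move 2, while the other per-move scores stay at 1.0.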

Page 19

Evaluation

Trigram models result in the best precision

Unigram models result in the best recall

Page 20

Evaluation

Move 2 is the most difficult to identify, as revealed by error analysis: Move 2 gets misclassified as Move 1

Use the relative position of the sentence in the text to disambiguate the move involved

See what percentage of Move 2 sentences identified as Move 1 by the system have also been labeled Move 1 by the annotator

Extracted features are not discipline-dependent

Page 21

This just in…

Built a model with the top 3,000 unigrams and top 3,000 trigrams

Precision: 91.14%

Recall: 82.98%

Kappa: 87.57

Page 22

Inter-annotator agreement

Second annotation of a sample of files across all 20 disciplines (487 sentences)

k = (P(A) - P(E)) / (1 - P(E))

k - inter-annotator agreement

P(A) - observed probability of agreement

P(E) - expected probability of agreement

Average k = 0.945 over the three moves
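As a worked illustration of the agreement statistic, the sketch below computes kappa from two annotators' labels as (P(A) − P(E)) / (1 − P(E)). It assumes Cohen's way of estimating the expected agreement P(E) from each annotator's own label distribution; the slides do not say which variant was used, and the function name is hypothetical. Since the slide reports an average over the three moves, a per-move value could be obtained by first recoding the labels as "this move" versus "other".

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Kappa for two annotators who labeled the same items."""
    n = len(labels_a)
    # P(A): observed proportion of items on which the annotators agree.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_a - p_e) / (1 - p_e)
```

For instance, if two annotators agree on three of four sentences and chance agreement is 0.5, kappa is (0.75 − 0.5) / (1 − 0.5) = 0.5.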

Page 23

Further work on IADE

Ongoing experiments to improve accuracy: experimenting with different kernel parameters to find optimal models

More annotation

Inter-annotator agreement (3 annotators)

Identification of steps

Development of intelligent feedback

Web interface design

Page 24

Further research with IADE

Evaluation of IADE effectiveness: learning potential, learner fit, meaning focus, authenticity, impact, practicality (Chapelle, 2001)

Process/product research direction - interaction between use and outcome (Warschauer & Ware, 2006)

Target for evaluation - “what is taught through technology” (Chapelle, 2007, p.30)

Page 25

THANK YOU!

Questions?

Suggestions?