Automatic Identification of Discourse Moves in Scientific Article Introductions
NICK PENDAR AND ELENA COTOS, IOWA STATE UNIVERSITY
THE 3RD WORKSHOP ON INNOVATIVE USE OF NLP FOR BUILDING EDUCATIONAL APPLICATIONS
JUNE 19, 2008
Outline
Background and motivation
Discourse move identification
  Data and annotation scheme
  Feature selection
  Sentence representation
  Classifier
  Evaluation
  Inter-annotator agreement
Further work
Automated evaluation: Background
Automated essay scoring (AES) in performance-based and high-stakes standardized tests (e.g., ACT, GMAT, TOEFL)
Automated error detection in L2 output (Burstein and Chodorow, 1999; Chodorow et al., 2007; Han et al., 2006; Leacock and Chodorow, 2003)
Assessment of various constructs, e.g., topical content, grammar, style, mechanics, syntactic complexity, and deviance or plagiarism (Burstein, 2003; Elliott, 2003; Landauer et al., 2003; Mitchell et al., 2002; Page, 2003; Rudner and Liang, 2002)
Text organization limited to recognizing the five-paragraph essay format, thesis, and topic sentences
AntMover (Anthony and Lashkia, 2003)
Automated evaluation: CALI Motivation
Wide range of possibilities for high quality evaluation and feedback (Criterion; Burstein, Chodorow, & Leacock, 2004)
Potential in formative assessment, but the effects of intelligent formative feedback have not been fully investigated
Warschauer and Ware (2006) call for the development of a classroom research agenda that would help evaluate and guide the application of AES in writing pedagogy
“the potential of automated essay evaluation for improving student writing is an empirical question, and virtually no peer-reviewed research has yet been published” (Hyland and Hyland, 2006, p. 109)
Automated evaluation: EAP Motivation
EAP pedagogical approaches (Cortes, 2006; Levis & Levis-Muller, 2003; Vann & Myers, 2001) fail to provide NNSs with sufficient academic writing practice and remediational guidance
Problem of disciplinarity
An NLP-based academic discourse evaluation software application could account for this drawback
Such an application has not yet been developed
Automated evaluation: Research Motivation
Long-term research goals:
  design and implementation of IADE (Intelligent Academic Discourse Evaluator)
  analysis of IADE effectiveness for formative assessment purposes
Evaluates students’ research article introductions in terms of moves/steps (Swales 1990, 2004)
Draws from SLA models: interactionist views (Carroll, 1999; Gass, 1997; Long, 1996; Long & Robinson, 1998; Mackey, Gass, & McDonough, 2000; Swain, 1993) and Systemic Functional Linguistics (Martin, 1992; Halliday, 1985)
Skill Acquisition Theory of learning (DeKeyser, 2007)
Is informed by empirical research on the provision of feedback
Is informed by Evidence Centered Design principles (Mislevy et al., 2006)
Discourse Move Identification
Approached as a classification problem (similar to Burstein et al., 2003)
Given a sentence and a finite set of moves and steps, which move/step does the sentence signify?
ISUAW corpus: 1,623 articles; 1,322,089 words; average length of articles 814.09 words
Stratified sampling of 401 introduction sections representative of 20 academic disciplines
Sub-corpus: 267,029 words; average length 665.91 words; 11,149 sentences
Manual annotation
Discourse Move Identification
Annotation scheme (Swales, 1990; Swales, 2004)
Discourse Move Identification
Multiple layers of annotation for cases when the same sentence signified more than one move or more than one step
Feature Selection
Features that reliably indicate a move/step
Text-categorization approach (see Sebastiani, 2002)
Each sentence treated as a data item to be classified and represented as an n-dimensional vector in the Euclidean space
The task of the learning algorithm is to find a function F : S → M that would map the sentences in the corpus S to classes in M = {m1,m2,m3}
Identification of moves, not yet steps
Feature Selection
Extraction of word unigrams, bigrams, and trigrams from the annotated corpus
Preprocessing:
All tokens stemmed using the NLTK port of the Porter Stemmer algorithm (Porter, 1980)
All numbers in the texts replaced by the string _number_
The tokens inside each n-gram alphabetized in case of bigrams and trigrams
All n-grams with a frequency of less than five excluded
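The preprocessing steps above can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names, the number pattern, and the choice to count raw occurrences (rather than sentences containing the n-gram) are assumptions. The default stemmer here is a lowercasing placeholder so the sketch runs with the standard library alone; a real Porter stemmer (for example the one shipped with NLTK) could be passed in through the `stem` argument.

```python
import re
from collections import Counter

# Assumed pattern for "a number": digits with optional , or . separators.
NUMBER_RE = re.compile(r"^\d+(?:[.,]\d+)*$")


def placeholder_stem(token):
    """Stand-in for a Porter stemmer (assumption): only lowercases."""
    return token.lower()


def normalize(tokens, stem=placeholder_stem):
    """Stem every token and replace numbers with the string _number_."""
    return ["_number_" if NUMBER_RE.match(t) else stem(t) for t in tokens]


def ngrams(tokens, n):
    """Yield n-grams with their tokens alphabetized (word order is dropped)."""
    for i in range(len(tokens) - n + 1):
        yield tuple(sorted(tokens[i:i + n]))


def extract_ngrams(sentences, max_n=3, min_freq=5, stem=placeholder_stem):
    """Count unigrams..max_n-grams and drop those seen fewer than min_freq times."""
    counts = Counter()
    for tokens in sentences:
        norm = normalize(tokens, stem)
        for n in range(1, max_n + 1):
            counts.update(ngrams(norm, n))
    return {gram: c for gram, c in counts.items() if c >= min_freq}
```

Alphabetizing the tokens means that "in 2008" and "2008 in" collapse into the same bigram feature, which reduces sparsity at the cost of word order.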
Feature Selection
Odds ratio
Conditional probabilities are calculated as maximum likelihood estimates
N-grams with maximum odds ratios selected as features
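As a rough illustration of this selection step: a common form of the odds ratio in the text-categorization literature is OR(t, c) = P(t|c)(1 - P(t|not c)) / ((1 - P(t|c)) P(t|not c)). The sketch below assumes that form, estimates each conditional probability as the proportion of sentences containing the term, and clamps the estimates away from 0 and 1 to avoid division by zero. The clamping, the function names, and the data layout are assumptions, not details taken from the slides.

```python
from collections import Counter


def clamp(p, eps=1e-6):
    """Keep a probability strictly inside (0, 1); eps is an assumed constant."""
    return min(max(p, eps), 1.0 - eps)


def odds_ratio(p_in, p_out):
    """Odds ratio of a term given P(t|c) = p_in and P(t|not c) = p_out."""
    p_in, p_out = clamp(p_in), clamp(p_out)
    return (p_in * (1.0 - p_out)) / ((1.0 - p_in) * p_out)


def select_features(labeled, target, k):
    """Return the k terms with the highest odds ratio for the move `target`.

    labeled: list of (set_of_terms, move_label) pairs, one per sentence.
    Assumes at least one sentence inside and one outside the target move.
    """
    n_in = sum(1 for _, move in labeled if move == target)
    n_out = len(labeled) - n_in
    in_counts, out_counts = Counter(), Counter()
    for terms, move in labeled:
        (in_counts if move == target else out_counts).update(terms)
    vocab = set(in_counts) | set(out_counts)
    # Maximum likelihood estimates: fraction of sentences containing the term.
    scores = {
        t: odds_ratio(in_counts[t] / n_in, out_counts[t] / n_out)
        for t in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A term that is frequent in one move and rare elsewhere gets a large odds ratio, so the top-ranked n-grams are the ones most specific to that move.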
Sentence Representation
Each sentence represented as a vector
Presence or absence of terms in sentences recorded as Boolean values (0 for the absence of the corresponding term, 1 for its presence)
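A minimal sketch of this representation, assuming the selected features are kept in a fixed-order list (the function name and the tuple encoding of n-grams are illustrative choices):

```python
def to_boolean_vector(sentence_terms, feature_list):
    """Map a sentence's n-grams to a 0/1 vector over a fixed feature list."""
    present = set(sentence_terms)
    return [1 if feature in present else 0 for feature in feature_list]
```

For example, with the feature list `[("previous",), ("howev",), ("_number_", "in")]`, a sentence containing only the unigram `("howev",)` maps to `[0, 1, 0]`.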
Classifier
Support Vector Machines (SVM) (Basu et al., 2003; Burges, 1998; Cortes and Vapnik, 1995; Joachims, 1998; Vapnik, 1995)
Five-fold cross validation
Machine learning environment: RAPIDMINER (Mierswa et al., 2006)
RBF kernel, with parameters chosen by trying a set of different parameter settings on the feature set with 3,000 unigrams
Parameters not necessarily the best; exhaustive searches will be performed on the other feature sets
Evaluation
Five-fold cross validation was performed on 14 different feature sets
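For readers unfamiliar with the procedure, k-fold cross validation can be sketched as below. This is a plain shuffled split written with the standard library; whether the original experiments used stratified or plain folds is not stated on the slides, and the function name and seed are assumptions.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each sentence is used for testing exactly once and for training in the other k - 1 rounds; the reported scores are then averaged over the folds.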
Evaluation
Accuracy - the proportion of classifications that agreed with the manually assigned labels
Evaluation
Precision - what proportion of the items assigned to a given category actually belonged to it
Recall - what proportion of the items actually belonging to a category were labeled correctly
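The three measures defined above can be computed from paired gold and predicted labels, as in this small sketch (the function name and the convention of returning 0.0 for an empty denominator are assumptions):

```python
def evaluate(gold, predicted, labels):
    """Return overall accuracy and a {label: (precision, recall)} dict."""
    pairs = list(zip(gold, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        true_pos = sum(g == label and p == label for g, p in pairs)
        assigned = sum(p == label for _, p in pairs)  # items labeled `label` by the system
        actual = sum(g == label for g, _ in pairs)    # items that really are `label`
        precision = true_pos / assigned if assigned else 0.0
        recall = true_pos / actual if actual else 0.0
        per_class[label] = (precision, recall)
    return accuracy, per_class
```

Precision divides by what the classifier assigned to a move, recall by what the annotator assigned, which is why a model can score high on one and lower on the other.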
Evaluation
Trigram models result in the best precision
Unigram models result in the best recall
Evaluation
Move 2 is most difficult to identify, as revealed by error analysis: Move 2 gets misclassified as Move 1
Use the relative position of the sentence in the text to disambiguate the move involved
See what percentage of Move 2 sentences identified as Move 1 by the system have also been labeled Move 1 by the annotator
Extracted features are not discipline-dependent
This just in…
Built a model with top 3000 unigrams and top 3000 trigrams
Precision: 91.14%
Recall: 82.98%
Kappa: 87.57
Inter-annotator agreement
Second annotations on a sample of files across all 20 disciplines = 487 sentences
k - inter-annotator agreement
P(A) - observed probability of agreement
P(E) - expected probability of agreement
Average k = 0.945 over the three moves
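With P(A) and P(E) defined as above, the usual kappa statistic is k = (P(A) - P(E)) / (1 - P(E)). The sketch below assumes Cohen's variant, where P(E) is computed from each annotator's own label distribution, and obtains a per-move value by treating each move as a yes/no decision before averaging. Both choices, and the function names, are assumptions; the slides do not say which variant was used.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Kappa for two annotators; undefined (division by zero) if P(E) is 1."""
    n = len(labels_a)
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement: chance that both pick the same category independently.
    p_e = sum(
        (freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b)
    )
    return (p_a - p_e) / (1 - p_e)


def average_kappa(labels_a, labels_b, moves):
    """Average of per-move kappas, each move treated as a binary decision."""
    kappas = [
        cohen_kappa([a == m for a in labels_a], [b == m for b in labels_b])
        for m in moves
    ]
    return sum(kappas) / len(kappas)
```

Kappa corrects raw agreement for the agreement expected by chance, so a value near 1 indicates that the annotators agree far more often than their label frequencies alone would predict.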
Further work on IADE
Ongoing experiments to improve accuracy: trying different kernel parameters to find optimal models
More annotation
Inter-annotator agreement (3 annotators)
Identification of steps
Development of intelligent feedback
Web interface design
Further research with IADE
Evaluation of IADE effectiveness: learning potential, learner fit, meaning focus, authenticity, impact, practicality (Chapelle, 2001)
Process/product research direction - interaction between use and outcome (Warschauer & Ware, 2006)
Target for evaluation - “what is taught through technology” (Chapelle, 2007, p. 30)
THANK YOU!
Questions? Suggestions?