MCORES: a system for noun phrase coreference resolution for clinical records
Andreea Bodnari,¹ Peter Szolovits,¹ Ozlem Uzuner²
¹ MIT, CSAIL, Cambridge, MA, USA
² Department of Information Studies, University at Albany SUNY, Albany, NY, USA
2012 SHARPn Summit “Secondary Use”
10.16.2012, Rochester, MN
Outline
Medical coreference resolution system (MCORES)
Experimental results
Conclusion
Why coreference resolution?
Electronic Medical Records (EMRs) are large information repositories
Clinical information requires processing:
▪ Lower level: sentence parsing, tokenization
▪ Higher level: coreference resolution, semantic disambiguation
Coreference resolution is a fundamental step in text processing
Data: i2b2/VA corpus
English medical corpus provided by the i2b2 National Center for Biomedical Computing
De-identified medical discharge summaries
▪ Source: PH & BIDMC
▪ Content: 230 (PH) + 196 (BIDMC) discharge summaries
Annotated concepts and coreference chains
Concept types: Persons, Problems, Treatments, Tests, Pronouns
Coreference resolution algorithm
1. NP Instance Creation
2. Feature Generation
3. Classification
4. Output Clustering
1. NP instance creation
Markables of the same semantic category are paired together
MCORES creates positive instances only from neighboring markable pairs in a chain
¹ Instance creation akin to McCarthy and Lehnert
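The instance-creation rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MCORES code. Markables are hypothetical `(mention_id, category)` tuples in document order, and `chain_of` maps each mention to a gold chain id. Two further rules are also assumptions for illustration: same-category pairs from different chains become negative instances, and non-neighboring pairs from the same chain are skipped.

```python
from collections import defaultdict


def create_instances(markables, chain_of):
    """Build labeled (antecedent, anaphor, label) training pairs.

    markables: list of (mention_id, category) tuples in document order.
    chain_of: dict mapping mention_id -> gold chain id (singletons are absent).
    """
    # Group each chain's mentions in document order.
    chains = defaultdict(list)
    for mention_id, _category in markables:
        if mention_id in chain_of:
            chains[chain_of[mention_id]].append(mention_id)

    # Positive instances come only from neighboring mentions within a chain.
    neighbors = {
        (members[k], members[k + 1])
        for members in chains.values()
        for k in range(len(members) - 1)
    }

    instances = []
    for i in range(len(markables)):
        for j in range(i + 1, len(markables)):
            (a, cat_a), (b, cat_b) = markables[i], markables[j]
            if cat_a != cat_b:
                continue  # only markables of the same semantic category are paired
            same_chain = a in chain_of and chain_of.get(a) == chain_of.get(b)
            if (a, b) in neighbors:
                instances.append((a, b, 1))
            elif not same_chain:
                instances.append((a, b, 0))
            # Assumption: non-neighboring pairs from the same chain are skipped.
    return instances
```

Consider a problem chain m1, m3, m4 plus a singleton problem m5. Only (m1, m3) and (m3, m4) become positive instances. The test-category markable m2 is never paired with the problems.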
1. NP instance creation
Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.
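Table 3 groups instances by whether the two markables share exact, partial, or no textual overlap. The slide does not define these buckets, so the sketch below makes its own assumptions: lowercased whitespace tokenization, and "exact" meaning identical token sequences. With those in place, the bucketing and the per-category tally could look like this:

```python
from collections import Counter


def overlap_type(antecedent: str, anaphor: str) -> str:
    """Bucket a markable pair as 'exact', 'partial', or 'none' textual overlap."""
    a_tokens = antecedent.lower().split()
    b_tokens = anaphor.lower().split()
    if a_tokens == b_tokens:
        return "exact"
    if set(a_tokens) & set(b_tokens):
        return "partial"
    return "none"


def overlap_distribution(instances):
    """Count instances per (category, overlap type, label).

    instances: iterable of (category, antecedent_text, anaphor_text, label),
    where label is 1 for coreferent and 0 for non-coreferent.
    """
    return Counter(
        (category, overlap_type(ante, ana), label)
        for category, ante, ana, label in instances
    )
```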
2. Feature Generation
Multi-perspective features:
▪ Antecedent perspective
▪ Anaphor perspective
▪ Greedy perspective
▪ Stingy perspective
Feature types:
▪ Phrase-level lexical
▪ Sentence-level lexical
▪ Syntactic
▪ Semantic
▪ Miscellaneous
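One way to read the four perspectives is as four normalizations of the same raw score. The sketch below applies them to token overlap. The denominators are our assumption for illustration, not a definition taken from MCORES:
- antecedent perspective: the antecedent's length
- anaphor perspective: the anaphor's length
- greedy perspective: the shorter mention
- stingy perspective: the longer mention

```python
def token_overlap_perspectives(antecedent: str, anaphor: str) -> dict:
    """Shared-token count between two markables, normalized four ways.

    The choice of denominator per perspective is a hypothetical reading.
    """
    a = set(antecedent.lower().split())
    b = set(anaphor.lower().split())
    if not a or not b:
        return {"antecedent": 0.0, "anaphor": 0.0, "greedy": 0.0, "stingy": 0.0}
    shared = len(a & b)
    return {
        "antecedent": shared / len(a),             # relative to the antecedent
        "anaphor": shared / len(b),                # relative to the anaphor
        "greedy": shared / min(len(a), len(b)),    # shorter mention: higher score
        "stingy": shared / max(len(a), len(b)),    # longer mention: lower score
    }
```

For "severe chest pain" and "the pain", this reading yields a greedy score of 0.5 and a stingy score of 1/3. A classifier would thus see both a lenient and a strict view of the same overlap.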
2. Feature Generation (lexical)
Phrase-level lexical
▪ Token overlap*
▪ Normalized token overlap
▪ Edit distance
▪ Normalized edit distance
Sentence-level lexical
▪ Sentence-level token overlap*
▪ Filtered sentence-level token overlap*
▪ Left and right mention overlap
(stingy and greedy perspectives only)
* multi-perspective feature
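The edit-distance features can be illustrated with a standard character-level Levenshtein distance. Two choices in this sketch are assumptions. First, normalization divides by the longer string. Second, the "filtered" sentence overlap is treated as stopword removal with a small hypothetical `STOPWORDS` list.

```python
STOPWORDS = {"the", "a", "an", "of", "and", "was", "is", "with"}  # hypothetical filter list


def edit_distance(s: str, t: str) -> int:
    """Character-level Levenshtein distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_edit_distance(s: str, t: str) -> float:
    """Edit distance divided by the longer string's length (assumed normalizer)."""
    longest = max(len(s), len(t))
    return edit_distance(s, t) / longest if longest else 0.0


def sentence_token_overlap(sent_a: str, sent_b: str, filtered: bool = False) -> int:
    """Number of shared lowercased tokens between the markables' sentences."""
    a = set(sent_a.lower().split())
    b = set(sent_b.lower().split())
    if filtered:
        a -= STOPWORDS
        b -= STOPWORDS
    return len(a & b)
```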
2. Feature Generation (syntactic & semantic)
Syntactic
▪ Number agreement
▪ Noun overlap*
▪ Surname match
Semantic
▪ UMLS CUI overlap*
▪ UMLS CUI token overlap*
▪ UMLS semantic type overlap*
▪ Anaphor UMLS semantic type
* multi-perspective feature
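Several syntactic and semantic features reduce to simple comparisons once the right annotations exist. This sketch rests on three assumptions:
- The plural test is a crude hypothetical suffix heuristic. A real system would more likely rely on part-of-speech tags.
- The surname is the last token.
- The CUI or semantic-type sets come from some UMLS concept mapper.

The placeholder CUI strings in the test are not real UMLS identifiers.

```python
def surname_match(person_a: str, person_b: str) -> bool:
    """Assumption: the surname is the last whitespace-delimited token."""
    return person_a.split()[-1].lower() == person_b.split()[-1].lower()


def is_plural(head: str) -> bool:
    """Hypothetical heuristic: a trailing 's' (but not 'ss') marks a plural head."""
    word = head.lower()
    return word.endswith("s") and not word.endswith("ss")


def number_agreement(head_a: str, head_b: str) -> bool:
    """True when both head nouns are singular or both are plural."""
    return is_plural(head_a) == is_plural(head_b)


def concept_overlap(concepts_a: set, concepts_b: set) -> int:
    """Number of shared UMLS CUIs (or semantic types) between two markables."""
    return len(concepts_a & concepts_b)
```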
2. Feature Generation (miscellaneous)
▪ Token distance
▪ Mention distance
▪ All-mention distance
▪ Sentence distance
▪ Section match
▪ Section distance
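The miscellaneous features are positional. This sketch assumes each markable carries the hypothetical fields of `Mention` below. It reads "mention distance" as counting only markables of the same semantic category and "all-mention distance" as counting every markable; that reading is our interpretation.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical positional record for one markable."""
    token_start: int     # index of the markable's first token in the document
    sentence: int        # index of the sentence containing the markable
    mention_index: int   # position among all markables
    category_index: int  # position among markables of the same semantic category
    section: str         # section header the markable appears under
    section_index: int   # position of that section in the document


def distance_features(ante: Mention, ana: Mention) -> dict:
    """Distance features for an (antecedent, anaphor) pair; ante precedes ana."""
    return {
        "token_distance": ana.token_start - ante.token_start,
        "mention_distance": ana.category_index - ante.category_index,
        "all_mention_distance": ana.mention_index - ante.mention_index,
        "sentence_distance": ana.sentence - ante.sentence,
        "section_match": ante.section == ana.section,
        "section_distance": ana.section_index - ante.section_index,
    }
```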
3. Classification
C4.5 decision tree algorithm:
▪ Flexible
▪ Readable prediction model
Classifies pairs of markables based on the values of their feature vectors
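C4.5 grows its tree by choosing, at each node, the split with the highest gain ratio (information gain divided by split information). The sketch below computes that criterion for a binary threshold on one numeric feature, such as token overlap. It is a simplified illustration of the split criterion only and omits C4.5's recursive tree growth, pruning, and missing-value handling. The toy data in the test are invented.

```python
from collections import Counter
from math import log2


def entropy(labels) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold) -> float:
    """Gain ratio of splitting a numeric feature at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    split_info = entropy([x <= threshold for x in values])
    return gain / split_info


def best_threshold(values, labels) -> float:
    """Try midpoints between adjacent distinct values; keep the best gain ratio."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))
```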
4. Output Clustering
The classifier makes pairwise predictions only
Pairwise predictions are clustered into coreference chains with the aggressive-merge¹ clustering algorithm:
▪ each positive prediction [M1] - [M2] is merged with all preceding pairwise predictions linked to [M1] or [M2]
¹ Aggressive-merge algorithm proposed by McCarthy and Lehnert
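Aggressive merging links every positive pair with everything already linked to either markable. The resulting chains are therefore the connected components of the positive predictions. A union-find sketch of that behavior follows; the mention ids and input format are hypothetical.

```python
def aggressive_merge(mentions, positive_pairs):
    """Cluster markables into chains from positive pairwise predictions.

    mentions: list of mention ids in document order.
    positive_pairs: iterable of (m1, m2) pairs the classifier labeled coreferent.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for m1, m2 in positive_pairs:
        r1, r2 = find(m1), find(m2)
        if r1 != r2:
            parent[r2] = r1  # merge the two clusters

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())
```

In the test, predictions m1-m2, m3-m4, and m2-m4 collapse into a single chain m1, m2, m3, m4, and m5 stays a singleton.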
Evaluation
Feature set evaluation
Perspectives evaluation
Performance evaluation against:
▪ In-house baseline
▪ Third-party systems (RECONCILE ACL09 & BART)
Evaluation metric: unweighted averages of recall, precision, and F-measure over MUC, B³, CEAF, and BLANC
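To make the evaluation concrete, here is a sketch of two of the four metrics, MUC and B³, together with the unweighted averaging. CEAF and BLANC are omitted for brevity. The B³ function treats a mention missing from the response as a singleton. Both scorers expect chains as lists of hypothetical mention ids.

```python
def muc_recall(key, response):
    """MUC recall: how many key links survive the response partition."""
    cluster_of = {m: i for i, chain in enumerate(response) for m in chain}
    num = den = 0
    for chain in key:
        # Partition the key chain by response cluster; unmatched mentions are singletons.
        parts = {cluster_of.get(m, ("singleton", m)) for m in chain}
        num += len(chain) - len(parts)
        den += len(chain) - 1
    return num / den if den else 0.0


def f_measure(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0


def muc(key, response):
    """Return (recall, precision, F); precision swaps the roles of key and response."""
    r = muc_recall(key, response)
    p = muc_recall(response, key)
    return r, p, f_measure(p, r)


def b_cubed(key, response):
    """Return (recall, precision, F), averaged over every mention in the key."""
    key_of = {m: set(c) for c in key for m in c}
    resp_of = {m: set(c) for c in response for m in c}
    mentions = list(key_of)
    p = sum(len(key_of[m] & resp_of.get(m, {m})) / len(resp_of.get(m, {m}))
            for m in mentions) / len(mentions)
    r = sum(len(key_of[m] & resp_of.get(m, {m})) / len(key_of[m])
            for m in mentions) / len(mentions)
    return r, p, f_measure(p, r)


def unweighted_average(scores):
    """scores: list of (recall, precision, F) tuples, one per metric."""
    return tuple(sum(values) / len(values) for values in zip(*scores))
```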
Discussion
MCORES’ advantage comes from linking markables with no token overlap
The phrase-level sub-MCORES system performs similarly to the full MCORES
The greedy-perspective system performs best among the single-perspective systems
The multi-perspective system performs as well as or better than the single-perspective systems
Error analysis:
▪ Persons: MCORES fails to classify misspelled person pairs
▪ Medical problems: false positives due to the difference between new and recurring events
▪ Treatments: false positives due to medications with different routes of administration
▪ Tests: false positives due to the large number of fully overlapping instances that did not corefer
Conclusion
Developed a coreference resolution system for the medical domain (MCORES)
MCORES innovates through a multi-perspective, knowledge-based feature set
MCORES outperforms third-party systems and an in-house baseline, improving coreference resolution on clinical records