Download - Andreea Bodnari, 1 Peter Szolovits, 1 Ozlem Uzuner 2 1 MIT, CSAIL, Cambridge, MA, USA

Andreea Bodnari,1 Peter Szolovits,1 Ozlem Uzuner2

1 MIT, CSAIL, Cambridge, MA, USA
2 Department of Information Studies, University at Albany SUNY, Albany, NY, USA

10.16.2012 - Rochester, MN

MCORES: a system for noun phrase coreference resolution for clinical records

2012 SHARPn Summit “Secondary Use”

Outline

Medical coreference resolution system (MCORES)

Experimental results

Conclusion

Electronic Medical Records (EMRs) – large information repositories

Clinical information requires processing
Lower level: sentence parsing, tokenization
Higher level: coreference resolution, semantic disambiguation

Coreference resolution: a fundamental step in text processing

Why coreference resolution?

English medical corpus provided by i2b2 National Center for Biomedical Computing
De-identified medical discharge summaries
▪ Source: PH & BIDMC
▪ Content: 230 (PH) + 196 (BIDMC) discharge summaries

Annotated concepts and coreference chains

Concept types

Data: i2b2/VA corpus

Persons
Problems
Treatments
Tests
Pronouns

NP Instance Creation

Feature Generation

Classification

Output Clustering

Coreference resolution algorithm

Markables of same semantic category are paired together

MCORES creates positive instances only from neighboring markable pairs in a chain

1 Instance creation akin to McCarthy and Lehnert

1. NP instance creation
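The pairing rules on this slide can be sketched as follows. This is a minimal illustration, not the MCORES implementation: the `Markable` class, its field names, and `create_instances` are hypothetical. The slide only states how positive instances are formed, so the handling of negatives (every same-category pair from different chains) and the skipping of non-neighboring pairs within a chain are assumptions.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Markable:
    """Hypothetical container for one annotated noun phrase."""
    mention_id: int            # position of the markable in document order
    text: str
    category: str              # e.g. "person", "problem", "treatment", "test"
    chain_id: Optional[int]    # gold coreference chain, None for singletons


def create_instances(markables):
    """Return (antecedent, anaphor, label) triples for same-category pairs."""
    ordered = sorted(markables, key=lambda m: m.mention_id)

    # For every chained markable, record the mention that directly precedes
    # it in the same chain (None for the first mention of a chain).
    previous_in_chain = {}
    last_seen = {}
    for m in ordered:
        if m.chain_id is not None:
            previous_in_chain[m.mention_id] = last_seen.get(m.chain_id)
            last_seen[m.chain_id] = m.mention_id

    instances = []
    for antecedent, anaphor in combinations(ordered, 2):
        # Only markables of the same semantic category are paired.
        if antecedent.category != anaphor.category:
            continue
        same_chain = (antecedent.chain_id is not None
                      and antecedent.chain_id == anaphor.chain_id)
        if same_chain:
            # Positive instances come only from neighboring pairs in a chain.
            if previous_in_chain[anaphor.mention_id] == antecedent.mention_id:
                instances.append((antecedent, anaphor, True))
            # Assumption: non-neighboring pairs of one chain are skipped.
        else:
            # Assumption: pairs from different chains are negative instances.
            instances.append((antecedent, anaphor, False))
    return instances
```

For a chain with three mentions, this produces two positive instances (first-second and second-third) and no instance for the first-third pair.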

Table 3: Distribution of coreferent and non-coreferent instances per semantic category over instances containing exact, partial, and no textual overlap.

1. NP instance creation

Multi-perspective features
Antecedent perspective
Anaphor perspective
Greedy perspective
Stingy perspective

Phrase-level lexical
Sentence-level lexical
Syntactic
Semantic
Miscellaneous

2. Feature Generation

Phrase-level lexical

Token overlap*
Normalized token overlap
Edit-distance
Normalized edit-distance

Sentence-level lexical

Sentence-level token overlap*
Filtered sentence-level token overlap*
Left and right mention overlap

stingy and greedy perspectives only

2. Feature Generation (lexical)

* multi-perspective feature
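A rough sketch of two of the phrase-level lexical features, token overlap and edit distance. The slides do not define the four perspectives, so the reading used here is an assumption: the antecedent and anaphor perspectives normalize the shared token count by the respective phrase length, the greedy perspective takes the larger of the two values, and the stingy perspective takes the smaller. All function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_distance(a, b):
    """Edit distance divided by the longer string's length (assumed scheme)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def token_overlap_perspectives(antecedent, anaphor):
    """Token overlap seen from four perspectives (assumed definitions)."""
    antecedent_tokens = set(antecedent.lower().split())
    anaphor_tokens = set(anaphor.lower().split())
    shared = len(antecedent_tokens & anaphor_tokens)
    from_antecedent = shared / len(antecedent_tokens) if antecedent_tokens else 0.0
    from_anaphor = shared / len(anaphor_tokens) if anaphor_tokens else 0.0
    return {
        "antecedent": from_antecedent,
        "anaphor": from_anaphor,
        "greedy": max(from_antecedent, from_anaphor),  # most favorable view
        "stingy": min(from_antecedent, from_anaphor),  # least favorable view
    }
```

For the pair "chest pain" and "severe chest pain", the antecedent perspective sees full overlap while the anaphor perspective sees two of three tokens shared, which is the kind of asymmetry a multi-perspective feature exposes to the classifier.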

Syntactic

Number agreement
Noun overlap*
Surname match

Semantic

UMLS CUI overlap*
UMLS CUI token overlap*
UMLS semantic type overlap*
Anaphor UMLS semantic type

2. Feature Generation (syntactic & semantic)

* multi-perspective feature

Token distance
Mention distance
All-mention distance
Sentence distance
Section match
Section distance

2. Feature Generation (miscellaneous)

C4.5 decision tree algorithm
Flexible
Readable prediction model

Classify pairs of markables based on values of the feature vectors

3. Classification
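C4.5 chooses splits by gain ratio, that is, information gain divided by the split information. The sketch below scores a single threshold split on one numeric feature. It illustrates the criterion only and is not a full C4.5 implementation; the function names and the toy feature values in the usage note are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting `values` at `threshold` (value <= threshold)."""
    left = [label for value, label in zip(values, labels) if value <= threshold]
    right = [label for value, label in zip(values, labels) if value > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    gain = entropy(labels) - (weights[0] * entropy(left)
                              + weights[1] * entropy(right))
    split_info = -sum(weight * math.log2(weight) for weight in weights)
    return gain / split_info
```

With token-overlap values `[0.0, 0.1, 0.9, 1.0]` and labels `[False, False, True, True]`, a threshold of 0.5 separates the classes perfectly, so the gain ratio reaches its maximum for a binary split.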

Classifier makes pairwise predictions only
Pairwise predictions clustered into coreference chains
Aggressive-merge1 clustering algorithm

prediction [M1] - [M2]

all preceding pairwise predictions linked to [M1] or [M2]

1 Aggressive-merge algorithm proposed by McCarthy and Lehnert

4. Output Clustering
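The clustering step can be read as taking the transitive closure of the positive pairwise predictions: a predicted link [M1] - [M2] joins everything already linked to either markable. The sketch below implements that reading with a union-find structure. The function name and the integer mention ids are hypothetical, and this is one way to realize aggressive merging, not necessarily how MCORES does it.

```python
def aggressive_merge(mention_ids, positive_pairs):
    """Group mentions into chains from pairwise coreference predictions."""
    parent = {mention: mention for mention in mention_ids}

    def find(mention):
        # Follow parent links to the chain representative, halving the path.
        while parent[mention] != mention:
            parent[mention] = parent[parent[mention]]
            mention = parent[mention]
        return mention

    for first, second in positive_pairs:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            # Merge the two chains, including all previously linked mentions.
            parent[root_second] = root_first

    chains = {}
    for mention in mention_ids:
        chains.setdefault(find(mention), []).append(mention)
    # Singletons are not coreference chains, so only groups of 2+ are kept.
    return [chain for chain in chains.values() if len(chain) > 1]
```

Given predictions (1, 2) and (2, 4) over mentions 1 to 5, mentions 1, 2, and 4 end up in one chain even though the pair (1, 4) was never predicted directly.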

Feature set evaluation
Perspectives evaluation
Performance evaluation against
▪ In-house baseline
▪ Third-party systems (RECONCILE ACL09 & BART)

Evaluation metric: unweighted averages of Recall, Precision, and F-measures of
▪ MUC
▪ B3
▪ CEAF
▪ BLANC

Evaluation
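To make the metric concrete, here is a small sketch of one of the four scorers, B3, together with the unweighted averaging of (precision, recall, F-measure) triples. It assumes each chain is a list of mention ids and treats mentions missing from either side as singletons. The function names are hypothetical, and MUC, CEAF, and BLANC are not implemented here.

```python
def b_cubed(key_chains, response_chains):
    """Return (precision, recall, f_measure) under the B3 metric."""
    key_of = {m: frozenset(chain) for chain in key_chains for m in chain}
    response_of = {m: frozenset(chain) for chain in response_chains for m in chain}
    mentions = set(key_of) | set(response_of)
    if not mentions:
        return 0.0, 0.0, 0.0

    precision = recall = 0.0
    for mention in mentions:
        # Assumption: a mention absent from one side is a singleton there.
        key = key_of.get(mention, frozenset([mention]))
        response = response_of.get(mention, frozenset([mention]))
        shared = len(key & response)
        precision += shared / len(response)
        recall += shared / len(key)
    precision /= len(mentions)
    recall /= len(mentions)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def unweighted_average(scores):
    """Average (precision, recall, f_measure) triples with equal weight."""
    count = len(scores)
    return tuple(sum(score[i] for score in scores) / count for i in range(3))
```

A system that splits a three-mention gold chain into a pair and a singleton keeps perfect B3 precision but loses recall, since each mention recovers only part of its gold chain.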

MCORES’ advantage comes from linking markables with no token overlap

Phrase-level sub-MCORES performs similarly to MCORES

Greedy perspective system is the most favorable single-perspective system

Multi-perspective system performs as well as or better than single-perspective systems

Error analysis

Persons: MCORES fails to classify misspelled person pairs

Medical problems: false positives due to the difference between new and recurring events

Treatments: false positives due to medications with different routes of administration

Tests: false positives due to the large number of full-overlap instances that did not corefer

Discussion

Developed coreference resolution system for the medical domain (MCORES)

MCORES innovates through a multi-perspective and knowledge-based feature set

MCORES outperforms third party systems and an in-house baseline, improving coreference resolution on clinical records

Conclusion
