automatic prediction of evidence-based recommendations via sentence-level polarity classification
TRANSCRIPT
Automatic Prediction of Evidence-based Recommendations via Sentence-level Polarity Classification
Abeed Sarker (1,2), Diego Molla (1,2), Cecile Paris (1,2)
(1) Macquarie University, (2) CSIRO ICT Centre
Sydney, Australia
IJCNLP 2013, Nagoya, Japan
Sentence Polarity for Evidence Based Medicine Feasibility Study Automatic Polarity Classification Results
Contents
Sentence Polarity for Evidence Based Medicine
Feasibility Study
Automatic Polarity Classification
Results
EBM Sentence Polarity Sarker, Molla, Paris 2/24
Evidence Based Medicine
http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/
The Ultimate Goal
Sentence Polarity for EBM
The Task
- Given a context (intervention), determine the polarity of a sentence returned by an automatic summariser.
[Pipeline diagram: a question Q goes through IR to retrieve doc1, doc2, doc3; summarisers extract candidate sentences (s11, s21, s31, s12, s22, s32); polarity detectors assign + or − to each sentence; a multi-summariser combines them into per-intervention recommendations, e.g. drug1: +, drug2: +, drug3: −]
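The pipeline in the diagram can be sketched end to end. The classifier below is a hypothetical keyword-lookup stub standing in for the trained SVM described later in the talk, and the cue phrases and drug names are invented for illustration:

```python
from collections import defaultdict

def classify_polarity(sentence, context):
    """Hypothetical stub standing in for the trained polarity
    classifier: returns '+' or '-' for the sentence."""
    negative_cues = ("not recommended", "no evidence", "ineffective")
    text = sentence.lower()
    return "-" if any(cue in text for cue in negative_cues) else "+"

def aggregate(sentences, contexts):
    """For each context intervention, collect the polarity of every
    summary sentence that mentions it."""
    polarities = defaultdict(list)
    for s in sentences:
        for drug in contexts:
            if drug.lower() in s.lower():
                polarities[drug].append(classify_polarity(s, drug))
    return dict(polarities)

print(aggregate(
    ["Drug1 is effective.", "There is no evidence that drug2 helps."],
    ["drug1", "drug2"]))
# {'drug1': ['+'], 'drug2': ['-']}
```

The multi-summariser step would then reduce each list of sentence-level polarities to a single per-drug recommendation.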
Sentence Polarity in Context
Different contexts may determine different polarities
Sentence fragment
The present study demonstrated that the combination of cimetidine with levamisole is more effective than cimetidine alone and is a highly effective therapy ...
Polarities in Context
- cimetidine with levamisole: recommended.
- cimetidine alone: not recommended.
Related Work
Related tasks
- Sentiment analysis
- Semantic orientation
- Opinion mining
- Subjectivity
Typical approaches use statistical classifiers (e.g. SVM) trained on bag-of-words features.
Closely Related
Niu et al. (2005, 2006): polarity classification of medical sentences into four categories (positive, negative, neutral, no outcome).
Our approach contemplates the possibility of the same sentence having multiple polarities.
Data and Annotation
Initial corpus
456 clinical questions sourced from the Journal of Family Practice.
Polarity annotations
589 sentences from 33 questions annotated.
- Bottom-line answers.
- Key sentences extracted by the QSpec summariser.
Example of annotations
Question
What is the most effective beta-blocker for heart failure?
Bottom-line answer
Three beta-blockers (carvedilol, metoprolol, and bisoprolol) reduce mortality in chronic heart failure caused by left ventricular systolic dysfunction, when used in addition to diuretics and angiotensin-converting enzyme (ACE) inhibitors.
Contextual polarities
carvedilol: recommended; metoprolol: recommended; bisoprolol: recommended.
Analysis I
Inter-annotator agreement (124 sentences)
- Cohen's kappa: κ = 0.85 (almost perfect agreement).
Agreement between annotated sentences and bottom-line summaries
- Interventions with positive polarity that are mentioned in the bottom-line summary: 177.
- Polarity agreement: 95.5%.
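Cohen's kappa corrects raw agreement for the agreement expected by chance. A minimal stdlib sketch of the computation (the annotator label sequences below are made up for illustration):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(a) | set(b)
    p_e = sum(ca[l] / n * cb[l] / n for l in labels)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["+", "+", "-", "+"], ["+", "+", "-", "-"]))  # 0.5
```

With 124 doubly-annotated sentences, a kappa of 0.85 falls in the "almost perfect" band of the usual Landis-Koch interpretation.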
Analysis II
But do we have enough interventions?
- Out of 109 unique interventions listed in the bottom-line summaries ...
- ... 99 are listed in the annotated sentences.
- Recall = 90.8%.
- If we ignore missing abstracts: recall = 96.1%.
Approach
- Train a statistical classifier (SVM).
- Input: context, sentence (the same sentence may appear multiple times, each with a different context).
- Output: the polarity.
Features
1. Word n-grams
2. Change Phrases
3. UMLS Semantic Types
4. Negations
5. PIBOSO Category
6. Synset Expansion
7. Context Windows
8. Dependency Chains
9. Other Features
Description of Features I
1 Word n-grams
- n = 1, 2.
- Lowercased, stop words removed, stemmed (Porter).
- Context words (strings matching the provided contexts) replaced with the generic string 'CONTEXT'.
- Disorder terms (UMLS semantic types) replaced with the generic string 'DISORDER'.
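A minimal sketch of this preprocessing, assuming a toy stop-word list and the placeholder strings _CONTEXT_ / _DISORDER_ (the Porter stemming used in the talk is omitted here):

```python
import re

STOPWORDS = {"the", "of", "a", "is", "and", "with", "than", "that"}  # toy list

def preprocess(sentence, context_terms, disorder_terms):
    """Tokenise, lowercase, drop stop words, and replace context/disorder
    mentions with generic placeholders."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    out = []
    for t in tokens:
        if t in context_terms:
            out.append("_CONTEXT_")
        elif t in disorder_terms:
            out.append("_DISORDER_")
        elif t not in STOPWORDS:
            out.append(t)
    return out

def ngrams(tokens, n):
    """All contiguous n-grams over the preprocessed token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(preprocess("Cimetidine is effective against warts",
                 {"cimetidine"}, {"warts"}))
# ['_CONTEXT_', 'effective', 'against', '_DISORDER_']
```

Replacing drug and disorder mentions with placeholders lets n-gram features generalise across interventions instead of memorising specific drug names.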
Description of Features II
2 Change Phrases
- Expanded Niu et al. (2005) groups of good, bad, more, less words.
- Features used: more-good, more-bad, less-good, less-bad.
- Context window of 4 words.
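The interaction between the change words and the polarity words can be sketched as follows; the word lists here are small invented samples, not the expanded Niu et al. lexicon:

```python
GOOD = {"effective", "beneficial", "improvement"}   # toy samples
BAD = {"adverse", "harmful", "mortality"}
MORE = {"more", "higher", "increased"}
LESS = {"less", "lower", "reduced"}

def change_phrase_features(tokens, window=4):
    """Fire a more-good / more-bad / less-good / less-bad feature when a
    change word occurs within `window` tokens of a polarity word."""
    feats = set()
    for i, t in enumerate(tokens):
        if t in MORE or t in LESS:
            change = "more" if t in MORE else "less"
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if tokens[j] in GOOD:
                    feats.add(change + "-good")
                if tokens[j] in BAD:
                    feats.add(change + "-bad")
    return feats

print(change_phrase_features(
    "combination is more effective than cimetidine alone".split()))
# {'more-good'}
```

Note that more-good and less-bad both suggest a positive outcome, while more-bad and less-good suggest a negative one, which is why the four combinations are kept as distinct features.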
3 UMLS semantic types
- Used all UMLS semantic types as binary features.
Description of Features III
4 Negations
- Niu et al. (2005).
- BioScope corpus.
- NegEx.
5 PIBOSO categories
- Population, Intervention, Background, Outcome, Study design, Other.
- Used the Kim et al. (2011) classifier.
Description of Features IV
6 Synset Expansion
- Use WordNet to expand synonyms.
7 Context Windows
- Terms within 3-word boundaries around context-drug terms.
- Terms before the context term are tagged with the string 'BEFORE'.
- Terms after the context term are tagged with the string 'AFTER'.
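A sketch of the window tagging, assuming simple token matching and suffix markers (the exact marker format used in the talk's system is an assumption here):

```python
def context_window_features(tokens, context_term, window=3):
    """Tag the `window` tokens on either side of the context-drug term
    with BEFORE / AFTER markers."""
    feats = []
    for i, t in enumerate(tokens):
        if t == context_term:
            for j in range(max(0, i - window), i):
                feats.append(tokens[j] + "_BEFORE")
            for j in range(i + 1, min(len(tokens), i + window + 1)):
                feats.append(tokens[j] + "_AFTER")
    return feats

print(context_window_features(
    "the combination of cimetidine with levamisole is effective".split(),
    "cimetidine"))
# ['the_BEFORE', 'combination_BEFORE', 'of_BEFORE',
#  'with_AFTER', 'levamisole_AFTER', 'is_AFTER']
```

The BEFORE/AFTER tags keep the same word as two distinct features depending on which side of the intervention it appears, which matters for comparative sentences.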
Description of Features V
8 Dependency chains
- Used the GDep parser.
- For each intervention, follow dependencies using this rule:
  1. Move up the dependency chain until we find a verb or the root.
  2. Move down the dependencies and collect all terms.
- Terms collected are tagged with the string 'DEP'.
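The two-step rule can be sketched over a toy head-index representation of a parse (this is not GDep's actual output format, and the example parse is hand-made):

```python
def dependency_chain_terms(tokens, heads, pos, target):
    """From the target intervention token, climb head links until a verb
    (POS starting with 'V') or the root is reached, then collect every
    token in that node's subtree, tagged with 'DEP'.

    heads[i] is the index of token i's head (-1 for the root)."""
    i = tokens.index(target)
    while heads[i] != -1 and not pos[i].startswith("V"):
        i = heads[i]
    subtree, frontier = set(), [i]
    while frontier:                      # walk down, collecting the subtree
        j = frontier.pop()
        subtree.add(j)
        frontier.extend(k for k, h in enumerate(heads)
                        if h == j and k not in subtree)
    return [tokens[j] + "_DEP" for j in sorted(subtree)]

print(dependency_chain_terms(
    ["cimetidine", "is", "effective"], [1, -1, 1],
    ["NN", "VBZ", "JJ"], "cimetidine"))
# ['cimetidine_DEP', 'is_DEP', 'effective_DEP']
```

The climb-to-verb step anchors the feature on the clause that actually makes the claim about the intervention, so unrelated parts of long sentences are excluded.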
9 Other features
- Context-intervention position.
- Summary sentence position.
- Presence of modals, comparatives, superlatives.
Results with SVM Classifier
Training: 85% of annotated data (2008 sentences).
Test: 15% of annotated data (354 sentences).
Feature sets     Accuracy (%)   95% CI      F-score (Positive)   F-score (Non-positive)
1,2,3,4 (Niu)    76.0           71.2–80.4   0.58                 0.83
1–6              78.5           73.8–82.8   0.64                 0.85
All (Niu)        83.9           79.7–87.6   0.71                 0.89
All (BioScope)   84.7           80.5–88.9   0.74                 0.89
All (NegEx)      84.5           80.2–88.1   0.73                 0.89
Impact of Training Size on Classification Results
It seems that we will get better results with more data ...
Towards Generation of Bottom-line Recommendations
- Used the 33 questions from our preliminary analysis.
- Compared automatic polarities of interventions with manual annotations of bottom-line summaries.
Results
Recall Precision F1
0.62 0.82 0.71
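The F1 in the table is just the harmonic mean of precision and recall:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.82, 0.62), 2))  # 0.71
```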
We might get better results with more training data.
Conclusions
http://web.science.mq.edu.au/~diego/medicalnlp/
- There is strong agreement between the polarity of interventions in clinical abstracts and their polarity in bottom-line summaries.
- An SVM classifier with a range of features, including context features, achieves better results than classifiers without context features.
- More training data will probably lead to better results.
Bottom-line conclusions
- Polarity classification of abstract sentences may help EBM summarisation.
- More data are needed.