bionlp09 winners

 Extracting Complex Biological Events with Rich Graph-Based Feature Sets Jari Björne, Juho Heimonen, Filip Ginter, Antti Airola, Tapio Pahikkala, Tapio Salakoski BioNLP 2009 Workshop Farzaneh Sarafraz 18 June 2009

Upload: farzanehs

Post on 12-Jul-2015


TRANSCRIPT

Page 1: BioNLP09 Winners

   

Extracting Complex Biological Events with Rich Graph-Based Feature Sets

Jari Björne, Juho Heimonen, Filip Ginter, Antti Airola, Tapio Pahikkala, Tapio Salakoski
BioNLP 2009 Workshop

Farzaneh Sarafraz
18 June 2009

Page 2: BioNLP09 Winners

   

BioNLP'09 Task 1

Events in abstracts
Given: gene and gene products (proteins)
Wanted: events

− type
− trigger
− participant(s)
− cause (if applicable)

Page 3: BioNLP09 Winners

   

Example

"I kappa B/MAD-3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."

Event: negative regulation

Trigger: masks

Theme1: the first p65

Cause: MAD-3
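The annotation on this slide can be written down as a small record. This is only a sketch: the class name `Event` and its field names are hypothetical choices for illustration, not the task's file format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """One extracted event (field names are illustrative assumptions)."""
    event_type: str               # one of the task's event types
    trigger: str                  # the word in the sentence that signals the event
    themes: List[str]             # participant(s)
    cause: Optional[str] = None   # only present for some events


# The example from the slide above.
example = Event(
    event_type="Negative regulation",
    trigger="masks",
    themes=["NF-kappa B p65 (first mention)"],
    cause="MAD-3",
)
```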

Page 4: BioNLP09 Winners

   

Event Types

Gene expression
Transcription
Protein catabolism
Localisation
Phosphorylation
Binding
Regulation
Positive regulation
Negative regulation

Page 5: BioNLP09 Winners

   

Training and Test Data

Training data: 800 abstracts
Development data: 150 abstracts
Test data: 260 abstracts

Page 6: BioNLP09 Winners

   

The System

Trigger recognition
− Methods similar to NER
− Classification

Argument detection
− Graph edge selection
− Classification

Semantic post-processing
− Rule-based

Page 7: BioNLP09 Winners

   

Trigger Detection

Token labelling (one class for each event type and one negative class)

92% of triggers are single token
− Adjacent tokens form a trigger if they appear in the training data

Triggers that share a token:
− Combined class: gene expression/pos regulation

A graph node for each trigger
− Not duplicated just yet
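A minimal sketch of the labelling scheme on this slide, assuming gold triggers are given as a mapping from token index to a set of event types. The function name, the `"neg"` label and the toy sentence are hypothetical.

```python
def label_tokens(tokens, trigger_types):
    """Assign one class label per token.

    trigger_types maps a token index to the set of event types it triggers.
    A token that triggers several types gets a single combined class;
    every other token gets the negative class "neg".
    """
    labels = []
    for i, _ in enumerate(tokens):
        types = sorted(trigger_types.get(i, ()))
        labels.append("/".join(types) if types else "neg")
    return labels


# Toy sentence (invented for illustration, not from the corpus).
tokens = ["Overexpression", "of", "IL-2", "was", "observed"]
labels = label_tokens(tokens, {0: {"Positive_regulation", "Gene_expression"}})
```

Sorting the type names before joining keeps the combined class name stable regardless of annotation order.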

Page 8: BioNLP09 Winners

   

Classification - SVM

Token features
− Binary: capitalisation, presence of punctuation or numeric characters
− Stem
− Character bigrams and trigrams
− Token is a known trigger in the training data
− All the above for linear and dependency “neighbours”
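The token features listed above could be computed roughly as follows. This is a sketch under assumptions: the feature names are made up, the stem feature is left out because it needs an external stemmer, and the neighbour features would simply reuse the same function on adjacent tokens.

```python
import string


def token_features(token, known_triggers):
    """Binary and character n-gram features for one token (names are illustrative)."""
    lower = token.lower()
    feats = {
        "capitalised": token[:1].isupper(),
        "has_punct": any(c in string.punctuation for c in token),
        "has_digit": any(c.isdigit() for c in token),
        "known_trigger": lower in known_triggers,
    }
    # Character bigrams and trigrams of the lower-cased token.
    for n in (2, 3):
        for i in range(len(lower) - n + 1):
            feats[f"char{n}={lower[i:i + n]}"] = True
    return feats


feats = token_features("NF-kappa", {"masks", "inhibit"})
```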

Page 9: BioNLP09 Winners

   

Classification - SVM

Frequency features
− # of named entities
  − In sentence
  − In a linear window around the token
− Bag-of-words count of token texts in the sentence (?)

Dependency chains
− Up to depth of 3 from the token are constructed
− At each depth both token and frequency features
− Plus dep type and sequence of dep types in chain
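One way to enumerate dependency chains up to depth 3 from a token, as a sketch only. The adjacency format, the function name and the hand-made graph are assumptions; a real system would take the graph from a parser and might also follow edges against their direction.

```python
def dependency_chains(graph, start, max_depth=3):
    """List (sequence of dependency types, end token) for every chain
    of length 1..max_depth starting at `start`.

    graph maps a token to a list of (dependency_type, neighbour) pairs.
    """
    chains = []

    def walk(node, dep_types, visited):
        if len(dep_types) == max_depth:
            return
        for dep, nxt in graph.get(node, []):
            if nxt in visited:
                continue  # do not revisit tokens within one chain
            path = dep_types + [dep]
            chains.append((tuple(path), nxt))
            walk(nxt, path, visited | {nxt})

    walk(start, [], {start})
    return chains


# Hand-made toy graph, not a real parse.
graph = {
    "masks": [("nsubj", "MAD-3"), ("dobj", "signal")],
    "signal": [("prep_of", "p65")],
}
chains = dependency_chains(graph, "masks")
```

Each chain's end token would then get the token and frequency features, and the tuple of dependency types gives the "sequence of dep types" feature.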

Page 10: BioNLP09 Winners

   

Two SVMs

“Somewhat” different feature sets
Combined weighted results

“This design should be considered an artifact of the time-constrained, experiment-driven development of the system rather than a principled design”

Page 11: BioNLP09 Winners

   

Precision/Recall trade-off

Undetected trigger --> undetected event
All triggers have events in the training data --> bias towards reporting an event for all detected triggers

Adjust P/R explicitly
− multiply the negative class by β
− find β experimentally
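The β adjustment can be illustrated with a few lines. This sketch assumes per-class scores are positive (so that β < 1 makes the negative class less likely to win); the function name and the score values are invented.

```python
def predict_with_beta(scores, beta, negative="neg"):
    """Pick the best class after scaling the negative class score by beta.

    Assumes positive scores: beta < 1 then favours recall (more triggers
    predicted), beta > 1 favours precision.
    """
    adjusted = dict(scores)
    adjusted[negative] = adjusted[negative] * beta
    return max(adjusted, key=adjusted.get)


# Invented scores for one token.
scores = {"neg": 0.5, "Binding": 0.4, "Phosphorylation": 0.1}
unadjusted = predict_with_beta(scores, beta=1.0)
recall_biased = predict_with_beta(scores, beta=0.6)
```

Finding β experimentally would amount to trying a grid of values and keeping the one with the best score on the development data.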

Page 12: BioNLP09 Winners

   

Edge Detection

Multi-class SVM
All potential directed edges
− Event node to named entity
− Event node to event node (nested event)
− Labelled as theme, cause, or negative

Each edge is predicted independently
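A sketch of how the candidate edges might be enumerated before classification. The function name and the node strings are hypothetical; each returned pair would then be classified on its own as theme, cause, or negative.

```python
from itertools import permutations


def candidate_edges(event_nodes, entities):
    """All potential directed edges that start at an event node."""
    # Event node -> named entity.
    edges = [(event, entity) for event in event_nodes for entity in entities]
    # Event node -> other event node (nested events), in both directions.
    edges += list(permutations(event_nodes, 2))
    return edges


edges = candidate_edges(["masks", "inhibit"], ["MAD-3", "p65"])
```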

Page 13: BioNLP09 Winners

   

Feature Set – Central Concept

Shortest undirected path of syntactic dependencies in the Stanford scheme parse of the sentence.
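The shortest undirected path between two tokens can be found with a breadth-first search over the dependency edges. A minimal sketch, assuming edges are given as (head, dependency type, dependent) triples; the toy edges below are hand-made, not real parser output.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search that ignores edge direction.

    Returns the list of tokens on one shortest path, or None if the
    two tokens are not connected.
    """
    adjacency = {}
    for head, _dep, dependent in edges:
        adjacency.setdefault(head, []).append(dependent)
        adjacency.setdefault(dependent, []).append(head)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


deps = [
    ("masks", "nsubj", "MAD-3"),
    ("masks", "dobj", "signal"),
    ("signal", "prep_of", "p65"),
]
path = shortest_path(deps, "MAD-3", "p65")
```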

Page 14: BioNLP09 Winners

   

Feature Set

Token text, POS, entity/event class, dependency (subject)

N-grams: merging the attributes of 2-4
− Consecutive tokens
− Consecutive dependencies
− Each token and two neighbouring dependencies
− Each dependency and two neighbouring tokens
− One bigram showing direction
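The n-gram idea can be sketched for the simplest case, consecutive items along the path. The function name and the `_` separator are assumptions; the same helper would work for a list of dependency types instead of token texts.

```python
def path_ngrams(items, n_min=2, n_max=4):
    """Merge the attributes of n_min..n_max consecutive path items into feature strings."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(items) - n + 1):
            grams.append("_".join(items[i:i + n]))
    return grams


# Token texts along a toy shortest path.
grams = path_ngrams(["MAD-3", "masks", "signal", "p65"])
```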

Page 15: BioNLP09 Winners

   

Other Features

Individual component features
Semantic node features
Frequency features

Page 16: BioNLP09 Winners

   

Semantic Post-Processing

Duplicate nodes
− Same class and same trigger
− Combined trigger

Remove improper arguments
Remove directed cycles by removing the weakest link
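Cycle removal by dropping the weakest link could look like the sketch below. It assumes each predicted edge carries a confidence score and that "weakest" means lowest score; the function names and the toy graph are hypothetical.

```python
def find_cycle(edges):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    state = {}  # node -> "active" (on the current DFS stack) or "done"

    def dfs(node, stack):
        state[node] = "active"
        stack.append(node)
        for nxt in adjacency.get(node, []):
            if state.get(nxt) == "active":
                # Found a back edge: the cycle is the stack from nxt onwards.
                nodes = stack[stack.index(nxt):] + [nxt]
                return list(zip(nodes, nodes[1:]))
            if nxt not in state:
                found = dfs(nxt, stack)
                if found:
                    return found
        stack.pop()
        state[node] = "done"
        return None

    for node in list(adjacency):
        if node not in state:
            found = dfs(node, [])
            if found:
                return found
    return None


def break_cycles(edge_scores):
    """Repeatedly delete the lowest-scoring edge of a cycle until none remain."""
    edges = dict(edge_scores)
    while True:
        cycle = find_cycle(edges)
        if cycle is None:
            return edges
        weakest = min(cycle, key=lambda edge: edges[edge])
        del edges[weakest]


# Toy graph with one cycle A -> B -> C -> A.
scores = {("A", "B"): 0.9, ("B", "C"): 0.4, ("C", "A"): 0.7, ("C", "D"): 0.8}
acyclic = break_cycles(scores)
```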

Page 17: BioNLP09 Winners

   

Duplicating Event Nodes

Task restrictions
− Two causes,
− must have theme,
− etc.

Several heuristics

x-th first dependency in shortest path from the event for binding

Page 18: BioNLP09 Winners

   

Results

Page 19: BioNLP09 Winners

   

Compared to Us

Page 20: BioNLP09 Winners

   

What Didn't Work/Wasn't Tried

CRF
HMM
Removing strong independence assumption
Co-reference resolution (4.8%)

Page 21: BioNLP09 Winners

   

End.