BIONLP'09 Shared Task
Farzaneh Sarafraz, James Eales, Reza Mohammadi, Goran Nenadic
26 March 2009
BioNLP'09 Task 1
Events in abstracts
Given: genes and gene products (proteins)
Wanted: events, each with
− type
− trigger
− participant(s)
− cause (if applicable)
Example
"I kappa B/MAD3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
Event: negative regulation
Trigger: masks
Theme1: the first p65
Cause: MAD3
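As a rough illustration of what the task asks for, the sketch below holds one Task 1 event in a Python dataclass. The class name `Event` and its field names are hypothetical and are not the shared-task file format; the theme is stored as plain text, so this sketch does not record which occurrence of p65 is meant.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical record for one Task 1 event (not the official file format)."""
    type: str                                          # e.g. "negative regulation"
    trigger: str                                       # trigger word in the sentence
    themes: List[str] = field(default_factory=list)    # participant protein(s)
    cause: Optional[str] = None                        # only used by regulation events


# The example sentence above, encoded with this record.
example = Event(type="negative regulation", trigger="masks",
                themes=["p65"], cause="MAD3")
print(example)
```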
Event Types
Gene expression, Transcription, Protein catabolism, Localisation, Phosphorylation
Binding, Regulation, Positive regulation, Negative regulation
Training and Test Data
Training data: 800 abstracts
Development data: 150 abstracts
Test data: 260 abstracts
Our System
1) Finding trigger and type
2) Finding participants (themes)
3) Post processing
1) Finding Triggers and Types: CRF
Each token is labelled with an event type (0 = not a trigger):
"I kappa B/MAD3 masks the nuclear localization..."
  0 0 0 0 9 0 0 0
"The binding of I kappa B/MAD3 to NF-kappa B p65 is
  0 0 0 0 0 0 0 0 0 0 0 0
sufficient to retarget NF-kappa B p65 from the
  0 0 4 0 0 0 0 0
nucleus to the cytoplasm."
  0 0 0 0
9: negative regulation
4: localisation
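The labelling above can be read back into triggers with a few lines of Python. This is a sketch that assumes one label per token and single-token triggers; only the codes 4 and 9 come from the slide, so any other code is returned as a bare number, and the function name `decode_triggers` is hypothetical.

```python
# Only the two codes shown on the slide are known; 0 means "not a trigger".
LABELS = {4: "localisation", 9: "negative regulation"}


def decode_triggers(tokens, labels):
    """Return (token index, token, event type) for every non-zero label."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    return [(i, tok, LABELS.get(lab, str(lab)))
            for i, (tok, lab) in enumerate(zip(tokens, labels)) if lab != 0]


tokens = "sufficient to retarget NF-kappa B p65 from the".split()
print(decode_triggers(tokens, [0, 0, 4, 0, 0, 0, 0, 0]))
# [(2, 'retarget', 'localisation')]
```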
CRF features for each token
− isprotein
− isPPIword
− generic POS tag
− log-frequency of the token being a trigger, for each event type (10 features)
− number of proteins in the sentence (sentence-level)
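A per-token feature function along these lines could feed a CRF toolkit that accepts feature dictionaries. This is a sketch, not the system's code: the protein set, PPI word list and trigger counts are hypothetical inputs, "generic POS tag" is assumed to mean the first two letters of a Penn Treebank tag, and the log-frequency is assumed to be an add-one smoothed log of how often the word was a trigger of each type in training data.

```python
import math


def token_features(tokens, pos_tags, i, proteins, ppi_words,
                   trigger_counts, token_counts, event_types):
    """Feature dict for token i (hypothetical names and encodings)."""
    word = tokens[i].lower()
    feats = {
        "isprotein": tokens[i] in proteins,
        "isPPIword": word in ppi_words,
        # Assumption: "generic" POS tag = first two letters of a Penn tag (VBZ -> VB).
        "pos": pos_tags[i][:2],
        # Sentence-level feature: the same value for every token in the sentence.
        "n_proteins": sum(tok in proteins for tok in tokens),
    }
    seen = token_counts.get(word, 0)           # training occurrences of the word
    for et in event_types:
        hits = trigger_counts.get((word, et), 0)  # times it was a trigger of type et
        # Assumption: add-one smoothed log of P(trigger of type et | word).
        feats["logfreq_" + et] = math.log((hits + 1) / (seen + 2))
    return feats


f = token_features(["p65", "binds", "DNA"], ["NN", "VBZ", "NN"], 1,
                   proteins={"p65"}, ppi_words={"binds"},
                   trigger_counts={("binds", "binding"): 3},
                   token_counts={"binds": 4},
                   event_types=["binding", "transcription"])
print(f)
```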
Trigger Detection Post-processing
Positive discrimination
− Manually inspecting false negatives
− Adding recurring missed triggers
Negative discrimination
− Manually inspecting false positives
− Filtering out commonly mistaken tokens
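The two discrimination steps amount to an allow list and a block list applied after the CRF. The word lists in this sketch are hypothetical placeholders, not the ones built from the authors' manual error analysis, and the function name `post_process` is also hypothetical.

```python
# Hypothetical placeholder lists; the real ones came from manual error analysis.
ADD_TRIGGERS = {"expression": "gene expression"}   # word -> event type to add
FILTER_TOKENS = {"activity"}                        # words never kept as triggers


def post_process(tokens, predicted):
    """predicted maps token index -> event type assigned by the CRF."""
    # Negative discrimination: drop predictions on commonly mistaken words.
    result = {i: etype for i, etype in predicted.items()
              if tokens[i].lower() not in FILTER_TOKENS}
    # Positive discrimination: add recurring triggers the CRF missed.
    for i, tok in enumerate(tokens):
        if i not in result and tok.lower() in ADD_TRIGGERS:
            result[i] = ADD_TRIGGERS[tok.lower()]
    return result


print(post_process(["binding", "activity", "expression"],
                   {0: "binding", 1: "binding"}))
# {0: 'binding', 2: 'gene expression'}
```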
Trigger Detection Results
(R = recall, P = precision; R, P and F-score in %)
Event class           #Gold      R      P  F-score
Localisation             40   77.5  47.69    59.05
Binding                 180  33.33  54.55    41.38
Gene expression         282   76.6  58.54    66.36
Transcription            68  58.82   18.6    28.27
Protein catabolism       19  84.21  88.89    86.49
Phosphorylation          40   97.5  81.25    88.64
Non-reg total           629  63.91  48.73     55.3
Regulation              138  13.04  62.07    21.56
Positive regulation     462  13.85  54.24    22.07
Negative regulation     153  29.41  45.92    35.86
All total              1382  38.28  49.44    43.15
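The figures in the table can be checked by hand. A minimal sketch, assuming F-score is the balanced F1 (harmonic mean of R and P) and that totals pool all events of the classes involved; under that assumption the total recall is the gold-weighted mean of the per-class recalls, which reproduces the Non-reg total. Precision totals cannot be recomputed this way, since the table does not give the number of predicted triggers per class. The function names are hypothetical.

```python
def f_score(recall, precision):
    """Balanced F-score (harmonic mean of recall and precision), in percent."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def micro_recall(rows):
    """rows: (gold count, recall %) per class; recall of the pooled classes."""
    gold = sum(g for g, _ in rows)
    return sum(g * r for g, r in rows) / gold


print(round(f_score(84.21, 88.89), 2))   # 86.49 (Protein catabolism row)
nonreg = [(40, 77.5), (180, 33.33), (282, 76.6),
          (68, 58.82), (19, 84.21), (40, 97.5)]
print(round(micro_recall(nonreg), 2))    # 63.91 (Non-reg total row)
```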
2) Finding Participants
Type and number of participants:
− 1 theme (protein): Gene expression, Transcription, Protein catabolism, Localisation, Phosphorylation
− 1 or more themes (proteins): Binding
− 1 theme and 1 cause (proteins or other events): Regulation, Positive regulation, Negative regulation
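The participant constraints can be written as a small validity check, for instance to discard malformed events before output. A sketch with hypothetical names; it treats the cause of a regulation event as optional, following "cause (if applicable)" in the task description, and it does not check whether a participant is a protein or an event.

```python
SINGLE_THEME = {"gene expression", "transcription", "protein catabolism",
                "localisation", "phosphorylation"}
REGULATION = {"regulation", "positive regulation", "negative regulation"}


def valid_participants(event_type, themes, cause=None):
    """Check theme/cause counts against the list above (hypothetical helper)."""
    if event_type in SINGLE_THEME:
        return len(themes) == 1 and cause is None
    if event_type == "binding":
        return len(themes) >= 1 and cause is None
    if event_type in REGULATION:
        # Exactly one theme; the cause (protein or event) is optional.
        return len(themes) == 1
    raise ValueError(f"unknown event type: {event_type!r}")


print(valid_participants("binding", ["p50", "p65"]))         # True
print(valid_participants("transcription", ["p50", "p65"]))   # False
```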
Parse Tree Distance
Parse Tree Distance Analysis
Theme in Subtree
Single-theme events
− Theme in subtree: 0.7054
− Theme not in subtree: 0.2946
Binding events
− At least one theme in subtree: 0.5435
− No theme in subtree: 0.4565
Regulation events
− Theme or cause in subtree: 0.5919
− Neither theme nor cause in subtree: 0.4081
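The subtree statistics rely on two tree operations: testing whether a participant lies in the subtree rooted at the trigger, and counting the edges between trigger and participant. The sketch below assumes a dependency-style tree stored as child-to-parent links with words as node names (so words must be unique), like the tree drawn for the gene expression example; the toy tree and all function names are hypothetical, and the slides do not say which parser or tree representation the system used.

```python
def ancestors(node, parent):
    """Nodes from `node` up to the root (inclusive), via child -> parent links."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def in_subtree(node, trigger, parent):
    """True if `node` lies in the subtree rooted at `trigger`."""
    return trigger in ancestors(node, parent)


def tree_distance(a, b, parent):
    """Number of edges on the path a -> lowest common ancestor -> b."""
    depth_from_b = {n: d for d, n in enumerate(ancestors(b, parent))}
    for d, n in enumerate(ancestors(a, parent)):
        if n in depth_from_b:          # first shared node is the lowest common ancestor
            return d + depth_from_b[n]
    raise ValueError("nodes are not in the same tree")


def fraction_in_subtree(pairs, parent):
    """Share of (trigger, participant) pairs whose participant is in the trigger subtree."""
    return sum(in_subtree(p, t, parent) for t, p in pairs) / len(pairs)


# Hypothetical toy tree: each word points to its head word.
parent = {"MAD3": "inhibits", "binding": "inhibits",
          "p65": "binding", "DNA": "binding"}
print(in_subtree("p65", "binding", parent))        # True
print(tree_distance("binding", "MAD3", parent))    # 2
print(fraction_in_subtree([("binding", "p65"), ("binding", "MAD3")], parent))  # 0.5
```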
Distances in Trigger Subtree
Distances not in Trigger Subtree
Rules Concerning Parse Tree Analysis
For "binding", report as themes:
− up to the two closest proteins in the subtree
− and the closest protein in the rest of the tree
"In contrast, gp41 failed to stimulate NF-kappaB binding activity in as much as no NF-kappaB bound to the main NF-kappaB-binding site 2 of the IL-10 promoter after addition of gp41."
The rule correctly leaves out the final gp41.
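The binding rule can be sketched as a selection over candidate proteins, assuming each candidate already carries its parse tree distance to the trigger and a flag saying whether it lies in the trigger's subtree. The function name and the toy candidates P1 to P5 are hypothetical; ties in distance keep their input order.

```python
def binding_themes(candidates):
    """candidates: (protein, tree_distance, in_trigger_subtree) tuples."""
    inside = sorted((c for c in candidates if c[2]), key=lambda c: c[1])
    outside = sorted((c for c in candidates if not c[2]), key=lambda c: c[1])
    # Up to two closest proteins in the subtree, plus the closest one outside it.
    return [protein for protein, _, _ in inside[:2] + outside[:1]]


candidates = [("P1", 2, True), ("P2", 1, True), ("P3", 4, True),
              ("P4", 3, False), ("P5", 6, False)]
print(binding_themes(candidates))   # ['P2', 'P1', 'P4']
```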
Example of a Missed (False Negative) Theme
For gene expression
− All the proteins in the subtree are reported as themes
"The 15-lipoxygenase (lox) gene is expressed in a tissue-specific manner, predominantly in erythroid cells but also in airway epithelial cells and eosinophils."
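        is
       /  \
    gene  expressed
     |
15-lipoxygenase
"15-lipoxygenase" hangs under "gene", outside the subtree of the trigger "expressed", so it is not reported as a theme.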
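Applying the gene expression rule to the tree above shows why the theme is missed. A minimal sketch, assuming the tree is stored as child-to-parent links and that only 15-lipoxygenase is annotated as a protein; the names `PARENT` and `gene_expression_themes` are hypothetical.

```python
# Tree from the example above as child -> parent links (hypothetical encoding).
PARENT = {"gene": "is", "expressed": "is", "15-lipoxygenase": "gene"}


def in_subtree(node, root, parent):
    """Walk up from `node`; True if the walk passes through `root`."""
    while node != root and node in parent:
        node = parent[node]
    return node == root


def gene_expression_themes(trigger, proteins, parent):
    """Rule from the slide: every protein in the trigger's subtree is a theme."""
    return [p for p in proteins if in_subtree(p, trigger, parent)]


print(gene_expression_themes("expressed", ["15-lipoxygenase"], PARENT))  # []
```

The walk from 15-lipoxygenase goes through "gene" to the root "is" without meeting "expressed", so the rule returns no theme.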
Evaluation on Development Data
Event class           #Gold      R      P  F-score
Localisation             53  67.92  46.75    55.38
Binding                 312  21.47  63.81    32.13
Gene expression         356  64.61  76.33    69.98
Transcription            82  53.66   89.8    67.18
Protein catabolism       21  90.48  67.86    77.55
Phosphorylation          47  91.49  53.09    67.19
Non-reg total           871   50.4  68.44    58.05
Regulation              172   5.23  33.33     9.05
Positive regulation     632   3.48  21.36     5.99
Negative regulation     201   9.45  15.08    11.62
Regulatory total       1005   4.98  19.53     7.93
All total              1876  26.07  54.46    35.26
Evaluation on Test Data
Event class           #Gold      R      P  F-score
Localisation            174  44.83  53.06     48.6
Binding                 347  12.68  40.37     19.3
Gene expression         722  52.63  69.34    59.84
Transcription           137  15.33  67.74       25
Protein catabolism       14  42.86     50    46.15
Phosphorylation         135  78.52  53.81    63.86
Non-reg total          1529  41.53  60.82    49.36
Regulation              291   3.09  19.15     5.33
Positive regulation     983   1.12   8.87     1.99
Negative regulation     379   12.4  20.52    15.46
Regulatory total       1653   4.05  16.75     6.53
All total              3182  22.06  48.61    30.35
Results: Ranked 12th out of 24 teams
Rank      R      P  F-score
   1  46.73  58.48    51.95
   2  45.82  47.52    46.66
   3  34.98  61.59    44.62
   4   36.9  55.59    44.35
   5  33.41  51.55    40.54
   6  28.13  53.56    36.88
   7  28.22  45.78    34.92
   8  27.75   46.6    34.78
   9  21.62  62.21    32.09
  10  21.12   56.9     30.8
  11   22.5   47.7    30.58
  12  22.06  48.61    30.35   (this system)
  13  25.96  36.26    30.26
  14  20.93   49.3    29.38
  15  22.69  40.55     29.1
  16  21.53  36.99    27.21
  17  17.44  39.99    24.29
  18  28.63  20.88    24.15
  19  13.45  71.81    22.66
  20  22.78  19.03    20.74
  21  30.42  14.11    19.28
  22  11.25  66.54    19.25
  23  11.69  31.42    17.04
  24    9.4  61.65    16.31
End.
Other Tasks
Event detection and characterization
Event argument recognition
Negations and speculations
Example
"I kappa B/MAD3 masks the nuclear localization signal of NF-kappa B p65 and requires the transactivation domain to inhibit NF-kappa B p65 DNA binding."
Event: negative regulation
Trigger: masks
Theme1: the first p65
Cause: MAD3
Site: nuclear localization signal
Example
"In contrast, NF-kappa B p50 alone fails to stimulate kappa B-directed transcription, and based on prior in vitro studies, is not directly regulated by I kappa B."
Event: regulation
Theme1: this p50
Trigger: regulated
Negation: true for this event
Speculation: none