rosario hearst
Multiway Relation Classification: Application to Protein-Protein Interactions
Barbara Rosario, Marti A. Hearst, 2005
Farzaneh Sarafraz, 30 April 2009
HIV-1 Human Protein Interactions Database
Pair of proteins
Interaction type(s) between them
PubMed ID (etc.)
Data
2224 records (now 5134)
65 interaction types (now 68)
809 proteins (now 1434 + 9 and 2295 pairs)
984 articles (now 3099)
Average 1.9 interactions per PP (max = 23)
Average 5.9 interactions per article (max = 90)
Goal
For every “triple”
− PP (protein pair)
− A (article with unique PMID)
Find the interaction type
− (ignore 7.7% of the triples with > 1 interaction)
NER
LocusLink
“Conservative” approach
No coreference analysis
Not good recall
High precision
Method – assuming one interaction
For a subset of all the PPs (45%)
− Get all full-text articles
− Get the sentences that contain both proteins of the PP
− Group as “papers”
Also for a triple PPA
− Get the papers that cite A
− Get the sentences that contain the PP and mention A
− Group as “citances”
Training Data Construction
“papers”
− 0.5 sentence per triple (max 79)
− 50.6 sentences per interaction type (max 119)
“citances”
− 0.4 sentence per triple (max 105)
− 49.2 sentences per interaction type (max 162)
Include an interaction type if it has > 40 sentences in both
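The inclusion rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_interaction_types`, the flat lists of per-sentence labels, and the example counts are all hypothetical and do not come from the slides.

```python
from collections import Counter


def select_interaction_types(paper_labels, citance_labels, min_count=40):
    """Return the interaction types with more than `min_count` sentences
    in BOTH the "papers" set and the "citances" set.

    Each argument is a list with one interaction label per sentence
    (a hypothetical input format chosen for this sketch).
    """
    paper_counts = Counter(paper_labels)
    citance_counts = Counter(citance_labels)
    # A Counter returns 0 for a missing key, so a type that is absent
    # from the citances simply fails the threshold.
    return sorted(
        t for t in paper_counts
        if paper_counts[t] > min_count and citance_counts[t] > min_count
    )


# Made-up counts, for illustration only.
papers = ["binds"] * 41 + ["inhibits"] * 41 + ["cleaves"] * 10
citances = ["binds"] * 50 + ["inhibits"] * 40 + ["cleaves"] * 60
print(select_interaction_types(papers, citances))
```

In this toy input only one type clears the threshold in both sets: the second type has exactly 40 citance sentences, which is not more than 40, and the third has too few paper sentences.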
Interaction Types
Task
Given a PPA triple
Extract sentences that have PP
Predict for the entire PPA one of 10 interaction types
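The sentence-extraction step can be illustrated with a rough Python sketch. It assumes a naive regular-expression sentence splitter and case-insensitive substring matching, which is far cruder than the LocusLink-based protein recognition described earlier; the function name and the example text are made up for illustration.

```python
import re


def sentences_with_pair(text, protein_a, protein_b):
    """Return the sentences of `text` that mention both protein names.

    Assumptions of this sketch: sentences end in '.', '!' or '?' followed
    by whitespace, and a protein is "mentioned" if its name occurs as a
    case-insensitive substring.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    a, b = protein_a.lower(), protein_b.lower()
    return [s for s in sentences if a in s.lower() and b in s.lower()]


text = ("Tat binds cyclin T1. Tat is a viral protein. "
        "Cyclin T1 recruits Tat to TAR.")
for sentence in sentences_with_pair(text, "Tat", "cyclin T1"):
    print(sentence)
```

Substring matching will also fire on unrelated words that happen to contain a short protein name, so a real pipeline would need proper entity recognition.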
Models
Generative
− Dynamic Graphical Model
− Simple Naïve Bayes Classifier
Discriminative
− Neural Network (feedforward, conjugate gradient)
Dynamic Graphical Model
Based on previous work
Roles: PROTEIN, NULL
Features: words
DM – Assumptions
There is an interaction
Single interaction per sentence
As many role states as words
Words = features
− One feature node per role
− Roles are hidden
− Protein names may be masked
Evaluation
Document-level
− (Not all the sentences describe an interaction)
− For every triple, an interaction is assigned to the whole document
− Using two methods: Mj and Cf
Mj
for each triple
− for each sentence of the triple
    find the interaction that maximises the posterior probability of the interaction given the features
− assign to all sentences of the triple the most frequent interaction
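The Mj procedure for one triple can be sketched as follows. The sketch assumes the sentence-level classifier has already produced, for each sentence, a dictionary of posterior probabilities over interaction types; the function name, the input format, the probabilities, and the tie-breaking behaviour are assumptions, not details given on the slide.

```python
from collections import Counter


def predict_mj(sentence_posteriors):
    """Majority vote over per-sentence predictions for one triple.

    `sentence_posteriors` is a list with one dict per sentence, mapping
    interaction type -> P(interaction | features). This format is
    hypothetical and chosen only for the sketch.
    """
    # Step 1: for each sentence, take the interaction with the highest posterior.
    per_sentence = [max(p, key=p.get) for p in sentence_posteriors]
    # Step 2: the most frequent per-sentence choice labels the whole triple.
    # Ties go to the label seen first (an arbitrary choice in this sketch).
    return Counter(per_sentence).most_common(1)[0][0]


# Made-up posteriors for a triple with three sentences.
posteriors = [
    {"binds": 0.60, "inhibits": 0.40},
    {"binds": 0.55, "inhibits": 0.45},
    {"binds": 0.10, "inhibits": 0.90},
]
print(predict_mj(posteriors))
```

Two of the three sentences narrowly favour the same type, so that type wins the vote even though the third sentence strongly favours another.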
Cf
get all conditional probabilities (do not assign per sentence)
for each triple
− choose the interaction that maximises the sum over all the triple's sentences
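A minimal sketch of the Cf procedure for one triple, assuming each sentence comes with a dictionary of conditional probabilities over interaction types. The function name, the input format, and the numbers are hypothetical. Written as a formula, the choice is $\hat{y} = \arg\max_{y} \sum_{s} P(y \mid x_s)$, where $s$ ranges over the triple's sentences and $x_s$ denotes the features of sentence $s$.

```python
from collections import defaultdict


def predict_cf(sentence_posteriors):
    """Pick the interaction with the largest summed conditional probability.

    `sentence_posteriors` is a list with one dict per sentence, mapping
    interaction type -> P(interaction | features). No per-sentence label
    is assigned; the probabilities are pooled directly.
    """
    totals = defaultdict(float)
    for posterior in sentence_posteriors:
        for interaction, prob in posterior.items():
            totals[interaction] += prob
    return max(totals, key=totals.get)


# Made-up conditional probabilities for a triple with three sentences.
posteriors = [
    {"binds": 0.60, "inhibits": 0.40},
    {"binds": 0.55, "inhibits": 0.45},
    {"binds": 0.10, "inhibits": 0.90},
]
print(predict_cf(posteriors))
```

Summing lets one confident sentence outweigh several uncertain ones: here the totals are 1.25 and 1.75, so the type preferred by the single confident sentence is chosen, whereas a per-sentence majority vote on the same input would pick the other type.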
Results
Comparison
Trigger word
− 70 triggers for 10 interactions
− Co-occurrence
− Choose the “most specific” type
− If both specific or no trigger, choose nothing
− Backoff: if in doubt, choose the most frequent interaction
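One possible reading of the trigger-word baseline is sketched below. The slide does not list the 70 triggers or define specificity, so the `TRIGGERS` table, the `SPECIFICITY` ranks, and the substring matching are all invented for illustration. "Both specific" is interpreted here as two or more matched types sharing the highest specificity rank.

```python
# Hypothetical trigger table and specificity ranks (NOT the real 70 triggers).
TRIGGERS = {"bind": "binds", "interact": "interacts with", "inhibit": "inhibits"}
SPECIFICITY = {"interacts with": 0, "binds": 1, "inhibits": 1}


def predict_trigger(sentence, backoff=None):
    """Return the most specific interaction type triggered in `sentence`.

    Returns `backoff` (None by default, i.e. "choose nothing") when no
    trigger matches or when several matched types are equally specific.
    Passing the most frequent interaction as `backoff` gives the
    backoff variant of the baseline.
    """
    text = sentence.lower()
    matched = {itype for trigger, itype in TRIGGERS.items() if trigger in text}
    if matched:
        top = max(SPECIFICITY[t] for t in matched)
        best = [t for t in matched if SPECIFICITY[t] == top]
        if len(best) == 1:
            return best[0]
    return backoff


print(predict_trigger("Tat binds and interacts with CDK9"))
print(predict_trigger("Tat binds and inhibits the kinase"))
print(predict_trigger("Tat binds and inhibits the kinase", backoff="interacts with"))
```

The first call resolves to the more specific of the two matched types, the second abstains because two equally specific types match, and the third falls back to the supplied default.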
Comparison
− Key(B): trigger word (backoff)
− Base: the most frequent interaction
Sentence-Level Experiments
Manual annotation of 2114 sentences
68.3% disagreed with HIV database
Contacted some of the authors
− DB error
− Contradiction
    “require” but under certain conditions “inhibit”
Sentence-Level Evaluation
Thank you.