Intro A Relational Learning Approach HIV RT Drug Resistance Learning from mutations Learning from mutants Conclusion
Predicting virus mutations through relational learning
AIMM 2012
E Cilia1, S Teso2, S Ammendola3, T Lenaerts1, and A Passerini2
September 9th, 2012
1 - Département d'Informatique, FS, Université Libre de Bruxelles
2 - Department of Computer Science and Information Engineering, University of Trento
3 - Ambiotec sas
E Cilia1, S Teso2, S Ammendola3, T Lenaerts1, and A Passerini2 — Predicting virus mutations through relational learning 1/24
Motivations
Mining relevant features from protein mutation data
understanding the properties of functional sites
developing novel proteins with useful/relevant function
Rational Design
engineering technique modifying existing proteins by site-directed mutagenesis
assumes knowledge (or intuition) about the effects of specific mutations
involves extensive trial-and-error experiments
also serves to improve understanding protein function
Introduction
An artificial system mimicking rational design
Goal
To build an artificial system mimicking the rational design process
A relational learning approach to:
1 mine rules from mutation data describing mutations relevant to a certain behavior
2 use the rules to infer novel mutations that may induce a similar behavior
A Relational Learning Approach
[Diagram: background knowledge and a dataset of mutations / mutants are the inputs; learning produces a hypothesis, which is used to produce a rank of novel relevant mutations]
Step 1: Relational Learning Phase
Learning in First Order Logic
data D, background knowledge B and features induced during learning are represented in first order logic
    res_against(M,nnrti) ← mut(M,P) AND close_to_site(P)
    (head: res_against(M,nnrti); body: mut(M,P) AND close_to_site(P))
searching for a set of clauses (hypothesis) covering all or most positive examples, and none or few negative ones.
Advantages
expressivity and interpretability of the learned model
possibility to make use of specific background knowledge
ability to learn rules from descriptions of complex, structured entities
the learnt rules constrain the rational design space
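The idea of a hypothesis "covering" examples can be illustrated with a minimal Python sketch. It is not how an ILP system works internally: the clause body from this slide is written by hand as a Python predicate, and the example mutations, their identifiers and the `close_to_site` flags are made-up toy values.

```python
from typing import Callable, Dict, List

# A mutation described by hypothetical attributes (illustrative only).
Mutation = Dict[str, object]
Clause = Callable[[Mutation], bool]

def close_to_site(m: Mutation) -> bool:
    """Body of the example clause: the mutated position is close to a functional site."""
    return bool(m["close_to_site"])

def coverage(hypothesis: List[Clause], examples: List[Mutation]) -> int:
    """Count the examples covered by at least one clause of the hypothesis."""
    return sum(any(clause(m) for clause in hypothesis) for m in examples)

# Toy data, not taken from any resistance database.
positives = [
    {"id": "m1", "pos": 10, "close_to_site": True},
    {"id": "m2", "pos": 12, "close_to_site": True},
    {"id": "m3", "pos": 90, "close_to_site": False},
]
negatives = [{"id": "n1", "pos": 95, "close_to_site": False}]

hypothesis = [close_to_site]
pos_covered = coverage(hypothesis, positives)
neg_covered = coverage(hypothesis, negatives)
print(f"positives covered: {pos_covered}/{len(positives)}, "
      f"negatives covered: {neg_covered}/{len(negatives)}")
```

A good hypothesis in the sense of this slide would push the first count up and keep the second one low.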
Step 2: Generative Phase
Mutation Generation Algorithm
Algorithm: Mutation generation
input: background knowledge B, learned model H, k
output: rank of the most relevant mutations R
procedure GenerateMutations(B, H, k)
    Initialize DM ← ∅
    M ← find all mutations m that satisfy at least one clause c_i ∈ H
    for m ∈ M do
        score ← S_M(m)    ▷ number of clauses c_i satisfied by m
        DM ← DM ∪ {(m, score)}
    end for
    R ← RankMuts(DM, B, H, k)    ▷ rank relevant mutations
    return R
end procedure
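A minimal Python sketch of the generation procedure follows. Several parts are assumptions rather than details from the slides: candidates are enumerated as all single amino-acid substitutions of a wild-type string, clauses are plain Python predicates, and the ranking step is taken to be "sort by number of satisfied clauses and keep the top k". The sequence and the two clauses are toy values.

```python
from typing import Callable, List, Tuple

# (wild-type residue, 1-based position, mutant residue)
Mutation = Tuple[str, int, str]
Clause = Callable[[Mutation], bool]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def generate_mutations(wild_type: str, hypothesis: List[Clause],
                       k: int) -> List[Tuple[Mutation, int]]:
    """Score every single substitution by the number of satisfied clauses
    and return the k best-scoring ones (assumed ranking criterion)."""
    scored = []
    for pos, wt in enumerate(wild_type, start=1):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # not a mutation
            m = (wt, pos, aa)
            score = sum(1 for clause in hypothesis if clause(m))
            if score > 0:  # keep mutations satisfying at least one clause
                scored.append((m, score))
    # Stable sort: ties keep their enumeration order.
    scored.sort(key=lambda item: -item[1])
    return scored[:k]

# Hypothetical clauses standing in for a learned model H.
def at_position_2(m: Mutation) -> bool:
    return m[1] == 2

def to_s_or_t_at_2(m: Mutation) -> bool:
    return m[1] == 2 and m[2] in "ST"

ranked = generate_mutations("AGY", [at_position_2, to_s_or_t_at_2], k=3)
for mutation, score in ranked:
    print(mutation, score)
```

Enumerating every substitution is affordable for single mutations, since the candidate space grows only linearly with the sequence length.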
HIV-1 RT Drug Resistance
mining rules from HIV mutation data
understand the virus adaptation mechanism
design drugs that effectively counter potentially resistant mutants
Datasets
1 Reverse Transcriptase (RT) mutations from the Los Alamos National Laboratory HIV resistance database
    NRTI → 95 mutations
    NNRTI → 56 mutations
2 RT mutants from the Stanford HIV drug resistance database
    NRTI → 639 mutants
    NNRTI → 747 mutants
Learning settings
Learning from mutations
Mutation-based learning
Input examples: single amino-acid mutations conferring resistance to a class of drugs
    aa(Pos,AA)
    mut(MutationID,AA,Pos,AA1)
Target concept: a model (i.e. set of rules) describing a mutation conferring resistance to a certain class of drugs
    res_against(MutationID,Drug)
Learning setting: learn from positive examples only (annotation on mutations NOT conferring resistance is scarce)
Output: generated resistance mutations
Background Knowledge
Background Knowledge Predicates (excerpt)
    typeaa(T,AA)
    same_type_aa(R1,R2,T)
    same_type_mut_t(MutID,Pos,T)
    close_to_site(Pos)
    location(L,Pos)
    catalytic_propensity(AA,CP)
(Betts and Russell, 2003)
Background Knowledge Rules (example)
    same_type_aa(R1,R2,T) ← typeaa(T,R1) AND typeaa(T,R2)
    different_type_mut_t(MutID,Pos) ← mut(MutID,R1,Pos,R2) AND NOT same_type_aa(R1,R2,T)
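The two example rules can be mimicked in Python as a rough sketch. The `TYPEAA` table below is a small illustrative excerpt of amino-acid classes written for this example, not the full background knowledge used by the authors, and the negation is read as "no type T contains both residues".

```python
# Illustrative excerpt of typeaa(T, AA) facts; lower-case one-letter codes.
TYPEAA = {
    "aliphatic": {"i", "l", "v"},
    "aromatic": {"f", "h", "w", "y"},
    "negative": {"d", "e"},
    "positive": {"h", "k", "r"},
}

def same_type_aa(r1: str, r2: str) -> set:
    """Types T such that typeaa(T, r1) and typeaa(T, r2) both hold."""
    return {t for t, members in TYPEAA.items() if r1 in members and r2 in members}

def different_type_mut(mut: tuple) -> bool:
    """mut = (mutation_id, r1, pos, r2); true when no single type holds both residues."""
    _, r1, _, r2 = mut
    return not same_type_aa(r1, r2)

print(same_type_aa("y", "f"))                   # types shared by Y and F
print(different_type_mut(("m1", "d", 1, "k")))  # negative to positive
print(different_type_mut(("m2", "i", 1, "v")))  # both aliphatic
```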
Learned Hypothesis
Model for the resistance to NNRTI
>wt ...AGLKKKKSVTVLDVG...YQYMDDLYVG...WETWWTEY...WIPEWEFVN...
| | | | | | | |
98 112 181 190 398 405 410 418
D DD W W
mut(A,B,C,D) AND position(C,190)
mut(A,B,C,D) AND position(C,190) AND typeaa(polar,D)
mut(A,y,C,D) AND typeaa(aliphatic,D)
mut(A,B,C,a) AND position(C,106)
Experimental Setting
Aleph ILP system (one-class classification setting)
30 random training/test set splits (70/30) (for each of the 2 learning tasks)
enrichment in the test mutations (recall)
comparison against the random generator
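The recall measure and the random baseline can be sketched as follows. The slides do not say how the random generator is configured; this sketch assumes it draws as many distinct candidates as the algorithm generated. All mutations shown are toy values.

```python
import random

def recall(generated, test_set):
    """Fraction of held-out test mutations found among the generated ones."""
    test_set = set(test_set)
    return len(test_set & set(generated)) / len(test_set)

def random_generator(candidates, n, rng):
    """Baseline (assumed): draw n distinct candidate mutations uniformly at random."""
    return rng.sample(candidates, n)

# Toy candidate space: (position, mutant residue) pairs.
candidates = [(pos, aa) for pos in range(1, 11) for aa in "ACDE"]
test_mutations = [(1, "A"), (2, "C"), (3, "D"), (4, "E")]
generated = [(1, "A"), (2, "C"), (3, "D"), (9, "A")]

rng = random.Random(0)  # fixed seed so the baseline draw is repeatable
baseline = random_generator(candidates, len(generated), rng)

algorithm_recall = recall(generated, test_mutations)
baseline_recall = recall(baseline, test_mutations)
print(f"algorithm recall: {algorithm_recall:.2f}")
print(f"random baseline recall: {baseline_recall:.2f}")
```

Repeating this over the 30 splits gives paired recall values per split, which is the kind of input a paired test compares.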
Experimental Results
Mean recall % on 30 splits
          Algorithm    Random Generator
NNRTI     86 •         58
NRTI      55 •         46

          Mean n. generated mutations    n. test mutations
NNRTI     5201                           17
NRTI      5548                           28

(•) significant improvement evaluated with a paired Wilcoxon test (α=0.01)
[Plot: mean recall (%) against number of satisfied clauses per generated mutation (1 to 10); series: NNRTI, NNRTI (rand), NRTI, NRTI (rand)]
Learning settings
Learning from mutants
Mutant-based learning
Input examples: mutants resistant or not to a class of drugs
    aa(Pos,AA)
    mut(MutantID,AA,Pos,AA1)
Target concept: a model (i.e. set of rules) describing a mutant resistant to a certain class of drugs
    res_against(MutantID,Drug)
Learning setting: binary classification setting
Output: generated resistant mutants with a single amino acid mutation
Experimental Setting
Aleph ILP system (binary classification setting)
30 random training/test set splits (for each of the 2 learning tasks)
enrichment in test set mutations as performance measure (recall)
comparison against the random generator
Experimental Results
Mean recall % on 30 splits
          Algorithm    Random Generator
NNRTI     17 •         1
NRTI      7 •          3

          Mean n. generated mutations    Mean n. test mutations
NNRTI     236                            26
NRTI      420                            40

(•) significant improvement evaluated with a paired Wilcoxon test (α=0.01)
[Plot: mean recall (%) against number of satisfied clauses per generated mutation (1 to 2); series: NNRTI, NNRTI (rand), NRTI, NRTI (rand)]
Results
NNRTI rules (excerpt)
    res_against(A,nnrti) ← mut(A,B,C,D) AND position(C,177) AND catalytic_propensity(D,medium) AND same_type_mut_t(A,C,polar)
    res_against(A,nnrti) ← mut(A,B,C,D) AND catalytic_propensity(D,high) AND typeaa(aromatic,B) AND same_typeaa(D,B,neutral)
NRTI rules (excerpt)
    res_against(A,nrti) ← mut(A,B,C,D) AND position(C,33)
    res_against(A,nrti) ← mut(A,B,C,r) AND typeaa(tiny,B) AND typeaa(polar,B)
NNRTI prediction highlights
Identified resistance surveillance mutations (53%): 103N, 106A, 181C, 181I, 181V, 188C, 188H, 190A, 190E, 190S
Other identified resistance mutations (29% of Dataset 1): 98G, 227C, 190C, 190Q, 190T, 190V
Other identified mutations (from the literature): 238N
Other key positions from the rules: 177
Highly scored but not reported as resistance mutations: 181N, 181D, 318C, 232C
NRTI prediction highlights
Identified resistance surveillance mutations (18%): 67E, 67G, 67N, 116Y, 184V, 184I
Other identified resistance mutations (18% of Dataset 1): 44D, 62V, 67A, 67S, 69R, 184T
Other identified mutations (from the literature): 219H
Other key positions from the rules: 33, 194, 218
Summary
Relational learning approach mimicking the rational design process:
HIV RT mutations/mutants
we built a relational knowledge base
we mined relevant relational features for modeling resistance mutations/mutants
we generated candidate mutations satisfying the learned rules
promising results, both in the mutation-based and in the mutant-based learning settings, suggest a potential in guiding mutant engineering or predicting virus evolution
Future Work
Work in progress
extend the background knowledge
    single_nucleotide_change(a,d).
    aamutations_single(R1,R2) ← mut(M,R1,P,R2) AND
        (single_nucleotide_change(R1,R2) OR
         single_nucleotide_change(R2,R1))
post-processing involving mutant evaluation by statistical learning approaches and stability predictors or search against HIV genome databases
generalize the approach to jointly generate sets of related mutations (mutants with multiple mutations)
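The `single_nucleotide_change` facts mentioned on this slide could be derived from the standard genetic code rather than listed by hand. The Python sketch below is one way to do that; the function name mirrors the predicate, but the construction itself is an assumption and not taken from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def single_nucleotide_change(r1: str, r2: str) -> bool:
    """True if some codon of r1 becomes a codon of r2 by changing one nucleotide."""
    r1, r2 = r1.upper(), r2.upper()
    if r1 == r2:
        return False
    for codon, aa in CODON_TABLE.items():
        if aa != r1:
            continue
        for i in range(3):
            for base in BASES:
                if base == codon[i]:
                    continue
                neighbor = codon[:i] + base + codon[i + 1:]
                if CODON_TABLE[neighbor] == r2:
                    return True
    return False

print(single_nucleotide_change("a", "d"))  # alanine to aspartate, e.g. GCT to GAT
print(single_nucleotide_change("w", "d"))  # tryptophan to aspartate
```

Because a single-base substitution is reversible, the relation computed here is symmetric, so the OR over both argument orders in the rule is satisfied either way.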
Future Work
From single to multiple amino acid mutations
Observations
multiple mutations are often required in order to affect protein function
neutral network theory claims that neutral mutations are required as intermediate steps to effective ones (debated)
...EYIQAKVQM...LDNLLNIEVAY...
...EYIQAKVQM...LDNLLDIEVAY...
...EYIQAKVQM...LENLLDIEVAY...
...EYIQAKVQM...LENLLNIEVAY...
Future Work
From single to multiple amino acid mutations
Predicting multiple mutations
predicting single mutations does not consider the joint effect of multiple mutations
trying all possible combinations is computationally infeasible (and not enough data)
...EYIQAKVQM...LDNLLNIEVAY...
...EYIQAKVQM...LDNLLDIEVAY...
...EYIQAKVQM...LENLLDIEVAY...
...EYIQAKVQM...LENLLNIEVAY...
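A quick count shows why trying all combinations does not scale. The sketch assumes a sequence length of 560 residues (the usual length quoted for the RT p66 subunit, not a figure from the slides) and 19 alternative residues per position.

```python
from math import comb

def n_mutants(length: int, k: int, alternatives: int = 19) -> int:
    """Number of distinct mutants carrying exactly k amino-acid substitutions."""
    return comb(length, k) * alternatives ** k

RT_LENGTH = 560  # assumed sequence length, for illustration only

for k in range(1, 5):
    print(f"k={k}: {n_mutants(RT_LENGTH, k):,} candidate mutants")
```

Under these assumptions the count goes from about ten thousand single mutants to tens of millions of double mutants, and keeps growing by several orders of magnitude with each additional substitution.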
Predicting multiple mutations
>m542   PISPIET FAIKKKSSS PLDKDFRKY ELREHLLKWGFY EIQKQGPGQWT IVGAETF
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m2012  PISPIET FAIKKKDST PLDESFRKY KLREHLLRWGFT EVQKQGPDQWT IPGAETY
        ******* ******.*: ***:.**** :**:***:*** *:**** .*** * ****:
Marked positions: 67, 123, 207, 334
mut(A,B,C,p),pos(C,334),correlated_mut(A,D,E),pos(D,207),typeaa(A,E,negative).
>m2006  PMSPIET FAIKKKDST PLHEDFRKY ELREHLLKWGLT EVQKQGPDQWT IAGAETY
>wt     PISPIET...FAIKKKDST...PLDEDFRKY...ELRQHLLRWGFT...EIQKQGQGQWT...IVGAETF
>m1288  PISPIDT FAIKKKNSD PLDESFRKY ELREHLLKWGFF EIQKQGPGQWT IPGAETY
        *:***:* ******:* ** *.**** ***:***:**: *:**** .*** * ****:
Marked positions: 67, 121, 123, 207, 334
Predicting multiple mutations
Marked positions: 67, 122, 328
PISPIET...FAIKKKDST...PLDNDFRKY...ELREHLLRWGFT...EIQKQGPGQWT...IVGAETF
PISPIET...FAIKKKDST...PLDEDFRKY...ELRDHLLRWGFT...QIQKQGPGQWT...IVGAETF
PISPIET...FAIKKKSST...PLDEDFRKY...ELRDHLLRWGFT...EIQKQGPGQWT...IVGAETF
Thank you
Questions?
Elisa Cilia