John Mitchell; James McDonagh; Neetika Nath; Rob Lowe; Richard Marchese Robinson. RF-Score: a Machine Learning Scoring Function for Protein-Ligand Binding Affinities. Ballester, P.J. & Mitchell, J.B.O. (2010) Bioinformatics 26, 1169-1175.
John Mitchell; James McDonagh; Neetika Nath
Rob Lowe; Richard Marchese Robinson
RF-Score: a Machine Learning Scoring Function for Protein-Ligand Binding Affinities
• Ballester, P.J. & Mitchell, J.B.O. (2010) Bioinformatics 26, 1169-1175
Calculating the affinities of protein-ligand complexes:
· For docking
· For post-processing docking hits
· For virtual screening
· For lead optimisation
· For 3D QSAR
· Within series of related complexes
· For any general complex
· Absolute (hard!)
· Relative
A difficult, unsolved problem.
Three existing approaches …
1. Force fields
2. Empirical functions
3. Knowledge-based
How knowledge-based scoring functions have worked …
· P-L complexes from PDB
· Assign atoms to types
· Find histograms of type-type distances
· Convert to an ‘energy’
· Add up the energies from all P-L atom pairs
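The five steps above can be sketched in plain Python. This is a toy illustration, not any published potential: the input format (pre-computed `(protein_type, ligand_type, distance)` triples per complex), the 0.5 Å bins from 2 to 8 Å, the pooled all-pairs reference distribution, the pseudocount and the value of kT are all assumptions made for the example.

```python
import math
from collections import Counter

# Assumed settings for this toy example (not from any published potential).
R_MIN, R_MAX = 2.0, 8.0   # distance range in Å, matching the histogram axis
BIN_WIDTH = 0.5           # histogram bin width in Å
KT = 0.593                # kT in kcal/mol at roughly 298 K

def bin_index(r):
    """Histogram bin for a distance r (only valid for R_MIN <= r < R_MAX)."""
    return int((r - R_MIN) // BIN_WIDTH)

def build_histograms(complexes):
    """Histogram type-type distances over all P-L complexes.

    `complexes` is a list of complexes, each a list of
    (protein_type, ligand_type, distance) triples (hypothetical input format).
    Returns per-pair counts and pooled counts over all pairs.
    """
    pair_counts = Counter()    # (protein_type, ligand_type, bin) -> count
    pooled_counts = Counter()  # bin -> count, all type pairs together
    for contacts in complexes:
        for p_type, l_type, r in contacts:
            if R_MIN <= r < R_MAX:
                b = bin_index(r)
                pair_counts[(p_type, l_type, b)] += 1
                pooled_counts[b] += 1
    return pair_counts, pooled_counts

def to_energies(pair_counts, pooled_counts, pseudocount=1.0):
    """Reverse Boltzmann conversion: E = -kT ln(f_obs / f_ref) per pair type and bin.

    f_obs is the normalised histogram for one type pair; f_ref is the pooled
    histogram, used here as a simple reference state (an assumption).
    """
    n_bins = int(round((R_MAX - R_MIN) / BIN_WIDTH))
    type_pairs = {(p, l) for p, l, _ in pair_counts}
    n_pooled = sum(pooled_counts.values())
    energies = {}
    for p, l in type_pairs:
        n_pair = sum(pair_counts[(p, l, b)] for b in range(n_bins))
        for b in range(n_bins):
            f_obs = (pair_counts[(p, l, b)] + pseudocount) / (n_pair + pseudocount * n_bins)
            f_ref = (pooled_counts[b] + pseudocount) / (n_pooled + pseudocount * n_bins)
            energies[(p, l, b)] = -KT * math.log(f_obs / f_ref)
    return energies

def score(contacts, energies):
    """Add up the energies from all P-L atom pairs within the tabulated range."""
    return sum(energies.get((p, l, bin_index(r)), 0.0)
               for p, l, r in contacts if R_MIN <= r < R_MAX)
```

A distance bin that is over-represented for a given type pair, relative to the reference, gets a negative (favourable) energy. Real knowledge-based functions differ in, among other things, the reference state they divide by; the pooled-pairs reference here is only the simplest choice.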
Figure: Nitrogen-Oxygen Distance Distribution (x-axis: Distance / Å, 2 to 8; y-axis: Number observed, 0 to 1200)
· This conversion of the histogram into an energy function uses a “reverse Boltzmann” methodology.
· Thus it “assumes” that the atoms of protein and ligand are independent particles in equilibrium at temperature T.
· For a variety of reasons, these are poor assumptions …
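Written out, the reverse Boltzmann step inverts the Boltzmann relation between an observed pair-distance distribution and a pair potential. The symbols are our own notation, not the slide's: $g^{\mathrm{obs}}_{ij}(r)$ is the observed distance distribution for protein atom type $i$ and ligand atom type $j$, $g^{\mathrm{ref}}_{ij}(r)$ is whatever reference distribution a given function assumes, $t(k)$ is the type of atom $k$, and $r_{kl}$ is the distance between protein atom $k$ and ligand atom $l$.

$$
g^{\mathrm{obs}}_{ij}(r) \propto g^{\mathrm{ref}}_{ij}(r)\, e^{-w_{ij}(r)/k_B T}
\quad\Longrightarrow\quad
w_{ij}(r) = -k_B T \ln \frac{g^{\mathrm{obs}}_{ij}(r)}{g^{\mathrm{ref}}_{ij}(r)} \quad (\text{up to an additive constant})
$$

$$
E = \sum_{k \in \mathrm{P}} \sum_{l \in \mathrm{L}} w_{t(k)\,t(l)}\!\left(r_{kl}\right)
$$

The first relation is where the assumption of independent particles in equilibrium at temperature $T$ enters; the second is the "add up the energies from all P-L atom pairs" step.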
· Molecular connectivity: atom-atom distances are miles from being independent.
· Excluded volume effects.
· No physical basis for assuming such an equilibrium.
· Changes in structure with T are small and not like those implied by the Boltzmann distribution.
We thought about this …
… and wrote a paper saying
“It’s not true, but it sort of works”
Then we had a better idea – could we dispense with the reverse Boltzmann formalism?
· Instead of assuming a formula that relates the distance distribution to the binding free energy …
… use machine learning to learn the relationship from known structures and binding affinities.
· And persuade someone to pay for it!
Figure: Nitrogen-Oxygen Distance Distribution (x-axis: Distance / Å; y-axis: Number observed) → Random Forest → Predicted binding affinity
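The slide's pipeline (distance information in, Random Forest, predicted binding affinity out) needs each complex turned into a fixed-length descriptor vector. A minimal sketch, assuming RF-Score-style occurrence counts: for every protein element and ligand element combination, count the atom pairs within a distance cutoff. The element lists and the 12 Å cutoff follow our reading of the published RF-Score description and should be checked against the paper; the `(element, (x, y, z))` input format and the name `featurize` are hypothetical.

```python
import math
from itertools import product

# Assumed atom typing and cutoff (our reading of RF-Score; verify against the paper).
PROTEIN_TYPES = ["C", "N", "O", "S"]
LIGAND_TYPES = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
CUTOFF = 12.0  # Å

# Fixed feature order: one count per (protein element, ligand element) pair, 36 in total.
FEATURE_KEYS = list(product(PROTEIN_TYPES, LIGAND_TYPES))

def featurize(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Count protein-ligand atom pairs of each element combination within `cutoff`.

    protein_atoms, ligand_atoms: lists of (element, (x, y, z)) tuples.
    Atoms whose element is not in the type lists (e.g. H) are ignored.
    """
    counts = dict.fromkeys(FEATURE_KEYS, 0)
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            key = (p_elem, l_elem)
            if key in counts and math.dist(p_xyz, l_xyz) <= cutoff:
                counts[key] += 1
    return [counts[k] for k in FEATURE_KEYS]
```

Counting within a single cutoff amounts to collapsing each type-type distance histogram into one bin. The Random Forest then learns how these counts relate to measured affinities, instead of a reverse Boltzmann formula fixing that relationship in advance.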
Random Forest
● Introduced by Breiman and Cutler (2001)
● Development of Decision Trees (Recursive Partitioning):
  ● Dataset is partitioned into successively smaller subsets
  ● Each partition is based upon the value of one descriptor
  ● The descriptor used at each split is selected so as to optimise the split
● Bootstrap sample of N objects chosen from the N available objects with replacement
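The recursive-partitioning and bootstrap bullets can be illustrated with two small helpers. A sketch under assumptions: the slide only says the split is chosen to optimise splitting, so the criterion used here (largest reduction in the sum of squared errors of the affinities, a common choice for regression trees) is our assumption, and `best_split` and `bootstrap_sample` are hypothetical names. Growing a full tree means calling `best_split` again on each resulting subset until the subsets become small.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (impurity of a regression node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(X, y, features=None):
    """Return (descriptor_index, threshold) whose split most reduces the SSE of y.

    Each candidate partition uses the value of a single descriptor:
    rows with X[j] <= threshold go left, the rest go right.
    Returns (None, None) if no split improves on the unsplit node.
    """
    if features is None:
        features = range(len(X[0]))
    best = (None, None, sse(y))
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best[2]:
                best = (j, t, cost)
    return best[0], best[1]

def bootstrap_sample(X, y, rng):
    """Choose N objects from the N available, with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]
```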
· The Random Forest is just a forest of randomly generated decision trees …
… whose outputs are averaged to give the final prediction
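A minimal, self-contained sketch in plain Python of a forest of randomly generated regression trees whose outputs are averaged. Assumptions: randomness enters through a bootstrap sample per tree and a random subset of `mtry` descriptors tried at each split (per-split descriptor sampling is the usual Breiman formulation, not something stated on this slide); the split criterion is reduction in the sum of squared errors; `mtry`, `max_depth` and `min_leaf` are illustrative names and defaults, not RF-Score's settings. It is not the RF-Score implementation.

```python
import random
import statistics

def _sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(X, y, rng, max_depth=3, min_leaf=2, mtry=1):
    """Grow a small regression tree; each split tries `mtry` randomly chosen descriptors.

    Internal nodes are (descriptor, threshold, left, right); leaves are floats.
    """
    if max_depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)  # leaf: mean affinity of the objects here
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return statistics.fmean(y)
    _, j, t, left, right = best
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], rng, max_depth - 1, min_leaf, mtry),
            grow_tree([X[i] for i in right], [y[i] for i in right], rng, max_depth - 1, min_leaf, mtry))

def predict_tree(node, x):
    """Follow the splits down to a leaf and return its value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_forest(X, y, n_trees=50, seed=0, **tree_kwargs):
    """Grow `n_trees` trees, each on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # N draws with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, **tree_kwargs))
    return forest

def predict_forest(forest, x):
    """Final prediction: the average of the individual tree outputs."""
    return statistics.fmean(predict_tree(tree, x) for tree in forest)
```

For example, `fit_forest(X, y, n_trees=25, mtry=2)` on descriptor rows `X` and affinities `y` returns a list of trees, and `predict_forest(forest, x)` averages their outputs. Library implementations such as scikit-learn's `RandomForestRegressor` follow the same bootstrap-and-average scheme with many more options.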
Building RF-Score
PDBbind 2007
Validation results: PDBbind set
· Following the method of Cheng et al., JCIM 49, 1079 (2009)
· Independent test set: PDBbind core 2007, 195 complexes from 65 clusters
· RF-Score outperforms competitor scoring functions, at least on our test
· RF-Score is available for free from our group website
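Benchmarks of this kind compare predicted with measured binding affinities on the held-out complexes. Pearson's correlation coefficient is one statistic commonly reported for such comparisons; the sketch below computes it, plus an RMSE, in plain Python. The choice of metrics and the function names are illustrative assumptions, not a statement of exactly which statistics the comparison on this slide used.

```python
import math
import statistics

def pearson_r(pred, measured):
    """Pearson correlation between predicted and measured binding affinities."""
    mp, mm = statistics.fmean(pred), statistics.fmean(measured)
    cov = sum((p - mp) * (m - mm) for p, m in zip(pred, measured))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sm = math.sqrt(sum((m - mm) ** 2 for m in measured))
    return cov / (sp * sm)

def rmse(pred, measured):
    """Root-mean-square error between predicted and measured affinities."""
    return math.sqrt(statistics.fmean((p - m) ** 2 for p, m in zip(pred, measured)))
```

In a held-out evaluation, `pred` would come from a model trained without any of the test complexes, so that the score reflects generalisation rather than memorisation.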