Discriminant Mixture of 3D Molecular Surface Models

Pascal Lamblin

Joint work with Yoshua Bengio, Dan Popovici, Benoit Cromp and Pierre-Jean L’Heureux

UdeM-McGill-MITACS Machine Learning Seminars

1 Intro
    QSAR and Virtual Screening

2 The 3D Surface Model
    Surface and Surface Template
    Theoretical Motivation
    The Alignment Process
    Alignment Results

3 Discriminant Architecture
    The Scores
    The Architecture

4 Results and Future Work
    Results
    Future Work
    Conclusion

QSAR and Virtual Screening

Quantitative Structure-Activity Relationship

Try to predict the activity of a molecule from its structure (its formula)

Activity: against some predefined target

Virtual Screening

Part of the process of drug discovery (pharmaceutical industry)

Screening: find compounds active against an interesting target

Virtual: without testing the actual chemical reaction

We don’t have much information on the target (so we cannot use other computational chemistry tools)

Use data banks full of molecules, of which only a small fraction are active

We have samples of known (actually tested) actives and inactives

Model Overview

We focus on the surface of the molecule, since it is the part that directly interacts with the target

We consider the shape of the surface, and the value of some chemical features (electrostatic charges, hydrophobicity, distance to the closest O atom...) on this surface

We suppose that there are a number of “perfect” template surfaces that fit as well as possible into the target site

Some parts of the template surface (or some properties) can play a more or less important role in determining the activity

Molecular Surface

A molecular surface $m$ is represented as a list of points, where point $i$ has:

3D spatial coordinates $(x_i^m, y_i^m, z_i^m)$

for each chemical property $k$, its value $p_{i,k}^m$

Molecular Surface Template

A template $t$ is also represented as a list of points, containing

3D spatial coordinates $(x_i^t, y_i^t, z_i^t)$

the standard deviation $\sigma_i^t$ of a 3D spherical Gaussian centered on the spatial coordinates

for each chemical property $k$, the mean $\mu_{i,k}^t$ and standard deviation $\sigma_{i,k}^t$ of a Gaussian

And a label $a^t \in \{0, 1\}$ (active or inactive)
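The two representations above can be sketched as plain Python data classes. This is only an illustration: the class and field names (`SurfacePoint`, `TemplatePoint`, `Template`, `xyz`, `props`, `sigma`, `mu`, `sigma_k`, `active`) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SurfacePoint:
    """One point i of a molecular surface m."""
    xyz: Tuple[float, float, float]   # (x_i^m, y_i^m, z_i^m)
    props: List[float]                # p_{i,k}^m, one value per chemical property k


@dataclass
class TemplatePoint:
    """One point i of a template t."""
    xyz: Tuple[float, float, float]   # (x_i^t, y_i^t, z_i^t)
    sigma: float                      # sigma_i^t, spherical spatial deviation
    mu: List[float]                   # mu_{i,k}^t, one mean per property k
    sigma_k: List[float]              # sigma_{i,k}^t, one deviation per property k


@dataclass
class Template:
    points: List[TemplatePoint]
    active: int                       # label a^t, either 0 or 1

    def __post_init__(self):
        # Basic consistency checks on the label and the per-point parameters.
        if self.active not in (0, 1):
            raise ValueError("label a^t must be 0 or 1")
        for p in self.points:
            if len(p.mu) != len(p.sigma_k):
                raise ValueError("mu and sigma_k must have one entry per property")
            if p.sigma <= 0 or any(s <= 0 for s in p.sigma_k):
                raise ValueError("standard deviations must be positive")
```

A surface is then simply a `List[SurfacePoint]`, and every template point carries as many means and deviations as a surface point carries property values.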

Generative Model

We can consider that the templates define a generative model, and that a molecule surface $x$ is drawn from template $t$ by this process:

1 For each point $i$ of the template, sample $(x_i, y_i, z_i)$ and $p_{i,k}$

2 Sample a rigid transformation $T$ from a prior distribution $P(T)$

3 Apply $T$ to each point sampled in 1

The likelihood $P(x|t)$ can be written:

$$P(x|t) = \int P(x|T, t)\, P(T)\, dT$$

where

$$P(x|T, t) = \prod_i \mathcal{N}\left(T^{-1}(x_i, y_i, z_i);\ (x_i^t, y_i^t, z_i^t),\ \sigma_i^t I\right)$$

The integral is intractable, so we perform an approximate maximization over $T$, using ICP

We train the model parameters discriminatively, because we don’t have the exact likelihood
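The three sampling steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the slides do not specify the prior $P(T)$, so the sketch assumes a uniformly random rotation and a Gaussian translation. The dictionary keys (`xyz`, `sigma`, `mu`, `sigma_k`, `props`) and the function names are hypothetical.

```python
import math
import random


def random_rotation(rng):
    """Uniformly random 3D rotation matrix, built from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def sample_surface(template, rng, translation_std=1.0):
    """Draw one surface from a template (list of dicts, one per template point)."""
    # Step 2: sample the rigid transformation T (assumed prior, see lead-in).
    rot = random_rotation(rng)
    shift = [rng.gauss(0.0, translation_std) for _ in range(3)]
    surface = []
    for point in template:
        # Step 1: sample a position and the chemical properties for this point.
        pos = [rng.gauss(c, point["sigma"]) for c in point["xyz"]]
        props = [rng.gauss(m, s) for m, s in zip(point["mu"], point["sigma_k"])]
        # Step 3: apply T to the sampled position.
        moved = [sum(rot[r][c] * pos[c] for c in range(3)) + shift[r]
                 for r in range(3)]
        surface.append({"xyz": moved, "props": props})
    return surface
```

With all deviations set to zero, the sampled surface is an exact rigid copy of the template, which is a convenient sanity check.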

Our Goal

Learn the templates, so that we know:

their perfect shape and properties

where it is important to have them

Be able to recognize actives and inactives

Become rich, healthy, and famous, and live happily ever after

Overview

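We need a similarity measure between a template and a surface

This measure should be invariant under translation and rotation

Find the most likely spatial alignment (using both geometry and chemical features) by an approximate method: ICP

Align the template on the molecule surface

From the aligned surfaces, compute a score that combines:

the squared spatial distance $(x_i^t - x_{j_i}^m)^2 + (y_i^t - y_{j_i}^m)^2 + (z_i^t - z_{j_i}^m)^2$ between each aligned template point $i$ and its matched surface point $j_i$, scaled by $\sigma_i^t$

the chemical distance $\frac{(\mu_{i,k}^t - p_{j_i,k}^m)^2}{\sigma_{i,k}^t}$ for each property $k$

the terms $-\sum_i \log \sigma_i^t$ and $-\sum_{i,k} \log \sigma_{i,k}^t$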

The alignment method: ICP

Iterative method, commonly used for registration of 2D or 3D shapes

For each point $i$ on the first surface, find its nearest neighbor $j_i$ on the other surface:

$$\forall i,\quad j_i = \arg\min_j\ (x_i^t - x_j^m)^2 + (y_i^t - y_j^m)^2 + (z_i^t - z_j^m)^2$$

Compute the rigid transformation minimizing the sum of squared distances between the pairs of nearest neighbors:

$$(R, T) = \arg\min_{R, T} \sum_i (x_i^{t\prime} - x_{j_i}^m)^2 + (y_i^{t\prime} - y_{j_i}^m)^2 + (z_i^{t\prime} - z_{j_i}^m)^2$$

with $(x_i^{t\prime}, y_i^{t\prime}, z_i^{t\prime}) = R\,(x_i^t, y_i^t, z_i^t) + T$

Apply this transformation and iterate until convergence

Since ICP is sensitive to local minima, we try different initial conditions
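The ICP loop can be sketched in a few lines of Python. To keep the example self-contained, this sketch works in 2D, where the best rigid transformation for fixed pairs has a simple closed form; the 3D case used in this work needs a 3D rotation solver instead. All function names are hypothetical, and only geometry is used for matching.

```python
import math


def nearest_neighbors(template, surface):
    """For each template point i, index j_i of its closest surface point."""
    return [min(range(len(surface)), key=lambda j: math.dist(p, surface[j]))
            for p in template]


def best_rigid_2d(src, dst):
    """Rotation angle and translation minimizing sum |R src_i + T - dst_i|^2."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    a = b = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        a += px * qx + py * qy      # sum of dot products of centered pairs
        b += px * qy - py * qx      # sum of cross products of centered pairs
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def apply_rigid(points, theta, shift):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


def icp_2d(template, surface, max_iter=50, tol=1e-12):
    """Align template onto surface; returns (aligned points, final squared error)."""
    current = list(template)
    prev_err = err = float("inf")
    for _ in range(max_iter):
        # Match step: nearest neighbor of every template point.
        matched = [surface[j] for j in nearest_neighbors(current, surface)]
        # Fit step: best rigid transformation for these pairs, then apply it.
        theta, shift = best_rigid_2d(current, matched)
        current = apply_rigid(current, theta, shift)
        err = sum(math.dist(p, q) ** 2 for p, q in zip(current, matched))
        if prev_err - err < tol:    # no further improvement: converged
            break
        prev_err = err
    return current, err
```

To follow the restart strategy mentioned above, one would call `icp_2d` from several initial poses of the template and keep the alignment with the lowest error.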

The modified method

Also use the chemical features and the template’s deviations during the nearest-neighbor computations

Geometry only:

$$\forall i,\quad j_i = \arg\min_j\ (x_i^t - x_j^m)^2 + (y_i^t - y_j^m)^2 + (z_i^t - z_j^m)^2$$

With chemical features and deviations:

$$\forall i,\quad j_i = \arg\min_j\ (x_i^t - x_j^m)^2 + (y_i^t - y_j^m)^2 + (z_i^t - z_j^m)^2 + \sum_k \frac{(\mu_{i,k}^t - p_{j,k}^m)^2}{\sigma_{i,k}^t}$$
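The difference between the two matching rules can be illustrated with a small Python sketch. The function names and dictionary keys (`xyz`, `mu`, `sigma_k`, `props`) are hypothetical; the cost is the squared spatial distance, plus the chemical term in the second variant.

```python
def geometric_neighbor(t_point, surface):
    """Index j of the surface point closest to the template point in space."""
    def cost(j):
        return sum((a - b) ** 2 for a, b in zip(t_point["xyz"], surface[j]["xyz"]))
    return min(range(len(surface)), key=cost)


def chemical_neighbor(t_point, surface):
    """Same search, with the chemical term scaled by the template deviations."""
    def cost(j):
        s = surface[j]
        spatial = sum((a - b) ** 2 for a, b in zip(t_point["xyz"], s["xyz"]))
        chem = sum((m - p) ** 2 / sd
                   for m, p, sd in zip(t_point["mu"], s["props"], t_point["sigma_k"]))
        return spatial + chem
    return min(range(len(surface)), key=cost)
```

A small deviation $\sigma_{i,k}^t$ makes a property mismatch expensive, so the match can move to a farther point with the right chemistry; a large deviation makes the rule behave like the geometry-only one.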

The modified method

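Use of chemical distances for weighting

Without weighting:

$$(R, T) = \arg\min_{R, T} \sum_i (x_i^{t\prime} - x_{j_i}^m)^2 + (y_i^{t\prime} - y_{j_i}^m)^2 + (z_i^{t\prime} - z_{j_i}^m)^2$$

With chemical features and weighting:

$$(R, T) = \arg\min_{R, T} \sum_i w_i \left[ (x_i^{t\prime} - x_{j_i}^m)^2 + (y_i^{t\prime} - y_{j_i}^m)^2 + (z_i^{t\prime} - z_{j_i}^m)^2 \right]$$

with $w_i = \mathrm{sigmoid}\left(\alpha - \sqrt{\sum_k \frac{(\mu_{i,k}^t - p_{j_i,k}^m)^2}{\sigma_{i,k}^t}}\right)$

where $(x_i^{t\prime}, y_i^{t\prime}, z_i^{t\prime}) = R\,(x_i^t, y_i^t, z_i^t) + T$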
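A minimal Python sketch of the weighting, assuming the weight is the sigmoid of $\alpha$ minus the chemical distance as written above. The function names are hypothetical, and the weight is applied to the whole squared spatial distance of each pair.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def chemical_distance(mu, sigma_k, props):
    """Square root of the summed, deviation-scaled squared property differences."""
    return math.sqrt(sum((m - p) ** 2 / s for m, p, s in zip(mu, props, sigma_k)))


def pair_weight(alpha, mu, sigma_k, props):
    """Weight w_i of one (template point, matched surface point) pair."""
    return sigmoid(alpha - chemical_distance(mu, sigma_k, props))


def weighted_squared_error(template_xyz, matched_xyz, weights):
    """Objective minimized over (R, T): sum_i w_i * squared distance of pair i."""
    return sum(w * sum((a - b) ** 2 for a, b in zip(p, q))
               for w, p, q in zip(weights, template_xyz, matched_xyz))
```

Pairs whose chemistry disagrees get a weight close to 0 and barely influence the transformation, while chemically consistent pairs keep a weight close to `sigmoid(alpha)`.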

Visualizing alignments

Figure: Without chemical information, with chemical information

Utility of using chemical features

Figure: Without chemical information

Figure: With chemical information

Formula of the Score

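The alignment score between template $t$ and molecular surface $m$ is:

$$S_{mt} = -\frac{1}{[\ldots]} \sum_i \left( \frac{(x_i^{t\prime} - x_{j_i}^m)^2 + (y_i^{t\prime} - y_{j_i}^m)^2 + (z_i^{t\prime} - z_{j_i}^m)^2}{\sigma_i^t} + \sum_k \frac{(\mu_{i,k}^t - p_{j_i,k}^m)^2}{\sigma_{i,k}^t} \right) - \sum_i \log \sigma_i^t - \sum_{i,k} \log \sigma_{i,k}^t$$

where $(x_i^{t\prime}, y_i^{t\prime}, z_i^{t\prime}) = R\,(x_i^t, y_i^t, z_i^t) + T$, and $(R, T)$ is obtained through ICP.

Approximate likelihood that $m$ was generated from $t$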
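The score can be sketched in Python as below. This rests on assumptions that should be read carefully: the template points are taken as already transformed by $(R, T)$ and paired with their nearest neighbors, the spatial term is assumed to be divided by $\sigma_i^t$ (matching the $\log \sigma_i^t$ term), and the constant in front of the first sum is left as a hypothetical `factor` parameter because its value is not given here. Function name and dictionary keys are also hypothetical.

```python
import math


def alignment_score(aligned_template, matched_surface, factor=1.0):
    """Score of an aligned template against its matched surface points.

    aligned_template: list of dicts with keys xyz, sigma, mu, sigma_k
    matched_surface:  list of dicts with keys xyz, props (point j_i for each i)
    factor:           assumed constant multiplying the distance terms
    """
    fit = 0.0
    log_terms = 0.0
    for t, s in zip(aligned_template, matched_surface):
        # Squared spatial distance, scaled by the spherical deviation (assumption).
        spatial = sum((a - b) ** 2 for a, b in zip(t["xyz"], s["xyz"])) / t["sigma"]
        # Squared property differences, each scaled by its own deviation.
        chem = sum((m - p) ** 2 / sd
                   for m, p, sd in zip(t["mu"], s["props"], t["sigma_k"]))
        fit += spatial + chem
        log_terms += math.log(t["sigma"]) + sum(math.log(sd) for sd in t["sigma_k"])
    return -factor * fit - log_terms
```

The log terms keep the deviations from growing without bound: enlarging a deviation shrinks the distance terms but is paid for in the logarithm.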

A neural net

The scores with all templates are the input of an ordinary neural network

The network discriminates between actives and inactives (cross-entropy)
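A forward pass of such a network can be sketched as follows. The slides only say "ordinary" neural network, so the single tanh hidden layer, the sigmoid output, the label convention (1 = active) and all names here are assumptions made for the illustration.

```python
import math


def forward(scores, w_in, b_in, w_out, b_out):
    """Probability that a surface is active, given its scores against all templates."""
    # Hidden layer: one tanh unit per row of input weights.
    hidden = [math.tanh(sum(w * s for w, s in zip(row, scores)) + b)
              for row, b in zip(w_in, b_in)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    # Numerically stable sigmoid output.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def cross_entropy(prob, label, eps=1e-12):
    """Binary cross-entropy of a predicted probability against a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
```

The input vector has one entry per template, so adding templates widens the input layer without changing the rest of the network.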

Training

We train the architecture by backpropagating the error gradient

to the output weights

to the input weights

to the template parameters ($\sigma_i^t$, $\mu_{i,k}^t$, $\sigma_{i,k}^t$, $\alpha$ and $\beta$)

Some implementation tricks

Since we are more interested in actives, we replicate the active surfaces in the training set, in order to have at least as many actives as inactives

We initialize the templates from randomly picked actives and inactives from the training set

The scores need to be normalized in order not to saturate the input neurons: we initialize, then learn, the normalizing factors
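Two of these tricks are easy to sketch in Python. The code below is an illustration only: it assumes label 1 means active, and it assumes the normalizing factors are initialized as a per-template mean and standard deviation of the scores, which the slides do not state. All function names are hypothetical.

```python
import random
import statistics


def oversample_actives(samples, rng):
    """Replicate actives until there are at least as many actives as inactives.

    samples: list of (surface, label) pairs, label 1 = active (assumption).
    """
    actives = [s for s in samples if s[1] == 1]
    inactives = [s for s in samples if s[1] == 0]
    if not actives:
        return list(samples)
    # Smallest whole number of copies that reaches the inactive count.
    copies = max(1, -(-len(inactives) // len(actives)))
    balanced = inactives + actives * copies
    rng.shuffle(balanced)
    return balanced


def initial_normalizers(score_rows):
    """Per-template mean and deviation of the scores (assumed initialization)."""
    columns = list(zip(*score_rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid dividing by 0
    return means, stds


def normalize(row, means, stds):
    """Center and scale one row of scores before feeding the input neurons."""
    return [(s - m) / d for s, m, d in zip(row, means, stds)]
```

During training, `means` and `stds` would then be treated as ordinary parameters and updated along with the weights.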

What training achieves

After the training phase, we should have learned:

the templates, including standard deviations

a discriminant system, telling us if a surface is likely to be active

that it is not enough to get rich and famous

Results on McMaster contest data set

Dataset of molecules tested against E. coli dihydrofolate reductase

33 actives out of 50 000

We selected 93 inactives (as diverse as possible)

Comparison with PLS (Partial Least Squares); we report the lift:

$$\mathrm{Lift} = \frac{a_s / n_s}{a / n}$$

Split   Surface Template Learning   PLS
1       173.96                      149.11
2       149.11                      149.11
3       149.11                      173.96
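For reference, a lift of this kind can be computed as in the sketch below. It assumes the usual definition: the fraction of actives among the $n_s$ selected molecules ($a_s / n_s$), divided by the fraction of actives in the whole set ($a / n$). The function name and the 0/1 label encoding are hypothetical, and the example does not attempt to reproduce the numbers in the table.

```python
def lift(selected_labels, all_labels):
    """Lift of a selection: active rate in the selection over the overall rate.

    Both arguments are lists of 0/1 labels, with 1 = active (assumption).
    """
    a_s, n_s = sum(selected_labels), len(selected_labels)
    a, n = sum(all_labels), len(all_labels)
    if n_s == 0 or a == 0:
        raise ValueError("need a non-empty selection and at least one active")
    return (a_s / n_s) / (a / n)
```

A lift of 1 means the selection is no better than picking molecules at random; with very few actives in a large data bank, even a handful of actives near the top of the ranking gives a lift in the hundreds.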

Future Work

Also update the templates’ spatial coordinates during the learning phase (not only the deviations)

Add molecular-level chemical properties as inputs of the neural net

Find a way to speed up the computation of the alignments (approximate or faster nearest-neighbor search, better set of initial transformations)

More experiments...

Design tools to easily visualize and exploit learned templates

Conclusion

We have a method that:

gives results as good as state of the art

produces surface templates, interpretable by chemists

does not need to compare each pair of molecules in the database

Questions?

The End
