PLS and Cross Validation
Notes for 3D QSAR
Chapter 4: Partial Least Squares (PLS) and Cross Validation
Pavithra.K.B
Introduction
• Partial Least Squares (PLS) was developed by Herman Wold and his son Svante Wold.
• PLS predicts differences in the values of dependent variables (target properties) from explanatory variables (descriptors).
• With multiple dependent variables, a QSAR equation is made for each target property, but the coefficients are interrelated.
• PLS is an extension of the more familiar technique known as multiple regression (MR).
• The overall goal is to use the predictors to predict the responses
• This is achieved by extracting latent variables T and U from sampled factors and responses respectively
• The extracted factors T (X scores) are used to predict the Y scores U, and the predicted Y scores are then used to construct predictions for the responses.
• The outcome is that knowledge about the explanatory properties is used to reduce uncertainty in the target properties.
Computationally, two algorithms are used:
• NIPALS algorithm
• SIMPLS algorithm
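To make the idea of extracting latent variables concrete, here is a minimal NIPALS-style sketch for the single-response case (PLS1), written in NumPy. It is a simplified illustration under the usual assumption of centered data, not a production implementation of either algorithm; all variable names are illustrative.

```python
import numpy as np

def nipals_pls1(X, y, n_components):
    """Minimal NIPALS-style PLS1: extract latent scores T from X, guided by y."""
    X = X - X.mean(axis=0)          # center predictors
    y = y - y.mean()                # center response
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        w = X.T @ y                 # weight vector from covariance of X with y
        w /= np.linalg.norm(w)
        t = X @ w                   # X score (latent variable)
        tt = t @ t
        p = X.T @ t / tt            # X loading
        c = (y @ t) / tt            # regression of y on the score t
        X = X - np.outer(t, p)      # deflate X by the extracted component
        y = y - c * t               # deflate y
        W.append(w); T.append(t); P.append(p); q.append(c)
    return np.array(W).T, np.array(T).T, np.array(P).T, np.array(q)
```

Because X is deflated after each component, the successive score vectors in T come out mutually orthogonal, which is what makes the subsequent regression on the scores well behaved.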
Terminology of Multiple Regression and PLS
• s is the root-mean-square (RMS) or standard error.
• r2 is the proportion of the original variance explained.
• F-ratio is the ratio of explained to unexplained variance.
• residual is the difference between the actual and the calculated target property value.
• equation is the QSAR: a set of coefficients and an intercept (offset) used for prediction.
• prediction = intercept + (explanatory1 * coeff1) + (explanatory2 * coeff2) + (explanatory3 * coeff3) + …
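The prediction equation above is just a weighted sum plus an offset; a one-line sketch, with made-up coefficient and descriptor values:

```python
def predict(intercept, coeffs, descriptors):
    """QSAR prediction: intercept + sum of (explanatory value * coefficient)."""
    return intercept + sum(c * x for c, x in zip(coeffs, descriptors))

# Illustrative numbers only: 0.5 + 1.2*2.0 + (-0.8)*1.0 + 0.3*4.0 = 3.3
pred = predict(0.5, [1.2, -0.8, 0.3], [2.0, 1.0, 4.0])
```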
• With cross-validation in PLS, some indices change and others are omitted as meaningless.
• The key difference is in the definition of the s value.
• In analyses not involving cross-validation, s is the uncertainty remaining after the least-squares fit has been performed.
• In cross-validation, s becomes the expected uncertainty in prediction for an individual compound, based on the data available from the other compounds in the set.
– In this context, s is the root-mean PRedictive Error Sum of Squares (PRESS).
• The “cross-validated r2” (called q2) is often much lower than the conventional r2 for the same data.
• However, PRESS and q2 are generally proving to be much better indicators than s and r2 of how reliable predictions are.
• The formula for q2 is:

q2 = 1 − [Σ(Ypred − Yactual)²] / [Σ(Yactual − Ymean)²]

• Ypred is a predicted value.
• Yactual is an actual (experimental) value.
• Ymean is the best estimate of the mean of all values that might be predicted.
• Both summations run over the same set of Y values.
• The numerator of the fraction is PRESS.
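PRESS and q2 follow directly from those definitions. A short sketch with illustrative numbers (in practice, the predicted values would come from cross-validation, each one made with that compound held out):

```python
import numpy as np

def press_and_q2(y_actual, y_pred):
    """PRESS = sum of squared prediction errors; q2 = 1 - PRESS / total SS."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_pred - y_actual) ** 2)          # numerator of the fraction
    ss = np.sum((y_actual - y_actual.mean()) ** 2)    # variance around the mean
    return press, 1.0 - press / ss

# Illustrative data, not real QSAR values.
press, q2 = press_and_q2([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```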
COMPARISON OF PLS WITH MULTIPLE REGRESSION
Major advantages PLS offers over MR are:
• the ability to produce robust equations even when the number of ‘independent variables’ vastly exceeds the number of experimental observations
• predictions that are more accurate than those from MR
• PLS models are much more stable
• it can simultaneously derive models for more than one dependent variable
• much more rapid computation with large data matrices
Cross validation
• Cross-validation (used in PLS):
– Remove one or more pieces of input data
– Rederive the QSAR equation
– Calculate the omitted data
– Compute the root-mean-square error to evaluate the efficacy of the model
• Typically 20% of the data is removed for each iteration.
• The model with the lowest RMS error has the optimal number of components/descriptors.
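The loop above can be sketched as 5-fold cross-validation, where each fold removes about 20% of the compounds, the model is rederived, and the omitted values are predicted. Plain least squares stands in for PLS here to keep the sketch self-contained, and the data are synthetic; only the CV loop itself is the point.

```python
import numpy as np

def five_fold_rms(X, y, n_desc, seed=0):
    """RMS prediction error over 5 folds, using the first n_desc descriptors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, 5)                  # each fold holds out ~20%
    sq_errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)             # rederive on the rest
        coef, *_ = np.linalg.lstsq(X[train][:, :n_desc], y[train], rcond=None)
        pred = X[fold][:, :n_desc] @ coef           # predict the omitted data
        sq_errs.extend((pred - y[fold]) ** 2)
    return np.sqrt(np.mean(sq_errs))

# Synthetic data: only the first two descriptors actually matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=30)
best = min(range(1, 6), key=lambda k: five_fold_rms(X, y, k))
```

Scanning `n_desc` and keeping the model with the lowest RMS error is exactly the selection rule the slide describes.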
Leave-one-out cross validation
• Leave-one-out cross validation (LOOCV) is K-fold cross validation taken to its logical extreme, with K equal to N, the number of data points in the set.
• That means that N separate times, the function approximator is trained on all the data except for one point and a prediction is made for that point.
• As before, the average error is computed and used to evaluate the model.
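LOOCV can be sketched in a few lines: N separate fits, each leaving one point out and predicting it. A straight-line fit stands in for the model here, and the data are made up.

```python
import numpy as np

def loocv_rms(x, y):
    """Leave-one-out CV: fit on all points but one, predict the omitted point."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        a, b = np.polyfit(x[mask], y[mask], 1)     # refit without point i
        errs.append((a * x[i] + b - y[i]) ** 2)    # squared error on point i
    return np.sqrt(np.mean(errs))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1            # an exact line, so the LOOCV error is essentially zero
```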
• In cross-validation, one value is left out and a model is derived using the remaining data.
• The model is then used to predict the value originally left out. This procedure is repeated for all values, yielding q2.
• q2 is normally (much) lower than r2, and values greater than 0.5 already indicate significant predictive power.
Thank you