
Page 1

Introduction to Machine Learning for Bioinformatics

S1133 2019

Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech

[email protected]

Page 2

Overview
● What kind of problems can machine learning solve, particularly in bioinformatics?
● Some popular supervised ML algorithms:
  – Linear models
  – Support vector machines
  – Random forests
  – Neural networks
● How do we select a machine learning algorithm?
● What is overfitting, and how can we avoid it?
● How do we deal with high-dimensional data?

Page 3

What is (Machine) Learning?

Page 4

Why Learn?
● Learning: acquire a skill from experience, practice.
● Machine learning: programming computers to
  – model phenomena
  – by means of optimizing an objective function
  – using example data.

Page 5

Why Learn?
● There is no need to “learn” to calculate payroll.
● Learning is used when
  – human expertise does not exist (bioinformatics);
  – humans are unable to explain their expertise (speech recognition, computer vision).

Classical program: Rules + Data → Answers
Machine learning program: Data + Answers → Rules

Page 6

What about AI?

Page 7

Artificial Intelligence

● Weak AI: software or a machine focused on one narrow task.
  E.g. Siri (Apple), DeepBlue (IBM), AlphaGo (Alphabet), Alexa (Amazon).

● Strong AI: a machine
  – able to address any problem
  – possessing consciousness, sentience, a mind of its own
  – always active and interacting with the world.
  E.g. Hal (Clarke), Skynet (Cameron), Rachael (Dick), Samantha (Jonze).

Page 8

Artificial Intelligence

ML is a component of AI.

So are robotics, databases, language processing...

Page 9

Zoo of ML Problems 

Page 10

Unsupervised learning

Data (images, text, measurements, omics data...) → ML algo → Data!

Learn a new representation of the data.

[Figure: the data form an n × p matrix X — n samples, p features]

Page 11

Clustering

Data → ML algo → groups of data points

Group similar data points together to
– understand general characteristics of the data;
– infer some properties of an object based on how it relates to other objects.
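To make this concrete, here is a minimal clustering sketch in Python with scikit-learn; k-means and the synthetic blob data are assumptions chosen for illustration, not a method prescribed by the slides.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data: 300 points in 2D drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Group similar points together into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster index assigned to each point

print(labels[:10])               # e.g. [2 0 1 ...]
print(kmeans.cluster_centers_)   # one center per cluster
```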

Page 12

Clustering: applications
– Customer segmentation: find groups of customers with similar buying behavior.
– Topic modeling: find groups of documents with similar content (and thus, hopefully, topics).
– Gene expression clustering.
– Disease subtyping: find groups of patients with a similar molecular basis for the disease / similar symptoms.
– Tumor cell clustering: untangle tumor heterogeneity.

Page 13

Dimensionality reduction

Data (images, text, measurements, omics data...) → ML algo → Data

Find a lower-dimensional representation: the n × p data matrix X becomes an n × m matrix, with m < p.

Page 14

Dimensionality reduction

Data → ML algo → Data

Find a lower-dimensional representation to
– reduce storage space & computational time;
– remove redundancies;
– allow visualization (in 2 or 3 dimensions) and interpretability.

Page 15

Supervised learning

Data + Labels → ML algo → Predictor (decision function)

Make predictions: from the n × p data matrix X and the n labels y, learn a decision function that maps a sample to its label.

Page 16

Classification

Make discrete predictions:
– Binary classification
– Multi-class classification

Data + Labels → ML algo → Predictor

Page 17

Classification

[Figure: samples plotted by gene 1 vs gene 2 expression; a decision boundary separates Healthy from Sick]

Page 18

Classification: Applications
– Face recognition
– Handwritten character recognition
– What is the function of this gene?
– Is this DNA sequence a micro-RNA?
– Does this blood sample come from a diseased or a healthy individual?
– Is this drug appropriate for this patient?
– What side effects could this new drug have?

Page 19

Regression

Make continuous predictions.

Data + Labels → ML algo → Predictor

Page 20

Regression

[Figure: a regression line predicting gene 2 expression from gene 1 expression]

Page 21

Regression: Applications
– Click prediction: how many people will click on this ad? Comment on this post? Share this article on social media?
– Load prediction: how many users will my service have at a given time?
– Gene expression regulation: how much of this gene will be expressed?
– When will this patient relapse?
– Drug efficacy: how well does this drug work on this tumor?
– What is the binding affinity between these two molecules?
– How soluble is this chemical in water?

Page 22

Parametric models
● Decision function has a set form
● Model complexity ≈ number of parameters

Non-parametric models
● Decision function can have “arbitrary” form
● Model complexity grows with the number of samples.

Page 23

Training a supervised model
● Ingredients:
  – Data
  – Hypothesis class: type of decision function f
  – Cost function: error of f on a sample
● Recipe: empirical risk minimization (ERM). Find, among all functions of the hypothesis class, one that minimizes the average error of f on all samples (written out below).
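Written out as a formula (a standard formulation; L below stands for whichever cost function is chosen):

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L\big(f(x_i), y_i\big)$$

where $\mathcal{F}$ is the hypothesis class and $(x_i, y_i)$ are the training samples.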

Page 24

Linear models

Page 25

Linear regression

● Least-squares fit: $\min_\beta \|y - X\beta\|_2^2$
● Equivalent to maximizing the likelihood $p(y \mid X, \beta)$ under the assumption of Gaussian noise.
● Exact solution $\hat{\beta} = (X^\top X)^{-1} X^\top y$ if X has full column rank.
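A runnable sketch of the least-squares fit (synthetic data; `numpy.linalg.lstsq` and the closed form are standard numpy tools chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)    # Gaussian noise

# Least-squares fit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Closed-form solution, valid because X has full column rank here
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_hat, beta_closed))       # True
```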

Page 26

Classification: logistic regression
Linear function → probability: $p(y = 1 \mid x) = \sigma(\beta^\top x)$, with $\sigma(u) = 1 / (1 + e^{-u})$.
● Solve by maximizing the likelihood.
● No analytical solution.
● Use gradient descent.
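A minimal sketch with scikit-learn (the synthetic dataset is an assumption for illustration; scikit-learn fits by an iterative solver):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Fitted by maximizing the (regularized) likelihood iteratively
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

print(clf.predict_proba(X[:3]))  # linear scores mapped to probabilities
```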

Page 27

Support Vector Machines

Page 28

Large margin classifier
● Find the separating hyperplane $f(x) = \langle w, x \rangle + b$ with the largest margin.
● The objective combines two terms, the prediction error and the inverse of the margin; in the standard soft-margin formulation:
  $\min_{w, b} \; \underbrace{\textstyle\sum_i \max(0,\, 1 - y_i f(x_i))}_{\text{prediction error}} + \underbrace{\lambda \|w\|_2^2}_{\text{inverse of the margin}}$
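A sketch with scikit-learn's linear SVM (toy data assumed; the parameter C sets the trade-off between the two terms of the objective above):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Small C -> wider margin tolerated, at the cost of more training errors
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)  # w and b of the separating hyperplane
```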

Page 29

When lines are not enough

Page 30

When lines are not enough

Page 31

When lines are not enough

● Non-linear mapping to a feature space

Page 32

When lines are not enough
● Non-linear mapping to a space of higher dimension.
● https://www.youtube.com/watch?v=3liCbRZPrZA
● Example: a mapping such as $\phi(x_1, x_2) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$ turns a quadratic boundary in the input space into a linear one in the feature space.

Page 33

The kernel trick
● The solution & the SVM-solving algorithm can be expressed using only inner products in the feature space, $k(x, x') = \langle \phi(x), \phi(x') \rangle$.
● Never need to explicitly compute $\phi(x)$.
● k: kernel
  – must be positive semi-definite;
  – can be interpreted as a similarity function.
● Support vectors: training points for which α ≠ 0.

Page 34

RBF kernel
● Radial Basis Function kernel, or Gaussian kernel: $k(x, x') = \exp\!\left(-\|x - x'\|^2 / (2\sigma^2)\right)$
● The feature space is infinite-dimensional.
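A sketch of a kernel SVM with scikit-learn (the half-moons toy data are an assumption; scikit-learn's gamma corresponds to 1/(2σ²) in the formula above):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: no separating line exists in input space
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=1.0)   # gamma = 1 / (2 sigma^2)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("number of support vectors:", len(clf.support_))
```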

Page 35

Random Forests

Page 36

Decision trees

● Well suited to categorical features
● Naturally handle multi-class classification
● Interpretable
● Perform poorly.

[Figure: a decision tree classifying fruit. Color? If green: Size? (big → Apple, small → Grape). If yellow: Shape? (curved → Banana, ball → Apple)]

Page 37

Ensemble learning
● Combining weak learners averages out their individual errors (wisdom of crowds).
● Final prediction:
  – Classification: majority vote
  – Regression: average.
● Bagging: weak learners are trained on bootstrapped samples of the data (Breiman, 1996).
  Bootstrap: sample n points, with replacement.
● Boosting: weak learners are built iteratively, based on performance (Schapire, 1990).

Page 38

Random forests
● Combine many decision trees.
● Each tree is trained on a data set created using
  – a bootstrap sample (sample with replacement) of the data;
  – a random sample of the features.
● Final prediction: majority vote.
● Very powerful in practice.
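A sketch with scikit-learn (the synthetic data stand in for, e.g., expression profiles; all settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=10, random_state=0)

# Each tree sees a bootstrap sample of the rows; each split considers
# only a random subset of the features (max_features)
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=0)

print(cross_val_score(rf, X, y, cv=5).mean())  # majority vote over trees
```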

Page 39

(Deep) Neural Networks

Page 40

(Deep) neural networks

[Figure: feed-forward network; the inputs feed a hidden layer, which feeds the output]

● Nothing more than a (possibly complicated) parametric model.

Page 41

(Deep) neural networks

● Fitting the weights:
  – non-convex optimization problem;
  – solved with gradient descent;
  – can be difficult to tune.

Page 42

(Deep) neural networks

● Learn an internal representation of the data on which a linear model works well.
● Currently one of the most powerful supervised learning algorithms for large training sets.
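A minimal sketch with scikit-learn's multi-layer perceptron (a small fully connected network; the architecture and data are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# One hidden layer of 32 units; weights fitted by a gradient-based solver
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mlp.fit(X, y)

print(mlp.score(X, y))  # accuracy on the training data (optimistic!)
```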

Page 43

(Deep) neural networks

[Figure: internal representation of the digits data. Yann Le Cun et al. (1990)]

Page 44

Adversarial examples

[Figure: an image of a panda, plus an imperceptible perturbation, is classified as a gibbon]

Goodfellow et al., ICLR 2015. https://arxiv.org/pdf/1412.6572v3.pdf

Page 45

October 14: Deep learning for image recognition

Source: DOI: 10.1109/ICPR.2016.7900002

Page 46

Generalization & overfitting

Page 47

Generalization
● Goal: build models that make good predictions on new data.
● Models that work “too well” on the data we learn on tend to model noise as well as the underlying phenomenon: overfitting.

Page 48

Overfitting (Classification)

Page 49

Overfitting (Regression)

[Figure: an overly flexible fit (overfitting) vs a too-rigid fit (underfitting)]

Page 50

Model complexity
Simple models are
– more plausible (Occam’s Razor);
– easier to train, use, and interpret.

[Figure: prediction error vs model complexity; the error on training data decreases steadily, while the error on new data rises again once the model gets too complex]

Page 51

Regularization
● Prevent overfitting by designing an objective function that accounts not only for the prediction error but also for model complexity:

min (empirical_error + λ*model_complexity)

● Remember the SVM: its objective already combines a prediction-error term with the inverse of the margin.

Page 52

Ridge regression

$\min_\beta \; \underbrace{\|y - X\beta\|_2^2}_{\text{prediction error}} + \underbrace{\lambda}_{\text{hyperparameter}} \underbrace{\|\beta\|_2^2}_{\text{regularizer}}$

● Unique solution, always exists: $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$.
● Grouped selection: correlated variables get similar weights.
● Shrinkage: coefficients shrink towards 0.
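A sketch with scikit-learn (its alpha is the hyperparameter λ above; the p > n data are synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 50, 200                    # more features than samples
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=n)

ridge = Ridge(alpha=1.0)          # alpha plays the role of lambda
ridge.fit(X, y)                   # unique solution even though p > n

print(ridge.coef_[:5])            # coefficients shrunk towards 0
```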

Page 53

Geometry of ridge regression

[Figure: elliptical level sets of the prediction error around the unconstrained minimum; the ridge solution lies where they meet the disk-shaped feasible region $\|\beta\|_2 \leq t$]

Page 54

Model selection & evaluation

Page 55

No free lunch theorem

● Any two optimization algorithms are equivalent when their performance is averaged over all possible problems.

● For any learner, there is a learning task that it will not learn well.

Wolpert & Macready, 1997

Page 56

Evaluation on held-out data
● If we evaluate the model on the data we’ve used to train it, we risk over-estimating performance.
● Proper procedure:
  – Separate the data into train/test sets.
  – Use cross-validation on the train set to find the best algorithm + hyperparameter(s).
  – Train this best algorithm + hyperparameter(s) on the entire train set.
  – The performance on the test set estimates generalization performance (sketched in code below).
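The procedure, sketched with scikit-learn (the SVM and its hyperparameter grid are assumptions chosen for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# 1. Separate the data into train/test sets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Cross-validation on the train set to pick the hyperparameter(s)
grid = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10]}, cv=5)
grid.fit(X_tr, y_tr)   # 3. also refits the best model on the whole train set

# 4. Performance on the untouched test set estimates generalization
print(grid.best_params_, grid.score(X_te, y_te))
```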

Page 57

Evaluation on held-out data

[Figure: the data are split into a Train set and a Test set]

Page 58


Evaluation on held-out data

[Figure: 5-fold cross-validation within the train set: each fold serves once as the validation set (Val) while the remaining folds are used to train; the five performance measures (perf) are averaged into the CV score. The held-out Test set is untouched.]


Page 60

Practical

Page 61

ML in high dimension

Page 62

“Big data” vs “fat data”

Challenges when p >> n:
– computational challenges (e.g. linear regression);
– curse of dimensionality;
– higher risk of overfitting.

Page 63

Curse of dimensionality
● Methods/intuitions that work in low dimension do not necessarily apply in high dimension.
● Hyperspace is very big and everything is far apart: most points inside a cube are outside of the sphere inscribed in this cube.
● Dimensionality reduction:
  – Feature extraction
  – Feature selection.

[Figure: a sphere of radius r inscribed in a cube]

Page 64

Feature extraction

● Project the data onto a lower-dimensional space.
● Creates new features → interpretability?
● Matrix factorization techniques: PCA & kPCA, factorial analysis, non-negative matrix factorization.
● Manifold learning techniques: MDS, tSNE.
● Autoencoders: neural networks whose goal is to recover the input as well as possible.

Page 65

Principal Components Analysis
● Find a space of dimension m s.t. the variance of the data projected onto that space is maximal.

[Figure: 2D point cloud (projection on x1 vs projection on x2); the leading principal component follows the direction of largest spread]

Page 66

Principal Components Analysis
● Find a space of dimension m s.t. the variance of the data projected onto that space is maximal.
● Find orthonormal directions $w_1, \dots, w_m$ maximizing the variance of the data projected onto each $w_j$.
● These directions, called principal components, are the eigenvectors of $X^\top X$ (for centered data), sorted by decreasing eigenvalue.
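A sketch with scikit-learn (the correlated synthetic data are an assumption; PCA centers the data internally):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated features

pca = PCA(n_components=2)     # keep the 2 directions of maximal variance
X_2d = pca.fit_transform(X)   # n x 2 projection of the data

print(pca.explained_variance_ratio_)  # share of variance per component
```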

Page 67

Principal Components Analysis

[Figure: the first two principal components of SNP data (Affymetrix 500K) for 1387 Europeans mirror the geographic map of Europe. Novembre et al., Nature 2008]

Page 68

Supervised feature selection
Keep relevant features only.
● Filter approaches: apply a statistical test to assign a score to each feature.
● Wrapper approaches: use a greedy search to find the best set of features for a given ML algorithm.
● Embedded approaches: fit a sparse model, i.e. one that is encouraged not to use all the features.
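A sketch of the filter approach with scikit-learn (the univariate ANOVA F-test and the synthetic data are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=100, n_features=1000,
                           n_informative=10, random_state=0)

# Filter approach: score each feature independently with an ANOVA F-test
selector = SelectKBest(score_func=f_classif, k=10)
X_sel = selector.fit_transform(X, y)

print(X_sel.shape)  # (100, 10): only the 10 best-scoring features remain
```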

Page 69

A p>>n simulation

p=1 000, n=100, 10 causal features.


Page 71

A p>>n simulation

[Figure: feature scores assigned by a statistical test on the simulated data]

Page 72

A p>>n simulation

[Figure: feature weights obtained by linear regression on the simulated data]

Page 73

Lasso

$\min_\beta \; \underbrace{\|y - X\beta\|_2^2}_{\text{prediction error}} + \lambda \underbrace{\|\beta\|_1}_{\text{regularizer}}$

[Figure: level sets of the prediction error around the unconstrained minimum meet the diamond-shaped feasible region $\|\beta\|_1 \leq t$ at a corner, where some coefficients are exactly 0]

● Creates sparse models: drives coefficients of β towards 0.
● No explicit solution; solved iteratively (e.g. by coordinate descent, since the ℓ1 term is not differentiable at 0).
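A sketch reproducing the p >> n setting with scikit-learn (its alpha is the λ above; the data are simulated as on the previous slides):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 1000                      # p >> n, as in the simulation
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 1.0                       # 10 causal features
y = X @ beta + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1)              # alpha sets the strength of the L1 penalty
lasso.fit(X, y)

print((lasso.coef_ != 0).sum())       # sparse model: few non-zero coefficients
```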

Page 74

A p>>n simulation

[Figure: feature weights obtained by the Lasso on the simulated data]

Page 75

ElasticNet
● Lasso tends to be unstable:
  – picks one of several correlated variables at random;
  – gives different results on similar data sets.
● ElasticNet: combine Ridge with Lasso, $\min_\beta \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$.
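A sketch with scikit-learn (in its parametrization, alpha scales the total penalty and l1_ratio balances the L1 and L2 terms; the data mirror the Lasso sketch):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))
y = X[:, :10].sum(axis=1) + 0.1 * rng.normal(size=100)

# l1_ratio=0.5: half-way between pure Ridge (0) and pure Lasso (1)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)

print((enet.coef_ != 0).sum())  # still sparse, but more stable than pure Lasso
```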

Page 76

ML Toolboxes
● Python: scikit-learn http://scikit-learn.org
● R: Machine Learning Task View http://cran.r-project.org/web/views/MachineLearning.html
● Matlab™: Machine Learning with MATLAB http://mathworks.com/machine-learning/index.html
  – Statistics and Machine Learning Toolbox
  – Deep Learning Toolbox
● Deep learning software:
  – Theano http://deeplearning.net/software/theano/
  – TensorFlow http://www.tensorflow.org/
  – Caffe http://caffe.berkeleyvision.org/
  – Keras https://keras.io/


Page 78

Summary
Machine learning = data + model + objective function
● Catalog:
  – Supervised vs unsupervised
  – Parametric vs non-parametric
  – Linear models, SVMs, random forests, neural networks.
● Key concerns:
  – Avoid overfitting ⇒ regularization
  – Measure generalization performance.
● p >> n setting:
  – Feature selection
  – Sparsity (Lasso, ElasticNet)