Page 1: Pfizer HTS Machine Learning Algorithms: November 2002

Paul Hsiung ([email protected])
Paul Komarek ([email protected])
Ting Liu ([email protected])
Andrew W. Moore ([email protected])

Auton Lab, Carnegie Mellon University, School of Computer Science
www.autonlab.org

Auton Lab, www.autonlab.org HTS Results, 11/25/2: Slide 2

Datasets

Our Name   Num. Records   Num. Attributes   Num. Non-zero Input Cells   Num. Positive Outputs
train1     26,733         6,348             3.7M                        804
test1      1,456          6,121             0.2M                        878
jun-3-1    88,358         1,143,054         30M                         423
combined   88,358         1,143,054         30M                         211

Descriptions:
train1: The original dataset sent to CMU in Feb 2002.
test1: The test set associated with the above training set.
jun-3-1: The large “TEST3” dataset sent to us in May 2002. The “-1” suffix denotes that we used the first of the four activation columns.
combined: Built from the four activation columns of the “TEST3” dataset. Its activation is positive if and only if at least two of the four original activations were positive.
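The labeling rule for the combined dataset can be sketched as below. It assumes each record's four activation columns are already available as 0/1 flags; the function name `combined_activation` is hypothetical.

```python
def combined_activation(activations):
    """Return 1 if at least two of the four activation flags are positive, else 0."""
    if len(activations) != 4:
        raise ValueError("expected exactly four activation columns")
    return 1 if sum(1 for a in activations if a) >= 2 else 0


rows = [
    (1, 0, 0, 0),  # only one activation: not positive in combined
    (1, 1, 0, 0),  # two activations: positive
    (0, 1, 1, 1),  # three activations: positive
    (0, 0, 0, 0),  # none: not positive
]
print([combined_activation(r) for r in rows])  # [0, 1, 1, 0]
```

By contrast, the jun-3-1 label is simply the first of the four flags, which is why jun-3-1 has more positives (423) than combined (211).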

Projections

Our Name   Name of original   Name of 100-dim projection   Name of 10-dim projection
train1     train1             train100                     train10
test1      test1              test100                      test10
train1     train1             train-pls-100                train-pls-10
test1      test1              test-pls-100                 test-pls-10
jun-3-1    n/a                jun-3-1                      n/a
combined   n/a                combined                     n/a
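The PLS-proj entry among the new algorithms says PLS was used "instead of PCA", which suggests that the train100/train10 and test100/test10 projections were made with PCA. Under that assumption, a minimal plain-Python sketch of such a projection follows: principal directions are found on the training rows by power iteration, orthogonalizing against directions already found, and test rows are centered with the training means before projecting. The function names are hypothetical, and the dense scatter matrix is only practical at toy sizes; a real 6,348-attribute run would call for a sparse or randomized method.

```python
import math
import random


def column_means(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def principal_directions(rows, k, iters=200, seed=0):
    """Top-k principal directions of `rows`, by power iteration on the scatter matrix."""
    means = column_means(rows)
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    d = len(means)
    # Dense d x d scatter matrix X^T X: fine for a toy, far too big for real HTS data.
    scatter = [[sum(r[i] * r[j] for r in centered) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    directions = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(scatter[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Keep the new direction orthogonal to the ones already found.
            for u in directions:
                proj = sum(a * b for a, b in zip(w, u))
                w = [a - proj * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                break
            v = [a / norm for a in w]
        directions.append(v)
    return means, directions


def project(rows, means, directions):
    """Coordinates of each row (centered with the *training* means) along each direction."""
    return [
        [sum((x - m) * u for x, m, u in zip(r, means, direction)) for direction in directions]
        for r in rows
    ]


train = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
means, dirs = principal_directions(train, k=1)
test_coords = project([[3.0, 6.0]], means, dirs)
print(dirs, test_coords)  # direction is +/-(1, 2)/sqrt(5); the sign depends on the start vector
```

Projecting test data with the training means and directions, rather than refitting on the test set, keeps test1 and test100/test10 describing the same records in a space learned only from training data.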

Previous Algorithms

BC: Bayes Classifier
On the original data, a naïve categorical classifier was used. On real-valued projected data, a naïve Gaussian classifier was used.

Dtree: Decision Tree
This technique is also known as Recursive Partitioning and CART. It was implemented only for the original data.

SVM: Support Vector Machine
Except where stated otherwise, a linear SVM was used. We could not find a significant performance difference between a linear SVM and a radial basis function (RBF) SVM across a variety of RBF parameters.

k-NN: k-nearest neighbor
Except where stated otherwise, k = 9 neighbors were used. Implemented only for projected data.

LR: Logistic Regression
Except where stated otherwise, we used conjugate gradient to perform the intermediate weighted regressions, using a newly developed technique.
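The LR entry says the intermediate weighted regressions were solved with conjugate gradient. That matches iteratively reweighted least squares (IRLS), where each Newton step for logistic regression is a weighted least-squares problem. A minimal plain-Python sketch of the idea follows; the small ridge term, the iteration counts, and all function names are assumptions, and this is not the Auton Lab's newly developed technique itself.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a, b, max_iters=100, tol=1e-12):
    """Solve A x = b for symmetric positive-definite A, given only a function v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x = 0
    p = list(r)
    rs = dot(r, r)
    stop = tol * tol * dot(b, b)  # relative tolerance on the residual norm
    for _ in range(max_iters):
        if rs <= stop:
            break
        ap = apply_a(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def logistic_regression_irls_cg(X, y, ridge=1e-3, outer_iters=25):
    """IRLS: each Newton step is a weighted least-squares solve, done here by CG."""
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(outer_iters):
        eta = [dot(row, beta) for row in X]
        mu = [sigmoid(e) for e in eta]
        w = [max(m * (1.0 - m), 1e-12) for m in mu]                   # IRLS weights
        z = [e + (t - m) / wi for e, t, m, wi in zip(eta, y, mu, w)]  # working response

        def apply_a(v, w=w):
            # (X^T W X + ridge * I) v, computed without forming X^T W X.
            out = [ridge * vj for vj in v]
            for row, wi in zip(X, w):
                s = wi * dot(row, v)
                for j in range(d):
                    out[j] += s * row[j]
            return out

        rhs = [0.0] * d
        for row, wi, zi in zip(X, w, z):
            for j in range(d):
                rhs[j] += wi * zi * row[j]
        beta = conjugate_gradient(apply_a, rhs)
    return beta


xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, -1.5, 1.5, 0.2]
ys = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
X = [[1.0, x] for x in xs]  # leading 1.0 is the intercept column
beta = logistic_regression_irls_cg(X, ys)
print(beta)
```

CG only needs matrix-vector products with X and its transpose, so the weighted normal equations never have to be formed explicitly; that property is what makes it a reasonable fit for wide, sparse HTS data.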

New Algorithms

new-KNN: Tractable high-dimensional k-nearest neighbor
Can work on the 1,000,000-dimensional “June” data.

EFP: Explicit False Positive Logistic Regression
Logistic regression that accounts for the high false positive rate.

SMod: Super Model
Automatically combines the predictions from multiple algorithms with a “meta-level” of logistic regression.

PLS-proj: Partial Least Squares Projection
Uses PLS instead of PCA to project the data down to fewer dimensions.

PLS: Partial Least Squares Prediction
Uses the PLS algorithm as a predictor.
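The slides do not say how new-KNN was made tractable on the million-attribute June data, so the following is only an illustration of one standard trick for sparse 0/1 data, not the Auton Lab algorithm. Each record is stored as the set of its active attribute ids, an inverted index finds the records that share attributes with the query, and squared Euclidean distance is recovered from set sizes and overlap counts. The default k = 9 follows the k-NN entry above; every name here is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each active attribute id to the ids of the records that contain it."""
    index = defaultdict(list)
    for rid, attrs in enumerate(records):
        for a in attrs:
            index[a].append(rid)
    return index


def knn_score(query, records, labels, index, k=9):
    """Fraction of positives among the k nearest records (squared Euclidean on 0/1 vectors).

    Only the query's own attributes are looked up, so the cost grows with the number of
    non-zeros rather than with the total number of attributes. Every record is still
    ranked, so this sketch remains linear in the number of records.
    """
    overlap = defaultdict(int)
    for a in query:
        for rid in index.get(a, ()):
            overlap[rid] += 1
    # For 0/1 vectors, |q - r|^2 = |q| + |r| - 2 * |q & r|.
    dists = sorted(
        (len(query) + len(attrs) - 2 * overlap[rid], rid) for rid, attrs in enumerate(records)
    )
    nearest = dists[:k]
    return sum(labels[rid] for _, rid in nearest) / len(nearest)


records = [{1, 2, 3}, {1, 2}, {7, 8}, {9}]
labels = [1, 1, 0, 0]
index = build_inverted_index(records)
print(knn_score({1, 2, 3}, records, labels, index, k=3))  # nearest are records 0, 1 and 3
```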

Explicit False Positive Model

Explicit False Positive Model
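The slides do not spell out the EFP likelihood. One natural reading, and it is only an assumption here, is that an inactive compound is still reported as a hit with some false-positive probability eps, so that P(reported positive | x) = eps + (1 - eps) * sigmoid(w . x), where sigmoid(w . x) is the probability of true activity. The sketch below fits w under that assumed model by plain gradient ascent with eps held fixed; all names are hypothetical.

```python
import math


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def observed_positive_prob(row, w, eps):
    """Assumed EFP model: P(reported positive) = eps + (1 - eps) * P(truly active)."""
    p = sigmoid(sum(a * b for a, b in zip(row, w)))
    return eps + (1.0 - eps) * p


def efp_log_likelihood(X, y, w, eps):
    total = 0.0
    for row, t in zip(X, y):
        q = observed_positive_prob(row, w, eps)
        total += math.log(q) if t else math.log(1.0 - q)
    return total


def fit_efp(X, y, eps, lr=0.1, iters=2000):
    """Gradient ascent on the mean log-likelihood of the assumed EFP model."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for row, t in zip(X, y):
            p = sigmoid(sum(a * b for a, b in zip(row, w)))
            q = eps + (1.0 - eps) * p
            # d log q / dz = (1 - eps) p (1 - p) / q ;  d log(1 - q) / dz = -p
            coef = (1.0 - eps) * p * (1.0 - p) / q if t else -p
            for j in range(d):
                grad[j] += coef * row[j]
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


xs = [1.0, 1.5, 2.0, 2.5, 3.0, -1.0, -1.5, -2.0, -2.5, -3.0, -2.0]
ys = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # the last record is a false positive among inactives
X = [[1.0, x] for x in xs]
w = fit_efp(X, ys, eps=0.1)
print(w)
```

With eps = 0 the gradient reduces to that of ordinary logistic regression, which is the comparison drawn in the two-dimensional example slides that follow.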

Example in 2 dimensions: Decision Boundary

Example in 2 dimensions: 100 true positives

100 true positives and 100 true negatives

100 TP, 100 TN, 10 FP

Using regular logistic regression

Using EFP Model

Example: 10000 true positives

10000 true positives, 10000 true negatives

10000 TP, 10000 TN, 1000 FP

Using regular logistic regression

Using EFP Model

EFP Model Real Data Results (K-fold)

EFP Effect

…Very impressive on Train1 / Test1

Log X-axis

EFP Effect

…Unimpressive on jun31 / jun32

Super Model
• Divide the training set into Compartment A and Compartment B.
• Learn each of the N models on Compartment A.
• Use each of the N models to predict on Compartment B.
• Learn the best weighting of their opinions with a logistic regression of the Compartment B labels on those predictions.
• Apply the models and their weights to the test data.
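The five steps above can be sketched in plain Python as follows. The two base learners are deliberately trivial stand-ins (a centroid-difference scorer and a k-NN vote), the compartments are formed by alternating records, and the meta-level logistic regression is fitted by plain gradient ascent. All of these choices and names are assumptions for illustration, not the SMod implementation.

```python
import math


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_centroid_scorer(X, y):
    """Toy base model: score = x . (mean of positives - mean of negatives)."""
    d = len(X[0])
    pos = [x for x, t in zip(X, y) if t]
    neg = [x for x, t in zip(X, y) if not t]
    diff = [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg) for j in range(d)]
    return lambda x: sum(a * b for a, b in zip(x, diff))


def fit_knn_scorer(X, y, k=3):
    """Toy base model: fraction of positives among the k nearest training records."""
    def score(x):
        nearest = sorted((sum((a - b) ** 2 for a, b in zip(x, r)), t) for r, t in zip(X, y))[:k]
        return sum(t for _, t in nearest) / k
    return score


def fit_meta_logistic(F, y, lr=0.1, iters=2000):
    """Meta-level logistic regression of labels on base-model predictions (plus intercept)."""
    w = [0.0] * (len(F[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for f, t in zip(F, y):
            row = [1.0] + list(f)
            err = t - sigmoid(sum(wi * ri for wi, ri in zip(w, row)))
            for j, rj in enumerate(row):
                grad[j] += err * rj
        w = [wj + lr * g / len(F) for wj, g in zip(w, grad)]
    return w


def fit_super_model(X, y, base_fitters):
    # 1. Divide the training set into compartments A and B (alternating records here).
    a_idx, b_idx = range(0, len(X), 2), range(1, len(X), 2)
    # 2. Learn each of the N models on compartment A.
    models = [fit([X[i] for i in a_idx], [y[i] for i in a_idx]) for fit in base_fitters]
    # 3. Predict with each model on compartment B.
    preds_b = [[m(X[i]) for m in models] for i in b_idx]
    # 4. Learn the weighting of their opinions by logistic regression on compartment B.
    w = fit_meta_logistic(preds_b, [y[i] for i in b_idx])

    # 5. Apply the models and their weights to new (test) records.
    def predict(x):
        row = [1.0] + [m(x) for m in models]
        return sigmoid(sum(wi * ri for wi, ri in zip(w, row)))

    return predict


X = [[i / 10 - 2.0] for i in range(40)]
y = [1 if x[0] > 0 else 0 for x in X]
smod = fit_super_model(X, y, [fit_centroid_scorer, fit_knn_scorer])
print(smod([1.5]), smod([-1.5]))
```

Fitting the weights on Compartment B rather than on the data the base models were trained on keeps the meta-level from rewarding models that merely memorized Compartment A.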

Comparison

Log X-Axis Scale

Comparison on 100-dims

Log X-axis

Comparison on 10 dims

Log X-axis

NewKNN summary of results and timings

PLS summary of results
• PLS projections did not do so well.
• However, PLS as a predictor performed well, especially on train100/test100.
• PLS is fast: runtimes range from 1 to 10 minutes.
• But PLS takes large amounts of memory, and it cannot be used with a sparse representation because of the update made on each iteration.
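For a single activity column, PLS as a predictor can be sketched with the textbook NIPALS PLS1 recursion below. It is assumed here as a representative formulation, not necessarily the variant that was benchmarked, and the function names are hypothetical. The rank-one deflation of X on every iteration illustrates the memory point above: even if X starts out sparse, subtracting t p^T fills it in.

```python
def pls1_fit(X, y, n_components):
    """NIPALS PLS1 for one response column. Returns what is needed to predict."""
    n, d = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(d)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(d)] for r in X]  # dense working copy of X
    f = [t - y_mean for t in y]
    W, P, C = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(d)]  # X^T y
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(d)) for i in range(n)]  # scores (the projection)
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(d)]  # loadings
        c = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflation: this rank-one update makes E dense even when X is sparse.
        E = [[E[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        C.append(c)
    return x_mean, y_mean, W, P, C


def pls1_predict(model, x):
    """Replay the same centering and deflation on a new record, accumulating c * t."""
    x_mean, y_mean, W, P, C = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, c in zip(W, P, C):
        t = sum(a * b for a, b in zip(e, w))
        y_hat += c * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


X2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
y2 = [3 * a - b + 2 for a, b in X2]  # exactly linear, so two components recover it
model = pls1_fit(X2, y2, n_components=2)
print(pls1_predict(model, [2.0, 2.0]))
```

The scores t computed inside the loop are the PLS projection that PLS-proj uses in place of PCA; keeping only the first 10 or 100 of them gives the train-pls-10 and train-pls-100 style datasets.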

Summary of results
• SVM is best early on in Train1; LR is better in the long haul.
• Projecting to 10 dimensions is always a disaster.
• Projecting to 100 dimensions is often indistinguishable from using the original data (and much cheaper).
• The Naïve Gaussian Bayes Classifier is best on JUN-3-1 (k-NN is better in the long haul).
• The Naïve Gaussian Bayes Classifier is best on combined.
• Non-linear SVM never seems distinguishable from linear SVM.
• Every method has won in at least one context, except Dtree.

Some AUC Results

Experiment                            Algorithm             AUC
Train on Train1, then test on Test1   Linear SVM            0.876*
                                      Best non-linear SVM   0.875*
                                      BC                    0.867*
                                      LR                    0.71
                                      KNN                   0.872*
                                      DTree                 0.70
Combined                              SVM                   0.638
                                      BC                    0.700
                                      LR                    0.606
                                      KNN                   0.603

* = Not statistically significantly different from one another
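AUC is presumably the area under the ROC curve. It equals the probability that a randomly chosen positive is scored above a randomly chosen negative (the Mann-Whitney rank statistic), which gives a short way to compute it. The sketch below counts tied scores as one half; the function name is hypothetical, and it does not reproduce the significance test behind the asterisks.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic; ties count 1/2."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tied block
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75: positives win 3 of the 4 pairs
```

An AUC of 0.5 corresponds to random ordering and 1.0 to every active ranked above every inactive, which puts the combined-dataset numbers (0.603 to 0.700) much closer to chance than the Test1 numbers.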

Some AUC Results

Experiment                           Algorithm    AUC
10-fold cross-validation on Train1   Linear SVM   0.919
                                     BC           0.885
                                     LR           0.933
                                     DTree        0.894
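A 10-fold cross-validation loop of the kind behind these numbers can be sketched as follows. The slides do not say how the folds were formed; dealing positives and negatives out round-robin, as done here, is an assumption that keeps the rare positives spread evenly across folds. Pooling the out-of-fold scores into one AUC, rather than averaging per-fold AUCs, is likewise only one possible convention. Names are hypothetical, and the toy `base_rate_fit` stands in for any of the learners above.

```python
def stratified_folds(labels, k=10):
    """Give each record a fold id in 0..k-1, dealing positives and negatives round-robin."""
    seen = {0: 0, 1: 0}
    folds = []
    for label in labels:
        c = 1 if label else 0
        folds.append(seen[c] % k)
        seen[c] += 1
    return folds


def cross_validated_scores(X, y, fit, k=10):
    """Out-of-fold scores: every record is scored by a model that never saw it in training."""
    folds = stratified_folds(y, k)
    scores = [None] * len(X)
    for f in range(k):
        train = [i for i in range(len(X)) if folds[i] != f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(len(X)):
            if folds[i] == f:
                scores[i] = model(X[i])
    return scores


def base_rate_fit(X_train, y_train):
    """Trivial stand-in learner: scores every record with the training positive rate."""
    rate = sum(y_train) / len(y_train)
    return lambda x: rate


y = [1 if i % 9 == 0 else 0 for i in range(100)]  # 12 positives out of 100
X = [[float(i)] for i in range(100)]
scores = cross_validated_scores(X, y, base_rate_fit)
print(stratified_folds([1, 0, 0, 1, 0, 1], k=2))  # [0, 0, 1, 1, 0, 0]
```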