The Chicken Project

Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data
By L. Shen and E.C. Tan

Name of student: Kung-Hua Chang
Date: July 8, 2005
SoCalBSI, California State University at Los Angeles



TRANSCRIPT

Page 1

The Chicken Project

Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data
By L. Shen and E.C. Tan

Name of student: Kung-Hua Chang
Date: July 8, 2005
SoCalBSI, California State University at Los Angeles

Page 2

Background

Microarray data have the characteristic that the number of samples is much smaller than the number of variables (genes).

This causes the "curse of dimensionality" problem.

To address this problem, dimension reduction methods such as Singular Value Decomposition (SVD) and Partial Least Squares (PLS) are applied before classification.
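The sample-versus-variable imbalance can be made concrete with a small sketch. The shapes below are hypothetical, chosen only to mimic the microarray regime of few samples and many genes:

```python
import numpy as np

# Hypothetical microarray-like shape: far fewer samples than genes.
n_samples, n_genes = 20, 1000
X = np.random.default_rng(0).normal(size=(n_samples, n_genes))

# The rank of X (and hence of X^T X) is at most n_samples, so the
# 1000 x 1000 least squares system X^T X w = X^T y is singular:
# infinitely many coefficient vectors fit the training data exactly.
rank = np.linalg.matrix_rank(X)
print(rank)  # 20: far below the 1000 needed for a unique solution
```

This rank deficiency is exactly why a dimension reduction step (SVD or PLS) is needed before fitting a classifier.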

Page 3

Background (cont’d)

Singular Value Decomposition and Partial Least Squares.

Given an m × n matrix X that stores all of the gene expression data, X can be approximated as:
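The equation image on this slide did not survive extraction. For SVD, the approximation referred to is presumably the standard truncated (rank-k) decomposition:

```latex
X \approx U_k \Sigma_k V_k^{\mathsf{T}}
```

where U_k (m × k) and V_k (n × k) have orthonormal columns, Σ_k is the k × k diagonal matrix of the k largest singular values, and k ≪ min(m, n). The projections onto these leading components serve as the reduced set of variables fed to the classifier.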

Page 4

Background (cont’d)

Page 5

Background (cont’d)

Logistic regression and least squares regression.

Both fit a linear model to a set of points; least squares predicts a continuous response directly, while logistic regression models the probability of class membership.

Page 6

Background (cont’d)

The difference is that logistic regression is solved iteratively: a trial equation is fitted and refined over and over to improve the fit, and the iterations stop when the improvement from one step to the next is suitably small.

Least squares regression, in contrast, can be solved explicitly in closed form.
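The contrast can be sketched in a few lines. The toy data below are hypothetical (not microarray data); the logistic fit uses Newton's method (iteratively reweighted least squares), which is the standard iterative scheme, with a tiny ridge term added purely for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 points, 2 features, noisy binary labels.
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(scale=1.0, size=200) > 0).astype(float)

# Least squares: solved explicitly, one linear solve of the normal equations.
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Logistic regression: solved iteratively; each step solves a small linear
# system, and we stop when the update is suitably small.
w = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ w))        # current probability estimates
    grad = X.T @ (y - p)                    # gradient of the log-likelihood
    H = X.T @ (X * (p * (1 - p))[:, None])  # (negative) Hessian
    step = np.linalg.solve(H + 1e-8 * np.eye(2), grad)  # tiny ridge for stability
    w += step
    if np.linalg.norm(step) < 1e-8:         # improvement is suitably small
        break
```

The least squares line takes one solve; the logistic loop typically needs only a handful of Newton steps to converge on data like this.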

Page 7

Background (cont’d)

Penalized logistic regression is ordinary logistic regression with a penalty (cost) term added to the log-likelihood, which discourages large coefficients and stabilizes the fit when there are many variables.
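With an L2 (ridge) penalty — one common choice; the paper's specific penalty may differ — the penalized log-likelihood to maximize is:

```latex
\ell_\lambda(\beta) \;=\; \sum_{i=1}^{n}\bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr] \;-\; \frac{\lambda}{2}\,\lVert\beta\rVert^2,
\qquad p_i = \frac{1}{1 + e^{-x_i^{\mathsf{T}}\beta}}
```

where λ ≥ 0 controls the cost: larger λ shrinks the coefficients more strongly, and λ = 0 recovers ordinary logistic regression.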

Page 8

Background (cont’d)

Support Vector Machine (SVM): an SVM tries to find a hyperplane that separates the different classes of data. With a nonlinear kernel, the resulting decision boundary is not linear.

Page 9

Hypothesis

The combination of dimension reduction and penalized logistic regression has the best performance compared to the support vector machine and least squares regression.

Page 10

Data Analysis

The table on this slide shows the number of training and testing cases in each of the seven publicly available cancer data sets.

Page 11

Data Analysis (cont’d)

Page 12

Data Analysis (cont’d)

Page 13

Data Analysis

Page 14

Data Analysis

Generally, the partial least squares-based classifier requires less time than the singular value decomposition-based classifier.

Page 15

Data Analysis (cont’d)

Penalized logistic regression training requires solving a set of linear equations iteratively until convergence, while least squares regression training requires solving a set of linear equations only once. So it is reasonable that penalized logistic regression takes more time than least squares regression.

Page 16

Data Analysis (cont’d)

The overall time required by the partial least squares- and SVD-based regression methods is much less than that of the support vector machine.

Page 17

Data Analysis

Page 18

Conclusion

The combination of dimension reduction-based penalized logistic regression has the best performance compared to the support vector machine and least squares regression.

Page 19

References

[1] L. Shen and E.C. Tan (to appear in June 2005). "Dimension Reduction-Based Penalized Logistic Regression for Cancer Classification Using Microarray Data." IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[2] SoCalBSI: http://instructional1.calstatela.edu/jmomand2/

[3] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001.