Large-Scale Sparse Logistic Regression


Page 1: Large-Scale Sparse Logistic Regression

Center for Evolutionary Functional Genomics

Large-Scale Sparse Logistic Regression

Jieping Ye

Arizona State University

Joint work with Jun Liu and Jianhui Chen

Page 2: Large-Scale Sparse Logistic Regression


Sparse Logistic Regression

Prediction: disease or not
Confidence (probability)
Identify informative features

Page 3: Large-Scale Sparse Logistic Regression


Logistic Regression

Logistic Regression (LR) has been applied to:

Document classification (Brzezinski, 1999)

Natural language processing (Jurafsky and Martin, 2000)

Computer vision (Friedman et al., 2000)

Bioinformatics (Liao and Chin, 2007)

Regularization is commonly applied to reduce overfitting and obtain a robust classifier. Two well-known regularizations are:

L2-norm regularization (Minka, 2007)

L1-norm regularization (Koh et al., 2007)

Page 4: Large-Scale Sparse Logistic Regression


Sparse Logistic Regression

L1-norm regularization leads to sparse logistic regression (SLR):
Simultaneous feature selection and classification
Enhanced model interpretability
Improved classification performance

Applications:
M.-Y. Park and T. Hastie. Penalized Logistic Regression for Detecting Gene Interactions. Biostatistics, 2008.
T. Wu et al. Genomewide Association Analysis by Lasso Penalized Logistic Regression. Bioinformatics, 2009.

Page 5: Large-Scale Sparse Logistic Regression


Large-Scale Sparse Logistic Regression

Many applications involve data of large dimensionality

The MRI images used in Alzheimer’s Disease study contain more than 1 million voxels (features)

Major challenge: how to scale sparse logistic regression to large-scale problems?

Page 6: Large-Scale Sparse Logistic Regression


The Proposed Lassplore Algorithm

Lassplore (LArge-Scale SParse LOgistic REgression) is a first-order method

Each iteration of Lassplore involves only matrix-vector multiplications:
Scales to large-size problems
Efficient for sparse data

Lassplore achieves the optimal convergence rate among all first-order methods
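To make the cost claim concrete, here is a Python/NumPy sketch (not the authors' code) of the gradient of the average logistic loss: the only heavy operations are the two matrix-vector products A @ w and A.T @ r, which is what lets a first-order method scale to large and sparse data.

```python
import numpy as np

def logistic_grad(w, c, A, b):
    """Gradient of (1/m) * sum_i log(1 + exp(-b_i (w^T a_i + c))).

    A is the (m, n) data matrix, b the (m,) label vector in {-1, +1}.
    The cost is dominated by two matrix-vector products, so each step
    is cheap even when n is large, and cheaper still when A is sparse.
    """
    z = b * (A @ w + c)                    # margins b_i (w^T a_i + c)
    r = -b / (1.0 + np.exp(z))             # per-sample loss derivatives
    return (A.T @ r) / len(b), np.mean(r)  # gradients w.r.t. w and c
```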

Page 7: Large-Scale Sparse Logistic Regression


Outline

Logistic Regression

Sparse Logistic Regression

Lassplore

Experiments

Page 8: Large-Scale Sparse Logistic Regression


Logistic Regression (1)

The logistic regression model is given by

Prob(b | a) = 1 / (1 + exp(-b (w^T a + c)))

where a in R^n is the sample, b in {-1, +1} is the class label, and w in R^n and c in R are the weight vector and the intercept.
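In code, the model probability reads as follows (a minimal NumPy sketch, not part of the original Lassplore package):

```python
import numpy as np

def logistic_prob(w, c, a, b):
    """Prob(b | a) = 1 / (1 + exp(-b (w^T a + c))) for b in {-1, +1}."""
    return 1.0 / (1.0 + np.exp(-b * (np.dot(w, a) + c)))
```

Note that the probabilities of the two labels sum to one for any sample, since flipping b negates the argument of the sigmoid.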

Page 9: Large-Scale Sparse Logistic Regression


Logistic Regression (2)

Given a set of m training pairs {(a_i, b_i)}_{i=1}^m, we can compute w and c by minimizing the average logistic loss:

min_{w,c} (1/m) sum_{i=1}^m log(1 + exp(-b_i (w^T a_i + c)))

Minimizing this loss makes each Prob(b_i | a_i) large; without regularization, the fitted model is prone to overfitting.
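The loss can be evaluated stably with np.logaddexp, since log(1 + exp(z)) overflows for large z when computed naively; this is a hedged NumPy sketch, not the authors' implementation:

```python
import numpy as np

def avg_logistic_loss(w, c, A, b):
    """(1/m) * sum_i log(1 + exp(-b_i (w^T a_i + c))).

    logaddexp(0, z) computes log(1 + exp(z)) without overflow.
    """
    z = -b * (A @ w + c)
    return np.mean(np.logaddexp(0.0, z))
```

At w = 0 and c = 0 every term equals log 2, the loss of a maximally uncertain classifier.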

Page 10: Large-Scale Sparse Logistic Regression


L1-ball Constrained Logistic Regression

Favorable properties:
Obtaining sparse solutions
Performing feature selection and classification simultaneously
Improving classification performance

How to solve the L1-ball constrained optimization problem?

Page 11: Large-Scale Sparse Logistic Regression


Gradient Method for Sparse Logistic Regression

Let us consider gradient descent for solving the optimization problem

min_{x in G} g(x)

At each iteration, the next point x_{k+1} is obtained from x_k by the gradient step

x_{k+1} = x_k - g'(x_k) / L_k
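A minimal sketch of this iteration (ignoring the feasible set G for the moment; in the constrained case each step is followed by a projection onto G):

```python
import numpy as np

def gradient_descent(grad, x0, L, n_iter=100):
    """Plain gradient descent x_{k+1} = x_k - g'(x_k) / L with a fixed
    step size 1/L, where L is a Lipschitz constant of the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - grad(x) / L
    return x

# For g(x) = ||x||^2 / 2 the gradient is x and L = 1; the minimizer is 0.
x_star = gradient_descent(lambda x: x, np.array([4.0, -3.0]), L=1.0)
```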

Page 12: Large-Scale Sparse Logistic Regression


Euclidean Projection onto the L1-Ball

[Figure: points v_1, v_2, v_3 and their Euclidean projections π(v_1), π(v_2), π(v_3) onto the L1-ball]

The Euclidean projection onto the L1-ball (Duchi et al., 2008) is a building block, and it can be solved in linear time (Liu and Ye, 2009).
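For illustration, here is the simpler O(n log n) sort-based projection of Duchi et al. (2008); the linear-time method of Liu and Ye (2009) cited above computes the same point more efficiently, and this sketch is not the Lassplore implementation:

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection of v onto {x : ||x||_1 <= z} via the
    sort-based method: soft-threshold |v| by theta, keep the signs."""
    if np.abs(v).sum() <= z:
        return v.copy()                     # already inside the ball
    u = np.sort(np.abs(v))[::-1]            # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u_rho > (css_rho - z) / (rho + 1)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

For example, projecting (3, 1) onto the ball of radius 2 gives (2, 0), which also shows how the projection zeroes out small components and produces sparsity.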

Page 13: Large-Scale Sparse Logistic Regression


Gradient Method & Nesterov’s Method (1)

Convergence rates:

g(.)                                                Gradient Descent       Nesterov's method
smooth and convex                                   O(1/k)                 O(1/k^2)
smooth and strongly convex with condition number C  O(((C-1)/(C+1))^(2k))  O((1 - 1/sqrt(C))^k)

Nesterov's method achieves the lower-complexity bound of smooth optimization by first-order black-box methods, and is thus an optimal method.

Page 14: Large-Scale Sparse Logistic Regression


Gradient Method & Nesterov’s Method (2)

The theoretical number of iterations (up to a constant factor) for achieving an accuracy of 10^-8:

g(.)                                                      Gradient Descent   Nesterov's method
smooth and convex                                         10^8               10^4
smooth and strongly convex with condition number C=10^4   4.6×10^4           1.8×10^3

Page 15: Large-Scale Sparse Logistic Regression


Characteristics of Lassplore

First-order black-box oracle-based method:
At each iteration, we only need to evaluate the function value and the gradient

Built on Nesterov's method (Nesterov, 2003):
Global convergence rate of O(1/k^2) for the general case
Linear convergence rate for the strongly convex case

An adaptive line search scheme:
The step size is allowed to increase during the iterations
This scheme is applicable to general smooth convex optimization

Page 16: Large-Scale Sparse Logistic Regression


Key Components and Settings

[Figure: the search point s_k lies on the line through x_{k-1} and x_k; the gradient step from s_k gives x_{k+1}]

s_k = x_k + beta_k (x_k - x_{k-1})
x_{k+1} = s_k - g'(s_k) / L_k

Previous schemes for choosing (beta_k, L_k):
Nesterov's constant scheme (Nesterov, 2003)
Nemirovski's line search scheme (Nemirovski, 1994)
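A sketch of an accelerated iteration built from these two components, using the standard t_k recursion for beta_k; this is the textbook constant-step variant, not the Lassplore line-search version:

```python
import numpy as np

def nesterov(grad, x0, L, n_iter=200):
    """Nesterov's accelerated gradient method with a fixed step 1/L:
    s_k = x_k + beta_k (x_k - x_{k-1}),  x_{k+1} = s_k - g'(s_k) / L,
    with beta_k = (t_k - 1) / t_{k+1} from the standard recursion."""
    x_prev = x = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        s = x + ((t - 1.0) / t_next) * (x - x_prev)   # extrapolation step
        x_prev, x = x, s - grad(s) / L                # gradient step at s
        t = t_next
    return x
```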

Page 17: Large-Scale Sparse Logistic Regression


Previous Line Search Schemes

Nesterov's constant scheme (Nesterov, 2003):
L_k is set to a constant value L, the Lipschitz constant of the gradient of g(.)
beta_k depends on the condition number C

Nemirovski's line search scheme (Nemirovski, 1994):
L_k is allowed to increase, and is upper-bounded by 2L
beta_k is identical for every function g(.)

Page 18: Large-Scale Sparse Logistic Regression


Proposed Line Search Scheme

Characteristics:
L_k is adaptively tuned (allowed to both increase and decrease) and upper-bounded by 2L
beta_k is dependent on L_k
The scheme preserves the optimal convergence rate (see the paper for the technical proof)
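A hypothetical sketch of the adaptive idea: L_k may shrink (giving a larger trial step) as well as grow between iterations. The acceptance test below is the standard sufficient-decrease condition, a deliberate simplification of the conditions used in the paper:

```python
import numpy as np

def adaptive_step(g, grad_s, s, L_k, L_min=1e-3):
    """One adaptive step-size search in the spirit of the scheme above.

    Starts from a value below the previous L_k (an optimistic, larger
    step) and doubles it until the sufficient-decrease test
    g(x) <= g(s) + <g'(s), x - s> + (L/2) ||x - s||^2 is satisfied.
    """
    gs, d = g(s), grad_s
    L = max(L_k / 2.0, L_min)      # try a smaller L (larger step) first
    while True:
        x = s - d / L
        diff = x - s
        if g(x) <= gs + d @ diff + 0.5 * L * np.dot(diff, diff):
            return x, L            # accepted: return new point and L_k
        L *= 2.0                   # shrink the step and retry
```

On a quadratic with Lipschitz constant 1, starting from L_k = 1, the search first tries L = 0.5, rejects it, and settles back at L = 1.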

Page 19: Large-Scale Sparse Logistic Regression


Related Work

Y. Nesterov. Gradient methods for minimizing composite objective function. Technical Report 2007/76.
S. Becker, J. Bobin, and E. J. Candès. NESTA: a fast and accurate first-order method for sparse recovery. 2009.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2, 183-202, 2009.
K.-C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Preprint, National University of Singapore, March 2009.
S. Ji and J. Ye. An Accelerated Gradient Method for Trace Norm Minimization. The Twenty-Sixth International Conference on Machine Learning, 2009.

Page 20: Large-Scale Sparse Logistic Regression


Experiments: Data Sets

Page 21: Large-Scale Sparse Logistic Regression


Comparison of the Line Search Schemes

Comparison of the proposed adaptive scheme (Adap) with the scheme proposed by Nemirovski (Nemi)

[Figure: L_k and the objective value versus iteration for the two schemes]

Page 22: Large-Scale Sparse Logistic Regression


Pathwise Solutions: Warm Start vs. Cold Start

Page 23: Large-Scale Sparse Logistic Regression


Comparison with ProjectionL1 (Schmidt et al., 2007)

Adaptive Scheme

Page 24: Large-Scale Sparse Logistic Regression


Comparison with ProjectionL1 (Schmidt et al., 2007)

Adaptive Scheme

Page 25: Large-Scale Sparse Logistic Regression


Comparison with l1-logreg (Koh et al., 2007)

Page 26: Large-Scale Sparse Logistic Regression


Drosophila Gene Expression Image Analysis

Drosophila embryogenesis is divided into 17 developmental stages (1-17)

BDGP

Fly-FISH

Page 27: Large-Scale Sparse Logistic Regression


Sparse Logistic Regression: Application (2)

Page 28: Large-Scale Sparse Logistic Regression


Summary

The Lassplore algorithm for sparse logistic regression:
First-order black-box method
Optimal convergence rate
Adaptive line search scheme

Future work:
Apply the proposed approach to other mixed-norm regularized optimization problems
Biological image analysis

Page 29: Large-Scale Sparse Logistic Regression


The Lassplore Package

http://www.public.asu.edu/~jye02/Software/lassplore/

Page 30: Large-Scale Sparse Logistic Regression


Thank you!