

Research Article
Coordinate Descent Based Hierarchical Interactive Lasso Penalized Logistic Regression and Its Application to Classification Problems

Jin-Jia Wang and Yang Lu

School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China

Correspondence should be addressed to Jin-Jia Wang; wjj@ysu.edu.cn

Received 20 August 2014; Revised 1 December 2014; Accepted 1 December 2014; Published 16 December 2014

Academic Editor: Wei-Chiang Hong

Copyright © 2014 J.-J. Wang and Y. Lu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We present the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm, based on hierarchy theory and variable interactions. We define the interaction model based on geometric algebra and hierarchical constraint conditions, and then use the coordinate descent algorithm to solve for the coefficients of the hierarchical interactive lasso model. We provide the results of experiments based on UCI datasets, the Madelon datasets from NIPS2003, and daily activities of the elderly. The experimental results show that variable interactions and hierarchy contribute significantly to the classification. The hierarchical interactive lasso has the advantages of both the lasso and the interactive lasso.

1. Introduction

Sparse linear models (such as the lasso) are a remarkable success of the regression analysis of high-dimensional data [1]. The lasso is a least squares regression with the L1 penalty function. It can also be extended to the generalized linear model [2], for example, the logistic regression with the L1 penalty used for classification [3]. In the lasso model, the response variable is assumed to be a linear weighted sum of the predictor variables, and the optimization problem used to find the weighting coefficients can be solved by the coordinate descent algorithm [4]. If, in the analysis of high-dimensional data, the response variable cannot be explained by a linear weighted sum of the predictor variables, a higher-order or quadratic model needs to be used. In most cases, this suggests the presence of variable interactions [5]. The presence of such interactions is considered important: for example, the interaction between single nucleotide polymorphisms (SNPs) plays an important role in the diagnosis of cancer and other diseases [6]. While the linear model has some advantages, such as good interpretability and simple calculations, variable interaction models are considered a focus of modern research [7].

There are three types of methods used in hierarchical interaction models. The first is a multistep method, based on removing or adding the best predictor variables or interaction variables in each iteration. Once the predictor variables corresponding to the interaction variables are in the model, the interaction variables must be in the model as well [8]. Alternatively, we can consider the variable selection before the interaction selection [9]. Usually, the modified LARS algorithm is used in such models to solve the interaction model [10]. The second type is the Bayes model method. This approach improves the random search variable selection method for the hierarchical interaction model [11]. The third type is based on optimization. The sparse interaction model is formulated as a nonconvex optimization problem [12] and further expressed as a convex optimization problem, such as the all-pair lasso [13] or the interaction group lasso [14].

In the literature on sparse structures [15], composite absolute penalties (CAP) can also obtain the sparseness of the group and interaction, but the interaction coefficient is penalized twice [16]. To solve for the hierarchical sparseness in the nonlinear interaction problem, the existing literature [17] has introduced the VANISH method. The logic regression method considers the binary variable high-level interaction [18]. The existing literature [19] uses a simple recursive approach to select the interaction variables from high-dimensional data. The literature [20] proposed a genetic

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014, Article ID 430201, 11 pages. http://dx.doi.org/10.1155/2014/430201


Figure 1: The diagram of the subspaces in geometric algebra: the 0-vector; the 1-vectors e_1, e_2, …, e_i, …, e_d; the 2-vectors e_1∧e_2, e_1∧e_3, …, e_i∧e_j, …, e_{d−1}∧e_d; the k-vectors e_i∧···∧e_j∧···∧e_k; up to the d-vector e_1∧e_2∧···∧e_d.

algorithm using selection to choose interaction variables in high-dimensional data.

The literature [13] presents a hierarchical interactive lasso method for regression and provides a method of model coefficient estimation using the KKT conditions and the Lagrange multiplier method. Based on the literature [13] and our past work, we propose the concept of geometric algebra interaction and a coordinate descent algorithm for the hierarchical interactive lasso penalized logistic regression. We used experimental data including 4 kinds of datasets from the UCI machine learning database, the Madelon datasets from NIPS2003, and a daily life activity recognition dataset. The experimental results reveal the outstanding advantages of the hierarchical interactive lasso method compared to the lasso and interactive lasso methods. The innovations include the following: (1) we use geometric algebra to explain variable interaction; (2) we derive an improved coordinate descent algorithm to solve the hierarchical interactive lasso penalized logistic regression; (3) we use the hierarchical interactive lasso for the classification problem.

2. The Variable Interaction Theory of Geometric Algebra

Definition 1. If the function f(x, y) cannot be represented as a sum of independent functions f_1(x) + f_2(y), then x and y in the function f are said to have interaction.

A popular explanation of Definition 1 is that if a response variable cannot be represented as a linear weighted sum of the prediction variables, it is probably because there are interactions between the variables.

Interactions between variables can be easily explained by geometric algebra theory. Figure 1 is a diagram showing all subspaces in geometric algebra. The 1-vectors, namely, the order-1 main variables, can represent a p-dimensional subspace of the original data; that is, the p-dimensional basis of the original data is projected onto the 1-vectors. The 2-vectors show the interaction between two variables. The simplest 2-vector coefficient can be the product of two 1-vectors. In the literature [13], our proposed area feature is considered as one of the interactions. In the literature [20], our proposed orthocenter feature is considered as one of the interactions. Higher-order interactions are represented by k-vectors. In this paper, we only study area interactions between 1-vectors. This method can also be extended to nonlinear complex function interactions or higher orders.
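This pairwise-product construction can be sketched in a few lines (an illustrative sketch, not code from the paper; the function name is ours):

```python
import numpy as np

def interaction_features(X):
    """Build the order-2 interaction features x_j * x_k (j < k)
    as pairwise products of the order-1 main variables."""
    n, p = X.shape
    pairs = [(j, k) for j in range(p) for k in range(j + 1, p)]
    Z = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return Z, pairs

# p = 3 main variables yield p(p - 1)/2 = 3 interaction columns
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Z, pairs = interaction_features(X)
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
print(Z)      # [[ 2.  3.  6.]
              #  [20. 24. 30.]]
```

For the Sonar dataset (p = 60), this construction already gives the 1770 interaction columns listed in Table 1.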

3. The Binary Logistic Regression Based on Interaction and Hierarchy

The outcome variable in the binary logistic model is denoted by Y; the input variables are the predictors X: x_1, …, x_j, …, x_p are the order-1 main variables, and the pairwise products x_j x_k are the interaction variables between order-1 main variables. The binary logistic model has the form

  logit(P(Y = 1 | X)) = Σ_{j=0}^{p} β_j x_j + (1/2) Σ_{j≠k} Θ_jk x_j x_k + ε,    (1)

where Θ_jj = 0; the main variable coefficients are β ∈ R^{p+1}; the interaction variable coefficients are Θ ∈ R^{p×p}; x_0 is 1; and ε satisfies N(0, σ²).

Assume that the training samples are (x_1, y_1), …, (x_i, y_i), …, (x_N, y_N), x_i ∈ R^p, with

  y_i = 1 if Y_i = 1;  y_i = 0 if Y_i = 2.    (2)

Our goal is to select a feature subset from the order-1 main variables (dimension p) and the order-2 interaction variables (dimension p(p − 1)/2). We then estimate the coefficient values for the nonzero model parameters. We can obtain the probabilities of the two classes as follows:

  Pr(Y = 1 | x_i) = 1 / (1 + exp(−Σ_j β_j x_ij − (1/2) Σ_{j≠k} Θ_jk x_ij x_ik)) = p(x_i),

  Pr(Y = 2 | x_i) = 1 / (1 + exp(Σ_j β_j x_ij + (1/2) Σ_{j≠k} Θ_jk x_ij x_ik)) = 1 − p(x_i).    (3)
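A direct numerical reading of (3) evaluates the logit from the main and interaction coefficients and maps it through the sigmoid (a minimal sketch; `prob_class1` is our own illustrative name):

```python
import numpy as np

def prob_class1(x, beta0, beta, Theta):
    """p(x) = Pr(Y = 1 | x) from Eq. (3): the logit is
    beta0 + sum_j beta_j x_j + (1/2) sum_{j!=k} Theta_jk x_j x_k,
    with Theta symmetric and zero on the diagonal."""
    eta = beta0 + x @ beta + 0.5 * x @ Theta @ x
    return 1.0 / (1.0 + np.exp(-eta))

x = np.array([1.0, 1.0])
beta = np.array([0.0, 0.0])
Theta = np.array([[0.0, 2.0],
                  [2.0, 0.0]])
p1 = prob_class1(x, 0.0, beta, Theta)  # logit = 0.5 * (2 + 2) = 2
```

Here the main effects are zero, so the class-1 probability comes entirely from the single interaction term.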

The maximum likelihood estimation is used to estimate the unknown model parameters, which make the likelihood function of the N independent observations the largest. We define

  L(β, Θ) = Π_{i=1}^{N} p(x_i)^{y_i} [1 − p(x_i)]^{(1−y_i)}.    (4)

Then the logarithmic likelihood function of (4) is

  (1/N) ln[L(β, Θ)] = (1/N) Σ_{i=1}^{N} [ y_i (x_i^T β + (1/2) x_i^T Θ x_i) + ln(1 − p(x_i)) ].    (5)


We use the second-order Taylor expansion at the current estimated value (β, Θ) for (5) and obtain the subproblem as follows:

  l_Q(β, Θ) = −(1/2) ω_i ( z_i − Σ_j β_j x_ij − (1/2) Σ_{j≠k} Θ_jk x_ij x_ik )²
            + (y_i − p(x_i))² / (2 p²(x_i) [1 − p(x_i)]²)
            + [ y_i ( Σ_j β_j x_ij + Σ_{j≠k} Θ_jk x_ij x_ik ) + ln(1 − p(x_i)) ],    (6)

where ω_i = p(x_i)[1 − p(x_i)] and z_i = Σ_j β_j x_ij + (1/2) Σ_{j≠k} Θ_jk x_ij x_ik + (y_i − p(x_i)) / (p(x_i)[1 − p(x_i)]).

The proof that (5) implies (6) is presented in Appendix A.

In order to obtain the sparse solution for the main variable coefficients and interaction coefficients, the penalty function is used to enhance the stability of the interactive model:

  (β̂, Θ̂) = arg min_{β∈R^{p+1}, Θ∈R^{p×p}}  −l_Q(β, Θ) + λ_1 ‖β‖_1 + λ_2 ‖Θ‖_1.    (7)

We focus on those interactions that have large main variable coefficients. Such restrictions are known as "hierarchy." The mathematical expression for them is Θ_jk ≠ 0 ⇒ β_j ≠ 0 or β_k ≠ 0. So we add the constraints enforcing the hierarchy into (7) as follows:

  (β̂, Θ̂) = min_{β∈R^{p+1}, Θ∈R^{p×p}}  −l_Q(β, Θ) + λ_1 ‖β‖_1 + λ_2 ‖Θ‖_1
  s.t.  ‖Θ_j‖_1 ≤ |β_j|  for j = 1, …, p,    (8)

where Θ_j is the jth column of Θ. If Θ_jk ≠ 0, then ‖Θ_j‖_1 > 0 and ‖Θ_k‖_1 > 0, so β_j ≠ 0 and β_k ≠ 0. The new constraint guarantees the hierarchy, but we cannot obtain a convex solution because (8) is not convex. So, instead of β, we use β⁺ and β⁻. The corresponding convex relaxation of (8) is as follows:

  (β̂^±, Θ̂) = min_{β_0∈R, β^±∈R^p, Θ∈R^{p×p}}  −l_Q(β⁺ − β⁻, Θ) + λ_1 1^T(β⁺ + β⁻) + λ_2 Σ_j ‖Θ_j‖_1
  s.t.  ‖Θ_j‖_1 ≤ β_j⁺ + β_j⁻,  β_j⁺ ≥ 0,  β_j⁻ ≥ 0,  for j = 1, …, p,    (9)

where β = β⁺ − β⁻, β^± = max(±β, 0), β⁺, β⁻ ∈ R^{p+1}, and |β| = β⁺ + β⁻.

4. Coordinate Descent Algorithm and KKT Conditions

The basic idea of the coordinate descent algorithm is to convert multivariate problems into multiple single-variable subproblems. It allows optimizing only one-dimensional variables at a time, and the solution can be updated in a cycle. We solve (9) using the coordinate descent algorithm.
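For intuition, here is a stripped-down coordinate descent for an L1-penalized logistic regression with main effects only (a sketch of the general scheme, omitting the interaction terms, the β± split, and the hierarchy constraint of (9); all names are ours):

```python
import numpy as np

def soft_threshold(c, lam):
    # S(c, lambda) = sign(c) * (|c| - lambda)_+
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def lasso_logistic_cd(X, y, lam, n_passes=100):
    """Outer loop: rebuild the quadratic (IRLS) approximation with
    weights omega_i and working response z_i, as in (6); inner loop:
    cycle through the coordinates, each solved by soft-thresholding."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_passes):
        eta = X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(prob * (1.0 - prob), 1e-5)  # omega_i
        z = eta + (y - prob) / w                   # working response z_i
        for j in range(p):
            r = z - X @ beta + X[:, j] * beta[j]   # partial residual
            num = soft_threshold(np.mean(w * X[:, j] * r), lam)
            beta[j] = num / np.mean(w * X[:, j] ** 2)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(float)  # only x_0 matters
beta = lasso_logistic_cd(X, y, lam=0.05)
```

With this toy data, the coefficient of the informative variable stays clearly nonzero while the noise variable is shrunk toward zero.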

The Lagrange function corresponding to (9) is as follows

  L(β⁺, β⁻, Θ) = −l_Q(β⁺ − β⁻, Θ) + (λ_1 1 − α̂ − γ̂⁺)^T β⁺ + (λ_1 1 − α̂ − γ̂⁻)^T β⁻ + ⟨diag(λ_2 1 + α̂) Û, Θ⟩,    (10)

where

  Û_jk ∈ [−1, 1]  if Θ_jk = 0;  Û_jk = sign(Θ_jk)  if Θ_jk ≠ 0,    (11)

and α̂ and γ̂^± are the dual variables corresponding to the hierarchical constraint and the nonnegative constraints. Formula (10) can be decomposed into p subproblems:

  L(β_j⁺, β_j⁻, Θ_j) = −l_Q(β_j⁺ − β_j⁻, Θ_j) + (λ_1 − α̂_j − γ̂⁺)^T β_j⁺ + (λ_1 − α̂_j − γ̂⁻)^T β_j⁻ + ⟨(λ_2 + α̂_j) Û_j, Θ_j⟩.    (12)

The solution of (12) as a convex problem can be obtained by a set of optimality conditions known as the KKT (Karush-Kuhn-Tucker) conditions. This is the key advantage of our approach.

The stationary conditions of (12) according to KKT are ∂L/∂β_j^± = 0 and ∂L/∂Θ_jk = 0. The complementary conditions are ‖Θ_j‖_1 ≤ β_j⁺ + β_j⁻; γ̂_j^± β_j^± = 0; β^± ≥ 0; γ̂^± ≥ 0; α̂ ≥ 0; and α̂_j(‖Θ_j‖_1 − β_j⁺ − β_j⁻) = 0. We assume that (1/N) Σ_{i=1}^{N} x_ij² = 1. For our problem, the KKT conditions can be written as follows:

  β_j⁺ − β_j⁻ = S[ x_ij (z_i − A_i) + β_j⁺ − β_j⁻, (λ_1 + α̂_j)/ω_i ],

  Θ_jk = S[ (x_ij x_ik)^T (z_i − A_i + Θ_jk x_ij x_ik), 2(λ_2 + α̂_j)/ω_i ] / ‖x_ij x_ik‖²,    (13)

where A_i = Σ_j β_j x_ij + (1/2) Σ_{j≠k} Θ_jk x_ij x_ik; z_i and ω_i are from (6); and S denotes the soft-threshold operator defined by S(c, λ) = sign(c)(|c| − λ)₊.
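The soft-threshold operator can be checked numerically (a one-line sketch):

```python
import numpy as np

def S(c, lam):
    """Soft threshold: S(c, lambda) = sign(c) * (|c| - lambda)_+."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

print(S(3.0, 1.0))   # 2.0
print(S(-3.0, 1.0))  # -2.0
print(S(0.5, 1.0))   # 0.0, small coefficients are shrunk exactly to zero
```

This exact zeroing of small coefficients is what makes the lasso-type updates in (13) produce sparse solutions.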


Table 1: Datasets information.

  #  Datasets                 Samples  Variable dimension  Interactive dimension  Classes
  1  Breast-cancer-Wisconsin  683      9                   36                     2
  2  Ionosphere               351      33                  528                    2
  3  Liver disorders          345      6                   15                     2
  4  Sonar                    208      60                  1770                   2

Table 2: The experimental results of our method.

  Datasets                 Coefficients  Error rate (%)  SD    Time (s)  λ
  Breast-cancer-Wisconsin  22            3.0             0.01  9.49      3.14
  Ionosphere               101           28.0            0.02  124.63    2.82
  Liver disorders          25            26.0            0.02  12.11     2.37
  Sonar                    117           14.0            0.02  150.69    0.73

The proof of both expressions in (13) can be found in Appendix B.

Now we define f(α̂_j) = ‖Θ_j‖_1 − β_j⁺ − β_j⁻. Then

  f(α̂_j) = Σ_{k=1}^{p} | S[ (x_ij x_ik)^T (z_i − A_i + Θ_jk x_ij x_ik), 2(λ_2 + α̂_j)/ω_i ] | / ‖x_ij x_ik‖₂²
          − | S[ x_ij (z_i − A_i) + β_j⁺ − β_j⁻, (λ_1 + α̂_j)/ω_i ] |.    (14)

The remaining KKT conditions only involve α̂_j f(α̂_j) = 0, f(α̂_j) ≤ 0, and α̂_j ≥ 0. Observing that f is nonincreasing with respect to α̂_j and is piecewise linear, it is easy to get the solution for α̂_j.

In conclusion, the overall idea of the coordinate descent algorithm is that the minimization of (9) is equivalent to the minimization of (10). Formula (10) can be decomposed into p independent subproblems (12). Formula (12) can be solved as (13). The final coefficient optimization iteration formula is as follows:

  β_j^{+(m+1)} − β_j^{−(m+1)} = S[ x_ij (z_i^m − A_i^m) + β_j^{+(m)} − β_j^{−(m)}, (λ_1 + α̂_j^m)/ω_i^m ],

  Θ_jk^{m+1} = S[ (x_ij x_ik) (z_i^m − A_i^m + Θ_jk^m x_ij x_ik), 2(λ_2 + α̂_j^m)/ω_i^m ] / ‖x_ij x_ik‖²,    (15)

where β_j^{+(m)} − β_j^{−(m)} is the estimated value of the jth main variable coefficient after m iterations, and Θ_jk^m is the estimate of the interaction coefficient between the jth variable and the kth variable after m iterations.

5. The Experimental Results and Analysis

5.1. The Experimental Results and Analysis of Four UCI Datasets. The four datasets from the UCI machine learning database are the Breast-cancer-Wisconsin, Ionosphere, Liver disorders, and Sonar datasets, as shown in Table 1.

We run the 10-fold cross-validation (10-CV) experiments 20 times in R, with λ_1 = 2λ_2 = λ, employing the interactive hierarchical lasso logistic regression method. The results include the number of nonzero variable coefficients, the average error rate of the 10-CV, the standard deviation (SD), the CPU time, and the estimated value of lambda (λ). The results are shown in Table 2.
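The 20 × 10-CV protocol can be sketched as index generation (an illustrative sketch with unstratified splits; the function name is ours):

```python
import numpy as np

def repeated_kfold_indices(n, k=10, repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independent
    rounds of k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        perm = rng.permutation(n)
        folds = np.array_split(perm, k)
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            yield train, test

# 683 samples, as in the Breast-cancer-Wisconsin dataset (Table 1)
splits = list(repeated_kfold_indices(683, k=10, repeats=20))
print(len(splits))  # 200 train/test pairs; error rates are averaged over them
```

Each repeat reshuffles the samples, so the reported average error rate and SD summarize 200 train/test fits per dataset.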

The results of the 10-CV on the four datasets using the proposed method are presented in Figures 2 to 5. In the figures, the horizontal axis represents the logarithmic value of λ and the vertical axis is the error rate of the 10-CV. Besides, the horizontal axis at the top of each figure represents the number of nonzero variable coefficients corresponding to the λ value.

The results for the Breast-cancer-Wisconsin datasets are shown in Figure 2. The minimum error rate is 0.03, and the number of selected variables is more than 11. The results for the Ionosphere datasets are shown in Figure 3. When the number of selected variables is 101, the lowest error rate is 0.28, with the smaller standard deviation. The number of selected variables is larger than the original dimension, so the interactions provide classification information. The results for the Liver disorders datasets are presented in Figure 4. If the number of selected variables is 25, the lowest error rate can reach 0.26, while the standard deviation is 0.02. Finally, the results for the Sonar datasets are presented in Figure 5. When more than 80 variables are selected, the minimum error rate is 0.14.


Figure 2: The results of Breast-cancer-Wisconsin (cross-validation error versus log(λ); the top axis gives the number of selected features).

Figure 3: The results of Ionosphere (cross-validation error versus log(λ); the top axis gives the number of selected features).

In what follows, we compare our method to the existing literature [13]. The classification results and training time of our method are better than those shown in the literature [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods with 10-fold cross-validation repeated 20 times on the four UCI datasets are listed, respectively, in Tables 3, 4, and 5. The conventional pattern recognition methods include the support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighborhood (K-NN), and decision tree (DT) methods. The lasso is a method that considers the main variables without the interaction variables. The all-pair lasso is a method that considers the main variables and interaction variables, but without the hierarchy. The experimental

Figure 4: The results of Liver disorders (cross-validation error versus log(λ); the top axis gives the number of selected features).

Figure 5: The results of Sonar (cross-validation error versus log(λ); the top axis gives the number of selected features).

Table 3: The experimental results of the lasso penalized logistic regression model.

  Datasets                 Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  3.22            0.0001  0.3744
  Ionosphere               31.15           0.0075  0.4863
  Liver disorders          30.77           0.0077  0.0908
  Sonar                    22.13           0.0121  15.6806

results show that our model is better in classification results and more stable. This highlights the advantage of the variable interactions and hierarchy.

5.2. The Experimental Results for High-Dimensional Small-Sample Data. The Madelon datasets from NIPS2003 were


Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

  Datasets                 Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  2.93            0.0001  12.1213
  Ionosphere               27.00           0.0089  3.1734
  Liver disorders          26.36           0.0058  4.9190
  Sonar                    20.63           0.0115  4.1041

used to evaluate our method. The sample numbers of the training, validation, and testing sets were, respectively, 2000, 600, and 1800. The class number is 2. The variable dimension is 500, so the interactive dimension is 124750. More information about the datasets is available at http://www.nipsfsc.ecs.soton.ac.uk, where you can also download the datasets and see the challenge results, balance error rates, and the area under the curve. The model is trained using the training set, the model parameters are selected using the validation set, and the prediction results of the final model on the test set are uploaded online to obtain the classification score of the final model. Our results are shown in Table 6. The results show that our method is slightly better than the lasso and all-pair lasso. This implies that the interactions may also be important in the Madelon datasets.

5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data from smartphones [10]. They used the support vector machine (SVM) method to solve the classification problem of daily life activity recognition. These results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. 30 volunteers aged 19-48 years participated in the study. Each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, the experiments were recorded on video. The smartphone used in the experiments had a built-in accelerometer and gyroscope measuring 3D linear acceleration and angular velocity. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.

We use these datasets to evaluate our method. We use the upstairs and downstairs movements as the two active classes. The training sets have 986 and 1073 samples, respectively. The test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features from the sensor signals.

Experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods, since it takes the variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.

5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are n = 200 and p = 20. We take interactions into consideration and provide the following three kinds of simulation based on formula (1).

(1) The real model is hierarchical: Θ_jk ≠ 0 ⇒ β_j ≠ 0 or β_k ≠ 0, j, k = 1, …, p. There are 10 nonzero elements in β and 20 nonzero elements in Θ.

(2) The real model only includes interaction variables: β_j = 0, j = 1, …, p. There are 20 nonzero elements in Θ.

(3) The real model only includes main variables: Θ_jk = 0, j, k = 1, …, p. There are 10 nonzero elements in β.

The SNR of the main variables is 15, and the SNR of the interaction variables is 1. The results of 100 experiments are shown in Figure 6.
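Scenario (1) can be simulated along these lines (an illustrative sketch under our own assumptions about the coefficient draws, which are not specified above):

```python
import numpy as np

def simulate_hierarchical(n=200, p=20, n_main=10, n_pairs=20, seed=0):
    """Scenario (1): 10 nonzero main coefficients in beta and 20 nonzero
    interaction coefficients in Theta whose two parent main effects are
    both active, so the hierarchy Theta_jk != 0 => beta_j, beta_k != 0 holds."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    main = rng.choice(p, size=n_main, replace=False)
    beta[main] = rng.normal(size=n_main)
    Theta = np.zeros((p, p))
    filled = 0
    while filled < n_pairs:
        j, k = rng.choice(main, size=2, replace=False)  # parents are active
        if Theta[j, k] == 0.0:
            Theta[j, k] = Theta[k, j] = rng.normal()
            filled += 1
    # logit from Eq. (1): X beta + (1/2) x_i^T Theta x_i per sample
    eta = X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
    return X, y, beta, Theta

X, y, beta, Theta = simulate_hierarchical()
```

Scenarios (2) and (3) follow by setting beta = 0 or Theta = 0, respectively, in the same generator.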

When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the interactive lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason for this result is that when our method fits the model, the interaction variables are considered to be main variables. When the real model only includes main variables, the lasso is the best, our method again takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).

We believe that many actual classification problems could be hierarchical and interactive: they contain both main variables and interaction variables. Our method fits this kind of situation.

6. Conclusion

Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily life activity identification datasets. The results show that interaction widely exists in classification models. They also demonstrate that the variable interaction contributes to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction multiplier method; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net and hierarchical group lasso methods. The application of multisensor interaction to the daily life activities of the elderly is a new use of our method.


Table 5: The experimental results of the traditional pattern recognition methods.

  Datasets                 Classifier  Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  SVM         3.42            0.0019  1.3092
                           LDA         3.96            0.0007  0.2429
                           QDA         4.92            0.0014  0.2309
                           K-NN        4.61            0.0030  0.6636
                           DT          4.99            0.0041  1.6201
  Ionosphere               SVM         35.83           0.0063  2.9149
                           LDA         33.10           0.0090  0.6168
                           QDA         32.01           0.0076  0.4516
                           K-NN        28.30           0.0090  0.6363
                           DT          38.23           0.0408  3.0865
  Liver disorders          SVM         30.80           0.0100  1.1401
                           LDA         31.31           0.0089  0.2474
                           QDA         40.19           0.0119  0.1958
                           K-NN        36.89           0.0124  0.6089
                           DT          36.94           0.0187  1.9921
  Sonar                    SVM         25.70           0.0197  1.2181
                           LDA         25.13           0.0145  0.5897
                           QDA         23.80           0.0205  0.4594
                           K-NN        13.09           0.0101  0.5345
                           DT          35.91           0.0303  0.7153

Table 6: The experimental results for the Madelon datasets.

Methods          Balance error rate (%)           Area under the curve (%)
                 Training  Validation  Testing    Training  Validation  Testing
Lasso            38.08     37.21       36.61      64.78     62.79       63.40
All-pair lasso   34.98     37.01       36.57      65.02     62.99       63.43
Our method       30.00     36.35       35.12      70.00     68.08       67.76

Appendices

A Proofs from (5) to (6)

For notational convenience, we write $p_i$ instead of $p(x_i)$. The logarithmic likelihood function of (4) is

$$L_Q = \frac{1}{N}\ln\left[L(\beta,\Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right) + \ln(1-p_i)\right]. \tag{A.1}$$

First, we give the first- and second-order partial derivatives and the mixed partial derivative of (A.1) with respect to $\beta$ and $\Theta$:

$$\frac{\partial L_Q}{\partial\beta}
= \frac{1}{N}\sum_{i=1}^{N}\left[y_i\,x_i - \frac{\exp\left(x_i^T\beta + \tfrac{1}{2}x_i^T\Theta x_i\right)}{1+\exp\left(x_i^T\beta + \tfrac{1}{2}x_i^T\Theta x_i\right)}\,x_i\right]
= \frac{1}{N}\sum_{i=1}^{N} x_i\,(y_i - p_i),$$

$$\frac{\partial^2 L_Q}{\partial\beta^2}
= \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N} x_i\,(y_i - p_i)
= -\frac{1}{N}\sum_{i=1}^{N} x_i\,\frac{\partial p_i}{\partial\beta}
= -\frac{1}{N}\sum_{i=1}^{N} p_i\,(1-p_i)\,x_i x_i^T,$$

using $\partial p_i/\partial\beta = p_i(1-p_i)\,x_i$.


Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Methods         Error rate in training (%)  Training time (s)  Error rate in testing (%)  Testing time (s)
Lasso           0.00                        0.2649             1.46                       0.0312
All-pair lasso  0.00                        0.8807             1.35                       0.0953
Our method      0.00                        0.6012             1.12                       0.0556
SVM             0.00                        0.9516             1.23                       0.0936
1-NN            0.00                        0.0000             9.20                       0.7956
3-NN            0.00                        0.0000             8.98                       0.7488
QDA             6.61                        1.4196             4.04                       0.2184
DT              0.00                        0.7332             14.93                      0.1092

[Figure 6 compares the error rates of the lasso, our method, and the all-pair lasso in three simulation settings: (a) hierarchical interaction, (b) interaction variables only, (c) main variables only; the vertical axis is the error rate, ranging from 0.015 to 0.055.]

Figure 6: The error rate of the three lasso methods in the simulation experiment (the Bayes error is shown by the purple dotted line in the graph).


$$\frac{\partial L_Q}{\partial\Theta}
= \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{2}\,y_i\,x_i x_i^T - \frac{\exp\left(x_i^T\beta + \tfrac{1}{2}x_i^T\Theta x_i\right)}{1+\exp\left(x_i^T\beta + \tfrac{1}{2}x_i^T\Theta x_i\right)}\cdot\frac{1}{2}\,x_i x_i^T\right]
= \frac{1}{2N}\sum_{i=1}^{N} x_i x_i^T\,(y_i - p_i),$$

$$\frac{\partial^2 L_Q}{\partial\Theta^2}
= -\frac{1}{2N}\sum_{i=1}^{N} x_i x_i^T\,\frac{\partial p_i}{\partial\Theta}
= -\frac{1}{4N}\sum_{i=1}^{N} p_i\,(1-p_i)\,\left\|x_i\right\|^4,$$

$$\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}
= -\frac{1}{N}\sum_{i=1}^{N} x_i\,\frac{\partial p_i}{\partial\Theta}
= -\frac{1}{2N}\sum_{i=1}^{N} p_i\,(1-p_i)\,\left\|x_i\right\|^2 x_i. \tag{A.2}$$

Then (A.1) is expanded in a second-order Taylor series around the current estimate $(\tilde\beta, \tilde\Theta)$:

$$l_Q(\beta,\Theta) = L_Q + (\beta-\tilde\beta)^T\frac{\partial L_Q}{\partial\beta}
+ \left\langle \Theta-\tilde\Theta,\ \frac{\partial L_Q}{\partial\Theta}\right\rangle
+ \frac{1}{2}\left[(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta^2}(\beta-\tilde\beta)
+ 2\,(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}(\Theta-\tilde\Theta)
+ \frac{\partial^2 L_Q}{\partial\Theta^2}\,\bigl\|\Theta-\tilde\Theta\bigr\|^2\right],$$

where all derivatives are evaluated at $(\tilde\beta,\tilde\Theta)$. Substituting the derivatives above and completing the square in the quadratic term gives

$$l_Q(\beta,\Theta) = -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i - \sum_j\beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)^2 + C(\tilde\beta,\tilde\Theta), \tag{A.3}$$

where $C(\tilde\beta,\tilde\Theta)$ collects the terms that do not depend on $(\beta,\Theta)$ and

$$\omega_i = \tilde p_i\,(1-\tilde p_i),\qquad
z_i = \sum_j\tilde\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik} + \frac{y_i - \tilde p_i}{\tilde p_i\,(1-\tilde p_i)}, \tag{A.4}$$

with $\tilde p_i = p(x_i)$ evaluated at $(\tilde\beta,\tilde\Theta)$.
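As a sanity check on the derivative formulas above, the gradient $\partial L_Q/\partial\beta = (1/N)\sum_i x_i(y_i - p_i)$ can be verified numerically against a finite-difference approximation of (A.1). The script below is purely illustrative (random data; the dimensions and the scale of $\Theta$ are arbitrary choices for the check, not values from the paper):

```python
import numpy as np

# Numerical check of dL_Q/dbeta = (1/N) * sum_i x_i (y_i - p_i),
# with p_i the logistic probability of the quadratic (interaction) logit.
rng = np.random.default_rng(0)
N, p = 50, 4
X = rng.standard_normal((N, p))
y = rng.integers(0, 2, N).astype(float)
Theta = 0.2 * rng.standard_normal((p, p))
Theta = 0.5 * (Theta + Theta.T)          # symmetric interaction matrix
np.fill_diagonal(Theta, 0.0)             # Theta_jj = 0, as in the model
beta = 0.1 * rng.standard_normal(p)

def L_Q(b):
    # (A.1), intercept omitted for brevity
    eta = X @ b + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p_i = 1.0 / (1.0 + np.exp(-eta))
    return np.mean(y * eta + np.log(1.0 - p_i))

eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
p_i = 1.0 / (1.0 + np.exp(-eta))
grad_analytic = X.T @ (y - p_i) / N

eps = 1e-6
grad_fd = np.array([(L_Q(beta + eps * e) - L_Q(beta - eps * e)) / (2 * eps)
                    for e in np.eye(p)])
print(np.max(np.abs(grad_analytic - grad_fd)))   # close to zero
```

The same finite-difference pattern applies to the $\Theta$ gradient in (A.2).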

B. Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^+ - \beta_j^-$ from the stationarity condition $\partial L/\partial\beta_j^\pm = 0$.

(1) $\beta_j^+ > 0$, $\beta_j^- = 0$:

$$\beta_j^+ - \beta_j^- = x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^- - \frac{\lambda_1 + \hat\alpha_j}{\omega_i}. \tag{B.1}$$

(2) $\beta_j^+ = 0$, $\beta_j^- > 0$:

$$\beta_j^+ - \beta_j^- = x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^- + \frac{\lambda_1 + \hat\alpha_j}{\omega_i}. \tag{B.2}$$

(3) $\beta_j^+ = \beta_j^- = 0$:

$$\beta_j^+ - \beta_j^- = 0,\qquad \left|x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^-\right| \le \frac{\lambda_1 + \hat\alpha_j}{\omega_i}. \tag{B.3}$$

Combining the three cases, we derive

$$\beta_j^+ - \beta_j^- = S\left[x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^-,\ \frac{\lambda_1 + \hat\alpha_j}{\omega_i}\right]. \tag{B.4}$$

Secondly, the stationarity condition $\partial L/\partial\Theta_{jk} = 0$ gives

$$\frac{1}{2}\,\omega_i\cdot 2\left(z_i - \sum_j\beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(-\frac{1}{2}\,x_{ij}x_{ik}\right) + (\lambda_2 + \hat\alpha_j)\,U_{jk} = 0,$$

$$\left(z_i - \sum_j\beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(\frac{1}{2}\,x_{ij}x_{ik}\right) = \frac{(\lambda_2 + \hat\alpha_j)\,U_{jk}}{\omega_i}. \tag{B.5}$$

Supposing

$$\gamma^{(-jk)} = z_i - \sum_j\beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik} + \Theta_{jk}\,x_{ij}x_{ik}, \tag{B.6}$$

we have

$$\left(\gamma^{(-jk)} - \Theta_{jk}\,x_{ij}x_{ik}\right)\cdot(x_{ij}x_{ik}) = \frac{2\,(\lambda_2 + \hat\alpha_j)\,U_{jk}}{\omega_i},$$

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2\,(\lambda_2 + \hat\alpha_j)\,U_{jk}/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.7}$$

We discuss three cases for the value of $\Theta_{jk}$.

(1) $\Theta_{jk} > 0$, $U_{jk} = 1$:

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2\,(\lambda_2 + \hat\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.8}$$

(2) $\Theta_{jk} < 0$, $U_{jk} = -1$:

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} + 2\,(\lambda_2 + \hat\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.9}$$

(3) $\Theta_{jk} = 0$.

We derive

$$\Theta_{jk} = \frac{S\left[\gamma^{(-jk)}\cdot x_{ij}x_{ik},\ 2\,(\lambda_2 + \hat\alpha_j)/\omega_i\right]}{(x_{ij}x_{ik})^2}. \tag{B.10}$$

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267-288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659-677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714-721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1-22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187-198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275-285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945-974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430-439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17-36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354-364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111-1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49-67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777-2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541-1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450-468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475-511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694-708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080-5083, 2009.



[Figure 1 depicts the graded subspaces of geometric algebra: the 0-vector; the 1-vectors e_1, e_2, ..., e_i, ..., e_d; the 2-vectors e_1 ∧ e_2, e_1 ∧ e_3, ..., e_i ∧ e_j, ..., e_{d-1} ∧ e_d; the k-vectors e_i ∧ ··· ∧ e_j ∧ ··· ∧ e_k; up to the d-vector e_1 ∧ e_2 ∧ ··· ∧ e_d.]

Figure 1: The diagram of the subspaces in geometric algebra.

algorithm using selection to choose interaction variables in high-dimensional data.

The literature [13] presents a hierarchical interactive lasso method for regression and provides a method of model coefficient estimation using the KKT conditions and the Lagrange multiplier method. Based on the literature [13] and our past work, we propose the concept of geometric algebra interaction and a coordinate descent algorithm for the hierarchical interactive lasso penalized logistic regression. We used experimental data including 4 kinds of datasets from the UCI machine learning database, the Madelon datasets from NIPS2003, and one daily life activity recognition dataset. The experimental results reveal the outstanding advantages of the hierarchical interactive lasso method compared to the lasso and interactive lasso methods. The innovations include the following: (1) we use geometric algebra to explain variable interaction; (2) we derive an improved coordinate descent algorithm to solve the hierarchical interactive lasso penalized logistic regression; (3) we use the hierarchical interactive lasso for the classification problem.

2. The Variable Interaction Theory of Geometric Algebra

Definition 1. If the function $f(x, y)$ cannot be represented as a sum of independent functions $f_1(x) + f_2(y)$, then $x$ and $y$ are said to have interaction in $f$.

A popular explanation of Definition 1 is that, if a response variable cannot be represented as a linear weighted sum of the prediction variables, it is probably because there are interactions between the variables.

Interactions between variables can be easily explained by geometric algebra theory. Figure 1 is a diagram showing all subspaces in geometric algebra. The 1-vectors, namely, the order-1 main variables, can represent a $p$-dimensional subspace of the original data; that is, the $p$-dimensional base of the original data is projected on the 1-vectors. The 2-vectors show the interaction between two variables. The simplest 2-vector coefficient can be the product of two 1-vectors. In the literature [13], our proposed area feature is considered as one of the interactions. In the literature [20], our proposed orthocenter feature is considered as one of the interactions. Higher-order interactions are represented by $k$-vectors. In this paper, we only study area interactions between 1-vectors.

This method can also be extended to nonlinear complex function interactions or to higher orders.
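As a concrete illustration of the order-2 interactions used throughout the paper, the sketch below (our illustration; the paper's experiments were run in R) builds the $p(p-1)/2$ pairwise product variables $x_j x_k$:

```python
import numpy as np

def interaction_features(X):
    """Build the order-2 interaction variables x_j * x_k (j < k).

    X: (N, p) array of order-1 main variables.
    Returns an (N, p*(p-1)//2) array of pairwise products."""
    N, p = X.shape
    cols = [X[:, j] * X[:, k] for j in range(p) for k in range(j + 1, p)]
    return np.column_stack(cols) if cols else np.empty((N, 0))

# Example: for the Sonar data (p = 60) this yields 60*59/2 = 1770
# interaction variables, matching the interactive dimension in Table 1.
Z = interaction_features(np.random.randn(5, 60))
print(Z.shape)  # (5, 1770)
```

These product columns are exactly the all-pair candidate set from which the hierarchy constraint later prunes interactions whose main effects are zero.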

3. The Binary Logistic Regression Based on Interaction and Hierarchy

The outcome variable in the binary logistic model is denoted by $Y$, and the input variables are the predictors $X$: $x_1, \ldots, x_j, \ldots, x_p$ are the order-1 main variables, and the pairwise products $x_j x_k$ are the interaction variables between the order-1 main variables. The binary logistic model has the form

$$\mathrm{logit}\left(P(Y=1 \mid X)\right) = \sum_{j=0}^{p}\beta_j x_j + \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_j x_k + \varepsilon, \tag{1}$$

where $\Theta_{jj} = 0$, the main variable coefficients are $\beta \in R^{p+1}$, the interaction variable coefficients are $\Theta \in R^{p\times p}$, $x_0$ is 1, and $\varepsilon$ satisfies $N(0, \sigma^2)$.

Assume that the training samples are $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)$, $\mathbf{x}_i \in R^p$,

$$y_i = \begin{cases} 1, & Y_i = 1, \\ 0, & Y_i = 2. \end{cases} \tag{2}$$

Our goal is to select a feature subset from the order-1 main variables (dimension $p$) and the order-2 interaction variables (dimension $p(p-1)/2$). We then estimate the coefficient values for the nonzero model parameters. We can obtain the probabilities of the two classes as follows:

$$\Pr(Y=1 \mid \mathbf{x}_i) = \frac{1}{1+\exp\left(-\sum_j\beta_j x_{ij} - \tfrac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)} = p(\mathbf{x}_i),$$

$$\Pr(Y=2 \mid \mathbf{x}_i) = \frac{1}{1+\exp\left(\sum_j\beta_j x_{ij} + \tfrac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)} = 1 - p(\mathbf{x}_i). \tag{3}$$
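The class probability in (3) can be computed directly from the linear and interaction terms. The following Python sketch is our illustration, not the authors' code; `beta[0]` plays the role of the intercept coefficient for $x_0 = 1$, and the diagonal of `Theta` is assumed zero, as in (1):

```python
import numpy as np

def p_of_x(X, beta, Theta):
    """Pr(Y = 1 | x_i) from (3): the logistic function of the linear term
    plus the quadratic interaction term (1/2) x_i^T Theta x_i.

    X: (N, p) samples; beta: length p+1 with beta[0] the coefficient of
    x_0 = 1; Theta: (p, p) symmetric with zero diagonal (Theta_jj = 0)."""
    eta = beta[0] + X @ beta[1:] + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    return 1.0 / (1.0 + np.exp(-eta))
```

$\Pr(Y = 2 \mid \mathbf{x}_i)$ is then `1 - p_of_x(X, beta, Theta)`, as in (3).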

The maximum likelihood estimation is used to estimate the unknown model parameters, which makes the likelihood function of the $N$ independent observations the largest. We define

$$L(\beta, \Theta) = \prod_{i=1}^{N} p(\mathbf{x}_i)^{y_i}\,[1 - p(\mathbf{x}_i)]^{1-y_i}. \tag{4}$$

Then the logarithmic likelihood function of (4) is

$$\frac{1}{N}\ln\left[L(\beta,\Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(\mathbf{x}_i^T\beta + \frac{1}{2}\mathbf{x}_i^T\Theta\mathbf{x}_i\right) + \ln\left(1 - p(\mathbf{x}_i)\right)\right]. \tag{5}$$


We use the second-order Taylor expansion of (5) at the current estimated value $(\tilde\beta, \tilde\Theta)$ and obtain the subproblem as follows:

$$l_Q(\beta,\Theta) = -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i - \sum_j\beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)^2 + C(\tilde\beta,\tilde\Theta), \tag{6}$$

where $C(\tilde\beta,\tilde\Theta)$ collects the terms that do not depend on $(\beta,\Theta)$,

$$\omega_i = \tilde p(\mathbf{x}_i)\,[1 - \tilde p(\mathbf{x}_i)],\qquad
z_i = \sum_j\tilde\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik} + \frac{y_i - \tilde p(\mathbf{x}_i)}{\tilde p(\mathbf{x}_i)\,[1 - \tilde p(\mathbf{x}_i)]},$$

and $\tilde p(\mathbf{x}_i)$ denotes $p(\mathbf{x}_i)$ evaluated at $(\tilde\beta,\tilde\Theta)$.

The proof that (5) implies (6) is presented in Appendix A.
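The quantities $\omega_i$ and $z_i$ in (6) are the familiar IRLS (iteratively reweighted least squares) weights and working response, here with the quadratic interaction term included in the linear predictor. A hedged Python sketch (our illustration of the definitions above, with `beta[0]` as the intercept):

```python
import numpy as np

def working_response(X, y, beta, Theta):
    """IRLS quantities of (6) at the current estimate (beta, Theta):
    weights  w_i = p(x_i) * (1 - p(x_i)),
    response z_i = eta_i + (y_i - p(x_i)) / w_i,
    where eta_i is the linear-plus-interaction predictor."""
    eta = beta[0] + X @ beta[1:] + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    return w, eta + (y - p) / w
```

At `beta = 0, Theta = 0`, every $\omega_i$ is $1/4$ and $z_i = 4(y_i - 1/2)$, the familiar first IRLS step of logistic regression.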

In order to obtain sparse solutions for the main variable coefficients and the interaction coefficients, a penalty function is used to enhance the stability of the interactive model:

$$(\hat\beta, \hat\Theta) = \operatorname*{arg\,min}_{\beta\in R^{p+1},\,\Theta\in R^{p\times p}}\ -l_Q(\beta,\Theta) + \lambda_1\|\beta\|_1 + \lambda_2\|\Theta\|_1. \tag{7}$$

We focus on those interactions that have large main variable coefficients. Such restrictions are known as "hierarchy." The mathematical expression for them is $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ and $\beta_k \neq 0$. So we add the constraints enforcing the hierarchy into (7) as follows:

$$(\hat\beta, \hat\Theta) = \operatorname*{arg\,min}_{\beta\in R^{p+1},\,\Theta\in R^{p\times p}}\ -l_Q(\beta,\Theta) + \lambda_1\|\beta\|_1 + \lambda_2\|\Theta\|_1
\quad \text{s.t. } \|\Theta_j\|_1 \le |\beta_j| \ \text{ for } j = 1, \ldots, p, \tag{8}$$

where $\Theta_j$ is the $j$th column of $\Theta$. If $\Theta_{jk} \neq 0$, then $\|\Theta_j\|_1 > 0$ and $\|\Theta_k\|_1 > 0$, so $\beta_j \neq 0$ and $\beta_k \neq 0$. The new constraint guarantees the hierarchy, but we cannot obtain a convex solution because (8) is not convex. So instead of $\beta$ we use $\beta^+, \beta^-$. The corresponding convex relaxation of (8) is as follows:

$$(\hat\beta^\pm, \hat\Theta) = \operatorname*{arg\,min}_{\beta_0\in R,\ \beta^\pm\in R^p,\ \Theta\in R^{p\times p}}\ -l_Q(\beta^+ - \beta^-, \Theta) + \lambda_1\mathbf{1}^T(\beta^+ + \beta^-) + \lambda_2\sum_j\|\Theta_j\|_1$$
$$\text{s.t. } \|\Theta_j\|_1 \le \beta_j^+ + \beta_j^-,\qquad \beta_j^+ \ge 0,\ \beta_j^- \ge 0,\qquad j = 1, \ldots, p, \tag{9}$$

where $\beta = \beta^+ - \beta^-$, $\beta^\pm = \max(\pm\beta, 0)$, $\beta^+, \beta^- \in R^{p+1}$, and $\|\beta\|_1 = \mathbf{1}^T(\beta^+ + \beta^-)$.

4. Coordinate Descent Algorithm and KKT Conditions

The basic idea of the coordinate descent algorithm is to convert a multivariate problem into multiple single-variable subproblems: only one one-dimensional variable is optimized at a time, and the solution is updated in cycles. We solve (9) using the coordinate descent algorithm.

The Lagrange function corresponding to (9) is as follows:

$$L(\beta^+, \beta^-, \Theta) = -l_Q(\beta^+ - \beta^-, \Theta)
+ \left(\lambda_1\mathbf{1} - \hat\alpha - \gamma^+\right)^T\beta^+
+ \left(\lambda_1\mathbf{1} - \hat\alpha - \gamma^-\right)^T\beta^-
+ \left\langle \mathrm{diag}\left(\lambda_2\mathbf{1} + \hat\alpha\right)U,\ \Theta\right\rangle, \tag{10}$$

where

$$U_{jk} \in [-1, 1] \ \text{ if } \Theta_{jk} = 0,\qquad U_{jk} = \mathrm{sign}(\Theta_{jk}) \ \text{ if } \Theta_{jk} \neq 0, \tag{11}$$

and $\hat\alpha$ and $\gamma^\pm$ are the dual variables corresponding to the hierarchical constraint and the nonnegativity constraints. Formula (10) can be decomposed into $p$ subproblems:

$$L(\beta_j^+, \beta_j^-, \Theta_j) = -l_Q(\beta_j^+ - \beta_j^-, \Theta_j)
+ \left(\lambda_1 - \hat\alpha_j - \gamma^+\right)^T\beta_j^+
+ \left(\lambda_1 - \hat\alpha_j - \gamma^-\right)^T\beta_j^-
+ \left\langle\left(\lambda_2 + \hat\alpha_j\right)U_j,\ \Theta_j\right\rangle. \tag{12}$$

The solution of (12) as a convex problem can be obtained from a set of optimality conditions known as the KKT (Karush-Kuhn-Tucker) conditions. This is the key advantage of our approach.

The stationarity conditions of (12) according to KKT are $\partial L/\partial\beta_j^\pm = 0$ and $\partial L/\partial\Theta_{jk} = 0$. The complementary conditions are $\|\Theta_j\|_1 \le \beta_j^+ + \beta_j^-$, $\gamma_j^\pm\beta_j^\pm = 0$, $\beta^\pm \ge 0$, $\gamma^\pm \ge 0$, $\hat\alpha \ge 0$, and $\hat\alpha_j(\|\Theta_j\|_1 - \beta_j^+ - \beta_j^-) = 0$. We assume that $(1/N)\sum_{i=1}^{N}x_{ij}^2 = 1$. For our problem, the KKT conditions can be written as follows:

$$\beta_j^+ - \beta_j^- = S\left[x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^-,\ \frac{\lambda_1 + \hat\alpha_j}{\omega_i}\right],$$

$$\Theta_{jk} = \frac{S\left[(x_{ij}x_{ik})^T\left(z_i - A_i + \tilde\Theta_{jk}\,x_{ij}x_{ik}\right),\ 2\,(\lambda_2 + \hat\alpha_j)/\omega_i\right]}{\left\|x_{ij}x_{ik}\right\|^2}, \tag{13}$$

where $A_i = \sum_j\beta_j x_{ij} + \tfrac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}$, $z_i$ and $\omega_i$ are from (6), and $S$ denotes the soft-threshold operator defined by $S(c, \lambda) = \mathrm{sign}(c)\,(|c| - \lambda)_+$.
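The soft-threshold operator $S$ is elementary to implement; a one-line Python version (our illustration):

```python
import numpy as np

def soft_threshold(c, lam):
    """S(c, lam) = sign(c) * (|c| - lam)_+ , the operator used in (13)."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
```

$S$ shrinks its argument toward zero by `lam` and sets it exactly to zero when $|c| \le \lambda$, which is what produces the sparse coefficient estimates.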


Table 1: Datasets information.

No.  Datasets                  Samples  Variable dimension  Interactive dimension  Class
1    Breast-cancer-Wisconsin   683      9                   36                     2
2    Ionosphere                351      33                  528                    2
3    Liver disorders           345      6                   15                     2
4    Sonar                     208      60                  1770                   2

Table 2: The experimental results of our method.

Datasets                  Coefficients  Error rate (%)  SD    Time (s)  λ
Breast-cancer-Wisconsin   22            3.0             0.01  9.49      3.14
Ionosphere                101           28.0            0.02  124.63    2.82
Liver disorders           25            26.0            0.02  12.11     2.37
Sonar                     117           14.0            0.02  150.69    0.73

The proofs of both expressions in (13) can be found in Appendix B.

Now we define $f(\hat\alpha_j) = \|\Theta_j\|_1 - \beta_j^+ - \beta_j^-$. Then

$$f(\hat\alpha_j) = \left\|\sum_{k=1}^{p}\frac{S\left[(x_{ij}x_{ik})^T\left(z_i - A_i + \tilde\Theta_{jk}\,x_{ij}x_{ik}\right),\ 2\,(\lambda_2 + \hat\alpha_j)/\omega_i\right]}{\left\|x_{ij}x_{ik}\right\|^2}\right\|_1
- \left\|S\left[x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^-,\ \frac{\lambda_1 + \hat\alpha_j}{\omega_i}\right]\right\|_1. \tag{14}$$

The remaining KKT conditions only involve $\hat\alpha_j f(\hat\alpha_j) = 0$, $f(\hat\alpha_j) \le 0$, and $\hat\alpha_j \ge 0$. Observing that $f$ is nonincreasing with respect to $\hat\alpha_j$ and is piecewise linear, it is easy to get the solution for $\hat\alpha_j$.

In conclusion the overall idea of the coordinate descentalgorithm is that the minimization of (9) is equivalent to theminimization of (10) Formula (10) can be decomposed into119901 independent formula (12) Formula (12) can be solved as(13) The final coefficient optimization iteration formula is asfollows

$$\beta_j^{+(m+1)} - \beta_j^{-(m+1)} = S\!\left[x_{ij}\left(z_i^{(m)} - A_i^{(m)}\right) + \beta_j^{+(m)} - \beta_j^{-(m)},\ \frac{\lambda_1 + \hat\alpha_j^{(m)}}{\omega_i^{(m)}}\right],$$

$$\Theta_{jk}^{(m+1)} = \frac{S\!\left[(x_{ij}x_{ik})\left(z_i^{(m)} - A_i^{(m)} + \Theta_{jk}^{(m)}\,x_{ij}x_{ik}\right),\ \frac{2(\lambda_2 + \hat\alpha_j^{(m)})}{\omega_i^{(m)}}\right]}{\left\|x_{ij}x_{ik}\right\|_2^{2}}, \quad (15)$$

where $\beta_j^{+(m)} - \beta_j^{-(m)}$ is the estimated value of the $j$th main-variable coefficient after $m$ iterations, and $\Theta_{jk}^{(m)}$ is the estimate of the interaction coefficient between the $j$th variable and the $k$th variable after $m$ iterations.
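The cyclic update loop can be sketched in code. This is our own simplified illustration of cyclic soft-threshold updates for the weighted (IRLS-style) lasso subproblem, covering only the main-effect coefficients and omitting the hierarchy constraint and the $\Theta$ update; all names and the data layout are our choices, not the authors' implementation:

```python
import numpy as np

def soft_threshold(c, lam):
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def cd_lasso(X, z, w, lam, n_iter=100):
    """Cyclic coordinate descent for the weighted lasso
    min_b 0.5 * sum_i w_i (z_i - x_i^T b)^2 + lam * ||b||_1,
    i.e. the inner problem solved at each quadratic-approximation step."""
    n, p = X.shape
    b = np.zeros(p)
    r = z - X @ b                      # current residual
    for _ in range(n_iter):
        for j in range(p):
            xj = X[:, j]
            # weighted inner product with the partial residual excluding coordinate j
            rho = np.sum(w * xj * (r + xj * b[j]))
            denom = np.sum(w * xj ** 2)
            bj_new = soft_threshold(np.array([rho]), lam)[0] / denom
            r += xj * (b[j] - bj_new)  # keep the residual in sync
            b[j] = bj_new
    return b
```

With an orthonormal design, each coordinate reduces to a single soft-threshold step, which makes the connection to (13) and (15) easy to see.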

5. The Experimental Results and Analysis

5.1. The Experimental Results and Analysis of Four UCI Datasets. Four datasets from the UCI repository are used: the breast-cancer-Wisconsin, Ionosphere, Liver disorders, and Sonar datasets, as shown in Table 1.

We perform the 10-fold cross-validation (10-CV) experiments 20 times in R, where $\lambda_1 = 2\lambda_2 = \lambda$. We conduct these experiments with the interactive hierarchical lasso logistic regression method. The reported results include the number of nonzero variable coefficients, the average error rate of the 10-CV, the standard deviation (SD), the CPU time, and the estimated value of $\lambda$. The results are shown in Table 2.
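The evaluation protocol can be sketched as follows. This is a generic repeated-CV harness written for illustration; the paper's experiments were carried out in R, and `fit`/`predict` below stand in for any classifier:

```python
import numpy as np

def cv_error(fit, predict, X, y, n_folds=10, n_repeats=20, seed=0):
    """Average misclassification rate over n_repeats runs of n_folds-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        folds = np.array_split(idx, n_folds)
        for test_idx in folds:
            train_idx = np.setdiff1d(idx, test_idx)
            model = fit(X[train_idx], y[train_idx])
            errs.append(np.mean(predict(model, X[test_idx]) != y[test_idx]))
    return np.mean(errs), np.std(errs)
```

The mean and standard deviation returned here correspond to the "Error rate" and "SD" columns reported in the tables.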

The results of the 10-CV on the four datasets obtained with the proposed method are presented in Figures 2 to 5. In the figures, the horizontal axis represents the logarithmic value of $\lambda$ and the vertical axis is the error rate of the 10-CV. In addition, the horizontal axis at the top of each figure gives the number of nonzero variable coefficients corresponding to each $\lambda$ value.

The results for the breast-cancer-Wisconsin dataset are shown in Figure 2. The minimum error rate is 0.03, and the number of selected variables is more than 11. The results for the Ionosphere dataset are shown in Figure 3. When the number of selected variables is 101, the lowest error rate is 0.28, with the smaller standard deviation. The number of selected variables is larger than the original dimension, so the interactions provide classification information. The results for the Liver disorders dataset are presented in Figure 4. When the number of selected variables is 25, the lowest error rate reaches 0.26, while the standard deviation is 0.02. Finally, the results for the Sonar dataset are presented in Figure 5. When more than 80 variables are selected, the minimum error rate is 0.14.


Figure 2: The results of breast-cancer-Wisconsin (10-CV error rate versus log(λ); the top axis gives the number of selected features).

Figure 3: The results of Ionosphere (10-CV error rate versus log(λ); the top axis gives the number of selected features).

In what follows, we compare our method with the existing literature [13]. The classification results and training time of our method are better than those reported in [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods with 10-fold cross-validation repeated 20 times on the four UCI datasets are listed in Tables 3, 4, and 5, respectively. The conventional pattern recognition methods include the support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighborhood (K-NN), and decision tree (DT) methods. The lasso is a method that considers the main variables without the interaction variables. The all-pair lasso is a method that considers the main variables and interaction variables but without the hierarchy. The experimental

Figure 4: The results of Liver disorders (10-CV error rate versus log(λ); the top axis gives the number of selected features).

Figure 5: The results of Sonar (10-CV error rate versus log(λ); the top axis gives the number of selected features).

Table 3: The experimental results of the lasso penalized logistic regression model.

Datasets | Error rate (%) | SD | Time (s)
Breast-cancer-Wisconsin | 3.22 | 0.0001 | 0.3744
Ionosphere | 31.15 | 0.0075 | 0.4863
Liver disorders | 30.77 | 0.0077 | 0.0908
Sonar | 22.13 | 0.0121 | 15.6806

results show that our model achieves better classification results and is more stable. This highlights the advantage of the variable interactions and hierarchy.

5.2. The Experimental Results for High-Dimensional Small-Sample Data. The Madelon dataset from NIPS2003 was


Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

Datasets | Error rate (%) | SD | Time (s)
Breast-cancer-Wisconsin | 2.93 | 0.0001 | 12.1213
Ionosphere | 27.00 | 0.0089 | 3.1734
Liver disorders | 26.36 | 0.0058 | 4.9190
Sonar | 20.63 | 0.0115 | 4.1041

used to evaluate our method. The sample numbers of the training, validation, and testing sets were, respectively, 2000, 600, and 1800. The class number is 2. The variable dimension is 500, so the interactive dimension is 124750. More information about the dataset is available at http://www.nipsfsc.ecs.soton.ac.uk/, where the data can be downloaded and the challenge results, balance error rates, and areas under the curve can be viewed. The model is trained using the training set, the model parameters are selected using the validation set, and the prediction results of the final model on the test set are uploaded online to obtain the classification score of the final model. Our results are shown in Table 6. The results show that our method is slightly better than the lasso and the all-pair lasso. This implies that the interactions may also be important in the Madelon dataset.
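The quoted interactive dimension is just $p(p-1)/2$. A sketch of the corresponding all-pairs feature expansion (a hypothetical helper added for illustration, not part of the paper's code):

```python
from itertools import combinations
import numpy as np

def add_pairwise_interactions(X):
    """Append all pairwise products x_j * x_k (j < k) to the design matrix."""
    n, p = X.shape
    inter = np.column_stack([X[:, j] * X[:, k]
                             for j, k in combinations(range(p), 2)])
    return np.hstack([X, inter])

p = 500
print(p * (p - 1) // 2)   # 124750 candidate interactions for Madelon
```

For Madelon's 500 variables this expansion adds 124750 interaction columns, which is why penalized, sparsity-inducing methods are needed.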

5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data from smartphones [10]. They used the support vector machine (SVM) method to solve the classification problem of daily-life activity recognition. These results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19–48 years participated in the study. Each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, the experiments were recorded on video. The smartphone used in the experiments had a built-in accelerometer and gyroscope measuring 3D linear acceleration and angular velocity. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.

We use these datasets to evaluate our method, taking the upstairs and downstairs movements as the two activity classes. The training sets have 986 and 1073 samples, respectively; the test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features computed from the sensor signals.

The experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods, since it takes the variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.

5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are $n = 200$ and $p = 20$. We take interactions into consideration and provide the following three kinds of simulation based on formula (1).

(1) The real model is hierarchical: $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ or $\beta_k \neq 0$, $j, k = 1, \dots, p$. There are 10 nonzero elements in $\beta$ and 20 nonzero elements in $\Theta$.

(2) The real model only includes interaction variables: $\beta_j = 0$, $j = 1, \dots, p$. There are 20 nonzero elements in $\Theta$.

(3) The real model only includes main variables: $\Theta_{jk} = 0$, $j, k = 1, \dots, p$. There are 10 nonzero elements in $\beta$.

The SNR of the main variables is 1.5 and the SNR of the interaction variables is 1. The experiment was repeated 100 times; the results are shown in Figure 6.
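One possible generator for case (1) is sketched below. This is our own illustration of a hierarchical design; the paper does not publish its data generator, and the SNR scaling described above is omitted here:

```python
import numpy as np

def simulate_hierarchical(n=200, p=20, n_main=10, n_inter=20, seed=0):
    """Simulation sketch for case (1): nonzero interactions Theta_jk occur only
    between variables whose main effects beta_j, beta_k are nonzero."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    support = rng.choice(p, n_main, replace=False)
    beta[support] = rng.standard_normal(n_main)
    Theta = np.zeros((p, p))
    pairs = [(j, k) for j in support for k in support if j < k]
    chosen = rng.choice(len(pairs), size=min(n_inter, len(pairs)), replace=False)
    for idx in chosen:
        j, k = pairs[idx]
        Theta[j, k] = Theta[k, j] = rng.standard_normal()
    # logistic model with linear and quadratic (interaction) terms
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
    return X, y, beta, Theta
```

Cases (2) and (3) follow by setting `beta` or `Theta` to zero, respectively.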

When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the all-pair lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason for this result is that when our method fits the model, the interaction variables are treated as main variables. When the real model only includes main variables, the lasso is the best, our method again takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).

We believe that many actual classification problems could be hierarchical and interactive; they contain both main variables and interaction variables. Our method fits this kind of situation.

6. Conclusion

Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint conditions, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily-life activity identification datasets. The results show that interactions widely exist in classification models and demonstrate that the variable interactions contribute to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net or hierarchical group lasso methods. The application of multisensor interaction to the daily-life activities of the elderly is a new way of using our method.


Table 5: The experimental results of the traditional pattern recognition methods.

Datasets | Classifier | Error rate (%) | SD | Time (s)
Breast-cancer-Wisconsin | SVM | 3.42 | 0.0019 | 1.3092
Breast-cancer-Wisconsin | LDA | 3.96 | 0.0007 | 0.2429
Breast-cancer-Wisconsin | QDA | 4.92 | 0.0014 | 0.2309
Breast-cancer-Wisconsin | K-NN | 4.61 | 0.0030 | 0.6636
Breast-cancer-Wisconsin | DT | 4.99 | 0.0041 | 1.6201
Ionosphere | SVM | 35.83 | 0.0063 | 2.9149
Ionosphere | LDA | 33.10 | 0.0090 | 0.6168
Ionosphere | QDA | 32.01 | 0.0076 | 0.4516
Ionosphere | K-NN | 28.30 | 0.0090 | 0.6363
Ionosphere | DT | 38.23 | 0.0408 | 3.0865
Liver disorders | SVM | 30.80 | 0.0100 | 1.1401
Liver disorders | LDA | 31.31 | 0.0089 | 0.2474
Liver disorders | QDA | 40.19 | 0.0119 | 0.1958
Liver disorders | K-NN | 36.89 | 0.0124 | 0.6089
Liver disorders | DT | 36.94 | 0.0187 | 1.9921
Sonar | SVM | 25.70 | 0.0197 | 1.2181
Sonar | LDA | 25.13 | 0.0145 | 0.5897
Sonar | QDA | 23.80 | 0.0205 | 0.4594
Sonar | K-NN | 13.09 | 0.0101 | 0.5345
Sonar | DT | 35.91 | 0.0303 | 0.7153

Table 6: The experimental results for the Madelon datasets.

Methods | BER training (%) | BER validation (%) | BER testing (%) | AUC training (%) | AUC validation (%) | AUC testing (%)
Lasso | 38.08 | 37.21 | 36.61 | 64.78 | 62.79 | 63.40
All-pair lasso | 34.98 | 37.01 | 36.57 | 65.02 | 62.99 | 63.43
Our method | 30.00 | 36.35 | 35.12 | 70.00 | 68.08 | 67.76

(BER: balance error rate; AUC: area under the curve.)

Appendices

A. Proofs from (5) to (6)

For notational convenience, we write $p_i$ instead of $p(x_i)$. The logarithmic likelihood function of (4) is as follows:

$$L_Q = \frac{1}{N}\ln\left[L(\beta,\Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^{T}\beta + \frac{1}{2}x_i^{T}\Theta x_i\right) + \ln\left(1-p_i\right)\right]. \quad (A.1)$$

First, we give the first- and second-order partial derivatives and the mixed partial derivative of (4) with respect to $\beta$ and $\Theta$.

$$\frac{\partial L_Q}{\partial\beta} = \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^{T}\beta+\frac12 x_i^{T}\Theta x_i\right)+\ln(1-p_i)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\,x_i - \frac{\exp\!\left(x_i^{T}\beta+\frac12 x_i^{T}\Theta x_i\right)}{1+\exp\!\left(x_i^{T}\beta+\frac12 x_i^{T}\Theta x_i\right)}\,x_i\right] = \frac{1}{N}\sum_{i=1}^{N} x_i\,(y_i-p_i),$$

$$\frac{\partial^2 L_Q}{\partial\beta^2} = \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N} x_i^{T}(y_i-p_i) = -\frac{1}{N}\sum_{i=1}^{N} x_i^{T}\,\frac{\partial p_i}{\partial\beta} = -\frac{1}{N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i^{T}x_i.$$

Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Methods | Error rate in training (%) | Training time (s) | Error rate in testing (%) | Testing time (s)
Lasso | 0.00 | 0.2649 | 1.46 | 0.0312
All-pair lasso | 0.00 | 0.8807 | 1.35 | 0.0953
Our method | 0.00 | 0.6012 | 1.12 | 0.0556
SVM | 0.00 | 0.9516 | 1.23 | 0.0936
1-NN | 0.00 | 0.0000 | 9.20 | 0.7956
3-NN | 0.00 | 0.0000 | 8.98 | 0.7488
QDA | 6.61 | 1.4196 | 4.04 | 0.2184
DT | 0.00 | 0.7332 | 14.93 | 0.1092

Figure 6: The error rate of the three lasso methods (Lasso, Our method, All-pair lasso) in the simulation experiment: (a) hierarchical interaction; (b) interaction variables only; (c) main variables only. The Bayes error is shown by the purple dotted line in each panel.

$$\frac{\partial L_Q}{\partial\Theta} = \frac{\partial}{\partial\Theta}\,\frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^{T}\beta+\frac12 x_i^{T}\Theta x_i\right)+\ln(1-p_i)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[\frac12\,y_i\,x_i^{T}x_i - p_i\cdot\frac12\,x_i^{T}x_i\right] = \frac{1}{2N}\sum_{i=1}^{N} x_i^{T}x_i\,(y_i-p_i),$$

$$\frac{\partial^2 L_Q}{\partial\Theta^2} = \frac{\partial}{\partial\Theta}\,\frac{1}{2N}\sum_{i=1}^{N} x_i^{T}x_i\,(y_i-p_i) = -\frac{1}{2N}\sum_{i=1}^{N} x_i^{T}x_i\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{4N}\sum_{i=1}^{N}(1-p_i)\,p_i\,\|x_i\|^{4},$$

$$\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta} = \frac{\partial}{\partial\beta}\,\frac{1}{2N}\sum_{i=1}^{N} x_i^{T}x_i\,(y_i-p_i) = -\frac{1}{N}\sum_{i=1}^{N} x_i^{T}\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{2N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i^{T}x_i\,x_i^{T}. \quad (A.2)$$

Then (A.1) is expanded in a second-order Taylor series about the expansion point $(\tilde\beta,\tilde\Theta)$:

$$l_Q(\beta,\Theta) = L_Q + (\beta-\tilde\beta)\,\frac{\partial L_Q}{\partial\beta} + (\Theta-\tilde\Theta)\,\frac{\partial L_Q}{\partial\Theta} + \frac12\left[(\beta-\tilde\beta)^{2}\,\frac{\partial^{2}L_Q}{\partial\beta^{2}} + 2\,(\beta-\tilde\beta)^{T}(\Theta-\tilde\Theta)\,\frac{\partial^{2}L_Q}{\partial\beta\,\partial\Theta} + (\Theta-\tilde\Theta)^{2}\,\frac{\partial^{2}L_Q}{\partial\Theta^{2}}\right].$$

Substituting the derivatives from (A.2), evaluated at $(\tilde\beta,\tilde\Theta)$, and completing the square yields

$$l_Q(\beta,\Theta) = -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i-\sum_j\beta_j x_{ij}-\frac12\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)^{2} + \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^{T}\tilde\beta+\frac12\,x_i^{T}\tilde\Theta\,x_i\right)+\ln(1-p_i)\right] - \frac{1}{2N}\sum_{i=1}^{N}\left[\frac{y_i-p_i}{p_i\,(1-p_i)}\right]^{2}, \quad (A.3)$$

where

$$\omega_i = p(x_i)\left[1-p(x_i)\right], \qquad z_i = \sum_j\tilde\beta_j x_{ij} + \frac12\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik} + \frac{y_i-p(x_i)}{p(x_i)\left[1-p(x_i)\right]}, \quad (A.4)$$

with $p(x_i)$ evaluated at the expansion point $(\tilde\beta,\tilde\Theta)$.

B. Proofs from (12) to (13)

First, there are three cases in calculating $\hat\beta_j^{+} - \hat\beta_j^{-}$.

(1) $\hat\beta_j^{+} > 0$, $\hat\beta_j^{-} = 0$:

$$\hat\beta_j^{+} - \hat\beta_j^{-} = x_{ij}\left(z_i - A_i\right) + \beta_j^{+} - \beta_j^{-} - \frac{\lambda_1+\hat\alpha_j}{\omega_i}. \quad (B.1)$$

(2) $\hat\beta_j^{+} = 0$, $\hat\beta_j^{-} > 0$:

$$\hat\beta_j^{+} - \hat\beta_j^{-} = x_{ij}\left(z_i - A_i\right) + \beta_j^{+} - \beta_j^{-} + \frac{\lambda_1+\hat\alpha_j}{\omega_i}. \quad (B.2)$$

(3) $\hat\beta_j^{+} = \hat\beta_j^{-} = 0$, which holds when $\left|x_{ij}\left(z_i - A_i\right) + \beta_j^{+} - \beta_j^{-}\right| \le (\lambda_1+\hat\alpha_j)/\omega_i$. (B.3)

We derive

$$\hat\beta_j^{+} - \hat\beta_j^{-} = S\!\left[x_{ij}\left(z_i - A_i\right) + \beta_j^{+} - \beta_j^{-},\ \frac{\lambda_1+\hat\alpha_j}{\omega_i}\right]. \quad (B.4)$$

Secondly, the stationarity condition $\partial L/\partial\Theta_{jk} = 0$ gives

$$\frac12\,\omega_i\cdot 2\left(z_i-\sum_j\beta_j x_{ij}-\frac12\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(-\frac12\,x_{ij}x_{ik}\right) + \left(\lambda_2+\alpha_j\right)U_{jk} = 0,$$

$$\omega_i\left(z_i-\sum_j\beta_j x_{ij}-\frac12\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(\frac12\,x_{ij}x_{ik}\right) = \left(\lambda_2+\alpha_j\right)U_{jk},$$

$$\left(z_i-\sum_j\beta_j x_{ij}-\frac12\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(\frac12\,x_{ij}x_{ik}\right) = \frac{\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}. \quad (B.5)$$

Supposing

$$\gamma^{(-jk)} = z_i-\sum_j\beta_j x_{ij}-\frac12\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik} + \Theta_{jk}\,x_{ij}x_{ik}, \quad (B.6)$$

we have

$$\left(\gamma^{(-jk)} - \Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(x_{ij}x_{ik}\right) = \frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},$$
$$\Theta_{jk}\,x_{ij}x_{ik}\cdot x_{ij}x_{ik} = -\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i} + \gamma^{(-jk)}\cdot x_{ij}x_{ik},$$
$$\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2\left(\lambda_2+\alpha_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}. \quad (B.7)$$

We discuss three cases for the value of $\Theta_{jk}$.

(1) $\Theta_{jk} > 0$, $U_{jk} = 1$:

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}. \quad (B.8)$$

(2) $\Theta_{jk} < 0$, $U_{jk} = -1$:

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} + 2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}. \quad (B.9)$$

(3) $\Theta_{jk} = 0$.

We derive

$$\Theta_{jk} = \frac{S\!\left[\gamma^{(-jk)}\cdot x_{ij}x_{ik},\ 2\left(\lambda_2+\alpha_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^{2}}. \quad (B.10)$$

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.

[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.

[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.

[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.

[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.

[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.

[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.

[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.

[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.

[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.

[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.

[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.

[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.

[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.

[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.

[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.

[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.

[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.

[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.


We use the second-order Taylor expansion of (5) at the current estimated value $(\tilde\beta,\tilde\Theta)$ and obtain the following subproblem:

$$l_Q(\beta,\Theta) = -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i-\sum_j\beta_j x_{ij}-\frac12\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)^{2} - \frac{1}{2N}\sum_{i=1}^{N}\left[\frac{y_i-\tilde p_i}{\tilde p_i\,(1-\tilde p_i)}\right]^{2} + \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(\sum_j\tilde\beta_j x_{ij}+\frac12\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik}\right)+\ln\left(1-\tilde p_i\right)\right], \quad (6)$$

where $\omega_i = \tilde p_i\,(1-\tilde p_i)$, $z_i = \sum_j\tilde\beta_j x_{ij} + \frac12\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik} + (y_i-\tilde p_i)/(\tilde p_i\,(1-\tilde p_i))$, and $\tilde p_i$ denotes the probability $p(x_i)$ evaluated at the current estimate $(\tilde\beta,\tilde\Theta)$.
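In code, the working weights and responses of this quadratic approximation might be computed as follows (a sketch under the definitions above; variable names are ours):

```python
import numpy as np

def working_response(X, Theta, beta, y):
    """IRLS quantities for the quadratic approximation (6):
    eta_i = x_i^T beta + 0.5 * x_i^T Theta x_i,  p_i = 1/(1+exp(-eta_i)),
    w_i = p_i (1 - p_i),  z_i = eta_i + (y_i - p_i) / w_i."""
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    z = eta + (y - p) / w
    return w, z
```

These `w` and `z` are then held fixed while the penalized weighted least squares subproblem is solved for the coefficients.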

The proof that (5) implies (6) is presented in Appendix A.

In order to obtain a sparse solution for the main-variable coefficients and the interaction coefficients, a penalty function is used to enhance the stability of the interactive model:

$$(\hat\beta,\hat\Theta) = \arg\min_{\beta\in\mathbb{R}^{p+1},\,\Theta\in\mathbb{R}^{p\times p}}\ -l_Q(\beta,\Theta) + \lambda_1\|\beta\|_1 + \lambda_2\|\Theta\|_1. \quad (7)$$

We focus on those interactions that have large main-variable coefficients. Such restrictions are known as the "hierarchy." Their mathematical expression is $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ or $\beta_k \neq 0$. So we add the constraints enforcing the hierarchy into (7) as follows:

$$(\hat\beta,\hat\Theta) = \arg\min_{\beta\in\mathbb{R}^{p+1},\,\Theta\in\mathbb{R}^{p\times p}}\ -l_Q(\beta,\Theta) + \lambda_1\|\beta\|_1 + \lambda_2\|\Theta\|_1 \qquad \text{s.t. } \|\Theta_j\|_1 \le |\beta_j| \ \text{ for } j = 1,\dots,p, \quad (8)$$

where Θ_j is the jth column of Θ. If Θ_jk ≠ 0, then ‖Θ_j‖₁ > 0 and ‖Θ_k‖₁ > 0, so β_j ≠ 0 and β_k ≠ 0. The new constraint guarantees the hierarchy, but we cannot obtain a convex solution because (8) is not convex. So, instead of β, we use β⁺ and β⁻. The corresponding convex relaxation of (8) is as follows:

\[
(\hat\beta^\pm,\hat\Theta) = \min_{\beta_0\in R,\ \beta^\pm\in R^{p},\ \Theta\in R^{p\times p}} -l_Q(\beta^+ - \beta^-,\Theta) + \lambda_1 1^T(\beta^+ + \beta^-) + \lambda_2\sum_j\|\Theta_j\|_1
\]
\[
\text{s.t.}\ \|\Theta_j\|_1 \le \beta_j^+ + \beta_j^-,\quad \beta_j^+ \ge 0,\ \beta_j^- \ge 0,\quad \text{for}\ j = 1,\dots,p, \tag{9}
\]

where β = β⁺ − β⁻, β^± = max(±β, 0), β⁺, β⁻ ∈ R^{p+1}, and ‖β‖₁ = 1ᵀ(β⁺ + β⁻).

4. Coordinate Descent Algorithm and KKT Conditions

The basic idea of the coordinate descent algorithm is to convert a multivariate problem into multiple single-variable subproblems: only one coordinate is optimized at a time, and the solution is updated cyclically. We solve (9) using the coordinate descent algorithm.
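As a minimal illustration of this idea (not the authors' solver), the following Python sketch minimizes a simple coupled quadratic by cyclically solving each one-dimensional subproblem in closed form; the objective and starting point are illustrative choices only.

```python
# Minimal cyclic coordinate descent sketch (illustrative, not the paper's solver).
# We minimize f(x, y) = x^2 + y^2 + x*y - 3*x - 3*y, whose unique minimizer is (1, 1).
# Each coordinate subproblem is solved exactly: df/dx = 0 gives x = (3 - y)/2, etc.

def coordinate_descent(n_sweeps=50):
    x, y = 0.0, 0.0                      # starting point
    for _ in range(n_sweeps):
        x = (3.0 - y) / 2.0              # exact minimizer of f(., y)
        y = (3.0 - x) / 2.0              # exact minimizer of f(x, .)
    return x, y

x, y = coordinate_descent()
print(x, y)  # converges to (1.0, 1.0)
```

The coordinate error is halved at every half-step, so the cycle converges geometrically here; the same cyclic pattern, with soft-threshold updates in place of the closed-form minimizers, drives the solver for (9).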

The Lagrange function corresponding to (9) is as follows:

\[
L(\beta^+,\beta^-,\Theta) = -l_Q(\beta^+ - \beta^-,\Theta) + (\lambda_1 1 - \alpha - \gamma^+)^T\beta^+ + (\lambda_1 1 - \alpha - \gamma^-)^T\beta^- + \langle \operatorname{diag}(\lambda_2 1 + \alpha)U,\ \Theta\rangle, \tag{10}
\]

where

\[
U_{jk} =
\begin{cases}
\in [-1, 1], & \Theta_{jk} = 0, \\
\operatorname{sign}(\Theta_{jk}), & \Theta_{jk} \neq 0,
\end{cases} \tag{11}
\]

and α and γ^± are the dual variables corresponding to the hierarchical constraints and the nonnegativity constraints. Formula (10) can be decomposed into p subproblems:

\[
L(\beta_j^+,\beta_j^-,\Theta_j) = -l_Q(\beta_j^+ - \beta_j^-,\Theta_j) + (\lambda_1 - \alpha_j - \gamma^+)^T\beta_j^+ + (\lambda_1 - \alpha_j - \gamma^-)^T\beta_j^- + \langle(\lambda_2 + \alpha_j)U_j,\ \Theta_j\rangle. \tag{12}
\]

The solution of (12), as a convex problem, can be obtained from a set of optimality conditions known as the KKT (Karush-Kuhn-Tucker) conditions. This is the key advantage of our approach.

The stationarity conditions of (12) according to the KKT theory are ∂L/∂β_j^± = 0 and ∂L/∂Θ_jk = 0. The complementary slackness and feasibility conditions are ‖Θ_j‖₁ ≤ β_j⁺ + β_j⁻, γ_j^± β_j^± = 0, β^± ≥ 0, γ^± ≥ 0, α ≥ 0, and α_j(‖Θ_j‖₁ − β_j⁺ − β_j⁻) = 0. We assume that (1/N)Σ_{i=1}^N x_ij² = 1. For our problem, the KKT conditions can be written as follows:

\[
\beta_j^+ - \beta_j^- = S\Big[x_{ij}^T(z_i - A_i) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i}\Big],
\]
\[
\Theta_{jk} = \frac{S\big[(x_{ij}x_{ik})^T(z_i - A_i + \Theta_{jk}x_{ij}x_{ik}),\ 2(\lambda_2 + \alpha_j)/\omega_i\big]}{\|x_{ij}x_{ik}\|^2}, \tag{13}
\]

where A_i = Σ_j β_j x_ij + (1/2)Σ_{j≠k} Θ_jk x_ij x_ik, z_i and ω_i are from (6), and S denotes the soft-threshold operator defined by S(c, λ) = sign(c)(|c| − λ)₊.
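The soft-threshold operator has a direct transcription in code; a small Python sketch of S(c, λ) = sign(c)(|c| − λ)₊, the primitive used by every coefficient update in (13) and (15):

```python
# Soft-threshold operator S(c, lam) = sign(c) * max(|c| - lam, 0).
# It shrinks c toward zero by lam and sets it exactly to zero when |c| <= lam.

def soft_threshold(c, lam):
    if c > lam:
        return c - lam
    if c < -lam:
        return c + lam
    return 0.0

print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(-3.0, 1.0))  # -2.0
print(soft_threshold(0.5, 1.0))   # 0.0  (small coefficients are set exactly to zero)
```

The exact-zero case is what produces sparse coefficient vectors under the L1 penalty.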


Table 1: Datasets information.

  #  Datasets                 Samples  Variable dimension  Interactive dimension  Class
  1  Breast-cancer-Wisconsin  683      9                   36                     2
  2  Ionosphere               351      33                  528                    2
  3  Liver disorders          345      6                   15                     2
  4  Sonar                    208      60                  1770                   2
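The interactive dimension in Table 1 is the number of unordered pairs of the p main variables, p(p − 1)/2; a quick Python check against the table values:

```python
# Interactive dimension = number of unordered pairs of the p main variables.
from math import comb

for name, p in [("Breast-cancer-Wisconsin", 9), ("Ionosphere", 33),
                ("Liver disorders", 6), ("Sonar", 60)]:
    print(name, comb(p, 2))  # 36, 528, 15, 1770 -- matches Table 1
```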

Table 2: The experimental results of our method.

  Datasets                 Coefficients  Error rate (%)  SD    Time (s)  λ
  Breast-cancer-Wisconsin  22            3.0             0.01  9.49      3.14
  Ionosphere               101           28.0            0.02  124.63    2.82
  Liver disorders          25            26.0            0.02  12.11     2.37
  Sonar                    117           14.0            0.02  150.69    0.73

The proof of both expressions in (13) can be found in Appendix B.

Now we define f(α_j) = ‖Θ_j‖₁ − β_j⁺ − β_j⁻. Then

\[
f(\alpha_j) = \sum_{k=1}^{p} \frac{\Big|S\big[(x_{ij}x_{ik})^T(z_i - A_i + \Theta_{jk}x_{ij}x_{ik}),\ 2(\lambda_2 + \alpha_j)/\omega_i\big]\Big|}{\|x_{ij}x_{ik}\|^2} - \Big|S\Big[x_{ij}^T(z_i - A_i) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i}\Big]\Big|. \tag{14}
\]

The remaining KKT conditions only involve α_j: either f(α̂_j) = 0, or f(0) ≤ 0 with α̂_j = 0 (and α̂_j ≥ 0). Observing that f is nonincreasing and piecewise linear with respect to α_j, it is easy to obtain the solution for α̂_j.

In conclusion, the overall idea of the coordinate descent algorithm is that the minimization of (9) is equivalent to the minimization of (10). Formula (10) can be decomposed into p independent subproblems of the form (12). Formula (12) can be solved as in (13). The final coefficient optimization iteration formula is as follows:

\[
\beta_j^{+(m+1)} - \beta_j^{-(m+1)} = S\Big[x_{ij}^T(z_i^m - A_i^m) + \beta_j^{+(m)} - \beta_j^{-(m)},\ \frac{\lambda_1 + \alpha_j^m}{\omega_i^m}\Big],
\]
\[
\Theta_{jk}^{m+1} = \frac{S\big[(x_{ij}x_{ik})^T(z_i^m - A_i^m + \Theta_{jk}^m x_{ij}x_{ik}),\ 2(\lambda_2 + \alpha_j^m)/\omega_i^m\big]}{\|x_{ij}x_{ik}\|^2}, \tag{15}
\]

where β_j^{+(m)} − β_j^{−(m)} is the estimate of the jth main-variable coefficient after m iterations and Θ_jk^m is the estimate of the interaction coefficient between the jth variable and the kth variable after m iterations.
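To make the update pattern concrete, here is a hedged Python sketch of cyclic soft-threshold updates for a plain lasso least-squares problem, a simplified stand-in for one quadratic subproblem (main effects only, no hierarchy constraint; the toy data and λ are illustrative, and the columns are standardized so that (1/N)Σᵢ x²ᵢⱼ = 1, as assumed above):

```python
# Cyclic coordinate descent with soft-thresholding (lasso least squares),
# a simplified stand-in for the quadratic subproblem solved at each step.
# Assumes columns satisfy (1/N) * sum_i x_ij^2 = 1, as in Section 4.

def soft(c, lam):
    return (c - lam) if c > lam else (c + lam) if c < -lam else 0.0

def lasso_cd(X, y, lam, n_sweeps=20):
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)                                   # residual y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            # partial residual correlation plus the current coefficient
            c = sum(X[i][j] * r[i] for i in range(n)) / n + beta[j]
            b_new = soft(c, lam)
            if b_new != beta[j]:
                for i in range(n):                # update residual in place
                    r[i] -= X[i][j] * (b_new - beta[j])
                beta[j] = b_new
    return beta

# Orthogonal toy design: columns (1,1,1,1) and (1,-1,1,-1), y = 2*x1 - x2.
X = [[1, 1], [1, -1], [1, 1], [1, -1]]
y = [1, 3, 1, 3]
print(lasso_cd(X, y, lam=0.5))  # [1.5, -0.5]: each coefficient shrunk by lam
```

The full method of (15) additionally cycles over the interaction coefficients Θ_jk and adjusts the thresholds by the dual variables α_j that enforce the hierarchy.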

5. The Experimental Results and Analysis

5.1. The Experimental Results and Analysis of Four UCI Datasets. Four datasets from the UCI repository are used: the breast-cancer-Wisconsin, Ionosphere, Liver disorders, and Sonar datasets, as shown in Table 1.

We perform the 10-fold cross-validation (10-CV) experiments 20 times using R, where λ₁ = 2λ₂ = λ. Besides, we complete an experiment employing the interactive hierarchical lasso logistic regression method. The results include the number of nonzero variable coefficients, the average error rate of the 10-CV, the standard deviation (SD), the CPU time, and the estimated value of lambda (λ). The results are shown in Table 2.
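The 10-CV protocol itself is independent of the classifier; a Python sketch of the fold construction and error-rate aggregation (the "classifier" here is a placeholder majority-vote rule, not the paper's model):

```python
# 10-fold cross-validation skeleton: split the indices, train on 9 folds,
# test on 1, and average the error rate. The classifier is a placeholder
# (majority vote over the training labels), standing in for the real model.
import random

def k_fold_error(xs, ys, k=10, seed=0):
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]         # k disjoint folds
    errs = []
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        # placeholder "fit": predict the majority class of the training labels
        majority = max(set(ys[i] for i in train), key=[ys[i] for i in train].count)
        errs.append(sum(ys[i] != majority for i in test) / len(test))
    return sum(errs) / k

ys = [0] * 70 + [1] * 30
xs = list(range(100))
print(round(k_fold_error(xs, ys), 2))  # 0.3 for this 70/30 label split
```

Repeating this whole procedure 20 times with different shuffles, and over a grid of λ values, yields the averages and standard deviations reported in Table 2.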

The results of the 10-CV on the four datasets using the proposed method are presented in Figures 2 to 5. In the figures, the horizontal axis represents the logarithmic value of λ and the vertical axis is the error rate of the 10-CV. Besides, the horizontal axis at the top of each figure gives the number of nonzero variable coefficients corresponding to the λ value.

The results for the breast-cancer-Wisconsin dataset are shown in Figure 2. The minimum error rate is 0.03, and the number of selected variables is more than 11. The results for the Ionosphere dataset are shown in Figure 3. When the number of selected variables is 101, the lowest error rate is 0.28, with a smaller standard deviation. The number of selected variables is larger than the original dimension, so the interactions provide classification information. The results for the Liver disorders dataset are presented in Figure 4. If the number of selected variables is 25, the lowest error rate reaches 0.26, while the standard deviation is 0.02. Finally, the results for the Sonar dataset are presented in Figure 5. When more than 80 variables are selected, the minimum error rate is 0.14.


Figure 2: The results of breast-cancer-Wisconsin. (10-CV error rate versus log(λ); top axis: number of selected features.)

Figure 3: The results of Ionosphere. (10-CV error rate versus log(λ); top axis: number of selected features.)

In what follows, we compare our method to the existing literature [13]. The classification results and training time of our method are better than those reported in [13]. The experimental results of the lasso, the all-pair lasso, and the conventional pattern recognition methods, with 10-fold cross-validation repeated 20 times on the four UCI datasets, are listed in Tables 3, 4, and 5, respectively. The conventional pattern recognition methods include the support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighbor (K-NN), and decision tree (DT) methods. The lasso is a method that considers the main variables without the interaction variables. The all-pair lasso is a method that considers the main variables and interaction variables but without the hierarchy. The experimental

Figure 4: The results of Liver disorders. (10-CV error rate versus log(λ); top axis: number of selected features.)

Figure 5: The results of Sonar. (10-CV error rate versus log(λ); top axis: number of selected features.)

Table 3: The experimental results of the lasso penalized logistic regression model.

  Datasets                 Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  3.22            0.0001  0.3744
  Ionosphere               31.15           0.0075  0.4863
  Liver disorders          30.77           0.0077  0.0908
  Sonar                    22.13           0.0121  15.6806

results show that our model achieves better classification results and is more stable. This highlights the advantage of the variable interactions and hierarchy.

5.2. The Experimental Results for High-Dimensional Small-Sample Data. The Madelon dataset from NIPS2003 was


Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

  Datasets                 Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  2.93            0.0001  12.1213
  Ionosphere               27.00           0.0089  3.1734
  Liver disorders          26.36           0.0058  4.9190
  Sonar                    20.63           0.0115  4.1041

used to evaluate our method. The sample numbers of the training, validation, and testing sets were 2000, 600, and 1800, respectively. The class number is 2. The variable dimension is 500, so the interactive dimension is 124,750. More information about the dataset is available at http://www.nipsfsc.ecs.soton.ac.uk, where the datasets can be downloaded and the challenge results (balanced error rates and the area under the curve) can be viewed. The model is trained using the training set. The model parameters are selected using the validation set. The prediction results of the final model on the test set are uploaded online, and the classification score of the final model is obtained. Our results are shown in Table 6. The results show that our method is slightly better than the lasso and the all-pair lasso. This implies that the interactions may also be important in the Madelon dataset.

5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data of smartphones [10]. They used the support vector machine (SVM) method to solve the classification problem of daily life activity recognition. Such results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19-48 years participated in the study. Each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, the experiments were video-recorded. The smartphone used in the experiments had a built-in accelerometer and gyroscope for measuring 3D linear acceleration and angular velocity. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.

We use these datasets to evaluate our method. We use the upstairs and downstairs movements as the two activity classes. The training sets have 986 and 1073 samples, respectively. The test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features from the sensor signals.

The experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods since it takes variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.

5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are n = 200 and p = 20. We take interactions into consideration and provide the following three kinds of simulation based on formula (1).

(1) The real model is hierarchical: Θ_jk ≠ 0 ⇒ β_j ≠ 0 and β_k ≠ 0, j, k = 1,…,p. There are 10 nonzero elements in β and 20 nonzero elements in Θ.

(2) The real model only includes interaction variables: β_j = 0, j = 1,…,p. There are 20 nonzero elements in Θ.

(3) The real model only includes main variables: Θ_jk = 0, j, k = 1,…,p. There are 10 nonzero elements in β.

The SNR of the main variables is 15, and the SNR of the interaction variables is 1. The results of 100 repetitions of the experiment are shown in Figure 6.
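A hedged Python sketch of how data for scenario (1) can be generated (the coefficient magnitudes, the logistic link for y, and the seed are our own illustrative assumptions; the paper does not specify them):

```python
# Generate one hierarchical simulation dataset in the spirit of scenario (1):
# n = 200, p = 20, 10 nonzero main effects, and 20 nonzero interactions whose
# endpoints are all among the nonzero main effects (strong hierarchy).
# Coefficient magnitudes and the logistic link are illustrative assumptions.
import random, math

def make_hierarchical_data(n=200, p=20, seed=1):
    rng = random.Random(seed)
    support = rng.sample(range(p), 10)                    # indices with beta_j != 0
    beta = {j: rng.choice([-1.0, 1.0]) for j in support}
    pairs = [(j, k) for j in support for k in support if j < k]
    theta = {jk: rng.choice([-0.5, 0.5]) for jk in rng.sample(pairs, 20)}
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        eta = sum(b * x[j] for j, b in beta.items())
        eta += sum(t * x[j] * x[k] for (j, k), t in theta.items())
        prob = 1.0 / (1.0 + math.exp(-eta))
        X.append(x)
        y.append(1 if rng.random() < prob else 0)
    return X, y, beta, theta

X, y, beta, theta = make_hierarchical_data()
# every interaction respects the hierarchy
print(all(j in beta and k in beta for (j, k) in theta))  # True
```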

When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the interactive lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason for this result is that, when our method fits the model, the interaction variables are considered to be main variables. When the real model only includes main variables, the lasso is the best, our method still takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).

We believe that many actual classification problems could be hierarchical and interactive: they contain both main variables and interaction variables. Our method fits this kind of situation.

6. Conclusion

Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily life activity recognition datasets. The results show that interactions widely exist in classification models. They also demonstrate that the variable interactions contribute to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net and hierarchical group lasso methods. The application of multisensor interaction to the daily life activities of the elderly is a new way of using our method.


Table 5: The experimental results of the traditional pattern recognition methods.

  Datasets                 Classifier  Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  SVM         3.42            0.0019  1.3092
                           LDA         3.96            0.0007  0.2429
                           QDA         4.92            0.0014  0.2309
                           K-NN        4.61            0.0030  0.6636
                           DT          4.99            0.0041  1.6201
  Ionosphere               SVM         35.83           0.0063  2.9149
                           LDA         33.10           0.0090  0.6168
                           QDA         32.01           0.0076  0.4516
                           K-NN        28.30           0.0090  0.6363
                           DT          38.23           0.0408  3.0865
  Liver disorders          SVM         30.80           0.0100  1.1401
                           LDA         31.31           0.0089  0.2474
                           QDA         40.19           0.0119  0.1958
                           K-NN        36.89           0.0124  0.6089
                           DT          36.94           0.0187  1.9921
  Sonar                    SVM         25.70           0.0197  1.2181
                           LDA         25.13           0.0145  0.5897
                           QDA         23.80           0.0205  0.4594
                           K-NN        13.09           0.0101  0.5345
                           DT          35.91           0.0303  0.7153

Table 6: The experimental results for the Madelon datasets.

  Methods         Balanced error rate (%)        Area under the curve (%)
                  Training  Validation  Testing  Training  Validation  Testing
  Lasso           38.08     37.21       36.61    64.78     62.79       63.40
  All-pair lasso  34.98     37.01       36.57    65.02     62.99       63.43
  Our method      30.00     36.35       35.12    70.00     68.08       67.76

Appendices

A. Proofs from (5) to (6)

For notational convenience, we write p_i instead of p(x_i). The logarithmic likelihood function of (4) is as follows:

\[
L_Q = \frac{1}{N}\ln[L(\beta,\Theta)] = \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\Big) + \ln(1 - p_i)\Big]. \tag{A.1}
\]

First, we give the first- and second-order partial derivatives and the mixed partial derivative of L_Q with respect to β and Θ:

\[
\frac{\partial L_Q}{\partial\beta} = \frac{1}{N}\sum_{i=1}^{N}\Big[x_i y_i - \frac{\exp(x_i^T\beta + (1/2)x_i^T\Theta x_i)}{1 + \exp(x_i^T\beta + (1/2)x_i^T\Theta x_i)}\,x_i\Big] = \frac{1}{N}\sum_{i=1}^{N} x_i\,(y_i - p_i),
\]
\[
\frac{\partial^2 L_Q}{\partial\beta^2} = -\frac{1}{N}\sum_{i=1}^{N} x_i\,\frac{\partial p_i}{\partial\beta^T} = -\frac{1}{N}\sum_{i=1}^{N}(1 - p_i)\,p_i\,x_i x_i^T,
\]
\[
\frac{\partial L_Q}{\partial\Theta}
\]


Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

  Methods         Error rate in training (%)  Training time (s)  Error rate in testing (%)  Testing time (s)
  Lasso           0.00                        0.2649             1.46                       0.0312
  All-pair lasso  0.00                        0.8807             1.35                       0.0953
  Our method      0.00                        0.6012             1.12                       0.0556
  SVM             0.00                        0.9516             1.23                       0.0936
  1-NN            0.00                        0.0000             9.20                       0.7956
  3-NN            0.00                        0.0000             8.98                       0.7488
  QDA             6.61                        1.4196             4.04                       0.2184
  DT              0.00                        0.7332             14.93                      0.1092

Figure 6: The error rate of the three lasso methods in the simulation experiment (the Bayes error is shown by the purple dotted line in the graph). (a) Hierarchical interaction; (b) interaction variables only; (c) main variables only.


\[
= \frac{1}{N}\sum_{i=1}^{N}\Big[\frac{1}{2}y_i\,x_i x_i^T - \frac{\exp(x_i^T\beta + (1/2)x_i^T\Theta x_i)}{1 + \exp(x_i^T\beta + (1/2)x_i^T\Theta x_i)}\cdot\frac{1}{2}\,x_i x_i^T\Big] = \frac{1}{2N}\sum_{i=1}^{N} x_i x_i^T\,(y_i - p_i),
\]

\[
\frac{\partial^2 L_Q}{\partial\Theta^2} = -\frac{1}{2N}\sum_{i=1}^{N} x_i x_i^T\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{4N}\sum_{i=1}^{N}(1 - p_i)\,p_i\,\|x_i\|^2,
\]

\[
\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta} = -\frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{2N}\sum_{i=1}^{N}(1 - p_i)\,p_i\,x_i^T\,x_i\,x_i^T. \tag{A.2}
\]

Then (A.1) is expanded by using a Taylor series with respect to the expansion point (β̃, Θ̃):

\[
l_Q(\beta,\Theta) = L_Q + (\beta - \tilde\beta)^T\frac{\partial L_Q}{\partial\beta} + \Big\langle\Theta - \tilde\Theta,\ \frac{\partial L_Q}{\partial\Theta}\Big\rangle + \frac{1}{2}\Big[(\beta - \tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta^2}(\beta - \tilde\beta) + 2(\beta - \tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}(\Theta - \tilde\Theta) + \Big\langle\Theta - \tilde\Theta,\ \frac{\partial^2 L_Q}{\partial\Theta^2}(\Theta - \tilde\Theta)\Big\rangle\Big].
\]

Substituting the derivatives in (A.2) and completing the square gives

\[
l_Q(\beta,\Theta) = -\frac{1}{2N}\sum_{i=1}^{N}(1 - p_i)\,p_i\Big[x_i^T(\beta - \tilde\beta) + \frac{1}{2}x_i^T(\Theta - \tilde\Theta)x_i - \frac{y_i - p_i}{p_i(1 - p_i)}\Big]^2 + \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde\beta + \frac{1}{2}x_i^T\tilde\Theta x_i\Big) + \ln(1 - p_i)\Big] - \frac{1}{2N}\sum_{i=1}^{N}\Big[\frac{y_i - p_i}{p_i(1 - p_i)}\Big]^2
\]
\[
= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\Big(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)^2 + \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde\beta + \frac{1}{2}x_i^T\tilde\Theta x_i\Big) + \ln(1 - p_i)\Big] - \frac{1}{2N}\sum_{i=1}^{N}\Big[\frac{y_i - p_i}{p_i(1 - p_i)}\Big]^2, \tag{A.3}
\]

where

\[
\omega_i = p(x_i)[1 - p(x_i)],\qquad
z_i = \sum_j \tilde\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}x_{ij}x_{ik} + \frac{y_i - p(x_i)}{p(x_i)[1 - p(x_i)]}. \tag{A.4}
\]

B. Proofs from (12) to (13)

First, there are three cases in calculating β_j⁺ − β_j⁻.

(1) β_j⁺ > 0, β_j⁻ = 0:
\[
\beta_j^+ - \beta_j^- = x_{ij}^T(z_i - A_i) + \beta_j^+ - \beta_j^- - \frac{\lambda_1 + \alpha_j}{\omega_i}. \tag{B.1}
\]

(2) β_j⁺ = 0, β_j⁻ > 0:
\[
\beta_j^+ - \beta_j^- = x_{ij}^T(z_i - A_i) + \beta_j^+ - \beta_j^- + \frac{\lambda_1 + \alpha_j}{\omega_i}. \tag{B.2}
\]

(3) β_j⁺ = β_j⁻ = 0, which holds when
\[
\Big|x_{ij}^T(z_i - A_i) + \beta_j^+ - \beta_j^-\Big| \le \frac{\lambda_1 + \alpha_j}{\omega_i}. \tag{B.3}
\]

Combining the three cases, we derive
\[
\beta_j^+ - \beta_j^- = S\Big[x_{ij}^T(z_i - A_i) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i}\Big]. \tag{B.4}
\]

Secondly, we give the following by the stationarity condition ∂L/∂Θ_jk = 0:

\[
\frac{1}{2}\omega_i\cdot 2\Big(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\cdot\Big(-\frac{1}{2}x_{ij}x_{ik}\Big) + (\lambda_2 + \alpha_j)U_{jk} = 0,
\]

so

\[
\Big(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\cdot\Big(\frac{1}{2}x_{ij}x_{ik}\Big) = \frac{(\lambda_2 + \alpha_j)U_{jk}}{\omega_i}. \tag{B.5}
\]

Supposing

\[
\gamma^{(-jk)} = z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik} + \Theta_{jk}x_{ij}x_{ik}, \tag{B.6}
\]

we have

\[
\big(\gamma^{(-jk)} - \Theta_{jk}x_{ij}x_{ik}\big)\cdot(x_{ij}x_{ik}) = \frac{2(\lambda_2 + \alpha_j)U_{jk}}{\omega_i},
\]
\[
\Theta_{jk}\,x_{ij}x_{ik}\,x_{ij}x_{ik} = -\frac{2(\lambda_2 + \alpha_j)U_{jk}}{\omega_i} + \gamma^{(-jk)}\cdot x_{ij}x_{ik},
\]
\[
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2(\lambda_2 + \alpha_j)U_{jk}/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.7}
\]

We discuss three cases for the value of Θ_jk.

(1) Θ_jk > 0, U_jk = 1:
\[
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2(\lambda_2 + \alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.8}
\]

(2) Θ_jk < 0, U_jk = −1:
\[
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} + 2(\lambda_2 + \alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.9}
\]

(3) Θ_jk = 0, which holds when |γ^{(−jk)}·x_ij x_ik| ≤ 2(λ₂ + α_j)/ω_i.

We derive

\[
\Theta_{jk} = \frac{S\big[\gamma^{(-jk)}\cdot x_{ij}x_{ik},\ 2(\lambda_2 + \alpha_j)/\omega_i\big]}{(x_{ij}x_{ik})^2}. \tag{B.10}
\]
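The three-case analysis collapses to the familiar scalar fact that the minimizer over θ of (ω/2)(a − θ)² + λ|θ| is S(a, λ/ω); a brute-force Python check of this one-dimensional prototype (the values of ω, a, and λ are arbitrary):

```python
# Brute-force check that the three-case analysis collapses to soft-thresholding:
# argmin over theta of (omega/2)*(a - theta)^2 + lam*|theta| equals S(a, lam/omega).

def S(c, lam):
    return (c - lam) if c > lam else (c + lam) if c < -lam else 0.0

def brute_argmin(omega, a, lam, lo=-3.0, hi=3.0, steps=60001):
    grid = [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]
    return min(grid, key=lambda th: 0.5 * omega * (a - th) ** 2 + lam * abs(th))

for omega, a, lam in [(2.0, 1.3, 1.0), (2.0, -1.3, 1.0), (2.0, 0.3, 1.0)]:
    closed = S(a, lam / omega)
    print(abs(brute_argmin(omega, a, lam) - closed) < 1e-3)  # True, True, True
```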

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267-288, 1996.

[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659-677, 2007.

[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714-721, 2009.

[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1-22, 2010.

[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.

[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187-198, 2008.

[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.

[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275-285, 2010.

[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945-974, 2012.

[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430-439, 2007.

[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17-36, 1996.

[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354-364, 2010.

[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111-1141, 2013.

[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49-67, 2006.

[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777-2824, 2011.

[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541-1553, 2010.

[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450-468, 2012.

[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475-511, 2003.

[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694-708, 2014.

[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080-5083, 2009.


4 Mathematical Problems in Engineering

Table 1: Datasets information.

      Datasets                   Samples  Variable dimension  Interactive dimension  Class
    1 Breast-cancer-Wisconsin    683        9                   36                   2
    2 Ionosphere                 351       33                  528                   2
    3 Liver disorders            345        6                   15                   2
    4 Sonar                      208       60                 1770                   2
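The "Interactive dimension" column in Table 1 is the number of unordered variable pairs, p(p − 1)/2, for p main variables. A quick check of the four entries:

```python
# Interactive dimension = number of unordered variable pairs, p(p-1)/2.
def interactive_dimension(p):
    return p * (p - 1) // 2

# The four datasets of Table 1 (name: main-variable dimension p).
datasets = {"Breast-cancer-Wisconsin": 9, "Ionosphere": 33,
            "Liver disorders": 6, "Sonar": 60}
for name, p in datasets.items():
    print(name, interactive_dimension(p))  # 36, 528, 15, 1770, as in Table 1
```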

Table 2: The experimental results of our method.

    Datasets                 Coefficients  Error rate (%)  SD     Time (s)  λ
    Breast-cancer-Wisconsin   22            3.0            0.01     9.49    3.14
    Ionosphere               101           28.0            0.02   124.63    2.82
    Liver disorders           25           26.0            0.02    12.11    2.37
    Sonar                    117           14.0            0.02   150.69    0.73

The proof of both expressions in (13) can be found in Appendix B.

Now we define f(α_j) = ‖Θ̂_j‖₁ − β̂_j⁺ − β̂_j⁻. Then

\[
f\left(\alpha_j\right)
= \left\| \sum_{k=1}^{p} \frac{S\!\left[\left(x_{ij}x_{ik}\right)^{T}\left(z_i - A_i + \Theta_{jk}x_{ij}x_{ik}\right),\; \frac{2\left(\lambda_2+\alpha_j\right)}{\omega_i}\right]}{\left\|x_{ij}x_{ik}\right\|_2^{2}} \right\|_1
- \left| S\!\left[x_{ij}\left(z_i - A_i\right) + \beta_j^{+} - \beta_j^{-},\; \frac{\lambda_1+\alpha_j}{\omega_i}\right] \right|.
\tag{14}
\]

The remaining KKT conditions only involve f(α̂_j) α̂_j = 0, f(α̂_j) ≤ 0, and α̂_j ≥ 0. Observing that f is nonincreasing and piecewise linear with respect to α_j, it is easy to obtain the solution for α̂_j.
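Both (13)–(15) and the search for α̂_j rest on the scalar soft-thresholding operator S[x, t] = sign(x) max(|x| − t, 0) and on f being nonincreasing and piecewise linear. A minimal sketch (the bisection bracket and tolerance are assumptions; an exact breakpoint search over the piecewise-linear f would also work):

```python
import math

def soft_threshold(x, t):
    # S[x, t] = sign(x) * max(|x| - t, 0), with threshold t >= 0
    return math.copysign(max(abs(x) - t, 0.0), x)

def solve_alpha(f, hi=1e6, tol=1e-10):
    """Root of a nonincreasing function f on [0, hi] by bisection.
    If f(0) <= 0, the KKT conditions already hold with alpha = 0."""
    if f(0.0) <= 0.0:
        return 0.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```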

In conclusion, the overall idea of the coordinate descent algorithm is that the minimization of (9) is equivalent to the minimization of (10). Formula (10) can be decomposed into p independent subproblems of the form (12), and each subproblem (12) can be solved as in (13). The final coefficient optimization iteration formula is as follows:

\[
\beta_j^{+(m+1)} - \beta_j^{-(m+1)}
= S\!\left[x_{ij}\left(z_i^{m} - A_i^{m}\right) + \beta_j^{+(m)} - \beta_j^{-(m)},\; \frac{\lambda_1 + \alpha_j^{m}}{\omega_i^{m}}\right],
\]
\[
\Theta_{jk}^{m+1}
= \frac{S\!\left[\left(x_{ij}x_{ik}\right)\left(z_i^{m} - A_i^{m} + \Theta_{jk}^{m}x_{ij}x_{ik}\right),\; \frac{2\left(\lambda_2 + \alpha_j^{m}\right)}{\omega_i^{m}}\right]}{\left\|x_{ij}x_{ik}\right\|_2^{2}},
\tag{15}
\]

where β_j^{+(m)} − β_j^{−(m)} is the estimated value of the jth main-variable coefficient after m iterations, and Θ_jk^m is the estimate of the interaction coefficient between the jth variable and the kth variable after m iterations.
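One full pass of the iteration (15) can be sketched as a coordinate-descent sweep over main effects and interaction pairs. This is a schematic under stated assumptions, not the authors' exact implementation: the working responses z, weights w, and multipliers α_j are taken as given, and each coordinate uses the standard soft-thresholded weighted least-squares update:

```python
import numpy as np

def soft(x, t):
    # S[x, t] = sign(x) * max(|x| - t, 0)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cd_sweep(X, z, w, beta, Theta, lam1, lam2, alpha):
    """One coordinate-descent sweep in the spirit of (15).

    X: (n, p) design; z: working responses; w: observation weights;
    beta: (p,) main effects; Theta: (p, p) symmetric interactions
    (zero diagonal); alpha: (p,) hierarchy multipliers. Schematic only.
    """
    n, p = X.shape
    for j in range(p):
        # partial residual excluding the j-th main effect
        eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
        r = z - (eta - X[:, j] * beta[j])
        num = np.sum(w * X[:, j] * r)
        den = np.sum(w * X[:, j] ** 2)
        beta[j] = soft(num, lam1 + alpha[j]) / den
        for k in range(j + 1, p):
            v = X[:, j] * X[:, k]
            # partial residual excluding the (j, k) interaction
            eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
            r = z - (eta - v * Theta[j, k])
            num = np.sum(w * v * r)
            den = np.sum(w * v ** 2)
            Theta[j, k] = Theta[k, j] = soft(num, 2.0 * (lam2 + alpha[j])) / den
    return beta, Theta
```

In practice the sweep is repeated, alternating with updates of z, w, and α_j, until the coefficients stop changing.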

5. The Experimental Results and Analysis

5.1. The Experimental Results and Analysis of Four UCI Datasets. Four datasets from the UCI repository are studied: the breast-cancer-Wisconsin, Ionosphere, Liver disorders, and Sonar datasets, as shown in Table 1.

We run the 10-fold cross-validation (10-CV) experiments 20 times using R, where λ₁ = 2λ₂ = λ. Besides, we complete an experiment employing the interactive hierarchical lasso logistic regression method. The results include the number of nonzero variable coefficients, the average error rate of the 10-CV, the standard deviation (SD), the CPU time, and the estimated value of lambda (λ). The results are shown in Table 2.

The results of the 10-CV on the four datasets using the proposed method are presented in Figures 2 to 5. In the figures, the horizontal axis represents the logarithmic value of λ and the vertical axis is the error rate of the 10-CV. Besides, the horizontal axis at the top of each figure gives the number of nonzero variable coefficients corresponding to the λ value.

The results for the breast-cancer-Wisconsin datasets are shown in Figure 2. The minimum error rate is 0.03, and the number of selected variables is more than 11. The results for the Ionosphere datasets are shown in Figure 3. When the number of selected variables is 101, the lowest error rate is 0.28, with a smaller standard deviation. The number of selected variables is larger than the original dimension, so the interactions provide classification information. The results for the Liver disorders datasets are presented in Figure 4. If the number of selected variables is 25, the lowest error rate reaches 0.26, while the standard deviation is 0.02. Finally, the results for the Sonar datasets are presented in Figure 5. When more than 80 variables are selected, the minimum error rate is 0.14.


Figure 2: The results of breast-cancer-Wisconsin (10-CV error rate versus log(λ); the top axis gives the number of selected features).

Figure 3: The results of Ionosphere.

In what follows, we compare our method to the existing literature [13]. The classification results and training time of our method are better than those reported in [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods, with 10-fold cross-validation repeated 20 times on the four UCI datasets, are listed in Tables 3, 4, and 5, respectively. The conventional pattern recognition methods include the support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighborhood (K-NN), and decision tree (DT) methods. The lasso is a method that considers the main variables without the interaction variables. The all-pair lasso is a method that considers both the main variables and the interaction variables, but without the hierarchy. The experimental

Figure 4: The results of Liver disorders.

Figure 5: The results of Sonar.

Table 3: The experimental results of the lasso penalized logistic regression model.

    Datasets                 Error rate (%)  SD      Time (s)
    Breast-cancer-Wisconsin   3.22           0.0001   0.3744
    Ionosphere               31.15           0.0075   0.4863
    Liver disorders          30.77           0.0077   0.0908
    Sonar                    22.13           0.0121  15.6806

results show that our model gives better classification results and is more stable. This highlights the advantage of the variable interactions and hierarchy.

5.2. The Experimental Results for High-Dimensional Small-Sample Data. The Madelon dataset from NIPS2003 was


Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

    Datasets                 Error rate (%)  SD      Time (s)
    Breast-cancer-Wisconsin   2.93           0.0001  12.1213
    Ionosphere               27.00           0.0089   3.1734
    Liver disorders          26.36           0.0058   4.9190
    Sonar                    20.63           0.0115   4.1041

used to evaluate our method. The sample numbers of the training, validation, and testing sets were, respectively, 2000, 600, and 1800. The class number is 2 and the variable dimension is 500, so the interactive dimension is 124750. More information about the dataset is available at http://www.nipsfsc.ecs.soton.ac.uk, where the data can be downloaded and the challenge results, balance error rates, and areas under the curve can be viewed. The model is trained on the training set, the model parameters are selected on the validation set, and the prediction results of the final model on the test set are uploaded online to obtain the classification score of the final model. Our results are shown in Table 6. The results show that our method is slightly better than the lasso and the all-pair lasso. This implies that the interactions may also be important in the Madelon dataset.

5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data of smartphones [10]. They used the support vector machine (SVM) method to solve the classification problem of daily-life activity recognition. These results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19–48 years participated in the study. Each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, the experiments were video recorded. The smartphone used in the experiments had a built-in accelerometer and gyroscope for measuring 3D linear acceleration and angular acceleration. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.

We use these datasets to evaluate our method, taking the upstairs and downstairs movements as the two activity classes. The training sets have 986 and 1073 samples, respectively; the test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features from the sensor signals.

The experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods, since it takes the variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.

5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are n = 200 and p = 20. We take interactions into consideration and provide the following three kinds of simulation based on formula (1).

(1) The real model is hierarchical: Θ_jk ≠ 0 ⇒ β_j ≠ 0 or β_k ≠ 0, j, k = 1, ..., p. There are 10 nonzero elements in β and 20 nonzero elements in Θ.

(2) The real model only includes interaction variables: β_j = 0, j = 1, ..., p. There are 20 nonzero elements in Θ.

(3) The real model only includes main variables: Θ_jk = 0, j, k = 1, ..., p. There are 10 nonzero elements in β.

The SNR of the main variables is 1.5 and the SNR of the interaction variables is 1. The results of 100 repeated experiments are shown in Figure 6.
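The three simulation designs can be sketched as below. The unit coefficient magnitudes, the Bernoulli-logistic response, and the strong form of the hierarchy (both parent main effects nonzero) are assumptions, since the text does not specify them:

```python
import numpy as np

def simulate(scenario, n=200, p=20, seed=0):
    """One simulated dataset for Section 5.4: 'hierarchical',
    'interaction_only', or 'main_only' (a sketch; magnitudes assumed)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    Theta = np.zeros((p, p))
    if scenario in ("hierarchical", "main_only"):
        beta[rng.choice(p, 10, replace=False)] = 1.0   # 10 nonzero main effects
    if scenario in ("hierarchical", "interaction_only"):
        # 20 nonzero interactions; in the hierarchical case both parents
        # of each pair have nonzero main effects (strong hierarchy, assumed)
        support = np.flatnonzero(beta) if scenario == "hierarchical" else np.arange(p)
        pairs = set()
        while len(pairs) < 20:
            j, k = rng.choice(support, 2, replace=False)
            pairs.add((min(j, k), max(j, k)))
        for j, k in pairs:
            Theta[j, k] = Theta[k, j] = 1.0
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
    return X, y, beta, Theta
```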

When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the interactive lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason for this result is that, when our method fits the model, the interaction variables are considered to be main variables. When the real model only includes main variables, the lasso is the best, our method again takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).

We believe that many actual classification problems could be hierarchical and interactive; they contain both main variables and interaction variables. Our method fits this kind of situation.

6. Conclusion

Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily-life activity identification datasets. The results show that interactions widely exist in classification models and demonstrate that the variable interactions contribute to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction multiplier method; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net method or the hierarchical group lasso method. The application of multisensor interaction to the daily-life activities of the elderly is a new way of using our method.


Table 5: The experimental results of the traditional pattern recognition methods.

    Datasets                 Classifier  Error rate (%)  SD      Time (s)
    Breast-cancer-Wisconsin  SVM          3.42           0.0019  1.3092
                             LDA          3.96           0.0007  0.2429
                             QDA          4.92           0.0014  0.2309
                             K-NN         4.61           0.0030  0.6636
                             DT           4.99           0.0041  1.6201
    Ionosphere               SVM         35.83           0.0063  2.9149
                             LDA         33.10           0.0090  0.6168
                             QDA         32.01           0.0076  0.4516
                             K-NN        28.30           0.0090  0.6363
                             DT          38.23           0.0408  3.0865
    Liver disorders          SVM         30.80           0.0100  1.1401
                             LDA         31.31           0.0089  0.2474
                             QDA         40.19           0.0119  0.1958
                             K-NN        36.89           0.0124  0.6089
                             DT          36.94           0.0187  1.9921
    Sonar                    SVM         25.70           0.0197  1.2181
                             LDA         25.13           0.0145  0.5897
                             QDA         23.80           0.0205  0.4594
                             K-NN        13.09           0.0101  0.5345
                             DT          35.91           0.0303  0.7153

Table 6: The experimental results for the Madelon datasets.

    Methods         Balance error rate (%)           Area under the curve (%)
                    Training  Validation  Testing    Training  Validation  Testing
    Lasso           38.08     37.21       36.61      64.78     62.79       63.40
    All-pair lasso  34.98     37.01       36.57      65.02     62.99       63.43
    Our method      30.00     36.35       35.12      70.00     68.08       67.76

Appendices

A. Proofs from (5) to (6)

For notational convenience, we write p_i instead of p(x_i). The logarithmic likelihood function of (4) is as follows:

\[
L_Q = \frac{1}{N}\ln\left[L\left(\beta,\Theta\right)\right]
    = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^{T}\beta + \frac{1}{2}x_i^{T}\Theta x_i\right) + \ln\left(1-p_i\right)\right].
\tag{A.1}
\]

First, we give the first- and second-order partial derivatives and the mixed partial derivative of (4) with respect to β and Θ:

\[
\frac{\partial L_Q}{\partial \beta}
 = \frac{\partial\, (1/N)\sum_{i=1}^{N}\left[y_i\left(x_i^{T}\beta + (1/2)\,x_i^{T}\Theta x_i\right)+\ln\left(1-p_i\right)\right]}{\partial \beta}
 = \frac{1}{N}\sum_{i=1}^{N}\left[\left(x_i^{T}y_i\right)^{T} - \frac{\exp\left(x_i^{T}\beta + (1/2)\,x_i^{T}\Theta x_i\right)}{1+\exp\left(x_i^{T}\beta + (1/2)\,x_i^{T}\Theta x_i\right)}\cdot x_i\right]
 = \frac{1}{N}\sum_{i=1}^{N} x_i\left(y_i-p_i\right),
\]
\[
\frac{\partial^{2} L_Q}{\partial \beta^{2}}
 = \frac{\partial}{\partial \beta}\,\frac{1}{N}\sum_{i=1}^{N} x_i^{T}\left(y_i-p_i\right)
 = -\frac{1}{N}\sum_{i=1}^{N} x_i^{T}\,\frac{\partial p_i}{\partial \beta}
 = -\frac{1}{N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\, x_i^{T} x_i,
\]


Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

    Methods         Error rate in training (%)  Training time (s)  Error rate in testing (%)  Testing time (s)
    Lasso            0.00                       0.2649              1.46                       0.0312
    All-pair lasso   0.00                       0.8807              1.35                       0.0953
    Our method       0.00                       0.6012              1.12                       0.0556
    SVM              0.00                       0.9516              1.23                       0.0936
    1-NN             0.00                       0.0000              9.20                       0.7956
    3-NN             0.00                       0.0000              8.98                       0.7488
    QDA              6.61                       1.4196              4.04                       0.2184
    DT               0.00                       0.7332             14.93                       0.1092

Figure 6: The error rate of the three lasso methods in the simulation experiment (the Bayes error is shown by the purple dotted line in the graph): (a) hierarchical interaction; (b) interaction variables only; (c) main variables only.


\[
\frac{\partial L_Q}{\partial \Theta}
 = \frac{\partial\, (1/N)\sum_{i=1}^{N}\left[y_i\left(x_i^{T}\beta + (1/2)\,x_i^{T}\Theta x_i\right)+\ln\left(1-p_i\right)\right]}{\partial \Theta}
 = \frac{1}{N}\sum_{i=1}^{N}\left[\left(\frac{1}{2}y_i x_i^{T}x_i\right)^{T} - p_i\cdot\frac{1}{2}\,x_i^{T}x_i\right]
 = \frac{1}{2N}\sum_{i=1}^{N} x_i^{T}x_i\left(y_i-p_i\right),
\]
\[
\frac{\partial^{2} L_Q}{\partial \Theta^{2}}
 = \frac{\partial}{\partial \Theta}\,\frac{1}{2N}\sum_{i=1}^{N} x_i^{T}x_i\left(y_i-p_i\right)
 = -\frac{1}{2N}\sum_{i=1}^{N} x_i^{T}x_i\,\frac{\partial p_i}{\partial \Theta}
 = -\frac{1}{4N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left(x_i^{T}x_i\right)^{2},
\]
\[
\frac{\partial^{2} L_Q}{\partial \beta\,\partial \Theta}
 = \frac{\partial}{\partial \Theta}\,\frac{1}{N}\sum_{i=1}^{N} x_i^{T}\left(y_i-p_i\right)
 = -\frac{1}{N}\sum_{i=1}^{N} x_i^{T}\,\frac{\partial p_i}{\partial \Theta}
 = -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\, x_i^{T}x_i\, x_i^{T}.
\tag{A.2}
\]

Then (A.1) is expanded in a Taylor series around the current estimates β̃ and Θ̃:

\[
l_Q\left(\beta,\Theta\right)
 = L_Q + \left(\beta-\tilde{\beta}\right)\cdot\frac{\partial L_Q}{\partial \beta}
       + \left(\Theta-\tilde{\Theta}\right)\cdot\frac{\partial L_Q}{\partial \Theta}
 + \frac{1}{2}\left[\left(\beta-\tilde{\beta}\right)^{2}\frac{\partial^{2} L_Q}{\partial \beta^{2}}
 + 2\left(\beta-\tilde{\beta}\right)^{T}\left(\Theta-\tilde{\Theta}\right)\frac{\partial^{2} L_Q}{\partial \beta\,\partial \Theta}
 + \left(\Theta-\tilde{\Theta}\right)^{2}\frac{\partial^{2} L_Q}{\partial \Theta^{2}}\right].
\]

Substituting the derivatives of (A.2) and completing the square gives

\[
l_Q\left(\beta,\Theta\right)
 = -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i - \sum_{j}\beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)^{2}
 + C\left(\tilde{\beta},\tilde{\Theta}\right),
\tag{A.3}
\]

where C(β̃, Θ̃) collects the terms that do not depend on (β, Θ), and

\[
\omega_i = p\left(x_i\right)\left[1-p\left(x_i\right)\right], \qquad
z_i = \sum_{j}\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik} + \frac{y_i - p\left(x_i\right)}{p\left(x_i\right)\left[1-p\left(x_i\right)\right]}.
\tag{A.4}
\]

B. Proofs from (12) to (13)

First, there are three cases in calculating β_j⁺ − β_j⁻.

(1) β_j⁺ > 0, β_j⁻ = 0:
\[
\beta_j^{+}-\beta_j^{-} = x_j^{T}\left(z_i-A\right)+\beta_j^{+}-\beta_j^{-}-\frac{\lambda_1+\alpha_j}{\omega_i}.
\tag{B.1}
\]

(2) β_j⁺ = 0, β_j⁻ > 0:
\[
\beta_j^{+}-\beta_j^{-} = x_j^{T}\left(z_i-A\right)+\beta_j^{+}-\beta_j^{-}+\frac{\lambda_1+\alpha_j}{\omega_i}.
\tag{B.2}
\]

(3) β_j⁺ = β_j⁻ = 0:
\[
\beta_j^{+}-\beta_j^{-} = 0.
\tag{B.3}
\]

We derive

\[
\beta_j^{+}-\beta_j^{-} = S\!\left[x_{ij}\left(z_i-A_i\right)+\beta_j^{+}-\beta_j^{-},\; \frac{\lambda_1+\alpha_j}{\omega_i}\right].
\tag{B.4}
\]

Secondly, the stationarity condition ∂L/∂Θ_jk = 0 gives

\[
\frac{1}{2}\omega_i\cdot 2\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(-\frac{1}{2}x_{ij}x_{ik}\right)+\left(\lambda_2+\alpha_j\right)U_{jk}=0,
\]
\[
\omega_i\cdot\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(\frac{1}{2}x_{ij}x_{ik}\right)=\left(\lambda_2+\alpha_j\right)U_{jk},
\]
\[
\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(\frac{1}{2}x_{ij}x_{ik}\right)=\frac{\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}.
\tag{B.5}
\]

Supposing

\[
\gamma_{(-jk)} = z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}+\Theta_{jk}x_{ij}x_{ik},
\tag{B.6}
\]

we have

\[
\left(\gamma_{(-jk)}-\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(x_{ij}x_{ik}\right)=\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},
\]
\[
\Theta_{jk}\,x_{ij}x_{ik}\cdot x_{ij}x_{ik} = -\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}+\gamma_{(-jk)}\cdot x_{ij}x_{ik},
\]
\[
\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}.
\tag{B.7}
\]

We discuss three cases for the value of Θ_jk.

(1) Θ_jk > 0, U_jk = 1:
\[
\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}.
\tag{B.8}
\]

(2) Θ_jk < 0, U_jk = −1:
\[
\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}+2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}.
\tag{B.9}
\]

(3) Θ_jk = 0.

We derive

\[
\Theta_{jk}=\frac{S\!\left[\gamma_{(-jk)}\cdot x_{ij}x_{ik},\; 2\left(\lambda_2+\alpha_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^{2}}.
\tag{B.10}
\]

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.

[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.

[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.

[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.

[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.

[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.

[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.

[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.

[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.

[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.

[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.

[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.

[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.

[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.

[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.

[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.

[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.

[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.

[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.



Mathematical Problems in Engineering

[Figure 2: The results of breast-cancer-Wisconsin. Cross-validation error versus log(λ), with the number of selected features (from 24 down to 6) shown along the top axis.]

[Figure 3: The results of Ionosphere. Cross-validation error versus log(λ), with the number of selected features (from 177 down to 9) shown along the top axis.]

In what follows, we compare our method with the existing literature [13]. The classification results and the training time of our method are better than those reported in [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods, each evaluated by 10-fold cross-validation repeated 20 times on the four UCI datasets, are listed in Tables 3, 4, and 5, respectively. The conventional pattern recognition methods include the support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighbor (K-NN), and decision tree (DT) methods. The lasso considers only the main variables, without the interaction variables; the all-pair lasso considers both the main variables and the interaction variables, but without the hierarchy. The experimental results show that our model gives better classification results and is more stable, which highlights the advantage of the variable interactions and the hierarchy.

[Figure 4: The results of Liver disorders. Cross-validation error versus log(λ), with the number of selected features (from 33 down to 8) shown along the top axis.]

[Figure 5: The results of Sonar. Cross-validation error versus log(λ), with the number of selected features (from 124 down to 8) shown along the top axis.]

Table 3: The experimental results of the lasso penalized logistic regression model.

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   3.22             0.0001   0.3744
Ionosphere                31.15            0.0075   0.4863
Liver disorders           30.77            0.0077   0.0908
Sonar                     22.13            0.0121   15.6806
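The evaluation protocol above, 10-fold cross-validation repeated 20 times with the mean error rate and its standard deviation reported, can be sketched as follows. The `classify` callback and the toy majority-vote classifier are illustrative stand-ins, not the paper's implementations.

```python
import random
import statistics

def repeated_kfold_error(X, y, classify, k=10, repeats=20, seed=0):
    """Mean error rate and SD over `repeats` runs of k-fold cross-validation.

    `classify(train_X, train_y, test_X)` must return predicted labels.
    """
    rng = random.Random(seed)
    run_errors = []
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]          # k disjoint folds
        wrong = 0
        for f in range(k):
            train_idx = [i for g in range(k) if g != f for i in folds[g]]
            preds = classify([X[i] for i in train_idx],
                             [y[i] for i in train_idx],
                             [X[i] for i in folds[f]])
            wrong += sum(p != y[i] for p, i in zip(preds, folds[f]))
        run_errors.append(wrong / len(y))
    return statistics.mean(run_errors), statistics.pstdev(run_errors)

def majority_classifier(train_X, train_y, test_X):
    """Toy stand-in classifier: always predict the majority training label."""
    label = max(set(train_y), key=train_y.count)
    return [label] * len(test_X)

X = [[float(i)] for i in range(100)]
y = [0] * 70 + [1] * 30
mean_err, sd = repeated_kfold_error(X, y, majority_classifier, k=10, repeats=5)
```

With the toy 70/30 data the majority classifier always predicts the majority class, so every run has error rate 0.30 and the standard deviation is zero.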

Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   2.93             0.0001   12.1213
Ionosphere                27.00            0.0089   3.1734
Liver disorders           26.36            0.0058   4.9190
Sonar                     20.63            0.0115   4.1041

5.2. The Experimental Results for High-Dimensional Small-Sample Data. The Madelon datasets from the NIPS2003 feature selection challenge were used to evaluate our method. The numbers of training, validation, and testing samples were 2000, 600, and 1800, respectively. The number of classes is 2 and the variable dimension is 500, so the interaction dimension is 124750. More information about the datasets can be found at http://www.nipsfsc.ecs.soton.ac.uk/, where one can also download the datasets and see the challenge results, the balanced error rates, and the area under the curve. The model is trained on the training set, the model parameters are selected on the validation set, and the prediction results of the final model on the test set are uploaded online to obtain the classification score of the final model. Our results are shown in Table 6. They show that our method is slightly better than the lasso and the all-pair lasso, which implies that the interactions may also be important in the Madelon datasets.
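As a sanity check on the interaction dimension quoted above, expanding p = 500 main variables into all unordered pairs gives C(500, 2) = 124750 interaction features. A minimal sketch of such an all-pairs expansion (illustrative; `all_pair_features` is not the paper's code):

```python
from itertools import combinations
from math import comb

def all_pair_features(x):
    """Append every pairwise product x_j * x_k (j < k) to the main variables."""
    return list(x) + [x[j] * x[k] for j, k in combinations(range(len(x)), 2)]

# Madelon setting: 500 main variables give C(500, 2) = 124750 pairwise interactions.
n_interactions = comb(500, 2)

expanded = all_pair_features([1.0, 2.0, 3.0])  # [1.0, 2.0, 3.0, 2.0, 3.0, 6.0]
```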

5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data from smartphones [10] and used the support vector machine (SVM) method to solve the classification problem of daily-life activity recognition. Such results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19–48 years participated in the study, and each person performed six activities while wearing the smartphone on the waist. To obtain the class labels, the experiments were video-recorded. The smartphone used in the experiments had a built-in accelerometer and gyroscope measuring 3D linear acceleration and angular velocity. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.

We use these datasets to evaluate our method, taking the upstairs and downstairs movements as the two activity classes. The training sets have 986 and 1073 samples, respectively, and the test sets have 420 and 471 samples, respectively. The variable dimension is 561, covering time- and frequency-domain features derived from the sensor signals.
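The 561 variables are summary statistics computed per fixed-length sensor window; the sketch below shows the flavor of such time-domain features. The 128-sample window length (2.56 s at 50 Hz) and the particular statistics are assumptions for illustration, not the paper's exact feature set.

```python
import math

def window_features(signal):
    """Simple time-domain statistics for one fixed-length sensor window."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((v - mean) ** 2 for v in signal) / n
    return {"mean": mean,
            "std": math.sqrt(var),
            "energy": sum(v * v for v in signal) / n,
            "min": min(signal),
            "max": max(signal)}

# Assumed window length: 128 samples, i.e. 2.56 s at 50 Hz.
window = [math.sin(2 * math.pi * k / 16) for k in range(128)]
feats = window_features(window)
```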

The experimental results of the three lasso methods and several pattern recognition methods are shown in Table 7. The results show that our method outperforms the pattern recognition methods, since it takes variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.

5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and the dimension of the samples are $n = 200$ and $p = 20$. We take interactions into consideration and provide the following three kinds of simulation based on formula (1):

(1) The real model is hierarchical: $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ or $\beta_k \neq 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$ and 20 nonzero elements in $\Theta$.

(2) The real model only includes interaction variables: $\beta_j = 0$, $j = 1, \ldots, p$. There are 20 nonzero elements in $\Theta$.

(3) The real model only includes main variables: $\Theta_{jk} = 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$.

The SNR of the main variables is 15, and the SNR of the interaction variables is 1. The results of 100 repetitions of the experiment are shown in Figure 6.
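A data generator for the three designs might look as follows; the positions of the nonzero coefficients, their magnitudes, and the Gaussian predictors are illustrative assumptions, since the text specifies only n, p, the sparsity counts, and the hierarchy constraint.

```python
import math
import random

def simulate(scenario, n=200, p=20, seed=0):
    """Draw (X, y) from a logistic model under one of the three designs.

    scenario 1: hierarchical (each nonzero Theta_jk has beta_j != 0 or beta_k != 0)
    scenario 2: interaction variables only (all beta_j = 0)
    scenario 3: main variables only (all Theta_jk = 0)
    """
    rng = random.Random(seed)
    beta = [0.0] * p
    theta = {}  # upper-triangular entries; 10 pairs = 20 nonzeros in symmetric Theta
    if scenario in (1, 3):
        for j in range(10):        # 10 nonzero main effects (positions assumed)
            beta[j] = 1.0
    if scenario in (1, 2):
        pairs = ([(j, j + 1) for j in range(10)] if scenario == 1
                 else [(2 * j, 2 * j + 1) for j in range(10)])
        for j, k in pairs:
            theta[(j, k)] = 1.0
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        eta = sum(b * v for b, v in zip(beta, x))
        eta += sum(t * x[j] * x[k] for (j, k), t in theta.items())
        prob = 1.0 / (1.0 + math.exp(-eta))
        X.append(x)
        y.append(1 if rng.random() < prob else 0)
    return X, y, beta, theta

X, y, beta, theta = simulate(1)
```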

When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the interactive lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason is that, when our method fits the model, the interaction variables are considered to be main variables. When the real model only includes main variables, the lasso is the best, our method again takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).

We believe that many actual classification problems could be hierarchical and interactive, containing both main variables and interaction variables. Our method fits this kind of situation.

6. Conclusion

Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily-life activity recognition datasets. The results show that interactions widely exist in classification models, and they demonstrate that the variable interactions contribute to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and several pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net method or the hierarchical group lasso method. The application of multisensor interaction to the daily-life activities of the elderly is a new way of using our method.

Table 5: The experimental results of the traditional pattern recognition methods.

Datasets                  Classifier   Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   SVM          3.42             0.0019   1.3092
                          LDA          3.96             0.0007   0.2429
                          QDA          4.92             0.0014   0.2309
                          K-NN         4.61             0.0030   0.6636
                          DT           4.99             0.0041   1.6201
Ionosphere                SVM          35.83            0.0063   2.9149
                          LDA          33.10            0.0090   0.6168
                          QDA          32.01            0.0076   0.4516
                          K-NN         28.30            0.0090   0.6363
                          DT           38.23            0.0408   3.0865
Liver disorders           SVM          30.80            0.0100   1.1401
                          LDA          31.31            0.0089   0.2474
                          QDA          40.19            0.0119   0.1958
                          K-NN         36.89            0.0124   0.6089
                          DT           36.94            0.0187   1.9921
Sonar                     SVM          25.70            0.0197   1.2181
                          LDA          25.13            0.0145   0.5897
                          QDA          23.80            0.0205   0.4594
                          K-NN         13.09            0.0101   0.5345
                          DT           35.91            0.0303   0.7153

Table 6: The experimental results for the Madelon datasets.

Methods          Balanced error rate (%)              Area under the curve (%)
                 Training   Validation   Testing      Training   Validation   Testing
Lasso            38.08      37.21        36.61        64.78      62.79        63.40
All-pair lasso   34.98      37.01        36.57        65.02      62.99        63.43
Our method       30.00      36.35        35.12        70.00      68.08        67.76

Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Methods          Error rate in training (%)   Training time (s)   Error rate in testing (%)   Testing time (s)
Lasso            0.00                         0.2649              1.46                        0.0312
All-pair lasso   0.00                         0.8807              1.35                        0.0953
Our method       0.00                         0.6012              1.12                        0.0556
SVM              0.00                         0.9516              1.23                        0.0936
1-NN             0.00                         0.0000              9.20                        0.7956
3-NN             0.00                         0.0000              8.98                        0.7488
QDA              6.61                         1.4196              4.04                        0.2184
DT               0.00                         0.7332              14.93                       0.1092

[Figure 6: The error rate of the three lasso methods in the simulation experiment; the Bayes error is shown by a purple dotted line. (a) Hierarchical interaction; (b) interaction variables only; (c) main variables only.]

Appendices

A. Proofs from (5) to (6)

For notational convenience we write $p_i$ instead of $p(x_i)$. The log-likelihood function of (4) is
$$L_Q = \frac{1}{N}\ln\left[L(\beta,\Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right) + \ln(1-p_i)\right]. \tag{A1}$$

First we give the first- and second-order partial derivatives and the mixed partial derivative of (4) with respect to $\beta$ and $\Theta$:
$$\begin{aligned}
\frac{\partial L_Q}{\partial\beta} &= \frac{1}{N}\sum_{i=1}^{N}\left[y_i x_i - \frac{\exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}{1+\exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}\,x_i\right] = \frac{1}{N}\sum_{i=1}^{N}x_i\,(y_i - p_i),\\
\frac{\partial^2 L_Q}{\partial\beta^2} &= -\frac{1}{N}\sum_{i=1}^{N}x_i^T\,\frac{\partial p_i}{\partial\beta} = -\frac{1}{N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i^T x_i,\\
\frac{\partial L_Q}{\partial\Theta} &= \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}y_i x_i^T x_i - \frac{1}{2}p_i x_i^T x_i\right) = \frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\,(y_i - p_i),\\
\frac{\partial^2 L_Q}{\partial\Theta^2} &= -\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{4N}\sum_{i=1}^{N}(1-p_i)\,p_i\,\|x_i\|^2,\\
\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta} &= -\frac{1}{N}\sum_{i=1}^{N}x_i^T\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{2N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i^T x_i x_i^T.
\end{aligned} \tag{A2}$$

Then (A1) is expanded in a Taylor series about the point $(\tilde\beta,\tilde\Theta)$, at which all of the derivatives above are evaluated:
$$\begin{aligned}
l_Q(\beta,\Theta) &= L_Q + (\beta-\tilde\beta)\,\frac{\partial L_Q}{\partial\beta} + (\Theta-\tilde\Theta)\,\frac{\partial L_Q}{\partial\Theta}\\
&\quad + \frac{1}{2}\left[(\beta-\tilde\beta)^2\,\frac{\partial^2 L_Q}{\partial\beta^2} + 2\,(\beta-\tilde\beta)^T(\Theta-\tilde\Theta)\,\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta} + (\Theta-\tilde\Theta)^2\,\frac{\partial^2 L_Q}{\partial\Theta^2}\right]\\
&= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)^2 + C(\tilde\beta,\tilde\Theta),
\end{aligned} \tag{A3}$$
where $C(\tilde\beta,\tilde\Theta)$ collects the terms that do not depend on $(\beta,\Theta)$ and
$$\omega_i = p(x_i)\left[1-p(x_i)\right],\qquad
z_i = \sum_j \tilde\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}x_{ij}x_{ik} + \frac{y_i - p(x_i)}{p(x_i)\left[1-p(x_i)\right]}. \tag{A4}$$
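The weights and working responses in (A4) are exactly those of an iteratively reweighted least squares (IRLS) step for the quadratic logistic model; a minimal sketch of computing them from current estimates (hypothetical helper, not the paper's code):

```python
import math

def irls_weights_and_responses(X, y, beta, theta):
    """Compute omega_i = p_i (1 - p_i) and the working response z_i of (A4).

    theta is a symmetric p x p matrix with zero diagonal; the linear
    predictor is eta_i = x_i^T beta + (1/2) sum_{j != k} Theta_jk x_ij x_ik.
    """
    p_dim = len(beta)
    omega, z = [], []
    for x, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x))
        eta += 0.5 * sum(theta[j][k] * x[j] * x[k]
                         for j in range(p_dim) for k in range(p_dim) if j != k)
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        omega.append(w)
        z.append(eta + (yi - p) / w)
    return omega, z

# With all coefficients zero, eta = 0, p = 0.5, omega = 0.25, z = (y - 0.5) / 0.25.
omega, z = irls_weights_and_responses([[1.0, 0.0], [0.0, 1.0]], [1, 0],
                                      [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
```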

B. Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^{+}-\beta_j^{-}$:

(1) $\beta_j^{+}>0$, $\beta_j^{-}=0$:
$$\beta_j^{+}-\beta_j^{-}=x_j^{T}\left(z_i-A\right)+\beta_j^{+}-\beta_j^{-}-\frac{\lambda_1+\alpha_j}{\omega_i}. \tag{B1}$$

(2) $\beta_j^{+}=0$, $\beta_j^{-}>0$:
$$\beta_j^{+}-\beta_j^{-}=x_j^{T}\left(z_i-A\right)+\beta_j^{+}-\beta_j^{-}+\frac{\lambda_1+\alpha_j}{\omega_i}. \tag{B2}$$

(3) $\beta_j^{+}=\beta_j^{-}=0$:
$$\beta_j^{+}-\beta_j^{-}=x_j^{T}\left(z_i-A\right)+\beta_j^{+}-\beta_j^{-}. \tag{B3}$$

We derive
$$\beta_j^{+}-\beta_j^{-}=S\left[x_{ij}\left(z_i-A_i\right)+\beta_j^{+}-\beta_j^{-},\ \frac{\lambda_1+\alpha_j}{\omega_i}\right]. \tag{B4}$$

Secondly, we apply the stationarity condition $\partial L/\partial\Theta_{jk}=0$:
$$\frac{1}{2}\,\omega_i\cdot 2\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(-\frac{1}{2}x_{ij}x_{ik}\right)+\left(\lambda_2+\alpha_j\right)U_{jk}=0,$$
$$\omega_i\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\frac{1}{2}x_{ij}x_{ik}=\left(\lambda_2+\alpha_j\right)U_{jk},$$
$$\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\frac{1}{2}x_{ij}x_{ik}=\frac{\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}. \tag{B5}$$

Supposing
$$\gamma^{(-jk)}=z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}+\Theta_{jk}x_{ij}x_{ik}, \tag{B6}$$
we have
$$\left(\gamma^{(-jk)}-\Theta_{jk}x_{ij}x_{ik}\right)\cdot x_{ij}x_{ik}=\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},$$
$$\Theta_{jk}\,x_{ij}x_{ik}\cdot x_{ij}x_{ik}=\gamma^{(-jk)}\cdot x_{ij}x_{ik}-\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},$$
$$\Theta_{jk}=\frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}. \tag{B7}$$

We discuss three cases for the value of $\Theta_{jk}$:

(1) $\Theta_{jk}>0$, $U_{jk}=1$:
$$\Theta_{jk}=\frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}. \tag{B8}$$

(2) $\Theta_{jk}<0$, $U_{jk}=-1$:
$$\Theta_{jk}=\frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik}+2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^{2}}. \tag{B9}$$

(3) $\Theta_{jk}=0$.

We derive
$$\Theta_{jk}=\frac{S\left[\gamma^{(-jk)}\cdot x_{ij}x_{ik},\ 2\left(\lambda_2+\alpha_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^{2}}. \tag{B10}$$

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B, vol. 58, no. 1, pp. 267–288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.


Page 6: Research Article Coordinate Descent Based Hierarchical ...downloads.hindawi.com/journals/mpe/2014/430201.pdf · Coordinate Descent Based Hierarchical Interactive Lasso Penalized Logistic

6 Mathematical Problems in Engineering

Table 4 The experimental results of the all-pair lasso penalizedlogistic regression model

Datasets Error rate() SD Time (s)

Breast-cancer-Wisconsin 293 00001 121213Ionosphere 2700 00089 31734Liver disorders 2636 00058 49190Sonar 2063 00115 41041

used to evaluate our method The sample numbers of thetraining validation and testing of the data were respectively2000 600 and 1800 The class number is 2 The variabledimension is 500 So the interactive dimension is 124750You can find more information about the datasets followingthe link httpwwwnipsfscecssotonacuk you can alsodownload the datasets and see the results of challengesbalance error rates and the area under the curve The modelis trained by using a training set The model parameters areselected by using a validation set Also the prediction resultsof the final model using the test set are uploaded online Theclassification score of the final model is obtained Our resultsare shown in Table 6 The results show that our method isslightly better than the lasso and all-pair lasso This impliesthat the interactions may also be important in the Madelondatasets

5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected smartphone sensor data [10]. They used the support vector machine (SVM) method to solve the classification problem of daily life activity recognition. These results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19–48 years participated in the study. Each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, the experiments were video-recorded. The smartphone used in the experiments had a built-in accelerometer and gyroscope for measuring 3D linear acceleration and angular velocity. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.

We use these datasets to evaluate our method. We use the upstairs and downstairs movements as the two active classes. The training sets have 986 and 1073 samples, respectively; the test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features derived from the sensor signals.

Experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method outperforms the pattern recognition methods, since it takes variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.

5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are $n = 200$ and $p = 20$. We take interactions into consideration and provide the following three kinds of simulation based on formula (1):

(1) The real model is hierarchical: $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ or $\beta_k \neq 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$ and 20 nonzero elements in $\Theta$.

(2) The real model only includes interaction variables: $\beta_j = 0$, $j = 1, \ldots, p$. There are 20 nonzero elements in $\Theta$.

(3) The real model only includes main variables: $\Theta_{jk} = 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$.

The SNR of the main variables is 1.5 and the SNR of the interaction variables is 1. The results of 100 repeated experiments are shown in Figure 6.
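A minimal sketch of how simulation (1) could be generated (our own illustration; the paper does not give its data-generating code): draw 10 nonzero main effects, then 20 interactions whose index pairs always include an active main effect, so the hierarchy Θ_jk ≠ 0 ⇒ β_j ≠ 0 or β_k ≠ 0 holds by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20

# 10 nonzero main effects
beta = np.zeros(p)
main_idx = rng.choice(p, size=10, replace=False)
beta[main_idx] = rng.normal(size=10)

# 20 nonzero interactions, each pair anchored at an active main effect
Theta = np.zeros((p, p))
pairs = set()
while len(pairs) < 20:
    j = int(rng.choice(main_idx))   # active parent guarantees the hierarchy
    k = int(rng.integers(p))
    if j != k:
        pairs.add((min(j, k), max(j, k)))
for j, k in pairs:
    Theta[j, k] = Theta[k, j] = rng.normal()

# Binary responses from the logistic model with main and interaction terms
X = rng.normal(size=(n, p))
eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
```

Simulations (2) and (3) follow by zeroing beta or Theta, respectively, before drawing y.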

When the real model is hierarchical our method is thebest and the lasso is the worst This is shown in Figure 6(a)When the real model only includes interaction variances theinteractive lasso is the best and our method takes the secondplace while the lasso is still theworst as shown in Figure 6(b)The reason for this result is that when our method fitsthe model the interaction variables are considered to bemain variables When the real model only includes mainvariables the lasso is the best and our method still takes thesecond place and the all-pair lasso is the worst as shown inFigure 6(c)

We believe that many real classification problems could be hierarchical and interactive: they contain both main variables and interaction variables. Our method fits this kind of situation.

6. Conclusion

Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily life activity identification datasets. The results show that interaction widely exists in classification models. They also demonstrate that variable interaction contributes to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net method or the hierarchical group lasso method. The application of multisensor interaction to the daily life activities of the elderly is a new way of using our method.


Table 5: The experimental results of the traditional pattern recognition methods.

Datasets                  Classifier   Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   SVM          3.42             0.0019   1.3092
                          LDA          3.96             0.0007   0.2429
                          QDA          4.92             0.0014   0.2309
                          K-NN         4.61             0.0030   0.6636
                          DT           4.99             0.0041   1.6201
Ionosphere                SVM          35.83            0.0063   2.9149
                          LDA          33.10            0.0090   0.6168
                          QDA          32.01            0.0076   0.4516
                          K-NN         28.30            0.0090   0.6363
                          DT           38.23            0.0408   3.0865
Liver disorders           SVM          30.80            0.0100   1.1401
                          LDA          31.31            0.0089   0.2474
                          QDA          40.19            0.0119   0.1958
                          K-NN         36.89            0.0124   0.6089
                          DT           36.94            0.0187   1.9921
Sonar                     SVM          25.70            0.0197   1.2181
                          LDA          25.13            0.0145   0.5897
                          QDA          23.80            0.0205   0.4594
                          K-NN         13.09            0.0101   0.5345
                          DT           35.91            0.0303   0.7153

Table 6: The experimental results for the Madelon datasets.

                 Balanced error rate (%)           Area under the curve (%)
Methods          Training  Validation  Testing     Training  Validation  Testing
Lasso            38.08     37.21       36.61       64.78     62.79       63.40
All-pair lasso   34.98     37.01       36.57       65.02     62.99       63.43
Our method       30.00     36.35       35.12       70.00     68.08       67.76
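Table 6 reports the challenge's balanced error rate (BER), the mean of the per-class error rates. A small sketch of the metric (our own helper, not the challenge's scoring code):

```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates (the NIPS2003 challenge metric)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    errs = [np.mean(y_pred[y_true == c] != c) for c in np.unique(y_true)]
    return float(np.mean(errs))

# One class perfectly predicted, the other half wrong -> BER = 25%
print(balanced_error_rate([0, 0, 1, 1], [0, 0, 1, 0]))  # 0.25
```

Unlike the plain error rate, BER weights both classes equally, so it is not inflated by class imbalance.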

Appendices

A. Proofs from (5) to (6)

For notational convenience we write $p_i$ instead of $p(x_i)$. The log-likelihood function of (4) is

$$L_Q = \frac{1}{N}\ln\left[L(\beta,\Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right) + \ln(1-p_i)\right]. \tag{A.1}$$

First, we give the first- and second-order partial derivatives and the mixed partial derivative of (4) with respect to $\beta$ and $\Theta$, where

$$p_i = \frac{\exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}{1 + \exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}.$$

For $\beta$,

$$\frac{\partial L_Q}{\partial \beta} = \frac{1}{N}\sum_{i=1}^{N}\left[x_i y_i - \frac{\exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}{1 + \exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}\, x_i\right] = \frac{1}{N}\sum_{i=1}^{N} x_i\,(y_i - p_i),$$

$$\frac{\partial^2 L_Q}{\partial \beta^2} = \frac{\partial}{\partial \beta}\,\frac{1}{N}\sum_{i=1}^{N} x_i\,(y_i - p_i) = -\frac{1}{N}\sum_{i=1}^{N} x_i\,\frac{\partial p_i}{\partial \beta^T} = -\frac{1}{N}\sum_{i=1}^{N} p_i\,(1 - p_i)\, x_i x_i^T.$$


Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Methods          Error rate in training (%)   Training time (s)   Error rate in testing (%)   Testing time (s)
Lasso            0.00                         0.2649              1.46                        0.0312
All-pair lasso   0.00                         0.8807              1.35                        0.0953
Our method       0.00                         0.6012              1.12                        0.0556
SVM              0.00                         0.9516              1.23                        0.0936
1-NN             0.00                         0.0000              9.20                        0.7956
3-NN             0.00                         0.0000              8.98                        0.7488
QDA              6.61                         1.4196              4.04                        0.2184
DT               0.00                         0.7332              14.93                       0.1092

[Figure 6: The error rate of the three lasso methods (lasso, our method, and all-pair lasso) in the simulation experiment; the Bayes error is shown by the purple dotted line in the graph. (a) Hierarchical interaction; (b) interaction variables only; (c) main variables only.]


For $\Theta$ and the mixed derivative,

$$\frac{\partial L_Q}{\partial \Theta} = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{2}\,y_i\, x_i x_i^T - p_i\cdot\frac{1}{2}\, x_i x_i^T\right] = \frac{1}{2N}\sum_{i=1}^{N} x_i x_i^T\,(y_i - p_i),$$

$$\frac{\partial^2 L_Q}{\partial \Theta^2} = -\frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\,\frac{\partial p_i}{\partial \Theta} = -\frac{1}{4N}\sum_{i=1}^{N} p_i\,(1 - p_i)\left\|x_i\right\|^4,$$

$$\frac{\partial^2 L_Q}{\partial \beta\,\partial \Theta} = -\frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\partial p_i}{\partial \Theta} = -\frac{1}{2N}\sum_{i=1}^{N} p_i\,(1 - p_i)\, x_i^T x_i\, x_i^T. \tag{A.2}$$

Then (A.1) is expanded in a Taylor series about the expansion point $(\tilde\beta, \tilde\Theta)$, with all derivatives in (A.2) evaluated there and $\tilde p_i = p(x_i)$ taken at $(\tilde\beta, \tilde\Theta)$:

$$l_Q(\beta,\Theta) = L_Q(\tilde\beta,\tilde\Theta) + (\beta-\tilde\beta)^T\frac{\partial L_Q}{\partial\beta} + (\Theta-\tilde\Theta)\cdot\frac{\partial L_Q}{\partial\Theta} + \frac{1}{2}\left[(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta^2}(\beta-\tilde\beta) + 2\,(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}(\Theta-\tilde\Theta) + (\Theta-\tilde\Theta)\cdot\frac{\partial^2 L_Q}{\partial\Theta^2}\cdot(\Theta-\tilde\Theta)\right].$$

Substituting (A.2), the linear terms become

$$\frac{1}{N}\sum_{i=1}^{N} x_i^T(\beta-\tilde\beta)\,(y_i-\tilde p_i) + \frac{1}{2N}\sum_{i=1}^{N} x_i^T(\Theta-\tilde\Theta)\,x_i\,(y_i-\tilde p_i),$$

and the quadratic terms combine into

$$-\frac{1}{2N}\sum_{i=1}^{N}\tilde p_i\,(1-\tilde p_i)\left[x_i^T(\beta-\tilde\beta) + \frac{1}{2}\,x_i^T(\Theta-\tilde\Theta)\,x_i\right]^2.$$

Completing the square with the linear terms then yields the weighted least-squares form

$$l_Q(\beta,\Theta) = -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)^2 + C(\tilde\beta,\tilde\Theta), \tag{A.3}$$

where $C(\tilde\beta,\tilde\Theta)$ collects the terms that do not depend on $(\beta,\Theta)$, and

$$\omega_i = \tilde p_i\,(1-\tilde p_i), \qquad z_i = \sum_j \tilde\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik} + \frac{y_i - \tilde p_i}{\tilde p_i\,(1-\tilde p_i)}. \tag{A.4}$$
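The weights and working response of (A.4) are straightforward to evaluate at the current estimates; a short numerical sketch (our own illustration, with $(\tilde\beta,\tilde\Theta)$ passed in as `beta` and `Theta`):

```python
import numpy as np

def irls_weights_and_response(X, y, beta, Theta):
    """omega_i = p_i (1 - p_i) and the working response z_i of (A.4),
    evaluated at the current estimates (beta, Theta)."""
    # eta_i = x_i^T beta + (1/2) x_i^T Theta x_i
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    omega = p * (1.0 - p)
    z = eta + (y - p) / omega
    return omega, z
```

At beta = 0 and Theta = 0 every p_i is 0.5, so omega_i = 0.25 and z_i = 4 y_i − 2; each outer IRLS iteration recomputes (omega, z) and then solves the penalized weighted least-squares problem (A.3) by coordinate descent.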

B. Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^+ - \beta_j^-$:

(1) $\beta_j^+ > 0$, $\beta_j^- = 0$:

$$\beta_j^+ - \beta_j^- = x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^- - \frac{\lambda_1 + \alpha_j}{\omega_i}; \tag{B.1}$$

(2) $\beta_j^+ = 0$, $\beta_j^- > 0$:

$$\beta_j^+ - \beta_j^- = x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^- + \frac{\lambda_1 + \alpha_j}{\omega_i}; \tag{B.2}$$

(3) $\beta_j^+ = \beta_j^- = 0$:

$$\beta_j^+ - \beta_j^- = x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^-. \tag{B.3}$$

We derive

$$\beta_j^+ - \beta_j^- = S\!\left[x_{ij}\,(z_i - A_i) + \tilde\beta_j^+ - \tilde\beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i}\right], \tag{B.4}$$

where $S[a, t] = \operatorname{sign}(a)\max(|a| - t, 0)$ denotes the soft-thresholding operator.

Secondly, the stationarity condition $\partial L / \partial \Theta_{jk} = 0$ gives

$$\frac{1}{2}\,\omega_i\cdot 2\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(-\frac{1}{2}\,x_{ij}x_{ik}\right) + (\lambda_2 + \alpha_j)\,U_{jk} = 0,$$

that is,

$$\omega_i\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\left(\frac{1}{2}\,x_{ij}x_{ik}\right) = (\lambda_2 + \alpha_j)\,U_{jk},$$

$$\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\left(\frac{1}{2}\,x_{ij}x_{ik}\right) = \frac{(\lambda_2 + \alpha_j)\,U_{jk}}{\omega_i}. \tag{B.5}$$

Supposing

$$\gamma^{(-jk)} = z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik} + \Theta_{jk}\,x_{ij}x_{ik}, \tag{B.6}$$

we have

$$\left(\gamma^{(-jk)} - \Theta_{jk}\,x_{ij}x_{ik}\right)(x_{ij}x_{ik}) = \frac{2\,(\lambda_2 + \alpha_j)\,U_{jk}}{\omega_i},$$

$$\Theta_{jk}\,(x_{ij}x_{ik})^2 = \gamma^{(-jk)}\,x_{ij}x_{ik} - \frac{2\,(\lambda_2 + \alpha_j)\,U_{jk}}{\omega_i},$$

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\,x_{ij}x_{ik} - 2\,(\lambda_2 + \alpha_j)\,U_{jk}/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.7}$$

We discuss three cases for the value of $\Theta_{jk}$:

(1) $\Theta_{jk} > 0$, $U_{jk} = 1$:

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\,x_{ij}x_{ik} - 2\,(\lambda_2 + \alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}; \tag{B.8}$$

(2) $\Theta_{jk} < 0$, $U_{jk} = -1$:

$$\Theta_{jk} = \frac{\gamma^{(-jk)}\,x_{ij}x_{ik} + 2\,(\lambda_2 + \alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}; \tag{B.9}$$

(3) $\Theta_{jk} = 0$.

We derive

$$\Theta_{jk} = \frac{S\!\left[\gamma^{(-jk)}\,x_{ij}x_{ik},\ 2\,(\lambda_2 + \alpha_j)/\omega_i\right]}{(x_{ij}x_{ik})^2}. \tag{B.10}$$

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.

[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.

[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.

[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.

[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.

[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.

[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.

[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.

[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.

[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.

[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.

[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.

[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006.

[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.

[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.

[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.

[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.

[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.

[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.


Page 7: Research Article Coordinate Descent Based Hierarchical ...downloads.hindawi.com/journals/mpe/2014/430201.pdf · Coordinate Descent Based Hierarchical Interactive Lasso Penalized Logistic

Mathematical Problems in Engineering 7

Table 5 The experimental results of the traditional pattern recognition methods

Datasets Classifier Error rate () SD Time (s)

Breast-cancer-Wisconsin

SVM 342 00019 13092LDA 396 00007 02429QDA 492 00014 02309K-NN 461 00030 06636DT 499 00041 16201

Ionosphere

SVM 3583 00063 29149LDA 3310 00090 06168QDA 3201 00076 04516K-NN 2830 00090 06363DT 3823 00408 30865

Liver disorders

SVM 3080 00100 11401LDA 3131 00089 02474QDA 4019 00119 01958K-NN 3689 00124 06089DT 3694 00187 19921

Sonar

SVM 2570 00197 12181LDA 2513 00145 05897QDA 2380 00205 04594K-NN 1309 00101 05345DT 3591 00303 07153

Table 6 The experimental results for the Madelon datasets

Methods Balance error rate () Area under the curve ()Training Validation Testing Training Validation Testing

Lasso 3808 3721 3661 6478 6279 6340All-pair lasso 3498 3701 3657 6502 6299 6343Our method 3000 3635 3512 7000 6808 6776

Appendices

A Proofs from (5) to (6)

For notational convenience we have written 119901119894instead of

119901(119909119894) A logarithmic likelihood function of (4) is as follows

119871119876=

1

119873ln [119871 (120573 Θ)]

=1

119873

119873

sum119894=1

[119910119894(119909119879119894120573 +

1

2119909119879119894Θ119909119894) + ln (1 minus 119901

119894)]

(A1)

First we give the first- and second-order partial derivativeand mixed partial derivative of (4) with respect to 120573 and Θ

120597119871119876

120597120573

=120597 (1119873)sum

119873

119894=1[119910119894(119909119879119894120573 + (12) 119909119879

119894Θ119909119894) + ln (1 minus 119901

119894)]

120597120573

=1

119873

119873

sum119894=1

[(119909119879119894119910119894)119879

minusexp (119909119879

119894120573 + (12) 119909119879

119894Θ119909119894)

1 + exp (119909119879119894120573 + (12) 119909119879

119894Θ119909119894)sdot 119909119894]

=1

119873

119873

sum119894=1

(119910119879119894119909119894minus 119901119879119894119909119894) =

1

119873

119873

sum119894=1

119909119894(119910119894minus 119901119894)

1205972119871119876

1205971205732

=120597

120597120573

1

119873

119873

sum119894=1

119909119879119894(119910119894minus 119901119894) = minus

1

119873

119873

sum119894=1

119909119879119894

120597119901119894

120597120573

=1

119873

119873

sum119894=1

119909119879119894

exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

[1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)]2sdot (minus119909119894)

= minus1

119873

119873

sum119894=1

119909119894

exp (119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

sdot1

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)sdot 119909119879119894

= minus1

119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot 119909119879119894sdot 119909119894

120597119871119876

120597Θ

8 Mathematical Problems in Engineering

Table 7 The experimental results of the three lasso methods and five traditional pattern recognition methods

Methods Error rate in training () Training time (s) Error rate in testing () Testing time (s)Lasso 000 02649 146 00312All-pair lasso 000 08807 135 00953Our method 000 06012 112 00556SVM 000 09516 123 009361-NN 000 00000 920 079563-NN 000 00000 898 07488QDA 661 14196 404 02184DT 000 07332 1493 01092

0055

005

0045

004

0035

003

0025

002

0015

Erro

r rat

e

Lasso Our method All-pair lasso

(a) Hierarchical interaction

0055

005

0045

004

0035

003

0025

002

0015

Erro

r rat

e

Lasso Our method All-pair lasso

(b) Interaction variances only

0055

005

0045

004

0035

003

0025

002

0015

Erro

r rat

e

Lasso Our method All-pair lasso

(c) Main variances only

Figure 6 The error rate of the three lasso methods in the simulation experiment (the Bayes error is shown by the purple dotted line in thegraph)

Mathematical Problems in Engineering 9

=120597 (1119873)sum

119873

119894=1[119910119894(119909119879119894120573 + (12) 119909119879

119894Θ119909119894) + ln (1 minus 119901

119894)]

120597Θ

=1

119873

119873

sum119894=1

[(1

2119910119894119909119879119894119909119894)119879

minusexp (119909119879

119894120573 + (12) 119909119879

119894Θ119909119894)

1 + exp (119909119879119894120573 + (12) 119909119879

119894Θ119909119894)

sdot1

2sdot 119909119879119894119909119894]

=1

119873

119873

sum119894=1

(1

2119910119894119909119879119894119909119894minus1

2119901119894119909119879119894119909119894)

=1

2119873

119873

sum119894=1

119909119879119894119909119894(119910119894minus 119901119894)

1205972119871119876

120597Θ2

=120597

120597Θ

1

2119873

119873

sum119894=1

119909119879119894119909119894(119910119894minus 119901119894) = minus

1

2119873

119873

sum119894=1

119909119879119894119909119894

120597119901119894

120597Θ

=1

2119873

119873

sum119894=1

exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

[1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)]2

sdot (minus1

2119909119879119894119909119894) sdot 119909119879119894119909119894

= minus1

4119873

119873

sum119894=1

exp (119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

sdot1

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)sdot1003817100381710038171003817119909119894

10038171003817100381710038172

= minus1

4119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot1003817100381710038171003817119909119894

10038171003817100381710038172

1205972119871119876

120597120573120597Θ

=120597

120597120573

1

119873

119873

sum119894=1

119909119879119894(119910119894minus 119901119894) = minus

1

119873

119873

sum119894=1

119909119879119894

120597119901119894

120597Θ

=1

119873

119873

sum119894=1

119909119879119894

exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

[1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)]2

sdot (minus1

2119909119879119894119909119894)

= minus1

119873

119873

sum119894=1

119909119879119894

exp (119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

sdot1

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

= minus1

2119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot 119909119879119894sdot 119909119894sdot 119909119879119894

(A2)

Then (A1) is expanded in a Taylor series about the expansion point $(\tilde{\beta}, \tilde{\Theta})$, with $p_i$, $\omega_i$, and $z_i$ evaluated at that point:

\[
l_Q(\beta, \Theta)
= L_Q + (\beta - \tilde{\beta})\cdot\frac{\partial L_Q}{\partial\beta}
+ (\Theta - \tilde{\Theta})\cdot\frac{\partial L_Q}{\partial\Theta}
+ \frac{1}{2}\left[(\beta - \tilde{\beta})^2\,\frac{\partial^2 L_Q}{\partial\beta^2}
+ 2(\beta - \tilde{\beta})^T(\Theta - \tilde{\Theta})\,\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}
+ (\Theta - \tilde{\Theta})^2\,\frac{\partial^2 L_Q}{\partial\Theta^2}\right]
\]
\[
= \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta} + \frac{1}{2}x_i^T\tilde{\Theta} x_i\right) + \ln(1 - p_i)\right]
+ \frac{1}{N}\sum_{i=1}^{N} x_i^T(\beta - \tilde{\beta})(y_i - p_i)
+ \frac{1}{2N}\sum_{i=1}^{N} x_i^T(\Theta - \tilde{\Theta})x_i\,(y_i - p_i)
- \frac{1}{2N}\sum_{i=1}^{N}(1 - p_i)\,p_i\left\{\left[x_i^T(\beta - \tilde{\beta})\right]^2
+ x_i^T(\beta - \tilde{\beta})\cdot x_i^T(\Theta - \tilde{\Theta})x_i
+ \frac{1}{4}\left[x_i^T(\Theta - \tilde{\Theta})x_i\right]^2\right\}
\]
\[
= -\frac{1}{2N}\sum_{i=1}^{N}(1 - p_i)\,p_i\left[x_i^T(\beta - \tilde{\beta}) + \frac{1}{2}x_i^T(\Theta - \tilde{\Theta})x_i\right]^2
+ \frac{1}{N}\sum_{i=1}^{N} x_i^T(\beta - \tilde{\beta})(y_i - p_i)
+ \frac{1}{2N}\sum_{i=1}^{N} x_i^T(\Theta - \tilde{\Theta})x_i\,(y_i - p_i)
+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta} + \frac{1}{2}x_i^T\tilde{\Theta} x_i\right) + \ln(1 - p_i)\right]
\]
\[
= -\frac{1}{2N}\sum_{i=1}^{N}(1 - p_i)\,p_i\left[x_i^T(\beta - \tilde{\beta}) + \frac{1}{2}x_i^T(\Theta - \tilde{\Theta})x_i - \frac{y_i - p_i}{p_i(1 - p_i)}\right]^2
+ \frac{1}{2N}\sum_{i=1}^{N}\frac{(y_i - p_i)^2}{p_i(1 - p_i)}
+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta} + \frac{1}{2}x_i^T\tilde{\Theta} x_i\right) + \ln(1 - p_i)\right]
\]
\[
= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\, x_{ij} x_{ik}\right)^2
+ \frac{1}{2N}\sum_{i=1}^{N}\frac{(y_i - p_i)^2}{p_i(1 - p_i)}
+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta} + \frac{1}{2}x_i^T\tilde{\Theta} x_i\right) + \ln(1 - p_i)\right], \qquad \text{(A3)}
\]

where

\[
\omega_i = p(x_i)\left[1 - p(x_i)\right],
\qquad
z_i = \sum_j \tilde{\beta}_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde{\Theta}_{jk}\, x_{ij} x_{ik}
+ \frac{y_i - p(x_i)}{p(x_i)\left[1 - p(x_i)\right]}. \qquad \text{(A4)}
\]
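Equation (A4) is the standard IRLS (iteratively reweighted least squares) construction: $\omega_i$ is the variance weight and $z_i$ the working response at the current estimates. A minimal sketch, with illustrative names, assuming the logistic-with-interactions model used throughout Appendix A:

```python
import numpy as np

# Sketch of (A4): IRLS weights omega_i and working response z_i for the
# logistic model with pairwise interactions. Names are illustrative.

def irls_weights_response(beta, Theta, X, y):
    # eta_i = x_i^T beta + 0.5 * x_i^T Theta x_i at the current estimates
    eta = X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)                  # omega_i = p(x_i) [1 - p(x_i)]
    z = eta + (y - p) / w              # z_i = eta_i + (y_i - p_i) / omega_i
    return w, z

# At beta = 0, Theta = 0 every p_i = 1/2, so omega_i = 1/4 and z_i = 4 y_i - 2.
X = np.array([[1.0, 2.0], [0.5, -1.0]])
y = np.array([1.0, 0.0])
w, z = irls_weights_response(np.zeros(2), np.zeros((2, 2)), X, y)
```

Each coordinate-descent pass then minimizes the weighted least-squares objective in (A3) with these $\omega_i$ and $z_i$ held fixed.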

B. Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^+ - \beta_j^-$.

(1) $\beta_j^+ > 0$, $\beta_j^- = 0$:
\[
\beta_j^+ - \beta_j^- = x_j^T(z_i - A) + \beta_j^+ - \beta_j^- - \frac{\lambda_1 + \alpha_j}{\omega_i}. \qquad \text{(B1)}
\]

(2) $\beta_j^+ = 0$, $\beta_j^- > 0$:
\[
\beta_j^+ - \beta_j^- = x_j^T(z_i - A) + \beta_j^+ - \beta_j^- + \frac{\lambda_1 + \alpha_j}{\omega_i}. \qquad \text{(B2)}
\]

(3) $\beta_j^+ = \beta_j^- = 0$:
\[
\beta_j^+ - \beta_j^- = x_j^T(z_i - A) + \beta_j^+ - \beta_j^-. \qquad \text{(B3)}
\]

We derive
\[
\beta_j^+ - \beta_j^- = S\left[x_{ij}(z_i - A_i) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i}\right]. \qquad \text{(B4)}
\]
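The three cases (B1)–(B3) collapse into the soft-thresholding operator $S(u, \lambda) = \operatorname{sign}(u)\max(|u| - \lambda, 0)$. The following sketch shows the operator and a generic weighted coordinate update for a main-effect coefficient; the names are illustrative and the hierarchy term is folded into the threshold, so this is the plain penalized update, not the paper's exact code.

```python
import numpy as np

# Soft-thresholding operator S used in (B4) and (B10).
def soft_threshold(u, lam):
    # S(u, lam) = sign(u) * max(|u| - lam, 0)
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

# Illustrative coordinate-descent update for beta_j against the weighted
# working response z: minimize
#   (1/2N) sum_i w_i (z_i - sum_j beta_j x_ij)^2 + lam1 * |beta_j|.
def update_beta_j(j, beta, X, z, w, lam1):
    r = z - X @ beta + X[:, j] * beta[j]   # partial residual excluding x_j
    num = np.mean(w * X[:, j] * r)         # weighted correlation with x_j
    denom = np.mean(w * X[:, j] ** 2)
    return soft_threshold(num, lam1) / denom
```

With unit weights and a zero penalty the update reduces to the ordinary least-squares coordinate fit, which makes it easy to unit-test.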

Secondly, we give the following by the stationary condition $\partial L/\partial\Theta_{jk} = 0$:

\[
\frac{1}{2}\,\omega_i\cdot 2\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\, x_{ij} x_{ik}\right)\cdot\left(-\frac{1}{2}\,x_{ij} x_{ik}\right) + (\lambda_2 + \alpha_j)\,U_{jk} = 0,
\]
\[
\omega_i\cdot\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\, x_{ij} x_{ik}\right)\cdot\left(\frac{1}{2}\,x_{ij} x_{ik}\right) = (\lambda_2 + \alpha_j)\,U_{jk},
\]
\[
\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\, x_{ij} x_{ik}\right)\cdot\left(\frac{1}{2}\,x_{ij} x_{ik}\right) = \frac{(\lambda_2 + \alpha_j)\,U_{jk}}{\omega_i}. \qquad \text{(B5)}
\]

Supposing
\[
\gamma^{(-jk)} = z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\, x_{ij} x_{ik} + \Theta_{jk}\, x_{ij} x_{ik}, \qquad \text{(B6)}
\]
we have
\[
\left(\gamma^{(-jk)} - \Theta_{jk}\, x_{ij} x_{ik}\right)\cdot\left(x_{ij} x_{ik}\right) = \frac{2(\lambda_2 + \alpha_j)\,U_{jk}}{\omega_i},
\]
\[
\Theta_{jk}\, x_{ij} x_{ik}\cdot x_{ij} x_{ik} = -\frac{2(\lambda_2 + \alpha_j)\,U_{jk}}{\omega_i} + \gamma^{(-jk)}\cdot x_{ij} x_{ik},
\]
\[
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij} x_{ik} - 2(\lambda_2 + \alpha_j)\,U_{jk}/\omega_i}{\left(x_{ij} x_{ik}\right)^2}. \qquad \text{(B7)}
\]

We discuss three cases for the value of $\Theta_{jk}$.

(1) $\Theta_{jk} > 0$, $U_{jk} = 1$:
\[
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij} x_{ik} - 2(\lambda_2 + \alpha_j)/\omega_i}{\left(x_{ij} x_{ik}\right)^2}. \qquad \text{(B8)}
\]

(2) $\Theta_{jk} < 0$, $U_{jk} = -1$:
\[
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij} x_{ik} + 2(\lambda_2 + \alpha_j)/\omega_i}{\left(x_{ij} x_{ik}\right)^2}. \qquad \text{(B9)}
\]

(3) $\Theta_{jk} = 0$.

We derive
\[
\Theta_{jk} = \frac{S\left[\gamma^{(-jk)}\cdot x_{ij} x_{ik},\ 2(\lambda_2 + \alpha_j)/\omega_i\right]}{\left(x_{ij} x_{ik}\right)^2}. \qquad \text{(B10)}
\]
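Equation (B10) is again a single soft-thresholding step, now on the interaction coordinate $\Theta_{jk}$ with threshold $2(\lambda_2 + \alpha_j)/\omega_i$. A scalar sketch follows; it is illustrative only, and $\gamma^{(-jk)}$, the weight, and the hierarchy multiplier $\alpha_j$ are taken as given inputs rather than computed.

```python
import numpy as np

def soft_threshold(u, lam):
    # S(u, lam) = sign(u) * max(|u| - lam, 0)
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

# Sketch of the closed-form update (B10) for one interaction coefficient:
#   Theta_jk = S(gamma_mjk * q, 2*(lam2 + alpha_j)/w) / q**2,
# where q = x_ij * x_ik and gamma_mjk is the residual with Theta_jk's own
# contribution added back, cf. (B6). All inputs are assumed given.
def update_theta_jk(gamma_mjk, q, w, lam2, alpha_j):
    return soft_threshold(gamma_mjk * q, 2.0 * (lam2 + alpha_j) / w) / q ** 2
```

A coefficient whose unpenalized value falls below the threshold is set exactly to zero, which is how the model produces sparse interaction estimates.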

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.

[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.

[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.

[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.

[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.

[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.

[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.

[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.

[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.

[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.

[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.

[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.

[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.

[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.

[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.

[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.

[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.

[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.

[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.

[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.


Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Methods          Error rate in training (%)   Training time (s)   Error rate in testing (%)   Testing time (s)
Lasso            0.00                         0.2649              1.46                        0.0312
All-pair lasso   0.00                         0.8807              1.35                        0.0953
Our method       0.00                         0.6012              1.12                        0.0556
SVM              0.00                         0.9516              1.23                        0.0936
1-NN             0.00                         0.0000              9.20                        0.7956
3-NN             0.00                         0.0000              8.98                        0.7488
QDA              6.61                         1.4196              4.04                        0.2184
DT               0.00                         0.7332              14.93                       0.1092

Figure 6: The error rate of the three lasso methods (Lasso, Our method, All-pair lasso) in the simulation experiment; the Bayes error is shown by the purple dotted line in the graph. (a) Hierarchical interaction. (b) Interaction variances only. (c) Main variances only.

Mathematical Problems in Engineering 9

=120597 (1119873)sum

119873

119894=1[119910119894(119909119879119894120573 + (12) 119909119879

119894Θ119909119894) + ln (1 minus 119901

119894)]

120597Θ

=1

119873

119873

sum119894=1

[(1

2119910119894119909119879119894119909119894)119879

minusexp (119909119879

119894120573 + (12) 119909119879

119894Θ119909119894)

1 + exp (119909119879119894120573 + (12) 119909119879

119894Θ119909119894)

sdot1

2sdot 119909119879119894119909119894]

=1

119873

119873

sum119894=1

(1

2119910119894119909119879119894119909119894minus1

2119901119894119909119879119894119909119894)

=1

2119873

119873

sum119894=1

119909119879119894119909119894(119910119894minus 119901119894)

1205972119871119876

120597Θ2

=120597

120597Θ

1

2119873

119873

sum119894=1

119909119879119894119909119894(119910119894minus 119901119894) = minus

1

2119873

119873

sum119894=1

119909119879119894119909119894

120597119901119894

120597Θ

=1

2119873

119873

sum119894=1

exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

[1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)]2

sdot (minus1

2119909119879119894119909119894) sdot 119909119879119894119909119894

= minus1

4119873

119873

sum119894=1

exp (119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

sdot1

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)sdot1003817100381710038171003817119909119894

10038171003817100381710038172

= minus1

4119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot1003817100381710038171003817119909119894

10038171003817100381710038172

1205972119871119876

120597120573120597Θ

=120597

120597120573

1

119873

119873

sum119894=1

119909119879119894(119910119894minus 119901119894) = minus

1

119873

119873

sum119894=1

119909119879119894

120597119901119894

120597Θ

=1

119873

119873

sum119894=1

119909119879119894

exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

[1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)]2

sdot (minus1

2119909119879119894119909119894)

= minus1

119873

119873

sum119894=1

119909119879119894

exp (119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

sdot1

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

= minus1

2119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot 119909119879119894sdot 119909119894sdot 119909119879119894

(A2)

Then (A1) is expanded by using Taylor series with respectto the expended point (120573 Θ)

119897119876(120573 Θ)

= 119871119876+ (120573 minus 120573) sdot

120597119871119876

120597120573+ (Θ minus Θ) sdot

120597119871119876

120597Θ

+1

2[(120573 minus 120573) sdot

120597119871119876

1205971205732+ 2 (120573 minus 120573)

119879

(Θ minus Θ)1205972119871119876

120597120573120597Θ

+ (Θ minus Θ) sdot120597119871119876

120597Θ2]

=1

119873

119873

sum119894=1

[119910119894(119909119879119894120573 +

1

2119909119879119894Θ119909119894) + ln (1 minus 119901

119894)]

+ (120573 minus 120573) sdot1

119873

119873

sum119894=1

119909119894(119910119894minus 119901119894)

+ (Θ minus Θ) sdot1

2119873

119873

sum119894=1

119909119879119894119909119894(119910119894minus 119901119894)

minus1

2(120573 minus 120573) sdot

1

119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot 119909119879119894sdot 119909119894

minus 2 (120573 minus 120573)119879

(Θ minus Θ)

sdot1

2119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot 119909119879119894sdot 119909119894sdot 119909119879119894

minus (Θ minus Θ) sdot1

4119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot1003817100381710038171003817119909119894

10038171003817100381710038172

= minus1

2119873

119873

sum119894=1

(1 minus 119901119894)

sdot 119901119894[119909119879119894(120573 minus 120573)]

2

+ (120573 minus 120573)119879

(Θ minus Θ) 119909119894

sdot 119909119879119894119909119894+1

4[119909119879119894(Θ minus Θ) 119909

119894]2

+1

119873

119873

sum119894=1

[119910119894(119909119879119894120573 +

1

2119909119879119894Θ119909119894) + ln (1 minus 119901

119894)]

+1

119873

119873

sum119894=1

119909119879119894(120573 minus 120573) (119910

119894minus 119901119894)

+1

2119873

119873

sum119894=1

119909119879119894(Θ minus Θ) 119909

119894(119910119894minus 119901119894)

= minus1

2119873

119873

sum119894=1

(1 minus 119901119894)

10 Mathematical Problems in Engineering

sdot 119901119894119909119879119894(120573 minus 120573)

+1

2119909119879119894(Θ minus Θ) 119909

1198942

+1

119873

119873

sum119894=1

119909119879119894(120573 minus 120573) (119910

119894minus 119901119894)

+1

2119873

119873

sum119894=1

119909119879119894(Θ minus Θ) 119909

119894(119910119894minus 119901119894)

+1

119873

119873

sum119894=1

[119910119894(119909119879119894120573 +

1

2119909119879119894Θ119909119894) + ln (1 minus 119901

119894)]

= minus1

2119873

119873

sum119894=1

(1 minus 119901119894)

sdot 119901119894119909119879119894(120573 minus 120573) +

1

2119909119879119894(Θ minus Θ) 119909

119894

+119910119894minus 119901119894

119901119894[1 minus 119901

119894]

2

+1

119873

119873

sum119894=1

[119910119894(119909119879119894120573 +

1

2119909119879119894Θ119909119894) + ln (1 minus 119901

119894)]

minus1

2119873

119873

sum119894=1

[119910119894minus 119901119894

119901119894[1 minus 119901

119894]]

2

= minus1

2119873

119873

sum119894=1

120596119894(119911119894minussum119895

120573119895119909119894119895minus1

2sum119895 =119896

Θ119895119896119909119894119895119909119894119896)

2

+1

119873

119873

sum119894=1

[119910119894(119909119879119894120573 +

1

2119909119879119894Θ119909119894) + ln (1 minus 119901

119894)]

minus1

2119873

119873

sum119894=1

[119910119894minus 119901119894

119901119894[1 minus 119901

119894]]

2

(A3)

where

120596119894= 119901 (119909

119894) [1 minus 119901 (119909

119894)]

119911119894= sum119895

120573119895119909119894119895+1

2sum119895 =119896

Θ119895119896119909119894119895119909119894119896

+119910119894minus 119901 (119909

119894)

119901 (119909119894) [1 minus 119901 (119909

119894)]

(A4)

B Proofs from (12) to (13)

First there are three cases in calculating 120573+119895minus 120573minus119895

(1) 120573+119895ge 0 120573minus

119895= 0

120573+119895minus 120573minus119895= 119909119879119895(119911119894minus 119860) + 120573+

119895minus 120573minus119895minus1205821+ 119895

120596119894

(B1)

(2) 120573+119895ge 0 120573minus

119895= 0

120573+119895minus 120573minus119895= 119909119879119895(119911119894minus 119860) + 120573+

119895minus 120573minus119895+1205821+ 119895

120596119894

(B2)

(3) 120573+119895ge 0 120573minus

119895= 0

120573+119895minus 120573minus119895= 119909119879119895(119911119894minus 119860) + 120573+

119895minus 120573minus119895 (B3)

We derive

120573+119895minus 120573minus119895= 119878 [119909

119894119895(119911119894minus 119860119894) + 120573+119895minus 120573minus1198951205821+ 119895

120596119894

] (B4)

Secondly we give the following by the stationary condi-tion 120597119871120597Θ

119895119896= 0

1

2120596119894sdot 2(119911

119894minussum119895

120573119895119909119894119895minus1

2sum119895 =119896

Θ119895119896119909119894119895119909119894119896)

sdot (minus1

2119909119894119895119909119894119896) + (120582

2+ 120572119895)119880119895119896= 0

120596119894sdot (119911119894minussum119895

120573119895119909119894119895minus1

2sum119895 =119896

Θ119895119896119909119894119895119909119894119896)

sdot (1

2119909119894119895119909119894119896) = (120582

2+ 120572119895)119880119895119896

(119911119894minussum119895

120573119895119909119894119895minus1

2sum119895 =119896

Θ119895119896119909119894119895119909119894119896)

sdot (1

2119909119894119895119909119894119896) =

(1205822+ 120572119895)119880119895119896

120596119894

(B5)

Supposing

120574(minus119895119896) = 119911119894minussum119895

120573119895119909119894119895minus1

2sum119895 =119896

Θ119895119896119909119894119895119909119894119896+ Θ119895119896119909119894119895119909119894119896 (B6)

we have

(120574(minus119895119896) minus Θ119895119896119909119894119895119909119894119896) sdot (119909119894119895119909119894119896)

=2 (1205822+ 120572119895)119880119895119896

120596119894

Θ119895119896119909119894119895119909119894119896119909119894119895119909119894119896

= minus2 (1205822+ 120572119895)119880119895119896

120596119894

+ 120574(minus119895119896) sdot 119909119894119895119909119894119896

Mathematical Problems in Engineering 11

Θ119895119896

=120574(minus119895119896) sdot 119909

119894119895119909119894119896minus 2 (120582

2+ 120572119895)119880119895119896120596119894

(119909119894119895119909119894119896)2

(B7)

We discuss three cases about Θ119895119896value

(1) Θ119895119896gt 0 119880

119895119896= 1

Θ119895119896=120574(minus119895119896) sdot 119909

119894119895119909119894119896minus 2 (120582

2+ 120572119895) 120596119894

(119909119894119895119909119894119896)2

(B8)

(2) Θ119895119896lt 0 119880

119895119896= minus1

Θ119895119896=120574(minus119895119896) sdot 119909

119894119895119909119894119896+ 2 (120582

2+ 120572119895) 120596119894

(119909119894119895119909119894119896)2

(B9)

(3) Θ119895119896= 0

We drive

Θ119895119896=119878 [120574(minus119895119896) sdot 119909

119894119895119909119894119896 2 (1205822+ 120572119895) 120596119894]

(119909119894119895119909119894119896)2

(B10)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (nos 61273019 61473339) China Post-doctoral Science Foundation (2014M561202) Hebei Postdoc-toral Science Foundation Special Fund Project and HebeiTop Young Talents Support Program

References

[1] R Tibshirani ldquoRegression shrinkage and selection via the lassordquoJournal of the Royal Statistical Society B vol 58 no 1 pp 267ndash288 1996

[2] M Y Park and T Hastie ldquo1198711-regularization path algorithm

for generalized linear modelsrdquo Journal of the Royal StatisticalSociety B Statistical Methodology vol 69 no 4 pp 659ndash6772007

[3] T TWu Y F Chen THastie E Sobel andK Lange ldquoGenome-wide association analysis by lasso penalized logistic regressionrdquoBioinformatics vol 25 no 6 pp 714ndash721 2009

[4] J Friedman T Hastie and R Tibshirani ldquoRegularization pathsfor generalized linear models via coordinate descentrdquo Journal ofStatistical Software vol 33 no 1 pp 1ndash22 2010

[5] M Lim and T Hastie ldquoLearning interactions via hierarchicalgroup-lasso regularization[EB]rdquo httpwwwstanfordedusimhastiePapersglinternetpdf

[6] H Schwender and K Ickstadt ldquoIdentification of SNP interac-tions using logic regressionrdquo Biostatistics vol 9 no 1 pp 187ndash198 2008

[7] S Noah and R Tibshirani ldquoA Permutation Approach to TestingInteractions inManyDimensionsrdquo httpstatwebstanfordedusimtibsresearchhtml

[8] JWu BDevlin S RingquistM Trucco andK Roeder ldquoScreenand clean a tool for identifying interactions in genome-wideassociation studiesrdquo Genetic Epidemiology vol 34 no 3 pp275ndash285 2010

[9] Y Nardi and A Rinaldo ldquoThe log-linear group-lasso estimatorand its asymptotic propertiesrdquo Bernoulli vol 18 no 3 pp 945ndash974 2012

[10] M Yuan V R Joseph and Y Lin ldquoAn efficient variable selectionapproach for analyzing designed experimentsrdquo Technometricsvol 49 no 4 pp 430ndash439 2007

[11] H Chipman ldquoBayesian variable selection with related predic-torsrdquoThe Canadian Journal of Statistics vol 24 no 1 pp 17ndash361996

[12] N H Choi W Li and J Zhu ldquoVariable selection with thestrong heredity constraint and its oracle propertyrdquo Journal of theAmerican Statistical Association vol 105 no 489 pp 354ndash3642010

[13] J Bien J Taylor and R Tibshirani ldquoA lasso for hierarchicalinteractionsrdquoTheAnnals of Statistics vol 41 no 3 pp 1111ndash11412013

[14] M Yuan and Y Lin ldquoModel selection and estimation in regres-sion with grouped variablesrdquo Journal of the Royal StatisticalSociety Series B StatisticalMethodology vol 68 no 1 pp 49ndash672006

[15] R Jenatton J-Y Audibert and F Bach ldquoStructured variableselection with sparsity-inducing normsrdquo Journal of MachineLearning Research vol 12 no 10 pp 2777ndash2824 2011

[16] P Radchenko and G M James ldquoVariable selection usingadaptive nonlinear interaction structures in high dimensionsrdquoJournal of the American Statistical Association vol 105 no 492pp 1541ndash1553 2010

[17] F Bach R Jenatton J Mairal and G Obozinski ldquoStructuredsparsity through convex optimizationrdquo Statistical Science vol27 no 4 pp 450ndash468 2012

[18] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003

[19] P Hall and J-H Xue ldquoOn selecting interacting features fromhigh-dimensional datardquo Computational Statistics amp Data Anal-ysis vol 71 pp 694ndash708 2014

[20] J-J Wang J Li T Zhang and W-X Hong ldquoDistinguishingvisual feature extraction method using quadratic map andgenetic algorithmrdquo Journal of System Simulation vol 21 no 16pp 5080ndash5083 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article Coordinate Descent Based Hierarchical ...downloads.hindawi.com/journals/mpe/2014/430201.pdf · Coordinate Descent Based Hierarchical Interactive Lasso Penalized Logistic

Mathematical Problems in Engineering 9

=120597 (1119873)sum

119873

119894=1[119910119894(119909119879119894120573 + (12) 119909119879

119894Θ119909119894) + ln (1 minus 119901

119894)]

120597Θ

=1

119873

119873

sum119894=1

[(1

2119910119894119909119879119894119909119894)119879

minusexp (119909119879

119894120573 + (12) 119909119879

119894Θ119909119894)

1 + exp (119909119879119894120573 + (12) 119909119879

119894Θ119909119894)

sdot1

2sdot 119909119879119894119909119894]

=1

119873

119873

sum119894=1

(1

2119910119894119909119879119894119909119894minus1

2119901119894119909119879119894119909119894)

=1

2119873

119873

sum119894=1

119909119879119894119909119894(119910119894minus 119901119894)

1205972119871119876

120597Θ2

=120597

120597Θ

1

2119873

119873

sum119894=1

119909119879119894119909119894(119910119894minus 119901119894) = minus

1

2119873

119873

sum119894=1

119909119879119894119909119894

120597119901119894

120597Θ

=1

2119873

119873

sum119894=1

exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

[1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)]2

sdot (minus1

2119909119879119894119909119894) sdot 119909119879119894119909119894

= minus1

4119873

119873

sum119894=1

exp (119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

sdot1

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)sdot1003817100381710038171003817119909119894

10038171003817100381710038172

= minus1

4119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot1003817100381710038171003817119909119894

10038171003817100381710038172

1205972119871119876

120597120573120597Θ

=120597

120597120573

1

119873

119873

sum119894=1

119909119879119894(119910119894minus 119901119894) = minus

1

119873

119873

sum119894=1

119909119879119894

120597119901119894

120597Θ

=1

119873

119873

sum119894=1

119909119879119894

exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

[1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)]2

sdot (minus1

2119909119879119894119909119894)

= minus1

119873

119873

sum119894=1

119909119879119894

exp (119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

sdot1

1 + exp (minus119909119879119894120573 minus (12) 119909119879

119894Θ119909119894)

= minus1

2119873

119873

sum119894=1

(1 minus 119901119894) sdot 119901119894sdot 119909119879119894sdot 119909119894sdot 119909119879119894

(A2)

Then (A1) is expanded in a Taylor series about the point $(\tilde{\beta}, \tilde{\Theta})$:

\[
\begin{aligned}
l_Q(\beta,\Theta)
={}& L_Q + (\beta-\tilde{\beta})^T \frac{\partial L_Q}{\partial \beta}
      + (\Theta-\tilde{\Theta})\,\frac{\partial L_Q}{\partial \Theta} \\
&+ \frac{1}{2}\left[(\beta-\tilde{\beta})^T \frac{\partial^2 L_Q}{\partial \beta^2}\,(\beta-\tilde{\beta})
      + 2\,(\beta-\tilde{\beta})^T \frac{\partial^2 L_Q}{\partial \beta\,\partial \Theta}\,(\Theta-\tilde{\Theta})
      + (\Theta-\tilde{\Theta})\,\frac{\partial^2 L_Q}{\partial \Theta^2}\,(\Theta-\tilde{\Theta})\right] \\
={}& \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right]
 + \frac{1}{N}\sum_{i=1}^{N} x_i^T(\beta-\tilde{\beta})\left(y_i-p_i\right)
 + \frac{1}{2N}\sum_{i=1}^{N} x_i^T(\Theta-\tilde{\Theta})\,x_i\left(y_i-p_i\right) \\
&- \frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left[x_i^T(\beta-\tilde{\beta})+\frac{1}{2}\,x_i^T(\Theta-\tilde{\Theta})\,x_i\right]^2 \\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left[x_i^T(\beta-\tilde{\beta})+\frac{1}{2}\,x_i^T(\Theta-\tilde{\Theta})\,x_i-\frac{y_i-p_i}{p_i\left(1-p_i\right)}\right]^2 \\
&+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right]
 + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)} \\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)^{2} \\
&+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right]
 + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)},
\end{aligned}
\tag{A3}
\]

where

\[
\omega_i = p\left(x_i\right)\left[1-p\left(x_i\right)\right],\qquad
z_i = \sum_{j}\tilde{\beta}_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde{\Theta}_{jk}\,x_{ij}x_{ik}
      + \frac{y_i-p\left(x_i\right)}{p\left(x_i\right)\left[1-p\left(x_i\right)\right]}.
\tag{A4}
\]
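As a sanity check on (A4), the weights $\omega_i$ and working responses $z_i$ can be computed directly at the current estimates. This is a minimal NumPy sketch in our own notation (not the authors' code), assuming the quadratic-logistic $p_i$ defined above:

```python
import numpy as np

def irls_working_quantities(X, y, beta, Theta):
    """Weights omega_i and working responses z_i of (A4), evaluated
    at the current estimates (beta, Theta)."""
    # eta_i = sum_j beta_j x_ij + (1/2) sum_{j,k} Theta_jk x_ij x_ik
    eta = X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)            # omega_i = p_i (1 - p_i)
    z = eta + (y - p) / w        # linear part plus standardized residual
    return w, z

# With beta = 0 and Theta = 0: p_i = 1/2, omega_i = 1/4, z_i = 4 y_i - 2.
X = np.array([[1.0, 2.0], [0.5, -1.0]])
y = np.array([1.0, 0.0])
w, z = irls_working_quantities(X, y, np.zeros(2), np.zeros((2, 2)))
assert np.allclose(w, 0.25) and np.allclose(z, [2.0, -2.0])
```

Each coordinate descent pass then solves the weighted least-squares problem in (A3) with these $\omega_i$ and $z_i$ held fixed.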

B. Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^+ - \beta_j^-$.

(1) $\beta_j^+ > 0$, $\beta_j^- = 0$:
\[
\beta_j^+ - \beta_j^- = x_j^T\left(z - A\right) + \tilde{\beta}_j^+ - \tilde{\beta}_j^- - \frac{\lambda_1+\alpha_j}{\omega_i}.
\tag{B1}
\]

(2) $\beta_j^+ = 0$, $\beta_j^- > 0$:
\[
\beta_j^+ - \beta_j^- = x_j^T\left(z - A\right) + \tilde{\beta}_j^+ - \tilde{\beta}_j^- + \frac{\lambda_1+\alpha_j}{\omega_i}.
\tag{B2}
\]

(3) $\beta_j^+ = \beta_j^- = 0$:
\[
\beta_j^+ - \beta_j^- = x_j^T\left(z - A\right) + \tilde{\beta}_j^+ - \tilde{\beta}_j^-.
\tag{B3}
\]

Combining the three cases, we derive

\[
\beta_j^+ - \beta_j^- = S\left[x_{ij}\left(z_i - A_i\right) + \tilde{\beta}_j^+ - \tilde{\beta}_j^-,\ \frac{\lambda_1+\alpha_j}{\omega_i}\right],
\tag{B4}
\]

where $S[a,b]=\operatorname{sign}(a)\left(|a|-b\right)_+$ denotes the soft-thresholding operator.
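The operator $S[\cdot,\cdot]$ in (B4) is ordinary soft-thresholding, which is what produces exact zeros in the lasso coordinate updates. A small sketch (names are ours):

```python
import numpy as np

def soft_threshold(a, b):
    """S[a, b] = sign(a) * max(|a| - b, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

# The three cases (B1)-(B3) collapse into one update: shrink the
# unpenalized coordinate estimate toward zero by the penalty-to-weight
# ratio, and truncate at zero.
assert soft_threshold(3.0, 1.0) == 2.0     # positive case, shrunk
assert soft_threshold(-3.0, 1.0) == -2.0   # negative case, shrunk
assert soft_threshold(0.5, 1.0) == 0.0     # small case, set exactly to zero
```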

Secondly, the stationarity condition $\partial L/\partial \Theta_{jk} = 0$ gives

\[
\frac{1}{2}\,\omega_i\cdot 2\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)
\cdot\left(-\frac{1}{2}\,x_{ij}x_{ik}\right)+\left(\lambda_2+\alpha_j\right)U_{jk}=0,
\]
\[
\omega_i\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)
\cdot\left(\frac{1}{2}\,x_{ij}x_{ik}\right)=\left(\lambda_2+\alpha_j\right)U_{jk},
\]
\[
\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)
\cdot\left(\frac{1}{2}\,x_{ij}x_{ik}\right)=\frac{\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}.
\tag{B5}
\]

Supposing

\[
\gamma_{(-jk)} = z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}+\Theta_{jk}\,x_{ij}x_{ik},
\tag{B6}
\]

we have

\[
\left(\gamma_{(-jk)}-\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(x_{ij}x_{ik}\right)
=\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},
\]
\[
\Theta_{jk}\,x_{ij}x_{ik}\,x_{ij}x_{ik}
=-\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}+\gamma_{(-jk)}\cdot x_{ij}x_{ik},
\]
\[
\Theta_{jk}
=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\tag{B7}
\]

We discuss three cases for the value of $\Theta_{jk}$.

(1) $\Theta_{jk}>0$, $U_{jk}=1$:
\[
\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\tag{B8}
\]

(2) $\Theta_{jk}<0$, $U_{jk}=-1$:
\[
\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}+2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\tag{B9}
\]

(3) $\Theta_{jk}=0$.

Combining the three cases, we derive

\[
\Theta_{jk}=\frac{S\left[\gamma_{(-jk)}\cdot x_{ij}x_{ik},\ 2\left(\lambda_2+\alpha_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^2}.
\tag{B10}
\]
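Combining (B8), (B9), and (B10), the coordinate update for an interaction coefficient soft-thresholds the partial residual term and then rescales. A hedged sketch (function and argument names are ours; the combined penalty $\lambda_2+\alpha_j$ is passed as a single number):

```python
import numpy as np

def soft_threshold(a, b):
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

def update_theta_jk(gamma_mjk, xjk, lam2_plus_alpha, omega):
    """(B10)-style update: soft-threshold gamma_(-jk) * x_ij x_ik at
    2 (lambda_2 + alpha_j) / omega_i, then divide by (x_ij x_ik)^2."""
    return soft_threshold(gamma_mjk * xjk, 2.0 * lam2_plus_alpha / omega) / xjk**2

# A strong partial residual survives thresholding...
assert update_theta_jk(4.0, 1.0, 0.5, 1.0) == 3.0
# ...while a weak one is set exactly to zero: sparsity in the interactions.
assert update_theta_jk(0.5, 1.0, 0.5, 1.0) == 0.0
```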

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61273019, 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.

References

[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267-288, 1996.

[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659-677, 2007.

[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714-721, 2009.

[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1-22, 2010.

[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.

[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187-198, 2008.

[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.

[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275-285, 2010.

[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945-974, 2012.

[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430-439, 2007.

[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17-36, 1996.

[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354-364, 2010.

[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111-1141, 2013.

[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 68, no. 1, pp. 49-67, 2006.

[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777-2824, 2011.

[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541-1553, 2010.

[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450-468, 2012.

[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475-511, 2003.

[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694-708, 2014.

[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080-5083, 2009.

