Research Article: Coordinate Descent Based Hierarchical Interactive Lasso Penalized Logistic Regression and Its Application to Classification Problems
Jin-Jia Wang and Yang Lu
School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
Correspondence should be addressed to Jin-Jia Wang; wjj@ysu.edu.cn
Received 20 August 2014; Revised 1 December 2014; Accepted 1 December 2014; Published 16 December 2014
Academic Editor: Wei-Chiang Hong
Copyright © 2014 J.-J. Wang and Y. Lu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We present the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm, based on the hierarchy theory and variable interactions. We define the interaction model based on geometric algebra and hierarchical constraint conditions, and then use the coordinate descent algorithm to solve for the coefficients of the hierarchical interactive lasso model. We provide the results of experiments based on UCI datasets, the Madelon datasets from NIPS2003, and daily activities of the elderly. The experimental results show that variable interactions and hierarchy contribute significantly to the classification. The hierarchical interactive lasso combines the advantages of the lasso and the interactive lasso.
1. Introduction
Sparse linear models (such as the lasso) are a remarkable success of the regression analysis of high-dimensional data [1]. The lasso is a least squares regression with the L1 penalty function. It can also be extended to the generalized linear model [2], for example, the logistic regression with the L1 penalty used for classification [3]. In the lasso model, the response variable is assumed to be a linear weighted sum of the predictor variables, and the optimization problem used to find the weighting coefficients can be solved by the coordinate descent algorithm [4]. If, in the analysis of high-dimensional data, the response variable cannot be explained by a linear weighted sum of predictor variables, a higher-order or quadratic model needs to be used. In most cases this suggests the presence of variable interactions [5]. The presence of such interactions is considered important: for example, the interaction between single nucleotide polymorphisms (SNPs) plays an important role in the diagnosis of cancer and other diseases [6]. While the linear model has some advantages, such as good interpretability and simple calculations, variable interaction models are considered a focus of modern research [7].
There are three types of methods used in hierarchical interaction models. The first is a multistep method. This method is based on removing or adding the best predictor variables or interaction variables in each iteration. Once the predictor variables corresponding to the interaction variables are in the model, the interaction variables must be in the model as well [8]. Alternatively, we can consider the variable selection before the interaction selection [9]. Usually the modified LARS algorithm is used in such models to solve the interaction model [10]. The second type is the Bayes model method. This approach improves the random search variable selection method for the hierarchical interaction model [11]. The third type is based on optimization. The sparse interaction model is formulated as a nonconvex optimization problem [12] and further expressed as a convex optimization problem, such as the all-pair lasso [13] or the interaction group lasso [14].
In the literature on sparse structures [15], composite absolute penalties (CAP) can also obtain sparseness of the group and interaction, but the interaction coefficient is penalized twice [16]. To solve for the hierarchical sparseness in the nonlinear interaction problem, the existing literature [17] has introduced the VANISH method. The logic regression method considers the binary variable high-level interaction [18]. The existing literature [19] uses a simple recursive approach to select the interaction variables from high-dimensional data. The literature [20] proposed a genetic
Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014, Article ID 430201, 11 pages, http://dx.doi.org/10.1155/2014/430201
Figure 1: The diagram of the subspaces in geometric algebra: the 0-vector; the 1-vectors $e_1, e_2, \ldots, e_i, \ldots, e_d$; the 2-vectors $e_1 \wedge e_2, e_1 \wedge e_3, \ldots, e_i \wedge e_j, \ldots, e_{d-1} \wedge e_d$; the $k$-vectors $e_i \wedge \cdots \wedge e_j \wedge \cdots \wedge e_k$; and the $d$-vector $e_1 \wedge e_2 \wedge \cdots \wedge e_d$.
algorithm using selection to choose interaction variables in high-dimensional data.
The literature [13] presents a hierarchical interactive lasso method for regression and provides a method of model coefficient estimation using the KKT conditions and the Lagrange multiplier method. Based on the literature [13] and our past work, we propose the concept of geometric algebra interaction and a coordinate descent algorithm for the hierarchical interactive lasso penalized logistic regression. We used experimental data including 4 kinds of datasets from the UCI machine learning database, one Madelon dataset from NIPS2003, and one daily life activity recognition dataset. The experimental results reveal the outstanding advantages of the hierarchical interactive lasso method compared to the lasso and interactive lasso methods. The innovations include the following: (1) we use geometric algebra to explain variable interaction; (2) we derive an improved coordinate descent algorithm to solve the hierarchical interactive lasso penalized logistic regression; (3) we use the hierarchical interactive lasso for the classification problem.
2. The Variable Interaction Theory of Geometric Algebra
Definition 1. If the function $f(x, y)$ cannot be represented as a sum of independent functions $f_1(x) + f_2(y)$, then $x, y$ in the function $f$ are said to have interaction.
A popular explanation of Definition 1 is that if a response variable cannot be represented as a linear weighted sum of the prediction variables, it is probably because there are interactions between the variables.
Interactions between variables can be easily explained by geometric algebra theory. Figure 1 is a diagram showing all subspaces in geometric algebra. The 1-vectors, namely the order-1 main variables, can represent a $p$-dimensional subspace of the original data; that is, the $p$-dimensional basis of the original data is projected on the 1-vectors. The 2-vectors show the interaction between two variables. The simplest 2-vector coefficient can be the product of two 1-vectors. In the literature [13], our proposed area feature is considered as one of the interactions. In the literature [20], our proposed orthocenter feature is considered as one of the interactions. Higher-order interactions are represented by $k$-vectors. In this paper we only study area interactions between 1-vectors. This method can also be extended to nonlinear complex function interactions or higher orders.
3. The Binary Logistic Regression Based on Interaction and Hierarchy
The outcome variable in the binary logistic model is denoted by $Y$; the input variables are the predictors $X$; $x_1, \ldots, x_j, \ldots, x_p$ are order-1 main variables, and the pairwise $x_j x_k$ are interaction variables between the order-1 main variables. The binary logistic model has the form
$$\mathrm{logit}\left(P(Y = 1 \mid X)\right) = \sum_{j=0}^{p} \beta_j x_j + \frac{1}{2}\sum_{j \neq k} \Theta_{jk} x_j x_k + \varepsilon, \quad (1)$$
where $\Theta_{jj} = 0$, the main variable coefficients are $\beta \in R^{p+1}$, the interaction variable coefficients are $\Theta \in R^{p \times p}$, $x_0$ is 1, and $\varepsilon$ satisfies $N(0, \sigma^2)$.

Assume that the training samples are $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)$, $\mathbf{x}_i \in R^p$,
$$y_i = \begin{cases} 1, & Y_i = 1, \\ 0, & Y_i = 2. \end{cases} \quad (2)$$
Our goal is to select a feature subset from the order-1 main variables (dimension $p$) and the order-2 interaction variables (dimension $p(p-1)/2$). We then estimate the coefficient values for the nonzero model parameters. We can obtain the probabilities of the two classes as follows:
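As a quick illustration of this feature space (a sketch, not the paper's code; the helper name is hypothetical), the $p(p-1)/2$ order-2 interaction columns can be built as pairwise products of the order-1 variables:

```python
from itertools import combinations

def add_pairwise_interactions(rows):
    """Append all order-2 interaction features x_j * x_k (j < k) to each
    sample; a p-dimensional sample gains p*(p-1)/2 extra columns."""
    out = []
    for x in rows:
        inter = [x[j] * x[k] for j, k in combinations(range(len(x)), 2)]
        out.append(list(x) + inter)
    return out

X = [[1.0, 2.0, 3.0],
     [0.5, -1.0, 4.0]]
Xi = add_pairwise_interactions(X)
# each sample now has 3 + 3 = 6 features
```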
$$\Pr(Y = 1 \mid \mathbf{x}_i) = \frac{1}{1 + \exp\left(-\sum_j \beta_j x_{ij} - (1/2)\sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik}\right)} = p(\mathbf{x}_i),$$
$$\Pr(Y = 2 \mid \mathbf{x}_i) = \frac{1}{1 + \exp\left(\sum_j \beta_j x_{ij} + (1/2)\sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik}\right)} = 1 - p(\mathbf{x}_i). \quad (3)$$
The maximum likelihood estimation is used to estimate the unknown model parameters, which make the likelihood function of the $N$ independent observations the largest. We define
$$L(\beta, \Theta) = \prod_{i=1}^{N} p(\mathbf{x}_i)^{y_i}\left[1 - p(\mathbf{x}_i)\right]^{1 - y_i}. \quad (4)$$
Then the logarithmic likelihood function of (4) is
$$\frac{1}{N}\ln\left[L(\beta, \Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(\mathbf{x}_i^T\beta + \frac{1}{2}\mathbf{x}_i^T\Theta\mathbf{x}_i\right) + \ln\left(1 - p(\mathbf{x}_i)\right)\right]. \quad (5)$$
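The average log-likelihood (5) can be evaluated directly. A minimal sketch, under the simplifying assumptions that labels are $y_i \in \{0, 1\}$ and the intercept is folded into $\beta$ (the helper name is hypothetical):

```python
import math

def avg_log_likelihood(X, y, beta, Theta):
    """Average log-likelihood (5): (1/N) * sum_i [y_i*eta_i + ln(1 - p_i)],
    where eta_i = x_i'beta + 0.5 * x_i'Theta x_i and p_i = 1/(1+exp(-eta_i))."""
    total = 0.0
    for x, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x))
        eta += 0.5 * sum(Theta[j][k] * x[j] * x[k]
                         for j in range(len(x)) for k in range(len(x)))
        p = 1.0 / (1.0 + math.exp(-eta))
        total += yi * eta + math.log(1.0 - p)
    return total / len(X)

# with beta = 0 and Theta = 0, every p_i = 0.5, so the value is -ln(2)
ll = avg_log_likelihood([[1.0, 2.0], [3.0, -1.0]], [1, 0],
                        [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
```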
We use the second-order Taylor expansion at the current estimated value $(\tilde\beta, \tilde\Theta)$ for (5) and obtain the subproblem as follows:
$$l_Q(\beta, \Theta) = \sum_{i=1}^{N}\left\{-\frac{1}{2}\,\omega_i\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)^2 + \frac{\left(y_i - p(\mathbf{x}_i)\right)^2}{2\,p^2(\mathbf{x}_i)\left[1 - p(\mathbf{x}_i)\right]^2} + \left[y_i\left(\sum_j \beta_j x_{ij} + \sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right) + \ln\left(1 - p(\mathbf{x}_i)\right)\right]\right\}, \quad (6)$$
where $\omega_i = p(\mathbf{x}_i)\left[1 - p(\mathbf{x}_i)\right]$ and $z_i = \sum_j \beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik} + \dfrac{y_i - p(\mathbf{x}_i)}{p(\mathbf{x}_i)\left[1 - p(\mathbf{x}_i)\right]}$.
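For one observation, the weight and working response above can be computed directly; a minimal sketch (the function name is hypothetical):

```python
import math

def working_quantities(eta_i, y_i):
    """Quadratic-subproblem quantities from (6): the weight
    w_i = p_i(1 - p_i) and the working response
    z_i = eta_i + (y_i - p_i) / (p_i(1 - p_i)),
    where eta_i is the current linear-plus-interaction predictor."""
    p = 1.0 / (1.0 + math.exp(-eta_i))
    w = p * (1.0 - p)
    z = eta_i + (y_i - p) / w
    return w, z

w, z = working_quantities(0.0, 1)  # p_i = 0.5 here: w = 0.25, z = 2.0
```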
The proof that (5) implies (6) is presented in Appendix A.
In order to obtain a sparse solution for the main variable coefficients and the interaction coefficients, a penalty function is used to enhance the stability of the interactive model:
$$(\hat\beta, \hat\Theta) = \arg\min_{\beta \in R^{p+1},\, \Theta \in R^{p\times p}} -l_Q(\beta, \Theta) + \lambda_1\|\beta\|_1 + \lambda_2\|\Theta\|_1. \quad (7)$$
We focus on those interactions that have large main variable coefficients. Such restrictions are known as "hierarchy". The mathematical expression for them is $\hat\Theta_{jk} \neq 0 \Rightarrow \hat\beta_j \neq 0$ or $\hat\beta_k \neq 0$. So we add the constraints enforcing the hierarchy into (7) as follows:
$$(\hat\beta, \hat\Theta) = \min_{\beta \in R^{p+1},\, \Theta \in R^{p\times p}} -l_Q(\beta, \Theta) + \lambda_1\|\beta\|_1 + \lambda_2\|\Theta\|_1$$
$$\text{s.t. } \|\Theta_j\|_1 \le |\beta_j| \quad \text{for } j = 1, \ldots, p, \quad (8)$$
where $\Theta_j$ is the $j$th column of $\Theta$. If $\Theta_{jk} \neq 0$, then $\|\Theta_j\|_1 > 0$ and $\|\Theta_k\|_1 > 0$, so $\beta_j \neq 0$ and $\beta_k \neq 0$. The new constraint guarantees the hierarchy, but we cannot obtain a convex solution because (8) is not convex. So instead of $\beta$ we use $\beta^+, \beta^-$. The corresponding convex relaxation of (8) is as follows:
$$(\hat\beta^{\pm}, \hat\Theta) = \min_{\beta_0 \in R,\; \beta^{\pm} \in R^p,\; \Theta \in R^{p\times p}} -l_Q(\beta^+ - \beta^-, \Theta) + \lambda_1\mathbf{1}^T(\beta^+ + \beta^-) + \lambda_2\sum_j \|\Theta_j\|_1$$
$$\text{s.t. } \|\Theta_j\|_1 \le \beta_j^+ + \beta_j^-, \quad \beta_j^+ \ge 0, \; \beta_j^- \ge 0, \quad \text{for } j = 1, \ldots, p, \quad (9)$$
where $\beta = \beta^+ - \beta^-$, $\beta^{\pm} = \max(\pm\beta, 0)$, $\beta^+, \beta^- \in R^{p+1}$, and $\|\beta\|_1 = \mathbf{1}^T(\beta^+ + \beta^-)$.
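The positive/negative split used in this relaxation is easy to check numerically (a sketch; the function name is hypothetical):

```python
def split_signs(beta):
    """Elementwise beta^+ = max(beta, 0) and beta^- = max(-beta, 0),
    so beta = beta^+ - beta^- and |beta| = beta^+ + beta^-."""
    bp = [max(b, 0.0) for b in beta]
    bm = [max(-b, 0.0) for b in beta]
    return bp, bm

bp, bm = split_signs([1.5, -2.0, 0.0])
```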
4. Coordinate Descent Algorithm and KKT Conditions
The basic idea of the coordinate descent algorithm is to convert a multivariate problem into multiple single-variable subproblems. It optimizes only one variable at a time, and the solution can be updated in cycles. We solve (9) using the coordinate descent algorithm.
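The cyclic single-coordinate idea can be illustrated on a generic smooth quadratic, where each univariate subproblem has a closed-form minimizer (a toy sketch assuming a symmetric positive definite $A$, separate from the paper's penalized objective):

```python
def coordinate_descent(A, b, iters=100):
    """Minimize f(x) = 0.5*x'Ax - b'x (A symmetric positive definite)
    by exactly minimizing over one coordinate at a time, in cycles:
    x_j <- (b_j - sum_{k != j} A_jk * x_k) / A_jj."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for j in range(n):
            r = b[j] - sum(A[j][k] * x[k] for k in range(n) if k != j)
            x[j] = r / A[j][j]
    return x

# converges to the solution of A x = b, here x = [1, 1]
x = coordinate_descent([[2.0, 1.0], [1.0, 2.0]], [3.0, 3.0])
```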
The Lagrange function corresponding to (9) is as follows
$$L(\beta^+, \beta^-, \Theta) = -l_Q(\beta^+ - \beta^-, \Theta) + (\lambda_1\mathbf{1} - \alpha - \gamma^+)^T\beta^+ + (\lambda_1\mathbf{1} - \alpha - \gamma^-)^T\beta^- + \left\langle \mathrm{diag}(\lambda_2\mathbf{1} + \alpha)\,U, \Theta \right\rangle, \quad (10)$$
where
$$U_{jk} = \begin{cases} \in [-1, 1], & \Theta_{jk} = 0, \\ \mathrm{sign}(\Theta_{jk}), & \Theta_{jk} \neq 0, \end{cases} \quad (11)$$
and $\alpha$ and $\gamma^{\pm}$ are the dual variables corresponding to the hierarchical constraint and the nonnegative constraints. Formula (10) can be decomposed into $p$ subproblems:
$$L(\beta_j^+, \beta_j^-, \Theta_j) = -l_Q(\beta_j^+ - \beta_j^-, \Theta_j) + (\lambda_1 - \alpha_j - \gamma^+)^T\beta_j^+ + (\lambda_1 - \alpha_j - \gamma^-)^T\beta_j^- + \left\langle (\lambda_2 + \alpha_j)\,U_j, \Theta_j \right\rangle. \quad (12)$$
The solution of (12), as a convex problem, can be obtained from a set of optimality conditions known as the KKT (Karush-Kuhn-Tucker) conditions. This is the key advantage of our approach.
The stationarity conditions of (12) according to KKT are $\partial L/\partial\beta_j^{\pm} = 0$ and $\partial L/\partial\Theta_{jk} = 0$. The complementary conditions are $\|\Theta_j\|_1 \le \beta_j^+ + \beta_j^-$, $\gamma_j^{\pm}\beta_j^{\pm} = 0$, $\beta^{\pm} \ge 0$, $\gamma^{\pm} \ge 0$, $\alpha \ge 0$, and $\alpha_j\left(\|\Theta_j\|_1 - \beta_j^+ - \beta_j^-\right) = 0$. We assume that $(1/N)\sum_{i=1}^{N} x_{ij}^2 = 1$. For our problem, the KKT conditions can be written as follows:
$$\beta_j^+ - \beta_j^- = S\left[x_{ij}(z_i - A_i) + \beta_j^+ - \beta_j^-,\; \frac{\lambda_1 + \alpha_j}{\omega_i}\right],$$
$$\Theta_{jk} = \frac{S\left[(x_{ij}x_{ik})^T(z_i - A_i + \Theta_{jk}x_{ij}x_{ik}),\; 2(\lambda_2 + \alpha_j)/\omega_i\right]}{\left\|x_{ij}x_{ik}\right\|^2}, \quad (13)$$
where $A_i = \sum_j \beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}$; $z_i$ and $\omega_i$ are from (6); and $S$ denotes the soft-threshold operator defined by $S(c, \lambda) = \mathrm{sign}(c)(|c| - \lambda)_+$.
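A minimal sketch of the soft-threshold operator $S$ used in (13):

```python
def soft_threshold(c, lam):
    """S(c, lambda) = sign(c) * (|c| - lambda)_+ :
    shrink c toward 0 by lambda, clipping at 0."""
    mag = abs(c) - lam
    if mag <= 0.0:
        return 0.0
    return mag if c > 0 else -mag

# S(3, 1) = 2, S(-3, 1) = -2, S(0.5, 1) = 0
```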
Table 1: Datasets information.

  #  Datasets                 Samples  Variable dimension  Interactive dimension  Class
  1  Breast-cancer-Wisconsin  683      9                   36                     2
  2  Ionosphere               351      33                  528                    2
  3  Liver disorders          345      6                   15                     2
  4  Sonar                    208      60                  1770                   2
Table 2: The experimental results of our method.

  Datasets                 Coefficients  Error rate (%)  SD    Time (s)  λ
  Breast-cancer-Wisconsin  22            3.0             0.01  9.49      3.14
  Ionosphere               101           28.0            0.02  124.63    2.82
  Liver disorders          25            26.0            0.02  12.11     2.37
  Sonar                    117           14.0            0.02  150.69    0.73
The proof of both expressions in (13) can be found in Appendix B.
Now we define $f(\alpha_j) = \|\Theta_j\|_1 - \beta_j^+ - \beta_j^-$. Then
$$f(\alpha_j) = \left\|\sum_{k=1}^{p} S\left[(x_{ij}x_{ik})^T(z_i - A_i + \Theta_{jk}x_{ij}x_{ik}),\; \frac{2(\lambda_2 + \alpha_j)}{\omega_i}\right]\cdot\left(\left\|x_{ij}x_{ik}\right\|_2^2\right)^{-1}\right\|_1 - \left\|S\left[x_{ij}(z_i - A_i) + \beta_j^+ - \beta_j^-,\; \frac{\lambda_1 + \alpha_j}{\omega_i}\right]\right\|_1. \quad (14)$$

The remaining KKT conditions only involve $\alpha$: $f(\alpha) = 0$, $f(\alpha) \le 0$, $\alpha \ge 0$. Observing that $f$ is nonincreasing with respect to $\alpha$ and is piecewise linear, it is easy to get the solution for $\alpha$.
In conclusion, the overall idea of the coordinate descent algorithm is that the minimization of (9) is equivalent to the minimization of (10). Formula (10) can be decomposed into $p$ independent subproblems of the form (12), and each subproblem (12) can be solved as (13). The final coefficient optimization iteration formulas are as follows:
$$\beta_j^{+(m+1)} - \beta_j^{-(m+1)} = S\left[x_{ij}(z_i^m - A_i^m) + \beta_j^{+(m)} - \beta_j^{-(m)},\; \frac{\lambda_1 + \alpha_j^m}{\omega_i^m}\right],$$
$$\Theta_{jk}^{m+1} = \frac{S\left[(x_{ij}x_{ik})(z_i^m - A_i^m + \Theta_{jk}^m x_{ij}x_{ik}),\; 2(\lambda_2 + \alpha_j^m)/\omega_i^m\right]}{\left\|x_{ij}x_{ik}\right\|^2}, \quad (15)$$
where $\beta_j^{+(m)} - \beta_j^{-(m)}$ is the estimated value of the $j$th main variable coefficient after $m$ iterations and $\Theta_{jk}^m$ is the estimate of the interaction coefficient between the $j$th variable and the $k$th variable after $m$ iterations.
5. The Experimental Results and Analysis
5.1. The Experimental Results and Analysis of Four UCI Datasets. Four datasets from the UCI machine learning database are used: the breast-cancer-Wisconsin, Ionosphere, Liver disorders, and Sonar datasets, as shown in Table 1.
We perform the 10-fold cross-validation (10-CV) experiments 20 times using R, where $\lambda_1 = 2\lambda_2 = \lambda$. Besides, we complete an experiment employing the interactive hierarchical lasso logistic regression method. The results include the number of nonzero variable coefficients, the average error rate of the 10-CV, the standard deviation (SD), the CPU time, and the estimated value of lambda ($\lambda$). The results are shown in Table 2.
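The 10-CV protocol amounts to index splitting; a minimal sketch (hypothetical helper; stride-based fold assignment is one of several common choices):

```python
def kfold_splits(n, k=10):
    """Return k (train_indices, test_indices) pairs covering range(n),
    assigning sample i to fold i mod k."""
    folds = [list(range(f, n, k)) for f in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        splits.append((train, test))
    return splits

splits = kfold_splits(100, 10)  # 10 folds of 10 test samples each
```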
The results of the 10-CV on the four datasets using the proposed method are presented in Figures 2 to 5. In the figures, the horizontal axis represents the logarithmic value of $\lambda$ and the vertical axis is the error rate of the 10-CV. Besides, the horizontal axis at the top of each figure represents the number of nonzero variable coefficients corresponding to the $\lambda$ value.
The results for the breast-cancer-Wisconsin datasets are shown in Figure 2. The minimum error rate is 0.03 and the number of selected variables is more than 11. The results for the Ionosphere datasets are shown in Figure 3. When the number of selected variables is 101, the lowest error rate is 0.28, with the smaller standard deviation. The number of selected variables is larger than the original dimension, so the interactions provide classification information. The results for the Liver disorders datasets are presented in Figure 4. If the number of selected variables is 25, the lowest error rate can reach 0.26, while the standard deviation is 0.02. Finally, the results for the Sonar datasets are presented in Figure 5. When more than 80 variables are selected, the minimum error rate is 0.14.
Figure 2: The results of breast-cancer-Wisconsin (cross-validation error versus log(λ), with the number of features on the top axis).
Figure 3: The results of Ionosphere (cross-validation error versus log(λ), with the number of features on the top axis).
In what follows we compare our method to the existing literature [13]. The classification results and training time of our method are better than those shown in the literature [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods, with 10-fold cross-validation repeated 20 times on the four UCI datasets, are listed, respectively, in Tables 3, 4, and 5. The conventional pattern recognition methods include support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighborhood (K-NN), and decision tree (DT) methods. The lasso is a method that considers the main variables without the interaction variables. The all-pair lasso is a method that considers the main variables and interaction variables but without the hierarchy. The experimental
Figure 4: The results of Liver disorders (cross-validation error versus log(λ), with the number of features on the top axis).
Figure 5: The results of Sonar (cross-validation error versus log(λ), with the number of features on the top axis).
Table 3: The experimental results of the lasso penalized logistic regression model.

  Datasets                 Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  3.22            0.0001  0.3744
  Ionosphere               31.15           0.0075  0.4863
  Liver disorders          30.77           0.0077  0.0908
  Sonar                    22.13           0.0121  15.6806
results show that our model achieves better classification results and is more stable. This highlights the advantage of the variable interactions and hierarchy.
5.2. The Experimental Results for High-Dimensional Small Sample Data. The Madelon datasets from NIPS2003 were
Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

  Datasets                 Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  2.93            0.0001  12.1213
  Ionosphere               27.00           0.0089  3.1734
  Liver disorders          26.36           0.0058  4.9190
  Sonar                    20.63           0.0115  4.1041
used to evaluate our method. The sample numbers of the training, validation, and testing sets were, respectively, 2000, 600, and 1800. The class number is 2. The variable dimension is 500, so the interactive dimension is 124750. More information about the datasets can be found at http://www.nipsfsc.ecs.soton.ac.uk, where you can also download the datasets and see the results of the challenges, the balanced error rates, and the area under the curve. The model is trained using the training set, and the model parameters are selected using the validation set. The prediction results of the final model on the test set are uploaded online, and the classification score of the final model is obtained. Our results are shown in Table 6. They show that our method is slightly better than the lasso and the all-pair lasso. This implies that the interactions may also be important in the Madelon datasets.
5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data of smartphones [10]. They used the support vector machine (SVM) method to solve the classification problem of daily life activity recognition. These results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. 30 volunteers aged 19-48 years participated in the study. Each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, experiments were conducted using video recording. The smartphone used in the experiments had a built-in accelerometer and gyroscope for measuring 3D linear acceleration and angular acceleration. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.
We use these datasets to evaluate our method. We use the upstairs and downstairs movements as two active classes. The training sets have 986 samples and 1073 samples, respectively. The test sets have 420 samples and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features from the sensor signals.
The experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods since it takes variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.
5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are $n = 200$, $p = 20$. We take interactions into consideration and provide the following three kinds of simulation based on formula (1).

(1) The real model is hierarchical: $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ or $\beta_k \neq 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$ and 20 nonzero elements in $\Theta$.

(2) The real model only includes interaction variables: $\beta_j = 0$, $j = 1, \ldots, p$. There are 20 nonzero elements in $\Theta$.

(3) The real model only includes main variables: $\Theta_{jk} = 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$.
The SNR of the main variables is 15, and the SNR of the interaction variables is 1. The results of 100 runs of the experiment are shown in Figure 6.
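One plausible way to generate data for simulation case (1) is sketched below (assumptions: standard normal design, interactions drawn among pairs of the 10 active main effects so the model is hierarchical by construction; the paper's exact SNR scaling is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))

# 10 nonzero main effects
beta = np.zeros(p)
beta[:10] = rng.standard_normal(10)

# 20 nonzero interactions chosen among pairs of active main effects,
# so Theta_jk != 0 implies beta_j != 0 and beta_k != 0 (hierarchy)
Theta = np.zeros((p, p))
pairs = [(j, k) for j in range(10) for k in range(j + 1, 10)]
for t in rng.choice(len(pairs), size=20, replace=False):
    j, k = pairs[t]
    Theta[j, k] = Theta[k, j] = rng.standard_normal()

# binary response drawn from the logistic model (1)
eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
```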
When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variances, the interactive lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason for this result is that when our method fits the model, the interaction variables are considered to be main variables. When the real model only includes main variables, the lasso is the best, our method still takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).
We believe that many actual classification problems could be hierarchical and interactive. They contain both main variables and interaction variables. Our method fits this kind of situation.
6. Conclusion
Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and true daily life activity identification datasets. The results show that interaction widely exists in classification models. They also demonstrate that the variable interaction contributes to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method or the alternating direction multiplier method; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net method or the hierarchical group lasso method. The application of multisensor interaction in the daily life activities of the elderly is a new way of using our method.
Table 5: The experimental results of the traditional pattern recognition methods.

  Datasets                 Classifier  Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin  SVM         3.42            0.0019  1.3092
                           LDA         3.96            0.0007  0.2429
                           QDA         4.92            0.0014  0.2309
                           K-NN        4.61            0.0030  0.6636
                           DT          4.99            0.0041  1.6201
  Ionosphere               SVM         35.83           0.0063  2.9149
                           LDA         33.10           0.0090  0.6168
                           QDA         32.01           0.0076  0.4516
                           K-NN        28.30           0.0090  0.6363
                           DT          38.23           0.0408  3.0865
  Liver disorders          SVM         30.80           0.0100  1.1401
                           LDA         31.31           0.0089  0.2474
                           QDA         40.19           0.0119  0.1958
                           K-NN        36.89           0.0124  0.6089
                           DT          36.94           0.0187  1.9921
  Sonar                    SVM         25.70           0.0197  1.2181
                           LDA         25.13           0.0145  0.5897
                           QDA         23.80           0.0205  0.4594
                           K-NN        13.09           0.0101  0.5345
                           DT          35.91           0.0303  0.7153
Table 6: The experimental results for the Madelon datasets.

  Methods         Balanced error rate (%)        Area under the curve (%)
                  Training  Validation  Testing  Training  Validation  Testing
  Lasso           38.08     37.21       36.61    64.78     62.79       63.40
  All-pair lasso  34.98     37.01       36.57    65.02     62.99       63.43
  Our method      30.00     36.35       35.12    70.00     68.08       67.76
Appendices
A. Proofs from (5) to (6)
For notational convenience, we write $p_i$ instead of $p(x_i)$. The logarithmic likelihood function of (4) is as follows:
$$L_Q = \frac{1}{N}\ln\left[L(\beta, \Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right) + \ln(1 - p_i)\right]. \quad (A.1)$$
First we give the first- and second-order partial derivatives and the mixed partial derivative of (A.1) with respect to $\beta$ and $\Theta$:
$$\frac{\partial L_Q}{\partial\beta} = \frac{\partial\,(1/N)\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + (1/2)\,x_i^T\Theta x_i\right) + \ln(1 - p_i)\right]}{\partial\beta}$$
$$= \frac{1}{N}\sum_{i=1}^{N}\left[\left(x_i^T y_i\right)^T - \frac{\exp\left(x_i^T\beta + (1/2)\,x_i^T\Theta x_i\right)}{1 + \exp\left(x_i^T\beta + (1/2)\,x_i^T\Theta x_i\right)}\cdot x_i\right]$$
$$= \frac{1}{N}\sum_{i=1}^{N}\left(y_i^T x_i - p_i^T x_i\right) = \frac{1}{N}\sum_{i=1}^{N} x_i\left(y_i - p_i\right),$$

$$\frac{\partial^2 L_Q}{\partial\beta^2} = \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N} x_i^T\left(y_i - p_i\right) = -\frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\partial p_i}{\partial\beta}$$
$$= \frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}{\left[1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)\right]^2}\cdot\left(-x_i\right)$$
$$= -\frac{1}{N}\sum_{i=1}^{N} x_i\,\frac{\exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}{1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}\cdot\frac{1}{1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}\cdot x_i^T$$
$$= -\frac{1}{N}\sum_{i=1}^{N}\left(1 - p_i\right)\cdot p_i\cdot x_i^T\cdot x_i,$$

$$\frac{\partial L_Q}{\partial\Theta}$$
Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

  Methods         Error rate in training (%)  Training time (s)  Error rate in testing (%)  Testing time (s)
  Lasso           0.00                        0.2649             1.46                       0.0312
  All-pair lasso  0.00                        0.8807             1.35                       0.0953
  Our method      0.00                        0.6012             1.12                       0.0556
  SVM             0.00                        0.9516             1.23                       0.0936
  1-NN            0.00                        0.0000             9.20                       0.7956
  3-NN            0.00                        0.0000             8.98                       0.7488
  QDA             6.61                        1.4196             4.04                       0.2184
  DT              0.00                        0.7332             14.93                      0.1092
Figure 6: The error rate of the three lasso methods (lasso, our method, all-pair lasso) in the simulation experiment (the Bayes error is shown by the purple dotted line in the graph). (a) Hierarchical interaction; (b) interaction variances only; (c) main variances only.
$$= \frac{\partial\,(1/N)\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + (1/2)\,x_i^T\Theta x_i\right) + \ln(1 - p_i)\right]}{\partial\Theta}$$
$$= \frac{1}{N}\sum_{i=1}^{N}\left[\left(\frac{1}{2}y_i x_i^T x_i\right)^T - \frac{\exp\left(x_i^T\beta + (1/2)\,x_i^T\Theta x_i\right)}{1 + \exp\left(x_i^T\beta + (1/2)\,x_i^T\Theta x_i\right)}\cdot\frac{1}{2}\cdot x_i^T x_i\right]$$
$$= \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}y_i x_i^T x_i - \frac{1}{2}p_i x_i^T x_i\right) = \frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\left(y_i - p_i\right),$$

$$\frac{\partial^2 L_Q}{\partial\Theta^2} = \frac{\partial}{\partial\Theta}\,\frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\left(y_i - p_i\right) = -\frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\,\frac{\partial p_i}{\partial\Theta}$$
$$= \frac{1}{2N}\sum_{i=1}^{N}\frac{\exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}{\left[1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)\right]^2}\cdot\left(-\frac{1}{2}x_i^T x_i\right)\cdot x_i^T x_i$$
$$= -\frac{1}{4N}\sum_{i=1}^{N}\frac{\exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}{1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}\cdot\frac{1}{1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}\cdot\left\|x_i\right\|^2$$
$$= -\frac{1}{4N}\sum_{i=1}^{N}\left(1 - p_i\right)\cdot p_i\cdot\left\|x_i\right\|^2,$$

$$\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta} = \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N} x_i^T\left(y_i - p_i\right) = -\frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\partial p_i}{\partial\Theta}$$
$$= \frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}{\left[1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)\right]^2}\cdot\left(-\frac{1}{2}x_i^T x_i\right)$$
$$= -\frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}{1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}\cdot\frac{1}{1 + \exp\left(-x_i^T\beta - (1/2)\,x_i^T\Theta x_i\right)}\cdot\frac{1}{2}x_i^T x_i$$
$$= -\frac{1}{2N}\sum_{i=1}^{N}\left(1 - p_i\right)\cdot p_i\cdot x_i^T\cdot x_i\cdot x_i^T. \quad (A.2)$$
Then (A1) is expanded by using a Taylor series with respect to the expansion point (β̃, Θ̃):
\[
\begin{aligned}
l_Q\left(\beta,\Theta\right)
={}& L_Q + \left(\beta-\tilde{\beta}\right)\cdot\frac{\partial L_Q}{\partial\beta}
 + \left(\Theta-\tilde{\Theta}\right)\cdot\frac{\partial L_Q}{\partial\Theta} \\
&+ \frac{1}{2}\left[\left(\beta-\tilde{\beta}\right)^2\frac{\partial^2 L_Q}{\partial\beta^2}
 + 2\left(\beta-\tilde{\beta}\right)^T\left(\Theta-\tilde{\Theta}\right)\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}
 + \left(\Theta-\tilde{\Theta}\right)^2\frac{\partial^2 L_Q}{\partial\Theta^2}\right] \\
={}& \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right]
 + \left(\beta-\tilde{\beta}\right)\cdot\frac{1}{N}\sum_{i=1}^{N}x_i\left(y_i-p_i\right) \\
&+ \left(\Theta-\tilde{\Theta}\right)\cdot\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\left(y_i-p_i\right)
 - \frac{1}{2}\left(\beta-\tilde{\beta}\right)^2\cdot\frac{1}{N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\,x_i^T x_i \\
&- \left(\beta-\tilde{\beta}\right)^T\left(\Theta-\tilde{\Theta}\right)\cdot\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\,x_i^T x_i x_i^T
 - \frac{1}{2}\left(\Theta-\tilde{\Theta}\right)^2\cdot\frac{1}{4N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left\|x_i\right\|^2 \\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left\{\left[x_i^T\left(\beta-\tilde{\beta}\right)\right]^2
 + \left[x_i^T\left(\beta-\tilde{\beta}\right)\right]\left[x_i^T\left(\Theta-\tilde{\Theta}\right)x_i\right]
 + \frac{1}{4}\left[x_i^T\left(\Theta-\tilde{\Theta}\right)x_i\right]^2\right\} \\
&+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right]
 + \frac{1}{N}\sum_{i=1}^{N}x_i^T\left(\beta-\tilde{\beta}\right)\left(y_i-p_i\right) \\
&+ \frac{1}{2N}\sum_{i=1}^{N}x_i^T\left(\Theta-\tilde{\Theta}\right)x_i\left(y_i-p_i\right) \\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left[x_i^T\left(\beta-\tilde{\beta}\right)
 + \frac{1}{2}x_i^T\left(\Theta-\tilde{\Theta}\right)x_i\right]^2
 + \frac{1}{N}\sum_{i=1}^{N}x_i^T\left(\beta-\tilde{\beta}\right)\left(y_i-p_i\right) \\
&+ \frac{1}{2N}\sum_{i=1}^{N}x_i^T\left(\Theta-\tilde{\Theta}\right)x_i\left(y_i-p_i\right)
 + \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right] \\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left[x_i^T\left(\beta-\tilde{\beta}\right)
 + \frac{1}{2}x_i^T\left(\Theta-\tilde{\Theta}\right)x_i
 - \frac{y_i-p_i}{p_i\left(1-p_i\right)}\right]^2 \\
&+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right]
 + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)} \\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)^2 \\
&+ \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\right)+\ln\left(1-p_i\right)\right]
 + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)},
\end{aligned}
\]
(A3)
where
\[
\omega_i = p\left(x_i\right)\left[1-p\left(x_i\right)\right],\qquad
z_i = \sum_{j}\tilde{\beta}_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde{\Theta}_{jk}x_{ij}x_{ik}
 + \frac{y_i-p\left(x_i\right)}{p\left(x_i\right)\left[1-p\left(x_i\right)\right]}.
\]
(A4)
B. Proofs from (12) to (13)

First, there are three cases in calculating β⁺ⱼ − β⁻ⱼ.

(1) β⁺ⱼ > 0, β⁻ⱼ = 0:
\[
\beta_j^{+}-\beta_j^{-} = x_j^T\left(z_i-A_i\right) + \beta_j^{+}-\beta_j^{-} - \frac{\lambda_1+\hat{\alpha}_j}{\omega_i}.
\]
(B1)

(2) β⁺ⱼ = 0, β⁻ⱼ > 0:
\[
\beta_j^{+}-\beta_j^{-} = x_j^T\left(z_i-A_i\right) + \beta_j^{+}-\beta_j^{-} + \frac{\lambda_1+\hat{\alpha}_j}{\omega_i}.
\]
(B2)

(3) β⁺ⱼ = β⁻ⱼ = 0:
\[
\beta_j^{+}-\beta_j^{-} = x_j^T\left(z_i-A_i\right) + \beta_j^{+}-\beta_j^{-}.
\]
(B3)

We derive
\[
\beta_j^{+}-\beta_j^{-} = S\left[x_{ij}\left(z_i-A_i\right)+\beta_j^{+}-\beta_j^{-},\ \frac{\lambda_1+\hat{\alpha}_j}{\omega_i}\right].
\]
(B4)
Secondly, by the stationarity condition ∂L/∂Θⱼₖ = 0 we give the following:
\[
\begin{aligned}
&\frac{1}{2}\omega_i\cdot 2\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(-\frac{1}{2}x_{ij}x_{ik}\right)+\left(\lambda_2+\hat{\alpha}_j\right)U_{jk}=0, \\
&\omega_i\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\frac{1}{2}x_{ij}x_{ik}=\left(\lambda_2+\hat{\alpha}_j\right)U_{jk}, \\
&\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\frac{1}{2}x_{ij}x_{ik}=\frac{\left(\lambda_2+\hat{\alpha}_j\right)U_{jk}}{\omega_i}.
\end{aligned}
\]
(B5)

Supposing
\[
\gamma^{(-jk)} = z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}+\Theta_{jk}x_{ij}x_{ik},
\]
(B6)

we have
\[
\begin{aligned}
\left(\gamma^{(-jk)}-\Theta_{jk}x_{ij}x_{ik}\right)\cdot x_{ij}x_{ik} &= \frac{2\left(\lambda_2+\hat{\alpha}_j\right)U_{jk}}{\omega_i}, \\
\Theta_{jk}\,x_{ij}x_{ik}\cdot x_{ij}x_{ik} &= -\frac{2\left(\lambda_2+\hat{\alpha}_j\right)U_{jk}}{\omega_i}+\gamma^{(-jk)}\cdot x_{ij}x_{ik}, \\
\Theta_{jk} &= \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\hat{\alpha}_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\end{aligned}
\]
(B7)

We discuss three cases for the value of Θⱼₖ.

(1) Θⱼₖ > 0, Uⱼₖ = 1:
\[
\Theta_{jk}=\frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\hat{\alpha}_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\]
(B8)

(2) Θⱼₖ < 0, Uⱼₖ = −1:
\[
\Theta_{jk}=\frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik}+2\left(\lambda_2+\hat{\alpha}_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\]
(B9)

(3) Θⱼₖ = 0.

We derive
\[
\Theta_{jk}=\frac{S\left[\gamma^{(-jk)}\cdot x_{ij}x_{ik},\ 2\left(\lambda_2+\hat{\alpha}_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^2}.
\]
(B10)
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.
Figure 1: The diagram of the subspaces in geometric algebra: the 0-vector; the 1-vectors e₁, e₂, …, eᵢ, …, e_d; the 2-vectors e₁∧e₂, e₁∧e₃, …, eᵢ∧eⱼ, …, e_{d−1}∧e_d; the k-vectors eᵢ∧⋯∧eⱼ∧⋯∧e_k; and the d-vector e₁∧e₂∧⋯∧e_d.
algorithm using selection to choose interaction variables in high-dimensional data.
The literature [13] presents a hierarchical interactive lasso method for regression and provides a method of model coefficient estimation using the KKT conditions and the Lagrange multiplier method. Based on [13] and our past work, we propose the concept of geometric algebra interaction and a coordinate descent algorithm for the hierarchical interactive lasso penalized logistic regression. The experimental data include four datasets from the UCI machine learning repository, the Madelon dataset from NIPS2003, and a daily life activity recognition dataset. The experimental results reveal the clear advantages of the hierarchical interactive lasso method compared to the lasso and interactive (all-pair) lasso methods. The innovations include the following: (1) we use geometric algebra to explain variable interaction; (2) we derive an improved coordinate descent algorithm to solve the hierarchical interactive lasso penalized logistic regression; (3) we use the hierarchical interactive lasso for the classification problem.
2. The Variable Interaction Theory of Geometric Algebra
Definition 1. If the function f(x, y) cannot be represented as a sum of independent functions f₁(x) + f₂(y), then x and y are said to have interaction in f.
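Definition 1 can be probed numerically: for an additive function f(x, y) = f₁(x) + f₂(y), the mixed difference f(x₂, y₂) − f(x₂, y₁) − f(x₁, y₂) + f(x₁, y₁) always vanishes, while a function with interaction generally gives a nonzero value. A minimal sketch (the helper name is ours, not from the paper):

```python
# Probe Definition 1 with a mixed difference: it vanishes for any
# additive f(x, y) = f1(x) + f2(y) and is nonzero when x and y interact.
def mixed_difference(f, x1, x2, y1, y2):
    return f(x2, y2) - f(x2, y1) - f(x1, y2) + f(x1, y1)

additive = lambda x, y: 3 * x + y ** 2             # f1(x) + f2(y): no interaction
interacting = lambda x, y: 3 * x + y ** 2 + x * y  # the x*y term interacts

print(mixed_difference(additive, 0.0, 1.0, 0.0, 2.0))     # 0.0
print(mixed_difference(interacting, 0.0, 1.0, 0.0, 2.0))  # 2.0
```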
A popular explanation of Definition 1 is that if a response variable cannot be represented as a linear weighted sum of the prediction variables, it is probably because there are interactions between the variables.
Interactions between variables can be easily explained by geometric algebra theory. Figure 1 is a diagram showing all the subspaces in geometric algebra. The 1-vectors, namely, the order-1 main variables, can represent a p-dimensional subspace of the original data; that is, the p-dimensional basis of the original data is projected on the 1-vectors. The 2-vectors represent the interaction between two variables, and the simplest 2-vector coefficient is the product of two 1-vectors. In the literature [13], our proposed area feature is considered as one of the interactions; in the literature [20], our proposed orthocenter feature is considered as one of the interactions. Higher-order interactions are represented by k-vectors. In this paper, we only study the order-2 (area) interactions between 1-vectors. The method can also be extended to nonlinear complex function interactions or higher orders.
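In this setting the order-2 interaction variables are simply the pairwise products xⱼxₖ of the 1-vectors. A sketch of expanding a data matrix with all p(p − 1)/2 such products (the function name is ours); for the Sonar dataset in Table 1, p = 60 gives the listed 1770 interaction columns:

```python
import numpy as np
from itertools import combinations

def add_pairwise_interactions(X):
    """Append the p(p-1)/2 order-2 interaction columns x_j * x_k (j < k)."""
    n, p = X.shape
    pairs = list(combinations(range(p), 2))
    inter = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return np.hstack([X, inter]), pairs

X = np.arange(6.0).reshape(3, 2)      # n = 3 samples, p = 2 main variables
X_full, pairs = add_pairwise_interactions(X)
print(X_full.shape)                   # (3, 3): two main columns + one product column
print(len(list(combinations(range(60), 2))))  # 1770, as for the Sonar dataset
```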
3. The Binary Logistic Regression Based on Interaction and Hierarchy
The outcome variable in the binary logistic model is denoted by Y, and the input variables are the predictors X, where x₁, …, xⱼ, …, x_p are order-1 main variables and the pairwise products xⱼxₖ are interaction variables between order-1 main variables. The binary logistic model has the form
\[
\mathrm{logit}\left(P\left(Y=1\mid X\right)\right)=\sum_{j=0}^{p}\beta_j x_j+\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_j x_k+\varepsilon,
\]
(1)

where Θⱼⱼ = 0, the main-variable coefficients are β ∈ R^{p+1}, the interaction-variable coefficients are Θ ∈ R^{p×p}, x₀ is 1, and ε satisfies N(0, σ²).

Assume that the training samples are (x₁, y₁), …, (xᵢ, yᵢ), …, (x_N, y_N), xᵢ ∈ R^p, with
\[
y_i=\begin{cases}1, & Y_i=1,\\ 0, & Y_i=2.\end{cases}
\]
(2)
Our goal is to select a feature subset from the order-1 main variables (dimension p) and the order-2 interaction variables (dimension p(p − 1)/2) and then estimate the coefficient values of the nonzero model parameters. We can obtain the probabilities of the two classes as follows:
\[
\begin{aligned}
\Pr\left(Y=1\mid \mathbf{x}_i\right) &= \frac{1}{1+\exp\left(-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)} = p\left(\mathbf{x}_i\right), \\
\Pr\left(Y=2\mid \mathbf{x}_i\right) &= \frac{1}{1+\exp\left(\sum_{j}\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)} = 1-p\left(\mathbf{x}_i\right).
\end{aligned}
\]
(3)
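As a sketch, formula (3) can be evaluated directly; the quadratic form ½xᵀΘx with a symmetric Θ (Θⱼⱼ = 0) reproduces the pairwise sum ½Σ_{j≠k}Θⱼₖxⱼxₖ. The function and variable names here are ours, not the paper's code:

```python
import numpy as np

def p_of_x(x, beta0, beta, Theta):
    """Pr(Y = 1 | x) of the interaction logistic model, formula (3)."""
    eta = beta0 + x @ beta + 0.5 * x @ Theta @ x   # main effects + interactions
    return 1.0 / (1.0 + np.exp(-eta))

x = np.array([1.0, -1.0, 2.0])
beta = np.zeros(3)
Theta = np.zeros((3, 3))
print(p_of_x(x, 0.0, beta, Theta))   # 0.5: with zero coefficients both classes are equally likely
```

Pr(Y = 2 | x) is then obtained as 1 − p(x), matching the second line of (3).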
The maximum likelihood estimation is used to estimate the unknown model parameters, which makes the likelihood function of the N independent observations the largest. We define
\[
L\left(\beta,\Theta\right)=\prod_{i=1}^{N}p\left(\mathbf{x}_i\right)^{y_i}\left[1-p\left(\mathbf{x}_i\right)\right]^{1-y_i}.
\]
(4)
Then the logarithmic likelihood function of (4) is
\[
\frac{1}{N}\ln\left[L\left(\beta,\Theta\right)\right]=\frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(\mathbf{x}_i^T\beta+\frac{1}{2}\mathbf{x}_i^T\Theta\mathbf{x}_i\right)+\ln\left(1-p\left(\mathbf{x}_i\right)\right)\right].
\]
(5)
We use the second-order Taylor expansion at the current estimated value (β̃, Θ̃) for (5) and obtain the subproblem as follows:
\[
\begin{aligned}
l_Q\left(\beta,\Theta\right)={}&\frac{1}{2}\omega_i\left(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)^2
+\frac{\left(y_i-p\left(\mathbf{x}_i\right)\right)^2}{2\,p^2\left(\mathbf{x}_i\right)\left[1-p\left(\mathbf{x}_i\right)\right]^2} \\
&+\left[y_i\left(\sum_{j}\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)+\ln\left(1-p\left(\mathbf{x}_i\right)\right)\right],
\end{aligned}
\]
(6)
where ωᵢ = p(xᵢ)[1 − p(xᵢ)] and zᵢ = Σⱼ β̃ⱼxᵢⱼ + (1/2)Σ_{j≠k} Θ̃ⱼₖxᵢⱼxᵢₖ + (yᵢ − p(xᵢ))/(p(xᵢ)[1 − p(xᵢ)]).
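The quantities ωᵢ and zᵢ are the usual IRLS working weights and working responses, computed from the current predictor η̃ᵢ = xᵢᵀβ̃ + ½xᵢᵀΘ̃xᵢ. A small sketch (variable names are ours):

```python
import numpy as np

def working_quantities(eta, y):
    """IRLS weights omega_i = p_i(1 - p_i) and responses z_i = eta_i + (y_i - p_i)/omega_i."""
    p = 1.0 / (1.0 + np.exp(-eta))
    omega = p * (1.0 - p)
    z = eta + (y - p) / omega
    return omega, z

omega, z = working_quantities(np.array([0.0, 0.0]), np.array([1.0, 0.0]))
print(omega)   # [0.25 0.25]
print(z)       # [ 2. -2.]
```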
The proof that (5) implies (6) is presented in Appendix A.
In order to obtain a sparse solution for the main-variable coefficients and the interaction coefficients, a penalty function is used to enhance the stability of the interactive model:
\[
\left(\hat{\beta},\hat{\Theta}\right)=\arg\min_{\beta\in\mathbb{R}^{p+1},\,\Theta\in\mathbb{R}^{p\times p}}
\ -l_Q\left(\beta,\Theta\right)+\lambda_1\left\|\beta\right\|_1+\lambda_2\left\|\Theta\right\|_1.
\]
(7)
We focus on those interactions that have large main-variable coefficients. Such restrictions are known as "hierarchy." The mathematical expression for them is Θⱼₖ ≠ 0 ⇒ βⱼ ≠ 0 or βₖ ≠ 0. So we add the constraints enforcing the hierarchy into (7) as follows:
\[
\begin{aligned}
\left(\hat{\beta},\hat{\Theta}\right)=\ &\min_{\beta\in\mathbb{R}^{p+1},\,\Theta\in\mathbb{R}^{p\times p}}
-l_Q\left(\beta,\Theta\right)+\lambda_1\left\|\beta\right\|_1+\lambda_2\left\|\Theta\right\|_1 \\
&\text{s.t. } \left\|\Theta_j\right\|_1\le\left|\beta_j\right|\quad\text{for } j=1,\dots,p,
\end{aligned}
\]
(8)
where Θⱼ is the jth column of Θ. If Θⱼₖ ≠ 0, then ‖Θⱼ‖₁ > 0 and ‖Θₖ‖₁ > 0, so βⱼ ≠ 0 and βₖ ≠ 0. The new constraint guarantees the hierarchy, but we cannot obtain a convex solution because (8) is not convex. So, instead of β, we use β⁺, β⁻, and the corresponding convex relaxation of (8) is as follows:
\[
\begin{aligned}
\left(\hat{\beta}^{\pm},\hat{\Theta}\right)=\ &\min_{\beta_0\in\mathbb{R},\,\beta^{\pm}\in\mathbb{R}^{p},\,\Theta\in\mathbb{R}^{p\times p}}
-l_Q\left(\beta^{+}-\beta^{-},\Theta\right)+\lambda_1\mathbf{1}^T\left(\beta^{+}+\beta^{-}\right)+\lambda_2\sum_{j}\left\|\Theta_j\right\|_1 \\
&\text{s.t. } \left\|\Theta_j\right\|_1\le\beta_j^{+}+\beta_j^{-},\quad\beta_j^{+}\ge0,\quad\beta_j^{-}\ge0,\quad\text{for } j=1,\dots,p,
\end{aligned}
\]
(9)

where β = β⁺ − β⁻, β^± = max(±β, 0), β⁺, β⁻ ∈ R^{p+1}, and |β| = β⁺ + β⁻.
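The positive/negative split behaves exactly as stated; a one-line check that β = β⁺ − β⁻ and |β| = β⁺ + β⁻ when β^± = max(±β, 0):

```python
import numpy as np

beta = np.array([1.5, -2.0, 0.0])
beta_plus = np.maximum(beta, 0.0)     # beta^+ = max(beta, 0)
beta_minus = np.maximum(-beta, 0.0)   # beta^- = max(-beta, 0)

print((beta_plus - beta_minus).tolist())  # [1.5, -2.0, 0.0]: recovers beta
print((beta_plus + beta_minus).tolist())  # [1.5, 2.0, 0.0]: recovers |beta|
```

This split makes the l1 penalty λ₁1ᵀ(β⁺ + β⁻) linear in nonnegative variables, which is why (9) is convex while (8) is not.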
4. Coordinate Descent Algorithm and KKT Conditions
The basic idea of the coordinate descent algorithm is to convert a multivariate problem into multiple single-variable subproblems: only one variable is optimized at a time, and the solutions are updated cyclically. We solve (9) using the coordinate descent algorithm.
The Lagrange function corresponding to (9) is as follows:
\[
\begin{aligned}
L\left(\beta^{+},\beta^{-},\Theta\right)=\ &-l_Q\left(\beta^{+}-\beta^{-},\Theta\right)
+\left(\lambda_1\mathbf{1}-\hat{\alpha}-\gamma^{+}\right)^T\beta^{+}
+\left(\lambda_1\mathbf{1}-\hat{\alpha}-\gamma^{-}\right)^T\beta^{-} \\
&+\left\langle\mathrm{diag}\left(\lambda_2\mathbf{1}+\hat{\alpha}\right)U,\ \Theta\right\rangle,
\end{aligned}
\]
(10)

where
\[
U_{jk}=\begin{cases}\in\left[-1,1\right], & \Theta_{jk}=0,\\ \mathrm{sign}\left(\Theta_{jk}\right), & \Theta_{jk}\neq 0,\end{cases}
\]
(11)
and α̂ and γ^± are the dual variables corresponding to the hierarchical constraint and the nonnegative constraints, respectively. Formula (10) can be decomposed into p subproblems:
\[
\begin{aligned}
L\left(\beta_j^{+},\beta_j^{-},\Theta_j\right)=\ &-l_Q\left(\beta_j^{+}-\beta_j^{-},\Theta_j\right)
+\left(\lambda_1-\hat{\alpha}_j-\gamma^{+}\right)^T\beta_j^{+}
+\left(\lambda_1-\hat{\alpha}_j-\gamma^{-}\right)^T\beta_j^{-} \\
&+\left\langle\left(\lambda_2+\hat{\alpha}_j\right)U_j,\ \Theta_j\right\rangle.
\end{aligned}
\]
(12)
The solution of (12), as a convex problem, can be obtained from a set of optimality conditions known as the KKT (Karush-Kuhn-Tucker) conditions. This is the key advantage of our approach.
The stationarity conditions of (12) according to KKT are ∂L/∂β_j^± = 0 and ∂L/∂Θⱼₖ = 0. The complementary conditions are ‖Θⱼ‖₁ ≤ βⱼ⁺ + βⱼ⁻, γ_j^± β_j^± = 0, β^± ≥ 0, γ^± ≥ 0, α̂ ≥ 0, and α̂ⱼ(‖Θⱼ‖₁ − βⱼ⁺ − βⱼ⁻) = 0. We assume that (1/N)Σ_{i=1}^{N} x_{ij}² = 1. For our problem, the KKT conditions can be written as follows:
\[
\begin{aligned}
\beta_j^{+}-\beta_j^{-}&=S\left[x_{ij}\left(z_i-A_i\right)+\beta_j^{+}-\beta_j^{-},\ \frac{\lambda_1+\hat{\alpha}_j}{\omega_i}\right],\\
\Theta_{jk}&=\frac{S\left[\left(x_{ij}x_{ik}\right)^T\left(z_i-A_i+\Theta_{jk}x_{ij}x_{ik}\right),\ 2\left(\lambda_2+\hat{\alpha}_j\right)/\omega_i\right]}{\left\|x_{ij}x_{ik}\right\|^2},
\end{aligned}
\]
(13)

where Aᵢ = Σⱼ βⱼxᵢⱼ + (1/2)Σ_{j≠k} Θⱼₖxᵢⱼxᵢₖ, zᵢ and ωᵢ are from (6), and S denotes the soft-threshold operator defined by S(c, λ) = sign(c)(|c| − λ)₊.
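The soft-threshold operator appearing on the right-hand sides of (13) can be written as a two-line helper:

```python
import numpy as np

def soft_threshold(c, lam):
    """S(c, lambda) = sign(c) * (|c| - lambda)_+  (positive part)."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

print(soft_threshold(3.0, 1.0))   # 2.0: shrunk toward zero by lambda
print(soft_threshold(0.5, 1.0))   # 0.0: small coefficients are set exactly to zero
```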
Table 1: Datasets information.

Datasets                     Samples   Variable dimension   Interactive dimension   Class
1. Breast-cancer-Wisconsin   683       9                    36                      2
2. Ionosphere                351       33                   528                     2
3. Liver disorders           345       6                    15                      2
4. Sonar                     208       60                   1770                    2

Table 2: The experimental results of our method.

Datasets                  Coefficients   Error rate (%)   SD     Time (s)   λ
Breast-cancer-Wisconsin   22             3.0              0.01   9.49       3.14
Ionosphere                101            28.0             0.02   124.63     2.82
Liver disorders           25             26.0             0.02   12.11      2.37
Sonar                     117            14.0             0.02   150.69     0.73
The proof of both expressions in (13) can be found in Appendix B. Now we define f(α̂ⱼ) = ‖Θⱼ‖₁ − βⱼ⁺ − βⱼ⁻. Then
\[
\begin{aligned}
f\left(\hat{\alpha}_j\right)=\ &\left\|\sum_{k=1}^{p}S\left[\left(x_{ij}x_{ik}\right)^T\left(z_i-A_i+\Theta_{jk}x_{ij}x_{ik}\right),\ \frac{2\left(\lambda_2+\hat{\alpha}_j\right)}{\omega_i}\right]\cdot\left(\left\|x_{ij}x_{ik}\right\|_2^2\right)^{-1}\right\|_1 \\
&-\left\|S\left[x_{ij}\left(z_i-A_i\right)+\beta_j^{+}-\beta_j^{-},\ \frac{\lambda_1+\hat{\alpha}_j}{\omega_i}\right]\right\|_1.
\end{aligned}
\]
(14)
The remaining KKT conditions only involve α̂ⱼ: f(α̂ⱼ) ≤ 0, α̂ⱼ ≥ 0, and α̂ⱼ f(α̂ⱼ) = 0. Observing that f is nonincreasing with respect to α̂ⱼ and is piecewise linear, it is easy to get the solution for α̂ⱼ.
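Because f is nonincreasing and piecewise linear, α̂ⱼ can be found directly: complementary slackness gives α̂ⱼ = 0 whenever f(0) ≤ 0, and otherwise α̂ⱼ is the root of f, which even a plain bisection recovers. A sketch with a hypothetical stand-in for f (the real f is built from (14)):

```python
def solve_dual(f, upper=1e6, tol=1e-10):
    """Return alpha >= 0 with f(alpha) = 0 for nonincreasing f, or 0 if f(0) <= 0."""
    if f(0.0) <= 0.0:
        return 0.0          # constraint slack: complementary slackness forces alpha = 0
    lo, hi = 0.0, upper
    while hi - lo > tol:    # bisection; a direct piecewise-linear solve would also work
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical nonincreasing, piecewise-linear stand-in for f(alpha_j).
f = lambda a: 3.0 - max(a - 1.0, 0.0) - a
print(round(solve_dual(f), 6))   # 2.0
```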
In conclusion, the overall idea of the coordinate descent algorithm is that the minimization of (9) is equivalent to the minimization of (10); formula (10) can be decomposed into p independent subproblems of the form (12), and formula (12) can be solved by (13). The final coefficient optimization iteration formulas are as follows:
\[
\begin{aligned}
\beta_j^{+(m+1)}-\beta_j^{-(m+1)}&=S\left[x_{ij}\left(z_i^{m}-A_i^{m}\right)+\beta_j^{+(m)}-\beta_j^{-(m)},\ \frac{\lambda_1+\hat{\alpha}_j^{m}}{\omega_i^{m}}\right],\\
\Theta_{jk}^{m+1}&=\frac{S\left[\left(x_{ij}x_{ik}\right)\left(z_i^{m}-A_i^{m}+\Theta_{jk}^{m}x_{ij}x_{ik}\right),\ 2\left(\lambda_2+\hat{\alpha}_j^{m}\right)/\omega_i^{m}\right]}{\left\|x_{ij}x_{ik}\right\|^2},
\end{aligned}
\]
(15)

where β_j^{+(m)} − β_j^{−(m)} is the estimated value of the jth main-variable coefficient after m iterations and Θⱼₖ^m is the estimate of the interaction coefficient between the jth variable and the kth variable after m iterations.
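To make the scheme concrete, the sketch below runs the same outer/inner structure on a plain l1-penalized logistic regression with main effects only, dropping the hierarchy dual α̂ and the Θ updates to keep it short: the outer loop rebuilds the quadratic subproblem (ω, z), and the inner loop cyclically soft-thresholds one coefficient at a time in the spirit of (15). All names and simplifications are ours, not the paper's exact algorithm:

```python
import numpy as np

def soft(c, lam):
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def cd_lasso_logistic(X, y, lam, outer=30, inner=5):
    """Coordinate descent for l1-penalized logistic regression (main effects only)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(outer):
        eta = X @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(prob * (1.0 - prob), 1e-5)   # omega_i, clipped for stability
        z = eta + (y - prob) / w                    # working response z_i
        for _ in range(inner):
            for j in range(p):                      # cyclic single-coordinate updates
                r = z - X @ beta + X[:, j] * beta[j]
                beta[j] = soft(np.sum(w * X[:, j] * r), n * lam) / np.sum(w * X[:, j] ** 2)
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(200) > 0).astype(float)
print(np.round(cd_lasso_logistic(X, y, lam=0.05), 2))
# a large positive weight on feature 0, a large negative one on feature 1, the rest near zero
```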
5. The Experimental Results and Analysis
5.1. The Experimental Results and Analysis of Four UCI Datasets. Four datasets from the UCI machine learning repository are used: the Breast-cancer-Wisconsin, Ionosphere, Liver disorders, and Sonar datasets, as shown in Table 1.
We run the 10-fold cross-validation (10-CV) experiments 20 times in R, where λ₁ = 2λ₂ = λ. Besides, we complete an experiment employing the interactive hierarchical lasso logistic regression method. The results include the number of nonzero variable coefficients, the average error rate of the 10-CV, the standard deviation (SD), the CPU time, and the estimated value of lambda (λ); they are shown in Table 2.
The results of the 10-CV on the four datasets using the proposed method are presented in Figures 2 to 5. In the figures, the horizontal axis represents the logarithmic value of λ and the vertical axis is the error rate of the 10-CV; the horizontal axis at the top of each figure gives the number of nonzero variable coefficients corresponding to the λ value.
The results for the Breast-cancer-Wisconsin dataset are shown in Figure 2: the minimum error rate is 0.03 and the number of selected variables is more than 11. The results for the Ionosphere dataset are shown in Figure 3: when the number of selected variables is 101, the lowest error rate is 0.28, with a small standard deviation; the number of selected variables is larger than the original dimension, so the interactions provide classification information. The results for the Liver disorders dataset are presented in Figure 4: when the number of selected variables is 25, the lowest error rate reaches 0.26, with a standard deviation of 0.02. Finally, the results for the Sonar dataset are presented in Figure 5: when more than 80 variables are selected, the minimum error rate is 0.14.
Figure 2: The results of breast-cancer-Wisconsin (10-CV error rate versus log(λ); the top axis gives the number of selected features).
Figure 3: The results of Ionosphere (10-CV error rate versus log(λ); the top axis gives the number of selected features).
In what follows, we compare our method to the existing literature [13]. The classification results and training time of our method are better than those reported in [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods, with the 10-fold cross-validation repeated 20 times on the four UCI datasets, are listed in Tables 3, 4, and 5, respectively. The conventional pattern recognition methods include support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighborhood (K-NN), and decision tree (DT) methods. The lasso considers the main variables without the interaction variables; the all-pair lasso considers the main variables and interaction variables, but without the hierarchy. The experimental
Figure 4: The results of Liver disorders (10-CV error rate versus log(λ); the top axis gives the number of selected features).
Figure 5: The results of Sonar (10-CV error rate versus log(λ); the top axis gives the number of selected features).
Table 3: The experimental results of the lasso penalized logistic regression model.

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   3.22             0.0001   0.3744
Ionosphere                31.15            0.0075   0.4863
Liver disorders           30.77            0.0077   0.0908
Sonar                     22.13            0.0121   15.6806
results show that our model gives better classification results and is more stable. This highlights the advantage of the variable interactions and the hierarchy.
5.2. The Experimental Results for High-Dimensional Small-Sample Data. The Madelon dataset from NIPS2003 was
Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   2.93             0.0001   12.1213
Ionosphere                27.00            0.0089   3.1734
Liver disorders           26.36            0.0058   4.9190
Sonar                     20.63            0.0115   4.1041
used to evaluate our method. The sample numbers of the training, validation, and testing sets are 2000, 600, and 1800, respectively. The class number is 2 and the variable dimension is 500, so the interactive dimension is 124750. More information about the dataset can be found at http://www.nipsfsc.ecs.soton.ac.uk, where the datasets can be downloaded and the challenge results (balanced error rates and the area under the curve) can be viewed. The model is trained on the training set, the model parameters are selected on the validation set, and the prediction results of the final model on the test set are uploaded online to obtain the classification score of the final model. Our results are shown in Table 6. They show that our method is slightly better than the lasso and the all-pair lasso, which implies that the interactions may also be important in the Madelon dataset.
5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data of smartphones [10]. They used the support vector machine (SVM) method to solve the classification problem of daily-life activity recognition. These results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19-48 years participated in the study. Each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, the experiments were conducted using video recording. The smartphone used in the experiments had a built-in accelerometer and gyroscope for measuring 3D linear acceleration and angular acceleration. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.
We use these datasets to evaluate our method. We use the upstairs and downstairs movements as the two activity classes. The training sets have 986 and 1073 samples, respectively; the test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features from the sensor signals.
Experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods, since it takes variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.
5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are $n = 200$, $p = 20$. We take interactions into consideration and provide the following three kinds of simulation based on formula (1):

(1) The real model is hierarchical: $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ or $\beta_k \neq 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$ and 20 nonzero elements in $\Theta$.

(2) The real model only includes interaction variables: $\beta_j = 0$, $j = 1, \ldots, p$. There are 20 nonzero elements in $\Theta$.

(3) The real model only includes main variables: $\Theta_{jk} = 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$.
The SNR of the main variables is 1.5 and the SNR of the interaction variables is 1. The results of 100 repetitions of the experiment are shown in Figure 6.
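As an illustration of the hierarchical scenario (1) above, one way to draw data whose logit obeys the hierarchy is sketched below. The function name and the exact generating mechanism are our own assumptions, not the paper's protocol:

```python
import numpy as np

def simulate_hierarchical(n=200, p=20, n_main=10, n_pairs=10, seed=0):
    """X ~ N(0, I); the logit is x^T beta + 0.5 x^T Theta x, where every
    nonzero Theta_jk lies on coordinates with nonzero beta, so the hierarchy
    Theta_jk != 0 => beta_j != 0 or beta_k != 0 holds by construction."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    support = rng.choice(p, n_main, replace=False)
    beta[support] = rng.standard_normal(n_main)
    Theta = np.zeros((p, p))
    for _ in range(n_pairs):                          # 10 symmetric pairs give
        j, k = rng.choice(support, 2, replace=False)  # up to 20 nonzero entries
        Theta[j, k] = Theta[k, j] = rng.standard_normal()
    # per-sample quadratic form x_i^T Theta x_i via einsum
    logit = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return X, y, beta, Theta
```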
When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the interactive lasso is the best, our method takes second place, and the lasso is still the worst, as shown in Figure 6(b). The reason for this result is that when our method fits the model, the interaction variables are considered through the main variables. When the real model only includes main variables, the lasso is the best, our method again takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).
We believe that many actual classification problems could be hierarchical and interactive: they contain both main variables and interaction variables. Our method fits this kind of situation.
6. Conclusion
Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily-life activity identification datasets. The results show that interaction widely exists in classification models, and they demonstrate that the variable interaction contributes to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method or the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net method or the hierarchical group lasso method. The application of multisensor interaction to the daily-life activities of the elderly is a new way of using our method.
Table 5: The experimental results of the traditional pattern recognition methods.

  Datasets                  Classifier  Error rate (%)  SD      Time (s)
  Breast-cancer-Wisconsin   SVM         3.42            0.0019  1.3092
                            LDA         3.96            0.0007  0.2429
                            QDA         4.92            0.0014  0.2309
                            K-NN        4.61            0.0030  0.6636
                            DT          4.99            0.0041  1.6201
  Ionosphere                SVM         35.83           0.0063  2.9149
                            LDA         33.10           0.0090  0.6168
                            QDA         32.01           0.0076  0.4516
                            K-NN        28.30           0.0090  0.6363
                            DT          38.23           0.0408  3.0865
  Liver disorders           SVM         30.80           0.0100  1.1401
                            LDA         31.31           0.0089  0.2474
                            QDA         40.19           0.0119  0.1958
                            K-NN        36.89           0.0124  0.6089
                            DT          36.94           0.0187  1.9921
  Sonar                     SVM         25.70           0.0197  1.2181
                            LDA         25.13           0.0145  0.5897
                            QDA         23.80           0.0205  0.4594
                            K-NN        13.09           0.0101  0.5345
                            DT          35.91           0.0303  0.7153
Table 6: The experimental results for the Madelon datasets.

  Methods          Balance error rate (%)           Area under the curve (%)
                   Training  Validation  Testing    Training  Validation  Testing
  Lasso            38.08     37.21       36.61      64.78     62.79       63.40
  All-pair lasso   34.98     37.01       36.57      65.02     62.99       63.43
  Our method       30.00     36.35       35.12      70.00     68.08       67.76
Appendices
A. Proofs from (5) to (6)

For notational convenience we write $p_i$ instead of $p(x_i)$. The logarithmic likelihood function of (4) is

$$L_Q = \frac{1}{N}\ln[L(\beta,\Theta)] = \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\Big)+\ln(1-p_i)\Big]. \tag{A.1}$$

First we give the first- and second-order partial derivatives and the mixed partial derivative of (A.1) with respect to $\beta$ and $\Theta$:

$$\frac{\partial L_Q}{\partial\beta} = \frac{1}{N}\sum_{i=1}^{N}\Bigg[y_i x_i-\frac{\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}{1+\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}\,x_i\Bigg] = \frac{1}{N}\sum_{i=1}^{N}x_i(y_i-p_i),$$

$$\frac{\partial^2 L_Q}{\partial\beta^2} = -\frac{1}{N}\sum_{i=1}^{N}x_i^T\frac{\partial p_i}{\partial\beta} = -\frac{1}{N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i^T x_i,$$

$$\frac{\partial L_Q}{\partial\Theta}$$
Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

  Methods          Error rate in training (%)  Training time (s)  Error rate in testing (%)  Testing time (s)
  Lasso            0.00                        0.2649             1.46                       0.0312
  All-pair lasso   0.00                        0.8807             1.35                       0.0953
  Our method       0.00                        0.6012             1.12                       0.0556
  SVM              0.00                        0.9516             1.23                       0.0936
  1-NN             0.00                        0.0000             9.20                       0.7956
  3-NN             0.00                        0.0000             8.98                       0.7488
  QDA              6.61                        1.4196             4.04                       0.2184
  DT               0.00                        0.7332             14.93                      0.1092
Figure 6: The error rate of the three lasso methods (lasso, our method, all-pair lasso) in the simulation experiment; the Bayes error is shown by the purple dotted line. (a) Hierarchical interaction. (b) Interaction variables only. (c) Main variables only.
$$= \frac{1}{N}\sum_{i=1}^{N}\Bigg[\frac{1}{2}y_i x_i^T x_i-\frac{\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}{1+\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}\cdot\frac{1}{2}x_i^T x_i\Bigg] = \frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i(y_i-p_i),$$

$$\frac{\partial^2 L_Q}{\partial\Theta^2} = -\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{4N}\sum_{i=1}^{N}(1-p_i)\,p_i\,\|x_i\|^2,$$

$$\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta} = -\frac{1}{N}\sum_{i=1}^{N}x_i^T\,\frac{\partial p_i}{\partial\Theta} = -\frac{1}{2N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i^T x_i x_i^T. \tag{A.2}$$

Then (A.1) is expanded in a second-order Taylor series about the current estimate $(\tilde\beta,\tilde\Theta)$:

$$l_Q(\beta,\Theta) = L_Q+(\beta-\tilde\beta)^T\frac{\partial L_Q}{\partial\beta}+(\Theta-\tilde\Theta)\cdot\frac{\partial L_Q}{\partial\Theta} +\frac{1}{2}\Big[(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta^2}(\beta-\tilde\beta) +2(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}(\Theta-\tilde\Theta) +(\Theta-\tilde\Theta)\cdot\frac{\partial^2 L_Q}{\partial\Theta^2}\cdot(\Theta-\tilde\Theta)\Big].$$

Substituting the derivatives in (A.2) and completing the square in the working response, the expansion collapses to
$$l_Q(\beta,\Theta) = -\frac{1}{2N}\sum_{i=1}^{N}(1-p_i)\,p_i\Bigg[x_i^T(\beta-\tilde\beta)+\frac{1}{2}x_i^T(\Theta-\tilde\Theta)x_i-\frac{y_i-p_i}{p_i(1-p_i)}\Bigg]^2 +\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde\beta+\frac{1}{2}x_i^T\tilde\Theta x_i\Big)+\ln(1-p_i)\Big] -\frac{1}{2N}\sum_{i=1}^{N}\Bigg[\frac{y_i-p_i}{p_i(1-p_i)}\Bigg]^2$$

$$= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)^2 +\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde\beta+\frac{1}{2}x_i^T\tilde\Theta x_i\Big)+\ln(1-p_i)\Big] -\frac{1}{2N}\sum_{i=1}^{N}\Bigg[\frac{y_i-p_i}{p_i(1-p_i)}\Bigg]^2, \tag{A.3}$$

where

$$\omega_i = p(x_i)[1-p(x_i)],\qquad z_i = \sum_j\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}+\frac{y_i-p(x_i)}{p(x_i)[1-p(x_i)]}. \tag{A.4}$$
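The $\omega_i$ and $z_i$ in (A.4) are exactly the working weights and working response of iteratively reweighted least squares (IRLS) for logistic regression, with the quadratic interaction term added to the linear predictor. A minimal sketch (function name is ours):

```python
import numpy as np

def working_quantities(X, y, beta, Theta):
    """Return the IRLS weights w_i = p_i(1 - p_i) and working response
    z_i = eta_i + (y_i - p_i)/w_i, where eta_i = x_i^T beta + 0.5 x_i^T Theta x_i."""
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)          # logistic variance: always in (0, 0.25]
    z = eta + (y - p) / w      # linearized (working) response
    return w, z
```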
B. Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^+-\beta_j^-$:

(1) $\beta_j^+>0$, $\beta_j^-=0$:
$$\beta_j^+-\beta_j^- = x_{ij}(z_i-A_i)+\beta_j^+-\beta_j^--\frac{\lambda_1+\hat\alpha_j}{\omega_i}. \tag{B.1}$$

(2) $\beta_j^+=0$, $\beta_j^->0$:
$$\beta_j^+-\beta_j^- = x_{ij}(z_i-A_i)+\beta_j^+-\beta_j^-+\frac{\lambda_1+\hat\alpha_j}{\omega_i}. \tag{B.2}$$

(3) $\beta_j^+=\beta_j^-=0$:
$$\beta_j^+-\beta_j^- = 0,\qquad \Big|x_{ij}(z_i-A_i)+\beta_j^+-\beta_j^-\Big|\le\frac{\lambda_1+\hat\alpha_j}{\omega_i}. \tag{B.3}$$

Combining the three cases, we derive

$$\beta_j^+-\beta_j^- = S\Big[x_{ij}(z_i-A_i)+\beta_j^+-\beta_j^-,\ \frac{\lambda_1+\hat\alpha_j}{\omega_i}\Big]. \tag{B.4}$$

Secondly, the stationarity condition $\partial L/\partial\Theta_{jk}=0$ gives

$$\frac{1}{2}\omega_i\cdot 2\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\cdot\Big(-\frac{1}{2}x_{ij}x_{ik}\Big)+(\lambda_2+\hat\alpha_j)U_{jk}=0,$$
$$\omega_i\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\Big(\frac{1}{2}x_{ij}x_{ik}\Big)=(\lambda_2+\hat\alpha_j)U_{jk},$$
$$\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\Big(\frac{1}{2}x_{ij}x_{ik}\Big)=\frac{(\lambda_2+\hat\alpha_j)U_{jk}}{\omega_i}. \tag{B.5}$$

Supposing

$$\gamma(-jk)=z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}+\Theta_{jk}x_{ij}x_{ik}, \tag{B.6}$$

we have

$$\big(\gamma(-jk)-\Theta_{jk}x_{ij}x_{ik}\big)(x_{ij}x_{ik})=\frac{2(\lambda_2+\hat\alpha_j)U_{jk}}{\omega_i},$$
$$\Theta_{jk}\,(x_{ij}x_{ik})^2 = \gamma(-jk)\,x_{ij}x_{ik}-\frac{2(\lambda_2+\hat\alpha_j)U_{jk}}{\omega_i},$$
$$\Theta_{jk} = \frac{\gamma(-jk)\,x_{ij}x_{ik}-2(\lambda_2+\hat\alpha_j)U_{jk}/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.7}$$

We discuss three cases for the value of $\Theta_{jk}$:

(1) $\Theta_{jk}>0$, $U_{jk}=1$:
$$\Theta_{jk} = \frac{\gamma(-jk)\,x_{ij}x_{ik}-2(\lambda_2+\hat\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.8}$$

(2) $\Theta_{jk}<0$, $U_{jk}=-1$:
$$\Theta_{jk} = \frac{\gamma(-jk)\,x_{ij}x_{ik}+2(\lambda_2+\hat\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.9}$$

(3) $\Theta_{jk}=0$.

We derive

$$\Theta_{jk} = \frac{S\big[\gamma(-jk)\,x_{ij}x_{ik},\ 2(\lambda_2+\hat\alpha_j)/\omega_i\big]}{(x_{ij}x_{ik})^2}. \tag{B.10}$$
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267-288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659-677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714-721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1-22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187-198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275-285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945-974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430-439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17-36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354-364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111-1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49-67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777-2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541-1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450-468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475-511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694-708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080-5083, 2009.
We use the second-order Taylor expansion at the current estimated value $(\tilde\beta, \tilde\Theta)$ for (5) and obtain the subproblem as follows:

$$l_Q(\beta,\Theta) = -\frac{1}{2}\omega_i\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)^2 - \frac{(y_i-p(x_i))^2}{2\,p^2(x_i)[1-p(x_i)]^2} + \Big[y_i\Big(\sum_j\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)+\ln(1-p(x_i))\Big], \tag{6}$$

where $\omega_i=p(x_i)[1-p(x_i)]$ and $z_i=\sum_j\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}+(y_i-p(x_i))/(p(x_i)[1-p(x_i)])$.
The proof that (5) implies (6) is presented in Appendix A.
In order to obtain a sparse solution for the main-variable coefficients and the interaction coefficients, a penalty function is used to enhance the stability of the interactive model:
$$(\hat\beta,\hat\Theta)=\arg\min_{\beta\in\mathbb{R}^{p+1},\,\Theta\in\mathbb{R}^{p\times p}} -l_Q(\beta,\Theta)+\lambda_1\|\beta\|_1+\lambda_2\|\Theta\|_1. \tag{7}$$
We focus on those interactions that have large main-variable coefficients. Such restrictions are known as "hierarchy". The mathematical expression for them is $\Theta_{jk}\neq 0 \Rightarrow \beta_j\neq 0$ or $\beta_k\neq 0$. So we add the constraints enforcing the hierarchy into (7) as follows:
$$(\hat\beta,\hat\Theta)=\arg\min_{\beta\in\mathbb{R}^{p+1},\,\Theta\in\mathbb{R}^{p\times p}} -l_Q(\beta,\Theta)+\lambda_1\|\beta\|_1+\lambda_2\|\Theta\|_1 \quad \text{s.t. } \|\Theta_j\|_1\le|\beta_j| \ \text{for } j=1,\ldots,p, \tag{8}$$

where $\Theta_j$ is the $j$th column of $\Theta$. If $\Theta_{jk}\neq 0$, then $\|\Theta_j\|_1>0$ and $\|\Theta_k\|_1>0$, so $\beta_j\neq 0$ and $\beta_k\neq 0$. The new constraint guarantees the hierarchy, but we cannot obtain a convex solution because (8) is not convex. So instead of $\beta$ we use $\beta^+, \beta^-$, and the corresponding convex relaxation of (8) is as follows:
$$(\hat\beta^{\pm},\hat\Theta)=\arg\min_{\beta_0\in\mathbb{R},\,\beta^{\pm}\in\mathbb{R}^{p},\,\Theta\in\mathbb{R}^{p\times p}} -l_Q(\beta^+-\beta^-,\Theta)+\lambda_1 1^T(\beta^++\beta^-)+\lambda_2\sum_j\|\Theta_j\|_1$$
$$\text{s.t. } \|\Theta_j\|_1\le\beta_j^++\beta_j^-,\quad \beta_j^+\ge 0,\ \beta_j^-\ge 0,\quad \text{for } j=1,\ldots,p, \tag{9}$$

where $\beta=\beta^+-\beta^-$, $\beta^{\pm}=\max(\pm\beta,0)$, $\beta^+,\beta^-\in\mathbb{R}^{p+1}$, and $\|\beta\|_1=1^T(\beta^++\beta^-)$.
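The positive/negative split is what makes the relaxation convex: under the nonnegativity constraints, the L1 norm becomes a linear function of $(\beta^+,\beta^-)$. A two-line numerical check:

```python
import numpy as np

beta = np.array([1.5, -2.0, 0.0, 0.3])
beta_plus = np.maximum(beta, 0.0)    # beta^+ = max(beta, 0)
beta_minus = np.maximum(-beta, 0.0)  # beta^- = max(-beta, 0)

# beta is recovered, and ||beta||_1 equals the linear form 1^T(beta^+ + beta^-)
assert np.allclose(beta, beta_plus - beta_minus)
assert np.isclose(np.abs(beta).sum(), (beta_plus + beta_minus).sum())
```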
4. Coordinate Descent Algorithm and KKT Conditions
The basic idea of the coordinate descent algorithm is to convert a multivariate problem into multiple single-variable subproblems: only one coordinate is optimized at a time, and the solution is updated cyclically. We solve (9) using the coordinate descent algorithm.
The Lagrange function corresponding to (9) is as follows
$$L(\beta^+,\beta^-,\Theta) = -l_Q(\beta^+-\beta^-,\Theta) + (\lambda_1 1-\hat\alpha-\gamma^+)^T\beta^+ + (\lambda_1 1-\hat\alpha-\gamma^-)^T\beta^- + \big\langle \mathrm{diag}(\lambda_2 1+\hat\alpha)\,U,\ \Theta\big\rangle, \tag{10}$$

where

$$U_{jk} = \begin{cases} \in[-1,1], & \Theta_{jk}=0,\\ \mathrm{sign}(\Theta_{jk}), & \Theta_{jk}\neq 0, \end{cases} \tag{11}$$
and $\hat\alpha$ and $\gamma^{\pm}$ are the dual variables corresponding to the hierarchical constraint and the nonnegative constraints. Formula (10) can be decomposed into $p$ subproblems:
$$L(\beta_j^+,\beta_j^-,\Theta_j) = -l_Q(\beta_j^+-\beta_j^-,\Theta_j) + (\lambda_1-\hat\alpha_j-\gamma^+)^T\beta_j^+ + (\lambda_1-\hat\alpha_j-\gamma^-)^T\beta_j^- + \big\langle(\lambda_2+\hat\alpha_j)U_j,\ \Theta_j\big\rangle. \tag{12}$$
The solution of (12), as a convex problem, can be obtained from a set of optimality conditions known as the KKT (Karush-Kuhn-Tucker) conditions. This is the key advantage of our approach.
The stationarity conditions of (12) according to KKT are $\partial L/\partial\beta_j^{\pm}=0$ and $\partial L/\partial\Theta_{jk}=0$. The complementary conditions are $\|\Theta_j\|_1\le\beta_j^++\beta_j^-$, $\gamma_j^{\pm}\beta_j^{\pm}=0$, $\beta^{\pm}\ge 0$, $\gamma^{\pm}\ge 0$, $\hat\alpha\ge 0$, and $\hat\alpha_j(\|\Theta_j\|_1-\beta_j^+-\beta_j^-)=0$. We assume that $(1/N)\sum_{i=1}^{N}x_{ij}^2=1$. For our problem, the KKT conditions can be written as follows:
$$\beta_j^+-\beta_j^- = S\Big[x_{ij}(z_i-A_i)+\beta_j^+-\beta_j^-,\ \frac{\lambda_1+\hat\alpha_j}{\omega_i}\Big],$$

$$\Theta_{jk} = \frac{S\big[(x_{ij}x_{ik})^T(z_i-A_i+\Theta_{jk}x_{ij}x_{ik}),\ 2(\lambda_2+\hat\alpha_j)/\omega_i\big]}{\|x_{ij}x_{ik}\|^2}, \tag{13}$$

where $A_i=\sum_j\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}$, $z_i$ and $\omega_i$ are from (6), and $S$ denotes the soft-threshold operator defined by $S(c,\lambda)=\mathrm{sign}(c)(|c|-\lambda)_+$.
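The soft-threshold operator used throughout (13)-(15) has a simple closed form; a direct vectorized implementation:

```python
import numpy as np

def soft_threshold(c, lam):
    """S(c, lambda) = sign(c) * max(|c| - lambda, 0), applied elementwise."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
```

Values with $|c|\le\lambda$ are set exactly to zero, which is what produces sparse coefficient estimates.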
Table 1: Datasets information.

     Datasets                  Samples  Variable dimension  Interactive dimension  Classes
  1  Breast-cancer-Wisconsin   683      9                   36                     2
  2  Ionosphere                351      33                  528                    2
  3  Liver disorders           345      6                   15                     2
  4  Sonar                     208      60                  1770                   2
Table 2: The experimental results of our method.

  Datasets                  Coefficients  Error rate (%)  SD    Time (s)  λ
  Breast-cancer-Wisconsin   22            3.0             0.01  9.49      3.14
  Ionosphere                101           28.0            0.02  124.63    2.82
  Liver disorders           25            26.0            0.02  12.11     2.37
  Sonar                     117           14.0            0.02  150.69    0.73
The proofs of both expressions in (13) can be found in Appendix B.
Now we define $f(\hat\alpha_j)=\|\Theta_j\|_1-\beta_j^+-\beta_j^-$. Then

$$f(\hat\alpha_j)=\Bigg\|\sum_{k=1}^{p} \frac{S\big[(x_{ij}x_{ik})^T(z_i-A_i+\Theta_{jk}x_{ij}x_{ik}),\ 2(\lambda_2+\hat\alpha_j)/\omega_i\big]}{\|x_{ij}x_{ik}\|^2}\Bigg\|_1 - \Bigg\|S\Big[x_{ij}(z_i-A_i)+\beta_j^+-\beta_j^-,\ \frac{\lambda_1+\hat\alpha_j}{\omega_i}\Big]\Bigg\|_1. \tag{14}$$
The remaining KKT conditions only involve $\hat\alpha_j$: $\hat\alpha_j f(\hat\alpha_j)=0$, $f(\hat\alpha_j)\le 0$, $\hat\alpha_j\ge 0$. Observing that $f$ is nonincreasing with respect to $\hat\alpha_j$ and is piecewise linear, it is easy to get the solution for $\hat\alpha_j$.
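Because $f$ is nonincreasing and piecewise linear, the complementary-slackness pair ($\hat\alpha_j=0$ when $f(0)\le 0$, otherwise the root of $f$) can be found by simple bracketing. A generic sketch (not the paper's exact routine; bisection is used here in place of the piecewise-linear closed form):

```python
def solve_alpha(f, hi=1.0, tol=1e-10):
    """Return alpha >= 0 satisfying alpha * f(alpha) = 0 and f(alpha) <= 0
    for a nonincreasing f: either 0 (constraint slack) or the root of f."""
    if f(0.0) <= 0.0:        # f(0) <= 0: the hierarchy constraint is slack
        return 0.0
    while f(hi) > 0.0:       # grow the bracket until the sign changes
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:     # bisection on the monotone function
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```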
In conclusion, the overall idea of the coordinate descent algorithm is that the minimization of (9) is equivalent to the minimization of (10). Formula (10) can be decomposed into $p$ independent subproblems of the form (12), and each subproblem (12) is solved via (13). The final coefficient-update iteration is as follows:
$$\beta_j^{+(m+1)}-\beta_j^{-(m+1)} = S\Big[x_{ij}(z_i^m-A_i^m)+\beta_j^{+(m)}-\beta_j^{-(m)},\ \frac{\lambda_1+\hat\alpha_j^m}{\omega_i^m}\Big],$$

$$\Theta_{jk}^{m+1} = \frac{S\big[(x_{ij}x_{ik})(z_i^m-A_i^m+\Theta_{jk}^m x_{ij}x_{ik}),\ 2(\lambda_2+\hat\alpha_j^m)/\omega_i^m\big]}{\|x_{ij}x_{ik}\|^2}, \tag{15}$$
where $\beta_j^{+(m)}-\beta_j^{-(m)}$ is the estimated value of the $j$th main-variable coefficient after $m$ iterations, and $\Theta_{jk}^{m}$ is the estimate of the interaction coefficient between the $j$th and $k$th variables after $m$ iterations.
5. The Experimental Results and Analysis

5.1. The Experimental Results and Analysis of Four UCI Datasets. We use four datasets from the UCI repository: the Breast-cancer-Wisconsin, Ionosphere, Liver disorders, and Sonar datasets, as shown in Table 1.
We run the 10-fold cross-validation (10-CV) experiments 20 times in R, where λ_1 = 2λ_2 = λ. We then carry out an experiment employing the interactive hierarchical lasso logistic regression method. The results include the number of nonzero variable coefficients, the average error rate of the 10-CV, the standard deviation (SD), the CPU time, and the estimated value of λ. The results are shown in Table 2.
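The evaluation protocol (10-fold CV repeated 20 times) can be sketched generically as follows. This is an illustration of the protocol only, not the authors' R code; `fit_predict` is a placeholder for any classifier.

```python
import numpy as np

def repeated_kfold_error(fit_predict, X, y, k=10, repeats=20, seed=0):
    """Average misclassification rate and SD over `repeats` random k-fold splits.
    fit_predict(X_train, y_train, X_test) must return predicted labels."""
    rng = np.random.default_rng(seed)
    fold_errors = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        for test_idx in np.array_split(idx, k):
            train_idx = np.setdiff1d(idx, test_idx)
            y_hat = fit_predict(X[train_idx], y[train_idx], X[test_idx])
            fold_errors.append(np.mean(y_hat != y[test_idx]))
    return float(np.mean(fold_errors)), float(np.std(fold_errors))
```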
The results of the 10-CV on the four datasets using the proposed method are presented in Figures 2 to 5. In the figures the horizontal axis represents the logarithmic value of λ and the vertical axis is the error rate of the 10-CV; the horizontal axis at the top of each figure gives the number of nonzero variable coefficients corresponding to each λ value.
The results for the breast-cancer-Wisconsin datasets are shown in Figure 2. The minimum error rate is 0.03, and the number of selected variables is more than 11. The results for the Ionosphere datasets are shown in Figure 3. When the number of selected variables is 101, the lowest error rate is 0.28, with the smaller standard deviation. The number of selected variables is larger than the original dimension, so the interactions provide classification information. The results for the Liver disorders datasets are presented in Figure 4. If the number of selected variables is 25, the lowest error rate can reach 0.26, while the standard deviation is 0.02. Finally, the results for the Sonar datasets are presented in Figure 5. When more than 80 variables are selected, the minimum error rate is 0.14.
Mathematical Problems in Engineering 5
[Figure 2: The results of breast-cancer-Wisconsin. 10-CV error rate versus log(λ); the top axis gives the number of selected features at each λ.]
[Figure 3: The results of Ionosphere. 10-CV error rate versus log(λ); the top axis gives the number of selected features at each λ.]
In what follows we compare our method to the existing literature [13]. The classification results and training time of our method are better than those reported in [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods with 10-fold cross-validation repeated 20 times on the four UCI datasets are listed in Tables 3, 4, and 5, respectively. The conventional pattern recognition methods include support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighborhood (K-NN), and decision tree (DT) methods. The lasso considers the main variables without the interaction variables; the all-pair lasso considers the main variables and the interaction variables but without the hierarchy. The experimental
[Figure 4: The results of Liver disorders. 10-CV error rate versus log(λ); the top axis gives the number of selected features at each λ.]
[Figure 5: The results of Sonar. 10-CV error rate versus log(λ); the top axis gives the number of selected features at each λ.]
Table 3: The experimental results of the lasso penalized logistic regression model.

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   3.22             0.0001   0.3744
Ionosphere                31.15            0.0075   0.4863
Liver disorders           30.77            0.0077   0.0908
Sonar                     22.13            0.0121   15.6806
results show that our model gives better classification results and is more stable. This highlights the advantage of the variable interactions and hierarchy.
5.2. The Experimental Results for High-Dimensional Small-Sample Data. The Madelon dataset from NIPS2003 was
Table 4: The experimental results of the all-pair lasso penalized logistic regression model.

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   2.93             0.0001   12.1213
Ionosphere                27.00            0.0089   3.1734
Liver disorders           26.36            0.0058   4.9190
Sonar                     20.63            0.0115   4.1041
used to evaluate our method. The sample numbers of the training, validation, and testing sets are 2000, 600, and 1800, respectively. The number of classes is 2. The variable dimension is 500, so the interactive dimension is 124750. More information about the datasets is available at http://www.nipsfsc.ecs.soton.ac.uk/, where the datasets can be downloaded and the challenge results (balanced error rates and area under the curve) can be viewed. The model is trained on the training set, the model parameters are selected on the validation set, and the prediction results of the final model on the test set are uploaded online to obtain the classification score of the final model. Our results are shown in Table 6. They show that our method is slightly better than the lasso and the all-pair lasso, which implies that the interactions may also be important in the Madelon datasets.
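The interactive dimension quoted above is simply the number of unordered variable pairs, p(p − 1)/2. A quick arithmetic check, also covering the UCI datasets of Table 1:

```python
def interactive_dimension(p):
    """Number of pairwise interaction terms among p main variables."""
    return p * (p - 1) // 2

# Madelon (p = 500) and the Table 1 datasets (p = 9, 33, 6, 60):
print([interactive_dimension(p) for p in (500, 9, 33, 6, 60)])
# -> [124750, 36, 528, 15, 1770]
```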
5.3. Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected smartphone sensor data [10] and used the support vector machine (SVM) method to solve the classification problem of daily-life activity recognition. These results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19–48 years participated in the study; each person performed six activities wearing the smartphone on the waist. To obtain the data class labels, the experiments were video recorded. The smartphone used in the experiments had a built-in accelerometer and gyroscope for measuring 3D linear acceleration and angular velocity. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.
We use these datasets to evaluate our method, taking the upstairs and downstairs movements as the two activity classes. The training sets have 986 and 1073 samples, respectively; the test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features from the sensor signals.
The experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods since it takes variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.
5.4. The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are n = 200 and p = 20. We take interactions into consideration and provide the following three kinds of simulation based on formula (1):

(1) The real model is hierarchical: Θ_jk ≠ 0 ⇒ β_j ≠ 0 or β_k ≠ 0, j, k = 1, ..., p. There are 10 nonzero elements in β and 20 nonzero elements in Θ.

(2) The real model only includes interactive variables: β_j = 0, j = 1, ..., p. There are 20 nonzero elements in Θ.

(3) The real model only includes main variables: Θ_jk = 0, j, k = 1, ..., p. There are 10 nonzero elements in β.
The SNR of the main variables is 15 and the SNR of the interaction variables is 1. The results of 100 runs are shown in Figure 6.
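A sketch of how scenario (1) could be generated (our own illustrative construction; the paper does not list its generator, and the coefficient draws and logistic link here are assumptions). The key point is the hierarchy: nonzero Θ_jk entries are drawn only among pairs of variables whose main effects are both nonzero, which in particular satisfies the stated condition β_j ≠ 0 or β_k ≠ 0.

```python
import numpy as np
from itertools import combinations

def simulate_hierarchical(n=200, p=20, n_main=10, n_int=20, seed=0):
    """Scenario (1): 10 nonzero main effects and 20 nonzero interactions,
    each interaction chosen among pairs of the nonzero main variables."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    main = rng.choice(p, size=n_main, replace=False)
    beta[main] = rng.standard_normal(n_main)
    Theta = np.zeros((p, p))
    pairs = list(combinations(sorted(main), 2))            # C(10, 2) = 45 candidates
    for idx in rng.choice(len(pairs), size=n_int, replace=False):
        j, k = pairs[idx]
        Theta[j, k] = Theta[k, j] = rng.standard_normal()
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)  # as in formula (1)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
    return X, y, beta, Theta
```

By construction, every nonzero Θ_jk in this sketch obeys the hierarchy with respect to β.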
When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the all-pair (interactive) lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason is that when our method fits the model, the interaction variables are treated as main variables. When the real model only includes main variables, the lasso is the best, our method still takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).
We believe that many actual classification problems could be hierarchical and interactive: they contain both main variables and interaction variables. Our method fits this kind of situation.
6 Conclusion
Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily-life activity identification datasets. The results show that interaction widely exists in classification models and demonstrate that variable interaction contributes to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net and hierarchical group lasso methods. The application of multisensor interaction to the daily-life activities of the elderly is a new way of using our method.
Table 5: The experimental results of the traditional pattern recognition methods.

Datasets                  Classifier   Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   SVM          3.42             0.0019   1.3092
                          LDA          3.96             0.0007   0.2429
                          QDA          4.92             0.0014   0.2309
                          K-NN         4.61             0.0030   0.6636
                          DT           4.99             0.0041   1.6201
Ionosphere                SVM          35.83            0.0063   2.9149
                          LDA          33.10            0.0090   0.6168
                          QDA          32.01            0.0076   0.4516
                          K-NN         28.30            0.0090   0.6363
                          DT           38.23            0.0408   3.0865
Liver disorders           SVM          30.80            0.0100   1.1401
                          LDA          31.31            0.0089   0.2474
                          QDA          40.19            0.0119   0.1958
                          K-NN         36.89            0.0124   0.6089
                          DT           36.94            0.0187   1.9921
Sonar                     SVM          25.70            0.0197   1.2181
                          LDA          25.13            0.0145   0.5897
                          QDA          23.80            0.0205   0.4594
                          K-NN         13.09            0.0101   0.5345
                          DT           35.91            0.0303   0.7153
Table 6: The experimental results for the Madelon datasets.

Methods          Balance error rate (%)             Area under the curve (%)
                 Training  Validation  Testing      Training  Validation  Testing
Lasso            38.08     37.21       36.61        64.78     62.79       63.40
All-pair lasso   34.98     37.01       36.57        65.02     62.99       63.43
Our method       30.00     36.35       35.12        70.00     68.08       67.76
Appendices
A Proofs from (5) to (6)
For notational convenience we write p_i instead of p(x_i). The logarithmic likelihood function of (4) is as follows:

$$L_Q = \frac{1}{N}\ln\left[L(\beta,\Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right) + \ln\left(1 - p_i\right)\right]. \tag{A.1}$$
First we give the first- and second-order partial derivatives and the mixed partial derivative of (4) with respect to β and Θ:
$$\begin{aligned}
\frac{\partial L_Q}{\partial \beta}
&= \frac{\partial}{\partial \beta}\,\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\Big)+\ln\left(1-p_i\right)\Big]\\
&= \frac{1}{N}\sum_{i=1}^{N}\Big[\big(x_i^T y_i\big)^T
 - \frac{\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}{1+\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}\cdot x_i\Big]
 = \frac{1}{N}\sum_{i=1}^{N}x_i\left(y_i-p_i\right),\\
\frac{\partial^2 L_Q}{\partial \beta^2}
&= \frac{\partial}{\partial \beta}\,\frac{1}{N}\sum_{i=1}^{N}x_i^T\left(y_i-p_i\right)
 = -\frac{1}{N}\sum_{i=1}^{N}x_i^T\,\frac{\partial p_i}{\partial \beta}
 = -\frac{1}{N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\,x_i^T x_i,
\end{aligned}$$

$$\frac{\partial L_Q}{\partial \Theta}$$
Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Methods          Error rate in training (%)   Training time (s)   Error rate in testing (%)   Testing time (s)
Lasso            0.00                         0.2649              1.46                        0.0312
All-pair lasso   0.00                         0.8807              1.35                        0.0953
Our method       0.00                         0.6012              1.12                        0.0556
SVM              0.00                         0.9516              1.23                        0.0936
1-NN             0.00                         0.0000              9.20                        0.7956
3-NN             0.00                         0.0000              8.98                        0.7488
QDA              6.61                         1.4196              4.04                        0.2184
DT               0.00                         0.7332              14.93                       0.1092
[Figure 6: The error rate of the three lasso methods (lasso, our method, all-pair lasso) in the simulation experiment; the Bayes error is shown by the purple dotted line. (a) Hierarchical interaction; (b) interaction variables only; (c) main variables only.]
$$\begin{aligned}
&= \frac{\partial}{\partial \Theta}\,\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\Big)+\ln\left(1-p_i\right)\Big]\\
&= \frac{1}{N}\sum_{i=1}^{N}\Big[\Big(\frac{1}{2}y_i\,x_i^T x_i\Big)^T
 - \frac{\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}{1+\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}\cdot\frac{1}{2}x_i^T x_i\Big]
 = \frac{1}{N}\sum_{i=1}^{N}\Big(\frac{1}{2}y_i\,x_i^T x_i-\frac{1}{2}p_i\,x_i^T x_i\Big)
 = \frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\left(y_i-p_i\right),\\
\frac{\partial^2 L_Q}{\partial \Theta^2}
&= \frac{\partial}{\partial \Theta}\,\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\left(y_i-p_i\right)
 = -\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\,\frac{\partial p_i}{\partial \Theta}
 = -\frac{1}{4N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left\|x_i\right\|^2,\\
\frac{\partial^2 L_Q}{\partial \beta\,\partial \Theta}
&= \frac{\partial}{\partial \beta}\,\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\left(y_i-p_i\right)
 = -\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\,\frac{\partial p_i}{\partial \beta}
 = -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\,x_i^T x_i\,x_i^T.
\end{aligned}\tag{A.2}$$
Then (A.1) is expanded using a Taylor series around the expansion point (β̃, Θ̃):
$$\begin{aligned}
l_Q(\beta,\Theta)
={}& L_Q + (\beta-\tilde{\beta})\cdot\frac{\partial L_Q}{\partial \beta}
 + (\Theta-\tilde{\Theta})\cdot\frac{\partial L_Q}{\partial \Theta}
 + \frac{1}{2}\Big[(\beta-\tilde{\beta})\cdot\frac{\partial^2 L_Q}{\partial \beta^2}
 + 2(\beta-\tilde{\beta})^T(\Theta-\tilde{\Theta})\,\frac{\partial^2 L_Q}{\partial \beta\,\partial \Theta}
 + (\Theta-\tilde{\Theta})\cdot\frac{\partial^2 L_Q}{\partial \Theta^2}\Big]\\
={}& \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\Big)+\ln\left(1-p_i\right)\Big]
 + (\beta-\tilde{\beta})\cdot\frac{1}{N}\sum_{i=1}^{N}x_i\left(y_i-p_i\right)
 + (\Theta-\tilde{\Theta})\cdot\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\left(y_i-p_i\right)\\
&- \frac{1}{2}(\beta-\tilde{\beta})\cdot\frac{1}{N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\,x_i^T x_i
 - 2(\beta-\tilde{\beta})^T(\Theta-\tilde{\Theta})\cdot\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\,x_i^T x_i\,x_i^T
 - (\Theta-\tilde{\Theta})\cdot\frac{1}{4N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left\|x_i\right\|^2\\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\Big\{\big[x_i^T(\beta-\tilde{\beta})\big]^2
 + (\beta-\tilde{\beta})^T(\Theta-\tilde{\Theta})\,x_i\cdot x_i^T x_i
 + \frac{1}{4}\big[x_i^T(\Theta-\tilde{\Theta})\,x_i\big]^2\Big\}\\
&+ \frac{1}{N}\sum_{i=1}^{N}x_i^T(\beta-\tilde{\beta})\left(y_i-p_i\right)
 + \frac{1}{2N}\sum_{i=1}^{N}x_i^T(\Theta-\tilde{\Theta})\,x_i\left(y_i-p_i\right)
 + \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\Big)+\ln\left(1-p_i\right)\Big]\\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\Big[x_i^T(\beta-\tilde{\beta})+\frac{1}{2}x_i^T(\Theta-\tilde{\Theta})\,x_i
 - \frac{y_i-p_i}{p_i\left(1-p_i\right)}\Big]^2
 + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)}
 + \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\Big)+\ln\left(1-p_i\right)\Big]\\
={}& -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\Big(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)^2
 + \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde{\beta}+\frac{1}{2}x_i^T\tilde{\Theta}x_i\Big)+\ln\left(1-p_i\right)\Big]
 + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)},
\end{aligned}\tag{A.3}$$
where

$$\omega_i = p\left(x_i\right)\left[1 - p\left(x_i\right)\right],\qquad
z_i = \sum_{j}\tilde{\beta}_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde{\Theta}_{jk}\,x_{ij}x_{ik}
 + \frac{y_i - p\left(x_i\right)}{p\left(x_i\right)\left[1 - p\left(x_i\right)\right]}. \tag{A.4}$$
B Proofs from (12) to (13)
First, there are three cases in calculating β_j^+ − β_j^-.

(1) β_j^+ > 0, β_j^- = 0:

$$\beta_j^+ - \beta_j^- = x_j^T\left(z_i - A\right) + \beta_j^+ - \beta_j^- - \frac{\lambda_1 + \hat{\alpha}_j}{\omega_i}. \tag{B.1}$$

(2) β_j^+ = 0, β_j^- > 0:

$$\beta_j^+ - \beta_j^- = x_j^T\left(z_i - A\right) + \beta_j^+ - \beta_j^- + \frac{\lambda_1 + \hat{\alpha}_j}{\omega_i}. \tag{B.2}$$

(3) β_j^+ = 0, β_j^- = 0:

$$\beta_j^+ - \beta_j^- = x_j^T\left(z_i - A\right) + \beta_j^+ - \beta_j^-. \tag{B.3}$$

We derive

$$\beta_j^+ - \beta_j^- = S\left[x_{ij}\left(z_i - A_i\right) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \hat{\alpha}_j}{\omega_i}\right]. \tag{B.4}$$
Secondly, we give the following by the stationarity condition ∂L/∂Θ_jk = 0:

$$\begin{aligned}
&\frac{1}{2}\omega_i\cdot 2\Big(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\cdot\Big(-\frac{1}{2}x_{ij}x_{ik}\Big)
 + \left(\lambda_2+\alpha_j\right)U_{jk} = 0,\\
&\omega_i\cdot\Big(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\cdot\Big(\frac{1}{2}x_{ij}x_{ik}\Big)
 = \left(\lambda_2+\alpha_j\right)U_{jk},\\
&\Big(z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\Big)\cdot\Big(\frac{1}{2}x_{ij}x_{ik}\Big)
 = \frac{\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}.
\end{aligned}\tag{B.5}$$
Supposing

$$\gamma_{(-jk)} = z_i-\sum_{j}\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}+\Theta_{jk}x_{ij}x_{ik}, \tag{B.6}$$

we have

$$\begin{aligned}
&\left(\gamma_{(-jk)}-\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(x_{ij}x_{ik}\right)
 = \frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},\\
&\Theta_{jk}\,x_{ij}x_{ik}\cdot x_{ij}x_{ik}
 = -\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}+\gamma_{(-jk)}\cdot x_{ij}x_{ik},\\
&\Theta_{jk} = \frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\end{aligned}\tag{B.7}$$
We discuss three cases for the value of Θ_jk.

(1) Θ_jk > 0, U_jk = 1:

$$\Theta_{jk} = \frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}. \tag{B.8}$$

(2) Θ_jk < 0, U_jk = −1:

$$\Theta_{jk} = \frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}+2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}. \tag{B.9}$$

(3) Θ_jk = 0.

We derive

$$\Theta_{jk} = \frac{S\left[\gamma_{(-jk)}\cdot x_{ij}x_{ik},\ 2\left(\lambda_2+\alpha_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^2}. \tag{B.10}$$
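Update (B.10) is the closed form of the generic scalar lasso problem min_θ ½(c − aθ)² + λ|θ|, whose solution is θ* = S(ac, λ)/a². A brute-force check of that closed form (arbitrary illustrative numbers):

```python
import numpy as np

def soft(u, t):
    """Soft-thresholding operator S[u, t]."""
    return np.sign(u) * max(abs(u) - t, 0.0)

a, c, lam = 1.7, 0.9, 0.4                       # arbitrary test values
theta_closed = soft(a * c, lam) / a**2          # closed form, as in (B.10)

# A fine grid search over the same objective agrees with the closed form.
grid = np.linspace(-3.0, 3.0, 120001)
objective = 0.5 * (c - a * grid) ** 2 + lam * np.abs(grid)
theta_grid = grid[np.argmin(objective)]
assert abs(theta_closed - theta_grid) < 1e-4
```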
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019, 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.
4 Mathematical Problems in Engineering
Table 1 Datasets information
Datasets Samples Variable dimension Interactive dimension Class1 Breast-cancer-Wisconsin 683 9 36 22 Ionosphere 351 33 528 23 Liver disorders 345 6 15 24 Sonar 208 60 1770 2
Table 2 The experimental results of our method
Datasets Coefficients Error rate () SD Time (s) 120582
Breast-cancer-Wisconsin 22 30 001 949 314Ionosphere 101 280 002 12463 282Liver disorders 25 260 002 1211 237Sonar 117 140 002 15069 073
The proof of both expressions (13) can be found inAppendix B
Now we define 119891(119895) = Θ
1198951minus 120573+119895minus 120573minus119895 Then
119891 (119895)
=
10038171003817100381710038171003817100381710038171003817100381710038171003817
119875
sum119896=1
119878 [(119909119894119895119909119894119896)119879
(119911119894minus 119860119894+ Θ119895119896119909119894119895119909119894119896)
2 (1205822+ 119895)
120596119894
]
sdot (10038171003817100381710038171003817119909119894119895119909119894119896
100381710038171003817100381710038172
2)minus1
100381710038171003817100381710038171003817100381710038171003817100381710038171
minus100381710038171003817100381710038171003817100381710038171003817119878 [119909119894119895(119911119894minus 119860119894) + 120573+119895minus 120573minus1198951205821+ 119895
120596119894
]1003817100381710038171003817100381710038171003817100381710038171
(14)
The remaining KKT conditions only involve 119891() = 0119891() le 0 ge 0 Observing that 119891 is nonincreasing withrespect to and is piecewise linear it is easy to get the solutionfor
In conclusion the overall idea of the coordinate descentalgorithm is that the minimization of (9) is equivalent to theminimization of (10) Formula (10) can be decomposed into119901 independent formula (12) Formula (12) can be solved as(13) The final coefficient optimization iteration formula is asfollows
120573+(119898+1)119895
minus 120573minus(119898+1)119895
= 119878 [119909119894119895(119911119898119894minus 119860119898119894) + 120573+(119898)119895
minus 120573minus(119898)119895
1205821+ 119898119895
120596119896119894
]
Θ119898+1119895119896
=119878 [(119909119894119895119909119894119896) (119911119898119894minus 119860119898119894+ Θ119898119895119896119909119894119895119909119894119896) 2 (120582
2+ 119898119895) 120596119898119894]
10038171003817100381710038171003817119909119894119895119909119894119896100381710038171003817100381710038172
(15)
where 120573+(119898)119895
minus 120573minus(119898)119895
is the estimated value of the 119895th mainvariable coefficient after119898 iterations and Θ119898
119895119896is the estimation
of the interaction coefficient between the 119895th variable and the119896th variable after119898 iterations
5 The Experimental Results and Analysis
51 The Experimental Results and Analysis of Four UCIDatasets There are four UCI study database datasets whichinclude the breast-cancer-Wisconsin datasets Ionospheredatasets Liver disorders datasets and Sonar datasets asshown in Table 1
We do the 10-fold cross-validation (10-CV) experimentsin the paper for 20 times using 119877 where 120582
1= 2120582
2= 120582
Besideswe complete an experiment employing the interactivehierarchical lasso logistic regression method The resultsinclude the number of nonzero variable coefficients averageerror rate of the 10-CV standard deviation (SD) CPU timeand the value of lambda (120582) estimated The results are shownin Table 2
The results of the 10-CV based on the four datasets arepresented in Figures 2 to 5 using the proposedmethod In thefigures the horizontal axis represents the logarithmic value of120582 and the vertical axis is the error rate of the 10-CV Besidesthe horizontal axis at the top of each figure represents thenumber of nonzero variable coefficients corresponding to the120582 value
The results for the breast-cancer-Wisconsin datasets areshown in Figure 2 The minimum error rate is 003 and thenumber of selected variables is more than 11 The results forthe Ionosphere experimental datasets are shown in Figure 3When the number of selected variables is 101 the lowest errorrate is 028 with the smaller standard deviation The numberof the selected variables is larger than the original dimensionso the interactions provide the classified information Theresults for the Liver disorders datasets are presented inFigure 4 If the number of selected variables is 25 the lowesterror rate can reach 026 while the standard deviation is002 Finally the results for the Sonar datasets are presentedin Figure 5 When more than 80 variables are selected theminimum error rate is 014
Mathematical Problems in Engineering 5
Figure 2 The results of breast-cancer-Wisconsin (10-CV error versus log(λ); the top axis gives the number of selected features)
Figure 3 The results of Ionosphere (10-CV error versus log(λ); the top axis gives the number of selected features)
In what follows we compare our method to the existing literature [13]. The classification results and training time of our method are better than those reported in [13]. The experimental results of the lasso, the all-pair lasso, and conventional pattern recognition methods, with 10-fold cross-validation repeated 20 times on the four UCI datasets, are listed in Tables 3, 4, and 5, respectively. The conventional pattern recognition methods include support vector machine (SVM), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), K-nearest neighbor (K-NN), and decision tree (DT) methods. The lasso is a method that considers the main variables without the interaction variables; the all-pair lasso considers both the main variables and the interaction variables but without the hierarchy. The experimental
Figure 4 The results of Liver disorders (10-CV error versus log(λ); the top axis gives the number of selected features)
Figure 5 The results of Sonar (10-CV error versus log(λ); the top axis gives the number of selected features)
Table 3 The experimental results of the lasso penalized logistic regression model

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   3.22             0.0001   0.3744
Ionosphere                31.15            0.0075   0.4863
Liver disorders           30.77            0.0077   0.0908
Sonar                     22.13            0.0121   15.6806
results show that our model is better in classification and more stable. This highlights the advantage of the variable interactions and hierarchy.
5.2 The Experimental Results for High-Dimensional Small Sample Data. The Madelon dataset from NIPS2003 was
Table 4 The experimental results of the all-pair lasso penalized logistic regression model

Datasets                  Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   2.93             0.0001   12.1213
Ionosphere                27.00            0.0089   3.1734
Liver disorders           26.36            0.0058   4.9190
Sonar                     20.63            0.0115   4.1041
used to evaluate our method. The sample numbers of the training, validation, and test sets are, respectively, 2000, 600, and 1800. The number of classes is 2, and the variable dimension is 500, so the interactive dimension is 124750. More information about the datasets is available at http://www.nipsfsc.ecs.soton.ac.uk, where the datasets can be downloaded and the challenge results (balanced error rates and areas under the curve) can be viewed. The model is trained on the training set, and the model parameters are selected on the validation set. The prediction results of the final model on the test set are then uploaded online, and the classification score of the final model is obtained. Our results are shown in Table 6. The results show that our method is slightly better than the lasso and the all-pair lasso, which implies that the interactions may also be important in the Madelon datasets.
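The interactive dimension of 124750 quoted above is simply the number of unordered variable pairs, p(p − 1)/2 with p = 500. A short sketch (our own illustration, not the challenge code) of this count and of building a pairwise-interaction design matrix:

```python
import numpy as np
from itertools import combinations

def interaction_features(X):
    """Append all pairwise products x_j * x_k (j < k) to the design matrix."""
    pairs = list(combinations(range(X.shape[1]), 2))
    inter = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return np.hstack([X, inter]), pairs

p = 500
n_interactions = p * (p - 1) // 2
print(n_interactions)  # 124750, the interactive dimension quoted above

# Small demonstration: p = 4 main variables give 6 interaction columns
X = np.arange(12.0).reshape(3, 4)
Xi, pairs = interaction_features(X)
print(Xi.shape)  # (3, 10)
```

For the full Madelon design (2000 × 124750) the expanded matrix would be built fold by fold or kept implicit for memory reasons.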
5.3 Activity Recognition (AR) Using Inertial Sensors of Smartphones. Anguita et al. collected sensor data from smartphones [10]. They used the support vector machine (SVM) method to solve the classification problem of daily life activity recognition. Such results play an extremely significant role in disability and elderly care. The datasets can be downloaded following the literature [10]. Thirty volunteers aged 19–48 years participated in the study, and each person performed six activities while wearing the smartphone on the waist. To obtain the data class labels, the experiments were recorded on video. The smartphone used in the experiments had a built-in accelerometer and gyroscope for measuring 3D linear acceleration and angular velocity. The sampling frequency was 50 Hz, which is more than enough for capturing human movements.
We use these datasets to evaluate our method, taking the upstairs and downstairs movements as the two active classes. The training sets have 986 and 1073 samples, respectively, and the test sets have 420 and 471 samples, respectively. The variable dimension is 561, which includes time- and frequency-domain features from the sensor signals.
Experimental results of the three lasso methods and some pattern recognition methods are shown in Table 7. The results show that our method is better than the pattern recognition methods, since it takes variable selection and interaction into account. Our method achieves the best classification results with less training and testing time.
5.4 The Numerical Simulation Results and Discussion. Now suppose that the number and dimension of the samples are $n = 200$ and $p = 20$. We take interactions into consideration and provide the following three kinds of simulation based on formula (1):

(1) The real model is hierarchical: $\Theta_{jk} \neq 0 \Rightarrow \beta_j \neq 0$ or $\beta_k \neq 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$ and 20 nonzero elements in $\Theta$.

(2) The real model only includes interaction variables: $\beta_j = 0$, $j = 1, \ldots, p$. There are 20 nonzero elements in $\Theta$.

(3) The real model only includes main variables: $\Theta_{jk} = 0$, $j, k = 1, \ldots, p$. There are 10 nonzero elements in $\beta$.
The SNR of the main variables is 15 and the SNR of the interaction variables is 1. The results of 100 repetitions of the experiment are shown in Figure 6.
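The paper does not give its data generator, so the following sketch shows one way such simulation data could be produced (the coefficient values, pair selection, and seed are our assumptions), with y drawn from a Bernoulli distribution whose log-odds follow formula (1):

```python
import numpy as np

def simulate(n=200, p=20, design="hierarchical", seed=0):
    """Draw y ~ Bernoulli(sigmoid(x^T beta + 0.5 x^T Theta x)) under one of three
    sparsity designs: 'hierarchical', 'interaction_only', or 'main_only'."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    Theta = np.zeros((p, p))
    if design in ("hierarchical", "main_only"):
        beta[:10] = 1.0  # 10 nonzero main effects
    if design in ("hierarchical", "interaction_only"):
        if design == "hierarchical":
            # interactions sit among the first 10 variables, so every
            # nonzero Theta_jk has a nonzero parent beta_j or beta_k
            pairs = [(j, k) for j in range(10) for k in range(j + 1, 10)][:20]
        else:
            pairs = [(j, k) for j in range(p) for k in range(j + 1, p)][:20]
        for j, k in pairs:  # 20 nonzero interactions (symmetric, zero diagonal)
            Theta[j, k] = Theta[k, j] = 1.0
    eta = X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)
    return X, y, beta, Theta

X, y, beta, Theta = simulate(design="hierarchical")
print(np.count_nonzero(beta), np.count_nonzero(np.triu(Theta)))  # 10 20
```

Changing `design` reproduces the three scenarios compared in Figure 6.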
When the real model is hierarchical, our method is the best and the lasso is the worst, as shown in Figure 6(a). When the real model only includes interaction variables, the all-pair (interactive) lasso is the best and our method takes second place, while the lasso is still the worst, as shown in Figure 6(b). The reason for this result is that when our method fits the model, the interaction variables are also treated as main variables. When the real model only includes main variables, the lasso is the best, our method again takes second place, and the all-pair lasso is the worst, as shown in Figure 6(c).
We believe that many actual classification problems could be hierarchical and interactive, containing both main variables and interaction variables. Our method fits this kind of situation.
6 Conclusion
Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily life activity identification datasets. The results show that interaction widely exists in classification models and demonstrate that the variable interactions contribute to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and some pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net method or the hierarchical group lasso method. The application of multisensor interaction in the daily life activities of the elderly is a new way of using our method.
Table 5 The experimental results of the traditional pattern recognition methods

Datasets                  Classifier   Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   SVM          3.42             0.0019   1.3092
                          LDA          3.96             0.0007   0.2429
                          QDA          4.92             0.0014   0.2309
                          K-NN         4.61             0.0030   0.6636
                          DT           4.99             0.0041   1.6201
Ionosphere                SVM          35.83            0.0063   2.9149
                          LDA          33.10            0.0090   0.6168
                          QDA          32.01            0.0076   0.4516
                          K-NN         28.30            0.0090   0.6363
                          DT           38.23            0.0408   3.0865
Liver disorders           SVM          30.80            0.0100   1.1401
                          LDA          31.31            0.0089   0.2474
                          QDA          40.19            0.0119   0.1958
                          K-NN         36.89            0.0124   0.6089
                          DT           36.94            0.0187   1.9921
Sonar                     SVM          25.70            0.0197   1.2181
                          LDA          25.13            0.0145   0.5897
                          QDA          23.80            0.0205   0.4594
                          K-NN         13.09            0.0101   0.5345
                          DT           35.91            0.0303   0.7153
Table 6 The experimental results for the Madelon datasets

                 Balanced error rate (%)              Area under the curve (%)
Methods          Training   Validation   Testing      Training   Validation   Testing
Lasso            38.08      37.21        36.61        64.78      62.79        63.40
All-pair lasso   34.98      37.01        36.57        65.02      62.99        63.43
Our method       30.00      36.35        35.12        70.00      68.08        67.76
Table 7 The experimental results of the three lasso methods and five traditional pattern recognition methods

Methods          Error rate in training (%)   Training time (s)   Error rate in testing (%)   Testing time (s)
Lasso            0.00                         0.2649              1.46                        0.0312
All-pair lasso   0.00                         0.8807              1.35                        0.0953
Our method       0.00                         0.6012              1.12                        0.0556
SVM              0.00                         0.9516              1.23                        0.0936
1-NN             0.00                         0.0000              9.20                        0.7956
3-NN             0.00                         0.0000              8.98                        0.7488
QDA              6.61                         1.4196              4.04                        0.2184
DT               0.00                         0.7332              14.93                       0.1092

Figure 6 The error rate of the three lasso methods in the simulation experiment (the Bayes error is shown by the purple dotted line in the graph): (a) hierarchical interaction; (b) interaction variables only; (c) main variables only

Appendices

A Proofs from (5) to (6)

For notational convenience we write $p_i$ instead of $p(x_i)$. The log-likelihood function of (4) is

$$L_Q = \frac{1}{N}\ln\left[L(\beta,\Theta)\right] = \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right) + \ln\left(1-p_i\right)\right]. \qquad (A.1)$$

First we give the first- and second-order partial derivatives and the mixed partial derivative of (A.1) with respect to $\beta$ and $\Theta$. Writing $\eta_i = x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i$, so that $p_i = \exp(\eta_i)/[1+\exp(\eta_i)]$:

$$\frac{\partial L_Q}{\partial \beta}
= \frac{1}{N}\sum_{i=1}^{N}\left[y_i x_i - \frac{\exp(\eta_i)}{1+\exp(\eta_i)}\,x_i\right]
= \frac{1}{N}\sum_{i=1}^{N} x_i\left(y_i - p_i\right),$$

$$\frac{\partial^2 L_Q}{\partial \beta^2}
= -\frac{1}{N}\sum_{i=1}^{N} x_i\,\frac{\partial p_i}{\partial \beta^T}
= -\frac{1}{N}\sum_{i=1}^{N} \left(1-p_i\right) p_i\, x_i x_i^T,$$

$$\frac{\partial L_Q}{\partial \Theta}
= \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{2}\,y_i\,x_i x_i^T - \frac{\exp(\eta_i)}{1+\exp(\eta_i)}\cdot\frac{1}{2}\,x_i x_i^T\right]
= \frac{1}{2N}\sum_{i=1}^{N} x_i x_i^T\left(y_i - p_i\right),$$

$$\frac{\partial^2 L_Q}{\partial \Theta^2}
= -\frac{1}{4N}\sum_{i=1}^{N} \left(1-p_i\right) p_i \left(x_i^T x_i\right)^2,
\qquad
\frac{\partial^2 L_Q}{\partial \beta\,\partial \Theta}
= -\frac{1}{2N}\sum_{i=1}^{N} \left(1-p_i\right) p_i \left(x_i^T x_i\right) x_i. \qquad (A.2)$$
Then (A.1) is expanded in a second-order Taylor series about the current estimates $(\tilde\beta, \tilde\Theta)$; we write $\tilde\eta_i = x_i^T\tilde\beta + \frac{1}{2}x_i^T\tilde\Theta x_i$ and $\tilde p_i = \exp(\tilde\eta_i)/[1+\exp(\tilde\eta_i)]$ for the corresponding fitted probabilities. Substituting the derivatives (A.2) and collecting the quadratic, linear, and constant terms gives

$$l_Q(\beta,\Theta)
= -\frac{1}{2N}\sum_{i=1}^{N} \tilde p_i\left(1-\tilde p_i\right)
\left[x_i^T\left(\tilde\beta-\beta\right) + \frac{1}{2}\,x_i^T\left(\tilde\Theta-\Theta\right)x_i
+ \frac{y_i-\tilde p_i}{\tilde p_i\left(1-\tilde p_i\right)}\right]^2 + C\left(\tilde\beta,\tilde\Theta\right),$$

where $C(\tilde\beta,\tilde\Theta)$ collects the terms that do not depend on $(\beta,\Theta)$, namely

$$C\left(\tilde\beta,\tilde\Theta\right)
= \frac{1}{N}\sum_{i=1}^{N}\left[y_i\tilde\eta_i + \ln\left(1-\tilde p_i\right)\right]
+ \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-\tilde p_i\right)^2}{\tilde p_i\left(1-\tilde p_i\right)}.$$

In terms of components, this is

$$l_Q(\beta,\Theta)
= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)^2 + C\left(\tilde\beta,\tilde\Theta\right), \qquad (A.3)$$

where

$$\omega_i = p\left(x_i\right)\left[1-p\left(x_i\right)\right],
\qquad
z_i = \sum_j \tilde\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik}
+ \frac{y_i - p\left(x_i\right)}{p\left(x_i\right)\left[1-p\left(x_i\right)\right]}, \qquad (A.4)$$

and $p(x_i)$, and hence $\omega_i$ and $z_i$, are evaluated at the current estimates $(\tilde\beta,\tilde\Theta)$.
B Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^+ - \beta_j^-$:

(1) $\beta_j^+ > 0$, $\beta_j^- = 0$:
$$\beta_j^+ - \beta_j^- = x_{ij}\left(z_i - A_i\right) + \beta_j^+ - \beta_j^- - \frac{\lambda_1 + \alpha_j}{\omega_i}; \qquad (B.1)$$

(2) $\beta_j^+ = 0$, $\beta_j^- > 0$:
$$\beta_j^+ - \beta_j^- = x_{ij}\left(z_i - A_i\right) + \beta_j^+ - \beta_j^- + \frac{\lambda_1 + \alpha_j}{\omega_i}; \qquad (B.2)$$

(3) $\beta_j^+ = \beta_j^- = 0$:
$$\beta_j^+ - \beta_j^- = x_{ij}\left(z_i - A_i\right) + \beta_j^+ - \beta_j^-. \qquad (B.3)$$

Combining the three cases, we derive the soft-thresholding form

$$\beta_j^+ - \beta_j^- = S\left[x_{ij}\left(z_i - A_i\right) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i}\right], \qquad (B.4)$$

where $S[a,b] = \operatorname{sign}(a)\max(|a|-b,0)$ is the soft-thresholding operator.

Secondly, the stationarity condition $\partial L / \partial \Theta_{jk} = 0$ gives

$$\frac{1}{2}\,\omega_i\cdot 2\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\left(-\frac{1}{2}\,x_{ij}x_{ik}\right) + \left(\lambda_2+\alpha_j\right)U_{jk} = 0,$$

$$\omega_i\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\frac{1}{2}\,x_{ij}x_{ik} = \left(\lambda_2+\alpha_j\right)U_{jk},$$

$$\left(z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\right)\cdot\frac{1}{2}\,x_{ij}x_{ik} = \frac{\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}. \qquad (B.5)$$

Supposing

$$\gamma^{(-jk)} = z_i - \sum_j \beta_j x_{ij} - \frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik} + \Theta_{jk}\,x_{ij}x_{ik}, \qquad (B.6)$$

we have

$$\left(\gamma^{(-jk)} - \Theta_{jk}\,x_{ij}x_{ik}\right)x_{ij}x_{ik} = \frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},$$

$$\Theta_{jk}\left(x_{ij}x_{ik}\right)^2 = \gamma^{(-jk)}x_{ij}x_{ik} - \frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},$$

$$\Theta_{jk} = \frac{\gamma^{(-jk)}x_{ij}x_{ik} - 2\left(\lambda_2+\alpha_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^2}. \qquad (B.7)$$

We discuss three cases for the value of $\Theta_{jk}$:

(1) $\Theta_{jk} > 0$, $U_{jk} = 1$:
$$\Theta_{jk} = \frac{\gamma^{(-jk)}x_{ij}x_{ik} - 2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}; \qquad (B.8)$$

(2) $\Theta_{jk} < 0$, $U_{jk} = -1$:
$$\Theta_{jk} = \frac{\gamma^{(-jk)}x_{ij}x_{ik} + 2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}; \qquad (B.9)$$

(3) $\Theta_{jk} = 0$.

Combining the cases, we derive

$$\Theta_{jk} = \frac{S\left[\gamma^{(-jk)}x_{ij}x_{ik},\ 2\left(\lambda_2+\alpha_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^2}. \qquad (B.10)$$
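Both (B.4) and (B.10) are applications of the soft-thresholding operator S[a, b] = sign(a) · max(|a| − b, 0). A small sketch (our notation; the scalar arguments stand in for the per-coordinate quantities above):

```python
import numpy as np

def soft_threshold(a, b):
    """S[a, b] = sign(a) * max(|a| - b, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

def theta_update(gamma_mjk, xij_xik, lam2, alpha_j, w):
    """Coordinate update for Theta_jk following the closed form (B.10)."""
    return soft_threshold(gamma_mjk * xij_xik, 2.0 * (lam2 + alpha_j) / w) / xij_xik**2

print(soft_threshold(3.0, 1.0))                  # 2.0
print(soft_threshold(0.5, 1.0))                  # 0.0 (shrunk to zero)
print(theta_update(2.0, 1.0, 0.25, 0.25, 1.0))   # S[2, 1] / 1 = 1.0
```

The thresholding is what sets small interaction coefficients exactly to zero and produces the sparse Theta reported in the experiments.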
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019, 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.
Mathematical Problems in Engineering 5
005
015
025
035
Cros
s-va
lidat
ion
erro
r
24 20 14 14 13 13 11 11 6Number of features
02
3
1 2 3 4 5
log(120582)
Figure 2 The results of breast-cancer-Wisconsin
028
030
032
034
036
038
177 139 104 83 73 53 30 9Number of features
3
Cros
s-va
lidat
ion
erro
r
log(120582)minus1 0 1 2 3
Figure 3 The results of Ionosphere
In what follows we compare our method to the existingliterature [13] The classification results and training time ofour method are better than those shown in the literature[13] The experimental results of the lasso all-pair lassoand conventional pattern recognition methods with 10-foldcross-validation of 20 times in the four UCI datasets arelisted respectively in Tables 3 4 and 5 Conventional patternrecognitionmethods include support vectormachine (SVM)linear discriminant analysis (LDA) quadratic discriminantanalysis (QDA)119870-nearest neighborhood (119870-NN) and deci-sion tree (DT) methods The lasso is a method that considersthe main variables without the interaction variables All-pairlasso is a method that considers the main variables and inter-action variables but without the hierarchy The experimental
025
030
035
33 32 32 28 26 24 24 19 12 8
040
log(120582)minus1 0 1 2 3
Number of features
Cros
s-va
lidat
ion
erro
r
Figure 4 The results of Liver disorders
03
05
124 112 97 88 77 70 42 18 8
02
04
log(120582)0 1 2 3
1Number of features
Cros
s-va
lidat
ion
erro
r
Figure 5 The results of Sonar
Table 3 The experimental results of the lasso penalized logisticregression model
Datasets Error rate() SD Time (s)
Breast-cancer-Wisconsin 322 00001 03744Ionosphere 3115 00075 04863Liver disorders 3077 00077 00908Sonar 2213 00121 156806
results show that our model is better in classification resultsand more stableThis highlights the advantage of the variableinteractions and hierarchy
52 The Experimental Results for High-Dimensional SmallSample Data The Madelon datasets from NIPS2003 was
6 Mathematical Problems in Engineering
Table 4 The experimental results of the all-pair lasso penalizedlogistic regression model
Datasets Error rate() SD Time (s)
Breast-cancer-Wisconsin 293 00001 121213Ionosphere 2700 00089 31734Liver disorders 2636 00058 49190Sonar 2063 00115 41041
used to evaluate our method The sample numbers of thetraining validation and testing of the data were respectively2000 600 and 1800 The class number is 2 The variabledimension is 500 So the interactive dimension is 124750You can find more information about the datasets followingthe link httpwwwnipsfscecssotonacuk you can alsodownload the datasets and see the results of challengesbalance error rates and the area under the curve The modelis trained by using a training set The model parameters areselected by using a validation set Also the prediction resultsof the final model using the test set are uploaded online Theclassification score of the final model is obtained Our resultsare shown in Table 6 The results show that our method isslightly better than the lasso and all-pair lasso This impliesthat the interactions may also be important in the Madelondatasets
53 Activity Recognition (AR) Using Inertial Sensors of Smart-phones Anguita et al collected sensor data of smartphones[10] They used the support vector machine (SVM) methodto solve the classification problem of the daily life activityrecognition These results play an extremely significant rolein disability and elderly care Datasets can be downloadedfollowing the literature [10] 30 volunteers aged 19ndash48 yearsparticipated in the study Each person performed six activitieswearing the smartphone on the waist To obtain the dataclass label experiments were conducted using video record-ing The smartphone used in the experiments had built-inaccelerometer and gyroscope for measuring 3D linear accel-eration and angular acceleration The sampling frequencywas 50Hz which is more than enough for capturing humanmovements
We use the datasets to evaluate our method We use theupstairs and downstairs movements as two active classesThetraining sets have 986 samples and 1073 samples respectivelyThe test sets have 420 samples and 471 samples respectivelyThe variable dimension is 561 which includes the time andfrequency from sensor signals
Experimental results of the three lassomethods and somepattern recognitionmethods are shown inTable 7The resultsshow that our method is better than the pattern recognitionmethods since it takes the variable selection and interactioninto account Our method achieves the best classificationresults with less training and testing time
54 The Numerical Simulation Results and Discussion Nowsuppose that the number and dimension of the samples are
119899 = 200 119901 = 20 We take interactions into considerationand provide the following three kinds of simulation based onformula (1)
(1) The real model is hierarchical Θ119895119896
= 0 rArr 120573119895
= 0 or120573119896
= 0 119895 119896 = 1 119901 There are 10 nonzero elementsin 120573 and 20 nonzero elements in Θ
(2) The real model only includes interactive variables120573119895= 0 119895 = 1 119901 There are 20 nonzero elements
in Θ
(3) The real model only includes main variables Θ119895119896
=0 119895 119896 = 1 119901 There are 10 nonzero elements in 120573
The SNR of the main variables is 15 and SNR of theinteraction variables is 1 The experiment results of 100 timesare shown in Figure 6
When the real model is hierarchical our method is thebest and the lasso is the worst This is shown in Figure 6(a)When the real model only includes interaction variances theinteractive lasso is the best and our method takes the secondplace while the lasso is still theworst as shown in Figure 6(b)The reason for this result is that when our method fitsthe model the interaction variables are considered to bemain variables When the real model only includes mainvariables the lasso is the best and our method still takes thesecond place and the all-pair lasso is the worst as shown inFigure 6(c)
We believe that many actual classification problems could be hierarchical and interactive; they contain both main variables and interaction variables. Our method fits this kind of situation.
6. Conclusion
Taking into consideration the interaction between variables, the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm is derived. We provide the model definition, the constraint condition, and the convex relaxation condition for the model. We obtain a solution for the coefficients of the proposed model based on convex optimization and the coordinate descent algorithm. We further provide experimental results based on four UCI datasets, the NIPS2003 feature selection challenge datasets, and real daily life activity identification datasets. The results show that interaction widely exists in classification models and that variable interaction contributes to the response. The classification performance of our method is superior to the lasso, the all-pair lasso, and several pattern recognition methods. It turns out that variable interaction and hierarchy are two important factors. Our further research is planned as follows: other convex optimization methods, including the generalized gradient descent method and the alternating direction method of multipliers; the hierarchical interactive lasso penalized multiclass logistic regression method; and the elastic net or hierarchical group lasso methods. The application of multisensor interaction to the daily life activities of the elderly is a new way of using our method.
Mathematical Problems in Engineering 7
Table 5: The experimental results of the traditional pattern recognition methods.

| Datasets | Classifier | Error rate (%) | SD | Time (s) |
| --- | --- | --- | --- | --- |
| Breast-cancer-Wisconsin | SVM | 3.42 | 0.0019 | 1.3092 |
| | LDA | 3.96 | 0.0007 | 0.2429 |
| | QDA | 4.92 | 0.0014 | 0.2309 |
| | K-NN | 4.61 | 0.0030 | 0.6636 |
| | DT | 4.99 | 0.0041 | 1.6201 |
| Ionosphere | SVM | 35.83 | 0.0063 | 2.9149 |
| | LDA | 33.10 | 0.0090 | 0.6168 |
| | QDA | 32.01 | 0.0076 | 0.4516 |
| | K-NN | 28.30 | 0.0090 | 0.6363 |
| | DT | 38.23 | 0.0408 | 3.0865 |
| Liver disorders | SVM | 30.80 | 0.0100 | 1.1401 |
| | LDA | 31.31 | 0.0089 | 0.2474 |
| | QDA | 40.19 | 0.0119 | 0.1958 |
| | K-NN | 36.89 | 0.0124 | 0.6089 |
| | DT | 36.94 | 0.0187 | 1.9921 |
| Sonar | SVM | 25.70 | 0.0197 | 1.2181 |
| | LDA | 25.13 | 0.0145 | 0.5897 |
| | QDA | 23.80 | 0.0205 | 0.4594 |
| | K-NN | 13.09 | 0.0101 | 0.5345 |
| | DT | 35.91 | 0.0303 | 0.7153 |
Table 6: The experimental results for the Madelon datasets.

| Methods | BER training (%) | BER validation (%) | BER testing (%) | AUC training (%) | AUC validation (%) | AUC testing (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Lasso | 38.08 | 37.21 | 36.61 | 64.78 | 62.79 | 63.40 |
| All-pair lasso | 34.98 | 37.01 | 36.57 | 65.02 | 62.99 | 63.43 |
| Our method | 30.00 | 36.35 | 35.12 | 70.00 | 68.08 | 67.76 |

(BER: balanced error rate; AUC: area under the curve.)
Appendices
A. Proofs from (5) to (6)

For notational convenience we write $p_i$ instead of $p(x_i)$. The log-likelihood function of (4) is

$$
L_Q = \frac{1}{N}\ln\big[L(\beta,\Theta)\big]
= \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}\,x_i^T\Theta x_i\Big)+\ln(1-p_i)\Big]. \tag{A.1}
$$
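For reference, (A.1) can be evaluated directly from the quadratic linear predictor. The sketch below is our own minimal implementation; all names are assumptions, not the authors' code.

```python
import numpy as np

def log_likelihood(X, y, beta, Theta):
    """Average log-likelihood L_Q of the quadratic logistic model (A.1).

    eta_i = x_i^T beta + 0.5 * x_i^T Theta x_i,  p_i = sigmoid(eta_i).
    """
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    # y_i * eta_i + ln(1 - p_i): the Bernoulli log-likelihood in logit form
    return np.mean(y * eta + np.log(1.0 - p))
```

At $\beta=0$, $\Theta=0$ every $p_i=1/2$, so the value reduces to $\ln(1/2)$, a quick sanity check.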
First we give the first- and second-order partial derivatives and the mixed partial derivative of (4) with respect to $\beta$ and $\Theta$:

$$
\frac{\partial L_Q}{\partial \beta}
= \frac{\partial}{\partial \beta}\,\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\Big)+\ln(1-p_i)\Big]
= \frac{1}{N}\sum_{i=1}^{N}\Big[y_i x_i-\frac{\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}{1+\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}\,x_i\Big]
= \frac{1}{N}\sum_{i=1}^{N}x_i\,(y_i-p_i),
$$

$$
\frac{\partial^2 L_Q}{\partial \beta^2}
= \frac{\partial}{\partial \beta}\,\frac{1}{N}\sum_{i=1}^{N}x_i\,(y_i-p_i)
= -\frac{1}{N}\sum_{i=1}^{N}x_i\,\frac{\partial p_i}{\partial \beta}
= -\frac{1}{N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i x_i^T,
$$
Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

| Methods | Error rate in training (%) | Training time (s) | Error rate in testing (%) | Testing time (s) |
| --- | --- | --- | --- | --- |
| Lasso | 0.00 | 0.2649 | 1.46 | 0.0312 |
| All-pair lasso | 0.00 | 0.8807 | 1.35 | 0.0953 |
| Our method | 0.00 | 0.6012 | 1.12 | 0.0556 |
| SVM | 0.00 | 0.9516 | 1.23 | 0.0936 |
| 1-NN | 0.00 | 0.0000 | 9.20 | 0.7956 |
| 3-NN | 0.00 | 0.0000 | 8.98 | 0.7488 |
| QDA | 6.61 | 1.4196 | 4.04 | 0.2184 |
| DT | 0.00 | 0.7332 | 14.93 | 0.1092 |
Figure 6: The error rate of the three lasso methods (lasso, our method, all-pair lasso) in the simulation experiment: (a) hierarchical interaction; (b) interaction variables only; (c) main variables only. The Bayes error is shown by the purple dotted line in the graph.
$$
\frac{\partial L_Q}{\partial \Theta}
= \frac{\partial}{\partial \Theta}\,\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\Big)+\ln(1-p_i)\Big]
= \frac{1}{N}\sum_{i=1}^{N}\Big[\frac{1}{2}\,y_i\,x_i x_i^T-\frac{\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}{1+\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}\cdot\frac{1}{2}\,x_i x_i^T\Big]
= \frac{1}{2N}\sum_{i=1}^{N}x_i x_i^T\,(y_i-p_i),
$$

$$
\frac{\partial^2 L_Q}{\partial \Theta^2}
= \frac{\partial}{\partial \Theta}\,\frac{1}{2N}\sum_{i=1}^{N}x_i x_i^T\,(y_i-p_i)
= -\frac{1}{2N}\sum_{i=1}^{N}x_i^T x_i\,\frac{\partial p_i}{\partial \Theta}
= -\frac{1}{4N}\sum_{i=1}^{N}(1-p_i)\,p_i\,\|x_i\|^4,
$$

$$
\frac{\partial^2 L_Q}{\partial \beta\,\partial \Theta}
= -\frac{1}{N}\sum_{i=1}^{N}x_i^T\,\frac{\partial p_i}{\partial \Theta}
= -\frac{1}{2N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i^T x_i\,x_i^T. \tag{A.2}
$$
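The closed form for $\partial L_Q/\partial\beta$ above can be sanity-checked against a central finite difference of $L_Q$. This is our own verification sketch (the names, test dimensions, and step size are our choices), run here with $\Theta = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 4
X = rng.standard_normal((N, d))
y = rng.binomial(1, 0.5, size=N).astype(float)
beta = 0.1 * rng.standard_normal(d)
Theta = np.zeros((d, d))

def L_Q(b):
    # Average log-likelihood; ln(1 - p_i) = -ln(1 + exp(eta_i))
    eta = X @ b + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    return np.mean(y * eta - np.log1p(np.exp(eta)))

# Closed form (A.2): (1/N) sum_i x_i (y_i - p_i)
eta = X @ beta
prob = 1.0 / (1.0 + np.exp(-eta))
grad_closed = X.T @ (y - prob) / N

# Central finite difference along each coordinate direction
h = 1e-6
grad_fd = np.array([(L_Q(beta + h * e) - L_Q(beta - h * e)) / (2 * h)
                    for e in np.eye(d)])
```

The two gradients should agree to well below the finite-difference tolerance.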
Then (A.1) is expanded in a Taylor series about the expansion point $(\tilde\beta,\tilde\Theta)$:

$$
l_Q(\beta,\Theta)
= L_Q+(\beta-\tilde\beta)^T\frac{\partial L_Q}{\partial\beta}
+\Big\langle \Theta-\tilde\Theta,\,\frac{\partial L_Q}{\partial\Theta}\Big\rangle
+\frac{1}{2}\Big[(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta^2}\,(\beta-\tilde\beta)
+2\,(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}\,(\Theta-\tilde\Theta)
+\Big\langle \Theta-\tilde\Theta,\,\frac{\partial^2 L_Q}{\partial\Theta^2}\,(\Theta-\tilde\Theta)\Big\rangle\Big].
$$

Substituting the derivatives from (A.2), the linear and quadratic terms combine into a single square in the residual $y_i-p_i$:

$$
l_Q(\beta,\Theta)
= -\frac{1}{2N}\sum_{i=1}^{N}(1-p_i)\,p_i
\Big[x_i^T(\beta-\tilde\beta)+\frac{1}{2}\,x_i^T(\Theta-\tilde\Theta)\,x_i
+\frac{y_i-p_i}{p_i\,(1-p_i)}\Big]^2+C(\tilde\beta,\tilde\Theta),
$$

where $C(\tilde\beta,\tilde\Theta)$ collects the terms that do not depend on $(\beta,\Theta)$. Writing the bracket componentwise gives the weighted least-squares form

$$
l_Q(\beta,\Theta)
= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i
\Big(z_i-\sum_j \beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)^2
+C(\tilde\beta,\tilde\Theta), \tag{A.3}
$$

where

$$
\omega_i = p(x_i)\,\big[1-p(x_i)\big],\qquad
z_i = \sum_j \tilde\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik}
+\frac{y_i-p(x_i)}{p(x_i)\,\big[1-p(x_i)\big]}. \tag{A.4}
$$
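In code, one reweighting step builds $\omega_i$ and $z_i$ from (A.4) at the current estimates. This is a minimal sketch; the clipping of $p_i$ away from 0 and 1 is our own numerical safeguard, not part of the derivation, and all names are assumptions.

```python
import numpy as np

def working_response(X, y, beta, Theta, eps=1e-5):
    """Weights w_i and working response z_i of (A.4) at (beta, Theta)."""
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    p = np.clip(p, eps, 1.0 - eps)   # keep the weights strictly positive
    w = p * (1.0 - p)
    # z_i = eta_i + (y_i - p_i) / (p_i (1 - p_i)), since eta_i is the
    # current fitted linear predictor appearing in (A.4)
    z = eta + (y - p) / w
    return w, z
```

At $\beta=0$, $\Theta=0$ this gives $w_i = 1/4$ and $z_i = 4y_i - 2$, which is a convenient hand check.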
B. Proofs from (12) to (13)

First, there are three cases in calculating $\beta_j^+-\beta_j^-$:

(1) $\beta_j^+>0$, $\beta_j^-=0$:
$$
\beta_j^+-\beta_j^- = x_{ij}\,(z_i-A_i)+\beta_j^+-\beta_j^- -\frac{\lambda_1+\alpha_j}{\omega_i}. \tag{B.1}
$$

(2) $\beta_j^+=0$, $\beta_j^->0$:
$$
\beta_j^+-\beta_j^- = x_{ij}\,(z_i-A_i)+\beta_j^+-\beta_j^- +\frac{\lambda_1+\alpha_j}{\omega_i}. \tag{B.2}
$$

(3) $\beta_j^+=0$, $\beta_j^-=0$:
$$
\beta_j^+-\beta_j^- = x_{ij}\,(z_i-A_i)+\beta_j^+-\beta_j^-. \tag{B.3}
$$

We derive

$$
\beta_j^+-\beta_j^- = S\Big[x_{ij}\,(z_i-A_i)+\beta_j^+-\beta_j^-,\;\frac{\lambda_1+\alpha_j}{\omega_i}\Big]. \tag{B.4}
$$

Secondly, the stationarity condition $\partial L/\partial\Theta_{jk}=0$ gives

$$
\frac{1}{2}\,\omega_i\cdot 2\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)\cdot\Big(-\frac{1}{2}\,x_{ij}x_{ik}\Big)+(\lambda_2+\alpha_j)\,U_{jk}=0,
$$
$$
\omega_i\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)\cdot\Big(\frac{1}{2}\,x_{ij}x_{ik}\Big)=(\lambda_2+\alpha_j)\,U_{jk},
$$
$$
\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)\cdot\Big(\frac{1}{2}\,x_{ij}x_{ik}\Big)=\frac{(\lambda_2+\alpha_j)\,U_{jk}}{\omega_i}. \tag{B.5}
$$

Supposing

$$
\gamma^{(-jk)} = z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}+\Theta_{jk}\,x_{ij}x_{ik}, \tag{B.6}
$$

we have

$$
\big(\gamma^{(-jk)}-\Theta_{jk}\,x_{ij}x_{ik}\big)\,(x_{ij}x_{ik})=\frac{2(\lambda_2+\alpha_j)\,U_{jk}}{\omega_i},
$$
$$
\Theta_{jk}\,(x_{ij}x_{ik})^2 = \gamma^{(-jk)}\,x_{ij}x_{ik}-\frac{2(\lambda_2+\alpha_j)\,U_{jk}}{\omega_i},
$$
$$
\Theta_{jk} = \frac{\gamma^{(-jk)}\,x_{ij}x_{ik}-2(\lambda_2+\alpha_j)\,U_{jk}/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.7}
$$

We discuss three cases for the value of $\Theta_{jk}$:

(1) $\Theta_{jk}>0$, $U_{jk}=1$:
$$
\Theta_{jk}=\frac{\gamma^{(-jk)}\,x_{ij}x_{ik}-2(\lambda_2+\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.8}
$$

(2) $\Theta_{jk}<0$, $U_{jk}=-1$:
$$
\Theta_{jk}=\frac{\gamma^{(-jk)}\,x_{ij}x_{ik}+2(\lambda_2+\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B.9}
$$

(3) $\Theta_{jk}=0$.

We derive

$$
\Theta_{jk}=\frac{S\big[\gamma^{(-jk)}\,x_{ij}x_{ik},\;2(\lambda_2+\alpha_j)/\omega_i\big]}{(x_{ij}x_{ik})^2}. \tag{B.10}
$$
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.
2
+ (120573 minus 120573)119879
(Θ minus Θ) 119909119894
sdot 119909119879119894119909119894+1
4[119909119879119894(Θ minus Θ) 119909
119894]2
+1
119873
119873
sum119894=1
[119910119894(119909119879119894120573 +
1
2119909119879119894Θ119909119894) + ln (1 minus 119901
119894)]
+1
119873
119873
sum119894=1
119909119879119894(120573 minus 120573) (119910
119894minus 119901119894)
+1
2119873
119873
sum119894=1
119909119879119894(Θ minus Θ) 119909
119894(119910119894minus 119901119894)
= minus1
2119873
119873
sum119894=1
(1 minus 119901119894)
10 Mathematical Problems in Engineering
sdot 119901119894119909119879119894(120573 minus 120573)
+1
2119909119879119894(Θ minus Θ) 119909
1198942
+1
119873
119873
sum119894=1
119909119879119894(120573 minus 120573) (119910
119894minus 119901119894)
+1
2119873
119873
sum119894=1
119909119879119894(Θ minus Θ) 119909
119894(119910119894minus 119901119894)
+1
119873
119873
sum119894=1
[119910119894(119909119879119894120573 +
1
2119909119879119894Θ119909119894) + ln (1 minus 119901
119894)]
= minus1
2119873
119873
sum119894=1
(1 minus 119901119894)
sdot 119901119894119909119879119894(120573 minus 120573) +
1
2119909119879119894(Θ minus Θ) 119909
119894
+119910119894minus 119901119894
119901119894[1 minus 119901
119894]
2
+1
119873
119873
sum119894=1
[119910119894(119909119879119894120573 +
1
2119909119879119894Θ119909119894) + ln (1 minus 119901
119894)]
minus1
2119873
119873
sum119894=1
[119910119894minus 119901119894
119901119894[1 minus 119901
119894]]
2
= minus1
2119873
119873
sum119894=1
120596119894(119911119894minussum119895
120573119895119909119894119895minus1
2sum119895 =119896
Θ119895119896119909119894119895119909119894119896)
2
+1
119873
119873
sum119894=1
[119910119894(119909119879119894120573 +
1
2119909119879119894Θ119909119894) + ln (1 minus 119901
119894)]
minus1
2119873
119873
sum119894=1
[119910119894minus 119901119894
119901119894[1 minus 119901
119894]]
2
(A3)
where
120596119894= 119901 (119909
119894) [1 minus 119901 (119909
119894)]
119911119894= sum119895
120573119895119909119894119895+1
2sum119895 =119896
Θ119895119896119909119894119895119909119894119896
+119910119894minus 119901 (119909
119894)
119901 (119909119894) [1 minus 119901 (119909
119894)]
(A4)
B Proofs from (12) to (13)
First there are three cases in calculating 120573+119895minus 120573minus119895
(1) 120573+119895ge 0 120573minus
119895= 0
120573+119895minus 120573minus119895= 119909119879119895(119911119894minus 119860) + 120573+
119895minus 120573minus119895minus1205821+ 119895
120596119894
(B1)
(2) 120573+119895ge 0 120573minus
119895= 0
120573+119895minus 120573minus119895= 119909119879119895(119911119894minus 119860) + 120573+
119895minus 120573minus119895+1205821+ 119895
120596119894
(B2)
(3) 120573+119895ge 0 120573minus
119895= 0
120573+119895minus 120573minus119895= 119909119879119895(119911119894minus 119860) + 120573+
119895minus 120573minus119895 (B3)
We derive
120573+119895minus 120573minus119895= 119878 [119909
119894119895(119911119894minus 119860119894) + 120573+119895minus 120573minus1198951205821+ 119895
120596119894
] (B4)
Secondly we give the following by the stationary condi-tion 120597119871120597Θ
119895119896= 0
1
2120596119894sdot 2(119911
119894minussum119895
120573119895119909119894119895minus1
2sum119895 =119896
Θ119895119896119909119894119895119909119894119896)
sdot (minus1
2119909119894119895119909119894119896) + (120582
2+ 120572119895)119880119895119896= 0
120596119894sdot (119911119894minussum119895
120573119895119909119894119895minus1
2sum119895 =119896
Θ119895119896119909119894119895119909119894119896)
sdot (1
2119909119894119895119909119894119896) = (120582
2+ 120572119895)119880119895119896
(119911119894minussum119895
120573119895119909119894119895minus1
2sum119895 =119896
Θ119895119896119909119894119895119909119894119896)
sdot (1
2119909119894119895119909119894119896) =
(1205822+ 120572119895)119880119895119896
120596119894
(B5)
Supposing
120574(minus119895119896) = 119911119894minussum119895
120573119895119909119894119895minus1
2sum119895 =119896
Θ119895119896119909119894119895119909119894119896+ Θ119895119896119909119894119895119909119894119896 (B6)
we have
(120574(minus119895119896) minus Θ119895119896119909119894119895119909119894119896) sdot (119909119894119895119909119894119896)
=2 (1205822+ 120572119895)119880119895119896
120596119894
Θ119895119896119909119894119895119909119894119896119909119894119895119909119894119896
= minus2 (1205822+ 120572119895)119880119895119896
120596119894
+ 120574(minus119895119896) sdot 119909119894119895119909119894119896
Mathematical Problems in Engineering 11
Θ119895119896
=120574(minus119895119896) sdot 119909
119894119895119909119894119896minus 2 (120582
2+ 120572119895)119880119895119896120596119894
(119909119894119895119909119894119896)2
(B7)
We discuss three cases about Θ119895119896value
(1) Θ119895119896gt 0 119880
119895119896= 1
Θ119895119896=120574(minus119895119896) sdot 119909
119894119895119909119894119896minus 2 (120582
2+ 120572119895) 120596119894
(119909119894119895119909119894119896)2
(B8)
(2) Θ119895119896lt 0 119880
119895119896= minus1
Θ119895119896=120574(minus119895119896) sdot 119909
119894119895119909119894119896+ 2 (120582
2+ 120572119895) 120596119894
(119909119894119895119909119894119896)2
(B9)
(3) Θ119895119896= 0
We drive
Θ119895119896=119878 [120574(minus119895119896) sdot 119909
119894119895119909119894119896 2 (1205822+ 120572119895) 120596119894]
(119909119894119895119909119894119896)2
(B10)
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (nos 61273019 61473339) China Post-doctoral Science Foundation (2014M561202) Hebei Postdoc-toral Science Foundation Special Fund Project and HebeiTop Young Talents Support Program
References
[1] R Tibshirani ldquoRegression shrinkage and selection via the lassordquoJournal of the Royal Statistical Society B vol 58 no 1 pp 267ndash288 1996
[2] M Y Park and T Hastie ldquo1198711-regularization path algorithm
for generalized linear modelsrdquo Journal of the Royal StatisticalSociety B Statistical Methodology vol 69 no 4 pp 659ndash6772007
[3] T TWu Y F Chen THastie E Sobel andK Lange ldquoGenome-wide association analysis by lasso penalized logistic regressionrdquoBioinformatics vol 25 no 6 pp 714ndash721 2009
[4] J Friedman T Hastie and R Tibshirani ldquoRegularization pathsfor generalized linear models via coordinate descentrdquo Journal ofStatistical Software vol 33 no 1 pp 1ndash22 2010
[5] M Lim and T Hastie ldquoLearning interactions via hierarchicalgroup-lasso regularization[EB]rdquo httpwwwstanfordedusimhastiePapersglinternetpdf
[6] H Schwender and K Ickstadt ldquoIdentification of SNP interac-tions using logic regressionrdquo Biostatistics vol 9 no 1 pp 187ndash198 2008
[7] S Noah and R Tibshirani ldquoA Permutation Approach to TestingInteractions inManyDimensionsrdquo httpstatwebstanfordedusimtibsresearchhtml
[8] JWu BDevlin S RingquistM Trucco andK Roeder ldquoScreenand clean a tool for identifying interactions in genome-wideassociation studiesrdquo Genetic Epidemiology vol 34 no 3 pp275ndash285 2010
[9] Y Nardi and A Rinaldo ldquoThe log-linear group-lasso estimatorand its asymptotic propertiesrdquo Bernoulli vol 18 no 3 pp 945ndash974 2012
[10] M Yuan V R Joseph and Y Lin ldquoAn efficient variable selectionapproach for analyzing designed experimentsrdquo Technometricsvol 49 no 4 pp 430ndash439 2007
[11] H Chipman ldquoBayesian variable selection with related predic-torsrdquoThe Canadian Journal of Statistics vol 24 no 1 pp 17ndash361996
[12] N H Choi W Li and J Zhu ldquoVariable selection with thestrong heredity constraint and its oracle propertyrdquo Journal of theAmerican Statistical Association vol 105 no 489 pp 354ndash3642010
[13] J Bien J Taylor and R Tibshirani ldquoA lasso for hierarchicalinteractionsrdquoTheAnnals of Statistics vol 41 no 3 pp 1111ndash11412013
[14] M Yuan and Y Lin ldquoModel selection and estimation in regres-sion with grouped variablesrdquo Journal of the Royal StatisticalSociety Series B StatisticalMethodology vol 68 no 1 pp 49ndash672006
[15] R Jenatton J-Y Audibert and F Bach ldquoStructured variableselection with sparsity-inducing normsrdquo Journal of MachineLearning Research vol 12 no 10 pp 2777ndash2824 2011
[16] P Radchenko and G M James ldquoVariable selection usingadaptive nonlinear interaction structures in high dimensionsrdquoJournal of the American Statistical Association vol 105 no 492pp 1541ndash1553 2010
[17] F Bach R Jenatton J Mairal and G Obozinski ldquoStructuredsparsity through convex optimizationrdquo Statistical Science vol27 no 4 pp 450ndash468 2012
[18] I Ruczinski C Kooperberg and M LeBlanc ldquoLogic regres-sionrdquo Journal of Computational and Graphical Statistics vol 12no 3 pp 475ndash511 2003
[19] P Hall and J-H Xue ldquoOn selecting interacting features fromhigh-dimensional datardquo Computational Statistics amp Data Anal-ysis vol 71 pp 694ndash708 2014
[20] J-J Wang J Li T Zhang and W-X Hong ldquoDistinguishingvisual feature extraction method using quadratic map andgenetic algorithmrdquo Journal of System Simulation vol 21 no 16pp 5080ndash5083 2009
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 7
Table 5: The experimental results of the traditional pattern recognition methods.

Dataset                   Classifier   Error rate (%)   SD       Time (s)
Breast-cancer-Wisconsin   SVM          3.42             0.0019   1.3092
                          LDA          3.96             0.0007   0.2429
                          QDA          4.92             0.0014   0.2309
                          K-NN         4.61             0.0030   0.6636
                          DT           4.99             0.0041   1.6201
Ionosphere                SVM          35.83            0.0063   2.9149
                          LDA          33.10            0.0090   0.6168
                          QDA          32.01            0.0076   0.4516
                          K-NN         28.30            0.0090   0.6363
                          DT           38.23            0.0408   3.0865
Liver disorders           SVM          30.80            0.0100   1.1401
                          LDA          31.31            0.0089   0.2474
                          QDA          40.19            0.0119   0.1958
                          K-NN         36.89            0.0124   0.6089
                          DT           36.94            0.0187   1.9921
Sonar                     SVM          25.70            0.0197   1.2181
                          LDA          25.13            0.0145   0.5897
                          QDA          23.80            0.0205   0.4594
                          K-NN         13.09            0.0101   0.5345
                          DT           35.91            0.0303   0.7153
Table 6: The experimental results for the Madelon datasets.

                  Balance error rate (%)           Area under the curve (%)
Method            Training  Validation  Testing    Training  Validation  Testing
Lasso             38.08     37.21       36.61      64.78     62.79       63.40
All-pair lasso    34.98     37.01       36.57      65.02     62.99       63.43
Our method        30.00     36.35       35.12      70.00     68.08       67.76
Appendices
A. Proofs from (5) to (6)

For notational convenience, we write p_i instead of p(x_i). The logarithmic likelihood function of (4) is

$$
L_Q = \frac{1}{N}\ln\left[L(\beta,\Theta)\right]
= \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\Big) + \ln(1-p_i)\Big].
\tag{A1}
$$
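As a quick numerical sanity check (not part of the paper), (A1) can be evaluated directly. The function name and the synthetic data below are illustrative only; the identity y·η + ln(1−p) = y·η − ln(1+e^η) is used for numerical stability.

```python
import numpy as np

def log_likelihood_Q(X, y, beta, Theta):
    """Average log-likelihood L_Q of (A1) for the quadratic logistic model.

    eta_i = x_i^T beta + 0.5 * x_i^T Theta x_i, and p_i = sigmoid(eta_i),
    so y_i * eta_i + ln(1 - p_i) = y_i * eta_i - ln(1 + exp(eta_i)).
    """
    eta = X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X)
    # ln(1 - p_i) = -ln(1 + exp(eta_i)), computed stably with logaddexp
    return np.mean(y * eta - np.logaddexp(0.0, eta))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.integers(0, 2, size=50).astype(float)
beta = np.zeros(3)
Theta = np.zeros((3, 3))
# With beta = Theta = 0, p_i = 1/2 for every i, so L_Q = -ln 2.
print(np.isclose(log_likelihood_Q(X, y, beta, Theta), -np.log(2)))
```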
First, we give the first- and second-order partial derivatives and the mixed partial derivative of (A1) with respect to β and Θ. For the first derivative with respect to β,

$$
\frac{\partial L_Q}{\partial\beta}
= \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\Big)+\ln(1-p_i)\Big]
= \frac{1}{N}\sum_{i=1}^{N}\Big[y_i x_i
- \frac{\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}{1+\exp\big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\big)}\,x_i\Big]
= \frac{1}{N}\sum_{i=1}^{N}x_i\,(y_i-p_i).
$$

Since ∂p_i/∂β = p_i(1−p_i)x_i,

$$
\frac{\partial^2 L_Q}{\partial\beta^2}
= \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N}x_i\,(y_i-p_i)
= -\frac{1}{N}\sum_{i=1}^{N}x_i\,\frac{\partial p_i}{\partial\beta^T}
= -\frac{1}{N}\sum_{i=1}^{N}(1-p_i)\,p_i\,x_i x_i^T.
$$
Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Method           Error rate in training (%)   Training time (s)   Error rate in testing (%)   Testing time (s)
Lasso            0.00                         0.2649              1.46                        0.0312
All-pair lasso   0.00                         0.8807              1.35                        0.0953
Our method       0.00                         0.6012              1.12                        0.0556
SVM              0.00                         0.9516              1.23                        0.0936
1-NN             0.00                         0.0000              9.20                        0.7956
3-NN             0.00                         0.0000              8.98                        0.7488
QDA              6.61                         1.4196              4.04                        0.2184
DT               0.00                         0.7332              14.93                       0.1092

[Figure 6 appears here: box plots of the error rate (vertical axis, 0.015–0.055) of Lasso, Our method, and All-pair lasso in three simulation settings: (a) hierarchical interaction; (b) interaction variances only; (c) main variances only.]

Figure 6: The error rate of the three lasso methods in the simulation experiment (the Bayes error is shown by the purple dotted line in the graph).

For the first derivative with respect to Θ, since ∂(x_i^TΘx_i)/∂Θ = x_i x_i^T,

$$
\frac{\partial L_Q}{\partial\Theta}
= \frac{\partial}{\partial\Theta}\,\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\beta+\frac{1}{2}x_i^T\Theta x_i\Big)+\ln(1-p_i)\Big]
= \frac{1}{N}\sum_{i=1}^{N}\Big(\frac{1}{2}y_i\,x_i x_i^T-\frac{1}{2}p_i\,x_i x_i^T\Big)
= \frac{1}{2N}\sum_{i=1}^{N}x_i x_i^T\,(y_i-p_i).
$$

Similarly, using ∂p_i/∂Θ = (1/2)p_i(1−p_i)x_i x_i^T,

$$
\frac{\partial^2 L_Q}{\partial\Theta^2}
= \frac{\partial}{\partial\Theta}\,\frac{1}{2N}\sum_{i=1}^{N}x_i x_i^T\,(y_i-p_i)
= -\frac{1}{2N}\sum_{i=1}^{N}x_i x_i^T\cdot\frac{\partial p_i}{\partial\Theta}
= -\frac{1}{4N}\sum_{i=1}^{N}(1-p_i)\,p_i\,\|x_i\|^4,
$$

$$
\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}
= \frac{\partial}{\partial\Theta}\,\frac{1}{N}\sum_{i=1}^{N}x_i\,(y_i-p_i)
= -\frac{1}{N}\sum_{i=1}^{N}x_i\,\frac{\partial p_i}{\partial\Theta}
= -\frac{1}{2N}\sum_{i=1}^{N}(1-p_i)\,p_i\,\|x_i\|^2\,x_i^T.
\tag{A2}
$$
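The closed-form gradient in (A2) can be checked against finite differences. The sketch below (illustrative names and synthetic data, not from the paper) verifies ∂L_Q/∂β = (1/N) Σ_i x_i(y_i − p_i):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_lik(X, y, beta, Theta):
    """Average log-likelihood L_Q of (A1)."""
    eta = X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X)
    return np.mean(y * eta - np.logaddexp(0.0, eta))

def grad_beta(X, y, beta, Theta):
    """Closed-form gradient (1/N) * sum_i x_i (y_i - p_i) from (A2)."""
    p = sigmoid(X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X))
    return X.T @ (y - p) / len(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = rng.integers(0, 2, size=40).astype(float)
beta = rng.normal(size=3)
Theta = 0.1 * np.eye(3)

# Central finite difference of L_Q in each coordinate of beta.
eps = 1e-6
num = np.array([(log_lik(X, y, beta + eps * e, Theta)
                 - log_lik(X, y, beta - eps * e, Theta)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(num, grad_beta(X, y, beta, Theta), atol=1e-6))
```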
Then (A1) is expanded in a second-order Taylor series about the expansion point (β̃, Θ̃), where p̃_i denotes p(x_i) evaluated at this point and all derivatives in (A2) are evaluated there as well:

$$
l_Q(\beta,\Theta)
= L_Q + (\beta-\tilde\beta)^T\frac{\partial L_Q}{\partial\beta}
+ (\Theta-\tilde\Theta)\cdot\frac{\partial L_Q}{\partial\Theta}
+ \frac{1}{2}\Big[(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta^2}(\beta-\tilde\beta)
+ 2\,(\beta-\tilde\beta)^T\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}(\Theta-\tilde\Theta)
+ (\Theta-\tilde\Theta)\cdot\frac{\partial^2 L_Q}{\partial\Theta^2}\cdot(\Theta-\tilde\Theta)\Big].
$$

Substituting (A2) and grouping the terms into a complete square in the linear predictor gives

$$
l_Q(\beta,\Theta)
= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\Big\{\Big[x_i^T(\beta-\tilde\beta)+\frac{1}{2}x_i^T(\Theta-\tilde\Theta)x_i\Big]-\frac{y_i-\tilde p_i}{\omega_i}\Big\}^2 + C
= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)^2 + C,
\tag{A3}
$$

where C collects the terms that do not depend on (β, Θ), namely

$$
C = \frac{1}{N}\sum_{i=1}^{N}\Big[y_i\Big(x_i^T\tilde\beta+\frac{1}{2}x_i^T\tilde\Theta x_i\Big)+\ln(1-\tilde p_i)\Big]
- \frac{1}{2N}\sum_{i=1}^{N}\omega_i\Big[\frac{y_i-\tilde p_i}{\tilde p_i\,(1-\tilde p_i)}\Big]^2,
$$

and

$$
\omega_i = p(x_i)\,\big[1-p(x_i)\big],\qquad
z_i = \sum_j\tilde\beta_j x_{ij}+\frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}\,x_{ij}x_{ik}
+\frac{y_i-p(x_i)}{p(x_i)\,\big[1-p(x_i)\big]},
\tag{A4}
$$

with p(x_i), β̃, and Θ̃ evaluated at the expansion point.
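Equations (A3)–(A4) say that each Newton step is a weighted least-squares problem with weights ω_i and working response z_i, exactly as in iteratively reweighted least squares. A minimal sketch of computing ω_i and z_i (the function name and data are illustrative, not from the paper):

```python
import numpy as np

def irls_working_response(X, y, beta, Theta):
    """Weights w_i and working response z_i of (A4) at (beta~, Theta~).

    w_i = p_i * (1 - p_i)
    z_i = eta_i + (y_i - p_i) / w_i, with eta_i = x_i^T beta + 0.5 x_i^T Theta x_i.
    (In practice w_i is clipped away from 0 when p_i is near 0 or 1.)
    """
    eta = X @ beta + 0.5 * np.einsum("ij,jk,ik->i", X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    z = eta + (y - p) / w
    return w, z

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = rng.integers(0, 2, size=30).astype(float)
w, z = irls_working_response(X, y, np.zeros(4), np.zeros((4, 4)))
# At beta = Theta = 0: p_i = 1/2, so w_i = 1/4 and z_i = 4*(y_i - 1/2) = +/-2.
print(np.allclose(w, 0.25), np.allclose(z, 4 * (y - 0.5)))
```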
B. Proofs from (12) to (13)

First, there are three cases in calculating β_j^+ − β_j^−.

(1) β_j^+ > 0, β_j^− = 0:

$$\beta_j^+-\beta_j^- = x_{ij}\,(z_i-A_i)+\beta_j^+-\beta_j^--\frac{\lambda_1+\alpha_j}{\omega_i}. \tag{B1}$$

(2) β_j^+ = 0, β_j^− > 0:

$$\beta_j^+-\beta_j^- = x_{ij}\,(z_i-A_i)+\beta_j^+-\beta_j^-+\frac{\lambda_1+\alpha_j}{\omega_i}. \tag{B2}$$

(3) β_j^+ = β_j^− = 0, which occurs when the magnitude of x_{ij}(z_i − A_i) + β_j^+ − β_j^− does not exceed the threshold:

$$\beta_j^+-\beta_j^- = 0. \tag{B3}$$

Combining the three cases, we derive the soft-thresholding update

$$\beta_j^+-\beta_j^- = S\Big[x_{ij}\,(z_i-A_i)+\beta_j^+-\beta_j^-,\ \frac{\lambda_1+\alpha_j}{\omega_i}\Big], \tag{B4}$$

where S(u, λ) = sign(u)·max(|u| − λ, 0) is the soft-thresholding operator.
Second, the stationarity condition ∂L/∂Θ_{jk} = 0 gives

$$
\frac{1}{2}\,\omega_i\cdot 2\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)\cdot\Big(-\frac{1}{2}x_{ij}x_{ik}\Big)+(\lambda_2+\alpha_j)\,U_{jk}=0,
$$

so that

$$
\omega_i\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)\cdot\Big(\frac{1}{2}x_{ij}x_{ik}\Big)=(\lambda_2+\alpha_j)\,U_{jk},
$$

$$
\Big(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}\Big)\cdot\Big(\frac{1}{2}x_{ij}x_{ik}\Big)=\frac{(\lambda_2+\alpha_j)\,U_{jk}}{\omega_i}.
\tag{B5}
$$
Supposing

$$\gamma_{(-jk)} = z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}\,x_{ij}x_{ik}+\Theta_{jk}\,x_{ij}x_{ik}, \tag{B6}$$

we have

$$\big(\gamma_{(-jk)}-\Theta_{jk}\,x_{ij}x_{ik}\big)\cdot(x_{ij}x_{ik})=\frac{2\,(\lambda_2+\alpha_j)\,U_{jk}}{\omega_i},$$

$$\Theta_{jk}\,x_{ij}x_{ik}\cdot x_{ij}x_{ik}=-\frac{2\,(\lambda_2+\alpha_j)\,U_{jk}}{\omega_i}+\gamma_{(-jk)}\cdot x_{ij}x_{ik},$$

$$\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\,(\lambda_2+\alpha_j)\,U_{jk}/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B7}$$
We discuss three cases for the value of Θ_{jk}.

(1) Θ_{jk} > 0, U_{jk} = 1:

$$\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}-2\,(\lambda_2+\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B8}$$

(2) Θ_{jk} < 0, U_{jk} = −1:

$$\Theta_{jk}=\frac{\gamma_{(-jk)}\cdot x_{ij}x_{ik}+2\,(\lambda_2+\alpha_j)/\omega_i}{(x_{ij}x_{ik})^2}. \tag{B9}$$

(3) Θ_{jk} = 0.

We derive

$$\Theta_{jk}=\frac{S\big[\gamma_{(-jk)}\cdot x_{ij}x_{ik},\ 2\,(\lambda_2+\alpha_j)/\omega_i\big]}{(x_{ij}x_{ik})^2}. \tag{B10}$$
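Both (B4) and (B10) are applications of the same soft-thresholding operator S(u, λ) = sign(u)·max(|u| − λ, 0). A minimal sketch with illustrative scalar values (the variable names and numbers below are not from the paper):

```python
import numpy as np

def soft_threshold(u, lam):
    """S(u, lam) = sign(u) * max(|u| - lam, 0), the operator S[.,.] in (B4)/(B10)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

# A coordinate update shaped like (B10):
#   Theta_jk = S(gamma * x_j x_k, 2 * (lam2 + alpha_j) / w) / (x_j x_k)^2
# with hypothetical scalar values for the partial residual and weight.
gamma, xjk, lam2, alpha_j, w = 3.0, 0.8, 0.1, 0.05, 0.25
theta_jk = soft_threshold(gamma * xjk, 2 * (lam2 + alpha_j) / w) / xjk**2
print(theta_jk)  # approximately 1.875: shrunk toward zero by the threshold

# Once the threshold dominates the signal, the coefficient is set exactly to zero.
print(soft_threshold(0.5, 1.0))  # 0.0
```

Setting coefficients exactly to zero (rather than merely shrinking them) is what makes the coordinate descent updates produce a sparse (β, Θ).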
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019, 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.
8 Mathematical Problems in Engineering
Table 7: The experimental results of the three lasso methods and five traditional pattern recognition methods.

Methods          Error rate in training (%)   Training time (s)   Error rate in testing (%)   Testing time (s)
Lasso            0.00                         0.2649              1.46                        0.0312
All-pair lasso   0.00                         0.8807              1.35                        0.0953
Our method       0.00                         0.6012              1.12                        0.0556
SVM              0.00                         0.9516              1.23                        0.0936
1-NN             0.00                         0.0000              9.20                        0.7956
3-NN             0.00                         0.0000              8.98                        0.7488
QDA              6.61                         1.4196              4.04                        0.2184
DT               0.00                         0.7332              14.93                       0.1092
[Figure 6: The error rate of the three lasso methods (Lasso, Our method, All-pair lasso) in the simulation experiment, shown for (a) hierarchical interaction, (b) interaction variances only, and (c) main variances only; the y-axis gives the error rate (0.015 to 0.055), and the Bayes error is shown by the purple dotted line in the graph.]
$$
\begin{aligned}
&= \frac{\partial\,(1/N)\sum_{i=1}^{N}\left[y_i\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right) + \ln\left(1-p_i\right)\right]}{\partial\Theta}\\
&= \frac{1}{N}\sum_{i=1}^{N}\left[\left(\frac{1}{2}y_i x_i^T x_i\right)^T - \frac{\exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}{1+\exp\left(x_i^T\beta + \frac{1}{2}x_i^T\Theta x_i\right)}\cdot\frac{1}{2}\cdot x_i^T x_i\right]\\
&= \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}y_i x_i^T x_i - \frac{1}{2}p_i x_i^T x_i\right)
= \frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\left(y_i - p_i\right),
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial^2 L_Q}{\partial\Theta^2}
&= \frac{\partial}{\partial\Theta}\,\frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\left(y_i - p_i\right)
= -\frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\,\frac{\partial p_i}{\partial\Theta}\\
&= \frac{1}{2N}\sum_{i=1}^{N}\frac{\exp\left(-x_i^T\beta - \frac{1}{2}x_i^T\Theta x_i\right)}{\left[1+\exp\left(-x_i^T\beta - \frac{1}{2}x_i^T\Theta x_i\right)\right]^2}\cdot\left(-\frac{1}{2}x_i^T x_i\right)\cdot x_i^T x_i\\
&= -\frac{1}{4N}\sum_{i=1}^{N}\frac{\exp\left(-x_i^T\beta - \frac{1}{2}x_i^T\Theta x_i\right)}{1+\exp\left(-x_i^T\beta - \frac{1}{2}x_i^T\Theta x_i\right)}\cdot\frac{1}{1+\exp\left(-x_i^T\beta - \frac{1}{2}x_i^T\Theta x_i\right)}\cdot\left\|x_i\right\|^2\\
&= -\frac{1}{4N}\sum_{i=1}^{N}\left(1-p_i\right) p_i \left\|x_i\right\|^2,
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta}
&= \frac{\partial}{\partial\beta}\,\frac{1}{N}\sum_{i=1}^{N} x_i^T\left(y_i - p_i\right)
= -\frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\partial p_i}{\partial\Theta}\\
&= \frac{1}{N}\sum_{i=1}^{N} x_i^T\,\frac{\exp\left(-x_i^T\beta - \frac{1}{2}x_i^T\Theta x_i\right)}{\left[1+\exp\left(-x_i^T\beta - \frac{1}{2}x_i^T\Theta x_i\right)\right]^2}\cdot\left(-\frac{1}{2}x_i^T x_i\right)\\
&= -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right) p_i\, x_i^T x_i x_i^T.
\end{aligned}
\tag{A2}
$$
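The gradient of $L_Q$ with respect to $\Theta$ derived above can be sanity-checked numerically. The sketch below is purely illustrative (it is not the paper's code; the data and all variable names are our own), comparing the analytic gradient $(1/2N)\sum_i (y_i-p_i)\,x_i x_i^T$ against a central finite difference on a single entry of $\Theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 4
X = rng.normal(size=(N, d))
y = rng.integers(0, 2, size=N).astype(float)
beta = rng.normal(size=d)
Theta = rng.normal(size=(d, d))
Theta = (Theta + Theta.T) / 2  # symmetric interaction matrix

def L_Q(beta, Theta):
    # L_Q = (1/N) sum_i [ y_i * eta_i + ln(1 - p_i) ],
    # with eta_i = x_i^T beta + (1/2) x_i^T Theta x_i
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    return np.mean(y * eta - np.log1p(np.exp(eta)))  # ln(1 - p_i) = -ln(1 + e^eta_i)

# analytic gradient w.r.t. Theta: (1/(2N)) sum_i (y_i - p_i) x_i x_i^T
eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
p = 1.0 / (1.0 + np.exp(-eta))
grad = (X * (y - p)[:, None]).T @ X / (2 * N)

# central finite difference on the single entry (j, k) = (0, 1)
eps = 1e-6
E = np.zeros((d, d)); E[0, 1] = eps
num = (L_Q(beta, Theta + E) - L_Q(beta, Theta - E)) / (2 * eps)
print(abs(num - grad[0, 1]) < 1e-6)  # True: analytic and numeric agree
```

Such a check confirms the sign and the factor $1/2$ in front of the quadratic term before the expression is used in the Taylor expansion below.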
Then (A1) is expanded using a Taylor series about the expansion point $(\tilde\beta, \tilde\Theta)$:
$$
\begin{aligned}
l_Q(\beta,\Theta)
&= L_Q + \left(\beta-\tilde\beta\right)\cdot\frac{\partial L_Q}{\partial\beta} + \left(\Theta-\tilde\Theta\right)\cdot\frac{\partial L_Q}{\partial\Theta}\\
&\quad + \frac{1}{2}\left[\left(\beta-\tilde\beta\right)^2\cdot\frac{\partial^2 L_Q}{\partial\beta^2} + 2\left(\beta-\tilde\beta\right)^T\left(\Theta-\tilde\Theta\right)\cdot\frac{\partial^2 L_Q}{\partial\beta\,\partial\Theta} + \left(\Theta-\tilde\Theta\right)^2\cdot\frac{\partial^2 L_Q}{\partial\Theta^2}\right]\\
&= \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde\beta + \frac{1}{2}x_i^T\tilde\Theta x_i\right)+\ln\left(1-p_i\right)\right]\\
&\quad + \left(\beta-\tilde\beta\right)\cdot\frac{1}{N}\sum_{i=1}^{N} x_i\left(y_i-p_i\right) + \left(\Theta-\tilde\Theta\right)\cdot\frac{1}{2N}\sum_{i=1}^{N} x_i^T x_i\left(y_i-p_i\right)\\
&\quad - \frac{1}{2}\left(\beta-\tilde\beta\right)^2\cdot\frac{1}{N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\, x_i^T x_i
- \left(\beta-\tilde\beta\right)^T\left(\Theta-\tilde\Theta\right)\cdot\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\, x_i^T x_i x_i^T\\
&\quad - \left(\Theta-\tilde\Theta\right)^2\cdot\frac{1}{8N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left\|x_i\right\|^2\\
&= -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left\{\left[x_i^T\left(\beta-\tilde\beta\right)\right]^2 + x_i^T\left(\beta-\tilde\beta\right)\cdot x_i^T\left(\Theta-\tilde\Theta\right)x_i + \frac{1}{4}\left[x_i^T\left(\Theta-\tilde\Theta\right)x_i\right]^2\right\}\\
&\quad + \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde\beta+\frac{1}{2}x_i^T\tilde\Theta x_i\right)+\ln\left(1-p_i\right)\right]\\
&\quad + \frac{1}{N}\sum_{i=1}^{N} x_i^T\left(\beta-\tilde\beta\right)\left(y_i-p_i\right) + \frac{1}{2N}\sum_{i=1}^{N} x_i^T\left(\Theta-\tilde\Theta\right)x_i\left(y_i-p_i\right)\\
&= -\frac{1}{2N}\sum_{i=1}^{N}\left(1-p_i\right)p_i\left[x_i^T\left(\beta-\tilde\beta\right)+\frac{1}{2}x_i^T\left(\Theta-\tilde\Theta\right)x_i - \frac{y_i-p_i}{p_i\left(1-p_i\right)}\right]^2\\
&\quad + \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde\beta+\frac{1}{2}x_i^T\tilde\Theta x_i\right)+\ln\left(1-p_i\right)\right] + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)}\\
&= -\frac{1}{2N}\sum_{i=1}^{N}\omega_i\left(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)^2\\
&\quad + \frac{1}{N}\sum_{i=1}^{N}\left[y_i\left(x_i^T\tilde\beta+\frac{1}{2}x_i^T\tilde\Theta x_i\right)+\ln\left(1-p_i\right)\right] + \frac{1}{2N}\sum_{i=1}^{N}\frac{\left(y_i-p_i\right)^2}{p_i\left(1-p_i\right)},
\end{aligned}
\tag{A3}
$$

where the last two terms do not depend on $(\beta,\Theta)$ and

$$
\omega_i = p\left(x_i\right)\left[1-p\left(x_i\right)\right],\qquad
z_i = \sum_j\tilde\beta_j x_{ij} + \frac{1}{2}\sum_{j\neq k}\tilde\Theta_{jk}x_{ij}x_{ik} + \frac{y_i-p\left(x_i\right)}{p\left(x_i\right)\left[1-p\left(x_i\right)\right]}.
\tag{A4}
$$
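In IRLS terms, (A4) defines the usual weights and working response, and the completed square in (A3) relies on the exact identity $\omega_i(z_i - \tilde\eta_i) = y_i - p_i$, where $\tilde\eta_i$ is the current fit. A minimal sketch checking this identity (our own illustrative code with hypothetical variable names, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 30, 3
X = rng.normal(size=(N, d))
y = rng.integers(0, 2, size=N).astype(float)
beta = rng.normal(size=d)
Theta = rng.normal(size=(d, d)); Theta = (Theta + Theta.T) / 2

# current fit eta_i and probabilities p_i at the expansion point (beta, Theta)
eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
p = 1.0 / (1.0 + np.exp(-eta))

# (A4): weights and working response of the quadratic approximation
omega = p * (1 - p)
z = eta + (y - p) / omega

# omega_i * (z_i - eta_i) recovers the score residual y_i - p_i exactly
print(np.allclose(omega * (z - eta), y - p))  # True
```

This is what lets the penalized logistic problem be solved as a sequence of weighted least squares problems by coordinate descent.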
B. Proofs from (12) to (13)
First, there are three cases in calculating $\beta_j^+ - \beta_j^-$.

(1) $\beta_j^+ > 0$, $\beta_j^- = 0$:
$$
\beta_j^+ - \beta_j^- = x_j^T\left(z_i - A\right) + \beta_j^+ - \beta_j^- - \frac{\lambda_1 + \alpha_j}{\omega_i}.
\tag{B1}
$$

(2) $\beta_j^+ = 0$, $\beta_j^- > 0$:
$$
\beta_j^+ - \beta_j^- = x_j^T\left(z_i - A\right) + \beta_j^+ - \beta_j^- + \frac{\lambda_1 + \alpha_j}{\omega_i}.
\tag{B2}
$$

(3) $\beta_j^+ = \beta_j^- = 0$:
$$
\beta_j^+ - \beta_j^- = x_j^T\left(z_i - A\right) + \beta_j^+ - \beta_j^-.
\tag{B3}
$$

Combining the three cases with the soft-thresholding operator $S$, we derive
$$
\beta_j^+ - \beta_j^- = S\left[x_{ij}\left(z_i - A_i\right) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i}\right].
\tag{B4}
$$
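The operator $S[\cdot,\cdot]$ in (B4) is the standard soft-thresholding function, $S(u,t) = \operatorname{sign}(u)\max(|u|-t, 0)$, which collapses the three sign cases above into a single update. A minimal sketch:

```python
import numpy as np

def soft_threshold(u, t):
    # S(u, t) = sign(u) * max(|u| - t, 0): shrinks u toward zero by t,
    # returning exactly 0 when |u| <= t (the zero case above)
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

print(soft_threshold(3.0, 1.0))   # 2.0   (positive case: penalty subtracted)
print(soft_threshold(-3.0, 1.0))  # -2.0  (negative case: penalty added)
print(soft_threshold(0.5, 1.0))   # 0.0   (coefficient set to zero)
```

It is this thresholding at zero that produces the sparsity of the lasso-type coefficient estimates.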
Secondly, we give the following by the stationary condition $\partial L/\partial\Theta_{jk} = 0$:

$$
\begin{aligned}
&\frac{1}{2}\omega_i\cdot 2\left(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(-\frac{1}{2}x_{ij}x_{ik}\right) + \left(\lambda_2+\alpha_j\right)U_{jk} = 0,\\
&\omega_i\cdot\left(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(\frac{1}{2}x_{ij}x_{ik}\right) = \left(\lambda_2+\alpha_j\right)U_{jk},\\
&\left(z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(\frac{1}{2}x_{ij}x_{ik}\right) = \frac{\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i}.
\end{aligned}
\tag{B5}
$$
Supposing

$$
\gamma^{(-jk)} = z_i-\sum_j\beta_j x_{ij}-\frac{1}{2}\sum_{j\neq k}\Theta_{jk}x_{ij}x_{ik} + \Theta_{jk}x_{ij}x_{ik},
\tag{B6}
$$

we have

$$
\begin{aligned}
\left(\gamma^{(-jk)} - \Theta_{jk}x_{ij}x_{ik}\right)\cdot\left(x_{ij}x_{ik}\right) &= \frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i},\\
\Theta_{jk}\,x_{ij}x_{ik}\,x_{ij}x_{ik} &= -\frac{2\left(\lambda_2+\alpha_j\right)U_{jk}}{\omega_i} + \gamma^{(-jk)}\cdot x_{ij}x_{ik},\\
\Theta_{jk} &= \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2\left(\lambda_2+\alpha_j\right)U_{jk}/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\end{aligned}
\tag{B7}
$$
We discuss three cases for the value of $\Theta_{jk}$.

(1) $\Theta_{jk} > 0$, $U_{jk} = 1$:
$$
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} - 2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\tag{B8}
$$

(2) $\Theta_{jk} < 0$, $U_{jk} = -1$:
$$
\Theta_{jk} = \frac{\gamma^{(-jk)}\cdot x_{ij}x_{ik} + 2\left(\lambda_2+\alpha_j\right)/\omega_i}{\left(x_{ij}x_{ik}\right)^2}.
\tag{B9}
$$

(3) $\Theta_{jk} = 0$.

We derive
$$
\Theta_{jk} = \frac{S\left[\gamma^{(-jk)}\cdot x_{ij}x_{ik},\ 2\left(\lambda_2+\alpha_j\right)/\omega_i\right]}{\left(x_{ij}x_{ik}\right)^2}.
\tag{B10}
$$
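Equation (B10) is a single coordinate-descent update for an interaction coefficient. The sketch below is illustrative only (gamma_mjk, x_jk, omega, lam2, and alpha_j are scalar stand-ins for the quantities defined above, not the paper's code); it applies (B10) and shows how the update reproduces the positive case (B8) and the zero case:

```python
import numpy as np

def soft_threshold(u, t):
    # S(u, t) = sign(u) * max(|u| - t, 0)
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def update_theta_jk(gamma_mjk, x_jk, omega, lam2, alpha_j):
    # (B10): Theta_jk = S[gamma^(-jk) * x_ij x_ik, 2(lambda_2 + alpha_j)/omega] / (x_ij x_ik)^2
    t = 2.0 * (lam2 + alpha_j) / omega
    return soft_threshold(gamma_mjk * x_jk, t) / x_jk**2

# positive case (B8): the numerator exceeds the threshold, giving (3.0 - 1.0) / 2.25
print(update_theta_jk(gamma_mjk=2.0, x_jk=1.5, omega=1.0, lam2=0.4, alpha_j=0.1))
# zero case: |gamma * x| is below the threshold, so Theta_jk = 0.0
print(update_theta_jk(gamma_mjk=0.2, x_jk=1.5, omega=1.0, lam2=0.4, alpha_j=0.1))
```

Cycling such closed-form updates over all $\beta_j$ and $\Theta_{jk}$ is exactly what makes coordinate descent efficient for the hierarchical interactive lasso.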
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019, 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267–288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659–677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714–721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1–22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187–198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275–285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945–974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430–439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17–36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354–364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111–1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49–67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777–2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541–1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450–468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475–511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694–708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080–5083, 2009.
10 Mathematical Problems in Engineering
\[
\begin{aligned}
&\cdot p_i \Big[ x_i^T (\tilde{\beta}-\beta) + \frac{1}{2} x_i^T (\tilde{\Theta}-\Theta) x_i \Big]^2
+ \frac{1}{N} \sum_{i=1}^{N} x_i^T (\tilde{\beta}-\beta)(y_i - p_i) \\
&\quad + \frac{1}{2N} \sum_{i=1}^{N} x_i^T (\tilde{\Theta}-\Theta) x_i \,(y_i - p_i)
+ \frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \Big( x_i^T \beta + \frac{1}{2} x_i^T \Theta x_i \Big) + \ln(1 - p_i) \Big] \\
&= -\frac{1}{2N} \sum_{i=1}^{N} (1 - p_i) \cdot p_i \Big[ x_i^T (\tilde{\beta}-\beta) + \frac{1}{2} x_i^T (\tilde{\Theta}-\Theta) x_i + \frac{y_i - p_i}{p_i [1 - p_i]} \Big]^2 \\
&\quad + \frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \Big( x_i^T \beta + \frac{1}{2} x_i^T \Theta x_i \Big) + \ln(1 - p_i) \Big]
- \frac{1}{2N} \sum_{i=1}^{N} \Big[ \frac{y_i - p_i}{p_i [1 - p_i]} \Big]^2 \\
&= -\frac{1}{2N} \sum_{i=1}^{N} \omega_i \Big( z_i - \sum_j \beta_j x_{ij} - \frac{1}{2} \sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik} \Big)^2 \\
&\quad + \frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \Big( x_i^T \beta + \frac{1}{2} x_i^T \Theta x_i \Big) + \ln(1 - p_i) \Big]
- \frac{1}{2N} \sum_{i=1}^{N} \Big[ \frac{y_i - p_i}{p_i [1 - p_i]} \Big]^2 ,
\end{aligned}
\tag{A3}
\]

where

\[
\omega_i = p(x_i)\,[1 - p(x_i)], \qquad
z_i = \sum_j \beta_j x_{ij} + \frac{1}{2} \sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik} + \frac{y_i - p(x_i)}{p(x_i)[1 - p(x_i)]}.
\tag{A4}
\]
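The weights and working responses of (A4) are the standard iteratively reweighted least squares quantities, extended with the interaction term. The following is a minimal sketch under the assumption that Theta is symmetric with zero diagonal, so that (1/2) x^T Theta x equals the half-sum over j != k; the function and variable names (working_response, eta) are illustrative, not from the paper.

```python
import numpy as np

def working_response(X, y, beta, Theta):
    """Compute omega_i and z_i of equation (A4) at the current (beta, Theta).

    Sketch only: assumes Theta is symmetric with zero diagonal, so
    0.5 * x_i^T Theta x_i equals (1/2) * sum_{j != k} Theta_jk x_ij x_ik.
    """
    # Linear-plus-interaction predictor: x_i^T beta + (1/2) x_i^T Theta x_i
    eta = X @ beta + 0.5 * np.einsum('ij,jk,ik->i', X, Theta, X)
    p = 1.0 / (1.0 + np.exp(-eta))   # p(x_i), the fitted probability
    omega = p * (1.0 - p)            # omega_i = p(x_i)[1 - p(x_i)]
    z = eta + (y - p) / omega        # working response z_i of (A4)
    return omega, z
```

With beta = 0 and Theta = 0 every p(x_i) = 0.5, so omega_i = 0.25 and z_i = 4 y_i - 2, which is a convenient sanity check for an implementation.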
B. Proofs from (12) to (13)

First, there are three cases in calculating \(\beta_j^+ - \beta_j^-\):

(1) \(\beta_j^+ > 0\), \(\beta_j^- = 0\):
\[
\beta_j^+ - \beta_j^- = x_j^T (z - A) + \beta_j^+ - \beta_j^- - \frac{\lambda_1 + \alpha_j}{\omega_i}. \tag{B1}
\]

(2) \(\beta_j^+ = 0\), \(\beta_j^- > 0\):
\[
\beta_j^+ - \beta_j^- = x_j^T (z - A) + \beta_j^+ - \beta_j^- + \frac{\lambda_1 + \alpha_j}{\omega_i}. \tag{B2}
\]

(3) \(\beta_j^+ = \beta_j^- = 0\):
\[
\beta_j^+ - \beta_j^- = x_j^T (z - A) + \beta_j^+ - \beta_j^-. \tag{B3}
\]

Combining the three cases with the soft-thresholding operator \(S\), we derive
\[
\beta_j^+ - \beta_j^- = S\Big[ x_j^T (z - A) + \beta_j^+ - \beta_j^-,\ \frac{\lambda_1 + \alpha_j}{\omega_i} \Big]. \tag{B4}
\]
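The operator S in (B4) is the usual soft-thresholding map that collapses the three sign cases (B1)-(B3) into one closed form. A minimal sketch (the function name is ours):

```python
import numpy as np

def soft_threshold(u, t):
    """Soft-thresholding operator S[u, t] = sign(u) * max(|u| - t, 0).

    This single expression reproduces the three sign cases of a lasso
    coordinate update: shrink toward zero by t, and clip at zero.
    """
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
```

For example, with threshold 1, an input of 3 shrinks to 2, an input of -3 to -2, and anything in [-1, 1] is set exactly to zero, which is what produces sparsity in the coefficients.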
Secondly, we give the following by the stationary condition \(\partial L / \partial \Theta_{jk} = 0\):
\[
\frac{1}{2} \omega_i \cdot 2 \Big( z_i - \sum_j \beta_j x_{ij} - \frac{1}{2} \sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik} \Big) \cdot \Big( -\frac{1}{2} x_{ij} x_{ik} \Big) + (\lambda_2 + \alpha_j) U_{jk} = 0,
\]
\[
\omega_i \cdot \Big( z_i - \sum_j \beta_j x_{ij} - \frac{1}{2} \sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik} \Big) \cdot \Big( \frac{1}{2} x_{ij} x_{ik} \Big) = (\lambda_2 + \alpha_j) U_{jk},
\]
\[
\Big( z_i - \sum_j \beta_j x_{ij} - \frac{1}{2} \sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik} \Big) \cdot \Big( \frac{1}{2} x_{ij} x_{ik} \Big) = \frac{(\lambda_2 + \alpha_j) U_{jk}}{\omega_i}.
\tag{B5}
\]

Supposing
\[
\gamma^{(-jk)} = z_i - \sum_j \beta_j x_{ij} - \frac{1}{2} \sum_{j \neq k} \Theta_{jk} x_{ij} x_{ik} + \Theta_{jk} x_{ij} x_{ik}, \tag{B6}
\]
we have
\[
\big( \gamma^{(-jk)} - \Theta_{jk} x_{ij} x_{ik} \big) \cdot (x_{ij} x_{ik}) = \frac{2 (\lambda_2 + \alpha_j) U_{jk}}{\omega_i},
\]
\[
\Theta_{jk} \, x_{ij} x_{ik} \, x_{ij} x_{ik} = -\frac{2 (\lambda_2 + \alpha_j) U_{jk}}{\omega_i} + \gamma^{(-jk)} \cdot x_{ij} x_{ik},
\]
\[
\Theta_{jk} = \frac{\gamma^{(-jk)} \cdot x_{ij} x_{ik} - 2 (\lambda_2 + \alpha_j) U_{jk} / \omega_i}{(x_{ij} x_{ik})^2}.
\tag{B7}
\]
We discuss three cases for the value of \(\Theta_{jk}\):

(1) \(\Theta_{jk} > 0\), \(U_{jk} = 1\):
\[
\Theta_{jk} = \frac{\gamma^{(-jk)} \cdot x_{ij} x_{ik} - 2 (\lambda_2 + \alpha_j) / \omega_i}{(x_{ij} x_{ik})^2}. \tag{B8}
\]

(2) \(\Theta_{jk} < 0\), \(U_{jk} = -1\):
\[
\Theta_{jk} = \frac{\gamma^{(-jk)} \cdot x_{ij} x_{ik} + 2 (\lambda_2 + \alpha_j) / \omega_i}{(x_{ij} x_{ik})^2}. \tag{B9}
\]

(3) \(\Theta_{jk} = 0\).

Combining the three cases, we derive
\[
\Theta_{jk} = \frac{S\big[ \gamma^{(-jk)} \cdot x_{ij} x_{ik},\ 2 (\lambda_2 + \alpha_j) / \omega_i \big]}{(x_{ij} x_{ik})^2}. \tag{B10}
\]
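The closed-form update (B10) again collapses the sign cases (B8) and (B9) into one soft-threshold expression. A minimal sketch under our own argument names (gamma_xx for the inner product of the partial residual with x_ij x_ik, xx_sq for (x_ij x_ik)^2; none of these names are from the paper):

```python
import numpy as np

def update_theta_jk(gamma_xx, lam2, alpha_j, omega, xx_sq):
    """Coordinate update for Theta_jk, equations (B8)-(B10).

    Sketch only. Arguments (illustrative names):
      gamma_xx : gamma^(-jk) . x_ij x_ik, the partial-residual inner product
      xx_sq    : (x_ij x_ik)^2, the normalizer in (B10)
    Applies the soft threshold 2 (lambda_2 + alpha_j) / omega from (B10).
    """
    thresh = 2.0 * (lam2 + alpha_j) / omega
    # sign(u) * max(|u| - t, 0): one expression covering cases (B8)-(B10)
    return np.sign(gamma_xx) * max(abs(gamma_xx) - thresh, 0.0) / xx_sq
```

When |gamma_xx| falls below the threshold the update returns exactly zero, which is how the penalty removes an interaction term from the model.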
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61273019 and 61473339), the China Postdoctoral Science Foundation (2014M561202), the Hebei Postdoctoral Science Foundation Special Fund Project, and the Hebei Top Young Talents Support Program.
References
[1] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society B, vol. 58, no. 1, pp. 267-288, 1996.
[2] M. Y. Park and T. Hastie, "L1-regularization path algorithm for generalized linear models," Journal of the Royal Statistical Society B: Statistical Methodology, vol. 69, no. 4, pp. 659-677, 2007.
[3] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange, "Genome-wide association analysis by lasso penalized logistic regression," Bioinformatics, vol. 25, no. 6, pp. 714-721, 2009.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Regularization paths for generalized linear models via coordinate descent," Journal of Statistical Software, vol. 33, no. 1, pp. 1-22, 2010.
[5] M. Lim and T. Hastie, "Learning interactions via hierarchical group-lasso regularization," http://www.stanford.edu/~hastie/Papers/glinternet.pdf.
[6] H. Schwender and K. Ickstadt, "Identification of SNP interactions using logic regression," Biostatistics, vol. 9, no. 1, pp. 187-198, 2008.
[7] S. Noah and R. Tibshirani, "A permutation approach to testing interactions in many dimensions," http://statweb.stanford.edu/~tibs/research.html.
[8] J. Wu, B. Devlin, S. Ringquist, M. Trucco, and K. Roeder, "Screen and clean: a tool for identifying interactions in genome-wide association studies," Genetic Epidemiology, vol. 34, no. 3, pp. 275-285, 2010.
[9] Y. Nardi and A. Rinaldo, "The log-linear group-lasso estimator and its asymptotic properties," Bernoulli, vol. 18, no. 3, pp. 945-974, 2012.
[10] M. Yuan, V. R. Joseph, and Y. Lin, "An efficient variable selection approach for analyzing designed experiments," Technometrics, vol. 49, no. 4, pp. 430-439, 2007.
[11] H. Chipman, "Bayesian variable selection with related predictors," The Canadian Journal of Statistics, vol. 24, no. 1, pp. 17-36, 1996.
[12] N. H. Choi, W. Li, and J. Zhu, "Variable selection with the strong heredity constraint and its oracle property," Journal of the American Statistical Association, vol. 105, no. 489, pp. 354-364, 2010.
[13] J. Bien, J. Taylor, and R. Tibshirani, "A lasso for hierarchical interactions," The Annals of Statistics, vol. 41, no. 3, pp. 1111-1141, 2013.
[14] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B: Statistical Methodology, vol. 68, no. 1, pp. 49-67, 2006.
[15] R. Jenatton, J.-Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, no. 10, pp. 2777-2824, 2011.
[16] P. Radchenko and G. M. James, "Variable selection using adaptive nonlinear interaction structures in high dimensions," Journal of the American Statistical Association, vol. 105, no. 492, pp. 1541-1553, 2010.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, "Structured sparsity through convex optimization," Statistical Science, vol. 27, no. 4, pp. 450-468, 2012.
[18] I. Ruczinski, C. Kooperberg, and M. LeBlanc, "Logic regression," Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 475-511, 2003.
[19] P. Hall and J.-H. Xue, "On selecting interacting features from high-dimensional data," Computational Statistics & Data Analysis, vol. 71, pp. 694-708, 2014.
[20] J.-J. Wang, J. Li, T. Zhang, and W.-X. Hong, "Distinguishing visual feature extraction method using quadratic map and genetic algorithm," Journal of System Simulation, vol. 21, no. 16, pp. 5080-5083, 2009.