


International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Vol. 12, No. 6 (2004) 761-780
World Scientific, www.worldscientific.com

© World Scientific Publishing Company

FUZZY CLASS LOGISTIC REGRESSION ANALYSIS

MIIN-SHEN YANG* and HWEI-MING CHEN

Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taiwan 32023, R.O.C.

*msyang@math.cycu.edu.tw

Received December 2002
Revised 11 June 2004

Distribution mixtures are used as models to analyze grouped data. The estimation of parameters is an important step for mixture distributions. The latent class model is generally used for the analysis of mixture distributions for discrete data. In this paper, we consider the parameter estimation for a mixture of logistic regression models. We know that the expectation maximization (EM) algorithm was most used for estimating the parameters of logistic regression mixture models. In this paper, we propose a new type of fuzzy class model and then derive an algorithm for the parameter estimation of a fuzzy class logistic regression model. The effects of the explanatory variables on the response variables are described. The focus is on binary responses for the logistic regression mixture analysis with a fuzzy class model. An algorithm, called fuzzy classification maximum likelihood (FCML), is then created. The mean squared error (MSE) based accuracy of the FCML and EM algorithms for the parameter estimation of logistic regression mixture models is compared using samples drawn from logistic regression mixtures of two classes. Numerical results show that the proposed FCML algorithm presents good accuracy and is recommended as a new tool for the parameter estimation of logistic regression mixture models.

Keywords: Fuzzy set; latent class; fuzzy class; fuzzy clustering; generalized linear model; mixture logistic regression model; parameter estimation.

1. Introduction

The latent class model was first proposed in the 1950s by Lazarsfeld [15] and Anderson [2]. It has been widely used in many areas, for example, in psychology, sociology and economics (see Everitt [8]). The latent class model has also been effectively used for the analysis of grouped categorical data. Distribution mixtures are used as models in statistical studies and also have applications in clustering (see Everitt and Hand [10] and McLachlan and Basford [19]). A mixture of multivariate multinomial distributions is used as a latent class model for categorical data.



Since Zadeh [27] proposed fuzzy set theory, which produced the idea of partial membership described by a membership function, fuzziness has received more attention. The use of fuzzy sets can provide a method for extending a latent class model into a fuzzy class model. This allows embedding fuzzy classification methods into fuzzy class models for categorical or discrete data. Recently, Yang and Yu [25] proposed an approach to estimate the parameters in fuzzy class models using a fuzzy clustering algorithm for the analysis of grouped categorical data. They demonstrated that their algorithm for estimating the parameters of multivariate multinomial mixtures based on fuzzy class models is more accurate and robust than those based on latent class models.

In this paper, we will propose a fuzzy class logistic regression model and then derive an algorithm for this regression analysis to describe the effects of the explanatory variables on the response variables. This paper focuses on binary responses for logistic regression mixture analysis with latent and fuzzy class models. The remainder of the paper is organized as follows. Section 2 introduces the logistic regression model based on the logit transformation of a proportion, which is the most important case of the generalized linear model. In Section 3, we review the latent class logistic regression model and also give its most used Expectation-Maximization (EM) algorithm. In Section 4, we propose the fuzzy class logistic regression model and then derive its corresponding algorithm, called a fuzzy classification maximum likelihood (FCML). Section 5 gives numerical comparisons between the EM and FCML algorithms. Conclusions are made in Section 6.

2. Logistic regression model

In this section, we describe a logistic regression model for the effects of explanatory variables on the response variables. Let us consider the Bernoulli distribution for a binary response variable Y with probabilities P(Y = 1) = p and P(Y = 0) = 1 - p. The response variable Y has the probability density function of the form

f(y; p) = p^y (1 - p)^{1-y} = (1 - p) \exp\Big(y \ln \frac{p}{1 - p}\Big)

which is in the form of a natural exponential family with the natural parameter Q(p) = \ln(p/(1-p)), called the logit of p. A generalized linear model using the logit link is called a logit model.

For a binary response variable Y, the regression model

E(Y) = p(x) = \sum_{l=1}^{t} x_l \beta_l

with the covariate vector x = (x_1, \ldots, x_t) is called a linear probability model. A function of the form

p(x) = \frac{\exp(\sum_{l=1}^{t} x_l \beta_l)}{1 + \exp(\sum_{l=1}^{t} x_l \beta_l)}

is called a logistic regression function (see Agresti [1]). The principal objective is to investigate the relationship between the response probability p and the vector of covariates x. In other words, the probability p of the response variable Y = 1 is a function of x and will be denoted by p(x). Thus, the odds of the response variable Y = 1 are

\frac{p(x)}{1 - p(x)} = \exp\Big(\sum_{l=1}^{t} x_l \beta_l\Big)

and the log odds have a linear relationship:

\ln \frac{p(x)}{1 - p(x)} = \sum_{l=1}^{t} x_l \beta_l.

The appropriate link is the log odds transformation, called the logit link. This logit model is usually referred to as the logistic regression model (see Jobson [13]). In the next sections, we will consider this logistic regression with latent and fuzzy class models and then provide algorithms for estimating the parameters of these latent and fuzzy class logistic regression models.
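As a small illustration of the logit link and the logistic regression function above (the function names here are ours, for a single linear-predictor value):

```python
import math

def logit(p):
    """Log odds ln(p / (1 - p)): the canonical (logit) link for a Bernoulli response."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Logistic regression function: maps a linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))
```

The two functions are mutual inverses, which is exactly the canonical-link relationship exploited in the estimation algorithms of the following sections.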

3. Latent class logistic regression model and EM algorithm

The response variables Y_i are considered to be the results of events, measured in a sample of n individuals. They can be classified into two mutually exclusive categories A and B, with associated probabilities p and (1 - p), respectively. For example, when we answer any question from a questionnaire, the categories A and B might represent the answers Yes and No for each question. We can define the variable Y_i to indicate the two categories by letting Y_i = 1 for Yes and Y_i = 0 for No. We may assume that there are t independent covariate variables that will affect the results of events. Because we want to assess the effects of the covariate variables on the binary response variables, we assume that the observations are drawn from a finite Bernoulli distribution mixture. The point masses of this distribution can then be interpreted as latent classes of individuals (see DeSarbo et al. [7]). We assume that the mixture distribution consists of one latent variable with c unknown latent classes.

Let the indexes
i = 1, \ldots, n denote individuals;
k = 1, \ldots, c denote latent classes;
l = 1, \ldots, t denote covariate variables.

Let Y = (Y_1, Y_2, \ldots, Y_n) be the Bernoulli random variables. The latent class logistic regression model can then be constructed as follows (see DeSarbo et al. [6,7], Agresti [1]). It is assumed that each individual follows a distribution that is a mixture of a finite number of c classes in proportions \alpha = (\alpha_1, \ldots, \alpha_c), where it is not known in advance which class each individual will arise from, and the constraint \sum_{k=1}^{c} \alpha_k = 1 is imposed. Now, the conditional probability density function of each individual Y_i = y_i coming from class k is

f_{i|k}(y_i | p_k) = p_k^{y_i} (1 - p_k)^{1 - y_i} = (1 - p_k) \exp\Big(y_i \ln \frac{p_k}{1 - p_k}\Big)    (1)

where p_k gives the probability of the individual i in class k. Let p_k be an expected value for Y_i, conditional on class k, with E(Y_i) = p_k for i = 1, \ldots, n. Comparing equation (1) with the general exponential family form in class k,

f_{i|k}(y_i | p_k) = a(p_k) b(y_i) \exp(y_i Q(p_k)),

we have

a(p_k) = 1 - p_k,  b(y_i) = 1,  Q(p_k) = \ln \frac{p_k}{1 - p_k}.

We specify a linear predictor \eta_k and a link function g(\cdot) in class k. Let x = (x_1, \ldots, x_t) be the covariate variables and let \beta = (\beta_1, \beta_2, \ldots, \beta_c) be the unknown coefficients with \beta_k = (\beta_{kl}) in class k. The linear predictor is produced by the t covariates and the parameter vector \beta_k in class k with

\eta_k = \sum_{l=1}^{t} x_l \beta_{kl}.

For each individual i, the linear predictor \eta_{ki} and the link function g(\cdot) in class k satisfy

\eta_{ki} = g(p_k(x_i))    (2)

where x_i = (x_{i1}, \ldots, x_{it}). It leads to

\eta_{ki} = \sum_{l=1}^{t} x_{il} \beta_{kl}.    (3)

Thus, conditional on class k, a generalized linear model consists of a specification of the Bernoulli distribution of a random variable Y_i, a linear predictor \eta_{ki} and a function g(\cdot) that links the random and systematic components. According to McCullagh and Nelder [18], we can get a canonical link with Q(p_k(x_i)) = \eta_{ki}. Then

\ln \frac{p_k(x_i)}{1 - p_k(x_i)} = \sum_{l=1}^{t} x_{il} \beta_{kl}    (4)

and

p_{ki} = p_k(x_i) = \frac{\exp(\sum_{l=1}^{t} x_{il} \beta_{kl})}{1 + \exp(\sum_{l=1}^{t} x_{il} \beta_{kl})}.    (5)

According to the above results, we have extended p_k to a function of the covariate variables, writing p_{ki} = p_k(x_i). Equation (1) can then be rewritten as follows:

f_{i|k}(y_i | p_{ki}) = (1 - p_{ki}) \exp\Big(y_i \ln \frac{p_{ki}}{1 - p_{ki}}\Big).    (6)

We express the probability density function (6) in the following form:

f_{i|k}(y_i | \beta_k) = \exp\Big(y_i \sum_{l=1}^{t} x_{il} \beta_{kl} - \ln\Big[1 + \exp\Big(\sum_{l=1}^{t} x_{il} \beta_{kl}\Big)\Big]\Big)    (7)

which is conditional on class k for each individual y_i. The unconditional probability density function of Y_i can therefore be expressed in a finite mixture form (see Everitt and Hand [10]) with

f(y_i | \alpha, \beta) = \sum_{k=1}^{c} \alpha_k f_{i|k}(y_i | \beta_k)    (8)

under the constraint \sum_{k=1}^{c} \alpha_k = 1. To estimate the parameters \alpha_k and \beta_k, given y_i and x_{il}, we formulate the likelihood function for \alpha and \beta as follows:

L(\alpha, \beta | Y = y) = \prod_{i=1}^{n} \sum_{k=1}^{c} \alpha_k f_{i|k}(y_i | \beta_k) = \prod_{i=1}^{n} \sum_{k=1}^{c} \alpha_k \exp\Big(y_i \sum_{l=1}^{t} x_{il} \beta_{kl} - \ln\Big[1 + \exp\Big(\sum_{l=1}^{t} x_{il} \beta_{kl}\Big)\Big]\Big).    (9)

The estimates of \alpha and \beta can be obtained by maximizing the likelihood function (9), or the log-likelihood \ln L, with respect to \alpha and \beta, subject to the constraint \sum_{k=1}^{c} \alpha_k = 1. However, directly maximizing the likelihood function (9) or \ln L is complicated and difficult. It is known that the EM algorithm has been used as an effective method for approximating the maximum likelihood estimates of the parameters in latent class models (e.g. Celeux and Govaert [4], Everitt [9]). Next, we will derive the EM algorithm for latent class logistic regression models. Its derivation is similar to DeSarbo and Wedel's [6] mixture likelihood approach for generalized linear models.
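For concreteness, the log of the mixture likelihood (9) can be evaluated directly. The sketch below assumes a single covariate with component parameters betas[k] = (b0, b1); the names and parameter packing are ours, for illustration only:

```python
import math

def mixture_loglik(xs, ys, alphas, betas):
    """Log-likelihood of a logistic regression mixture: sum_i ln sum_k alpha_k f_{i|k}.
    betas[k] = (b0, b1) for a single covariate (illustrative parameterization)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        mix = 0.0
        for a, (b0, b1) in zip(alphas, betas):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # component success probability
            mix += a * (p if y == 1 else 1.0 - p)        # alpha_k * f_{i|k}(y_i)
        ll += math.log(mix)
    return ll
```

Directly maximizing this quantity in (alpha, beta) is awkward because of the sum inside the logarithm, which is exactly why the complete-data EM formulation is used instead.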

In the EM algorithm, we introduce non-observed variables z_{ki} which indicate that if the individual i belongs to latent class k, then z_{ki} = 1 and z_{ki} = 0 otherwise. It is assumed that the z_i are independently and identically distributed (iid) according to a multinomial distribution with probabilities \alpha_k:

f(z_i | \alpha) = \prod_{k=1}^{c} \alpha_k^{z_{ki}}


where the vector z_i = (z_{1i}, z_{2i}, \ldots, z_{ci})' and \alpha = (\alpha_1, \ldots, \alpha_c)'. In general, the vectors (z_1, \ldots, z_c) are called the latent class variables. We denote the matrix Z = (z_1, \ldots, z_n)' and the data matrix X = (x_1, \ldots, x_t), where x_l = (x_{1l}, x_{2l}, \ldots, x_{nl})'. Furthermore, it is assumed that the y_i given z_i are conditionally independent, and that y_i given z_i has the probability

f(y_i | z_i, \beta) = \prod_{k=1}^{c} [f_{i|k}(y_i | \beta_k)]^{z_{ki}}.

When the z_{ki} are considered as missing data, a complete likelihood function for Y and Z can be formulated as follows:

L(\alpha, \beta | Y, Z) = \prod_{i=1}^{n} \prod_{k=1}^{c} [\alpha_k f_{i|k}(y_i | \beta_k)]^{z_{ki}}.    (10)

The log-likelihood function of equation (10) is

\ln L(\alpha, \beta | Y, Z) = \sum_{k=1}^{c} \sum_{i=1}^{n} z_{ki} \ln \alpha_k + \sum_{k=1}^{c} \sum_{i=1}^{n} z_{ki} \ln f_{i|k}(y_i | \beta_k).    (11)

An iterative algorithm is applied to approximate the maximization of the complete log-likelihood function (11) subject to the constraint \sum_{k=1}^{c} \alpha_k = 1. The algorithm includes two steps, called E (expectation) and M (maximization), so it is called the EM algorithm.

In the E-step, because the latent class variables {z_1, \ldots, z_c} are missing, we use the expectation E(z_{ki} | Y, \alpha, \beta) to estimate z_{ki}. Because E(z_{ki} | Y, \alpha, \beta) = P(z_{ki} = 1 | Y, \alpha, \beta), this expectation is identical to the posterior probability that the individual i arises from class k of the latent class variables, via Bayes' rule. So, once estimates of \alpha_k and \beta_k are obtained, the estimates of z_{ki} can be obtained by

\hat{z}_{ki} = \frac{\alpha_k f_{i|k}(y_i | \beta_k)}{\sum_{k'=1}^{c} \alpha_{k'} f_{i|k'}(y_i | \beta_{k'})}    (12)

where f_{i|k}(y_i | \beta_k) is described in equation (7).

In the M-step, to maximize the expectation of \ln L with respect to \alpha_k and \beta_k, the non-observed latent class variables {z_1, \ldots, z_c} in equation (11) are replaced by their estimates \hat{z}_{ki} of equation (12). Thus,

E(\ln L) = \sum_{k=1}^{c} \sum_{i=1}^{n} \hat{z}_{ki} \ln \alpha_k + \sum_{k=1}^{c} \sum_{i=1}^{n} \hat{z}_{ki} \ln f_{i|k}(y_i | \beta_k)    (13)

where \hat{z}_{ki} is obtained by equation (12). Maximizing equation (13) with respect to \alpha_k, subject to the constraint \sum_{k=1}^{c} \alpha_k = 1, is equivalent to maximizing the Lagrangian

\sum_{k=1}^{c} \sum_{i=1}^{n} \hat{z}_{ki} \ln \alpha_k - \lambda\Big(\sum_{k=1}^{c} \alpha_k - 1\Big)

where \lambda is a Lagrange multiplier. Differentiating with respect to \alpha_k, we get

\frac{\sum_{i=1}^{n} \hat{z}_{ki}}{\alpha_k} - \lambda = 0.

Note that 1 = \sum_{k=1}^{c} \alpha_k = \frac{1}{\lambda} \sum_{k=1}^{c} \sum_{i=1}^{n} \hat{z}_{ki} = \frac{n}{\lambda}. Thus, \lambda = n is obtained. Updating \alpha_k, we get the current estimate of \alpha_k with

\hat{\alpha}_k = \frac{1}{n} \sum_{i=1}^{n} \hat{z}_{ki}.    (14)

Maximizing equation (13) with respect to \beta_k is equivalent to maximizing each of the k expressions

\sum_{i=1}^{n} \hat{z}_{ki} \ln f_{i|k}(y_i | \beta_k).

We then take the derivative \partial E(\ln L) / \partial \beta_{kl},


which is a nonlinear function of \beta_{kl}. There is no analytical solution for the estimate of \beta_k based on the equation \partial E(\ln L) / \partial \beta_k = 0, but we can show that the maximum-likelihood estimate of the parameter \beta_k in the linear predictor \eta_k can be obtained by iteratively reweighted least squares, similar to DeSarbo and Wedel's [6] derivation of a mixture likelihood approach for generalized linear models. Using the chain rule, the derivative with respect to \beta_{kl} is

\frac{\partial \ln f_{i|k}(y_i | \beta_k)}{\partial \beta_{kl}} = \frac{\partial \ln f_{i|k}(y_i | \beta_k)}{\partial p_{ki}} \frac{\partial p_{ki}}{\partial \beta_{kl}}.

In the case of generalized linear models, it is convenient to express \partial p_{ki} / \partial \beta_{kl} as the product

\frac{\partial p_{ki}}{\partial \beta_{kl}} = \frac{\partial p_{ki}}{\partial \eta_{ki}} \frac{\partial \eta_{ki}}{\partial \beta_{kl}}.

Note first that the derivative of equation (6) with respect to p_{ki} is

\frac{\partial \ln f_{i|k}(y_i | \beta_k)}{\partial p_{ki}} = \frac{y_i - p_{ki}}{p_{ki}(1 - p_{ki})}

and, according to equation (3), we have \partial \eta_{ki} / \partial \beta_{kl} = x_{il}. Thus the first derivative with respect to \beta_{kl} is

\frac{\partial E(\ln L)}{\partial \beta_{kl}} = \sum_{i=1}^{n} \hat{z}_{ki} \frac{y_i - p_{ki}}{p_{ki}(1 - p_{ki})} \frac{\partial p_{ki}}{\partial \eta_{ki}} x_{il}

and the second derivative with respect to \beta_{kl'} follows by differentiating once more. The expectation of the second derivative with a negative sign is X' W_k X, where W_k is an n \times n diagonal matrix of weights given by

W_k = diag\Big[\hat{z}_{ki} \Big(\frac{\partial p_{ki}}{\partial \eta_{ki}}\Big)^2 \Big/ \big(p_{ki}(1 - p_{ki})\big)\Big].

According to equation (4), we have \partial p_{ki} / \partial \eta_{ki} = p_{ki}(1 - p_{ki}).


Thus, the diagonal matrix of weights reduces to

W_k = diag[\hat{z}_{ki} \, p_{ki}(1 - p_{ki})].    (15)

Following the general Newton-Raphson procedure, the estimate of \beta_k can be obtained in the following way. The Newton-Raphson process with expected derivatives for a random sample gives

A \, \delta\beta_k = C

where A is a t \times t matrix with

A_{ll'} = \sum_{i=1}^{n} w_{ki} x_{il} x_{il'}

and C is a t \times 1 vector with

C_l = \sum_{i=1}^{n} \hat{z}_{ki} (y_i - p_{ki}) x_{il}.

The matrix A becomes the weighted products matrix of the covariates with weights w_{ki} = \hat{z}_{ki} p_{ki}(1 - p_{ki}), and the new estimate \hat{\beta}_k = \beta_k + \delta\beta_k shall satisfy the equation A\hat{\beta}_k = A\beta_k + A\,\delta\beta_k = A\beta_k + C. We then have

(A\hat{\beta}_k)_l = \sum_{i=1}^{n} w_{ki} x_{il} \eta_{ki} + \sum_{i=1}^{n} \hat{z}_{ki} (y_i - p_{ki}) x_{il}.

Let

\gamma_{ki} = \eta_{ki} + (y_i - p_{ki}) \frac{\partial \eta_{ki}}{\partial p_{ki}} = \ln \frac{p_{ki}}{1 - p_{ki}} + \frac{y_i - p_{ki}}{p_{ki}(1 - p_{ki})}    (16)

where p_{ki} is described in equation (5). We then have (A\hat{\beta}_k)_l = \sum_{i=1}^{n} w_{ki} x_{il} \gamma_{ki}. We write it in matrix form as

A \hat{\beta}_k = X' W_k \gamma_k

where A = X' W_k X. The estimate \hat{\beta}_k is then obtained by

\hat{\beta}_k = (X' W_k X)^{-1} X' W_k \gamma_k    (17)

where all quantities appearing on the right are computed using the initial values \beta_k^{(0)} and z_k^{(0)}. Thus the solution of (17) usually requires an iterative calculation. We give initial values \beta_k^{(0)} of \beta_k and z_k^{(0)} of z_k to get W_k^{(0)} and \gamma_k^{(0)}. We then solve (17) to get the solution \beta_k^{(1)} and use it to get W_k^{(1)}, \gamma_k^{(1)}, and so on. We write the iterative solution as

\beta_k^{(s)} = (X' W_k^{(s-1)} X)^{-1} X' W_k^{(s-1)} \gamma_k^{(s-1)},  s = 1, 2, \ldots    (18)

To summarize the above derivations for fitting a latent class logistic regression model, we have the EM algorithm as follows:

EM algorithm

Step 1: Fix 2 \le c \le n and fix any \varepsilon > 0. Give initial values z_k^{(0)} and \beta_k^{(0)} and let s = 1.
Step 2: Compute p_k^{(s-1)} with \beta_k^{(s-1)} by (5).
Step 3: Compute \alpha_k^{(s)} with z_k^{(s-1)} by (14).
Step 4: Compute W_k^{(s-1)} with z_k^{(s-1)} and p_k^{(s-1)} by (15).
Step 5: Compute \gamma_k^{(s-1)} with p_k^{(s-1)} by (16).
Step 6: Update \beta_k^{(s)} with W_k^{(s-1)} and \gamma_k^{(s-1)} by (18).
Step 7: Update z_k^{(s)} with \alpha_k^{(s)} and \beta_k^{(s)} by (12).
Step 8: Compare z_k^{(s)} to z_k^{(s-1)} in a convenient matrix norm. If ||z_k^{(s)} - z_k^{(s-1)}|| < \varepsilon, stop; else let s = s + 1 and return to Step 2.
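As an illustration of the EM iteration summarized above, here is a compact sketch in Python. This is not the authors' FORTRAN implementation: the variable names, random initialization, clipping, and the small ridge term added for numerical stability are our assumptions.

```python
import numpy as np

def em_logistic_mixture(X, y, c=2, n_iter=50, irls_iter=5, tol=1e-6, seed=0):
    """Sketch of the EM algorithm for a latent class logistic regression mixture.
    X: (n, t) covariate matrix; y: (n,) binary responses in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, t = X.shape
    beta = rng.normal(scale=0.1, size=(c, t))
    z = rng.dirichlet(np.ones(c), size=n).T              # (c, n) posterior memberships
    for _ in range(n_iter):
        alpha = z.mean(axis=1)                           # M-step for alpha, cf. eq. (14)
        for k in range(c):                               # M-step for beta_k via IRLS, cf. (15)-(18)
            for _ in range(irls_iter):
                eta = np.clip(X @ beta[k], -30, 30)
                p = 1.0 / (1.0 + np.exp(-eta))
                wk = z[k] * p * (1.0 - p)                # diagonal of W_k, cf. eq. (15)
                gamma = eta + (y - p) / np.clip(p * (1.0 - p), 1e-10, None)  # cf. eq. (16)
                A = X.T @ (wk[:, None] * X) + 1e-8 * np.eye(t)  # small ridge for stability
                beta[k] = np.linalg.solve(A, X.T @ (wk * gamma))  # cf. eq. (17)
        eta_all = np.clip(X @ beta.T, -30, 30)           # E-step: posteriors, cf. eq. (12)
        p_all = 1.0 / (1.0 + np.exp(-eta_all))           # (n, c)
        f = p_all ** y[:, None] * (1.0 - p_all) ** (1.0 - y[:, None])
        num = alpha[None, :] * f + 1e-300                # guard against all-zero rows
        z_new = (num / num.sum(axis=1, keepdims=True)).T
        if np.abs(z_new - z).max() < tol:
            z = z_new
            break
        z = z_new
    return alpha, beta, z
```

Each outer pass performs Steps 2-7 of the EM algorithm; Step 8's matrix-norm comparison is realized here with the max-absolute-difference of memberships.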

4. Fuzzy class logistic regression model and its algorithm

In Section 3, we considered the latent class logistic regression model and its EM algorithm. It uses the latent class variables z_1, \ldots, z_c, which indicate that if the individual i belongs to latent class k, then z_{ki} = 1 and z_{ki} = 0 otherwise. Let B be a data set and {B_1, \ldots, B_c} be a partition of the data set B. In fact, a partition of B into c classes can be represented by the partition {B_1, \ldots, B_c} or, equivalently, by the latent class variables {z_1, \ldots, z_c} such that z_k(y) = 1 if y is in B_k and z_k(y) = 0 if y is not in B_k, for all k = 1, \ldots, c. This is known as clustering B into c classes using {z_1, \ldots, z_c} and is called a hard c-partition of B. Thus, the latent class variables {z_1, \ldots, z_c} are considered to be non-overlapping, crisp class variables in a probability sense. However, in the social and behavioral sciences, some concepts are not well defined and may carry vagueness over the real meaning of terms such as social classes and public opinions, giving overlapping clustering cases. To consider overlapping clustering with vagueness, it is better to extend the latent class variables to fuzzy class variables based on the fuzzy set concept.

The fuzzy set in Zadeh [27] introduced the idea of partial membership, described by a membership function. In a latent class model, the latent class variables {z_1, \ldots, z_c} are indicator functions such that z_k(y) = 1 if the individual y belongs to the latent class k and z_k(y) = 0 otherwise. Consider the fuzzy extension that allows \mu_k(y) to be functions taking values in the interval [0, 1] such that \sum_{k=1}^{c} \mu_k(y) = 1 for all y. Thus, {\mu_1, \ldots, \mu_c} become fuzzy sets on the set B according to Zadeh [27]. Here we call {\mu_1, \ldots, \mu_c} the fuzzy class variables, which are the fuzzy extension of the latent class variables {z_1, \ldots, z_c}. In fact, we extend a hard c-partition {z_1, \ldots, z_c} to a fuzzy c-partition {\mu_1, \ldots, \mu_c} of B. In the fuzzy clustering literature, Ruspini [20] first applied the fuzzy c-partition {\mu_1, \ldots, \mu_c} for clustering. Fuzzy clustering has now been widely studied and applied in various areas (see Bezdek et al. [3] and Yang [22]). Next, we shall extend the latent class logistic regression model to the fuzzy class logistic regression model and then derive its algorithm.

Consider a fuzzy c-partition {\mu_1, \ldots, \mu_c}. We now extend the log-likelihood equation (11) of the latent class logistic regression model to the fuzzy class logistic regression model with

X_1(\mu, \alpha, \beta) = \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki} \ln \alpha_k + \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki} \ln f_{i|k}(y_i | \beta_k)    (19)

subject to \sum_{k=1}^{c} \alpha_k = 1 and \sum_{k=1}^{c} \mu_{ki} = 1 for all i = 1, \ldots, n, with \mu_{ki} = \mu_k(y_i) \in [0, 1]. Comparing the optimization problems in (11) and (19), the only difference is that z_{ki} \in \{0, 1\} in (11) but \mu_{ki} \in [0, 1] in (19). Since equations (11) and (19) are linear in z_{ki} and \mu_{ki}, the optimal solutions occur at the end points 0 or 1 in both (11) and (19). Therefore, the optimization problem in (19) is equivalent to that in (11). If we want to make a genuinely fuzzy extension of the optimization of X_1(\mu, \alpha, \beta), it is necessary to increase the power of \mu_{ki} to \mu_{ki}^m, where m > 1 represents the degree of fuzziness. Thus, we make the fuzzy extension of equation (11) to be

X_m(\mu, \alpha, \beta) = \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki}^m \ln \alpha_k + \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki}^m \ln f_{i|k}(y_i | \beta_k).    (20)

Based on the concept of Yang's fuzzy classification maximum likelihood (FCML) [23], we may attach a constant w to the term \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki}^m \ln \alpha_k. Thus, a new optimization objective function is created as

L_{m,w}(\mu, \alpha, \beta) = \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki}^m \ln f_{i|k}(y_i | \beta_k) + w \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki}^m \ln \alpha_k    (21)

where m > 1 and w \ge 0 are fixed constants. We mention that the weighting exponent m is used to handle the fuzziness of the fuzzy c-partition {\mu_1, \ldots, \mu_c}. In general, when m tends to 1, the fuzzy c-partition will tend to the hard c-partition, and when m tends to infinity, {\mu_1, \ldots, \mu_c} will tend to the constant 1/c, i.e. \mu_k = 1/c for all k = 1, \ldots, c. The constant w is used to adjust the bias through the penalty term w \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki}^m \ln \alpha_k. The influence of w on the FCML procedure was studied in Yang and Su [24]. Based on the FCML procedure, a penalized fuzzy c-means (PFCM) clustering algorithm was proposed by Yang [23]. This PFCM has been applied in image segmentation and vector quantization (see Lin et al. [16,17]). The optimality test and parameter selection for w and m of the PFCM algorithm were recently investigated by Yu and Yang [26]. Now, we use this FCML objective function (21) to derive an algorithm for the estimation of a fuzzy class logistic regression model, and call it the FCML algorithm.

The maximization of the objective function L_{m,w}(\mu, \alpha, \beta) with respect to \alpha_k and \mu_{ki}, subject to the constraints \sum_{k=1}^{c} \alpha_k = 1 and \sum_{k=1}^{c} \mu_{ki} = 1, can be obtained by the Lagrange method. Let d_{ki} = \ln f_{i|k}(y_i | \beta_k) + w \ln \alpha_k. Then the Lagrangian H_{m,w}(\mu, \alpha, \beta, \lambda_1, \lambda_2) of L_{m,w}(\mu, \alpha, \beta) should be

H_{m,w}(\mu, \alpha, \beta, \lambda_1, \lambda_2) = \sum_{k=1}^{c} \sum_{i=1}^{n} \mu_{ki}^m d_{ki} - \lambda_1\Big(\sum_{k=1}^{c} \alpha_k - 1\Big) - \lambda_2\Big(\sum_{k=1}^{c} \mu_{ki} - 1\Big)    (22)

where \lambda_1 and \lambda_2 are the Lagrange multipliers. We take the first derivatives of the Lagrangian H_{m,w}(\mu, \alpha, \beta, \lambda_1, \lambda_2) with respect to \alpha_k and \mu_{ki} and set them to 0. We have

w \sum_{i=1}^{n} \mu_{ki}^m / \alpha_k - \lambda_1 = 0,    (23)

m \mu_{ki}^{m-1} d_{ki} - \lambda_2 = 0,    (24)

\sum_{k=1}^{c} \alpha_k = 1 and \sum_{k=1}^{c} \mu_{ki} = 1.    (25)

Based on equations (23)-(25), we get the estimates \hat{\alpha}_k and \hat{\mu}_{ki} as follows:

\hat{\alpha}_k = \frac{\sum_{i=1}^{n} \mu_{ki}^m}{\sum_{k'=1}^{c} \sum_{i=1}^{n} \mu_{k'i}^m},    (26)

\hat{\mu}_{ki} = \Big[\sum_{k'=1}^{c} \Big(\frac{d_{ki}}{d_{k'i}}\Big)^{1/(m-1)}\Big]^{-1},  i = 1, \ldots, n,  k = 1, \ldots, c.    (27)

Maximizing (21) with respect to \beta_k is equivalent to maximizing each of the k expressions

\sum_{i=1}^{n} \mu_{ki}^m \ln f_{i|k}(y_i | \beta_k).


Taking the derivatives with respect to \beta_{kl} and setting them to zero, we have

\sum_{i=1}^{n} \mu_{ki}^m \frac{\partial \ln f_{i|k}(y_i | \beta_k)}{\partial \beta_{kl}} = 0.

Similar to the derivation in Section 3, the first derivative with respect to \beta_{kl} is

\sum_{i=1}^{n} \mu_{ki}^m \frac{y_i - p_{ki}}{p_{ki}(1 - p_{ki})} \frac{\partial p_{ki}}{\partial \eta_{ki}} x_{il}

and the second derivative with respect to \beta_{kl'} follows by differentiating once more. The expectation of the second derivative with a negative sign is X' W_k X, where W_k is an n \times n diagonal matrix of weights given by

W_k = diag[\mu_{ki}^m \, p_{ki}(1 - p_{ki})].    (28)

Following the lines of the general Newton-Raphson procedure, we can obtain the parameter estimates as follows. Given the initial estimates \beta_k^{(0)}, we may compute the vectors p_k^{(0)} and \eta_k^{(0)}. Using these values, the adjusted dependent variate \gamma_k is defined with components

\gamma_{ki} = \eta_{ki} + (y_i - p_{ki}) \frac{\partial \eta_{ki}}{\partial p_{ki}} = \ln \frac{p_{ki}}{1 - p_{ki}} + \frac{y_i - p_{ki}}{p_{ki}(1 - p_{ki})}    (29)

where p_{ki} is described in equation (5), and the estimate satisfies the equation

X' W_k X \hat{\beta}_k = X' W_k \gamma_k.

We then obtain the estimate \hat{\beta}_k with

\hat{\beta}_k = (X' W_k X)^{-1} X' W_k \gamma_k    (30)

where all quantities appearing on the right are computed using the initial values \beta_k^{(0)} and \mu_k^{(0)}. Thus the solution of (30) usually requires an iterative calculation. We give initial values \beta_k^{(0)} of \beta_k and \mu_k^{(0)} of \mu_k to get W_k^{(0)} and \gamma_k^{(0)}. We then solve (30) to get the solution \beta_k^{(1)} and use it to get W_k^{(1)}, \gamma_k^{(1)}, and so on. We write the iterative solution as follows:

\beta_k^{(s)} = (X' W_k^{(s-1)} X)^{-1} X' W_k^{(s-1)} \gamma_k^{(s-1)},  s = 1, 2, \ldots    (31)

To summarize the proposed algorithm for fitting fuzzy class logistic regression models, we have the FCML algorithm as follows:

FCML algorithm

Step 1: Fix 2 \le c \le n and fix any \varepsilon > 0. Give initial values \mu_k^{(0)} and \beta_k^{(0)} and let s = 1.
Step 2: Compute p_k^{(s-1)} with \beta_k^{(s-1)} by (5).
Step 3: Compute \alpha_k^{(s)} with \mu_k^{(s-1)} by (26).
Step 4: Compute W_k^{(s-1)} with \mu_k^{(s-1)} and p_k^{(s-1)} by (28).
Step 5: Compute \gamma_k^{(s-1)} with p_k^{(s-1)} by (29).
Step 6: Update \beta_k^{(s)} with W_k^{(s-1)} and \gamma_k^{(s-1)} by (31).
Step 7: Update \mu_k^{(s)} with \alpha_k^{(s)} and \beta_k^{(s)} by (27).
Step 8: Compare \mu_k^{(s)} to \mu_k^{(s-1)} in a convenient matrix norm. If ||\mu_k^{(s)} - \mu_k^{(s-1)}|| < \varepsilon, stop; else let s = s + 1 and return to Step 2.
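The FCML iteration can likewise be sketched in Python. This is our illustrative reading, not the authors' implementation: we assume FCM-style membership updates in which d_ki = -(ln f_{i|k} + w ln alpha_k) is treated as a positive dissimilarity, and we add small numerical guards and a ridge term of our own.

```python
import numpy as np

def fcml_logistic_mixture(X, y, c=2, m=2.0, w=0.0, n_iter=50, irls_iter=5, tol=1e-6, seed=0):
    """Sketch of the FCML iteration for a fuzzy class logistic regression model.
    X: (n, t) covariates; y: (n,) binary responses; m: fuzziness; w: penalty scale."""
    rng = np.random.default_rng(seed)
    n, t = X.shape
    beta = rng.normal(scale=0.1, size=(c, t))
    mu = rng.dirichlet(np.ones(c), size=n).T                 # (c, n) fuzzy memberships
    for _ in range(n_iter):
        um = mu ** m
        alpha = um.sum(axis=1) / um.sum()                    # mixing proportions
        for k in range(c):                                   # beta_k via fuzzily weighted IRLS
            for _ in range(irls_iter):
                eta = np.clip(X @ beta[k], -30, 30)
                p = 1.0 / (1.0 + np.exp(-eta))
                wk = um[k] * p * (1.0 - p)                   # fuzzy analogue of the W_k weights
                gamma = eta + (y - p) / np.clip(p * (1.0 - p), 1e-10, None)
                A = X.T @ (wk[:, None] * X) + 1e-8 * np.eye(t)  # small ridge for stability
                beta[k] = np.linalg.solve(A, X.T @ (wk * gamma))
        eta_all = np.clip(X @ beta.T, -30, 30)               # (n, c)
        logf = y[:, None] * eta_all - np.log1p(np.exp(eta_all))  # Bernoulli log-density
        d = -(logf.T + w * np.log(alpha)[:, None]) + 1e-10   # (c, n) positive dissimilarities
        inv = d ** (-1.0 / (m - 1.0))                        # FCM-style membership update
        mu_new = inv / inv.sum(axis=0, keepdims=True)
        if np.abs(mu_new - mu).max() < tol:
            mu = mu_new
            break
        mu = mu_new
    return alpha, beta, mu
```

With m = 2 and w = 0, as used in Examples 1 and 2, the membership update reduces to normalizing the reciprocals of the negative log-densities.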

5. Numerical results

In this section, we generate numerical data from a mixture of two logistic regression models using FORTRAN and apply the EM and FCML algorithms to these data sets. In general, m = 2 is recommended for FCML (see [16, 17, 23]). Thus, we choose m = 2 in all simulations. For the constant w in FCML, we choose w = 0, with no penalty term, for Examples 1 and 2. On the basis of the accuracy criterion, we make numerical comparisons of the EM and FCML algorithms in these two examples. The accuracy criterion is measured by the mean squared error (MSE), which is the average of the squared errors between the estimated and true parameters. Since the constant w controls the penalty term, we take w = 0 and w = 0.3 to compare its effect on FCML in Example 3.

Let Ber(p) denote a Bernoulli distribution with proportion p. We consider a mixture of two logistic regression models with

f(y | x) = \alpha \, Ber(p_1(x)) + (1 - \alpha) \, Ber(p_2(x)),  p_k(x) = \frac{\exp(\beta_{0k} + x\beta_{1k})}{1 + \exp(\beta_{0k} + x\beta_{1k})},  k = 1, 2.    (32)

Let the random samples be drawn from the mixture distribution (32). We use FORTRAN to get the random samples (x_i, y_i). To make the log odds \ln(p/(1-p)) available, we first fix the upper and lower bounds of the probability p, for example, 0.05 \le p \le 0.95. We then input the given values of the parameters \beta_{01}, \beta_{11}, \beta_{02}, \beta_{12}. According to the log odds transformation

\ln \frac{p_k(x)}{1 - p_k(x)} = \beta_{0k} + x\beta_{1k},  k = 1, 2,

we can derive the upper and lower bounds of the explanatory variable x. We set x_U as the upper bound and x_L as the lower bound of x. We take random sample data x_1, \ldots, x_q from the uniform distribution U(x_L, x_U). For each given x_i, we produce a random sample z_{i1}, \ldots, z_{ir} from U(0, 1) such that, if z_{ij} < \alpha, we choose model 1 with Ber(p_1(x_i)); otherwise we choose model 2 with Ber(p_2(x_i)). From the chosen Bernoulli distribution, we produce a sample point y_{ij} using the IMSL library in FORTRAN. Thus, we generate the random sample data (x_1, y_{11}), \ldots, (x_1, y_{1r}), \ldots, (x_q, y_{q1}), \ldots, (x_q, y_{qr}) with a total sample size n = q \times r from the logistic regression mixture model (32).
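The sampling scheme just described can be sketched as follows. This is a simplified stand-in for the authors' FORTRAN/IMSL routine; the function name and parameter packing (beta_k as a pair (b0_k, b1_k)) are our assumptions:

```python
import math
import random

def sample_mixture(q, r, alpha, beta1, beta2, x_low, x_high, seed=0):
    """Draw q covariate values with r binary responses each from a two-class
    logistic regression mixture; component 1 is chosen with probability alpha."""
    rng = random.Random(seed)
    data = []
    for _ in range(q):
        x = rng.uniform(x_low, x_high)                        # x_i ~ U(x_L, x_U)
        for _ in range(r):
            b0, b1 = beta1 if rng.random() < alpha else beta2  # z_ij < alpha picks model 1
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))        # logistic success probability
            data.append((x, 1 if rng.random() < p else 0))    # Bernoulli draw y_ij
    return data
```

For example, q = 40 and r = 10 reproduce the total sample size n = q x r = 400 used in the experiments.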

To demonstrate the accuracy of parameter estimation for logistic regression mixture models, we use the MSE criterion in three examples. In all our simulations, we generate 5 sets of random samples {(x_1, y_{11}), \ldots, (x_1, y_{1r}), \ldots, (x_q, y_{qr})} with a sample size n = q \times r = 400 for each given test. We run each sample with 10 groups of different initial values and then calculate MSEs based on all 5 sets with 10 groups of parameter estimates. On the basis of these MSEs with different given parameters, we can see the accuracy of the EM and FCML algorithms for different logistic regression mixture models. These numerical results and comparisons are presented in the following examples.
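The MSE criterion used in these comparisons can be sketched as follows (our reading of the definition: one MSE per parameter, averaged over the repeated estimates):

```python
def mse(estimates, true_value):
    """MSE accuracy criterion: average squared deviation of repeated
    parameter estimates from the true parameter value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)
```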

Example 1. We consider four different sets of parameters {\beta_{01}, \beta_{11}, \beta_{02}, \beta_{12}} with two different mixing proportions \alpha, as shown in Table 1. In each of the tests A1 ~ D1 and A2 ~ D2, we take 5 sets of random samples, each sample having 400 observations.

Table 1. Different tests A1 ~ D1 and A2 ~ D2 for different parameters \beta and mixing proportions \alpha

\beta_{01}  \beta_{11}  \beta_{02}  \beta_{12}  \alpha = 0.5  \alpha = 0.6
5           1           5           -1          A1            A2
2           1           3           -2          B1            B2
4           2           4           -2          C1            C2
5           3           5           -3          D1            D2

We implement the EM and FCML algorithms for these random samples and use 10 different initial values for each random sample. We then compute the MSEs of these 50 parameter estimates for each given test. The resulting MSEs of the parameter estimates are shown in Tables 2 and 3. We mention that we implement FCML with no penalty term (w = 0) in this example. We know that the penalty term involving \ln \alpha_k can adjust the bias and improve the accuracy of FCML. We will present this improvement of FCML in Example 3. To simplify the comparisons, we only compare FCML with w = 0 to the EM algorithm. According to the results in Tables 2 and 3, we find that EM and FCML with m = 2 and w = 0 (i.e. no penalty term) have almost the same accuracy.


776 M.S. Yang & H.-M. Chen

Table 2. MSE of parameter estimation using EM and FCML for tests A1, B1, C1 and D1

            A1                B1                C1                D1
         EM      FCML      EM      FCML      EM      FCML      EM      FCML
  α      0.0001  0.0005    0.0001  0.0001    0.0001  0.0001    0.0001  0.0001
  β01    0.2743  0.6689    0.3214  0.5689    0.3914  0.4676    0.3415  0.1021
  β11    0.0901  0.1561    0.0103  0.0126    0.2249  0.2339    0.3924  0.3328
  β02    0.1024  0.1254    0.5204  0.5568    0.0183  0.0675    0.1385  0.1322
  β12    0.0742  0.1103    0.1318  0.1714    0.1423  0.3743    0.4891  0.4310

Table 3. MSE of parameter estimation using EM and FCML for tests A2, B2, C2 and D2

            A2                B2                C2                D2
         EM      FCML      EM      FCML      EM      FCML      EM      FCML
  α      0.0014  0.0007    0.0010  0.0026    0.0009  0.0001    0.0007  0.0026
  β01    0.4242  0.5239    0.5438  0.4102    0.5435  0.1736    0.9433  0.7539
  β11    0.0049  0.0052    0.0179  0.0196    0.0188  0.0324    0.5479  0.5097
  β02    0.3217  0.0183    0.2088  0.1121    0.5119  0.6138    0.8034  0.2121
  β12    0.0872  0.1665    0.1235  0.1327    0.3412  0.5804    0.6124  0.6213

That is, EM can be considered as producing a fuzzy result exactly as FCML does, with only the degree of fuzziness m and the penalty scale w available in FCML to be varied. On the other hand, when the logistic model has smaller slopes β11 and β12, the MSEs of EM are smaller than those of FCML. However, when the slopes β11 and β12 become larger, the MSEs of FCML become smaller than those of EM. Moreover, we find that larger slopes β11 and β12 magnify the effect of changes in the explanatory variable x, so the corresponding MSEs grow. This phenomenon will also be demonstrated in the next example. Overall, the accuracy of both algorithms is acceptable. Thus, in addition to EM, we may use FCML as another good tool for the estimation of logistic regression mixture models.
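The remark that EM already produces a fuzzy result can be made concrete: the E-step responsibilities are soft class memberships in [0, 1], playing the same role as FCML memberships with m = 2 and w = 0. A hedged sketch of one E-step for a two-class logistic regression mixture (notation and parameter values are ours, for illustration only):

```python
import numpy as np

def e_step(x, y, alpha, beta1, beta2):
    """Posterior responsibility of class 1 for each (x, y) pair
    under a two-class logistic regression mixture."""
    def bern(beta):
        p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
        return np.where(y == 1, p, 1.0 - p)  # Bernoulli likelihood of y
    f1, f2 = bern(beta1), bern(beta2)
    w1 = alpha * f1                          # weighted class-1 likelihood
    w2 = (1.0 - alpha) * f2                  # weighted class-2 likelihood
    return w1 / (w1 + w2)                    # soft membership in [0, 1]

x = np.array([-1.0, 0.0, 1.0])
y = np.array([0, 1, 1])
mu = e_step(x, y, 0.5, np.array([5.0, 1.0]), np.array([5.0, -1.0]))
# at x = 0 both components give the same likelihood, so mu = 0.5 there
assert abs(mu[1] - 0.5) < 1e-12
```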

Example 2. In this example, we fix the mixing proportion at α = 0.5 and consider three different parameter sets β with tests E1, E2 and E3 as shown in Table 4. Although we saw in Example 1 that different model parameters affect the MSEs of parameter estimation using EM and FCML, we would like to demonstrate these effects on our proposed FCML algorithm further in this example. The random sampling and simulation procedure here is similar to Example 1. The resulting MSEs of FCML are shown in Table 5. We see that larger slopes β11 and β12 magnify the effect of changes in the explanatory variable x, so the corresponding MSEs grow. However, the accuracy of estimation using our FCML algorithm remains stable.

Table 4. Tests E1 ~ E3 with different parameter β

  Tests   β01   β11   β02   β12
  E1       4     2     4    -2
  E2       4     3     4    -3
  E3       4     5     4    -5



Table 5. MSE of parameter estimation using FCML for tests E1 ~ E3

         E1      E2      E3
  α      0.0001  0.0003  0.0003
  β01    0.4676  0.0013  0.0014
  β11    0.2339  0.8731  0.3839
  β02    0.0675  0.1323  0.1322
  β12    0.3743  0.6537  0.6923

Example 3. To observe the influence of the penalty term w Σ_{k=1}^{c} Σ_{i=1}^{n} μ_ki ln α_k in FCML, we note that Yang and Su [24] studied it for normal mixture parameter estimation. In this example, we demonstrate the influence of the penalty term on our FCML estimation for logistic regression mixture models. We consider two cases, w = 0 and w = 0.3. The tests are shown in Table 6. We mention that w = 0 in FCML is the case without the penalty term, which was used in the numerical comparisons with EM in Example 1. The MSEs of tests F1, F2, G1 and G2 are shown in Table 7. We find that the MSEs with w = 0.3 are almost always smaller and more stable than those with w = 0. This is because the penalty term can adjust the bias of the FCML estimation. Based on further simulation experiments, we suggest choosing w between 0 and 0.7 for logistic regression mixture models.

Table 6. Tests F1 ~ F2 and G1 ~ G2 for different w

   α     β01   β11   β02   β12    w = 0   w = 0.3
  0.5     4     1     3    -1       F1      F2
  0.6     7     1     3    -0.5     G1      G2

Table 7. MSE of parameter estimation using FCML for tests F1, F2, G1 and G2

         F1      F2      G1      G2
  α      0.0013  0.0009  0.0025  0.0012
  β01    0.4315  0.3723  0.6729  0.4347
  β11    0.1321  0.1034  0.0197  0.0199
  β02    0.0834  0.0837  0.0121  0.0113
  β12    0.2143  0.1879  0.0354  0.0253

6. Conclusions

In this paper, cluster analysis based on latent class and fuzzy class logistic regression models for categorical response data was studied. It is known that the EM algorithm has been used for the latent class logistic regression model. In general, latent class variables are treated as crisp, non-overlapping class variables in a probabilistic sense. However, some concepts in the social and behavioral sciences may not be well defined, involving vagueness and overlapping



clustering cases. To consider overlapping clustering with vagueness, we extended latent class variables to fuzzy class variables based on the fuzzy set concept. We then extended the log-likelihood equation of EM for the latent class logistic regression model to the fuzzy log-likelihood equation for the fuzzy class logistic regression model. Based on this extended objective function, we created an FCML algorithm and applied it to the parameter estimation of a fuzzy class logistic regression model. According to the numerical results, the FCML algorithm can be well used as a tool for estimating the parameters of logistic regression mixture models. Van Rees et al. [21] used latent class models to analyze the relation between the response variable of reading behavior and explanatory variables of readers' background such as Age, Education, Gender, Activity and Time Costs. In marketing research, market segmentation has become a central concern of top management and strategic planners (see Datta [5]). Cluster analysis has been considered for market segmentation (see Green and Krieger [11], Hruschka and Natter [12] and Kuo et al. [14]). Applying the proposed fuzzy class models and the FCML algorithm to market segmentation, and to the analysis of the relation between response and explanatory variables in logistic regression mixture models, will be our further research topic.

Acknowledgements

The authors would like to thank the anonymous referees for their helpful comments and suggestions to improve the presentation of the paper. This work was supported in part by the National Science Council of Taiwan under Grant NSC-88-2118-M-033-001.

References

[1] A. Agresti, Categorical Data Analysis, Wiley, New York, 1990.

[2] T.W. Anderson, 'On estimation of parameters in latent structure analysis', Psychometrika, 19, 1-10 (1954).

[3] J.C. Bezdek, J.M. Keller, R. Krishnapuram and N.R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, The Handbooks of Fuzzy Sets Series, Kluwer, Dordrecht, 1999.

[4] G. Celeux and G. Govaert, 'Clustering criteria for discrete data and latent class models', Journal of Classification, 8, 157-176 (1991).

[5] Y. Datta, 'Market segmentation: an integrated framework', Long Range Planning, 29, 797-811 (1996).

[6] W.S. DeSarbo and M. Wedel, 'A mixture likelihood approach for generalized linear models', Journal of Classification, 12, 21-55 (1995).

[7] W.S. DeSarbo, M. Wedel and J.R. Bult, 'A latent class Poisson regression model for heterogeneous count data', Journal of Applied Econometrics, 8, 397-411 (1993).

[8] B.S. Everitt, An Introduction to Latent Variable Models, Chapman and Hall, New York, 1984.

[9] B.S. Everitt, 'A note on parameter estimation for Lazarsfeld's latent class model using the EM algorithm', Multivariate Behavioral Research, 19, 79-89 (1984).

[10] B.S. Everitt and D.J. Hand, Finite Mixture Distributions, Chapman and Hall, New York, 1981.

[11] P.E. Green and A.M. Krieger, 'Alternative approaches to cluster-based market segmentation', Journal of the Market Research Society, 3, 221-239 (1995).

[12] H. Hruschka and M. Natter, 'Comparing performance of feedforward neural nets and k-means for cluster-based market segmentation', European Journal of Operational Research, 114, 346-353 (1999).

[13] J.D. Jobson, Applied Multivariate Data Analysis, Vol. 2: Categorical and Multivariate Methods, Springer-Verlag, New York, 1992.

[14] R.J. Kuo, L.M. Ho and C.M. Hu, 'Cluster analysis in industrial market segmentation through artificial neural network', Computers & Industrial Engineering, 42, 391-399 (2002).

[15] P.F. Lazarsfeld, 'The logical and mathematical foundation of latent structure analysis', in: S.A. Stouffer, L. Guttman, E.A. Suchman, P.F. Lazarsfeld, S.A. Star, J.A. Clausen (Eds.), Measurement and Prediction, Princeton University Press, Princeton, 1950.

[16] J.S. Lin, K.S. Cheng and C.W. Mao, 'Segmentation of multispectral magnetic resonance image using penalized fuzzy competitive learning network', Computers and Biomedical Research, 29, 314-326 (1996).

[17] S.H. Liu and J.S. Lin, 'Vector quantization in DCT domain using possibilistic c-means based on penalized and compensated constraints', Pattern Recognition, 35, 2201-2211 (2002).

[18] P. McCullagh and J.A. Nelder, Generalized Linear Models, Chapman and Hall, New York, 1989.

[19] G.J. McLachlan and K.E. Basford, Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York, 1988.

[20] E.H. Ruspini, 'A new approach to clustering', Information and Control, 15, 22-32 (1969).

[21] K. van Rees, J. Vermunt and M. Verboord, 'Cultural classifications under discussion: latent class analysis of highbrow and lowbrow reading', Poetics, 26, 349-365 (1999).

[22] M.S. Yang, 'A survey of fuzzy clustering', Mathematical and Computer Modelling, 18, 1-16 (1993).

[23] M.S. Yang, 'On a class of fuzzy classification maximum likelihood procedures', Fuzzy Sets and Systems, 57, 365-375 (1993).

[24] M.S. Yang and C.F. Su, 'On parameter estimation for normal mixtures based on fuzzy clustering algorithms', Fuzzy Sets and Systems, 68, 13-28 (1994).

[25] M.S. Yang and N.Y. Yu, 'Estimation of parameters in latent class models using fuzzy clustering algorithms', European Journal of Operational Research, (2003) (Accepted).

[26] J. Yu and M.S. Yang, 'Optimality test for generalized FCM and its application to parameter selection', IEEE Trans. on Fuzzy Systems, (2004) (Accepted).

[27] L.A. Zadeh, 'Fuzzy sets', Information and Control, 8, 338-353 (1965).
