    Estimating Optimal Transformations for Multiple Regression and Correlation

    Author(s): Leo Breiman and Jerome H. Friedman
    Source: Journal of the American Statistical Association, Vol. 80, No. 391 (Sep., 1985), pp. 580-598
    Published by: American Statistical Association
    Stable URL: http://www.jstor.org/stable/2288473

    LEO BREIMAN and JEROME H. FRIEDMAN*

    In regression analysis the response variable $Y$ and the predictor variables $X_1, \ldots, X_p$ are often replaced by functions $\theta(Y)$ and $\phi_1(X_1), \ldots, \phi_p(X_p)$. We discuss a procedure for estimating those functions $\theta^*$ and $\phi_1^*, \ldots, \phi_p^*$ that minimize $e^2 = E\{[\theta(Y) - \sum_{j=1}^p \phi_j(X_j)]^2\}/\mathrm{var}[\theta(Y)]$, given only a sample $\{(y_k, x_{k1}, \ldots, x_{kp}), 1 \le k \le N\}$ and making minimal assumptions concerning the data distribution or the form of the solution functions. For the bivariate case, $p = 1$, $\theta^*$ and $\phi^*$ satisfy $\rho^* = \rho(\theta^*, \phi^*) = \max_{\theta,\phi} \rho[\theta(Y), \phi(X)]$, where $\rho$ is the product moment correlation coefficient and $\rho^*$ is the maximal correlation between $X$ and $Y$. Our procedure thus also provides a method for estimating the maximal correlation between two variables.

    KEY WORDS: Smoothing; ACE.

    1. INTRODUCTION

    Nonlinear transformation of variables is a commonly used practice in regression problems. Two common goals are stabilization of error variance and symmetrization/normalization of error distribution. A more comprehensive goal, and the one we adopt, is to find those transformations that produce the best-fitting additive model. Knowledge of such transformations aids in the interpretation and understanding of the relationship between the response and predictors.

    Let $Y, X_1, \ldots, X_p$ be random variables with $Y$ the response and $X_1, \ldots, X_p$ the predictors. Let $\theta(Y), \phi_1(X_1), \ldots, \phi_p(X_p)$ be arbitrary measurable mean-zero functions of the corresponding random variables. The fraction of variance not explained ($e^2$) by a regression of $\theta(Y)$ on $\sum_{i=1}^p \phi_i(X_i)$ is

    $$e^2(\theta, \phi_1, \ldots, \phi_p) = \frac{E\{[\theta(Y) - \sum_{i=1}^p \phi_i(X_i)]^2\}}{E\theta^2(Y)}. \tag{1.1}$$

    Then define optimal transformations as functions $\theta^*, \phi_1^*, \ldots, \phi_p^*$ that minimize (1.1); that is,

    $$e^2(\theta^*, \phi_1^*, \ldots, \phi_p^*) = \min_{\theta, \phi_1, \ldots, \phi_p} e^2(\theta, \phi_1, \ldots, \phi_p). \tag{1.2}$$

    We show in Section 5 that optimal transformations exist and satisfy a complex system of integral equations. The heart of our approach is that there is a simple iterative algorithm using only bivariate conditional expectations, which converges to an optimal solution. When the conditional expectations are estimated from a finite data set, then use of the algorithm results in estimates of the optimal transformations.

    This method has some powerful characteristics. It can be

    * Leo Breiman is Professor, Department of Statistics, University of California, Berkeley, CA 94720. Jerome H. Friedman is Professor, Department of Statistics and Stanford Linear Accelerator Center, Stanford University, Stanford, CA 94305. This work was supported by Office of Naval Research Contracts N00014-82-K-0054 and N00014-81-K-0340.

    applied in situations where the response or the predictors involve arbitrary mixtures of continuous ordered variables and categorical variables (ordered or unordered). The functions $\theta, \phi_1, \ldots, \phi_p$ are real-valued. If the original variable is categorical, the application of $\theta$ or $\phi_i$ assigns a real-valued score to each of its categorical values.

    The procedure is nonparametric. The optimal transformation estimates are based solely on the data sample $\{(y_k, x_{k1}, \ldots, x_{kp}), 1 \le k \le N\}$ with minimal assumptions concerning the data distribution and the form of the optimal transformations. In particular, we do not require the transformation functions to be from a particular parameterized family or even monotone. (Later we illustrate situations in which the optimal transformations are not monotone.)

    It is applicable to at least three situations:

    1. random designs in regression
    2. autoregressive schemes in stationary ergodic time series
    3. controlled designs in regression.

    In the first of these, we assume the data $(y_k, \mathbf{x}_k)$, $k = 1, \ldots, N$, are independent samples from the distribution of $Y, X_1, \ldots, X_p$. In the second, a stationary mean-zero ergodic time series $X_1, X_2, \ldots$ is assumed, the optimal transformations are defined to be the functions that minimize

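    $$\frac{E[\theta(X_{p+1}) - \sum_{j=1}^p \phi_j(X_{p+1-j})]^2}{E\theta^2(X_{p+1})},$$

    and the data consist of $N + p$ consecutive observations $x_1, \ldots, x_{N+p}$. This is put in a standard data form by defining $y_k = x_{k+p}$, $\mathbf{x}_k = (x_{k+p-1}, \ldots, x_k)$, $k = 1, \ldots, N$.

    In the controlled design situation, a distribution $P(dy \mid \mathbf{x})$ for the response variable $Y$ is specified for every point $\mathbf{x} = (x_1, \ldots, x_p)$ in the design space. The $N$th-order design consists of a specification of $N$ points $\mathbf{x}_1, \ldots, \mathbf{x}_N$ in the design space, and the data consist of these points together with measurements on the response variables $y_1, \ldots, y_N$. The $\{y_k\}$ are assumed independent with $y_k$ drawn from the distribution $P(dy \mid \mathbf{x}_k)$. Denote by $P_N(d\mathbf{x})$ the empirical distribution that gives mass $1/N$ to each of the points $\mathbf{x}_1, \ldots, \mathbf{x}_N$. Assume further that $P_N \to P$, where $P(d\mathbf{x})$ is a probability measure on the design space. Then $P(dy \mid \mathbf{x})$ and $P(d\mathbf{x})$ determine the distribution of random variables $Y, X_1, \ldots, X_p$, and the optimal transformations are defined as in (1.2).

    For the bivariate case, $p = 1$, the optimal transformations $\theta^*(Y)$, $\phi^*(X)$ satisfy

    $$\rho^*(X, Y) = \rho(\theta^*, \phi^*) = \max_{\theta, \phi} \rho[\theta(Y), \phi(X)], \tag{1.3}$$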

    © 1985 American Statistical Association. Journal of the American Statistical Association, September 1985, Vol. 80, No. 391, Theory and Methods.

    This content downloaded from 193.136.144.3 on Thu, 23 Jan 2014 18:34:54 PMAll use subject to JSTOR Terms and Conditions

    where $\rho$ is the product-moment correlation coefficient. The quantity $\rho^*(X, Y)$ is known as the maximal correlation between $X$ and $Y$, and it is used as a general measure of dependence (Gebelein 1947; also see Renyi 1959, Sarmanov 1958a,b, and Lancaster 1958). The maximal correlation has the following properties (Renyi 1959):

    1. $0 \le \rho^*(X, Y) \le 1$.
    2. $\rho^*(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
    3. If there exists a relation of the form $u(X) = v(Y)$, where $u$ and $v$ are Borel-measurable functions with $\mathrm{var}[u(X)] > 0$, then $\rho^*(X, Y) = 1$.

    Therefore, in the bivariate case our procedure can also be regarded as a method for estimating the maximal correlation between two variables, providing as a by-product estimates of the functions $\theta^*$, $\phi^*$ that achieve the maximum.

    In the next section, we describe our procedure for finding optimal transformations using algorithmic notation, deferring mathematical justifications to Section 5 and Appendix A. We next illustrate the procedure in Section 3 by applying it to a simulated data set in which the optimal transformations are known. The estimates are surprisingly good. Our algorithm is also applied to the Boston housing data of Harrison and Rubinfeld (1978) as listed in Belsley et al. (1980). The transformations found by the algorithm generally differ from those applied in the original analysis. Finally, we apply the procedure to a multiple time series arising from an air pollution study. A FORTRAN implementation of our algorithm is available from either author. Section 4 presents a general discussion and relates this procedure to other empirical methods for finding transformations.

    Section 5 and Appendix A provide some theoretical framework for the algorithm. In Section 5, under weak conditions on the joint distribution of $Y, X_1, \ldots, X_p$, it is shown that optimal transformations exist and are generally unique up to a change of sign. The optimal transformations are characterized as the eigenfunctions of a set of linear integral equations whose kernels involve bivariate distributions. We then show that our procedure converges to optimal transformations.

    Appendix A discusses the algorithm as applied to finite data sets. The results are dependent on the type of data smooth employed to estimate the bivariate conditional expectations. Convergence of the algorithm is proven only for a restricted class of data smooths. However, in more than 1,000 applications of the algorithm on a variety of data sets using three different types of data smoothers, only one (very contrived) instance of nonconvergence has been found.

    Appendix A also contains proof of a consistency result. Under fairly general conditions, as the sample size increases the finite data transformations converge in a "weak" sense to the distributional space optimal transformations. The essential condition of the theorem involves the asymptotic consistency of a sequence of data smooths. In the case of iid data there are known results concerning the consistency of various smooths. Stone's (1977) pioneering paper established consistency for k-nearest-neighbor smoothing. Devroye and Wagner (1980) and, independently, Spiegelman and Sacks (1980) gave weak conditions for consistency of kernel smooths. See Stone (1977) and Devroye (1981) for a review of the literature.

    There are no analogous results, however, for stationary ergodic series or controlled designs. To remedy this we show that there are sequences of data smooths that have the requisite properties in all three cases.

    This article is presented in two distinct parts. Sections 1-4 give a fairly nontechnical overview of the method and discuss its application to data. Section 5 and Appendix A are, of necessity, more technical, presenting the theoretical foundation for the procedure.

    There is relevant previous work. Closest in spirit to the ACE algorithm we develop is the MORALS algorithm of Young et al. (1976; also see de Leeuw et al. 1976). It uses an alternating least squares fit, but it restricts transformations on discrete ordered variables to be monotonic and transformations on continuous variables to be linear or polynomial. No theoretical framework for MORALS is given.

    Renyi (1959) gave a proof of the existence of optimal transformations in the bivariate case under conditions similar to ours in the general case. He also derived integral equations satisfied by $\theta^*$ and $\phi^*$ with kernels depending on the bivariate density of $X$ and $Y$ and concentrated on finding solutions assuming this density known. The equations seem generally intractable with only a few known solutions. He did not consider the problem of estimating $\theta^*$, $\phi^*$ from data.

    Kolmogorov (see Sarmanov and Zaharov 1960 and Lancaster 1969) proved that if $Y_1, \ldots, Y_q, X_1, \ldots, X_p$ have a joint normal distribution, then the functions $\theta(Y_1, \ldots, Y_q)$, $\phi(X_1, \ldots, X_p)$ having maximum correlation are linear. It follows from this that in the regression model

    $$\theta(Y) = \sum_{i=1}^p \phi_i(X_i) + Z, \tag{1.4}$$

    if the $\phi_i(X_i)$, $i = 1, \ldots, p$, have a joint normal distribution and $Z$ is an independent $N(0, \sigma^2)$, then the optimal transformations as defined in (1.2) are $\theta, \phi_1, \ldots, \phi_p$. Generally, for a model of the form (1.4) with $Z$ independent of $(X_1, \ldots, X_p)$, the optimal transformations are not equal to $\theta, \phi_1, \ldots, \phi_p$. But in examples with simulated data generated from models of the form (1.4), with non-normal $\{\phi_i(X_i)\}$, the estimated optimal transformations were always close to $\theta, \phi_1, \ldots, \phi_p$.

    Finally, we note the work in a different direction by Kimeldorf et al. (1982), who constructed a linear-programming-type algorithm to find the monotone transformations $\theta(Y)$, $\phi(X)$ that maximize the sample correlation coefficient in the bivariate case $p = 1$.

    2. THE ALGORITHM

    Our procedure for finding $\theta^*, \phi_1^*, \ldots, \phi_p^*$ is iterative. Assume a known distribution for the variables $Y, X_1, \ldots, X_p$. Without loss of generality, let $E\theta^2(Y) = 1$, and assume that all functions have expectation zero.

    To illustrate, we first look at the bivariate case:

    $$e^2(\theta, \phi) = E[\theta(Y) - \phi(X)]^2. \tag{2.1}$$

    Consider the minimization of (2.1) with respect to $\theta(Y)$ for a given function $\phi(X)$, keeping $E\theta^2 = 1$. The solution is

    $$\theta_1(Y) = E[\phi(X) \mid Y]/\|E[\phi(X) \mid Y]\| \tag{2.2}$$

    with $\|\cdot\| \equiv [E(\cdot)^2]^{1/2}$. Next, consider the unrestricted minimization of (2.1) with respect to $\phi(X)$ for a given $\theta(Y)$. The solution is

    $$\phi_1(X) = E[\theta(Y) \mid X]. \tag{2.3}$$

    Equations (2.2) and (2.3) form the basis of an iterative optimization procedure involving alternating conditional expectations (ACE).

    Basic ACE Algorithm

    Set $\theta(Y) = Y/\|Y\|$;
    Iterate until $e^2(\theta, \phi)$ fails to decrease:
        $\phi_1(X) = E[\theta(Y) \mid X]$;
        replace $\phi(X)$ with $\phi_1(X)$;
        $\theta_1(Y) = E[\phi(X) \mid Y]/\|E[\phi(X) \mid Y]\|$;
        replace $\theta(Y)$ with $\theta_1(Y)$;
    End Iteration Loop;
    $\theta$ and $\phi$ are the solutions $\theta^*$ and $\phi^*$;
    End Algorithm.

    This algorithm decreases (2.1) at each step by alternatingly minimizing with respect to one function and holding the other fixed at its previous evaluation. Each iteration (execution of the iteration loop) performs one pair of these single-function minimizations. The process begins with an initial guess for one of the functions ($\theta = Y/\|Y\|$) and ends when a complete iteration pass fails to decrease $e^2$. In Section 5, we prove that the algorithm converges to optimal transformations $\theta^*$, $\phi^*$.

    Now consider the more general case of multiple predictors $X_1, \ldots, X_p$. We proceed in direct analogy with the basic ACE algorithm. We minimize

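    $$e^2(\theta, \phi_1, \ldots, \phi_p) = E\Big[\theta(Y) - \sum_{j=1}^p \phi_j(X_j)\Big]^2, \tag{2.4}$$

    holding $E\theta^2 = 1$, $E\theta = E\phi_1 = \cdots = E\phi_p = 0$, through a series of single-function minimizations involving bivariate conditional expectations. For a given set of functions $\phi_1(X_1), \ldots, \phi_p(X_p)$, minimization of (2.4) with respect to $\theta(Y)$ yields

    $$\theta_1(Y) = E\Big[\sum_{i=1}^p \phi_i(X_i) \,\Big|\, Y\Big] \Big/ \Big\|E\Big[\sum_{i=1}^p \phi_i(X_i) \,\Big|\, Y\Big]\Big\|. \tag{2.5}$$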
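    As a reading aid (this derivation is not part of the original text), the step from (2.4) to (2.5) can be checked in a few lines. Write $S = \sum_i \phi_i(X_i)$ and assume $\|E[S \mid Y]\| > 0$; the symbol $S$ is introduced here only for brevity.

```latex
\begin{align*}
E[\theta(Y) - S]^2
  &= E\theta^2(Y) - 2\,E[\theta(Y)\,S] + E S^2 \\
  &= 1 - 2\,E\{\theta(Y)\,E[S \mid Y]\} + E S^2
     && \text{(using } E\theta^2 = 1 \text{ and conditioning on } Y\text{)} \\
E\{\theta(Y)\,E[S \mid Y]\}
  &\le \|\theta\|\,\|E[S \mid Y]\| = \|E[S \mid Y]\|
     && \text{(Cauchy-Schwarz)}
\end{align*}
```

    The bound is attained when $\theta(Y) = E[S \mid Y]/\|E[S \mid Y]\|$, which is the form given in (2.5); with $p = 1$ it reduces to (2.2).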

    The next step is to minimize (2.4) with respect to $\phi_1(X_1), \ldots, \phi_p(X_p)$, given $\theta(Y)$. This is obtained through another iterative algorithm. Consider the minimization of (2.4) with respect to a single function $\phi_k(X_k)$ for given $\theta(Y)$ and a given set $\phi_1, \ldots, \phi_{k-1}, \phi_{k+1}, \ldots, \phi_p$. The solution is

    $$\phi_{k,1}(X_k) = E\Big[\theta(Y) - \sum_{i \ne k} \phi_i(X_i) \,\Big|\, X_k\Big]. \tag{2.6}$$

    The corresponding iterative algorithm is as follows:

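    Set $\phi_1(X_1), \ldots, \phi_p(X_p) = 0$;
    Iterate until $e^2(\theta, \phi_1, \ldots, \phi_p)$ fails to decrease;
        For $k = 1$ to $p$ Do:
            $\phi_{k,1}(X_k) = E[\theta(Y) - \sum_{i \ne k} \phi_i(X_i) \mid X_k]$;
            replace $\phi_k(X_k)$ with $\phi_{k,1}(X_k)$;
        End For Loop;
    End Iteration Loop;
    $\phi_1, \ldots, \phi_p$ are the solution functions.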

    Each iteration of the inner For loop minimizes $e^2$ (2.4) with respect to the function $\phi_k(X_k)$, $k = 1, \ldots, p$, with all other functions fixed at their previous evaluations (execution of the For loop). The outer loop is iterated until one complete pass over the predictor variables (inner For loop) fails to decrease $e^2$ (2.4).

    Substituting this procedure for the corresponding single-function optimization in the bivariate ACE algorithm gives rise to the full ACE algorithm for minimizing the (2.4) $e^2$.

    ACE Algorithm

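    Set $\theta(Y) = Y/\|Y\|$ and $\phi_1(X_1), \ldots, \phi_p(X_p) = 0$;
    Iterate until $e^2(\theta, \phi_1, \ldots, \phi_p)$ fails to decrease;
        Iterate until $e^2(\theta, \phi_1, \ldots, \phi_p)$ fails to decrease;
            For $k = 1$ to $p$ Do:
                $\phi_{k,1}(X_k) = E[\theta(Y) - \sum_{i \ne k} \phi_i(X_i) \mid X_k]$;
                replace $\phi_k(X_k)$ with $\phi_{k,1}(X_k)$;
            End For Loop;
        End Inner Iteration Loop;
        $\theta_1(Y) = E[\sum_{i=1}^p \phi_i(X_i) \mid Y]/\|E[\sum_{i=1}^p \phi_i(X_i) \mid Y]\|$;
        replace $\theta(Y)$ with $\theta_1(Y)$;
    End Outer Iteration Loop;
    $\theta, \phi_1, \ldots, \phi_p$ are the solutions $\theta^*, \phi_1^*, \ldots, \phi_p^*$;
    End ACE Algorithm.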
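    The ACE loop above can be sketched on a finite sample as follows. This is an illustration under stated assumptions, not the authors' FORTRAN implementation: conditional expectations are replaced by a crude running-mean smoother (the paper uses the "super-smoother"), and the names `smooth` and `ace`, the `span` of 0.3, the iteration caps, and the tolerance are all hypothetical choices made for this sketch.

```python
import numpy as np

def smooth(x, z, span=0.3):
    """Running-mean estimate of E[z | x] at the data points.
    Assumption: a simple stand-in for the paper's supersmoother."""
    n = len(x)
    half = max(1, int(span * n / 2))
    order = np.argsort(x, kind="stable")
    csum = np.concatenate(([0.0], np.cumsum(z[order])))
    idx = np.arange(n)
    lo = np.maximum(idx - half, 0)
    hi = np.minimum(idx + half + 1, n)
    fitted = np.empty(n)
    fitted[order] = (csum[hi] - csum[lo]) / (hi - lo)
    return fitted - fitted.mean()          # keep each transformation mean-zero

def ace(X, y, span=0.3, max_outer=20, max_inner=10, tol=1e-4):
    """Sketch of the ACE iteration for predictors X (n x p) and response y."""
    n, p = X.shape
    theta = y - y.mean()
    theta = theta / np.sqrt(np.mean(theta ** 2))   # theta = Y / ||Y||
    phi = np.zeros((n, p))                          # phi_1 = ... = phi_p = 0

    def e2():
        return np.mean((theta - phi.sum(axis=1)) ** 2)

    outer_prev = e2()
    for _ in range(max_outer):
        inner_prev = e2()
        for _ in range(max_inner):                  # inner loop, update (2.6)
            for k in range(p):
                partial = theta - phi.sum(axis=1) + phi[:, k]
                phi[:, k] = smooth(X[:, k], partial, span)
            inner_now = e2()
            if inner_prev - inner_now < tol:        # e^2 failed to decrease
                break
            inner_prev = inner_now
        new_theta = smooth(y, phi.sum(axis=1), span)          # update (2.5)
        theta = new_theta / np.sqrt(np.mean(new_theta ** 2))
        outer_now = e2()
        if outer_prev - outer_now < tol:            # e^2 failed to decrease
            break
        outer_prev = outer_now
    return theta, phi, e2()
```

    Calling `ace(X, y)` returns the transformed response, an `n x p` array of transformed predictors, and the final sample $e^2$. The bivariate algorithm is the special case `p = 1`. A smoother with better local adaptivity would change the estimates but not the structure of the loops.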

    In Section 5, we prove that the ACE algorithm converges to optimal transformations.

    3. APPLICATIONS

    In the previous section, the ACE algorithm was developed in the context of known distributions. In practice, data distributions are seldom known. Instead, one has a data set $\{(y_k, x_{k1}, \ldots, x_{kp}), 1 \le k \le N\}$ that is presumed to be a sample from $Y, X_1, \ldots, X_p$. The goal is to estimate the optimal transformation functions $\theta(Y), \phi_1(X_1), \ldots, \phi_p(X_p)$ from the data. This can be accomplished by applying the ACE algorithm to the data with the quantities $e^2$, $\|\cdot\|$, and the conditional expectations replaced by suitable estimates. The resulting functions $\hat\theta^*, \hat\phi_1^*, \ldots, \hat\phi_p^*$ are then taken as estimates of the corresponding optimal transformations.

    The estimate for $e^2$ is the usual mean squared error for regression:

    $$\hat e^2(\theta, \phi_1, \ldots, \phi_p) = \frac{1}{N} \sum_{k=1}^N \Big[\theta(y_k) - \sum_{j=1}^p \phi_j(x_{kj})\Big]^2.$$

    If $g(y, x_1, \ldots, x_p)$ is a function defined for all data values, then $\|g\|^2$ is replaced by

    $$\|g\|_N^2 = \frac{1}{N} \sum_{k=1}^N g^2(y_k, x_{k1}, \ldots, x_{kp}).$$

    For the case of categorical variables, the conditional expectation estimates are straightforward: If the data are $\{(x_k, z_k)\}$, $k = 1, \ldots, N$, and $Z$ is categorical, then

    $$\hat E[X \mid Z = z] = \sum_{z_k = z} x_k \Big/ \sum_{z_k = z} 1,$$

    where $X$ is real-valued and the sums are over the subset of observations having (categorical) value $Z = z$. For variables that can assume many ordered values, the estimation is based

    on smoothing techniques. Such procedures have been the subject of considerable study (e.g., see Gasser and Rosenblatt 1979, Cleveland 1979, and Craven and Wahba 1979). Since the smoother is repeatedly applied in the algorithm, high speed is desirable, as well as adaptability to local curvature. We use a smoother employing local linear fits with varying window width determined by local cross-validation (the "super-smoother"; see Friedman and Stuetzle 1982).

    The algorithm evaluates $\hat\theta^*, \hat\phi_1^*, \ldots, \hat\phi_p^*$ at all the corresponding data values; that is, $\hat\theta^*(y)$ is evaluated at the set of data values $\{y_k\}$, $k = 1, \ldots, N$. The simplest way to understand the shape of the transformations is by means of a plot of the function versus the corresponding data values, that is, through the plots of $\hat\theta^*(y_k)$ versus $y_k$ and $\hat\phi_1^*, \ldots, \hat\phi_p^*$ versus the data values of $x_1, \ldots, x_p$, respectively.

    In this section, we illustrate the ACE procedure by applying it to various data sets. In order to evaluate performance on finite samples, the procedure is first applied to simulated data for which the optimal transformations are known. We next apply it to the Boston housing data of Harrison and Rubinfeld (1978) as listed in Belsley et al. (1980), contrasting the ACE transformations with those used in the original analysis. For our last example, we apply the ACE procedure to a multiple time series to study the relation between air pollution (ozone) and various meteorological quantities.

    Our first example consists of 200 bivariate observations $\{(y_k, x_k), 1 \le k \le 200\}$ generated from the model

    $$y_k = \exp[x_k^3 + \epsilon_k],$$

    with the $x_k^3$ and the $\epsilon_k$ drawn independently from a standard normal distribution $N(0, 1)$. Figure 1(a) shows a scatterplot of these data. Figures 1(b)-1(d) show the results of applying the ACE algorithm to the data. The estimated optimal transformation $\hat\theta^*(y)$ is shown in Figure 1(b)'s plot of $\hat\theta^*(y_k)$ versus $y_k$, $1 \le k \le 200$. Figure 1(c) is a plot of $\hat\phi^*(x_k)$ versus $x_k$. These plots suggest the transformations $\theta(y) = \log(y)$ and $\phi(x) = x^3$, which are optimal for the parent distribution. Figure 1(d) is a plot of $\hat\theta^*(y_k)$ versus $\hat\phi^*(x_k)$. This plot indicates a more linear relation between the transformed variables than that between the untransformed ones.

    The next issue we address is how much the algorithm overfits the data due to the repeated smoothings, resulting in inflated estimates of the maximal correlation $\rho^*$ and of $R^{*2} = 1 - e^{*2}$. The answer, in the simulated data sets we have generated, is surprisingly little.

    To illustrate this, we contrast two estimates of $\rho^*$ and $R^{*2}$

    using the above model. The known optimal transformations are $\theta(Y) = \log Y$, $\phi(X) = X^3$. Therefore, we define the direct estimate $\hat\rho$ for $\rho^*$, given any data set generated as above, by the sample correlation between $\log y_k$ and $x_k^3$ and set $\hat R^2 = \hat\rho^2$. The ACE algorithm produces the estimates

    $$\hat\rho^* = \frac{1}{N} \sum_{k=1}^N \hat\theta^*(y_k)\,\hat\phi^*(x_k)$$

    and $\hat R^{*2} = 1 - \hat e^{*2}$. In this model $\rho^* = .707$ and $R^{*2} = .5$. For 100 data sets, each of size 200, generated from the above model, the means and standard deviations of the $\rho^*$ estimates are in Table 1. The means and standard deviations of the $R^{*2}$ estimates are in Table 2.

    We also computed the differences $\hat\rho^* - \hat\rho$ and $\hat R^{*2} - \hat R^2$ for the 100 data sets. The means and standard deviations are in Table 3.

    The preceding experiment was duplicated for smaller sample size $N = 100$. In this case we obtained the differences in Table 4.

    Figure 1. First Example: (a) Original Data; (b) Transform on y; (c) Transform on x; (d) Transformed Data.

    Table 1. Comparison of $\rho^*$ Estimates

    Estimate            Mean    Standard Deviation
    $\rho^*$, direct    .700    .034
    $\rho^*$, ACE       .709    .036

    We next show an application of the procedure to simulated data generated from the model

    $$y_k = \exp[\sin(2\pi x_k) + \epsilon_k/2], \quad 1 \le k \le 200,$$

    with the $x_k$ sampled from a uniform distribution $U(0, 1)$ and the $\epsilon_k$ drawn independently of the $x_k$ from a standard normal distribution $N(0, 1)$. Figure 2(a) shows a scatterplot of these data. Figures 2(b) and 2(c) show the optimal transformation estimates $\hat\theta^*(y)$ and $\hat\phi^*(x)$. Although $\log(y)$ and $\sin(2\pi x)$ are not the optimal transformations for this model [owing to the non-normal distribution of $\sin(2\pi x)$], these transformations are still clearly suggested by the resulting estimates.

    Our next example consists of a sample of 200 triples $\{(y_k, x_{k1}, x_{k2}), 1 \le k \le 200\}$ drawn from the model $Y = X_1 X_2$, with $X_1$ and $X_2$ generated independently from a uniform distribution $U(-1, 1)$. Note that $\theta(Y) = \log(Y)$ and $\phi_j(X_j) = \log X_j$ ($j = 1, 2$) cannot be solutions here, since $Y$, $X_1$, and $X_2$ all assume negative values. Figure 3(a) shows a plot of $\hat\theta^*(y_k)$ versus $y_k$, and Figures 3(b) and 3(c) show corresponding plots of $\hat\phi_1^*(x_{k1})$ and $\hat\phi_2^*(x_{k2})$ ($1 \le k \le 200$). All three solution transformation functions are seen to be double-valued. The optimal transformations for this problem are $\theta^*(Y) = \log|Y|$ and $\phi_j^*(X_j) = \log|X_j|$ ($j = 1, 2$). The estimates clearly reflect this structure except near the origin, where the smoother cannot reproduce the infinite discontinuity in the derivative.
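    The claim about the third example can be checked numerically without running ACE. The sketch below simulates $Y = X_1 X_2$ and compares the sample $e^2$ of a linear fit on the raw variables with the sample $e^2$ of the additive fit under the $\log|\cdot|$ transformations. The helper names `additive_e2` and `linear_e2`, and the seed, are assumptions made for this illustration.

```python
import numpy as np

def additive_e2(theta, phis):
    """Sample e^2 = mean[(theta - sum_j phi_j)^2] / mean[theta^2],
    after centering every transformed variable."""
    theta = theta - theta.mean()
    fit = sum(phi - phi.mean() for phi in phis)
    return np.mean((theta - fit) ** 2) / np.mean(theta ** 2)

def linear_e2(y, X):
    """Fraction of variance left unexplained by ordinary least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2) / np.var(y)

rng = np.random.default_rng(1)
x1 = rng.uniform(-1.0, 1.0, 200)
x2 = rng.uniform(-1.0, 1.0, 200)
y = x1 * x2

# Untransformed variables: a linear fit explains almost nothing.
e2_lin = linear_e2(y, np.column_stack([x1, x2]))
# log|Y| = log|X1| + log|X2| holds exactly, so the additive fit is perfect.
e2_log = additive_e2(np.log(np.abs(y)),
                     [np.log(np.abs(x1)), np.log(np.abs(x2))])
print(e2_lin, e2_log)
```

    The second quantity is zero up to rounding because the log-absolute-value identity is exact, which is why these transformations attain the minimum of (1.1) for this model.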

    Table 2. Comparison of $R^{*2}$ Estimates

    Estimate            Mean    Standard Deviation
    $R^{*2}$, direct    .492    .047
    $R^{*2}$, ACE       .503    .050

    Table 3. Estimate Differences

    Estimate                      Mean    Standard Deviation
    $\hat\rho^* - \hat\rho$       .001    .015
    $\hat R^{*2} - \hat R^2$      .012    .022

    This example illustrates that the ACE algorithm is able to produce nonmonotonic estimates for both response and predictor transformations.

    For our next example, we apply the ACE algorithm to the Boston housing market data of Harrison and Rubinfeld (1978). A complete listing of these data appears in Belsley et al. (1980). Harrison and Rubinfeld used these data to estimate marginal air pollution damages as revealed in the housing market. Central to their analysis was a housing value equation that relates the median value of owner-occupied homes in each of the 506 census tracts in the Boston Standard Metropolitan Statistical Area to air pollution (as reflected in concentration of nitrogen oxides) and to 12 other variables that are thought to affect housing prices. This equation was estimated by trying to determine the best-fitting functional form of housing price on these 13 variables. By experimenting with a number of possible transformations of the 14 variables (response and 13 predictors), Harrison and Rubinfeld settled on an equation of the form

    $$\log(\mathrm{MV}) = a_1 + a_2(\mathrm{RM})^2 + a_3\,\mathrm{AGE} + a_4\log(\mathrm{DIS}) + a_5\log(\mathrm{RAD}) + a_6\,\mathrm{TAX} + a_7\,\mathrm{PTRATIO} + a_8(\mathrm{B} - .63)^2 + a_9\log(\mathrm{LSTAT}) + a_{10}\,\mathrm{CRIM} + a_{11}\,\mathrm{ZN} + a_{12}\,\mathrm{INDUS} + a_{13}\,\mathrm{CHAS} + a_{14}(\mathrm{NOX})^p + \epsilon.$$
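    To make the structure of this equation concrete, the following sketch builds its 14 regressor columns from a dictionary of arrays and fits the coefficients by least squares. It assumes the caller supplies arrays keyed by the variable names in the equation; the function names `basic_equation_design` and `fit_basic_equation` are hypothetical, and no real housing data are included here.

```python
import numpy as np

def basic_equation_design(d, p=2.0):
    """Regressor columns of the housing equation, in the order a_1 ... a_14.
    `d` maps variable names (RM, AGE, ...) to 1-D arrays of equal length."""
    n = len(d["RM"])
    cols = [
        np.ones(n),               # a_1: intercept
        d["RM"] ** 2,             # a_2
        d["AGE"],                 # a_3
        np.log(d["DIS"]),         # a_4
        np.log(d["RAD"]),         # a_5
        d["TAX"],                 # a_6
        d["PTRATIO"],             # a_7
        (d["B"] - 0.63) ** 2,     # a_8
        np.log(d["LSTAT"]),       # a_9
        d["CRIM"],                # a_10
        d["ZN"],                  # a_11
        d["INDUS"],               # a_12
        d["CHAS"],                # a_13
        d["NOX"] ** p,            # a_14, with exponent p
    ]
    return np.column_stack(cols)

def fit_basic_equation(d, p=2.0):
    """Least squares estimate of (a_1, ..., a_14) for response log(MV)."""
    A = basic_equation_design(d, p)
    coef, *_ = np.linalg.lstsq(A, np.log(d["MV"]), rcond=None)
    return coef
```

    Repeating the fit over a grid of `p` values and keeping the one with the smallest residual sum of squares would be one way to carry out a grid search over the exponent.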

    Table 4. Estimate Differences, Sample Size 100

    Estimate                      Mean    Standard Deviation
    $\hat\rho^* - \hat\rho$       .029    .034
    $\hat R^{*2} - \hat R^2$      .042    .051

    A brief description of each variable is given in Appendix B. (For a more complete description, see Harrison and Rubinfeld 1978, table 4.) The coefficients $a_1, \ldots, a_{14}$ were determined by a least squares fit to measurements of the 14 variables for the 506 census tracts. The best value for the exponent $p$ was found to be 2.0, by a numerical optimization (grid search). This "basic equation" was used to generate estimates for the willingness to pay for and the marginal benefits of clean air. Harrison and Rubinfeld (1978) noted that the results are highly sensitive to the particular specification of the form of the housing price equation.

    We applied the ACE algorithm to the transformed measurements $(y', x'_1, \ldots, x'_{13})$ (using $p = 2$ for NOX) appearing in the basic equation. To the extent that these transformations are close to the optimal ones, the algorithm will produce almost linear functions. Departures from linearity indicate transformations that can improve the quality of the fit.

    In this (and the following) example we apply the procedure in a forward stepwise manner. For the first pass we consider

    the 13 bivariate problems ($p = 1$) involving the response $y'$ with each of the predictor variables $x'_k$ ($1 \le k \le 13$) in turn. The predictor that maximizes $R^2[\theta(y'), \phi_k(x'_k)]$ is included in the model. The second pass (over the remaining 12 predictors) includes the 12 trivariate problems ($p = 2$) involving $y'$, $x'_{k_1}$, $x'_k$ ($k \ne k_1$). The predictor that maximizes $R^2[\theta(y'), \phi_{k_1}(x'_{k_1}), \phi_k(x'_k)]$ is included in the model. This forward selection procedure is continued until the best predictor of the next pass increases the $R^2$ of the previous pass by less than .01.

    Figure 2. Second Example: (a) Original Data; (b) Transformed y; (c) Transformed x.

    Figure 3. Third Example: (a) Transformed y; (b) Transformed x1; (c) Transformed x2.
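    The forward stepwise procedure described above can be sketched as a generic selection loop. In the paper the score of a candidate set of predictors is the $R^2$ of an ACE fit; to keep this sketch short and self-contained, the default score below is the $R^2$ of an ordinary least squares fit, which is a swapped-in stand-in and not the authors' choice. The names `linear_r2`, `forward_stepwise`, and `min_gain` are hypothetical.

```python
import numpy as np

def linear_r2(X, y):
    """Stand-in score (assumption): R^2 of an ordinary least squares fit.
    The paper scores each candidate model by the R^2 of an ACE fit instead."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def forward_stepwise(X, y, score=linear_r2, min_gain=0.01):
    """Add one predictor per pass (the one giving the largest R^2) and stop
    when the best candidate raises R^2 by less than `min_gain`."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every model formed by adding one remaining predictor.
        trials = [(score(X[:, selected + [k]], y), k) for k in remaining]
        r2, k = max(trials)
        if r2 - best_r2 < min_gain:
            break
        selected.append(k)
        remaining.remove(k)
        best_r2 = r2
    return selected, best_r2
```

    Passing a function that runs an ACE fit on the given columns and returns $1 - e^2$ as `score` would turn this into the procedure described in the text; the .01 threshold corresponds to `min_gain`.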

    The resulting final model involved four predictors and has an $R^2$ of .89. Applying ACE simultaneously to all 13 predictors results in an increase in $R^2$ of only .02.

    Figure 4(a) shows a plot of the solution response transformation $\hat\theta(y')$. This function is seen to have a positive curvature for central values of $y'$, connecting two straight line segments of different slope on either side. This suggests that the logarithmic transformation may be too severe. Figure 4(b) shows the transformation $\hat\theta(y)$ resulting when the (forward stepwise) ACE algorithm is applied to the original untransformed census measurements. (The same predictor variable set appears in this model.) This analysis indicates that, if anything, a mild transformation, involving positive curvature, is most appropriate for the response variable.

    Figures 4(c)-4(f) show the ACE transformations $\hat\phi_{k_1}(x'_{k_1}), \ldots, \hat\phi_{k_4}(x'_{k_4})$ for the (transformed) predictor variables $x'$ appearing in the final model. The standard deviation $\sigma(\hat\phi_j^*)$ is indicated in each graph. This provides a measure of how strongly each $\hat\phi_j^*(x_j)$ enters into the model for $\hat\theta^*(y')$. [Note that $\sigma(\hat\theta^*) = 1$.] The two terms that enter most strongly involve the number of rooms squared [Figure 4(c)] and the logarithm of the fraction of population that is of lower status [Figure 4(d)]. The nearly linear shape of the latter transformation suggests that the original logarithmic transformation was appropriate for this variable. The transformation on the number of rooms squared variable is far from linear, however, indicating that a simple quadratic does not adequately capture its relationship to housing value. For fewer than six rooms, housing value is roughly independent of room number, whereas for larger values there is a strong increasing linear dependence. The remaining two variables that enter into this model are pupil-teacher ratio and property tax rate. The solution transformation for the former, Figure 4(e), is seen to be approximately linear whereas that for the latter, Figure 4(f), has considerable nonlinear structure. For tax rates of up to $320, housing price seems to fall rapidly with increasing tax, whereas for larger rates the association is roughly constant.

    Figure 4. Boston Housing Data: (a) Transformed log(MV); (b) Transformed MV; (c) Transformed RM² (σ = .492); (d) Transformed log(LSTAT) (σ = .417); (e) Transformed PTRatio (σ = .147); (f) Transformed Tax (σ = .122); (g) Transformed NOX² (σ = .09); (h) Transformed y Versus Predictor of Transformed y.

    Although the variable $(\mathrm{NOX})^2$ was not selected by our stepwise procedure, we can try to estimate its marginal effect on median home value by including it with the four selected variables and running ACE with the resulting five predictor variables.
    The increase in $R^2$ over the four-predictor model was .006. The solution transformations on the response and original four predictors changed very little. The solution transformation for $(\mathrm{NOX})^2$ is shown in Figure 4(g). This curve is a nonmonotonic function of $\mathrm{NOX}^2$, not well approximated by a linear (or monotone) function. This makes it difficult to formulate a simple interpretation of the willingness to pay for clean air from these data. For low concentration values, housing prices seem to increase with increasing $(\mathrm{NOX})^2$, whereas for higher values this trend is substantially reversed.

    Figure 4(h) shows a scatterplot of $\hat\theta^*(y_k)$ versus $\sum_{j=1}^4 \hat\phi_j^*(x_{kj})$ for the four-predictor model. This plot shows no evidence of additional structure not captured in the model

    $$\hat\theta^*(y) = \sum_{j=1}^4 \hat\phi_j^*(x_j) + \epsilon.$$

    The $\hat e^{*2}$ resulting from the use of the ACE transformations was .11, as compared to the $e^2$ value of .20 produced by the Harrison and Rubinfeld (1978) transformations involving all 14 variables.

    For our final example, we use the ACE algorithm to study the relationship between atmospheric ozone concentration and

    meteorology in the Los Angeles basin. The data consist of daily measurements of ozone concentration (maximum one hour average) and eight meteorological quantities for 330 days of 1976. Appendix B lists the variables used in the study. The ACE algorithm was applied here in the same forward stepwise manner as in the previous (housing data) example. Four variables were selected. These are the first four listed in Appendix B. The resulting $R^2$ was .78. Running the ACE algorithm with all eight predictor variables produces an $R^2$ of .79.

    In order to assess the extent to which these meteorological variables capture the daily variation of the ozone level, the variable day-of-the-year was added and the ACE algorithm was run with it and the four selected meteorological variables. This can detect possible seasonal effects not captured by the meteorological variables. The resulting $R^2$ was .82. Figures 5(a)-5(f) show the optimal transformation estimates.

    The solution for the response transformation, Figure 5(a), shows that, at most, a very mild transformation with negative curvature is indicated. Similarly, Figure 5(b) indicates that there is no compelling necessity to consider a transformation on the most influential predictor variable, Sandburg Air Force Base Temperature. The solution transformation estimates for the remaining variables, however, are all highly nonlinear (and nonmonotonic). For example, Figure 5(d) suggests that the ozone concentration is much more influenced by the magnitude than the sign of the pressure gradient.

    The solution for the day-of-the-year variable, Figure 5(f), indicates a substantial seasonal effect after accounting for the meteorological variables. This effect is minimum at the year boundaries and has a broad maximum peaking at about May 1. This can be compared with the dependence of ozone pollution on day-of-the-year alone, without taking into account the meteorological variables. Figure 5(g) shows a smooth of ozone concentration on day-of-the-year. This smooth has an $R^2$ of .38 and is seen to peak about three months later (August).

    The fact that the day-of-the-year transformation peaked at the beginning of May was initially puzzling to us, since the highest pollution days occur from July to September. This latter fact is confirmed by the day-of-the-year transformation with the meteorological variables removed. Our current belief is that with the meteorological variables entered, day-of-the-year becomes a partial surrogate for hours of daylight before and during the morning commuter rush. The decline past May 1 may then be explained by the fact that daylight saving time goes into effect in Los Angeles on the last Sunday in April.

    These data illustrate that ACE is useful in uncovering interesting and suggestive relationships. The form of the dependence on the Daggett pressure gradient and on the day-of-the-year would be extremely difficult to find by any previous methodology.

    4. DISCUSSION

    The ACE algorithm provides a fully automated method for estimating optimal transformations in multiple regression. It also provides a method for estimating maximal correlation between random variables. It differs from other empirical methods for finding transformations (Box and Tidwell 1962; Anscombe and Tukey 1963; Box and Cox 1964; Kruskal 1964, 1965; Fraser 1967; Box and Hill 1974; Linsey 1972, 1974; Wood


    [Figure 5: optimal transformation estimates for the ozone data (plots not reproduced). Recoverable panel labels: (a) $\theta^*$(UPO3); (c) $\phi^*$(IBHT); (d) $\phi^*$(DGPG); (f) $\phi^*$(Day of Year); a further panel is labeled $\phi^*$(VSTY).]


    1974; Mosteller and Tukey 1977; and Tukey 1982) in that the "best" transformations of the response and predictor variables are unambiguously defined and estimated without use of ad hoc heuristics, restrictive distributional assumptions, or restriction of the transformation to a particular parametric family.

    The algorithm is reasonably computer efficient. On the Boston housing data set comprising 506 data points with 14 variables each, the run took 12 seconds of central processing unit (CPU) time on an IBM 3081 computer. Our guess is that this translates into 2.5 minutes on a VAX 11/750 computer. To extrapolate to other problems, use the estimate that running time is proportional to (number of variables) x (sample size).

    A strong advantage of the ACE procedure is the ability to incorporate variables of quite different type in terms of the set of values they can assume. The transformation functions $\theta(y), \phi_1(x_1), \ldots, \phi_p(x_p)$ assume values on the real line. Their arguments can, however, assume values on any set. For example, ordered real, periodic (circularly valued) real, ordered, and unordered categorical variables can be incorporated in the same regression equation. For periodic variables, the smoother window need only wrap around the boundaries. For categorical variables, the procedure can be regarded as estimating optimal scores for each of their values. (The special case of a categorical response and a single categorical predictor variable is known as canonical analysis; see Kendall and Stuart 1967, p. 568. The optimal scores can, in this case, also be obtained by solution of a matrix eigenvector problem.)

    The ACE procedure can also handle variables of mixed type. For example, a variable indicating present marital status might take on an integer value (number of years married) or one of several categorical values (N = never, D = divorced, W = widowed, etc.). This presents no additional complication in estimating conditional expectations. This ability provides a straightforward way to handle missing data values (Young et al. 1976).
    In addition to the regular sets of values realized by a variable, it can also take on the value "missing."

    In some situations the analyst, after running ACE, may want to estimate values of $y$ rather than $\theta^*(y)$, given a specific value of $x$. One method for doing this is to attempt to compute $\theta^{*-1}(\sum_{j=1}^{p} \phi_j^*(x_j))$. Letting $Z = \sum_{j=1}^{p} \phi_j^*(X_j)$, however, we know that the best least squares predictor of $Y$ that is a function of $Z$ is given by $E(Y \mid Z)$. This is implemented in the current ACE program by predicting $y$ as the function of $\sum_{j=1}^{p} \phi_j^*(x_j)$, obtained by smoothing the data values of $y$ on the data values of $\sum_{j=1}^{p} \phi_j^*(x_j)$. We are grateful to Arthur Owen for suggesting this simple and elegant prediction procedure.

    The solution functions $\theta^*(y)$ and $\phi_1^*(x_1), \ldots, \phi_p^*(x_p)$ can be stored as a set of values associated with each observation $(y_k, x_{k1}, \ldots, x_{kp})$, $1 \le k \le N$. Since $\theta(y)$ and $\phi(x)$, however, are usually smooth (for continuous $y$, $x$), they can be easily approximated and stored as cubic spline functions (de Boor 1978) with a few knots.

    As a tool for data analysis, the ACE procedure provides graphical output to indicate a need for transformations as well as to guide in their choice. If a particular plot suggests a familiar functional form for a transformation, then the data can be pretransformed using this functional form and the ACE algorithm can be rerun. The linearity (or nonlinearity) of the resulting ACE transformation on the variable in question gives an indication of how good the analyst's guess is. We have found that the plots themselves often give surprising new insights into the relationship between the response and predictor variables.

    As with any regression procedure, a high degree of association between predictor variables can sometimes cause the individual transformation estimates to be highly variable, even though the complete model is reasonably stable. When this is suspected, running the algorithm on randomly selected subsets of the data, or on bootstrap samples (Efron 1979), can assist in assessing the variability.

    The ACE method has generality beyond that exploited here. An immediate generalization would involve multiple response variables $Y_1, \ldots, Y_q$. The generalized algorithm would estimate optimal transformations $\theta_1^*, \ldots, \theta_q^*, \phi_1^*, \ldots, \phi_p^*$ that minimize

    $E\left[\sum_{l=1}^{q} \theta_l(Y_l) - \sum_{j=1}^{p} \phi_j(X_j)\right]^2$

    subject to $E\theta_l = 0$, $l = 1, \ldots, q$, $E\phi_j = 0$, $j = 1, \ldots, p$, and $\|\sum_{l} \theta_l(Y_l)\|^2 = 1$.

    This extension generalizes the ACE procedure in a sense similar to that in which canonical correlation generalizes linear regression.

    The ACE algorithm (Section 2) is easily modified to incorporate this extension. An inner loop over the response variables, analogous to that for the predictor variables, replaces the single-function minimization.
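The prediction step described in this section, estimating $E(Y \mid Z)$ by smoothing $y$ on $z = \sum_j \phi_j^*(x_j)$, can be sketched as follows. This is only an illustration under stated assumptions, not the ACE program's code: the data are simulated, the transformations are taken as known ($\theta^*(y) = \log y$, with $z$ given), and `running_mean_smooth` is a hypothetical helper that uses a plain moving average in place of whatever smoother ACE actually applies.

```python
import numpy as np

def running_mean_smooth(z, y, half_window):
    """Smooth y on z with a moving average over 2*half_window + 1 neighbors.

    Hypothetical helper for illustration. The window includes the point
    itself and is shifted inward at the ends of the sorted sample.
    """
    order = np.argsort(z)
    y_sorted = y[order]
    n, width = len(y), 2 * half_window + 1
    # First index of each window, clipped so the window stays in range.
    starts = np.clip(np.arange(n) - half_window, 0, n - width)
    csum = np.concatenate(([0.0], np.cumsum(y_sorted)))
    means_sorted = (csum[starts + width] - csum[starts]) / width
    fitted = np.empty(n)
    fitted[order] = means_sorted  # back to the original observation order
    return fitted

rng = np.random.default_rng(1)
n = 5000
z = rng.uniform(-1.0, 1.0, n)            # assumed value of sum_j phi_j*(x_j)
y = np.exp(z + rng.standard_normal(n))   # so log(y) = z + noise

pred_inverse = np.exp(z)                               # theta*^{-1}(z)
pred_smooth = running_mean_smooth(z, y, half_window=50)  # estimate of E(Y | Z)

mse_inverse = np.mean((y - pred_inverse) ** 2)
mse_smooth = np.mean((y - pred_smooth) ** 2)
print(mse_inverse, mse_smooth)
```

In this simulated setting the inverse-transformation predictor $\exp(z)$ systematically underestimates $E(Y \mid Z)$, which is the reason the text prefers smoothing $y$ directly on $z$.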

    5. OPTIMAL TRANSFORMATIONS IN FUNCTION SPACE

    5.1 Introduction

    In this section, we first prove the existence of optimal transformations (Theorem 5.2). Then we show that the ACE algorithm converges to an optimal transformation (Theorems 5.4 and 5.5).

    Define random variables to take values either in the reals or in a finite or countable unordered set. Given a set of random variables $Y, X_1, \ldots, X_p$, a transformation is defined by a set of real-valued measurable functions $(\theta, \phi_1, \ldots, \phi_p) = (\theta, \phi)$, each function defined on the range of the corresponding random variable, such that

    $E\theta(Y) = 0, \quad E\phi_j(X_j) = 0, \quad j = 1, \ldots, p$
    $E\theta^2(Y) < \infty, \quad E\phi_j^2(X_j) < \infty, \quad j = 1, \ldots, p.$ (5.1)

    Use the notation

    $\phi(X) = \sum_{j=1}^{p} \phi_j(X_j).$ (5.2)

    Denote the set of all transformations by W.

    Definition 5.1. A transformation $(\theta^*, \phi^*)$ is optimal for regression if $E(\theta^*)^2 = 1$ and

    $e^{*2} = E[\theta^*(Y) - \phi^*(X)]^2 = \inf\{E[\theta(Y) - \phi(X)]^2;\ E\theta^2 = 1\}.$

    Definition 5.2. A transformation $(\theta^{**}, \phi^{**})$ is optimal for


    correlation if $E(\theta^{**})^2 = 1$, $E(\phi^{**})^2 = 1$, and

    $\rho^* = E[\theta^{**}(Y)\phi^{**}(X)] = \sup\{E[\theta(Y)\phi(X)];\ E(\phi)^2 = 1,\ E\theta^2 = 1\}.$

    Theorem 5.1. If $(\theta^{**}, \phi^{**})$ is optimal for correlation, then $\theta^* = \theta^{**}$, $\phi^* = \rho^*\phi^{**}$ is optimal for regression, and the converse. Furthermore, $e^{*2} = 1 - \rho^{*2}$.

    Proof. Write

    $E(\theta - \phi)^2 = 1 - 2E\theta\phi + E\phi^2 = 1 - 2E(\theta\hat\phi)\sqrt{E\phi^2} + E\phi^2,$

    where $\hat\phi = \phi/\sqrt{E\phi^2}$. Hence

    $E(\theta - \phi)^2 \ge 1 - 2\rho^*\sqrt{E\phi^2} + E\phi^2$ (5.3)

    with equality only if $E\theta\hat\phi = \rho^*$. The minimum of the right side of (5.3) over $E\phi^2$ is at $E\phi^2 = (\rho^*)^2$, where it is equal to $1 - (\rho^*)^2$. Then $(e^*)^2 = 1 - (\rho^*)^2$; and if $(\theta^{**}, \phi^{**})$ is optimal for correlation, then $\theta^* = \theta^{**}$, $\phi^* = \rho^*\phi^{**}$ is optimal for regression. The argument is reversible. (A similar result appears in Csaki and Fisher 1963.)

    5.2 Existence of Optimal Transformations

    To show existence of optimal transformations, two additional assumptions are needed.

    Assumption 5.1. The only set of functions satisfying (5.1) such that

    $\theta(Y) + \sum_{j} \phi_j(X_j) = 0$ a.s.

    are individually a.s. zero.

    To formulate the second assumption, we use Definition 5.3.

    Definition 5.3. Define the Hilbert spaces $H_2(Y), H_2(X_1), \ldots, H_2(X_p)$ as the sets of functions satisfying (5.1) with the usual inner product; that is, $H_2(X_j)$ is the set of all measurable $\phi_j$ such that $E\phi_j(X_j) = 0$, $E\phi_j^2(X_j) < \infty$ with $(\phi_j', \phi_j) = E[\phi_j'(X_j)\phi_j(X_j)]$.

    Assumption 5.2. The conditional expectation operators

    $E(\phi_j(X_j) \mid Y): H_2(X_j) \to H_2(Y),$
    $E(\phi_j(X_j) \mid X_i): H_2(X_j) \to H_2(X_i), \quad i \ne j,$
    $E(\theta(Y) \mid X_j): H_2(Y) \to H_2(X_j)$

    are all compact.

    Assumption 5.2 is satisfied in most cases of interest. A sufficient condition is given by the following. Let $X$, $Y$ be random variables with joint density $f_{X,Y}$ and marginals $f_X$, $f_Y$. Then the conditional expectation operator on $H_2(Y) \to H_2(X)$ is compact if

    $\iint [f_{X,Y}^2 / (f_X f_Y)]\, dx\, dy < \infty.$

    Theorem 5.2. Under Assumptions 5.1 and 5.2, optimal transformations exist.

    Some machinery is needed.

    Proposition 5.1. The set of all functions $f$ of the form

    $f(Y, X) = \theta(Y) + \sum_{j} \phi_j(X_j), \quad \theta \in H_2(Y),\ \phi_j \in H_2(X_j),$

    with the inner product and norm

    $(g, f) = E[gf], \quad \|f\|^2 = Ef^2,$

    is a Hilbert space denoted by $H_2$. The subspace of all functions $\phi$ of the form

    $\phi(X) = \sum_{j} \phi_j(X_j), \quad \phi_j \in H_2(X_j),$

    is a closed linear subspace denoted by $H_2(X)$. So are $H_2(Y), H_2(X_1), \ldots, H_2(X_p)$.

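    Proposition 5.1 follows from Proposition 5.2.

    Proposition 5.2. Under Assumptions 5.1 and 5.2, there are constants $0 < c_1 \le c_2 < \infty$ such that

    $c_1\left(\|\theta\|^2 + \sum_{j} \|\phi_j\|^2\right) \le \left\|\theta + \sum_{j} \phi_j\right\|^2 \le c_2\left(\|\theta\|^2 + \sum_{j} \|\phi_j\|^2\right).$

    Proof. The right-hand inequality is immediate. If the left side does not hold, we can find a sequence $f_n = \theta_n + \sum_j \phi_{n,j}$ such that $\|\theta_n\|^2 + \sum_j \|\phi_{n,j}\|^2 = 1$, but $\|f_n\|^2 \to 0$. There is a subsequence $n'$ such that $\theta_{n'} \to \theta$, $\phi_{n',j} \to \phi_j$ in the sense of weak convergence in $H_2(Y), H_2(X_1), \ldots, H_2(X_p)$, respectively. Write

    $E[\phi_{n',j}(X_j)\phi_{n',i}(X_i)] = E[\phi_{n',j}(X_j)E(\phi_{n',i}(X_i) \mid X_j)]$

    to see that Assumption 5.2 implies $E\phi_{n',j}\phi_{n',i} \to E\phi_j\phi_i$ ($i \ne j$), and similarly for $E\theta_{n'}\phi_{n',j}$. Furthermore $\|\phi_j\| \le \liminf \|\phi_{n',j}\|$, $\|\theta\| \le \liminf \|\theta_{n'}\|$. Thus, defining $f = \theta + \sum_j \phi_j$,

    $\|f\|^2 = \left\|\theta + \sum_j \phi_j\right\|^2 \le \liminf \|f_{n'}\|^2 = 0,$

    which implies, by Assumption 5.1, that $\theta = \phi_1 = \cdots = \phi_p = 0$. On the other hand,

    $\|f_{n'}\|^2 = \|\theta_{n'}\|^2 + \sum_j \|\phi_{n',j}\|^2 + 2\sum_j (\theta_{n'}, \phi_{n',j}) + 2\sum_{i<j} (\phi_{n',i}, \phi_{n',j}).$

    Hence, if $f = 0$, then $\liminf \|f_{n'}\|^2 \ge 1$.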

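    Corollary 5.1. If $f_n \to f$ weakly in $H_2$, then $\theta_n \to \theta$ weakly in $H_2(Y)$, $\phi_{n,j} \to \phi_j$ weakly in $H_2(X_j)$, $j = 1, \ldots, p$, and the converse.

    Proof. If $f_n = \theta_n + \sum_j \phi_{n,j} \to \theta + \sum_j \phi_j$ weakly, then by Proposition 5.2, $\limsup \|\theta_n\| < \infty$, $\limsup \|\phi_{n,j}\| < \infty$. Take $n'$ such that $\theta_{n'} \to \theta'$, $\phi_{n',j} \to \phi_j'$ weakly, and let $f' = \theta' + \sum_j \phi_j'$. Then for any $g \in H_2$, $(g, f_{n'}) \to (g, f')$, so $(g, f) = (g, f')$ for all $g$. The converse is easier.

    Definition 5.4. In $H_2$, let $P_Y$, $P_j$, and $P_X$ denote the projection operators onto $H_2(Y)$, $H_2(X_j)$, and $H_2(X)$, respectively.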


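    On $H_2(X_i)$, $P_j$ ($j \ne i$) is the conditional expectation operator, and similarly for $P_Y$.

    Proposition 5.3. $P_Y$ is compact on $H_2(X) \to H_2(Y)$, and $P_X$ is compact on $H_2(Y) \to H_2(X)$.

    Proof. Take $\phi_n \in H_2(X)$, $\phi_n \to \phi$ weakly. This implies, by Corollary 5.1, that $\phi_{n,j} \to \phi_j$ weakly. By Assumption 5.2, $P_Y\phi_{n,j} \to P_Y\phi_j$ strongly, so that $P_Y\phi_n \to P_Y\phi$ strongly. Now take $\theta \in H_2(Y)$, $\phi \in H_2(X)$; then $(\theta, P_Y\phi) = (\theta, \phi) = (P_X\theta, \phi)$. Thus $P_X: H_2(Y) \to H_2(X)$ is the adjoint of $P_Y$ and hence compact.

    Now to complete the proof of Theorem 5.2, consider the functional $\|\theta - \phi\|^2$ on the set of all $(\theta, \phi)$ with $\|\theta\|^2 = 1$. For any $\theta$, $\phi$,

    $\|\theta - \phi\|^2 \ge \|\theta - P_X\theta\|^2.$

    If there is a $\theta^*$ that achieves the minimum of $\|\theta - P_X\theta\|^2$ over $\|\theta\|^2 = 1$, then an optimal transformation is $\theta^*$, $P_X\theta^*$. On $\|\theta\|^2 = 1$,

    $\|\theta - P_X\theta\|^2 = 1 - \|P_X\theta\|^2.$

    Let $s = \sup\{\|P_X\theta\|;\ \|\theta\| = 1\}$. Take $\theta_n$ such that $\|\theta_n\|^2 = 1$, $\theta_n \to \theta$ weakly, and $\|P_X\theta_n\| \to s$. By the compactness of $P_X$, $\|P_X\theta_n\| \to \|P_X\theta\| = s$. Furthermore, $\|\theta\| \le 1$. If $\|\theta\| < 1$, then for $\theta' = \theta/\|\theta\|$, we get the contradiction $\|P_X\theta'\| > s$. Hence $\|\theta\| = 1$ and $(\theta, P_X\theta)$ is an optimal transformation. This argument assumes that $s > 0$. If $s = 0$, then $\|\theta - P_X\theta\| = 1$ for all $\theta$ with $\|\theta\| = 1$, and any $(\theta, 0)$ is optimal.

    5.3 Characterization of Optimal Transformations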

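    Define two operators, $U: H_2(Y) \to H_2(Y)$ and $V: H_2(X) \to H_2(X)$, by

    $U\theta = P_YP_X\theta, \quad V\phi = P_XP_Y\phi.$

    Proposition 5.4. $U$ and $V$ are compact, self-adjoint, and non-negative definite. They have the same eigenvalues, and there is a 1-1 correspondence between eigenspaces for a given positive eigenvalue specified by

    $\phi = P_X\theta/\|P_X\theta\|, \quad \theta = P_Y\phi/\|P_Y\phi\|.$

    Proof. Direct verification.

    Let the largest eigenvalue be denoted by $\lambda$, $\lambda = \|U\| = \|V\|$. In the sequel we add the assumption that there is at least one $\theta(Y)$ such that $\|P_X\theta\| > 0$. Then $\lambda > 0$ and Theorem 5.3 follows.

    Theorem 5.3. If $\theta^*$, $\phi^*$ is an optimal transformation for regression, then

    $\lambda\theta^* = U\theta^*, \quad \lambda\phi^* = V\phi^*.$

    Conversely, if $\theta$ satisfies $\lambda\theta = U\theta$, $\|\theta\| = 1$, then $\theta$, $P_X\theta$ is optimal for regression. If $\phi$ satisfies $\lambda\phi = V\phi$, then $\theta = P_Y\phi/\|P_Y\phi\|$ and $\lambda\phi/\|P_Y\phi\|$ are optimal for regression. In addition,

    $(e^*)^2 = 1 - \lambda.$

    Proof. Let $\theta^*$, $\phi^*$ be optimal. Then $\phi^* = P_X\theta^*$. Write

    $\|\theta^* - \phi^*\|^2 = 1 - 2(\theta^*, \phi^*) + \|\phi^*\|^2.$

    Note that $(\theta^*, \phi^*) = (\theta^*, P_Y\phi^*) \le \|P_Y\phi^*\|$ with equality only if $\theta^* = cP_Y\phi^*$, $c$ constant. Therefore, $\theta^* = P_Y\phi^*/\|P_Y\phi^*\|$. This implies

    $\|P_Y\phi^*\|\theta^* = U\theta^*, \quad \|P_Y\phi^*\|\phi^* = V\phi^*,$

    so that $\|P_Y\phi^*\|$ is an eigenvalue $\lambda^*$ of $U$, $V$. Computing gives $\|\theta^* - \phi^*\|^2 = 1 - \lambda^*$. Now take $\theta$ any eigenfunction of $U$ corresponding to $\lambda$, with $\|\theta\| = 1$. Let $\phi = P_X\theta$; then $\|\theta - \phi\|^2 = 1 - \lambda$. This shows that $\theta^*$, $\phi^*$ are not optimal unless $\lambda^* = \lambda$. The rest of the theorem is straightforward verification.

    Corollary 5.2. If $\lambda$ has multiplicity one, then the optimal transformation is unique up to a sign change. In any case, the set of optimal transformations is finite dimensional.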
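For a pair of categorical variables the operators $P_X$, $P_Y$, and $U$ are finite matrices, so the statements of Theorem 5.3 can be checked numerically. The sketch below is illustrative only: the 3 x 3 joint probability table is made up, and the route through a singular value decomposition of the normalized table is one convenient way to get the eigenfunction, not a procedure taken from the paper.

```python
import numpy as np

# Made-up joint probabilities P(Y = i, X = j); rows index Y, columns index X.
joint = np.array([[0.20, 0.05, 0.05],
                  [0.05, 0.20, 0.05],
                  [0.05, 0.05, 0.30]])
p_y = joint.sum(axis=1)
p_x = joint.sum(axis=0)

# Conditional expectation operators written as matrices.
E_given_x = (joint / p_x).T        # maps theta(y) to E[theta(Y) | X = x]
E_given_y = joint / p_y[:, None]   # maps phi(x) to E[phi(X) | Y = y]
U = E_given_y @ E_given_x          # U theta = P_Y P_X theta

# Singular values of the normalized table are 1 (constants) and then the
# correlations attainable by mean-zero scores; the second one is rho*.
q = joint / np.sqrt(np.outer(p_y, p_x))
u, s, vt = np.linalg.svd(q)
rho_star = s[1]
lam = rho_star ** 2

theta_star = u[:, 1] / np.sqrt(p_y)   # mean zero, unit variance under p_y
phi_star = E_given_x @ theta_star     # phi* = P_X theta*

# Squared error E[theta*(Y) - phi*(X)]^2 under the joint distribution.
e2 = np.sum(joint * (theta_star[:, None] - phi_star[None, :]) ** 2)
print(lam, e2, 1 - lam)
```

On this table the score vector `theta_star` satisfies $U\theta^* = \lambda\theta^*$ and the squared error equals $1 - \lambda$, in line with the theorem.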

    5.4 Alternating Conditional Methods

    Direct solution of the equations $\lambda\theta = U\theta$ or $\lambda\phi = V\phi$ is formidable. Attempting to use data to directly estimate the solutions is just as difficult. In the bivariate case, if $X$, $Y$ are categorical, then $\lambda\theta = U\theta$ becomes a matrix eigenvalue problem and is tractable. This is the case treated in Kendall and Stuart (1967).

    The ACE algorithm is founded on the observation that there is an iterative method for finding optimal transformations. We illustrate this in the bivariate case. The goal is to minimize $\|\theta(Y) - \phi(X)\|^2$ with $\|\theta\|^2 = 1$. Denote $P_X\theta = E(\theta \mid X)$, $P_Y\phi = E(\phi \mid Y)$. Start with any first-guess function $\theta_0(Y)$ having a nonzero projection on the eigenspace of the largest eigenvalue of $U$. Then define a sequence of functions by

    $\phi_0 = P_X\theta_0$
    $\theta_1 = P_Y\phi_0/\|P_Y\phi_0\|$
    $\phi_1 = P_X\theta_1,$

    and in general $\phi_{n+1} = P_X\theta_n$, $\theta_{n+1} = P_Y\phi_{n+1}/\|P_Y\phi_{n+1}\|$. It is clear that at each step in the iteration $\|\theta - \phi\|^2$ is decreased. It is not hard to show that in general, $\theta_n$, $\phi_n$ converge to an optimal transformation.

    The preceding method of alternating conditionals extends to the general multivariate case. The analog is clear; given $\theta_n$, $\phi_n$, the next iteration is

    $\phi_{n+1} = P_X\theta_n, \quad \theta_{n+1} = P_Y\phi_{n+1}/\|P_Y\phi_{n+1}\|.$

    However, there is an additional issue: How can $P_X\theta$ be computed using only the conditional expectation operators $P_j$ ($j = 1, \ldots, p$)? This is done by starting with some function $\phi_0$ and iteratively subtracting off the projections of $\theta - \phi_n$ on the subspaces $H_2(X_1), \ldots, H_2(X_p)$ until we get a function $\phi$ such that the projection of $\theta - \phi$ on each of $H_2(X_j)$ is zero. This leads to the double-loop algorithm.

    The Double-Loop Algorithm

    The Outer Loop. (a) Start with an initial guess $\theta_0(Y)$. (b) Put $\phi_{n+1} = P_X\theta_n$, $\theta_{n+1} = P_Y\phi_{n+1}/\|P_Y\phi_{n+1}\|$ and repeat until convergence.
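The outer loop can be tried out in the bivariate categorical case, where both conditional expectations are exact matrix operations. The following is a rough sketch under assumptions of our own: the joint table is invented, the starting guess is the centered score vector (0, 1, 2), and the explicit re-centering inside the loop is added here purely to keep floating-point drift toward constant functions from accumulating.

```python
import numpy as np

# Invented joint probabilities P(Y = i, X = j); rows index Y, columns index X.
joint = np.array([[0.20, 0.05, 0.05],
                  [0.05, 0.20, 0.05],
                  [0.05, 0.05, 0.30]])
p_y, p_x = joint.sum(axis=1), joint.sum(axis=0)

def cond_mean_given_x(theta):
    """P_X theta: E[theta(Y) | X = x] for each value x."""
    return joint.T @ theta / p_x

def cond_mean_given_y(phi):
    """P_Y phi: E[phi(X) | Y = y] for each value y."""
    return joint @ phi / p_y

def norm_y(theta):
    """Norm in H_2(Y): square root of E[theta(Y)^2]."""
    return np.sqrt(np.sum(p_y * theta ** 2))

# (a) Initial guess theta_0: centered, normalized scores 0, 1, 2.
y_scores = np.array([0.0, 1.0, 2.0])
theta = y_scores - np.sum(p_y * y_scores)
theta /= norm_y(theta)

# (b) Alternate the two conditional expectations and renormalize.
for _ in range(200):
    phi = cond_mean_given_x(theta)
    new_theta = cond_mean_given_y(phi)
    new_theta -= np.sum(p_y * new_theta)   # numerical re-centering only
    lam_estimate = norm_y(new_theta)       # tends to the eigenvalue lambda
    theta = new_theta / lam_estimate

phi = cond_mean_given_x(theta)
e2 = np.sum(joint * (theta[:, None] - phi[None, :]) ** 2)
print(lam_estimate, e2)
```

Each pass applies $U = P_YP_X$ followed by normalization, so the loop is a power iteration on $U$ restricted to mean-zero functions; the printed squared error should agree with $1 - \lambda$.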

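    Let $P_E\theta_0$ be the projection of $\theta_0$ on the eigenspace $E$ of $U$ corresponding to $\lambda$.

    Theorem 5.4. If $\|P_E\theta_0\| \ne 0$, define an optimal transformation by $\theta^* = P_E\theta_0/\|P_E\theta_0\|$, $\phi^* = P_X\theta^*$. Then $\|\theta_n - \theta^*\| \to 0$, $\|\phi_n - \phi^*\| \to 0$.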


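    Proof. Notice that $\theta_{n+1} = U\theta_n/\|U\theta_n\|$. For any $n$, $\theta_n = \alpha_n\theta^* + g_n$, where $g_n \perp E$, because, if it is true for $n$, then

    $\theta_{n+1} = (\alpha_n\lambda\theta^* + Ug_n)/\|\alpha_n\lambda\theta^* + Ug_n\|$

    and $Ug_n$ is $\perp$ to $E$. For any $g \perp E$, $\|Ug\| \le r\|g\|$, where $r$ [...] $> 0$ and define

    $G(y) = \sup\{\|TW\phi\|/\|W\phi\|;\ \|\phi\| \le 1,\ \|W\phi\| \ge y\}.$

    Take $\phi_n \to \phi$ weakly, $\|\phi_n\| \le 1$, $\|W\phi_n\| \ge y$ so that $\|TW\phi_n\|/\|W\phi_n\| \to G(y)$. Then $\|\phi\| \le 1$, $\|W\phi\| \ge y$, and $G(y) = \|TW\phi\|/\|W\phi\|$. Thus $G(y) < 1$ for all $y > 0$ and is clearly nonincreasing in $y$. Then

    $\|T^mW\phi\| = \|TWT^{m-1}\phi\| \le G(\|T^{m-1}W\phi\|)\|T^{m-1}W\phi\|.$

    Put $y_0 = \|W\phi\|$, $y_m = G(y_{m-1})y_{m-1}$; then $\|T^mW\phi\| \le y_m$. But clearly $y_m \to 0$.

    The range of $W$ is dense in $H_2(X)$. Otherwise, there is a $\phi' \ne 0$ such that $(\phi', W\phi) = 0$, all $\phi$. This implies $(W^*\phi', \phi) = 0$ or $W^*\phi' = 0$. Then $\|T^*\phi'\| = \|\phi'\|$, and repetition of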

    the argument given before leads to $\phi' = 0$. For any $\phi$ and $\varepsilon > 0$, take $W\phi_1$ so that $\|\phi - W\phi_1\| \le \varepsilon$. Then $\|T^m\phi\| \le \varepsilon + \|T^mW\phi_1\|$, which completes the proof.

    There are two versions of the double loop. In the first, the initial functions $\phi^{(0)}$ are the limiting functions produced by the preceding inner loop. This is called the restart version. In the second, the initial functions are $\phi^{(0)} \equiv 0$. This is the fresh start version. The main theoretical difference is that a stronger consistency result holds for the fresh start. Restart is a faster-running algorithm, and it is embodied in the ACE code.

    The Single-Loop Algorithm

    The original implementation of ACE combined a single iteration of the inner loop with an iteration of the outer loop. Thus it is summarized by the following.

    1. Start with $\theta_0$, $\phi_0 = 0$.
    2. If the current functions are $\theta_n$, $\phi_n$, define $\phi_{n+1}$ by $\theta_n - \phi_{n+1} = T(\theta_n - \phi_n)$.
    3. Let $\theta_{n+1} = P_Y\phi_{n+1}/\|P_Y\phi_{n+1}\|$. Run to convergence.

    This is a cleaner algorithm than the double loop, and its implementation on data runs at least twice as fast as the double loop and requires only a single convergence test. Unfortunately, we have been unable to prove that it converges in function space. Assuming convergence, it can be shown that the limiting $\theta$ is an eigenfunction of $U$. But giving conditions for $\theta$ to correspond to $\lambda$, or even showing that $\theta$ will correspond to $\lambda$ "almost always," seems difficult. For this reason, we adopted the double-loop algorithm instead.

    APPENDIX A: THE ACE ALGORITHM ON FINITE DATA SETS

    A.1 Introduction

    The ACE algorithm is implemented on finite data sets by replacing conditional expectations, given continuous variables, by data smooths. In the theoretical results concerning the convergence and consistency properties of the ACE algorithm, the critical element is the properties of the data smooth used. The results are fragmentary. Convergence of the algorithm is proven only for a restricted class of smooths. In practice, in more than 1,000 runs of ACE on a wide variety of data sets and using three different types of smooths, we have seen only one instance of failure to converge. A fairly general, but weak, consistency proof is given. We conjecture the form of a stronger consistency result.

    A.2 Data Smooths

    Define a data set $D$ to be a set $\{x_1, \ldots, x_N\}$ of $N$ points in $p$-dimensional space; that is, $x_k = (x_{k1}, \ldots, x_{kp})$. Let $\mathcal{D}_N$ be the collection of all such data sets. For fixed $D$, define $F(x)$ as the space of all real-valued functions $\phi$ defined on $D$; that is, $\phi \in F(x)$ is defined by the $N$ real numbers $\{\phi(x_1), \ldots, \phi(x_N)\}$. Define $F(x_j)$, $j = 1, \ldots, p$, as the space of all real-valued functions defined on the set $\{x_{1j}, x_{2j}, \ldots, x_{Nj}\}$.

    Definition A.1. A data smooth $S$ of $x$ on $x_j$ is a mapping $S: F(x) \to F(x_j)$ defined for every $D$ in $\mathcal{D}_N$. If $\phi \in F(x)$, denote the corresponding element in $F(x_j)$ by $S(\phi \mid x_j)$ and its values by $S(\phi \mid x_{kj})$.

    Let $x$ be any one of $x_1, \ldots, x_p$. Some examples of data smooths are the following.


    1. Histogram. Divide the real axis into disjoint intervals $\{I_l\}$. If $x_k \in I_l$, define

    $S(\phi \mid x_k) = \frac{1}{n_l} \sum_{x_m \in I_l} \phi(x_m),$

    with $n_l$ the number of data points in $I_l$.

    2. Nearest Neighbor. Fix $M < N/2$. Order the $x_k$ getting $x_1 < x_2 < \cdots < x_N$ (assume no ties) and corresponding $\phi(x_1), \ldots, \phi(x_N)$. Put

    $S(\phi \mid x_k) = \frac{1}{2M} \sum_{m=-M,\ m \ne 0}^{M} \phi(x_{k+m}).$

    If $M$ points are not available on one side, make up the deficiency on the other side.

    3. Kernel. Take $K(x)$ defined on the reals with maximum at $x = 0$. Then

    $S(\phi \mid x_k) = \sum_{m} \phi(x_m)K(x_m - x_k) \Big/ \sum_{m} K(x_m - x_k).$

    4. Regression. Fix $M$ and order $x_k$ as in example 2. At $x_k$, regress the values of $\phi(x_{k-M}), \ldots, \phi(x_{k+M})$, excluding $\phi(x_k)$, on $x_{k-M}, \ldots, x_{k+M}$, excluding $x_k$, getting a regression line $L(x)$. Put $S(\phi \mid x_k) = L(x_k)$. If $M$ points are not available on each side of $x_k$, make up the deficiency on the other side.

    5. Supersmoother. See Friedman and Stuetzle (1982).

    Some properties that are relevant to the behavior of smoothers are given next. These properties hold only if they are true for all $D \in \mathcal{D}_N$.

    1. Linearity. A smooth $S$ is linear if

    $S(\alpha\phi_1 + \beta\phi_2) = \alpha S\phi_1 + \beta S\phi_2$

    for all $\phi_1, \phi_2 \in F(x)$ and all constants $\alpha$, $\beta$.

    2. Constant preserving. If $\phi \in F(x)$ is constant ($\phi \equiv c$), then $S\phi = c$.

    To give a further property, introduce the inner product $(\cdot, \cdot)_N$ on $F(x)$ defined by

    $(\phi, \phi')_N = \frac{1}{N} \sum_{k} \phi(x_k)\phi'(x_k)$

    and the corresponding norm $\|\cdot\|_N$.

    3. Boundedness. $S$ is bounded by $M$ if

    $\|S\phi\|_N \le M\|\phi\|_N$, all $\phi \in F(x)$,

    where $\|S\phi\|_N$ is defined on $F(x_j)$ exactly as $\|\phi\|_N$ is defined on $F(x)$.

    In these examples of smooths, all are linear, except the supersmoother. This implies they can be represented as an $N \times N$ matrix operator varying with $D$. All are constant preserving. Histograms and the nearest neighbor are bounded by 2. Regression is unbounded due to end effects, but in Section A.5 we introduce a modified regression smooth that is bounded by 2. The bound for kernel smooths is more complicated.

    A.3 Convergence of ACE

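    Let the data be of the form $(y_k, x_k) = (y_k, x_{k1}, \ldots, x_{kp})$, $k = 1, \ldots, N$. Assume that $\bar{y} = \bar{x}_1 = \cdots = \bar{x}_p = 0$. Define smooths $S_y, S_1, \ldots, S_p$, where $S_y: F(y, x) \to F(y)$ and $S_j: F(y, x) \to F(x_j)$. Let $H_2(y, x)$ be the set of all functions in $F(y, x)$ with zero mean, and let $H_2(y)$, $H_2(x_j)$ be the corresponding subspaces.

    It is essential to modify the smooths so that the resulting functions have zero means. This is done by subtracting the mean; thus the modified $\tilde{S}_j$ is defined by

    $\tilde{S}_j\phi = S_j\phi - \mathrm{Av}(S_j\phi).$ (A.1)

    Henceforth, we use only modified smooths and assume the original smooth to be constant preserving so that the modified smooths take constants into zero.

    The ACE algorithm is defined by the following.

    1. $\theta^{(0)}(y_k) = y_k$, $\phi_j^{(0)}(x_{kj}) = 0$.

    (The inner loop)

    2. At the $n$th stage of the outer loop, start with $\theta^{(n)}$, $\phi_j^{(0)}$. For every $m \ge 1$ and $j = 1, \ldots, p$, define

    $\phi_j^{(m+1)} = S_j\left(\theta^{(n)} - \sum_{i<j} \phi_i^{(m+1)} - \sum_{i>j} \phi_i^{(m)}\right).$

    Keep increasing $m$ until convergence to $\phi_j$.

    (The outer loop)

    3. Set $\theta^{(n+1)} = S_y(\sum_j \phi_j)/\|S_y(\sum_j \phi_j)\|_N$. Go back to the inner loop with $\phi_j^{(0)} = \phi_j$ (restart) or $\phi_j^{(0)} = 0$ (fresh start). Continue until convergence.

    To formalize this algorithm, introduce the space $H_2(\theta, \phi)$ with elements $(\theta, \phi_1, \ldots, \phi_p)$, $\theta \in H_2(y)$, $\phi_j \in H_2(x_j)$, and subspaces $H_2(\theta)$ with elements $(\theta, 0, 0, \ldots, 0) = \theta$ and $H_2(\phi)$ with elements $(0, \phi_1, \ldots, \phi_p) = \phi$.

    For $f = (f_0, f_1, \ldots, f_p)$ in $H_2(\theta, \phi)$, define $S_j: H_2(\theta, \phi) \to H_2(\theta, \phi)$ by

    $(S_jf)_i = 0, \quad i \ne j,$
    $(S_jf)_i = f_j + S_j\left(\sum_{i' \ne j} f_{i'}\right), \quad i = j.$

    Starting with $\theta = (\theta, 0, 0, \ldots, 0)$, $\phi^{(m)} = (0, \phi_1^{(m)}, \ldots, \phi_p^{(m)})$, one complete cycle in the inner loop is described by

    $\theta - \phi^{(m+1)} = (I - S_p)(I - S_{p-1}) \cdots (I - S_1)(\theta - \phi^{(m)}).$ (A.2)

    Define $T$ on $H_2(\theta, \phi) \to H_2(\theta, \phi)$ as the product operator in (A.2). Then

    $\phi^{(m)} = \theta - T^m(\theta - \phi^{(0)}).$ (A.3)

    If, for a given $\theta$, the inner loop converges, then the limiting $\phi$ satisfies

    $S_j(\theta - \phi) = 0, \quad j = 1, \ldots, p.$ (A.4)

    That is, the smooth of the residuals on any predictor variable is zero. Adding

    $\theta = S_y\phi/\|S_y\phi\|_N$ (A.5)

    to (A.4) gives a set of equations satisfied by the estimated optimal transformations.

    Assume, for the remainder of this section, that the smooths are linear. Then (A.4) can be written as

    $S_j\phi = S_j\theta, \quad j = 1, \ldots, p.$ (A.6)

    Let $\mathrm{sp}(S_j)$ denote the spectrum of the matrix $S_j$. Assume $1 \notin \mathrm{sp}(S_j)$. (The number 1 is in the spectrum for constant preserving smooths but not for modified smooths.) Define matrices $A_j$ by $A_j = S_j(I - S_j)^{-1}$ and the matrix $A$ as $\sum_j A_j$. Assume further that $-1 \notin \mathrm{sp}(A)$. Then (A.6) has the unique solution

    $\phi_j = A_j(I + A)^{-1}\theta, \quad j = 1, \ldots, p.$ (A.7)

    The element $\phi = (0, \phi_1, \ldots, \phi_p)$ given by (A.7) will be denoted by $P\theta$. Rewrite (A.3) using $(I - T)(\theta - P\theta) = 0$ as

    $\phi^{(m)} = P\theta - T^m(P\theta - \phi^{(0)}).$ (A.8)

    Therefore, the inner loop converges if it can be shown that $T^mf \to 0$ for all $f \in H_2(\phi)$. What we can show is Theorem A.1.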
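The inner loop of Section A.3 and the fixed-point condition (A.4) can be illustrated on simulated data. The sketch below rests on assumptions of our own: the data-generating model, the choice of histogram smooths with quantile-based bins, the fresh start, and the fixed number of cycles are all illustrative, and the helper names `histogram_smoother`, `centered`, and `inner_loop` are hypothetical rather than taken from the ACE code.

```python
import numpy as np

def histogram_smoother(x, n_bins):
    """N x N matrix of a histogram smooth: average over points in the same bin."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    same_bin = (bins[:, None] == bins[None, :]).astype(float)
    return same_bin / same_bin.sum(axis=1, keepdims=True)

def centered(S):
    """Modified smooth as in (A.1): subtract the average of the smoothed values."""
    n = S.shape[0]
    return S - np.ones((n, n)) @ S / n

def inner_loop(theta, smoothers, n_cycles=200):
    """Cycle over predictors, smoothing the partial residuals (fresh start)."""
    n, p = theta.shape[0], len(smoothers)
    phi = np.zeros((p, n))
    for _ in range(n_cycles):
        for j in range(p):
            # theta minus all other phi_i, using the newest available values.
            partial_residual = theta - phi.sum(axis=0) + phi[j]
            phi[j] = smoothers[j] @ partial_residual
    return phi

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-1, 1, size=(n, 2))
y = x[:, 0] ** 2 + np.sin(3 * x[:, 1]) + 0.1 * rng.standard_normal(n)

# Centered, unit-norm response plays the role of theta for one outer stage.
theta = (y - y.mean()) / np.sqrt(np.mean((y - y.mean()) ** 2))
smoothers = [centered(histogram_smoother(x[:, j], 8)) for j in range(2)]

phi = inner_loop(theta, smoothers)
residual = theta - phi.sum(axis=0)
max_smooth_of_residual = max(np.abs(S @ residual).max() for S in smoothers)
print(max_smooth_of_residual, np.mean(residual ** 2))
```

After the loop has settled, smoothing the residuals on either predictor returns values that are numerically zero, which is the content of (A.4) for this particular choice of smooth.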

    Theorem A.1. If $\det[I + A] \ne 0$ and if the spectral radii of $S_1, \ldots, S_p$ are all less than one, a necessary and sufficient condition for


    $T^mf \to 0$ for all $f \in H_2(\phi)$ is that

    det[A - (I - Si/A)-I(I -S))] (A.9)has no zeros nJA 1exceptA= 1.Proof. ForTmf 0, allf E H2(4), it s necessaryndsufficientthat he pectral adius fT be lessthan ne. Theequation f = 2fincomponentorms

    Af,= -Si(Ai f, + E f, j = 1,. p. (A.1O)IO.Proposition .1. IfONis definednH2(y)for ll data etsD, and0 E H2(Y) such hat

    EIION(y) 0(y)112 0,thenE ON(Y) 0(y) 2

    Proof. Write0/110110/IIOIIN0(/101 l/IIOIIN).hen twoparts reneeded: irst,o show hat11IIONIINIOIINN


    and second, to show that

    F 1- IIOIIN 2NForthefirst art, et

    S2 1 (0N(Yk) 0 Yk) 2 (ON , 0)NN N k IIONIIN IIOIIN) IIONIINIIOIIN)Then N ? 4, so it s enough oshow hatN 0 toget SN 0.Let

    VN N> (ON(Yk) -(YJ= IIONIIk + 111N - 2(ON, 0)N= (IIONIIN - IIIIN) + 2(1IOIINIIONIIN - (ON, 0)N)-

    Both terms re positive, and since EVN- 0, E(I10NIIN - IIOIIN)2 - 0and E(IIOIINIIONIIN - (ON, 0)N) O 0. By assumption, 1101kN 110112,resultingn SN 40Now look atWN = - 0O2(yk)[liII01IN - 1/11011]2Nk

    IIIIk(1 IIOIIN - 1/11011)2= (1 - IIOIIN/IIOII)

    Then WN -- 0 follows romhe ssumptions.Using Proposition .1, itfollows hat IION(Y; m, 1) - 0(y; m,N)II- 0 and, nconsequence,hat I,N(X,; m,1) -,(x, ; m,1)112- 0.In functionpace,define

    P)m'Q= 0 - Tm0Um= x

    Then 0(; m,1) = Um lIUnThe last tep ntheproofs showinghat||UM00I/11UM00II0*11| 0

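    as $m$, $l$ go to infinity. Begin with Proposition A.2.

    Proposition A.2. As $m \to \infty$, $U_m \to U$ in the uniform operator norm.

    Proof. $\|U_m\theta - U\theta\| = \|P_YT^mP_X\theta\| \le \|T^mP_X\theta\|$. Now on $H_2(Y)$, $\|T^mP_X\| \to 0$. If not, take $\theta_m$, $\|\theta_m\| = 1$, such that $\|T^mP_X\theta_m\| \ge \delta$, all $m$. Let $\theta_{m'} \to \theta$ weakly; then $P_X\theta_{m'} \to P_X\theta$ strongly and

    $\|T^{m'}P_X\theta_{m'}\| \le \|T^{m'}P_X(\theta_{m'} - \theta)\| + \|T^{m'}P_X\theta\| \le \|P_X(\theta_{m'} - \theta)\| + \|T^{m'}P_X\theta\|.$

    By Proposition 5.5 the right-hand side goes to zero.

    The operator $U_m$ is not necessarily self-adjoint, but it is compact. By Proposition A.2, if $O(\mathrm{sp}(U))$ is any open set containing $\mathrm{sp}(U)$, then for $m$ sufficiently large, $\mathrm{sp}(U_m) \subset O(\mathrm{sp}(U))$. Suppose, for simplicity, that the eigenspace $E_\lambda$ corresponding to the largest eigenvalue $\lambda$ of $U$ is one-dimensional. (The proof goes through if $E_\lambda$ is higher-dimensional, but it is more complicated.) Then for any open neighborhood $O$ of $\lambda$, and $m$ sufficiently large, there is only one eigenvalue $\lambda_m$ of $U_m$ in $O$, $\lambda_m > 0$, and the projection $P_E^{(m)}$ of $U_m$ corresponding to $\lambda_m$ converges to $P_{E_\lambda}$ in the uniform operator topology. Moreover, $\lambda_m$ can be taken as the eigenvalue of $U_m$ having largest absolute value. If $\lambda'$ is the second largest eigenvalue of $U$ and $\lambda_m'$ is the eigenvalue of $U_m$ having the second highest absolute value, then (assuming $E_{\lambda'}$ is one-dimensional) $\lambda_m' \to \lambda'$.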

    Write Wm = Um P~,E W = U -PE;so IlWm Wll - 0 again. Now,

    Um6o = Pm)0O + WI0oU'0O = 1IPE-0o + W00. (A.16)

    For any > 0 we will showthat here xistsmo, suchthat ormMO, 1 ? 10,

    ||wmo/im01A 8, |W Soll/2 - * (A. 17)Taker = (, + A')/2 nd selectmo uch hat > max(G, ImlI;MO).Denote by R(A, Wm) he resolvent f Wm. henWI = I | RQL, Wm)di27r I|=r Rand-ilmi rI | gR(A,Wm)11dJAJ,27r 12=r

    where I)4 s arc ength longJIH r.On JiA r,form mo, IIR(A,Wm)II_s continuousnd bounded. urthermore,IR(Q,Wm)II> IIR({,W)II niformly.fM(r) = maxlpI=rIIR(Q,)II, henIIWII r'M(r)(1 + Am),whereAmO 0 as m -> oo. Certainly,IIWIIIr'M(r).Fix6 > 0 such hat1 + 6)r < A.Takem' such hat orm2 max(mo,ms), Am (1 + 6)r. Then

    IIWII/II 1/(1 + 6))'M(r)(l + Am)and

    ll'l li'< 1/ 1+ 6))'M(r) -Nowchoose newmo nd10 uch hatA. 17) is satisfied.Using A.17),

    ul 00 P(m)00where8m,i 0 as m, I -l oo.ThusU1 PE0O PE-00m0- 0* =1IIU||0H m,I IIPEm1l IIPE-0011andtherightidegoesto zero as m, - oo.Thetermweak onsistencys used abovebecausewe have nminda desirabletrongeresult.Weconjecturehat or easonablemooths,the setCN = {(Y1,Xl), . . ., (YN,XN); algorithm onverges}satisfiesP(CN) -+1 and thatfor0N, the imit on CNstarting rom fixed00,

    E[ICNII0N 0N] 0.We also conjecturehat uch theorem ill be difficultoprove.Aweaker, utprobablymuch asierresultwouldbe to assume heuseof elf-adjointon-negativeefinitemooths ith on-negativeatrixelements. henwe know hat he lgorithmonvergeso someON,andwe conjecturehat [II0N 0*N] 0A.5 Mean Squared Consistency fNearestNeighborSmooths

    To show that the ACE algorithm is applicable in a situation, we need to verify that the assumptions of Theorem A.2 can be satisfied. We do this, first assuming that the data $(Y_1, X_1), \ldots, (Y_N, X_N)$ are samples from a two-dimensional stationary, ergodic process. Then the ergodic theorem implies that for any $\theta \in L_2(Y)$, $\|\theta\|_N^2 \to \|\theta\|^2$ and, trivially, $E\|\theta\|_N^2 \to \|\theta\|^2$.

    To show that we can get a bounded, linear sequence of smooths that are mean squared consistent, we use the nearest neighbor smooths.


    Theorem A.3. Let $(Y_1, X_1), \ldots, (Y_N, X_N)$ be samples from a stationary ergodic process such that the distribution of $X$ has no atoms. Then there exists a mean squared consistent sequence of nearest-neighbor smooths of $Y$ on $X$.

    The proof begins with Lemma A.1.

    Lemma A.1. Suppose that $P(dx)$ has no atoms, and let $P_N(dx) \to P(dx)$ weakly. Take $\delta_N > 0$, $\delta_N \to \delta > 0$; define

    $J(x; \varepsilon) = [x - \varepsilon, x + \varepsilon];$

    and

    $\varepsilon_N(x) = \min\{\varepsilon;\ P_N(J(x, \varepsilon)) \ge \delta_N\}$
    $\varepsilon(x) = \min\{\varepsilon;\ P(J(x, \varepsilon)) \ge \delta\}.$

    Then using $\Delta$ to denote symmetric difference,

    $P_N(J(x, \varepsilon_N(x))\ \Delta\ J(x, \varepsilon(x))) \to 0$ uniformly in $x$ (A.18)

    and

    $\limsup_{N}\ \sup_{\{(x,y);\ |x-y| \le h\}} P_N(J(x, \varepsilon(x))\ \Delta\ J(y, \varepsilon(y))) \le \varepsilon_1(h),$ (A.19)

    where $\varepsilon_1(h) \to 0$ as $h \to 0$.

    Proof. Let $F_N(x)$, $F(x)$ be the cumulative df corresponding to $P_N$, $P$. Since $F_N \to F$ weakly and $F$ is continuous, then it follows that

    $\sup_x |F_N(x) - F(x)| \to 0.$

    To proveA. 18),note hatPN(J(X, 9N) AJ(X,))_ 1PN(J(XN)) - PN(J(X, 0))

    1N - PN(J(X,9N))l+ 1|N - 31 + IFN(X+ ?(x)) - F(x + ?(x))|+ IFN(X ?(X)) - FN(x - ( ,

    which oes it.To prove A. 19), it s sufficientoshow hatsup P(J(x, e(x)) AJ(y, ?(y))) c ?X(h)-

    x,y,k-yj5hFirst, ote hat

    |?(x) - s(Y)I S Ix - yA.IfJ(x,E(x)), J(y, e(y)) overlap, hen heirymmetricifferenceon-sists ftwo ntervals,, 12 such hatJIj? 2jx - Yl, I21 21x - yl.There s anho 0 such hatf x - y ho, he woneighborhoodsalwaysoverlap.Otherwiseheres a sequence x"},with (x,) -* 0andP(J(x", e(x"))) = 3, whichs impossible,inceP has no atoms.Thenforh s ho

    sup P(J(x, e(x)) AJ(y, e(y))) s 2 sup P(I)x,y;jx-yt-h |iI92h

    and theright-handidegoestozero as h -> 0.The emma sapplied s follows: etg(y)be anybounded unctioninL2(Y). Define 6(g Ix), using f) todenote he ndicatorunction,as11/ g(y) (x' EJ(x,e(x)))P(dy, dx')

    = 11/ Px(g x') I(x' E J(x, e(x)))P(dx').Note hat a s bounded nd ontinuousnx.Denote y W he moothswithM = [NJ]. Proposition.3 follows.

    Proposition .3. ElISg g - Pjgllj - 0 for ixed .Proof. By A. 18),with robabilityne,Sr (g I x) = (1/ N]) I g(yj)I(x1 E J(x, EN(X)))

    can be replaced or ll x bygN(x,a)) = (11[3N])> g(y3)1(x, J(x, iE(x))),wherew is a sample equence.

    Bythe rgodic heorem,or countablex"} dense n the eal ine,andc E W', P(W') = 1,('N(X., w0) = gN(X, CO) - Pb(g I Xn) -O 0.Use (A. 19) to establish hat or nyboundednterval andanywoW', N(X, co) 0 uniformlyor E J. Thenwrite

    1Nll|DN(X,)IIN E N'(Xk,w)I(XkE J)N k=1N+ ->k=Fkl, o41(Xk E' J).Nk=

    The first erm s bounded ndgoes to zero for o E W'; hence tsexpectationoesto ero.Theexpectationf he econd ennsboundedby cP(X E ' J). SinceJ can be taken rbitrarilyarge, his ompletestheproof.Using he nequalityEjIS6'g - Pxglls 2 Ej|S( g - P6gll + 21IP6g Pxgll2gives

    limsupEjjS?g - Pxgll2 21jP6g Pxgll2.Proposition .4. Forany4(x) cL2(X), im,,,0jjP& - O.11 0.Proof. For4 bounded ndcontinuous,

    Oq(x')I(x' E J(x, (x)))P(dx') (x)as (5-- 0 for very . SincesuplP,5 - ? c for ll (, then IP,4- oil-- 0. Thepropositionollowsf t can be shown hat or very0 E L2(X), imsup6llP0ll o. But

    IP6l12 = f [ O(x')I(x' E J(x,C(x)))P(dx')1P(dx)S O (X )2 p(d) [ I(X' E-J(x, (x)))P(dx)]

    Suppose hat ' is such hat here renumbers+, c- with ([x', x'+ c+]) = (, P([x', x' - -]) = 6. Thenx' E J(x,E(x)) impliesxi - e x x' + +, and

    116f (x' E J(x, (x)))P(dx) 2. (A.20)If,say,P([x', co)) < then 2? x' - c and A.20) stillholds, ndsimilarlyfP((- oo,x']) < 3.

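    Take $\{\theta_n\}$ to be a countable set of functions dense in $L_2(Y)$. By Propositions A.3 and A.4, for any $\varepsilon > 0$, we can select $\delta(\varepsilon, n)$, $N(\delta, n)$ so that for all $n$,

    $$E\|S_N^\delta \theta_n - P_X \theta_n\|_N^2 \le \varepsilon \quad \text{for } \delta \le \delta(\varepsilon, n),\ N \ge N(\delta, n).$$

    Let $\varepsilon_M \downarrow 0$ as $M \to \infty$; define $\delta_M = \min_{n \le M} \delta(\varepsilon_M, n)$ and $N(M) = \max_{n \le M} N(\delta_M, n)$. Then

    $$E\|S_N^{\delta_M} \theta_n - P_X \theta_n\|_N^2 \le \varepsilon_M \quad \text{for } n \le M,\ N \ge N(M).$$

    Put $M(N) = \max\{M;\ N \ge \max(M, N(M))\}$. Then $M(N) \to \infty$ as $N \to \infty$, and the sequence of smooths $S_N^{\delta_{M(N)}}$ is mean squared consistent for all $\theta_n$. Noting that for $\theta \in L_2(Y)$,

    $$E\|S_N \theta - P_X \theta\|_N^2 \le 3E\|S_N \theta_n - P_X \theta_n\|_N^2 + 9\|\theta - \theta_n\|^2$$

    completes the proof of the theorem.

    The fact that ACE uses modified smooths $\tilde{S}_N^\delta g = S_N^\delta g - \mathrm{Av}(S_N^\delta g)$ and functions $g$ such that $Eg = 0$ causes no problems, since

    $$\|\mathrm{Av}(S_N^\delta g)\|_N^2 = (\mathrm{Av}(S_N^\delta g))^2$$

    and

    $$\mathrm{Av}(S_N^\delta g) = \mathrm{Av}(g_N(x, \omega)),$$

    using the notation of Proposition A.3.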


    Assume $g$ is bounded, and write

    $$\mathrm{Av}(g_N(x, \omega)) = \frac{1}{N} \sum_k \phi_N(x_k, \omega) + \frac{1}{N} \sum_k P_\delta(g \mid x_k).$$

    By the ergodic theorem, the second term goes a.s. to $E P_\delta(g \mid X)$, and an argument mimicking the proof of Proposition A.3 shows that the first term goes to zero a.s. Finally, write

    $$|E P_\delta(g \mid X)| = |E P_\delta(g \mid X) - E P_X g| \le \|P_\delta \phi - \phi\|,$$

    where $\phi = P_X g$. Thus, Theorem A.3 can be easily changed to account for modified smooths.

    In the controlled experiment situation, the $\{x_k\}$ are not random, but the condition $P_N(dx) \xrightarrow{w} P(dx)$ is imposed. Additional assumptions are necessary.

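    Assumption A.1. For $\theta(Y)$ any bounded function in $L_2(Y)$, $E(\theta(Y) \mid X = x)$ is continuous in $x$.

    Assumption A.2. For $i \ne j$ and $\phi(x)$ any bounded continuous function, $E(\phi(X_j) \mid X_i = x)$ is continuous in $x$.

    A necessary result is Proposition A.5.

    Proposition A.5. For $\theta(y)$ bounded in $L_2(Y)$ and $\phi(x)$ bounded and continuous,

    $$\frac{1}{N} \sum_{j=1}^{N} \theta(y_j)\phi(x_j) \xrightarrow{\text{a.s.}} E\,\theta(Y)\phi(X).$$

    Let $T_N = \sum_{j=1}^{N} \theta(y_j)\phi(x_j)$. Then $ET_N = \sum_j g(x_j)\phi(x_j)$, $g(x) = E[\theta(Y) \mid X = x]$. By hypothesis, $ET_N/N \to E\,\theta(Y)\phi(X)$. Furthermore,

    $$\sigma_N^2 = \mathrm{var}(T_N) = \sum_j E[\theta(y_j) - g(x_j)]^2 \phi^2(x_j) = \sum_j h(x_j)\phi^2(x_j),$$

    where $h(x) = E[(\theta(Y) - g(X))^2 \mid X = x]$. Since $h\phi^2$ is continuous and bounded, then $\sigma_N^2/N \to E\,h(X)\phi^2(X)$. Now the application of Kolmogorov's exponential bound gives

    $$T_N/N - ET_N/N \xrightarrow{\text{a.s.}} 0,$$

    proving the proposition.

    In Theorem A.2 we add the restriction that $\theta_0$ be a bounded function in $L_2(Y)$. Then the condition on $\theta$ may be relaxed to the following: For $\theta$, any bounded function in $L_2(Y)$, $E\|\theta\|_N^2 \to \|\theta\|^2$, $\|\theta\|_N^2 \xrightarrow{\text{a.s.}} \|\theta\|^2$. These follow from Proposition A.5 and its proof. Furthermore, because of Assumptions A.1 and A.2, mean squared consistency of the smooths can be relaxed to the following requirements.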

    Assumption A.3. For $i \ne j$ and every bounded continuous function $\phi(x_j)$, $E\|S_i \phi - P_i \phi\|_N^2 \to 0$.

    Assumption A.4. For every bounded function $\theta(y) \in L_2(Y)$, $E\|S_j \theta - P_j \theta\|_N^2 \to 0$.

    Assumption A.5. For every bounded continuous function $\phi(x_j)$, $E\|S_Y \phi - P_Y \phi\|_N^2 \to 0$.

    The existence of sequences of nearest-neighbor smooths satisfying Assumptions A.3, A.4, and A.5 can be proven in a fashion similar to the proof of Theorem A.3. Assumption A.3 is proven using Lemma A.1 and Proposition A.4. Assumptions A.4 and A.5 require Proposition A.5 in addition.

    If the data are iid, stronger results can be obtained. For instance, mean squared consistency can be proven for a modified regression smooth similar to the supersmoother. For $x$ any point, let $J(x)$ be the indexes of the $M$ points in $\{x_k\}$ directly above $x$ plus the $M$ below. If there are only $M' < M$ above (below), then include the $M + (M - M')$ directly below (above). For a regression smooth,

    $$S(\phi \mid x) = \bar{\phi}_x + [\Gamma_x(\phi, x)/\sigma_x^2](x - \bar{x}_x), \qquad \text{(A.21)}$$

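    where $\bar{\phi}_x$, $\bar{x}_x$ are the averages of $\phi(y_k)$, $x_k$ over the indexes in $J(x)$, and $\Gamma_x(\phi, x)$, $\sigma_x^2$ are the covariance between $\phi(y_k)$, $x_k$ and the variance of $x_k$ over the indexes in $J(x)$. Write the second term in (A.21) as

    $$[\Gamma_x(\phi, x)/\sigma_x][(x - \bar{x}_x)/\sigma_x].$$

    If there are $M$ points above and below in $J(x)$, it is not hard to show that

    $$|(x - \bar{x}_x)/\sigma_x| \le 1.$$

    This is not true near endpoints, where $(x - \bar{x}_x)/\sigma_x$ can become arbitrarily large as $M$ gets large. This endpoint behavior keeps regression from being uniformly bounded. To remedy this, define a function

    $$[x]_1 = \begin{cases} x, & |x| \le 1 \\ \mathrm{sign}(x), & |x| > 1, \end{cases}$$

    and define the modified regression smooth by

    $$S(\phi \mid x) = \bar{\phi}_x + [\Gamma_x(\phi, x)/\sigma_x][(x - \bar{x}_x)/\sigma_x]_1. \qquad \text{(A.22)}$$

    This modified smooth is bounded by 2.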
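The modified regression smooth in (A.22) can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' code: the smooth is evaluated only at the sample points, the window includes the evaluation point itself together with its $M$ neighbors on each side, and the window is shifted inward near the ends so that it keeps the same number of points. The names `clip_unit` and `modified_regression_smooth` are hypothetical.

```python
import numpy as np

def clip_unit(t):
    """The function [t]_1: identity on [-1, 1], sign(t) outside."""
    return np.clip(t, -1.0, 1.0)

def modified_regression_smooth(x, phi_vals, m):
    """Local linear fit over 2m + 1 neighbors with a clipped slope term, as in (A.22).

    Assumptions: evaluation at sample points only; the window contains the
    point itself; near the ends the window slides inward to keep its size.
    """
    x = np.asarray(x, dtype=float)
    phi_vals = np.asarray(phi_vals, dtype=float)
    n = len(x)
    order = np.argsort(x, kind="stable")
    xs, ps = x[order], phi_vals[order]
    width = min(n, 2 * m + 1)
    out_sorted = np.empty(n)
    for r in range(n):
        lo = min(max(r - m, 0), n - width)      # shift the window at the endpoints
        xw, pw = xs[lo:lo + width], ps[lo:lo + width]
        x_bar, phi_bar = xw.mean(), pw.mean()
        cov = np.mean((pw - phi_bar) * (xw - x_bar))
        sigma = np.sqrt(np.mean((xw - x_bar) ** 2))
        if sigma == 0.0:
            out_sorted[r] = phi_bar             # degenerate window: no slope term
        else:
            z = clip_unit((xs[r] - x_bar) / sigma)
            out_sorted[r] = phi_bar + (cov / sigma) * z
    out = np.empty(n)
    out[order] = out_sorted                     # restore the input ordering
    return out
```

Because the window mean is at most $\max|\phi|$ in absolute value, and $|\Gamma_x/\sigma_x|$ is at most the window standard deviation of $\phi$, which is also at most $\max|\phi|$, the clipped smooth never exceeds $2\max|\phi|$. This is the sense in which the sketch reproduces the bound stated above.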

    Theorem A.4. If, as $N \to \infty$, $M \to \infty$, $M/N \to 0$, and $P(dx)$ has no atoms, then the modified regression smooths are mean squared consistent.

    The proof is in Breiman and Friedman (1982). We are almost certain that the modified regression smooths are also mean squared consistent for stationary ergodic time series and in the weaker sense for controlled experiments, but under less definitive conditions on rates at which $M \to \infty$.

    APPENDIX B: VARIABLES USED IN THE HOUSING VALUE EQUATION OF HARRISON AND RUBINFELD (1978)

    MV - median value of owner-occupied home
    RM - average number of rooms in owner units
    AGE - proportion of owner units built prior to 1940
    DIS - weighted distances to five employment centers in the Boston region
    RAD - index of accessibility to radial highways
    TAX - full property tax rate ($/$10,000)
    PTRATIO - pupil-teacher ratio by town school district
    B - black proportion of population
    LSTAT - proportion of population that is lower status
    CRIM - crime rate by town
    ZN - proportion of town's residential land zoned for lots greater than 25,000 square feet
    INDUS - proportion of nonretail business acres per town
    CHAS - Charles River dummy = 1 if tract bounds the Charles River, 0 otherwise
    NOX - nitrogen oxide concentration in pphm

    APPENDIX C: VARIABLES USED IN THE OZONE-POLLUTION EXAMPLE

    SBTP - Sandburg Air Force Base temperature (°C)
    IBHT - inversion base height (ft.)
    DGPG - Daggett pressure gradient (mmhg)
    VSTY - visibility (miles)
    VDHT - Vandenburg 500 millibar height (m)
    HMDT - humidity (percent)


    IBTP - inversion base temperature (°F)
    WDSP - wind speed (mph)

    Dependent variable: UPO3 - Upland ozone concentration (ppm)

    [Received August 1982. Revised July 1984.]

    REFERENCES

    Anscombe, F. J., and Tukey, J. W. (1963), "The Examination and Analysis of Residuals," Technometrics, 5, 141-160.
    Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics, New York: John Wiley.
    Box, G. E. P., and Cox, D. R. (1964), "An Analysis of Transformations," Journal of the Royal Statistical Society, Ser. B, 26, 211-252.
    Box, G. E. P., and Hill, W. J. (1974), "Correcting Inhomogeneity of Variance With Power Transformation Weighting," Technometrics, 16, 385-389.
    Box, G. E. P., and Tidwell, P. W. (1962), "Transformations of the Independent Variables," Technometrics, 4, 531-550.
    Breiman, L., and Friedman, J. (1982), "Estimating Optimal Transformations for Multiple Regression and Correlation," Technical Report 9, University of California, Berkeley, Dept. of Statistics.
    Cleveland, W. S. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, 74, 828-836.
    Craven, P., and Wahba, G. (1979), "Smoothing Noisy Data With Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation," Numerische Mathematik, 31, 317-403.
    Csaki, P., and Fisher, J. (1963), "On the General Notion of Maximal Correlation," Magyar Tudomanyos Akademia, Budapest, Matematikai Kutato Intezet, Kozlemenyei, 8, 27-51.
    De Boor, C. (1978), A Practical Guide to Splines, New York: Springer-Verlag.
    De Leeuw, J., Young, F. W., and Takane, Y. (1976), "Additive Structure in Qualitative Data: An Alternating Least Squares Method With Optimal Scaling Features," Psychometrika, 41, 471-503.
    Devroye, L. (1981), "On the Almost Everywhere Convergence of Nonparametric Regression Function Estimates," The Annals of Statistics, 9, 1310-1319.
    Devroye, L., and Wagner, T. J. (1980), "Distribution-Free Consistency Results in Nonparametric Discrimination and Regression Function Estimation," The Annals of Statistics, 8, 231-239.
    Efron, B. (1979), "Bootstrap Methods: Another Look at the Jackknife," The Annals of Statistics, 7, 1-26.
    Fraser, D. A. S. (1967), "Data Transformations and the Linear Model," Annals of Mathematical Statistics, 38, 1456-1465.
    Friedman, J. H., and Stuetzle, W. (1982), "Smoothing of Scatterplots," Technical Report ORION006, Stanford University, Dept. of Statistics.
    Gasser, T., and Rosenblatt, M. (eds.) (1979), "Smoothing Techniques for Curve Estimation," in Lecture Notes in Mathematics, No. 757, New York: Springer-Verlag.
    Gebelein, H. (1947), "Das Statistische Problem der Korrelation als Variations- und Eigenwertproblem und Sein Zusammenhang mit der Ausgleichungsrechnung," Zeitschrift fuer Angewandte Mathematik und Mechanik, 21, 364-379.
    Harrison, D., and Rubinfeld, D. L. (1978), "Hedonic Housing Prices and the Demand for Clean Air," Journal of Environmental Economics Management, 5, 81-102.
    Kendall, M. A., and Stuart, A. (1967), The Advanced Theory of Statistics (Vol. 2), New York: Hafner Publishing.
    Kimeldorf, G., May, J. H., and Sampson, A. R. (1982), "Concordant and Discordant Monotone Correlations and Their Evaluations by Nonlinear Optimization," Studies in the Management Sciences (19): Optimization in Statistics, eds. S. H. Zanakis and J. S. Rustagi, Amsterdam: North-Holland, pp. 117-130.
    Kruskal, J. B. (1964), "Nonmetric Multidimensional Scaling: A Numerical Method," Psychometrika, 29, 115-129.
    Kruskal, J. B. (1965), "Analysis of Factorial Experiments by Estimating Monotone Transformations of the Data," Journal of the Royal Statistical Society, Ser. B, 27, 251-263.
    Lancaster, H. O. (1958), "The Structure of Bivariate Distributions," Annals of Mathematical Statistics, 29, 719-736.
    Lancaster, H. O. (1969), The Chi-Squared Distribution, New York: John Wiley.
    Linsey, J. K. (1972), "Fitting Response Surfaces With Power Transformations," Journal of the Royal Statistical Society, Ser. C, 21, 234-237.
    Linsey, J. K. (1974), "Construction and Comparison of Statistical Models," Journal of the Royal Statistical Society, Ser. B, 36, 418-425.
    Mosteller, F., and Tukey, J. W. (1977), Data Analysis and Regression, Reading, MA: Addison-Wesley.
    Renyi, A. (1959), "On Measures of Dependence," Acta Mathematica Academiae Scientiarum Hungaricae, 10, 441-451.
    Sarmanov, O. V. (1958a), "The Maximal Correlation Coefficient (Symmetric Case)," Doklady Akademii Nauk UzSSR, 120, 715-718.
    Sarmanov, O. V. (1958b), "The Maximal Correlation Coefficient (Nonsymmetric Case)," Doklady Akademii Nauk UzSSR, 121, 52-55.
    Sarmanov, O. V., and Zaharov, V. K. (1960), "Maximum Coefficients of Multiple Correlation," Doklady Akademii Nauk UzSSR, 130, 269-271.
    Spiegelman, C., and Sacks, J. (1980), "Consistent Window Estimation in Nonparametric Regression," The Annals of Statistics, 8, 240-246.
    Stone, C. J. (1977), "Consistent Nonparametric Regression," The Annals of Statistics, 5, 139-149.
    Tukey, J. W. (1982), "The Use of Smelting in Guiding Re-Expression," in Modern Data Analysis, eds. J. Laurner and A. Siegel, New York: Academic Press.
    Wood, J. T. (1974), "An Extension of the Analysis of Transformations of Box and Cox," Journal of the Royal Statistical Society, Ser. C, 23, 278-283.
    Young, F. W., de Leeuw, J., and Takane, Y. (1976), "Regression With Qualitative and Quantitative Variables: An Alternating Least Squares Method With Optimal Scaling Features," Psychometrika, 41, 505-529.

    Comment

    DARYL PREGIBON and YEHUDA VARDI*

    * Daryl Pregibon and Yehuda Vardi are Members of Technical Staff, AT&T Bell Laboratories, Murray Hill, NJ 07974.

    © 1985 American Statistical Association. Journal of the American Statistical Association, September 1985, Vol. 80, No. 391, Theory and Methods.

    In data analysis, the choice of transformations is often done subjectively. ACE is a major attempt to bring objectivity to this area. As Breiman and Friedman have demonstrated with their examples, and as we have experienced with our own, ACE is a powerful tool indeed. Our comments are sometimes critical in nature and reflect our view that there is much more to be done on the subject. We consider the methodology a significant contribution to statistics, however, and would like to compliment the authors for attacking an important problem, for narrowing the gap between mathematical statistics and data analysis, and for providing the data analyst with a useful tool.

    1. ACE IN THEORY: HOW MEANINGFUL IS MAXIMAL CORRELATION?

    To keep our discussion simple we limit it here to the bivariate case, though the issues that we raise are equally relevant to the general case. The basis of ACE lies in the properties of maximal