regresioni shumfisht

Upload: jumbo-smith

Post on 03-Jun-2018


  • 8/12/2019 regresioni shumfisht


    MULTIPLE REGRESSION IN HYDROLOGY


    by R. L. Holder

    INSTITUTE OF HYDROLOGY, WALLINGFORD


    1985 Institute of Hydrology, Wallingford, Oxfordshire OX10 8BB

    ISBN 0 948540 00 1

    The Institute of Hydrology is a component establishment of the Natural Environment Research Council.

    All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the copyright owner, Institute of Hydrology, Crowmarsh Gifford, Wallingford, Oxon OX10 8BB, England. Printed in Great Britain by Galliard (Printers) Ltd, Great Yarmouth.


    PREFACE

    There are many problems in hydrology that may be solved by multiple regression procedures. This type of analysis may be used in flood and low flow studies, for example, and in catchment modelling. Rainfall-runoff equations derived by using multiple regression procedures have been developed and used for a variety of purposes, such as flow projection in times of drought and the estimation of past flows from weather data.

    As no single work of reference dealt comprehensively with the use of multiple regression in hydrology, the Department of the Environment's Central Water Planning Unit, in June 1975, commissioned one from Mr R. L. Holder of the Department of Mathematical Statistics, Birmingham University. Following the transfer of certain functions and responsibilities of the DoE Unit to the Natural Environment Research Council's Institute of Hydrology, a revised and updated book incorporating more substantial examples was prepared with the assistance of the staff of the Institute.

    Because of the apparent generality of the method of relating one variable to a set of other variables, multiple regression is probably the most frequently used, and indeed misused, statistical tool. Undoubtedly, the technique is potentially very useful and it is currently the subject of much theoretical research by mathematical statisticians; but, as with any statistical procedure, it is crucially important to understand the basis, assumptions and limitations of the technique. Computer packages have taken the drudgery out of regression analysis and some allow great flexibility in the type of analysis conducted. As well as instructing the reader on the basic techniques, this book aims to educate readers to extend their use of regression beyond the standard procedures.

    I am very pleased, therefore, that it has been possible for this Institute to publish Mr Holder's most useful addition to the hydrological literature.

    J. S. McCulloch
    Director, Institute of Hydrology
    April, 1985


    ACKNOWLEDGEMENTS

    My sincere thanks go to Colin Wright of the Department of the Environment for the original idea and initial support for this book; to David Jones of the Institute of Hydrology for providing the problems and data sets for Chapter 5, as well as many helpful suggestions; to Mrs T. Carr for typing the manuscript; to Mrs A. Mayho for assisting with the computing; and to Miss P. Binn for suggesting many improvements to the original manuscript.

    Birmingham, October 1984
    R. L. H.


    CONTENTS

    Preface
    Acknowledgements

    Chapter 1 Simple Linear Regression

    1.1 Introduction
      1.1.1 A problem in linear regression analysis
      1.1.2 Assumptions made in linear regression
      1.1.3 Interpretation of the assumptions
      1.1.4 What can be achieved by using linear regression analysis?

    1.2 The Basic Method
      1.2.1 Fitting a straight line
      1.2.2 Estimates and their precision
      1.2.3 Significance tests
      1.2.4 Prediction

    1.3 Extensions to the Basic Method
      1.3.1 Repeated observations
      1.3.2 Fitting and comparing several straight lines
      1.3.3 Observations with unequal precision

    1.4 Alternatives to Least Squares
      1.4.1 Pencil and ruler
      1.4.2 Robust and distribution free methods
      1.4.3 Bayesian methods
      1.4.4 Linear functional relationships

    Chapter 2 Multiple Linear Regression

    2.1 Introduction
      2.1.1 Problems for multiple linear regression analysis
      2.1.2 Assumptions made in multiple linear regression
      2.1.3 Interpretation of the assumptions
      2.1.4 What can be achieved by using multiple linear regression?

    2.2 The Basic Method
      2.2.1 Fitting the model
      2.2.2 Estimates and their precision
      2.2.3 Prediction


    2.3 Significance Tests and the 'Best' Equation
      2.3.1 General linear hypothesis
      2.3.2 Initial significance tests
      2.3.3 Selection of variables: the 'best' equation
      2.3.4 All possible regressions
      2.3.5 Forward selection
      2.3.6 Backward selection
      2.3.7 Stepwise regression

    2.4 Extensions to the Basic Method
      2.4.1 Fitting and comparing several regression lines
      2.4.2 Observations with unequal precision
      2.4.3 Missing observations

    2.5 Special Models
      2.5.1 Univariate polynomial models
      2.5.2 Multivariable polynomial models
      2.5.3 Periodic regression
      2.5.4 Dummy variables

    2.6 Alternatives to least squares
      2.6.1 Pencil and ruler
      2.6.2 Robust and distribution free methods
      2.6.3 Ridge regression and principal components regression
      2.6.4 Bayesian methods
      2.6.5 Functional relationships

    Chapter 3 Before a Multiple Regression Analysis

    3.1 What to Include and Why
      3.1.1 Why is the analysis being conducted?
      3.1.2 Which independent variables should be used?

    3.2 The distribution of the dependent variable
      3.2.1 Requirements of least squares
      3.2.2 Evidence to justify or question the assumptions
      3.2.3 Test of the assumptions

    3.3 Transformations
      3.3.1 Variance stabilising transformations
      3.3.2 Transformations to normality and linearising transformations
      3.3.3 Box-Cox transformations

    3.4 Autocorrelation in Multiple Regression
      3.4.1 Possible causes and consequences
      3.4.2 Transformations

    Chapter 4 After a Multiple Regression Analysis

    4.1 Some Preliminary Checks
      4.1.1 Examining the form of the regression equation
      4.1.2 Examining the behaviour of the regression model
      4.1.3 Stability of the model

    4.2 Problems of Numerical Stability
      4.2.1 Numerical methods used in regression
      4.2.2 The relative merits of the various numerical methods
      4.2.3 Detecting the failure of the numerical methods


    4.3 Analysis of Residuals
      4.3.1 Plotting the residuals
      4.3.2 Some tests on the residuals
      4.3.3 Other residuals
      4.3.4 Autocorrelation

    Chapter 5 Some Examples

    5.1 An Example of Fitting and Comparing Several Regression Lines

    5.2 Multiple Regression on Mean Annual Flood
      5.2.1 Introduction
      5.2.2 Transformations and weights on annual maximum flood
      5.2.3 Regression of the standard deviation
      5.2.4 Comparison between regions
      5.2.5 Examination of assumptions

    5.3 Stepwise Regression: Choosing the Best Predictors
      5.3.1 Introduction
      5.3.2 An example of stepwise regression
      5.3.3 Some further regressions
      5.3.4 A simple predictor for monthly flow

    Postscript
    Index


    Chapter 1

    SIMPLE LINEAR REGRESSION

    1.1 Introduction

    1.1.1 A problem in linear regression analysis

    A study of the relationship between rainfall and run-off in a particular area may, amongst other things, have led the investigator to keep records of the annual rainfall and the annual run-off over a period of several years. An example of such records, taken from the Alwen catchment (Lewis, 1957), is given in Table 1.

    Table 1 Monthly rainfall and run-off for the Alwen Catchment, North Wales, 1912-1915 (mm)

    Year  Jan.  Feb.  Mar.  Apr.  May  June  July  Aug.  Sep.  Oct.  Nov.  Dec.

    This table gives the precise details, within recording accuracy, of rainfall and run-off in the Alwen catchment between 1912 and 1915 and, as such, is the most complete statistical representation of the investigator's findings. However, some alternative statistical representation of these facts may be necessary in order to achieve a specific objective. The investigator may wish to:

    (a) Summarise his data in terms of just a few pertinent numbers.
    (b) Decide whether rainfall and run-off influence each other.
    (c) Predict some future run-off which might be expected from a certain annual rainfall.


    (d) Predict the rainfall that would be necessary to produce a certain run-off.
    (e) Decide whether certain of the readings in the table are exceptional or do not follow the same pattern or trend as the others.
    (f) Build or complete some mathematical model relating rainfall and run-off.
    (g) Make some comparison between the readings given in Table 1 and similar readings obtained from another area.

    To achieve any of these objectives, the first step could be to draw a graph of annual or monthly rainfall against run-off. Careful examination of this graph and judicious use of a ruler, a flexible curve and his own experience would help to give the investigator an answer to objectives (a), (b), (c), (d), (e) and (g). Linear regression analysis would also be helpful. However, in suggesting this further technique, it is not our intention to denigrate graphical and visual methods; indeed, it is hoped that the reader will realise that the two are complementary. Allowance for other factors, such as evaporation or month-to-month variation, would improve the accuracy of the relationship shown in Figure 1. More complex relationships of this type are discussed in Chapter 2.

    [Figure: Fig. 1. Alwen catchment rainfall and run-off, 1912-15. Horizontal axis: rainfall (mm).]


    1.1.2 Assumptions made in linear regression

    Simple linear regression may be applied to problems in which a record has been made of the values of two variables, referred to as $y$, the dependent variable, and $x$, the independent variable. It is assumed that, for any such record, the mathematical model

    $$y = a + bx + e \qquad (1)$$

    where $a$ and $b$ are constants and $e$ is a variable, describes the relationship between the $y$ reading and the $x$ reading. By temporarily ignoring the term $e$, we see that a straight line (linear) relationship is assumed between $y$ and $x$, with $a$ the intercept and $b$ the slope of the graph of $y$ plotted against $x$. However, if our model only allowed for readings of $y$ and $x$ which fell exactly on a straight line, it would be of little practical value. Inclusion of the variable $e$ allows readings of $y$ and $x$ to deviate from a straight line, but assumptions are made about $e$ so as to force these deviations to have a particular pattern. If we imagine being able to take many readings all giving the same value of $x$, then some $y$ values will be greater than $a + bx$ and some less, i.e. some values of $e$ will be positive and some negative. Most of the assumptions made in linear regression can be stated in terms of the values of $e$. We will assume that the arithmetic mean of the values of $e$ is zero. We will also assume that the variance of these values of $e$ is always the same wherever the value of $x$ happens to fall. At a later stage, we will also need to assume that these values of $e$ form a normal distribution.
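    To make model (1) and its assumptions concrete, the following Python sketch simulates readings from it. The parameter values ($a = 10$, $b = 0.5$, $\sigma = 5$), the four $x$ values and the helper name `simulate_linear_model` are invented for illustration and have nothing to do with the Alwen data. Each error is drawn independently from a normal distribution with mean zero and the same standard deviation at every $x$, which is exactly what the assumptions above require; with many readings per $x$, the printed mean of $e$ should be close to zero and the spread of $e$ roughly equal at each $x$.

```python
import random
import statistics


def simulate_linear_model(xs, a, b, sigma, seed=0):
    """Draw one y reading for each x from the model y = a + b*x + e.

    The errors e are independent, normally distributed with mean zero,
    and have the same standard deviation sigma at every x.
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma) for _ in xs]
    ys = [a + b * x + e for x, e in zip(xs, errors)]
    return ys, errors


# Hypothetical set-up: 2500 readings at each of four fixed x values,
# mimicking the idealised situation pictured in Figure 2.
xs = [x for x in (50.0, 100.0, 150.0, 200.0) for _ in range(2500)]
ys, errors = simulate_linear_model(xs, a=10.0, b=0.5, sigma=5.0)

print(f"mean of e over all readings: {statistics.fmean(errors):.3f}")
for x0 in (50.0, 100.0, 150.0, 200.0):
    e_at_x0 = [e for x, e in zip(xs, errors) if x == x0]
    print(f"x = {x0:5.1f}: sd of e = {statistics.stdev(e_at_x0):.2f}")
```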

    Figure 2 indicates the type of graph one would expect if it were possible to record many values of $y$, all with the same $x$ value. In practice, we will frequently have only one value of $y$ to plot at one $x$ position. Consequently, one of the problems we will have to consider is how to justify the above assumptions when the readings are not available in the ideal form shown in Figure 2.

    [Figure: Fig. 2. Distribution of values about the regression line.]

    An alternative interpretation of the assumptions is as follows. If we are able to fix a value of $x$, then the value of $y$ we record should be $a + bx$. However, due to errors, inaccuracies, and uncontrollable or inexplicable variations, the reading of $y$ that we make is $a + bx + e$, with $e$ representing the error in measurement. Then, on average, such errors should be zero, i.e. there is no consistent bias in our readings as a result of errors made. Furthermore, all readings should be made with equal precision, i.e. an error of a given size is equally likely to be made at any value of $x$. Finally, the errors of measurement should form a normal distribution.

    There is one further assumption to add to both of these explanations, and this is that all errors (values of $e$) are assumed to be independent, i.e. the magnitude of the error in one reading does not influence the magnitude of the error in another reading.

    1.1.3 Interpretation of the assumptions

    Let us consider the direct application of linear regression to the data of Table 1. As may be deduced from their titles, and certainly from equation (1), the variables $y$ and $x$ are treated differently in linear regression; they are not interchangeable. Thus, some thought must be given to which variable we call $y$ and which we call $x$. Ideally, we would have one variable subject to errors and the other fixed, controlled or error free; the former would be $y$ and the latter $x$. However, with rainfall and run-off as potential $y$ and $x$ variables, the choice is by no means straightforward. Indeed, considering the measurement of these two variables, we would probably have to conclude that both were subject to errors; consequently, the linear regression model (1), which appears to attribute all error to one variable, is not appropriate. Models which allow both variables to be subject to error will be discussed later but, for now, let us consider what circumstances might lead us to use linear regression for rainfall/run-off problems. As is so frequently the case, it is our objective, together with some knowledge of the physical process being studied, which determines the form of the model. If we wish to predict the likely annual run-off from an annual rainfall of $R$, then we will need to assume that $R$ is fixed and predict what we regard as an uncertain quantity, run-off. Thus, there is some intuitive support for assuming that the available rainfall readings are fixed and, together with some statistical reasoning, this leads us to conclude that rainfall should be treated as the independent variable $x$. In general, we should usually aim to take the predicted variable to be $y$ and the predictor to be $x$. In this particular example, there is a further reason for taking rainfall as the $x$ variable, in that rainfall is, to some extent, a cause of run-off, and hence our model may be interpreted as being of the form

    output = some function of input + error
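    The statement that $y$ and $x$ are not interchangeable can be checked numerically. The Python sketch below is illustrative only. The five (rainfall, run-off) pairs are invented, not taken from Table 1, and `least_squares_slope` is a hypothetical helper applying the least squares slope formula developed in subsection 1.2.1. Regressing run-off on rainfall and rainfall on run-off gives two different lines. The product of their slopes equals the squared correlation coefficient, so the lines coincide only when every point lies exactly on one straight line.

```python
def least_squares_slope(x, y):
    """Slope of the least squares line of y on x, i.e. S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Invented annual totals in mm, for illustration only.
rainfall = [900.0, 1100.0, 1250.0, 1400.0, 1600.0]
runoff = [520.0, 700.0, 760.0, 980.0, 1100.0]

b_runoff_on_rain = least_squares_slope(rainfall, runoff)
b_rain_on_runoff = least_squares_slope(runoff, rainfall)

# Re-expressed as d(run-off)/d(rainfall), the second line is steeper
# than the first unless the points are exactly collinear.
print(f"run-off on rainfall:            {b_runoff_on_rain:.3f}")
print(f"rainfall on run-off, inverted:  {1 / b_rain_on_runoff:.3f}")
```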

    Having decided upon an $x$ and a $y$, we next have to consider our assumptions about the errors or inexplicable variations. Imagine drawing the best possible line to describe the points in Figure 1, as illustrated in Figure 3; the vertical displacement of each point from this straight line represents the error or inexplicable variation for that reading. If the first assumption, that the errors average to zero, is true, then this will usually lead to a collection of displacements which appear to have no pattern to them and few particularly outstanding values. Similarly, reference to Figure 3 will help in assessing the second assumption, that of equal precision or error variance. This would be reflected in the graph by a similar spread of deviations about the line over the whole range of $x$ values. If, on the other hand, points tend to group close to the line in some regions of $x$ and are widely dispersed to either side of the line in other regions of $x$, then this might suggest that the precision of the results varies.

    [Figure: Fig. 3. Error or inexplicable variation in y readings. Horizontal axis: rainfall (mm). For the sake of clarity, the vertical displacement of every point has not been drawn.]

    The assumption that the errors form a normal distribution is not essential for all the steps in a linear regression analysis. For instance, a best fitting straight line may be obtained, and some approximate statement made about the accuracy of that line, without this assumption. However, if such an assumption can be made or arranged (see Section 3.3), a far more complete and satisfactory analysis can be accomplished. As in the case of the two previous assumptions, it is usually necessary to refer to the outcome of a linear regression analysis in order to assess the validity of this assumption (see Section 4.3). However, some knowledge of the distribution of hydrological data will be of value in detecting likely problem cases (see Section 3.2).

    The assumption of independence is usually violated when there is some carry-over from one reading to the next, frequently when such readings come as a sequence in time. An example might be where two run-off measurements are made over time periods which overlap, or which are both affected by the same heavy rainfall or drought. Another example is where one reading contributes to the next in some way, as might happen with river flow measurements taken at stations which are close together. Problems which are more appropriately modelled as time series are considered later. However, as with the normality assumption, the choice of a best fitting straight line does not necessarily depend on this assumption of independence being satisfied.
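    One simple numerical check for the carry-over just described is the lag-one autocorrelation of the errors (in practice, the residuals) taken in time order. Values well away from zero suggest that successive errors are not independent. The Python sketch below uses the ordinary sample autocorrelation formula. The function name and the two residual sequences are made up for illustration. The book treats autocorrelation more fully in Sections 3.4 and 4.3.4.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-one autocorrelation of a sequence in time order:

    r1 = sum_t (e_t - mean)(e_{t+1} - mean) / sum_t (e_t - mean)^2
    """
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    numerator = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


# Made-up residual sequences for illustration.
alternating = [1.0, -1.0, 1.0, -1.0]             # each error reverses the last
persistent = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]   # runs of same-signed errors

print(lag1_autocorrelation(alternating))  # -0.75
print(lag1_autocorrelation(persistent))   # positive: suggests carry-over
```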

    1.1.4 What can be achieved by using linear regression analysis?

    So far, it has been suggested that linear regression analysis might help in solving problems (a) to (g) of subsection 1.1.1 and that some 'best fitting' straight line might also emerge. Before plunging into a detailed description of how we might answer these objectives, it would be as well to examine more specifically what it is possible to achieve using linear regression analysis.

    First of all, let us suppose that the assumptions mentioned earlier are satisfied, that we have chosen a $y$ and an $x$, and that we have a set of data similar to that of Table 1, namely pairs of values of $y$ and $x$. We may estimate $a$ and $b$ in equation (1), together with their standard errors, or, alternatively, we may derive confidence intervals for $a$ and $b$ or for the line $a + bx$. This will give an answer to problem (a), some idea of (b), and possibly an appropriate answer to (f).

    Having estimated $a$ and $b$, we may use these estimates to predict a value of $y$ corresponding to a particular $x$ (and vice versa) by calculating

    $$\hat y = \hat a + \hat b x \qquad (2)$$

    where $\hat y$, $\hat a$ and $\hat b$ are estimates of $y$, $a$ and $b$ respectively. Alternatively, we may derive a confidence interval for this unknown value of $y$. This will help to answer (c) or (d).

    We may carry out tests of significance on $b$ and/or on $a$ in order to examine simplifications of equation (1). For example, we could test $a = 0$, or even $b = 0$. If we have several sets of similar data, then we may estimate $a$ and $b$ for each set and carry out tests on the similarity of the different $a$s and $b$s. The first tests might help with problem (b) or (f), and the others with problem (g).

    If, for each value of $x$ recorded in our data, we estimate the corresponding value of $y$ using equation (2) with our estimates of $a$ and $b$, then the difference between the recorded and the estimated value of $y$ is usually referred to as the residual. The set of residuals calculated from all the data contains useful information. Patterns in these residuals, when plotted against their associated $x$ values, may indicate a poor model and may suggest the direction in which improvements might be made. The residuals may also be used to examine the validity of the assumptions mentioned in subsection 1.1.2. A residual which was markedly different from the others would point towards a solution of problem (e).

    1.2 The Basic Method

    1.2.1 Fitting a straight line

    The basic data for this section will consist of pairs of values of $y$ and $x$, where the identity of $y$ and $x$ has been established as already outlined. These pairs of values will be denoted by $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$, and model (1) will then become

    $$y_i = a + b x_i + e_i \qquad (\text{for } i = 1, 2, 3, \ldots, n) \qquad (3)$$

    This is illustrated in Figure 4 for three pairs of points.

    [Figure: Fig. 4. Regression line and associated data.]

    The exact position of the line $y = a + bx$ is unknown, and our problem is to make an intelligent guess at its position, given the points on the graph. There are many proposals as to how this intelligent guess should be made. We will examine one method in detail, namely least squares estimation, but we will also consider some alternatives. The objective of least squares estimation is to choose values of the unknowns so as to minimise

    $$S^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$


    i.e. the sum of the squared vertical discrepancies of the points from the line should be as small as possible (squaring makes each discrepancy count regardless of its sign). Solving $\partial S^2/\partial a = 0$ and $\partial S^2/\partial b = 0$ will give the values of $a$ and $b$, denoted by $\hat a$ and $\hat b$, which minimise $S^2$. Hence, solving the equations

    $$-2\sum_{i=1}^{n} (y_i - \hat a - \hat b x_i) = 0 \quad\text{and}\quad -2\sum_{i=1}^{n} (y_i - \hat a - \hat b x_i)\, x_i = 0 \qquad (4)$$

    will give the estimates

    $$\hat b = \frac{\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)/n}{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n} = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2} \quad\text{and}\quad \hat a = \bar y - \hat b \bar x \qquad (5)$$

    where

    $$\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i \quad\text{and}\quad \bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$$

    The first expression in equation (5) is the one usually recommended for calculation because it preserves accuracy. However, this point is only valid provided full accuracy can be retained throughout the calculation. If there are a large number of data points and $y$ and $x$ are relatively large values, then this may lead to $\sum x_i y_i$ and $(\sum x_i)(\sum y_i)/n$ both being large and similar; hence, their difference may be seriously affected by the roundoff errors generated when calculating either of the large expressions. In such circumstances, the second expression in equation (5) is more satisfactory. Roundoff problems usually arise where a digital computer has been used for calculation and, under these circumstances, there is little extra hardship involved in using the alternative expression.

    As sums of squares and cross products appear frequently in regression calculations, let us define the following terms:

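    $$\begin{aligned}
    S_{xx} &= \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n \\
    S_{yy} &= \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2/n \\
    S_{xy} &= \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) = \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)/n
    \end{aligned} \qquad (6)$$

    Thus, in this new notation, equation (5) becomes $\hat b = S_{xy}/S_{xx}$.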
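    Equations (5) and (6) translate directly into code. The following Python sketch uses illustrative function names that do not come from the book. It computes $S_{xx}$ and $S_{xy}$ either from raw sums, as in the first expression of equation (5), or from deviations about the means, as in the second. It then forms $\hat b = S_{xy}/S_{xx}$ and $\hat a = \bar y - \hat b \bar x$. The artificial data lie exactly on $y = 3 + 2x$ but have $x$ values near $10^8$, which is enough to expose the roundoff problem described above in ordinary double-precision arithmetic.

```python
def sums_raw(x, y):
    """S_xx and S_xy from raw sums, e.g. sum(x*y) - sum(x)*sum(y)/n."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    s_xx = sum(xi * xi for xi in x) - sx * sx / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    return s_xx, s_xy


def sums_centred(x, y):
    """S_xx and S_xy from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xx, s_xy


def fit_line(x, y, sums=sums_centred):
    """Least squares estimates (a_hat, b_hat) via equations (5) and (6)."""
    n = len(x)
    s_xx, s_xy = sums(x, y)
    b_hat = s_xy / s_xx
    a_hat = sum(y) / n - b_hat * sum(x) / n
    return a_hat, b_hat


# Artificial data lying exactly on y = 3 + 2x, but with large x values.
x = [1e8 + i for i in range(10)]
y = [3.0 + 2.0 * xi for xi in x]

print("centred sums:", fit_line(x, y))            # (3.0, 2.0)
print("raw sums:    ", fit_line(x, y, sums_raw))  # typically inaccurate here
```

    The raw-sum version has to subtract two nearly equal numbers of order $10^{17}$, where doubles are spaced 16 apart. It can therefore lose most of the significant digits of $S_{xx} = 82.5$. The centred version works with small deviations and is exact for these data.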


    1.2.2 Estimates and their precision

    The most we can hope to get from data which do not exactly form a straight line is estimates of $a$ and $b$; if we add some new data, then our estimates will almost certainly change. By making some assumption about the variance of the variable $e$ (see subsection 1.1.3), we can derive the variances of $\hat a$ and $\hat b$. In particular, if we assume that the variance of $e$ (denoted by $\mathrm{Var}(e)$) is $\sigma^2$, then it follows that

    $$\mathrm{Var}(\hat a) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n S_{xx}} \qquad (7)$$

    and

    $$\mathrm{Var}(\hat b) = \frac{\sigma^2}{S_{xx}} \qquad (8)$$

    All the quantities in expressions (7) and (8), except for $\sigma^2$, may be calculated from the data. Since $\sigma^2$ is the variance of the variable $e$, it is natural to use the residuals, namely

    $$\hat e_i = y_i - \hat a - \hat b x_i \qquad (\text{for } i = 1, 2, \ldots, n) \qquad (9)$$

    in order to estimate $\sigma^2$. We know from equations (4) that $\sum_{i=1}^{n} \hat e_i = 0$ and, hence, that the arithmetic mean of the residuals is zero. Consequently, if we use the sum of squares about the mean of the residuals as the basis of our estimate of $\sigma^2$, then that sum of squares will just be $\sum_{i=1}^{n} \hat e_i^2$. The appropriate divisor is $n - 2$, two degrees of freedom having been 'lost' by estimating $a$ and $b$. Hence, our estimate of $\sigma^2$ will be

    $$\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n} \hat e_i^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat a - \hat b x_i)^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right) \qquad (10)$$

    Which expression in equation (10) is chosen for calculating $\hat\sigma^2$ depends on two factors. If the residuals are to be calculated in any case, then it is obviously sensible to use the first expression. If they are not, then the last expression may be preferable. In evaluating the component expressions $S_{xx}$, $S_{xy}$ and $S_{yy}$, the remarks made at the end of subsection 1.2.1 concerning numerical accuracy also apply in this context.

    We are now able to report estimates of $a$ and $b$ together with estimated standard errors ($\sqrt{\text{estimated variance}}$) for those estimates. An alternative summary would be to provide confidence intervals for $a$ and $b$. However, as these are probabilistic statements, they require some assumptions about the probability distribution of the variable $e$. The usual assumption is that $e$ is a normal random variable; we have already assumed that its mean is zero and that its variance is $\sigma^2$. The shorthand notation for these assumptions is $e \sim N(0, \sigma^2)$.
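    As a sketch of equations (7) to (10), the Python function below first computes the residuals (9). It then checks that they sum to zero, as equations (4) require, and evaluates $\hat\sigma^2$ by both the first and the last expression of equation (10). Finally, it substitutes $\hat\sigma^2$ for $\sigma^2$ in (7) and (8) to obtain estimated standard errors. The function name, the dictionary layout and the five data points are all invented for illustration.

```python
import math


def regression_summary(x, y):
    """Estimates, residuals, sigma^2 estimate and standard errors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar

    # Residuals, equation (9).
    resid = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]

    # Equation (10): two algebraically equal routes to sigma^2 hat.
    sigma2 = sum(r * r for r in resid) / (n - 2)
    sigma2_alt = (s_yy - s_xy ** 2 / s_xx) / (n - 2)

    # Equations (7) and (8), with sigma^2 replaced by its estimate.
    se_a = math.sqrt(sigma2 * sum(xi * xi for xi in x) / (n * s_xx))
    se_b = math.sqrt(sigma2 / s_xx)
    return {"a_hat": a_hat, "b_hat": b_hat, "resid": resid,
            "sigma2": sigma2, "sigma2_alt": sigma2_alt,
            "se_a": se_a, "se_b": se_b}


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = regression_summary(x, y)
print(f"a_hat = {fit['a_hat']:.3f} (se {fit['se_a']:.3f})")
print(f"b_hat = {fit['b_hat']:.3f} (se {fit['se_b']:.3f})")
print(f"sigma^2 hat = {fit['sigma2']:.4f} (check: {fit['sigma2_alt']:.4f})")
```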


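    If, in our model (3), we assume that $e_i \sim N(0, \sigma^2)$ (for $i = 1, 2, \ldots, n$) and that the $e_i$ are independent (see subsection 1.1.2), then

    $$\hat a \sim N\!\left(a, \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n S_{xx}}\right) \quad\text{and}\quad \hat b \sim N\!\left(b, \frac{\sigma^2}{S_{xx}}\right)$$

    In addition, $\sum_{i=1}^{n} (y_i - \hat a - \hat b x_i)^2$, usually called the residual (or error) sum of squares, follows a $\sigma^2 \chi^2_{n-2}$ distribution (see footnote). This in turn means that

    $$\frac{\hat a - a}{\sqrt{\hat\sigma^2 \sum_{i=1}^{n} x_i^2 / (n S_{xx})}} \sim t_{n-2} \qquad (11)$$

    and

    $$\frac{\hat b - b}{\sqrt{\hat\sigma^2 / S_{xx}}} \sim t_{n-2} \qquad (12)$$

    If $t(n, p)$ is defined by $p = \int_{-\infty}^{t(n,p)} f(t_n)\, dt_n$, where $f(\cdot)$ is the probability density function of the $t$ random variable with $n$ degrees of freedom, then the $100(1-\alpha)\%$ confidence interval for $a$ is

    $$\hat a \pm t(n-2, 1-\alpha/2)\sqrt{\frac{\hat\sigma^2 \sum_{i=1}^{n} x_i^2}{n S_{xx}}}$$

    and the $100(1-\alpha)\%$ confidence interval for $b$ is

    $$\hat b \pm t(n-2, 1-\alpha/2)\sqrt{\frac{\hat\sigma^2}{S_{xx}}}$$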
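    The two confidence intervals can be sketched in Python as follows. To stay within the standard library, the function assumes that the caller supplies the percentage point $t(n-2, 1-\alpha/2)$, read from printed tables in the same way as a hand calculation would. For example, $t(3, 0.975) \approx 3.182$ gives 95% intervals from five points. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` returns the same quantity. The function name and the five data points are invented for illustration.

```python
import math


def confidence_intervals(x, y, t_crit):
    """100(1 - alpha)% confidence intervals for a and b.

    t_crit must be t(n - 2, 1 - alpha/2), e.g. taken from tables.
    Returns ((a_low, a_high), (b_low, b_high)).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)

    sum_x2 = sum(xi * xi for xi in x)
    half_a = t_crit * math.sqrt(sigma2_hat * sum_x2 / (n * s_xx))
    half_b = t_crit * math.sqrt(sigma2_hat / s_xx)
    return (a_hat - half_a, a_hat + half_a), (b_hat - half_b, b_hat + half_b)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ci_a, ci_b = confidence_intervals(x, y, t_crit=3.182)  # t(3, 0.975)
print("95% interval for a:", ci_a)
print("95% interval for b:", ci_b)
```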

    1.2.3 Significance tests

    Equations (11) and (12) may also be used to test the validity of appropriate hypotheses about $a$ and $b$. For instance, with the Alwen data, we might ask whether a hypothesis of $a = 0$ is valid. If it is valid, then this would imply that the model should be

    run-off = $b$ × rainfall + error

    so that, except for error, we would expect zero run-off when there is zero rainfall.

    Footnote: Each of the three random variables $\chi^2$, $t$ and $F$ is a special function of normal random variables, and each is frequently encountered in practice. The suffixes are referred to as degrees of freedom and relate to the number of independent normal random variables involved in the function. Most texts on mathematical statistics give definitions of these random variables and derive their probability density functions.

    If our hypothesis $a = 0$ is true, then equation (11) becomes

    $$\frac{\hat a}{\sqrt{\hat\sigma^2 \sum_{i=1}^{n} x_i^2 / (n S_{xx})}} \sim t_{n-2}$$

    Hence, if we accept the hypothesis $a = 0$ whenever

    $$\frac{|\hat a|}{\sqrt{\hat\sigma^2 \sum_{i=1}^{n} x_i^2 / (n S_{xx})}} < t(n-2, 1-\alpha/2)$$

    and reject the hypothesis otherwise, then this will give us a $100\alpha\%$ significance test for this hypothesis.

    We might also consider whether the hypothesis $b = 0$ is valid. If it is valid, then this would give the model

    run-off = constant + error

    which would imply that rainfall does not affect run-off. If the hypothesis $b = 0$ is true, then equation (12) becomes

    $$\frac{\hat b}{\sqrt{\hat\sigma^2 / S_{xx}}} \sim t_{n-2}$$

    Hence, if we accept the hypothesis $b = 0$ whenever

    $$\frac{|\hat b|}{\sqrt{\hat\sigma^2 / S_{xx}}} < t(n-2, 1-\alpha/2)$$

    and reject the hypothesis otherwise, then this will give a $100\alpha\%$ significance test for this hypothesis.

    Clearly, both of these test procedures are equivalent to accepting the respective hypotheses whenever the points $a = 0$ or $b = 0$ fall within the $100(1-\alpha)\%$ confidence intervals constructed for $a$ and $b$.
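    The procedures above reduce to comparing a $t$ statistic with a tabulated percentage point. In the Python sketch below, the helper names are hypothetical and the caller is assumed to supply $t(n-2, 1-\alpha/2)$ from tables. The sketch computes the statistic for $a = 0$ and, more generally, for $b = b_0$; setting $b_0 = 0$ gives the test just described. The five data points are invented for illustration.

```python
import math


def _fit(x, y):
    """Return (a_hat, b_hat, sigma2_hat, s_xx) for a least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    return a_hat, b_hat, rss / (n - 2), s_xx


def t_statistic_a(x, y):
    """t statistic for the hypothesis a = 0 (n - 2 degrees of freedom)."""
    a_hat, _, sigma2_hat, s_xx = _fit(x, y)
    n = len(x)
    sum_x2 = sum(xi * xi for xi in x)
    return a_hat / math.sqrt(sigma2_hat * sum_x2 / (n * s_xx))


def t_statistic_b(x, y, b0=0.0):
    """t statistic for the hypothesis b = b0 (n - 2 degrees of freedom)."""
    _, b_hat, sigma2_hat, s_xx = _fit(x, y)
    return (b_hat - b0) / math.sqrt(sigma2_hat / s_xx)


def accept(t_value, t_crit):
    """Accept the hypothesis when |t| < t(n - 2, 1 - alpha/2)."""
    return abs(t_value) < t_crit


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
t_crit = 3.182  # t(3, 0.975), i.e. a test at the 5% level
print("a = 0 accepted:", accept(t_statistic_a(x, y), t_crit))  # True
print("b = 0 accepted:", accept(t_statistic_b(x, y), t_crit))  # False
```

    Because each decision uses $|t| < t(n-2, 1-\alpha/2)$, it agrees with checking whether the hypothesised value lies inside the corresponding $100(1-\alpha)\%$ confidence interval.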

Another hypothesis on b which may be of interest, although not to the Alwen data example, is the hypothesis b = 1. If we are testing a new measuring instrument and we are taking readings (y) on items where the exact result (x) is known, then fitting a straight line y = a + bx will allow us to test for correct zeroing of the instrument (a = 0) and correct calibration (b = 1). This, of course, assumes that a linear relationship is appropriate.

The appropriate test procedure would be to accept the hypothesis b = 1 whenever

$$|\hat b - 1| < t(n-2,\,1-\alpha/2)\left[\frac{\hat\sigma^2}{S_{xx}}\right]^{1/2}$$

and to reject the hypothesis otherwise. Calibration experiments are discussed again later.

1.2.4 Prediction

One of the purposes of fitting a straight line to a set of data might be either to interpolate or to extrapolate. Having carried out a regression of y on x, it is usual to want to predict a value of y corresponding to a known value of x. The obvious predictor is

$$\hat y = \hat a + \hat b x$$

which has variance

$$\operatorname{Var}(\hat y) = \sigma^2\left(\frac{1}{n} + \frac{(x-\bar x)^2}{S_{xx}}\right).$$

However, we must be careful to consider just what this estimate is estimating and, in particular, what its variance implies. If we were able to measure values of y repeatedly at this known value of x, then the arithmetic mean of these ys would tend to some fixed number, confusingly called the mean value of y. It is this fixed number which we are estimating, and the variance represents the errors of estimation which we will make as a result of using only estimates of a and b. Our fixed number would be a + bx, which we could calculate exactly if only we knew a and b.

However, we supposed in model (1) that any single reading of y was made up of a + bx + e, each reading showing some unpredictable error e from its ideal value a + bx (the one we have discussed in the previous paragraph). Our estimate of any single reading will be $\hat a + \hat b x$ (as our estimate of e must be zero), but its variance will be

$$\sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x-\bar x)^2}{S_{xx}}\right),$$

the first component being for the error of the reading and the second component for the error of prediction. An alternative presentation of this information is to give confidence intervals for a + bx and a + bx + e, the former being for the mean value of y and the latter for a single reading of y. They are

$$\hat a + \hat b x \pm t(n-2,\,1-\alpha/2)\left[\hat\sigma^2\left(\frac{1}{n} + \frac{(x-\bar x)^2}{S_{xx}}\right)\right]^{1/2} \qquad (13)$$

and

$$\hat a + \hat b x \pm t(n-2,\,1-\alpha/2)\left[\hat\sigma^2\left(1 + \frac{1}{n} + \frac{(x-\bar x)^2}{S_{xx}}\right)\right]^{1/2} \qquad (14)$$

respectively, for 100(1 − α)% confidence intervals. Figures 5 and 6 show the confidence intervals (13) and (14) plotted on the same graph as the regression line, $y = \hat a + \hat b x$. These graphs make clear that confidence interval (13) represents the precision of the regression line, while confidence interval (14) gives limits for individual readings of y.
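Equations (13) and (14) can be computed directly. This minimal sketch assumes the residual-mean-square form of $\hat\sigma^2$ and a user-supplied table value `t_table` for t(n − 2, 1 − α/2). The function name `regression_intervals` and the data are hypothetical.

```python
import math

def regression_intervals(x, y, x0, t_table):
    """Return (mean_interval, single_reading_interval) at x0, per equations (13) and (14)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    s2 = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
    y0 = a_hat + b_hat * x0
    c = 1 / n + (x0 - x_bar) ** 2 / s_xx
    half_mean = t_table * math.sqrt(s2 * c)          # (13): mean value of y at x0
    half_single = t_table * math.sqrt(s2 * (1 + c))  # (14): one new reading of y at x0
    return (y0 - half_mean, y0 + half_mean), (y0 - half_single, y0 + half_single)

mean_ci, single_ci = regression_intervals([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1],
                                          x0=4.5, t_table=3.182)
print(mean_ci, single_ci)
```

The interval for a single reading is always the wider of the two, because it adds the reading error σ² to the error of estimating a + bx. Both intervals widen as x0 moves away from x̄.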

Fig. 5. Confidence interval for the mean value of y.

Figure 5 shows the loci of the confidence limits for the mean value of y. Therefore, for a fixed value of x, there is a probability of (1 − α) that the interval defined by (13) contains the mean value of y. The whole region illustrated in Figure 5 should not be confused with the confidence region for the line y = a + bx, i.e. the region that has an overall probability of (1 − α) of containing the whole line y = a + bx. To find the confidence region for y = a + bx, replace t(n − 2, 1 − α/2) in equation (13) by $[2F(2,\,n-2,\,1-\alpha)]^{1/2}$, where

$$1-\alpha = \int_0^{F(2,\,n-2,\,1-\alpha)} g(F)\,dF$$

and g(F) is the probability density function of the F random variable with 2 and n − 2 degrees of freedom. Tables of F(n₁, n₂, 1 − α) are widely available (for instance, Table 18 of Biometrika Tables for Statisticians, Vol. 1, Pearson and Hartley, 1972).


Fig. 6. Confidence or prediction interval for a single reading of y.

If, because of the nature of the variables x and y, it is necessary to predict a value of x from an observed value of y, then the natural estimate would be

$$\hat x = \frac{y - \hat a}{\hat b}.$$

A 100(1 − α)% confidence interval for the correct value of x would be

$$\hat x + \frac{g(\hat x - \bar x)}{1-g} \pm \frac{t\,\hat\sigma}{|\hat b|\,(1-g)}\left[(1-g)\left(1+\frac{1}{n}\right) + \frac{(\hat x-\bar x)^2}{S_{xx}}\right]^{1/2}$$

where t = t(n − 2, 1 − α/2) and $g = t^2\hat\sigma^2/(\hat b^2 S_{xx})$.
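The inverse-prediction (calibration) interval can be sketched as follows. The sketch assumes the standard Fieller-type interval for one new reading of y, written with the shorthand $g = t^2\hat\sigma^2/(\hat b^2 S_{xx})$. That form is an assumption about the shape of the book's formula. When g ≥ 1 the slope is not clearly different from zero and no finite interval exists, so the sketch raises an error. The name `inverse_prediction` is hypothetical.

```python
import math

def inverse_prediction(x, y, y_obs, t_table):
    """Estimate x for an observed y; return (x_hat, (lower, upper))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    s2 = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
    x_hat = (y_obs - a_hat) / b_hat
    g = t_table ** 2 * s2 / (b_hat ** 2 * s_xx)
    if g >= 1:
        raise ValueError("g >= 1: slope not clearly non-zero, interval is unbounded")
    centre = x_hat + g * (x_hat - x_bar) / (1 - g)
    half = (t_table * math.sqrt(s2) / (abs(b_hat) * (1 - g))) * math.sqrt(
        (1 - g) * (1 + 1 / n) + (x_hat - x_bar) ** 2 / s_xx)
    return x_hat, (centre - half, centre + half)

print(inverse_prediction([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], y_obs=8.0, t_table=3.182))
```

The interval is not centred on $\hat x$ unless $\hat x = \bar x$. It is pulled away from x̄ by the factor 1/(1 − g).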

1.3 Extensions to the Basic Method

1.3.1 Repeated observations

Let us consider the situation where, instead of a single value of y being recorded for each value of x, several independent observations of y are available. Alternatively, we could consider that, by chance, there are several values of y all with the same value of x. A notation to cope with this situation is outlined below.

Such a situation might arise when x is a variable over which we have some control or choice and we are able to repeatedly observe values of y under identical conditions. Our model might be

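$$y_{ij} = a + b x_i + e_{ij} \quad (\text{for } j = 1, 2, \ldots, r_i \text{ and } i = 1, 2, \ldots, n) \qquad (15)$$

which is not very different from model (3). Let us define the following terms:

$$S^R_{xx} = \sum_{i=1}^{n} r_i(x_i-\bar x)^2 = \sum_{i=1}^{n} r_i x_i^2 - \Big(\sum_{i=1}^{n} r_i x_i\Big)^2\Big/N$$

$$S^R_{xy} = \sum_{i=1}^{n}\sum_{j=1}^{r_i}(y_{ij}-\bar y)(x_i-\bar x) = \sum_{i=1}^{n} x_i\sum_{j=1}^{r_i} y_{ij} - \Big(\sum_{i=1}^{n}\sum_{j=1}^{r_i} y_{ij}\Big)\Big(\sum_{i=1}^{n} r_i x_i\Big)\Big/N$$

$$S^R_{yy} = \sum_{i=1}^{n}\sum_{j=1}^{r_i}(y_{ij}-\bar y)^2 = \sum_{i=1}^{n}\sum_{j=1}^{r_i} y_{ij}^2 - \Big(\sum_{i=1}^{n}\sum_{j=1}^{r_i} y_{ij}\Big)^2\Big/N$$

where

$$\bar x = \frac{1}{N}\sum_{i=1}^{n} r_i x_i,\qquad \bar y_{i\cdot} = \frac{1}{r_i}\sum_{j=1}^{r_i} y_{ij},\qquad \bar y = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{r_i} y_{ij}$$

and $N = \sum_{i=1}^{n} r_i$ = total number of y readings.

Then, the least squares estimates of a and b are

$$\hat a = \bar y - \hat b\,\bar x \qquad (16)$$

and

$$\hat b = \frac{S^R_{xy}}{S^R_{xx}} \qquad (17)$$

The estimate of slope, $\hat b$, is similar to that which would have come from fitting a straight line to the pairs of points $(\bar y_{i\cdot}, x_i)$, except that each point is weighted according to the number of y readings taken.

The variances of the estimates are

$$\operatorname{Var}(\hat a) = \frac{\sigma^2\sum_{i=1}^{n} r_i x_i^2}{N S^R_{xx}} \qquad (18)$$

and

$$\operatorname{Var}(\hat b) = \frac{\sigma^2}{S^R_{xx}} \qquad (19)$$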

When considering a possible estimate of σ², it is worth noticing the extra potential offered by data in this form. An estimate similar to (10) would be

$$\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n} r_i\left(\bar y_{i\cdot} - \hat a - \hat b x_i\right)^2 \qquad (20)$$

However, an alternative estimate is available by considering the variability of all the y values which have been recorded for one x value. For instance,

$$\frac{1}{r_i - 1}\sum_{j=1}^{r_i}\left(y_{ij} - \bar y_{i\cdot}\right)^2$$

would give an estimate of σ² from y values recorded with x = xᵢ. Using similar estimates of σ² from y values recorded with other x values, and combining these into a single expression, gives an estimate

$$\hat\sigma^2 = \frac{1}{N-n}\sum_{i=1}^{n}\sum_{j=1}^{r_i}\left(y_{ij} - \bar y_{i\cdot}\right)^2 \qquad (21)$$
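Estimates (16), (17), (20) and (21) can be computed from data grouped by x value. In this sketch the input layout is an assumption: a list of `(x_i, [y_i1, ..., y_ir_i])` pairs. The name `repeated_fit` is hypothetical. The slope uses the group means weighted by $r_i$, which gives the same $S^R_{xy}$ as the double sum.

```python
def repeated_fit(groups):
    """groups: list of (x_i, [y_i1, ..., y_ir_i]).

    Returns (a_hat, b_hat, s2_line, s2_pure): estimates (16) and (17), and the
    sigma^2 estimates of equations (20) and (21).
    """
    n = len(groups)
    N = sum(len(ys) for _, ys in groups)                       # total number of y readings
    x_bar = sum(len(ys) * xi for xi, ys in groups) / N
    y_bar = sum(sum(ys) for _, ys in groups) / N
    means = [sum(ys) / len(ys) for _, ys in groups]            # y-bar_i. for each x_i
    s_xx = sum(len(ys) * (xi - x_bar) ** 2 for xi, ys in groups)
    s_xy = sum(len(ys) * (xi - x_bar) * (m - y_bar) for (xi, ys), m in zip(groups, means))
    b_hat = s_xy / s_xx                                        # (17)
    a_hat = y_bar - b_hat * x_bar                              # (16)
    # (20): scatter of the group means about the fitted line, n - 2 degrees of freedom
    s2_line = sum(len(ys) * (m - a_hat - b_hat * xi) ** 2
                  for (xi, ys), m in zip(groups, means)) / (n - 2)
    # (21): pooled scatter within groups, N - n degrees of freedom
    s2_pure = sum((yij - m) ** 2
                  for (_, ys), m in zip(groups, means) for yij in ys) / (N - n)
    return a_hat, b_hat, s2_line, s2_pure

print(repeated_fit([(1, [2.0, 2.2]), (2, [3.9, 4.1]), (3, [6.0, 6.2]), (4, [7.9, 8.1])]))
```

If (20) is much larger than (21), the group means scatter about the line more than replicate readings scatter about their own mean. That pattern hints at a non-linear relationship.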

We might now use the estimate (20) as a measure of how well the linear model fitted the data. Formally, this may be achieved by modifying the model (15) to

$$y_{ij} = a + b x_i + L_i + e_{ij} \quad (\text{for } j = 1, 2, \ldots, r_i \text{ and } i = 1, 2, \ldots, n) \qquad (22)$$


where L₁, L₂, ..., Lₙ are unknown parameters which allow for consistent deviations from a + bxᵢ in the means $\bar y_{i\cdot}$. If we find the hypothesis L₁ = L₂ = ... = Lₙ = 0 to be valid, then this implies that a linear model $y_{ij} = a + bx_i + e_{ij}$ is adequate for relating y and x. Conversely, if we find the hypothesis to be unreasonable, then this implies that the linear model does not adequately explain the relationship between y and x.

An analysis of variance table provides a neat summary of the information necessary to test this hypothesis, as well as the hypothesis b = 0.

The column mean square has been derived from (sum of squares/degrees of freedom). In an analysis of variance table, the total variation in the data (represented by the total sum of squares) is partitioned into a series of meaningful independent quantities. In this case,

Total variation = Variation explained by the regression line + Variation explained by L₁, L₂, ..., Lₙ + Error variation

or, in other words,

Total sum of squares = Regression sum of squares + Sum of squares for systematic departure from the regression line + Residual sum of squares

In most properly constructed analysis of variance tables, the ratio

$$\frac{\text{Mean square due to } X}{\text{Residual mean square}}$$

will follow an F distribution, with degrees of freedom equal to those of the numerator and those of the denominator respectively, whenever X has no real effect, or role, in explaining the total variation. Thus,

$$\frac{\text{Regression mean square}}{\text{Residual mean square}} \sim F_{1,\,N-n}$$

when no variation has been explained by the regression, i.e. when the hypothesis b = 0 is true.
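The partition described above (regression, systematic departure from the line, residual) and its two F ratios can be sketched as follows. The input layout `(x_i, [y_i1, ..., y_ir_i])` and the name `repeated_anova` are hypothetical. The F ratios still have to be compared with tabulated values F(1, N − n, 1 − α) and F(n − 2, N − n, 1 − α).

```python
def repeated_anova(groups):
    """Sums of squares, degrees of freedom and F ratios for model (22)."""
    n = len(groups)
    N = sum(len(ys) for _, ys in groups)
    x_bar = sum(len(ys) * xi for xi, ys in groups) / N
    y_bar = sum(sum(ys) for _, ys in groups) / N
    means = [sum(ys) / len(ys) for _, ys in groups]
    s_xx = sum(len(ys) * (xi - x_bar) ** 2 for xi, ys in groups)
    s_xy = sum(len(ys) * (xi - x_bar) * (m - y_bar) for (xi, ys), m in zip(groups, means))
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    ss_total = sum((yij - y_bar) ** 2 for _, ys in groups for yij in ys)
    ss_reg = s_xy ** 2 / s_xx                                   # explained by the line
    ss_sys = sum(len(ys) * (m - a_hat - b_hat * xi) ** 2        # explained by L_1..L_n
                 for (xi, ys), m in zip(groups, means))
    ss_res = sum((yij - m) ** 2 for (_, ys), m in zip(groups, means) for yij in ys)
    ms_res = ss_res / (N - n)
    return {
        "regression": (ss_reg, 1),
        "systematic departure": (ss_sys, n - 2),
        "residual": (ss_res, N - n),
        "total": (ss_total, N - 1),
        "F regression": ss_reg / ms_res,              # compare with F(1, N - n, 1 - alpha)
        "F systematic": (ss_sys / (n - 2)) / ms_res,  # compare with F(n - 2, N - n, 1 - alpha)
    }

table = repeated_anova([(1, [2.0, 2.2]), (2, [3.9, 4.1]), (3, [6.0, 6.2]), (4, [7.9, 8.1])])
print(table)
```

The regression, systematic-departure and residual sums of squares add up to the total sum of squares, which is a useful arithmetic check.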

Similarly,

$$\frac{\text{Systematic departure mean square}}{\text{Residual mean square}} \sim F_{n-2,\,N-n}$$

when there is no systematic departure from the regression line, i.e. when the hypothesis L₁ = L₂ = ... = Lₙ = 0 is true.

Hence, a 100α% significance test would lead us to accept the hypothesis b = 0 whenever

$$\frac{\text{Regression mean square}}{\text{Residual mean square}} < F(1,\,N-n,\,1-\alpha)$$

Similarly, a 100α% significance test would lead us to accept the hypothesis L₁ = L₂ = ... = Lₙ = 0 whenever

$$\frac{\text{Systematic departure mean square}}{\text{Residual mean square}} < F(n-2,\,N-n,\,1-\alpha)$$

The 100(1 − α)% confidence intervals for a and b are

$$\hat a \pm t(N-n,\,1-\alpha/2)\left[\frac{\hat\sigma^2\sum_{i=1}^{n} r_i x_i^2}{N S^R_{xx}}\right]^{1/2}$$

and

$$\hat b \pm t(N-n,\,1-\alpha/2)\left[\frac{\hat\sigma^2}{S^R_{xx}}\right]^{1/2}$$

respectively, where $\hat\sigma^2$ is given in equation (21).

1.3.2 Fitting and comparing several straight lines

If several sets of rainfall-run-off data have been collected from different sites and a linear model has proved to give a satisfactory explanation of the data, then it may prove useful to compare the estimates of a and b calculated from the data from the different sites. Some interpretation may be attached to a and b. For instance, if we interpret a as the run-off from zero rainfall and b as the proportion of rainfall appearing as run-off, then subsequent comparisons of the estimates of a and b will give some idea of the similarity of the sites in these two features.

Let us assume that there are n sites from which data have been collected and that, from site i, the data consist of rᵢ pairs of readings of y and x, which are denoted by (y_ij, x_ij) (for j = 1, 2, ..., rᵢ). It will usually be sensible to fit separate straight lines to the data from each site. For site i, the model equivalent to (3) would be

$$y_{ij} = a_i + b_i x_{ij} + e_{ij} \qquad (23)$$

Estimates of aᵢ and bᵢ would be derived by applying the basic method


Being able to accept this hypothesis leads to simpler and more meaningful comparisons between the other statistics. The appropriate tests are most easily displayed in an analysis of variance table but, in order to avoid cumbersome algebraic expressions, it will be necessary to introduce some new notation. Let us define the following terms.

$$S^i_{xx} = \sum_{j=1}^{r_i}(x_{ij}-\bar x_{i\cdot})^2,\qquad S^i_{xy} = \sum_{j=1}^{r_i}(x_{ij}-\bar x_{i\cdot})(y_{ij}-\bar y_{i\cdot}),\qquad S^i_{yy} = \sum_{j=1}^{r_i}(y_{ij}-\bar y_{i\cdot})^2$$

[Hence, $\hat b_i = S^i_{xy}/S^i_{xx}$.]

where

$$\bar x_{i\cdot} = \frac{1}{r_i}\sum_{j=1}^{r_i} x_{ij}\qquad\text{and}\qquad \bar y_{i\cdot} = \frac{1}{r_i}\sum_{j=1}^{r_i} y_{ij}.$$

$$S^o_{xx} = \sum_{i=1}^{n}\sum_{j=1}^{r_i}(x_{ij}-\bar x_{\cdot\cdot})^2,\qquad S^o_{xy} = \sum_{i=1}^{n}\sum_{j=1}^{r_i}(x_{ij}-\bar x_{\cdot\cdot})(y_{ij}-\bar y_{\cdot\cdot}),\qquad S^o_{yy} = \sum_{i=1}^{n}\sum_{j=1}^{r_i}(y_{ij}-\bar y_{\cdot\cdot})^2$$

$$\hat b_o = S^o_{xy}/S^o_{xx}$$

where $\bar x_{\cdot\cdot} = \sum_i\sum_j x_{ij}/N$, $\bar y_{\cdot\cdot} = \sum_i\sum_j y_{ij}/N$ and $N = \sum_{i=1}^{n} r_i$.

The first four quantities are calculated using data from just a single site. However, although the remaining expressions are of a similar type to the first four, they involve data from all of the sites. They are calculated by ignoring the distinction of sites and using all of the data from all sites to give an 'overall' quantity (o = overall).

Finally, let us define

$$S^c_{xx} = \sum_{i=1}^{n} S^i_{xx},\qquad S^c_{xy} = \sum_{i=1}^{n} S^i_{xy},\qquad S^c_{yy} = \sum_{i=1}^{n} S^i_{yy}\qquad\text{and}\qquad \hat b_c = S^c_{xy}/S^c_{xx}.$$

These four expressions are of a similar type to $S^o_{xx}$, $S^o_{xy}$, etc., in that they involve data from all of the sites. However, they do not ignore the identity of the sites. Instead they give a combined quantity that allows certain differences that might exist between the sites to be taken into account (c = combined).

If we were able to conclude that the hypothesis b₁ = b₂ = ... = bₙ was acceptable, then $\hat b_c$ would represent a sensible combined estimate of the common slope. However, if, in addition, we were able to conclude that the hypothesis a₁ = a₂ = ... = aₙ was acceptable, then $\hat b_o$ would be a more sensible combined estimate of the common slope.

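$$\begin{array}{lccc}
\text{Source} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Mean square}\\
\hline
\text{Overall regression} & \hat b_o S^o_{xy} & 1 & \\
\text{Difference in positions} & (S^o_{yy}-\hat b_o S^o_{xy})-(S^c_{yy}-\hat b_c S^c_{xy}) & n-1 & \\
\text{Difference in slopes} & \sum_{i=1}^{n}\hat b_i S^i_{xy}-\hat b_c S^c_{xy} & n-1 & \\
\text{Residual} & S^c_{yy}-\sum_{i=1}^{n}\hat b_i S^i_{xy} & N-2n & \\
\hline
\text{Total} & S^o_{yy} & N-1 &
\end{array}$$

(Mean square = Sum of squares/Degrees of freedom.)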

A similar procedure may be applied to this analysis of variance table as to the previous one. However, it is preferable to carry out the tests in the following order, for 100α% significance tests:

1. accept the hypothesis b₁ = b₂ = ... = bₙ whenever

$$\frac{\text{Difference in slopes mean square}}{\text{Residual mean square}} < F(n-1,\,N-2n,\,1-\alpha)$$

2. if the hypothesis b₁ = b₂ = ... = bₙ has been accepted, then accept the hypothesis a₁ = a₂ = ... = aₙ whenever

$$\frac{\text{Difference in positions mean square}}{\text{Residual mean square}} < F(n-1,\,N-2n,\,1-\alpha)$$

3. if both hypotheses b₁ = b₂ = ... = bₙ and a₁ = a₂ = ... = aₙ have been accepted, then accept that there is no linear association between y and x whenever

$$\frac{\text{Overall regression mean square}}{\text{Residual mean square}} < F(1,\,N-2n,\,1-\alpha)$$

For more complex comparisons, such as concurrency of regression lines, the reader is referred to a more advanced text on regression analysis, such as Williams (1959) or Seber (1977).
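The analysis of variance for several straight lines can be built from the per-site, overall and combined corrected sums of squares and products defined above. This is a sketch under one assumption: each site is supplied as a pair of lists `(xs, ys)`. The names `compare_lines` and `corrected_sums` are hypothetical. The sum $\sum \hat b_i S^i_{xy}$ is computed as $\sum (S^i_{xy})^2/S^i_{xx}$, which is the same quantity.

```python
def compare_lines(sites):
    """sites: list of (xs, ys), one per site. Returns {source: (sum of squares, df)}."""
    def corrected_sums(xs, ys):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy = sum((y - my) ** 2 for y in ys)
        return sxx, sxy, syy

    n = len(sites)
    N = sum(len(xs) for xs, _ in sites)
    per_site = [corrected_sums(xs, ys) for xs, ys in sites]          # S^i quantities
    sxx_c, sxy_c, syy_c = [sum(s[k] for s in per_site) for k in range(3)]  # combined
    all_x = [x for xs, _ in sites for x in xs]
    all_y = [y for _, ys in sites for y in ys]
    sxx_o, sxy_o, syy_o = corrected_sums(all_x, all_y)               # overall
    b_c, b_o = sxy_c / sxx_c, sxy_o / sxx_o
    separate = sum(sxy ** 2 / sxx for sxx, sxy, _ in per_site)       # sum of b_i * S^i_xy
    return {
        "overall regression": (b_o * sxy_o, 1),
        "difference in positions": ((syy_o - b_o * sxy_o) - (syy_c - b_c * sxy_c), n - 1),
        "difference in slopes": (separate - b_c * sxy_c, n - 1),
        "residual": (syy_c - separate, N - 2 * n),
        "total": (syy_o, N - 1),
    }

rows = compare_lines([([1, 2, 3, 4], [1.1, 2.9, 5.2, 6.8]),
                      ([1, 2, 3, 4], [3.0, 4.1, 4.9, 6.1])])
ms = {k: ss / df for k, (ss, df) in rows.items()}
print("slopes F ratio:", ms["difference in slopes"] / ms["residual"])
```

Following the order given above, the slopes ratio is examined first. The positions ratio is examined only if a common slope has been accepted.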


1.3.3 Observations with unequal precision

One of the assumptions mentioned in subsection 1.1.2 was that all of the y values should be measured with equal precision, i.e. the fluctuation or variability in each y value should be the same. This will not always be true for hydrological data. Methods for detecting whether this is the case, which use the data only, are given in Section 4.3.

By considering the type of data being recorded, or by using the results of previous studies, it may be possible to relate the variance of a y value to the y value itself. If it is possible, then the problem of unequal precision may be overcome by taking some transformation of the y values, as described in Section 3.3.

Occasionally, the variances of the y values are known exactly. This will not usually happen when, for instance, y is run-off and x is rainfall. However, it may occur when the ys are statistics, such as the slopes of regression lines calculated on separate sets of data, which are being related to some feature x measured on each of the sets of data.

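Our information will then consist of the pairs of points (y₁, x₁), ..., (yₙ, xₙ) together with the n variances of the y values, σ₁², σ₂², ..., σₙ². The estimate in equation (5) will still be an unbiased estimate of the slope of the regression line. However, under these new assumptions, its variance will be

$$\frac{\sum_{i=1}^{n}\sigma_i^2(x_i-\bar x)^2}{(S_{xx})^2}$$

and this is larger than the variance of an alternative estimator,

$$\tilde b = \frac{S^W_{xy}}{S^W_{xx}}$$

where $S^W_{xx}$ and $S^W_{xy}$ are defined as follows:

$$S^W_{xx} = \sum_{i=1}^{n} w_i(x_i-\bar x_w)^2 = \sum_{i=1}^{n} w_i x_i^2 - \Big(\sum_{i=1}^{n} w_i x_i\Big)^2\Big/\sum_{i=1}^{n} w_i$$

$$S^W_{xy} = \sum_{i=1}^{n} w_i(x_i-\bar x_w)(y_i-\bar y_w) = \sum_{i=1}^{n} w_i x_i y_i - \Big(\sum_{i=1}^{n} w_i x_i\Big)\Big(\sum_{i=1}^{n} w_i y_i\Big)\Big/\sum_{i=1}^{n} w_i$$

where

$$\bar x_w = \sum w_i x_i\Big/\sum w_i,\qquad \bar y_w = \sum w_i y_i\Big/\sum w_i\qquad\text{and}\qquad w_i = \frac{1}{\sigma_i^2}. \qquad (24)$$

The corresponding estimator of a is

$$\tilde a = \bar y_w - \tilde b\,\bar x_w.$$

The variances of these new estimators are

$$\operatorname{Var}(\tilde b) = \frac{1}{S^W_{xx}}\qquad\text{and}\qquad \operatorname{Var}(\tilde a) = \frac{1}{\sum_{i=1}^{n} w_i} + \frac{\bar x_w^2}{S^W_{xx}}.$$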

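The 100(1 − α)% confidence intervals for a and b would be

$$\tilde a \pm Z(\alpha/2)\left[\frac{1}{\sum w_i} + \frac{\bar x_w^2}{S^W_{xx}}\right]^{1/2}$$

and

$$\tilde b \pm Z(\alpha/2)\left[\frac{1}{S^W_{xx}}\right]^{1/2}$$

where Z(α/2) is defined by

$$\frac{\alpha}{2} = \int_{Z(\alpha/2)}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du.$$

(The use of the Normal distribution rather than the t distribution in these calculations is a direct consequence of knowing the variances of the y values.)

If repeated observations are available, and if the y readings associated with each value of xᵢ have variance σᵢ², then the estimates in equations (16) and (17) will become

$$\tilde a = \bar y_w - \tilde b\,\bar x_w\qquad\text{and}\qquad \tilde b = \frac{S^{WR}_{xy}}{S^{WR}_{xx}}$$

where $S^{WR}_{xx}$ and $S^{WR}_{xy}$ are defined as follows:

$$S^{WR}_{xx} = \sum_{i=1}^{n} w_i r_i(x_i-\bar x_w)^2 = \sum_{i=1}^{n} w_i r_i x_i^2 - \Big(\sum_{i=1}^{n} w_i r_i x_i\Big)^2\Big/\sum_{i=1}^{n} w_i r_i$$

$$S^{WR}_{xy} = \sum_{i=1}^{n}\sum_{j=1}^{r_i} w_i(x_i-\bar x_w)(y_{ij}-\bar y_w) = \sum_{i=1}^{n} w_i r_i x_i\bar y_{i\cdot} - \Big(\sum_{i=1}^{n} w_i r_i x_i\Big)\Big(\sum_{i=1}^{n} w_i r_i\bar y_{i\cdot}\Big)\Big/\sum_{i=1}^{n} w_i r_i$$

with

$$\bar x_w = \frac{\sum w_i r_i x_i}{\sum w_i r_i},\qquad \bar y_w = \frac{\sum w_i r_i\bar y_{i\cdot}}{\sum w_i r_i},\qquad \bar y_{i\cdot} = \frac{1}{r_i}\sum_{j=1}^{r_i} y_{ij}\qquad\text{and}\qquad w_i = \frac{1}{\sigma_i^2}.$$

The variances of $\tilde a$ and $\tilde b$ will be

$$\operatorname{Var}(\tilde a) = \frac{1}{\sum_{i=1}^{n} w_i r_i} + \frac{\bar x_w^2}{S^{WR}_{xx}}\qquad\text{and}\qquad \operatorname{Var}(\tilde b) = \frac{1}{S^{WR}_{xx}}.$$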
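For single readings with known variances σᵢ², the weighted estimators and their Normal-based intervals can be sketched as follows. The sketch uses weights wᵢ = 1/σᵢ², as in equation (24). `statistics.NormalDist().inv_cdf(1 - alpha/2)` supplies Z(α/2). The name `weighted_line` is hypothetical.

```python
import math
from statistics import NormalDist

def weighted_line(x, y, variances, alpha=0.05):
    """Fit y = a + b*x when each y_i has a known variance sigma_i^2.

    Returns (a_tilde, b_tilde, ci_a, ci_b) with 100(1 - alpha)% Normal-based intervals.
    """
    w = [1.0 / v for v in variances]                  # weights w_i = 1 / sigma_i^2
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw    # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw    # weighted mean of y
    sxx_w = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy_w = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b = sxy_w / sxx_w
    a = yw - b * xw
    var_b = 1.0 / sxx_w
    var_a = 1.0 / sw + xw ** 2 / sxx_w
    z = NormalDist().inv_cdf(1 - alpha / 2)           # Z(alpha/2), about 1.96 for alpha = 0.05
    ci_a = (a - z * math.sqrt(var_a), a + z * math.sqrt(var_a))
    ci_b = (b - z * math.sqrt(var_b), b + z * math.sqrt(var_b))
    return a, b, ci_a, ci_b

# A very imprecise last reading (huge variance) has almost no influence on the fit.
print(weighted_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 30], [1, 1, 1, 1, 1e12]))
```

With equal variances the estimates reduce to ordinary least squares, and Var(b̃) becomes σ²/S_xx.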

The analysis of variance table will become

$$\begin{array}{lcc}
\text{Source} & \text{Sum of squares} & \text{Degrees of freedom}\\
\hline
\text{Regression} & (S^{WR}_{xy})^2/S^{WR}_{xx} & 1\\
\text{Systematic departure from regression line} & \sum_{i=1}^{n} w_i r_i(\bar y_{i\cdot}-\tilde a-\tilde b x_i)^2 & n-2\\
\text{Residual} & \sum_{i=1}^{n}\sum_{j=1}^{r_i} w_i(y_{ij}-\bar y_{i\cdot})^2 & N-n\\
\hline
\text{Total} & \sum_{i=1}^{n}\sum_{j=1}^{r_i} w_i(y_{ij}-\bar y_w)^2 & N-1
\end{array}$$

The methods of testing and the conclusions are similar to those described in subsection 1.2.1. As will be seen in Section 3.3, it is a help to have repeated observations in a study where it is suspected that the variance of y may not remain constant. Initially, it is straightforward to test whether the variance of y has remained constant. If it has not, it is possible to allow for this even when the variances σ₁², ..., σₙ² are unknown. Estimates of these variances may be obtained from

$$\hat\sigma_i^2 = \frac{1}{r_i-1}\sum_{j=1}^{r_i}(y_{ij}-\bar y_{i\cdot})^2$$

and these may be used in the preceding theory to give estimates of a and b. However, inference from the confidence intervals and the analysis of variance table should be made with caution, particularly when any of r₁, r₂, ..., rₙ are small.

Alternatively, plotting $\hat\sigma_i^2$ against xᵢ may suggest that a relationship exists between the variance of y and the variable x (e.g. σᵢ² = αxᵢ or σᵢ² = αxᵢ²). If such a relationship were, for instance, σᵢ² = αxᵢ, then wᵢ in equation (24) could be replaced by 1/(αxᵢ), giving

$$\tilde b = \frac{\sum_{i=1}^{n}(x_i-\bar x_w)(y_i-\bar y_w)/x_i}{\sum_{i=1}^{n}(x_i-\bar x_w)^2/x_i}$$

where $\bar x_w$ and $\bar y_w$ are now weighted means with weights 1/xᵢ (the constant α cancels). Otherwise, a transformation of y might be appropriate. This technique is described in Section 3.3.

1.4 Alternatives to Least Squares

1.4.1 Pencil and ruler

Anyone who attempts a regression analysis without plotting the data in some form is asking for trouble. A plot of y against x on graph paper will reveal the type of relationship that might exist between y and x. It will show whether y increases or decreases with x and whether the relationship is linear or non-linear. It will suggest how strong or weak the relationship might be and, indeed, whether there is any relationship at all. It will show up points which are obviously different from the majority, and it will indicate the range of y and x over which the relationship has been investigated. Why then do we not finish the job, draw a line down the middle of the data and forget about mathematical formulae and calculations? The main reasons are that least squares is a method which is impartial, gives repeatable results and provides a framework for inference. Furthermore, if you genuinely believe that the linear regression model is the appropriate one, then least squares is the method which will give the 'best line' (i.e. the most precise estimates of a and b). Imagine being faced with a plot of points. There is frequently no natural 'middle', no person would have much confidence in someone else's straight line, and who, in any case, could quantify the precision of their straight line?

Unfortunately, least squares estimation will not necessarily give 'the right line'; it is, at best, an intelligent guess. It relies on certain assumptions. Consequently, if these are not valid, then a critical assessment by eye, which discounts some points and gives greater weight to others, may give a straight line which better suits the short-term objectives that the experimenter has in mind. However, in the long term, he will probably benefit from investigating the reasons why the least squares assumptions are invalid.

1.4.2 Robust and distribution free methods

Distribution free methods of estimation and testing occupy an intermediate position between the pencil and ruler method and the method of least squares estimation. They do not require as many assumptions as the least squares method but, nevertheless, they do allow inference, as well as estimation, to be made on the slope parameter b. The assumptions usually required are that the relationship between y and x is of the form described in equation (3) and that the es are mutually independent and follow the same distribution. A simple distribution free method of estimation is to take a pair of points, say (yᵢ, xᵢ) and (yⱼ, xⱼ), and to calculate the slope of the line joining these two points, i.e. calculate

$$b_{ij} = \frac{y_j - y_i}{x_j - x_i}.$$

This is repeated for all n(n − 1)/2 pairs of points to give n(n − 1)/2 separate slopes, b₁₂, b₁₃, ..., b_{n−1,n}. Then, these slopes are arranged in increasing order of magnitude to give an ordered sequence denoted by b₍₁₎ ≤ b₍₂₎ ≤ ... ≤ b₍N₎, where N = n(n − 1)/2. The median of this set of numbers, $b_{((N+1)/2)}$ if N is odd and $\tfrac12[b_{(N/2)} + b_{(N/2+1)}]$ if N is even, is then taken as the estimate of the slope parameter, b. To obtain an approximate 100(1 − α)% confidence interval for b, the following quantities are calculated:

$$r_1 = \text{nearest integer to } \frac12\left\{N - Z(\alpha/2)\left[\frac{n(n-1)(2n+5)}{18}\right]^{1/2}\right\}$$

and

$$r_2 = \text{nearest integer to } \frac12\left\{N + Z(\alpha/2)\left[\frac{n(n-1)(2n+5)}{18}\right]^{1/2}\right\}$$

where Z(α/2) is defined in subsection 1.3.3. The lower and upper limits of the 100(1 − α)% confidence interval for b are taken to be $b_{(r_1)}$ and $b_{(r_2)}$, respectively. This approximate procedure is only valid for a relatively large n. For an exact procedure, the reader is referred to Hollander and Wolfe (1973), p. 207.

An alternative and ingenious method was developed by Daniels (1954). It is based on the fact that y = a + bx may be written in the form a = y − xb, and that this equation may be regarded as a straight line relating a and b, with slope −x and intercept y. Thus, the set of readings (y₁, x₁), ..., (yₙ, xₙ) may be represented as n lines a = y₁ − x₁b, a = y₂ − x₂b, ..., a = yₙ − xₙb, which, in pictorial form, might look like Figure 7.

Ideally, we would expect all the lines to intersect at one point, which would give us our estimates of a and b. Of course, this would only occur if all the original points (y₁, x₁), ..., (yₙ, xₙ) happened to fall exactly on a straight line. We will usually have to choose some region in the 'middle' of the mass of intersecting lines as containing our estimates of a and b.

As is illustrated in Figure 7, the picture will usually consist of a set of closed regions (near the middle) and a set of open regions (around the edge). A convenient score for any particular region is denoted by m and defined to be the minimum number of lines which have to be crossed to escape from that region into the nearest open region.

Fig. 7. Data points represented by a series of straight lines.
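Returning to the median-of-pairwise-slopes estimator described before Daniels' method, it can be sketched with the standard library. This sketch takes the interval limits as the order statistics of ranks r₁ and r₂ (1-based), with Z(α/2) from `statistics.NormalDist`. It assumes all x values are distinct, because the count N = n(n − 1)/2 presumes that every pair has a defined slope. The ranks are clamped to the range 1..N, which only matters for very small n, where the approximation is unreliable anyway. The name `median_slope` is hypothetical.

```python
import math
from statistics import NormalDist, median

def median_slope(x, y, alpha=0.05):
    """Median of pairwise slopes, with the approximate interval (b_(r1), b_(r2))."""
    n = len(x)
    if len(set(x)) != n:
        raise ValueError("this sketch assumes all x values are distinct")
    slopes = sorted((y[j] - y[i]) / (x[j] - x[i])
                    for i in range(n) for j in range(i + 1, n))
    N = len(slopes)                                    # n(n - 1)/2 slopes
    b = median(slopes)
    z = NormalDist().inv_cdf(1 - alpha / 2)            # Z(alpha/2)
    c = z * math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    r1 = math.floor((N - c) / 2 + 0.5)                 # nearest integer
    r2 = math.floor((N + c) / 2 + 0.5)
    r1, r2 = max(r1, 1), min(r2, N)                    # keep ranks inside 1..N
    return b, (slopes[r1 - 1], slopes[r2 - 1])         # ranks are 1-based

# One wild point barely moves the median slope.
print(median_slope([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 100]))
```

The wild last point would pull a least squares slope far above 2, but it changes only 5 of the 15 pairwise slopes, so the median stays at 2.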

Thus, if one region has the largest m, then it would seem reasonable to take a and b as being in that region. However, this does not give us unique values of a and b, and it is not necessarily true that just one region will have the largest m. A confidence region is therefore the natural outcome of this method of estimation. The 100(1 − α)% confidence region for a and b is made up of all those regions for which m > m₀, where the value of m₀ is calculated as follows:

for α = 0.05,

$$m_0 = \text{nearest integer to } \tfrac12\left(n - 3.023\sqrt{n}\right)$$

for α = 0.01,

$$m_0 = \text{nearest integer to } \tfrac12\left(n - 3.562\sqrt{n}\right)$$

These values of m₀ are approximations for large n. However, the former is not misleading for n > 12 and the latter for n > 16 and, in both cases, when n is below these limits, the exact value of m₀ is zero. Alternatively, the exact value of m₀ may be calculated by solving, for m₀, an equation in the terms $e^{-(2r+z)^2/2}$ (r = 0, 1, 2, ...), where $z = (n - 2m_0)/\sqrt{n}$.

We may use this information to test hypotheses about a and b. For instance, in order to test the hypothesis a = 0, b = 1, we check whether the region in which this point falls has been included in the 100(1 − α)% confidence region. If it has, then we accept the hypothesis that a = 0, b = 1. If it has not, then we reject the hypothesis. This will give a 100α% significance test. A similar use may be made of the confidence interval which was calculated by the previous method. For a 100α% significance test, we should accept the hypothesis on b whenever the hypothesized value of b is included in the 100(1 − α)% confidence interval.

1.4.3 Bayesian methods

In this section on Bayesian methods, it will be more convenient to take the model relating y and x in the form

$$y_i = \alpha + \beta(x_i - \bar x) + e_i \qquad (25)$$

where $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$. By comparing this model with model (3), it will be seen that a = α − βx̄ and b = β. Using the same notation as before, the least squares estimates of α and β are

$$\hat\alpha = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y \qquad (26)$$

and

$$\hat\beta = \frac{S_{xy}}{S_{xx}} \qquad (27)$$

and the variances of these estimates are Var(α̂) = σ²/n and Var(β̂) = σ²/S_xx. If the assumptions described in subsection 1.1.2 are valid, then the estimates α̂ and β̂ are independent.

Bayesian methods allow the use of information about α and β which is additional to that provided by the data. Ideally, the information about α should take the form of a distribution (called the prior distribution) which would give the possible values of α and how likely they are to occur. In other words, the prior distribution for α would be a summary of the state of knowledge about α before the data in question were available. A similar distribution should be available for β.

For example, we might assume that the prior distribution for α is Normal with mean α_p and variance σ_α². That is, our past experience suggests a tendency for α to take values centred about α_p, with the variability about that point having the characteristics of the Normal distribution. We might also assume that the prior distribution for β is Normal, but with mean β_p and variance σ_β². An objective of a Bayesian analysis is to update the prior distributions by including the information on α and β contained in the data. The resulting distribution, the 'updated prior', is called the posterior distribution, and it summarises all that is known about α and β, including the information contained in the data.

In our example, where Normal prior distributions are assumed for α and β, the posterior distributions are as follows:

for α,

$$N\left(\frac{n\hat\alpha/\sigma^2 + \alpha_p/\sigma_\alpha^2}{n/\sigma^2 + 1/\sigma_\alpha^2},\ \frac{1}{n/\sigma^2 + 1/\sigma_\alpha^2}\right)$$

for β,

$$N\left(\frac{S_{xx}\hat\beta/\sigma^2 + \beta_p/\sigma_\beta^2}{S_{xx}/\sigma^2 + 1/\sigma_\beta^2},\ \frac{1}{S_{xx}/\sigma^2 + 1/\sigma_\beta^2}\right)$$

However, if we want to report only a single value for α, then it would be natural to use the mean of the posterior distribution of α,

$$\alpha^* = \frac{n\hat\alpha/\sigma^2 + \alpha_p/\sigma_\alpha^2}{n/\sigma^2 + 1/\sigma_\alpha^2}.$$

This is called the Bayes estimator of α, and it is clearly just the weighted mean of the least squares estimate of α and the prior mean of α. Similarly, the Bayes estimator of β is

$$\beta^* = \frac{S_{xx}\hat\beta/\sigma^2 + \beta_p/\sigma_\beta^2}{S_{xx}/\sigma^2 + 1/\sigma_\beta^2}.$$
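The Normal-prior update can be sketched in a few lines. The sketch assumes the error variance σ² is known; in practice an estimate would be substituted. The prior means and variances are passed as `alpha_p`, `var_alpha`, `beta_p` and `var_beta`, names chosen here for illustration, and the function name `bayes_estimates` is hypothetical. Precisions (reciprocal variances) add, and each posterior mean is a precision-weighted average of the least squares estimate and the prior mean.

```python
def bayes_estimates(x, y, sigma2, alpha_p, var_alpha, beta_p, var_beta):
    """Posterior (mean, variance) for alpha and beta in model (25),
    assuming Normal priors and a known error variance sigma2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    alpha_hat, beta_hat = y_bar, s_xy / s_xx              # (26) and (27)
    prec_alpha = n / sigma2 + 1 / var_alpha               # data precision + prior precision
    prec_beta = s_xx / sigma2 + 1 / var_beta
    alpha_star = (n * alpha_hat / sigma2 + alpha_p / var_alpha) / prec_alpha
    beta_star = (s_xx * beta_hat / sigma2 + beta_p / var_beta) / prec_beta
    return (alpha_star, 1 / prec_alpha), (beta_star, 1 / prec_beta)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(bayes_estimates(x, y, sigma2=0.04, alpha_p=6.5, var_alpha=0.01, beta_p=2.2, var_beta=0.01))
```

A very diffuse prior (large `var_alpha`, `var_beta`) returns essentially the least squares estimates, matching the uniform-prior case described next. A very tight prior returns essentially the prior means.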

Complete prior ignorance about a parameter is usually expressed by using a uniform prior distribution for that parameter. If uniform prior distributions are assumed for α and β, then it follows that the posterior distribution for α is N(α̂, σ²/n) and the posterior distribution for β is N(β̂, σ²/S_xx). Thus, if nothing is known about α and β prior to collecting the data, then the Bayes estimators of α and β will correspond with the least squares estimates of α and β. The Bayesian method has the potential to incorporate into the estimation of α and β all shades of opinion and knowledge which can be summarised in the form of a prior distribution. However, it is more likely that our prior knowledge will consist of several independent estimates of α and β which we have previously derived from sets of data similar to our present set. Thus, although we might be able to guess at the form of the distribution of these estimates, we will probably be quite unable to describe it precisely and say that, for instance, it is Normal with a particular mean and a particular variance. Empirical Bayes methods have been derived specifically to cope with this problem. Suppose that, on k − 1 previous occasions in comparable circumstances, data sets similar to the present one have been collected and, from each data set, estimates of α and β have been derived. Denote these estimates by α̂₁, ..., α̂_{k−1} and β̂₁, ..., β̂_{k−1}. From the present data, we may calculate least squares estimates of α, β and σ² as given by (26), (27) and (10). Denote these estimates by α̂_k, β̂_k and σ̂², respectively. Define h₂ to be the larger of

    n \ 1/ 5 1 (65 ) a ndkJ=1a ndd ef n e ig obe t he lar gerof

where

Now, let

    ( / ) 5 /71 j E l ( f f - T2 a ndka nd f =-k = 1

    F Cj= li an d 131=Cj = k2 k j 131=A 21113


Then, an empirical Bayes estimate of α is

    6 2 E ((sin AJ)IAJ)2 + ((sin B Y B ) 2j =a E ((sinM A) 2./ =1

and an empirical Bayes estimate of β is

    E ((sin C) / C) 2+((s n DJ)/ D) 2i =6 2f k

    ((sinCJ)/ CJ) 2J =

For further details of this method, the reader is referred to the original paper by Clemmer and Krutchkoff (1968).

    1.4 .4 Line a r func t ion a l re la t io n sh ip sItha s bee n em ph a sisedtha t the lin ea r reg res sio nmod el ( 1) es sen t ia llyassu medt ha t er ro r , ra nd o mva r ia t io n ,etc .on ly af ect edthe depe n d en t va r ia b le , y . Amo re ge ner a l,an d pe r ha psmo re r ea list ic, m o de l mig ht a llowbo th y a nd x tobera nd o mva r ia b les .

    T he fu nc t io na l re la t io n sh ipmo d elassu m es t ha t a lin ea r re la t io ns hi p wo u ldex ist b etw e en y a ndx , if y a ndx co uldha ve been rec or d ed in ide a lisedci rcu m sta nce swh er e noe r ro r wa s mad e .

Hence, the functional relationship model assumes

$$\text{ideal } y = a + b\,(\text{ideal } x).$$

However, the actual readings that we are able to take of $y$ and $x$ are related to the idealised ones by

$$y \text{ reading} = \text{ideal } y + e$$

and

$$x \text{ reading} = \text{ideal } x + \delta,$$

where $e$ and $\delta$ represent the errors. Thus, for our $n$ pairs of readings, $(y_1, x_1), \ldots, (y_n, x_n)$, there will be an associated set of (unknown) errors, $(e_1, \delta_1), \ldots, (e_n, \delta_n)$, and our model will be

$$(y_i - e_i) = a + b\,(x_i - \delta_i) \qquad (28)$$

We will also assume that both the $e$ and the $\delta$ errors are normally distributed with variances $\sigma_e^2$ and $\sigma_\delta^2$, respectively. Consequently, we are assuming that all $y$ observations are made with equal precision, and likewise for the $x$ observations.

Therefore, if we are studying a situation in which both $y$ and $x$ are subject to error and model (1) is inappropriate, then we might be obliged to use this functional relationship model. At first sight, it might seem that it will always be better to use this model, particularly as model (1) is just a special case of model (28) with $\delta_i = 0$. However, in order to estimate $a$ and $b$ in model (28), more information is required than that provided by the $n$ pairs of readings alone. The bare minimum of information required is knowledge of either (a) $\sigma_e$, (b) $\sigma_\delta$ or (c) the ratio $\lambda = \sigma_e/\sigma_\delta$. In each case, $\hat a = \bar y - \hat b\,\bar x$. However, in case (a),

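$$\hat b = \frac{S_{yy} - n\sigma_e^2}{S_{xy}} \qquad (29)$$

in case (b),

$$\hat b = \frac{S_{xy}}{S_{xx} - n\sigma_\delta^2} \qquad (30)$$

and, in case (c),

$$\hat b = \frac{(S_{yy} - \lambda^2 S_{xx}) + \sqrt{(S_{yy} - \lambda^2 S_{xx})^2 + 4\lambda^2 S_{xy}^2}}{2S_{xy}}.$$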

If the numerator of equation (29) is negative, then take $\hat b = 0$. If the denominator of equation (30) is negative, then take $\hat b = \infty$. In case (c), a $100(1-\alpha)\%$ confidence interval for $b$ is given by

$$\lambda \tan\left[\tan^{-1}\left(\frac{\hat b}{\lambda}\right) \pm \frac{1}{2}\sin^{-1}\left\{2\,t(n-2, 1-\alpha/2)\,X\right\}\right] \qquad (31)$$

where

$$X^2 = \frac{\lambda^2\left(S_{xx}S_{yy} - S_{xy}^2\right)}{(n-2)\left[(S_{yy} - \lambda^2 S_{xx})^2 + 4\lambda^2 S_{xy}^2\right]}.$$

In order to test the hypothesis $b = b_0$ (typically, $b_0$ might be 1 or 0), it is probably easiest to compute the above confidence interval and then to check whether $b_0$ is included in this interval. If it is, then we accept the hypothesis $b = b_0$; otherwise, we reject the hypothesis. This will provide a $100\alpha\%$ significance test. The estimates of $b$ given above are the maximum likelihood estimates appropriate for the three different situations. An alternative quick method of estimation is as follows:

1. Plot $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$.
2. Divide the $x$ axis into three parts so that approximately 1/3 of the observed $x$ values fall in each part. (Ensure that the first and last groups contain an equal number, which is as close to $n/3$ as possible.)
3. Compute the arithmetic means of the $x$ and $y$ values in the first group (denoted by $\bar x_1$ and $\bar y_1$, respectively) and the third group (denoted by $\bar x_3$ and $\bar y_3$, respectively).
4. Estimate $b$ by $\hat b = (\bar y_3 - \bar y_1)/(\bar x_3 - \bar x_1)$ and estimate $a$ by $\hat a = \bar y - \hat b\,\bar x$.
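To make the arithmetic concrete, the three maximum likelihood slope estimates (29), (30) and case (c), and the quick estimate of steps 2 to 4, might be coded roughly as below. This is a minimal sketch, not the author's procedure: the function names are hypothetical, `var_e` and `var_d` stand for the assumed-known $\sigma_e^2$ and $\sigma_\delta^2$, `lam` stands for $\lambda = \sigma_e/\sigma_\delta$, and the quick method forms its groups by ordering on the observed $x$.

```python
# Sketch only: function names are hypothetical, not taken from the book.
import math


def centred_sums(x, y):
    """Return (S_xx, S_yy, S_xy): sums of squares and products about the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def slope_var_e_known(x, y, var_e):
    """Case (a), equation (29): sigma_e^2 known; b = 0 if the numerator is negative."""
    sxx, syy, sxy = centred_sums(x, y)
    numerator = syy - len(x) * var_e
    return 0.0 if numerator < 0 else numerator / sxy


def slope_var_d_known(x, y, var_d):
    """Case (b), equation (30): sigma_delta^2 known; b = infinity if the denominator is not positive."""
    sxx, syy, sxy = centred_sums(x, y)
    denominator = sxx - len(x) * var_d
    return math.inf if denominator <= 0 else sxy / denominator


def slope_ratio_known(x, y, lam):
    """Case (c): lam = sigma_e / sigma_delta known."""
    sxx, syy, sxy = centred_sums(x, y)
    d = syy - lam ** 2 * sxx
    return (d + math.sqrt(d ** 2 + 4 * lam ** 2 * sxy ** 2)) / (2 * sxy)


def three_group_estimates(x, y):
    """Quick method, steps 2 to 4: slope through the means of the outer thirds."""
    pairs = sorted(zip(x, y))          # order the readings by x
    n = len(pairs)
    k = round(n / 3)                   # equal outer groups, as close to n/3 as possible
    low, high = pairs[:k], pairs[n - k:]
    x1 = sum(p[0] for p in low) / k
    y1 = sum(p[1] for p in low) / k
    x3 = sum(p[0] for p in high) / k
    y3 = sum(p[1] for p in high) / k
    b = (y3 - y1) / (x3 - x1)
    a = sum(y) / n - b * sum(x) / n    # a = ybar - b * xbar
    return a, b
```

With error-free data lying exactly on a line, all four slope estimates coincide with the true slope, which is a convenient sanity check.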


For this quick method, a $100(1-\alpha)\%$ confidence interval for $b$ may also be formed, although it requires considerably more calculation. The following table illustrates the data after being divided into three groups.

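Data

$$\begin{array}{lcccccc}
 & \text{Group 1} & & \text{Group 2} & & \text{Group 3} & \\
 & x \text{ values} & y \text{ values} & x \text{ values} & y \text{ values} & x \text{ values} & y \text{ values} \\
 & x_{11} & y_{11} & x_{21} & y_{21} & x_{31} & y_{31} \\
 & x_{12} & y_{12} & x_{22} & y_{22} & x_{32} & y_{32} \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
 & x_{1k} & y_{1k} & x_{2m} & y_{2m} & x_{3k} & y_{3k} \\
\text{Arithmetic mean} & \bar x_1 & \bar y_1 & \bar x_2 & \bar y_2 & \bar x_3 & \bar y_3
\end{array}$$

$k$ should be as near to $n/3$ as possible and $m = n - 2k$.

Let us define the following terms:

$$\tilde S_{xx} = \sum_{i=1}^{k}(x_{1i} - \bar x_1)^2 + \sum_{i=1}^{m}(x_{2i} - \bar x_2)^2 + \sum_{i=1}^{k}(x_{3i} - \bar x_3)^2$$

$$\tilde S_{xy} = \sum_{i=1}^{k}(x_{1i} - \bar x_1)(y_{1i} - \bar y_1) + \sum_{i=1}^{m}(x_{2i} - \bar x_2)(y_{2i} - \bar y_2) + \sum_{i=1}^{k}(x_{3i} - \bar x_3)(y_{3i} - \bar y_3)$$

$$\tilde S_{yy} = \sum_{i=1}^{k}(y_{1i} - \bar y_1)^2 + \sum_{i=1}^{m}(y_{2i} - \bar y_2)^2 + \sum_{i=1}^{k}(y_{3i} - \bar y_3)^2$$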

The lower and upper limits of the confidence interval for $b$ are given as the two roots of the quadratic equation in $b$,

$$(\bar x_3 - \bar x_1)^2\,(b - \hat b)^2 = \frac{2\,[t(n-3, 1-\alpha/2)]^2}{k\,(n-3)}\left(\tilde S_{yy} - 2b\,\tilde S_{xy} + b^2\,\tilde S_{xx}\right)$$
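As an illustration only, the quadratic above can be expanded into the form $Ab^2 + Bb + C = 0$ and solved directly. This sketch assumes the quantile $t(n-3, 1-\alpha/2)$ is supplied by the caller (for example from printed tables) and that groups are formed by ordering on the observed $x$; the function name `three_group_interval` is hypothetical.

```python
# Sketch only: the function name is hypothetical, not taken from the book.
import math


def three_group_interval(x, y, t_value):
    """Return (b_hat, (lower, upper)) for the quick three-group method."""
    pairs = sorted(zip(x, y))                 # order readings by x
    n = len(pairs)
    if n < 4:
        raise ValueError("need at least 4 readings (n - 3 degrees of freedom)")
    k = round(n / 3)                          # groups 1 and 3 have k readings; group 2 has m = n - 2k
    groups = [pairs[:k], pairs[k:n - k], pairs[n - k:]]
    means = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) for g in groups]
    (x1, y1), _, (x3, y3) = means
    b_hat = (y3 - y1) / (x3 - x1)

    # Pooled within-group sums of squares and products (S~xx, S~yy, S~xy).
    sxx = syy = sxy = 0.0
    for g, (mx, my) in zip(groups, means):
        for xi, yi in g:
            sxx += (xi - mx) ** 2
            syy += (yi - my) ** 2
            sxy += (xi - mx) * (yi - my)

    # Limits solve c0 * (b - b_hat)^2 = t^2 * (syy - 2 b sxy + b^2 sxx),
    # with c0 = k (n - 3) (x3 - x1)^2 / 2, rearranged as A b^2 + B b + C = 0.
    c0 = k * (n - 3) * (x3 - x1) ** 2 / 2.0
    t2 = t_value ** 2
    A = c0 - t2 * sxx
    B = -2.0 * c0 * b_hat + 2.0 * t2 * sxy
    C = c0 * b_hat ** 2 - t2 * syy
    if A <= 0:
        raise ValueError("the interval is unbounded for this t value")
    root = math.sqrt(max(B * B - 4 * A * C, 0.0))
    return b_hat, ((-B - root) / (2 * A), (-B + root) / (2 * A))
```

When $A > 0$ the quadratic opens upwards and is non-positive at $b = \hat b$, so $\hat b$ always lies between the two roots.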

References

Clemmer, B.A. and Krutchkoff, R.G. (1968). Biometrika, 55(3), 525.

Daniels, H.E. (1954). Ann. Math. Stat., 25(3), 499.

Hollander and Wolfe (1973). Nonparametric Statistical Methods. John Wiley & Sons.

Lewis, W.K. (1957). Investigation of Rainfall, Run-Off and Yield on the Alwen and Brenig Catchments.

Pearson, E.S. and Hartley, H.O. (1972). Biometrika Tables for Statisticians, Vol. I. Cambridge University Press.

Seber, G.A.F. (1977). Linear Regression Analysis. John Wiley & Sons.

Williams, E.J. (1959). Regression Analysis. John Wiley & Sons.


Chapter 2

MULTIPLE LINEAR REGRESSION

2.1 Introduction

2.1.1 Problems for multiple linear regression analysis

An investigator may, for a variety of reasons, be interested in studying the relationship between rainfall and run-off in a particular area. Given rainfall and run-off records, he would probably find linear regression methods helpful in achieving his objectives. However, it would be foolish to suppose that, given information on rainfall only, he could hope to predict the resultant run-off accurately. Many other factors, some quantifiable, will influence the run-off in a particular area. For instance, rainfall intensity and evaporation may both influence the resulting run-off. Thus, a realistic data base would not consist of run-off and rainfall readings only; it would consist of readings on run-off (called the dependent variable) and readings on as many features which are liable to influence run-off (called the independent variables) as it is sensible to gather. It is to this type of data base that the technique of multiple linear regression analysis may be applied with profit. Using multiple linear regression, it may be possible to achieve objectives similar to those outlined in the sequence (a) to (g) given in subsection 1.1.1 where, instead of only rainfall, we have a whole collection of independent variables. Once we progress from studying how one or two variables influence a third, graphical techniques and visual assessment become more difficult and we have to rely much more on mathematical models. However, this does not mean that the outcome of a multiple regression analysis cannot be questioned or assessed.
Applied common sense is even more vital in checking for numerical blunders, invalid assumptions, etc. when interpreting the outcome of a multiple regression analysis or considering unexpected features of the data.

2.1.2 Assumptions made in multiple linear regression

Multiple linear regression applies to problems in which records have been kept of one variable, $y$, the dependent variable, and several other variables, $x_1, x_2, \ldots, x_k$, the independent variables, and in which the objective requires the relationship between the variable $y$ and the variables $x_1, x_2, \ldots, x_k$ to be investigated. For any such record, the specific mathematical relationship (model) assumed is

$$y = a + b_1x_1 + b_2x_2 + \cdots + b_kx_k + e \qquad (32)$$

where $a, b_1, b_2, \ldots, b_k$ are constants and $e$ is a random variable. Thus, it is assumed that $y$ is linearly related to each of the independent variables and that each independent variable has an additive effect on $y$. Therefore, at this stage, we are assuming that $x_1, x_2, \ldots, x_k$ do not interact amongst themselves in their effect on $y$. The variable $e$ serves the same purpose as in the simple linear model described in subsection 1.1.2, and identical assumptions are made about $e$ in multiple linear regression. Thus, under repeated identical conditions (that is, when the values of $x_1, x_2, \ldots, x_k$ are kept constant), we expect the arithmetic mean of the values of $e$ to be zero and we expect the variance of the values of $e$ to be the same, whatever the constant values of $x_1, x_2, \ldots, x_k$.

To carry out tests of significance or to establish confidence intervals, we will need to assume that these values of $e$ form a normal distribution and that all values of $e$ are independent.

2.1.3 Interpretation of the assumptions

The problem of deciding which is $y$ and which is $x$ is better defined in the multiple regression situation. Usually, we will want to assess the combined effect of several variables on a single variable. This may be to predict $y$ when we know $x_1, x_2, \ldots, x_k$, or to decide which of $x_1, x_2, \ldots, x_k$ do, in fact, influence $y$, or we may simply want to summarise the data.

It is probably only in the latter case that there might be some doubt as to the identity of $y$. The type of relationship being estimated again assumes that $x_1, x_2, \ldots, x_k$ are known or error free, filling just the same role as $x$ in subsection 1.1.3. Indeed, if it proves impossible to decide which is $y$ amongst the variables measured, then this may indicate that multiple regression analysis is inappropriate and that some other type of correlation analysis, or principal components analysis, would be more suitable for the problem.

The assumptions about the variable $e$ cannot be seen easily in terms of a graph, mainly because the model (32) is a hyperplane in $(k+1)$-dimensional space. However, we can use the interpretation of the simple linear model given in subsection 1.1.3 to explain this more complex situation. If we interpret $a + b_1x_1 + b_2x_2 + \cdots + b_kx_k$ as being the value of $y$ that we expect to observe, given the conditions or situation defined by the values of $x_1, x_2, \ldots, x_k$, and if we interpret $a + b_1x_1 + b_2x_2 + \cdots + b_kx_k + e$ as being the value of $y$ that we actually observe, then the value of $e$, the difference between what we actually observe and what we expect to observe, again represents the error or inexplicable variation in $y$. Thus, if we knew the values of $a, b_1, b_2, \ldots, b_k$ and, hence, could plot a graph of observed $y$ against $y_{\text{ideal}} = a + b_1x_1 + \cdots + b_kx_k$ for each record, then we would have the situation illustrated in Figure 8.

The vertical displacement of each point from the 45° line represents the value of $e$.

[Fig. 8. Plot of observed and ideal values of $y$: $y$ (observed) against $y_{\text{ideal}}$, with the 45° line.]

If we were able to observe repeatedly values of $y$ all with the same $y_{\text{ideal}}$, then we would obtain a vertical array of points and there should be an equal spread on either side of the line. Furthermore, if we were to repeat this procedure at a different value of $y_{\text{ideal}}$, then we should obtain a similar spread of points (they should be neither more nor less widely scattered). Also, these points should form a normal distribution centred on the line. The assumption of independence has the same interpretation as in subsection 1.1.3.

2.1.4 What can be achieved by using multiple linear regression?

The quick answer is 'everything that was achieved using simple linear regression and a bit more'. Estimates of $a, b_1, b_2, \ldots, b_k$ may be derived, together with standard errors and confidence intervals. However, in multiple linear regression, there is far more scope for tests of significance and far more need for them. Typically, for a variable $x_i$, we will be able to decide the following:

(1) Whether $x_i$ has an influence on $y$.

(2) Whether, after allowing for the influence that other specified variables have on $y$, the variable $x_i$ still gives some further explanation of the way in which $y$ varies.

As an example of this, let us suppose that

$y$ = run-off

$x_1$ = rainfall

$x_2$ = duration of rainfall


Furthermore, suppose that, for the area being studied, when it rains, it rains at a constant rate. Then, we would have the relationship $x_1 = kx_2$, where $k$ is a constant. We would discover from our tests of significance that rainfall and duration of rainfall both influence run-off. However, when we know what the rainfall has been, the duration of the rainfall will tell us nothing further about run-off, i.e. if $x_1$ is known, then $x_2$ is redundant. Realistic practical problems are rarely this clear-cut, but we do have the potential to make this type of investigation in multiple regression analysis.

Having summarised our data in terms of estimates of $a, b_1, b_2, \ldots, b_k$, we may compare these estimates with similar estimates from other sets of data so as to assess the similarity of the sets of data in terms of their relationship between $y$ and $x_1, x_2, \ldots, x_k$.

By substituting our estimates of $a, b_1, b_2, \ldots, b_k$ into model (32) (and disregarding $e$), we may predict $y$ for specified values of $x_1, x_2, \ldots, x_k$. Having predicted $y$ values at observed values of $x_1, x_2, \ldots, x_k$, we may form the 'residuals' (the differences between the predicted $y$ values and the observed $y$ values) just as for the simple linear model and for similar reasons.

2.2 The Basic Method

2.2.1 Fitting the model

The basic unit of data for this model will no longer be a pair of values of $y$ and $x$, as in subsection 1.2.1, but $k + 1$ numbers corresponding to values of $y, x_1, x_2, \ldots, x_k$. Hence, the whole data set will consist of $n$ such basic units and will be denoted by $(y_1, x_{11}, x_{21}, \ldots, x_{k1}), (y_2, x_{12}, x_{22}, \ldots, x_{k2}), \ldots, (y_n, x_{1n}, x_{2n}, \ldots, x_{kn})$.

The model (32) would imply the relationship

$$y_i = a + b_1x_{1i} + b_2x_{2i} + \cdots + b_kx_{ki} + e_i \quad (\text{for } i = 1, 2, \ldots, n) \qquad (33)$$

for this set of data. However, just as it proved useful in simple linear regression to rewrite the model into the form of model (25), there are some advantages in rewriting the model (33) into the form

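$$y_i = \alpha + \beta_1(x_{1i} - \bar x_1) + \beta_2(x_{2i} - \bar x_2) + \cdots + \beta_k(x_{ki} - \bar x_k) + e_i \qquad (34)$$

where

$$\bar x_1 = \frac{1}{n}\sum_{i=1}^{n} x_{1i}, \quad \bar x_2 = \frac{1}{n}\sum_{i=1}^{n} x_{2i}, \quad \text{etc.}$$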

This is the form of model usually encountered in texts on multiple regression. By comparing model (34) with model (33), we see that $b_j = \beta_j$ (for $j = 1, 2, \ldots, k$) and $a = \alpha - \beta_1\bar x_1 - \beta_2\bar x_2 - \cdots - \beta_k\bar x_k$. Figure 9 illustrates how the model and data might look if plotted with $k = 2$ and $n = 3$. The shaded area represents the plane $y = a + b_1x_1 + b_2x_2$, drawn for $y, x_1, x_2 > 0$, and the large dots indicate the positions of the points $(y_1, x_{11}, x_{21})$, $(y_2, x_{12}, x_{22})$, $(y_3, x_{13}, x_{23})$. Hence, the lengths $e_1, e_2, e_3$ represent the vertical distances from each of these points to the plane.

Our problem is that, although we know the positions of the points, we do not know the position of the plane; in other words, we do not know $a$, $b_1$ and $b_2$. The method of least squares would lead us to choose those values of $a$, $b_1$ and $b_2$ which minimise

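$$S^2 = \sum_{i=1}^{3} e_i^2 = \sum_{i=1}^{3}(y_i - a - b_1x_{1i} - b_2x_{2i})^2.$$

Once again, we are attempting to make the vertical discrepancy of the points from the plane (regardless of sign) as small as possible. In the general case of $n$ observations and $k$ variables, we will want to choose $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ (or $a, b_1, b_2, \ldots, b_k$) to minimise

$$S^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - a - b_1x_{1i} - b_2x_{2i} - \cdots - b_kx_{ki})^2 = \sum_{i=1}^{n}\left\{y_i - \alpha - \beta_1(x_{1i} - \bar x_1) - \beta_2(x_{2i} - \bar x_2) - \cdots - \beta_k(x_{ki} - \bar x_k)\right\}^2.$$

Solving $\partial S^2/\partial\alpha = 0$, $\partial S^2/\partial\beta_1 = 0$, $\ldots$, $\partial S^2/\partial\beta_k = 0$ for the values of $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ which minimise $S^2$ (denoted by $\hat\alpha, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$) gives the following equations:

$$n\hat\alpha = \sum_{i=1}^{n} y_i$$

$$\hat\beta_1\sum_{i=1}^{n}(x_{1i} - \bar x_1)^2 + \hat\beta_2\sum_{i=1}^{n}(x_{1i} - \bar x_1)(x_{2i} - \bar x_2) + \cdots + \hat\beta_k\sum_{i=1}^{n}(x_{1i} - \bar x_1)(x_{ki} - \bar x_k) = \sum_{i=1}^{n}(x_{1i} - \bar x_1)(y_i - \bar y)$$

$$\hat\beta_1\sum_{i=1}^{n}(x_{2i} - \bar x_2)(x_{1i} - \bar x_1) + \hat\beta_2\sum_{i=1}^{n}(x_{2i} - \bar x_2)^2 + \cdots + \hat\beta_k\sum_{i=1}^{n}(x_{2i} - \bar x_2)(x_{ki} - \bar x_k) = \sum_{i=1}^{n}(x_{2i} - \bar x_2)(y_i - \bar y)$$

$$\vdots$$

$$\hat\beta_1\sum_{i=1}^{n}(x_{ki} - \bar x_k)(x_{1i} - \bar x_1) + \hat\beta_2\sum_{i=1}^{n}(x_{ki} - \bar x_k)(x_{2i} - \bar x_2) + \cdots + \hat\beta_k\sum_{i=1}^{n}(x_{ki} - \bar x_k)^2 = \sum_{i=1}^{n}(x_{ki} - \bar x_k)(y_i - \bar y)$$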

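Simplifications may be made to the presentation of this information by:

(1) A representation using matrices.

(2) Use of the notation

$$S_{x_jy} = \sum_{i=1}^{n}(x_{ji} - \bar x_j)(y_i - \bar y) \quad (\text{for } j = 1, 2, \ldots, k)$$

$$S_{x_jx_l} = \sum_{i=1}^{n}(x_{ji} - \bar x_j)(x_{li} - \bar x_l) \quad (\text{for } j = 1, 2, \ldots, k \text{ and } l = 1, 2, \ldots, k)$$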

[Fig. 9. The plane $y = a + b_1x_1 + b_2x_2$ with the vertical deviations $e_1, e_2, e_3$ of three data points.]

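In matrix form, the equations for $\hat\beta_1, \ldots, \hat\beta_k$ become

$$S_{xy} = S_{xx}\hat\beta \qquad (35)$$

where $\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k)'$, $S_{xy} = (S_{x_1y}, S_{x_2y}, \ldots, S_{x_ky})'$ and $S_{xx}$ is the $k \times k$ matrix whose $(j, l)$th element is $S_{x_jx_l}$. Thus, the estimates $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$ are given by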

$$\hat\beta = S_{xx}^{-1}S_{xy} \qquad (36)$$

which, together with $\hat\alpha = \bar y$, gives us estimates of all the parameters in model (34). ($S_{xx}^{-1}$ is the matrix inverse of $S_{xx}$.)
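A minimal plain-Python sketch of equations (35) and (36) follows. It assumes the data are held as one list per independent variable, uses a hand-written Gauss-Jordan elimination in place of a library solver, and also recovers the intercept $a = \alpha - \beta_1\bar x_1 - \cdots - \beta_k\bar x_k$ of model (33). The names `solve` and `fit_multiple_regression` are hypothetical; a real analysis would rely on a tested linear-algebra library.

```python
# Sketch only: helper names are hypothetical, not taken from the book.


def solve(matrix, rhs):
    """Solve matrix @ v = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("S_xx is singular: some x variables are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[r][size] / aug[r][r] for r in range(size)]


def fit_multiple_regression(xs, y):
    """xs: list of k lists (values of x_1, ..., x_k); y: list of n values.

    Returns (alpha_hat, beta_hat, a_hat) for models (34) and (33).
    """
    n, k = len(y), len(xs)
    xbar = [sum(col) / n for col in xs]
    ybar = sum(y) / n
    dev = [[v - m for v in col] for col, m in zip(xs, xbar)]   # x_ji - xbar_j
    dy = [v - ybar for v in y]                                  # y_i - ybar
    s_xx = [[sum(p * q for p, q in zip(dev[j], dev[l])) for l in range(k)] for j in range(k)]
    s_xy = [sum(p * q for p, q in zip(dev[j], dy)) for j in range(k)]
    beta_hat = solve(s_xx, s_xy)                                # equation (36)
    alpha_hat = ybar
    a_hat = alpha_hat - sum(b * m for b, m in zip(beta_hat, xbar))
    return alpha_hat, beta_hat, a_hat
```

Working with deviations from the means, rather than raw sums, is the form that the centred model (34) suggests and tends to be kinder to rounding error.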

2.2.2 Estimates and their precision

If we assume that the variance of $e_i$ is $\sigma^2$ (for $i = 1, 2, \ldots, n$), then it follows that

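$$\mathrm{Var}(\hat\alpha) = \frac{\sigma^2}{n}.$$

However, as the estimates of $\beta_1, \beta_2, \ldots, \beta_k$ are not mutually independent, there are $k^2$ variances and covariances associated with them (of which $k(k+1)/2$ are distinct, since $\mathrm{Cov}(\hat\beta_j, \hat\beta_l) = \mathrm{Cov}(\hat\beta_l, \hat\beta_j)$). These are conveniently displayed in a matrix, called the variance covariance matrix,

$$V_\beta = \begin{pmatrix}
\mathrm{Var}(\hat\beta_1) & \mathrm{Cov}(\hat\beta_1, \hat\beta_2) & \cdots & \mathrm{Cov}(\hat\beta_1, \hat\beta_k) \\
\mathrm{Cov}(\hat\beta_1, \hat\beta_2) & \mathrm{Var}(\hat\beta_2) & \cdots & \mathrm{Cov}(\hat\beta_2, \hat\beta_k) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\hat\beta_1, \hat\beta_k) & \mathrm{Cov}(\hat\beta_2, \hat\beta_k) & \cdots & \mathrm{Var}(\hat\beta_k)
\end{pmatrix}$$

and referred to as $V_\beta$. It may be shown that

$$V_\beta = \sigma^2 S_{xx}^{-1} \qquad (37)$$

Equation (37) requires knowledge of $\sigma^2$ which, as in simple linear regression, will be unknown. However, also as in simple linear regression, we may estimate $e_i$ by the $i$th residual,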

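$$\hat e_i = y_i - \hat\alpha - \hat\beta_1(x_{1i} - \bar x_1) - \hat\beta_2(x_{2i} - \bar x_2) - \cdots - \hat\beta_k(x_{ki} - \bar x_k) \qquad (38)$$

It may easily be shown that $\sum_{i=1}^{n}\hat e_i = 0$ and, consequently, the arithmetic mean of the residuals is always zero. Hence, we may again base our estimate of $\sigma^2$ on $R = \sum_{i=1}^{n}\hat e_i^2$, called the residual sum of squares. However, in this case, the appropriate divisor will be $n - k - 1$, as $k + 1$ degrees of freedom have been 'lost' in estimating $\alpha, \beta_1, \beta_2, \ldots, \beta_k$:

$$\hat\sigma^2 = \frac{1}{n-k-1}\sum_{i=1}^{n}\left\{y_i - \hat\alpha - \hat\beta_1(x_{1i} - \bar x_1) - \hat\beta_2(x_{2i} - \bar x_2) - \cdots - \hat\beta_k(x_{ki} - \bar x_k)\right\}^2$$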

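Hence,

$$\hat\sigma^2 = \frac{S_{yy} - \hat\beta_1 S_{x_1y} - \hat\beta_2 S_{x_2y} - \cdots - \hat\beta_k S_{x_ky}}{n-k-1} \qquad (39)$$

where $S_{yy} = \sum_{i=1}^{n}(y_i - \bar y)^2$. To calculate $S_{yy}$, $S_{x_jy}$ and $S_{x_jx_l}$, it may be easier to use the expressions

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2, \quad S_{x_jy} = \sum_{i=1}^{n} x_{ji}y_i - \frac{1}{n}\sum_{i=1}^{n} x_{ji}\sum_{i=1}^{n} y_i, \quad S_{x_jx_l} = \sum_{i=1}^{n} x_{ji}x_{li} - \frac{1}{n}\sum_{i=1}^{n} x_{ji}\sum_{i=1}^{n} x_{li} \qquad (40)$$

although remarks made in Chapter 1, and in particular in subsection 1.2.1, concerning numerical accuracy, are equally pertinent in this context.

Thus, we now have sufficient information to estimate the variances and covariances of $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$ as well as the variance of $\hat\alpha$. It is usually more informative to study the correlations than the covariances, and these may easily be derived by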

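$$\text{Correlation between } \hat\beta_j \text{ and } \hat\beta_l = \frac{\mathrm{Cov}(\hat\beta_j, \hat\beta_l)}{\sqrt{\mathrm{Var}(\hat\beta_j)\,\mathrm{Var}(\hat\beta_l)}} \qquad (41)$$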
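For readers who want to check these quantities numerically, the sketch below computes the residuals (38), $\hat\sigma^2$ via (39), the estimated variance covariance matrix (37) with $\hat\sigma^2$ substituted for $\sigma^2$, and the correlations (41). It is a self-contained illustration assuming $n > k + 1$ and non-collinear $x$ variables; the names `invert` and `regression_precision` are hypothetical.

```python
# Sketch only: helper names are hypothetical, not taken from the book.
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def regression_precision(xs, y):
    """Return (beta_hat, residuals, sigma2_hat, V_hat, corr) for model (34)."""
    n, k = len(y), len(xs)
    xbar = [sum(col) / n for col in xs]
    ybar = sum(y) / n
    dev = [[v - m for v in col] for col, m in zip(xs, xbar)]
    dy = [v - ybar for v in y]
    s_xx = [[sum(p * q for p, q in zip(dev[j], dev[l])) for l in range(k)] for j in range(k)]
    s_xy = [sum(p * q for p, q in zip(dev[j], dy)) for j in range(k)]
    s_yy = sum(d * d for d in dy)
    s_inv = invert(s_xx)
    beta_hat = [sum(s_inv[j][l] * s_xy[l] for l in range(k)) for j in range(k)]  # (36)
    # Residuals (38); alpha_hat = ybar, so e_i = (y_i - ybar) - sum_j beta_j (x_ji - xbar_j).
    residuals = [dy[i] - sum(beta_hat[j] * dev[j][i] for j in range(k)) for i in range(n)]
    # Equation (39): residual sum of squares S_yy - sum_j beta_j S_xjy over n - k - 1.
    sigma2_hat = (s_yy - sum(b * s for b, s in zip(beta_hat, s_xy))) / (n - k - 1)
    v_hat = [[sigma2_hat * s_inv[j][l] for l in range(k)] for j in range(k)]  # (37)
    corr = [[v_hat[j][l] / math.sqrt(v_hat[j][j] * v_hat[l][l]) for l in range(k)]
            for j in range(k)]                                              # (41)
    return beta_hat, residuals, sigma2_hat, v_hat, corr
```

The residuals returned should sum to zero (up to rounding), and their sum of squares divided by $n - k - 1$ should agree with `sigma2_hat`; both make quick checks on the arithmetic.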


By assuming that $e_i \sim N(0, \sigma^2)$ (for $i = 1, 2, \ldots, n$), we may derive confidence intervals for the parameters $\alpha, \beta_1, \ldots, \beta_k$. In multiple regression, the residual sum of squares follows $\sigma^2\chi^2_{n-k-1}$. Consequently,

$$\frac{\hat\alpha - \alpha}{\sqrt{\hat\sigma^2/n}} \sim t_{n-k-1}$$

and

$$\frac{\hat\beta_j - \beta_j}{\sqrt{\text{Estimate of Var}(\hat\beta_j)}} \sim t_{n-k-1}$$

where 'Estimate of Var$(\hat\beta_j)$' is obtained from the appropriate element of $V_\beta$ (given in equation (37)) after substituting $\hat\sigma^2$ (given in equation (39)) for $\sigma^2$. Hence, a $100(1-\alpha)\%$ confidence interval for $\alpha$ is

$$\hat\alpha \pm t(n-k-1, 1-\alpha/2)\sqrt{\hat\sigma^2/n} \qquad (42)$$

and individual $100(1-\alpha)\%$ confidence intervals for $\beta_1, \beta_2, \ldots, \beta_k$ are given by

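$$\hat\beta_j \pm t(n-k-1, 1-\alpha/2)\sqrt{\text{Estimate of Var}(\hat\beta_j)} \qquad (43)$$

Since $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$ will almost always be correlated, there is some danger in using these separate confidence intervals, particularly when the objective is to find some 'joint' confidence region, for example, for $\beta_1$ and $\beta_2$. An assessment of the correlation between $\hat\beta_1$ and $\hat\beta_2$ (see equation (41)) would be advisable and, if it proves high, then it may be as well to consider using the joint confidence region. The $100(1-\alpha)\%$ joint confidence region for $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ is defined by those values of $\alpha, \beta_1, \beta_2, \ldots, \beta_k$ which satisfy

$$(\hat\theta - \theta)'\,S\,(\hat\theta - \theta) \le (k+1)\,\hat\sigma^2\,F(k+1, n-k-1, 1-\alpha)$$

where

$$\theta = (\alpha, \beta_1, \ldots, \beta_k)', \qquad \hat\theta = (\hat\alpha, \hat\beta_1, \ldots, \hat\beta_k)', \qquad S = \begin{pmatrix} n & 0 \\ 0 & S_{xx} \end{pmatrix}$$

and $A'$ denotes the transpose of $A$.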
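The intervals (42) and (43), and a membership check for the joint region $n(\hat\alpha - \alpha)^2 + (\hat\beta - \beta)'S_{xx}(\hat\beta - \beta) \le (k+1)\hat\sigma^2F(k+1, n-k-1, 1-\alpha)$, can be sketched as follows. The tabulated quantiles $t(n-k-1, 1-\alpha/2)$ and $F(k+1, n-k-1, 1-\alpha)$ are assumed to be supplied by the caller (for example from printed tables), and the function names are hypothetical.

```python
# Sketch only: function names are hypothetical, not taken from the book.
import math


def individual_intervals(alpha_hat, beta_hat, s_inv_diag, sigma2_hat, n, t_value):
    """Equations (42) and (43): separate intervals for alpha and each beta_j.

    s_inv_diag holds the diagonal elements of S_xx^{-1}, so that
    sigma2_hat * s_inv_diag[j] is the estimate of Var(beta_hat_j).
    """
    half = t_value * math.sqrt(sigma2_hat / n)
    alpha_ci = (alpha_hat - half, alpha_hat + half)
    beta_cis = []
    for b, d in zip(beta_hat, s_inv_diag):
        half_j = t_value * math.sqrt(sigma2_hat * d)
        beta_cis.append((b - half_j, b + half_j))
    return alpha_ci, beta_cis


def in_joint_region(alpha0, beta0, alpha_hat, beta_hat, s_xx, n, sigma2_hat, f_value):
    """True if the point (alpha0, beta0) lies inside the joint confidence region.

    The quadratic form uses the block-diagonal matrix diag(n, S_xx), because
    alpha_hat is uncorrelated with beta_hat in the centred model (34).
    """
    k = len(beta_hat)
    da = alpha_hat - alpha0
    db = [bh - b0 for bh, b0 in zip(beta_hat, beta0)]
    quad = n * da * da + sum(db[j] * s_xx[j][l] * db[l] for j in range(k) for l in range(k))
    return quad <= (k + 1) * sigma2_hat * f_value
```

Testing a hypothesised point against the joint region, rather than against each separate interval, is what protects against the danger described above when the $\hat\beta_j$ are strongly correlated.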

2.2.3 Prediction

Having estimated the unknowns in model (34), we are in a position to predict a value of $y$ from knowledge of $x_1, x_2, \ldots, x_k$. If we have values for $x_1, x_2, \ldots, x_k$ and these are denoted by $x_{1p}, x_{2p}, \ldots, x_{kp}$, then we may predict $y$ using

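$$\hat y = \hat\alpha + \hat\beta_1(x_{1p} - \bar x_1) + \hat\beta_2(x_{2p} - \bar x_2) + \cdots + \hat\beta_k(x_{kp} - \bar x_k) \qquad (44)$$

To determine the variance of $\hat y$, let us define $X_p = [(x_{1p} - \bar x_1), (x_{2p} - \bar x_2), \ldots, (x_{kp} - \bar x_k)]$; then the variance of $\hat y$ is given by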

$$\mathrm{Var}(\hat y) = \sigma^2\left(\frac{1}{n} + X_pS_{xx}^{-1}X_p'\right) \qquad (45)$$

The problem of considering what we are actually attempting to predict with $\hat y$ has been discussed for simple linear regression; the distinctions made there are equally relevant in the context of multiple regression. The variance given in equation (45) only represents our uncertainty about $\alpha, \beta_1, \ldots, \beta_k$, and its use is only appropriate when we are trying to predict the mean value of $y$. However, when we are trying to predict the outcome of a single reading, the variance given in equation (45) should be increased to

$$\sigma^2\left(1 + \frac{1}{n} + X_pS_{xx}^{-1}X_p'\right),$$

the additional component, $\sigma^2$, being for the error of the reading. Corresponding $100(1-\alpha)\%$ confidence intervals for the mean value or the outcome of a single reading are

$$\hat y \pm t(n-k-1, 1-\alpha/2)\sqrt{\hat\sigma^2\left(\frac{1}{n} + X_pS_{xx}^{-1}X_p'\right)} \qquad (46)$$

and

$$\hat y \pm t(n-k-1, 1-\alpha/2)\sqrt{\hat\sigma^2\left(1 + \frac{1}{n} + X_pS_{xx}^{-1}X_p'\right)} \qquad (47)$$

respectively. The confidence region for the hyperplane

$$y = \alpha + \beta_1(x_1 - \bar x_1) + \beta_2(x_2 - \bar x_2) + \cdots + \beta_k(x_k - \bar x_k)$$

does not have any great practical merit, mainly because of the difficulty of visually displaying such a region.
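A short sketch of the prediction (44) and the two intervals (46) and (47) is given below, using the variances $\sigma^2(1/n + X_pS_{xx}^{-1}X_p')$ for the mean value and $\sigma^2(1 + 1/n + X_pS_{xx}^{-1}X_p')$ for a single reading, with $\hat\sigma^2$ in place of $\sigma^2$. It assumes $S_{xx}^{-1}$, the means $\bar x_j$ and the quantile $t(n-k-1, 1-\alpha/2)$ have already been obtained; the function name is hypothetical.

```python
# Sketch only: the function name is hypothetical, not taken from the book.
import math


def predict_with_intervals(alpha_hat, beta_hat, xbar, s_inv, sigma2_hat, n, xp, t_value):
    """Equations (44), (46) and (47) for a new point xp = (x_1p, ..., x_kp).

    Returns (y_hat, interval for the mean value, interval for a single reading).
    """
    k = len(beta_hat)
    xp_dev = [xp[j] - xbar[j] for j in range(k)]                          # X_p
    y_hat = alpha_hat + sum(b * d for b, d in zip(beta_hat, xp_dev))     # (44)
    quad = sum(xp_dev[j] * s_inv[j][l] * xp_dev[l] for j in range(k) for l in range(k))
    half_mean = t_value * math.sqrt(sigma2_hat * (1.0 / n + quad))          # (46)
    half_single = t_value * math.sqrt(sigma2_hat * (1.0 + 1.0 / n + quad))  # (47)
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_single, y_hat + half_single))
```

The single-reading interval is always the wider of the two, and it does not shrink to zero as $n$ grows, because the error of the new reading itself remains.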

2.3 Significance Tests and the Best Equation

2.3.1 General linear hypothesis

A variety of significance tests are available for studying various features of the model (34). We deal here with these tests in isolation and later on will explain how combinations of such tests may be used, for instance, to decide which is the 'best' equation. Many tests can be constructed from one basic result which is, somewhat ambiguously, referred to as the general linear hypothesis. A hypothesis about the parameters in model (34) might, for instance, state that $\beta_1$, $\beta_3$, $\beta_7$ and $\beta_9$ are all zero (i.e. variables $x_1$, $x_3$, $x_7$ and $x_9$ are of no importance in model (34)). Thus, a general linear hypothesis might take a form which exactly specifies the values of $p$ of the parameters in model (34). Imagine modifyin