# Regression analysis: linear regression, logistic regression

TRANSCRIPT

- Slide 1

Regression analysis: linear regression, logistic regression

- Slide 2

Relationship and association

- Slide 3

Straight line

- Slide 4

Best straight line?

- Slide 5

Best straight line! Least squares estimation

- Slide 6

Simple linear regression

1. Is the association linear?

- Slide 7

Simple linear regression

1. Is the association linear?
2. Describe the association: what are $b_0$ and $b_1$?

$\mathrm{BMI} = -12.6\ \mathrm{kg/m^2} + 0.35\ \mathrm{kg/m^3} \times \mathrm{Hip}$

- Slide 8

Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0? Help SPSS!!!

Coefficients (dependent variable: BMI)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -12.581 | 2.331 | | -5.396 | .000 |
| Hip | .345 | .023 | .565 | 15.266 | .000 |

- Slide 9

Simple linear regression

1. Is the association linear?
2. Describe the association
3. Is the slope significantly different from 0?
4. How good is the fit? How far are the data points from the line on average?

- Slide 10

The correlation coefficient, r

[Figure: four scatter plots with r = 0, r = 1, r = 0.7 and r = -0.5]

- Slide 11

$r^2$: goodness of fit. How much of the variation can be explained by the model?

[Figure: four scatter plots with $R^2$ = 0, $R^2$ = 1, $R^2$ = 0.5 and $R^2$ = 0.2]

- Slide 12

Multiple linear regression

Could the waist measure describe some of the variation in BMI?

$\mathrm{BMI} = 1.3\ \mathrm{kg/m^2} + 0.42\ \mathrm{kg/m^3} \times \mathrm{Waist}$

Or even better:

- Slide 13

Multiple linear regression

Adding age: adj. $R^2$ = 0.352. Adding thigh: adj. $R^2$ = 0.352?

Coefficients (dependent variable: BMI)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, lower | 95% CI for B, upper |
|---|---|---|---|---|---|---|---|
| (Constant) | -9.001 | 2.449 | | -3.676 | .000 | -13.813 | -4.190 |
| Waist | .168 | .043 | .201 | 3.923 | .000 | .084 | .252 |
| Hip | .252 | .031 | .411 | 8.012 | .000 | .190 | .313 |
| Age | -.064 | .018 | -.126 | -3.492 | .001 | -.101 | -.028 |

Coefficients (dependent variable: BMI)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, lower | 95% CI for B, upper |
|---|---|---|---|---|---|---|---|
| (Constant) | 3.581 | 1.784 | | 2.007 | .045 | .075 | 7.086 |
| Waist | .168 | .043 | .201 | 3.923 | .000 | .084 | .252 |
| Age | -.064 | .018 | -.126 | -3.492 | .001 | -.101 | -.028 |
| Thigh | .252 | .031 | .411 | 8.012 | .000 | .190 | .313 |

- Slide 14

Assumptions

1. The dependent variable must be metric continuous
2. Independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed
6. Independent variables are not perfectly correlated with each other

- Slide 15

Non-parametric correlation

- Slide 16

Rank correlation: Kendall's tau, Spearman's $r_s$

The correlation coefficient lies between -1 and 1, where -1 is perfect inverse correlation, 0 means no correlation, and 1 means perfect correlation.

Pearson is the correlation method for normally distributed data. Remember the assumptions:

1. The dependent variable must be metric continuous
2. Independent variables must be continuous or ordinal
3. Linear relationship between the dependent and all independent variables
4. Residuals must have a constant spread
5. Residuals are normally distributed

- Slide 17

Kendall's tau: an example

- Slide 18

Kendall's tau: an example

- Slide 19

Spearman: the same example

$d^2$: 1, 4, 9, 1, 1, 1, 9, 9, 1, 16

With $\sum d^2 = 52$ and $n = 10$, $r_s = 1 - \frac{6 \cdot 52}{10 \cdot 99} \approx 0.685$.

- Slide 20

Correlation in SPSS

- Slide 21

Correlation in SPSS

Correlations between a and b

| Method | Correlation coefficient | Sig. (2-tailed) | N |
|---|---|---|---|
| Pearson | .685* | .029 | 10 |
| Kendall's tau_b | .511* | .040 | 10 |
| Spearman's rho | .685* | .029 | 10 |

\* Correlation is significant at the 0.05 level (2-tailed).

- Slide 22

Logistic regression

- Slide 23

Logistic regression

What if the dependent variable is categorical, and especially binary? Use some interpolation method. Linear regression cannot help us.

- Slide 24

The sigmoidal curve

- Slide 25

The sigmoidal curve

- The intercept basically just shifts the curve along the input variable

- Slide 26

The sigmoidal curve

- The intercept basically just shifts the curve along the input variable
- Large regression coefficient: the risk factor strongly influences the probability

- Slide 27

The sigmoidal curve

- The intercept basically just shifts the curve along the input variable
- Large regression coefficient: the risk factor strongly influences the probability
- Positive regression coefficient: the risk factor increases the probability

Logistic regression uses maximum likelihood estimation, not least squares estimation.

- Slide 28

Does age influence the diagnosis? Continuous independent variable

Variables in the Equation (variable entered on step 1: Age)

| Step 1 | B | S.E. | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|---|---|
| Age | .109 | .010 | 108.745 | 1 | .000 | 1.115 | 1.092 | 1.138 |
| Constant | -4.213 | .423 | 99.097 | 1 | .000 | .015 | | |

- Slide 29

Does previous intake of OCP influence the diagnosis? Categorical independent variable

Variables in the Equation (variable entered on step 1: OCP)

| Step 1 | B | S.E. | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|---|---|
| OCP(1) | -.311 | .180 | 2.979 | 1 | .084 | .733 | .515 | 1.043 |
| Constant | .233 | .123 | 3.583 | 1 | .058 | 1.263 | | |

- Slide 30

Odds ratio

- Slide 31

Multiple logistic regression

Variables in the Equation (variables entered on step 1: Age, BMI, OCP)

| Step 1 | B | S.E. | Wald | df | Sig. | Exp(B) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|---|---|
| Age | .123 | .011 | 115.343 | 1 | .000 | 1.131 | 1.106 | 1.157 |
| BMI | .083 | .019 | 18.732 | 1 | .000 | 1.087 | 1.046 | 1.128 |
| OCP | .528 | .219 | 5.808 | 1 | .016 | 1.695 | 1.104 | 2.603 |
| Constant | -6.974 | .762 | 83.777 | 1 | .000 | .001 | | |

- Slide 32

Predicting the diagnosis by logistic regression

What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant? The coefficients are those of the multiple logistic regression table on Slide 31 (Age .123, BMI .083, OCP .528, Constant -6.974).

$z = -6.974 + 0.123 \cdot 50 + 0.083 \cdot 26 + 0.528 \cdot 1 = 1.862$

$p = \frac{1}{1 + e^{-1.862}} \approx 0.866$
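
The prediction step can be retraced in a few lines of Python. This is only a minimal sketch: the coefficients are the B values from the multiple logistic regression table (Age .123, BMI .083, OCP .528, Constant -6.974), while the names `COEFFS`, `linear_predictor` and `probability` are hypothetical and not part of the slides. With the tabulated OCP coefficient of 0.528 it gives z ≈ 1.862 and p ≈ 0.866, and exponentiating each B reproduces the Exp(B) (odds ratio) column.

```python
import math

# B column of the multiple logistic regression table (Age, BMI, OCP).
# The dictionary and function names are illustrative assumptions.
COEFFS = {"const": -6.974, "age": 0.123, "bmi": 0.083, "ocp": 0.528}


def linear_predictor(age, bmi, ocp):
    """Return z = b0 + b_age*age + b_bmi*bmi + b_ocp*ocp."""
    return (COEFFS["const"]
            + COEFFS["age"] * age
            + COEFFS["bmi"] * bmi
            + COEFFS["ocp"] * ocp)


def probability(z):
    """Map the linear predictor onto the sigmoidal curve."""
    return 1.0 / (1.0 + math.exp(-z))


# 50-year-old woman, BMI 26, previous OCP use (coded 1).
z = linear_predictor(age=50, bmi=26, ocp=1)
p = probability(z)

# Odds ratio per unit increase of each predictor: Exp(B) = e^B.
odds_ratios = {name: math.exp(b) for name, b in COEFFS.items() if name != "const"}

print(f"z = {z:.3f}, p = {p:.3f}")
for name, ratio in odds_ratios.items():
    print(f"odds ratio for {name}: {ratio:.3f}")
```

The odds ratios are computed from the rounded B values, so they may differ from the SPSS Exp(B) column in the last digit.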