# Statistics: Regression & Correlation


Post on 20-Jan-2016


• Regression & Correlation

• Outline

  - X, Y & regression models
  - Simple linear regression (SLR)
  - The logic of SLR: SST = SSR + SSE
  - SLR: ANOVA table & R-square
  - SLR, ANOVA, and the two-sample t test
  - Multiple linear regression
  - Pearson's correlation coefficient (r)
  - R², r, b
  - Z, t, F, χ²

• X and Y

• Univariate analysis: one X, one Y

Methods marked \* assume normality, independence, ... Cat.: categorical; Num.: numerical.

| X | Y | Comparison | Method |
|---|---|---|---|
| Binary | Num. (normal) | 2 indep. means | Two-sample t test* |
| Categorical | Num. (normal) | >= 2 indep. means | One-way ANOVA* |
| Binary | Num. (non-normal) | 2 indep. medians | Wilcoxon rank sum |
| Categorical | Num. (non-normal) | >= 2 indep. medians | Kruskal-Wallis |
| Num. (normal) | Num. (normal) | | Regression* |
| | Num. (normal) | 2 related means | Paired t |
| | Num. (non-normal) | 2 related medians | Wilcoxon signed rank |
| Categorical | Categorical | X related to Y | Pearson's chi-square |
| | Categorical (binary) | 2 related prop. | McNemar chi-square |
| Categorical (binary) | Categorical (binary) | 2 indep. prop. | Pearson's chi-square |
| Categorical (binary) | Categorical (binary) | 2 indep. prop. | Two-sample Z |
| Categorical (binary) | Categorical (binary) | 2 indep. prop. | Fisher's exact |

• Multivariate analysis: several Xs, one Y

Methods marked \* assume multivariate normality, independence, ... Cat.: categorical; Num.: numerical. CART: classification and regression tree; ANOVA: analysis of variance; ANCOVA: analysis of covariance; MANOVA: multivariate analysis of variance; GEE: generalized estimating equations.

| Xs | Y | Methods |
|---|---|---|
| Categorical | Cat. | Log-linear |
| Cat. + Num. | Cat. (binary) | Logistic regression |
| Cat. + Num. | Cat. (>= 3) | Logistic regression |
| | | Discriminant analysis* |
| | | Cluster analysis |
| | | Propensity scores |
| | | CART |
| Cat. | Num. | ANOVA* |
| | | MANOVA* |
| Num. | Num. | Multiple regression* |
| Cat. + Num. | Num. (censored) | Cox proportional hazards model |
| Confounding factors | Num. | ANCOVA* |
| | | MANOVA* |
| | | GEE* |
| Confounding factors | Cat. | Mantel-Haenszel |
| Num. | | Factor analysis |

• Regression Models: mathematical models that describe the relationship between Y and X

Uses of a regression model: adjustment; prediction; finding important factors for Y

• Regression Models. Definition: mathematical models that describe the relationship between Y and X. Purpose: find important factors for Y and/or predict Y.

• Simple linear regression (SLR). Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

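• SLR Example

| ID | AGE | CHOL |
|---|---|---|
| 1 | 34 | 141.4 |
| 2 | 39 | 180.5 |
| 3 | 44 | 178.4 |
| 4 | 46 | 212 |
| 5 | 48 | 203.2 |
| 6 | 51 | 224.1 |
| 7 | 53 | 186 |
| 8 | 60 | 350 |
| 9 | 61 | 286.3 |
| 10 | 65 | 287.6 |
| 11 | 66 | 330.3 |
| 12 | 67 | 311.3 |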

• SLR: parameter estimation. The least squares method gives the point estimates of $\beta_0$ and $\beta_1$.

• The logic of SLR: SST = SSR + SSE. At each $X_i$: total amount unexplained (SST) = amount unexplained by regression (SSE) + amount explained by regression (SSR).

• SLR: parameter estimation. The least squares method chooses the point estimates that minimize SSE.
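For reference, the minimization above has a standard closed-form solution. The symbols $b_0$, $b_1$, $S_{xx}$, and $S_{xy}$ are notation assumed here for the estimates and the sums of squares; they are not taken from the slides.

```latex
\begin{aligned}
\text{SSE}(b_0, b_1) &= \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2 \\
b_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{S_{xy}}{S_{xx}} \\
b_0 &= \bar{Y} - b_1 \bar{X}
\end{aligned}
```

Setting the partial derivatives of SSE with respect to $b_0$ and $b_1$ to zero gives these two expressions, so the fitted line always passes through $(\bar{X}, \bar{Y})$.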

• SLR example: regression line. Estimated model: CHOL = -57.5965 + 5.6502 × Age
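The estimated line can be reproduced from the twelve (AGE, CHOL) pairs in the example table. The following is a minimal sketch in plain Python; the function name `fit_slr` is an illustrative choice, not something the slides define.

```python
# Age and cholesterol for the 12 subjects in the SLR example table.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1,
        186, 350, 286.3, 287.6, 330.3, 311.3]


def fit_slr(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_slr(age, chol)
print(f"CHOL = {b0:.4f} + {b1:.4f} * Age")
```

The slope says that, in this sample, cholesterol rises by about 5.65 units per additional year of age.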

• SLR: ANOVA table & R-square. $R^2 = 0.82$, $p = 0.0001$
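The entries of the ANOVA table can be rebuilt from the example data as a check on the decomposition SST = SSR + SSE. This is a sketch in plain Python; the variable names are illustrative, and the p-value is not computed here because that would need an F distribution routine.

```python
# Age and cholesterol for the 12 subjects in the SLR example table.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1,
        186, 350, 286.3, 287.6, 330.3, 311.3]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(chol) / n

# Least squares estimates of the slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(age, chol))
      / sum((x - x_bar) ** 2 for x in age))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in age]

# Sums of squares: total, explained by regression, and residual.
sst = sum((y - y_bar) ** 2 for y in chol)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(chol, fitted))

df_reg, df_err = 1, n - 2
f_stat = (ssr / df_reg) / (sse / df_err)
r2 = ssr / sst

print(f"SSR = {ssr:.1f} (df = {df_reg})")
print(f"SSE = {sse:.1f} (df = {df_err})")
print(f"SST = {sst:.1f} (df = {n - 1})")
print(f"F = {f_stat:.2f}, R-square = {r2:.2f}")
```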

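• SLR: qualitative covariate. Example: X = treatment (1 or 0), Y = SBP. Hypothesis: $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$, which is equivalent to $H_0: \mu_1 = \mu_0$ vs. $H_1: \mu_1 \neq \mu_0$. Note: $\beta_1 = \mu_1 - \mu_0$.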
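A small numerical check of the note that the slope on a 0/1 covariate equals the difference in group means. The SBP readings below are hypothetical values made up for illustration; they are not data from the slides.

```python
# Hypothetical SBP readings (mmHg): 5 control subjects (0), 5 treated (1).
treatment = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
sbp = [142, 138, 150, 145, 140, 130, 128, 135, 132, 125]

n = len(sbp)
x_bar = sum(treatment) / n
y_bar = sum(sbp) / n

# Least squares slope and intercept for SBP regressed on the dummy variable.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(treatment, sbp))
      / sum((x - x_bar) ** 2 for x in treatment))
b0 = y_bar - b1 * x_bar

# Group means computed directly.
mean_0 = sum(y for x, y in zip(treatment, sbp) if x == 0) / treatment.count(0)
mean_1 = sum(y for x, y in zip(treatment, sbp) if x == 1) / treatment.count(1)

print(f"b0 = {b0:.2f}, mean of group 0 = {mean_0:.2f}")
print(f"b1 = {b1:.2f}, mean difference = {mean_1 - mean_0:.2f}")
```

The intercept recovers the mean of the group coded 0, and the slope recovers the difference between the two group means, so testing the slope is the same question as comparing the means.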

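• SLR, ANOVA, and the two-sample t test

  - Two-sample t test and ANOVA; two-sample t test and SLR: $H_0: \mu_1 = \mu_0 \Leftrightarrow H_0: \beta_1 = 0$
  - Dummy variables: K groups need K - 1 dummy variables
  - ANOVA and SLR: $H_0: \mu_1 = \mu_2 = \mu_3 \Leftrightarrow H_0: \beta_1 = \beta_2 = 0$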

• Multiple Linear Regression: model

Example: Is Age a predictor for SBP adjusting for Sex?
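One way to write the model behind this question, as a sketch: coding Sex as a 0/1 indicator and the coefficient names are assumptions made here, not details given in the slides.

```latex
\text{SBP}_i = \beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{Sex}_i + \varepsilon_i,
\qquad
H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0
```

Under this model, $\beta_1$ is the expected change in SBP per year of age among subjects of the same sex, which is what "adjusting for Sex" means.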

• MLR: example

• Pearson's correlation coefficient (r): relationship between X and Y

Properties of Pearson's r:

  - Range: $-1 \le r \le 1$
  - Unitless
  - Good for normally distributed X and Y
  - $r = \cos\theta$, where $\theta$ is the angle between the mean-centered X and Y vectors

Spearman's correlation coefficient: Pearson's r for ranked X and Y; good for non-normally distributed X and Y

• Spearman's rho: rank correlation; relationship between X and Y

Spearman's correlation coefficient: Pearson's r for ranked X and Y; good for non-normally distributed X and Y
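Both coefficients can be computed on the AGE and CHOL data from the SLR example, with Spearman's rho obtained exactly as described above: Pearson's r applied to the ranks. This is a sketch in plain Python; the helper names are illustrative, and the ranking helper assumes there are no tied values (true for this data set).

```python
from math import sqrt

age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1,
        186, 350, 286.3, 287.6, 330.3, 311.3]


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


r = pearson_r(age, chol)
rho = pearson_r(ranks(age), ranks(chol))  # Spearman's rho
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```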

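• Assumptions in Regression: Linear, Independent, Normal distribution, Equal variance

For all values of x, the errors $\varepsilon_i$ are independent, normally distributed, and have the same SD ($\sigma$) and mean 0.

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\beta_0$ and $\beta_1$ are the unknown parameters and $\varepsilon_i$ is the random error fluctuation.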

• R², r, b: r and b

$r^2$: coefficient of determination. The proportion of the variability among the observed values of Y that is explained by the linear regression of Y on X.

• r, b: r and b
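The link between r and the least squares slope can be stated compactly. In this sketch, $S_{xx}$, $S_{yy}$, $S_{xy}$ and the sample standard deviations $s_x$, $s_y$ are notation assumed here rather than taken from the slides.

```latex
b_1 = \frac{S_{xy}}{S_{xx}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
\quad\Longrightarrow\quad
b_1 = r\,\sqrt{\frac{S_{yy}}{S_{xx}}} = r\,\frac{s_y}{s_x},
\qquad
r^2 = \frac{\text{SSR}}{\text{SST}} = R^2
```

So r and $b_1$ always share the same sign, and in simple linear regression the squared correlation equals the R-square from the ANOVA table.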

• 1

• 2. The Standard Error of the Estimate

  - SE of the regression line (RL)
  - SE of prediction
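The textbook forms of these quantities are sketched here for reference. Reading "SE of RL" as the standard error of the fitted line at a given $x_0$ is an assumption, as are the symbols $s_{y \mid x}$, $x_0$, and $S_{xx}$.

```latex
\begin{aligned}
s_{y \mid x} &= \sqrt{\frac{\text{SSE}}{n - 2}} \\
\text{SE}\left(\hat{Y} \mid x_0\right) &= s_{y \mid x} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{X})^2}{S_{xx}}} \\
\text{SE}\left(Y_{\text{new}} - \hat{Y} \mid x_0\right) &= s_{y \mid x} \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{S_{xx}}}
\end{aligned}
```

The prediction error carries the extra 1 under the root because a new observation has its own random error on top of the uncertainty in the fitted line, and both standard errors grow as $x_0$ moves away from $\bar{X}$.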

• 3. Note (a): $b_1$

Note (b): $b_0$

• 1030-39(X)10(Y)(:89P374)

rF0.05 (1,8) =5.3210103501095%CI3501095%CI:


• Predicting a nominal or categorical outcome (Y)

  - Odds ratio
  - Study designs: cross-sectional study, cohort (follow-up) study, case-control study, clinical trial
  - Logistic regression

• Odds ratio: odds and odds ratio. $p_1 = A / (A + B)$; $p_0 = C / (C + D)$
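A minimal sketch of the calculation from a 2×2 table. The cell counts are hypothetical, and the layout (first row A, B for one group; second row C, D for the comparison group; first column = outcome present) is an assumption consistent with the definitions of $p_1$ and $p_0$ above.

```python
# Hypothetical 2x2 table counts (not from the slides):
#              outcome yes   outcome no
# group 1          A             B
# group 0          C             D
A, B, C, D = 30, 70, 10, 90

p1 = A / (A + B)          # proportion with the outcome in group 1
p0 = C / (C + D)          # proportion with the outcome in group 0

odds1 = p1 / (1 - p1)     # odds of the outcome in group 1
odds0 = p0 / (1 - p0)     # odds of the outcome in group 0
odds_ratio = odds1 / odds0

# The same quantity as the cross-product ratio of the table.
cross_product = (A * D) / (B * C)

print(f"p1 = {p1:.2f}, p0 = {p0:.2f}")
print(f"odds ratio = {odds_ratio:.3f}, cross-product ratio = {cross_product:.3f}")
```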

• Cross-sectional study; cohort study (follow-up study); case-control study; clinical trial

• Bias: selection bias; information bias; misclassification; confounding

• Recall bias

• ()EED

• Confounding factors: X1, X2, Y

• Intervention; randomization; placebo effect

• Study Designs

  - Case-control study
  - Matched case-control study
  - Cohort study
  - Matched cohort study
  - Randomized clinical trial
  - Complete matched cohort study
  - Causality and correlation
  - $Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 + b_5 X_5$

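• Logistic regression. Simple linear regression: Y ranges over $(-\infty, \infty)$. Logistic regression: Y is 0 or 1, so its probability lies in $(0, 1)$. How can $(0, 1)$ be put on the scale $(-\infty, \infty)$? Use the logistic transformation.

• Logistic regression and the odds ratio: $\text{OR} = \exp(\beta)$. For a binary X, the OR compares x = 1 with x = 0 for the outcome Y.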
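The two ideas above, the logistic transformation and OR = exp(beta), can be checked numerically. This is a sketch with hypothetical coefficients `b0` and `b1` chosen only for illustration; they are not estimates from any study cited in the slides.

```python
from math import exp, log


def logit(p):
    """Logistic transformation: maps a probability in (0, 1) to (-inf, inf)."""
    return log(p / (1 - p))


def inv_logit(z):
    """Inverse transformation: maps any real number back to (0, 1)."""
    return 1 / (1 + exp(-z))


# Hypothetical model: logit(P(Y = 1)) = b0 + b1 * x, with binary x.
b0, b1 = -2.0, 0.7

p_x0 = inv_logit(b0)        # P(Y = 1) when x = 0
p_x1 = inv_logit(b0 + b1)   # P(Y = 1) when x = 1

# Odds ratio comparing x = 1 with x = 0, computed from the probabilities.
odds_ratio = (p_x1 / (1 - p_x1)) / (p_x0 / (1 - p_x0))

print(f"P(Y=1 | x=0) = {p_x0:.4f}, P(Y=1 | x=1) = {p_x1:.4f}")
print(f"odds ratio = {odds_ratio:.4f}, exp(b1) = {exp(b1):.4f}")
```

The odds ratio computed from the two fitted probabilities agrees with exp(b1), which is why the exponentiated coefficient is reported as the OR for a binary X.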

• LR example: men with unintentional injury (Soderstrom, 1997; Table 10-5, p. 247); BAC > 50 mg/dL

• Z, t, F, χ²: Z² and chi-square

• Z, t, F, χ²: F and chi-square

• Z, t, F, χ²
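The standard relationships among these four distributions are sketched here for reference. The degrees-of-freedom symbols $\nu$ and $k$ are notation assumed here.

```latex
\begin{aligned}
Z^2 &\sim \chi^2_{1} \\
t_{\nu}^2 &\sim F_{1,\,\nu} \\
t_{\nu} &\to Z \quad (\nu \to \infty) \\
k\,F_{k,\,\nu} &\to \chi^2_{k} \quad (\nu \to \infty)
\end{aligned}
```

For example, the critical value $F_{0.05}(1, 8) = 5.32$ quoted earlier is the square of the two-sided 5% critical value of t with 8 degrees of freedom: $2.306^2 \approx 5.32$.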