STATISTICS: Regression & Correlation

Post on 20-Jan-2016


  • Regression & Correlation

  • Outline
    X, Y & regression models
    Simple linear regression (SLR)
    The logic of SLR: SST = SSR + SSE
    SLR: ANOVA table & R-square
    SLR, ANOVA, and the two-sample t test
    Multiple linear regression
    Pearson's correlation coefficient (r)
    $R^2$, r, b
    Z, t, F, $\chi^2$

  • X and Y

  • Univariate analysis: one X, one Y
    *: assumes normality, independence, etc.
    Cat.: categorical; Num.: numerical

    X | Y | Comparison | Method
    Binary | Num. (normal) | 2 indep. means | Two-sample t test*
    Categorical | Num. (normal) | >= 2 indep. means | One-way ANOVA*
    Binary | Num. (non-normal) | 2 indep. medians | Wilcoxon rank sum
    Categorical | Num. (non-normal) | >= 2 indep. medians | Kruskal-Wallis
    Num. (normal) | Num. (normal) | - | Regression*
    - | Num. (normal) | 2 related means | Paired t
    - | Num. (non-normal) | 2 related medians | Wilcoxon signed rank
    Categorical | Categorical | X related to Y | Pearson's chi-square
    - | Categorical (binary) | 2 related prop. | McNemar chi-square
    Categorical (binary) | Categorical (binary) | 2 indep. prop. | Pearson's chi-square
    Categorical (binary) | Categorical (binary) | 2 indep. prop. | Two-sample Z test
    Categorical (binary) | Categorical (binary) | 2 indep. prop. | Fisher's exact

  • Multivariate analysis: several Xs, one Y
    *: assumes multivariate normality, independence, etc.
    Cat.: categorical; Num.: numerical
    CART: classification and regression tree
    ANOVA: analysis of variance
    ANCOVA: analysis of covariance
    MANOVA: multivariate analysis of variance
    GEE: generalized estimating equations

    Xs | Y | Method
    Categorical | Cat. | Log-linear
    Cat.+Num. | Cat. (binary) | Logistic regression
    Cat.+Num. | Cat. (>=3) | Logistic regression
     |  | Discriminant analysis*
     |  | Cluster analysis
     |  | Propensity scores
     |  | CART
    Cat. | Num. | ANOVA*
     |  | MANOVA*
    Num. | Num. | Multiple regression*
    Cat.+Num. | Num. (censored) | Cox proportional hazards model
    Confounding factors | Num. | ANCOVA*
     |  | MANOVA*
     |  | GEE*
    Confounding factors | Cat. | Mantel-Haenszel
    Num. |  | Factor analysis

  • Regression Models
    Mathematical models to describe the relationship between Y and X

    Uses of a regression model:
    Adjustment
    Prediction
    Finding important factors for Y

  • Regression Models
    Definition: mathematical models to describe the relationship between Y and X
    Purpose: find important factors for Y and/or predict Y

  • Simple linear regression (SLR)
    Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  • SLR Example

    ID | AGE | CHOL
    1 | 34 | 141.4
    2 | 39 | 180.5
    3 | 44 | 178.4
    4 | 46 | 212
    5 | 48 | 203.2
    6 | 51 | 224.1
    7 | 53 | 186
    8 | 60 | 350
    9 | 61 | 286.3
    10 | 65 | 287.6
    11 | 66 | 330.3
    12 | 67 | 311.3

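  • SLR: parameter estimation
    The least squares method

    Point estimates: $b_0$ for $\beta_0$ and $b_1$ for $\beta_1$

  • The logic of SLR: SST = SSR + SSE
    SST: total amount unexplained at $X_i$
    SSE: amount at $X_i$ unexplained by regression
    SSR: amount at $X_i$ explained by regression

  • SLR: parameter estimation
    The least squares method: choose $b_0$, $b_1$ to minimize $SSE = \sum_i (Y_i - b_0 - b_1 X_i)^2$
    Point estimates: $b_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$, $b_0 = \bar{Y} - b_1 \bar{X}$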

  • SLR example: Regression line
    Estimated model: CHOL = -57.5965 + 5.6502 × Age
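The estimated line can be reproduced from the twelve (AGE, CHOL) pairs on the example slide. The following is a minimal sketch of the least squares method in plain Python; the function name `fit_slr` is illustrative and not from the slides.

```python
# Example data transcribed from the SLR example slide.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]


def fit_slr(x, y):
    """Return (b0, b1) minimizing SSE = sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_slr(age, chol)
print(f"CHOL = {b0:.4f} + {b1:.4f} * Age")
```

The fitted intercept and slope should agree with the estimated model quoted on the slide up to rounding.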

  • SLR: ANOVA table & R-square
    $R^2 = 0.82$, p = 0.0001
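As a sketch of how the ANOVA quantities fit together, the code below splits the total sum of squares for the example data into its regression and error parts and forms $R^2 = SSR/SST$ and $F = MSR/MSE$ with 1 and n - 2 degrees of freedom. Variable names are illustrative; the p-value is not computed here.

```python
# Example data transcribed from the SLR example slide.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(chol) / n

# Least squares estimates.
s_xx = sum((x - x_bar) ** 2 for x in age)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, chol)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in age]

# Sums of squares: total, explained by regression, unexplained (error).
sst = sum((y - y_bar) ** 2 for y in chol)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(chol, fitted))

r_squared = ssr / sst
f_stat = (ssr / 1) / (sse / (n - 2))   # MSR / MSE

print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f}")
print(f"R^2={r_squared:.2f} F(1,{n - 2})={f_stat:.2f}")
```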

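  • SLR: qualitative covariate
    Example: X = treatment (1 or 0), Y = SBP
    Hypothesis: $H_0: \mu_1 = \mu_0$ vs. $H_1: \mu_1 \neq \mu_0$
    In the regression model: $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
    Note: $\beta_1 = \mu_1 - \mu_0$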

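  • SLR, ANOVA, and the 2-sample t test
    2-sample t test as SLR: $H_0: \mu_1 = \mu_0$ corresponds to $H_0: \beta_1 = 0$
    Dummy variables: K groups need K-1 dummy variables
    ANOVA as SLR: $H_0: \mu_1 = \mu_2 = \mu_3$ corresponds to $H_0: \beta_1 = \beta_2 = 0$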
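The correspondence between the 2-sample t test and SLR on a 0/1 dummy variable can be checked numerically. The sketch below uses hypothetical SBP values invented for illustration (they are not from the slides) and assumes the pooled, equal-variance form of the t test.

```python
import math

# Hypothetical SBP values for illustration only (not from the slides).
sbp_control = [120.0, 132.0, 128.0, 141.0, 125.0]          # X = 0
sbp_treated = [118.0, 121.0, 110.0, 127.0, 115.0, 122.0]   # X = 1


def mean(values):
    return sum(values) / len(values)


def pooled_t(g1, g0):
    """Two-sample t statistic with pooled variance, for mean(g1) - mean(g0)."""
    n1, n0 = len(g1), len(g0)
    ss1 = sum((v - mean(g1)) ** 2 for v in g1)
    ss0 = sum((v - mean(g0)) ** 2 for v in g0)
    sp2 = (ss1 + ss0) / (n1 + n0 - 2)
    return (mean(g1) - mean(g0)) / math.sqrt(sp2 * (1 / n1 + 1 / n0))


def slr_slope_t(x, y):
    """Return (b1, t) where t tests H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)
    return b1, b1 / se_b1


x = [0] * len(sbp_control) + [1] * len(sbp_treated)
y = sbp_control + sbp_treated

b1, t_reg = slr_slope_t(x, y)
t_two = pooled_t(sbp_treated, sbp_control)
print(f"b1 = {b1:.3f}, difference in means = {mean(sbp_treated) - mean(sbp_control):.3f}")
print(f"t from SLR = {t_reg:.4f}, t from 2-sample test = {t_two:.4f}")
```

The slope equals the difference in group means, and the two t statistics coincide, which is the sense in which the two hypotheses are the same test.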

  • Multiple Linear Regression
    Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$

    Example: Is Age a predictor for SBP adjusting for Sex?
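A minimal sketch of how such an adjusted estimate could be computed, assuming ordinary least squares solved through the normal equations $X'Xb = X'y$. The ages, sex codes, and SBP values are hypothetical and only illustrate the mechanics; `solve` and `fit_mlr` are illustrative helper names.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_mlr(rows, y):
    """Least squares coefficients [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data for illustration only: (age, sex), with sex coded 1 = male, 0 = female.
predictors = [(35, 0), (42, 1), (50, 0), (58, 1), (63, 0), (47, 1), (55, 0), (39, 1)]
sbp = [112.0, 125.0, 121.0, 138.0, 130.0, 127.0, 124.0, 120.0]

b0, b_age, b_sex = fit_mlr(predictors, sbp)
residuals = [y - (b0 + b_age * a + b_sex * s) for (a, s), y in zip(predictors, sbp)]
print(f"SBP = {b0:.2f} + {b_age:.3f} * Age + {b_sex:.2f} * Sex")
```

Here `b_age` is the change in SBP per year of age with sex held fixed, which is what "adjusting for Sex" refers to.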

  • MLR: example

  • Pearson's correlation coefficient (r)
    Relationship between X and Y

    Properties of Pearson's r
    Range: $-1 \le r \le 1$
    Unitless
    Good for normally distributed X and Y
    $r = \cos\theta$
    Spearman's correlation coefficient: Pearson's r for ranked X and Y; good for non-normally distributed X and Y

  • Spearman's rho: rank correlation
    Relationship between X and Y

    Spearman's correlation coefficient
    Pearson's r for ranked X and Y
    Good for non-normally distributed X and Y
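Both coefficients can be sketched in a few lines of plain Python, using the AGE and CHOL values from the SLR example slide. Spearman's rho is computed exactly as the slide describes it: Pearson's r applied to the ranks. Tied values, if any, are assumed to share the average of their ranks.

```python
import math

# Example data transcribed from the SLR example slide.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]


def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))


r = pearson_r(age, chol)
rho = spearman_rho(age, chol)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```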

  • Assumptions in Regression
    Linear
    Independent
    Normal distribution
    Equal variance
    For all values of x, the errors $\varepsilon$ are independent, normally distributed, and have the same SD ($\sigma$) and mean = 0
    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
    $\beta_0$ and $\beta_1$ are the unknown parameters
    $\varepsilon$ = random error fluctuations

  • $R^2$, r, b
    r and b

    $r^2$: Coefficient of determination: the proportion of the variability among the observed values of Y that is explained by the linear regression of Y on X.

  • r, b: rb

    rb
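One standard link between r and b in simple linear regression is $b_1 = r \cdot s_Y / s_X$, so the two always share a sign. The sketch below checks this on the AGE and CHOL example data; it illustrates that general identity and is not a reconstruction of the slide's own content.

```python
import math

# Example data transcribed from the SLR example slide.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]

n = len(age)
x_bar, y_bar = sum(age) / n, sum(chol) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in chol)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, chol))

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson's correlation coefficient
b1 = s_xy / s_xx                    # least squares slope
s_x = math.sqrt(s_xx / (n - 1))     # sample SD of X
s_y = math.sqrt(s_yy / (n - 1))     # sample SD of Y

print(f"b1 = {b1:.4f}, r * s_y / s_x = {r * s_y / s_x:.4f}")
```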

  • 1

  • 2. The Standard Error of the Estimate

    SE of the regression line (RL)

    SE of prediction
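The three standard errors named above can be sketched for the AGE and CHOL example data. This assumes "SE of RL" means the standard error of the fitted regression line (the estimated mean of Y) at a given x, and uses the usual formulas $s = \sqrt{SSE/(n-2)}$, $s\sqrt{1/n + (x_0 - \bar{X})^2/S_{xx}}$, and $s\sqrt{1 + 1/n + (x_0 - \bar{X})^2/S_{xx}}$. The value `x0 = 50` is an arbitrary illustrative age.

```python
import math

# Example data transcribed from the SLR example slide.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]

n = len(age)
x_bar, y_bar = sum(age) / n, sum(chol) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, chol)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(age, chol))

# Standard error of the estimate (residual standard deviation).
s = math.sqrt(sse / (n - 2))


def se_mean(x0):
    """SE of the fitted regression line (estimated mean of Y) at x0."""
    return s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)


def se_pred(x0):
    """SE for predicting a single new observation of Y at x0."""
    return s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)


x0 = 50   # arbitrary illustrative age
print(f"s = {s:.2f}, SE of line at {x0} = {se_mean(x0):.2f}, SE of prediction = {se_pred(x0):.2f}")
```

The SE of prediction is always larger than the SE of the line because it adds the variability of an individual observation around its mean.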

  • 3. Note (a): $b_1$

    Note (b): $b_0$

  • 1030-39(X)10(Y)(:89P374)

    rF0.05 (1,8) =5.3210103501095%CI3501095%CI:

  • ()

  • Predicting a nominal or categorical outcome Y
    Odds ratio
    Cross-sectional study
    Cohort study (follow-up study)
    Case-control study
    Clinical trial
    Logistic regression

  • Odds ratio
    Odds
    Odds ratio
    $p_1 = A / (A + B)$
    $p_0 = C / (C + D)$
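A minimal sketch of the odds ratio computed from the two proportions above, using odds = p / (1 - p). It assumes a 2×2 table whose first row holds counts A and B (the group with $p_1$) and whose second row holds C and D (the group with $p_0$); the counts themselves are hypothetical.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Odds ratio of group 1 (counts a, b) relative to group 0 (counts c, d)."""
    p1 = a / (a + b)
    p0 = c / (c + d)
    return odds(p1) / odds(p0)


# Hypothetical counts for illustration only.
A, B, C, D = 30, 70, 10, 90
or_hat = odds_ratio(A, B, C, D)
print(f"OR = {or_hat:.3f}")
```

Algebraically this reduces to the cross-product ratio AD / (BC).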

  • Cross-sectional study; cohort study (follow-up study); case-control study; clinical trial

  • Bias: selection bias; information bias; misclassification; confounding

  • Recall bias

  • ()EED

  • Confounding factors: X1, X2, Y

  • Intervention; randomization; placebo effect

  • Study Designs
    Case-control study
    Matched case-control study
    Cohort study
    Matched cohort study
    Randomized clinical trial
    Complete matched cohort study
    Causality and correlation
    Y = a + b1X1 + b2X2 + b3X3 + b4X4 + b5X5

  • Logistic regression
    Simple linear regression vs. logistic regression
    Y: $(0, 1) \to (-\infty, \infty)$? Logistic transformation
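The logistic (logit) transformation maps a probability in (0, 1) onto the whole real line, and its inverse maps back. A minimal sketch, with illustrative function names:

```python
import math


def logit(p):
    """Map a probability in (0, 1) to a log-odds value in (-inf, inf)."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Map a log-odds value back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


for p in (0.001, 0.2, 0.5, 0.8, 0.999):
    print(f"p = {p:<5}  logit(p) = {logit(p):+.3f}")
```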

  • Logistic regression
    OR = exp(beta)
    OR for X: x = 1 vs. x = 0
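The relation OR = exp(beta) can be checked for a single binary X. The sketch below fits logit P(Y = 1) = b0 + b1·x by Newton-Raphson in plain Python and compares exp(b1) with the cross-product odds ratio from the 2×2 table. The counts are hypothetical, and `fit_logistic` is an illustrative helper rather than a library routine.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(Y=1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix entries (negative Hessian).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(xi * wi for xi, wi in zip(x, w))
        h11 = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical 2x2 counts: x = 1 has A events and B non-events; x = 0 has C and D.
A, B, C, D = 30, 70, 10, 90
x = [1] * (A + B) + [0] * (C + D)
y = [1] * A + [0] * B + [1] * C + [0] * D

b0, b1 = fit_logistic(x, y)
print(f"exp(b1) = {math.exp(b1):.4f}, AD/BC = {A * D / (B * C):.4f}")
```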

  • LR example: men with unintentional injury (Soderstrom, 1997; Table 10-5, p. 247) (BAC > 50 mg/dL)

  • Z, t, F, $\chi^2$: $Z^2$ and chi-square
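One concrete instance of the link between $Z^2$ and chi-square: for two independent proportions, the square of the pooled two-sample Z statistic equals Pearson's chi-square statistic (1 degree of freedom, no continuity correction). The sketch below checks this on hypothetical counts laid out as a 2×2 table with rows (a, b) and (c, d).

```python
import math


def two_proportion_z(a, b, c, d):
    """Z statistic for H0: p1 = p0, using the pooled proportion."""
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0
    p = (a + c) / (n1 + n0)   # pooled proportion under H0
    return (p1 - p0) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n0))


def pearson_chi_square(a, b, c, d):
    """Pearson's chi-square for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts for illustration only.
z = two_proportion_z(30, 70, 10, 90)
chi2 = pearson_chi_square(30, 70, 10, 90)
print(f"Z^2 = {z ** 2:.4f}, chi-square = {chi2:.4f}")
```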

  • Z, t, F, $\chi^2$: F and chi-square

  • Z, t, F, $\chi^2$