Statistics: Regression & Correlation
Post on 20-Jan-2016
TRANSCRIPT
-
Regression & Correlation
-
Outline
X, Y & Regression Models
Simple linear regression (SLR)
The logic of SLR: SST = SSR + SSE
SLR: ANOVA table & R-square
SLR, ANOVA, and the two-sample t test
Multiple Linear Regression
Pearson's correlation coefficient (r)
R², r, b
Z, t, F, χ²
-
X and Y
-
Univariate analysis: one X, one Y
*: normality, independence, ..
Cat.: categorical; Num.: numerical

| X | Y | Comparisons | Methods |
|---|---|---|---|
| Binary | Num. (normal) | 2 indep. means | Two-sample t test* |
| Categorical | Num. (normal) | >= 2 indep. means | One-way ANOVA* |
| Binary | Num. (non-normal) | 2 indep. medians | Wilcoxon rank sum |
| Categorical | Num. (non-normal) | >= 2 indep. medians | Kruskal-Wallis |
| Num. (normal) | Num. (normal) | | Regression* |
| | Num. (normal) | 2 related means | Paired t |
| | Num. (non-normal) | 2 related medians | Wilcoxon signed rank |
| Categorical | Categorical | X related to Y | Pearson's Chi-sq |
| | Categorical (binary) | 2 related prop. | McNemar Chi-sq |
| Categorical (binary) | Categorical (binary) | 2 indep. prop. | Pearson's Chi-sq |
| Categorical (binary) | Categorical (binary) | 2 indep. prop. | 2-sample Z test |
| Categorical (binary) | Categorical (binary) | 2 indep. prop. | Fisher's exact |
-
Multivariate analysis: several Xs, one Y
*: multivariate normality, independence, ..
Cat.: categorical; Num.: numerical
CART: classification and regression tree
ANOVA: analysis of variance
ANCOVA: analysis of covariance
MANOVA: multivariate analysis of variance
GEE: generalized estimating equations

| Xs | Y | Methods |
|---|---|---|
| Categorical | Cat. | Log-linear |
| Cat.+Num. | Cat. (binary) | Logistic regression |
| Cat.+Num. | Cat. (>=3) | Logistic regression |
| | | Discriminant analysis* |
| | | Cluster analysis |
| | | Propensity scores |
| | | CART |
| Cat. | Num. | ANOVA* |
| | | MANOVA* |
| Num. | Num. | Multiple regression* |
| Cat.+Num. | Num. (censored) | Cox proportional hazards model |
| Confounding factors | Num. | ANCOVA* |
| | | MANOVA* |
| | | GEE* |
| Confounding factors | Cat. | Mantel-Haenszel |
| Num. | | Factor analysis |
-
Regression Models
Mathematical models to describe the relationship between Y and X
Uses of a regression model:
Adjustment
Prediction
Finding important factors for Y
-
Regression Models
Definition: mathematical models to describe the relationship between Y and X
Purpose: find important factors for Y and/or predict Y
-
Simple linear regression (SLR)
Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
-
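SLR Example

| ID | AGE | CHOL |
|---|---|---|
| 1 | 34 | 141.4 |
| 2 | 39 | 180.5 |
| 3 | 44 | 178.4 |
| 4 | 46 | 212 |
| 5 | 48 | 203.2 |
| 6 | 51 | 224.1 |
| 7 | 53 | 186 |
| 8 | 60 | 350 |
| 9 | 61 | 286.3 |
| 10 | 65 | 287.6 |
| 11 | 66 | 330.3 |
| 12 | 67 | 311.3 |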
-
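SLR: parameter estimation
The least squares method
Point estimates: $b_1 = \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$, $b_0 = \bar{Y} - b_1 \bar{X}$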
-
The logic of SLR: SST = SSR + SSE
SST = SSE + SSR
SST: total amount unexplained at $X_i$
SSE: amount at $X_i$ unexplained by regression
SSR: amount at $X_i$ explained by regression
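The decomposition can be written out per observation. This is a short derivation sketch, assuming the usual notation ($\hat{Y}_i$ for the fitted value at $X_i$ and $\bar{Y}$ for the sample mean of Y), which does not survive in the slide text:

```latex
Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})

\underbrace{\sum_i (Y_i - \bar{Y})^2}_{\text{SST}}
  = \underbrace{\sum_i (Y_i - \hat{Y}_i)^2}_{\text{SSE}}
  + \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{\text{SSR}}
```

The squared form holds because the cross-product term $2\sum_i (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$ is zero for a least-squares line that includes an intercept.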
-
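SLR: parameter estimation
The least squares method
min SSE: choose $b_0$ and $b_1$ to minimize $\mathrm{SSE} = \sum_i (Y_i - b_0 - b_1 X_i)^2$
Point estimate: the minimizing values $b_0$ and $b_1$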
-
SLR example: Regression line
Estimated model: CHOL = -57.5965 + 5.6502 × Age
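The estimated line can be reproduced from the twelve AGE/CHOL pairs in the example table. The sketch below uses the closed-form least-squares solution; the function name `fit_slr` is an illustrative choice, not something from the slides.

```python
# AGE and CHOL values copied from the SLR example table.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]


def fit_slr(x, y):
    """Return (b0, b1), the intercept and slope that minimize the SSE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_slr(age, chol)
print(f"CHOL = {b0:.4f} + {b1:.4f} * Age")  # prints: CHOL = -57.5965 + 5.6502 * Age
```

Under this model, each additional year of age is associated with an estimated increase of about 5.65 units in cholesterol.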
-
SLR: ANOVA table & R-square
R² = 0.82, p = 0.0001
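The ANOVA table itself is missing from the transcript. As a sketch of how its entries are usually built for SLR (regression df = 1, error df = n - 2), the following recomputes the sums of squares, F, and R² from the example data. The row layout is an assumption about what the slide showed.

```python
# AGE and CHOL values copied from the SLR example table.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(chol) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, chol))
s_xx = sum((x - x_bar) ** 2 for x in age)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in age]

sst = sum((y - y_bar) ** 2 for y in chol)              # total
ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained by regression
sse = sum((y - f) ** 2 for y, f in zip(chol, fitted))  # unexplained

df_reg, df_err = 1, n - 2
msr = ssr / df_reg
mse = sse / df_err
f_stat = msr / mse
r2 = ssr / sst

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}{'F':>8}")
print(f"{'Regression':<12}{df_reg:>4}{ssr:>12.2f}{msr:>12.2f}{f_stat:>8.2f}")
print(f"{'Error':<12}{df_err:>4}{sse:>12.2f}{mse:>12.2f}")
print(f"{'Total':<12}{n - 1:>4}{sst:>12.2f}")
print(f"R-square = {r2:.2f}")
```

The p-value needs the F distribution with (1, n - 2) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, df_reg, df_err)` is one way to obtain it.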
-
SLR: qualitative covariate
Example: X = treatment, 1 or 0; Y = SBP
Hypothesis: H0: $\beta_1 = 0$, H1: $\beta_1 \neq 0$
Equivalent to: H0: $\mu_1 = \mu_0$, H1: $\mu_1 \neq \mu_0$
Note: $\beta_1 = \mu_1 - \mu_0$
-
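SLR, ANOVA, and the two-sample t test
The two-sample t test is a special case of ANOVA
Two-sample t test and SLR: H0: $\mu_1 = \mu_0$ corresponds to H0: $\beta_1 = 0$
Dummy variables: K groups need K-1 dummy variables
ANOVA and SLR: H0: $\mu_1 = \mu_2 = \mu_3$ corresponds to H0: $\beta_1 = \beta_2 = 0$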
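The equivalence between the pooled two-sample t test and SLR on a 0/1 dummy variable can be checked numerically. The SBP values below are hypothetical and serve only to illustrate the identity; they do not come from the slides.

```python
import math

# Hypothetical SBP readings (illustrative only).
sbp_control = [128.0, 135.0, 142.0, 130.0, 138.0]  # X = 0
sbp_treated = [120.0, 125.0, 131.0, 118.0, 127.0]  # X = 1

n0, n1 = len(sbp_control), len(sbp_treated)
m0 = sum(sbp_control) / n0
m1 = sum(sbp_treated) / n1

# Pooled-variance two-sample t statistic.
ss0 = sum((y - m0) ** 2 for y in sbp_control)
ss1 = sum((y - m1) ** 2 for y in sbp_treated)
pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
t_two_sample = (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

# SLR of SBP on the dummy variable X.
x = [0] * n0 + [1] * n1
y = sbp_control + sbp_treated
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2) / s_xx)
t_slope = b1 / se_b1

print(f"b0 = {b0:.2f} (control mean {m0:.2f})")
print(f"b1 = {b1:.2f} (difference in means {m1 - m0:.2f})")
print(f"t from t test = {t_two_sample:.4f}, t for slope = {t_slope:.4f}")
```

The intercept equals the mean of the X = 0 group, the slope equals the difference in group means, and the two t statistics coincide.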
-
Multiple Linear Regression
Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$
Example: Is Age a predictor for SBP, adjusting for Sex?
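A minimal sketch of the Age-adjusted-for-Sex question, assuming the model SBP = b0 + b_age·Age + b_sex·Sex and solving the normal equations directly. The data values, the 0/1 coding of Sex, and the helper `solve` are all hypothetical; the slide's own data are not in the transcript.

```python
# Hypothetical data (illustrative only); sex coded 1 = male, 0 = female.
age = [25, 32, 41, 47, 53, 58, 64, 70]
sex = [0, 1, 0, 1, 0, 1, 0, 1]
sbp = [110.0, 122.0, 118.0, 131.0, 127.0, 140.0, 135.0, 149.0]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Design matrix rows: (1, Age, Sex). The leading 1 gives the intercept.
design = [[1.0, float(a), float(s)] for a, s in zip(age, sex)]

# Normal equations: (X'X) b = X'y.
xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * yi for row, yi in zip(design, sbp)) for i in range(3)]

b0, b_age, b_sex = solve(xtx, xty)
residuals = [yi - (b0 + b_age * a + b_sex * s) for yi, a, s in zip(sbp, age, sex)]

print(f"SBP = {b0:.2f} + {b_age:.3f} * Age + {b_sex:.2f} * Sex")
```

Here `b_age` is the estimated change in SBP per year of age with Sex held fixed, which is what "adjusting for Sex" means in this model.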
-
MLR: example
-
Pearson's correlation coefficient (r)
Relationship between X and Y
Properties of Pearson's r
Range: $-1 \le r \le 1$
Unitless
Good for normally distributed X and Y
$r = \cos\theta$, where $\theta$ is the angle between the mean-centered X and Y vectors
Spearman's correlation coefficient
Pearson's r for ranked X and Y
Good for non-normally distributed X and Y
-
Spearman's rho: rank correlation
Relationship between X and Y
Spearman's correlation coefficient
Pearson's r for ranked X and Y
Good for non-normally distributed X and Y
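Since Spearman's coefficient is Pearson's r applied to ranks, both can be computed with one function. The sketch below applies them to the AGE/CHOL data from the SLR example; the helper names `pearson` and `ranks` are illustrative.

```python
import math

# AGE and CHOL values copied from the SLR example table.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]


def pearson(x, y):
    """Pearson's correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


r = pearson(age, chol)
rho = pearson(ranks(age), ranks(chol))  # Spearman = Pearson on ranks
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

The single large cholesterol value at age 60 affects r through its magnitude but affects rho only through its rank.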
-
Assumptions in Regression
Linear
Independent
Normal distribution
Equal variance
For all values of x, the errors $\varepsilon$ are independent, normally distributed, and have the same SD ($\sigma$) and mean = 0
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\beta_0$ and $\beta_1$ are the unknown parameters
$\varepsilon$ = random error fluctuations
-
R², r, b
r and b
r²: coefficient of determination. The proportion of the variability among the observed values of Y that is explained by the linear regression of Y on X.
-
r and b: the relationship between the correlation coefficient r and the slope b
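The slide's own formulas are lost, but the standard link between r, b, and R² in simple linear regression follows directly from their definitions. The sketch assumes the notation $S_{XY} = \sum_i (X_i - \bar{X})(Y_i - \bar{Y})$, $S_{XX} = \sum_i (X_i - \bar{X})^2$, $S_{YY} = \sum_i (Y_i - \bar{Y})^2$, with $s_X$ and $s_Y$ the sample standard deviations:

```latex
b = \frac{S_{XY}}{S_{XX}}, \qquad
r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}
\;\Longrightarrow\;
b = r\,\sqrt{\frac{S_{YY}}{S_{XX}}} = r\,\frac{s_Y}{s_X}

R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{b^2 S_{XX}}{S_{YY}} = r^2
```

So r and b always share the same sign, and in SLR the coefficient of determination is the square of Pearson's r.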
-
1
-
2. The Standard Error of the Estimate
SE of the regression line
SE of prediction
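The formulas on this slide did not survive extraction. The sketch below uses the standard textbook expressions for the three quantities, applied to the AGE/CHOL example; the evaluation point `x0 = 50` is a hypothetical choice.

```python
import math

# AGE and CHOL values copied from the SLR example table.
age = [34, 39, 44, 46, 48, 51, 53, 60, 61, 65, 66, 67]
chol = [141.4, 180.5, 178.4, 212, 203.2, 224.1, 186, 350, 286.3, 287.6, 330.3, 311.3]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(chol) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, chol))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(age, chol))

# Standard error of the estimate: residual SD with n - 2 degrees of freedom.
s_est = math.sqrt(sse / (n - 2))

x0 = 50  # hypothetical age at which the line is evaluated
distance_term = 1 / n + (x0 - x_bar) ** 2 / s_xx

se_line = s_est * math.sqrt(distance_term)      # SE of the fitted mean at x0
se_pred = s_est * math.sqrt(1 + distance_term)  # SE for a new observation at x0

print(f"SE of the estimate   = {s_est:.2f}")
print(f"SE of line at x0     = {se_line:.2f}")
print(f"SE of prediction, x0 = {se_pred:.2f}")
```

The prediction SE is always larger than the SE of the line because it adds the variance of a single new observation, and both grow as `x0` moves away from the mean age.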
-
3. Note (a): $b_1$
Note (b): $b_0$
-
1030-39(X)10(Y)(:89P374)
rF0.05 (1,8) =5.3210103501095%CI3501095%CI:
-
Predicting a nominal or categorical outcome (Y)
Odds ratio
Cross-sectional study
Cohort study (follow-up study)
Case-control study
Clinical trial
Logistic regression
-
Odds ratio
Odds: $p / (1 - p)$
Odds ratio: $\mathrm{OR} = \dfrac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}$
$p_1 = A / (A + B)$
$p_0 = C / (C + D)$
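A small worked example of these quantities, using the slide's $p_1 = A/(A+B)$ and $p_0 = C/(C+D)$. The counts A, B, C, D are hypothetical, and reading the two rows as the groups being compared is an assumption, since the row labels are missing from the transcript.

```python
# Hypothetical 2x2 counts (illustrative only).
A, B = 30, 70  # first group: outcome present, outcome absent
C, D = 10, 90  # second group: outcome present, outcome absent

p1 = A / (A + B)
p0 = C / (C + D)

odds1 = p1 / (1 - p1)  # equals A / B
odds0 = p0 / (1 - p0)  # equals C / D
odds_ratio = odds1 / odds0  # equals (A * D) / (B * C)

print(f"p1 = {p1:.2f}, p0 = {p0:.2f}")
print(f"odds1 = {odds1:.3f}, odds0 = {odds0:.3f}")
print(f"odds ratio = {odds_ratio:.3f}")
```

The ratio of odds simplifies to the cross-product AD/BC, which is why it can be read straight off a 2x2 table.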
-
Cross-sectional study
Cohort study (follow-up study)
Case-control study
Clinical trial
-
Bias
Selection bias
Information bias
Misclassification
Confounding
-
Recall bias
-
Confounding factors
X1, X2, Y
-
Intervention
Randomization
Placebo effect
-
Study Designs
Case-control study
Matched case-control study
Cohort study
Matched cohort study
Randomized clinical trial
Complete matched cohort study
Causality and correlation
$Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 + b_5 X_5$
-
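Logistic regression
Simple linear regression: Y is continuous, ranging over $(-\infty, \infty)$
Logistic regression: Y is binary, so $P(Y = 1)$ lies in $(0, 1)$
How can $(0, 1)$ be mapped onto $(-\infty, \infty)$?
Logistic transformation: $\text{logit}(p) = \ln\dfrac{p}{1 - p}$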
-
Logistic regression
OR: $\mathrm{OR} = \exp(\beta)$
For a binary X, the OR compares the odds of Y at x = 1 with the odds at x = 0
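The link OR = exp(β) can be illustrated without an iterative fit. With a single binary X the logistic model has one parameter per group, so its maximum-likelihood fit reproduces the two observed proportions exactly. The counts below are hypothetical, and `logit` and `inv_logit` are illustrative helper names.

```python
import math

# Hypothetical 2x2 counts (illustrative only).
a, b = 30, 70  # x = 1: Y = 1, Y = 0
c, d = 10, 90  # x = 0: Y = 1, Y = 0


def logit(p):
    """Map a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Map a real number back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))


p1 = a / (a + b)
p0 = c / (c + d)

# Coefficients of logit P(Y = 1) = beta0 + beta1 * x for binary x.
beta0 = logit(p0)
beta1 = logit(p1) - logit(p0)

odds_ratio = math.exp(beta1)

print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")
print(f"exp(beta1) = {odds_ratio:.4f}, cross-product AD/BC = {a * d / (b * c):.4f}")
```

With more covariates the coefficients have to be estimated iteratively, but each exp(β) keeps the same reading: the odds ratio for a one-unit increase in that X with the others held fixed.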
-
LR example: men with unintentional injury (Soderstrom, 1997; Table 10-5, p. 247) (BAC > 50 mg/dL)
-
Z, t, F, χ²
Z², chi-square
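The slide's detail is lost, but the standard connection is that the square of a standard normal Z follows a chi-square distribution with 1 degree of freedom. The check below uses only the Python standard library; the helper `chi2_cdf_1df` is an illustrative name.

```python
import math
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal, about 1.96.
z_crit = NormalDist().inv_cdf(0.975)

# Its square is the 5% critical value of chi-square with 1 df, about 3.84.
chi2_crit = z_crit ** 2


def chi2_cdf_1df(x):
    """CDF of the chi-square distribution with 1 degree of freedom."""
    # P(Z^2 <= x) = P(|Z| <= sqrt(x)) = erf(sqrt(x / 2))
    return math.erf(math.sqrt(x / 2))


print(f"z = {z_crit:.4f}, z^2 = {chi2_crit:.4f}")
print(f"P(chi-square_1 <= z^2) = {chi2_cdf_1df(chi2_crit):.4f}")
```

An analogous identity links t and F: the square of a t statistic with k degrees of freedom follows F(1, k). For example, the two-sided 5% critical value of t with 8 df is about 2.306, and 2.306² ≈ 5.32, the value of F0.05(1, 8).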
-
Z, t, F, χ²
F, chi-square
-
Z, t, F, χ²