PharmaSUG2010 - Paper SP01

Using SAS® to Compute Partial Correlation

Jianxin Lin, Aiming Yang, Arvind Shah
Merck & Co., Inc., Rahway, NJ 07065

ABSTRACT

Partial correlation is used in many epidemiological studies and clinical trials when a researcher investigates an association in the presence of potential confounding factors. It describes the relationship between two variables while controlling for the effects of one or more other variables. In SAS, several procedures, such as PROC CORR, PROC REG, and PROC GLM, can be used to obtain partial correlation coefficients. This paper illustrates how to use these procedures to obtain partial correlations and explains the differences among them.

KEYWORDS: Partial Correlation, PROC CORR, PROC REG, PROC GLM

INTRODUCTION

Partial correlation is the correlation between two variables while controlling for one or more other variables. It measures the strength of the linear relationship between the two variables after the effect of the other variables has been removed. For example, rXY.ZW is the correlation between variables X and Y, controlling for variables Z and W.

The partial correlation, rXY.ZW, may be the same as, lower than, or higher than the simple correlation, rXY. A researcher compares the partial correlation (e.g., rXY.ZW) with the simple correlation (e.g., rXY). When the two coefficients are close to each other, the controlled variables Z and W have little or no effect on the association between X and Y. When the partial correlation is lower than the simple correlation, this suggests that the original correlation is spurious, at least in part. When the partial correlation is higher than the simple correlation, this suggests that the original correlation is suppressed. Partial correlation analysis is important when studying linear relationships among more than two variables.

The order of a correlation is the number of controlled variables: rXY.ZW is a second-order partial correlation, rXY.Z is a first-order partial correlation, and rXY is a zero-order partial correlation, that is, a simple correlation coefficient.

How to calculate partial correlation?

1. Calculate from lower-order correlations

A. For first-order partial correlation:

$$r_{XY.Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$

B. For higher-order partial correlation: the formula is a straightforward extension of the preceding first-order formula. As an example, the second-order partial correlation is

$$r_{XY.ZW} = \frac{r_{XY.Z} - r_{XW.Z}\, r_{YW.Z}}{\sqrt{(1 - r_{XW.Z}^2)(1 - r_{YW.Z}^2)}}$$

2. Calculate from the multiple regression coefficient of X when regressing Y on X and the variables Z and W:

$$r_{XY.ZW} = \frac{t_X}{\sqrt{t_X^2 + DF_{residual}}}$$

where $t_X$ is the t test statistic for testing the significance of the coefficient of X in the multiple regression, and $DF_{residual}$ is the residual degrees of freedom of that regression.

3. Calculate from the coefficient of determination ($R^2$) through multiple regressions:

$$r_{XY.ZW}^2 = \frac{R_{Y.XZW}^2 - R_{Y.ZW}^2}{1 - R_{Y.ZW}^2}$$

where $R_{Y.XZW}^2$ is the $R^2$ from the full model (Y on X, Z, W), and $R_{Y.ZW}^2$ is the $R^2$ from the reduced model (Y on Z, W), that is, the full model with X removed. Or, equivalently,

$$r_{XY.ZW}^2 = \frac{SSE_{reduced} - SSE_{full}}{SSE_{reduced}}$$

where $SSE_{reduced}$ and $SSE_{full}$ are the error sums of squares of the reduced and full models, respectively. Both forms give the squared partial correlation; its sign is the sign of the coefficient of X in the full model.

4. Calculate from the correlation of residuals from multiple regressions:

Regress X on Z and W, and Y on Z and W, and compute the residuals $e_X = X - \hat{X}$ and $e_Y = Y - \hat{Y}$. Then

$$r_{XY.ZW} = r_{e_X e_Y}$$
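Method 4 can be sketched directly in SAS. The following is a minimal illustration, not code from the original analysis. The dataset name mydata, the output dataset resid, and the residual names eX and eY are placeholders chosen here, and X, Y, Z, W are assumed to be numeric variables.

```sas
/* Assumption: MYDATA is a hypothetical dataset with numeric X, Y, Z, W. */

/* Step 1: regress X and Y on the controls Z and W and keep the residuals. */
proc reg data=mydata noprint;
   model X Y = Z W;
   output out=resid r=eX eY;   /* eX, eY: residuals for X and Y, in that order */
run;
quit;

/* Step 2: the simple correlation of the two residuals is rXY.ZW. */
proc corr data=resid pearson;
   var eX;
   with eY;
run;
```

The coefficient printed in step 2 should match the one from PROC CORR with a PARTIAL statement. Its p-value, however, is computed as for a simple correlation and does not account for the degrees of freedom used by Z and W, so the PARTIAL statement is the better choice when a test is needed.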

How to get partial correlation from SAS procedures?

1. PROC CORR procedure.

   PROC CORR data=data [spearman | kendall];
      Var X;          /* can be many variables */
      With Y;         /* can be many variables */
      Partial Z W;    /* can be many variables */
   Run;

By default, SAS gives Pearson partial correlation coefficients. When the SPEARMAN option is added, SAS provides Spearman's partial rank-order correlation coefficients; when the KENDALL option is added, SAS provides Kendall's partial tau-b correlation coefficients. To print the Pearson partial correlation coefficients alongside either of them, specify the PEARSON option as well.

2. PROC REG procedure.

   PROC REG data=data;
      Model Y = Z W X / pcorr1 pcorr2;
   Run;

SAS provides Type I and Type II squared partial correlation coefficients. The Type II squared partial correlation coefficient for X is the same as the square of the Pearson partial correlation coefficient obtained from PROC CORR.

3. PROC GLM procedure.

   PROC GLM data=data;
      Class (some categorical variables);
      Model X Y = Z W;    /* list the CLASS variables here as well */
      Manova / printe;
   Run;

SAS gives partial correlation coefficients that are the same as the Pearson partial correlation coefficients obtained from PROC CORR.

Note that the three procedures differ in two respects. First, only PROC CORR provides the option to get Spearman's partial rank-order and Kendall's partial tau-b correlation coefficients. Second, PROC CORR and PROC REG cannot take categorical controlled variables directly, while PROC GLM can handle them through the CLASS statement. If there are categorical controlled variables and one wants to use PROC CORR or PROC REG, these variables must first be coded as dummy variables.
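The dummy coding mentioned above might look like the following sketch. Everything named here is hypothetical: mydata is a placeholder dataset, grp is an assumed categorical control coded 1 to 4, and X, Y, Z, W are assumed numeric.

```sas
/* Assumption: MYDATA is a hypothetical dataset; GRP is a categorical
   control coded 1-4, and X, Y, Z, W are numeric. */
data mydata_d;
   set mydata;
   array d{3} grp1-grp3;               /* k-1 indicators; level 4 is the reference */
   do i = 1 to 3;
      if missing(grp) then d{i} = .;   /* keep missing as missing, not 0 */
      else d{i} = (grp = i);
   end;
   drop i;
run;

/* The indicators then enter the PARTIAL statement like any other control. */
proc corr data=mydata_d pearson;
   var X;
   with Y;
   partial Z W grp1-grp3;
run;
```

Only k-1 indicators are listed for a variable with k levels, because the last indicator is determined by the others.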

EXAMPLE

This example comes from a randomized, double-blind, 6-week, parallel-group trial in which hypercholesterolemic patients were randomized to mg-equivalent doses of atorvastatin (Atorva) vs. ezetimibe (EZE) 10 mg/simvastatin (Simva) to evaluate their low-density lipoprotein cholesterol (LDL-C) reduction effect (Christie M. Ballantyne et al. 2005). Some clinicians may be concerned that a decrease in C-reactive protein (CRP) is mainly due to the decrease in LDL-C. To answer this question, one needs to compute the partial correlation coefficient between the percent changes from baseline in CRP and LDL-C, controlling for the variables trt, baseline LDL-C (bl_ldlc), baseline HDL-C (bl_hdl), and baseline CRP (bl_crp). Here, trt is treatment, coded as 1=Atorva 10 mg, 2=Atorva 20 mg, 3=Atorva 40 mg, 4=Atorva 80 mg, 5=EZE 10/Simva 10 mg, 6=EZE 10/Simva 20 mg, 7=EZE 10/Simva 40 mg, 8=EZE 10/Simva 80 mg. Note that although trt is a numeric variable, it is categorical rather than continuous. Log(SE_CRP/BL_CRP) is used in place of the CRP percent change from baseline because CRP is not normally distributed, where SE_CRP is the study-end CRP and BL_CRP is the baseline CRP.

SAS code and output for PROC CORR, PROC REG, and PROC GLM:

   title "Partial correlation between log(SE_CRP/BL_CRP) and LDLCP";
   title2 "Control trt bl_ldlc bl_hdl bl_crp";
   title3 "PROC CORR";
   PROC CORR data = crpeff pearson cov;
      var log_crp;
      with ldlcp;
      partial trt bl_ldlc bl_hdl bl_crp;
   run;

SAS output:

   The CORR Procedure

   4 Partial Variables:  TRT bl_ldlc bl_hdl bl_crp
   1 With Variables:     ldlcp
   1 Variables:          log_crp

   Pearson Partial Correlation Coefficients, N = 1832
   Prob > |r| under H0: Partial Rho=0

                                       log_crp
   ldlcp                               0.05614
   Pct Chg From Baseline In LDL-C       0.0164

The partial correlation coefficient is 0.05614. Here trt enters as a single numeric variable rather than as a categorical one.

   title3 "PROC REG";
   PROC REG data = crpeff;
      model log_crp = trt bl_ldlc bl_hdl bl_crp ldlcp / pcorr1 pcorr2;
   run;
   quit;

SAS output:

   Parameter Estimates

                                                                     Squared       Squared
                                                                     Partial       Partial
   Variable   Label                           DF  t Value  Pr > |t|  Corr Type I   Corr Type II
   Intercept  Intercept                        1    -0.97    0.3301  .             .
   TRT        Treatment Group Code             1    -0.41    0.6835  0.00091943    0.00009107
   bl_ldlc    Baseline LDL-C                   1     0.66    0.5118  0.00058478    0.00023574
   bl_hdl     Baseline HDL                     1     1.42    0.1561  0.00278       0.00110
   bl_crp     Baseline CRP                     1   -13.73    <.0001  0.09340       0.09362
   ldlcp      Pct Chg From Baseline In LDL-C   1     2.40    0.0164  0.00315       0.00315
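As a quick consistency check that is not part of the original output, the t value for ldlcp can be converted to a partial correlation with the t-statistic formula from method 2. The residual degrees of freedom are not shown in the output; the value 1826 used below is an assumption, taking all N = 1832 observations with an intercept and five regressors.

```latex
r_{XY.ZW} = \frac{t_X}{\sqrt{t_X^2 + DF_{residual}}}
          = \frac{2.40}{\sqrt{2.40^2 + 1826}}
          = \frac{2.40}{\sqrt{1831.76}}
          \approx \frac{2.40}{42.80}
          \approx 0.0561
```

This agrees with the 0.05614 reported by PROC CORR up to the rounding of the printed t value.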

The squared partial correlation coefficient for ldlcp is 0.00315 = 0.05614 * 0.05614.

   title3 "PROC GLM";
   PROC GLM data = crpeff;
      class trt;
      model log_crp ldlcp = trt bl_ldlc bl_hdl bl_crp;
      manova / printe;
   run;
   quit;


SAS output:

   Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

   DF = 1821        log_crp        ldlcp
   log_crp         1.000000     0.044924
                                  0.0552
   ldlcp           0.044924     1.000000
                     0.0552

The partial correlation coefficient is 0.044924, which differs from the 0.05614 obtained from PROC CORR. The difference arises because PROC GLM treats trt as a categorical variable through the CLASS statement, whereas the PROC CORR run treated it as a numeric variable.

   title "Partial correlation between log(SE_CRP/BL_CRP) and LDLCP";
   title2 "Control trt bl_ldlc bl_hdl bl_crp";
   title3 "PROC CORR with categorical controlled variables";
   data crpeff;
      set crpeff;
      trt1=(trt=1); trt2=(trt=2); trt3=(trt=3); trt4=(trt=4);
      trt5=(trt=5); trt6=(trt=6); trt7=(trt=7); trt8=(trt=8);
   run;
   PROC CORR data = crpeff pearson cov;
      var log_crp;
      with ldlcp;
      /* trt8 is left out as the reference level */
      partial trt1 trt2 trt3 trt4 trt5 trt6 trt7 bl_ldlc bl_hdl bl_crp;
   run;

SAS output:

   The CORR Procedure

   Pearson Partial Correlation Coefficients, N = 1832
   Prob > |r| under H0: Partial Rho=0

                                       log_crp
   ldlcp                               0.04492
   Pct Chg From Baseline In LDL-C       0.0552

This time, treatment is handled as a categorical controlled variable; the partial correlation coefficient is 0.04492, which is the same as the result from PROC GLM.

   title "Simple correlation between log(SE_CRP/BL_CRP) and LDLCP";
   title2 "PROC CORR";
   PROC CORR data = crpeff pearson cov;
      var log_crp;
      with ldlcp;
   run;

SAS output:

   1 With Variables:  ldlcp
   1 Variables:       log_crp

   Pearson Correlation Coefficients
   Prob > |r| under H0: Rho=0
   Number of Observations

                                       log_crp
   ldlcp                               0.05359
   Pct Chg From Baseline In LDL-C       0.0218
                                          1832


The partial correlation coefficient between the percent changes from baseline in CRP and LDL-C, controlling for treatment and the baseline variables, is 0.04492 (p=0.0552), which is lower than the simple correlation coefficient of 0.05359 (p=0.0218). That is, the original correlation appears to be partly spurious. Therefore, one may say that the proportion of the decrease in CRP that is due to the decrease in LDL-C is very small.
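If the linearity assumption is in doubt for these data, a rank-based partial correlation for the same variables could be requested as a robustness check. The sketch below is not part of the original analysis and its result is not reported in the paper; it only reuses the dataset and variable names from this example.

```sas
/* Assumption: CRPEFF already contains the indicators trt1-trt7 created
   in the DATA step shown earlier in this example. */
proc corr data=crpeff spearman;
   var log_crp;
   with ldlcp;
   partial trt1-trt7 bl_ldlc bl_hdl bl_crp;   /* same controls as the Pearson run */
run;
```

Replacing SPEARMAN with KENDALL would request Kendall's partial tau-b instead.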

CONCLUSION

In SAS, the PROC CORR, PROC REG, and PROC GLM procedures can all provide partial correlation coefficients. Partial correlation still needs to meet the assumptions of linearity and homoscedasticity. If these assumptions are not satisfied, Spearman's partial rank-order correlation coefficients or Kendall's partial tau-b correlation coefficients may be more appropriate. When there are categorical controlled variables, PROC GLM is easier to use than PROC CORR or PROC REG, since it does not require coding these variables as dummy variables.

REFERENCES

Ballantyne, Christie M., et al. (2005). Dose-comparison study of the combination of ezetimibe and simvastatin (Vytorin) versus atorvastatin in patients with hypercholesterolemia: The Vytorin Versus Atorvastatin (VYVA) Study. American Heart Journal, 149(3), 464-473.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATION

Your comments and questions are valuable and appreciated. Authors can be reached at

Jianxin Lin
Merck & Co., Inc.
Rahway, NJ 07065 U.S.A.
Email: [email protected]

Aiming Yang
Merck & Co., Inc.
Rahway, NJ 07065 U.S.A.
Email: [email protected]

Arvind Shah
Merck & Co., Inc.
Rahway, NJ 07065 U.S.A.
Email: [email protected]