logistic regression in case-control study
DESCRIPTION
This is a basic presentation on the use of logistic regression in case-control studies of genetics data in R.
TRANSCRIPT
![Page 1: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/1.jpg)
Logistic Regression in Case-Control Study Using R: A Statistical Tool
Satish Gupta
![Page 2: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/2.jpg)
What is R?
The R statistical programming language is a free, open-source package.
The language is very powerful for writing programs.
Many statistical functions are already built in.
Contributed packages expand the functionality to cutting-edge research.
![Page 3: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/3.jpg)
Getting Started
Go to www.r-project.org
Downloads: CRAN (Comprehensive R Archive Network)
Set your mirror: choose a location close to you.
Select your platform: Windows, macOS or UNIX/Linux.
![Page 4: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/4.jpg)
Getting Started
![Page 5: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/5.jpg)
Basic operators and calculations
Comparison operators
equal: ==
not equal: !=
greater/less than: > <
greater/less than or equal: >= <=
Example:
1 == 1 # Returns TRUE
![Page 6: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/6.jpg)
Basic operators and calculations
Logical operators
AND: &
x <- 1:10; y <- 10:1 # Creates the sample vectors 'x' and 'y'.
x > y & x > 5 # Returns TRUE where both comparisons return TRUE.
OR: |
x == y | x != y # Returns TRUE where at least one comparison is TRUE.
NOT: !
!(x > y) # The '!' sign returns the negation (opposite) of a logical vector; the parentheses make the order of evaluation explicit.
![Page 7: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/7.jpg)
Basic operators and calculations
Calculations
Four basic arithmetic functions: addition, subtraction, multiplication and division
1 + 1; 1 - 1; 1 * 1; 1 / 1 # Returns results of basic arithmetic calculations.
Calculations on vectors
x <- 1:10; sum(x); mean(x); sd(x); sqrt(x) # Calculates the sum, mean and standard deviation of the vector x, and the square root of each element.
x <- 1:10; y <- 1:10; x + y # Adds the vectors x and y element by element.
![Page 8: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/8.jpg)
R-Graphics
R provides comprehensive graphics utilities for visualizing and exploring scientific data. These include:
Scatter plots
Line plots
Bar plots
Pie charts
Heatmaps
Venn diagrams
Density plots
Box plots
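As a rough illustration of a few of these plot types, the sketch below uses only base R graphics. The data are simulated for the example (the variable names `selenium` and `status` are assumptions, not the study data), and the plots are written to a temporary PDF file so the code runs without a display.

```r
# Simulated data, for illustration only (not taken from the study).
set.seed(1)
selenium <- c(rnorm(50, mean = 65, sd = 10), rnorm(50, mean = 75, sd = 10))
status   <- factor(rep(c("case", "control"), each = 50))

# Send the plots to a PDF file in the session's temporary directory.
plot_file <- file.path(tempdir(), "r_graphics_demo.pdf")
pdf(plot_file)

boxplot(selenium ~ status, ylab = "Selenium level", main = "Box plot")
plot(density(selenium), main = "Density plot")
barplot(table(status), main = "Bar plot")
plot(seq_along(selenium), selenium, xlab = "Index", ylab = "Selenium level",
     main = "Scatter plot")

dev.off()  # Close the device so that the file is written.
```

Dropping the `pdf()` and `dev.off()` calls should draw the same plots in the interactive graphics window instead.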
![Page 9: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/9.jpg)
Data handling in R
Load data: mydata = read.csv("/path/mydata.csv")
See data on screen: print(mydata)
See top part of data: head(mydata)
Specific rows and columns of data: mydata[1:10, 1:3]
Get the type of data: class(mydata)
Change the class of data: newdata = as.matrix(mydata)
Summary of data: summary(mydata)
Selecting (KEEPING) variables (columns)
newdata = mydata[c(1,3:5)]
![Page 10: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/10.jpg)
Data handling in R
Selecting observations
newdata = subset(mydata, age >= 20 | age < 10, select = c(ID, weight))
newdata = subset(mydata, sex == "Male" & age > 25, select = weight:income)
Excluding (DROPPING) variables (columns)
newdata = mydata[c(-3,-5)]
mydata$v3 = NULL
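The selection and exclusion commands above can be tried on a small data frame. The sketch below builds a hypothetical `mydata` inline (its column names and values are invented for illustration) so that the `subset()` and negative-index calls run without an external file.

```r
# Hypothetical data frame; values are made up for illustration.
mydata <- data.frame(
  ID     = 1:6,
  sex    = c("Male", "Female", "Male", "Female", "Male", "Female"),
  age    = c(8, 15, 30, 42, 19, 26),
  weight = c(25, 50, 80, 62, 70, 58),
  income = c(0, 0, 3000, 4200, 500, 2500)
)

# Keep rows where age is at least 20 or below 10; keep only ID and weight.
young_or_adult <- subset(mydata, age >= 20 | age < 10, select = c(ID, weight))

# Keep males older than 25; keep the adjacent columns weight through income.
men_over_25 <- subset(mydata, sex == "Male" & age > 25, select = weight:income)

# Drop the 3rd and 5th columns (age and income) by negative index.
dropped <- mydata[c(-3, -5)]

print(young_or_adult)
print(men_over_25)
print(names(dropped))
```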
![Page 11: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/11.jpg)
R-Library
Many tools, distributed as "packages", are available in R for different kinds of analysis, including analysis of genetics and genomics data.
Depending on where a package is hosted, it can be installed from one of two sources:
Using CRAN (Comprehensive R Archive Network):
install.packages("package_name")
Using Bioconductor:
install.packages("BiocManager")
BiocManager::install("package_name") # Replaces the older source("http://bioconductor.org/biocLite.R"); biocLite("package_name")
![Page 12: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/12.jpg)
R-Library
To load a package:
library() # Lists all libraries/packages that are available on a system.
library(genetics) # Loads the package for genetics data analysis
library(help=genetics) # Lists all functions/objects of the "genetics" package
?function_name # Opens the documentation of a function, e.g. ?glm
![Page 13: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/13.jpg)
What is Logistic Regression?
Logistic regression describes the relationship between a dichotomous response variable and a set of explanatory variables.
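Logistic regression is often used because the relationship between a dichotomous dependent variable (DV) and a predictor is non-linear: the probability of the outcome is bounded between 0 and 1, so it is modelled on the log-odds (logit) scale.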
![Page 14: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/14.jpg)
A General Model: Logistic Regression
$$\operatorname{logit}(p_{disease}) = \log\left(\frac{p_{disease}}{1 - p_{disease}}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_J X_J$$
Where:
pdisease is the probability that an individual has a particular disease.
β0 is the intercept
β1, β2 … βJ are the coefficients (effects) of genetic factors
X1, X2 … XJ are the variables of genetic factors
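To make the model concrete, here is a minimal sketch of the logit and its inverse in R, with one binary genetic factor. The coefficient values `beta0` and `beta1` are hypothetical and chosen only for illustration. It also shows why exp(β1) is read as an odds ratio.

```r
# Logit and inverse-logit transformations.
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical coefficients, for illustration only.
beta0 <- -1.0   # intercept
beta1 <- 0.7    # effect of a single genetic factor X1

x1  <- c(0, 1)              # factor absent / present
eta <- beta0 + beta1 * x1   # linear predictor = logit(p_disease)
p   <- inv_logit(eta)       # probability of disease for each value of X1

# The odds ratio comparing X1 = 1 with X1 = 0 equals exp(beta1).
odds       <- p / (1 - p)
odds_ratio <- odds[2] / odds[1]

print(p)
print(c(odds_ratio = odds_ratio, exp_beta1 = exp(beta1)))
```

Base R also provides `qlogis()` and `plogis()` for the same two transformations.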
![Page 15: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/15.jpg)
Assumptions
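Logistic regression does not assume that the independent variables are normally distributed, linearly related to the outcome, or of equal variance across groups. It does still assume that continuous predictors are linearly related to the log-odds of the outcome.
Because it does not impose the normality and equal-variance requirements, it is preferred to discriminant analysis when the data do not satisfy them.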
![Page 16: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/16.jpg)
Questions?
What is the relative importance of each predictor variable?
How does each predictor variable affect the outcome?
Does a predictor variable make the solution better or worse or have no effect?
Are there interactions among predictors?
Does adding interactions among predictors (continuous or categorical) improve the model?
What is the strength of association between the outcome variable and a set of predictors?
Often in model comparison you want non-significant differences, so strength of association is reported even for non-significant effects.
![Page 17: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/17.jpg)
Types of Logistic Regression
Unconditional logistic regression
Conditional logistic regression
Rules of thumb
Use conditional logistic regression if matching has been done, and unconditional if there has been no matching.
When in doubt, use conditional, because it gives unbiased results whether or not the data are matched. The unconditional method is said to overestimate the odds ratio when it is applied to data for which it is not appropriate.
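The two approaches can be compared side by side on simulated 1:1 matched data. The sketch below is only an illustration under stated assumptions: the data are simulated, `exposure` is a hypothetical binary risk factor, and `clogit()` and `strata()` are taken from the `survival` package, which ships with standard R installations.

```r
library(survival)  # provides clogit() and strata()

# Simulated 1:1 matched case-control data (not the study data).
set.seed(42)
n_sets    <- 300
true_beta <- log(2)                  # assumed true log odds ratio
set_prev  <- runif(n_sets, 0.1, 0.9) # exposure prevalence varies by matched set

# Two subjects (a and b) per matched set, with set-specific exposure prevalence.
x_a <- rbinom(n_sets, 1, set_prev)
x_b <- rbinom(n_sets, 1, set_prev)

# Within each set exactly one subject is the case; the more exposed subject
# is more likely to be the case.
p_a_case  <- exp(true_beta * x_a) / (exp(true_beta * x_a) + exp(true_beta * x_b))
a_is_case <- rbinom(n_sets, 1, p_a_case)

sim <- data.frame(
  Matset   = rep(seq_len(n_sets), times = 2),
  Status   = c(a_is_case, 1 - a_is_case),
  exposure = c(x_a, x_b)
)

# Conditional fit uses the matched sets; unconditional fit ignores them.
cond_fit   <- clogit(Status ~ exposure + strata(Matset), data = sim)
uncond_fit <- glm(Status ~ exposure, family = binomial, data = sim)

or_cond   <- exp(coef(cond_fit))
or_uncond <- exp(coef(uncond_fit)["exposure"])
print(c(conditional = unname(or_cond), unconditional = unname(or_uncond)))
```

How far the two estimates differ depends on how strongly the matching variable is related to exposure, so the output of a single simulated data set should not be over-interpreted.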
![Page 18: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/18.jpg)
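Data Format
| Status | Matset | Se_Quartiles | GPX1 | GPX4 | SEP15 | TXN2 |
|---|---|---|---|---|---|---|
| 1 | 1 | <60 | CT | TT | AG | AG |
| 0 | 1 | >60 – 70 | CC | CC | GG | GG |
| 1 | 2 | <60 | TT | CC | AG | AA |
| 0 | 2 | >70 – 80 | CC | CT | GG | GG |
| 1 | 3 | >80 | CC | CC | AA | AA |
| 0 | 3 | >60 – 70 | CT | TT | GG | GG |
| 1 | 4 | <60 | CC | CC | AA | AG |
| 0 | 4 | >70 – 80 | TT | TT | GG | GG |
| 1 | 5 | >80 | CC | CC | AG | AA |
| 0 | 5 | <60 | CC | CC | GG | GG |
| 1 | 6 | >70 – 80 | CT | TT | AA | AA |
| 0 | 6 | >80 | CC | CC | GG | AG |
| 1 | 7 | >60 – 70 | TT | CC | AA | AG |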
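As a sketch of how data in this layout might be prepared, the code below enters the first six rows of the table by hand and sets the factor levels so that "<60" and "CC" become the reference categories. The object name `lung_head` and the ASCII spelling of the quartile labels (">60-70" rather than ">60 – 70") are assumptions made for the example.

```r
# First three matched sets from the table above, typed in by hand.
lung_head <- data.frame(
  Status       = c(1, 0, 1, 0, 1, 0),   # 1 = case, 0 = control
  Matset       = c(1, 1, 2, 2, 3, 3),   # matched-set identifier
  Se_Quartiles = c("<60", ">60-70", "<60", ">70-80", ">80", ">60-70"),
  GPX1         = c("CT", "CC", "TT", "CC", "CC", "CT"),
  GPX4         = c("TT", "CC", "CC", "CT", "CC", "TT"),
  SEP15        = c("AG", "GG", "AG", "GG", "AA", "GG"),
  TXN2         = c("AG", "GG", "AA", "GG", "AA", "GG"),
  stringsAsFactors = FALSE
)

# The first factor level is used as the reference category in regression.
lung_head$Se_Quartiles <- factor(lung_head$Se_Quartiles,
                                 levels = c("<60", ">60-70", ">70-80", ">80"))
lung_head$GPX1 <- factor(lung_head$GPX1, levels = c("CC", "CT", "TT"))

# Check the matching: each set should contain one case and one control.
print(table(Matset = lung_head$Matset, Status = lung_head$Status))
```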
![Page 19: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/19.jpg)
Data and Library loading
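Load and use data in R (using lung cancer data from PLoS One 2013, 8(3):e59051).
lung = read.csv("/path/lung.csv", sep = "\t", header = TRUE)
Load the library and use the data for analysis
library(epicalc) # Archived on CRAN; its successor epiDisplay provides logistic.display() and clogistic.display()
use(lung) # epicalc: makes 'lung' the active data set, available as .data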
![Page 20: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/20.jpg)
Data Analysis
Performing conditional logistic regression (Case vs. Control)
clogit_lung = clogit(Status ~ Se_Quartiles + strata(Matset), data = .data)
clogistic.display(clogit_lung)
| | OR (95% CI) | P (Wald's test) | P (LR-test) |
|---|---|---|---|
| Quartiles: ref. = <60 | | | <0.001 |
| >60 – 70 | 0.4 (0.15 – 1.09) | 0.074 | |
| >70 – 80 | 0.11 (0.03 – 0.33) | <0.001 | |
| >80 | 0.10 (0.03 – 0.34) | <0.001 | |
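The link between an odds ratio, its 95% confidence interval and the Wald P value can be checked by hand. The sketch below takes the rounded figures reported for the ">60 – 70" quartile (OR 0.4, 95% CI 0.15 – 1.09) and recovers an approximate Wald P value. It assumes the interval is symmetric on the log scale, which is how Wald intervals are usually constructed; the variable names are illustrative.

```r
# Rounded values as reported for the >60 - 70 quartile.
or    <- 0.40
lower <- 0.15
upper <- 1.09

# On the log scale a Wald interval is estimate +/- 1.96 * SE,
# so the SE can be recovered from the width of the interval.
log_or <- log(or)
se     <- (log(upper) - log(lower)) / (2 * qnorm(0.975))

# Wald z statistic and two-sided P value.
z      <- log_or / se
p_wald <- 2 * pnorm(-abs(z))

print(round(c(se = se, z = z, p_wald = p_wald), 3))
```

The result should be close to the reported 0.074. Small differences are expected because the published odds ratio and interval are rounded to two decimals. The interval includes 1, which agrees with a P value above 0.05.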
![Page 21: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/21.jpg)
Data Analysis
Performing conditional logistic regression (Case vs. Control),
clogit_lung = clogit(Status ~ GPX1+ strata(Matset), data = .data)
clogistic.display(clogit_lung)
| | OR (95% CI) | P (Wald's test) | P (LR-test) |
|---|---|---|---|
| GPX1: ref. = CC | | | 0.032 |
| CT | 0.44 (0.22 – 0.86) | 0.017 | |
| TT | 0.42 (0.13 – 1.38) | 0.151 | |
![Page 22: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/22.jpg)
Data Analysis
Performing conditional logistic regression (Case vs. Control),
clogit_lung = clogit(Status ~ Se_Quartiles + GPX1 + strata(Matset), data = .data)
clogistic.display(clogit_lung)
| | crude OR (95% CI) | adj. OR (95% CI) | P (Wald's test) | P (LR-test) |
|---|---|---|---|---|
| Quartiles (environmental factor): ref. = <60 | | | | <0.001 |
| >60 – 70 | 0.4 (0.15 – 1.09) | 0.32 (0.11 – 0.96) | 0.042 | |
| >70 – 80 | 0.11 (0.03 – 0.33) | 0.09 (0.02 – 0.3) | <0.001 | |
| >80 | 0.1 (0.03 – 0.34) | 0.05 (0.01 – 0.23) | <0.001 | |
| GPX1 (genetic factor): ref. = CC | | | | 0.006 |
| CT | 0.44 (0.22 – 0.86) | 0.26 (0.11 – 0.65) | 0.004 | |
| TT | 0.42 (0.13 – 1.38) | 0.44 (0.09 – 2.18) | 0.313 | |
![Page 23: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/23.jpg)
Data Analysis
Performing unconditional logistic regression (Case vs. Control),
ulogit_lung = glm(Status ~ Se_Quartiles , family=binomial, data = .data)
logistic.display(ulogit_lung)
| | OR (95% CI) | P (Wald's test) | P (LR-test) |
|---|---|---|---|
| Quartiles: ref. = <60 | | | <0.001 |
| >60 – 70 | 0.41 (0.17 – 1.02) | 0.054 | |
| >70 – 80 | 0.13 (0.05 – 0.34) | <0.001 | |
| >80 | 0.17 (0.07 – 0.42) | <0.001 | |
![Page 24: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/24.jpg)
Data Analysis
Performing unconditional logistic regression (Case vs. Control),
ulogit_lung = glm(Status ~ GPX1 , family=binomial, data = .data)
logistic.display(ulogit_lung)
| | OR (95% CI) | P (Wald's test) | P (LR-test) |
|---|---|---|---|
| GPX1: ref. = CC | | | 0.034 |
| CT | 0.45 (0.24 – 0.85) | 0.014 | |
| TT | 0.44 (0.14 – 1.36) | 0.156 | |
![Page 25: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/25.jpg)
Data Analysis
Performing unconditional logistic regression (Case vs. Control),
ulogit_lung = glm(Status ~ Se_Quartiles + GPX1, family=binomial, data = .data)
logistic.display(ulogit_lung)
| | crude OR (95% CI) | adj. OR (95% CI) | P (Wald's test) | P (LR-test) |
|---|---|---|---|---|
| Quartiles: ref. = <60 | | | | <0.001 |
| >60 – 70 | 0.41 (0.17 – 1.02) | 0.43 (0.17 – 1.08) | 0.074 | |
| >70 – 80 | 0.13 (0.05 – 0.34) | 0.13 (0.05 – 0.34) | <0.001 | |
| >80 | 0.17 (0.07 – 0.42) | 0.15 (0.06 – 0.39) | <0.001 | |
| GPX1: ref. = CC | | | | 0.024 |
| CT | 0.45 (0.24 – 0.85) | 0.40 (0.20 – 0.80) | 0.01 | |
| TT | 0.44 (0.14 – 1.36) | 0.42 (0.12 – 1.41) | 0.161 | |
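For readers without the display helper, a comparable table of crude and adjusted odds ratios can be assembled from a `glm()` fit with base R alone. The sketch below is an approximation of that idea, not the helper's actual implementation. The data are simulated, and the names `quartile`, `genotype` and `or_table()` are hypothetical.

```r
# Simulated unmatched data, for illustration only.
set.seed(7)
n <- 400
sim <- data.frame(
  quartile = factor(sample(c("Q1", "Q2", "Q3", "Q4"), n, replace = TRUE)),
  genotype = factor(sample(c("CC", "CT", "TT"), n, replace = TRUE,
                           prob = c(0.5, 0.4, 0.1)))
)
eta <- 0.5 - 0.5 * (as.integer(sim$quartile) - 1) - 0.6 * (sim$genotype != "CC")
sim$Status <- rbinom(n, 1, plogis(eta))

# Hypothetical helper: odds ratios, Wald 95% CIs and Wald P values from a fit.
or_table <- function(fit) {
  est <- coef(summary(fit))
  ci  <- est[, "Estimate"] + outer(est[, "Std. Error"], c(-1, 1)) * qnorm(0.975)
  out <- cbind(OR     = exp(est[, "Estimate"]),
               lower  = exp(ci[, 1]),
               upper  = exp(ci[, 2]),
               p_wald = est[, "Pr(>|z|)"])
  out[rownames(out) != "(Intercept)", , drop = FALSE]
}

fit_g  <- glm(Status ~ genotype,            family = binomial, data = sim)
fit_q  <- glm(Status ~ quartile,            family = binomial, data = sim)
fit_qg <- glm(Status ~ quartile + genotype, family = binomial, data = sim)

crude    <- or_table(fit_g)    # genotype alone
adjusted <- or_table(fit_qg)   # genotype adjusted for quartile, and vice versa

# Likelihood-ratio test for adding genotype to the quartile-only model.
lr_p <- anova(fit_q, fit_qg, test = "Chisq")[2, "Pr(>Chi)"]

print(round(crude, 3))
print(round(adjusted, 3))
print(lr_p)
```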
![Page 26: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/26.jpg)
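Something More
Changing the default reference
GPX1 = relevel(GPX1, ref = "TT")
pack() # epicalc: stores the releveled GPX1 back into .data
Saving the result
result = clogistic.display(clogit_lung)
write.csv(result$table, file = "path/result.csv") # write.csv always writes comma-separated values; sep cannot be set
write.table(result$table, file = "path/result.xls", sep = "\t") # Tab-delimited text that spreadsheet software can open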
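Both steps can be tried without the study data. In the sketch below, `relevel()` is applied to a short genotype vector, and a small results table (using the GPX1 odds ratios reported in the conditional analysis) is written as tab-delimited text and read back. The object names and the temporary file are assumptions made for the example.

```r
# Changing the reference level of a factor.
gpx1 <- factor(c("CT", "CC", "TT", "CC", "CC", "CT"))
print(levels(gpx1))                    # the first level is the reference
gpx1_tt <- relevel(gpx1, ref = "TT")   # make TT the reference instead
print(levels(gpx1_tt))

# A small results table, using the GPX1 values reported earlier.
result_table <- data.frame(
  term   = c("CT", "TT"),
  OR     = c(0.44, 0.42),
  lower  = c(0.22, 0.13),
  upper  = c(0.86, 1.38),
  p_wald = c(0.017, 0.151)
)

# Write tab-delimited text to a temporary file, then read it back to check it.
out_file <- file.path(tempdir(), "result.tsv")
write.table(result_table, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
read_back <- read.delim(out_file)
print(read_back)
```

Releveling changes only which category the odds ratios are compared against; the fitted model itself is equivalent.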
![Page 27: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/27.jpg)
Summary: regression models
Regression models can be used to describe the average effect of predictors on outcomes in your data set.
They can tell how likely it is that an effect is just due to chance.
They can look at each predictor “adjusting for” the others (estimating what would happen if all others were held constant.)
![Page 28: Logistic Regression in Case-Control Study](https://reader036.vdocuments.mx/reader036/viewer/2022062300/554b0151b4c905c12d8b4d7c/html5/thumbnails/28.jpg)
Thanks to,
Prof. Virasakdi Chongsuvivatwong
Epidemiology Unit,
Faculty of Medicine,
Prince of Songkla University, Thailand