Introduction to Biostatistics and Bioinformatics Regression and Correlation

Upload: vincent-wheeler

Post on 14-Jan-2016

Page 1: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Introduction to Biostatistics and Bioinformatics

Regression and Correlation

Page 3: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Learning Objectives

Regression – estimation of the relationship between variables
• Linear regression
• Assessing the assumptions
• Non-linear regression

Correlation
• Correlation coefficient quantifies the association strength
• Sensitivity to the distribution

Page 4: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Relationships

[Figure panels: Relationship; No Relationship]

Page 5: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Relationships

Linear Relationships

Non-Linear Relationship

Page 6: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Relationships

[Figure panels: Linear, Strong; Linear, Weak]

Page 7: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression

[Figure panels: Linear, Strong; Linear, Weak; Non-Linear]

Page 8: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression - Residuals

[Figure: data with residual plots for three panels: Linear, Strong; Linear, Weak; Non-Linear. Vertical axis of the residual plots: Residuals]

Page 9: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression Model

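$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• $Y_i$: dependent variable
• $X_i$: independent variable
• $\beta_0$: intercept
• $\beta_1$: slope
• $\beta_0 + \beta_1 X_i$: linear component
• $\varepsilon_i$: random error component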

Page 11: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression Assumptions

The relationship between the variables is linear.

Errors are independent, normally distributed with mean zero and constant variance.
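The assumptions on the errors can be probed through the residuals of a fitted line. The following is a minimal sketch on synthetic data, assuming SciPy's Shapiro-Wilk test as one possible normality check; the data, the seed, and the half-split spread ratio are illustrative choices, not part of the lecture, and the ratio is only a crude stand-in for a formal constant-variance test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = np.linspace(-1, 1, 100)
# Synthetic linear data with normally distributed errors (assumed example)
y = 2.0 * x + 0.5 + rng.normal(scale=0.2, size=x.size)

fit = stats.linregress(x, y)
residuals = y - (fit.slope * x + fit.intercept)

# The mean of least-squares residuals is zero by construction (up to rounding)
mean_residual = residuals.mean()

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Crude constant-variance check: compare the residual spread
# in the lower half and the upper half of the x range
half = x.size // 2
spread_ratio = residuals[:half].std(ddof=1) / residuals[half:].std(ddof=1)

print(f"mean residual = {mean_residual:.2e}")
print(f"Shapiro-Wilk p = {shapiro_p:.3f}")
print(f"spread ratio (low x / high x) = {spread_ratio:.2f}")
```

A spread ratio far from 1, or a clear pattern in a plot of residuals against x, would hint that the constant-variance or linearity assumption does not hold.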

Page 12: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression Assumptions

[Figure: residual plots for two panels: Linear; Non-Linear. Vertical axis: Residuals]

Page 13: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression Assumptions

[Figure: residual plots for two panels: Constant Variance; Variable Variance. Vertical axis: Residuals]

Page 15: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression – Estimating the Line

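$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

• $\hat{Y}_i$: estimated value
• $\hat{\beta}_0$: estimated intercept
• $\hat{\beta}_1$: estimated slope
• $X_i$: independent variable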

Page 16: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Least Squares Method

Given measurements $(X_i, Y_i)$, $i = 1, \ldots, N$, find the slope and intercept that minimize the sum of the squares of the residuals.

Page 17: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Least Squares Method

The quantity to minimize is the sum of the squares of the residuals:

$$S = \sum_{i=1}^{N} \varepsilon_i^2$$

Page 18: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Least Squares Method

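At the minimum of the sum of squared residuals $S$, the partial derivatives with respect to the estimated intercept and slope are zero:

$$\frac{\partial S}{\partial \hat{\beta}_0} = 0, \qquad \frac{\partial S}{\partial \hat{\beta}_1} = 0$$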

Page 19: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Least Squares Method

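With $S = \sum_{i=1}^{N} \varepsilon_i^2$ and residuals $\varepsilon_i = Y_i - \hat{Y}_i$, differentiate with respect to the slope:

$$
\begin{aligned}
\frac{\partial S}{\partial \hat{\beta}_1}
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{N} \bigl(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\bigr)^2 \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{N} \bigl(Y_i^2 - 2 Y_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) + (\hat{\beta}_0 + \hat{\beta}_1 X_i)^2\bigr) \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{N} \bigl(Y_i^2 - 2 Y_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) + \hat{\beta}_0^2 + 2 \hat{\beta}_0 \hat{\beta}_1 X_i + \hat{\beta}_1^2 X_i^2\bigr) \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{N} \bigl(-2 Y_i \hat{\beta}_1 X_i + 2 \hat{\beta}_0 \hat{\beta}_1 X_i + \hat{\beta}_1^2 X_i^2\bigr) \\
&= \sum_{i=1}^{N} \bigl(-2 X_i Y_i + 2 \hat{\beta}_0 X_i + 2 \hat{\beta}_1 X_i^2\bigr)
\end{aligned}
$$

Substituting $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, which follows from $\partial S / \partial \hat{\beta}_0 = 0$, with $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$ and $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$:

$$
\begin{aligned}
\frac{\partial S}{\partial \hat{\beta}_1}
&= \sum_{i=1}^{N} \bigl(-2 X_i Y_i + 2 (\bar{Y} - \hat{\beta}_1 \bar{X}) X_i + 2 \hat{\beta}_1 X_i^2\bigr) \\
&= -2 \sum_{i=1}^{N} X_i Y_i + 2 \bar{Y} \sum_{i=1}^{N} X_i - 2 \hat{\beta}_1 \bar{X} \sum_{i=1}^{N} X_i + 2 \hat{\beta}_1 \sum_{i=1}^{N} X_i^2 \\
&= -2 \Bigl(\sum_{i=1}^{N} X_i Y_i - \frac{1}{N} \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i\Bigr) + 2 \hat{\beta}_1 \Bigl(\sum_{i=1}^{N} X_i^2 - \frac{1}{N} \Bigl(\sum_{i=1}^{N} X_i\Bigr)^2\Bigr) = 0
\end{aligned}
$$

Solving for the estimates:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{N} X_i Y_i - \frac{1}{N} \sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} X_i^2 - \frac{1}{N} \bigl(\sum_{i=1}^{N} X_i\bigr)^2},
\qquad
\hat{\beta}_0 = \frac{1}{N} \sum_{i=1}^{N} Y_i - \hat{\beta}_1 \frac{1}{N} \sum_{i=1}^{N} X_i
$$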
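The closed-form least-squares estimates can be checked numerically. This is a minimal sketch, assuming NumPy arrays as input; the helper name `least_squares_line` and the synthetic data are illustrative and not from the lecture. It compares the formulas for the slope and intercept with the result of `scipy.stats.linregress`.

```python
import numpy as np
from scipy import stats


def least_squares_line(x, y):
    """Return (slope, intercept) from the closed-form least-squares solution."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # slope = (sum(XY) - sum(X) * sum(Y) / N) / (sum(X^2) - sum(X)^2 / N)
    slope = (np.sum(x * y) - x.sum() * y.sum() / n) / (np.sum(x**2) - x.sum() ** 2 / n)
    # intercept = mean(Y) - slope * mean(X)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


rng = np.random.default_rng(1)  # fixed seed, illustrative data
x = np.linspace(-1, 1, 50)
y = x + 0.1 * rng.normal(size=x.size)

slope, intercept = least_squares_line(x, y)
reference = stats.linregress(x, y)

print("closed form:", slope, intercept)
print("linregress: ", reference.slope, reference.intercept)
```

The two results should agree to within floating-point rounding, since both minimize the same sum of squared residuals.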

Page 20: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression in Python

import scipy.stats as stats

# x and y are equal-length sequences of measurements
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

Page 21: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression Example

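[Figure: scatter plot with fitted line (Linear, Strong) and the corresponding residual plot]

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

points = 50  # number of samples; assumed value, not given on the slide

# Strong linear relationship: small noise around y = x
x = np.linspace(-1, 1, points)
y = x + 0.1 * np.random.normal(size=points)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_line = slope * x + intercept

# Data with the fitted line
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
ax1.plot(x, y_line, color='red', lw=2)
fig.savefig('linear.png')

# Residuals (observed minus fitted)
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
fig.savefig('linear-residuals.png')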

Page 22: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression Example

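[Figure: scatter plot with fitted line (Linear, Weak) and the corresponding residual plot]

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

points = 50  # number of samples; assumed value, not given on the slide

# Weak linear relationship: larger noise around y = x
x = np.linspace(-1, 1, points)
y = x + 0.4 * np.random.normal(size=points)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_line = slope * x + intercept

# Data with the fitted line
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
ax1.plot(x, y_line, color='red', lw=2)
fig.savefig('linear-weak.png')

# Residuals (observed minus fitted)
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
fig.savefig('linear-weak-residuals.png')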

Page 23: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Linear Regression Example

Outlier
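The influence of a single outlier on a fitted line can be illustrated with a short experiment. This is a sketch on synthetic data; the outlier position (1.0, -3.0) and the seed are arbitrary assumptions chosen only to make the effect visible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # fixed seed, illustrative data
x = np.linspace(-1, 1, 20)
y = x + 0.1 * rng.normal(size=x.size)

# Fit without the outlier
clean = stats.linregress(x, y)

# Append one hypothetical outlier far below the trend and fit again
x_out = np.append(x, 1.0)
y_out = np.append(y, -3.0)
with_outlier = stats.linregress(x_out, y_out)

print(f"slope without outlier: {clean.slope:.2f}, r = {clean.rvalue:.2f}")
print(f"slope with outlier:    {with_outlier.slope:.2f}, r = {with_outlier.rvalue:.2f}")
```

Because least squares penalizes squared residuals, one distant point can pull the slope and lower the correlation noticeably, especially when the sample is small.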

Page 24: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Regression – Non-linear data

Solution 1: Transformation

Solution 2: Non-linear Regression

$$Y_i = f(X_i, \hat{\beta}_0, \hat{\beta}_1, \ldots)$$
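Both solutions can be sketched on the same data set. The example below assumes an exponential model with multiplicative noise, which is an illustrative choice and not taken from the lecture; `model`, the parameter values, and the starting guess `p0` are hypothetical. Solution 1 log-transforms the data and uses linear regression, and Solution 2 uses `scipy.optimize.curve_fit` for non-linear least squares.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)  # fixed seed, illustrative data
x = np.linspace(0, 2, 40)
# Assumed example: y = a * exp(b * x) with a = 2.0, b = 1.5 and multiplicative noise
y = 2.0 * np.exp(1.5 * x) * np.exp(0.05 * rng.normal(size=x.size))

# Solution 1: transformation. log(y) = log(a) + b * x is linear in x.
log_fit = stats.linregress(x, np.log(y))
a_transform = np.exp(log_fit.intercept)
b_transform = log_fit.slope


# Solution 2: non-linear regression of the model function itself.
def model(x, a, b):
    return a * np.exp(b * x)


(a_direct, b_direct), _ = curve_fit(model, x, y, p0=(1.0, 1.0))

print(f"transformation:        a = {a_transform:.2f}, b = {b_transform:.2f}")
print(f"non-linear regression: a = {a_direct:.2f}, b = {b_direct:.2f}")
```

The two approaches weight the data differently (the transformation fits errors on the log scale), so their estimates are close but generally not identical.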

Page 25: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Correlation Coefficient

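• A measure of the correlation between the two variables

• Quantifies the association strength

Pearson correlation coefficient:

$$r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$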
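The Pearson coefficient can be computed directly from its definition. This is a minimal sketch; the helper name `pearson_r` and the sample data are illustrative, and the result is compared with `scipy.stats.pearsonr` as a cross-check.

```python
import numpy as np
from scipy import stats


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of X from its mean
    dy = y - y.mean()  # deviations of Y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))


rng = np.random.default_rng(5)  # fixed seed, illustrative data
x = np.linspace(-1, 1, 50)
y = x + 0.4 * rng.normal(size=x.size)

r_manual = pearson_r(x, y)
r_scipy, _ = stats.pearsonr(x, y)

print(r_manual, r_scipy)
```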

Page 31: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Correlation Coefficient

Source: Wikipedia

Page 32: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Coefficient of Variation

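Sample: $x_1, x_2, \ldots, x_n$

Mean:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Variance:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Coefficient of Variation (CV):

$$\mathrm{CV} = \frac{\sigma}{\mu}$$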
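As a small worked example, the coefficient of variation is the standard deviation divided by the mean. The sketch below assumes the form of the variance that divides by n; the helper name `coefficient_of_variation` and the sample values are illustrative.

```python
import numpy as np


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (variance divides by n)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()  # NumPy's default ddof=0 divides by n
    return std / mean


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
cv = coefficient_of_variation(sample)
print(cv)  # → 0.4
```

The CV is dimensionless, so it allows the spread of measurements on different scales to be compared; it is only meaningful when the mean is not close to zero.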

Page 34: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Correlation Coefficient and CV

[Figure panels: Uniform distribution; Normal distribution; Lognormal distribution]

Page 35: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Correlation Coefficient - Outliers

Outlier

Page 36: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Correlation Coefficient – Non-linear

Solutions:
• Transformation
• Rank correlation (Spearman, r=0.93)
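Both solutions can be tried on a monotonic but non-linear relationship. The sketch below uses noise-free exponential data, which is an assumed example and not the data behind the slide, so the values differ from the r quoted above.

```python
import numpy as np
from scipy import stats

x = np.linspace(0, 5, 50)
y = np.exp(x)  # monotonic but strongly non-linear (illustrative data)

# Pearson on the raw data underestimates the strength of the association
pearson_raw, _ = stats.pearsonr(x, y)

# Solution: transformation (log makes the relationship linear)
pearson_log, _ = stats.pearsonr(x, np.log(y))

# Solution: rank correlation depends only on the ordering of the values
spearman, _ = stats.spearmanr(x, y)

print(f"Pearson (raw):       {pearson_raw:.2f}")
print(f"Pearson (log y):     {pearson_log:.2f}")
print(f"Spearman (raw data): {spearman:.2f}")
```

Spearman's coefficient is the Pearson coefficient of the ranks, so any strictly increasing relationship yields a value of 1 regardless of its shape.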

Page 37: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Correlation Coefficient and p-value

Hypothesis: Is there a correlation?

[Figure: three example scatter plots, each annotated with its correlation coefficient r and p-value p]
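`scipy.stats.pearsonr` returns both the correlation coefficient and a p-value for the null hypothesis that the two variables are uncorrelated. The following sketch uses synthetic data with an assumed sample size of 30; the variable names and noise levels are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)  # fixed seed, illustrative data
n = 30
x = rng.normal(size=n)
y_related = x + 0.5 * rng.normal(size=n)  # depends on x
y_unrelated = rng.normal(size=n)          # independent of x

r_related, p_related = stats.pearsonr(x, y_related)
r_unrelated, p_unrelated = stats.pearsonr(x, y_unrelated)

print(f"related:   r = {r_related:.2f}, p = {p_related:.2g}")
print(f"unrelated: r = {r_unrelated:.2f}, p = {p_unrelated:.2g}")
```

A small p-value indicates that a correlation this strong would be unlikely if the variables were uncorrelated; it does not measure the strength of the association, which is what r quantifies.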

Page 38: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Application: Analytical Measurements

[Figure: Measured Concentration versus Theoretical Concentration]

Page 39: Introduction to Biostatistics and Bioinformatics Regression and Correlation

A Few Characteristics of Analytical Measurements

Accuracy: Closeness of agreement between a test result and an accepted reference value.

Precision: Closeness of agreement between independent test results.

Robustness: Test precision given small, deliberate changes in test conditions (preanalytic delays, variations in storage temperature).

Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control.

Limit of quantification: Lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy.

Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
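Some of these characteristics can be estimated from a calibration experiment. The sketch below uses hypothetical replicate measurements (all numbers are invented for illustration) and makes three assumptions that are not from the lecture: accuracy is summarized as relative bias, precision as the coefficient of variation of replicates, and linearity as a linear regression of measured on theoretical concentration.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three replicate measurements at each theoretical concentration
theoretical = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
measured = np.array([
    [1.1, 0.9, 1.0],
    [2.1, 2.0, 1.9],
    [5.2, 4.9, 5.1],
    [10.3, 9.8, 10.1],
    [19.7, 20.4, 20.2],
])

mean_measured = measured.mean(axis=1)

# Accuracy: relative bias of the mean result against the reference value (percent)
bias_percent = 100 * (mean_measured - theoretical) / theoretical

# Precision: coefficient of variation of the replicates (percent)
cv_percent = 100 * measured.std(axis=1, ddof=1) / mean_measured

# Linearity: regression of measured on theoretical concentration
fit = stats.linregress(theoretical, mean_measured)

print("bias (%):", np.round(bias_percent, 1))
print("CV (%):  ", np.round(cv_percent, 1))
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, r = {fit.rvalue:.4f}")
```

A slope near 1 with an intercept near 0 would be consistent with measured values that are directly proportional to the analyte concentration over the tested range.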

Page 40: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Limit of Detection and Linearity

[Figure: Measured Concentration versus Theoretical Concentration]

Page 41: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Precision and Accuracy

[Figure: two panels of Measured Concentration versus Theoretical Concentration]

Page 42: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Summary - Regression

Source: http://xkcdsw.com/content/img/2274.png

Page 43: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Summary - Correlation

Page 44: Introduction to Biostatistics and Bioinformatics Regression and Correlation

Next Lecture: Experimental Design & Analysis

Experimental Design by Christine Ambrosino, www.hawaii.edu/fishlab/Nearside.htm