biostatistics unit 9 – regression and correlation

Upload: poppy-copeland

Posted on 28-Dec-2015

Regression and Correlation

Introduction

Regression and correlation analysis studies the relationships between variables.

This area of statistics was started in the 1860s by Francis Galton (1822-1911), who was also Charles Darwin's cousin.

Nature of Data

The data are in the form of (x, y) pairs.

Graphical Representation

A scatter plot (x-y plot) is used to display regression and correlation data. The regression line has the form

y = mx + b

In practice, various forms are used, such as y = ax + b and y = a + bx.

General Regression Line

y = α + βx + ε

α is the y-intercept

β is the slope

ε is the error term

Calculations

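For each point, the vertical distance from the point to the regression line is squared. Adding these squared distances gives the sum of squares. The least-squares regression line is the line that makes this sum as small as possible.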
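The sum of squares can be computed directly for any candidate line. This is a minimal Python sketch; the helper name sum_of_squares and the five (x, y) points are made up for illustration and are not the data used in these notes.

```python
def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each (x, y) point to y = slope*x + intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = slope * x + intercept  # height of the line at this x
        total += (y - predicted) ** 2      # squared vertical distance
    return total


# Made-up illustrative data (not from the original slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

best = sum_of_squares(xs, ys, 0.6, 2.2)   # least-squares line for these points
other = sum_of_squares(xs, ys, 0.5, 2.5)  # a nearby alternative line
print(round(best, 2), round(other, 2))    # 2.4 2.5
```

Any other line through these points gives a sum larger than 2.4, which is what makes y = 0.6x + 2.2 the least-squares line for this made-up data.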

Regression Analysis

Regression analysis allows the experimenter to predict the value of one variable (y) from the value of another (x).

Data

Data are in the form of (x,y) pairs.

Regression Equation

Using the regression equation

• Interpolation is used to find values of points between the data points.

• Extrapolation is used to find values of points outside the range of the data.

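Be careful that the calculations give realistic results. Extrapolation in particular can be misleading, because the relationship observed in the data may not hold outside its range.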
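A minimal sketch of using the regression equation from these notes, y = 4.53x – 1.57, for prediction. The function name predict and the range limits x_min and x_max are hypothetical: the original data range is not given in this text, so the values below are placeholders.

```python
def predict(x, x_min, x_max, slope=4.53, intercept=-1.57):
    """Predict y from x and label the prediction.

    x_min and x_max are the smallest and largest x values in the data
    (hypothetical placeholders here). Predictions inside that range are
    interpolation; predictions outside it are extrapolation.
    """
    y = slope * x + intercept
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


y, kind = predict(5.0, x_min=0.0, x_max=10.0)
print(round(y, 2), kind)  # 21.08 interpolation

y, kind = predict(50.0, x_min=0.0, x_max=10.0)
print(round(y, 2), kind)  # 224.93 extrapolation
```

Labelling a prediction as extrapolation does not make it trustworthy; it only flags that the result should be checked for realism.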

Significance of regression analysis

It is possible to perform the linear regression t test. In this test:

β is the population regression coefficient

ρ is the population correlation coefficient

Hypotheses

H0: β = 0 and ρ = 0

HA: β ≠ 0 and ρ ≠ 0
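The test statistic for these hypotheses can be computed by hand as t = b / SE(b) with n − 2 degrees of freedom, where b is the sample slope, SE(b) = s / √Sxx, and s is the standard error of estimate. The same value also follows from the sample correlation r as t = r√(n − 2) / √(1 − r²). The sketch below computes both ways, using made-up data rather than the data from these notes; the function name is hypothetical. The p-value would then be read from a t table or calculator with n − 2 degrees of freedom.

```python
import math


def linreg_t_test(xs, ys):
    """Return (t from the slope, t from r, degrees of freedom) for H0: beta = 0 and rho = 0."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    b = sxy / sxx                        # sample slope
    sse = syy - b * sxy                  # residual (error) sum of squares
    s = math.sqrt(sse / (n - 2))         # standard error of estimate
    t_slope = b / (s / math.sqrt(sxx))   # slope divided by its standard error

    r = sxy / math.sqrt(sxx * syy)       # sample correlation coefficient
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_slope, t_r, n - 2


xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
t_slope, t_r, df = linreg_t_test(xs, ys)
print(round(t_slope, 4), round(t_r, 4), df)  # 2.1213 2.1213 3
```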

Calculations and Results

Calculator setup

Results

Correlation

Correlation gives information about the relationship between x and y. When the regression equation is calculated, the correlation results indicate the direction (positive or negative) and strength of the relationship.

Correlation Coefficient

The correlation coefficient, r, indicates the direction and strength of the linear relationship. Values of r range from -1 to +1. A correlation coefficient of 0 means that there is no linear relationship; x and y may still be related in a nonlinear way.

• Perfect negative correlation, r = -1.

• No correlation, r = 0.

• Perfect positive correlation, r = +1.
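A minimal sketch of computing r from its definition, r = Sxy / √(Sxx·Syy), where Sxx and Syy are the sums of squared deviations of x and y about their means and Sxy is the sum of cross-products. The three made-up examples reproduce the cases above; the last one shows that r = 0 does not rule out a nonlinear relationship. The function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0  (perfect positive)
print(pearson_r(x, [-3 * v for v in x]))     # -1.0 (perfect negative)
print(pearson_r(x, [v * v for v in x]))      # 0.0  (y = x**2: related, but not linearly)
```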

Coefficient of Determination

The coefficient of determination is r², the square of the correlation coefficient. It has values between 0 and 1. The value of r² indicates the proportion of the variation in y that is explained by the linear relationship with x; multiplied by 100, it gives a percentage.
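For simple linear regression with an intercept, r² can also be computed as 1 − SSE/SST, where SST is the total sum of squares of y about its mean and SSE is the sum of squared residuals. This is why r² is read as the proportion of variation explained. A sketch with made-up data; the function name is hypothetical.

```python
def coefficient_of_determination(xs, ys):
    """Return r**2 computed two ways: 1 - SSE/SST, and the square of r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # SST: total variation in y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    b = sxy / sxx                         # least-squares slope
    a = my - b * mx                       # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    from_sums = 1 - sse / syy
    from_r = sxy ** 2 / (sxx * syy)       # r squared
    return from_sums, from_r


xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
print([round(v, 3) for v in coefficient_of_determination(xs, ys)])  # [0.6, 0.6]
```

For this made-up data, 60% of the variation in y is explained by the linear relationship with x.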

Graphs

[Figures: scatter plot; scatter plot with regression line]

Data for calculations

Calculations

Calculate the regression equation

y = 4.53x – 1.57
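The slide's calculation steps are not reproduced in this text, but the standard least-squares formulas are b = Sxy / Sxx for the slope and a = ȳ − b·x̄ for the intercept. A minimal sketch, using made-up data rather than the data behind y = 4.53x – 1.57; the function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx   # the line passes through (x-bar, y-bar)
    return slope, intercept


xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 0.60x + 2.20
```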

Calculations

Calculate the correlation coefficient

Coefficient of Determination

The coefficient of determination is r². It indicates the proportion of the variation in y that is explained by the linear relationship with x. With r = .974, the coefficient of determination is r² = .948. This means that about 95% of the variation in y is explained by its linear relationship with temperature.

Residuals

The vertical distance of each point above or below the regression line is called a residual (residual = observed y − predicted y). When the linear relationship fits well, the residuals are randomly scattered about zero. If the residual plot is not random (for example, it shows a curve), another factor or effect is involved and needs attention.
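A minimal sketch of computing residuals for a fitted line. The data and the line y = 0.6x + 2.2 (the least-squares line for these points) are made up for illustration; the function name is hypothetical.

```python
def residuals(xs, ys, slope, intercept):
    """Residual = observed y minus predicted y; positive above the line, negative below."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, 0.6, 2.2)   # least-squares line for these points
print([round(e, 2) for e in res])   # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), so a residual plot is judged by its pattern, not its total.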

Calculate the residual variance
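Assuming the usual definition for simple linear regression, the residual variance is s² = SSE / (n − 2): the sum of squared residuals divided by n − 2 degrees of freedom, because two parameters (slope and intercept) were estimated. Its square root s is the standard error of estimate. A sketch with made-up data; the function name is hypothetical.

```python
import math


def residual_variance(xs, ys, slope, intercept):
    """s^2 = SSE / (n - 2) for a fitted line y = slope*x + intercept."""
    n = len(xs)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)   # two parameters (slope, intercept) were estimated


xs = [1, 2, 3, 4, 5]   # made-up illustrative data
ys = [2, 4, 5, 4, 5]
s2 = residual_variance(xs, ys, 0.6, 2.2)      # least-squares line for these points
print(round(s2, 3), round(math.sqrt(s2), 3))  # 0.8 0.894
```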

Results of linear regression t test