Discovering and Describing Relationships Farideh Dehkordi-Vakil

Post on 21-Dec-2015


Page 1: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Discovering and Describing Relationships

Farideh Dehkordi-Vakil

Page 2: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Exploring Relationships between Two Quantitative Variables

Scatter plots

Represent the relationship between two different continuous variables measured on the same subjects.

Each point in the plot represents the values of the two variables for one subject.

Page 3: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Exploring Relationships between Two Quantitative Variables

Example:

Data reported by the Organisation for Economic Co-operation and Development (OECD) on its 29 member nations in 1998.

Per capita gross domestic product is on the x-axis.

Per capita health care expenditures are on the y-axis.

[Figure: scatter plot "Per capita gross domestic product & Per capita health care expenditures" (series label: PCGDP).]

Page 4: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Exploring Relationships between Two Quantitative Variables

We can describe the overall pattern of a scatter plot by its:

Form or shape

Direction

Strength

Page 5: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Exploring Relationships between Two Quantitative Variables

Form or shape

The form shown by the scatter plot is linear if the points lie in a straight-line pattern.

Strength

The relationship is strong if the points lie close to a line, with little scatter.

Page 6: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Exploring Relationships between Two Quantitative Variables

Direction: positive and negative association

Two variables are positively associated when above-average values of one variable tend to occur in individuals with above-average values of the other variable, and below-average values of both also tend to occur together.

Two variables are negatively associated when above-average values of one tend to occur in subjects with below-average values of the other, and vice versa.
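One way to make these definitions concrete is to sum the products of each subject's deviations from the two means: a product is positive when both values fall on the same side of their means. The sketch below is only an illustration; the function name `association_direction` and the data values are made up and do not come from the slides.

```python
def association_direction(x, y):
    """Classify the association from the sum of deviation products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Each product is positive when both values sit on the same side
    # of their respective means, and negative otherwise.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if cross > 0:
        return "positive"
    if cross < 0:
        return "negative"
    return "none"


# Made-up illustrative values, not the OECD data.
gdp = [10, 20, 30, 40]
health = [1.0, 2.0, 2.5, 4.0]
print(association_direction(gdp, health))
```

The sign of this sum is also the sign of the correlation coefficient introduced on the following slides.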

Page 7: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Exploring Relationships between Two Quantitative Variables

Per capita health care example

The “subjects” studied are countries.

The form of the relationship is roughly linear.

The direction is positive.

The relationship is strong.

[Figure: scatter plot "Per capita gross domestic product & Per capita health care expenditures" (series label: PCGDP).]

Page 8: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlation

It is often useful to have a measure of the degree of association between two variables. For example, you may believe that sales are affected by expenditures on advertising, and want to measure the degree of association between sales and advertising.

The correlation coefficient is a numeric measure of the direction and strength of the linear relationship between two continuous variables.

The notation for the sample correlation coefficient is r.

Page 9: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlation

There are several alternative ways to write the algebraic expression for the correlation coefficient. The following is one.

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$

X and Y represent the two variables of interest, for example advertising and sales, or per capita gross domestic product and per capita health care expenditure.

n is the number of subjects in the sample.

The notation for the population correlation coefficient is ρ.
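The computational form of the correlation coefficient, r = (nΣXY − ΣXΣY) / (√(nΣX² − (ΣX)²) · √(nΣY² − (ΣY)²)), translates directly into code. The following is a minimal sketch; the function name `correlation` and the small data set are illustrative assumptions, not part of the slides.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient r from paired observations."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical advertising and sales figures, for illustration only.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 6]
print(round(correlation(advertising, sales), 4))
```

The function assumes both lists have the same length and that neither variable is constant, since a constant variable makes the denominator zero.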

Page 10: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlation

Facts about the correlation coefficient

r has no unit.

r > 0 indicates a positive association; r < 0 indicates a negative association.

r is always between –1 and +1.

Values of r near 0 imply a very weak linear relationship.

Correlation measures only the strength of linear association.
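Two of these facts are easy to check numerically: r does not change when a variable is re-expressed in different units, and a perfect but curved relationship can still give r = 0 because correlation measures only linear association. The sketch below uses a hypothetical helper `pearson_r` written in deviation form, with made-up data.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# A perfect quadratic relationship that is not linear at all.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
r_curved = pearson_r(x, y)

# Changing units (metres to centimetres) leaves r unchanged.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50, 58, 63, 75]
r_metres = pearson_r(heights_m, weights_kg)
r_centimetres = pearson_r([100 * h for h in heights_m], weights_kg)

print(r_curved, r_metres, r_centimetres)
```

Here `r_curved` is zero even though y is completely determined by x, which is why a scatter plot should be examined alongside r.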

Page 11: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlation

We could perform a hypothesis test to determine whether the value of a sample correlation coefficient (r) gives us reason to believe that the population correlation (ρ) is significantly different from zero.

The hypothesis test would be

H0: ρ = 0

Ha: ρ ≠ 0

Page 12: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlation

The test statistic would be

$$t = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

The test statistic has a t-distribution with n − 2 degrees of freedom.

Reject H0 if

$$t > t_{n-2;\,\alpha/2} \quad \text{or} \quad t < -t_{n-2;\,\alpha/2}$$
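The test can be sketched in a few lines: compute t = (r − 0) / √((1 − r²)/(n − 2)) and compare it with a critical value read from a t table with n − 2 degrees of freedom. The function names and the example numbers (r = 0.5, n = 27) are hypothetical; the critical value 2.060 is the standard table entry for 25 degrees of freedom at α/2 = 0.025.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return (r - 0) / math.sqrt((1 - r ** 2) / (n - 2))


def reject_two_sided(t, critical_value):
    """Two-sided rule: reject H0 if t > t_crit or t < -t_crit."""
    return t > critical_value or t < -critical_value


# Hypothetical sample: r = 0.5 from n = 27 subjects.
t = correlation_t_statistic(0.5, 27)
t_critical = 2.060  # t table, 25 degrees of freedom, alpha/2 = 0.025
print(round(t, 3), reject_two_sided(t, t_critical))
```

The critical value is passed in rather than computed so that the sketch needs only the standard library; a statistics package could supply it instead.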

Page 13: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Do wages rise with experience?

Many factors affect the wages of workers: the industry they work in, their type of job, their education and experience, and changes in general levels of wages. We will look at a sample of 59 married women who hold customer service jobs in Indiana banks. The following table gives their weekly wages at a specific point in time and their length of service with their employer, in months. The size of the place of work is recorded simply as “large” (100 or more workers) or “small.” Because industry, job type, and the time of measurement are the same for all 59 subjects, we expect to see a clear relationship between wages and length of service.

Page 14: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Do wages rise with experience?

Page 15: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Do wages rise with experience?

Page 16: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Do wages rise with experience?

The correlation between wages and length of service for the 59 bank workers is r = 0.3535.

We expect a positive correlation between length of service and wages in the population of all married female bank workers. Is the sample result convincing that this is true?

Page 17: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Do wages rise with experience?

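To compute the correlation we need:

$$\sum X = 23070, \quad \sum Y = 4159, \quad \sum X^2 = 9461302, \quad \sum Y^2 = 451031, \quad \sum XY = 1719430$$

Replacing these in the formula:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{59(1719430) - (23070)(4159)}{\sqrt{59(9461302) - (23070)^2}\,\sqrt{59(451031) - (4159)^2}} = 0.3535$$

We want to test

H0: ρ = 0

Ha: ρ > 0

The test statistic is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.3535\sqrt{59-2}}{\sqrt{1-(0.3535)^2}} = 2.853$$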
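The arithmetic on this slide can be checked from the five summary sums alone. The sketch below plugs the slide's sums into the computational formula for r and then into the t statistic; the variable names are illustrative, and the results should land close to the slide's 0.3535 and 2.853, with small differences due to rounding.

```python
import math

# Summary sums for the 59 bank workers, as given on the slide.
n = 59
sum_x, sum_y = 23070, 4159
sum_x2, sum_y2 = 9461302, 451031
sum_xy = 1719430

numerator = n * sum_xy - sum_x * sum_y
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
               * math.sqrt(n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# t statistic for H0: rho = 0 with n - 2 = 57 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 3), round(t, 2))
```

Working from the sums avoids needing the full data table, which is all that the formula requires.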

Page 18: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Do wages rise with experience?

Comparing t = 2.853 with critical values from the t table with n − 2 = 57 degrees of freedom helps us to make our decision.

Conclusion: Since P(t > 2.853) < .005, we reject H0. There is a positive correlation between wages and length of service.

Page 19: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

In evaluating time series data, it is useful to look at the correlation between successive observations over time.

This measure of correlation is called autocorrelation and may be calculated as follows:

$$r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$

r_k = autocorrelation coefficient for a k-period lag.

ȳ = mean of the time series.

y_t = value of the time series at period t.

y_{t-k} = value of the time series k periods before period t.
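The autocorrelation coefficient at lag k divides the sum of products (y_t − ȳ)(y_{t−k} − ȳ), taken over t = k + 1, ..., n, by the sum of squared deviations over the whole series. A minimal sketch follows, with a hypothetical function name and a made-up series; note that Python lists are indexed from 0, so the loop runs over positions k to n − 1.

```python
def autocorrelation(y, k):
    """Autocorrelation coefficient r_k of the series y at lag k."""
    n = len(y)
    mean = sum(y) / n
    # Positions k..n-1 (0-based) correspond to t = k+1..n in the formula.
    numerator = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    denominator = sum((value - mean) ** 2 for value in y)
    return numerator / denominator


series = [1, 2, 3, 4, 5]  # made-up values for illustration
print(autocorrelation(series, 1))
```

The same mean and the same denominator are used at every lag, so r_0 is always 1 and the coefficients at different lags are directly comparable.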

Page 20: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

Autocorrelation coefficients for different time lags can be used to answer the following questions about time series data.

Are the data random?

If so, the autocorrelations between y_t and y_{t-k} are close to zero for any lag: the successive values of the time series are not related to each other.

Page 21: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

Is there a trend?

If the series has a trend, y_t and y_{t-k} are highly correlated.

The autocorrelation coefficients are significantly different from zero for the first few lags and then gradually drop toward zero.

The autocorrelation coefficient for lag 1 is often very large (close to 1).

A series that contains a trend is said to be non-stationary.
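The trend signature can be seen on a toy series. The sketch below assumes the usual autocorrelation formula (sum of lagged deviation products divided by the total sum of squared deviations) and applies it to a made-up, steadily rising series of 20 values; the helper name and the data are illustrative only.

```python
def autocorrelation(y, k):
    """Autocorrelation coefficient r_k of the series y at lag k."""
    n = len(y)
    mean = sum(y) / n
    numerator = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    denominator = sum((value - mean) ** 2 for value in y)
    return numerator / denominator


# A made-up series with a pure upward trend: 1, 2, ..., 20.
trending = list(range(1, 21))
r1, r2, r3 = (autocorrelation(trending, k) for k in (1, 2, 3))
print(round(r1, 3), round(r2, 3), round(r3, 3))
```

The lag 1 coefficient is large and the coefficients shrink only slowly with the lag, which is the pattern described above for a non-stationary series.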

Page 22: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

Is there a seasonal pattern?

If a series has a seasonal pattern, there will be a significant autocorrelation coefficient at the seasonal time lag or at multiples of the seasonal lag.

The seasonal lag is 4 for quarterly data and 12 for monthly data.

Page 23: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

Is it stationary?

A stationary time series is one whose basic statistical properties, such as the mean and variance, remain constant over time.

Autocorrelation coefficients for a stationary series decline to zero fairly rapidly, generally after the second or third time lag.

Page 24: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

To determine whether the autocorrelation at lag k is significantly different from zero, the following hypotheses and rule of thumb may be used.

H0: ρk = 0, Ha: ρk ≠ 0

For any k, reject H0 if

$$|r_k| > \frac{2}{\sqrt{n}}$$

where n is the number of observations. This rule of thumb is for α = 5%.
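The rule of thumb compares the absolute value of r_k with 2/√n. A small sketch, using hypothetical function names and made-up coefficients for a series of n = 100 observations:

```python
import math


def rule_of_thumb_limit(n):
    """Approximate 5% limit for an autocorrelation coefficient."""
    return 2 / math.sqrt(n)


def is_significant(r_k, n):
    """Reject H0: rho_k = 0 when |r_k| exceeds 2 / sqrt(n)."""
    return abs(r_k) > rule_of_thumb_limit(n)


# Made-up coefficients from a hypothetical series of 100 observations.
n = 100
print(rule_of_thumb_limit(n))   # limit for n = 100
print(is_significant(0.25, n))  # outside the limit
print(is_significant(-0.10, n))  # inside the limit
```

Because the limit shrinks as n grows, the same coefficient can be significant in a long series and not in a short one.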

Page 25: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

The hypothesis test developed to determine whether a particular autocorrelation coefficient is significantly different from zero is:

Hypotheses: H0: ρk = 0, Ha: ρk ≠ 0

Test statistic:

$$t = \frac{r_k - 0}{1/\sqrt{n-k}}$$

Page 26: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

Reject H0 if

$$t > t_{n-k;\,\alpha/2} \quad \text{or} \quad t < -t_{n-k;\,\alpha/2}$$
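The formal test for a single autocorrelation coefficient uses t = (r_k − 0) / (1/√(n − k)) with n − k degrees of freedom. The sketch below is illustrative: the function name and the example values (r_4 = 0.6, n = 20) are hypothetical, and 2.120 is the standard t table entry for 16 degrees of freedom at α/2 = 0.025.

```python
import math


def autocorrelation_t_statistic(r_k, n, k):
    """t statistic for H0: rho_k = 0, with n - k degrees of freedom."""
    return (r_k - 0) / (1 / math.sqrt(n - k))


# Hypothetical example: r_4 = 0.6 from a series of n = 20 observations.
t = autocorrelation_t_statistic(0.6, n=20, k=4)
t_critical = 2.120  # t table, 16 degrees of freedom, alpha/2 = 0.025
reject = t > t_critical or t < -t_critical
print(round(t, 3), reject)
```

Unlike the 2/√n rule of thumb, the degrees of freedom here depend on the lag, so the critical value changes from lag to lag.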

Page 27: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Correlograms: An Alternative Method of Data Exploration

The plot of the autocorrelations versus time lag is called a correlogram.

The horizontal axis is the time lag.

The vertical axis is the autocorrelation coefficient.

Patterns in a correlogram are used to analyze key features of the data.

Page 28: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Mobile Home Shipments

Correlogram for the mobile home shipments. Note that these are quarterly data.

[Figure: correlogram of mobile home shipments for lags 1 to 12, showing the ACF with upper and lower limits.]

Page 29: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Japanese Exchange Rate

As the world’s economy becomes increasingly interdependent, various exchange rates between currencies have become important in making business decisions. For many U.S. businesses, the Japanese exchange rate (in yen per U.S. dollar) is an important decision variable. A time series plot of the Japanese yen to U.S. dollar exchange rate is shown below. On the basis of this plot, would you say the data are stationary? Is there any seasonal component to this time series plot?

Page 30: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Japanese Exchange Rate

[Figure: "Japanese Exchange Rate", time series plot of EXRJ. Horizontal axis: Months (0 to 30). Vertical axis: Exchange Rate (yen per U.S. dollar), 0 to 180.]

Page 31: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Japanese Exchange Rate

Here is the autocorrelation structure for EXRJ.

With a sample size of 24, the critical value is

$$\frac{2}{\sqrt{n}} = \frac{2}{\sqrt{24}} = 0.408$$

This is the approximate 95% critical value for rejecting the null hypothesis of zero autocorrelation at lag k.

Lag    ACF
1      .8157
2      .5383
3      .2733
4      .0340
5     -.1214
6     -.1924
7     -.2157
8     -.1978
9     -.1215
10    -.1217
11    -.1823
12    -.2593
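The ACF values for EXRJ can be screened against the rule-of-thumb limit in a few lines. The sketch below takes n = 24 from the slide's 2/√24 = 0.408 computation, prints a crude text correlogram, and collects the lags whose coefficients fall outside the limits; the variable names and the bar scaling are arbitrary choices for illustration.

```python
import math

# ACF values for lags 1 to 12, as listed on the slide.
acf = [0.8157, 0.5383, 0.2733, 0.0340, -0.1214, -0.1924,
       -0.2157, -0.1978, -0.1215, -0.1217, -0.1823, -0.2593]
n = 24  # taken from the slide's 2 / sqrt(24) computation
limit = 2 / math.sqrt(n)

flagged = []
for lag, r in enumerate(acf, start=1):
    # One character per 0.05 of autocorrelation, signed by direction.
    bar = ("+" if r >= 0 else "-") * round(abs(r) * 20)
    outside = abs(r) > limit
    if outside:
        flagged.append(lag)
    print(f"{lag:>3} {r:+.4f} {bar}{' *' if outside else ''}")

print(f"lags outside +/-{limit:.3f}: {flagged}")
```

Only lags 1 and 2 exceed the limit, which matches the observation that the coefficients fall below the critical value after two periods.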

Page 32: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Japanese Exchange Rate

The correlogram for EXRJ is given below.

[Figure: correlogram of EXRJ for lags 1 to 12, showing the ACF with upper and lower limits.]

Page 33: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Japanese Exchange Rate

Since the autocorrelation coefficients fall below the critical value after just two periods, we can conclude that there is no trend in the data.

Page 34: Discovering and Describing Relationships Farideh Dehkordi-Vakil

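Example: Japanese Exchange Rate

To check for seasonality at α = .05, the hypotheses are:

H0: ρ12 = 0

Ha: ρ12 ≠ 0

The test statistic is:

$$t = \frac{r_k - 0}{1/\sqrt{n-k}} = \frac{-.2595}{1/\sqrt{24-12}} = -0.899$$

Reject H0 if

$$t > t_{n-k;\,\alpha/2} \quad \text{or} \quad t < -t_{n-k;\,\alpha/2}, \qquad t_{n-k;\,\alpha/2} = t_{12;\,0.025} = 2.179$$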

Page 35: Discovering and Describing Relationships Farideh Dehkordi-Vakil

Example: Japanese Exchange Rate

Since

$$|t| = 0.899 < t_{12;\,0.025} = 2.179$$

we do not reject H0; therefore seasonality does not appear to be an attribute of the data.
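The seasonality check can be reproduced numerically. The ACF table lists the lag 12 coefficient as -.2593, while the slide's computation uses .2595, so the sketch below tries both; n = 24, k = 12 and the critical value 2.179 are taken from the slides, and the variable names are illustrative.

```python
import math

n, k = 24, 12
t_critical = 2.179  # t table, 12 degrees of freedom, alpha/2 = 0.025

decisions = []
for r_12 in (-0.2593, 0.2595):
    t = (r_12 - 0) / (1 / math.sqrt(n - k))
    reject = t > t_critical or t < -t_critical
    decisions.append(reject)
    print(f"r_12 = {r_12:+.4f}  t = {t:+.3f}  reject H0: {reject}")
```

Either value gives a t statistic of about 0.9 in absolute value, well inside the critical limits, so the decision not to reject H0 does not depend on which figure is used.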