Problem Set I: Review. Intro, Measures of Central Tendency & Variability, Z-scores and the Normal Distribution, Correlation, and Regression

Upload: borna

Post on 06-Feb-2016



Page 1: Problem Set I: Review

Problem Set I: Review. Intro, Measures of Central Tendency & Variability, Z-scores and the Normal Distribution, Correlation, and Regression


QUESTION 1: Short answer: What is a statistic? Give a definition and an example. Explain how the example illustrates the definition you have provided.

A statistic is a number that organizes, summarizes, and makes understandable a collection of data. An example of a statistic is the mean.

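The mean is a single number calculated from a set of data that describes the typical value of the whole collection, so we do not have to report every value individually. Because it summarizes the data and makes them easier to understand, the mean fits the definition of a statistic.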


QUESTION 2: A psychologist interested in the dating habits of undergraduates in the Psychology major samples 10 students and determines the number of dates they have had in the last six months. He knows that the mean number of dates is 7.8, and the sum of squares (SS) is 223.2. Assume a normal distribution.

A. What percentage of all undergraduate students went on less than 4 dates in the last six months?

B. If the psychologist had 10 students total, approximately how many of these students went on between 8 and 13 dates in the last six months?

In order to draw ANY conclusions about proportions, we need to use the z-table. To use z-scores, we need the mean ($\bar{x}$) and standard deviation ($s$):

$\bar{x} = 7.8 \qquad s = 4.98$

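Turn 4 into a z-score: (4-7.8)/4.98 = -.76. After shading in the distribution, it’s clear that “less than 4 dates” refers to an AREA C (the tail beyond the z-score).

Because the normal distribution is symmetric, the AREA C for z = -.76 equals the AREA C for z = .76, which is .2236, or 22.36%.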

The standard deviation comes from the sum of squares:

$$s = \sqrt{\frac{SS}{N-1}} = \sqrt{\frac{223.2}{9}} = 4.98$$
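As a cross-check on part A, here is a minimal Python sketch that recomputes s from SS and uses the standard library's `statistics.NormalDist` as a stand-in for the printed z-table. The variable names are illustrative; rounding z to two decimals mimics looking it up in the table.

```python
from math import sqrt
from statistics import NormalDist

# Summary values given in Question 2
n = 10        # students sampled
mean = 7.8    # mean number of dates
ss = 223.2    # sum of squares

# Sample standard deviation: s = sqrt(SS / (N - 1))
s = sqrt(ss / (n - 1))

# z-score for 4 dates, rounded to two decimals as a z-table lookup would be
z = round((4 - mean) / s, 2)

# Area below z (left tail); by symmetry this equals AREA C for |z|
area_below = NormalDist().cdf(z)

print(f"s = {s:.2f}, z = {z:.2f}, proportion below = {area_below:.4f}")
```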


After shading in the distribution, it’s clear that “between 8 and 13” refers to a portion of the distribution that we can only find by combining areas from the table.

Turn 13 into a z-score: (13-7.8)/4.98 = 1.04

Turn 8 into a z-score: (8-7.8)/4.98 = .04

The AREA B for z= 1.04 is .3508

The AREA B for z= .04 is .0160

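Both z-scores are above the mean, so the area between them is the larger AREA B minus the smaller one: .3508 - .0160 = .3348. With 10 students, 10(.3348) = 3.35 students (approximately 3).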
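The same subtraction can be sketched in Python, again assuming `statistics.NormalDist` in place of the z-table. Subtracting two cumulative areas is equivalent to subtracting the two AREA B values, because both scores lie above the mean. The exact normal areas differ from the four-digit table values only in the last decimal place.

```python
from statistics import NormalDist

mean, s = 7.8, 4.98   # values computed earlier in Question 2
n_students = 10

# z-scores for the two raw scores, rounded to two decimals as in the z-table
z_low = round((8 - mean) / s, 2)    # 0.04
z_high = round((13 - mean) / s, 2)  # 1.04

std_normal = NormalDist()
# Area between the two z-scores = difference of the two cumulative areas
proportion = std_normal.cdf(z_high) - std_normal.cdf(z_low)
expected = n_students * proportion

print(f"proportion = {proportion:.4f}, expected students = {expected:.2f}")
```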


C. What is the number of dates one must have gone on in the last six months in order to be in the top 2.5%?

After shading in the distribution, it’s clear that the top 2.5% is in the right tail of the distribution, and extends from some z-score and beyond (this means it’s an AREA C).

We need the z-score which has an AREA C closest to .0250 without going over. We find an AREA C of EXACTLY .0250 for a z-score of 1.96.

We need to turn this into a raw score, in other words, a number of dates.

$$\text{Raw Score} = \bar{x} + z(s) = 7.8 + 1.96(4.98) = 7.8 + 9.76 = 17.56 \text{ dates}$$
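Going from a percentage back to a raw score is the inverse of the table lookup. A minimal sketch, assuming `statistics.NormalDist.inv_cdf` as a substitute for scanning the table for AREA C = .0250:

```python
from statistics import NormalDist

mean, s = 7.8, 4.98   # values from Question 2

# z-score that cuts off the top 2.5%: the 97.5th percentile of the standard normal
z = NormalDist().inv_cdf(0.975)

# Convert back to a raw score: X = mean + z * s (z rounded as in the table)
raw = mean + round(z, 2) * s

print(f"z = {z:.2f}, raw score = {raw:.2f} dates")
```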


QUESTION 3: A researcher in a learning laboratory believes that the amount of water a rat drinks before entering a maze will affect how well the rat performs in the maze. He records the amount of water consumed by each of his 4 rats (in ounces) and then puts them each into a maze and records how long it takes each rat to complete the maze (in seconds). He then calculates the correlation coefficient between these two variables, which is .48. His data can be found below:

Water consumed (oz)    Maze completion time (sec)
4                      7
2                      8
7                      15
1                      12

A. As practice, find the correlation coefficient of this data by hand. Confirm that it does indeed come out to be .48.

B. Write out the equation of the regression line for predicting maze performance from amount of water consumed.

        x (water)   y (time)   x^2   y^2   xy
        4           7          16    49    28
        2           8          4     64    16
        7           15         49    225   105
        1           12         1     144   12
Sums    14          42         70    482   161

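Raw Score Method

$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$

$$r = \frac{161 - \frac{(14)(42)}{4}}{\sqrt{\left(70 - \frac{14^2}{4}\right)\left(482 - \frac{42^2}{4}\right)}} = \frac{161 - 147}{\sqrt{(70 - 49)(482 - 441)}} = \frac{14}{\sqrt{(21)(41)}} = \frac{14}{29.34} = .48$$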
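The raw-score formula can be checked with a short Python sketch that builds the same column sums from the four data pairs. The list names are illustrative; only plain arithmetic is used.

```python
from math import sqrt

water = [4, 2, 7, 1]     # x: ounces consumed
time = [7, 8, 15, 12]    # y: seconds to complete the maze
n = len(water)

sum_x, sum_y = sum(water), sum(time)
sum_x2 = sum(x * x for x in water)
sum_y2 = sum(y * y for y in time)
sum_xy = sum(x * y for x, y in zip(water, time))

# Raw-score formula for Pearson's r
numerator = sum_xy - sum_x * sum_y / n
denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 14 42 70 482 161
print(f"r = {r:.2f}")                          # r = 0.48
```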

        Water (x)   Maze time (y)
Mean    3.50        10.50
s       2.65        3.70

Water (x)   Maze time (y)   Zx      Zy      ZxZy
4.00        7.00            0.19    -0.95   -0.18
2.00        8.00            -0.57   -0.68   0.38
7.00        15.00           1.32    1.22    1.61
1.00        12.00           -0.94   0.41    -0.38

ΣZxZy = 1.43

Z-score Method

$$r = \frac{\sum Z_x Z_y}{n - 1} = \frac{1.43}{3} = .48$$
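The z-score method can be sketched the same way. This assumes Python's `statistics.stdev`, which divides by N - 1 and therefore matches the s values used on the slides; the small differences from the table above come from the table rounding each z-score to two decimals.

```python
from statistics import mean, stdev

water = [4, 2, 7, 1]
time = [7, 8, 15, 12]
n = len(water)

# stdev() uses the N - 1 denominator, matching s on the slides
mx, sx = mean(water), stdev(water)
my, sy = mean(time), stdev(time)

# Convert every score to a z-score and multiply the pairs
products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(water, time)]

# Z-score formula: r = sum(Zx * Zy) / (n - 1)
r = sum(products) / (n - 1)
print(f"sum ZxZy = {sum(products):.2f}, r = {r:.2f}")
```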

$$b = r\left(\frac{s_y}{s_x}\right) = .48\left(\frac{3.70}{2.65}\right) = .67$$

$$a = \bar{y} - b\bar{x} = 10.50 - .67(3.50) = 8.16$$

$$\hat{y} = .67x + 8.16$$
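A minimal sketch of the slope and intercept step, assuming the rounded summary values from the slide. Because b is rounded to .67 before computing a, the intercept comes out at 8.155, which the slide rounds to 8.16; using the unrounded slope would shift the last digit slightly.

```python
r = 0.48                        # correlation from part A
mean_x, s_x = 3.50, 2.65        # water consumed (oz)
mean_y, s_y = 10.50, 3.70       # maze completion time (sec)

# Slope and intercept for predicting y (time) from x (water)
b = round(r * (s_y / s_x), 2)   # b = r(sy/sx), rounded as on the slide
a = mean_y - b * mean_x         # a = ybar - b * xbar

print(f"y-hat = {b:.2f}x + {a:.3f}")
```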

C. Make a prediction of how long it would take a rat that drank 10oz of water to complete this maze.

D. Write out the equation of the regression line to predict amount of water consumed from time spent to complete the maze.

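C. $\hat{y} = .67x + 8.16 = .67(10) + 8.16 = 14.86$ seconds

D. To predict water consumed (x) from completion time (y), the roles of the two variables are swapped:

$$b = r\left(\frac{s_x}{s_y}\right) = .48\left(\frac{2.65}{3.70}\right) = .34$$

$$a = \bar{x} - b\bar{y} = 3.50 - .34(10.50) = -.07$$

$$\hat{x} = .34y - .07$$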
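Parts C and D can be reproduced with a short Python sketch. The helper `predict` is a hypothetical name introduced here for illustration; part C plugs into the equation from part B, and part D recomputes the slope with the standard deviations swapped.

```python
def predict(b, a, value):
    """Apply a regression line of the form prediction = b * value + a."""
    return b * value + a

# C. Equation from part B (time predicted from water), at 10 oz of water
pred_time = predict(0.67, 8.16, 10)

# D. Reverse the roles: predict water (x) from time (y)
r = 0.48
mean_water, s_water = 3.50, 2.65
mean_time, s_time = 10.50, 3.70
b_water = round(r * (s_water / s_time), 2)   # b = r(sx/sy)
a_water = mean_water - b_water * mean_time    # a = xbar - b * ybar

print(f"C: {pred_time:.2f} seconds")
print(f"D: x-hat = {b_water:.2f}y + ({a_water:.2f})")
```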

E. What kind of relationship exists between water consumption and maze completion speed? Is it better for the rats to have consumed a lot of water prior to entering the maze, or does it hinder their performance?

F. Calculate the coefficient of determination. What does this value tell you about how well you are or are not able to make an accurate prediction using this regression line.

Since there is a moderate positive correlation between water consumption and maze completion time, rats that drank more water tended to take longer to complete the maze. It therefore seems better for the rats not to consume a lot of water, so that their completion time is quicker.

r-squared is (.48)(.48) = .2304, which means that only about 23% of the variance in completion time is accounted for by water consumed; the remaining 77% is not. Because the regression line explains so little of the variation, predictions made with it are likely not very accurate.


QUESTION 4: Below is a sample of scores on a new version of an IQ test. The range of possible points on this test is 0-100.

Name    Score
Maria   78
John    90
David   50
Julia   65
Marta   100

A. Calculate the mean, standard deviation, and variance of these scores. (do this by hand, show your work)

B. What is the z-score obtained by Julia, and what does this z-score tell us about her grade?

Name    Score (x)   x^2
Maria   78          6084
John    90          8100
David   50          2500
Julia   65          4225
Marta   100         10000
Sums    383         30909

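Raw score method:

$$\bar{x} = \frac{\sum x}{N} = \frac{383}{5} = 76.6$$

$$SS = \sum x^2 - \frac{(\sum x)^2}{N} = 30909 - \frac{383^2}{5} = 30909 - \frac{146689}{5} = 30909 - 29337.8 = 1571.2$$

$$s^2 = \frac{SS}{N-1} = \frac{1571.2}{4} = 392.8 \qquad s = \sqrt{392.8} = 19.82$$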
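The raw-score computation can be sketched in a few lines of Python. The dictionary of names and scores is just a convenient container; the sample variance uses the N - 1 denominator, as on the slide.

```python
from math import sqrt

scores = {"Maria": 78, "John": 90, "David": 50, "Julia": 65, "Marta": 100}
x = list(scores.values())
n = len(x)

sum_x = sum(x)                    # 383
sum_x2 = sum(v * v for v in x)    # 30909
mean = sum_x / n

# Raw-score (computational) formula: SS = sum(x^2) - (sum x)^2 / N
ss = sum_x2 - sum_x ** 2 / n
variance = ss / (n - 1)           # sample variance, N - 1 denominator
sd = sqrt(variance)

print(f"mean = {mean:.1f}, SS = {ss:.1f}, s^2 = {variance:.1f}, s = {sd:.2f}")
```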

Deviation Method:

$$\bar{x} = \frac{\sum x}{N} = \frac{383}{5} = 76.6$$

Name    Score (x)   xbar   x-xbar   (x-xbar)^2
Maria   78          76.6   1.4      1.96
John    90          76.6   13.4     179.56
David   50          76.6   -26.6    707.56
Julia   65          76.6   -11.6    134.56
Marta   100         76.6   23.4     547.56
Sum                                 1571.2

$$SS = \sum (x - \bar{x})^2 = 1571.2 \qquad s^2 = \frac{SS}{N-1} = \frac{1571.2}{4} = 392.8 \qquad s = \sqrt{392.8} = 19.82$$

$$z = \frac{x - \bar{x}}{s} = \frac{65 - 76.6}{19.82} = -.59$$

Julia’s z-score is negative, indicating that she scored below the mean of this sample; specifically, her score is .59 standard deviations below the mean.
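The deviation method and Julia's z-score can be sketched in Python as follows. It squares each score's distance from the mean exactly as the table does, so SS should agree with the raw-score method.

```python
from math import sqrt

scores = {"Maria": 78, "John": 90, "David": 50, "Julia": 65, "Marta": 100}
n = len(scores)
mean = sum(scores.values()) / n

# Deviation method: square each score's distance from the mean, then sum
squared_devs = {name: (x - mean) ** 2 for name, x in scores.items()}
ss = sum(squared_devs.values())
sd = sqrt(ss / (n - 1))

# z-score for Julia: how many standard deviations she is from the mean
z_julia = (scores["Julia"] - mean) / sd
print(f"SS = {ss:.1f}, s = {sd:.2f}, z(Julia) = {z_julia:.2f}")
```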


QUESTION 4: Below is a sample of scores on a new version of an IQ test. The range of possible points on this test is 0-100.

Suppose you want to know if this IQ test is in any way related to the old IQ test, so you administer a version of the old test to each of these individuals. The following are their scores on the old IQ test:

Name    Score
Maria   110
John    130
David   70
Julia   90
Marta   160

C. Is there a relationship between the scores on the old test and the scores on the new test? In other words, does the new test seem to be measuring IQ in the same way? Describe the relationship.

        x (NEW)   y (OLD)   x^2     y^2     xy
        78        110       6084    12100   8580
        90        130       8100    16900   11700
        50        70        2500    4900    3500
        65        90        4225    8100    5850
        100       160       10000   25600   16000
Sums    383       560       30909   67600   45630

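Raw Score Method

$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$

$$r = \frac{45630 - \frac{(383)(560)}{5}}{\sqrt{\left(30909 - \frac{383^2}{5}\right)\left(67600 - \frac{560^2}{5}\right)}} = \frac{45630 - 42896}{\sqrt{(30909 - 29337.8)(67600 - 62720)}}$$

$$r = \frac{2734}{\sqrt{(1571.2)(4880)}} = \frac{2734}{\sqrt{7667456}} = \frac{2734}{2769.02} = .99$$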
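A minimal Python sketch of the same raw-score correlation, computing SS for each test and the sum of cross-products separately. The list names `new` and `old` are illustrative.

```python
from math import sqrt

new = [78, 90, 50, 65, 100]    # scores on the new test (x)
old = [110, 130, 70, 90, 160]  # scores on the old test (y)
n = len(new)

sum_x, sum_y = sum(new), sum(old)
sum_x2 = sum(x * x for x in new)
sum_y2 = sum(y * y for y in old)
sum_xy = sum(x * y for x, y in zip(new, old))

ss_x = sum_x2 - sum_x ** 2 / n   # SS for the new test, about 1571.2
ss_y = sum_y2 - sum_y ** 2 / n   # SS for the old test, 4880
sp = sum_xy - sum_x * sum_y / n  # sum of cross-products, 2734
r = sp / sqrt(ss_x * ss_y)

print(f"r = {r:.2f}")
```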

Yes, there is a strong positive correlation (r = .99) between the two versions of the test: the higher the score on the old version, the higher the score on the new version. Thus it seems that the two tests are measuring IQ in the same way.


QUESTION 5: Over the years, my students have informed me that they feel as though I seem to grade paper assignments according to their length. To assess this relationship, I decide to perform a correlational analysis on the number of pages of 12 papers and the grades I assigned to them. I find that the correlation coefficient (r) is -.90. The following is also known:

         Page length   Grade
Σx       89            971
Σx^2     805           79717

Suppose a student had access to this information and wanted to predict their grade for an upcoming paper. Their paper is 3 pages long.

A. Write out the equation of the regression line to predict grade from paper length.

B. Predict the grade for this student whose paper is 3 pages long.

C. If someone received a grade of 100 on their paper, predict the number of pages of their paper (this will involve multiple steps; i.e., find the equation of the regression line first, then plug in to make a prediction).

Means

Page length: $\bar{x} = \frac{\sum x}{N} = \frac{89}{12} = 7.42$

Grade: $\bar{y} = \frac{\sum y}{N} = \frac{971}{12} = 80.92$

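Standard Deviation (s)

We know that

$$SS = \sum x^2 - \frac{(\sum x)^2}{N} \qquad \text{and} \qquad s = \sqrt{\frac{SS}{N-1}}$$

Page length:

$$SS_x = 805 - \frac{89^2}{12} = 805 - \frac{7921}{12} = 805 - 660.08 = 144.92 \qquad s_x = \sqrt{\frac{144.92}{11}} = \sqrt{13.17} = 3.63$$

Grade:

$$SS_y = 79717 - \frac{971^2}{12} = 79717 - \frac{942841}{12} = 79717 - 78570.08 = 1146.92 \qquad s_y = \sqrt{\frac{1146.92}{11}} = \sqrt{104.27} = 10.21$$

        Page length (x)   Grade (y)
Mean    7.42              80.92
s       3.63              10.21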
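Because only the sums are given, the means and standard deviations must come from the computational formulas. A minimal Python sketch, where `describe` is a hypothetical helper name:

```python
from math import sqrt

def describe(sum_x, sum_x2, n):
    """Return (mean, SS, s) computed from summary sums only."""
    mean = sum_x / n
    ss = sum_x2 - sum_x ** 2 / n    # SS = sum(x^2) - (sum x)^2 / N
    sd = sqrt(ss / (n - 1))         # s = sqrt(SS / (N - 1))
    return mean, ss, sd

pages = describe(89, 805, 12)       # page length
grades = describe(971, 79717, 12)   # grade

for name, (m, ss, sd) in [("page length", pages), ("grade", grades)]:
    print(f"{name}: mean = {m:.2f}, SS = {ss:.2f}, s = {sd:.2f}")
```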

A. $b = r\left(\frac{s_y}{s_x}\right) = -.90\left(\frac{10.21}{3.63}\right) = -2.53$

$a = \bar{y} - b\bar{x} = 80.92 - (-2.53)(7.42) = 99.69$

$\hat{y} = -2.53x + 99.69$

B. $\hat{y} = -2.53(3) + 99.69 = 92.1$
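Parts A and B can be sketched in Python from the rounded summary statistics above; `predict_grade` is a hypothetical helper name. Rounding b to two decimals before computing a follows the slide.

```python
r = -0.90
mean_pages, s_pages = 7.42, 3.63
mean_grade, s_grade = 80.92, 10.21

# Predict grade (y) from page length (x)
b = round(r * (s_grade / s_pages), 2)   # b = r(sy/sx), rounded as on the slide
a = mean_grade - b * mean_pages         # a = ybar - b * xbar

def predict_grade(pages):
    """Predicted grade for a paper with the given number of pages."""
    return b * pages + a

print(f"y-hat = {b:.2f}x + {a:.2f}")
print(f"3 pages -> {predict_grade(3):.1f}")
```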

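C. Now grade is the predictor and page length is the predicted variable, so the roles of x and y are swapped:

$b = r\left(\frac{s_x}{s_y}\right) = -.90\left(\frac{3.63}{10.21}\right) = -.32$

$a = \bar{x} - b\bar{y} = 7.42 - (-.32)(80.92) = 33.31$

$\hat{x} = -.32y + 33.31$

$\hat{x} = -.32(100) + 33.31 = 1.31$ pages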
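Part C reverses the regression. A minimal sketch under the same rounding convention as the slide, with illustrative variable names:

```python
r = -0.90
mean_pages, s_pages = 7.42, 3.63
mean_grade, s_grade = 80.92, 10.21

# Reverse the roles: predict page length (x) from grade (y)
b = round(r * (s_pages / s_grade), 2)   # b = r(sx/sy)
a = mean_pages - b * mean_grade         # a = xbar - b * ybar
pages_for_100 = b * 100 + a

print(f"x-hat = {b:.2f}y + {a:.2f}; grade 100 -> {pages_for_100:.2f} pages")
```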


QUESTION 6: You are collecting IQ data from a sample of 20 of your classmates. You record the following IQ scores:

IQ = {120, 110, 120, 100, 120, 130, 100, 110, 130, 120, 80, 140, 110, 90, 70, 120, 120, 110, 130, 140}

A. Describe the shape of the distribution of IQ scores.

B. Find the Mean, Median, and Mode. Use these values to support your judgment of the distribution’s shape in part A.

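The distribution is unimodal and negatively skewed: most scores cluster between 110 and 130, with a longer tail toward the low scores (70, 80, and 90).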

The mean of this distribution is 113.5, median and mode are both 120. The fact that the mean is smaller than the median supports the conclusion that the distribution is negatively skewed.
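The three measures of central tendency, plus a rough text histogram of the shape, can be sketched with Python's `statistics` and `collections.Counter`. The star histogram is only an illustrative way to see the low-score tail.

```python
from collections import Counter
from statistics import mean, median, mode

iq = [120, 110, 120, 100, 120, 130, 100, 110, 130, 120,
      80, 140, 110, 90, 70, 120, 120, 110, 130, 140]

m, md, mo = mean(iq), median(iq), mode(iq)
print(f"mean = {m}, median = {md}, mode = {mo}")

# Text histogram: one star per student at each score
for score, count in sorted(Counter(iq).items()):
    print(score, "*" * count)

# A mean pulled below the median by a tail of low scores suggests negative skew
if m < md:
    print("mean < median: consistent with a negatively skewed distribution")
```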