
LECTURE NOTES FOR MATH 21 - TUFTS UNIVERSITY 17

2.5. Measures of Central Tendency. The three most common measures of central tendency are the mean, the median, and the mode.

2.5.1. The arithmetic mean. $X_i$ are the data, $\sum X_i$ is the sum of these data, and $N$ is the size of these data.

$$\mu = \frac{\sum X_i}{N}$$

e.g. the mean of 1, 2, 3, 6, 8 is 20/5 = 4.
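As a quick check of the formula, here is a minimal Python sketch. The function name `arithmetic_mean` is ours and does not come from the notes.

```python
def arithmetic_mean(data):
    """Return the sum of the data divided by the number of data points."""
    return sum(data) / len(data)


scores = [1, 2, 3, 6, 8]
mu = arithmetic_mean(scores)  # 20 / 5
print(mu)  # 4.0
```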

2.5.2. The median. The median is the number such that the number of scores above it and below it are the same.

In the above table, the median is 20.

The median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5.

When there are numbers with the same values, then the median is equal to the 50th percentile.
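The rule for the median can be sketched in Python as follows. The helper name `median_of` is ours, and the code assumes the usual convention of averaging the two middle values when the number of data points is even, as in the 2, 4, 7, 12 example.

```python
def median_of(data):
    """Middle value of the sorted data; average of the two middle values if N is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 4, 7, 12]))  # 5.5
print(median_of([1, 2, 3, 6, 8]))  # 3
```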

2.5.3. Mode. The mode is the most frequently occurring value. For the data in Table 1, the mode is 18 since more teams had 18 touchdown passes than any other number.

The mode in the following table is (600+700)/2 = 650.


18 Y. LIU

The mean and median are the same when the distribution is symmetric, e.g. 1, 3, 4, 5, 6, 7, 9. (What happens if, say, we replace 9 by 10? The median does not change, but the mean changes.) Another example is the normal distribution (the bell-shaped distribution).
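The effect of replacing 9 by 10 can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are ours.

```python
from statistics import mean, median

symmetric = [1, 3, 4, 5, 6, 7, 9]
shifted = [1, 3, 4, 5, 6, 7, 10]  # 9 replaced by 10

# Symmetric data: the mean and the median agree (both equal 5).
print(mean(symmetric), median(symmetric))

# After the change the median is still 5, but the mean rises to 36/7.
print(mean(shifted), median(shifted))
```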

2.6. Comparing Measures of Central Tendency. The mean and median are the same when the distribution is symmetric. What happens when the data/distribution is not symmetric?

Example 2.17. Figure 1 shows the distribution of 642 scores on an introductory psychology test.

A distribution with positive skew is a distribution whose right tail is longer than its left tail.

Mode: 84.00; Median: 90.00; Mean: 91.58.


When distributions have a positive skew, the mean is typically higher than the median. Explain using the data: 3, 4, 4, 4, 5, 5, 5, 5, 6, 7, 8, 9.
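One way to work through this data set is sketched below in Python. The few large values (7, 8, 9) pull the mean upward, while the median stays at the middle of the sorted list.

```python
from statistics import mean, median

data = [3, 4, 4, 4, 5, 5, 5, 5, 6, 7, 8, 9]

m = mean(data)      # 65 / 12, about 5.42
med = median(data)  # average of the 6th and 7th sorted values: (5 + 5) / 2 = 5

print(round(m, 2), med)
print(m > med)  # True: the long right tail raises the mean above the median
```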

Example 2.18. Baseball salaries.

Example 2.19. What is the mean of 2, 4, 6, and 8? Answer: 5.

Example 2.20. What is the median of -2, 4, 0, 3, and 8? Answer: 3

Example 2.21. What is the mode of -2, 4, 0, 3, 0, 2, 4, 4, and 8? Answer: 4.
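A minimal sketch of finding the mode by counting occurrences, applied to the data of Example 2.21. The helper `modes` is our own name; it returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often (ties give several values)."""
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes([-2, 4, 0, 3, 0, 2, 4, 4, 8]))  # [4]
```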

Example 2.22. Tom’s test scores on his six tests are 95, 80, 75, 97, 75, 88. Which measure of central tendency would be the highest?

Answer: Mean = 85, Median = 84, Mode = 75, so the mean of his scores is the highest.
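For reference, the arithmetic behind these values, with the scores sorted as 75, 75, 80, 88, 95, 97:

$$\text{Mean} = \frac{95 + 80 + 75 + 97 + 75 + 88}{6} = \frac{510}{6} = 85, \qquad \text{Median} = \frac{80 + 88}{2} = 84.$$

The mode is 75 because it is the only score that appears twice.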

Example 2.23. Jane’s test scores on her five tests are 90, 87, 70, 97, and 75. Her teacher is going to take the median of the test grades to calculate her final grade. Jane thinks she can argue and get two points back on some of the tests. Which test score(s) should she argue?


Answer: If the teacher is going to use the median as the final grade, she should only argue the middle score (87). Changing the other scores by 2 points would not affect the median.
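The answer can be checked by brute force: add 2 points to one test at a time and recompute the median. This is a small sketch with our own variable names.

```python
from statistics import median

scores = [90, 87, 70, 97, 75]
baseline = median(scores)  # 87

# Map each original score to the median obtained after raising it by 2 points.
effect = {}
for i, score in enumerate(scores):
    adjusted = scores.copy()
    adjusted[i] = score + 2
    effect[score] = median(adjusted)

print(baseline)
print(effect)  # only the entry for 87 differs from the baseline
```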

2.7. Measures of Variability. Variability refers to how “spread out” a group of scores is.

Example 2.24. Following are Quiz 1 scores and Quiz 2 scores.

Quiz score    # of people
5             2
6             6
7             5
8             4
9             3

Q: Find the mean score for each quiz. Answer: (5×2 + 6×6 + 7×5 + 8×4 + 9×3)/(2 + 6 + 5 + 4 + 3) = 7.

Both mean scores are 7. But the scores on Quiz 1 are more densely packed and those on Quiz 2 are more spread out. The differences among students were much greater on Quiz 2 than on Quiz 1. The mean does not capture this difference.
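The mean of a frequency table weights each score by the number of people who received it. A minimal sketch using the counts from Example 2.24 (the dictionary layout is our choice):

```python
# Frequency table from Example 2.24: score -> number of people.
frequencies = {5: 2, 6: 6, 7: 5, 8: 4, 9: 3}

total_points = sum(score * count for score, count in frequencies.items())  # 140
n_people = sum(frequencies.values())                                       # 20
mean_score = total_points / n_people

print(mean_score)  # 7.0
```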


Definition 2.25. Range: the highest score minus the lowest score (a simple measure of variability). e.g. 99, 45, 23, 67, 45, 91, 82, 78, 62, 51. Range: 99 - 23 = 76.
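In code the range is a one-line calculation. A short sketch for the ten scores in the definition:

```python
scores = [99, 45, 23, 67, 45, 91, 82, 78, 62, 51]

score_range = max(scores) - min(scores)  # 99 - 23
print(score_range)  # 76
```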

Range of Quiz 1: 4. Range of Quiz 2: 6.

Definition 2.26. Interquartile Range (IQR): the range of the middle 50% of the scores in a distribution. It is computed as follows: IQR = 75th percentile - 25th percentile.

Example 2.27. For Quiz 1, the 75th percentile is 8 and the 25th percentile is 6. The interquartile range is therefore 2. For Quiz 2, which has greater spread, the 75th percentile is 9, the 25th percentile is 5, and the interquartile range is 4.
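The Quiz 1 figures can be reproduced with a short Python sketch. Two assumptions are made here: that the frequency table in Example 2.24 is the Quiz 1 data (its range of 4 is consistent with this), and that percentiles are computed with the default method of `statistics.quantiles`. Other percentile conventions can give slightly different cut points on other data sets.

```python
from statistics import quantiles

# Assumption: the frequency table of Example 2.24 holds the Quiz 1 scores.
frequencies = {5: 2, 6: 6, 7: 5, 8: 4, 9: 3}
scores = [score for score, count in frequencies.items() for _ in range(count)]

# n=4 returns the three quartile cut points: 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 6.0 8.0 2.0
```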

2.7.1. Variance. The most commonly used measure of variability is the variance.

Definition 2.28. The variance of a sample is the mean of the squared deviations from the mean.

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2.$$

Here $\mu$ is the mean of the sample (not the expected value). Note: $\sigma^2$ is biased. (Biased in the sense that if we repeat this $N$-step experiment and calculate $\sigma^2$ for each experiment, the average value of $\sigma^2$ does not approach the variance of the population.)

The unbiased variance is:

$$s^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (X_i - \mu)^2.$$

The standard deviation is the square root of the variance.
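The bias of the divide-by-$N$ estimate can be illustrated with a small simulation. The sketch below is not part of the notes: it assumes a fair six-sided die as the population (variance 35/12), a sample size of 5, and a fixed random seed, all chosen only for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 5            # sample size of each experiment
TRIALS = 20000   # number of repeated experiments
POPULATION_VARIANCE = 35 / 12  # variance of a fair six-sided die (assumed population)


def sum_squared_deviations(sample):
    """Sum of (x - sample mean)^2 over the sample."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


total = 0.0
for _ in range(TRIALS):
    sample = [random.randint(1, 6) for _ in range(N)]
    total += sum_squared_deviations(sample)

biased_average = total / (TRIALS * N)          # average of the divide-by-N estimates
unbiased_average = total / (TRIALS * (N - 1))  # average of the divide-by-(N-1) estimates

print(biased_average)    # close to (N - 1) / N * 35/12, about 2.33
print(unbiased_average)  # close to 35/12, about 2.92
```

With these settings the divide-by-$N$ average settles near $(N-1)/N$ times the population variance, while the divide-by-$(N-1)$ average settles near the population variance itself.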


Example 2.29. Score sample: 1, 2, 4, and 5. Then $\mu = 3$, $s^2 = 10/3$, and $\sigma^2 = 10/4 = 2.5$.
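Both values can be checked with Python's `statistics` module, where `pvariance` divides by $N$ and `variance` divides by $N - 1$. A brief sketch:

```python
from statistics import pvariance, variance

sample = [1, 2, 4, 5]  # squared deviations from the mean 3 are 4, 1, 1, 4

sigma_sq = pvariance(sample)  # divides by N:     10 / 4
s_sq = variance(sample)       # divides by N - 1: 10 / 3

print(sigma_sq)        # 2.5
print(round(s_sq, 4))  # 3.3333
```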

Example 2.30. For Quiz 1, $\sigma^2 = 1.5$. The same calculation for Quiz 2 shows that its variance is $\sigma^2 = 6.7$.

The standard deviations of the two quiz distributions are 1.225 and 2.588.
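The Quiz 1 numbers can be reproduced from a frequency table. The sketch below assumes that the table in Example 2.24 is the Quiz 1 data; the Quiz 2 scores are not listed in these notes, so only Quiz 1 is checked.

```python
import math

# Assumption: the frequency table of Example 2.24 holds the Quiz 1 scores.
frequencies = {5: 2, 6: 6, 7: 5, 8: 4, 9: 3}

n = sum(frequencies.values())
mu = sum(score * count for score, count in frequencies.items()) / n

# Population variance: each squared deviation is weighted by its frequency.
sigma_sq = sum(count * (score - mu) ** 2 for score, count in frequencies.items()) / n
sigma = math.sqrt(sigma_sq)

print(sigma_sq, round(sigma, 3))  # 1.5 1.225
```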


2.8. Shapes of Distributions. First explain what positive skew means.

Skew is an important index of the shape of a distribution.

2.8.1. Quantify the skew. 1. Pearson’s measure of skew:

$$\frac{3(\text{mean} - \text{median})}{\sigma}$$

e.g. in the baseball salaries example, Pearson’s measure of skew is

$$\frac{3(1{,}183{,}417 - 500{,}000)}{1{,}390{,}922} = 1.47.$$
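The baseball calculation can be reproduced in a few lines. In this sketch the divisor is taken to be the standard deviation of the salaries, which is the role the figure 1,390,922 plays in the example; the function name `pearson_skew` is ours.

```python
def pearson_skew(mean, median, std_dev):
    """Pearson's measure of skew: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


skew = pearson_skew(mean=1_183_417, median=500_000, std_dev=1_390_922)
print(round(skew, 2))  # 1.47
```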

2. A more commonly used measure of skew:

$$\sum \frac{(X_i - \mu)^3}{\sigma^3}.$$

3. Kurtosis (a measure of how heavy the tails are, rather than of skew):

$$\sum \frac{(X_i - \mu)^4}{\sigma^4} - 3.$$
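A hedged sketch of the two moment-based measures follows. It assumes that the sums are averaged over the $N$ data points, as in the usual definition of standardized moments; the formulas as written here do not show that factor, so treat the division by $N$ as our assumption. The helper name `standardized_moment` is also ours.

```python
def standardized_moment(data, k):
    """Average of ((x - mu) / sigma) ** k, using the divide-by-N standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    return sum(((x - mu) / sigma) ** k for x in data) / n


symmetric = [1, 3, 4, 5, 6, 7, 9]
right_skewed = [3, 4, 4, 4, 5, 5, 5, 5, 6, 7, 8, 9]

skew_symmetric = standardized_moment(symmetric, 3)        # zero up to rounding error
skew_right = standardized_moment(right_skewed, 3)         # positive: longer right tail
excess_kurtosis = standardized_moment(symmetric, 4) - 3   # the "- 3" term from the formula

print(skew_symmetric, skew_right, excess_kurtosis)
```

The symmetric data give a skew of zero and the right-skewed data give a positive value, matching the meaning of positive skew described earlier.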

3. Describing Bivariate Data

Describe the relationship between two variables. e.g. relationship between heights and weights of people. e.g. relationship between high school grade point average and SAT score.

3.1. Introduction.

Example 3.1. Relationship between the ages of a couple.

If we look at the individual distribution:


The information about the relationship between the two variables is lost.

We can plot the data in the plane to maintain the pairing.

First, it is clear that there is a strong relationship between the husband’s age and the wife’s age: the older the husband, the older the wife.

Second, the points cluster along a straight line. When this occurs, the relationship is called a linear relationship.

Note: Not all scatter plots show linear relationships.

3.2. Pearson’s r.


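Definition 3.2. The correlation coefficient of two variables (Pearson’s r):

$$r = \frac{\sum (X_i - \mu_X)(Y_i - \mu_Y)}{\sqrt{\left(\sum (X_i - \mu_X)^2\right)\left(\sum (Y_i - \mu_Y)^2\right)}} = \frac{\sum X_i Y_i - N \mu_X \mu_Y}{\sqrt{\left(\sum X_i^2 - N \mu_X^2\right)\left(\sum Y_i^2 - N \mu_Y^2\right)}}.$$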

Example 3.3. Calculate Pearson’s r for two variables.

In the above table $x_i = X_i - \mu_X$ and $y_i = Y_i - \mu_Y$.

$$r = \frac{30}{\sqrt{16 \cdot 60}} = 0.968.$$

Pearson’s r is a measure of the strength of the linear relationship between two variables.

It is symmetric in the two variables. Range: -1 to 1. r close to -1 indicates a negative linear relationship between variables. r close to 1 indicates a positive linear relationship between variables. r close to zero indicates no linear relationship.
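The deviation form of the definition can be sketched in Python as below. The data table of Example 3.3 is not reproduced in these notes, so the lists `X` and `Y` are illustrative values of our own, chosen so that the three sums come out to 30, 16, and 60 as in the example. The function name `pearson_r` is also ours.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    sum_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mu_x) ** 2 for x in xs)
    sum_yy = sum((y - mu_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Illustrative data (an assumption, not taken from the notes):
# the deviation sums are 30 (xy), 16 (x squared), and 60 (y squared).
X = [1, 3, 5, 5, 6]
Y = [4, 6, 10, 12, 13]

r = pearson_r(X, Y)
print(round(r, 3))  # 0.968
```

Swapping the arguments, `pearson_r(Y, X)`, gives the same value, which is the symmetry property noted above.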

Relationship between age and sleep. Do they have a linear relationship?
