statistics lecture notes dr. halil İbrahim cebecİ chapter 03 numerical descriptive techniques

38
Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Upload: dayna-maxwell

Post on 19-Jan-2016

232 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Statistics Lecture Notes

Dr. Halil İbrahim CEBECİ

Chapter 03Numerical

Descriptive Techniques

Page 2: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Measures of Central Location Mean, Median, Mode

Measures of Variability Range, Standard Deviation, Variance, Coefficient of

Variation

Measures of Relative Standing Percentiles, Quartiles

Measures of Linear Relationship Covariance, Correlation, Least Squares Line

Numerical Descriptive Techniques

Statistics Lecture Notes – Chapter 03

Page 3: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Arithmetic Mean

It is computed by simply adding up all the observations and dividing by the total number of observations

Measures of Central Location

Statistics Lecture Notes – Chapter 03

N

xN

ii

1𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝑀𝑒𝑎𝑛 :

𝑆𝑎𝑚𝑝𝑙𝑒𝑀𝑒𝑎𝑛 :n

xx

n

ii

1

is seriously affected by extreme values called “outliers”.

E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously

Page 4: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.1 – Weights of the students of a classroom in elementary school are given below. Calculate the arithmetic mean. If a new student with weight of 140lbs came in, what the new mean of the classroom would be. Discuss the validity of results.

Arithmetic Mean

Statistics Lecture Notes – Chapter 03

89 77 90 101 66 66 76 59 64 75

88 65 75 72 70 64 66 68 82 80

N

xN

ii

1 ¿89+77+90+…+80

20=74,65

¿89+77+90+…+80+140

21=77 ,77

Page 5: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Median

The value that falls in the middle of the pre-ordered (ascending or descending) observation list

Where there is an even number of observations, the median is determined by averaging the two observation in the middle.

Measures of Central Location

Statistics Lecture Notes – Chapter 03

Page 6: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.2 - Find the median of the value of Ex3.1 for two different situation.

Measures of Central Location

Statistics Lecture Notes – Chapter 03

59 64 64 65 66 66 66 68 70 72

75 75 76 77 80 82 88 89 90 101 140

𝑀𝑒𝑑𝑖𝑎𝑛  21=75

Page 7: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Mode

Observation that occurs with the greatest frequency.

There are several problems with using the mode as a measure of central location

In a small sample it may not be a very good measure It may not unique

Measures of Central Location

Statistics Lecture Notes – Chapter 03

Page 8: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.3 - Find the mode of the value of Ex3.1

Mode

Statistics Lecture Notes – Chapter 03

89 77 90 101 66 66 76 59 64 75

88 65 75 72 70 64 66 68 82 80

Page 9: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

To eliminate the effect of the outliers in the sample with contains similar value, we can use geometric mean

Ex3.4 – The final exam scores of a master class are given below. Calculate the arithmetic and geometric mean

Geometric Mean

Statistics Lecture Notes – Chapter 03

45 37 40 30 35 45 50 95

Page 10: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?

Measures of Variability

Statistics Lecture Notes – Chapter 03

For example, two sets of class grades are shown. The mean (=50) is the same in each case…

But, the red class has greater variability than the blue class.

Page 11: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Range:

The range is the simplest measure of variability, calculated as:

Advantage : SimplicityDisadvantage : Simplicity

Set 1 : 4, 4, 4, 4, 4, 50 Set 2 : 4, 8, 15, 24, 39, 50

Measures of Variability

Statistics Lecture Notes – Chapter 03

Page 12: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Variance:

A measure of the average distance between each of a set of data points and their mean value.

Variance is the difference between what is expected (Budget) and the actuals (Expenditure). it is the difference between "should take" and "did take".

Measures of Variability

Statistics Lecture Notes – Chapter 03

N

xN

ii

1

2

2

)(

N

xN

ii

1

2

2

)(

𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 :

𝑆𝑎𝑚𝑝𝑙𝑒𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 :

Page 13: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.6 – The following are the number of summer jobs a sample of six student applied for. Find the mean and variance of these data.

Variance

Statistics Lecture Notes – Chapter 03

17 15 23 7 9 13

n

xx

n

ii

1 ¿17+15+23+7+9+13

6=14

1

)(1

2

2

n

xxs

n

ii

¿(17−14)2+(15−14)2+…+(13−14)2

6−1

¿1665

=33.2

Page 14: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Standart Deviation:

shows how much variation or "dispersion" exists from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas high standard deviation indicates that the data points are spread out over a large range of values

Measures of Variability

Statistics Lecture Notes – Chapter 03

𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛

𝜎=√𝜎2

𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛𝑠=√𝑠2

Page 15: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.7 – Find the standard deviation of the values of Ex3.6

Standard Deviation

Statistics Lecture Notes – Chapter 03

17 15 23 7 9 13

1

)(1

2

2

n

xxs

n

ii

¿33.2 𝑠=√𝑠2=√33.6=5.8

Page 16: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Interpreting the Standard Deviation

Statistics Lecture Notes – Chapter 03

Approximately 68% of all observations fallwithin one standard deviation of the mean.

Approximately 95% of all observations fall

within two standard deviations of the mean.

Approximately 99.7% of all observations fallwithin three standard deviations of the mean

If the histogram is bell-shaped

Page 17: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

A more general interpretation of the standard deviation is derived from Chebysheff’s Theorem, which applies to all shapes of histograms (not just bell shaped).

The proportion of observations in any sample that liewithin standard deviations of the mean is at least:

Chebysheff’s Theorem

Statistics Lecture Notes – Chapter 03

1−1

𝑘2𝑓𝑜𝑟 𝑘>1

Page 18: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.8 - A professor has announced that the grades on a statistics exam have a mean value of 72 and astandard deviation of 6. Not knowing anything about the shape of the distribution of grades, what canwe say about the proportion of grades that are between:

a. 66 and 78?b. 60 and 84?c. 54 and 90?

A3.8a(66, 78) = (72 - 6, 72 + 6) we observe that k = 1. With k = 1, Chebyshev’s theorem states that at least

of the grades fall within the interval (66, 78).

Chebysheff’s Theorem

Statistics Lecture Notes – Chapter 03

Page 19: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

A3.8b(60, 84) = (72 - 12, 72 + 12) we observe that k = 2. With k = 2, Chebyshev’s theorem states that at least

of the grades fall within the interval (60, 84).

A3.8c(54, 90) = (72 - 18, 72 + 18) we observe that k = 3. With k = 3, Chebyshev’s theorem states that at least

of the grades fall within the interval (54, 90).

Chebysheff’s Theorem

Statistics Lecture Notes – Chapter 03

Page 20: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.9 – if we know that the distribution of the values in Ex3.8 is bell-shaped, new answers are

a. 66 and 78? Approximately 68% of the grades fall between this

interval (1 standard deviation) b. 60 and 84?

Approximately 95% of the grades fall between this interval (2 standard deviation)

c. 54 and 90? Approximately 99.7% of the grades fall between this

interval (3 standard deviation)

Chebysheff’s Theorem

Statistics Lecture Notes – Chapter 03

Page 21: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Percentile:

The th percentile is the value for which percent are less than that value and ()% are greater than that value.

Measures of Relative Standing

Statistics Lecture Notes – Chapter 03

Page 22: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.10 – The weights in pounds of a group of workers are as follows:

Measures of Relative Standing

Statistics Lecture Notes – Chapter 03

173 165 171 175 188183 177 160 151 169162 179 145 171 175168 158 186 182 162154 180 164 166 157

a. Find the 25th percentile of the weightsb. Find the 50th percentile of the weights.c. Find the 75th percentile of the weights.

Page 23: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

A3.10a

Measures of Relative Standing

Statistics Lecture Notes – Chapter 03

𝑄1=160+162

2=161

A3.10b

𝑄2=𝑀𝑒𝑑𝑖𝑎𝑛=169

A3.10c

𝑄3=177+179

2=178

Range Weights Range Weights

1 145 14 171

2 151 15 171

3 154 16 173

4 157 17 175

5 158 18 175

6 160 19 177

7 162 20 179

8 162 21 180

9 164 22 182

10 165 23 183

11 166 24 186

12 168 25 188

13 169

Page 24: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

A box plot is a graphical summary of quantitative data. A box plot indicates what the two extreme values of a data set are, where the data are centered, and how spread out the data are.

It does this by plotting the values of five descriptive statistics of the data: the smallest value () lower quartile () median() upper quartile() largest value ()

Box Plots

Statistics Lecture Notes – Chapter 03

Page 25: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Interquartile Range:

The interquartile range measures the spread of the middle 50% of the observations.

Whiskers:

The line extending to the left and right.The Whiskers extend outward to the smaller of 1.5 times the IQR or to the most extreme point that is not an outlier.

Box Plots

Statistics Lecture Notes – Chapter 03

Page 26: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.11 – Use the weights in pounds of a group of workers in Ex3.10 and,Construct a box plot for these weights. Compute the interquartile range and identify any outliers.

A3.11 – The first step in constructing a box plot is to rank the data, in order to determine the numericalvalues of the five descriptive statistics to be plotted.

Box Plots

Statistics Lecture Notes – Chapter 03

Descriptive Statistics Numerical Value

145

161

169

178

188

17

135.5 and 203,5

* There is no outlier

Page 27: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Wendy’s service time is shortest and least variable

Hardee’s has the greatest variability,

Jack-in-the-Box has the longest service times.

Box Plots

Statistics Lecture Notes – Chapter 03

Page 28: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Covariance:

a measure of how much two variables change together

Measures of Linear Relationship

Statistics Lecture Notes – Chapter 03

N

yxN

iyixi

xy

1

))((

1

))((1

n

yyxxs

n

iii

xy

𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝐶𝑜𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒=¿

𝑆𝑎𝑚𝑝𝑙𝑒𝐶𝑜𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒=¿

Page 29: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Coefficient of Correlation:

Correlation is a scaled version of covariance that takes on values in [−1,1] with a correlation of ±1 indicating perfect linear association and 0 indicating no linear relationship.

Measures of Linear Relationship

Statistics Lecture Notes – Chapter 03

𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝐶𝑜𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡𝑜𝑓 𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛=¿

𝑆𝑎𝑚𝑝𝑙𝑒𝐶𝑜𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑡𝑜𝑓 𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛=¿

𝜌=𝜎𝑥𝑦

𝜎 𝑥𝜎 𝑦

𝑟=𝑆𝑥𝑦

𝑆𝑥𝑆 𝑦

Page 30: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.12 - Based on the sample of data shown below. Measure how these two variables are related by computing their covariance and coefficient of correlation.

Measures of Linear Relationship

Statistics Lecture Notes – Chapter 03

Years of education () Income ()

11 25

12 33

11 22

15 41

8 18

10 28

11 32

12 24

17 53

11 26

Page 31: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Measures of Linear Relationship

Statistics Lecture Notes – Chapter 03

() ()

11 25 - 0.8 0.64 -5,2 27.04 4.16

12 33 0.2 0.04 2,8 7.84 0.56

11 22 -0.8 0.64 -8,2 67.24 6.56

15 41 3.2 10.24 10,8 116.64 34.56

8 18 -3.8 14.44 -12,2 148.84 46.36

10 28 -1.8 3.24 -2,2 4.84 3.96

11 32 -0.8 0.64 1,8 3.24 -1.44

12 24 0.2 0.04 -6,2 38.44 -1.24

17 53 5.2 27.04 22,8 519.84 118.56

11 26 -0.8 0.64 -4,2 17.64 3.36

=57.6 =951.8 =215.4

Page 32: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Measures of Linear Relationship

Statistics Lecture Notes – Chapter 03

1

)(1

2

n

xxs

n

ii

x ¿√ 57.69 =2.53

𝑟=𝑠𝑥𝑦𝑠𝑥 𝑠 𝑦

= 23.932.53∗10.28

=0.92

1

)(1

2

n

yys

n

ii

y ¿√ 951.89 =10.28

1

))((1

n

yyxxs

n

iii

xy ¿215.49

=23.93

Page 33: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Least Square Method:

Produces a straight line drawn through the points so that the sum of squared deviations between the points and the line is minimized.

Measures of Linear Relationship

Statistics Lecture Notes – Chapter 03

�̂�=𝑏0+𝑏1𝑥

21x

xy

s

sb

𝑏0= �̂�−𝑏1𝑥

Page 34: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Ex3.13 - Find the least squares (regression) line for the data in Ex3.12 using the shortcut formulas for the coefficients.

Measures of Linear Relationship

Statistics Lecture Notes – Chapter 03

�̂�=−13.93+3.74 𝑥

21x

xy

s

sb

𝑏0=30.2−3.74∗11.8=−13.93

¿23.93

2.532=3.74

Page 35: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Q3.1 - Consider the following sample of measurements

Compute each of the following:

a. the meanb. the medianc. the mode

Exercises

Statistics Lecture Notes – Chapter 03

37 32 30 28 30 32 35 28 32 29

Page 36: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Q3.2 - Consider the following sample of data

Compute each of the following for this sample:

a. the meanb. the rangec. the varianced. the standard deviation

Exercises

Statistics Lecture Notes – Chapter 03

17 25 18 14 28 21

Page 37: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Q3.3 - Consider the following sample of data,

a. Construct a box plot for these datab. Compute the interquartile range and identify any

outliers

Exercises

Statistics Lecture Notes – Chapter 03

208 160 175 334 228 211 179 354

265 215 191 239 298 226 220 260

173 163 226 165 252 422 284 232

225 348 290 180 300 200 245 204

256 281 230 275 158 224 315 217

Page 38: Statistics Lecture Notes Dr. Halil İbrahim CEBECİ Chapter 03 Numerical Descriptive Techniques

Q3.4 – based on the sample of data shown below,

a. Calculate Covarianceb. Calculate the coefficient of Correlationc. Find the equation of least square line

Exercises

Statistics Lecture Notes – Chapter 03

Number of Ads ()

Number of Customers ()

5 528

12 876

8 653

6 571

4 556

15 1058

10 963

7 719