Probabilistic & Statistical Techniques Eng. Tamer Eshtawi First Semester 2007-2008

Lecture 5

Chapter 2 (part 3)

Statistics for Describing Data

Main Reference: Pearson Education, Inc., publishing as Pearson Addison-Wesley.

Section 3-4: Measures of Position

Key Concept

This section introduces measures that can be used to compare values from different data sets, or to compare values within the same data set. The most important of these is the concept of the z score.

Definition

z Score (or standardized value): the number of standard deviations that a given value x is above or below the mean.

Measures of Position: z Score

Sample: $z = \dfrac{x - \bar{x}}{s}$

Population: $z = \dfrac{x - \mu}{\sigma}$

Round z to 2 decimal places.

Interpreting Z Scores

Whenever a value is less than the mean, its corresponding z score is negative.

Ordinary values: z score between -2 and 2

Unusual values: z score < -2 or z score > 2
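
As a minimal sketch of the definition and the ±2 rule above, the Python below computes a z score rounded to two decimal places and labels it ordinary or unusual. The function names `z_score` and `classify` are illustrative, not part of the lecture; the inputs (mean 98.20, standard deviation 0.62) are the ones used in Example 3 later in this lecture.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return round((x - mean) / sd, 2)  # the lecture rounds z to 2 decimal places


def classify(z):
    """Ordinary if -2 <= z <= 2, otherwise unusual."""
    return "ordinary" if -2 <= z <= 2 else "unusual"


for x in (100, 96.96, 98.2):
    z = z_score(x, mean=98.20, sd=0.62)
    print(x, z, classify(z))
```

Classifying the rounded z score, rather than the raw quotient, keeps a value that sits exactly two standard deviations from the mean on the "ordinary" side despite floating-point noise.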

Definition

Q1 (First Quartile): separates the bottom 25% of sorted values from the top 75%.

Q2 (Second Quartile): same as the median; separates the bottom 50% of sorted values from the top 50%.

Q3 (Third Quartile): separates the bottom 75% of sorted values from the top 25%.

Quartiles

Q1, Q2, and Q3 divide ranked scores into four equal parts:

(minimum) | 25% | Q1 | 25% | Q2 (median) | 25% | Q3 | 25% | (maximum)

Find Lower & Upper Quartile

To find Q1, first calculate one-quarter of n and add ½ to obtain ¼ n + ½. Round this to the nearest integer to get the position of Q1 in the sorted data; if the result ends in .5, average the two neighboring values. Q3 sits at the same position counted down from the largest value.

Example 1: 1 1 2 3 3 8 11 14 19 19 20

n = 11, then ¼ n + ½ = ¼ (11) + ½ = 3.25, rounded off to 3, so Q1 = 2 and Q3 = 19

Example 2: 2 5 5 6 7 10 15 21 21 23 23 25

n = 12, then ¼ n + ½ = ¼ (12) + ½ = 3.5, so Q1 lies between positions 3 & 4, which is (5+6)/2 = 5.5

Q3 lies between positions 9 & 10, which is (21+23)/2 = 22
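
The quartile rule used in these two examples can be sketched in Python as follows. This is one reading of the slide, not a definitive implementation: it assumes a position ending in .5 means "average the two neighboring values" and that Q3 is located by applying the same position from the top of the sorted list. The names `value_at` and `lecture_quartiles` are hypothetical.

```python
import math


def value_at(sorted_vals, pos):
    """Value at a 1-based position; a position ending in .5 averages two neighbors."""
    lower = math.floor(pos)
    if pos - lower == 0.5:
        return (sorted_vals[lower - 1] + sorted_vals[lower]) / 2
    return sorted_vals[round(pos) - 1]  # otherwise round to the nearest position


def lecture_quartiles(data):
    """Return (Q1, Q3) using the position n/4 + 1/2 from the bottom and from the top."""
    vals = sorted(data)
    if not vals:
        raise ValueError("data must not be empty")
    pos = len(vals) / 4 + 0.5
    q1 = value_at(vals, pos)
    q3 = value_at(vals[::-1], pos)  # same position, counted from the largest value
    return q1, q3


print(lecture_quartiles([1, 1, 2, 3, 3, 8, 11, 14, 19, 19, 20]))      # Example 1
print(lecture_quartiles([2, 5, 5, 6, 7, 10, 15, 21, 21, 23, 23, 25])) # Example 2
```

Statistical packages use several slightly different quartile conventions, so library functions may return other values for the same data.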

Percentiles

Just as there are three quartiles separating data into four parts, there are 99 percentiles denoted P1, P2, . . . P99, which partition the data into 100 groups.

Percentile of value x $= \dfrac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
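
A short Python sketch of this formula is given below. The function name `percentile_of_value` is illustrative, the data list is a small made-up sample, and rounding the result to a whole number is an assumption.

```python
import math


def percentile_of_value(data, x):
    """Percentile of x: share of values strictly less than x, times 100."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for v in data if v < x)
    pct = below / len(data) * 100
    return math.floor(pct + 0.5)  # round to the nearest whole percentile


data = [2, 5, 5, 6, 7, 10, 15, 21, 21, 23, 23, 25]
print(percentile_of_value(data, 15))  # 6 of the 12 values are less than 15
```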

Converting from the kth Percentile to the Corresponding Data Value

Notation

n = total number of values in the data set

k = percentile being used

Location of $P_k = \dfrac{k}{100} \cdot n$
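
The location formula can be turned into a lookup as sketched below. The handling of the result is an assumption drawn from the worked examples in this lecture: a whole-number location means averaging that value with the next one, and any other location is rounded up to the next position. The names `percentile_positions` and `percentile_value` are hypothetical, and k is taken to be a whole number.

```python
def percentile_positions(k, n):
    """1-based position(s) of P_k among n sorted values, from L = (k / 100) * n."""
    if not (0 < k < 100) or n < 1:
        raise ValueError("need 0 < k < 100 and n >= 1")
    whole, remainder = divmod(k * n, 100)  # integer arithmetic avoids float error
    if remainder == 0:
        return (whole, whole + 1)  # L is whole: average the Lth and next values
    return (whole + 1,)            # otherwise use the next position up


def percentile_value(sorted_values, k):
    """Data value at P_k for an already sorted list."""
    positions = percentile_positions(k, len(sorted_values))
    picked = [sorted_values[p - 1] for p in positions]
    return sum(picked) / len(picked)


print(percentile_positions(10, 36))  # location 3.6
print(percentile_positions(25, 36))  # location 9
print(percentile_value([2, 5, 5, 6, 7, 10, 15, 21, 21, 23, 23, 25], 25))
```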

Example 1

Find the percentile corresponding to the weight of 0.8143, and find P10 and P25 (n = 36).

Solution

Percentile of 0.8143 $= \dfrac{8}{36} \cdot 100 = 22$

$P_{10}$: location $= \dfrac{10}{100} \cdot 36 = 3.6$, so use the 4th value: $P_{10} = 0.8073$

$P_{25}$: location $= \dfrac{25}{100} \cdot 36 = 9$, so $P_{25}$ lies between the 9th and 10th values: $P_{25} = \dfrac{0.8143 + 0.815}{2} = 0.81465$

Some Other Statistics

Interquartile Range (or IQR): $Q_3 - Q_1$

Semi-interquartile Range: $\dfrac{Q_3 - Q_1}{2}$

Midquartile: $\dfrac{Q_3 + Q_1}{2}$

10 - 90 Percentile Range: $P_{90} - P_{10}$
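
These four statistics are simple arithmetic on quartiles and percentiles that have already been found, as the sketch below shows. The function name `spread_statistics` is hypothetical; Q1 = 5.5 and Q3 = 22 come from the 12-value quartile example earlier in this section, while P10 = 5 and P90 = 23 are illustrative inputs assumed here.

```python
def spread_statistics(q1, q3, p10, p90):
    """Statistics derived from quartiles and percentiles that are already known."""
    return {
        "iqr": q3 - q1,                # interquartile range
        "semi_iqr": (q3 - q1) / 2,     # semi-interquartile range
        "midquartile": (q3 + q1) / 2,  # midpoint of Q1 and Q3
        "p10_p90_range": p90 - p10,    # 10 - 90 percentile range
    }


print(spread_statistics(q1=5.5, q3=22, p10=5, p90=23))
```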

Recap

In this section we have discussed:

z Scores

z Scores and unusual values

Quartiles

Percentiles

Other statistics

Section 3-5: Exploratory Data Analysis (EDA)

Key Concept

This section discusses outliers, then introduces a new statistical graph called a boxplot, which is helpful for visualizing the distribution of data.

Important Principles

An outlier can have a dramatic effect on the mean.

An outlier can have a dramatic effect on the standard deviation.

An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.
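
To make the first two principles concrete, the sketch below recomputes the mean, standard deviation, and median of the data set from Example 5 (later in this lecture) with and without its largest value. Treating the value 13 as a possible outlier is an assumption made here for illustration only.

```python
import statistics

# Data set from Example 5 later in this lecture (52 values, largest value 13).
data = ([-1] + [0] * 8 + [1] + [2] * 21 + [3] * 8
        + [4] * 4 + [5] * 5 + [6] * 2 + [7, 13])
trimmed = [v for v in data if v != 13]  # same data with the value 13 removed

for label, values in (("with 13", data), ("without 13", trimmed)):
    print(label,
          "mean:", round(statistics.mean(values), 3),
          "st. dev.:", round(statistics.stdev(values), 3),
          "median:", statistics.median(values))
```

Dropping the single extreme value lowers both the mean and the standard deviation, while the median does not move.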

Definitions

For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value.

A boxplot is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
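
A 5-number summary can be computed as in the sketch below. It assumes the quartile position rule from Section 3-4 (position ¼ n + ½, averaging two neighbors when the position ends in .5, and counting from the top for Q3); the names `_value_at` and `five_number_summary` are hypothetical.

```python
import math
import statistics


def _value_at(sorted_vals, pos):
    """Value at a 1-based position; a position ending in .5 averages two neighbors."""
    lower = math.floor(pos)
    if pos - lower == 0.5:
        return (sorted_vals[lower - 1] + sorted_vals[lower]) / 2
    return sorted_vals[round(pos) - 1]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    vals = sorted(data)
    if not vals:
        raise ValueError("data must not be empty")
    pos = len(vals) / 4 + 0.5
    q1 = _value_at(vals, pos)
    q3 = _value_at(vals[::-1], pos)  # same position, counted from the largest value
    return vals[0], q1, statistics.median(vals), q3, vals[-1]


print(five_number_summary([2, 5, 5, 6, 7, 10, 15, 21, 21, 23, 23, 25]))
```

These five numbers are exactly what a boxplot draws: the ends of the line, the two edges of the box, and the line inside the box.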

Boxplots

[Slides 20-24 showed boxplot figures and a worked boxplot example; the images are not reproduced in this transcript.]

Recap

In this section we have looked at:

Exploratory Data Analysis

Effects of outliers

5-number summary

Boxplots

General Examples

Example 1

Find the mean, median, mode, and midrange.

x: 27 8 17 11 15 25 16 14 14 14 13 18 (sum = 192)

Solution

Sorted x: 8 11 13 14 14 14 15 16 17 18 25 27

Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{192}{12} = 16$

Median $= \dfrac{14 + 15}{2} = 14.5$

Mode $= 14$

Midrange $= \dfrac{8 + 27}{2} = 17.5$
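
The four measures of center in this example can be checked with Python's standard `statistics` module, as sketched below. Midrange has no built-in function, so it is computed directly from the smallest and largest values.

```python
import statistics

x = [27, 8, 17, 11, 15, 25, 16, 14, 14, 14, 13, 18]

mean = statistics.mean(x)         # sum of the values divided by n
median = statistics.median(x)     # average of the two middle sorted values
modes = statistics.multimode(x)   # list of the most frequent value(s)
midrange = (min(x) + max(x)) / 2  # halfway between smallest and largest

print(mean, median, modes, midrange)
```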

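Example 2

Find the standard deviation and variance for each of the two samples.

Coke

x        x²
0.8192   0.6711
0.8150   0.6642
0.8163   0.6663
0.8211   0.6742
0.8181   0.6693
0.8247   0.6801
Sum: 4.9144   4.025

Pepsi

x        x²
0.8258   0.6819
0.8156   0.6652
0.8211   0.6742
0.8170   0.6675
0.8216   0.6750
0.8302   0.6892
Sum: 4.9313   4.053

$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}}$

Coke: $s = \sqrt{\dfrac{6(4.025) - (4.9144)^2}{6(5)}} \approx 0$

Pepsi: $s = \sqrt{\dfrac{6(4.053) - (4.9313)^2}{6(5)}} = 3.06 \times 10^{-3}$

Note: these results use the rounded column totals, and the formula subtracts two nearly equal numbers, so the rounding matters. With $\sum x^2 = 4.025$ the Coke numerator is slightly negative (24.1500 - 24.1513), which is reported as $s \approx 0$. Keeping the squares unrounded ($\sum x^2 = 4.02528224$ for Coke and $4.05310181$ for Pepsi) gives $s \approx 0.0035$, $s^2 \approx 1.22 \times 10^{-5}$ for Coke and $s \approx 0.0055$, $s^2 \approx 2.97 \times 10^{-5}$ for Pepsi.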
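
The shortcut formula for the sample standard deviation can be checked numerically, as sketched below for the two samples in this example. The function name `shortcut_stdev` is hypothetical. The squares are kept at full precision rather than rounded to four decimals, because the formula subtracts two nearly equal quantities.

```python
import math
import statistics


def shortcut_stdev(x):
    """Sample standard deviation from s^2 = [n*sum(x^2) - (sum x)^2] / [n(n-1)]."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two values")
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)  # full precision; do not round the squares
    variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative round-off


coke = [0.8192, 0.8150, 0.8163, 0.8211, 0.8181, 0.8247]
pepsi = [0.8258, 0.8156, 0.8211, 0.8170, 0.8216, 0.8302]

for name, x in (("Coke", coke), ("Pepsi", pepsi)):
    s = shortcut_stdev(x)
    # statistics.stdev uses the definition formula and serves as a cross-check
    print(name, "s =", round(s, 4), "s^2 =", s * s,
          "check:", round(statistics.stdev(x), 4))
```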

Example 3

$z = \dfrac{x - \text{mean}}{\text{standard deviation}}$, with mean = 98.20 and standard deviation = 0.62

a) x = 100: $z = \dfrac{100 - 98.2}{0.62} = 2.90$

b) x = 96.96: $z = \dfrac{96.96 - 98.2}{0.62} = -2.00$

c) x = 98.2: $z = \dfrac{98.2 - 98.2}{0.62} = 0$

Example 4

Find the indicated quartile or percentile: a) Q1, b) Q3, c) P80, d) P33

Q1 position = ¼ n + ½ = ¼ (36) + ½ = 9.5 (between the 9th and 10th values)

Q1 = (0.8143 + 0.815)/2 = 0.8147

Q3 = (0.8207 + 0.8211)/2 = 0.8209

$P_{80}$: location $= \dfrac{80}{100} \cdot 36 = 28.8$, so use the 29th value: $P_{80} = 0.8229$

$P_{33}$: location $= \dfrac{33}{100} \cdot 36 = 11.88$, so use the 12th value: $P_{33} = 0.8152$

Example 5

Draw the boxplot for the following data set:

-1 0 0 0 0 0 0 0 0 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 5 6 6 7 13

Sum = 139

Solution

Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{139}{52} = 2.673$

Median $= \dfrac{2 + 2}{2} = 2$

Mode $= 2$

Location of $Q_1 = \tfrac{1}{4} n + \tfrac{1}{2} = \tfrac{1}{4}(52) + \tfrac{1}{2} = 13.5$, between the 13th and 14th values

$Q_1 = \dfrac{2 + 2}{2} = 2$

$Q_3 = \dfrac{3 + 4}{2} = 3.5$

Minimum value $= -1$

Maximum value $= 13$
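
Since the drawn boxplot is not part of this transcript, the sketch below renders a rough one-line text version from the five values found in the solution. The helper `ascii_boxplot` and its `width` parameter are hypothetical; a width of 57 columns is chosen so that one data unit spans four columns.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=57):
    """Render a one-line text boxplot scaled to `width` columns."""
    if maximum <= minimum:
        raise ValueError("maximum must exceed minimum")

    def col(v):
        # Map a data value to a column index between 0 and width - 1.
        return round((v - minimum) * (width - 1) / (maximum - minimum))

    line = ["-"] * width                  # line from minimum to maximum
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="                     # box from Q1 to Q3
    line[0] = "|"
    line[width - 1] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(median)] = "M"               # drawn last, so it wins any overlap
    return "".join(line)


print(ascii_boxplot(minimum=-1, q1=2, median=2, q3=3.5, maximum=13))
```

Because the median equals Q1 in this data set, the `M` marker lands on the left edge of the box. The long line to the right of the box reflects the distance from Q3 to the maximum value of 13.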

Flash points

Which measure of center is the only one that can be used with data at the nominal level of measurement?

A. Mean

B. Median

C. Mode

Which of the following measures of center is not affected by outliers?

A. Mean

B. Median

C. Mode

Find the mode(s) for the given sample data.

79, 25, 79, 13, 25, 29, 56, 79

A. 79

B. 48.1

C. 42.5

D. 25

Which is not true about the variance?

A. It is the square of the standard deviation.

B. It is a measure of the spread of data.

C. The units of the variance are different from the units of the original data set.

D. It is not affected by outliers.

Weekly sales for a company average $10,000, with a standard deviation of $450. Sales for the past week were $9,050. This is

A. Unusually high.

B. Unusually low.

C. About right.

In a data set with a range of 55.1 to 102.8 and 300 observations, there are 207 data points with values less than 88.6. Find the percentile for 88.6.

A. 32

B. 116.03

C. 69

D. 670

H.W 2

Find the mean, median, mode, midrange, range, standard deviation, variance, and P30. Then draw the boxplot.

Age of US President