statistics 100 lecture set 6

50
Statistics 100 Lecture Set 6

Upload: vondra

Post on 23-Feb-2016

29 views

Category:

Documents


0 download

DESCRIPTION

Statistics 100 Lecture Set 6. Re-cap. Last day, looked at a variety of plots For categorical variables, most useful plots were bar charts and pie charts Looked at time plots for quantitative variables Key thing is to be able to quickly make a point using graphical techniques. Re-cap. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Statistics  100 Lecture Set  6

Statistics 100Lecture Set 6

Page 2: Statistics  100 Lecture Set  6

Re-cap

• Last day, looked at a variety of plots

• For categorical variables, most useful plots were bar charts and pie charts

• Looked at time plots for quantitative variables

• Key thing is to be able to quickly make a point using graphical techniques

Page 3: Statistics  100 Lecture Set  6

Re-cap

• Recall:

– A distribution of a variable tells us what values it takes on and how often it takes these values.

Page 4: Statistics  100 Lecture Set  6

Histograms

• Similar to a bar chart, would like to display main features of an empirical distribution (or data set)

• Histogram– Essentially a bar chart of values of data – Usually grouped to reduce “jitteriness” of picture– Groups are sometimes called “bins”

Page 5: Statistics  100 Lecture Set  6

Histogram

• Uses rectangles to show number (or percentage) of values in intervals

• Y-axis usually displays counts or percentages• X-Axis usually shows intervals• Rectangles are all the same width

6 8 10 12 14 16

02

46

8

Data

Cou

nts

Page 6: Statistics  100 Lecture Set  6

Example (discrete data)

• In a study of productivity, a large number of authors were classified according to the number of articles they published during a particular period of time.

Number of articles

Frequency

1 784 2 204 3 127 4 50 5 33 6 28 7 19 8 19 9 6

10 7 11 6 12 7 13 4 14 4 15 5 16 3 17 3

Page 7: Statistics  100 Lecture Set  6

Example (discrete data)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

020

040

060

080

0

Number of Articles

Cou

nt

Page 8: Statistics  100 Lecture Set  6

Example (continuous data)

• Experiment was conducted to investigate the muzzle velocity of a anti-personnel weapon (King, 1992)

• Sample of size 16 was taken and the muzzle velocity (MPH) recorded

279.9 294.1 260.7 277.9

285.5 298.6 265.1 305.3

264.9 262.9 256.0 255.3

276.3 278.2 283.6 250.0

Page 9: Statistics  100 Lecture Set  6

Constructing a Histogram – continuous data

• Find minimum and maximum values of the data

• Divide range of data into non-overlapping intervals of equal length

• Count number of observations in each interval

• Height of rectangle is number (or percentage) of observations falling in the interval

• How many categories?

Page 10: Statistics  100 Lecture Set  6

Example

• Experiment was conducted to investigate the muzzle velocity of a anti-personnel weapon (King, 1992)

• Sample of size 16 was taken and the muzzle velocity (MPH) recorded

279.9 294.1 260.7 277.9

285.5 298.6 265.1 305.3

264.9 262.9 256.0 255.3

276.3 278.2 283.6 250.0

Page 11: Statistics  100 Lecture Set  6

• What are the minimum and maximum values?

• How do we divide up the range of data?

• What happens if have too many intervals?

• Too Few intervals?

• Suppose have intervals from 240-250 and 250-260. In which interval is the data point 250 included?

Page 12: Statistics  100 Lecture Set  6

Intervals Frequency[240,250)[250,260)[260,270)[270,280)[280,290)[290,300)[300,310)

Page 13: Statistics  100 Lecture Set  6

Histogram of Muzzle Velocity

240 260 280 300 320

01

23

4

Muzzle Velocity (MPH)

Freq

uenc

y

Page 14: Statistics  100 Lecture Set  6

Interpreting histograms

• Gives an idea of:– Location of centre of the distribution

– How spread are the data

– Shape of the distribution• Symmetric• Skewed left• Skewed right

• Unimodal• Bimodal• Multimodal

– Outliers• Striking deviations from the overall pattern

Page 15: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Was out of 34 + a bonus question (n=344)

Page 16: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Too many bins?

Page 17: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Too few bins

Page 18: Statistics  100 Lecture Set  6

Example – mid-term 1 grades (2011)

• Potential outlier?

Page 19: Statistics  100 Lecture Set  6

60 70 80 90 100

0

2

4

6

8

10

12

14

Exam 1 Grades

Count

Exam Grades from a Stats Class

Page 20: Statistics  100 Lecture Set  6

60 70 80 90 100

0

2

4

6

8

10

Exam 1 Grades

Count

Exam Grades for Undergraduates

Example

Page 21: Statistics  100 Lecture Set  6

60 70 80 90 100

0

2

4

6

8

Exam 1 Grades

Count

Exam Grades for Graduate Students

Example

Page 22: Statistics  100 Lecture Set  6

Numerical Summaries (Chapter 12)

• Graphic procedures visually describe data

• Numerical summaries can quickly capture main features

Page 23: Statistics  100 Lecture Set  6

Measures of Center

• Have sample of size n from some population,

• An important feature of a sample is its central value

• Most common measures of center - Mean & Median

nxxx ,...,,

21

Page 24: Statistics  100 Lecture Set  6

Sample Mean

• The sample mean is the average of a set of measurements

• The sample mean:

nix

x

n

i

1

Page 25: Statistics  100 Lecture Set  6

Sample Median

• Have a set of n measurements,

• Median (M) is point in the data that divides the data in half

• Viewed as the mid-point of the data

• To compute the median: 1. For sample size “n”, compute position = (n+1)/2

1. If position is a whole number, then M is the value at this position of the sorted data

2. If position falls between two numbers, then M is the value halfway between those two positions in the sorted data

nxxx ,...,,

21

Page 26: Statistics  100 Lecture Set  6

Example

• Finding the Median, M, when n is odd

• Example: Data = 7, 19, 4, 23, 18

Page 27: Statistics  100 Lecture Set  6

Muzzle Velocity Example

• Data (n=16)

279.9 294.1 260.7 277.9

285.5 298.6 265.1 305.3

264.9 262.9 256.0 255.3

276.3 278.2 283.6 250.0

Page 28: Statistics  100 Lecture Set  6

Muzzle Velocity Example

• Mean:

x = 279.9 + 294.1+ ...+ 250.016

= 274.6

Page 29: Statistics  100 Lecture Set  6

Muzzle Velocity Example

• Median:

Page 30: Statistics  100 Lecture Set  6

Sample Mean vs. Sample Median

• Sometimes sample median is better measure of center

• Sample median less sensitive to unusually large or small values called

• For symmetric distributions the relative location of the sample mean and median is

• For skewed distributions the relative locations are

Page 31: Statistics  100 Lecture Set  6

Other Measures of interest

• Maximum

• Minimum

Page 32: Statistics  100 Lecture Set  6

• Percentiles– A percentile of a distribution is a value that cuts off the

stated part of the distribution at or below that value, with the rest at or above that value.• 5th percentile: 5% of distribution is at or below this value and

95% is at or above this value.• 25th percentile: 25% at or below, 75% at or above• 50th percentile: 50% at or below, 50% at or above• 75th percentile: 75% at or below, 25% at or above• 90th percentile: 90% at or below, 10% at or above• 99th percentile: ___% at or below, ___% at or above

Page 33: Statistics  100 Lecture Set  6

• Percentiles– Can be applied to a population or to a sample

• Usually don’t know population– Use sample percentiles to estimate pop. percentiles

– Standardized tests often measured in percentiles– Birth statistics often measured in percentiles

• First daughter – 10th percentile weight– 25th percentile length– 95th percentile head circumference

Page 34: Statistics  100 Lecture Set  6

Important Percentiles

• First Quartile

• Second Quartile

• Third Quartile

Page 35: Statistics  100 Lecture Set  6

Computing the quartiles

• You know how to compute the median

• Q1 =

• Q3 =

Page 36: Statistics  100 Lecture Set  6

Example

• Finding the other quartiles1. For Q1, find the median of all values below M. 2. For Q3, find the median of all values above M.

• Example: 4, 7, 18, 19, 23, M=18– Q1:– Q3:

• Example: 4, 7, 12, 18, 19, 23, M=15– Q1:– Q3:

Page 37: Statistics  100 Lecture Set  6

• 5 number summary often reported:

– Min, Q1, Q2 (Median), Q3, and Max

– Summarizes both center and spread

– What proportion of data lie between Q1 and Q3?

Page 38: Statistics  100 Lecture Set  6

Box-Plot

• Displays 5-number summary graphically

• Box drawn spanning quartiles

• Line drawn in box for median

• Lines extend from box to max. and min values.

• Some programs draw whiskers only to 1.5*IQR above and below the quartiles

6065

7075

8085

Undergrads

Exa

m S

core

s

Page 39: Statistics  100 Lecture Set  6

• Can compare distributions using side-by-side box-plots

• What can you see from the plot?

Page 40: Statistics  100 Lecture Set  6

Example - Moisture Uptake

• There is a need to understand degradation of 3013 containers during long term storage

• Moisture uptake is considered a key factor in degradation due to corrosion

• Calcination removes moisture

• Calcination temperature requirements were written with very pure materials in mind, but the situation has evolved to include less pure materials, e.g. high in salts (Cl salts of particular concern)

• Calcination temperature may need to be reduced to accommodate salts.

• An experiment is to be conducted to see how the calcination temperature impacts the mean moisture uptake

Page 41: Statistics  100 Lecture Set  6

Working Example - Moisture Uptake

• Experiment Procedure: – Two calcination temperatures…wish to compare the mean uptake for each

temperature– Have 10 measurements per temperature treatment– The temperature treatments are randomly assigned to canisters

• Response: Rate of change in moisture uptake in a 48 hour period (maximum time to complete packaging)

Page 42: Statistics  100 Lecture Set  6

Box-Plot

Page 43: Statistics  100 Lecture Set  6

Other Common Measure of Spread: Sample Variance

• Sample variance of n observations:

• Units are in squared units of data

1

)(1

2

2

n

xxs

n

ii

Page 44: Statistics  100 Lecture Set  6

Sample Standard Deviation

• Sample standard deviation of n observations:

• Has same units as data

1

)(1

2

n

xxs

n

ii

Page 45: Statistics  100 Lecture Set  6

Exercise

• Compute the sample standard deviation and variance for the Muzzle Velocity Example

Page 46: Statistics  100 Lecture Set  6

Comments

• Variance and standard deviation are most useful when measure of center is

• As observations become more spread out, s : increases or decreases?

• Both measures sensitive to outliers

• 5 number summary is better than the mean and standard deviation for describing (i) skewed distributions; (ii) distributions with outliers

Page 47: Statistics  100 Lecture Set  6

Comments

• Standard deviation is zero when

• Measures spread relative to

Page 48: Statistics  100 Lecture Set  6

• Empirical Rule– If a distribution is bell-shaped and roughly symmetric, then

• About 2/3 of data will lie within ±1s of • About 95% of data will lie within ±2s of • Usually all data will lie within ±3s of

– So you can reconstruct a rough picture of the histogram from just two numbers

More interpretation of s

x

x

x

Page 49: Statistics  100 Lecture Set  6

Example

• Mid-term:

Page 50: Statistics  100 Lecture Set  6

Example

• Mid-term:

• Empirical rule tells us