continuous data


Upload: stash

Post on 22-Feb-2016



TRANSCRIPT

Page 1: Continuous Data


Continuous Data

Page 2: Continuous Data

Median

Sorted data: Min = position 1, Max = position n

The median is the value in the “middle” position: position ½(1 + n)

If this position falls halfway between two positions, then average the two associated data values.

Median = 50th percentile.
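The position rule above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `median_by_position` and the two small samples are hypothetical.

```python
def median_by_position(values):
    """Median via the 'middle position' rule: position (1 + n) / 2."""
    data = sorted(values)            # position 1 = min, position n = max
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (1 + n) / 2           # 1-based position; ends in .5 when n is even
    lower = int(position)            # whole-number part of the position
    if position == lower:            # odd n: the middle position exists
        return data[lower - 1]
    # even n: the position is halfway between two data values, so average them
    return (data[lower - 1] + data[lower]) / 2


# Hypothetical samples, for illustration only.
odd_sample = [7.0, 3.0, 9.0, 1.0, 5.0]   # sorted: 1, 3, 5, 7, 9 -> position 3
even_sample = [7.0, 3.0, 9.0, 1.0]       # sorted: 1, 3, 7, 9   -> position 2.5
print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```

With an odd number of values the rule lands on a single data value; with an even number it lands halfway between two positions, and the two neighbouring values are averaged.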

Page 3: Continuous Data

Median – Failure Time data

Failure times in hours.

The median is 232.3.

The 50th percentile is 232.3.

Page 4: Continuous Data

Percentile / Percentile Rank

The idea is to put the data onto a 0% - 100% scale.

Data scale: x
Percent scale: k

“x is the kth percentile” is equivalent to “the percentile rank of x is k”

Page 5: Continuous Data

Interpretation

x is the kth percentile / the percentile rank of x is k

This means…

(Approximately*) k% of units** have variable*** less than x and (100 – k)% of units have variable greater than x.

* technically required; you may omit this
** state what the units are – don’t use the word “units”
*** state what the variable is – don’t use the word “variable”

Page 6: Continuous Data

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

GPA: x = 3.274
Percent: k = 70% (= 0.70)

“the 70th percentile of GPAs is 3.274”

“the percentile rank of 3.274 is 70”

Write a sentence explaining what this means, without using the word “percentile.” Your statement must identify the units and variable. You may use the word “percent,” and you must use the numbers 3.274 and 70.

Page 7: Continuous Data

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274.

Units: graduating seniors. Variable: GPA.

Page 8: Continuous Data

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

It is not correct to say…

Out of 100 graduating seniors, 70 have GPA below 3.274; the other 30 have GPA above 3.274.

There aren’t exactly 100 graduating seniors.

If you chose 100, you would be unlikely to get exactly a 70/30 split.
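How unlikely an exact 70/30 split is can be estimated with a short calculation. The sketch below assumes a binomial model, that is, 100 seniors chosen independently from a large population in which 70% have GPA below 3.274. That model and the function name `prob_exact_split` are assumptions for illustration, not something stated on the slides.

```python
from math import comb


def prob_exact_split(sample_size, count_below, fraction_below):
    """Binomial probability that exactly `count_below` of `sample_size`
    independently chosen units fall below the percentile value."""
    return (comb(sample_size, count_below)
            * fraction_below ** count_below
            * (1 - fraction_below) ** (sample_size - count_below))


# Assumed model: 100 independent draws, each below 3.274 with probability 0.70.
p = prob_exact_split(100, 70, 0.70)
print(f"P(exactly 70 of 100 have GPA below 3.274) = {p:.3f}")
```

Under this assumed model the probability comes out a little under 9%, so most samples of 100 would show some other split.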

Page 9: Continuous Data

Illustration 1

For seniors graduating from SUNY Oswego, the 70th percentile of (the distribution of) GPAs is 3.274.

It is not correct to say…

Out of 100 graduating seniors, 70 have GPA below 3.274; the other 30 have GPA above 3.274.

This statement is only true on average, assuming you averaged over all possible samples of 100 graduating seniors. Expressing this is more difficult and confusing, so just say it the correct way:

70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274.

Page 10: Continuous Data

Illustration 1

70% of graduating seniors have GPA below 3.274; the other 30% have GPA above 3.274.

Do not worry about seniors with GPA exactly 3.274.

This figure is likely rounded. Very few (much less than 1% of) people will have exactly this GPA.

Page 11: Continuous Data

Percentiles

Suitable for data where there are few to no ties.

Continuous data

Page 12: Continuous Data

Illustration 2

In discussing investment opportunities, a financial advisor speaks about a company’s “price to earnings” ratio (PE): the price of a share of stock divided by the annual profit the company makes per share (i.e., how much it costs to purchase $1 of annual profit).

“For the ECC Company, its PE of 7.3 is at the 15th percentile among companies in the industrial sector.”

Write a sentence explaining what this means, without using the word “percentile.” Your statement must identify the units and variable. You may use the word “percent,” and you must use the numbers 7.3 and 15.

Page 13: Continuous Data

Illustration 2

“For the ECC Company, its PE of 7.3 is at the 15th percentile among companies in the industrial sector.”

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

Units: companies in the industrial sector. Variable: PE.

Page 14: Continuous Data

Illustration 2

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

It is not correct to say…

Out of 100 industrial companies, 15 have PE below 7.3; the other 85 have PE above 7.3.

There aren’t exactly 100 industrial companies.

If you chose 100, you would be unlikely to get exactly a 15/85 split.

Page 15: Continuous Data

Illustration 2

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

It is not correct to say…

Out of 100 industrial companies, 15 have PE below 7.3; the other 85 have PE above 7.3.

This statement is only true on average, assuming you averaged over all possible samples of 100 companies. Expressing this is more difficult and confusing, so just say it the correct way.
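The "only true on average" point can be illustrated by simulation. The sketch below assumes each sampled company independently has PE below 7.3 with probability 0.15. That sampling model, the function name `average_count_below`, and the fixed seed are illustrative assumptions, not part of the slides.

```python
import random


def average_count_below(num_samples, sample_size, fraction_below, seed=0):
    """Draw many samples; return (average count below, share of samples
    whose count equals sample_size * fraction_below exactly)."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    counts = []
    for _ in range(num_samples):
        # Count how many of the sampled units fall below the percentile value.
        count = sum(rng.random() < fraction_below for _ in range(sample_size))
        counts.append(count)
    target = round(sample_size * fraction_below)
    average = sum(counts) / num_samples
    share_exact = counts.count(target) / num_samples
    return average, share_exact


average, share_exact = average_count_below(5000, 100, 0.15)
print(f"average number below 7.3 per 100 companies: {average:.2f}")
print(f"share of samples with exactly 15 below:     {share_exact:.2%}")
```

Across many samples the average count settles close to 15, yet only a minority of individual samples show exactly 15 companies below 7.3.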

Page 16: Continuous Data

Illustration 2

15% of companies in the industrial sector have PE below 7.3; the other 85% have PE above 7.3.

Do not worry about companies with PE exactly 7.3. Even ECC’s PE is not exactly 7.3; it is rounded to that figure.

Page 17: Continuous Data

Percentiles & Percentile Ranks in Excel

Data in sorted (low to high) array

Value on data scale: x

Value on % scale (“Percentile Rank”): k (%)

From x to k: =PERCENTRANK(array, x, 9)

(the 9 requests 9 significant digits, which ensures accuracy; the default is 3)

From k to x: =PERCENTILE(array, k/100)
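For readers working outside Excel, the two conversions can be sketched in Python. The sketch assumes the inclusive rule: a value in sorted position i (1-based) out of n has rank (i − 1)/(n − 1), with linear interpolation in between. That rule matches the 10/27 ≈ 0.3704 result if 216.6 is the 11th of the 28 failure times. The function names `percentile_inc` and `percent_rank_inc` and the sample data are hypothetical, and the rank function assumes no ties.

```python
from bisect import bisect_left


def percentile_inc(sorted_data, fraction):
    """Value at 0-based position fraction * (n - 1), interpolating linearly."""
    n = len(sorted_data)
    if n == 0 or not 0 <= fraction <= 1:
        raise ValueError("need data and a fraction between 0 and 1")
    position = fraction * (n - 1)
    lower = int(position)
    upper = min(lower + 1, n - 1)
    weight = position - lower
    return sorted_data[lower] + weight * (sorted_data[upper] - sorted_data[lower])


def percent_rank_inc(sorted_data, x):
    """Rank of x as a fraction between 0 and 1 (assumes no tied values)."""
    n = len(sorted_data)
    if n < 2 or not sorted_data[0] <= x <= sorted_data[-1]:
        raise ValueError("need at least two values and x within the data range")
    i = bisect_left(sorted_data, x)          # index of first value >= x
    if sorted_data[i] == x:                  # x is one of the data values
        return i / (n - 1)
    lo, hi = sorted_data[i - 1], sorted_data[i]
    # x lies between two data values: interpolate between their ranks
    return (i - 1 + (x - lo) / (hi - lo)) / (n - 1)


# Hypothetical sorted data, for illustration only.
data = [10.0, 20.0, 30.0, 40.0, 50.0]
print(percent_rank_inc(data, 20.0))   # rank of the 2nd of 5 values
print(percentile_inc(data, 0.75))     # value three quarters of the way up
```

The two functions undo each other: feeding a rank from `percent_rank_inc` into `percentile_inc` returns the original value, which mirrors the x ↔ k equivalence.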

Page 18: Continuous Data

Sorted failure time data in cells B2 through B29 (n = 28).

Determine the percentile rank for a failure time of 216.6 hours.

=PERCENTRANK(B2:B29, 216.6, 9)

0.3704 = 37.04%

“216.6 is the 37.04th percentile.”

“The percentile rank of 216.6 is 37.04.”

Page 19: Continuous Data

Rounding of Percents

For 10% - 90%: to the nearest 1% is generally fine

For 1% - 10% and 90% - 99%: to the nearest 0.1% is fine

For 0.1% - 1.0% and 99.0% - 99.9%: to the nearest 0.01% is fine

It’s OK to give more precision than is called for. You can run into trouble working with less precision than specified here.
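The precision guide above can be written as a small helper. This is a sketch under stated assumptions: the function name `round_percent` is hypothetical, the boundary values 10% and 90% are treated as belonging to the middle range, and percents below 0.1% or above 99.9% (which the guide does not cover) fall back to two decimal places.

```python
def round_percent(percent):
    """Round a percent (0 to 100) using the precision guide above."""
    if 10 <= percent <= 90:
        decimals = 0          # nearest 1%
    elif 1 <= percent < 10 or 90 < percent <= 99:
        decimals = 1          # nearest 0.1%
    else:
        decimals = 2          # nearest 0.01% (also the assumed fallback)
    return round(percent, decimals)


print(round_percent(37.04))   # middle range: whole percent
print(round_percent(1.49))    # near the low end: one decimal place
print(round_percent(0.51))    # below 1%: two decimal places
```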

Page 20: Continuous Data

Rounding of Percents

Consider two treatments for your condition. With Treatment A the chance of dying is 0.51%. With Treatment B the chance is 1.49%.

Rounded to the nearest 1%, both are 1%.

Out of 10,000 people getting Treatment A, on average 51 die.

Out of 10,000 people getting Treatment B, on average 149 die.

Almost 3 times as many.
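The arithmetic behind this comparison can be checked directly. The variable names below are illustrative; the rates and the group size of 10,000 are the ones quoted in the example.

```python
people = 10_000
rate_a_percent = 0.51    # Treatment A: chance of dying, in percent
rate_b_percent = 1.49    # Treatment B: chance of dying, in percent

# Expected deaths per 10,000 people (rounded to whole people).
expected_a = round(people * rate_a_percent / 100)
expected_b = round(people * rate_b_percent / 100)

# Rounding either rate to the nearest 1% hides the difference.
rounded_a = round(rate_a_percent)
rounded_b = round(rate_b_percent)

ratio = expected_b / expected_a
print(expected_a, expected_b, rounded_a, rounded_b, f"{ratio:.2f}")
```

Both rates round to the same whole percent, while the expected death counts differ by a factor of about 2.9.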

Page 21: Continuous Data

Sorted failure time data in cells B2 through B29 (n = 28).

Determine the 75th percentile.

75% has to be “converted” to 0.75 for use in PERCENTILE

=PERCENTILE(B2:B29, 0.75)

254.2

“254.2 is the 75th percentile.”

“The percentile rank of 254.2 is 75.”

Page 22: Continuous Data

# of Cars Owned

Suppose we surveyed 100 families.

Most would say 1 or 2, some 3, a few 4, and a few 0.

The data are highly discrete.

[Bar chart: # of Families (vertical axis, 0 to 50) by # of Cars Owned (horizontal axis, 0 to 4)]

Page 23: Continuous Data

# of Cars Owned (sorted)

0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 4

The 38th percentile is 2. The 82nd percentile is 2.

Percentiles don’t make much sense for discrete data (and make no sense for categorical data).
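The problem with ties can be seen by rebuilding the sorted data from its counts. The counts below were tallied from the sorted listing (5 zeros, 32 ones, 45 twos, 17 threes, 1 four). Reading "the kth percentile" as the value in sorted position k is a simplifying assumption that works here because n = 100. The helper name `value_at_position` is hypothetical.

```python
# Counts tallied from the sorted listing of 100 families.
counts = {0: 5, 1: 32, 2: 45, 3: 17, 4: 1}

# Rebuild the sorted data: each value repeated as many times as it was reported.
data = [cars for cars, families in sorted(counts.items()) for _ in range(families)]


def value_at_position(k):
    """Value in sorted position k (1-based); stands in for the kth percentile."""
    return data[k - 1]


# Every position whose value is 2 cars.
positions_with_two = [k for k in range(1, len(data) + 1) if value_at_position(k) == 2]

print(len(data))
print(value_at_position(38), value_at_position(82))
print(positions_with_two[0], positions_with_two[-1], len(positions_with_two))
```

Positions 38 through 82 all hold the value 2, so 45 different percentiles share one data value; the percent scale no longer distinguishes the data.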