math 102- statistics

Teaching Basic Statistics INTRODUCTION TO STATISTICS AND STATISTICAL INFERENCE

Upload: zahra-zulaikha

Post on 07-May-2015

DESCRIPTION

Dec 15, 2011 with Ma'am Daisy

TRANSCRIPT

Page 1: Math 102- Statistics

Teaching Basic Statistics

INTRODUCTION TO STATISTICS AND STATISTICAL INFERENCE

Page 2: Math 102- Statistics

Session 1.2

TEACHING BASIC STATISTICS

Page 3: Math 102- Statistics

Realities about Statistics

“There are three kinds of lies: lies, damned lies, and statistics.” – Mark Twain

One cannot go about without statistics.

“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” – Aaron Levenstein

Page 4: Math 102- Statistics

Definition of Statistics

plural sense: numerical facts, e.g. CPI, peso-dollar exchange rate

singular sense: scientific discipline consisting of theory and methods for processing numerical information that one can use when making decisions in the face of uncertainty.

Page 5: Math 102- Statistics

History of Statistics

The term statistics came from the Latin phrase “ratio status”, which means the study of practical politics or the statesman’s art.

In the middle of the 18th century, the term statistik (a term due to Achenwall) was used, a German term defined as “the political science of several countries.”

From statistik it became statistics, defined as a statement in figures and facts of the present condition of a state.

Page 6: Math 102- Statistics

Application of Statistics

Diverse applications

“During the 20th Century statistical thinking and methodology have become the scientific framework for literally dozens of fields including education, agriculture, economics, biology, and medicine, and with increasing influence recently on the hard sciences such as astronomy, geology, and physics. In other words, we have grown from a small obscure field into a big obscure field.” – Brad Efron

Page 7: Math 102- Statistics

Application of Statistics

Comparing the effects of five kinds of fertilizers on the yield of a particular variety of corn

Determining the income distribution of Ateneo students under CHED

Comparing the effectiveness of two diet programs

Prediction of daily temperatures

Evaluation of student performance

Page 8: Math 102- Statistics

Two Aims of Statistics

Statistics aims to uncover structure in data, to explain variation…

Descriptive

Inferential

Page 9: Math 102- Statistics

Areas of Statistics

Descriptive statistics: methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a larger group

Inferential statistics: methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data

Page 10: Math 102- Statistics

Examples of Descriptive Statistics

Presenting the Philippine population by constructing a graph indicating the total number of Filipinos counted during the last census by age group and sex

The Department of Social Welfare and Development (DSWD) cited statistics showing an increase in the number of child abuse cases during the past five years.

Page 11: Math 102- Statistics

Examples of Inferential Statistics

A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.

Page 12: Math 102- Statistics

Inferential Statistics

[Diagram: a smaller set (n units/observations) taken from a larger set (N units/observations); inferences and generalizations go from the smaller set to the larger set.]

Page 13: Math 102- Statistics

Key Definitions

A variable is a characteristic observed or measured on every unit of the universe.

A population is the set of all possible values of the variable.

Page 14: Math 102- Statistics

Key Definitions

Parameters are numerical measures that describe the population or universe of interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho), λ (lambda), τ (tau), θ (theta), α (alpha), and β (beta).

Statistics are numerical measures of a sample.

Page 15: Math 102- Statistics

Types of Variables

[Diagram: VARIABLES branch into Qualitative and Quantitative; Quantitative branches into Discrete and Continuous.]

Qualitative variable: non-numerical values

Quantitative variable: numerical values

a. Discrete: countable

b. Continuous: measurable

Page 16: Math 102- Statistics

Levels of Measurement

1. Nominal scale: numbers or symbols used to classify

2. Ordinal scale: accounts for order; no indication of distance between positions

3. Interval scale: equal intervals; no absolute zero

4. Ratio scale: has absolute zero

Page 17: Math 102- Statistics

NOMINAL SCALE

A nominal scale consists of a set of categories that have different names.

Measurements on a nominal scale label and categorize observations, but do not make any quantitative distinctions between observations.

Variables measured at the nominal scale:

Gender (1 = male, 0 = female)

ZIP code (7000 = Philippines, …)

Plate numbers of vehicles (JK3429, MC001, …)

Course (Biology, Mathematics, History, …)

Race (Asian, American, …)

Eye color (Brown, Blue, …)

Page 18: Math 102- Statistics

ORDINAL SCALE

An ordinal scale consists of a set of categories that are organized in an ordered sequence.

Measurements on an ordinal scale rank observations in terms of size.

Variables that can be measured at the ordinal scale:

Ranks in a race (first, second, third, …)

Sizes of shirts (small, medium, large, …)

Order of birth (first child, second child, third child, …)

Socio-economic status (lower, middle, upper, …)

Difficulty level of a test (easy, average, difficult, …)

Degree of agreement (SD, D, A, SA)

Page 19: Math 102- Statistics

INTERVAL SCALE

An interval scale consists of ordered categories that are all intervals of exactly the same size.

Equal differences between numbers on the scale reflect equal differences in magnitude; however, ratios of magnitudes are not meaningful.

Variables measured at the interval scale:

Temperature (in °F or °C)

IQ

SAT scores
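To illustrate why ratios are not meaningful on an interval scale, the sketch below converts two Celsius temperatures to Fahrenheit and compares the ratios. The function name `c_to_f` and the two sample temperatures are assumptions made for illustration; they do not come from the slides.

```python
# Illustrative sketch: a ratio of two temperatures depends on the unit,
# because neither Celsius nor Fahrenheit has an absolute zero.

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Hypothetical temperatures chosen for illustration
t1_c, t2_c = 10.0, 20.0
t1_f, t2_f = c_to_f(t1_c), c_to_f(t2_c)

ratio_c = t2_c / t1_c  # ratio of the Celsius readings
ratio_f = t2_f / t1_f  # ratio of the same temperatures in Fahrenheit

print(f"Ratio in Celsius:    {ratio_c:.2f}")
print(f"Ratio in Fahrenheit: {ratio_f:.2f}")
```

The same pair of temperatures gives a different ratio in each unit, so "twice as hot" carries no fixed meaning. On a ratio scale such as height, the ratio of two measurements would be the same in centimeters or in inches.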

Page 20: Math 102- Statistics

RATIO SCALE

A ratio scale is an interval scale with the additional feature of an absolute zero point.

Ratios of numbers do reflect ratios of magnitude.

Variables measured at the ratio scale:

Age (16, 20, 28, …)

Height (165 cm, 154 cm, 144 cm, …)

Reaction time (20 sec, 43 sec, 37 sec, …)

Number of siblings (2, 5, 8, …)

Hours spent on studying for an exam (0, 2, 3, …)

Page 21: Math 102- Statistics

Methods of Presenting Data

Textual

Tabular

Graphical

Page 22: Math 102- Statistics

Summary Measures

Location: Maximum, Minimum; Central Tendency (Mean, Median, Mode); Percentile, Quartile, Decile

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Skewness

Kurtosis

Page 23: Math 102- Statistics

Measures of Location

A measure of location summarizes a data set by giving a “typical value” within the range of the data values that describes its location relative to the entire data set.

Some Common Measures:

Minimum, Maximum

Central Tendency

Percentiles, Deciles, Quartiles

Page 24: Math 102- Statistics

Maximum and Minimum

Minimum is the smallest value in the data set, denoted as MIN.

Maximum is the largest value in the data set, denoted as MAX.

Page 25: Math 102- Statistics

Measure of Central Tendency

A single value that is used to identify the “center” of the data

it is thought of as a typical value of the distribution

precise yet simple

most representative value of the data

Page 26: Math 102- Statistics

Mean

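Most common measure of the center

Also known as arithmetic average

Sample Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Population Mean

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$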
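As a minimal sketch of the mean formula (sum of the observations divided by their count), the function below computes the mean of the data set used later in the "Computation of Standard Deviation" slide. The function name `mean` is an illustrative choice.

```python
# Minimal sketch of the mean: sum of the observations divided by n.

def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)

# Data set from the "Computation of Standard Deviation" slide
data = [10, 12, 14, 15, 17, 18, 18, 24]
print(mean(data))  # 128 / 8 = 16.0
```

The same function serves for a sample mean or a population mean; only the interpretation of the data (sample of n units or population of N units) differs.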

Page 27: Math 102- Statistics

Properties of the Mean

may not be an actual observation in the data set

can be applied to data measured on at least the interval level

easy to compute

every observation contributes to the value of the mean

Page 28: Math 102- Statistics

Properties of the Mean

subgroup means can be combined to come up with a group mean

easily affected by extreme values

[Figure: two dot plots on number lines, one from 0 to 10 and one extending to 14; Mean = 5 for the first and Mean = 6 for the second.]
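The two properties on this slide can be sketched as follows. All data values here are hypothetical: the slide's actual observations are not recoverable from the transcript, so the last two lists were simply chosen to give means of 5 and 6. The helper name `combined_mean` is also an assumption.

```python
# Sketch of two properties of the mean (all data below are hypothetical).

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def combined_mean(means, sizes):
    """Combine subgroup means into a group mean, weighting each by its size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

# 1. Subgroup means can be combined to come up with a group mean.
group_a = [2, 4, 6]   # subgroup mean 4.0, size 3
group_b = [10, 20]    # subgroup mean 15.0, size 2
pooled = combined_mean([mean(group_a), mean(group_b)],
                       [len(group_a), len(group_b)])
print(pooled, mean(group_a + group_b))  # both give the same group mean

# 2. The mean is easily affected by extreme values.
without_extreme = [1, 3, 5, 7, 9]
with_extreme = [1, 3, 5, 7, 14]   # only the largest value changed
print(mean(without_extreme))      # 5.0
print(mean(with_extreme))         # 6.0
```

Note that the combined mean is a size-weighted average; a plain average of the subgroup means would be correct only when the subgroups have equal sizes.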

Page 29: Math 102- Statistics

Median

Divides the observations into two equal parts

If n is odd, the median is the middle number.

If n is even, the median is the average of the 2 middle numbers.

Sample median is denoted as $\tilde{x}$, while population median is denoted as $\tilde{\mu}$.
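The odd and even cases of the median rule can be sketched as below. The function name `median` is illustrative; the two data sets are taken from other slides in this deck (the pulse rates with n = 15 and the standard deviation example with n = 8).

```python
# Sketch of the median rule: middle value if n is odd,
# average of the two middle values if n is even.

def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)      # the median is defined on the sorted array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of two

pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
data_even = [10, 12, 14, 15, 17, 18, 18, 24]

print(median(pulse_rates))  # n = 15 (odd): the 8th value, 71
print(median(data_even))    # n = 8 (even): (15 + 17) / 2 = 16.0
```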

Page 30: Math 102- Statistics

Properties of a Median

may not be an actual observation in the data set

can be applied to data measured on at least the ordinal level

a positional measure; not affected by extreme values

[Figure: two dot plots on number lines, one from 0 to 10 and one extending to 14; Median = 5.]

Page 31: Math 102- Statistics

Mode

occurs most frequently

nominal average

computation of the mode for ungrouped or raw data

[Figure: dot plot on a number line from 0 to 14; Mode = 9]

[Figure: dot plot on a number line from 0 to 6; No Mode]
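A minimal sketch of finding the mode of ungrouped data follows. It assumes one common convention, namely that a data set has no mode when every distinct value occurs equally often; the function name `modes` and all sample data are hypothetical, since the plotted values are not recoverable from the transcript.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes of ungrouped data.

    Assumed convention: if there are several distinct values and all of
    them occur equally often, the data set has no mode (empty list).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every value ties: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 5, 9, 9, 9, 12, 14]))      # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))        # no mode
print(modes([1, 1, 2, 2, 3]))              # two modes (not unique)
print(modes(["Brown", "Blue", "Brown"]))   # works for qualitative data
```

Returning a list makes the three cases on the slides explicit: one mode, several modes, or none.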

Page 32: Math 102- Statistics

Properties of a Mode

can be used for qualitative as well as quantitative data

may not be unique

not affected by extreme values

may not exist

Page 33: Math 102- Statistics

Mean, Median & Mode

Use the mean when:

sampling stability is desired

other measures are to be computed

Page 34: Math 102- Statistics

Mean, Median & Mode

Use the median when:

the exact midpoint of the distribution is desired

there are extreme observations

Page 35: Math 102- Statistics

Mean, Median & Mode

Use the mode when:

the "typical" value is desired

the data set is measured on a nominal scale

Page 36: Math 102- Statistics

Percentiles

Numerical measures that give the position of a data value relative to the entire data set.

Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.

The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.

Page 37: Math 102- Statistics

EXAMPLE

Suppose LJ was told that relative to the other scores on a certain test, his score was the 95th percentile. This means that 95% of those who took the test had scores less than or equal to LJ’s score, while 5% had scores higher than LJ’s.

Page 38: Math 102- Statistics

Deciles

Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.

The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile; and so on.

Page 39: Math 102- Statistics

Quartiles

Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.

The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile (also the median); and the 3rd quartile is the 75th percentile.
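Percentiles, deciles, and quartiles can be sketched with one function, since Dj = P(10j) and Qj = P(25j). Several conventions exist for locating a percentile in a finite data set. The version below assumes the position j(n + 1)/100 in the sorted array with linear interpolation, chosen because it reproduces Q1 = 60 and Q3 = 78 for the pulse-rate data used in the inter-quartile range example. The function names are illustrative.

```python
import math

def percentile(values, j):
    """Return the j-th percentile (0 < j < 100) of the data.

    Assumed convention: 1-based position j(n + 1)/100 in the sorted
    array, with linear interpolation between neighboring values.
    """
    if not 0 < j < 100:
        raise ValueError("j must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    position = j * (n + 1) / 100
    lower = math.floor(position)
    fraction = position - lower
    if lower < 1:        # position falls before the first observation
        return ordered[0]
    if lower >= n:       # position falls at or after the last observation
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def decile(values, j):
    """The j-th decile is the (10 * j)-th percentile."""
    return percentile(values, 10 * j)

def quartile(values, j):
    """The j-th quartile is the (25 * j)-th percentile."""
    return percentile(values, 25 * j)

pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
print(quartile(pulse_rates, 1))  # 4th ordered value: 60.0
print(quartile(pulse_rates, 2))  # 8th ordered value (the median): 71.0
print(quartile(pulse_rates, 3))  # 12th ordered value: 78.0
print(decile(pulse_rates, 5))    # same as the 50th percentile
```

Other conventions (for example, those offered by statistical software) can give slightly different values on small data sets, so the method should be stated when reporting percentiles.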

Page 40: Math 102- Statistics

Measures of Variation

A measure of variation is a single value that is used to describe the spread of the distribution

A measure of central tendency alone does not uniquely describe a distribution

Page 41: Math 102- Statistics

A look at dispersion…

[Figure: three dot plots on a common axis from 11 to 21]

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = 0.9258

Data C: Mean = 15.5, s = 4.57

Page 42: Math 102- Statistics

Two Types of Measures of Dispersion

Absolute Measures of Dispersion: Range, Inter-quartile Range, Variance, Standard Deviation

Relative Measure of Dispersion: Coefficient of Variation

Page 43: Math 102- Statistics

Range (R)

The difference between the maximum and minimum value in a data set, i.e.

R = MAX – MIN

Example: Pulse rates of 15 male residents of a certain village

54 58 58 60 62 65 66 71 74 75 77 78 80 82 85

R = 85 - 54 = 31

Page 44: Math 102- Statistics

Some Properties of the Range

The larger the value of the range, the more dispersed the observations are.

It is quick and easy to understand.

A rough measure of dispersion.

Page 45: Math 102- Statistics

Inter-Quartile Range (IQR)

The difference between the third quartile and first quartile, i.e.

IQR = Q3 – Q1

Example: Pulse rates of 15 residents of a certain village

54 58 58 60 62 65 66 71 74 75 77 78 80 82 85

IQR = 78 - 60 = 18
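The range and inter-quartile range for the pulse-rate data can be checked with a short sketch. It uses `statistics.quantiles` from the Python standard library (Python 3.8 or later) with its default "exclusive" method, which is an assumption about the quartile convention; for this data set it agrees with the quartiles shown in the example.

```python
import statistics

pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

# Range: difference between the maximum and the minimum
data_range = max(pulse_rates) - min(pulse_rates)

# Quartiles: n=4 returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(pulse_rates, n=4)
iqr = q3 - q1

print(data_range)   # 85 - 54 = 31
print(q1, q3, iqr)  # 60.0 78.0 18.0
```

Because the IQR uses only the middle half of the data, changing the largest value (85) to something far larger would change the range but leave the IQR at 18.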

Page 46: Math 102- Statistics

Some Properties of IQR

Reduces the influence of extreme values.

Not as easy to calculate as the Range.

Page 47: Math 102- Statistics

Variance

important measure of variation

shows variation about the mean

Population variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Page 48: Math 102- Statistics

Standard Deviation (SD)

most important measure of variation

square root of Variance

has the same units as the original data

Population SD

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample SD

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Page 49: Math 102- Statistics

Computation of Standard Deviation

Data: 10 12 14 15 17 18 18 24

n = 8, Mean = 16

$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = 4.309$$
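The computation above can be reproduced with a short sketch of the variance and standard deviation formulas. The function names are illustrative; the results are cross-checked against `statistics.stdev` and `statistics.pstdev` from the Python standard library.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    if not values:
        raise ValueError("population variance requires at least one observation")
    big_n = len(values)
    mu = sum(values) / big_n
    return sum((x - mu) ** 2 for x in values) / big_n

data = [10, 12, 14, 15, 17, 18, 18, 24]

s2 = sample_variance(data)   # 130 / 7
s = math.sqrt(s2)            # sample standard deviation
print(round(s2, 3), round(s, 3))

# Cross-check against the standard library
print(math.isclose(s, statistics.stdev(data)))
print(math.isclose(math.sqrt(population_variance(data)),
                   statistics.pstdev(data)))
```

The squared deviations sum to 130, so the sample variance is 130/7 and the sample standard deviation rounds to 4.309, matching the slide. Dividing by N = 8 instead would give the population variance, 16.25.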

Page 50: Math 102- Statistics

Remarks on Standard Deviation

If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.

If there is only a small amount of variation, then on average, the data values will be close to the mean. Hence, the SD will be small.

Page 51: Math 102- Statistics

Comparing Standard Deviation

[Figure: three dot plots on a common axis from 11 to 21]

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = 0.9258

Data C: Mean = 15.5, s = 4.57

Page 52: Math 102- Statistics

Comparing Standard Deviation

Example: Team A - Heights of five marathon players in inches

65” 65” 65” 65” 65”

Mean = 65”, s = 0

Page 53: Math 102- Statistics

Comparing Standard Deviation

Example: Team B - Heights of five marathon players in inches

62” 67” 66” 70” 60”

Mean = 65”, s = 4.0”

Page 54: Math 102- Statistics

Properties of Standard Deviation

It is the most widely used measure of dispersion. (Chebychev’s Inequality)

It is based on all the items and is rigidly defined.

It is used to test the reliability of measures calculated from samples.

The standard deviation is sensitive to the presence of extreme values.

It is not easy to calculate by hand (unlike the range).

Page 55: Math 102- Statistics

Coefficient of Variation (CV)

measure of relative variation

usually expressed in percent

shows variation relative to mean

used to compare 2 or more groups

Formula:

$$\text{CV} = \frac{\text{SD}}{\text{Mean}} \times 100\%$$

Page 56: Math 102- Statistics

Comparing CVs

Stock A: Average Price = P50

SD = P5

CV = 10%

Stock B: Average Price = P100

SD = P5

CV = 5%
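The stock comparison can be verified with a small sketch of the CV formula. The function name `coefficient_of_variation` is an illustrative choice; the prices and standard deviations are those given on the slide.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation in percent: 100 * SD / Mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

Both stocks have the same standard deviation (P5), but relative to its average price Stock A varies twice as much as Stock B, which is what the CV is designed to reveal.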

Page 57: Math 102- Statistics

Measure of Skewness

Describes the degree of departure of the distribution of the data from symmetry.

The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as,

$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{SD}}$$
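A minimal sketch of this coefficient of skewness follows. The slide does not say whether SD is the sample or the population standard deviation; the sketch assumes the sample SD (`statistics.stdev`). The function name `skewness_coefficient` and the third data set are hypothetical.

```python
import statistics

def skewness_coefficient(values):
    """SK = 3 * (Mean - Median) / SD, using the sample SD (an assumption)."""
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("SK is undefined when all values are equal")
    return 3 * (statistics.mean(values) - statistics.median(values)) / sd

# Data from the standard deviation example: mean = median = 16
data = [10, 12, 14, 15, 17, 18, 18, 24]
print(skewness_coefficient(data))          # 0.0

# Pulse rates from the range example: mean (about 69.7) below median (71)
pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
print(skewness_coefficient(pulse_rates))   # negative

# Hypothetical data with one large value: mean (6) above median (5)
right_tail = [1, 3, 5, 7, 14]
print(skewness_coefficient(right_tail))    # positive
```

The sign of SK depends only on whether the mean lies above or below the median, so the choice of sample or population SD changes the magnitude but not the direction of skewness.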

Page 58: Math 102- Statistics

What is Symmetry?

A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the “mirror image” of the distribution to the right of the mean. Likewise, a symmetric distribution has SK=0 since its mean is equal to its median and its mode.

Page 59: Math 102- Statistics

Measure of Skewness

[Figure: two distribution curves, one positively skewed and one negatively skewed]

Page 60: Math 102- Statistics

Measure of Kurtosis

Describes the extent of peakedness or flatness of the distribution of the data.

Measured by coefficient of kurtosis (K) computed as,

$$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4} - 3$$

Page 61: Math 102- Statistics

Measure of Kurtosis

K = 0: mesokurtic

K > 0: leptokurtic

K < 0: platykurtic
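The coefficient of kurtosis and its interpretation can be sketched as follows, assuming the population form of the formula (fourth-power deviations from the mean, divided by N times the fourth power of the population SD, minus 3). The function names, the tolerance `tol`, and the last two data sets are hypothetical choices for illustration.

```python
def kurtosis_coefficient(values):
    """K = sum((X - mu)^4) / (N * sigma^4) - 3, with population mu and sigma."""
    big_n = len(values)
    if big_n == 0:
        raise ValueError("kurtosis requires at least one observation")
    mu = sum(values) / big_n
    sigma2 = sum((x - mu) ** 2 for x in values) / big_n  # population variance
    if sigma2 == 0:
        raise ValueError("K is undefined when all values are equal")
    fourth = sum((x - mu) ** 4 for x in values)
    return fourth / (big_n * sigma2 ** 2) - 3            # sigma^4 = (sigma^2)^2

def classify_kurtosis(k, tol=1e-9):
    """Label K as mesokurtic, leptokurtic, or platykurtic."""
    if abs(k) <= tol:
        return "mesokurtic"
    return "leptokurtic" if k > 0 else "platykurtic"

# Data from the standard deviation example
data = [10, 12, 14, 15, 17, 18, 18, 24]
k = kurtosis_coefficient(data)
print(round(k, 4), classify_kurtosis(k))

# Hypothetical data: most values at the center, two far out in the tails
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -3, 3]
print(classify_kurtosis(kurtosis_coefficient(peaked)))

# Hypothetical two-point data: as flat as a distribution can be
flat = [-1, 1]
print(kurtosis_coefficient(flat))   # -2.0
```

Subtracting 3 centers the scale on the normal distribution, whose fourth standardized moment is 3; that is why K = 0 marks the mesokurtic case.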