measures of a distribution’s central tendency, spread, and shape chapter 3 sharon lawner weinberg...

Post on 22-Dec-2015


Measures of a Distribution’s Central Tendency, Spread, and Shape

Chapter 3

SHARON LAWNER WEINBERG SARAH KNAPP ABRAMOWITZ

Statistics Using SPSS: An Integrative Approach

SECOND EDITION

Overview

• Measures of Central Tendency (Level)
  • Mode
  • Median
  • Mean

• Measures of Dispersion (Spread)
  • Range
  • Interquartile Range
  • Variance
  • Standard Deviation

• Measure of Shape
  • Skewness and Skewness Ratio

Measures of Central Tendency: Mode

Definition: The mode is the score that occurs most often.

Useful when data are nominal or ordinal with only a limited number of categories.

To find the mode, click Analyze on the main menu bar, Descriptive Statistics, and then Frequencies.

Click on Options, and the square next to mode. Click OK.

Measures of Central Tendency: Mode

Example: Home Language Background (HOMELANG)

What is this variable’s mode?

Home Language Background

                        Frequency   Percent   Valid Percent   Cumulative Percent
Non-English Only               15       3.0             3.0                  3.0
Non-English Dominant           35       7.0             7.0                 10.0
English Dominant               47       9.4             9.4                 19.4
English Only                  403      80.6            80.6                100.0
Total                         500     100.0           100.0

Mode = English Only
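As a cross-check outside SPSS, the mode can be read straight off the frequency table. The sketch below copies the four frequencies from the table above; the variable names are illustrative only and are not part of the original slides.

```python
# Frequencies copied from the Home Language Background table.
frequencies = {
    "Non-English Only": 15,
    "Non-English Dominant": 35,
    "English Dominant": 47,
    "English Only": 403,
}

# The mode is the category with the largest frequency.
mode = max(frequencies, key=frequencies.get)

# Share of all cases that fall in the modal category.
total = sum(frequencies.values())
mode_percent = 100 * frequencies[mode] / total

print(mode, round(mode_percent, 1))
```

The result agrees with the table: English Only, with 80.6 percent of the 500 cases.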

Measures of Central Tendency: Mode

Example: Although the mode is technically the South, the North Central is close enough that the distribution may be considered bimodal.

Measures of Central Tendency: Mode

• Definition: A bimodal distribution is one with two modes, usually at some distance apart from each other.

• Definition: A uniform distribution is one in which all values occur with the same frequency.
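A minimal sketch of both definitions, using Python's standard `statistics.multimode`. The region counts and the uniform example are hypothetical values invented for illustration; they are not the data behind the slides.

```python
from statistics import multimode

# Hypothetical region data: two categories tie for the highest frequency.
regions = (
    ["South"] * 5
    + ["North Central"] * 5
    + ["West"] * 2
    + ["Northeast"] * 3
)
modes = multimode(regions)  # every value tied for the highest count

# Hypothetical uniform data: each value occurs exactly once.
uniform_modes = multimode([1, 2, 3, 4])

print(modes)
print(uniform_modes)
```

Note that `multimode` reports exact ties only. A near-tie, such as the South and North Central example, still requires the analyst's judgment to call the distribution bimodal.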

Measures of Central Tendency: Median

Definition: The median is the middle point in a distribution.

Useful when data are ordinal or scale and severely skewed.

To find the median, click Analyze on the main menu bar, Descriptive Statistics, and then Explore. Click OK.

Or, to find the median, click Analyze on the main menu bar, Descriptive Statistics, and then Frequencies.

Click on Options, and the square next to median. Click OK.
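The "middle point" can be illustrated with a short sketch using Python's standard `statistics.median`. The two small data sets are hypothetical. With an odd number of scores the median is the middle score after sorting; with an even number it is the average of the two middle scores.

```python
from statistics import median

# Hypothetical scores, invented for illustration.
odd_scores = [3, 1, 7, 5, 9]   # sorted: 1 3 5 7 9, middle score is 5
even_scores = [3, 1, 7, 5]     # sorted: 1 3 5 7, middle pair is 3 and 5

median_odd = median(odd_scores)
median_even = median(even_scores)

print(median_odd, median_even)
```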

Measures of Central Tendency: Mean

Definition: The mean is the sum of all of the data points divided by the number of data points.

Useful when data are scale and not severely skewed.

To find the mean, click Analyze on the main menu bar, Descriptive Statistics, and then Explore. Click OK.

OR use Frequencies OR use Descriptives.

Measures of Central Tendency: Mean

In the case where the variable is dichotomous and coded as 0 and 1, the mean is interpreted as the proportion of 1’s in the distribution.

• Example: Gender

Statistics: Gender

N Valid       500
N Missing       0
Mean          .55
Median       1.00
Mode            1

Gender

          Frequency   Percent   Valid Percent   Cumulative Percent
Male            227      45.4            45.4                 45.4
Female          273      54.6            54.6                100.0
Total           500     100.0           100.0
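The link between the mean of a 0/1 variable and a proportion can be checked directly. The sketch below rebuilds the Gender variable from the frequency table, assuming Male is coded 0 and Female is coded 1. That coding is an assumption, but it is consistent with the reported mean of .55 and mode of 1.

```python
# Rebuild the 0/1 variable from the frequency table.
# Assumption: Male = 0 (227 cases), Female = 1 (273 cases).
gender = [0] * 227 + [1] * 273

# Summing a 0/1 variable counts the 1's, so the mean is their proportion.
proportion_ones = sum(gender) / len(gender)

print(round(proportion_ones, 3))
```

The value 0.546 matches the 54.6 percent shown for Female and rounds to the .55 reported as the mean.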

Measures of Central Tendency: Comparing the Mean, Median, and Mode

Compare the values of the mode, median, and mean for SES, EXPINC30, and SCHATTRT.

Statistics

             Socio-Economic   Expected income   School Average Daily
             Status           at age 30         Attendance Rate
N Valid         500               459               417
N Missing         0                41                83
Mean          18.43          51574.73             93.65
Median        19.00          40000.00             95.00
Mode             19             50000                95

Measures of Dispersion Visually

When traveling to these two cities, would the same clothing be suitable for both cities at any time during the year from the point of view of warmth?

Measures of Dispersion

How can we quantify the obvious difference in temperature variability across the year between these two cities?

• One Answer: By using the range or interquartile range (IQR).
• Another Answer: By using the variance or standard deviation.

The Range and Interquartile Range

Definition: The range is the difference between the highest and lowest values in the distribution. The interquartile range (IQR) is the range of the middle half of the data, or the difference between the 75th and 25th percentiles.

Useful when data are ordinal or scale and severely skewed.

To find the IQR and range, click Analyze on the main menu bar, Descriptive Statistics, and then Explore. Click OK.
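A minimal sketch of both definitions with Python's standard library. The data values are hypothetical. Statistical packages do not all compute percentiles the same way, so the quartiles from `statistics.quantiles` (default "exclusive" method) may differ slightly from SPSS output on other data sets.

```python
from statistics import quantiles

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 5, 7, 8, 9, 11, 12, 13, 15]

# Range: highest value minus lowest value.
data_range = max(scores) - min(scores)

# Quartiles: n=4 returns the 25th, 50th, and 75th percentiles.
q1, q2, q3 = quantiles(scores, n=4)

# IQR: distance between the 75th and 25th percentiles.
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Because the IQR ignores the lowest and highest quarters of the data, a single extreme score would change the range but leave the IQR largely unaffected.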

The Variance and Standard Deviation

Definition: The variance is the average of the squared deviations from the mean. The standard deviation is the square root of the variance. We may think of the standard deviation as the distance we have to travel in both directions from the mean to capture the majority of values in a distribution. The farther out we need to travel, the more spread out are the values of the distribution from the mean.

Useful when data are scale and not severely skewed.

To find the SD and Variance, click Analyze on the main menu bar, Descriptive Statistics, and then Explore. Click OK.
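The definition can be traced step by step in a short sketch. The scores are hypothetical. One caution: "average" in the definition suggests dividing by N, whereas statistical packages commonly report the sample version, which divides by N − 1 (to our knowledge this includes SPSS, but treat that as an assumption to verify). Both versions are shown.

```python
from statistics import mean, pvariance, stdev, variance

# Hypothetical scores, invented for illustration; their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Squared deviations from the mean, summed by hand.
center = mean(scores)
sum_squared_deviations = sum((x - center) ** 2 for x in scores)

# Dividing by N gives the literal "average squared deviation".
population_variance = pvariance(scores)

# Dividing by N - 1 gives the sample variance.
sample_variance = variance(scores)

# The standard deviation is the square root of the variance.
sample_sd = stdev(scores)

print(sum_squared_deviations, population_variance, sample_variance, sample_sd)
```

For large samples the two versions are nearly identical; the difference matters mainly when N is small.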

Measures of Dispersion

We get the following values for the temperature example.

Consistent with the earlier boxplots, for all quantitative measures, Springfield is shown to have a greater temperature spread than San Francisco.

Descriptives: temp

city            Statistic              Value
Springfield     Variance             282.992
                Std. Deviation        16.822
                Range                     46
                Interquartile Range       34
San Francisco   Variance              29.061
                Std. Deviation         5.391
                Range                     15
                Interquartile Range       10
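The reported values can be checked for internal consistency, since each standard deviation should be the square root of the corresponding variance. The sketch below uses only the numbers in the Descriptives output; the dictionary layout and variable names are illustrative.

```python
import math

# Values copied from the Descriptives output for temp.
reported = {
    "Springfield": {"variance": 282.992, "sd": 16.822, "range": 46, "iqr": 34},
    "San Francisco": {"variance": 29.061, "sd": 5.391, "range": 15, "iqr": 10},
}

# Recompute each SD from its variance and compare with the reported SD.
recomputed_sd = {
    city: round(math.sqrt(stats["variance"]), 3)
    for city, stats in reported.items()
}

# How many times larger is Springfield's SD than San Francisco's?
sd_ratio = reported["Springfield"]["sd"] / reported["San Francisco"]["sd"]

print(recomputed_sd, round(sd_ratio, 2))
```

On every measure in the table Springfield's spread is larger; by the standard deviation it is roughly three times that of San Francisco.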

Measures of Dispersion

Key words to indicate that a question relates to dispersion: spread, variability, dispersion, heterogeneity, inconsistency, unpredictability.

Measures of Shape

Definition: The skewness statistic is a measure of the shape of a distribution. It is negative when the distribution is negatively skewed, zero when the distribution is not skewed, and positive when the distribution is positively skewed. Its calculation is based on the cubed deviations from the mean.

Definition: The skewness ratio is the value of the skewness statistic divided by its standard error. This measure is useful for determining the extent of skew. As a rule of thumb, when this ratio exceeds 2 in magnitude for small and moderate sized samples, the distribution is considered to be severely skewed.

Useful when data are scale.

To find the skewness ratio, click Analyze on the main menu bar, Descriptive Statistics, and then Explore. Click OK. Divide the skewness statistic by the standard error of the skew.
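The calculation can be sketched in a few lines. The slides do not give the formulas, so the ones below are an assumption: a commonly used sample-adjusted skewness statistic, built on the cubed deviations from the mean, together with its usual standard error, which depends only on the sample size. The function names and the example data are hypothetical.

```python
import math


def skewness(values):
    """Sample-adjusted skewness from cubed deviations (needs n >= 3)."""
    n = len(values)
    center = sum(values) / n
    sd = math.sqrt(sum((x - center) ** 2 for x in values) / (n - 1))
    cubed = sum((x - center) ** 3 for x in values)
    return n * cubed / ((n - 1) * (n - 2) * sd ** 3)


def skewness_se(n):
    """Standard error of the skewness statistic for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_ratio(values):
    """Skewness statistic divided by its standard error."""
    return skewness(values) / skewness_se(len(values))


# Hypothetical data: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), skewness(right_tailed), skewness_ratio(right_tailed))
```

Cubing keeps the sign of each deviation, which is why a long right tail pushes the statistic positive and a long left tail pushes it negative.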

Examples of Distributions of Different Shape

How the Shape of the Distribution Affects the Mean and Median

• For a severely positively skewed distribution, in general, the mean is greater than the median.
• For a severely negatively skewed distribution, in general, the mean is less than the median.
• For a symmetric distribution, the mean equals the median.
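The three statements above can be illustrated with small made-up data sets. All values below are hypothetical and chosen only so that one extreme score creates the skew.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration.
positively_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # long right tail
negatively_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]  # long left tail
symmetric = [1, 2, 3, 4, 5]

for label, data in [
    ("positive skew", positively_skewed),
    ("negative skew", negatively_skewed),
    ("symmetric", symmetric),
]:
    # The extreme score pulls the mean toward the tail; the median stays put.
    print(label, "mean =", mean(data), "median =", median(data))
```

In the positively skewed set the single score of 20 raises the mean to 4.75 while the median remains 3; the negatively skewed set mirrors this, and the symmetric set has mean and median both equal to 3.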

Which Measure of Central Tendency Should One Use

• An article in the Wall Street Journal online (http://online.wsj.com/article/SB118790518546107112.html) from August 24, 2007 reported the following:
  • The average cost of a wedding is between $27,400 and $28,800.
  • The median is approximately $15,000.

How can we justify this apparent contradiction in the cost of a wedding?

Applying What We Have Learned

• What is the extent to which eighth-grade males expect larger incomes at age 30 than eighth-grade females?

• To what extent is there lack of consensus among males in their income expectations as compared to females?

• How are the answers to these questions influenced by the outliers and general shape of these distributions as shown in the boxplots in the last slide?

Descriptive Statistics for Males and Females

Descriptives: Expected income at age 30

Gender   Statistic              Statistic    Std. Error
Male     Mean                    60720.93      5405.410
         Median                  45000.00
         Variance                  6E+009
         Std. Deviation         79258.866
         Minimum                        1
         Maximum                  1000000
         Range                     999999
         Interquartile Range        25000
         Skewness                   8.863          .166
Female   Mean                    43515.57      1726.263
         Median                  40000.00
         Variance                  7E+008
         Std. Deviation         26965.088
         Minimum                        0
         Maximum                   250000
         Range                     250000
         Interquartile Range        20000
         Skewness                   3.816          .156
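The skewness ratio and the gaps between the two groups can be computed directly from the output above. The sketch uses only the reported values; the dictionary layout and variable names are illustrative.

```python
# Values copied from the Descriptives output for expected income at age 30.
male = {"mean": 60720.93, "median": 45000.00, "skew": 8.863, "skew_se": 0.166}
female = {"mean": 43515.57, "median": 40000.00, "skew": 3.816, "skew_se": 0.156}

# Skewness ratio: skewness statistic divided by its standard error.
male_ratio = male["skew"] / male["skew_se"]
female_ratio = female["skew"] / female["skew_se"]

# Male minus female difference, measured by the mean and by the median.
mean_gap = male["mean"] - female["mean"]
median_gap = male["median"] - female["median"]

print(round(male_ratio, 1), round(female_ratio, 1))
print(round(mean_gap, 2), median_gap)
```

Both ratios are far above the rule-of-thumb cutoff of 2, so both distributions would be treated as severely positively skewed. That is a reason to weigh the median gap (5,000) and the interquartile ranges more heavily than the mean gap (about 17,205) and the standard deviations when answering the questions posed earlier.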

Boxplots for Males and Females
