Chapter 2: Frequency Distributions, Stem-and-Leaf Displays, and Histograms


Post on 21-Dec-2015


TRANSCRIPT

Page 1: Chapter 2 Frequency Distributions, Stem-and- leaf displays, and Histograms

Chapter 2

Frequency Distributions, Stem-and-leaf displays, and Histograms


Where have we been?


To calculate SS, the variance, and the standard deviation: find the deviations from μ, square and sum them (SS), divide by N (σ²), and take the square root (σ).

Example: Scores on a Psychology quiz

Student     X    X - μ    (X - μ)²
John        7    +1.00    1.00
Jennifer    8    +2.00    4.00
Arthur      3    -3.00    9.00
Patrick     5    -1.00    1.00
Marie       7    +1.00    1.00

ΣX = 30   N = 5   μ = 6.00

Σ(X - μ) = 0.00

Σ(X - μ)² = SS = 16.00

σ² = SS/N = 3.20   σ = √3.20 = 1.79
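The quiz example can be checked with a short Python sketch. The variable names are illustrative only; the scores are the five from the slide, and the variance divides by N (the population formula used here).

```python
from math import sqrt

# Quiz scores from the slide.
scores = {"John": 7, "Jennifer": 8, "Arthur": 3, "Patrick": 5, "Marie": 7}

n = len(scores)
mu = sum(scores.values()) / n                   # 30 / 5 = 6.00
deviations = [x - mu for x in scores.values()]  # these sum to zero
ss = sum(d ** 2 for d in deviations)            # sum of squared deviations (SS)
variance = ss / n                               # population variance: divide by N
sd = sqrt(variance)                             # standard deviation

print(f"mu = {mu:.2f}, SS = {ss:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```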


Ways of showing how scores are distributed around the mean

• Frequency distributions

• Stem-and-leaf displays

• Histograms


Some definitions

• Frequency Distribution - a tabular display of the way scores are distributed across all the possible values of a variable

• Absolute Frequency Distribution - displays the count (how many there are) of each score.

• Cumulative Frequency Distribution - displays the total number of scores at and below each score.

• Relative Frequency Distribution - displays the proportion of each score.

• Relative Cumulative Frequency Distribution - displays the proportion of scores at and below each score.


What does a relative frequency distribution do?


• Relative Frequency Distribution - displays the proportion of each score.


What does a cumulative frequency distribution show?


Some definitions

• Cumulative Frequency Distribution - displays the total number of scores at and below each score.


Example Data

Traffic accidents by bus drivers

• Studied 708 bus drivers, all of whom had worked for the company for the past 5 years or more.

• Recorded all accidents for the last 4 years.

• Data look like: 3, 0, 6, 0, 0, 2, 1, 4, 1, … 6, 0, 2


Frequency distributions

# of accidents   Absolute Frequency   Cumulative Frequency
 0               117                  117
 1               157                  274
 2               158                  432
 3               115                  547
 4                78                  625
 5                44                  669
 6                21                  690
 7                 7                  697
 8                 6                  703
 9                 1                  704
10                 3                  707
11                 1                  708
Total            708

To calculate cumulative frequencies, add up the frequencies of all scores at or below each point.
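As a minimal sketch of that running sum, the absolute frequencies from the bus-driver table can be accumulated in Python. The list layout (index = number of accidents) is an assumption made for illustration.

```python
from itertools import accumulate

# Absolute frequencies for 0 through 11 accidents (index = number of accidents).
absolute = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]

# Each cumulative frequency is the sum of all frequencies at or below that score.
cumulative = list(accumulate(absolute))

for accidents, (f, cf) in enumerate(zip(absolute, cumulative)):
    print(f"{accidents:>2}  {f:>4}  {cf:>4}")
```

The last cumulative frequency always equals N, here 708.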


What pops out of such a display

• 18 drivers (about 2.5% of the drivers) had 7 or more accidents during the 4 years just before the study.

• Those 18 drivers were involved in 147 accidents. The number of accidents for the whole group was 1623. So the worst 2.5% of the drivers (18 people) were involved in 147/1623 = 9% of the accidents.

• Maybe they should be given eye/reflex exams?

• Maybe they should be given desk jobs?
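These figures follow directly from the frequency table. The sketch below recomputes them; the variable names are hypothetical, and the frequencies are those listed for 0 through 11 accidents.

```python
# Absolute frequencies for 0 through 11 accidents (index = number of accidents).
absolute = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n_drivers = sum(absolute)

# Drivers with 7 or more accidents, and the accidents they account for.
drivers_7_plus = sum(absolute[7:])
accidents_7_plus = sum(k * f for k, f in enumerate(absolute) if k >= 7)
total_accidents = sum(k * f for k, f in enumerate(absolute))

print(f"{drivers_7_plus} drivers = {drivers_7_plus / n_drivers:.1%} of drivers")
print(f"{accidents_7_plus} of {total_accidents} accidents = "
      f"{accidents_7_plus / total_accidents:.1%}")
```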


Absolute and relative frequency distributions

# of accidents   Absolute Frequency   Relative Frequency
 0               117                  .165
 1               157                  .222
 2               158                  .223
 3               115                  .162
 4                78                  .110
 5                44                  .062
 6                21                  .030
 7                 7                  .010
 8                 6                  .008
 9                 1                  .001
10                 3                  .004
11                 1                  .001
Total            708                  .998

Calculate relative frequencies by dividing the number of each kind of score by N, the total number of scores. For example, 117/708 = .165.

Relative frequencies show the proportion of scores at each value.

These proportions don’t sum to 1.000 because of rounding error.
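A short sketch, assuming the same frequency list as the table, shows both the division by N and where the rounding error comes from: each proportion is rounded to three places before the column is summed.

```python
# Absolute frequencies for 0 through 11 accidents (index = number of accidents).
absolute = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(absolute)  # 708

# Relative frequency = frequency of a score divided by N, rounded to 3 places.
relative = [round(f / n, 3) for f in absolute]

exact_total = sum(f / n for f in absolute)  # 1 up to floating-point error
rounded_total = round(sum(relative), 3)     # total of the rounded column

print(relative)
print(f"exact total = {exact_total:.3f}, total of rounded values = {rounded_total:.3f}")
```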


What can you answer?

# of accidents   Relative Freq.
 0               .165
 1               .222
 2               .223
 3               .162
 4               .110
 5               .062
 6               .030
 7               .010
 8               .008
 9               .001
10               .004
11               .001
Total            .998

Percent with at most 1 accident? = .165 + .222 = .387; .387 * 100 = 38.7%

Proportion with 8 or more accidents? = .008 + .001 + .004 + .001 = .014

Percent with between 4 and 7 accidents? = .110 + .062 + .030 + .010 = .212 = 21.2%
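Each of these answers is a sum over a slice of the relative frequency column. A minimal sketch, with the helper name `proportion_between` invented for illustration:

```python
# Relative frequencies for 0 through 11 accidents, as listed in the table.
relative = [.165, .222, .223, .162, .110, .062, .030, .010, .008, .001, .004, .001]

def proportion_between(low, high):
    """Proportion of drivers with between low and high accidents, inclusive."""
    return round(sum(relative[low:high + 1]), 3)

at_most_1 = proportion_between(0, 1)
eight_or_more = proportion_between(8, 11)
four_to_seven = proportion_between(4, 7)

print(f"at most 1: {at_most_1:.1%}")
print(f"8 or more: {eight_or_more:.3f}")
print(f"4 to 7:    {four_to_seven:.1%}")
```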


Cumulative Frequencies

# of accidents   Absolute Frequency   Cumulative Frequency   Cumulative Relative Frequency
 0               117                  117                    .165
 1               157                  274                    .387
 2               158                  432                    .610
 3               115                  547                    .773
 4                78                  625                    .883
 5                44                  669                    .945
 6                21                  690                    .975
 7                 7                  697                    .984
 8                 6                  703                    .993
 9                 1                  704                    .994
10                 3                  707                    .999
11                 1                  708                    1.000
Total            708

Cumulative frequencies show the number of scores at or below each point.

Calculate by adding the frequencies of all scores at or below each point.

Cumulative relative frequencies show the proportion of scores at or below each point.

Calculate by dividing cumulative frequencies by N at each point.
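The two calculation rules can be sketched together in Python. The names are illustrative; the point is that the cumulative relative frequency is computed from the cumulative count (for example, 697/708 ≈ .984 for 7 accidents) rather than by adding already-rounded relative frequencies.

```python
from itertools import accumulate

# Absolute frequencies for 0 through 11 accidents (index = number of accidents).
absolute = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(absolute)

cumulative = list(accumulate(absolute))                  # counts at or below each score
cumulative_relative = [round(cf / n, 3) for cf in cumulative]

for accidents, (cf, crf) in enumerate(zip(cumulative, cumulative_relative)):
    print(f"{accidents:>2}  {cf:>4}  {crf:.3f}")
```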


Grouped Frequencies

Needed when

– number of values is large OR

– values are continuous.

To calculate group intervals

– First find the range.

– Determine a “good” interval based on

  • number of resulting intervals,

  • meaning of data, and

  • common, regular numbers.

– List intervals from largest to smallest.


Grouped Frequency Example

2.72 2.84 2.63 2.51 2.54 2.98 2.61 2.93 2.87 2.76 2.58 2.66 2.86 2.86
2.58 2.60 2.63 2.62 2.73 2.80 2.79 2.96 2.58 2.50 2.82 2.83 2.90 2.91
2.87 2.87 2.74 2.70 2.52 2.75 2.99 2.66 2.58 2.71 2.51 2.87 2.87 2.75
2.85 2.61 2.54 2.73 2.96 2.90 2.75 2.76 2.93 2.64 2.85 2.70 2.56 2.51
2.83 2.79 2.76 2.75 2.86 2.58 2.87 2.89 2.89 2.52 2.59 2.54 2.54 2.85
2.83 2.96 2.93 2.89 2.92 2.98 2.59 2.81 2.78 2.95 2.96 2.95 2.56 2.59
2.87 2.84 2.84 2.80 2.65 2.70 2.61 2.89 2.83 2.85 2.52 2.66 2.74 2.73
2.88 2.85

100 High school students’ average time in seconds to read ambiguous sentences.

Values range between 2.50 seconds and 2.99 seconds.


Determining “i” (the size of the interval)

• WHAT IS THE RULE FOR DETERMINING THE SIZE OF INTERVALS TO USE IN WHICH TO GROUP DATA?

• Whatever intervals seem appropriate to present the data most informatively. It is a matter of judgement. Usually we use 6 – 12 same-size intervals, each of which uses intuitively obvious endpoints (e.g., 5s and 0s).


Grouped Frequencies

Range = 2.99 - 2.50 = .49 ~ .50

i = .1, # of intervals = 5

Reading Time   Frequency
2.90-2.99      16
2.80-2.89      31
2.70-2.79      20
2.60-2.69      12
2.50-2.59      21

i = .05, # of intervals = 10

Reading Time   Frequency
2.95-2.99       9
2.90-2.94       7
2.85-2.89      20
2.80-2.84      11
2.75-2.79      10
2.70-2.74      10
2.65-2.69       4
2.60-2.64       8
2.55-2.59      10
2.50-2.54      11
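Both groupings can be reproduced from the 100 reading times with a short sketch. The function name and the choice to work in whole hundredths of a second (to sidestep floating-point comparisons) are assumptions of this example, not part of the original slides.

```python
from collections import Counter

# The 100 reading times (seconds) from the example data.
times = [
    2.72, 2.84, 2.63, 2.51, 2.54, 2.98, 2.61, 2.93, 2.87, 2.76, 2.58, 2.66, 2.86, 2.86,
    2.58, 2.60, 2.63, 2.62, 2.73, 2.80, 2.79, 2.96, 2.58, 2.50, 2.82, 2.83, 2.90, 2.91,
    2.87, 2.87, 2.74, 2.70, 2.52, 2.75, 2.99, 2.66, 2.58, 2.71, 2.51, 2.87, 2.87, 2.75,
    2.85, 2.61, 2.54, 2.73, 2.96, 2.90, 2.75, 2.76, 2.93, 2.64, 2.85, 2.70, 2.56, 2.51,
    2.83, 2.79, 2.76, 2.75, 2.86, 2.58, 2.87, 2.89, 2.89, 2.52, 2.59, 2.54, 2.54, 2.85,
    2.83, 2.96, 2.93, 2.89, 2.92, 2.98, 2.59, 2.81, 2.78, 2.95, 2.96, 2.95, 2.56, 2.59,
    2.87, 2.84, 2.84, 2.80, 2.65, 2.70, 2.61, 2.89, 2.83, 2.85, 2.52, 2.66, 2.74, 2.73,
    2.88, 2.85,
]

def grouped_frequencies(values, width_hundredths):
    """Count values per class; keys are lower stated limits in hundredths."""
    counts = Counter()
    for v in values:
        h = round(v * 100)                       # whole hundredths of a second
        counts[h - h % width_hundredths] += 1    # snap down to the class's lower limit
    return counts

for width in (10, 5):                            # i = .1, then i = .05
    counts = grouped_frequencies(times, width)
    print(f"i = {width / 100:.2f}, intervals = {len(counts)}")
    for low in sorted(counts, reverse=True):     # list from largest to smallest
        high = low + width - 1
        print(f"  {low / 100:.2f}-{high / 100:.2f}  {counts[low]}")
```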


Either is acceptable.

• Use whichever display seems most informative.

• In this case, the table with smaller intervals and 10 categories seems more informative.

• Sometimes it goes the other way and less detailed presentation is necessary to prevent the reader from missing the forest for the trees.


How you organize the data is up to you.

• When engaged in this kind of thing, there is often more than one way to organize the data.

• You should organize the data so that people can easily understand what is going on.

• Thus, the point is to use the grouped frequency distribution to provide a simplified description of the data.


Determining “i” (the size of the interval)

• WHAT IS THE RULE FOR DETERMINING THE SIZE OF INTERVALS TO USE IN WHICH TO GROUP DATA?


Determining “i” (the size of the interval)

• Whatever intervals seem appropriate to present the data most informatively. Usually we use 6 – 12 same-size intervals, each of which uses intuitively obvious endpoints (e.g., 5s and 0s).


Stem and Leaf Displays

• Used when seeing all of the values is important.

• Shows

  – data grouped

  – all values

  – visual summary


Stem and Leaf Display

• Reading time data

i = .05, # of intervals = 10

Reading Time   Leaves
2.9            5,5,6,6,6,6,8,8,9
2.9            0,0,1,2,3,3,3
2.8            5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.8            0,0,1,2,3,3,3,3,4,4,4
2.7            5,5,5,5,6,6,6,8,9,9
2.7            0,0,0,1,2,3,3,3,4,4
2.6            5,6,6,6
2.6            0,1,1,1,2,3,3,4
2.5            6,6,8,8,8,8,8,9,9,9
2.5            0,1,1,1,2,2,2,4,4,4,4


Stem and Leaf Display

• Reading time data

i = .1, # of intervals = 5

Reading Time   Leaves
2.9            0,0,1,2,3,3,3,5,5,6,6,6,6,8,8,9
2.8            0,0,1,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.7            0,0,0,1,2,3,3,3,4,4,5,5,5,5,6,6,6,8,9,9
2.6            0,1,1,1,2,3,3,4,5,6,6,6
2.5            0,1,1,1,2,2,2,4,4,4,4,6,6,8,8,8,8,8,9,9,9
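A stem-and-leaf display with i = .1 can be built by splitting each value into a stem (the first two digits) and a leaf (the last digit). The sketch below is illustrative only: the function name is hypothetical, and to keep it short it uses just the first 14 reading times from the data set rather than all 100.

```python
from collections import defaultdict

# First 14 of the 100 reading times (a subset, used here only to keep the example short).
sample = [2.72, 2.84, 2.63, 2.51, 2.54, 2.98, 2.61, 2.93, 2.87, 2.76, 2.58, 2.66, 2.86, 2.86]

def stem_and_leaf(values):
    """Map each stem (tenths, e.g. 27 for 2.7) to its sorted list of leaves."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(round(v * 100), 10)  # 2.72 -> stem 27, leaf 2
        leaves[stem].append(leaf)
    return {stem: sorted(ls) for stem, ls in leaves.items()}

display = stem_and_leaf(sample)
for stem in sorted(display, reverse=True):       # largest stem first, as on the slide
    print(f"{stem / 10:.1f} | {','.join(map(str, display[stem]))}")
```

Every original value can be read back from the display, and the length of each row shows the frequency of its class.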


What are the advantages of stem and leaf displays?


You can see both all of the values and the shape of the distribution.


Figural displays of frequency data


Bar graphs

• Bar graphs are used to show frequency of scores when you have a discrete variable.

• Discrete data can only take on a limited number of values.

• Numbers between adjoining values of a discrete variable are impossible or meaningless.

• Bar graphs show the frequency of specific scores or ranges of scores of a discrete variable.

• The proportion of the total area of the figure taken by a specific bar equals the proportion of that kind of score.

• Note, in this context proportion and relative frequency are synonymous.


The results of rolling a six-sided die 120 times

120 rolls, and it came out 20 ones, 20 twos, etc.

[Bar graph: die face (1 to 6) on the horizontal axis; frequency (0 to 100) on the vertical axis.]


Bar graphs and Histograms

• Use bar graphs, not histograms, for discrete data. (The bars don’t touch in a bar graph; they do in a histogram.)

• You rarely see data that is really discrete.

• Discrete data are almost always categories or rankings. ANYTHING ELSE IS ALMOST CERTAINLY A CONTINUOUS VARIABLE.

• Use histograms for continuous variables.

• AGAIN, almost every score you will obtain reflects the measurement of a continuous variable.


Bar graphs and Histograms

• Discrete data are almost always ???

• ANYTHING ELSE IS ALMOST CERTAINLY A ???


Bar graphs and Histograms

• Discrete data are almost always categories or rankings.

• ANYTHING ELSE IS ALMOST CERTAINLY A CONTINUOUS VARIABLE.


Bar graphs and Histograms

• How often do you see data that is really discrete?


Bar graphs and Histograms

• You rarely see data that is really discrete.

• Almost every score you will obtain reflects the measurement of a continuous variable.


A stem and leaf display turned on its side shows the transition to purely figural displays of a continuous variable

[Figure: the stem-and-leaf display of reading times rotated 90 degrees, with the leaves stacked above the class intervals 2.50-2.54 through 2.95-2.99.]


Histogram of reading times – notice how the bars touch at the real limits of each class!

[Histogram: Reading Time (seconds) on the horizontal axis, classes 2.50-2.54 through 2.95-2.99; Frequency (0 to 20) on the vertical axis.]


Histogram concepts - 1

• Histograms must be used to display continuous data.

• Most scores obtained by psychologists are continuous, even if the scores are integers.

• WHAT COUNTS IS WHAT YOU ARE MEASURING, NOT THE PRECISION OF MEASUREMENT.

• INTEGER SCORES IN PSYCHOLOGY ARE USUALLY ROUGH MEASUREMENTS OF CONTINUOUS VARIABLES.


An Example

• For example, while scores on a ten-question multiple-choice intro psych quiz (1, 2, … 10) are integers, you are measuring knowledge, which is a continuous variable that could be measured with 10,000 questions, each counting .001 points. Or 1,000,000 questions, each worth .00001 points.

• You measure at a specific level of precision, because that’s all you need or can afford. Logistics, not the nature of the variable, constrains the measurement of a continuous variable.


Histogram concepts - 2

• If you have continuous data, you can use histograms, but remember real class limits.

• Histograms can be used for relative frequencies as well.

• Histograms can be used to describe theoretical distributions as well as actual distributions.


What are the real limits of the fifth class? The highest class?

[Histogram: Reading Time classes 2.50-2.54 through 2.95-2.99 on the horizontal axis; Frequency (0 to 20) on the vertical axis.]

Real limits of the fifth class are ???? - ????

Real limits of the highest class are ???? - ????


Real limits of the fifth class are 2.695 - 2.745.

Real limits of the highest class are 2.945 - 2.995.

[Histogram: Reading Time classes 2.50-2.54 through 2.95-2.99 on the horizontal axis; Frequency (0 to 20) on the vertical axis.]
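Real limits extend each stated class limit by half the unit of measurement, which is why adjacent bars touch. A minimal sketch, assuming times recorded to the nearest .01 second; the function name and the use of `Decimal` (to avoid binary floating-point noise) are choices made for this example.

```python
from decimal import Decimal

def real_limits(stated_lower, stated_upper, unit="0.01"):
    """Return (lower, upper) real limits: stated limits widened by half a unit."""
    half_unit = Decimal(unit) / 2
    return Decimal(stated_lower) - half_unit, Decimal(stated_upper) + half_unit

fifth_class = real_limits("2.70", "2.74")
highest_class = real_limits("2.95", "2.99")

print("fifth class:  ", *fifth_class)
print("highest class:", *highest_class)
```

The upper real limit of one class equals the lower real limit of the next, so no score can fall in a gap between classes.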


Displaying theoretical distributions is the most important function of histograms.

• Theoretical distributions show how scores can be expected to be distributed around the mean.


TYPES OF THEORETICAL DISTRIBUTIONS

• Distributions are named after the shapes of their histograms. For psychologists, the most important are:

  – Rectangular

  – J-shaped

  – Bell (Normal)

  – t distributions - close to bell shaped, but a little flatter


Rectangular Distribution of scores


The rectangular distribution is the “know nothing” distribution

• Our best prediction is that everyone will score at the mean.

• But in a rectangular distribution, scores far from the mean occur as often as do scores close to the mean.

• So the mean tells us nothing about where the next score will fall (or how the next person will behave).

• We know nothing in that case.


Flipping a coin: Rectangular distributions are frequently seen in games of chance, but rarely elsewhere.

100 flips - how many heads and tails do you expect?

[Bar graph: Heads and Tails on the horizontal axis; frequency (0 to 100) on the vertical axis.]


Rolling a die

120 rolls - how many of each number do you expect?

[Bar graph: die face (1 to 6) on the horizontal axis; frequency (0 to 100) on the vertical axis.]


Which distribution is this?

[Bar graph: die face (1 to 6) on the horizontal axis; frequency (0 to 100) on the vertical axis.]


RECTANGULAR!

[Bar graph: die face (1 to 6) on the horizontal axis; frequency (0 to 100) on the vertical axis.]


What happens when you sample two scores at a time?

• All of a sudden things change.

• The distribution of scores begins to resemble a normal curve!!!!

• The normal curve is the “we know something” distribution, because most scores are close to the mean.


Rolling 2 dice

Look at the histogram to see how this resembles a bell shaped curve.

Dice Total   Absolute Freq.   Relative Frequency
 1           0                .000
 2           1                .028
 3           2                .056
 4           3                .083
 5           4                .111
 6           5                .139
 7           6                .167
 8           5                .139
 9           4                .111
10           3                .083
11           2                .056
12           1                .028
Total        36               1.001
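The theoretical frequencies in this table come from counting the 36 equally likely outcomes of two dice. A brief sketch that enumerates them (the variable names are illustrative):

```python
from collections import Counter
from itertools import product

# All 36 equally likely (first die, second die) outcomes, tallied by their total.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
n_outcomes = sum(totals.values())  # 36

for total in range(1, 13):
    f = totals[total]              # a total of 1 is impossible, so its count is 0
    print(f"{total:>2}  {f}  {f / n_outcomes:.3f}")
```

Summing the rounded relative frequencies gives 1.001 rather than 1.000, the same kind of rounding error seen in the accident data.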


Rolling 2 dice

360 rolls

[Histogram: dice total (1 to 12) on the horizontal axis; frequency (0 to 100) on the vertical axis.]


Normal Curve


The normal curve is the “we know ???” distribution? Why?


It’s the “we know something” distribution, because most scores are near the mean.


J Curve

Occurs when socially normative behaviors are measured. Most people follow the norm, but there are always a few outliers.


What does the J shaped distribution represent?

• The J shaped distribution represents situations in which most everyone does about the same thing. These are unusual social situations with very clear contingencies.

• For example, how long do cars without handicapped plates park in a handicapped spot when there is a cop standing next to the spot?

• Answer: Zero minutes!

• So, the J shaped distribution is the “we know almost everything” distribution, because we can predict how a large majority of people will behave.


When do you get a J shaped distribution?


When do you get a J shaped distribution?

Occurs when socially normative behaviors are measured. Most people follow the norm, but there are always a few outliers.


Principles of Theoretical Curves

Expected frequency = Theoretical relative frequency × N

Expected frequencies are your best estimates because they are closer, on the average, than any other estimate when we square the difference between observed and predicted frequencies.

Law of Large Numbers - The more observations that we have, the closer the relative frequencies we actually observe should come to the theoretical relative frequency distribution.
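Both principles can be illustrated with a fair six-sided die. This is a sketch under stated assumptions: the function names are hypothetical, the die is simulated with Python's pseudo-random generator, and the seed is arbitrary, so the printed deviations will vary with the seed and are not guaranteed to shrink on every step.

```python
import random
from collections import Counter
from fractions import Fraction

def expected_frequency(theoretical_relative_frequency, n):
    """Expected frequency = theoretical relative frequency x N."""
    return theoretical_relative_frequency * n

def observed_relative(n_rolls, rng):
    """Roll a simulated fair die n_rolls times; return relative frequency per face."""
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# 120 rolls of one die: each face is expected 120 x 1/6 = 20 times.
print(expected_frequency(Fraction(1, 6), 120))   # 20

# Law of Large Numbers: with more rolls, observed proportions tend toward 1/6.
rng = random.Random(0)                           # arbitrary seed, for repeatability
for n_rolls in (60, 600, 60000):
    rel = observed_relative(n_rolls, rng)
    largest_gap = max(abs(p - 1 / 6) for p in rel.values())
    print(f"{n_rolls:>6} rolls: largest gap from 1/6 = {largest_gap:.4f}")
```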