Statistics 1
How long is a name?
• To answer this question, we might collect some data on the length of a name.
How long is a name?
• First we need to establish our target population.
• Let’s say in this mathematics class.
How long is a name?
• What names should we use?
• Names as listed on the roll.
Data
Averaging
• An average is a measure of central tendency.
• There are 3 measures which we can use.
• MEAN
• MEDIAN
• MODE
Mean
• Usually when we say average, we are referring to the mean.
• To find the mean, we add up all the numbers and divide by how many there are.
Example
• Find the mean of 4, 0, 2, 1, 6
In Excel we can use the formula
• =AVERAGE(highlighted cells), for example =AVERAGE(A1:A5) if the data is in cells A1 to A5
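The same calculation can be sketched in Python. The function name `mean` is only an illustrative choice; it adds up the numbers and divides by how many there are, as described above.

```python
def mean(values):
    """Add up all the numbers and divide by how many there are."""
    return sum(values) / len(values)


data = [4, 0, 2, 1, 6]
# The total is 13 and there are 5 numbers, so the mean is 13 / 5.
print(mean(data))  # 2.6
```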
Data on names
Median
• A median is the middle value when the data is put in order.
• If there is an odd number of data values, there is a single middle value.
• If there is an even number of data values, we need to average the two middle values.
Example
• Find the median of 4, 8, 2, 9, 1
• First put them in order
• 1, 2, 4, 8, 9
• The middle number is ‘4’
Example
• Find the median of 4, 8, 2, 9, 1, 6
• First put them in order
• 1, 2, 4, 6, 8, 9
• The middle numbers are ‘4’ and ‘6’
• Averaging them gives a median of 5.
Sort the data in Excel, or use the formula =MEDIAN(data)
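As a rough sketch of the same steps in Python (the function name `median` is an illustrative choice): put the data in order, then take the middle value, or average the two middle values when the count is even.

```python
def median(values):
    """Middle value of the ordered data (average of the two middles if even)."""
    ordered = sorted(values)          # first put them in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 8, 2, 9, 1]))     # 4
print(median([4, 8, 2, 9, 1, 6]))  # 5.0
```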
Mode
• The mode is the most common number.
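• A data set can have 2 modes (it is then called bimodal). If more than 2 values tie for most common, the mode is usually not a useful measure.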
Example
• Find the mode of 6, 4, 3, 7, 8, 6, 7, 2
• There are two modes 6 and 7
Using Excel
• Formula =MODE(data)
• You must be careful, as this formula will only give one mode, even when the data has two
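To see every mode rather than just one, a small Python sketch can count how often each number occurs and keep all the values that share the highest count. The helper name `modes` is an assumption made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the greatest number of times."""
    counts = Counter(values)                 # how often each number occurs
    highest = max(counts.values())           # the largest count
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([6, 4, 3, 7, 8, 6, 7, 2]))  # [6, 7]
```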
Which average is the best?
• Generally we use the mean, as it includes all the data. If there are extreme values, the median is a better measure, as it is not affected by them.
Example
• These are the incomes of a group of university students.
• $2400, $1500, $2000, $1800, $22 000
• Find the best ‘average’.
• The mean ($5940) is not representative, whereas the median ($2000) is.
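The comparison can be checked with Python's standard `statistics` module. This is only a sketch using the five incomes listed above; the variable names are illustrative.

```python
import statistics

incomes = [2400, 1500, 2000, 1800, 22000]

mean_income = statistics.mean(incomes)      # pulled up by the $22 000 value
median_income = statistics.median(incomes)  # middle of the ordered incomes

print(f"mean = ${mean_income:,.0f}")      # mean = $5,940
print(f"median = ${median_income:,.0f}")  # median = $2,000
```

Four of the five students earn well under the mean, which is why the median describes the group better here.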
Frequency tables
Length   Tally             Frequency
3        ll                2
4        llll              5
5        llll llll llll    14
6        llll ll           7
7        llll              5
8        ll                2
Mode is 5
Median is also 5
Mean is 5.4
Calculating the mean by hand
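One way to lay out the hand calculation, sketched in Python: multiply each length by its frequency, add these products, and divide by the total frequency. The dictionary below copies the Length and Frequency columns of the frequency table; the variable names are illustrative.

```python
# Length of name -> frequency, copied from the frequency table.
frequency = {3: 2, 4: 5, 5: 14, 6: 7, 7: 5, 8: 2}

total_names = sum(frequency.values())
total_letters = sum(length * count for length, count in frequency.items())
mean_length = total_letters / total_names

# 35 names, 189 letters in total, so the mean is 189 / 35.
print(total_names, total_letters, mean_length)  # 35 189 5.4
```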
Using the calculator
• STAT mode
• Place data in list 1
• Place frequency in list 2
• CALC, SET
• 1Var Xlist list1
• 1Var Freq list2
• EXE
• 1Var
Measures of spread
• It is not enough to just give the ‘average’.
• The mean, median and mode are the same (all 50) for these 3 sets of data:
• 48 49 50 50 51 52
• 40 45 50 50 50 55 60
• 0 0 50 50 50 100 100
• But the data sets are quite different
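A quick check of this claim, sketched with Python's standard `statistics` module: each of the three data sets gives 50 for the mean, the median and the mode.

```python
import statistics

data_sets = [
    [48, 49, 50, 50, 51, 52],
    [40, 45, 50, 50, 50, 55, 60],
    [0, 0, 50, 50, 50, 100, 100],
]

# (mean, median, mode) for each data set
summaries = [
    (statistics.mean(d), statistics.median(d), statistics.mode(d))
    for d in data_sets
]

for summary in summaries:
    print(summary)  # every value in every tuple equals 50
```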
Measures of spread
• Range is
• (highest number) - (lowest number)
• For our data set the first names have a range of 8 - 3 = 5
Measures of spread
• Again, if there are extreme values, the range can distort the true spread of the data.
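A small Python sketch of the range, using the three data sets above and, as an illustration of the effect of an extreme value, the student incomes from the earlier example. The function name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Range = (highest number) - (lowest number)."""
    return max(values) - min(values)


print(data_range([48, 49, 50, 50, 51, 52]))       # 4
print(data_range([40, 45, 50, 50, 50, 55, 60]))   # 20
print(data_range([0, 0, 50, 50, 50, 100, 100]))   # 100

# One extreme income dominates the range.
incomes = [2400, 1500, 2000, 1800, 22000]
print(data_range(incomes))                         # 20500
print(data_range([x for x in incomes if x != 22000]))  # 900
```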
5-number summary
• We often sort the data into a 5-number summary: lowest, lower quartile, median, upper quartile, highest.
• These values split the ordered data into 4 groups
Example 1
• 1 14 29 35 43 48 49 78 82 82 92 95 95
• 13 numbers
• Lowest is 1
• Lower quartile is 35
• Median is 49
• Upper quartile is 82
• Highest is 95
Example 2
• 9 11 17 22 23 28 30 36
• Lower quartile is 14
• Median is 22.5
• Upper quartile is 29
• 5-number summary is
• 9 14 22.5 29 36
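The slides do not state the exact rule used for the quartiles. One rule that reproduces both examples, and which is assumed in this Python sketch, is to take the median of each half of the ordered data, counting the overall median in both halves when the number of values is odd. The function names are illustrative.

```python
def median(ordered):
    """Median of data that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Lowest, lower quartile, median, upper quartile, highest."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: for an odd count the median belongs to both halves.
    half = (n + 1) // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


example_1 = [1, 14, 29, 35, 43, 48, 49, 78, 82, 82, 92, 95, 95]
example_2 = [9, 11, 17, 22, 23, 28, 30, 36]
print(five_number_summary(example_1))  # (1, 35, 49, 82, 95)
print(five_number_summary(example_2))  # (9, 14.0, 22.5, 29.0, 36)
```

Other textbooks and software use slightly different quartile rules, so results can differ a little for small data sets.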
For first names in our class
• The 5-number summary is 3 4 5 6 8
• Lower quartile is 4
• Upper quartile is 6
• Interquartile range is the difference between the quartiles: 6 - 4 = 2
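As a short sketch in Python, the interquartile range can be read straight off a 5-number summary. The function name is illustrative, and the summaries are the ones given above for the class names and for Examples 1 and 2.

```python
def interquartile_range(summary):
    """Upper quartile minus lower quartile of a 5-number summary."""
    lowest, lower_quartile, median, upper_quartile, highest = summary
    return upper_quartile - lower_quartile


print(interquartile_range((3, 4, 5, 6, 8)))        # 2   (first names)
print(interquartile_range((1, 35, 49, 82, 95)))    # 47  (Example 1)
print(interquartile_range((9, 14, 22.5, 29, 36)))  # 15  (Example 2)
```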
Statistics so far
• Central tendencies:
• Mean = 5.4
• Median = 5
• Mode = 5
• Because the mean and median are about the same, we wouldn’t expect extreme values.
Statistics so far
• Measures of spread:
• Range = 5
• Interquartile range = 2
Statistics so far
• 5 - number summary
• 3 4 5 6 8