Nurhana Mohamad. PPD 2 2014/15
CHAPTER 1: STATISTICS (2 hours)
1.0 Introduction
1.1 Organizing the Data
1.2 Measure of Central Tendency
grouped data, ungrouped data: mean, mode, median.
1.3 Measure of Dispersion
Range, variance, and standard deviation.
1.0 INTRODUCTION
What is Statistics?
Statistics is the science that deals with the collection, classification, analysis and interpretation of
information/data in order to make decisions.
Statistics is divided into 2 parts:
1. Descriptive statistics
2. Inferential statistics
Descriptive statistics
The process of gathering, presenting and summarizing data.
Example: A researcher collects data on the amount students spend on food, leisure and academic
requirements from their study loan. He then summarizes the data by finding the mean and
standard deviation of the data. He also draws some graphs and charts to present his findings.

Inferential statistics
Concerned with making conclusions or inferences about a population from the samples that have
been drawn from it.
Example: A researcher carries out an analysis to find out whether it is true that students spend
less than 10% of their study loan on textbooks.
Statistics Definition
1. Population
A population is any entire collection of objects from which we may collect data.
This could be people, animals, microchips and so on.
Example: A study on the performance of four-wheel drive vehicles has a population
consisting of all four-wheel drive vehicles.
2. Sample
A sample is a group of units that is a subset of the population.
Example: 54 four-wheel drive vehicles from six models are selected for the study.
3. Variable
A variable is a particular characteristic of the object being studied. This characteristic can
take on different values as we measure/gather it from one object to another
Example: The variables measured could be fuel consumption, car model and seating
capacity.
4. Random
Randomness means unpredictability. One of the requirements of the sampling process is that it
conforms to randomness. Hence, the variables being measured are called random variables.
Example: A sample of 10 students who crossed the school gate between 7:30 and 7:35 am
was randomly selected to be interviewed by the prefect.
5. Data
Data are basically numbers derived from measuring a variable. However, data can also be
non-numeric.
Example: Height (153.4 cm, 141 cm) and favorite color (red, blue)
6. Quantitative data (or numerical data)
-consists of numbers used for calculations.
-can be divided into two types: continuous and discrete.
i. Continuous data -can take on values with decimal places or fractions.
Example: weight, pressure, time, temperature, cost
ii. Discrete data -whole numbers.
Example: the number of cars in a parking lot, the number of students in a computer
lab
7. Qualitative data (or non-numeric data)
-labels, names or levels.
-can be divided into two types: nominal and ordinal.
i. Nominal data represent labels or names.
Example: the colors of pH paper (red=1, orange=2, yellow=3, blue=4)
ii. Ordinal data represent levels or order.
Example: the first, second and third place in a competition
8. Parameter
A parameter is a value used to represent a certain population characteristic. Parameters
are assigned Greek letters.
Example: mean = $\mu$ and standard deviation = $\sigma$
9. Statistic
A statistic is a number that summarizes some aspect of the data and is calculated from the data
collected from the sample. Statistics are assigned Roman letters.
Example: mean = $\bar{x}$ and standard deviation = $s$
1.1 ORGANIZING THE DATA
Data can be defined as groups of information that represent the qualitative or quantitative
attributes of a variable or set of variables. Data in statistics can be classified into grouped data
and ungrouped data.
Any data that you first gather is ungrouped data. Ungrouped data is data in its raw form. An
example of ungrouped data is any list of numbers that you can think of.
Grouped data is data that has been organized into groups known as classes. Grouped
data has been 'classified' and thus some level of data analysis has taken place, which means that
the data is no longer raw.
Example
Data on Minutes Spent on the Phone by 30 Respondents

Ungrouped Data
102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92

Grouped Data
Minutes    Frequency
67-78      3
79-90      5
91-102     8
103-114    9
115-126    5
The Steps for Organizing Ungrouped Data into Grouped Data
The example used in each step is the Data on Minutes Spent on the Phone by 30 Respondents:
102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92

1. Determine how many classes you want to have. A frequency distribution should have at
least 5 classes, but no more than 15. You can use the formula $k = 1 + 3.3 \log_{10} n$, where
$k$ = number of classes and $n$ = total number of data.
Example: Let us say we want to have 5 classes.

2. Next, subtract the lowest value in the data set from the highest value in the data set,
and then divide by the number of classes that you want to have:
Size of class $= \frac{\text{highest value} - \text{lowest value}}{\text{number of classes you want to have}}$
Example:
Highest value = 125
Lowest value = 67
Size of class $= \frac{125 - 67}{5} = 11.6 \approx 12$

3. Build a frequency distribution table that follows the size of class.

4. Start the first class with the lowest value.

5. Tally the data and place the results in the frequency column.
Example:
Minutes    Tally       Frequency
67-78      ///         3
79-90      ////        5
91-102     //// //     8
103-114    //// ///    9
115-126    ////        5
                       Total=30
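The steps above can be sketched in Python. This is only a minimal sketch, assuming whole-number data; the function names `class_width` and `frequency_table` are illustrative and do not come from these notes.

```python
import math

def class_width(data, num_classes):
    """Step 2: (highest - lowest) / number of classes, taken up to the
    next whole number so the highest value still fits in the last class."""
    return (max(data) - min(data)) // num_classes + 1

def frequency_table(data, num_classes):
    """Steps 3-5: build classes starting at the lowest value and count each one."""
    width = class_width(data, num_classes)
    lowest = min(data)
    table = []
    for k in range(num_classes):
        lower = lowest + k * width
        upper = lower + width - 1  # inclusive upper class limit
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append(((lower, upper), frequency))
    return table

minutes = [102, 124, 108, 86, 103, 82,
           71, 104, 112, 118, 87, 95,
           103, 116, 85, 122, 87, 100,
           105, 97, 107, 67, 78, 125,
           109, 99, 105, 99, 101, 92]

# Step 1 guide: k = 1 + 3.3 log10(n), a value between 5 and 6 for n = 30
suggested_k = 1 + 3.3 * math.log10(len(minutes))

for (lower, upper), frequency in frequency_table(minutes, 5):
    print(f"{lower}-{upper}: {frequency}")
```

With 5 classes, the sketch produces the same classes and frequencies as the table in the example.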
Example A:
Suppose a researcher wished to do a study on the ages of the top 50 wealthiest people in the
world. The raw data collected was listed below:
57 90 81 73 61
59 57 69 65 60
56 85 78 68 85
85 81 61 69 52
81 43 43 37 78
82 68 67 64 48
56 49 79 77 65
40 69 80 59 54
71 76 69 61 74
83 35 74 87 49
Construct the frequency distribution with class limits 35-41, 42-48, and so on.
Solution:
Number of classes = 8
Highest value = 90
Lowest value = 35
Size of class $= \frac{90 - 35}{8} = 6.875 \approx 7$
Frequency Table:
Age      Tally        Frequency
35-41    ///          3
42-48    ///          3
49-55    ////         4
56-62    //// ////    10
63-69    //// ////    10
70-76    ////         5
77-83    //// ////    10
84-90    ////         5
                      Total=50
1.2 MEASURE OF CENTRAL TENDENCY
Measures of central tendency include the mean, median and mode. Each is a single number that
represents a typical value for a collection of data.
1.2.1 Mean
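The mean is the sum of the values, divided by the total number of values.
Mean for Ungrouped Data
Sample mean, $\bar{x}$ (Roman letter):
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.
Population mean, $\mu$ (Greek letter):
$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.
Mean for Grouped Data
Sample mean:
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$, where $f_i$ is the frequency of the class
and $x_i$ is the class midpoint.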
Example B:
The data represent the number of days off per year for a sample of individuals selected from
seven different countries. Find the mean:
15 21 16 17 25 30 27
Solution:
Mean, $\bar{x} = \frac{15 + 21 + 16 + 17 + 25 + 30 + 27}{7} = \frac{151}{7} = 21.57$
The mean of the number of days off is 21.57 days.
Example C:
Find the mean of the Science Test marks for the selected 10 students in Class Colorful below.
40 51 60.5 45 46
53 59 44 35 53
Solution:
Mean, $\bar{x} = \frac{40 + 51 + 60.5 + 45 + 46 + 53 + 59 + 44 + 35 + 53}{10} = \frac{486.5}{10} = 48.65$
The mean of the Science Test marks is 48.65.
Example D:
The number of students in each of the four Bachelor of Technology programmes was listed below
by an educational officer from UTHM. Find the mean.
Multimedia : 140 Web Technology : 165
Internet Security : 210 Software Engineering : 200
Solution:
Mean, $\mu = \frac{140 + 165 + 210 + 200}{4} = \frac{715}{4} = 178.75$
The mean number of students per Bachelor of Technology programme is 178.75.
Example E:
A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon). Find the mean, $\bar{x}$.
Fuel efficiency (in miles per gallon)    Frequency
8-12                                     3
13-17                                    5
18-22                                    15
23-27                                    5
28-32                                    2
Solution:
Class    Class midpoint, $x_i$    Frequency, $f_i$    $f_i x_i$
8-12     10                       3                   30
13-17    15                       5                   75
18-22    20                       15                  300
23-27    25                       5                   125
28-32    30                       2                   60
Total                             $\sum f_i = 30$     $\sum f_i x_i = 590$
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{590}{30} = 19.67$
The mean fuel efficiency is 19.67 miles per gallon.
Example F:
Find the mean number of repetitions of the word "knowledge" per page in a story book.
Repetitions of the word "knowledge"    Number of pages
10                                     14
15                                     16
23                                     23
40                                     33
41                                     11
45                                     3
Solution:
Value, $x_i$    Frequency, $f_i$    $f_i x_i$
10              14                  140
15              16                  240
23              23                  529
40              33                  1320
41              11                  451
45              3                   135
Total           $\sum f_i = 100$    $\sum f_i x_i = 2815$
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{2815}{100} = 28.15$
On average, the word "knowledge" is repeated 28.15 times per page.
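The two mean formulas can be checked with a short Python sketch. The function names `mean` and `grouped_mean` are illustrative, and the data are taken from Example B and Example E.

```python
def mean(values):
    """Mean for ungrouped data: sum of the values / number of values."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Mean for grouped data: sum of f*x / sum of f."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

# Example B: days off per year in seven countries
days_off = [15, 21, 16, 17, 25, 30, 27]
print(round(mean(days_off), 2))

# Example E: fuel efficiency classes and their frequencies
classes = [(8, 12), (13, 17), (18, 22), (23, 27), (28, 32)]
freqs = [3, 5, 15, 5, 2]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
print(round(grouped_mean(midpoints, freqs), 2))
```

The class midpoint is computed as the average of the lower and upper class limits, which gives 10, 15, 20, 25 and 30 for Example E.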
1.2.2 Median
The median is the most centrally located (middle) value.
Median for Ungrouped Data
-Arrange the data from the lowest to the highest value.
-The middle value is the median.
-If the number of values is even, the median is the average of the two middle values.
Median for Grouped Data
Median for grouped data, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right)$
Where
$n$ = sample size
$L_m$ = lower class boundary of the median class
$C$ = size of the median class
$F$ = cumulative frequency before the median class
$f_m$ = frequency of the median class
Example G:
The number of rooms in the eight hotels in Malaysia is listed below. Find the median.
650 450 700 550 655 350 400 500
Solution:
Arrange the data in increasing order:
350 400 450 500 550 650 655 700
Select the middle values. There are eight values, so the two middle values are 500 and 550.
Median, $M = \frac{500 + 550}{2} = \frac{1050}{2} = 525$
The median for the number of rooms in the eight hotels in Malaysia is 525 rooms.
Example H:
The number of children with asthma during a specific year in seven zones in Johor is shown.
Find the median.
253 125 328 417 201 70 90
Solution:
Arrange the data in increasing order:
70 90 125 201 253 328 417
The middle (fourth) value is 201.
The median for the number of children with asthma during a specific year in seven zones in Johor is 201.
Example I:
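Six customers purchased these numbers of books in a month. Find the median.
7 1 3 2 8 13
Solution:
Arrange the data in increasing order:
1 2 3 7 8 13
The two middle values are 3 and 7.
Median, $M = \frac{3 + 7}{2} = \frac{10}{2} = 5$
The median for the number of books that six customers purchased in a month is 5.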
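The procedure for the median of ungrouped data can be sketched in Python as follows. The function name `median` is illustrative; the data come from Examples G, H and I.

```python
def median(values):
    """Sort the data, then take the middle value, or the average of
    the two middle values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

rooms = [650, 450, 700, 550, 655, 350, 400, 500]   # Example G (even count)
asthma = [253, 125, 328, 417, 201, 70, 90]         # Example H (odd count)
books = [7, 1, 3, 2, 8, 13]                        # Example I (even count)

print(median(rooms), median(asthma), median(books))
```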
Example J:
Find the median for the data in the following frequency table.
Class      Frequency
30-39      4
40-49      12
50-59      17
60-69      9
70-79      7
80-89      4
90-99      2
100-109    1
Solution:
Class      Frequency, f    Cumulative frequency, F
30-39      4               4
40-49      12              16
50-59      17              33
60-69      9               42
70-79      7               49
80-89      4               53
90-99      2               55
100-109    1               56
           $\sum f = 56$
Sample size (total frequency), $n = 56$. The middle of the data is at $\frac{n}{2} = 28$.
This value is located in the third interval, 50-59; we call this interval the median class.
Lower boundary of the median class, $L_m = \frac{49 + 50}{2} = 49.5$
Size of class, $C = 40 - 30 = 10$
Cumulative frequency before the median class, $F = 16$
Frequency of the median class, $f_m = 17$
Median, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right) = 49.5 + 10\left(\frac{28 - 16}{17}\right) = 56.56$
Example K:
A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon).
Fuel efficiency (in miles per gallon) Frequency
8-12 3
13-17 5
18-22 15
23-27 5
28-32 2
Find the median.
Solution:
Class    Frequency, f    Cumulative frequency, F
8-12     3               3
13-17    5               8
18-22    15              23
23-27    5               28
28-32    2               30
         $\sum f = 30$
$n = 30$, $\frac{n}{2} = 15$. Median class: 18-22 (third interval)
$L_m = \frac{17 + 18}{2} = 17.5$
$C = 13 - 8 = 5$
$F = 8$
$f_m = 15$
Median, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right) = 17.5 + 5\left(\frac{15 - 8}{15}\right) = 19.83$
Example L:
A random sample of 35 states shows the number of specialty coffee shops for a specific
company. Find the median.
Class boundaries Frequency
0.5-19.5 13
19.5-38.5 8
38.5-57.5 6
57.5-76.5 5
76.5-95.5 3
Solution:
Class boundaries    Frequency, f    Cumulative frequency, F
0.5-19.5            13              13
19.5-38.5           8               21
38.5-57.5           6               27
57.5-76.5           5               32
76.5-95.5           3               35
                    $\sum f = 35$
$n = 35$, $\frac{n}{2} = 17.5$. Median class: 19.5-38.5 (second class)
$L_m = 19.5$
$C = 19$
$F = 13$
$f_m = 8$
Median, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right) = 19.5 + 19\left(\frac{17.5 - 13}{8}\right) = 30.19$
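The grouped-data median calculation used in Examples J, K and L can be sketched in Python. This is a minimal sketch: the function names are illustrative, and `limits_to_boundaries` assumes whole-number class limits that are one unit apart (such as 30-39, 40-49).

```python
def limits_to_boundaries(class_limits):
    """Turn class limits such as (50, 59) into boundaries (49.5, 59.5).
    Assumes whole-number limits with a gap of 1 between classes."""
    return [(lower - 0.5, upper + 0.5) for lower, upper in class_limits]

def grouped_median(boundaries, freqs):
    """Median = lower boundary + class size * ((n/2 - F) / f_m), where F is
    the cumulative frequency before the median class."""
    half = sum(freqs) / 2
    cumulative = 0
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= half:          # this is the median class
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f
    raise ValueError("frequency table is empty")

# Example J
limits_j = [(30, 39), (40, 49), (50, 59), (60, 69),
            (70, 79), (80, 89), (90, 99), (100, 109)]
freqs_j = [4, 12, 17, 9, 7, 4, 2, 1]
print(round(grouped_median(limits_to_boundaries(limits_j), freqs_j), 2))

# Example L (class boundaries are already given)
bounds_l = [(0.5, 19.5), (19.5, 38.5), (38.5, 57.5), (57.5, 76.5), (76.5, 95.5)]
freqs_l = [13, 8, 6, 5, 3]
print(round(grouped_median(bounds_l, freqs_l), 2))
```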
1.2.3 Mode
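The third measure of central tendency is the mode. The mode is the value that occurs most
often in the data set. It is sometimes said to be the most typical case.
Mode for Ungrouped Data
The mode is the value that occurs most frequently.
Mode for Grouped Data
Mode for grouped data, $M_0 = L + C\left(\frac{d_1}{d_1 + d_2}\right)$
Where
$L$ = lower boundary of the mode class
$C$ = size of class
$d_1$ = difference in frequency between the mode class and the class before it
$d_2$ = difference in frequency between the mode class and the class after it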
Example N:
Find the mode for the number of branches that six banks have:
300, 324, 400, 202, 227, 198
Solution:
Mode = no mode
Example O:
Find the mode of the number of books purchased by twelve customers in a month;
7 7 4 4 4 9 7 7 4 7 4 9
Solution:
Both 4 and 7 occur five times, more often than any other value, so the data set is bimodal.
Mode = 4 and 7
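Finding the mode of ungrouped data amounts to counting how often each value occurs. The sketch below uses Python's `collections.Counter`; the function name `modes` is illustrative, and returning an empty list when every value occurs only once is a convention chosen here to match Example N.

```python
from collections import Counter

def modes(values):
    """Return the value(s) with the highest count, or an empty list
    when every value occurs exactly once (no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

branches = [300, 324, 400, 202, 227, 198]            # Example N
books = [7, 7, 4, 4, 4, 9, 7, 7, 4, 7, 4, 9]         # Example O

print(modes(branches))
print(modes(books))
```

For the Example O data, both 4 and 7 occur five times, so the function returns two values.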
Example P:
Find the mode for the data in the following frequency table.
Class      Frequency
30-39      4
40-49      12
50-59      17
60-69      9
70-79      7
80-89      4
90-99      2
100-109    1
Solution:
Class      Frequency, f
30-39      4
40-49      12
50-59      17
60-69      9
70-79      7
80-89      4
90-99      2
100-109    1
           $\sum f = 56$
The greatest frequency is 17, which occurs at class 50-59.
Lower boundary of the mode class, $L = \frac{49 + 50}{2} = 49.5$
Size of class, $C = 10$
Difference in frequency between the mode class and the class before it, $d_1 = 17 - 12 = 5$
Difference in frequency between the mode class and the class after it, $d_2 = 17 - 9 = 8$
Mode, $M_0 = 49.5 + 10\left(\frac{5}{5 + 8}\right) = 53.35$
Example Q:
A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon).
Fuel efficiency (in miles per gallon) Frequency
8-12 3
13-17 5
18-22 15
23-27 5
28-32 2
Find the mode.
Solution:
Class    Frequency, f
8-12     3
13-17    5
18-22    15
23-27    5
28-32    2
         $\sum f = 30$
The greatest frequency is 15, which occurs at class 18-22.
Lower boundary of the mode class, $L = \frac{17 + 18}{2} = 17.5$
Size of class, $C = 5$
Difference in frequency between the mode class and the class before it, $d_1 = 15 - 5 = 10$
Difference in frequency between the mode class and the class after it, $d_2 = 15 - 5 = 10$
Mode, $M_0 = 17.5 + 5\left(\frac{10}{10 + 10}\right) = 20$
Example R:
A random sample of 35 states shows the number of specialty coffee shops for a specific
company.
Class boundaries Frequency
0.5-19.5 13
19.5-38.5 8
38.5-57.5 6
57.5-76.5 5
76.5-95.5 3
Find the mode.
Solution:
Class boundaries    Frequency, f
0.5-19.5            13
19.5-38.5           8
38.5-57.5           6
57.5-76.5           5
76.5-95.5           3
                    $\sum f = 35$
The greatest frequency is 13, which occurs at class boundaries 0.5-19.5.
Lower boundary of the mode class, $L = 0.5$
Size of class, $C = 19$
Difference in frequency between the mode class and the class before it, $d_1 = 13 - 0 = 13$
Difference in frequency between the mode class and the class after it, $d_2 = 13 - 8 = 5$
Mode, $M_0 = 0.5 + 19\left(\frac{13}{13 + 5}\right) = 14.22$
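The grouped-data mode calculation in Examples P, Q and R can be sketched in Python. The function name `grouped_mode` is illustrative. As in Example R, a missing neighbouring class is treated as having frequency 0. The sketch assumes a single mode class and does not handle the case where the two differences are both zero.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + C * d1 / (d1 + d2), where d1 and d2 are the frequency
    differences between the mode class and the classes before and after it."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # first class with the greatest frequency
    lower, upper = boundaries[i]
    before = freqs[i - 1] if i > 0 else 0                # no class before: frequency 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0    # no class after: frequency 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lower + (upper - lower) * d1 / (d1 + d2)

# Example Q: class limits 8-12, 13-17, ... written as class boundaries
bounds_q = [(7.5, 12.5), (12.5, 17.5), (17.5, 22.5), (22.5, 27.5), (27.5, 32.5)]
freqs_q = [3, 5, 15, 5, 2]
print(round(grouped_mode(bounds_q, freqs_q), 2))

# Example R
bounds_r = [(0.5, 19.5), (19.5, 38.5), (38.5, 57.5), (57.5, 76.5), (76.5, 95.5)]
freqs_r = [13, 8, 6, 5, 3]
print(round(grouped_mode(bounds_r, freqs_r), 2))
```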
1.3 MEASURE OF DISPERSION
Another important statistic for describing a data set is a measure of dispersion, or spread, in the
data. This measure tells you how far the values in the data set lie from the middle of
the data set.
Three measures are commonly used for the spread or variability of a data set: range, variance
and standard deviation.
1.3.1 Range
Range is the simplest of the three measures.
Range (Ungrouped data)
The range is the highest value minus the lowest value. The symbol $R$ is used for the range.
$R$ = highest value - lowest value
Example S:
Find the range for the data on hand phone usage, for a call and texting (minutes) for eight
students in a month below:
200 70 3000 17 500 205 75 53
Solution:
Range = 3000 - 17
= 2983
Example T:
The salaries of the ABC Company's staff are listed below. Find the range.
Staff Salary (RM)
CEO 100,000
Manager 40,000
Sales Representative 30,000
Workers 1 25,000
Workers 2 15,000
Workers 3 10,000
Solution: Range = 100,000 - 10,000 = 90,000
1.3.2 Variance and Standard Deviation
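Variance is the average of the squares of the distance of each value from the mean. The symbol for
the population variance is $\sigma^2$ (sigma squared), whereas the symbol for the sample variance is $s^2$.
Standard deviation measures how far the values typically lie from the mean. It is the square root of the
variance. The symbol for the population standard deviation is $\sigma$ (sigma), and for the sample
standard deviation it is $s$.

Formulas

(A) Sample, ungrouped data
Variance:
i) $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
ii) $s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$
Standard deviation: $s = \sqrt{s^2}$

(B) Sample, grouped data
Variance:
i) $s^2 = \frac{1}{\sum f - 1} \sum_{i=1}^{n} f_i (x_i - \bar{x})^2$
ii) $s^2 = \frac{1}{\sum f - 1} \left[ \sum_{i=1}^{n} f_i x_i^2 - \frac{\left(\sum_{i=1}^{n} f_i x_i\right)^2}{\sum f} \right]$
Standard deviation: $s = \sqrt{s^2}$

(C) Population
Variance:
i) $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
ii) $\sigma^2 = \frac{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2}$
Standard deviation: $\sigma = \sqrt{\sigma^2}$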
Example U:
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will
last before fading (in months). The testing lab makes 4 gallons of each paint to test. Since
different chemical agents are added to each brand and only four cans are involved, these two
brands constitute small populations. Find the variance and standard deviation of each brand and
make a conclusion.
Brand ABC: 12 35 37 26
Brand XYZ: 17 32 37 24
Solution:
The phrase "these two brands constitute small populations" shows that the data given are
population data.
Using Formula C
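Brand ABC
$\mu = \frac{12 + 35 + 37 + 26}{4} = \frac{110}{4} = 27.5$

Formula C(i)
$x_i$    $x_i - \mu$            $(x_i - \mu)^2$
12       12 - 27.5 = -15.5      240.25
35       7.5                    56.25
37       9.5                    90.25
26       -1.5                   2.25
Total                           $\sum (x_i - \mu)^2 = 389$
$N = 4$. Therefore,
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{389}{4} = 97.25$
$\sigma = \sqrt{\sigma^2} = \sqrt{97.25} = 9.86$
Variance, $\sigma^2 = 97.25$ and standard deviation, $\sigma = 9.86$

Formula C(ii)
$x_i$    $x_i^2$
12       144
35       1225
37       1369
26       676
Total    $\sum x_i = 110$, $\sum x_i^2 = 3414$
$N = 4$. Therefore,
$\sigma^2 = \frac{N \sum x_i^2 - \left(\sum x_i\right)^2}{N^2} = \frac{4(3414) - 110^2}{4^2} = 97.25$
$\sigma = \sqrt{\sigma^2} = \sqrt{97.25} = 9.86$
Variance, $\sigma^2 = 97.25$ and standard deviation, $\sigma = 9.86$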
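Brand XYZ
$\mu = \frac{17 + 32 + 37 + 24}{4} = \frac{110}{4} = 27.5$

Formula C(i)
$x_i$    $x_i - \mu$            $(x_i - \mu)^2$
17       17 - 27.5 = -10.5      110.25
32       4.5                    20.25
37       9.5                    90.25
24       -3.5                   12.25
Total                           $\sum (x_i - \mu)^2 = 233$
$N = 4$. Therefore,
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{233}{4} = 58.25$
$\sigma = \sqrt{\sigma^2} = \sqrt{58.25} = 7.63$
Variance, $\sigma^2 = 58.25$ and standard deviation, $\sigma = 7.63$

Formula C(ii)
$x_i$    $x_i^2$
17       289
32       1024
37       1369
24       576
Total    $\sum x_i = 110$, $\sum x_i^2 = 3258$
$N = 4$. Therefore,
$\sigma^2 = \frac{N \sum x_i^2 - \left(\sum x_i\right)^2}{N^2} = \frac{4(3258) - 110^2}{4^2} = 58.25$
$\sigma = \sqrt{\sigma^2} = \sqrt{58.25} = 7.63$
Variance, $\sigma^2 = 58.25$ and standard deviation, $\sigma = 7.63$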
Conclusion
Brand ABC : Variance, $\sigma^2 = 97.25$ and standard deviation, $\sigma = 9.86$
Brand XYZ : Variance, $\sigma^2 = 58.25$ and standard deviation, $\sigma = 7.63$
When the means are equal, the larger the variance or standard deviation, the more variable the data are. Since the standard
deviation of Brand ABC is 9.86 and the standard deviation of Brand XYZ is 7.63, the data are more variable for Brand ABC.
The data for Brand ABC are spread more widely than the data for Brand XYZ.
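The two population-variance formulas can be compared with a short Python sketch. The function names are illustrative; the data are the two paint brands from Example U.

```python
import math

def population_variance(values):
    """Formula C(i): average squared distance from the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_variance_shortcut(values):
    """Formula C(ii): (N * sum of x^2 - (sum of x)^2) / N^2."""
    n = len(values)
    return (n * sum(x * x for x in values) - sum(values) ** 2) / n ** 2

brand_abc = [12, 35, 37, 26]
brand_xyz = [17, 32, 37, 24]

for name, data in (("ABC", brand_abc), ("XYZ", brand_xyz)):
    variance = population_variance(data)
    print(name, variance, round(math.sqrt(variance), 2))
```

Both functions return the same variance for each brand, and Brand ABC has the larger standard deviation.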
Example V:
A sample of six students in UTHM was selected by the accountant to examine their pocket
money in a month. Find the variance and standard deviation of the pocket money listed below.
Student Pocket Money (RM)
A 400
B 300
C 250
D 150
E 110
F 100
Solution:
Using Formula A
$\bar{x} = \frac{400 + 300 + 250 + 150 + 110 + 100}{6} = \frac{1310}{6} = 218.33$

Formula A(i)
$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
400      181.67             33003.99
300      81.67              6669.989
250      31.67              1002.989
150      -68.33             4668.989
110      -108.33            11735.39
100      -118.33            14001.99
Total                       $\sum (x_i - \bar{x})^2 = 71083.33$
$n = 6$. Therefore,
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{71083.33}{5} = 14216.67$
$s = \sqrt{s^2} = \sqrt{14216.67} = 119.23$
Variance, $s^2 = 14216.67$ and standard deviation, $s = 119.23$

Formula A(ii)
$x_i$    $x_i^2$
400      160000
300      90000
250      62500
150      22500
110      12100
100      10000
Total    $\sum x_i = 1310$, $\sum x_i^2 = 357100$
$n = 6$. Therefore,
$s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)} = \frac{6(357100) - 1310^2}{6(6 - 1)} = 14216.67$
$s = \sqrt{s^2} = \sqrt{14216.67} = 119.23$
Variance, $s^2 = 14216.67$ and standard deviation, $s = 119.23$
Example W:
A sample of seven zones in Johor was selected by the Ministry of Health to check the number of
children with asthma during a specific year in each zone. Find $s^2$ and $s$.
253 125 328 417 201 70 90
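Solution:
$\bar{x} = \frac{253 + 125 + 328 + 417 + 201 + 70 + 90}{7} = \frac{1484}{7} = 212$

Formula A(i)
$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
253      41                 1681
125      -87                7569
328      116                13456
417      205                42025
201      -11                121
70       -142               20164
90       -122               14884
Total                       $\sum (x_i - \bar{x})^2 = 99900$
$n = 7$. Therefore,
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{99900}{6} = 16650$
$s = \sqrt{s^2} = \sqrt{16650} = 129.03$
Variance, $s^2 = 16650$ and standard deviation, $s = 129.03$

Formula A(ii)
$x_i$    $x_i^2$
253      64009
125      15625
328      107584
417      173889
201      40401
70       4900
90       8100
Total    $\sum x_i = 1484$, $\sum x_i^2 = 414508$
$n = 7$. Therefore,
$s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)} = \frac{7(414508) - 1484^2}{7(7 - 1)} = 16650$
$s = \sqrt{s^2} = \sqrt{16650} = 129.03$
Variance, $s^2 = 16650$ and standard deviation, $s = 129.03$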
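The two sample-variance formulas for ungrouped data can be checked with a short Python sketch. The function names are illustrative; the data are from Example V and Example W.

```python
import math

def sample_variance(values):
    """Formula A(i): sum of squared distances from the sample mean / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Formula A(ii): (n * sum of x^2 - (sum of x)^2) / (n * (n - 1))."""
    n = len(values)
    return (n * sum(x * x for x in values) - sum(values) ** 2) / (n * (n - 1))

pocket_money = [400, 300, 250, 150, 110, 100]     # Example V
asthma = [253, 125, 328, 417, 201, 70, 90]        # Example W

for data in (pocket_money, asthma):
    variance = sample_variance(data)
    print(round(variance, 2), round(math.sqrt(variance), 2))
```

The shortcut form avoids rounding the mean first, which is why it is convenient when the mean is not a whole number, as in Example V.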
Example X:
A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon).
Fuel efficiency (in miles per gallon) Frequency
8-12 3
13-17 5
18-22 15
23-27 5
28-32 2
Find the variance and standard deviation.
Solution:
Using Formula B

Formula B(i)
$x_i$    $f_i$    $f_i x_i$    $(x_i - \bar{x})^2$    $f_i (x_i - \bar{x})^2$
10       3        30           93.51                  280.53
15       5        75           21.81                  109.04
20       15       300          0.11                   1.63
25       5        125          28.41                  142.04
30       2        60           106.71                 213.42
Total    30       590                                 746.66
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{590}{30} = 19.67$
$s^2 = \frac{1}{\sum f - 1} \sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{1}{30 - 1}(746.66) = 25.75$
$s = \sqrt{s^2} = \sqrt{25.75} = 5.07$
Variance, $s^2 = 25.75$ and standard deviation, $s = 5.07$

Formula B(ii)
$x_i$    $f_i$    $f_i x_i$    $x_i^2$    $f_i x_i^2$
10       3        30           100        300
15       5        75           225        1125
20       15       300          400        6000
25       5        125          625        3125
30       2        60           900        1800
Total    30       590                     12350
$s^2 = \frac{1}{\sum f - 1} \left[ \sum_{i=1}^{n} f_i x_i^2 - \frac{\left(\sum_{i=1}^{n} f_i x_i\right)^2}{\sum f} \right] = \frac{1}{29} \left[ 12350 - \frac{590^2}{30} \right] = 25.75$
$s = \sqrt{s^2} = \sqrt{25.75} = 5.07$
Variance, $s^2 = 25.75$ and standard deviation, $s = 5.07$
Both formulas give the same result. You can choose either one.
Example Y:
A random sample of 35 states shows the number of specialty coffee shops for a specific
company.
Class boundaries Frequency
0.5-19.5 13
19.5-38.5 8
38.5-57.5 6
57.5-76.5 5
76.5-95.5 3
Find the variance and standard deviation.
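Solution:
$x_i$    $f_i$    $f_i x_i$    $(x_i - \bar{x})^2$    $f_i (x_i - \bar{x})^2$    $x_i^2$    $f_i x_i^2$
10       13       130          650.76                 8459.88                    100        1300
29       8        232          42.38                  339.04                     841        6728
48       6        288          156                    936                        2304       13824
67       5        335          991.62                 4958.1                     4489       22445
86       3        258          2549.24                7647.72                    7396       22188
Total    35       1243                                22340.74                              66485
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1243}{35} = 35.51$

Formula B(i)
$s^2 = \frac{1}{\sum f - 1} \sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{1}{35 - 1}(22340.74) = 657.08$

Formula B(ii)
$s^2 = \frac{1}{\sum f - 1} \left[ \sum_{i=1}^{n} f_i x_i^2 - \frac{\left(\sum_{i=1}^{n} f_i x_i\right)^2}{\sum f} \right] = \frac{1}{34} \left[ 66485 - \frac{1243^2}{35} \right] = 657.08$

$s = \sqrt{s^2} = \sqrt{657.08} = 25.63$
Variance, $s^2 = 657.08$ and standard deviation, $s = 25.63$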
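The grouped-data variance can be checked with a short Python sketch that follows the shortcut form of Formula B. The function name `grouped_sample_variance` is illustrative; the midpoints and frequencies come from the fuel efficiency and coffee shop examples.

```python
import math

def grouped_sample_variance(midpoints, freqs):
    """Shortcut form: (sum of f*x^2 - (sum of f*x)^2 / sum of f) / (sum of f - 1)."""
    total_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return (sum_fx2 - sum_fx ** 2 / total_f) / (total_f - 1)

# Fuel efficiency: class midpoints and frequencies
fuel_mid = [10, 15, 20, 25, 30]
fuel_freq = [3, 5, 15, 5, 2]

# Coffee shops: class midpoints and frequencies
coffee_mid = [10, 29, 48, 67, 86]
coffee_freq = [13, 8, 6, 5, 3]

for mid, freq in ((fuel_mid, fuel_freq), (coffee_mid, coffee_freq)):
    variance = grouped_sample_variance(mid, freq)
    print(round(variance, 2), round(math.sqrt(variance), 2))
```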