chapter 1 statistics


Nurhana Mohamad. PPD 2 2014/15

CHAPTER 1: STATISTICS (2 hours)

1.0 Introduction

1.1 Organizing the Data

1.2 Measure of Central Tendency

Ungrouped and grouped data: mean, median and mode.

1.3 Measure of Dispersion

Range, variance, and standard deviation.

1.0 INTRODUCTION

What is Statistics?

Statistics is the science that deals with the collection, classification, analysis and interpretation of

information/data in order to make decisions.

Statistics is divided into 2 parts:

1. Descriptive statistics

2. Inferential statistics

Descriptive statistics: the process of gathering, presenting and summarizing data.

Example: A researcher collects data on the amount students spend on food, leisure and academic requirements from their study loan. He then summarizes the data by finding the mean and standard deviation, and presents his findings in graphs and charts.

Inferential statistics: concerned with drawing conclusions or inferences about populations from samples drawn from them.

Example: A researcher carries out an analysis to find out whether it is true that students spend less than 10% of their study loan on textbooks.

Definitions of Statistical Terms

1. Population

A population is the entire collection of objects from which we may collect data.

These could be people, animals, microchips and so on.



Example: A study on the performance of four-wheel drive vehicles has a population consisting of all four-wheel drive vehicles.

2. Sample

A sample is a group of units that is a subset of the population.

Example: 54 four-wheel drive vehicles from six models are selected for the study.

3. Variable

A variable is a particular characteristic of the object being studied. This characteristic can

take on different values as we measure/gather it from one object to another

Example: The variables measured could be fuel consumption, car model and seating capacity.

4. Random

Randomness means unpredictability. One of the requirements of the sampling process is randomness. Hence, the variable being measured is called a random variable.

Example: A sample of 10 students who crossed the school gate between 7.30 am and 7.35 am was randomly selected to be interviewed by the prefect.

5. Data

Data are basically numbers derived from measuring a variable. However, data can also be non-numeric.

Example: Height (153.4 cm, 141 cm) and favorite color (red, blue)

6. Quantitative data (or numerical data)

- consist of numbers that can be used in calculations.

- can be divided into two types: continuous and discrete.

i. Continuous data can take values with decimal places or fractions.

Example: weight, pressure, time, temperature, cost

ii. Discrete data take whole-number values (usually counts).

Example: the number of cars in a parking lot, the number of students in a computer lab



7. Qualitative data (or non-numeric data)

- consist of labels, names or levels.

- can be divided into two types: nominal and ordinal.

i. Nominal data represent labels or names.

Example: the colors of pH paper (red = 1, orange = 2, yellow = 3, blue = 4)

ii. Ordinal data represent levels or order.

Example: the first, second and third place in a competition

8. Parameter

A parameter is a value used to represent a certain population characteristic. Parameters are assigned Greek letters.

Example: mean = $\mu$ and standard deviation = $\sigma$

9. Statistic

A statistic is a number summarizing some aspect of the data, calculated from the data collected from the sample. Statistics are assigned Roman letters.

Example: mean = $\bar{x}$ and standard deviation = $s$

1.1 ORGANIZING THE DATA

Data can be defined as groups of information that represent the qualitative or quantitative

attributes of a variable or set of variables. Data in statistics can be classified into grouped data

and ungrouped data.

Any data that you first gather is ungrouped data. Ungrouped data is raw data. An example of ungrouped data is any list of numbers that you can think of.

Grouped data is data that has been organized into groups known as classes. Grouped

data has been 'classified' and thus some level of data analysis has taken place, which means that

the data is no longer raw.



Example

Ungrouped data: Data on Minutes Spent on the Phone by 30 Respondents

102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92

Grouped data:

Minutes    Frequency
67-78      3
79-90      5
91-102     8
103-114    9
115-126    5

Steps for Organizing Ungrouped Data into Grouped Data

1. Determine how many classes you want to have. A frequency distribution should have at least 5 but no more than 15 classes. You can use the formula $k = 1 + 3.3 \log n$, where $k$ = number of classes and $n$ = total number of data values.

Example: Data on Minutes Spent on the Phone by 30 Respondents

102 124 108 86 103 82
71 104 112 118 87 95
103 116 85 122 87 100
105 97 107 67 78 125
109 99 105 99 101 92

Here $k = 1 + 3.3 \log 30 \approx 5.87$. Let us say we want to have 5 classes.

2. Next, subtract the lowest value in the data set from the highest value in the data set, and then divide by the number of classes that you want to have:

$\text{Size of class} = \frac{\text{highest value} - \text{lowest value}}{\text{number of classes you want to have}}$

Example: Highest value = 125, lowest value = 67

$\text{Size of class} = \frac{125 - 67}{5} = 11.6 \approx 12$ (round up to a whole number)

3. Build a frequency distribution table using this class size.

4. Start the first class with the lowest value.

5. Tally the data and place the results in the frequency column.

Example:

Minutes    Tally         Frequency
67-78      ///           3
79-90      /////         5
91-102     ///// ///     8
103-114    ///// ////    9
115-126    /////         5
Total = 30
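The five steps above can be sketched in Python as follows. This is a minimal sketch that assumes whole-number data and uses the formula $k = 1 + 3.3 \log n$ (Sturges' rule) from step 1 for the suggested number of classes. The function names `sturges_classes` and `frequency_table` are illustrative and do not come from the notes.

```python
import math

def sturges_classes(n):
    """Suggested number of classes, k = 1 + 3.3 log10(n) (step 1)."""
    return 1 + 3.3 * math.log10(n)

def frequency_table(data, num_classes):
    """Return the class size and (lower, upper, frequency) rows for whole-number data."""
    low, high = min(data), max(data)
    # Step 2: (highest - lowest) / number of classes, rounded up to a whole number
    size = math.ceil((high - low) / num_classes)
    if low + num_classes * size - 1 < high:
        size += 1  # an exact division would leave the highest value outside the last class
    rows = []
    for i in range(num_classes):
        lower = low + i * size       # step 4: the first class starts at the lowest value
        upper = lower + size - 1     # whole-number class limits, e.g. 67-78
        count = sum(lower <= x <= upper for x in data)  # step 5: tally each class
        rows.append((lower, upper, count))
    return size, rows

minutes = [102, 124, 108, 86, 103, 82,
           71, 104, 112, 118, 87, 95,
           103, 116, 85, 122, 87, 100,
           105, 97, 107, 67, 78, 125,
           109, 99, 105, 99, 101, 92]

print(round(sturges_classes(len(minutes)), 2))  # 5.87
size, rows = frequency_table(minutes, 5)
print("Size of class:", size)                   # Size of class: 12
for lower, upper, f in rows:
    print(f"{lower}-{upper}: {f}")
```

Running `frequency_table` with 8 classes on the ages in Example A below should give a class size of 7 and the classes 35-41 up to 84-90.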



Example A:

Suppose a researcher wished to do a study on the ages of the top 50 wealthiest people in the

world. The raw data collected was listed below:

57 90 81 73 61

59 57 69 65 60

56 85 78 68 85

85 81 61 69 52

81 43 43 37 78

82 68 67 64 48

56 49 79 77 65

40 69 80 59 54

71 76 69 61 74

83 35 74 87 49

Construct the frequency distribution with class limits 35-41, 42-48, and so on.

Solution:

Number of classes = 8

Highest value = 90

Lowest value = 35

$\text{Size of class} = \frac{90 - 35}{8} = 6.875 \approx 7$

Frequency Table:

Age (years)    Tally          Frequency
35-41          ///            3
42-48          ///            3
49-55          ////           4
56-62          ///// /////    10
63-69          ///// /////    10
70-76          /////          5
77-83          ///// /////    10
84-90          /////          5
Total = 50



1.2 MEASURE OF CENTRAL TENDENCY

Measures of central tendency include the mean, median and mode. Each is a single number that represents a typical (central) value of a data set.

1.2.1 Mean

Mean is the sum of the values, divided by the total number of values.

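Mean for Ungrouped Data

Sample mean, $\bar{x}$ (Roman letter) $= \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.

Population mean, $\mu$ (Greek letter) $= \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.

Mean for Grouped Data

Sample mean, $\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$, where $f_i$ is the frequency of the class and $x_i$ is the class midpoint.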
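To check the two mean formulas, here is a minimal Python sketch applied to the data of Example B and Example E below. The function names are illustrative. For grouped data it assumes whole-number class limits, so each midpoint is (lower limit + upper limit) / 2.

```python
def mean(values):
    """Ungrouped mean: the sum of the values divided by the number of values."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Grouped mean: sum(f_i * x_i) / sum(f_i), where x_i is the class midpoint."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

# Example B: days off per year (ungrouped)
days_off = [15, 21, 16, 17, 25, 30, 27]
print(round(mean(days_off), 2))                   # 21.57

# Example E: fuel efficiency classes and frequencies (grouped)
classes = [(8, 12), (13, 17), (18, 22), (23, 27), (28, 32)]
freqs = [3, 5, 15, 5, 2]
midpoints = [(lo + hi) / 2 for lo, hi in classes]
print(midpoints)                                  # [10.0, 15.0, 20.0, 25.0, 30.0]
print(round(grouped_mean(midpoints, freqs), 2))   # 19.67
```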

Example B:

The data represent the number of days off per year for a sample of individuals selected from

seven different countries. Find the mean:

15 21 16 17 25 30 27

Solution:

Mean, $\bar{x} = \frac{15 + 21 + 16 + 17 + 25 + 30 + 27}{7} = \frac{151}{7} = 21.57$

The mean of the number of days off is 21.57 days.

Example C:

Find the mean of the Science Test marks of the 10 selected students in Class Colorful below.

40 51 60.5 45 46

53 59 44 35 53

Solution:

Mean, $\bar{x} = \frac{40 + 51 + 60.5 + 45 + 46 + 53 + 59 + 44 + 35 + 53}{10} = \frac{486.5}{10} = 48.65$



The mean of the Science Test marks is 48.65.

Example D:

The numbers of students in four Bachelor of Technology programmes, as listed by an education officer from UTHM, are given below. Find the mean.

Multimedia : 140 Web Technology : 165

Internet Security : 210 Software Engineering : 200

Solution:

Mean, $\mu = \frac{140 + 165 + 210 + 200}{4} = \frac{715}{4} = 178.75$

The mean number of students per Bachelor of Technology programme is 178.75.

Example E:

A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon). Find $\bar{x}$.

Fuel efficiency (in miles per gallon)    Frequency
8-12     3
13-17    5
18-22    15
23-27    5
28-32    2

Solution:

Class    Class midpoint, $x_i$    Frequency, $f_i$    $f_i x_i$
8-12     10    3     30
13-17    15    5     75
18-22    20    15    300
23-27    25    5     125
28-32    30    2     60

$\sum f_i = 30$, $\sum f_i x_i = 590$

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{590}{30} = 19.67$

The mean fuel efficiency is 19.67 miles per gallon.



Example F:

Find the mean number of times the word "knowledge" is repeated per page in a story book.

Repetitions of "knowledge" per page    Number of pages
10    14
15    16
23    23
40    33
41    11
45    3

Solution:

Repetitions per page, $x_i$    Number of pages, $f_i$    $f_i x_i$
10    14    140
15    16    240
23    23    529
40    33    1320
41    11    451
45    3     135

$\sum f_i = 100$, $\sum f_i x_i = 2815$

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2815}{100} = 28.15$

On average, the word "knowledge" is repeated 28.15 times per page.



1.2.2 Median

Median is the most centrally located (middle) value.

Median for Ungrouped Data

1. Arrange the data from the lowest to the highest value.

2. The middle value is the median.

3. If the number of data values is even, the median is the average of the two middle values.
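The three rules above translate directly into code. Below is a minimal Python sketch (the function name is illustrative), checked against Examples G, H and I below.

```python
def median(values):
    """Median of ungrouped data."""
    data = sorted(values)          # arrange from the lowest to the highest value
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # odd count: the single middle value
    return (data[mid - 1] + data[mid]) / 2  # even count: average of the two middle values

print(median([650, 450, 700, 550, 655, 350, 400, 500]))  # 525.0
print(median([253, 125, 328, 417, 201, 70, 90]))         # 201
print(median([7, 1, 3, 2, 8, 13]))                       # 5.0
```

Python's standard library also provides `statistics.median`, which follows the same odd/even rule.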

Median for Grouped Data

Median for grouped data, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right)$

where $n$ = sample size, $L_m$ = lower class boundary of the median class, $C$ = size of the median class, $F$ = cumulative frequency before the median class, and $f_m$ = frequency of the median class.
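Here is a minimal Python sketch of the grouped-median formula. It assumes the classes are listed in increasing order. Where whole-number class limits are given, it assumes each class boundary lies 0.5 outside the limit. The helper names are illustrative.

```python
def boundaries_from_limits(limits):
    """Convert whole-number class limits such as (30, 39) to boundaries (29.5, 39.5)."""
    return [(lo - 0.5, hi + 0.5) for lo, hi in limits]

def grouped_median(boundaries, freqs):
    """M = L_m + C * (n/2 - F) / f_m for classes given in increasing order."""
    half = sum(freqs) / 2            # n/2
    cumulative = 0                   # F: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= half:   # median class: first class whose cumulative frequency reaches n/2
            size = upper - lower     # C: size of the median class
            return lower + size * (half - cumulative) / f
        cumulative += f
    raise ValueError("frequency table is empty")

# Example J
limits = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]
freqs = [4, 12, 17, 9, 7, 4, 2, 1]
print(round(grouped_median(boundaries_from_limits(limits), freqs), 2))  # 56.56

# Example L (class boundaries are given directly)
coffee = [(0.5, 19.5), (19.5, 38.5), (38.5, 57.5), (57.5, 76.5), (76.5, 95.5)]
print(round(grouped_median(coffee, [13, 8, 6, 5, 3]), 2))              # 30.19
```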

Example G:

The numbers of rooms in eight hotels in Malaysia are listed below. Find the median.

650 450 700 550 655 350 400 500

Solution:

Arrange the data in increasing order:

350 400 450 500 550 650 655 700

There are 8 values (an even number), so the median is the average of the two middle values, 500 and 550:

Median, $M = \frac{500 + 550}{2} = \frac{1050}{2} = 525$

The median number of rooms in the eight hotels in Malaysia is 525 rooms.

Example H:

The numbers of children with asthma during a specific year in seven zones in Johor are shown below. Find the median.

253 125 328 417 201 70 90

Solution:

Arrange the data in increasing order:

70 90 125 201 253 328 417

There are 7 values (an odd number), so the median is the middle (4th) value. The median number of children with asthma during that year in the seven zones in Johor is 201.

Example I:

Six customers purchased these numbers of books in a month. Find the median.

7 1 3 2 8 13

Solution:

Arrange the data in increasing order:

1 2 3 7 8 13

The two middle values are 3 and 7, so Median, $M = \frac{3 + 7}{2} = \frac{10}{2} = 5$

The median number of books purchased by the six customers in a month is 5.

Example J:

Find the median for the data in the following frequency table.

Class      Frequency
30-39      4
40-49      12
50-59      17
60-69      9
70-79      7
80-89      4
90-99      2
100-109    1



Solution:

Class      Frequency, $f$    Cumulative frequency, $F$
30-39      4     4
40-49      12    16
50-59      17    33
60-69      9     42
70-79      7     49
80-89      4     53
90-99      2     55
100-109    1     56

$\sum f = 56$

Sample size (total frequency), $n = 56$. The middle of the data is at $\frac{n}{2} = 28$.

The 28th value lies in the third interval, 50-59, which is the first class whose cumulative frequency reaches 28. We call this interval the median class.

Lower boundary of the median class, $L_m = \frac{49 + 50}{2} = 49.5$

Size of class, $C = 40 - 30 = 10$

Cumulative frequency before the median class, $F = 16$

Frequency of the median class, $f_m = 17$

Median, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right) = 49.5 + 10\left(\frac{28 - 16}{17}\right) = 56.56$

Example K:

A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon).

Fuel efficiency (in miles per gallon) Frequency

8-12 3

13-17 5

18-22 15

23-27 5

28-32 2

Find the median.

Solution:



Class    Frequency, $f$    Cumulative frequency, $F$
8-12     3     3
13-17    5     8
18-22    15    23
23-27    5     28
28-32    2     30

$\sum f = 30$, $n = 30$, $\frac{n}{2} = 15$, median class: 18-22 (third interval)

$L_m = \frac{17 + 18}{2} = 17.5$, $C = 13 - 8 = 5$, $F = 8$, $f_m = 15$

Median, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right) = 17.5 + 5\left(\frac{15 - 8}{15}\right) = 19.83$

Example L:

A random sample of 35 states shows the number of specialty coffee shops for a specific

company. Find the median.

Class boundaries Frequency

0.5-19.5 13

19.5-38.5 8

38.5-57.5 6

57.5-76.5 5

76.5-95.5 3

Solution:

Class boundaries    Frequency, $f$    Cumulative frequency, $F$
0.5-19.5     13    13
19.5-38.5    8     21
38.5-57.5    6     27
57.5-76.5    5     32
76.5-95.5    3     35

$\sum f = 35$, $n = 35$, $\frac{n}{2} = 17.5$, median class: 19.5-38.5 (second class)

$L_m = 19.5$, $C = 38.5 - 19.5 = 19$, $F = 13$, $f_m = 8$

Median, $M = L_m + C\left(\frac{\frac{n}{2} - F}{f_m}\right) = 19.5 + 19\left(\frac{17.5 - 13}{8}\right) = 30.19$

1.2.3 Mode

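The third measure of central tendency is the mode. The mode is the value that occurs most often in the data set. It is sometimes said to be the most typical case.

Mode for Ungrouped Data

The mode is the value that occurs most frequently. A data set may have no mode (no value is repeated), one mode, or more than one mode (for example, bimodal data have two modes).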
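Below is a minimal Python sketch of the ungrouped mode, using `collections.Counter` from the standard library. It follows the convention in these notes that a data set in which no value repeats has no mode (returned here as an empty list). It also returns every value tied for the highest count, so bimodal data give two modes. The function name is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often; [] if no value is repeated."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                     # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([300, 324, 400, 202, 227, 198]))        # []  (Example N: no mode)
print(modes([7, 7, 4, 4, 4, 9, 7, 7, 4, 7, 4, 9]))  # [4, 7]  (Example O: bimodal)
```

The standard library's `statistics.multimode` (Python 3.8+) also returns all tied values. However, when no value repeats it returns every value rather than an empty list, so the "no mode" case must be handled separately.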

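Mode for Grouped Data

Mode for grouped data, $M_o = L + C\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$

where $L$ = lower boundary of the modal class (the class with the greatest frequency), $C$ = size of the class, $\Delta_1$ = difference between the frequency of the modal class and that of the class before it, and $\Delta_2$ = difference between the frequency of the modal class and that of the class after it.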
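Here is a minimal Python sketch of the grouped-mode formula. It assumes a single modal class; if several classes tie, the first is used. Following Example R below, a missing neighbouring class is treated as having frequency 0; applying the same rule after the last class is an assumption. The helper names are illustrative.

```python
def boundaries_from_limits(limits):
    """Convert whole-number class limits such as (50, 59) to boundaries (49.5, 59.5)."""
    return [(lo - 0.5, hi + 0.5) for lo, hi in limits]

def grouped_mode(boundaries, freqs):
    """Mo = L + C * d1 / (d1 + d2), using the class with the greatest frequency."""
    i = freqs.index(max(freqs))                        # modal class (the first one if tied)
    lower, upper = boundaries[i]
    before = freqs[i - 1] if i > 0 else 0              # no class before the first: frequency 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0  # no class after the last: frequency 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lower + (upper - lower) * d1 / (d1 + d2)

# Example P
limits = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]
print(round(grouped_mode(boundaries_from_limits(limits), [4, 12, 17, 9, 7, 4, 2, 1]), 2))  # 53.35

# Example R (class boundaries are given directly)
coffee = [(0.5, 19.5), (19.5, 38.5), (38.5, 57.5), (57.5, 76.5), (76.5, 95.5)]
print(round(grouped_mode(coffee, [13, 8, 6, 5, 3]), 2))  # 14.22
```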



Example N:

Find the mode for the number of branches that six banks have:

300, 324, 400, 202, 227, 198

Solution:

Each value occurs only once, so there is no mode.

Example O:

Find the mode of the number of books purchased by twelve customers in a month:

7 7 4 4 4 9 7 7 4 7 4 9

Solution:

The values 4 and 7 each occur 5 times (9 occurs only twice), so Mode = 4 and 7. The data set is bimodal.

Example P:

Find the mode for the data in the following frequency table.

Class      Frequency
30-39      4
40-49      12
50-59      17
60-69      9
70-79      7
80-89      4
90-99      2
100-109    1

Solution:



Class      Frequency, $f$
30-39      4
40-49      12
50-59      17
60-69      9
70-79      7
80-89      4
90-99      2
100-109    1

$\sum f = 56$

The greatest frequency is 17, which occurs in class 50-59, so this is the modal class.

Lower boundary of the modal class, $L = \frac{49 + 50}{2} = 49.5$

Size of class, $C = 10$

Difference between the frequency of the modal class and the class before it, $\Delta_1 = 17 - 12 = 5$

Difference between the frequency of the modal class and the class after it, $\Delta_2 = 17 - 9 = 8$

Mode, $M_o = 49.5 + 10\left(\frac{5}{5 + 8}\right) = 53.35$

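Example Q:

A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon).

Fuel efficiency (in miles per gallon)    Frequency
8-12     3
13-17    5
18-22    15
23-27    5
28-32    2

Find the mode.

Solution:

Class    Frequency, $f$
8-12     3
13-17    5
18-22    15
23-27    5
28-32    2

$\sum f = 30$

The greatest frequency is 15, which occurs in class 18-22, so this is the modal class.

Lower boundary of the modal class, $L = \frac{17 + 18}{2} = 17.5$

Size of class, $C = 5$

Difference between the frequency of the modal class and the class before it, $\Delta_1 = 15 - 5 = 10$

Difference between the frequency of the modal class and the class after it, $\Delta_2 = 15 - 5 = 10$

Mode, $M_o = 17.5 + 5\left(\frac{10}{10 + 10}\right) = 20$

Example R: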


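A random sample of 35 states shows the number of specialty coffee shops for a specific company.

Class boundaries    Frequency
0.5-19.5     13
19.5-38.5    8
38.5-57.5    6
57.5-76.5    5
76.5-95.5    3

Find the mode.

Solution:

Class boundaries    Frequency, $f$
0.5-19.5     13
19.5-38.5    8
38.5-57.5    6
57.5-76.5    5
76.5-95.5    3

$\sum f = 35$

The greatest frequency is 13, which occurs in the class with boundaries 0.5-19.5, so this is the modal class.

Lower boundary of the modal class, $L = 0.5$

Size of class, $C = 19$

Difference between the frequency of the modal class and the class before it, $\Delta_1 = 13 - 0 = 13$ (there is no class before it, so its frequency is taken as 0)

Difference between the frequency of the modal class and the class after it, $\Delta_2 = 13 - 8 = 5$

Mode, $M_o = 0.5 + 19\left(\frac{13}{13 + 5}\right) = 14.22$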



1.3 MEASURE OF DISPERSION

Another important statistic for describing a data set is a measure of dispersion, or spread, of the data. This measure tells you how much the values in the data set differ from the middle of the data set.

Three measures of the spread or variability of a data set are commonly used: range, variance and standard deviation.

1.3.1 Range

Range is the simplest of the three measures.

Range (Ungrouped Data)

The range is the highest value minus the lowest value. The symbol $R$ is used for the range.

$R = \text{highest value} - \text{lowest value}$

Example S:

Find the range of the hand phone usage for calls and texting (in minutes) of eight students in a month:

200 70 3000 17 500 205 75 53

Solution:

Range = 3000 - 17

= 2983

Example T:

The salaries of ABC Company's staff are listed below. Find the range.

Staff Salary (RM)

CEO 100,000

Manager 40,000

Sales Representative 30,000

Workers 1 25,000

Workers 2 15,000

Workers 3 10,000

Solution: Range = 100,000 - 10,000 = RM 90,000



1.3.2 Variance and Standard Deviation

Variance is the average of the squared distances of each value from the mean. The symbol for the population variance is $\sigma^2$ (sigma squared), whereas the symbol for the sample variance is $s^2$. Standard deviation measures how far the values typically lie from the mean; it is the square root of the variance. The symbol for the population standard deviation is $\sigma$ (sigma), and the symbol for the sample standard deviation is $s$.

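Formulas

(A) Sample, ungrouped data

Variance:

i) $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

ii) $s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$

Standard deviation: $s = \sqrt{s^2}$

(B) Sample, grouped data ($x_i$ = class midpoint, $f_i$ = class frequency)

Variance:

i) $s^2 = \frac{1}{\sum f_i - 1}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$

ii) $s^2 = \frac{1}{\sum f_i - 1}\left[\sum_{i=1}^{n} f_i x_i^2 - \frac{\left(\sum_{i=1}^{n} f_i x_i\right)^2}{\sum f_i}\right]$

Standard deviation: $s = \sqrt{s^2}$

(C) Population

Variance:

i) $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

ii) $\sigma^2 = \frac{N\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2}$

Standard deviation: $\sigma = \sqrt{\sigma^2}$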
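The six variance formulas above can be sketched in Python to check the hand calculations in Examples U to Y. This is a minimal sketch and the function names are illustrative. For grouped data it assumes the class midpoints $x_i$ have already been computed. The standard library's `statistics.variance` (divides by $n - 1$) and `statistics.pvariance` (divides by $N$) are used only as a cross-check for ungrouped data.

```python
import math
import statistics

def sample_var(xs):
    """Formula A(i): s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_var_shortcut(xs):
    """Formula A(ii): s^2 = (n * sum(x^2) - (sum x)^2) / (n(n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

def grouped_sample_var(mids, freqs):
    """Formula B(i): s^2 = sum(f (x - mean)^2) / (sum f - 1)."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(mids, freqs)) / n
    return sum(f * (x - m) ** 2 for x, f in zip(mids, freqs)) / (n - 1)

def grouped_sample_var_shortcut(mids, freqs):
    """Formula B(ii): s^2 = (sum f x^2 - (sum f x)^2 / sum f) / (sum f - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(mids, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(mids, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

def population_var(xs):
    """Formula C(i): sigma^2 = sum((x - mu)^2) / N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

def population_var_shortcut(xs):
    """Formula C(ii): sigma^2 = (N * sum(x^2) - (sum x)^2) / N^2."""
    N = len(xs)
    return (N * sum(x * x for x in xs) - sum(xs) ** 2) / N ** 2

# Example U, Brand ABC (population data)
abc = [12, 35, 37, 26]
var = population_var(abc)
print(var, population_var_shortcut(abc), round(math.sqrt(var), 2))  # 97.25 97.25 9.86
print(math.isclose(var, statistics.pvariance(abc)))                 # True

# Example V (sample data)
money = [400, 300, 250, 150, 110, 100]
print(round(sample_var(money), 2), round(sample_var_shortcut(money), 2))  # 14216.67 14216.67
print(math.isclose(sample_var(money), statistics.variance(money)))        # True

# Example X (grouped sample data, class midpoints)
mids, freqs = [10, 15, 20, 25, 30], [3, 5, 15, 5, 2]
s2 = grouped_sample_var(mids, freqs)
print(round(s2, 2), round(grouped_sample_var_shortcut(mids, freqs), 2), round(math.sqrt(s2), 2))  # 25.75 25.75 5.07
```

Computed this way, the definitional form (i) and the shortcut form (ii) of each formula agree up to floating-point rounding. The small differences in the hand calculations come only from rounding the mean to two decimal places first.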

Example U:

A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading (in months). The testing lab makes 4 gallons of each paint to test. Since different chemical agents are added to each brand and only four cans are involved, these two brands constitute small populations. Find the variance and standard deviation of each brand and draw a conclusion.

Brand ABC: 12 35 37 26

Brand XYZ: 17 32 37 24

Solution:

The sentence "...these two brands constitute small populations" shows that the data given are population data.

Using Formula C



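Brand ABC

Formula C(i)

$\mu = \frac{12 + 35 + 37 + 26}{4} = \frac{110}{4} = 27.5$

$x_i$    $x_i - \mu$    $(x_i - \mu)^2$
12    12 - 27.5 = -15.5    240.25
35    7.5     56.25
37    9.5     90.25
26    -1.5    2.25

$\sum (x_i - \mu)^2 = 389$, $N = 4$. Therefore,

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{389}{4} = 97.25$, $\sigma = \sqrt{97.25} = 9.86$

Variance, $\sigma^2 = 97.25$ and standard deviation, $\sigma = 9.86$

Formula C(ii)

$x_i$    $x_i^2$
12    144
35    1225
37    1369
26    676

$\sum x_i = 110$, $\sum x_i^2 = 3414$, $N = 4$. Therefore,

$\sigma^2 = \frac{N\sum x_i^2 - \left(\sum x_i\right)^2}{N^2} = \frac{4(3414) - 110^2}{4^2} = 97.25$, $\sigma = \sqrt{97.25} = 9.86$

Variance, $\sigma^2 = 97.25$ and standard deviation, $\sigma = 9.86$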

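Brand XYZ

Formula C(i)

$\mu = \frac{17 + 32 + 37 + 24}{4} = \frac{110}{4} = 27.5$

$x_i$    $x_i - \mu$    $(x_i - \mu)^2$
17    17 - 27.5 = -10.5    110.25
32    4.5     20.25
37    9.5     90.25
24    -3.5    12.25

$\sum (x_i - \mu)^2 = 233$, $N = 4$. Therefore,

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{233}{4} = 58.25$, $\sigma = \sqrt{58.25} = 7.63$

Variance, $\sigma^2 = 58.25$ and standard deviation, $\sigma = 7.63$

Formula C(ii)

$x_i$    $x_i^2$
17    289
32    1024
37    1369
24    576

$\sum x_i = 110$, $\sum x_i^2 = 3258$, $N = 4$. Therefore,

$\sigma^2 = \frac{N\sum x_i^2 - \left(\sum x_i\right)^2}{N^2} = \frac{4(3258) - 110^2}{4^2} = 58.25$, $\sigma = \sqrt{58.25} = 7.63$

Variance, $\sigma^2 = 58.25$ and standard deviation, $\sigma = 7.63$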

Conclusion

Brand ABC: Variance, $\sigma^2 = 97.25$ and standard deviation, $\sigma = 9.86$

Brand XYZ: Variance, $\sigma^2 = 58.25$ and standard deviation, $\sigma = 7.63$

When the means are equal, the larger the variance or standard deviation, the more variable the data are. Since the standard deviation of Brand ABC (9.86) is larger than that of Brand XYZ (7.63), the data are more variable for Brand ABC.

The data for Brand ABC are more widely spread than the data for Brand XYZ.



Example V:

A sample of six UTHM students was selected by an accountant to examine their monthly pocket money. Find the variance and standard deviation of the pocket money listed below.

Student Pocket Money (RM)

A 400

B 300

C 250

D 150

E 110

F 100

Solution:

Using Formula A

Formula A(i)

$\bar{x} = \frac{400 + 300 + 250 + 150 + 110 + 100}{6} = \frac{1310}{6} = 218.33$

$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
400    181.67     33003.99
300    81.67      6669.99
250    31.67      1002.99
150    -68.33     4668.99
110    -108.33    11735.39
100    -118.33    14001.99

$\sum (x_i - \bar{x})^2 = 71083.33$, $n = 6$. Therefore,

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{71083.33}{5} = 14216.67$, $s = \sqrt{14216.67} = 119.23$

Variance, $s^2 = 14216.67$ and standard deviation, $s = 119.23$

Formula A(ii)

$x_i$    $x_i^2$
400    160000
300    90000
250    62500
150    22500
110    12100
100    10000

$\sum x_i = 1310$, $\sum x_i^2 = 357100$, $n = 6$. Therefore,

$s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)} = \frac{6(357100) - 1310^2}{6(6 - 1)} = 14216.67$, $s = \sqrt{14216.67} = 119.23$

Variance, $s^2 = 14216.67$ and standard deviation, $s = 119.23$



Example W:

A sample of seven zones in Johor was selected by the Ministry of Health to check the number of children with asthma during a specific year in each zone. Find $s^2$ and $s$.

253 125 328 417 201 70 90

Solution:

Formula A(i)

$\bar{x} = \frac{253 + 125 + 328 + 417 + 201 + 70 + 90}{7} = \frac{1484}{7} = 212$

$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
253    41      1681
125    -87     7569
328    116     13456
417    205     42025
201    -11     121
70     -142    20164
90     -122    14884

$\sum (x_i - \bar{x})^2 = 99900$, $n = 7$. Therefore,

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{99900}{6} = 16650$, $s = \sqrt{16650} = 129.03$

Variance, $s^2 = 16650$ and standard deviation, $s = 129.03$

Formula A(ii)

$x_i$    $x_i^2$
253    64009
125    15625
328    107584
417    173889
201    40401
70     4900
90     8100
Sum: $\sum x_i = 1484$, $\sum x_i^2 = 414508$

$n = 7$. Therefore,

$s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)} = \frac{7(414508) - 1484^2}{7(7 - 1)} = 16650$, $s = \sqrt{16650} = 129.03$

Variance, $s^2 = 16650$ and standard deviation, $s = 129.03$

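Example X:

A sample of 30 automobiles was tested for fuel efficiency (in miles per gallon).

Fuel efficiency (in miles per gallon)    Frequency
8-12     3
13-17    5
18-22    15
23-27    5
28-32    2

Find the variance and standard deviation.

Solution:

Using Formula B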



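Formula B(i)

$x_i$    $f_i$    $f_i x_i$    $(x_i - \bar{x})^2$    $f_i (x_i - \bar{x})^2$
10    3     30     93.51     280.53
15    5     75     21.81     109.04
20    15    300    0.11      1.63
25    5     125    28.41     142.04
30    2     60     106.71    213.42

Totals: $\sum f_i = 30$, $\sum f_i x_i = 590$, $\sum f_i (x_i - \bar{x})^2 = 746.66$

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{590}{30} = 19.67$

$s^2 = \frac{1}{\sum f_i - 1}\sum f_i (x_i - \bar{x})^2 = \frac{1}{30 - 1}(746.66) = 25.75$

$s = \sqrt{25.75} = 5.07$

Variance, $s^2 = 25.75$ and standard deviation, $s = 5.07$

Formula B(ii)

$x_i$    $f_i$    $f_i x_i$    $x_i^2$    $f_i x_i^2$
10    3     30     100    300
15    5     75     225    1125
20    15    300    400    6000
25    5     125    625    3125
30    2     60     900    1800

Totals: $\sum f_i = 30$, $\sum f_i x_i = 590$, $\sum f_i x_i^2 = 12350$

$s^2 = \frac{1}{\sum f_i - 1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{\sum f_i}\right] = \frac{1}{29}\left[12350 - \frac{590^2}{30}\right] = 25.75$

$s = \sqrt{25.75} = 5.07$

Variance, $s^2 = 25.75$ and standard deviation, $s = 5.07$. Both formulas give the same result, so you can use either one.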



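Example Y:

A random sample of 35 states shows the number of specialty coffee shops for a specific company.

Class boundaries    Frequency
0.5-19.5     13
19.5-38.5    8
38.5-57.5    6
57.5-76.5    5
76.5-95.5    3

Find the variance and standard deviation.

Solution: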

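$x_i$    $f_i$    $f_i x_i$    $(x_i - \bar{x})^2$    $f_i (x_i - \bar{x})^2$    $x_i^2$    $f_i x_i^2$
10    13    130    650.76     8459.88    100     1300
29    8     232    42.38      339.04     841     6728
48    6     288    156.00     936.00     2304    13824
67    5     335    991.62     4958.10    4489    22445
86    3     258    2549.24    7647.72    7396    22188

Totals: $\sum f_i = 35$, $\sum f_i x_i = 1243$, $\sum f_i (x_i - \bar{x})^2 = 22340.74$, $\sum f_i x_i^2 = 66485$

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1243}{35} = 35.51$

Using Formula B(i): $s^2 = \frac{1}{\sum f_i - 1}\sum f_i (x_i - \bar{x})^2 = \frac{1}{35 - 1}(22340.74) = 657.08$

Using Formula B(ii): $s^2 = \frac{1}{\sum f_i - 1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{\sum f_i}\right] = \frac{1}{34}\left[66485 - \frac{1243^2}{35}\right] = 657.08$

$s = \sqrt{657.08} = 25.63$

Variance, $s^2 = 657.08$ and standard deviation, $s = 25.63$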
