das20502 chapter 1 descriptive statistics
CHAPTER 1 Descriptive Statistics
Objectives:
1. To study the basic introductory concepts of statistics, including the branches of statistics, the basic terms of statistics, and types of variables.
2. To be able to use graphical and numerical methods to describe a data set.
3. To be able to find the mean, median, mode and standard deviation for grouped data and ungrouped data.
Descriptive Statistics
a) Ungrouped Data
   Measurement of Central Tendency: Mode, Median, Mean
   Measurement of Dispersion: Variance, Standard Deviation
b) Grouped Data
   Measurement of Central Tendency: Mode, Median, Mean
   Measurement of Dispersion: Variance, Standard Deviation
Definition of basic terms
a) Population consists of all items or elements of interest for a particular decision or investigation. E.g.: all married staff over the age of 25 in UTHM.
b) Sample is a certain number of elements that have been chosen from a population; a sample is a subset of the population. E.g.: a list of married staff over the age of 25 in the Registrar's Office would be a sample from the population of all married staff over the age of 25 in UTHM.
c) Random sample is a sample drawn in such a way that each element of the population has a chance of being selected.
d) Simple random sample is a sample drawn such that any particular sample of a specified size has the same chance of being selected as any other sample of that size.
e) Element / member is a specific subject or individual about which the information is collected.
f) Variable is a characteristic of the individuals within the sample or population.
g) Observation / measurement is the value of a variable for an element.
h) Data set is a collection of values of one or more variables.
i) Ungrouped data set contains information on each member of a sample or population.
j) Grouped data set is a collection of data which are grouped in classes.
k) Raw data is data recorded in the sequence in which they are collected, before they are processed or ranked.
l) Population parameter is a descriptive measure computed from population data.
m) Sample statistic is a descriptive measure computed from sample data.
n) Outliers / extreme values are values that are very small or very large relative to the majority of the values in a data set.
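Example
1. The following table gives the number of sales of A4 paper in 8 shops in Melaka.

Shop    Number of A4 Paper (in reams)
1       2000
2       2500
3       3000
4       5000
5       7000
6       5000
7       4000
8       5500

Each shop is an element or member, the number of A4 paper (in reams) is the variable, and each value in the second column is an observation or measurement.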
Measures of central tendency are statistical measures which describe the position (center) of a distribution.
They are also called statistics of location, and are the complement of statistics of dispersion, which provide information concerning the spread or variability of observations.
In the univariate context, the mean, median and mode are the most commonly used measures of central tendency.
Mean - The average of the data values
Median - The middle value in a ranked list
- Data must be arranged in increasing or decreasing order.
- Defined for both ungrouped data and grouped data
Mode - The value that occurs most frequently
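Median for Ungrouped Data
$$ M = \begin{cases} x_{(n+1)/2}, & \text{when } n \text{ is odd} \\ \dfrac{x_{n/2} + x_{(n/2)+1}}{2}, & \text{when } n \text{ is even} \end{cases} $$
where $x_k$ is the k-th value in the ranked list.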
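The odd/even rule for the median of ungrouped data can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function name `median_ungrouped` and the sample values are assumptions chosen for the example.

```python
def median_ungrouped(values):
    """Median of raw data: the ((n+1)/2)-th ranked term when n is odd,
    the average of the (n/2)-th and (n/2 + 1)-th ranked terms when n is even."""
    ranked = sorted(values)  # the data must be ranked first
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # 1-based position (n+1)/2 is 0-based index (n-1)//2
        return ranked[(n - 1) // 2]
    # 1-based positions n/2 and n/2 + 1 are 0-based indices n//2 - 1 and n//2
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


# Illustrative values only (not taken from the slides)
print(median_ungrouped([7, 3, 9]))     # odd n
print(median_ungrouped([7, 3, 9, 1]))  # even n
```

Sorting inside the function means the caller does not have to rank the data beforehand.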
Mode for Ungrouped Data
Obtain the frequency of each value in the data set.
• If no value occurs more than once, then the data set has no mode.
• Otherwise, any value that occurs with the greatest frequency is a mode of the data set.
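The two rules for the mode of ungrouped data can be sketched as below. This is a hedged illustration: the function name `modes` is an assumption, and returning an empty list to signal "no mode" is a design choice made for this example only.

```python
from collections import Counter


def modes(values):
    """Return all modes of the data as a sorted list.

    An empty list means the data set has no mode, i.e. no value
    occurs more than once."""
    counts = Counter(values)  # frequency of each value
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


# Illustrative values only
print(modes([1, 2, 3]))        # no value repeats
print(modes([1, 2, 2, 3, 3]))  # two values tie for the greatest frequency
```

Returning a list rather than a single value keeps data sets with more than one mode from silently losing a mode.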
Exercise
1. Find the mean of the prices of pens (in RM) below:
   2.00 2.50 3.00 3.50 2.50
2. A sample of six students in UTHM is selected and their heights are measured, resulting in the following data:
   150.2 cm 1.592 m 149.4 cm 152.7 cm 1.533 m 1.510 m
   Find the sample mean.
3. Calculate the mean for the following data:
   a) 14, 11, -10, 8, 8, -16
   b) 23, 14, 6, -7, -2, 9, 16
Example
1. Find the median of the following examination scores:
   80, 56, 34, 67, 55, 91, 82, 47, 75, 31, 90
2. The following data represent the number of home runs hit by all teams in the Indian League in 2004:
   157 133 189 215 208 139 152 167 202 197 124 239 191 169
   Find the median of this data set.
3. The data below represent the length (in seconds) of a random sample of songs released in the 90's:
   198 255 287 207 176 224 215 208 241
   Find the median of the data given.
Sample variance, $s^2$, for a sample of n data values:
The variance of the n observations is
$$ s^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \dots + (y_n - \bar{y})^2}{n - 1} = \frac{1}{n - 1}\sum_{i=1}^{n} (y_i - \bar{y})^2 $$
The standard deviation s is the square root of the variance,
$$ s = \sqrt{s^2} $$
Computing the Variance
Formula for a population:
$$ \sigma^2 = \frac{\sum (x - \mu)^2}{N} $$
Formula for a sample:
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
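The difference between the population and sample formulas is only the divisor (N versus n - 1). The sketch below illustrates this; the function names and the data values are assumptions made for the example and do not come from the slides.

```python
import math


def population_variance(xs):
    """sigma^2 = sum((x - mu)^2) / N"""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def sample_variance(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)"""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Illustrative values only: mean is 5, sum of squared deviations is 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))         # divides by N = 8
print(sample_variance(data))             # divides by n - 1 = 7
print(math.sqrt(sample_variance(data)))  # sample standard deviation
```

Python's standard `statistics` module provides `pvariance` and `variance` for the same two quantities, which can be used to cross-check hand calculations.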
Example: Find the sample variance for the given data:
6.1 5.7 5.8 6.0 5.8 6.3
Find the variance and standard deviation of the following data:
5 2 1 7 6 9
Compute the sample variance and standard deviation of the heights of the starting players on Team I.
Organizing Data
Variable: A characteristic that varies from one person or thing to another
a) Qualitative: A non-numerically valued variable
b) Quantitative: A numerically valued variable
   i) Discrete: A quantitative variable whose possible values can be listed
   ii) Continuous: A quantitative variable whose possible values form some interval of numbers
Grouped frequency distribution - Is obtained by giving classes or intervals together with the number of data values in each class.
Cumulative frequency - Is the frequency of a class that includes all values in a data set that fall below the upper boundary of that class
Class midpoint or mark - Is the number halfway between the lower and upper class limits of a class:
$$ \text{Class midpoint} = \frac{\text{lower limit} + \text{upper limit}}{2} $$
Class width - Upper boundary – lower boundary
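The class midpoint and class width definitions can be sketched as below. One assumption is added that the slides do not state explicitly: each class boundary is taken to lie half a measurement unit outside the class limit (for example, limits 20 – 29 with a unit of 1 give boundaries 19.5 – 29.5). The function names and the `unit` parameter are illustrative.

```python
def class_midpoint(lower_limit, upper_limit):
    """Midpoint (mark) = (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2


def class_boundaries(lower_limit, upper_limit, unit=1):
    """Assumed convention: boundaries sit half a unit outside the limits,
    where `unit` is the gap between the upper limit of one class and the
    lower limit of the next class."""
    half = unit / 2
    return lower_limit - half, upper_limit + half


def class_width(lower_limit, upper_limit, unit=1):
    """Width = upper boundary - lower boundary."""
    lower_boundary, upper_boundary = class_boundaries(lower_limit, upper_limit, unit)
    return upper_boundary - lower_boundary


# Illustrative class with limits 20 - 29
print(class_midpoint(20, 29))
print(class_boundaries(20, 29))
print(class_width(20, 29))
```

Note that the width is computed from the boundaries, not the limits, so a class written as 20 – 29 has width 10 rather than 9.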
Example: Given the data below:
Construct the frequency distribution table with class limits 42 – 45, 46 – 49, 50 – 53 and so on.
The table below gives the ages of the employees in a company. Construct a frequency distribution table and find the class midpoint and class width.

Age        No. of Employees
20 – 29    30
30 – 39    35
40 – 49    20
50 – 59    10
60 – 69    5
The Ministry of Health Malaysia for Health Statistics publishes data on weights and heights by age and sex in Vital and Health Statistics. The weights shown in Table 6a, given to the nearest tenth of a pound, were obtained from a sample of 18-24-year-old males. Construct a grouped data table for these weights. Use a class width of 20 and a first cutpoint of 120.

Table 6a: Weights of 37 males, aged 18-24 years
129.2 185.3 218.1 182.5 142.8
155.2 170.0 151.3 187.5 145.6
167.3 161.0 178.7 165.0 172.5
191.1 150.7 187.0 173.7 178.2
161.7 170.1 165.8 214.6 136.7
278.8 175.6 188.7 132.1 158.5
146.4 209.1 175.4 182.0 173.6
149.9 158.6
Grouped Data
Sample Mean
• The sample mean of grouped data is:
$$ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} $$
The following data show the number of mistakes that Redza made when he typed 100 pages. Find the mean.

No. of mistakes   0    1    2    3    4    5
No. of pages      60   21   10   5    3    1
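As a hedged sketch of the grouped mean, the frequency-weighted average (sum of f·x divided by sum of f) can be applied to the typing-mistakes table above. The function name `grouped_mean` and the variable names are assumptions made for this example.

```python
def grouped_mean(values, freqs):
    """Grouped mean = sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total_f = sum(freqs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total_f


# Data from the typing-mistakes table: x = mistakes per page, f = number of pages
mistakes = [0, 1, 2, 3, 4, 5]
pages = [60, 21, 10, 5, 3, 1]
print(grouped_mean(mistakes, pages))  # 73 mistakes over 100 pages
```

For a table with class intervals instead of single values, the class midpoints would be passed as `values`.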
Find the mean for the data below, which give the number of bicycles owned by 27 families at Taman Permata.

No. of bicycles   No. of families
0                 2
1                 6
2                 13
3                 4
4                 2
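Mean
The mean is the average of the data values.
Ungrouped: the sample mean for raw data. Let $x_1, x_2, \dots, x_n$ be a sample of size n. Then
$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$
Grouped: the sample mean for grouped data. Suppose we have a sample of size n grouped into m groups or cells, with midpoint $x_i$ and frequency $f_i$ for group i. Then
$$ \bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{\sum_{i=1}^{m} f_i} $$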
CENTRAL TENDENCY MEASUREMENT
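Mean
Mean of sample data:
a) Ungrouped data: $\bar{x} = \dfrac{\sum x_i}{n}$
b) Grouped data: $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
Mean of population data:
a) Ungrouped data: $\mu = \dfrac{\sum x_i}{N}$
b) Grouped data: $\mu = \dfrac{\sum f_i x_i}{\sum f_i}$
where, for grouped data, $x_i$ = class midpoint / mark = (lower limit + upper limit) / 2 and $f_i$ = frequency of $x_i$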
Median
Median is the middle value in a ranked list. The data must be arranged in increasing or decreasing order. There are two types of median: the median for ungrouped data and the median for grouped data.
Ungrouped data:
a) When n is odd, the median is the value of the $\left(\frac{n+1}{2}\right)$-th term in the ranked list.
b) When n is even, the median is the average of the values of the two middle terms.
Median of sample data:
a) Ungrouped data
   Odd: $M = x_{\frac{n+1}{2}}$
   Even: $M = \dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$
b) Grouped data
$$ \text{Median} = L_M + \left(\frac{\frac{n}{2} - F}{f_m}\right) C $$
where
$L_M$ = lower boundary of the median class, C = class size / width,
F = cumulative frequency of the classes below the median class,
$f_m$ = frequency of the median class, n = number of data values
A study of sulphur oxide production over 80 days produced the distribution in the following table. Find the median.

Sulphur oxide (tonne)   Frequency
5.0 – 8.9               3
9.0 – 12.9              10
13.0 – 16.9             14
17.0 – 20.9             25
21.0 – 24.9             17
25.0 – 28.9             9
29.0 – 32.9             2
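The grouped median formula, Median = L_M + ((n/2 - F) / f_m) · C, can be sketched on the sulphur oxide table as follows. The function name is illustrative, and the lower boundaries are an assumption: because the data are recorded to one decimal place, each boundary is taken to be 0.05 below the class lower limit (for example 16.95 for the class 17.0 – 20.9), which gives a class width of 4.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = L_M + ((n/2 - F) / f_m) * C for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cumulative = 0  # F: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= half:
            # first class whose cumulative frequency reaches n/2 is the median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Sulphur oxide table; boundaries assumed to be 0.05 below each lower limit
lower_boundaries = [4.95, 8.95, 12.95, 16.95, 20.95, 24.95, 28.95]
frequencies = [3, 10, 14, 25, 17, 9, 2]
print(grouped_median(lower_boundaries, frequencies, width=4.0))
```

Here n/2 = 40, the cumulative frequencies are 3, 13, 27, 52, so the median class is 17.0 – 20.9 with F = 27 and f_m = 25, and the result is about 19.03 tonnes.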
Find the median for the data below, which show the number of visits to the library made by all 100 international students in one year.

Number of visits   No. of students
0 – 4              17
5 – 9              41
10 – 14            22
15 – 19            11
20 – 24            8
25 – 29            1
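MODE
Mode is the value that occurs most frequently (the highest frequency in a data set).
Note: Data with 2 modes is known as bimodal, and data with more than 2 modes is multimodal.
Grouped data: the mode for grouped data is
$$ \text{Mode} = L_{Mo} + \left(\frac{d_b}{d_b + d_a}\right) C $$
where $L_{Mo}$ = lower boundary of the modal class, $d_b$ = frequency of the modal class minus the frequency of the class before it, $d_a$ = frequency of the modal class minus the frequency of the class after it, and C = class width.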
Find the mode of the following data.

Class      Frequency
11 – 15    6
16 – 20    10
21 – 25    18
26 – 30    24
31 – 35    16
36 – 40    12
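The grouped mode formula, Mode = L_Mo + (d_b / (d_b + d_a)) · C, can be sketched on the table above. Here d_b is the modal class frequency minus the frequency of the class before it, and d_a is the modal class frequency minus the frequency of the class after it. The function name is illustrative, and two assumptions are made for this example: boundaries lie 0.5 below the lower limits (10.5, 15.5, ...), and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L_Mo + d_b / (d_b + d_a) * C for equal-width classes."""
    if not freqs:
        raise ValueError("empty frequency table")
    # index of the (first) class with the highest frequency
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    before = freqs[i - 1] if i > 0 else 0              # assumed 0 if no class before
    after = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumed 0 if no class after
    d_b = freqs[i] - before
    d_a = freqs[i] - after
    if d_b + d_a == 0:
        raise ValueError("mode is not unique: neighbouring classes tie with the modal class")
    return lower_boundaries[i] + d_b / (d_b + d_a) * width


# Table above: classes 11 - 15, ..., 36 - 40 with assumed boundaries 10.5, 15.5, ...
lower_boundaries = [10.5, 15.5, 20.5, 25.5, 30.5, 35.5]
frequencies = [6, 10, 18, 24, 16, 12]
print(grouped_mode(lower_boundaries, frequencies, width=5))
```

The modal class is 26 – 30 (frequency 24), so d_b = 24 - 18 = 6 and d_a = 24 - 16 = 8, giving 25.5 + (6/14)·5, roughly 27.64.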
A Global Warming Awareness Exhibition was held by a state government. The table below records the number of visitors who visited the exhibition and the number of days having those numbers of visitors. Find the mode of the number of visitors.

Number of visitors   Number of days
0 – 99               10
100 – 199            23
200 – 299            167
300 – 399            224
400 – 499            211
500 – 599            107
Find the mean, median and mode for the following data:

Age        Number of people
17 – 21    2
22 – 26    3
27 – 31    5
32 – 36    6
37 – 41    8
42 – 46    7
47 – 51    2
52 – 56    3
Sample Variance for Grouped Data
The formula for the sample variance for grouped data is:
$$ S^2 = \frac{1}{\sum f_i - 1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{\sum f_i}\right] $$
where f is the class frequency and x is the class midpoint.
Find the variance and standard deviation.

Class       2    3    4    5    6    7
Frequency   6    10   15   8    3    10
Find the variance and standard deviation.

xi   3.0 – 3.4   3.4 – 3.8   3.8 – 4.2   4.2 – 4.6   4.6 – 5.0
fi   4           8           11          9           6
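The grouped sample variance, S² = [Σ f x² - (Σ f x)² / Σ f] / (Σ f - 1), can be sketched on the class-interval table above (classes 3.0 – 3.4 through 4.6 – 5.0 with frequencies 4, 8, 11, 9, 6). The function name and variable names are assumptions for this example; the class midpoints are computed from the class limits.

```python
import math


def grouped_sample_variance(midpoints, freqs):
    """S^2 = (sum(f x^2) - (sum(f x))^2 / sum(f)) / (sum(f) - 1)."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need a total frequency of at least 2")
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Class limits and frequencies from the table
limits = [(3.0, 3.4), (3.4, 3.8), (3.8, 4.2), (4.2, 4.6), (4.6, 5.0)]
freqs = [4, 8, 11, 9, 6]
midpoints = [(lo + hi) / 2 for lo, hi in limits]  # 3.2, 3.6, 4.0, 4.4, 4.8

variance = grouped_sample_variance(midpoints, freqs)
print(variance)             # roughly 0.2436
print(math.sqrt(variance))  # standard deviation, roughly 0.4936
```

The same function works for a table of single values (such as Class 2 to 7) by passing the values themselves as `midpoints`.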
Population variance, $\sigma^2$
The formula for the population variance is:
$$ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{N\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2} $$
Given the data below:
23.3 12.4 58.1 38.2 14.0 58.2 75.4 23.9 23.9 18.3
22.0 37.1 31.4 8.5 1.0 15.5 6.9 5.2 28.7 26.3
13.9 25.9 26.8 26.9 16.8 37.7 10.6 21.9 31.6 30.1
42.4 16.5 21.1 32.9 8.8 10.6 28.6 40.7 12.9 13.8
a) Construct the frequency distribution table with class boundaries -0.5 – 9.5, 9.5 – 19.5, 19.5 – 29.5, and so on.
b) Find
   i) Mean
   ii) Median
   iii) Mode
   iv) Standard deviation
Find the mean, median, mode and standard deviation.

Class limit   f
20 – 29       30
30 – 39       35
40 – 49       20
50 – 59       10
60 – 69       5