Basic Concepts and Measures of Location
Daniel Y.T. Fong (Email: [email protected])
NURS4302 - STATISTICS
School of Nursing, The University of Hong Kong
Common Comments to a course on Statistics
1. Difficult!
2. Not relevant to Nursing!
Florence Nightingale(1820-1910)
南丁格爾
Common Symptoms of Failed Students (student claims)
1. Did not come to class (they claimed they learnt statistics before)
2. Did not study notes (they claimed they learnt statistics before)
3. Shared assignments (they claimed they worked together)
4. Did not ask (teachers) for help! (they had friends who studied Statistics before)
Most of us live comfortably with some level of uncertainty
What makes statistics unique is its ability to QUANTIFY UNCERTAINTY, to make it PRECISE. This allows
statisticians to make CATEGORICAL STATEMENTS, with complete assurance – about their level of uncertainty
Learning Objectives
1. To know the key concepts underlying statistics
2. To know what statistics can do
3. To identify the type of data
4. To understand the use of different location measures
The Key Concepts
- Variability
- Population and sample
Variability?
1 + 2 + 3 + 4 = ?
10 10 10 10
NO !
Variability?
1 + 2 + · · · + 20 = ?
210 156 ? 250
Yes ! Due to uncertainty !
Variability is the Reality
The number of patients you see each day varies
Your blood pressure varies
Your body temperature varies
Your height varies
Your weight varies
Your mood varies
What Statistics is About …
Managing variability
Quantifying uncertainty and the strength of evidence of an experiment (evidence based)
Problems Solvable by Statistics (1)
Like to capture laughter with your camera?
Yes: Men 47%, Women 58%
Significantly different?
A telephone survey of 1013 subjects, conducted during 11-14 March 2004 in the US
~ www.polaroid.com
Descriptive Statistics
Inferential Statistics
Problems Solvable by Statistics (2)
Group 1: New drug (10 hypertensive patients)
Group 2: Placebo (11 hypertensive patients)
Were their BPs after treatment different?
If yes, how different were they?
Descriptive Statistics
Inferential Statistics
So, What is Statistics?
Descriptive Statistics + Inferential Statistics
Descriptive Statistics
• For characterization
• For reporting
• To summarize data at hand
Inferential Statistics
• For decision making
• To generalize beyond the data at hand
Population
What do you want to do?
• The prevalence of back pain in HK? Population: the whole HK population
• Quality of life in Asians? Population: all Asians
• The BP of a patient after calcium intake? Population: all possible times after calcium intake in the patient
Population
The entire collection of individual units, which can be people or measurements, about which information is desired
Often NOT manageable!
(Random) Sample
500 subjects in HK
500 subjects in each Asian country
1, 4, 26 weeks after calcium intake
Population: the entire collection of individual units, which can be people or measurements, about which information is desired (e.g. the whole HK population; all Asians; all possible times after calcium intake in the patient)
Sample: a subset of the population selected for study
Must be manageable!
Population and Sample
- The fundamental concept on which statistics is based
Population: parameters (unknown)
• Mean BP
• Proportion of females
Sample: descriptive statistics (known)
• Sample mean BP
• Sample proportion of females
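The contrast between an unknown population parameter and a known sample statistic can be sketched with a small simulation. Everything numeric here is hypothetical: the simulated population of 100,000 systolic BP values, its mean of 120 and SD of 15, and the seed are made up for illustration. Only the sample size of 500 echoes the earlier example.

```python
import random
import statistics

random.seed(4302)  # arbitrary seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated systolic BP values (mmHg).
# In practice the population is usually too large to measure in full.
population = [random.gauss(120, 15) for _ in range(100_000)]

# A simple random sample of 500 units, drawn without replacement.
sample = random.sample(population, 500)

population_mean = statistics.mean(population)  # parameter (normally unknown)
sample_mean = statistics.mean(sample)          # descriptive statistic (known)

print(f"Population mean: {population_mean:.1f}")
print(f"Sample mean:     {sample_mean:.1f}")
```

The sample mean will generally be close to, but not equal to, the population mean. That gap is the uncertainty that inferential statistics tries to quantify.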
Q & A
1. In statistical terms, a population
• consists only of people
• may be finite
• may be infinite
• can be any set of things in which we are interested
True or False?
Classification of Data
- Affects the way the data are summarized and analyzed
Summarize? variables
Two Types of Data/Variable
Quantitative - takes values in numbers (e.g. age, temperature)
Qualitative - takes values in words (e.g. gender, educational level, religion)
Remarks
• Quantitative: statistical analysis is always feasible
• Qualitative: statistical analysis may not be feasible
When is Statistical Analysis of Qualitative Data Feasible?
Only when the qualitative data can be quantified
Quantifiable?
• Gender: Female = 1, Male = 0
• Educational level: Tertiary or above = 3, College = 2, Secondary = 1, Primary = 0
Categorical data: quantifiable qualitative data
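A minimal sketch of quantifying qualitative data, using the codes from this slide (Female = 1, Male = 0; Primary = 0 up to Tertiary or above = 3). The four respondents are hypothetical and exist only to show the coding step.

```python
# Codes taken from the slide; the respondents below are hypothetical.
GENDER_CODES = {"Male": 0, "Female": 1}
EDUCATION_CODES = {
    "Primary": 0,
    "Secondary": 1,
    "College": 2,
    "Tertiary or above": 3,
}

respondents = [
    ("Female", "College"),
    ("Male", "Primary"),
    ("Female", "Tertiary or above"),
    ("Male", "Secondary"),
]

# Replace each word with its numeric code.
coded = [(GENDER_CODES[g], EDUCATION_CODES[e]) for g, e in respondents]
print(coded)  # [(1, 2), (0, 0), (1, 3), (0, 1)]

# With 0/1 coding, the mean of the gender codes is the proportion of females.
proportion_female = sum(g for g, _ in coded) / len(coded)
print(proportion_female)  # 0.5
```

Once coded, the data can be counted and summarized, although the numbers remain labels: an education code of 2 is not "twice" a code of 1.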
Two Types of Quantitative Data
1. Continuous data
■ May take values between any two plausible values (uncountable)
■ e.g. age
■ Between 2 and 3, one can be 2.1, 2.11, 2.111, 2.1111, etc.
2. Discrete data
■ Takes no values between two adjacent plausible values (countable)
■ e.g. number of hospital admissions per day
■ Between 2 and 3, there are no values.
Levels of Measurement (1-2)
1. Nominal - categorical data without ranking order
2. Ordinal - categorical data with ranking order
Levels of Measurement (3-4)
3. Interval - quantitative data without a well-defined zero
4. Ratio - quantitative data with a well-defined zero
Interval and ratio data: same statistical treatment
(in C)
Measurement Hierarchy
1. Nominal - categorical data without ranking order, e.g. Even / Odd
2. Ordinal - categorical data with ranking order, e.g. 20 or below, (20, 30], (30, 40], (40, 50], 60 or above
3. Interval/Ratio - quantitative data, e.g. Age
Decreasing information from Interval/Ratio down to Nominal
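The loss of information down the hierarchy can be sketched as follows: exact ages (ratio level) are collapsed into ordered age groups (ordinal level) and then into even/odd (nominal level). The five ages, the cut-points of 20, 30, 40 and 50, and the top label "above 50" are illustrative assumptions, not the slide's exact grouping.

```python
from bisect import bisect_left

ages = [18, 25, 30, 47, 63]  # hypothetical exact ages (interval/ratio level)

# Ordinal level: right-closed groups, so an age of 30 falls in "(20, 30]".
CUTS = [20, 30, 40, 50]
LABELS = ["20 or below", "(20, 30]", "(30, 40]", "(40, 50]", "above 50"]
age_groups = [LABELS[bisect_left(CUTS, a)] for a in ages]
print(age_groups)

# Nominal level: even/odd gives categories with no ranking order.
parity = ["Even" if a % 2 == 0 else "Odd" for a in ages]
print(parity)
```

Each step is one-way: the age group can be derived from the exact age, but the exact age cannot be recovered from the group.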
Data Types
Quantitative (takes numerical values)
• Discrete (whole numbers), e.g. number of accidents, household size
• Continuous (takes decimal places), e.g. height, weight
Qualitative/Categorical (takes coded numerical values)
• Ordinal (ranking order exists), e.g. Poor/Average/Good
• Nominal (no ranking order), e.g. gender, race
Q & A
2. In statistical terms, …
• Type of delivery (vaginal versus cesarean) is categorical
• White blood cell count is continuous
• Examination result (pass versus fail) is ordinal
• Military rank (private, sergeant, etc.) is nominal
True or False?
Descriptive Statistics
- To summarize data
- Mean, Median and Mode (measures of location)
Why Descriptive Statistics?
6 2 13 17 22 9 19 7 10 5
0 11 7 19 24 16 3 7 13 4
12 29 3 4 33 1 2 6 13 3
25 30 13 25 16 30 12 10 14 2
20 30 4 2 6 12 31 10 3 3
8 24 8 8 4 8 26 12 12 15
2 8 8 20 15 6 14 21 3 8
10 11 10 23 10 14 13 35 22 17
4 10 4 0 20 53 19 5 12 8
11 20 4 13 17 12 11 15 10 2
Presenting the raw data is often infeasible
Presenting the distribution can still be overwhelming
Good to have only a few numbers to summarize a dataset
[Figure: Depression levels of 100 cancer patients. Axes: Depression (bins of width 5, from 0 - 5 to 55 - 60) and Frequency (0 to 30)]
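A frequency table like the one behind this chart could be tabulated as in the sketch below. It assumes that the 100 values listed earlier are the depression scores being plotted, and that each bin includes its lower bound and excludes its upper bound; the slides do not state either point.

```python
from collections import Counter

# The 100 values from the raw data listing, row by row.
scores = [
    6, 2, 13, 17, 22, 9, 19, 7, 10, 5,
    0, 11, 7, 19, 24, 16, 3, 7, 13, 4,
    12, 29, 3, 4, 33, 1, 2, 6, 13, 3,
    25, 30, 13, 25, 16, 30, 12, 10, 14, 2,
    20, 30, 4, 2, 6, 12, 31, 10, 3, 3,
    8, 24, 8, 8, 4, 8, 26, 12, 12, 15,
    2, 8, 8, 20, 15, 6, 14, 21, 3, 8,
    10, 11, 10, 23, 10, 14, 13, 35, 22, 17,
    4, 10, 4, 0, 20, 53, 19, 5, 12, 8,
    11, 20, 4, 13, 17, 12, 11, 15, 10, 2,
]

BIN_WIDTH = 5  # assumed convention: a bin is [lower, lower + 5)

# Key each count by the lower bound of the bin the score falls in.
counts = Counter((s // BIN_WIDTH) * BIN_WIDTH for s in scores)

# Print a text histogram, one row per bin.
for lower in range(0, 60, BIN_WIDTH):
    n = counts.get(lower, 0)
    print(f"{lower:2d} - {lower + BIN_WIDTH:2d}: {n:3d} {'*' * n}")
```

Even this table has twelve rows, which is why a few summary numbers such as the mean, median and mode are attractive.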
The Mean (平均數)
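The average value
• Easy to calculate and handle
• Uses all data at hand
• Sensitive to aberrant values
Sample data: $X_1, X_2, X_3, X_4, X_5$
Sum: $X_1 + X_2 + X_3 + X_4 + X_5 = \sum_{i=1}^{5} X_i$
Mean: $\bar{X} = \left( \sum_{i=1}^{5} X_i \right) / 5$
Example
Sample data: 6, 2, 13, 17, 22
Sum: 6 + 2 + 13 + 17 + 22 = 60
Mean: 60/5 = 12
New comer: $X_6 = 900$
Now, $\bar{X} = (60 + 900)/6 = 160$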
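The arithmetic on this slide can be checked with a short sketch. It uses only the slide's own numbers: the sample 6, 2, 13, 17, 22 and the new comer 900. The variable names are illustrative.

```python
sample = [6, 2, 13, 17, 22]

total = sum(sample)
mean = total / len(sample)
print(total)  # 60
print(mean)   # 12.0

# Add the aberrant new comer and recompute.
with_newcomer = sample + [900]
mean_with_newcomer = sum(with_newcomer) / len(with_newcomer)
print(mean_with_newcomer)  # 160.0
```

A single aberrant value moves the mean from 12 to 160, far above five of the six observations.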
Use of the Mean – Example
1999/2000 Household Expenditure Survey
~ Census & Statistics Department
The Median (中位數)
The middle value, i.e. 50% of the values lie below the median
Sample data: 6, 2, 13, 17, 22
Rank: 2, 6, 13, 17, 22
Median: 13
Values below 13 = 2/5 = 40%?
[Figure: the ranked sample on an axis from -1 to 23, with 12.5, 13.5 and 50% marked around the median]
[Figure: the sample on an axis from 0 to 23, with the new comer 900 added]
The Median – Revisit of New Comer
Sample data: 6, 2, 13, 17, 22
Rank: 2, 6, 13, 17, 22
Median: 13
With the new comer
Sample data: 6, 2, 13, 17, 22, 900
Rank: 2, 6, 13, 17, 22, 900
Median: (13+17)/2 = 15
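The same calculation, sketched with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. The data are the slide's own; the variable names are illustrative.

```python
from statistics import median

sample = [6, 2, 13, 17, 22]
print(sorted(sample))  # [2, 6, 13, 17, 22]

median_before = median(sample)
print(median_before)   # 13

# Six values: the median is the average of the 3rd and 4th ranked values.
with_newcomer = sample + [900]
median_after = median(with_newcomer)
print(median_after)    # 15.0
```

The new comer shifts the median only from 13 to 15, which is what "robust" means here.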
Use of the Median – Example
2001 Population Census
The Mode (眾數)
The most popular value
Sample data: 6, 2, 13, 17, 22; Frequency: 1, 1, 1, 1, 1; Mode: All or None
Sample data: 6, 2, 13, 17, 22, 22; Frequency: 1, 1, 1, 1, 2; Mode: 22
Sample data: 6, 2, 13, 2, 17, 22, 22; Frequency (of 6, 2, 13, 17, 22): 1, 2, 1, 1, 2; Mode: 2 and 22
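The three examples can be reproduced with a sketch using `collections.Counter` for the frequencies and `statistics.multimode` (available from Python 3.8) for the modes. Note one convention that is the library's rather than the slide's: when every value occurs once, `multimode` returns all the values, which corresponds to the "All" reading of "All or None".

```python
from collections import Counter
from statistics import multimode

examples = [
    [6, 2, 13, 17, 22],
    [6, 2, 13, 17, 22, 22],
    [6, 2, 13, 2, 17, 22, 22],
]

# multimode returns every value tied for the highest frequency,
# in order of first appearance.
modes = [multimode(data) for data in examples]

for data, m in zip(examples, modes):
    print(data, dict(Counter(data)), "mode:", m)
```

For the three samples the modes come out as `[6, 2, 13, 17, 22]`, `[22]` and `[2, 22]`.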
The Mode – Another Example
The most popular value
• May not be unique
• Impractical for continuous data
• But the least commonly used descriptive statistic
Sample data: height of students in the class
Frequency: likely 1 for all
Mode: All or None
Use of the Mode – Example
Guffaws (狂笑)
Chuckle (輕笑)
Giggle (傻笑)
A telephone survey, conducted during 11-14 March 2004 in US, of 1013 subjects
Cackle (格格地笑)
Snort (闭嘴低声咯咯轻笑)
8%
31%
41%
3%
6%
Total Men Women
49%
41%
Mode: Total = Chuckle, Men = Chuckle, Women = Giggle
~ www.polaroid.com
Measures of Location
Mean (average value, 平均數)
Advantages
1. Is the “expected” value
2. Uses all the data
3. Easy to calculate
Disadvantages
1. Not robust to aberrant values
2. Can be difficult to interpret due to aberrant values
Median (middle value, 中位數)
Advantages
1. “Robust”, i.e. not affected by aberrant values
Disadvantages
1. Does not use all the data
2. Not easy to manipulate mathematically
Mode (most popular value, 眾數)
Advantages
1. Can be useful for discrete and categorical measurements
Disadvantages
1. Not useful for continuous data
2. May not be unique
3. Does not use all the data
Q & A
3. Among the descriptive statistics,
• the mean does not use all the data
• there can be more than one mode in a dataset
• the median does not use all the data
True or False?