Data Display and Summary
Biostatistics
By Dr Zahid Khan
2
Data
• Data is a collection of facts, such as values or measurements.
OR
• Data is information that has been translated into a form that is more convenient to move or process.
OR
• Data are any facts, numbers, or text that can be processed by a computer.
3
Statistics
Statistics is the study of the collection, summarizing, organization, analysis, and interpretation of data.
4
Vital Statistics
Vital statistics is the collection, summarizing, organization, analysis, presentation, and interpretation of data related to vital events of life, such as births, deaths, marriages, divorces, health, and disease.
5
Biostatistics
Biostatistics is the application of statistical techniques to scientific research in health-related fields, including medicine, biology, and public health.
6
Descriptive Statistics
Descriptive statistics are statistics used to describe and summarize the data that were actually collected, without generalizing beyond them. When every member of a group or population is measured, the summary describes that population exactly. A good example is the census, in which all members of a population are counted.
7
Inferential or Analytical Statistics
Inferential statistics are used to draw conclusions and make predictions about a population based on the analysis of numeric data from a sample.
8
Primary & Secondary Data
• Raw or primary data: data as collected, still containing a lot of unnecessary, irrelevant, and unwanted information.
• Treated or secondary data: data that have been processed to remove this unnecessary, irrelevant, and unwanted information.
• Cooked data: data that were not genuinely collected and are false or fictitious.
9
Ungrouped & Grouped Data
• Ungrouped data: data presented or observed individually. For example, the number of children observed in 6 families:
2, 4, 6, 4, 6, 4
• Grouped data: identical values grouped together by frequency. For example, the data above on children in 6 families can be grouped as:
No. of children    Families
2                  1
4                  3
6                  2
or, alternatively, into classes:
No. of children    Frequency
2 - 4              4
5 - 7              2
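The grouping above can be reproduced with a short Python sketch. The variable names are illustrative only, and the class limits 2-4 and 5-7 are taken from the example.

```python
from collections import Counter

# Number of children observed in 6 families (the ungrouped data).
children = [2, 4, 6, 4, 6, 4]

# Group identical values by frequency.
frequency = Counter(children)

# Group the same data into the class intervals used in the example.
classes = [(2, 4), (5, 7)]
class_frequency = {
    f"{low}-{high}": sum(low <= x <= high for x in children)
    for low, high in classes
}

for value in sorted(frequency):
    print(value, frequency[value])
for label, count in class_frequency.items():
    print(label, count)
```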
10
Variable
A variable is a characteristic or value that can vary from one individual or observation to another, for example age, height, weight, or blood pressure.
11
Types of Variable
Independent variable: the variable being manipulated or changed, typically the exposure. For example, smoking.
Dependent variable: the observed result of the independent variable being manipulated, typically the outcome. For example, carcinoma (cancer) of the lung.
Confounding variable: a variable associated with both the exposure and the disease. For example, age is a factor in many events.
12
Categories of DATA
13
Quantitative or Numerical data
This type of data describes information that can be counted or expressed numerically, for example:
2, 4, 6, 8.5, 10.5
14
Quantitative or Numerical data (cont.)
This data is of two types:
1. Discrete data: takes whole-number values only and has no fractions. For example:
Number of children in a family = 4
Number of patients in hospital = 320
2. Continuous data: measured on a continuous scale, so it can take any of an infinite number of values, including fractions. For example:
Height of a person = 5 feet 6 inches (5′ 6″)
Temperature = 92.3 °F
15
Qualitative or Categorical data
This is non-numerical data, such as Male/Female or Short/Tall.
It is of two types:
1. Nominal data: a series of unordered categories (one cannot tick more than one category at a time). For example:
Sex = Male / Female
Blood group = O / A / B / AB
2. Ordinal or ranked data: distinct ordered (ranked) categories. For example:
Measurement of height can be = Short / Medium / Tall
Degree of pain can be = None / Mild / Moderate / Severe
16
Stem and Leaf Plots
• A simple way to order and display a data set.
• Abbreviate the observed data to two significant digits.
Data: 0.6, 2.6, 0.1, 1.1, 0.4, 1.3, 1.5, 2.2, 2.0, 3.2
Stem    Leaf
0       6 1 4
1       1 3 5
2       6 2 0
3       2
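As a rough sketch, the plot above can be built in Python by splitting each value into its integer part (stem) and first decimal digit (leaf). The function name `stem_and_leaf` is illustrative, and the sketch assumes non-negative values with one decimal place, as in the example.

```python
from collections import defaultdict

data = [0.6, 2.6, 0.1, 1.1, 0.4, 1.3, 1.5, 2.2, 2.0, 3.2]

def stem_and_leaf(values):
    """Map each stem (integer part) to its leaves (first decimal digit)."""
    plot = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # e.g. 2.6 -> 26; rounding guards against float error
        stem, leaf = divmod(tenths, 10)  # 26 -> stem 2, leaf 6
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Leaves are kept in the order the values were observed, as on the slide; sorting each list of leaves would give an ordered plot.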
17
Measures of Central Tendency & Variation (Dispersion)
18
Measures of Central Tendency
Measures of central tendency are quantitative indices that describe the center of a distribution of data. These are the three Ms:
• Mean
• Median
• Mode
19
Mean
The mean, or arithmetic mean, is also called the AVERAGE and is calculated only for numerical data. For example:
• What is the average age of the children, in years?
Child    1  2  3  4  5  6  7
Age      6  4  4  3  2  4  5
Formula: $\bar{X} = \frac{\sum X}{n}$
Mean = $\frac{6 + 4 + 4 + 3 + 2 + 4 + 5}{7} = \frac{28}{7} = 4$ years
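The same calculation can be sketched in Python, using the seven ages that appear in the calculation (6, 4, 4, 3, 2, 4, 5). The helper name `mean` is illustrative.

```python
# Ages (in years) of the seven children used in the calculation.
ages = [6, 4, 4, 3, 2, 4, 5]

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

mean_age = mean(ages)
print(mean_age)
```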
20
Median
• It is the central-most value of the ordered data. For example, what is the central value in the data 2, 3, 4, 4, 4, 5, 6?
• If we divide the data into two equal groups, 2, 3, 4 | 4 | 4, 5, 6, then 4 is the central-most value.
• The position of the central value in the ordered data is given by:
Median position = $\frac{n + 1}{2}$ (here n is the total number of values)
Median position = $\frac{7 + 1}{2} = \frac{8}{2} = 4$, so the median is the 4th value, which is 4.
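A minimal Python sketch of the median follows. The odd-count case matches the example. The even-count case (averaging the two middle values) is not covered in the slides and is included here only as the usual convention.

```python
def median(values):
    """Return the central value of the data after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: position (n + 1) / 2, i.e. index mid when counting from 0.
        return ordered[mid]
    # Even count (assumption, not in the slides): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [2, 3, 4, 4, 4, 5, 6]
position = (len(data) + 1) // 2  # 4th value in the ordered data
print(position, median(data))
```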
21
Mode
• The mode is the most frequently occurring (repeated) value in a set of observations. Examples:
• No mode (no value is repeated)
Raw data: 10.3 4.9 8.9 11.7 6.3 7.7
• One mode (4)
Raw data: 2 3 4 4 4 5 6
• More than one mode (28 and 43)
Raw data: 21 28 28 41 43 43
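The three cases above can be checked with a small Python sketch. The function name `modes` is illustrative. It returns an empty list when no value is repeated, matching the "no mode" case on the slide.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value is repeated."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))
print(modes([2, 3, 4, 4, 4, 5, 6]))
print(modes([21, 28, 28, 41, 43, 43]))
```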
Comparison of the Mode, the Median, and the Mean
• In a normal distribution, the mode, the median, and the mean have the same value.
• The mean is the most widely reported index of central tendency for variables measured on an interval or ratio scale.
• The mean takes each and every score into account.
• It is also the most stable index of central tendency and thus yields the most reliable estimate of the central tendency of the population.
23
Histogram/Bar Chart
• Histograms & box plots are used for continuous or scale variables such as temperature, bone density, etc.
• Bar charts & pie charts are used for categorical or nominal variables such as gender, name, etc.
24
Measures of Dispersion
Measures of dispersion are quantitative indices that describe the spread of a data set. These are
• Range
• Mean deviation
• Variance
• Standard deviation
• Coefficient of variation
• Percentile
25
Range
It is the difference between the highest and lowest values in a data series. For example:
the ages (in years) of 10 children are
2, 6, 8, 10, 11, 14, 1, 6, 9, 15
Here the range of age is 15 − 1 = 14 years.
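In Python, the range of this example is simply the maximum minus the minimum. The variable names are illustrative.

```python
# Ages (in years) of the 10 children in the example.
ages = [2, 6, 8, 10, 11, 14, 1, 6, 9, 15]

# Range = highest value - lowest value.
age_range = max(ages) - min(ages)
print(age_range)
```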
26
Mean Deviation
This is the average absolute deviation of all observations from the mean.
$\text{Mean Deviation} = \frac{\sum |X - \bar{X}|}{n}$
Here X = value, $\bar{X}$ = mean, n = total number of values.
Mean Deviation Example
A student took 5 exams in a class and had scores of 92, 75, 95, 90, and 98. Find the mean deviation for her test scores.
• First step: find the mean.
$\bar{X} = \frac{\sum X}{n} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$
27
28
Dr. Riaz A. Bhutto 9/3/2012
• Second step: find the mean deviation.
Value (X)    Mean (X̄)    Deviation (X − X̄)    Absolute deviation |X − X̄| (ignoring ± signs)
92           90           2                     2
75           90           -15                   15
95           90           5                     5
90           90           0                     0
98           90           8                     8
Total = 450                                     ∑|X − X̄| = 30
n = 5
$\text{Mean Deviation} = \frac{\sum |X - \bar{X}|}{n} = \frac{30}{5} = 6$
The average deviation from the mean is 6.
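The two steps of this example can be sketched in Python as follows. The function name `mean_deviation` is illustrative.

```python
# Exam scores from the example.
scores = [92, 75, 95, 90, 98]

def mean_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    n = len(values)
    mean = sum(values) / n                        # first step: find the mean
    absolute_deviations = [abs(x - mean) for x in values]
    return sum(absolute_deviations) / n           # second step: average them

print(sum(scores) / len(scores))  # the mean
print(mean_deviation(scores))     # the mean deviation
```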
29
Variance
• It is a measure of variability that takes into account the difference between each observation and the mean.
• For a sample, the variance is the sum of the squared deviations from the mean divided by the number of values in the series minus 1 (n − 1); for a whole population, the sum is divided by the number of values.
• The sample variance is denoted s² and the population variance σ².
30
Variance (cont.)
• The variance is defined as the average of the squared differences from the mean.
• To calculate the variance, follow these steps:
• Work out the mean (the simple average of the numbers).
• Then, for each number, subtract the mean and square the result (the squared difference).
• Then work out the average of those squared differences (divide by n for a population, or by n − 1 for a sample).
31
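Example: The household size of 5 families was recorded as follows: 2, 5, 4, 6, 3. Calculate the variance for these data.
Step 1       Step 2       Step 3               Step 4
Value (X)    Mean (X̄)    Deviation (X − X̄)   (X − X̄)²
2            4            -2                   4
5            4            1                    1
4            4            0                    0
6            4            2                    4
3            4            -1                   1
Step 5: ∑ (X − X̄)² = 10
Step 6: $\frac{\sum (X - \bar{X})^2}{n} = \frac{10}{5} = 2$
Variance = 2 persons²
Here the sum is divided by n = 5, treating the five families as the whole population; dividing by n − 1 = 4 instead gives the sample variance, s² = 10/4 = 2.5 persons².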
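The household example can be sketched in Python as below. The function name `variance` and its `sample` flag are illustrative. With `sample=False` the sum of squared deviations is divided by n, as in the worked example, and with `sample=True` it is divided by n − 1.

```python
# Household sizes of the 5 families in the example.
household_sizes = [2, 5, 4, 6, 3]

def variance(values, sample=False):
    """Sum of squared deviations from the mean, divided by n or by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return sum_of_squares / divisor

print(variance(household_sizes))               # divide by n
print(variance(household_sizes, sample=True))  # divide by n - 1
```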
32
Standard Deviation
• The standard deviation is a measure of how spread out numbers are.
• Its symbol is σ (the Greek letter sigma) for a population and s for a sample.
• The formula is easy: it is the square root of the variance, i.e. s = √s².
• SD is the most useful measure of dispersion.
Population (if n > 30): $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
Sample (if n < 30): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
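A minimal Python sketch of both formulas follows. Reusing the household-size data (2, 5, 4, 6, 3) from the variance example is an assumption made for illustration, and the function name `standard_deviation` is hypothetical.

```python
import math

def standard_deviation(values, sample=False):
    """Square root of the variance (divisor n, or n - 1 for a sample)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_of_squares / divisor)

# Illustrative data: household sizes from the variance example.
household_sizes = [2, 5, 4, 6, 3]
population_sd = standard_deviation(household_sizes)
sample_sd = standard_deviation(household_sizes, sample=True)
print(round(population_sd, 3), round(sample_sd, 3))
```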
Standard Deviation and Standard Error
• SD is a measure of the variability of the individual observations; the sample SD is an estimate of the corresponding population parameter.
• SE is a measure of the precision of an estimate of a population parameter, such as the sample mean.
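The slides do not give a formula for SE. As a hedged illustration, the sketch below uses the conventional standard error of the mean, the sample SD divided by √n. The data (household sizes 2, 5, 4, 6, 3) and the function names are assumptions made for the example.

```python
import math

def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def standard_error_of_mean(values):
    """Conventional SE of the mean: sample SD divided by the square root of n."""
    return sample_sd(values) / math.sqrt(len(values))

# Illustrative data, not taken from this slide.
household_sizes = [2, 5, 4, 6, 3]
sd = sample_sd(household_sizes)
se = standard_error_of_mean(household_sizes)
print(round(sd, 3), round(se, 3))
```

The SD describes the spread of the individual observations, while the SE shrinks as n grows because it describes how precisely the mean is estimated.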