biostatics ppt
![Page 1: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/1.jpg)
PRESENTED BY: Dr. Sushi Kadanakuppe, II year PG student, Dept of Preventive & Community Dentistry, Oxford Dental College & Hospital
![Page 2: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/2.jpg)
BIOSTATISTICS
INTRODUCTION
BASIC CONCEPTS
• Data
• Distributions
DESCRIPTIVE STATISTICS
Displaying data
• Frequency distribution tables
• Graphs or pictorial presentation of data
• Tables
Numerical summary of data
• Measure of central tendency
• Measure of dispersion
![Page 3: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/3.jpg)
ANALYTICAL OR INFERENTIAL STATISTICS
The nature and purpose of statistical inference
The process of testing hypothesis
a. False-positive & false-negative errors.
b. The null hypothesis & alternative hypothesis
c. The alpha level & p value
d. Variation in individual observations and in multiple samples.
![Page 4: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/4.jpg)
Tests of statistical significance
Choosing an appropriate statistical test
Making inferences from continuous (parametric) data.
Making inferences from ordinal data.
Making inferences from dichotomous and nominal (nonparametric) data.
CONCLUSION
![Page 5: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/5.jpg)
INTRODUCTION
![Page 6: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/6.jpg)
The worker with human material will find the
statistical method of great value and will have even
more need for it than will the laboratory worker.
![Page 7: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/7.jpg)
Claude Bernard (1927)1, a French physiologist of
the nineteenth century and a pioneer in laboratory
research, writes: “We compile statistics only when
we cannot possibly help it. Statistics yield
probability, never certainty- and can bring forth
only conjectural sciences.”
![Page 8: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/8.jpg)
The worker with human material, however, can seldom
control the environment, nor can he bring about drastic
changes in his subjects quickly, particularly if he is
studying chronic disease.
![Page 9: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/9.jpg)
The variability of human material, plus the fact
that time allows the introduction of many
additional factors which may contribute to a
disease process, leaves the worker with
quantitative data affected by a multiplicity of
factors.
![Page 10: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/10.jpg)
Statistical methods become necessary, probability
becomes of great interest, and conjecture based
upon statistical probability may show a way to
break the chain of causation of a disease even
before all factors entering into the production of
the disease are clearly understood.
![Page 11: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/11.jpg)
Yule (1950)2 has defined statistics as “methods
specially adapted to the elucidation of quantitative
data affected by a multiplicity of causes”.
![Page 12: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/12.jpg)
Fully half the work in biostatistics involves common
sense in the selection and interpretation of data. The magic
of numbers is no substitute.
Bernard points with derision at a German author who
measured the salivary output of one submaxillary and one
parotid gland in a dog for one hour1.
![Page 13: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/13.jpg)
This author then proceeded to deduce the output of all
salivary glands, right and left, and finally the output of
saliva of a man per kilogram per day. The result, of
course, was a very top-heavy structure built upon a set of
observations entirely too small for the purpose.
Work of this sort explains the jibes which so often ricochet
upon better statisticians. Such mistakes can be avoided.
![Page 14: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/14.jpg)
Statisticians also suffer because they are so often content merely
to collect and analyze data as an end in itself without the purpose
or hope of producing new knowledge or a new concept.
Conant (1947), in his book On Understanding Science, makes it
very clear that new concepts must alternate with the collection of
data if an advance in our knowledge is to occur3.
![Page 15: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/15.jpg)
DEFINITION
Statistics is a scientific field that deals with the collection, classification, description, analysis, interpretation, and presentation of data4.
• Descriptive statistics
• Analytical statistics
• Vital statistics
![Page 16: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/16.jpg)
a. Descriptive statistics concerns the summary measures of data for a sample of a population.
b. Analytical statistics concerns the use of data from a sample of a population to make inferences about the population.
c. Vital statistics is the ongoing collection by government agencies of data relating to events such as births, deaths, marriages, divorces, and health- and disease-related conditions deemed reportable by local health authorities.
![Page 17: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/17.jpg)
USES
Biostatistics is a powerful ally in the quest for the
truth that infuses a set of data and waits to be told.
a. Statistics is a scientific method that uses theory
and probability to aid in the evaluation and
interpretation of measurements and data obtained
by other methods.
![Page 18: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/18.jpg)
b. Statistics provides a powerful reinforcement for other
determinants of scientific causality.
c. Statistical reasoning, albeit unintentional or subconscious,
is involved in all scientific clinical judgments, especially
with preventive medicine/dentistry and clinical
medicine/dentistry becoming increasingly quantitative.
![Page 19: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/19.jpg)
BASIC CONCEPTS
![Page 20: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/20.jpg)
DATA
Definition: Data are the basic building blocks of statistics and refer to the individual values presented, measured, or observed.
![Page 21: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/21.jpg)
a. Population vs sample. Data can be derived from a total population or a sample.
1. A population is the universe of units or values being studied. It can consist of individuals, objects, events, observations, or any other grouping.
2. A sample is a selected part of a population.
![Page 22: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/22.jpg)
The following are some of the common types of samples:
a) Simple random sample
b) Systematic selected sample
c) Stratified selected sample
d) Cluster selected sample
e) Nonrandomly selected, or convenience, sample
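The first four sample types listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the patient IDs, the age strata, the clinic clusters, and all function names are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every unit has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=0):
    """Random start within the first `step` units, then every `step`-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, seed=0):
    """Simple random sample of size k_per_stratum drawn inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, k_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly choose whole clusters and keep every unit in the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical population: patient IDs 1..100.
patients = list(range(1, 101))

srs = simple_random_sample(patients, 10)
systematic = systematic_sample(patients, 10)

# Hypothetical strata and clusters, for illustration only.
strata = {"under_40": patients[:60], "40_and_over": patients[60:]}
stratified = stratified_sample(strata, 5)

clinics = {"clinic_A": patients[:25], "clinic_B": patients[25:50],
           "clinic_C": patients[50:75], "clinic_D": patients[75:]}
clustered = cluster_sample(clinics, 2)
```

A convenience sample has no selection rule to code: it is whatever units happen to be easy to reach, which is why it cannot be treated as random.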
![Page 23: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/23.jpg)
b. Ungrouped vs grouped
1. Ungrouped data are presented or observed individually.
An example of ungrouped data is the following list of weights (in pounds) for six men: 140, 150, 150, 150, 160, and 160.
2. Grouped data are presented in groups consisting of identical data by frequency.
An example of grouped data is the following list of weights for the six men noted above: 140 lb (one man), 150 lb (three men), and 160 lb (two men).
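The move from ungrouped to grouped data in the weight example above can be reproduced with Python's standard `collections.Counter`. The variable names are illustrative; the six weights are the ones given on the slide.

```python
from collections import Counter

# Ungrouped data: one entry per man (weights in pounds, from the example above).
weights_lb = [140, 150, 150, 150, 160, 160]

# Grouped data: each distinct value paired with its frequency.
grouped = Counter(weights_lb)

for weight, count in sorted(grouped.items()):
    print(f"{weight} lb: {count} man" if count == 1 else f"{weight} lb: {count} men")

# Grouping by identical values loses nothing: the ungrouped list can be rebuilt.
rebuilt = sorted(grouped.elements())
```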
![Page 24: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/24.jpg)
c. Quantitative vs qualitative
1. Quantitative data are numerical, or based on numbers.
An example of quantitative data is height measured in inches.
2. Qualitative data are nonnumerical, or based on a categorical scale.
An example of qualitative data is height measured in terms of short, medium, and tall.
![Page 25: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/25.jpg)
d. Discrete vs continuous
1. Discrete data or categorical data are data for which distinct categories and a limited number of possible values exist.
An example of discrete data is the number of children in a family, that is, two or three children, but not 2.5 children.
All qualitative data are discrete.
![Page 26: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/26.jpg)
Categorical data are further classified into two types:
• nominal scale
• ordinal scale.
![Page 27: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/27.jpg)
Nominal scale:
A variable measured on a nominal scale is characterized by named categories having no particular order.
For example,
patient gender (male/female), reason for dental visit (checkup, routine treatment, emergency), and use of fluoridated water (yes/no) are all categorical variables measured on a
nominal scale.
![Page 28: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/28.jpg)
Within each of these scales, an individual subject
may belong to only one level, and one level does
not mean something greater than any other level.
![Page 29: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/29.jpg)
Ordinal scale
Ordinal scale data are variables whose categories possess a meaningful order.
For example, Severity of periodontal disease (0=none, 1=mild, 2=moderate,
3=severe) and
Length of time spent in a dental office waiting room (1= less than 15 min, 2= 15 to less than 30 minutes, 3= 30 minutes or more) are variables measured on ordinal scales.
![Page 30: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/30.jpg)
2. Continuous data or measurement data are data for which there are an unlimited number of possible values.
An example of continuous data is an individual’s weight, which may actually be 159.232872…lb
but is reported as 159 lb.
![Page 31: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/31.jpg)
• Measurement data can be characterized by
interval scale
ratio scale
• If the continuous scale has a true 0 point, the variables derived
from it can be called ratio variables. The Kelvin temperature scale
is a ratio scale, because 0 degrees on this scale is absolute 0.
![Page 32: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/32.jpg)
• The centigrade temperature scale is a continuous
scale but not a ratio scale, because 0 degrees on
this scale does not mean the absence of heat. So
this becomes an example of an interval scale, as
zero is only a reference point.
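A small numeric sketch may make the interval versus ratio distinction concrete. The two temperatures (10 and 20 degrees centigrade) are hypothetical values chosen for illustration; the conversion constant 273.15 is the standard offset between the centigrade and Kelvin scales.

```python
def celsius_to_kelvin(celsius):
    """Convert a centigrade (Celsius) temperature to Kelvin."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # hypothetical readings

# Differences are meaningful on both scales (interval property).
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the scale with a true zero (ratio property).
naive_ratio_c = warm_c / cool_c  # 2.0, but 20 degrees is not "twice as hot"
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_c:.2f} C, {difference_k:.2f} K")
print(f"ratio on centigrade scale: {naive_ratio_c:.3f}")
print(f"ratio on Kelvin scale:     {ratio_k:.3f}")
```

The centigrade ratio changes if the zero point is moved, while the Kelvin ratio does not, which is the practical meaning of "zero is only a reference point".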
![Page 33: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/33.jpg)
e. The quality of measured data is defined in terms of the data’s
accuracy, validity, precision, and reliability.
1. Accuracy refers to the extent that the measurement measures the
true value of what is under study.
2. Validity refers to the extent that the measurement measures what
it is supposed to measure.
![Page 34: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/34.jpg)
3. Precision refers to the extent that the
measurement is detailed.
4. Reliability refers to the extent that the
measurement is stable and dependable.
![Page 35: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/35.jpg)
Dental health professionals have a variety of uses for data5:
• For designing a health care program or facility
• For evaluating the effectiveness of an oral hygiene education program
• For determining the treatment needs of a specific population
• For proper interpretation of the scientific literature.
![Page 36: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/36.jpg)
DISTRIBUTIONS
Definition. A distribution is a complete summary of frequencies or proportions of a characteristic for a series of data from a sample or population.
Types of distributions
• Binomial distribution
• Uniform distribution
• Skewed distribution
• Normal distribution
• Log-normal distribution
• Poisson distribution
![Page 37: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/37.jpg)
a. Binomial distribution is a distribution of possible outcomes from a series of data characterized by two mutually exclusive categories.
b. Uniform distribution, also called rectangular distribution, is a distribution in which all events occur with equal frequency.
![Page 38: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/38.jpg)
c. Skewed distribution is a distribution that is asymmetric.
1. A skewed distribution with a tail among the lower values being characterized is skewed to the left, or negatively skewed.
2. A skewed distribution with a tail among the higher values being characterized is skewed to the right, or positively skewed.
![Page 39: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/39.jpg)
d. Normal distribution, also called Gaussian distribution, is a continuous, symmetric, bell-shaped distribution that is fully described by two measures, its mean and its standard deviation.
e. Log-normal distribution is a skewed distribution when graphed using an arithmetic scale but a normal distribution when graphed using a logarithmic scale.
f. Poisson distribution is used to describe the occurrence of rare events in a large population.
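Two of the distributions above have simple closed-form probabilities, sketched here with the standard library only. The formulas are the textbook binomial and Poisson probability functions; the function names and the numbers in the comparison (10,000 people, event probability 0.0002) are assumptions made up for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent two-outcome trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Two mutually exclusive categories, e.g. two fair coin tosses.
p_one_head = binomial_pmf(1, 2, 0.5)

# Rare event in a large population: the Poisson probability with lam = n * p
# closely tracks the binomial probability (hypothetical n and p).
n, p = 10_000, 0.0002
exact = binomial_pmf(2, n, p)
approx = poisson_pmf(2, n * p)
print(f"binomial: {exact:.5f}  poisson: {approx:.5f}")
```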
![Page 40: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/40.jpg)
Normal distribution Skewed distribution
![Page 41: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/41.jpg)
Binomial distribution
![Page 42: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/42.jpg)
![Page 43: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/43.jpg)
![Page 44: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/44.jpg)
DESCRIPTIVE STATISTICS
![Page 45: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/45.jpg)
Descriptive statistical techniques enable the
researchers to numerically describe and
summarize a set of data.
![Page 46: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/46.jpg)
Data can be displayed in the following ways:
• Frequency distribution tables
• Graphs or pictorial presentation of data
• Tables
Numerical summary of data
• Measure of central tendency
• Measure of dispersion
![Page 47: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/47.jpg)
I DISPLAYING DATA
Data can be displayed in the following ways:
• Frequency distribution tables
• Graphs or pictorial presentation of data
• Tables
![Page 48: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/48.jpg)
Frequency Distribution Tables
To better explain the data that have been collected, the data
values are often organized and presented in a table termed a
frequency distribution table.
This type of data display shows each value that occurs in the
data set and how often each value occurs.
![Page 49: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/49.jpg)
In addition to providing the sense of the shape of a
variable’s distribution, these displays provide the
researcher with an opportunity to screen the data
values for incorrect or impossible values, a first
step in the process known as “cleaning the data”5
![Page 50: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/50.jpg)
• The data values are first arranged in order from
lowest to highest value (an array).
• The frequency with which each value occurs is
then tabulated.
![Page 51: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/51.jpg)
• The frequency of occurrence for each data point is expressed in
four ways:
1. The actual count or frequency
2. The relative frequency (percent of the total number of values).
3. Cumulative frequency (total number of observations equal to or less than the value)
4. Cumulative relative frequency (the percent of observations equal to or less than the value) commonly referred to as percentile.
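The four ways of expressing frequency listed above can be computed together. The sketch below reuses the six weights from the earlier ungrouped-data example rather than the exam scores; the function name and dictionary keys are illustrative choices.

```python
from collections import Counter

def frequency_table(values):
    """Build rows with count, relative %, cumulative count, and cumulative %."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):          # array: lowest to highest value
        cumulative += counts[value]
        rows.append({
            "value": value,
            "frequency": counts[value],
            "relative_pct": 100 * counts[value] / n,
            "cumulative": cumulative,
            "cumulative_pct": 100 * cumulative / n,
        })
    return rows

weights_lb = [140, 150, 150, 150, 160, 160]
table = frequency_table(weights_lb)

for row in table:
    print(f"{row['value']:>5} {row['frequency']:>3} {row['relative_pct']:>6.1f} "
          f"{row['cumulative']:>3} {row['cumulative_pct']:>6.1f}")
```

Two quick checks follow from the definitions: the last cumulative frequency must equal the number of observations, and the last cumulative percent must be 100.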
![Page 52: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/52.jpg)
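| Exam score | Frequency | % | Cumulative frequency | Cumulative % |
|---|---|---|---|---|
| 56 | 1 | 3.0 | 1 | 3.0 |
| 57 | 1 | 3.0 | 2 | 6.1 |
| 63 | 1 | 3.0 | 3 | 9.1 |
| 65 | 2 | 6.1 | 3 | 15.2 |
| 66 | 1 | 3.0 | 3 | 18.2 |
| 68 | 2 | 6.1 | 5 | 24.2 |
| 69 | 2 | 6.1 | 6 | 30.3 |
| 70 | 2 | 3.0 | 8 | 36.4 |
| 71 | 1 | 3.0 | 10 | 42.4 |
| 72 | 1 | 6.1 | 11 | 45.5 |
| 74 | 2 | 3.0 | 12 | 48.5 |
| 75 | 1 | 3.0 | 14 | 54.5 |
| 76 | 3 | 6.1 | 15 | 63.6 |
| 77 | 2 | 9.1 | 16 | 69.7 |
| 78 | 1 | 6.1 | 18 | 72.7 |
| 79 | 1 | 3.0 | 21 | 75.8 |
| 80 | 2 | 3.0 | 23 | 84.8 |
| 81 | 3 | 3.0 | 24 | 87.9 |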
Frequency Distribution Table for exam scores
![Page 53: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/53.jpg)
• Instead of displaying each individual value in a data
set, the frequency distribution for a variable can
group values of the variable into consecutive
intervals.
• Then the number of observations belonging to an
interval is counted.
![Page 54: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/54.jpg)
| Exam scores | Number of students | % |
|---|---|---|
| 56-61 | 2 | 6 |
| 62-65 | 3 | 9 |
| 66-69 | 5 | 15 |
| 70-73 | 4 | 12 |
| 74-77 | 7 | 21 |
| 78-81 | 7 | 21 |
| 82-85 | 3 | 9 |
| 86-89 | 2 | 6 |
Grouped frequency distribution of exam scores
![Page 55: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/55.jpg)
Although the data are condensed in a useful fashion, some information is lost.
The frequency of occurrence of an individual data point cannot be obtained from a grouped frequency distribution.
For example, in the above presentation of data, seven students scored between 74 and 77, but the number of students who scored 75 is not shown here.
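Grouping values into consecutive intervals, and the information loss described above, can be sketched as follows. The interval boundaries are taken from the grouped table; the list of scores is hypothetical and is not the slide's data set.

```python
def grouped_frequency(values, intervals):
    """Count how many values fall in each inclusive (low, high) interval."""
    counts = {interval: 0 for interval in intervals}
    for value in values:
        for low, high in intervals:
            if low <= value <= high:
                counts[(low, high)] += 1
                break
    return counts

# Interval boundaries as used in the grouped exam-score table.
intervals = [(56, 61), (62, 65), (66, 69), (70, 73),
             (74, 77), (78, 81), (82, 85), (86, 89)]

# Hypothetical scores, for illustration only.
scores = [56, 63, 65, 68, 70, 74, 75, 75, 76, 77, 80, 88]

grouped = grouped_frequency(scores, intervals)
for (low, high), count in grouped.items():
    print(f"{low}-{high}: {count}")

# The grouped counts say five students scored 74 to 77,
# but not that exactly two of them scored 75.
```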
![Page 56: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/56.jpg)
Graphic or pictorial presentation of data
Graphic or pictorial presentations of data are useful in simplifying
the presentation and enhancing the comprehension of data.
All graphs, figures, and other pictures should have clearly stated
and informative titles, and all axes and keys should be clearly
labeled, including the appropriate units of measurement.
![Page 57: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/57.jpg)
Visual aids can take many forms; some basic methods of
presenting data are described below.
1. Pie chart
A pie chart is a pictorial representation of the proportional
divisions of a sample or population, with the divisions
represented as parts of a whole circle.
![Page 58: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/58.jpg)
Pie chart: Dental caries in xerostomia patients. Categories: cervical caries, occlusal caries, root caries. Slice labels: 39%, 42%, 19%.
![Page 59: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/59.jpg)
2. Venn diagram
A Venn diagram shows the degrees of overlap and exclusivity for
two or more characteristics or factors within a sample or population
(in which case each characteristic is represented by a whole circle)
or
for a characteristic or factor among two or more samples or
populations (in which case each sample or population is represented
by a whole circle).
![Page 60: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/60.jpg)
The sizes of the circles (or other symbols) need not
be equal and may represent the relative size for
each factor or population.
![Page 61: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/61.jpg)
![Page 62: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/62.jpg)
3. Bar diagram
A bar diagram is a tool for comparing categories of
mutually exclusive discrete data.
The different categories are indicated on one axis, the
frequency of data in each category is indicated on the other
axis, and the lengths of the bars compare the categories.
Because the data categories are discrete, the bars can be
arranged in any order with spaces between them.
![Page 63: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/63.jpg)
Bar diagram: Dental caries in xerostomia patients. Categories: cervical caries, occlusal caries, root caries. Frequency axis: 0 to 80.
![Page 64: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/64.jpg)
4. Histogram
A histogram is a special form of bar diagram that
represents categories of continuous and ordered data.
The data are adjacent to each other on the x-axis
(abscissa), and there is no intervening space. The
frequency of data in each category is depicted on the y-
axis (ordinate), and the width of the bar represents the
interval of each category.
![Page 65: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/65.jpg)
Histogram of age for xerostomia subjects. x-axis: age groups 5 to 10, 10 to 15, 15 to 20, 20 to 25, and 25 to 30 years. y-axis: number of subjects, 0 to 50.
![Page 66: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/66.jpg)
5. Epidemic curve
An epidemic curve is a histogram that depicts the time course
of an illness, disease, abnormality, or condition in a defined
population and in a specified location and time period.
The time intervals are indicated on the x-axis, and the number
of cases during each time interval is indicated on the y-axis.
![Page 67: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/67.jpg)
An epidemic curve can help an investigator
determine such outbreak characteristics as the
peak of disease occurrence (mode), a possible
incubation or latency period, and the type of
disease propagation.
![Page 68: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/68.jpg)
![Page 69: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/69.jpg)
6. Frequency polygon
A frequency polygon is a representation of the distribution
of categories of continuous and ordered data and, in this
respect, is similar to a histogram.
The x-axis depicts the categories of data, and the y-axis
depicts the frequency of data in each category.
![Page 70: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/70.jpg)
In a frequency polygon, however, the frequency is
plotted against the midpoint of each category, and a
line is drawn through each of these plotted points.
The frequency polygon can be more useful than the
histogram because several frequency distributions can
be plotted easily on one graph.
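The points of a frequency polygon are simply (midpoint, frequency) pairs. As a sketch, the intervals and counts below are taken from the grouped exam-score table earlier in the deck; plotting is left out so that the example needs no plotting library.

```python
# Intervals and counts from the grouped frequency distribution of exam scores.
intervals = [(56, 61), (62, 65), (66, 69), (70, 73),
             (74, 77), (78, 81), (82, 85), (86, 89)]
frequencies = [2, 3, 5, 4, 7, 7, 3, 2]

# Each category is represented by its midpoint on the x-axis.
polygon_points = [((low + high) / 2, count)
                  for (low, high), count in zip(intervals, frequencies)]

for midpoint, count in polygon_points:
    print(f"x = {midpoint:5.1f}  y = {count}")
```

Joining these points with straight lines gives the polygon; a second list of points for another group can be drawn over the same axes, which is the advantage noted above.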
![Page 71: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/71.jpg)
Frequency polygon showing cancer mortality by age group and sex
![Page 72: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/72.jpg)
7. Cumulative frequency graph
A cumulative frequency graph also is a representation of the
distribution of continuous and ordered data.
In this case, however, the frequency of data in each category
represents the sum of the data from that category and from the
preceding categories.
![Page 73: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/73.jpg)
The x-axis depicts the categories of data, and the y-axis is the
cumulative frequency of data, sometimes given as a
percentage ranging from 0% to 100%.
The cumulative frequency graph is useful in calculating
distribution by percentile, including the median, which is the
category of data that occurs at the cumulative frequency of
50%.
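Reading the median off a cumulative frequency graph amounts to finding the first category whose cumulative percentage reaches 50%. A minimal sketch, using the grouped exam-score counts from earlier in the deck (the function name is an illustrative choice):

```python
def median_category(categories, frequencies):
    """Return the first category at which cumulative frequency reaches 50%."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("no observations")
    cumulative = 0
    for category, frequency in zip(categories, frequencies):
        cumulative += frequency
        if 100 * cumulative / total >= 50:
            return category, cumulative
    raise ValueError("categories and frequencies do not match")

categories = ["56-61", "62-65", "66-69", "70-73",
              "74-77", "78-81", "82-85", "86-89"]
frequencies = [2, 3, 5, 4, 7, 7, 3, 2]

category, cumulative_count = median_category(categories, frequencies)
print(f"median falls in {category} (cumulative count {cumulative_count} of 33)")
```

With these counts the cumulative frequency is 14 of 33 (about 42%) after the 70-73 interval and 21 of 33 (about 64%) after 74-77, so the median score lies in the 74-77 interval.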
![Page 74: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/74.jpg)
Medical examiner reported (MER) in St. Louis for the years 1979, 1980, & 1981
![Page 75: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/75.jpg)
8. Box plot
A box plot is a representation of the quartiles [25%,
50% (median), and 75%] and the range of a
continuous and ordered data set.
The y-axis can be arithmetic or logarithmic.
Box plots can be used to compare the different
distributions of data values.
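The numbers a box plot displays can be computed with Python's `statistics` module. This is a sketch under one assumption worth stating: quartile conventions differ between textbooks and software, and `statistics.quantiles` is used here with its default "exclusive" method. The data are the six weights from the earlier example.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return min(values), q1, q2, q3, max(values)

weights_lb = [140, 150, 150, 150, 160, 160]
summary = five_number_summary(weights_lb)

labels = ["min", "Q1 (25%)", "median (50%)", "Q3 (75%)", "max"]
for label, value in zip(labels, summary):
    print(f"{label:>12}: {value}")
```

Computing the same summary for a second group and placing the two side by side is the numeric equivalent of comparing two box plots.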
![Page 76: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/76.jpg)
Distribution of weights of patients from hospital A and hospital B
![Page 77: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/77.jpg)
9. Spot map
A spot map, also called a geographic coordinate chart, is a map
of an area with the location of each case of an illness, disease,
abnormality, or condition identified by a spot or other symbol
on the map.
A spot map often is used in an outbreak setting and can help an
investigator determine the distribution of cases and characterize
an outbreak if the population at risk is distributed evenly over
the area.
![Page 78: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/78.jpg)
Distribution of Lyme disease cases in Canada from 1977 to 1989
![Page 79: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/79.jpg)
TABLES
In addition to graphs, data are often summarized in
tables. When material is presented in tabular form,
the table should be able to stand alone; that is,
correctly presented material in tabular form should
be understandable even if the written discussion of
the data is not read.
![Page 80: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/80.jpg)
A major concern in the presentation of both
figures and tables is readability.
Tables and figures must be clearly understood and
clearly labeled so that the reader is aided by the
information rather than confused.
![Page 81: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/81.jpg)
Suggestions for the display of data in graphic or tabular form5:
1. The contents of a table as a whole and the items in each separate
column should be clearly and fully defined. The unit of
measurement must be included.
2. If the table includes rates, the basis on which they are measured
must be clearly stated: death rate per cent, per thousand, or per
million, as the case may be.
![Page 82: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/82.jpg)
3. Rates or proportions should not be given alone
without any information as to the numbers of
observations on which they are based. By giving
only rates of observations and omitting the
actual number of observations, we are excluding
the basic data.
![Page 83: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/83.jpg)
4. Where percentages are used, it must be clearly indicated
that these are not absolute numbers. Rather than combine
too many figures in one table, it is often best to divide the
material into two or three small tables.
5. Full particulars of any exclusion of observations from a
collected series must be given. The reasons for and the
criteria of exclusions must be clearly defined, perhaps in a
footnote.
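Suggestions 2 and 3 above (state the basis of a rate, and never report a rate without the underlying numbers) can be illustrated with a small formatting helper. The function name and the figures (12 events among 4800 people) are hypothetical.

```python
def describe_rate(events, population, per=1000):
    """Format a rate together with its basis and the raw counts behind it."""
    if population <= 0:
        raise ValueError("population must be positive")
    rate = events * per / population
    return f"{rate:.1f} per {per} ({events} of {population} observed)"

# Hypothetical figures, for illustration only.
report = describe_rate(12, 4800)
print(report)  # 2.5 per 1000 (12 of 4800 observed)
```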
![Page 84: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/84.jpg)
II NUMERICAL SUMMARY OF DATA
Although graphs and frequency distribution tables can
enhance our understanding of the nature of a variable,
rarely do these techniques alone suffice to describe the
variable. A more formal numerical summary of the
variable is usually required for the full presentation of a
data set.
![Page 85: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/85.jpg)
To adequately describe a variable’s values, three summary measures are needed:
1. The sample size
2. A measure of central tendency
3. A measure of dispersion
![Page 86: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/86.jpg)
The sample size is simply the total number of
observations in the group and is symbolized by the letter N
or n.
A measure of central tendency or location describes the
middle (or typical) value in a data set.
A measure of dispersion or spread quantifies the degree
to which values in a group vary from one another.
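The three summary measures described above can be computed directly. As a sketch, the six weights from the earlier example are summarized here; the choice of the range and the sample standard deviation as dispersion measures is an assumption, since the slides up to this point do not yet name specific ones.

```python
import statistics

weights_lb = [140, 150, 150, 150, 160, 160]

n = len(weights_lb)                                # sample size
mean = statistics.mean(weights_lb)                 # central tendency
median = statistics.median(weights_lb)             # central tendency
value_range = max(weights_lb) - min(weights_lb)    # dispersion
sd = statistics.stdev(weights_lb)                  # dispersion (sample SD)

print(f"n = {n}")
print(f"mean = {mean:.1f} lb, median = {median} lb")
print(f"range = {value_range} lb, SD = {sd:.2f} lb")
```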
![Page 87: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/87.jpg)
Measures of Central Tendency
![Page 88: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/88.jpg)
Whenever one wishes to evaluate the outcome of a study, it is crucial
that the attributes of the sample that could have influenced it be
described.
Three statistics, the mode, median, and mean, provide a means of
describing the “typical” individual within a sample.
These statistics are frequently referred to as “measures of central
tendency”.
![Page 89: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/89.jpg)
Measures of central tendency are characteristics that describe the middle or most commonly occurring values in a series.
They tell us the point about which items have a tendency to cluster. Such a measure is considered as the most representative figure for the entire mass of data.
They are used as summary measures for the series. The series can consist of a sample of observations or a total population, and the values can be grouped or ungrouped. A measure of central tendency is also known as a statistical average.
![Page 90: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/90.jpg)
1. Mode
The mode of a data set is that value that occurs with the
greatest frequency.
A series may have no mode (i.e., no value occurs more than
once) or it may have several modes (i.e., several values
equally occur at a higher frequency than the other values in
the series).
![Page 91: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/91.jpg)
Whenever there are two nonadjacent scores with the same
frequency and they are the highest in the distribution, each
score may be referred to as the ‘mode’ and the distribution is
‘bimodal’.
In a truly bimodal distribution, the population contains two
sub-groups, each of which has a different distribution that
peaks at a different point.
![Page 92: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/92.jpg)
More than one mode can also be produced artificially
by what is known as digit preference, when
observers tend to favor certain numbers over others.
For example, persons who measure blood pressure
values tend to favor even numbers, particularly those
ending in 0 (e.g., 120 mm Hg).
![Page 93: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/93.jpg)
Calculation: The mode is calculated by
determining which value or values occur most often in a
series.
![Page 94: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/94.jpg)
Example: consider the following data. Patients
who had received routine periodontal scaling were
given a common pain-relieving drug and were
asked to record the minutes to 100% pain relief.
Note that “minutes to pain relief” is a continuous
variable that is measured on the ratio scale. The
patients recorded the following data:
![Page 95: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/95.jpg)
Minutes to 100% pain relief:
15 14 10 18 8 10 12 16 10 8 13
First, make an array, that is, arrange the values in ascending
order:
8 8 10 10 10 12 13 14 15 16 18
By inspection, we already know two descriptive measures
belonging to this data: N=11 and mode=10.
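The same inspection can be sketched in Python. This is a minimal illustration, not part of the original slides; the variable names are assumptions, and `statistics.multimode` (Python 3.8+) is used because it returns every most-frequent value, so a bimodal series would yield two entries.

```python
from statistics import multimode

# Minutes to 100% pain relief, as recorded on the slide.
minutes = [15, 14, 10, 18, 8, 10, 12, 16, 10, 8, 13]

array = sorted(minutes)    # arrange the values in ascending order
n = len(array)             # sample size N
modes = multimode(array)   # all values sharing the highest frequency

print(array)
print("N =", n, "mode(s) =", modes)
```

Note that when no value repeats, `multimode` returns every value, whereas the convention on the slides is to say that such a series has no mode.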
![Page 96: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/96.jpg)
Application and characteristics
1. The primary value of the mode lies in its ease of computation
and in its convenience as a quick indicator of a central value
in a distribution.
2. The mode is useful in practical epidemiological work, such as
determining the peak of disease occurrence in the
investigation of a disease.
![Page 97: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/97.jpg)
3. The mode is the most difficult measure of central
tendency to manipulate mathematically, that is, it is not
amenable to algebraic treatment; no analytic concepts are
based on the mode.
![Page 98: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/98.jpg)
4. It is also the least reliable because with successive
samplings from the same population the magnitude
of the mode fluctuates significantly more than the
median or mean.
It is possible, for example, that a change in just one
score can substantially change the value of the
modal score.
![Page 99: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/99.jpg)
2. Median (P50)
The median is the value that divides the distribution of data
points into two equal parts, that is, the value at which 50% of
the data points lie above it and 50% lie below it.
The median is the middle of the quartiles (the values that divide
the series into quarters) and the middle of the percentiles (the
values that divide the series into defined percentages).
![Page 100: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/100.jpg)
Calculation:
a) In a series with an odd number of values, the values in the series are arranged from lowest to highest, and the value that divides the series in half is the median.
b) In a series with an even number of values, the two values that divide the series in half are determined, and the arithmetic mean of these values is the median.
c) An alternative method for calculating the median is to determine the 50% value on a cumulative frequency curve.
![Page 101: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/101.jpg)
Example: In the above example of data series of minutes to 100% pain relief,
8 8 10 10 10 12 13 14 15 16 18
determine which value cuts the array into equal portions. In this array, there are five
data points below 12 and there are five data points above 12. Thus the median is 12.
8 8 10 10 10 12 13 14 15 16 18
Median = 12 (the sixth of the 11 ordered values)
![Page 102: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/102.jpg)
If the number of observations is even, unlike the preceding
example, simply take the midpoint of the two values that would
straddle the center of the data set.
Consider the following data set with N=10:
8 8 10 10 10 13 14 15 16 18
Median = (10 + 13) / 2 = 11.5
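Both calculation rules (odd and even N) can be checked with a short Python sketch. The variable names are illustrative assumptions; `statistics.median` sorts the data and averages the two middle values when N is even.

```python
from statistics import median

odd_series = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]   # N = 11
even_series = [8, 8, 10, 10, 10, 13, 14, 15, 16, 18]      # N = 10

median_odd = median(odd_series)     # the single middle value
median_even = median(even_series)   # mean of the two middle values, 10 and 13

print(median_odd, median_even)
```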
![Page 103: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/103.jpg)
Applications and characteristics:
1.The median is not sensitive to one or more extreme
values in a series; therefore, in a series with an extreme
value, the median is a more representative measure of
central tendency than the arithmetic mean.
![Page 104: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/104.jpg)
2. It is not frequently used in sampling statistics. In terms of sampling fluctuation, the median is superior to the mode but less stable than the mean. For this reason, and because the median does not possess convenient algebraic properties, it is not used as often as the mean.
3. Median is a positional average and is used only in the context of qualitative phenomena, for example, in estimating intelligence, etc., which are often encountered in sociological fields.
![Page 105: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/105.jpg)
4. Median is not useful where items need to be
assigned relative importance and weights.
5. The median is used in cumulative frequency
graphs and in survival analysis.
![Page 106: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/106.jpg)
3. Arithmetic Mean
The arithmetic mean, or simply, the mean, is the
sum of all values in a series divided by the actual
number of values in a series.
The symbol for the mean is a capital letter X with a
bar above it: $\bar{X}$, or “X-bar”.
![Page 107: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/107.jpg)
Calculation: The arithmetic mean is determined as
$\bar{X} = \sum X / N$
![Page 108: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/108.jpg)
Example:
Using the minutes to pain relief, N = 11 and $\sum X = 134$. Therefore
$\bar{X} = 134 / 11 = 12.2$ min
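As a quick check of this arithmetic, a minimal Python sketch (variable names are assumptions made for illustration):

```python
# Minutes to 100% pain relief (the ordered array from the slides).
minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]

total = sum(minutes)   # sum of all values
n = len(minutes)       # number of values, N
mean = total / n       # arithmetic mean

print(total, n, round(mean, 1))
```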
![Page 109: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/109.jpg)
Properties of the Mean
1. The mean of a sample is an unbiased estimator of the mean
of the population from which it came.
2. The mean is the mathematical expectation. As such, it is
different from the mode, which is the value observed most often.
![Page 110: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/110.jpg)
3. The sum of the squared deviations of the
observations from the mean is smaller than the sum
of the squared deviations from any other number.
4. The sum of the squared deviations from the mean is
fixed for a given set of observations. This property is
not unique to the mean, but it is a necessary property
of any good measure of central tendency.
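Property 3 can be illustrated numerically with the pain-relief data. The sketch below is only a spot check against two other candidate centres (the median and the mode), not a proof; the helper name `sum_sq_dev` is an assumption.

```python
def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - center) ** 2 for x in values)

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
mean = sum(minutes) / len(minutes)

ss_mean = sum_sq_dev(minutes, mean)   # about the mean (roughly 109.6)
ss_median = sum_sq_dev(minutes, 12)   # about the median
ss_mode = sum_sq_dev(minutes, 10)     # about the mode

print(round(ss_mean, 2), ss_median, ss_mode)
```

The sum about the mean is the smallest of the three, as the property states.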
![Page 111: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/111.jpg)
Applications and characteristics:
1. The arithmetic mean is useful when performing analytic
manipulation. With the exception of a situation where extreme
scores occur in the distribution, the mean is generally the best
measure of central tendency.
The value of the mean tends to fluctuate least from sample to sample.
![Page 112: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/112.jpg)
It is amenable to algebraic treatment and it possesses
known mathematical relationships with other statistics.
Hence, it is used in further statistical calculations.
Thus, in most situations the mean is more likely to be
used than either the mode or the median.
![Page 113: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/113.jpg)
2. The mean can be conceptualized as a fulcrum
such that the distribution of scores around it is in
perfect balance. Since the scores above and below
the mean are in perfect balance, it follows that the
algebraic sum of the deviations of these scores
from the mean is 0.
![Page 114: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/114.jpg)
3. Whereas the median counts each score, no matter
what its magnitude, as only one score, the mean
takes into account the absolute magnitude of the
score. The median, therefore, does not balance the
halves of the distribution except when the
distribution is exactly symmetrical; in which case
the mean and the median have identical values.
![Page 115: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/115.jpg)
4. Another way of contrasting the median and the
mean is to compare their values when the
distribution of scores is not symmetrical.
![Page 116: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/116.jpg)
Curve (a) is positively skewed; that is, the
curve tails off to the right. In this case the
mean is larger than the median because of
the influence of the few very high scores.
Thus these high scores are sufficient to
balance off the several lower scores. The
median does not balance the distribution
because the magnitude of the scores is not
included in the computation.
[Figure: curve (a), positively skewed, with the mean ($\bar{X}$) and the median (P50) marked]
![Page 117: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/117.jpg)
Curve (b) is negatively
skewed; that is, the
curve tails off to the left.
Now the mean is smaller
than the median because
of the effect of the few
very small scores.
[Figure: curve (b), negatively skewed, with the mean ($\bar{X}$) and the median (P50) marked]
![Page 118: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/118.jpg)
5. The mean suffers from some limitations: it is unduly affected by extreme items; it may not coincide with the actual value of any item in a series; and it may give a wrong impression, particularly when the item values are not given with the average.
![Page 119: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/119.jpg)
Let’s refer again to the group of values in which one
patient recorded a rather extreme, for this group, value:
8 8 10 10 10 12 13 14 15 16 58
The adjusted mean, somewhat larger than the original
mean of 12.2, is calculated as follows:
$\bar{X} = 174 / 11 = 15.8$ min
![Page 120: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/120.jpg)
The calculation of the mean is correct, but is its use appropriate for this data set?
By definition the mean should describe the middle of the data set.
However, for this data set the mean of 15.8 is larger than most (9 out of 11!) of
the values in the group.
Not exactly a picture of the middle!
In this case the median (12 minutes) is the better choice for the measure of
central tendency and should be used.
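The effect of the single extreme value can be reproduced with a short Python sketch. The names are illustrative assumptions; the two data sets are the ones shown on the slides.

```python
from statistics import mean, median

original = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
with_outlier = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58]

mean_original = mean(original)       # about 12.2
mean_outlier = mean(with_outlier)    # pulled upward to about 15.8
median_original = median(original)   # 12
median_outlier = median(with_outlier)  # still 12

# How many values lie below the mean once the extreme value is included?
below_mean = sum(1 for x in with_outlier if x < mean_outlier)

print(round(mean_original, 1), round(mean_outlier, 1))
print(median_original, median_outlier, below_mean)
```

The median is unchanged, while the mean now sits above 9 of the 11 values.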
![Page 121: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/121.jpg)
However, the mean is better than other averages,
especially in economic and social studies where
direct quantitative measurements are possible.
![Page 122: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/122.jpg)
4. Geometric mean
The geometric mean is the nth root of the product of the values in a series of n
values.
Geometric mean (or G.M.) = $\sqrt[n]{\prod X_i} = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$
Where G.M. = geometric mean, n = number of items, and $\prod$ = conventional product notation.
For instance, the geometric mean of the numbers 4, 6, and 9 is worked out as G.M. = $\sqrt[3]{4 \cdot 6 \cdot 9} = 6$.
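The worked example (4, 6, and 9) can be verified in Python. This sketch assumes Python 3.8 or later, where `math.prod` and `statistics.geometric_mean` are available; floating-point results may differ from 6 in the last decimal places, so the values are compared with a tolerance.

```python
import math
from statistics import geometric_mean

values = [4, 6, 9]

# Direct definition: nth root of the product of n values.
gm_root = math.prod(values) ** (1 / len(values))

# Library routine (computed internally via logarithms).
gm_stats = geometric_mean(values)

print(math.isclose(gm_root, 6.0), math.isclose(gm_stats, 6.0))
```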
![Page 123: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/123.jpg)
Applications and characteristics
1. The geometric mean is more useful and representative
than the arithmetic mean when describing a series of
reciprocal or fractional values. The most frequently used
application of this average is in the determination of
average percent of change i.e., it is often used in the
preparation of index numbers or when we deal in ratios.
![Page 124: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/124.jpg)
2. The geometric mean can be used only for positive values.
3. It is more difficult to calculate than the arithmetic mean.
![Page 125: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/125.jpg)
5. Harmonic mean
Harmonic mean is defined as the reciprocal of the average
of reciprocals of the values of items of a series.
Symbolically, we can express it as under:
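Harmonic mean (H.M.) = Rec. $\left( \frac{\sum \text{Rec. } X_i}{N} \right) = \frac{N}{\sum (1 / X_i)}$
where Rec. denotes the reciprocal and N is the number of items.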
![Page 126: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/126.jpg)
Applications and characteristics:
1. Harmonic mean is of limited application; it is useful mainly in
cases where time and rate are involved.
2. The harmonic mean gives largest weight to the smallest
item and smallest weight to the largest item.
3. As such it is used in cases like time and motion study
where time is variable and distance constant.
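A small sketch of the "distance constant, time variable" case. The speeds below are hypothetical numbers chosen for illustration and do not come from the slides: travelling two equal distances at 40 and 60 km/h gives an average speed equal to the harmonic mean, not the arithmetic mean.

```python
from statistics import harmonic_mean

speeds = [40, 60]   # hypothetical speeds (km/h) over two equal distances

# Definition: reciprocal of the average of the reciprocals.
hm_manual = len(speeds) / sum(1 / s for s in speeds)
hm_stats = harmonic_mean(speeds)

# Cross-check with a concrete trip: 120 km at each speed.
total_distance = 120 + 120
total_time = 120 / 40 + 120 / 60        # 3 h + 2 h
average_speed = total_distance / total_time

print(round(hm_manual, 6), round(hm_stats, 6), average_speed)
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average because more time is spent at the slower speed.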
![Page 127: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/127.jpg)
Measures Of Dispersion
![Page 128: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/128.jpg)
Measures of central tendency provide useful
information about the typical performance for a
group of data. To understand the data more
completely, it is necessary to know how the
members of the data set arrange themselves
about the central or typical value.
![Page 129: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/129.jpg)
The following questions must be answered:
How spread out are the data points?
How stable are the values in the group?
![Page 130: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/130.jpg)
The descriptive tools known as measures of dispersion
answer these questions by quantifying the variability of the
values within a group.
Hence, they are the characteristics that are used to describe
the spread, variation, and scatter of a series of values.
The series can consist of a sample of observations or a total population,
and the values can be grouped or ungrouped.
![Page 131: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/131.jpg)
This can be done by calculating measures based on
percentiles or measures based on the mean [6].
Measures of dispersion based on percentiles
1. Percentiles
which are sometimes called quantiles, are the values below which the
indicated percentage of observations lies when all of the
observations are ranked in ascending order.
![Page 132: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/132.jpg)
The median, discussed above, is the 50th percentile.
The 75th percentile is the point below which 75%
of the observations lie, while the 25th percentile is
the point below which 25% of the observations lie.
![Page 133: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/133.jpg)
2. Range
The range is the difference between the highest and lowest values
in a series.
Range = Maximum – Minimum.
More usual, however, is the interpretation of the range as simply
the statement of the minimum and maximum values:
Range = (Minimum, Maximum)
![Page 134: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/134.jpg)
For the sample of minutes to 100% pain relief,
8 8 10 10 10 12 13 14 15 16 18
Range = (8, 18) or Range = 18-8 = 10 min
![Page 135: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/135.jpg)
The overall range reflects the distance between the highest and the lowest value in the data set.
In this example it is 10 min. In the same example, the 75th and 25th percentiles are 15 and 10
respectively and the distance between them is 5 min.
This difference is called the interquartile range (sometimes abbreviated Q3-Q1).
Because of central clumping, the interquartile range is usually considerably smaller than half the size of the overall range of values.
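The range and interquartile range quoted above can be reproduced with Python's `statistics.quantiles` (Python 3.8+). This is a sketch under one assumption worth stating: quartile conventions differ between textbooks and software, and the default "exclusive" method used here happens to give the slide's values of 10 and 15; other methods may give slightly different quartiles.

```python
from statistics import quantiles

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]

overall_range = max(minutes) - min(minutes)   # Maximum - Minimum

# Quartiles: 25th, 50th (median) and 75th percentiles.
q1, q2, q3 = quantiles(minutes, n=4)          # default method="exclusive"
iqr = q3 - q1                                 # interquartile range, Q3 - Q1

print(overall_range, q1, q2, q3, iqr)
```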
![Page 136: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/136.jpg)
The advantage of using percentiles is that they can
be applied to any set of continuous data, even if
the data do not form any known distribution.
![Page 137: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/137.jpg)
Application and characteristics
1. The range is used to measure data spread.
2. The range presents the exact lower and upper boundaries of a set of data points and thus quickly lends perspective regarding the variable’s distribution.
![Page 138: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/138.jpg)
3. The range is usually reported along with the sample median (not the mean).
4. The range provides no information concerning the scatter within the series.
5. The range can be deemed unstable because it is affected by one extremely high score or one extremely low value. Also, only two values are considered, and these happen to be the extreme scores of the distribution. The measure of spread known as standard deviation addresses this disadvantage of the range.
![Page 139: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/139.jpg)
Measures of dispersion based on the mean
Mean deviation, variance, and standard deviation are three measures of dispersion based on the mean.
Although mean deviation is seldom used, a discussion of it provides a better understanding of the concept of dispersion.
![Page 140: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/140.jpg)
1. Mean deviation
Because the mean has several advantages, it might seem logical to measure dispersion by taking the “average deviation” from the mean. That proves to be useless, because the sum of the deviations from the mean is 0.
![Page 141: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/141.jpg)
However, this inconvenience can easily be solved by computing the mean deviation, which is the average of the absolute value of the deviations from the mean, as shown in the following formula:
Mean deviation = $\sum |X - \bar{X}| / N$
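Both points (the signed deviations cancel out, while the absolute deviations do not) can be seen in a minimal Python sketch using the pain-relief data; the variable names are assumptions.

```python
minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
n = len(minutes)
mean = sum(minutes) / n

deviations = [x - mean for x in minutes]

# Signed deviations sum to 0 (up to floating-point rounding).
signed_sum = sum(deviations)

# Mean deviation: average of the absolute deviations (about 2.74 min).
mean_deviation = sum(abs(d) for d in deviations) / n

print(round(abs(signed_sum), 9), round(mean_deviation, 2))
```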
![Page 142: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/142.jpg)
Because the mean deviation does not have mathematical properties that enable many statistical tests to be based on it, the formula has not come into popular use.
Instead, the variance has become the fundamental measure of dispersion in statistics that are based on the normal distribution.
![Page 143: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/143.jpg)
2. Variance
The variance is the sum of the squared deviations from the mean divided by the number of values in the series minus 1.
Variance is symbolized by $s^2$ or V.
$s^2 = \sum (X - \bar{X})^2 / (N - 1)$
$\sum (X - \bar{X})^2$ is called the sum of squares.
![Page 144: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/144.jpg)
In the above formula, the squaring solves the problem that the deviations from the mean add up to 0.
Dividing by N-1 (called degrees of freedom), instead of dividing by N, is necessary for the sample variance to be an unbiased estimator of the population variance.
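The difference between the two divisors can be seen by comparing `statistics.variance` (divides by N - 1) with `statistics.pvariance` (divides by N) on the pain-relief sample. This is an illustrative sketch; it shows the two formulas side by side and does not by itself demonstrate unbiasedness.

```python
from statistics import pvariance, variance

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
n = len(minutes)
mean = sum(minutes) / n

sum_of_squares = sum((x - mean) ** 2 for x in minutes)

sample_variance = variance(minutes)    # SS / (N - 1), about 10.96
divide_by_n = pvariance(minutes)       # SS / N, about 9.97

print(round(sample_variance, 2), round(divide_by_n, 2))
```

Dividing by N always gives the smaller figure, which is why it tends to underestimate the population variance when applied to a sample.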
![Page 145: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/145.jpg)
The numerator of the variance (i.e., the sum of the squared deviations of the observations from the mean) is an extremely important entity in statistics. It is usually called either the sum of squares (abbreviated SS) or the total sum of squares (TSS).
The TSS measures the total amount of variation in a set of observations.
![Page 146: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/146.jpg)
Properties of the variance
1. When the denominator of the equation for variance is expressed as the number of observations minus 1 (N-1), the variance of a random sample is an unbiased estimator of the variance of the population from which it was taken.
![Page 147: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/147.jpg)
2. The variance of the sum of two independently sampled variables is equal to the sum of the variances.
3. The variance of the difference between two independently sampled variables is equal to the sum of their individual variances as well.
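Properties 2 and 3 can be checked exactly on a small constructed case. The two variables below (a fair die and a coin scored 0 or 10) are hypothetical and chosen only because every combination of outcomes can be enumerated, which makes them independent by construction; population variances (dividing by N) are used so the arithmetic is exact.

```python
from fractions import Fraction
from itertools import product

def pop_variance(values):
    """Population variance (divide by N), computed exactly with fractions."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

x_values = [1, 2, 3, 4, 5, 6]   # hypothetical variable X: a fair die
y_values = [0, 10]              # hypothetical variable Y: a coin scored 0 or 10

# Every equally likely (x, y) pair, so X and Y are independent.
pairs = list(product(x_values, y_values))

var_x = pop_variance(x_values)
var_y = pop_variance(y_values)
var_sum = pop_variance([x + y for x, y in pairs])
var_diff = pop_variance([x - y for x, y in pairs])

print(var_x, var_y, var_sum, var_diff)
```

Both the sum and the difference have variance equal to `var_x + var_y`; subtracting one variable from another does not reduce the spread.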
![Page 148: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/148.jpg)
Application and characteristics
1. The principal use of the variance is in calculating the standard deviation.
2. The variance is mathematically unwieldy, and its value falls outside the range of observed values in a data set.
3. The variance is generally of greater importance to statisticians than to researchers, students, and clinicians trying to understand the fruits of data collection.
![Page 149: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/149.jpg)
We should note that the sample variance is a squared term, not so easy to fathom in relation to the sample mean.
Thus the square root of the variance, the standard deviation, is desirable.
![Page 150: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/150.jpg)
3. Standard deviation (s or SD)
The standard deviation is a measure of the variability among the individual values within a group.
Loosely defined, it is a description of the average distance of individual observations from the group mean.
![Page 151: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/151.jpg)
Conceptualizing the s, or any of the measures of variance, is more difficult than understanding the concept of central tendency.
From one point of view, however, the s is similar to the mean; that is, it is the square root of the variance, which is approximately the mean of the squared deviations.
![Page 152: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/152.jpg)
Taking the mean and the standard deviation together, a sample can be described in terms of its average score and in terms of its average variation.
If more samples were taken from the same population it would be possible to predict with some accuracy the average score of these samples and also the amount of
variation.
![Page 153: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/153.jpg)
The mathematical derivation of the standard deviation is presented here in some detail because the intermediate steps in its calculation (1) create a theme (called “sum of squares”) that is repeated over and over in statistical arithmetic and (2) create the quantity known as the sample variance.
![Page 154: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/154.jpg)
Calculation:

| Step | Mathematical term | Label |
| --- | --- | --- |
| 1. Calculate the mean $\bar{X}$ of the group. | $\bar{X} = \sum X / N$ | Sample mean |
| 2. Subtract the mean from each value X. | $(X - \bar{X})$ | Deviation from the mean |
| 3. Square each deviation from the mean. | $(X - \bar{X})^2$ | Squared deviation from the mean |
| 4. Add the squared deviations from the mean. | $\sum (X - \bar{X})^2$ | Sum of squares (SS) |
![Page 155: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/155.jpg)
| Step | Mathematical term | Label |
| --- | --- | --- |
| 5. Divide the sum of squares by (N - 1). | $SS / (N - 1)$ | Variance ($s^2$) |
| 6. Find the square root of the variance. | $\sqrt{s^2}$ | Standard deviation (SD or s) |

The above table presents the calculation of the standard deviation for our sample of minutes to 100% pain relief.
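The six steps can be followed line by line in Python for the pain-relief sample. This is a sketch with assumed variable names; the final value is cross-checked against `statistics.stdev`, which also uses the N - 1 divisor.

```python
import math
from statistics import stdev

minutes = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
n = len(minutes)

mean = sum(minutes) / n                      # step 1: sample mean
deviations = [x - mean for x in minutes]     # step 2: deviations from the mean
squared = [d ** 2 for d in deviations]       # step 3: squared deviations
ss = sum(squared)                            # step 4: sum of squares (SS)
sample_variance = ss / (n - 1)               # step 5: variance
sd = math.sqrt(sample_variance)              # step 6: standard deviation

print(round(ss, 2), round(sample_variance, 2), round(sd, 2))
```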
![Page 156: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/156.jpg)
We now have two sets of complete sample description for our example.

| | Sample Description 1 | Sample Description 2 |
| --- | --- | --- |
| Sample size | N = 11 | N = 11 |
| Measure of central tendency | Median = 12 min | $\bar{X}$ = 12.2 min |
| Measure of spread | Range = (8, 18) | SD = 3.31 |
![Page 157: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/157.jpg)
The standard deviation is reported along with the sample mean, usually in the following format:
mean ± SD.
This format serves as a pertinent reminder that the SD measures the variability of values surrounding the middle of the data set.
![Page 158: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/158.jpg)
It also leads us to the practical application of the concepts of mean and standard deviation shown in the following rules of thumb:
$\bar{X} \pm 1$ SD encompasses approximately 68% of the values in a group.
$\bar{X} \pm 2$ SD encompasses approximately 95% of the values in a group.
$\bar{X} \pm 3$ SD encompasses approximately 99.7% of the values in a group.
![Page 159: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/159.jpg)
These rules of thumb are useful when deciding whether to report the mean ± SD or the median and range as the appropriate descriptive statistics for a group of data points.
If roughly 95% of the values in a group are contained in the interval $\bar{X} \pm 2$ SD, researchers tend to use mean ± SD. Otherwise the median and the range are perhaps more appropriate.
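One way to apply this check is to count the share of values that fall inside the mean ± k SD interval. The helper below, `share_within`, is a hypothetical name introduced for illustration; with only 11 observations the percentages are coarse, so the result is indicative at best.

```python
from statistics import mean, stdev

def share_within(values, k):
    """Fraction of values lying within mean ± k standard deviations."""
    m, s = mean(values), stdev(values)
    inside = sum(1 for x in values if m - k * s <= x <= m + k * s)
    return inside / len(values)

original = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 18]
with_outlier = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58]

print(round(share_within(original, 1), 2))      # 7 of 11 values
print(round(share_within(original, 2), 2))      # all 11 values
print(round(share_within(with_outlier, 2), 2))  # 10 of 11 values
```

For the data set containing the extreme value of 58, the share within two standard deviations falls short of the rule of thumb, which is consistent with preferring the median and range there.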
![Page 160: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/160.jpg)
Applications and characteristics
1. The standard deviation is extremely important in sampling theory, in correlational analysis, in estimating reliability of measures, and in determining relative position of an individual within a distribution of scores and between distributions of scores.
![Page 161: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/161.jpg)
2. The standard deviation is the most widely used estimate of variation because of its known algebraic properties and its amenability to use with other statistics.
![Page 162: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/162.jpg)
3. It also provides a better estimate of variation in the population than the other indexes.
4. The numerical value of standard deviation is likely to fluctuate less from sample to sample than the other indexes.
![Page 163: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/163.jpg)
5. In certain circumstances, quantitative probability statements that characterize a series, a sample of observations, or a total population can be derived from the standard deviation of the series, sample, or population.
6. When the standard deviation of any sample is small, the sample mean is close to any individual value.
![Page 164: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/164.jpg)
7. When standard deviation of a random sample is small, the sample mean is likely to be close to the mean of all the data in the population.
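8. The standard deviation itself does not shrink systematically as the sample size increases; what decreases is the standard error of the mean (the standard deviation divided by $\sqrt{N}$).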
![Page 165: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/165.jpg)
4. Coefficient of variation
The coefficient of variation is the ratio of the standard deviation of a series to the arithmetic mean of the series.
The coefficient of variation is unitless and is expressed as a percentage.
![Page 166: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/166.jpg)
Application and characteristics
1. The coefficient of variation is used to compare the relative variation, or spread, of the distributions of different series, samples, or populations or of the distributions of different characteristics of a single series.
2. The coefficient of variation can be used only for characteristics that are based on a scale with a true zero value.
![Page 167: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/167.jpg)
Calculation: The coefficient of variation (CV) is calculated as
$CV\,(\%) = \frac{SD}{\bar{X}} \times 100$
![Page 168: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/168.jpg)
For example,
In a typical medical school, the mean weight of 100 fourth-year medical students is 140 lb, with a standard deviation of 28 lb.
$CV\,(\%) = \frac{28}{140} \times 100 = 20\%$
The coefficient of variation for weight is 28 lb divided by 140 lb, or 20%.
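The calculation can be sketched in a few lines of Python. The function name `coefficient_of_variation` is an assumption made for illustration; the inputs are the figures from the medical-student weight example.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage."""
    if mean == 0:
        # The ratio is undefined without a non-zero mean (true-zero scale assumed).
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


# Weight example from the text: SD = 28 lb, mean = 140 lb.
cv_weight = coefficient_of_variation(28, 140)
print(f"CV = {cv_weight:.0f}%")  # prints "CV = 20%"
```

Because the result is a ratio, the same function could be used to compare the spread of weight with that of another characteristic measured in different units.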
![Page 169: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/169.jpg)
THE NORMAL DISTRIBUTION
The majority of measurements of continuous data in medicine and biology tend to approximate the theoretical distribution known as the normal distribution, also called the Gaussian distribution (named after Carl Friedrich Gauss, the person who best described it) [6].
![Page 170: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/170.jpg)
• The normal distribution is one of the most frequently used distributions in biomedical and dental research.
• The normal distribution is a population frequency distribution.
• It is characterized by a bell-shaped curve that is unimodal and is symmetric around the mean of the distribution.
![Page 171: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/171.jpg)
• The normal curve depends on only two parameters: the population mean and the population standard deviation.
• In order to discuss the area under the normal curve in terms of easily seen percentages of the population distribution, the normal distribution has been standardized to the standard normal distribution, in which the population mean is 0 and the population standard deviation is 1.
![Page 172: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/172.jpg)
• The area under the normal curve can be segmented starting with the mean in the center (on the x axis) and moving by increments of 1 SD above and below the mean.
![Page 173: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/173.jpg)
The figure shows a standard normal distribution (mean = 0; SD = 1) and the percentage of the area under the curve in each 1-SD band. From −3 SD to +3 SD, the bands contain 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, and 2.14% of the area.
![Page 174: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/174.jpg)
• The total area beneath the normal curve is 1, or 100% of the observations in the population represented by the curve.
• As indicated in the figure, the portion of the area under the curve between the mean and 1 SD is 34.13% of the total area.
• The same area is found between the mean and one unit below the mean.
![Page 175: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/175.jpg)
Moving out to 2 SD above the mean cuts off an additional 13.59% of the area, and moving out to 3 SD above the mean cuts off another 2.14%.
![Page 176: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/176.jpg)
The theory of the standard normal distribution leads us, therefore, to the following property of a normally distributed variable:
Approximately 68.26% of the observations lie within 1 SD of the mean.
Approximately 95.45% of the observations lie within 2 SD of the mean.
Approximately 99.73% of the observations lie within 3 SD of the mean.
![Page 177: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/177.jpg)
Virtually all of the observations are contained within 3 SD of the mean. This is the justification used by those who label values outside of the interval $\bar{X} \pm 3$ SD as “outliers” or unlikely values.
Incidentally, the number of standard deviations that a value lies away from the mean is called the z score.
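The areas quoted above can be checked numerically. The sketch below uses the error function from Python's standard `math` module; the helper names `area_within` and `z_score` are assumptions made for illustration. The exact figure for 1 SD rounds to 68.27%; the 68.26% quoted on the slide comes from doubling the already rounded 34.13%.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k) * 100:.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%

# One-sided band between 2 SD and 3 SD from the mean.
band_2_to_3 = (area_within(3) - area_within(2)) / 2
print(f"between 2 and 3 SD (one side): {band_2_to_3 * 100:.2f}%")  # 2.14%

# A student weighing 168 lb, with mean 140 lb and SD 28 lb, has z = 1.0.
print(z_score(168, 140, 28))
```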
![Page 178: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/178.jpg)
Problems In Analyzing A Frequency Distribution
In a normal distribution, the following holds true: mean = median = mode.
In an observed data set, there may be skewness, kurtosis, and extreme values, in which case the measures of central tendency may not follow this pattern.
![Page 179: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/179.jpg)
Skewness and Kurtosis
1. Skewness.
A horizontal stretching of a frequency distribution to one side or the other, so that one tail of observations is longer and has more observations than the other tail, is called skewness.
![Page 180: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/180.jpg)
![Page 181: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/181.jpg)
When a histogram or frequency polygon has a longer tail on the left side of the diagram, the distribution is said to be skewed to the left.
If a distribution is skewed, the mean moves farther in the direction of the long tail than does the median, because the mean is more heavily influenced by extreme values.
![Page 182: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/182.jpg)
A quick way to get an approximate idea of whether or not a frequency distribution is skewed is to compare the mean and the median. If these two measures are close to each other, the distribution is probably not skewed.
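As a rough illustration of this check, the sketch below compares the mean and the median of a small data set. The values are hypothetical (invented lengths of hospital stay in days) and are chosen so that one large value creates a long right tail.

```python
import statistics

# Hypothetical lengths of stay in days; the value 30 forms a long right tail.
stays = [2, 3, 3, 4, 4, 5, 6, 30]

mean_stay = statistics.mean(stays)      # 7.125
median_stay = statistics.median(stays)  # 4.0

# The mean is pulled toward the long tail, so it exceeds the median here.
print(f"mean = {mean_stay}, median = {median_stay}")
if mean_stay > median_stay:
    print("mean > median: possibly skewed to the right")
```

This comparison is only a screening device: a mean close to the median does not by itself establish that the distribution is normal.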
![Page 183: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/183.jpg)
2. Kurtosis.
It is characterized by a vertical stretching of the frequency distribution.
It is the measure of the peakedness of a probability distribution.
As shown in the figure, a kurtotic distribution can look more peaked or more flattened than the bell-shaped normal distribution.
A normal distribution has zero excess kurtosis (its kurtosis equals 3, which is used as the reference value).
![Page 184: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/184.jpg)
![Page 185: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/185.jpg)
• Significant skewness or kurtosis can be detected by statistical tests that reveal that the observed data do not form a normal distribution. Many statistical tests require that the data they analyze be normally distributed, and the tests may not be valid if they are used to compare very abnormal distributions.
• Kurtosis is seldom discussed as a problem in the medical literature, although skewness is frequently observed and is treated as a problem.
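Skewness and kurtosis can also be summarized numerically. The sketch below uses the simple moment-based definitions (population moments, with no small-sample correction), which is one common convention among several; the function names and data are assumptions made for illustration. It describes the data and is not one of the formal tests of normality mentioned above.

```python
def central_moment(values, order):
    """Average of (value - mean) ** order over all values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3: 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_tailed = [2, 3, 3, 4, 4, 5, 6, 30]  # hypothetical values

print(skewness(symmetric))         # 0.0 for symmetric data
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
print(skewness(right_tailed) > 0)  # True: long tail on the right
```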
![Page 186: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/186.jpg)
3. Extreme values (Outliers)
One of the most perplexing problems for the analysis of data is how to treat a value that is abnormally far above or below the mean. Before analyzing the data set, the investigator would want to be sure that this item of data was legitimate and would check the original source of data. Although the value is an outlier, it may well be correct.
![Page 187: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/187.jpg)
![Page 188: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/188.jpg)
ANALYTICAL OR INFERENTIAL STATISTICS
The nature and purpose of statistical inference
The process of testing hypothesis
a. False-positive & false-negative errors.
b. The null hypothesis & alternative hypothesis
c. The alpha level & p value
d. Variation in individual observations and in multiple samples.
![Page 189: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/189.jpg)
Tests of statistical significance
Choosing an appropriate statistical test
Making inferences from continuous (parametric) data.
Making inferences from ordinal data.
Making inferences from dichotomous and nominal (nonparametric) data.
REFERENCES
![Page 190: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/190.jpg)
THE NATURE AND PURPOSE OF STATISTICAL INFERENCE
As stated earlier, it is often impossible to study each member of a population. Instead, we select a sample from the population and from that sample attempt to generalize to the population as a whole. The process of generalizing sample results to a population is termed statistical inference and is the end product of formal statistical hypothesis testing.
![Page 191: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/191.jpg)
Inference means the drawing of conclusions from data.
Statistical inference can be defined as the drawing of conclusions from quantitative or qualitative information using the methods of statistics to describe and arrange the data and to test suitable hypotheses.
![Page 192: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/192.jpg)
Differences Between Deductive Reasoning And Inductive Reasoning
Because data do not come with their own interpretation, the interpretation must be put into the data by inductive reasoning (from Latin, meaning “to lead into”). This approach to reasoning is less familiar to most people than is deductive reasoning (from Latin, meaning “to lead out from”), which is learned from mathematics, particularly from geometry.
![Page 193: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/193.jpg)
Deductive reasoning proceeds from the general (i.e., from assumptions, from propositions, and from formulas considered true) to the specific (i.e., to specific members belonging to the general category).
Consider, for example, the following two propositions: (1) All Americans believe in democracy. (2) This person is an American. If both propositions are true, then the following deduction must be true: This person believes in democracy.
![Page 194: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/194.jpg)
Deductive reasoning is of special use in science once hypotheses are formed. Using deductive reasoning, an investigator says, “If this hypothesis is true, then the following prediction or predictions also must be true.”
![Page 195: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/195.jpg)
If the data are inconsistent with the predictions from the hypothesis, they force a rejection or modification of the hypothesis. If the data are consistent with the hypothesis, they cannot prove that the hypothesis is true, although they do lend support to the hypothesis.
To reiterate, even if the data are consistent with the hypothesis, they do not prove the hypothesis.
![Page 196: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/196.jpg)
Physicians often proceed from formulas accepted as true and from observed data to determine the values that variables must have in a certain clinical situation. For example, if the amount of a medication that can be safely given per kilogram of body weight (a constant) is known, then it is simple to calculate how much of that medication can be given to a patient weighing 50 kg.
This is deductive reasoning, because it proceeds from the general (a constant and a formula) to the specific (the patient).
![Page 197: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/197.jpg)
Inductive reasoning, in contrast, seeks to find valid generalizations and general principles from data. Statistics, the quantitative aid to inductive reasoning, proceeds from the specific (that is, from data) to the general (that is, to formulas or conclusions about the data).
![Page 198: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/198.jpg)
For example, by sampling a population and determining both the age and the blood pressure of the persons in the sample (the specific data), an investigator using statistical methods can determine the general relationship between age and blood pressure (e.g., that, on the average, blood pressure increases with age).
![Page 199: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/199.jpg)
Differences Between Mathematics And Statistics
The differences between mathematics and statistics can be illustrated by showing that they form the basis for very different approaches to the same basic equation:
y = mx + b
![Page 200: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/200.jpg)
This equation is the formula for a straight line in analytic geometry. It is also the formula for simple regression analysis in statistics, although the letters used and their order customarily are different.
![Page 201: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/201.jpg)
In the mathematical formula above, the b is a constant, and it stands for the y-intercept (i.e., the value of y when the variable x equals 0). The value m is also a constant, and it stands for the slope (the amount of change in y for a unit increase in the value of x).
The important thing to notice is that in mathematics, one of the variables (either x or y) is unknown (i.e., to be calculated), while the formula and the constants are known.
![Page 202: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/202.jpg)
In statistics, however, just the reverse is true: the variables, x and y, are known for all observations, and the investigator usually wishes to determine whether or not there is a linear (straight line) relationship between x and y, by estimating the slope and the intercept. This can be done using the form of analysis called linear regression, which is discussed later.
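To make the contrast concrete, the sketch below starts from known x and y values and estimates the slope and the intercept by ordinary least squares. The function name `fit_line` and the age and blood-pressure figures are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Estimate slope m and intercept b of y = mx + b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical ages (years) and systolic blood pressures (mm Hg).
ages = [30, 40, 50, 60, 70]
pressures = [118, 124, 131, 135, 142]

slope, intercept = fit_line(ages, pressures)
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
# slope = 0.59, intercept = 100.5
```

In the mathematical use of the same equation, the slope and intercept would be given and y would be computed for a chosen x.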
![Page 203: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/203.jpg)
As a general rule, what is known in statistics is unknown in mathematics, and vice versa. In statistics, the investigator starts from the specific observations (data) to induce or estimate the general relationships between variables.
![Page 204: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/204.jpg)
Probability
The probability of a specified event is the fraction, or proportion, of trials in which that event occurs in an almost unlimited sequence of random trials under similar conditions.
The probability of an event is the likelihood the event will occur; it can never be greater than 1 (100%) or less than 0 (0%).
![Page 205: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/205.jpg)
Applications and characteristics
1. The probability values in a population are distributed in a definable manner that can be used to analyze the population.
2. Probability values that do not follow a distribution can be analyzed using nonparametric methods.
![Page 206: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/206.jpg)
Calculation
The probability of an event is determined as
P (A) = A / N
Where P (A) = the probability of event A occurring; A = the number of times that event A actually occurs; and N = the total number of events during which event A can occur.
![Page 207: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/207.jpg)
Example: A medical student performs venipunctures on 1000 patients and is successful on 800 in the first attempt. Assuming that all other factors are equal (i.e., random selection of patients), the probability that the next venipuncture will be successful on the first attempt is 80%.
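The venipuncture example amounts to a relative frequency. A minimal sketch follows, with the function name `empirical_probability` assumed for illustration.

```python
def empirical_probability(occurrences, trials):
    """Estimate P(A) as the number of occurrences of A divided by the trials."""
    if trials <= 0:
        raise ValueError("the number of trials must be positive")
    if not 0 <= occurrences <= trials:
        raise ValueError("occurrences must lie between 0 and trials")
    return occurrences / trials


# 800 first-attempt successes in 1000 venipunctures.
p_success = empirical_probability(800, 1000)
print(f"P(success on first attempt) = {p_success:.0%}")  # 80%
```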
![Page 208: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/208.jpg)
Rules
a. Additive rule
1. Definition. The additive rule applies when considering the probability of one of at least two mutually exclusive events occurring, which is calculated by adding together the probability value of each event.
![Page 209: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/209.jpg)
2. Calculation. The probability of either of two mutually exclusive events occurring is determined as
P (A or B) = P (A) + P (B)
Where P (A or B) = the probability of event A or event B occurring.
![Page 210: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/210.jpg)
3. Example. About 6.3% of all medical students are black, and 5.5% are Hispanic. The probability that a medical student will be either black or Hispanic is 6.3% plus 5.5%, or 11.8%.
![Page 211: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/211.jpg)
b. Multiplicative rule.
1. Definition. The multiplicative rule applies when considering the probability of at least two independent events occurring together, which is calculated by multiplying the probability values for the events.
![Page 212: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/212.jpg)
2. Calculation. The probability of two independent events occurring together is determined as
P (A and B) = P (A) × P (B)
Where P (A and B) = the probability of both event A and event B occurring.
![Page 213: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/213.jpg)
3. Example. About 6.3% of all medical students are black and 36.1% of all students are women. Assuming race and sex are independent selection factors, the percentage of students who are black women should be about 6.3% multiplied by 36.1%, or 2.3%.
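Both rules can be sketched with the figures from the two examples. The function names are assumptions made for illustration, and each function simply trusts the caller that its precondition (mutual exclusivity or independence) holds.

```python
def prob_either(p_a, p_b):
    """Additive rule: valid only for mutually exclusive events."""
    return p_a + p_b


def prob_both(p_a, p_b):
    """Multiplicative rule: valid only for independent events."""
    return p_a * p_b


# Additive rule: black (6.3%) or Hispanic (5.5%).
p_black_or_hispanic = prob_either(0.063, 0.055)
print(f"{p_black_or_hispanic:.1%}")  # 11.8%

# Multiplicative rule: black (6.3%) and a woman (36.1%), assumed independent.
p_black_and_woman = prob_both(0.063, 0.361)
print(f"{p_black_and_woman:.1%}")  # 2.3%
```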
![Page 214: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/214.jpg)
THE PROCESS OF TESTING HYPOTHESES
![Page 215: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/215.jpg)
Hypotheses are predictions about what the examination of appropriate data
will show. The following discussion introduces the basic concepts
underlying the usual tests of statistical significance.
These tests determine the probability that a finding (such as a difference
between means or proportions) represents a true deviation from what was
expected (i.e., from the model, which is often a null hypothesis that there
will be no difference between the means or proportions).
![Page 216: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/216.jpg)
False Positive And False Negative Errors
Science is based on the following set of principles:
1. Previous experience serves as the basis for developing hypotheses.
2. Hypotheses serve as the basis for developing predictions.
3. Predictions must be subjected to experimental or observational testing.
![Page 217: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/217.jpg)
In deciding whether data are consistent or inconsistent with the hypotheses, investigators are subject to two types of error.
![Page 218: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/218.jpg)
They could assert that the data support a hypothesis when in fact the hypothesis is false; this would be a false-positive error, which is also called an alpha error or a type I error.
Conversely, they could assert that the data do not support the hypothesis when in fact the hypothesis is true; this would be a false-negative error, which is also called a beta error or a type II error.
![Page 219: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/219.jpg)
Based on the knowledge that scientists become attached to their own hypotheses, and on the conviction that proof in science, as in the courts, must be “beyond a reasonable doubt”, investigators have historically been particularly careful to avoid the false-positive error.
![Page 220: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/220.jpg)
Probably this is best for theoretical science.
In medicine, however, where a false-negative error in a diagnostic test may mean missing a disease until it is too late to institute therapy and where a false-negative error in the study of a medical intervention may mean overlooking an effective treatment, investigators cannot feel comfortable about false-negative errors either.
![Page 221: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/221.jpg)
The Null Hypothesis And The Alternative Hypothesis
The process of significance testing involves three basic steps:
(1) Asserting the null hypothesis,
(2) Establishing the alpha level, and
(3) Rejecting or failing to reject the null hypothesis.
![Page 222: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/222.jpg)
The first step consists of asserting the null hypothesis, which is the hypothesis that there is no real (true) difference between means or proportions of the groups being compared or that there is no real association between two continuous variables. It may seem strange to begin the process by asserting that something is not true, but it is far easier to reject an assertion than to prove something is true.
![Page 223: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/223.jpg)
If the data are not consistent with the hypothesis,
the hypothesis can be rejected.
If the data are consistent with a hypothesis, this
still does not prove the hypothesis, because other
hypotheses may fit the data equally well.
![Page 224: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/224.jpg)
The second step is to determine the probability of
being in error if the null hypothesis is rejected.
This step requires that the investigator establish an
alpha level, as described below.
![Page 225: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/225.jpg)
If the p value is found to be greater than the alpha level, the investigator fails to reject the null hypothesis. If, however, the p value is found to be less than or equal to the alpha level, the next step is to reject the null hypothesis and to accept the alternative hypothesis, which is the hypothesis that there is in fact a real difference or association. Although it may seem awkward, this process is now standard in medical science and has yielded considerable scientific benefits.
![Page 226: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/226.jpg)
Statistical tests begin with the statement of the hypothesis itself, but stated in the form of a null hypothesis.
For example, consider again the group of patients who tested the new pain-relieving drug, drug A, and recorded their number of minutes to 100% pain relief. Suppose that a similar sample of patients tested another drug, drug B, in the same way, and investigators wished to know if one group of patients experienced total pain relief more quickly than the other group.
![Page 227: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/227.jpg)
In this case, the null hypothesis would be stated in this way: “there is no difference in time to 100% pain relief between the two pain-relieving drugs A and B”. The null hypothesis is one of no difference, no effect, no association, and serves as a reference point for the statistical test.
![Page 228: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/228.jpg)
In symbols, the null hypothesis is referred to as H0. In the comparison of the two drugs A and B, we can state the H0 in terms of there being no difference in the average number of minutes to pain relief between drugs A and B, or
$H_0: \bar{X}_A = \bar{X}_B$.
The alternative is that the means of the two drugs are not equal. This is an expression of the alternative hypothesis H1.
![Page 229: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/229.jpg)
Null hypothesis $H_0: \bar{X}_A = \bar{X}_B$
Alternative hypothesis $H_1: \bar{X}_A \neq \bar{X}_B$
![Page 230: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/230.jpg)
The Alpha Level And P Value
Before doing any calculations to test the null hypothesis, the investigator must establish a criterion called the alpha level, which is the maximum probability of making a false-positive error that the investigator is willing to accept.
![Page 231: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/231.jpg)
By custom, the level of alpha is usually set at 0.05. This says that the investigator is willing to run a 5% risk (but no more) of being in error when asserting that the treatment and control groups truly differ.
In choosing an alpha level, the investigator inserts value judgment into the process. However, when that is done before the data are collected, at least the post hoc bias of being tempted to adjust the alpha level to make the data show statistical significance is avoided.
![Page 232: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/232.jpg)
The p value obtained by a statistical test (such as the t-test) gives the probability that a difference at least as large as the one observed could have arisen by chance alone if the null hypothesis were true, given random variation and a single test of the null hypothesis.
Usually, if the observed p value is ≤ 0.05, members of the scientific community who read about an investigation will accept the difference as being real.
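One way to see what “by chance alone” means is an exact permutation test, sketched below. This is a different technique from the t-test named above and is swapped in here only because it needs no distributional formula. The pain-relief times are hypothetical, and the function name `permutation_p_value` is an assumption made for illustration.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means.

    Counts the fraction of all relabelings of the pooled data whose
    absolute mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    relabelings = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        relabelings += 1
    return extreme / relabelings


# Hypothetical minutes to 100% pain relief for drugs A and B.
drug_a = [10, 11, 12, 13]
drug_b = [20, 21, 22, 23]

p_value = permutation_p_value(drug_a, drug_b)
print(f"p = {p_value:.3f}")  # 2 of the 70 relabelings are as extreme: p = 0.029
```

With these invented data, only the observed split and its mirror image give a difference as large as the one observed, so the p value is 2/70.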
![Page 233: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/233.jpg)
Although setting alpha at 0.05 is somewhat
arbitrary, that level has become so customary that
it is wise to provide explanations for choosing
another alpha level or for choosing not to perform
tests of significance at all, which may be the best
approach in some descriptive studies.
![Page 234: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/234.jpg)
The p value is the final arithmetic answer that is calculated by a statistical test of a hypothesis.
Its magnitude indicates how compatible the data are with the H0,
that is, whether to reject the H0 or to retain it.
The p value is crucial for drawing the proper conclusions about a set of data.
![Page 235: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/235.jpg)
So what numerical value of p should be used as the dividing
line for acceptance or rejection of the H0? Here is the
decision rule for the observed value of p and the decision
regarding the H0.
If p ≤ 0.05, reject the H0
If p > 0.05, fail to reject the H0
![Page 236: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/236.jpg)
If the observed probability is less than or equal to 0.05 (5%), the null hypothesis is rejected, that is, the observed outcome is judged to be incompatible with the notion of “no difference” or “no effect”, and the alternative hypothesis is adopted.
In this case, the results are said to be “statistically significant”.
![Page 237: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/237.jpg)
If the observed probability is greater than 0.05 (5%), the decision is to fail to reject (retain) the null hypothesis, and the results are called “not statistically significant” or simply NS, the notation often used in tables.
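The decision rule can be written as a small function. The name `decide` and the wording of the returned labels are assumptions made for illustration; the default alpha of 0.05 follows the custom described in the text.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant, NS)"


print(decide(0.01))  # reject H0 (statistically significant)
print(decide(0.05))  # reject H0 (statistically significant)
print(decide(0.20))  # fail to reject H0 (not statistically significant, NS)
```

Setting `alpha` before the data are examined, as the text recommends, corresponds to fixing the default argument in advance rather than tuning it after seeing the p value.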
![Page 238: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/238.jpg)
Statistical Versus Clinical Significance
The distinction between statistical significance and clinical or practical significance is worth mentioning.
For example, in the statistical test of $H_0: \bar{X}_A = \bar{X}_B$ for two drug groups, let’s assume that the observed probability is p = 0.01, a value that is less than the dividing line of 0.05 or 5%.
![Page 239: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/239.jpg)
This would lead the investigator to reject the H0 and to conclude that the results are “significant at p = 0.01”, that is, one drug caused total pain relief significantly faster, on average, than the other drug at p = 0.01.
But if the actual difference in the group means is itself clinically meaningless or negligible, the statistical significance may be considered real yet not useful.
![Page 240: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/240.jpg)
According to Dr. Horowitz, statistical significance “is a mathematical expression of the degree of confidence that an observed difference between groups represents a real difference – that a zero response would not occur if the study were repeated, and that the study is not merely due to chance”.
![Page 241: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/241.jpg)
On the other hand, “clinical significance is a judgment made
by the researcher or reader that differences in response to
intervention observed between groups are important for
health”.
“It is a subjective evaluation of the test”, continues Dr.
Horowitz, based on clinical experience and familiarity with
the “disease or condition being measured”.
![Page 242: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/242.jpg)
Variation In Individual Observations And In Multiple Samples
Most tests of significance relate to a difference between means
or proportions.
They help investigators decide whether an observed difference
is real, which in statistical terms is defined as whether the
difference is greater than would be expected by chance alone.
![Page 243: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/243.jpg)
Inspecting the means to see if they were different is inadequate, because it is not known whether the observed difference was unusual or whether a difference that large might have been found frequently if the experiment were repeated.
![Page 244: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/244.jpg)
To generalize beyond the particular subjects in the single study, the investigators must know the extent to which the differences discovered in the study are reliable.
The estimate of reliability is given by the standard error, which is not the same as the standard deviation.
![Page 245: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/245.jpg)
Standard Deviation And Standard Error
A normal distribution could be completely described by its mean and standard deviation. This information is useful in describing individual observations (raw data),
but it is not useful in determining how close a sample mean from research data is to the mean for the underlying population (which is also called the true mean or the population mean). This determination must be made on the basis of the standard error.
![Page 246: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/246.jpg)
The standard error is related to the standard deviation, but it
differs from the standard deviation in important ways.
Basically, the standard error is the standard deviation of a
population of sample means, rather than of individual
observations.
Therefore the standard error refers to the variability of means rather than
the variability of individual observations, so that it provides an idea of how variable a single
estimate of the mean from one set of research data is likely to be.
![Page 247: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/247.jpg)
If many samples (say, 100) of the same size were drawn from
the same population, the frequency distribution of the 100
different means could be plotted, treating each mean as a
single observation.
These sample means would form an approximately normal
(gaussian) frequency distribution, the mean of which
would be very close to the true mean for the
underlying population.
![Page 248: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/248.jpg)
More important for this discussion, the standard
deviation of this distribution of sample means is
called the standard error of the mean. It describes
how far a single sample mean is likely to fall from
the true mean of the underlying population; it is not
an estimate of the standard deviation of the
individual observations.
![Page 249: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/249.jpg)
The standard error is a parameter that enables the
investigator to do two things that are central to the
function of statistics.
One is to estimate the probable amount of error around
a quantitative assertion.
The other is to perform tests of statistical significance.
![Page 250: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/250.jpg)
If only the standard deviation and sample size of one research sample are known, however, the standard deviation can be converted to a standard
error so that these functions can be pursued.
![Page 251: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/251.jpg)
An unbiased estimate of the standard error can be obtained from the standard deviation of a single research sample if the standard deviation was originally calculated using the degrees of freedom (N - 1) in the denominator.
The formula for converting a standard deviation (SD) to a standard error (SE) is as follows:
$$\text{Standard error} = \mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{N}}$$
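As a rough illustration of this conversion, the following Python sketch computes the SD with N - 1 in the denominator and then divides it by the square root of N. The five data values are invented for illustration and do not come from the presentation.

```python
import math
import statistics

# Hypothetical sample of five observations (illustrative values only).
sample = [4, 8, 6, 5, 7]

n = len(sample)
sd = statistics.stdev(sample)   # sample SD, uses N - 1 in the denominator
se = sd / math.sqrt(n)          # SE = SD / sqrt(N)

print(f"N = {n}, SD = {sd:.3f}, SE = {se:.3f}")
```

Repeating the calculation with a larger N and the same SD would give a smaller SE, which is the point made on the next slide.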
![Page 252: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/252.jpg)
The larger the sample size (N), the smaller the standard
error, and the better the estimate of the population mean.
At any given point on the x-axis, the height of the bell-
shaped curve of the sample means represents the relative
probability that a single sample mean would fall at that
point.
Most of the time, the sample mean would be near the
true mean. Less often, it would be farther away.
![Page 253: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/253.jpg)
In the medical literature, means or proportions are often reported
either as the mean plus or minus 1 SD or as the mean plus or minus
1 SE.
Reported data must be examined carefully to determine whether the
SD or the SE is shown. Either is acceptable in theory, because an
SD can be converted to an SE and vice versa if the sample size is
known.
However, many journals have a policy stating whether the SD or SE
must be reported. The sample size should also be shown.
![Page 254: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/254.jpg)
Confidence Intervals
Whereas the SD shows the variability of individual observations, the SE shows the variability of means.
Whereas the mean plus or minus 1.96 SD estimates the range in which 95% of individual observations would be expected to fall, the mean plus or minus 1.96 SE estimates the range in which 95% of the means of repeated samples of the same size would be expected to fall.
![Page 255: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/255.jpg)
Moreover, if the value for the mean plus or minus
1.96 SE is known, it can be used to calculate the
95% confidence interval, which is the range of
values in which the investigator can be 95%
confident that the true mean of the underlying
population falls.
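The contrast between the two ranges can be sketched in a few lines of Python. The mean, SD, and sample size below are assumed values chosen only to make the arithmetic easy to follow; they are not data from the presentation.

```python
import math

# Hypothetical summary statistics (assumed for illustration).
mean = 120.0   # sample mean
sd = 15.0      # sample standard deviation
n = 100        # sample size

se = sd / math.sqrt(n)

# Range expected to contain about 95% of individual observations.
obs_range = (mean - 1.96 * sd, mean + 1.96 * sd)
# 95% confidence interval for the true (population) mean.
ci_95 = (mean - 1.96 * se, mean + 1.96 * se)

print(f"SE = {se:.2f}")
print(f"95% of observations: {obs_range[0]:.1f} to {obs_range[1]:.1f}")
print(f"95% CI for the mean: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
```

The interval for the mean is much narrower than the range for individual observations, because the SE shrinks as the sample size grows while the SD does not.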
![Page 256: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/256.jpg)
Tests Of Statistical Significance
![Page 257: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/257.jpg)
The science of biostatistics has given us a large number
of tests that can be applied to public health data. An
understanding of the tests will guide an individual
toward the efficient collection of data that will meet the
assumptions of the statistical procedures particularly
well.
![Page 258: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/258.jpg)
The tests allow investigators to compare two parameters, such as means or proportions, and to determine whether the difference between them is
statistically significant.
![Page 259: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/259.jpg)
The various t-tests (the one-tailed Student’s t-test, the two-tailed Student’s t-test, and the paired t-test) compare differences between means, while
z-tests compare differences between proportions.
All of these tests make comparisons possible by calculating the appropriate form of a ratio, which is called a critical ratio because it permits the investigator to make a decision.
![Page 260: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/260.jpg)
This is done by comparing the ratio obtained from whatever test is performed (e.g., a t-test) with the values in the appropriate statistical table (e.g., a table of t values) for the observed number of degrees of freedom.
Before individual tests are discussed in detail, the concepts of critical ratios and degrees of freedom are defined.
![Page 261: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/261.jpg)
Critical Ratios
Critical ratios are a class of tests of statistical significance that depend on dividing some parameter (such as a difference between means) by the standard error (SE) of that parameter.
![Page 262: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/262.jpg)
The general formula for tests of statistical significance is as follows:
$$\text{Critical Ratio} = \frac{\text{Parameter}}{\text{SE of that parameter}}$$
![Page 263: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/263.jpg)
When applied to the Student’s t-test, the formula becomes:
$$\text{Critical Ratio} = t = \frac{\text{Difference between two means}}{\text{SE of the difference between two means}}$$
![Page 264: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/264.jpg)
When applied to a z-test, the formula becomes:
$$\text{Critical Ratio} = z = \frac{\text{Difference between two proportions}}{\text{SE of the difference between two proportions}}$$
![Page 265: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/265.jpg)
The value of the critical ratio (e.g., t or z) is then looked up in the appropriate table (of t or z) to determine the corresponding value of p.
For any critical ratio, the larger the ratio, the more likely that the difference between means or proportions is due to more than just random variation (i.e., the more likely it is that the difference can be considered statistically significant and, hence, real).
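Instead of a printed table, the p value for a z ratio can be computed directly from the standard normal curve. The sketch below is one way to do this with the Python standard library; the function name `two_tailed_p_from_z` is a hypothetical helper, and the z values in the loop are arbitrary examples.

```python
import math

def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p value for a z critical ratio under the standard normal curve."""
    # erfc(|z| / sqrt(2)) equals the area in both tails beyond +/-|z|.
    return math.erfc(abs(z) / math.sqrt(2))

for z in (1.0, 1.96, 2.58, 3.0):
    print(f"z = {z:.2f}  ->  two-tailed p = {two_tailed_p_from_z(z):.4f}")
```

The larger the ratio, the smaller the p value. A t ratio would need the t distribution with the appropriate degrees of freedom, which this sketch does not cover.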
![Page 266: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/266.jpg)
Unless the total sample size is small (say, under 30), the finding of a critical ratio of greater than about 2 usually indicates that the difference is real and enables the investigator to reject the null hypothesis.
The statistical tables adjust the critical ratios for the sample size by means of the degrees of freedom.
![Page 267: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/267.jpg)
Degrees of Freedom
The term “degrees of freedom” refers to the
number of observations that are free to vary.
![Page 268: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/268.jpg)
The Idea Behind The Degrees Of Freedom
The term “degrees of freedom” refers to the number of observations (N) that are free to vary.
One degree of freedom is lost every time a mean is calculated.
Why should this be?
![Page 269: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/269.jpg)
Before putting on a pair of gloves, a person has the freedom to decide whether to begin with the left or the right glove. However, once the person puts on the first glove, he or she loses the freedom to decide which glove to put on last.
If centipedes put on shoes, they would have a choice to make for the first 99 shoes but not for the 100th shoe. Right at the end, the freedom to choose (vary) is restricted.
![Page 270: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/270.jpg)
In statistics, if there are two observed values, only one estimate of the variation between them is possible.
Something has to serve as the basis against which other observations are compared.
The mean is the most “solid” estimate of the expected value of a variable, so it is assumed to be “fixed”.
![Page 271: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/271.jpg)
This implies that the numerator of the mean (the sum of
individual observations, or the sum of xi), which is based on
N observations, is also fixed.
Once N – 1 observations (each of which was, presumably, free to vary) have been added up, the last observation is not free to vary, because the total values of the N observations
must add up to the sum of xi.
![Page 272: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/272.jpg)
For this reason, 1 degree of freedom is lost each time a mean is calculated. The proper average of a sum of squares when calculated from an observed sample, therefore, is the sum of squares divided by the degrees of freedom (N - 1).
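The argument can be checked numerically. In the sketch below (hypothetical values, chosen only for illustration), the deviations from the mean sum to zero, so the last deviation is fixed once the other N - 1 are known, and the sum of squares is divided by N - 1.

```python
import statistics

values = [4, 8, 6, 5, 7]          # hypothetical observations
n = len(values)
mean = sum(values) / n

deviations = [x - mean for x in values]
# Deviations from the mean always sum to zero, so the last one is
# fully determined by the other N - 1.
last_implied = -sum(deviations[:-1])

sum_of_squares = sum(d * d for d in deviations)
variance = sum_of_squares / (n - 1)   # divide by the degrees of freedom

print("deviations:", deviations)
print("last deviation implied by the others:", last_implied)
print("variance (sum of squares / (N - 1)):", variance)
```

The result agrees with `statistics.variance`, which also uses N - 1 in the denominator.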
![Page 273: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/273.jpg)
Hence, for simplicity, the degrees of freedom for any test are considered to be the total sample size minus 1 degree of freedom for each mean that is calculated. In Student’s t- test 2 degrees of freedom are lost because two means are calculated (one mean for each group whose means are to be compared).
![Page 274: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/274.jpg)
The general formula for degrees of freedom for the
Student’s two-group t- test is N1 + N2 – 2,
where N1 is the sample size in the first group and
N2 is the sample size in the second group.
![Page 275: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/275.jpg)
Use of t- test
In medical research, t- tests are among the three or four most commonly used statistical tests (Emerson and Colditz 1983)6.
The purpose of a t-test is to compare the means of a continuous variable in two research samples in order to determine whether or not the difference between the two observed means exceeds the difference that would be expected by chance from random sampling.
![Page 276: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/276.jpg)
Sample population and Sizes
If two different samples come from two different groups (e.g., a group of men and a group of women), the Student’s t-test is used.
If the two samples come from the same group (e.g., pretreatment and post-treatment values for the same study subjects), the paired t-test is used.
![Page 277: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/277.jpg)
Both types of t-tests depend on certain assumptions, including the assumption that the data in the continuous variable are normally distributed (i.e., have a bell-shaped distribution).
Very seldom, however, will observed data be perfectly normally distributed. Does this invalidate the t-test? Fortunately, it does not. There is a convenient theorem, the central limit theorem, that rescues the t-test (and much of statistics as well).
![Page 278: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/278.jpg)
The central limit theorem can be derived theoretically or observed by experimentation.
According to the theorem, for reasonably large samples (say, 30 or more observations in each sample), the distribution of the means of many samples is normal (gaussian), even though the data in individual samples may have skewness, kurtosis, or unevenness.
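The "observed by experimentation" route can be sketched as a small simulation. The population here is assumed to be exponential (strongly skewed, with mean 1 and SD 1), and the number of repeated samples is an arbitrary choice; neither comes from the presentation.

```python
import math
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable

SAMPLE_SIZE = 30                  # observations per sample
NUM_SAMPLES = 2000                # number of repeated samples (arbitrary)

# Skewed "population": exponential distribution with mean 1 and SD 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)      # empirical standard error
expected_se = 1.0 / math.sqrt(SAMPLE_SIZE)        # SD / sqrt(N)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"SD / sqrt(N):         {expected_se:.3f}")
```

Although the individual observations are skewed, the sample means cluster around the population mean, and their spread is close to SD divided by the square root of N. Plotting a histogram of `sample_means` would show the roughly bell-shaped curve.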
![Page 279: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/279.jpg)
Because the critical theoretical requirement for the t-test is that the sample means be normally distributed, a t-test may be computed on almost any set of continuous data, if the observations can be considered a random sample and the sample size is reasonably large.
![Page 280: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/280.jpg)
The t-distribution
The t distribution was described by William Gosset,
who used the pseudonym “Student” when he
wrote the description.
![Page 281: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/281.jpg)
The t distribution looks similar to normal
distribution, except that its tails are somewhat wider
and its peak is slightly less high, depending on the
sample size.
The t distribution is necessary because when sample
sizes are small, the observed estimates of the mean
and variance are subject to considerable error.
![Page 282: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/282.jpg)
The larger the sample size is, the smaller the errors are, and the
more the t distribution looks like the normal distribution. In the
case of an infinite sample size, the two distributions are identical.
For practical purposes, when the combined sample size of the two
groups being compared is larger than 120, the difference between
the normal distribution and the t distribution is negligible.
![Page 283: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/283.jpg)
Student’s t test
There are two types of Student’s t test:
the one-tailed and
the two-tailed type.
The calculations are the same, but the interpretation of the
resulting t differs somewhat. The common features will be
discussed before the differences are outlined.
![Page 284: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/284.jpg)
Calculation of the value of t.
In both types of Student’s t test, t is calculated by
taking the observed differences between the means
of the two groups (the numerator) and dividing this
difference by the standard error of the difference
between the means of the two groups (the
denominator).
![Page 285: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/285.jpg)
Before t can be calculated, then, the standard error
of the difference between the means (SED) must
be determined.
The basic formula for this is the square root of the
sum of the respective population variances, each
divided by its own sample size.
![Page 286: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/286.jpg)
When the Student’s t-test is used to test the null hypothesis in
research involving an experimental group and a control group,
it usually takes the general form of the following equation:
$$t = \frac{\bar{x}_E - \bar{x}_C - 0}{\sqrt{s_p^2\,[(1/N_E) + (1/N_C)]}}, \qquad df = N_E + N_C - 2$$
![Page 287: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/287.jpg)
The 0 in the numerator of the equation for t was added
for correctness, because the t-test determines if the
difference between the means is significantly different
from 0.
However, because the 0 does not affect the calculations
in any way, it is usually omitted from t-test formulas.
![Page 288: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/288.jpg)
The same formula, recast in terms to apply to any two independent samples (e.g., samples of men and women), is as follows,
$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{s_p^2\,[(1/N_1) + (1/N_2)]}}$$
$$df = N_1 + N_2 - 2$$
![Page 289: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/289.jpg)
in which x1 is the mean of the first sample, x2 is
the mean of the second sample, s2p is the pooled
estimate of the variance, N1 is the size of the first
sample, N2 is the size of the second sample, and df
is the degrees of freedom.
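A minimal Python sketch of this calculation follows. The slides do not show how the pooled variance is obtained; the code assumes the usual definition, in which each sample variance is weighted by its own degrees of freedom. The function name `pooled_t` and the two small data sets are hypothetical.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (t, df) for a two-group Student's t-test with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.fmean(sample1), statistics.fmean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)

    df = n1 + n2 - 2
    # Assumed (standard) pooled variance: variances weighted by their df.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two means.
    sed = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2 - 0) / sed
    return t, df

# Hypothetical data for two independent groups.
group1 = [5, 6, 7, 8, 9]
group2 = [3, 4, 5, 6, 7]

t, df = pooled_t(group1, group2)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t would then be compared with the tabulated value for the same degrees of freedom.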
![Page 290: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/290.jpg)
The 0 in the numerator indicates that the null
hypothesis states that the difference between the
means will not be significantly different from 0.
The df is needed to enable the investigator to refer to
the correct line in the table of the values of t and their
relationship to p.
![Page 291: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/291.jpg)
The t test is designed to help investigators distinguish “explained variation” from “unexplained variation” (random error, or chance).
These concepts are like “signal” and “background noise” in radio broadcast engineering. Listeners who are searching for a particular station on their radio dial will find background noise on almost every radio frequency.
![Page 292: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/292.jpg)
When they reach the station that they want to hear, they
may not notice the background noise, since the signal
is so much stronger than this noise.
In medical studies, the particular factor that is being
investigated is similar to the radio signal, and random
error is similar to background noise.
![Page 293: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/293.jpg)
Statistical analysis helps distinguish one from the
other by comparing their strengths.
If the variation caused by the factor of interest is
considerably larger than the variation caused by
random factors (i.e., if in the t-test the ratio is
approximately 1.96 or greater), the effect of the factor of
interest becomes detectable above the statistical
“noise” of random factors.
![Page 294: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/294.jpg)
Interpretation of the results
If the value of t is large, the p value will be small, because it is unlikely that a large t ratio will be obtained by chance alone. If the p value is 0.05 or less, it is customary to assume that there is a real difference. Conceptually, the p value is the probability of obtaining a difference at least as large as the one observed if the null hypothesis of no difference between the means were true; it therefore indicates the risk of a false-positive error if the null hypothesis is rejected and the alternative hypothesis of a true difference is accepted.
![Page 295: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/295.jpg)
• One-Tailed and Two-Tailed t-Tests
• These tests are sometimes called the one-sided and two-sided tests.
![Page 296: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/296.jpg)
• In the two-tailed test, alpha is equally divided at the ends of the two tails of the distribution. The two-tailed test is generally recommended, because differences in either direction are usually important to document.
![Page 297: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/297.jpg)
![Page 298: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/298.jpg)
For example, it is obviously important to know if a new treatment is significantly better than a standard or placebo treatment, but it is also important to know if a new treatment is significantly worse and should therefore be avoided.
In this situation, the two-tailed test provides an accepted criterion for when a difference shows the new treatment to be either better or worse.
![Page 299: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/299.jpg)
Sometimes, however, only a one-tailed test is
needed.
Suppose, for example, that a new therapy is known
to cost much more than the currently used therapy.
Obviously, it would not be used if it were worse
than the current therapy, but it would also not be
used if it were merely as good as the current
therapy.
![Page 300: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/300.jpg)
Under these circumstances, some investigators consider it acceptable to use a one-tailed test.
When this occurs, the 5% rejection region for the null hypothesis is all put on one tail of the distribution, instead of being evenly divided between the extremes of the two tails.
![Page 301: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/301.jpg)
In the one-tailed test, the null hypothesis
nonrejection region extends only to 1.645 standard
errors above the “no difference” point of 0.
In the two-tailed test, it extends to 1.96 standard
errors above and below the “no difference” point.
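The two cutoffs quoted above can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8) and assumes alpha = 0.05, as in the text.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()   # mean 0, SD 1

# One-tailed test: all of alpha is placed in one tail.
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)
# Two-tailed test: alpha is split equally between the two tails.
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed cutoff: {one_tailed_cutoff:.3f} standard errors")
print(f"two-tailed cutoff: {two_tailed_cutoff:.3f} standard errors")
```

These values apply to the normal (z) curve; with small samples the corresponding t cutoffs are somewhat larger.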
![Page 302: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/302.jpg)
This makes the one-tailed test more powerful, that is, more
able to detect a significant difference, if it is in the
expected direction. Many investigators dislike one-tailed
tests, because they believe that if an intervention is
significantly worse than the standard therapy, that should
be documented scientifically. Most reviewers and editors
require that the use of a one-tailed significance test be
justified.
![Page 303: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/303.jpg)
Paired t- test
In many medical studies, individuals are followed over
time to see if there is a change in the value of some
continuous variable. Typically, this occurs in a “before
and after” experiment, such as one testing to see if there
was a drop in average blood pressure following
treatment or to see if there was a drop in weight
following the use of a special diet. In this type of
comparison, an individual patient serves as his or her
own control.
![Page 304: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/304.jpg)
The appropriate statistical test for this kind of data
is the paired t-test. The paired t-test is more powerful
than the Student’s t-test (that is, more able to detect a
real difference) because it considers the
variation from only one group of people, whereas
the Student’s t-test considers variation from two
groups.
Any variation that is detected in the paired t-test is
attributable to the intervention or to changes over
time in the same person.
![Page 305: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/305.jpg)
Calculation of the value of t
To calculate a paired t-test, a new variable is
created. This variable, called d, is the difference
between the values before and after the
intervention for each individual studied.
![Page 306: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/306.jpg)
The paired t-test is a test of the null hypothesis that,
on the average, the difference is equal to 0, which is
what would be expected if there were no change
over time.
Using the symbol $\bar{d}$ (d with a bar) to indicate the mean observed
difference between the before and after values, the
formula for the paired t-test is as follows:
![Page 307: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/307.jpg)
$$t_{\text{paired}} = t_p = \frac{\bar{d} - 0}{\text{Standard error of } \bar{d}} = \frac{\bar{d} - 0}{\sqrt{s_d^2 / N}}$$
![Page 308: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/308.jpg)
df = N – 1
In the paired t-test, because only one mean is
calculated (the mean difference), only one degree of freedom is
lost; therefore, the formula for the degrees of
freedom is N – 1.
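A short Python sketch of the paired calculation is given here. The before and after values are invented (for example, blood pressure readings in five hypothetical patients) and serve only to show the steps.

```python
import math
import statistics

# Hypothetical before/after measurements for the same five subjects.
before = [150, 152, 148, 160, 155]
after = [148, 148, 143, 154, 147]

# New variable d: the difference for each individual.
d = [b - a for b, a in zip(before, after)]

n = len(d)
mean_d = statistics.fmean(d)
var_d = statistics.variance(d)       # s_d^2, with N - 1 in the denominator
se_d = math.sqrt(var_d / n)          # standard error of the mean difference

t_paired = (mean_d - 0) / se_d
df = n - 1

print(f"differences = {d}")
print(f"mean difference = {mean_d:.2f}, SE = {se_d:.2f}")
print(f"t = {t_paired:.2f}, df = {df}")
```

Only the differences enter the calculation, which is why variation between subjects does not inflate the standard error.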
![Page 309: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/309.jpg)
Interpretation of the results
If the value of t is large, the p value will be small,
because it is unlikely that a large t ratio will be
obtained by chance alone. If the p value is 0.05 or
less, it is customary to assume that there is a real
difference (i.e., that the null hypothesis of no
difference can be rejected).
![Page 310: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/310.jpg)
Use of z-tests
In contrast to t-tests, which compare differences
between means, z-tests compare differences
between proportions.
In medicine, examples of proportions that are
frequently studied are sensitivity, specificity,
positive predictive value, risks, percentages of
people with a given symptom, percentages of
people who are ill, and percentages of ill people
who survive their illness
![Page 311: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/311.jpg)
Frequently, the goal of research is to see if the
proportion of patients surviving in a treated group
differs from that in an untreated group. This can
be evaluated using a z-test for proportions.
![Page 312: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/312.jpg)
Calculation of the value of z
As discussed earlier, z is calculated by taking the
observed difference between the two proportions
(the numerator) and dividing it by the standard
error of the difference between the two
proportions (the denominator).
![Page 313: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/313.jpg)
For purposes of illustration, assume that research is being conducted to see if the proportion of patients surviving in a treated group is greater than that in an untreated group.
For each group, if p is the proportion of successes (survivals), then 1 – p is the proportion of failures (nonsurvivals).
![Page 314: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/314.jpg)
If N represents the size of the group on which the proportion is based, the parameters of the proportion could be calculated as follows:
$$\text{Variance (proportion)} = \frac{p(1 - p)}{N}$$
![Page 315: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/315.jpg)
$$\text{Standard error (proportion)} = SE_p = \sqrt{\frac{p(1 - p)}{N}}$$
$$\text{95\% confidence interval} = \text{95\% CI} = p \pm 1.96\, SE_p$$
![Page 316: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/316.jpg)
If there is a 0.60 (60%) survival rate following a given treatment, the
calculations of SEp and the 95% CI of the proportion, based on a sample
of 100 study subjects, would be as follows:
$$SE_p = \sqrt{\frac{(0.6)(0.4)}{100}} = \frac{\sqrt{0.24}}{\sqrt{100}} = \frac{0.49}{10} = 0.049$$
![Page 317: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/317.jpg)
$$\text{95\% CI} = 0.6 \pm (1.96)(0.049) = 0.6 \pm 0.096$$
that is, between 0.6 – 0.096 and 0.6 + 0.096 = 0.504, 0.696
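The same worked example can be reproduced in Python as a check on the arithmetic. The variable names are illustrative; the inputs (p = 0.6, N = 100) are those of the example above.

```python
import math

p = 0.6     # proportion surviving
n = 100     # number of study subjects

se_p = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
margin = 1.96 * se_p
ci_low, ci_high = p - margin, p + margin

print(f"SEp = {se_p:.3f}")
print(f"95% CI = {ci_low:.3f} to {ci_high:.3f}")
```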
![Page 318: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/318.jpg)
Now that there is a way to obtain the standard error of a proportion, the standard error of the difference between proportions also can be obtained, and the equation for the z-test can be expressed as follows:
$$z = \frac{p_1 - p_2 - 0}{\sqrt{p(1 - p)\,[(1/N_1) + (1/N_2)]}}$$
![Page 319: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/319.jpg)
in which p1 is the proportion of the first sample, p2 is the
proportion of the second sample, N1 is the size of the first
sample, N2 is the size of the second sample, and p is the
mean proportion of successes in all observations
combined. The 0 in the numerator indicates that the null
hypothesis states that the difference between the
proportions will not be significantly different from 0.
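A minimal sketch of this z-test in Python follows. The function name and the counts in the example (60 of 100 versus 45 of 100) are hypothetical and chosen only for illustration; the two-sided p value is taken from the standard normal distribution via `math.erfc`.

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """Return (z, two-sided p value) for the difference between two proportions."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    p = (successes1 + successes2) / (n1 + n2)        # proportion in all observations combined
    se_diff = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2 - 0) / se_diff                      # the 0 is the difference under the null hypothesis
    p_value = math.erfc(abs(z) / math.sqrt(2))       # area in both tails of the z distribution
    return z, p_value

# Hypothetical data: 60/100 survivors with one treatment, 45/100 with another
z, p_value = two_proportion_z(60, 100, 45, 100)
print(f"z = {z:.3f}, p = {p_value:.3f}")
```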
![Page 320: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/320.jpg)
Interpretation of results
Note that the above formula for z is similar to the formula for t in the Student’s t-test, as described earlier. However, because the variance and the standard error of the proportion are based on a theoretical distribution (the normal approximation to the binomial distribution), the z distribution is used instead of the t distribution in determining whether the difference is statistically significant. When the z ratio is large (as when the t ratio is large), the difference is more likely to be real.
![Page 321: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/321.jpg)
The computations for the z-test appear different
from the computations for the chi-square test, but
when the same data are set up as a 2 × 2 table,
the two tests are mathematically equivalent: the square of z
equals the chi-square value calculated without a continuity
correction. Most people find it easier to do a chi-square
test than to do a z-test for proportions.
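The equivalence can be illustrated numerically. The sketch below uses hypothetical counts and, as an assumption, the well-known shortcut formula for the uncorrected chi-square of a 2 × 2 table with cells a, b, c, d; the function names are illustrative.

```python
import math

def z_statistic(a, b, c, d):
    """z for comparing the proportions a/(a+b) and c/(c+d)."""
    n1, n2 = a + b, c + d
    p = (a + c) / (n1 + n2)
    return (a / n1 - c / n2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

def chi_square_shortcut(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table: N(ad - bc)^2 / (product of the four marginal totals)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical table: rows are treatments, columns are lived / died
z = z_statistic(60, 40, 45, 55)
chi2 = chi_square_shortcut(60, 40, 45, 55)
print(f"z squared = {z ** 2:.4f}, chi-square = {chi2:.4f}")
```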
![Page 322: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/322.jpg)
Choosing An Appropriate Statistical Test
![Page 323: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/323.jpg)
A variety of statistical tests can be used to analyze the
relationship between two or more variables. Bivariate
analysis is the analysis of the relationship between one
independent (possibly causal) variable and one dependent
(outcome) variable. In contrast, multivariable analysis
is the analysis of the relationship of more than one
independent variable to a single dependent variable.
![Page 324: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/324.jpg)
Statistical tests should be chosen only after the types of clinical
data to be analyzed and the basic research design have been
established. In general, the analytic approach should begin with
a study of the individual variables, including their distributions
and outliers, and with a search for errors. Then bivariate
analysis can be done to test hypotheses and probe for
relationships. Only after these procedures have been done
carefully should multivariable analysis be attempted.
![Page 325: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/325.jpg)
Among the factors involved in choosing an appropriate
statistical test are the goals and research design of the study
and the type of data being gathered.
In some studies the investigators are interested in descriptive
information, such as the sensitivity or specificity of a
laboratory assay, in which case there may be no reason to
perform a test of statistical significance.
![Page 326: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/326.jpg)
In other studies, the investigators are interested in
determining whether the difference between two
means is real, in which case testing for statistical
significance is appropriate.
![Page 327: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/327.jpg)
The types of variables and the research designs set the limits to
statistical analysis and determine which tests are appropriate.
An investigator’s knowledge of the types of variables
(continuous data, ordinal data, dichotomous data and nominal
data) and appropriate statistical tests is analogous to a painter’s
knowledge of the types of media (oils, tempera, water colors,
and so forth) and the appropriate brushes and techniques to be
used.
![Page 328: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/328.jpg)
If the research design involves before and after comparisons in
the same study subjects or involves comparisons of matched
pairs of study subjects, a paired test of statistical significance
(such as the paired t-test, the Wilcoxon matched-pairs
signed-ranks test, or the McNemar chi-square test) would be
appropriate. Moreover, if the sampling procedure in a study is
not random, statistical tests that assume random sampling, such
as most of the parametric tests, may not be valid.
![Page 329: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/329.jpg)
Making inferences from continuous (parametric) data
![Page 330: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/330.jpg)
If the study involves two continuous variables, the following questions may be answered:
(1) Is there a real relationship between the variables or not?
(2) If there is a real relationship, is it a positive or negative linear relationship (a straight-line relationship), or is it more complex?
(3) If there is a real relationship, how strong is it?
(4) How likely is the relationship to be generalizable?
![Page 331: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/331.jpg)
The best way to answer these questions is first to plot the continuous data on a joint distribution graph and then to perform correlation analysis and simple linear regression analysis.
![Page 332: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/332.jpg)
The Joint Distribution Graph
Taking the example of a sample of elderly xerostomia patients, does the number of root caries increase with increasing amounts of sugar in the diet (number of servings per day)? In this instance, data are recorded on a single group of subjects, and each subject constitutes a pair of measures (number of servings per day of sugar and number of root caries). Commonly, any pair of variables entered into a correlation analysis is given the names x and y.
![Page 333: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/333.jpg)
[Figure: joint distribution graph of y plotted against x]
![Page 334: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/334.jpg)
These data can be plotted on a joint distribution
graph, as shown in the figure. The data do not form a
perfectly straight line, but they do appear to lie
along a straight line, going from the lower left to
the upper right on the graph, and all of these
observations but one are fairly close to the line.
![Page 335: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/335.jpg)
As indicated in the figure, the correlation between two
variables, labeled x and y, can range from
nonexistent to strong. If the value of y increases as
x increases, the correlation is positive; if y
decreases as x increases, the correlation is
negative.
![Page 336: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/336.jpg)
![Page 337: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/337.jpg)
It appears from the graph that the correlation between amounts of sugar and dental caries is strong and is positive.
[Figure: joint distribution graph of y plotted against x]
![Page 338: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/338.jpg)
Therefore, based on the figure, the answer to the first question
above is that there is a real relationship between amount of
sugar and dental caries. The graph, however, does not
reveal the probability that such a relationship could have
occurred by chance. The answer to the second question is
that the relationship is positive and is linear. The graph
does not provide quantitative information about how strong
the association is (although it looks strong to the eye).
![Page 339: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/339.jpg)
To answer these questions more precisely, it is
necessary to use the techniques of correlation and
simple linear regression. Neither the graph nor
these techniques, however, can answer the
question of how generalizable the findings are.
![Page 340: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/340.jpg)
The Pearson Correlation Coefficient
Even without plotting the observations for two
variables (variable x and variable y) on a graph, the
extent of their linear relationship can be determined
by calculating the Pearson product-moment
correlation coefficient, which is given the symbol r
and is referred to as the r value.
![Page 341: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/341.jpg)
This statistic varies from –1 to +1, going through 0. A finding of –1 indicates that the two variables have a perfect negative linear relationship; +1 indicates that they have a perfect positive linear relationship; and 0 indicates that the two variables have no linear relationship with each other. The r value is rarely found to be –1 or +1.
![Page 342: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/342.jpg)
Frequently, there is an imperfect correlation between the
two variables, resulting in r values between 0 and 1 or
between 0 and –1. Because the Pearson correlation
coefficient is strongly influenced by extreme values, the
value of r can only be trusted when the distribution of each
of the two variables to be correlated is approximately
normal (i.e., without severe skewness or extreme outlier
values).
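As a rough illustration of how r is obtained, the sketch below computes the Pearson coefficient directly from its definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares). The function name and the five data pairs are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: servings of sugar per day (x) and number of root caries (y)
sugar = [1, 2, 3, 4, 5]
caries = [0, 2, 1, 4, 3]
r = pearson_r(sugar, caries)
print(f"r = {r:.2f}, r squared = {r ** 2:.2f}")
```

With these made-up values the correlation is positive but imperfect, and squaring r gives the proportion of variation in one variable that is accounted for by the other.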
![Page 343: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/343.jpg)
As is the case in every test of significance, for a fixed
level of strength of association, the larger the sample
size, the more likely it is to be statistically significant.
A weak correlation in a large sample might be
statistically significant, despite the fact that it was not
etiologically or clinically important.
![Page 344: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/344.jpg)
There is no perfect statistical way to estimate
clinical importance, but with continuous variables
a valuable concept is the strength of the
association, measured by the square of the
correlation coefficient, or r².
![Page 345: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/345.jpg)
The r² value is the proportion of variation in y
explained by x (or vice versa). It is an important
parameter in advanced statistics.
Looking at the strength of association is analogous
to looking at the size and clinical importance of an
observed difference.
![Page 346: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/346.jpg)
Linear Regression Analysis
Linear regression is related to correlation analysis, but it produces two parameters that can be directly related to the data (i.e., the slope and the intercept). Linear regression seeks to quantify the linear relationship that may exist between an independent variable x and a dependent variable y.
![Page 347: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/347.jpg)
Recall that the formula for a straight line, as expressed in statistics, is y=a+bx. The y is the value of an observation on the y-axis; x is the value of the same observation on the x-axis; a is the regression constant (the value of y when the value of x is 0); and b is the slope (the change in the value of y for a unit change in the value of x).
![Page 348: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/348.jpg)
Linear regression is used to estimate two parameters: the slope of the line (b) and the y-intercept (a).
Most fundamental is the slope, which determines the strength of the impact of variable x on y. For example, the slope can tell how much weight will increase, on the average, for each additional centimeter of height.
![Page 349: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/349.jpg)
Linear regression analysis enables investigators to predict the value of y from the values that x takes.
In other words, the formula for linear regression is a form of statistical modeling, and the adequacy of the model is determined by how closely the value of y can be predicted from other data in the model.
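A minimal least-squares sketch shows how the slope b and the intercept a of y = a + bx are estimated and then used for prediction. The function name and the data are hypothetical; ordinary least squares is assumed as the fitting method, since the slides do not name one.

```python
def fit_line(x, y):
    """Return (a, b): the intercept and slope of the least-squares line y = a + bx."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                 # change in y for a unit change in x
    a = mean_y - b * mean_x       # value of y when x is 0
    return a, b

# Hypothetical data: servings of sugar per day (x) and number of root caries (y)
a, b = fit_line([1, 2, 3, 4, 5], [0, 2, 1, 4, 3])
predicted = a + b * 6             # predicted y for a subject with x = 6
print(f"a = {a:.2f}, b = {b:.2f}, predicted y at x = 6: {predicted:.2f}")
```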
![Page 350: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/350.jpg)
Just as it is possible to set confidence intervals around parameters such as means and proportions, it is possible to set confidence intervals around the slope and the intercept, using computations based on linear regression formulas. Most statistical computer programs perform these computations, which are within the scope of advanced statistics.
![Page 351: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/351.jpg)
Making Inferences From Ordinal Data
![Page 352: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/352.jpg)
Many medical data are ordinal data, which are
ranked from the lowest value to the highest value
but are not measured on an exact scale. In some
cases, investigators will assume that ordinal data
meet the criteria for continuous (measurement)
data and will treat the ordinal data as though they
had been obtained from a measurement scale.
![Page 353: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/353.jpg)
For example, if the patient’s satisfaction with the care in a given hospital were being studied, the investigators might assume that the conceptual distance between “very satisfied” (coded as a 3) and “fairly satisfied” (coded as a 2) is equal to the distance between “fairly satisfied” (coded as a 2) and “unsatisfied” (coded as a 1).
![Page 354: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/354.jpg)
If the investigators are willing to make these assumptions, the data can be analyzed using parametric statistical methods such as t-tests, analysis of variance, and analysis of the Pearson correlation coefficient. However, clinical investigators sometimes make this assumption even when it is not appropriate, because the statistics are easier to obtain and are more likely to produce statistical significance.
![Page 355: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/355.jpg)
If the investigator is unwilling to make such
assumptions, statistics for discrete (nonparametric) data,
such as a chi-square test, can be used.
However, analysis using chi-square would require
discarding the information about the rank of each
observation. Fortunately, there are a number of bivariate
statistical tests for ordinal data that can be used.
![Page 356: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/356.jpg)
The Mann-Whitney U Test
It is one of the best-known non-parametric
significance tests. It was proposed, apparently
independently, by Mann and Whitney (1947) and
Wilcoxon (1945), and therefore is sometimes also
called the Mann-Whitney-Wilcoxon (MWW) test
or the Wilcoxon rank-sum test.
![Page 357: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/357.jpg)
In statistics, the Mann-Whitney U test is a test for assessing whether the medians of two samples of observations are the same. The null hypothesis is that the two samples are drawn from a single population, and therefore that the medians are equal. It requires the two samples to be independent, and the observations to be ordinal or continuous measurements, i.e., one can at least say, of any two observations, which is the greater.
![Page 358: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/358.jpg)
The test for ordinal data that is similar to the Student’s t-test is the Mann-Whitney U test, also called the Wilcoxon rank-sum test. U, like t, designates a probability distribution. In the Mann-Whitney test, all of the observations in a study of two samples are ranked numerically from the smallest to the largest, without regard to whether the observations came from the first sample (e.g., the control group) or from the second sample (e.g., the experimental group).
![Page 359: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/359.jpg)
Next, the observations from the first sample are identified, the ranks in this sample are summed, and the average rank for the first sample and the variance of those ranks are determined. The process is repeated for the observations from the second sample. If the null hypothesis is true (i.e., if there is no real difference between the two samples), the average ranks of the two samples should be similar.
![Page 360: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/360.jpg)
If the average rank of one sample is considerably greater or considerably smaller than that of the other sample, the null hypothesis probably can be rejected, but a test of significance is needed to be sure.
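The ranking steps described above can be sketched as follows. The function names and the satisfaction scores are hypothetical; tied observations are given the mean of the ranks they span, which is a common convention but an assumption here. The U statistic is obtained from the rank sum of the first sample in the usual way, and a table or software is still needed to convert U into a p value.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share their mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def mann_whitney(sample1, sample2):
    """Return (mean rank of sample1, mean rank of sample2, U statistic)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank all observations together
    rank_sum1 = sum(ranks[:n1])
    rank_sum2 = sum(ranks[n1:])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return rank_sum1 / n1, rank_sum2 / n2, min(u1, u2)

# Hypothetical ordinal scores (1 = unsatisfied, 2 = fairly satisfied, 3 = very satisfied)
control = [1, 2, 2, 3]
treated = [2, 3, 3, 3]
mean_rank_control, mean_rank_treated, u = mann_whitney(control, treated)
print(mean_rank_control, mean_rank_treated, u)
```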
![Page 361: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/361.jpg)
Because calculating U by hand is tedious, a t-test
can be done on the ranks instead and will yield very similar results.
This Student’s t-test uses the ranked data and divides the
difference between the two average ranks (which forms
the numerator) by the standard error of that difference,
computed from the pooled variance of the two rank lists.
The degrees of freedom equal the
sum of the sample sizes of the two groups minus 2.
![Page 362: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/362.jpg)
The Wilcoxon Matched-Pairs Signed-Ranks Test
The test is named for Frank Wilcoxon (1892–1965), who proposed both this test and the rank-sum test for two independent samples (Wilcoxon, 1945). Like the t-test, the Wilcoxon test involves comparisons of differences between measurements, so it requires that the data be measured at an interval level of measurement.
![Page 363: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/363.jpg)
However, it does not require assumptions about the
form of the distribution of the measurements. It
should therefore be used whenever the
distributional assumptions that underlie the t-test
cannot be satisfied.
![Page 364: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/364.jpg)
The rank-order test that is comparable to the paired t-test
is the Wilcoxon matched-pairs signed-ranks test. In this
test, all of the observations in a study of two samples are
ranked numerically from the largest to the smallest,
without regard to whether the observations came from the
first sample (e.g., the pretreatment sample) or from the
second sample (e.g., the post treatment sample).
![Page 365: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/365.jpg)
After pairs of data are identified (e.g., pretreatment and
post treatment sample), the difference in rank is identified
for each pair. If in a given pair the pretreatment
observation scored 7 ranks higher than the post treatment
observation, the difference would be noted as –7. If in
another pair the pretreatment observation scored 5 ranks
lower than the post treatment observation, the difference
would be noted as +5.
![Page 366: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/366.jpg)
Each pair would be scored in this way. If the null
hypothesis is true (i.e., if there is no real difference
between the samples), the sum of the positive
scores and negative scores should be close to 0. If
the average difference is considerably different
from 0, the null hypothesis can be rejected.
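A minimal sketch of the signed-ranks idea is given below. It follows the usual textbook formulation, which is an assumption that differs in detail from the wording on the slides: the within-pair differences are computed first, zero differences are dropped, the absolute differences are ranked, and the ranks are summed separately for positive and negative differences. The function names and data are hypothetical, and the sign convention (post treatment minus pretreatment) matches the example above.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share their mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def signed_rank_sums(pre, post):
    """Return (sum of ranks for positive differences, sum of ranks for negative differences)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]   # pairs with no change are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative

# Hypothetical pretreatment and post treatment scores for six subjects
pre = [10, 12, 9, 14, 11, 13]
post = [12, 15, 8, 18, 16, 13]
w_plus, w_minus = signed_rank_sums(pre, post)
print(w_plus, w_minus)
```

If the null hypothesis were true, the two sums would be expected to be roughly equal; a large imbalance, as in this made-up example, points toward a real difference, although a table or software is needed for the p value.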
![Page 367: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/367.jpg)
The Kruskal-Wallis Test
If the investigators in a study involving continuous data want
to compare the means of three or more groups
simultaneously, the appropriate test is a one-way analysis of
variance (a one-way ANOVA), usually called an F-test. The
comparable test for ordinal data is called the Kruskal-Wallis
one-way ANOVA.
![Page 368: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/368.jpg)
As in the Mann-Whitney U test, in the Kruskal-Wallis test
all of the data are ranked numerically, and the rank values
are summed in each of the groups to be compared.
The Kruskal-Wallis test seeks to determine if the average
ranks from three or more groups differ from one another
more than would be expected by chance alone.
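The rank-sum procedure can be sketched with the standard H statistic, H = 12 / (N(N + 1)) × Σ(R² / n) – 3(N + 1), where R is the rank sum and n the size of each group. This formula is supplied here as an assumption, since the slides do not give it, and the sketch omits the correction for ties. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share their mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for three or more groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)            # rank all of the data together
    n_total = len(pooled)
    start = 0
    weighted = 0.0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])   # rank sum for this group
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical scores in three groups that do not overlap at all
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}")
```

H is then compared with the chi-square distribution with (number of groups – 1) degrees of freedom.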
![Page 369: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/369.jpg)
The Sign Test
The sign test can be used to test the hypothesis that there is "no
difference" between two continuous distributions X and Y.
Sometimes an experimental intervention produces positive
results in many areas, but few if any of the individual outcome
variables show a statistically significant improvement.
![Page 370: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/370.jpg)
In this case, the sign test can be extremely helpful in
comparing the results in the experimental group with those
in the control group. If the null hypothesis is true (i.e.,
there is no real difference between the groups), then, by
chance, for half of the outcome variables the experimental
group should perform better, and for half of the outcome
variables the control group should perform better.
![Page 371: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/371.jpg)
The only data needed for the sign test are the
record of whether, on the average, the
experimental subjects or the control subjects
scored “better” on each outcome variable (by what
amount is not important).
![Page 372: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/372.jpg)
If the average score in the experimental group is
better, the result is recorded as a plus sign (+); if the
average score in the control group is better, the result
is scored as a minus sign (-); and if the average score
in the two groups is exactly the same, no result is
recorded and the variable is omitted from the analysis.
![Page 373: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/373.jpg)
For the sign test, “better” can be determined from a continuous variable, an ordinal variable, a dichotomous variable, a clinical score, or a component of a score. Because under the null hypothesis, the expected proportion of plus signs is 0.5 and of minus signs is 0.5, the test compares the observed proportion of successes with the expected value of 0.5.
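Comparing the observed proportion of plus signs with 0.5 can be done with the exact binomial distribution. The sketch below is one way to do this; the function name and the counts (9 plus signs and 1 minus sign, ties already omitted) are hypothetical.

```python
import math

def sign_test(plus, minus):
    """Two-sided exact binomial p value for the observed split of plus and minus signs."""
    n = plus + minus
    k = max(plus, minus)
    # probability of a split at least this extreme in one direction when each sign has probability 0.5
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical result: the experimental group did better on 9 of 10 outcome variables
p_value = sign_test(9, 1)
print(f"p = {p_value:.4f}")
```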
![Page 374: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/374.jpg)
Making Inferences From Dichotomous And Nominal (Nonparametric) Data
![Page 375: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/375.jpg)
The chi-square test, the Fisher exact probability
test, and the McNemar chi-square test can be used
in the bivariate analysis of dichotomous
nonparametric data. Usually, the data are first
arranged in a 2 × 2 table, and the goal is to test the
null hypothesis that the variables are independent.
![Page 376: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/376.jpg)
The 2 × 2 Contingency Table
The contingency table is used to determine whether
the distribution of one variable is conditionally
dependent (contingent) upon the other variable.
The accompanying table provides an example of a 2 × 2
contingency table, meaning that it has two cells in
each direction.
![Page 377: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/377.jpg)
In a contingency table, a cell is a specific location
in the matrix created by the two variables whose
relationship is being studied. Each cell shows the
observed number, the expected number, and the
percentage of study subjects in each treatment
group who lived or died.
![Page 378: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/378.jpg)
![Page 379: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/379.jpg)
If there are more than two cells in each direction of a
contingency table, the table is called an R × C table,
where R stands for the number of rows and C stands
for the number of columns. Although the principles
of the chi-square test are valid for R × C tables, the
discussion below focuses on 2 × 2 tables.
![Page 380: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/380.jpg)
The Chi-Square Test Of Independence
After t-tests, the most basic and common form of
standard analysis in the medical literature is the chi-
square test of the independence of two variables in
a contingency table (Emerson and Colditz 1983).
![Page 381: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/381.jpg)
The chi-square test is an example of a common
approach to statistical analysis known as
statistical modeling, which seeks to develop a
statistical expression (the model) that predicts the
behavior of a dependent variable on the basis of
knowledge of one or more independent variables.
![Page 382: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/382.jpg)
The process of comparing the observed counts with the
expected counts (that is, of comparing O with E) is
called a goodness of fit test, because the goal is to see
how well the observed counts in a contingency table
“fit” the counts expected on the basis of the model.
Usually, the model in such a table is the null hypothesis
that the two variables are independent of each other.
![Page 383: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/383.jpg)
If the chi-square value is small, the fit is good and
the null hypothesis is not rejected. If, however, the
chi-square value is large, the data do not fit the
hypothesis well.
![Page 384: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/384.jpg)
Calculation Of The Chi-Square Value
Once the observed (O) and expected (E) counts are
known, the chi-square (χ²) value can be calculated.
One of two methods can be used, depending on the size of the numbers:
Method for large numbers
Method for Small Numbers
![Page 385: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/385.jpg)
Method for large numbers
In the box, the investigators begin by calculating the
chi-square value for each cell in the table, using the
following formula:
$\dfrac{(O - E)^2}{E}$
![Page 386: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/386.jpg)
Here, the numerator is the square of the deviation
of the observed count in a given cell from the
count that would be expected in that cell if the null
hypothesis were true.
![Page 387: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/387.jpg)
This is similar to the numerator of the variance, which
is expressed as $\sum (x_i - \bar{x})^2$, where $x_i$ represents the
observed value and $\bar{x}$ (the mean) is the expected
value. However, whereas the denominator for
variance is the degrees of freedom (N - 1), the
denominator for chi-square is the expected number
(E).
![Page 388: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/388.jpg)
To obtain the total chi-square value for a 2 × 2 table,
the investigators then add up the chi-square values
for the four cells:
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
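The calculation can be sketched in Python as follows. Expected counts are derived from the row and column totals under the null hypothesis of independence (row total × column total / grand total). The function name and the counts are hypothetical.

```python
def chi_square(observed):
    """Return (expected counts, chi-square value) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # expected count in each cell if the two variables were independent
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    value = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, value

# Hypothetical 2 x 2 table: rows are treatments, columns are lived / died
expected, chi2 = chi_square([[60, 40], [45, 55]])
print(expected)
print(f"chi-square = {chi2:.3f}")
```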
![Page 389: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/389.jpg)
Thus, the basic statistical method for measuring the
total amount of variation in a data set, the total
sum of squares (TSS), is rewritten for the
chi-square test as the sum of
(O – E)².
![Page 390: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/390.jpg)
Method for Small Numbers
Because the chi-square test is based on the normal
approximation of the binomial distribution (which is
discontinuous), many statisticians believe that a
correction for continuity is needed in the equation for
calculating chi-square, while others believe that this is
unnecessary.
![Page 391: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/391.jpg)
The correction, originally described by F. Yates
and called the Yates correction for continuity,
makes little difference if the numbers in the table
are large, but in tables with small numbers it
probably is worth doing.
![Page 392: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/392.jpg)
The only change in the chi-square test formula
given above is that in the continuity-corrected
chi-square test, the number 0.5 is subtracted from the
absolute value of (O – E) in each cell before
squaring. The formula is as follows:
![Page 393: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/393.jpg)
Yates $\chi^2 = \sum \dfrac{(|O - E| - 0.5)^2}{E}$
![Page 394: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/394.jpg)
Clearly, the use of this formula reduces the size of
the chi-square value somewhat and reduces the
chance of finding a statistically significant
difference, so that correction for continuity makes
the test more conservative.
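The effect of the correction can be seen by computing both versions for the same table. In this sketch the function name and the counts are hypothetical, and the guard that keeps the corrected deviation from falling below zero is an added assumption that is not stated on the slides.

```python
def chi_square_2x2(observed, yates=False):
    """Chi-square for a 2 x 2 table, optionally with the Yates correction for continuity."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    value = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / total
            deviation = abs(observed[i][j] - e)
            if yates:
                # subtract 0.5 from |O - E| before squaring (never below 0: an assumed guard)
                deviation = max(deviation - 0.5, 0.0)
            value += deviation ** 2 / e
    return value

# Hypothetical table: rows are treatments, columns are lived / died
table = [[60, 40], [45, 55]]
uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(f"uncorrected = {uncorrected:.3f}, Yates corrected = {corrected:.3f}")
```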
![Page 395: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/395.jpg)
Determination of Degrees of Freedom
The term degrees of freedom refers to the number of
observations that can be considered to be free to vary.
According to the null hypothesis, the best estimate of the
expected distribution of counts in the cells of a
contingency table is provided by the row and column
totals.
![Page 396: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/396.jpg)
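Therefore, the row and column totals are considered
to be fixed, as is the mean in calculating a variance.
An observed count can be entered “freely” into only one
of the cells of a 2 × 2 table; once that count is entered,
the other three cells are determined by the fixed totals.
A 2 × 2 table therefore has only 1 degree of freedom.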
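The point can be checked with a small Python sketch: once the margins are fixed, a single cell determines the rest of a 2 x 2 table. The helper names and counts are hypothetical. The formula (rows - 1) x (columns - 1) for larger tables is the standard rule and is not stated on the slides.

```python
def fill_2x2(top_left, row_totals, col_totals):
    """Derive the other three cells of a 2x2 table from one cell and the fixed margins."""
    top_right = row_totals[0] - top_left
    bottom_left = col_totals[0] - top_left
    bottom_right = row_totals[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]


def degrees_of_freedom(n_rows, n_cols):
    """Standard rule for a contingency table with fixed row and column totals."""
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical margins; only the top-left count is entered "freely".
table = fill_2x2(12, row_totals=(20, 20), col_totals=(17, 23))
print(table)
print(degrees_of_freedom(2, 2))
```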
![Page 397: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/397.jpg)
Multivariable Analysis
![Page 398: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/398.jpg)
Statistical models that have one outcome variable
but include more than one independent variable are
generally called multivariable models.
Multivariable models are intuitively attractive to
investigators, but this should not tempt the
investigator to ignore the basic principles of good
research design and analysis, because
multivariable analysis also has many limitations.
![Page 399: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/399.jpg)
The methodology and interpretation of findings in
this type of analysis are difficult for most physicians,
despite the fact that the methods and results of
multivariable analysis are reported frequently in the
medical literature and their use is increasing
(Concato, Feinstein, and Holford 1993) [6].
![Page 400: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/400.jpg)
Their conceptual attractiveness and the availability of
high-speed computers contribute to making these
models popular. In order to be intelligent consumers of
the medical literature, health care professionals should
understand how to interpret the findings of multivariable
analysis as they are presented in the literature.
![Page 401: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/401.jpg)
The General Linear Model
The multivariable equation, with one dependent variable and
one or more independent variables, is usually called the
general linear model. The model is “general” because there are
many variations regarding the types of variables for $y$ and $x_i$ as
well as the number of $x$ variables that can be used. The model
is “linear” because it is a linear combination of the $x_i$ terms.
![Page 402: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/402.jpg)
For the $x_i$ variables, a variety of transformations (e.g.,
square of x, cube of x, square root of x, or logarithm of x)
could be used and the combination of terms would still be
linear, so that the model would remain linear. What
cannot happen if the model is to remain linear is for any
of the coefficients (the $b_i$ terms) to be a square, a square
root, a logarithm, or another transformation.
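For reference, one common way of writing the general linear model is sketched below. The symbols $a$ (intercept), $e$ (error term), and $k$ (number of independent variables) are assumptions of this sketch; the slides name only $y$, $x_i$, and $b_i$.

```latex
% General form: a linear combination of the x_i terms
y = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e

% Still linear: the x terms are transformed, the coefficients are not
y = a + b_1 x_1 + b_2 x_1^2 + b_3 \log x_2 + e

% No longer linear: a coefficient is itself transformed
y = a + b_1^2 x_1 + e
```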
![Page 403: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/403.jpg)
Numerous procedures for multivariable analysis are
based on the general linear model. These include
methods with such imposing terms as analysis of
variance (ANOVA), analysis of covariance (ANCOVA),
multiple linear regression analysis, multiple logistic
regression, the log-linear model, and discriminant
function analysis.
![Page 404: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/404.jpg)
The choice of which procedure to use depends
primarily on whether the dependent and
independent variables are continuous, dichotomous,
nominal, or ordinal. Knowing that the procedures
are all variations of the same theme (the general
linear model) helps to make them less confusing.
![Page 405: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/405.jpg)
Analysis of variance (ANOVA)
If the dependent variable is continuous and all of
the independent variables are categorical (i.e.,
nominal, dichotomous, or ordinal), the correct
multivariable technique is analysis of variance
(ANOVA).
![Page 406: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/406.jpg)
One-way ANOVA and N-way ANOVA are discussed briefly.
Both techniques are based on the general linear model and
can be used to analyze the results of an experimental study. If the
design includes only one independent variable (e.g., treatment),
the technique is called one-way ANOVA, regardless of how
many different treatment groups are present. If it includes more
than one independent variable (e.g., treatment, age group, and
gender), the technique is called N-way ANOVA.
![Page 407: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/407.jpg)
One-Way ANOVA (The F-Test)
Suppose a team of investigators wanted to study the
effects of drugs A and B on blood pressure. They
might randomly allocate hypertensive patients into
four treatment groups: those taking drug A alone,
those taking drug B alone, those taking drugs A and
B in combination, and those taking a placebo.
![Page 408: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/408.jpg)
The investigators would measure systolic blood pressure before
and after treatment in each patient and calculate a difference score
(posttreatment systolic pressure minus pretreatment systolic
pressure) for each study subject. This difference score would
become the outcome variable. They would then calculate a mean
difference score for each of the four treatment groups (i.e., the
three drug groups and the one placebo group) so that these mean
scores could be compared in a test of statistical significance.
![Page 409: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/409.jpg)
The investigators would want to determine whether the
difference in blood pressure found in one or more of the
drug groups was large enough to be clinically important,
assuming it was a drop. For example, a drop in mean systolic
blood pressure from 150 mm Hg to 148 mm Hg would be
too small to be clinically useful. If the results were not
clinically useful, there would be little point in looking for an
appropriate test of significance.
![Page 410: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/410.jpg)
If, however, one or more of the groups showed a
clinically important drop in blood pressure, the
investigators would want to determine whether the
difference was likely to have occurred by chance
alone. To do this, an appropriate statistical test of
significance is needed.
![Page 411: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/411.jpg)
The Student’s t-test could be used to compare each pair of
groups, but this would require six different t-tests: each of
the three drug groups (A, B, and AB) versus the placebo
group; drug A group versus drug B group; drug A group
versus drug combination AB group; and drug B group
versus drug combination AB group. This raises the
problem of multiple hypotheses and multiple associations.
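The count of six comparisons, and the reason it is a problem, can be sketched in a few lines of Python. The alpha level of 0.05 and the assumption that the six tests are independent are simplifications made for this illustration; they are not stated on the slide.

```python
from itertools import combinations

groups = ["A", "B", "AB", "placebo"]

# Every unordered pair of groups needs its own t-test.
pairs = list(combinations(groups, 2))
for first, second in pairs:
    print(f"{first} versus {second}")

# Assumption: each test uses alpha = 0.05 and the tests are independent.
alpha = 0.05
familywise_error = 1 - (1 - alpha) ** len(pairs)
print(f"{len(pairs)} tests, chance of at least one false positive = {familywise_error:.3f}")
```

Under these assumptions the chance of at least one false-positive result across the six tests is well above the 5% accepted for a single test.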
![Page 412: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/412.jpg)
Even if the investigators decide that the primary
comparison should be each drug or drug combination
with the placebo, this would still leave three
hypotheses to test instead of just one. Moreover, if two
or three groups did significantly better than the placebo
group, it would be necessary to determine if one
effective drug was significantly better than the others.
![Page 413: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/413.jpg)
There are numerous complex ways of handling the
problem of multiple associations, but the best approach
in cases such as this is to begin by performing an F-test,
which is the first step of ANOVA. The F-test is a kind of
“super t-test” that allows the investigators to compare
more than two means simultaneously.
![Page 414: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/414.jpg)
The null hypothesis for the F-test in the previous
example is that the mean change in blood pressure ($d$)
will be the same for all four groups
($d_A = d_B = d_{AB} = d_P$), indicating that all samples
were from the same population and that any differences
between the means are due to chance variation.
![Page 415: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/415.jpg)
In creating the F-test (F is for Fisher), Sir Ronald
Fisher reasoned that if two different methods could
be found to estimate the variance and if all of the
samples came from the same population, these two
different estimates of variance should be similar. He
therefore developed two measures of the variance of
the observations.
![Page 416: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/416.jpg)
One is called between-groups variance and is based on
the variation between (or among) the means. The other
is called within-groups variance and is based on the
variation within each group, i.e., variation around a
single group mean. In ANOVA, these two measures of
variance are also called the between-groups mean
square and the within-groups mean square.
![Page 417: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/417.jpg)
The ratio of the two measures of variance can therefore be expressed as follows:
$$F = \frac{\text{Between-groups variance}}{\text{Within-groups variance}} = \frac{\text{Between-groups mean square}}{\text{Within-groups mean square}}$$
![Page 418: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/418.jpg)
If the F-ratio is fairly close to 1.0, the two
estimates of variance are similar, and the null
hypothesis that all of the means came from the
same underlying population is not rejected. If the
ratio is much larger than 1.0, there must have been
some force, attributable to group differences,
pushing the means apart, and the null hypothesis of
no difference is rejected.
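A minimal Python sketch of the F-ratio for the four-group example follows. The difference scores are invented for illustration, and the function name `one_way_f_ratio` is hypothetical. The degrees of freedom used for the two mean squares (groups minus 1, and observations minus groups) are the standard ones and are not given on the slides.

```python
def one_way_f_ratio(groups):
    """Return (between-groups mean square, within-groups mean square, F ratio)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within


# Hypothetical difference scores (posttreatment minus pretreatment, mm Hg).
drug_a = [-12, -9, -15, -10]
drug_b = [-8, -11, -7, -10]
drug_ab = [-20, -17, -22, -19]
placebo = [-2, 1, -3, 0]

ms_between, ms_within, f_ratio = one_way_f_ratio([drug_a, drug_b, drug_ab, placebo])
print(f"between-groups mean square = {ms_between:.2f}")
print(f"within-groups mean square  = {ms_within:.2f}")
print(f"F ratio                    = {f_ratio:.2f}")
```

Judging whether a given F-ratio is "much larger than 1.0" still requires comparing it with the F distribution for the relevant degrees of freedom, which this sketch does not do.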
![Page 419: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/419.jpg)
N-Way ANOVA
The goal of ANOVA, stated in the simplest terms,
is to explain (to “model”) the total variation found
in a study.
![Page 420: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/420.jpg)
If only one independent variable is tested in a model
and that variable happens to be gender, the total
amount of variation must be explained in terms of
how much variation is due to gender and how much is
not. Any variation (SS) that is not due to the model
(gender) is considered to be error (residual) variation.
![Page 421: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/421.jpg)
If two independent variables are tested in a model and
those variables happen to be treatment and gender, the
total amount of variation must be explained in terms of
how much variation is due to each of the following:
the independent effect of treatment, the independent
effect of gender, the interaction between (joint effect
of) treatment and gender, and error.
![Page 422: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/422.jpg)
If more than two variables are tested, the analysis becomes increasingly complicated, but the underlying logic remains the same. As long as the research design is balanced (that is, there are equal numbers of observations in all of the study groups), ANOVA can be used to analyze the individual and joint effects of the independent variables and to partition the total variation into the various component parts.
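The partitioning can be sketched in Python for a balanced design with two independent variables. The data, the function name `two_way_partition`, and the variable labels are hypothetical; the sketch assumes equal numbers of observations in every cell and is not a substitute for a statistical package.

```python
def two_way_partition(cells):
    """Split the total sum of squares for a balanced two-factor design.

    cells maps (level_a, level_b) -> list of observations.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles balanced designs")

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    all_obs = [x for v in cells.values() for x in v]
    grand = mean(all_obs)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    mean_cell = {key: mean(v) for key, v in cells.items()}

    return {
        "total": sum((x - grand) ** 2 for x in all_obs),
        "factor_a": n * len(levels_b) * sum((m - grand) ** 2 for m in mean_a.values()),
        "factor_b": n * len(levels_a) * sum((m - grand) ** 2 for m in mean_b.values()),
        "interaction": n * sum(
            (mean_cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
            for a in levels_a for b in levels_b),
        "error": sum((x - mean_cell[key]) ** 2
                     for key, v in cells.items() for x in v),
    }


# Hypothetical difference scores (mm Hg), three subjects per cell.
data = {
    ("drug", "female"): [-12, -10, -14],
    ("drug", "male"): [-9, -11, -10],
    ("placebo", "female"): [-2, 0, -1],
    ("placebo", "male"): [1, -3, -1],
}

parts = two_way_partition(data)
for name, value in parts.items():
    print(f"SS {name:<12} = {value:.3f}")
```

Here `factor_a` is treatment and `factor_b` is gender. In a balanced design the four components add up to the total sum of squares.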
![Page 423: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/423.jpg)
Analysis of covariance (ANCOVA)
Analysis of variance (ANOVA) and analysis of
covariance (ANCOVA) are methods for evaluating
studies in which the dependent variable is
continuous. If the independent variables are all of
the categorical type (nominal or dichotomous), then
ANOVA is used.
However, if some of the independent variables are
categorical and some are continuous, then
ANCOVA is appropriate.
![Page 424: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/424.jpg)
ANCOVA would be used, for example, in a study in which
the goal was to test the effects of antihypertensive drugs on
systolic blood pressure (a continuous variable that is the
dependent variable here) and the independent variables were
age (a continuous variable) and treatment (a categorical
variable with four levels, i.e., those treated with drug A, those
treated with drug B, those treated with both A and B, and
those treated with a placebo).
![Page 425: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/425.jpg)
The ANCOVA procedure adjusts the dependent
variable on the basis of the continuous independent
variable or variables, and it then does an N-Way
ANOVA on the adjusted dependent variable. In the
above example, the ANCOVA procedure would
remove the effect of age from the analysis of the
effect of the drugs on systolic blood pressure.
![Page 426: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/426.jpg)
Controlling for age means that (artificially) all of the study subjects are made the same age. Suppose that the mean systolic blood pressure in the study group is 150 mm Hg at an average age of 50 years.
![Page 427: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/427.jpg)
The first step (and this is all done by the computer packages
that have ANCOVA) is to do a simple regression between
age and blood pressure, which shows that the blood pressure
increases, say, an average of 1 mm Hg for each year of age
over 50 years and decreases an average of 1 mm Hg for each
year of age under 50. Thus, if a subject’s age is 59, then 9
mm Hg would be subtracted from that subject’s current
blood pressure to arrive at the adjusted blood pressure.
![Page 428: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/428.jpg)
If another subject’s age is 35, then 15 mm Hg would
be added to that subject’s current blood pressure to
arrive at the adjusted value. If a subject’s age is 50, no
adjustment is necessary, because that subject is already
at the population mean age. ANCOVA can adjust the
dependent variable for several continuous independent
variables (called covariates) at the same time.
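The age adjustment described above amounts to a one-line calculation, sketched here in Python. The slope of 1 mm Hg per year and the mean age of 50 come from the example; the function name and the raw blood pressure values are hypothetical.

```python
def age_adjusted_bp(systolic_bp, age, slope=1.0, mean_age=50):
    """Shift a subject's blood pressure to what it would be at the mean age.

    slope is the regression estimate of mm Hg per year of age.
    """
    return systolic_bp - slope * (age - mean_age)


# Hypothetical raw readings for three subjects.
print(age_adjusted_bp(160, age=59))  # 9 mm Hg is subtracted
print(age_adjusted_bp(140, age=35))  # 15 mm Hg is added
print(age_adjusted_bp(150, age=50))  # no adjustment at the mean age
```

In practice the slope is estimated from the study data by the statistical package rather than fixed in advance.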
![Page 429: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/429.jpg)
Multiple Linear Regression
If the dependent variable and all of the independent
variables are continuous, the correct type of multivariable
analysis is multiple linear regression. There are several
computerized methods of analyzing the data in a multiple
linear regression. Probably the most common method is
called stepwise linear regression.
![Page 430: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/430.jpg)
The investigator either chooses which variable to begin
with (i.e., to enter first in the analysis) or else instructs
the computer to start by entering the one variable that
has the strongest association with the dependent
variable. In either case, when only the first variable
has entered, the result is a simple regression analysis.
![Page 431: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/431.jpg)
Next, the second variable is entered according to the investigator’s instructions. The explanatory strength of the variables entered (that is, the $r^2$) changes as each new variable is entered. The “stepping” continues until none of the remaining independent variables meets the predetermined criterion for being entered (e.g., p ≤ 0.1 or an increase in $r^2$ of at least 0.01) or until all of the variables have been entered. When the stepping stops, the analysis is complete.
![Page 432: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/432.jpg)
In addition to watching for the statistical
significance of the overall equation and of each
variable entered, the investigator keeps a close
watch on the overall $r^2$ for each step, which is the
proportion of variation the model has explained so
far.
![Page 433: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/433.jpg)
In multiple regression equations that are
statistically significant, the increase in the total $r^2$
after each step, compared with the total $r^2$ after the
previous step, indicates how much additional
variation is explained by the variable just entered.
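The stepping logic can be sketched in plain Python. This is a simplified forward selection that uses only the increase in $r^2$ as the entry criterion (not the p value) and makes no claim to match any particular statistical package. The data and all function and variable names are hypothetical.

```python
def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """r^2 of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_resid / ss_total


def forward_stepwise(predictors, y, min_gain=0.01):
    """At each step enter the variable that raises r^2 the most; stop when the gain is too small."""
    entered, history, current = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {name: r_squared([predictors[e] for e in entered] + [col], y)
                  for name, col in remaining.items()}
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:
            break
        current = scores[best]
        entered.append(best)
        history.append((best, current))
        del remaining[best]
    return history


# Hypothetical data for eight subjects.
predictors = {
    "age": [35, 42, 50, 58, 63, 47, 55, 39],
    "weight": [70, 82, 76, 90, 85, 68, 95, 74],
    "sodium": [3.1, 2.8, 3.5, 3.0, 3.3, 2.9, 3.4, 3.2],
}
systolic_bp = [118, 130, 134, 150, 152, 126, 151, 124]

history = forward_stepwise(predictors, systolic_bp)
for step, (name, r2) in enumerate(history, start=1):
    print(f"step {step}: entered {name}, total r^2 = {r2:.3f}")
```

The difference between the total $r^2$ values printed for consecutive steps is the additional variation explained by the variable just entered.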
![Page 434: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/434.jpg)
References
1. C. Bernard, An introduction to the study of experimental medicine.
2. Daniel McCann, ‘Dental research: The clinical trial formula’, JADA 1990 Apr, 384-392.
3. J. M. Dunning, Principles of Dental public health, fourth edition, 1986.
4. National medical series, Preventive medicine & public health, second edition, 1992.
5. G. M. Gluck, W. M. Morganstein, Jong’s community dental health, fifth edition, 2003.
6. J. F. Jekel, D. L. Katz, Epidemiology, biostatistics and preventive medicine, second edition, 2001.
![Page 435: Biostatics ppt](https://reader031.vdocuments.mx/reader031/viewer/2022030310/58f9a94e760da3da068b6d47/html5/thumbnails/435.jpg)
7. Cynthia M. Pine, Community oral health, first edition, 1997.
8. Park’s text book of preventive and social medicine, eighteenth edition, 2006.
9. C. R. Kothari, Research Methodology - Methods & Techniques, second edition, 2006.
10. Mahajan, Biostatistics, sixth edition, 2006.
11. B. Burt, Eklund, Dentistry, Dental practice & The Community, sixth edition, 2005.