CHS 221
VISUALIZING DATA
Week 3
Dr. Wajed Hatamleh
http://staff.ksu.edu.sa/whatamleh/en
VISUALIZING DATA
• Depict the nature or shape of the data distribution
• Different graphs are used for different types of data
HISTOGRAM
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
HISTOGRAMS
Histograms:
• Used for quantitative data
• Similar to a bar graph, with an X and Y axis, but adjacent values are on a continuum, so bars touch one another
• Data values on the X axis are arranged from lowest to highest
• Bars are drawn to a height that shows frequency or percentage (Y axis)
HISTOGRAMS (CONT’D)
Example of a histogram: heart rate data
[Figure: histogram of heart rate data; X axis: heart rate in bpm (55 to 74); Y axis: frequency f (0 to 12)]
Histogram
A bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies.
Figure 2-1
Relative Frequency Histogram
Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies.
Figure 2-2
Histogram and Relative Frequency Histogram
Figure 2-1 Figure 2-2
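The bar heights of a histogram and of a relative frequency histogram can be tabulated before drawing. The following is a minimal Python sketch, not part of the original slides. It uses the ten diastolic blood pressure readings from Exercise 1; the class width of 20 and the starting value of 110 are assumptions chosen only for illustration.

```python
from collections import Counter

# Diastolic blood pressure readings from Exercise 1
readings = [130, 110, 160, 120, 170, 120, 150, 140, 160, 140]

# Assumed class intervals (not given in the slides): width 20, starting at 110
CLASS_START = 110
CLASS_WIDTH = 20


def class_frequencies(values, start, width):
    """Return {(lower, upper): frequency} for equal-width classes [lower, upper).

    Assumes every value is >= start.
    """
    counts = Counter((v - start) // width for v in values)
    n_classes = max(counts) + 1
    return {
        (start + i * width, start + (i + 1) * width): counts.get(i, 0)
        for i in range(n_classes)
    }


frequencies = class_frequencies(readings, CLASS_START, CLASS_WIDTH)
n = len(readings)
# Relative frequency = class frequency divided by the number of values
relative = {interval: f / n for interval, f in frequencies.items()}

for (lower, upper), f in frequencies.items():
    # The row of '*' stands in for the height of the drawn rectangle
    print(f"{lower}-{upper - 1}: {'*' * f}  f={f}  relative={relative[(lower, upper)]:.2f}")
```

Both tables have the same shape; only the vertical scale changes, which is why the two histograms look alike.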
Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
• cumulative frequencies, or
• cumulative relative frequencies, or
• cumulative percent frequencies
The frequency (one of the above) of each class is plotted as a point.
The plotted points are connected by straight lines.
Ogive
A line graph that depicts cumulative frequencies
Figure 2-4
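The points of an ogive come from running totals of the class frequencies. The sketch below is an illustration only, not from the original slides. It reuses the Exercise 1 blood pressure readings with assumed classes of width 20, and it assumes each point is plotted at the upper limit of its class.

```python
from itertools import accumulate

# Diastolic blood pressure readings from Exercise 1
readings = [130, 110, 160, 120, 170, 120, 150, 140, 160, 140]

# Assumed classes (lower limit, upper limit); not specified in the slides
classes = [(110, 129), (130, 149), (150, 169), (170, 189)]

# Frequency of each class
frequencies = [sum(lo <= v <= hi for v in readings) for lo, hi in classes]

# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
cumulative_percent = [100 * c / total for c in cumulative]

# One plotted point per class: (upper class limit, cumulative frequency)
ogive_points = [(upper, c) for (_, upper), c in zip(classes, cumulative)]
print(ogive_points)
print(cumulative_percent)
```

Connecting these points with straight lines gives the ogive; the last point always reaches the total number of values (or 100%).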
BAR GRAPHS
Bar graphs:
• Used for qualitative data
• Bar graphs have a horizontal dimension (X axis) that specifies categories (i.e., data values)
• The vertical dimension (Y axis) specifies either frequencies or percentages
• Bars for each category are drawn to the height that indicates the frequency or %
BAR GRAPHS
Example of a bar graph
Note: the bars do not touch each other
PIE CHART
Pie charts:
• Also used for qualitative data
• Circle is divided into pie-shaped wedges corresponding to percentages for a given category or data value
• All pieces add up to 100%
• Place wedges in order, with the biggest wedge starting at “12 o’clock”
PIE CHART
Example of a pie chart, for same marital status data
Recap
In this section we have discussed graphs that are pictures of distributions.
Keep in mind that the object of this section is not just to construct graphs, but to learn something about the data sets, that is, to understand the nature of their distributions.
CHARACTERISTICS OF A DATA DISTRIBUTION
• Central tendency
• Variability
Both central tendency and variability can be expressed by indexes that are descriptive statistics
CENTRAL TENDENCY
• Indexes of central tendency provide a single number to characterize a distribution
• Measures of central tendency come from the center of the distribution of data values, indicating what is “typical” and where data values tend to cluster
• Popularly called an “average”
CENTRAL TENDENCY INDEXES
Three alternative indexes:
• The mode
• The median
• The mean
THE MODE
The mode is the score value with the highest frequency; the most “popular” score
Age: 26 27 27 28 29 30 31
Mode = 27
THE MODE: ADVANTAGES
Can be used with data measured on any measurement level (including nominal level)
Easy to “compute”
Reflects an actual value in the distribution, so it is easy to understand
Useful when there are 2+ “popular” scores (i.e., in multimodal distributions)
Mode
A data set may be:
• Bimodal
• Multimodal
• No mode
Denoted by M
The only measure of central tendency that can be used with qualitative data
Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10
   Mode is 1.10
b. 27 27 27 55 55 55 88 88 99
   Bimodal: 27 and 55
c. 1 2 3 6 7 8 9 10
   No mode
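The three examples can be reproduced with a short Python sketch. This is an illustration, not part of the original slides; the helper name modes is hypothetical. It follows the slides' convention that a data set in which every value occurs equally often has no mode.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or [] when the data set has no mode.

    "No mode" here means that several distinct values all occur equally often.
    """
    counts = Counter(values)
    highest = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == highest)
    if len(counts) > 1 and len(winners) == len(counts):
        return []  # every distinct value has the same frequency
    return winners


example_a = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
example_b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
example_c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(example_a))  # one mode
print(modes(example_b))  # two modes (bimodal)
print(modes(example_c))  # empty list: no mode
```

A result with one entry is a single mode, two entries is bimodal, and more than two is multimodal.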
THE MODE: DISADVANTAGES
Ignores most information in the distribution
Tends to be unstable (i.e., value varies a lot from one sample to the next)
Some distributions may not have a mode (e.g., 10, 10, 11, 11, 12, 12)
THE MEDIAN
The median is the score that divides the distribution into two equal halves
50% are below the median, 50% above
Age: 26 27 27 28 29 30 31
Median (Mdn) = 28
Odd number of values:
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40 (in order)
The exact middle value is 0.73
MEDIAN is 0.73
Even number of values:
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40 (in order)
There is no exact middle; the middle is shared by two numbers
(0.73 + 1.10) ÷ 2 = 0.915
MEDIAN is 0.915
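The two cases (odd and even number of values) can be sketched in Python as follows. This is an illustrative implementation, not part of the original slides; the standard library function statistics.median gives the same results.

```python
def median(values):
    """Return the middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: exact middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


odd_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]
even_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
ages = [26, 27, 27, 28, 29, 30, 31]

print(median(odd_data))
print(round(median(even_data), 3))
print(median(ages))
```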
THE MEDIAN: ADVANTAGES
Not influenced by outliers
Particularly good index of what is “typical” when distribution is skewed
Easy to “compute”
THE MEDIAN: DISADVANTAGES
Does not take actual data values into account—only an index of position
Value of median not necessarily an actual data value, so it is more difficult to understand than mode
THE MEAN
The mean is the arithmetic average
Data values are summed and divided by N
Age: 26 27 27 28 29 30 31
Mean = 28.3
THE MEAN (CONT’D)
Most frequently used measure of central tendency
Equation: M = ΣX ÷ N
Where:
M = sample mean
Σ = the sum of
X = actual data values
N = number of people
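Applied to the age data from the earlier slide, the equation M = ΣX ÷ N works out as in this minimal Python sketch (an illustration, not part of the original slides):

```python
# Age data from the slides
ages = [26, 27, 27, 28, 29, 30, 31]

sum_x = sum(ages)   # ΣX: the sum of the actual data values
n = len(ages)       # N: number of people
mean = sum_x / n    # M = ΣX ÷ N

print(f"ΣX = {sum_x}, N = {n}, M = {mean:.1f}")
```

Here ΣX = 198 and N = 7, so M = 198 ÷ 7, which rounds to the 28.3 shown on the slide.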
THE MEAN: ADVANTAGES
The balance point in the distribution: the sum of deviations above the mean always exactly balances the sum of deviations below it
Does not ignore any information
The most stable index of central tendency
Many inferential statistics are based on the mean
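The balance-point property can be checked on the age data. The sketch below is an illustration, not part of the original slides; it uses exact fractions so that rounding does not blur the comparison.

```python
from fractions import Fraction

# Age data from the slides
ages = [26, 27, 27, 28, 29, 30, 31]

mean = Fraction(sum(ages), len(ages))  # exact mean, 198/7
deviations = [age - mean for age in ages]

above = sum(d for d in deviations if d > 0)  # total distance of values above the mean
below = sum(d for d in deviations if d < 0)  # total (negative) distance of values below the mean

print(above, below, sum(deviations))
```

The positive and negative deviations cancel, so their overall sum is zero.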
THE MEAN: DISADVANTAGES
Sensitive to outliers
Gives a distorted view of what is “typical” when data are skewed
Value of mean is often not an actual data value
THE MEAN: SYMBOLS
Sample means:
• In reports, usually symbolized as M
• In statistical formulas, usually symbolized as $\bar{x}$ (pronounced “X bar”)
Population means:
• The Greek letter μ (mu)
Notation
µ is pronounced “mu” and denotes the mean of all values in a population:
$\mu = \frac{\sum x}{N}$
$\bar{x}$ is pronounced “x-bar” and denotes the mean of a set of sample values:
$\bar{x} = \frac{\sum x}{n}$
Best Measure of Center
Definitions
Symmetric: Data is symmetric if the left half of its histogram is roughly a mirror image of its right half.
Skewed: Data is skewed if it is not symmetric and if it extends more to one side than the other.
Skewness
Figure 2-11
Recap
In this section we have discussed:
• Types of Measures of Center: Mean, Median, Mode
• Mean from a frequency distribution
• Best Measures of Center
• Skewness
MEASURES OF VARIATION
Because this section introduces the concept of variation, this is one of the most important sections in the entire book
DEFINITION
The range of a set of data is the difference between the highest value and the lowest value
Range = (highest value) - (lowest value)
DEFINITION
The standard deviation of a set of sample values is a measure of variation of values about the mean
SAMPLE STANDARD DEVIATION FORMULA
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
SAMPLE STANDARD DEVIATION (SHORTCUT FORMULA)
$s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$
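The defining formula and the shortcut formula give the same value. The following Python sketch, which is an illustration and not part of the original slides, evaluates both on the age data used earlier and cross-checks them against the standard library function statistics.stdev, which also uses the n - 1 divisor.

```python
import math
import statistics

# Age data from the slides
ages = [26, 27, 27, 28, 29, 30, 31]

n = len(ages)
mean = sum(ages) / n

# Defining formula: square root of the sum of squared deviations divided by (n - 1)
s_definition = math.sqrt(sum((x - mean) ** 2 for x in ages) / (n - 1))

# Shortcut formula: needs only n, the sum of x and the sum of x squared
sum_x = sum(ages)
sum_x_squared = sum(x * x for x in ages)
s_shortcut = math.sqrt((n * sum_x_squared - sum_x ** 2) / (n * (n - 1)))

print(round(s_definition, 2), round(s_shortcut, 2), round(statistics.stdev(ages), 2))
```

The shortcut avoids computing each deviation from the mean, which is convenient for hand calculation.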
Standard Deviation - Key Points
The standard deviation is a measure of variation of all values from the mean
The value of the standard deviation s is usually positive (it is zero only when all of the data values are the same, and it is never negative)
The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others)
The units of the standard deviation s are the same as the units of the original data values
Definition
Empirical (68-95-99.7) Rule
For data sets having a distribution that is approximately bell shaped, the following properties apply:
• About 68% of all values fall within 1 standard deviation of the mean
• About 95% of all values fall within 2 standard deviations of the mean
• About 99.7% of all values fall within 3 standard deviations of the mean
The Empirical Rule
FIGURE 2-13
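The three percentages can be recovered from the normal curve. The sketch below is an illustration, not part of the original slides, and it assumes that "approximately bell shaped" is modeled as an exactly normal distribution. It uses statistics.NormalDist from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
bell = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3
shares = {k: bell.cdf(k) - bell.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {100 * share:.1f}%")
```

Because the shares depend only on the number of standard deviations from the mean, the same percentages apply to any normal distribution, whatever its mean and standard deviation.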
ARE YOU READY?
Post-test time
Which measure of center is the only one that can be used with data at the categorical level of measurement?
A. Mean
B. Median
C. Mode
Which of the following measures of center is not affected by outliers?
A. Mean
B. Median
C. Mode
Find the mode(s) for the given sample data.
79, 25, 79, 13, 25, 29, 56, 79
A. 79
B. 48.1
C. 42.5
D. 25
Which is not true about the variance?
A. It is the square of the standard deviation.
B. It is a measure of the spread of data.
C. The units of the variance are different from the units of the original data set.
D. It is not affected by outliers.
EXERCISE TIME
EXERCISE 1
The following 10 data values are diastolic blood pressure readings. Compute the mean, the range, and the standard deviation (SD) for these data.
130 110 160 120 170 120 150 140 160 140
EXERCISE 2
The following are the fasting blood glucose levels of 10 children:
1. 56     6. 56
2. 62     7. 65
3. 63     8. 68
4. 65     9. 70
5. 65    10. 72
Compute the: a. range b. standard deviation
EXERCISE 3
The fifteen patients making initial visits to a rural health department travelled these distances. Find: a. Range, b. Standard Deviation
Patient  Distance (Miles)    Patient  Distance (Miles)
1        5                   8        6
2        9                   9        13
3        11                  10       7
4        3                   11       3
5        12                  12       15
6        13                  13       12
7        12                  14       15
                             15       5
ANSWER
1. Mean = 140; Range = 60; SD = 20
2. Range = 16; SD = 4.2 is not used here; Range = 16; SD = 4.4
3. Range = 12; SD = 4.2
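Answers 1 and 3 can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the helper name summarize is hypothetical, and statistics.stdev is the sample standard deviation (divisor n - 1), matching the formula given earlier.

```python
import statistics


def summarize(values):
    """Return (mean, range, sample standard deviation) for a list of numbers."""
    mean = statistics.mean(values)
    value_range = max(values) - min(values)  # highest value minus lowest value
    sd = statistics.stdev(values)            # sample SD, divisor n - 1
    return mean, value_range, sd


# Exercise 1: diastolic blood pressure readings
exercise_1 = [130, 110, 160, 120, 170, 120, 150, 140, 160, 140]

# Exercise 3: distances in miles for patients 1 to 15
exercise_3 = [5, 9, 11, 3, 12, 13, 12, 6, 13, 7, 3, 15, 12, 15, 5]

for name, data in (("Exercise 1", exercise_1), ("Exercise 3", exercise_3)):
    mean, value_range, sd = summarize(data)
    print(f"{name}: mean = {mean:.1f}, range = {value_range}, SD = {sd:.1f}")
```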