Data Summaries
Summary Statistics
• Given a large set of numbers, we often want to describe, or summarize, the data with a few revealing numbers.
• Example: Yearly sales of two brands of peanut butter
Year     1992  1993  1994  1995  1996  1997  1998  1999
Skippy     12    10    15     9    12     8    11    11
Jif        12     9    11    12    10    12    11    11
Summary Statistics
• Measurements of Center
Arithmetic Mean: The Average
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Median: The data point in the center
Summary Statistics
• Example: Yearly sales of two brands of peanut butter
Skippy Mean: $\bar{x} = \frac{88}{8} = 11$
Jif Mean: $\bar{x} = \frac{88}{8} = 11$
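The mean calculation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the helper name `mean` and the list names are chosen here for illustration.

```python
# Yearly sales from the slide, 1992 through 1999.
skippy = [12, 10, 15, 9, 12, 8, 11, 11]
jif = [12, 9, 11, 12, 10, 12, 11, 11]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


skippy_mean = mean(skippy)  # 88 / 8
jif_mean = mean(jif)        # 88 / 8
print(skippy_mean, jif_mean)  # prints 11.0 11.0
```

Both brands share the same mean, so a measure of center alone does not distinguish them.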
Summary Statistics
• Example: Yearly sales of two brands of peanut butter
Median: Order the data
  If even number of data points, average the two center numbers
  If odd number of data points, report the center number

         Smallest                          Largest
Skippy      8    9   10   11   11   12   12   15
Jif         9   10   11   11   11   12   12   12
                          (two center numbers)
Skippy and Jif Median = 11
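The median rule on this slide (sort, then take the center value or average the two center values) might be sketched as follows. The function name `median` is chosen here for illustration; Python's standard library offers an equivalent in `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


skippy = [12, 10, 15, 9, 12, 8, 11, 11]
jif = [12, 9, 11, 12, 10, 12, 11, 11]

skippy_median = median(skippy)  # center values are 11 and 11
jif_median = median(jif)        # center values are 11 and 11
print(skippy_median, jif_median)  # prints 11.0 11.0
```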
Why Use A Median?
• Example: Sales Force Compensation

          Group 1   Group 2
          $60 K     $86 K
          $60 K     $88 K
          $60 K     $90 K
          $60 K     $92 K
          $210 K    $94 K
Mean      $90 K     $90 K
Median    $60 K     $90 K
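A short check of the compensation example, using Python's standard `statistics` module. Salaries are entered in thousands of dollars; the variable names are chosen here for illustration.

```python
from statistics import mean, median

# Salaries in thousands of dollars, as listed on the slide.
group_1 = [60, 60, 60, 60, 210]
group_2 = [86, 88, 90, 92, 94]

for name, salaries in (("Group 1", group_1), ("Group 2", group_2)):
    # Both groups have a mean of 90, but their medians differ.
    print(name, "mean:", mean(salaries), "median:", median(salaries))
```

The single $210 K salary pulls the Group 1 mean up to $90 K even though four of the five people earn $60 K. The median is not affected by that one extreme value.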
Summary Statistics
• Measurements of Variation
Range: Largest - Smallest
$R = \max_i x_i - \min_i x_i$
Standard Deviation: Square Root of Variance
$s = \sqrt{s^2}$
Variance: Average Squared Difference
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Summary Statistics
• Example: Yearly sales of two brands of peanut butter
Range: Largest - Smallest
Skippy: $R = 15 - 8 = 7$
Jif: $R = 12 - 9 = 3$
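The range needs only the largest and smallest values, so no sorting is required. A minimal sketch, with the helper name `value_range` chosen here for illustration:

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


skippy = [12, 10, 15, 9, 12, 8, 11, 11]
jif = [12, 9, 11, 12, 10, 12, 11, 11]

skippy_range = value_range(skippy)  # 15 - 8
jif_range = value_range(jif)        # 12 - 9
print(skippy_range, jif_range)  # prints 7 3
```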
Summary Statistics
• Example: Yearly sales of two brands of peanut butter
Variance: Average Squared Difference: Skippy Only
$s^2 = \frac{(8-11)^2 + (9-11)^2 + \dots + (15-11)^2}{7}$
$s^2 = \frac{(-3)^2 + (-2)^2 + \dots + 4^2}{7}$
$s^2 = \frac{9 + 4 + \dots + 16}{7}$
$s^2 = 4.57$
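The worked variance for Skippy can be reproduced step by step. This sketch follows the slide's arithmetic, dividing the sum of squared differences by n - 1 = 7; the function name `sample_variance` is chosen here for illustration.

```python
def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    center = sum(values) / n
    squared_diffs = [(x - center) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1)


skippy = [8, 9, 10, 11, 11, 12, 12, 15]

skippy_var = sample_variance(skippy)  # squared differences sum to 32; 32 / 7
print(round(skippy_var, 2))  # prints 4.57
```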
Summary Statistics
• Example: Yearly sales of two brands of peanut butter
Standard Deviation: Square Root of Variance
Skippy: $s = \sqrt{4.57} = 2.14$
Jif: $s = \sqrt{1.14} = 1.07$
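Python's standard `statistics` module provides `variance` and `stdev`, both of which use the same n - 1 divisor as the slides. A brief check of the two brands, with variable names chosen here for illustration:

```python
from statistics import stdev, variance

skippy = [8, 9, 10, 11, 11, 12, 12, 15]
jif = [9, 10, 11, 11, 11, 12, 12, 12]

skippy_sd = stdev(skippy)  # square root of the sample variance
jif_sd = stdev(jif)

print(round(variance(skippy), 2), round(skippy_sd, 2))  # prints 4.57 2.14
print(round(variance(jif), 2), round(jif_sd, 2))        # prints 1.14 1.07
```

Although both brands have the same mean and median, Skippy's standard deviation is about twice Jif's, which reflects its more variable yearly sales.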
Graphical Summary
• A Picture is Worth a Thousand Words (Bar Chart)
[Figure: bar chart of mean sales by YEAR, 1992 to 1999, for SKIPPY and JIF]
Summary Statistics
• A Year's Worth of Weekly Sales Figures

Week    Skippy    Jif
1       10.25     10.53
2        7.81      9.86
3       11.61     11.09
4       14.19     10.93
...     ...       ...
51       4.56     10.56
52      14.62     11.52
Summary Statistics
• Summary Statistics: Using SPSS
• Skippy Range = 16.94 - 4.56 = 12.38
• Jif Range = 14.07 - 9.06 = 5.01

Descriptive Statistics
                      N    Minimum   Maximum   Mean      Std. Deviation
SKIPPY                52   4.56      16.94     10.6885   3.0282
JIF                   52   9.06      14.07     11.0368    .9475
Valid N (listwise)    52
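The columns of the SPSS table correspond to simple computations. The sketch below is an illustration only: the `describe` helper is a name chosen here, and because the full 52-week series is not reproduced on the slides, it is applied to the eight yearly Skippy figures as a stand-in. The range check at the end uses the minimum and maximum reported in the table.

```python
from statistics import mean, stdev


def describe(values):
    """Return the same columns the SPSS Descriptive Statistics table reports."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": mean(values),
        "Std. Deviation": stdev(values),
    }


# Stand-in data: the yearly Skippy sales, NOT the 52 weekly values.
yearly_skippy = [12, 10, 15, 9, 12, 8, 11, 11]
summary = describe(yearly_skippy)
print(summary)

# Minimum and maximum as reported in the SPSS table for the weekly data.
spss_min_max = {"SKIPPY": (4.56, 16.94), "JIF": (9.06, 14.07)}
ranges = {brand: round(hi - lo, 2) for brand, (lo, hi) in spss_min_max.items()}
print(ranges)  # prints {'SKIPPY': 12.38, 'JIF': 5.01}
```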
Graphical Summary
• Bar Chart
[Figure: bar chart of mean sales by WEEK, 1 to 52, for SKIPPY and JIF]
Graphical Summary
• Line Chart
[Figure: line chart of mean sales by WEEK, 1 to 52, for SKIPPY and JIF]
Graphical Summary
• Histogram
[Figure: histogram of weekly SKIPPY sales; Mean = 10.7, Std. Dev = 3.03, N = 52]
[Figure: histogram of weekly JIF sales; Mean = 11.04, Std. Dev = .95, N = 52]
Graphical Summary
• The Box and Whisker Plot
[Figure: box-and-whisker plots of weekly sales for SKIPPY and JIF, N = 52 each]
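A box-and-whisker plot is drawn from a handful of order statistics: the extremes, the quartiles, and the median. The sketch below computes them with `statistics.quantiles` from the standard library. It is an illustration under two assumptions: the yearly Skippy figures stand in for the weekly data, which the slides do not list in full, and Python's default quartile method is used, which may differ slightly from the hinges SPSS draws.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, q2, q3 = quantiles(values, n=4)  # three cut points split the data in four
    return min(values), q1, q2, q3, max(values)


# Stand-in data: the yearly Skippy sales, NOT the 52 weekly values.
yearly_skippy = [12, 10, 15, 9, 12, 8, 11, 11]
summary = five_number_summary(yearly_skippy)
print(summary)
```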
Antidepressant Survey
• Questionnaire Administered to 178 Physicians Randomly Selected from 100,000 Physicians Who Prescribe Antidepressant Drugs
• Investigating Physician Usage of Antidepressant Medication
Questionnaire
Antidepressant Survey
Physician and Practice Characteristics
1. What is your primary medical specialty? (circle one only)
   Adult Psychiatry (1)
   Child/Adolescent Psychiatry (2)
   Family Practitioner (3)
   Forensic Psychiatry (4)
   General Practitioner (5)
   General Psychiatry (6)
   Internal Medicine (7)
   Neurology (8)
   Other (9)
2. How many years have you been in practice, post residency?
   Number years in practice: ________ (raw #)
Questionnaire
Drug Profile and Utilization
3. Please indicate approximately how many prescriptions you write for each of the following products in a typical month.
# of Rx’s in an average month
a. Celexa (raw #)
b. Effexor (raw #)
c. Luvox (raw #)
d. Paxil (raw #)
e. Prozac (raw #)
f. Serzone (raw #)
g. Wellbutrin (raw #)
h. Zoloft (raw #)
Summary Statistics
• Frequency Data (0/1 or 1 From Many)

                                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Adult Psychiatry                     24      13.5            13.8                 13.8
          Child/Adolescent Psychiatry           7       3.9             4.0                 17.8
          Family Practitioner                  85      47.8            48.9                 66.7
          General Practitioner                  1        .6              .6                 67.2
          General Psychiatry                   10       5.6             5.7                 73.0
          Internal Medicine                    47      26.4            27.0                100.0
          Total                               174      97.8           100.0
Missing   System                                4       2.2
Total                                         178     100.0
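The three percentage columns differ only in their denominators. Percent divides by all 178 respondents, Valid Percent divides by the 174 who answered, and Cumulative Percent is a running total of Valid Percent. A minimal sketch that rebuilds the columns from the frequencies in the table, with variable names chosen here for illustration:

```python
# Frequencies from the table, in the order SPSS lists them.
counts = {
    "Adult Psychiatry": 24,
    "Child/Adolescent Psychiatry": 7,
    "Family Practitioner": 85,
    "General Practitioner": 1,
    "General Psychiatry": 10,
    "Internal Medicine": 47,
}
missing = 4

valid_total = sum(counts.values())   # 174 physicians answered the question
grand_total = valid_total + missing  # 178 physicians surveyed

rows = []
running = 0
for specialty, count in counts.items():
    running += count
    rows.append((
        specialty,
        count,
        round(100 * count / grand_total, 1),    # Percent
        round(100 * count / valid_total, 1),    # Valid Percent
        round(100 * running / valid_total, 1),  # Cumulative Percent
    ))

for specialty, count, pct, valid_pct, cum_pct in rows:
    print(f"{specialty:<28}{count:>5}{pct:>8}{valid_pct:>8}{cum_pct:>8}")
```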
Graphical Summary
• Pie Chart
[Figure: pie chart of physician specialty with slices for Adult Psychiatry, Child/Adolescent Psychiatry, Family Practitioner, General Practitioner, General Psychiatry, Internal Medicine, and Missing]
Prescription Rates

Descriptive Statistics
                      N     Minimum   Maximum   Mean      Std. Deviation
CELEXA                178   .00        60.00     6.0646    8.9755
EFFEXOR               178   .00       100.00     7.2455   12.5082
LUVOX                 177   .00        35.00     2.0650    4.4136
PAXIL                 178   .00       100.00    11.2725   13.2902
PROZAC                178   .00       100.00    15.6264   17.8469
SERZONE               177   .00        80.00     4.7345    8.7678
WELLBUTR              178   .00        75.00     7.4876   11.2028
ZOLOFT                178   .00       100.00    10.7079   13.9228
Valid N (listwise)    177
Prescription Rates
[Figure: box plots of prescriptions per month for CELEXA, EFFEXOR, LUVOX, PAXIL, PROZAC, SERZONE, WELLBUTR, and ZOLOFT, N = 177 each, with outlying cases labeled by case number]
Prozac Rates by Physician Type
• First, Box Plot Summaries by Physician Type
• Second, Recode Data: High/Average/Low Prescription Rates
Prozac Rates by Physician Type
• Box Plot Summaries by Physician Type
[Figure: box plots of PROZAC prescriptions per month by D.TYPE, with outlying cases labeled by case number. Group sizes: Missing N = 4, Adult Psychiatry N = 24, Child/Adolescent Psychiatry N = 7, Family Practitioner N = 85, General Practitioner N = 1, General Psychiatry N = 10, Internal Medicine N = 47]
Prozac Rates by Physician Type
• Recode Data: High, Average, Low
– Low Rate = 0 to 10 prescriptions per month
– Average Rate = 10 to 20 prescriptions per month
– High Rate = 20+ prescriptions per month
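The recoding step might look like the sketch below. The slide's bins share the boundary values 10 and 20, and it does not say which level those values belong to. This sketch ASSUMES each level includes its lower bound (so 10 counts as Average and 20 counts as High); the function name `prozac_level` is also chosen here for illustration.

```python
def prozac_level(rx_per_month):
    """Recode a monthly prescription count into Low, Average, or High.

    Assumption (not stated on the slide): each bin includes its lower bound,
    so exactly 10 is Average and exactly 20 is High.
    """
    if rx_per_month < 10:
        return "Low"
    if rx_per_month < 20:
        return "Average"
    return "High"


for rate in (0, 9, 10, 19, 20, 100):
    print(rate, prozac_level(rate))
```

If the original analysis put the boundary values in the lower bin instead, the two comparisons would use `<=` rather than `<`.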
Cross Tabulating Data
• Create a Table Which Summarizes Number in Each Level

D.TYPE * PROZACLV Crosstabulation
Count
                                          PROZACLV
D.TYPE                          Low   Average   High   Total
Adult Psychiatry                  4         9     11      24
Child/Adolescent Psychiatry       1         4      2       7
Family Practitioner              58        14     13      85
General Practitioner                               1       1
General Psychiatry                4         2      4      10
Internal Medicine                33         7      7      47
Total                           100        36     38     174
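A cross tabulation counts how many observations fall into each combination of two categorical variables. The sketch below builds one with `collections.Counter`. The records are HYPOTHETICAL, made up here to keep the example small; they are not the survey data.

```python
from collections import Counter

# Hypothetical records, one (specialty, level) pair per physician. NOT survey data.
records = [
    ("Family Practitioner", "Low"),
    ("Family Practitioner", "Low"),
    ("Family Practitioner", "High"),
    ("Adult Psychiatry", "Average"),
    ("Adult Psychiatry", "High"),
    ("Internal Medicine", "Low"),
]
levels = ["Low", "Average", "High"]

cell_counts = Counter(records)                               # one count per table cell
row_totals = Counter(specialty for specialty, _ in records)  # right-hand Total column
column_totals = Counter(level for _, level in records)       # bottom Total row

for specialty in sorted(row_totals):
    cells = [cell_counts[(specialty, level)] for level in levels]
    print(specialty, cells, row_totals[specialty])
print("Total", [column_totals[level] for level in levels], len(records))
```

A `Counter` returns 0 for a combination that never occurs, which corresponds to the empty cells in the SPSS output.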
Graphing the Cross Tabulation
• Same Information Can be Summarized Using a Bar Plot
[Figure: clustered bar plot of Count by D.TYPE (Missing, Adult Psychiatry, Child/Adolescent Psychiatry, Family Practitioner, General Practitioner, General Psychiatry, Internal Medicine), with one bar per PROZACLV level: Low, Average, High]
Next Class Period in Computer Lab
• Don’t forget: Next Period 11&14 BAB, from 7:15 p.m. to 9:00 p.m. We will not meet at the regular class time during the day.
• Also, please bring a floppy disk to class, to save your work.