
Upload: ferdinand-byrd

Post on 04-Jan-2016


Chapter 11

Univariate Data Analysis; Descriptive Statistics

These are summary measurements of a single variable.

I. Averages or measures of central tendency – describes a dataset.

A. Three kinds: mean, median, mode.

1. Mean: most common. Sum all the values in a group, divide by the total number of values in that group (Hint: start listing them in columns/headings).

$$\bar{X} = \frac{\sum X}{n}$$

where $\bar{X}$ is the mean, $\sum X$ is the sum of all the values of X, and $n$ is the number of values (or cases).


Weighted Mean: Multiply each value by its frequency. Sum. Divide by total frequency.
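As a rough illustration of the two recipes above, here is a minimal Python sketch. The scores and the frequency table are made up for the example; they are not from the chapter.

```python
# Hypothetical scores, invented for illustration.
scores = [3, 4, 3, 5, 4, 6, 2, 3]

# Mean: sum all the values, divide by the number of values.
simple_mean = sum(scores) / len(scores)

# Weighted mean: multiply each value by its frequency, sum,
# then divide by the total frequency.
frequencies = {2: 1, 3: 3, 4: 2, 5: 1, 6: 1}  # value -> how often it occurs
weighted_mean = (
    sum(value * freq for value, freq in frequencies.items())
    / sum(frequencies.values())
)

print(simple_mean, weighted_mean)
```

The frequency table describes the same eight scores, so both routes should give the same answer.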

2. Median: the mean is very sensitive to outlier scores that skew the distribution; median is not. It is the midpoint value.

Instructions: order all values. Find the middle-most score. That’s the median (if even number of cases, find middle-most two values; add them, divide by two).

Percentiles: 50th percentile is the median. 75th percentile means score is at or above 75% of the other scores.
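The median instructions can be sketched in a few lines of Python. The income figures are hypothetical and only meant to show how one outlier moves the mean but not the median.

```python
def median(values):
    """Order the values and return the middle-most one.

    With an even number of cases, average the two middle-most values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical incomes (in thousands), invented for illustration.
incomes = [20, 25, 30, 35, 40]
with_outlier = [20, 25, 30, 35, 400]  # top value replaced by an extreme score

for data in (incomes, with_outlier):
    print("mean:", sum(data) / len(data), "median:", median(data))
```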

3. Mode: most frequent value.

B. When to use what.

1. Three kinds of data

a. Nominal – categorical data (race, region).

b. Ordinal – values are ranked, but not necessarily equal in distance (7 values indicating GOP support).

c. Interval – values are equal in distance (income).

2. Use mean for interval (and sometimes ordinal). Use mode for nominal (and sometimes ordinal). Use median for interval if you think there are outliers.
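For nominal data the mode is the only one of the three averages that makes sense, since categories cannot be summed or ranked. A small sketch, using an invented list of regions:

```python
from collections import Counter
from statistics import multimode

# Hypothetical nominal data, invented for illustration.
regions = ["South", "West", "South", "Midwest", "Northeast", "South", "West"]

# Count each category; the mode is the most frequent one.
counts = Counter(regions)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
print(multimode(regions))  # lists every value tied for most frequent
```

`multimode` is handy when two or more categories tie, which a single "mode" would hide.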


II. Variability – how much scores differ from one another.

Which set of scores has greater variability?

Set 1: 8,9,5,2,1,3,1,9

Set 2: 3,4,3,5,4,6,2,3

Means are Set 1: 4.75 and Set 2: 3.75. Tells us nothing of variability.

Variability is more precisely how different/far scores are from the mean.

III. Computing the Range

Subtract the lowest score from the highest (r=h-l)

What is the range of these scores? 98,86,77,56,48

Answer: 50 (98-48=50)

IV. Computing the Standard Deviation

The standard deviation (s) is the average amount of variability in a set of scores (average distance from mean).
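To see the difference between Set 1 and Set 2 in numbers, the sketch below computes the mean, range, and sample standard deviation of each. It relies on Python's `statistics.stdev`, which divides by n - 1; the helper name `value_range` is just a label chosen for this example.

```python
from statistics import mean, stdev

set_1 = [8, 9, 5, 2, 1, 3, 1, 9]
set_2 = [3, 4, 3, 5, 4, 6, 2, 3]


def value_range(scores):
    """Range: highest score minus lowest score (r = h - l)."""
    return max(scores) - min(scores)


for name, scores in (("Set 1", set_1), ("Set 2", set_2)):
    print(name, "mean:", mean(scores),
          "range:", value_range(scores),
          "s:", round(stdev(scores), 2))

print(value_range([98, 86, 77, 56, 48]))  # the range exercise above
```

Set 1 should come out with both the larger range and the larger standard deviation, even though the two means are close.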


A. Formula:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

where $X$ is each individual score, $\bar{X}$ is the mean of all the scores, and $n$ is the number of scores.

Compute s for the following:

5,8,5,4,6,7,8,8,3,6

So, an s of 1.76 tells us that each score differs from the mean by an average of 1.76 points.

B. Purpose: to compare scores between different distributions, even when the means and standard deviations are different (e.g., men and women). The larger the s, the greater the variability.
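The computation of s for the ten scores above can be followed step by step in this short sketch. It assumes the sample formula with n - 1 in the denominator and checks the result against `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

n = len(scores)
mean = sum(scores) / n                                   # step 1: the mean
squared_deviations = [(x - mean) ** 2 for x in scores]   # step 2: (X - mean)^2
sum_of_squares = sum(squared_deviations)                 # step 3: add them up
s = sqrt(sum_of_squares / (n - 1))                       # step 4: divide by n - 1, take the root

print(mean, sum_of_squares, round(s, 2))
print(round(stdev(scores), 2))  # library result, for comparison
```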


V. Graphing and Tables. Why? Describes data visually, more clearly.

Frequency Distribution (Table 11-4)

A. Class Interval Column – divides the scores up into categories (0-4, 5-9, etc.). Usually range of 2, 5, 10, or 25 data points. Main thing: be consistent!

B. Frequency Column – number of scores within that range or category.
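A frequency distribution like the one described can be built with a few lines of Python. This is only a sketch: the scores are invented, the function name `frequency_distribution` is hypothetical, and it assumes non-negative whole-number scores with a class interval of 5.

```python
from collections import Counter


def frequency_distribution(scores, width=5):
    """Return (class interval, frequency) pairs, e.g. ("0-4", 2).

    Every interval has the same width, including empty ones in the middle.
    """
    # Map each score to the lower bound of its class interval.
    counts = Counter((score // width) * width for score in scores)
    lows = range(min(counts), max(counts) + width, width)
    return [(f"{low}-{low + width - 1}", counts[low]) for low in lows]


# Hypothetical scores, invented for illustration.
scores = [2, 7, 8, 11, 12, 13, 14, 16, 3, 9]

for interval, frequency in frequency_distribution(scores):
    print(interval, frequency)
```

Keeping the width fixed for every row is the "be consistent" point: rows are only comparable when each interval covers the same number of data points.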

VI. Graphs

A. Histogram – shows the distribution of scores by class interval. Can compare different distributions on the same histogram. Shows:

1. Variability

2. Skewness – If the mean is greater than the median, positive skewness. If the median is greater than the mean, negative skewness.

[Figure: Central Tendency and Variability – relative frequency curve, labeled "Centre"]

[Figure: Central Tendency and Variability – relative frequency curve, labeled "Spread"]

Skewness

[Figure: relative frequency curve with the mean and median marked]

If the data set is symmetric, the mean equals the median.

Skewness

[Figure: curve with the mean and median marked]

If the data set is skewed to the right, the mean is greater than the median.

Skewness

[Figure: curve with the mean and median marked]

If the data set is skewed to the left, the mean is less than the median.
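The mean-versus-median rule from the skewness slides can be tried out on small data sets. The three lists and the function name `skew_direction` are invented for this sketch. Treat the comparison as a rule of thumb: equal mean and median is what a symmetric data set gives, but it does not by itself prove symmetry.

```python
from statistics import mean, median


def skew_direction(scores):
    """Compare mean and median, following the rule of thumb in the notes."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive (right) skew"
    if m < md:
        return "negative (left) skew"
    return "mean equals median"


# Hypothetical data sets, invented for illustration.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]
symmetric = [1, 2, 3, 4, 5]

for data in (right_tail, left_tail, symmetric):
    print(mean(data), median(data), skew_direction(data))
```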

B. Column Charts – simply show the quantity of each category according to some scale. SCALE IS IMPORTANT (CSPAN-drug use story).

C. Bar Charts – same as column charts, but with the axes reversed.

D. Line Chart – Used to show trends (e.g. rise and fall in presidential popularity – line on page 317).

E. Pie Charts – Great for proportions (percent of MS budget going to each budget category).

[Figure: Line Graph]

VII. The Normal Curve and Probability Theory

A. Tells us likelihood of an outcome

B. Tells us degree of confidence in a finding or outcome (i.e., how sure are we that the observed outcome is due to X versus random chance? AND how likely is it that our research hypothesis is true?).

VIII. Normal Curve or Bell-Shaped Curve Properties (Fig. 11-6)

A. Mean, median, and mode are the same; the curve is NOT skewed.


B. Perfectly symmetrical about the mean (i.e., two halves fit perfectly together).

C. Tails of the normal curve are asymptotic. Curves come close, but never touch the horizontal axis.

Are curves usually normal? Often approximately so, especially with large sets of data (more than 30). Most scores are concentrated in the center and few are concentrated at the ends (height, intelligence, coin flipping).


IX. Divisions of the Normal Curve (Fig. 11-9)

A. Mean is at the center

B. Scores along x-axis correspond to standard deviations.

C. Sections within the bell curve represent % of cases expected to fall therein. Geometrically true (these are percentages of entire normal distribution).

D. For normal distributions (most data sets), practically all scores fall in between +3 and -3 sd’s (99.74%). Look at the probabilities of falling in between. 34.13% x 2 = 68.26% of cases fall within 1 to -1 sd’s from mean.
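The percentages under the curve can be checked with Python's built-in `statistics.NormalDist` (available from Python 3.8). This is a sketch for a standard normal curve, with the helper name `share_within` chosen for the example. The exact area within 3 sd's is about 99.73%; the 99.74% quoted above is what you get from adding rounded table entries.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def share_within(k):
    """Share of cases between -k and +k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


print(round((z.cdf(1) - z.cdf(0)) * 100, 2))  # mean to +1 sd
print(round(share_within(1) * 100, 2))        # -1 sd to +1 sd
print(round(share_within(3) * 100, 2))        # -3 sd to +3 sd
```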


X. Z-scores (standard scores; i.e. the # of standard deviations from the mean)

A. Allow us to compare distributions with one another because they are scores that are standardized in units of standard deviations (can’t compare scores if they are measured differently; nonsensical). Different variables or groups will have different means and cannot be compared. But z-scores between groups of data can be compared because they are equivalent (e.g., one unit above or below the mean, respectively).


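B. Formula and interpretation

$$z = \frac{X - \bar{X}}{s}$$

where $X$ is the raw score, $\bar{X}$ is the mean of the distribution, and $s$ is its standard deviation. A z-score of 1 is one standard deviation above the mean; a z-score of -1 is one standard deviation below it.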

XI. Comparing z-scores from different distributions (p. 158 example).

-The raw scores of 12.8 and 64.8 in our data are equal distances from their respective means (z=.4 for both)
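A quick sketch of that comparison follows. The notes do not give the means and standard deviations of the two distributions, so the values used here (mean 12 with sd 2, and mean 58 with sd 17) are assumptions picked so that both raw scores land at z = .4.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Assumed parameters, not taken from the notes.
z_a = z_score(12.8, mean=12, sd=2)
z_b = z_score(64.8, mean=58, sd=17)

print(round(z_a, 2), round(z_b, 2))
```

The raw scores look nothing alike, yet on the standardized scale they sit at the same relative position in their own distributions.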

XII. What z-scores represent

A. Z-scores correspond to sections under the curve (percentages under the curve).


B. These percentages can be seen as probabilities of a certain score occurring, given in Appendix D.

Example of what we are saying:

“In a distribution with a mean of 100 and standard deviation of 10, what is the probability that any score will be 110 or above?”

The answer = _________.

C. What about a z-score of 1.38? What are the chances that a score will fall between the mean and a z-score of 1.38? _______

• What about above a z-score of 1.38?____

• What about at or below 1.38?______


• What about between a z-score of 1 and 2.5? Answer:______ (look at picture 11-9)

Again, we are asking: what is the probability that a score will fall between 1 and 2.5 standard deviations (z’s) above the mean? What about between -1 and 2.5?
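Questions like these are normally answered from a table such as Appendix D. As a way to check table lookups, the sketch below computes the same areas with `statistics.NormalDist`, assuming the scores are exactly normally distributed. The variable names are chosen for this example.

```python
from statistics import NormalDist

std = NormalDist()                       # z-scores: mean 0, sd 1
scores = NormalDist(mu=100, sigma=10)    # the distribution in the example

# cdf(x) is the share of cases at or below x, so "above" is 1 - cdf(x).
p_110_or_above = 1 - scores.cdf(110)
p_mean_to_138 = std.cdf(1.38) - std.cdf(0)
p_above_138 = 1 - std.cdf(1.38)
p_at_or_below_138 = std.cdf(1.38)
p_1_to_25 = std.cdf(2.5) - std.cdf(1)
p_minus1_to_25 = std.cdf(2.5) - std.cdf(-1)

for label, p in [
    ("110 or above", p_110_or_above),
    ("mean to z = 1.38", p_mean_to_138),
    ("above z = 1.38", p_above_138),
    ("at or below z = 1.38", p_at_or_below_138),
    ("z = 1 to z = 2.5", p_1_to_25),
    ("z = -1 to z = 2.5", p_minus1_to_25),
]:
    print(f"{label}: {p:.4f}")
```

A score of 110 is one standard deviation above a mean of 100, so the first question is the same as asking for the area above z = 1.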