Quantitative Data Analysis: Univariate (cont’d) & Bivariate Statistics
Neuman and Robson Chapter 11.
Research Data library at SFU: http://www.sfu.ca/rdl/
Class Session Activities
• Quiz 2
• More on Univariate Statistics
• Begin Bivariate Statistics
• If time:
  – Hans Rosling on Using Empirical Research to Understand World Change: http://www.youtube.com/watch?v=hVimVzgtD6w
  – Hans Rosling: “Let my data set change your mind set”: http://www.youtube.com/watch?v=KVhWqwnZ1eM&feature=related
Recall: Univariate Statistics
• Frequency distributions: explore each variable in a data set separately to see the pattern of responses
• Measures of central tendency of the values (mean, median, mode)
• Measures of variation or dispersion (range, percentiles, standard deviation, z-scores)
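The summaries listed above can be computed with Python's standard-library `statistics` module. This is a minimal sketch. The ten severity ratings are made up for illustration and are not the county-worker data used in the slides. `statistics.stdev` uses the sample (n - 1) formula, and `statistics.pstdev` would give the population version.

```python
import statistics

# Hypothetical severity-of-injury ratings for ten cases (illustrative only;
# these are NOT the values from the county-worker example).
ratings = [3, 5, 2, 5, 6, 1, 5, 4, 9, 3]

# Central tendency
mean = statistics.mean(ratings)
median = statistics.median(ratings)
mode = statistics.mode(ratings)

# Variation
value_range = max(ratings) - min(ratings)
sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)

# z-score: how many standard deviations each value lies above (+) or below (-) the mean
z_scores = [(x - mean) / sd for x in ratings]

print(f"mean={mean}, median={median}, mode={mode}, range={value_range}")
print(f"sd={sd:.2f}")
print([round(z, 2) for z in z_scores])
```

With these made-up values the mean (4.3) and median (4.5) are close together, while the single rating of 9 gets the largest z-score.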
Studying Frequency Distributions
• Raw data: obtain a printout of the raw data for all the variables.
• The printout resembles a matrix, with the variable names heading the columns and the information for each case or record displayed across the rows.
Source (for next examples): http://www.csulb.edu/~msaintg/ppa696/696uni.htm
Example: Raw data for a study of injuries among county workers (first 10 cases)
Raw data is difficult to grasp, especially with a large number of cases or records.
To present the information in a more organized format, start with univariate descriptive statistics for each variable. Example: the variable “Severity of Injury”.
Frequency Distribution for “Severity of Injury”
• Obtain a frequency distribution of the data for the variable:
  – Identify the lowest and highest values of the variable.
  – Put all the values of the variable in order from lowest to highest.
  – Count the number of appearances of each value of the variable. This is a count of the frequency with which each value occurs in the data set.
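The three steps above can be sketched in Python with `collections.Counter`. The ratings are hypothetical, chosen only for illustration. Walking through every value from lowest to highest, and not only the values that occur, makes zero-frequency values visible.

```python
from collections import Counter

# Hypothetical severity ratings (illustrative only, not the source data).
ratings = [3, 5, 2, 5, 6, 1, 5, 4, 9, 3]

# Step 1: identify the lowest and highest values.
lowest, highest = min(ratings), max(ratings)

# Step 3: count how often each value appears.
counts = Counter(ratings)

# Step 2: list every value in order from lowest to highest;
# values that never occur get a frequency of 0.
frequency_table = [(value, counts[value]) for value in range(lowest, highest + 1)]

print(f"{'Value':>5} {'Freq':>5}")
for value, freq in frequency_table:
    print(f"{value:>5} {freq:>5}")
```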
Grouped Data
• Decide whether the data should be grouped into classes.
  – Example: the severity-of-injury ratings can be collapsed into just a few categories or groups.
  – Grouped data usually has from 3 to 7 groups.
  – There should be no groups with a frequency of zero (in this example, there are no injuries with a severity rating of 7 or 8).
• Ways to construct groups:
  – equal class intervals (e.g., 1-3, 4-6, 7-9);
  – approximately equal numbers of observations in each group.
• Remember that class intervals must be both mutually exclusive and exhaustive.
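The sketch below groups values into the equal class intervals 1-3, 4-6 and 7-9 from the example above. It also enforces the rule that intervals must be mutually exclusive and exhaustive: a value that falls in zero intervals, or in more than one, raises an error. The ratings and the helper name `interval_label` are hypothetical.

```python
from collections import Counter

# Hypothetical ratings on a 1-9 scale (illustrative only).
ratings = [3, 5, 2, 5, 6, 1, 5, 4, 9, 3]

# Equal class intervals, as in the 1-3, 4-6, 7-9 example.
intervals = [(1, 3), (4, 6), (7, 9)]

def interval_label(value):
    """Return the single class interval that contains value (hypothetical helper)."""
    matches = [f"{lo}-{hi}" for lo, hi in intervals if lo <= value <= hi]
    # Exhaustive: at least one match. Mutually exclusive: no more than one.
    if len(matches) != 1:
        raise ValueError(f"{value} falls in {len(matches)} intervals")
    return matches[0]

grouped = Counter(interval_label(r) for r in ratings)
for lo, hi in intervals:
    label = f"{lo}-{hi}"
    print(label, grouped[label])
```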
Caution: Grouping Response Categories
• Grouping creates new categories
• It can make trends easier to analyze
• But grouping decisions affect how patterns are interpreted
Cumulative Frequency Distributions
• Include a third column in the table showing the running total of the frequencies (this can be done with either simple frequency distributions or grouped data).
• How many injuries were at level 5 or lower? Answer = 7
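A cumulative column is a running total of the frequency column, which `itertools.accumulate` produces directly. This is a sketch on hypothetical ratings. Because the data are made up, the count at level 5 or lower comes out as 8 here rather than the 7 in the slide's example.

```python
from collections import Counter
from itertools import accumulate

ratings = [3, 5, 2, 5, 6, 1, 5, 4, 9, 3]  # hypothetical data

values = list(range(min(ratings), max(ratings) + 1))
counts = Counter(ratings)
freqs = [counts[v] for v in values]
cumulative = list(accumulate(freqs))  # running total: the third column of the table

for v, f, c in zip(values, freqs, cumulative):
    print(f"{v:>5} {f:>5} {c:>5}")

# "How many injuries were at level 5 or lower?" is the cumulative count at value 5.
at_or_below_5 = cumulative[values.index(5)]
print(at_or_below_5)
```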
Percentaged Frequency Distributions
• Frequencies can also be presented as percentage distributions and cumulative percentages.
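Percentages divide each frequency by the total number of cases. Cumulative percentages are their running total, which should end at 100%. This sketch uses hypothetical ratings and lists only the values that actually occur.

```python
from collections import Counter
from itertools import accumulate

ratings = [3, 5, 2, 5, 6, 1, 5, 4, 9, 3]  # hypothetical data
n = len(ratings)

values = sorted(set(ratings))  # only values that actually occur
counts = Counter(ratings)
percents = [100 * counts[v] / n for v in values]
cumulative_percents = list(accumulate(percents))

for v, p, cp in zip(values, percents, cumulative_percents):
    print(f"{v:>5} {p:>6.1f}% {cp:>6.1f}%")
```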
Why Graph?
• A way of visually presenting data, to:
  – present the data
  – summarize the data
  – enhance textual descriptions
  – describe and explore the data
  – make comparisons easy
  – avoid distortion
  – provoke thought about the data
Bar Graphs (Bar Charts)
• Used to display frequency distributions for variables measured at the nominal and ordinal levels.
• Use the same width for all the bars, with space between bars.
• Label the parts of the graph, including the title, the vertical (Y) axis on the left, the horizontal (X) axis along the bottom, and the bar labels.
Another Bar Graph
Histograms
• Used for interval- and ratio-level variables.
• The width of the bar is important, since it is the total area under the bar that represents the proportion of the phenomenon accounted for by each category.
• The bars convey the relationship of one group or class of the variable to the other(s).
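To make area, not height, represent each class's proportion, a bar's height can be set to its frequency divided by (n × class width), often called a density scale. This sketch uses hypothetical injury rates and deliberately unequal class widths. It shows that the wide 2.0-4.0 bar is not drawn twice as tall just because it covers twice the range.

```python
# Hypothetical injury rates per 1,000 workers (illustrative only).
rates = [0.4, 1.1, 1.8, 2.2, 2.5, 2.9, 3.3, 3.8, 4.4, 5.6]

# Half-open class intervals [lower, upper), with unequal widths on purpose.
bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
n = len(rates)

histogram = []
for lower, upper in bins:
    count = sum(1 for r in rates if lower <= r < upper)
    width = upper - lower
    height = count / (n * width)  # bar height on a density scale
    histogram.append((lower, upper, count, height))

for lower, upper, count, height in histogram:
    area = height * (upper - lower)  # area = proportion of cases in this class
    print(f"[{lower}, {upper}) count={count} height={height:.3f} area={area:.2f}")

# The bar areas add up to 1, i.e. the whole distribution.
total_area = sum(height * (upper - lower) for lower, upper, _, height in histogram)
```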
Histogram Example
• In the case of the counties and employee injuries, we might have information on the rate of injury according to the number of workers in each county in State X.
Grouping Categories (Histograms)
• If we group injury rates into three groups:
  – a low rate of injury would be 0.0-1.9 injuries per 1,000 workers;
  – moderate would be 2.0-3.9;
  – high would be 4.0 and above (in this case, up to 5.9).
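The low/moderate/high scheme above can be written as cut points. This sketch expresses the boundaries as "below 2.0" and "below 4.0" rather than the printed 0.0-1.9 and 2.0-3.9. A rate such as 1.95, which falls between the printed ranges, then still lands in exactly one group, so the grouping stays exhaustive. The function name and the rates are hypothetical.

```python
from collections import Counter

def injury_rate_group(rate):
    """Group an injury rate (per 1,000 workers) using the slide's cut points (hypothetical helper)."""
    if rate < 0:
        raise ValueError("rate cannot be negative")
    if rate < 2.0:
        return "low"       # 0.0-1.9
    if rate < 4.0:
        return "moderate"  # 2.0-3.9
    return "high"          # 4.0 and above

rates = [0.4, 1.1, 1.8, 2.2, 2.5, 2.9, 3.3, 3.8, 4.4, 5.6]  # hypothetical
groups = Counter(injury_rate_group(r) for r in rates)
print(groups)
```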
Frequency Polygon
• Another way of displaying information for an interval- or ratio-level variable.
• Also used for time series graphs, which show changes in rates over time.
Graph of Frequency Distribution (Univariate)
Pie Chart
• Another way to show the relationships between classes or categories.
• Each "slice" represents the proportion of the total phenomenon that is due to each of the classes or groups.
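Because each slice stands for a proportion of the whole, its angle is that proportion of 360 degrees. This sketch uses hypothetical group counts to compute the share and angle of each slice.

```python
# Hypothetical grouped counts (illustrative only; not from the source).
group_counts = {"low": 3, "moderate": 5, "high": 2}
total = sum(group_counts.values())

angles = {}
for group, count in group_counts.items():
    share = count / total                 # proportion of the whole
    angles[group] = 360 * count / total   # degrees of the circle for this slice
    print(f"{group:<9} {share:6.1%} {angles[group]:6.1f} degrees")
```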
Another visual representation of a distribution: pie charts
Bivariate Statistics (relations between 2 variables)
• After examining the univariate frequency distribution of each variable separately, study the joint occurrence and distribution of the values of the independent and dependent variables together.
• The joint distribution of two variables is called a bivariate distribution.
Contingency Tables (Cross-tabulations)
• A contingency table shows the frequency distribution of the values of the dependent variable, given the occurrence of the values of the independent variable.
• Both variables must be grouped into a finite number of categories (usually no more than 2 or 3 categories), such as low, medium, or high; positive, neutral, or negative; male or female; etc.
Features of a Contingency Table
1. Title
2. Categories of the independent variable head the tops of the columns
3. Categories of the dependent variable label the rows
4. Order the categories of the two variables from lowest to highest (from left to right across the columns; from top to bottom along the rows). (Usually, but not always.)
5. Show totals at the foot of the columns
Basic Terminology (Tables)
• Parts of a table:
  – title (conventions):
    • order of naming of variables
    • dependent, independent, control
  – body, cell, column, row
  – “marginals”
• sources, date
Bivariate Statistics: Parts of the Table
Constructing a Contingency Table
1. If the variables are not already divided into categories, decide how to group the data.
2. Obtain a frequency distribution for the values of the independent variable.
3. Obtain a frequency distribution for the values of the dependent variable.
4. Obtain the frequency distribution of the values of the dependent variable, given the values of the independent variable (either by tabulating the raw data or from a computer program).
5. Display the results of step 4 in a table.
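The five steps above can be sketched in plain Python by counting (independent, dependent) pairs. The records are hypothetical and are not the counts behind Table 1. As recommended for contingency tables, the independent variable (area of residence) heads the columns, the dependent variable (attitude) labels the rows, and totals are shown at the foot of the columns.

```python
from collections import Counter

# Hypothetical survey records: (area of residence, attitude toward consolidation).
# Illustrative only; these are not the counts behind Table 1.
records = [
    ("city", "for"), ("city", "for"), ("city", "against"), ("city", "no opinion"),
    ("city", "for"), ("non-city", "against"), ("non-city", "for"),
    ("non-city", "against"), ("non-city", "no opinion"), ("non-city", "for"),
]

independent_categories = ["city", "non-city"]            # columns
dependent_categories = ["for", "no opinion", "against"]  # rows

# Steps 2-3: univariate frequency distributions (the table's marginals).
iv_freq = Counter(iv for iv, _ in records)
dv_freq = Counter(dv for _, dv in records)

# Step 4: joint frequencies of the dependent variable given the independent variable.
cells = Counter(records)

# Step 5: display the table, with row totals on the right and column totals at the foot.
print(f"{'':<12}" + "".join(f"{c:>10}" for c in independent_categories) + f"{'Total':>10}")
for dv in dependent_categories:
    row = "".join(f"{cells[(iv, dv)]:>10}" for iv in independent_categories)
    print(f"{dv:<12}{row}{dv_freq[dv]:>10}")
print(f"{'Total':<12}" + "".join(f"{iv_freq[iv]:>10}" for iv in independent_categories)
      + f"{len(records):>10}")
```

If pandas is available, `pandas.crosstab` with `margins=True` builds the same kind of table.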
Table 1. Attitudes toward Consolidation by Area of Residence
Interpreting a Contingency Table
• Inspect the contingency table for patterns. (This is difficult if there are different totals of observations in the different categories of the independent variable.)
Interpreting a Contingency Table
• Convert the observations in each cell to a percentage of the column total;
• be sure to still show the total number of observations for each column on which the percentages are based. (N= total number per column)
• Compare the percentages across the columns (the categories of the independent variable) within each row (each category of the dependent variable).
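Percentaging within columns removes the effect of unequal column totals. This is a minimal sketch with hypothetical counts, not the Table 1 data. The columns have different totals (N = 200 and N = 80), so the raw counts are hard to compare, but the column percentages can be read across each row. Each column keeps its N so the base of the percentages stays visible.

```python
# Hypothetical cell counts: outer keys are columns (independent variable),
# inner keys are rows (dependent variable). Not the counts behind Table 1.
table = {
    "city":     {"for": 110, "no opinion": 50, "against": 40},
    "non-city": {"for": 28,  "no opinion": 20, "against": 32},
}
rows = ["for", "no opinion", "against"]

percentaged = {}
for column, cells in table.items():
    n = sum(cells.values())  # column total (N) on which the percentages are based
    percentaged[column] = {row: 100 * cells[row] / n for row in rows}
    percentaged[column]["N"] = n

# Read across each row: compare the columns within one category of the dependent variable.
for row in rows:
    print(f"{row:<11}" + "".join(f"{percentaged[c][row]:>9.0f}%" for c in table))
print(f"{'N':<11}" + "".join(f"{percentaged[c]['N']:>10}" for c in table))
```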
Percentaged Contingency Table (example)
Table 1b: Attitudes toward Consolidation by Area of Residence
Interpreting a Contingency Table
Table 1. Attitudes toward Consolidation by Area of Residence
Description: More city residents (54%) than non-city residents (37%) are for consolidation. Conversely, more non-city residents (39%) than city residents (19%) are against consolidation. About the same percentage of both groups have no opinion about consolidation.
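The comparison in this description can be made explicit as percentage-point differences between the two columns. The sketch uses only the percentages quoted in the text. It assumes the three response categories are exhaustive, so the "no opinion" share is taken as the remainder of 100%. The original table may differ slightly because of rounding.

```python
# Column percentages quoted in the text for Table 1.
city = {"for": 54, "against": 19}
non_city = {"for": 37, "against": 39}

# Assumption: "for", "against" and "no opinion" are the only categories,
# so "no opinion" is whatever remains of 100% (subject to rounding).
city["no opinion"] = 100 - city["for"] - city["against"]
non_city["no opinion"] = 100 - non_city["for"] - non_city["against"]

# Percentage-point difference (city minus non-city) for each row.
differences = {row: city[row] - non_city[row] for row in ("for", "against", "no opinion")}
print(differences)  # {'for': 17, 'against': -20, 'no opinion': 3}
```

The gaps of 17 and 20 points on "for" and "against", compared with 3 points on "no opinion", are what the description summarizes.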
Grouping Categories (Collapsing Categories): U.N. Example
Babbie, E. (1995). The Practice of Social Research. Belmont, CA: Wadsworth.
Collapsing Categories & omitting missing data
Babbie, E. (1995). The Practice of Social Research. Belmont, CA: Wadsworth.
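Collapsing categories and omitting missing data both change what the percentages mean. This sketch uses hypothetical responses and a hypothetical collapsing scheme; it does not reproduce Babbie's U.N. table. Treating "don't know" as non-substantive and dropping it along with missing answers (`None`) is an illustrative choice. Note how the base changes from all 10 responses to the 7 valid ones.

```python
from collections import Counter

# Hypothetical responses; None marks missing data.
responses = ["strongly approve", "approve", "disapprove", None, "approve",
             "strongly disapprove", "don't know", "approve", None, "strongly approve"]

# Hypothetical collapsing scheme: merge the intensity levels.
collapse = {
    "strongly approve": "approve",
    "approve": "approve",
    "disapprove": "disapprove",
    "strongly disapprove": "disapprove",
}

# Omit missing data (None) and "don't know" by keeping only answers in the scheme.
valid = [collapse[r] for r in responses if r in collapse]
counts = Counter(valid)

# Percentages based on valid answers only...
percent_of_valid = {k: 100 * v / len(valid) for k, v in counts.items()}
# ...versus the same count taken as a share of all responses.
percent_of_all = 100 * counts["approve"] / len(responses)

print(counts, {k: round(p, 1) for k, p in percent_of_valid.items()}, percent_of_all)
```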
Types of Relationships or Associations between two variables
– Correlation (or covariation): when two variables "vary together"
  • a type of association
  • not necessarily causal
  • can be in the same direction (positive correlation or direct relationship)
  • can be in opposite directions (negative correlation or inverse relationship)
– Independence
  • no correlation, no relationship
  • knowing a case's value on one variable tells us nothing about its value on the other variable
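One common way to quantify covariation between two interval- or ratio-level variables is Pearson's correlation coefficient r. It ranges from -1 (perfect inverse) through 0 (no linear relationship) to +1 (perfect positive). The sketch below implements r from its definition and applies it to three hypothetical series. Like any correlation, r says nothing about causation. It also captures only linear association.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data (illustrative only).
x = [1, 2, 3, 4, 5]
positive = [10, 20, 30, 40, 50]   # rises as x rises
negative = [50, 40, 30, 20, 10]   # falls as x rises
unrelated = [30, 10, 50, 10, 30]  # no linear pattern

for label, ys in [("positive", positive), ("negative", negative), ("unrelated", unrelated)]:
    print(label, round(pearson_r(x, ys), 2))
```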
What is an association between two variables?
• Can the value of one variable be predicted, if we know the value of the other variable?
• Example: half the people participating in training programs get a job. What is the likelihood of any one participant getting a job? About fifty-fifty. So we would not be very good at predicting whether people will get jobs or not.
• If we introduce a second variable (e.g., length of time in training), does it help us to be more accurate in our predictions of the likelihood that someone will get a job?
Two Variables
• Dependent variable: obtaining a job (no job = 100; gets a job = 100)
• Independent variable: length of training program (short = 100; long = 100)
Bivariate Distribution--Perfect Positive Relationship (If training is good for getting a job)
If we know the length of the training program, we can perfectly predict the likelihood of getting a job. The longer the training program, the more likely the participant is to get a job and, conversely, the shorter the training program the less likely the participant is to get a job.
Bivariate Distribution--Perfect Inverse Relationship
• If we know the length of the training program, we can perfectly predict the likelihood of getting a job. The longer the training program, the less likely the participant is to get a job and, conversely, the shorter the training program the more likely the participant is to get a job. That is, as the training program length increases, likelihood of obtaining a job decreases.
Bivariate Distribution--No Relationship
• (If training has no relationship with getting a job)
• A 50/50 guess: knowing the length of the training program does not help to predict the likelihood of getting a job.
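The three bivariate distributions can be compared by asking how often we would guess correctly. The cell counts below follow from the marginals on the slides (100 short and 100 long; 100 with no job and 100 with a job) and from each stated pattern. The prediction rule is an illustrative assumption: within each training length, guess the more common outcome. The result is compared with always guessing the overall most common outcome, which ignores training length. The function names are hypothetical.

```python
# Keys are (length of training, outcome); counts follow from the slide marginals.
perfect_positive = {("short", "no job"): 100, ("short", "job"): 0,
                    ("long", "no job"): 0,    ("long", "job"): 100}
perfect_inverse  = {("short", "no job"): 0,   ("short", "job"): 100,
                    ("long", "no job"): 100,  ("long", "job"): 0}
no_relationship  = {("short", "no job"): 50,  ("short", "job"): 50,
                    ("long", "no job"): 50,   ("long", "job"): 50}

def accuracy_using_training_length(table):
    """Share predicted correctly by guessing the most common outcome within each training length."""
    correct = sum(max(table[(length, "no job")], table[(length, "job")])
                  for length in ("short", "long"))
    return correct / sum(table.values())

def accuracy_ignoring_training_length(table):
    """Share predicted correctly by always guessing the overall most common outcome."""
    job = table[("short", "job")] + table[("long", "job")]
    no_job = table[("short", "no job")] + table[("long", "no job")]
    return max(job, no_job) / sum(table.values())

for name, table in [("perfect positive", perfect_positive),
                    ("perfect inverse", perfect_inverse),
                    ("no relationship", no_relationship)]:
    print(name, accuracy_ignoring_training_length(table), accuracy_using_training_length(table))
```

Knowing training length raises accuracy from 50% to 100% in both perfect relationships, whatever their direction. When there is no relationship, accuracy stays at 50%, the "50/50 guess".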
Techniques for examining relationships between two variables
• Cross-tabulations or percentaged tables
• Graphs, scattergrams, or plots
• Measures of association (e.g., correlation coefficient)
Scattergram (Bivariate)
Interpreting a Relationship between Two Variables
• Do the patterns in the tables mean that there is a relationship between the two variables (in the example: area of residence and attitude toward consolidation)?
  – Is one's attitude about consolidation associated with one's area of residence?
• If there is a relationship, how strong is it? Are the results statistically significant? Are the results meaningfully significant?
• To answer these questions, we must turn to a set of statistics called measures of association (next day).