Posted on 19-Jan-2016

Chapter 3: Organizing Data

• Raw data is useless to us unless we can meaningfully organize and summarize it (descriptive statistics).

• Organization techniques include:
  – Tables, such as frequency distributions
  – Graphs, such as histograms, bar graphs, line graphs, pie charts, stem-and-leaf plots, and scatterplots

Frequency Distribution

• A frequency distribution is a table that lists all the categories or values of a variable as well as the corresponding number of occurrences or responses for each category or value of the variable (its frequency, or how often the category occurs).

• Frequency distributions can be used for both categorical variables (nominal or ordinal) and numerical variables (interval or ratio).

To create a frequency distribution for categorical data:

• First, create a list of all the categories or values of the variable, and then count the number of times each category or value occurs in the data.

• Then find the percentage of respondents for each category.

• Set up your basic table to have 3 columns: (1) the list of categories or the values of the variable of interest, (2) the frequency count, and (3) the percentage.

• When dealing with ordinal variables, list the categories in rank order (lowest to highest or highest to lowest).
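The three-column table described above can be sketched in Python. The satisfaction ratings below are hypothetical, made up only to illustrate the arithmetic, and `frequency_table` is an illustrative helper, not something from the textbook. For an ordinal variable, the `order` argument lists the categories in rank order so the table is not printed in arbitrary order.

```python
from collections import Counter

def frequency_table(responses, order=None):
    """Return (category, frequency, percent) rows for categorical data.

    If `order` is given (e.g. for an ordinal variable), rows follow that
    ranking; otherwise categories appear in order of first occurrence.
    """
    counts = Counter(responses)
    total = len(responses)
    categories = order if order is not None else list(counts)
    # Percent = frequency / total number of respondents * 100
    return [(c, counts[c], 100 * counts[c] / total) for c in categories]

# Hypothetical ordinal data: satisfaction ratings from 10 respondents.
ratings = ["Low", "High", "Medium", "High", "Low",
           "Medium", "High", "High", "Medium", "Low"]

for category, freq, pct in frequency_table(ratings, order=["Low", "Medium", "High"]):
    print(f"{category:<8}{freq:>4}{pct:>8.1f}%")
```

Here Low and Medium each receive 3 responses (30.0%) and High receives 4 (40.0%), listed from lowest to highest rank.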

Examples of frequency distributions for nominal variables:

Examples of frequency distributions for ordinal variables:

Statistical Software

• Many statistical software packages exist. Their output tends to look alike, so reading output is similar from one program to the next. The textbook shows output from SPSS (Statistical Package for the Social Sciences). Other useful programs include StatCrunch, SAS (Statistical Analysis System), and even Excel, among many others.

• Let’s look at SPSS output for a frequency distribution.

• Notice this is similar to our frequency distributions, but has a few extra columns.

• Percent is calculated the same way as before: the frequency divided by the total number of cases, times 100.

• Valid percent accounts for missing data. It divides the frequency by the total minus the missing data (1485 for this example).

• Cumulative percent is a running total (based on valid percent).
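The relationship between the three SPSS-style columns can be sketched in Python. This is not SPSS itself: the `spss_style_table` helper is hypothetical, the answers are made-up data, and representing a missing response as `None` is an assumption of this sketch.

```python
from collections import Counter

def spss_style_table(values, order):
    """Frequency table with Percent, Valid Percent, and Cumulative Percent.

    Missing responses are represented by None (an assumption for this sketch).
    Percent divides by all cases; Valid Percent divides by the non-missing
    cases only; Cumulative Percent is a running total of Valid Percent.
    """
    counts = Counter(values)
    total = len(values)
    valid_total = total - counts[None]  # total minus missing data
    rows = []
    cumulative = 0.0
    for category in order:
        freq = counts[category]
        percent = 100 * freq / total
        valid_percent = 100 * freq / valid_total
        cumulative += valid_percent
        rows.append((category, freq, percent, valid_percent, cumulative))
    return rows

# Hypothetical data: 8 answers plus 2 missing (None) responses.
answers = ["Yes", "No", "Yes", None, "Yes", "No", "Yes", None, "No", "Yes"]
for row in spss_style_table(answers, order=["Yes", "No"]):
    print("{:<4}{:>4}{:>8.1f}{:>8.1f}{:>8.1f}".format(*row))
```

With 2 of the 10 responses missing, "Yes" is 50.0% of all cases but 62.5% of the 8 valid cases, and the cumulative percent reaches 100.0 at the last category.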

• It is important to be able to use the frequency distribution to interpret the data and answer questions.

• Use the valid percent column when answering percentage questions.

• The two common ways to represent frequency distributions of categorical data are bar graphs and pie charts.

• For a bar graph, place the categories on the horizontal axis and either the frequency or the percent on the vertical axis.

• For a pie chart, make sure each sector is labeled, sized in proportion to its share of the total, and shows its percent.
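To size each pie-chart sector appropriately, its central angle should be the same fraction of 360° as the category's share of the total. A small Python sketch, using hypothetical counts and an illustrative helper name:

```python
def pie_sector_angles(frequencies):
    """Map each category to (percent, central angle in degrees)."""
    total = sum(frequencies.values())
    return {
        # Angle = (frequency / total) * 360, i.e. percent * 3.6
        category: (100 * freq / total, 360 * freq / total)
        for category, freq in frequencies.items()
    }

# Hypothetical counts for a nominal variable (mode of transport).
counts = {"Bus": 10, "Car": 25, "Walk": 5}
for category, (pct, angle) in pie_sector_angles(counts).items():
    print(f"{category}: {pct:.1f}% -> {angle:.1f} degrees")
```

For these counts the sectors are 90°, 225°, and 45°, which add up to the full 360°.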

Simple Frequency Distribution for Numerical Data

• Frequency distributions for numerical data are either simple (each individual value is listed with its frequency) or grouped (values are grouped into classes, usually of equal size, and each class is listed with its frequency).

• Grouped frequency distributions are used when we have a large number of observations.

• Examples:

To construct a simple frequency distribution:

1) Find the lowest and highest numbers.

2) In column form, write in ascending order all the consecutive numbers from the lowest to the highest.

3) Count the frequency.

Example: Construct a simple frequency distribution for the following data.

7 9 5 7 9 7 5 7 9 6
10 7 6 5 7 8 10 6 9 6
8 12 8 8 7 5 7 6 8 7
5 6 9 7 6 8 7 5 5 6
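The three steps above can be sketched in Python for this example. `simple_frequency` is an illustrative helper name; it lists every whole number from the lowest to the highest value, so values that never occur still appear with a frequency of 0.

```python
from collections import Counter

# The 40 observations from the example above.
data = [7, 9, 5, 7, 9, 7, 5, 7, 9, 6,
        10, 7, 6, 5, 7, 8, 10, 6, 9, 6,
        8, 12, 8, 8, 7, 5, 7, 6, 8, 7,
        5, 6, 9, 7, 6, 8, 7, 5, 5, 6]

def simple_frequency(values):
    """List every integer from min to max with its frequency (zeros included)."""
    counts = Counter(values)                         # step 3: count
    low, high = min(values), max(values)             # step 1: lowest and highest
    return [(x, counts[x]) for x in range(low, high + 1)]  # step 2: all consecutive values

for value, freq in simple_frequency(data):
    print(f"{value:>3}  {freq}")
```

For this data the frequencies are 5: 7, 6: 8, 7: 11, 8: 6, 9: 5, 10: 2, 11: 0, and 12: 1, for a total of 40. Note that 11 is listed with a frequency of 0 because step 2 asks for all consecutive numbers.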

• In a grouped frequency distribution, the numbers are usually grouped into equal-sized ranges called class intervals.

• Each class interval contains a lower class limit and an upper class limit.

• The class width is the size of each interval, and it should usually be the same for all intervals.

• If the data contain an extremely small or extremely large value, it might not be possible to use intervals of equal width. In this case, use an open-ended class interval (for example, a last class of "110 and above").

Example: Identify the class width for the following class intervals.

• The class mid-point is the average of the lower and upper class limits.

• The mid-point of the class interval 10-15 is (10 + 15) / 2 = 12.5.

• What is the class mid-point for a class interval of 20-40?
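The mid-point rule is simple enough to express as a one-line Python function. The helper name and the extra intervals below are illustrative only.

```python
def class_midpoint(lower, upper):
    """Mid-point of a class interval: the average of its two class limits."""
    return (lower + upper) / 2

# Hypothetical class intervals.
intervals = [(10, 15), (20, 24), (25, 29)]
for lower, upper in intervals:
    print(f"{lower}-{upper}: mid-point {class_midpoint(lower, upper)}")
```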

Steps to Construct a Grouped Frequency Table

1) Find the highest and lowest values in the dataset and subtract to find the range.

2) Decide on the desired number of classes (it should be between 5 and 20), and then compute the class width by dividing the range by the desired number of classes, rounding up to a convenient number. Note: There is no single right or wrong number of classes.

3) Select a starting point (lower class limit). Use either the lowest number or a convenient number slightly lower than the lowest score.

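4) Add the class width to the starting point to get the second lower limit, and keep adding it to get each subsequent lower limit. List the lower limits in a vertical column and enter the corresponding upper limits. Then tally the data values into the classes to find the frequency of each class.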

Example: The daily high temperatures (in degrees F) for the month of July in Carucciville were as follows:

85 83 96 101 97 90 106 102 82
106 104 72 89 85 97 85 94 100
92 96 104 102 75 99 92 79 94
102 76 99

Construct a grouped frequency distribution for the data.


Note: There are 8 classes instead of the desired 7 due to rounding.

We can also find the relative frequency of each class: its frequency divided by the total number of observations.
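The four construction steps, plus relative frequencies, can be sketched in Python for the temperature data. Two choices here are assumptions, since the original table is not shown: the desired number of classes is 7 (taken from the note about 8 classes), and the starting point is 70, a convenient number just below the lowest temperature (72). `grouped_frequency` is a hypothetical helper written for whole-number data.

```python
import math

temps = [85, 83, 96, 101, 97, 90, 106, 102, 82,
         106, 104, 72, 89, 85, 97, 85, 94, 100,
         92, 96, 104, 102, 75, 99, 92, 79, 94,
         102, 76, 99]

def grouped_frequency(values, desired_classes, start):
    """Build a grouped frequency table for whole-number data.

    Step 1: range = highest - lowest.
    Step 2: width = range / desired_classes, rounded up to a whole number.
    Steps 3-4: lower limits start at `start` and go up by `width`; each
    upper limit is the next lower limit minus 1 (whole-number data).
    Returns (lower, upper, frequency, relative frequency) rows.
    """
    data_range = max(values) - min(values)
    width = math.ceil(data_range / desired_classes)
    rows = []
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)  # tally this class
        rows.append((lower, upper, freq, freq / len(values)))
        lower += width
    return rows

# Assumptions: 7 desired classes and a starting point of 70.
for lower, upper, freq, rel in grouped_frequency(temps, 7, 70):
    print(f"{lower}-{upper}: {freq:>2}  {rel:.3f}")
```

Here the range is 106 - 72 = 34 and 34 / 7 ≈ 4.86 rounds up to a width of 5. Starting at 70 produces 8 classes (70-74 through 105-109) with frequencies 1, 3, 2, 4, 5, 6, 7, 2, and the relative frequencies sum to 1. Starting at the lowest value, 72, would give only 7 classes, so the choice of starting point also affects how many classes you end up with.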

• Sometimes, if a variable is continuous, the values recorded in the study may be rounded off.

• In these instances, we want to know the real limits or class boundaries.

• The real limits of a continuous variable are usually the values that are above or below the recorded value by one-half of the place value to which the numbers were rounded.

• Example: Say we are measuring height and rounding to the nearest centimeter. If 130 cm is recorded, the real limits are 129.5-130.5 cm, because any height within those limits would be recorded as 130 cm.
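The half-unit rule for real limits can be written as a small Python function. `real_limits` and its `unit` parameter are illustrative names. For data recorded to decimal places, passing `Decimal` values avoids binary floating-point rounding in the printed boundaries.

```python
from decimal import Decimal

def real_limits(lower, upper, unit=1):
    """Real limits (class boundaries) of a class interval.

    `unit` is the place value the data were rounded to (1 for whole
    numbers, Decimal("0.1") for tenths, ...); the boundaries extend half
    a unit below the lower limit and half a unit above the upper limit.
    """
    half = unit / 2
    return lower - half, upper + half

print(real_limits(130, 130))  # a single recorded value: (129.5, 130.5)
print(real_limits(70, 74))    # (69.5, 74.5)
print(real_limits(Decimal("2.5"), Decimal("2.9"), Decimal("0.1")))
# (Decimal('2.45'), Decimal('2.95'))
```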

Example: Find the real class limits or the class boundaries for the following class intervals:

A histogram is a graph that uses vertical bars whose areas represent the frequency of occurrence of each score (or class of scores) in a distribution.

A relative frequency histogram has the same shape as the corresponding frequency histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
