chapter 13 ch13.qxd.pdf · 13-4d frequency table with metric data ... describe the types of...



Univariate Statistics

Chapter 13

Outline

13-1 Introduction
13-2 The Role of Statistics
    13-2a Descriptive Statistics
    13-2b Inferential Statistics
13-3 Limitations of Statistics in Research
13-4 The Frequency Distribution
    13-4a An Overview
    13-4b General Comments about Table Entries
    13-4c Frequency Distribution with More Than One Frequency Distribution
    13-4d Frequency Table with Metric Data
13-5 Graphic Presentations
    13-5a The Bar Graph
    13-5b The Histogram
    13-5c The Pie Chart
13-6 Measures of Central Tendency
    13-6a The Mode
    13-6b The Median
    13-6c The Mean
    13-6d Comparing the Mode, Median, and Mean
13-7 Measures of Dispersion
    13-7a The Variation Ratio (v)
    13-7b The Range
    13-7c The Mean Deviation
    13-7d The Variance and the Standard Deviation
13-8 Shape of the Distribution and Metric Distributions
    13-8a Skewed Distributions
    13-8b The Normal Curve
    13-8c Standard Scores (the Z Score)
Chapter Summary
Chapter Quiz
Suggested Readings
Endnotes

Key Terms

bar graph
central tendency
descriptive statistics
dispersion
frequency distribution
histogram
inferential statistics
mean
mean deviation
measures of central tendency
median
mode
negatively skewed
normal distribution curve
pie graph
positively skewed
range
skewed distribution
standard deviation
standard normal distribution
standard score
univariate analysis
variance
variation ratio
Z score


13-1 Introduction

You have now completed several steps in the behavioral research process, such as the literature review, the research plan, and data collection and processing. Now you are ready to analyze your data. This procedure, which includes the calculation of different statistics, can be the most exciting part of the entire research process. You begin to convert raw data and indefinable patterns into explanation and understanding. As you begin to receive signs that your data substantiates your initial expectations, you begin “. . . to sense the excitement of discovery; a thoroughly invigorating and stimulating intellectual experience shared by all scientists” (Cole 1996, 141).

Thankfully, a computer’s statistical program, such as SPSS for Windows, will calculate the statistics for you. The calculation, however, is secondary. The more important task is to interpret the statistics so you can see what your data is trying to tell you. Thus, the next three chapters will give you the tools to interpret statistics so you can revel in the excitement of discovery.

An understanding of this chapter will enable you to

1. Explain the role of descriptive and inferential statistics.
2. Explain a frequency distribution and describe its characteristics.
3. Understand different ways to present your data.
4. Interpret measures of central tendency.
5. Interpret the measures of dispersion.
6. Describe the types of frequency distributions.
7. Explain the normal curve.

13-2 The Role of Statistics

The role of statistics in political research is a subject of intense debate. Normative theorists see statistics as cold and calculating. They also see the proponents of statistics as more concerned with what is than with what should be. Behavioralists, on the other hand, see statistics as another way to analyze and explain political phenomena. Despite the debate, the role of statistics in the social sciences is important. Statistics enable us to see patterns in the data and to describe and interpret observations in ways that help us test theories and hypotheses. In short, statistics are an invaluable tool for the political scientist who seeks to resolve important political questions.

The empirical analysis of political questions often involves a mass of quantitative data requiring organization before making any analysis and interpretation. Additionally, before examining the relationship between variables, you must describe the typical case of a variable and determine how typical it really is (Kay 1991). Statisticians call this process univariate analysis. Conversely, when we analyze one variable in relation to another variable, we are conducting bivariate analysis.

13-2a Descriptive Statistics

There are two types of statistics that political scientists use: descriptive statistics and inferential statistics. Descriptive statistics enable political scientists to organize and summarize data. They provide us with the necessary tools to describe quantitative data. Among these summarizing measures are percentages, proportions, means, and standard deviations. Descriptive statistics are especially useful when the researcher finds it necessary to analyze interrelationships between more than two variables.

univariate analysis: The analysis of a single variable. Researchers often use frequency tables, bar graphs, or pie charts to complete such an analysis.

descriptive statistics: The mathematical summary of measurements for a set of data.


13-2b Inferential Statistics

Inferential statistics deal with sample data. They enable the researcher to infer properties of a population based on data collected from only a random probability sample of individuals. Inferential statistics have value because they offset problems associated with data collection. For example, the time-cost factor associated with collecting data on the entire population may be prohibitive. That is, the population may be immense and difficult to define. In such instances, inferential statistics can prove to be invaluable to the social scientist.

Descriptive and inferential statistics are used in the data analysis process. Data analysis involves noting whether hypothesized patterns exist in the observations. We might hypothesize, for example, that urban legislators are more liberal and supportive of welfare programs than those legislators representing rural constituencies. To test this hypothesis the researcher may ask urban and rural legislators about their views on welfare programs and payments. The researcher then compares the groups and uses descriptive and inferential statistics to find out whether differences between the groups support expectations.

In sum, a descriptive statistic is a mathematical summary of measurements for one variable. Inferential statistics, on the other hand, use sample data to make statements about the population. Descriptive and inferential statistics provide explanations for complex political phenomena that deal with relationships between variables. Thus, they are an important tool in the political scientist’s repertoire.

13-3 Limitations of Statistics in Research

Statistics cannot resolve every question you have about politics. Therefore, we need to discuss some of the limitations of statistical research.

First, statistics do not provide the means for the researcher to prove anything he or she wants to prove. On the contrary, there are explicit procedural guidelines, rules, and decision-making criteria to follow in the statistical analysis of data. As such, statistics cannot make up for the lack of clear, consistent, logical thinking in the development of a body of theory.

Second, statistics provide little help in understanding political phenomena that we cannot empirically measure. Some contend, for example, that we cannot measure the critical concept of political power (Bacharach and Baratz 1962). Even when measurement is possible, statistics do not always tell us whether we are measuring what we want to measure. There are, for example, several ways to measure the rate of unemployment. One possibility is to contact the local unemployment office and find out how many individuals have applied for unemployment benefits. However, what about the few who believe it is beneath them to apply for what they perceive as welfare? And what about those who have dropped out of the job market?

A final principal limitation of statistics is that the techniques only allow us to describe and infer trends among groups. They do not provide definite predictions about individual cases. Thus, while statistical techniques may provide guidelines, they do not allow us to reach certain conclusions about individuals (Cole 1996). Knowing that 64 percent of the respondents in a survey favored gun control, for example, does not allow you to say that your neighbor favors gun control.

In sum, there are important limits on the value of statistical analysis. There are some political problems you cannot explore statistically. For those questions subject to quantitative analysis, however, statistics may only be a “poor man’s” substitute for controlled laboratory, or true experimental, research. Statistics in these cases are only valuable when researchers carefully define the problem, develop ways to measure important concepts, and use a sound research design to collect data. Then, and only then, are statistics helpful in understanding the research question.

inferential statistics: Statistics that enable the researcher to make decisions (inferences) about characteristics of a population based on observations from a random probability sample taken from the population.

13-4 The Frequency Distribution

Constructing a frequency distribution will probably be the first step you will take when organizing and presenting information. Properly constructed frequency distributions help summarize a large amount of information while enhancing the interpretation of data. In this chapter you will learn all you need to know about arraying and summarizing single variables.

13-4a An Overview

As a student you have frequently read research papers, articles, and reports that included descriptive statistics. Government textbooks, for example, present displays of voting results, public welfare expenditures, and characteristics of congressional members. Additionally, media headlines read “President’s Popularity Rises by Three Percent” or “The Dow Stock Market Drops by 150 Points.” The media also inundates you with these statistics in the form of public opinion polls. Whatever the source, most of your exposure has usually been with tabular statistics, or frequency distributions.

One step in analyzing and reporting information involves the presentation of frequency distributions of the variables of concern. A frequency distribution is nothing more than a tabulation of raw data according to numerical values and discrete classes. A frequency distribution of party identification, for example, shows the number of individuals belonging to a particular political party.

A frequency table is the tabular presentation of a frequency distribution. It should meet certain criteria to be considered presentation quality (see endnote 1). Let’s examine Table 13-1, which is a distribution of respondents’ political ideology from the 1998 National Opinion Research Center General Social Survey.

The presentation of a frequency distribution should include the following:

• Table labels: If there is more than one table included in a report, the tables need a label to distinguish them. Examples include “Table 1,” “Table 2,” or “Table 13-1.” The latter example identifies the first table of Chapter 13.

• Descriptive title: Researchers must make it clear to the reader what information they are presenting. The title must be as specific as possible. As such, it should include the type of information (Respondents’ Political Ideology), the time (1998), and any other pertinent information (General Social Survey [GSS]).

frequency distribution: A tabulation of raw data according to numerical values and discrete classes. A frequency distribution of party identification, for example, shows the number of individuals belonging to a particular political party.

Table 13-1 Respondents’ Ideology, General Social Survey (GSS) 1998

Category        Frequency       %    Cumulative %
Liberal               772    28.7            28.7
Moderate              986    36.6            65.3
Conservative          933    34.7           100.0
Totals:              2691   100.0

Question: Is the respondent a liberal, moderate, or conservative?

Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.


• Clear labels: Tables require clearly labeled columns that enable the reader to see the column and row summaries of the table’s data. Our example includes four columns. Column one contains the variable value (row) labels “Liberal” through “Conservative.” The second column lists the frequencies, or number of cases, for each category of the variable. The third column lists the value percentages. For example, there are 772 respondents who said they were liberal. The percentage column converts the frequency into a percentage of the 2,691 cases, or 28.7 percent. The last column, Cumulative %, is only needed if there are more than two categories associated with the variable. It is simply a “running total” of the category percentages.

• Appropriate classes: Normally each group should have some entries. Additionally, classes should not be so large that they obscure the range and variation in the data. For example, classes that divide cases into fewer than twenty, or twenty or more, may place 85 percent of the cases in a single class. This obscures differences in the data. Conversely, a unit-by-unit breakdown such as less than one, or one to two, would be too fine a classification and leave some classes with few cases. When determining the number of classes, you need to consider the needs of your audience and the nature of your data.

• A totals row: A properly constructed frequency table must include a totals row showing the total number of cases included in the table and the percentage total that will normally add up to 100 percent. We say “normally add up to 100 percent” because there may be a small difference (99.9 or 100.1) due to the rounding of individual category values.

• Source and question: It is a good idea to specify the source of the data you presented in the table. The source may be the Congressional Record or, as in our example, survey data collected by a national research center. When working with survey data, you should also include the question that describes the variable (Corbett 2001, 135). The source of data and the question, if applicable, should be presented at the bottom of the table. In our example, we used the following from the 1998 National Opinion Research Center General Social Survey: “Is respondent liberal, moderate, or conservative?”
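The percentage and cumulative percentage columns described above can be sketched in a few lines of Python. This is only an illustration: the frequencies come from Table 13-1, while the function name `frequency_table` and the print layout are assumptions made for this example, not part of the GSS or of any statistical package.

```python
# Frequencies taken from Table 13-1 (GSS 1998).
frequencies = {"Liberal": 772, "Moderate": 986, "Conservative": 933}


def frequency_table(freqs):
    """Return (rows, total); each row is (category, frequency, %, cumulative %)."""
    total = sum(freqs.values())
    rows = []
    running = 0  # running count used for the cumulative percentage
    for category, count in freqs.items():
        running += count
        percent = round(100 * count / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((category, count, percent, cumulative))
    return rows, total


rows, total = frequency_table(frequencies)
for category, count, percent, cumulative in rows:
    print(f"{category:<14}{count:>6}{percent:>8.1f}{cumulative:>8.1f}")
print(f"{'Totals:':<14}{total:>6}{100.0:>8.1f}")
```

Each percentage is rounded to one decimal place on its own, which is why a percentage column can occasionally sum to 99.9 or 100.1 rather than exactly 100.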

13-4b General Comments about Table Entries

When you present percentage values in your tables, be consistent with the decimal places. In other words, don’t use one decimal digit (.1) for some entries and two digits for others (.14). In fact, you should limit yourself to only one decimal digit. If you do use a decimal digit with percentages, make sure you use a decimal digit with whole percentages (62.0, not 62). In addition, don’t put percentage signs after percentages or use horizontal or vertical lines in the table. The use of percentage signs and lines only clutters the appearance of the table.

13-4c Frequency Distributions with More Than One Frequency Distribution

On occasion, you may want to present more than one frequency distribution in a single table. A major advantage of such a table is that it makes it convenient to compare frequency distributions for different variables. For example, you may want to compare distributions for attitudes toward spending on varied policy areas or societal problems. Or, as in Table 13-2, you may want to compare responses toward different questions you could use to enhance the validity of a single concept.


Note that we labeled the table “13-2” to show that it is the second table included in Chapter 13. Also note that the title specifies the table’s content. At the bottom of the table, we also included the source of data and a question used to operationalize the concept of attitude toward divisive forms of speech. In the table, we presented frequency and percentage distributions for five types of speech that could prove to be divisive. The table also presents the response categories (Not Allow and Allow).

Looking at the table, we can easily compare results for the five types of speech. We can readily see, for example, that a greater percentage of the respondents would allow a person who is a homosexual to make a speech in their community (81.3 percent). There is far less tolerance, on the other hand, toward allowing a racist to deliver a speech in their community (62.0 percent).

There are other ways you can present more than one frequency distribution in a single table. Whatever way you decide to use, however, make sure you follow the rules we presented.

13-4d Frequency Table with Metric Data

There are also times when you may want to present the frequency results of metric variables. Table 13-3 is an example of such a table.

Notice that the table includes only the highest five states and the lowest five states. Note also that the table includes the mean of the distribution (12.9 percent) and the standard deviation of the distribution (4.0 percent). The mean is the average level of poverty for the states. The standard deviation is a measure that expresses the degree of variation within a variable on the basis of the average difference from the mean (Corbett 2001, 294). The smaller the standard deviation, the closer the individual case values will cluster about the mean. We cover these measures in more detail in Section 13-6c and Section 13-7d.
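As a rough preview of those two measures, the sketch below computes a mean and a standard deviation for the ten states that Table 13-3 actually lists. Two assumptions should be kept in mind: the result will not reproduce the table's mean of 12.9 or standard deviation of 4.0, because those figures describe all fifty states, and the function divides by the number of cases (the population formula), which may differ from the formula used to produce the table.

```python
import math

# Poverty rates for the ten states listed in Table 13-3 (not all fifty states).
rates = [25.5, 20.6, 20.5, 20.5, 18.5, 8.2, 8.1, 7.7, 7.5, 6.4]


def mean(values):
    """Sum of the observations divided by the number of cases."""
    return sum(values) / len(values)


def standard_deviation(values):
    """Population standard deviation (assumed formula: divide by N, not N - 1)."""
    center = mean(values)
    squared_deviations = [(v - center) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))


print(f"mean = {mean(rates):.2f}")
print(f"standard deviation = {standard_deviation(rates):.2f}")
```

Because the ten listed states are the extremes of the distribution, their values sit far from the mean, so the spread computed here is larger than the spread across all fifty states.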

13-5 Graphic Presentations

An extension of the frequency distribution occurs when you present distributions in graphic form. Graphs are a convenient way to present data, and they help one to understand the data without reading a table. We limit our discussion to three basic types of graphs: the bar graph, the histogram, and the pie graph.


Table 13-2 Distributions of Attitude toward Divisive Forms of Speech

                         Not Allow            Allow
Type of speech            #       %        #       %
Atheism                 520    26.4     1451    73.6
Communism               619    31.7     1331    68.3
Sexual orientation      363    18.7     1578    81.3
Military rule           680    34.8     1272    65.2
Racism                  714    38.0     1164    62.0

Question: Consider a person who is against/for ________________. If such a person wanted to make a speech in your (city/town/community), should they be allowed to speak or not?

Source: Adapted from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.


13-5a The Bar Graph

When dealing with nominal or ordinal data, Cole recommends that you use a bar graph to present data (Cole 1996, 145). Bars are drawn for each class of the variable so that the height represents the number of cases for each class. Bar graphs are useful when trying to compare categories. Figure 13-1 presents the data considered in Table 13-1 in bar graph format. The visual advantage of data presented in a bar graph format is obvious. The reader can immediately see that there are not as many liberal respondents in the GSS 1998 Survey as there were moderates and conservatives.

bar graph: A type of graphic display of a frequency or a percentage distribution of data. One uses bar graphs with discrete data.

Table 13-3 State Poverty Level

Highest Five States

Rank    State            Percent Below the Poverty Line
1       New Mexico       25.5
2       Mississippi      20.6
3       Louisiana        20.5
4       Arizona          20.5
5       West Virginia    18.5

Lowest Five States

Rank    State            Percent Below the Poverty Line
46      Alaska            8.2
47      Nevada            8.1
48      Utah              7.7
49      Indiana           7.5
50      New Hampshire     6.4

Mean: 12.9. Standard deviation: 4.0.

Source: Data from Percentage of the population below the poverty line (1996). Statistical Abstract of the United States, 1998.

Figure 13-1 Respondents’ Political Ideology, General Social Survey (GSS) 1998 (bar graph)
Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.

13-5b The Histogram

The histogram differs from a bar graph in that you do not separate the bars in a histogram. The bars are adjoining to show that the variable consists of continuous data. Also, intervals, rather than discrete categories, are depicted along the horizontal axis. While bar graphs are used with nonmetric data, researchers use histograms with metric-level data. Figure 13-2 is a histogram that depicts the extent of urbanization in nations of the world. Let’s take time to examine the graph.

histogram: The type of bar graph that is used to depict continuous metric-level measures.

The bars represent the categories for the urbanization variable depicted in the histogram. The numbers across the horizontal axis represent the intervals for each category. The first classification will consist of nations having an urbanization rate from 2.5 percent to 7.5 percent. The second classification will consist of nations having an urbanization rate from 7.6 percent to 12.5 percent, and so on.

The heights of the bars are proportional to the number of nations for each class. The higher the bar, the more nations there are within a particular category. The numbers alongside the vertical axis represent the number of nations (cases) included in each category. Continuing our example, there are two nations with an urbanization rate from 2.5 percent up to 7.5 percent, and there are four nations having an urbanization percentage from 7.6 percent to 12.5 percent.
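The counting step behind a histogram can be sketched as follows. The urbanization rates in this example are hypothetical values invented for illustration (the figure's underlying data are not reproduced in the text); they were chosen only so that the first two classes hold two and four nations, as in the description above. The function name `bin_counts` and its parameters are likewise assumptions.

```python
def bin_counts(values, start=2.5, width=5.0, n_bins=4):
    """Count how many values fall in each adjoining interval.

    The first interval includes its lower edge (2.5 to 7.5); each later
    interval starts just above the previous upper edge (7.6 to 12.5, ...).
    """
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            low = start + i * width
            high = low + width
            in_first = i == 0 and low <= v <= high
            in_later = i > 0 and low < v <= high
            if in_first or in_later:
                counts[i] += 1
                break
    return counts


# Hypothetical urbanization rates (percent urban) for eight nations.
rates = [5.0, 6.2, 8.1, 9.4, 11.0, 12.5, 14.3, 21.7]
counts = bin_counts(rates)

# A text-only histogram: one adjoining bar per interval.
for i, count in enumerate(counts):
    low = 2.5 + i * 5.0
    print(f"{low:>5.1f} to {low + 5.0:>5.1f} | {'#' * count}")
```

The length of each printed bar plays the role of the bar height in the figure: it is proportional to the number of cases in the interval.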

13-5c The Pie Graph

A pie graph displays a frequency distribution as a circle (or pie shape) with each category shown as a different-colored slice. The larger the slice, the more cases there are within a particular category. Political scientists use this type of graphic presentation with nominal or ordinal data. Because of numerous categories, pie charts are inappropriate to use with metric data. Can you imagine how many slices of pie you would have with a continuous metric variable such as the one presented in Figure 13-2? Figure 13-3 presents the data considered in Table 13-1 in pie graph format.

13-6 Measures of Central Tendency

While frequency distributions and graphs help to describe and explain variables, political scientists often want to present their findings more conveniently. Reports dealing with several variables would soon become tedious if you relied solely on the depiction of charts and frequency distributions. Therefore, researchers often summarize data with measures of central tendency.

A measure of central tendency is a number that represents the principal value of a distribution of data. We commonly refer to these measures as averages. An average you are probably familiar with is your grade point average, or GPA. Your GPA describes and summarizes your academic performance in college classes. Measures of central tendency include the mode, the median, and the mean.

pie graph: A type of graphic display of a frequency distribution. Each “slice” of pie represents a category of the variable. The larger the slice of pie in the graph, the more cases for the particular category.

Figure 13-2 Histogram of Percent of the Population Living in Urban Centers throughout the World
Source: Data from The World Almanac and Book of Facts, 1995.

measures of central tendency: Numbers that represent the principal value of a distribution of data. We commonly refer to these measures as averages. Measures of central tendency include the mode, the median, and the mean.

central tendency: The most frequently observed, common, or central value in the distribution of values of a variable.


13-6a The Mode

The mode is a convenient measure to use with nominal data. The mode is the most frequently occurring value in any distribution of data. If a distribution has only one mode, we say the distribution is unimodal. If there are two values that appear most frequently, the distribution is bimodal.

Figure 13-4 shows the political party affiliation of members of the 107th session of the U.S. House of Representatives.

A close look at the figure shows that the Republican political party was the most common party affiliation of members of the 107th House of Representatives (51.0 percent). While not equal, the distribution also approximates a bimodal distribution.
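Finding the mode amounts to picking the category with the greatest frequency. The sketch below uses the ideology counts from Table 13-1, since the party counts behind Figure 13-4 are not given in the text; the function name `modes` and the second, bimodal distribution are hypothetical and exist only to show the two-mode case.

```python
def modes(freqs):
    """Return every category that shares the highest frequency."""
    top = max(freqs.values())
    return [category for category, count in freqs.items() if count == top]


# Frequencies from Table 13-1 (GSS 1998).
ideology = {"Liberal": 772, "Moderate": 986, "Conservative": 933}

# Hypothetical distribution in which two categories tie for most frequent.
tied = {"A": 5, "B": 5, "C": 2}

print(modes(ideology))  # one mode: unimodal
print(modes(tied))      # two modes: bimodal
```

A strict tie is required here for two modes to be reported; a distribution that only approximates a bimodal shape, as in the House example, would still return a single category.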

mode: The category of a variable with the greatest frequency of observations.

Figure 13-3 Respondents’ Perceived Political Ideology, General Social Survey (GSS) 1998 (pie graph)
Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.

Figure 13-4 Political Party Affiliation of Members of the 107th U.S. Congress
Source: Data from http://thomas.loc.gov

13-6b The Median

The median is the middle item of a set of numbers after ranking the items according to their size (1, 2, 3, . . ., n). For a ranked distribution the median is the score of the middle case if there are an odd number of cases. If there is an even number of cases, the median is the value halfway between the two middle cases. In other words, you will have to calculate the average of the two middle cases.

median: The category or value above and below which one-half of the observations lie. (The median is the middle category or value.)

As an example, assume that a political science student used a scale to determine the ideological views of twenty-five respondents. The distribution of scores ranging from 1 (liberal) to 7 (conservative) might appear as shown in Table 13-4.

Determining the median in Table 13-4 is a simple process if you take the following steps:

1. Rank the numbers (7, 7, 7, 6, 6, 6, 6, . . ., 1, 1).
2. Determine the number of items in the set: N = 25.
3. Add 1 to the number of items: 25 + 1 = 26.
4. Divide the result by 2 to determine the middle item: 26/2 = 13.
5. The median is 4, or the response of the thirteenth respondent.

This value is no greater than half the distribution (those first twelve students whose scores range from 5 through 7). Additionally, it is no smaller than half the distribution (students 14 through 25 whose scores range from 1 through 4).

If student 25 were not included in this sample, there would not be a single middle case. For a data set having an even number of items, the same steps are taken to calculate the median.

1. Rank the numbers.
2. Determine the number of items in the set: N = 24.
3. Add 1 to the number of items: 24 + 1 = 25.
4. Divide the result by 2 to determine the middle item: 25/2 = 12.5.

In principle, the median is item 12.5. To determine the value, you calculate the average of the values of the twelfth and the thirteenth items: (5 + 4)/2 = 4.5. The result represents the median for the sample.
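The steps above can be sketched in Python. The scores are the twenty-five hypothetical values from Table 13-4, ranked from highest to lowest as in step 1; the function name `median` is an illustrative choice, and Python's standard `statistics.median` would give the same results.

```python
# Scores from Table 13-4, students 1 through 25.
scores = [7] * 3 + [6] * 4 + [5] * 5 + [4] * 6 + [3] * 2 + [2] * 3 + [1] * 2


def median(values):
    """Rank the items, locate position (N + 1) / 2, and read off the value."""
    ranked = sorted(values, reverse=True)  # step 1: rank the numbers
    n = len(ranked)                        # step 2: number of items
    position = (n + 1) / 2                 # steps 3 and 4: middle position
    if n % 2 == 1:
        # Odd N: the position is a whole number, so take that item.
        return ranked[int(position) - 1]
    # Even N: the position ends in .5, so average the two middle items.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2


print(median(scores))       # all twenty-five students
print(median(scores[:-1]))  # student 25 left out, so N = 24
```

With all twenty-five students the function returns the thirteenth ranked score; with student 25 removed it averages the twelfth and thirteenth scores.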

Table 13-4 Hypothetical Distribution of Scores of Ideology Scale of Angelo State University Students, 2001

Student  Score    Student  Score    Student  Score
1        7        10       5        19       3
2        7        11       5        20       3
3        7        12       5        21       2
4        6        13       4        22       2
5        6        14       4        23       2
6        6        15       4        24       1
7        6        16       4        25       1
8        5        17       4
9        5        18       4

N = 25. Median = 4.

Source: Hypothetical

Before we leave our discussion about the median, we need to discuss several of its characteristics. First, the median case is always in the middle, and extreme values do not affect the median value. Thus, its interpretive value remains constant. When we discuss the arithmetic mean in Sections 13-6c and 13-6d, we will see how extreme values can detract from the interpretive value of the statistic. Second, although we use every item to determine the median, we do not use their actual values in the calculations. At most, we only use the values of the two middle items to calculate the median when we have an even number of cases in our data set. Third, if items do not cluster near the median, the median may not be a good measure of the group’s central tendency. Last, medians usually do not take on values that are not realistic. The median number of children per American family, for example, is two. The median number of children per American family will never have a value of 1.7, for example.

13-6c The Mean

The mean is used with metric data. It is the average of a set of numbers. We calculate the mean by summing the observations in a data set and dividing by the number of cases.

To illustrate how we can use the mean in political science, let’s analyze the following problem. For the current budget year, a local “Meals for the Elderly” board has limited the agency to serve hot meals to a monthly average of 340 recipients. For the first nine months of 2003 the serving figures were 320, 360, 350, 350, 370, 330, 360, 370, and 340. Based on the first nine months, is the agency meeting its monthly average target? You need to calculate the mean for the first nine months to answer the question.

Mean, first nine months: (320 + 360 + 350 + 350 + 370 + 330 + 360 + 370 + 340) / 9 months = 3150/9 = 350.

Since the mean of 350 exceeds the agency board goal of 340 recipients, the agency is not currently meeting the board’s target. Suppose you are the agency administrator. How many recipients can your agency average over the next three months and still meet the target? To answer this question, you need to

1. Determine the number of clients the agency would have to feed for the year if the agency adhered to the board’s edict (340 * 12 = 4080).

2. Determine the number of clients the agency is feeding to date (350 * 9 = 3150).

3. Determine the number of clients the agency can feed over the last three months of the year without exceeding the goal (4080 – 3150 = 930).

4. Determine the number of clients the agency can feed for each month for the remainder of the year (930 / 3 = 310).
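The four steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the variable names are our own and are not part of the original example.

```python
# Monthly serving figures for the first nine months, taken from the text.
served = [320, 360, 350, 350, 370, 330, 360, 370, 340]

monthly_target = 340   # board's limit on the monthly average
months_in_year = 12

# Mean for the months observed so far: 3150 / 9.
mean_to_date = sum(served) / len(served)

# Step 1: total the agency may serve in a year under the board's edict.
yearly_allowance = monthly_target * months_in_year

# Step 2: total served to date.
served_to_date = sum(served)

# Step 3: how many clients can still be served this year.
remaining_allowance = yearly_allowance - served_to_date

# Step 4: the monthly average allowed over the remaining months.
months_left = months_in_year - len(served)
required_average = remaining_allowance / months_left

print(mean_to_date, yearly_allowance, remaining_allowance, required_average)
```

Running the sketch reproduces the figures in the steps: a mean of 350 to date, a yearly allowance of 4080, 930 clients remaining, and an allowed average of 310 per month.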

The mean has several important characteristics. First, we use every item in a group to calculate the mean. Second, unlike the mode, every group of data has one and only one mean. Third, the mean may take on a value that is not realistic. For example, the average American family has exactly 1.7 children. Fourth, an extreme value may have a disproportionate influence on the mean and thus could affect how well the mean represents the data.

13-6d Comparing the Mode, Median, and Mean

The three measures of central tendency that we discussed represent univariate distributions. Each, however, has its own characteristics that prescribe and limit its use. The mode is the most common value in any distribution of data. The median is nothing more than the middle item of a set of numbers when one ranks the items in order of size. Last, the mean is the average of a set of values.

How does one know, however, when to use the mode, the median, or the mean? Alas, there is not an easy answer to this question. Most statisticians agree, however, that the application of any measure of central tendency depends on the measurement level of the variable being analyzed. Table 13-5 shows that the mode can represent nominal variables such as the distribution of gender or party affiliation. We use the median, on the other hand, with ordinal variables such as classes of attitudes and categories of income. The political researcher uses the mean with metric variables such as age and years of formal education.

mean: The sum of the values of a variable divided by the number of values.

Table 13-5 also shows that it is permissible to use the measures appropriate for lower levels of measurement with higher-level data. It is not appropriate, however, to use higher-level measures with lower-level data. The mode, for example, can represent income, but the mean cannot represent the distribution of gender.

We can use Table 13-6 to illustrate the appropriateness of using measures of central tendency. The data represented in the table is a hypothetical distribution of Government grades made by Jerry Perry, a student of political science, during one agonizing semester. Note that the grades have been arrayed in ascending order.

The instructor used the ratio level of measurement for the grades. Thus, according to our discussion, the appropriate measure of central tendency to use is the arithmetic mean. When we calculate the mean in the example, our answer is 60 (540/9). Additionally, the median score, the value that occupies the middle position in an array of values, is 55. Both averages are relatively low. Thus, neither Jerry nor his instructor is happy.

In addition, Jerry knows his father will be unhappy about his course grades. He does not want to tell his father about the low mean and median values. So he decides that if his father asks about his average in the course, he will give his father the modal value, the most commonly occurring value in an array of data. In this example the mode is 85, which appeals to Jerry. After he tells his father that he had several grades of 85, his father will be happy and Jerry will not incur his father’s wrath. In sum, our example illustrates how the different measures of central tendency can be misleading if used in the wrong context.
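As a quick check on Jerry’s grades, the three measures can be computed with Python’s standard `statistics` module. This is a minimal sketch; the variable names are ours, and the grades are the nine hypothetical values from Table 13-6.

```python
from statistics import mean, median, mode

# Jerry Perry's nine hypothetical Government grades (Table 13-6).
grades = [30, 35, 45, 50, 55, 70, 85, 85, 85]

grade_mean = mean(grades)      # sum of grades divided by the number of grades
grade_median = median(grades)  # middle (5th) item of the ordered grades
grade_mode = mode(grades)      # most frequently occurring grade

print("mean:", grade_mean)
print("median:", grade_median)
print("mode:", grade_mode)
```

The mode (85) sits well above both the mean (60) and the median (55), which is exactly why Jerry prefers to report it.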

Table 13-5 The Hierarchy of Measurement

Level of Measurement    Measure of Central Tendency
                        Mode    Median    Mean
Metric                  x       x         x
Ordinal                 x       x
Nominal                 x

Table 13-6 Comparison of the Measures of Central Tendency

30 35 45 50 55 70 85 85 85
            ↑
Total of all grades = 540.
Number of grades (n) = 9.
Mode = 85.
Median = (n + 1)/2 = 5th item, 55.
Mean = 540/9 = 60.

13-7 Measures of Dispersion

Measures of central tendency are helpful in identifying important characteristics among distributions of data. They accurately reflect the actual values of distributed data when the data closely group about the measures. Conversely, measures of central tendency are less likely to reflect the actual values of all members of a distribution when the data has extreme values. For example, the mean for Jerry Perry’s Government grades was 60. However, the high score was 85 and the low score was 30. Thus, there is much dispersion between the mean and the extreme scores. Therefore we need some measure of the deviation from the average value to tell us how well the measure of central tendency summarizes the data.

Political scientists use measures of dispersion, also known as measures of variability (Corbett 2001, 134), to gain a clearer understanding of a distribution of data. Measures of dispersion are ways to communicate other differences in a set of data. They tell how much the data clusters about the various measures of central tendency.

Suppose that researchers measured constitutional knowledge for a national sample of registered voters. The mean that we calculate for the test is 70 out of a possible 100. While we want to know the average score, we also want to know how much variation there is in the scores. In other words, how reflective is the mean in describing the distribution of scores? Did the majority of the respondents get a score of 70? Was there a bimodal distribution with one group of respondents attaining low scores and another group attaining high scores that averaged out to 70? Or was there a normal distribution of scores? We use measures of dispersion to answer these questions.

13-7a The Variation Ratio (v)

The variation ratio is useful when analyzing nominal data. It is simple to calculate and easy to understand. Specifically, the variation ratio tells the political scientist the degree to which the mode satisfactorily represents a particular frequency distribution. The formula for v is

$$v = 1 - \frac{\text{Number of cases in the modal category}}{\text{Total number of cases}}$$

By analyzing the formula one can see that if all cases in a distribution fell into the modal class, the value of v would be 0. Thus, the lower the v score, the more representative the mode is of all cases in the distribution.

As an illustration, let’s examine the distribution of political ideology and party identification as shown in Table 13-7. The variation ratio for Republicans (.46) suggests that the mode is a better representation of ideology for Republicans than for the other groups. The variation ratio also shows that the mode is a less satisfactory summary of ideology for those respondents who say they are Independent (.70). Put another way, the Independent respondents varied more in their ideological orientations. Thus, one should be careful of reporting the mode as representative of the ideological orientation of all Independents in this example.


dispersion: The distribution of data values around the most common, middle, or average value.

variation ratio: The variation ratio tells the political scientist the degree to which the mode satisfactorily represents a particular frequency distribution.

Table 13-7 Distribution of Political Ideology and Party Identification, 2001

Political Ideology    Democrat        Independent     Republican
Liberal               360             160             80
Libertarian           220             260             180
Conservative          180             280             540
Populist              240             300             200
Totals                1000            1000            1000
v computation         1 – 360/1000    1 – 300/1000    1 – 540/1000
v                     .64             .70             .46

Source: Hypothetical
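The variation ratios for the three party groups can be reproduced with a short Python sketch. The function name `variation_ratio` and the dictionary layout are our own choices for illustration; the counts are the hypothetical figures from Table 13-7.

```python
# Hypothetical counts of ideology within each party group (Table 13-7).
ideology_by_party = {
    "Democrat": {"Liberal": 360, "Libertarian": 220, "Conservative": 180, "Populist": 240},
    "Independent": {"Liberal": 160, "Libertarian": 260, "Conservative": 280, "Populist": 300},
    "Republican": {"Liberal": 80, "Libertarian": 180, "Conservative": 540, "Populist": 200},
}


def variation_ratio(counts):
    """Return v = 1 - (cases in the modal category / total cases)."""
    total = sum(counts.values())
    modal_count = max(counts.values())
    return 1 - modal_count / total


# Round to two decimals, as in the table.
ratios = {party: round(variation_ratio(counts), 2)
          for party, counts in ideology_by_party.items()}

for party, v in ratios.items():
    print(party, v)
```

The lowest value belongs to Republicans, whose modal category (Conservative) holds more than half of the cases.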


13-7b The Range

The range is a useful measure when the researcher is working with ordered or ranked data (Cole 1996, 162). Thus, it is useful with ordinal data and when considering the degree to which the median satisfactorily represents a particular frequency distribution. The range is simply the difference between the largest value and the smallest value in a distribution. The smaller the range, the more accurate, or representative, is the median score of all values in the distribution.

In Table 13-4, our example of the median, we presented a hypothetical distribution of ideological orientation scores for twenty-five Angelo State University students. The median in our example is 4 and the range is 6 (7 – 1). The responses of another sample of twenty-five students using the same seven-point scale appear in Table 13-8. In this example, the median is still 4, but the range is 4 (6 – 2). A comparison of the measures presented in the two tables shows that there is greater homogeneity in the responses of the second group of students.

The range is easy to calculate and has utility as a measure of dispersion. While we could use the range with metric data, it is not wise to do so because extreme values in a distribution could influence the range, thus giving a misleading impression of variation. Consider, for example, that there is one individual with a doctorate degree in the community you sample. If you randomly choose twenty-five persons to sample, there is an excellent chance that he or she will not be included. But, for the sake of this example, suppose you do include the individual in your sample. The range in education levels will then be extremely large and very misleading as a measure of dispersion. In addition, if we use the range as a measurement of dispersion with metric data, we do not know anything about the variability of scores between the two extreme values except that the scores do lie somewhere within the range. Consequently, you should avoid using the range with metric data.
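The comparison of the two student samples can be sketched in Python as follows. The list names are hypothetical; the scores are built from the counts of each score shown in Tables 13-4 and 13-8.

```python
from statistics import median

# Scores from Table 13-4: three 7s, four 6s, five 5s, six 4s, two 3s, three 2s, two 1s.
table_13_4 = [7] * 3 + [6] * 4 + [5] * 5 + [4] * 6 + [3] * 2 + [2] * 3 + [1] * 2

# Scores from Table 13-8: seven 6s, five 5s, six 4s, two 3s, five 2s.
table_13_8 = [6] * 7 + [5] * 5 + [4] * 6 + [3] * 2 + [2] * 5


def value_range(scores):
    """Return the difference between the largest and smallest value."""
    return max(scores) - min(scores)


for name, scores in (("Table 13-4", table_13_4), ("Table 13-8", table_13_8)):
    print(name, "n =", len(scores), "median =", median(scores),
          "range =", value_range(scores))
```

Both samples share a median of 4, but the smaller range of the second sample reflects its greater homogeneity.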

range: The distance between the highest and lowest values or the extent of categories into which observations fall.

Table 13-8 Hypothetical Distribution of Scores of Ideology Scale of Angelo State University Students, 2001

Student  Score    Student  Score    Student  Score
1        6        10       5        19       3
2        6        11       5        20       3
3        6        12       5        21       2
4        6        13       4        22       2
5        6        14       4        23       2
6        6        15       4        24       2
7        6        16       4        25       2
8        5        17       4
9        5        18       4

N = 25. Median = 4. Range = 4 (6 – 2).

Source: Hypothetical

13-7c The Mean Deviation

The mean deviation is useful when analyzing metric data. Simply put, the mean deviation is the average difference between the mean and all other values in the distribution. The mean deviation makes use of every observation in the distribution. One computes this measure by taking the difference between each observation in the distribution and the mean. Summing these deviations is the next step. (Note: When summing, ignore negative signs. Otherwise, the sum would always be zero.) Last, divide the sum by the number of observations. Arithmetically, the mean deviation is expressed as

$$\text{Mean deviation} = \frac{\sum |X_i - \bar{X}|}{n}$$

where

$\sum$ = the sum of.
$X_i$ = each individual observation.
$\bar{X}$ = the mean of all of the observations.
$n$ = number of observations.
$|\ |$ = absolute difference (ignore signs).

mean deviation: A measure of dispersion of data points for metric-level data. It is the mean of differences between each value in a distribution and the mean of the distribution.

Table 13-9 illustrates the calculation of the mean deviation for the percentage of the total vote George W. Bush and Dick Cheney received in the southeastern states. The results show that, on the average, the percentage of the total vote received by the Bush/Cheney ticket in each southeastern state in the 2000 election differs from the mean vote for all southeastern states by 2.7 percentage points.

Table 13-9 Percent of Vote for Bush/Cheney in Southeastern States in the 2000 Presidential Election (Mean Deviation)

Mean = 54.0%

State             % Vote    |Xi – X̄|
Alabama           56.5      2.5
Arkansas          51.3      2.7
Florida           48.8      5.2
Georgia           55.0      1.0
Kentucky          56.4      2.4
Louisiana         52.6      1.4
Mississippi       57.6      3.6
South Carolina    56.9      2.9
Tennessee         51.2      2.8
Total                       24.5

Mean Deviation = 24.5/9 = 2.7.

Source: Adapted from National Archives and Records Administration.

13-7d The Variance and the Standard Deviation

While the mean deviation has a more direct intuitive interpretation than other measures of deviation that we can use with metric data, the measure has fewer useful statistical properties than those measures (Blaylock 1979, 78). As such, political researchers do not use this measure very often. We have discussed it largely as a way to enhance your understanding about measures of dispersion and as a prelude to our discussion of other metric measures of dispersion.

One such measure is the variance. Like the mean deviation, the variance is built from the differences between each observation and the mean. When you calculate the variance, however, you square the differences between each observation and the mean. Next, you sum the squares and divide the result by the number of cases. The formula for the calculation of the variance looks very much like the formula for the calculation of the mean deviation:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$

where

$\sum$ = the sum of.
$X_i$ = each individual observation.
$\bar{X}$ = the mean of all of the observations.
$n$ = number of observations.

variance: Another measure of dispersion of data points about the mean for metric-level data. It is a measure of how spread out a distribution is.

The standard deviation is probably the most common measure of dispersion for metric data. Like the variance, the basis for standard deviations is the squared differences between every item in a data set and the mean of that set. In fact, you simply take the square root of the variance to calculate the standard deviation. Similar to the other measures of dispersion we discussed, the smaller the standard deviation in a set of data, the more closely the data cluster about the measure of central tendency.

We won’t trouble you with the reasoning here, but the standard deviation is a stable measure of dispersion from sample to sample. Political scientists use standard deviations with the normal curve to determine where scores or observations cluster about the mean and to determine a standard score. While we can use either the variance or the standard deviation to indicate the amount of variation within a metric-level variable, we usually use the standard deviation.

To see the utility of the two measures, let’s examine Table 13-10. The table is similar to Table 13-9 in that it depicts the percent of vote for Bush/Cheney in southeastern states in the 2000 presidential election. It differs, however, in that it illustrates the calculation of the variance and the standard deviation for the same data set. For this example, the variance is 8.7 and the standard deviation is 2.95. The lower the variance or standard deviation, the more accurately the mean represents the scores of all cases in a distribution of metric-level data.

standard score: An individual observation that belongs to a distribution with a mean of 0 and a standard deviation of 1. See Z score.

standard deviation: The most common measure of dispersion of data points about the mean of metric-level data. It is the square root of the variance.

Table 13-10 Percent of Vote for Bush/Cheney in Southeastern States in the 2000 Presidential Election (Variance and Standard Deviation)

Mean = 54.0%

State             % Vote    (Xi – X̄)    (Xi – X̄)²
Alabama           56.5      2.5          6.3
Arkansas          51.3      2.7          7.3
Florida           48.8      5.2          27.0
Georgia           55.0      1.0          1.0
Kentucky          56.4      2.4          5.8
Louisiana         52.6      1.4          2.0
Mississippi       57.6      3.6          13.0
South Carolina    56.9      2.9          8.4
Tennessee         51.2      2.8          7.8
Total                       24.5         78.6

Variance (s²) = 78.6/9 = 8.7.
Standard deviation (s) = √8.7 = 2.95.

Source: Adapted from National Archives and Records Administration.
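The calculations behind Tables 13-9 and 13-10 can be sketched in Python. The names below are our own. The sketch uses the unrounded mean (about 54.03) rather than the rounded 54.0 shown in the tables, so intermediate totals differ slightly from the printed ones, but the results agree after rounding.

```python
from math import sqrt

# Percent of vote for Bush/Cheney in the nine southeastern states (Table 13-9).
vote_share = {
    "Alabama": 56.5, "Arkansas": 51.3, "Florida": 48.8,
    "Georgia": 55.0, "Kentucky": 56.4, "Louisiana": 52.6,
    "Mississippi": 57.6, "South Carolina": 56.9, "Tennessee": 51.2,
}

values = list(vote_share.values())
n = len(values)
mean_vote = sum(values) / n

# Mean deviation: average absolute difference from the mean.
mean_deviation = sum(abs(x - mean_vote) for x in values) / n

# Variance: average squared difference from the mean (dividing by n, as in the text).
variance = sum((x - mean_vote) ** 2 for x in values) / n

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(round(mean_vote, 1), round(mean_deviation, 1),
      round(variance, 1), round(std_dev, 2))
```

Note that the sketch divides by n, following the formulas in this chapter. Many statistical packages divide by n – 1 by default when estimating the variance from a sample, which would give slightly larger values.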


The standard deviation also helps us to understand the distribution of the values for a particular metric-level variable and can be helpful when we are comparing two or more groups of cases (Corbett 2001, 140). To illustrate our point, let’s examine Table 13-11.

The table shows the hypothetical per capita income (PCI) for samples of citizens of several states. It also shows the standard deviation for the PCI in the states. Let’s analyze the results. The table shows that income is distributed in very different ways across the states. In Florida, Colorado, New York, and Illinois, people’s incomes are fairly close together. In other words, there is not a great deal of income inequality. That is why the standard deviations are relatively low. In New Mexico, however, there is greater income inequality. And in Texas, as evidenced by the relatively high standard deviation, there is a great deal of income inequality when compared to the other states.

In summary, the standard deviation and the variance show us how much variation there is within a metric variable. For a variable, when there is very little difference from one case to another, these statistics will be low. Conversely, when there is a great deal of diversity among the cases for a variable, these statistics will be high. As we discuss later in Section 13-8b of this chapter, when the distribution of values of a variable approaches a normal distribution, the standard deviation tells us even more.

13-8 Shape of the Distribution and Metric Distributions

Up to this point we have discussed measures of central tendency and measures of dispersion as ways to examine data distributions. In the past, political scientists also analyzed the shape of distributions by constructing a frequency polygon. To do this, they would connect the midpoints of the top of each bar of a histogram with a solid line. The shape of the polygon reflected how the scores were distributed. Those distributions having most of their case scores above the mean had a different shape from those having a large proportion of scores below the mean.

Table 13-11 Per Capita Income for Selected States

State         Mean Income    Standard Deviation
Florida       22,916         500
Texas         20,654         3000
Colorado      23,449         600
New York      26,782         550
New Mexico    18,055         1850
Illinois      24,763         625

Source: Hypothetical

frequency polygon: A graph resulting from the connection of the midpoints of the top of each bar of a histogram with a solid line.

13-8a Skewed Distributions

Three possible shapes can result when drawing frequency polygons of metric data distributions. Figure 13-5 shows the first two shapes. Each shape represents a skewed distribution. This means that in both instances there are more extreme scores in one direction or the other. In the first instance, there are more extreme low scores than extreme high scores. This is a negative, or left, skewed distribution. You can see that the mean is pulled in the direction of the lower scores. If you are analyzing the distribution of Anglo residents for the United States, the distribution will be negatively skewed because Anglos are in the minority only in Hawaii. The second shape shows the impact of the many extreme high scores in the distribution. This shape represents a positive, or right, skewed distribution because the mean is pulled in the direction of the higher scores. If you are examining the land area for the United States, the distribution will be positively skewed because of Texas and Alaska.

skewed distribution: A data distribution in which more observations fall to one side of the mean than the other. Thus, the mean is “pulled” toward the extreme low (negative skew) or extreme high (positive skew).

negatively skewed: A distribution of values in which more observations lie to the left of the middle value.

While political researchers, in the past, drew frequency polygons to get a sense of the shape of a distribution, today many statistical packages allow us to compare metric distributions with the normal curve. Again, consider Figure 13-5. The figure depicts two distributions with a normal distribution curve superimposed on the polygons. If the distributions were normal distributions, the bars would touch the curves. As you can readily see, there are several bars that do not reach the curves and there are several bars that extend beyond the curves. Thus, each distribution is a skewed distribution. Also note that the median value is greater than the value of the mean in Graph (a), while the mean is greater than the median value in Graph (b). Therefore, Graph (a) depicts a negative (left) skew and Graph (b) illustrates a positive (right) skew.
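The comparison of the mean and the median described above can be turned into a rough check in Python. This is only a rule-of-thumb sketch: the function name and both data sets are invented for illustration and do not come from Figure 13-5.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skew by comparing the mean with the median (a rule of thumb)."""
    m, md = mean(values), median(values)
    if m < md:
        return "negative (left) skew"   # mean pulled toward extreme low scores
    if m > md:
        return "positive (right) skew"  # mean pulled toward extreme high scores
    return "no skew indicated"


# Hypothetical scores with one extreme low value.
low_tail = [10, 60, 70, 75, 80, 85, 90]

# Hypothetical scores with one extreme high value.
high_tail = [10, 15, 20, 25, 30, 40, 90]

print(skew_direction(low_tail))
print(skew_direction(high_tail))
```

In the first list the single low score drags the mean below the median; in the second the single high score pulls the mean above it.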

13-8b The Normal Curve

A symmetrical distribution is the third shape you can obtain when constructing a frequency polygon. Figure 13-6 is a depiction of the normal distribution curve. The normal curve, a special type of symmetrical distribution, is very valuable in statistics because it has several important properties. First, the curve is symmetrical and bell-shaped. Second, the measures of central tendency coincide at the center of the distribution. In other words, the values of the measures are equal. Third, the curve is based on an infinite number of observations. The last property of the normal curve that we discuss, however, is probably its most distinctive characteristic. In any normal distribution, a fixed proportion of the observations lie between the mean and fixed units of standard deviations. To help you understand why this property is so important, let’s examine Figure 13-7.

The percentages can be seen in Figure 13-7. The mean of the distribution divides the curve exactly in half. Note that a little more than 34 percent of all cases fall between the mean and one standard deviation above the mean. Additionally, a little more than 34 percent of the cases fall between the mean and one standard deviation below the mean. Thus, slightly more than 68 percent of all cases in a normal distribution lie within one standard deviation (plus or minus) of the mean. Similarly, more than 13.5 percent of all cases fall between one standard deviation and two standard deviations above the mean, and the same share falls between one standard deviation and two standard deviations below the mean. Therefore, more than 95 percent of all cases in a normal distribution lie within plus or minus two standard deviations of the mean. Continuing the analysis you can see that almost all of the cases (99.74 percent, using the table values) will fall within plus or minus three standard deviations of the mean.

positively skewed: A distribution of values in which more observations lie to the right of the middle value.

normal distribution curve: A frequency curve showing a symmetrical, bell-shaped distribution in which the mean, mode, and median coincide and in which a fixed proportion of observations lies between the mean and any distance from the mean measured in terms of the standard deviation.

Figure 13-5 Skewed Distributions and the Normal Curve. Graph (a): negative (left) skew. Graph (b): positive (right) skew. Source: Statistical Abstract of the United States, 1996.

Figure 13-6 The Normal Distribution

Figure 13-7 Areas under the Normal Curve. Source: Adapted from http://davidmlane.com/hyperstat/normal_distribution.html.

Consequently, the standard deviation used with the normal curve can be a very important tool in the political scientist’s repertoire. It is important because the researcher can determine the proportion of observations included within fixed distances of the mean. For example, assume that the public’s rating of a particular welfare program rated on a scale of 0 to 100 has a normal distribution. Additionally, the distribution has a mean of 50 and a standard deviation of 10. Based on this information we can conclude that more than 68 percent of the public assigns the program a rating between 40 and 60 (±1 standard deviation from the mean). Additionally, more than 95 percent assigned the program a rating between 30 and 70 (±2 standard deviations from the mean). Last, almost everyone in the survey assigned the program a rating between 20 and 80 (±3 standard deviations from the mean).
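The proportions in this example can be checked with `NormalDist` from Python’s standard `statistics` module (available in Python 3.8 and later). This is a sketch under the example’s own assumption that the ratings are normally distributed with a mean of 50 and a standard deviation of 10; the helper name `share_between` is ours.

```python
from statistics import NormalDist

# Ratings assumed normal with mean 50 and standard deviation 10, as in the text.
ratings = NormalDist(mu=50, sigma=10)


def share_between(low, high):
    """Proportion of cases with a rating between low and high."""
    return ratings.cdf(high) - ratings.cdf(low)


within_1 = share_between(40, 60)  # plus or minus 1 standard deviation
within_2 = share_between(30, 70)  # plus or minus 2 standard deviations
within_3 = share_between(20, 80)  # plus or minus 3 standard deviations

for label, share in (("±1 sd", within_1), ("±2 sd", within_2), ("±3 sd", within_3)):
    print(label, f"{share:.2%}")
```

The three shares come out at roughly 68, 95, and 99.7 percent, in line with the figures quoted in this section.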

13-8c Standard Scores (the Z Score)

When determining the proportion of observations within a desired interval, the political scientist should express observations in units of standard deviation. For example, you can use Figure 13-7 to determine the percentage of cases rating the welfare program from the mean of 50 to 75. To do so you have to determine how many standard deviations the rating of 75 lies from the mean of 50. Political scientists calculate the Z score to accomplish this task. The formula for Z is

$$Z = \frac{X - \bar{X}}{s}$$

where

$Z$ = the standard score.
$X$ = the value of any observation.
$\bar{X}$ = the mean.
$s$ = the standard deviation.

The Z score tells us the number of standard deviations that the score lies above or below the mean. If we apply the formula to the example discussed, the Z score is

$$Z = \frac{75 - 50}{10} = 2.50$$

In our example, we find that the score of 75 is 2.5 standard deviation units above the mean. Intuitively, this should make sense. Recall that we showed that a score of 70 is 2 standard deviation units above the mean, and a score of 80 is 3 units of standard deviation above the mean. Thus, a score of 75 had to fall between 2 and 3 standard deviation units.

To carry our analysis further, Figure 13-7 shows that 47.72 percent of the cases lie between the mean and 2 standard deviation units above the mean. The figure also shows that 49.87 percent of the cases lie between the mean and 3 units of standard deviation above the mean. Thus, if a 75 rating is 2.5 standard deviation units above the mean, somewhere between 47.72 and 49.87 percent of the public assigned the program a score between 50 and 75. We will need to use the standard normal distribution table to determine the exact percentage.

Table 13-12 depicts selected sections of the standard normal distribution table. In other words, it is a partial Z table to be used with the examples in this book. Take the following steps to use the table:

1. Scan the far-left column to find the first two digits of the Z value. In our case, 2.5.

2. Under the numerical column headings find the third digit of the Z value. In our case, .00.

3. Extend both the column and the row until they intersect. The value that you find at the point of intersection is the proportion of cases that lie between the mean and 2.5 standard deviations above the mean. In our case, .4938. This means that 49.38 percent of the public assigned the welfare program a rating between 50 and 75.

standard normal distribution: A normal distribution having a mean of 0 and a standard deviation and variance of 1.

Z score: The number of standard deviations that a score deviates from the mean in a standardized normal distribution. See standard score.


To conclude our discussion about the normal curve and Z scores, let’s look at another illustration. Suppose you want to determine the percentage of the public assigning the program a rating from 0 to 75. You do not need to plug new figures into the formula just presented, because there is a quicker way to determine the answer. Simply add .50 to the proportion associated with the Z value we just calculated (.4938). We do this because in a normal curve 50 percent of the cases lie on either side of the mean. Thus, we conclude that 99.38 percent of the public would rate the program from 0 to 75.
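Both Z-score illustrations can be reproduced with a short Python sketch that uses `NormalDist` from the standard `statistics` module in place of the printed Z table. The variable names are our own, and the sketch follows the text’s shortcut of treating "0 to 75" as everything below 75.

```python
from statistics import NormalDist

mean_rating = 50   # mean of the ratings
std_dev = 10       # standard deviation of the ratings
rating = 75        # observation of interest

# Z score: number of standard deviations the rating lies from the mean.
z = (rating - mean_rating) / std_dev

# The standard normal distribution has mean 0 and standard deviation 1.
standard_normal = NormalDist()

# Proportion of cases between the mean and Z (the quantity a Z table reports).
area_mean_to_z = standard_normal.cdf(z) - 0.5

# Proportion of cases at or below the rating: the lower half plus the area above.
area_below = 0.5 + area_mean_to_z

print(z, round(area_mean_to_z, 4), round(area_below, 4))
```

Rounded to four decimals, the two areas match the table-based results of .4938 and .9938.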


Table 13-12 Selected Sections of the Standard Normal Distribution

Z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0    .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.5    .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
1.0    .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.5    .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
2.0    .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.5    .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
3.0    .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990

Notes:
1. This is a partial Z table to be used with the examples in this book.
2. An entry in the table is the proportion under the entire curve that is between Z = 0 and a positive value of Z. Areas for negative values of Z are obtained by symmetry.
3. To obtain the percentage of cases from the mean, multiply the cell entry by 100 (.4938 * 100 = 49.38%).

Chapter SummaryIn this chapter we examined some important tools to use inthe preliminary stage of data analysis. For example, sophis-ticated computer programs summarize data as frequencydistributions. These distributions depict a number of case(N) and number of cases by class, percentages, and, perhaps,cumulative percentages. These techniques help the politicalscientist to assess the weight of a single class in relation toother classes of a distribution or distributions.

Additionally, political scientists use measures of centraltendency to describe the distribution’s main characteristics.These measures help the researcher answer questions suchas “What is the typical party identification of respondents?”or “What is the average level of income of the group?”

Measures of central tendency, however, can be misleading if not accompanied by measures that describe the amount of dispersion in the distribution. While measures of central tendency reflect a group’s typical characteristic, measures of dispersion depict the extent of variance from the typical value, or average. The dispersion measures show how many members of the group deviate from the typical and the extent of their deviation. A small deviation shows that most responses cluster around the measure of central tendency, suggesting a homogeneous group. Large deviations, on the other hand, suggest that the measure of central tendency is a poor representation of the distribution.
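The point can be made concrete with two groups that share the same mean but differ sharply in dispersion. The sketch below uses Python's standard `statistics` module; the scores and the helper name `describe` are hypothetical, and the standard deviation is computed with the population formula (dividing by N), which is an assumption rather than a statement of the formula used earlier in the chapter.

```python
import statistics

def describe(scores):
    """Return the mean, median, and population standard deviation of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "std_dev": statistics.pstdev(scores),  # population formula: divide by N
    }

# Hypothetical scores: both groups have a mean of 50.
group_a = [48, 49, 50, 51, 52]   # homogeneous group
group_b = [10, 30, 50, 70, 90]   # widely dispersed group

for name, scores in (("A", group_a), ("B", group_b)):
    summary = describe(scores)
    print(f"Group {name}: mean={summary['mean']}, "
          f"median={summary['median']}, std dev={summary['std_dev']:.2f}")
```

The mean alone would suggest that the two groups are alike; the standard deviation shows that the mean represents group A far better than group B.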

Another important step in examining a distribution is the identification of its general form. For example, the shape of a frequency polygon may show that extreme scores in the distribution affect the measure of central tendency. Or the form may be symmetrical or even normal, because there are no extreme scores affecting the shape of the distribution. If the distribution is normal, a fixed proportion of observations lies between the mean and fixed units of standard deviation.
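The effect of an extreme score can be seen by comparing the mean with the median. The following Python sketch applies the rule of thumb that a mean above the median suggests a positive skew and a mean below the median suggests a negative skew. The scores and the function name `skew_direction` are hypothetical, and the comparison is only a rough indicator, not a formal measure of skewness.

```python
import statistics

def skew_direction(scores):
    """Rough indicator of skew based on comparing the mean with the median."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none"

# Hypothetical scores: the second list replaces 40 with one extreme value.
symmetric = [20, 25, 30, 35, 40]
with_extreme = [20, 25, 30, 35, 400]

for scores in (symmetric, with_extreme):
    print(scores,
          "mean =", statistics.mean(scores),
          "median =", statistics.median(scores),
          "skew:", skew_direction(scores))
```

The single extreme score pulls the mean well above the median, while the median itself does not move.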

The measures discussed throughout this chapter help the political scientist understand data distributions. Analyzing these descriptive statistics, however, is only the first step in data analysis. Once the data are summarized, researchers often want to discover relationships between variables. We turn to this issue in the next chapter.



Chapter Quiz

1. Consider these scores: 0, 3, 1, 5, 1. The mean is
   a. 2.
   b. 2.5.
   c. 3.
   d. None of choices a through c is correct.

2. In a symmetric, unimodal distribution,
   a. the median equals the mean.
   b. the mode equals the median.
   c. the mean equals the mode.
   d. Each of choices a through c is correct.

3. The GSS variable AGE1STBRN measures the respondent’s age when their first child was born. The mean of this variable is higher than the median. So we know that the distribution of respondents’ ages when their first child was born is
   a. negatively skewed.
   b. normal.
   c. bimodal.
   d. positively skewed.

4. The standard deviation measures deviation around the
   a. mode.
   b. median.
   c. mean.
   d. variance.

5. The number of standard deviations a score lies from the mean in a standard normal distribution is
   a. the case’s Z score.
   b. the standard error.
   c. the confidence interval.
   d. None of choices a through c is correct.

6. The ________ is the only measure of central tendency that can properly be used with nominal data.
   a. mode
   b. median
   c. mean
   d. standard deviation

7. Political scientists use measures of ________ to describe the distribution’s average characteristics.
   a. dispersion
   b. association
   c. central tendency
   d. statistical significance

8. Measures of ________ depict the extent of variance from the typical value, or average value.
   a. dispersion
   b. association
   c. central tendency
   d. statistical significance

9. The basis for the standard deviation and ________ is the squared differences between every item in a data set and the mean of that set.
   a. mode
   b. median
   c. mean
   d. variance

10. The ________ is the average difference between the mean and all other values in the distribution.
   a. mode
   b. standard deviation
   c. variance
   d. mean deviation


Suggested Readings

Bernstein, Robert A. and James A. Dyer. An Introduction to Political Science Methods, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1992.

Blalock, Hubert M., Jr. Social Statistics, 2nd ed. New York: McGraw-Hill, 1979.

Cole, Richard L. Introduction to Political Science and Policy Research. New York: St. Martin’s Press, 1996.

Corbett, Michael. Research Methods in Political Science: An Introduction Using MicroCase, 4th ed. Belmont, CA: Wadsworth, 2001.

Davis, Richard and Diana Owen. New Media and American Politics. New York: Oxford University Press, 1998.

Fox, William. Social Statistics, 3rd ed. Bellevue, WA: MicroCase, 1998.

Frankfort-Nachmias, Chava and David Nachmias. Research Methods in the Social Sciences, 6th ed. New York: Worth Publishers, 2000.

Johnson, Janet Buttolph, Richard A. Joslyn, and H. T. Reynolds. Political Science Research Methods, 4th ed. Washington, D.C.: Congressional Quarterly Press, 2001.

Kay, Susan Ann. Introduction to the Analysis of Political Data. Englewood Cliffs, NJ: Prentice-Hall, 1991.

Leedy, Paul D. and Jeanne Ellis Ormrod. Practical Research: Planning and Design, 7th ed. Upper Saddle River, NJ: Merrill Prentice Hall, 2001.


Endnote

1. See Corbett, Michael. Research Methods in Political Science: An Introduction Using MicroCase, 4th ed. Belmont, CA: Wadsworth Publishers, 2001, for a succinct presentation of how to present frequency tables.
