Intermediate SPSS 1


Posted on 29-Nov-2015









<ul><li><p>Intermediate SPSS (1): Hypothesis Testing and Inferential Statistics</p><p>Tutorial Goal: Building and testing hypotheses using inferential statistics in SPSS. This workshop covers parametric and nonparametric tests, concentrating on correlation, chi-square, and t-tests. Participants learn how to understand, analyze, and report results.</p><p>OK, let's review some of what we covered in our last workshop.</p><p>What is statistics?</p><p>Statistics is the science and practice of developing knowledge through the use of empirical data expressed in quantitative form. So, you are basically posing a question about something in the social sciences and using numbers to answer it.</p><p>Some examples of these questions are:</p><p>Do countries with stricter gun control laws have fewer deaths by firearms?</p><p>What are the best methods for teaching?</p><p>What factors cause a disease to spread from one place to another?</p><p>Do religious views and class affect opinions about euthanasia?</p><p>You can answer these questions by using numbers. In statistics, every variable has one of four levels of measurement, and all of your analyses follow from which level your variable has. The four levels spell NOIR:</p><p>(N)ominal</p><p>(O)rdinal</p><p>(I)nterval</p><p>(R)atio</p><p>Let's talk about each one.</p><p>Nominal means that the number simply represents a category of objects. There is no measured difference among the objects or people. Some examples are giving states numbers (N.Y. = 1, Connecticut = 2, R.I. = 3), assigning a number for gender (male = 1, female = 2), or designating college major (History = 1, Business = 2, Sociology = 3). You are just assigning a number to something.</p><p>Ordinal means that a larger number really does stand for a larger amount of something. This typically means rank. 
Some examples are 1st, 2nd, and 3rd place in a contest, or rankings of favorite movies. However, there is no exactly measured difference among the objects. We don't know how much larger or better 1st is than 2nd; we only know that 1st is somehow larger than 2nd.</p><p>Interval means that, as with ordinal data, the objects or people are ranked, but the distance between ranks is also measured. Some examples are degrees Celsius or Fahrenheit. We know that the difference between 98 and 99 degrees corresponds to a fixed change in the height of the mercury in a thermometer, and that the difference between 42 and 43 degrees is the same as the difference between 98 and 99. However, there is no true zero, which would stand for a complete lack of the thing being measured. For example, 0 degrees does not mean there is no heat (or no mercury).</p></li><li><p>Ratio means that, as with interval data, the distance between ranks is measured, but there is also a true zero. A true zero means a complete lack of the quantity being measured. An example is income: the difference between $10,000 and $11,000 is known, and zero means a complete lack of income.</p><p>These levels are very important, and we will discuss them more as we go on. Nominal and Ordinal data are called Nonparametric Data, and Interval and Ratio data are called Parametric Data. The statistical analyses you can use depend on the level of your data. Specifically, if you can compute a meaningful mean from your data, then you can use parametric tests.</p><p>In this tutorial, we are interested in Inferential Statistics. This form of statistics tries to draw conclusions about a whole group from a sample of that group. So, we have two important concepts. First, a population is the entire group of whatever you're studying. Second, a sample is a subset of the population. 
If you're trying to do research, studying a whole population is probably out of the question. A sample is easier to obtain, and you can use it to infer how the whole population behaves. Of course, it has to be a random sample, which means that anyone or anything in the population has an equal chance of falling into the sample. If not, then you have bias, which means that the sample is not an accurate picture of the whole population.</p><p>OK, now that we understand what a population and a sample are, we need to know what probability theory is.</p><p>Probability theory is the branch of mathematics that studies the likelihood of occurrence of random events in order to predict the behavior of defined systems. So, we want to apply probability theory to a sample to infer what the whole population does. The best way to understand this is by looking at dice and how they behave.</p><p>If you rolled one die, what is the chance you'd get a five?</p><p>1/6</p><p>[Figure: diagram of a sample drawn from a population, labeled "Sample" and "Population"]</p></li><li><p>If you rolled two dice, what is the chance you'd get two fives?</p><p>1/36 (that is, 1/6 × 1/6, because each die is rolled independently)</p><p>OK, now look at our sample space, which is the set of all possible outcomes. If you have two dice, the following chart lists all 36 possible outcomes:</p><p>[Figure: chart of all 36 possible outcomes of rolling two dice]</p><p>So, the more ways there are to get an outcome, the higher the probability of getting that outcome. For example, six of the 36 possible outcomes add up to 7, so the probability of rolling a 7 is 6/36 = 1/6, whereas only one outcome (1 and 1) adds up to 2, so the probability of rolling a 2 is only 1/36.</p><p>A good graphic for this probability can be seen at a web site called Introduction to Probability Models, where you can run a simulation of rolling two dice. The right panel below shows the total of the two dice on the X axis and the number of times each total came up on the Y axis. The first chart shows the result of rolling two dice ten times. 
</p><table><tr><th>Result (total of two dice)</th><th>Probability</th></tr><tr><td>2</td><td>1/36</td></tr><tr><td>3</td><td>2/36</td></tr><tr><td>4</td><td>3/36</td></tr><tr><td>5</td><td>4/36</td></tr><tr><td>6</td><td>5/36</td></tr><tr><td>7</td><td>6/36</td></tr><tr><td>8</td><td>5/36</td></tr><tr><td>9</td><td>4/36</td></tr><tr><td>10</td><td>3/36</td></tr><tr><td>11</td><td>2/36</td></tr><tr><td>12</td><td>1/36</td></tr></table></li><li><p>Rolling two dice twenty times.</p><p>And finally, rolling two dice one hundred times.</p><p>You can see that the outcomes with higher probability, totals of 6, 7, and 8, build up more quickly. You can also see that the results build up into a roughly bell-shaped curve; if the outcomes follow a normal distribution, you should see this kind of curve. So, the totals with higher probability are in the middle, and those with lower probability are at the extremes. This is what statistics is all about: seeing which results have a high probability of occurring and which don't.</p><p>Two important ideas that come from a distribution of outcomes are central tendency and variance. Let's explore these essential ideas for a moment.</p></li><li><p>[Figure: normal distribution of IQ scores]</p><p>An IQ test is a perfect example of central tendency and variance. Your IQ score is essentially a comparison of your result with the results of everybody else who has taken the test. Millions of people take these tests. Very few people score very low, and there are very few geniuses who score very high; the majority of us have average IQs. As seen in the graphic above, IQ results, when plotted, have a normal distribution: the majority of results cluster in the middle, and lower and higher results are infrequent, becoming rarer the farther they are from the center.</p><p>Central tendency is usually measured by the mean (all cases added together and then divided by the number of cases). So, a score of 100 on an IQ test is the mean; it represents average intelligence. Remember, the results of the majority of people bunch around 100. Variance describes how far the scores spread out from the mean. 
If most of the scores cluster around the mean, then there is low variance: the curve looks like a tall bell, with most of the results in the middle. If the variance is high, the middle of the curve is not as tall and the results are more spread out. So, with statistics, we're trying to figure out whether our numbers fall near the central tendency, which means that maybe there is nothing unusual about them, or whether they fall farther away toward the extremes and are unusual. Recall our discussion of populations, samples, and IQs. The average IQ is 100, so if you take a sample from the population, you should expect the average IQ in that sample to be around 100. However, if the average IQ in that sample turns out to be 130, then statistically your sample is not average.</p><p>STOP! The difference between ordinal and interval is often slight, and sometimes you can get away with using parametric tests for ordinal data.</p><p>OK, first, when doing statistics, you need to choose the right test. As we've talked about, there are two types of data: nonparametric and parametric. This makes a big difference in what tests we can perform. If our data are nominal or ordinal, there is no meaningful mean, so we use nonparametric tests.</p></li><li><p>There are some assumptions about data that you should be aware of. These also affect which test you choose.</p><table><tr><th>Nonparametric</th><th>Parametric</th></tr><tr><td>Nominal/Ordinal data</td><td>Interval/Ratio data</td></tr><tr><td>Random sampling</td><td>Random sampling</td></tr><tr><td></td><td>Normal distribution</td></tr><tr><td></td><td>Equal variances of the scores in the populations that the samples come from</td></tr></table><p>Since parametric tests have more assumptions, they are considered more powerful when those assumptions are met. Powerful means that these tests are better at picking up differences in variables in the population. 
They are also fairly robust to violations of these assumptions, so if the assumptions are not completely met, you can still get accurate results. The only assumption that's nonnegotiable is the level of measurement.</p><p>When you choose a statistical analysis, you need to do two things: make a hypothesis and decide on significance. We are now going to talk about each one before we go on to the tests.</p><p>1. Making hypotheses is an essential part of every test. These hypotheses always deal with how the numbers of your sample relate to the numbers of the population. First, state a null hypothesis, and then an alternative hypothesis.</p><p>Null Hypothesis (H0) states that the numbers of your sample do not differ significantly from the numbers of the population. For example, you walk into any old restaurant and give an IQ test to 30 customers. H0 says that the mean of their IQs should not differ significantly from the mean IQ of the population.</p><p>Alternative Hypothesis (HA) states that the numbers of your sample differ significantly from the numbers of the population. For example, we heard that the restaurant puts intelligence-boosting spices in its food, so our HA is that the sample of 30 people from the restaurant has a mean IQ of 130, which is much higher than the population's mean IQ.</p><p>P is significance. So, if you see a result reported p</p></li><li><p>One-tailed tests are used if you have a directional hypothesis. Basically, you put the whole .05 of chance in the direction of your alternative hypothesis. So, if you predict a sample mean of 130 and the population mean is 100, you put the whole .05 in the direction of the hypothesis, which is above the mean.</p><p>Two-tailed tests are used when you are not certain in which direction your alternative hypothesis goes. 
So, if you hypothesize that a sample mean is somehow different from the population's mean, in either a positive or a negative direction, then split the alpha into two parts of .025 and place one at each end of the normal distribution.</p><p>[Figures: a one-tailed test with .05 due to chance in one tail, and a two-tailed test with .025 due to chance in each tail]</p><p>After you have performed a test, you state the result in a sentence. You also usually report five things: the test result, the degrees of freedom (df), the sample size, whether the test was one- or two-tailed, and the significance.</p><p>1. Test Result: Each test has its own mathematical equation. For our purposes in SPSS, we do not need to know the exact mechanics of each equation; we will just discuss the big picture of each test and roughly what it's doing. Basically, for the analyses here, the higher the result, the better our chances of reaching significance and rejecting H0. However, when reporting, you need to report the result of the equation. This will be pointed out in each of our tests.</p><p>2. Degrees of Freedom (df): The df is the number of values that are free to vary, which is the number of observations minus the number of constraints. This point is very technical and doesn't really affect your research; you just need to report it. You only need to report it for chi-square, t-tests, correlation, and ANOVAs.</p><p>3. Number: The number of cases in your sample.</p><p>4. One- or two-tailed: Where you put your chance of randomness (only with parametric tests).</p></li><li><p>5. Significance: You need to report the level of significance that your result reached.</p><p>Of these five things, the test result and significance are the most important. Basically, you need a test result high enough to reach significance. For example, if I were doing a chi-square test with 2 df, I would need a test result at least as large as the critical value of 5.991 to reach significance. 
If you reach significance, you can reject H0 and accept HA (you always talk about accepting hypotheses). Don't worry, though: SPSS does all the math. You only need to understand and report the results.</p><table><caption>Chi-square critical values by degrees of freedom (df) and upper-tail area</caption><tr><th>df</th><th>.050</th><th>.025</th><th>.010</th><th>.005</th></tr><tr><td>1</td><td>3.84146</td><td>5.02389</td><td>6.63490</td><td>7.87944</td></tr><tr><td>2</td><td>5.99146</td><td>7.37776</td><td>9.21034</td><td>10.59663</td></tr><tr><td>3</td><td>7.81473</td><td>9.34840</td><td>11.34487</td><td>12.83816</td></tr><tr><td>4</td><td>9.48773</td><td>11.14329</td><td>13.27670</td><td>14.86026</td></tr><tr><td>5</td><td>11.07050</td><td>12.83250</td><td>15.08627</td><td>16.74960</td></tr></table><p>Minimum critical value to reach significance at p = .05 with 2 df: 5.99146 (row df = 2, column .050).</p><p>So, with our restaurant and IQ example, the result would be reported as "The mean IQ of 130.76 for the 30 diners at the restaurant was significantly higher than the national average IQ, t(29) = 20.650, p &lt; .001, one-tailed."</p><p>In this lesson, you will be introduced to three of the major statistical tests: chi-square, correlation, and t-tests.</p><p>1. CHI-SQUARE: This test is nonparametric, so it is appropriate for nominal data. Chi-square (written χ², a symbol you can find among the Greek letters in Microsoft Word) is used as a test of frequencies, mostly percentages and proportions. The null hypothesis is that the numbers or frequencies that fall into categories are not different from a distribution caused by chance. It assumes that chance alone would distribute the frequencies equally among the categories.</p><p>There are two types of chi-square: goodness-of-fit and test-of-independence. Goodness-of-fit compares the frequencies of one variable against a hypothetical or known value; this test is not used very often. Test-of-independence compares the frequencies of two or more variables and is the more commonly used test. Let's practice this.</p></li><li><p>Our data are from the Pew Internet and American Life Project, which collects survey data on people's Internet use. 
These data were collected after the 2004 presidential election, specifically to gather data on how people behaved politically and how they used media. We have three variables:</p><p>Polid: if the part...</p></li></ul>
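The tutorial defines a random sample as one in which anyone or anything in the population has an equal chance of falling into the sample. As a minimal sketch, assuming a hypothetical population of 1,000 people identified only by ID numbers, Python's standard `random.sample` draws a simple random sample without replacement. Repeating the draw many times shows each person being picked at roughly the same rate (30/1000 = 0.03). The population size, the sample size of 30, and the seed are illustrative choices, not values from the tutorial.

```python
import random
from collections import Counter

# Hypothetical population: 1,000 people identified only by ID numbers 0-999.
population = list(range(1000))
rng = random.Random(2004)  # arbitrary fixed seed so the example is reproducible

# A simple random sample of 30 people, drawn without replacement.
sample = rng.sample(population, k=30)

# Check the "equal chance" idea: redraw many samples and count how often each person appears.
n_draws = 5000
inclusions = Counter()
for _ in range(n_draws):
    inclusions.update(rng.sample(population, k=30))

# Every person's inclusion rate should hover around 30 / 1000 = 0.03.
rates = [inclusions[person] / n_draws for person in population]
print(sorted(sample))
print(f"lowest rate {min(rates):.3f}, highest rate {max(rates):.3f}, expected 0.030")
```

A sample drawn any other way (for example, only the first 30 IDs) would give some people no chance of selection, which is the bias the tutorial warns about.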
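The dice probabilities in the tutorial (1/6 for a five on one die, 1/36 for two fives, and the Result/Probability table for the total of two dice) can be checked by listing the whole sample space. This small Python sketch enumerates the 36 ordered outcomes and counts how many produce each total. It uses exact fractions so the results can be compared directly with the table; note that `Fraction` reduces 6/36 to 1/6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# One die: exactly one of the six equally likely faces is a five.
p_five_one_die = Fraction(sum(1 for face in range(1, 7) if face == 5), 6)

# Sample space for two dice: all 36 ordered (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Two fives: only the single outcome (5, 5) qualifies.
p_two_fives = Fraction(outcomes.count((5, 5)), len(outcomes))

# Count the outcomes behind each total, then convert the counts to probabilities.
counts = Counter(a + b for a, b in outcomes)
probabilities = {total: Fraction(counts[total], len(outcomes)) for total in range(2, 13)}

print("P(five on one die) =", p_five_one_die)
print("P(two fives) =", p_two_fives)
for total, p in probabilities.items():
    print(f"{total:>2}  {counts[total]}/36  = {p}")
```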
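The two-dice simulation described above (10, 20, and then 100 rolls) can be imitated in a few lines. This is a sketch, not the Introduction to Probability Models site's code. It uses Python's `random` module with fixed, arbitrary seeds and prints a text histogram of the totals. With only a few rolls the shape is ragged. As the number of rolls grows, totals of 6, 7, and 8 pile up in the middle, as the tutorial describes.

```python
import random
from collections import Counter

def roll_two_dice(n_rolls, seed=None):
    """Roll two fair dice n_rolls times and count how often each total (2-12) appears."""
    rng = random.Random(seed)  # a seeded generator makes a run reproducible
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))

def print_histogram(counts):
    """One row per possible total, with one '#' per time that total came up."""
    for total in range(2, 13):
        print(f"{total:>2} | {'#' * counts[total]}")

for n in (10, 20, 100):
    print(f"--- {n} rolls ---")
    print_histogram(roll_two_dice(n, seed=n))
```

Raising `n_rolls` to tens of thousands makes the relative frequency of 7 settle close to the exact 6/36.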
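The IQ discussion describes the mean as all cases added together and divided by the number of cases, and variance as how far scores spread out from the mean. As a sketch, assuming two small, made-up sets of scores that share a mean of 100, the code below computes both by hand. It uses the sample variance (squared distances from the mean, summed and divided by n - 1) and cross-checks the result against Python's `statistics.variance`. Both sets are centered at 100, but the clustered set has low variance and the spread-out set has high variance.

```python
import statistics

def mean(scores):
    """Central tendency: add all cases, then divide by the number of cases."""
    return sum(scores) / len(scores)

def sample_variance(scores):
    """Spread: squared distances from the mean, summed and divided by n - 1."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)

# Hypothetical IQ-style scores (invented): same mean of 100, different spread.
clustered = [98, 99, 100, 100, 101, 102]
spread_out = [70, 85, 100, 100, 115, 130]

for name, scores in (("clustered", clustered), ("spread_out", spread_out)):
    # statistics.variance uses the same n - 1 formula, so it should agree.
    print(name, mean(scores), sample_variance(scores), statistics.variance(scores))
```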
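The one- and two-tailed rules (the whole .05 in one tail versus .025 in each tail) translate into different cutoffs on the normal distribution. This sketch uses Python's standard `statistics.NormalDist` (Python 3.8 or later) to find those cutoffs on a standard normal curve. It illustrates the idea with z scores and does not reproduce any SPSS output. The one-tailed cutoff comes out near 1.645 and the two-tailed cutoffs near ±1.960, so a directional hypothesis needs a less extreme result to reach significance in its predicted direction.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)
alpha = 0.05

# One-tailed (directional) test: the whole .05 sits in the predicted tail.
z_one_tailed = standard_normal.inv_cdf(1 - alpha)

# Two-tailed test: alpha is split into .025 at each end of the distribution.
z_two_tailed = standard_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed cutoff: z > {z_one_tailed:.3f}")
print(f"two-tailed cutoffs: z < {-z_two_tailed:.3f} or z > {z_two_tailed:.3f}")

# Check: area to the right of each cutoff.
print(1 - standard_normal.cdf(z_one_tailed), 1 - standard_normal.cdf(z_two_tailed))
```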
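The chi-square critical-value table can be regenerated from the chi-square distribution itself, which also confirms that the 2 df, .05 cutoff is 5.99146 (about 5.991). This is a sketch that assumes whole-number df, as in the table. It uses the standard closed-form upper-tail probability for integer df, built up two degrees of freedom at a time, and finds each critical value by bisection. For 2 df the upper tail is exactly exp(-x/2), so the critical value is -2 ln(α). SPSS reports the p value directly, so this only shows where the table numbers come from.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with a positive integer df.

    Starts from df = 1 (erfc) or df = 2 (exp) and climbs two df at a time with
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(half)) if k == 1 else math.exp(-half)
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q

def chi2_critical_value(df, alpha):
    """Smallest test result that reaches significance: solve Q(x; df) = alpha by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid  # too much area still lies to the right, so move right
        else:
            hi = mid
    return (lo + hi) / 2

# Reproduce the table: df 1 to 5, upper-tail areas .050, .025, .010, .005.
areas = (0.050, 0.025, 0.010, 0.005)
for df in range(1, 6):
    print(df, " ".join(f"{chi2_critical_value(df, a):.5f}" for a in areas))

# For df = 2 the tail is exactly exp(-x/2), so the cutoff is -2 ln(alpha).
print(-2 * math.log(0.05))
```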
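The restaurant result is reported as t(29) = 20.650 for 30 diners, which follows the one-sample t test pattern. Here df = n - 1, and t is the distance between the sample mean and the population mean (100), measured in standard-error units. The tutorial's 30 raw scores are not given, so this sketch uses 10 invented IQ scores (giving df = 9). It compares t against the one-tailed α = .05 critical value for 9 df from a standard t table (1.833). SPSS would report the exact p value instead.

```python
import math
import statistics

def one_sample_t(scores, population_mean):
    """One-sample t statistic and its degrees of freedom (df = n - 1)."""
    n = len(scores)
    # Standard error of the mean; statistics.stdev divides by n - 1.
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    t = (statistics.mean(scores) - population_mean) / standard_error
    return t, n - 1

# Hypothetical IQ scores for 10 restaurant customers (invented, not from the tutorial).
scores = [128, 135, 126, 131, 133, 129, 137, 124, 132, 130]
t, df = one_sample_t(scores, population_mean=100)

# One-tailed critical value for alpha = .05 with 9 df, from a standard t table.
T_CRITICAL_ONE_TAILED_DF9 = 1.833

print(f"M = {statistics.mean(scores):.2f}, t({df}) = {t:.3f}, one-tailed")
print("reject H0" if t > T_CRITICAL_ONE_TAILED_DF9 else "fail to reject H0")
```

The printed line follows the reporting pattern above: test result, df, and whether the test was one- or two-tailed.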
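The chi-square test-of-independence compares observed counts with the counts expected if the two variables were unrelated. The Pew variables are cut off in this section, so this sketch uses an invented 2 × 3 table of counts (two groups by three answer categories). It computes each expected count as row total × column total / grand total, sums (observed - expected)² / expected, and uses df = (rows - 1)(columns - 1) = 2. That df lets it use the exact 2-df tail exp(-x/2) for the p value. For these made-up counts, χ²(2, N = 120) ≈ 13.833, well above 5.991, so H0 is rejected.

```python
import math

def chi_square_independence(observed):
    """Pearson chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical counts, NOT the Pew data: 2 groups (rows) x 3 answer categories (columns).
observed = [
    [35, 15, 10],
    [15, 25, 20],
]
n = sum(map(sum, observed))

chi2, df = chi_square_independence(observed)
assert df == 2, "the closed-form p value below only holds for 2 df"
# With 2 df the chi-square upper tail is exactly exp(-x/2).
p_value = math.exp(-chi2 / 2)
critical_05 = -2 * math.log(0.05)  # 5.99146..., as in the critical-value table

print(f"chi2({df}, N = {n}) = {chi2:.3f}, p = {p_value:.4f}")
print("reject H0" if chi2 >= critical_05 else "fail to reject H0")
```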