Statistics Course in Psychology


Post on 25-Nov-2015




<ul><li><p>Lesson 1 Introduction</p>
<p>Outline: Statistics; Descriptive versus inferential statistics; Population versus sample; Statistic versus parameter; Simple notation; Summation notation</p>
<p>Statistics</p>
<p>What are statistics? What do you think of when you think of statistics? Can you think of some examples where you have seen statistics used? You might think about where in the real world you see statistics being used, or about how statistics are used in your major. Statistics is divided into two main areas: descriptive and inferential statistics.</p>
<p>Descriptive statistics: These are numbers that are used to consolidate a large amount of information. Any average, for example, is a descriptive statistic. So batting averages, average daily rainfall, and average daily temperature are good examples of descriptive statistics.</p>
<p>Inferential statistics: Inferential statistics are used when we want to draw conclusions, for example when we want to determine whether one treatment is better than another, or whether two groups differ in how they perform. A good book definition is using samples to draw inferences about populations. More on this once we define samples and populations.</p>
<p>Population: Any set of people or objects with something in common. Anything could be a population. We could have a population of college students. We might be interested in the population of the elderly. Other examples include single-parent families, people with depression, or burn victims. For anything we might be interested in studying we could define a population.</p>
<p>Very often we would like to test something about a population. For example, we might want to test whether a new drug is effective for a specific group. Most of the time it is impossible to give everyone a new treatment to determine whether it works. Instead we commonly give it to a group of people from the population to see if it is effective. This subset of the population is called a sample.
When we measure something in a population it is called a parameter. When we measure something in a sample it is called a statistic. For example, if I got the average age of all parents in single-parent families, the measure would be called a parameter.</p></li><li><p>If I measured the average age of a sample of these same individuals, it would be called a statistic. Thus, a population is to a parameter as a sample is to a statistic.</p>
<p>This distinction between samples and populations is important because this course is about inferential statistics. With inferential statistics we want to draw inferences about populations from samples. Thus, this course is mainly concerned with the rules or logic by which a relatively small sample from a large population can be tested and the results of those tests inferred to hold for the population as a whole. For example, if we want to test whether Bayer aspirin is better than Tylenol at relieving pain, we could not give these drugs to everyone in the population. It is not practical, since the general population is so large. Instead we might give them to a couple of hundred people and see which one works better for them. With inferential statistics we can infer that what was true for a few hundred people is likely also true for a very large population of hundreds of thousands of people.</p>
<p>The symbols we write for populations and samples differ too. With populations we will use Greek letters to symbolize parameters. When we symbolize a measure from a sample (a statistic) we will use the letters you are familiar with (Roman letters). Thus, if I measure the average age of a population I would indicate the value with the Greek letter mu ($\mu = 24$), while if I were to measure the same value for a subset of the population, a sample, I would indicate the value with a Roman letter ($\bar{X} = 24$).</p>
<p>Simple Notation</p>
<p>You might think of descriptive statistics as the vocabulary of the "language" of statistics.
If this is true, then summation notation can be thought of as the alphabet of that language. Notation, including summation notation, is just a shorthand way of representing the information we have collected and the mathematical operations we want to perform. For example, if I collect data on a variable, say the amount of time (in minutes) several people spent waiting at a bus stop, I can represent that group of numbers with the variable X. The variable X represents all of the data that I collected.</p>
<p>Amount of Time (minutes)</p>
<p>X: 5.0, 11.1, 8.9, 3.5, 12.3, 15.6</p>
<p>With subscripts I can also represent an individual data point within the variable set we have labeled X. For example, the third data point, 8.9, is $X_3$. The fifth data point, $X_5$, is 12.3.</p></li><li><p>Very often when we want to represent all of the data points in a variable set we will use X by itself, but we may also add the subscript i. Whenever you see the subscript i, you can assume that we are referring to all the numbers for the variable X. Thus, $X_i$ is all of the numbers in the data set: 5, 11.1, 8.9, 3.5, 12.3, 15.6.</p>
<p>There are other common symbols we will use besides X. Sometimes we will have two data sets to deal with and refer to one distribution as X and the other as Y. Many formulas also require us to know how many data points are in a data set. The symbol for the number of data points in a set is N. For the data set above, N = 6. In addition, we will use the average, or mean, a good deal. We will indicate the mean, as noted above, differently for the population ($\mu$) than for the sample ($\bar{X}$).</p>
<p>Summation Notation</p>
<p>Another common symbol we will use is the summation sign ($\sum$). This symbol does not represent anything about our data itself, but instead is an operation we must perform. Whenever you see this symbol it means to add up whatever appears to the right of the sign. Thus, $\sum X$ or $\sum X_i$ tells us to add up all of the data points in our data set.
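The notation introduced so far maps onto a short Python sketch. The names `X`, `N`, `sum_x`, and `mean_x` are illustrative choices, and the mean is computed here as the sum divided by N, which is the usual definition but has not yet been written out in this lesson.

```python
# Waiting times in minutes, from the bus-stop example in the text.
X = [5.0, 11.1, 8.9, 3.5, 12.3, 15.6]

# Subscripts in the text count from 1, while Python lists count from 0,
# so X_3 is X[2] and X_5 is X[4].
x3 = X[2]
x5 = X[4]

N = len(X)           # number of data points
sum_x = sum(X)       # the operation the summation sign asks for
mean_x = sum_x / N   # sample mean (assumed definition: sum divided by N)

print(f"X_3 = {x3}, X_5 = {x5}")
print(f"N = {N}")
print(f"sum of X = {sum_x:.1f}")
print(f"mean of X = {mean_x:.1f}")
```

The only trap is the off-by-one between the lesson's subscripts and Python's zero-based indexing.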
For the bus-stop data above, $\sum X = 5 + 11.1 + 8.9 + 3.5 + 12.3 + 15.6 = 56.4$. You will see the summation sign with other mathematical operations as well. For example, $\sum X^2$ tells us to add all the squared X values. Thus, for our example: $\sum X^2 = 5^2 + 11.1^2 + 8.9^2 + 3.5^2 + 12.3^2 + 15.6^2 = 25 + 123.21 + 79.21 + 12.25 + 151.29 + 243.36 = 634.32$.</p>
<p>A few more examples of summation notation are in order, since the summation sign will be central to the formulas we write. The following examples should give you a better idea of how the summation sign is used. Be sure you recall the order of operations needed to solve mathematical expressions. You will find a review on the web page. For the examples below we will use a new distribution.</p>
<p>X = 1, 2, 3, 4 and Y = 5, 6, 7, 8</p>
<p>$\sum X^2 \neq (\sum X)^2$</p></li><li><p>For this expression we are saying that the sum of the squared Xs is not equal to the square of the sum of the Xs. Notice that we perform the operation in parentheses first, then the exponents, and then the addition. Thus:</p>
<p>$\sum X^2 \neq (\sum X)^2$</p>
<p>$1^2 + 2^2 + 3^2 + 4^2 \neq (1 + 2 + 3 + 4)^2$</p>
<p>$1 + 4 + 9 + 16 \neq (10)^2$</p>
<p>$30 \neq 100$</p>
<p>The next expression shows that, as in algebra, the summation sign distributes over addition. Again, what is important is to get a feel for how the summation sign works in equations.</p>
<p>$\sum (X + Y) = \sum X + \sum Y$</p>
<p>$(1+5) + (2+6) + (3+7) + (4+8) = (1+2+3+4) + (5+6+7+8)$</p>
<p>$6 + 8 + 10 + 12 = 10 + 26$</p>
<p>$36 = 36$</p></li><li><p>Lesson 2 Scales of Measure</p>
<p>Outline: Variables (measurement versus categorical; continuous versus discrete; independent and dependent); Scales of measure (nominal, ordinal, interval, ratio)</p>
<p>Variables</p>
<p>A variable is anything we measure. This is a broad definition that includes almost everything we will be interested in for an experiment. It could be the age or gender of participants, their reaction times, or anything else we might be interested in.
Whenever we measure a variable, the differences we record can be differences in amount (measurement, or quantitative, variables) or differences in kind (categorical, or qualitative, variables). You should know both terms for each type.</p>
<p>Measurement variables are things to which we can assign a number. Examples include age, height, weight, time, or number of children in a household. These are also called quantitative variables because they measure some quantity.</p>
<p>Categorical variables are measures of differences in type rather than amount. Examples include anything we categorize, such as race, gender, or color. These are also called qualitative variables because there is some quality that distinguishes the objects.</p>
<p>Another dimension on which variables might differ is that they may be either continuous or discrete. A continuous variable can take on any value on the scale used to measure it. Thus, a measure of 1 or 2 is valid, as well as 1.5 or 1.25. Any division of any unit on the scale produces a valid possible measure. Examples include things like height or weight. You could have an object that weighed 1 pound or 1.5 pounds or 1.25 pounds. All are possible measures.</p>
<p>Discrete variables, on the other hand, can assume only certain separate values on the scale used to measure them. Divisions of these values are usually not valid. Thus, if I measure the number of television sets in your home it could be 1 or 2 or 3, but you could not have 1.5 televisions or 1.25 televisions in your home. You either have a television or you don't. Another way to keep this difference in mind is that a continuous variable is a measure of how much, while a discrete variable is a measure of how many.</p>
<p>Scales of Measure</p>
<p>Whenever we measure a variable it has to be on some type of scale. The following scales are presented in order of increasing complexity.
Nominal scales: These are not really scales at all, but are instead numbers used to differentiate objects. Real-world examples of these variables are common.</p></li><li><p>The numbers are just labels. So social security numbers, the channels on your television, and sports team jerseys are all good examples of nominal variables.</p>
<p>Ordinal scales: Ordinal scales use numbers to put objects in order. No information other than more or less is available from the scale. A good example is class rank, or any type of ranking. Someone ranked fourth had a higher GPA than someone ranked fifth, but we don't know how much better four is than five.</p>
<p>Interval scales: Interval scales contain an ordinal scale (objects are in order), but have the added feature that the distance between scale units is always the same. Class rank would not qualify because we don't know how much better one unit is than another, but with an interval scale there is the same distance from one unit to the next anywhere on the scale. Examples include temperature (in Fahrenheit or Celsius) or altitude. For temperature you know that a difference of ten degrees is the same no matter how hot or cold it might be.</p>
<p>Ratio scales: Ratio scales contain an interval scale (equal intervals between units on the scale), but have the added feature that there is a true zero point on the scale. This zero point is necessary for ratio statements to have meaning. Examples include height, weight, or measures of amount of time. Notice that it is not valid to have a measure below zero on any of these scales. Something could not weigh a negative amount. Ratio scales are much more common than interval scales because most scales have a true zero point. In fact, scientists invented the Kelvin temperature scale so that they would have a measure of temperature on a ratio scale. Again, in order to make ratio statements, such as saying that something is twice or half of another, the variable must be on a ratio scale.
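The point about ratio statements can be checked with a small Python sketch. The two temperatures (10 and 20 degrees Celsius) are illustrative values not taken from the lesson, and the helper `celsius_to_kelvin` is a hypothetical name; the 273.15 offset is the standard Celsius-to-Kelvin conversion.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (0 K is the true zero point)."""
    return c + 273.15

cool_c, warm_c = 10.0, 20.0   # illustrative temperatures (assumption)

# Differences agree on both scales, which is the interval property.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios do not agree: only the scale with a true zero gives a meaningful ratio.
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio on Celsius scale: {ratio_celsius:.3f}")
print(f"ratio on Kelvin scale:  {ratio_kelvin:.3f}")
```

The Celsius ratio suggests "twice as hot", but on the Kelvin scale the warmer temperature is only a few percent higher, which is why ratio statements need a ratio scale.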
</p></li><li><p>Lesson 3 Data Displays</p>
<p>Outline: Frequency distributions; Grouped frequency distributions (class interval and frequency; cumulative frequency; relative percent; cumulative relative percent; interpretations); Histograms/bar graphs</p>
<p>Frequency Distributions</p>
<p>We often form frequency distributions as a way to abbreviate the values we are dealing with in a distribution. With frequency distributions we simply record the frequency, or how many values fall at a particular point on the scale. For example, if I record the number of trips out of town (X) that a sample of FSU students makes, I might end up with the following data:</p>
<p>0 2 5 3 2 4 3 1 0 2 6 0 4 7 0 1 2 4 3 5 4 3 1 6 1 0 5 3</p>
<p>Instead of having a jumbled set of numbers, we can record how many of each value (f) there are for the entire X distribution. Below is a simple frequency distribution where the X column represents the number of trips, and the corresponding value for f indicates how many people in the sample gave us that particular response.</p>
<table>
<tr><th>X</th><th>f</th></tr>
<tr><td>0</td><td>5</td></tr>
<tr><td>1</td><td>4</td></tr>
<tr><td>2</td><td>4</td></tr>
<tr><td>3</td><td>5</td></tr>
<tr><td>4</td><td>4</td></tr>
<tr><td>5</td><td>3</td></tr>
<tr><td>6</td><td>2</td></tr>
<tr><td>7</td><td>1</td></tr>
</table>
<p>From the table we can see that five people took no trips out of town, four people took one trip out of town, four people took two trips out of town, and so on. It is important not to confuse the f value and the X value. The f values are just a count of how many. So you can reverse the process as well. In some examples it might also be helpful to go from a frequency distribution back to the original data set, especially if the table causes confusion.</p></li><li><p>In the following example I start with a frequency distribution and go backward to find all the original values in the distribution.</p>
<table>
<tr><th>X</th><th>f</th></tr>
<tr><td>0</td><td>2</td></tr>
<tr><td>1</td><td>3</td></tr>
<tr><td>2</td><td>4</td></tr>
<tr><td>3</td><td>3</td></tr>
<tr><td>4</td><td>2</td></tr>
</table>
<p>What is the most frequent score? The answer is two, because we have four twos in our distribution: 0 0 1 1 1 2 2 2 2 3 3 3 4 4</p>
<p>Grouped Frequency Distributions</p>
<p>The above examples used discrete measures, but when we measure a variable it is often on a continuous scale.
On a continuous scale, few of the values we measure will fall at exactly the same point. In order to build the frequency distribution we therefore group several values on the scale together and count every measurement we observe in that range toward the frequency. For example, if we measure the running time of rats in a maze we might obtain the following data. Notice that if I tried to count how many values fall at any single point on the scale, my frequencies would all be one.</p>
<p>3.25 3.95 4.61 5.92 6.87 7.12 7.58 8.25 8.69 9.56 9.67 10.24 10.95 10.99 11.34 11.59 12.34 13.45 14.53 14.86</p>
<p>We will begin by forming the class interval. This is the range of values on the scale we include for each interval. There are many rules we could use to determine the size of the interval, but for this course I will always indicate how big the interval should be. In the end, we want to construct a display that has between 5 and 15 intervals. Thus:</p>
<p>Class Interval: 0-2, 3-5, 6-8, 9-11, 12-14</p>
<p>Once we have the class intervals, we will count how many values fall within the range of each interval. Since there is a gap between adjacent class intervals, we will actually be counting any values that would get rounded...</p></li></ul>
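The simple (ungrouped) frequency distributions from earlier in Lesson 3 can be sketched in Python. This is only an illustration using the standard library's `collections.Counter`. The names `trips`, `freq`, `table`, `raw`, and `most_frequent` are illustrative choices, not part of the lesson.

```python
from collections import Counter

# Number of trips out of town reported by each student (data from the lesson).
trips = [0, 2, 5, 3, 2, 4, 3, 1, 0, 2, 6, 0, 4, 7,
         0, 1, 2, 4, 3, 5, 4, 3, 1, 6, 1, 0, 5, 3]

# Forward direction: raw data -> frequency distribution (X value -> f).
freq = Counter(trips)
print("X  f")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")

# Backward direction: frequency distribution -> original values.
# This is the second table from the lesson.
table = {0: 2, 1: 3, 2: 4, 3: 3, 4: 2}
raw = [x for x, f in sorted(table.items()) for _ in range(f)]
print(raw)

# The most frequent score is the X value with the largest f.
most_frequent = max(table, key=table.get)
print("most frequent score:", most_frequent)
```

Keeping X as the dictionary key and f as the value mirrors the lesson's warning not to confuse the two columns.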