Download - Statistics in Psychology for IGNOU students

IGNOU MAPC material © 2015, M S Ahluwalia Psychology Learners

MPC006/ASST/TMA/2014-15

IGNOU Assignment

http://psychologylearners.blogspot.in/search/label/IGNOU MAPC

http://msahluwalia.blogspot.com/

http://psychologylearners.blogspot.com/


Statistics in Psychology

Solved Assignment - MAPC





1000 words

Section A






Explain in detail about Normal Probability Curve with suitable diagrams.

Q1.






Normal Probability Curve


A1

A normal probability curve shows the theoretical shape of a normally distributed histogram. Its shape is determined by two parameters: the mean (µ) and the standard deviation (σ). It is based upon the law of probability discovered by the French mathematician Abraham de Moivre (1667 – 1754).

The normal curve offers a convenient and reasonably accurate description of a great number of variables. It also describes the distribution of many statistics from samples. For example, if you drew 100 random samples from a population of teenagers and computed the mean weight of each sample, you would find that the distribution of the 100 means approximates the normal curve. In such situations, the fit of the normal curve is often very good indeed.

Normal curve in Psychology

Sir Francis Galton (cousin of Darwin) began the first serious investigation of "individual differences," an important area of study today in education and psychology. In his research on how people differ from one another on various mental and physical traits, Galton found the normal curve to be a reasonably good description in many instances. He became greatly impressed with its applicability to natural phenomena. He referred to the normal curve as the Law of Frequency of Error.

Properties of the Normal Curve

The normal curve is a theoretical invention, a mathematical model, an idealized conception of the form a distribution might take under certain circumstances. No empirical distribution (one based on actual data) ever conforms perfectly to the normal curve. However, empirical distributions often offer a reasonable approximation of the normal curve. In these instances, it is quite acceptable to say that the data are "normally distributed."





Properties of the Normal Curve contd.


A1

The equation of the normal curve describes a family of distributions. Normal curves may differ from one another in their means and standard deviations. However, they are all members of the normal curve family because they share several properties, as discussed below:

1. Symmetrical about the mean: A normal curve is symmetrical about its mean, i.e., the left half of the distribution is a mirror image of the right half. The two halves have identical size, shape and slope.

2. Unimodal: The curve is unimodal, i.e., there is a single peak and the highest point occurs at x = µ. It follows from these first two properties that the mean, median, and mode all have the same value.

3. Bell-shaped: Normal curves have a bell-shaped form. Starting at the center of the curve and working outward, the height of the curve descends gradually at first, then faster, and finally more slowly. The maximum ordinate occurs at the center, and the height of the curve declines symmetrically in both directions. This property alone illustrates why an empirical distribution can never be perfectly normal.

4. Points of inflection: The curve has inflection points at x = µ - σ and x = µ + σ. At these points the curve changes between convex and concave.

5. The Empirical Rule: Approximately 68% of the area under the normal curve is between x=µ - σ and x= µ + σ. Approximately 95% of the area under the normal curve is between x=µ - 2σ and x= µ + 2σ. Approximately 99.7% of the area under the normal curve is between x=µ - 3σ and x= µ + 3σ.

6. Total area =1: The area under the curve is 1. The area under the curve is considered to be 100 percent probability.

7. Area of each side = ½: The area under the curve to the right of µ equals the area under the curve to the left of µ which equals ½. Therefore, the curve can also be said to be bilateral.

8. Asymptotic: As x increases without bound (gets larger and larger), the graph approaches, but never reaches, the horizontal axis. As x decreases without bound (gets larger and larger in the negative direction), the graph approaches, but never reaches, the horizontal axis. Thus, it can be said that the curve is asymptotic.

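9. Equation: The equation of the normal curve reads:

$$y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^{2}}{2\sigma^{2}}}$$

where y is the height of the curve at a given point, x is the deviation of a score from the mean, N is the total number of cases, and σ is the standard deviation.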
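The empirical rule in property 5 can be checked numerically. The following is a minimal Python sketch, not part of the original answer; the function name `normal_area_within` is an illustrative choice, and it relies on the standard identity that the area within k standard deviations of the mean equals erf(k/√2).

```python
import math


def normal_area_within(k: float) -> float:
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    # For any normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma of the mean: {normal_area_within(k):.4f}")
```

The three printed areas correspond to the "approximately 68%, 95% and 99.7%" figures quoted in the empirical rule.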





Application of normal curve using z-scores


A1

The relationship between normal curve area and standard deviation units can be put to good use for answering certain questions that are fundamental to statistical reasoning. For example, the following type of question occurs frequently in statistical work: Given a normal distribution with a mean of 100 and a standard deviation of 15, what proportion of cases fall above the score 115?

A standard score expresses a score's position in relation to the mean of the distribution, using the standard deviation as the unit of measurement. The z score is one type of standard score: it states how many standard deviation units the original score lies above or below the mean of its distribution. A z score is simply the deviation score divided by the standard deviation, as the following formula illustrates:

$$z = \frac{x - \mu}{\sigma}$$

Normal curve tables can be used for calculations. Following are some types of problems that can be solved:

1. Finding the area under the curve when the score is known: helps identify the probability of occurrence of a certain score.

2. Finding scores when the area is known: helps to identify which score is expected to occur.

3. Comparing scores from different distributions: converting all scores to z scores gives a distribution with a mean of 0 and a standard deviation of 1, regardless of the distribution's original mean and standard deviation. If the original distribution is normal, the result is the standard normal distribution. Scores expressed as z scores can therefore be compared across different distributions.
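The example question above (mean 100, standard deviation 15, proportion of cases above 115) can be worked through as a short Python sketch. The helper names `z_score` and `area_above` are illustrative assumptions, and the tail area is computed from the complementary error function rather than read from a printed normal curve table.

```python
import math


def z_score(x: float, mean: float, sd: float) -> float:
    """Deviation of x from the mean, in standard deviation units."""
    return (x - mean) / sd


def area_above(z: float) -> float:
    """Upper-tail area of the standard normal curve beyond z."""
    # 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


z = z_score(115, mean=100, sd=15)
p = area_above(z)
print(f"z = {z:.2f}, proportion above 115 = {p:.4f}")
```

A score of 115 lies exactly one standard deviation above the mean, so about 16% of cases fall above it.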





Summary and Sources


A1

The normal curve offers a convenient and reasonably accurate description of a great number of variables. It also describes the distribution of many statistics from samples and is therefore very useful in social sciences such as psychology. The major properties of the normal curve are that it is symmetrical about the mean, unimodal and bell-shaped; it has points of inflection at µ - σ and µ + σ; it follows the empirical rule; its total area is 1, with half the area on each side of the mean; it is asymptotic; and it has a standard equation.

Sources:
http://www.dmaictools.com/dmaic-measure/normal-probability-curve
Statistics in Psychology and Education, Henry E. Garrett (http://www.amazon.com/STATISTICS-PSYCHOLOGY-EDUCATION-Henry-Garrett/dp/B000NVV5V4/)
Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke (eBook: http://psychologylearners.blogspot.com/2015/07/fundamentals-of-statistical-reasoning.html)








Numerical - ANOVA

Q2.






Solution


A2

A study was conducted in which the teachers were asked to rate students for a particular trait on a ten point scale. With the help of data given below find out whether significant difference exists in the rating of the students by the teachers.

Step 1: For the given problem, Null Hypothesis, Ho is that no significant difference exists in the rating of the students by the teachers. Numerically,

Ho → µx = µy = µz

Number of groups = k = 3
Number of cases in each group = n = 7
Number of units in total = N = 21

Alternative Hypothesis, H1, is that there is a significant difference in the rating of the students by the teachers.

Since the level of significance is not given, we will assume α = 0.05 and also test at α = 0.01.





Solution contd.


A2

Step 2: Sample – already given in the table below; sampling is not required.

S. No.   Group X        Group Y        Group Z
         X      X²      Y      Y²      Z      Z²
1        7      49      5      25      6      36
2        3      9       5      25      3      9
3        7      49      9      81      7      49
4        1      1       4      16      1      1
5        5      25      3      9       3      9
6        5      25      5      25      5      25
7        5      25      4      16      3      9
Total    33     183     35     197     28     138

Step 3: Calculation of necessary statistical measures:

Mean X = 33/7 = 4.71
Mean Y = 35/7 = 5
Mean Z = 28/7 = 4

(i) Correction term:

$$C_x = \frac{(\sum x_i)^2}{N} = \frac{(33 + 35 + 28)^2}{21} = \frac{9216}{21} = 438.86$$





Solution contd.


A2

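(ii) Total sum of squares and degrees of freedom:

$$SS_{total} = \sum x_i^2 - C_x = \sum X^2 + \sum Y^2 + \sum Z^2 - C_x$$

= 183 + 197 + 138 - 438.86 = 79.14

Degrees of freedom = N - 1 = 21 - 1 = 20

(iii) Between-group sum of squares, degrees of freedom and variance estimate:

$$SS_{between} = \sum \frac{(\sum x_i)^2}{n} - C_x = \frac{(\sum X)^2 + (\sum Y)^2 + (\sum Z)^2}{n} - C_x$$

= (33² + 35² + 28²)/7 - 438.86 = 442.57 - 438.86 = 3.71

where n = 7 is the number of cases in each group.

Degrees of freedom = k - 1 = 3 - 1 = 2

Mean square (variance estimate):

$$s^2_{between} = \frac{SS_{between}}{df_{between}} = \frac{3.71}{2} = 1.855$$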





Solution contd.


A2

(iv) Within-group sum of squares, degrees of freedom and variance estimate:

$$SS_{within} = SS_{total} - SS_{between} = 79.14 - 3.71 = 75.43$$

Degrees of freedom = N - k = 21 - 3 = 18

Mean square (variance estimate):

$$s^2_{within} = \frac{SS_{within}}{df_{within}} = \frac{75.43}{18} = 4.19$$

(v) F ratio:

$$F = \frac{s^2_{between}}{s^2_{within}} = \frac{1.855}{4.19} = 0.44$$

Summary of ANOVA

Source of variance   df    SS      s²      F ratio
Between groups       2     3.71    1.855   0.44
Within groups        18    75.43   4.19
Total                20    79.14





Solution contd.


A2

Step 4: Identify the region of rejection

With df (between) = 2 and df (within) = 18, the critical value of F as per the F tables is 3.55 at α = 0.05 and 6.01 at α = 0.01. These are the values of F beyond which the most extreme 5% and 1% of sample outcomes, respectively, fall if Ho is true.

Step 5: Making the statistical decision and forming conclusions

The F ratio we calculated (0.44) is lower than the critical values of F at both α = 0.05 and α = 0.01, so the obtained F ratio is not significant at either level. Therefore, we retain the null hypothesis, Ho, i.e.,

µx = µy = µz

No significant difference exists in the rating of the students by the teachers.
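The hand calculation above can be cross-checked with a short Python sketch that follows the same steps (correction term, total, between-group and within-group sums of squares, then F). The function name `one_way_anova` and the dictionary keys are illustrative choices. Because the code does not round intermediate values, the between-groups mean square may differ slightly in the third decimal from the hand-rounded figure.

```python
# Ratings taken from the data table in the solution.
groups = {
    "X": [7, 3, 7, 1, 5, 5, 5],
    "Y": [5, 5, 9, 4, 3, 5, 4],
    "Z": [6, 3, 7, 1, 3, 5, 3],
}


def one_way_anova(groups):
    """Sums of squares, mean squares and F ratio for a one-way ANOVA."""
    scores = [s for g in groups.values() for s in g]
    n_total = len(scores)                      # N
    k = len(groups)                            # number of groups
    correction = sum(scores) ** 2 / n_total    # C = (sum of all scores)^2 / N
    ss_total = sum(s * s for s in scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "SS_total": ss_total,
        "SS_between": ss_between,
        "SS_within": ss_within,
        "MS_between": ms_between,
        "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The resulting F ratio is then compared with the tabled critical values, exactly as in Steps 4 and 5.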





Numerical – Mann Whitney U Test

Q3.






Solution


A3

3. Test the hypothesis of no difference between the groups by using the Mann-Whitney U test with the help of the following data:

Scores obtained by educated women on attitude towards health

59, 60, 61, 64, 63, 51, 52, 55, 53, 57, 56, 54, 52, 64, 56, 54, 58, 56, 62, 60, 57

Scores obtained by uneducated women on attitude towards health

53, 63, 63, 58, 60, 62, 66, 65, 64, 68

Solution:

Null Hypothesis, H0: the two groups of women (educated and uneducated) do not differ systematically from each other in their attitude towards health (as they were randomly selected from the same population).

Alternative Hypothesis, H1: the two groups of women (educated and uneducated) differ systematically from each other in their attitude towards health.

Step 1: Rank the data taking both groups together (see the tables that follow).

Step 2: Find the sum of ranks for the smaller sample (see table): $R_u = 223.5$





Solution contd.


A3

Step 3: Find the sum of ranks for the larger sample (see table): $R_e = 272.5$

Educated women

Score   Rank    Score   Rank
59      16      54      6.5
60      18      52      2.5
61      20      64      27
64      27      56      10
63      24      54      6.5
51      1       58      14.5
52      2.5     56      10
55      8       62      21.5
53      4.5     60      18
57      12.5    57      12.5
56      10

Uneducated women

Score   Rank
53      4.5
63      24
63      24
58      14.5
60      18
62      21.5
66      30
65      29
64      27
68      31

Step 4: Calculate U for both groups:

$$U_e = n_e n_u + \frac{n_e(n_e + 1)}{2} - R_e = (21)(10) + \frac{21(21 + 1)}{2} - 272.5 = 210 + 231 - 272.5 = 168.5$$

$$U_u = n_e n_u + \frac{n_u(n_u + 1)}{2} - R_u = (21)(10) + \frac{10(10 + 1)}{2} - 223.5 = 210 + 55 - 223.5 = 41.5$$





Solution contd.


A3

Step 5: Determine the significance of U

In this case the direction of the alternative hypothesis is not important because we are only checking whether there is a difference; therefore, we are making a two-tailed decision. The critical value of U for ne = 21 and nu = 10, for a two-tailed decision with α = 0.05, is not given in the table. This is because as the n for one of the groups increases beyond 20, the sampling distribution of U can be treated as normal. Therefore, we need to perform a z test as follows:

$$z = \frac{U_e - \frac{n_e n_u}{2}}{\sqrt{\frac{n_e n_u (n_e + n_u + 1)}{12}}} = \frac{168.5 - \frac{(21)(10)}{2}}{\sqrt{\frac{(21)(10)(21 + 10 + 1)}{12}}} = \frac{63.5}{\sqrt{560}} = 2.68$$

Step 6: At α = 0.05, the critical z value is 1.96 for a two-tailed test. Since 2.68 > 1.96, the obtained z is significant, and we reject the null hypothesis. Since we reject the null hypothesis, we accept the alternative hypothesis: the two groups of women (educated and uneducated) differ systematically from each other in their attitude towards health.
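The ranking and the U and z calculations above can be reproduced with the following Python sketch. It is a plain re-implementation of the steps in the solution, with illustrative names such as `average_ranks`; like the hand calculation, it gives tied scores the average of their ranks and does not apply a tie correction to the standard deviation of U.

```python
import math

educated = [59, 60, 61, 64, 63, 51, 52, 55, 53, 57, 56,
            54, 52, 64, 56, 54, 58, 56, 62, 60, 57]
uneducated = [53, 63, 63, 58, 60, 62, 66, 65, 64, 68]


def average_ranks(values):
    """Map each distinct value to its rank, averaging ranks over ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy positions i+1 .. j; they share the mean position.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


ranks = average_ranks(educated + uneducated)
R_e = sum(ranks[s] for s in educated)
R_u = sum(ranks[s] for s in uneducated)

n_e, n_u = len(educated), len(uneducated)
U_e = n_e * n_u + n_e * (n_e + 1) / 2 - R_e
U_u = n_e * n_u + n_u * (n_u + 1) / 2 - R_u

# Normal approximation to the sampling distribution of U.
z = (U_e - n_e * n_u / 2) / math.sqrt(n_e * n_u * (n_e + n_u + 1) / 12)

print(f"R_e = {R_e}, R_u = {R_u}")
print(f"U_e = {U_e}, U_u = {U_u}")
print(f"z = {z:.2f}")
```

A quick consistency check is that U_e + U_u equals n_e × n_u (here 210).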





400 words

Section B






Numerical – Product-Moment correlation

Q4.






Solution


A4

4. Calculate the correlation coefficient between the following sets of scores using Pearson's Product Moment method.

Individuals   A    B    C    D    E    F    G    H    I    J
Test A        1    6    7    3    11   9    7    11   14   11
Test B        2    3    5    6    6    8    10   10   12   13

Step 1: Organize the data into a table, with the test scores in columns X and Y (see table).

Step 2: Compute the means of X and Y:

$$\bar{X} = \frac{\sum X}{n} = \frac{80}{10} = 8 \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{75}{10} = 7.5$$

Step 3: Compute the deviation of each score from its mean (see table).

Step 4: Calculate the squares of the deviations. Then calculate the sums of these squared deviations (see table).





Solution contd.


A4

Subject   Test A (X)   Test B (Y)   X - MeanX   Y - MeanY   (X - MeanX)²   (Y - MeanY)²   (X - MeanX) x (Y - MeanY)
A         1            2            -7          -5.5        49             30.25          38.5
B         6            3            -2          -4.5        4              20.25          9
C         7            5            -1          -2.5        1              6.25           2.5
D         3            6            -5          -1.5        25             2.25           7.5
E         11           6            3           -1.5        9              2.25           -4.5
F         9            8            1           0.5         1              0.25           0.5
G         7            10           -1          2.5         1              6.25           -2.5
H         11           10           3           2.5         9              6.25           7.5
I         14           12           6           4.5         36             20.25          27
J         11           13           3           5.5         9              30.25          16.5
Total     80           75                                   144            124.5          102

Step 5: Calculate standard deviations for X and Y:

$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{144}{10}} = 3.79$$

$$S_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}} = \sqrt{\frac{124.5}{10}} = 3.53$$





Solution contd.


A4

Step 6: Pearson's Product-Moment correlation coefficient, r:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n\, S_X S_Y} = \frac{102}{(10)(3.79)(3.53)} = 0.762$$

This indicates a high positive correlation between the scores of the two tests. A student who scores well on one test can be expected to score high on the other test too.
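The deviation-score method used in Steps 1 to 6 can be sketched in Python as follows. The function name `pearson_r` is an illustrative choice; the formula is algebraically the same as dividing the sum of cross-products by n·S_X·S_Y, but it avoids rounding the standard deviations first.

```python
import math

test_a = [1, 6, 7, 3, 11, 9, 7, 11, 14, 11]
test_b = [2, 3, 5, 6, 6, 8, 10, 10, 12, 13]


def pearson_r(x, y):
    """Pearson product-moment correlation from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # sum_products / (n * S_X * S_Y) simplifies to this expression.
    return sum_products / math.sqrt(ss_x * ss_y)


r = pearson_r(test_a, test_b)
print(f"r = {r:.3f}")
```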





Discuss in detail the measures of central tendency with suitable example.

Q5.






Measures of central tendency


A5

Central tendency refers to the descriptive statistic that best represents the center of a data set, the particular value that all the other data seem to be gathering around. The key measures of central tendency are:

Mean, the Arithmetic Average

The most commonly reported measure of central tendency is the mean, the arithmetic average of a group of scores. It is calculated by summing all the scores in a data set and then dividing this sum by the total number of scores. For example, if we explore the numbers of top finishes that countries had in World Cup soccer tournaments, the mean would be calculated by first adding the number of top finishes for each country, then dividing by the total number of countries. For the 14 countries that had at least 1 top finish: 4 + 8 + 1 + 2 + 1 + 2 + 2 + 6 + 2 + 2 + 2 + 2 + 2 + 10 = 46. In this case, we divide 46, the sum of all scores, by 14, the number of scores in this sample: 46/14 = 3.29.

Visual Representations: The mean is the point that perfectly balances the two sides of a distribution.

Formula: Mean = Sum of the values / Number of values

Median, the Middle Score

The second most common measure of central tendency is the median. It is the middle score of all the scores in a sample when the scores are arranged in ascending order. We can think of the median as the 50th percentile. Example: the top finishes for 13 of the 14 countries in the World Cup example are: 4, 8, 1, 2, 1, 2, 2, 6, 2, 2, 2, 2, 10





Measures of central tendency contd.


A5

To determine the median, follow these steps:

Step 1: Arrange the scores in ascending order: 1, 1, 2, 2, 2, 2, 2, 2, 2, 4, 6, 8, 10

Step 2: Find the middle score. (With an even number of scores, there will be no actual middle score. In that case, take the mean of the two middle scores.) There are 13 scores: 13/2 = 6.5. If we add 0.5 to this result, we get 7. Therefore, the median is the 7th score. We now count across to the 7th score. The median is 2.

Formula: Median = the ((n + 1)/2)th ranked value

Mode, the Most Common Score

The mode is the most common score of all the scores in a sample. It is readily picked out on a frequency table, histogram, or frequency polygon. Example: Determine the mode for the World Cup data for the 14 countries. The mode can be found either by searching the list of numbers for the most common score or by constructing a frequency table: 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 4, 6, 8, 10. Mode = 2.

When a distribution of scores has one mode, we refer to it as unimodal. When a distribution has two modes, we call it bimodal. When a distribution has more than two modes, we call it multimodal.
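The three measures can be verified for the World Cup example with Python's standard `statistics` module. This is a small sketch using the scores quoted above; the variable names are illustrative.

```python
import statistics

# Top finishes for the 14 countries (mean and mode examples).
top_finishes_14 = [4, 8, 1, 2, 1, 2, 2, 6, 2, 2, 2, 2, 2, 10]
# Top finishes for 13 of the 14 countries (median example).
top_finishes_13 = [4, 8, 1, 2, 1, 2, 2, 6, 2, 2, 2, 2, 10]

mean_value = statistics.mean(top_finishes_14)      # sum of scores / number of scores
median_value = statistics.median(top_finishes_13)  # middle score after sorting
mode_value = statistics.mode(top_finishes_14)      # most common score

print(f"mean = {mean_value:.2f}, median = {median_value}, mode = {mode_value}")
```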





Summary and Sources


A5

To summarize, central tendency refers to the "typical" score of a group of scores. Mean, median and mode are the three measures of central tendency most commonly used.

Sources:
Statistics for the Behavioral Sciences, Susan A. Nolan and Thomas Heinzen (http://www.amazon.com/Statistics-Behavioral-Sciences-Susan-Nolan/dp/142923265X/)


Discuss the procedure of ANOVA.

Q6.






Procedure of ANOVA


A6

In analysis of variance, a continuous response variable, known as a dependent variable, is measured under experimental conditions identified by classification variables, known as independent variables. The variation in the response is assumed to be due to effects in the classification, with random error accounting for the remaining variation. The steps below explain the procedure of ANOVA.

The general logic of analysis of variance is the same as that of significance tests. That is, you assume H0 to be true and then determine whether the obtained sample result is rare enough to raise doubts about H0. To do this, convert the sample result into a test statistic (F, in this case). Then locate it in the theoretical sampling distribution (the F distribution). If the test statistic falls in the region of rejection, H0 is rejected; if not, H0 is retained. The only new twist is that if H0 is rejected, follow-up testing is required to identify the specific source(s) of significance.

Step 1 Formulate the statistical hypotheses and select a level of significance. Ex: Let's assume the statistical hypotheses to be:

Ho → µ1 = µ2 = µ3

Let α = .05 be the level of significance.

Step 2 Determine the desired sample size and select the sample.

Step 3 Calculate the necessary sample statistics.

(i) Correction term:

$$C_x = \frac{(\sum x_i)^2}{N}$$





Procedure of ANOVA contd.


A6

(ii) Total sum of squares and degrees of freedom:

$$SS_{total} = \sum x_i^2 - C_x = \sum X^2 + \sum Y^2 + \sum Z^2 - C_x$$

Degrees of freedom = N - 1

(iii) Between-groups sum of squares, degrees of freedom, and variance estimate:

$$SS_{between} = \sum \frac{(\sum x_i)^2}{n} - C_x = \frac{(\sum X)^2 + (\sum Y)^2 + (\sum Z)^2}{n} - C_x$$

where n is the number of cases in each group.

Degrees of freedom = k - 1

Mean square (variance estimate):

$$s^2_{between} = \frac{SS_{between}}{df_{between}}$$

(iv) Within-groups sum of squares, degrees of freedom, and variance estimate:

$$SS_{within} = SS_{total} - SS_{between}$$

Degrees of freedom = N - k

Mean square (variance estimate):

$$s^2_{within} = \frac{SS_{within}}{df_{within}}$$







A6

(v) F ratio:

$$F = \frac{s^2_{between}}{s^2_{within}}$$

Summary of ANOVA

Source of variance   df      SS          s²                  F ratio
Between groups       k - 1   SSbetween   SSbetween / (k-1)   s²between / s²within
Within groups        N - k   SSwithin    SSwithin / (N-k)
Total                N - 1   SStotal

Step 4 Identify the region of rejection. Let's say dfbetween = 2 and dfwithin = 6. The critical value of F is 5.14 (using tables). This is the value of F beyond which the most extreme 5% of sample outcomes will fall when H0 is true.

Step 5 Make the statistical decision and form conclusions. If the sample F ratio falls in the rejection region (e.g., F = 7.75 > 5.14), we reject the null hypothesis. The overall F ratio is then statistically significant (α = .05), so we conclude that the population means differ in some way. Next, conduct post hoc comparisons to determine the specific source(s) of the statistical significance.







A6

Step 6 Conduct Tukey's HSD test.

• Calculate HSD:

$$HSD = q\sqrt{\frac{s^2_{within}}{n_{group}}}$$

where q is the Studentized range statistic obtained from tables.

• Compare HSD with each difference between sample means.

• Make the statistical decisions and form conclusions. Ex: Group 1 is significantly different from Groups 2 and 3; the difference between Groups 2 and 3 is not significant.

* * *

Above is the procedure of Analysis of Variance.

Sources:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/statug_anova_sect001.htm
Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke









Numerical – Regression equations

Q7.






Solution


A7

7. With the help of the following data, determine both the regression equations.

Psychology (X): Mean = 30, Standard deviation = 1.6
Sociology (Y): Mean = 25, Standard deviation = 1.7
Coefficient of correlation = 0.95

Given: MeanX = 30, σX = 1.6, MeanY = 25, σY = 1.7, r = 0.95

The regression equations in score form (as per Garrett, p.158) are:

1. Line of regression of y on x:

$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

y - 25 = 0.95 × (1.7/1.6) × (x - 30)
y = 1.009375x - 5.28125
y = 1.01x - 5.28

2. Line of regression of x on y:

$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

x - 30 = 0.95 × (1.6/1.7) × (y - 25)
x = 0.89y + 7.65
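Both regression lines can be obtained from the summary statistics with a few lines of Python. This is a sketch under the score-form equations given above; the function name `regression_lines` and its return layout are illustrative assumptions.

```python
def regression_lines(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) for y on x and for x on y."""
    # y on x: y = b_yx * x + a_yx, with b_yx = r * sd_y / sd_x
    b_yx = r * sd_y / sd_x
    a_yx = mean_y - b_yx * mean_x
    # x on y: x = b_xy * y + a_xy, with b_xy = r * sd_x / sd_y
    b_xy = r * sd_x / sd_y
    a_xy = mean_x - b_xy * mean_y
    return (b_yx, a_yx), (b_xy, a_xy)


(b_yx, a_yx), (b_xy, a_xy) = regression_lines(
    mean_x=30, sd_x=1.6, mean_y=25, sd_y=1.7, r=0.95
)
print(f"y = {b_yx:.2f}x + ({a_yx:.2f})")
print(f"x = {b_xy:.2f}y + ({a_xy:.2f})")
```

Note that the product of the two slopes equals r², which is a convenient check on the arithmetic.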





Numerical – Chi square

Q8.






Solution


A8

8. A research study was carried out in which students of mathematics and literature were asked to express their preference for the lecture method or the group discussion method. The data obtained are given below. Find out whether the preference for a method is dependent on the subject taken by the students.

Students      Preference for Lecture Method   Preference for Group Discussion Method
Mathematics   3                               12
Literature    5                               10

H0, Null Hypothesis: Preference for a method is not dependent on the subject taken by students.

H1, Alternative Hypothesis: Preference for a method is dependent on the subject taken by students.

Step 1: Add numbers across columns and rows:

Students      Lecture Method   Group Discussion Method   Total
Mathematics   3                12                        15
Literature    5                10                        15
Total         8                22                        30





Solution contd.


A8

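Step 2: Calculate expected numbers for each cell:

$$E = \frac{\text{Row Sum} \times \text{Column Sum}}{\text{Grand Total}}$$

Students      Lecture Method   Group Discussion Method
Mathematics   4                11
Literature    4                11

Step 3: Calculate Chi square for each cell:

$$\chi^2 = \frac{(O - E)^2}{E}$$

Students      Lecture Method   Group Discussion Method
Mathematics   0.25             0.091
Literature    0.25             0.091

Step 4: Calculate total Chi square:

$$\chi^2_{total} = 0.25 + 0.25 + 0.091 + 0.091 = 0.682$$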





Solution contd.


A8

Step 5: Calculate degrees of freedom, df:

df = (number of rows - 1) x (number of columns - 1) = (2 - 1) x (2 - 1) = 1

Step 6: Check the critical value of Chi-square for df = 1 and α = 0.05: χ2 critical = 3.841

χ2 total (0.682) < χ2 critical (3.841)

Step 7: Decision and Interpretation

Because χ2 total < χ2 critical, the null hypothesis is retained. Therefore, the preference for a method is not dependent on the subject taken by the students.
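Steps 1 to 5 of the chi-square calculation can be sketched in Python as below. The function name `chi_square_independence` is an illustrative choice; it computes the same uncorrected statistic as the hand calculation (no continuity correction is applied), and the critical value still has to be looked up in a chi-square table.

```python
# Observed frequencies: rows = Mathematics, Literature;
# columns = Lecture Method, Group Discussion Method.
observed = [[3, 12],
            [5, 10]]


def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```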





50 words

Section C






Hypothesis


A9

A hypothesis is a tentative statement about the relationship between two or more variables. A hypothesis is a specific, testable prediction about what you expect to happen in your study. For example, a study designed to look at the relationship between sleep deprivation and test performance might have a hypothesis that states, "This study is designed to assess the hypothesis that sleep deprived people will perform worse on a test than individuals who are not sleep deprived."

Hypotheses are of two types:

1. The null hypothesis, H0, plays a central role in statistical hypothesis testing: it is the hypothesis that is assumed to be true and formally tested, it is the hypothesis that determines the sampling distribution to be employed, and it is the hypothesis about which the final decision to "reject" or "retain" is made.

2. The alternative hypothesis, H1, specifies the alternative population condition that is "supported" or "asserted" upon rejection of H0. H1 typically reflects the underlying research hypothesis of the investigator. If the interest is only in one direction, it is called a directional alternative hypothesis. If the interest is in both directions, it is called a non-directional alternative hypothesis.

* * *

Sources:
http://psychology.about.com/od/hindex/g/hypothesis.htm
Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke









Level of Significance (α)


A10

The level of significance, α, specifies how rare the sample result must be in order to reject H0 as untenable. It is a probability (typically .05, .01, or .001) based on the assumption that H0 is true. For a non-directional, two-tailed H1, the probability, let's say .05, is split evenly between the two tails (2.5% on each side). The regions in the two tails are called regions of rejection (or critical regions), for if the sample mean falls in either, H0 is rejected as untenable. The critical values of z separate the regions of rejection from the middle region of retention.

There are two ways to evaluate the tenability of H0:

1. Compare the p value to α (e.g., .0278 < .05)

2. Compare the calculated z ratio to its critical value (e.g., +2.20 > +1.96)

Because both p (i.e., area) and the calculated z reflect the location of the sample mean relative to the region of rejection, the conclusion regarding H0 will be the same. α gives the probability of rejecting H0 when it is true, which is a Type I error.

* * *

Sources: Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke
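The two ways of evaluating H0 described above lead to the same decision, which can be illustrated with the figures quoted in the text (z = 2.20, α = .05). The following Python sketch is illustrative only; the function name `two_tailed_p` is an assumption, and the p value is computed from the complementary error function instead of a normal curve table.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p value for a z ratio under the standard normal curve."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


alpha = 0.05
z_obtained = 2.20
z_critical = 1.96  # two-tailed critical value at alpha = .05

p = two_tailed_p(z_obtained)
reject_by_p = p < alpha                      # method 1: compare p with alpha
reject_by_z = abs(z_obtained) > z_critical   # method 2: compare z with its critical value

print(f"p = {p:.4f}, reject by p: {reject_by_p}, reject by z: {reject_by_z}")
```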













The Histogram


A11

When the concept of a bar chart is generalized to quantitative data, we get a histogram. The histogram comprises a series of bars of uniform width, each one representing the frequency associated with a particular class interval. As with the bar chart, either absolute or relative frequencies may be used on the vertical axis of a histogram, as long as the axis is labeled accordingly. Unlike the bar chart, the bars of a histogram are contiguous (their boundaries touch) to capture the quantitative nature of the data. (The exception occurs when an ordinal variable is graphed.) Values along the horizontal axis, the class intervals, are ordered left to right from the smallest to the largest.

The histogram also communicates the underlying shape of the distribution, e.g., more scores at the upper end and fewer at the lower end. Although the latter observation can also be made from a frequency table, such observations are more immediate with a well-constructed histogram.

* * *

Sources: Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke












Point estimation


A12

A point estimate is a single value (a "point") taken from a sample and used to estimate the corresponding parameter in the population. A statistic is an estimate of a parameter: X̄ estimates µ, s estimates σ, s² estimates σ², r estimates ρ, and P estimates π.

Opinion polls offer the most familiar example of a point estimate. When, on the eve of a presidential election, you hear on CNN that 55% of voters prefer Candidate X (based on a random sample of likely voters), you have been given a point estimate of voter preference in the population.

Point estimates should not be stated alone. That is, they should not be reported without some allowance for error due to sampling variation. Without additional information, it cannot be known whether a point estimate is likely to be fairly close to the mark (the parameter) or has a good chance of being far off.

* * *

Sources: Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke












Type I error


A13

The decision to reject or retain H0 depends on the announced level of significance, α. α is a statement of the risk that the researcher is willing to assume in making a decision about H0.

When H0 is true (µ0 = µtrue) and α = .05, 5% of all possible sample means nevertheless will lead to the conclusion that H0 is false, because 5% of the sample means fall in the "rejection" region of the sampling distribution, even though these extreme means will occur (though rarely) when H0 is true. Thus, there is a probability of .05 that H0 will be rejected when it is actually true.

Rejecting a true H0 is a decision error known as a Type I error, and the level of significance, α, gives the probability of making it. A Type I error is getting statistically significant results "when you shouldn't." To reduce the risk of making such an error, the researcher can set α at a lower level.

* * *

Sources: Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke












The Point-Biserial Correlation


A14

It is a special case of correlation in which one of the variables has only two possible values, and these values represent different groups. For instance, it is possible to find the correlation between height and gender. At first, this may seem impossible, because gender is not quantifiable, and you need numbers for both variables to calculate r. However, you can arbitrarily assign two different numbers to the two different groups and then calculate the correlation. It doesn't matter what two numbers you assign: you will get the same r if you use 1 and 2, or 3 and 17.

The r you get, which is called the point-biserial r (symbolized rpb), is meaningful. Suppose you assign 1 to females and 2 to males and correlate these gender numbers with their heights. In this case, r will measure the tendency for the heights to get larger as the gender number gets larger (i.e., goes from 1 to 2). If we assign the larger gender number to females, the sign of r will reverse, which is why the sign of rpb is usually ignored.

* * *

Sources: Essentials of Statistics for the Social and Behavioral Sciences, Barry H. Cohen and R. Brooke Lea (http://psychologylearners.blogspot.com/2015/07/essentials-of-statistics-for-social-and.html)
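The claim that the choice of group codes does not change the size of rpb can be demonstrated with a small Python sketch. The heights below are hypothetical values invented purely for illustration (they do not come from the source), and the helper names `pearson_r` and `r_pb` are likewise assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sum_products / math.sqrt(ss_x * ss_y)


# Hypothetical data, for illustration only.
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
height_cm = [158, 162, 165, 170, 168, 174, 178, 183]


def r_pb(codes):
    """Point-biserial r: Pearson r between group codes and heights."""
    return pearson_r([codes[g] for g in gender], height_cm)


r_12 = r_pb({"F": 1, "M": 2})
r_317 = r_pb({"F": 3, "M": 17})
r_reversed = r_pb({"F": 2, "M": 1})

print(f"codes 1/2: {r_12:.3f}, codes 3/17: {r_317:.3f}, reversed: {r_reversed:.3f}")
```

The first two coefficients agree, and reversing which group gets the larger code only flips the sign.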


Degrees of Freedom


A15

The concept of degrees of freedom is central to the principle of estimating statistics of populations from samples of them. "Degrees of freedom" is commonly abbreviated to df. df is a mathematical restriction that needs to be put in place when estimating one statistic from an estimate of another.

Ex 1: Imagine you have four numbers (a, b, c and d) that must add up to a total of m; you are free to choose the first three numbers at random, but the fourth must be chosen so that it makes the total equal to m. Thus your degrees of freedom are three.

Ex 2: Take data that have been drawn at random from a normal distribution. In order to estimate the standard deviation (sigma), we must first estimate the mean (mu). Thus, mu is replaced by x-bar in the formula for sigma, i.e., we work with the deviations from mu estimated by the deviations from x-bar. At this point, we need to apply the restriction that the deviations must sum to zero. Thus, the degrees of freedom are n - 1 in the equation for s below:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

* * *

Sources: http://www.statsdirect.com/help/content/basics/degrees_freedom.htm


Variance


A16

The variance, denoted with the symbol S², is the mean of the squared deviation scores:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n}$$

Because the variance is responsive to the value of each score in a distribution, the variance uncovers differences in variability that less sophisticated measures of variability (e.g., range) do not. The numerator of this expression, the sum of the squared deviations from the mean, has its own abbreviation; it is known as the sum of squares, or SS.

The variance finds its greatest use in more advanced statistical procedures, particularly in statistical inference. The calculated value of the variance is expressed in squared units of measurement. Not only is a squared unit difficult to understand in its own right, but the squaring is problematic on more technical grounds as well: if the scores of one distribution deviate twice as far from the mean as those of another, the variance of the first distribution will actually be four times as large as that of the second. Because of this, the variance is little used for interpretive purposes.

* * *

Sources:
Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke
Essentials of Statistics for the Social and Behavioral Sciences, Barry H. Cohen and R. Brooke Lea
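The point about squaring (deviations twice as large give a variance four times as large) can be seen in a brief Python sketch. The scores are hypothetical and chosen only for illustration; `statistics.pvariance` is used because it divides the sum of squares by n, matching the "mean of the squared deviation scores" definition above.

```python
import statistics

# Hypothetical scores, for illustration only.
scores = [2, 4, 6, 8, 10]
mean_score = statistics.mean(scores)

# A second distribution whose scores deviate twice as far from the same mean.
doubled_spread = [mean_score + 2 * (s - mean_score) for s in scores]

var_original = statistics.pvariance(scores)         # SS / n
var_doubled = statistics.pvariance(doubled_spread)  # SS / n for the wider distribution

print(f"variance = {var_original}, variance with doubled deviations = {var_doubled}")
```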













Scatter plot

A17

Graphic display of data communicates the nature of a distribution quickly and vividly. The scatterplot is arguably the most informative device for illustrating a bivariate distribution and for visually assessing the correlation between two variables. A scatterplot has two equal-length axes, one for each variable (“bivariate”). In the example figure, the horizontal axis represents score values on the spatial reasoning test (X), and the vertical axis represents score values on the test of mathematical ability (Y). Each axis is marked off according to the variable’s scale. Each dot, or data point, represents a student’s two scores simultaneously. All you need to construct a scatterplot is graph paper, a ruler, a pencil, and a close eye on accuracy as you plot each data point. Consider the inspection of scatterplots to be a mandatory part of correlational work because of the visual information they convey.

* * * Sources: Fundamentals of Statistical Reasoning in Education, Theodore Coladarci, Casey D. Cobb, Edward W. Minium and Robert C. Clarke
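The graph-paper procedure described above can be imitated in plain Python with a character grid. This is a rough sketch only: the scores are invented for illustration (they are not the textbook's data), `text_scatterplot` is a hypothetical helper, and it assumes each variable has at least two distinct values.

```python
def text_scatterplot(xs, ys, width=20, height=10):
    """Plot each (x, y) pair as '*' on a width-by-height character grid."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each score to a column (X axis) and a row (Y axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top line of the plot
    lines = ["|" + "".join(r) for r in grid]
    lines.append("+" + "-" * width)         # horizontal axis
    return lines

# Hypothetical scores for six students.
spatial = [40, 45, 50, 55, 60, 65]   # X: spatial reasoning
maths = [42, 50, 47, 58, 61, 70]     # Y: mathematical ability

plot = text_scatterplot(spatial, maths)
print("\n".join(plot))
```

Points rising from the lower left to the upper right, as in this made-up data, suggest a positive association between the two variables.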












Wilcoxon Test

A18

One of the best ways to improve power is by employing repeated measures or matching, and such designs can be analyzed with nonparametric methods. Wilcoxon’s test is a useful test of whether the members of a pair differ in size. It resembles the sign test in scope, but it is much more sensitive. In fact, for large numbers it is almost as sensitive as the Student t-test. For small numbers with unknown distributions this test is even more sensitive than the Student t-test. When we do not know whether values are normally distributed, this test is preferred over the Student t-test.

Comparisons to Other Tests

In principle the test needs only the rank order of the pairwise differences, but ranking discrepancies that can’t be quantified precisely is difficult, which probably accounts for why Wilcoxon’s T is rarely used in that way. The more common use for this test is as an alternative to the matched t test. The Wilcoxon test will usually have as much as 90% of the power of the matched t test; the sign test will have considerably less power. The Wilcoxon Matched-Pairs Signed-Ranks Test uses the sizes of the differences. The result can therefore differ from that of the sign test, which uses only the number of + and - signs of the differences.

* * * Sources: Essentials of Statistics for the Social and Behavioral Sciences, Barry H. Cohen and R. Brooke Lea; http://www.fon.hum.uva.nl/Service/Statistics/Signed_Rank_Test.html
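The contrast between the two tests can be seen by computing both statistics by hand. The sketch below is illustrative only: the paired scores are invented, `signed_rank_summary` is a hypothetical helper, and it follows the usual convention (an assumption here, not stated in the sources above) of dropping zero differences and giving tied absolute differences their average rank. It computes the statistics only, not p-values.

```python
def signed_rank_summary(first, second):
    """Return (T, rank sum of +, rank sum of -, count of +, count of -)."""
    # Differences between paired scores; zero differences are dropped.
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ordered = sorted(diffs, key=abs)

    # Rank the absolute differences, averaging the ranks of ties.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1      # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    pos_sum = sum(r for d, r in zip(ordered, ranks) if d > 0)
    neg_sum = sum(r for d, r in zip(ordered, ranks) if d < 0)
    n_plus = sum(1 for d in ordered if d > 0)
    n_minus = len(ordered) - n_plus
    return min(pos_sum, neg_sum), pos_sum, neg_sum, n_plus, n_minus

# Hypothetical scores for six matched pairs.
first = [10, 12, 9, 14, 11, 13]
second = [13, 11, 14, 14, 15, 19]

T, pos_sum, neg_sum, n_plus, n_minus = signed_rank_summary(first, second)
print("Sign test counts (+, -):", n_plus, n_minus)    # 4 1
print("Rank sums (+, -):", pos_sum, neg_sum)          # 14.0 1.0
print("Wilcoxon T:", T)                               # 1.0
```

The sign test sees only four pluses against one minus, while Wilcoxon’s T also registers that the single negative difference is the smallest one in size, which is the extra information that makes it more sensitive.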







For more solved assignments visit http://PsychologyLearners.blogspot.com.

M S Ahluwalia (MSA) is a psychology learner, artist, and photographer. To know more, visit Estudiante De La Vida or follow on Twitter or Facebook:


http://twitter.com/msahluwalia

http://www.facebook.com/pages/M-S-Ahluwalia/258015240783
