ASSIGNMENT REFERENCE MATERIAL (2017-18)
M.P.C.-006
STATISTICS IN PSYCHOLOGY
SECTION-A
Answer the following questions in about 1000 words (wherever applicable) each.
Q1. Describe the major statistical techniques for organising the data with suitable example and
diagrams.
Ans. The four major statistical techniques for organising the data are as follows:
(1) Classification: Classification stands for the entire process of sorting out things ‘similar’ from
amongst a group of things of different characteristics. In other words, diverse, heterogeneous data are classified into separate homogeneous classes according to the identity of characteristics that exist amongst the different individuals or quantities constituting the data. The process of arranging the data
in groups and classes according to resemblance and similarities is technically called classification.
According to Corner, “Classification is the process of arranging things (either actually or notionally)
in groups or classes according to their resemblances and affinities and gives expression to the unity of
attributes that may subsist amongst a diversity of individuals.”
Secrist had defined classification as, “It is the process of arranging data into sequences and groups
according to their common characteristics or separating them into different related parts.”
According to Tuttle A.M., “A classification is a scheme for breaking a category into a set of parts
called classes according to some precisely defined differing characteristics possessed by all the
elements of the category.”
In the classification of data, units having common characteristics are placed in a class, and in this fashion the whole data are divided into a number of classes.
(2) Tabulation: Tabulation is the process of summarising classified or grouped data in the form of a
table so that it is easily understood and an investigator is quickly able to locate the desired
information. A table is a systematic arrangement of classified data in columns and rows. Thus, a
statistical table makes it possible for the investigator to present a huge mass of data in a detailed and
orderly form. It facilitates comparison and often reveals certain patterns in data which are otherwise
not obvious. Classification and tabulation, as a matter of fact, are not two distinct processes; actually, they go together. Before tabulation, data are classified and then displayed under the different columns and rows of a table. An example of data in tabular form is as follows:
(3) Graphical Presentation: Besides formal tables, statistical data can also be presented in the form
of various types of graphs. Graphs are a useful way of conveying information very quickly and briefly.
With the same ease and efficiency, they help in comparing data over time and space. They are visual
aids and have a powerful impact on the people.
It is often said, “A picture is worth a thousand words.” They attract a reader's attention to what they
are supposed to convey about the data. The graphical methods of presentation for the table 1.12 are as
follows:
(1) Histogram, Frequency Polygon and Frequency Curve: A histogram is a very common type of graph for displaying classified data. It is a set of rectangles erected vertically. It has the following features:
(i) It is a rectangular diagram;
(ii) Since the rectangles are drawn with specified width and height, a histogram is a two-dimensional diagram. The width of a rectangle equals the class interval and its height corresponds to the class frequency;
(iii) The area of each rectangle is proportional to the frequency of the respective class.
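As an illustration, the frequencies that a histogram's rectangles display can be tallied in a few lines of Python (the scores and the class width of 10 below are hypothetical):

```python
# Tally the class frequencies a histogram would display
# (hypothetical scores grouped into classes of width 10).
scores = [12, 15, 22, 24, 25, 31, 33, 34, 36, 38, 41, 44, 52]

freq = {}
for s in scores:
    lower = (s // 10) * 10            # lower limit of the class
    freq[lower] = freq.get(lower, 0) + 1

for lower in sorted(freq):
    # each text bar stands for a rectangle whose height is the frequency
    print(f"{lower}-{lower + 9}: {'#' * freq[lower]} ({freq[lower]})")
```

Each `#` run corresponds to one rectangle: its width is the class interval and its height the class frequency.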
(4) Diagrammatical Presentation: Diagram is an ideal method of presenting statistical data in visual
form. By diagram, we mean the pictorial representation of quantitative information with a view to making it easily and readily understandable. There are various forms of diagrams, which are as
follows:
(1) Bar Diagrams
(i) Simple Bar Diagram: Like line diagrams, these figures are used where only a single dimension is to be presented. The procedure is almost the same; the only difference is that the bars have a measured thickness as well as length. These can be drawn either vertically or horizontally. The breadth of these lines or bars should be equal; similarly, the distance between the bars should be equal. The breadth, and the distance between the bars, should be chosen according to the space available on the paper.
(ii) Multiple Bar Diagram: This diagram is used when we have to make comparisons between more than one variable. The number of variables may be 2, 3, 4 or more. In the case of 2 variables, a pair of bars
is drawn. Similarly, in case of 3 variables, we draw triple bars. The bars are drawn on the same
proportionate basis as in case of simple bars. The same shade is given to the same item. Distance
between pairs is kept constant.
(iii) Sub-divided Bar Diagram: The data which are presented by a multiple bar diagram can also be presented by this diagram. In this case, we add the different variables for a period and draw them on a single bar, as shown in the following examples. The components must be kept in the same order in each bar. This diagram is more efficient if the number of components is small, i.e. 3 to 5.
(2) Pie Diagram
The pie diagram is a circle. Within the circle, each section represents a percentage of the whole. These sections are displayed like slices of a pie, which is where the graph's name originates. The pie graph can have as many pieces as necessary to represent the data. In constructing a
pie diagram, the first step is to convert the various values of components of the variable into
percentages and then to transpose the percentages into corresponding degrees. The total percentage of the various components, i.e. 100, is taken as 360° (the degrees around the centre of a circle), and the degrees of the various components are calculated in proportion to the percentage values of the different components. It is expressed as:

Degrees of a component = (Component's Percentage / 100) × 360°
It should be noted that in case the data comprise more than one variable, to show the two-dimensional effect for making comparisons among the variables, we have to obtain the square root of the total of each variable. These square roots then represent the radii of the circles, which are then subdivided. A pie diagram helps us in emphasising the area and in ascertaining the relationship between the various components as well as among the variables. However, compared to a bar diagram, a pie diagram is less effective for accurate interpretation when the components are large in number.
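The conversion from component percentages to degrees described above can be sketched as follows (the three component shares are hypothetical):

```python
# Convert component percentages into pie-diagram degrees:
# degrees = (component's percentage / 100) * 360.
def to_degrees(percentages):
    return [p * 360 / 100 for p in percentages]

shares = [50, 30, 20]        # hypothetical shares summing to 100%
degrees = to_degrees(shares)
print(degrees)               # the sectors must add up to 360 degrees
```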
Q2. Using ANOVA find out if significant difference exists between the scores obtained by the
three groups of employees on Work Motivation.
Ans. K = 3 (i.e., 3 groups), n = 10 (i.e., each group has 10 cases), N = 30 (i.e., the total number of units across the groups)
Null hypothesis H0: µ1 = µ2 = µ3
Thus,
GROUP A              GROUP B              GROUP C
X1      X1²          X2      X2²          X3      X3²
12      144          67      4489         43      1849
12      144          45      2025         45      2025
14      196          32      1024         34      1156
43      1849         32      1024         33      1089
34      1156         23      529          21      441
45      2025         23      529          11      121
54      2916         12      144          16      256
34      1156         43      1849         27      729
45      2025         45      2025         26      676
65      4225         56      3136         32      1024
ΣX1=358 ΣX1²=15836   ΣX2=378 ΣX2²=16774   ΣX3=288 ΣX3²=9366
n1=10                n2=10                n3=10
Mean1=35.8           Mean2=37.8           Mean3=28.8
Source of variance    df                SS         MSS       F Ratio
Among the groups      K-1 = 3-1 = 2     446.67     223.33    0.92
Within the groups     N-K = 30-3 = 27   6576.80    243.58

Here, SS among the groups = Σ((ΣXj)²/nj) - (ΣX)²/N = 35399.20 - 34952.53 = 446.67, and SS within the groups = SS total - SS among = 7023.47 - 446.67 = 6576.80.
From the F table for 2 and 27 df at the .05 level, the critical F value is 3.35. Our calculated F value is 0.92, which is smaller than the table value. Therefore, the F ratio is not significant at the .05 level, and the null hypothesis is retained: the three groups do not differ significantly on Work Motivation scores.
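As a numerical cross-check, the one-way ANOVA can be recomputed directly from the raw scores in the table above, using the raw-score formulas SS among = Σ((ΣXj)²/nj) - (ΣX)²/N and SS total = ΣX² - (ΣX)²/N:

```python
# One-way ANOVA recomputed from the raw Work Motivation scores.
a = [12, 12, 14, 43, 34, 45, 54, 34, 45, 65]   # Group A
b = [67, 45, 32, 32, 23, 23, 12, 43, 45, 56]   # Group B
c = [43, 45, 34, 33, 21, 11, 16, 27, 26, 32]   # Group C

groups = [a, b, c]
N = sum(len(g) for g in groups)                 # 30
grand = sum(sum(g) for g in groups)             # 1024
correction = grand ** 2 / N                     # (ΣX)² / N

ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
ss_total = sum(x ** 2 for g in groups for x in g) - correction
ss_within = ss_total - ss_between

df_between, df_within = len(groups) - 1, N - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(round(f_ratio, 2))                        # prints 0.92
```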
Q3. Define nonparametric statistics. Discuss assumptions, advantages and disadvantages of
nonparametric statistics.
Ans. Non-parametric statistics are not based on the assumption of normal distribution of population.
Therefore, these are also known as distribution-free statistics. They are not bound to interval-scale data or normally distributed data; data lacking continuity are also tackled with such statistics. In samples where it is difficult to maintain the assumption of normal distribution, non-parametric statistics are used for analysis. Samples with a small number of items are treated with non-parametric statistics because of the absence of normal distribution. These can be used even for nominal data along with ordinal data. Some of the usual non-parametric statistics include chi-square, Spearman's rank difference method of correlation, Kendall's rank difference method and the Mann-Whitney U-test.
Assumptions of Non-parametric Statistics are as follows:
(1) We face many situations, where we cannot meet the assumptions and conditions, and thus, cannot
use parametric statistical procedures. In such situation, we are bound to apply non-parametric
statistics.
(2) If our sample is in the form of nominal or ordinal scale, and the distribution of sample is not
normally distributed, and also the sample size is very small; it is always advisable to make use of the
non-parametric tests for comparing samples and to make inferences or test the significance or
trustworthiness of the computed statistics. In other words, the use of non-parametric tests is
recommended in the following situations:
(i) Where sample size is quite small. If the size of the sample is as small as N=5 or N=6, the only
alternative is to make use of non-parametric tests.
(ii) When assumptions such as the normality of the distribution of scores in the population are doubtful, we use non-parametric tests.
(iii) When the measurement of data is available either in the form of ordinal or nominal scales or
when the data can be expressed in the form of ranks or in the shape of + signs or – signs and
classification like “good-bad”, etc. we use non-parametric statistics.
Advantages: A few advantages of using non-parametric statistics are as follows:
• Non-parametric methods can be used to analyse categorical (nominal-scale) data, rank (ordinal-scale) data and interval or ratio-scale data.
• Non-parametric methods are generally easy to apply and quick to compute when the sample size is small.
• Non-parametric methods require few assumptions and are very useful when the scale of measurement is weaker than that required for parametric methods. Hence, these methods are widely used and yield more general, broad-based conclusions.
• Non-parametric methods provide an approximate solution to an exact problem, whereas parametric methods provide an exact solution to an approximate problem.
• Non-parametric methods provide solutions to problems that do not require the assumption that a population is normally distributed or has any other specific shape.
Limitations: A few major limitations of non-parametric statistics are as follows:
• Non-parametric methods should not be used when all the assumptions of the parametric methods can be met, since parametric methods are then more powerful; when those assumptions are not met, non-parametric methods may be the more powerful choice.
• Non-parametric methods require more manual computational time when the sample size gets larger.
• Table values for non-parametric statistics are not as readily available as those for parametric methods.
• Non-parametric tests are usually not as widely used, and not as well known, as parametric tests.
SECTION-B
Answer the following questions in about 400 words (wherever applicable) each.
Q1. Discuss type I and type II error and describe steps in setting up the level of significance.
Ans. When a statistical hypothesis is tested, there are four possible results:
(1) The hypothesis is true but our test rejects it.
(2) The hypothesis is false but our test accepts it.
(3) The hypothesis is true and our test accepts it.
(4) The hypothesis is false and our test rejects it.
Obviously, the first two possibilities lead to errors. If we reject a hypothesis when it should be
accepted (possibility No. 1), we say that a Type I error has been made. On the other hand, if we accept
a hypothesis when it should be rejected (possibility No. 2), we say that a Type II error has been made.
In either case, a wrong decision or error in judgment has occurred. The sizes of Type-I error and
Type-II error are denoted by α and β respectively. The usual practice in the testing of hypotheses is to fix α, the size of the Type-I error, and then try to obtain a criterion which minimises β, the size of the Type-II error.
Steps: The steps in setting up the level of significance are as follows:
(1) State the null hypothesis and the alternative hypothesis. (Note: The goal of inferential statistics is
to make general statements about the population by using sample data. Therefore, in testing hypotheses, we make our predictions about the population parameters).
(2) Set the criteria for a decision.
(3) Level of significance or alpha level for the hypothesis test: This is represented by α, which is the probability used to define the very unlikely sample outcomes if the null hypothesis is true.
Now suppose a manufacturer produces articles of good quality. A purchaser selects a sample at random. It so happens that the sample contains many defective articles, and this leads the purchaser to reject the whole product. The manufacturer thus suffers a loss even though he has produced articles of good quality. Therefore, this Type-I error is called the "producer's risk".
On the other hand, if we accept the entire lot on the basis of a sample and the lot is not really good, the consumers are put to loss. Therefore, this Type-II error is called the "consumer's risk".
Q2. Using Spearman’s rho for the following two sets of scores.
Ans.
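The score tables for this question are not reproduced in this transcript. As an illustration with hypothetical, untied scores, Spearman's rho follows from rho = 1 - 6Σd² / (n(n² - 1)):

```python
# Spearman's rho on hypothetical, untied paired scores.
def ranks(values):
    # rank 1 for the smallest value (no ties in this illustration)
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

x = [85, 60, 73, 40, 90]
y = [93, 75, 65, 50, 80]

d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
n = len(x)
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(rho)
```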
Q3. With the help of Mann Whitney ‘U’ test find if significant difference exists between the
scores obtained on attitude towards health of individuals belonging to low Socio Economic
Status and high Socio Economic Status.
Ans. First, rank the scores of both the groups together from lowest to highest.
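With hypothetical attitude scores for the two groups (the original tables are not reproduced here), the U statistic follows the ranking procedure just described:

```python
# Mann-Whitney U on hypothetical attitude-towards-health scores.
# Rank both groups together (ties share the average rank), then
# U1 = R1 - n1(n1 + 1)/2 and U2 = n1*n2 - U1; report the smaller U.
low = [12, 15, 11, 19, 14]      # low SES group (hypothetical)
high = [22, 25, 18, 30, 27]     # high SES group (hypothetical)

combined = sorted(low + high)

def rank(v):
    # average rank of value v in the combined ordering
    positions = [i + 1 for i, x in enumerate(combined) if x == v]
    return sum(positions) / len(positions)

r1 = sum(rank(v) for v in low)             # rank sum of the low SES group
u1 = r1 - len(low) * (len(low) + 1) / 2
u2 = len(low) * len(high) - u1
print(min(u1, u2))
```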
Q4. Describe partial correlation with suitable example.
Ans. A partial correlation between two variables is one that partials out, or nullifies, the effect of a third variable (or a number of other variables) upon both the variables being correlated. Suppose two variables, A and B, are closely related. The correlation between them, controlled for the influence of one or more other variables, is known as the partial correlation. When it is assumed that some other variable is influencing the correlation between A and B, the influence of this variable (or variables) is partialled out of both A and B. Hence, a partial correlation can be considered a correlation between two sets of residuals. Take the simple case of the correlation between A and B partialled out for C. This is represented as rAB.C, which is read as the correlation between A and B partialled out for C. The correlation between A and B can be partialled out for more variables as well.
For example, the researcher is interested in computing the correlation between anxiety and academic
achievement controlled from intelligence. Then correlation between academic achievement (A) and
anxiety (B) will be controlled for Intelligence (C). This can be represented as:
rAcademic Achievement (A) Anxiety (B) . Intelligence (C). To calculate the partial correlation (rp), we will need data on all three variables. The computational formula is as follows:

rp = rAB.C = (rAB - rAC rBC) / (√(1 - rAC²) √(1 - rBC²))
The data on academic achievement, anxiety and intelligence are given below in the table. Here, the academic achievement test, the anxiety scale and the intelligence test were administered to ten students.
Table: Data of academic achievement, anxiety and intelligence for 10 students
In order to compute the partial correlation between the academic achievement and anxiety partialled
out for Intelligence, we first need to compute the Pearson’s Product moment correlation coefficient
between all three variables.
The correlation between anxiety (B) and academic achievement (A) is – 0.369.
The correlation between intelligence (C) and academic achievement (A) is 0.918.
The correlation between anxiety (B) and intelligence (C) is – 0.245.
Now, we can calculate the partial correlation as:

rAB.C = (-0.369 - (0.918)(-0.245)) / (√(1 - 0.918²) √(1 - (-0.245)²))
      = -0.1441 / 0.3845
      = -0.375
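The same computation can be checked in Python using the three zero-order correlations given above:

```python
# Partial correlation r_AB.C from the three zero-order correlations.
import math

def partial_r(r_ab, r_ac, r_bc):
    return (r_ab - r_ac * r_bc) / (
        math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2)
    )

# r(achievement, anxiety) = -0.369, r(achievement, intelligence) = 0.918,
# r(anxiety, intelligence) = -0.245
r = partial_r(-0.369, 0.918, -0.245)
print(round(r, 3))        # prints -0.375
```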
Q5. Compute Chi-square for the following data:
Ans.
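The observed frequencies for this question are not reproduced in this transcript. As an illustration with a hypothetical 2 × 2 contingency table, chi-square = Σ(O - E)²/E, where each expected frequency E = (row total × column total) / grand total:

```python
# Chi-square for a hypothetical 2 x 2 contingency table.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (o - expected) ** 2 / expected

print(chi2)    # df = (rows - 1) * (columns - 1) = 1
```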
SECTION-C
Answer the following in about 50 words each.
Q1. Hypothesis testing
Ans. Hypothesis testing has a vital role in psychological measurements. By hypothesis, we mean the
tentative answer to any question. Hypothesis testing is a systematic procedure for deciding whether
the results of a research study, which examines a sample, support a particular theory or practical
innovation, which applies to a population. Hypothesis testing is the central theme in most psychology
research.
Q2. Measures of central tendency
Ans. Measures of central tendency are measures of the location of middle or center of a distribution.
The definition of “middle” or “center” is purposely left somewhat vague so that the term “central
tendency” can refer to a wide variety of measures. Measures of central tendency are the best way to
reduce a set of data and still retain part of the information. In other words, it is a way to summarise the
set with a single value.
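For instance, the three common central values can be computed for a small hypothetical set of scores:

```python
# Summarising a hypothetical set of scores with single central values.
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

mean = sum(scores) / len(scores)
ordered = sorted(scores)
median = ordered[len(ordered) // 2]           # middle score (odd n)
mode = max(set(scores), key=scores.count)     # most frequent score

print(mean, median, mode)
```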
Q3. Scatter diagram
Ans. Scatter diagram is a graphic method of finding out correlation between two variables. By this
method, direction of correlation can be ascertained. For constructing a scatter diagram, X-variable is
represented on X-axis and the Y-variable on Y-axis. Each pair of values of X and Y series is plotted
in two-dimensional space of X–Y. Thus, we get a scatter diagram by plotting all the pair of values.
Q4. Tetrachoric Correlation
Ans. Tetrachoric correlation is a correlation between two dichotomous variables that have an underlying continuous distribution. If the two variables were measured in a more refined way, continuous distributions would result. For example, attitude towards females and attitude towards liberalisation are two variables to be correlated. Now, we simply measure them as having a positive or negative attitude, so we have 0 (negative attitude) and 1 (positive attitude) scores available on both variables. The correlation between these two variables can then be computed using the tetrachoric correlation (rtet).
Q5. Multiple correlation
Ans. The multiple correlation coefficient (R) denotes a correlation of one variable with multiple other
variables. The multiple correlation coefficient is denoted as RA.BCD...K, which denotes that A is correlated with B, C, D, up to k variables.
For example, we want to compute multiple correlation between A with B and C then it is expressed as
RA.BC. In this case, we create a linear combination of the B and C which is correlated with A.
Q6. Regression equation
Ans. Regression is the study of the nature of the relationship between variables, so that one may be able to predict the unknown value of one variable from a known value of another variable.
If two variables are significantly correlated, and if there is some theoretical basis for doing so, it is
possible to predict values of one variable from the other. This observation leads to a very important
concept known as ‘Regression Analysis’.
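A minimal sketch of fitting the regression line Y' = a + bX by least squares, with hypothetical paired scores, where b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄:

```python
# Least-squares regression of Y on X (hypothetical paired scores).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
a = my - b * mx
print(a, b)        # the prediction equation is Y' = a + b * X
```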
Q7. Normal curve
Ans. The Normal Probability Curve is the ideal symmetrical frequency curve. It is supposed to be
based on the data of a population. In it, the measures are concentrated closely around the centre and
taper off from this central point, or top, to the left and right. There are very few measures at the low-score end of the scale, an increasing number up to a maximum at the middle position, and a symmetrical falling-off towards the high-score end of the scale. The curve exhibits almost perfect bilateral symmetry; it is symmetrical about its central ordinate.
Q8. Degree of freedom
Ans. The number of degrees of freedom, usually denoted by the Greek symbol ν (read as nu), can be interpreted as the number of useful items of information generated by a sample of given size with respect to the estimation of a given population parameter. Thus, a sample of size 1 generates one piece of useful information if one is estimating the population mean, but none if one is estimating the population variance. In order to know about the variance, one needs a sample of size n ≥ 2. The number of degrees of freedom, in general, is the total number of observations minus the number of independent constraints imposed on the observations.
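The n - 1 divisor in the sample variance illustrates this: once the sample mean is fixed, only n - 1 deviations are free to vary, because the deviations must sum to zero (the sample below is hypothetical):

```python
# Degrees of freedom: deviations from the mean sum to zero, so
# only n - 1 of them carry independent information.
sample = [10, 12, 9, 14, 15]

mean = sum(sample) / len(sample)
deviations = [x - mean for x in sample]
variance = sum(d ** 2 for d in deviations) / (len(sample) - 1)

print(sum(deviations), variance)
```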
Q9. Measuring skewness
Ans. There are two methods to study the skewness in a distribution. These are as follows:
(1) Observation Method: There is a simple method of detecting the direction of skewness by inspection of the frequency polygon prepared on the basis of the scores obtained for a trait from the population or from a sample drawn from the population.
(2) Statistical Method: To know the skewness in the distribution, we may also use the statistical method. For this purpose, we use measures of central tendency, specifically the mean and median values, and apply the following formula.
Sk = 3(Mean - Median) / σ
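Applying the formula to a small, hypothetical, positively skewed set of scores:

```python
# Skewness by the statistical method: Sk = 3 * (mean - median) / sd.
import math

scores = [2, 3, 3, 4, 5, 6, 7, 8, 16]     # hypothetical, right-skewed

mean = sum(scores) / len(scores)
median = sorted(scores)[len(scores) // 2]
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

sk = 3 * (mean - median) / sd
print(sk)      # a positive value indicates positive (right) skewness
```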
Q10. Kruskal Wallis Analysis of Variance
Ans. The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA. There may be cases where a researcher is not clear about the shape of the population distribution; in this situation, the Kruskal-Wallis test serves as a substitute for one-way ANOVA. One-way ANOVA is based on the assumptions of normality, independent groups and equal population variances. In order to perform one-way ANOVA, it is essential that the data are at least interval scaled. On the other hand, the Kruskal-Wallis test
can be performed on ordinal data and is not based on the normality assumption of the population.
Kruskal-Wallis test is based on the assumption of independency of groups. It is also based on the
assumption that individual items are selected randomly.
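A sketch of the H statistic with hypothetical, untied ordinal scores for three groups, where H = 12/(N(N + 1)) · Σ(Rj²/nj) - 3(N + 1):

```python
# Kruskal-Wallis H on hypothetical, untied scores for three groups.
groups = [[7, 12, 15], [3, 5, 10], [1, 2, 4]]

combined = sorted(x for g in groups for x in g)
N = len(combined)
# rank sums per group (ranks 1..N; no ties in this illustration)
rank_sums = [sum(combined.index(x) + 1 for x in g) for g in groups]

h = 12 / (N * (N + 1)) * sum(
    r ** 2 / len(g) for r, g in zip(rank_sums, groups)
) - 3 * (N + 1)
print(round(h, 2))
```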