
CHAPTER-6

RELIABILITY AND VALIDITY

6.0 Introduction

6.1 Meaning and Methods of Reliability

6.2 Methods of Estimating Reliability

6.2.1 Test-Retest Method

6.2.2 Alternate or Parallel Forms Method

6.2.3 Split-Half Method

6.2.4 Method of Rational Equivalence

6.3 Reliability of the Present Test

6.3.1 Test-Retest Method

6.3.2 Split-Half Method

6.3.3 Reliability by Rulon Formula

6.3.4 Method of Rational Equivalence

6.3.5 Standard Error of Measurement

6.3.6 Standard Error of Correlations

6.3.7 Comprehensive View of Reliability

6.4 Meaning of Validity

6.5 Methods of Validity

6.5.1 Face Validity

6.5.2 Content Validity

6.5.3 Criterion Validity

6.5.4 Construct Validity

6.5.5 Factorial Validity


6.6 Validity of the Present Test

6.6.1 Content Validity

6.6.2 Criterion Validity

6.6.3 Factorial Validity

6.6.4 Construct Validity

6.7 Conclusion


CHAPTER-6

RELIABILITY AND VALIDITY

6.0 Introduction

It is necessary to know the validity and reliability of a tool before evaluating the results obtained with it. If the reliability and validity of an instrument are not known, then the evaluation and interpretation of results obtained with that instrument are meaningless. The different kinds of validity and reliability are discussed in this chapter.

6.1 Meaning of Reliability

Reliability means the consistency of test results: both the internal consistency of the results and the consistency of the results over a period of time.

According to McMillan and Schumacher (1989),

Reliability refers to the consistency of measurement, the extent to which

the results are similar over different forms of the same instrument or occasions of

data collection. The goal of developing reliable measures is to minimize the

influence of chance or other variables unrelated to the intent of the measure.

Reliability is mathematically defined as the ratio of true-score variance to the total variance of the test scores (Gregory, 2005):

rxx = σT² / σX² = σT² / (σT² + σe²)

where rxx is the reliability coefficient, σe² is the error (measurement) variance, σT² is the true-score variance and σX² is the total variance of the test scores. There is no way to directly observe or calculate the true score; therefore, a variety of methods are used to


estimate the reliability of a test. As a correlation coefficient, the reliability of a test may lie between -1.0 and +1.0.
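As an illustration of the definition above (not part of the original analysis), the following minimal Python sketch uses simulated true scores and errors and shows that the ratio of true-score variance to total variance coincides with the correlation between two parallel measurements; all values are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n_examinees = 10_000

# Hypothetical true scores and two independent measurement errors
true_scores = rng.normal(loc=30, scale=10, size=n_examinees)
observed_1 = true_scores + rng.normal(scale=5, size=n_examinees)   # first measurement
observed_2 = true_scores + rng.normal(scale=5, size=n_examinees)   # second measurement

# Reliability as the ratio of true-score variance to total variance
ratio = true_scores.var() / observed_1.var()

# Reliability estimated as the correlation between the two measurements
estimate = np.corrcoef(observed_1, observed_2)[0, 1]

print(round(ratio, 2), round(estimate, 2))   # both close to 10**2 / (10**2 + 5**2) = 0.80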

6.2 Methods of Estimating Reliability

There are four procedures in common use for computing the reliability coefficient (sometimes called the self-correlation) of a test (Garrett, 1981). These methods are described below.

6.2.1 Test-Retest Method

In this method the same form of the test is administered twice to the same group of individuals, with an interval of time between the two administrations. The scores from the two administrations are then correlated (Pearson product-moment correlation). The resulting correlation coefficient provides a measure of stability and indicates how stable the test results are over the given period of time.
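A minimal sketch of this estimate, assuming two hypothetical score lists for the same group of students (the names and values are illustrative only):

from scipy.stats import pearsonr

# Hypothetical scores of the same five students on the two administrations
first_administration = [42, 35, 28, 50, 31]
second_administration = [40, 33, 30, 48, 29]

r, p_value = pearsonr(first_administration, second_administration)
print(f"test-retest reliability = {r:.2f} (p = {p_value:.4f})")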

6.2.2 Alternate or Parallel Forms Method

Two forms of the same test (alternate or parallel forms) are independently constructed to meet the same specifications, often on an item-by-item basis. Thus, alternate forms of a test incorporate similar content and cover the same range and level of item difficulty. It is, however, difficult to construct a truly parallel form of a test.

6.2.3 Split-Half Method

Reliability can also be estimated from a single administration of a single form of a test. The test is administered to a group of pupils in the usual manner. After scoring the responses, the test is divided into two halves: one half contains the odd-numbered items and the other half contains the even-numbered items. The two halves of the test should be equivalent. The correlation coefficient between the scores obtained on the two halves is calculated, and from it the reliability of the whole test is estimated. Split-half reliability provides a measure of internal consistency; the coefficient indicates the degree to which consistent results are obtained from the two halves of the test.
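A minimal sketch of the odd-even split followed by the Spearman-Brown step-up, assuming a hypothetical pupils-by-items matrix of 0/1 scores (random placeholder data, so the printed values will be near zero; real item data gives a substantial coefficient):

import numpy as np

# Hypothetical scored responses: 371 pupils x 60 items (1 = right, 0 = wrong)
rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(371, 60))

odd_half = responses[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown prophecy formula for the full-length test
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))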

6.2.4 Method of Rational Equivalence

In this method, the test is treated as equivalent to a hypothetical parallel form such that every item on each form is interchangeable (Garrett, 1981). The Kuder-Richardson (K-R) formulas are used to correlate all the items on a single test with each other when each item is scored right or wrong. K-R reliability is thus determined from a single administration of an instrument, but without having to split the instrument into equivalent halves. The procedure assumes that all items in an instrument or test are equivalent to each other, and it is appropriate when the purpose of the test is to measure a single trait. If a test or instrument has items of varying difficulty, or it measures more than one trait, the K-R estimate will usually be lower than the split-half reliability.

6.3 Reliability of the Present Test

The reliability of the present test was estimated by the following methods:

(1) Test-Retest method

(2) Split-Half method

(3) Method of Rational Equivalence


6.3.1 Test-Retest Method

To estimate reliability by this method, one school from a rural area and one school from an urban area were randomly selected. The test was administered to 107 students of these schools. After 29 days, the RAT was administered again to the same students. The sample selected for the reliability study is described in chapter 4. The scores obtained on the test and the retest are shown in table 6.1.

Table 6.1
Scatter Diagram of Scores on Test-Retest

(Rows: retest score intervals; columns: test score intervals 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60. Each cell entry gives the frequency with the x'y' product in parentheses.)

Retest 51-55: 3(20), 3(25); fy = 6
Retest 46-50: 1(0), 1(8); fy = 2
Retest 41-45: 3(6), 4(9), 1(12), 1(15); fy = 9
Retest 36-40: 1(-8), 1(-4), 2(-2), 3(0), 6(2), 2(4), 2(6), 1(10); fy = 18
Retest 31-35: 2(-1), 5(0), 4(1), 1(2), 1(4); fy = 13
Retest 26-30: 1(0), 1(0), 4(0), 2(0); fy = 8
Retest 21-25: 2(4), 4(3), 4(2), 3(1), 1(0), 1(-1), 2(-3); fy = 17
Retest 16-20: 3(8), 4(6), 4(4), 2(2), 3(0), 1(-8), 1(-10); fy = 18
Retest 11-15: 5(12), 3(9), 3(6), 1(3); fy = 12
Retest 6-10: 2(16), 1(12), 1(0); fy = 4
fx: 14, 13, 16, 12, 14, 11, 7, 8, 6, 6; total = 107

Cx = -0.336, Cx² = 0.1131, Cy = 0.0841, Cy² = 0.007, Σx'y' = 484, σx = 2.668, σy = 2.384, N = 107, SER = 0.004

The correlation coefficient between the test and retest scores was 0.72, which is significant at the 0.01 level. Hence the reliability of the RAT as determined by this method is high.
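The coefficient follows from the product-moment formula for grouped (coded) data, r = (Σx'y'/N − Cx·Cy) / (σx·σy), with all quantities expressed in class-interval units. A minimal check using the values printed under table 6.1 (N taken as the 107 students reported above):

sum_xy = 484                      # Σx'y' from table 6.1
cx, cy = -0.336, 0.0841           # corrections for the assumed means
sigma_x, sigma_y = 2.668, 2.384   # standard deviations in class-interval units
n = 107

r = (sum_xy / n - cx * cy) / (sigma_x * sigma_y)
print(round(r, 2))                # approximately 0.72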


6.3.2 Split-Half Method

To determine reliability by this method, the test was administered to 371 students of different schools. The test was divided into two equivalent halves: one half contained the odd-numbered items and the other half contained the even-numbered items. The correlation between the scores obtained on these two halves was calculated, and the Spearman-Brown prophecy formula was then applied to obtain the reliability of the whole test. The scatter diagram for the correlation between the scores on the odd-numbered and even-numbered items is shown in table 6.2.

Table 6.2
Scatter Diagram of Scores Obtained on Two Halves of the Test

(Rows: score intervals on the even-numbered half; columns: score intervals on the odd-numbered half, 1-3, 4-6, 7-9, 10-12, 13-15, 16-18, 19-21, 22-24, 25-27, 28-30. Each cell entry gives the frequency with the x'y' product in parentheses.)

Even 28-30: 2(8), 5(12), 11(16); fy = 18
Even 25-27: 1(0), 2(3), 11(6), 12(9), 5(12); fy = 31
Even 22-24: 1(0), 9(2), 12(4), 7(6), 4(8); fy = 33
Even 19-21: 2(-1), 7(0), 18(1), 13(2), 5(3); fy = 45
Even 16-18: 1(0), 7(0), 18(0), 16(0), 1(0); fy = 43
Even 13-15: 1(4), 3(3), 11(2), 16(1), 16(0), 3(-1); fy = 50
Even 10-12: 3(8), 14(6), 25(4), 4(2), 11(0), 3(-2); fy = 60
Even 7-9: 9(12), 21(9), 10(6), 4(3), 2(0); fy = 46
Even 4-6: 3(20), 8(16), 15(12), 5(8), 5(4); fy = 36
Even 1-3: 2(25), 4(20), 3(15); fy = 9
fx: 5, 25, 56, 52, 38, 56, 51, 38, 29, 21; total = 371

Cx = -0.369, Cx² = 0.136, Cy = -0.595, Cy² = 0.3548, Σx'y' = 1919, σx = 2.10, σy = 2.388, N = 371, SEM = 1.94, SER = 0.001

The correlation between the two halves was 0.987, and the reliability of the whole test was then found with the Spearman-Brown formula. The Spearman-Brown reliability for the whole test was 0.99, which indicates that the RAT is reliable.
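For a test doubled in length, the Spearman-Brown prophecy formula is rtt = 2·rhh / (1 + rhh), where rhh is the correlation between the two halves. A quick check with the value reported above:

r_half = 0.987                       # correlation between the odd and even halves
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown step-up for the whole test
print(round(r_full, 2))              # 0.99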


6.3.3 Reliability by Rulon Formula

An alternative method for finding split-half reliability was developed by Rulon (1939). It requires only the variance of the differences between each person's scores on the two half-tests (SDd²) and the variance of the total scores (SDx²); these two values are substituted in the following formula, which yields the reliability of the whole test directly (Anastasi and Urbina, 2007):

rtt = 1 - (SDd² / SDx²)

where
rtt = reliability coefficient of the whole test,
d = difference between each person's scores on the two half-tests,
SDd = standard deviation of these differences,
SDx = standard deviation of the total scores.

Here, SDd = 1.97, SDx = 13.73 and N = 371, so

rtt = 1 - (1.97² / 13.73²) = 0.98

The reliability coefficient of the test was 0.98, which indicates a high reliability of the test.
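A minimal sketch of the Rulon estimate from raw half-scores; the two arrays are hypothetical and only illustrate where SDd and SDx come from:

import numpy as np

# Hypothetical scores of six pupils on the odd and even halves of the test
odd_half = np.array([14, 18, 22, 9, 26, 17])
even_half = np.array([15, 17, 21, 11, 25, 18])

differences = odd_half - even_half     # d for each pupil
totals = odd_half + even_half          # total score for each pupil

rulon = 1 - differences.var() / totals.var()
print(round(rulon, 3))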

6.3.4 Method of Rational Equivalence

To estimate reliability by this method, the test was administered to 371 students of grades 5 to 7. The proportion of students giving the right and wrong responses to each item was found, and the standard deviation of the total test scores was also computed. The reliability of the test was then found with the Kuder-Richardson formulas, which are as under:

KR-20: rtt = [n / (n - 1)] × [(SDt² - Σpq) / SDt²]

KR-21: rtt = [n / (n - 1)] × [1 - M(n - M) / (n·SDt²)]

where
rtt = reliability coefficient of the whole test,
n = number of items in the test,
SDt = standard deviation of the total scores on the test,
p = proportion of persons who pass each item,
q = proportion of persons who do not pass each item,
M = mean of the total scores.

Here, n = 60, SDt = 13.73, Σpq = 14.01 and M = 31.30, which give

KR-20: rtt = 0.94
KR-21: rtt = 0.94

If KR-20 = KR-21, the facility values of the items should be equal (Ambasana, 1999). The reliability coefficient calculated by KR-20 and KR-21 was 0.94 in both cases, which indicates that the facility values of the items were nearly equal.
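A minimal sketch of the two computations from a scored item matrix; the data below are random placeholders (with the values reported above, n = 60, SDt = 13.73, Σpq = 14.01 and M = 31.30, both formulas give 0.94):

import numpy as np

def kr20(responses):
    # KR-20 from a pupils x items matrix of 0/1 scores
    n_items = responses.shape[1]
    p = responses.mean(axis=0)                  # proportion passing each item
    q = 1 - p
    total_var = responses.sum(axis=1).var()     # variance of the total scores
    return (n_items / (n_items - 1)) * (total_var - (p * q).sum()) / total_var

def kr21(responses):
    # KR-21, which assumes all items have equal facility values
    n_items = responses.shape[1]
    totals = responses.sum(axis=1)
    m, var = totals.mean(), totals.var()
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * var))

# Hypothetical scored responses: 371 pupils x 60 items
rng = np.random.default_rng(2)
responses = rng.integers(0, 2, size=(371, 60))
print(round(kr20(responses), 2), round(kr21(responses), 2))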

6.3.5 Standard Error of Measurement

According to Garrett (1981), the standard error of measurement of scores is a better way of expressing the reliability of a test than the reliability coefficient, as it takes into account the variability within the group as well as the self-correlation of the test. The effect of variable or chance errors in producing divergences of test scores from their true values is given by the formula

SEM = S.D. × √(1 - r²)

where
S.D. = standard deviation of the test scores,
r = reliability coefficient.

The SEM was calculated by the above formula for each reliability coefficient and is shown in table 6.3.

6.3.6 Standard Error of Correlation

The reliability of the test can also be expressed in terms of the standard error of the correlation coefficient. The formula for estimating SER is as under:

SER = (1 - r²) / √N

where
r = correlation coefficient,
N = number of students.

The standard error of each correlation coefficient used for reliability was computed and is shown in table 6.3.
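A minimal sketch applying the two formulas above, using the split-half values reported earlier (r = 0.99, S.D. = 13.73, N = 371) as an example:

import math

def sem(sd, r):
    # Standard error of measurement as used above: S.D. * sqrt(1 - r^2)
    return sd * math.sqrt(1 - r ** 2)

def ser(r, n):
    # Standard error of the correlation coefficient: (1 - r^2) / sqrt(N)
    return (1 - r ** 2) / math.sqrt(n)

print(round(sem(13.73, 0.99), 2))   # approximately 1.94
print(round(ser(0.99, 371), 3))     # approximately 0.001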

6.3.7 Comprehensive View of Reliability

Table 6.3 shows each reliability coefficient (rtt) together with the corresponding standard error values.


Table 6.3
Summary of Reliability

No.  Method of reliability   Sample (n)   Reliability coefficient (rtt)   SEM    SER
1    Test-Retest             107          0.72                            8.69   0.046
2    Split-Half              371          0.99                            1.94   0.001
3    Rulon formula           371          0.98                            2.61   0.002
4    KR-20 and KR-21         371          0.94                            4.68   0.006

From table 6.3 it can be seen that the values of the reliability coefficient are high, while the standard error of measurement ranges from 1.94 to 8.69 and the standard error of the correlation coefficient ranges from 0.001 to 0.046. It can therefore be said that the test is highly reliable.

6.4 Meaning of Validity

The merit of a psychological test is determined first by its reliability, but ultimately by its validity. The validity of a test concerns what the test measures and how well it does so; it tells us what can be inferred from test scores (Anastasi and Urbina, 2007). A test is valid to the extent that inferences made from it are appropriate, meaningful and useful (Gregory, 2005). The traditional ways of accumulating validity evidence are discussed here in the context of the present study.

6.5 Methods of Validity

Gregory (2005) and Anastasi and Urbina (2007) have discussed various forms of validity, which are:

1. Face Validity


2. Content Validity

3. Criterion Validity

4. Construct Validity

5. Factor Validity

6.5.1 Face Validity

Face validity is not validity in the technical sense; it refers not to what the test actually measures, but to what it appears superficially to measure. From the appearance of the test, an examinee can judge what it seems to measure.

6.5.2 Content Validity

Content-related evidence is the extent to which the content of the test is judged to be representative of some appropriate universe or domain of content. In establishing content validity, experts typically examine the test items and indicate whether the items measure the predetermined criteria, objectives or content.

6.5.3 Criterion Validity

Whenever test scores are to be used to predict future performance, or to estimate current performance on some valued measure other than the test itself (called a criterion), we are especially concerned with criterion-related evidence. Two different approaches to validity evidence are subsumed under criterion-related validity: (1) concurrent validity and (2) predictive validity (Gregory, 2005).


(1) Concurrent Validity

Concurrent validity is concerned with the relation of test performance to some other current measure of performance. Both measures (the test score and the other current criterion) are obtained at approximately the same time and the results are correlated.

(2) Predictive Validity

In predictive validity, the correlation is found between present test performance and some future measure of performance. The difference between concurrent and predictive validity lies in the time relation between the criterion and the test.

6.5.4 Construct Validity

A construct is a theoretical, intangible quality or trait in which individuals differ (Messick, 1995; cited in Gregory, 2005). Construct validity is appropriate for tests of psychological traits or qualities and requires both logical arguments and empirical evidence.

6.5.5 Factor Validity

Factor analysis is a specialized statistical technique that is particularly useful for investigating construct validity. The purpose of factor analysis is to identify the minimum number of determiners (factors) required to account for the intercorrelations among a battery of tests (Gregory, 2005).

Factor analysis is a method for determining the number and nature of the underlying variables among larger numbers of measures. It may also be called a method for extracting common factor variances from sets of measures (Kerlinger, 2007; Aldous, 2001). There are two forms or methods of factor analysis,


confirmatory and exploratory (principal component method) factor analysis. In confirmatory factor analysis, the purpose is to confirm that test scores and variables fit a certain pattern predicted by a theory. In exploratory factor analysis, the purpose is to identify the factors, or to reduce the number of factors needed to account for sets of measures.

6.6 Validity of the Present Test

Content validity, criterion validity, factorial validity and construct validity were determined for the present test. To determine the criterion (concurrent) validity of the present test, the correlation coefficients of this test with other tests were found. Two types of factor analysis, the exploratory (principal component) method and the centroid method, were performed on the test scores.

6.6.1 Content Validity

During the procedure of test construction, the test was sent to experts in different fields such as education, mathematics, research, psychology and primary education. The components had been selected on the basis of a content analysis of mathematics textbooks; therefore, the components selected for the test were related to reasoning. The experts were asked to give their opinions about the test items, instructions and components, and the test was modified in the light of their opinions. Thus, the content of the test was representative of reasoning in mathematics.


6.6.2 Criterion Validity

To determine the criterion validity, the correlation coefficients between the scores on the RAT and other tests that measure comparable attributes were calculated. The correlation coefficients of the RAT scores with the scores on the following standardized tests were computed:

1. Dr. S.R. Patel's Verbal Reasoning Ability Test

2. Dr. R.S. Patel's Numerical Ability Test

3. Dr. Jyotiben Desai's IQ Test

4. Mathematics achievement (score in the preliminary examination)

To determine criterion validity, the above tests and the RAT were administered to 108 students from Adarsh Vidyalaya, Visnagar, and Sawala Primary School. The sample contained 53 boys and 55 girls, with thirty-six students from each grade.

(1) Correlation between RA and Verbal Reasoning Ability

The scatter diagram of the scores on the RAT and Dr. S.R. Patel's Verbal Reasoning Ability Test (VRAT) is shown in table 6.4.


Table 6.4
Scatter Diagram of Scores on RAT and Verbal Reasoning Ability Test

(Rows (y): RAT score intervals; columns (x): VRAT score intervals 9-15, 16-22, 23-29, 30-36, 37-43, 44-50, 51-57, 58-64, 65-71, 72-78. Each cell entry gives the frequency with the x'y' product in parentheses.)

RAT 51-55: 1(10), 5(25); fy = 6
RAT 46-50: 1(12), 1(16); fy = 2
RAT 41-45: 1(0), 2(6), 4(4), 1(12), 2(15); fy = 10
RAT 36-40: 1(-8), 1(-4), 1(-2), 1(0), 5(2), 3(4), 5(6), 1(10); fy = 18
RAT 31-35: 11(-2), 1(-1), 1(0), 4(1), 4(3), 1(4); fy = 13
RAT 26-30: 1(0), 1(0), 11(0), 1(0), 1(0), 1(0); fy = 7
RAT 21-25: 1(4), 11(3), 4(2), 3(1), 2(0), 3(-1), 2(-2), 1(-3); fy = 18
RAT 16-20: 1(8), 1(6), 4(4), 5(2), 2(0), 2(-2), 1(-4), 1(-6), 1(-8); fy = 18
RAT 11-15: 11(9), 2(6), 1(3), 2(0), 1(-3), 3(6), 2(-9); fy = 13
RAT 6-10: 1(16), 1(4), 1(-4); fy = 3
fx: 5, 6, 15, 13, 10, 17, 12, 18, 4, 8; total = 108

Cx = 0.6481, Cx² = 0.42, Cy = 0.111, Cy² = 0.0123, Σx'y' = 355, σx = 2.46, σy = 2.38, N = 108, SER = 0.067

The correlation coefficient between the scores on the RAT and the Verbal Reasoning Ability Test was 0.55, which is higher than the expected value (0.24) and significant at the 0.01 level. The standard error of the correlation (SER) was 0.067, which is very low, so Reasoning Ability and Verbal Reasoning Ability are correlated.

(2) Correlation between RA and Numerical Ability

The scatter diagram of the scores on the RAT and Dr. R.S. Patel's Numerical Ability Test (NAT) is shown in table 6.5.


Table 6.5
Scatter Diagram of Scores on RAT and Numerical Ability Test

(Rows (y): RAT score intervals; columns (x): NAT score intervals 4-6, 7-9, 10-12, 13-15, 16-18, 19-21, 22-24. Each cell entry gives the frequency with the x'y' product in parentheses.)

RAT 51-55: 1(5), 2(10), 3(15); fy = 6
RAT 46-50: 1(-8), 1(4); fy = 2
RAT 41-45: 1(-6), 2(-3), 1(0), 2(3), 2(6), 1(9); fy = 9
RAT 36-40: 11(-6), 3(-2), 6(2), 6(4), 1(6); fy = 18
RAT 31-35: 1(-3), 5(-2), 3(-1), 2(0), 2(1); fy = 13
RAT 26-30: 3(0), 3(0), 1(0), 1(0); fy = 8
RAT 21-25: 3(3), 1(2), 9(1), 3(0), 2(-1); fy = 18
RAT 16-20: 5(6), 2(4), 7(2), 4(0); fy = 18
RAT 11-15: 5(6), 6(3), 1(0); fy = 12
RAT 6-10: 1(8), 2(4), 1(0); fy = 4
fx: 14, 16, 35, 13, 15, 10, 5; total = 108

Cx = -0.5463, Cx² = 0.2984, Cy = 0.07407, Cy² = 0.0055, Σx'y' = 225, σx = 1.65, σy = 2.375, N = 108, SER = 0.068

The correlation coefficient between the scores on the RAT and the Numerical Ability Test was 0.54, which is higher than the expected value (0.24) and significant at the 0.01 level. The standard error of the correlation (SER) was 0.068, which is very low, so Reasoning Ability and numerical ability are correlated.

(3) Correlation between RA and IQ

The scatter diagram of the scores on the RAT and Dr. Jyotiben Desai's IQ Test is shown in table 6.6.


Table 6.6
Scatter Diagram of Scores on RAT and IQ Test

(Rows (y): RAT score intervals; columns (x): IQ test score intervals 5-11, 12-18, 19-25, 26-32, 33-39, 40-46, 47-53, 54-60, 61-67. Each cell entry gives the frequency with the x'y' product in parentheses.)

RAT 51-55: 1(-5), 2(15), 3(20); fy = 6
RAT 46-50: 1(-4), 1(12); fy = 2
RAT 41-45: 2(0), 2(3), 1(6), 2(9), 2(12); fy = 9
RAT 36-40: 1(-4), 3(0), 9(2), 4(4), 1(6); fy = 18
RAT 31-35: 3(-1), 2(0), 4(1), 1(2), 1(3), 2(4); fy = 13
RAT 26-30: 2(0), 3(0), 1(0), 1(0), 1(0); fy = 8
RAT 21-25: 5(3), 4(2), 5(1), 2(0), 1(-1), 1(-2); fy = 18
RAT 16-20: 1(8), 5(6), 5(4), 1(2), 2(0), 2(-2), 1(-4), 1(-8); fy = 18
RAT 11-15: 2(12), 2(6), 5(3), 2(-3), 1(-9); fy = 12
RAT 6-10: 1(12), 2(8), 1(4); fy = 4
fx: 3, 13, 17, 18, 11, 21, 8, 9, 8; total = 108

Cx = 0.0648, Cx² = 0.0042, Cy = 0.074, Cy² = 0.0055, Σx'y' = 334, σx = 2.178, σy = 2.375, N = 108, SER = 0.061

The correlation coefficient between the scores on the RAT and the IQ test was 0.60, which is greater than the expected value (0.24) and significant at the 0.01 level. The standard error of the correlation (SER) was 0.061, which is very low, so Reasoning Ability and IQ are correlated.

(4) Correlation between RA and Mathematics Achievement

To estimate correlation between RAT scores and mathematics

achievement scores, scores obtained in preliminary examination in mathematics

subject was considered as a mathematics achievement. The mathematics scores

were converted in to T-scores, than the correlation coefficient between these T-

scores of mathematics achievement and scores on RAT was calculated. The

Page 19: CHAPTER-6 RELIABILITY AND VALIDITYshodhganga.inflibnet.ac.in/bitstream/10603/40090/14/14_chapter6.pdf · that every item on each form is interchangeable (Garrett, 1981). Kuder-Richardson

136

scatter diagram of scores on RAT and T-scores of mathematics achievement is

plotted in table 6.7.

Table 6.7
Scatter Diagram of Scores on RAT and T-scores of Mathematics Achievement

(Rows (y): RAT score intervals; columns (x): T-score intervals 29-32, 33-36, 37-40, 41-44, 45-48, 49-52, 53-56, 57-60, 61-64, 65-68. Each cell entry gives the frequency with the x'y' product in parentheses.)

RAT 51-55: 1(15), 2(20), 3(25); fy = 6
RAT 46-50: 1(16), 1(20); fy = 2
RAT 41-45: 1(6), 4(9), 4(12); fy = 9
RAT 36-40: 2(-2), 2(0), 3(2), 5(4), 3(6), 1(8), 2(10); fy = 18
RAT 31-35: 3(-1), 1(0), 2(1), 4(2), 2(4), 1(5); fy = 13
RAT 26-30: 1(0), 3(0), 2(0), 2(0); fy = 8
RAT 21-25: 1(2), 2(1), 3(0), 7(-1), 2(-2), 1(-3), 2(-4); fy = 18
RAT 16-20: 2(8), 1(6), 4(4), 4(2), 4(0), 1(-2), 2(-4); fy = 18
RAT 11-15: 1(12), 4(9), 2(6), 3(0), 1(-3), 1(-6); fy = 12
RAT 6-10: 1(16), 1(8), 1(4), 1(0); fy = 4
fx: 4, 5, 8, 13, 17, 16, 17, 9, 12, 7; total = 108

Cx = 0.9259, Cx² = 0.8573, Cy = 0.07407, Cy² = 0.0055, Σx'y' = 441, σx = 2.344, σy = 2.375, N = 108, SER = 0.046

The correlation coefficient between the scores on the RAT and the T-scores of mathematics achievement in the preliminary examination was 0.72, which is higher than the expected value (0.24) and significant at the 0.01 level. The standard error of the correlation (SER) was 0.046, which is very low, so Reasoning Ability and mathematics achievement are correlated.


6.6.3 Factorial Validity

The factorial validity of the scores on the RAT was investigated by the principal component method (exploratory factor analysis) and by Thurstone's centroid method. A sample of 371 randomly selected students was used for the factor analysis.

6.6.3.1 Exploratory Factor Analysis

The exploratory factor analysis was done with the help of SPSS 17.0 (trial version). The analysis was carried out in four stages (George and Mallery, 2006):

(1) confirming the appropriateness of the data for the factor model to be used,

(2) extraction of the factors,

(3) rotation, and

(4) calculation of factor scores.

(1) Appropriateness of Factor Analysis Model

The question that normally arises is whether the data are appropriate for the factor analysis model. To answer this question the researcher studied the descriptive statistics. Most of the correlations between items were positive (greater than 0.5) and significant at the 0.05 level or better. The Kaiser-Meyer-Olkin measure of sampling adequacy was 0.899, which is close to 1. Bartlett's test of sphericity gave a value of 7903.317, which was significant at the 0.000 level. Residuals were computed between the observed and reproduced correlations; there were 359 (20%) non-redundant residuals with absolute values greater than 0.05. All these results show that the factor analysis model was appropriate for the data.
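For readers without SPSS, the two adequacy checks can be sketched directly from the item correlation matrix; the implementation below uses the standard formulas for Bartlett's test and the overall KMO index, and the 371 x 60 response matrix is a random placeholder rather than the actual RAT data:

import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    # Bartlett's test of sphericity: H0 is that the correlation matrix is an identity matrix
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    return statistic, chi2.sf(statistic, p * (p - 1) / 2)

def kmo(data):
    # Overall Kaiser-Meyer-Olkin measure of sampling adequacy
    corr = np.corrcoef(data, rowvar=False)
    inv_corr = np.linalg.inv(corr)
    d = np.sqrt(np.outer(np.diag(inv_corr), np.diag(inv_corr)))
    partial = -inv_corr / d                      # anti-image (partial) correlations
    np.fill_diagonal(corr, 0.0)
    np.fill_diagonal(partial, 0.0)
    return (corr ** 2).sum() / ((corr ** 2).sum() + (partial ** 2).sum())

# Hypothetical scored responses: 371 pupils x 60 items (placeholder data)
rng = np.random.default_rng(3)
item_scores = rng.integers(0, 2, size=(371, 60)).astype(float)

print(bartlett_sphericity(item_scores))
print(round(kmo(item_scores), 3))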


(2) Extraction of the Factors

The common factors were extracted by factor analysis of the scores on the RAT. The statistics of the factor extraction are given in table 6.8.

Table 6.8

Factor Extraction and Statistics

(Columns: Item No., Communality, Factor, Eigenvalue, Percentage of variance, Cumulative percentage of variance. The percentage columns are given only for the 16 factors whose eigenvalues exceed one; for the remaining components only the eigenvalue is listed.)

1 0.354 1 13.865 23.108 23.108

2 0.353 2 2.422 4.037 27.145

3 0.292 3 2.311 3.851 30.996

4 0.479 4 1.848 3.080 34.076

5 0.379 5 1.655 2.759 36.835

6 0.482 6 1.614 2.690 39.525

7 0.561 7 1.467 2.445 41.970

8 0.364 8 1.417 2.362 44.332

9 0.587 9 1.311 2.185 46.517

10 0.510 10 1.261 2.101 48.618

11 0.395 11 1.234 2.056 50.674

12 0.458 12 1.192 1.986 52.660

13 0.556 13 1.151 1.918 54.579

14 0.541 14 1.121 1.868 56.447

15 0.480 15 1.078 1.797 58.245

16 0.497 16 1.021 1.702 59.947

17 0.552 0.998

18 0.350 0.978

19 0.382 0.934

20 0.492 0.908

21 0.483 0.877

22 0.646 0.845

23 0.561 0.835

24 0.560 0.789

25 0.593 0.784

26 0.569 0.761

27 0.617 0.752

28 0.559 0.720


29 0.357 0.708

30 0.579 0.685

31 0.531 0.613

32 0.539 0.600

33 0.439 0.595

34 0.543 0.589

35 0.524 0.577

36 0.572 0.568

37 0.521 0.557

38 0.398 0.548

39 0.554 0.502

40 0.536 0.488

41 0.553 0.467

42 0.401 0.454

43 0.371 0.439

44 0.454 0.421

45 0.382 0.405

46 0.509 0.392

47 0.524 0.386

48 0.479 0.378

49 0.401 0.362

50 0.381 0.351

51 0.535 0.335

52 0.467 0.328

53 0.390 0.321

54 0.438 0.300

55 0.552 0.278

56 0.616 0.274

57 0.452 0.251

58 0.390 0.243

59 0.443 0.234

60 0.403 0.203

Table 6.8 shows that 16 factors were extracted; the mean and standard deviation of the communalities were 0.48 and 0.08 respectively. By default, the program extracts the factors whose eigenvalues are greater than one. The eigenvalues of the first and the sixteenth factors were 13.865 and 1.021 respectively, which suggests that the first factor is much stronger than the others. Of the total variance, about 59% was accounted for by the 16 common factors. The scree plot of the factors and eigenvalues is shown in graph 6.1. The number of extracted factors was less than approximately one third of the total number of items.
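The extraction step (eigenvalues of the item correlation matrix, retained while greater than one) can be sketched as follows; the response matrix is again a random placeholder, whereas the actual RAT data gave the 16 factors listed in table 6.8:

import numpy as np

# Hypothetical scored responses: 371 pupils x 60 items (placeholder data)
rng = np.random.default_rng(4)
item_scores = rng.integers(0, 2, size=(371, 60)).astype(float)

# Correlation matrix of the 60 items and its eigenvalues in descending order
corr = np.corrcoef(item_scores, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser criterion: retain components whose eigenvalue exceeds one
retained = eigenvalues[eigenvalues > 1]
explained = 100 * eigenvalues / eigenvalues.sum()

print("retained components:", retained.size)
print("cumulative % of variance:", round(explained[:retained.size].sum(), 1))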

Graph 6.1
Scree Plot (eigenvalue on the Y-axis plotted against component number on the X-axis)

The eigenvalue of the first factor was much higher than those of the other 15 factors, so the curve shows an elbow shape, which indicates that the first factor is strong.

(3) Rotation

The factor analysis was followed by a varimax rotation to achieve simple structure and to assist the resolution of the factors. Items that were correlated with more than one factor before rotation were correlated with only one factor after rotation. Before rotation, all items except item numbers 1, 2, 4, 48 and 57, that is, 55 of the 60 items, were correlated with one factor, which was Reasoning Ability in mathematics.

The result of the factor analysis indicates that the Reasoning Ability Test in mathematics is valid.

6.6.3.2 Thurstone’s Centroid Method

Factor analysis was also carried out by Thurstone's centroid method. The RAT was constructed on the basis of five components, and the final form of the test contained 60 items, with 12 items based on each component. A sample of 371 randomly selected students was used to find the correlations between the components. The correlation matrix with the first factor loadings is shown in table 6.9.

Table 6.9
Correlation Matrix and First Factor Loading

Component     A        B        C        D        E        Verification (Total)
A             (0.711)  0.702    0.711    0.554    0.620    3.298
B             0.702    (0.702)  0.648    0.573    0.662    3.287
C             0.711    0.648    (0.711)  0.687    0.710    3.467
D             0.554    0.573    0.687    (0.687)  0.591    3.092
E             0.620    0.662    0.710    0.591    (0.710)  3.293
Σ (column)    3.298    3.287    3.467    3.092    3.293    16.437 = T1
a1 = m × Σ    0.815    0.812    0.856    0.764    0.813    √T1 = 4.054

m = 1/√T1 = 0.247; Σa1 = 4.06

From table 6.9, the first factor loadings were calculated on the basis of the correlation coefficients between the five components. On the basis of the first factor loadings, the residual correlations and the second order factor loadings were calculated. The residual correlation matrix and the second order factor loadings are shown in table 6.10.


Table 6.10
Residual Correlation Matrix and Second Order Factor Loading

(Rows and columns: components A-E, with first factor loadings a1 = 0.815, 0.812, 0.856, 0.764 and 0.813 respectively.)

Component     A              B              C               D              E              Total
A             0.069 (0.047)  0.040          0.013           -0.069         -0.043         -0.012
B             0.040          0.047 (0.043)  -0.047          -0.047         0.002          -0.009
C             0.013          -0.047         0.047 (-0.022)  0.033          0.014          -0.009
D             -0.069         -0.047         0.033           0.069 (0.103)  -0.030         -0.01
E             -0.043         0.002          0.014           -0.030         0.043 (0.049)  -0.008

Σ0:        -0.012, -0.009, -0.009, -0.01, -0.008; total = -0.048
Σj²:       -0.059, -0.052, 0.013, -0.113, -0.057; total = -0.268
Column D:  0.079, 0.042, -0.053, 0.113, 0.003; total = 0.184
Column C:  0.053, 0.136, 0.053, 0.179, -0.025; total = 0.396
Column E:  0.139, 0.132, 0.081, 0.119, 0.025; total = 0.496
tj²:       0.208, 0.179, 0.128, 0.188, 0.068; total = 0.771 = T2, √T2 = 0.8781
a2:        0.237, 0.204, 0.146, 0.214, 0.077

The second order factor loadings (a2), the factor variances and the communalities were calculated manually for each component; the details are given in table 6.11.


Table 6.11
Centroid-Factor Matrix

Test (Component)   Factor scores          Factor variance        Communality
                   a1        a2           a1²       a2²          h²
A                  0.815     0.237        0.664     0.056        0.720
B                  0.812     0.204        0.659     0.042        0.701
C                  0.856     -0.146       0.733     0.021        0.754
D                  0.764     -0.214       0.584     0.046        0.630
E                  0.813     -0.077       0.661     0.006        0.667
Total                                     3.301     0.171        3.472
                                          (95.07%)  (4.93%)      (100%)

As shown in table 6.11, the second order loadings of components C, D and E are negative owing to the reflection of these components during the analysis. It can be seen that 95.07% of the total factor variance is related to the first factor, which is Reasoning Ability in mathematics. The results of the factor analysis (exploratory and centroid methods) indicate that the RAT measures only Reasoning Ability in mathematics, so the test is highly valid.
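A minimal sketch of the first centroid factor loadings computed from the component correlation matrix of table 6.9 (diagonal entries as used there); the residual matrix in the last line is the starting point for the second factor:

import numpy as np

# Component intercorrelation matrix from table 6.9 (diagonal values shown in parentheses there)
corr = np.array([
    [0.711, 0.702, 0.711, 0.554, 0.620],
    [0.702, 0.702, 0.648, 0.573, 0.662],
    [0.711, 0.648, 0.711, 0.687, 0.710],
    [0.554, 0.573, 0.687, 0.687, 0.591],
    [0.620, 0.662, 0.710, 0.591, 0.710],
])

column_sums = corr.sum(axis=0)        # 3.298, 3.287, 3.467, 3.092, 3.293
t1 = column_sums.sum()                # 16.437
a1 = column_sums / np.sqrt(t1)        # first centroid factor loadings

print(np.round(a1, 3))                # close to the a1 values of table 6.9 (0.815, 0.812, 0.856, 0.764, 0.813)

residual = corr - np.outer(a1, a1)    # residual correlations used for the second factor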

6.6.4 Construct Validity

Construct-related evidence is important for instruments or tests that assess a trait or theory that cannot be measured directly, that is, when the purpose of the instrument is to measure an unobservable trait. Many psychometric theorists regard construct validity as the unifying concept for all types of validity evidence (Cronbach, 1988; Guion, 1980; Messick, 1995; cited in Gregory, 2005).

In the present study, construct validity was examined in the following categories:


(1) Test Homogeneity

The RAT measures Reasoning Ability in mathematics (a single construct), so the test items and components must be homogeneous. During test development, all the items selected for the test correlated with the whole test, with correlations significant at the 0.01 level; the factor analysis indicated that the test measures only Reasoning Ability; and the internal consistency reliability of the test was high. Hence the items of the RAT are homogeneous.

(2) Developmental Change

Many constructs can be assumed to show regular age-graded changes from early childhood to mature adulthood and perhaps beyond (Gregory, 2005). Piaget's theory of development (the four stages of development) indicates that the level of reasoning increases with age and experience. In the present study, the Reasoning Ability of 5th to 7th grade students was measured; the Reasoning Ability of the 5th grade students was found to be lower than that of the 6th grade students, and that of the 6th grade students lower than that of the 7th grade students. These differences indicate that Reasoning Ability increases with age, experience and knowledge of mathematics, as shown in graph 6.2.


Graph 6.2
Comparison of Frequency Curves (frequency on the Y-axis against score-interval midpoints on the X-axis for the 5th, 6th and 7th grade students; scale: 0.9 cm = 5 units on the X-axis and 0.7 cm = 20 units on the Y-axis)

It is quite obvious from the graphical representation that the frequency curve of the 7th grade students is shifted noticeably towards the right, while the frequency curve of the 5th grade students is shifted noticeably towards the left. The curve of the 6th grade students lies between the curves of the 5th and 7th grade students.

(3) Correlation of the RAT Scores with Other Test Scores

The correlation coefficients between the scores on the RAT and the scores on the Verbal Reasoning Ability Test, the Numerical Ability Test, the IQ test and the T-scores of mathematics achievement in the preliminary examination were found. All the correlation coefficients were high and significant at the 0.01 level, and the standard errors of the correlation coefficients were very low. These results indicate that the scores on the RAT are correlated with the scores on other similar tests.


(4) Factor analysis

Factor analysis was performed on the test scores; the results obtained by exploratory factor analysis and by Thurstone's centroid method indicated that there is a single factor (Reasoning Ability) underlying the RAT.

6.7 Conclusion

In this chapter, the reliability and validity of the present test have been discussed. The results show satisfactory values of reliability and validity, and the final form of the test was therefore ready for the final data collection. The next chapter presents the analysis of the data.