characteristics of good test validity: it refers to the appropriateness or truthfulness of a tool. a...

Characteristics of Good Test

• Validity: It refers to the appropriateness or truthfulness of a tool. A tool is valid if it measures what it is supposed to measure.

• Reliability: It refers to the trustworthiness or consistency of measurement of a tool, whatever it measures.

• Objectivity: It refers to the absence of subjective bias in the interpretation of responses obtained by a tool.

• Economy: The test should be simple and administered in a short time, saving money and time.

• Practicability or Feasibility: The test should not require special infrastructure such as a dark room or a one-way see-through room.

Decision to gather evidence

Decision to allocate resources

Content analysis and test blue print

Item writing

Item review 1

Planning item scoring

Production of trial tests

Trials

Item review 2

Amendment (revise/replace/discard)

More items needed?

No

Assembly of final tests

Trial Test

• It involves time and resources

• Prepare the content analysis and blue print

• Review each item before trial testing

Content Analysis

• Which area of the curriculum is selected?

• Are there significant sections in the content?

• Are there significant subdivisions in the content?

• Which of the representative areas should be included?

Blue Print

• Title

• Fundamental purpose

• The aspects of the curriculum covered

• For whom the test is constructed

• Time, date, who will administer and who will score

• Weightage for recall, comprehension and reflective thinking

Blue Print

Content      Recall    Comprehension  Critical thinking  Total
PROSE        2 items   2 items        5 items            9
POETRY       2 items   4 items        5 items            11
GRAMMAR      2 items   4 items        12 items           18
CRITICISM    4 items   2 items        ---                6
COMPARISONS  4 items   2 items        ---                6
TOTAL        14 items  14 items       22 items           50

Item Specification

Content      Recall                Comprehension         Critical thinking                                     Total
PROSE        Items 2, 5            Items 12, 23          Items 28, 31, 32, 40, 50                              9
POETRY       Items 6, 10           Items 13, 14, 16, 17  Items 33, 36, 37, 38, 39                              11
GRAMMAR      Items 1, 7            Items 18, 19, 20, 21  Items 21, 29, 30, 41, 42, 43, 44, 45, 46, 47, 48, 49  18
CRITICISM    Items 3, 4, 8, 9      Items 34, 35          ---                                                   6
COMPARISONS  Items 11, 15, 22, 25  Items 26, 27          ---                                                   6
TOTAL        14 items              14 items              22 items                                              50

Scoring Key

Item  1  2  3  4  5  6  7  8  9  10
Key   2  5  1  2  3  4  1  4  5  3

Item Review 1

• Dependable inferences can be made about the choice of the content

• All important parts of the curriculum are addressed

• Achievement over the whole range is assessed

How to review?

1. Is the item clear in expression?

2. Are the items expressed in the simplest possible language?

3. Are there unintended clues to the correct answer?

4. Is the format reasonably consistent?

5. Is there a single, clearly correct answer for each item?

6. Is the type of item appropriate to the information required?

7. Are there enough items to provide adequate coverage of the behaviour to be assessed?

Purpose of Trial Test

• Establish the difficulty of each item

• Identify the distracters which do not appear plausible

• Suggest the number of items to be included in the final test

• Establish the contribution of each item to the discrimination between low-achieving and high-achieving candidates

• Check the adequacy of the administration instructions

• Identify misconceptions held by the students through analysis of their responses

Choosing a Sample

• A sample of 100 to 150 students of varied abilities may be selected

• The numbers of male and female students should be approximately equal

• Judgment sampling technique, drawing on the target group

Try out of the Test

• The test is to be administered to a representative sample, chosen from the target population for whom the test is intended, and scored. This pilot study will be useful for the following:

• To identify weak or defective items and to reveal needed improvements.

• To determine the difficulty level and discriminating power of each individual item so that a selection of items may be made.

• To provide the data needed to determine an appropriate time limit for the final test.

• To standardize the instructions and procedures.

• To know how to organize the items.

• To decide the proper format.

Scoring of Trial Test

• Scorers need training

• Scoring should not depend on the scorer's judgment

• Refer to the scoring key

• Mechanical scoring is recommended to maintain accuracy

Scores in the Matrix

Item    GEET  RAI  RAJU  RANI  SURI  POO  RITA  JOE  CATH  RUTH  Total
1       1     1    1     1     1     0    1     0    0     1     7
2       1     0    0     1     0     1    0     0    0     0     3
3       1     1    1     1     1     1    1     1    0     0     8
4       1     1    1     0     1     0    1     1    1     0     7
5       1     1    1     1     1     1    1     1    1     1     10
6       1     1    0     0     1     1    0     0    1     0     5
7       1     1    1     1     0     1    0     1    0     0     6
8       1     0    1     0     0     0    0     1    0     0     3
9       1     0    0     0     0     1    0     0    0     0     2
10      1     1    1     1     1     0    1     0    0     0     6
Total   10    7    7     6     6     6    5     5    3     2     57
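The row and column totals of the matrix can be recomputed mechanically, which is one way to follow the advice on mechanical scoring. The following is a minimal sketch, assuming the responses are held in a plain dictionary; the names `PUPILS`, `RESPONSES`, `item_totals` and `pupil_totals` are illustrative and not part of the original slides.

```python
# Trial-test responses copied from the matrix above: 1 = correct, 0 = incorrect.
# Each list follows the pupil order given in PUPILS.
PUPILS = ["GEET", "RAI", "RAJU", "RANI", "SURI", "POO", "RITA", "JOE", "CATH", "RUTH"]
RESPONSES = {
    1:  [1, 1, 1, 1, 1, 0, 1, 0, 0, 1],
    2:  [1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    3:  [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    4:  [1, 1, 1, 0, 1, 0, 1, 1, 1, 0],
    5:  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    6:  [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
    7:  [1, 1, 1, 1, 0, 1, 0, 1, 0, 0],
    8:  [1, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    9:  [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    10: [1, 1, 1, 1, 1, 0, 1, 0, 0, 0],
}


def item_totals(responses):
    """Number of pupils who answered each item correctly (row totals)."""
    return {item: sum(row) for item, row in responses.items()}


def pupil_totals(pupils, responses):
    """Total score of each pupil across all items (column totals)."""
    return {
        name: sum(row[col] for row in responses.values())
        for col, name in enumerate(pupils)
    }


print(item_totals(RESPONSES))
print(pupil_totals(PUPILS, RESPONSES))
print(sum(item_totals(RESPONSES).values()))  # grand total of correct responses
```

The grand total should agree whether it is summed over items or over pupils, which gives a quick check against transcription errors.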

Arranging Pupils

• After the trial test is scored, individuals are placed in order of total score, from high to low.

Arranging Pupils' Scores

Item    GEET  RAI  RAJU  RANI  SURI  POO  RITA  JOE  CATH  RUTH  Total
5       1     1    1     1     1     1    1     1    1     1     10
3       1     1    1     1     1     1    1     1    0     0     8
1       1     1    1     1     1     0    1     0    0     1     7
4       1     1    1     0     1     0    1     1    1     0     7
7       1     1    1     1     0     1    0     1    0     0     6
10      1     1    1     1     1     0    1     0    0     0     6
6       1     1    0     0     1     1    0     0    1     0     5
2       1     0    0     1     0     1    0     0    0     0     3
8       1     0    1     0     0     0    0     1    0     0     3
9       1     0    0     0     0     1    0     0    0     0     2
Total   10    7    7     6     6     6    5     5    3     2     57
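The arrangement above can be reproduced by sorting on the totals. This is a small sketch, assuming the totals from the trial matrix are available as dictionaries; `rank_by_total` is a hypothetical helper name, and keeping tied entries in their original order is an assumption that happens to match the table.

```python
# Totals taken from the trial-test matrix.
pupil_totals = {"GEET": 10, "RAI": 7, "RAJU": 7, "RANI": 6, "SURI": 6,
                "POO": 6, "RITA": 5, "JOE": 5, "CATH": 3, "RUTH": 2}
item_totals = {1: 7, 2: 3, 3: 8, 4: 7, 5: 10, 6: 5, 7: 6, 8: 3, 9: 2, 10: 6}


def rank_by_total(totals):
    """Return the keys ordered from highest to lowest total.

    Python's sort is stable, so entries with equal totals keep
    the order in which they were listed.
    """
    return sorted(totals, key=totals.get, reverse=True)


print(rank_by_total(pupil_totals))  # pupils from high to low scorers
print(rank_by_total(item_totals))   # items from easiest to hardest
```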

Indices of difficulty and discriminating power of items

• Top 27% constitutes the high achievers and the bottom 27% constitutes the low achieving group.

• The indices of discriminating power and difficulty level are computed for each item of the test using the following formulae.
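Forming the two groups can be sketched as follows. The slides do not say how to round 27% of the sample to a whole number of pupils, so rounding to the nearest integer is an assumption here, and `split_high_low` is a hypothetical helper name.

```python
def split_high_low(ranked_pupils, fraction=0.27):
    """Split a list ranked from high to low into top and bottom groups.

    The group size is the given fraction of the sample, rounded to the
    nearest whole pupil (an assumption; the slides do not state a rule).
    """
    size = max(1, round(len(ranked_pupils) * fraction))
    return ranked_pupils[:size], ranked_pupils[-size:]


# Pupils of the trial test, already arranged from highest to lowest total.
ranked = ["GEET", "RAI", "RAJU", "RANI", "SURI", "POO", "RITA", "JOE", "CATH", "RUTH"]
high_group, low_group = split_high_low(ranked)
print(high_group)
print(low_group)
```

With only ten pupils each group has three members; with the 100 to 150 pupils recommended for a trial, the groups would hold roughly 27 to 40 pupils each.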

Analysis of an Item

1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
0  1  0  1  1  1  0  1  1  1   1   1   1   1   1   1   1   1

• Discriminating power $= \dfrac{P_h - P_l}{U}$

• Difficulty level $= \dfrac{P_h + P_l}{U}$

• $P_h$ = the number of pupils in the high-achieving group who answered the item correctly.

• $P_l$ = the number of pupils in the low-achieving group who answered the item correctly.

• $U$ = the total number of pupils in both groups.
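The two formulae can be applied directly once the groups are known. The sketch below reads Ph and Pl as counts of correct answers and U as the combined size of both groups, following the formulae as given here; note that some textbooks divide the difference by the size of one group instead, which doubles the discrimination value. The function name `item_indices` is illustrative. The figures are for item 1 of the trial matrix, assuming the three highest scorers (GEET, RAI, RAJU) and the three lowest (JOE, CATH, RUTH) form the two groups.

```python
def item_indices(high_correct, low_correct, combined_group_size):
    """Return (discriminating power, difficulty level) for one item.

    high_correct / low_correct: pupils in the high / low group who got
    the item right. combined_group_size: pupils in both groups together.
    """
    discriminating_power = (high_correct - low_correct) / combined_group_size
    difficulty_level = (high_correct + low_correct) / combined_group_size
    return discriminating_power, difficulty_level


# Item 1: all three high scorers answered correctly; of the three low
# scorers only RUTH did.
dp, dl = item_indices(high_correct=3, low_correct=1, combined_group_size=6)
print(f"discriminating power = {dp:.2f}, difficulty level = {dl:.2f}")
```

A positive discriminating power means the high group did better on the item than the low group; a negative value would flag the item for review.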

Types of Discriminators

• Positive Discriminator

• Negative Discriminator

• Non Discriminator

Graphical Analysis of Scores

• Acceptable & may be acceptable correct answer response pattern.

• Non acceptable correct answer response pattern.

Criteria for Selection

Discriminating power  Rating                Difficulty level   Rating
.4 and above          Excellent item        Between .4 and .6  Average difficulty
Between .3 and .4     Good                  Between .2 and .4  Difficult item
Between .2 and .3     Average item          Between .6 and .8  Easy item
Between .1 and .2     Requires improvement  Between .8 and 1   Very easy item
Less than .1          Item to be dropped    Between 0 and .2   Very difficult item
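The criteria can be expressed as two small lookup functions. The table does not say which band a value falling exactly on a boundary (for example .4) belongs to, so assigning boundary values to the upper band is an assumption in this sketch; the function names are illustrative.

```python
def rate_discrimination(value):
    """Rating of an item by its discriminating power."""
    if value >= 0.4:
        return "Excellent item"
    if value >= 0.3:
        return "Good"
    if value >= 0.2:
        return "Average item"
    if value >= 0.1:
        return "Requires improvement"
    return "Item to be dropped"


def rate_difficulty(value):
    """Rating of an item by its difficulty level (share answering correctly)."""
    if value < 0.2:
        return "Very difficult item"
    if value < 0.4:
        return "Difficult item"
    if value < 0.6:
        return "Average difficulty"
    if value < 0.8:
        return "Easy item"
    return "Very easy item"


print(rate_discrimination(0.45), "/", rate_difficulty(0.5))
print(rate_discrimination(0.05), "/", rate_difficulty(0.9))
```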

Is this a good item?

• Compute the difficulty and discrimination indices for an item administered to 263 pupils, where 74 pupils answered the item correctly, and 32 pupils in the upper group and 23 pupils in the lower group passed the item.

• Is this a good item?

Is this a good item?

• Compute the difficulty and discrimination indices of a test item administered to 84 pupils, if 52 test takers answered the item correctly, 20 in the upper group and 12 in the lower group.

• Is this a good item?

Selection of Items

• Based on the calculated values of item discrimination and difficulty, appropriate items are chosen for the final form of the standardized test.

• Arrange the items in increasing order of difficulty.

Assembly of the test in the final form

• Items are first chosen on the basis of discriminating power; from the items so chosen, those with a proper difficulty level are finally selected for the final form.

• Care should be taken to see that at least 50% of the items are of average difficulty, 25% are easy, 20% are difficult and 5% are very difficult.

• A detailed scoring scheme is also to be prepared so as to ensure objective evaluation of pupils’ responses.

• Appropriate instructions and procedures for administering the test also have to be developed and incorporated suitably in the test.

Advantages of Item Analysis

• Powerful technique to improve instruction.

• Helpful for guidance.

• Valid measures of instructional objectives.

• Gives clues to the nature of the misunderstanding and suggests remediation.

Reliability

• The stability and trustworthiness of a measurement is called reliability.

• It should be free from error.

• Example: the Stanford-Binet I.Q. The score is a good estimate of the child’s mental ability.

Methods of determining Reliability

• There are four procedures for computing the reliability coefficient:

• Test-retest method

• Alternative or parallel form

• Split-half technique

• Rational equivalence

Test Retest Method

• Repetition of the test is the simplest method of determining the agreement between two sets of scores.

• The test is given and repeated on the same group and the correlation computed between the first and second set of scores.
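The correlation between the two administrations can be computed with the Pearson product-moment formula. The sketch below is a minimal implementation; the first set of scores is the pupils' totals from the trial test, while the retest scores are invented purely for illustration and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists.

    Raises ZeroDivisionError if either list has no variation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


first_scores = [10, 7, 7, 6, 6, 6, 5, 5, 3, 2]   # totals from the trial test
second_scores = [9, 8, 7, 7, 5, 6, 6, 4, 4, 2]   # hypothetical retest scores
print(f"test-retest reliability = {pearson_r(first_scores, second_scores):.2f}")
```

A coefficient close to 1 indicates that pupils keep nearly the same relative positions on both occasions.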

Defects in Test Retest method

• If the test is repeated immediately, many subjects will recall their first answer- tend to increase their scores.

• Practice and confidence induced by familiarity also affect scores.

• If the interval is longer (six months, say), growth changes will affect the retest.

• Because of these defects, the test-retest method is generally less useful than the other methods.

Alternative or Parallel form method

• When alternative or parallel forms of a test can be constructed, the correlation between form A and form B may be taken as a measure of the self-correlation of the test.

• The alternative form method is satisfactory when sufficient time has intervened between the administration of the two forms to weaken or eliminate memory and practice effects.

• When form B of a test follows form A closely, scores on the second form of the test will often be increased because of familiarity.

• If such increases are approximately constant (3 to 5 points), the reliability coefficient of the test will not be affected, since the paired A and B scores maintain the same relative positions in the two distributions.

• In drawing up alternative test forms, care must be exercised to match test materials for content, difficulty and form.

• When alternative forms are virtually identical, reliability will be too high; when they are not sufficiently alike, reliability will be too low.

• An interval of at least two to four weeks should be allowed between administrations of the test.
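The point about constant increases can be checked against the Pearson correlation formula. In this sketch, $A_i$ and $B_i$ denote a pupil's scores on the two forms and $c$ is the constant gain on form B; these symbols are introduced here for illustration. Adding $c$ to every B score raises the mean of B by the same $c$, so every deviation from the mean, and therefore the correlation, is unchanged:

```latex
r_{A,\,B+c}
  = \frac{\sum_i (A_i-\bar{A})\,\bigl[(B_i+c)-(\bar{B}+c)\bigr]}
         {\sqrt{\sum_i (A_i-\bar{A})^2}\;\sqrt{\sum_i \bigl[(B_i+c)-(\bar{B}+c)\bigr]^2}}
  = \frac{\sum_i (A_i-\bar{A})(B_i-\bar{B})}
         {\sqrt{\sum_i (A_i-\bar{A})^2}\;\sqrt{\sum_i (B_i-\bar{B})^2}}
  = r_{A,B}
```

The same argument does not hold when the gain differs from pupil to pupil, which is why only approximately constant increases leave the coefficient unaffected.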

The split half method

• In this method the test is first divided into two equivalent halves and the correlation is found between these half-tests.

• From the reliability of the half-test, the self-correlation of the whole test is then estimated by the Spearman-Brown prophecy formula.

• The split-half method is regarded by many as the best of the methods for measuring test reliability.

• Advantage: all the data for computing reliability are obtained on one occasion, so variations brought about by differences between two testing situations are eliminated.

• How to divide ?

• Alternative Statements

• All the items are of equal difficulty
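The Spearman-Brown step can be sketched as below. For a test made n times as long, the formula is n·r / (1 + (n − 1)·r); with n = 2 it steps a half-test correlation up to the full-length test. Splitting by odd-numbered and even-numbered items is one common way to form the halves and is an assumption here, as are the helper names and the half-test correlation of 0.6 used in the example.

```python
def odd_even_halves(item_scores):
    """Return a pupil's score on the odd-numbered and even-numbered items.

    item_scores lists the pupil's 0/1 scores in item order (item 1 first).
    """
    return sum(item_scores[0::2]), sum(item_scores[1::2])


def spearman_brown(half_test_r, length_factor=2):
    """Estimate the reliability of a test length_factor times as long."""
    return length_factor * half_test_r / (1 + (length_factor - 1) * half_test_r)


# RAI's responses to items 1-10 in the trial test.
rai = [1, 0, 1, 1, 1, 1, 1, 0, 0, 1]
print(odd_even_halves(rai))

# A hypothetical correlation of 0.6 between the two half-tests.
print(f"estimated full-test reliability = {spearman_brown(0.6):.2f}")
```

In practice the half scores of every pupil are computed first, the two columns of half scores are correlated, and that correlation is then passed to `spearman_brown`.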

Method of Rational Equivalence

• This method represents an attempt to get an estimate of the reliability of a test free from the objections raised against the methods outlined above.

• Two forms of a test are equivalent when corresponding items (a and A, b and B, c and C, etc.) are interchangeable and when the inter-item correlations are the same for both forms.
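The slides do not give a formula for this method. It is commonly associated with the Kuder-Richardson formulas, and the sketch below uses formula 20 (KR-20) on that assumption: k/(k − 1) × (1 − Σpq / σ²), where k is the number of items, p the share of pupils passing an item, q = 1 − p, and σ² the variance of the pupils' total scores. Using the population variance (dividing by the number of pupils) is a further assumption. The data are the totals from the trial matrix.

```python
def kr20(item_totals, pupil_totals):
    """Kuder-Richardson formula 20 for dichotomously scored (0/1) items.

    item_totals: number of pupils answering each item correctly.
    pupil_totals: total score of each pupil.
    """
    k = len(item_totals)
    n = len(pupil_totals)
    mean = sum(pupil_totals) / n
    variance = sum((t - mean) ** 2 for t in pupil_totals) / n  # population variance
    sum_pq = sum((c / n) * (1 - c / n) for c in item_totals)
    return (k / (k - 1)) * (1 - sum_pq / variance)


item_totals = [7, 3, 8, 7, 10, 5, 6, 3, 2, 6]    # items 1-10 of the trial test
pupil_totals = [10, 7, 7, 6, 6, 6, 5, 5, 3, 2]   # GEET ... RUTH
print(f"KR-20 reliability = {kr20(item_totals, pupil_totals):.2f}")
```

Like the split-half method, this needs only one administration of the test, but it avoids having to choose a particular way of dividing the items.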

Errors

• Chance Errors:

• Many psychological factors affect the reliability coefficient of a test: fluctuations in interest and attention, shifts in emotional attitude, and differential effects of memory and practice.

• Environmental factors such as distractions, noise and interruptions, as well as scoring errors, also play a part. All of these are called ‘chance errors’ or ‘errors of measurement’.

• The scores may go up or down from the true value.

• Constant Errors:

• Constant errors work in only one direction. They raise or lower all of the scores on a test but do not affect the reliability coefficient.

• Such errors are more easily avoided than chance errors, for example by subtracting two points from a retest score to allow for practice.

Validity

• The validity of a test, or of any measuring instrument, depends upon the fidelity with which it measures what it purports to measure.

• A test is valid when the performances which it measures correspond to the same performances as otherwise independently measured or objectively defined.

Difference between Reliability and Validity

• Suppose that a clock is set forward 20 minutes. If the clock is a good timepiece, the time it tells will be reliable (consistent) but will not be valid as judged by ‘standard time’.

• Validity is a relative term.

• A test is valid for a particular purpose or in a particular situation; it is not valid in general.

• Content Validity:

• This requires content analysis. Validity is inferred by subject experts, who go through the test items and give their opinion on the extent to which the items form a fair, representative sample of the universe of items that could be formed from the content areas being tested.

• Construct Validity:

• This is the functional aspect of content validity.

• Suppose the test is to measure the creative writing of students; then the items should cover creative expression only.

• A well-known test of creative expression and the newly constructed creative expression test are both administered to a group of the students for whom the new test is meant.

• The coefficient of correlation computed between the scores on the two tests is an index of the validity of the newly constructed test.

• Predictive Validity:

• It is concerned with the relation of test scores to some measure of future performance.

• If scores on a spelling test help us to differentiate between pupils who will succeed and pupils who will fail in a stenography course, then we can infer that the spelling test has predictive validity as far as stenography is concerned.

• This type of validity is mainly useful in evaluating aptitude tests.

Relations of Validity and Reliability

• They refer to different aspects of test efficiency.

• A reliable test is theoretically valid, but may be practically invalid as judged by its correlations with various independent criteria.

• A highly valid test cannot be unreliable since its correlation with a criterion is limited by its own index of reliability.
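The limit mentioned in the last point can be written out. In classical test theory the index of reliability is the square root of the reliability coefficient, and a test's correlation with any criterion cannot exceed it. In this sketch $r_{xx}$ stands for the reliability coefficient of the test and $r_{xy}$ for its correlation with a criterion $y$; both symbols are introduced here for illustration.

```latex
r_{xy} \;\le\; \sqrt{r_{xx}}
\qquad\text{for example, } r_{xx} = 0.64 \;\Rightarrow\; r_{xy} \le 0.80
```

A test with low reliability therefore cannot show a high validity coefficient, whereas high reliability only permits, and does not guarantee, high validity.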

Want to have a best choice then

ANALYSE AND CHOOSE
