    1. Validity

    2. Reliability

    3. Objectivity

    4. Standard conditions of administration and scoring

    5. Fairness

    6. Standards for interpretation

    1. Test validity:

    Test validity is the accuracy with which a test measures the trait it claims to measure.

    Validity is specific: a test may be valid for one purpose and not for others.

    There are three kinds of validity:

    - content validity

    - criterion-related validity

    - construct validity

    Content Validity

    A test has content validity if it adequately samples the behavior that has been the goal of instruction.

    Content validity is established when subject-matter experts and experienced teachers agree on the content covered.

    The content covered on the test must be consistent with its domain of objectives.

    A table of specifications is needed to provide excellent content evidence.

    Criterion-related Validity

    A test is said to have criterion-related validity if its results parallel some other external criterion.

    Criterion-related evidence is gathered by looking at relationships between scores on the test and other sets of scores.

    - Predictive evidence of a test's validity measures its ability to predict future behaviour.

    - Concurrent evidence of validity shows how closely the test agrees with another measure of the same domain taken at about the same time.

    - Predictive and concurrent evidence are sometimes also referred to as convergent validity.

    - Discriminant evidence exists when test scores show a lack of relationship with variables they should not be related to.
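    Criterion-related evidence is usually summarized as a correlation between test scores and the criterion scores. The sketch below computes a Pearson correlation for predictive evidence; the data, the variable names and the helper `pearson_r` are hypothetical and serve only as an illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: entrance-test scores and later course grades for six students.
test_scores = [52, 61, 70, 74, 83, 90]
later_grades = [55, 58, 72, 70, 80, 88]

# A coefficient near 1 would count as strong predictive evidence.
validity_coefficient = pearson_r(test_scores, later_grades)
print(round(validity_coefficient, 2))
```

    The same calculation applies to concurrent evidence (correlate with another test taken at about the same time) and to discriminant evidence, where a coefficient near zero is the expected result.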

    Construct-related Validity

    Construct validity refers to the extent to which the test measures the "right" psychological constructs.

    Intelligence, self-esteem and creativity are examples of such psychological traits.

    Construct validity is determined by demonstrating that the items within a measure are inter-related and therefore measure a single construct.

    Inter-item correlation and factor analysis are often used to demonstrate relationships among the items.

    Another approach is to demonstrate that the test behaves as one would expect a measure of the construct to behave.

    Factors Influencing Validity

    1. Unclear directions.

    2. Reading vocabulary and sentence structure too difficult.

    3. Ambiguous statements.

    4. Inadequate time limits.

    5. Inappropriate level of difficulty of the test items.

    6. Poorly constructed test items that unintentionally provide clues to the answer.

    7. Test items inappropriate for the outcomes being measured.

    8. Test too short.

    9. Improper arrangement of items.

    10. Identifiable pattern of answers.

    2. Test Reliability

    Reliability is defined as:

    - the extent to which the measurements resulting from a test are the result of characteristics of those being measured.

    - the degree to which test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and repeatable for an individual test taker (Berkowitz, Wolkowitz, Fitch, and Kopriva, 2000).

    - an indicator of the absence of random error when the test is administered.

    Thus, reliability is a joint characteristic of a test and an examinee group, not just a characteristic of a test.

    The reliability of any one test varies from group to group.

    Measures of Reliability

    Test-retest Reliability

    - The reliability coefficient is obtained by administering the same test twice and correlating the scores.

    - It is an excellent measure of score consistency because it directly measures consistency from administration to administration.

    Problems and limitations:

    - It requires two administrations of the same test with the same group of individuals.

    - It is expensive and not a good use of people's time.

    - If the time interval is short, people may still remember some of the questions and their responses.

    - If the time interval is long, results are confounded with learning and maturation.

    Split-half Reliability

    The coefficient is obtained by dividing a test into halves, correlating the scores on each half, and then correcting for length (longer tests tend to be more reliable).

    The split can be based on odd versus even numbered items, randomly selected items, or manually balanced content and difficulty.

    Advantage: it requires only a single test administration.

    Weaknesses:

    - The resulting coefficient varies as a function of how the test was split.

    - It is not appropriate for tests where speed is a factor.
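    The split-half procedure can be sketched as follows, assuming an odd-even split and the Spearman-Brown formula as the correction for length. The response data and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_scores holds one list per examinee, with one score per item.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Each half is only half as long as the full test, so step the
    # half-test correlation up to an estimate for the full length.
    return 2 * r_half / (1 + r_half)

# Hypothetical right/wrong (1/0) scores: five examinees, six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
]

reliability = split_half_reliability(responses)
print(round(reliability, 2))
```

    Changing the slices to a different split (for example first half versus second half) would generally give a different coefficient, which is the weakness noted above.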

    Internal Consistency

    Internal consistency focuses on the degree to which the individual items are correlated with each other and is thus often called homogeneity.

    The coefficient is determined by:

    - Cronbach's alpha

    - Kuder-Richardson Formula 20 (KR-20)

    - Kuder-Richardson Formula 21 (KR-21)

    Advantages: these methods require only one test administration and do not depend on a particular split of items.

    Disadvantage: they are most applicable when the test measures a single skill area.
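    As a minimal sketch, Cronbach's alpha compares the sum of the item variances with the variance of the total scores, and KR-20 is the same formula for right/wrong items, with p(1 - p) as each item's variance. The data and function names below are hypothetical, and population variances (`statistics.pvariance`) are assumed throughout so that the two coefficients agree.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per examinee."""
    k = len(item_scores[0])                       # number of items
    item_columns = list(zip(*item_scores))        # transpose: one tuple per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)

def kr20(item_scores):
    """KR-20 for items scored 1 (right) or 0 (wrong)."""
    k = len(item_scores[0])
    n = len(item_scores)
    sum_pq = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n                          # proportion answering correctly
        sum_pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_pq / total_var)

# Hypothetical right/wrong scores: five examinees, six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2), round(kr20(responses), 2))
```

    Unlike the split-half coefficient, neither value depends on how the items are divided.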

    Alternate-form Reliability

    Most standardized tests provide equivalent forms that can be used interchangeably.

    These alternative forms are typically matched in terms of content and difficulty.

    Scores on pairs of alternative forms for the same examinees are correlated to provide a measure of consistency or reliability.

    How High Should Reliability Be?

    For tests used for special education placement, high school graduation and certification, the internal consistency reliability needs to be quite high: at least above .90, preferably above .95.

    For a classroom test, a reliability coefficient of .50 or .60 may suffice.

    Improving Test Reliability

    Measurement error is reduced by:

    - writing items clearly

    - making the instructions easily understood

    - adhering to proper test administration

    - scoring consistently

    - using longer tests
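    The effect of test length can be illustrated with the Spearman-Brown prophecy formula, which is one common way to project reliability when a test is lengthened with items comparable to the existing ones. The function name and the example values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is made length_factor times as long,
    assuming the added items are comparable to the existing ones."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Doubling a classroom test whose reliability is .60.
print(round(spearman_brown(0.60, 2), 2))  # prints 0.75
```

    The projection only holds if the new items are of similar quality; adding poorly written items would not be expected to give the same gain.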

    3. Objectivity

    Objectivity refers to the accuracy of the examiner in marking the candidates' answers.

    A test is said to have the characteristic of objectivity if the examiner gives the same marks to the same answers according to the marking scheme determined earlier.

    Objectivity can be improved by using an analytic marking scheme.

    Another way is to hold a coordination meeting in which all examiners mark a few samples of the candidates' answer scripts.
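    One simple way to check the outcome of such a coordination meeting is to count how often two examiners award identical marks to the same sample scripts. This is only an illustrative sketch: the marks and the function `exact_agreement` are hypothetical, and exact agreement is just one of several possible consistency checks.

```python
def exact_agreement(marks_a, marks_b):
    """Proportion of scripts on which two examiners awarded identical marks."""
    matches = sum(1 for a, b in zip(marks_a, marks_b) if a == b)
    return matches / len(marks_a)

# Hypothetical marks from two examiners on the same eight sample scripts.
examiner_1 = [7, 5, 9, 6, 8, 4, 7, 10]
examiner_2 = [7, 5, 8, 6, 8, 4, 6, 10]

agreement = exact_agreement(examiner_1, examiner_2)
print(agreement)  # 6 of 8 scripts match: prints 0.75
```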

    4. Standard Conditions of Administration and Scoring

    This refers to the standard procedures for the implementation of the testing process.

    Was the test handled efficiently without confusion or disturbance that could interfere with effective performance?

    Were all examinees on an equal footing as far as prior knowledge of the nature of the examination?

    Did they have enough prior knowledge to be able to prepare properly for it?

    Was cheating prevented?

    Were physical conditions of light, heat, and freedom of movement satisfactory?

    5. Fairness

    A test is fair to students if it emphasizes the knowledge, understanding, and abilities that were emphasized in the actual teaching of the course.

    The emphasis on various aspects of the course previously conveyed to students, for example through time allocation, reading outlines, and lists of course objectives, should be adhered to.

    6. Standards for Interpretation

    This refers to how readily the test results can be interpreted.

    Evaluation based on the interpretation should give a correct picture of the candidates' strengths and weaknesses, the suitability of the teaching-learning strategy, and so on.

    This is achievable if the data from a test result can be easily collected, converted into statistical figures, and expressed in the form of tables and graphs.

