

EXAMINING THE UNTESTABLE ASSUMPTIONS OF THE CHAINED LINEAR LINKING FOR LIVINGSTON SCORE ADJUSTMENT WITH APPLICATION TO

    THE 2005 MSCE MATHEMATICS PAPER 2.

    M.Ed (Testing, Measurement and Evaluation) Thesis

By CHIFUNDO STEVEN AZIZI

    BSc (Ed) Mzuzu University

    Submitted to the Department of Educational Foundations, Faculty of Education,

    in partial fulfilment of the requirements for the degree of

    Master of Education (Testing, Measurement and Evaluation)

University of Malawi, Chancellor College

    June, 2009


    DECLARATION

I, the undersigned, hereby declare that this thesis is my own original work, which has not been submitted to any other institution for similar purposes. Where other people's work has been used, acknowledgements have been made.

    ____________________________________

    Full Legal Name

    _____________________________________

    Signature

    _____________________________________

    Date


    Certificate of Approval

The undersigned certify that this thesis represents the student's own work and effort and has been submitted with our approval.

Signature: ____________________________ Date: __________________________

    M. Kazima PhD (Senior Lecturer)

    Main Supervisor

Signature: ____________________________ Date: __________________________

    L. Kazembe PhD (Senior Lecturer)

    Member, Supervisory Committee


    To the memory of my late father, Charles Frank Azizi and late brother, Charles Mike

    Azizi. May their souls rest in peace!


    ACKNOWLEDGEMENTS

    I would like to thank Dr. M. Kazima and Dr. L. Kazembe, my main supervisor and

    co-supervisor respectively, for their many suggestions and constant support during this

    research. Without them this work would never have come into existence.

    I also wish to thank the headteachers of Blantyre, Henry Henderson Institute,

    Bangwe, Chiradzulu, and Njamba secondary schools for allowing me to collect data from

their institutions. Again, my gratitude goes to the Executive Director of the Malawi National Examinations Board (MANEB) for authorising me to use the 2005 MSCE mathematics examination paper 2. Appreciation should also go to the students who participated in

    this study; you really helped me a lot.

I am grateful to my mum, my fiancée, brothers, and sisters for their love and financial support. Special mention goes to the Ministry of Education for funding my tuition fees. Finally, words alone cannot express my gratitude to the Almighty God who made it

    possible for me to complete this study and for the infinite blessings.


    ABSTRACT

MSCE mathematics paper 2, like many high-stakes test formats, includes a section of optional questions in addition to a mandatory part. It has been argued that offering options and comparing final scores is often not fair to examinees, especially those who attempt the most difficult questions in the optional part. Livingston (1988) proposed a way of adjusting essay scores. This was later explained from the perspective of test equating by Allen, Holland, and Thayer (1993), who concluded that the proposal makes implicit chained linear equating assumptions about the unobserved data. This study tested these assumptions on the 2005 MSCE mathematics examination paper 2 to determine whether the Livingston score adjustment could be used on this examination.

    The study used systematic sampling to obtain examinees from five purposively

    selected secondary schools. The 2005 MSCE mathematics paper 2 was administered to

    247 examinees in two parts, section A followed by section B. For section B, examinees

    were asked to first indicate their choice of three optional questions and were then

    instructed to answer all of the questions.

The results were analysed using the Root Mean Square Difference (RMSD) and the Root Expected Mean Square Difference (REMSD) to quantify the differences between the subgroups' linking functions for unobserved and observed data. It was found that group invariance did not hold across all the subgroups involved. This means that the Livingston score adjustment would not be possible on this examination. It is


recommended that, in order to minimise the score inequity of optional questions, item writers use analytical methods to match the different levels of cognitive demand of topics closely, using the MSCE mathematics performance level descriptors when constructing the optional items.


    TABLE OF CONTENTS

    Page

    DEDICATION. iv

    ACKNOWLEDGEMENTS.. v

    ABSTRACT.. vi

    LIST OF TABLES xiii

    LIST OF FIGURES.. xiv

    LIST OF ACRONYMS AND ABBREVIATIONS.. xv

    CHAPTER

    1 INTRODUCTION 1

    1.1 Background... 1

    1.1.1 Characteristic of the examination investigated 1

    1.1.2 Grade Awarding Process. 2

    1.1.3 Comparability of optional questions raw scores 2

1.1.4 Livingston's raw score adjustment.. 4

    1.2 Statement of the Problem. 6

    1.2.1 Purpose of the Study 7

    1.2.2 Research Questions. 8


    1.2.3 Significance of the study 8

    1.3 Theoretical Framework 9

    1.4 Definition of terms 13

    2 LITERATURE REVIEW. 15

    2.1 Introduction. 15

    2.2 General information on optional questions... 15

    2.3 Advantages of optional questions. 17

    2.4 Problems of optional questions. 18

    2.4.1 The syllabus. 19

    2.4.2 The abilities of candidates 19

    2.5 Relationship between candidates question choice and getting

    high scores.. 21

    2.6 Linking and Equating 22

    2.7 Can we link or equate optional questions?........................................ 25

    2.8 What are the consequences of not linking/equating optional questions

    scores?............................................................................................... 28

    3 METHODOLOGY.. 30

    3.1 Introduction.. 30

    3.2 The Research Questions 30

    3.3 The Design 31

    3.3.1 Description of the Research 31

    3.3.2 Population 31

    3.3.3 Sampling.. 31


    3.3.4 Instruments. 33

    3.3.5 The administration of the instruments and data gathering. 34

    3.4 Data Analysis ........ 34

    3.4.1 Extent of difficulty in optional questions 34

    3.4.2 Correlation of scores on section B and total scores of

    the section A. 35

    3.4.3 Establishing group invariance on linking/ equating functions

    of examinees that chose a concerned optional question and

    for those that selected other questions. 36

    3.5 Ethical Considerations. 39

    3.6 Validity and Reliability 40

    3.7 Delimitations and Limitations of the study. 41

    3.7.1 Delimitations.. 41

    3.7.2 Limitations. 41

    4 RESULTS AND DISCUSSION OF THE FINDINGS. 43

    4.1 Introduction.. 43

    4.2 To what extent do optional questions differ?................................... 43

    4.2.1 Preliminary analysis... 43

    4.2.2 Comparing p-values of section B............................................. 46

    4.3 How are scores on section A and section B with choice

    correlated?........................................................................................ 47


    4.4 Establishing group invariance on linking/ equating functions

    of examinees that chose a concerned optional question and

    for those that selected other questions . 48

4.4.1 Linking functions that largely vary at lower tail of choice question scale... 49

4.4.2 Linking functions that largely vary at upper tail of choice question scale... 51

4.4.3 Linking functions that largely vary at lower and second upper tail of choice question scale... 54

4.4.4 Linking functions that largely vary at both lower and upper tails of choice question scale... 57

    4.4.5 Linking functions that constantly vary across the entire

    score scale. 58

    5 CONCLUSIONS, IMPLICATIONS AND RECOMMENDATION.... 60

    5.1 Introduction.. 60

    5.2 Conclusions... 60

5.2.1 The main findings of the literature review... 60

5.2.2 The main findings of the empirical investigation... 61

5.3 Implications... 63

5.4 Recommendation... 64

    REFERENCES.. 66

    APPENDICES... 74

    A. Pairs of subgroups that chose particular questions and other questions. 75


    B. Pairs of subgroups that chose particular questions and other questions ... 77

C. Section A of 2005 M.S.C.E. Examination paper 2 presented in this study as paper I... 81

D. Section B of 2005 M.S.C.E. Examination paper 2 presented in this study as paper II... 85

E. Answer sheet cover page for paper I... 89

F. Answer sheet cover page for paper II... 90

G. Original form of 2005 M.S.C.E. Examination mathematics paper 2... 91

H. Letter to Executive Director of Malawi National Examinations Board... 97

I. Letter from Executive Director of Malawi National Examinations Board... 98

J. Letter to secondary school headteacher... 99

K. Letter to Shirehighlands Education Division Manageress... 100

L. Letter to South West Education Division Manager... 101

M. My introduction letter from Head of Department to secondary schools headteachers... 102


    LIST OF TABLES

    Table Page

    4.1 Major content areas of section A.. 44

    4.2 Major content areas of section B..45

    4.3 P-values for questions in section A and section B without choice.... 46

    4.4 Pairs of subgroups that chose particular questions and other questions and

    their graphs are illustrated in appendix A.... 51

    4.5 Pairs of subgroups that chose particular questions and other questions and

    their graphs are illustrated in appendix B.... 53


    LIST OF FIGURES

    Figure Page

4.1 Equated scores on section A from optional question 7 that largely vary at lower tail of choice question scale... 49

4.2 Equated scores on section A from optional question 8 that largely vary at higher tail of choice question scale... 50

4.3 Equated scores on section A that largely vary at lower and second upper tail of choice question scale from different optional questions... 54

4.4 Equated scores on section A that largely vary at both lower and upper tails of score scale of optional question 10... 57

4.5 Equated scores on section A that vary constantly across the entire score scale of optional question 7... 58


    LIST OF ACRONYMS AND ABBREVIATIONS

    AP Advanced Placement

    CSE Certificate of Secondary Education

    DTM Difference That Matters

    HHI Henry Henderson Institute

    IRT Item Response Theory

    MANEB Malawi National Examinations Board

    MSCE Malawi School Certificate of Education

    NEAT Non-Equivalent groups Anchor Test

    REMSD Root Expected Mean Square Difference

    RMSD Root Mean Square Difference


    CHAPTER 1

    1.0 INTRODUCTION

    This chapter provides a general overview of the problem under study. It

    considers important concepts that dissect the problem into manageable components.

The first section is the background, followed by the statement of the problem and the theoretical framework; the definition of terms is the last component.

    1.1 Background

The Malawi School Certificate of Education (MSCE) examination is used, among other purposes, for certification, selection for tertiary education, and employment decisions. Several subjects are examined at MSCE, including mathematics, which is rated as one of the most significant subjects for entry into most programmes in Malawian universities. The University of Malawi, in particular, prefers candidates with at least a credit in mathematics, among other subjects, for enrolment in almost every programme that it offers.

    1.1.1 Characteristic of the examination investigated

In the MSCE examination, mathematics has two papers: paper 1 and paper 2. Paper

    1 asks candidates to attempt all 24 questions in 2 hours and, by design, it is easier

    than paper 2, although the two papers carry the same weight: each paper carries 100


    marks. Paper 2 has two sections, A and B (see appendix G). Section A is

    compulsory, where candidates attempt six questions worth 55 marks in total. In

    section B, however, candidates are allowed choice of questions to answer. Out of six

    questions, candidates are asked to answer three questions only, worth 45 marks in

    total. Paper 2 runs for 2 hours.

    1.1.2 Grade Awarding Process

    Mathematics, like all other subjects at MSCE examination, is graded on a nine-

    point scale (Malawi National Examinations Board, 1999).

    1-2, denote pass with distinction;

    3-6, denote pass with credit;

    7-8, denote general pass; and

    9, denotes fail.

The raw score of each candidate is converted into a grade. This is done by an awards committee that uses grade boundaries (cutoff scores) to turn scores into grades (Khembo, 2004). Because mathematics has two papers, each paper is graded separately, and the corresponding cutoff scores at the 2/3, 6/7, and 8/9 boundaries are then summed to determine the final cutoff scores for the subject.
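As a purely illustrative sketch of this summation (in Python, with hypothetical cutoff values rather than MANEB's actual boundaries):

    # Hypothetical paper-level cutoff scores at the 2/3, 6/7 and 8/9 grade
    # boundaries (each paper is out of 100 marks); not MANEB's actual values.
    paper1_cutoffs = {"2/3": 75, "6/7": 50, "8/9": 35}
    paper2_cutoffs = {"2/3": 70, "6/7": 45, "8/9": 30}

    # The subject-level cutoffs are the sums of the corresponding paper cutoffs,
    # expressed on the combined 0-200 mark scale.
    subject_cutoffs = {b: paper1_cutoffs[b] + paper2_cutoffs[b] for b in paper1_cutoffs}
    print(subject_cutoffs)   # {'2/3': 145, '6/7': 95, '8/9': 65}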

1.1.3 Comparability of optional questions' raw scores

Livingston (1988) observed that question developers try their best to make optional questions equally difficult. Angoff (1971), Newton (1977), and Wainer and Thissen (1994), however, argue that it is not easy to produce tests that are similar in difficulty. Though item setters strive to produce questions of equal difficulty, the


questions have their own inherent intricacy that cannot be equalised. These inherent difficulties come from the complexity of the topics from which the questions are formulated. It could be naïve to compare a raw score that an examinee gets from an

    optional question which elicits, for example, the use of Venn diagrams to analyse

    and interpret data to a question which asks an examinee to find the sum of

a geometric progression using a formula. These two questions come from different

    topics which differ in complexity; hence raw scores on these two questions will not

    mean the same thing because the raw scores on the two questions do not indicate

    the same level of knowledge and skill. The scores will not be comparable. To treat

    them as if they are comparable would be misleading for the score users and unfair

    to the examinees.

    Having looked at the complexity of measuring examinees who answer different

    questions, the question would be: should choice questions still be incorporated in our

    examinations? The merits and demerits of optional questions are discussed in

the literature review section. However, Kierkegaard (1986, p.24) argues: "if you allow choice, you will regret it; if you don't allow choice, you will regret it; whether you allow choice or not, you will regret both." This argument highlights that if choice

    were not allowed, the limitations on the domain coverage forced by the small

    number of questions might unfairly affect some candidates. And on the other hand,

    choice would compromise test fairness when it comes to comparison of scores

    because of different levels of knowledge and skills being elicited from examinees

    from each optional question. Nevertheless, one would propose to increase the length

of the test; this is not often practical (Wainer and Thissen, 1994), taking into


consideration the examination time and examinees' fatigue. The onus, therefore, remains

    with the examiners.

In the case of mathematics paper 2, there have not been any intense arguments over the behaviour of optional questions, except Khembo's (2004) sentiments against the policy

    of allowing choice. With little or no study done on optional questions on

    examinations administered by Malawi National Examinations Board (MANEB), the

    policy of allowing choice questions in mathematics paper 2 would continue without

    reforms and innovations to improve fair assessment because most of the stakeholders

    would not know how the choice questions are performing on this paper.

1.1.4 Livingston's raw score adjustment

    Psychometricians, nevertheless, have tried to find a post hoc solution to the

incomparability of optional questions' scores. Livingston (1988) developed a method

    for adjusting scores of optional questions to take away the differential in difficulty of

the questions. The procedure, in brief, is to impute a score for the examinee on

    each optional question which the examinee does not answer, and then averaging the

    scores, observed and imputed, over all optional questions. Allen, Holland and

    Thayer (1993) observe that the methodology makes implicit assumptions when

    imputing scores using chained linear equating. Under this procedure, raw scores on

    optional question i are transformed to the scale of optional question j through

scores on the mandatory section (also known as the common portion) for the examinees that answered question $i$.


    1.2 Statement of the Problem

    Mathematics is one of the papers at MSCE examinations that are not pre-tested

    (Khembo, 2004). Pretesting allows item analysis, which in turn ensures that only

    questions of proven quality are included in the final examination. When examiners

compile the examination paper, they assume that the selected questions have equal inherent difficulty, as evidenced by the equal allocation of marks (each optional question carries 15 marks).

A study by Khembo (2004), which investigated the use of performance level descriptors to ensure consistency and comparability in standard setting, divulged that the item difficulty indices (item p-values) for the 2002 mathematics paper 2 examination varied greatly for questions in section B. For example, questions 10a and 10b had p-values of 0.03 and 0.01, questions 7a and 7b had p-values of 0.52 and 0.15, and questions 12a and 12b had difficulty indices of 0.27 and 0.14. Comparing the p-values of the mentioned questions, one would note that the items were differentially difficult. However, some would argue that the items were attempted by non-equivalent groups conditioned on choice, and that it would not be possible to compare their p-values outright. This argument is valid, but in the mentioned study,

    the researcher employed competent mathematics teachers to establish differential

    difficulty on the optional questions. The rating by the judges using performance

    level descriptors for questions in section B for 2002 and 2003 mathematics papers

    confirmed that some questions required higher order cognitive demands than others

for an examinee to succeed. The judges' ratings complemented what was observed from the p-values.


With observations from the teachers coupled with conspicuously different

    p-values for optional questions, it is clear that the introduction of optional questions

    into this paper brings in unfairness in grading. The basis for comparability of raw

    scores, thus, is considerably weakened since different examinees would answer

    samples of questions that are not comparable in difficulty.

For this reason, there is a need to find a method which would circumvent

    incomparability of measurements. Livingston (1988) proposed a method of adjusting

    raw scores of optional questions to achieve fairness in grading examinees that take

    different questions. In the procedure, Allen et al. (1993) note that there are implicit

    assumptions, which are used in order to adjust the scores. They call them

the Livingston missing-data assumptions.

    The assumptions are based on a key theoretical requirement of test equating

    which emphasises that the resulting equating functions should not depend on the

population on which they are calculated. In other words, the two equating functions

    should be identical regardless of which subpopulation has attempted which question.

    Therefore, before the method is adopted and adapted in our grading system,

    especially in mathematics, there is a need to scrutinise it in detail.

    1.2.1 Purpose of the Study

    General objective

The general objective of the study is to test the assumptions of chained linear equating/linking for the Livingston raw score adjustment method on optional question scores of MSCE mathematics paper 2.


    Specific objectives

    distinguish item difficulty level of optional questions using item difficulty

    indices of raw scores.

compare correlations between total scores on the compulsory section (i.e. section A/common portion) and scores on the optional questions portion.

establish whether the equating/linking functions of examinees that chose a concerned optional question and of those that selected a different choice question are group invariant.

    1.2.2 Research Questions

    1. To what extent do optional questions differ in difficulty?

    2. How are scores on optional questions portion and total scores on the

    common portion correlated?

3. Are the equating/linking functions of examinees that chose a concerned optional question and of those that selected an alternate question group invariant?

    1.2.3 Significance of the study

    Fairness in measurement is of paramount significance. Every examinee ought to

    be measured using the same instrument and the same scale for comparability to be

    meaningful. As already mentioned, mathematics is one of the subjects that are

    treasured at Malawi School Certificate of Education; and as a result a certificate


without a pass in mathematics puts a person at a disadvantage when it comes

    to selection for further studies or even job selection.

    To forestall this measurement quandary, Livingston suggests a method for score

    adjustment of optional questions to a common scale. It would be easy to adjust the

    scores of MSCE mathematics paper 2 using this method. The consequences,

    however, of that action are not known in our context; and therefore it is worth testing

the mentioned fundamental assumptions, as Dorans (2004) and Liu, Cahn, and Dorans (2006) say that subgroup invariance is the most critical requirement and plays a significant role in assessing fairness.

Furthermore, to the knowledge of the researcher, there has been no detailed research that has addressed the consequences of optional questions on the

    examinations administered by Malawi National Examinations Board. This study

    would evaluate the extent of relationship between knowledge and skills measured in

    section A and those measured in section B. It would also explore the pattern of

    choices in section B conditioned to topics in Malawi senior mathematics syllabus.

    1.3 Theoretical Framework

    The process of equating is used to obtain comparable scores when more than one

test form is used in a test administration (Holland, von Davier, Sinharay, and Han,

    2006). Angoff (1971) has defined the equating of tests as a process to convert the

    system of units of one form to the system of units of the other so that the scores

    obtained from one form could be compared directly with the scores obtained from

    the other form.


    The central reason for equating different test forms is to ensure fair decision

    making regarding the test results (Liu and Dorans, 2008). There are three techniques

    and methodologies for making different test forms comparable known as equating

    procedures (Jaeger, 1981; Petersen, Kolen, and Hoover, 1989; Cook and Eignor,

1991), or designs, namely random groups, single group, and common-item non-equivalent groups (also known as the non-equivalent groups anchor test, NEAT, design).

There are three equating methods used in the common-item non-equivalent groups design: Tucker, Levine, and chained linear (von Davier and Kong, 2005). This study focuses on the chained linear method because it uses the common item score(s) as the middle link in a chain of linear linking relationships. Basically, chained linear linking is done by equalising standardised deviation scores (z-scores) on the two test forms via the standardised deviation scores on the common item(s). Before going into the detail of chained linear equating/linking, we first look at the Livingston score adjustment procedure in steps, as presented by Allen, Holland, and Thayer (1993, pp. 17-18), because at the end we would like to connect it with the chained equating/linking functions. Here is some notation for easy grasp of what follows:

$P$ = the entire population of examinees who take section A, which is also known as test $X$;

$P_i$ = the subpopulation of $P$ that answers question $i$ in section B;

$P_j$ = the subpopulation of $P$ that answers question $j$ in section B;

$X$ = score on section A (common portion);

$Y_i$ = score on optional question $i$;

$Y_j$ = score on optional question $j$;

$\mu_{Y_i}$, $\sigma_{Y_i}$, $\rho_{XY_i}$ = mean, standard deviation, and correlation coefficient (with $X$) in $P_i$;

$\mu_{Y_j}$, $\sigma_{Y_j}$, $\rho_{XY_j}$ = mean, standard deviation, and correlation coefficient (with $X$) in $P_j$;

$y_j$ = the score imputed on question $j$ for an examinee not in $P_j$;

$Y_j^*$ = the score that would be imputed if scores on question $j$ were perfectly correlated with the scores on section A.

Step 1: equating $Y_i$ to each of the $Y_j$. For examinees in $P_i$, obtain the converted value of the observed $y_i$ on the scales of the other $Y_j$'s. The converted values are denoted $Y_j^*(y_i)$.

Step 2: obtaining imputed values, $y_{j,\mathrm{imputed}}(y_i)$, for $j \neq i$, for every examinee in $P_i$. These imputed scores are weighted averages of the raw score $y_i$ and its equated score on the $Y_j$ scale, $Y_j^*(y_i)$:

$$y_{j,\mathrm{imputed}}(y_i) = (1 - \rho_{XY_j})\, y_i + \rho_{XY_j}\, Y_j^*(y_i)$$

Step 3: calculate the adjusted score as the simple average of the observed raw score and the imputed scores over all $k$ optional questions:

$$Y_{\mathrm{adj}} = \Big\{ y_i + \sum_{j \neq i} y_{j,\mathrm{imputed}}(y_i) \Big\} \Big/ k$$

Combining steps 2 and 3 to get a simple expression for $Y_{\mathrm{adj}}$, we first denote by $\bar{\rho}$ the average of all the correlations $\rho_{XY_j}$:

$$\bar{\rho} = \sum_j \rho_{XY_j} / k \qquad \text{and} \qquad \bar{Y}(y_i) = \sum_j \rho_{XY_j}\, Y_j^*(y_i) \Big/ \sum_j \rho_{XY_j},$$

where $\bar{Y}(y_i)$ is the weighted average of the converted values; in other words, a transformation of $y_i$ onto an average scale of the $k$ question scores determined by the equating functions, with weights proportional to the correlations $\rho_{XY_j}$. A simple Livingston adjusted score function is then expressed as

$$Y_{\mathrm{adj}} = (1 - \bar{\rho})\, y_i + \bar{\rho}\, \bar{Y}(y_i).$$
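To make the combined expression concrete, here is a minimal sketch in Python; the question numbers, scores, and correlations are entirely hypothetical and stand in for values that would be estimated from the data.

    def livingston_adjusted(y_i, equated, rho, i):
        # y_i     : observed raw score on the chosen optional question i
        # equated : dict {j: Y_j*(y_i)} of chained linear conversions of y_i
        #           onto the scales of the other optional questions (j != i)
        # rho     : dict {j: correlation of question j with the common section X}
        #           for all k optional questions, including question i
        converted = dict(equated)
        converted[i] = y_i                      # the chosen question "converts" to itself
        k = len(rho)
        rho_bar = sum(rho.values()) / k         # average of the correlations
        y_bar = sum(rho[j] * converted[j] for j in rho) / sum(rho.values())
        return (1 - rho_bar) * y_i + rho_bar * y_bar

    # Hypothetical values: an examinee scored 10/15 on question 7; the chained
    # conversions of that score onto questions 8 and 9 are given.
    print(livingston_adjusted(10, equated={8: 11.4, 9: 12.1},
                              rho={7: 0.65, 8: 0.70, 9: 0.60}, i=7))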

Coming back to the chained linear equating/linking functions and connecting them with the Livingston score adjustment, it is discovered that:

In step 1, the linear equation for equating $Y_i$ to the scale of $X$ in $P_i$ is

$$X_i(y_i) = \mu_{X_i} + \frac{\sigma_{X_i}}{\sigma_{Y_i}}\,(y_i - \mu_{Y_i}) \qquad (1)$$

and the linear equation for equating $X$ to the scale of $Y_j$ in $P_j$ is

$$Y_j(x) = \mu_{Y_j} + \frac{\sigma_{Y_j}}{\sigma_{X_j}}\,(x - \mu_{X_j}) \qquad (2)$$

where $\mu_{X_j}$ and $\sigma_{X_j}$ are the mean and standard deviation of $X$ for examinees choosing question $j$. The essence of the word chained in the chained linear equating is the substitution of $x$ in $Y_j(x)$ of equation (2) with $X_i(y_i)$ from equation (1), neglecting the fact that the two equating functions are for different populations (Brennan, 2006). That is,

$$Y_j^*(y_i) = Y_j\big(X_i(y_i)\big) = \mu_{Y_j} + \frac{\sigma_{Y_j}}{\sigma_{X_j}}\left(\mu_{X_i} + \frac{\sigma_{X_i}}{\sigma_{Y_i}}\,(y_i - \mu_{Y_i}) - \mu_{X_j}\right) \qquad (3)$$

Braun and Holland (1982) indicate that for chained equating/linking to produce unbiased results, the two chained equating/linking functions should not depend on which population is used for the equating. Dorans and Holland (2000); von Davier, Holland, and Thayer (2004); Dorans (2004); and Liu, Cahn, and Dorans (2006) call this requirement population invariance. It means that equating $Y_i$ to $X$ on $P_i$ ought to give the same equating function as equating $Y_i$ to $X$ on $P_j$ (Allen et al., 1993). In this case $Y_i$ is missing data on $P_j$, which in this study will be available. The resulting linear equating function of $Y_i$ to $X$ on $P_j$ is

$$X_j(y_i) = \mu_{X_j} + \frac{\sigma_{X_j}}{\sigma_{Y_i(j)}}\,\big(y_i - \mu_{Y_i(j)}\big) \qquad (4)$$

where $\mu_{Y_i(j)}$ and $\sigma_{Y_i(j)}$ denote the mean and standard deviation of $Y_i$ for examinees in $P_j$. The two linear equating/linking functions (1) and (4) must therefore have the same slope and intercept in order to meet the above condition or requirement.
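To illustrate equations (1)–(4) numerically, the following is a minimal sketch in Python; all summary statistics are hypothetical, and the unweighted root mean square difference at the end is only a simplification of the RMSD/REMSD statistics used in the study, which weight by score frequencies.

    import numpy as np

    def linear_link(y, mu_from, sd_from, mu_to, sd_to):
        # z-score (linear) conversion from one scale to another
        return mu_to + (sd_to / sd_from) * (y - mu_from)

    # Hypothetical summary statistics (not the study's data).
    mu_Yi_Pi, sd_Yi_Pi = 8.0, 3.0    # question i scores within P_i
    mu_X_Pi,  sd_X_Pi  = 30.0, 8.0   # section A scores within P_i
    mu_X_Pj,  sd_X_Pj  = 28.0, 9.0   # section A scores within P_j
    mu_Yj_Pj, sd_Yj_Pj = 9.0, 3.5    # question j scores within P_j
    mu_Yi_Pj, sd_Yi_Pj = 7.0, 3.2    # question i scores within P_j (observed in this study)

    y_i = np.arange(0, 16)           # question i score scale (0-15 marks)

    # Chained linear linking: equation (1) followed by equation (2), giving (3).
    x_via_Pi = linear_link(y_i, mu_Yi_Pi, sd_Yi_Pi, mu_X_Pi, sd_X_Pi)        # (1): Y_i -> X in P_i
    yj_star  = linear_link(x_via_Pi, mu_X_Pj, sd_X_Pj, mu_Yj_Pj, sd_Yj_Pj)   # (2) applied to (1): Y_j*(y_i)

    # Population-invariance check: Y_i -> X computed in P_i versus in P_j.
    x_via_Pj = linear_link(y_i, mu_Yi_Pj, sd_Yi_Pj, mu_X_Pj, sd_X_Pj)        # (4): Y_i -> X in P_j
    rmsd = np.sqrt(np.mean((x_via_Pi - x_via_Pj) ** 2))                      # unweighted disagreement
    print(yj_star, rmsd)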

    1.4 Definition of terms

Conventional secondary school: a public school owned by the Malawi government.

Cutoff score/cut score: a point on a score scale at which scores at or above that point are in a different category or classification than scores below the point.

    Difficulty: a factor causing trouble in achieving a positive result or tending to

    produce a negative result.

Optional questions: examinees' self-selected questions or choice of questions in a test.


    Performance descriptors: scale of achievement levels with a set of observable

    behavioural descriptions

    Test form: examination paper

National secondary: a school whose students are selected for admission from

    different districts across Malawi.

    District secondary: a school that admits students taken from the same district. It

    offers boarding and lodging.

    Day secondary: a school that offers no boarding and lodging. Its students come from

    surrounding communities.

    Grant-aided secondary: church affiliated school that receives financial assistance

    from Malawi Government.


    CHAPTER 2

    2.0 LITERATURE REVIEW

    2.1 Introduction

    The literature review has seven sections. The first section gives general

    information on optional questions. The second section discusses some advantages of

optional questions regarding their use in test forms. The third section looks at problems that come with the policy of allowing candidates to choose questions in an examination. The relationship between candidates' question choice and scoring highly is discussed in the fourth section. The definition of linking and equating under this study is given in the fifth section. The sixth section discusses the possibility of linking and equating optional questions using traditional equating methods. The last section

    discusses the consequences of not linking/equating when choice items are

    differentially difficult.

    2.2 General information on optional questions

    The introduction of optional questions into examinations brings in a certain

    complication of the process of measurement, since different groups of candidates

will attempt different questions yet from a single paper, thereby creating room for combinations of different test forms in candidates' scripts (Willmott & Hall, 1975;

    Bell, 1997). In the context of mathematics paper 2, choosing three questions out of


six creates $\binom{6}{3} = 20$ possible combinations of test forms. The complication comes in

    because candidates answer in effect different papers out of these different

    combinations, especially when questions vary much in difficulty. It then means the

    same total mark may not represent comparable performance (Lewis, 1974).

A good test adequately samples questions from the content domain to provide

    a sound basis for determining the extent to which a student has mastered the course.

    Mann (1845, pp.37-40) as cited by Wainer, Wang, and Thissen (1991, p.2) argued

    that

    it is clear that the larger the number of questions put to a scholar the

    better is the opportunity to test his merits. If but a single question is put,

    the best scholar in the school may miss it, though he would succeed in

    answering the next twenty without a blunder; or the poorest scholar may

    succeed in answering one question, though certain to fail in twenty

    others. Each question is a partial test, and the greater the number of

    questions, therefore, the nearer does the test approach to completeness. It

    is very uncertain which face of a die will turn up at the first throw; but if

    the dice are thrown all day, there will be a greater equality in the number

    of faces turned up.

    The argument of Mann is quite plausible in the context of MSCE mathematics

    syllabus. To determine that one has indeed mastered MSCE mathematics, it does not

    take a single question answered correctly, but enough questions that cover fairly the

content domain. Section A, which is the mandatory section of mathematics paper 2, contains fairly small items, whilst section B contains large items. Wainer et al.


(1991) define large items as those that take an examinee longer to complete than short items do. Large items provide deep coverage of the content domain, which can assure the examiner, when one answers them correctly, that the examinee has thoroughly mastered the course. Consequently, many large items are needed, but an examinee cannot complete many large items within the allotted testing time. One way of reconciling testing time limits and domain coverage is to provide many large items and allow examinees to choose among them.

    2.3 Advantages of optional questions

    Optional questions have some advantages to candidates, teachers and examiners.

    In this study, only three main advantages are discussed.

    First, optional questions provide each candidate the chance to answer questions

on a wide range of topics (Bradlow and Thomas, 1998). This is so because the presence of more questions on a paper than time can allow means wider coverage of the syllabus. This in turn increases fairness among candidates (Allen, Holland, and Thayer, 2005) because they are not restricted to answering samples of questions from a few topics.

    Second, optional questions are used in the examinations that are interested in

measuring the higher-order cognitive domain (Allen et al., 2005). In these examinations, the authenticity of candidates' work is perceived by the examiners to be more realistic

    (Bradlow and Thomas, 1998). This advantage is more applicable to essay optional

    questions where candidates are just given a topic to write about. In mathematics, it is

    also applicable because optional questions demand high level of thinking. When an


    examinee gets all marks on an optional question, it means s/he has demonstrated

    high-level cognitive ability.

    Third, examinations with question choice give teachers freedom to teach

    particular portions of the syllabus in which they may be particularly interested

    (Schools Council Examinations Bulletin, 1971; Willmott and Hall, 1975). Similarly

    candidates do concentrate on particular aspects of the topics in which they are able to

show themselves to the best advantage. However, with the optional questions of mathematics paper 2, no teacher can confidently know which topics will be examined; therefore, in essence, there is no freedom to teach particular topics and leave out others.

    Nevertheless, some teachers have problems in executing lessons involving

    some mathematics topics. As a result, they either engage someone who is

    comfortable with the particular topics or they fallibly present the topics. The latter

situation puts students in an awkward position in terms of thorough examination preparation. It eventually negatively influences their choices in the examination, since the mathematics domain has been reduced by the teacher's incompetence.

    Nonetheless, candidates are forced to prepare thoroughly by studying the whole

    syllabus. One can be good at a particular topic, but still s/he is extrinsically

    motivated to study hard on the other topics in order to do well because no one can

predict the exact topics that will be examined.

    2.4 Problems of optional questions

    Although the merits of the above section cannot be denied, little attention has

    been paid to the problems brought by optional questions when they are used in


examinations. It appears examiners overlook some very important aspects of a test as a measuring instrument. Below are accounts of two main problems associated with examinees' choice of questions. The first discusses the difference in cognitive demands of topics in a syllabus, while the second looks at the variability in the abilities of candidates.

    2.4.1 The syllabus

    In a syllabus, there are a number of different topics. It may be argued whether or

    not syllabus topics are of the same basic level of difficulty (Willmott, 1972). One

good example of these arguments is the one presented by the School Council Examinations (1971), which asks whether, in mathematics, the quoting of a geometry theorem followed by an example is on a par with factorisation followed by the solution of a pair of simultaneous equations. Certainly, the two topics or branches of mathematics could not be at the same difficulty level in our syllabus. There are quite

    a number of topics in senior secondary school mathematics syllabus which have

    different levels of difficulty. The comparability of the results of candidates

    attempting these questions drawn from different topics may be questioned.

    Therefore, putting scores from different optional questions on the same scale is

    necessary for fair comparisons.

    2.4.2 The abilities of candidates

    The level of questions may vary considerably within the same test form in terms

    of level of proficiency required of the candidates to be able to answer the question


fully (Willmott, 1972). The provision of question choice means that the type of responses required of the candidates over the whole paper is not controlled in any way. Some candidates may choose to answer questions with a certain pattern of

    proficiency. For example, if a paper of ten questions consisted of five description

    questions and five explanatory questions, and candidates were to answer five

questions in all, it is likely that one would see "describers only" and "explainers only" (School Council Examinations, 1971). This would create a measurement problem when one

    tries to consider candidates with the same marks to be worthy of the same ability

    level (Willmott, 1972). In the case of mathematics, candidates who are not good at

    graphs, for example, will tend to avoid graph questions, and some whose proficiency

    is low in matrices and vectors will choose other questions. However, the fact that

they have answered their preferred questions does not guarantee that they will get full

marks on that particular question. The gist of the matter is that if they like geometry more than arithmetic and algebra, they go for that branch of mathematics. The problem

    that would come in is of comparison: is my geometry better than your algebra or

    arithmetic? Wainer and Thissen (1994) are also concerned with such comparisons

because there is a need to take into account the difficulty of the accomplishment for a comparison to be meaningful. It would not be fair to judge two examinees' mathematics proficiency based on different questions. Fair play ought to be achieved.


    2.5 Relationship between candidates question choice and getting high scores

    The suggestion that optional questions allow candidates to select the questions

    on which they can perform better is contradicted by research evidence. According to

Wang (1996), the correlation between the popularity ranking of the five choice questions and their corresponding means was −0.60, and the correlation between the ranking of the choice-question combinations and the mean score was −0.22. It is very surprising to note the negative correlations, because it is assumed that examinees choose questions they feel they would get right. The Taylor and Nuttal study (1974), as cited by Bell (1997), asked candidates taking a Certificate of Secondary Education (CSE) examination to answer the questions they omitted on a separate occasion after the actual examination. It was found that about 25% of candidates actually showed an improvement in their final marks. This meant that not

    all candidates are able to choose in advance the questions on which they will score

    most highly.

    Power, Fowles, Farnum, and Gerritz (1992) found that the more the examinees

    liked a particular topic, the lower they scored on an essay they subsequently wrote

    on the chosen topic. This phenomenon is quite true when the choice between the

    questions is relatively hard for examinees to make, that is, the choices are not

strongly determined (Allen, Holland, and Thayer, 1993). It is not known whether MSCE mathematics paper 2 optional questions present this kind of scenario, where most candidates find it hard to select the questions on which they would score most highly. Malawi National Examinations Board item developers do try to produce optional questions of equivalent difficulty by


following available guidelines (Khembo, 2004). It is yet to be seen whether examiners' efforts to produce optional questions of equivalent difficulty, on face value, would produce hard choices on the part of examinees. The words "face value" are used because no detailed research has been done to ascertain the notion of equal difficulty of the optional questions.

    2.6 Linking and Equating

    Linking encompasses a broad perspective on score adjustment of different test

forms. Feurer, Holland, Green, Bertenthal, and Hemphill (1999), in their Uncommon Measures report, presented three types of linking of scores of different tests that are built on

    1. the same framework and same test specifications,

    2. the same framework and different test specifications, or

    3. different frameworks and different test specifications.

Kolen and Brennan (2004, p.427) ably defined the term framework as "a delineation of the scope and extent (e.g., specific content areas, skills, etc.) of the domain to be represented in the assessment". They also defined test specifications or blueprint as the "specific mix of content areas and item formats, number of tasks/items, scoring rules, etc." On the other hand, Mislevy (1992) and Linn (1993) proposed a type of

    taxonomy for linking which mainly focuses on methodologies. They grouped the

    taxonomy into four categories, based on the strength of the resulting linkage, starting

    with equating, followed by calibration, projection, and lastly moderation.


When the first two types of linking presented by Feurer et al. (1999), Mislevy (1992), and Linn (1993) are put into the same perspective, one would find that the score adjustment relationship of different test forms that are built on the same framework and same test specifications is called equating (Kolen and Brennan, 2004). When tests that are developed on the same framework but different specifications are linked, the resulting relationship is called calibration. The term projection comes in because that methodology does not require the test forms to measure the same constructs or domain, and the score adjustment relationship is obtained through linear or non-linear regression. Moderation is a type of linking in which the test frameworks are different but the constructs are similar (Kolen and Brennan, 2004). In this case, the fundamental aspect relies on distribution matching.

Looking specifically at equating as one type of linking, Lord (1980) outlined four requirements that must be met for equating of, say, test $Y_i$ to test $Y_j$:

1. the same construct: the two tests must measure the same construct;

2. equity: once two test forms have been equated, it should not matter to the examinees which form of the test is administered;

3. symmetry: the equating transformation should be symmetric, meaning that the equating of $Y_i$ to $Y_j$ should be the inverse of the equating of $Y_j$ to $Y_i$;

4. subpopulation invariance: the equating transformation should be invariant across subpopulations.

As noted previously from the definitions of the types of linking in the Uncommon Measures report, same framework is viewed as construct similarity and same test


    specifications is considered as similarity in measurement characteristics such as test

    length, test format, administration conditions, etc (Kolen and Brennan, 2004). These

    definitions are concordant with four requirements for equating as delineated by Lord

    (1980). The study would use these definitions as benchmarks for deciding the type of

linking which would be involved. Therefore, the term linking is used (henceforth) to refer to any function used to connect the scores on one test to those of another test, and the term equating is reserved for the special case of linking

    that satisfies the benchmarks.

    Livingston (2004); von Davier, Holland, and Thayer (2004); Holland, von

    Davier, Sinharay, and Han (2006) describe chain linking as equating the scores on

    the new form to scores on the anchor and then equating the scores on the anchor to

    scores on the reference form. Putting the definition in our context, chain linear

    linking describes equating the scores on a particular optional question (new form) to

total scores on the common portion (anchor) and then linking the total scores on the

    common portion to scores on the other optional questions (reference forms). The

    chain formed by these two linking functions connects the score on the concerned

    optional question to the scores on the other optional questions.
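In symbols, this chain is simply the composition of the two linking functions (a restatement of equation (3) from the theoretical framework):

$$Y_j^*(y_i) = \operatorname{lin}_{X \to Y_j;\,P_j}\big(\operatorname{lin}_{Y_i \to X;\,P_i}(y_i)\big),$$

where $\operatorname{lin}_{Y_i \to X;\,P_i}$ denotes the linear linking of question $i$ scores onto the common portion, computed in $P_i$, and $\operatorname{lin}_{X \to Y_j;\,P_j}$ denotes the linking of the common portion onto question $j$, computed in $P_j$.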

    The study is particularly interested in the first part of the chain where a

particular optional question's scores are linked to total scores on the common portion. There is an assumption that the linear function linking a particular optional question's scores to the common portion is the same in the two populations, those that answer the concerned question and those that do not ($P_i$ and $P_j$) (von Davier & Kong, 2005). Based on the assumption's level of attainment, we can substantiate the


    Thayer (2005) discovered that the question choice tends to be positively associated

with performance, in the sense that the better an examinee does on a question, the more likely s/he is to prefer that question, and vice versa. This revelation, however, is muddied by a reversal where examinees who prefer a certain question perform better on the unpreferred question. They concluded that there is a substantial amount of variation around the performances in regard to preferred and unpreferred choices and, therefore, it is difficult to justify the non-ignorable selection. With the above

    findings, it seems impossible for scores on optional questions to be treated

    interchangeably through traditional equating because it is inconsistent with the

    notion of standardised testing (Kolen and Brennan, 2004).

Though it is deemed impossible to equate optional questions' scores, comparability of scores is nevertheless possible through score adjustment procedures (Kolen and Brennan, 2004) by employing linking paradigms. Wainer, Wang, and Thissen (1991) employed Item Response Theory (IRT) to explore the possibility of equating choice items by assuming ignorable non-response, using data from the College Board's Advanced Placement (AP) test in Chemistry. They treated examinees as two subpopulations. Both were administered the common items, but differed in the administration of the chosen questions, in order to calibrate the item parameters for the common items and the selected questions. They succeeded, but without the confirmatory evidence that could only be sourced with further data.

    Allen, Holland, and Thayer (1994a, b) provided a general procedure based on

    missing-data methods for non-ignorable non-response to estimate distribution of

    scores on an optional part of a 1987 Advanced Placement (AP) European History


test. Using a sensitivity analysis approach, they observed that an assumption of

    ignorable non-response given additional information from the common section score

    could determine the correct assumption about the non-response when only the

    optional essay score and the common section were available. Fitzpatrick and Yen

    (1995) investigated the psychometric characteristics of constructed response items

    referring to choice and non-choice passages administered to students in Grades 3, 5,

    and 8. The items were scaled using IRT methodology. The findings indicated that

    the scores obtained on different choice sets were comparable when these choices

    were scaled together with the non-choice items that all students took. The non-

    choice items play an important role in producing comparable scores. Bridgeman,

    Morgan, and Wang (1997) assessed the ability of history students to choose the

essay topic on which they could get the highest score. They concluded that techniques for equating scores generated by different topics are not totally satisfactory; therefore, scoring rubrics must be established by a single group of raters to enable a single standard.

As can be noted, there is a mixed bag of success and failure in making choice items' scores comparable. Most of the mentioned studies used IRT methodology in data analyses, which requires strong assumptions about the test, such as unidimensionality and local independence. Unidimensionality means that the statistical dependence among items comes about because the test is measuring one latent trait, and local independence is achieved when items are statistically independent for each subpopulation of examinees whose members are homogeneous with respect to the latent trait (Crocker and Algina, 1986; Hambleton, Swaminathan, and Rogers,


1991). The opponents of IRT always argue that it is naïve to assume that a single latent trait accounts for the responses to items on a test. Thus, this study uses classical item analysis statistics in testing a key assumption of Livingston's score adjustment on MSCE mathematics paper 2, based on the requirement that the two equating/linking functions should not depend on the particular population used for equating.

At this juncture, it should be accentuated that the examinations used in the studies mentioned are quite different in format from the one under study. In those studies, examination papers had more than two sections, whilst ours has two sections only. In view of this, it would not be plausible to conclude that equating is not possible for every examination with optional questions until this is proven beyond reasonable doubt.

2.8 What are the consequences of not linking/equating optional question scores?

Linking/equating has the potential to ameliorate the problems presented by choice by making the chosen items equivalent in difficulty. "If examinees who choose different items are to be fairly compared with one another, the scores obtained on these items must be equated" (Wainer, Wang, and Thissen, 1991, p. 2). This process facilitates the linkage of scores on optional items to one another by putting them on a comparable scale using a z-score model.

The optional questions are intended to test the same skills and types of knowledge, drawn from the same syllabus. Though test developers try to make the questions equally difficult, oftentimes some optional questions turn out to


be harder than others. Wang, Wainer, & Thissen (1993) observed that in the 1989 AP Chemistry and 1989 AP American History examinations, women were adversely affected because most of them chose the more difficult items. This is one example among many of the unfairness that comes along with question choice. When some optional questions are harder than others, the raw scores on those questions would not indicate the same level of the knowledge or skill the questions are intended to measure; thus the scores would not be comparable.

As noted previously, it remains a fact that developing choice items of equal difficulty is a gargantuan challenge. Even so, removing choice from the examination would reduce domain coverage because of the small number of items that would be examined, and this would disadvantage some students. Increasing the testing time to accommodate a larger number of items is often impractical. Since choice has been decided as the desirable format for MSCE mathematics paper 2 examinations, there are two main consequences of not putting the scores on the same scale. First, the same observed raw score on each optional item would not imply the same accomplishment, because the difficulties of the tasks are different. Second, observed total raw scores from choice-item combinations in section B would still present different patterns of mathematical proficiency from one combination to another, which might create intricacy in comparison.


    CHAPTER 3

    3.0 METHODOLOGY

    3.1 Introduction

This chapter describes how the research problem was investigated. The list of questions to be answered is given first. This is followed by the design of the study, the analysis plan, ethical considerations, and validity and reliability. In the final section, a narrative of the delimitations and limitations of the study is presented.

    3.2 The Research Questions

    The following questions were addressed in this study:

    1. To what extent do optional questions differ in difficulty?

    2. How are scores on optional questions and total scores on the common

    portion correlated?

3. Are the linking/equating functions for examinees who chose a concerned optional question and for those who selected another optional question similar?


    3.3 The Design

    3.3.1 Description of the Research

The research strategy employed was a survey, because the researcher wanted the measures used to be reliable and valid, and wanted a guarantee of fair representation of all individuals to whom the results were intended to apply (Cohen, Manion, & Morrison, 2000; Slavin, 1984). Further, a quantitative approach was used because it rests on positivism, which holds that the social environment is real and constant regardless of time and setting (Creswell, 1994).

    3.3.2 Population

The population of the study was all form 4 students from purposively sampled secondary schools in the South West and Shire Highlands education divisions.

    3.3.3 Sampling

The study used purposive sampling, whereby five secondary schools were chosen to participate in the study. Two main reasons are given why purposive sampling was preferred to other methods. First, the researcher wanted to ensure representation of the four major conventional secondary school types. This is in agreement with Borg, Gall and Gall (1996), who say that a purposive sample provides more focused data and allows for a detailed analysis of a particular segment of the population. Second, due to limitations of research funds and time, it was judicious to engage schools which were close to each other.


classroom assessment. Since the study wanted sixty participants from each school, a sampling interval, k, was computed by dividing the size of the form 4 class at each school by 60. From the teacher's list, the name of the student corresponding to the k-th position was picked, and every k-th name thereafter was chosen until the required number was achieved.
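
For illustration only, the systematic selection described above can be sketched as follows (a minimal sketch under the assumption of a simple list of names; the list and the helper name are hypothetical, not part of the study's materials):

    # Systematic sampling sketch: compute the interval k and pick every k-th name.
    def systematic_sample(class_list, target=60):
        k = max(1, len(class_list) // target)      # sampling interval
        return class_list[k - 1::k][:target]       # k-th name, then every k-th thereafter

    form4_list = [f"student_{i}" for i in range(1, 241)]   # hypothetical class of 240
    sample = systematic_sample(form4_list)
    print(len(sample), sample[:3])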

    3.3.4 Instruments

The main instrument used was the 2005 Malawi School Certificate of Education Examinations mathematics paper 2 (see appendix G). This paper was purposively chosen because it was the latest paper at the time of writing the research proposal.

The design was that the candidates had no choice in section B, thereby increasing the test length by three more questions. In view of this, the paper was divided into two parts: paper 1 representing section A (see appendix C) and paper 2 representing section B (see appendix D). This was done in agreement with the observation of Hand (2004, p. 120) that the more questions are included in a test, the more difficulty one might find in obtaining valid responses; candidates tire as the number of questions increases, and might even refuse to take part if there are too many.

Paper 1 consisted of six questions and the time allotted to it was 1 hour 30 minutes. Paper 2 took 2 hours and had six question choices. In this paper, examinees were instructed to read all the optional questions and choose three questions, and they


should write down the numbers of these questions in order of preference. They were then instructed to answer all six questions.

The other instrument was the questionnaire that was used as a cover page for the candidates' answer sheets for paper 1 and paper 2 (see appendices E and F respectively). The questionnaire was used to solicit extra information from the candidates, such as question choice preference (for paper 2 only), gender, and age.

3.3.5 The administration of the instruments and data gathering

The two papers were administered three weeks prior to the commencement of the National Examinations. This was done to ensure that students had prepared thoroughly in terms of mastering the whole mathematics syllabus. This is the time when the majority of secondary schools finish delivering lessons to students and instead engage in revision of the various courses that are offered. The two test papers were administered on the same day, starting with paper 1, and after a 30-minute break, paper 2 was taken.

Students were instructed to answer the questionnaire first before attempting the questions in both papers. The time given to fill in the questionnaire was two minutes.

    3.4 Data Analysis

    3.4.1 Extent of difficulty in optional questions

The item difficulty indices (p-values) were used to analyse the extent of difficulty in the optional questions. These p-values are obtained by computing the


    average mark obtained on the question divided by the maximum mark for that

    question (Nuttal & Willmott, 1972). The p-values for questions in section A and

    section B without choice (i.e. no choices were allowed on the optional questions

    portion) were all calculated in the same manner. The item difficulty indices for

    questions in section B without choice were unbiased statistics because all

    examinees (population) were used to compute them.
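
As a simple illustration of this computation (a minimal sketch with hypothetical marks, not the study data):

    # Item difficulty index (p-value) = average mark on the question / maximum mark.
    def p_value(marks, max_mark):
        return sum(marks) / len(marks) / max_mark

    question_marks = [6, 9, 4, 12, 7]          # hypothetical marks on a 15-mark question
    print(round(p_value(question_marks, 15), 3))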

3.4.2 Correlation of scores on section B and total scores on section A

The Pearson product-moment correlation coefficient between the common portion and the question-choice portion was calculated. The coefficient of determination was worked out to determine the variance in section A that is associated with the variance in section B. This analysis helped the researcher to see whether the examinees would differ in the same way on the common portion as they would on the optional-questions portion. If the correlation coefficient were strong, then the researcher would know that section A measured a similar construct to section B. It would signify that the mathematical knowledge and skills asked for in section A were also present in section B, making the two sections measure the same mathematical elements. This is one requirement for two tests to be amenable to equating (Liu, Cahn, and Dorans, 2006).
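
For illustration, the Pearson correlation and the coefficient of determination can be computed as sketched below (hypothetical section totals, not the study data):

    import statistics

    # Pearson product-moment correlation between section A and section B totals.
    def pearson_r(x, y):
        mx, my = statistics.mean(x), statistics.mean(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / (sxx * syy) ** 0.5

    section_a = [30, 42, 25, 38, 50, 33]       # hypothetical totals
    section_b = [28, 45, 20, 35, 52, 30]
    r = pearson_r(section_a, section_b)
    print(round(r, 3), round(r ** 2, 3))       # correlation and coefficient of determination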


3.4.3 Establishing group invariance of the equating/linking functions for examinees who chose a concerned optional question and for those who selected another question

In a normal examination, the raw score $Y_i$ on optional question $i$ for an examinee who selected question $j$ is unobservable; in fact, $y_i$ is a missing datum. Equating $Y_i$ onto $X$ over the population $P_j$ of examinees who chose question $j$ is therefore impossible. This equating function is denoted $X_{i,j}(y_i)$. For instance, an examinee who chose optional questions, say, 7, 9, and 12 would have unobserved scores on optional questions 8, 10, and 11. Thus equating the score on, say, question 8 to the scale of the total score of section A on the group that selected question 7, or 9, or 12 is impossible. We could denote these equating functions as $X_{8,7}(y_8)$, $X_{8,9}(y_8)$, and $X_{8,12}(y_8)$ with respect to the chosen optional questions.

In this study, however, the normally missing scores were available (every examinee answered all the optional questions) and were used to determine the means $\mu_{i,j}$ and standard deviations $\sigma_{i,j}$ of each optional question $i$ within the subgroup that chose question $j$. These moments were used together with the means $\mu_{X,j}$ and standard deviations $\sigma_{X,j}$ of section A within the same subgroup to establish the slopes and intercepts of the functions $X_{i,j}(y_i)$. The computable 'missing' linear equating is

    X_{i,j}(y_i) = \frac{\sigma_{X,j}}{\sigma_{i,j}} \left( y_i - \mu_{i,j} \right) + \mu_{X,j}.

The slopes and intercepts of the observable-scores equating functions $X_i(y_i)$, computed on the subgroup that actually chose question $i$, were obtained from the analogous equation

    X_i(y_i) = \frac{\sigma_{X,i}}{\sigma_{i,i}} \left( y_i - \mu_{i,i} \right) + \mu_{X,i}.
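
A minimal sketch of this linear linking computation, under the assumption that the subgroup's scores on both the optional question and section A are available (the data and helper name below are hypothetical):

    import statistics

    # Linear linking of an optional-question score y onto the scale of section A (X),
    # using the means and standard deviations of a given subgroup.
    def linear_link(y, y_scores, x_scores):
        mu_y, sd_y = statistics.mean(y_scores), statistics.pstdev(y_scores)
        mu_x, sd_x = statistics.mean(x_scores), statistics.pstdev(x_scores)
        return (sd_x / sd_y) * (y - mu_y) + mu_x

    q8_scores = [5, 7, 3, 9, 6]                # hypothetical subgroup scores on question 8
    section_a_totals = [31, 40, 24, 47, 35]    # the same subgroup's section A totals
    print(round(linear_link(7, q8_scores, section_a_totals), 2))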

For each optional question, there were five sets of linear functions. In each set, one function belonged to the subgroup that chose the concerned question; the other function was for a subgroup that never selected the concerned question but chose


    another question; and the last function was for the combined group. The two

    subgroups in each set were mutually exclusive.

Dorans and Holland (2000) introduced two statistics to summarise the differences between the equating functions obtained from the subgroups and the combined group. The first is the standardised Root Mean Square Difference, RMSD, which gives detailed information as to which Y-score points, y, are most affected by the subgroup differences. The second is the standardised Root Expected Mean Square Difference, REMSD, which summarises the overall differences between the equating/linking functions. The formulae for the two statistics are

    RMSD(y) = \frac{\sqrt{\sum_{h=1}^{H} w_h \left[ eq_{X_h}(y) - eq_X(y) \right]^2}}{\sigma_{X(\mathrm{combined\ group})}}                      (5)

    REMSD = \frac{\sqrt{\sum_{h=1}^{H} w_h \sum_{y=\min(y)}^{\max(y)} \hat{w}_{yh} \left[ eq_{X_h}(y) - eq_X(y) \right]^2}}{\sigma_{X(\mathrm{combined\ group})}}                      (6)

$eq_X$ represents the transformed scores on Y to the scale of X for the combined group, and $eq_{X_h}$ represents the transformed scores on Y to the scale of X for subgroup h. $N_h$ is the sample size for subgroup h, N is the total number of examinees, and $w_h = N_h / N$ is the weight for subgroup h. Furthermore, $N_{yh}$ is the number of examinees in subgroup h with a particular score y on Y, and $\hat{w}_{yh} = N_{yh} / N_h$ is a weighting factor for subgroup h and score y.


As can be noted, RMSD is computed at each y-value, and the contribution of each subgroup is weighted by its proportional representation in the combined group. REMSD is a doubly weighted statistic, weighted over both $\hat{w}_{yh}$ and $w_h$.
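
The following sketch illustrates how REMSD can be computed from subgroup and combined-group linking functions (a minimal sketch with hypothetical values; the function and variable names are illustrative only):

    # REMSD sketch: squared differences between subgroup and combined-group linked
    # scores, weighted by score frequencies within subgroups and by subgroup size.
    def remsd(y_values, eq_combined, eq_subgroups, counts, sigma_x):
        N = sum(sum(c.values()) for c in counts.values())
        total = 0.0
        for h, eq_h in eq_subgroups.items():
            N_h = sum(counts[h].values())
            w_h = N_h / N
            inner = sum((counts[h].get(y, 0) / N_h) *
                        (eq_h[y] - eq_combined[y]) ** 2 for y in y_values)
            total += w_h * inner
        return total ** 0.5 / sigma_x

    ys = [0, 1, 2]                                             # toy raw scores on Y
    eq_all = {0: 3.0, 1: 6.0, 2: 9.0}                          # combined-group linking
    eq_h = {"chose": {0: 2.8, 1: 6.1, 2: 9.3},                 # subgroup linkings
            "other": {0: 3.4, 1: 5.8, 2: 8.6}}
    n = {"chose": {0: 4, 1: 10, 2: 6}, "other": {0: 5, 1: 8, 2: 7}}
    print(round(remsd(ys, eq_all, eq_h, n, sigma_x=5.0), 4))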

To evaluate the relative magnitude of RMSD and REMSD, Dorans and Feigenbaum (1994) suggested the notion of a score Difference That Matters (DTM) in the context of linking the SAT to the old SAT. For a test that is reported in 10-point units, linking functions that are within 5 scaled score points of each other at a given raw score point are treated as close enough to ignore, because they differ by less than half of a reported score unit of 10 (Dorans, 2004). Kolen & Brennan (2004, p. 462) give a good illustration of the logic of DTM when reported scores are integers: equivalents of 15.4 and 15.6 round to different integers even though they differ by only .2 (less than a DTM), whereas equivalents of 14.6 and 15.4 round to the same integer even though they differ by .8 (more than a DTM). The score unit on the MSCE mathematics examination is 1 point, which is an integer. This means that half of a score unit, .5, was considered the score Difference That Matters.

Recall that the RMSD and REMSD statistics are standardised by dividing by the standard deviation of scores on the compulsory section for the combined group. The DTM was standardised in the same manner so that it could be used as a benchmark for evaluating RMSD and REMSD. When REMSD was below the standardised DTM, it indicated that the equating functions for each subgroup were very close to that of the combined group, hence group invariance held. Otherwise, they failed the group invariance test. These functions and RMSD were plotted on graphs to visually display their similarities and differences.
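
As a small illustration of this decision rule (hypothetical numbers; 0.5 raw-score points is the DTM adopted in this study):

    # Standardise the DTM by the combined-group standard deviation of section A,
    # then check whether REMSD falls below it (group invariance holds).
    def invariance_holds(remsd_value, sigma_x, dtm=0.5):
        return remsd_value < dtm / sigma_x

    print(invariance_holds(remsd_value=0.06, sigma_x=5.0))     # 0.06 < 0.10 -> True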


    3.5 Ethical Considerations

Creswell (2003) says that codes of professional conduct for researchers are applicable to all research methods: qualitative, quantitative, and mixed methods. In this study, the researcher observed two ethical codes of conduct. The first was obtaining informed consent, and the second concerned privacy and confidentiality.

First, Gay and Airasian (2003) say that very rarely is it possible to conduct research without the cooperation of people in the setting of the study. Cooperation comes into play when the researcher obtains consent from participants. Before carrying out the research, written permission was sought from the Education Division Managers and headteachers to conduct the research at their schools (appendices J, K, & L); furthermore, students of the participating schools were asked whether they agreed to take part in the study. Only those that agreed were systematically selected to be candidates. Rossman and Rallis (2003) comment

    on the significance of getting informed consent from participants by saying that the

    permission from the subjects is crucial for the ethical conduct of the research

    because it serves to protect the privacy of the participants.

Second, Fowler (1995), Vaughn, Schumm, and Sinagub (1996), and Rossman and Rallis (2003) mention that privacy and confidentiality during data collection are of paramount importance. Participants' responses should be kept confidential, and participants should know the purpose of the study. Based on these assertions, the study assured

    subjects of their privacy and confidentiality during the administration of the tests by

    advising them not to disclose or write their names on the answer sheets. Letters and

    numerical values were used to distinguish examinees from one another.


    3.6 Validity and Reliability

Validity is defined as the accuracy or truthfulness of a measurement with reference to a construct of specific interest, while reliability is concerned with the consistency of a measurement (Crocker & Algina, 1986; Bakewell, 2003). Hand (2004, p. 129) defines validity as how well the measured variable represents the attribute being measured, or how well it captures the concept which is the target of measurement. He further defines reliability in terms of the differences between multiple measurements of an attribute.

On validity, MANEB item setters developed the instrument that was used in this study. These item setters are well-trained personnel with vast teaching experience in mathematics. During the development of the tests, they use blueprints, that is, tables of specification, to guide them in terms of content coverage and the level of cognitive demand. The blueprints help to maintain consistency in the difficulty level of the tests over the years. The papers, therefore, possess the required magnitude of content validity based on how they are designed. Furthermore, the examinees took the tests three weeks prior to the National Examinations, which means that the students at that time were well prepared. Hence their responses were taken as their optimal performance or achievement in MSCE mathematics paper 2, as they displayed their true mathematics knowledge and skills.

In assuring reliability, a marking scheme was used for consistency in scoring, and one item rater was used to avoid inter-rater variability. The marking scheme used for scoring the test was developed by two experienced mathematics teachers from Chiradzulu Secondary School. These teachers are also MANEB mathematics


raters. The scheme is similar to the MANEB scheme in terms of mark allocation and content specification. Furthermore, before rating the items, the researcher and the two teachers standardised the marking scheme to encompass the diversity of examinees' answers. One question at a time was marked on each script before the subsequent question was marked, to ensure consistency.

    3.7 Delimitations and Limitations of the study

3.7.1 Delimitations

The study focused only on the optional questions of mathematics paper 2; hence the findings would not apply to other MANEB examinations that allow examinee choice.

The results would not be generalised to all secondary schools in Malawi because the participating schools were purposively sampled. However, the results would be relevant to other schools with characteristics similar to those of the sampled schools.

3.7.2 Limitations

Visiting all the secondary schools that offer mathematics would have been ideal, but this was impossible due to time and financial constraints. Instead, the study was done in five schools only.

Some students declined to participate in the study after previously affirming that they would do so. In some instances, candidates took only one paper instead of two, so that scores were available for one paper only. In this regard, such candidates were dropped from the study, thereby reducing the


targeted sample size. This attrition was mostly observed at Njamba secondary school; in total, 53 participants were lost to attrition.

Finally, the MANEB marking scheme was not issued to the researcher. MANEB regards it as a confidential document which cannot be given to anyone outside the organisation. This created a minor setback because it had been planned to use the MANEB marking scheme. It resulted in extra finances and resources to bring together two experienced teachers from Chiradzulu Secondary School, who are also MANEB item writers and scorers, and the researcher, to develop another marking scheme. Nonetheless, our combined experience as item scorers made the marking scheme similar to the ones developed by MANEB.


    CHAPTER 4

    4.0 RESULTS AND DISCUSSION OF THE FINDINGS

    4.1 Introduction

In this chapter, the results and discussion of the findings are presented under three main sections. The sections were formulated based on the research questions; thus they present answers to the posed research questions in order, starting with the first research question, then the second. The third research question is addressed in the final section, coupled with a chapter summary.

4.2 To what extent do optional questions differ in difficulty?

    4.2.1 Preliminary Analysis

The item content and major content areas that made up section A and section B are outlined in Tables 4.1 and 4.2 respectively. Almost all content areas that were examined in section A were also tested in section B, but with different item content. This signifies that the two sections were measuring the same construct. Construct similarity is viewed as sameness of framework (Feuer et al., 1999); thus both sections were built on the same framework.

Furthermore, Feuer et al. (1999) define same test specifications as similarity in measurement characteristics/conditions such as test length, test format, administration conditions, etc. Popham (1974), as cited by Crocker and Algina


(1986), defines item specifications as the sources of item content, descriptions of the problem situations or stimuli, etc. In view of both definitions, the items in the two sections were built on different item specifications. This is evidenced by a similar item format but different sources of item content. Further, the differences rested in the levels of cognitive operation demanded. Most questions in section A demand less cognitive operation than those in section B, as indicated by the p-values in Table 4.3.

Table 4.1: Major content areas of section A

Question No.   Item content             Content areas
1a             Algebra fractions        Algebra, patterns, & functions
1b             Irrational numbers       Numeration
2a             Subject of a formula     Algebra, patterns, & functions
2b             Matrices                 Algebra, patterns, & functions
3a             Triangle geometry        Geometry
3b             Remainder theorem        Algebra, patterns, & functions
4a             Circle geometry          Geometry
4b             Mapping                  Algebra, patterns, & functions
5a             Measurement              Numeration
5b             Speed-time graph         Numeration
6a             Similar figures          Geometry
6b             Vectors                  Numeration


Table 4.2: Major content areas of section B

Question No.   Item content                                Content areas
7a             Statistics                                  Statistics & probability
7b             Formulation & solving quadratic equation    Algebra, patterns, & functions
8a             Partial variation                           Algebra, patterns, & functions
8b             Probability                                 Statistics & probability
9a             Exponential equation                        Algebra, patterns, & functions
9b             Linear programming                          Algebra, patterns, & functions
10a            Equation of a straight line                 Algebra, patterns, & functions
10b            Arithmetic progression                      Algebra, patterns, & functions
11a            Cyclic quadrilateral                        Geometry
11b            Sets                                        Numeration
12a            Trigonometry                                Numeration
12b            Solving polynomial equation graphically     Algebra, patterns, & functions

Having looked at the sameness of framework and the test/item specifications of the two sections of the test under investigation, it would be reasonable to use the term linking rather than equating, because the two sections had different item content but the same content areas, and the length of the choice items in section B was not equal to that of the items in section A. Further, the level of cognitive processes required in the two sections was different, as illustrated in subsection 4.2.2. Thus, the two portions measured the same construct, but with different specifications. However, when equating choice items,


the interest is on item content as opposed to the content areas of the test form, because item scores are the ones to be linked within the same test.

    4.2.2 Comparing p-values of section B

Table 4.3: P-values for questions in section A and section B 'without choice'

Section A                                      Section B
Item   Max. mark   Average mark   p-value      Item   Max. mark   Average mark   p-value
1      8           5.190          0.649        7      15          6.436          0.429
2      7           4.401          0.629        8      15          5.061          0.337
3      9           5.518          0.613        9      15          5.869          0.391
4      10          5.801          0.580        10     15          5.116          0.341
5      11          6.324          0.575        11     15          3.927          0.262
6      10          1.917          0.192        12     15          7.566          0.504

Table 4.3 displays the item difficulty indices (p-values) for questions in section A and section B without any choice. Questions in section A have generally higher p-values than those in section B. This affirms the notion that section A questions are easier than section B questions. The questions in the latter section were relatively difficult because they usually provided deeper coverage of the content domain. Adopting the terms used by Wainer and Thissen (1994), most of the questions in section B would be called 'large' items, while section A questions would be dubbed 'short' items because most of them were considerably straightforward.


    However, question 6 in section A had the lowest p-value amongst all questions in

    the test. The predicament which candidates faced in attempting this question was

    translating the word problem into correct computable mathematical concepts.

    Levels of proficiency in language skills might have influenced the performances on

    this question (Crocker and Algina, 1986).

Focusing on the section B questions, it is noted that question 11 was the most difficult and question 12 was the easiest. Ordering them from least difficult to most difficult, one would get questions 12, 7, 9, 10, 8, and 11.

As noted, optional question 11 was the most difficult. Under the current assessment policy on MSCE mathematics paper 2 examinations, a raw score of, say, 7 has the same consequence whether it was earned on question 11 or on question 12, which is the easiest. In all fairness, it is clear that one who receives a score of 7 on question 11 demonstrated more proficiency than another student who gets the same score on question 12. Wainer, Wang, and Thissen (1991) and Wainer and Thissen (1994) say that when optional questions that are differentially difficu