

MEASUREMENT & EVALUATION

Dr. Arbab Khan Afridi

Dr. Arshad Ali

Dr. Muhammad Rauf

In Collaboration With

MASTER COACHING ACADEMY (MCA)

(IER) UNIVERSITY OF PESHAWAR


All rights reserved with the Author

Authors: Dr. Arbab Khan Afridi

Dr. Arshad Ali

Dr. Muhammad Rauf

Book: Measurement & Evaluation

1st Edition: March, 2015

Composer: M. Nawaz Khan Abbasi

0333-9352585

Printers: Ijaz Printers, Peshawar

Quantity: 1000

Price: 150/-

Available at MCA Academy and leading book shops

[email protected]

Contact: 091-5843361

Cell: 0300-5930899


TABLE OF CONTENTS

UNIT-1: INTRODUCTION
1.1 EVALUATION, ASSESSMENT, MEASUREMENT AND TEST
1.2 THE PURPOSE OF TESTING
1.3 GENERAL PRINCIPLES OF ASSESSMENT
1.4 TYPES OF EVALUATION PROCEDURE
1.5 NORM-REFERENCED AND CRITERION-REFERENCED TESTS
1.6 EDUCATIONAL

UNIT-2: JUDGING THE QUALITY OF THE TEST
2.1 VALIDITY AND METHODS OF DETERMINING VALIDITY
2.2 FACTORS AFFECTING VALIDITY
2.3 RELIABILITY AND METHODS OF DETERMINING RELIABILITY
2.4 FACTORS AFFECTING RELIABILITY
2.5 PRACTICALITY

UNIT-3: APPRAISING CLASSROOM TESTS (ITEM ANALYSIS)
3.1 THE VALUE OF ITEM
3.2 THE PROCEDURE/PURPOSE OF ITEM ANALYSIS
3.2 MAKING THE MOST OF EXAMS: PROCEDURES FOR ITEM ANALYSIS
3.3 ITEM DIFFICULTY
3.4 THE INDEX OF DISCRIMINATION

UNIT-4: INTERPRETING THE TEST SCORES
4.1 THE PERCENTAGE CORRECT SCORE
4.2 THE PERCENTILE RANKS
4.3 STANDARD SCORES
4.4 PROFILE


UNIT-5: EVALUATING PRODUCTS, PROCEDURES & PERFORMANCE
5.1 EVALUATING THEMES AND TERM PAPERS
5.2 EVALUATING GROUP WORK & PERFORMANCE
5.3 EVALUATING DEMONSTRATION
5.4 EVALUATION OF PHYSICAL MOVEMENTS AND MOTOR SKILLS
5.5 EVALUATING ORAL PERFORMANCE

UNIT-6: PORTFOLIOS
6.1 PURPOSE OF PORTFOLIOS
6.3 GUIDELINES AND STUDENTS' ROLE IN SELECTION OF PORTFOLIO ENTRIES AND SELF-EVALUATION
6.4 USING PORTFOLIOS IN INSTRUCTION AND COMMUNICATION
6.5 POTENTIAL STRENGTHS AND WEAKNESSES OF PORTFOLIOS
6.6 EVALUATION OF PORTFOLIOS

UNIT-7: BASIC CONCEPTS OF INFERENTIAL STATISTICS
7.1 CONCEPT & PURPOSE OF INFERENTIAL STATISTICS
7.2 SAMPLING ERROR
7.3 NULL HYPOTHESIS
7.4 TESTS OF SIGNIFICANCE
7.5 LEVELS OF SIGNIFICANCE
7.6 TYPE-I AND TYPE-II ERRORS
7.7 DEGREES OF FREEDOM

UNIT-8: SELECTED TESTS OF SIGNIFICANCE
8.1 T-TEST
8.2 CHI-SQUARE (X2)
8.3 REGRESSION


FOREWORD

Knowledge is the main distinctive characteristic of human beings, due to which, by the grace of Allah, man was selected as the vice-regent of Allah Almighty. Man is superior to other living beings because he has the capability and potentiality to understand as well as to reason about consequences. Knowledge is obtained through the continuous process of education, which is usually a lifelong process.

It is also a fact that education is a bilateral and participatory activity. It cannot be accomplished without the two partners, teacher and student. The activity requires a transmitter and a receiver; if either of them is missing, the exercise remains incomplete.

Comparing the two, however, the teacher appears superior to his pupils, as he is the organiser and director of the teaching-learning process. That is why, since time immemorial, the search for capable teachers has always been in progress, and it still goes on. No doubt countable good teachers are there, but they are not countless. There is a need to produce a countless number of genuine educators and prospective educators to contribute in this regard.

With this objective in view, the people at the helm of affairs are trying their best to bring desirable changes in the education system, the teacher education curriculum and teacher training programmes. The best teacher, once only a dream, is about to become a reality; if the course outlines and syllabi are properly dispensed, it is hoped that the required number of teachers will become available. Future educators and teachers need to be well equipped in all skills, not confined to academic learning alone while ignoring ICT, current affairs and contemporary issues.

With these objectives in view, improvements in the system are being carried out to achieve the goals. The new curricula, on which this book is based, are the result of long deliberations and brainstorming undertaken by senior educators. It is now up to the implementers and the students to benefit from it in the best possible capacity.

The book is now in your hands, and it is not claimed to be the final word; there is always room for improvement. The author would be highly obliged for any comments or recommendations conveyed to make it still better.

Dr. Arbab Khan Afridi
The Author


ACKNOWLEDGEMENT

All praise and glory be to Allah Almighty, Who bestowed His blessings upon me to be able to produce this book, "Measurement & Evaluation". My humble gratitude and thanks are due to Him

with submission and heartiest admiration, Who guided me to the right path. All this became possible only due to Allah's grace and benevolence. The rays of the light of the Omnipresent Allah always took me out of the deep darkness of ignorance to the lighted path of knowledge, spreading its reflection to the needy.

My thanks and gratitude are due to my old student and now colleague, Dr. ______, who provided certain reference books and substantive material that were very beneficial for the compilation of this book. In addition, I am extremely thankful to my composer, Mr. Muhammad Nawaz Khan Abbasi, Peshawar, who provided step-by-step expert advice regarding the printing and book production process.

My thanks are also due to Dr. Muhammad Rauf, the ______ of the ______ of IER, University of Peshawar, who always took pains in searching out certain reference books for me. He always showed great enthusiasm and pleasure in complying with any of my requests regarding the bibliographies, making them available at the earliest.

Last but not least, I am thankful to my family members, who cooperated with me and made all sorts of requirements available to me during the process of preparing the primary substance of this book. They maintained a very calm and conducive environment for me during this entire period of compilation; otherwise this work could not have come to light.


Author


UNIT-1:

INTRODUCTION

1.1 EVALUATION, ASSESSMENT, MEASUREMENT AND TEST:

1.1.1 Evaluation:

Literally, the term evaluation means "appraisal", "judgment", "assessment", "calculation", "estimation" or "rating" of a thing.

According to the International Dictionary of Education (by G. Terry Page & J. B. Thomas), evaluation means a "value judgment" on an observation, performance, test or any data, whether directly measured or inferred. Evaluation is the qualitative assessment of a thing. It answers the question "How good?"

A. D. Jones defines evaluation as "the process of finding the value of something". He further says, "The process of evaluation is the attempt to find the worth of any enterprise."

The Oxford Advanced Learner's Dictionary defines the term "evaluate" as to find out or form an idea of the amount or value of something. When we evaluate something, we mean to determine the value or worth of that thing. Evaluation is, actually, the process through which we collect information about something and then make a decision in the light of that information.

So we can say that evaluation is concerned with making judgments about things. When we act as evaluators, we attribute "value" or "worth" to behavior, objects and processes. In the wider community, for example, one may make evaluative comments about a play, clothes, a restaurant, a book or someone's behavior. We may enjoy a play, admire someone's clothes, speak well of some restaurant, and so on. Invariably, these are rather simple, straightforward comments of value or worth, because such judgments are not based on appropriate and relevant data.


According to William Wiersma and Stephen G. Jurs, "effective evaluation requires judgment which is based on appropriate and relevant data". For example, to say that a film is 'good' or 'bad' is not a judgment based on appropriate and relevant data; it is, therefore, not an exact evaluation of the film. It is more meaningful to point to its well-written script, tight direction, mood-enhancing music, suitable characters and so forth, because such a judgment is based on some appropriate and relevant data. These are the characteristics upon which we can make a judgment about something.

Educational Evaluation:

The Concept of Evaluation in Education:

Educational Evaluation is a specific term which is used for the judgment of educational objectives. Educational Evaluation seeks to determine how well the student has achieved the stated objectives of the learning situation.

Different educationists have defined Educational Evaluation in

different words some of which are discussed below:

1. “Educational Evaluation is a systematic process of collecting,

analyzing and interpreting information to determine the extent to

which pupils are achieving instructional objectives”.

–– Norman E. Gronlund

2. “Educational Evaluation is the systematic process of collecting

and analyzing data in order to determine whether, and to what

degree, objectives have been or are being achieved".

–– L.R. Gay

3. “Educational Evaluation is the estimation of the growth and

progress of pupils towards objectives or values in the

curriculum”.

–– Wrightstone


4. “Educational Evaluation is defined as the process of

determining the extent to which educational objectives are

achieved by the student”.

–– Remmers

Approaches to Evaluation

Evaluation in our schools is essentially concerned with two

major approaches to making judgments:

1. Product Evaluation:

It is the evaluation of students’ performance in a specific

learning context. This kind of evaluation seeks to determine how well the

student has achieved the stated objectives of the learning situation. In this

sense the student’s performance is seen as a product of the educational

experience. A school report is an example of Product Evaluation.

2. Process Evaluation:

It is the kind of evaluation that seeks to examine the experiences and activities involved in the learning situation. It makes judgments about the process by which students acquired learning. In simpler words, it examines the process of the learning experience before it has concluded. For example, the evaluation of the nature of student-teacher interaction, instructional methods, school curricula, specific programmes, etc. are the best examples of Process Evaluation.

1. Curriculum Evaluation

Curriculum Evaluation, as is clear from the name, is the

evaluation of a certain curriculum i.e. an instructional programme. It is

used to determine the outcome of a programme and to decide whether to

accept or reject a programme. This evaluation helps in the further

development of the curriculum materials for continuous improvement.

For better learning, it is necessary to assess a new programme in order

to find out whether the desired outcomes are being achieved or not. The

use of evaluation techniques should enable the curriculum workers to

make steady progress in improving the curriculum. Curriculum

3

Page 13: Measurement and Evaluation (Book) Abbasi.docx

evaluation should not only be a means for judging educational

effectiveness, but also should lead to useful decisions that can serve as a

powerful force to improve the educational process. Careful evaluation

should demonstrate the strengths and weaknesses in the curriculum so

that necessary changes can be made in the instructional programme.

2. Programme Evaluation

Programme Evaluation is used for judging the effectiveness of a

programme or a special project. This evaluation is used to make a

decision about programme installation and modification. It helps to

obtain evidence to support or oppose a programme. Outside education,

‘programme evaluation' is used as a means of determining the

effectiveness, efficiency and acceptability of any form of programme.

But within education, we can use the term in a similar way, as in the case of evaluating the effectiveness of a new writing or reading programme in primary schools. A curriculum evaluation may qualify as a programme evaluation if the curriculum is focused on change or improvement. Programme evaluation, however, does not necessarily involve appraisal of curricula (e.g. the evaluation of a computerized student record-keeping system).

3. Personnel Evaluation

The evaluation of personnel is the assessment of the performance of working personnel in an organization. That is why it is also called performance appraisal or staff evaluation. In education, personnel evaluation is very necessary for adopting appropriate appraisal plans and procedures to achieve the goals of education. According to J. D. McNeil, "Evaluation of the performance of working personnel can be an effective instrument for helping people grow and develop in their roles. It could be used as a mechanism of continuing education and of learning from one another. Through a well-organized appraisal system every employee can create learning spaces for himself in the system in which he works." A good personnel evaluation helps the employee to recognize his or her own strengths and weaknesses in order to improve his performance in a given


role. It also helps in identifying people for the purpose of motivating,

training and developing them for new roles or existing roles.

4. Institutional Evaluation:

Institutional Evaluation is the evaluation of the total programme

of a school, college, university or other educational institution. The

evaluation of an institution is used to collect information and data on all

aspects of the function of that institution. The basic aim of this evaluation

is to determine the degree to which instructional objectives are being met

and to identify areas of strength and weakness in the total programme. An

institutional evaluation involves more than the administration of tests to

students; it may require any combination of questionnaire, interviews,

and observations with data being collected from all persons in the

institution community, including administrators, teachers, and

counsellors. The major component of institutional evaluation is the

institution testing. The more comprehensive the testing program, the

more valuable are the resulting data. That is why, for achieving the most

valuable resulting data, institutional testing programme should include

measurement of achievement, aptitude, personality and interest. Tests

selected for an institutional evaluation must match the objectives of the

institution and be appropriate for the students to be tested.

Need or Importance of Evaluation

Evaluation plays a pivotal role in the teaching-learning process. It helps in providing information about the success or failure of an educational objective. It shows whether the student has achieved the required objective or not, and to what degree the goal has been reached. So evaluation provides the relevant information that decision-makers need about the input, output and operation of a programme, and about the placement of students in programmes. Levels of understanding can be assessed, and future educational objectives set, based on student needs. Similarly, appropriate activities can be planned by the teacher based on knowledge of the attributes of the students. Evaluation also makes it easy for the teacher to formulate objectives, select content, and plan for learning experiences. It also provides a guideline about all aspects of the teaching-learning process.


Without evaluation we cannot be aware of the effectiveness or

ineffectiveness of an educational program or objective.

Evaluation is as necessary for the student as for the teacher or decision-makers. Its importance for the student is great because the whole process of education is for the benefit of the student. The student is the centre of interest in the teaching-learning process. For the student, evaluation provides feedback regarding his strengths and weaknesses. It encourages the student to study better and increases his motivation. Improvement of the teacher's teaching and the student's learning through judgment, using the available information, is the ultimate aim of the evaluation process.

In a nutshell, evaluation plays a central role in the teaching-learning process. It serves as a guiding principle for the selection of supervisory techniques and also as a means for improving school-community relations.

1.1.2 Assessment: Concept of Assessment

Literally assessment means the act of judging or assessing a

person or situation or event. It is the classification of someone or

something with respect to its worth. Assessment is a general term that

includes the full range of procedures used to gain information about

student learning (observations, ratings of performances or projects,

paper-and-pencil tests) and the formation of value judgments concerning

learning progress. A test is a particular type of assessment that typically

consists of a set of questions administered during a fixed period of time

under reasonably comparable conditions for all students (Linn and Gronlund, 2000). Assessment may include both quantitative descriptions (measurement) and qualitative descriptions (non-measurement) of students. In addition, assessment always includes value judgments concerning the desirability of the results. Assessment may or may not be based on measurement; when it is, it goes beyond simple quantitative description.


The process of collecting, synthesizing, and interpreting information to aid in decision making is called assessment. For many people, the words "classroom assessment" evoke images of pupils taking paper-and-pencil tests, teachers scoring them, and grades being assigned to the pupils based upon their performance. Assessment, as the term is used here, includes the full range of information that teachers gather in their classrooms: information that helps them understand their pupils, monitor their instruction, and establish a viable classroom culture. It also includes the variety of ways teachers gather, synthesize, and interpret that information. Assessment is a general term that includes all the ways through which teachers gather information in their classrooms.

Need for Assessment in Education

As long as there is a need for the educator to make instructional decisions, curricular decisions, selection decisions, and placement or classification decisions based on the present or anticipated educational status of the child, so long will there be a need for assessment in the educational enterprise. To the modern educator, the ultimate goal of assessment is to facilitate learning. This can be done in a number of ways, and each way requires a separate type of decision. The assessment decision also determines which type of test is to be used for assessment. Thus there is a close relationship between the purpose of evaluation, evaluation decisions and the types of tests to be used for them. The purposes of assessment are as follows:

Selection Decision

Whenever there is a choice, a selection decision has to be made. In our daily life we see that institutions and organizations need persons for their work; they get responses from several people, but they cannot take all of them. They have to make a selection out of them. Assessment of these persons is made on the basis of tests given to them. Tests provide information which helps in the selection decision: some persons will be acceptable while others will not. Similarly, the universities have to make selection decisions for admitting students to various courses. For courses in which hundreds of candidates are applicants,


the selection decision has to be made on a stronger footing. Naturally some tests are given to the candidates to help in the selection decision: aptitude tests, intelligence tests, achievement tests or prognostic tests are generally given for the purpose of selection decisions. There have been rulings from the judiciary that the scores on these tests should have a good relationship with success in the job or the course for which the test is given. If any selection test does not fulfil this requirement, it needs to be improved or replaced by a better one. Although the perfection of such tests cannot be guaranteed, any institution or organization which is interested in the best students or workers will continue to make efforts to improve the tests being used for the purpose of selection.
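The "good relationship" referred to above is usually expressed as a correlation between selection-test scores and later success. The sketch below is only an illustration with invented score pairs; it computes Pearson's correlation coefficient, a common index of such a relationship.

```python
# A minimal sketch (with invented data) of checking whether selection-test
# scores bear a good relationship to later success in the course.
import math

test_scores  = [55, 62, 70, 48, 80, 66, 74, 59]   # selection-test scores (hypothetical)
course_marks = [58, 65, 72, 50, 85, 60, 78, 54]   # later course marks (hypothetical)

def pearson_r(x, y):
    # Pearson's product-moment correlation coefficient.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

r = pearson_r(test_scores, course_marks)
print(f"Correlation between test scores and course success: r = {r:.2f}")
# A high positive r suggests the selection test is serving its purpose;
# a low r suggests the test needs to be improved or replaced.
```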

Placement Decision

Since school education should be provided to all in a welfare state, the schools must make provision for all; they cannot reject candidates for admission as the universities or colleges can do. How these candidates are placed in the different programmes of school education is determined on the basis of their assessment. Such determinations are called placement decisions. These decisions are required not only in the case of those who are at some disadvantage but also for those who are gifted and talented. The schools have to find one programme or another for all school-age children, depending upon their weaknesses or strengths. Placement tests have to be different from, and more useful than, selection tests, because they inform the decision to differentially assign students to teaching programmes. Achievement tests and interviews are generally used for placement decisions.

Classification Decisions

Assessment is also required to help in making decisions with regard to assigning a person to one of several different categories, jobs or programmes. These decisions are called classification decisions because in one particular job or programme there may be several levels or categories. To which level or category a particular person or child is assigned depends upon the results of the test. Aptitude tests, achievement tests, interest inventories, value questionnaires, attitude scales and


personality measures are used for classification decisions. There is a minor difference among classification, placement and selection: classification refers to cases where the categories are essentially unordered, placement refers to cases where the categories represent levels of teaching or treatment, and selection refers to cases where persons can be selected or rejected.

Diagnosis and Remedial Decisions

Assessment is required to locate the students who need special remedial help, for example to decide what instructional strategies the teacher should use to help a particular student or group of students so that the opportunities to achieve the objective are maximized. Aptitude tests, intelligence tests, diagnostic achievement tests, diagnostic personality measures, etc. may be used to achieve this purpose.

Feedback

It is not sufficient to assess students through a test and then do nothing after that. A good teacher will use tests for the purpose of providing feedback to students. Feedback may be effective or ineffective depending upon the circumstances. Feedback will facilitate learning if it confirms the learner's correct responses or identifies errors and corrects them. Test results made available to parents may also be used as a feedback and evaluation device. It is also to be remembered that feedback is for both the student and the teacher, because it provides information to both and helps in knowing how well students have learnt and how well the teacher has taught.

Motivation and guidance of learning: Assessment is also used to motivate students towards more study and to guide their learning. However, as a motivational device it can be used positively as well as negatively. Unfortunately, most schoolteachers use it negatively: the threat of examinations, or of refusing to grant annual promotion to the next class, can motivate students, but if they are motivated by evaluation techniques which give them more confidence in the subject, the effects will be stronger and more lasting. Aptitude tests, achievement tests, attitude scales, personality measures, interest inventories and surprise quizzes encourage students towards more study and understanding.

Assigning Marks to Students:

The instructional programme remains incomplete if it is not followed by assessment. Although no teacher chooses the teaching profession because he is interested in evaluating students, no teacher confines his job to teaching only. He regularly evaluates his students and assigns them marks; in fact, most teachers give much of their time to this purpose. If teachers do not evaluate their students and do not assign them marks or grades, how can they check the effectiveness of their teaching and the learning outcomes of the students?

Role of Assessment in Education Process

The assessment of learning takes place in an instructional context; consequently, that learning environment shapes the reasons why we evaluate, influences the purposes for evaluating as well as how we evaluate, and determines how we should use the outcomes of our assessment. Assessment is an integral part of instruction; it is not a separate entity that somehow is loosely attached to the teaching process. The instructional process and the role of evaluation in it must both be understood as background to the study of educational measurement. To that end, the role of assessment in instruction will be described using a model that explains how the teaching process works.

(A) There are many models that describe the variety of approaches to teaching found in schools, but the Basic Teaching Model (BTM), introduced by Glaser (1962), accounts for the fundamental components of most other specific teaching models, such as the Socratic approach, the individualized instruction approach, or the computer-dominated instructional approach (Joyce and Weil, 1980). Few teachers probably follow the BTM steps explicitly to guide their instructional activities. And though we do not specifically endorse the use of the BTM or any other particular model, we do advocate instructional approaches, by whatever name, that account for the fundamental functions represented in the BTM as described next.

The main purposes of the BTM are to identify the major activities of the teacher and to describe the relationships between those activities; the figure below is a diagram of the model. Our primary interest is the Performance Assessment component, but we cannot understand completely the role of evaluation without understanding how Performance Assessment affects, and is affected by, the other teaching activities. Instructional Objectives, the first component of the BTM, represents the teacher's starting point in providing instruction. What should students learn? What skills and knowledge should be the focus of instruction? What is the curriculum and how is it defined? The second component, Entering Behaviour, indicates that the teacher must try to assess the students' level of achievement and readiness to learn prior to beginning instruction. What do the students know already and what are their cognitive skills like? How receptive to learning are they? Which ones seem self-motivated? This component indicates a need for evaluation information before instruction actually begins.

Once the teacher has decided what will be taught and to whom

the teaching is to be directed, the "How?" must be determined. The

Instructional Procedures component deals with the material and methods

of instruction the teacher selects or develops to facilitate student learning.

Does the text need to be supplemented with illustration? Should small

group projects be developed? Is there computer software available to

serve as a refresher for prerequisites? At this point instruction could

begin, and often it does, but unless the teacher makes plans to evaluate students' performance, the students and teacher will never be sure when learning is complete. The Performance Assessment component helps to answer the question, "Did we accomplish what we set out to do?" Tests, quizzes, teacher observations, projects, and demonstrations are evaluation

tools that help to answer this question. Thus evaluation should be a

significant aspect of the teaching process; teaching does not occur,

according to the model, unless evaluation of learner performance occurs.


[Figure: The Basic Teaching Model (BTM). Components: (A) Instructional Objectives, (B) Entering Behaviour, (C) Instructional Procedures, (D) Performance Assessment; (E) a Feedback Loop runs from Performance Assessment back to each of the earlier components.]

The model shows a fifth component, the Feedback Loop, which can be used by the teacher as both a management and a diagnostic procedure.

If the results of evaluation indicate that sufficient learning has occurred,

the loop takes the teacher back to the Instructional Objectives component,

and each successive component, so that plans for beginning the next

instructional unit can be developed. (New objectives are needed, entering

behavior is different, and methods will need to be reconsidered.) But

when evaluation results are not so positive, the Feedback Loop is a

mechanism for identifying possible explanations. (Note the arrows that

return to each component.) Were the objectives too vaguely specified?

Did students lack essential prerequisite skills or knowledge? Was the film

or text relatively ineffective? Was there insufficient practice opportunity?

Such questions need to be asked and frequently are. However, questions

need to be asked about the effectiveness of the performance assessment

procedures also, perhaps more frequently than they are. Were the test

questions appropriate? Were enough observations made? Were directions

clear to students? The Feedback Loop returns to the Performance

Assessment component to indicate that we must review and assess the

quality of our evaluation procedures after the fact, to determine the appropriateness of the procedures and the accuracy of the information.

Unless the tools of evaluation are developed with care, inadequate

learning may go undetected or complete learning may be misinterpreted

as deficient.

In sum, good teaching requires planning for and using good evaluation tools. Furthermore, evaluation does not take place in a vacuum. The BTM shows that other components of the teaching process provide cues about what to evaluate, when to evaluate, and how to evaluate. Our purpose is to identify such cues and to take advantage of them in building tests and other assessment devices that measure achievement as precisely as possible.

(B) Assessment also serves the decision maker who is concerned about all aspects of the educational endeavour. The key point to consider and keep in mind is that evaluation involves appraisal of particular goals or purposes. Useful information may be obtained for evaluation procedures by both formal and informal means, and it should include information collected during instruction as well as end-of-course data. According to Ahmann and Glock (1985), school administrators, guidance personnel, classroom teachers, and individual students require information that will allow them to make informed and appropriate decisions regarding their respective educational activities. Ideally, they should be aware of all the alternatives open to them, the possible outcomes of each alternative, and the advantages and disadvantages of the respective outcomes. Educational and psychological measurement can help individuals with these matters.

(C) Tyler (1966), Airasian and Madaus (1972), Gronlund (1976), and Thorndike and Hagen (1977) rightly observe that the data secured through testing procedures may have the uses given below:

First, measurement data may be employed in the placement of

students on one or another instructional programme. Usually pupils take a

pretest to measure whether they have mastered the skills that are

prerequisite to admittance to a particular course or instructional sequence. For instance, foreign language and mathematics programmes

are usually arranged in some hierarchical order so that achievement at

each level of learning depends on mastery of the preceding level.

The student is led from the entering position in the hierarchy to the terminating phase via intermediate steps. Based upon the information provided by a pretest, a student can be placed:

(1) at the most appropriate point in the instructional sequence;

(2) in a programme with a particular instructional strategy; or

(3) with an appropriate teacher.

Second, measurement data can be used in formative evaluation.

Tests are administered to students to monitor their success and to provide

them with relevant feedback. The information is employed less to grade a student than to make instruction responsive to the student's strengths and weaknesses as identified by the measurement device. Mastery

learning procedures emphasize the use of formative tests to provide

detailed information about each student's grasp of a unit's objectives.

Third, measurement data has a place in diagnostic evaluation.

Diagnostic testing takes over where formative testing leaves off. When a student fails to respond to the feedback-corrective activities associated with formative testing, a more detailed search for the source of the learning difficulty is indicated. Remediation is only possible when the teacher understands the basis of a student's problem and then designs

instruction to address the need.

Fourth, measurement data may be used for summative purposes. Such testing is employed to certify or grade students at the completion of a course or unit of instruction. Often the result is 'final' and follows the student throughout his or her academic career (as in the case of college and university transcripts). It is this aspect of evaluation that some educators find particularly objectionable.

Fifth, measurement data are used by employers and educational institutions in making selection decisions. Many jobs and slots in educational programmes are limited in number, and there are more applicants than positions. In order to identify the most promising candidates, standardized tests may be administered to the applicants. The information provided by the tests presumably increases the accuracy and objectivity of administrators' decisions. College Board examinations are used by many universities in admitting students; graduate and professional schools likewise employ data from standardized testing programmes to make their entrance decisions.


Sixth, school officials use measurement data in making curricular decisions, in order to evaluate existing programmes and to decide among instructional alternatives. School administrators need to assess their students' current levels of performance and the strengths and weaknesses of existing programmes in the light of such evidence.

Seventh, measurement data find a place in personal decision-making. Individuals confront a variety of choices at any number of points in their lives. Should they attend college or pursue some other type of post-high school training? What kind of job seems most suited to their

needs? What sort of training programme should they enter? Measures of

interest, temperament, and ability can give individuals insights that can

prove helpful in the decision-making process.

Types of Assessment

Tests and other assessment procedures can be classified in terms of their functional role in classroom instruction. One such classification system follows the sequence in which assessment procedures are likely to be used in the classroom. These categories classify the assessment of student performance in the following manner:

1. Placement assessment

To determine student performance at the beginning of

instruction.

2. Formative assessment

To monitor learning progress during instruction.

3. Diagnostic assessment

To diagnose learning difficulties during instruction.

4. Summative assessment

To assess achievement at the end of instruction.

Although a single instrument may sometimes be useful for more

than one purpose (e.g., both for formative and summative assessment

purposes), each of these types of classroom assessment typically requires

instruments specifically designed for the intended use.


All these types of assessment are discussed below in detail.

Placement Assessment

This is also called Need Analysis Assessment. Placement

assessment is concerned with the student's entry performance and

typically focuses on questions such as the following: (1) Does the student possess the knowledge and skills needed to begin the planned instruction? For example, is a student's reading comprehension at a level that allows him or her to do the expected independent reading for a unit in history, or does the beginning algebra student have a sufficient command of essential arithmetic concepts? (2) To what extent has the student already developed the understanding and skills that are the goals of the planned instruction? Sufficient levels of comprehension and proficiency might indicate the desirability of skipping certain units or of

being placed in a more advanced course. (3) To what extent do the

student's interests, work habits, and personality characteristics indicate

that one mode of instruction might be better than another (e.g., group

instruction versus independent study)? Answers to questions like these

require the use of a variety of techniques: records of past achievement,

pretests on course objectives, self-report inventories, observational

techniques, and so on. The goal of placement assessment is to determine

for each student the position in the instructional sequence and the mode

of instruction that is most beneficial.

Formative Assessment

According to Gronlund (1990):

Formative assessment is used while work is in the process of being carried out, so that the assessment affects the development of the work.

Formative Assessment is a part of the instructional process.

When incorporated into classroom practice, it provides the information

needed to adjust teaching and learning while they are happening. In this

sense, formative assessment informs both teachers and students about

student understanding at a point when timely adjustments can be made.


These adjustments help to ensure students achieve targeted standards-

based learning goals within a set time frame. Although formative

assessment strategies appear in a variety of formats, there are some

distinct ways to distinguish them from summative assessments.

Formative assessment is used to monitor learning progress

during instruction; its purpose is to provide continuous feedback to both

student and teacher concerning learning successes and failures.

Feedback to students provides reinforcement of successful learning and

identifies the specific learning errors and misconceptions that need

correction. Feedback to the teacher provides information for modifying

instruction and for prescribing group and individual work. Formative

assessment depends heavily on specially prepared tests and assessments

for each segment of instruction (e.g., unit, chapter). Tests and other types

of assessment tasks used for formative assessment are most frequently

teacher made, but customized tests from publishers of textbooks and other

instructional materials also can serve this function. Observational

techniques are, of course, also useful in monitoring student progress and

identifying learning errors. Because formative assessment is directed

toward improving learning and instruction, the results typically are not

used for assigning course grades.

Diagnostic Assessment

According to Gronlund (1990):

Diagnostic assessment is concerned with those educational problems which remain unsolved even after the corrective prescriptions of formative assessment.

Diagnostic assessment is a highly specialized procedure. It is

concerned with the persistent or recurring learning difficulties that are left

unresolved by the standard corrective prescriptions of formative

assessment. If a student continues to experience failure in reading,

mathematics, or other subjects, despite the use of prescribed alternative

methods of instruction, then a more detailed diagnosis is indicated. To

use a medical analogy, formative assessment provides first-aid treatment


for simple learning problems and diagnostic assessment searches for the

underlying causes of problems that do not respond to first-aid treatment.

Thus, diagnostic assessment is much more comprehensive and detailed. It

involves the use of specially prepared diagnostic tests as well as various

observational techniques. Serious learning disabilities also are likely to

require the services of educational, psychological, and medical

specialists, and given the appropriate diagnosis, the development of an

individualized education plan (IEP) for the student. The aim of diagnostic

assessment is to determine the causes of persistent learning problems and

to formulate a plan for remedial action.

Summative Assessment

The assessment that is carried out at the end of a piece of work is called

summative assessment.

Summative assessment typically comes at the end of a course (or

unit) of instruction. It is designed to determine the extent to which the

instructional goals have been achieved and is used primarily for assigning

course grades or for certifying student mastery of the intended learning

outcomes. The techniques used in summative assessment are determined

by the instructional goals, but they typically include teacher made

achievement tests, ratings on various types of performance (e.g.,

laboratory, oral report), and assessments of products (e.g., themes,

drawing, research reports). These various sources of information about

student achievement may be systematically collected into a portfolio of

work that may be used to summarize or showcase the student's

accomplishments and progress. Although the main purpose of summative

assessment is grading, or the certification of student achievement, it also

provides information for judging the appropriateness of the course

objectives and the effectiveness of the instruction.

1.1.3 Measurement

Meaning & Definition of Measurement

Literally, the verb measure means to find or determine the 'size', 'quantity' or 'quality' of anything. According to the Chambers Dictionary, the term 'measure' means 'to find out the size or amount of something'. "Measurement", in the International Dictionary of Education (by G. Terry Page & J. B. Thomas), means "the act of finding the dimension of any object and the quantity found by such an act".

The Oxford Advanced Learner's Dictionary defines 'measurement' as the 'standard or system used in stating the size, quantity

or degree of something.' It is the way of assessing something

quantitatively. It answers the question "How much?" In other words we

can say that measurement is the quantitative aspect of evaluation. With

the help of measurement we can easily describe students' achievement by

stating their scores. These definitions show that 'measurement' is the

quantitative assessment of something. Now let's see how the term is

defined specifically in education. L. R. Gay (1985) defines measurement

as "a process of quantifying the degree to which someone or something

possesses a given trait, i.e. quality, characteristics or features."

Educational Measurement

(The concept of measurement in education)

In Education, the term 'measurement' is used in its specific

meanings. It is the quantitative assessment of the performance of a

student, teacher, curriculum or an educational program. We can say that

the quantitative score used for educational evaluation is called

measurement. The term is used for the data collected about student or

teacher performance by using a measuring instrument in a given learning

situation. It shows the exact quantity or degree of the performance, traits

or character of the person or thing to be measured. For example, instead of saying that Hamid is underweight for his age and height, we can say that Hamid is 18 years old, 5' 8" tall, and weighs only 85 pounds. Similarly, instead of saying that Hamid is more intelligent than Zahid, we can say that Hamid has a measured IQ of 125 and Zahid has a measured IQ of 88. In each of the above cases, the numerical statement is more precise,

more objective and less open to interpretation than the corresponding

verbal statement.


Steps of measurement

There are two steps in the process of measurement. The first step is to devise a set of operations to isolate the attribute and make it apparent to us. Just as a standard is used for judging the durability of a thing, in the same way educators and psychologists use various methods for testing the behaviour or performance of a student. For this purpose they often use the Stanford-Binet tests or other tests that include operations for eliciting behaviour that we take to be indicative of intelligence.

The second step in measurement is to express the results of the operations established in the first step in numerical or quantitative terms. This involves an answer to the question: how many, or how much? Just as the millimetre is used as a unit for indicating the thickness of a thing, in the same way educators and psychologists use numerical units for gauging intelligence, emotional maturity and other attributes. Thus each step in measurement rests on human-fashioned definitions. In the first step, we define the attribute that interests us; in the second step, we define the set of operations that will allow us to identify the attribute and express the result of our operations.
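As a rough illustration of these two steps, the sketch below assumes an invented ten-item objective test and one student's responses. It first applies the scoring operation (comparing each response with the key), and then expresses the result numerically as a raw score and a percentage-correct score.

```python
# A minimal sketch of the two steps of measurement, using an invented
# ten-item answer key and one student's responses (both hypothetical).

answer_key = ["B", "D", "A", "C", "C", "B", "A", "D", "B", "C"]
responses  = ["B", "D", "A", "B", "C", "B", "A", "D", "C", "C"]

# Step 1: apply a defined set of operations to elicit and isolate the
# attribute -- here, mark each response as right (1) or wrong (0).
item_scores = [1 if r == k else 0 for r, k in zip(responses, answer_key)]

# Step 2: express the result of those operations in numerical terms.
raw_score = sum(item_scores)                          # how many items correct
percent_correct = 100 * raw_score / len(answer_key)   # how much, as a percentage

print(f"Raw score: {raw_score}/{len(answer_key)}")
print(f"Percentage correct: {percent_correct:.0f}%")
```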

Difference between Evaluation and Measurement

Some people use 'evaluation' and 'measurement' with the same meaning. Both terms are used for the process of assessing the performance of the student and collecting information about an educational objective. Both tell how effective the school programme has been and refer to the collection of information, the appraisal of students, and the assessment of programmes. Some recognize that measurement is one of the essential components of evaluation. But there is a difference between

the two terms. Roughly speaking, 'measurement' is the quantitative assessment, whereas 'evaluation' is the quantitative as well as qualitative assessment of the performance of a student or an educational objective. Measurement is a limited process used for the assessment of limited and specific educational objectives; evaluation, on the other hand, is a much more comprehensive term used for all kinds of educational objectives. Moreover, evaluation is the continuous inspection of all


available information concerning the student, teacher, educational

programme and the teaching- learning process to ascertain the degree of

change in students and form valid judgements about the students and the

effectiveness of the programme. On the other hand 'measurement' is the

collection of data about the performance of a student, teacher or

curriculum etc.

However, 'evaluation' and 'measurement' are closely related. We cannot separate one from the other. Both are used for assessing the effectiveness of a programme or the appraisal of students.

Measurement collects data directly from the objects of concern, the students; other information is collected from students by non-testing procedures. Information provided by testing and non-testing procedures is best thought of as the material to be used in the evaluation process.

The Importance of Measurement in Education

Measurement plays a very important role in the teaching-learning

process. Without measurement we cannot assess the effectiveness of an

educational programme, the school or its personnel. For effective

teaching, it is necessary for the teacher to be aware of the strengths and

weaknesses of his teaching method. Similarly, for effective learning, it

is necessary for the student to be aware of the possible outcomes of all

the alternatives. He should also be informed about the advantages and

disadvantages of the respective outcomes. All this is impossible without

measurement. Without measurement, how can a teacher be aware of his method of teaching, or how can a student be informed about the outcomes of the alternatives? Without measurement, evaluation is impossible, and

without evaluation we cannot get knowledge of the effectiveness of an

educational programme. Measurement tells us about the characteristics of

students, their progress in studies and their achievements in various

subjects. It also tells how much, or to what extent, the instructional objectives of the school and the individual classroom teacher are being achieved. Measurement serves as a guideline for students to develop

their educational and vocational plans for the future. With the help of

measurement, information is gathered about school programmes, policies,


and objectives. This information is conveyed to parents and other

members of the community. Similarly, measurement data are used by employers and educational institutions in making selection decisions. With the help of standardized tests, the administrators collect information about every applicant. The information provided by the tests increases the accuracy and objectivity of the decisions of administrators and decision makers. In this way measurement data are employed by school officials

in making curricular decisions.

In short, measurement occupies the central place in the process

of teaching and learning. It is the only means through which the

educational condition can be improved.

The Function of Measurement and Evaluation

'Measurement' and 'evaluation' are interdependent. We cannot separate one from the other, just as we cannot separate the two sides of a coin. Evaluation is the qualitative aspect of anything, and it is based on the quantitative value (measurement) of that thing. Without measurement we cannot make an exact evaluation of a thing.

In this respect evaluation and measurement perform the same functions in education.

Cronbach, in his book "Essentials of Psychological Testing", has discussed the following functions of measurement and evaluation.

(1) Effectiveness of Educational Programme

In education, the concerned people and personnel must be aware of the effectiveness of an educational programme. This is possible only by making an evaluation of that programme. By evaluation a teacher is able to know to what extent his method of teaching is effective. He is also able to know to what extent the laboratory equipment is effective. This will enable him to improve his method of teaching and make the learning process effective.

(2) Prediction


After evaluation it is possible to predict the future performance of students. Through evaluation we come to know their aptitudes, interests, etc., with the help of which we guide them to seek admission in an institution that suits their aptitude and interest. So, on the basis of evaluation we can plan for the future.

(3) Selection

Measurement and evaluation are used in the selection of suitable persons for different jobs in government as well as semi-government departments.

(4) Classification

Evaluation is helpful in classification in all educational institutions. At the end of every year, tests are given to students to check their ability, and classification is made on the basis of the results obtained from these tests.

Another educational psychologist, Camp, adds that evaluation plays an important role in turning maladjusted students into useful members of society by identifying their interests and attitudes. Students suffering from an inferiority complex can also be helped after their

proper evaluation.

In short, evaluation and measurement have important functions

in education. They serve as guidelines for students, teachers, counsellors

and administrators.

1.1.4 Test

Measurement and evaluation are the two processes that are used

to collect information about the strengths and weaknesses of an

educational programme or the performance of a student, teacher or other

personnel. But these processes need some instruments for their

operations. Such instruments are called tests. So, the instruments that are

used to measure the sample of students' behaviour under specific

conditions are called tests. In other words we can say that:


"A test is a systematic procedure for measuring a sample of students'

behaviour under specific conditions."

Some other definitions of test are given below:

1. A procedure for critical evaluation; a means of determining the

presence, quality, or truth of something.

2. A series of questions, problems, or physical responses designed

to determine knowledge, intelligence, or ability.

3. The means by which the presence, quality, or genuineness of

anything is determined: (e.g. a test of a new product.)

4. The trial of the quality of something: (e.g. to put to the test.)

5. A particular process or method for trying or assessing.

6. A set of problems, questions, etc., for evaluating abilities or

performance.

A test consists of a number of questions to be answered, a series

of problems to be solved, or a set of tasks to be performed by the

examinees. The questions might ask the examinees to define a word, to

do arithmetic computations, or to give some information. The questions,

problems and tasks are called test items.

Difference between Test, Measurement and Evaluation:

William Wiersma and Stephen G. Jurs (1990) in their book

"Educational Measurement and Testing" remarks that the terms of

Testing, measurement, assessment and evaluation are used with similar

meanings but they are not synonymous though they are related with each

other. They define these terms as follows:-

Test: "(It) has a narrower meaning than either measurement or

assessment. Test commonly refers to a set of items or questions under

specific conditions. When a test is given, measurement takes place;

however, all measurement is not necessarily testing".

Measurement: "For all practical purposes assessment and measurement

can be considered synonymous. When assessment is taking place,


information or data are being collected and measurement is being

conducted".

Evaluation: "Evaluation is a process that includes measurement and

possibly testing but it also contains the notion of a value judgment. If a

teacher administers a test to a class and computes the percentage of correct responses, measurement and testing have taken place. The scores must be interpreted, which may mean converting them to values like As, Bs, Cs and so on, or judging them to be excellent, good, fair or poor. This

process is evaluation because the value judgments are being made".

Another distinction is given by Norman E. Gronlund (1985)

who defines these terms as follows in the book "Measurement and

Evaluation in Teaching".

Test: "An instrument or systematic procedure for measuring a sample of

behaviour. (Answers the question "How well does the individual perform, either in comparison with others or in comparison with a domain of performance tasks?")

Measurement: "The process of obtaining numerical description of the

degree to which an individual possesses a particular characteristic.

(Answers the question "How much?").

Evaluation: "The systematic process of collecting, (Classroom)

analyzing and interpreting information to determine the extent to which

pupils are achieving instructional objectives. (Answers the question

"How good").

Similarly Anthony J. Nitko (1983) in his book "Educational

Tests and Measurement" makes the distinction between Test,

Measurement and Evaluation in the following words:

Tests: "Tests are systematic procedures for observing persons and

describing them with either a numerical scale or a category system. Thus

tests may give either qualitative or quantitative information".

Measurement: "Measurement is a procedure for assigning numbers to

specified attributes or characteristics of persons in a manner that


maintains the real world relationships among persons with regard to what

is being measured".

Evaluation: "Evaluation involves judging the value or worth of a pupil

of an instructional method or of an educational program. Such

judgements may or may not be based on information obtained from

tests".

Robert L. Ebel and David A. Frisbie (1986) in their book

"Essentials of Educational Measurement" rightly observe.

"All tests are a subset of the quantitative tools or techniques that

are classified as measurements. And all measurement techniques are a

subset of the quantitative and qualitative techniques used in evaluation."

Table showing the relationship between Testing, Measurement and Evaluation:

Test: An instrument or systematic procedure for measuring a sample of behaviour. It answers the question "How well does the individual perform, as compared to others?" It is a means of collecting information. Its objective is to find out the facts pertaining to some aspect. A test is only an instrument to obtain data.

Measurement: The process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. It answers the question "How much?" It gives a numerical value to some trait. Its objective is to present the information objectively. Measurement quantifies data and is an essential part of evaluation.

Evaluation: A systematic process of collecting and analyzing data in order to make decisions. It answers the question "How good?" It involves qualitative and quantitative assessment and decision-making. Its objective is to make decisions about all components of the educational system. Evaluation depends upon testing and measurement for data.

Types of Tests

[Diagram: TESTS are divided into Ability Tests and Personality Tests. Ability Tests comprise Achievement Tests (Objective Tests and Essay Tests), Aptitude Tests and Intelligence Tests. Personality Tests comprise Attitude Tests, Character Tests, Interest Tests and Adjustment Tests.]

As it is shown in the diagram above, tests can be classified into

two broad categories according to the behaviour tested: ability tests and

personality tests. These two types are discussed in detail and are further

classified into sub-types in the following lines.

(A) Ability Tests

These tests are used to test the ability of a student. They measure the maximum performance of a student, i.e. the best that a student can do.

Ability tests are further classified into three types; (1) achievement tests,

(2) aptitude tests, (3) intelligence tests. These are discussed in the lines

below.


(1) Achievement tests: These tests are used to appraise the

outcomes of classroom instruction. They measure the attained ability of a

student i.e. what a student has learnt to do. Achievement tests are further

classified into two types of tests i.e. 'Essay type tests' and 'objective type

tests'. (These two types of tests will be discussed in detail in the next

question).

(2) Aptitude tests: Aptitude Tests are those tests that are used to

measure the potential ability of a student i.e. what a student can learn to

do. They measure the capacity of a student to learn a given content.

According to Hull, C. L. "An aptitude test is a psychological test

designed to predict an individual's potentialities for success or failure in a

particular occupation, subject of study, etc." This shows that an aptitude

test is a test designed to discover what potentiality a given person has for

learning some particular vocation or acquiring some particular skill.

Achievement tests and aptitude tests seem to be the same. But the

distinction between the two is that they are different in use. If a test is

used to measure the present attainment, it is called achievement test. And

if a test is used to predict the future level of performance, it is called an

aptitude test.

(3) Intelligence Tests: Intelligence Tests are those tests that are

used to measure the native capacity or the overall mental ability of a

student. These are also called scholastic aptitude tests or tests of mental

ability. There are many kinds of intelligence tests, but the most popular concept is the intelligence quotient (IQ), introduced by Terman. IQ is computed by dividing the mental age (MA) of a student by his physical or chronological age (CA), i.e. the actual age of the student, and then multiplying the result by 100.

I.Q = (M.A / C.A) × 100

Where:

I.Q. = Intelligence Quotient

M.A. = Mental Age

C.A. = Chronological Age (Physical Age)
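As a small illustration only (not part of the original text; the ages used are invented), the ratio IQ computation can be written in a few lines of Python:

    def iq(mental_age, chronological_age):
        """Ratio IQ: mental age divided by chronological age, multiplied by 100."""
        return (mental_age / chronological_age) * 100

    # A student with a mental age of 12 and a chronological age of 10:
    print(iq(12, 10))  # 120.0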

(B) Personality Tests

Tests used for the assessment of personality of a student are

called personality tests. They measure the typical performance of a

student, i.e. what a student will do. They are administered

almost all over the world in various fields, vocations, institutions, and for

the selection of recruits. In Pakistan, too, personality tests are used for job

selection and for the selection of army recruits like ISSB examinations.

Personality tests include attitude tests, interest tests, adjustment and

temperament tests, character tests, and tests of other motivational and

interpersonal characteristics.

Uses of Tests

Tests play an important role in the teaching-learning process. Without tests we can neither evaluate a student's or a teacher's performance nor collect information about the effectiveness of an educational programme. That is why tests are very

important in education. They motivate students for learning. They serve a

number of purposes in a variety of educational activities. The following

are the different uses of tests;

1. Uses of tests in teaching process

With the help of the results obtained from tests, a teacher can easily collect information about the aptitude, intelligence, interests, attitudes and overall performance of the students. He comes to know the strengths and weaknesses of his teaching method. It becomes easy for the teacher to grade students in a subject. Test results also enable him to predict the future success of a student in a subject.

2. Uses of tests in learning process

The student is the centre of interest in the teaching-learning process. All kinds of educational activities are performed for the sake of the student. That is why the use and importance of tests in the process of learning is greater than in any other activity. Tests help students in knowing their strengths and weaknesses in a subject. The results obtained from these tests serve as guidelines for students. They motivate

students to study.

3. Uses of Tests in Guidance

Tests show the overall performance of the students. Therefore, they enable the examiner to know how to guide students' educational and vocational choices. Tests also make parents aware of the aptitude of their children so that they can make a plan for their proper guidance. The results of the tests in themselves serve as a guideline for the students.

4. Uses of Tests in Administration

The results obtained from the tests provide the administrators of

the department with useful information. In the light of these tests, they can easily decide how to promote students, how to admit them and how to modify school objectives, instructional methods and curricula.

They can then easily decide how to make the teaching–learning processes

effective.

5. Uses of Tests in Research

The data collected from tests are used as powerful tools in research and experimentation in the classroom. Research workers use these data in their genetic or case study research.

In short, tests are used in almost all educational activities. They

are the real tools with the help of which information about teachers,

students, curricula, etc. is gathered. In the light of this information, the teaching and learning process is improved.

1.2 THE PURPOSE OF TESTING

Introduction:

The purpose of a test is usually explained when the test is announced, or at the beginning of the semester when the evaluation procedures are described as part of the general orientation to the course. Should there be any doubt whether the purpose of the test is clear to all pupils, however, it could be explained again at the time of testing. This is usually done orally. The only time a statement of the purpose of the test needs to be included in the written directions is when the test is to be administered to several sections taught by different teachers; a written statement of purpose then ensures greater uniformity. There are various types of tests being applied in educational institutions, because no single test can measure a child's ability, interests and personality. One test measures only a specific ability; that is why school administrators use many different types of tests. Even in one single area such as intelligence, more than one test is needed over a period of years to obtain a reliable estimate of ability. Each test serves its own purpose; however, testing and evaluation serve the following purposes.

Types of Testing:

There are four types of testing.

Placement Testing:

Most placement tests constructed by classroom teachers are

pretests designed to measure:

1. Whether pupils possess the prerequisite skills needed to succeed in a unit or course, or

2. To what extent pupils have already achieved the objectives of

the planned instruction.

In the first instance we are concerned with the pupils' readiness

to begin the instruction. In the second we are concerned with the

appropriateness of our planned instruction for the group and

with proper placement of each pupil in the instructional

sequence.

Formative Testing:

Formative tests are given periodically during instruction to monitor pupils' learning progress and to provide ongoing feedback to pupils and teacher. Formative testing reinforces successful learning and reveals learning weaknesses in need of correction. A formative test typically covers some predefined segment of instruction and samples a rather limited set of learning tasks. The test items may be easy or difficult, depending on the learning tasks in the segment of instruction being tested. Formative tests are typically criterion-referenced mastery tests, but norm-referenced survey tests can also serve this function. Ideally, the test will be constructed in such a way that corrective prescriptions can be given for missed test items or sets of test items. Because the main purpose of the test is to improve learning, the results are seldom used for assigning grades.

Diagnostic Testing:

Diagnosis of persistent learning difficulties involves much more than diagnostic testing, but such tests are useful in the total process. The diagnostic test takes up where the formative test leaves off: if pupils do not respond to the feedback-corrective prescriptions of formative testing, a more detailed search for the source of the learning errors is needed. For this we will need to include a number of test items in each specific area, with some slight variation from item to item. In diagnosing pupils' difficulties in adding whole numbers, for example, we would want to include addition problems containing various number combinations, with some not requiring carrying and some requiring carrying, to pinpoint the specific types of error each pupil is making. Because our focus is on the pupils' learning difficulties, diagnostic tests must be constructed in accordance with the most common sources of error that pupils encounter. Such tests are typically confined to a limited area of instruction, and the test items tend to have a relatively low level of difficulty.

Summative Testing:

The summative test is given at the end of a course or unit of

instruction, and the results are used primarily for assigning grades or

certifying pupil mastery of the instructional objectives. The result can

also be used for evaluating the effectiveness of the instruction. The end of

the course test (final examination) is typically a norm-referenced survey

test that is broad in coverage and includes test items with a wide range of


difficulty. The more restricted end-of-unit summative test might be norm-referenced or criterion-referenced, depending on whether mastery or

developmental outcomes are the focus of instruction.

Purpose of Testing:

1. To Certify Pupils’ Achievements / Grading:

Tests are given to the students to ascertain their achievements. Tests provide the teacher with students' actual achievements instead of an intuitive generalization based on simple observation. These tests give the teacher an objective and comprehensive picture of each pupil's progress. This is important because all concerned persons (students themselves, students' parents, teachers, counselors, administrators, employers, admission officers, and even the community) need to know how students performed in school and in particular courses.

To report Student’s Progress to Parents:

Testing gives the teacher an objective and comprehensive picture of each pupil's progress, so that it can be presented to the parents. These reports form the foundation for effective cooperation between parents and teachers, which results in improved learning.

To Report to Administrators:

The results of tests indicate the extent to which the school’s

objectives are being achieved. From the results of evaluation the administrators are able to identify the weaknesses and strengths in

the teaching programs of their schools and take necessary action for their

improvement.

To Assess Learner’s Needs:

Testing the pupils' knowledge and skills at the beginning of instruction enables the teacher to answer questions like: Do the pupils

possess the abilities and skills needed to proceed with the instruction?

What, and to what level have the pupils already mastered the intended

outcomes? This information helps the teacher in planning his

instructional activities.


To Provide Relevant Instruction:

Testing provides a type of continuous feedback about the usefulness of the instructional process. It helps the teacher in changing and adapting the instructional activities continuously according to the students' needs.

To Furnish Instruction:

Testing functions as an instructional device: it not only increases the self-knowledge of the students but also promotes the attainment of specific objectives. The practice of giving 'tests' is common in our institutions; through these the students become aware of their speed of progress, errors, and present status, on the basis of which they plan their further

efforts.

To Provide Guidance and Counseling:

The results of tests are especially useful for guidance and

counseling of the students. These are useful in assisting the students with

educational and vocational decisions, guiding them in the selection of

curricular and co-curricular activities, and helping them solve personal

and social adjustment problems.

To Know the level of Achievement of Objectives:

The first step in the instructional process is to determine the

extent to which the pupils achieved the instructional objectives. Testing

and evaluation help in this regard: tests are useful in determining the

learning outcomes of classroom instruction. The teacher can evaluate the

success or failure of classroom learning in relation to the test results. The

teacher then accordingly adjusts the level and direction of classroom

instruction.

To Analyze the Instruction Objectives:

The information from carefully developed tests and evaluation is

used to assess the appropriateness and attainability of the instructional


objectives. The instructional objectives are modified in the light of the

evaluation information.

To Discover Maladjusted Children:

In every school there are some students who present severe

problems of educational or social adjustment. These include the

withdrawn, the unhappy, the mentally retarded, and others who are not

adjusting to the pattern of the school. The standardized tests help the

teachers and counselors to understand and help such students.

To Appraise Educational instrumentalities:

Testing and evaluation are useful in the appraisal of educational instrumentalities such as teachers, teaching methods, teaching materials and textbooks.

To Conduct Research:

Test and evaluation data are important in research programs. The information obtained from evaluation is used to compare the effectiveness of different curricula, different teaching methods, different organizational plans and techniques of evaluation, and to find out ways to improve the teaching-learning process.

To Change the Curricula:

One purpose of the tests and evaluation is to find out the weak

points in the curriculum so that it could be changed in accordance with the needs of the society.

To measure Behavior in Controlled Situation:

Another purpose of tests is to measure the behavior of the

subject or student under controlled conditions.

1.3 GENERAL PRINCIPLES OF ASSESSMENT:

Assessment is an integrated process for determining the nature

and extent of student learning and development. In order to make this

process effective, the following principles are taken into consideration.


1) Clearly specifying what is to be assessed has priority in the assessment process. The effectiveness of assessment depends as

much on a careful description of what to assess as it does on the

technical qualities of the assessment procedure used. When

assessing student learning, this means clearly specifying the

intended learning goals before selecting the assessment

procedures to use.

2) An assessment procedure should be selected because of its

relevance to the characteristics or performance to be measured.

Assessment procedures are frequently selected on the basis of

their objectivity, accuracy or convenience, but these criteria are secondary to relevance.

3) Comprehensive assessment requires a variety of procedures. No

single type of instrument or procedure can assess the vast array

of learning and development outcomes emphasized in a school

program. Multiple choice and short answer tests of achievement

are useful for measuring knowledge, understanding, and

application outcomes, but essay tests and other written projects

are needed to assess the ability to organize and express ideas. A

complete picture of student achievement and development

requires the use of many different assessment procedures.

4) Proper use of assessment procedure requires an awareness of

their limitations. Assessment procedures range from very highly

developed measuring instruments to rather crude assessment

devices. Even the best educational and psychological measuring

instruments yield results that are subject to various types of

measurement error.

No test or assessment asks all the questions or poses all the

problems that might appropriately be presented in a

comprehensive coverage of the knowledge, skills and

understanding relevant to the content standards or objectives of

a course or instructional sequence. Instead only a sample of the

relevant problems or questions is presented.

Even in a relatively narrow part of a content domain, such as

understanding photosynthesis or the addition and subtraction of


fractions, there are a host of problems that might be presented,

but any given test or assessment samples but a small fraction of

those problems. Limitations of assessment procedures do not

negate the value of tests and other types of assessments. A keen

awareness of the limitations of assessment instruments makes it

possible to use them more effectively. The cruder the instrument, the greater its limitations and, consequently, the more caution

required in its use.

5) Assessment is a means to an end, not an end in itself. The use of

assessment procedures implies that some useful purpose is being

served and that the user is clearly aware of this purpose. To

blindly gather data about students and then file the information

away is a waste of both time and effort. Assessment is best

viewed as a process of obtaining information on which to base

educational decisions.

1.4 TYPE OF EVALUATION PROCEDURE

The evaluation process can basically be carried out at two main

levels: programme and student. Student evaluation can further be subdivided into formative and summative evaluation.

Programme Evaluation: Programme evaluation is a systematic method for

collecting, analyzing, and using information to answer questions about

projects, policies and programs, particularly about their effectiveness and

efficiency.

When our concern is judging the compatibility between the aims and the

learning outcomes of a programme, the emphasis is on the efficacy of


that programme. On the other hand a ‘good’ programme may be badly

implemented. The task of quality control is to maintain and maximize

the efficiency of a programme.

The quality of the content of a programme is determined, among other

factors, by

i. Its conceptual quality.

ii. Logical relevance to the needs of the student.

iii. Simplicity and comprehensibility in terms of readability and

literacy level of the content.

iv. Relative stability and survival value in the literature.

v. Applicability to familiar and novel situations.

No matter how good a programme may be, the maintenance system must be well facilitated. The school administrators, heads of subject units, supervisors, teachers and pupils must be actively involved if successful implementation of the programme is to be realized. The teacher, being the main executor of the programme, must be well trained, not just to be able to teach facts but to select facts that relate to

other facts and principles. The teacher education programmes in the

advanced teacher colleges and the universities must prepare teachers to

be able to teach their subjects effectively.

In order to be implemented a programme should be designed in

such a way that under favourable conditions certain intended learning

outcomes will emerge. The school teacher, the headmaster and supervisor

must gather information from time to time in order to determine the success or weakness of the programme. If desirable outcomes are observed, the focus of all concerned with instruction should be to improve the programme through an effective maintenance system. If the products (students) produced are of poor quality, corrective measures are selected and applied in order to achieve the desired results. If, after all these efforts, the products are still found to be poor, the programme is usually

abandoned.


Several processes are involved in the input-output process. The teacher is the most important component of the maintenance process of the programme. He interacts with the students, with the staff, experts and administrators, and forms a bridge between them and the learning materials. Often he acts as the input analyzer and an identifier as well as the teaching agent of the programme. The external sensor examines the learning environment to identify changes, perhaps economic, political, psychological or social, within the environment that can destabilize the

system.

The input analyzer processes all the information supplied by the

external sensor and transmits it to the school administrator for appropriate

action. He analyze and organize information obtained form the input

variable into a comprehensible structure to be used in planning activities.

The identifier (usually the teacher or his head of department) examines

the out put and the internal working conditions of the maintance system.

It is he who provide the decision rules (head master) with a realibel

picture of the internal condition of the system. The input output

information provided by the analyzer and the identifier becomes the input

of decision rule and it is utilized by the headmaster to produce a decision

policy or instruction to the teacher.

Any given programme introduced into a school setting is not left in its naked form but assumes a different form for that setting. Its contents are emphasized differently by teachers, administrators and students.

Programme evaluation can be carried out through the use of surveys, interviews, experimental studies and so on.

Student Evaluation:

As pointed out earlier, testing forms an integral part of student

evaluation. The purpose of this type of evaluation is to determine how

well a student is performing in a programme. Through a series of oral questions, paper-and-pencil tests, manipulative skill tests, tutorial discussions, individualized instruction, assignments, projects and so on, the student is gradually guided towards a desired goal.

Basically there are two types of student evaluation.

i. Formative and ii. Summative

i. Formative Evaluation:

Formative evaluation aims at ensuring a healthy acquisition and

development of knowledge and skills by students. Formative evaluation

is also used to identify students' needs in order to guide them towards desirable goals. As student needs and difficulties are identified, appropriate remedial measures are taken to solve such problems. The purpose is to find out whether, after a learning experience, students are able to do what they were previously unable to do. A short-term objective of formative evaluation may be to help students perform well at the end of the programme. It is a process of channeling input variables through a process that will yield expected outputs. The classroom teacher is the best formative evaluator. Formative evaluation attempts:

1. To identify the content (knowledge or skills).

2. To appreciate the level of cognitive abilities such as

memorization, classification, comparison, analysis, explanation,

quantification, application and so on.

3. To specify the relationship between content and levels of

cognitive abilities.

In other words, formative evaluation provides the evaluator with useful information about the strengths or weaknesses of the student within an instructional context.

1. Formative evaluation is done during an instructional programme.

2. The instructional programme should aim at the attainment of certain objectives during the implementation of the programme.

3. Formative evaluation is done to monitor learning and modify the programme, if needed, before its completion.

4. Formative evaluation is for current students.


Characteristics of Formative Evaluation

1. It focuses relatively on molecular analysis.

2. It is cause seeking.

3. It is interested in the broader experience of the programme users.

4. Its design is exploratory and flexible.

5. It seeks to identify influential variables.

6. It requires analysis of instructional material for mapping the

hierarchical structure of the learning tasks and actual teaching of

the course for a certain period.

ii. Summative Evaluation

Summative evaluation is primarily concerned with the purposes, progress and outcomes of the teaching-learning process. It attempts, as far as possible, to determine to what extent the broad objectives of a programme have been achieved. It is based on the following assumptions:

1. That the programme's objectives are achieved.

2. That the teaching-learning process has been conducted efficiently.

3. That the teacher-student-material interactions have been conducive to learning.

4. That there is uniformity in classroom conditions for all learners.

Unlike formative evaluation, which is guidance oriented, summative evaluation is judgmental in nature. Promotion examinations, the first school leaving certificate examination and public examinations belong to this form of evaluation. Summative evaluation carries a threat with it, in that the student may have no knowledge of the evaluator. According to A. J. Nitko (1983), summative evaluation is concerned with an already completed programme, procedure or product. Summative evaluation is done at the conclusion of instruction and measures the extent to which students have attained the desired outcomes.

Chief Characteristics of Summative Evaluation:

1. It tends to the use of a well-defined evaluation design.

2. It focuses on analysis.

3. It provides descriptive analysis.

4. It tends to stress local effects.

5. It is unobtrusive and non-reactive as far as possible.

6. It is concerned with a broad range of issues.

7. Its instruments are reliable and valid.

Difference between the Summative and Formative Evaluation

In the beginning these terms were applied to the evaluation of curricular work only. M. Scriven explains the difference between these terms as follows in his book Evaluation Thesaurus (1980).

"Formative evaluation is conducted during the development or improvement of a programme or product (or person). It is an evaluation conducted in-house, but it may be done by an internal or external evaluator or (preferably) a combination. Summative evaluation, on the other hand, is conducted after completion of a programme (or a course of study) and for the benefit of some external audience or decision maker (e.g. a funding agency or future possible users), though it may be done by an internal or an external evaluator or by a combination".

Gloria Hitchcock and others (1986) state the difference between summative and formative evaluation in these words: "It is fairly straightforward to produce an 'ideal' type of either a summative or a formative profile. It is far more difficult to combine the two into one unified system. The underlying philosophies of the two appear difficult to reconcile".

Following are the main differences between these types of

evaluation:

1. They differ in purpose, nature and timing.

2. Summative evaluation is the terminal assessment of

performance at the end of instruction but formative evaluation is

the assessment made during the instructional phase to inform the teacher about the progress of learning and what more is to be done.

3. Summative evaluation limits the use of profiles and records of achievement, but they are regularly used in formative evaluation.

4. In summative evaluation, the assessment is done to test learning outcomes against a set of objective criteria without revealing to the teacher the details of the route which the student followed in reaching that point. Formative evaluation takes the form of a

dialogue between the student and teacher in which both

determine the task.

Broad Differences between Formative and Summative Evaluation

Purpose: Formative evaluation monitors the progress of students and provides feedback; summative evaluation checks the final status of students.

Content focus: Formative evaluation has a detailed, narrow scope; summative evaluation has a general, broad scope.

Methods: Formative evaluation uses daily assignments and observations; summative evaluation uses projects.

Frequency: Formative evaluation is carried out daily; summative evaluation is carried out weekly, quarterly, etc.

1.5 NORM- REFERENCED AND CRITERION REFERENCED TEST:

A test designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group is called a norm-referenced test. A norm group may be made up of students at the local level, district level, provincial level or national level.

Types of Norms: There are two types of norms, which are the following.

a) National Norms: Most standardized achievement and aptitude

tests require national norms because the tests are intended for use across the country. The norm group should represent the population of students in the country.

b) Local Norms: There are many communities where local norms

are more useful than national norms. For example, there may be some cities where the citizens are above national averages in educational and socioeconomic level.


Characteristics of Norm-Referenced Test

1. Its basic purpose is to measure students' achievement of curriculum-based skills. Therefore it covers the majority of the course.

2. It is prepared for a particular grade level. As the test is

curriculum based, it can only be applied to the particular class for which it is prepared.

3. It classifies achievement as above average, average or below

average for a given grade.

4. It is generally reported in the form of percentile rank, linear

standard score, normalized standard score and grade equivalent

score.

5. A norm-referenced test is likely to have items that are very difficult for the grade level so that students can be ranked.

Drawbacks of Norm–Referenced Test

1. Test items that are answered correctly by most of the pupils are not included in these tests because of their inadequate contribution to response variance, even though they may be the items that deal with the important concepts of the course content.

2. Norm-referenced tests compare an individual's performance to the performance of a group called the norm group. An entirely different conclusion will be reached if, for example, the norm group is a collection of university seniors majoring in physics.

a) Criterion – Referenced Test:

1. According to Gronlund (1985) a test designed to provide a

measure of performance that is interpretable in terms of a clearly

defined and delimited domain of learning tasks is called a criterion-referenced test.


2. According to Wiersma and Jurs (1990), criterion-referenced tests describe the performance of the student in terms of the actual skills or tasks that are included in the test.

b) Characteristics of Criterion-Referenced Test:

1. It measures student’s achievement of curriculum based skills.

2. It is prepared for a particular grade or course level.

3. It has balanced representation of goals and objectives.

4. It can be administered before and after instruction.

5. It is used to evaluate the curriculum plan, instructional progress

and group student’s interaction.

c) Limitation of CRTS:

CRTs tell only whether a learner has reached proficiency in a task area but do not show how good or poor the learner's level of ability is.

Tasks included in CRTs may be highly influenced by a given teacher's interests or biases, leading to general validity problems.

Only some areas readily lend themselves to the listing of specific tasks from which tests can be built, and this may be a constraining element for the teacher.

1.6 EDUCATIONAL ASSESSMENT:

“Educational assessment can be defined as the process of

documenting knowledge, skills, attitudes and beliefs".

Or

"The process of collecting, synthesizing and interpreting information in order to make assessment decisions."

General Principles of Assessment:

Following are the main principles of assessment.

1. Clearly specifying what is to be assessed has priority in the

assessment process.

2. An assessment procedure should be selected because of its

relevance to the characteristics or performance to be measured.


3. Comprehensive assessment requires a variety of procedures.

4. Proper use of assessment procedures requires an awareness of

their limitations.

5. Assessment is a means to an end, not an end in itself.

Clearly specify what is to be assessed:

General statements from content standards or from course objectives can be a helpful starting point, but in most cases teachers need to be more specific for the assessment process to be effective. Thus

specification of the characteristic to be measured should precede the

selection or development of assessment procedures. Specify the intended

learning goals before selecting the assessment procedure to use.

Example:

A content standard in the field of physics might specify that students should understand the important ideas and documents in the field of physics.

1. Assessment may be in the form of multiple-choice questions,

2. Short-answer questions,

3. Essay questions, or

4. Numerical questions.

To establish assessment priorities for such a standard, the teacher needs to answer questions such as the following.

Q1. What ideas?

Q2. What documents?

Q3. What concepts of physics?

The general statements in the standard do not answer such questions, but they must be answered, either explicitly or implicitly, to develop

assessments.

Assessment must be relevant to the performance to be measured:

Assessment procedures are frequently selected on the basis of their objectivity, accuracy or convenience. Although these criteria are important, they are secondary to the main criterion of relevance.


Examples:

If the teacher's goal is that students should learn writing skills, such as creative writing, composition and sentence structure, then multiple choice would be a poor option for assessment; the teacher must include story writing, essays, summaries and similar tasks for assessing and improving the writing skills of a child.

"A close match between the intended learning goals and the type of assessment is a must."

Comprehensive assessment requires a variety of procedure:

A variety of procedures is required to assess the knowledge of a person about anything. The things which are to be assessed also play a vital role in the choice of procedure. Some of the procedures are given below:

Multiple choice

Short answer

Essay test

Written projects

Observational technique

Multiple-choice and short-answer tests of achievement are useful for measuring knowledge, understanding, and application outcomes, but essay tests and other written projects are needed to assess the ability to organize and express ideas. Projects that require students to formulate problems and accumulate information through library research or collect data (e.g. through experimental observations or interviews) are needed to measure certain skills in formulating and solving problems; observational techniques are needed to assess performance skills and various aspects of students' behavior; and self-report techniques are useful for assessing interests and attitudes. A complete picture of students' achievement and development requires the use of many different assessment procedures.


Proper use of Assessment Procedure Requires an Awareness of their Limitations

No single test can assess everything the teacher wants; every procedure has its plus points and its negative points, or may simply be unsuitable for the things to be assessed. So one must know about these limitations and take care of them so that correct assessment results can be obtained.

Some of the major problems are:

1. Sampling error

2. Chance factor

3. Incorrect interpretations

Sampling Error:

An achievement test may not adequately sample a particular

domain of instructional content. An observational instrument design to

assess a student’s social adjustment may not sample enough behavior for

a dependable index of this trait.

Sampling error can be controlled through careful application of

established measurement procedures.

Chance Factor:

A second source of error is caused by chance factors influencing

assessment results, such as guessing on objective tests, subjective scoring on essay tests, errors in judgment on observation devices and inconsistent responding on self-report instruments.

Through the careful use of assessment procedures we are able to keep these errors of assessment to a minimum.

Incorrect Interpretation:

The incorrect interpretation of measurement results constitutes

another major source of error. We often treat results as more precise than they really are, and that is why this problem exists. Results must be interpreted accurately.


Assessment is a means to an end, not an end in itself:

The use of assessment procedures implies that some useful purpose is being served and that the user is clearly aware of this purpose. To blindly gather data about students and then file the information away is a waste of time and effort.

Assessment is best viewed as a process of obtaining information on

which to base educational decisions.

Conclusion:

All the principles are very important because they are directly linked with the interpretation of information; if their requirements are not fulfilled, then the assessment will be wrong.


UNIT-2:

JUDGING THE QUALITY OF THE TEST

Definition: Test percentile scores are just one type of test scores you will

find on your child's testing reports. Many test reports include several

types of scores. Percentile scores are almost always reported on major achievement tests that are taken by your child's entire class. Percentile scores

will also be found on individual diagnostic test reports. Understanding

test percentile scores is important for you to make decisions about your

child's special education program.

Test percentile scores are commonly reported on most standardized assessments a child takes in school. Percentile literally means per hundred. Percentage scores on teacher-made tests and homework assignments are developed by dividing the student's raw score on her work by the total number of points possible. Converting decimal scores to percentages is easy: the number is converted by moving the decimal point two places to the right and adding a percent sign. A score of .98 would equal 98%.
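As a brief sketch only (the raw score and points possible below are invented, not taken from the text), this raw-score-to-percentage conversion can be written in Python:

    def percentage_score(raw_score, total_points):
        """Raw score divided by the total points possible, expressed as a percent."""
        return (raw_score / total_points) * 100

    print(percentage_score(49, 50))  # 98.0, i.e. a decimal score of .98 equals 98%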

Test percentiles on a commercially produced, norm-referenced or

standardized test, are calculated in much the same way, although

the calculations are typically included in test manuals or calculated with

scoring software.

If a student scores at the 75th percentile on a norm-referenced test, it

can be said that she has scored at least as well as, or better than, 75 percent of students her age from the normative sample of the test. Several other types of standard scores may also appear on test reports.

Percentile rank


The percentile rank of a score is the percentage of scores in its frequency

distribution that are the same or lower than it. For example, a test score


that is greater than or equal to 75% of the scores of people taking the test

is said to be at the 75th percentile rank.

Percentile ranks are commonly used to clarify the interpretation of scores

on standardized tests. In test theory, the percentile rank of a raw score is interpreted as the percentage of examinees in the norm group who scored at or below the score of interest.

Percentile ranks (PRs) are uniform and rectangular in shape, while normal curve equivalents (NCEs) are normally distributed (bell-shaped).

Percentile ranks are not on an equal-interval scale; that is, the difference

between any two scores is not the same between any other two scores

whose difference in percentile ranks is the same. For example, 50 − 25 = 25 is not the same distance as 60 − 35 = 25 because of the bell-curve

shape of the distribution. Some percentile ranks are closer to some than

others. Percentile rank 30 is closer on the bell curve to 40 than it is to 20.

The mathematical formula is

PR = ((c + 0.5 f) / N) × 100%

where c is the count of all scores less than the score of interest, f is the frequency of the score of interest, and N is the number of examinees in the

sample. If the distribution is normally distributed, the percentile rank can

be inferred from the standard score.
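The two ways of arriving at a percentile rank described above can be sketched in Python; this is an illustration only, and the list of scores is invented rather than taken from the text:

    from math import erf, sqrt

    def percentile_rank(scores, score_of_interest):
        """PR = ((c + 0.5 * f) / N) * 100, where c is the count of scores below,
        f is the frequency of the score of interest, and N is the sample size."""
        c = sum(1 for s in scores if s < score_of_interest)
        f = scores.count(score_of_interest)
        n = len(scores)
        return (c + 0.5 * f) / n * 100

    def percentile_rank_from_z(z):
        """For normally distributed scores, PR can be inferred from the standard
        score z as 100 times the cumulative normal probability at z."""
        return 100 * 0.5 * (1 + erf(z / sqrt(2)))

    scores = [40, 45, 50, 50, 55, 60, 65, 70, 75, 80]
    print(percentile_rank(scores, 60))          # 55.0 (5 scores below, 1 equal, N = 10)
    print(round(percentile_rank_from_z(0.67)))  # about 75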

2.1 VALIDITY, METHODS OF DETERMINING VALIDITY:

Introduction:

Tests play a central role in the evaluation of pupil learning. They

provide relevant measures of many important learning outcomes. Tests

and other evaluation instruments serve a variety of uses in the school, for

example, tests of achievement might be used for selection, placement,

diagnosis or certification of mastery.

When constructing or selecting tests and other evaluation

instruments, the most important question is: to what extent will the interpretation of the scores be appropriate, meaningful and useful for the intended application of the results? So validity is always concerned with

the specific use of results.

Factors Influencing Validity:

A careful examination of test items will indicate whether the test appears to measure the subject matter content and the mental functions that the teacher is interested in testing. Following are the factors that prevent the test items from functioning as intended and thereby lower the

validity of the interpretation.

1. Unclear Direction:

Directions that do not clearly indicate to the pupil how to

respond to the items will reduce validity.

2. Reading Vocabulary and Sentence Structure too Difficult:

Vocabulary and sentence structure that is too complicated for

the pupils will distort the meaning of the test results.

3. Inappropriate level of difficulty:

Items that are too easy or too difficult also lower validity.

4. Poorly constructed items:

Test items that provide clues to the answers will measure the

pupil’s alertness in detecting clues as well as those aspects of

pupil performance that the test is intended to measure.

5. Ambiguity:

Ambiguous statements confuse the pupils and may even cause the items to discriminate in a negative direction.

6. Inadequate time limits:

Time limits that do not provide pupils with enough time to

consider the items reduce the validity.

7. Test too short:

If the test is too short to provide a representative sample of the

performance we are interested in, its validity will suffer

accordingly.


8. Improper arrangement:

Test items should be arranged in order of difficulty, with the easiest items first. Placing difficult items early may cause pupils to spend too much time on them.

9. Identifiable pattern of answers:

Placing correct answers in some systematic pattern will enable

pupils to guess the answers more easily.

Methods of Determining Validity:

There are several methods of determining the validity of

measuring instruments, which may be classified as follows.

1. Content Validity:

Content validity is evaluated by showing how well the content

of the test samples the class of situations. It is especially

important in the case of achievement and proficiency measures.

It is also known as “face validity”.

2. Concurrent Validity:

It is evaluated by showing how well test scores correspond to

already accepted measures of performance or status made at the

same time. For example, we may give a social studies class a

test on knowledge of basic concepts in social studies and at the

same time obtain from its teacher a report on these abilities as far as pupils in the class are concerned. If the relationship between the test scores and the teacher's report of abilities is high, the

test will have high concurrent validity.

3. Predictive Validity:

It is evaluated by showing how well predictions made from the tests are confirmed by evidence gathered at some subsequent time. It is used, for example, when the tester wants to estimate how well a student may be able to do in college courses on the basis of how well he has done on tests he took in secondary school.

4. Construct Validity:

It is evaluated by investigating what psychological qualities a

test measures. It is ordinarily used when the tester has no


definitive criterion measure of what he is concerned with and

hence must use indirect measures. This type of validity is

usually involved in such tests as those of study habits,

appreciations, understanding and interpretation of data.

Conclusion:

In short we can say that validity is specific to the purpose and

situation for which a test is used. A test can be reliable without being

valid but the converse is not true in other words. It is conceivable that a

test can measure some quality with a high degree of consistency without

measuring at all the quality it was actually intended to measure.

2.2 FACTORS AFFECTING VALIDITY

Test experts generally agree that the most important quality of

a test is its validity. The word "validity" means "effectiveness" or

“Soundness”. It refers to the accuracy with which a thing is measured.

Types of Validity:

Validity is classified into three categories. 1) Content Validity.

2) Criterion related validity. 3) Construct Validity.

A good measuring instrument is that which is valid with respect

to all these three categories. These are discussed below.

i. Content Validity:

Content validity is the degree to which a test measures an

intended content area. In other words the content validity of a

test refers to the extent to which the test content represents a

specified universe of content.

For Example: If a teacher taught a course of biology and would like to give a test at the end of the course, the test items should cover the content that was actually taught in the course.

ii. Construct Validity:

Construct validity is the degree to which a test measures an intended hypothetical construct. In other words, construct validity refers to the extent to which the test measures the construct that it claims to measure.

For Example: Examples of constructs are intelligence, creativity, the ability to apply principles and the ability to reason. If a teacher wants to measure the ability to reason, he may give two reasoning tests to his class; or, for a new measuring approach or tool in a given discipline, there should be a correlation between the new tool and a standardized measure of ability in that very discipline (like a GRE subject test).

iii. Concurrent Validity:

Concurrent validity is the degree to which the scores on a test are related to the scores on another, already established, test administered at the same time, or to some other valid criterion available at the same time.

For Example: We may give a social studies class a test based on knowledge of basic concepts in social studies and at the same time obtain from its teacher a report on these abilities as far as the pupils in the class are concerned.

iv. Criterion related Validity:

This type of validity is used to predict future or current performance, and it correlates the test results with another criterion of interest (Cozby, 2001).

For Example: If for an educational program, measures are

developed to assess the cumulative student learning.

v. Predictive Validity:

Predictive validity is the degree to which a test can predict how

well an individual will do in future situation. In other words,

predictive validity means the validity of a test or examination

which is based upon its correlation with some future variable.

For Example: One speaks of the predictive validity of a school examination for future success in higher education.

Similarly, if a short test gives an individual the same standing which he achieved in a much longer test, it will be called concurrent validity.


Methods of Determining Validity:

The methods of determining validity are also termed forms of expressing validity. Three forms are generally used for expressing the validity index of a test.

1. Correlation Coefficient:
Test scores are correlated with criterion scores. The obtained coefficient of correlation serves as the validity index of the test (a brief sketch of this computation follows this list).

2. Expectancy Table:
Test scores are evaluated or correlated with the ratings of supervisors. The resulting table provides empirical probabilities that serve as an index of validity.

3. Cross Validation:
It means taking another look at the correlation coefficient with another criterion, or preparing expectancy tables with another criterion. It is of two types:
a. Empirical validation
b. Logical or rational validation
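To make the correlation-coefficient method in point 1 above concrete, here is a minimal Python sketch; the scores and the helper name validity_coefficient are invented purely for illustration. It computes the Pearson correlation between test scores and criterion scores and treats it as the validity index.

    # Illustrative only: Pearson correlation between test scores and criterion scores.
    def validity_coefficient(test_scores, criterion_scores):
        n = len(test_scores)
        mean_x = sum(test_scores) / n
        mean_y = sum(criterion_scores) / n
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(test_scores, criterion_scores))
        sxx = sum((x - mean_x) ** 2 for x in test_scores)
        syy = sum((y - mean_y) ** 2 for y in criterion_scores)
        return sxy / (sxx * syy) ** 0.5

    test = [55, 62, 70, 48, 81, 67]        # hypothetical scores on the new test
    criterion = [58, 60, 75, 50, 85, 70]   # hypothetical scores on the criterion
    print(round(validity_coefficient(test, criterion), 2))   # the validity index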

2.3 RELIABILITY, AND METHODS OF DETERMINING RELIABILITY:

Meaning and Definition:

Reliability means consistency of measurement. In the words of Ebel and Frisbie, "The ability of a test to measure the same quantity when it is administered to an individual on two different occasions by two different testers is called reliability." Reliability indicates the degree to which a measurement can be relied upon to measure the same thing each time it is used.
In simple words, we can say that a good measuring instrument (test) should report consistent results if it is taken again by the same group of students under the same conditions.


Reliability is also called dependability or trustworthiness. Reliability is the degree to which a test consistently measures whatever it measures. The more reliable a test is, the more confidence we can have that the scores obtained from the administration of the test are essentially the same scores that would be obtained if the test were re-administered. An unreliable test is essentially useless. For example, if an intelligence test were unreliable, then a student scoring an IQ of 120 today might score an IQ of 140 tomorrow and 95 the day after tomorrow. On the other hand, if the test is reliable, then the IQ of a student will remain nearly the same each time the test is administered. The reliability of a test also depends upon the number of questions it contains: a test will be more reliable if it has more questions. In this respect, objective-type tests are more reliable because their sampling of the content is more extensive.

We can take another expert opinion to understand the meaning of reliability: "If a clinical thermometer on three successive determinations, for example, yielded readings of 97°, 103° and 99.6° for the same patient, it would not be considered very reliable.

Reliability, of course, is a necessary but not a sufficient condition for using a test. A highly reliable test may be totally invalid, or may not measure anything that is psychologically or educationally significant.

The reliability of a single test score is expressed quantitatively in terms of the instrument's standard error of measurement. If the standard error of measurement, for example, is 2.5, we can say that there are approximately two chances in three (more precisely, 68 in 100) that the true score falls between 72.5 and 77.5 when the obtained score is 75. By definition, an unreliable test cannot possibly be valid. The necessary degree of reliability, however, depends on the use that is made of the test scores."
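As a rough illustration of the passage above, the following Python sketch (the helper name true_score_band is hypothetical) turns an obtained score and a standard error of measurement into the approximate two-chances-in-three band for the true score, reproducing the 72.5 to 77.5 example.

    def true_score_band(obtained_score, sem):
        # About 68 chances in 100 that the true score lies within one SEM
        # of the obtained score.
        return obtained_score - sem, obtained_score + sem

    print(true_score_band(75, 2.5))   # (72.5, 77.5), as in the example above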

Methods of Determining Reliability:

For determining reliability, it is necessary that the test should be valid and should measure what it is designed to measure. It should be administered to an appropriate person or group of persons for whom the test has been developed. Reliability is a statistical measure and therefore it can be computed by using different statistical methods, which are described in detail below.

1. Test-retest method:

When the stability of results over time is to be measured, the test-retest method is used. In this method the same test is administered to the same group of students at two different periods of time. The scores obtained from the first and second administrations are correlated in order to check the stability and consistency of the test. In test-retest reliability the time factor counts a lot: with very close retesting the results are approximately the same, yielding a high correlation; but when the retest is administered after a year or two, changes in the characteristics of the students are expected to produce large variations in the outcome, and the stability coefficient will therefore be low.
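A minimal Python sketch of the test-retest procedure just described (the scores are invented for illustration): the two administrations are correlated, and the resulting coefficient is the stability estimate.

    import numpy as np

    # Hypothetical scores of six students on the first and second administrations.
    first  = np.array([40, 55, 63, 47, 70, 58])
    second = np.array([42, 53, 65, 45, 72, 60])

    # The test-retest (stability) coefficient is the correlation between the two sets.
    r_stability = np.corrcoef(first, second)[0, 1]
    print(round(r_stability, 2))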

Limitation:

i. The coefficient of reliability established through the test-retest method is erroneous.
ii. The reliability determined through the test-retest method is affected by memory or carry-over effects.
iii. The test-retest method is not an objective method of ascertaining the reliability of a test.

2. Equivalent forms method:

The second method of ascertaining reliability is the alternate-forms method, or method of equivalence. In this method one has to use two alternate or equivalent forms of a test in order to establish reliability. This method is used to check the reliability of a test for measuring a certain content area. It is applied to standardized tests only, as they have two or more forms of the same test available. The equivalent forms are administered to the same group in close succession, and the results of both tests are correlated. The correlation shows the degree to which both tests are measuring the same content area. Sometimes the equivalent forms are used with a time interval; results obtained in this way provide evidence of both the stability and the equivalence (reliability) of the test. This method is generally considered to be the best method.

Limitation:

i. Finding the reliability through this method is cumbersome, because it is difficult to construct a second form of a test which is equivalent in each and every respect.
ii. The process is more time-consuming, and it is also not free from carry-over effects.
iii. Moreover, establishing reliability through this method is not feasible for every type of test.

3. Split half method:

As the name indicates, in the split-half method the approach is to split the test into two reasonably equivalent halves. These independent sub-tests are then used as the source of the two independent scores needed for reliability estimation.
In this method a test is administered to a group of students. Before scoring, the test is split into two equal halves; generally the odd-numbered and even-numbered items are separated. By marking each part separately, each student gets two different scores, which are then correlated. The correlation gives a measure of the internal consistency of the test. The reliability of the whole test is estimated by applying the Spearman-Brown formula:

Reliability of whole test = (2 × reliability of half test) / (1 + reliability of half test)

where the "reliability of half test" is the correlation between the scores on the two halves.
Like the equivalent-forms method, the split-half method helps in determining reliability when the test items are a representative sample of the content.
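The split-half procedure and the Spearman-Brown correction above can be sketched in Python as follows (the 0/1 item responses are invented for illustration): the odd and even halves are scored separately, correlated, and the correlation is stepped up to estimate the reliability of the whole test.

    import numpy as np

    # Hypothetical 0/1 responses of six students to eight items
    # (rows = students, columns = items).
    responses = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1, 0, 1, 1],
        [1, 0, 1, 1, 0, 0, 1, 0],
    ])

    odd_half  = responses[:, 0::2].sum(axis=1)   # scores on items 1, 3, 5, 7
    even_half = responses[:, 1::2].sum(axis=1)   # scores on items 2, 4, 6, 8

    r_half = np.corrcoef(odd_half, even_half)[0, 1]   # correlation between halves
    r_full = (2 * r_half) / (1 + r_half)              # Spearman-Brown step-up
    print(round(r_half, 2), round(r_full, 2))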

Limitation:


i. The general criticism of the split-half method is concerned with the splitting of the test. As there is no fixed rule, one may follow one's own judgment in splitting the test into two halves. The way of splitting varies from person to person, which affects the reliability coefficient.
ii. The second criticism is concerned with item difficulty. Generally the items of a test are arranged in order of difficulty, but this is not true for every type of test. For example, if, without knowing the difficulty level of the items, one places all the difficult items in one half and the easy items in the other half, the reliability coefficient will be affected adversely.

4. Kuder-Richardson method:
Kuder and Richardson developed several formulas for measuring the internal consistency of a test. Kuder-Richardson formulas 20 and 21 are generally applied, but owing to the simplicity of its computation, formula 21 is often preferred.

Reliability (KR-21) = [K / (K − 1)] × [1 − M(K − M) / (K × S²)]

where
K = the number of items in the test,
M = the mean of the test scores, and
S = the standard deviation of the test scores.
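A minimal Python sketch of formula KR-21 as written above (the test figures are invented for illustration):

    def kr21(k, mean, sd):
        # K = number of items, M = mean of the scores, S = standard deviation.
        return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * sd ** 2))

    # Hypothetical example: a 40-item test with mean 28 and standard deviation 6.
    print(round(kr21(40, 28, 6), 2))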

Summary: The following methods are used for determining reliability of

a test.

A Test – Retest method i. Immediate (without interval)

B Equivalent form method ii. With time interval

C Split half method iii. Immediate

D Kuder – Richardson formula iv. With interval

5. Parallel form Reliability:

When different sets or different parts of a test (say, questionnaire A and questionnaire B) are developed, they must have a linkage (in the sense of covering the same knowledge, skills and behaviours). These assessment instruments are then administered to the same group. The results obtained are correlated, which shows the reliability of the test with regard to the alternate sets of instruments.

6. Inter-rater method of Reliability:
The measure of the extent to which different judges or raters agree in their decisions about an assessment is called the inter-rater method of reliability. Where answers cannot be scored mechanically and must be interpreted by human observers, inter-rater reliability is of utmost importance.

2.4 FACTORS AFFECTING RELIABILITY:

Reliability:

The degree or extent of similarity among the results obtained on several occasions; in other words, the degree to which an assessment instrument elicits stable and consistent results.
Reliability means consistency of measurement. In the words of Ebel & Frisbie, "The ability of a test to measure the same quantity when it is administered to an individual on two different occasions by two different testers is called reliability."
Reliability is also called dependability or trustworthiness. It is the degree to which a test consistently measures whatever it measures.

Factors which affect reliability:
The factors which adversely affect reliability are as under:
The examinee:
Fatigue, burden, lack of motivation, carelessness.
Traits of the test:
Ambiguous items, poorly worded directions, tricky questions, unfamiliar format.
Conditions of test-taking and marking:
Poor examination conditions, excessive heat or cold, carelessness in marking, disregard of or lack of clear standards for scoring, computational errors.

There are also some further factors which affect reliability, which are as under:

1. A very important factor influencing test reliability is the number of test items: the greater the number of items in a test, the more reliable the test.
2. Other things being equal, the narrower the range of difficulty of the items of a test, the greater the reliability.
3. Evenness in scaling is a factor influencing the reliability of a test; other things being equal, a test that is evenly scaled is more reliable than a test that has gaps in the scale of difficulty of its items.
4. Other things being equal, inter-dependent items tend to decrease the reliability of a test.
5. The more objective the scoring of a test, the more reliable the test.
6. Chance success in getting the correct answer to an item is a factor which lowers test reliability.
7. Other things being equal, the more homogeneous the material of a test, the greater its reliability.
8. Other things being equal, the more common the experiences called for in a test are to the members of the group taking the test, the more reliable the test.
9. Other things being equal, the same test given late in the school year (i.e. after covering the unit in class) is more reliable than when given early in the year (i.e. without teaching the unit).


10. Other things being equal, 'catch' questions in a test lower the reliability of the test. A test answered by the systematic recall or recognition of orderly facts or experience is more reliable than a test answered by sudden insight occasioned by novelty.
11. Lengthy items lower the reliability because certain factors in the item will be over- or under-estimated.
12. Inadequate or faulty directions, or failure to provide suitable illustrations of the task, lower the reliability.
13. Strange or unusual wording of items lowers the reliability.
14. The accuracy with which a test is timed is an important factor in test reliability.
15. Differences in incentive and effort tend to make tests unreliable. The appeal of a test is stronger with some individuals than with others, and is stronger with an individual at one time than at another.
16. Accidents occurring during the examination, such as breaking a pencil, running out of ink, or defective test booklets, influence the reliability of the test. Outside disturbances also lower the reliability.
17. The interval between the test and the retest is important for the reliability estimate.
18. Cheating in the examination is another factor which lowers the reliability, because the score of the individual may increase or decrease unduly.
19. Illness, worry and excitement, though less important, still influence the reliability of the test.

References

Murad Ali Katozai, Measurement and Evaluation, 1st Edition, June 2013.
Dr. Mohammad Nooman & Obaid Ullah, A Manual of Educational & Social Science and Research Methodologies, 1st Edition, June 2013.

2.5 PRACTICALITY:

Meaning:

The word "practicality" means "feasibility" or "usability". A test is practicable if it is easy to administer, easy to interpret and economical in operation. A good test is one which has sufficiently simple instructions so that it can be administered even by a person of low-level intelligence. Tests having difficult instructions, requiring high-level training for their administration, or too expensive for wide use in schools are said to have low usability or practicability. Practicality refers to the economy of time, effort and money in testing. In other words, a test should be:

Easy to design

Easy to administer

Easy to interpret

Test of Practicality of a Measuring Instrument:

The practicality attribute of a measuring instrument can be estimated in terms of its economy, convenience and interpretability. The economy consideration suggests that some trade-off is required between the ideal research project and what the budget can afford; the length of the measuring instrument is an important area where economic pressures are quickly felt.
The convenience criterion suggests that the measuring instrument should be easy to manage. For this purpose one should pay proper attention to the layout of the measuring instrument. For example, a questionnaire with clear instructions and illustrations is easier to complete than one that lacks these features.


Characteristics of Practicality:

There are many characteristics of practicality; they are:
1. The test should be free from the drawbacks and limitations of both essay-type and objective-type tests, and should combine the merits and good points of both these types. For this purpose a test should contain both essay-type and objective-type questions, so that it may cover the whole course and at the same time improve the writing skill of the students.
2. It should not require long answers to essay-type questions.
3. It should have a large number of short essay-type questions so that it may cover the whole course in a short time.
4. It should not be prepared only for evaluating the knowledge and information of the students.
5. It should be suited to the social and economic conditions of the country.
6. There should be no choice in the given questions; students should have to answer all the questions. This will discourage selective study.


UNIT-3:

APPRAISING CLASSROOM TESTS (ITEMS ANALYSIS)

3.1 THE VALUE OF ITEM

3.1.1 Item Analysis

Item analysis is a statistical technique which is used for selecting and rejecting the items of a test on the basis of their difficulty value and discriminative power. Item analysis is a general term that refers to the specific methods used in education to evaluate test items, typically for the purpose of test construction and revision. Regarded as one of the most important aspects of test construction, and increasingly receiving attention, it is an approach incorporated into item response theory (IRT), which serves as an alternative to classical measurement theory or classical test theory (CTT). Classical measurement theory considers a score to be the direct result of a person's true score plus error. It is this error that is of interest, as previous measurement theories have been unable to specify its source. Item response theory, however, uses item analysis to differentiate between types of error in order to gain a clearer understanding of any existing deficiencies. Particular attention is given to individual test items, item characteristics, the probability of answering items correctly, the overall ability of the test taker, and the degrees or levels of knowledge being assessed.

Item analysis is concerned basically with the two characteristics

of an item--difficulty value and discriminative power.

Need of Item Analysis

Item analysis is a technique by which test items are selected and rejected. The selection of items may serve the purpose of the designer or test constructor because the items have the required characteristics. The following are the main purposes of a test:


(a) Classification of students or candidates.

(b) Selection of the candidates for the job.

(c) Gradation is an academic purpose to assign grades or divisions

to the students.

(d) Prognosis and promotion of the candidates or students.

(e) Establishing individual differences, and

(f) Research for the verification of hypotheses.

The different purposes require different types of tests having items of different characteristics. A selection or entrance test includes items of high difficulty value as well as high discriminating power. A promotion or prognostic test has items of moderate difficulty value. There are various techniques of item analysis in use these days.

The Objectives of Item Analysis

The following are the main objectives of the item analysis technique:
(1) The main objective of item analysis is to select the appropriate items for the final draft and to reject the poor items which do not contribute to the functioning of the test. Some items are to be modified.

(2) Item analysis obtains the difficulty values of all the items of the preliminary draft of the test. The items are classified as difficult, moderate and easy items.
(3) It provides the discriminative power (item reliability and validity) of all the items of the preliminary draft of the test, i.e. their power to differentiate between capable and less capable examinees. The items are classified on the basis of the indexes as showing positive, negative or no discrimination. Items with negative or no discriminating power are rejected outright.
(4) It also indicates the functioning of the distractors in multiple-choice items. Poorly functioning distractors are changed. It thus provides the basis for the modifications to be made in some of the items of the preliminary draft.

(5) The reliability and validity of a test depend on these characteristics. The functioning of a test is improved by this technique. Both indexes are considered simultaneously in selecting and rejecting the items of a test.
(6) It provides the basis for preparing the final draft of a test. In the final draft the items are arranged in order of difficulty: the easiest items are given at the beginning and the most difficult items are placed at the end.
(7) Item analysis is a cyclic technique. The modified items are tried out and their item analysis is done again to obtain these indexes (difficulty and discrimination). In this way empirical evidence is obtained for selecting the modified items for the final draft.

Functions of Item Analysis

The main function of item analysis is to obtain indexes of the items which indicate their basic characteristics. These characteristics are:
(1) Item difficulty value (D.V.), the proportion of subjects answering each item correctly.
(2) Discriminative power (D.P.) of the item; this characteristic is of two types:
(a) Item reliability — taken as the point-biserial correlation between an item and the total test score, multiplied by the item standard deviation.
(b) Item validity — taken as the point-biserial correlation between an item and a criterion score, multiplied by the item standard deviation.

If the test as a whole is to fulfil its purpose successfully, each of its items must be able to discriminate between high-scoring and poor students on the test. In other words, a test fulfils its purpose with maximum success when each item serves as a good predictor. Therefore it is essential that each item of the test should be analysed in terms of its difficulty value and discriminative power. Item analysis serves the following purposes:

(1) To improve and modify a test for immediate use on a parallel

group of subjects.

(2) To select the best items for a test with regard to its purpose after

a proper try out on the group of subjects selected from the target

population.

(3) To provide a statistical check on the characteristics of the test items for the judgment of the test designer.
(4) To set up parallel forms of a test. Parallel forms of a test should not only have similar item content or types of items, but should also have the same difficulty value and discriminative power. The item analysis technique provides the empirical basis on which exactly parallel tests can be developed.
(5) To modify or reject the poor items of the test. The poor items may not serve the purpose of the test; their non-functioning and poor distractors are changed.

(6) Item analysis is usually done for a power test rather than a speed test. In a speed test all the items are of about the same difficulty value; the purpose of a speed test is to measure speed and accuracy, and speed is acquired through practice. In practice there is hardly any pure power test, because a time limit is imposed; to that extent tests are speeded. The speededness of a test depends on the difficulty values of its items: most of the students should be able to reach the last items within the time allotted for the test. Item analysis is the study of the statistical properties of test items. The qualities usually of interest are the difficulty of the item and its ability or power to differentiate between more capable and less capable examinees. Difficulty is usually expressed as the percent or proportion getting the item right, and discrimination as some index comparing success by the more capable and the less capable students.

Meaning and Definition of Difficulty Value (D.V.)
The term difficulty value of an item can be explained with the help of simple examples at the extreme ends. If an item of a test is answered correctly by every examinee, the item is very easy: the difficulty value is 100 percent, or the proportion is one. Such an item will not serve any purpose and there is no use including such items in a test; they are generally rejected.

If an item is not answered correctly by any of the examinees, i.e. none could answer it correctly, the item is most difficult: the difficulty value is zero percent, or the proportion is zero. Such an item will likewise not serve any purpose, and such items are usually rejected.

"The difficulty value of an item is defined as the proportion or

percentage of the examinees who have answered the item

correctly."

—J. P. Guilford

"The difficulty value of an item may be defined as the proportion

of certain sample of subjects who actually know the answer of

item."

—Frank S. Freeman

In the first definition of difficulty value, it has been stated that it is the percentage or proportion of examinees who answer the item correctly, but in the second definition the difficulty value is defined as the proportion of a certain sample of subjects who actually know the answer to the item. The second statement seems to be more functional and dependable, because an item can be answered correctly by guessing even when the examinee does not know the answer. The difficulty value thus depends on actually knowing the correct answer to an item rather than merely answering the item correctly.

In the procedure of item analysis, a "correction for guessing" formula is therefore applied to the scores rather than simply counting right answers. The difficulty value may also be obtained in terms of standard scores or z-scores.

Methods or Techniques of item Analysis

A recent review of the literature on item analysis indicates that there are at least twenty-three different techniques of item analysis. As has been discussed, the item analysis technique obtains indexes for the characteristics of an item. The following two methods of item analysis are the most popular and widely used.

1) Davis method of item analysis—It is the basic method of item

analysis. It is used for the prognostic test for selecting and

rejecting the items on the basis of difficulty value and

discriminative power. The right responses are considered in

obtaining the indexes for the characteristics of an item. The

proportion of right responses to the items is considered for this purpose.

2) Stanley method of item analysis. It is used for the diagnostic

test items. The wrong responses are considered in obtaining the

difficulty value and discriminative power; the wrong responses reveal the causes of the students' weaknesses. The proportion of

wrong responses on an item is considered for this purpose.

There are separate techniques for obtaining difficulty value and

discriminative power of the items.

(a) Techniques of Difficulty Value.

There are two main approaches for obtaining the difficulty value.

a1 – Proportion of right responses on an item technique. Davis and Haper

have also used this technique.

a2 – Standard scores or z-scores or normal probability curve.


(b) Techniques of Discriminative Power.

b1 – Proportion of right responses on an item technique. Davis and Haper

have used this technique.

3.2 THE PROCEDURE/ PURPOSE OF ITEM ANALYSIS:

The review of the literature on item analysis indicates that about two dozen techniques of item analysis have been devised to obtain the difficulty value and discrimination index of an item of a test. It is not possible to describe all the techniques of item analysis in this chapter; therefore the most popular and widely used techniques are discussed: Frederick B. Davis's method of item analysis for prognostic tests, and Stanley's method of item analysis for diagnostic tests.

"The item difficulty value may be defined as the proportion or

percentage of certain sample subjects that actually know the

answer of an item."

--Frank S. Freeman

The difficulty value depends on actually knowing the answer rather than merely answering correctly, i.e. giving a right response. In objective-type tests, items may be answered correctly by guessing rather than by actually knowing the answer; that is, an item may be answered without knowing its answer. Thus a correction for guessing is used to obtain scores which reflect the actual correct responses.

It is important to note that in the procedure of item analysis, item-wise scoring is done, whereas in general, subject-wise scoring is done. Several formulas have been developed by psychometricians for the correction for guessing. Some of the important correction-for-guessing formulas are discussed below.

Formula-Correction for Guessing

The following two formula-corrections for guessing have been explained.

(a) Guilford's formula-correction for guessing and


(b) Horst's formula-correction for guessing.

(a) Guilford's formula-correction for guessing. J. P. Guilford developed the following correction-for-guessing formula, which is used for estimating the number of examinees who actually know the answer:

S = R − W / (n − 1)

where R = number of right responses to the item,
W = number of wrong responses to the item,
n = number of alternatives in the item, and
S = estimated number of responses based on actually knowing the answer.

Example. An item is administered on a group of 50 subjects. The

following responses are obtained on different alternatives of the item.

(a) The functions of item analysis

(b) Selection of good items-8

(c) Rejection of poor items--7
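Since the worked example above is incomplete in this printing, here is a minimal Python sketch of Guilford's correction for guessing with invented figures (the helper name corrected_score is hypothetical):

    def corrected_score(right, wrong, n_alternatives):
        # S = R - W / (n - 1): estimated number who actually knew the answer.
        return right - wrong / (n_alternatives - 1)

    # Hypothetical: of 50 examinees, 30 answer a four-option item correctly and 20 wrongly.
    print(round(corrected_score(30, 20, 4), 2))   # about 23.33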

3.2 MAKING THE MOST OF EXAMS: PROCEDURES FOR ITEM ANALYSIS:

One of the most important (if least appealing) tasks confronting

faculty members is the evaluation of student performance. This task

requires considerable skill, in part because it presents so many choices.

Decisions must be made concerning the method, format, timing, and

duration of the evaluative procedures. Once designed, the evaluative

procedure must be administered and then scored, interpreted, and graded.

Afterwards, feedback must be presented to students. Accomplishing these

tasks demands a broad range of cognitive, technical, and interpersonal

resources on the part of faculty. But an even more critical task remains,

one that perhaps too few faculty undertake with sufficient skill and

tenacity: investigating the quality of the evaluative procedure.

Even after an exam, how do we know whether that exam was a

good one? It is obvious that any exam can only be as good as the items it


comprises, but then what constitutes a good exam item? Our students

seem to know, or at least believe they know. But are they correct when

they claim that an item was too difficult, too tricky, or too unfair?

Lewis Aiken (1997), the author of a leading textbook on the

subject of psychological and educational assessment, contends that a

"postmortem" evaluation is just as necessary in classroom testing as it is

in medicine. Indeed, just such a postmortem procedure for exams exists--

item analysis, a group of procedures for assessing the quality of exam

items. The purpose of an item analysis is to improve the quality of an

exam by identifying items that are candidates for retention, revision, or

removal. More specifically, not only can the item analysis identify both

good and deficient items, it can also clarify what concepts the examinees

have and have not mastered.

So, what procedures are involved in an item analysis? The

specific procedures involved vary, but generally, they fall into one of two

broad categories: qualitative and quantitative.

Qualitative Item Analysis

Qualitative item analysis procedures include careful

proofreading of the exam prior to its administration for typographical

errors, for grammatical cues that might inadvertently tip off examinees to

the correct answer, and for the appropriateness of the reading level of the

material. Such procedures can also include small group discussions of the

quality of the exam and its items with examinees who have already taken

the test, or with departmental student assistants, or even experts in the

field. Some faculty use a "think-aloud test administration" (cf. Cohen,

Swerdlik, & Smith, 1992) in which examinees are asked to express

verbally what they are thinking as they respond to each of the items on an

exam. This procedure can assist the instructor in determining whether

certain students (such as those who performed well or those who

performed poorly on a previous exam) misinterpreted particular items,

and it can help in determining why students may have misinterpreted a

particular item.


Quantitative Item Analysis

In addition to these and other qualitative procedures, a thorough

item analysis also includes a number of quantitative procedures.

Specifically, three numerical indicators are often derived during an item

analysis: Item difficulty, item discrimination, and distractor power

statistics.

Item Difficulty Index (p)

The item difficulty statistic is an appropriate choice for

achievement or aptitude tests when the items are scored dichotomously

(i.e., correct vs. incorrect). Thus, it can be derived for true-false, multiple-

choice, and matching items, and even for essay items, where the

instructor can convert the range of possible point values into the

categories "passing" and "failing."

The item difficulty index, symbolized p, can be computed

simply by dividing the number of test takers who answered the item

correctly by the total number of students who answered the item. As a

proportion, p can range between 0.00, obtained when no examinees

answered the item correctly, and 1.00, obtained when all examinees

answered the item correctly. Notice that no test item need have only one

p value. Not only may the p value vary with each class group that takes

the test, an instructor may gain insight by computing the item difficulty

level for a number of different subgroups within a class, such as those

who did well on the exam overall and those who performed more poorly.
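A minimal Python sketch of this computation (the responses are invented for illustration): p is simply the number of correct answers divided by the number of students who answered the item.

    def item_difficulty(item_responses):
        # item_responses: 1 for a correct answer, 0 for an incorrect answer.
        return sum(item_responses) / len(item_responses)

    responses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # hypothetical class of ten students
    print(item_difficulty(responses))             # 0.7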

Although the computation of the item difficulty index p is quite

straightforward, the interpretation of this statistic is not. To illustrate,

consider an item with a difficulty level of 0.20. We do know that 20% of

the examinees answered the item correctly, but we cannot be certain why

they did so. Does this item difficulty level mean that the item was

challenging for all but the best prepared of the examinees? Does it mean

that the instructor failed in his or her attempt to teach the concept

assessed by the item? Does it mean that the students failed to learn the

material? Does it mean that the item was poorly written? To answer these


questions, we must rely on other item analysis procedures, both

qualitative and quantitative ones.

Item Discrimination Index (D)

Item discrimination analysis deals with the fact that often

different test takers will answer a test item in different ways. As such, it

addresses questions of considerable interest to most faculty, such as,

"does the test item differentiate those who did well on the exam overall

from those who did not?" or "does the test item differentiate those who

know the material from those who do not?" In a more technical sense

then, item discrimination analysis addresses the validity of the items on a

test, that is, the extent to which the items tap the attributes they were

intended to assess. As with item difficulty, item discrimination analysis

involves a family of techniques. Which one to use depends on the type of

testing situation and the nature of the items. I'm going to look at only one

of those, the item discrimination index, symbolized D. The index

parallels the difficulty index in that it can be used whenever items can be

scored dichotomously, as correct or incorrect, and hence it is most

appropriate for true-false, multiple-choice, and matching items, and for those essay items which the instructor can score as "pass" or "fail."

We test because we want to find out if students know the

material, but all we learn for certain is how they did on the exam we gave

them. The item discrimination index tests the test in the hope of keeping

the correlation between knowledge and exam performance as close as it

can be in an admittedly imperfect system.

The item discrimination index is calculated in the following

way:

1. Divide the group of test takers into two groups, high scoring and

low scoring. Ordinarily, this is done by dividing the examinees

into those scoring above and those scoring below the median.

(Alternatively, one could create groups made up of the top and

bottom quintiles or quartiles or even deciles.)


2. Compute the item difficulty levels separately for the upper (Pupper) and lower (Plower) scoring groups.
3. Subtract the two difficulty levels such that D = Pupper − Plower.
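The three steps above can be sketched in Python as follows (the item responses and exam totals are invented for illustration); examinees are split at the median of their total scores and the two difficulty levels are subtracted.

    def discrimination_index(item_correct, total_scores):
        # item_correct: 1/0 per student for one item; total_scores: exam totals.
        pairs = sorted(zip(total_scores, item_correct), key=lambda p: p[0])
        half = len(pairs) // 2
        lower, upper = pairs[:half], pairs[-half:]        # below- and above-median groups
        p_upper = sum(c for _, c in upper) / len(upper)
        p_lower = sum(c for _, c in lower) / len(lower)
        return p_upper - p_lower

    item  = [1, 0, 1, 1, 0, 1, 0, 1]           # hypothetical responses to one item
    total = [35, 22, 40, 31, 18, 38, 25, 29]   # hypothetical exam totals
    print(discrimination_index(item, total))   # 0.75 for these made-up data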

How is the item discrimination index interpreted? Unlike the item difficulty level p, the item discrimination index can take on negative

values and can range between -1.00 and 1.00. Consider the following

situation: suppose that overall, half of the examinees answered a

particular item correctly, and that all of the examinees who scored above

the median on the exam answered the item correctly and all of the

examinees who scored below the median answered incorrectly. In such a

situation Pupper = 1.00 and Plower = 0.00. As such, the value of the item

discrimination index D is 1.00 and the item is said to be a perfect positive

discriminator. Many would regard this outcome as ideal. It suggests that

those who knew the material and were well-prepared passed the item

while all others failed it.

Though it's not as unlikely as winning a million-dollar lottery,

finding a perfect positive discriminator on an exam is relatively rare.

Most psychometricians would say that items yielding positive

discrimination index values of 0.30 and above are quite good

discriminators and worthy of retention for future exams.

Finally, notice that the difficulty and discrimination are not

independent. If all the students in both the upper and lower levels either

pass or fail an item, there's nothing in the data to indicate whether the

item itself was good or not. Indeed, the value of the item discrimination

index will be maximized when only half of the test takers overall answer

an item correctly; that is, when p = 0.50. Once again, the ideal situation is

one in which the half who passed the item were students who all did well

on the exam overall.

Does this mean that it is never appropriate to retain items on an

exam that are passed by all examinees, or by none of the examinees? Not

at all. There are many reasons to include at least some such items. Very

easy items can reflect the fact that some relatively straightforward


concepts were taught well and mastered by all students. Similarly, an

instructor may choose to include some very difficult items on an exam to

challenge even the best-prepared students. The instructor should simply

be aware that neither of these types of items functions well to make

discriminations among those taking the test.

[material omitted...]

Conclusion

To those concerned about the prospect of extra work involved in

item analysis, take heart: item difficulty and discrimination analysis

programs are often included in the software used in processing exams

answered on Scantron or other optically scannable forms. As such, these

analyses can often be performed for you by personnel in your computer

services office. You might consider enlisting the aid of your departmental

student assistants to help with item distractor analysis, thus providing

them with an excellent learning experience. In any case, an item analysis

can certainly help determine whether or not the items on your exams

were good ones and to determine which items to retain, revise, or

replace.

Understanding Item Analysis Reports

Item analysis is a process which examines student responses to

individual test items (questions) in order to assess the quality of those

items and of the test as a whole. Item analysis is especially valuable in

improving items which will be used again in later tests, but it can also be

used to eliminate ambiguous or misleading items in a single test

administration. In addition, item analysis is valuable for increasing

instructors' skills in test construction, and identifying specific areas of

course content which need greater emphasis or clarity. Separate item

analyses can be requested for each raw score (see the note on raw scores near the end of this section) created during a given ScorePak® run.

A basic assumption made by ScorePak® is that the test under

analysis is composed of items measuring a single subject area or


underlying ability. The quality of the test as a whole is assessed by

estimating its "internal consistency." The quality of individual items is

assessed by comparing students' item responses to their total test scores.

Following is a description of the various statistics provided on a

ScorePak® item analysis report. This report has two parts. The first part

assesses the items which made up the exam. The second part shows

statistics summarizing the performance of the test as a whole.

Item Statistics

Item statistics are used to assess the performance of individual

test items on the assumption that the overall quality of a test derives from

the quality of its items. The ScorePak® item analysis report provides the

following item information:

Item Number

This is the question number taken from the student answer sheet,

and the ScorePak® Key Sheet. Up to 150 items can be scored on the

Standard Answer Sheet.

Mean and Standard Deviation

The mean is the "average" student response to an item. It is

computed by adding up the number of points earned by all students on

the item, and dividing that total by the number of students.

The standard deviation, or S.D., is a measure of the dispersion of

student scores on that item. That is, it indicates how "spread out" the

responses were. The item standard deviation is most meaningful when

comparing items which have more than one correct alternative and when

scale scoring is used. For this reason it is not typically used to evaluate

classroom tests.

Item Difficulty

For items with one correct alternative worth a single point, the

item difficulty is simply the percentage of students who answer an item

correctly. In this case, it is also equal to the item mean. The item

difficulty index ranges from 0 to 100; the higher the value, the easier the


question. When an alternative is worth other than a single point, or when

there is more than one correct alternative per question, the item difficulty

is the average score on that item divided by the highest number of points

for any one alternative. Item difficulty is relevant for determining

whether students have learned the concept being tested. It also plays an

important role in the ability of an item to discriminate between students

who know the tested material and those who do not. The item will have

low discrimination if it is so difficult that almost everyone gets it wrong

or guesses, or so easy that almost everyone gets it right.

To maximize item discrimination, desirable difficulty levels are

slightly higher than midway between chance and perfect scores for the

item. (The chance score for five-option questions, for example, is 20

because one-fifth of the students responding to the question could be

expected to choose the correct option by guessing.) Ideal difficulty levels

for multiple-choice items in terms of discrimination potential are:

Format Ideal Difficulty

Five-response multiple-choice 70

Four-response multiple-choice 74

Three-response multiple-choice 77

True-false (two-response multiple-choice) 85

(from Lord, F.M. "The Relationship of the Reliability of Multiple-Choice

Test to the Distribution of Item Difficulties," Psychometrika, 1952, 18,

181-194.)

ScorePak® arbitrarily classifies item difficulty as "easy" if the

index is 85% or above; "moderate" if it is between 51 and 84%; and

"hard" if it is 50% or below.

Item Discrimination

Item discrimination refers to the ability of an item to

differentiate among students on the basis of how well they know the

material being tested. Various hand calculation procedures have

traditionally been used to compare item responses to total test scores


using high and low scoring groups of students. Computerized analyses

provide more accurate assessment of the discrimination power of items

because they take into account responses of all students rather than just

high and low scoring groups.

The item discrimination index provided by ScorePak® is a

Pearson product-moment correlation (see the note on correlation near the end of this section) between student responses to a

particular item and total scores on all other items on the test. This index is

the equivalent of a point-biserial coefficient in this application. It

provides an estimate of the degree to which an individual item is

measuring the same thing as the rest of the items.
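As a rough illustration of this kind of corrected item-total correlation (the data are invented, and ScorePak®'s own computation may differ in detail), one can correlate an item with the total of the remaining items:

    import numpy as np

    # Hypothetical 1/0 responses: rows = students, columns = items.
    scores = np.array([
        [1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 0, 1, 1, 1],
        [0, 1, 0, 0, 0],
    ])

    item = 0                                             # examine the first item
    rest = np.delete(scores, item, axis=1).sum(axis=1)   # total on all other items
    disc = np.corrcoef(scores[:, item], rest)[0, 1]      # corrected item-total correlation
    print(round(disc, 2))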

Because the discrimination index reflects the degree to which an

item and the test as a whole are measuring a unitary ability or attribute,

values of the coefficient will tend to be lower for tests measuring a wide

range of content areas than for more homogeneous tests. Item

discrimination indices must always be interpreted in the context of the

type of test which is being analyzed. Items with low discrimination

indices are often ambiguously worded and should be examined. Items

with negative indices should be examined to determine why a negative

value was obtained. For example, a negative value may indicate that the

item was mis-keyed, so that students who knew the material tended to

choose an unkeyed, but correct, response option.

Tests with high internal consistency consist of items with mostly

positive relationships with total test score. In practice, values of the

discrimination index will seldom exceed .50 because of the differing

shapes of item and total score distributions. ScorePak® classifies item

discrimination as "good" if the index is above .30; "fair" if it is

between .10 and.30; and "poor" if it is below .10.

Alternate Weight

This column shows the number of points given for each

response alternative. For most tests, there will be one correct answer

which will be given one point, but ScorePak® allows multiple correct

alternatives, each of which may be assigned a different weight.


Means

The mean total test score (minus that item) is shown for students

who selected each of the possible response alternatives. This information

should be looked at in conjunction with the discrimination index; higher

total test scores should be obtained by students choosing the correct, or

most highly weighted alternative. Incorrect alternatives with relatively

high means should be examined to determine why "better" students chose

that particular alternative.

Frequencies and Distribution

The number and percentage of students who choose each

alternative are reported. The bar graph on the right shows the percentage

choosing each response; each "#" represents approximately 2.5%.

Frequently chosen wrong alternatives may indicate common

misconception among the students.

Difficulty and Discrimination Distributions
At the end of the Item Analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and

discrimination (good, fair, poor). These distributions provide a quick

overview of the test, and can be used to identify items which are not

performing well and which can perhaps be improved or discarded.

Test Statistics

Two statistics are provided to evaluate the performance of the

test as a whole.

Reliability Coefficient

The reliability of a test refers to the extent to which the test is

likely to produce consistent scores. The particular reliability coefficient

computed by ScorePak® reflects three characteristics of the test:

The intercorrelations among the items -- the greater the relative

number of positive relationships, and the stronger those

relationships are, the greater the reliability. Item discrimination


indices and the test's reliability coefficient are related in this

regard.

The length of the test -- a test with more items will have a higher

reliability, all other things being equal.

The content of the test -- generally, the more diverse the subject

matter tested and the testing techniques used, the lower the

reliability.

Reliability coefficients theoretically range in value from zero

(no reliability) to 1.00 (perfect reliability). In practice, their approximate

range is from .50 to .90 for about 95% of the classroom tests scored by

ScorePak®.

High reliability means that the questions of a test tended to "pull

together." Students who answered a given question correctly were more

likely to answer other questions correctly. If a parallel test were

developed by using similar items, the relative scores of students would

show little change.

Low reliability means that the questions tended to be unrelated

to each other in terms of who answered them correctly. The resulting test

scores reflect peculiarities of the items or the testing situation more than

students' knowledge of the subject matter.

As with many statistics, it is dangerous to interpret the

magnitude of a reliability coefficient out of context. High reliability

should be demanded in situations in which a single test score is used to

make major decisions, such as professional licensure examinations.

Because classroom examinations are typically combined with other

scores to determine grades, the standards for a single test need not be as

stringent. The following general guidelines can be used to interpret

reliability coefficients for classroom exams:

Reliability Interpretation

.90 and above Excellent reliability; at the level of the best

standardized tests


.80- .90 Very good for a classroom test

.70 - .80 Good for a classroom test; in the range of most classroom tests. There are probably a few items which could be improved.

.60 - .70 Somewhat low. This test needs to be supplemented by

other measures (e.g., more tests) to determine grades.

There are probably some items which could be

improved.

.50 - .60 Suggests need for revision of test, unless it is quite

short (ten or fewer items). The test definitely needs to

be supplemented by other measures (e.g., more tests)

for grading.

.50 or below Questionable reliability. This test should not contribute

heavily to the course grade, and it needs revision.

The measure of reliability used by ScorePak® is Cronbach's

Alpha. This is the general form of the more commonly reported KR-20

and can be applied to tests composed of items with different numbers of

points given for different response alternatives. When coefficient alpha is

applied to tests in which each item has only one correct answer and all

correct answers are worth the same number of points, the resulting

coefficient is identical to KR-20.

(Further discussion of test reliability can be found in J. C.

Nunnally, Psychometric Theory. New York: McGraw-Hill, 1967, pp.

172-235, see especially formulas 6-26, p. 196.)
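As a rough illustration (not ScorePak®'s own code), coefficient alpha can be computed in Python from a students-by-items score matrix as follows; with one-point dichotomous items the result coincides with KR-20, as noted above. The data are invented.

    import numpy as np

    def cronbach_alpha(scores):
        # scores: rows = students, columns = items.
        k = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)       # variance of each item
        total_variance = scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical 1/0 responses of six students to five items.
    scores = np.array([
        [1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 0, 1, 1, 1],
        [0, 1, 0, 0, 0],
    ])
    print(round(cronbach_alpha(scores), 2))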

Standard Error of Measurement

The standard error of measurement is directly related to the

reliability of the test. It is an index of the amount of variability in an

individual student's performance due to random measurement error. If it

were possible to administer an infinite number of parallel tests, a

student's score would be expected to change from one administration to

the next due to a number of factors. For each student, the scores would

form a "normal" (bell-shaped) distribution. The mean of the distribution


is assumed to be the student's "true score," and reflects what he or she

"really" knows about the subject. The standard deviation of the

distribution is called the standard error of measurement and reflects the

amount of change in the student's score which could be expected from

one test administration to another.

Whereas the reliability of a test always varies between 0.00 and

1.00, the standard error of measurement is expressed in the same scale as

the test scores. For example, multiplying all test scores by a constant will

multiply the standard error of measurement by that same constant, but

will leave the reliability coefficient unchanged.

A general rule of thumb to predict the amount of change which

can be expected in individual test scores is to multiply the standard error

of measurement by 1.5. Only rarely would one expect a student's score to

increase or decrease by more than that amount between two such similar

tests. The smaller the standard error of measurement, the more accurate

the measurement provided by the test.

(Further discussion of the standard error of measurement can be

found in J. C. Nunnally, Psychometric Theory. New York: McGraw-Hill,

1967, pp.172-235, see especially formulas 6-34, p. 201.)
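The standard error of measurement is related to the test's standard deviation and reliability by the usual classical-test-theory formula SEM = SD × √(1 − reliability), which is not given explicitly above but is standard. A minimal Python sketch with invented figures, including the 1.5 × SEM rule of thumb mentioned above:

    import math

    def standard_error_of_measurement(sd_total, reliability):
        # Classical test theory estimate: SEM = SD * sqrt(1 - reliability).
        return sd_total * math.sqrt(1 - reliability)

    # Hypothetical test: standard deviation 8, reliability .84.
    sem = standard_error_of_measurement(8, 0.84)
    print(round(sem, 2))           # SEM of the test (3.2 here)
    print(round(1.5 * sem, 2))     # rule-of-thumb change expected between similar tests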

A Caution in Interpreting Item Analysis Results

Each of the various item statistics provided by ScorePak®

provides information which can be used to improve individual test items

and to increase the quality of the test as a whole. Such statistics must

always be interpreted in the context of the type of test given and the

individuals being tested. W. A. Mehrens and I. J. Lehmann provide the

following set of cautions in using item analysis results (Measurement and

Evaluation in Education and Psychology. New York: Holt, Rinehart and

Winston, 1973, 333-334):

Item analysis data are not synonymous with item validity. An

external criterion is required to accurately judge the validity of

test items. By using the internal criterion of total test score, item


analyses reflect internal consistency of items rather than

validity.

The discrimination index is not always a measure of item

quality. There is a variety of reasons an item may have low

discriminating power:

a) extremely difficult or easy items will have low ability to

discriminate but such items are often needed to adequately

sample course content and objectives;

b) an item may show low discrimination if the test measures many

different content areas and cognitive skills. For example, if the

majority of the test measures "knowledge of facts," then an item

assessing "ability to apply principles" may have a low

correlation with total test score, yet both types of items are

needed to measure attainment of course objectives.

Item analysis data are tentative. Such data are influenced by the

type and number of students being tested, instructional

procedures employed, and chance errors. If repeated use of

items is possible, statistics should be recorded for each

administration of each item.

Note on raw scores: Raw scores are those scores which are computed by scoring answer sheets against a ScorePak® Key Sheet. Raw score names are EXAM1 through EXAM9, QUIZ1 through QUIZ9, MIDTRM1 through MIDTRM3, and FINAL. ScorePak® cannot analyze scores taken from the bonus section of student answer sheets or computed from other scores, because such scores are not derived from individual items which can be accessed by ScorePak®. Furthermore, separate analyses must be requested for different versions of the same exam.

Note on correlation: A correlation is a statistic which indexes the degree of linear

relationship between two variables. If the value of one variable is related

to the value of another, they are said to be "correlated." In positive

relationships, the value of one variable tends to be high when the value of

the other is high, and low when the other is low. In negative


relationships, the value of one variable tends to be high when the other is

low, and vice versa. The possible values of correlation coefficients range

from -1.00 to 1.00. The strength of the relationship is shown by the

absolute value of the coefficient (that is how large the number is whether

it is positive or negative). The sign indicates the direction of the

relationship (whether positive or negative).


QUESTION:

A few years ago in your Shiken column, you showed how to do

item analysis for weighted items using a calculator (Brown, 2000, pp. 19-

21) and a couple of columns back (Brown, 2002, pp. 20-23) you showed

how to do distractor efficiency analysis in a spreadsheet program. But, I

don't think you have ever shown how to do regular item analysis statistics

in a spreadsheet. Could you please do that? I think some of your readers

would find it very useful.

ANSWER:

Yes, I see what you mean. In answering questions from readers,

I explained more advanced concepts of item analysis without laying the

groundwork that other readers might need. To remedy that, in this

column, I will directly address your question, but only with regard to

norm-referenced item analysis. In my next Statistics Corner column, I

will address another reader's question, and in the process show how

criterion-referenced item analysis can be done in a spreadsheet.

The Overall Purpose of Item Analysis

Let's begin by answering the most basic question in item

analysis: Why do we do item analysis? We do it as the penultimate step

in the test development process. Such projects are usually accomplished

in the following steps:

1. Assemble or write a relatively large number of items of the type

you want on the test.


2. Analyze the items carefully using item format analysis to make

sure the items are well written and clear (for guidelines, see

Brown, 1996, 1999; Brown & Hudson, 2002).

3. Pilot the items using a group of students similar to the group that

will ultimately be taking the test. Under less than ideal

conditions, this pilot testing may be the first operational

administration of the test.

4. Analyze the results of the pilot testing using item analysis

techniques. These are described below for norm-referenced tests

(NRTs) and in the next column for criterion-referenced tests

(CRTs).

5. Select the most effective items (and get rid of the ineffective

items) to make a shorter, more effective revised version of the

test.

Basically, those five steps are followed in any test development or

revision project.

Item Analysis Statistics for Norm-Referenced Tests

As indicated above, the fourth step, item analysis, is different for

NRTs and CRTs, and in this column, I will only explain item analysis

statistics as they apply to NRTs. The basic purpose of any NRT is to

spread students out along a general continuum of language abilities,

usually for purposes of making aptitude, proficiency, or placement

decisions (for much more on this topic, see Brown, 1996, 1999; Brown &

Hudson, 2002). Two item statistics are typically used in the item analysis

of such norm-referenced tests: item facility and item discrimination.

Item facility (IF) is defined here as the proportion of students who answered a particular item correctly. Thus, if 45 out of 50 students answered a particular item correctly, the proportion would be 45/50 = .90. An IF of .90 means that 90% of the students answered the item correctly, and by extension, that the item is very easy. In Screen 1, you will see one way to calculate IF using the Excel® spreadsheet for item 1 (I1) in a small example data set coded 1 for correct and 0 for incorrect


answers. Notice the cursor has outlined cell C21 and that the

function/formula typed in that cell (shown both in the row above the

column labels and in cell B21) is = AVERAGE (C2:C19), which means

average the ones and zeros in the range between cells C2 and C19. The

result in this case is .94, a very easy item because 94% of the students are

answering correctly.
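The same calculation is easy to reproduce outside a spreadsheet. The short Python sketch below computes IF as the mean of 0/1 item codes; the response data are made up for illustration and are not the values in Screen 1.

    # Hypothetical 0/1 responses of 18 students to one item (1 = correct, 0 = incorrect)
    item1 = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

    # Item facility (IF) = proportion answering correctly, i.e. the mean of the 0/1 codes
    item_facility = sum(item1) / len(item1)
    print(round(item_facility, 2))  # about .89 for these made-up data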

All the other NRT and CRT item analysis techniques that I will

discuss here and in the next column are based on this notion of item

facility. For instance, item discrimination can be calculated by first

figuring out who the upper and lower students are on the test (using their

total scores to sort them from the highest score to the lowest). The upper

and lower groups should probably be made up of equal numbers of

students who represent approximately one third of the total group each. In

Screen 1, I have sorted the students from high to low based on their total

test scores from 77 for Hide down to 61 for Hachiko. Then I separated

the three groups such that there are five in the top group, five in the

bottom group, and six in the middle group. Notice that Issaku and Naoyo


both had scores of 68 but ended up in different groups (as did Eriko and

Kimi with their scores of 70). The decision as to which group they were

assigned to was made with a coin flip.

To calculate item discrimination (ID), I started by calculating IF for the upper group using the following: = AVERAGE(C2:C6), as shown in row 22. Then, I calculated IF for the lower group using the following: = AVERAGE(C15:C19), as shown in row 23. With IF(upper) and IF(lower) in hand, calculating ID simply required subtracting IF(upper) - IF(lower). I did this by subtracting C22 minus C23, or = C22 - C23, as shown in row 24, which resulted in an ID of .20 for I1.
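A minimal Python sketch of the same subtraction, assuming the students have already been sorted by total score and split into equal-sized upper and lower groups (the 0/1 responses below are hypothetical):

    # Responses on one item for the five highest- and five lowest-scoring students (hypothetical)
    upper = [1, 1, 1, 1, 0]   # IF(upper) = .80
    lower = [1, 0, 1, 1, 0]   # IF(lower) = .60

    if_upper = sum(upper) / len(upper)
    if_lower = sum(lower) / len(lower)
    item_discrimination = if_upper - if_lower
    print(round(item_discrimination, 2))  # 0.2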

Once I had calculated the four item analysis statistics shown in Screen 1 for I1, I then simply copied them and pasted them into the spaces below the other items, which resulted in all the other item statistics you see in Screen 1. [Note that the statistics didn't always fit in the available spaces, so I got results that looked like ### in some cells; to fix that, I blocked out all the statistics and typed Alt, O, C, A, which adjusted the column widths to fit the statistics. You may also want to adjust the number of decimal places, which is beyond the scope of this article. You can learn about this by looking in the Help menu or in the Excel manual.]

Ideal items in an NRT should have an average IF of .50. Such items would thus be well centered, i.e., 50 percent of the students would have answered correctly, and by extension, 50 percent would have answered incorrectly. In reality, however, items rarely have an IF of exactly .50, so those that fall in a range between .30 and .70 are usually considered acceptable for NRT purposes.

Once those items that fall within the .30 to .70 range of IFs are

identified, the items among them that have the highest IDs should be

further selected for inclusion in the revised test. This process would help

the test designer to keep only those items that are well centered and

discriminate well between the high and the low scoring students. Such

items are indicated in Screen 1 by an asterisk in row 25 (cleverly labeled

"Keepers").

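The selection rule just described (keep items whose IF falls between .30 and .70 and whose ID is comparatively high) can be sketched in a few lines of Python; the item statistics and the ID cut-off of .20 below are illustrative assumptions, not the values in Screen 1.

    # Hypothetical (IF, ID) pairs for six items
    item_stats = {"I1": (0.94, 0.20), "I2": (0.55, 0.45), "I3": (0.40, 0.10),
                  "I4": (0.65, 0.30), "I5": (0.30, 0.25), "I6": (0.80, 0.35)}

    keepers = [name for name, (IF, ID) in item_stats.items()
               if 0.30 <= IF <= 0.70 and ID >= 0.20]
    print(keepers)  # ['I2', 'I4', 'I5'] for these made-up values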

For more information on using item analysis to develop NRTs,

see Brown (1995, 1996, 1999). For information on calculating NRT

statistics for weighted items (i.e., items that cannot be coded 1 or 0 for

correct and incorrect), see Brown (2000). For information on calculating

item discrimination using the point-biserial correlation coefficient instead

of ID, see Brown (2001). For an example NRT development and revision

project, see Brown (1988).

Conclusion

I hope you have found my explanation of how to do norm-

referenced item analysis statistics (item facility and item discrimination)

in a spreadsheet clear and helpful. I must emphasize that these statistics

are only appropriate for developing and analyzing norm-referenced tests,

which are usually used at the institutional level, like, for example, overall

English language proficiency tests (to help with, say, admissions

decisions) or placement tests (to help place students into different levels

of English study within a program). However, these statistics are not

appropriate for developing and analyzing classroom oriented criterion-

referenced tests like the diagnostic, progress, and achievement tests of

interest to teachers. For an explanation of item analysis as it is applied to

CRTs, read the Statistics Corner column in the next issue of this

newsletter, where I will explain the distinction between the difference

index and the B-index.

3.3 ITEM DIFFICULTY:

Definition:

“Item difficulty is a measure of the proportion of individuals who responded correctly to each test item.” Item difficulty in a test is determined by the proportion of individuals who correctly respond to the item in question.

“Item difficulty of a test for a particular group is evaluated by the percentage of participants who respond correctly.”


Explanation:

Item difficulty is simply the percentage of students taking the test who answered the item correctly. The larger the percentage getting an item right, the easier the item. The higher the difficulty index, the easier the item is understood to be (Wood, 1960). To compute the item difficulty, divide the number of people answering the item correctly by the total number of people answering the item. The proportion for the item is usually denoted by p and is called the item difficulty. The range is from 0% to 100%.

Examples:

To determine the difficulty level of test items, a measure called the difficulty index is used. This measure asks teachers to calculate the proportion of students who answered the item correctly. By looking at each alternative (for multiple choice), we can also find out if there are answer choices that should be replaced. For example, suppose we gave a multiple choice quiz and there were four answer choices (A, B, C and D). The following table illustrates how many students selected each answer choice for Question # 1 and # 2.

Questions A B C D

#1 0 3 24* 3

#2 12* 13 3 2

*Denotes correct answers.

For question # 1, we can see that A was not a very good distracter; no one selected that answer. We can also compute the difficulty of the item by dividing the number of students who chose the correct answer (24) by the total number of students (30). Using this formula, the difficulty of Question # 1 is

P = 24/30

P = .80


A rough “rule of thumb” is that if the item difficulty is more than .75, it is an easy item; if the difficulty is below .25, it is a difficult item. Given these parameters, this item could be regarded as moderately easy, since most (80%) of the students got it correct. In contrast, Question # 2 is much more difficult:

P = 12/30

P = .40

In fact, on question # 2, more students selected an incorrect

answer (B) than selected the correct answer (A). This item should be

carefully analyzed to ensure that B is an appropriate distracter.

Therefore “item difficulty” should really have been named “item easiness”; it expresses the proportion or percentage of students who answered the item correctly.
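A short Python sketch of this calculation for the two questions above; the answer-choice counts are those in the table, and the helper function name is just illustrative.

    # Number of students choosing each option (30 students in total)
    q1_counts = {"A": 0, "B": 3, "C": 24, "D": 3}   # key: C
    q2_counts = {"A": 12, "B": 13, "C": 3, "D": 2}  # key: A

    def difficulty(counts, key):
        # Proportion of students answering correctly (p)
        return counts[key] / sum(counts.values())

    print(difficulty(q1_counts, "C"))  # 0.8 -> moderately easy
    print(difficulty(q2_counts, "A"))  # 0.4 -> more difficult
    # An option chosen by nobody (A in Question 1) is not working as a distracter.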

3.4 THE INDEX OF DISCRIMINATION

Introduction

1. The index of discrimination is a useful measure of item quality

whenever the purpose of a test is to produce a spread of scores,

reflecting differences in student achievement, so that distinctions

may be made among the performances of examinees. This is

likely to be the purpose of norm-referenced tests.

2. It is the degree to which students with high overall exam scores

also got a particular item correct. It is often referred to as Item

Effect, since it is an index of an item's effectiveness at

discriminating those who know the content from those who do

not.

3. The item discrimination index is a point biserial correlation coefficient. Its possible range is -1.00 to 1.00. A strong, positive correlation suggests that students who get any one question correct also have a relatively high score on the overall exam. Theoretically, this makes sense: students who perform well on the test overall should be the ones who know the content. There is a problem if students are getting correct answers on a test when they do not know the content.

Measurement of Index of Discrimination

Example 1: If we are using the Item Analysis provided by Scanning Operations, discrimination indices are listed under the column heading ‘Disc.’

RESPONSE TABLE - FORM A

Item No.   Omit   A%   B%   C%   D%   E%   Key   %     Disc.
1           0      0   18   82    0    0    C    82    0.22
2           0     79    0    0   21    0    A    79    0.23
3           0      4    7   89    0    0    C    89   -0.12

The Index of Discrimination

When we examine item discrimination, there are a number of things we should consider.

1. Item difficulty. Very easy or very difficult items are not good discriminators. If an item is so easy (e.g., difficulty = 98) that nearly everyone gets it correct, or so difficult (e.g., difficulty = 30) that nearly everyone gets it wrong, then it becomes very difficult to discriminate those who actually know the content from those who do not.

2. That does not mean that very easy and very difficult items should be eliminated. In fact, they are fine as long as they are used with the instructor's recognition that they will not discriminate well, and if putting them on the test matches the intention of the instructor either to really challenge students or to make certain that everyone knows a certain bit of content.

3. A poorly written item will have little ability to discriminate.

Example 2

Another measure, the Discrimination Index, refers to how well

an assessment differentiates between high and low scorers. In other

words, you should be able to expect that the high-performing students

would select the correct answer for each question more often than the

low-performing students. If this is true, then the assessment is said to

have a positive discrimination index (between 0 and 1) -- indicating that

students who received a high total score chose the correct answer for a

specific item more often than the students who had a lower overall score.

If, however, you find that more of the low-performing students

got a specific item correct, then the item has a negative discrimination

index (between -1 and 0). Let's look at an example.

Table 1 displays the results of ten students on three quiz questions. Note that the students are arranged with the top overall scorers at the top of the table.

Table-1: The Index of Discrimination

Student      Total Score (%)      Q1      Q2      Q3

Asif 90 1 0 1

Sam 90 1 0 1

Jill 80 0 0 1

Charlie 80 1 0 1

Sonya 70 1 0 1

Ruben 60 1 0 0

Clay 60 1 0 1

Kelley 50 1 1 0

Justin 50 1 1 0

Tonya 40 0 1 0


“1” indicates the answer was correct; “0” indicates it was incorrect.

Steps to determine the Difficulty Index and the Discrimination Index.

1. After the students are arranged with the highest overall scores at

the top, count the number of students in the upper and lower

group who got each item correct. For Question #1, there were 4

students in the top half who got it correct and 4 students in the

bottom half.

2. Determine the Difficulty Index by dividing the number who got

it correct by the total number of students. For Question #1, this

would be 8/10 or p=.80.

3. Determine the Discrimination Index by subtracting the number

of students in the lower group who got the item correct from the

number of students in the upper group who got the item correct.

Then, divide by the number of students in each group (in this

case, there are five in each group). For Question #1, that means

you would subtract 4 from 4, and divide by 5, which results in a

Discrimination Index of 0.

4. The answers for Questions 1-3 are provided in Table 2.

Table-2

Item           # Correct (Upper group)    # Correct (Lower group)    Difficulty (p)    Discrimination (D)

Question 1 4 4 .80 0

Question 2 0 3 .30 -0.6

Question 3 5 1 .60 0.8

In table 2 we can see that Question #2 had a difficulty index

of .30 (meaning it was quite difficult), and it also had a negative

discrimination index of -0.6 (meaning that the low-performing students

were more likely to get this item correct). This question should be

carefully analyzed, and probably deleted or changed. Our "best" overall

question is Question 3, which had a moderate difficulty level (.60), and

discriminated extremely well (0.8).
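The same steps can be carried out in a few lines of Python. The 0/1 data below are those of Table-1 (students already ordered from the highest to the lowest total score), and the split into the top five and bottom five students follows the example above.

    # 0/1 answers of the ten students, ordered from highest to lowest total score
    q1 = [1, 1, 0, 1, 1, 1, 1, 1, 1, 0]
    q2 = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    q3 = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

    def difficulty_and_discrimination(item, group_size=5):
        p = sum(item) / len(item)            # difficulty index
        upper = sum(item[:group_size])       # number correct in the upper group
        lower = sum(item[-group_size:])      # number correct in the lower group
        d = (upper - lower) / group_size     # discrimination index
        return p, d

    for name, item in [("Q1", q1), ("Q2", q2), ("Q3", q3)]:
        print(name, difficulty_and_discrimination(item))
    # Q1 (0.8, 0.0)   Q2 (0.3, -0.6)   Q3 (0.6, 0.8), matching Table-2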


Recommendations for Determining Index of Discrimination

It is typically recommended that item discrimination be at

least .20. It's best to aim even higher. Items with a negative

discrimination are theoretically indicating that either the students who

performed poorly on the test overall got the question correct or that

students with high overall test performance did not get the item correct.

Thus, the index could signal a number of problems:

There is a mistake on the scoring key.

Poorly prepared students are guessing correctly.

Well prepared students are somehow justifying the wrong

answer.

In all cases, action must be taken! So, items with negative item discrimination must be addressed. Items with discrimination indices less

than .20 (or slightly over, but still relatively low) must be revised or

eliminated. Be certain that there is only one possible answer, that the

question is written clearly, and that your answer key is correct.


UNIT-4:

INTERPRETING THE TEST SCORES

4.1 THE PERCENTAGE CORRECT SCORE:

What does a test score mean?

A test score is a piece of information, usually a number, that conveys the

performance of an examinee on a test. One formal definition is that it is

"a summary of the evidence contained in an examinee's responses to the

items of a test that are related to the construct or constructs being

measured."

Test scores are interpreted with a norm-referenced or criterion-referenced

interpretation, or occasionally both. A norm-referenced interpretation

means that the score conveys meaning about the examinee with regard to their standing among other examinees. A criterion-referenced interpretation means that the score conveys information about the examinee with regard to a specific subject matter, regardless of other

examinees' scores.

Types of Test Scores

There are two types of test scores: raw scores and scaled scores. A raw

score is a score without any sort of adjustment or transformation, such as

the simple number of questions answered correctly. A scaled score is the

result of some transformation applied to the raw score.

The purpose of scaled scores is to report scores for all examinees on a

consistent scale. Suppose that a test has two forms, and one is more

difficult than the other. It has been determined by equating that a score of

65% on form 1 is equivalent to a score of 68% on form 2. Scores on both

forms can be converted to a scale so that these two equivalent scores have

the same reported scores. For example, they could both be a score of 350

on a scale of 100 to 500.
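Once equating has established which raw scores on the two forms are equivalent, reporting them on a common scale is a matter of transforming each form's raw score. The sketch below assumes a plain linear conversion chosen purely for illustration; operational equating and scaling methods are more elaborate.

    def to_scale(raw, raw_low, raw_high, scale_low=100, scale_high=500):
        # Linearly map a raw score onto a reporting scale (illustrative only)
        fraction = (raw - raw_low) / (raw_high - raw_low)
        return round(scale_low + fraction * (scale_high - scale_low))

    # Hypothetical form ranges chosen so that 65% on form 1 and 68% on form 2
    # both land on the same reported score of 350
    print(to_scale(65, raw_low=15, raw_high=95))   # form 1 -> 350
    print(to_scale(68, raw_low=18, raw_high=98))   # form 2 -> 350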


Two well-known tests in the United States that have scaled scores are the

ACT and the SAT. The ACT's scale ranges from 0 to 36 and the SAT's

from 200 to 800 (per section). Ostensibly, these two scales were selected

to represent a mean and standard deviation of 18 and 6 (ACT), and 500 and 100 (SAT). The upper and lower bounds were selected because an interval

of plus or minus three standard deviations contains more than 99% of a

population. Scores outside that range are difficult to measure, and return

little practical value.

Note that scaling does not affect the psychometric properties of a test; it

is something that occurs after the assessment process (and equating, if

present) is completed. Therefore, it is not an issue of psychometrics, per

se, but an issue of interpretability.

Interpretation of the Score by Criterion Referencing

The raw score is the number of points received on a test when the test has been scored according to the instructions. A raw score is not very meaningful without further information. Criterion-referenced test interpretation permits us to describe an individual's test performance without referring to the performance of other individuals. Thus we might describe a student's performance in terms of the speed and precision with which a certain task is performed. Criterion-referenced interpretation of test scores is most meaningful when the test is designed to measure a set of clearly stated learning tasks, and enough items are used for each interpretation to make dependable judgments.

Interpretation of the Score by Percentages

In mathematics, a ratio expressed in relation to 100 is called a percentage (denoted by %). Often it is useful to express the scores in terms of percentages for comparison. Consider the following example.

Grade      Class A (No. of Students)      %         Class B (No. of Students)      %
A              10                         12.50          8                         40
B              25                         31.25          6                         30
C              30                         37.50          4                         20
D              15                         18.75          2                         10
Total          80                         100           20                         100

Ten students from class A and eight students from class B got grade A. Apparently class A is better at getting grade A, but 12.5% of the students from class A and 40% of the students from class B got grade A. It is clear from these percentages that class B is far better at getting grade A than class A.

Interpretation of the Score by Norm Referencing

Interpretation of scores by norm referencing involves ranking the scores and expressing a given score in relation to the other scores. Norm-referenced test interpretation tells us how an individual compares with other persons who have taken the same test. The simplest type of comparison is to rank the scores from highest to lowest and to note where an individual's score falls. The rest of the scores serve as the norm group, and the given score is compared with the other scores by norm referencing. If a student's score is second from the top in a group of 20 students, it is a high score, meaning that the scores of 90% of the students are less than his.

Ordering and Ranking

A first step in organizing scores is the listing of the scores in order of magnitude from the largest to the smallest score. The data so arranged are called an ordered array. By scanning an ordered array, we can determine quickly the largest score, the smallest score and other facts about the data.

Ranked data consist of scores in a form that shows their relative position on some characteristic but does not yield a numerical value for this characteristic. The order of finish of cars in a race is an example of ranking. If we list the cars as first, second, third, etc. up to the last car, we can say that they were ranked on the characteristic of overall speed. We know each car's position relative to any other car's position, but we have no precise knowledge of the speed of any car. If a high school teacher ranked Hamid 30th in a class of 100, it means that Hamid did better than 70 of his classmates but poorer than 29; nothing, however, has been said about Hamid's general level of achievement.

Measurement Scales

Measurement scales are of great significance in analyzing and

interpreting results. The important types of measurement scales are:

The Nominal Scale

The lowest measurement scale is the nominal scale. In this scale, each individual is put into one of a number of distinct categories or classes. Each class has a name, and the names are just labels. There is no order in these classes: we cannot say that one class is larger than another, and we cannot do arithmetic operations (addition, subtraction, multiplication, division) on this scale.

Examples of the nominal scale are: categorization of the blood groups of the students of a college into A, B, AB and O groups (we cannot say that group A is better than group B); classification of the books in a college library according to subject; and distribution of the population of Pakistan according to sex, religion, occupation, marital status, literacy, etc.

The Ordinal Scale

When measurements are not only different from category to category but can also be ranked according to some criterion, they are said to be measured on an ordinal scale. The members of any one category are considered equal, but the members of one category are considered lower than those in another category. The ordinal scale is one step higher than the nominal scale because we not only distribute the individuals into classes but also order these classes.

Examples of the ordinal scale are: categorization of schools according to their educational level into primary, middle, secondary or higher secondary (there is an order in these classes: the primary level is lower than the middle level and the middle level is lower than the secondary level, but you cannot do arithmetic operations on this scale); classification of individuals according to socioeconomic status as low, medium or high; classification of the intelligence of students as average, above average or below average; and classification of examination results into different grades (A, B, C, D, E, etc.). On this measurement scale we can say that one individual is higher than another, but we cannot say how much higher.

The Interval Scale

In this scale, it is not only possible to order measurements, but the distance between two measurements is also known. We can say that the difference between the measurements 30 and 40 is equal to the difference between the measurements 40 and 50. The level of the interval scale is higher than the nominal and the ordinal scales; this is truly a quantitative scale. A unit of measurement and a zero point are required for this scale. The selected zero point is not necessarily a true zero: it does not have to indicate a total absence of the quantity being measured. We measure height in meters or feet, weight in kilograms or pounds, temperature in centigrade or Fahrenheit, income in rupees and time in seconds. Arithmetic operations can be done on this scale; you can add the income of a wife to that of her husband.

The Ratio Scale

The highest level of measurement is the ratio scale. Equality of

ratios as well as equality of intervals is determined in this scale.

Fundamental to the ratio scale is the true zero point. The measurement of

height, weight and length makes use of the ratio scale.

Frequency Distribution

Data that have been originally collected are called raw data or primary data; they have not yet undergone any statistical treatment. To understand the raw data easily, we arrange them into groups or classes. The data so arranged are called grouped data or a frequency distribution.

General rules for the construction of a frequency distribution:

1. Determine the range. The range is the difference between the highest and the lowest scores.

2. Decide the appropriate number of class intervals. There is no hard and fast formula for deciding the number of class intervals; it is usually taken between 5 and 20, depending on the length of the data.

3. Determine the approximate length of the class interval by dividing the range by the number of class intervals.

4. Determine the limits of the class intervals, taking the smallest scores at the bottom of the column to the largest scores at the top.

5. Determine the number of scores falling in each class interval. This is done by using a tally or score sheet.

Example:

The marks obtained by 120 students of first year class in the

subject of Education are given below. Construct a frequency distribution.

57 86 69 62 75 73 80 78 87 83 77 35 70 68 84 73 81 78

61 72 59 98 95 63 76 73 88 60 52 83 86 45 70 53 85 74

62 78 89 84 60 79 91 64 84 85 81 79 90 78 83 50 71 65

76 58 71 79 51 61 61 89 81 74 76 74 82 91 71 76 80 52

71 66 77 65 44 79 95 74 79 63 83 87 77 75 83 48 70 85

61 70 72 67 61 83 75 79 97 75 66 54 81 68 78 75 83 61

33 76 62 55 72 76 78 75 99 80 83 86

The following steps are followed to make a frequency distribution.

1. Step-1: Range = maximum score - minimum score = 99 - 33 = 66.

2. Step-2: The approximate number of class intervals to be taken is 7.

3. Step-3: The length of the class interval, usually denoted by i, is

   i = Range / No. of class intervals = 66 / 7 = 9.4 (approximately)

   The length is usually rounded upward to a whole number, so 9.4 is taken as 10.

4. Step-4: Determine the limits of the class intervals:

   90 - 99
   80 - 89
   70 - 79
   60 - 69
   50 - 59
   40 - 49
   30 - 39

The lowest class interval is taken such that the minimum score can be included in it. The minimum score is 33, so the lowest class interval can be started from 30; it is convenient to start the lowest class interval from a score to which addition of the length of the class intervals is easy, so we start from 30. This is called the lower limit of the class interval. Add 9 (i - 1 = 10 - 1 = 9) to the lower limit to get the upper limit of the first class interval. Now consecutively add i = 10 to the lower limits and upper limits to get the remaining class intervals.

5. Step-5: Distribute the scores in the class intervals by putting a

tally mark in the relevant class interval and count the number of

scores in each class interval.

Class Interval     Tallies                                                       No. of Students
90 - 99            ||||| |||                                                      8
80 - 89            ||||| ||||| ||||| ||||| ||||| |||||                            30
70 - 79            ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||| |||||          45
60 - 69            ||||| ||||| ||||| ||||| ||                                     22
50 - 59            ||||| |||||                                                    10
40 - 49            |||                                                            3
30 - 39            ||                                                             2
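The same tabulation can be sketched in Python. For brevity the list below is a small made-up set of marks rather than the 120 marks above, but the identical code applies to any list of scores.

    # A small, made-up set of marks (the same code works for the full data set)
    scores = [33, 45, 52, 57, 61, 64, 68, 71, 75, 75, 78, 82, 86, 91, 99]

    interval_length = 10
    intervals = [(low, low + interval_length - 1) for low in range(30, 100, interval_length)]

    for low, high in reversed(intervals):       # largest class interval at the top
        frequency = sum(1 for s in scores if low <= s <= high)
        print(f"{low}-{high}: {frequency}")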

Frequency

The number of scores lying in a class interval is called the

frequency of that class interval. For example, two scores lie in the class

interval 30-39. Therefore 2 is the frequency of the class interval 30-39.

Mid-Point or Class Mark

The middle of a class interval is called the mid-point or class mark and is usually denoted by X. It is calculated as

Midpoint = X = (Lower limit + Upper limit) / 2

For example, the mid-point of the class interval 30-39 is

X = (30 + 39) / 2 = 69 / 2 = 34.5

Measures of Central Tendency:

A single score calculated to represent all the scores is called an average. An average tends to lie in the centre of an array; that is why averages are called measures of central tendency. Since averages locate the centre of a data set, they are also called measures of location. Several types of average can be defined. The most commonly used averages are the arithmetic mean, the median and the mode.


The Arithmetic Mean or Mean

The arithmetic mean is the most commonly used average. It is usually called the mean or the average. The arithmetic mean is defined as the number obtained by dividing the sum of the scores by their number. It is denoted by putting a bar on the variable symbol, e.g., X̄ (read as "X bar").

The formula for calculating the arithmetic mean of ungrouped data is:

X̄ = ΣX / N

where Σ (read as sigma) is the Greek symbol meaning "the sum of", ΣX means the sum of the values of the variable X, and N is the number of scores or measurements.

The formula for calculating the arithmetic mean of grouped data is:

X̄ = Σfx / Σf

where Σfx means the sum of the products of f and x, f means the frequency of a class interval, x means the score (the mid-point of the class interval), and Σf means the sum of all the frequencies of the distribution.

The Median:

The median of a set of scores is the middle score, or the arithmetic mean of the two middle scores, in an ordered array. 50% of the scores are less than the median and 50% of the scores are greater than the median.

Formula for calculating the median for ungrouped data:

Median = ((N + 1) / 2)th score

Formula for calculating the median for grouped data:

Median = L + (i / f)(N/2 - C)

where

L = lower class boundary of the median class interval.
i = length of the median class interval.
f = the frequency of the median class interval.
N = Σf
C = the cumulative frequency of the class interval below the median class interval.

The Mode

The mode is the score that occurs the greatest number of times in a data set. The mode does not always exist: if each score occurs the same number of times, there is no mode. There may also be more than one mode: if two or more scores occur the greatest number of times, then there is more than one mode.

The mode can be calculated for grouped data with the help of the following formula:

Mode = L + ((fm - f1) / (2fm - f1 - f2)) × i

where

L = lower class boundary of the modal class interval.
fm = the maximum frequency.
f1 = the frequency preceding the modal class.
f2 = the frequency succeeding the modal class.
i = the length of the modal class interval.

Note: The mode lies in the class interval having the maximum frequency. This class interval is called the modal class.

Empirical Relationship between Mean, Median and Mode:

For moderately skewed distributions, we have the following empirical relation:

Mode = 3 Median - 2 Mean

For example, with Median = 74.61 and Mean = 73.42,

Mode = 3(74.61) - 2(73.42) = 76.99
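These averages, and the empirical relation, can be checked quickly with Python's standard library; the data set below is a small illustrative one, not the 120 marks used earlier.

    import statistics

    scores = [61, 64, 68, 71, 73, 73, 75, 78, 82, 85]

    mean = statistics.mean(scores)      # sum of the scores divided by their number
    median = statistics.median(scores)  # middle score (or mean of the two middle scores)
    mode = statistics.mode(scores)      # most frequently occurring score

    print(mean, median, mode)           # all three equal 73 for this data set
    # Empirical relation for moderately skewed data: Mode is approximately 3*Median - 2*Mean
    print(3 * median - 2 * mean)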

Comparison of Measures of Central Tendency:

The numerical value of every score in a data set contributes to the mean. This is not true of the mode or the median, because only the mean is based on the sum of all the scores. In a single-peaked symmetrical distribution, mean = median = mode. In practice, no distribution is exactly symmetrical, so the mode, median and mean usually have different values; if a population is not symmetrical, the mean, median and mode will not be equal. The mean is affected by the presence of a few extreme scores, which the median and mode are not. The mean is preferred if extreme values are not present in the data. The median is preferred if interest is centered on the typical rather than the total score and if the distribution is skewed. If some scores are missing, so that the mean cannot be computed directly, the median is appropriate. The mode is preferred only if the distribution is multimodal and a multi-valued index is satisfactory.

The Quartiles

The values that divide a set of scores into four equal parts are called quartiles and are denoted by Q1, Q2 and Q3. Q1 is called the lower quartile and Q3 is called the upper quartile. 25% of the scores are less than Q1 and 75% of the scores are less than Q3. Q2 is the median. The formulas for the quartiles are given as:

Q1 = ((N + 1) / 4)th score

Q2 = (2(N + 1) / 4)th score = ((N + 1) / 2)th score

Q3 = (3(N + 1) / 4)th score


4.2 THE PERCENTILE RANKS:

The Percentiles:

The values that divide a set of scores into one hundred equal parts are called percentiles and are denoted by P1, P2, P3, ... and P99. P25 is the first quartile, P75 is the third quartile and P50 is the median.

The Percentile Ranks (PR):

The procedure for calculating percentile ranks is the reverse of the procedure for calculating percentiles. Here we have an individual's score and find the percentage of scores that lie below it. In the example, we calculate P78 = 83.37; it means that 83.37 is the score below which 78% of the scores fall. If a student has a score of 83.37, we can say that his percentile rank (PR) is 78 on a scale of 100.
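Using this simple definition (the percentage of scores falling below a given score), a percentile rank can be computed in a couple of lines of Python; the scores and the score of interest below are hypothetical.

    def percentile_rank(scores, score):
        # Percentage of scores in the list that fall below the given score
        below = sum(1 for s in scores if s < score)
        return 100 * below / len(scores)

    scores = [55, 60, 62, 65, 68, 70, 73, 75, 80, 90]
    print(percentile_rank(scores, 75))   # 70.0 -> a score of 75 has PR = 70 in this group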

Relationships with a Distribution

Computing the Coefficient of Correlation

A coefficient of correlation measures the degree of linear relationship between two sets of scores. The range of the coefficient is from -1 to +1, with the intermediate value 0 meaning no linear relationship. There are two extremes: r = +1 indicates perfect positive correlation and r = -1 indicates perfect negative correlation. The larger the absolute value of r, the higher is the degree of linear relationship.

[Figure: scatter diagrams illustrating positive correlation, negative correlation, and no correlation.]

The most common methods of computing the Coefficient of

correlation are:

1. Rank-difference method:

This method is useful when the number of scores to be correlated is small or when the exact magnitudes of the scores cannot be ascertained. The scores are ranked according to size or some other criterion using the numbers 1, 2, 3, ..., n. The rank-difference coefficient of correlation can be computed by the following formula:

Rs = 1 - (6 ΣD²) / (N (N² - 1))

where D = the difference between the two rankings of a pair, and N = the number of pairs of scores.

2. The Product-moment method

The product-moment coefficient is usually used when the

number of scores is large. Thus this method is used in most research

studies. The product-moment coefficient is usually denoted by r.

r_xy = [ΣXY/N - (ΣX/N)(ΣY/N)] / [√(ΣX²/N - (ΣX/N)²) × √(ΣY²/N - (ΣY/N)²)]
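Both coefficients can be computed directly from these formulas. The short Python sketch below uses made-up paired scores with no tied values; real data with ties would need the usual tie-handling for ranks.

    import math

    x = [10, 20, 30, 40, 50]   # scores on test X (hypothetical)
    y = [12, 18, 35, 38, 52]   # scores on test Y (hypothetical)
    n = len(x)

    # Product-moment (Pearson) coefficient from the raw-score formula
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(xi * yi for xi, yi in zip(x, y)) / n - mean_x * mean_y
    sd_x = math.sqrt(sum(xi * xi for xi in x) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(yi * yi for yi in y) / n - mean_y ** 2)
    r = cov / (sd_x * sd_y)

    # Rank-difference (Spearman) coefficient: rank each list, then Rs = 1 - 6*sum(D^2)/(N(N^2-1))
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]   # simple ranking; assumes no ties

    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    rs = 1 - 6 * d_squared / (n * (n * n - 1))

    print(round(r, 2), rs)   # roughly 0.98 and 1.0 for these made-up data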

Measures of Variability:

Measures of central tendency locate the centre of a set of scores. However, two data sets can have the same mean, median and mode and yet be quite different in other respects. For example, consider the heights (in inches) of the players of two basketball teams.

Team-1: 72 73 76 76 78

Team-2: 67 72 78 76 84

The two teams have almost the same mean height, about 75 inches, but it is clear that the heights of the players of team 2 vary much more than those of team 1. If we have information about the centre of the scores and about the manner in which they are spread out, we know much more about a set of scores. The degree to which scores tend to spread about an average value is called dispersion.


The Range

The range is the simplest measure of dispersion. The range of a set of scores is the difference between the maximum score and the minimum score. In symbols,

Range = Xm - Xo

where Xm is the maximum score and Xo is the minimum score.

Quartile Deviation:

The quartile deviation is defined as half of the difference between the third and the first quartiles. In symbols,

Q.D. = (Q3 - Q1) / 2

where Q1 is the first quartile and Q3 is the third quartile.

The Mean Deviation or Average Deviation:

The average deviation is defined as the arithmetic mean of the deviations of the scores from the mean (or median), the deviations being taken as positive. In symbols,

M.D. = Σ|X - X̄| / N

For grouped data,

M.D. = Σ f|X - X̄| / Σf

The Standard Deviation:

The standard deviation is the positive square root of the

arithmetic mean of the squares of deviations of all the scores from their

mean.


S = √( Σ(X - X̄)² / N )

A short formula for calculating the standard deviation is

S = √( ΣX²/N - (ΣX/N)² )

The Coefficient of Variation:

Karl Pearson introduced a relative measure of dispersion known as the coefficient of variation (denoted by C.V.). It expresses the standard deviation as a percentage of the arithmetic mean of a data set. It is a number without units and is used to compare variation in two or more distributions; a smaller value of the C.V. indicates less variation. It is also used as a criterion for the consistent performance of students, players, etc.

C.V. = (S / X̄) × 100
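The measures above can be computed for the two basketball teams listed earlier; a short Python sketch (the helper names are illustrative):

    import math

    def mean(xs):
        return sum(xs) / len(xs)

    def variability(xs):
        m = mean(xs)
        rng = max(xs) - min(xs)                              # range
        md = mean([abs(x - m) for x in xs])                  # mean (average) deviation
        sd = math.sqrt(mean([(x - m) ** 2 for x in xs]))     # standard deviation
        cv = 100 * sd / m                                    # coefficient of variation
        return rng, md, sd, cv

    team1 = [72, 73, 76, 76, 78]
    team2 = [67, 72, 78, 76, 84]
    print(variability(team1))   # team 2 shows the larger spread on every measure
    print(variability(team2))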

Standard Scores:

A frequently used quantity in statistical analysis is the standard score or Z-score. The standard score for a data value is the number of standard deviations that the data value lies away from the mean of the data set:

Z = (X - X̄) / S
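A minimal sketch of the Z-score calculation, using made-up values:

    # A student scored 82 on a test with mean 70 and standard deviation 8 (hypothetical values)
    x, mean, sd = 82, 70, 8
    z = (x - mean) / sd
    print(z)   # 1.5 -> the score lies 1.5 standard deviations above the mean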

The Normal Curve:

Before explaining the normal distribution, some basic concepts of probability are given below. An event is a specified result that may or may not occur when an experiment is performed. For example, in tossing a coin once, the appearance of a head is an event which may or may not occur. The probability of an event is a measure of the likelihood of its occurrence. A probability near 0 indicates that the event is very unlikely to occur, whereas a probability near 1 indicates that the event is quite likely to occur.

Relative frequency interpretation of probability:

Consider the experiment of tossing a balanced coin once. There

are 50-50 chances the head will appear. Consequently, we assign a

probability of 0.5 to that event. The relative-frequency interpretation is

that in a large number of tosses, the head will appear about half of the

time.

Some Basic Properties of the Normal Curve

1. The total area under the normal curve is equal to 1.

2. The normal curve extends indefinitely in both directions.

3. The normal distribution is symmetric about the mean; that is, the part of the curve to the left of the mean is the mirror image of the part of the curve to the right of it.

4. The mean, the median and the mode are equal.

5. The mean deviation is 0.7979σ.

6. The quartile deviation is 0.6745σ.

7. In a normal distribution, the interval from μ - 0.6745σ to μ + 0.6745σ covers 50% of the area; μ - σ to μ + σ covers 68.27% of the area; μ - 2σ to μ + 2σ covers 95.45% of the area; and μ - 3σ to μ + 3σ covers 99.73% of the area.
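The areas listed in property 7 can be verified with the standard normal distribution; a short Python check using math.erf:

    import math

    def area_within(z):
        # Proportion of a normal distribution lying within z standard deviations of the mean
        return math.erf(z / math.sqrt(2))

    for z in (0.6745, 1, 2, 3):
        print(z, round(100 * area_within(z), 2))   # about 50%, 68.27%, 95.45%, 99.73%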

4.3 STANDARD SCORES:

Most educational and psychological tests provide standard

scores that are based on a scale that has a statistical mean (or average

score) of 100. If a student earns a standard score that is less than 100,

then that student is said to have performed below the mean, and if a

student earns a standard score that is greater than 100, then that student is


said to have performed above the mean. However, there is a wide range

of average scores, from low average to high average, with most students

earning standard scores on educational and psychological tests that fall in

the range of 85-115. This is the range in which 68% of the general

population performs and, therefore, is considered the normal limits of

functioning.

Classifying Standard Scores

However, the normal limits of functioning encompass three

classification categories: low average (standard scores of 80-89), average

(standard scores of 90-109), and high average (110-119). These

classifications are used typically by school psychologists and other

assessment specialists to describe a student's ability compared to same-

age peers from the general population.

Subtest Scores

Many psychological tests are composed of multiple subtests that

have a mean of 10, 50, or 100. Subtests are relatively short tests that

measure specific abilities, such as vocabulary, general knowledge, or

short-term auditory memory. Two or more subtest scores that reflect

different aspects of the same broad ability (such as broad Verbal Ability)

are usually combined into a composite or index score that has a mean of

100. For example, a Vocabulary subtest score, a Comprehension subtest

score, and a General Information subtest score (the three subtest scores

that reflect different aspects of Verbal Ability) may be combined to form

a broad Verbal Comprehension Index score. Composite scores, such as

IQ scores, Index scores, and Cluster scores, are more reliable and valid

than individual subtest scores. Therefore, when a student's performance demonstrates relatively uniform ability across subtests that measure different aspects of the same broad ability (e.g., the Vocabulary, Comprehension, and General Information subtest scores are all average), then the most reliable and valid score is the composite score (the Verbal Comprehension Index in this example). However, when a student's performance demonstrates uneven ability across subtests that measure different aspects of the same broad ability (e.g., the Vocabulary score is below average, the Comprehension score is below average, and the General Information score is high average), then the Verbal Comprehension Index may not provide an accurate estimate of verbal ability. In this situation, the student's verbal ability may be best understood by looking at what each subtest measures. In sum, it is important to remember that unless performance is relatively uniform on the subtests that make up a particular broad ability domain (such as Verbal Ability), the overall score (in this case the Verbal Comprehension Index) may be a misleading estimate.

4.4 PROFILE:

One advantage of converting raw scores to derived scores is that

a pupil’s performance on different tests can be compared directly. This is

usually done by means of a test profile, like the one presented in Figure

14.3. Such a graphic representation of test data makes it easy to identify a

pupil’s relative strengths and weaknesses. Most standardized tests have

provisions for plotting test profiles.

The profile shown in Figure 14.3 indicates a desirable trend in profile construction: instead of plotting test scores as specific points on the scales, test performance is recorded in the form of bands that extend one standard error of measurement above and below the pupil's obtained scores. Recall from our discussion of reliability that there are approximately two chances out of three that a pupil's true score will fall within one standard error of the obtained score. Thus, these confidence bands indicate the ranges of scores within which we can be reasonably certain of finding the pupil's true standing. Plotting them on the profile enables us to take into account the inaccuracy of the test scores when comparing performance on different tests. Interpreting differences between tests is simple with these score bands: if the bands for two tests overlap, we can assume that performance on the two tests does not differ significantly, and if the bands do not overlap, we can assume that there is probably a real difference in performance.


The score bands used with the differential aptitude tests can be plotted by hand or by computer. The computer-produced profile shown in Figure 14.3 is based on same-sex percentiles; these are recorded down the left side of the profile and were obtained from the percentile norms in Table 14.3. The opposite-sex percentiles are listed down the right side of the report, also to show how the scores compare with the female norms. The differences in percentiles for some tests are plotted directly on the profile. The use of such bands minimizes the tendency of test profiles to present a misleading picture; without the bands we are apt to attribute significance to differences in test performance that can be accounted for by chance alone.

When profiles are used to compare test performance, it is

essential that the norms for all tests be comparable. Many test publishers

provide for this by standardizing a battery of achievement tests and a

scholastic aptitude test on the same population.

Profile Narrative Reports

Some test publishers now make available a profile of each pupil's scores, accompanied by a narrative report that describes how well the pupil is achieving. The graphic profile provides a quick view of the pupil's strengths and weaknesses, and the narrative report aids in interpreting the scores and in identifying areas in which instructional emphasis is needed. A typical report of this type, for a widely used test battery, is shown in Figure 14.4.

Narrative reports should be especially useful in communicating test results to parents. They are, of course, also helpful to those teachers who have had little or no training in the interpretation and use of scores from published tests.


UNIT-5:

EVALUATING PRODUCT, PROCEDURES & PERFORMANCE

5.1 EVALUATION THEMES AND TERM PAPERS:

The evaluation is structured around a logical sequence of

seventeen questions which fall under six evaluation themes. The

following are major themes of evaluation.

1. Learning Outcome:-

The quality of learning outcomes is the first theme identified in the teaching and learning framework. Under this theme, one important sub-theme is identified.

Attainment of Curriculum Objectives:

Consider the knowledge, skills and understanding of our pupils.

How does the knowledge level of pupils reflect the curriculum objectives for the chosen area?

What opportunities are pupils afforded to use and display their ability to apply their knowledge and skills?

Can pupils use their skills and curriculum reasoning in problem solving?

What is the attitude of pupils to learning the curriculum?

Do our pupils enjoy learning? Are they motivated to learn?

In numeracy what do you understand by each of the following

skills?

Applying and problem solving

Communicating and expressing

Implementing

Integrating

Reasoning

Understanding and recalling


In the course of a week, how many of the following feature in your numeracy lessons? What opportunities are provided for the development of the following in context:

Oral language

Reading

Writing

Digital literacy

If we have not completed a school improvement plan to date, what do we need to focus on to support learning outcomes and the attainment of curriculum objectives in each curriculum area?

2. Learning Experience.

The quality of the learning experience is the second theme identified in teaching and learning. Three important sub-themes are identified:

Learning environment

Engaged in learning

Learning to learn

Engaged in Learning

Are students interested and enthused by the content and teaching approaches used?

Do we encourage pupil questioning? Consider teacher input versus pupil participation in your classroom.

How active are pupils while the teacher works?

Collaborative and independent learning.

Progressive skill learning and skill development.

Challenge and support.

Support learning by referring to outcomes and related success criteria to allow for further enhancement of understanding.

Pupils enjoy learning in the classroom and are eager to find out more.

All students in the classroom are afforded the opportunity to participate in lessons and engage with learning.


Learning Environment

Involve the students in developing rules which recognize the rights and responsibilities of the community.

Proper supervision of pupils, both within the class and at break times within the school setting.

All resources are well organized, labelled and clear to all learners.

Celebrate pupils' learning and achievements through a range of displays, concrete and visual materials, centres of interest and displays of pupil work.

Learning to Learn

Learning to learn is the third sub-theme of learning experiences.

Engage the pupils in monitoring their own progress; learning-to-learn techniques should be utilized properly in the classroom, and the skills of learners developed through proper planning of lessons.

Allow the learners to communicate about their work with others in the class.

How do we enable the learners to develop their personal organization and to plan out their own work? What study and revision skills do we teach?

Teach the pupils how to organize and present their work.

Make the pupils creative and give them the opportunity for collaborative work.

3. Teacher's Practice

The quality of the teacher's practice is the third theme of the teaching and learning framework. Under this theme, four sub-themes are identified:

Preparation for teaching

Teaching approach

Management of pupils


Assessment

1. Preparation for Teaching

Learning outcome:

Do we provide clear, relevant and differentiated learning outcomes to pupils?

How are pupils made aware of what they are going to learn?

Are pupils familiar with the expected success criteria in learning activities?

Written Plans

Are the long-term and short-term plans prepared in accordance with the rules for primary teachers?

Does the planning clearly indicate expected learning outcomes, teaching approaches, resources and activities?

Monthly Progress Report

Do our monthly progress records provide a clear picture of the progression and continuity of pupil learning across the curriculum?

Literacy and Numeracy

Are the literacy and numeracy opportunities identified across the

curriculum?

How have we identified these opportunities, and are they reflected in whole-school plans and individual planning?

Resources

How satisfied are we with the resources, materials and equipment we have within our classroom and available within the school? Are the necessary and relevant materials readily available?

Assessment

Reflect on the use of assessment as an aid to teaching and learning. How do we plan for assessment?


Does our planning reflect whole school assessment policy?

How do we incorporate best practice, as outlined in the assessment guidelines (2007), into our teaching and learning?

Teaching Approach

Learning Outcome

How are lessons guided by expected learning outcomes and linked to the curriculum?

What provision is made to ensure expected learning outcomes are achieved during lessons?

Focus of Learning

Is attention given, within each curriculum area, to the systematic development and application of knowledge and skills, including ICT?

Is pupils' learning timely, and does it happen at regular intervals?

Analysis and Use of Assessment Information

Inform teachers' setting of learning targets and activities for individual pupils, groups, and the whole class.

Inform the school improvement plan, and revise and update whole-school improvement targets.

What is a term paper?

Term Paper:

Definition

“A term paper has two purposes: the student should demonstrate an understanding of the material as well as the ability to communicate that understanding effectively.”

Writing term papers gives students practical experience in writing at length; communicating thoughts and ideas through the written word is a necessary skill in any profession.

A term paper is a research paper written by students over an academic term, accounting for a large part of a grade. Term papers are generally intended to describe an event or a concept, or to argue a point. A term paper is an original written work discussing a topic in detail, usually several typed pages in length, and is often due at the end of a semester. There is much overlap between term papers and research papers. The phrase “term paper” was originally used to describe a paper (usually research based) that was due at the end of the “term”, either a semester or quarter, depending on which unit of measure a school used. Common usage treats “term paper” and “research paper” as interchangeable, but this is not completely accurate: not all term papers involve academic research, and not all research papers are term papers.

Term papers date back to the beginning of the 19th century, when print could be reproduced cheaply and written texts of all types (reports, memoranda, specifications, and scholarly articles) could be easily produced and disseminated. Moulton and Holmes (2003) write that during the years from 1870 to 1900 American education was transformed, as writing became a method of discourse and research, the hallmark of learning.

Importance:

Now that you are aware of the fundamentals of composing good research papers, here are some additional points to keep in mind. Always remember to carefully proofread your term paper.

The term paper is a necessary evil for every college student. Many students wonder why they need to regurgitate lectures and research on paper, but the term paper actually serves an important purpose for a college education and future careers.

Effects:

In addition to the immediate effects on a student's course grade

and grade point average, a term paper will be valuable when searching

for or advancing in careers.


Term Paper Evaluation

A term paper is graded according to the following criteria.

The Theoretical Section

Range, depth and quality of literature research on your topic:

The author has integrated a variety of key pieces of literature on the topic, thus representing the current state of research as well as covering various viewpoints.

The author has integrated a variety of key pieces of literature but focuses too much on one particular author or viewpoint.

No independent literature research has been carried out; the author exclusively refers to pieces of literature that have been assigned as course reading.

Correctness of theoretical part

The contents of individual pieces of literature are represented correctly and given appropriate prominence.

The contents of individual pieces of literature are largely represented correctly, although the student may give too much prominence to individual pieces.

The literature review reveals that the student has not fully understood large parts of the literature. The content of the theoretical part is, as a result, incorrect to a considerable degree.

Presentation of the Literature Review / Development of the Argument:

The student has represented the views of prominent scholars on the topic and has developed a critical argument in support of or against the literature represented. His/her literature review is focused on the research question and relevant to it.

The student presents the current state of research concerning the topic at hand but focuses too much on one particular viewpoint.

The literature review shows a lack of focus. The author presents bits and pieces that are loosely related to the topic at hand.

Presentation of Results


The tables and figures are legible and easy to grasp at first glance. It is evident that the author has spent time finding the best visual means of presentation.

The formatting of tables and figures is satisfactory, yet they are not always easy to grasp at first glance.

The student has not attempted to use tables and figures to support his/her argument.

The student explains how he/she arrived at the results presented and indicates their significance to the topic or field of linguistics in which the paper is written.

The author largely points out the most striking results; at the same time, however, he/she concentrates too much on discussing aspects that are not entirely relevant to the research question at hand.

The author mainly lists examples from his/her data; no comparison of the results with those of previous studies is offered.

Language (Vocabulary, Grammar, Style)

The student uses an academic writing register/style with appropriate linguistic terminology.

The language used is largely suitable for an academic piece of writing, but the paper exhibits some mainly recurring errors.

The student uses a writing style that is inappropriate for an academic paper. There are a great number of grammatical mistakes, and paragraphs lack coherence.

Further Instructions and Rubric for Term Paper Grading

Each student must submit an independently-written report of

their term paper project. Team members are welcome to share literature

related to their project theme and may work jointly to develop

hypotheses, predictions and experimental design. Nonetheless, the

organization and text of each report must be developed independently by


each team member. Normal rules concerning plagiarism apply. If you

have any questions about this, it is best to ask us first.

The written term paper must have the following structure and

include all of the following elements:

THE PAPER SHOULD BE A MAXIMUM OF 5 PAGES (double-spaced; Times New Roman 12-point font), excluding the title page and literature cited.

First (title) page must include:

1. Descriptive title

2. Author

3. Abstract (NOTE: MAXIMUM 200 WORDS)

Subsequent pages must include:

4. Introduction

5. Hypotheses and predictions (these can be incorporated into the

introduction or presented below a separate sub-heading)

6. Study Area and Organisms

7. Methods and experimental design

8. Significance of work

9. Literature Cited (use a journal format of your choice, e.g. Journal of Ecology or Oecologia)

The following criteria will be used for grading the report:

1. Abstract: Does the abstract reflect the title of the project and the aim

and scope of the work? Does it contain essential information on rationale,

hypothesis, study system and significance? Is it written clearly? (10

points)

2. Introduction: Does the introduction start by introducing a significant

question in community ecology? Are statements supported by appropriate

citations from published literature? Is the initial question refined through the introduction to a statement of the objective of the study? (10 points)


3. Hypotheses and predictions: Are the hypotheses presented clearly

related to the objective of the study and are they logically connected to

the ideas presented in the introduction? Are the hypotheses stated correctly

(i.e., do they provide an explanation for an observation)? Are predictions

logically connected to the hypotheses? Are alternatives to the primary

hypothesis acknowledged (where appropriate)? (10 points)

4. Study area and organisms: Is necessary background information on

the ecology/natural history of study organisms and the study site

presented? Is there unnecessary/irrelevant information that could have

been omitted? Is the choice of organism/study site appropriate given the

objective of the study? (10 points)

5. Methods and Experimental Design: Is the explanation of the

methods clear (use figures if necessary)? Are the methods appropriate to

test the hypothesis proposed? Are essential methodological details

included? This might include, for example, replication of treatments, size

of treatments, duration of experiment, description of independent and

dependent variables. Has the author considered potential confounding

effects that might interfere with the ability to test the hypothesis? Are

these recognized/addressed (where possible)? (10 points)

6. Significance, originality and creativity of work: This is the

justification statement for the project – this does NOT mean just the

conservation/management (i.e., applied) importance of your proposed

work. I am looking here for a statement to indicate how this project can

move the field of community ecology forward. Does this study

potentially provide new insights into how communities are organized?

Are the results from this study system broadly applicable to other groups

of organisms or other kinds of interactions? What other studies might

build on the results from this one – or would the results of this study

allow you to infer something new about this system that you could then

go on to test? (10 points)


7. Literature Cited: Are the papers cited in the body of the proposal listed here? Are the references listed here cited in the body of the text? Is consistent formatting used? (5 points)

8. Presentation and clarity: Are the different sections of the proposal

well linked? Are the ideas presented clearly – and can they be followed

from one section of the proposal to the next? Is the writing style clear

(topic sentences introduce themes presented in each paragraph; concise

language used; spelling and grammar acceptable)? Is the use of tense and active/passive voice consistent? Note: use the past tense to describe results found in previous studies; use the future tense ("we will...") to describe work that you propose to do. (5 points)
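As a quick illustration of how the point allocations above combine, the following minimal Python sketch (not part of the original rubric; the sample scores are hypothetical) totals the section scores and converts them to a percentage:

# Minimal sketch: combining the section scores of the term paper report.
# Maximum points follow the criteria above; the sample scores are hypothetical.

MAX_POINTS = {
    "Abstract": 10,
    "Introduction": 10,
    "Hypotheses and predictions": 10,
    "Study area and organisms": 10,
    "Methods and experimental design": 10,
    "Significance, originality and creativity": 10,
    "Literature cited": 5,
    "Presentation and clarity": 5,
}

def report_grade(scores):
    """Return (total, maximum, percentage) for a dict of section scores."""
    total = sum(scores.values())
    maximum = sum(MAX_POINTS.values())  # 70 points in this rubric
    return total, maximum, round(100 * total / maximum, 1)

# Hypothetical student scores for each section:
sample = {
    "Abstract": 8,
    "Introduction": 9,
    "Hypotheses and predictions": 7,
    "Study area and organisms": 10,
    "Methods and experimental design": 8,
    "Significance, originality and creativity": 9,
    "Literature cited": 4,
    "Presentation and clarity": 5,
}

print(report_grade(sample))  # -> (60, 70, 85.7)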

5.2 EVALUATING GROUP WORK & PERFORMANCE

Evaluating group work can provide valuable information about

the degree to which:

The use of group work enhanced (or otherwise) student

achievement of learning outcomes and engagement.

The use of group work enhanced (or otherwise) the evaluator's delivery or assessment of the unit of study.

Specific Questions:

The evaluator can ask more specific questions about:

The response of individual students to group work as compared

to individual work.

Group work process versus the group work product.

The effectiveness of group work in class and/or out of class to

enhance learning,

The appropriateness of group work.

Organizational, planning, management and monitoring issues.

Strengths and weakness of group work and ideas for

improvement.

Diversity issues (did some students find it easier or harder, or benefit more than others, and why; what about issues of power?).

The ways in which the evaluator explained, facilitated, managed and monitored the groups.

The overall nature of the unit of study.

Timing of Evaluation:

Evaluation can occur at any time during the unit of study

program, but it usually occurs at the end of the semester or at the end of

the task that is being undertaken and evaluated.

Ideally students should be given time to reflect upon their

experiences prior to completing any form of evaluation, especially if the evaluator desires some specific information about their experiences of group work or has a specific reflection component within the work being evaluated.

It is also important to explain clearly why the evaluation is being undertaken.

It's a good idea to explain all of this at the start of the unit of

study and to provide opportunities for students to reflect along the way.

Evaluation can also be built into the requirements of the group work tasks by asking students to complete an evaluation of their own or of the whole group's experience of group work. This could also be a requirement of their assessment. It is up to the evaluator whether or not to allocate marks.

Method for Collecting Data for Evaluation:

There is no single method for designing or conducting an evaluation. Methods can be quantitative or qualitative, formal or informal, formative or summative, self-administered or externally administered, or any combination of these. There are advantages and disadvantages to each method, and the choice will largely depend upon the purpose of the evaluation and the content, materials, practices, tasks or activities being evaluated.


1. Questionnaire:

A questionnaire is a common method that involves having students complete a survey in class. When designing a questionnaire, the evaluator should ensure that there is an introduction which explains the purpose of the evaluation, that there are clear instructions for completion, and that the questions are unambiguous. The questions posed can be open-ended or closed, or a combination.

Open-Ended Questions:

These questions have the advantage of allowing students to

identify what were the most important elements of their experience. A

disadvantage is that students may not write much, or may write nothing at all.

Closed Questions:

These are statements that allow students to rate their agreement or disagreement with a comment or statement by using a Likert scale:

Strongly agree / Agree / Neutral / Disagree / Strongly disagree

Students are usually willing to answer these questions, especially if the

questionnaires are anonymous. A disadvantage is that they do not give

detailed responses or answer "why" or "how" questions.
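As a minimal illustration of how responses to a closed (Likert-style) statement can be summarized, the sketch below tallies a set of hypothetical responses; the response list and labels are assumptions for demonstration, not data from this text:

# Minimal sketch: summarizing closed (Likert-style) questionnaire responses.
# The response counts below are hypothetical.
from collections import Counter

SCALE = ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]

responses = [
    "Agree", "Strongly agree", "Neutral", "Agree", "Disagree",
    "Agree", "Strongly agree", "Neutral", "Agree", "Agree",
]

counts = Counter(responses)
for label in SCALE:
    n = counts.get(label, 0)
    print(f"{label:<18} {n:>3}  ({100 * n / len(responses):.0f}%)")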

2. Checklist:

Checklist is another method that can provide basic data.

An example may be a list of provided unit outcomes

(knowledge, skills, attributes, abilities etc) and students circle or tick the

ones that apply.

Alternatively, the evaluator could ask students to generate their own list of outcomes. For example: group work provided me with...

Autonomy.

Opportunity to get to know my classmates.


Opportunity to work on real-life problems.

Students are usually willing to complete these lists, but again the disadvantage is that they do not give detailed responses or answer "why" or "how" questions.

Evaluation Hand-Out:

Some academics design their own evaluation hand-outs that combine a number of evaluation methods and are anonymous, quick and easy to complete. They can take any form and may use images, diagrams, comment boxes, questions or lists as above.

Interview:

Interviews can be done individually or in small groups and provide the opportunity for the evaluator to probe for a deeper analysis of the process and experience.

The disadvantage of this method is that it can be time consuming for both the evaluator and the students, and in a larger group some students may be more vocal than others.

Focus Group:

A focus group uses a facilitative rather than a direct questioning approach and is a useful way of having students discuss the process of group work. This method allows students to work off and build upon each other's answers.

The disadvantage is that it is time consuming for both the evaluator and the students, and there is the added difficulty of arranging a time that will suit everyone.

Practicality of the Evaluation Process:

Before making a choice about an evaluation method, also consider

the following questions:

What resources are needed to undertake the evaluation?

What has to be done in order to undertake the evaluation (printing of forms, preparation of an online questionnaire, ordering questionnaires, arranging interview rooms)?


What level of participation does the evaluator require from the students, tutors, organizations or any other parties who were involved in the group work activities?

Uses of Evaluation:

It is important to consider who will use the evaluation and how

it will be used. This is a key part of the planning process which relates to

the purpose of the evaluation.

It is also important to reflect upon and consider the methods that

have been used to gather information about the effectiveness of group

work.

5.3 EVALUATING DEMONSTRATION:

1. The evaluation portion of the demonstration-performance

method is where students get an opportunity to prove that they

can do the manoeuvre without assistance.

2. For the simulated forced approach you should tell students that

you will be simulating an engine failure and that they are to

carry out the entire procedure including all checks and look-out.

3. While the student is performing this manoeuvre you must refrain

from making any comments. Offer no assistance whatsoever, not

even grunts or head nods. You must, however, observe the entire

manoeuvre very carefully, so that you can analyze any errors

that the student may make and debrief accordingly.

NOTE: You would interrupt the student's performance, of course, if

safety became a factor.

4. Success or failure during the evaluation stage of the lesson will

determine whether you carry on with the next exercise or repeat

the lesson.

Demonstration


1. The explanation and demonstration may be done at the same

time, or the demonstration given first followed by an

explanation, or vice versa. The skill you are required to teach

might determine the best approach.

2. Consider the following: You are teaching a student how to do a

forced landing. Here are your options:

a. Demonstrate a forced approach and simultaneously give an

explanation of what you are doing and why you are doing it; or,

b. Complete the demonstration with no explanation and then give a

detailed explanation of what you have done; or

c. Give an explanation of what you intend to do and then do it.

You will find that different instructors will approach the

teaching of this skill differently. The following represents a suggested

approach that appears to work best for most instructors.

On the flight prior to the exercise on forced landings, give a

perfect demonstration of a forced landing. It may be better not to talk

during this demonstration, since you want it to be as perfect as possible to

set the standard for the future performance. There is another advantage of

giving a perfect demonstration prior to the forced landing exercise. Your

students will be able to form a clearer mental picture when studying the

flight manual because they have seen the actual manoeuvre.

a. The next step would be for you to give a full detailed

explanation of a forced landing. During this explanation you

would use all the instructional techniques described previously.

You must give reasons for what is expected, draw comparisons

with things already known and give examples to clarify points.

This explanation should be given on the ground using visual aids

to assist student learning.

b. When in the air, give a demonstration, but also include

important parts of the explanation. Usually asking students

questions about what you are doing or should do, will give them


an opportunity to prove they know the procedure, although they

have not yet flown it.

c. After completing the forced landing approach, while climbing

for altitude, clear up any misunderstandings the students may

have and ask questions.

d. The demonstration and explanation portion of the

demonstration-performance method is now complete and you

should proceed to the next part, which is the student

performance and instructor supervision.

Evaluation Matrix for the Demonstration

When assessing the demonstration of teaching skills, attention is

given to the applicant's use of didactic solutions. The following matrix

transparently describes the criteria used to evaluate the demonstration.

The matrix is indicative instead of normative, and is used for support

when evaluating the demonstration of teaching skills. In other words, not

all of the aspects listed in the matrix need to be assessed systematically.

The evaluators use the criteria listed below to form an overall appraisal of

the demonstration's standard by assessing the quality of the components

that are of a good or better level. If the demonstration includes a

preliminary assignment, all the individual components are assessed in

relation to it. If well grounded, the demonstration may also be virtual,

held in real time and interactively.


The matrix covers the following components of the demonstration of teaching skills, each assessed on the scale Passable / Satisfactory / Good / Very good. For each component, the descriptors are listed from lower to higher performance levels.

Objectives

The applicant specifies the objectives.

The applicant specifies the objectives clearly.

The applicant specifies the objectives, taking into account the context, content and target group.

Content

Aspects assessed: correspondence between the topic and content of the demonstration; academic nature of the content; consistency and clarity of presentation of the content; critical approach; many-sided argumentation; connection between theory and practice; aptness, diversity and topicality of the research data used; use of own research results; consideration given to the target group in the choice of content.

The topic and content of the demonstration correspond to each other. The content is academic. The applicant presents the content clearly and consistently. Where appropriate, the applicant uses his/her own research results during the demonstration. The applicant takes the target group into consideration in the choice of content.

The topic and content of the demonstration correspond to each other. The content is academic and topical. The applicant presents the content clearly and consistently. The applicant examines the content critically. The applicant discusses the topic from many angles. The applicant explains the connection between theory and practice. The research data discussed are relevant, many-sided and topical. Where appropriate, the applicant uses his/her own research results during the demonstration.

Methods

Aspects assessed: organization of teaching; motivation of the target group; suitable use of teaching methods; suitable use of teaching aids and materials.

The teaching situation is organized appropriately.

The teaching situation is organized appropriately, taking into consideration its objectives, contents, target group and context. The applicant inspires the target group to engage, stimulates the listeners' interest and motivates them to participate. The applicant uses different teaching methods appropriately in terms of the situation and objectives.

Wrap-up

Aspects assessed: evaluation of the teaching situation in terms of the objectives set; consideration given to the target group in solutions related to evaluation.

The applicant evaluates the teaching situation in terms of the objectives set.

The applicant evaluates the teaching situation in terms of the objectives set. The solutions related to evaluation are relevant and take the target group into consideration.

Interaction skills

Aspects assessed: use of voice; clarity and intelligibility of speech; coherence of oral and written communication; quality of interaction; other matters improving communication.

The applicant's delivery is clear and understandable. Oral and written communication is coherent.

The applicant's delivery is clear and understandable. Oral, written and visual communication is coherent. The applicant interacts with the listeners in a natural and appropriate manner in the teaching situation.

Alignment of the preliminary assignment and the demonstration of teaching skills

The preliminary assignment and the demonstration of teaching skills are well aligned.

The preliminary assignment lays the foundation for and supports the demonstration, and the two form a consistent whole.


5.4 EVALUATION OF PHYSICAL MOVEMENTS AND MOTOR SKILLS:

Motor Skills

A motor skill is a function, which involves the precise

movement of muscles with the intent to perform a specific act.

Motor skills are skills that are associated with the activity of the body's muscles, like the skills performed in sport. Fine motor skills are the type associated with small movements of the wrists, hands, feet, fingers and toes.

Motor skills are the ability to make particular bodily movements

to achieve certain tasks. They are a way of controlling muscles to make

fluid and accurate movements. These skills must be learned, practiced

and mastered, and over time can be performed without thought, for

example, walking or swimming. Children are clumsy in comparison to

adults, because they have yet to learn many motor skills that allow them

to effectively accomplish tasks.

Motor skills are also learned and refined in adulthood. If a

woman takes up belly dancing, her first movements will not closely resemble those of the teacher. Over time, however, she will learn how to

control her muscles to make the signature movements that a belly dancer

makes.

Genetic factors also affect the development of motor skills, for

example, the children of a professional dancer are far more likely to be

good at dancing, with good coordination and muscular control, than the

children of a biochemist. Gross motor skills are usually learned during

childhood and require a large group of muscles to perform actions, such

as balancing or crawling. Fine motor skills involve smaller groups of

muscles and are used for fine tasks, such as threading a needle or playing

a computer game. These skills can be forgotten if disused over time.


Types of Motor Skills

There are two major categories of motor skills

1. Gross Motor Skills

2. Fine Motor Skills

Gross Motor Skills

Gross motor skills use large muscle groups, coordinating functions such as sitting, standing, walking, running, keeping balance and changing positions. These skills are typically acquired during infancy and early childhood in order to control the large muscles of the body; they include sitting, crawling and walking. According to Anna Maria Wilms Floet, MD, throwing a ball, riding a bike, playing sports, lifting and sitting upright are brief descriptions of large motor movements. Gross motor skills depend upon muscle tone, the contraction of muscles and their strength for positioning and movement.

Fine Motor Skills

Fine motor skills coordinate precise, small movements involving

the hands, wrists, feet, toes, lips and tongue. Features of fine motor

control include handwriting, drawing, grasping objects, cutting and

controlling a computer mouse. Experts agree that one of the most

significant fine motor achievements is picking up a small object with the index finger and thumb, referred to as the pincer grip, which usually occurs between 8 and 12 months of age.

Fundamental Motor Skills

Fundamental motor skills are common motor activities with

specific observable patterns. Most skills used in sports and movement

activities are advanced versions of fundamental motor skills. For

example, throwing in softball and cricket, the baseball pitch, javelin

throw, tennis serve and netball shoulder pass are all advanced forms of

the overhand throw. The presence of all or part of the overhand throw can

be detected in the patterns used in these sport specific motor skills.


Similar relationships can be detected among other fundamental motor

skills and specific sport skills and movements.

Assessment of Motor Skills

A motor skills assessment is an evaluation of a patient to

determine the extent and nature of motor skill dysfunction. Care

providers like physical therapists and neurologists can perform the

assessment, which may be ordered for a number of reasons. It is not

invasive, but does require the completion of a number of tasks. The

length of time required can vary, depending on the test or tests used. It

may be necessary to set aside a full day for testing.

One reason for a motor skills assessment may be to establish a

child's baseline level of motor competency. This can provide a reference

point for the future. Physical education teachers, for example, may

perform brief assessments with new students to determine which kinds of

activities would be safe and appropriate for them. Pediatricians also use

such testing to assess their patients. If a child appears to have

developmental delays, this may result in a referral for a more extensive

examination.

Different Ways to Assess Motor Skills

Motor skills can be evaluated in different ways; some of them are as follows.

1. Test gross motor skills using range of motion. Assess gross motor

skills by asking the individual to perform a series of movements known

as range of motion. Evaluate range of motion by asking the individual to

hold an arm out and move it in a circular direction. The arm should be

able to move in a complete circle when fully extended. Then ask the

individual to stand and place one leg out. Have the individual move the

leg up and down, back and forth and left to right. Note any difficulty in

movement, abnormalities or pain experienced by the individual.

2. Assess gross motor skills using games. Gross motor skills can be

evaluated using games and sports. Ask the individual to kick a ball to test

gross motor skills of the leg. Jumping rope is a great way to evaluate


motor skills, because it uses both the arms and legs working together to

accomplish the task. Hopscotch, basketball and walking on a balance

beam are also good ways to evaluate gross motor skills. Look for the

fluidity of movement, problems with balance and hand-eye coordination.

3. Evaluate fine motor skills of arms and legs. Ask the individual to put

a clothespin on the edge of a box. Stringing beads on a shoelace is

another way to assess fine motor skills. Using a stapler and placing a

paperclip on a sheet of paper are also ways to assess fine motor skills.

Place an item on the floor and ask the individual to pick it up using his

toes only. Watch the individual perform each task, looking at how

smooth the movements are and how easily the task is completed, and note

any difficulties.

4. Test fine motor skills using common household items. Give the

individual a jar and ask her to unscrew the lid and screw the lid back on.

Ask the individual to place items, such as coins or blocks, into containers

such as a bowl, bucket or cup. Draw a straight line on a piece of paper

and have the individual use a pair of scissors to cut the line on the paper.

Using pencils or pens of different sizes, ask the person to pick up and

grasp each pencil/pen. Then ask the individual to trace items drawn on

the paper. Watch for the completion of each task, looking for any

problems during each movement.

5. Assess fine motor skills while getting dressed. Ask the individual to

put on and button up a shirt. Next, have the individual put on a pair of

pants that have a snap closure and a zipper. Give the individual a pair of

shoes, which have shoelaces and not Velcro closures, and ask him to tie

the shoes. Watch the individual perform each task, looking for

difficulties, abnormal movements and the ability to perform each task

completely without help.

Some Motor Skills and Their Evaluation for Preschoolers

Dancing, either freestyle or through songs with movements, such as "I'm a Little Teapot". Dance and movement classes, like pre-ballet, can be fun but aren't necessary for motor-skills development.


Walking, around the house, neighborhood, or park. For variety, add in

marching, jogging, skipping, hopping, or even musical instruments to

form a parade. As they walk, tell stories, count, or play games. Observe how the child walks on a piece of string or tape, a low beam or plank

at the playground, or a homemade balance beam.

Playing pretend: Kids boost motor skills when they use their bodies to

become waddling ducks, stiff-legged robots, galloping horses, soaring

planes—whatever their imagination comes up with!

Riding tricycles, scooters, and other ride-on toys; pulling or pushing

wagons, large trucks, doll strollers, or shopping carts.

Playing tag or other classic backyard games, such as Follow the Leader,

Red Light/Green Light, Tails, or Simon Says (avoid or modify games that

force kids to sit still or to be eliminated from play, such as Duck Duck

Goose or musical chairs).

Throwing, catching, and rolling large, lightweight, soft balls.

Swinging, sliding, and climbing at a playground or indoor play space.

Ball Control Skills

The following ball skills are generic in that they are not specific to a

particular sport, and they are grouped by whether they require one or two

balls. Skills are listed in their approximate order of difficulty. Younger

movers may use a plastic ball, volleyball, or child's basketball, and older

movers may use a youth-sized or adult-sized basketball.

Assessing Motor Skills in Early Childhood - Using the PDMS

(Peabody Developmental Motor Scale)

Does your toddler have special needs? Early diagnosis of

problems in developmental motor skills is crucial for helping children

with special needs. One of the most popular assessment tools is the

Peabody Developmental Motor Scale. Is it reliable and sufficiently

responsive?


After more than ten years of extensive research, a second edition

known as the PDMS-2 finally replaced the first edition of the Peabody

Developmental Motor Scale. The authors, M. Rhonda Folio and Rebecca

R. Fewell, claim that the new and updated version provides better and

more in-depth assessment of the gross and fine motor skills of preschool-

age children. The PDMS-2, of course, is just one of the most commonly-

used assessments for measuring the motor skills of toddlers. However,

for children with special needs, the Peabody Development Motor Scale is

one of the most reliable testing instruments used by many professionals,

such as therapists, psychologists, and diagnosticians.

Purpose of the Test

The main purpose of the Peabody Developmental Motor Scale is

to test the motor skills of children. Gross motor skills involve using large

muscles such as in bending, balancing, crawling, walking, and jumping.

Fine motor skills, on the other hand, involve using smaller muscles,

particularly the muscles in the hand. A child, at a specific age, is expected

to display proficiency at certain motor skills.

With the PDMS-2, most dysfunctions of motor skills will be

identified. And using the results of the PDMS-2, the special education

teacher, parents, and other professionals of the IEP team can develop a

more responsive learning and remediation program for the child with

special needs. Would you want your child to take this assessment test?

The next part describes how the test will be administered.

Administration of the Test

This assessment test is composed of six sub-tests that include

special instructions on how each is administered to the preschool-age

child. To keep the results of the test reliable and precise, the actual

instructions on how the test will be carried out are only given to the test

administrators and psychologists. This will prevent the parents from

"preparing" their child to pass the test. But the sub-tests are given below:

Reflexes – A reflexive action is a quick and automatic reaction to a

particular environmental stimulus. This reaction is measured in this sub-


test that is composed of eight items. This sub-test, however, is

administered only to children who are 11 months and younger because

reflexes have been observed to be extensively integrated within 12

months.

Stationary – This sub-test aims to measure the child's ability to maintain

balance or equilibrium. It involves mainly the ability of the preschool-age

child to control his or her body. It is composed of 30 items.

Locomotion – This sub-test evaluates the child's ability to move. The

movement involves crawling, walking, running, and other similar actions.

The sub-test has 89 items.

Object Manipulation – In this sub-test, the object that is manipulated is

the ball. Since it is developmentally impossible for babies to even hold a

ball, this sub-test is administered only to children who are older than 11

months. This 24-item sub-test involves activities such as throwing,

catching, and kicking balls.

Grasping – This sub-test primarily measures the preschool-age child's

ability to use the muscles of the hand. Made up of 26 items, the sub-test progressively determines the child's ability to grasp objects and to control the fingers.

Visual-Motor Integration – This sub-test evaluates the child's eye and

hand co-ordination. Aside from controlling muscles, the test determines

the level of the child's visual perception. Some examples of the activities

of this 72-item sub-test include building blocks and copying designs.
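Because the Reflexes sub-test applies only to children aged 11 months or younger, while Object Manipulation applies only to older children, the sub-tests administered depend on the child's age. The following Python sketch encodes that administration rule from the descriptions above; the data structure and function names are illustrative only and are not part of the PDMS-2 materials.

# Illustrative sketch: which PDMS-2 sub-tests apply at a given age, based on
# the descriptions above. Item counts are from the text; the structure and
# function names are illustrative assumptions.

SUBTESTS = {
    "Reflexes": {"items": 8, "max_age_months": 11},
    "Stationary": {"items": 30},
    "Locomotion": {"items": 89},
    "Object Manipulation": {"items": 24, "min_age_months": 12},
    "Grasping": {"items": 26},
    "Visual-Motor Integration": {"items": 72},
}

def applicable_subtests(age_months):
    """Return the names of the sub-tests administered at the given age."""
    selected = []
    for name, info in SUBTESTS.items():
        if age_months > info.get("max_age_months", 10**6):
            continue  # child is too old for this sub-test (e.g. Reflexes)
        if age_months < info.get("min_age_months", 0):
            continue  # child is too young (e.g. Object Manipulation)
        selected.append(name)
    return selected

print(applicable_subtests(9))   # includes Reflexes, excludes Object Manipulation
print(applicable_subtests(30))  # excludes Reflexes, includes Object Manipulation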

5.5 EVALUATING ORAL PERFORMANCE:

Communication skills are taught in a wide range of general

education courses and students are in need of speaking and listening

skills that will help them succeed in future courses and in the workplace.

Thus, the assessment of communication skills is an important issue in

general education. Oral assessment is often carried out to look for

students' ability to produce words and phrases by evaluating students'


fulfillment of a variety of tasks such as asking and answering questions

about themselves, doing role-plays, making up mini-dialogues, defining

or talking about some pictures given to them. The operations in an oral

ability test are either informational skills or interactional skills. The

testing of speaking is widely regarded as the most challenging of all

language tests to prepare, administer and score.

Kind of Oral Communication

Oral communication can also be delivered individually or as part

of a team. Therefore, knowing the kind of oral communication act that is

expected is a necessary step in being able to give useful feedback and

ultimately an accurate evaluation.

Pronunciation

Pronunciation is a basic quality of language learning. Though

most second language learners will never have the pronunciation of a

native speaker, poor pronunciation can obscure communication and

prevent a student from making his meaning known. When evaluating the

pronunciation of students, listen for clearly articulated words, appropriate

pronunciations of unusual spellings, and assimilation and contractions in

suitable places. Also listen for intonation. Are students using the correct

inflection for the types of sentences they are saying? Do they know that

the inflection of a question is different from that of a statement? Listen

for these pronunciation skills and determine into which level the student

falls.

Vocabulary

Vocabulary comprehension and vocabulary production are

always two separate banks of words in the mind of a speaker, native as

well as second language. Teacher should encourage students to have a

large production vocabulary and an even larger recognition vocabulary.

For this reason it is helpful to evaluate students on the level of vocabulary

they are able to produce. Are they using the specific vocabulary

taught to them in class? Are they using vocabulary appropriate to the

contexts in which they are speaking? Listen for the level of vocabulary


students are able to produce without prompting and then decide how well

they are performing in this area.

Accuracy

Grammar has always been and forever will be an important issue

in foreign language study. Writing sentences correctly on a test, though,

is not the same as accurate spoken grammar. As students speak, listen for

the grammatical structures and tools teachers have taught them. Are they

able to use multiple tenses? Do they have agreement? Is word order

correct in the sentence? All these and more are important grammatical

issues, and an effective speaker will successfully include them in his or

her language.

Communication

A student may struggle with grammar and pronunciation, but

how creative is she when communicating with the language she knows?

Assessing communication in the students means looking at their creative

use of the language they do know to make their points understood. A

student with a low level of vocabulary and grammar may have excellent

communication skills if she is able to make others understand her, whereas an advanced student who is tied to manufactured

dialogues may not be able to be expressive with language and would

therefore have low communication skills. Don't let a lack of language

skill keep the students from expressing themselves. The more creative

they can be with language and the more unique ways they can express

themselves, the better their overall communication skills will be.

Interaction

Ask the students questions. Observe how they speak to one

another. Are they able to understand and answer questions? Can they

answer when the teacher asks them questions? Do they give appropriate

responses in a conversation? All these are elements of interaction and are

necessary for clear and effective communication in English. A student

with effective interaction skills will be able to answer questions and

follow along with a conversation happening around him. Great oratory


skills will not get anyone very far if he or she cannot listen to other

people and respond appropriately. Encourage your students to listen as

they speak and have appropriate responses to others in the conversation.

Fluency

Fluency may be the easiest quality to judge in your students'

speaking. How comfortable are they when they speak? How easily do the

words come out? Are there great pauses and gaps in the student's

speaking? If there are then your student is struggling with fluency.

Fluency does not improve at the same rate as other language skills. You

can have excellent grammar and still fail to be fluent. You want your

students to be at ease when they speak to you or other English speakers.

Fluency is a judgment of this ease of communication and is an important

criterion when evaluating speaking.

Suggestions for Improvement

Offer suggestions (rather than criticisms) for improved delivery

style. Many students are aware of their difficulties in delivering oral

communication and want feedback and support, and they do want

suggestions. Not so useful: "Don't wave your hands when you

talk."Better: "Let's figure out what you're going to do with your hands so

that you don't distract the audience from what you are saying. What feels

more natural to you?"

Present oral communication skills as a set of professional skills

that all professionals learn and practice steadily throughout their lives.


UNIT-6:

PORTFOLIOS

6.1 PURPOSE OF PORTFOLIOS:

Literal Definition:

A) a large, flat, thin case for carrying loose papers or drawings or

maps; usually leather

B) a set of pieces of creative work collected to be shown to potential customers or employers: "the artist had put together a portfolio of his work"; "every actor has a portfolio of photographs"

C) A collection of various company shares, fixed interest securities

or money-market instruments.

Terminological Definition:

A portfolio is a purposeful collection of student work that

exhibits the student's efforts, progress, and achievements in one or more

areas of the curriculum. The collection must include the following:

Student participation in selecting contents.

Criteria for selection.

Criteria for judging merits.

Evidence of a student's self-reflection.

It should represent a collection of students' best work or best

efforts, student-selected samples of work experiences related to outcomes

being assessed, and documents recording growth and development

toward mastering identified outcomes.

Purpose of Portfolios:

In this new era of performance assessment related to the

monitoring of students' mastery of a core curriculum, portfolios can

enhance the assessment process by revealing a range of skills and

understandings on students' part; support instructional goals; reflect


change and growth over a period of time; encourage student, teacher, and

parent reflection; and provide for continuity in education from one year

to the next. Instructors can use them for a variety of specific purposes,

including:

Encouraging self-directed learning.

Enlarging the view of what is learned.

Fostering learning about learning.

To promote student control of learning

To track student progress

To demonstrate individual growth

To respond to individual needs

To evaluate and report on student progress

To facilitate student-led conferences

To show process and product

To show final products

To show student achievement with respect to specific curricular

goals

To document achievement for alternative credit

To accumulate "best work" for admission to other educational

institutions or program

Demonstrating progress toward identified outcomes.

Creating an intersection for instruction and assessment.

Providing a way for students to value themselves as learners.

Offering opportunities for peer-supported growth.

Benefits of Portfolio:

One of the most important benefits of using portfolios is the

enhancement of critical thinking skills, which results from the need for students to develop evaluation criteria. In addition:

Students are pleased to observe their personal growth.

They have better attitudes toward their work.

They are more likely to think of themselves as writers.

Factors that go into the development of a student portfolio assessment:


1. First, you must decide the purpose of your portfolio. For example,

the portfolios might be used to show student growth, to identify

weak spots in student work, and/or to evaluate your own

teaching methods.

2. After deciding the purpose of the portfolio, you will need to

determine how you are going to grade it. In other words, what would a student need in their portfolio for it to be considered a success and for them to earn a passing grade?

3. The answer to the previous two questions helps form the answer to

the third: What should be included in the portfolio? Are you

going to have students put in all of their work or only certain

assignments? Who gets to choose?

How to Build a Student Portfolio

The following suggestions will help you effectively design a

student portfolio.

1. Set a Purpose for the Portfolio. First, we need to decide what the purpose of the portfolio is. Is it going to be used to show student growth or to identify specific skills? Are we looking for a concrete way to quickly show parents student achievement, or are we looking for a way to evaluate our own teaching methods? Once we have figured out the goal of the portfolio, we can then think about how to use it.

2. Decide How You Will Grade It. Next, we will need to establish how we are going to grade the portfolio. There are several ways to grade students' work: we can use a rubric, a letter grade, or, most efficiently, a rating scale. Is the work completed correctly and completely? Can we comprehend it? We can use a grading scale of 4-1: 4 = Meets All Expectations, 3 = Meets Most Expectations, 2 = Meets Some Expectations, 1 = Meets No Expectations. Determine what skills you will be evaluating, then use the rating scale to establish a grade (a brief sketch of how such a scale can be combined into an overall rating appears after this list).


3. What Will Be Included in It. How will we determine what will go into the portfolio? Assessment portfolios usually include specific pieces that students are required to know, for example, work that correlates with the Common Core Learning Standards. Working portfolios include whatever the student is currently working on, and display portfolios showcase only the best work students produce. Keep in mind that we can create a portfolio for one unit and not the next. We get to choose what is included and how it is included. If we want to use it as a long-term project and include various pieces throughout the year, we can. But we can also use it for short-term projects as well.

4. How Much Will You Involve the Students. How much we involve the students in the portfolio depends upon the students' age. It is important that all students understand the purpose of the portfolio and what is expected of them. Older students should be given a checklist of what is expected and how it will be graded. Younger students may not understand the grading scale, so we can give them the option of choosing what will be included in their portfolio. Ask them questions such as: why did you choose this particular piece, and does it represent your best work? Involving students in the portfolio process will encourage them to reflect on their work.

5. Will You Use a Digital Portfolio. With the fast-paced world of technology, paper portfolios may become a thing of the past. Electronic portfolios (e-portfolios/digital portfolios) are great because they are easily accessible, easy to transport and easy to use. Today's students are tuned into the latest must-have technology, and electronic portfolios are part of that. With students using an abundance of multimedia outlets, digital portfolios seem like a great fit. The uses of these portfolios are the same; students still reflect upon their work, but in a digital way.
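As mentioned under step 2 above, a 4-1 rating scale can be combined into an overall rating. The minimal Python sketch below shows one possible way to do this; the skill names, ratings and the averaging rule are hypothetical assumptions, not a prescribed procedure:

# Minimal sketch: averaging 4-1 ratings for several evaluated skills into an
# overall portfolio rating. Skill names and ratings are hypothetical.

LABELS = {
    4: "Meets all expectations",
    3: "Meets most expectations",
    2: "Meets some expectations",
    1: "Meets no expectations",
}

ratings = {
    "Writing mechanics": 3,
    "Organization of ideas": 4,
    "Use of evidence": 2,
    "Self-reflection": 3,
}

average = sum(ratings.values()) / len(ratings)
overall = round(average)
print(f"Average rating: {average:.2f} -> {overall} ({LABELS[overall]})")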


The key to designing a student portfolio is to take the time to think about what kind it will be and how we will manage it. Once we do that and follow the steps above, we will find it will be a success.

Types of Portfolios

1) Best Work Portfolio

This type of portfolio highlights and shows evidence of the best

work of learners. Frequently, this type of portfolio is called a display or

showcase portfolio. For students, best work is often associated with pride and a sense of accomplishment and can result in a desire to share their work with others. Best work can include both product and process. It is often correlated with the amount of effort that learners have invested in their work. A major advantage of this type of portfolio is that learners can select items that reflect their highest level of learning and can explain why these items represent their best effort and achievement. Best work

portfolios are used for the following purposes:

Student Achievement. Students may select a given number of entries

(e.g., 10) that reflect their best effort or achievement (or both) in a course

of study. The portfolio can be presented in a student-led parent

conference or at a community open house. As students publicly share

their excellent work, work they have chosen and reflected upon, the

experience may enhance their self-esteem.

Post-Secondary Admissions. The preparation of a post-secondary portfolio targets work samples from high school that can be submitted for consideration in the process of admission to college or university. This

portfolio should show evidence of a range of knowledge, skills, and

attitudes, and may highlight particular qualities relevant to specific

programs. Many colleges and universities are adding portfolios to the

initial admissions process while others are using them to determine

particular placements once students are admitted.

Employability. The audience for this portfolio is an employer. This collection of work needs to be focused on specific knowledge, skills, and attitudes necessary for a particular job or career. The school-to-work movements in North America are influencing an increase in the use of employability portfolios. The Conference Board of Canada (1992), for

example, outlines the academic, personal management, and teamwork

skills that are the foundation of a high-quality Canadian workforce. An

employability portfolio is an excellent vehicle for showcasing these

skills.

2) Growth Portfolio

A growth portfolio demonstrates an individual's development

and growth over time. Development can be focused on academic or

thinking skills, content knowledge, self-knowledge, or any area that is

important in your setting. A focus on growth connects directly to

identified educational goals and purposes. When growth is emphasized, a

portfolio will contain evidence of struggle, failure, success, and change.

The growth will likely be an uneven journey of highs and lows, peaks and valleys, rather than a smooth continuum. What is significant is that learners recognize growth whenever it occurs and can discern the reasons behind that growth. The goal of a growth portfolio is for learners to see their own changes over time and, in turn, share their journey with others.

A growth portfolio can be culled to extract a best work sample.

It also helps learners see how achievement is often a result of their

capacity to self-evaluate, set goals, and work over time. Growth

portfolios can be used for the following purposes:

Knowledge. This portfolio shows students' growth in knowledge in a

particular content area or across several content areas over time. This

kind of portfolio can contain samples of both satisfactory and

unsatisfactory work, along with reflections to guide further learning.

Skills and Attitudes. This portfolio shows students' growth in skills and

attitudes in areas such as academic disciplines, social skills, thinking skills, and work habits. In this type of portfolio, challenges, difficult experiences, and other growth events can be included to demonstrate students' developing skills. In a thinking skills portfolio, for example,


students might include evidence showing growth in their ability to recall,

comprehend, apply, analyze, synthesize, and evaluate information.

Teamwork. This portfolio demonstrates growth in social skills in a

variety of cooperative experiences. Peer responses and evaluations are

vital elements in this portfolio model, along with self-evaluations.

Evidence of changing attitudes resulting from team experiences can also

be included, especially as expressed in self-reflections and peer

evaluations.

Career. This portfolio helps students identify personal strengths related

to potential career choices. The collection can be developed over several years, perhaps beginning in middle school and continuing throughout high school. The process of selecting pieces over time empowers young people to make appropriate educational choices leading toward meaningful careers. Career portfolios may include items from outside the school setting that substantiate students' choices and create a holistic

view of the students as learners and people. This type of portfolio may be

modified for employment purposes.

3) Showcase Portfolios

Showcase portfolios highlight the best products over a particular

time period or course. For example, a showcase portfolio in a

composition class may include the best examples of different writing

genres, such as an essay, a poem, a short story, a biographical piece, or a

literary analysis. In a business class, the showcase portfolio may include

a resume, sample business letters, a marketing project, and a

collaborative assignment that demonstrates the individual's ability to

work in a team. Students are often allowed to choose what they believe to be their best work, highlighting their achievements and skills. Showcase reflections typically focus on the strengths of selected pieces and discuss how each met or exceeded required standards.

4) Process Portfolios


Process portfolios, by contrast, concentrate more on the journey of learning rather than the final destination or end products of the

learning process. In the composition class, for example, different stages

of the process—an outline, first draft, peer and teacher responses, early

revisions, and a final edited draft—may be required. A process reflection

may discuss why a particular strategy was used, what was useful or

ineffective for the individual in the writing process, and how the student

went about making progress in the face of difficulty in meeting

requirements. A process reflection typically focuses on many aspects of

the learning process, including the following: what approaches

work best, which are ineffective, information about oneself as a learner,

and strategies or approaches to remember in future assignments.

5) Evaluation Portfolios.

Evaluation portfolios may vary substantially in their content.

Their basic purpose, however, remains to exhibit a series of evaluations

over a course and the learning or accomplishments of the student in

regard to previously determined criteria or goals. Essentially, this type of

portfolio documents tests, observations, records, or other assessment

artifacts required for successful completion of the course. A math

evaluation portfolio may include tests, quizzes, and written explanations

of how one went about solving a problem or determining which formula to use, whereas a science evaluation portfolio might also include laboratory experiments, science project outcomes with photos or other

artifacts, and research reports, as well as tests and quizzes. Unlike the

showcase portfolio, evaluation portfolios do not simply include the best

work, but rather a selection of predetermined evaluations that may also

demonstrate students' difficulties and unsuccessful struggles as well as

their better work. Students who reflect on why some work was

successful and other work was less so continue their learning as they

develop their metacognitive skills.

6) Online or e-portfolios


Online or e-portfolios may be one of the above portfolio types or

a combination of different types, a general requirement being that all

information and artifacts are somehow accessible online. A number of

colleges require students to maintain a virtual portfolio that may include

digital, video, or Web-based products. The portfolio assessment process may be linked to a specific course or an entire program. As with all portfolios, students are able to visually track and show their accomplishments to a wide audience.

Conclusion: The portfolio process will continue to be refined and efforts made to improve students' perceptions of the process, as it is intended to develop the self-assessment skills they will need to improve their knowledge and professional skills throughout their educational careers.

6.3 GUIDELINES AND STUDENT'S ROLE IN SELECTION OF PORTFOLIO ENTRIES AND SELF-EVALUATION:

Portfolio:

An organized presentation of an individual's education, work samples, and skills.

Terminologically, a portfolio is a purposeful collection of student work that exhibits the student's efforts, progress, and achievements in one or more areas of the curriculum.

Guidelines:

Identify the purpose.

Select objectives.

Think about the kinds of entries that will best match instructional outcomes.

Decide who selects the entries.

Decide how much to include, how to organize the portfolio, where to keep it and when to access it.

Set the criteria for judging the work (rating scales, rubrics, checklists) and make sure students understand the criteria.

Review the student's progress.

Hold portfolio conferences with students to discuss their progress.

These guidelines are discussed below in detail.

Identify Purpose:

Without a purpose, a portfolio is only a collection of student work samples. Different purposes result in different portfolios. For example, if the student is to be evaluated on the basis of the work in the portfolio for admission to college, then the final versions of his or her best work would probably be included in the portfolio.

Select Objectives:

The objectives to be met by students should be clearly stated. A list of communicative functions can be included for students to check when they feel comfortable with them, and stapled to the inside cover. Students would list the title or the number of the samples which address each function.

Portfolios also can be organized according to the selected objectives, addressing one skill such as writing. The selected objectives will be directly related to the stated purpose of the portfolio. At any rate, teachers must ensure that classroom instruction supports the identified goals.

Decide how much to include & how to Organize:

Teachers may want to spend some time going over the purpose of the portfolio at regular intervals with students to ensure that the selected pieces do address the purpose and the objectives. At regular times, ask students to go through their entries, to choose what should remain in the portfolio, and what could be replaced by another work which might be more illustrative of the objectives. Other material no longer current and/or not useful to document student progress toward attainment of the objectives should be discarded.

What is the student’s role?

The student's participation in the portfolio will be largely responsible for its success. For this reason, students must be actively involved in the choice of entries and in the rationale for selecting those entries.

i. Selecting:

The student's first role is in selecting some of the items to be part of the portfolio. Some teachers give students a checklist for making choices. Others leave students almost complete freedom in selecting their entries. At any rate, students should include their best and favorite pieces of work along with those showing growth and process.

ii. Reflecting and self-assessing:

An essential component of self-assessment involves students in reflecting on their own work. At the beginning, students might not know what to say, so teachers will need to model the kinds of reflection expected from students.

Set the Criteria for Judging the Work:

There are two kinds of criteria needed at this point:

Criteria for individual entries (refer to the section on rubrics for details).

Criteria for the portfolio as a whole.

Assessing the individual entries in a portfolio is different from assessing the portfolio as a whole. If the purpose of the portfolio is to show student progress, then it is highly probable that some of the beginning entries may not reflect high quality; however, over several months, the student may have demonstrated growth toward the stated objectives.


Criteria can be established by teachers alone and/or by teachers and students together. At any rate, criteria for evaluating the portfolios must be announced ahead of time.

Possible criteria include teacher evaluation and/or observation, student self-evaluation, peer assessment, and a combination of several teachers' comments.

Following is a list of suggested criteria for a portfolio as a

whole.

Variety: Selected pieces display the range of tasks students can

accomplish and skills they have learned.

Growth: Student work represents the student’s growth in content

knowledge and language proficiency.

Completeness: All required entries are included in the portfolio.

Organization: Students organized the contents systematically.

Fluency: Selected pieces are meaningful to the students and

communicate information to the teacher.

Accuracy: Student work demonstrates skills in the mechanics of the

language.

Goal Oriented: The contents reflect progress and accomplishment of

curricular objectives.

Following Directions: Students followed the teacher’s directions for

pieces of the portfolio.

Neatness: Student work is neatly written, typed or illustrated.

Justification or Significance: Students include reasonable justifications for the work selected or explain why selected items are significant.

Reference

Katozai, Murad Ali. Measurement & Evaluation. Peshawar: University Publisher, 2013.


6.4 USING PORTFOLIOS IN INSTRUCTION AND COMMUNICATION:

Portfolio:

Literally, the word “Portfolio” is used in the following meanings:

1. A large, flat, portable case, especially of leather, used for carrying papers, pictures, drawings or maps.

2. A list of the financial assets held by an individual or a bank or other financial institution.

3. The role of the head of a government department e.g. “He holds

the portfolio for foreign affairs”.

4. An organized presentation of an individual’s education, work

samples and skills.

Using portfolios of student work in instruction and communication:

The term portfolio has become a popular buzz word. Unfortunately, it is not always clear exactly what is meant or implied by the term, especially when used in the context of portfolio assessment. This training module is intended to clarify the notion of portfolio assessment and help users design such assessments in a thoughtful manner. We begin with a discussion of the rationale for assessment alternatives and then discuss portfolio definitions, characteristics and design considerations.

Educators and critics are currently reciting a litany of problems concerning the use of multiple-choice and other structured-format tests for assessing many important student outcomes. This has been accompanied by an explosion of activity searching for assessment alternatives, which are intended to:

1. Capture a richer array of what students know and can do than is possible with multiple-choice tests. Current goals for students go beyond knowledge of facts and include such things as problem solving, critical thinking, lifelong learning of new information, and thinking independently. Goals also include dispositions such as persistence, flexibility, motivation and self-confidence.

2. Portray the process by which students produce work. It is important, for example, that students utilize efficient strategies for solving problems as well as getting the right answer. It is also important for students to be able to do such things as monitor their own learning, so that they know what to do when they perceive they are not understanding.

3. Make our assessment align with what we consider important outcomes for students in order to communicate the right message to students and others about what we value. For example, if we emphasize higher-order thinking in instruction but only test knowledge because testing thinking is difficult, students figure out pretty fast what is really valued.

4. Have realistic contexts for the production of work, so that we

can examine what students know and can do in real-life

situations.

5. Provide continuous and ongoing information on how students

are doing in order to chronicle development, give effective

feedback to students and encourage students to observe their

own growth.

6. Integrate assessment with instruction in a way consistent with

both current theories of instruction and goals for students.

Specifically, we want to encourage active student engagement in learning, and student responsibility for the control of learning. We also want to develop assessment techniques that, in their use, improve achievement and not just monitor it.

Using portfolios of student work for assessment, already an instructional tool in many places, is seen as one potential way to accomplish these things. But using portfolios will only have these desired effects if we plan them carefully.


Important Points in the Portfolio Development Process:

Some important points in the portfolio development process are as follows:

1. Teachers, students, parents and school administrators should be consulted in deciding which items will be placed in the portfolio.

2. A shared, clear purpose for using portfolios should be created.

3. It should reflect the actual day-to-day learning activities of students.

4. It should be on-going so that it shows students' efforts, progress and achievements over a period of time.

5. Items in the portfolio should be collected in a systematic, purposeful and meaningful way.

6. It should give students opportunities to select the pieces they consider most representative of themselves as learners to be placed into their portfolios, and to establish criteria for their selections.

7. It should be viewed as a part of the learning process rather than merely as a recordkeeping tool, as a way to enhance students' learning.

8. Students should be able to access their portfolios.

9. Share the criteria that will be used to assess the work in the portfolio as well as the ways in which the results are to be used. Teachers should give feedback to students and parents about the use of the portfolio.

In conclusion, some necessary steps in the portfolio-making process are: the assessment of the work should be clearly explained; the process should extend over a certain time period; the portfolio should encourage students to learn; and items in the portfolio should be multi-dimensional and should address different learning areas. Besides, it is vitally important that the work in a portfolio be designed to present students' performance and development over any time period in detail.

Reference

Katozai, Murad Ali. Measurement & Evaluation. Peshawar: University Publisher, 2013.

6.5 POTENTIAL STRENGTHS AND WEAKNESSES OF PORTFOLIOS:

Potential Strengths of Portfolios

(Or Advantages of Portfolios as a Method of Assessment)

A portfolio can present a wide perspective of the learning process for students and enables continuous feedback for them. Besides this, it enables students to carry out self-assessment of their studies and learning, and to review their progress. Since it provides visual and dynamic proof of students' interests, skills, strong points, successes and development in a certain time period, the portfolio, which is a systematic collection of the student's work, helps in assessing students as a whole. Portfolios are strong devices that help students to gain important abilities such as self-assessment, critical thinking and monitoring one's own learning. Furthermore, portfolios help pre-service teachers assess their own learning and growth, help them become self-directed and reflective practitioners, and contribute to their individual and professional development. Mullin (1998) stresses that the portfolio provides teachers with a new perspective in education. For instance, a portfolio can answer these questions: What kinds of trouble do students have? Which activities are more effective or ineffective? What subjects are understood and not understood? How efficient is the teaching process? Some advantages or strengths of portfolios are given below:

1. Portfolio provides multiple ways of assessing students' learning

over time

2. It provides for a more realistic evaluation of academic content than pencil-and-paper tests.

3. It allows students, parents, teachers and staff to evaluate the students' strengths and weaknesses.

4. It provides multiple opportunities for observation and assessment.

5. It provides an opportunity for students to demonstrate their strengths as well as weaknesses.

6. It encourages students to develop some abilities needed to

become independent, self-directed learners

7. It also helps parents see themselves as partners in the learning

process.

8. It allows students to express themselves in a comfortable way

and to assess their own learning and growth as learners.

9. It encourages students to think of creative ways to share what

they are learning

10. It increases support to students from their parents and enhances

communication among teachers, students and parents.

11. It encourages teachers to change their instructional practice, and it is a powerful way to link curriculum and instruction with assessment.

12. It assesses and promotes critical thinking.

13. It encourages students to become accountable and responsible

for their own learning (i.e., self-directed, active, peer-supported,

adult learning).

14. It can be the focus of initiating a discussion between student and

tutor.

15. It facilitates reflection and self-assessment.

16. It can accommodate diverse learning styles, though they are not

suitable for all learning styles.


17. Portfolios can monitor and assess students' progress over time.

18. Portfolios can assess performance, with practical application of

theory, in real-time naturalistic settings (i.e., authentic

assessment).

19. Portfolios use multiple methods of assessment.

20. Portfolios take into account the judgment of multiple assessors.

21. Portfolios have high face validity, content validity, and construct

validity.

22. Portfolios integrate learning and assessment.

23. Portfolios promote creativity and problem solving.

24. Portfolios promote learning about learning (i.e., metacognition).

25. Portfolios can be standardized and used in summative

assessment.

26. Portfolios combine subjective and objective, as well as

qualitative and quantitative, assessment procedures.

27. Portfolios can be used to assess attitudes and professional and

personal development.

28. Portfolios enable identification of the unsatisfactory or

struggling performer.

29. Portfolios offer teachers vital information for diagnosing

students' strengths and weaknesses to help them improve their

performance (i.e., formative assessment).

30. Portfolios reflect students' progression toward learning

outcomes (i.e., student profiling).

31. Portfolios allow the evaluators to see the student, group, or community as individuals, each unique with their own characteristics, needs, and strengths.

32. Portfolios serve as a cross-sectional lens, providing a basis for future analysis and planning. By viewing the total pattern of the community or of individual participants, one can identify areas of strengths and weaknesses, and barriers to success.

33. Portfolios serve as a concrete vehicle for communication,

providing ongoing communication or exchanges of information

among those involved.

34. Portfolios promote a shift in ownership; communities and

participants can take an active role in examining where they

have been and where they want to go.

35. Portfolio assessment offers the possibility of addressing shortcomings of traditional assessment. It offers the possibility of assessing the more complex and important aspects of an area or topic.

36. Portfolios cover a broad scope of knowledge and information,

from many different people who know the program or person in

different contexts (e.g., participants, parents, teachers or staff,

peers, or community leaders).

Potential Weaknesses of Portfolios

(Or Disadvantages of Portfolios as a Method of Assessment)

1. When portfolios are used for summative assessment, students

may be reluctant to reveal weaknesses.

2. Portfolios are personal documents, and ethical issues of privacy

and confidentiality may arise when they are used for assessment.

3. Difficulties may arise in verifying whether the material

submitted is the candidate's own work.

4. Portfolios take a long time to complete and assess.

5. The portfolio process involves a large amount of paperwork.

6. Portfolio assessment may produce unacceptably low inter-rater reliability, especially if the assessment rubrics are not properly prepared or are used by untrained assessors.


7. May be seen as less reliable or fair than more quantitative

evaluations such as test scores.

8. Can be very time consuming for teachers or program staff to

organize and evaluate the contents, especially if portfolios have

to be done in addition to traditional testing and grading.

9. Having to develop your own individualized criteria can be

difficult or unfamiliar at first.

10. If goals and criteria are not clear, the portfolio can be just a

miscellaneous collection of artifacts that don't show patterns of

growth or achievement.

11. Like any other form of qualitative data, data from portfolio

assessments can be difficult to analyze or aggregate to show

change.

Portfolio Assessment is Most useful for:

1. Evaluating programs that have flexible or individualized goals

or outcomes. For example, within a program with the general

purpose of enhancing children's social skills, some individual

children may need to become less aggressive while other shy

children may need to become more assertive.

2. Each child's portfolio assessment would be geared to his or her

individual needs and goals.

3. Allowing individuals and programs in the community (those

being evaluated) to be involved in their own change and

decisions to change.

4. Providing information that gives meaningful insight into

behaviour and related change. Because portfolio assessment

emphasizes the process of change or growth, at multiple points

in time, it may be easier to see patterns.

5. Providing a tool that can ensure communication and

accountability to a range of audiences. Participants, their


families, funders, and members of the community at large who

may not have much sophistication in interpreting statistical data

can often appreciate more visual or experiential "evidence" of

success.

6. Allowing for the possibility of assessing some of the more

complex and important aspects of many constructs (rather than

just the ones that are easiest to measure).

Portfolio Assessment is not as useful for:

1. Evaluating programs that have very concrete, uniform goals or

purposes. For example, it would be unnecessary to compile a

portfolio of individualized “evidence” in a program whose sole

purpose is full immunization of all children in a community by

the age of five years. The required immunizations are the same,

and the evidence is generally clear and straightforward.

2. Allowing you to rank participants or programs in a quantitative or standardized way (although evaluators or program staff may be able to make subjective judgments of relative merit).

3. Comparing participants or programs to standardized norms.

While portfolios can (and often do) include some standardized

test scores along with other kinds of “evidence”, this is not the

main purpose of the portfolio.


6.6 EVALUATION OF PORTFOLIO:

According to Paulson, Paulson and Meyer, (1991, p. 63):

“Portfolios offer a way of assessing student learning that is different than

traditional methods. Portfolio assessment provides the teacher and

students an opportunity to observe students in a broader context: taking

risks, developing creative solutions, and learning to make judgments

about their own performances”.

In order for thoughtful evaluation to take place, teachers must have multiple scoring strategies to evaluate students' progress. Criteria

for a finished portfolio might include several of the following:

Thoughtfulness (including evidence of students' monitoring of

their own comprehension, metacognitive reflection, and

productive habits of mind).

Growth and development in relationship to key curriculum

expectancies and indicators.

Understanding and application of key processes.

Completeness, correctness, and appropriateness of products and

processes presented in the portfolio.

Diversity of entries (e.g., use of multiple formats to demonstrate

achievement of designated performance standards).

It is especially important for teachers and students to work

together to prioritize those criteria that will be used as a basis for

assessing and evaluating student progress, both formatively (i.e., throughout an instructional time period) and summatively (i.e., as part of a culminating project, activity, or related assessment to determine the extent to which identified curricular expectancies, indicators, and standards have been achieved).


As the school year progresses, students and teacher can work

together to identify especially significant or important artifacts and

processes to be captured in the portfolio. Additionally, they can work collaboratively to determine grades or scores to be assigned. Rubrics,

rules, and scoring keys can be designed for a variety of portfolio

components. In addition, letter grades might also be assigned, where

appropriate. Finally, some form of oral discussion or investigation should

be included as part of the summative evaluation process. This component

should involve the student, teacher, and if possible, a panel of reviewers

in a thoughtful exploration of the portfolio components, students'

decision-making and evaluation processes related to artifact selection,

and other relevant issues.


UNIT-7:

BASIC CONCEPTS OF INFERENTIAL STATISTICS

7.1 CONCEPT & PURPOSE OF INFERENTIAL STATISTICS:

Introduction:

The role and importance of statistics in education cannot be denied. In education we come across measurement, evaluation and research. Similarly, we have to make educational policies and budgets. In all these fields we need to make proper measurements and present the data quantitatively. Thus, without statistics we cannot make proper measurements. As quoted in different statistics books, "Planning is the order of the day, and planning without statistics is inconceivable". Good statistics and sound statistical analysis assist in providing the basis for the design of educational policies, monitoring policy implementation and evaluating policy impact. To generate reliable and relevant information, the data should be collected using appropriate statistical methods. The materials one uses for data collection should be well designed. The data analysis should also be done using an appropriate statistical method. All this shows that statistics plays a vital role in education management and educational planning.

Concept of Inferential Statistics

Definition:

The branch of statistics concerned with using sample data to

make an inference about a larger group of data is called inferential

statistics.

Example:

For instance, a college teacher decides to use the average grade achieved by one statistics class to estimate the average grade of all the sections of the same statistics course. This is a problem of estimation, which falls within inferential statistics.

In educational research, it is never possible to sample the entire

population that we want to draw a conclusion about. For example, we

might want to determine how well a new way of teaching mathematics

can affect mathematical achievement for all children in Primary 1.

However, it would be impossible to test all children in Primary 1 because

of time, resources, and other logistical factors. Instead, we choose a

sample of the population to conduct a study. Then we want to make conclusions, or inferences, about the entire population based on the results of the study from the sample.

Quantitative research in education and social science aims to test theories about the nature of the world in general (or some part of it) based on samples of "subjects" taken from the world (or some part of it). When we perform research on the effect of TV violence on children's aggression, our intention is to create theories that apply to all children who watch TV, or perhaps to all children in cultures similar to our own who watch TV. We of course cannot study all children, but we can perform research on samples of children that, hopefully, will generalize back to the populations from which the samples were taken. Recall that external validity is the ability of a sample to generalize to the population.

Purpose of Inferential Statistics

The main purpose of inferential statistics is to determine whether the findings from the sample can generalize to the entire

population. There will always be differences between groups in a

research study. Inferential statistics can determine whether the difference

between the two groups in the sample is large enough to be able to say

that the findings are significant. If the findings are indeed significant,

then the conclusions can be applied - generalized - to the entire

population. On the other hand, if the difference between the groups is

very small, then the findings are not significant and therefore were simply

the result of chance.


To illustrate this practically, imagine an entire room full of

socks. You want to determine whether there are more white socks than

green socks in the room. However, there are too many socks in the room

to count them all, so you want to take a sample of socks. Based on this

sample of socks, you will draw a conclusion about whether there are

more white socks than green socks. After you collect your sample, then

you will need to calculate inferential statistics is to determine whether the

colours chosen in your sample likely reflect the colours of socks in the

entire room or if your results were due to chance.

What factors will determine whether the colours in the sample of

socks adequately represents the colours of the entire room? Sample size.

If you only pick two socks, they would probably not represent the entire

room. The larger the sample is, the more representative the sample will

be of the entire room and the more likely the inferential statistics will find

a significant result. This is why when conducting experiments, the larger

the sample is, the better: with large samples, the results will more likely

reflect the entire population.

Inferential statistics is the mathematics and logic of how this

generalization from sample to population can be made. The fundamental

question is: can we infer the population's characteristics from the

sample's characteristics? Descriptive statistics remains local to the

sample, describing its central tendency and variability, while inferential

statistics focuses on making statements about the population.

Unlike descriptive statistics, inferential statistics provide ways

of testing the reliability of the findings of a study and "inferring"

characteristics from a small group of participants or people (your sample)

onto much larger groups of people (the population). Descriptive statistics

just describe the data, but inferential statistics let you say what the data mean.
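The contrast can be made concrete with a short Python sketch (not part of the original text; the exam scores below are invented for illustration). The descriptive part only summarizes the sample, while the inferential part uses the sample to test a claim about the population mean.

    import math
    import statistics

    # Hypothetical exam scores from one sampled class (invented data).
    sample = [72, 68, 75, 80, 66, 74, 79, 71, 77, 70]

    # Descriptive statistics: they describe only this sample.
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)      # uses n - 1 in the denominator

    # Inferential statistics: test H0 "the population mean is 70"
    # with a one-sample t statistic, t = (mean - mu0) / (sd / sqrt(n)).
    mu0 = 70
    n = len(sample)
    t = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

    print("Sample mean:", round(sample_mean, 2))
    print("Sample SD  :", round(sample_sd, 2))
    print("t statistic for H0 (population mean = 70):", round(t, 2))
    # The t value would then be compared with a critical value from the
    # t distribution with n - 1 degrees of freedom to decide about H0.

Nothing in the descriptive lines says anything beyond the ten scores themselves; only the t statistic, judged against its sampling distribution, lets us make a statement about the larger population.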

7.2 SAMPLING ERROR:

In statistics, sampling error is incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics calculated on the sample, such as means and quantiles, generally differ from the parameters of the entire population.

For example:

If one measures the height of a thousand individuals from a

country of one million, the average height of the thousand is typically not

the same as the average height of all one million people in the country.

Since sampling is typically done to determine the characteristics of a

whole population, the difference between the sample and population

values is considered a sampling error.
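A small Python simulation (an illustrative sketch, not from the text) makes the same point: the mean of a random sample almost never coincides exactly with the population mean, and the difference is the sampling error.

    import random
    import statistics

    random.seed(1)

    # A made-up "population" of 100,000 heights in centimetres.
    population = [random.gauss(170, 8) for _ in range(100_000)]
    population_mean = statistics.mean(population)      # the parameter

    # Draw a random sample of 1,000 individuals and compute the statistic.
    sample = random.sample(population, 1_000)
    sample_mean = statistics.mean(sample)

    print("Population mean:", round(population_mean, 2))
    print("Sample mean    :", round(sample_mean, 2))
    print("Sampling error :", round(sample_mean - population_mean, 2))

Re-running the sampling step produces a slightly different error each time, which is why the size and design of the sample, discussed below, matter.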

Population and Samples:

A population is the entire group to which we want to generalize our results. A sample is a subset of the population. The population might be all adult humans, but our sample might be a group of 30 friends and relatives.

Types of sampling errors:

1. Random sampling

2. Bias problems

3. Non-sampling error

1. Random Sampling:

In statistics, sampling error is the error caused by observing a sample instead of the whole population. The sampling error can be found by subtracting the value of a parameter from the value of a statistic. In nursing research, a sampling error is the difference between a sample statistic used to estimate a population parameter and the actual but unknown value of the parameter.

(Burns and Grove, 2009)

Parameters and statistics:

A numerical summary of a population is called a parameter,

while the same numerical summary of a sample is called a

statistic.

2. Bias Problems:


Sampling bias is a possible source of sampling errors. It leads to sampling errors which have a tendency to be either positive or negative. Such errors can be considered to be systematic errors.

3. Non-sampling Error:

Sampling error can be contrasted with non-sampling error. Non-sampling error is a catch-all term for the deviations from the true value that are not a function of the sample chosen, including various systematic errors and any random errors that are not due to sampling. Non-sampling errors are much harder to quantify than sampling error.

Example of non-sampling error:

Answers given by respondents may be influenced by the desire

to impress an interviewer.

4. Characteristics:

Sampling error:

1. Generally decreases as the sample size increases (but not proportionally).

2. Depends on the size of the population under study.

3. Depends on the variability of the characteristic of interest in the population.

4. Can be accounted for and reduced by an appropriate sampling plan.

5. Can be measured and controlled in probability sample surveys.

7.3 NULL HYPOTHESIS:

Before defining the term null hypothesis, it is necessary that we first know about hypotheses and statistical hypotheses.

Hypothesis:

A hypothesis is any statement or assumption about any

phenomena of nature.


Statistical Hypothesis:

A statistical hypothesis is a statement or assumption about the

value of a population parameter.

For example:

μ = 80 (the population mean is equal to 80)

μ > 22 (the population mean is greater than 22)

σ² ≠ 25 (the population variance is not equal to 25)

μ1 = μ2 (population mean 1 is equal to population mean 2)

μ1 - μ2 = 0 (there is no difference between μ1 and μ2)

Null Hypothesis:

The hypothesis to be tested in a test of hypothesis is called the null hypothesis. It is a hypothesis which is tested for possible rejection or nullification under the assumption that it is true. It is denoted by H0 and usually contains an equal sign.

For example, if we want to test that the population mean is 80, then we write:

H0: μ = 80

Another definition of ‘Null-Hypothesis’:

Null hypothesis is a type of hypothesis used in statistics that

proposes that no statistical significance exists in a set of given

observations.

The null hypothesis attempts to show that no variation exists between variables, or that a single variable is no different from zero. It is presumed to be true until statistical evidence nullifies it in favour of an alternative hypothesis.

Examples:

Hypothesis:


The loss of my socks is due to alien burglary. (Alien burglary

means unfamiliar theft).

Null Hypothesis:

The loss of my socks is nothing to do with alien burglary.

Alternative Hypothesis:

The loss of my socks is due to alien burglary.

In statistics, the only way of supporting your hypothesis is to refute the null hypothesis. Rather than trying to prove your idea (the alternative hypothesis) right, you must show that the null hypothesis is likely to be wrong: you have to ‘refute’ or ‘nullify’ the null hypothesis.

7.4 TESTS OF SIGNIFICANCE:

Once sample data have been gathered through an observational study or experiment, statistical inference allows analysts to assess evidence in favor of some claim about the population from which the sample has been drawn. The methods of inference used to support or reject claims based on sample data are known as tests of significance.

Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. We would write H0: there is no difference between the two drugs on average.

The alternative hypothesis, Ha, is a statement of what a

statistical hypothesis test is set up to establish. For example, in a clinical

trial of a new drug, the alternative hypothesis might be that the new drug

has a different effect, on average, compared to that of the current drug.

We would write Ha: the two drugs have different effects, on average. The

alternative hypothesis might also be that the new drug is better, on

average, than the current drug. In this case we would write Ha: the new

drug is better than the current drug, on average.


The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude "reject Ha", or even "accept Ha".

If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of Ha. Rejecting the null hypothesis, on the other hand, suggests that the alternative hypothesis may be true.

Example

Suppose a test has been given to all high school students in a

certain state. The mean test score for the entire state is 70, with standard

deviation equal to 10. Members of the school board suspect that female

students have a higher mean score on the test than male students, because

the mean score from a random sample of 64 female students is equal to

73. Does this provide strong evidence that the overall mean for female

students is higher?

The null hypothesis H0 claims that there is no difference between the mean score for female students and the mean for the entire population, so that μ = 70. The alternative hypothesis claims that the mean for female students is higher than the entire student population mean, so that μ > 70.
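Because the population standard deviation is known, the example can be worked out directly: the standard error of the mean is 10/√64 = 1.25, so z = (73 - 70)/1.25 = 2.4. The short Python sketch below (not from the text) repeats the computation and adds the corresponding one-tailed p-value.

    import math

    pop_mean, pop_sd = 70, 10     # population values given in the example
    sample_mean, n = 73, 64       # result from the 64 sampled female students

    standard_error = pop_sd / math.sqrt(n)          # 10 / 8 = 1.25
    z = (sample_mean - pop_mean) / standard_error   # (73 - 70) / 1.25 = 2.4

    # One-tailed p-value: probability of a sample mean this high or higher
    # if H0 (population mean = 70) were true.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))

    print("z =", z)                                 # 2.4
    print("one-tailed p =", round(p_value, 4))      # about 0.008

Since the p-value is well below the conventional 0.05 level, the sample provides strong evidence against H0 in favor of Ha.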

Steps in Testing for Statistical Significance

1. State the Research Hypothesis

2. State the Null Hypothesis

3. Select a probability of error level (alpha level)

4. Select and compute the test for statistical significance

5. Interpret the results

1) State the Research Hypothesis

A research hypothesis states the expected relationship between

two variables. It may be stated in general terms, or it may include

dimensions of direction and magnitude.


For example,

General: The length of the job training program is related to the rate of job placement of trainees.

Direction: The longer the training program, the higher the rate of job placement of trainees.

Magnitude: Longer training programs will place twice as many trainees into jobs as shorter programs.

General: Graduate Assistant pay is influenced by gender.

Direction: Male graduate assistants are paid more than female

graduate assistants.

Magnitude: Female graduate assistants are paid less than 75% of

what male graduate assistants are paid.

2) State the Null Hypothesis

A null hypothesis usually states that there is no relationship

between the two variables. For example,

There is no relationship between the length of the job training

program and the rate of job placement of trainees.

Graduate assistant pay is not influenced by gender.

A null hypothesis may also state that the relationship proposed

in the research hypothesis is not true. For example,

Longer training programs will place the same number or fewer

trainees into jobs as shorter programs.

Female graduate assistants are paid at least 75% or more of what

male graduate assistants are paid.

Researchers use a null hypothesis in research because it is easier

to disprove a null hypothesis than it is to prove a research hypothesis.

The null hypothesis is the researcher's "straw man." That is, it is easier to

show that something is false once than to show that something is always

true. It is easier to find disconfirming evidence against the null hypothesis

than to find confirming evidence for the research hypothesis.


(Definitions taken from Valerie J. Easton and John H. McColl's

Statistics Glossary v1.1)

One Tailed and Two Tailed Significant Tests

One important concept in significance testing is whether you use a one-tailed or two-tailed test of significance. The answer is that it depends on your hypothesis. When your research hypothesis states the direction of the difference or relationship, then you use a one-tailed probability. For example, a one-tailed test would be used to test these null hypotheses:

Females will not score significantly higher than males on an IQ test.

Superman is not significantly stronger than the average person.

The one-tailed probability is exactly half the value of the two-tailed probability.
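For a z statistic this halving relationship is easy to verify numerically (an illustrative Python sketch, not from the text):

    import math

    z = 2.4   # e.g. the z statistic from the earlier example

    one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))   # area in one tail
    two_tailed = 2 * one_tailed                            # area in both tails

    print("one-tailed p =", round(one_tailed, 4))   # about 0.0082
    print("two-tailed p =", round(two_tailed, 4))   # about 0.0164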

7.5 LEVELS OF SIGNIFICANCE:

In hypothesis testing, the significance level is the criterion used

for rejecting the null hypothesis.

The significance level is used in hypothesis testing as follows. First, the difference between the results of the experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large or larger is computed. Finally, this probability is compared to the significance level.

If the probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant. Traditionally, experiments have used either the 0.05 level (sometimes called the 5% level) or the 0.01 level (the 1% level), although the choice of level is largely subjective. The lower the significance level, the more the data must diverge from the null hypothesis to be significant. Therefore, the 0.01 level is more conservative than the 0.05 level.


Symbols:

The Greek letter alpha (α) is sometimes used to indicate the significance level. The above explanation shows that the significance level is a value associated with a statistical test which indicates the probability of obtaining those or more extreme results. This value can be interpreted as the probability of obtaining those results if the null hypothesis were true (when sampling is random), or as the probability of obtaining those results by chance alone (when sampling is less than random). The value of this probability (also known as "p", the p-value, alpha, or the Type I error rate) runs between 0 and 1. The closer to 0, the lower the probability of the results being found if the null hypothesis were true, or the lower the probability of the result being a chance result. As stated at the beginning, significance levels are used to reject the null hypothesis that, for example, "there is no correlation between variables", "there is no difference between groups" or "there is no change between treatments".

A significance level of 0.05 is conventionally used in the social sciences, although probabilities as high as 0.10 may also be used. Probabilities greater than 0.10 are rarely used. A significance level of 0.05, for example, indicates that there is a 5% probability that the results are due to chance. A significance level of 0.10 indicates a 10% probability that the results are due to chance. Thus, using significance levels above 0.10 is rather risky, while using lower significance levels is "safer".
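The decision rule described above amounts to a single comparison, as the following Python sketch shows (illustrative values, not from the text):

    alpha = 0.05      # chosen significance level
    p_value = 0.0082  # e.g. the one-tailed p-value computed earlier

    if p_value <= alpha:
        print("Reject H0: the result is statistically significant.")
    else:
        print("Do not reject H0: the result is not statistically significant.")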

History:

The present-day concept of statistical significance originated with Ronald Fisher when he developed statistical hypothesis testing, which he described as tests of significance in his 1925 publication.

Fisher suggested a probability of one in twenty (0.05) as a convenient cut-off level for rejecting a null hypothesis.

Role in Statistics:

Statistical significance plays a pivotal role in statistical hypothesis testing, where it is used to determine whether a null hypothesis can be rejected or retained. A null hypothesis is the general default statement that nothing has happened or changed. For a null hypothesis to be rejected as false, the result has to be identified as being statistically significant, i.e. unlikely to have occurred by chance alone.

To determine whether a result is statistically significant, a researcher has to calculate a p-value, which is the probability of observing an effect given that the null hypothesis is true.

References

www.en.wikipedia.org/wiki/statistical_significance

Katozai, M. A. Measurement & Evaluation. Peshawar: University Publisher, 2013.

7.6 TYPE-I AND TYPE-II ERRORS:

Statistical Errors

Even in the best research project, there is always a possibility

that the researcher will make a mistake regarding the relationship

between the two variables. This mistake is called statistical error.

In statistical test theory the notion of statistical error is an

integral part of hypothesis testing. The test requires an unambiguous

statement of a null hypothesis, which usually corresponds to a default

"state of nature", for example "this person is healthy", "this accused is not

guilty" or "this product is not broken". An alternative hypothesis is the

negation of null hypothesis, for example, "this person is not healthy",

"this accused is guilty" or "this product is broken". The result of the test

may be negative, relative to null hypothesis (not healthy, guilty, broken)

or positive (healthy, not guilty, not broken). If the result of the test

corresponds with reality, then a correct decision has been made.

However, if the result of the test does not correspond with reality, then an

error has occurred. Due to the statistical nature of a test, the result is

never, except in very rare cases, free of error. Two types of error are

distinguished: type I error and type II error.

In statistics, a type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. A type II error (or error of the second kind) is the failure to reject a false null hypothesis. A type I

error is a false positive. Usually a type I error leads one to conclude that a

thing or relationship exists when really it doesn't, for example, that a

patient has a disease being tested for when really the patient does not

have the disease, or that a medical treatment cures a disease when really

it doesn't. A type II error is a false negative. Examples of type II errors

would be a blood test failing to detect the disease it was designed to

detect, in a patient who really has the disease; or a clinical trial of a

medical treatment failing to show that the treatment works when really it

does. When comparing two means, concluding the means were different when in reality they were not different would be a Type I error; concluding the means were not different when in reality they were different would be a Type II error.

All statistical hypothesis tests have a probability of making type

I and type II errors. For example, all blood tests for a disease will falsely

detect the disease in some proportion of people who don't have it, and

will fail to detect the disease in some proportion of people who do have

it. A test's probability of making a type I error is denoted by α. A test's probability of making a type II error is denoted by β.

The detail is given below:

Type-I Error:

The first is called a Type I error. This occurs when the researcher assumes that a relationship exists when in fact the evidence is that it does not. In a Type I error, the researcher should accept the null hypothesis and reject the research hypothesis, but the opposite occurs. The probability of committing a Type I error is called alpha (α).

A type I error, also known as an error of the first kind, occurs

when the null hypothesis (H0) is true, but is rejected. It is asserting

something that is absent, a false hit. A type I error may be compared with

a so-called false positive (a result that indicates that a given condition is

present when it actually is not present) in tests where a single condition is

tested for. Type I errors are philosophically a focus of skepticism and


Occam's razor. A Type I error occurs when we believe a falsehood. In terms of folk tales, an investigator may be "crying wolf" without a wolf in sight (raising a false alarm) (H0: no wolf).

The rate of the type I error is called the size of the test and is denoted by the Greek letter α (alpha). It usually equals the significance level of the test. In the case of a simple null hypothesis, α is the probability of a type I error. If the null hypothesis is composite, α is the maximum (supremum) of the possible probabilities of a type I error.

Explanation:

A Type I Error is also known as a False Positive or Alpha Error.

This happens when you reject the Null Hypothesis even if it is true. The

Null Hypothesis is simply a statement that is the opposite of your

hypothesis. For example, you think that boys are better in arithmetic than

girls. Your null hypothesis would be: "Boys are not better than girls in

arithmetic."

You will make a Type I Error if you conclude that boys are

better than girls in arithmetic when in reality, there is no difference in

how boys and girls perform. In this case, you should accept the null

hypothesis since there is no real difference between the two groups when

it comes to arithmetic ability. If you reject the null hypothesis and say

that one group is better, then you are making a Type I Error.

Type-II Error

The second is called a Type II error. This occurs when the

researcher assumes that a relationship does not exist when in fact the

evidence is that it does. In a Type II error, the researcher should reject the

null hypothesis and accept the research hypothesis, but the opposite

occurs. The probability of committing a Type II error is called beta.

Generally, reducing the possibility of committing a Type I error

increases the possibility of committing a Type II error and vice versa,

reducing the possibility of committing a Type II error increases the

possibility of committing a Type I error.


Researchers generally try to minimize Type I errors, because

when a researcher assumes a relationship exists when one really does not,

things may be worse off than before. In Type II errors, the researcher

misses an opportunity to confirm that a relationship exists, but is no

worse off than before.

A Type II error is a statistical term used within the context of hypothesis testing that describes the error that occurs when one accepts a null hypothesis that is actually false. The error wrongly rejects the alternative hypothesis, even though the observed effect is real and not due to chance.

A type II error accepts the null hypothesis, although the

alternative hypothesis is the true state of nature. It confirms an idea that

should have been rejected, claiming that two observations are the same, even though they are different.

Example:

An example of a type II error would be a pregnancy test that gives a

negative result, even though the woman is in fact pregnant. In this

example, the null hypothesis would be that the woman is not pregnant,

and the alternative hypothesis is that she is pregnant.

In other words, a type II error, also known as an error of the second kind, occurs when the null hypothesis is false but erroneously fails to be rejected. It is failing to assert what is present, a miss. A type II error may be compared with a so-called false negative (where an actual 'hit' was disregarded by the test and seen as a 'miss') in a test checking for a single condition with a definitive result of true or false. A Type II error is committed when we fail to believe a truth. In terms of folk tales, an investigator may fail to see the wolf ("failing to raise an alarm"). Again, H0: no wolf.

The rate of the type II error is denoted by the Greek letter β (beta) and is related to the power of a test (which equals 1 - β).


What we actually call a type I or type II error depends directly on the null hypothesis. Negation of the null hypothesis causes type I and type II errors to switch roles.

The goal of the test is to determine if the null hypothesis can be

rejected. A statistical test can either reject (prove false) or fail to reject

(fail to prove false) a null hypothesis, but never prove it true (i.e., failing

to reject a null hypothesis does not prove it true).

Explanation:

A Type II Error is also known as a False Negative or Beta Error.

This happens when you accept the Null Hypothesis when you should in

fact reject it. The Null Hypothesis is simply a statement that is the

opposite of your hypothesis. For example, you think that dog owners are

friendlier than cat owners. Your null hypothesis would be: "Dog owners

are as friendly as cat owners."

You will make a Type II Error if dog owners are actually

friendlier than cat owners, and yet you conclude that both kinds of pet

owners have the same level of friendliness. In this case, you should reject

the null hypothesis since there is a real difference in friendliness between

the two groups. If you accept the null hypothesis and say that both types

of pet owners are equally friendly, then you are making a Type II Error.
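Both error rates can be estimated with a small simulation (an illustrative Python sketch, not from the text). When H0 is really true, any rejection is a Type I error; when H0 is really false, any failure to reject is a Type II error.

    import math
    import random

    random.seed(2)

    def rejects_h0(sample, mu0=0.0, sigma=1.0, alpha=0.05):
        """One-tailed z-test of H0: mu = mu0 vs Ha: mu > mu0; True if H0 is rejected."""
        n = len(sample)
        z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
        p = 0.5 * math.erfc(z / math.sqrt(2))
        return p <= alpha

    trials, n = 10_000, 25

    # Case 1: H0 is true (true mean = 0). Rejections here are Type I errors.
    type_1_rate = sum(
        rejects_h0([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)
    ) / trials

    # Case 2: H0 is false (true mean = 0.3). Failures to reject are Type II errors.
    type_2_rate = sum(
        not rejects_h0([random.gauss(0.3, 1.0) for _ in range(n)]) for _ in range(trials)
    ) / trials

    print("Estimated Type I error rate (alpha):", round(type_1_rate, 3))   # near 0.05
    print("Estimated Type II error rate (beta):", round(type_2_rate, 3))

The first rate stays close to the chosen alpha of 0.05, while the second (beta) depends on how far the true mean really is from the value stated in H0 and on the sample size.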

7.7 DEGREES OF FREEDOM:

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

The number of independent ways in which a dynamic system can move without violating any constraint imposed on it is called its degrees of freedom. In other words, the degrees of freedom can be defined as the minimum number of independent coordinates that can specify the position of the system completely.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (which, for the sample variance, is one, since the sample mean is the only intermediate step).

In many statistical problems we are required to determine the degrees of freedom. This refers to a positive whole number that indicates the lack of restrictions in our calculations. The degrees of freedom are the number of values in a calculation that we can vary.

One step in most statistical inference problems is to determine the number of degrees of freedom. The number of degrees of freedom in a problem is related to the precise probability distribution that is to be used in the inference procedure. This step is an often overlooked but crucial detail in both the calculation of confidence intervals and the workings of hypothesis tests.

There is not a single general formula for the number of degrees of freedom for every inference problem. Instead, there are specific formulas to be used for each type of procedure in inferential statistics. In other words, the setting that we are working in will determine how we calculate the number of degrees of freedom.

Determining Degrees of Freedom:

Degrees of freedom are the number of components that are free to vary about a parameter:

df = sample size - number of parameters estimated

df is n - 1 for a one-sample test of the mean.

A Few Examples


For a moment suppose that we know that the mean of the data is 25 and that the values are 20, 10, 50, and one unknown value. To find the mean of a list of data, we add all of the data and divide by the total number of values. This gives us the formula (20 + 10 + 50 + x)/4 = 25, where x denotes the unknown. Despite not knowing this value, we can use some algebra to determine that x = 20.

Let's alter this scenario slightly. Instead, we suppose that we know that the mean of a data set is 25, with values 20, 10, and two unknown values. These unknowns could be different, so we use two different variables, x and y, to denote them. The resulting formula is (20 + 10 + x + y)/4 = 25. With some algebra we obtain y = 70 - x. The formula is written in this form to show that once we choose a value for x, the value for y is determined. This shows that there is one degree of freedom.

Now we'll look at a sample size of one hundred. If we know that the mean of this sample data is 20, but do not know the values of any of the data, then there are 99 degrees of freedom. All values must add up to a total of 20 × 100 = 2000. Once we have the values of 99 elements in the data set, the last one has been determined.

Example

To compute the variance I first sum the squared deviations from the mean. The mean is a parameter: it is a characteristic of the variable under examination as a whole and is part of describing the overall distribution of values. If you know all the parameters you can accurately describe the data. The more parameters you know, that is to say the more you fix, the fewer samples fit this model of the data. If you know only the mean, there will be many possible sets of data that are consistent with this model, but if you know the mean and the standard deviation, fewer possible sets of data fit this model.

So in computing the Variance I had first to calculate the mean.

When I have calculated the mean, I could vary any of the scores in the

data except for one. If I leave one score unexamined it can always be


calculated accurately from the rest of the data and the mean itself. Maybe

an example can make this clearer.

I take the ages of a class of students and find the mean. If I fix the mean, how many of the other scores (there are N of them, remember) could still vary? The answer is N – 1. There are N – 1 independent pieces of information that could vary while the mean is known. These are the degrees of freedom. One piece of information cannot vary because its value is fully determined by the parameter (in this case the mean) and the other scores. Each parameter that is fixed during our computations constitutes the loss of a degree of freedom.

If we imagine starting with a small number of data points and then fixing a relatively large number of parameters as we compute some statistic, we see that as more degrees of freedom are lost, fewer and fewer different situations are accounted for by our model, since fewer and fewer pieces of information could in principle be different from what is actually observed.

So, to put it very informally, the interest in our data is determined by the degrees of freedom: if there is nothing that can vary once our parameter is fixed (because we have so very few data points, maybe just one), then there is nothing to investigate. Degrees of freedom can be seen as linking sample size to explanatory power.

The standard deviation is a measure of how spread out numbers are.

Its symbol is σ (the Greek letter sigma).

The formula is easy: it is the square root of the variance.

To calculate the variance follow these steps:

Work out the mean (the simple average of the numbers).

Then for each number, subtract the mean and square the result (the squared difference).

Then work out the average of those squared differences.


Let us suppose we have five values, i.e. 600, 470, 170, 430 and 300.

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

The deviations from the mean are 206, 76, −224, 36 and −94, so

Variance: σ² = [(206)² + (76)² + (−224)² + (36)² + (−94)²] / 5

= (42,436 + 5,776 + 50,176 + 1,296 + 8,836) / 5

= 108,520 / 5 = 21,704

Standard deviation: σ = √21,704 ≈ 147.32
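The same arithmetic can be reproduced in a few lines of Python. The sketch below simply mirrors the worked example above (population variance, dividing by the number of values).

values = [600, 470, 170, 430, 300]

mean = sum(values) / len(values)                      # 394.0
squared_diffs = [(x - mean) ** 2 for x in values]     # 42436, 5776, 50176, 1296, 8836
variance = sum(squared_diffs) / len(values)           # 21704.0 (population variance)
std_dev = variance ** 0.5                             # about 147.32

print(mean, variance, round(std_dev, 2))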


UNIT-8:

SELECTED TESTS OF SIGNIFICANCE

8.1 T-TEST:

Definition:

i) A t-test helps you compare whether two groups have different average values (for example, whether men and women have different average heights).

ii) A t-test asks whether a difference between two groups' averages is unlikely to have occurred because of random chance in sample selection. A difference is more likely to be meaningful and "real" if (a) the difference between the averages is large, (b) the sample size is large, and (c) responses are consistently close to the average values and not widely spread out (the standard deviation is low).

iii) A statistical examination of two population means. A two-sample t-test examines whether two samples are different and is commonly used when the variances of two normal distributions are unknown and when an experiment uses a small sample size. For example, a t-test could be used to compare the average floor routine score of the U.S. women's Olympic gymnastics team to the average floor routine score of China's women's team.

The t-test's statistical significance and the t-test's effect size are the two primary outputs of the t-test. Statistical significance indicates whether the difference between sample averages is likely to represent an actual difference between the populations, and the effect size indicates whether that difference is large enough to be practically meaningful.

The "one-sample t-test" is similar to the "independent-samples t-test" except that it is used to compare one group's average value to a single


number x. For practical purposes you can look at the confidence interval around the average value to gain the same information.

The "paired t-test" is used when each observation in one group is paired with a related observation in the other group. For example, do Kansans spend more money on movies in January than in February, where each respondent is asked about both their January and their February spending? In effect, a paired t-test subtracts each respondent's January spending from their February spending (yielding the change in spending), then takes the average of all those changes and looks to see whether that average is statistically significantly greater than zero (using a one-sample t-test).
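As a rough sketch of that procedure, the Python lines below form each respondent's difference and run a one-sample t-test on the differences. The spending figures are invented, and the availability of scipy is assumed.

from scipy import stats

# Hypothetical January and February movie spending for the same five respondents.
january  = [12, 15, 9, 20, 14]
february = [16, 18, 11, 25, 15]

differences = [f - j for j, f in zip(january, february)]

# One-sample t-test of whether the mean difference is zero.
t_stat, p_value = stats.ttest_1samp(differences, 0)
print(t_stat, p_value)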

The "ranked independent t-test" asks a similar question to the typical unranked test, but it is more robust to outliers (a few bad outliers can make the results of an unranked t-test invalid).

T-test (Independent Samples)

Consider dollars spent on movies per month. Statwing represents t-test results as distribution curves. Assuming there is a large enough sample size, the difference between the samples probably represents a "real" difference between the populations from which they were sampled.

Example:

Let’s say you are curious about wether New Yorkers and

Kansans spend a different amount of money per month on movies. It is

impractical to ask every New Yorker and Kansans about their movie

spending, so instead you ask a sample of each – may be 300 New

Yorkers and 300 Kansans – and the average are 14 Dollars and 18

Dollars. The t-test asks wether that difference is probably representative

of a real difference between Kansans and New Yorkers generally or

whether that is most likely a meaningless statistical fluke.

Technically, it asks the following. If there were in fact no

difference between Kansans and New Yorkers generally, what are


the chances that randomly selected groups from those populations would be as different as these randomly selected groups are?

For example, if Kansans and New Yorkers as a whole actually spent the same amount of money on average, it would be very unlikely that 300 randomly selected Kansans would average exactly 14 dollars and 300 randomly selected New Yorkers would average exactly 18 dollars. So if your sampling yielded those results, you would conclude that the difference in the sample groups is most likely representative of a meaningful difference between the populations as a whole.

Statistical Analysis of the T-test:

The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group difference.

Signal and noise:

The top part of the formula is easy to compute: just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group and divide it by the number of people in that group. We add these two values and then take their square root. The specific formula is given below.

$$SE(\bar{X}_T - \bar{X}_C) = \sqrt{\frac{var_T}{n_T} + \frac{var_C}{n_C}}$$

Remember that the variance is simply the square of the standard deviation. The final formula for the t-test is shown below.


$$t = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{\dfrac{var_T}{n_T} + \dfrac{var_C}{n_C}}}$$

Formula for the t-test.
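A small Python sketch of this ratio is given below. The two score lists are invented, and the calculation simply mirrors the formula above (using the sample variance of each group, which corresponds to the unequal-variance form of the test).

import math

treatment = [14, 18, 12, 16, 15, 13]    # hypothetical scores, group T
control   = [10, 11, 13, 9, 12, 10]     # hypothetical scores, group C

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)   # sample variance

# Standard error of the difference between the two means.
se = math.sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))

t = (mean(treatment) - mean(control)) / se
print(t)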


8.2 CHI-SQUARE (X2):

The X2 distribution (χ is the Greek letter chi, pronounced "kai") was first obtained in 1875 by F. R. Helmert, a German geodesist. Later, in 1900, Karl Pearson showed that as n increases to infinity a discrete multinomial distribution may be transformed and made to approach a chi-square distribution. This approximation has broad applications, such as a test of goodness of fit, a test of independence and a test of homogeneity.

The chi-square distribution contains only one parameter, called the number of degrees of freedom.

Chi-Square Distribution:

Let Z1, Z2, ..., Zn be normally and independently distributed variables with zero mean and unit variance, N(0, 1). Then the random variable expressed by the quantity

$$\chi^2 = Z_1^2 + Z_2^2 + \dots + Z_n^2$$

has a chi-square distribution with n degrees of freedom. In other words, it can be defined as "the sum of squares of n independent standardized random variables".


Properties of Chi-Square Distribution:

The chi-square distribution has the following properties.

1. The chi-square distribution is continuous, ranging from zero to infinity.

2. The total area under the curve is unity.

3. The mean of the X2 distribution is equal to the number of degrees of freedom, i.e. n.

4. The variance of the X2 distribution is equal to twice the degrees of freedom, i.e. 2n (properties 3 and 4 are illustrated in the sketch after this list).

5. The curve of the chi-square distribution is positively skewed.

6. The X2 distribution tends to the normal distribution as the number of degrees of freedom approaches infinity.

7. The moment generating function of the X2 distribution is (1 − 2t)^(−n/2).

8. The X2 distribution is leptokurtic, as β2 > 3.
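Properties 3 and 4 can be checked by simulation: a chi-square variable with n degrees of freedom is the sum of n squared standard normal variables, so its simulated mean should be close to n and its variance close to 2n. The sketch below uses invented simulation settings.

import random

n_df = 5          # degrees of freedom
n_sims = 100000   # number of simulated chi-square values

chi_square_values = []
for _ in range(n_sims):
    zs = [random.gauss(0, 1) for _ in range(n_df)]
    chi_square_values.append(sum(z * z for z in zs))

mean = sum(chi_square_values) / n_sims
var = sum((x - mean) ** 2 for x in chi_square_values) / n_sims
print(mean, var)   # roughly 5 and 10, i.e. n and 2n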

Uses of X2 Distribution:

1. X2 is used to test the goodness of fit.

2. X2 is used to test the independence of attributes.

3. X2 is used to test the validity of hypothetical ratios.

4. X2 is used to test the homogeneity of several variances.

5. X2 is used to test whether a hypothetical value of the population variance is true or not.

6. X2 is used to test the equality of several population correlation coefficients.

Goodness of Fit Test:

This test is based on the property that cell probabilities may depend upon unknown parameters, provided that the unknown parameters are replaced with their estimates and provided that one degree of freedom is deducted for each parameter estimated. To see whether there is evidence of small or large differences, the test statistic to use is:


$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = \sum_{i=1}^{k} \frac{(o_i - np_i)^2}{np_i}$$

with k − 1 − (number of parameters estimated) degrees of freedom.

The symbols oi and ei represent the observed and expected frequencies respectively. When the observed values are equal to the expected values, X2 = 0. The larger the difference between the observed and expected frequencies, the larger the X2 value will be. A small value of X2 indicates that the fit is good and leads us to accept H0. A large value of X2 indicates that the fit is poor and leads us to reject H0 in favour of H1.
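As a rough illustration of the goodness-of-fit statistic, the sketch below tests whether a hypothetical die is fair. The observed counts are invented, and the critical value lookup assumes scipy is available.

from scipy.stats import chi2

observed = [8, 12, 9, 11, 10, 10]          # hypothetical counts from 60 rolls of a die
expected = [60 / 6] * 6                    # 10 for each face under H0: the die is fair

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                     # k - 1, no parameters estimated

critical = chi2.ppf(0.95, df)              # 5% significance level
print(x2, critical, x2 > critical)         # reject H0 only if x2 exceeds the critical value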

Contingency Table:

A table consisting of two or more rows and two or more columns, into which n observations are classified according to two different criteria (or variables), is commonly called a contingency table.

The simplest form of a contingency table is the 2 × 2 table, which is obtained when both criteria are dichotomized. The totals of the frequencies in each of the rows and columns are called the marginal frequencies (or marginal totals). Contingency tables provide a useful method of comparing two variables.

A 2 × 2 contingency table is shown below.

Classes    B1     B2     Total

A1         O11    O12    (A1)

A2         O21    O22    (A2)

Total      (B1)   (B2)   n

A contingency table may be extended to higher dimensions, i.e. an r × c contingency table, where r represents the number of rows and c represents the number of columns.

Testing the Hypothesis of Independence in a Contingency Table:

The data presented in a contingency table can be used to test the hypothesis that the two variables of classification are independent. If this


hypothesis is rejected, the two variables of classification are not independent and we say that there is some association (or interaction) between the two variables of classification. To test it, we must calculate the expected frequencies based on this hypothesis, keeping the marginal totals fixed.

Let eij denote the expected frequency of the cell belonging to Ai and Bj. Assuming the hypothesis of independence is true, the proportion of members belonging to Bj within any class Ai should be the same and equal to the proportion of Bj in the total. Thus

$$\frac{e_{ij}}{(A_i)} = \frac{(B_j)}{n}, \quad \text{so that} \quad e_{ij} = \frac{(A_i)(B_j)}{n}$$

That is, under H0 (the classifications are independent), the expected frequency in any cell is equal to the product of the marginal totals common to that cell divided by the total number of observations.

If our hypothesis of independence is true, the differences between the observed and expected frequencies are small and are attributed to sampling error. Large differences arise if the hypothesis is false. The chi-square statistic provides a means for deciding whether the differences are large or small overall. Hence the statistic to use is:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

with (r − 1)(c − 1) degrees of freedom, where r represents the number of rows and c represents the number of columns. A large value of X2 indicates that the null hypothesis is false.


The procedure for testing the null hypothesis of independence in a contingency table is given below:

i) Formulate the null and alternative hypotheses as:

H0: The two variables of classification are independent, i.e. there is no relationship or association between the two variables.

H1: The two variables of classification are not independent, i.e. they are associated.

ii) Choose a significance level α. The commonly used levels are α = 0.01, 0.05, etc.

iii) The test statistic to use is

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

which, if H0 is true, has an approximate chi-square distribution with (r − 1)(c − 1) degrees of freedom.

iv) Compute the expected frequencies under H0 for each cell by the formula

$$e_{ij} = \frac{(A_i)(B_j)}{n} = \frac{(i\text{th row total})(j\text{th column total})}{\text{total number of observations}}$$

Also calculate the value of X2 and the degrees of freedom.

v) Determine the critical region, which depends on α and the number of degrees of freedom.

vi) Decide as below:

(i) Reject H0 if the computed value of X2 is greater than the tabulated value of X2 at significance level α with (r − 1)(c − 1) degrees of freedom.

(ii) Accept H0 if


the computed value of X2 is less than or equal to that tabulated value.
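The steps above can be followed numerically. The sketch below works through a hypothetical 2 × 2 table of invented frequencies: expected frequencies come from the marginal totals, and the statistic is compared with the chi-square critical value for (r − 1)(c − 1) = 1 degree of freedom (scipy is assumed to be available).

from scipy.stats import chi2

# Hypothetical 2 x 2 contingency table of observed frequencies.
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

x2 = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n       # expected frequency e_ij = (A_i)(B_j)/n
        x2 += (observed[i][j] - e) ** 2 / e

df = (2 - 1) * (2 - 1)
critical = chi2.ppf(0.95, df)                        # alpha = 0.05
print(x2, critical, "reject H0" if x2 > critical else "accept H0")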


8.3 REGRESSION:

In statistics, regression analysis is a statistical technique for

estimating the relationships among variables. It includes many techniques

for modelling and analysing several variables, when the focus is on the

relationship between a dependent variable and one or more independent

variables.

In other words, regression is a statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

Types of Regression:

There are two basic types of regression:

(i) Linear regression

(ii) Multiple regression.

Linear regression uses one independent variable to explain and/or predict the outcome of Y, while multiple regression uses two or more independent variables to predict the outcome. The general form of each type of regression is:

Linear Regression: Y = a + bX + u

Multiple Regression: Y = a + b1X1 + b2X2 + b3X3 + … + btXt + u

Where:


Y = the variable that we are trying to predict

X = the variable that we are using to predict Y

a = the intercept

b = the slope

u = the regression residual.

In multiple regression, the separate variables are differentiated

by using subscripted numbers.

Regression takes a group of random variables, thought to be

predicting Y, and tries to find a mathematical relationship between them.

This relationship is typically in the form of a straight line (linear

regression) that best approximates all the individual data points.

Regression is often used to determine how much specific factors such as

the price of a commodity, interest rates, particular industries or sectors

influence the price movement of an asset.
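The sketch below fits a simple linear regression Y = a + bX by the usual least-squares formulas. The data points are invented and serve only to illustrate how the slope and intercept are obtained from paired observations.

# Hypothetical paired observations (X = hours studied, Y = test score).
x = [2, 4, 6, 8, 10]
y = [50, 58, 65, 71, 80]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for Y = a + bX.
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

print(a, b)                       # intercept and slope
print(a + b * 7)                  # predicted Y for X = 7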
