m&e slides tutorial 2

CHAPTER 3 HOW TO ASSESS? OBJECTIVE TESTS

Upload: natasya-ain

Post on 28-Mar-2015


Page 1: M&E Slides Tutorial 2

CHAPTER 3: HOW TO ASSESS? OBJECTIVE TESTS

Page 2: M&E Slides Tutorial 2

HOW TO ASSESS
• Objective Tests
• Essay Tests
• Projects, Practicals, Fieldwork & Oral Tests
• Observations & Portfolio Assessment

Page 3: M&E Slides Tutorial 2

OBJECTIVE TEST

Definition: A written test consisting of questions that require respondents to select from a list of possible answers. Marking/scoring of answers is not influenced by the subjective opinions of the marker.

Formats/Types
• Multiple-Choice Questions
• Matching Questions
• True-False Questions

Page 4: M&E Slides Tutorial 2

Parts of an MCQ

Stem: What is the capital of Mongolia?

Options/Alternatives:
(A) Cochin
(B) Calcutta
(C) Katmandu
(D) Ulan Bator

Key: (D) Ulan Bator
Distracters: (A), (B) and (C)

Page 5: M&E Slides Tutorial 2

The Stem: in the form of a question or statement

• Direct-question form
• Incomplete-statement form

Clear & concise with a definite focus, free from poor grammar, complex sentences, ambiguity & double negatives

Present a positive question (highlight negative if used)

Ask for ONE answer only

Avoid asking for opinions

Avoid using ALWAYS & NEVER in the stem

Include in the stem as many of the words common to all alternatives as possible

Page 6: M&E Slides Tutorial 2

The Stem: can be in the form of a question or statement

• Direct-question form
E.g. Who was the first Prime Minister of Malaysia?
(A) Tun Dr. Mahathir
(B) Tun Abdul Razak
(C) Tun Hussein Onn
(D) Tunku Abdul Rahman

• Incomplete-statement form
E.g. The first Prime Minister of Malaysia was
(A) Tun Dr. Mahathir
(B) Tun Abdul Razak
(C) Tun Hussein Onn
(D) Tunku Abdul Rahman

Page 7: M&E Slides Tutorial 2

The examples above are of the CORRECT-ANSWER TYPE of multiple-choice item.

The other type is the BEST-ANSWER TYPE.

Example:

Which of the following is the best title for the passage?

(A) A bad experience

(B) An eventful journey

(C) A terrifying occasion

(D) An unforgettable day

Page 8: M&E Slides Tutorial 2

Clear & concise with a definite focus

Poor item: World War II was:
(A) the result of the failure of the League of Nations
(B) horrible
(C) fought in Europe, Asia and Africa
(D) fought during the period of 1939-1945.

N.B. The stem gives no sense of what the question is asking.

Better item: In which of these time periods was World War II fought?
(A) 1914 – 1917
(B) 1929 – 1934
(C) 1939 – 1945
(D) 1951 – 1955

N.B. The improved version more clearly identifies the question and offers the student a set of homogeneous choices.

Page 9: M&E Slides Tutorial 2

Use clear, straightforward language. A stem with complex wording may become a test of reading comprehension rather than an assessment of the subject matter.

Poor Item: As the level of fertility approaches its nadir, what is the most likely ramification for the citizenry of a developing nation?
(A) a decrease in the labour force participation rate of women
(B) a downward trend in the youth dependency ratio
(C) a broader base in the population pyramid
(D) an increased infant mortality rate

Better Item: A major decline in fertility in a developing nation is likely to produce
(A) a decrease in the labour force participation rate of women
(B) a downward trend in the youth dependency ratio
(C) a broader base in the population pyramid
(D) an increased infant mortality rate

N.B. In the improved item, “nadir” is replaced with “decline” and “ramification” with “produce”, which are simpler words.

Page 10: M&E Slides Tutorial 2

Present a positive question (highlight negative if used)

Example:

Which of the following is NOT a symptom of osteoporosis?
(A) decreased bone density
(B) frequent bone fractures
(C) raised body temperature
(D) lower back pain

Better Item: Which of the following is a symptom of osteoporosis?
(A) hair loss
(B) painful joints
(C) decreased bone density
(D) raised body temperature

Page 11: M&E Slides Tutorial 2

Include in the stem as many of the words common to all alternatives as possible

Poor Item: Theorists of pluralism have asserted which of the following?
(A) The maintenance of democracy requires a large middle class.
(B) The maintenance of democracy requires autonomous centres of countervailing power.
(C) The maintenance of democracy requires the existence of a multiplicity of religious groups.
(D) The maintenance of democracy requires the separation of governmental powers.

Better Item: Theorists of pluralism have asserted that the maintenance of democracy requires
(A) a large middle class
(B) autonomous centres of countervailing power
(C) the existence of a multiplicity of religious groups
(D) the separation of governmental powers

Page 12: M&E Slides Tutorial 2

Avoid giving away the answer because of grammatical cues

Poor Item: A fertile area in the desert in which the water table reaches the ground surface is called an
(A) oasis
(B) polder
(C) mirage
(D) water hole

N.B. “an” signals that the answer begins with a vowel, and only “oasis” does.

Better Item: A fertile area in the desert in which the water table reaches the ground surface is called a/an
(A) oasis
(B) polder
(C) mirage
(D) water hole

Page 13: M&E Slides Tutorial 2

Avoid asking for an opinion

Poor Item

Which of the following men contributed most towards the defeat of Hitler's Germany in World War II?

(A) Winston Churchill

(B) Josef Stalin

(C) Franklin D. Roosevelt

(D) George Patton

Page 14: M&E Slides Tutorial 2

The Options/Alternatives

• Each item should have 4 or 5 options
• Options should be grammatically consistent with the stem
• Options should be clearly different, with only ONE correct response
• Options should be fairly consistent in length
• Avoid “None of the above” & “All of the above”
• The key should be clearly correct to the informed, while distracters should be clearly incorrect but plausible to the uninformed

Page 15: M&E Slides Tutorial 2

Options should be fairly consistent in length

Poor Item: The main purpose of a placement test is to
(A) determine the prerequisite skills of learners so that they can be placed at an appropriate level.
(B) determine end-of-course achievement
(C) determine learning progress
(D) determine learning difficulties

Better Item: The main purpose of a placement test is to determine learners’
(A) prerequisite skills
(B) learning progress
(C) learning difficulties
(D) overall achievement

Page 16: M&E Slides Tutorial 2

Options should be clearly different with only ONE correct response

Poor Item: What is the main source of pollution of Malaysian rivers?
(A) land clearing
(B) open burning
(C) coastal erosion
(D) solid waste dumping

N.B. Both (A) land clearing and (D) solid waste dumping could be the answer.

Better Item: What is the main source of pollution of Malaysian rivers?
(A) carbon dioxide emission
(B) open burning
(C) solid waste dumping
(D) coastal erosion

Page 17: M&E Slides Tutorial 2

Use only plausible and attractive alternatives as distractors

Poor Item: Who was the third Prime Minister of Malaysia?
(A) Hussein Onn
(B) Ghafar Baba
(C) Mahathir Mohamad
(D) Musa Hitam

N.B. (B) and (D) are not serious distracters, since neither ever became Prime Minister.

Better Item: Who was the third Prime Minister of Malaysia?
(A) Hussein Onn
(B) Abdul Razak Hussein
(C) Mahathir Mohamad
(D) Abdullah Badawi

Refer to Linn & Gronlund for more examples, pp. 203-214.

Page 18: M&E Slides Tutorial 2

MCQ: Strengths/Advantages

Can measure learning outcomes (LOs) from simple to complex

Provide highly structured and clear tasks

Capable of covering a wide range of areas taught

Distracters provide diagnostic information

Scores are more reliable than those from subjective marking

Easy scoring

Can include options that vary in degree of correctness

Allow item analysis, which reveals items that are too difficult or ambiguous
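The item-analysis and diagnostic-distracter points can be made concrete with a minimal Python sketch. It assumes hypothetical answers from 20 students to the Mongolia item (key = D) and computes the difficulty index (the proportion of students who chose the key) together with how often each distracter was chosen. The function name `analyse_item` is illustrative only.

```python
from collections import Counter


def analyse_item(responses, key, options="ABCD"):
    """Return the difficulty index and the number of times each distracter was chosen."""
    counts = Counter(responses)
    # Difficulty index p: proportion of students who chose the key.
    p = counts[key] / len(responses)
    # Every option except the key is a distracter.
    distracters = {opt: counts[opt] for opt in options if opt != key}
    return p, distracters


# Hypothetical answers from 20 students to the Mongolia item (key = D).
answers = list("DDADCDDADDADDCDDDADD")
p, distracters = analyse_item(answers, key="D")
unused = [opt for opt, n in distracters.items() if n == 0]

print(f"difficulty index p = {p:.2f}")
print("distracter counts:", distracters)
print("distracters nobody chose:", unused)
```

With these made-up answers p = 0.70 and option (B) was never chosen; a distracter that attracts nobody gives no diagnostic information and would be a candidate for replacement.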

Page 19: M&E Slides Tutorial 2

MCQ: Weaknesses/Disadvantages/Limitations

• Time-consuming to write good items
• Difficult to find plausible distracters
• Not suitable for measuring the ability to organise & express ideas
• Scores can be influenced by reading ability
• Unable to detect individual thought processes
• Unable to measure writing and speaking skills (language tests)
• Open to guessing

Page 20: M&E Slides Tutorial 2

TRUE-FALSE QUESTIONS

Strengths

• Suitable for testing recall or comprehension
• Wide coverage of content
• Easy to construct & can be written quickly
• Easy to score
• Scores are more reliable (objective scoring)

Example: Tunku Abdul Rahman was the first Prime Minister of Malaysia.   True / False

Page 21: M&E Slides Tutorial 2

Limitations

• Open to guessing (50% chance)
• Recognising a false statement does not indicate that the respondent knows what is right
• Difficult to write true-false statements for complex material
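Both MCQs and true-false items are listed as open to guessing. The arithmetic can be sketched with the binomial distribution, assuming a student guesses blindly on every item of a hypothetical 20-item test and that "half marks" (10 right) is the threshold of interest; both numbers are assumptions for illustration.

```python
from math import comb


def p_at_least(n_items, n_options, min_correct):
    """Probability of getting at least `min_correct` items right by blind guessing."""
    p = 1 / n_options  # chance of guessing a single item correctly
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


N_ITEMS = 20  # hypothetical test length
for label, n_options in [("true-false", 2), ("4-option MCQ", 4)]:
    expected = N_ITEMS / n_options
    chance_of_half = p_at_least(N_ITEMS, n_options, N_ITEMS // 2)
    print(f"{label}: expected chance score {expected:.0f}/{N_ITEMS}, "
          f"P(at least {N_ITEMS // 2} right) = {chance_of_half:.3f}")
```

Under these assumptions a pure guesser expects 10/20 on the true-false test and 5/20 on the MCQ test, and has roughly a 59% chance of reaching half marks on the true-false test versus about 1.4% on the 4-option MCQ test.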

Page 22: M&E Slides Tutorial 2

Constructing True-False Qs

• Avoid broad general statements
• Avoid trivial statements
• Avoid negative statements, especially double negatives
• Avoid long, complex sentences
• Avoid including more than one idea in one statement
• Avoid statements of opinion
• Avoid true and false statements of unequal length
• Avoid an unequal number of true & false statements

Linn, R. L., & Gronlund, N. E. (2000). Measurement and assessment in teaching. NJ: Prentice Hall.

Page 23: M&E Slides Tutorial 2

Matching Questions

Column A (premises)
1. First US astronaut to walk in space
2. First US astronaut to ride in a space capsule
3. First US astronaut to orbit the earth
4. First US astronaut to step on the moon

Column B (responses)
A. Edwin Aldrin
B. Neil Armstrong
C. Frank Borman
D. Scott Carpenter
E. John Glenn
F. Wally Schirra
G. Alan Shepard
H. Edward White

Key: 1-H, 2-G, 3-E, 4-B

Page 24: M&E Slides Tutorial 2

Advantages
• Good at assessing understanding of relationships, e.g. achievements and the people associated with them
• Possible to measure a large amount of content
• Generally easy to write and score

Disadvantages
• Limited to the measurement of factual information
• Possible to use elimination to pick the right answer

Page 25: M&E Slides Tutorial 2

Constructing Matching Questions
• Provide clear directions
• Include an unequal number of responses & premises, or allow responses to be used more than once
• Keep the information in each column homogeneous
• Put the items with more words on the left (Column A)
• Place all of the items for one matching exercise on one page
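The advice to use unequal numbers of premises and responses, or to allow reuse, can be checked with a small calculation. The sketch below assumes a student guesses at random and compares the chance of matching every premise correctly for the astronaut example (4 premises, 8 responses) with a version that has only 4 responses; it also shows how elimination helps when the counts are equal. The function names are hypothetical.

```python
from math import perm


def p_all_correct(n_premises, n_responses, reuse=False):
    """Chance of matching every premise correctly by random guessing."""
    if reuse:
        # Any response may be given to any premise, independently.
        return 1 / n_responses**n_premises
    # Each response used at most once: ordered choices without replacement.
    return 1 / perm(n_responses, n_premises)


def choices_left_for_last(n_premises, n_responses):
    """Options left for the final premise once the others are answered (no reuse)."""
    return n_responses - (n_premises - 1)


print("4 premises, 4 responses:", p_all_correct(4, 4))
print("4 premises, 8 responses:", p_all_correct(4, 8))
print("4 premises, 8 reusable responses:", p_all_correct(4, 8, reuse=True))
print("last premise, 4 responses:", choices_left_for_last(4, 4), "choice(s) left")
print("last premise, 8 responses:", choices_left_for_last(4, 8), "choice(s) left")
```

Blind guessing drops from 1 in 24 with equal columns to 1 in 1,680 with eight responses (1 in 4,096 if responses may be reused). With equal columns, a student who knows three answers gets the fourth free by elimination; with eight responses, five choices remain.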

Page 26: M&E Slides Tutorial 2

Table of Specifications

A test blueprint that includes the following information:

Topics/Skills/knowledge to be tested

Types & formats of questions

Weighting of each section/question

Time allocation

Page 27: M&E Slides Tutorial 2

(Cells give the item numbers.)

Topics | Recall | Application | Evaluation | Weighting
A. Identity crisis vs. role confusion; achievement motivation | 2, 9 | 4, 21, 33 | 16 | 18%
B. Adolescent sexual behavior; transition of puberty | 5, 8 | 1, 13, 26 | 11 | 18%
C. Social isolation and self-esteem; person perception | 14, 6 | 3, 20 | 25 | 15%
D. Egocentrism; adolescent idealism | 7, 29 | 12, 31 | 10, 15, 27 | 21%
E. Law and maintenance of the social order | 17 | 22 | 18 | 9%
F. Authoritarian bias; moral development | 19 | 30 | 24 | 9%
G. Universal ethical principle orientation | 28 | 23 | 32 | 9%
Weighting | 33% | 40% | 27% |
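The weightings in this table of specifications follow from counting items. As a sketch, the item numbers can be copied into a Python dictionary to check that each of the 33 items appears exactly once and to recompute the topic and level weightings. The recomputed Application share is 13/33 ≈ 39.4%; the slide's 40% is presumably rounded so that the three columns add up to 100%.

```python
# Item numbers for each topic (A-G) and cognitive level, copied from the table.
blueprint = {
    "A": {"Recall": [2, 9], "Application": [4, 21, 33], "Evaluation": [16]},
    "B": {"Recall": [5, 8], "Application": [1, 13, 26], "Evaluation": [11]},
    "C": {"Recall": [14, 6], "Application": [3, 20], "Evaluation": [25]},
    "D": {"Recall": [7, 29], "Application": [12, 31], "Evaluation": [10, 15, 27]},
    "E": {"Recall": [17], "Application": [22], "Evaluation": [18]},
    "F": {"Recall": [19], "Application": [30], "Evaluation": [24]},
    "G": {"Recall": [28], "Application": [23], "Evaluation": [32]},
}
LEVELS = ("Recall", "Application", "Evaluation")

all_items = [i for row in blueprint.values() for lvl in LEVELS for i in row[lvl]]
total = len(all_items)
# Every item number from 1 to total should appear exactly once.
assert sorted(all_items) == list(range(1, total + 1))

topic_pct = {
    topic: 100 * sum(len(row[lvl]) for lvl in LEVELS) / total
    for topic, row in blueprint.items()
}
level_pct = {
    lvl: 100 * sum(len(row[lvl]) for row in blueprint.values()) / total
    for lvl in LEVELS
}

print(f"{total} items")
for topic, pct in topic_pct.items():
    print(f"Topic {topic}: {pct:.1f}%")
for lvl, pct in level_pct.items():
    print(f"{lvl}: {pct:.1f}%")
```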

Page 28: M&E Slides Tutorial 2

CHAPTER 7

RELIABILITY & VALIDITY

Page 29: M&E Slides Tutorial 2

What is a good test?

A good test must be able to measure the TRUE ABILITY of an individual, i.e. it should be able to give the TRUE SCORE of an individual

TRUE SCORE is difficult to obtain because of the presence of errors, which may come from various sources, such as:
• within the test takers
• within the test
• in the administration of the test
• during the scoring/marking of the test

OBSERVED SCORE = TRUE SCORE + ERROR

Page 30: M&E Slides Tutorial 2

To ensure that a test measures the TRUE SCORE, we should reduce the magnitude of error in our test.

[Diagram relating Error, OBSERVED SCORE and TRUE SCORE]

While it is impossible to eliminate error completely, it is possible to reduce it. To reduce error, the test must be reliable and valid.

Page 31: M&E Slides Tutorial 2

RELIABILITY

Reliability refers to the consistency of the measurement

A test is reliable:

(a) when it yields the same score for a student who
• takes the test on different occasions, or
• takes parallel forms of the same test

(b) when a student who answers a given question correctly is more likely to answer other similar or related questions correctly as well

Page 32: M&E Slides Tutorial 2

METHODS FOR ESTIMATING RELIABILITY

• Test-Retest
• Parallel or Equivalent Forms
• Internal Consistency
  - Split-half
  - Cronbach Alpha

Page 33: M&E Slides Tutorial 2

TEST-RETEST/PARALLEL FORMS

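Subject | Score 1 | Score 2
1 | 4 | 8
2 | 8 | 10
3 | 20 | 18
4 | 12 | 12
5 | 14 | 16
6 | 8 | 10
7 | 20 | 16
8 | 4 | 4
9 | 20 | 16
10 | 20 | 16

Pearson Product Moment Correlation

$$ r = \frac{N\sum X_iY_i - \left(\sum X_i\right)\left(\sum Y_i\right)}{\sqrt{\left[N\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[N\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}} $$

where X_i and Y_i are the two scores of subject i and N is the number of subjects.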
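A minimal Python sketch of the raw-score Pearson formula, applied to the ten subjects in the table (Score 1 and Score 2 taken as the two administrations, or the two parallel forms). The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation using the raw-score formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


score1 = [4, 8, 20, 12, 14, 8, 20, 4, 20, 20]
score2 = [8, 10, 18, 12, 16, 10, 16, 4, 16, 16]
r = pearson_r(score1, score2)
print(f"reliability estimate r = {r:.3f}")
```

For these data ΣX = 130, ΣY = 126, ΣXY = 1896, ΣX² = 2100 and ΣY² = 1772, which gives r ≈ 0.94, a high test-retest (or parallel-forms) reliability.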

Page 34: M&E Slides Tutorial 2

Internal Consistency – Split-half

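Subject | ODD | EVEN
1 | 4 | 8
2 | 8 | 10
3 | 20 | 18
4 | 12 | 12
5 | 14 | 16
6 | 8 | 10
7 | 20 | 16
8 | 4 | 4
9 | 20 | 16
10 | 20 | 16

$$ r_{sb} = \frac{2r_{xy}}{1 + r_{xy}} $$

Spearman-Brown corrected reliability coefficient, where r_xy is the correlation between the ODD-item and EVEN-item half-test scores.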
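The split-half procedure can be sketched the same way: correlate each subject's ODD and EVEN half scores, then step the half-test correlation up to full length with the Spearman-Brown formula. The half scores are those in the table; the helper uses the deviation-score form of Pearson's r, which gives the same value as the raw-score formula. Function names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r in deviation-score form."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


odd_half = [4, 8, 20, 12, 14, 8, 20, 4, 20, 20]
even_half = [8, 10, 18, 12, 16, 10, 16, 4, 16, 16]

r_half = pearson_r(odd_half, even_half)
r_full = spearman_brown(r_half)
print(f"half-test r = {r_half:.3f}, Spearman-Brown r_sb = {r_full:.3f}")
```

Here the half-test correlation is about 0.938 and the Spearman-Brown estimate for the full test is about 0.968. The correction raises the value because each half is only half as long as the real test.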

Page 35: M&E Slides Tutorial 2

Internal Consistency – Cronbach Alpha

Suitable for checking the reliability of a measurement instrument with:
• binary-type items, e.g. I’m afraid of school tests. (T / F)
• scale items, e.g. I’m afraid of school tests. (SA / A / N / D / SD)
• MCQs

Reliability is based on the correlations among the individual items and the extent to which individual items correlate with the total test (refer to p. 156 for the formula).
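The slide points to p. 156 for the formula. As a sketch, assuming it is the usual form α = k/(k − 1) × (1 − Σσᵢ² / σ²_total), where k is the number of items, σᵢ² the variance of item i and σ²_total the variance of respondents' total scores, alpha can be computed as below. The responses (3 scale items scored SA = 5 to SD = 1, from 4 respondents) are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per respondent, one column per item."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose to per-item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scale responses (SA = 5 ... SD = 1): 4 respondents x 3 items.
responses = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

For these responses alpha ≈ 0.96. Population variances are used throughout; sample variances would give the same alpha, because the divisor cancels in the ratio.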

Page 36: M&E Slides Tutorial 2

Value of Reliability Coefficient (rxy)

$$ r_{xy} = \frac{\text{Variance of the True Score}}{\text{Variance of the Observed Score}} $$

The coefficient ranges from 0.00 (no reliability) to 1.00 (perfect reliability).
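The variance ratio can be illustrated with a small simulation, assuming the classical model OBSERVED SCORE = TRUE SCORE + ERROR with independent, normally distributed errors. The true-score SD of 10 and error SD of 5 are arbitrary choices, so the theoretical reliability is 10² / (10² + 5²) = 0.80. The sketch also correlates two simulated parallel forms, which is how the ratio is estimated in practice, since true scores are never observed.

```python
import random
from math import sqrt

random.seed(1)                 # fixed seed so runs are repeatable
N = 20_000                     # hypothetical number of test takers
TRUE_SD, ERROR_SD = 10.0, 5.0  # assumed spreads of true scores and errors


def cov(x, y):
    """Population covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
# Two parallel forms: same true score, independent errors.
form_a = [t + random.gauss(0, ERROR_SD) for t in true_scores]
form_b = [t + random.gauss(0, ERROR_SD) for t in true_scores]

variance_ratio = cov(true_scores, true_scores) / cov(form_a, form_a)
parallel_r = cov(form_a, form_b) / sqrt(cov(form_a, form_a) * cov(form_b, form_b))

print(f"var(true) / var(observed) = {variance_ratio:.3f}")
print(f"correlation of parallel forms = {parallel_r:.3f}")
```

Both numbers should land close to 0.80. Increasing ERROR_SD lowers both, which is the sense in which more error means less reliability.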

Page 37: M&E Slides Tutorial 2

Rule of Thumb – Reliability for a classroom test

Reliability | Interpretation
.90 & above | Excellent reliability
.80 - .90 | Very good
.70 - .80 | Good for a classroom test but a few items could be improved
.60 - .70 | Somewhat low. Some items could be removed or improved
.50 - .60 | Test needs to be revised
.50 & below | Questionable reliability. Test needs to be replaced/needs major revision
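As a sketch, the rule of thumb can be written as a lookup function. The table's bands share their endpoints (for example, .80 appears in two rows), so the function assumes that a value on a boundary belongs to the higher band; that choice is an assumption, not part of the table. The name `interpret_reliability` is hypothetical.

```python
def interpret_reliability(r):
    """Return the rule-of-thumb interpretation of a classroom test's reliability."""
    # (lower bound, label) pairs from the table, highest band first.
    # Assumption: a value on a shared boundary (e.g. .80) goes to the higher band.
    bands = [
        (0.90, "Excellent reliability"),
        (0.80, "Very good"),
        (0.70, "Good for a classroom test but a few items could be improved"),
        (0.60, "Somewhat low. Some items could be removed or improved"),
        (0.50, "Test needs to be revised"),
    ]
    for lower, label in bands:
        if r >= lower:
            return label
    return "Questionable reliability. Test needs to be replaced/needs major revision"


for r in (0.94, 0.85, 0.65, 0.45):
    print(f"r = {r:.2f}: {interpret_reliability(r)}")
```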

Page 38: M&E Slides Tutorial 2

Use of Test Reliability to determine the true score (p. 152)

Standard Error of Measurement (Sm): the standard deviation of the error scores of a test, i.e. the extent to which the error scores deviate from the mean error score.

You can determine Sm if you know the SD & r of a test.

$$ S_m = SD\sqrt{1 - r} $$

where r = test reliability.

You can estimate a student’s TRUE SCORE with some degree of certainty based on the observed score & Sm
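A minimal sketch of the Sm formula and of how it is used to estimate a range for the true score. It assumes a hypothetical test with SD = 10 and reliability r = 0.91, a student with an observed score of 65, and normally distributed errors, under which the true score lies within ±1 Sm of the observed score about 68% of the time and within ±2 Sm about 95% of the time. Function names are illustrative.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Sm = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)


def true_score_band(observed, sm, n_sm):
    """Observed score plus or minus n_sm standard errors of measurement."""
    return observed - n_sm * sm, observed + n_sm * sm


SD, R = 10, 0.91   # hypothetical test statistics
OBSERVED = 65      # hypothetical student's observed score
sm = standard_error_of_measurement(SD, R)
print(f"Sm = {sm:.1f}")
for n_sm, coverage in [(1, "about 68%"), (2, "about 95%")]:
    low, high = true_score_band(OBSERVED, sm, n_sm)
    print(f"{coverage} band: {low:.1f} to {high:.1f}")
```

Here Sm = 3, so the student's true score is estimated to lie between 62 and 68 (about 68% confidence) or between 59 and 71 (about 95% confidence). A more reliable test gives a smaller Sm and a narrower band.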

Page 39: M&E Slides Tutorial 2

INTER-RATER RELIABILITY

- Indicates whether two examiners are consistent in their scoring/marking

INTRA-RATER RELIABILITY

- Indicates whether an examiner is consistent in his scoring when marking at different times
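Percentage agreement is one simple way to put a number on either kind of consistency; correlating the two sets of marks is another. The sketch assumes hypothetical marks out of 10 given to the same eight scripts by two examiners; for intra-rater reliability, the two lists would instead be one examiner's marks for the same scripts on two occasions. The `tolerance` parameter is an illustrative choice.

```python
def agreement(marks_1, marks_2, tolerance=0):
    """Proportion of scripts whose two marks differ by at most `tolerance`."""
    if len(marks_1) != len(marks_2):
        raise ValueError("both lists must cover the same scripts")
    agree = sum(abs(a - b) <= tolerance for a, b in zip(marks_1, marks_2))
    return agree / len(marks_1)


# Hypothetical marks (out of 10) for the same eight scripts.
examiner_1 = [7, 5, 9, 6, 8, 4, 7, 6]
examiner_2 = [7, 6, 9, 5, 8, 4, 6, 6]

exact = agreement(examiner_1, examiner_2)
within_one = agreement(examiner_1, examiner_2, tolerance=1)
print(f"exact agreement: {exact:.3f}")
print(f"agreement within 1 mark: {within_one:.3f}")
```

The two examiners give identical marks on 5 of the 8 scripts (0.625) and are never more than one mark apart, which suggests reasonable but imperfect inter-rater consistency.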

Page 40: M&E Slides Tutorial 2

Validity

Validity refers to the extent to which a test measures what it is supposed to measure.

Types of validity

• Construct validity
• Content validity
• Criterion-related validity
  - Predictive validity
  - Concurrent validity

Page 41: M&E Slides Tutorial 2

Construct Validity

• How far does the test measure the attributes of a construct?

Content Validity

• How far does the test cover the content (syllabus) that has been taught?

Page 42: M&E Slides Tutorial 2

Criterion-related Validity

How far is the test related to some other criterion measure?

Examples:

• How far is the students’ SPM performance related to their performance in STPM? (Predictive validity)
• How far is the students’ year-end English performance related to their SPM English performance? (Concurrent validity)

Page 43: M&E Slides Tutorial 2

Factors Affecting Reliability & Validity

• Construction of test items
• Length of test
• Selection of topics
• Choice of testing techniques
• Method of administration
• Method of marking

Page 44: M&E Slides Tutorial 2

Task

Can you explain how each of the following testing situations could have happened?

(1) The test is valid but not reliable

(2) The test is not reliable and not valid.

(3) The test is reliable and valid.

(4) The test is reliable but not valid.