Contents
• Introduction: Definition of Validity
• Types of validity
  - Non-empirical: Face Validity, Content Validity
  - Empirical: Construct Validity, Criterion-related Validity
• Practicality

Introduction
• A writing test asks test takers to write on the following topic: “Is Photography an Art or a Science?”
• A valid writing test? Why or why not?
• You should be clear about what exactly you want to test (i.e., no other irrelevant abilities or knowledge).
• Validity concerns what a test measures and how well it measures what it is intended to measure.

Definition of Validity
• “the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment” (cited in Brown 22)
• A valid test = a test that measures what it is intended to measure, and nothing else (i.e., no external knowledge or other skills measured at the same time).
• E.g., a listening test measures listening skill and nothing else. It shouldn’t favor any students.

Non-empirical Validity
• Involving inspection, intuition, and common sense
• Consequential validity
• Face validity
• Content validity

Consequential Validity
• Encompasses all the consequences of a test (Brown 26):
  - Its accuracy in measuring intended criteria
  - Its impact on the preparation of test-takers
  - Its effect on the learner
  - The social consequences of a test’s interpretation and use
  - The effect on Ss’ motivation, subsequent performance in a course, independent learning, study habits, and attitude toward school work

Face Validity (1)
• You know if the test is valid or not by ‘looking’ at it.
• It “looks right” to other testers, teachers, testees, the general public, etc.
• It “appears” to measure the knowledge or abilities it claims to measure.

Face Validity (2)
• Face validity asks the question: “does the test, on the ‘face’ of it, appear from the learner’s perspective to test what it is designed to test?” (Brown 27)
• Face validity cannot be empirically tested.
• It is essential to all kinds of tests, but it is not enough on its own.

Content Validity (1)
• “A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned.” (Hughes 1989)
• Also called rational or logical validity.
• Esp. important for achievement, progress, & diagnostic tests
• A valid test: contains appropriate and representative content.

Content Validity (2)
• A test with content validity contains a representative sample of the course (objectives), and quantifies and balances the test components (given a percentage weighting)
• Check against:
  - Test specifications (test plan)
  - Notes, textbooks
  - Course syllabus/objectives
  - Another teacher or subject-matter experts
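
The check against a test specification can be sketched in code. The Python below is only a minimal illustration: it assumes a hypothetical specification that gives each objective a percentage weighting and a hypothetical draft test whose items are tagged by objective. The objective names, the weights, and the 5-point tolerance are invented for illustration and do not come from Brown or Hughes.

```python
from collections import Counter

# Hypothetical test specification: target percentage weighting per objective.
SPEC_WEIGHTS = {"articles": 40, "tenses": 40, "prepositions": 20}

# Hypothetical draft test: the objective each item is meant to assess.
ITEM_OBJECTIVES = ["articles"] * 4 + ["tenses"] * 5 + ["prepositions"] * 1


def coverage_report(spec, items, tolerance=5.0):
    """Compare the share of items per objective with the specified weighting.

    Returns {objective: (target %, actual %, within tolerance?)}.
    The tolerance (in percentage points) is an arbitrary illustrative choice.
    """
    counts = Counter(items)
    total = len(items)
    report = {}
    for objective, target in spec.items():
        actual = 100.0 * counts.get(objective, 0) / total
        report[objective] = (target, actual, abs(actual - target) <= tolerance)
    return report


def unspecified_objectives(spec, items):
    """Objectives tested by some item but absent from the specification."""
    return sorted(set(items) - set(spec))


if __name__ == "__main__":
    report = coverage_report(SPEC_WEIGHTS, ITEM_OBJECTIVES)
    for objective, (target, actual, ok) in report.items():
        status = "ok" if ok else "check"
        print(f"{objective}: target {target}%, actual {actual:.0f}% -> {status}")
```

A mismatch flagged here is only a prompt to look again at the draft; judging whether the content is appropriate still relies on inspection by a teacher or subject-matter expert.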

Content Validity (3)
• An example of a (fill-up) quiz on the use of articles: (see Brown 23)
  - Does it have content validity if used as a listening/speaking test?
• Classroom tests should always have content validity.
• Rule of thumb for achieving content validity: always use direct tests

Criterion-related Validity (1)
• The extent to which the “criterion” of the test has actually been reached.
• “how far results on the test agree with those provided by some independent and highly dependable assessment of the candidate’s ability.”

Criterion-related Validity (2)
• Two kinds of criterion-related validity
• Concurrent validity:
  - How closely the test result parallels test takers’ performance on another valid test, or criterion, which is thought to measure the same or similar activities
  - test & criterion administered at about the same time
  - possible criteria = an established test or some other measure within the same domain (e.g., course grades, T’s ratings)

Criterion-related Validity (3)
• E.g., situation: conv. class, objectives = a large # of functions. Testing all of them would take 45 min. for each S.
• Q: Is a shorter, 10-min. test a valid measure?
• Method: a random sample of Ss takes the full 45-min. test (= criterion test); compare their scores on the short version with those on the criterion test → if there is a high level of agreement → short version = valid test
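
The comparison described above can be sketched as a correlation between the two sets of scores. The Python below is a minimal illustration: the scores are invented, and using the Pearson correlation as the measure of agreement is an assumption, since the slide does not name a specific statistic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


# Invented scores for a random sample of six students (same order in both lists).
short_test = [12, 15, 9, 18, 14, 11]        # 10-min. version
criterion_test = [58, 70, 45, 88, 66, 52]   # full 45-min. version

if __name__ == "__main__":
    r = pearson_r(short_test, criterion_test)
    print(f"validity coefficient r = {r:.2f}")
```

With real data the sample would need to be much larger than six students, and what counts as a "high level of agreement" is a judgment that depends on how the test will be used.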

Criterion-related Validity (4)
• Validity coefficient: a mathematical measure of similarity
• Perfect agreement → validity coefficient = 1
• E.g., a coefficient = 0.7; (0.7)² = 0.49 → 49%, which means almost 50% agreement
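
As a worked illustration, assume the validity coefficient is computed as a Pearson correlation $r$ between test scores $x_i$ and criterion scores $y_i$ for $n$ students (the slide does not name the statistic, so this is an assumption). The coefficient and the squaring step in the example can then be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
r = 0.7 \;\Rightarrow\; r^2 = (0.7)^2 = 0.49 \approx 49\%
```

Under that assumption, $r^2$ is the proportion of variance the two sets of scores share, so a coefficient of 0.7 corresponds to just under half.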

Criterion-related Validity (5)
• Predictive validity:
  - How well the test result predicts future performance/success
  - Correlation computed at a later time, once the criterion is available
  - Important for the validation of aptitude tests, placement tests, admissions tests.
• Criterion:
  - Outcome of the course (pass/fail), T’s ratings later

Construct Validity (1)
• Construct:
  - any underlying ability (trait) which is hypothesized in a theory of language ability
  - Any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perceptions (Brown 25)

Construct Validity (2)
• Originated for psychological tests
• Refers to the extent to which the test may be said to measure a theoretical construct or trait which is normally unobservable and abstract at different levels (e.g., personality, self-esteem; proficiency, communicative competence)
• It examines whether the test is a true reflection of the theory of the trait being measured.

Construct Validity (3)
• A test has construct validity if it can be demonstrated that it measures just the ability which it is supposed to measure.
• Two examples:
  1. reading ability: involves a # of sub-abilities, e.g., skimming, scanning, guessing meaning of unknown words, etc.

Construct Validity (4)
• Need empirical research to establish whether such a distinct ability exists and can be measured
• Need for construct validity: we have to demonstrate we’re indeed measuring just that ability in a particular test.

Construct Validity (5)
  2. when measuring an ability indirectly:
• E.g., writing ability
• Need to look to a theory of writing ability for guidance as to the form (i.e., content, techniques) an indirect test should take
• Theory of writing tells us that underlying writing abilities = a # of sub-abilities, e.g., punctuation, organization, word choice, grammar . . .
• Based on the theory, we construct multiple-choice tests to measure these sub-abilities

Construct Validity (6)
• But, how do we know this test really is measuring writing ability?
• Validation methods:
  - Compare scores on the pilot test with scores on a writing test (direct test) → if high level of agreement → yes
  - Administer a # of tests; each measures a construct. Score the composition (direct test) separately for each construct. Then compare scores.
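
The second method can be sketched as follows. This Python example is only an illustration under stated assumptions: the sub-ability names come from the slides, but the scores are invented, and Pearson correlation is assumed as the way to "compare scores".

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented scores for five students, listed in the same order everywhere.
# indirect: multiple-choice sub-test for each construct
# direct:   the composition, scored separately for each construct
indirect = {
    "punctuation": [7, 5, 9, 4, 8],
    "organization": [6, 8, 5, 9, 4],
}
direct = {
    "punctuation": [3, 2, 5, 2, 4],
    "organization": [3, 4, 2, 5, 2],
}


def compare_by_construct(indirect_scores, direct_scores):
    """Correlate indirect and direct scores construct by construct."""
    return {
        construct: pearson_r(indirect_scores[construct], direct_scores[construct])
        for construct in indirect_scores
    }


if __name__ == "__main__":
    for construct, r in compare_by_construct(indirect, direct).items():
        print(f"{construct}: r = {r:.2f}")
```

High agreement for each construct would support the claim that the multiple-choice sub-tests measure the same sub-abilities as the composition; low agreement for one construct would single out that sub-test for revision.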

Construct Validity (7)
• To examine whether the test is a true reflection of the theory of the trait being measured.
• In lang. testing, construct = any underlying ability/trait which is hypothesized in a theory of language ability.
• Necessary in a case of indirect testing.
• Can be assessed by comparing the scores of a group of students on two tests.

Practicality
• Practical considerations when planning tests or ways of measurement, including cost, time/effort required
• Economy (cost; time: administration & scoring)
• Ease of:
  - scoring and score interpretation
  - administration
  - test compilation
• A test should be practical to use, but also valid and reliable