1 Validity – Outline 1.Definition 2.Validity: Two Different Views 3.Types of Validity A.Face B.Content C.Criterion i.Predictive vs. Concurrent ii.Validity Coefficients D.Construct i.Convergent ii.Discriminant

Upload: egbert-morton

Post on 17-Jan-2016

Validity – Outline

1. Definition

2. Validity: Two Different Views

3. Types of Validity

A. Face

B. Content

C. Criterion

i. Predictive vs. Concurrent

ii. Validity Coefficients

D. Construct

i. Convergent

ii. Discriminant

Validity – Definition

• Validity measures agreement between a test score and the characteristic it is believed to measure

• The basic question is: are you measuring what you think you’re measuring?

Validity: two very different views

• Traditional:

Validity is a property of tests

Does the test measure what you think it measures?

Validity: two very different views

• Traditional

• Recent (e.g., Messick, 1989; Committee on Standards for Educational and Psychological Testing (CSEPT)):

Validity is a property of test score interpretations

Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences

Note the difference:

• Does the test measure what you think it measures?

• Validity exists when actions based on the interpretation are justified given a theoretical basis and social consequences

A problem with the CSEPT view

• Who is to say the ‘social consequences’ of test use are good or bad?

• According to CSEPT, validity is a subjective judgment

• In my view, this makes the concept useless: “if you like the result the test gives you, you will consider it valid. If you don’t, you won’t.”

• That’s not how scientists think.

Borsboom et al. (2004)

• Borsboom et al. reject CSEPT’s view

• “Validity… is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure.” (p. 1061)

Borsboom et al. (2004)

• “a test is valid for measuring an attribute if and only if (a) the attribute exists and (b) variations in the attribute causally produce variations in the outcomes of the measurement procedure.”

• Variations in what you are measuring cause variations in your measurements.

• E.g., variations across people in intelligence cause variations in their IQ scores

• This is not a correlational model of validity

Borsboom et al. (2004)

• You don’t create a test and then do the analysis necessary to establish its validity

• Rather, you begin by doing the theoretical work necessary to create a valid test in the first place.

• On this view, validity is not a big issue.

Borsboom et al. vs. CSEPT

• Who is right?

• Each scientist has to make up his or her own mind on that question

• I find Borsboom et al.’s arguments compelling.

• Other psychologists may disagree

The CSEPT view

• CSEPT recognizes 3 types of evidence for test validity:

Content-related

Criterion-related

Construct-related

Boundaries not clearly defined

• Cronbach (1980): Construct is basic, while Content & Criterion are subtypes.

Parenthetical Point – Face Validity

• Face validity refers to the appearance that a test measures what it is intended to measure.

• Face validity has P.R. value – test-takers may have better motivation if the test appears to be a sensible way to measure what it measures.

CSEPT: Content validity

• Content-related evidence considers coverage of the conceptual domain tested.

• Important in educational settings

• Like face validity, it is determined by logic rather than statistics

• Typically assessed by expert judges

CSEPT: Content validity

• Content-related evidence considers coverage of the conceptual domain tested.

Construct-irrelevant variance

Construct under-representation

• Is each item relevant to domain?

• Is domain adequately covered or are parts of it left out?

• But if you are going to ask these questions, why not do it when creating the test?

Borsboom et al.: Content validity

• Borsboom et al. would say that content validity is not something to be established after the test has been created.

• Rather, you build it into your test by having a good theory of what you are testing

• E.g., for a test in this course to have content validity, it should test your understanding of content validity!

CSEPT: Criterion validity

• Criterion-related evidence tells us how well a test score corresponds to a particular criterion measure.

• A criterion is a standard against which a test is compared.

• The test score should tell us something about the criterion score.

CSEPT: Criterion validity

• A criterion is a standard against which a test is compared.

• E.g., we could compare GPAs to SAT scores to produce evidence of validity of conclusions drawn on basis of SAT scores

• Two basic types: Predictive and Concurrent

CSEPT: Criterion validity

• Predictive validity

• Test scores used to predict future performance – how good is the prediction?

• E.g., SAT is used to predict final undergraduate GPA

• SAT scores and GPA are moderately correlated

CSEPT: Criterion validity

• Predictive validity

• Concurrent validity

• Correlation between test scores and criterion when the two are measured at same time.

• Test illuminates current performance rather than predicting future performance (e.g., why does patient have a temperature? Why can’t student do math?)

Borsboom et al.: Criterion validity

• “Criterion validity” involves a correlation of test scores with some criterion such as GPA

• That does not establish the test’s validity, only its utility.

• E.g., height and weight are correlated, but a test of height is not a test of what bathroom scales measure.

Borsboom et al.: Criterion validity

• SAT is valid because it was developed on the sensible theory that “past academic achievement” is a good guide to “future academic achievement”

• Validity is built into the test, not established after the test has been created

Borsboom et al.: Criterion validity

• Validation research aims at showing how variation in the attribute causes variation in the test score

• This requires a “theory of the task”: how does the test-taker do the mental operations needed to respond to test items?

CSEPT: Criterion validity

• Note: no point in developing a test if you already have a criterion – unless impracticality or expense makes use of the criterion difficult.

• Criterion measure only available in the future?

• Criterion too expensive to use?

CSEPT: Criterion validity

• Validity Coefficient

• Compute correlation (r) between test score and criterion.

• r = .30 or .40 would be considered normal.

• r > .60 is rare

Note: r varies between -1.0 and +1.0
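As a minimal sketch of the computation described on this slide, the Python below computes a validity coefficient as the Pearson correlation between test scores and a criterion. The helper name `pearson_r` and the eight score/GPA pairs are invented for illustration; they are not SAT data or results from any study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: admission test scores and later GPA for eight students.
test_scores = [480, 520, 560, 600, 610, 650, 700, 720]
gpa = [2.9, 2.4, 3.1, 2.7, 3.4, 2.8, 3.3, 3.6]

# The validity coefficient is the correlation between test and criterion.
validity_coefficient = pearson_r(test_scores, gpa)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

With real data the criterion would be measured at the same time as the test (concurrent) or later (predictive); the arithmetic is the same in both cases.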

CSEPT: Criterion validity

• Validity Coefficient

• r² gives proportion of variance in criterion explained by test score.

• E.g., if r_xy = .30, r² = .09, so 9% of variability in Y “can be explained by variation in X”
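One way to see why r² is read as "variance explained" is to fit a least-squares line predicting the criterion from the test and compare the leftover (residual) variation with the total variation. The sketch below does that with six invented data points; the variable names are assumptions made for this illustration only.

```python
import math

def mean(values):
    return sum(values) / len(values)

# Hypothetical test scores (x) and criterion scores (y) for six people.
x = [10, 12, 15, 18, 20, 25]
y = [3.0, 2.0, 4.0, 3.5, 5.0, 4.5]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

# Validity coefficient: Pearson correlation between test and criterion.
r = sxy / math.sqrt(sxx * syy)

# Least-squares line predicting the criterion from the test score.
slope = sxy / sxx
intercept = my - slope * mx
residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Proportion of criterion variation accounted for by the prediction.
explained = 1 - residual_ss / syy

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, explained = {explained:.3f}")
```

For a straight-line prediction the last two printed values agree, which is the sense in which a coefficient of .30 corresponds to 9% of the variability in Y.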

CSEPT: Criterion validity

• Interpreting Validity Coefficients – watch out for:

1. Changes in causal relationships

2. What does criterion mean? Is it valid, reliable?

3. Is subject population for validity study appropriate?

4. Sample size

CSEPT: Criterion validity

• Interpreting Validity Coefficients – watch out for:

5. Criterion/predictor confusion

6. Range restrictions

7. Do validity study results generalize?

8. Differential predictions
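Range restriction (item 6) can be illustrated with a small simulation. In the sketch below, the applicant pool, the 0.5 slope, the noise level, and the admission cutoff are all invented assumptions; the point is only that computing r on the admitted group alone tends to give a smaller coefficient than computing it on everyone.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical applicant pool: a test score and a criterion that depends on it.
n = 5000
test = [rng.gauss(0, 1) for _ in range(n)]
criterion = [0.5 * t + rng.gauss(0, math.sqrt(0.75)) for t in test]

r_full = pearson_r(test, criterion)

# Range restriction: the criterion is observed only for people admitted
# because they scored above a cutoff on the test.
cutoff = 0.5
admitted = [(t, c) for t, c in zip(test, criterion) if t > cutoff]
r_restricted = pearson_r([t for t, _ in admitted], [c for _, c in admitted])

print(f"all applicants: r = {r_full:.2f}")
print(f"admitted only:  r = {r_restricted:.2f}")
```

This is why a validity study run only on selected students can understate how well the test would predict across the whole applicant population.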

CSEPT: Construct validity

• Problem: for many psychological characteristics of interest there is no agreed-upon “universe” of content and no clear criterion

• We cannot assess content or criterion validity for such characteristics

• These characteristics involve constructs: something built by mental synthesis.

CSEPT: Construct validity

• Examples of constructs:

Intelligence

Love

Curiosity

Mental health

• CSEPT: We obtain evidence of validity by simultaneously defining the construct and developing instruments to measure it.

• This is ‘bootstrapping.’

Bootstrapping construct validity

• assemble evidence about what a test “means” – in other words, about the characteristic it is testing.

• CSEPT: this process is never finished

• Borsboom: this is part of the process of creating a test in the first place, not something done after the fact

Bootstrapping construct validity

• assemble evidence

• show relationships between a test and other tests

• none of the other tests is a criterion

• Borsboom: these relationships do not tell us what a test score means (e.g., age is correlated with annual income but a measure of age is not a measure of annual income).

Bootstrapping construct validity

• assemble evidence

• show relationships

• each new relationship adds meaning to the test

• test’s meaning is gradually clarified over time

• Borsboom would say, why all the mystery? The meaning of many tests (e.g., WAIS, academic exams, Piaget’s tests) is clear right from the start

CSEPT: Construct validity

• Example from text: Rubin’s work on Love.

• Rubin collected a set of items for a Love scale

• He read poetry, novels; asked people for definitions

• created a scale of Love and one of Liking

CSEPT: Construct validity

• Rubin gave scale to many subjects & factor-analyzed results

• Love integrates Attachment, Caring, & Intimacy

• Liking integrates Adjustment, Maturity, Good Judgment, and Intelligence

• The two are independent: you can love someone you don’t like (as song-writers know)

Campbell & Fiske (1959)

• Two types of Construct-related Evidence

• Convergent evidence

• When a test correlates well with other tests believed to measure the same construct

Campbell & Fiske (1959)

• Two types of Construct-related Evidence

• Convergent evidence

• Discriminant evidence

• When a test does not correlate with other tests believed to measure some other construct.
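The two kinds of evidence can be sketched with simulated scores. Everything below is a made-up assumption for illustration: two independent constructs, a new test and an established test that both depend on the first construct, and a third test that depends on the second. Convergent evidence is the high correlation between the first pair; discriminant evidence is the near-zero correlation with the third test.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 2000

# Two independent hypothetical constructs.
construct_a = [rng.gauss(0, 1) for _ in range(n)]
construct_b = [rng.gauss(0, 1) for _ in range(n)]

# A new test and an established test, both driven by construct A plus noise.
new_test = [a + rng.gauss(0, 0.5) for a in construct_a]
established_same = [a + rng.gauss(0, 0.5) for a in construct_a]

# An established test driven by the unrelated construct B.
established_other = [b + rng.gauss(0, 0.5) for b in construct_b]

convergent_r = pearson_r(new_test, established_same)
discriminant_r = pearson_r(new_test, established_other)

print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

Note that the simulation builds in the causal structure it then detects; with real tests the correlations alone do not show which construct produced them, which is the objection Borsboom et al. raise.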

Convergent validity

• Example – Health Index

• Scores correlated with age, number of symptoms, chronic medical conditions, physiological measures

• Treatments designed to improve health should increase Health Index scores. They do.

Discriminant validity

• low correlations between new test and tests believed to tap unrelated constructs.

• evidence that the new test measures something unique

CSEPT: Validity & Reliability

• CSEPT: No point in trying to establish validity of an unreliable test.

• It’s possible to have a reliable test that has no meaning (is not valid).

• Logically impossible to produce evidence of validity for an unreliable test.
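The link between reliability and validity evidence can be sketched under the classical test theory assumption that an observed score is a true score plus random error (an assumption added here, not stated on the slide). Under that model a test's correlation with any criterion cannot exceed the square root of its reliability. The simulation below uses invented error levels to compare a reliable and an unreliable version of the same test.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(2)  # fixed seed so the sketch is repeatable
n = 4000

# Hypothetical true scores and a criterion that partly depends on them.
true_score = [rng.gauss(0, 1) for _ in range(n)]
criterion = [t + rng.gauss(0, 1) for t in true_score]

def administer(error_sd):
    """One administration of the test: true score plus random measurement error."""
    return [t + rng.gauss(0, error_sd) for t in true_score]

results = {}
for label, error_sd in (("reliable", 0.3), ("unreliable", 3.0)):
    first = administer(error_sd)
    second = administer(error_sd)
    reliability = pearson_r(first, second)  # test-retest style estimate
    validity = pearson_r(first, criterion)  # validity coefficient
    results[label] = (reliability, validity)
    print(f"{label}: reliability = {reliability:.2f}, validity = {validity:.2f}")
```

The unreliable version is mostly measurement error, so its scores carry little information about anything, including the criterion.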

Borsboom: Validity & Reliability

• Borsboom et al.: what does it mean to say that a test is reliable but not valid?

• What is it a test of?

• It isn’t a test at all, just a collection of items

Borsboom: Validity & Reliability

• Borsboom et al.: validity is a necessary condition for reliability

• Reliability of a test of X estimates precision of measurement of X – but how could you estimate the precision of measurement of X for a test that does not measure X?

• Thus, validity is presumed when you assess reliability

Blanton & Jaccard – arbitrary metrics

• We observe a behavior in order to learn about the underlying psychological characteristic

• A person’s test score represents their standing on that underlying dimension

• Such scores form an arbitrary metric

• That is, we do not know how the observed scores are related to the true scores on the underlying dimension

[Figure: Test 1 and Test 2 score scales (0 to 6) shown against the underlying dimension, with Person A, Person B, and the Neutral point marked. Adapted from Blanton & Jaccard (2006), Figure 1, p. 29.]

Arbitrary metrics – the IAT

• Implicit Association Test (IAT) – claimed to diagnose implicit attitudinal preferences – or racist attitudes

• IAT authors say you may have prejudices you don’t know you have.

• Are these claims true?

Arbitrary metrics – the IAT

• Task: categorize stimuli using two pairs of categories

• Two buttons to press, two assignments of categories to buttons, used in sequence

Arbitrary metrics – the IAT

• Assignment pattern A

• Button 1 – press if stimulus refers to the category White or the category Pleasant

• Button 2 – press if stimulus refers to the category Black or the category Unpleasant

• Assignment pattern B

• Button 1 – press if stimulus refers to the category White or the category Unpleasant

• Button 2 – press if stimulus refers to the category Black or the category Pleasant

Arbitrary metrics – the IAT

• IAT authors claim that if responses are faster to Pattern A than to Pattern B, that indicates a “preference” for Whites over Blacks – in other words, a racist attitude

• IAT authors also give test-takers feedback about how strong their preferences are, based on how much faster their responses are to Pattern A than to Pattern B

• This is inappropriate

Arbitrary metrics – the IAT

• Blanton & Jaccard:

• The IAT does not tell us about racist attitudes

• IAT authors take a dimension which is non-arbitrary when used by physicists – time – and use it in an arbitrary way in psychology

Arbitrary metrics – the IAT

• The function relating the response dimension (time) to the underlying dimension (attitudes) is unknown

• Zero on the (Pattern A – Pattern B) difference may not be zero on the underlying attitude preference dimension

• There are alternative models of how that (Pattern A – Pattern B) difference could arise
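The zero-point problem can be made concrete with a toy model. The slide's point is that the real function linking response time to attitude is unknown, so the linear form, the slope, and the offset below are purely hypothetical choices made for illustration. If any offset exists, a (Pattern A – Pattern B) difference of zero does not correspond to a neutral attitude, and a neutral person does not score zero.

```python
# Hypothetical link between the underlying preference (0 = neutral) and the
# observed (Pattern A - Pattern B) response-time difference in milliseconds.
# Both constants are invented; the true function is not known.
SLOPE_MS_PER_UNIT = -200.0  # assumed: stronger preference -> faster on Pattern A
OFFSET_MS = -80.0           # assumed: difference caused by factors other than attitude

def observed_difference(preference):
    """Observed A - B difference (ms) for a given underlying preference."""
    return OFFSET_MS + SLOPE_MS_PER_UNIT * preference

def implied_preference(difference_ms):
    """Underlying preference that would produce a given observed difference."""
    return (difference_ms - OFFSET_MS) / SLOPE_MS_PER_UNIT

# A person who is truly neutral still shows a non-zero difference...
neutral_person_difference = observed_difference(0.0)

# ...and a person with an observed difference of zero is not neutral.
preference_at_zero_difference = implied_preference(0.0)

print(f"neutral person, observed difference: {neutral_person_difference} ms")
print(f"zero observed difference, underlying preference: {preference_at_zero_difference}")
```

Because neither constant can be read off the response times themselves, feedback that treats the observed zero as the neutral point rests on an untested assumption.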

Review

CSEPT:

1. Validity is a characteristic of evidence, not of tests.

2. Valid evidence supports conclusions drawn using test results

3. Validity is determined by social consequences of test use

Borsboom et al.

1. Validity is not a methodological issue, but a substantive (theoretical) issue

2. A test of an attribute is valid if (a) the attribute exists, and (b) variation in the attribute causes variation in test scores

Review

CSEPT:

4. Validity can be established in three ways, though boundaries between them are fuzzy:

A. Content-related evidence

B. Criterion-related evidence

C. Construct-related evidence

Borsboom et al:

3. It’s all the same validity: a test is valid if it measures what you think it measures

4. Validity is not mysterious

Review

CSEPT

5. Content-related evidence: do test items represent whole domain of interest?

6. Criterion-related evidence: do test scores relate to a criterion either now (concurrent) or in the future (predictive)?

Borsboom et al.

5. These questions are properly part of the process of creating a test

Review

CSEPT

7. Construct-related evidence is obtained when we develop a psychological construct and the way to measure it at the same time.

8. A test can be reliable but not valid. A test cannot be valid if not reliable.

Borsboom et al.

6. A test must be valid for a reliability estimate to have any meaning

Review

• Blanton & Jaccard (2006) warn against over-interpretation of scores which are based on an arbitrary metric

• For an arbitrary metric, we have no idea how the test scores are actually related to the underlying dimension