
1

Evaluating Psychological Tests

2

Psychological testing

• Suffers a credibility problem in the eyes of the general public
• Two main problems
  – Tests are used inappropriately
    • Goddard (1912) used a translation of Binet’s test to assess the ability of American immigrants and concluded that 79% of Italian immigrants were ‘feeble-minded’: an example of bias
  – Tests themselves can be flawed
    • They often measure supposed constructs that are not supported by proper factor analysis (internal locus of control)

3

External bias in tests

• Do group differences imply test bias (difficulty unrelated to the characteristic being assessed)?
  – View 1: innate abilities can differ across groups (Reynolds, 1995; Kline, 1993)
    • Japanese have higher than average spatial abilities
    • African Americans have ‘lower IQ’ (Herrnstein & Murray, 1996)
  – View 2: ethnic and gender groups must have the same underlying abilities; evidence to the contrary must be a product of measuring something other than what is relevant
    • Kline: the ‘egalitarian fallacy’

4

Dealing with differences

• Bias is detected through different regression equations for the groups, not through different means
• What purpose does research in this area serve?
  – Within-group differences far outweigh between-group differences
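The regression-based check can be sketched roughly as follows. This is only an illustration with invented numbers: the groups, the criterion variable and the helper `fit_line` are assumptions made for the example, not part of the lecture. It shows that two groups can differ in mean test score yet share one regression line (no bias in prediction), while two groups with identical means can have different lines (bias).

```python
import numpy as np

def fit_line(test_scores, criterion):
    """Return (slope, intercept) of the least-squares line criterion ~ test score."""
    slope, intercept = np.polyfit(test_scores, criterion, deg=1)
    return slope, intercept

# Hypothetical group A.
test_a = np.array([90.0, 100.0, 110.0, 120.0])
crit_a = 0.5 * test_a + 5.0

# Hypothetical group B: scores 10 points lower on average, but the same
# line links test score to criterion.
test_b = test_a - 10.0
crit_b = 0.5 * test_b + 5.0

# Hypothetical group C: same mean test score as A, but the test
# under-predicts the criterion by a constant amount (different intercept).
test_c = test_a.copy()
crit_c = 0.5 * test_c + 9.0

slope_a, int_a = fit_line(test_a, crit_a)
slope_b, int_b = fit_line(test_b, crit_b)
slope_c, int_c = fit_line(test_c, crit_c)

mean_diff_ab = test_a.mean() - test_b.mean()
same_line_ab = bool(np.isclose(slope_a, slope_b) and np.isclose(int_a, int_b))

mean_diff_ac = test_a.mean() - test_c.mean()
same_line_ac = bool(np.isclose(slope_a, slope_c) and np.isclose(int_a, int_c))

print("A vs B: mean difference", mean_diff_ab, "same regression line:", same_line_ab)
print("A vs C: mean difference", mean_diff_ac, "same regression line:", same_line_ac)
```

With real data the slopes and intercepts would be compared with a significance test rather than an exact equality check.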

5

Detecting internal bias

• If only gross scores are considered, hard and easy items for each group might balance out, giving a false impression of the test’s ‘health’
• Alternative: run a mixed factorial ANOVA
  – Each test item (question) is entered as a level of a repeated-measures factor
  – Group = between-subjects variable
• Main effect of item: expected
• Main effect of group: shows external bias
• Interaction: shows internal bias, in that the pattern of responding differs across the groups
• Such a method is susceptible to power manipulation
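The logic of the item-by-group analysis can be illustrated with a descriptive decomposition of cell means. This is a sketch under stated assumptions: the proportion-correct values are invented, and the code only separates main effects from the interaction; it does not compute the F-tests of a real mixed ANOVA, which would need person-level data.

```python
import numpy as np

# Hypothetical proportion-correct scores: rows = groups, columns = test items.
item_means = np.array([
    [0.9, 0.5, 0.7, 0.3],   # group 1
    [0.5, 0.9, 0.3, 0.7],   # group 2
])

grand_mean = item_means.mean()
group_effect = item_means.mean(axis=1) - grand_mean   # main effect of group
item_effect = item_means.mean(axis=0) - grand_mean    # main effect of item

# Interaction: what remains in each cell after removing both main effects.
interaction = (item_means - grand_mean
               - group_effect[:, None]
               - item_effect[None, :])

print("group effect:", np.round(group_effect, 3))
print("item effect:", np.round(item_effect, 3))
print("interaction:\n", np.round(interaction, 3))
```

Here both groups have the same overall mean, so the gross scores suggest a healthy test, yet the interaction terms are clearly non-zero: items that are easy for one group are hard for the other.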

6

Bias - performance characteristics

• Response bias
  – Individuals are more likely to agree than disagree (Cronbach, 1946): the response set of acquiescence
    • Does not cause a problem if everyone behaves in the same manner; standard scores will be unaffected
    • But there are considerable individual differences in acquiescence, so it can cause a major problem
  – Changing the polarity of items removes this difficulty
• Social desirability
  – Counteracted by lie scales and consistency measures
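Why changing polarity helps against acquiescence can be sketched as below. The 5-point scale, the function `score_scale` and the respondent are assumptions made for illustration only. When half the items are reverse-keyed, someone who agrees with everything regardless of content ends up at the scale midpoint instead of at the maximum.

```python
import numpy as np

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point agree/disagree scale

def score_scale(responses, reversed_items):
    """Sum item responses after reverse-scoring the negatively worded items."""
    responses = np.asarray(responses)
    idx = np.asarray(reversed_items, dtype=int)  # positions of reverse-keyed items
    scored = responses.copy()
    scored[idx] = SCALE_MAX + SCALE_MIN - responses[idx]
    return int(scored.sum())

# Hypothetical respondent who answers "strongly agree" (5) to every item.
yea_sayer = [5, 5, 5, 5]

all_positive = score_scale(yea_sayer, reversed_items=[])      # no polarity change
balanced = score_scale(yea_sayer, reversed_items=[1, 3])      # half the items reversed

print("all items positively keyed:", all_positive)
print("balanced keying:", balanced)
```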

7

Obvious influences

• Motivation

• Expectation

• Anxiety

• Test-specific practice

8

Revisiting Validity

9

Validity – different definitions

• Correctness or truth of an inference
• Validity with respect to the IV
  – Are we truly manipulating what we think we are?
    • Often relies on the construct of interest being adequately described
    • How do you manipulate something like the unconscious?
• Validity with respect to the DV
  – Extent to which you are measuring what you claim to measure

10

Different types of validity

• Content validity
  – Whether the target construct is adequately addressed
  – A measure of depression should assess aspects such as fatigue, anxiety, appetite, motivation, libido
  • Is assessed through expert opinion
    – Has a certain amount of subjectivity

11

• Criterion-related validity
  – How the measure compares to some already validated measure
• Two types
  – Predictive
  – Concurrent
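In practice, criterion-related validity is commonly summarised as a correlation between the new measure and the criterion. The sketch below assumes that approach; all scores and variable names are invented for illustration. The only difference between the two types is when the criterion is collected: at the same time (concurrent) or later (predictive).

```python
import numpy as np

# Hypothetical scores for the same six people.
new_test = np.array([12.0, 15.0, 9.0, 20.0, 17.0, 11.0])

# Criterion for concurrent validity: an established measure taken at the same time.
established_test = np.array([30.0, 36.0, 25.0, 47.0, 40.0, 29.0])

# Criterion for predictive validity: an outcome measured some time later.
later_outcome = np.array([2.1, 2.9, 1.8, 3.9, 3.2, 2.0])

# Pearson correlation between the new test and each criterion.
concurrent_r = np.corrcoef(new_test, established_test)[0, 1]
predictive_r = np.corrcoef(new_test, later_outcome)[0, 1]

print("concurrent validity coefficient:", round(float(concurrent_r), 3))
print("predictive validity coefficient:", round(float(predictive_r), 3))
```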


12

Different types of validity

• Construct validity
  – Most important
  – Are the experimental manipulations that we make really manipulating the construct of interest?
  – Evaluation requires
    • Clear definition of the construct
      – Can be difficult, e.g., IQ has many different facets
    • Assessing the match between the construct and the operations used to represent it (experimental manipulations)
      – Can involve criterion and content validity
  – Viewed as an evolving, never-ending process

13


• Internal validity – degree to which the independent and dependent variables are causally linked

• External validity – degree to which the causal relationship holds across different settings

14

How relevant is validity to you?

• Reviewing articles is essentially addressing validity and reliability issues
  – In an examination situation it would be useful, although not essential, to talk about the different forms of validity
• In discussion sections of reports, again, you are essentially evaluating the results with respect to validity and reliability
  – You would not really use the formal language used here; it is a style issue
