1
Evaluating Psychological Tests
2
Psychological testing
• Suffers a credibility problem in the eyes of the general public
• Two main problems
– Tests used inappropriately
• Goddard (1912) used a translation of Binet’s test to test the ability of immigrants to America; his conclusion that 79% of Italian immigrants were ‘feeble-minded’ reflects bias
– Tests themselves can be flawed
• Often measure supposed constructs that are not supported by proper factor analysis (e.g., internal locus of control)
3
External bias in tests
• Do group differences imply test bias (difficulty unrelated to the characteristic being assessed)?
– V1 – innate abilities can differ across groups (Reynolds, 1995; Kline, 1993)
• Japanese have higher than average spatial abilities
• African Americans have ‘lower IQ’ (Herrnstein & Murray, 1996)
– V2 – ethnic and gender groups must have the same underlying abilities; evidence to the contrary must be a product of measuring something other than what is relevant
• Kline – ‘egalitarian fallacy’
4
Dealing with differences
• Bias is detected through different regression equations for the groups, not through different means
• What purpose does research in this area serve?
– Within-group differences far outweigh between-group differences
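The regression point can be illustrated with a minimal sketch. All data and names below (`fit_line`, the score and criterion lists) are hypothetical, and the criterion stands for whatever outcome the test is meant to predict. Groups A and B differ in mean test score but share one regression line, so the mean difference alone would not indicate bias; group C has the same test scores as A but a different intercept, which is the pattern that would.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: test scores and a criterion the test should predict.
scores_a = [90, 100, 110, 120, 130]
criterion_a = [45, 50, 55, 60, 65]

# Group B scores 10 points lower on average, but the same line predicts
# its criterion values: different means, same regression equation.
scores_b = [80, 90, 100, 110, 120]
criterion_b = [40, 45, 50, 55, 60]

# Group C has the same test scores as A, but its criterion values sit
# 10 points higher: the shared line would under-predict for this group.
scores_c = [90, 100, 110, 120, 130]
criterion_c = [55, 60, 65, 70, 75]

line_a = fit_line(scores_a, criterion_a)
line_b = fit_line(scores_b, criterion_b)
line_c = fit_line(scores_c, criterion_c)

print("A:", line_a)
print("B:", line_b)
print("C:", line_c)
```

With real data the lines would never match exactly, so the comparison would need a significance test on the difference in slopes and intercepts rather than a check for equality.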
5
Detecting internal bias
• If only gross scores are considered, hard and easy items for each group might balance out, giving a false impression of the test’s ‘health’
• Alternative – run a mixed factorial ANOVA
– Each test item (question) is entered as a level of a repeated-measures factor
– Group = between-subjects variable
• Main effect of item – expected
• Main effect of group shows external bias
• Interaction shows internal bias, in that the pattern of responding is different across the groups
• Such a method is susceptible to power manipulation (whether an effect reaches significance depends on sample size)
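As a rough sketch of the interaction test, the function below computes the F ratio for the item × group interaction in a balanced mixed design (equal group sizes), using the item × subject-within-group term as the error. The function name `item_by_group_f` and the 0/1 item data are hypothetical. The two groups have identical total scores, so gross scores show no difference, yet the items that are easy for one group are hard for the other.

```python
def item_by_group_f(groups):
    """F ratio for the item x group interaction (balanced mixed design).

    groups: list of groups; each group is a list of subjects; each
    subject is a list of K item scores. Groups must be the same size.
    Returns (F, df_interaction, df_error).
    """
    g = len(groups)
    n = len(groups[0])       # subjects per group
    k = len(groups[0][0])    # items (levels of the repeated factor)
    all_rows = [row for grp in groups for row in grp]
    grand = sum(map(sum, all_rows)) / (g * n * k)
    item_means = [sum(row[j] for row in all_rows) / (g * n) for j in range(k)]

    ss_inter = 0.0
    ss_error = 0.0
    for grp in groups:
        grp_mean = sum(map(sum, grp)) / (n * k)
        cell = [sum(row[j] for row in grp) / n for j in range(k)]
        for j in range(k):
            # Departure of each group's item mean from an additive model.
            ss_inter += n * (cell[j] - grp_mean - item_means[j] + grand) ** 2
        for row in grp:
            subj_mean = sum(row) / k
            for j in range(k):
                # Item x subject residual within the group (error term).
                ss_error += (row[j] - subj_mean - cell[j] + grp_mean) ** 2

    df_inter = (g - 1) * (k - 1)
    df_error = g * (n - 1) * (k - 1)
    f_ratio = (ss_inter / df_inter) / (ss_error / df_error)
    return f_ratio, df_inter, df_error


# Hypothetical 0/1 item scores: 4 subjects per group, 4 items.
group_a = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
group_b = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]

total_a = sum(map(sum, group_a))
total_b = sum(map(sum, group_b))
f_ratio, df_inter, df_error = item_by_group_f([group_a, group_b])

print("gross totals:", total_a, total_b)
print("interaction F:", f_ratio, "df:", df_inter, df_error)
```

This sketch applies no sphericity correction and gives no p-value; the F ratio would still need to be compared against the F distribution with the returned degrees of freedom, and a statistics package would normally be used for a real analysis.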
6
Bias - performance characteristics
• Response bias
– Individuals are more likely to agree than disagree (Cronbach, 1946) – the response set of acquiescence
• Does not cause a problem if everyone behaves in the same manner – standard scores will be unaffected
• But there are considerable individual differences in acquiescence, so it can cause a major problem
– Changing the polarity of some items removes this difficulty
• Social desirability
– Counteracted by lie scales and consistency measures
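The effect of changing polarity on acquiescence can be shown with a small sketch. It assumes a 5-point agree/disagree scale and a hypothetical six-item questionnaire; `score_scale` and the item positions are illustrative names, not part of any published instrument. A respondent who agrees strongly with every statement gets the maximum score when all items are keyed in the same direction, but only the scale midpoint when half the items are reverse-keyed.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert scale


def score_scale(responses, reversed_items):
    """Sum the responses after flipping the reverse-keyed items."""
    total = 0
    for index, response in enumerate(responses):
        if index in reversed_items:
            # Reverse-keyed item: 5 becomes 1, 4 becomes 2, and so on.
            response = SCALE_MIN + SCALE_MAX - response
        total += response
    return total


# Hypothetical respondent who agrees strongly with every statement.
yea_sayer = [5, 5, 5, 5, 5, 5]

all_same_polarity = score_scale(yea_sayer, reversed_items=set())
balanced_polarity = score_scale(yea_sayer, reversed_items={1, 3, 5})

print(all_same_polarity)  # maximum possible score
print(balanced_polarity)  # midpoint of the possible range
```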
7
Obvious influences
• Motivation
• Expectation
• Anxiety
• Test-specific practice
8
Revisiting Validity
9
Validity – different definitions
• Correctness or truth of an inference
• Validity with respect to the IV
– Are we truly manipulating what we think we are?
• Often relies on the construct of interest being adequately described
• How do you manipulate something like the unconscious?
• Validity with respect to the DV
– Extent to which you are measuring what you claim to measure
10
Different types of validity
• Content validity
– Whether the target construct is adequately addressed
– A measure of depression should assess aspects such as fatigue, anxiety, appetite, motivation, libido
• Is assessed through expert opinion
– Has a certain amount of subjectivity
11
Different types of validity
• Criterion-related validity
– How the measure compares to some already validated measure
• Two types
– Predictive
– Concurrent
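Criterion-related validity is usually summarised as a correlation between the new measure and the criterion. The sketch below computes a Pearson correlation by hand; the scores and the names `new_test` and `established_measure` are hypothetical. The same calculation serves both types: for concurrent validity the criterion is collected at the same time as the new test, for predictive validity it is collected later.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six people on a new test and on an
# already validated measure of the same construct.
new_test = [12, 15, 9, 20, 17, 11]
established_measure = [30, 36, 25, 45, 40, 27]

validity_coefficient = pearson_r(new_test, established_measure)
print(round(validity_coefficient, 3))
```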
12
Different types of validity
• Construct validity
– Most important
– Are the experimental manipulations that we make really manipulating the construct of interest?
– Evaluation requires
• Clear definition of the construct
– Can be difficult, e.g., IQ has many different facets
• Assessing the match between the construct and the operations used to represent it (experimental manipulations)
– Can involve criterion and content validity
– Viewed as an evolving, never-ending process
13
Different types of validity
• Internal validity – degree to which the independent and dependent variables are causally linked
• External validity – degree to which the causal relationship holds across different settings
14
How relevant is validity to you?
• Reviewing articles is essentially addressing validity and reliability issues
– In an examination it would be useful, although not essential, to talk about the different forms of validity
• In the discussion sections of reports you are again essentially evaluating the results with respect to validity and reliability
– You would not really use the formal language used here; this is a style issue