Automatic Generation of Verbal Analogy Items
Alan D. Mead
Illinois Institute of Technology
AIG in employment testing
• Rise of unproctored Internet testing (UIT)
• UIT may cause many security problems
  – One is item theft and coaching
• Solution: Generate entire test from scratch for each examinee
  – Item theft less of a problem
  – Coaching less effective
  – Items could be “watermarked”
• Also reduces cost and speeds deployment
AIG in employment testing (cont.)
• Need a variety of test content
  – Verbal analogies
  – Vocabulary
  – Math
  – Perceptual speed and accuracy
  – Spatial ability
  – Personality
  – Situational judgment
  – Etc.
Verbal Analogies
Pair responses
Shovel:Dig
a) Bag:Buy
b) Baby:Cry
c) Fork:Eat
d) Car:Stop
Word responses
Shovel:Dig::Fork
a) Buy
b) Cry
c) Eat
d) Stop
• Identify a “bridge”; you DIG with a SHOVEL
• Find a matching answer; you EAT with a FORK
Generating Verbal Analogies
• Identified database of relationships (e.g., “RIDER operates a BIKE”)
• Identified additional bridge relationships (“BOVINE means COW-like” & “ABSENT is the opposite of PRESENT”)
• Gathered data on word frequency and (part of this study) word familiarity
Generating Verbal Analogies (cont.)
1. Randomly select a bridge
2. Randomly select TWO pairs for this bridge (one for the stem, one for the key)
3. Randomly select 2-3 additional pairs from other bridges
4. Randomly assign key pair; fill in remaining pairs
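The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the `BRIDGES` dictionary, the `generate_item` function, and its parameters are hypothetical names, and the small database is assembled only from pairs that appear in the sample items.

```python
import random

# Hypothetical relationship database: bridge -> list of (word1, word2) pairs.
# The pairs come from the sample items; the data structure is an assumption.
BRIDGES = {
    "is described by": [("paternal", "father"), ("juvenile", "child")],
    "operates": [("rocket", "astronaut"), ("jet", "pilot")],
    "opposites": [("unfold", "fold"), ("demand", "supply")],
    "typical theme": [("microphone", "sound")],
    "typical agent": [("chalk", "writer")],
    "typical result": [("lamp", "light")],
}


def generate_item(bridges, n_foils=3, rng=random):
    """Build one pair-response analogy item following the four steps."""
    # Step 1: randomly select a bridge (it needs at least two pairs).
    eligible = [b for b, pairs in bridges.items() if len(pairs) >= 2]
    bridge = rng.choice(eligible)
    # Step 2: randomly select two pairs for this bridge (stem and key).
    stem, key = rng.sample(bridges[bridge], 2)
    # Step 3: randomly select additional pairs (foils) from other bridges.
    other_pairs = [p for b, pairs in bridges.items() if b != bridge for p in pairs]
    foils = rng.sample(other_pairs, n_foils)
    # Step 4: randomly place the key among the options; foils fill the rest.
    options = foils + [key]
    rng.shuffle(options)
    return {
        "bridge": bridge,
        "stem": stem,
        "options": options,
        "key_index": options.index(key),
    }


item = generate_item(BRIDGES, rng=random.Random(1))
print(":".join(item["stem"]) + ":: ?")
for letter, (a, b) in zip("abcd", item["options"]):
    print(f"{letter}. {a}:{b}")
```

Passing a seeded `random.Random` instance makes a generated form reproducible, which would matter if each examinee's test had to be regenerated later for scoring.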
Sample Items
1. paternal:father:: ?
a. juvenile:child
b. microphone:sound
c. chalk:writer
d. unfold:fold
3. rocket:astronaut:: ?
a. lamp:light
b. stick:skating rink
c. jet:pilot
d. demand:supply
Alternative format
1. paternal:father::juvenile:?
a. child
b. sound
c. writer
d. fold
3. rocket:astronaut::jet:?
a. light
b. skating rink
c. pilot
d. supply
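The relation between the two formats can be made explicit with a small conversion routine. The sketch below is an assumption about how one format could be derived from the other, not a description of the author's software; `to_word_format` is a hypothetical name. It moves the first word of the key pair into the stem and keeps only the second word of every option.

```python
def to_word_format(stem, options, key_index):
    """Convert a pair-response item into the word-response format."""
    # The first word of the key pair completes the stem.
    key_first, _ = options[key_index]
    stem_text = f"{stem[0]}:{stem[1]}::{key_first}:?"
    # Each option keeps only its second word; the key position is unchanged.
    choices = [second for _, second in options]
    return stem_text, choices, key_index


stem_text, choices, key_index = to_word_format(
    ("rocket", "astronaut"),
    [("lamp", "light"), ("stick", "skating rink"), ("jet", "pilot"), ("demand", "supply")],
    key_index=2,
)
print(stem_text)  # rocket:astronaut::jet:?
print(choices)    # ['light', 'skating rink', 'pilot', 'supply']
```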
Keys
1. paternal:father:: ?
[Bridge: FATHER is described by PATERNAL]
a. juvenile:child ***
b. microphone:sound (unrelated: sound is a (typical) theme of microphone)
c. chalk:writer (unrelated: writer is a (typical) agent of chalk)
d. unfold:fold (unrelated: unfold and fold are opposites/opposed)
3. rocket:astronaut:: ?
[Bridge: ASTRONAUT operates ROCKET]
a. lamp:light (unrelated: lamp is a (typical) result of light)
b. stick:skating_rink (unrelated: skating_rink is a (typical) location of stick)
c. jet:pilot ***
d. demand:supply (unrelated: supply and demand are opposites/opposed)
Present Study
• H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity
• H2: AIG scales will have reliability comparable to manually-written scale
• H3: AIG scales will have construct and criterion validity comparable to manually-written scale
Method
• Sample of N=251 gathered online and from psychology classes
• Measures:
  – n=20 AIG & human-written verbal analogy scales
  – N=40 vocabulary
  – Self-reported performance at work & school
Feasibility
• Manually examined items for feasibility
• 40/64 (63%) items were feasible
• Reasons for infeasibility
  – Over-use of a bridge or a pair (some bridges have few pairs)
  – Ambiguous pairs (drum:drum?)
  – Foil inadvertently a correct key
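In the study the items were screened by hand. Purely as an illustration of how the three reasons listed above might be flagged automatically, here is a rough Python sketch. Everything in it is an assumption: the item dictionaries, the `feasibility_flags` name, and the `max_uses` threshold are hypothetical, and the foil check only catches a foil that the database itself lists under the stem's bridge.

```python
from collections import Counter


def feasibility_flags(items, bridges, max_uses=2):
    """Return (item_index, reason) flags for a generated test form."""
    # Count how often each pair appears anywhere in the form.
    pair_counts = Counter(
        pair for item in items for pair in [item["stem"], *item["options"]]
    )
    flags = []
    for i, item in enumerate(items):
        pairs = [item["stem"], *item["options"]]
        # Ambiguous pair: the same surface word on both sides (drum:drum).
        if any(a == b for a, b in pairs):
            flags.append((i, "ambiguous pair"))
        # Over-use: a pair appears more than max_uses times in the form.
        if any(pair_counts[p] > max_uses for p in pairs):
            flags.append((i, "over-used pair"))
        # Foil that the database also lists under the stem's bridge.
        foils = [p for j, p in enumerate(item["options"]) if j != item["key_index"]]
        if any(f in bridges[item["bridge"]] for f in foils):
            flags.append((i, "foil fits the stem's bridge"))
    return flags
```

A check like this can only reject items; a foil that happens to be a defensible answer for reasons the database does not encode would still need human review.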
Results for H1
   Variable                         Mean   SD     n    1       2       3       4
1  Vocabulary                       0.75   0.14   40   (0.86)  0.66    0.66    0.69
2  Human-written items              0.65   0.14   20   0.46    (0.57)  0.97    1.04
3  AIG items with pair responses    0.73   0.16   20   0.52    0.63    (0.73)  0.94
4  AIG items with word responses    0.81   0.14   19   0.54    0.67    0.68    (0.72)
5  Self-Rated Performance           3.72   0.61   6    -0.04   -0.01   0.05    0.10
6  Academic Performance             0.02   0.72   3    0.14    0.22    0.20    0.14
H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity CONFIRMED
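The slide does not label the table entries. The parenthesized diagonal values appear to be scale reliabilities, and the values above the diagonal are consistent with the below-diagonal correlations corrected for attenuation, r_xy / sqrt(r_xx * r_yy). That reading is an inference, not something stated in the slides, but it would explain how an entry such as 1.04 can exceed 1. The sketch below applies the standard correction to the rounded table values; small differences in the second decimal are expected because the inputs are already rounded.

```python
from math import sqrt


def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both scales."""
    return r_xy / sqrt(rel_x * rel_y)


# (observed r, reliability of x, reliability of y, value reported above the diagonal)
checks = [
    (0.46, 0.86, 0.57, 0.66),  # vocabulary vs. human-written
    (0.63, 0.57, 0.73, 0.97),  # human-written vs. AIG pair responses
    (0.67, 0.57, 0.72, 1.04),  # human-written vs. AIG word responses
    (0.68, 0.73, 0.72, 0.94),  # AIG pair vs. AIG word responses
]
for r, rel_x, rel_y, reported in checks:
    print(f"computed {disattenuate(r, rel_x, rel_y):.3f}  reported {reported:.2f}")
```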
Results for H2 (same table as for H1)
H2: AIG scales will have reliability comparable to manually-written scale NOT CONFIRMED because the AIG scales had better reliability
Results for H3 (same table as for H1)
H3: AIG scales will have construct and criterion validity comparable to manually-written scale CONFIRMED
Predicting Item Difficulty
Predictor                                              Correlation
Automatically generated (1) or manually written (0)    0.28*
Familiarity of least familiar word in item             0.33*
Familiarity of second least familiar word in item      0.39**
Mean familiarity of all words in item                  0.37**
Lowest log(count(word))                                0.14
Second lowest log(count(word))                         -0.06
Mean log(count(word))                                  0.17
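The word-level predictors in this table all follow one pattern: score every word in the item, then take the lowest score, the second lowest, and the mean. A minimal sketch of that computation follows. The function name `item_predictors` is hypothetical, and the familiarity ratings and word counts are invented placeholder values for illustration only, not data from the study.

```python
import math
from statistics import mean


def item_predictors(words, score):
    """Lowest, second-lowest, and mean score over the words of one item."""
    values = sorted(score[w] for w in words)
    return {
        "lowest": values[0],
        "second_lowest": values[1],
        "mean": mean(values),
    }


words = ["paternal", "father", "juvenile", "child"]

# Placeholder familiarity ratings (invented; higher = more familiar).
familiarity = {"paternal": 4.0, "father": 7.0, "juvenile": 5.0, "child": 6.0}
print(item_predictors(words, familiarity))

# Placeholder corpus counts (invented), log-transformed as in the table.
counts = {"paternal": 120, "father": 45000, "juvenile": 900, "child": 60000}
log_counts = {w: math.log(c) for w, c in counts.items()}
print(item_predictors(words, log_counts))
```

Each item then contributes one row of predictor values, which can be correlated with the item's observed difficulty across the item pool.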
Future Directions
• Better handling of senses (DRUM is for DRUMMING)
• Better difficulty calculations based on larger sample of items
• Automated feasibility checking
• Enhanced database of relationships
• Choosing foils to have more semantic similarity to other words