
Automatic Generation of Verbal Analogy Items

Alan D. Mead
Illinois Institute of Technology

AIG in employment testing

• Rise of unproctored Internet testing (UIT)
• UIT may cause many security problems
  – One is item theft and coaching

• Solution: Generate entire test from scratch for each examinee
  – Item theft less of a problem
  – Coaching less effective
  – Items could be “watermarked”

• Also reduces cost and speeds deployment

AIG in employment testing (cont.)

• Need a variety of test content
  – Verbal analogies
  – Vocabulary
  – Math
  – Perceptual speed and accuracy
  – Spatial ability
  – Personality
  – Situational judgment
  – Etc.

Verbal Analogies

Pair responses:
Shovel:Dig
a) Bag:Buy
b) Baby:Cry
c) Fork:Eat
d) Car:Stop

Word responses:
Shovel:Dig::Fork
a) Buy
b) Cry
c) Eat
d) Stop

• Identify a “bridge”; you DIG with a SHOVEL
• Find a matching answer; you EAT with a FORK

Generating Verbal Analogies

• Identified database of relationships (e.g., “RIDER operates a BIKE”)

• Identified additional bridge relationships (“BOVINE means COW-like” & “ABSENT is the opposite of PRESENT”)

• Gathered data on word frequency and (part of this study) word familiarity

Generating Verbal Analogies (cont.)

1. Randomly select a bridge
2. Randomly select TWO pairs for this bridge (one for the stem, one for the key)
3. Randomly select 2-3 additional pairs from other bridges
4. Randomly assign key pair; fill in remaining pairs
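The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BRIDGES` dictionary, its bridge names, and the `generate_item` function are hypothetical stand-ins for the relationship database, filled with pairs taken from examples in this deck.

```python
import random

# Hypothetical relationship database: bridge name -> list of (word1, word2) pairs.
# The structure and bridge names are assumptions made for this sketch.
BRIDGES = {
    "operates": [("rocket", "astronaut"), ("jet", "pilot"), ("bike", "rider")],
    "described_by": [("paternal", "father"), ("juvenile", "child"), ("bovine", "cow")],
    "opposite": [("unfold", "fold"), ("demand", "supply"), ("absent", "present")],
    "used_to": [("shovel", "dig"), ("fork", "eat")],
}


def generate_item(bridges, n_foils=3, rng=random):
    """Build one pair-response analogy item following the four steps."""
    # Step 1: randomly select a bridge (it must offer at least two pairs).
    eligible = [name for name, pairs in bridges.items() if len(pairs) >= 2]
    bridge = rng.choice(eligible)
    # Step 2: select two distinct pairs, one for the stem and one for the key.
    stem, key = rng.sample(bridges[bridge], 2)
    # Step 3: select foil pairs from the other bridges.
    other_pairs = [p for name, pairs in bridges.items() if name != bridge for p in pairs]
    foils = rng.sample(other_pairs, n_foils)
    # Step 4: put the key at a random position; foils fill the remaining slots.
    options = foils[:]
    key_index = rng.randrange(n_foils + 1)
    options.insert(key_index, key)
    return {"bridge": bridge, "stem": stem, "options": options, "key_index": key_index}


item = generate_item(BRIDGES, rng=random.Random(0))
print("{}:{}:: ?".format(*item["stem"]))
for letter, (w1, w2) in zip("abcd", item["options"]):
    print(f"{letter}. {w1}:{w2}")
```

Passing a seeded `random.Random` makes a generated form reproducible, which could be useful when a specific examinee's test has to be reconstructed later.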

Sample Items

1. paternal:father:: ?
   a. juvenile:child
   b. microphone:sound
   c. chalk:writer
   d. unfold:fold

3. rocket:astronaut:: ?
   a. lamp:light
   b. stick:skating rink
   c. jet:pilot
   d. demand:supply

Alternative format

1. paternal:father:: juvenile:?
   a. child
   b. sound
   c. writer
   d. fold

3. rocket:astronaut::jet:?
   a. light
   b. skating rink
   c. pilot
   d. supply
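Judging from the two sample items, the alternative (word-response) format moves the first word of the key pair into the stem and keeps only the second word of each option. A minimal sketch of that conversion follows; the function name `to_word_format` and the tuple-based item representation are assumptions for illustration.

```python
def to_word_format(stem, options, key_index):
    """Turn a pair-response item into the word-response format."""
    # The first word of the key pair joins the stem.
    key_first, _ = options[key_index]
    stem_text = f"{stem[0]}:{stem[1]}::{key_first}:?"
    # Each option keeps only its second word.
    word_options = [second for _, second in options]
    return stem_text, word_options


stem_text, word_options = to_word_format(
    ("rocket", "astronaut"),
    [("lamp", "light"), ("stick", "skating rink"), ("jet", "pilot"), ("demand", "supply")],
    key_index=2,
)
print(stem_text)     # rocket:astronaut::jet:?
print(word_options)  # ['light', 'skating rink', 'pilot', 'supply']
```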

Keys

1. paternal:father:: ?
   [Bridge: FATHER is described by PATERNAL]
   a. juvenile:child ***
   b. microphone:sound (unrelated: sound is a (typical) theme of microphone)
   c. chalk:writer (unrelated: writer is a (typical) agent of chalk)
   d. unfold:fold (unrelated: unfold and fold are opposites/opposed)

3. rocket:astronaut:: ?
   [Bridge: ASTRONAUT operates ROCKET]
   a. lamp:light (unrelated: lamp is a (typical) result of light)
   b. stick:skating_rink (unrelated: skating_rink is a (typical) location of stick)
   c. jet:pilot ***
   d. demand:supply (unrelated: supply and demand are opposites/opposed)

Present Study

• H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity

• H2: AIG scales will have reliability comparable to manually-written scale

• H3: AIG scales will have construct and criterion validity comparable to manually-written scale

Method

• Sample of N=251 gathered online and from psychology classes

• Measures:
  – n=20 AIG & human-written verbal analogy scales
  – n=40 vocabulary
  – Self-reported performance at work & school

Feasibility

• Manually examined items for feasibility
• 40/64 (63%) items were feasible
• Reasons for infeasibility
  – Over-use of a bridge or a pair (some bridges have few pairs)
  – Ambiguous pairs (drum:drum?)
  – Foil inadvertently a correct key
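The three reasons for infeasibility suggest simple screening rules. The sketch below is hypothetical and was not part of the study, where items were examined manually: the functions `item_problems` and `overused`, the item dictionary layout, the `max_uses` threshold, and the `knife:cut` pair are all assumptions. The foil check only catches foils that the database already lists under the stem's bridge, so it would not replace human review.

```python
from collections import Counter


def item_problems(item, bridges):
    """Return reasons a single item looks infeasible (empty list if none found)."""
    problems = []
    same_bridge = set(bridges[item["bridge"]])
    # Ambiguous pair: both words are identical, as in drum:drum.
    for pair in [item["stem"], *item["options"]]:
        if pair[0] == pair[1]:
            problems.append(f"ambiguous pair {pair[0]}:{pair[1]}")
    # Foil inadvertently correct: a non-key option listed under the stem's bridge.
    for index, pair in enumerate(item["options"]):
        if index != item["key_index"] and pair in same_bridge:
            problems.append(f"foil {pair[0]}:{pair[1]} also fits the bridge")
    return problems


def overused(items, max_uses=2):
    """Return bridges and pairs that appear more than max_uses times in a test."""
    bridge_counts = Counter(item["bridge"] for item in items)
    pair_counts = Counter(
        pair for item in items for pair in [item["stem"], *item["options"]]
    )
    return {
        "bridges": sorted(b for b, n in bridge_counts.items() if n > max_uses),
        "pairs": sorted(p for p, n in pair_counts.items() if n > max_uses),
    }


# Made-up example data for illustration only.
bridges = {"used_for": [("shovel", "dig"), ("fork", "eat"), ("knife", "cut")]}
item = {
    "bridge": "used_for",
    "stem": ("shovel", "dig"),
    "options": [("drum", "drum"), ("fork", "eat"), ("knife", "cut"), ("unfold", "fold")],
    "key_index": 1,
}
print(item_problems(item, bridges))
print(overused([item, item, item], max_uses=2))
```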

Results for H1

Variable                           Mean   SD     n    1       2       3       4
1 Vocabulary                       0.75   0.14   40   (0.86)  0.66    0.66    0.69
2 Human-written items              0.65   0.14   20   0.46    (0.57)  0.97    1.04
3 AIG items with pairs responses   0.73   0.16   20   0.52    0.63    (0.73)  0.94
4 AIG items with word responses    0.81   0.14   19   0.54    0.67    0.68    (0.72)
5 Self-Rated Performance           3.72   0.61   6    -0.04   -0.01   0.05    0.10
6 Academic Performance             0.02   0.72   3    0.14    0.22    0.20    0.14

H1: Two forms of AIG analogies (word responses and pair responses) will have comparable reliability & validity CONFIRMED
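The slide does not say how the correlation matrix is laid out. A plausible reading is that the parenthesized diagonal values are reliability estimates, the values below the diagonal are observed correlations, and the values above the diagonal are those correlations corrected for attenuation, r_xy / sqrt(r_xx * r_yy). The sketch below checks that reading against the reported numbers; the variable names are assumptions, and the recomputed values agree with the table only to within about 0.01 because the inputs are rounded to two decimals.

```python
import math

# Diagonal (parenthesized) values from the table, read here as reliabilities.
reliability = {"vocab": 0.86, "human": 0.57, "aig_pairs": 0.73, "aig_words": 0.72}

# Below-diagonal values from the table, read here as observed correlations.
observed = {
    ("vocab", "human"): 0.46,
    ("vocab", "aig_pairs"): 0.52,
    ("vocab", "aig_words"): 0.54,
    ("human", "aig_pairs"): 0.63,
    ("human", "aig_words"): 0.67,
    ("aig_pairs", "aig_words"): 0.68,
}


def disattenuate(r_xy, r_xx, r_yy):
    """Standard correction for attenuation due to unreliability."""
    return r_xy / math.sqrt(r_xx * r_yy)


for (x, y), r in observed.items():
    corrected = disattenuate(r, reliability[x], reliability[y])
    print(f"{x} vs {y}: observed {r:.2f}, corrected {corrected:.2f}")
```

Under this reading, a corrected value above 1 (human-written vs. AIG word-response items) is an expected side effect of correcting with a low reliability estimate rather than a computational error.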


H2: AIG scales will have reliability comparable to manually-written scale NOT CONFIRMED because the AIG scales had better reliability


H3: AIG scales will have construct and criterion validity comparable to manually-written scale CONFIRMED

Predicting Item Difficulty

Predictor                                             Correlation
Automatically generated (1) or manually written (0)   0.28*
Familiarity of least familiar word in item            0.33*
Familiarity of second least familiar word in item     0.39**
Mean familiarity of all words in item                 0.37**
Lowest log(count(word))                               0.14
Second lowest log(count(word))                        -0.06
Mean log(count(word))                                 0.17
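The word-level predictors in this table could be computed per item roughly as follows. This is a sketch under assumptions: the function name `difficulty_predictors`, the familiarity ratings, and the word counts are made up for illustration, and the natural logarithm is assumed because the slide does not state the base.

```python
import math
from statistics import mean


def difficulty_predictors(words, familiarity, counts):
    """Compute the familiarity- and frequency-based predictors for one item."""
    fam = sorted(familiarity[w] for w in words)
    # Natural log is an assumption; the source only says log(count(word)).
    logs = sorted(math.log(counts[w]) for w in words)
    return {
        "least_familiar": fam[0],
        "second_least_familiar": fam[1],
        "mean_familiarity": mean(fam),
        "lowest_log_count": logs[0],
        "second_lowest_log_count": logs[1],
        "mean_log_count": mean(logs),
    }


# Made-up ratings and corpus counts, for illustration only.
words = ["rocket", "astronaut", "jet", "pilot"]
familiarity = {"rocket": 6.1, "astronaut": 5.8, "jet": 6.5, "pilot": 6.3}
counts = {"rocket": 1200, "astronaut": 300, "jet": 2500, "pilot": 1800}

for name, value in difficulty_predictors(words, familiarity, counts).items():
    print(f"{name}: {value:.2f}")
```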

Future Directions

• Better handling of senses (DRUM is for DRUMMING)

• Better difficulty calculations based on larger sample of items

• Automated feasibility checking
• Enhanced database of relationships
• Choosing foils to have more semantic similarity to other words

Thank you!

[email protected]

