This article was downloaded by: [Dicle University] on 09 November 2014, at 11:37. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Ergonomics. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/terg20
Predicting and interpreting identification errors in military vehicle training using multidimensional scaling
Corey J. Bohil a, Nicholas A. Higgins a & Joseph R. Keebler b
a Department of Psychology, University of Central Florida, Orlando, FL, USA; b Department of Psychology, Wichita State University, Wichita, KS, USA. Published online: 04 Apr 2014.
To cite this article: Corey J. Bohil, Nicholas A. Higgins & Joseph R. Keebler (2014) Predicting and interpreting identification errors in military vehicle training using multidimensional scaling, Ergonomics, 57:6, 844-855, DOI: 10.1080/00140139.2014.899631
To link to this article: http://dx.doi.org/10.1080/00140139.2014.899631
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Predicting and interpreting identification errors in military vehicle training using multidimensional scaling
Corey J. Bohil a*, Nicholas A. Higgins a and Joseph R. Keebler b
a Department of Psychology, University of Central Florida, Orlando, FL, USA; b Department of Psychology, Wichita State University, Wichita, KS, USA
(Received 23 September 2013; accepted 21 February 2014)
We compared methods for predicting and understanding the source of confusion errors during military vehicle identification training. Participants completed training to identify main battle tanks. They also completed card-sorting and similarity-rating tasks to express their mental representation of resemblance across the set of training items. We expected participants to selectively attend to a subset of vehicle features during these tasks, and we hypothesised that we could predict identification confusion errors based on the outcomes of the card-sort and similarity-rating tasks. Based on card-sorting results, we were able to predict about 45% of observed identification confusions. Based on multidimensional scaling of the similarity-rating data, we could predict more than 80% of identification confusions. These methods also enabled us to infer the dimensions receiving significant attention from each participant. This understanding of mental representation may be crucial in creating personalised training that directs attention to features that are critical for accurate identification.
Practitioner Summary: Participants completed military vehicle identification training and testing, along with card-sorting and similarity-rating tasks. The data enabled us to predict up to 84% of identification confusion errors and to understand the mental representation underlying these errors. These methods have potential to improve training and reduce identification errors leading to fratricide.
Keywords: vehicle identification; training; incidental learning; multidimensional scaling
Introduction
Friendly fire accidents (also known as fratricide or blue-on-blue incidents) often result from mistaken identification of
military vehicles by individual soldiers (Regan 1995). Such errors disrupt the planning of appropriate responses to rapidly
changing field conditions, often with life-threatening results (Briggs and Goldberg 1995; Keebler et al. 2010). Research
suggests that novices can overreact based on the appearance of a prominent feature such as tank treads or turrets (Biederman
and Shiffrar 1987) and that, depending on the extent of feature overlap, vehicles can be easily confused with one another
(O’Kane et al. 1997). Accurate combat identification likely requires (1) attending to multiple features and (2) not over-attending to features with little predictive value for a vehicle’s identity.
A common vehicle identification-training method is through repeated study of images – often line drawings – of
armoured vehicles until their identities are memorised (Keebler, Jentsch, and Hudson 2011). Although simple and cost
effective, there are shortcomings to this approach. There is no interactivity to promote active engagement in learning.
Training is known to be more effective when learners are deeply engaged in the training system (Kirkpatrick 1975; Malone
1981; Keebler, Jentsch, and Hudson 2011). Also, given the complexity of the training items (armoured vehicles with many
features) there is no way to know which features are most prominent in the learner’s memory.
Object identification requires learning a 1:1 mapping of stimulus features to response labels (see Figure 1a). However,
given the large number of features comprising an object as complex as a vehicle, there could be any number of feature
subsets upon which a learner fixates. Furthermore, attending to idiosyncratic – but not predictive beyond the training set –
features of training examples can lead to learning that is difficult to reverse (Biederman and Shiffrar 1987). However, given
the task of memorising several training items, focusing on a subset of outward physical features (even if unconsciously)
may be effective and efficient. Over time, experienced learners will develop a deeper, knowledge-based understanding of
the critical features for each vehicle type (i.e. other thought processes over and above perception and memory play a role).
But within the operational environment, most military personnel making identification judgements may be relatively
inexperienced. For example, upon encountering an armoured vehicle in the field, infantry must make rapid decisions to
engage or retreat, and must communicate what they see to forces elsewhere. These identification judgements are often made
in poor viewing conditions, including distance, rain or dust, and occlusion by other objects, as well as under time pressure
© 2014 Taylor & Francis
*Corresponding author. Email: [email protected]
Ergonomics, 2014
Vol. 57, No. 6, 844–855, http://dx.doi.org/10.1080/00140139.2014.899631
and the duress of battle (Keebler et al. 2007). Improved decision-making in the field could result from identification training
that is tailored to help learners focus attention on the most diagnostic features and avoid attending to less informative
features (Biederman and Shiffrar 1987; Keebler et al. 2008; Keebler, Jentsch, and Hudson 2011).
A well-known finding from the literature on classification learning is that when stimuli are complex, learners often focus
on a subset of stimulus features (Biederman and Shiffrar 1987; Yamauchi and Markman 1998). During identification
training, learners may incidentally (i.e. unintentionally) form a mental representation of vehicle classes in addition to
memorising individual items (e.g. Kemler Nelson 1984; Smith 2008; Folstein, Gauthier, and Palmeri 2010). While
progressing through a training set, learners are likely to notice recurring features across items (see Figure 1b). For example,
some tanks have treads that are covered by armour while many do not; some vehicles have various peripheral devices
mounted to their surface while others do not, and so on. Critically, this ‘many to one mapping’ mental representation may
occur despite the fact that training feedback does not suggest any sort of classification scheme across sets of training objects,
but rather supports only individual item identification.
The current research explores the possibility that participants develop mental classification schemes incidentally during
identification training, and the possibility that we can (1) determine what their mental representations are like and (2)
predict their identification errors based on this knowledge. Participants completed an identification-training task in which
they learned to identify a set of armoured military vehicles (main battle tanks). During identification training, participants
received information about the unique identity of each individual training item. The training information did not serve to
reinforce their attention to any particular subset of stimulus dimensions or class of vehicles based on common features. If
participants notice recurring differences and similarities across training items and form associations between these items
Figure 1. Representative items from the training set. (a) Identification: 1:1 mapping of stimuli to responses. (b) Classification: many stimuli treated as equivalent.
(even implicitly) then this could be considered a type of ‘unsupervised’ category formation (i.e. training feedback does not
reinforce category formation). We included both a passive ‘observational’ identification-training condition that is akin to
learning with a set of training images and a more actively engaging ‘feedback’ training condition, during which participants
had to guess the identity of each item followed by corrective feedback. Our goal was not necessarily to contrast these
training conditions, but rather to evaluate our ability to predict identification errors under a variety of training methods.
Participants also completed two tasks designed to assess their mental representation of the training set. They completed
a card-sorting task – both before and after the identification-training task – in which they placed each training item into
whatever piles made sense to them. Card-sorting is a widely used task for gaining insight into mental classification schemes
and elicitation of knowledge structures (Edwards et al. 2006). After completing these tasks, participants also completed a
similarity-rating task in which they compared the similarity of each possible pair of training images. These ratings facilitate
multidimensional scaling (MDS) analysis (details below), which we used to gain another measure of mental representation
of the stimulus set. Both tasks provide information regarding the participant’s focus of attention when examining members
of the training set.
Our main goal was to infer something about the mental representations that develop during training and to examine how well
we can predict identification confusion errors (i.e. confusing one training item for another) using that information. We
hypothesise that learners notice subsets of features during identification training and that attention to these features
contributes to their confusions (e.g. tanks with similar treads may be more confusable if this feature is the focus of attention).
If this is the case, then it may be possible to infer the dimensions that are receiving the most attention from a learner and
potentially predict some confusion errors during the training process. Knowledge of this underlying mental representation of
the training set could be used to create adaptive training that focuses attention on the most important dimensions and reduces
identification errors. After presenting the results of our comparison between several prediction methods, we consider the
possibility of using the methods explored here for developing an adaptive system for identification training.
Method
Participants
Undergraduate students from the University of Central Florida (n = 38) voluntarily participated in the experiment for
course credit. Data from two participants were removed for failure to complete the tasks as instructed, so our analyses are
based on the remaining 36 participants.
Stimuli
The stimuli used across the experimental tasks were line drawings of armoured military vehicles, selected from 54 study
cards included in Graphic Training Aid 17-02-013: Armoured Vehicle Recognition, January 1997. Figure 1 shows some of
the images used in the study. A total of 32 drawings were selected from the set of study cards. Twenty-two were drawings of
main battle tanks, including 2S1-M1974, AMX-30, Centurion, Challenger, Chieftain, Leopard-II, M1-Abrams, M48A5,
M60A3, T-62, T-64, T-72, AMX-13, ASU-85, Leopard-1A2, Jagdpanzer-Kanone, Jagdpanzer-SK105, Leopard-1A4, T-80,
BMP-2, M109, T-54/55. A subset of 12 vehicles was chosen to be used in the identification-training task (the first 12 in the
list above), while the remaining 10 were used for comparison in the identification test, card-sort and similarity-rating phases
of the study. These vehicles were chosen because they were all of the same type (i.e. main battle tanks) and because they
appeared (to the researchers) to be highly similar to each other. An additional 10 ‘non-similar’ drawings from the set of
cards were also selected, including AMX-10P, BMD, BMP, Airborne, Jaguar, M2IFV, M551A1 Sheridan, Marder,
Type531, ZSU-57-2. These were selected because they were different in appearance from the main battle tanks (e.g. many
were not main battle tanks, although all were armoured military vehicles that were potentially confusable with other items
in the study). These 10 items appeared along with the items above during the identification test.
Images were roughly 2 inches tall and 3 inches wide when presented on paper cards (during the card-sort task), and
roughly 4 × 6 inches when presented on the computer screen (during the identification and similarity-rating tasks). All
vehicles appeared at roughly the same oblique angle or rotation, and we controlled for image size and extraneous details in
the images (name information, size comparison with a human).
Procedure
Each participant completed several tasks. First was a card-sorting task; second, they were trained and tested on
identification; and third, they conducted another card-sorting task to assess changes due to identification training. We refer
to the sorting tasks as pre- and post-test sorting tasks since they came before and after the identification task. Finally,
all participants finished the session by completing a similarity-rating task. Task details are as follows.
Card-sorting task
Participants were presented with a stack of 22 cards (the ‘similar’ cards described above), each with a single vehicle image
on one side. They were instructed to spread the cards onto a table and then sort them into piles. They were instructed that the
piles could be organised however they wished as long as the piles made sense to them. No other direction was given about
how to sort the vehicles. If the participant asked questions during the sorting task, they received only encouragement to sort
the vehicles however they felt made sense. They were also instructed that they should sort into at least two piles, but could have as many piles as they wished beyond two. They were further informed that they would have to provide a name for each pile after the sort was completed, and that they would be asked to explain the basis for sorting the images and the rationale for naming the piles. Finally, for each vehicle in a sort pile, they were asked to rate the ‘goodness’ of that vehicle for the pile. These goodness ratings used a three-point scale (1 = fair, 2 = good, 3 = perfect), providing a metric of how representative each vehicle image was of the pile into which it was placed.
The sorting task was completed at the beginning of the experimental session, and again following the identification test
phase. We assumed that the post-ID sort results would reflect knowledge gained from experience with the images (i.e.
during the ID training and test phases). Each sorting task (pre- and post-test) took about 10 minutes to complete.
Identification training and testing tasks
There were two identification-training conditions. In both training conditions, participants viewed a series of 12 vehicle
images – one at a time – on a computer screen. These 12 vehicles also appeared in the sorting-task set.
In the ‘observational’ training condition, on each trial the participant would view a tank image, along with the name of
that tank. They could view the image for as long as they wished, and they pressed a key to move on to the next image. The
set of 12 images was presented in 10 training blocks (with order randomised for each block), for a total of 120 training trials.
In the ‘feedback’ condition, the participant was presented on each trial with one of the 12 vehicles. Instead of a single
label, all 12 vehicle labels were presented on the screen, along with an instruction to ‘press the corresponding keyboard key
to guess the name of the tank’. After pressing a response key, the screen was cleared, followed by ‘correct’ or ‘incorrect –
that was a [correct label of tank]’. The feedback remained on the screen for two seconds. The 12 training tanks were
randomly presented in each training block. The training continued until the participant provided the correct category label
for all 12 tanks in a row. It is typical in identification-training studies with feedback to continue training until an accuracy-
based criterion is reached (e.g. 100% accuracy one or two times through the training set). We followed this convention for
the feedback training condition. In the observational condition, we selected the number of training cycles based on a guess
as to the number of training trials that would likely be needed to reach criterion for participants in the feedback condition.
This was done to keep the number of training observations roughly equal across conditions. As shown in the results section,
the number of training trials completed in both conditions was similar.
The observational and feedback training conditions were presented between subjects. No participant completed more
than a single training condition.
Following training, each participant (in both training conditions) completed an identification test. In random order, the
12 training items, along with 10 additional similar and 10 non-similar (described above) tank images were presented on the
computer screen, one image per trial. On a trial, the test item was presented for 1 second before disappearing. Then the list
of 12 item labels from the training phase appeared, along with an additional label that said ‘other’ which participants could
press if they felt that they had not seen the image during training. The participant could take as long as they wished to select
their response. No accuracy feedback was provided during this identification test. The identification training and testing
portion of the experiment took approximately 30 minutes to complete. Following the identification training and test phases,
each participant again completed the card-sort task (described above).
Similarity-rating task
All participants finished the session by completing a similarity-rating task. On each trial of the similarity-rating task, two
tanks were presented side by side on the computer screen, along with the instructions ‘How similar are these?’ Under these
instructions was a rating scale with numbers from 1 (‘low similarity’) to 5 (‘high similarity’). The participant’s task was to
examine the two images and press the corresponding number key to rate their similarity. After a one-second inter-trial
interval, the next randomly selected pair of tanks was presented. The presented tanks included the 12 ID task training items,
along with the 10 similar items that also appeared in the card-sorting and ID training tasks. All pairwise comparisons were
rated for these 22 stimuli, resulting in 231 similarity-rating trials. This portion of the experiment took approximately
30 minutes to complete.
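The trial count follows from the number of unordered pairs among the 22 rated stimuli: 22 × 21 / 2 = 231. A minimal sketch of the pair enumeration (the generic labels below are placeholders, not the actual vehicle names):

```python
from itertools import combinations

# 22 rated stimuli: the 12 training tanks plus the 10 similar lures
# (generic labels here; the actual vehicle names are listed under Stimuli)
stimuli = [f"tank_{i:02d}" for i in range(22)]

# Each unordered pair of images is rated exactly once
pairs = list(combinations(stimuli, 2))
print(len(pairs))  # 22 * 21 / 2 = 231 similarity-rating trials
```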
Results
We begin by reviewing the outcome of the identification training and test phases. After that we evaluate our ability to
predict confusion errors committed during the identification test based on the results of the card-sorting and similarity-
rating tasks. The similarity-rating data for each participant were submitted to a MDS analysis that gave a spatial
representation of confusability for the items. We also examine methods for inferring which stimulus dimensions tended to be the focus of attention: one based on the card-sort results and one based on interpreting the dimensions of the MDS results. Finally, we compare all methods on the basis of signal detection analysis, which provides a
quantitative measure of ability to predict confusions.
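As a rough illustration of this kind of signal detection scoring, the sketch below treats every unordered item pair as a trial, scores a prediction method's hit and false-alarm rates against the observed confusions, and converts them to d'. The framing and the clamping constant are our assumptions, not necessarily the paper's exact procedure:

```python
from itertools import combinations
from statistics import NormalDist

def detection_stats(predicted, observed, all_items):
    """Hit rate, false-alarm rate and d' for a binary confusion predictor.

    `predicted` and `observed` are sets of frozenset item pairs; every
    unordered pair of `all_items` counts as one trial. Illustrative
    framing only -- not the paper's exact scoring procedure.
    """
    all_pairs = {frozenset(p) for p in combinations(all_items, 2)}
    noise = all_pairs - observed                 # pairs never confused
    hit_rate = len(predicted & observed) / len(observed)
    fa_rate = len(predicted & noise) / len(noise)
    # Clamp rates away from 0 and 1 so the z-transform stays finite
    clamp = lambda p: min(max(p, 0.01), 0.99)
    z = NormalDist().inv_cdf
    d_prime = z(clamp(hit_rate)) - z(clamp(fa_rate))
    return hit_rate, fa_rate, d_prime
```

For example, with four items and observed confusions {A,B} and {C,D}, a predictor flagging {A,B} and {A,C} scores a hit rate of 0.5 and a false-alarm rate of 0.25, yielding a positive d'.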
Identification confusion errors
During the identification test, participants were presented with the 12 training items, along with 20 additional tank images to
serve as lures (the 10 similar and 10 non-similar items described above). The lures were included to make the identification
task more difficult. Although participants completed, on average, slightly more training trials in the feedback condition, this
was not a statistically reliable difference. In the observational condition, all participants completed 120 training trials. In the
feedback condition, the median number of trials completed to reach criterion was also 120 trials (M = 135 trials, SD = 54), t(30) = 1.08, p = 0.291.
Participants committed more identification errors (including ‘other’ responses) after observational training (M = 8.24, SD = 4.72) than after feedback training (M = 4.63, SD = 2.49), t(34) = 2.91, p = 0.006. Excluding the ‘other’ response, the difference in ID confusion rates (i.e. the rate of confusing one training item for another training item) was smaller. Based only on the 12 training stimuli, participants averaged 2.94 (SD = 2.61) confusions after observational training and 2.84 (SD = 2.19) confusions after feedback training, with no significant difference between the two conditions, t(34) = 0.12, p = 0.902. It follows that more ‘other’ responses were made after observational training (M = 5.29, SD = 3.29) than after feedback training (M = 1.79, SD = 1.23), t(34) = 4.32, p < 0.001.
Although the difference in error rates seems to disappear after removing the ‘other’ responses, it is important to note that
there were far fewer errors overall in the feedback training condition (total errors across all participants ¼ 88) than in the
observational condition (total errors across participants ¼ 140), while ‘other’ responses accounted for 34 (39%) and 90
(64%) errors, respectively, in these conditions. Clearly, there was greater uncertainty about item identity after observational
training.
The ‘other’ responses could merely indicate a lack of confidence in the identification response, rather than certainty that
the item was not seen in training. As a result, in the rest of our analyses we limit attention to identification confusions based
only on the 12 training items. Our primary interest is in errors of commission (i.e. misidentifying one training item as another), because these are the errors that can lead to incidents of fratricide.
Predicting ID confusions from card-sort results
We analysed the card-sort piles with respect to their ability to predict ID confusions using the following method. When two
tanks were placed into the same card-sort pile, we interpreted this to mean that the participant considered these items to be
more similar to each other than to items in other piles. Therefore, if two items appeared in the same pile and those items
were indeed confused during the identification task, we considered that observed error to be predicted by the card-sort
results. For example, if the participant confused a T-64 with a T-72 during the identification task, and these tanks appeared
together in one of the participant’s sort piles, then this confusion was predicted by the sorting task.
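The scoring rule just described can be sketched as follows; the data structures (piles as sets of vehicle names, confusions as presented/responded pairs) are hypothetical, since the text reports only the rule itself:

```python
def predicted_confusions(piles, confusions):
    """Proportion of observed ID confusions predicted by a card-sort.

    piles: list of sets, each holding the vehicle names in one sort pile.
    confusions: iterable of (presented, responded) vehicle-name pairs.
    A confusion counts as predicted when both vehicles share a pile.
    (Hypothetical data structures for illustration.)
    """
    same_pile = lambda a, b: any(a in p and b in p for p in piles)
    hits = [c for c in confusions if same_pile(*c)]
    return len(hits) / len(confusions) if confusions else 0.0
```

With piles [{T-64, T-72, T-62}, {Leopard-II, M1-Abrams}] and observed confusions (T-64, T-72) and (T-62, M1-Abrams), the first confusion is predicted and the second is not, giving 0.5.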
We compared predictions based on training condition, and also compared the pre- and post-task sorts, using a 2 × 2
mixed-factor analysis of variance (ANOVA) with training as a between-participant factor and pre–post sort as a within-participant factor. We could predict a higher proportion of ID confusions after feedback training (M = 0.54, SD = 0.35)
than after observational training (M = 0.33, SD = 0.33; collapsed over pre- and post-task sorts); this difference fell just
short of statistical significance, F(1,27) = 4.12, p = 0.052, partial η² = 0.132. There was no reliable difference
in predictions based on pre- (M = 0.42, SD = 0.35) and post-task sorts (M = 0.45, SD = 0.36), F(1,27) = 0.10,
p = 0.752, η² = 0.004, and no interaction between training condition and change from pre- to post-task sort,
F(1,27) = 0.03, p = 0.87, η² = 0.001. It appears that the difference between feedback and observational training card-sort
predictions largely existed even before training occurred. This conclusion is supported by the non-significant difference in
ability to predict confusions based on pre- and post-test card-sorts.
The point we wish to make with this analysis is that card-sort results do a reasonable job of predicting ID confusions.
The card-sorts predict about 45% of confusion errors collapsed over training conditions and pre- and post-task sorts.
Because the main goal of this study was to predict identification confusion errors during training, rather than to champion
one training style over another, we limit our consideration of differences between training conditions in our remaining
analyses.
Before evaluating other prediction methods, we must evaluate the content of the card-sorting results. In addition to error
prediction, our goal is to interpret the psychological dimensions underlying the confusability of training items (i.e. we wish
to determine the basis for judging items as similar enough to be confusable). We summarise here the outcome of the post-
training card-sorts. This interpretation will be contrasted with another approach to error interpretation in a later section.
In the (post-training) card-sort task, participants averaged around four piles each (M = 4.37, SD = 2.28), with slightly
more than five items per pile (M = 5.43, SD = 3.05). Despite idiosyncrasies in participants’ descriptions of their
sort piles, we found substantial consistency within and across participants in terms of the features they described as guiding
their sorts. We were able to organise their sort labels into a relatively small set of approximately six categories. These
included features pertaining to the following (from most common to least): body style (e.g. shape, size), small auxiliary
guns (e.g. presence, absence, number), antennas (e.g. presence, absence, size), main turret (e.g. size, shape), attached
auxiliary equipment (e.g. presence, absence) and number of wheels.
For each participant, we counted the number of unique feature categories described. For example, in some cases a
participant’s sort piles each corresponded to a different feature (e.g. body style, wheels). In most cases, though, several sort
piles were based on different facets of the same dimension (e.g. rounded body, sectioned body and angular body). When this
was the case, we treated these as use of a single feature category (e.g. body features). Using this process, we counted 62 total
feature categories across participants. Most prevalent among these were body features (appearing for 16 participants), small
attached guns (13 cases) and antenna (11 cases), then main turret features (8), attached equipment (8), number of wheels (5)
and finally one miscellaneous feature (based on ‘modern’ appearance). These results suggest that a relatively small set of
prominent tank features formed the basis for organising training items in the minds of participants.
Predicting ID confusions from similarity ratings
After completing the identification-training and card-sorting tasks, participants provided pairwise-similarity ratings for all
the training items and the 10 similar lure items (items were rated on a five-point similarity scale). Similarity ratings should
provide another way to understand the mental representation of the training items, and could provide a means for predicting
confusion errors.
We submitted the similarity ratings to MDS analysis, which is a data reduction technique that places each item into an
N-dimensional space based on their psychological proximity (Borg and Groenen 2005). If two items are perceived as highly
similar, they should be located close together in the MDS space; if perceived as non-similar they will be located far apart.
We used the ALSCAL method to derive a two-dimensional MDS space for each individual participant (Borg and
Groenen 2005). (The number of dimensions, N, can be fixed arbitrarily or determined based on model fit.) Because we
sought to both predict ID confusion errors and deduce the primary dimensions underlying these errors, we limited the space
to two dimensions for ease of interpretation. Better predictive performance would likely result from allowing the MDS
procedure to determine the optimal number of dimensions to describe each participant’s data, although it might be more
challenging to interpret the dimensions. Overall, however, the two-dimensional MDS solutions provided a reasonable fit to
the similarity-rating data. Average fit (Young’s S-Stress) was 0.174 (SD = 0.05), and the average proportion of variance
accounted for was high (R² = 0.83).
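To give a concrete sense of how a spatial representation is recovered from ratings, the sketch below substitutes classical (Torgerson) metric MDS for the ALSCAL procedure the authors used; it is a simpler algorithm but illustrates the same idea, namely that highly similar items receive nearby coordinates:

```python
import numpy as np

def classical_mds(similarity, n_dims=2, max_rating=5):
    """Embed items in a low-dimensional space from a similarity matrix.

    Classical (Torgerson) metric MDS, used here as a stand-in for the
    ALSCAL procedure described in the text. `similarity` is a symmetric
    matrix of 1-5 ratings (diagonal ignored).
    """
    S = np.asarray(similarity, dtype=float)
    D = max_rating - S                   # convert similarity to dissimilarity
    np.fill_diagonal(D, 0.0)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]
    coords = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
    return coords                        # (n_items, n_dims) coordinates
```

For instance, if items 0 and 1 receive mutual ratings of 5 and both are rated 1 against items 2 and 3 (and vice versa), the embedding places 0 and 1 close together and far from the 2-3 cluster.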
A separate MDS analysis was carried out for each participant. This individual-level analysis is critical since we are
interested in finding methods with potential to adaptively improve training for each learner, and because each learner’s
focus of attention is idiosyncratic. Figure 2a displays the MDS result for a representative participant (chosen at random).
All 22 tanks are plotted in the space, and those that are close together had higher pairwise-similarity ratings than those
far apart.
Based on the MDS solution for each participant, we examined two methods for predicting identification confusions. The
first approach is based on our ability to assign psychological interpretation to the dimensions of a participant’s MDS space.
This approach, which we refer to as the MDS-features method, is described in the next section. A second approach, which
we refer to as the MDS-distance method, is based on predicting confusions from inter-item distance in the MDS space (e.g.
the distance between points in the top panel of Figure 2). This approach will be considered after we evaluate the MDS-
features method.
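A minimal sketch of the MDS-distance idea: flag an item pair as a likely confusion when the two points lie close together in the derived space. The fixed distance threshold here is an illustrative assumption, not the paper's cutoff:

```python
from itertools import combinations
from math import dist

def confusable_pairs(coords, threshold):
    """Predict confusable item pairs from an MDS embedding.

    coords: dict mapping item name -> (x, y) MDS coordinates.
    A pair is flagged when the Euclidean distance between the two
    points falls below `threshold` (an illustrative assumption).
    """
    return {
        frozenset((a, b))
        for a, b in combinations(coords, 2)
        if dist(coords[a], coords[b]) < threshold
    }
```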
Predictions based on MDS-features
Apparent feature ratings. In order to interpret the MDS dimensions, the researchers rated (prior to the study) each of the
tanks in terms of their visually apparent features. For each of the tank images used, two raters assessed the following
features: sectioned body (1 = no, 5 = yes), number of wheels (number of visible wheels), degree to which armour covered
the wheels (1 = none to 5 = a lot), armour smoothness (1 = smooth to 5 = rough), presence of attached auxiliary
equipment (1 = little to 5 = a lot), presence of small auxiliary guns (1 = no, 5 = yes), size of main turret (1 = small to
5 = very large) and presence of antenna (1 = no, 5 = yes). In addition, various other dimensions were rated (also on
1-5 scales) to rule out the influence of nuisance features, including angle of the vehicle depicted in the image, size of
Figure 2. (a) Spatial representation of training items produced by multidimensional scaling for a representative participant. (b) The MDS placement of the training items summarised in panel (a) for the same participant. Coordinate axes are labelled with the dimensional interpretation produced by regressing apparent feature ratings onto MDS x, y coordinates for the items. See text for details.
C.J. Bohil et al.850
the vehicle image on the card and assumed actual vehicle size, and presence/absence of a human for size perspective. None
of these dimensions appeared to contribute to the results we describe next, so we will not consider them further. Because
this method was exploratory, we were satisfied with the level of agreement between the feature ratings of the two raters.
There was a high correlation between ratings, r(142) = 0.725, p < 0.001, and no significant difference between the
feature-rating responses, t(286) = 0.84, p = 0.401.
Next, we derived dimensional interpretations for each participant’s MDS space by regressing the apparent feature
ratings for the set of tanks onto the x, y coordinates for each tank in the MDS space. This allowed us to determine which tank
features contributed most to the perceived similarity of the tanks as indicated by their proximity in the MDS space. Features
with the smallest p-values in the regression analysis contributed most to the x, y coordinates for each item. This method has
been utilised in a variety of studies to provide psychological interpretation to latent stimulus-space dimensions (e.g. Kruskal
and Wish 1978; Markman and Makin 1998).
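The property-fitting step can be sketched as follows. The paper ranked features by regression p-values; this illustrative version reports R² instead, as a simpler numpy-only proxy for how strongly a feature aligns with the MDS coordinates (the function name is hypothetical).

```python
import numpy as np

def fit_feature_to_mds(coords, feature):
    """Regress one apparent-feature rating vector onto MDS x, y coordinates.

    coords: (n, 2) array of item coordinates; feature: (n,) array of ratings.
    Returns (beta, r_squared); beta[1:] points along the feature's direction in the space.
    """
    X = np.column_stack([np.ones(len(feature)), coords])  # intercept + x + y
    beta, *_ = np.linalg.lstsq(X, feature, rcond=None)
    resid = feature - X @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((feature - feature.mean()) ** 2)
    return beta, 1.0 - ss_res / ss_tot
```

Running this once per rated feature and keeping the one or two best-fitting features gives axis labels like the 'sectioned body' and 'antenna' dimensions shown in Figure 2b.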
Figure 2b shows the outcome of this analysis for one representative participant (i.e. the same data-set as in Figure 2a).
Regression indicated that for this participant, the appearance of a ‘sectioned body’ and the presence of ‘antenna’ contributed
most strongly to the configuration of points in the MDS space (reflecting this participant’s similarity ratings). Each of the
vehicles shown to the participant in the similarity-rating task is displayed for illustration. The vehicles on the left side
of the space tended to be main battle tanks consisting of a sectioned body with an armoured and tracked chassis and a
rotating turret supporting a large main weapon. The vehicles to the right tended to have a more unified body style in which
there is no differentiation between chassis and turret. As for the vertical dimension, it is clear that the tanks near the bottom
of the MDS space tended to have antennae, while those at the top did not. Clearly, the presence or absence of an antenna is not a
highly informative indicator of tank model, and a mental representation that focuses attention on this dimension might
contribute to hazardous identification confusions among novice decision-makers.
This approach to interpreting the dimensions underlying each participant’s mental representation of tank similarity
enables us to predict identification confusion errors. For each of the two most prominent dimensions for a given participant
(e.g. sectioned body and antennae in our example), we examined the apparent feature ratings (described above) for all
possible pairings of tanks from the ID training task. If the ratings on at least one of the two dimensions matched for a pair of
tanks, or differed in rated value by no more than one, then we predicted a confusion error for that pair. For example, if two
items rated a four on ‘number of wheels’, or if one item rated a one and the other a two on the dimension ‘sectioned body’,
then we would predict a confusion error. A pair had to be closely matched on only one of the two MDS dimensions.
Although this algorithm was a rather arbitrary preliminary attempt, it substantially increased our predictive power for ID
confusions over the card-sort method summarised above. Based on this MDS-features analysis, we could predict a higher
proportion of confusions (M = 0.85, SD = 0.26) than based on the card-sort method (M = 0.45, SD = 0.36), t(56) = 4.83,
p < 0.001. We further evaluate these two methods below.
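As a concrete sketch of this matching rule (predict a confusion whenever a pair of tanks differs by at most one rating point on at least one of the participant's two most prominent dimensions), the following might do; item labels, feature names and ratings are invented for illustration.

```python
from itertools import combinations

def predict_confusions(ratings, top_features, tol=1):
    """Predict confusable pairs from apparent-feature ratings.

    ratings: dict mapping item -> {feature: 1-5 rating}
    top_features: the participant's two most prominent MDS dimensions
    A pair is predicted confusable if it matches within `tol` on at least one dimension.
    """
    predicted = set()
    for a, b in combinations(ratings, 2):
        if any(abs(ratings[a][f] - ratings[b][f]) <= tol for f in top_features):
            predicted.add(frozenset((a, b)))
    return predicted
```

With, say, top_features = ['sectioned_body', 'antenna'], two tanks rated 1 and 2 on sectioned body would be flagged as confusable, mirroring the worked example in the text.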
Correlation between card-sort and MDS-features interpretations
There was substantial overlap between features identified by the MDS-features method and the card-sort method. The
MDS-features method identified about 58 feature categories totalled across participants. The most prevalent were features
pertaining to antenna (15 cases), number of wheels (11), small attached guns (9), armoured wheels (7), sectioned body (7),
armour smoothness (5) and attached equipment (4).
For each participant we compared the number of feature categories in common across card-sort piles and MDS-features
results. For 28 participants (those committing ID confusion errors), prediction features overlapped for at least one feature
category in 50% of cases (i.e. for 14 participants, the card-sort and MDS-features methods identified at least one of the same
features as important to the participant’s sorting or similarity-rating decisions). There was a fair amount of overlap between
the predictive features produced by the card-sort and MDS-features methods, but they were often not in agreement. This
leaves us with an important question: Which method are we to favour for inferring psychological dimensions? The next
section details another method that may inform this decision.
The psychological interpretation provided by the MDS-features method is valuable for understanding – and potentially
influencing – the outcome of identification training (i.e. for understanding the source of confusion errors). However,
predictions do not necessarily have to be linked to a psychological interpretation. Better predictive performance should be
possible when predictions are based only on inter-point distances in the MDS space, which reflect the contribution of any
number of underlying psychological dimensions.
In the next section, we consider an MDS-distance-based predictor of confusion errors (which does not rely on dimensional
interpretation), and compare its performance with the MDS-features and card-sort methods. In addition, so far we have
considered only ability to predict observed confusions (‘hits’ in the language of signal detection theory). In the next section, we
use signal detection analysis to compare all three prediction methods, taking into account both hit and false alarm rates.
MDS-distance predictions and signal detection analysis
Using the MDS representation of training item similarity, we can use inter-point distance as a method for creating new
sorting criteria for the items (i.e. for sorting more-confusable from less-confusable items). We did this by evaluating a
range of inter-point distance criteria for each participant in order to predict confusion errors. This method is described
next.
Given an MDS representation of inter-item similarity (i.e. inter-point distance), we can predict confusability of two
items based on their closeness in space. For each participant, we evaluated a range of distance criteria and made confusion
error predictions. Then, for each participant, we selected the inter-point distance value that resulted in the best predictive
performance (as determined by d′, the signal detection measure of discriminability).
Using the same participant as in Figure 2, the distance between points in the space is the
Euclidean distance computed using the x, y coordinates for each point. For this participant, inter-point distances ranged
between 0.482 and 4.328. We tried grouping items in the MDS space (i.e. treating them as confusable in a manner akin to
card-sorting) using several distance criterion values (0.5, 1, 1.5, ..., 4.5, 5). For example, when 0.5 was the
criterion, all pairs of items with inter-point distance of 0.5 or less were sorted together and considered confusable. All pairs
of items with larger inter-point distance would not be considered confusable. We repeated this process for each of the
distance criterion values in order to find one that best predicted ID confusion errors. For the participant in our example, an
inter-point distance criterion of two led to the best prediction performance (as defined below). The best inter-point distance
criterion varied by participant.
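A minimal sketch of this grouping step, assuming invented item labels and coordinates: for a given criterion, every pair of items whose Euclidean distance in the MDS space falls at or below the criterion is predicted confusable, and the sweep simply repeats this over criteria 0.5, 1.0, ..., 5.0.

```python
import math
from itertools import combinations

def confusable_pairs(coords, criterion):
    """Predict confusable pairs: items within `criterion` of each other in MDS space.

    coords: dict mapping item -> (x, y) coordinate from the MDS solution.
    """
    return {frozenset(pair) for pair in combinations(coords, 2)
            if math.dist(coords[pair[0]], coords[pair[1]]) <= criterion}

# Sweep the candidate criteria; each prediction set would then be scored against the
# observed confusions and the best-scoring criterion kept for that participant.
criteria = [c / 2 for c in range(1, 11)]  # 0.5, 1.0, ..., 5.0
```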
We computed signal detection indices of predictive performance as follows. A ‘signal’ was defined as an observed ID
confusion error (i.e. an actual confusion made by the participant during the ID confusion task). All other item pairs were
considered ‘noise’. In other words, observed confusions were treated as a signal that we tried to predict using our MDS-
distance-based detector. Other potential (but not observed in the data) confusions were treated as noise since these could
potentially be predicted as ‘signal’ by our algorithm (i.e. they could result in false alarms). A ‘hit’ was defined as a signal
that was predicted by our algorithm, and a false alarm was an ID confusion that was predicted by our algorithm but not
actually committed by the participant during the ID task.
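Given a set of predicted pairs and the set of pairs a participant actually confused, this bookkeeping reduces to a few lines. The clamping of extreme proportions (to avoid infinite z-scores when a rate is exactly 0 or 1) is our assumption, since the paper does not say how such cases were handled, and the function name is hypothetical.

```python
from statistics import NormalDist

def d_prime_for_predictions(predicted, observed, n_pairs, eps=0.005):
    """Compute d' for a confusion-prediction method.

    predicted / observed: sets of item pairs; n_pairs: total candidate pairs.
    Hits are observed confusions that were predicted; false alarms are predicted
    confusions the participant never actually committed.
    """
    hits = len(predicted & observed)
    false_alarms = len(predicted - observed)
    hit_rate = hits / len(observed)
    fa_rate = false_alarms / (n_pairs - len(observed))
    z = NormalDist().inv_cdf
    clamp = lambda p: min(max(p, eps), 1 - eps)  # keep z finite at 0 or 1
    return z(clamp(hit_rate)) - z(clamp(fa_rate))
```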
To facilitate comparison, we also used this method to compute signal detection measures for the card-sort and MDS-
features procedures. For the card-sort method, items appearing together in a pile predicted a confusion error (a ‘hit’ if the
predicted error also happened to be an observed confusion in the data; a ‘false alarm’ otherwise). For the MDS-features
method, pairs with at least one of two dimensions close to matching on apparent feature ratings (differing by no more than
one; described above) predicted a confusion error (these, too, were classified as hits or false alarms). All signal detection
analyses were carried out at the individual participant level, and aggregate results presented below are based on individual-
level outcomes.
To increase the reliability of each method’s predictive power, we base these comparisons on participants who
committed at least three confusion errors in the identification task. Table 1 shows the d′, hit rate and false alarm rate for
each method.
There was an advantage for the MDS-based approaches over the card-sort method in terms of discriminability (d′) and
hit rate. Average d′ was significantly higher for the MDS-distance method than for the MDS-features method, t(18) = 3.09,
p = 0.003, as well as higher than for the card-sorting method, t(18) = 3.498, p = 0.001. Although d′ was higher for MDS-
features than for card-sort, this difference was not statistically significant, t(18) = 0.93, p = 0.363. Both MDS-distance and
MDS-features had significantly higher hit rates than card-sort, t(18) = 4.29, p < 0.001 and t(18) = 4.49, p < 0.001,
respectively. MDS-distance and MDS-features hit rates did not significantly differ from each other, t(18) = 0.245,
p = 0.405.
On the other hand, the card-sort method produced the lowest false alarm rates. MDS-features resulted in significantly
more false alarms than card-sort, t(18) = 8.43, p < 0.001, as did MDS-distance, t(18) = 4.32, p < 0.001. The MDS-
distance method provided an intermediate level of false alarms: significantly lower than those from the MDS-features
approach, t(18) = 1.76, p = 0.048, but still higher than the card-sort approach.
Table 1. Summary of signal detection analysis on prediction of ID confusions by method.
Method          d′              Hit rate        False alarm rate
Card-sort       0.556 (1.541)   0.450 (0.274)   0.270 (0.163)
MDS-features    0.974 (1.174)   0.819 (0.231)   0.771 (0.185)
MDS-distance    1.816 (0.672)   0.836 (0.302)   0.624 (0.353)
In summary, the MDS-based approaches can lead to substantial gains in hit rate, but this trades off with an increase in
false alarm rate. However, the MDS-distance method provides some relief from this problem over the MDS-features
approach. The overall level of discriminability, d′, which takes both hit and false alarm rates into account, clearly favours
the MDS-distance method over the others.
Correlation between prediction methods
Another way to examine the relationship between prediction methods is through their correlation with each other. We
computed pairwise correlations between the methods on measures of d′, hit rate and false alarm rate. None of the
correlations reached statistical significance (p > 0.05 in all cases), but the trends were nevertheless consistent
with what we might expect based on the results described above. Based on d′, MDS-features and MDS-distance were closer
to each other than to card-sorts. There was a weak correlation between MDS-features and MDS-distance, r(17) = 0.27, but
little correlation between MDS-features and card-sorts, r(17) = 0.02, or between MDS-distance and card-sorts,
r(17) = 0.17. Similarly, for hit rates, the strongest correlation was between MDS-distance and MDS-features,
r(17) = 0.39, with virtually no correlation between MDS-features and card-sorts, r(17) = 0.002, or MDS-distance and
card-sorts, r(17) = 0.08. However, for false alarm rates, the relationship between the methods was more equivocal. There
were weak correlations between MDS-features and MDS-distance, r(17) = 0.20, and between MDS-distance and card-sorts,
r(17) = 0.21. There was very little relationship between MDS-features and card-sorts, r(17) = 0.11. These relationships
suggest that the strongest predictive relationships are between the MDS-based measures on d′ and hit rate.
Discussion
In this study, we compared methods for predicting identification confusion errors and for understanding the mental
representation underlying these confusions. Each participant completed a series of tasks, including a pre-training card-sort,
identification training and testing, a post-training card-sort and finally a pairwise-similarity-rating task. The card-sort and
similarity-rating tasks provide separate indicators of participants’ mental representation of the similarities and differences
between items in the identification-training stimulus set. Our hypothesis was that participants focus on a subset of features
during training and that we can use this tendency to predict and explain confusion errors. In doing so, we may discover a
basis for directing learners’ attention towards critical stimulus features and away from superfluous details of training items.
The ID training task was designed to mimic features of self-paced study with a set of training images (e.g. a deck of
training cards). We compared two training conditions: a passive 'observational' training condition in which participants
studied tanks and their labels together, and a more actively engaging ‘feedback’ training condition in which identification
attempts were followed by corrective feedback. We observed some differences between training types – including a lower
error rate and greater correspondence of confusions with card-sort categories in the feedback condition. Although these
differences are interesting and might warrant further study, our primary focus was on understanding and predicting
confusion errors rather than comparing training methods.
Predicting ID confusions
There was clear evidence that subsets of features were prominent in the minds of participants. Attention was often given to
important features such as body style and weapons. In many cases, however, participants were influenced by the presence of
antennas or other peripheral attachments which may not provide a reliable guide to identification of armoured vehicles in
operational environments.
We found that the card-sort method, which is straightforward to implement, accounted well for identification errors,
predicting about 45% of observed confusions. However, the MDS-based methods proved to be much more sensitive. The
MDS-features method, which combined the spatial representation of similarity judgements with regression-based
interpretation of the most prominent psychological dimensions, predicted about 82% of observed confusions. And the
MDS-distance method, which omits any psychological interpretation, predicted about 84% of confusions. On the other
hand, both MDS methods predict a higher false alarm rate than the card-sort method (although this problem was less severe
in the MDS-distance case). If the goal is to root out ID confusions, the MDS approaches seem to be worth exploring in more
detail.
It is important to point out that the methods applied here are preliminary, and that better predictive performance could
be achieved with the MDS-based methods by allowing higher dimensional representations (we limited our MDS solutions
to two dimensions in this study to simplify interpretation). Also, the MDS-distance method could be improved slightly by
using a parameter optimisation algorithm to find the most predictive distance criterion for each participant. In order to gain a
clearer picture of the trade-off between hit and false alarm rates for each method, we would likely need a study with many
more repetitions of the identification test trials (to produce more identification confusions). This would make the
identification task a more sensitive measure of mental representation. It is also known that small training sets can lead to
different learning strategies than large training sets (Rouder and Ratcliff 2004). Nevertheless, our goal here was
exploratory. We were able to demonstrate the feasibility of interpreting psychological representation and predicting
identification errors in a variety of ways.
Another prediction approach worth exploring would be to apply models from the literature on classification learning
(Pothos and Wills 2011). One thing we have not explored in the analyses reported here is whether participants
unintentionally develop mental categories (in addition to memorising individual identities) for the training items based on
similarity or whether they simply apply rules along a small subset of dimensions. If mental organisation of training items
corresponds to application of simple dimensional rules (e.g. tanks with armour-covered treads), then learners may memorise
fewer features of the training items than if their representation is based on memories for a set of features for each training
exemplar. Determining their actual strategy would likely require fitting computational models to the data.
Future research
There are of course additional avenues for future research. For example, eye tracking has been used in identification-
training studies to answer questions similar to our own about the focus of attention during learning (e.g. Lee et al. 2013).
It would be valuable to see whether eye tracking results corroborate our behaviour-based conclusions regarding the features
that drive performance.
Another important question pertains to the contribution of individual differences. Attention to certain details may be
predictable from personal preferences, personality type or experiences that learners have had. The current work did not
assess these variables. Furthermore, it will be important to understand any differences between learning performance of
novices (as evaluated here) and those who have already received some form of training in vehicle identification.
Another important consideration is the possibility that old/new recognition, rather than identification based on
memorising sets of features, influences what learners remember and respond to. Some errors might be due more to a vague
sense of recognition than reasoning based on attention to specific characteristics.
Application to adaptive training
Both the card-sort and the MDS-based methods could provide the basis for future adaptive training systems. However,
although card-sort data are simple to collect, their analysis and interpretation are much more subjective than the similarity-
rating-based MDS methods. The card-sort method might also be difficult to implement in a computer-mediated training
system. On the other hand, the similarity-based methods lend themselves much more readily to automation. It is easy to
envision an adaptive training system that teaches and tests item identification (tanks or otherwise) along with a system for
collecting similarity ratings for the purpose of tailoring subsequent training sessions to optimise attention allocation.
Such an approach makes sense given the ubiquity of hand-held computing devices (e.g. cell phones). Mobile devices
can increase the realism of training stimuli and allow interactive learning, in addition to real-time adaptive capabilities
based on the methods reported here. The system could improve training by directing attention away from features that are
unreliable or unimportant and towards features that are critical for accurate vehicle identification. Furthermore, such an
evaluation system would easily integrate into efforts at determining the ideal form-factor for training items. For example,
ongoing research focuses on investigating differences between static images, movies and 3D interactive virtual models for
training (Keebler, Jentsch, and Hudson 2011; Keebler, Jentsch, and Schuster, 2013).
Mobile device training systems such as the Army’s ROC-V (Recognition of Combat Vehicles) training programme
allow users many options for studying and testing on trained vehicles (Night Vision and Electronic Sensors Directorate
2013). By incorporating a pairwise comparison or sorting task like those evaluated here, and analysing the results using our
MDS-based approach, the system could be tuned to (1) shorten training by focusing learner attention on routinely confused
items and (2) ameliorate potential errors by emphasising dimensions that are critical for accurate identification.
Finally, future research will need to put such an adaptive training method to the test with respect to real training
outcomes. The system would need to model learner performance in real time, and compare performance with more
traditional methods.
References
Biederman, I., and M. M. Shiffrar. 1987. "Sexing Day-Old Chicks: A Case Study and Expert Systems Analysis of a Difficult Perceptual-Learning Task." Journal of Experimental Psychology: Learning, Memory, and Cognition 13: 640-645.
Borg, I., and P. Groenen. 2005. Modern Multidimensional Scaling: Theory and Applications. 2nd ed. New York: Springer-Verlag.
Briggs, R. W., and J. H. Goldberg. 1995. "Battlefield Recognition of Armored Vehicles." Human Factors 37 (3): 596-610.
Edwards, P. J., F. Sainfort, T. Kongnakorn, and J. A. Jacko. 2006. "Methods of Evaluating Outcomes." In Handbook of Human Factors and Ergonomics, 3rd ed., 1150-1187. Hoboken, NJ: Wiley.
Folstein, J. R., I. Gauthier, and T. J. Palmeri. 2010. "Mere Exposure Alters Category Learning of Novel Objects." Frontiers in Psychology 1: 40.
Keebler, J. R., M. Harper-Sciarini, M. Curtis, D. Schuster, F. Jentsch, and M. Bell-Carroll. 2007. "Effects of 2-Dimensional and 3-Dimensional Media Exposure Training on a Tank Recognition Task." Proceedings of the 51st Annual Meeting of the Human Factors and Ergonomics Society, Baltimore.
Keebler, J. R., F. Jentsch, and I. Hudson. 2011. "Developing an Effective Combat Identification Training." Proceedings of the 55th Annual Meeting of the Human Factors and Ergonomics Society, Las Vegas.
Keebler, J. R., F. Jentsch, and D. Schuster. 2013. "The Effects of Video Game Experience and Active Stereoscopy on Performance in Combat Identification Tasks." Submitted for publication.
Keebler, J. R., L. Sciarini, T. Fincannon, F. Jentsch, and D. Nicholson. 2008. "Effects of Training Modality on Target Identification in a Virtual Tank Recognition Task." Proceedings of the 52nd Annual Meeting of the Human Factors and Ergonomics Society, New York.
Keebler, J. R., L. Sciarini, T. Fincannon, F. Jentsch, and D. Nicholson. 2010. "A Cognitive Basis for Vehicle Misidentification." In Human Factors Issues in Combat Identification, edited by D. H. Andrews, R. P. Herz, and M. B. Wolf, 113-128. Burlington, VT: Ashgate.
Kemler Nelson, D. G. 1984. "The Effect of Intention on What Concepts Are Acquired." Journal of Verbal Learning and Verbal Behavior 23 (6): 734-759.
Kirkpatrick, D. L., ed. 1975. "Techniques for Evaluating Training Programs." In Evaluating Training Programs. Alexandria, VA: ASTD.
Kruskal, J. B., and M. Wish. 1978. "Multidimensional Scaling." In Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-011. Beverly Hills: Sage.
Lee, C., E. Middleton, D. Mirman, S. Kalenine, and L. J. Buxbaum. 2013. "Incidental and Context-Responsive Activation of Structure- and Function-Based Action Features during Object Identification." Journal of Experimental Psychology: Human Perception and Performance 39 (1): 257-270.
Malone, T. W. 1981. "Toward a Theory of Intrinsically Motivating Instruction." Cognitive Science 4: 333-369.
Markman, A. B., and V. S. Makin. 1998. "Referential Communication and Category Acquisition." Journal of Experimental Psychology: General 127 (4): 331-354.
Night Vision and Electronic Sensors Directorate. 2013. Army ROC-V (Version 1.0) [Mobile application software]. https://play.google.com/store/apps/details?id=gov.usa.rocv
O'Kane, B. L., I. Biederman, E. E. Cooper, and B. Nystrom. 1997. "An Account of Object Identification Confusions." Journal of Experimental Psychology: Applied 3 (1): 21-41.
Pothos, E. M., and A. J. Wills, eds. 2011. Formal Approaches in Categorization. Cambridge: Cambridge University Press.
Regan, G. 1995. Blue on Blue: A History of Friendly Fire. New York: Avon Books.
Rouder, J. N., and R. Ratcliff. 2004. "Comparing Categorization Models." Journal of Experimental Psychology: General 133: 63-82.
Smith, E. E. 2008. "The Case for Implicit Category Learning." Cognitive, Affective, & Behavioral Neuroscience 8 (1): 3-16.
Yamauchi, T., and A. B. Markman. 1998. "Category Learning by Inference and Classification." Journal of Memory and Language 39: 124-148.