THE RELATIONSHIP OF PERCEIVED INSTRUCTOR PERFORMANCE
RATINGS AND PERSONALITY TRAIT CHARACTERISTICS
OF U.S. AIR FORCE INSTRUCTOR PILOTS
by
JOHN DOUGLAS GARVIN, B.S., M.A.
A DISSERTATION
IN
HIGHER EDUCATION
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
DOCTOR OF EDUCATION
Approved
May, 1995
ACKNOWLEDGMENTS
I would like to express my gratitude to Dr. Ron Opp for
his extraordinary support and guidance throughout my
doctoral studies. I would like to also acknowledge Ms.
Barbi Dickensheet at the Graduate School for her support and
flexibility in the coordination of this document.
My appreciation is also extended to my Air Force
comrades in the Behavioral Sciences and Leadership
Department at the U.S. Air Force Academy. They have covered
classes, proofed, and encouraged my doctoral work while I
was in absentia from Texas Tech.
Special acknowledgment goes to my children, Samantha, Ross,
Austin, and Jacob, for all the times we couldn't play
together. Somehow they seemed to understand. My deepest
appreciation goes to my wife, Julie. She has been, and always
will be, the wind beneath my wings.
TABLE OF CONTENTS
ACKNOWLEDGMENTS ii
ABSTRACT vi
LIST OF TABLES ix
CHAPTER
I. INTRODUCTION 1
Background 5
Statement of the Problem 8
Purpose Statement 8
Significance of the Study 9
Thesis Statement 11
Assumptions 12
Research Questions 12
Overall Research Questions 12
Hypotheses 13
Limitations of the Study 14
Delimitations 15
Terms and Definitions 16
Summary 20
II. REVIEW OF THE LITERATURE 22
Personality Theory 22
Background 24
Type Theory 30
Trait Theory 35
Aviation Personality Research 39
Early Military Development 39
Recent Renewed Interest 43
The Personality Characteristics
Inventory 45
The Big Five 49
Job Performance Assessment 52
Aviation 52
Performance Criterion 56
Performance Rating 60
Higher Education 61
Summary 65
III. METHODOLOGY 67
Research Design 67
Scope of the Study 70
Subjects 71
Instrumentation 73
Performance Measurement 74
Personality Assessment 77
Demographic Data 81
Research Procedures 82
Variables 85
Statistical Analysis 87
Research Concerns 89
Significance for Policy and Theory 90
IV. ANALYSIS OF DATA AND DISCUSSION 92
Performance Ratings 97
Personality Trait Measures 110
Demographic Measures 120
Summary 127
V. SUMMARY, CONCLUSIONS, DISCUSSION
AND RECOMMENDATIONS 129
Summary of the Study 130
Conclusions 132
Discussion and Implications 133
Perceived Performance 133
Personality 139
Demographics 144
Observations 147
Implications for Higher Education 149
Recommendations 151
Recommendations for Instructor Pilots .. 151
Recommendations for Future Research ... 152
Conclusions 153
REFERENCES 157
APPENDIX
A. TESTING INSTRUMENTS 172
B. LETTERS OF COORDINATION 183
C. PCI CONSTRUCT COMPOSITION 186
D. DATA ANALYSES TABLES 191
ABSTRACT
This research furthers the field of knowledge in the
use of personality trait theory with aircrew classification
and training. It was an exploratory study in the use of
personality trait characteristics and demographic background
characteristics to predict perceived instructor pilot
performance effectiveness. Performance effectiveness was
measured using a 360-degree performance rating technique, a
process which includes perceived instructor effectiveness
appraisals from three distinct groups: students, peer-
instructors, and supervisors. Three stepwise regression
equations were developed to predict perceived instructor
pilot performance using: personality traits, demographic
variables, and a combination of personality traits and
demographic variables. The subjects included 152 U.S. Air
Force Air Training Command instructor pilots from two
undergraduate pilot training bases. Cluster sampling of
entire flights (classrooms) was employed to obtain
comprehensive performance assessment for each instructor. A
typical instructor's performance was rated by 15 students, 8
peers, and one supervisor. A total of 423 students and 19
supervisors participated. This constitutes approximately
35% of the population of U.S. Air Force Undergraduate
Instructor Pilots. Performance appraisal criteria included
seven dimensions identified through a pilot study: Job
Competence-Knowledge, Job Competence-Performance, Job
Competence-Performance under Pressure, Leadership, Teamwork,
Personality, and Communication Skills. The performance
assessment instrument was a modified version of the NASA/UT
Astronaut Assessment Survey. Personality traits were
measured with the Personality Characteristics Inventory
(PCI). The first assessment established the validity of the
performance appraisal criteria. The various rating groups
evaluated the appropriateness of each performance criterion
scale on the NASA/UT performance assessment instrument for
instructor pilot applicability. All groups agreed or
strongly agreed on all performance scale applicability.
Regression results using multiple stepwise regression
accounted for 5% of the variance in the personality-trait-only
equation with two significant variables: Negative
Communion (β = -.16) and Impatience/Irritability (β = -.17).
The demographic equation accounted for 11% of the variance
with two significant variables: Number of Children (β = .22)
and Military Rank (β = .24). The combined regression equation
accounted for 14% of the variance and included three
variables: Number of Children (β = .22), Military Rank
(β = .23), and Verbal Aggression (β = -.19). Although the
prediction portion of the research resulted in marginal
findings, the performance appraisal portion was very
successful. All rating groups identified the new
performance appraisal criteria as good to very good. The
360-degree rating technique was well received with many
instructor pilots reporting eagerness for this type of
unique feedback. The implications of this study include the
contribution and development of a new performance appraisal
method for instructor pilots that is more comprehensive and
insightful. Additionally, personality research in aviation
is further explored. Future research should continue the
performance prediction design investigation by applying the
new Big Five personality assessment measure and by studying
specific
LIST OF TABLES
1. NASA/UT Performance 76
2. PCI Constructs 79
3. AETC Instructor Pilot Demographics 94
4. AETC Instructor Pilot
Flying Experience 95
5. Perceived Performance Ratings 105
6. Group Ratings Correlation
Comparisons 109
7. Personality Trait Comparison 112
8. Correlation Values of Group Performance Rating and Personality Traits 114
9. Personality Predictors of Overall Performance 118
10. Correlation Values of Group Performance Rating and Demographics 121
11. Demographic Predictors of Overall Performance 124
12. Combined Demographic and Personality Trait Predictors of Overall Performance 126
13. Summary of One-way ANOVA Between Rating Groups for Performance Scale Appropriateness 192
14. Summary of One-way ANOVA Between Rating Groups for Perceived Instructor Pilot Performance 193
15. Intercorrelations Between Subscales for Instructor Pilot Personality Traits 194
16. Intercorrelations Between Demographic Variables for Instructor Pilots 195
CHAPTER I
INTRODUCTION
One of the most vital and costly higher education
programs in the U.S. military is Undergraduate Pilot
Training (UPT). This twelve-month, graduate-level course is
an extensive and rigorous training program that transforms a
college graduate into a military jet pilot. The curriculum
is highly technical and delivered at a rapid pace. The
application of newly learned skills is practiced and
evaluated daily in an unforgiving flying environment that
constantly threatens the safety of the student and
instructor. If the student is unable to learn required
skills in a timely fashion or unable to maintain the
demanding pace of the technical curriculum, he is eliminated
from training.
Each student elimination from pilot training directly
impedes the program's ability to produce the needed military
pilot quota, and compromises the program's cost
effectiveness. Siem (1988) estimated each student failure
in the U.S. Air Force UPT program costs taxpayers $65,000 to
$80,000. This becomes increasingly important as student
pilot attrition rates in the U.S. Air Force reach the
current level of approximately 20% (HAF-DPP-A, 1992). This
attrition rate equates to approximately 200 U.S. Air Force
pilot candidate failures annually, with an overall economic
loss of around 14 million dollars. Attrition clearly
affects the efficiency of the Air Force in carrying out its
mission to train pilots.
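The figures above can be checked with rough arithmetic (the $70,000 per-failure cost used here is an assumed midpoint of Siem's $65,000 to $80,000 range):

```python
# Rough arithmetic behind the attrition cost cited above.
# The per-failure cost is an assumed midpoint of Siem's (1988)
# $65,000-$80,000 estimate; the failure count follows from the
# ~20% attrition rate reported by HAF-DPP-A (1992).
failures_per_year = 200
cost_per_failure = 70_000  # assumed midpoint
annual_loss = failures_per_year * cost_per_failure
print(annual_loss)  # 14000000, i.e., roughly $14 million
```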
The historical and continuing effort to reduce
attrition rates in aviation has focused on student pilot
selection. A recent Air Force commissioned study underlines
the selection screening emphasis:
High training costs associated with attrition rate at Undergraduate Pilot Training (UPT), and the impending transition from a single-track UPT system to a multi-track Specialized Undergraduate Pilot Training (SUPT), have underscored the need for improving methods of selection and classifying pilot trainees. (Kantor & Carretta, 1988, p. 14)
The selection process attempts to screen candidates in
physical, intellectual, and emotional areas. The physical
aspect requires 20/20 eyesight and an overall physical
conditioning that will withstand the stresses inherent in a
high performance aircraft. The intellectual aspect attempts
to identify those who have both the aptitude for technical
learning and the fine motor coordination that complements
required cockpit skills. The intellectual assessment is
perhaps the most effective predictor in the screening
process (R. Davis, 1989). Researchers from the Air Force
Human Resources Laboratory (AFHRL) have found a modest
correlation (r=.31) between flying aptitude (intelligence)
as defined and measured by the Air Force Officer Qualifying
Test (AFOQT), and success in UPT (Borderlon & Kantor, 1986).
The weakest and most subjective measure in the
selection screening process is the personality measure.
Recent studies have attempted to apply personality theory in
the pilot screening process to help improve selection, but
have failed to yield significant results (R. Davis, 1989).
Part of the reason for previous personality measure failures
is the lack of an aviation-specific personality tool.
Despite the intensive screening process that has
evolved over the past 50 years in military aviation, there
still remains a substantial 20% student pilot attrition
rate. The emphasis in reducing student-pilot attrition
rates continues to focus on the up-front process of
selecting the candidate before any costs are invested into
their training. This may be the reason that progress in
lowering attrition has stagnated for the past 14 years. It
is the opinion of this researcher that an environmental
aspect in UPT, specifically perceptions of instructor
effectiveness, will provide better potential in reducing
future student pilot attrition rates. Bowers (1958)
indicated in his report on factors related to achievement in
instructor's training syllabus that there was a wide
divergence among a group of instructors with respect to
experience and effectiveness. If the quality of instruction
increases, perhaps there will be fewer students that wash
out during training. Although student pilot attrition is
not measured in this study, the present research furthers
the effort to develop personality trait measures in
aviation, targeting those chosen to be instructor pilots
rather than student pilot screening. By identifying the
personality characteristics of successful instructor pilots,
rather than focusing solely on student selection, student
pilot attrition may be decreased through identifying behavior
that is perceived as increasing instructor performance.
According to R. Davis (1989), there are three
alternative approaches to solving student pilot attrition
problems other than selection screening: (1) increase
candidate selection to compensate for an expected attrition
rate, (2) lower training standards, or (3) improve the
instructional process and instructor effectiveness.
Increasing the number of candidates may solve the pilot
production quota requirement, but it does not promote cost-
effectiveness. This approach would increase the number of
failures which would result in an even higher per capita
cost of each graduate. Davis' second alternative, lowering
training standards, simply reduces the quality of the
graduating pilot and compromises safety concerns. Both of
these alternatives are unacceptable to the nation's military
and the overall aviation industry. The final alternative,
improving the instructional process and instructor
effectiveness, is the most plausible. This study attempts
to improve the instructor effectiveness in UPT by
identifying a new instructor performance measurement
instrument and by identifying instructor pilot personality
traits that complement perceived effectiveness from various
rating groups of students, peer-instructors, and
supervisors.
Previous aviation-related research has defined the
instructional process as consisting of three basic factors:
the student's capacity for learning, the course syllabus,
and the instructor's teaching ability. Although all of
these factors overlap, they are generally assessed
individually. The student's learning capacity is assessed
during the application and selection process. Only those
students with proven ability in technical curricula and
flying aptitude are admitted. The training syllabus is
currently under drastic revision to better incorporate the
introduction of new types of training aircraft, the T-1 and
AT-38. The redesigned syllabus will produce a new operating
policy in conducting advanced training, but will have little
impact on the basic phase of training (where over 80% of
attrition occurs) (Siem, 1988). The final aspect of the
instructional process, the instructor's teaching ability and
effectiveness, appears to be the most neglected factor in
recent years and is the focus of this research.
Background
Undergraduate Pilot Training (UPT) instructor pilots
are selected by Air Education Training Command (AETC)
Headquarters, Randolph AFB, San Antonio, Texas. The
instructor pilot (IP) population is composed of two distinct
types of instructors: the First Assignment IP (FAIP), and
the Major Weapons System (MWS) pilot. A FAIP is unique in
military pilot training to the Air Force and they
historically represent the majority of the instructor pilot
population. FAIPs are recent graduates from UPT with no
operational experience. Most have approximately 200 hours
total flying time and are newly commissioned officers.
Generally, FAIPs perform within the top 25% of their pilot
training class. Currently, Air Education Training Command
is drastically changing the instructor pilot population
composition. It is reducing the number of FAIP instructors
in favor of MWS pilots. The latest command target figure
for instructor composition is 95% MWS instructors and 5%
FAIPs by the year 2000 (Barone, 1993).
MWS candidates possess years of operational flying
experience. They must submit an application package to
Headquarters AETC to be considered for an instructor pilot
position. A selection board reviews the applications and
selects, not necessarily the most qualified, but rather the
most eligible (Barone, 1993). Eligibility is based on
possessing the minimum requirements and the maximum time on
station. Once a candidate is selected, FAIP or MWS, they
are sent to Randolph AFB for three months of Pilot
Instructor Training (PIT).
PIT trains a pilot in specific aircraft systems,
introduces the student training syllabus, and builds a
minimum proficiency in instructor flying skills. Of the
entire course, only two hours are devoted to communication
and human-relations skills, and no training addresses
student learning or teaching methods. The neglect of
personal communication skills and instruction methods during
instructor training dilutes instructor effectiveness and is
further compromised by the attitudes of many of the future
instructors. Many of the instructor pilots appear resentful
being assigned instructor duty and only applied because it
was the final opportunity to remain in a flying position.
They have left, or are forced out of, more attractive flying
positions and offered a UPT instructor job as a last chance
to serve in a flying role. Their lack of commitment to the
student may be reflected in poor instruction.
While recently serving as an instructor, this
researcher noticed that many of the "non-volunteers" became
effective instructors, and many of the "volunteers" were
non-effective instructors. Motives, attitudes, and
personality often distinguished between the effective and
non-effective instructors. Many of the perceived best
instructors seemed to be "naturals" exhibiting specific and
similar personality traits. This study investigates the
personality profiles of instructor pilots and identifies
the relationship between perceived effectiveness and
personality traits that complement performance ratings.
Statement of the Problem
Military student pilot attrition rates during
undergraduate pilot training remain high, costing taxpayers
millions of dollars each year. Past and current military
efforts to control student attrition focus on student
selection screening; however, student attrition rates have
stagnated near 20% for the past 14 years (Davis, 1989).
Previous research suggested that instructor pilot
performance may provide further control over student
attrition (W. Davis, 1990). This study proposes that more
effective instructor pilots possess a certain personality
profile that is more conducive and accommodating to the
instructor role. Therefore, personality trait research
should be redirected from pilot candidate selection to
instructor pilot classification and placement.
Purpose Statement
The purpose of this study is to identify a new
instructor performance assessment instrument and to examine
the relationship between the personality trait
characteristics of Air Force Instructor Pilots and their
perceived performance ratings as determined by three groups:
students, peer instructors, and supervisors. The three
rating groups provide a global performance assessment
(officership, flying, and instructional). The perceived
validity of a new performance appraisal instrument, the
NASA/UT Astronaut Assessment Survey, is explored, along with
a novel 360-degree performance rating technique that
develops performance feedback from the multiple perspectives
of students, peers, and supervisors. The study additionally
develops a demographic model predicting perceived
performance effectiveness of instructor pilots, and a
prediction equation combining personality traits with
demographic factors.
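The prediction equations described above are built with stepwise regression. A minimal forward-selection sketch follows; it is illustrative only, the variable names are hypothetical, and the simple correlation-based entry criterion stands in for the exact statistical-package procedure a study like this would have used:

```python
import numpy as np

def forward_stepwise(X, y, names, max_vars=3, min_r=0.15):
    """Greedy forward selection: repeatedly enter the predictor most
    correlated with the current residuals, then refit. Sketch only."""
    selected = []
    resid = y - y.mean()
    for _ in range(max_vars):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if not remaining:
            break
        corrs = [abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in remaining]
        if max(corrs) < min_r:  # nothing left worth entering
            break
        selected.append(remaining[int(np.argmax(corrs))])
        # Refit with an intercept plus all entered predictors.
        Xs = np.column_stack([np.ones(len(y)), X[:, selected]])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    return [names[j] for j in selected]

# Hypothetical predictors for a sample of 152 instructors.
names = ["military_rank", "num_children", "verbal_aggression", "flying_hours"]
rng = np.random.default_rng(1)
X = rng.normal(size=(152, len(names)))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=152)
print(forward_stepwise(X, y, names))
```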
Significance of the Study
This research contributes to two higher education issues
facing the aviation industry: it provides a starting point
for exploring a possible alternative solution to reducing
student pilot attrition with a new instrument to assess
instructor pilot performance, and it adds new knowledge to
the application of personality theory to instructor
selection and performance assessment.
Current efforts to lower student pilot attrition are
predominately focused on candidate selection. The
overemphasis of a single input variable in the learning
process is an incomplete analysis that ignores other vital
input factors which may affect failures. Astin (1991)
defines inputs as "referring to those personal qualities the
student brings initially to the educational program
(including the student's initial level of developed talent
at the time of entry)" (p. 17). This study uniquely
investigates a key twist to Astin's definition of input by
investigating the input variable of the instructor.
Educational input qualities of the instructor pilot appear
ignored in UPT. Additionally, the outcome variable of
instructor effectiveness also appears neglected. By
exploring the input variable of instructors and correlating
these characteristics to an outcome variable of perceived
instructor performance, a better understanding will be
developed. This could lead to improvements in the
instructional process and decrease student attrition.
The second significant contribution of this study is
its application of personality theory to instructor
selection. Studies are quite extensive in projective
personality testing of varying groups to assure quality
selection (Mischel, 1968). However, little research has
been aimed at flight instructor classification and desirable
personality types in relation to perceptions of effective
instruction (W. Davis, 1990). The literature is very
limited and lacks empirical evidence in identifying a
preferred type for the role of instructor pilots.
Performance criteria have yielded either weak effects,
equivocal results, insufficiently studied relationship
variables, or inexplicable findings from cross-validation
(Joaquin, 1980). This study establishes new performance
criteria and further contributes to the development of
personality trait theory applied specifically for aircrews.
Unfortunately, there is neither an agreed upon definition of
"effective teaching" nor any single, all-embracing
criterion. However, according to Cross, students do a
pretty good job distinguishing among teachers on the basis
of how much they have learned (Cross, 1988). Perceived
effectiveness is further strengthened by incorporating
perceived ratings from multiple groups (e.g., students,
peer-instructors, and supervisors). This study uniquely
utilizes perceived performance ratings from multiple groups
to establish an overall performance assessment rating. The
triangulation of perceived performance rating from various
groups also adds a form of cross-validation to the perceived
ratings.
The results will provide a new tool for instructor
performance appraisal and a predictive model that may be
used in building future constructs for selection, training,
and performance measurement of military instructor pilots.
It will also further develop the credible use of personality
theory for aviation selection and classification.
Thesis Statement
The principal investigator contends that instructor
pilots with various levels of perceived performance ratings
have common personality trait profiles. These profiles can
be identified with self-reporting personality inventories
and perceived performance ratings from students, peer
instructors, and supervisors.
Assumptions
The following is a list of assumptions used in this
study.
1. Increased effectiveness of instructor pilots will
lead to decreases in student pilot attrition (R. Davis,
1989).
2. Self-reported personality trait scores are
accurate reflections of true personality trait scores
(Digman, 1990).
3. Overall perceived instructor performance may be
represented by a uniform weighted rating that combines
scores from three direct customer groups: students, peer
instructors, and supervisors (Marsh, 1987).
4. Demographic variables are not significantly
interactive with personality trait measures, as determined
by Hogan (1977) and Conley (1985).
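Assumption 3 can be made concrete with a small sketch. The weights below are hypothetical placeholders; the study's actual weighting scheme is described in Chapter III:

```python
# Illustrative sketch of Assumption 3: a single overall rating
# combining the three rating groups. The (0.4, 0.4, 0.2) weights
# are hypothetical, not the study's actual scheme.
def overall_rating(student_scores, peer_scores, supervisor_score,
                   weights=(0.4, 0.4, 0.2)):
    w_stu, w_peer, w_sup = weights
    student_mean = sum(student_scores) / len(student_scores)
    peer_mean = sum(peer_scores) / len(peer_scores)
    return w_stu * student_mean + w_peer * peer_mean + w_sup * supervisor_score

# A typical instructor in this study: 15 students, 8 peers, 1 supervisor.
print(overall_rating([4.0] * 15, [3.5] * 8, 4.5))  # approximately 3.9
```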
Research Questions
Overall Research Questions
There were two research questions explored in this
study.
1. Do perceived effectiveness ratings establish a
valid assessment of instructor pilot performance?
2. Are personality traits predictive of perceived
instructor performance ratings?
Hypotheses
The research questions were further refined to the
following testable hypotheses.
1. There will be no difference in the appropriateness
ratings of the seven scales from the NASA/UT Astronaut
Assessment Instrument between the three rating groups.
2. There will be no difference in perceived
performance ratings of instructors between students, peer
instructors, supervisors, and self.
3. There will be a significant relationship between
perceived effectiveness ratings of instructor pilots at UPT
and the following personality trait scale scores:
instrumentality, expressivity, mastery, work,
competitiveness, achievement striving.
4. There will be a significant relationship between
the following personality trait scale scores and perceived
effectiveness ratings of instructor pilots at UPT: negative
instrumentality, verbal aggression, impatience/irritability,
negative communion.
5. Personality traits can be used to create a
predictive profile of perceived instructor pilot
performance.
6. Demographic characteristics can be used to create
a predictive profile of perceived instructor pilot
performance.
7. Personality traits and demographic characteristics
can be used to create a predictive profile of perceived
instructor pilot performance.
Limitations of the Study
The limitations to this study include the cluster
sampling technique and the personality assessment
instrument, the Personality Characteristics Inventory (PCI).
The most recent Air Force studies by Pedersen, Allan, Laue,
and Johnson (1992) recommend that future personality
measures in aircrew selection should utilize the five-factor
theory, which is currently under development. Parallel
studies by the Naval Aviation Medical Laboratories are also
pursuing the development of a five-factor instrument. These
instruments are not yet developed and are years away from
being validated. Currently, the most validated and widely
accepted personality instrument in the aviation field for
aircrew assessment is the PCI.
The PCI was developed specifically for the aviation
field and modified to assess aircrew relationships. A
review of the literature found few studies that correlated
personality assessment with aviation instructor performance.
Because the PCI does measure inter-personal relationships
unique to the aviation industry, it represents the best
potential assessment tool for this pioneering application of
instructor pilot assessment. A final limitation is
measuring instructor performance with "perceived
effectiveness." Cohen (1981) determined a marginal
correlation (r=.47) between actual instructor performance
and perceived student rating of instructor performance in a
comprehensive meta-analysis of student course critiques.
Perceived effectiveness ratings of performance may not
provide an accurate representation of actual performance.
This study uses perceived performance ratings from multiple
groups (supervisors, students, peers) to provide a more
reliable indication of actual performance.
Delimitations
This study is limited to the performance assessment
factors measured by the seven scales on the NASA/UT
Astronaut Assessment Inventory: job competence-knowledge,
job competence-performance, job competence-under pressure,
leadership, teamwork, communication skills, personality.
Additionally, the study focuses only on U.S. Air Force
Instructor Pilots.
Terms and Definitions
360 degree Performance Feedback. A performance
appraisal technique that uses perceived performance feedback
from multiple groups that possess unique perspectives and
access to the subject's work behavior. The rating groups usually
include subordinates, supervisors, and peers.
Air Education Training Command (AETC). All U.S. Air
Force undergraduate flying training is conducted by this
command. Currently, there are four Undergraduate Pilot
Training bases that are regulated and controlled by AETC
Headquarters at Randolph AFB, Texas.
Achievement Striving. A cluster of characteristics
related to hard work, activity, and seriousness in
approaching work tasks ("How much does your job stir you
into action?" "Compared to others, how much work do you put
forth?") (Chidester, Helmreich, Gregory & Geis, 1991).
Competitiveness. A preference for tasks with clear
winners and losers and a desire to outperform others ("It
annoys me when other people perform better than I do.")
(Chidester et al., 1991).
Expressivity. A measure of interpersonal warmth and
sensitivity (gentle, kind, aware of the feelings of others)
(Chidester et al., 1991).
First Assignment Instructor Pilot (FAIP). A recent
graduate of undergraduate pilot training whose first
operational assignment is as an instructor for UPT.
Generally, a FAIP is a Second Lieutenant with a total of 200
hours military flying time and no exposure to operational
flying missions.
Flight. The Air Force structural unit of instructor
and student assignment. In this study a Flight is
synonymous with classroom. A Flight consists of
approximately 15 students, 10 instructors, and a supervisor.
Impatience/Irritability. ("How easily do you get
irritated?" "When a person is talking and takes too long to
come to a point, how often do you feel like hurrying the
person along?") (Chidester et al., 1991).
Instructor Pilot. A pilot qualified in a specific
training aircraft who has completed Pilot Instructor
Training. Generally, an IP is assigned two or three
students; however, the current ratio is 1:1.
Instrumentality. Refers to overall goal-orientation
and independence (active, self-confident, can stand up to
pressure) (Chidester et al., 1991).
Major Weapon System (MWS). An experienced pilot from
an operational flying background such as: fighters, bombers,
tankers, transports. Generally, MWS IPs are a Captain grade
with approximately 1000 total military flying hours.
Mastery. A preference for challenging tasks and
striving for excellence ("If I am not good at something, I
would rather keep struggling to master it than move on to
something I may be good at.") (Chidester et al., 1991).
Negative Communion. Self-subordinating, subservient,
or unassertive characteristics (gullible, spineless,
subordinates self to others) (Chidester et al., 1991).
Negative Instrumentality. Negative characteristics
reflecting arrogance, hostility, and interpersonal
invulnerability (boastful, egotistical, dictatorial)
(Chidester et al., 1991).
Personality Characteristics Inventory (PCI). The test
battery the NASA/UT Project found most useful in the
identification of meaningful subpopulations among aviators.
The PCI captures two broad trait dimensions: instrumentality
or goal orientation and expressivity or interpersonal
orientation.
Pilot Instructor Training (PIT). A 12-week program
located at AETC Headquarters where all instructor pilot
candidates are trained and qualified as instructor pilots in
specific aircraft (i.e., T-37, T-38).
Student Pilot. A commissioned officer selected for
UPT. Selection is based on intense competition including
extensive screening in mental, physical, and basic aviation
skills. Average profile: 95% male, age 23, rank of Second Lieutenant.
T-37 "Tweet". Twin jet engine, subsonic, side by side
seating, basic jet trainer. The first four months of UPT
flight training occurs in this aircraft. Typical student
attrition in this phase of training is approximately 15
percent.
T-38 Talon. Twin jet engine, centerline thrust, high
performance supersonic, tandem seating, century series style
fighter trainer. Final seven months of UPT training occur
in this aircraft. Typical student attrition of about 5
percent.
Undergraduate Pilot Training (UPT). A 52-week course
to train basic and advanced jet aviation skills to
commissioned officers. Conducted with a common syllabus by
four Air Force bases: Reese AFB, TX; Laughlin AFB, TX; Vance
AFB, OK; Columbus AFB, MS; supervised by Headquarters AETC,
Randolph AFB, TX.
Verbal Aggression. Verbal passive-aggressive
characteristics (complaining, nagging, fussy) (Chidester et
al., 1991).
Work. A desire to work hard and do a good job ("I find
satisfaction in working as well as I can.") (Chidester et
al., 1991).
Summary
Chapter I introduced a problem regarding the attrition
rate among military student pilots. It was suggested that
input and outcome variables, namely the personality trait
characteristics of instructors and perceived instructor
effectiveness, may play a significant role in instructional
effectiveness and thus affect student attrition rates.
Perceived performance ratings are a vulnerable and
subjective measure of true performance. This study utilizes
multiple observer groups, each with different insight into
the overall instructor job, to establish a weighted overall
perceived performance rating for each instructor. This
feedback is useful as baseline data for instructor
development and possible future evaluation.
Field observation also identified common personality
profiles among perceived "quality" instructors. It appeared
the commonly identified best instructors had similar
personality traits that complemented learning. This study
investigates the relationship of personality traits and
perceived performance effectiveness. The purpose is to
identify specific personality characteristics that foster
student learning and success, and to explore the validity of
a new performance assessment instrument and rating
technique.
Chapter II presents a review of relevant literature
concerning perceived performance ratings, personality trait
theory, and past studies of the topics in both education and
aviation environments. Chapter III explains the procedures
and methodology utilized to conduct the study. Chapter IV
presents the research findings. Chapter V discusses the
interpretation of the findings and contains recommendations
for future research.
CHAPTER II
REVIEW OF THE LITERATURE
This chapter contains a review of the literature
concerned with personality trait prediction in perceived
instructor pilot performance. The first section reviews the
use of personality theory in predicting job performance,
which provides the theoretical basis for the present study.
It includes discussions illustrating the background of
personality psychology development, defining "Type" theory
and its application, and defining "Trait" theory and its use
in the present research. The second section discusses
aviation personality research, its beginning, development
and recent renewed interest. It includes two prominent
personality assessment tools in aviation selection: the
Personality Characteristics Inventory and the Big-Five
factor model. The third section examines job performance
assessment, covering performance appraisal criteria and the
360-degree rating technique. The final section reviews
faculty assessment techniques ranging from student critiques
to peer reviews.
Personality Theory
Personality theory is a subdiscipline of psychology.
It became an empirically based scientific field of
psychology with the release of G. W. Allport's book (1937),
Personality: A Psychological Interpretation. Allport used
the labels "nomothetic" and "idiographic" to explain two
different and distinct approaches currently used in
psychological inquiry. Nomothetic describes the search for
general laws, whereas idiographic describes what is
particular to the individual case. Allport believed
psychology had developed exclusively into a nomothetic
discipline, ignoring the unique consideration and importance
of the individual. He advocated that the psychology of
personality should employ both nomothetic and idiographic
approaches to understand people in general as well as
particular individuals
(Allport, 1937). As a result, personality theory has
developed into two disciplines with two different
objectives, one that studies human nature, and another that
studies the unique individual case. This present research
investigates personality from a human nature perspective.
Generalizations across a homogeneous group of subjects are
explored.
The field of personality has evolved to include five
tasks: describing, generalizing about, explaining,
predicting, and intentionally changing behavior at each of
the three levels of (1) persons-in-general; (2) groups of
persons; and (3) individuals (Runyan, 1983). Common to all
of these tasks is the observation of behavior. Personality
psychology attempts to explain, predict, or control
behavior. The difficulty in assessing or manipulating
behavior is its inconsistency.
Individual behavior differences across various situations
over time are inconsistent, at least in the short-run (Weiss
& Adler, 1984). Recently, Epstein and O'Brien (1985)
established a new aggregation technique that provides
cross-situational stability for behavior when applied
longitudinally. The new credibility of reliably measuring
behavior has renewed interest in personality research.
Background
Personality psychology has suffered an erratic and
turbulent history. Allport's emphasis on individuality
remained controversial for over a decade. Once the field
was finally accepted as a credible branch of psychology, new
questions arose concerning generalizations of personality
findings. Psychologists began to argue the appropriateness
of mixing nomothetic and idiographic conclusions. Runyan
(1983) cautioned that learning what is true about persons-
in-general often has substantial limitations in enabling one
to understand and predict the behavior of individuals. This
premise was underscored by Mischel's (1968) highly
influential Personality and Assessment.
In what is now considered a classical reference in
personality research, Mischel reached some very critical and
pessimistic conclusions concerning the prediction potential
of personality research. He examined the methods and
conclusions of post-World War II personality research and
challenged the recent development and use of personality
findings. Mischel (1968) concluded that personality
measures were very limited in predicting behavior. In his
research, he found correlations of only .20 to .30 between
personality dimensions and predicted behaviors. He
additionally found poor consistency in the ability of a
personality dimension to predict behavior across similar
situations. Mischel successfully attacked the validity and
reliability of past personality measures and predictions.
Personality research came to a screeching halt and endured
ridicule that would last a decade.
Mischel believed the small correlation between
personality and behaviors implied the situation was
significantly more dominant. He changed the emphasis in
personality research to a cross-situational debate. Instead
of defining the task of personality research as inferring
global, stable dispositions as individual differences,
Mischel redefined it in terms of predicting specific
behaviors across different situations. He specifically
questioned the pre-existing premise that self-reported
personality measures correlated with observations and noted
the low (r=.30) relationship. He attributed this poor
relationship to his belief that behavior is situationally
determined and preferred social learning theory explanations
rather than personality.
Shortly after Mischel's critique, privacy issues
temporarily terminated the use of personality measures in
the workplace. Ethical responsibility to protect an
individual's right to privacy and confidentiality dominated
over the use of personality research. Researchers and
industry were intimidated by the class-action lawsuits
concerning privacy and chose instead to cease personality
assessment (Barrett & Kernan, 1987). By the mid-197 0s,
personality research was all but abandoned.
Not until the early 1980s did personality researchers
respond to Mischel's writings. A body of research began
to emerge addressing Mischel's critiques. The first issue
to be addressed was the small correlation. Although
Thorndike (1906) and Mischel (1968) pointed to the small
correlations between behaviors at different times, Epstein
(1979, 1980, 1983, 1984) identified an error in their
aggregation technique. Epstein cited the Spearman-Brown
prophecy formula, which shows that the correlation between
any two single behaviors is necessarily low; however, when
behavior is aggregated across several situations, the
correlation rapidly increases. The same pattern holds for
personality-behavior correlations (McGowen & Gormly, 1976).
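Epstein's aggregation argument can be made concrete with the
Spearman-Brown prophecy formula itself. If a single
behavioral observation has reliability (single-item
correlation) r_11, the reliability of an aggregate of k
parallel observations is:

```latex
% Spearman-Brown prophecy formula for the reliability of an
% aggregate of k parallel observations, each with single-item
% reliability r_{11}:
r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}
% Illustrative numbers only (hypothetical, not drawn from the
% dissertation's data): with a single-observation reliability
% of r_{11} = .25, aggregating across k = 10 situations gives
% r_{kk} = \frac{(10)(.25)}{1 + (9)(.25)} = \frac{2.5}{3.25} \approx .77
```

Thus even the modest .20 to .30 single-behavior correlations
Mischel criticized can, when aggregated over enough
situations, yield reliabilities well above .70.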
Additionally, Mischel's critique of the small correlation
was refuted by studies that produced personality/behavior
correlation exceeding .30 (R. Hogan, DeSoto, & Solano,
1975). Mischel's criticisms were finally answered using the
same argument he had leveled at earlier personality
research: his conclusions were incomplete and based on
faulty research techniques.
About the same time Mischel's statistical and
behavioral critiques were addressed, privacy issues were
resolved. Extensive costs and responsibilities associated
with employee selection and training stimulated industry to
re-instate personality assessment techniques. Professions
such as nuclear security and law enforcement emphasized the
need for employees with proven stable personalities (Gough,
1969) . Personality assessment was re-integrated in industry
by including it in a series of selection batteries. Privacy
issues were circumvented by making the testing process
voluntary, but mandatory for employment consideration in
certain security-related positions. Furthermore, an
applicant was not refused employment due to scores on any
one testing battery, but rather on the basis of scores from
several instruments combined with an interview (Barrett &
Kernan, 1987). Personality testing was not a sole source
for refusing selection, but instead highlighted areas of
concern for the interviewer to explore further. Privacy
issues were thus resolved by integrating personality
assessment into a series of testing batteries that
complemented an interview process in screening employee
selection.
Personality research has thus evolved through a stormy
process to become a credible subdiscipline of psychology.
It has grown to represent two different and distinct
paradigms, one based on human nature and another concerned
more specifically with the individual. Concerned with
explaining, predicting, or controlling behavior, personality
research is dependent on behavioral observation and
reliability. Having withstood volatile critiques concerning
low correlations, inconsistencies across behavioral
situations, and privacy issues in the workplace, personality
psychology has emerged as a stronger, more credible field.
A further distinction in defining personality involves
perspective. How personality is defined depends upon whose
perspective is taken: the individual's or the observer's. The
individual perspective is based on an inner nature that
explains "why" an individual behaves in a characteristic
way. These innate characteristics provide
structures, dynamics and processes that are useful in
describing why an individual behaves in the manner
perceived. This type of personality is private, innate
information and must be inferred. On the other hand, the
"observer" perspective refers to a person's "social
reputation" and how that individual is perceived by others.
This concerns the amount of esteem, regard, or status that
person has within a social group. It includes descriptive
terms such as dominant, passive, considerate, and ruthless.
This perspective of personality is based on reputations and
past behaviors. It is both public and verifiable. Hogan
(1987) reports that personality from an observer's
perspective may be very useful in performance prediction,
"Because reputations summarize a person's past behavior, and
because many writers believe that the best predictor of
future behavior is past behavior, reputations may be a
useful way to forecast trends in a person's performance" (p.
145).
Like personality, trait also has two meanings which
correspond to the two meanings of personality. Based on the
social reputation aspect, trait is a neutral descriptive
measure (i.e., aggressive) that tells how we may expect an
individual to behave, but not why (Buss & Craik, 1983).
From the inner structure perspective, trait is an innate
psychological feature, such as an attitude or emotion.
Trait in this sense can explain behavior, but must be
inferred (Allport, 1961).
Thus, defining personality involves another
distinction: perspective. An observer's perspective
involves an individual's social reputation and may be
measured empirically using neutral trait descriptives such
as aggression. The individual perspective explores an inner
structure that can be used to explain or account for that
reputation, using another form of trait descriptives that
measure more subjective items such as attitudes and
emotions. The present study explores personality measures
through the observer's perspective, using the associated
neutral and empirically measured trait descriptives.
Type Theory
From an observer's perspective of personality we
attempt to describe and predict other people's behavior by
classifying them into categories. These categories are
called "Types," which simply consist of trait conglomerates.
Two people in the same type category will share
approximately the same traits, but will rarely have
precisely the same traits. A common example of type
categories is the Type A and Type B personalities. Type A
individuals are classified as excessively competitive,
having exaggerated time urgency, and a high level of
hostility and aggression (Glass, 1977; Matthews, 1982).
Although two individuals may fit the Type A behavior
pattern, both will have distinctly different traits (i.e.,
introversion/extroversion, assertiveness, etc.).
The first systematic type theory is credited to Galen,
a Greek physician of the second century A.D. He
identified four types of people in the world: the sanguine,
who is always cheerful and upbeat; the choleric, who is hot-
tempered and self-dramatizing; the melancholic, who is
lugubrious and fretful; and the phlegmatic, who is stolid
and unflappable (Roback, 1927). Galen's types were based on
a crude biochemical theory framed in terms of four humors:
black bile, yellow bile, phlegm, and blood.
Type theories are the oldest and most consistent means
of classifying personalities of other people. Throughout
history, personality type has constantly been rediscovered
and revised. Galen's theory was revitalized and cited in
Immanuel Kant's Anthropologie (1798). Wilhelm Wundt (1874)
refined Galen's theory, arguing that the types were based
on neurological mechanisms rather than humors. Revisions
and expansion of Type theory continued with many interesting
conceptual variations being developed in this century.
Jung's (1923) theory of psychological types stimulated the
development of the Myers-Briggs Type Indicator (MBTI; Myers
& McCaulley, 1985). Spranger's (1928) theory of types led
to the development of the Study of Values (Allport, Vernon,
& Lindzey, 1951). Holland's theory of personality and
vocational types led to the Self-Directed Search (SDS;
Holland, 1985). Perhaps Holland's work with matching
personality types with vocations is the most popular use of
Type theory today.
In his RIASEC model, Holland proposes six ideal
personality types, each defined in terms of a distinctive
pattern of interests, competencies, vocational choices, and
problem-solving styles (Holland, 1985). The Realistic type
(an engineer or technician) is mildly introverted and
conforming, has concrete practical interests, and prefers
traditionally masculine careers. The Investigative type (a
scientist or researcher) is mildly introverted and
nonconforming, has abstract theoretical interests, and
enjoys intellectual work. The Artistic type (a writer or
musician) is unconventional and sometimes nonconforming, and
enjoys working on open-ended design problems. The Social
type (a minister or human resource person) is
unconventional, extroverted, idealistic, and enjoys helping
people. The Enterprising type (a salesperson or manager) is
extroverted, ambitious, and enjoys leadership positions and
manipulating others. The Conventional type (an accountant
or data processor) is conforming, orderly, and pragmatic,
and enjoys problems that have clear-cut solutions.
An especially powerful feature of Holland's model is the
overlap between types: individuals are described in terms of
two or more of the type categories. Holland has
successfully used personality types to match individuals to
vocations based on the vocations' psychological demands
(Driskell, R. Hogan, & Salas, 1987).
Education has also made valuable use of classification
and type theory. For decades students have been categorized
in education based on intelligence, skill level, handicaps,
and, most recently, learning style. Educational
psychologists have used type theory to help match the
individual needs and abilities of students with specifically
engineered teaching and learning processes. A typical
learning-style typology considers four major dimensions of
an individual student: overt behavior, cognitive behavior,
motivational attitudes, and affective attitudes (Golay,
1982). The student is then categorized by learning style
(type theory) based on his or her optimal learning profile.
One contemporary type theory based on student personalities
is Keirseian Temperament Theory (KTT).
KTT is a holistic personality approach which attempts
to match student personality types with specific classroom
environments which best facilitate learning for that type of
student (R. Dunn & K. Dunn, 1978). There are four basic
temperament styles in the KTT paradigm: The Dionysian, The
Epimethean, The Promethean, and The Apollonian. Each type
displays characteristic patterns of thinking and behavior.
The Dionysian is action oriented and must be free to act.
Their learning is best described by the phrase, "To do is to
Learn." The Epimethean is duty oriented and prefers an
established hierarchy of control. This type of student
prefers structure, order, and planning. The Promethean is
described as having an insatiable desire to acquire
intelligence and to become competent. To them life is a
riddle waiting to be solved. Finally, the Apollonian is an
individualist searching for his or her own way and is very
people-oriented. Relationships and emotions are very
dominant learning media for the Apollonian.
The use of type theory in education has recently
extended to include the instructor and teacher. The most
common application is matching the student to an instructor
with a similar learning type (Robinson & Gary, 1974). A
recent extension to this approach was developed by Broudy
and extended by Hudak and Anderson (1984), suggesting that
teachers' personalities cause them to place different
emphases on students' knowledge, thinking skills and
enjoyment (attitude). This typology places teachers into
four types based on their greatest classroom emphasis among
enjoyment, thinking, and knowledge: The Philetic instructor
prefers an enjoyment emphasis. The Didactive instructor
expresses a knowledge emphasis. The Heuristic instructor
displays a thinking emphasis, and The Normative instructor
has an equal emphasis of all three dimensions (Porter,
1991).
Personality type theory today is well established in
modern organizational psychology paradigms. It is commonly
used in career choice, person-environment fit, and personnel
selection (J. Hogan & R. Hogan, 1986). Holland's integration
of personality and vocation highly complements this
project's research. It appears there are "ideal," or
better-suited, personalities for various vocations. This
study attempts to identify the ideal personality trait
profile for a military instructor pilot.
Trait Theory
It is important to note the difference between traits
(personality dispositions) and types. Allport opposed the
notion of types because it ignored individual differences:
an individual could be fitted into several different types,
depending on the category. For instance, an individual
could be the intellectual type, the witty type, the
fastidious type, and many more. Allport preferred trait
descriptives because they could be aggregated to describe
the whole individual.
In the early 1930s, Allport began developing a trait
lexicon, a listing of English trait words (Allport & Odbert,
1933). The lexicon contains all the terms that English-
speaking people use to describe one another. The structure
of the trait vocabulary is related to some degree to the
structure of personality from the observer's perspective
(Wittgenstein, 1953). Cattell (1947) used factor analysis
to collapse the trait lexicon of some 23,000 terms into a
140-item structure of personality. Further correlational
study reduced the structure from 140 items to 16 factors.
Fiske
(1949) continued the investigation and streamlining of the
lexicon, ultimately resolving five factors describing the
structure of personality (the Big Five). These five factors
have been replicated over decades across various
populations, age groups, and languages (Borgatta, 1964;
Botwin & Buss, 1989; Digman & Takemoto-Chock, 1981;
Goldberg, 1982; John, Goldberg, & Angleitner, 1984; McCrae &
Costa, 1985; Peabody & Goldberg, 1989).
The Big Five personality trait model is composed of
five factors: Neuroticism, Extroversion, Conscientiousness,
Agreeableness, and Culture. The first, Neuroticism or
Adjustment, is defined at one end by terms like nervous,
self-doubting, and moody and at the other by terms like
stable, confident, and effective. The second factor,
Extroversion or Sociability, is characterized at one end by
such terms as gregarious, energetic, and self-dramatizing
and at the other by such terms as shy, unassertive, and
withdrawn. The third factor is usually called
Conscientiousness. It is anchored at one end by traits like
planful, neat, and dependable and at the other by impulsive,
careless, and irresponsible. The fourth factor is generally
called Agreeableness. One end is marked by such words as
warm, tactful, and considerate; the other end reflects a
combination of hostility and unsociability and is denoted by
words like independent, cold, and rude. The final factor,
Culture, is defined by trait terms such as imaginative,
curious, and original; it is defined at the other end by
terms such as dull, unimaginative, and literal-minded
(Hogan, 1987).
The application of personality trait theory in
education has produced very marginal results. Numerous
types of instruments have been explored, consisting chiefly
of two mainline "off-the-shelf" surveys, the California
Psychological Inventory (CPI) and Cattell's Sixteen
Personality Factor Questionnaire (16PF). Although a bit
dated, Getzels and Jackson (1963) concluded after reviewing
over 200 studies on teacher personality that:

     Despite the critical importance of the problem and a
     half-century of prodigious research effort, very
     little is known for certain about the nature and
     measurement of teacher personality, or about the
     relation between teacher personality and teaching
     effectiveness. The regrettable fact is that many of
     the studies so far have not produced significant
     results. (p. 574)
Medley (1973) and Gephart (1979) updated Getzels and
Jackson's earlier work, only to echo the same findings of no
significance.
The most encouraging results were obtained with a
survey adapted for teacher personality assessment, the
Personality Research Form (PRF). The
PRF was developed by Douglas Jackson in 1974 as a general
personality research instrument (Jackson, 1974). The PRF
was subsequently reviewed and found to have sound
psychometric validity (Anastasi, 1972, 1976; Hogan, 1978).
One version of the PRF was modified specifically for teacher
personality assessment. Like previous teacher personality
research, it also determined weak correlations between
teacher personality and student classroom performance. In
1982 the PRF was applied to a special type of teacher, one
who performs a more mentoring role such as music teachers,
tutors, and skills instructors. Seven of the 17 trait
measures on the PRF repeatedly showed significant
correlations with student performance (p<.01) when applied
to these types of instructors (Bridgewater, 1982). The
findings were replicated over many designs with the same
seven personality traits emerging significant: Achievement,
Autonomy, Dominance, Endurance, Desirability, Aggression,
and Social Recognition. Additionally, two demographic
variables also resulted in moderate correlations with
student performance: Age (r=.51, p<.01), and Years of
teaching experience (r=.86, p<.001).
Due to its accepted validity and apparent success, the
PRF became part of a Teacher Characteristics Study designed
to assess teaching effectiveness across various education
levels. Disappointingly, no significant findings resulted.
The PRF appears effective only when applied to mentoring
instructors with smaller student groups. This is exactly
the role of instructor pilots in undergraduate pilot
training. The PRF, or excerpts of some of its previously
significant trait measures, may be ideal for instructor
pilot personality assessment. Researchers at Armstrong
Laboratories, the U.S. Air Force Human Performance Research
Center, are using the PRF to build a new Five Factor
instrument. Personality trait measures from the PRF have
provided the starting point for new research in personality.
The Big Five trait model is the leading edge of
personality trait research today. Due to its recent
development, it is still relatively unexplored for many
applications. The five-factor trait theory's most promising
application may parallel Holland's work by matching vocation
with personality. Job analyses typically reveal that
certain personal attributes are necessary to perform a
particular job adequately (Gottfredson, Holland, & Ogawa,
1982). Job performance appraisals are not only job
specific, but also include judgments about interpersonal
performance; these judgments are often what is meant by
personality from the observer's perspective (J. Hogan & R.
Hogan, 1986).
This study explores a personality trait theory applied to a
specific aviation vocation.
Aviation Personality Research
Early Military Development
When the United States entered World War I, the Army
had no selection or classification system to efficiently
build a large standing fighting force. As a result, a group
of psychologists was called upon to develop a series of
testing batteries that could determine an individual's
training aptitude. After the War ended, several
psychologists continued selection assessment research and
additionally began investigating human error causes of
aviation accidents. In 1919, at Kelly Field, Texas, a new
aviation candidate selection instrument was developed,
consisting of an intelligence test, a test for emotional
stability after shooting a handgun, and a test for measuring
one's sense of balance (Henmon, 1919). The United States
military aviation selection and screening assessment was
officially established.
Progress in aviation selection instruments stagnated
until World War II. Once again war created a new and
greater demand for aviators. The need for selection
screening and classification of recruits was even greater
due to the rapidly progressing aviation technology demanding
greater technical knowledge of recruits, and the addition of
new aircrew positions. The criteria for aviator selection
evolved beyond simple physical qualifications and desire. A
low cost screening program was needed not only to select
recruits, but to also classify candidates into positions in
which they had a high probability of success in training.
The Army Air Force School of Aviation Medicine's Department
of Psychology was tasked with creating a new selection
instrument. Their subsequent product was called the Army
Air Force Qualifying Examination (AAFQE). The AAFQE tested
aptitude, attitude, and motivation. Aviation candidates who
scored high on the AAFQE were then given an additional
aircrew classification battery, which consisted of 14 more
tests. The use of the AAFQE and aviation classification
battery reduced by more than 50 percent the number of
preflight school entrants necessary to maintain the same
number of advanced pilot training graduates (North &
Griffin, 1977).
The new selection batteries were a tremendous
improvement; however, they were only effective with a large
number of candidates, since individuals were "selected out"
if considered unsuitable for training. In other words,
individuals were removed from the recruit pool, and from
further consideration for aviation selection, on the basis
of some form of potentially disabling psychopathology. The
batteries would not be suitable in today's environment,
where a small number of candidates are "selected in" from a
large pool on the basis of optimal qualifications. The
"select in" approach is a more efficient and effective
process of identifying future success (Spence
& Helmreich, 1983). At the end of the war, efforts were
made to replace the AAFQE and aircrew selection battery due
to extensive costs associated with their "select out"
design. Over 20 studies investigated commercial instrument
alternatives, but with little success. Part of the reason
may have been the commercial instruments were designed to
identify abnormal psychological conditions, rather than to
predict success in performance (Anastasi, 1976). The
commercial alternatives were reliable when used against
psychiatric criteria (clinical evaluations), but were not
reliable when used against performance criteria (Rossander,
1980).
Personality assessment in aviation selection in the
post-World War II era has experienced very limited success.
Jet Age assessment concentrated on pilot selection criteria
rather than classification, emphasizing intelligence
measures and previous flying experience. Numerous
personality measures were explored, but revealed little
validity. In a review of Navy selection research, Griffin
and North (1977) found that approximately 40 different
personality paper-and-pencil test devices had been evaluated
from 1970 to 1976 for pilot selection without any
appreciable impact on training success for the selection of
aviator candidates. They contended one of the major reasons
for this lack of success was that applicants were prone to
select answers that made them appear more desirable than
answers that reflected their personalities. The respondent
may be motivated to "fake good," or choose answers that
create a favorable impression (Anastasi, 1976). Demand
characteristics have compromised the use of commercial
personality instruments in aviation selection as applicants
competed to be selected. A more effective application, and
better controls on "faking," in aviation personality
assessment may be found in the classification process. The
classification process is less threatening to candidates
than selection because it simply tries to match a
candidate's skills with the most appropriate job. The
present study
explores the potential use of personality in military
instructor pilot placement. Rather than focus on the
initial pilot candidate selection process, this study
investigates the use of personality measures to match an
already existing military pilot with the instructor pilot
vocation. By matching a military pilot's personality
profile with personality profiles that best complement
effective instructor pilots, student pilot attrition may
decrease.
Recent Renewed Interest
In the past 20 years, validity of self-reported
personality instruments has improved by incorporating more
sophisticated "lie scale" detectors (Graham & Lilly, 1984).
This, along with new personality instruments designed
specifically for aviation, has increased validity and
correlations of personality measures in aircrew selection,
thereby renewing interest in their potential application.
Robert L. Helmreich, Department of Psychology, University of
Texas at Austin, championed the re-integration of
personality assessment in aviation by advocating,
"Personality may be a limiting factor on an individual's
flying performance potential and that personality research
may not only improve selection, but may also help in the
design of training" (1986, p. 87). Not only was selection
underscored, but Helmreich astutely identified the future
potential of personality in the classification process.
This perspective was officially echoed by the Air Force
Human Resources Laboratory (AFHRL) which cited:
personality factors were found to predict pilot training
outcome measures...(and) different combinations of
characteristics, rather than the simple presence or absence
of a key personality trait, appeared to be a better
predictor of pilot training outcomes (Siem, 1989).
All branches of the U.S. military are in agreement that
personality measures are again needed in aircrew selection
and training. The Naval Aviation Medical Research
Laboratory (NAMRL) has officially concluded that it is no
longer desirable to rely on aptitude alone for pilot
selection and that the personality factor is rapidly
emerging in importance (Dolgin & Gibb, 1988). The Army has
reached similar conclusions,
citing its official integration of personality assessment in
Army Fixed and Rotary Wing selection batteries (North &
Griffin, 1977). Personality measures are recognized
predictors in specific aspects of military selection
criteria. McHenry et al. (1990), while investigating
personality and aptitude predictors, concluded that
personality measures were the best predictors of criterion
measures such as leadership, personal discipline, and
military bearing, whereas aptitude measures were the best
predictors of criteria such as technical proficiency and
soldiering proficiency.
Both military aviation laboratories, NAMRL and AFHRL,
are actively pursuing new personality measures and
applications in aircrew selection, classification, and
training. In developing their new personality research
programs, both labs have independently subscribed to future
personality research criteria recommended by Steven
Kozlowski: (1) the selection of traits to be measured should
be based on sound research; (2) a clear relationship should
be shown between those traits and successful job
performance; (3) the test measuring these traits should show
high reliability and validity and not be susceptible to
response bias (faking); and (4) conclusions should be based
on a sound research strategy in order to explain the
validity of these personality traits as success predictors
(Kozlowski, 1978; Dolgin & Gibb, 1988).
The Personality Characteristics Inventory
Robert Helmreich is a pioneer in developing aviation
personality measures to meet Kozlowski's recommendations.
He developed one of the aviation industry's most accepted
personality assessment instruments, the Personality
Characteristics Inventory, PCI (Appendix A) (Siem, 1987).
The PCI is modified specifically for aviation selection and
is derived from two other personality tools, the Extended
Personal Attributes Questionnaire (EPAQ) and the Work and
Family Orientation Questionnaire (WFOQ). The PCI measures
both positive and negative personality traits. Positive
traits include assertiveness, interpersonal orientation, and
aggressiveness; negative traits include verbal
aggressiveness, hostility, and submissiveness (Helmreich &
Wilhelm, 1989). Assertiveness reflects an individual's
feeling for independence, performance under pressure, and
decision making ability; interpersonal orientation reflects
concern for and interaction with others; aggressiveness
reflects a need for security, reaction in a crisis
situation, and need for approval of others; hostility
reflects arrogance, greed, and cynicism; verbal
aggressiveness reflects need to nag and complain; and
submissiveness reflects gullibility and servility. The WFOQ
contribution to the PCI scales assesses achievement
motivation. The three scales used are mastery, work
orientation, and competitiveness. Mastery represents the
desire to undertake new and challenging tasks; work
orientation is the motivation to do a task well; and
competitiveness measures the desire to outdo the performance
of others. Mastery and work orientation are positive
predictors of success and performance; competitiveness has
been shown to correlate negatively.
A majority of the PCI was developed in 1978 using
academic scientists. Initial research explored specific
trait constructs of the "Type A" behavior pattern. Results
indicated two constructs best identified Type A behavior:
Achievement Striving and Impatience/Irritability. A second
major finding identified an artificial personality
phenomenon called "the honeymoon effect." Essentially, the
honeymoon effect accounts for why personality measures may
have marginal to weak correlations with job performance.
During training or the first few months on the job, negative
personality measures are suppressed by the individual. The
novelty or newness of the job masks negative personality
measures such as Mastery. Helmreich and Wilhelm (1989)
found these negative personality measures to significantly
emerge as predictors in later performance.
The PCI results were replicated in the aviation
environment. A national airline implemented a longitudinal
study using the PCI for selection and subsequent job
performance follow-up. Their results replicated Helmreich's
and Wilhelm's. There was a difference in significant
personality predictors between selection and job
performance. The job performance predictors included more
of the negative personality measures. These negative
personality measures remained as stable predictors over the
few years of job performance assessment whereas the
screening predictors rapidly lost validity a few months
after selection (Chidester, 1988).
Helmreich developed and tested the PCI in direct
subscription to Kozlowski's four recommendations for
maximizing future personality research success: (1) the
selection of traits was identified from initial research on
scientific attainment and academic performance (Helmreich &
Spence, 1978); (2) a clear relationship was shown between
traits and successful job performance (Helmreich, Spence,
Beane, Lucker, & Matthews, 1980); Helmreich found a
significant correlation between the PCI and pilot
personality as measured by Federal Aviation Administration
(FAA) flight inspectors (Helmreich, 1982, 1987); (3) the
test shows high reliability (Bluen, Barling, & Burns, 1989),
and the instrument includes a "lie scale" based on statistical
combinations of improbable answers on different subscales
(Helmreich & Wilhelm, 1989); and (4) conclusions are based on
sound research strategy validated by a national airline and
specially trained Check Airmen (Chidester, 1988).
At the time of this research, the PCI is the aviation
industry standard in personality assessment. For over a
decade, the PCI has established credibility in validity and
reliability measures. Correlations have increased, "faking"
has been minimized, and new applications explored. Helmreich's
PCI has clearly established recognition for personality
assessment in commercial aviation, but does the PCI apply in
military aviation? Most of the baseline for the PCI was
established using commercial transport pilots. Helmreich's
initial research implies there should be no difference
between the two groups of pilots (Gregorich, Helmreich,
Wilhelm, & Chidester, 1989).
The Big Five
Personality and job performance research over the past
25 years has resulted in small correlations and low
validities (Ghiselli, 1973; Guion & Gottier, 1965; Locke &
Hulin, 1962; Schmitt, Gooding, Noe, & Kirsch, 1984). At the
time of this research, however, there was no uniform or
well-accepted taxonomy for classifying personality. Various
studies explored different traits with different
definitions. As a result, it was not possible to determine
if there were consistent findings between specific
personality constructs and performance criteria in different
occupations (Barrick & Mount, 1991). During the past 10
years, many personality psychologists have come to accept
five general factors to represent the structure of
personality.
There is some controversy concerning the personality
construct composition of the five factor model. The most
commonly accepted structure is called the "Big Five" based
on Norman's Big Five which include: Extroversion, Emotional
Stability, Agreeableness, Conscientiousness, and Culture
(Norman, 1963). Some researchers disagree
with the simplistic five-factor model, believing the
factors are imprecise and lack specification of the
personality dimensions (Briggs, 1989; John, 1989; Livneh,
1989; Waller & Ben-Porath, 1987). Other researchers
disagree more mildly, believing simply that
another factor should be added to the model to create a six-
dimension model (Hogan, 1986). The sixth dimension is
created by splitting the Extroversion dimension into two
more specific factors: Sociability and Ambition.
Current five-factor research has resulted in some very
promising findings. Tett, Jackson, and Rothstein (1991)
conducted a meta-analytic review of 494 previous personality
studies applying the five-factor model. Personality scale
prediction means more than doubled in validity, from .12 to
.29, and an even higher mean (.38) was obtained using job
classification. Correlation mean validities across the five
factors used in the meta-analysis ranged from .16 to .33.
Additional five-factor research specifically
investigating job performance criteria (job proficiency,
training proficiency, and personnel data) had similar
successful results. Barrick and Mount (1991) investigated
the five factor constructs related to five occupational
groups (professionals, police, managers, sales, and
skilled/semi-skilled). They found the Conscientiousness
dimension consistent with all job performance criteria for
all occupational groups. The Extroversion dimension was a
valid predictor for two occupations involving social
interaction, managers and sales representatives.
Additionally, the Extroversion dimension was a valid
predictor across all occupations for training proficiency.
Significant potential of the Big Five has also been
identified in the military aviation environment. Siem and
Murray (1994) had 100 USAF pilots rate the appropriateness
of 60 personality traits concerning performance of flying
skills and crew management. They identified
Conscientiousness as the most important determinant of
performance across the personality dimensions. The findings
acknowledged the ambiguity of the five-factor definitions
and recommended further study to define and measure the five
dimensions accurately. Siem and Murray's results, though
promising, thus underscore a common concern among personality
researchers: the vagueness and oversimplified structure of
the five-factor model. The strongest critique is its limitation in
prediction (Hough, 1988, 1989; Hough, Eaton, Dunnette, Kamp,
& McCloy, 1990). Researchers argue the Big Five is too
broad and heterogeneous, and more dimensions are needed for
accurate prediction of job performance. Hough (1992),
investigating the job performance of Army soldiers with the
five-factor model, discovered severe limitations and
omissions with the model. Hough identified that nine trait
scales were needed to accurately predict performance rather
than five. Missing from the five-factor design were scales
that measured Dependability, Achievement, Potency, and
Affiliation. Also mentioned in the findings were
observations that Locus of Control and Rugged Individualism
were important attributes in military performance, but also
missing from the five-factor model.
This present study assessed over 20 personality
instruments in selecting an appropriate measure for military
instructor pilots. Because the Big Five was still being
refined by military psychologists and lacked longitudinal
validity and reliability measures, psychologists at both
NAMRL and AFHRL recommended the PCI for this study. The PCI
offered three important and unique advantages over other
instruments: (1) the PCI is "off the shelf," with established
reliability and validity measures; (2) the PCI is widely
accepted in the commercial aviation industry and is
considered credible; and (3) the PCI has been refined to control
for the temporary effect of training, "the honeymoon
effect," and is considered a good measure of long-term job
performance.
Job Performance Assessment
Aviation
Previous U.S. military aviation selection research has
yielded low correlations with job performance and training
success. Predictive validities based on intelligence tests
and personality assessment are typically in the 0.15 to 0.25
range (Damos & Gibb, 1986; Greuter & Herman, 1992). Adding
further measures to the selection process, such as
psychomotor and information-processing tests, has
increased correlations to the 0.20 to 0.40 range, but the range
is too variable to be considered reliable (Carretta, 1992a).
Researchers in contemporary aviation selection research
believe the low correlations from previous studies are in
large part due to the method used in measuring job
performance or training success. They attribute four
reasons for weak predictive validity: (1) range restrictions
of partially screened populations, (2) artificial success
rates imposed by military manpower needs, (3) a dichotomized
pass/fail criterion variable, and (4) inappropriate performance
test development.
Due to the extensive costs associated with flight
training, the military as well as commercial carriers pre-
screen aviation candidates based on intelligence and
aviation experience before accepting and processing an
application. This pre-screening imposes a range
restriction in subsequent selection instruments (Blower &
Dolgin, 1991). Decreasing the range in the predictor
variables decreases their correlation with
criterion variables.
A second compromise to correlation values is the
artificial success rate in military pilot training based on
military manpower needs. If the need for the number of
pilots increases, candidates scoring below the usual
selection cutoff criteria may be accepted into flight
training (McFarland, 1953; Hoffelt & Gress, 1993), the
flight evaluation may be changed, or the training process
may be extended for weaker students to increase success
rate. The variability of selection criteria, training
process, and success would certainly affect correlations
adversely.
A third explanation for low correlations is the use of a
pass/fail success criterion. Although there are specialized
techniques for computing correlations with non-interval
data (i.e., Spearman-Brown), correlation values may be
substantially lower. Cohen (1983) noted that dichotomizing
the criterion variable at the mean results in a 38%
reduction of effective sample size when the population
correlation is between 0.20 and 0.50. As the
dichotomization departs from the mean, the decrease in power
and loss of effective sample size become increasingly
severe. Thus, the high success rate in undergraduate pilot
training (75%) effectively limits the correlation (Carretta,
1992b).
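This attenuation is easy to demonstrate by simulation. The sketch below is an illustration only (the sample size, the true correlation of 0.35, and the 75% pass rate are assumed for the example, not taken from the studies cited): it draws a predictor and a continuous criterion with a known correlation, then dichotomizes the criterion two ways.

```python
import numpy as np

# Illustrative, assumed values: a large simulated sample with a known
# predictor-criterion correlation, and a 75% pass rate as in pilot training.
rng = np.random.default_rng(0)
n, rho = 100_000, 0.35

# Draw a predictor x and a continuous criterion y correlated at rho.
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

r_full = np.corrcoef(x, y)[0, 1]

# Dichotomize the criterion: first a median split, then a 75% pass rate.
r_median = np.corrcoef(x, (y > 0).astype(float))[0, 1]
cut = np.quantile(y, 0.25)  # bottom 25% "fail" -> 75% pass rate
r_pass75 = np.corrcoef(x, (y > cut).astype(float))[0, 1]

print(f"continuous criterion:   r = {r_full:.2f}")
print(f"pass/fail (median cut): r = {r_median:.2f}")
print(f"pass/fail (75% pass):   r = {r_pass75:.2f}")
```

Under these assumptions, both point-biserial correlations fall noticeably below the continuous-criterion correlation, and the attenuation worsens as the cut departs from the mean, consistent with Cohen's (1983) observation.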
A final possible reason cited for low correlations, and
the primary concern for this research, is performance
assessment test development. Thorndike (1949, p. 6) notes
"The tests to be used for selection of aircraft pilots can
be determined only by relating test scores to some later
index of skill in the actual job of piloting a plane."
Regardless of this early and classical guidance, no military
studies predicting operational performance were found
(Griffin, Morrison, Amerson, & Hamilton, 1987). Some
studies have examined the validity of selection tests with
operational performance of fighter pilots, and others have
used selection batteries to predict advanced levels of
training performance in fighter-type aircraft (Brictson,
Burger, & Gallagher, 1972; Bale, Rickus, & Ambler, 1973;
Blower, 1992); but there was still an obvious lack of
prediction of personality with operational performance,
especially outside the fighter community (Griffin & Shull,
1990). The basic premise of relating aviation selection to
later operational performance requirements suggested by
Thorndike appears to be ignored in military aviation.
This study specifically investigated the relationship
between personality and operational job performance of
instructor pilots, otherwise known as aircrew
classification. Instructor pilots come from a pre-screened
population, but that screening is beyond the control of this
research. Additionally, current IP performance assessment
criteria appear to have inflated success rates and a
dichotomized performance variable. Therefore, the major
consideration for this study was establishing an appropriate performance
assessment criterion and testing technique. Three primary
factors were involved in this process: identifying the tasks
comprising the instructor pilot's job, determining the
levels of performance for each of the tasks, and identifying
reliable measures of performance on each of the tasks.
Performance Criterion
Conventional military aviation performance assessment
instruments are flying skill specific and lack variability.
Current instructor pilot performance evaluation includes an
annual flight check and procedural knowledge test. There is
very little variability in these assessments since the
flight check is a dichotomous variable (pass/fail), with
over 85% in the pass category annually. The procedural
knowledge examination is published and accessible to all
pilots; therefore, it has an even higher pass rate. To best
measure the multiple job aspects of an instructor pilot,
both in and out of the cockpit, a new instrument was needed
that could show more variability and be more comprehensive.
After consultation with Dr. Helmreich concerning development
requirements for a global performance assessment of
instructor pilots (IPs), he recommended using a modified
version of the NASA/UT Astronaut Assessment Survey. This
recommendation was endorsed by experts from NAMRL and AFHRL.
The NASA/UT Astronaut Assessment Survey offered four
distinct strengths in assessing IP performance: (1) it is an
"off-the-shelf" instrument requiring minor modification to
assess the instructor portion of job duties; (2) it has
established developmental validities and reliabilities based
on multiple astronaut samples; (3) it assesses operational
performance in the three primary IP duties: officership,
piloting, and instructing; and (4) it offers 360-degree rater
assessment, allowing perceived performance appraisal from
students, peers, and supervisors.
A copy of the NASA/UT survey is in Appendix A and a
complete description of the survey can be found in Chapter
III under Instrumentation-Performance Measurement. The
instrument was developed in 1990 under a NASA grant intended
to empirically assess astronaut compatibility for space
station living. It was developed using 84 astronaut
subjects rated for job effectiveness over nine job
dimensions, by 65 supervisors and 22 peers. The job
dimensions included: Communications, Group Living, Job
Competence-Performance, Job Competence-Performance under
Pressure, Leadership, Liking, Personality, Space Station
Teamwork, and Knowledge. The mean correlations for all the
dimensions ranged from 0.60 to 0.70. An exploratory factor
analysis reduced the nine dimensions to three: Socio-
Emotional, Knowledge and Performance, and Leadership. Of
these factors, only the Knowledge and Performance factor
significantly related with operational performance (r=.43),
as rated by peers. Additional findings indicated high
interobserver agreement among peer ratings, ranging from 0.60
to 0.70. Supervisor ratings, however, varied dramatically
and showed weak inter-observer agreement (r=.21). The two
primary findings that resulted from the NASA/UT Astronaut
Assessment Survey development were the valid performance
criteria dimensions and the high inter-observer agreement
among peer ratings (Rose, Helmreich, Fogg, & McFadden,
1993).
Minor development modifications of the instrument for
military instructor pilot performance assessment were
identified through a pilot study and literature review. Based
on feedback from supervisors, students, and instructor
pilots, performance criteria requirements needed to measure
three primary IP responsibilities: officership, flying
skills, and instructional abilities. Many of the original
NASA/UT survey constructs already measured areas assessing
officership and flying skills. The original Job Knowledge
and Performance constructs (Job Competence-Knowledge, Job
Competence-Performance, Job Competence-Performance Under
Pressure) were retained due to their obvious connection to
performance assessment and their significant correlation
identified in the instrument's initial development.
Additionally, Leadership and Teamwork were specifically
identified as officership measures by the pilot study.
Instructional abilities, however, were more ambiguously
defined.
A review of the literature suggested Communication
Skills and Student/Instructor relationships were the
predominant factors affecting flight instruction (Bowers,
1953; R. Davis, 1989). Both previous studies investigated
military aviation training environments and concluded that
students with specially trained communicators or instructors
59 trained with good communication skills progressed more
rapidly in training, retained knowledge longer, and
performed more successfully on flight evaluations. The
communication skills of the instructor obviously impacted
student performance. Communication is considered an
essential skill for instruction, yet instructor pilots
receive very little formal training in communication, and
even less operational feedback assessing their ability to
communicate with the student (ATC Study Guide F-V5A-A/B-ID-
SG, 1990). Therefore, Communication Skills was also
retained on the NASA/UT performance assessment instrument.
The only construct added to the NASA/UT performance
appraisal tool was Personality. Student/Instructor
relationships were identified by the pilot study as
important variables in assessing IP performance. Though seemingly
obvious, the impact of student/instructor relationships was
empirically reinforced by R. Davis (1989), who identified
that 12% of student pilots self-eliminate from pilot
training due to stresses and anxieties caused by
relationships with instructor pilots. Additionally, Hopson
(1978) concluded that a large proportion of Naval aviation
attrition can be attributed to motivational factors. It
would appear that approximately ten percent of student pilot
attrition might be prevented if instructor pilots were more
motivational and improved student/instructor relationships.
Motivation and student/instructor relationships were
collapsed into a general "Personality" assessment construct
as recommended by Helmreich (Rose, Helmreich, Fogg, &
McFadden, 1993).
The final composition of the NASA/UT Astronaut
Assessment Survey included eight of the original constructs
with only a Personality measure added. Levels of
performance were assessed as in the previous
NASA/UT format, using a Likert scale ranging from Poor to
Excellent with a "Not Observed" category.
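As a concrete illustration of how such ratings can be combined into a single measure, the sketch below averages Likert responses across raters and constructs while excluding "Not Observed" responses. The 1-5 numeric coding (with 0 standing in for "Not Observed") and the sample ratings are assumptions made for the example, not values from the study.

```python
import numpy as np

def overall_score(ratings: np.ndarray) -> float:
    """Mean rating across all raters and constructs, ignoring Not Observed (0)."""
    observed = ratings[ratings > 0]
    return float(observed.mean()) if observed.size else float("nan")

# Rows are raters (a student, a peer instructor, a supervisor); columns are
# the seven retained constructs. The values below are invented.
ratings = np.array([
    [5, 4, 0, 4, 5, 4, 3],   # a student; did not observe one construct
    [4, 4, 5, 3, 4, 4, 4],   # a peer instructor
    [5, 5, 4, 4, 0, 5, 4],   # a supervisor
])
print(f"overall perceived performance: {overall_score(ratings):.2f}")
```

Skipping the "Not Observed" cells rather than coding them as zeros keeps an unobserved construct from dragging the overall measure down.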
Performance Rating
The NASA/UT Astronaut Assessment Survey was designed to
solicit performance ratings from multiple perspectives.
This technique is referred to as a 360-degree rating format,
which uses raters from multiple observational perspectives
to describe overall perceived performance. A 360-degree
rating decreases perception error and increases behavioral
observation stability (Schwarz, Barton-Henry, & Pruzinsky,
1985). Many organizational psychologists prefer 360-degree
evaluation techniques, because they represent perceptions
and observations from many diverse settings by informants
with unique insights (Woodruffe, 1984). Meta-analytic
results have shown that observers are probably in a better
position to judge an individual's reputation and
performance, as peer-peer, peer-subordinate, and peer-
superior ratings of an individual's behavior had
substantially higher correlations than self-peer, self-
subordinate, or self-superior ratings of behavior (Harris &
Schaubroeck, 1988). Similarly, Hazucha (1991) reported
that self-ratings of managerial skills failed to
differentiate high performing from ineffective managers, but
observers' ratings of managerial skills clearly identified
both types of managers. Thus, recent research seems to
conclude that multiple perspectives strengthen performance appraisal validity and reliability.
This research incorporates perspectives from superiors,
subordinates (students), and peer instructors to develop an
overall global measure of perceived performance.
Higher Education
There are many purposes to performance appraisal of
faculty in higher education, such as improving teacher
performance, aiding administrative decisions, guiding
students in course selections, meeting state and
institutional mandates, and promoting research on teaching
(Millman, 1987). For this study, improving teacher
performance was the focus. The role of the teacher in
higher education has evolved to place primary responsibility
for pupil achievement on the teacher and the learning
conditions rather than on the pupil (Travers, 1987). This change
in philosophy occurred at the turn of the twentieth century
and changed the criterion for teacher effectiveness. Early
62 attempts to measure teacher effectiveness under the new
paradigm concentrated initially on test scores of students.
Over the past fifty years this criterion for teacher
effectiveness has again evolved from student test scores to
aspects of teacher behavior related to the growth of pupils
in achievement. By 1973, Rosenshine and Furst had summarized
the ideal teacher behavior:
Teachers most effective in producing learning are clear in the expression of their ideas, variable and flexible in their approaches to teaching, enthusiastic, and task oriented. (p. 19)
Over the past decade, research in teacher effectiveness
is again shifting back to the student and away from the
teacher. Although real knowledge has been obtained
concerning teaching effectiveness criteria, measurements are
less clear. Recent research has concluded that pupils adapt
well to many different approaches to teaching, and
quantifying student progress in learning remains ambiguously
defined and dependent on the situation and material (Hoge &
Luce, 1979). It is widely accepted in higher education
today that teacher performance appraisal is multi-
dimensional and reflective of multiple appraisal groups
possessing unique insights into the teacher's behavior (Marsh,
1987). As a result, teacher development and effectiveness
in higher education is primarily measured with student
ratings, peer reviews, and indirect measures of teacher
competence (Travers, 1987). This same criterion provides
63 the basis of instructor pilot performance assessment for the
present study.
One of the more controversial measures of teacher
performance is student ratings. Faculty objections cite that
students lack subject expertise and are influenced by course
difficulty and grading practices, which would imply students
are neither qualified nor reliable as judges of teacher
performance. These concerns have been empirically addressed
over the past thirty years. Aleamoni (1976) found:
Students frankly praised instructors for their warm, friendly, humorous manner in the classroom, but if their courses were not well-organized or their methods of stimulating students to learn were poor, the students equally frankly criticized them in those areas. (p. 112)
This finding was replicated in studies concluding that
students are informed and reliable judges of teacher
performance (Costin, 1971; Frey, 1978; Perry, 1979; Ware &
Williams, 1977).
Historically, peer review of a faculty member's work
was used for appointment, promotion, the granting of tenure,
the selection of manuscripts for publication, and the
approval for research grants (Lazovik, 1987). Although peer
review has been widely used in evaluation of research
scholarship, its role in evaluating teaching has been widely
ignored (Batista, 1976). This trend is rapidly reversing
due to the unique qualification peers can offer to faculty
development and appraisal. According to Lazovik (1987),
faculty peers are uniquely qualified to judge the substance
of teaching because: "(1) their knowledge of the discipline
being taught provides the background against which
comparison can occur and (2) their long training in the
evaluation of evidence enables them to weigh what is
revealed through documentation" (p. 75). Peer-instructors
possess subject expertise and judgment experience, which fill
the gaps left by student appraisals alone. This study utilized
both student and peer appraisals of instructor pilot
performance to provide a comprehensive perspective and
appraisal.
Before student and peer ratings were established and
accepted as valid faculty performance appraisals, indirect
measures provided the primary faculty performance criteria.
Mitzel (1960) labeled these criteria presage variables,
which included "(a) teacher personality attributes, (b)
characteristics of teachers in training, (c) teacher
knowledge and achievement, and (d) in-service teacher status
characteristics" (p. 1484). Borich (1977) updated these
labels to personality, aptitude/achievement, attitude, and
experience. Although well established in higher education
faculty assessment, presage variables are under professional
scrutiny and a basis of polarizing controversy. Educators
contend indirect measures may predict retention in a
teaching position rather than instructional effectiveness
(McNeil & Popham, 1973). This study utilizes the presage
variables of personality and demographics to explore
teaching effectiveness and cross-correlates the results with
student and peer ratings. It should be underscored that
while correlations may illustrate relationships, they do not
prove causation. The present study is an extension of the
application of presage variables in faculty assessment.
Summary
In summary, a review of the literature showed renewed
interest and support in using personality theory to predict
job performance. Previous low correlations are attributed
to poor instruments, improper methodology, and inadequate
job performance appraisal. New personality instruments such
as the PCI are yielding promising potential, especially in
the aviation environment. The PCI, along with the five-
factor model, are the leading edge of personality research.
Improper methodology focused personality research on
selection criteria rather than classification. Several
confounds occur when applying personality theory to the
selection process that compromise potential correlations
with future job performance: pre-selection range
restrictions, the honeymoon effect, and "faking." This may be
why student pilot attrition has remained constant at 20 percent
for the past 40 years.
applied to the classification process, as originally
suggested by Thorndike (1949), these confounds are better
controlled. Primary concerns and controls for the
classification process hinge on controlling "faking" and the
honeymoon effect, and on developing appropriate job performance
appraisal. For this study, "faking" was controlled by
identifying the personality assessment tool as the same one
used in airline hiring. IPs were warned about the "lie"
scales and were offered airline hiring desirability feedback
at the end of this study. The honeymoon effect was
controlled by assessing flights that had been established
for a minimum of two months. Job performance appraisal was
more difficult to control. Previous job criterion and
appraisal techniques for instructor pilots have been
underdeveloped. No studies were found that specifically
investigated the relationship of personality traits with
comprehensive job performance of instructor pilots.
Therefore, the present study will attempt to add to
personality theory research and job performance appraisal
development by investigating perceived job performance
relationship with personality traits.
CHAPTER III
METHODOLOGY
There were three purposes to this study: to measure the
perceived validity of a new performance assessment
instrument, the NASA/UT Astronaut Survey, applied to
military flight instructors; to develop a global measurement
of instructor pilot performance; and to construct regression
equations that predict overall (officership, flying, and
instructional) perceived instructor pilot performance using
personality traits, demographic characteristics, and a
combination of personality traits and demographic
characteristics. This chapter discusses the following: (a)
research design, (b) scope of the study, (c) subjects, (d)
instrumentation, (e) procedures, (f) variables, (g) data
analysis, (h) research concerns, and (i) significance for
policy and theory.
Research Design
This is a complex predictive study that utilized
multiple survey instruments. The first objective was to
determine the validity of the dependent variable of the
regression equation, perceived performance. Perceived
performance was measured with the NASA/UT Astronaut
Assessment Instrument that assesses perceived performance on
seven scales: job competence-knowledge, job competence-
performance, job competence-performance under pressure,
leadership, teamwork, personality, and communication skills.
Scale ratings are subsequently combined to provide a single,
overall performance measure. The survey and construct
definitions are shown in Appendix A. Perceived external
validity of the performance instrument was accomplished by
asking students, peer-instructors, and supervisors to rate
each of the seven performance constructs using a Likert
scaled question on the construct's appropriateness in
assessing UPT instructor pilots. All instructor pilots were
rated on this instrument by three groups: student pilots,
peer instructors, and a supervisor. These groups were
chosen because of their unique insights on the various job
characteristics of an instructor pilot. On average, each
instructor was rated by approximately 15 students, 8 peer
instructors, and 1 supervisor. Rating group scores were
averaged and then combined with the other groups' ratings
through a weighted formula to achieve a single overall
performance score (OPS) for each instructor, as shown in
Equation 1.
OPS = [student ratings x (0.40)
+ peer instructor ratings x (0.40) (Eq. 1)
+ supervisor rating x (0.20)].
This weighted equation provided an overall performance
rating for each instructor comprised of perceived job
effectiveness insights from all three groups. Supervisor
ratings have a lower weighting because supervisors have less
interaction with the IP. The equation
weighting was adopted from a similar performance appraisal
model used to evaluate student pilot overall class standing
(ATCR 51-10, Attachments 1 & 2). Therefore, the dependent
variable of perceived performance is a single score
representing an overall perceived performance rating from
three groups assessing instructor performance over seven
scales.
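The weighted computation in Equation 1 can be sketched in code. The following is a minimal illustration only; the function name and the sample ratings are hypothetical and are not drawn from the study's data.

```python
# Hypothetical sketch of the overall performance score (OPS) in Equation 1.
# Names and example ratings are illustrative, not from the study.

def overall_performance_score(student_ratings, peer_ratings, supervisor_rating):
    """Combine the three group ratings into a single weighted OPS (Eq. 1)."""
    student_mean = sum(student_ratings) / len(student_ratings)
    peer_mean = sum(peer_ratings) / len(peer_ratings)
    # Student and peer means each carry 40 percent; the supervisor carries 20.
    return 0.40 * student_mean + 0.40 * peer_mean + 0.20 * supervisor_rating

# Example: one instructor rated by four students, two peers, one supervisor.
ops = overall_performance_score([6, 5, 7, 6], [5, 6], 6.0)
# 0.40*6.0 + 0.40*5.5 + 0.20*6.0 = 5.8
```

In practice each instructor's group means would first be averaged over the full rating pool described above (roughly 15 students and 8 peers per instructor).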
The second objective of the study was to identify
personality traits that predict perceived performance
ratings. An aviation specific personality assessment
instrument, the Personality Characteristics Inventory (PCI),
was used. The PCI is a self-reporting personality
assessment instrument that measures 11 specific personality
traits. The 11 personality traits were used as independent
variables to predict the overall perceived performance
rating. This and the succeeding demographic model were
built on a comprehensive cluster sample.
The final objective of this research was to identify
demographic characteristics that may predict instructor
pilot perceived performance. This model was built the same
way as the personality predictor regression equation. It
utilized a modified demographic survey soliciting key
professional development and individual description
criteria. Additional information about the demographic
characteristics model and all of the surveys used are found
later in this chapter under "Instrumentation."
Prior to the selection of survey instruments or
collection of any data, a pilot study was undertaken.
Forty-five student pilots near UPT completion, 10
supervisors ranging from squadron commanders to flight
commanders, and 20 peer instructors were asked to identify
characteristics of "good" and "bad" instructors. Analysis
of group responses revealed common themes and distinct group
concerns. All groups listed "job knowledge" and "aircraft
flying ability" as predominant characteristics of a good
instructor pilot. Students and peer instructors further
identified student/instructor relationships as an important
attribute. Students commonly labeled this desired
characteristic as "sincerity," while peer-instructors called
it "dedication." Supervisors distinctly identified
"leadership" or "officership" as trademarks of good
instructors. All groups identified similar characteristics
of a poor instructor: self-serving, arrogant, abrasive, and
ignorant of duties or knowledge. These characteristics were
later used in identifying appropriate performance and
personality assessment instruments.
Scope of the Study
This study reflects the responses of the entire
population of student pilots and line-instructor pilots at
Reese Air Force Base, TX, from January to March 1993.
Additionally, a second cluster sample from Vance Air Force
Base, OK, was used to increase the sample size for the
regression models. The cluster sampling was achieved by
assessing all flights at Vance that had been established a
minimum of 2 months with their associated IP force. Each
base represents approximately 25% of the total population.
Comprehensively the sample consists of roughly 33% of the
entire AETC population. The study specifically includes
approximately 350 student pilots and 150 instructor pilots.
The cluster sampling used should be representative of the
universe of Air Force instructor pilots since all are
homogeneously selected, trained, assigned duty stations, and
implement a common student syllabus. The students also are
from a homogeneous pool and are trained similarly. The most
significant differences between the four training population
groups are the weather conditions under which they operate.
This is a negligible difference since the training syllabus
narrowly defines the requirements for continuity of
training.
Subjects
The population for the present study consisted of all
U.S. Air Force instructor pilots involved in Undergraduate
Pilot Training. This amounts to approximately 500
instructors stationed at four different training bases. It
was not feasible to coordinate and implement the survey to
the entire population. Therefore, a representative sample
was used for the purpose of the study. Two training bases
participated: Reese AFB, TX and Vance AFB, OK. Twenty-two
flights (classrooms) were sampled from the two bases,
providing a sample of 150 instructors, which represented 33%
of the population. In order to achieve comprehensive
classroom ratings and a stratified sampling of the various
instructor and student compositions in UPT (FAIPs/MWS IPs,
Academy/ROTC/OTS students), entire flights, or classrooms,
were selected. This sampling process provides a form of
cluster sampling and provides a good representation of the
population and a sufficient number of subjects to
successfully complete a regression study.
Selecting the sample in such a non-random manner
threatens the external validity of the study if the sample
group is significantly different from the population. This
is not the case for this study. The subjects in this study
are from a homogeneous pool. All instructors receive
identical training at Randolph AFB for three months before
becoming instructors, and they teach from a standardized
syllabus. They share relatively similar professional and
demographic backgrounds throughout their distribution over
the four UPT training bases (Air Training Command, 1984).
The students are also from a homogeneous pool of
candidates, representing similar backgrounds and
qualifications (Air Training Command, 1984; R. Davis, 1989).
The training syllabus is uniform for all bases and directed
by the headquarters agency at Randolph AFB. The similar
representation of both students and instructors, along with
a uniform training syllabus, make cluster sampling a
credible estimate of the population.
Requirements for participation in the study were: (1)
all instructors and students in a flight must participate
together; (2) students and instructors must have been
assigned a minimum of two months to the flight to ensure
familiarization with the instructor; and (3) all instructors
must be currently teaching students in the aircraft. The
reasons for these requirements were: to ensure a
comprehensive assessment of each instructor, to ensure
enough time had elapsed for a student/instructor
relationship to evolve, and to ensure all instructors were
proficient in their instructor duties. Data related to
instructors from partial and special duty flights, such as
evaluators, were collected but not used in this study.
Instrumentation
The study consists of three distinct survey
instruments: the performance rating measurement, the
Personality Characteristics Inventory, and the demographics
data collection (Appendix A). The following sections will
detail each of these instruments.
Performance Measurement
The perceived performance rating of each instructor was
the dependent variable for this study. There are two key
considerations when using a performance rating variable: (1)
who will do the rating? and (2) what will be the criteria?
Subscribing to the currently emphasized philosophy in the
Air Force of Total Quality Management, the "who will do the
rating?" was identified as the customer base of AETC
instructor pilots. These customers compose three specific
groups: the student pilots, supervisors or flight
commanders, and associated peer instructors. Each group has
a unique insight into the overall performance of individual
instructors and possesses different expectations of what
constitutes quality performance. The method used in
combining ratings from these groups into an overall score
was illustrated in Equation 1.
The criteria for rating performance were identified from
students, peers, and supervisors during the exploratory
pilot study. An instrument was needed that would measure an
instructor's perceived job knowledge, flying ability,
instructor/student relationship, leadership, and
personality. Current pilot assessment and evaluation
instruments focus on flying ability and lack quantifiable
measures of instructor/student relationship, leadership, and
personality. Furthermore, most existing rating instruments
were completed solely by supervisors or evaluators. Experts
throughout military and commercial aviation did not know of
any empirical performance assessment measure that included
ratings from multiple groups such as peers and students. As a
result, a new instrument was required. After consultation
with the Behavioral Science Department at the U.S. Air Force
Academy, a newly developed NASA/UT performance measurement
instrument was identified and recommended.
The NASA/UT astronaut assessment project recently
developed a performance effectiveness survey which is based
on peer and supervisor ratings. The instrument measures
perceived performance effectiveness over nine dimensions.
These scales are: Leadership, Teamwork, Group Living,
Personality, Space Station Compatibility, Communication
Skills, and three scales associated with job competence- Job
Knowledge, Job Performance, and Performance under Pressure.
Labels of these scales and inter-observer agreement values
are provided in Table 1. The inter-observer agreement means
are the means of the various peer rating groups' correlations.
The NASA/UT project team established the peer rating
scale categories for a high pressure, aviation environment
similar to the one in which AETC Instructor Pilots operate.
The performance measurement survey in this study borrows the
astronaut rating criteria elements using seven of the nine
categories. The categories not used were specifically
concerned with space station living conditions. The
modified astronaut effectiveness elements have been
condensed to a worksheet rating instrument (Appendix A).
This worksheet provides a numerical rating scale that
enables raters to quantify their assessments. The overall
rankings are summarized at the bottom of the instrument.
Raters were instructed to rate all instructors in the flight
(classroom), whether or not they have actually flown with
the subject.
Table 1
NASA/UT Performance Survey

Attribute                        Individual Agreement Mean
Communication                    0.56
Job Competence (Performance)     0.66
Job Competence (Pressure)        0.65
Leadership                       0.63
Personality                      0.62
Teamwork                         0.61
Knowledge                        0.66
Overall Mean                     0.62
The worksheet and rating criteria elements provide a
uniform and focused assessment of each instructor.
Soliciting ratings from the three separate customer groups
provides a comprehensive appraisal of the overall
performance of the instructor. Pooling the ratings of each
group and then formulating a weighting across groups
provided a comprehensive and quantifiable dependent variable
for the study. This method of developing a universal
measure of perceived performance from multiple rating groups
(Equation 1) was duplicated from a similar performance
assessment program used to determine student pilot class
standings (ATCR 51-10, Attachments 1 & 2).
Personality Assessment
In order to accommodate the timeline of this study, an
existing and validated personality assessment instrument was
required. Several "off-the-shelf" personality assessment
instruments were available, but few specifically apply to
the unique aviation environment. The "Big-Five" personality
theory, discussed in Chapter II, that was recommended by the
Air Force for future consideration in aircrew screening, is
still at a fledgling stage of development and lacks a
validity and reliability record. Because of the constraints
of this study, experts in aviation psychology at the United
States Air Force Academy's Department of Behavioral Sciences
and Leadership recommended the use of an existing
personality survey, the Personality Characteristics
Inventory (PCI). Their recommendation was endorsed by both
the Naval Aerospace Medicine Research Laboratories and the
Air Force Armstrong Laboratories.
The PCI (Appendix A) was developed by the NASA/UT team
to specifically assess aircrew personality. It contains
scales from three psychometric instruments: the Work and
Family Orientation Questionnaire (Helmreich & Spence, 1978);
the Extended Personal Attributes Questionnaire (Spence,
Helmreich, & Holahan, 1979); and the Impatience/Irritability
and Achievement Striving Subscales (Predmore, Spence, &
Helmreich, 1988). The PCI is widely used and highly
recognized in the aviation industry and has established the
highest reliability record for aircrew personality
appraisal. Reliability coefficients for most scales ranged
from 0.60 to 0.75, with only one scale below 0.60, having a
low value of 0.46. Table 2 illustrates the various personality
trait scale compositions in the PCI along with the
associated test/retest reliability coefficients.
The operational unit in measuring personality is the
trait. The PCI instrument is targeted to assess two broad
personality trait dimensions: Instrumentality, or goal
orientation, and Expressivity, or interpersonal capacities.
Because instrumental and expressive attributes are
considered by behavioral scientists to conflict and to be
mutually exclusive, they are measured independently.
Table 2
PCI Constructs

Trait Attribute - subscale               Code   Number of    Reliability
                                                Questions    coefficient
Instrumentality                          I+     8            0.74
- Mastery                                M      8            0.61
- Work                                   W      6            0.66
- Competitiveness                        C      4            *
- Achievement Striving                   AS     6            *
- Negative Instrumentality               I-     8            0.69
Bipolar Instrumentality/Expressivity     I-E    8
Expressive                               E+     8            0.75
- Verbal Aggression                      VA-                 0.60
- Negative Communion                     C-                  0.46
- Impatience/Irritability                II                  0.66

* Entire database was not available for comparison.
Much of the following narrative describing the
constructs of each scale has been borrowed directly from
NASA/UT documentation (Gregorich et al., 1989).
Instrumentality contains a central element of achievement
motivation, which is defined as motivation directed at the
attainment of goals. This construct is further broken down
into three distinct components: Mastery needs or the desire
to undertake new and challenging tasks; Work needs or
satisfaction and pride in working well; and Competitiveness
or the desire to surpass others in all areas of endeavors.
The achievement motivation constructs are further
supplemented with two more universal scales, Instrumentality
and Achievement Striving, which address the motivation for
attainment. Instrumentality is also measured on a negative
scale which reflects an autocratic, dictatorial orientation.
Individuals that rate high on this scale tend to achieve
goals at the expense of others and without regard for their
sensitivities.
The Expressive attributes contain four subscales, one
positive and three negative. Expressivity consists of
traits reflecting warmth and sensitivity to others. Verbal
Aggression refers to a type of nagging hostility directed
toward others. Negative Communion refers to a passivity in
interpersonal relations. Impatience/Irritability refers to
a pattern of drive and annoyance in dealing with others.
The labels for the various scales range as follows: very
poor, poor, average, good, very good. These are arranged so
the "very good" always has the most positive score on that
scale, even though some of the scales measure negative
characteristics. Thus "very good" on the verbal aggression
scale means the client has "less" of this bad trait. PCI
factor composition by specific questions is illustrated in
Appendix C.
Demographic Data
Each subject instructor pilot was asked to complete a
"Demographic Data" questionnaire (Appendix A). This
instrument was used to obtain data relative to biographical
information, professional development, educational
background, and family structure. The demographic survey
provides the independent variables for the second regression
model. This is an exploratory feature of the study that
attempts to validate future Air Force intent to emphasize a
more mature, experienced instructor force. Since the Air
Force has decided to select future instructors with
operational backgrounds rather than FAIPs (Curry, 1993), this
may provide key indicators as to what specific operational
backgrounds best complement instructor pilot positions.
Variables from the demographic survey were compiled
from previous aviation instruments, Air Training Command
surveys, higher education surveys designed by Dr. Alexander
Astin at UCLA, and recommendations from senior Air Wing
officers. The validity of the instrument was established
through a pilot study where recently graduated students,
senior supervising officers and peer instructors listed
demographic variables that influenced instructors'
attitudes, behavior, and performance.
Research Procedures
The research procedure for this study consisted of
three key objectives: permission to survey the sample
population, implementation of the survey, and statistical
analysis of the data collected. This section covers the
permission and implementation of the survey. Statistical
analysis is covered later in this chapter under Statistical
Analysis. Permission to conduct the survey was obtained
from the Air Force Military Personnel Center (AFMPC/DPYMOS).
A copy of the approval letter is in Appendix B. After Air
Force approval, Wing Commanders of the four training bases
were contacted soliciting participation and support for the
study. Appendix B illustrates a sample of a personalized
letter to the Wing Commanders. Two of the four Commanders
contacted felt they had the time to support the study: those
from Reese AFB, TX, and Vance AFB, OK. The two Wing
Commanders introduced the study's objective to appropriate
staff and significantly paved the way for full cooperation
by their flying organizations.
Survey implementation was personally administered by
the researcher and required three weeks. Operations
Officers from the flying squadrons directed short-notice
mandatory flight meetings. All instructors and students were
required to attend. The study was introduced and the survey
was subsequently implemented, requiring approximately 30
minutes of instructor's time and 10 minutes from students.
The students were placed in different rooms than the
instructors. Instructors completed the full survey, while
the students simply completed the instructor performance
assessment section. Although participation was announced as
voluntary, almost all sample target subjects participated.
Only those students and instructors that were absent from
the mandatory meeting were omitted. Most of these subjects
were later included in a make-up session.
Instructors were briefed and surveyed before students
since their participation required the greatest amount of
time. They were introduced to one section of the survey at
a time in a blind research design fashion and did not begin
the next sections until all peers were complete. This was
done to achieve focused and honest self-reporting. It was
assumed that if the instructors knew the study was attached
to perceived performance ratings, it might have biased the
self-reporting personality section of the survey. The first
section completed by instructors was the demographic data
collection followed by the personality characteristics
inventory. Personality assessment is one of the key tools
used in airline hiring, so instructor pilots were eager to
cooperate in this part of the survey to learn the
characteristics of their profiles. Instructors were briefed
on the perils of over- and underreporting on a self-reported
personality assessment and clearly understood the importance
of an honest reporting.
The final section of the survey, peer performance
appraisal, was met with some resistance by the instructors.
They were reluctant to report on peers or have peers report
on them for fear the results might be incriminating. Privacy
Act statements were distributed at this time and the
identification coding operation explained. The first step
for all subjects in completing the NASA/UT Astronaut
Assessment Survey was validating the scales. Each rater was
directed to the second page of the instrument where each
scale was explicitly defined. The raters were then
instructed to rate the appropriateness of each scale in
assessing instructor pilot performance. They used a Likert
scale to respond to the following statement: "I feel this
scale is appropriate in assessing instructor pilot
performance." The raters then listed the names of all
instructors in their flight on the top of the rating matrix
on the first page. The names were read by the
administrators, ordered by rank. After the names were
listed, each rater proceeded to rate as many instructors on
as many scales as possible. After completing the
performance appraisal section, instructors placed a random
four digit coding number on the top of the survey and the
names on the performance appraisal instrument were replaced
by the researcher with codes. This operation was done in
front of the instructors so anonymity of their participation
was convincingly reinforced.
While instructors completed the first two sections of
the survey, students were briefed on their participation in
a separate room. Students first wrote in the names of all
instructors in the flight across the top of the appraisal
matrix (Appendix A). The order of instructor names was
provided by the administrator and was arranged by order of
rank. The students simply filled out the performance
appraisal section assessing all the instructors in the
flight. There was no identification of the student attached
to the instrument so participation was completely anonymous.
Variables
The multiple regression portion of the study used
personality traits and demographic characteristics to
predict perceived performance. The dependent (criterion)
variable was the overall perceived performance rating. It
consists of a continuous interval scale: 1 (poor), 2 (below
average), 3 (average), 5 (above average), 6 (good), and 7
(excellent). A unique distinction in the scale is the
missing value response of 4. This feature was added after
the pilot study to help control perceived rating bias. If
an instructor was new to the flight, not all students or
peers may know him/her well enough to evaluate. The 4 value
was then used to indicate an inability to assess, or missing
value. The 3 and 4 values were highlighted on the survey to
distinctly differentiate between an average rating score and
a missing value. In the regression design, the 4 value was
treated as missing data.
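The handling of the 4 response can be sketched as follows. This is a hypothetical illustration only; the function names and sample ratings are invented for the example, and the study's actual data processing may have differed.

```python
# Sketch of recoding the rating scale, where a response of 4 marks
# "unable to assess" rather than a true scale value. Illustrative only.

def clean_ratings(raw):
    """Drop the 4 responses, treating them as missing data."""
    return [r for r in raw if r != 4]

def mean_rating(raw):
    """Mean of the usable ratings, or None if no rater could assess."""
    usable = clean_ratings(raw)
    return sum(usable) / len(usable) if usable else None

# Example: two raters could not assess this instructor; their 4s are dropped.
avg = mean_rating([3, 4, 5, 6, 4])  # mean of [3, 5, 6]
```

Because the 4 value is excluded before averaging, new instructors rated by only a subset of students or peers still receive an unbiased group mean.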
The three regression equations used illustrate the
relationship between the dependent variable of perceived
performance and three different sets of predictors:
personality traits, demographic characteristics, and a
combination of personality traits with demographics. The
first regression model used the eleven self-reported
personality trait scores as predictors. These scales all
contained a continuous interval scale and ranged from a 1
(strongly agree) to a 5 (strongly disagree). Some of these
questions were a reverse scoring scale. In all cases,
however, a high score indicated more of the desirable or
good quality. The second regression model used demographic
characteristics as the independent variables to predict
overall perceived performance rating. Both regression
models were developed from the comprehensive cluster
sampling collected from Reese and Vance Air Force Bases.
This procedure is further defined in the following section.
Statistical Analysis
The major research question of this study is concerned
with the prediction potential that personality traits may
have on perceived instructor pilot performance. Prediction
studies require the use of regression, and in this case
since multiple independent variables are involved, multiple
regression. A stepwise multiple regression equation was
developed using the eleven independent variables of
personality trait scores to predict an overall perceived
performance score. There are three features in this design
that require further explanation: the validity assessment of
the dependent variable (perceived performance), the
computation of an overall perceived performance score, and
the stepwise regression technique used to construct the
prediction equations.
The initial concern of this study was the valid
representation of the dependent variable, perceived
performance. Perceived performance was measured with the
NASA/UT Astronaut Assessment Survey. The validity of this
instrument was established by asking the various rating
groups (students, supervisors, peers) to rate each
performance scale for appropriateness in assessing an IP's
job performance. Each rating construct had explicit
definition and examples of what it included. A Likert scale
ranging from Strongly Agree to Strongly Disagree was used to
respond to the following statement for each scale: "I feel this
scale is appropriate in assessing instructor pilot
performance." The Likert scale ratings of each scale were
then averaged for each group. An ANOVA was used to
distinguish significant statistical differences between the
group ratings. The results identified the perceived
appropriateness of the performance scales in measuring
instructor pilot performance and were contrasted with
findings from the literature.
The seven performance scales were combined to create a
single performance score. Each instructor would therefore
have a single performance score from each of the three
rating groups. The overall performance score, which
represents the dependent variable in the regression
equation, was computed from a weighted computation using the
ratings from three groups (Equation 1). Each group
(students, supervisors, and peers) rated an instructor in
seven categories on a seven point scale. The student and
peer groups each represented 40 percent of the weighting
with supervisors constituting the remaining 20 percent.
This weighting proportion is based on a similar ATC student
pilot assessment design (ATCR 51-10, Attachments 1 & 2). An
ANOVA was also used to distinguish statistically significant
differences among the group ratings.
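The ANOVA comparison of group ratings can be illustrated with a minimal one-way F statistic computed by hand. The data and names below are hypothetical; the study's actual analysis would have used a statistical package and also reported significance levels.

```python
# Minimal one-way ANOVA F statistic, of the kind used to compare the
# student, peer, and supervisor rating groups. Hypothetical data only.

def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

students = [5.2, 5.8, 6.1, 5.5]
peers = [5.9, 6.2, 6.0, 5.7]
supervisors = [5.4, 5.6, 5.9]
f_stat = anova_f([students, peers, supervisors])
```

A large F relative to the critical value for (k-1, n-k) degrees of freedom would indicate that at least one rating group differs significantly from the others.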
Stepwise regression is a technique used to control the
order in which the independent variables are entered and
removed in the regression model. The independent variable
89
with the largest Pearson correlation with the dependent
variable is entered into the equation first. The second
independent variable selected is the one that results in
the largest increase in R2 beyond that of the first
variable. At each step after a new predictor variable is
added to the model, a second significance test is conducted
to determine the contribution of each of the previously
selected predictor variables, and a predictor variable may
be removed if it loses its effectiveness when combined
with the additional variables. The advantage of this stepwise
regression technique is that some of the possible overlap
between variables is moderately controlled, and the
strongest relationships between predictor and criterion
variables are entered first.
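The first entry step of the stepwise procedure, choosing the predictor with the largest Pearson correlation with the criterion, can be sketched as follows. All names and data here are hypothetical; a full stepwise implementation would also test R-squared gains for each added variable and remove weakened predictors.

```python
# Sketch of the first stepwise-regression entry step: select the predictor
# most correlated with the criterion. Trait names and data are invented.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def first_predictor(predictors, criterion):
    """Name of the predictor with the largest absolute correlation."""
    return max(predictors,
               key=lambda name: abs(pearson_r(predictors[name], criterion)))

traits = {"mastery": [3, 4, 5, 4], "competitiveness": [5, 2, 4, 1]}
ops_scores = [5.0, 5.5, 6.2, 5.6]
best = first_predictor(traits, ops_scores)
```

Subsequent steps would enter whichever remaining trait most increases R2 and re-test the significance of previously entered traits, as described above.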
Research Concerns
The performance computation representing the dependent
variable of the regression equation presents a major concern
to the study. It is derived from a seven-scale performance
assessment instrument, where the mean scores from the scales
are averaged to provide a single performance measurement
value. The scales are combined to create a comprehensive
measure of performance in the regression design. The
combining of performance scales, however, may dilute or
"cloud" the actual performance rating. A better method to
investigate a multiple-factor dependent variable is
structural modeling. This method was not used in this study
due to its complexity and time constraints.
Another concern in the performance variable is its
multiple group weighting. The overall score represents
combined ratings from three distinctly different groups.
This procedure was accomplished in an effort to create a
comprehensive, universal performance measurement
representing all the customer groups associated with
instructor pilot performance.
Significance for Policy and Theory
This study contributes to the use of personality trait
theory in aviation training, and further develops
performance assessment criteria for Air Force instructor
pilots. Historically, personality theory application in
aviation has focused on pilot selection screening. This
study extends beyond simple selection application and
explores the use of personality theory applied to instructor
pilot assessment and classification. It adheres to, and
possibly pioneers, the original recommendation of Thorndike
(1949) that personality research should be used for
performance assessment of pilots in their operational
aircraft. Additionally, the study further explores presage
variables of faculty performance by developing relationship
correlates between personality and demographic
characteristics with student and peer instructional ratings.
The contribution to the applied nature of this research
is even greater. Visibility of the personality findings
will underscore the need for special instructor skills in
communication and instructor/student relationships. The
instructor pilot training syllabus may be enhaced to expand
and include further development of these skills. The
performance appraisal section may offer a new tool for both
formal and developmental feedback and assessment of
instructor pilots. The instrument provides new job
performance criteria that are currently either subjectively
measured or
not measured at all. Additionally, the performance
assessment from this study will highlight the 360 degree
performance appraisal technique for instructor pilot
assessment. Such a technique adds validity to performance
ratings by accounting for multiple, unique perspectives of
an instructor's performance. The 360-degree feedback is
more comprehensive, which may provide instructors more
detailed and astute performance critiques that will enable
them to better adjust their teaching styles, thereby
improving the instructional process.
CHAPTER IV
ANALYSIS OF DATA AND DISCUSSION
This study investigated the use of self-reported
personality trait scores and demographic variables in
predicting instructor pilot perceived performance ratings.
Three regression equations were developed to predict
perceived performance using: personality traits, demographic
characteristics, and a final equation using both personality
traits and demographic characteristics. Perceived
instructor pilot (n=152) performance ratings were determined
by three groups of raters: students (n=271), peer-
instructors (n=133), and supervisors (n=19). Each group
rated instructors using seven behavioral performance
assessment dimensions. These seven behavioral dimensions
were then collapsed to derive an overall perceived
performance rating called the "Grand Mean" for each
individual instructor pilot. An "Overall Grand Mean" was
then computed by combining the Grand Mean ratings from the
three rating groups. The Overall Grand Mean served as the
dependent variable in the regression equations and
represented a global performance measure of instructor pilot
performance.
The following research hypotheses were investigated.
1. There will be no difference in the appropriateness
ratings of the seven behavioral assessment scales from the
groups (students, peer-instructors, and supervisors).
2. There will be no difference in perceived
performance ratings of instructors by students,
peer-instructors, supervisors, and self.
3. There will be a significant relationship between
perceived effectiveness ratings of instructor pilots at UPT
and the following personality trait scale scores:
instrumentality, expressivity, mastery, work,
competitiveness, achievement striving.
4. There will be a significant relationship between
the following personality trait scale scores and perceived
effectiveness ratings of instructor pilots at UPT: negative
instrumentality, verbal aggression, impatience/
irritability, negative communion.
5. Personality traits can be used to create a
predictive profile of instructor pilot performance.
6. Demographic characteristics can be used to create
a predictive profile of instructor pilot performance.
7. Personality traits and demographic characteristics
can be used to create a predictive profile of perceived
instructor pilot performance.
In order to test these hypotheses, the investigator
collected data from two UPT bases. The cluster sampling
provided a comprehensive and stratified representation of
the population. Specific descriptions and comparisons of
the subjects to the population are listed in Tables 3 and 4.
Table 3

AETC Instructor Pilot Demographics

                          Population   Sample 1     Sample 2
                          (%)          Reese (%)    Vance (%)
                          N = 708      n = 101      n = 51
Gender
  Male                    97.5         97.0         98.0
  Female                   2.5          3.0          2.0
Marital Status
  Single                  37.2         37.6         31.4
  Married                 62.8         62.4         68.6
Commissioning Source
  Academy                 35.0         32.7         33.3
  ROTC                    50.0         52.4         51.0
  OTS                     15.0         14.9         15.7
Age
  24 - 27                 57.0         57.4         62.7
  28 - 30                 28.0         26.7         25.5
  31+                     14.0         14.9         11.8
Rank
  1 LT                    30.3         47.5         60.8
  Captain                 69.0         50.5         39.2
Table 4

AETC Instructor Pilot Flying Experience

                              Population   Sample 1     Sample 2
Flying Hours                  (%)          Reese (%)    Vance (%)
                              N = 708      n = 101      n = 51
Total flying hours
  under 500                    6.8          5.9          0.0
  501 - 800                   34.9         28.7         35.3
  801 - 1200                  39.1         38.6         43.1
  above 1200                  19.2         26.7         21.6
Instructor Pilot flying hours
  under 200                    4.5          6.9          0.0
  201 - 500                   43.4         51.5         47.1
  501 - 1000                  46.2         39.6         49.0
  above 1000                   5.9          2.0          3.9
Entire classrooms (flights) were sampled at a time.
Each flight was composed of roughly 10 instructors, 15
students, and one supervisor. The representative instructor
pilot is a 27-year-old male, married with no children,
holding a bachelor's degree, commissioned from either the
Air Force Academy or an ROTC program, with approximately 950
hours of total flying experience and 500 hours of instructor
pilot flying time. Specific population parameters for the
AETC instructor force were obtained from AETC Headquarters
and help illustrate typical profiles of this homogeneous
group (see Tables 3 and 4).
The samples for this study closely paralleled the
population's demographic parameters. The Reese sample was
slightly older, and more of its members were married, than
the population and the Vance sample. The combination of
Reese and Vance minimizes this variance. Both samples were
of higher military rank and flying experience than the
population. This was expected due to the assignment
stagnation caused by the military drawdown. Instructor
pilots are now rotated to follow-on assignments almost 18
months later than previously. AETC has confirmed this
recent practice as drawdown driven and indicated the entire
population is rapidly becoming more experienced and older.
AETC has confirmed the sample demographic statistics are
representative of the changing population parameters.
Instructors completed a demographic survey followed by
a self-reporting personality inventory. After completion of
the first two instruments, the instructors and students were
given a perceived performance rating instrument. Until this
time all subjects were blind to the performance assessment
objective. Instructors and students were then instructed to
use a Likert scale to rate each instructor in the flight
across seven defined performance dimensions.
Performance Ratings
The first analysis was conducted on the dependent
variable of performance rating. The following hypothesis
was investigated.
1. There will be no difference in the appropriateness
ratings of the seven behavioral assessment scales from the
NASA/UT Astronaut Assessment Instrument by the three rating
groups.
This hypothesis was rejected. The rating groups
differed on the appropriateness of four of the seven scales
contained in the NASA/UT performance assessment instrument.
In preparing to test the first hypothesis, the overall face
validity of the NASA/UT instrument was explored. The first
analysis simply explored the perceived external validity of
the NASA/UT Astronaut Assessment Instrument using an ANOVA
process with a significance test of α = .05. Since this was
a new performance assessment in the aviation training
environment, expert opinion was required concerning its
appropriateness in measuring instructor pilot performance.
Each group of raters (students, peer-instructors, and
supervisors) was asked to rate the seven performance scales
for applicability to instructor pilot duties and job
expectations. Figure 1 illustrates the mean responses of
the individual groups. All seven performance dimensions
were rated above a 5, or "agree," in scale appropriateness
for instructor pilot assessment. These ratings indicated
[Figure 1. NASA/UT Perceived Performance Assessment: mean
appropriateness ratings of the seven performance scales by
the three rating groups, based on the statement "I feel this
scale is appropriate in assessing instructor pilot
performance."]

that all three rating groups accepted all seven dimensions
of the performance appraisal instrument as valid in
assessing instructor pilot (IP) performance. The NASA/UT
astronaut assessment instrument appears to be a valid and
accepted instructor pilot performance appraisal instrument.
Three constructs of the performance instrument were
especially recognized as significant in IP appraisal. Job
knowledge, job performance, and communication were clustered
near the top of the rating scales by all three rating
groups. These constructs were rated near an average of 6.4,
"strongly agree" in appropriateness for measuring IP
performance. Interestingly, these constructs also compose
the current IP appraisal criteria. The existing technique
of IP performance appraisal consists of a written multiple
choice examination, an oral situational critical thinking
scenario, an in-flight maneuver assessment, and an
instructional phase of flight critique. The written and
oral examination was represented by the NASA/UT construct of
job knowledge. The in-flight maneuver assessment paralleled
the NASA/UT job performance construct, and the instructional
critique component was represented by the NASA/UT
communication construct. These three constructs (job
knowledge, job performance, and communication) emulate the
IP appraisal criteria currently in practice and accepted as
valid performance appraisal measures. Additionally, the
other four NASA/UT constructs
measures. Additionally, the other four NASA/UT constructs
(performance under pressure, leadership, teamwork, and
personality) were also recognized as important performance
appraisal criteria by all three rating groups. These four
dimensions are currently not measured in Air Force IP
performance evaluations.
Although the NASA/UT astronaut assessment instrument
appears to have been a valid measure of perceived instructor
pilot performance, the first hypothesis was not supported.
The three rating groups statistically differed in
appropriateness rating across four scales: personality,
performance under pressure, teamwork, and leadership. The
ANOVA tables are in Appendix D. The sample size for this
analysis was larger than the subsequent regressions and the
other ANOVA procedure that examined differences in
instructor pilot ratings among groups. The appropriateness
assessment uniquely included evaluator pilots that are not
associated with flights, but are some of the most
experienced and expert pilots in the squadron. It should
also be underscored that although the rating groups differed
on the magnitude of the importance for these constructs, all
three groups agreed the constructs were appropriate in
measuring IP performance. This finding complements the
literature review of the 360-degree performance appraisal
technique, which states each rating group has unique
insights about an individual's performance (Woodruffe, 1984;
Harris & Schaubroeck, 1988). Thus, the various rating
groups would exercise their unique needs in assessing the
appropriateness of the various criteria.
The differences in group appropriateness ratings
confirmed expectations that the various groups had different
performance needs and expectations. The student group rated
personality higher in appropriateness (M = 6.23) than the
other rating groups; M = 5.85, F(2,492) = 7.23, p < .001.
Students appear concerned with the personality of their
instructor pilot. The student spends hours alone with a
single instructor, enduring both complex instruction and
personal critiques. An instructor pilot with a more
accommodating personality would make this time more
tolerable and possibly more enjoyable. This observation was
first highlighted by Getzels and Jackson (1963), who noted
that the teacher-as-a-person (personality traits, attitudes,
sense of humor) may radically affect student ratings through
certain behaviors in the classroom. The long, personalized
instruction a student receives in the single, one-on-one
instructor process underscores the importance of
instructor/student compatibility. It would naturally follow
that students would emphasize personality as an important
performance criterion for instructors.
Peer-instructors also have a unique need and insight to
performance expectations. They rated performance under
pressure (M = 6.44) as more appropriate than the other
rating groups; supervisors M = 6.13, students M = 5.94, F
(2,492) = 11.21, p < .001. The peer-instructor understands
the critical importance of performing the right action at
the right time in a cockpit. They have witnessed how
quickly overloaded a pilot may become during a pressure
situation, and the consequences of inappropriate actions.
Often the
lives of wingmen and students depend on the immediate and
decisive action by a single instructor pilot. Since
peer-instructors fly every day, much more than the
supervisor or students, they have a greater awareness and
tangible appreciation for "good airsense" in a pressure
situation. It would follow that, to the peer-instructor,
performance under pressure is a daily reality, and thus a
more important performance criterion.
Management responsibilities often remove the supervisor
from many of the daily events and routine tasks on the
flightline. Supervisors often depend on delegation to
subordinates in achieving daily operations. It would follow
for supervisors to emphasize teamwork and leadership as
important performance criteria for subordinate instructors.
Hogan (1978) determined supervisors often value more global
attributes (teamwork, dedication, vision, and leadership) in
the workplace than do workers. Indeed, for this study, the
supervisors rated teamwork (M = 6.05) much higher in
appropriateness than the other two rating groups; M = 5.93
and 5.74, F(2,492) = 3.28, p = .038; and leadership (M =
5.95) was also rated higher by supervisors than the other
groups; M = 5.74 and 5.63, F(2,492) = 4.77, p = .015.
Two important notes concerning these results: (1) the n
value is higher for this analysis than the regression
equation because the appropriateness assessment included
evaluator pilot opinions that could not be matched to a
flight, and therefore were excluded from the regression
equation sample; (2) although there are significant
differences between the group ratings on these scales, there
may be little practical significance. All the scales
appropriateness ratings were at a minimum Likert scale value
of 5, "agree," in appropriateness for assessing instructor
pilot performance. It is difficult to qualitatively
differentiate between a "5.0" agree rating and a "5.5"
agree rating. The statistically significant differences do,
however, illustrate the different job performance needs of
the different groups.
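The group comparisons reported in this section are one-way ANOVAs of ratings grouped by rater type. The following sketch shows how the F ratio in such a comparison is computed; the short rating vectors are hypothetical illustrations, not the study's data.

```python
# One-way ANOVA sketch for comparing appropriateness ratings across
# three rating groups. The rating vectors below are hypothetical.

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample lists."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: weighted squared deviations of
    # each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations of each rating
    # from its own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)

    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

students    = [6.5, 6.0, 6.5, 6.0, 6.5, 6.0, 6.5]
peers       = [6.0, 5.5, 6.0, 5.5, 6.0, 5.5]
supervisors = [5.5, 6.0, 5.5, 6.0, 5.5]

f, df1, df2 = one_way_anova([students, peers, supervisors])
print(f"F({df1},{df2}) = {f:.2f}")
```

Comparing the resulting F ratio against the critical value for its degrees of freedom at α = .05 yields the significance decisions reported above.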
With the external validity of the performance appraisal
instrument established, the following second performance
hypothesis was tested.
2. There will be no difference in perceived
performance ratings of instructors by students,
peer-instructors, supervisors, and self.
This hypothesis was rejected; there is a difference in
perceived performance ratings for an instructor between the
various rating groups. The mean perceived performance
ratings across the various rating groups ranged from 5.05 to
6.1, the equivalent of "good" to "very good" on the
corresponding Likert scale (Table 5). The ANOVA tables of
these analyses are found in Appendix D. Students rated
instructor pilots higher on the Job Competence (Knowledge)
scale (M = 6.10; F(2,572) = 7.89, p < .001). Peers rated
instructor pilot performance higher on Teamwork (M = 5.67;
F(2,572) = 18.33, p < .001), Personality (M = 5.62;
F(2,572) = 18.72, p < .001), and Communication Skills
(M = 5.64; F(2,572) = 15.33, p < .001). Finally,
supervisors rated
instructor pilots lower on Leadership (M = 5.05;
F(2,572)=12.64, p<.001). Although there are small
statistical differences between group ratings, all groups
rated instructor pilots between "good" and "very good."
Students rated instructors higher on their job
knowledge than did the peers and supervisors. Student
inflation of this dimension was expected from the literature
review which attributes inflated ratings of an instructor's
subject knowledge to the student's naive knowledge base
(Lazovik, 1987). Compared to the student's own knowledge,
IPs appear to be very knowledgeable. However, when other
IPs are consulted who are more qualified to judge
professional job knowledge (expert opinion), IP knowledge
ratings decrease. The inflation by students of an
instructor's job or course material knowledge is a typical
occurrence in
Table 5

Perceived Performance Ratings

                                             Students     Peers        Supervisor   Self         Overall*
                                             n = 271      n = 133      n = 19       n = 152      n = 423
Performance Scale                            Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)
Job Competence (Knowledge)                   6.10 (.53)   5.88 (.63)   5.76 (.88)   6.07 (.79)   5.94 (.46)
Job Competence (Performance)                 5.97 (.63)   5.83 (.65)   5.78 (.96)   6.26 (.75)   5.88 (.48)
Job Competence (Performance Under Pressure)  5.88 (.68)   5.86 (.62)   5.47 (1.00)  6.22 (.78)   5.79 (.49)
Leadership                                   5.31 (.96)   5.30 (.82)   5.05 (1.08)  5.82 (.88)   5.25 (.71)
Teamwork                                     5.54 (.90)   5.67 (.76)   5.37 (1.11)  6.11 (.81)   5.56 (.63)
Personality                                  5.42 (1.18)  5.62 (.78)   5.27 (1.02)  6.07 (.93)   5.47 (.79)
Communication Skills                         5.52 (.94)   5.64 (.67)   5.22 (.90)   5.91 (.89)   5.51 (.58)
Grand Mean                                   5.68 (.75)   5.68 (.58)   5.41 (.68)   6.07 (.55)   5.63 (.49)

* Overall Mean = (0.4) Student Ratings + (0.4) Peer Ratings + (0.2) Supervisor Ratings. Sample for Overall Rating does not include Self Ratings.
student classroom critiques and instructor performance
appraisal (Marsh, 1987).
Peer-instructors rated IPs higher on teamwork,
personality, and communication skills than did students and
supervisors. Much of this difference is likely due to
familiarity. The peer-instructors form a cohort group whose
members know and understand one another better than would the
supervisor or students. Due to the close daily interaction
with one another, the peers have more opportunity to observe
each other's behavior. The high ratings in the performance
dimensions of teamwork and communication are indicative of
cohesive, elite groups (Hogan, 1978).
Across all seven performance dimensions supervisors
rated IP performance lower than did students and peers.
Only on the leadership scale, however, were supervisor
ratings of IPs significantly lower than the other two
groups. The comparatively conservative ratings by
supervisors are most likely due to their more developed
experience in performance appraisal. Supervisors have been
formally trained in subordinate performance evaluations.
Appraisal inflation control and specific criterion
observations are part of a supervisor's formal training and
practice. This formal training and their previous
experience would account for the lower ratings given by
supervisors.
A fourth group rating (Self) was included in Table 5 as a
baseline comparison to later measure the need for feedback.
The self-rating group is simply the instructor
pilots rating their own performance. Across all seven
behavioral assessment scales, self-ratings were higher than
the three actual rating groups. The ANOVA tables for these
analyses are found in Appendix D.
Performance: F (2,572) = 26.44, p < .001
Knowledge: F (2,572) = 7.89, p < .001
Pressure: F (2,572) = 12.64, p < .001
Leadership: F (2,572) = 23.32, p < .001
Teamwork: F (2,572) = 18.33, p < .001
Personality: F (2,572) = 18.72, p < .001
Communication: F (2,572) = 15.33, p < .001
The self-rating inflations are illustrative of the need for
performance feedback. Individuals typically overrate their
performance compared to perceptions from others (Hazucha,
1991; Harris & Schaubroeck, 1988). Instructor pilots
significantly over-rated their own performance on all
dimensions.
Besides testing the first two hypotheses, the
performance analysis developed an overall performance rating
that provided the criterion variable in subsequent
regression models. Perceived performance for each
instructor was measured across seven dimensions that were
then collapsed into a single performance measure called the
Grand Mean. The Grand Mean was further collapsed to reflect
measures from the various rating groups. This value is
referred to as the Overall Grand Mean and is derived through
the following equation: Overall performance mean = 0.4
(student rating) + 0.4 (peer rating) + 0.2 (supervisor
rating), as defined by ATC (ATCR 51-10, Attachments 1 & 2).
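As a concrete illustration of this two-stage collapse, the sketch below computes a Grand Mean from seven dimension ratings and then applies the ATC weights. The dimension rating values themselves are hypothetical, chosen only to resemble the ranges in Table 5.

```python
# Sketch of the two-stage collapse used for the dependent variable.
# Dimension ratings below are hypothetical, not the study's data.

def grand_mean(dimension_ratings):
    """Collapse the seven behavioral dimension ratings to a Grand Mean."""
    return sum(dimension_ratings) / len(dimension_ratings)

def overall_grand_mean(student_gm, peer_gm, supervisor_gm):
    """ATC weighting (ATCR 51-10): 0.4 student + 0.4 peer + 0.2 supervisor."""
    return 0.4 * student_gm + 0.4 * peer_gm + 0.2 * supervisor_gm

# Hypothetical Grand Means for one instructor pilot from each group.
student_gm    = grand_mean([6.1, 6.0, 5.9, 5.3, 5.5, 5.4, 5.5])
peer_gm       = grand_mean([5.9, 5.8, 5.9, 5.3, 5.7, 5.6, 5.6])
supervisor_gm = grand_mean([5.8, 5.8, 5.5, 5.1, 5.4, 5.3, 5.2])

print(round(overall_grand_mean(student_gm, peer_gm, supervisor_gm), 2))
```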
The final assessment of perceived performance involved
comparing the overall perceived performance rating (Overall
Grand Mean) to the various rating groups. Table 6
illustrates the relationship between the collapsed overall
performance rating and the various rating groups. As
suspected, rating groups with the largest sample size and
with the higher weighting in the overall equation have the
highest relationship with the overall performance rating.
The single exception is the rating from the self group.
Self-assessments, which were not included in the overall
performance computation, are only marginally related
(r = .29) to the overall rating. This weak relationship
between self and perceived performance ratings underscores
the need for multiple perspectives in job performance
feedback.
In summary, the performance appraisal analyses
established valid performance assessment criteria and a
valid instrument. The various rating groups differed on the
magnitude of importance of the various criteria, but agreed
overall that the entire instrument was appropriate in
measuring instructor pilot performance. The performance
ratings of instructors
Table 6

Group Ratings Correlation Comparisons

Rating Group          Correlation (r) with Overall Rating*
Student Ratings       .81
Peer Ratings          .77
Supervisor Ratings    .55
Self Ratings          .29

* Overall Rating = (0.4) Student Ratings + (0.4) Peer
Ratings + (0.2) Supervisor Ratings. p < .01.
were also different across the rating groups, indicating
the unique insights into an instructor's performance that
the various groups possess. Finally, a weak relationship
was determined between self-ratings and perceived
performance ratings. This finding illustrates the critical
need for performance feedback and the valuable insights
multiple perspectives may offer.
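The correlation values in Table 6 (and later in Table 8) are Pearson product-moment coefficients. A minimal sketch of the computation, using hypothetical score vectors rather than the study's data, is:

```python
# Pearson product-moment correlation sketch. The trait and rating
# vectors below are hypothetical, for illustration only.

import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical: Impatience/Irritability scores vs. overall ratings.
impatience = [1.2, 2.5, 1.8, 3.0, 1.5, 2.2]
overall    = [5.9, 5.3, 5.7, 5.1, 5.8, 5.5]
print(round(pearson_r(impatience, overall), 2))
```

With these illustrative vectors the coefficient is strongly negative, mirroring the inverse relationships reported for the negative trait scales.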
Personality Trait Measures
After analyzing the dependent variable of perceived
performance, the first set of independent variables,
concerning personality traits, was assessed and used to
construct the first prediction equation. This analysis
investigated the next set of hypotheses.
3. There will be a significant relationship between
perceived effectiveness ratings of instructor pilots at UPT
and the following personality trait scale scores:
instrumentality, expressivity, mastery, work,
competitiveness, achievement striving.
4. There will be a significant relationship between
the following personality trait scale scores and perceived
effectiveness ratings of instructor pilots at UPT: negative
instrumentality, verbal aggression, impatience/
irritability, negative communion.
5. Personality traits can be used to predict the
perceived effectiveness of instructor pilot performance.
The first assessment of the instructor pilot's self-
reported personality trait scores involved a comparison of
trait mean scores with the instrument's developmental
baseline of airline pilots, as recommended by Gregorich,
Helmreich, and Wilhelm (1989). This was accomplished to
assess for valid self-reporting. UPT instructor pilot (IP)
trait scores closely paralleled those from the airline pilot
database. Table 7 illustrates the comparison of
personality trait score means between the two samples.
On a 5-point scale with 5 representing the most
desirable score, IPs differed only slightly from airline
pilots. Comparison of the positive personality trait
attributes revealed that IPs reported small increases in
trait scores (range = .20 to .48) in the more desirable
direction. One exception was a decrease of .10 in IP
Expressivity scores. Negative personality trait attributes also
contained marginal differences with the exception of one
scale, Impatience/Irritability (I/I). IPs reported an I/I
score almost an entire point (a difference of .96) lower
than the airline sample, indicating a less desirable score.
This may
be reflective of a "non-volunteer" attitude for their
present assignment. As discussed in Chapter I, many
instructor pilots are resentful of their current flying
position as instructors. Overall, PCI scores for the
instructor pilot sample closely paralleled the airline
sample database. Significant differences could not be
Table 7

Personality Trait Comparison

                                         Combined Samples   Airline Sample
                                         n = 152            n = 121
Personality Trait                        Mean (SD)          Mean
Positive Attributes
  Achievement Striving                   2.91 (.57)         not available
  Competitiveness                        2.94 (.67)         2.46
  Work                                   3.64 (.41)         3.41
  Mastery                                2.75 (.48)         2.54
  Expressivity                           2.72 (.48)         2.82
  Instrumentality                        3.29 (.40)         2.98
  Bipolar Instrumentality/Expressivity   2.34 (.38)         not available
Negative Attributes
  Verbal Aggression                      1.25 (.71)         1.32
  Negative Instrumentality               1.61 (.56)         1.49
  Negative Communion                     1.37 (.49)         1.44
  Impatience/Irritability                1.84 (.78)         2.80

All scores range from 1 to 5 with 5 representing the more desirable or "very good" value.
assessed because standard deviations for the airline sample
would not be released by the owning airline.
The second step in assessing IP personality trait
scores investigated the correlation between perceived
performance ratings and personality traits. Table 8
illustrates the Pearson product-moment correlation
coefficients between personality trait scores and the
various groups' perceived
performance ratings. The first personality hypothesis was
tested in the following form.
3. There will be a significant relationship between
perceived effectiveness ratings of instructor pilots at UPT
and the following personality trait scale scores:
instrumentality, expressivity, mastery, work,
competitiveness, achievement striving.
This hypothesis was rejected. None of the positive
personality trait attributes had a significant correlation
with overall perceived performance ratings tested at
α = .05. The positive personality traits resulted in
very small relationships (r = -.01 to -.15) with overall
perceived performance. Additionally, the direction of the
relationship was sporadic. The self-ratings did, however,
have several significant correlations with the positive
personality traits. Self-rating had direct relationships
with four of the seven positive personality scales:
Achievement Striving (r = .17), Expressivity (r = .25),
Instrumentality (r = .28), and Bipolar
Instrumentality/Expressivity (r = .23).

Table 8

Correlation Values of Group Performance Ratings and Personality Traits

                                        Student    Peer       Supervisor  Self       Overall
                                        Ratings    Ratings    Ratings     Ratings    Rating
                                        n = 271    n = 133    n = 19      n = 152    n = 423
Positive Attributes
  Achievement Striving                  -.11        .02        .07         .17*      -.04
  Competitiveness                       -.10       -.12       -.02        -.04       -.12
  Work                                  -.01        .10       -.04         .02        .03
  Mastery                               -.06       -.03       -.04         .08       -.06
  Expressivity                           .10        .22*      -.04         .25*       .15
  Instrumentality                        .07        .03        .19*        .28*       .11
  Bipolar Instrumentality/Expressivity   .01       -.11        .14         .23*      -.01
Negative Attributes
  Verbal Aggression                     -.18*      -.12       -.08        -.16*      -.19*
  Negative Instrumentality              -.16*      -.14       -.06        -.05       -.18*
  Negative Communion                    -.16*      -.09       -.16*       -.21*      -.18*
  Impatience/Irritability               -.18*      -.16*      -.04        -.07       -.19*

* significant at the .05 level.

The self-ratings likely resulted
in stronger correlations with the positive personality
traits because the personality assessment instrument was
also self-reported. Mabe and West (1982) conducted a
meta-analysis
on 20 years of personality research and identified that
matched self-reporting instruments (personality and
performance) result in higher correlations. By collapsing
the ratings from three groups into a single overall
performance variable in this study, relationships with the
self-reported personality variable may have been diluted.
To strengthen correlations between performance appraisals
and personality measures, both instruments should possess
similar reporting procedures: both self-reported, or both
peer-reported.
The next personality hypothesis tested the relationship
of overall performance with the negative personality
traits. It read as follows.
4. There will be a significant relationship between
the following personality trait scale scores and perceived
effectiveness ratings of instructor pilots at UPT: negative
instrumentality, verbal aggression, impatience/
irritability, negative communion.
This hypothesis was supported. All negative
personality trait attributes were significantly related
(α = .05) to the overall perceived performance ratings.
The four negative personality traits were inversely related
(r = -.18 to -.19) to overall perceived performance. The
overall performance correlations again reflect the dominant
size of the student sample (n = 271). Perry (1979), while
assessing teacher personality correlates with student
critiques, determined students are likely to remember and
evaluate negative personality attributes of teachers more
than positive attributes. Such is likely the case for this
study. Students easily identify negative attributes of
instructor pilots and exaggerate the associated negative
performance rating. This would account for the significant
correlations with negative personality attributes and the
absence of significance with positive attributes.
In summary, positive personality traits had no
significant relationships with overall ratings. Self-
ratings, however, did have a moderate positive relationship
with the self-reported positive personality trait scores.
In contrast, all of the negative personality traits were
moderately related to overall performance ratings. Student
rating groups appeared to be especially instrumental in
driving the significant negative personality attribute
correlations. A personality trait inter-correlation table
can be found in Appendix D.
The final personality assessment established a stepwise
regression model. The following hypothesis was tested.
5. Personality traits can be used to predict the
perceived effectiveness of instructor pilot performance.
This hypothesis was marginally supported with a
regression equation. Results of regressing the overall
perceived performance rating on personality traits are
reflected in Table 9. The overall perceived performance
prediction model included two significant predictor
variables: Negative Communion (β = -.16) and
Impatience/Irritability (β = -.17). Both variables are
negative personality attributes, indicating an inverse
relationship of weak magnitude. Personality traits
accounted for five percent of the variance in the perceived
performance prediction equation.
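The stepwise procedure adds, at each step, the candidate predictor with the largest F-to-enter and stops when no remaining predictor exceeds the entry threshold. The sketch below is a simplified forward-selection illustration: the generated data, the four-predictor setup, and the reuse of the F threshold of 3.05 cited in Table 9 are assumptions for illustration, not the study's actual procedure or data.

```python
# Simplified forward stepwise regression sketch (numpy only).
# Predictor data are randomly generated for illustration; the
# F-to-enter threshold mirrors the value cited in Table 9 but this
# is not the study's actual analysis.

import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit with intercept."""
    Xi = np.column_stack([np.ones(len(y)), X]) if X.size else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return float(resid @ resid)

def forward_stepwise(X, y, f_enter=3.05):
    """Greedily add the predictor with the largest F-to-enter."""
    n, k = X.shape
    selected = []
    while True:
        best = None
        rss_current = rss(X[:, selected], y)
        for j in range(k):
            if j in selected:
                continue
            trial = selected + [j]
            rss_trial = rss(X[:, trial], y)
            df_resid = n - len(trial) - 1
            # F-to-enter: drop in RSS (1 df) over the residual mean square.
            f = (rss_current - rss_trial) / (rss_trial / df_resid)
            if best is None or f > best[1]:
                best = (j, f)
        if best is None or best[1] < f_enter:
            return selected
        selected.append(best[0])

rng = np.random.default_rng(0)
X = rng.normal(size=(152, 4))            # four hypothetical trait scores
y = 5.6 - 0.17 * X[:, 0] - 0.16 * X[:, 1] + rng.normal(0.0, 0.45, 152)
print(forward_stepwise(X, y))
```

With weak effect sizes like those in Table 9, such a model enters only a predictor or two and leaves most of the variance unexplained, consistent with the adjusted R² of .05 reported here.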
As expected from the previous correlation table, only
the negative personality trait measures entered as
significant variables in the stepwise regression equation.
The magnitude values (β = -.17) are typical of personality
assessment in aviation selection, where they normally range
from 0.15 to 0.25 (Damos & Gibb, 1986; Greuter & Herman, 1992).
Four reasons were cited in the literature review for
previous low personality correlations in aviation selection
studies: (1) range restrictions of partially screened
populations, (2) artificial success rates imposed by
military manpower needs, (3) a dichotomized pass/fail
criterion variable, and (4) inappropriate performance test
development (Damos & Gibb, 1986). Of these reasons, the
most likely contributors to the low correlations in this
Table 9

Personality Predictors of Overall Performance

Step  Variable                   Zero r   Step Beta   Final Beta   Step F Ratio*
1     Impatience/Irritability    -.19     -.19        -.17         5.7
2     Negative Communion         -.16     -.16        -.16         5.1

* F-Ratio > 3.05 significant at .05 level. Mean = 5.63, SD = .49; Multiple R = .25, Adjusted R² = .05; N = 152.
study are (1) range restrictions of partially screened
populations, and (4) inappropriate performance test
development. The IP force is currently composed of a
largely homogeneous sample. Referring to Tables 3 and 4,
the IP cadre is 97.5% male, 85% between the ages of 24 and
30, and 94% with less than 1000 hours flying time as an IP
(cLbout 2.5 years' experience). These demographic figures
illustrate a relatively homogeneous group which may have
contributed to the low correlations.
The other likely contributor to the low correlations
may be the performance test development. The performance
test itself, the NASA/UT astronaut assessment survey, was
not so much the likely contributor as was the 360-degree
testing technique. The NASA survey was determined to
possess strong external validity, as determined by the expert
opinions of supervisors, instructors, and students in the
undergraduate pilot training environment. The 360-degree
assessment technique, however, may have diluted potential
correlations. The technique provided a single global
performance measure representing perceptions from multiple
groups, and this global aspect may have compromised the
match between the self-reported performance ratings and the
self-reported personality assessment. Meta-analytic research
conducted by Mabe and West (1982) indicated that matched
self-reporting instruments (personality and performance)
result in higher correlations. Collapsing the ratings from
three groups into a single overall performance variable may
have diluted the self-reporting correlations.
Demographic Measures
The final analysis investigated relationships between
various demographic variables and perceived performance.
The following research hypothesis was investigated.
6. Demographic characteristics can be used to create
a predictive profile of instructor pilot performance.
This hypothesis was marginally supported. Demographic
questions were designed to solicit instructor pilot personal
profile information concerning areas of family structure,
professional development/experience, and career intentions.
Table 10 reflects the correlations between the various
rating groups' demographic variables and overall perceived
performance ratings. Several variables had a significant
positive relationship with overall perceived performance:
age (r = .23), number of children (r = .26), time in service
(r = .21), rank (r = .28), and total flying time (r = .21).
Once again, the various rating groups differed in opinion on
which demographic variables are most indicative of
performance. Supervisor ratings emphasized variables
reflecting family structure and maturity, such as age,
number of children, time in service, rank, and total flying
time. In contrast, students and self-ratings emphasized
flying and professional experience as being the most related
Table 10

Correlation Values of Group Performance Rating and Demographics

                         Student    Peer      Supervisor   Self      Overall
                         Ratings    Ratings   Ratings      Ratings   Rating
                         n = 271    n = 133   n = 19       n = 152   n = 423
Pearson Correlations
  Age                      .16*       .10       .28*         .10       .23*
  Number of Children       .16        .21*      .25*         .10       .26*
  Time in Service          .23*       .01       .24*         .17*      .21*
  Rank                     .23*       .10       .33*         .29*      .28*
  Total Flying Time        .15        .07       .33*         .24*      .21*
  IP Flying Time           .08        .07       .13          .29*      .12
  Civil Flying Time        .05       -.03      -.06          .01       .01
  Time as an IP            .05        .09       .11          .29*      .10
Spearman Correlations
  Marital Status          -.14       -.14      -.11         -.05      -.18*
  Housing Status           .13        .08      -.01         -.03      .11
  SOS **                   .12        .08       .23*         .05       .17*
  Previous Aircraft       -.14       -.11      -.31*        -.08      -.23*
  Promotability           -.04        .12      -.04         -.08      -.01
  Career Intentions        .13        .18*      .20*         .01       .21*

* significant at the .05 level.  ** Squadron Officers School (Professional Schooling)
to performance, such as Instructor Pilot (IP) flying time,
total flying time, rank, and time in service. There appears
to be a difference in emphases between supervisor performance
criteria and instructor performance criteria. Perhaps the
real reasons are artificially driven by the military
hierarchy of command. The higher ranking officers are
generally older, with established families and more diverse
experiences. These higher ranking officers are given more
opportunity to command and thus impact the greater whole of
the instructor mission. This may be what the supervisors
are recognizing as performance: the contribution toward an
Air Force mission of training hundreds of pilots. The
instructor pilot, on the other hand, has an "in the trenches"
outlook, with greater concern for individual students
and the daily administration of flight training. To the
instructor, performance is the "nuts and bolts" of producing
one pilot, further limited to the day's training objectives.
The difference between the groups may simply reflect their
daily mission tasking and the big-picture outlook against
which performance is measured. A demographic
inter-correlation table may be found in Appendix D.
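The two correlation types in Table 10 differ in what they assume about the data: Pearson for interval-level demographics (age, flying hours) and Spearman rank correlation for ordinal ones (marital status, career intentions). The following is a minimal sketch of both coefficients; the small data vectors are hypothetical, not drawn from the study.

```python
# Minimal sketch of Pearson and Spearman correlations; the example
# vectors below are hypothetical, not the study's data.
import math

def pearson(x, y):
    # Pearson r: covariance scaled by the two standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    # 1-based ranks, with ties assigned their average rank.
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman rho: Pearson correlation computed on the ranks.
    return pearson(ranks(x), ranks(y))

# Hypothetical interval-level variable (age) vs. overall rating:
age = [24, 26, 27, 29, 30, 33]
rating = [5.1, 5.4, 5.3, 5.8, 5.7, 6.0]
# Hypothetical ordinal variable (career-intention category) vs. rating:
intent = [1, 1, 2, 2, 3, 3]
r_age = pearson(age, rating)
rho_intent = spearman(intent, rating)
print(f"Pearson r (age):       {r_age:+.2f}")
print(f"Spearman rho (intent): {rho_intent:+.2f}")
```

Spearman's use of ranks is what makes it appropriate for the ordinal demographic codes, where only order, not spacing, is meaningful.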
A second stepwise regression was applied to
determine a regression equation using demographic variables
to predict overall perceived performance ratings. The
regression equation included two significant predictor
variables: Number of Children (β = .22) and Rank (β = .24).
Table 11 illustrates specific results of the model. Both
predictor variables had a marginal magnitude and indicated a
direct relationship with perceived performance. Rank had a
significant correlation with performance for both the
student and supervisor rating groups. Its inclusion in the
regression equation is likely due to the dominant sample
size of the student rating group (representing 64% of the
entire sample) and the high correlation (r = .33) from the
supervisor rating group. Number of Children resulted in
moderate correlations (r = .21 to .25) for both peer and
supervisor rating groups. Beyond the statistical mechanics
of why these two variables entered the regression equation
lies the speculation of a wider, more comprehensive factor
representing both demographic variables of rank and number
of children. This factor may be labeled social maturity.
Both rank and children imply more responsibility and
accountability. Friedlander (1963) determined that older
workers with families are generally perceived as more stable
and productive performers by co-workers. He identified
specific perceived traits of these older workers as being
responsible, considerate, sincere, forgiving, and
accountable. Friedlander labeled these traits social
maturity. The senior ranking IPs in this study appear to be
perceived as more socially mature and are recognized with
higher perceived performance ratings.
Table 11

Demographic Predictors of Overall Performance

Step  Variable             Zero r   Step Beta   Final Beta   Step F Ratio *
 1    Rank                   .28       .28          .24          12.5
 2    Number of Children     .22       .22          .22          10.5

* F-Ratio > 3.05 significant at .05 level. Mean = 5.63, S.D. = .49; Multiple R = .35, Adjusted R² = .11; N = 152
A third and final regression equation was developed
combining personality traits and demographics to test the
final hypothesis.
7. Personality traits and demographic characteristics
can be used to create a predictive profile of perceived
instructor pilot performance.
This hypothesis was marginally supported. Table 12
illustrates the results. When both sets of independent
variables were combined to predict overall perceived
performance, only three variables proved significant: Number
of Children (β = .13), Rank (β = .22), and Verbal Aggression
(β = -.13). This regression model accounted for 14 percent
of the variance in overall perceived performance.
The original two predictor variables from the previous
personality regression equation, Impatience/Irritability and
Negative Communion, dropped out of the new equation. These
traits were replaced in the new equation by Verbal
Aggression. Verbal Aggression correlated significantly with
both Impatience/Irritability (r = .48) and Negative Communion
(r = .19). The overlap between the variables indicates the
instability of the construct and the ambiguity of the factor
distinctions.
Table 12

Combined Demographics and Personality Traits Predictors of Overall Performance

Step  Variable             Zero r   Step Beta   Final Beta   Step F Ratio *
 1    Rank                   .28       .24          .23          12.5
 2    Number of Children     .22       .22          .22          10.5
 3    Verbal Aggression     -.19      -.19         -.19           9.4

* F-Ratio > 3.05 significant at .05 level. Mean = 5.63, S.D. = .49; Multiple R = .40, Adjusted R² = .14; N = 152
Summary
The first analysis of the data assessed the perceived
performance measurement. Three major findings were
determined. First, external validity of the instrument was
established. All three rating groups agreed that the
performance measurement instrument was valid in measuring
instructor pilot performance. Second, an overall
performance rating was established. Seven behavioral
performance dimensions across three rating groups were
collapsed to derive an individual overall performance score.
The ratings from the various groups clustered very near the
derived overall rating score with various rating groups
emphasizing different behavioral performance dimensions.
The second analysis of data assessed personality trait
scores. The first assessment investigated the validity of
self-reported personality scores. Results indicated the
self-reported scores from IPs closely emulate the existing
database of airline pilots. The second assessment of
personality traits provided correlations with overall
perceived performance ratings. No positive personality
traits were significantly related to overall performance;
however, self-ratings of performance were significantly
related to the positive personality measures. On the other
hand, all negative personality traits were related to
overall perceived performance, reflecting small inverse
relationships.
The final analysis of data provided three regression
equations predicting overall perceived performance. The
first equation entered only personality traits; two
significant variables emerged: Negative Communion and
Impatience/Irritability. The second regression entered
demographic variables only and resulted in two significant
variables: Rank and Number of Children. The final
regression equation combined personality traits and
demographic variables in predicting perceived performance.
Three significant variables resulted: Number of Children,
Rank, and Verbal Aggression. Conclusions and
recommendations are presented in Chapter V.
CHAPTER V
SUMMARY, CONCLUSIONS, DISCUSSION
AND RECOMMENDATIONS
There were three purposes to this study: to measure the
perceived validity of a new performance assessment
instrument, the NASA/UT Astronaut Survey, applied to
military flight instructors; to develop a global measurement
of instructor pilot performance; to construct regression
equations that predict overall (officership, flying, and
instructional) perceived instructor pilot performance using
personality traits, demographic characteristics, and a
combination of personality traits and demographic
characteristics. The following research hypotheses were
investigated:
1. There will be no difference in the appropriateness
ratings of the seven behavioral assessment scales from the
NASA/UT Astronaut Assessment Instrument by the three rating
groups (students, peer-instructors, and supervisors).
2. There will be no difference in perceived
performance ratings of instructors by students, peer
instructors, supervisors, and self.
3. There will be a significant relationship between
perceived effectiveness ratings of instructor pilots at UPT
and the following personality trait scale scores:
instrumentality, expressivity, mastery, work,
competitiveness, and achievement striving.
4. There will be a significant relationship between
the following personality trait scale scores and perceived
effectiveness ratings of instructor pilots at UPT: negative
instrumentality, verbal aggression, impatience/irritability,
and negative communion.
5. Personality traits can be used to create a
predictive profile of instructor pilot performance.
6. Demographic characteristics can be used to create a
predictive profile of instructor pilot performance.
7. Personality traits and demographic characteristics
can be used to create a predictive profile of perceived
instructor pilot performance.
Summary of the Study
In order to test these hypotheses, the investigator
collected data from two UPT bases. The cluster sampling
provided a comprehensive and stratified representation of
the population. Entire classrooms (flights) were sampled at
a time (n=22). Each flight was composed of roughly 10
instructors, 15 students, and one supervisor. Instructors
(n=152) completed a demographic survey followed by a self-
reporting personality inventory. After completion of the
first two instruments, the instructors and students (n=423)
were given a perceived performance rating instrument to
assess performance of all instructor pilots in the flight.
Until this time all subjects were blind to the performance
assessment objective. Instructors and students were then
instructed to use a Likert scale to rate each instructor in
the flight across the following seven defined performance
dimensions: job competence-knowledge, job competence-
performance, job competence-performance under pressure,
leadership, teamwork, personality, and communication skills.
The perceived performance appraisal instrument was
modified from the astronaut performance assessment survey
developed by a NASA/University of Texas project. A self-
reporting personality survey, the Personality
Characteristics Inventory (PCI), was also borrowed from the
University of Texas. The PCI is designed to assess two
broad personality trait dimensions: Instrumentality, or goal
orientation, and Expressivity, or interpersonal capacities.
A demographic survey was also used to collect data on the
backgrounds of the instructor pilots. It was compiled from
previous aviation and higher education instruments and was
designed to collect information on: professional
development, education, and family structure.
The first analysis of data established the validity of
the performance assessment instrument. The three rating
groups, plus evaluator pilots, assessed the appropriateness
of the performance measures defined on the NASA/UT Astronaut
Survey.
The second analysis of data established the dependent
variable of an overall perceived performance rating. Three
groups (students, peer-instructors, and supervisors) rated
each instructor across seven performance dimensions. The
performance dimensions were then combined to provide a Grand
Mean performance rating. The Grand Mean for each rating
group was then combined through a weighted equation, [(0.4)
Student Grand Mean + (0.4) Peer-instructor Grand Mean +
(0.2) Supervisor Grand Mean], to derive a single overall
perceived performance rating, the Overall Grand Mean. The
dependent variable, therefore, represents the perceived
performance rating from three rating groups rating an
individual instructor across seven performance dimensions.
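The weighted combination described above reduces to simple arithmetic; the following sketch uses hypothetical Grand Mean values for one instructor, with the weights taken from the equation in the text.

```python
# Sketch of the Overall Grand Mean computation:
# (0.4)(Student) + (0.4)(Peer-instructor) + (0.2)(Supervisor).
# The Grand Mean values for this example instructor are hypothetical.
WEIGHTS = {"student": 0.4, "peer": 0.4, "supervisor": 0.2}

def overall_grand_mean(grand_means):
    # Weighted sum of the three rating groups' Grand Means.
    return sum(WEIGHTS[g] * m for g, m in grand_means.items())

# Each Grand Mean is itself the mean of that group's ratings across
# the seven performance dimensions.
ip = {"student": 5.8, "peer": 5.5, "supervisor": 5.2}
overall = overall_grand_mean(ip)
print(round(overall, 2))  # -> 5.56
```

Note how the supervisor's Grand Mean carries half the weight of each of the other two groups, reflecting the weighting described in the text.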
The final analysis of data established stepwise
regression equations predicting an instructor's overall
perceived performance. Three equations were developed using
the various independent variables of personality traits,
demographics, and a combination of both. A summary of the
data analysis is reported in the next section. Impressions
and implications are also noted from the researcher's
observations documented during the study.
Conclusions
The following conclusions are based upon the results of
this study. All groups reported "agree" to "strongly agree"
that the measurement constructs on the NASA/UT Survey were
appropriate in measuring Air Education Training Command
(AETC) Instructor Pilot performance. Additionally, subjects
repeatedly were enthusiastic in their support of the new
measure and its format. The findings additionally indicated
that the multiple group rating technique of perceived
performance was well received by the ratees and provided
valuable insights to perceived performance normally not
observed using other methods. The three regression
equations provided weak to marginal prediction of perceived
performance. Demographic variables were better predictors
of instructor pilot perceived performance than personality
trait variables. Two demographic variables (Number of
Children, and Military Rank) accounted for eleven percent of
the predictive model's variance. Only five percent of the
predictive model's variance was accounted for by two
significant personality traits: Negative Communion and
Impatience/Irritability. Combining demographic and
personality trait variables resulted in three significant
variables (Number of Children, Military Rank, and Verbal
Aggression), accounting for 14 percent of the model's
variance.
Discussion and Implications
Perceived Performance
Results from the first analysis of data indicated that
the three rating groups agreed the NASA/UT Astronaut
Assessment instrument was an appropriate performance measure
for instructor pilots. Moreover, two major findings of the
study were: (1) various rating groups possess different
emphases of perceived performance criteria, and (2) all
types of rating groups have biases that affect performance
ratings.
The first finding impacts performance criteria policy
and theory. In this study, the three rating groups agreed
that all seven performance appraisal dimensions from the
NASA/UT instrument were appropriate in IP assessment. Three
dimensions (job knowledge, job performance, and
communication) received the highest appropriateness rating
by all rating groups. These dimensions closely parallel
formal IP performance evaluation criteria currently
practiced by the Air Force. The contribution of this study
is to identify four other performance appraisal dimensions
recognized as appropriate in IP appraisal, but not currently
measured by the Air Force. These dimensions include: job
performance under pressure, leadership, teamwork, and
personality.
Although these four dimensions were unanimously agreed
upon by the three rating groups as appropriate measures, the
emphases (magnitude of appropriateness) varied on the
different scales for the various rating groups. Students
highlighted "personality" as an important IP measure, peer-
instructors stressed "performance-under-pressure," and
supervisors identified "leadership" and "teamwork" as the
more important IP evaluation criteria. These findings
suggest that the three groups have different opinions of
what criteria constitute quality performance. This
complements Boreman's (1974) conclusions that peers,
superiors, and subordinates hold unique pieces of the puzzle
which portrays an individual's job performance. The first
theoretical implication of this study supports Boreman's
observation that different groups possess different values
of performance criteria and different insights. It also
supports the 360-degree performance feedback technique. By
including multiple group perspectives and their associated
criteria emphases, a more comprehensive and representative
performance appraisal is achieved.
The policy implications of this finding suggest that
current IP performance appraisal criteria may be incomplete.
These four dimensions are not at the current time directly
observed, measured, nor documented in formal instructor
pilot job performance appraisal. Only supervisors'
perspectives of job performance, knowledge, and
communication are officially assessed. Not only may the
current criteria be deficient, but supervisors may also be
biased in their observations. This leads to the second
major finding from this research.
The second finding from the performance appraisal
analyses complements the literature review that each type of
rater possesses a specific rating bias. Four trends were
established. First, self-ratings were always higher than
the three formal rating groups across all seven assessment
scales. This finding emulates classical performance
appraisal theory that self-ratings tend to be more favorable
than ratings from other groups (Kirchner, 1965; Parker,
Taylor, Barrett, & Martens, 1959; Steel & Ovalle, 1984).
The inflated self-perceptions of performance highlight the
need for job performance feedback. Second, supervisors
always rated instructors lower on all scales than did peer-
instructors and students. This again complements classical
performance theory that supervisors' ratings tend to be less
favorable than other rating groups (Rothaus, Morton, &
Hanson, 1965; Springer, 1953; Zedeck, Imparato, Krausz, &
Oleno, 1974). Third, students rated instructors higher on
the job competence scales of knowledge, performance, and
performance under pressure. This is most likely due to the
bias caused by the students' naive subject knowledge, as
suggested in previous studies by Rosenshine (1970). Fourth,
peers rated instructors higher on the more subjective scales
of leadership, teamwork, and communication skills. This may
be due to the more direct, daily observation made by peers,
or this finding may fall prey to Muchinsky's (1990) theory
that peer ratings are often biased by friendships and
popularity.
These trends imply that all types of rating groups have
some form of bias. The theoretical implication of this
finding supports the 360-degree performance appraisal
technique. To best control, or at a minimum identify,
biases, instructors should be provided ratings from each
group. This would allow the instructor and supervisor more
options in choosing the appropriate rating group for a given
performance criterion and would generate more potential areas
for discussion during performance debriefing sessions. The
theoretical strength of the 360-degree appraisal technique
is its representation of various work-level insights, and
its ability to identify bias and rating discrepancies
between rating groups.
The final policy implication from the performance
analyses, and bias identification, concerns the utility of
performance appraisal. Performance appraisal literature
cautions that certain criteria and rating groups should be
used for developmental feedback appraisal only, and not
official documented evaluations. Specific examples in this
study were the personality criterion and the peer-instructor
rating group. Peer review is generally perceived as a
popularity contest when applied to formal job performance
evaluation (Gephart, 1979). This is especially true in
higher education settings where tenure and appointments are
awarded (Batista, 1976). Peer review is widely accepted and
encouraged for informal faculty development and mentoring
programs. However, if official documentation is involved,
researchers contend the bias of friendships and popularity
compromise the developmental and accuracy potential of peer
insights (Gephart, 1979; Kane & Lawler, 1979; Millman,
1987). If peer review is to be used for instructor pilot
performance evaluations, it is important to keep it an
informal, undocumented process where the concentration is
developmental and not formal evaluation.
The second concern in the utility of performance
appraisal is the use of subjective criteria such as
personality. Until more reliable and stronger measurements
are validated, suspect performance criteria such as
personality should be used as a feedback tool only and not
as part of formal evaluation (Weiss & Adler, 1984). Current
personality measures are still marginal predictors, with
typical correlation values of r = .20. Such low correlations
are neither valid nor reliable enough to support formal
performance appraisal application. This study certainly
illustrates the practical application concern over
personality measures. Although the personality findings
indicated significant correlations for four personality
traits, there is little practical significance. Weak
correlations such as those reported in this study (r = .19)
may be statistically significant, but are hardly of
practical significance. The correlations are far too weak
to support any tangible application. The trends, however,
may facilitate discussion and informal performance debrief
counseling. Weiss and Adler (1984) advocate that although
personality correlations remain low, personality assessment
may still provide excellent performance appraisal debrief
topics in a strictly informal setting.
In summary, the performance appraisal analyses achieved
the first two purposes of this study. The NASA/UT astronaut
assessment survey was determined to be a valid IP performance
appraisal tool, and a global performance rating was achieved
which represented seven performance appraisal dimensions and
three rating groups. Additionally, the 360-degree feedback
technique was supported by highlighting the different
performance criteria emphases of the various rating groups,
and by identifying the innate biases associated with the
different rating groups. This additional finding underscores
the shortcoming of the present IP performance evaluation
technique, which concentrates on supervisor-only perspectives
using very limited criteria.
Personality
The second analysis of data revealed modest correlation
values between personality traits and perceived performance.
The study separated personality traits into positive and
negative attributes for the investigation. None of the
positive personality attributes were significantly related
to overall performance. They had a wide range of variability
(r = -.12 to .15) and varied in relationship direction. The
most significant relationships for positive personality
attributes occurred when measured solely against self-ratings
of performance. Four personality traits (Achievement
Striving, Expressivity, Instrumentality, and Bipolar
Instrumentality/Expressivity) had significant positive
relationships (r = .17 to .28) with the self-rating group.
Higher correlations between personality traits and
self-ratings probably occurred due to the unique insights
into performance motives the self possesses. Campbell (1991)
and Mabe and West (1982) have shown that self-raters may
better understand the motivation underlying their behavioral
patterns and are in a better position to see how their
behaviors change across time and situations. This
privileged information would affect both the self-reported
performance rating and the self-reported personality
assessment. Isaacson, McKeachie, and Milholland (1963)
further determined that self-reporting was not a valid
method of measuring performance/personality relationships.
Due to the self's unique insight into motives, self-reporting
of personality may have caused the high correlations with
self-ratings in this study.
The relationships between negative personality trait
attributes and overall perceived performance were much more
predictable and consistent. They all had an inverse
relationship with small variation (range = -.18 to -.19;
p < .05), with three of the four negative attribute
correlations significant.
Two possible explanations of why negative attributes
had such high correlations with perceived performance can be
identified under two psychological paradigms -- learning
theory, and social behavior. Under learning theory,
negative personality attributes may be viewed as a form of
punishment. Verbal aggression for instance is simply
shouting or making demeaning comments/critiques. In a
learning environment this can be very detrimental. Skinner
(1984) identified multiple reasons for not using punishment:
(1) it causes unfortunate emotional byproducts, and (2)
punishment elicits aggression toward the punishing agent.
If students perceived the negative personality attributes as
a form of punishment, then it would logically follow that
they responded by rating those instructors lower on
performance. Assessing student critiques, Azrin and Holz
(1966) confirmed the retaliatory reaction of students to
instructional forms of punishment. They found students
rated instructors lower in performance when punitive forms
of discipline were used in the classroom. The negative
personality traits in this study can easily be associated
with instructional forms of punishment and may account for
the significant inverse relationship with perceived
performance.
A second interpretation of the significant relationship
between negative personality traits and perceived
performance is from a social behavior paradigm. Gilbert
(1989, 1991, 1993) contends that people go through two steps
when making attributions or appraisals. They begin by
making an internal attribution based on the person's
previous behavior, and then adjust that attribution based on
the situation. Most people, however, fail to reach the
second step and base the attribution solely on previous
experience. This mental shortcut process is called the
anchoring/adjustment heuristic and is greatly biased by
first impressions or experience (Aronson, Wilson, & Akert,
1994). The heuristic process is greatly affected by
negative perceptions (Levenson, Carstensen, & Gottman,
1994). Negative personality traits may be regarded by
observers the same as negative perceptions and thus
influence performance appraisal. Instructor pilots with
pronounced negative personality trait scores may have been
rated based on biased heuristic perceptions by observers.
Thus, the relationship between negative personality traits
and perceived performance would be amplified by a biased
attribution process. Raters need to be informed in advance
of this natural bias process and reminded to assess the
whole person across a period of time rather than a single
situation.
Other explanations to the personality trait results
involve two statistical considerations. First, due to the
dominant sample size and the factor weighting, overall
perceived performance ratings closely mirrored the student
group ratings (Table 8). For the most part, student rating
correlations and overall rating correlations matched in
direction of the relationship, and significance. If the
student rating group weighting in the overall performance
computation were changed, there would certainly be a
difference in both significant correlations and the
subsequent regression model. The weighting of the overall
performance computation currently reflects AETC's criteria
in student assessment, but may not be appropriate for
instructor assessment. Further research should explore the
factor weighting of the overall performance computation and
make adjustments to better define and represent AETC's
criteria of instructor pilot performance assessment.
A second statistical consideration is the extreme
values in negative personality traits scores (Table 7). The
positive trait scores clustered near the center of the
rating scale (3). The negative trait scores, however, were
skewed to the left near the lower scores (1.5). The
skewness of the negative trait distribution would promote
significant correlations. Implications of this
consideration are simply to recognize that negative traits
have a higher potential of producing significant
correlations. Positive personality attribute scores have a
natural tendency to cluster near the middle of the
distribution and, therefore, are less likely to produce
significant correlations.
Demographics
The final analysis of data examined the relationship
between demographic variables and perceived performance.
The stepwise regression equation accounted for eleven
percent of the variance and was composed of two significant
demographic variables: Number of Children and Military Rank.
Closer inspection of the significant demographic variable
correlations revealed three trends (Table 10). There were
nine significant demographic correlations with overall
performance (Age, Marital Status, Number of Children, Time
in Service, Military Rank, Total Flying Time, Squadron
Officers School (a two-month professional school), Previous
Aircraft, and Career Intentions). These correlations ranged
in magnitude from .17 to .28. Many of these variables
overlap and could be combined into a categorical factor
called "Social Maturity" or "Social Experience." It appears
the more "socially experienced" the instructor pilot is, the
higher their performance rating. This may be due to the
emphases on the social environment in the workplace by the
more mature individual. Friedlander (1963) found that the
older supervisors tended to derive more satisfaction from
the social and technical aspects of their work and less
satisfaction from self-actualization than did the younger
supervisors. Older, more mature instructor pilots may be
more socially in tune with student and peer needs and less
consumed with their own. Whether labeled maturity or simply
social experience, the older instructors are perceived as
better performers.
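For readers unfamiliar with the variance-accounted-for figure, the following sketch shows how R-squared is computed for a two-predictor regression. The data are simulated purely for illustration (the coefficients, scales, and sample size are assumptions), with the predictors named after the two variables the stepwise equation retained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # hypothetical sample size

# Hypothetical stand-ins for the two retained predictors.
children = rng.integers(0, 5, n).astype(float)   # Number of Children
rank = rng.integers(1, 6, n).astype(float)       # Military Rank

# Simulated performance with only a weak dependence on both predictors,
# so R-squared lands in the low range the study reports.
perf = 3.0 + 0.08 * children + 0.10 * rank + rng.normal(0.0, 0.55, n)

# Ordinary least squares with an intercept column in the design matrix.
X = np.column_stack([np.ones(n), children, rank])
beta, *_ = np.linalg.lstsq(X, perf, rcond=None)

# R-squared = 1 - SS_residual / SS_total: the share of variance explained.
resid = perf - X @ beta
r2 = 1.0 - resid.var() / perf.var()
print(round(r2, 3))
```

Even when both predictors are statistically significant, most of the variance in the outcome can remain unexplained; an R-squared of .11 means the two demographics account for about a tenth of the variation in perceived performance.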
A second trend among the demographic correlations
appears in the supervisor ratings alone. The correlations
significant only for supervisor ratings again identified
"social experience" variables and also identified a new set
of "professional experience" variables. Large and significant
correlation magnitudes occurred with variables such as Total
Flying Time, Military Rank, Previous Aircraft Experience,
Career Intentions, and SOS (Professional education). These
variables congregate and overlap into a "professional
experience" factor. Supervisors appear to weight previous
military experience and professional education heavily in
their perceptions of performance. Part of this practice is
artificially imposed by the military structure and may be
another form of rating bias. Officers of higher rank are
placed in jobs with more responsibility and generally higher
visibility. The more senior officers naturally work more
closely with supervisors, and their performance may therefore
be more readily observed, or credited for their increased
responsibilities. This may bias ratings simply by providing
senior-ranking officers with increased exposure to
performance evaluators.
The final demographic trend was with self ratings.
Significant demographic correlations for self-ratings
appeared to cluster around a type of "hands on experience"
factor. The experience factor this time was weighted toward
time and experience as an instructor pilot. Self-ratings
produced the highest correlations (r = .29) with variables
such as Time as an IP, Total Flying Time, IP Flying Time,
and Rank. Instructors base their own performance level on
the amount of hands-on instructional experience they possess.
It appears the self recognizes performance based on how well
one can do the primary job at hand: instructing student
pilots.
All three trends of the demographic variables imply
some type of experience factor. Overall perceived
performance ratings appear related to "social" experience.
Supervisor recognition of performance appears slanted
towards "professional" experience. Self-ratings of
performance appear to emphasize "hands on" instructional
experience. These trends complement the new AETC instructor
pilot hiring philosophy of replacing younger inexperienced
IPs with more operationally experienced pilots.
Observations
From informal interviews with the three rating groups
and from observing the rating process, the researcher
discovered ancillary information that complements the
empirical findings. Instructor pilots wanted specific
performance feedback, and feedback from multiple groups.
They liked the performance appraisal instrument because it
included specific performance criteria instructors felt were
related to being an instructor pilot and Air Force officer.
They felt the constructs reinforced and emphasized "the
mission," or "why" they were there. Instructors also liked
the multiple rating groups technique because it reinforced
the different customer groups of the instructor pilot's
work. The instructors recognized these various groups have
different insights and priorities, and wanted to know how
these groups perceived their performance along with specific
instances of "why." There were, however, some concerns
about implementation and how the performance information
would be used.
Instructors were, at first, reluctant to report on
peers. Two concerns surfaced during the interview. First,
peers were apprehensive about how the ratings would be used
and who would see them. Anonymity and Privacy Act
statements assured the confidential nature of this study,
but the concern remained valid for future potential
application of the process. The instructors were willing to
critique peers and to receive peer critiques, but only under
non-threatening circumstances. The instructors' reactions
are best described by Kane and Lawler's (1978) conclusion
that peer appraisals are accepted, and perceived as valid, if
they are used unofficially and solely for the purpose of
providing detailed and accurate feedback to workers.
Second, some instructors were concerned with the
validity of peer ratings. They felt it would simply reflect
popularity contests and did not believe peer ratings would
be accurate assessments. This is a classic concern about
peer ratings. Cederblom and Lounsbury (1980) found a lack of user
acceptance in peer rating in a study of college professors.
The professors resisted peer rating reviews because of the
perceived threat of friendship and popularity bias. McEvoy
and Buller (1987) suggested that bias in peer appraisals can
be controlled, and peer rating validity increased, if the
ratings are used only for informal feedback.
The key to acceptability and validity in peer ratings
is utilization. Peer ratings must be unofficial and
undocumented. They should be implemented as developmental
forms of feedback only. If these conditions are met,
instructors indicated they would accept the multi-group
rating process.
Implications for Higher Education
Higher education implications of this study apply
directly to faculty performance appraisal. The personality
assessment feature resulted in weak correlations with no
practical significance. The 360-degree feedback technique,
however, resulted in the identification of a new performance
assessment technique, criteria, and a potential indicator of
rating group bias. It was found that various rating groups
possess different expectations and criteria of faculty
performance. A student, for example, may have no interest
in a faculty member's research activity, but instead
emphasize the faculty member's teaching ability.
Administrators, on the other hand, interested in the
scholarly representation of the institution, may place an
emphasis on faculty research. This study highlighted those group
differences in perceived faculty performance criteria. If
faculty performance is to be holistic, then various rating
groups need to be consulted to provide representative
criteria. Millman (1987) highlights the unique insights of
peers and students in teacher evaluation, but overlooks the
important inputs these groups can contribute to establishing
performance criteria. This study suggests that various
rating groups should be consulted to establish holistic
criteria for faculty performance assessment.
The second higher education implication from this
research highlights the bias from various rating groups in
faculty appraisal. It validates previous findings by
Batista (1976), who indicated that students and peers inflate
different performance evaluation scales on course critiques.
Batista attributed the discrepancy in scale inflation to
differences between student and peer-instructors in subject
knowledge and familiarity with the instructor. This study
replicated those findings, highlighting the classical
pattern of rating inflation across rating groups. The group
rating bias found in this study underscores the need for
360-degree appraisal. Such a technique provides the
opportunity to identify scale inflation and allows faculty
members to better interpret their performance ratings.
Future faculty appraisal should strongly consider the
360-degree feedback technique. The process can provide
comprehensive criteria which represent the unique needs and
concerns from various customers of the higher education
process. Application of the 360-degree process also helps
identify the biases of various groups in typical instructor
and classroom performance appraisals. The comparison of
rating scores across the various rating groups facilitates
discrepancy discussions for faculty development and
highlights scale inflation indicative of rating group bias.
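As a sketch of the discrepancy comparison described above, the following compares rating scores across groups and flags scores that sit well above the all-group mean. The groups, scales, numbers, and flagging threshold are entirely hypothetical illustrations, not the study's data.

```python
# Hypothetical 360-degree ratings on a 7-point scale for one instructor.
ratings = {
    "self":       {"job-knowledge": 6.6, "teamwork": 6.0, "communication": 6.2},
    "supervisor": {"job-knowledge": 5.0, "teamwork": 5.1, "communication": 5.3},
    "peer":       {"job-knowledge": 5.2, "teamwork": 6.5, "communication": 6.0},
    "student":    {"job-knowledge": 6.3, "teamwork": 5.2, "communication": 5.4},
}

def flag_inflation(ratings, threshold=0.75):
    """Return (group, scale) pairs whose score exceeds the all-group
    mean for that scale by more than `threshold` (an assumed cutoff)."""
    scales = next(iter(ratings.values())).keys()
    flags = []
    for scale in scales:
        mean = sum(scores[scale] for scores in ratings.values()) / len(ratings)
        for group, scores in ratings.items():
            if scores[scale] - mean > threshold:
                flags.append((group, scale))
    return flags

print(flag_inflation(ratings))
```

Here the self-rating on job-knowledge and the peer rating on teamwork would be flagged for discussion, mirroring the kinds of group-specific inflation the study observed.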
Recommendations
The following recommendations are made based upon the
results of this research.
Recommendations for Instructor Pilots
1. More mature and experienced Air Force pilots should
be selected for instructor pilot duty. A more socially
mature pilot appears to interact better with students and
provide a more credible real-world experience in their
instruction.
2. Performance assessment of instructor pilots should
include perceptions other than just the supervisor's.
Supervisors have a limited, biased perspective that may not
accurately capture all facets of an instructor's
performance.
3. More periodic, holistic feedback should be provided
to instructors. Instructors want to do well, but do not
have an accurate baseline to judge their current
performance. This baseline should include perceptions from
peers and students as well as the current system of
supervisor feedback.
4. Instructor pilots should be expected to provide
performance feedback to peers which would increase peer
development opportunity and responsibility.
5. Counseling and training should be established to
help instructors interpret performance feedback results and
improve, thereby keeping performance rating feedback
developmental.
6. The PCI does not appear sensitive enough to
discriminate among the homogeneous group of instructor
pilots. The instrument should be modified for peer
reporting, or more specific questions should be developed
for the military instructor pilot work arena.
Recommendations for Future Research
1. This study used a self-reporting personality survey
consisting of eleven traits. The potential overlap and
duplication of these traits may dilute the results. A
future study should be conducted using a simpler,
streamlined personality measure, such as the five-factor
model, or observer reporting instruments.
2. Although well defined, the seven performance
measures used in this study should include more specific
examples for the various ratings. Further research should
develop specific behaviors and examples for each rating,
e.g., Communication: (7) always debriefs the student in a
positive manner; (1) no debrief accomplished.
3. The dependent variable in this study may have been
diluted by combining the performance constructs. Future
studies should investigate the influence of personality
traits with specific performance scale measures, e.g.,
leadership, teamwork.
4. Future research should investigate the potential
impact of various instructor training programs, such as a
comparison of perceived performance ratings between an
experimental group receiving developmental 360-degree
feedback and a control group which receives no feedback.
5. In this study, experience emerged as a significant
predictor of performance. Future studies should investigate
the relationship between maturity, flight experience,
personality, and performance.
Conclusions
This study identified a new performance measure and
technique for ATC instructor pilot performance appraisal.
It identified and validated a new performance criterion (the
NASA/UT Astronaut Assessment Survey) that provides a single
global measure for the broad duties of instructor pilots.
It also explored a new performance rating technique (the
360-degree feedback) that provides more in-depth and unique
insights to an instructor pilot's daily performance. Both
of these developments contributed significantly to the
practical assessment of instructor pilots, and to the
theoretical development of performance appraisal.
This study also explored the relationship of
personality traits and demographics to perceived
performance. Although no significant implications to
personality research were found, the importance of
experience was identified in perceived instructor pilot
effectiveness. This finding is the first empirical support
of the new ATC direction to replace First Assignment
Instructor Pilots (FAIPs) with more operationally
experienced Major Weapon Systems (MWS) pilots.
The influence of personality traits on perceptions of
performance was quite small in this study, but
instrumentation remains suspect. Although four personality
trait scales reached statistical significance, the
relationships were very weak and did not provide any
practical significance. Three confounds to the personality
regression included the mixing of the self-reported
personality instrument with the observer ratings of perceived
performance, the largely homogeneous sample of instructor
pilots, and the lack of variability in the dependent
variable of overall perceived performance. Factors
composing the PCI appear to overlap and dilute the
significance of any single personality trait. The study
should be replicated with a new Big-Five personality measure
rather than the PCI. Additionally, observers' reporting of
an individual's personality should be explored versus the
self-reported personality measure used in this study.
Policy implications of the study resulted in multiple
performance assessment findings. First, the current
criteria for IP performance assessment appear incomplete.
Four additional criteria (performance-under-pressure,
leadership, teamwork, personality) were unanimously
identified by the various rating groups as appropriate, and
needed, measures of IP performance. Second, self-ratings of
performance were significantly higher than all other rating
groups. This underscores the need for IP feedback, and the
skewed perceptions of the instructor. The 360-degree
feedback method provides comprehensive feedback to the
instructor which highlights the unique perceptions of
various rating groups. Third, demographics indicating
flying and military experience were predictors of perceived
performance. All rating groups recognized pilots with more
experience, such as the MWS pilots, as better performers.
It appears the proposed IP composition change to increase
Major Weapon Systems (MWS) pilots at UPT is justified.
Theoretical implications also pivoted around the
performance appraisal part of the study. Personality traits
were of minimal statistical significance and of no practical
significance. This confirms current personality research
which cites the difficulty of achieving measurable
differences in self-reported personality measures. The
performance appraisal technique, however, resulted in very
encouraging findings. Multiple rating groups participating
in the 360-degree feedback technique do provide unique
insights into the performance appraisal process and
different emphases of criteria for rating performance.
Their diverse representation of various work level
perspectives provides a more comprehensive and accurate
picture of an instructor's all-around performance.
Additionally, the various ratings help identify rating group
bias across the performance appraisal scales. The rating
group bias found in this study replicated previous research:
self-ratings were always higher than those of other groups,
supervisor ratings were lower, students inflated job-
knowledge scales, and peers inflated teamwork, personality,
and communication.
This study did not support personality predictors of
perceived performance as originally proposed. Instead,
demographic variables were found to be the more significant
predictors. The greatest contribution from this research
concerns the performance appraisal technique. The 360-
degree feedback process was very successful at identifying
comprehensive performance criteria, and illustrated
differences in rating group perspectives. The 360-degree
technique may have promising potential in future faculty
performance appraisal. Its diverse perspectives may provide
assessments that go far beyond the classroom and typical
course critique. Future studies should explore this
opportunity to assess faculty using the 360-degree process
and criteria from the NASA/UT Astronaut Assessment Survey.
REFERENCES
Air Training Command. Historical Research Paper, Major Changes in Pilot Training 1939-1984. Randolph AFB, TX: Air Training Command, History and Research Office, October 1984.
Aleamoni, L. M. (1976). Typical faculty concerns about student evaluation on instruction. National Association of Colleges and Teachers of Agriculture Journal. 20: 111-121.
Allport, G. W. (1937). Personality. New York: Holt.
Allport, G. W. (1961). Pattern and Growth in Personality. New York: Holt.
Allport, G. W., & Odbert, H. S. (1933). Trait-names: A psycho-lexical study. Psychological Monographs. 47, 171-220 (1, Whole No. 211).
Allport, G. W., Vernon, P. E., & Lindzey, G. (1951). Study of values. Boston: Houghton-Mifflin.
Anastasi, A. (1972). Personality Research Form. In O. K. Buros (Ed.), The seventh mental measurements yearbook. Highland Park: Gryphon Press.
Anastasi, A. (1976). Psychological Testing. New York: Macmillan.
Aronson, E., Wilson, T. D., & Akert, R. M. (1994). Social Psychology: The heart and the mind. New York: Harper Collins.
Astin, A. W. (1991). Assessment for Excellence: The philosophy and practice of assessment and evaluation in higher education. New York: Macmillan.
ATC Study Guide. (1990). Pilot Instructor Training-Instructor Development (ATC Study Guide F-V5A-A/B-ID-SG). Randolph AFB, TX: DCS Operations and Readiness.
Azrin, N. H., & Holz, W. C. (1966). Punishment. In W. K. Honig (Ed.). Operant Behavior: Areas of research and application. Englewood Cliffs, NJ: Prentice-Hall.
Bale, R., Rickus, G., & Ambler, R. (1973). Prediction of advanced level aviation performance criteria from early training and selection variables. Journal of Applied Psychology. 58, 347-350.
Barone, J., Maj. (1993). Major weapon systems instructor pilot advantages. Headquarters AETC Public Release No. 93-11-02, Randolph AFB, TX: November.
Barrett, G. V., & Kernan, M. C. (1987). Performance appraisal and terminations: A review of court decisions since Brito v. Zia with implications for personnel practices. Personnel Psychology. 40, 489-503.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A metaanalysis. Personnel Psychology. 44, 1-21.
Batista, E. E. (1976). The place of colleague evaluation in the appraisal of college teaching. Research in Higher Education, 4: 257-271.
Blower, D. (1992). Performance-based testing and success in naval advanced flight training. (Tech. Report NAMRL-1363). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Blower, D., & Dolgin, D. (1991). An evaluation of performance-based tests designed to improve naval aviation selection. (Tech. Report NAMRL-1363). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Bluen, S. D., Barling, J., & Burns, W. (1989). Predicting job satisfaction and depression using the Impatience and Achievement Striving dimensions of Type A behavior. Journal of Applied Psychology. 44: 112-121.
Borderlon, V. P., & Kantor, J. E. (1986). Utilization of Psychomotor Screening for USAF Pilot Candidates: Independent and Integrated Selection Methodologies. AFHRL-TR-86-4. Brooks AFB, TX: Air Force Systems Command.
Borgatta, E. F. (1964). The structure of personality characteristics. Behavioral Science. 9, 8-17.
Borich, G. D. (1977). The Appraisal of Teaching; Concepts and Process. Reading, MA: Addison-Wesley.
Borman, W. (1974). The rating of individuals in organizations: An alternative approach. Organizational Behavior and Human Performance. 12, 105-124.
Botwin, M. D., & Buss, D. M. (1989). The structure of act report data: Is the five factor model of personality recaptured? Journal of Personality and Social Psychology. 56, 988-1001.
Bowers, N. D. (1953). An evaluation of instructor's ground school training in the Naval Air Basic Training Command (Special Report No 58-4) Pensacola, FL: U.S. Naval School of Aviation Medicine.
Brictson, C., Burger, W., & Gallagher, T. (1972). Prediction of pilot performance during initial carrier landing qualification. Aerospace Medicine. 43, 483-487.
Bridgwater, C. A. (1982). Personality characteristics of ski instructors and predicting teacher effectiveness using the PRF. Journal of Personality Assessment. 46, 2, 164-168.
Briggs, S. R. (1989). The optimal level of measurement for personality constructs. In D. M. Buss, N. Cantor (Eds.), Personality Psychology: Recent trends and emerging directions. New York: Springer-Verlag.
Buss, D. M., & Craik, K. H. (1983). Act prediction and the conceptual analysis of personality scales. Journal of Personality and Social Psychology. 45, 1081-1095.
Campbell, D. P. (1991). Manual for Campbell Leadership Index. Minneapolis, MN: National Computer Systems.
Carretta, T. (1992a). Recent developments in U. S. Air Force pilot candidate selection and classification. Aviation. Space, and Environmental Medicine. 63, 112-114.
Carretta, T. (1992b). Understanding the relations between selection factors and pilot training performance: Does the criterion make a difference? International Journal of Aviation Psychology. 2, 95-106.
Cattell, R. B. (1947). Confirmation and clarification of primary personality factors. Psychometrika, 12, 197-220.
Cederblom, D., & Lounsbury, J. W. (1980). An investigation of user acceptance of peer evaluations. Personnel Psychology. 33, 567-580.
Chidester, T. R. (1988). Leader personality and crew effectiveness: Factors influencing performance in full-mission air transport simulation. In Proceedings of the 66th Meeting of the Aerospace Medical Panel on Human Stress Situations in Aerospace Operations. Advisory Group for Aerospace Research and Development, The Hague, Netherlands, 7-1 - 7-9.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement. 7, 249-253.
Cohen, P. A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research. 51, 281-309.
Conley, J. (1985). Longitudinal stability of personality traits: A multitrait-multimethod-multioccasion analysis. Journal of Personality and Social Psychology, 49, 1266-1282.
Costin, F. (1971). Student ratings of college teaching: Reliability, validity, and usefulness. Review of Educational Research. 41, 511-535.
Cross, K. P. (1988). Classroom Assessment Techniques: A Handbook for Faculty. Ann Arbor, MI: National Center for Research on the Improvement of Postsecondary Teaching and Learning.
Damos, D., & Gibb, G. (1986). Development of a computer-based naval aviation selection test battery (Tech. Report NAMRL-1319). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Davis, R. A. (1989). Personality: Its use in selecting candidates for US Air Force Undergraduate Pilot Training. Air University Report AU-ARI-88-8. Air University Press, Maxwell Air Force Base, AL.
Davis, W. A. (1990). Analysis of the Coast Guard Flight Instructor Profile. Unpublished dissertation, Detroit, MI: Wayne State University.
Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. In M. R. Rosenzwieg & L. W. Porter (Eds.), Annual Review of Psychology. 41, 417-440.
Digman, J. M., & Takemoto-Chock, N. K. (1981). Factors in the natural language of personality. Multivariate Behavioral Research. 16, 149-170.
Dolgin, D. L., & Gibb, G. D. (1988). Personality Assessment in Aviation Selection: Past, Present, and Future. Naval Aerospace Medical Research Laboratory, 21.
Driskell, J. E., Hogan, R., & Salas, E. (1987). Personality and group performance. In C. Hendrick (Ed.), Personality and social psychology review (pp. 91-112). Beverly Hills, CA: Sage.
Dunn, R., & Dunn, K. (1978). Teaching Students Through Their Learning Styles: A Practical Approach. Englewood Cliffs, NJ: Prentice-Hall.
Epstein, S. (1979). The stability of behavior: I. On predicting some of the people much of the time. Journal of Personality and Social Psychology. 37, 1097-1126.
Epstein, S. (1980). The stability of behavior: II. Implications for psychological research. American Psychologist. 35, 790-806.
Epstein, S. (1983). Aggregation and beyond: Some basic issues on the prediction of behavior. Journal of Personality, 51, 360-392.
Epstein, S. (1984). The stability of behavior across time and situations. In R. Zucker, J. Aronoff, & A. I. Rabin (Eds.), Personality and the prediction of behavior (pp. 209-268). San Diego: Academic Press.
Fiske, D. W. (1949). Consistency of the factorial structures of personality ratings from different sources. Journal of Abnormal and Social Psychology. 44, 329-344.
Frey, P. W. (1978). A two-dimensional analysis of student ratings of instruction. Research in Higher Education. 9, 69-91.
Friedlander, F. (1963). Underlying sources of job satisfaction. Journal of Applied Psychology, 47, 246-250.
Gephart, W. J. (1979). Practical applications of research on personal evaluation (editorial). Practical Applications of Research. 2; 3.
Getzels, J. W., & Jackson, P. W. (1963). The teacher's personality and characteristics. In N. L. Gage, (Ed.), Handbook of Research on Teaching, (pp. 506-582). Skokie, IL: Rand McNally.
Ghiselli, E. E. (1973). The validity of aptitude tests in personnel selection. Personnel Psychology. 26, 461-477.
Gilbert, D. T. (1989). Thinking lightly about others: Automatic components of the social influence process. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 189-211). New York: Guilford Press.
Gilbert, D. T. (1991). How mental systems believe. American Psychologist. 46, 107-119.
Gilbert, D. T. (1993). The assent of man: Mental representation and the control of belief. In D. M. Wegner & J. W. Pennebaker (Eds.), The handbook of mental control (pp. 57-87). Englewood Cliffs, NJ: Prentice-Hall.
Glass, D. C. (1977). Behavior patterns, stress, and coronary disease. Hillsdale, NJ: Erlbaum.
Goldberg, L. R. (1982). From ace to zombie: Some explorations in the language of personality. In C. D. Spielberger & J. N. Butcher (Eds.), Advances in personality assessment (Vol. 1, pp. 203-234). Hillsdale, NJ: Erlbaum.
Gottfredson, G. D., Holland, J. L., & Ogawa, D. K. (1982). Dictionary of Holland occupational codes. Palo Alto, CA: Consulting Psychologist Press.
Gough, H. G. (1969). College attendance among high-aptitude students as predicted from the California Psychological Inventory. Journal of Counseling Psychology. 15, 269-278.
Graham, J. R., & Lilly, R. S. (1984). Psychological Testing. Englewood Cliffs, NJ: Prentice-Hall.
Gregorich, S., Helmreich, R. L., Wilhelm, J. A., & Chidester, T. R. (1989). Personality based clusters as predictors of aviator attitudes and performance. In Proceedings of the Fifth International Symposium on Aviation Psychology. Columbus, OH, Ohio State University, April, 1989.
Greuter, M., & Herman, P. (1992). Validity and utility of assessment methods in civil aviation. In K. M. Goeters & N. Adams (Eds.), Proceedings of XX Conference of the Western European Association of Aviation Psychology WEAAP (pp. 59-61). Hamburg, Germany: DLR, Department of Aviation and Space Psychology.
Griffin, G., Morrison, T., Amerson, T., & Hamilton, P. (1987). Predicting air combat maneuvering (ACM) performance: Fleet fighter ACM readiness program grades as performance criteria. (Tech. Report NAMRL-1333). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Griffin, G., & Shull, R. (1990). Predicting F/A-18 fleet replacement squadron performance using an automated battery of performance-based tests. (Tech. Report NAMRL-1354). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Guion, R. M., & Gottier, R. F. (1965). Validity of personality measures in personnel selection. Personnel Psychology. 18, 135-164.
HAF-DPP-A (1992). USAF Flying Program Flying Training. Vol I. HQ ATC/DOPR, Randolph AFB, TX: ATC 92-2, August 1991.
Harris, M. M., & Schaubroeck, J. (1988). A meta-analysis of self-supervisor, self-peer, and peer-supervisor ratings. Personnel Psychology, 41, 43-62.
Hazucha, J. F. (1991). Success, jeopardy, and performance: Contrasting managerial outcomes and their predictors. Unpublished doctoral dissertation. University of Minnesota, Minneapolis.
Helmreich, R. L. (1982). Pilot selection and training. Paper Presentation at the American Psychological Association. Washington, DC.
Helmreich, R. L. (1986). Cockpit resource management: Exploring the attitude-performance linkage. Proceedings of the Fourth International Symposium on Aviation Psychology. Ohio State University, Columbus.
Helmreich, R. L. (1987). Theory underlying CRM training: Psychological issues in flightcrew performance and crew coordination. In H. W. Orlady & H. C. Foushee (Eds.), Cockpit resource management training: Proceedings of the NASA/MAC Workshop. NASA-Ames Research Center: CP-2455.
Helmreich, R. L., & Spence, J. T. (1978). The work and family orientation questionnaire: An objective instrument to assess components of achievement motivation and attitudes toward family and career. JSAS Catalog of Selected Documents in Psychology, 8.
Helmreich, R. L., Spence, J. T., Beane, W. E., Lucker, G. W., & Matthews, K. A. (1980). Making it in academic psychology: Demographic and personality correlates of attainment. Journal of Personality and Social Psychology. 39, 963-967.
Helmreich, R. L., & Wilhelm, J. A. (1989). When training boomerangs: Negative outcomes associated with the Cockpit Resource Management Training. In Proceedings of the Fifth International Symposium on Aviation Psychology. Columbus, OH, Ohio State University, April, 1989.
Henmon, V. A. C. (1919). Air services test of aptitude for flying. Journal of Applied Psychology. 3, 2.
Hoffelt, W., & Gress, W. (1993). The GAF selection system for flying personnel. In R. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 398-403). Columbus, OH: Ohio State University.
Hogan, J. (1978). Personological dynamics of leadership. Journal of Research in Personality. 12, 390-395.
Hogan, R. (1986). Hogan Personality Inventory manual. Minneapolis, MN: National Computer Systems.
Hogan, R. (1987). Personality psychology: Back to basics. In J. Aronoff, A. I. Rabin, & R. A. Zucker (Eds.), The emergence of personality (pp. 141-188). New York: Springer.
Hogan, J., & Hogan, R. (1986). Manual for the Hogan Personnel Selection System. Minneapolis, MN: National Computer Systems.
Hogan, R., DeSoto, C. B., & Solano, C. (1975). Traits, tests, and personality research. American Psychologist. 6, 255-264.
Hoge, R. D., & Luce, S. (1979). Predicting achievement from classroom behavior. Review of Educational Research. 49: 479-496.
Holland, J. L. (1985). The SDS professional manual--1985 revision. Odessa, FL; Psychological Assessment Resources.
Hopson, J. A. (1978). Development and evaluation of a naval flight officer scoring key for the Naval Aviation Biographical Inventory (NAMRL Report No. 1256). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Hough, L. M. (1988, April). Personality assessment for selection and placement decisions. Workshop presented at 3rd Annual Conference of the Society of Industrial and Organizational Psychology. Dallas.
Hough, L. M. (1989). Development of personality measures to supplement selection decisions. In B. J. Fallon, H. P. Pfister, & J. Brebner (Eds.), Advances in industrial organizational psychology (pp. 365-375). New York: Elsevier.
Hough, L. M. (1992). The Big Five personality variables--construct confusion: Description versus prediction. Human Performance. 5, 139-155.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581-595.
Hudak, M. A., & Anderson, D. E. (1984). Teaching style and student ratings. Teaching of Psychology, 11, 177-178.
Isaacson, R., McKeachie, W., & Milholland, J. (1971). Correlation of teacher personality variables and student ratings. Journal of Educational Psychology, 54, 110-117.
Jackson, D. N. (1974). Personality Research Form. Port Huron, MI: Research Psychologist Press.
Joaquin, J. B. (1980). The Personality Research Form and its utility in predicting undergraduate pilot training performance (Canadian Forces Report No. 80-12). Willowdale, Ontario: Canadian Forces Personnel Applied Research Unit.
John, O. P. (1989). The Big Five factor taxonomy: Dimensions of personality in the natural language and in questionnaires. In L. A. Pervin (Ed.), Handbook of personality: Theory and research. New York: Guilford.
John, O. P., Goldberg, L. R., & Angleitner, A. (1984). Better than the alphabet: Taxonomies of personality-descriptive terms in English, Dutch, and German. In H. C. J. Bonarius, G. L. M. van Heck, & N. G. Smid (Eds.), Personality psychology in Europe: Theoretical and empirical developments (Vol. 1, pp. 83-100). Lisse: Swets & Zeitlinger.
Jung, C. G. (1923). Psychological types. New York: Harcourt Brace Jovanovich.
Kane, J. S., & Lawler, E. E., III. (1979). Performance appraisal effectiveness and determinants. In B. M. Staw (Ed.), Research in organizational behavior (Vol. 1). Greenwich, CT: JAI Press.
Kant, I. (1974). Anthropology from a pragmatic point of view (M. J. Gregor, Trans.). The Hague: Nijhoff. (Original work published 1798)
Kantor, J. E., & Carretta, T. R. (1988). Aircrew selection systems. Aviation, Space, and Environmental Medicine, 59(11, Suppl.), A32-A38.
Kirchner, W. (1965). Relationships between supervisory and subordinate ratings for technical personnel. Journal of Industrial Psychology, 3, 57-60.
Kozlowski, S. W. (1978). The validity of personality inventories for the selection of personnel: A review of the literature and recommendations for research (Special report). State of Pennsylvania: State Civil Service Commission.
Lazovik, G. F. (1987). Documentary evidence in the evaluation of teaching. In J. Millman (Ed.), Handbook of teacher evaluation. Beverly Hills, CA: Sage.
Levenson, R. W., Carstensen, L. L., & Gottman, J. M. (1994). The influence of age and gender on affect, physiology, and their interrelations. Journal of Personality and Social Psychology, 67, 56-68.
Livneh, H. (1989). The five-factor model of personality: Is evidence for its cross-media premature? Personality and Individual Differences, 10, 75-80.
Locke, E. A., & Hulin, C. L. (1962). A review and evaluation of the validity studies of activity vector analysis. Personnel Psychology, 15, 25-42.
Mabe, P. A., III, & West, S. G. (1982). Validity of self-evaluation of ability: A review and meta-analysis. Journal of Applied Psychology, 67, 280-296.
Marsh, H. W. (1987). Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11(3), 255-379.
Matthews, K. A. (1982). Psychological perspectives on the Type A behavior pattern. Psychological Bulletin, 91, 293-323.
McCrae, R. R., & Costa, P. T., Jr. (1985). Updating Norman's adequate taxonomy: Intelligence and personality dimensions in natural language and questionnaires. Journal of Personality and Social Psychology, 49, 710-721.
McEvoy, G. M., & Buller, P. F. (1987). User acceptance of peer appraisals in an industrial setting. Personnel Psychology, 40, 785-797.
McFarland, R. (1953). Human factors in air transportation. New York: McGraw-Hill.
McGowen, J., & Gormly, J. (1976). Validation of personality traits: A multicriteria approach. Journal of Personality and Social Psychology, 34, 791-795.
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology, 43, 335-354.
McNeil, J. D., & Popham, W. J. (1973). The assessment of teacher competence. Second handbook of research on teaching. Skokie, IL: Rand McNally.
Medley, D. M. (1973). Closing the gap between research in teacher effectiveness and the teacher education curriculum. Journal of Research and Development in Education, 7, 39-46.
Millman, J. (1987). Handbook of teacher evaluation. Beverly Hills, CA: Sage.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Mitzel, H. E. (1960). Teacher effectiveness. In C. W. Harris (Ed.), Encyclopedia of educational research. New York: Macmillan.
Muchinsky, P. (1990). Psychology applied to work (3rd ed.). Pacific Grove, CA: Brooks/Cole Publishing.
Myers, I. B., & McCaulley, M. H. (1985). Manual: A guide to the development and use of the Myers-Briggs Type Indicator. Palo Alto, CA: Consulting Psychologists Press.
Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: Replicated factor structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66, 574-583.
North, R. A., & Griffin, G. R. (1977). Aviator selection 1919-1977 (Special Report 77-2). Pensacola, FL: Naval Aerospace Medical Research Laboratory.
Parker, J., Taylor, E., Barrett, R., & Martens, L. (1959). Rating scale content: 3. Relationship between supervisory and self-ratings. Personnel Psychology, 12, 49-63.
Peabody, D., & Goldberg, L. R. (1989). Some determinants of factor structures from personality-trait descriptors. Journal of Personality and Social Psychology, 57, 552-567.
Pederson, L. A., Allan, K. E., Laue, F. J., & Johnson, J. R. (1992). Personality theory for aircrew selection and classification (Report No. AL-TR-1992-0021). Brooks AFB, TX: Armstrong Laboratory, Air Force Systems Command.
Perry, R. P. (1979). Educational seduction: The effect of instructor expressiveness and lecture content on student ratings and achievement. Journal of Educational Psychology, 71, 107-116.
Porter, D. B. (1991). A perspective on college learning. Journal of College Reading and Learning, 24, 1-15.
Roback, A. A. (1927). A bibliography of character and personality. Cambridge, MA: Sci-Art Publishers.
Robinson, J. E., & Gray, J. L. (1974). Cognitive styles as a variable in school learning. Journal of Educational Psychology, 66, 793-799.
Rose, R. M., Helmreich, R. L., Fogg, L., & McFadden, B. A. (1993). Assessments of astronauts' effectiveness. Aviation, Space, and Environmental Medicine, 64, 789-794.
Rosenshine, B. (1970). Enthusiastic teaching: A research review. School Review, 78, 499-514.
Rosenshine, B., & Furst, N. (1973). The use of direct observation to study teaching. In R. M. Travers (Ed.), Second handbook of research on teaching. Chicago: Rand McNally.
Rossander, P. (1980). Personality inventories and prediction of success in pilot training: State of the art. Willowdale, Ontario: Canadian Forces Applied Research Unit.
Rothaus, P., Morton, R., & Hanson, P. (1965). Performance appraisal and psychological distance. Journal of Applied Psychology, 49, 48-54.
Runyan, W. M. (1983). Idiographic goals and methods in the study of lives. Journal of Personality, 51, 413-437.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37, 407-421.
Schwarz, J. C., Barton-Henry, M. L., & Pruzinsky, T. (1985). Assessing child-rearing behaviors: A comparison of ratings made by mother, father, child, and sibling on the CRPBI. Child Development, 56, 462-479.
Siem, F. M. (1987). The effects of aircrew members' personality in interaction and performance. Unpublished doctoral dissertation, University of Texas, Austin.
Siem, F. M. (1988). Characteristics associated with success in USAF pilot training. Brooks AFB, TX: Air Force Systems Command.
Siem, F. M., & Murray, B. S. (1994). Personality factors affecting pilot combat performance: A preliminary investigation. Aviation, Space, and Environmental Medicine, 65, A45-A48.
Skinner, B. F. (1984). The shame of American education. American Psychologist, 39, 947-977.
Spence, J. T., & Helmreich, R. L. (1983). Achievement-related motives and behavior. In J. T. Spence (Ed.), Achievement and achievement motives: Psychological and sociological approaches. San Francisco, CA: W. H. Freeman.
Spranger, E. (1928). Types of men. Halle, Germany: Max Niemeyer Verlag.
Springer, D. (1953). Ratings of candidates for promotion by co-workers and supervisors. Journal of Applied Psychology, 37, 347-351.
Steel, R., & Ovalle, N., II. (1984). Self-appraisal based upon supervisory feedback. Personnel Psychology, 37, 667-685.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.
Thorndike, R. (1949). Personnel selection. New York: Wiley.
Thorndike, E. L. (1906). Principles of teaching. New York: Seiler.
Travers, R. W. (1987). Criteria of good teaching. In J. Millman (Ed.), Handbook of teacher evaluation. Beverly Hills, CA: Sage.
Waller, N. G., & Ben-Porath, Y. S. (1987). Is it time for clinical psychology to embrace the five-factor model of personality? American Psychologist, 42, 887-889.
Ware, J. E., & Williams, R. G. (1977). Discriminant analysis of student ratings as a means of identifying lecturers who differ in enthusiasm or information giving. Educational and Psychological Measurement, 37, 627-639.
Weiss, H. M., & Adler, S. (1984). Personality and organizational behavior. Research in organizational behavior (Vol. 6). Greenwich, CT: JAI.
Wittgenstein, L. (1953). Philosophical investigations. New York: Macmillan.
Woodruffe, C. (1984). The consistency of presented personality: Additional evidence from aggregation. Journal of Personality, 52, 307-317.
Wundt, W. (1874). Principles of physiological psychology. Leipzig: Engelmann.
Zedeck, S., Imparato, N., Krausz, M., & Oleno, T. (1974). Development of behaviorally anchored rating scales as a function of organizational level. Journal of Applied Psychology, 59, 249-252.
APPENDIX A
TESTING INSTRUMENTS
This appendix contains the following materials: (a) the
Demographics Survey, (b) the Personality Characteristics
Inventory (PCI), and (c) the NASA/UT Astronaut Assessment
Survey.
DEMOGRAPHICS SURVEY    Code ____

INSTRUCTIONS: Please circle the answer which best represents your
status or background. All results are CONFIDENTIAL.

1. In what aircraft do you instruct? a. T-37 b. T-38
2. What is your sex? a. Male b. Female
3. In which age group are you? a. 21-23 b. 24-27 c. 28-30 d. 31+
4. What is your marital status? a. married (currently) d. single (divorced) b. separated e. single (widowed) c. single (never married)
5. How many children live with you? a. none b. one c. two d. three or more
6. What is your housing status? a. off base b. on base
7. What is your highest degree earned? a. associate c. master's b. bachelor's d. master's plus
8. Which category best represents your area of study? a. humanities/fine arts d. business b. sciences e. engineering c. social sciences f. other
9. From what type of institution did you graduate? a. four-year college b. university c. academy
10. Under what type of control was this institution? a. private b. public
11. Approximately what was the student body size at the institution? a. below 5000 b. 5000-20,000 c. 20,000+
12. What was your undergraduate grade-point average? a. less than 2.49 c. 3.0-3.49 b. 2.5-2.99 d. 3.5-4.0
13. Did you participate in organized sports in college? a. no b. yes
14. Did you attend a junior college? a. no b. yes
15. How many higher education institutions did you attend prior to graduating?
a. one b. two c. three d. four or more
174
DEMOGRAPHICS SURVEY (Continued)
16. What was your most difficult obstacle in completing college? a. finances c. academics b. lack of discipline d. lack of direction
17. Do you have prior service experience? a. no b. yes
18. What was your commissioning source? a. USAF Academy b. ROTC c. OTS
19. What is your total time in service since commissioning? a. 1-2 yrs b. 2-4 yrs c. 4-7 yrs d. 8+
20. What is your current rank? a. 2Lt b. 1Lt c. Capt d. Maj+
21. What type of commission do you presently have? a. Reserve b. Regular
22. Have you completed SOS? a. no c. yes, correspondence only b. yes, in residence
23. What are your total military flying hours? a. under 500 b. 501-800 c. 801-1200 d. 1201+
24. What are your total military IP hours? a. under 200 b. 201-500 c. 501-1000 d. 1001+
25. What are your total civilian flying hours? a. none b. under 50 c. 51-250 d. 251+
26. Were you a civilian instructor pilot? a. no b. yes
27. Before this assignment, what command did you come from? a. FAIP b. TAC c. SAC d. MAC e. ATC
28. What is your previous military aircraft background? a. fighter b. bomber c. tanker/transport d. FAIP
29. How many hours do you have in this aircraft? a. under 500 b. 501-750 c. 751-1100 d. 1101+
30. If from a crew aircraft, how many hours as an AC do you have? a. under 100 b. 101-250 c. 251-500 d. 501+
31. How long have you been a line ATC IP? a. under 6 months c. 13-18 months b. 6-12 months d. 19-24 months e. 24+ months
32. Do you feel you are promotable in, or coming from, this position? a. no b. yes
Thank you, please turn the page and continue. All results are CONFIDENTIAL.
DEMOGRAPHICS SURVEY (Continued)
33. What are your career intentions? a. separate when able b. separate only if airlines are hiring c. stay, only for fly only track career d. stay for career
34. Do you feel you have job security for a career in the Air Force? a. no b. yes
35. Although ATC considers all IPs volunteers, what best describes your stimulus for service as an ATC IP?
a. the mission b. only flying job available c. more flying hours d. family life
36. What are the three most important attributes in being a good ATC instructor pilot?
All results are CONFIDENTIAL.
Personality Characteristics Inventory    Code ____
All information will remain Confidential!

Part I
1. Not at all aggressive A...B...C...D...E Very aggressive
2. Very whiny A...B...C...D...E Not at all whiny
3. Not at all independent A...B...C...D...E Very independent
4. Not at all arrogant A...B...C...D...E Very arrogant
5. Not at all emotional A...B...C...D...E Very emotional
6. Very submissive A...B...C...D...E Very dominant
7. Very boastful A...B...C...D...E Not at all boastful
8. Not at all excitable in a major crisis A...B...C...D...E Very excitable in a major crisis
9. Very passive A...B...C...D...E Very active
10. Not at all egotistical A...B...C...D...E Very egotistical
11. Not at all able to devote self completely to others A...B...C...D...E Able to devote self completely to others
12. Not at all spineless A...B...C...D...E Very spineless
13. Very rough A...B...C...D...E Very gentle
14. Not at all complaining A...B...C...D...E Very complaining
15. Not at all helpful to others A...B...C...D...E Very helpful to others
16. Not at all competitive A...B...C...D...E Very competitive
17. Subordinates oneself to others A...B...C...D...E Never subordinates oneself to others
18. Very home oriented A...B...C...D...E Very worldly
19. Very greedy A...B...C...D...E Not at all greedy
20. Not at all kind A...B...C...D...E Very kind
21. Indifferent to others' approval A...B...C...D...E Highly needful of others' approval
22. Very dictatorial A...B...C...D...E Not at all dictatorial
23. Feelings not easily hurt A...B...C...D...E Feelings easily hurt
24. Doesn't nag A...B...C...D...E Nags a lot
25. Not at all aware of feelings A...B...C...D...E Very aware of feelings
26. Can make decisions easily A...B...C...D...E Has difficulty making decisions
27. Very fussy A...B...C...D...E Not at all fussy
28. Gives up easily A...B...C...D...E Never gives up easily
29. Very cynical A...B...C...D...E Not at all cynical
30. Never cries A...B...C...D...E Cries very easily
31. Not at all self-confident A...B...C...D...E Very self-confident
32. Does not look out only for self; principled A...B...C...D...E Looks out only for self; unprincipled
33. Feels very inferior A...B...C...D...E Feels very superior
34. Not at all hostile A...B...C...D...E Very hostile
35. Not at all understanding of others A...B...C...D...E Very understanding of others
36. Very cold in relations with others A...B...C...D...E Very warm in relations with others
37. Very servile A...B...C...D...E Not at all servile
Part I (Continued)    All information will remain Confidential!

38. Very little need for security A...B...C...D...E Very strong need for security
39. Not at all gullible A...B...C...D...E Very gullible
40. Goes to pieces under pressure A...B...C...D...E Stands up under pressure
Part II

A = Strongly Agree    B = Slightly Agree    C = Neutral    D = Slightly Disagree    E = Strongly Disagree
1. I would rather do something at which I feel confident and relaxed than something which is challenging and difficult.
2. It is important for me to do my work as well as I can even if it isn't popular with my co-workers.
3. I enjoy working in situations involving competition with others.
4. When a group I belong to plans an activity, I would rather direct it myself than just help out and have someone else organize it.
5. I would rather learn easy fun games than difficult thought games.
6. It is important to me to perform better than others on a task.
7. I find satisfaction in working as well as I can.
8. If I am not good at something I would rather keep struggling to master it than move on to something I may be good at.
9. Once I undertake a task, I persist.
10. I prefer to work in situations that require a high level of skill.
11. There is a satisfaction in a job well done.
12. I feel that winning is important in both work and games.
13. I more often attempt tasks that I am not sure I can do than tasks that I believe I can do.
14. I find satisfaction in exceeding my previous performance even if I don't outperform others.
15. I like to work hard.
16. Part of my enjoyment in doing things is improving my past performance.
17. It annoys me when other people perform better than I do.
18. I like to be busy all the time.
19. I try harder when I'm in competition with other people.
Part III

A = Very much like me    B = Fairly like me    C = Slightly like me    D = Not very like me    E = Not at all like me
1. My general level of activity is higher than most people's.
2. When a person is talking and takes a long time to get to the point, I often feel like hurrying the person along.
3. I get irritated very easily.
4. I have a quick temper.
5. I put more effort into the things I do than most people.
6. I tend to do most things in a hurry.
7. I take life in general more seriously than most people.
8. When I have to wait in line, such as at a restaurant or the movies, I often feel impatient and refuse to wait very long.
9. I take my work much more seriously than most people.
10. My job really stirs me into action.
11. When I get involved in an activity, I am very hard-driving.
INSTRUCTOR PERFORMANCE ASSESSMENT
INSTRUCTIONS: Read each scale and its associated definitions. Rate the scale for its appropriateness in assessing ATC instructor pilot performance. Next to each number mark your rating for that scale using the following statement and rating scale:
" I feel this scale is appropriate in assessing instructor pilot performance.^"^
Strongly Slightly Slightly Strongly Disagree Disagree Disagree Neutral Agree Agree Agree
I 1 1 1 1 1 1
1 2 3 4 5 6 7
1. Job competence-knowledge a. Possess a good fund of information b. Absorbs new information quickly c. Reduces complex issues to essential elements d. Values his/her opinions on technical matters
2. Job competence-performance a. Accomplishes any task thoroughly and efficiently b. Develops innovative solutions to difficult problems c. Is predictable, consistent, reliable d. Is able to timely prioritize critical tasks e. Is self-sufficient, motivated, self-starter
3. Job competence-performance under pressure a. Thinks and acts promptly b. Is effective under unexpected emergencies c. Is effective under prolonged periods of stress d. Demonstrates good judgment; avoids unnecessary risk
4. Leadership a. Motivates others to complete tasks b. Delegates work and allows others appropriate time c. Is decisive/flexible when required d. Has command presence and projects decisiveness e. Would enjoy working in a group with this person as leader
5. Teamwork a. Puts group goals ahead of individual goals b. Works effectively with many different kinds of people c. Pulls his/her own weight d. Shares credit and accepts blame e. Would choose this person for my team
Thank you, please turn the page and continue.
All ratings will be transferred and coded, all results will be CONFIDENTIAL.
6. Personality a. Wears well over time; absence of irritating qualities b. Tolerates difficulties and frustration well c. Works in harmony with others d. Has a sense of humor e. Is tolerant of individual/cultural differences
7. Communication skills/external relations a. Presents self well; speaks clearly and effectively b. Represents the Command well c. Is concise and focused d. Is a good listener e. Is considerate of others

Thank you, please turn the page and continue.
All ratings will be transferred and coded, all results will be CONFIDENTIAL.
INSTRUCTOR PERFORMANCE ASSESSMENT (Part 2)

1. Circle your training aircraft: T-37   T-38
2. Circle your flight designation: A B C D E F H
3. Circle your training classification: student   instructor   supervisor
4. (STUDENTS ONLY) On the roster below circle the names of the instructors you have personally flown with, in either the simulator or the aircraft.
5. Using the attached guidelines, rate the performance of each instructor (witnessed or perceived) in each of the seven categories. The sub-bullets provide focus areas for each topic.
Very Poor   Poor   Average   Good   Very Good   Excellent
    1         2       3        4   (*)   5         6         7

* N/A (Not familiar enough with instructor to evaluate)
Example: [sample column of handwritten ratings; illegible in the scanned original]

A. JOB COMPETENCE-KNOWLEDGE
B. JOB COMPETENCE-PERFORMANCE
C. JOB COMPETENCE-PERFORMANCE UNDER PRESSURE
D. LEADERSHIP
E. TEAMWORK
F. PERSONALITY
G. COMMUNICATION SKILLS
APPENDIX B
LETTERS OF COORDINATION
This appendix contains the letters of coordination used
to obtain (a) U.S. Air Force approval for survey
implementation, and (b) Wing Commander approval and
invitation to conduct the research at their training wing.
[Official letterhead illegible in scan]

FROM: HQ [office symbol illegible], 550 D Street West, Ste. 36, Randolph AFB TX 78150-4738

SUBJ: Request to Conduct Survey with Instructor Pilots (IP) and Students

TO: 64 MSSQ/MSP, ATTN: Major Vlvori

1. We have reviewed the "Personality Characteristics Inventory" submitted on behalf of Capt Garvin. We have assigned a survey control number (SCN) of USAF SCN 92-91. This SCN expires on 31 December 93. Please inform Capt Garvin that both SCN and expiration date should be placed in the upper right hand corner of the survey cover.

2. If you have any further questions, please contact Mr Lou [surname and phone number illegible].

S. H. HAMILTON
Chief, Personnel Survey Branch
DEPARTMENT OF THE AIR FORCE
HEADQUARTERS 71ST FLYING TRAINING WING (ATC)
VANCE AIR FORCE BASE OK 73705-5000

REPLY TO ATTN OF: 71 OPG/CC

SUBJECT: ATC IP Personality/Performance Survey

24 FEB 1993

TO: 8 FTS/CC, 25 FTS/CC, 71 OSS/CC

1. The information contained in the attached package is a proposed doctoral thesis by Captain John Garvin from Texas Tech University. The thesis is centered around research in personality trait theory as it relates to the performance rating of IPs.

2. Captain Garvin intends to personally administer a survey to as many instructors and students as possible. He will ask instructors, as well as students, to spend a few minutes of their time to complete the survey on Tuesday, 2 March and Wednesday, 3 March. He plans to schedule this on a flight by flight basis. Request your cooperation to help Capt Garvin with his study.

[Signature block partially illegible], Commander, 71st Operations Group

cc: 71 FTW/CC

AIR FORCE: A GREAT WAY OF LIFE
cr": .—— . J . . Jriii v i a
APPENDIX C
PCI CONSTRUCT COMPOSITION
This appendix contains the Personality Characteristics
Inventory construct question composition.
Personality Characteristics Inventory
Scale Composition - Part I

I-E (IE Bipolar)
1. Not at all aggressive A...B...C...D...E Very aggressive
6. Very submissive A...B...C...D...E Very dominant
8. Not at all excitable in a major crisis A...B...C...D...E Very excitable in a major crisis
18. Very home oriented A...B...C...D...E Very worldly
21. Indifferent to others' approval A...B...C...D...E Highly needful of others' approval
23. Feelings not easily hurt A...B...C...D...E Feelings easily hurt
30. Never cries A...B...C...D...E Cries very easily
38. Very little need for security A...B...C...D...E Very strong need for security

VA (Verbal Aggression)
2. Very whiny A...B...C...D...E Not at all whiny
14. Not at all complaining A...B...C...D...E Very complaining
24. Doesn't nag A...B...C...D...E Nags a lot
27. Very fussy A...B...C...D...E Not at all fussy

I+ (Instrumentality +)
3. Not at all independent A...B...C...D...E Very independent
9. Very passive A...B...C...D...E Very active
16. Not at all competitive A...B...C...D...E Very competitive
26. Can make decisions easily A...B...C...D...E Has difficulty making decisions
28. Gives up easily A...B...C...D...E Never gives up easily
31. Not at all self-confident A...B...C...D...E Very self-confident
33. Feels very inferior A...B...C...D...E Feels very superior
40. Goes to pieces under pressure A...B...C...D...E Stands up under pressure

I- (Instrumentality -)
4. Not at all arrogant A...B...C...D...E Very arrogant
7. Very boastful A...B...C...D...E Not at all boastful
10. Not at all egotistical A...B...C...D...E Very egotistical
19. Very greedy A...B...C...D...E Not at all greedy
22. Very dictatorial A...B...C...D...E Not at all dictatorial
29. Very cynical A...B...C...D...E Not at all cynical
32. Does not look out only for self; principled A...B...C...D...E Looks out only for self; unprincipled
34. Not at all hostile A...B...C...D...E Very hostile

E+ (Expressivity)
5. Not at all emotional A...B...C...D...E Very emotional
11. Not at all able to devote self completely to others A...B...C...D...E Able to devote self completely to others
13. Very rough A...B...C...D...E Very gentle
15. Not at all helpful to others A...B...C...D...E Very helpful to others
20. Not at all kind A...B...C...D...E Very kind
25. Not at all aware of feelings A...B...C...D...E Very aware of feelings
35. Not at all understanding of others A...B...C...D...E Very understanding of others
36. Very cold in relations with others A...B...C...D...E Very warm in relations with others

C- (Neg Communion)
12. Not at all spineless A...B...C...D...E Very spineless
17. Subordinates oneself to others A...B...C...D...E Never subordinates oneself to others
37. Very servile A...B...C...D...E Not at all servile
39. Not at all gullible A...B...C...D...E Very gullible
Personality Characteristics Inventory
Scale Composition - Part II

A = Strongly Agree    B = Slightly Agree    C = Neutral    D = Slightly Disagree    E = Strongly Disagree
Mast (Mastery)

1. I would rather do something at which I feel confident and relaxed than something which is challenging and difficult.
4. When a group I belong to plans an activity, I would rather direct it myself than just help out and have someone else organize it.
5. I would rather learn easy fun games than difficult thought games.
8. If I am not good at something I would rather keep struggling to master it than move on to something I may be good at.
9. Once I undertake a task, I persist.
10. I prefer to work in situations that require a high level of skill.
13. I more often attempt tasks that I am not sure I can do than tasks that I believe I can do.
18. I like to be busy all the time.
Work (Work)

2. It is important for me to do my work as well as I can even if it isn't popular with my co-workers.
7. I find satisfaction in working as well as I can.
11. There is a satisfaction in a job well done.
14. I find satisfaction in exceeding my previous performance even if I don't outperform others.
15. I like to work hard.
16. Part of my enjoyment in doing things is improving my past performance.
Personality Characteristics Inventory Scale Composition - Part II (continued)
Comp (Competitiveness)
3. I enjoy working in situations involving competition with others.
6. It is important to me to perform better than others on a task.
12. I feel that winning is important in both work and games.
17. It annoys me when other people perform better than I do.
19. I try harder when I'm in competition with other people.
Part III

A = Very much like me    B = Fairly like me    C = Slightly like me    D = Not very like me    E = Not at all like me
AS (Achievement Striving)
1. My general level of activity is higher than most people's.
5. I put more effort into the things I do than most people.
7. I take life in general more seriously than most people.
9. I take my work much more seriously than most people.
10. My job really stirs me into action.
11. When I get involved in an activity, I am very hard-driving.
II (Impatience/Irritability)

2. When a person is talking and takes a long time to get to the point, I often feel like hurrying the person along.
3. I get irritated very easily.
4. I have a quick temper.
6. I tend to do most things in a hurry.
8. When I have to wait in line, such as at a restaurant or the movies, I often feel impatient and refuse to wait very long.
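The construct compositions above imply a straightforward scoring rule: map each A-E response to a 1-5 score and reverse-key items whose poles run opposite the construct's direction, then average within each construct. The dissertation does not print the scoring key itself, so the letter-to-number mapping and the choice of reverse-keyed items below are illustrative assumptions only:

```python
# Hedged sketch of PCI construct scoring. The A-E -> 1-5 mapping and the
# reverse-keying of specific items are assumptions for illustration; the
# appendix lists item membership but not the numeric key.

LETTER_TO_SCORE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# Verbal Aggression (VA) uses Part I items 2, 14, 24, 27 (per Appendix C).
# Which of them are reverse-keyed is assumed here.
VA_ITEMS = {2: False, 14: True, 24: True, 27: False}  # item: reverse-keyed?

def score_construct(responses, items):
    """Average the 1-5 item scores for one construct, reversing keyed items."""
    total = 0
    for item, reverse in items.items():
        score = LETTER_TO_SCORE[responses[item]]
        total += (6 - score) if reverse else score
    return total / len(items)

responses = {2: "B", 14: "D", 24: "E", 27: "A"}
print(score_construct(responses, VA_ITEMS))  # 1.5
```

Averaging (rather than summing) keeps constructs with different item counts, such as the eight-item I+ scale and the four-item VA scale, on the same 1-5 metric.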
APPENDIX D
DATA ANALYSES TABLES
This appendix contains additional data analyses tables. Included are: (1) ANOVA tables for the scale appropriateness assessment of the NASA/UT Astronaut Assessment Survey, (2) ANOVA tables determining differences among group ratings in perceived performance for instructor pilots, (3) intercorrelation tables for the personality traits, and (4) intercorrelation tables of the demographic variables.
Table 13
Summary of One-way Analysis of Variance Between the Rating Groups for Performance
Scale Appropriateness (Performance-Under-Pressure)
Source     df      SS        MS       F        p
Between     2     404.40    202.20   11.21    .0001*
Within    492    8870.76     18.03
Total     494**
* p<.001. ** Sample size is larger for this analysis than for the regression equation due to the inclusion of expert/evaluator pilots, who are not assigned to flights.
Tukey's Honestly Significant Difference Procedure for Pairwise Comparisons (Performance-Under-Pressure)
Category Mean
Students 5.94a
Peers 6.44b
Supervisors 6.13a
Note: Higher means connote higher scores. Means having the same subscript are not significantly different at p<.05.
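As a quick check on the tabled values, the mean squares and F ratio follow directly from the reported sums of squares and degrees of freedom. The following is a minimal verification sketch (not part of the study's analysis; the function name is illustrative), using the Table 13 figures:

```python
# Sketch: recompute the one-way ANOVA summary statistics in Table 13
# from the reported sums of squares (SS) and degrees of freedom (df).

def anova_from_sums(ss_between, df_between, ss_within, df_within):
    """Return (MS_between, MS_within, F) for a one-way ANOVA."""
    ms_between = ss_between / df_between   # MS = SS / df
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Table 13 (Performance-Under-Pressure scale appropriateness)
ms_b, ms_w, f_ratio = anova_from_sums(404.40, 2, 8870.76, 492)
print(round(ms_b, 2), round(ms_w, 2), round(f_ratio, 2))  # 202.2 18.03 11.21
```

The same arithmetic reproduces the mean squares and F ratio reported in Table 14.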
Table 14
Summary of One-way Analysis of Variance Between the Rating Groups for
Perceived Instructor Pilot Performance (Job Competence-Knowledge)
Source     df      SS         MS       F       p
Between     3     775.41     258.47    7.89   .0001*
Within    572   18733.00      32.75
Total     575**
* p<.001. ** Sample size is larger for this analysis than for the regression equation due to the inclusion of the self rating group (n=152).
Tukey's Honestly Significant Difference Procedure for Pairwise Comparisons (Job Competence-Knowledge)
Category       Mean
Students       6.10a
Peers          5.88b
Supervisors    5.76b
Self           6.07c
Note: Higher means connote higher scores. Means having the same subscript are not significantly different at p<.05.
Table 15
[Intercorrelation table for the personality trait scales; the table itself is illegible in this transcription.]
Table 16
[Intercorrelation table of the demographic variables; the table itself is illegible in this transcription.]