ASAS in CARE
CARE/ASAS Action
CARE/ASAS Activity 2: Validation Framework
WP3 Deliverable 5: Human Performance Metrics
Report Version 1.0 – 12 November 2002
EUROCONTROL Reference: CARE/ASAS/NLR/02-034
Document Reference: CARE/ASAS/VF-NLR-WP3-D5
NATS CAIRS No 0203405





DOCUMENT REVIEW

Version  Date      Description / Modifications
0.1      31/05/02  First draft
0.2      30/09/02  Updates according to EUROCONTROL comments
0.3      23/10/02  Updates according to EUROCONTROL comments
1.0      12/11/02  Brought to issue with minor editorial comments

DISTRIBUTION LIST

Consortium:
Rosalind Eveleigh (NATS)
Jose Miguel de Pablo (Aena)
John Bennett (QinetiQ)
Juan Alberto Herreria (Isdefe)
Brian Hilburn (NLR)

EUROCONTROL and CARE/ASAS Action Manager:
Mick van Gool (EUROCONTROL Agency)
Francis Casaux (EUROCONTROL Agency)
Ulrich Borkenhagen (EUROCONTROL Agency)


EXECUTIVE SUMMARY

The aim of the current project is to specify a Validation Framework (VF) that provides for comparability and consolidation of results across various ASAS research projects. The wide range of potential operational concepts and diverse techniques that may be used for their validation have led to the requirement for the framework to be generic.

The project involves two complementary work packages, WP2 and WP3, identifying metrics that may be used to measure performance in an ATM environment. This report presents an analysis of human performance metrics that can be used in the ATM environment in general and an ASAS environment in particular.

Work Package 3 (Human Performance Metrics) identifies and provides guidance for measuring human performance and acceptance when assessing the performance of ASAS applications. This work builds on that completed in WP1 (scenario definition) and WP2 (system performance metrics). A close relationship with WP1 and WP2 is necessary in order to understand the situations in which ASAS applications would be used and in order to understand the performance of the system as a whole.

The concept of the system as a whole is important to the consideration of any new application. The human operator (of both the ground-side and air-side elements of the system) will continue to be the main conduit for system inputs and outputs. Therefore, to maximise the performance of the system as a whole, the human operator and the machine elements of the system must work in concert, not against each other, lest the actions of one compromise the ability of the other to perform effectively. Human performance in the context of system performance as a whole includes the following general areas:

• Workload (Hilburn, B. et al 1997)

• Situation Awareness (Alfredson, J. 2001)

• System Monitoring (Hilburn, B. et al 1997)

• Teamwork (Kelly, C. et al 2001)

• Trust (Ruitenberg, B. 1998)

• User Acceptance/Usability (Kirakowski, J. 1994)

• Human Error (EATMP 2000b)

In addition to the above seven general areas, one specific human performance area was identified, namely: task performance (i.e. how well did the pilot or controller actually perform, on some measures relevant to system performance). It was decided that such a measure would have to be addressed on a case-by-case basis, since it is most likely to be closely tied to a given ASAS application. One goal of the present effort was to relate these human performance areas (and associated metrics) both to the earlier WP2 system performance areas and to the higher-level goal of mapping human performance areas to ASAS application categories. A broad survey of previous applicable studies and the relevant literature resulted in a large range of measurement methods that quantify the above listed areas. These measurement methods have been briefly described, but it is acknowledged that any person wishing to assess human performance in ASAS applications may have difficulty choosing between the many methods. Providing guidance to the interested practitioner was therefore the next objective of WP3.


Using scenarios defined during WP1, a framework has been constructed to guide the practitioner concerned with measuring human performance on ASAS applications through the decisions that enable her/him to choose an appropriate measurement methodology. This framework allows the practitioner to make simple decisions along the following dimensions in order to select suitable measurements:

• Task/Function (ASAS application, but also decomposed further into tasks that comprise the application);

• Validation Technique (Fast-Time Simulation, Real-Time Simulation, Analytic Study, Survey);

• Perspective (Ground or Air);

• Human Performance Area (as listed above);

• Data Type (Subjective or Objective);

• Metric (a subset of all possible human performance metrics).

There is also a comment field associated with each metric, including details of how to apply the metric, whether it is intrusive, and any other constraints of which the practitioner needs to be aware.

A chief goal of the present work was to link the selection and use of human performance metrics directly to the ASAS application categories (as identified in the PO-ASAS). With this in mind, a set of potential human performance issues was extracted directly from PO-ASAS, which in turn yielded a provisional set of “key” human performance areas, i.e. those that seemed most relevant for each given ASAS application category. On the basis of subject matter expertise, an estimation was then made of the relevance of the four validation techniques for assessing each of the seven identified human performance areas. This, it is hoped, can facilitate the selection of appropriate human performance metrics on the basis of a given ASAS application category. It must be noted that this directive approach is not by itself sufficient, nor does it necessarily capture all the potential human factors that may be of interest for a given ASAS application. Further, the validation practitioner must also consider potential peculiarities of the chosen validation exercise and other evaluative factors (e.g. cost, intrusiveness, etc.) in ultimately selecting metrics. However, it is hoped that the present report can provide useful input to this process.


GLOSSARY

ADS-B    Automatic Dependent Surveillance Broadcast
ASAS     Airborne Self Separation Assurance
ATC      Air Traffic Control
ATCo     Air Traffic Controller
ATM      Air Traffic Management
BRNAV    Basic Area Navigation
CARE     Co-operative Actions of R&D in EUROCONTROL
CARS     Controller Acceptance Rating Scale
CAVA     Concerted Action on Validation of ATM
CD&R     Conflict Detection and Resolution
CDTI     Cockpit Display of Traffic Information
CLSA     China Lake Situation Awareness
DEVAM    Development of EATCHIP/EATMP Validation Methodology
ECG      Electrocardiogram
EMT      Eye Movement Tracking
FMS      Flight Management System
GNSS     Global Navigation Satellite System
GSR      Galvanic Skin Response
HCT      Human Computer Trust
INTEGRA  Advanced ATM Tool Integration Project
ISA      Instantaneous Self Assessment
MAEVA    Master ATM European Validation Plan
PUMA     Performance and Usability Modelling Technique for ATM Toolset
RHEA     Role of the Human in the Evolution of ATM Systems
RT       Radio Telephony (also Response Time)
RTS      Real Time Simulation
RVSM     Reduced Vertical Separation Minima
SA       Situation Awareness
SAGAT    Situation Awareness Global Assessment Technique
SALIANT  SA Linked Instances Adapted to Novel Tasks
SART     Situation Awareness Rating Technique
SATI     SHAPE Automation Trust Index
SHAPE    Solutions for Human Automation Partnerships in European ATM
SUMI     Software Usability Measurement Inventory
SWAT     Subjective Workload Assessment Technique
TCAS     Traffic Alert and Collision Avoidance System
THEA     Technique for Human Error Analysis
THERP    Technique for Human Error Rate Prediction
TIS-B    Traffic Information Services Broadcast
TLX      Task Load Index
TRACEr   Technique for Retrospective Analysis of Cognitive Errors in ATM
VF       Validation Framework
WP       Work Package


TABLE OF CONTENTS

1. INTRODUCTION
   1.1. Objectives of the project
   1.2. Objectives of WP3
   1.3. Previous Work
        1.3.1. Previous Relevant ASAS Research
   1.4. Work Packages 1 and 2
        1.4.1. WP1 “Scenario Framework”
        1.4.2. WP2 “System Performance Metrics”
   1.5. The Link between System and Human Performance Metrics
        1.5.1. Safety
        1.5.2. Capacity and Economics
        1.5.3. Environment
        1.5.4. Security and Defence
   1.6. Summary
2. WP3 APPROACH
   2.1. Literature review
   2.2. How to use the framework
        2.2.1. ASAS Application
        2.2.2. Validation Technique
        2.2.3. Perspective
        2.2.4. Human Performance Area
        2.2.5. Data Type
        2.2.6. Metric
        2.2.7. Comments
3. MEASUREMENT ISSUES
   3.1. Selecting the Correct Test Subjects
   3.2. The Need for Proper Experimental Design
   3.3. The Need for Appropriate Analytical Techniques
   3.4. Measurement Issues: Summary
4. HUMAN PERFORMANCE METRICS
   4.1. Workload
        4.1.1. Techniques for Measuring Mental Workload
        4.1.2. Physiological metrics of workload
        4.1.3. Subjective metrics of workload
        4.1.4. Performance metrics of workload
        4.1.5. Taskload Measures
   4.2. Situation Awareness
   4.3. System Monitoring
   4.4. Teamwork
   4.5. Trust
   4.6. Usability/Acceptance
   4.7. Human Error
5. APPLICATION OF FRAMEWORK IN TWO SELECTED SCENARIOS
   5.1. In-Descent Separation Scenario
   5.2. Self-Separation in Mixed Equipage En-Route Airspace
6. CONCLUSIONS


1. INTRODUCTION

1.1. Objectives of the project

EUROCONTROL's CARE/ASAS action aims to consolidate previous work on ASAS and to co-ordinate future EUROCONTROL sponsored research in this area. As part of this programme, EUROCONTROL contracted for the development of a validation framework for the assessment of proposed ASAS applications.

The four identified ASAS application categories (as specified in the PO-ASAS) cover a wide spectrum of delegation of responsibilities and therefore a wide range of potential operational concepts that will need to be evaluated. An evaluation, or validation, process is required in order to ensure that an application is able to deliver the anticipated benefits and therefore be a worthwhile investment for the ATC providers and airlines. The commonly agreed European definition of validation is stated below.

The process through which a desired level of confidence in the ability of a deliverable to operate in a real-life environment may be demonstrated against a pre-defined level of functionality, operability and performance….

While there is consistency in the definition of validation, the many approaches used in past validation exercises have meant that it has been impossible to compare their results and conclusions, and thereby identify the best future operational concept on a Europe-wide basis. Projects such as CAVA, DEVAM and MAEVA have started to provide more detailed guidance to those responsible for the conduct of validation exercises to meet this need.

The aim of the current project is to specify a Validation Framework (VF) that provides for comparability and consolidation of results across various ASAS research projects. The wide range of potential operational concepts and diverse techniques that may be used for their validation have led to the requirement for the framework to be generic.

1.2. Objectives of WP3

The project is divided into one management and four technical work packages, defined as follows:

• WP0 - Management
• WP1 - Identification of ASAS operational scenarios
• WP2 - System performance metrics
• WP3 - Human performance metrics
• WP4 - Application of validation framework.

This report presents the work of WP3, which sought to identify human performance metrics applicable to the assessment and validation of ASAS applications, for both the flight deck and ATC. Further, it sought to provide guidelines for the application of such metrics (in real-time, fast-time and survey data collections), and to assess the relative strengths of each. WP3 was carried out in close parallel with WP2, which focused on system performance metrics. The WP3 approach is discussed in chapter 2.


1.3. Previous Work

1.3.1. Previous Relevant ASAS Research

Recent research into the concept of mature ASAS generally rests upon the following definition of free flight, as provided by RTCA's Task Force 3 (RTCA, 1995):

"... a safe and efficient flight operating capability under instrument flight rules (IFR) in whichthe operators have the freedom to select their path and speed in real time. Air trafficrestrictions are only imposed to ensure separation, to preclude exceeding airport capacity, toprevent unauthorised flight through Special Use Airspace (SUA), and to ensure safety offlight. Restrictions are limited in extent and duration to correct the identified problem. Anyactivity which removes restrictions represents a move toward free flight...."

Since the time of the RTCA report, research has progressed into ASAS and related concepts, couched in such terms as Free flight, Free routing, and ASAS. Within the context of Action Plan 1 of the FAA/EUROCONTROL R&D Committee, the operational concept of ASAS is spelled out in AP1's own ASAS Principles of Operation (FAA/EUROCONTROL AP1, 2001), which identifies the four ASAS application categories, as described in the following section.

Two publications, each reviewing recent ASAS research activities, provided especially valuable input to the current WP3 activities. Together, they cover technological, economic, human factors, and institutional aspects of ASAS activities and concepts. Van Gent et al (2000) conducted a literature and study review of European ASAS activities. Krozel (2000) presents the results of a survey into free flight literature, concepts and issues, primarily as studied in the USA. In the process, the report reviews many of the methods and human performance metrics used in past ASAS simulations.

1.4. Work Packages 1 and 2

1.4.1. WP1 “Scenario Framework”

One of the chief goals of WP1 of the CARE/ASAS Activity 2 project was the elaboration of standardised scenario templates to be used as a reference for validating ASAS applications. These templates were intended to provide a basis for the creation of validation scenarios, which could be used with analytic studies, fast-time simulation and real-time simulation. It was assumed that such standardised scenario templates would provide several benefits, including:

• A common reference with which designers and researchers could structure their validation framework and scenarios, in such a way that terminology and results could be comparable across validation efforts;

• Enhanced traceability of the scenarios; and
• Design support in the creation of scenarios.

WP1 produced four separate templates, one for each of the following four ASAS application categories, as identified in the Principles of Operation for the Use of ASAS Systems:
• Airborne Traffic Situation Awareness
• Airborne Spacing
• Airborne Separation
• Airborne Self-Separation


These templates were then applied to a representative set of previous ASAS experiment scenarios. These scenarios were selected from a list of projects proposed by the partners from among those available within their organisations. A document was elaborated providing guidelines and the rationale for selecting the most appropriate projects from the list. These guidelines provide the requirements to be met by the candidate projects in order to extract reference validation scenarios. The guidelines range from the general to the specific, and from the mandatory to the desirable. For more detail on the WP1 approach and results, refer to document CARE/ASAS/VF-ISD-WP1-D1, 2002.

1.4.2. WP2 “System Performance Metrics”

WP2 sought to identify system performance metrics applicable to ASAS. This was guided by the initial review of metrics provided in the CARE/ASAS Activity 2 preliminary report. WP2 followed an integrated approach that combined a top-down with a bottom-up approach, as follows:

• A top-down approach, beginning from the existing taxonomy related to high-level metrics, as captured in the EMERALD RTD plan, and supported by the ATM 2000+ Strategy objectives used in MAEVA;

• A bottom-up approach, beginning from the measures used in or recommended for previous ATM validation projects.

WP2 sought to define each system performance metric, including criteria for their application, and guidance on how to use the metric. As described in more detail later, it was intended that WP3 mirror, to the extent possible, the exact approach used in WP2. Whereas WP2 focused on system performance, WP3 focused on human performance metrics.

1.5. The Link between System and Human Performance Metrics

ATM represents a complex human-machine system. It is therefore appropriate to consider the two types of metrics in an integrated way. Indeed, in many cases (i.e. when system response relies directly on human input) it is difficult to distinguish between the two. By way of background on the WP3 approach, the following lays out the link between the objectives of the two WPs.

In general, human performance will have an important influence on any aspect of the ATM system because all information and activity is mediated (at some point) by a human operator. Even in situations that seem automatic, a human operator is involved. For instance, code/call-sign conversion on a processed radar display is a form of automation, but in order for the displayed call-sign to be correct, someone on the ground must have entered the correct call-sign and Mode A code into the call-sign database, and the pilot must have entered the assigned Mode A code correctly into the transponder.

The following sections briefly introduce the relationship between the five system performance objectives as identified in WP2 (Safety, Capacity, Economics, Environment, and Security/Defence) and the seven main human performance areas as identified in WP3 (Workload, Human Error, System Monitoring, User Acceptance/Usability, Situation Awareness, Teamwork, and Trust). Each of the seven human performance areas will be discussed in more detail later, in Chapter 4, in the context of ASAS validation applications.

Although every attempt was made to parallel in WP3 the method and framework of WP2 (on System Metrics), there are some important differences. For instance, the top-down effort of WP2 (System Performance Metrics) decomposed the framework into, successively: Objectives, Performance Areas, and Metrics. WP3, on the other hand, started from Human Performance Areas (Workload, Situation Awareness, etc.), out of which the separate metrics (NASA TLX score, blink rate, etc.) fell.

The reason for omitting “Objectives” in WP3 is that each of the seven human performance areas in WP3 is, at least implicitly, linked to all of the high-level objectives of safety, capacity, economics, etc., as identified in WP2. For example, a new tool that optimises an ATCo's workload would be expected to benefit each of the objectives.

1.5.1. SAFETY

Very little occurs in the ATM system without the approval of a human operator, and the human remains responsible for the safety of the ATM system. Under current operations, the air traffic controller is responsible for maintaining separation between aircraft, or for ensuring that aircraft have the information they require to maintain separation from other aircraft. The pilot is tasked with correctly obeying the instructions or following the flight plan. However, under ASAS, there will be radical changes in spacing and separation responsibility, which should ideally improve the level of safety, or at the very least maintain the same level. It is important that any validation programme considers the safety implications of these changes in responsibility as well as any benefits in performance. The human performance areas influence Safety in the following ways.

Workload mediates a person's ability to attend to all information, to act or respond in a timely manner to events or requirements, to plan ahead, to monitor their own performance and that of others, and to make effective decisions. Excessive workload will have a detrimental impact on safety. In addition, the user also has to attend to the correct information in order to make decisions and, crucially, understand what will happen to or with that information in the future. This is the scope of situation awareness, and through this, the person can make effective decisions that protect safety. If a decision is made without knowing or understanding some item of information, or there is a breakdown in any of these activities, system safety may be threatened.

Human Error can be an action, or the lack of an action, that directly removes a barrier protecting safety. ATM is a distributed activity with a great many barriers protecting safety. These include verification of instructions by the recipient, and others observing the instruction or its outcome. However, if errors occur in combination, or an error occurs which is directly related to safety, system safety in general can be compromised. The change in roles and responsibilities between the controller and pilot may introduce additional opportunities for error or change the nature of the errors made by either agent. Usability needs to be taken into account; it is not automatic that tools suitable for a given task on the ground can be used effectively by a pilot, with a concomitant implication for safety. Similarly, the system monitoring requirements change with the change in role, which may cause a reduction in the effectiveness of the agent in performing this task, with similar implications for safety.

Trust is also important in relation to safety. The ATM system relies on its agents understanding the goals of others and working in such a way as to integrate those goals. Should the user not accept a system or find it difficult to use, they may ignore or try to circumvent its activities. Likewise, teamwork is important in relation to safety; it is impossible for a single human to carry out all roles, therefore s/he must trust others to carry out some tasks. If two agents do not trust each other or do not work toward the same goals, incompatibilities may arise that threaten system safety.

It is hoped that the introduction of ASAS will benefit safety by supplementing the ability of the pilot or controller to perform and monitor tasks and stimuli. Each of the four ASAS application categories is anticipated to maintain safety, if not definitively improve it. As can be seen from the arguments above, it will be important to monitor all aspects that impact the safety of the system with such a radical change in roles and responsibilities.

1.5.2. CAPACITY AND ECONOMICS

For the purposes of this section, Capacity and Economics (related here most closely to efficiency) have been combined due to their interaction. For example, if aircraft are being flown more efficiently, it should follow that the capacity of the system should increase. Likewise, if capacity is increased, efficiency should also increase. Obviously, this is a simplistic explanation and there are many examples where this relationship will not be observed. However, this description of the relationship is suitable for the following discussion.

The human performance areas influence capacity and efficiency in the following ways.

As with Safety, all systems pertaining to capacity and efficiency will, at some point, be influenced by the human operator. If an air traffic controller considers that their workload is too high, they will take steps to manage that workload. This may include holding aircraft, delaying instructions that would enhance efficiency, and applying flow restrictions. Likewise, a pilot may not respond to instructions or may be unable to comply with instructions. These actions, in response to workload, will impact the capacity and efficiency of the ATM system.

If a human operator in the aviation system commits an error, it may take some time to recover from the error. For instance, if an air traffic controller instructs an aircraft to turn left to a heading of 270°, and the pilot turns the aircraft right to that heading, the efficiency of the flight for that aircraft will be compromised. This error may also knock on to the capacity of the whole system, as is often seen in the case of unfamiliar pilots entering a busy TMA or manoeuvring on the ground.

Human operators in the aviation system tend to work in the future, attempting to make plans that are optimised for the conditions anticipated at that time. As such, they need to understand the current situation and how that situation will develop in the future. This situation awareness helps the human make decisions now that will improve their efficiency both now and in the future. This will also benefit capacity.

The goals of all the parties involved in a system must be mutually compatible. That is to say, the output of one part of the system must match the required input for another, adjacent part of the system. This element of teamwork has an important influence on efficiency and capacity. It is also necessary to trust that other parties will be working toward the shared objectives, otherwise capacity and efficiency will suffer while the objectives of others are confirmed. The introduction of ASAS will allow greater sharing of information regarding intent and allow personnel to make more effective decisions relating to capacity and efficiency.

1.5.3. ENVIRONMENT

The relationship between human performance areas and Environment is less strong than with the other system performance objectives (within the ASAS application domain specifically and aviation more generally). It is possible that, should the pilot make a mistake due to workload or system usability, there may be an environmental impact through noise (low flying aircraft off-route) or pollution (emergency aircraft may jettison fuel over a sensitive environmental area). It is undeniable, though, that the actions of human operators can have a major impact on the environment beyond that envisaged through normal operation of the aviation system.


1.5.4. SECURITY AND DEFENCE

As with Environment above, the relationship between Security and Defence and human performance is less strong (within the ASAS application domain specifically and aviation more generally). Assuming that all reasonable steps are taken in the design and operation of aviation systems, there is little scope for human performance to compromise this area of system performance. However, what scope there is takes a different complexion from the other system performance objectives.

An example of how human performance could compromise Security and Defence is in the area of trust and teamwork. If any human operator trusted the systems so implicitly as to never question what is being presented, it is possible that malicious interference could go undetected, thus compromising the system. This outcome is likely to be further influenced by a human user's situation awareness and their ability to monitor the system.

Two additional areas of benefit may be civil/military co-operation and military access to civil airspace. As noted above, greater sharing of information regarding intent could be used to resolve these areas more strategically and thus more efficiently and more safely than currently.

1.6. Summary

One of the main conclusions that can be drawn from the above discussion is that ultimately the system is a partnership between human and non-human elements. Given the interplay between the two, it is often hard to distinguish between human and non-human elements of the system in terms of the system's overall response. To ensure that all system performance objectives are met, it is important that the system optimises the partnership between the human and the machine, exploiting the relative strengths of each. Since the main mediator in aviation/ATM system performance will continue to be the human for the foreseeable future, it must be concluded that design should begin with the human and build around the characteristics of the human. Following this approach, it is more likely that system performance objectives will be met and less likely that human performance will compromise overall system performance.

2. WP3 APPROACH

To extract a set of human factors analyses and metrics applicable to ASAS, WP3 proceeded through a literature review, followed by a synthesis of human factors studies relevant to ASAS applications, and finally an identification of metrics suitable for use. This literature review was augmented by in-house expertise and experience in ASAS and relevant human factors topics.

2.1. Literature review

Given the topical nature of ASAS research, any literature review must be seen as a work in progress. That is, it is a dynamic area of research, and new results are becoming available on almost a weekly basis. The bibliography cannot claim to capture the entire scope of literature on either human performance measurement or ASAS research, but hopefully captures the most relevant literature. The bibliography is based on literature in two main areas. The first is the aviation human factors domain, as it touches on human performance measurement techniques. The second broad area was ASAS and free flight work. In many cases, the literature was broad enough to encompass human performance measurement irrespective of actual domain (i.e. it might be relevant to non-ASAS, or even non-aviation, scenarios¹). Other references help refine the search for metrics by considering the unique requirements of the ASAS research scenario.

¹ It was noted that many of the metrics identified in this effort were applicable regardless of ASAS application. It was the intention that application-specific advice be provided in the form of guidelines (caveats, precautions, etc.) accompanying the eventual metrics table.

The bibliography captured a number of human performance areas. The following seven were seen to be most relevant to ASAS:
• Workload
• Situation awareness
• System monitoring and error detection
• Teamwork
• Trust
• Usability and user acceptance
• Human error

In addition to the above seven general human performance areas, one additional “specific” human performance area was identified, namely: task performance (i.e. how well did the pilot or controller actually perform, on some measures relevant to system performance). It was decided that such a measure should be omitted from the above list of general human performance areas, as it is highly dependent on a given ASAS application (and would therefore have to be addressed on a case-by-case basis).

This effort began with a review of the CARE/ASAS Activity 2 preliminary report (CARE/ASAS 2001), part of which focuses on a state-of-the-art review of validation techniques. This state-of-the-art review ranged from general ATM validation methodology (e.g. such projects as VAPORETO or EVAS) to fast-time simulation methods (MUFTIS). In general, the state-of-the-art review focused on system metrics only (most likely because these far outnumber those identified for human performance). When human performance metrics were suggested, they were at a high level, and only as an indirect means to determine a system performance metric. For instance, workload was used as an indirect indicator of capacity (cf. INTEGRA 1999, INTEGRA 2000). Only one project was identified that specifically focused on human factors validation methods, namely the European collaborative project RHEA (‘Role of the Human in the Evolution of ATM Systems’). The aim of RHEA was to develop system design guidelines that could facilitate better integration of automated functions into future ATM systems, with particular regard to the role of the human controller. RHEA identified seven ATM scenarios (none of them related to ASAS), ranging from en-route operations to arrival/departure control under mixed mode operations. Other useful inputs included van Gent et al (2000), which reviewed European ASAS research (including measurement methods and metrics), and Krozel (2000), who conducted a similar survey of (primarily American) free flight research to date.

2.2. How to use the framework

The metrics collated are presented systematically using a tabular framework. This was selected predominantly for its ease in mapping onto previous CARE/ASAS work packages. The table is divided into seven columns: ASAS Application; Validation Technique; Perspective; Human Performance Area; Data Type; Evaluative Criteria; and Metric/Guidance (i.e. comments). For an example of how such a table would appear, see Figure 2.1 below. The framework allows the user to select from a restricted set of options under each column heading. This narrows down the number of possible metrics that are recommended for use, to correspond with the options selected by the user. These categories are explained in more detail in sections 2.2.1 and later.

[Figure: layout of the metrics table, with example entries under each of the seven columns: ASAS Application (e.g. Airborne self separation); Validation Technique (e.g. Fast Time, etc.); Perspective (e.g. Airside); Human Performance Area (Workload, Trust, Teamwork, Situation Awareness, System Monitoring, Usability, Human Error); Data Type (Subjective, Objective); Evaluative Criteria (Intrusiveness, Cost, Reliability, Validity, Expertise, Resources); and Metric (and guidance).]

Figure 2.1 Format of Metrics
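To make the selection mechanics concrete, the following minimal Python sketch treats the metrics table as a list of records and narrows it according to the practitioner's choices. The entries, field names and comments here are hypothetical illustrations, not the actual WP3 table.

```python
# Illustrative sketch only: a miniature version of the WP3 metrics table,
# with hypothetical entries, and a filter that narrows the candidate
# metrics to those matching the practitioner's selections.

METRICS_TABLE = [
    {"application": "Airborne Self-Separation", "technique": "Real-Time Simulation",
     "perspective": "Air", "area": "Workload", "data_type": "Subjective",
     "metric": "NASA TLX", "comments": "Administer after each run; low intrusiveness."},
    {"application": "Airborne Self-Separation", "technique": "Real-Time Simulation",
     "perspective": "Air", "area": "Workload", "data_type": "Objective",
     "metric": "Heart rate (ECG)", "comments": "Continuous recording; requires sensors."},
    {"application": "Airborne Spacing", "technique": "Survey",
     "perspective": "Ground", "area": "Trust", "data_type": "Subjective",
     "metric": "SATI", "comments": "SHAPE Automation Trust Index questionnaire."},
]

def select_metrics(table, **choices):
    """Return the table rows matching every (column, value) choice given."""
    return [row for row in table
            if all(row[column] == value for column, value in choices.items())]

# Example: subjective workload metrics for an airside real-time simulation.
for row in select_metrics(METRICS_TABLE,
                          technique="Real-Time Simulation",
                          perspective="Air",
                          area="Workload",
                          data_type="Subjective"):
    print(row["metric"], "-", row["comments"])
```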

There are other, additional, factors that can influence the type of metrics (and the means used to collect them) chosen. One of these is the stage of validation concerned (as captured in the “Validation Route Map” of MAEVA). If validation is taking place with an early prototype, this is likely to influence the type of methods (i.e. validation technique) chosen. This, in turn, is likely to drive many of the choices made (e.g. how much money can be spent? How much validity is required? Is a “quick and dirty” study sufficient?). The answers to such questions can obviously drive the selection of human performance metrics.

[Figure: the levels of the MAEVA Validation Route Map, from Analytic modelling, through Fast-time simulation, Small-scale real-time simulation, Large-scale real-time simulation, Field test, Shadow-mode trial and Operational trial, up to Operations, with an example route marked.]

Figure 2.2 Example Route on Validation Route Map

2.2.1. ASAS Application

The Principles of Operation for the Use of ASAS (PO-ASAS) distinguished four categories of ASAS application, as follows:

Airborne Traffic Situational Awareness: applications aimed at enhancing flight crews' knowledge of the surrounding traffic situation, both on the ground and in the air. No changes in separation tasks or responsibility are required. Potential Airborne Traffic Situational Awareness applications include:

• Enhanced Visual Acquisition;
• Enhanced Visual Approaches;
• Enhanced “See and Avoid”;
• Enhanced Traffic Information Broadcast by aircraft; and
• Enhanced taxi and runway occupancy awareness.

Airborne Spacing: applications that require flight crews to achieve and maintain spacing with designated aircraft, as specified in a new ATC clearance. Although flight crews are given new tasks, separation is still assured by ATC. Airborne Spacing applications include:

• In-descent spacing;
• Level flight spacing;
• Lateral crossing and passing; and
• Vertical crossing.

Airborne Separation: in these applications, the controller delegates separation responsibility to the flight crew. Flight crews' responsibility for separation assurance is limited to a designated target aircraft and is limited in time, space and scope. Airborne Separation applications include:

• In-descent separation;
• Level flight separation;
• Lateral crossing and passing;
• Vertical crossing; and
• Paired Approaches.

Airborne Self-Separation: applications that require flight crews to maintain separation, in accordance with applicable separation minima and rules of flight. Potential Airborne Self-Separation applications include:

• Airborne self-separation in ATC controlled airspace;
• Airborne self-separation in segregated en-route airspace; and
• Airborne self-separation in mixed equipage en-route airspace.

For each of the four ASAS application categories, the PO-ASAS also identified some potential impacts that each of the ASAS applications might have, in terms of:

• Roles and procedures for flight crews and controllers;
• Equipage implications (both air and ground side);
• Principles of implementation; and
• Human factors concerns.

The current WP3 report builds upon this assessment provided in the PO-ASAS by attempting to extract a list of potential human performance areas, categorised by ASAS application category. This is further discussed in section 2.2.4, in the context of the seven Human Performance Areas.

2.2.2. Validation Technique

The second column allows for the type of study (to test the chosen function) to be selected. As with WP2, the following broad study types are distinguished:
• Real-Time Simulation (RTS);
• Fast-Time Simulation (FTS);
• Analytic Studies; and
• Surveys.

Real-Time Simulation studies are otherwise referred to as 'human-in-the-loop' studies. In this sort of investigation, qualified air traffic controllers and/or pilots are required to carry out a task in a realistic environment. This environment may include other airspace users and/or units in order to increase realism. The task proceeds at the same pace that it would do in live operation, although elements such as 'targets' and other task factors may be computer generated. This type of simulation affords the analyst a wide variety of opportunities to gather data pertaining to almost all performance areas, including safety, workload, service to aircraft, situation awareness, efficiency, capacity, etc. This data has high validity because it is based on qualified personnel carrying out representative tasks in a realistic environment.

Fast-Time Simulation studies do not involve a user. Instead, for studies focusing on human performance issues, previous analysis is used to assign some sort of value to a task (e.g. a 'time to complete' value of 15 seconds, or a workload score of 5), and tasks are specified to occur in certain combinations and frequencies in order to represent the actual activities of a real person. A scenario is described in terms of trigger events and a duration, and various tasks are programmed to occur in order to achieve the aims of the scenario. By basing this scenario on task values instead of actual user data, a processing unit can calculate the resulting values (relating to any performance area that is of interest and has already been defined) almost instantaneously. This way, an indication of (for example) what the workload implication of a new airspace sectorisation might be can be provided in much less time than it would take to conduct a Real-Time Simulation. The data also do not suffer from the subjective biases of real people.
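As a toy illustration of this approach, the following sketch accumulates the workload contributed by triggered tasks over a scenario timeline. All task names, times and workload values are invented for illustration and are not drawn from any actual fast-time model.

```python
# Minimal fast-time sketch: tasks carry pre-assigned time and workload
# values (from earlier analysis); the scenario is a list of trigger times.
# All names and values here are hypothetical illustrations.

TASK_VALUES = {
    # task: (time_to_complete_s, workload_score)
    "issue_clearance":  (15, 5),
    "monitor_spacing":  (30, 2),
    "resolve_conflict": (45, 8),
}

# Scenario: (trigger_time_s, task) pairs over a 5-minute window.
SCENARIO = [(10, "issue_clearance"), (20, "monitor_spacing"),
            (60, "resolve_conflict"), (90, "issue_clearance")]

def workload_profile(scenario, bin_s=60, duration_s=300):
    """Accumulate average workload per time bin while each triggered task is active."""
    bins = [0.0] * (duration_s // bin_s)
    for start, task in scenario:
        time_s, load = TASK_VALUES[task]
        for t in range(start, min(start + time_s, duration_s)):
            bins[t // bin_s] += load / bin_s  # average load contributed to the bin
    return bins

print(workload_profile(SCENARIO))  # rough workload per minute of the scenario
```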

Analytic studies are similar to Fast-Time Simulation and may actually be required in order to provide values for a Fast-Time Simulation. An Analytic study is a table-top exercise to generate notional data relating to human performance areas. For instance, a task analysis may determine that in order to successfully complete a task, there are several component parts. Each component part may, itself, be comprised of sub-parts. These are then assigned nominal time values, and values for the degree to which they use certain mental resources. This information can then be used to calculate, for example, a workload score based on the time it takes to achieve the task and the degree to which the mental resources required interfere with each other. Several tasks can then be compared with each other to identify where tasks may not be successfully undertaken concurrently, or where workload may become unacceptable, or how the likelihood of error might increase.
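The kind of calculation described above can be sketched roughly as follows. The sub-task decomposition, resource labels and interference weights are invented assumptions, loosely echoing resource-conflict approaches such as PUMA rather than reproducing any specific method.

```python
# Rough sketch of an analytic workload calculation: sub-tasks carry nominal
# times and mental-resource demands; concurrent sub-tasks sharing a resource
# incur an interference penalty. All values are illustrative assumptions.

SUBTASKS = {
    # sub-task: (nominal_time_s, mental resources used)
    "scan_display": (5, {"visual"}),
    "radio_call":   (8, {"auditory", "verbal"}),
    "update_plan":  (6, {"cognitive", "visual"}),
}

INTERFERENCE_PENALTY = 0.5  # added load per resource shared between concurrent sub-tasks

def workload_score(concurrent):
    """Score a set of concurrently performed sub-tasks."""
    base = sum(SUBTASKS[s][0] for s in concurrent) / 10.0  # time-based component
    penalty = 0.0
    names = sorted(concurrent)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = SUBTASKS[a][1] & SUBTASKS[b][1]
            penalty += INTERFERENCE_PENALTY * len(shared)
    return base + penalty

print(workload_score({"scan_display", "radio_call"}))   # no shared resources
print(workload_score({"scan_display", "update_plan"}))  # shares "visual": higher score
```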

Survey studies, obviously, rely on collecting survey responses from participants. Although one might conduct a survey study per se, they are also often part of a larger study in which, for instance, participants are interviewed after their participation in a real-time simulation or operational trial.

2.2.3. Perspective

The Perspective column refers to the agent – either Airside (i.e. flight deck) or Groundside (i.e. ATC) – for which the measurement is relevant. Notice that, in most cases, a metric is relevant for use with either air or ground. Specific guidance (e.g. caveats and usage guidance) might differ between the two, and this is generally spelled out in the accompanying advice.

2.2.4. Human Performance Area

Human Performance Areas (fourth column) refer to the seven broad areas of human performance that capture the focus of current human factors thinking. Each performance area represents an area of human performance, such as workload or situation awareness, that has been identified as critical to ASAS scenarios, and/or is recognised as critical to human/machine system interactions more generally. It should be noted that human performance areas are the highest level of the WP3 framework relevant to the selection of metrics since, as mentioned elsewhere, all human performance areas apply to all high-level objectives. Various metrics are available for each of the human performance areas.

Based on the PO-ASAS, an attempt was made to extract from each of the four ASAS application categories a list of potential human performance issues, which could in turn be related to the human performance areas. These issues are drawn directly from PO-ASAS. Thus, for each ASAS application, an attempt has been made to identify the most relevant human performance areas. Notice the use here of the term “most relevant”, as each of the ASAS applications potentially touches on all of the human performance areas. This process has been used to speculatively identify a set of key areas for each ASAS category. The following table presents this analysis, broken out by ASAS application:


Airborne Traffic Situational Awareness
  Identified Human Performance Issues:
  - Monitoring of visual cockpit displays (Air)
  - Errors in flight identification procedures (Air/Gnd)
  - Maintain awareness of proximate aircraft (Air)
  - Maintain awareness of equipage (Gnd)
  - Air-ground teamwork (traffic information)
  - Over-reliance on CDTIs (Air)
  - Usability of flight-deck displays (Air)
  Key Human Performance Areas: Monitoring, Human Error, Situation Awareness

Airborne Spacing
  Identified Human Performance Issues:
  - Workload of spacing task (Air/Gnd)
  - Errors due to ambiguity of transfer (Air/Gnd)
  - Similar situation awareness problems (Air/Gnd)
  Key Human Performance Areas: Workload, Human Error, Situation Awareness

Airborne Separation
  Identified Human Performance Issues:
  - Role change toward strategic (Gnd)
  - Fewer tactical demands (Gnd)
  - Workload benefits (Gnd)
  - Greater tactical demands (Air)
  - Reduced inter-sector communications (Gnd)
  - Errors due to ambiguity of transfer (Air/Gnd)
  - Similar situation awareness problems (Air/Gnd)
  Key Human Performance Areas: Workload, Human Error, Situation Awareness

Airborne Self-Separation
  Identified Human Performance Issues:
  - Trust in automation (Air)
  - Co-ordination demands (Air)
  Key Human Performance Areas: Trust, Teamwork

Table 2.24a Key Human Performance Areas, by ASAS Application category

It was found that, fairly irrespective of ASAS application category, the human performance areas can be linked to the four validation techniques. That is, it is possible to identify which validation techniques, in general, are most useful for assessing a given human performance area for all ASAS application categories. This relationship is shown in the following table.


Validation Technique | Workload | Human Error | System Monitoring | Usability | Situation Awareness | Teamwork | Trust
Analytic             | ✓        | ✓✓✓         | ✓✓                | ✓✓        | ✓                   | ✓✓       | ✓
Fast Time            | ✓        | ✓✓✓         | ✓                 | ✓✓        | ✓                   | ✓✓       | ✓
Real Time            | ✓✓✓      | ✓           | ✓✓✓               | ✓✓✓       | ✓✓✓                 | ✓✓✓      | ✓
Survey               | ✓✓       | ✓           | ✓                 | ✓✓✓       | ✓✓                  | ✓        | ✓✓✓

Table 2.24b Relevance of the validation techniques for evaluating each of the human performance areas (more ✓ symbols indicate greater relevance).

2.2.5. Data Type

The Data Type column is intended to indicate which type of data will be produced, according to which study type and metric (measurement) is chosen. This will be either subjective data (e.g. from a questionnaire) or objective data (derived from empirical methods such as behavioural monitoring or physiological recording, which are generally less influenced by subjective bias). There are specific reasons why the validation practitioner should be concerned about the distinction. If attitudes are critical (e.g. if pilot acceptance is key to using a new display), then subjective techniques are indispensable. If, on the other hand, the practitioner wants to compare two conditions under “untainted” conditions (he might not be concerned about attitudes or effects of possible training, for instance), then objective measures can provide a stronger indication.

2.2.6. Metric

The Metric (Measurement) column of the table is intended to show what is being measured for each (e.g. heart rate) and by what means (e.g. ECG).

2.2.7. Comments

The final column is for any additional comments that may be noteworthy. Notice that this is where some of the main considerations are provided regarding the use of the metric. It is understood that this guidance should never be considered as complete. The cited reference(s) should be consulted to help familiarise oneself with the selected metric and its use.

3. MEASUREMENT ISSUES

The goal of WP3 is, of course, to facilitate the use of human performance metrics in ASAS validation. To this end, the current report outlines the knowledge needed to select and apply the appropriate metrics. Quite apart from metric-specific information and knowledge, however, it was realised early on that other, more general, types of guidance are valuable in designing a validation exercise. It is the goal of this section to outline some of the other general issues involved in conducting human performance measurement in a validation setting. This section highlights several important areas that must be considered in conducting a validation exercise. These include: selection of the correct test subjects; the need for proper experimental design; and the need for appropriate analytical techniques.


3.1. Selecting the Correct Test Subjects

What type of controllers and pilots should participate in validation exercises? It is generally assumed that the current generation of operators are the best ones with which to test new systems. Although it is often important to get skilled, experienced operators as test subjects for validation exercises (general aviation pilots, for example, would typically make a poor substitute for airline pilots in ASAS validation exercises), it is also essential to consider the nature of the concept under evaluation. It could be, for instance, that the concept under evaluation is so qualitatively different from current-day operations that the current group of operators (e.g. airline pilots) would have a difficult time transitioning to such a fundamentally new manner of working. The concept used to describe this situation is "transfer of training" (i.e., that old training might be irrelevant under a new way of working). The more the concept under study differs from current-day operations, the more critical it is to consider potential transfer of training problems. This in turn has implications for selecting an appropriate group of test subjects.

In the case of the ground side of ASAS validation, it might be the case that the typical pool of controllers – namely civil ATCOs – does not represent well the ultimate target population, namely controllers in the near future, who will be capable of working under a possibly much more flexible structure than current-day controllers. One example is the use of military controllers in place of civilian controllers to help validate ASAS applications. The nature of civil and military traffic is quite different: although civilian aircraft account for the greater percentage of total European traffic, they tend to be predictable in their scheduling and flight path. Military traffic, although somewhat lighter, tends to be less predictable. It is therefore reasonable to speculate that military controllers are more representative of the future civilian ATCO under ASAS. Similar issues are likely to apply when considering the air side of ASAS validation, particularly in Free Flight Airspace.

A related point is that there is no need to use a completely homogenous group of test subjects. It need not be the case, for instance, that subjects are all identical in background. If possible transfer problems exist (or the researcher has good reason to suspect such problems), one useful approach can be to include a second test group; that is, to conduct the validation exercises using both groups, in a controlled way (e.g. presenting the same conditions to each group), and specifically compare the results across the two groups. This is a useful way to quantify the effect of population differences, and how important they might be in transitioning to the concept under study.

It is also critical to consider the number of test subjects to be used in a validation. If the goal is a comparison between conditions (e.g. current versus future operations scenarios), and there is a desire to draw statistical significance from the results, the number of subjects required can be quite high. This number can be reduced by careful experimental design. So-called "within-subjects" designs – in which each test subject is presented all possible combinations of the experimental conditions – are one way to reduce the number of test subjects required, and improve the chances (assuming an actual difference between conditions) of finding statistical significance.
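As an illustration of this point, the following sketch (not part of the original guidance) uses a standard power calculation to compare the number of test subjects needed under within-subjects and between-subjects designs; the assumed effect size of d = 0.4 is arbitrary:

```python
# Sketch: how the design choice affects the number of test subjects needed.
# Assumes a hypothesised standardised effect size of d = 0.4 (illustrative only).
from statsmodels.stats.power import TTestPower, TTestIndPower

effect_size = 0.4   # hypothesised difference, in standard-deviation units
alpha = 0.05        # decision criterion
power = 0.80        # desired probability of detecting a real effect

# Within-subjects: each subject sees every condition (paired t-test).
n_within = TTestPower().solve_power(effect_size=effect_size, alpha=alpha, power=power)

# Between-subjects: separate groups per condition (independent t-test, per group).
n_between = TTestIndPower().solve_power(effect_size=effect_size, alpha=alpha, power=power)

print(f"Within-subjects design:  ~{n_within:.0f} subjects")
print(f"Between-subjects design: ~{n_between:.0f} subjects per group")
```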

3.2. The Need for Proper Experimental Design

The usefulness of validation relies on its ability to provide meaningful results and conclusions. This can only be so, however, if a validation exercise is properly designed. In addition to choosing the correct type and number of test subjects (as discussed above), there is a strong need to design the evaluation in such a way that uncertainty can be removed from the results. This includes such factors as:


• the effect of confounding factors: are there, for instance, time-of-day or other extraneous factors that can systematically influence the results (are all ASAS trials conducted in the mornings, and all conventional trials conducted immediately after lunch?);

• the importance of counterbalancing: it is conceivable that the order in which a series of scenarios is presented can influence a test subject's performance in unexpected ways. Counterbalancing is one means (others include randomising the order of conditions across test subjects) to avoid these pitfalls (see the sketch after this list);

• the value of a control group: as a means of quantifying the effect observed in an evaluation, a control group (operating under, for example, "baseline" conditions) can allow a comparison.
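As a minimal sketch of the counterbalancing idea mentioned above, a Latin square assigns scenario orders so that each scenario appears in each serial position equally often across subjects; the scenario names are hypothetical:

```python
# Sketch: a simple Latin-square ordering, so that every scenario appears in
# every serial position equally often across test subjects (illustrative only).
def latin_square_orders(scenarios):
    n = len(scenarios)
    return [[scenarios[(row + col) % n] for col in range(n)] for row in range(n)]

# Hypothetical scenario names, for illustration only.
scenarios = ["baseline", "ASAS spacing", "ASAS crossing", "self-separation"]
for subject, order in enumerate(latin_square_orders(scenarios), start=1):
    print(f"subject {subject}: {' -> '.join(order)}")
```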

3.3. The Need for Appropriate Analytical Techniques

The dubious statistical claims sometimes associated with advertising (what exactly does "50% cleaner" clothing look like?) underscore the potential misuse of statistical analysis. The most basic category of statistics is termed "descriptive statistics". These include the familiar notions of the arithmetic mean and median, for example: quantitative expressions that describe the whole of some data set in a meaningful and simplifying way. At another level of sophistication are the inferential techniques, which attempt to determine whether differences exist between two or more samples of data.

There are often several factors that conspire to complicate the use of inferential statistics in validation. First is the relatively small pool of available test subjects, be they controllers or pilots. Closely related is the issue that most "effect sizes" seen in ATM trials tend to be small. For instance, a new CDTI display, or a controller automation tool, is unlikely to decrease measured workload by 50%. Given the evolutionary nature of ATM developments, increments tend to be much smaller. As a result, small effect sizes mean that a huge number of test subjects would be required to enable "significant" results to be found between validation conditions.

Statistics themselves, in particular inferential statistics, represent something of a double-edged sword in validation exercises. Although they can bolster validation results, there is sometimes, in validation and experimentation, an over-reliance on the search for statistical significance in results. For this reason it is critical to distinguish operational, or practical, significance from statistical significance in validation. As an example, imagine a new display were capable of decreasing controller workload by three percent. This result might be dismissed by some as irrelevant if not statistically significant. If, on the other hand, a direct link could be made to safety, few would question the operational significance of a three percent reduction in accident rate.
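To make the distinction concrete, the following sketch (with entirely hypothetical workload ratings) computes descriptive statistics, an inferential test, and an effect size; a result can be statistically significant yet operationally trivial, or vice versa:

```python
# Sketch: descriptive statistics, an inferential test, and an effect size,
# computed on entirely hypothetical workload ratings from ten controllers.
import numpy as np
from scipy import stats

baseline = np.array([62, 70, 55, 68, 74, 59, 66, 71, 63, 69], dtype=float)
asas     = np.array([60, 69, 54, 65, 70, 58, 66, 68, 61, 66], dtype=float)

# Descriptive statistics summarise the samples.
print("means:", baseline.mean(), asas.mean())
print("medians:", np.median(baseline), np.median(asas))

# Inferential statistics ask whether the difference is likely to be real.
t, p = stats.ttest_rel(baseline, asas)
print(f"paired t = {t:.2f}, p = {p:.3f}")

# Effect size (Cohen's d for paired data): how big the difference is,
# independent of sample size. A "significant" but tiny d may still be
# operationally irrelevant, and vice versa.
diff = baseline - asas
d = diff.mean() / diff.std(ddof=1)
print(f"Cohen's d = {d:.2f}")
```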

3.4. Measurement Issues: Summary

A number of potential pitfalls exist in statistical analysis and experimental design. It is not the intention of this section to offer a comprehensive (or even cursory) introduction to the use of statistics and experimental design in validation. Nor could this section ever replace expertise in the areas of human factors, validation, statistical analysis and experimental design. Rather, it is intended that this section should sensitise the reader to some of the potential issues involved. Some of the other potential validation measurement pitfalls include the following:

• Proper use and interpretation of correlation as a test technique. Beginning statistics students are often cautioned that "correlation does not mean causation". Nonetheless, correlation is often misinterpreted in terms of cause and effect. As an extreme example, a recent research study in Canada concluded that alcohol usage and management level were highly positively correlated. It is unlikely, though, that the former causes the latter (a simulated illustration follows this list);

• An understanding of the notion of "statistical power" (that is, the ability of a statistical test to find significant differences when in fact they exist) is essential in setting an appropriate decision criterion;

• The need to balance the (sometimes conflicting) goals of validation realism and data collection richness. Given the generally high costs of validation, there is sometimes a strong motivation on the part of researchers to maximise data collection possibilities. This runs the risk of interfering with the validation itself (a case of the measures corrupting the operations). It is generally better to hone the goals of validation (i.e. limit its scope) than to run the risk of a "fishing expedition" in which a mountain of data is impossible to reduce to a meaningful set of conclusions.
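The correlation pitfall in the first bullet can be illustrated with synthetic data: in the sketch below (all values simulated), a hidden confounder drives two measures that have no causal link to one another:

```python
# Sketch: a confounding variable can produce a strong correlation between
# two measures that do not cause one another (synthetic data, illustrative only).
import numpy as np

rng = np.random.default_rng(1)
traffic_load = rng.uniform(10, 40, size=200)   # hidden confounder

# Both "measures" are driven by traffic load, not by each other.
rt_comms   = 2.0 * traffic_load + rng.normal(0, 5, 200)     # R/T messages per hour
heart_rate = 60 + 0.8 * traffic_load + rng.normal(0, 3, 200)

r = np.corrcoef(rt_comms, heart_rate)[0, 1]
print(f"correlation between R/T activity and heart rate: r = {r:.2f}")
# r is high, yet reducing R/T communications would not lower heart rate:
# the traffic load drives both.
```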

4. HUMAN PERFORMANCE METRICS

Again, this WP3 effort identified the following seven major human performance areas as those most relevant to ASAS:

• Workload
• Situation awareness
• System monitoring and error detection
• Teamwork
• Trust
• Usability and user acceptance
• Human error

For each of the seven areas, there are a number of theories concerning the underlying mechanisms, and operational significance, of the results. Each of the following sub-sections provides a brief introduction to each human performance area, including a theoretical background, a survey of the most common metrics, and empirical evidence surrounding their use. Each of the following seven sections provides a tabular overview of the most common measurement techniques and metrics (for a more complete list, see Appendix A). Notice that these tables are not meant to be exhaustive. There are many other metrics available (and new ones being created almost daily). However, those selected here should represent the most theoretically and empirically supported metrics currently in use.

The following sections also provide an assessment of each metric, using a set of evaluative criteria (intrusiveness, equipment cost, etc.).

Notice that various techniques are available for gathering each metric. For instance, pupil diameter can be recorded using an expensive eye tracker, or it can be (but rarely is) estimated using a much lower-tech approach, with volunteer graduate students and video recording. It is therefore difficult to provide hard and fast estimates for each of the evaluative criteria. The provided estimates are based on the assumption that the most typical methods will be applied.

4.1. Workload


Interest in defining and developing metrics of workload has grown dramatically since the mid-1970s (Sanders & McCormick, 1987). Workload has become a critical criterion in validation and certification, from the case of the two-person flight deck to the newest forms of ATC automation. There is ongoing debate in the theoretical research community about the definition of workload (a debate that is also reflected in the variety of means available to assess workload). Nonetheless, there is general agreement that mental workload is not a unitary but a multi-dimensional concept that taps both the difficulty of a task and the effort (both physical and mental) brought to bear.

Inherent in the notion of mental workload has been the concept that the human operator has a limited capacity to process information. Information processing models of the 1950s grew out of the field of communications engineering. One of the most compelling and empirically supported views to emerge since then models human attention as an amalgamation of multiple specific resources. As shown in the accompanying figure (after Wickens, 1980), tasks differ on the basis of the demands they place, in terms of: modality of input (visual versus auditory); input code (spatial versus verbal); stage of processing (encoding/central/response); and response type (manual versus vocal).

[Figure: three-dimensional multiple resource model, with axes for stages of processing (encoding, central processing, responding), input modalities (auditory, visual), codes (spatial, verbal), and responses (manual, vocal).]

Figure 4.1: The multiple resource model of human information processing (after Wickens, 1992).

4.1.1. Techniques for Measuring Mental Workload

Workload measurement techniques are typically categorized as either physiological, subjective, or performance-based.2 Within each category, there are a number of specific indices available to the researcher. The relevant criteria for judging the usefulness of various workload indices are generally agreed to be: sensitivity, diagnosticity, cost, operator acceptance, implementation requirements, reliability, and intrusiveness (Meshkati, Hancock & Rahimi, 1990; Kramer, 1991). Rarely, however, will any one mental workload index meet all of these criteria (Wickens, 1984). Each of the three categories of workload measurement techniques is discussed below.

2 A fourth method, workload modelling, is used during system development to predict workload imposed by future systems. The British PUMA system (Houselander & Owens, 1995) is one example.

4.1.2. Physiological metrics of workload

The use of physiological measures rests on the assumption that changes in workload cause measurable differences in certain (generally involuntary) physiological processes. One major advantage of (certain) physiological indicators is that they remain available even in the absence of overt behaviour. As a result, they are potentially useful in evaluating workload in validation scenarios in which response demands are low (as with operation within a highly automated cockpit). Some of the better-known physiological workload metrics have included: galvanic skin response (GSR), similar to the traditional "lie detector" measure; heart rate and other metrics derived from the electrocardiographic (ECG) pattern; and various aspects of ocular (i.e. eye-related) response.

4.1.3. Subjective metrics of workload

The class of subjective workload assessment methods relies on the operator's self-reported effort in carrying out some task(s).3 Their chief appeal has been: cost and ease of use (they are often administered in paper-and-pencil form); potentially high face validity with the operational community; and ease of analysis. Perhaps their greatest drawback, however, has been the potential bias inherent in subjective reports. Test subjects might be unable (through, for instance, memory limitations or inaccurate self-perceptions) or unwilling (through a desire to tell the researcher "what he wants to hear") to accurately self-report. There is empirical evidence that subjective reports have certain predictable patterns of bias. Subjective metrics have, nonetheless, remained popular for research, development, and validation. In addition to the potential benefits listed above, there is another important factor to consider: as ATM systems and tools grow "smarter" through, for example, advances in microprocessor technology, they are increasingly playing the role of intelligent partners. As such, it is necessary for the eventual operator – be it a pilot or controller – to accept the new system, if it is to succeed. For this reason, subjective acceptance is likely to grow more critical in future validation efforts.

A number of subjective techniques have been used, some of the better-known ones including: the Cooper-Harper scale, an ordinal rating scale originally designed for assessment of flight control skills, and later modified (Wierwille & Casali, 1983) to accommodate a broader range of task load scenarios; the Subjective Workload Assessment Technique (SWAT); and the NASA Task Load Index (TLX), a multi-dimensional scale tapping six broad dimensions of workload, such as fatigue and physical effort (Hart & Staveland, 1988). One subjective instrument, the Air Traffic Workload Input Technique, or ATWIT (Leighbody, Beck & Amato, 1992), has been designed specifically for use with ATC tasks. Finally, it is worth mentioning the Instantaneous Self Assessment (ISA) technique, originally developed at the UK ATC Evaluation Unit, Bournemouth, some years ago. ISA is a unitary (1-to-5 rating) method that, notably, has been incorporated into EUROCONTROL Bretigny's own simulation facility (among others), and allows for online registration of controller ratings at intervals down to two minutes or so.

3 Subjective measures need not be self-reports. Observer (e.g. "over-the-shoulder") ratings are another type of subjective measure, though they are considered distinct for the current discussion.
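As an illustration of the ISA technique described above, the following minimal sketch (not an actual EUROCONTROL implementation) prompts for a 1-to-5 rating every two minutes and logs it with a timestamp:

```python
# Sketch: a minimal electronic ISA logger. Every two minutes (the interval
# cited in the text) the controller is prompted for a 1-5 workload rating,
# which is stored with a timestamp for later alignment with the scenario.
import csv
import time

ISA_INTERVAL_S = 120  # two minutes

def run_isa(log_path="isa_log.csv", n_prompts=5):
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "rating"])
        for _ in range(n_prompts):
            rating = input("ISA workload rating (1=very low ... 5=very high): ")
            if rating in {"1", "2", "3", "4", "5"}:
                writer.writerow([time.strftime("%H:%M:%S"), rating])
            time.sleep(ISA_INTERVAL_S)

if __name__ == "__main__":
    run_isa()
```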

4.1.4. Performance metrics of workload

Performance measures attempt to infer workload through direct measurement of task performance. Though some slight differences of terminology exist, this class is generally agreed to encompass two types of methods: primary task and secondary task measures. Both methods rely on measuring the influence of increasing task load on the performance of some task. Primary task methods directly measure performance on the task under consideration, and focus on such parameters as inter-stimulus interval (ISI), control complexity and number of information sources. The primary task technique typically involves varying some primary task parameter (e.g., tracking complexity) that will affect task demands to the point that performance falls below some criterion, thereby providing a measure of residual capacity at resource allocations below criterion performance. Primary task measures have the advantage that they can directly relate workload to system performance. Several shortcomings of performance metrics have been noted over the years (cf. Wickens, 1984). For instance, performance is very dependent on strategies (e.g. priority shifts among tasks) as well as the task combination used. Further, it can often be intrusive and artificial in an operationally realistic validation setting to use many of the better-known performance measures (e.g. mental arithmetic). A possible solution, which has to be judged on a case-by-case basis, is the inclusion of realistic "embedded tasks": tasks that provide proxy measures of workload, while appearing part of the pilot's or controller's normal repertoire (Raby & Wickens, 1990).

In summary, it is safest to conclude that there are relative strengths and weaknesses associated with each workload measurement method. The appropriate method(s) for a given validation exercise will vary with the higher-level goals, as well as the scenario. It is, of course, for exactly this reason that the current WP3 effort was undertaken. The following table presents the workload metrics that have been commonly used over the years, and which seem appropriate for validation exercises.

4.1.5. Taskload measures

The human factors community often makes a distinction between taskload (the task demands imposed upon an operator) and workload (the operator's subjective response). Workload is generally seen as more than taskload, and incorporates such influences as time pressure, stress, training, expectations, strategies, etc. The constructs of taskload and workload are sometimes seen as analogous to stress and strain in the physical world, in which one results from the other. Although there is some disagreement within the research community about the degree of inference that taskload measures permit, they have traditionally been important sources of workload-related data. Some of the typical taskload measures have included: radio communication bandwidth and duration (Stein, 1992); number of aircraft under control (Hurst & Rose, 1978); and number of flight transitions (i.e., overflights versus altitude transitions) (Cardosi & Murphy, 1995).
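To illustrate how such taskload measures might be derived in practice, the sketch below computes R/T occupancy and aircraft-count statistics from a small, entirely hypothetical event log:

```python
# Sketch: deriving simple taskload measures of the kind listed above from a
# (hypothetical) simulation event log: R/T occupancy and aircraft count.
from datetime import datetime

# (start, end) of each radio transmission, and per-minute aircraft counts,
# as they might be exported from a real-time simulation platform.
rt_calls = [("10:00:05", "10:00:12"), ("10:01:30", "10:01:41"), ("10:03:02", "10:03:08")]
aircraft_on_freq = [8, 9, 11, 10, 12]   # sampled once per minute

fmt = "%H:%M:%S"
rt_seconds = sum(
    (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    for start, end in rt_calls
)

print(f"R/T occupancy: {rt_seconds:.0f} s of transmission time")
print(f"Mean aircraft under control: {sum(aircraft_on_freq) / len(aircraft_on_freq):.1f}")
print(f"Peak aircraft under control: {max(aircraft_on_freq)}")
```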


Method of Measurement: Heart rate derived metrics (heart rate, heart rate variability)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd*; Intrusive: Med; Cost of Equipment: High; Reliability: High; Validity: Med; Expertise Required: High; Resource Intensity: High
Description: Temporal resolution poorer (roughly 3 minutes) but lower cost than some other physiological measures; potentially subject to data artefacts (e.g. physical effort levels); high expertise required to measure and analyse (Gopher & Donchin, 1986). * Motion artefacts are more likely in the flightdeck setting.

Method of Measurement: Pupil diameter
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Ground**; Intrusive: Med-High; Cost of Equipment: High; Reliability: High; Validity: Med*; Expertise Required: High; Resource Intensity: High
Description: Good temporal resolution, but also costly in terms of expertise. Equipment for eye tracking is expensive and generally not portable; ocular measures are currently quite intrusive. Subject to light (e.g. probably not suitable for use in daylight cockpit settings) and other artefacts (Beatty, 1986). * Subject to data artefacts. ** Less commonly used in flight.

Method of Measurement: Blink rate
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Ground**; Intrusive: Med*; Cost of Equipment: High*; Reliability: High; Validity: Med; Expertise Required: High; Resource Intensity: High
Description: As above, plus poor temporal resolution (Kramer, 1991). * Low-cost video-based alternatives are available. ** Less commonly used in flight.

Method of Measurement: Other eye tracking measures
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Ground**; Intrusive: Med; Cost of Equipment: High; Reliability: High; Validity: Med; Expertise Required: High; Resource Intensity: High
Description: Include fixation frequency, average dwell time, and scanning entropy (i.e. randomness). Derived from the same eye tracking record as the above metrics, with the same disadvantages (required expertise) (Gopher & Donchin, 1986). ** Less commonly used in flight.

Method of Measurement: NASA TLX
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low/Med; Cost of Equipment: Low; Reliability: High; Validity: Med*; Expertise Required: Low; Resource Intensity: Low
Description: One of many accepted survey techniques (Hart & Staveland, 1988). * Face validity is very high.

Method of Measurement: Instantaneous Self Assessment (ISA)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low/Med; Cost of Equipment: Low/Med; Reliability: High; Validity: Med; Expertise Required: Low; Resource Intensity: Low
Description: Simple to use and score; already incorporated within EUROCONTROL's simulation platforms. Costs can be low (paper and pencil) or medium (electronic system).

Method of Measurement: Bedford Workload Scale
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low/Med; Cost of Equipment: Low; Reliability: High; Validity: Med; Expertise Required: Low; Resource Intensity: Low
Description: A subjective technique for pilots. Well accepted by flightcrews, but some practice with the scale might be necessary (Roscoe, 1984; Gawron, 2000).

Method of Measurement: Cooper-Harper Rating Scale
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Med; Cost of Equipment: Low; Reliability: High; Validity: Med; Expertise Required: High; Resource Intensity: High
Description: Decision tree method, used for assessing aircraft handling qualities. Only appropriate if aircraft handling difficulty is a major determinant of workload.

Method of Measurement: ATWIT (Air Traffic Workload Input Technique)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Ground; Intrusive: Low/Med; Cost of Equipment: Low/Med; Reliability: High; Validity: Med; Expertise Required: Low; Resource Intensity: Low
Description: Specifically for ATC. Periodic rating technique, similar to ISA, typically administered as a 10-point scale every five minutes. Developed and used by the FAA (Stein, 1985).

Method of Measurement: Subjective Workload Assessment Technique (SWAT)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low/Med; Cost of Equipment: Low/Med; Reliability: High; Validity: Med*; Expertise Required: Low/Med; Resource Intensity: Low/Med
Description: Like TLX, is multidimensional (though with only three dimensions: time, mental effort, psychological stress). More intrusive than ISA. Well researched, generally well regarded, and applied in many settings. The card sorting procedure (can also be electronically administered) is time consuming and intrusive. * Good for tracking tasks.

Method of Measurement: PUMA
Criterion estimates: Validation Technique: Analytic; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: Med; Expertise Required: High; Resource Intensity: High
Description: Analytic technique to assess scenario taskload, based on task types, timing and interactions.

Method of Measurement: Embedded (secondary) tasks
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Med/High; Cost of Equipment: Med; Reliability: High; Validity: High; Expertise Required: Med; Resource Intensity: Med
Description: A natural but secondary component of an operator's entire task set; embedded task performance (e.g. responding to ATC calls) is used as an indirect indication of the demands of a primary task (e.g. tracking a flightpath) (Hilburn, 2000).

Method of Measurement: Brain evoked potentials
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Ground**; Intrusive: High; Cost of Equipment: High; Reliability: High; Validity: High*; Expertise Required: High; Resource Intensity: High
Description: Good for assessing workload in situations involving no physical response. Expensive and demanding in terms of expertise (Kramer, 1991). * Many trials needed. ** Some work has been done collecting EEG in flight.

Table 4.1: Some Common Workload Metrics
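To make the scoring of one of the tabled instruments concrete, the following sketch (not from the original deliverable) computes the standard NASA TLX weighted workload score described by Hart & Staveland (1988); the ratings and pairwise-comparison weights are hypothetical:

```python
# Sketch: the standard NASA TLX weighted workload score. Each of the six
# subscales is rated 0-100; weights (0-5) come from 15 pairwise comparisons
# of the subscales. The ratings and weights below are hypothetical.
ratings = {"mental": 70, "physical": 20, "temporal": 65,
           "performance": 40, "effort": 60, "frustration": 35}
weights = {"mental": 5, "physical": 0, "temporal": 4,
           "performance": 2, "effort": 3, "frustration": 1}

assert sum(weights.values()) == 15  # one tally per pairwise comparison

overall = sum(ratings[k] * weights[k] for k in ratings) / 15
print(f"Overall weighted TLX workload: {overall:.1f} (0-100)")
```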

4.2. Situation Awareness

Situation Awareness (SA) is a currently popular concept used to describe an operator's comprehension of complex and dynamically changing system dynamics. The term grew out of the tactical fighter domain, in which a pilot's ability to continuously comprehend various system characteristics (vehicle energy state, opponent's location, etc.) could mean the difference between life and death. The prominent role of SA in current human factors research is due, in large part, to the increasingly cognitive nature of human-machine tasks (Durso & Gronlund, 1999). Notice that the literature on human performance generally distinguishes between SA and the construct of system monitoring, as discussed later. Although there is some clear potential overlap between the two (indeed, the same can be said of all the identified human performance areas), the latter generally concerns detection of, and initial response to, some discrete non-nominal state (e.g. a dial suddenly out of range). The notion of SA extends far beyond this, to encompass the operator's understanding of complex system states, and ability to predict future system behaviour. Nonetheless, some of the behavioural techniques used to assess SA might use typical monitoring measures (e.g. response time, hit rate, false alarm rate).

The current lack of a consensus definition of SA is reflected in the range of measurement techniques used to assess it (Hilburn, 1996). Techniques have ranged from self-report measures, to over-the-shoulder evaluations, to various "screen-freeze" procedures (e.g. requiring an air traffic controller to reconstruct the traffic pattern from memory (Sollenberger & Stein, 1995)), to the use of physiological measures such as P300 (Endsley, 1995) or eye point of gaze (Smolensky, 1993; Durso et al., 1995). Over the years, various classification schemes have been proposed for the various SA measurement techniques (Sarter & Woods, 1995; Wickens, 1996). Durso & Gronlund (1999) recently reviewed the available SA measurement techniques, and distinguished the following three popular types of SA measures:

• Subjective methods – such as the self-report SART (Taylor, 1990);
• Query methods – either with the original situation absent (e.g. SAGAT (Endsley, 1990)), or with the situation continuously present;
• Performance methods – similar to workload measures, operationally relevant embedded tasks can provide implicit measures of SA.

As with measures of "mental workload" several years ago, current debates over how to assess SA often fall back on a core set of evaluative criteria (sensitivity, diagnosticity, cost, operator acceptance, implementation requirements, reliability, and intrusiveness). Often these criteria must be weighed against one another. A particular case in point is that of subjective SA measurements, which are subject to the same set of potential benefits (low cost, ease of administration) and drawbacks (memory limitations, susceptibility to demand characteristics) as subjective techniques in general. Using the classification scheme of Durso and Gronlund (1999), query and implicit performance methods could both be roughly termed "objective" techniques, in contrast to the group of subjective techniques. Both objective and subjective SA techniques have relative advantages, as discussed below.

Subjective SA assessments, as with other types of subjective measures, are susceptible to error. Objective measures, on the other hand, are often costly or difficult to administer (e.g. physiological measures), or overly intrusive (e.g. "screen-freeze" query methods). A solution would seem to lie in devising appropriate embedded tasks that allow for naturalistic collection of behavioural data, with a minimum of task interruption. If such data can be recorded as part of the normal (or slightly modified) routine of the operator, acceptance is likely to be much higher. This is particularly critical in more realistic scenarios, such as high-fidelity simulations or the operational environment. Performance measures can side-step many of the theoretical difficulties currently being faced by the research community. This is the starting point for embedded task, or "implicit performance", measures of SA.

Finally, it is important to reiterate that no single technique (or even type of technique) is likely to be appropriate for all research settings. It is argued that "objective" (e.g. response time) and subjective (e.g. self-report) SA measures might tap fundamentally different, yet equally critical, aspects of awareness. Whereas the former might relate to awareness itself, the latter might relate more to "awareness of that awareness", or the individual's own subjective reaction to that awareness. The case in which two SA measures (both valid and reliable) can dissociate highlights an important point: in system development, or any case in which user acceptance is essential, operators' subjective evaluation can influence acceptance and, ultimately, the usefulness of the system. If an air traffic controller, for example, does not (subjectively) recognise the SA benefits of a new tool, he or she will likely devise elaborate means to circumvent its use.

The following table outlines some of the better-known measures of Situation Awareness available for validation purposes.


Method of Measurement: SA Global Assessment Technique (SAGAT)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Med/High; Cost of Equipment: Low; Reliability: High; Validity: Med; Expertise Required: High; Resource Intensity: Med
Description: A query method that involves freezing the simulation, and comparing real and perceived situations. Has been accused of being over-reliant on memory, and subject to bias (self-beliefs) (Endsley, 1988).

Method of Measurement: SA Rating Technique (SART)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Med; Cost of Equipment: Low; Reliability: High; Validity: Med; Expertise Required: Med; Resource Intensity: Low
Description: A subjective technique that uses a questionnaire concerning three dimensions of SA. Easily administered (Taylor, 1990).

Method of Measurement: SA Linked Instances Adapted to Novel Tasks (SALIANT)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Med; Cost of Equipment: Low; Reliability: Med; Validity: Med; Expertise Required: High; Resource Intensity: Med
Description: A performance-based measure specifically for team SA (Muniz et al., 1998).

Method of Measurement: Embedded tasks
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Med/High; Cost of Equipment: Med; Reliability: High; Validity: High; Expertise Required: Med; Resource Intensity: Med
Description: Performance-based measures that derive naturalistically from the task environment itself (Hilburn, 1996), and that can be used to implicitly measure SA.

Method of Measurement: Expert ratings
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High*; Validity: Med*; Expertise Required: High; Resource Intensity: Med
Description: Expert operators observe performance (or a recorded replay of performance), and assess SA. * Subject to rater biases.

Method of Measurement: China Lake Situational Awareness (CLSA)
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Med; Cost of Equipment: Low; Reliability: Med; Validity: Med; Expertise Required: Med; Resource Intensity: Med
Description: Five-point subjective rating scale based on the Bedford workload scale. Requires crew response, and can therefore be intrusive (Adams, 1998).

Table 4.2: Some Common Situation Awareness Metrics
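As an illustration of how a freeze-style query method such as SAGAT might be scored, here is a minimal sketch; the aircraft data, recalled answers, and scoring tolerances are all hypothetical:

```python
# Sketch: scoring a single SAGAT-style query round. At a simulation freeze the
# controller reports each aircraft's state from memory; answers are compared
# against ground truth. Data and tolerance values are hypothetical.
ground_truth = {"KLM123": {"fl": 330, "heading": 90},
                "BAW456": {"fl": 290, "heading": 270}}
recalled     = {"KLM123": {"fl": 330, "heading": 100},
                "BAW456": {"fl": 280, "heading": 270}}

FL_TOL, HDG_TOL = 5, 15   # scoring tolerances (assumed)

correct = total = 0
for callsign, truth in ground_truth.items():
    answer = recalled.get(callsign, {})
    for param, tol in (("fl", FL_TOL), ("heading", HDG_TOL)):
        total += 1
        if abs(answer.get(param, 1e9) - truth[param]) <= tol:
            correct += 1

print(f"SA query accuracy: {correct}/{total} = {100 * correct / total:.0f}%")
```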

4.3. System Monitoring

Increasing attention is being paid to human monitoring performance in aviation systems. Two factors drive this trend. First is the recognition that humans are, by nature, poor monitors: passive vigilance tasks are not ones to which the human operator is well suited. Second, the increased use of automation, both on the flight deck and at the ATC workstation, means that the presence of such tasks is likely to increase in the future. That is, automation is increasingly forcing the pilot and controller into the role of a passive monitor of automated systems.

The experimental study of vigilance grew out of work during the 1950s on the performance of maritime sonar monitors. Performance on vigilance tasks can be characterised by both the absolute level of vigilance (in terms of, for instance, detection rate), and the degradation in performance that typically accompanies time on task. This second characteristic of vigilance performance, the change in monitoring performance over time, has been termed the vigilance decrement. It has been known for some decades that performance in such settings degrades within the first ten minutes.

The motivation behind this area of study has been, of course, the concern that operators (e.g. pilots or controllers) could miss a critical signal (such as a TCAS Traffic Advisory) with catastrophic consequences. Unfortunately, humans have yet to produce a mechanical device with a reliability of 1.0 (i.e., perfect reliability). However infrequent, failures of automated systems are inevitable. If such failures occur after the operator's performance has degraded during a long vigil (say, several hours into a shift), the ability of the operator to quickly detect and respond to the malfunction will likely be compromised. The increased monitoring demands imposed by automation stem, in part, from the component proliferation that automation invites. Wickens (1984) pointed out that automation of any one function actually increases by at least threefold the number of functions to be monitored: the function itself, the automated system, and the indicator of the automated system are all now possible sources of malfunction. This both increases the probability that any one component will malfunction, and increases the number of devices that must be monitored (Wickens, 1991). For example, a 'GO' light that fails to illuminate indicates that there is a problem with either a sensor, the light bulb, or the system itself. In cases of system malfunction, this increase in the number of components complicates diagnosis. The potential problem of automation-induced monitoring overload is exacerbated by the injudicious use of solid-state microelectronics, which enables an inexpensive and compact means of supplying a wealth of information to the operator. To prevent the human operator from becoming overloaded with information, adequate forethought has to be given during system development to what information the system will display and, equally importantly, what information it will withhold.


Assessing monitoring performance in the validation setting is something of a paradox. Although it is easy to identify the relevant metrics of monitoring performance, setting up validation methods to accurately use them is exceedingly difficult. This stems from the very nature of the task: critical "signals" are very rare (a pilot can go through an entire career without encountering an engine flameout, or a stuck landing gear). For that reason, validation can never present such rare events with sufficient frequency to permit statistical analysis.

Below are the performance measures typically used to assess monitoring performance:

Method of Measurement: Brain evoked potentials
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Ground**; Intrusive: High; Cost of Equipment: High; Reliability: High; Validity: High*; Expertise Required: High; Resource Intensity: High
Description: Determine whether the operator processed a signal. Expensive and demanding in terms of expertise (Kramer, 1991). * Many trials needed. ** Has also been collected in flight (simulations).

Method of Measurement: Reaction (or response) time
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Med*; Reliability: High; Validity: High; Expertise Required: Low; Resource Intensity: Med*
Description: Time (or average time) from onset of a signal to either the first reaction, or to initiation of an appropriate response. Reaction time (RT) can be of several sorts: simple, choice. * Highly task specific.

Method of Measurement: Search time
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Med*; Reliability: High; Validity: High; Expertise Required: Low; Resource Intensity: Med*
Description: Once a signal has been initiated, the time to assess/diagnose/search for the source or nature of the signal. Used with displays and databases, to determine time to retrieve desired information. * Highly task specific.

Method of Measurement: Error detection rate
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Med*; Reliability: High; Validity: High; Expertise Required: Low; Resource Intensity: Med*
Description: Percentage of all errors that were properly identified. * Highly task specific.

Method of Measurement: False alarm count
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Med*; Reliability: High; Validity: High; Expertise Required: Low; Resource Intensity: Med*
Description: The absolute number of false alarms (i.e. incorrect detection of an error when none was present) per session. * Highly task specific.

Method of Measurement: Miss rate
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Med*; Reliability: High; Validity: High; Expertise Required: Low; Resource Intensity: Med*
Description: The percentage of all errors that were not identified. * Highly task specific.

Table 4.3: Some Common System Monitoring Metrics
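The detection metrics above (error detection rate, false alarm count, miss rate) can be combined, in signal detection terms, into a single sensitivity index; the sketch below (not part of the original guidance; the session counts are hypothetical) shows one such computation:

```python
# Sketch: combining the detection metrics above into signal detection theory's
# sensitivity index d'. Counts below are hypothetical session totals.
from scipy.stats import norm

signals, noise_events = 40, 200      # system errors presented; error-free events
hits, false_alarms = 34, 10          # correct detections; spurious detections

hit_rate = hits / signals            # = 1 - miss rate
fa_rate = false_alarms / noise_events

# d' separates true sensitivity from the operator's response bias.
d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
print(f"hit rate {hit_rate:.2f}, miss rate {1 - hit_rate:.2f}, "
      f"false alarm rate {fa_rate:.2f}, d' = {d_prime:.2f}")
```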

4.4. Teamwork

Teamwork is a vital part of air traffic management and is best defined within this context by the EUROCONTROL TRM study group (EATCHIP, 1996) as:

"a group of two or more persons who interact dynamically and interdependently within assigned specific roles, functions and responsibilities. They have to adapt continuously to each other to ensure the establishment of a safe, orderly and expeditious flow of air traffic".

The following table outlines the most accepted measures of teamwork used in ATM settings.


Method of Measurement: Communications coding schemes
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Ground; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: Low; Resource Intensity: Low
Description: Message type, message content and message subject classification coding schemes, intended for on-line recording (Cunningham et al., 2001).

Method of Measurement: Communications rating formats
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: Low; Resource Intensity: Low
Description: Procedures for rating the effectiveness of communications procedures (e.g. conduct of briefings, inquiry/advocacy/assertion, conflict resolution, etc.) (Cunningham et al., 2001).

Method of Measurement: Measurements of shared mental models
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: High; Resource Intensity: Low
Description: Measures of the extent to which team members hold a common shared model of the situation, tasks, fellow team members and their capabilities, etc. (Cunningham et al., 2001).

Method of Measurement: Timeline analysis
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: Low; Resource Intensity: Low
Description: Recording of team actions and communications on a timeline, providing a detailed description of team member interactions, responses to environmental events and sources of errors (Cunningham et al., 2001).

Method of Measurement: Team co-ordination observation
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Checklist-based procedure based on a list of team processes and typical behaviours associated with each. These are ticked off when observed (Cunningham et al., 2001).

Method of Measurement: Behavioural observation scales
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Procedure used to rate the occurrence of teamwork of various kinds by a particular team and its members. A separate scale may be provided for each team process to be rated. The rating scale is normally supported by a definition of each team process and typical behaviours that may be observed (Cunningham et al., 2001).

Method of Measurement: Event-based performance measurement
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Procedure that requires pre-specification of desirable team behaviours to be observed. Normally requires use of pre-specified exercise scenarios and events (Cunningham et al., 2001).

Method of Measurement: Observation-based questionnaire
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Procedure typically requiring observers to rate the performance of a team on a questionnaire-based set of effectiveness and efficiency criteria. The criteria may be organised under various headings that may include team processes (Cunningham et al., 2001).

Method of Measurement: Team self-assessment questionnaire
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Team members themselves observe each other and rate their own efficiency and effectiveness on various criteria in the form of a set of questions (Cunningham et al., 2001).

Method of Measurement: Team Process Quality rating form
Criterion estimates: Validation Technique: RTS; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: High; Resource Intensity: High
Description: (Part of a set) Provides information on how well a team communicates and interacts whilst carrying out observed ATM activities (Cunningham et al., 2001).

Method of Measurement: Team Process Frequency recording form
Criterion estimates: Validation Technique: Analytic; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: Low; Resource Intensity: Low
Description: (Part of a set) Provides a quantitative measure of teamwork, in the form of a log of the observed frequency of occurrence of team-working behaviours (Cunningham et al., 2001).

Method of Measurement: Team Process Questionnaire
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: Low; Resource Intensity: Low
Description: (Part of a set) Designed to collect the views of trial participants on the effect of system automation on team processes (Cunningham et al., 2001).

Table 4.4: Common Teamwork Metrics
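As a small illustration of how an on-line communications coding record of the kind tabled above might be tallied, consider the following sketch; the message categories and the coded log (with hypothetical 'EC'/'PC' speaker labels for executive and planning controller) are illustrative assumptions:

```python
# Sketch: tallying an observer's on-line communications coding record.
# Speaker labels and message-type categories below are hypothetical.
from collections import Counter

# Each entry: (speaker, message type) as coded by an observer during a run.
coded_log = [("EC", "instruction"), ("PC", "coordination"), ("EC", "instruction"),
             ("EC", "information"), ("PC", "coordination"), ("EC", "query")]

by_type = Counter(msg_type for _, msg_type in coded_log)
by_speaker = Counter(speaker for speaker, _ in coded_log)

print("messages by type:   ", dict(by_type))
print("messages by speaker:", dict(by_speaker))
```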

4.5. Trust

Trust is a term that is familiar to all of us in everyday life. We talk of the trust that we have in other people (family, friends, and colleagues), how much we believe what we see or are told (e.g. in a newspaper), or how confident we are that something works properly (e.g. a motor car). Clearly, trust has several meanings, but it can be defined simply as the confidence placed in a person or thing, or more precisely, the degree of belief in the strength, ability, truth or reliability of a person or thing. In the context of complex human-machine systems, Madsen and Gregor (2000) have defined trust as follows:

"Trust is the extent to which a user is confident in, and willing to act on the basis of, the recommendations, actions, and decisions of an artificially intelligent decision aid."

This is a useful definition. However, the term 'artificially intelligent' suggests too strongly that the focus is upon expert systems and related computer systems. Therefore, a term such as 'computer-based tool' is preferred. In the context of ATM, Hopkin (1995) captured the importance of trust as follows:

"… Pilots have to trust controllers to issue instructions that are safe and efficient. Controllers have to trust pilots to implement those instructions correctly. Both have to trust their equipment, their information sources and displays, their communications, and the safety of their procedures and instructions…"

In psychological jargon, trust is an intervening variable because it "intervenes" between particular stimulus conditions and particular behaviours. That is, it is an internal state that cannot be measured directly, but is inferred on the basis of certain observations and measurements. In the context of ASAS, the degree of trust in automation could, theoretically at least, be inferred from objective measures of controller performance (e.g. frequency, accuracy or speed of interaction), if the relationship between these measures and the automation could be unequivocally established.

An intervening variable such as trust can also be measured subjectively, by asking an operator or controller to say simply how they feel. Indeed, the use of subjective rating scales is the most common means of measuring trust. It is important to note that if the origin of the subjective ratings can be modelled, one can convert intervening variables to objective measures.

Trust is a construct composed of several elements or dimensions. The main dimensions identified in the research literature are:
• Predictability
• Dependability
• Faith
• Reliability
• Robustness
• Familiarity
• Understandability
• Explication of intention
• Usefulness
• Competence
• Self-confidence
• Reputation

The following summarises the main techniques used to assess trust in ATM settings.

Method of Measurement: Lee and Moray scale
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Simple 10-point scale to evaluate operators' trust, with a score varying from 1 ("not at all") to 10 ("completely"). Clear analogies to the use of the ISA and NASA TLX workload measures for ATC (Kelly et al., 2001).

Method of Measurement: Muir scales
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Set of rating scales with the poles labelled 'none at all' or 'not at all' on the left, and 'extremely high' on the right. Operators are asked to rate their degree of trust in a number of aspects of a system (Kelly et al., 2001).

Method of Measurement: Human-Computer Trust (HCT) scale
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Consists of five main constructs, each with five sub-items. The five constructs are drawn from an original list of ten trust constructs as having the most predictive validity. It is claimed that the HCT has been empirically shown to be valid and reliable (Kelly et al., 2001).

Method of Measurement: Jian et al trust questionnaire
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: A 12-item trust questionnaire incorporating a 7-point rating scale, where 1 equals 'not at all' and 7 equals 'extremely'. The questionnaire was developed as part of a three-phased experimental study; data from both a questionnaire study and a paired-comparison study were used to construct a multi-dimensional measurement scale for trust (Kelly et al., 2001).

Method of Measurement: Taylor et al 7-point rating scale questionnaire
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: As part of extensive studies on the "human-electronic crew" in the military domain, a 17-item, 7-point rating scale questionnaire was developed to determine operators' views on the timeliness and appropriateness of adaptive computer aiding (Kelly et al., 2001).

Method of Measurement: Controller Acceptance Rating Scale (CARS)
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low
Description: Developed by researchers at the FAA from the earlier Cooper-Harper scale (Cooper and Harper, 1969), but could form the basis of a measure of trust (Kelly et al., 2001).

Method of Measurement: SHAPE Automation Trust Index (SATI)
Criterion estimates: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: NA; Expertise Required: Low; Resource Intensity: Low
Description: Consists of a set of rating scales to measure how much the user trusts the automation in the ATM system they are operating. There are two parts to SATI. Part 1: each day, before starting the simulation runs, the user rates their overall level of trust. Part 2: each day, after finishing the simulation runs, the user rates their strength of feeling about several factors that may contribute to trust, and again rates their overall level of trust (Goillau et al., 2001).

Table 4.5: Common Trust Metrics
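
Most of the scales in Table 4.5 produce small sets of ordinal ratings that need only validating and summarising. The following is a minimal sketch of that reduction step for a Lee and Moray style 10-point administration; the run labels, ratings and function names are illustrative assumptions, not material from this report.

```python
# A minimal sketch (not from the report) of validating and summarising
# Lee and Moray style 10-point trust ratings; run labels, ratings and
# function names are illustrative assumptions.
from statistics import mean, stdev

def validate_rating(value: int) -> int:
    """Check that a rating lies on the 1 ("not at all") to 10 ("completely") scale."""
    if not 1 <= value <= 10:
        raise ValueError(f"trust rating must be 1-10, got {value}")
    return value

# Hypothetical ratings: one value per controller, per simulation run.
ratings = {
    "run_baseline": [7, 8, 6, 7],
    "run_asas": [5, 6, 4, 6],
}

for run, values in ratings.items():
    values = [validate_rating(v) for v in values]
    print(f"{run}: mean trust = {mean(values):.1f}, sd = {stdev(values):.1f}")
```

Because such ratings are ordinal, medians and distributions are often reported alongside means in practice.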


4.6. Usability/Acceptance

The usability and acceptance of an ATC system are fundamental to the safety and efficiency of the aircraft being controlled, as well as to the comfort of the controllers themselves. As far as possible, the system should be instinctive and compatible with everyday interactions. Preece (1997) defines usability as:

  "A measure of the ease with which a system can be learned or used, its safety, effectiveness and efficiency, and the attitude of its users towards it."

Possible methods for measuring usability/acceptance are listed below, along with a brief description of each method.

PUMA
  Criteria: Validation Technique: Analytic; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: High; Reliability: High; Validity: High; Utility: High; Expertise Required: High; Resource Intensity: High.
  Description: The Performance and Usability Modelling technique for ATM toolset. Task analysis and timeline modelling using video recordings. Enables analysis of existing systems, and evaluation of system modifications and their impact on workload.
  Reference: Kilner, Hook, Fearnside and Nicholson.

EMT
  Criteria: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Ground; Intrusive: High; Cost of Equipment: High; Reliability: High; Validity: High; Utility: High; Expertise Required: High; Resource Intensity: High.
  Description: Eye Movement Tracking. Infrared eye-tracking device with data collected in terms of X and Y co-ordinates. Can measure point of gaze, duration of fixations, allocation of visual attention, blink rate, pupil diameter and visual scanning patterns.
  Reference: Wilson and Corlett (1999).

SUMI
  Criteria: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low.
  Description: Software Usability Measurement Inventory. A closed questionnaire consisting of 50 short questions to be answered by users of the software.
  Reference: Kirakowski (1994).

Observation
  Criteria: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: Low; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low.
  Description: May take place in the field or in a simulation. There are a number of techniques for collecting and analysing data. A video recording can be used if more permanent data are required.
  Reference: Wilson and Corlett (1999).

Frequency of use of different tools
  Criteria: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low.
  Description: A frequency count can be recorded to identify which tools are used more often than others, or whether some are not used at all.
  Reference: Wilson and Corlett (1999).

Debriefs
  Criteria: Validation Technique: Survey; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: Low; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low.
  Description: Can be held in a group or on a one-to-one basis, based on a specific incident or a general discussion.

Table 4.6: Common Usability Metrics
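
Of the techniques in Table 4.6, EMT produces the most voluminous raw data: time-stamped X/Y gaze samples that must be reduced to fixations before dwell times or scanning patterns can be reported. The sketch below shows one common reduction approach, a dispersion-threshold pass over the sample stream; the threshold values and the function name are assumptions for illustration, not parameters given in this report.

```python
# Illustrative dispersion-threshold (I-DT style) fixation detection over raw
# eye-tracker samples. Thresholds and data are assumed, not from the report.
def fixations(samples, max_dispersion=30.0, min_duration=0.1):
    """samples: time-ordered list of (t_seconds, x_px, y_px).
    Returns (start_t, end_t) spans where gaze stayed within max_dispersion px."""
    out, start = [], 0
    for end in range(1, len(samples) + 1):
        window = samples[start:end]
        xs = [x for _, x, _ in window]
        ys = [y for _, _, y in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            # Dispersion exceeded: close the candidate fixation (without the
            # newest sample) if it lasted long enough, then restart the window.
            if window[-2][0] - window[0][0] >= min_duration:
                out.append((window[0][0], window[-2][0]))
            start = end - 1
    if samples and samples[-1][0] - samples[start][0] >= min_duration:
        out.append((samples[start][0], samples[-1][0]))
    return out

samples = [(0.00, 100, 100), (0.05, 102, 101), (0.10, 101, 99),
           (0.15, 300, 250), (0.20, 302, 251), (0.25, 301, 249)]
print(fixations(samples))  # two dwell spans, one around each gaze cluster
```

Fixation spans produced this way feed directly into the dwell-time and scanning-pattern metrics listed above.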

4.7. Human Error

In a high-responsibility, potentially stressful environment such as air traffic control, cases of human error, as well as system error, are bound to occur however much is done to prevent them. Nevertheless, understanding the causes of human error through measurement can help highlight problem areas to which more attention can be directed.

The impact of human error is, in fact, difficult to define in a way that is acceptable to practitioners within different fields of expertise, but the following simple definition is used in this paper:

  "any action or inaction that actually or potentially leads to negative system consequences, where more than one possible action is available".

This definition emphasises both action and omission, and so includes failures of perception and attention, memory, decision-making processes, and response execution. The issue of definition may seem more important to regulatory authorities than to human factors specialists, for whom the apportionment of blame carries little significance. However, the definition of human error has a profound effect on the identification and classification of error.

In reality, controller and pilot error tends to be multi-causal. Error can be a function of selection, training and experience. More importantly for this paper, however, the quality of the equipment provided and of the working environment has a major impact on the likelihood of human error in operation. It is often difficult to separate 'human', 'system' and 'design' errors, especially errors associated with Human-Machine Interaction (HMI). A controller may give incorrect information to an aircraft, but if the source information provided by the HMI is vague or ambiguous, one may question whether it is appropriate to call this a human error. Still, in the UK it is currently difficult to identify a significant proportion of Aircraft Proximity (AIRPROX) incidents that are attributable to flaws in ATM equipment; this is reinforced by official investigations of UK incidents involving ATM. Further, the current HMIs used in ATM are simple and 'direct'. With the introduction of tools and automation such as ASAS, HMIs are less likely to remain so simple and direct. The following table outlines the main methods used to assess human error in ATM settings.


TRACEr
  Criteria: Validation Technique: Analytic; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: High; Expertise Required: High; Resource Intensity: Low.
  Description: Technique for the Retrospective Analysis of Cognitive Errors in ATM. TRACEr is a hierarchical human error classification system. Contextual and error information (in increasing levels of detail) are categorised using a series of decision flow diagrams and look-up tables, to aid the user in classifying specific aspects of human error.
  Reference: Shorrock (2000).

TRACEr LITE
  Criteria: Validation Technique: Analytic; Subjective/Objective: Subjective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: NA; Validity: NA; Utility: High; Expertise Required: Low; Resource Intensity: High (predictive), Low (retrospective).
  Description: A reduced-scope version of TRACEr that also allows predictive analysis, to identify potential errors within new systems before implementation. It retains the validity of TRACEr but has increased efficiency and requires less training and time to use. Unlike TRACEr, it allows pilot error to be classified more fully; analysing controller error, however, remains the prime objective and focus.
  Reference: Shorrock (2000).

THERP
  Criteria: Validation Technique: Analytic; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: Low; Expertise Required: Low; Resource Intensity: High.
  Description: Technique for Human Error Rate Prediction. Helps create behavioural taxonomies, error taxonomies and human-error data tables. During the task analysis procedure, THERP can be used to determine the error modes. It is a simple tool to use, and if the whole method is implemented THERP has the potential to be powerful and effective, enabling the identification of all errors within a system.
  Reference: Swain and Guttman (1983).

THEA
  Criteria: Validation Technique: Survey; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: High; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low.
  Description: Technique for Human Error Analysis. A technique designed to help anticipate interaction failures. It employs a cognitive error analysis based on an underlying model of human information processing, and is intended for use early in the development lifecycle.
  Reference: Pocock, Harrison, Wright and Johnson (2001).

Observation
  Criteria: Validation Technique: RTS; Subjective/Objective: Objective; Perspective: Air/Gnd; Intrusive: Low; Cost of Equipment: Low; Reliability: Low; Validity: High; Utility: High; Expertise Required: Low; Resource Intensity: Low.
  Description: May take place in the field or in a simulation. There are a number of techniques for collecting and analysing data. A video recording can be used if more permanent data are required.
  Reference: Wilson and Corlett (1999).

Table 4.7: Common Human Error Metrics
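
To illustrate the quantitative end of such analyses: a THERP-style assessment ultimately combines nominal human error probabilities (HEPs) for individual task steps into a task-level failure probability. The sketch below shows only that combination step, under an independence assumption; the step names and probability values are invented for illustration and are not taken from THERP's published tables.

```python
# Hypothetical per-step nominal error probabilities (not THERP's real tables).
nominal_hep = {
    "read_label":   0.003,
    "select_value": 0.001,
    "confirm":      0.0005,
}

def task_failure_probability(steps, stress_factor=1.0):
    """P(at least one step fails), with an optional performance shaping factor
    that scales every step's nominal HEP (capped at 1.0)."""
    p_success = 1.0
    for step in steps:
        p_err = min(1.0, nominal_hep[step] * stress_factor)
        p_success *= (1.0 - p_err)
    return 1.0 - p_success

print(task_failure_probability(["read_label", "select_value", "confirm"]))
print(task_failure_probability(["read_label", "select_value", "confirm"],
                               stress_factor=5))
```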


5. APPLICATION OF FRAMEWORK IN TWO SELECTED SCENARIOS

This section demonstrates the application of the framework by providing two worked examples: one based on the Full Delegation (Self-Separation) application, the second on Time-Based Sequencing in Approach.

The following abbreviated descriptions demonstrate the applications of time-based sequencing in approach and full delegation (self-separation). Each description consists of a brief explanation of the operational concept, together with a hypothetical set of simulation design factors (e.g. who are the test subjects? at what stage is the validation?). The description then runs through the preliminary criteria needed to arrive at a set of candidate measures, and a hypothetical example of how evaluative criteria (e.g. cost, intrusiveness) might be considered. The following are intended as illustrative examples (based on typical circumstances), not complete procedures for how to apply the framework. For instance, a particular validation exercise might have specific hypotheses that differ from those presented here.

5.1 TIME-BASED SEQUENCING IN APPROACH

The following example describes the process by which, given a specific ASAS application (time-based sequencing in approach, which falls out of the Airborne Separation category of ASAS applications), a human performance metric can be chosen. Since this selection process cannot occur in a vacuum, certain other background factors are hypothesised. For simplicity, this illustrative example depicts in detail only the process by which a single human performance area (namely, workload) is addressed; the other two areas that should be considered for this ASAS application category, human error and situation awareness, are presented more briefly.

Operational Concept: This application belongs to the Airborne Separation category of ASAS applications, as defined in PO-ASAS. The application applies to time-based sequencing and merging operations within the Extended TMA (ETMA) and TMA, from the Top of Descent until the Final Approach Fix. Within this, the scenario has the following assumptions:

• Time is used as the separation criterion;
• The flight crew has been delegated limited separation responsibility from a designated aircraft, respecting airborne separation minima;
• The controller is no longer responsible for the separation between these aircraft;
• The airborne separation minima may be lower than the ATC separation minima;
• There is a mixed level of ADS-B equipage.

The airspace consists of controlled airspace around a clutch of busy airports with multiple conflicting SIDs and STARs. Descending and ascending traffic is simulated, in different atmospheric conditions. Inbound traffic passes through several sectors during descent. Multiple levels of traffic density, complexity and conflict base rate are examined.

A mixture of aircraft types (representing light, medium and heavy aircraft and a variety of performance categories) is simulated in all combinations. Aircraft in the trial are ASAS and CDTI equipped; the impact of non-equipped aircraft in the traffic will not be considered in this trial. Scenarios are generally in agreement with western European traffic composition for TMA flight. ICAO wake vortex minima are in place. Latest-generation TCAS is assumed, as are the additional functionalities of ADS-B and CD&R.


Standard BRNAV capabilities are augmented with GNSS, FMS, 4D RNAV RNP1, a flight data system, and automatic management of separation (connected with the FGCS).

Selecting a Validation Technique: As shown in Table 2.24a, the three key human performance areas associated with the Airborne Separation category of ASAS applications are workload, human error, and situation awareness. From Table 2.24b we see that real-time simulation best satisfies the need to assess these three human performance areas (it is quite relevant for the assessment of workload and situation awareness).

Real-time simulation is used, with trained ATP crews required to perform approaches from top-of-descent to a designated STAR waypoint. ATC communication is specifically not included. Crews perform under both current-day and ASAS approach conditions. Experimental variables include traffic density, traffic complexity, and conflict base rate. The chief validation question is whether the in-descent separation task raises flight-deck workload above that of the current day.
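
Experimental variables of this kind are typically crossed into a full-factorial run matrix before repetition and counterbalancing are decided. A brief sketch of that crossing, with assumed level labels (none of the values below come from the report):

```python
# Cross the experimental variables into a full-factorial condition matrix.
from itertools import product

conditions     = ["current day", "ASAS"]
densities      = ["low", "medium", "high"]
complexities   = ["low", "high"]
conflict_rates = [0.5, 1.0, 2.0]   # hypothetical conflicts per 10 minutes

runs = [
    {"condition": c, "density": d, "complexity": x, "conflict_rate": r}
    for c, d, x, r in product(conditions, densities, complexities, conflict_rates)
]
print(len(runs))  # 2 * 3 * 2 * 3 = 36 cells before repetition/counterbalancing
```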

Validation hypothesis: The chief characteristic of approach separation is the time-critical and busy nature of this phase of flight. For this reason, one significant research question is whether ASAS in this scenario would be likely to lead to higher transient workload. That is, are there times at which ASAS would result in excessive workload during the descent profile?

Metric selection: The scenario requires a workload metric suitable for the pilot's perspective in a real-time simulation (as validation technique). The chief evaluative criteria are cost (this is an early validation exercise), intrusiveness (this concerns the demanding approach phase of flight), and time resolution (again, the concern is short-term workload peaks). There are various subjective and objective measures available. This is a first exploratory study, in which the validation practitioner wants to establish roughly whether ASAS might increase workload. Further, there is concern about transient workload peaks, i.e. that workload might occasionally become excessive.

Because cost is a driving criterion, the subjective measures are considered first: the NASA TLX, Instantaneous Self Assessment (ISA), and the Bedford scale. The Subjective Workload Assessment Technique (SWAT) is ruled out on the basis of its potential intrusiveness. Of the performance-based measures, secondary tasks are ruled out, since adding tasks seems too intrusive; embedded tasks (e.g. flight-path tracking RMS error) remain an option if a task can be identified that will provide sufficiently rich data. Because time resolution is important (again, there is reason to suspect transient workload peaks), two physiological techniques are also identified: evoked EEG potentials (brain waves) and pupil diameter. Two criteria, however, argue against the use of evoked potentials: first, they are too intrusive; second, they cannot be collected often enough. Although they have sufficient time resolution in single trials (at the millisecond level), they are best used to probe responses to infrequent events.

Although they offer potentially high temporal resolution (as fine as tenths of a second), the physiological measures are costly, intrusive, and require great care in data collection and validation design; they are therefore ruled out as impractical. All three of the candidate subjective measures are attractive in terms of cost. Of the three, the multivariate NASA TLX and the Bedford workload scales are ruled out as too intrusive (it is time consuming to complete multiple scales). Given its low data-collection demands on pilots (who can call out, note, or indicate their ratings via push-button), the ISA has the advantage of being collectable every few minutes. In the end, the team settles on the use of ISA ratings.
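
The ruling-in and ruling-out just described is essentially a filter over the candidate metrics by evaluative criteria. A minimal sketch of that logic, with hypothetical criterion scores standing in for the tables of chapter 4:

```python
# Hypothetical criterion scores per candidate metric (illustrative only;
# the authoritative values are in the chapter 4 tables).
candidates = {
    "ISA":            {"cost": "low",  "intrusiveness": "low",  "time_resolution": "minutes"},
    "NASA TLX":       {"cost": "low",  "intrusiveness": "high", "time_resolution": "per run"},
    "Bedford":        {"cost": "low",  "intrusiveness": "high", "time_resolution": "per run"},
    "SWAT":           {"cost": "low",  "intrusiveness": "high", "time_resolution": "per run"},
    "Evoked EEG":     {"cost": "high", "intrusiveness": "high", "time_resolution": "ms"},
    "Pupil diameter": {"cost": "high", "intrusiveness": "high", "time_resolution": "sub-second"},
}

def shortlist(metrics, **required):
    """Keep metrics whose criteria match every required value."""
    return [name for name, crit in metrics.items()
            if all(crit.get(k) == v for k, v in required.items())]

print(shortlist(candidates, cost="low", intrusiveness="low"))  # -> ['ISA']
```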

Guidance: The major concerns in this example are the competing demands of validity and intrusiveness. The ultimate metric must obviously be chosen in the specific context of the validation. For instance, are funds limited? At what stage is the validation (an early prototype or a near-operational system)? Is intrusiveness likely to be an issue in the trials? In this example, the practitioner placed emphasis on finding workload metric(s) that could detect short-term workload peaks; cost and intrusiveness are the chief criteria.

Although ISA seems a workable option, several cautions are in order. First, ratings should be collected as often as possible, though they should not overwhelm the pilot or become the chief focus of the pilot's task; ISA has been collected as frequently as every few minutes without reported problems. Further, given the high visual and motor demands of the approach phase of flight, ISA ratings should not impose these sorts of demands, so the common method of push-button ISA ratings should be avoided. Instead, aural prompts should be made on schedule, and pilots should be instructed to provide ISA ratings as verbal call-outs.
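
As a sketch of how such an aural-prompt protocol might be instrumented (an assumed design, not a tool described in this report; the two callables stand for the simulator's audio channel and the transcription of the voice channel):

```python
# Minimal sketch of scheduling aural ISA prompts and logging verbal call-outs,
# so no push-button or visual demand is added during the approach.
import time

def run_isa_prompts(duration_s, interval_s, play_prompt, get_verbal_rating):
    """Every interval_s seconds, play an aural prompt and log the rating.
    play_prompt: plays a short tone over the headset (assumed callable).
    get_verbal_rating: returns the ISA rating transcribed from the voice
    channel (assumed callable)."""
    log = []
    t0 = time.monotonic()
    while time.monotonic() - t0 < duration_s:
        time.sleep(interval_s)
        play_prompt()
        rating = get_verbal_rating()
        log.append((time.monotonic() - t0, rating))
    return log
```

The returned time-stamped log is what allows transient workload peaks to be located within the descent profile.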

For the sake of simplicity, the preceding illustrative example focused only on workload. In fact, PO-ASAS also suggests that situation awareness and human error are two other human performance areas likely to be relevant to the Airborne Separation category of ASAS applications. Selection of appropriate metrics for these areas would proceed similarly. In the case of situation awareness, the key is to find a method suitable for real-time simulation that is not intrusive or disruptive during the (demanding) approach phase of flight. It is therefore logical to use embedded task performance as a measure of situation awareness; any other technique would involve freezing the display (deemed too unrealistic and disruptive) or imposing a rating/judgement task (considered too intrusive and demanding). For similar reasons, embedded task performance was also chosen to assess human error.

5.2 SELF-SEPARATION IN MIXED EQUIPAGE EN-ROUTE AIRSPACE

Operational Concept: This application falls out of the Airborne Self-Separation category of ASAS applications. It assumes two types of aircraft: those equipped with ADS-B based ASAS, as well as CDTI, and those not equipped. In this scenario, equipped aircraft are free to self-separate, while non-equipped aircraft remain under positive ATC control. Boundary exit and entry points, and special areas and alternative routes, are simulated, in different atmospheric conditions.

[Figure 5.1: The process of selecting a workload metric (using the section 5.1 example of Time-Based Sequencing in Approach). The figure traces the selection chain: Issue ("are there unacceptable workload peaks?") -> Human Performance Area (workload) -> Perspective (airborne) -> Validation Technique (real-time simulation, RTS) -> Evaluative Criteria (time resolution, cost, intrusiveness) -> Metric (ISA) -> Guidance.]

Selecting a Validation Technique: The Airborne Self-Separation category of ASAS applications is associated with two key human performance areas: teamwork and trust. These are best measured by real-time simulation (for teamwork) and survey methods (for trust). Given that surveys are often incorporated into other types of studies (see section 2.2.2), it was decided to integrate both validation techniques into one: a real-time study with participant surveys.

This is an early validation exercise. Training and familiarisation are provided in the use of CDTI. Crews perform under both current-day conditions (positive ATC control, using standard waypoints) and ASAS conditions (passive ATC, no waypoints, and intervention by exception for imminent loss of separation), depending on their equipage. Experimental variables include traffic density, traffic complexity, and conflict base rate.

Validation hypothesis: The chief validation questions concern: [1] whether controllers and pilots can work together (i.e. teamwork); and [2] whether flight crews on board ASAS-equipped aircraft trust their CDTI and CD&R tools.

Metric selection: The scenario requires a metric suitable for assessing trust (on the air side) in a survey setting, as well as a metric for assessing teamwork (for both air and ground) in a real-time study setting. The Lee and Moray trust scale is chosen for its simplicity and ease of use in assessing trust, while teamwork is assessed using communications rating forms, which can address various aspects of teamwork and co-ordination (e.g. conflict resolution) that seem valuable given the very different roles of ATC and flight deck under ASAS.

Guidance: The Lee and Moray scale is a simple 10-point scale that allows users to rate their trust in the tools provided. It is selected on the basis of its low cost and low level of intrusiveness, which is important for the validation of a relatively immature system that involves a high level of interaction. As a survey-type technique it requires only limited briefings to train the users in its use; however, it will be important to evaluate the scores provided by the pilots prior to the measured trial, to ensure that any biases are removed as far as possible. As a subjective measurement, this should be emphasised in the analysis. The process does have the added benefit of engendering support from the user community, and can act as a focus for feedback to improve the capability and presentation of the results.

The communications rating form is again selected on the basis of cost and intrusiveness, as it is relatively cheap and can be used after the end of a measured run. The rating form will need to be developed to match the particular communications (type, agent/agent, etc.) that are expected, so it will need to be tested in some kind of 'shakedown trial' to ensure that the questions are clear and produce responses that are reasonably objective. Again, the opportunity for individual bias needs to be minimised, by comparing the ratings of individual participants in a standard scenario. The problem of subjectivity is also present with this metric and should be drawn out in the analysis of the results.
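
One simple way to implement the bias check suggested above, swapping in a standard statistical technique the report does not itself prescribe, is to standardise each participant's trial ratings against that participant's own ratings from the standard (calibration) scenario. A minimal sketch, with invented data:

```python
# Per-rater standardisation (z-scores) against a calibration scenario,
# to damp individual rating bias. Data and names are illustrative.
from statistics import mean, stdev

def debias(calibration, trial):
    """Convert trial ratings to z-scores using each rater's calibration mean/sd."""
    out = {}
    for rater, cal_ratings in calibration.items():
        m, s = mean(cal_ratings), stdev(cal_ratings)
        out[rater] = [(r - m) / s if s else 0.0 for r in trial[rater]]
    return out

calibration = {"pilot_a": [6, 7, 6, 7], "pilot_b": [3, 4, 3, 4]}
trial       = {"pilot_a": [8, 8],       "pilot_b": [6, 5]}
print(debias(calibration, trial))
```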

6. CONCLUSIONS

WP3 sought to provide a usable framework to guide the selection of appropriate human performance metrics for use in ASAS validation exercises. This effort was not intended to "reinvent the wheel" by assessing in depth the universe of potentially available metrics. Although the list of presented metrics is not exhaustive, a great deal of effort went into presenting those whose use is best supported by empirical and theoretical experience (together with primary references for each) and most relevant to the validation of ASAS applications.

It was not intended that this effort produce a "cookbook" that could guide the uninitiated through the process of conducting validation in an ASAS setting. Rather, WP3 produced a framework that the validation practitioner can use to choose appropriate validation metrics with which to assess ASAS applications. It is hoped that the WP3 product can help the validation practitioner narrow the list of candidate human performance metrics and, thereafter, systematically weigh the various evaluative criteria (as outlined in the tables of chapter 4) in the context of the given validation. Although no specific expertise is assumed, it is hoped that the validation practitioner will refer to the primary literature references included for each metric, and become familiar with the metric in question before embarking on the validation exercise.

There are additional factors that can influence the choice of metrics and the means used to collect them. One of these is the stage of validation concerned (as captured in the validation route of MAEVA, Figure 2.2). If validation is taking place with an early prototype, this is likely to influence the type of methods (validation technique) chosen, and moreover the types of metrics chosen.

Clearly, a goal of the present work was to link the selection and use of human performance metrics directly to the ASAS applications identified in PO-ASAS. With this in mind, the following table presents the priority human performance areas associated with each ASAS application category, as presented earlier in section 2.2.4. Although it is difficult a priori to identify exactly which human performance areas are of most interest for a given ASAS application, PO-ASAS does provide some insights in terms of equipage, operational, and human factors principles. These have been used to extract the most likely relevant human performance areas for each ASAS application, as follows:

• Airborne Traffic Situational Awareness: three key areas (see section 2.2.4)
• Airborne Spacing: three key areas (see section 2.2.4)
• Airborne Separation: Workload, Human Error, Situation Awareness
• Airborne Self-Separation: Teamwork, Trust

Table 6.1: Key human performance areas, for each of the ASAS application categories (as extracted from PO-ASAS). [4]

[4] Again, the preceding was based exclusively on the issues identified in PO-ASAS. Further, the absence of a given human performance area does not mean that it is irrelevant for the application category. Indeed, usability (which was never identified as a key issue for any of the ASAS applications) is an important area of consideration regardless of ASAS application. Task performance was also identified, but as its use will be specific to each individual ASAS application, it was not developed as part of the validation framework.


For each of the seven identified human performance areas, the type of validation technique (i.e. real-time simulation, fast-time simulation, survey, or analytic study) is the primary determinant of what one can expect to measure. That is, the question of which human performance metrics can be assessed by which validation technique (or, conversely, which validation techniques are most appropriate for assessing a given human performance metric) can be answered largely irrespective of the PO-ASAS category under investigation. With that in mind, the following table (from section 2.2.2) summarises the relevance of each validation technique for each of the seven human performance areas.

Validation   Workload   Human   System       Usability   Situation   Teamwork   Trust
Technique               Error   Monitoring               Awareness
Analytic     ✓          ✓✓✓     ✓✓           ✓✓          ✓           ✓✓         ✓
Fast Time    ✓          ✓✓✓     ✓            ✓✓          ✓           ✓✓         ✓
Real Time    ✓✓✓        ✓       ✓✓✓          ✓✓✓         ✓✓✓         ✓✓✓        ✓
Survey       ✓✓         ✓       ✓            ✓✓✓         ✓✓          ✓          ✓✓✓

Table 6.2: General relevance of the validation techniques for assessing each of the human performance areas (more ticks indicate greater relevance).

Linking together the two preceding tables enables a first approximation of the process by which an ASAS application can drive the selection of an appropriate human performance metric. Given, for instance, that workload has been identified as a key human performance area for Airborne Separation applications, the next step would be to focus on those workload metrics appropriate for use in real-time simulations.
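
That two-step lookup can be made concrete. The sketch below encodes the rows of Tables 6.1 and 6.2 that are stated in this report (tick counts as integers) and returns, for each key area of a category, the technique with the most ticks; it illustrates the linking idea and is not a tool delivered by WP3.

```python
# Table 6.1 (only the rows stated explicitly in this report).
KEY_AREAS = {
    "Airborne Separation":      ["Workload", "Human Error", "Situation Awareness"],
    "Airborne Self-Separation": ["Teamwork", "Trust"],
}
# Table 6.2: human performance area -> {validation technique: tick count}.
RELEVANCE = {
    "Workload":            {"Analytic": 1, "Fast Time": 1, "Real Time": 3, "Survey": 2},
    "Human Error":         {"Analytic": 3, "Fast Time": 3, "Real Time": 1, "Survey": 1},
    "Situation Awareness": {"Analytic": 1, "Fast Time": 1, "Real Time": 3, "Survey": 2},
    "Teamwork":            {"Analytic": 2, "Fast Time": 2, "Real Time": 3, "Survey": 1},
    "Trust":               {"Analytic": 1, "Fast Time": 1, "Real Time": 1, "Survey": 3},
}

def best_techniques(category):
    """For each key area of the category, pick the highest-ticked technique."""
    return {area: max(RELEVANCE[area], key=RELEVANCE[area].get)
            for area in KEY_AREAS[category]}

print(best_techniques("Airborne Separation"))
# {'Workload': 'Real Time', 'Human Error': 'Analytic',
#  'Situation Awareness': 'Real Time'}
```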

It is important to note, again, that there are in some cases alternative means of collecting the same human performance metric. For instance, some subjective measures (e.g. ISA) are currently collected either in paper-and-pencil form (cheap and easy to administer) or in more costly electronic form. Similarly, if labour costs are no object, lower-tech solutions can permit behavioural analysis from videotape review, as opposed to (potentially expensive) automated data collection. Although the current framework does not capture this, the implications are obvious: alternative data collection or analysis techniques can influence such criteria as cost. The preceding sections tried to provide an overview of the metrics, and to discuss each in terms of the typical data collection methods.

WP3 identified a number of metrics. Given that these were crossed with the other factors (e.g. perspective), and that metrics can often map onto various combinations of perspective, validation type, and so on, it quickly becomes obvious that representing this in tabular format would result in a combinatorial explosion. For this reason, it was chosen to represent the results in the form of an Excel spreadsheet, without drawing out each example into its fully worked form.

The CARE/ASAS team has been working on the assumption of five high-level validation objectives: Safety; Capacity; Efficiency; Economics (encompassing efficiency); and Security & Defence. A fundamental notion underlying validation is that the use of validation metrics, regarding either system or human performance, allows some inference about, and relation to, these higher-level goals. In the case of system metrics, the link is often inherent. Capacity, for instance, can be assessed using the number of aircraft movements (e.g. arrivals); the link between a system performance metric and the higher-level goal is, in this case, very compelling. Similarly, human performance metrics can be used to demonstrate the ability of a deliverable to operate in a real-life environment, with pre-defined criteria regarding functionality, operability and performance. However, the human performance metrics identified in WP3 are implicitly linked to both system performance areas and the high-level objectives. Workload benefits, for instance, are implicitly expected to pay dividends in terms of capacity, safety, and so on. It is this critical connection to the higher-level goals, and indeed the critical nature of the human operator in the ATM system, that makes the assessment of human performance issues essential in any validation programme for ASAS.


Annex A. List of Human Performance Metrics

The following lists each of the identified human performance metrics, grouped according to their main human performance area (note that some metrics have been used to assess more than one aspect of human performance). Of the human performance areas identified, workload has historically received the most attention; this is clearly reflected in the following list.

Workload metrics:

1. Air Traffic Workload Input Technique (ATWIT)
2. Aircraft on Frequency
3. Aircraft spacing performance
4. Auditory Choice Secondary Task Response Time
5. Bedford Workload Scale
6. Blink latency
7. Blink rate
8. Blink-saccade asynchrony
9. Blood Pressure
10. Brain Evoked Potentials
11. Card Sorting Secondary Task
12. Catecholamine level
13. Choice Reaction Time secondary task
14. Classification Secondary Task
15. Cooper-Harper Rating Scale
16. Cortisol level
17. Dichotic Listening
18. EEG pattern
19. Event Related (Evoked Cortical) Potential
20. Galvanic Skin Response (GSR)
21. Head-down time
22. Heart rate
23. Heart Rate Variability
24. Instantaneous Self Assessment (ISA)
25. Lexical Decision Secondary Task
26. Magnetoencephalographic Activity (MEG)
27. Mental Arithmetic
28. Modified Cooper-Harper Rating Scale
29. NASA Task Load Index (TLX)
30. Number of aircraft accepted into sector
31. Number of altitude transitions
32. Number of button pushes
33. Number of Errors
34. Number of hand-offs
35. Number of separation violations
36. Pilot Objective/Subjective Workload Assessment Technique (POSWAT)
37. Probability of Correct Detections
38. PUMA
39. Pupil diameter
40. Radio Communications embedded task
41. Radio Telephony Average Call Duration
42. Reading time
43. Respiration rate
44. R/T congestion
45. Scanning entropy (randomness)
46. Secondary task detection response time
47. Secondary task interval production accuracy
48. Secondary task signal detection hit rate
49. Secondary task time estimation accuracy
50. Serial Recall
51. Sternberg auditory memory search
52. Sternberg visual memory search
53. Subject Matter Ratings
54. Subjective Workload Assessment Technique (SWAT)
55. Taylor et al 7-point rating scale questionnaire
56. Time Estimation Secondary Task
57. Vanillylmandelic Acid
58. Verbal Protocol Analysis
59. Visual Dwell Time
60. Visual Fixation Frequency
61. Visual Saccade Duration
62. Visual Saccade Rate

Situation Awareness metrics:
1. China Lake SA Rating Scale
2. Embedded Task Performance
3. SA Global Assessment Technique (SAGAT)
4. SA Linked Instances Adapted to Novel Tasks (SALIANT)
5. Situation Awareness Rating Technique (SART)
6. Subject Matter Ratings (i.e. from an observer)
7. Subjective Rating (i.e. from the subject)
8. Situation Present Assessment Method (SPAM)

System monitoring metrics:
1. Brain Evoked Potentials
2. Response time
3. Search time
4. Signal detection false alarm count
5. Signal detection hit rate
6. Signal detection miss rate
7. Subject Matter Ratings

Teamwork metrics:
1. Behavioural observation scales
2. Communications coding schemes
3. Communications ratings formats
4. Event-based performance measurement
5. Head-down time
6. Measurements of shared mental models
7. Observation-based questionnaire
8. Subject Matter Ratings
9. Subjective Rating
10. Team co-ordination
11. Team process quality rating form
12. Team process frequency recording form
13. Team process questionnaire
14. Team self-assessment questionnaire
15. Timeline analysis
16. Video walkthrough analysis

Trust metrics:
1. Controller Acceptance Rating Scales (CARS)
2. Debriefs
3. Human-Computer Trust scales
4. Jian et al trust questionnaire
5. Lee and Moray scales
6. Muir scales
7. SHAPE Automation Trust Index (SATI)
8. Subject Matter Ratings
9. Subjective Rating
10. Taylor et al 7-point rating scale questionnaire

Usability metrics:
1. Cognitive Task Analysis
2. Controller Acceptance Rating Scales (CARS)
3. Controller Operational Acceptability Rating
4. Debriefs
5. EMT (Point of Gaze)
6. Frequency of use
7. Head-down time
8. Job elements completed per unit time
9. Movement Time
10. Pilot Performance Index
11. PUMA
12. Radio Telephony usage rate
13. Rating of display clutter
14. Software Usability Measurement Inventory (SUMI)
15. Subject Matter Ratings
16. Subjective Rating
17. Task Analysis
18. Time-to-task completion
19. Video walkthrough analysis

Human Error metrics:
1. Absolute Error
2. Aircraft spacing performance
3. Commission errors
4. Deviations
5. Missed conflicts
6. Ratio of Number Correct / Number of Errors
7. RMS error
8. Subject Matter Ratings
9. Technique for Human Error Analysis (THEA)
10. Technique for Human Error Rate Prediction (THERP)
11. Technique for the Retrospective Analysis of Cognitive Errors in ATM (TRACEr)
12. Tracking Error
13. Embedded task performance
14. Observation


REFERENCE LIST

Abdul-Rahman, A. and Hailes, S. (1999) Relying on trust to find reliable information. 1999 International Symposium on Database, Web and Cooperative Systems (DWACOS'99), Baden-Baden, Germany.
Abdul-Rahman, A. and Hailes, S. (2000) Supporting trust in virtual communities. Hawaii International Conference on System Sciences 33, Maui, Hawaii, 4-7 January 2000.
Alfredson, J. (2001) Aspects of situational awareness and its measures in an aircraft simulation context. Linkoping Studies in Science and Technology, Thesis No. 865, LiU-Tek-Lic-2001:2, Linköping University, Sweden.
American National Standard (1992) Guide to Human Performance Measurements. BSR/AIAA G-035-1992.
Annett, J. and Cunningham, D. (2000) Analysing command team skills. In J. Schraagen, S. Chipman and V. Shalin (Eds.), Cognitive Task Analysis. Lawrence Erlbaum Associates, NJ, 401-415.
Annett, J., Cunningham, D. and Mathias-Jones, P. (2000) A method for measuring team skills. Ergonomics, 43 (8), 1076-1094.
Artman, H. and Garbis, C. (1998) Situation awareness as distributed cognition. Paper presented at the Ninth European Conference on Cognitive Ergonomics (ECCE-9), Limerick, 24th-26th August. EACE.
Artman, H. and Granlund, R. (1998) Team situation awareness using graphical or textual databases in dynamic decision making. Paper presented at the Ninth European Conference on Cognitive Ergonomics (ECCE-9), Limerick, 24th-26th August. EACE.
Avermaete, J.A.G. van (1998) NOTECHS: Non-technical skill evaluation in JAR-FCL. NLR-TP-98518. Amsterdam: National Aerospace Laboratory.
Bainbridge, L. (1982) Ironies of automation. In G. Johannsen and J.E. Rijnsdorp (Eds.), Analysis, Design and Evaluation of Man-Machine Systems, Proc. of IFAC Conf., Baden-Baden, Germany, 129-135.
Bauer, L., Goldstein, R. and Stern, J. (1987) Effects of information processing demands on physiological response patterns. Human Factors, 29, 213-234.
Beatty, J. (1982) Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91 (2), 276-292.
Beatty, J. (1986) The pupillary system. In M.G.H. Coles, E. Donchin and S.W. Porges (Eds.), Psychophysiology: Systems, Processes and Applications. New York: The Guilford Press.
Berggren, P. (2000) Situational awareness, mental workload, and pilot performance: relationships and conceptual aspects. FOA-R--00-01438-706-SE.
Billings, C.E. (1996) Aviation Automation: The Search for a Human-Centred Approach. Lawrence Erlbaum Associates.
Bisantz, A.M., Llinas, J., Seong, Y., Finger, R. and Jian, J-Y. (2000) Empirical investigations of trust-related system vulnerabilities in aided, adversarial decision making. Center for Multi-Source Information Fusion, Dept. of Industrial Engineering, State University of New York at Buffalo.
Bisseret, A. (1971) Analysis of mental processes involved in air traffic control. Ergonomics, 14 (5).
Bisseret, A. (1995) Représentation et décision experte: psychologie cognitive de la décision chez les aiguilleurs du ciel. Editions Octares.
Boag, C., Neale, M. and Neal, A. (1999) Measuring situation awareness: A comparison of alternative measurement techniques. Proc. of the 10th International Symposium on Aviation Psychology, 1240-1246.
Bolstad, C.A. and Endsley, M. (1999) Shared mental models and shared displays: An empirical evaluation of team performance. In Proceedings of the 43rd Meeting of the Human Factors & Ergonomics Society.
Bonini, D., Jackson, A. and McDonald, N. (2001) Do I trust thee? An approach to understanding trust in the domain of air traffic control. In Proc. of People in Control, 19-21 June, UMIST, Manchester.


Bowers, C., Weaver, J., Barnett, J. and Stout, R. (1998) Empirical validation of the SALIANT methodology. Paper presented at the RTO HFM Symposium on Collaborative Crew Performance in Complex Operational Systems, Edinburgh, 20-22 April. RTO/NATO.
Bradshaw, J.L. (1968) Load and pupillary changes in continuous processing tasks. British Journal of Psychology, 59, 265-271.
Brannick, M.T. and Prince, C. (1997) An overview of team performance measurement. In M.T. Brannick, E. Salas and C. Prince (Eds.), Team Performance Assessment and Measurement: Theory, Methods, and Applications. Lawrence Erlbaum Associates, 3-16.
Brickman, B., Hettinger, L., Stautberg, D., Haas, M., Vidulich, M. and Shaw, R. (1998) The global implicit measurement of situation awareness: Implications for design and adaptive interface technologies. In M. Scerbo and M. Mouloua (Eds.), Automation Technology and Human Performance: Current Research and Trends, 160-170.
Brookings, J.B. and Wilson, G.F. (1994) Physiological and workload changes during a simulated air traffic control task. In Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting.
Buckley, E.P., DeBaryshe, B.D., Hitchner, N. and Kohn, P. (1983) Methods and measurements in real-time air traffic control system simulation (DOT/FAA/CT-83/26). Atlantic City, NJ: DOT/FAA Technical Center.
Caldwell, J.A., Wilson, G.F., Cetinguc, M., Gaillard, A.W.K., Gundel, A., Lagarde, D., Makeig, S., Myhre, G. and Wright, N.A. (1994) Psychophysiological assessment methods. AGARD-AR-324. Neuilly-sur-Seine, France: NATO.
Cardosi, K. and Murphy, E. (1995) Human Factors in the Design and Evaluation of Air Traffic Control Systems. Federal Aviation Administration, Office of Aviation Research, DOT/FAA/RD-95/3.
CARE/ASAS Activity 1: European ASAS literature and study review (Problem Dimensions / Evaluation of Past Studies).
Carmody, M.A. (1994) Current issues in the measurement of military aircrew performance: A consideration of the relationship between available metrics and operational concerns. Air Vehicle and Crew Systems Technology Department, Naval Air Warfare Center, Aircraft Division, Warminster, PA.
Carroll, J.M. and Olson, J.R. (1987) Mental Models in Human-Computer Interaction: Research Issues About What the User of Software Needs to Know. Washington, D.C.: National Academy Press.
Casali, J. and Wierwille, W. (1983) A comparison of rating scale, secondary task, physiological, and primary task workload estimation techniques in a simulated flight task emphasizing communications load. Human Factors, 25, 623-642.
Cashion, P. and Lozito, S. (2000) How short- and long-term intent information affects pilot performance in a free flight environment. San Jose State University / NASA Ames Research Center, HCI-Aero 2000 conference paper.
Chabrol, C., Vigier, J.C., Garron, J. and Pavet, D. (1999) CENA PD/3 Final Report. PHARE/CENA/PD/3-2.4/FR/2.0.
Charlton, S.G. (1996) Questionnaire techniques for test and evaluation. In T.G. O'Brien and S.G. Charlton (Eds.), Handbook of Human Factors Testing and Evaluation. Lawrence Erlbaum Associates, 81-99.
Checkland, P. (1981) Systems Thinking, Systems Practice. Chichester: Wiley.
Cook, M. (1998) Personnel Selection: Adding Value Through People, 3rd edition. Chichester: Wiley.
Coolican, H. (1996) Research Methods and Statistics in Psychology. London: Hodder & Stoughton.
Cooper, G.E. and Harper, R.P. (1969) The use of pilot rating in the evaluation of aircraft handling qualities. NASA-AMES Report TN-D-5153.
Costa, G. (1993) Evaluation of workload in air traffic controllers. Ergonomics, 36 (9), 1111-1120.


Cooper, G.E., White, M.D. and Lauber, J.K. (1980) Resource Management on the Flightdeck: Proceedings of a NASA/Industry Workshop (NASA CP-2120). Moffett Field, CA: NASA-Ames Research Center.
Cox, M. and Kirwan, B. (1999) The future role of the air traffic controller: Design principles for human-centred automation. In M.A. Hanson, E.J. Lovesey and S.A. Robertson (Eds.), Contemporary Ergonomics 1999. Taylor & Francis Ltd., 27-31.
Cunningham, D., Kelly, C.J., Goillau, P. and Boardman, M. (2001) The development of teamwork measures in ATM systems. HRS/HSP-005-REP-03, Edition 0.2. Brussels: EUROCONTROL.
Damos, D. (1992) Dual task methodology: Some problems. In D.L. Damos (Ed.), Multiple Task Performance. London: Taylor & Francis.
DCIEM, Defence and Civil Institute of Environmental Medicine (1988) A preliminary examination of mental workload, its measurement and prediction. Technical Report No. AD-B123-23. Canada: DCIEM.
Deighton, C.D.B. (1997) Towards the development of an integrated human factors and engineering evaluation methodology for rotorcraft D/NAW systems. QinetiQ Report No. DERA/AS/FMC/CR97629/1.0, December. Farnborough: QinetiQ Ltd.
Delle, H.J. van, Aasmen, J., Mulder, L.J.M. and Mulder, G. (1985) Time domain versus frequency domain measures of heart rate variability. In J.F. Orlebeke, G. Mulder and L.J.P. van Doornen (Eds.), Psychophysiology of Cardiovascular Control: Models, Methods, and Data. New York: Plenum Press.
Dennehy, K. (1997) Cranfield - Situation Awareness Scale, User Manual. Applied Psychology Unit, College of Aeronautics, Cranfield University, COA Report No. 9702, Bedford, January.
DERA (1997) WP6: Application of evaluation techniques. Annex B: Results of DERA cognitive walkthrough activity. EC DGVII RHEA project, Ref. RHEA/TH/WPR/6/2.0, 30th July.
Dickinson, T.L. and McIntyre, R.M. (1997) A conceptual framework for teamwork measurement. In M.T. Brannick, E. Salas and C. Prince (Eds.), Team Performance Assessment and Measurement: Theory, Methods, and Applications. Lawrence Erlbaum Associates, 19-43.
Dominguez, C., Vidulich, M., Vogel, E. and McMillan, G. (1994) Situation Awareness: Papers and Annotated Bibliography (U). Armstrong Laboratory, Human System Center, ref. AL/CF-TR-1994-0085.
Donovan, J., Joseph, K.M. et al. (1998) Human factors issues in free flight. SAE G-10 Aerospace Resource Document (ARD) No. 50079.
Du Boulay, E., Cox, M., Hawkins, J.R. and Williams, J. (1994) NODE-M STCA: ATC Evaluation. National Air Traffic Services, CS Report 9439, May.
Duong, V. and Floc'hic, L. (1996) FREER-1 Requirement Document, version 2.0. EUROCONTROL Experimental Centre, EEC Bretigny, France.
Duong, V. (1997) FREER (Free Route Encounter Experimental Resolution): a solution for the European free flight implementation. Air Navigation Conference, September 1997.
Durso, F.T. and Gronlund, S.D. (2000) Situation awareness. In F.T. Durso et al. (Eds.), Handbook of Applied Cognition. John Wiley & Sons, 283-314.
Durso, F.T., Truitt, T.R., Hackworth, C.A., Crutchfield, J., Nikolic, D., Moertl, P.M., Ohrt, D. and Manning, C.A. (1995) Expertise and chess: A pilot study comparing situation awareness methodologies. In D.J. Garland and M. Endsley (Eds.), Experimental Analysis and Measurement of Situation Awareness. Embry-Riddle Aeronautical University Press.
Durso, F.T., Truitt, T.R., Hackworth, C.A., Crutchfield, J.M. and Manning, C.A. (1997) En route operational errors and situation awareness. The International Journal of Aviation Psychology, 8 (2), 177-194.
Durso, F.T., Hackworth, C.A., Truitt, T.R., Crutchfield, J., Nikolic, D. and Manning, C.A. (1998) Situation awareness as a predictor of performance for en route air traffic controllers. Air Traffic Control Quarterly, 6 (1), 1-20.
Durso, F.T., Crutchfield, J.M. and Balsakes, P.J. (2001, in press) Cognition in a dynamic environment. University of Oklahoma, Department of Psychology, Norman.


Dwyer, D.J., Fowlkes, J.E., Oser, R.L. ans Lane, N.E. (1997) team performancemeasurement in distributed environments: the TARGETs methodology. In, M.T. Brannick,E. Salas and C. Prince (Eds.), Team Performance Assessment and Measurement. Theory,Methods, and Applications. Lawrence Erlbaum Associates, 137-153.

Dzindolet, M., Pierce, L.G., Beck, H.P. and Dawe, L. (1999) Misuse and disuse of automatedaids. Proc. of the Human Factors and Ergonomics Society 43rd Annual Meeting, 339-343.

Dzindolet et al (2000a) Building trust in automation. Paper presented at, HumanPerformance, Situation Awareness and Automation: User-Centered Design for the NewMillenium, The 4th Conference on Automation Technology and Human Performance andthe 3rd Conference on Situation Awareness in Complex Systems, October 15-19

Dzindolet, M., Pierce, L.G., Beck, H.P. and Dawe, L. (2000b) a framework of automation use.Manuscript submitted for publication.

EATCHIP (1996) Guidelines for developing and implementing team resource management.EUROCONTROL EATCHIP. Released Issue. Edition 1.0. 15/03/96.

EATMP (1999) Team resource management test and evaluation. EUROCONTROL EATMP.Released Issue. Edition 1.0. 30/11/99.

EATMP (1999) Integrated task and job analysis of air traffic controllers - Phase 2: Taskanalysis of en-route controllers. HUM-ET1.ST01.1000-REP-04. Edition 1.0. ReleasedIssue. Brussels: EUROCONTROL.

EATMP (2000) Human Resources Programme - Stage 1. Programme Management Plan.EUROCONTROL EATMP. Edition 1.0, 14/09/00.

EATMP (2000b) The Human Error in ATM (HERA) technique. HRS/HSP-002-REP-03.Edition 0.2. Draft Issue. Brussels: EUROCONTROL.

EATMP (2001a) Age, experience, and automation in European air traffic control. Brussels:EUROCONTROL EATMP. Working Draft. Edition 0.1. October.

Edwards, E. and Lees, F.P. (1974). The Human Operator in Process Control, London:Taylor and Francis.

Eggemeier, F.T. (1988). Properties of workload assessment techniques. In M. Venturino(Ed.) Selected readings in human factors. (1990) (pp. 228-248). Santa Monica, California:Human Factors Society.

Endsley, M.R. (1988) Design and evaluation for situation awareness enhancement. In,Proceedings of the Human Factors Society 32nd Annual Meeting (Vol. 1). Santa Monica:Human Factors Society. 97-101.

Endsley. M (1993). Situation awareness and workload: flip sides of the same coin? InProceedings of the 7th International Symposium on Aviation Psychology, Columbus, Ohio.

Endsley, MR (1994). A taxonomy of situation awareness errors. Paper presented at theWestern European Association for Aviation Psychology 21st Conference, March 1994--Dublin, Ireland.

Endsley, M. (1998). EFFECT OF FREE FLIGHT CONDITIONS ON CONTROLLERPERFORMANCE, WORKLOAD, AND SITUATION AWARENESS, SA Technologies

Endsley, M.R. (1995a) Toward a theory of situation awareness in dynamic systems. Human Factors, 37 (1), 32-64.

Endsley, M.R. (1995b) Measurement of situation awareness in dynamic systems. Human Factors, 37 (1), 65-84.

Endsley, M.R. (2000) Theoretical underpinnings of situation awareness: A critical review. In, M.R. Endsley and D.J. Garland (Eds), Situation Awareness Analysis and Measurement. Lawrence Erlbaum Associates.

Endsley, M., Hansman, R.J. and Farley, T. (1999) Shared situation awareness in the flight deck-ATC system. Paper presented at the 1999 Digital Avionics Systems Conference, Seattle, Washington.

Endsley, M.R. and Kaber, D.B. (1999) Level of automation effects on performance, situation awareness and workload in a dynamic control task. Ergonomics, 42 (3), 462-492.

Endsley, M.R. and Kiris, E.O. (1995) The out-of-the-loop performance problem and level of control in automation. Human Factors, 37 (2), 381-394.

Endsley, M.R. and Kiris, E.O. (1995) Situation Awareness Global Assessment Technique (SAGAT) TRACON Air Traffic Control Version, user guide. TTU-IE-95-02, Texas Tech University.

Endsley, M.R. and Rodgers, M.D. (1994) Situation awareness information requirements for en route air traffic control. Report DOT/FAA/AM-94/27. Federal Aviation Administration.

Endsley, M., Sollenberger, R., Nakata, A. and Stein, E. (2000) Situation Awareness in Air Traffic Control: Enhanced Displays for Advanced Operations. Technical note DOT/FAA/CT-TN00/01, Federal Aviation Administration, W. Hughes Technical Center.

Erzberger, H. (1989) ATC automation concepts; in, Proc. of the Aviation Safety and Automation Program Conf., NASA Conf. Publication 3090.

EUROCONTROL (1993) Role of Man within PHARE; EUROCONTROL DOC 93-70-35.

EUROCONTROL (2000) FAST: 1999 pilot in the loop evaluation. EEC Note No. 13/00, July 2000.

EUROCONTROL (2000a) Air traffic controller attitudes toward future automation concepts: A literature review. EUROCONTROL Report ASA.01.CORA.2.DEL02-A.RS, 4th December.

EUROCONTROL (2000b) Conflict Resolution Assistant level 2 (CORA2). Controller assessments. EUROCONTROL Report ASA.01.CORA.2.DEL02-b.RS, 4th December.

EUROCONTROL (2001) Principles and Guidelines for the Development of Trust in Future ATM Systems: A Literature Review. EUROCONTROL Report HRS/HSP-005-REP-01 v0.2, 27th February.

FAA/EUROCONTROL AP1, Principles of Operation for the Use of ASAS. Version 7.1. June 2001.

Fairburn, C. and Wright, P. (2000) Exploring the Metaphor of “Automation as a Team Player”: taking team playing seriously. Paper presented at 10th European Conference on Cognitive Ergonomics (ECCE - 10), Linkoping, Sweden, 21st-23rd August.

Farley, T., Hansman, J., Endsley, M., Amonlirdviman, K. and Vigeant-Langlois, L. (1998) The effect of shared situation information on pilot/controller situation awareness and re-route negotiation. Paper presented at the 2nd USA/Europe ATM R&D Seminar, Orlando, Florida, 1st-4th December.

Fink, A. and Major, D. (2000) Measuring situation awareness: A comparison of three techniques. In, Procs., Human Performance, Situation Awareness and Automation: User-Centered Design for the New Millennium, Savannah, Georgia, October 15-19.

Finnie, S. and Taylor, R. (1998) The Cognitive Cockpit. Flight Deck International, UK & International Press.

Flin, R., Goeters, K-M., Hormann, H-J. and Martin, L. (1998) A generic structure of non-technical skills for training and assessment. Paper presented at the 23rd Conference of the European Association for Aviation Psychology, Vienna, 14-18 September.

Flin, R. & Martin, L. (1998). Behavioural markers for crew resource management. CAA Paper 98005. London, UK: Civil Aviation Authority.

Flin, R. and Martin, L. (2001) Behavioural markers for Crew Resource Management: A review of the literature. The International Journal of Aviation Psychology, 11 (1), 95-118.

Funk, K., Lyall, B. and Riley, V. (1996) A comparative analysis of flightdecks with varying levels of automation. Phase 1 Final Report: Perceived human factors problems of flightdeck automation; FAA.

Furnham, A. (1997) The Psychology of Behavior at Work: The individual in the organization. London: Psychology Press.

Gent, R.v. et al. (2000). CARE-ASAS Activity 1: Problem Dimensions / Evaluation of Past Studies. European ASAS Literature and Study Review. CARE/ASAS.

Gentner, D. and Stevens, A. (1983) Mental Models. LEA Associates, Inc.

Gladstein, D.L. (1984) Groups in context: A model of task group effectiveness. Administrative Science Quarterly, 29, 499-517.

Gopher, D. & Donchin, E. (1986). Workload: An examination of the concept. In K.R. Boff, L. Kaufman & J.P. Thomas (Eds.), Handbook of Perception and Human Performance (Vol. II, chap. 41). New York: John Wiley and Sons.

Goillau, P., Kelly, C., Boardman, M. and Jeannot, E. (2001). Development of a Measure of Trust in ATM Systems. HRS/HSP-005-REP-02. Edition 0.2. Brussels: EUROCONTROL.

Goillau, P. and Kelly, C. (1997) Malvern Capacity Estimate (MACE) - a proposed cognitive measure for complex systems. In Harris, D. (Ed) Engineering Psychology and Cognitive Ergonomics Volume 1: Transportation Systems, Ashgate Publishers, pp. 219-225.

Goillau, P., Woodward, V., Kelly, C. and Banks, G. (1998) Evaluation of virtual prototypes for ATC – the MACAW technique. In Hanson, M. (Ed) (1998) Contemporary Ergonomics '98, London: Taylor and Francis, pp. 419-423.

Graham, R., Young, D., Pichancourt, I., Marsden, A. and Irkiz, I. (1994) ODID IV simulation report. EUROCONTROL Experimental Centre, EEC Report No. 269/94.

Gronlund, S.D., Ohrt, D., Dougherty, M., Perry, J.L. and Manning, C. (1998) Aircraft importance and its relevance to situation awareness. Report DOT/FAA/AM-98/16. Federal Aviation Administration.

Hackman, J.R. (1983) A normative model of work team effectiveness (Tech. Rep. No. 2). New Haven, CT: Yale University.

Hackman, J.R. (1998) Why teams don't work. In, R.S. Tindale, J. Edwards & E.J. Posavac (Eds.), Applications of theory and research on groups to social issues. New York: Plenum.

Hale, S. and Baker, S. (1990) The presentation of Short Term Conflict Alert: A human factors perspective. Civil Aviation Authority, DORA Report 9018, June.

Hall, R.J. (1996) Trusting your assistant. Proc. of KBSE'96, 42-51.

Harris, R.L., Glover, B.L. & Spady, A.A. (1986). Analytic techniques of pilot scanning behavior and their application (NASA Technical Paper 2525).

Harris, R.L., Tole, J.R., Ephrath, J.R. & Stephens, A.T. (1982). How a new instrument affects a pilot's mental workload. Proceedings of the Human Factors Society 26th Annual Meeting, 1010-1013.

Hart, S.G. (1990). Pilots' workload coping strategies. In Challenges in aviation human factors: the national plan. pp. 25-28. Washington, DC: American Institute of Aeronautics and Astronautics.

Hart, S.G. & Hauser, J.R. (1987). Inflight application of three pilot workload measurement techniques. Aviation, Space and Environmental Medicine. May, 402-410.

Hart, S.G. & Staveland, L.E. (1988). Development of the NASA-TLX (Task Load Index): results of empirical and theoretical research. In P.A. Hancock & N. Meshkati (Eds.) Human mental workload. Holland: Elsevier.

Hart, S. & Wickens, C.D. (1990). Workload assessment and prediction. In H.R. Booher (Ed.) MANPRINT: An emerging technology, advanced concepts for integrating people, machine, and organization. New York: Van Nostrand Reinhold.

Hauß, Y., Gauss, B. and Eyferth, K. (2001) The influence of multi-sector-planning on the controllers' mental models. In, D. Harris (Ed), Engineering Psychology and Cognitive Ergonomics. Volume Five. Ashgate Publishing, 203-209.

Hauß, Y., Gauss, B. and Eyferth, K. (2000) The Evaluation of a Future Air Traffic Management: Towards a new approach to measure Situation Awareness in Air Traffic Control. In, Camarinha-Matos, L.M., Afsarmanesh, H. and Erbe, H. (Eds), Advances in Networked Enterprises: Virtual Organizations, Balanced Automation and System Integration, CD Suppl., Boston: Kluwer.

Helmreich, R.L., Merritt, A.C. and Wilhelm, J.A. (1999) The evolution of crew resource management training in commercial aviation. The International Journal of Aviation Psychology, 9, 19-32.

Hendy, K.C., Hamilton, K.M. & Landry, L.N. (1993). Measuring subjective workload: when is one scale better than many? Human Factors, 35 (4), 579-601.

Hilburn, B. (1998). Techniques for Evaluating Human / Machine System Performance. In M.W. Scerbo & M. Mouloua (Eds.) Automation Technology and Human Performance. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Hilburn, B., Bakker, M.W.P. & Pekela, W.D. (1997). The Effect of Free Flight on Air Traffic Controller Mental Workload, Monitoring and System Performance. Technical Paper TP98237. Amsterdam, The Netherlands: NLR.

Hilburn, B. & Nijhuis, H.B. (2000). Free Route Airspace Project (FRAP) Human Performance Measurements. Contract Report CR-2000-040. Amsterdam, Netherlands: National Aerospace Laboratory NLR.

Hoekstra, J.M., Ruigrok, R.C.J., van Gent, R.N.H.W., Visser, J., Gijsbers, B., Valenti Clari, M.S.C.V., Heesbeen, W.W.M., Hilburn, B.G., Groeneweg, J.G. and Bussink, F.J.L. (2000) Overview of NLR Free Flight project 1997-1999. NLR Technical Paper TP 2000-227, May 2000.

Hollnagel, E., Cacciabue, P.C. and Bagnara, S. (1994) Workshop report. The limits of automation in air traffic control; Int. J. Human-Computer Studies, 40, 561-566.

Hopkin, V.D. (1975) The controller versus automation. In, AGARD AG-209.

Hopkin, V.D. (1998) The impact of automation on air traffic control specialists; in, M.W. Smolensky and E.S. Stein, Human Factors in Air Traffic Control, Academic Press, 391-419.

Hutchins, E. and Klausen, T. (1991) Distributed cognition in the cockpit; in, Y. Engestrom and D. Middleton, Cognition and Communication at Work. Cambridge University Press.

Hoogeboom, P.J. (2000). DIVA – WP3: Evaluation methodology. Amsterdam: NLR-TR-2000-517.

ICAO (1994) Human Factors Digest No. 11. Human factors in CNS/ATM systems. The development of human-centred automation and advanced technology in future aviation systems. International Civil Aviation Organization, ICAO Circular 249-AN/149.

Inagaki, T. (1999) Automation may be given the final authority. Proc. of CybErg 1999: The 2nd Int. Cyberspace Conf. on Ergonomics. Int. Ergonomics Assoc. Press, 68-74.

Isaac, A., Shorrock, S., Kennedy, R., Kirwan, B., Andersen, H. and Bove, T. (2001) The Human Error in ATM (HERA) technique. HRS/HSP-002-REP-03. Edition 0.2. Draft Issue. Brussels: EUROCONTROL.

Javaux, D. and Figarol, S. (1995) Distributed situation awareness: A concept to cope with the challenge of tomorrow. In, R. Fuller, N. Johnston and N. McDonald (Eds), Human Factors in Aviation Operations. Aldershot: Avebury Aviation, 293-298.

Jay, R. (1993) Selecting the perfect team. Belbin Associates / Video Arts.

Jeannot, E. (2000a) Situation Awareness: Synthesis of Literature Search. EEC Note No. 16/00, project ASA-Z-EC.

Jeannot, E. (2000b) SPAM evaluation. EEC Note. Bretigny-sur-Orge: EUROCONTROL.

Jeannot, E., Kelly, C. and Thompson, D. (2001) The development of situation awareness measures in ATM systems. HRS/HSP-005-REP-04. Edition 0.1. Draft Issue. Brussels: EUROCONTROL.

Jensen, S. (1999) Perceived versus real situation awareness: towards more objective assessment of SA. In, L. Straker and C. Pollack (Eds), CD-ROM Proceedings of CybErg 1999: The Second International Cyberspace Conference on Ergonomics. Perth, Australia: International Ergonomics Association Press. 327-334.

Jex, H.R. & Clement, W.F. (1979). Defining and measuring perceptual-motor workload in manual control tasks. In N. Moray (Ed.), Mental workload: its theory and measurement. pp. 125-178. New York: Plenum Press.

Jian, J-J, Bisantz, A.M. and Drury, C.G. (1998) Towards an empirically determined scale of trust in computerized systems: Distinguishing concepts and types of trust. Proc. of the Human Factors and Ergonomics Society Annual Meeting, Chicago, 501-505.

Jian, J-J, Bisantz, A.M. and Drury, C.G. (2000) Foundations for an empirically determined scale of trust in automated systems. Int. J. of Cognitive Ergonomics, 4 (1), 53-71.

Jones, D.G. (2000) Subjective measures of situation awareness. In, M.R. Endsley and D.J. Garland (Eds), Situation Awareness Analysis and Measurement. Mahwah, NJ: Lawrence Erlbaum Associates. 113-128.

Jones, D.G. and Endsley, M.R. (1996) Sources of situation awareness errors in aviation. Aviation, Space and Environmental Medicine, 67 (6), 507-512.

Jones, R.E., Milton, J.L. & Fitts, P.M. (1949). Eye fixations of aircraft pilots (USAF Technical Report 5837). US Air Force.

Joseph, K.M. and Uhlarik, J. (1997) Using the competence performance distinction to identify and assess situation awareness in a simulated IFR environment. International Symposium of Aviation Psychology, pp. 1429-1435. Columbus, Ohio: Ohio State University.

Katzenbach, J.R. and Smith, D.K. (1994) The Wisdom of Teams: Creating the High-Performance Organization. Harper Business.

Kelly, C., Boardman, M., Goillau, P. and Jeannot, E. (2001) Principles and guidelines for the development of trust in future ATM systems: A literature review. EUROCONTROL EATMP Report HRS/HSP-005-REP-01.

Kelly, C. and Goillau, P. (1996) Cognitive Aspects of ATC: Experience of the CAER & PHARE Simulations. Paper presented at 8th European Conference on Cognitive Ergonomics (ECCE - 8), Granada, 10th-13th September.

Kelly, C.J., Goillau, P.J., Finch, W. and Varellas, M. (1995) CAER Future System 1 (FS1) Final trial report. Defence Research and Evaluation Agency, Report No. DRA/LS(LSC4)/CTR/RPT/CD246/1.0, November.

Kilner, A., Hook, M., Fearnside, P. & Nicholson, P. Contemporary Ergonomics.

Kirakowski, J. (1994) The Use of Questionnaire Methods for Usability Assessment. Unpublished. http://www.ucc.ie/hfrg/questionnaires/sumi/

Klimoski, R.J. and Mohammed, S. (1994) Team mental models: construct or metaphor? Journal of Management, 20 (2), 403-437.

Kline, P. (2000) Handbook of Psychological Testing. 2nd edition. Routledge, N.Y.

Kraiger, K. and Wenzel, L.H. (1997) Conceptual Development and Empirical Evaluation of Measures of Shared Mental Models as Indicators of Team Effectiveness. In, M.T. Brannick, E. Salas & C. Prince (Eds), Team Performance Assessment and Measurement. Mahwah, NJ: Lawrence Erlbaum Associates.

Kramer, R.M. (1999) Trust in organizations: Emerging perspectives, enduring questions. Annual Review of Psychology, Vol. 50, 569-598.

Kramer, A.F. (1991). Physiological metrics of mental workload: a review of recent progress. In D.L. Damos (Ed.), Multiple Task Performance. London: Taylor & Francis.

Krozel, J. (2000). Free Flight Research Issues and Literature Search. Report TR 00RT043-04. Los Gatos, California: Seagull Technology.

Langan-Fox, J., Code, S. and Langfield, K. (2000) Team mental models: Techniques, methods, and analytic approaches. Human Factors, 42 (2), 242-271.

Lee, K. and Davis, T.J. (1995) The development of the Final Approach Spacing Tool (FAST): A cooperative controller-engineer design approach; NASA Technical Memorandum 110359, August.

Lee, J. and Moray, N. (1992) Trust, control strategies and allocation of function in human-machine systems. Ergonomics, 35, 10, 1243-1270.

Lee, J.D. and Moray, N. (1994) Trust, self-confidence, and operators' adaptation to automation. Int. J. Human-Computer Studies, 40, 153-184.

Leplat, J. (1978). The factors affecting workload. Ergonomics, 21, 143-149.

Lewandowsky, S., Mundy, M. and Tan, G. (2000) The dynamics of trust: Comparing humans to automation. J. of Experimental Psychology: Applied, 6 (2), 104-123.

Liu, C. and Hwang, S. (2000) Evaluating the effects of situation awareness and trust with robust design in automation. International Journal of Cognitive Ergonomics, 4 (2), 125-144.

Lodge, M. (Ed) (2000) Results of the experiment. WP3 report. JAR-TEL Consortium Report JARTEL/BA/WP3/D5_06B. European Commission, DG TREN.

Lozito, S., McGann, A., Mackintosh, M. and Cashion, P. (1997). Free flight and self-separation from the flight deck perspective. The First United States/European Air Traffic Management Research and Development Seminar, Saclay, June 16-19, 1997.

Lysaght, R.J., Hill, S.G., Dick, A.O., Plamondon, B.D., Linton, P.M., Wierwille, W.W., Zaklad, A.L., Bittner, A.C. Jr. and Wherry, R.J. (1989). Operator workload: comprehensive review and evaluation of operator workload methodologies. U.S. Army Research Institute. Fort Bliss, TX: Technical Report No. 851 (MDA 903-86-C-0384).

Madsen, M. and Gregor, S. (2000) Measuring human-computer trust. In, Proceedings of the Eleventh Australasian Conference on Information Systems, Brisbane, 6-8 December.

Manning, C. (2000) Measuring air traffic controller performance in a high-fidelity simulation. DOT/FAA/AM-00/2. Federal Aviation Administration.

Manning, C., Mills, S., Mogilka, H., Hedge, J., Bruskiewicz, K. and Pfleiderer, E. (2000) Prediction of subjective ratings of air traffic controller performance by computer-derived measures of behavioral observations. In, Manning, C. (Ed) Measuring air traffic controller performance in a high-fidelity simulation. DOT/FAA/AM-00/2. Federal Aviation Administration.

Masalonis, A.J., Duley, J., Galster, S., Castano, D., Metzger, U. and Parasuraman, R. (1998) Air traffic controller trust in a conflict probe during Free Flight. Proc. of the 42nd Annual Meeting of the Human Factors and Ergonomics Society, 1607.

Masalonis, A.J. and Parasuraman, R. (1999) Trust as a construct for evaluation of automated aids: Past and future theory and research. Proc. of the Human Factors and Ergonomics Society 43rd Annual Meeting, 184-188.

Masson, M. and Pariès, J. (1998) Team Resource Management training for air traffic controllers. In, Proceedings of the Second EUROCONTROL Human Factors Workshop, Teamwork in Air Traffic Services. EUROCONTROL EATCHIP, Released Issue. Edition 1.0. 30/01/98.

Mogford, R.H. (1994) Mental Models and Situation Awareness in Air Traffic Control. The International Journal of Aviation Psychology, 7 (4).

Mogford, R.H. (1997) Mental models and situation awareness in air traffic control. International Journal of Aviation Psychology, 7 (4), 331-342.

Mogford, R.H., Murphy, E.D., Roske-Hofstrand, R.J., Yastrop, G. & Guttman, J.A. (1994). Research techniques for documenting cognitive processes in air traffic control: sector complexity and decision making (Report DOT/FAA/CT-TN94/3). Pleasantville, New Jersey: CTA Incorporated.

Mooij, H.A. (1995). Point of gaze measurement in aviation research. Paper presented at the 79th Symposium of the Aerospace Medical Council, 23-27 April, 1995, Brussels.

Moray, N. (1999) Monitoring, complacency, scepticism and eutectic behaviour. Proc. of CybErg 1999: The 2nd Int. Cyberspace Conf. on Ergonomics. Int. Ergonomics Assoc. Press.

Moray, N. (2001) Personal communication.

Moray, N., Inagaki, T. and Itoh, M. (2000) Adaptive automation, trust and self-confidence in fault management of time-critical tasks; J. of Experimental Psychology: Applied, 6 (1), 44-58.

Moray, N. and Inagaki, T. (2001) Attention and complacency. In press, Theoretical Issues in Ergonomics.

Morgan, B.B., Jr., Glickman, A., Woodard, E., Blaiwes, A. and Salas, E. (1986) Measurement of team behaviours in a navy environment. Rep. NTSC TR-86-014, Orlando, FL: Naval Training Systems Centre.

Mosier, K.L. and Chidester, T.R. (1991) Situation assessment and situation awareness in a team setting. In, Proceedings of the Special Session on Situational Awareness at the 11th Congress of the International Ergonomics Association, Paris, France, 17th July 1991.

Mouloua, M. and Koonce, J. (1997) Human-Automation Interaction: Research and Practice. Lawrence Erlbaum Associates.

Muir, B. (1987) Trust between humans and machines, and the design of decision aids; Int. J. Man-Machine Studies, 27, 527-539.

Muir, B. (1994) Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics, 37, 1905-1923.

Muir, B. and Moray, N. (1996) Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics, 39 (3), 429-460.

Muniz, E.J., Stout, R.J., Bowers, C.A. and Salas, E. (1998) A methodology for measuring team situational awareness: Situational Awareness Linked Indicators Adapted to Novel Tasks (SALIANT). Paper presented at the RTO HFM Symposium on Collaborative Crew Performance in Complex Operational Systems, Edinburgh, 20-22 April. RTO/NATO.

Neal, A., Griffin, M., Paterson, J. and Bordia, P. (1998) Human factors issues: Performance management transition to a CNS/ATM environment. (Final Report: Airservices Australia). Brisbane: University of Queensland.

Neisser, U. (1976) Cognition and reality: Principles and implications of cognitive psychology. San Francisco: W.H. Freeman.

Nickleby (1998) Instructional strategies for training teams: Study report. Nickleby Ltd. (QinetiQ/CHS contract B/C111/FD.2/01/01).

Niessen, C., Eyferth, K. and Bierwagen, T. (1999) Modelling cognitive processes of experienced air traffic controllers. Ergonomics, Vol. 42 (11), 1507-1520.

Nijhuis, H., Buck, S., Kelly, C., Goillau, P., Fassert, C., Maltier, L. and Cowell, P. (1999) WP8: Summary and consolidation of RHEA results. European Commission DGVII, Report RHEA/NL/WPR/8/04, 28th Feb.

Nieva, V.F., Fleishman, E.A. and Reick, A. (1978) Team dimensions: Their identity, their measurement and their relationships. Washington, DC: Advanced Research Resources Organization.

NRC (1997) Flight to the Future. Human Factors in Air Traffic Control. Panel on Human Factors in Air Traffic Control Automation. Commission on Behavioral and Social Sciences and Education, National Research Council. Washington, D.C.: National Academy Press.

NRC (1997) More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation's Information Infrastructure. Commission on Physical Sciences, Mathematics and Applications. National Research Council.

NTSC (undated) Training booklet for AEGIS Teamwork Observation Measure (ATOM). Orlando, FL: Naval Training Systems Center.

Nygren, T.E. (1991). Psychometric properties of subjective workload measurement techniques: Implications for their use in the assessment of perceived mental workload. Human Factors, 33, 17-33.

Ochanine, D. (1969) Rôle de l'image opérative dans la saisie du contenu informationnel des signaux. Questions de psychologie.

Ochanine, D. (1981) L'image opérative. Actes d'un séminaire et recueil d'articles, Centre d'Education Permanente, Département d'Ergonomie et d'Ecologie Humaine, Université Paris I.

Oser, R., McCallum, G., Salas, E. and Morgan, B. (1989) Toward a definition of teamwork: An analysis of critical team behaviour. NTSC TR 89-004, Orlando, FL: Naval Training Systems Center.

Parasuraman, R., Molloy, R. and Singh, I.L. (1993) Performance consequences of automation-induced "complacency". Int. J. of Aviation Psychology, 3, 1-23.

Parasuraman, R. and Riley, V. (1997) Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39 (2), 230-253.

Parasuraman, R., Sheridan, T.B. and Wickens, C. (2000) A model for types and levels of human interaction with automation. IEEE Trans. on Systems, Man and Cybernetics - Part A: Systems and Humans, Vol. 30, No. 3, 286-297.

Pew, R.W. (1979). Secondary tasks and workload measurement. In N. Moray (Ed.), Mental workload: its theory and measurement. pp. 23-28. New York: Plenum Press.

Phillipp, U., Reiche, D. & Kirchner, J.H. (1971). The use of subjective rating. Ergonomics, 14, 611-616.

Pocock, S., Harrison, M., Wright, P. & Johnson, P. THEA: A Technique for Human Error Assessment Early in Design. Proceedings Interact01, Japan, 2001.

Powers, W.T. (1973) Behavior: The control of perception. Chicago: Aldine.

Preece, J. et al. (1997) Human-Computer Interaction. Addison-Wesley.

Prince, C. & Salas, E. (1993). Training and research for teamwork in the military aircrew. In, E.L. Wiener, B.G. Kanki & R.L. Helmreich (Eds.), Cockpit Resource Management. Orlando, FL: Academic Press. 337-366.

Prince, C. and Salas, E. (2000) Team situation awareness, errors, and crew resource management: Research integration for training guidance. In, M.R. Endsley and D.J. Garland (Eds), Situation Awareness Analysis and Measurement. Mahwah, NJ: Lawrence Erlbaum Associates.

Pritchett, A.R., Hansman, R.J. & Johnson, E.N. (1996): "Use of testable responses for performance-based measurement of situation awareness". Conference on Experimental Analysis and Measurement of Situation Awareness, Daytona Beach.

Reichmuth, J., Schick, F., Adam, V., Hobein, A., Link, A., Teegen, U. and Tenoort, S. (1998) PD/2 Final Report. EUROCONTROL PHARE Report PHARE/DLR/PD/2-10.2/SSR;1.2. February.

Rempel, J.K., Holmes, J.G. and Zanna, M.P. (1985) Trust in close relationships. J. of Personality and Social Psychology, 49 (1), 95-112.

Riley, V. (1994) Human use of automation. Unpublished doctoral dissertation, University of Minnesota.

Rugg-Gunn, M., Cunningham, D., Grimshaw, T. and Lawrence, S. (1999) Team TNA methods for the procurement and selection of team trainers. Report No. DERA/CHS/MID/CR990376/1.0. Farnborough: QinetiQ Ltd.

Ruitenberg, B. (1996) CRM in ATC: Is it feasible? In, B.J. Hayward and A.R. Lowe (Eds), Applied Aviation Psychology. Achievement, Change and Challenge. Avebury Aviation. 247-256.

Ruitenberg, B. (1997) Situational awareness in ATC: a model. The Controller, Vol. 36, No. 1.

Ruitenberg, B. (1998) Teamwork for air traffic controllers. In, Proceedings of the Second EUROCONTROL Human Factors Workshop, Teamwork in Air Traffic Services. EUROCONTROL EATCHIP, Released Issue. Edition 1.0. 30/01/98.

Salas, E. and Cannon-Bowers, J.A. (2001) The science of training: A decade of progress. Annual Review of Psychology, Vol. 52, 471-499.

Salas, E., Muniz, E.J. and Prince, C. (2000) Situation awareness in teams. In, International Encyclopedia of Ergonomics and Human Factors. Taylor & Francis, 555-557.

Sarter, N. and Woods, D. (1991) Situation Awareness: A critical but ill-defined phenomenon. The International Journal of Aviation Psychology, 1 (1).

Salas, E., Prince, C., Baker, D.P. and Shrestha, L. (1995) Situation awareness in team performance: Implications for measurement and training. Human Factors, 37 (1), 123-136.

Scerbo, M.W. and Mouloua, M. (1999) Automation Technology and Human Performance: Current Research and Trends. Lawrence Erlbaum Associates.

Schick, F. (1997) PD/1 Final Report Annex D. Analysis of questionnaires. PHARE DOC 96-70-24, Version 1.1. Brussels: EUROCONTROL.

Schneider, F.B. (1999) Trust in Cyberspace. National Academy Press.

Sheridan, T.B. (1988) Trustworthiness of command and control systems. Proc. of Analysis, Design and Evaluation of Man-Machine Systems 1988, 3rd IFAC/IFIP/IEA/IFORS Conf., Finland, 14-16 June.

Shorrock, S. (1999) Human Error Identification for NATS Systems. R&DG File Reference 8RD/14/16/28/1017/HF.

Shorrock, S. (1999) Human Error Identification for the New Scottish Centre Human Machine Interface - Main Report. File 8RD/14/16/28/1017/HF.

Shorrock, S. (2000) Summary Paper on the Technique for the Retrospective and Predictive Analysis of Cognitive Errors. Draft.

Shorrock, S. (2000) Development of TRACEr lite – an Error Analysis Tool for Operational Incident Investigation. R&DG File Reference ATMDC/HF/1016/TN/1.0.

Shorrock, S. and Scaife, R. (2000) Evaluation of an alarm management system for an ATC centre. In, D. Harris (Ed) Engineering Psychology and Cognitive Ergonomics: Volumes 5 and 6. Ashgate Publishing.

Simpson, A. (1992) HCI issues in trust and acceptability; Defence Evaluation and Research Agency, Report No. DRA TM(CAD5) 92018, November.

Simpson, A. (1995) Seaworthy trust: Confidence in automated data fusion. In, R. Taylor and J. Reising (Eds), The Human-Electronic Crew: Can we Trust the Team? Proc. of the 3rd Int. Workshop on Human-Computer Teamwork. Defence Evaluation and Research Agency, Report No. CHS/HS3/TR95001/02, 77-81.

Smith-Jentsch, K., Johnston, J. and Payne, S. (1998) Measuring team-related expertise in complex environments. In, Cannon-Bowers, J. and Salas, E. (Eds), Making Decisions Under Stress: Implications for Individual and Team Training. Washington, DC: American Psychological Association.

Smith, K. and Hancock, P.A. (1995) Situation awareness is adaptive, externally directed consciousness. Human Factors, 37 (1), 137-148.

Stein, E.S. (1985) Air traffic controller workload: An examination of workload probe (Report FAA/CT-TN90/60). Atlantic City, New Jersey: FAA Technical Center.

Stoner, C. (1995) Controllers as air traffic managers. In, Proc. of Global NAVCOM '95, Montreal, 23-25 May.

Swain, A.D. and Guttman, H.E. (1983) Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications. NUREG/CR-1278.

Swezey, R.W. and Salas, E. (1992) Teams: their training and performance. Norwood, N.J.: Ablex Publishing.

Syer, J. and Connolly, C. (1996) How Teamwork Works: The dynamics of effective team development. McGraw-Hill Professional Publishing.

Tan, G. and Lewandowsky, S. (1996) A comparison of operator trust in humans versus machines. Proc. of CybErg 1996: The 1st Int. Cyberspace Conf. on Ergonomics. Int. Ergonomics Assoc. Press.

Taylor, R.M. (1988) Trust and awareness in human-electronic crew teamwork. In, The Human-Electronic Crew: Can They Work Together? Wright-Patterson AFB, OH, Report WRDC-TR-89-7008.

Taylor, R.M. (1989) Situational Awareness Rating Technique (SART): the development of a tool for aircrew systems design. In, AGARD Conference Proceedings No. 478, Situational Awareness in Aerospace Operations. Aerospace Medical Panel Symposium, Copenhagen, 2nd-6th October 1989.

Taylor, R.M. (1995a) CC-SART: The Development of an Experiential Measure of Cognitive Compatibility in System Design. Report to TTCP UTP-7 Human Factors in Aircraft Environments, Annual Meeting, DCIEM, Toronto, 12th-16th June 1995.

Taylor, R.M. (1995b) "Experiential Measures: Performance-Based Self Ratings of Situational Awareness". International Conference on Experimental Analysis and Measurement of Situation Awareness, Daytona Beach, 1st-3rd November 1995.

Taylor, R.M., Shadrake, R. and Haugh, J. (1995) Trust and adaptation failure: An experimental study of unco-operation awareness. In, R. Taylor and J. Reising (Eds), The Human-Electronic Crew: Can we Trust the Team? Proc. of the 3rd Int. Workshop on Human-Computer Teamwork. Defence Evaluation and Research Agency, Report No. CHS/HS3/TR95001/02, 93-98.

Tenney, Y.J., Adams, M.J., Pew, R.W., Huggins, A. and Rogers, W.H. (1992) A principled approach to the measurement of situation awareness in commercial aviation. NASA Contractor Report 4451, July.

Tole, J.R., Stephens, A.T., Vivaudou, Ephrath, A. & Young, L.R. (1983). Visual scanning behavior and pilot workload (NASA Contractor Report 3717). Hampton, Virginia: NASA Langley Research Center.

Vidulich, M.A. (2000) "Testing the Sensitivity of Situation Awareness Metrics in Interface Evaluations", in, M.R. Endsley & D.J. Garland (Eds), Situation Awareness Analysis and Measurement. Lawrence Erlbaum Associates, London.

Weinger, M.B. (1997) Human-user medical device interactions in the anesthesia work environment. In, M. Mouloua and J. Koonce (Eds), Human-Automation Interaction: Research and Practice. Lawrence Erlbaum Associates, 241-248.

Weston, R. (1983) Human factors in air traffic control. Int. Journal of Aviation Safety, 1, 94-104.

Whitaker, R. and Marsh, D. (1997) PD/1 Final Report. PHARE Report DOC 96-70-24, PHARE/NATS/PD1-10.2/SSR;1.1.

Whitfield, D. and Jackson, A. (1982) "The Air Traffic Controller's picture as an example of a Mental Model", in G. Johannsen and J.E. Rijnsdorp (Eds), Analysis, Design and Evaluation of Man-Machine Systems, Proceedings of IFAC Conf., Baden-Baden, Germany.

Wickens, C.D., Mavor, A.S. and McGee, P. (1997) Flight to the future. Human factors in air traffic control; Commission on Behavioral and Social Sciences and Education, National Research Council, National Academy Press.

Wiener, E.L. (1985) Beyond the sterile cockpit. Human Factors, 27 (1), 75-90.

Wiener, E.L. and Curry, R.E. (1980) Flightdeck automation: promises and problems; Ergonomics, 23 (10), 995-1011.

Wiener, E.L., Kanki, B.G. and Helmreich, R.L. (Eds) (1993) Cockpit Resource Management. Academic Press.

Wierwille, W. (1979). Physiological measures of aircrew mental workload. Human Factors, 21, 575-594.

Wierwille, W. & Connor, S. (1983). Evaluation of 20 workload measures using a psychomotor task in a moving base aircraft simulator. Human Factors, 25, 1-16.

Willems, B. (2000) Development of the Situation Awareness Verification and Analysis Tool (SAVANT). FAA William J. Hughes Technical Center. Unpublished.

Willems, B. (2001) Study of an ATC Baseline for the Evaluation of Team-configurations (SABET). FAA William J. Hughes Technical Center. Unpublished.

Willems, B., Heiney, M. and Endsley, M. (2001, in press) Decision Support Automation Research in the En Route Air Traffic Control Environment: Interim Report. Draft version, FAA Technical Center, Human Factors Laboratory, Atlantic City.

Wilson, J.R. & Corlett, E.N. (1999). Evaluation of Human Work. A Practical Ergonomics Methodology. London: Taylor & Francis.

Wise, J.A., Hopkin, V.D. and Smith, M.L. (1991) Automation and system issues in air traffic control; Springer-Verlag.

Yeh, Y-Y and Wickens, C.D. (1984) Why do performance and subjective workload measures dissociate? Proc. of the Human Factors Society 28th Annual Meeting, 504-508.

Yeh, Y. & Wickens, C.D. (1988). Dissociation of performance and subjective measures of workload. Human Factors, 30, 111-120.
