traceable ai - fedvte.usalearning.gov

Traceable AI

Table of Contents

Notices
Traceable
Traceable
AI transparency
Data transparency
Datasheets for Datasets (markdown)
Datasheets for Datasets
AI auditability for Cyber
Explainable AI (XAI) – DARPA
Explainable AI (XAI) Program
Traceable – Review

Notices


Copyright 2020 Carnegie Mellon University.

This material is based upon work funded and supported by the Department of Homeland Security under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center sponsored by the United States Department of Defense.

The view, opinions, and/or findings contained in this material are those of the author(s) and should not be construed as an official Government position, policy, or decision, unless designated by other documentation.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.

Internal use:* Permission to reproduce this material and to prepare derivative works from this material for internal use is granted, provided the copyright and “No Warranty” statements are included with all reproductions and derivative works.

External use:* This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other external and/or commercial use. Requests for permission should be directed to the Software Engineering Institute at [email protected].

* These restrictions do not apply to U.S. government entities.

DM20-0577

**119 Instructor: Let's talk about

Traceable

CISA | Cybersecurity and Infrastructure Security Agency


**061 traceable AI.

Traceable

AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedure and documentation.


**062 AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedures and documentation. I realize that is quite a bit to cover in one slide, and I'll be getting into the details of this.

AI transparency

Not “unknowable”

Impart confidence and trust to personnel

Technology and methodologies are explained appropriately

Access provided to details

Rationale for decisions and recommendations is provided


**063 So AI transparency is about understanding how the system works. These AI systems are often described as unknowable, and that is not what we want these systems to be. We want these systems to be understandable, to be knowable. Another term that is used is black box. I'm not going to be using that term today because it is based on racist imagery, but that is another term that you'll commonly hear, and people use this as a way of talking about how complex these systems are. But they don't have to be that complex. They can be designed to be knowable.

We want to impart confidence and trust to personnel who are using these systems, and the way to do that is by giving them more transparency into the system: how the system is working and what the system is doing.

Technology and methodologies need to be explained appropriately, and access needs to be provided to those details. Again, this is as appropriate for the individual: their permissions, their access, and similar considerations need to be taken into account.

The rationale also needs to be provided for decisions and recommendations that the system is making so that people really understand what is going on. This is the transparency aspect that will help to make systems that people can understand and trust.

Data transparency

Understand data

Provenance

Creator’s motivation, composition and collection

Transparency improves with use of:

Datasheets for Datasets*

Model Cards for ML systems


*Datasheets for Datasets. Working Paper by Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, Kate Crawford. https://arxiv.org/abs/1803.09010

**064 Data transparency is also important. This is about making sure that people understand the data and the information about the data. So what is the provenance of the data? What were the creator's motivations, what is the composition of the data, and how was it even collected? This kind of information can improve trust in the system, and it helps people to understand both the parameters and the limitations of the system.

Transparency will improve with the use of a variety of different support pieces that are being developed by researchers right now: datasheets for datasets, which I will talk about in more detail today, as well as model cards for ML systems. Both provide methods of describing the data and the information being presented, so that people have a clear idea of what that information is, how it was collected, how they can use it, and what purpose it was created for.

Datasheets for Datasets (markdown)


“markdown-datasheet-for-datasets” Josh Meyer.

GitHub: https://github.com/JRMeyer/markdown-datasheet-for-datasets/blob/master/DATASHEET.md

**065 A datasheet for a dataset is basically a set of questions that you answer as you're thinking about your data, and this is an example of what a datasheet might look like. This markdown was created by Josh Meyer, based on the initial paper about datasheets for datasets, and you can see some of the questions.

For what purpose was this dataset created? It is about being very explicit about what this data is and why it exists.

Datasheets for Datasets

Transparency and clarity

Motivation

Composition

Collection

Preprocessing / Cleaning / Labeling

Uses

Distribution

Maintenance


Datasheets for Datasets. Working Paper by Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, Kate Crawford. https://arxiv.org/abs/1803.09010

**066 And there are a variety of different sets of questions in datasheets for datasets. They include motivation, composition, collection, preprocessing, cleaning and labeling, uses, distribution, and maintenance. Each of these has a set of questions with it that will help your team to better describe that data, and this will then provide more transparency to the people using and accessing that data.
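One way a team might keep track of these question sets is sketched below. This is a minimal illustration, not the format from the Datasheets for Datasets paper or from Josh Meyer's markdown template: the `Datasheet` class, its methods, the dataset name, and the sample answer are all hypothetical. Only the section names and the sample question come from the slides.

```python
from dataclasses import dataclass, field

# Section names as listed on the slide.
SECTIONS = (
    "Motivation",
    "Composition",
    "Collection",
    "Preprocessing / Cleaning / Labeling",
    "Uses",
    "Distribution",
    "Maintenance",
)


@dataclass
class Datasheet:
    """Hypothetical container mapping section name -> {question: answer}."""
    dataset_name: str
    answers: dict = field(default_factory=dict)

    def answer(self, section: str, question: str, text: str) -> None:
        """Record an answer, rejecting sections that are not on the slide."""
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        self.answers.setdefault(section, {})[question] = text.strip()

    def missing_sections(self) -> list:
        """Sections that have no non-empty answer yet, in slide order."""
        return [
            s for s in SECTIONS
            if not any(self.answers.get(s, {}).values())
        ]

    def to_markdown(self) -> str:
        """Render the answered questions as a simple markdown document."""
        lines = [f"# Datasheet: {self.dataset_name}"]
        for s in SECTIONS:
            lines.append(f"\n## {s}")
            for q, a in self.answers.get(s, {}).items():
                lines.append(f"\n**{q}**\n\n{a}")
        return "\n".join(lines)


sheet = Datasheet("example-netflow-logs")  # hypothetical dataset name
sheet.answer(
    "Motivation",
    "For what purpose was this dataset created?",
    "Illustrative answer: to train an intrusion-detection prototype.",
)
print(sheet.missing_sections())  # sections the team still has to fill in
```

Listing the unanswered sections gives a team a simple completeness check before a dataset is shared; the questions themselves should be taken from the paper or the markdown template cited above.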

AI auditability for Cyber

Probe with hypothetical cases

Checks for bias, brittleness or potential distribution shift

Access to history of system operation

Logs

User access*

Records of user and purpose*

Mitigate harms of off-label use

Reinforce principle of Responsibility


*Consider ethical principles when determining what data needs to be collected.

**067 For cyber, AI needs to be auditable to be helpful for the individuals who are using this technology. We need to be able to probe the system with hypothetical cases. We need to be able to check for bias, brittleness, and potential distribution shift within the data. We need to be able to access the history of the system's operation, and in many cases we need to keep logs. We need to be careful with those logs because there can be ethical considerations that need to be addressed.
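As a rough illustration of probing with hypothetical cases and checking for distribution shift, the sketch below varies one input of a toy scoring rule and reports where the decision flips, then compares recent data against reference data. Everything here is an assumption made for the example: the `score` rule and its weights, the feature names, the threshold, and the sample numbers. The shift measure (difference in means divided by the reference standard deviation) is one simple choice among many.

```python
import statistics


def score(event: dict) -> int:
    """Toy stand-in for a model (hypothetical weights): higher is more suspicious."""
    return 2 * event["failed_logins"] + 50 * event["new_device"]


def decide(event: dict, threshold: int = 60) -> str:
    return "alert" if score(event) >= threshold else "allow"


def probe(base: dict, feature: str, values) -> list:
    """Vary one feature of a hypothetical case and record each decision."""
    return [(v, decide({**base, feature: v})) for v in values]


def flip_points(results: list) -> list:
    """Adjacent probe values between which the decision changes."""
    return [(a[0], b[0]) for a, b in zip(results, results[1:]) if a[1] != b[1]]


def mean_shift(reference: list, current: list) -> float:
    """Difference in means, in units of the reference standard deviation."""
    spread = statistics.pstdev(reference)
    if spread == 0:
        raise ValueError("reference data has no variation")
    return abs(statistics.mean(current) - statistics.mean(reference)) / spread


base_case = {"failed_logins": 3, "new_device": 1}
results = probe(base_case, "failed_logins", range(0, 11))
print(flip_points(results))  # where the decision flips as failed_logins grows

training_logins = [2, 3, 4, 3, 2, 4]  # illustrative numbers only
recent_logins = [7, 8, 9, 8, 7, 9]
print(round(mean_shift(training_logins, recent_logins), 2))
```

A decision that flips on a very small change to a single input is a hint of brittleness worth reviewing, and a large shift value suggests the system is now seeing data unlike what it was built on.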

So, for example, with the user's access: how much information do we really need to collect about it? With records of users and purposes, again, how much do we need to collect to be safe and to be auditable, and how much would be more than is necessary? We still need to be protecting our users.
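One possible way to balance auditability against collecting too much is to log who did what and why, while storing a keyed pseudonym instead of the raw user identifier. The sketch below assumes that approach; the field names, the `audit_record` helper, the sample user, and the demo key are hypothetical, and a real deployment would need its own decisions about key management and retention.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def pseudonymize(user_id: str, key: bytes) -> str:
    """Keyed hash so the log holds no raw identifier (truncated for readability)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def audit_record(user_id: str, purpose: str, action: str, key: bytes, now=None) -> dict:
    """Build one log entry with only the fields needed for an audit."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "user": pseudonymize(user_id, key),
        "purpose": purpose,
        "action": action,
    }


key = b"demo-key-not-for-production"  # placeholder; keep real keys out of source code
rec = audit_record("alice@example.org", "incident triage", "query_model", key)
print(json.dumps(rec, sort_keys=True))
```

Because the same user and key always produce the same pseudonym, an auditor can still follow one user's activity across entries, and whoever holds the key can re-identify a user if an investigation requires it.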

We need to mitigate the harms of off-label use. If someone is taking data that we've created and using it for a different purpose, or taking the entire AI system and using it for a different purpose, how can we reduce the harms that could potentially come from that use?

And then we need to reinforce the principle of responsibility that we've already talked about.

Explainable AI (XAI) – DARPA

**068 Explainable AI is an effort by DARPA to dig deeper into this problem and make systems that are transparent and provide the answers to these types of questions. Why did you do that? Why not something else? When did you fail? What is going on with the system? Really understanding the basics about the system helps to engender trust among the users and generally is a helpful way to make a system understandable and explainable to the end user.

Explainable AI (XAI) Program

Aims to create a suite of ML techniques that:

Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and

Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.

Users can

Interpret what the AI system did

Understand AI system’s limitations


**069 And this program with DARPA aims to create a suite of machine learning techniques that produce more explainable models while maintaining a high level of learning performance, so it will not reduce the prediction accuracy. The program also aims to enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners. These are the goals of this project that DARPA is doing. They want users to be able to interpret what the AI system did and understand the AI system's limitations, and again, this is by design. They're designing these systems to do that work and encouraging others to do the same.
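To make "interpret what the AI system did" and "understand the AI system's limitations" concrete, here is a small sketch of a decision that carries its own explanation. It is not a technique from the DARPA XAI program; it assumes a simple additive scoring rule, where each input's contribution can be read off directly. The weights, threshold, feature names, and the `explain` function are all hypothetical.

```python
# Hypothetical weights for a simple additive scoring rule.
WEIGHTS = {"failed_logins": 2, "new_device": 50, "off_hours": 15}
THRESHOLD = 60


def explain(event: dict) -> dict:
    """Return the decision together with the reasons behind it."""
    # Each feature's contribution is its weight times its value.
    contributions = {name: WEIGHTS[name] * event.get(name, 0) for name in WEIGHTS}
    total = sum(contributions.values())
    # Largest contributions first, so the main reasons lead the report.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    # Inputs the rule does not use are reported as a stated limitation.
    ignored = sorted(set(event) - set(WEIGHTS))
    return {
        "decision": "alert" if total >= THRESHOLD else "allow",
        "score": total,
        "threshold": THRESHOLD,
        "reasons": ranked,
        "ignored_inputs": ignored,
    }


report = explain(
    {"failed_logins": 4, "new_device": 1, "off_hours": 1, "geo": "unknown"}
)
for key, value in report.items():
    print(f"{key}: {value}")
```

The report answers "why did you do that?" through the ranked reasons, and the list of ignored inputs tells the user what the system did not take into account. More complex models need dedicated explanation techniques, which is the kind of work the program described above is aimed at.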

Traceable – Review

Relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods…

Transparency for Cyber

Data Transparency

Auditability for Cyber

Explainable AI – DARPA


**070 So to review, with traceable technology, relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods, and this is something that we can do. Making systems transparent is a lot of work, but the benefit is that we get the trust of the users. Our personnel, and the people who are using the system and accessing the data, feel that the system is transparent, and therefore they are more likely to trust the system appropriately.

If it's auditable, we can actually track what has happened and how it has been used. We also want to make systems that are explainable and understandable, and we can use DARPA's guidance as one way of improving those systems.
