Lecture 15: Course Conclusion


Page 1: Lecture 15: Course Conclusion


Lecture 15: Course Conclusion

Page 2: Lecture 15: Course Conclusion


Announcements
● TA office hours will continue to be project advising sessions during this week
○ Sign up on the spreadsheet (see Ed announcement)
○ Attendance is worth 5% of the project grade
● Final Project Poster Session: Thu 12/9, 12:15-3:15pm
● Final Project Report due Fri 12/10, 11:59pm

Page 3: Lecture 15: Course Conclusion


This course: foundations of AI in healthcare

Page 4: Lecture 15: Course Conclusion


This course: foundations of AI in healthcare

Page 5: Lecture 15: Course Conclusion


Convergence of key ingredients of deep learning: algorithms, compute, and data

Page 6: Lecture 15: Course Conclusion


Different classes of neural networks
● Fully connected neural networks (linear layers; good for “feature vector” inputs)
● Convolutional neural networks (convolutional layers; good for image inputs)
● Recurrent neural networks (linear layers modeling a recurrence relation across a sequence; good for sequence inputs, mapping an input sequence to an output sequence)
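As a rough illustration, here is how the three layer types look in PyTorch (a minimal sketch; all sizes are illustrative, and PyTorch itself is just one possible framework):

```python
import torch
import torch.nn as nn

x_vec = torch.randn(8, 32)          # batch of 8 feature vectors, 32-dim
x_img = torch.randn(8, 3, 64, 64)   # batch of 8 RGB images
x_seq = torch.randn(8, 20, 32)      # batch of 8 sequences, 20 steps, 32-dim each

fc = nn.Linear(32, 16)                                           # fully connected layer
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)  # convolutional layer
rnn = nn.RNN(input_size=32, hidden_size=16, batch_first=True)    # recurrent layer

print(fc(x_vec).shape)    # torch.Size([8, 16])
print(conv(x_img).shape)  # torch.Size([8, 16, 62, 62])
out, h = rnn(x_seq)       # output at every timestep, plus final hidden state
print(out.shape)          # torch.Size([8, 20, 16])
```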

Page 7: Lecture 15: Course Conclusion


Two-layer fully-connected neural network

Neural network parameters: $W_1, W_2$

Output: $\hat{y} = W_2\,\sigma(W_1 x)$, where $\sigma$ is an elementwise nonlinearity

Loss function (regression loss, same as before):
Per-example: $L_i = (\hat{y}_i - y_i)^2$
Over M examples: $L = \frac{1}{M}\sum_{i=1}^{M} L_i$

Gradient of loss w.r.t. weights: the function is now more complex, so it is much harder to derive the expressions by hand. Instead: computational graphs and backpropagation.
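A minimal sketch of this, assuming a ReLU nonlinearity and letting PyTorch's autograd stand in for the hand-derived gradients (sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 32),   # W1 (plus bias)
    nn.ReLU(),           # nonlinearity sigma
    nn.Linear(32, 1),    # W2 (plus bias)
)
x = torch.randn(16, 10)  # M = 16 examples
y = torch.randn(16, 1)   # continuous regression targets

y_hat = model(x)                   # forward pass builds the computational graph
loss = ((y_hat - y) ** 2).mean()   # squared error, averaged over M examples
loss.backward()                    # backpropagation fills in .grad for every parameter
print(model[0].weight.grad.shape)  # gradient of the loss w.r.t. W1
```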

Page 8: Lecture 15: Course Conclusion


ResNet [He et al., 2015]

Residual block: two 3x3 conv layers with a ReLU in between; the block outputs F(x) + x, i.e., the learned residual F(x) plus the identity shortcut x, followed by a final ReLU.

Full ResNet architecture:
- Stack residual blocks
- Every residual block has two 3x3 conv layers
- Periodically, double the # of filters and downsample spatially using stride 2 (/2 in each dimension), e.g., 3x3 conv, 64 -> 3x3 conv, 128, /2 -> ... -> 3x3 conv, 512
- Additional conv layer at the beginning (7x7 conv, 64, /2, then pooling)
- No FC layers besides FC 1000 to output classes (input -> conv/pool -> residual blocks -> pool -> FC 1000 -> softmax)

Slide credit: CS231n
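A minimal PyTorch sketch of the basic residual block described above (batch-norm placement follows the common "basic block" pattern; the channel count is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convs plus an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # first 3x3 conv
        out = self.bn2(self.conv2(out))        # second 3x3 conv: F(x)
        return F.relu(out + x)                 # F(x) + x, then ReLU

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```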

Page 9: Lecture 15: Course Conclusion


Common loss functions

Regression: label is a continuous value. Minimize the squared difference between the prediction output and the target.

Binary cross-entropy (BCE): label is binary in {0,1}. The prediction is a real number in (0,1) giving the probability of the label being 1, and is usually the output of a sigmoid operation after the final layer. The loss is equivalent to the negative log of the predicted probability of the correct ground-truth class. Think about what the expression looks like when y_i = 1 vs. 0.

Softmax: label is 1 of K classes in {0, …, K-1}. Extension of binary cross-entropy loss to multiple classes. s_j corresponds to the score (e.g., output of the final layer) for each class; the fraction inside the log provides a normalized probability for each class. The loss is the negative log of the probability of the true class y_i, as with the BCE loss.

SVM: label is 1 of K classes in {0, …, K-1}. Same use case as softmax, but a different way of encouraging the model to produce outputs that we “like”: it incurs the lowest loss of 0 (what we want) if the score for the true class y_i is greater than the score for each incorrect class j by a margin of 1. In practice, softmax is more popular and provides a nice probabilistic interpretation.
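All four losses are available in PyTorch; a small sketch with toy values (the library calls are standard, but the numbers are illustrative):

```python
import torch
import torch.nn.functional as F

# Regression: mean squared error
pred, target = torch.tensor([2.5]), torch.tensor([3.0])
mse = F.mse_loss(pred, target)

# Binary cross-entropy: logit -> sigmoid -> probability of label 1
logit, y = torch.tensor([0.8]), torch.tensor([1.0])
bce = F.binary_cross_entropy_with_logits(logit, y)  # = -log(sigmoid(0.8)) here

# Softmax cross-entropy over K classes: negative log-probability of the true class
scores = torch.tensor([[1.0, 2.0, 0.5]])  # s_j for K = 3 classes
y_true = torch.tensor([1])
ce = F.cross_entropy(scores, y_true)       # = -log(softmax(scores)[0, 1])

# Multiclass SVM (hinge) loss with margin 1
hinge = F.multi_margin_loss(scores, y_true, margin=1.0)

print(mse.item(), bce.item(), ce.item(), hinge.item())
```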

Page 10: Lecture 15: Course Conclusion


Evaluation metrics

- Receiver Operating Characteristic (ROC) curve:
- Plots sensitivity against 1 - specificity, i.e., true positive rate (TPR) on the y-axis vs. false positive rate (FPR) on the x-axis, as the prediction threshold is varied
- Gives the trade-off between sensitivity and specificity
- Also report the summary statistic AUC (area under the curve)
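A minimal sketch using scikit-learn (toy labels and scores, for illustration only):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5])  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, y_score)               # summary statistic: area under the curve
print(f"AUC = {auc:.3f}")
```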

Page 11: Lecture 15: Course Conclusion


Ciompi et al. 2015

Ciompi et al. Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box. Medical Image Analysis, 2015.

- Task: classification of lung nodules in 3D CT scans as peri-fissural nodules (PFN, likely to be benign) or not

- Dataset: 568 nodules from 1729 scans at a single institution. (65 typical PFNs, 19 atypical PFNs, 484 non-PFNs).

- Data pre-processing: prescaling from CT Hounsfield units (HU) into [0,255]. Replicate 3x across R,G,B channels to match the input dimensions of ImageNet-trained CNNs.
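A sketch of that pre-processing step (the HU window bounds below are assumptions for illustration, not the paper's exact values):

```python
import numpy as np

hu_slice = np.random.uniform(-1000, 400, size=(224, 224))  # stands in for a CT slice in HU
lo, hi = -1000.0, 400.0                                    # assumed HU window

# Prescale HU into [0, 255], then replicate across 3 channels for an ImageNet-trained CNN.
scaled = np.clip((hu_slice - lo) / (hi - lo) * 255.0, 0, 255).astype(np.uint8)
rgb = np.stack([scaled] * 3, axis=-1)
print(rgb.shape)  # (224, 224, 3)
```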

Page 12: Lecture 15: Course Conclusion


Gulshan et al. 2016
- Dataset:
- 128,175 images, each graded by 3-7 ophthalmologists.
- 54 total graders, each paid to grade between 20 and 62,508 images.
- Data preprocessing:
- Circular mask of each image was detected and rescaled to be 299 pixels wide
- Model:
- Inception-v3 CNN, with ImageNet pre-training
- Multiple BCE losses corresponding to different binary prediction problems, which were then used for the final determination of referable diabetic retinopathy
- Graders provided finer-grained labels, which were then consolidated into (easier) binary prediction problems

Gulshan, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 2016.

Page 13: Lecture 15: Course Conclusion


Richer visual recognition tasks: segmentation and detection

Figures: Chen et al. 2016. https://arxiv.org/pdf/1604.02677.pdf

Classification: output is one category label for the image (e.g., colorectal glands)

Semantic segmentation: output is a category label for each pixel in the image

Detection: output is a spatial bounding box for each instance of a category object in the image

Instance segmentation: output is a category label and instance label for each pixel in the image; distinguishes between different instances of an object

Page 14: Lecture 15: Course Conclusion


Lung nodule segmentation
- E.g., Liu et al. 2018
- Dataset: Lung Nodule Analysis (LUNA) challenge, 888 512x512 CT scans from the Lung Image Database Consortium (LIDC-IDRI).
- Performed 2D instance segmentation on 2D CT slices

Liu et al. Segmentation of Lung Nodule in CT Images Based on Mask R-CNN. 2018.

We will see other ways to handle 3D medical data types in the next lecture

Page 15: Lecture 15: Course Conclusion


Example: instance segmentation of cell nuclei

Page 16: Lecture 15: Course Conclusion


3D convolutions

Figure credit: https://www.researchgate.net/profile/Deepak_Mishra19/publication/330912338/figure/fig1/AS:723363244810254@1549474645742/Basic-3D-CNN-architecture-the-3D-filter-is-convolved-with-the-video-in-three-dimensions.png

Slide the filter along 3 directions: x, y, and z! Here x, y, z are spatial and/or temporal dimensions; the filter (e.g., a 5 x 5 x 3 x 10 filter) goes all the way through the “channels” dimension (e.g., R,G,B) as before.

When might you use 3D convolutions?
- Ex: 224 x 224 x 1 x 256 3D CT scan (with 256 slices)
- Ex: 224 x 224 x 3 x 500 video data (with 500 temporal frames)
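A minimal PyTorch sketch of a 3D convolution over a CT volume (sizes match the CT example above; the cubic 5x5x5 kernel is illustrative):

```python
import torch
import torch.nn as nn

# A 3D CT volume: batch of 1, 1 channel, 256 slices of 224 x 224 (depth, height, width).
ct = torch.randn(1, 1, 256, 224, 224)

# The 3D filter slides along x, y, and z, and spans the full channel dimension as in 2D.
conv3d = nn.Conv3d(in_channels=1, out_channels=10, kernel_size=5)
print(conv3d(ct).shape)  # torch.Size([1, 10, 252, 220, 220])
```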

Page 17: Lecture 15: Course Conclusion


I3D: 3D convolutional network for video data
- Uses an Inception Module (Inc.) with 3D convolutions: a 3D version of the Inception module from the Inception network (also known as GoogLeNet)
- Can pre-train from 2D datasets, e.g., ImageNet, by replicating and normalizing the 2D weights over the additional dimension!
- Note: in general, many 2D architectures can be “3D-ified”!

Carreira and Zisserman. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. CVPR 2017.

Page 18: Lecture 15: Course Conclusion


For richer visual recognition tasks, can also extend respective CNN architectures to use 3D convolutions

Figures: Chen et al. 2016. https://arxiv.org/pdf/1604.02677.pdf

Classification: output is one category label for the image (e.g., colorectal glands)

Semantic segmentation: output is a category label for each pixel in the image

Detection: output is a spatial bounding box for each instance of a category object in the image

Instance segmentation: output is a category label and instance label for each pixel in the image

Page 19: Lecture 15: Course Conclusion


E.g., 3D U-Net. Ex: 3D segmentation of Xenopus kidney in confocal microscopic data
- Spatial dims: ~250 x 250 x 60. 3 channels: each channel corresponds to a different type of data capture
- Used only 3 samples total (with a total of 77 annotated 2D slices)! Leverages the fact that each sample contains many instances of the same repetitive structures, with variation.

Cicek et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. MICCAI 2016.

Page 20: Lecture 15: Course Conclusion


What are electronic health records?

Figure credit: Rajkomar et al. 2018

Patient chart in digital form, containing medical and treatment history

Medical imaging and lab test results and reports

Page 21: Lecture 15: Course Conclusion


A real example of EHR data: MIMIC-III dataset

Johnson et al. MIMIC-III, a freely accessible critical care database. 2016.

Page 22: Lecture 15: Course Conclusion


CPT (Current Procedural Terminology): codes for procedures and services

Johnson et al. MIMIC-III, a freely accessible critical care database. 2016.
Additional figure credit: https://d20ohkaloyme4g.cloudfront.net/img/document_thumbnails/e570ad571499b88c8814e7366594e9bd/thumb_1200_1553.png

Page 23: Lecture 15: Course Conclusion


(Vanilla) Recurrent Neural Network

The RNN maps an input x to an output y through a state consisting of a single “hidden” vector h, updated by fully connected layers:

$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$
$y_t = W_{hy} h_t$

Slide credit: CS231n

Page 24: Lecture 15: Course Conclusion


RNN: Computational Graph: Many to Many

The same function $f_W$ (with shared weights W) is applied at every timestep: $h_t = f_W(h_{t-1}, x_t)$. Each hidden state produces an output $y_t$ with a per-timestep loss $L_t$, and the total loss is $L = \sum_t L_t$.

Slide credit: CS231n
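A minimal NumPy sketch of this unrolled recurrence (weights and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 4                              # input dim, hidden dim
W_xh = rng.normal(scale=0.1, size=(H, D))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(1, H))

xs = rng.normal(size=(5, D))             # a sequence of 5 input vectors
h = np.zeros(H)                          # initial hidden state h_0
for x_t in xs:                           # the same weights W are reused at every timestep
    h = np.tanh(W_hh @ h + W_xh @ x_t)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy @ h                       # per-timestep output y_t
    print(y_t)
```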

Page 25: Lecture 15: Course Conclusion


Harutyunyan et al.: phenotypes
- Input: time-series data corresponding to an entire ICU stay
- Output: multilabel classification of the presence of 25 acute care conditions (merged from ICD codes) in the stay record

Q: Why do we formulate this as a multi-label classification task?
A: Comorbidities (co-occurring conditions)

Q: What loss function should we use?
A: Multiple binary cross-entropy losses

Figure credit: Harutyunyan et al. Multitask learning and benchmarking with clinical time series data. 2019.
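A minimal sketch of the multi-label loss (batch size and labels are illustrative):

```python
import torch
import torch.nn as nn

num_conditions = 25
logits = torch.randn(16, num_conditions)                     # model outputs for 16 stays
labels = torch.randint(0, 2, (16, num_conditions)).float()   # each stay can have several conditions

# BCEWithLogitsLoss applies a sigmoid per condition, so the 25 predictions
# are independent binary problems rather than one softmax over classes.
loss = nn.BCEWithLogitsLoss()(logits, labels)
print(loss.item())
```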

Page 26: Lecture 15: Course Conclusion


OMOP Common Data Model

Figure credit: https://ohdsi.github.io/TheBookOfOhdsi/images/CommonDataModel/cdmDiagram.png

Page 27: Lecture 15: Course Conclusion


FHIR

Figure credit: Choi et al. OHDSI on FHIR Platform Development with OMOP CDM mapping to FHIR Resources. 2016.

Data from all sources can be written into an OMOP data repository for analysis

Page 28: Lecture 15: Course Conclusion


Data representation

Raw data as FHIR resources

Rajkomar et al. Scalable and accurate deep learning with electronic health records. Npj Digital Medicine, 2018.

Page 29: Lecture 15: Course Conclusion


Token embeddings

A 1xN token input (a one-hot selection of a token, e.g., [0 0 1 0 0 0 0 …. 0]) multiplied by an N x D embedding matrix selects one row: a D-dim token embedding, e.g., X = [0.5 0.8 0.2].

In general, learned embedding matrices are a useful way to map discrete data into a semantically meaningful, continuous space! We will see them frequently in natural language processing.
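A minimal PyTorch sketch of the lookup (vocabulary size and dimensions are illustrative):

```python
import torch
import torch.nn as nn

N, D = 1000, 3                   # vocabulary size N, embedding dim D
embedding = nn.Embedding(N, D)   # learnable N x D embedding matrix

token_id = torch.tensor([2])     # equivalent to a one-hot vector with a 1 at index 2
x = embedding(token_id)          # row lookup: the D-dim embedding for this token
print(x.shape)                   # torch.Size([1, 3])

# Equivalent view: multiplying a 1xN one-hot vector by the N x D matrix.
one_hot = torch.zeros(1, N)
one_hot[0, 2] = 1.0
x_alt = one_hot @ embedding.weight
print(torch.allclose(x, x_alt))  # True
```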

Page 30: Lecture 15: Course Conclusion


Word embeddings

Same mechanism as token embeddings: a 1xN one-hot input selects a row of an N x D embedding matrix, yielding a D-dim embedding.

Words come from a discrete vocabulary! Can learn word embeddings using a similar framework.

Page 31: Lecture 15: Course Conclusion


Skip-gram model

Take the word embedding (feature vector) $h_t = E x_t$ of the word at the t-th position, and use it to predict the word identity of a set of neighboring positions $x_{t-2}, x_{t-1}, x_{t+1}, x_{t+2}$, with corresponding losses $L_{t-2}, L_{t-1}, L_{t+1}, L_{t+2}$. (Each is an N-way classification if the dictionary has N words.)

Can train using a classification loss (e.g., softmax loss) based only on the text structure, without any external labels!

Captures the notion that words occurring in similar contexts should have similar feature vectors (word embeddings).

Aside: trying to learn “good” feature representations using loss functions based on inherent structure in the data, as opposed to external labels, is a currently active area of research called “self-supervised learning”.

Mikolov et al. Efficient Estimation of Word Representations in Vector Space, 2013.
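A toy training sketch in this spirit (random word ids stand in for real text, and sizes are illustrative; Mikolov et al.'s actual formulation adds tricks such as negative sampling):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, D = 50, 16                         # vocabulary size, embedding dim
embed = nn.Embedding(N, D)            # center-word embedding matrix E
out = nn.Linear(D, N)                 # predicts a distribution over the vocabulary
opt = torch.optim.Adam(list(embed.parameters()) + list(out.parameters()), lr=1e-2)

corpus = torch.randint(0, N, (100,))  # toy "text": a sequence of 100 word ids
window = 2
for t in range(window, len(corpus) - window):
    center = corpus[t].unsqueeze(0)
    h_t = embed(center)               # embedding of the word at position t
    loss = 0.0
    for offset in (-2, -1, 1, 2):     # predict each neighboring word's identity
        target = corpus[t + offset].unsqueeze(0)
        loss = loss + F.cross_entropy(out(h_t), target)
    opt.zero_grad(); loss.backward(); opt.step()
```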

Page 32: Lecture 15: Course Conclusion


Transformer architecture framework
- Recent approach for sequence processing based on “self-attention” (Vaswani et al. 2017). BERT uses an encoder stack of “encoder layers”, each consisting of encoder self-attention followed by a feed-forward sublayer (the original Transformer also had decoder layers). The input is a token sequence, e.g., “abnormal findings lung...”.

Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018.
Vaswani et al. Attention is All You Need, 2017.

Page 33: Lecture 15: Course Conclusion


Training BERT
1. Predict randomly masked words in sentence inputs (classification). Input sequences begin with a start token ([CLS]), and masked words are replaced with a [MASK] token.
2. Input sentence pairs separated by a [SEP] token; predict whether the 2nd sentence follows the 1st in the text.

Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018.
Vaswani et al. Attention is All You Need, 2017.

Page 34: Lecture 15: Course Conclusion


ClinicalBERT: training on clinical notes (from MIMIC)

Fine-tuning ClinicalBERT for prediction of 30-day hospital readmission:
- Use the hidden state corresponding to the [CLS] token
- When performing prediction from long sequences, obtain predictions for each sentence separately and then combine them

Huang et al. ClinicalBert: Modeling Clinical Notes and Predicting Hospital Readmission, 2019.
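A minimal sketch of [CLS]-based fine-tuning with the Hugging Face transformers API (the generic bert-base-uncased checkpoint and the linear head are illustrative stand-ins, not the ClinicalBERT weights):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
readmission_head = torch.nn.Linear(bert.config.hidden_size, 1)

inputs = tokenizer("Patient admitted with abnormal findings in lung ...", return_tensors="pt")
outputs = bert(**inputs)
cls_hidden = outputs.last_hidden_state[:, 0]  # hidden state of the [CLS] token
logit = readmission_head(cls_hidden)          # binary readmission prediction (train with BCE)
print(torch.sigmoid(logit))
```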

Page 35: Lecture 15: Course Conclusion


Some biology basics: starting from DNA

Figure credit: virtualmedicalcentre.com
Figure credit: https://en.wikipedia.org/wiki/Nucleobase#/media/File:DNA_chemical_structure.svg

Page 36: Lecture 15: Course Conclusion


Transcription and translation

Figure credit: https://www.cancer.gov/images/cdr/live/CDR761782-571.jpg

Transcription: DNA -> RNA

Translation: RNA -> Protein

Page 37: Lecture 15: Course Conclusion


Many data types, e.g. RNA-seq

Produces readout of mRNA content in a tissue sample

Figure credit: https://cdn.technologynetworks.com/tn/images/body/dnasequencinga1529596208892.png

Map back to reference genome for analysis

Now the standard approach for transcriptomics studies

More recently, in the 2010s: single-cell RNA-seq!

Page 38: Lecture 15: Course Conclusion


ENCODE: identifying and analyzing all functional elements in the human genome

Figure credit: https://www.encodeproject.org/

- Launched by US National Human Genome Research Institute in 2003

- Contributions from worldwide consortium of research groups

Page 39: Lecture 15: Course Conclusion


DeepSEA

Predicts chromatin effects of (non-coding) sequence alterations with single-nucleotide sensitivity (SNPs: single-nucleotide polymorphisms)

Input: DNA sequence pair with SNP
Output: predicted chromatin effects (919 total)
- 690 transcription factor profiles
- 125 DNase I hypersensitive site (DHS) profiles (looser chromatin structure, easier protein binding)
- 104 histone-mark profiles (histone modifications)

Multi-task training! Multi-task prediction of 919 chromatin profiles, for each allele (variant)

Zhou and Troyanskaya. Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 2015.
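A minimal multi-task sketch in this spirit (the layer sizes below are illustrative, not DeepSEA's published architecture): one shared trunk over one-hot DNA, with 919 jointly trained binary outputs.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(4, 64, kernel_size=8),  # input: one-hot DNA (A,C,G,T) over a 1000-bp window
    nn.ReLU(),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(64, 919),               # one logit per chromatin profile
)
seq = torch.randn(2, 4, 1000)         # batch of 2 one-hot-encoded sequences (toy values)
labels = torch.randint(0, 2, (2, 919)).float()
loss = nn.BCEWithLogitsLoss()(model(seq), labels)  # 919 binary tasks trained jointly
print(loss.item())
```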

Page 40: Lecture 15: Course Conclusion


Multimodal data
Can be very similar, e.g., different image acquisition variants

Figure credit: Dong et al. MIUA, 2017.

Page 41: Lecture 15: Course Conclusion


Multimodal data
Or very different, e.g., different types of clinical data

Figure credit: Rajkomar et al. 2018.

Page 42: Lecture 15: Course Conclusion


Categorizations of multimodal models

Joint fusion: both modality-specific components (with learnable parameters) and combined-modality components within the model, all of which are updated during model training

Huang et al. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines, 2020.
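A minimal joint-fusion sketch (hypothetical encoders and sizes): because the modality-specific encoders and the combined head live in one model, a single backward pass updates all of them together.

```python
import torch
import torch.nn as nn

class JointFusionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Modality-specific components (learnable)
        self.image_encoder = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(),
                                           nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.ehr_encoder = nn.Sequential(nn.Linear(20, 8), nn.ReLU())
        # Combined-modality component
        self.head = nn.Linear(8 + 8, 1)

    def forward(self, image, ehr):
        fused = torch.cat([self.image_encoder(image), self.ehr_encoder(ehr)], dim=1)
        return self.head(fused)

model = JointFusionModel()
print(model(torch.randn(4, 1, 64, 64), torch.randn(4, 20)).shape)  # torch.Size([4, 1])
```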

Page 43: Lecture 15: Course Conclusion


How can we produce good labels from noisy sources? More sophisticated approach: learn models for how to best aggregate noisy labeling functions!

Dunnmon et al. Cross-Modal Data Programming Enables Rapid Medical Machine Learning, 2020.
Figure credit: Nishith Khandwala et al., 2017.

Page 44: Lecture 15: Course Conclusion


AI and COVID-19
- Detection of COVID-19 from CT images
- 2-stage process: lung segmentation followed by classification as COVID-19 or not
- Multinational dataset of 2724 scans from 2617 patients, with 1029 scans (922 patients) confirmed positive for COVID-19

Harmon et al. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets, 2020.

Page 45: Lecture 15: Course Conclusion
Page 46: Lecture 15: Course Conclusion


Other paradigms of machine learning: unsupervised learning

Data: just data x, no labels!

Goal: learn some underlying hidden structure of the data

Examples: clustering, representation / feature learning, density estimation, etc.

Representation learning: an encoder maps input data to features, trained with an unsupervised training objective

Page 47: Lecture 15: Course Conclusion


Darabi et al. 2019
- Autoencoder-based unsupervised representation learning for multimodal data of 200,000 records from 250 hospital sites (eICU Collaborative Research Database)
- One autoencoder for each code-based modality (e.g., medication, treatment, diagnosis) and for signal time-series (e.g., heart rate)
- Used the feature representations to train models for downstream mortality and readmission prediction tasks

Darabi et al. Unsupervised Representation for EHR Signals and Codes as Patient Status Vector, 2019.

Page 48: Lecture 15: Course Conclusion


Variational autoencoders can also be used to sample new (synthetic) data

Use the decoder network, but now sample z from the prior:
- Sample z from the prior $p(z)$
- Sample x|z from the decoder $p_\theta(x \mid z)$

For a 2-d z, varying $z_1$ and $z_2$ traces out the learned data manifold.

Kingma and Welling. “Auto-Encoding Variational Bayes”. ICLR 2014.
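A minimal sketch of sampling from a trained decoder (the decoder here is an untrained stand-in with illustrative sizes, and a standard normal prior is assumed):

```python
import torch
import torch.nn as nn

latent_dim = 2
decoder = nn.Sequential(                # stands in for a trained decoder network
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),  # e.g., pixel probabilities for a 28x28 image
)

z = torch.randn(16, latent_dim)         # sample z from the prior N(0, I)
x = decoder(z)                          # decode x|z into new (synthetic) samples
print(x.shape)                          # torch.Size([16, 784])
```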

Page 49: Lecture 15: Course Conclusion


GANs: Two-player game
- Generator network: tries to fool the discriminator by generating real-looking images from random noise z
- Discriminator network: tries to distinguish between real images (from the training set) and fake images (from the generator)

Ian Goodfellow et al. “Generative Adversarial Nets”, NIPS 2014.
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
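One alternating training step, as a minimal sketch (MLPs and random tensors stand in for real networks and images; sizes are illustrative):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, 784)  # stands in for a batch of real images
z = torch.randn(32, 16)      # random noise input to the generator

# Discriminator step: real images labeled 1, generated images labeled 0.
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(G(z).detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator say "real" for fakes.
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```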

Page 50: Lecture 15: Course Conclusion


Example: GAN-based medical image synthesis
- Liver lesions of different types (Frid-Adar 2018)
- Dermatology lesions (Ghorbani 2019)
- Brain MRIs with lesions (Han 2018)

Can be used for data augmentation!

Page 51: Lecture 15: Course Conclusion


A third paradigm of learning: reinforcement learning

Problems involving an agent interacting with an environment, which provides numeric reward signals.

Goal: learn how to take actions in order to maximize reward.

Atari games figure copyright Volodymyr Mnih et al., 2013. Reproduced with permission.

Page 52: Lecture 15: Course Conclusion


Q-network architecture

$Q(s, a; \theta)$: neural network with weights $\theta$

Current state s_t: 84x84x4 stack of the last 4 frames (after RGB -> grayscale conversion, downsampling, and cropping)

Architecture: 16 8x8 conv filters with stride 4 -> 32 4x4 conv filters with stride 2 -> FC-256 -> FC-4 (Q-values). The output is the expected future reward from taking each of the 4 possible actions.

[Mnih et al. NIPS Workshop 2013; Nature 2015]
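That architecture as a PyTorch sketch (the flattened size 32*9*9 follows from the conv arithmetic on an 84x84 input):

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 16 8x8 filters, stride 4 -> 20x20
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 32 4x4 filters, stride 2 -> 9x9
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 256),                  # FC-256
    nn.ReLU(),
    nn.Linear(256, 4),                           # FC-4: one Q-value per action
)
state = torch.randn(1, 4, 84, 84)                # stack of the last 4 grayscale frames
print(q_net(state).shape)                        # torch.Size([1, 4])
```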

Page 53: Lecture 15: Course Conclusion


Example: Raghu et al. 2017

Learned a Q-learning-based policy to take treatment actions for sepsis patients, using the MIMIC dataset

5x5 possible policy actions at any timestep

Raghu et al. Deep Reinforcement Learning for Sepsis Treatment, 2017.

Page 54: Lecture 15: Course Conclusion


Interpretability: a challenge in deep learning

Figure: a decision tree (interpretable by inspection) vs. a deep neural network
(https://www.cs.cmu.edu/~bhiksha/courses/10-601/decisiontrees/DT.png)

Page 55: Lecture 15: Course Conclusion


Saliency maps: Class Activation Maps (CAM)
- Zhou et al. 2016
- Visualizes a heatmap (class activation map) indicating the importance of the activation at spatial grid location (x, y) for the classification of an image to class c:

$M_c(x, y) = \sum_k w_k^c\, f_k(x, y)$

where $w_k^c$ is the weight (importance) of the k-th filter activation $f_k$ for predicting the c-th class.

Zhou et al. Learning Deep Features for Discriminative Localization, 2016.
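A minimal sketch of computing the map from the final conv activations and the class weights (shapes are illustrative; this assumes the global-average-pooling plus single-linear-layer setup that CAM requires):

```python
import torch

K, H, W, C = 512, 7, 7, 1000
f = torch.randn(K, H, W)                  # final conv activations f_k(x, y)
w = torch.randn(C, K)                     # linear-layer weights w_k^c
c = 42                                    # class of interest

cam = torch.einsum("k,khw->hw", w[c], f)  # M_c(x, y) = sum_k w_k^c * f_k(x, y)
print(cam.shape)                          # torch.Size([7, 7]); upsample to image size to overlay
```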

Page 56: Lecture 15: Course Conclusion


Rajpurkar et al. 2017
- Binary classification of pneumonia presence in chest X-rays
- Used the ChestX-ray14 dataset, with over 100,000 frontal X-ray images spanning 14 diseases
- 121-layer DenseNet CNN
- Compared algorithm performance with 4 radiologists
- Also applied the algorithm to other diseases, surpassing the previous state of the art on ChestX-ray14
- CAM visualization

Rajpurkar et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. 2017.

Page 57: Lecture 15: Course Conclusion


Ethics: many questions around AI / human collaboration in medicine

- How to make diagnosis and/or care decisions when the algorithm disagrees with the human?
- How should AI algorithms work together with humans?
- How to handle machine error vs. human error?
- How to make sure AI algorithms don’t (perhaps inadvertently) discriminate against certain populations?
- How to handle tradeoffs between algorithmic performance on some groups vs. others?

Page 58: Lecture 15: Course Conclusion


Chen et al. 2019
- Showed discrepancies in error rates by race, gender, insurance type, etc. for models trained to make clinical predictions on MIMIC-III data
- Example: error rate for predicting ICU mortality, by gender

Chen et al. Can AI Help Reduce Disparities in General Medical and Mental Health Care? 2019.

Page 59: Lecture 15: Course Conclusion


More on fairness… there are many possible definitions of fairness!

- Group-independent predictions: predictions should be independent of group membership

- Equal metrics across groups: e.g. equal true positive rates or false positive rates across groups

- Individual fairness: individuals who are similar with respect to a prediction task should have similar outcomes

- Causal fairness: e.g. there should not be a causal pathway from a sensitive attribute to the outcome prediction

Suresh and Guttag. A Framework for Understanding Unintended Consequences of Machine Learning, 2020.

Cannot satisfy all of these simultaneously: satisfying “fairness” according to one definition generally leads to a trade-off with respect to another definition!

Page 60: Lecture 15: Course Conclusion


Mitchell 2019: Model Cards for Model Reporting
- Documentation accompanying trained models to detail performance characteristics

Mitchell et al. Model Cards for Model Reporting, 2019.

Page 61: Lecture 15: Course Conclusion


Gebru 2020: Datasheets for Datasets

Gebru et al. Datasheets for Datasets. 2020.

Page 62: Lecture 15: Course Conclusion


Federated Learning
- Related to distributed computing, but with an important property for many medical settings: data is decentralized and never leaves local silos. A central server controls training across the decentralized sources.

Figure credit: https://blogs.nvidia.com/wp-content/uploads/2019/10/federated_learning_animation_still_white.png
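A minimal FedAvg-style sketch of that control loop (three toy "hospital" silos; the simple weight averaging here is for illustration, not a production federated system):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_update(global_model, data, targets):
    model = copy.deepcopy(global_model)  # each silo trains on its own data, locally
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = F.mse_loss(model(data), targets)
    opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()            # only the weights leave the silo, never the data

global_model = nn.Linear(5, 1)
silos = [(torch.randn(8, 5), torch.randn(8, 1)) for _ in range(3)]  # 3 hospitals' private data

for rnd in range(10):
    local_weights = [local_update(global_model, x, y) for x, y in silos]
    avg = {k: torch.stack([w[k] for w in local_weights]).mean(0) for k in local_weights[0]}
    global_model.load_state_dict(avg)    # the central server aggregates into the global model
```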

Page 63: Lecture 15: Course Conclusion


Li et al. 2019
- NVIDIA Clara’s federated learning system for medical imaging data
- Used federated learning to train a segmentation model on BraTS
- Achieved performance comparable to non-federated learning; training was somewhat slower, but the data “silos” were preserved

Li et al. Privacy-preserving Federated Brain Tumour Segmentation, 2019.

Page 64: Lecture 15: Course Conclusion


Differential privacy

Key idea: the output computed on a dataset, vs. on the same dataset differing in a single entry (e.g., one individual), is “hardly different”. Differential privacy gives mathematical guarantees on this idea.

Abadi et al. Deep Learning with Differential Privacy, 2016.

Page 65: Lecture 15: Course Conclusion


Differential privacy

Simple intuition behind how we can achieve differential privacy: adding noise!

Figure credit: https://github.com/frankmcsherry/blog/blob/master/posts/2016-02-03.md

Example of reporting a value with Laplacian noise added
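A minimal sketch of that Laplace mechanism (the count, sensitivity, and epsilon values are illustrative):

```python
import numpy as np

rng = np.random.default_rng()
true_count = 47    # e.g., number of patients with some condition
sensitivity = 1.0  # adding/removing one individual changes the count by at most 1
epsilon = 0.5      # privacy budget; smaller epsilon means more noise

# Laplace mechanism: noise scale = sensitivity / epsilon.
noisy_count = true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
print(noisy_count)  # the reported value is "hardly different" in distribution across neighbors
```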

Page 66: Lecture 15: Course Conclusion


Training differentially private deep learning models

Abadi et al. Deep Learning with Differential Privacy, 2016.

Add noise for differential privacy
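A minimal sketch of the DP-SGD idea from Abadi et al. (clip each per-example gradient, then add Gaussian noise before the update); the hyperparameters are illustrative, and the privacy accounting that turns this into an epsilon guarantee is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(5, 1)
C, noise_multiplier, lr = 1.0, 1.1, 0.1
x, y = torch.randn(8, 5), torch.randn(8, 1)

grads = [torch.zeros_like(p) for p in model.parameters()]
for i in range(len(x)):                   # compute per-example gradients
    model.zero_grad()
    F.mse_loss(model(x[i:i+1]), y[i:i+1]).backward()
    norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
    scale = (C / (norm + 1e-12)).clamp(max=1.0)   # clip each gradient to norm at most C
    for g, p in zip(grads, model.parameters()):
        g += p.grad * scale

with torch.no_grad():
    for g, p in zip(grads, model.parameters()):
        noisy = (g + torch.randn_like(g) * noise_multiplier * C) / len(x)  # add noise, average
        p -= lr * noisy                   # noisy gradient step
```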

Page 67: Lecture 15: Course Conclusion


Can work with differential privacy within deep learning frameworks, e.g., TensorFlow Privacy:
- Implementation of DP-SGD
- Utilities for calculating epsilon

https://blog.tensorflow.org/2019/03/introducing-tensorflow-privacy-learning.html
http://www.cleverhans.io/privacy/2019/03/26/machine-learning-with-differential-privacy-in-tensorflow.html

Page 68: Lecture 15: Course Conclusion


Where to go from here?
- More deep learning courses, e.g., focusing on different domains
- CS 221 and CS 229: broader AI courses
- CS 231N: computer vision
- CS 224N: natural language processing
- CS 224S: spoken language processing
- CS 236: generative models
- Many more!: https://ai.stanford.edu/courses/
- More biomedicine-focused courses
- CS/BMI 273B: deep learning in genomics
- CS/BMI 279: computational biology
- BMI 217: translational bioinformatics
- Many more! (BMI courses): https://explorecourses.stanford.edu/search?view=catalog&filter-coursestatus-Active=on&page=0&catalog=&academicYear=&q=BIOMEDIN&collapse=
- (BIODS courses): https://explorecourses.stanford.edu/search?q=BIODS&view=catalog&academicYear=&catalog=&page=0&filter-coursestatus-Active=on&collapse=
- Many research and internship opportunities as well


Page 70: Lecture 15: Course Conclusion


Thank you!