CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization

Post on 10-Jun-2020

Page 1: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data

CS184A/284A AI in Biology and Medicine

Dimension Reduction and Data Visualization

Visualizing Data using t-SNE

Visualization and Dimensionality Reduction

Intuition behind t-SNE

Visualizing representations

Visualization is key to understanding data easily

Question: Is the relation linear?

Dimensionality Reduction is a helpful tool for visualization

● Dimensionality reduction algorithms
  ○ Map high-dimensional data to a lower dimension
  ○ While preserving structure
● They are used for
  ○ Visualization
  ○ Performance
  ○ Mitigating the curse of dimensionality
● A ton of algorithms exist
● t-SNE is specialised for visualization
● ... and has gained a lot of popularity

Dimensionality Reduction techniques solve optimization problems

Three approaches for Dimensionality Reduction:

● Distance preservation
● Topology preservation
● Information preservation

t-SNE is distance-based but tends to preserve topology

SNE computes pair-wise similarities

SNE converts Euclidean distances to similarities that can be interpreted as probabilities.

Hence the name Stochastic Neighbor Embedding...
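
As a minimal sketch of this conversion, the snippet below assumes the standard SNE definition, where the similarity of point j to point i is a Gaussian of their squared Euclidean distance, normalised over all other points. The function name `conditional_similarities`, the sample `points`, and the bandwidth `sigma` are illustrative choices, not part of the slides.

```python
import math

def conditional_similarities(points, i, sigma):
    """Return p_{j|i} for every j, using a Gaussian centred on points[i]."""
    weights = []
    for j, x_j in enumerate(points):
        if j == i:
            weights.append(0.0)  # a point is not its own neighbour: p_{i|i} = 0
            continue
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], x_j))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    # Normalising makes the similarities sum to 1, so they read as probabilities.
    return [w / total for w in weights]

# Hypothetical toy data: one close neighbour and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
p = conditional_similarities(points, i=0, sigma=1.0)
```

In this toy example nearly all of the probability mass goes to the close neighbour, which is the sense in which the similarities encode "who is a likely neighbour of point i".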

Pair-wise similarities should stay the same
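
The formula on this slide did not survive in the transcript. For reference, the original SNE formulation gives the low-dimensional map points similarities of the same Gaussian form, with a fixed variance. The symbols $y_i$ (map point for data point $i$) and $q_{j|i}$ follow the usual convention and are assumed here:

```latex
q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)},
\qquad
\text{goal: } q_{j|i} \approx p_{j|i} \text{ for all } i, j
```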

Kullback-Leibler divergence measures the faithfulness with which $q_{j|i}$ models $p_{j|i}$
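
A small sketch of this measure for a single point i, assuming the standard definition KL(P || Q) = Σ_j p_j log(p_j / q_j). The function name and the sample distributions are hypothetical. SNE sums this quantity over all points i to obtain its cost.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_j p_j * log(p_j / q_j); terms with p_j == 0 contribute 0."""
    return sum(p_j * math.log(p_j / q_j) for p_j, q_j in zip(p, q) if p_j > 0.0)

# Hypothetical similarities of three points to point i = 0 (so entry 0 is 0).
p = [0.0, 0.7, 0.3]        # high-dimensional similarities
q_good = [0.0, 0.6, 0.4]   # a map that roughly preserves them
q_bad = [0.0, 0.1, 0.9]    # a map that swaps near and far neighbours

cost_good = kl_divergence(p, q_good)
cost_bad = kl_divergence(p, q_bad)
```

The divergence is zero when the two distributions agree and grows as the map distorts the neighbourhood. It is not symmetric: placing a true near neighbour far away in the map (large p, small q) is penalised more heavily than the reverse.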

Some remaining questions

1. Why radial basis function (exponential)?

2. Why probabilities?

3. How do you choose $\sigma_i$?

Why radial basis function (exponential)?

Focus on local geometry.

This is why t-SNE can be interpreted as topology-based

Why probabilities?

A small Euclidean distance does not necessarily mean proximity on the manifold.

Probabilities are appropriate to model this uncertainty

How do you choose $\sigma_i$?

The entropy of p increases with $\sigma_i$

Entropy

$H(p) = -\sum_i p_i \log_2 p_i$

Perplexity, a smooth measure of the number of neighbors

Perplexity

$\mathrm{Perp}(P) = 2^{H(P)}$
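
Because entropy, and therefore perplexity, grows with the Gaussian bandwidth, the bandwidth for each point can be found by a one-dimensional search for a user-chosen target perplexity. The sketch below uses bisection. The function names, search bounds, and sample squared distances are assumptions made for illustration.

```python
import math

def perplexity(p):
    """Perp(P) = 2 ** H(P), with the entropy H measured in bits."""
    entropy = -sum(p_j * math.log2(p_j) for p_j in p if p_j > 0.0)
    return 2.0 ** entropy

def similarities(sq_dists, sigma):
    """Gaussian similarities of one point to the others, from squared distances."""
    # Subtracting the smallest distance leaves the normalised result unchanged
    # but keeps the largest weight at exp(0) = 1, avoiding a zero denominator.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def find_sigma(sq_dists, target_perplexity, lo=1e-3, hi=1e3, steps=100):
    """Bisection on sigma; relies on perplexity increasing with sigma."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if perplexity(similarities(sq_dists, mid)) < target_perplexity:
            lo = mid  # too few effective neighbours: widen the Gaussian
        else:
            hi = mid  # too many effective neighbours: narrow the Gaussian
    return (lo + hi) / 2.0

# Hypothetical squared distances from one point to its five neighbours.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = find_sigma(sq_dists, target_perplexity=3.0)
```

A uniform distribution over k neighbours has perplexity exactly k, which is why perplexity can be read as a smooth count of effective neighbours. Points in dense regions end up with a smaller bandwidth than points in sparse regions.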

The "Crowding problem"

Page 19: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data

Mismatched Tails can Compensate for Mismatched Dimensionalities
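
The slide's formulas are missing from the transcript. In the standard t-SNE formulation, the high-dimensional similarities stay Gaussian and are symmetrised, while the map uses a heavy-tailed Student t-distribution with one degree of freedom. The symbols $y_i$ (map points) and $n$ (number of data points) are assumed here:

```latex
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```

Because the Student-t tail decays polynomially rather than exponentially, moderately distant points can sit farther apart in the map without a large penalty, which relieves the crowding in the centre.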

Page 20: CS184A/284A AI in Biology and Medicine › ~xhx › courses › CS284A › ... · CS184A/284A AI in Biology and Medicine Dimension Reduction and Data Visualization. Visualizing Data

Last but not least: Optimization
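
The content of this slide is not in the transcript. As a reference point only, the standard t-SNE formulation minimises the Kullback-Leibler divergence between the joint distributions $P$ and $Q$ by gradient descent on the map points $y_i$, where $q_{ij}$ is the Student-t similarity in the map. These symbols are assumed rather than taken from the slide:

```latex
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
\frac{\partial C}{\partial y_i}
  = 4 \sum_{j} (p_{ij} - q_{ij}) (y_i - y_j) \left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}
```

Each term acts like a spring between $y_i$ and $y_j$: it attracts when the map similarity is too small ($p_{ij} > q_{ij}$) and repels when it is too large. The cost is non-convex, so different random initialisations can give different maps.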

Visualizing representations

Mapping raw data to distributed representations

● Feature engineering is often laborious.
● The new trend is to automatically learn adequate features or representations.
● Ultimate goal: enable AI to extract useful features from raw sensory data.

● t-SNE can be used to make sense of the learned representations!

[Figure: text, images, and other sensory inputs are mapped to high-dimensional vectors]

Using t-SNE to explore a Word embedding

Explore a Wikipedia article embedding.

Exploring game state representations

Google DeepMind plays Atari games.

● A representation is learned with a convolutional neural network

● From 84x84x4 = 28,224 pixel values to 512 neurons.

● Predicts expected score if a certain action is taken.

Using t-SNE to explore image representations

Classifying dogs and cats.

Each data point is an image of a dog or a cat
red = cats, blue = dogs

Using t-SNE to explore image representations

Classifying dogs and cats.

Representation: convolutional net trained for image classification (1000 classes)

https://indico.io/blog/visualizing-with-t-sne/

Conclusion

● The t-SNE algorithm reduces dimensionality while preserving local similarity.

● The t-SNE algorithm has been built heuristically.

● t-SNE is commonly used to visualize representations.

Acknowledgement: Simon Carbonnelle for slides
