Supplemental Materials

Systematic exploration of unsupervised methods for mapping behavior

Jeremy G Todd, Jamey S Kain and Benjamin L de Bivort

Supplemental Figures

Figure S1 – GMM component histograms and illustrative Monte Carlo fits – A) Observed distribution of maximum posterior probability for all data points from the PCA20-GMM-SW co-fit of all fly experiments (black) and for 4000 points randomly sampled from the GMM probability distribution function shown in Figure 5A (red), whose components were positioned using a Monte Carlo algorithm to match the observed distribution. B) Observed distribution of pairwise distances between GMM component means in the co-fit of all fly experiments (black) and in the Monte Carlo fit shown in Figure 5A (red).
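The two summary statistics plotted in Figure S1 can be illustrated with a minimal sketch. It is not the authors' Monte Carlo positioning algorithm, which is not described here; it only shows how points might be sampled from a given GMM and how the maximum posterior probability and pairwise mean distances could be computed. The isotropic shared-variance components, the 2D toy means, and all function names are assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def max_posterior(point, means, sigma, weights):
    """Largest posterior probability of any component for one point.

    Assumes isotropic Gaussian components that share one standard deviation,
    so the Gaussian normalization constant cancels in the posterior.
    """
    log_terms = [
        math.log(w) - math.dist(point, mu) ** 2 / (2 * sigma ** 2)
        for mu, w in zip(means, weights)
    ]
    top = max(log_terms)
    # Subtract the largest term before exponentiating for numerical stability
    total = sum(math.exp(t - top) for t in log_terms)
    return 1.0 / total


def sample_gmm(n, means, sigma, weights, rng):
    """Draw n points: pick a component by weight, then add Gaussian noise."""
    points = []
    for _ in range(n):
        mu = rng.choices(means, weights=weights)[0]
        points.append(tuple(rng.gauss(m, sigma) for m in mu))
    return points


def pairwise_mean_distances(means):
    """Euclidean distance between every pair of component means."""
    return [math.dist(a, b) for a, b in combinations(means, 2)]


# Toy mixture (hypothetical values, not taken from the paper)
rng = random.Random(0)
means = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
weights = [0.5, 0.3, 0.2]
samples = sample_gmm(4000, means, 1.0, weights, rng)
posteriors = [max_posterior(p, means, 1.0, weights) for p in samples]
distances = pairwise_mean_distances(means)
```

Histogramming `posteriors` and `distances` would give toy analogues of panels A and B respectively.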

Figure S2 – Speed histograms – Count versus behavioral state speed (arbitrary units) for each fly experiment. Bimodality in these histograms is interpreted as reflecting the respective contributions of state changes within and between stereotyped behavioral modes. The otherwise empty plot at bottom right shows the axis labels.

Figure S3 – BIC vs k – Bayesian Information Criterion (BIC) for the GMM of each fly experiment versus the number of Gaussian modes employed (k). Knees in these curves suggest appropriate numbers of clusters in the data. The otherwise empty plot at bottom right shows the axis labels.
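For reference, the BIC of a fitted mixture can be computed from its log-likelihood and parameter count, and a knee can be located with a simple curvature heuristic. The sketch below assumes full-covariance components and uses the largest discrete second difference as the knee criterion; the caption specifies neither, so both are illustrative choices, as are the function names and the toy BIC values.

```python
import math


def gmm_param_count(k, d):
    """Free parameters of a k-component, d-dimensional full-covariance GMM."""
    means = k * d
    covariances = k * d * (d + 1) // 2  # symmetric d x d matrix per component
    mixing = k - 1                      # weights sum to one
    return means + covariances + mixing


def bic(log_likelihood, k, d, n):
    """BIC = (number of parameters) * ln(n) - 2 * ln(L); lower is better."""
    return gmm_param_count(k, d) * math.log(n) - 2.0 * log_likelihood


def knee_index(values):
    """Index of the largest discrete second difference (one knee heuristic)."""
    second = [
        values[i - 1] - 2 * values[i] + values[i + 1]
        for i in range(1, len(values) - 1)
    ]
    return 1 + max(range(len(second)), key=second.__getitem__)


# Toy BIC curve (hypothetical values) that flattens after k = 3
ks = [1, 2, 3, 4, 5]
bic_values = [1000.0, 600.0, 300.0, 280.0, 270.0]
suggested_k = ks[knee_index(bic_values)]
```

Other knee criteria, or simply the minimum of the curve, would be equally reasonable; the second-difference rule is used here only because it is short to state.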

Figure S4 – Cluster label streak length histogram – Log-linear plot of abundance vs length of streaks of consecutive frames receiving identical cluster labels. Each line is a different cluster from fly experiment 371 using mapping method PCA20-GMM-SW with k=72 clusters. Red, green and blue lines highlight the curves from clusters 60, 40 and 35 respectively (clusters which are examined in Figure 6).
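A streak-length histogram of this kind can be built from a per-frame label sequence by run-length encoding. The following is a minimal sketch; the function name and the toy label sequence are hypothetical, and it does not reproduce the authors' plotting.

```python
from collections import Counter, defaultdict
from itertools import groupby


def streak_length_histograms(labels):
    """Map each cluster label to a Counter of {streak length: abundance}."""
    histograms = defaultdict(Counter)
    # groupby yields one group per run of identical consecutive labels
    for label, run in groupby(labels):
        histograms[label][sum(1 for _ in run)] += 1
    return histograms


# Toy per-frame cluster labels (hypothetical)
labels = [3, 3, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1]
hist = streak_length_histograms(labels)
```

Plotting each cluster's counts against streak length on a logarithmic count axis would give one line per cluster, as in the figure.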

Figure S5 – Number of watershed regions versus kernel size – The number of watershed regions, identified during the post-t-SNE density estimation and clustering steps, grows roughly linearly as a function of the size of the 2D Gaussian blur kernel (expressed as 1/σ). Each line is a different fly experiment.

Figure S5 axes: 2D Gaussian blur kernel size (1/σ) on the x-axis, from 0 to 100; number of watershed regions on the y-axis, from 0 to 1000.

Supplemental Movie Captions

Movie M1 – Data and clustering visualization for fly experiment 371 – Panels show movie frames (upper-left); data values after pre-processing (white lines) superimposed on their wavelet time-frequency representation, with a vertical green line indicating the current frame (lower-left); the t-SNE2 density map, with a white dot indicating the current frame’s projection (upper-right); and a density estimate of the points belonging to the current PCA20-GMM-SW cluster (lower-right). The PCA20-GMM-SW cluster panel facilitates comparison between t-SNE2-watershed and PCA20-GMM-SW by showing the t-SNE2-watershed boundaries together with a density map computed from all frames sharing the current PCA20-GMM-SW cluster assignment. A red box around the panels on the right indicates a low-variance cluster assignment (which was not subjected to t-SNE projection).

Movie M2 – Selected behavioral motifs for co-fit data – Each 4x4 grid of cells displays frames from a particular PCA20-GMM-SW cluster assignment for data co-fit across all fly experiments. For each fly experiment the longest streak of consecutive frames is shown and repeated in a loop. Cluster labels correspond to Figure 8A-C.

Movies M3-M5 – Inter-fly comparisons for co-fit data – Each 4x4 grid of cells displays frames from a particular fly experiment, using PCA20-GMM-SW cluster assignments for data co-fit across all fly experiments. One cluster is shown in each movie (M3: cluster #15, M4: cluster #7, M5: cluster #17). For each fly experiment, the 16 longest streaks of consecutive frames are shown and repeated in a loop. The caption below each sequence indicates the base-10 logarithm of the maximum posterior probability of the cluster assignment, averaged across all frames in the streak. Cluster labels correspond to Figure 8A-C.
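The selection of streaks and the per-streak caption value can be sketched as follows. The caption wording leaves open whether the logarithm is taken before or after averaging; this sketch assumes the mean of the per-frame base-10 logarithms. The function names and toy data are hypothetical.

```python
import math
from itertools import groupby


def longest_streaks(labels, cluster, top_n=16):
    """Return (start, length) of the top_n longest runs of `cluster`."""
    streaks = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label == cluster:
            streaks.append((position, length))
        position += length
    # Longest first; ties broken by earliest start frame
    streaks.sort(key=lambda s: (-s[1], s[0]))
    return streaks[:top_n]


def mean_log10_posterior(max_posteriors, start, length):
    """Mean of log10(max posterior) over the frames of one streak."""
    window = max_posteriors[start:start + length]
    return sum(math.log10(p) for p in window) / length


# Toy per-frame labels and maximum posterior probabilities (hypothetical)
labels = [7, 7, 7, 2, 7, 7, 2, 2, 7, 7, 7, 7]
max_posteriors = [0.1] * 3 + [0.5] + [1.0] * 2 + [0.5] * 2 + [0.01] * 4
top_two = longest_streaks(labels, cluster=7, top_n=2)
scores = [mean_log10_posterior(max_posteriors, s, n) for s, n in top_two]
```

Here `top_two` lists the streaks longest first, and `scores` holds the value that would appear under each corresponding sequence.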