supplemental material

Upload: nicholas-terry

Post on 27-Dec-2015

TRANSCRIPT

  • Slide 1
  • Slide 2
  • Supplemental Material
  • Slide 3
  • http://www.brain-map.org
  • Slide 4
  • Slide 5
  • A big thanks to Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University
  • Slide 6
  • The Process Construction and representation of the Anatomic Gene Expression Atlas (AGEA).
  • Slide 7
  • Allen Reference Atlas
  • Slide 8
  • 3D Nissl volume: built by rigid reconstruction. Each section is reoriented to match adjacent images as closely as possible, and a 1.5 T low-resolution 3D average MRI volume is used to ensure the reconstruction is realistic. The reoriented Nissl sections are down-sampled and converted to grayscale, giving an isotropic 25 µm grayscale volume.
  • Slide 9
  • Anatomy: 208 large structures and structural groupings were extracted, then projected and smoothed onto the 3D atlas volume for structural annotation. The cortex was further decomposed into an intersection of 202 regions and areas.
  • Slide 10
  • The Process Construction and representation of the Anatomic Gene Expression Atlas (AGEA).
  • Slide 11
  • In situ hybridization (ISH): each gene's ISH series is reconstructed from serial sections at 200 µm spacing (figure: coronal section; sagittal section).
  • Slide 12
  • Why ISH? Phenotypic properties of cells result from a unique combination of expressed gene products, so gene expression profiles can define cell types.
  • Slide 13
  • 6 genes on 1 brain: each gene on 56 sections; 2 sections are for Nissl.
  • Slide 14
  • 8 genes on 1 brain: each gene on 20 sections.
  • Slide 15
  • ISH tissue preparation and imaging process: sectioning; staining (non-isotopic, digoxigenin (DIG)); washing; imaging.
  • Slide 16
  • ISH Probe Preparation
  • Slide 17
  • Traditional approach vs. ISH. Histology: one gene at a time, so 20,000 genes would need 20,000 × (5 or 14) slides, roughly 1 year of work. DNA microarrays and SAGE: applied to large brain regions, so they cannot differentiate neuronal subtypes (Kamme, F. et al., J. Neurosci. (2003); Sugino, K. et al., Nature Neurosci. (2006)). In situ hybridization measures expression and preserves spatial information for a single gene, at cellular but not single-cell resolution. The data can be used to analyze gene expression, gene regulation, CNS function (spatial), and cellular phenotype (spatial).
  • Slide 18
  • Reproducibility: an inbred mouse strain is used for multiple genes. Although different mice are used for different genes, expression measured under the same environmental conditions is reproducible.
  • Slide 19
  • Is ISH reproducible? The primary sources of variation are riboprobes, day-to-day variability, and biological variability between brains. Even with inbred mice, variation between brains is significant.
  • Slide 20
  • Processing: expression statistics and reconstruction. 3D data are accessed in a standard coordinate system of 200 µm cubic voxels, and the ontology of the Allen Reference Atlas is used to label individual voxels.
  • Slide 21
  • Grid-based; nearest plane (figure panels)
  • Slide 22
  • Registration: volumes are iteratively registered to the AB atlas using affine and locally nonlinear warping; registration is accurate to about 200 µm (figure: key; example local deformation field).
  • Slide 23
  • Slide 24
  • 3D Annotation
  • Slide 25
  • Lower-dimensional data volumes: we analyze binned expression volumes at 200 µm isotropic resolution. There are ~31,000 image series (mostly single-hemisphere sagittal series), and 4,104 unique genes are available from coronally sectioned brains. Each volume is 67 × 41 × 58 voxels (about 50k brain voxels), comparable to fMRI resolution.
  • Slide 26
  • Data normalization: background correction and registration. Intensity normalization corrects the background using a negative control; registration maps each image to the reference atlas. Smoothed expression energy = sum of intensities of expressing cells / number of cells in the voxel, i.e. an average over many cells of diverse types.
  • Slide 27
  • ISH Signal (c) Coronal plane in situ hybridization (ISH) image of gene tachykinin 2 (Tac2) from the Allen Brain Atlas showing enriched expression in the bed nucleus of the stria terminalis (BST). The box represents a 1-mm² square. (d) Enlarged expression mask view of boxed area in c depicting gene expression levels color coded by ISH signal intensity (red, higher expression level; green/blue, lower expression level).
  • Slide 28
  • Measurements: p is an image pixel in voxel C; |C| is the total number of pixels in C; M(p) is the expression segmentation mask, 1 for an expressing pixel and 0 for a non-expressing pixel; I(p) is the grayscale ISH image intensity, Gray = 0.3 × Red + 0.59 × Green + 0.11 × Blue.
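Here is a minimal sketch of the grayscale conversion above. It assumes the ISH image is held as an 8-bit RGB NumPy array of shape (H, W, 3). `to_gray` is a hypothetical helper name, not part of any Allen tool.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: RGB (H, W, 3) -> grayscale (H, W) with the slide's weights."""
    weights = np.array([0.3, 0.59, 0.11])
    # Weighted sum over the channel axis: 0.3*R + 0.59*G + 0.11*B
    return rgb[..., :3].astype(float) @ weights

# Toy 1 x 2 image: one pure-red pixel and one white pixel.
img = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)
gray = to_gray(img)
print(gray.shape)  # (1, 2)
```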
  • Slide 29
  • Per-gene signature: Prox1 (figure panels: coronal section; sagittal section; Prox1 volume maximum intensity projections; raw ISH; expression energy)
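A maximum intensity projection keeps, for each line of voxels along one axis, the largest value. The minimal NumPy sketch below runs on a random toy volume with the 67 × 41 × 58 grid size quoted in this deck. Which array axis corresponds to the coronal or sagittal direction is an assumption that depends on how the volume is stored.

```python
import numpy as np

def max_intensity_projections(volume: np.ndarray) -> list:
    """Maximum intensity projection of a 3-D volume along each of its axes."""
    return [volume.max(axis=a) for a in range(volume.ndim)]

rng = np.random.default_rng(0)
vol = rng.random((67, 41, 58))          # toy data, not real expression values
mips = max_intensity_projections(vol)
print([m.shape for m in mips])          # [(41, 58), (67, 58), (67, 41)]
```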
  • Slide 30
  • Recap: measurements. Expression density = number of expressing pixels / number of all pixels in the division. Expression intensity = sum of expressing-pixel intensities / number of expressing pixels. Expression energy = sum of expressing-pixel intensities / number of all pixels in the division = density × intensity. (With the symbols defined earlier: density = Σ M(p) / |C|, intensity = Σ M(p)I(p) / Σ M(p), energy = Σ M(p)I(p) / |C|.)
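Here is a minimal sketch of the three measures for a single voxel or division C, using the M(p) and I(p) symbols defined above. It assumes the segmentation mask and the grayscale intensities are already available as flat NumPy arrays; the segmentation step itself is not shown. The function name is hypothetical.

```python
import numpy as np

def expression_measures(mask: np.ndarray, intensity: np.ndarray):
    """Return (density, intensity, energy) for one division C.

    mask      -- M(p): 1 for expressing pixels, 0 otherwise
    intensity -- I(p): grayscale ISH intensity for the same pixels
    """
    m = mask.astype(bool)
    n_all = m.size                          # |C|: all pixels in the division
    n_expr = int(m.sum())                   # number of expressing pixels
    expr_sum = float(intensity[m].sum())    # summed intensity of expressing pixels
    density = n_expr / n_all
    mean_intensity = expr_sum / n_expr if n_expr else 0.0
    energy = expr_sum / n_all               # equals density * mean_intensity
    return density, mean_intensity, energy

d, i, e = expression_measures(np.array([1, 1, 0, 0]), np.array([100.0, 200.0, 50.0, 30.0]))
print(d, i, e)  # 0.5 150.0 75.0
```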
  • Slide 31
  • Metadata: each voxel can be linked to a node in a hierarchical brain atlas ontology and also to Waxholm space. Raw Nissl sections from the same brain (200 µm spacing) can also be obtained. Each gene records the specific probe sequence used and various identifiers that link to gene information (we've used the Entrez ID).
  • Slide 32
  • Deriving Insights
  • Slide 33
  • Large-scale data analysis: How much structure is present across space and across genes? How would the brain segment on the basis of gene expression patterns (as opposed to Nissl staining, etc.)? Is there structure in the expression patterns of highly localized genes? What can we learn from the expression patterns of genes implicated in disorders? See Bohland et al. (2009), Methods; Ng et al. (2009), Nature Neuroscience.
  • Slide 34
  • Genome-wide analysis of expression: 70.5% of genes are expressed in fewer than 20% of cells.
  • Slide 35
  • Notes: well-established genes for different cell types were identified; for each of 12 major brain regions, the top 100 genes.
  • Slide 36
  • Cell-specific genes: Gene Ontology enrichment analysis is useful here; for example, oligodendrocyte-enriched genes point to myelin production.
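The slide does not say which statistic was used for the Gene Ontology enrichment. A common choice is a one-sided hypergeometric test, and the sketch below assumes that test. Its function name and example counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k: int, M: int, n: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric.

    M -- genes in the background
    n -- background genes annotated with the GO term
    N -- genes in the selected list (e.g. oligodendrocyte-enriched genes)
    k -- selected genes that carry the annotation
    """
    # Exact integer numerator: sum over overlaps i >= k of C(n, i) * C(M - n, N - i)
    numerator = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return numerator / comb(M, N)

# Hypothetical counts: 20,000 genes, 100 annotated, 50 selected, 10 overlap.
print(hypergeom_pvalue(10, 20_000, 100, 50))
```

Exact integer binomials avoid the underflow problems of multiplying many small probabilities. For large lists a library implementation would normally be preferred.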
  • Slide 37
  • Heterogeneity
  • Slide 38
  • Functional compartments: genes with regional expression provide substrates for functional differences.
  • Slide 39
  • Tools from AGEA. Correlation mode: view and navigate 3-D spatial relationship maps. Clusters mode: explore transcriptome-based spatial organization. Gene Finder mode: search for genes with local regionality.
  • Slide 40
  • Slide 41
  • Spatial transcriptome: take the expression energy of each gene (M = 4,376) at each voxel (N = 51,533). For each seed voxel, compute Pearson's correlation coefficient between the seed voxel and every other voxel, using their expression vectors of length M; this yields 51,533 three-dimensional correlation maps. A web viewer allows easy navigation between maps and within each 3-D map, with correlation values shown as 24-bit false color on a blue-to-red (jet) color scale.
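Here is a minimal NumPy sketch of one correlation map. It assumes the expression energies are stored as an (N voxels × M genes) array and runs on random toy data. Mapping the result back onto the 3-D grid via the brain-voxel coordinates is not shown, and the names are hypothetical.

```python
import numpy as np

def seed_correlation_map(energy: np.ndarray, seed: int) -> np.ndarray:
    """Pearson r between the seed voxel's gene vector and every voxel's gene vector.

    energy -- (N_voxels, M_genes) expression-energy matrix
    seed   -- row index of the seed voxel
    Voxels with zero variance across genes yield NaN.
    """
    centered = energy - energy.mean(axis=1, keepdims=True)  # center each voxel across genes
    norms = np.linalg.norm(centered, axis=1)
    return centered @ centered[seed] / (norms * norms[seed])

rng = np.random.default_rng(1)
energy = rng.random((200, 30))       # toy: 200 voxels x 30 genes
r = seed_correlation_map(energy, seed=7)
print(r[7])                          # ~1.0: the seed correlates perfectly with itself
```

Computing every map at once is equivalent to building the full N × N voxel correlation matrix. The per-seed function returns one row of that matrix.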
  • Slide 42
  • Slide 43
  • Clusters of correlated gene expression. Classical definitions of brain regions rely on overall morphology, cellular cytoarchitecture, ontogenetic development, and functional connectivity.
  • Slide 44
  • Slide 45
  • Clusters of correlated gene expression: hierarchical clustering. Voxels are spatially organized as a binary tree in which each node is a collection of voxels with 0 or 2 branches. Initially all 51,533 voxels are assigned to the root node. The final tree has 103,065 nodes, a maximum depth of 53 levels, and 51,533 leaf nodes (one per brain voxel). At each bifurcation an ordering is assigned to each child, which defines a global depth-first ordering of all leaf nodes.
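The node count follows from the tree being binary. A tree whose inner nodes all have exactly 2 children and which has L leaves has 2L - 1 nodes, so 51,533 leaves give 2 × 51,533 - 1 = 103,065 nodes. The toy sketch below checks this and reads off the depth-first leaf ordering, using a hypothetical nested-tuple representation. It does not reproduce the AGEA splitting procedure itself.

```python
from typing import Union

Tree = Union[int, tuple]  # hypothetical: a leaf is a voxel id, an inner node is (left, right)

def summarize(tree: Tree, depth: int = 1):
    """Return (depth-first leaf order, node count, max depth in levels)."""
    if isinstance(tree, int):
        return [tree], 1, depth
    left, right = tree
    l_order, l_nodes, l_depth = summarize(left, depth + 1)
    r_order, r_nodes, r_depth = summarize(right, depth + 1)
    # Left child first, then right child: a global depth-first ordering of leaves
    return l_order + r_order, l_nodes + r_nodes + 1, max(l_depth, r_depth)

tree = ((0, 1), (2, (3, 4)))
order, n_nodes, max_depth = summarize(tree)
print(order, n_nodes, max_depth)  # [0, 1, 2, 3, 4] 9 4
```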
  • Slide 46
  • Clustering Analysis
  • Slide 47
  • Hierarchical Clustering
  • Slide 48
  • Notes
  • Slide 49
  • Microarray data analysis (diagram: unsupervised analysis (clustering); supervised analysis; visualization & decomposition; pattern analysis; statistical analysis). Unsupervised analysis (clustering): K-means, hierarchical clustering, biclustering, CLICK, self-organizing maps, DBSCAN, OPTICS, DENCLUE.
  • Slide 50
  • Differentially regulated genes (figure: up-regulated genes; down-regulated genes)
  • Slide 51
  • Clusters?
  • Slide 52
  • Clustering analysis: group genes that show a similar temporal expression pattern; group samples/genes that show a similar expression pattern.
  • Slide 53
  • Clustering analysis: finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups, so that intra-cluster distances are minimized and inter-cluster distances are maximized.
  • Slide 54
  • Clusters? How many clusters? (Figure: the same points grouped into two, four, or six clusters.)
  • Slide 55
  • Clustering algorithms: K-means and its variants; hierarchical clustering
  • Slide 56
  • K-means clustering: a partitional clustering approach. Each cluster is associated with a centroid (center point), and each point is assigned to the cluster with the closest centroid. The number of clusters, K, must be specified. The basic algorithm is very simple.
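The basic algorithm is usually Lloyd's iteration. Pick K initial centroids, assign every point to its closest centroid, recompute each centroid as the mean of its points, and repeat until the centroids stop moving. The minimal NumPy sketch below follows that loop. Initializing from K distinct random data points is an assumption, and the toy data are made up.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Basic K-means (Lloyd's algorithm). Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct data points chosen at random (an assumption)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to the closest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids as cluster means; keep the old centroid if a cluster empties
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):   # converged: centroids stopped moving
            break
        centroids = new
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
```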
  • Slide 57
  • Choosing Initial Centroids
  • Slide 58
  • Limitations: differing sizes (figure: original points vs. K-means with 3 clusters)
  • Slide 59
  • Limitations: differing density (figure: original points vs. K-means with 3 clusters)
  • Slide 60
  • Limitations: non-globular shapes (figure: original points vs. K-means with 2 clusters)
  • Slide 61
  • Hierarchical clustering produces a set of nested clusters organized as a hierarchical tree. It can be visualized as a dendrogram, a tree-like diagram that records the sequence of merges or splits.
  • Slide 62
  • Agglomerative clustering is the more popular hierarchical clustering technique. The basic algorithm is straightforward: (1) compute the proximity matrix; (2) let each data point be a cluster; (3) repeat: merge the two closest clusters and update the proximity matrix, until only a single cluster remains. The key operation is computing the proximity of two clusters; different definitions of the distance between clusters distinguish the different algorithms.
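Here is a naive sketch of the loop above with the MIN (single), MAX (complete) and group-average linkages discussed in this deck. For clarity it recomputes cluster proximities from the point-level proximity matrix instead of updating a cluster-level matrix in place, so it only suits small inputs. The function name and toy data are hypothetical.

```python
import numpy as np

LINKAGES = {"single": np.min, "complete": np.max, "average": np.mean}  # MIN, MAX, group average

def agglomerative(X: np.ndarray, linkage: str = "single"):
    """Naive agglomerative clustering. Returns merges as (kept_id, merged_id, distance)."""
    # Point-level proximity matrix (Euclidean distances)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = {i: [i] for i in range(len(X))}   # every point starts as its own cluster
    link_fn = LINKAGES[linkage]
    merges = []
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        # Search all cluster pairs for the closest one under the chosen linkage
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = float(link_fn(D[np.ix_(clusters[a], clusters[b])]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] += clusters.pop(b)           # merge cluster b into cluster a
        merges.append((a, b, d))
    return merges

X = np.array([[0.0], [1.0], [4.0], [10.0]])      # four points on a line
print(agglomerative(X, "single"))    # [(0, 1, 1.0), (0, 2, 3.0), (0, 3, 6.0)]
print(agglomerative(X, "complete"))  # [(0, 1, 1.0), (0, 2, 4.0), (0, 3, 10.0)]
```

The two linkages agree on the first merge but diverge afterwards. MIN uses the closest pair of points between clusters, MAX the farthest pair.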
  • Slide 63
  • In the beginning: start with clusters of individual points and a proximity matrix (figure: points p1–p5 and their proximity matrix).
  • Slide 64
  • Intermediate step: after some merging steps, we have some clusters (figure: clusters C1–C5 and their proximity matrix).
  • Slide 65
  • Intermediate step: we want to merge the two closest clusters (C2 and C5) and update the proximity matrix (figure: clusters C1–C5 and their proximity matrix).
  • Slide 66
  • After merging: how do we update the proximity matrix? (Figure: clusters C1, C2 ∪ C5, C3, C4; the proximity-matrix entries for C2 ∪ C5 are unknown, marked "?".)
  • Slide 67
  • Inter-cluster similarity: how should the similarity of two clusters be defined? Options: MIN, MAX, group average, distance between centroids (figure: points p1–p5 and their proximity matrix).
  • Slide 68
  • Slide 69
  • Slide 70
  • Slide 71
  • Slide 72
  • Hierarchical clustering: MIN (figure: nested clusters and dendrogram for points 1–6)
  • Slide 73
  • Hierarchical clustering: MAX (figure: nested clusters and dendrogram for points 1–6)
  • Slide 74
  • Hierarchical clustering: group average (figure: nested clusters and dendrogram for points 1–6)
  • Slide 75
  • Complexity (time and space): O(N²) space, since the proximity matrix is stored, where N is the number of points. O(N³) time in many cases: there are N steps, and at each step the proximity matrix, of size N², must be updated and searched. For some approaches the complexity can be reduced to O(N² log N) time.
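To put the O(N²) space cost in perspective for the voxel-level clustering described earlier, here is a back-of-the-envelope calculation. It assumes a condensed (upper-triangle) distance matrix of 64-bit floats over the 51,533 brain voxels. The helper name is hypothetical.

```python
def condensed_entries(n: int) -> int:
    """Distinct pairwise distances among n points (upper triangle, no diagonal)."""
    return n * (n - 1) // 2

n_voxels = 51_533                    # brain voxels in the 200 um AGEA grid
entries = condensed_entries(n_voxels)
gigabytes = entries * 8 / 1e9        # assuming 8-byte (float64) entries
print(entries, round(gigabytes, 1))  # 1327799278 10.6
```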
  • Slide 76
  • Microarray data analysis (diagram: unsupervised analysis (clustering); supervised analysis; visualization & decomposition; pattern analysis; statistical analysis). Supervised analysis: KNN, decision trees, neural nets, SVM, LDA, Naïve Bayes.
  • Slide 77
  • Microarray data analysis (diagram: unsupervised analysis (clustering); supervised analysis; visualization & decomposition; pattern analysis; statistical analysis). Pattern analysis: Apriori algorithm, FP-Growth algorithm, CARPENTER.
  • Slide 78
  • Microarray data analysis (diagram: unsupervised analysis (clustering); supervised analysis; visualization & decomposition; pattern analysis; statistical analysis). Visualization & decomposition: PCA, SVD, scatter plots, gene pies.
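PCA and SVD are closely linked: the principal components of a column-centered data matrix are its right singular vectors. The minimal NumPy sketch below assumes samples in rows (for example voxels or arrays) and genes in columns. The first two score columns are what a PCA scatter plot would display. The function name is hypothetical.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int = 2):
    """PCA via SVD of the column-centered data matrix.

    X: (n_samples, n_features). Returns (scores, explained_variance_ratio).
    """
    Xc = X - X.mean(axis=0)                            # center each feature (gene)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # projections onto the top PCs
    var = S ** 2                                       # variance captured per component
    return scores, var[:n_components] / var.sum()

rng = np.random.default_rng(2)
X = rng.random((50, 5))                                # toy data
scores, ratio = pca(X, n_components=2)
print(scores.shape, ratio)                             # (50, 2) and the variance share per PC
```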
  • Slide 79
  • Next
  • Slide 80
  • Slide 81
  • Finding enriched genes: seed with known structure-specific genes, e.g. oligodendrocyte (Mbp, Mobp, Cnp1) and choroid plexus (Col8a2, Lbp, Msx1), then find the genes with similar expression patterns.
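The slide does not specify the similarity measure. The minimal sketch below assumes Pearson correlation across voxels against the averaged seed profile. The seed names come from the slide, but the toy expression matrix, the placeholder gene names and the helper name are hypothetical.

```python
import numpy as np

def find_similar_genes(energy: np.ndarray, gene_names: list, seeds: list, top: int = 5):
    """Rank non-seed genes by Pearson r (across voxels) with the mean seed profile.

    energy -- (N_voxels, M_genes) expression-energy matrix
    Genes with zero variance across voxels yield NaN.
    """
    idx = [gene_names.index(g) for g in seeds]
    template = energy[:, idx].mean(axis=1)           # averaged seed expression pattern
    Ec = energy - energy.mean(axis=0)                # center each gene across voxels
    tc = template - template.mean()
    r = (Ec.T @ tc) / (np.linalg.norm(Ec, axis=0) * np.linalg.norm(tc))
    order = np.argsort(-r)                           # most similar first
    return [(gene_names[j], float(r[j])) for j in order if gene_names[j] not in seeds][:top]

rng = np.random.default_rng(3)
p = rng.random(60)                                   # toy spatial pattern over 60 voxels
energy = np.column_stack([p, 1.5 * p, 2 * p + 1, rng.random(60)])
names = ["Mbp", "Mobp", "GeneX", "GeneY"]            # GeneX/GeneY are made-up placeholders
print(find_similar_genes(energy, names, ["Mbp", "Mobp"], top=2))
```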