genetics journal club

Genetics Journal Club Sylvia 03/26/15


Page 1: Genetics Journal Club

Genetics Journal Club

Sylvia 03/26/15

Page 2: Genetics Journal Club
Page 3: Genetics Journal Club

Interactions (4C), using the Fto (purple) or Irx3 (blue) promoter as a viewpoint. The locus is displayed around the inner circle, the interactions are displayed as lines (darker lines symbolizing greater significance), and interactions above background are shown on the outer circles. (Figure 1a, taken from Smemo et al.)

GWAS SNP to target gene(s) association

eQTL

ASE

Chromosome Conformation

Page 4: Genetics Journal Club

Transcriptional Regulation of Gene Expression

Page 5: Genetics Journal Club

Chromosome Conformation Capture

Capture-C

“all-versus-all” “one-versus-some” “one-versus-all” “many-versus-many”

Hi-C

2002 2006 2009 2014 2006

Page 6: Genetics Journal Club
Page 7: Genetics Journal Club
Page 8: Genetics Journal Club

What?
Hi-C
CTCF
previously published histone mods and MethylC
public DNase
public WGS
RNA-Seq

Why?
Describe higher-order chromatin organization during lineage specification

Page 9: Genetics Journal Club

TeSR™-E8™

Page 10: Genetics Journal Club

The trophoblast forms the outer layer of the blastocyst; its cells provide nutrients to the embryo and develop into a large part of the placenta.

blastocyst

“Among the most studied of all hESC lines, H1 was derived by James Thomson, director of regenerative biology at the Morgridge Institute for Research and professor of anatomy at UW-Madison, during his 1998 breakthrough discovery of these unique and promising cells. H1 is the first of the formerly approved pre-2001 Bush era lines to meet the new NIH guidelines for stem cell research.”

H1 human embryonic stem cell line

Page 11: Genetics Journal Club

Mesoderm and Mesenchymal Stem Cells

Page 12: Genetics Journal Club

NPCs and Neurons (Ectoderm)

IMR90: human fetal lung fibroblast cell line

Page 13: Genetics Journal Club

H1 differentiation

NPC: 7 d in E8 minus FGF2, minus TGFβ1, with only 5 μg/ml insulin, with 10 μM SB431542 and 100 ng/ml Noggin.

TB: 5 d in E8 minus FGF2 with 50 ng/ml BMP4.

ME: 2 d in E8 with 5 ng/ml BMP4 and 25 ng/ml Activin A.

MSC: 6 d in M-SFEM containing 50% StemLine™ II serum-free HSC expansion medium (HSFEM; Sigma), 50% ESFM, GlutaMAX™ (1/100 dilution), Ex-Cyte® supplement (1/2000 dilution), 100 μM MTG, and 10 ng/ml FGF2.

Neurons: additional 25 d in DMEM/F12 medium supplemented with 1x N2, 1x B27, 64 μg/ml vitamin C, 14 ng/ml sodium selenite and 5 ng/ml FGF2.

Page 14: Genetics Journal Club

Topologically associating domains (TADs)

Jargon

Mb-scale compartments of interphase chromosomes

either open, gene-rich, highly transcribed and interactive (A compartment) or closed, gene-poor, less transcriptionally active (B compartment)

Page 15: Genetics Journal Club

TADs
based on Directionality Index and Hidden Markov Model

Directionality Index

“We noted that the regions at the periphery of the topological domains are highly biased in their interaction frequencies… To determine the directional bias at any given bin in the genome, we developed a Directionality Index (DI) to quantify the degree of upstream or downstream bias of a given bin. The directionality index is calculated in equation 1, where A is the number of reads that map from a given 40kb bin to the upstream 2Mb, B is the number of reads that map from the same 40kb bin to the downstream 2Mb, and E, the expected number of reads under the null hypothesis, is equal to (A + B)/2.”
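The quote refers to equation 1 without showing it. A minimal sketch of the calculation, assuming the signed chi-square-like form DI = ((B - A) / |B - A|) * ((A - E)^2 / E + (B - E)^2 / E) with E = (A + B) / 2; the function name and the toy read counts are hypothetical and not taken from the slides.

```python
def directionality_index(upstream_reads: float, downstream_reads: float) -> float:
    """Directionality Index for one bin.

    upstream_reads   (A): reads mapping from the bin to the upstream window
    downstream_reads (B): reads mapping from the bin to the downstream window
    """
    a, b = upstream_reads, downstream_reads
    if a == b:
        # No directional bias; this also avoids dividing by zero when A = B = 0.
        return 0.0
    e = (a + b) / 2  # expected reads under the null hypothesis
    sign = 1.0 if b > a else -1.0  # positive = downstream bias, negative = upstream bias
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)


# Toy counts (hypothetical): a bin biased downstream and a bin biased upstream.
di_downstream = directionality_index(10, 30)
di_upstream = directionality_index(30, 10)
print(di_downstream, di_upstream)
```

A run of strongly negative DI values followed by a run of strongly positive values is the pattern expected around a domain boundary, which is what the Hidden Markov Model step is then used to segment.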

Page 16: Genetics Journal Club

Figure 1 | Dynamic reorganization of chromatin structure during differentiation of human ES cells.

First principal component (PC1) values

Hi-C interaction heat maps

blue: A compartment; yellow: B compartment

36% A/B compartment change in at least one lineage
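A number like the 36% above can be read as the fraction of genomic bins whose compartment call differs from ES cells in at least one lineage. A minimal sketch, assuming positive PC1 marks compartment A and negative PC1 marks compartment B (the sign convention, function names and toy values are assumptions for illustration only).

```python
def compartment(pc1: float) -> str:
    """Assumed convention: positive PC1 = A compartment, otherwise B."""
    return "A" if pc1 > 0 else "B"


def fraction_switching(pc1_by_lineage: dict[str, list[float]], reference: str = "ES") -> float:
    """Fraction of bins whose A/B call differs from the reference in >= 1 lineage."""
    ref_calls = [compartment(v) for v in pc1_by_lineage[reference]]
    switched = 0
    for i, ref_call in enumerate(ref_calls):
        if any(
            compartment(values[i]) != ref_call
            for name, values in pc1_by_lineage.items()
            if name != reference
        ):
            switched += 1
    return switched / len(ref_calls)


# Toy PC1 values for five bins in four cell types (hypothetical numbers).
toy_pc1 = {
    "ES": [0.8, 0.5, -0.4, -0.9, 0.2],
    "ME": [0.7, 0.4, -0.5, -0.8, 0.3],
    "NPC": [0.6, -0.3, -0.2, -0.7, 0.1],
    "MSC": [0.9, -0.6, 0.4, -0.9, 0.2],
}
print(fraction_switching(toy_pc1))
```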

A/B compartments within TADs

Page 17: Genetics Journal Club

b) K-means clustering (k=20) of PC1 values for 40-kb regions of the genome that change A/B compartment status in at least one lineage. c) K-means clustering of PC1 values surrounding TAD boundaries (b=TAD boundary)

many A/B transitions are lineage-restricted

blue: A compartment; yellow: B compartment

expansion of repressive heterochromatin during differentiation

Page 18: Genetics Journal Club

d) Distribution of fold-change in gene expression for genes that change compartment status or that remain the same (‘stable’) upon differentiation.
e) Genome browser for two genes of which one (OTX2) shows concordance between expression and PC1 values, whereas a second (TMEM260) does not.

A to B change correlates with decreased expression
B to A change with increased expression

subtle effects!
all genes in compartment

maybe only subset of genes affected

Page 19: Genetics Journal Club

Figure 2 | Domain-wide alterations in chromatin interaction frequency and chromatin state.

a) Chromatin interaction heat maps in H1 lineages and IMR90 fibroblasts. Also shown are domain calls in ES cells and the directionality index (DI) in each lineage.

positioning of TADs stable

Page 20: Genetics Journal Club

b) Changes in interaction frequency between ES and MS cells. Regions with higher interaction frequency in ES cells are shown in blue, while regions with higher interaction frequency in MS cells are shown in yellow. TADs having a concerted increase or decrease in intra-domain interaction frequency are labelled yellow or blue, respectively, with the fraction of the domain showing increased or decreased interaction frequency listed. Domains that do not show a concerted change are shown in grey.

domain-wide increase or decrease in interactions

Page 21: Genetics Journal Club

c) Boxplots of Pearson correlation coefficients between interaction frequency changes and chromatin mark changes across TADs for each chromosome (n=23).
d) Classification accuracy of the Random Forest model in predicting whether a bin increases or decreases in interaction frequency (n=768,793), tested on 10 randomly selected subsets of Hi-C data (actual data (blue), circularized permutation (green) and a random permutation (yellow)).
e) Ranked chromatin features shown according to importance in classification as boxplots of the mean decrease in Gini index from 10 randomly selected data subsets. Whiskers correspond to the highest and lowest points within 1.5× the interquartile range.

interaction frequency changes correlate with histone mark changes

LOI: loss of interaction
GOI: gain of interaction

Page 22: Genetics Journal Club

The vector of histone modification values was calculated as follows.

For each 40-kb interacting bin, the enrichment of a given chromatin mark in the two 40-kb bins that compose the interaction was averaged.

The average enrichment was then multiplied by a weight proportional to the genomic distance between the two 40-kb bins. This weight was based on the global average of Hi-C interaction frequencies from six lineages analyzed between loci separated by a given genomic distance.

The two vectors were used to calculate a Pearson correlation in each chromosome, which reflects how change in domain-wide
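The steps above can be sketched roughly as follows. This is an illustration only: the per-bin mark changes, the distance weights and the interaction changes are hypothetical toy numbers, the function names are made up, and the choice to apply the weighting directly to per-bin mark changes is an assumption.

```python
from math import sqrt


def weighted_mark_value(enrich_i, enrich_j, distance_bins, expected_by_distance):
    """Average the mark over the two bins of an interaction, then weight it
    by the global average interaction frequency at that bin separation."""
    return (enrich_i + enrich_j) / 2 * expected_by_distance[distance_bins]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


# Hypothetical change in one chromatin mark for four consecutive 40-kb bins.
mark_change = [0.5, 1.5, -1.0, 2.0]
# Hypothetical global average interaction frequency by bin separation.
expected_by_distance = {1: 1.0, 2: 0.5, 3: 0.25}

pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
mark_vector = [
    weighted_mark_value(mark_change[i], mark_change[j], j - i, expected_by_distance)
    for i, j in pairs
]
# Hypothetical change in interaction frequency for the same six bin pairs.
interaction_change = [0.9, 0.1, 0.4, 0.2, 1.1, 0.3]

r = pearson(mark_vector, interaction_change)
print(round(r, 3))
```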

Page 23: Genetics Journal Club

Random Forest model

to better understand which histone mark is most predictive for changes in interaction frequency
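Feature importance in Figure 2e is reported as mean decrease in Gini index. As a rough illustration of that quantity, the sketch below computes the Gini impurity decrease for a single split on one feature; a Random Forest averages such decreases over many splits and trees. The feature values, labels and threshold are hypothetical, and this is not the model used in the paper.

```python
def gini(labels):
    """Gini impurity of binary labels (1 = gain of interaction, 0 = loss)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease when splitting the bins at `threshold` on one feature."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted_children


labels = [0, 0, 1, 1]                      # hypothetical loss/gain calls for four bins
informative_mark = [-2.0, -1.0, 1.0, 2.0]  # mark change that tracks the labels
uninformative_mark = [-2.0, 1.0, -1.0, 2.0]

print(gini_decrease(informative_mark, labels, 0.0))
print(gini_decrease(uninformative_mark, labels, 0.0))
```

A mark whose change separates gain from loss bins cleanly gives a large decrease; a mark unrelated to the labels gives a decrease near zero.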

Page 24: Genetics Journal Club

Figure 3 | Haplotype-resolved chromatin organization in H1 lineages.

Phasing = Haplotype Inference from population genotype data (HapMap, 1000 Genomes)

haplotype-specific reanalysis of genome-wide data

p1, p2: parental alleles

Page 25: Genetics Journal Club

Genotypes
Public WGS data
hg18 (Novoalign)
Picard tools, GATK, Unified Genotyper

Haplotypes
From Hi-C data
HaploSeq method using HapCUT

chromosome-span haplotypes including 93.5% of all H1 heterozygous variants

Page 26: Genetics Journal Club

c) Genome browser image of PC1 values along chromosome 2 for the p1 and p2 allele.
d) Allele-specific compartment A/B patterns and mRNA-seq surrounding the imprinted ZDBF2 gene.
e) Boxplots of the difference between alleles of PC1 values. Regions with imprinted genes and allelic genes have more variable PC1 values.
f) Similar to e, but for regions with differential allelic chromatin activity (the number of allelic biased variants per 200-kb bin). Regions in the top 0.1% of differential allelic activities (orange) show greater differences in PC1 values compared with other regions.

A/B compartment patterns highly similar (autosomes)

only 0.6-2.3% of genome

Page 27: Genetics Journal Club

Imprinted genes:

list of known imprinted genes downloaded from www.geneimprint.com

Page 28: Genetics Journal Club

Figure 4 | Allelic biases in gene expression in H1 lineages.

a) Proportion of genes with detectable allelic expression with statistically significant allelic bias.
b) Density plot of the absolute value of the fold change in expression (log2) between alleles.

mostly not on/off events
both lineage-specific and constitutive genes
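To make "statistically significant allelic bias" and the log2 fold change between alleles concrete, here is a minimal sketch using an exact two-sided binomial test against a 50:50 expectation. The slides do not say which test the paper used, so the test, the pseudocount and the read counts are all assumptions.

```python
from math import comb, log2


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that ties are not lost to floating-point rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allelic_log2_ratio(p1_reads: int, p2_reads: int, pseudocount: float = 1.0) -> float:
    """log2 ratio of p1 to p2 read counts; the pseudocount is an assumption."""
    return log2((p1_reads + pseudocount) / (p2_reads + pseudocount))


# Hypothetical gene with 45 reads from the p1 allele and 15 from the p2 allele.
p1_reads, p2_reads = 45, 15
p_value = binomial_two_sided_p(p1_reads, p1_reads + p2_reads)
ratio = allelic_log2_ratio(p1_reads, p2_reads)
print(p_value, ratio)
```

In this toy case the gene is biased but far from an on/off event: both alleles are expressed, which is the pattern the density plot in panel b describes for most genes.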

Page 29: Genetics Journal Club

c) Heat map showing k-means (k=20) clustering of the allelic expression ratios (log2) at genes with constitutively testable expression (a minimum of 10 reads in each lineage).

d) Genome browser image of variable allelic expression of the PARP9 gene.

?

Only in rare cases do genes switch expression from one allele to the other between cell types.

Page 30: Genetics Journal Club

e) Fraction of imprinted genes among allele-biased genes and other genes (P = 4.4 × 10^-5, Fisher’s exact test).
f) Fraction of allele-biased genes that are known imprinted genes.
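Panel e rests on a Fisher's exact test on a 2×2 table (allele-biased vs. other genes, imprinted vs. not imprinted). A minimal sketch of a one-sided version of that test follows; the counts are placeholders because the slides do not give the table, and whether the paper used a one-sided or two-sided test is not stated here.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment.

    Table layout (rows = gene set, columns = imprinted status):
        a = allele-biased and imprinted    b = allele-biased, not imprinted
        c = other and imprinted            d = other, not imprinted
    Returns P(X >= a) under the hypergeometric null.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


# Placeholder counts (hypothetical, not from the paper).
p_enrichment = fisher_exact_greater(3, 1, 1, 3)
print(p_enrichment)
```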

Imprinted genes are overrepresented within ASE genes

imprinted genes often occur in clusters; the majority of allele-biased gene expression is not clustered in the genome

but imprinting not MAIN mechanism of ASE

Page 31: Genetics Journal Club

g) Cumulative density plot of distances from variants to the nearest allele-specific gene. Allele-specific variants are defined using histone acetylation, H3K9me3, H3K27me3, DHS and H3K4me3.
h) Number of allele-biased genes showing consistent allele-specific chromatin states in their promoter regions. Active variants are defined by H3K4me3, DHS or histone acetylation. Inactive promoter variants are defined by DNA methylation and H3K9me3/H3K27me3.
i) Genome browser image of mRNA-seq and chromatin features surrounding the TDG gene.

> role of cis-regulatory elements in ASE

allele-specific chromatin mark SNPs closer to ASE genes

ASE strongly correlated with allele-specific chromatin marks at promoters

29%

majority of ASE genes show allele-specific chromatin marks in promoter

?

Page 32: Genetics Journal Club

Figure 5 | Allele biases at enhancers in H1 lineages.

a) Enrichment of acetylation (top row), DHS (middle) and DNA methylation (bottom) at enhancers defined as allelic by acetylation (left column), DHS (middle), or DNA methylation (right). The active allele is in blue, inactive allele in red.
b) The distance between allelic genes and enhancers as defined by allelic histone acetylation (purple) compared with randomly selected enhancers (grey).
c) Number of allele-specific genes linked to concordantly biased allele-specific enhancers. Genes linked by ‘long-range enhancers’ are defined using Hi-C interaction frequencies, whereas ‘short-range enhancers’ are defined as any enhancer less than 20 kb from a gene’s transcription start site.

high correlation between allelic enhancers and enrichment

allelic enhancers closer to ASE genes

mostly long-range enhancers

Page 33: Genetics Journal Club

d) Boxplots of the Pearson correlation coefficients between allelic gene-enhancer pairs defined by acetylation. Gene-enhancer pairs are grouped into strongly interacting (top 30%), weakly interacting (bottom 30%), and intermediately interacting pairs (others) based on Hi-C interaction frequency (P values using Welch’s t-test).
e) Normalized 4C-seq interaction frequencies near the HAPLN1 gene. The 4C-seq bait region is in an allele-biased enhancer near the 3’ end of the EDIL3 gene. Specific interactions called by the LOWESS regression model are shown in black as ‘bait interacting regions’ (BIRs).

correlation strongest for strong Hi-C interactions
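The grouping behind panel d (top 30%, bottom 30%, and everything in between by Hi-C interaction frequency) can be sketched as below. The pair identifiers and frequencies are hypothetical, and how ties or rounding at the 30% cut-offs were handled in the paper is not known from the slides.

```python
def group_by_interaction(pairs):
    """Split (pair_id, hic_frequency) tuples into strong / intermediate / weak.

    strong = top 30% by interaction frequency, weak = bottom 30%,
    intermediate = the remaining pairs.
    """
    ranked = sorted(pairs, key=lambda item: item[1], reverse=True)
    n = len(ranked)
    k = n * 30 // 100  # integer arithmetic avoids floating-point rounding surprises
    return {
        "strong": [pair_id for pair_id, _ in ranked[:k]],
        "intermediate": [pair_id for pair_id, _ in ranked[k:n - k]],
        "weak": [pair_id for pair_id, _ in ranked[n - k:]],
    }


# Ten hypothetical gene-enhancer pairs with interaction frequencies 1..10.
toy_pairs = [(f"pair{i}", float(i)) for i in range(1, 11)]
groups = group_by_interaction(toy_pairs)
print(groups)
```

The per-group Pearson correlations between allelic enhancer activity and allelic expression would then be compared across these three groups.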

4/6 allelic enhancers interact with ASE gene promoter

Page 34: Genetics Journal Club

f) Allele-biased expression of the two alleles of the HAPLN1 gene, histone acetylation levels at the nearby interacting allele-biased enhancer and allele resolved 4C-seq data

Are there allele-specific interactions of allelic enhancers and target genes (Hi-C data)?

Supplementary Figure 9a

correlation trend between allelic enhancer and expression or 4C interaction frequency

Page 35: Genetics Journal Club

Summary

Extensive A/B compartment switching during differentiation
  36% of genome in at least one lineage
  correlate with gene expression level changes

Domain-level changes in interaction intensity
  correlate with changes in chromatin marks
  predictable (H3K4me1 most informative)

Allele-specific chromatin organization
  autosomal A/B compartments mostly stable between alleles
  allelic difference at imprinted genes and regions with allelic chromatin activity

Allelic imbalance between different lineages
  both lineage-specific and constitutive genes
  imprinted genes overrepresented

ASE genes and allelic promoters/enhancers
  strong correlation between ASE genes and allelic chromatin marks at promoters
  evidence for correlation between allelic enhancers and ASE

Page 36: Genetics Journal Club

Discussion

Analysis of allele-specific chromosome interactions problematic because of Hi-C resolution problem

Figure 2b: Are domain-wide changes in interaction intensities associated with lineage-specific genes?

move from Hi-C in different lineages (Fig. 1 & 2) to ASE and chromatin marks (Fig. 3 & 4), back to Hi-C

Page 37: Genetics Journal Club

Both compartment subset gene changes and whole domain changes?