Investigate Variation of Chromatin Interactions in Human Tissues Hiren Karathia, PhD., Sridhar Hannenhalli, PhD., Michelle Girvan, PhD.

Uploaded by christal-mckenzie on 19-Jan-2016


Slide 1

Investigate Variation of Chromatin Interactions in Human Tissues

Hiren Karathia, PhD., Sridhar Hannenhalli, PhD.,

Michelle Girvan, PhD.

Slide 2

Introduction to the Hi-C experiment

In-silico Analysis

Slide 3

Objectives

• Develop a general pipeline for Hi-C data processing

• Detect gene-centric Hi-C interactions across different cell types

• Distinguish ubiquitous from tissue-specific gene-gene interactions

• Quantify the spatial proximity of genes within each pathway, and compare this pathway proximity across multiple cell lines

• Investigate correlation between pathway proximity and pathway activity (approximated by expression of pathway genes)

• More…

Slide 4

Summary

• Hind-III fragment distribution of the human genome (Slides 7 & 8) – These slides show the number of Hind-III fragments per chromosome obtained by in-silico digestion of the human genome (Hind-III recognizes AAGCTT). All downstream Hi-C analyses are based on these fragments.

• Hi-C data processing (Slides 9 & 10) – Slide 9 lists the processed samples. The crucial steps are filtration and normalization of the Hi-C interactions.

– Filtration: removal of technical biases from the Hi-C data, including GC content, ligation preferences (self-ligations) and unequal tag densities.

– Normalization: a background of expected Hi-C reads between two given regions is calculated, assuming that interaction probability decreases with increasing distance between the two regions.

– Selection of significant interactions: significant interactions are selected by comparing the observed number of reads with the expected number of reads (odds ratio), at significance cut-offs of P = 0.001 and P = 0.05.

• Annotation of significant Hi-C interactions (Slide 11) – Hi-C interactions are annotated with hg19 genomic features (gene structures, promoters, intergenic and non-coding regions).

• Non-redundant genes in Hi-C interactions (Slide 12) – All annotated genes and promoters involved in a significant Hi-C interaction are selected. The slide shows the numbers of genes and promoters in the replicates of all tissues.

• Non-redundant Hi-C interactions across tissues and replicates (Slide 13) – Hi-C interactions whose end points map to different genomic features, in either replicate of every tissue.

Slide 5

Summary

• Inter-tissue comparison of Hi-C interactions (Slide 14) – Gene-gene Hi-C interactions from the replicates of each tissue were merged, and we searched for interactions unique to a single tissue and those shared by a pair of tissues (Figure A). Figure B shows the number of gene-gene interactions found in common across a given number of tissues.

• KEGG pathways analysis (Slide 15) – KEGG pathways with fewer than 5 annotated genes were excluded. The edge fraction was used to quantify the spatial proximity of the genes in a pathway.

• Z-score distribution of the KEGG pathways (Slides 16 & 17) – The edge fraction (and its z-score, based on 500 length-controlled random gene sets) was calculated for ALL pathways in ALL cell types.

• Inter-tissue comparison of pathway proximity (Slide 18) – Unique and shared spatially proximal pathways are shown for two z-score thresholds (z >= 1 and z >= 2).

• Heat-map for the pathways Hi-C analysis (Slide 19) – The heatmap shows the z-scores of all pathways in the 6 tissues. Pathways are clustered by the Manhattan distance between their z-score vectors.

Slide 6

Future Work

• Finding Hi-C interactions at lower stringency – Because read coverage is low in a few tissues, very few significant interactions are detected in them. We will repeat the analyses with a less stringent significance cut-off (see the updated Slide 10).

• Processing RNA-Seq – Matched RNA-Seq data are available for 4 tissues. We will test the hypothesis that the spatial proximity of pathways correlates with the expression of pathway genes.
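One way to run the planned test is a rank correlation, within a tissue, between each pathway's proximity z-score and a summary of its genes' expression. The sketch below assumes Spearman's rho and a per-pathway mean expression value; both choices, the function names and the toy numbers are illustrative assumptions, not the authors' stated method.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-pathway values for one tissue (illustrative, not from the slides).
proximity_z = {"pathway_A": 2.4, "pathway_B": 0.3, "pathway_C": 1.1, "pathway_D": -0.5}
mean_expression = {"pathway_A": 8.1, "pathway_B": 5.0, "pathway_C": 6.2, "pathway_D": 4.7}
pathways = sorted(proximity_z)
rho = spearman([proximity_z[p] for p in pathways],
               [mean_expression[p] for p in pathways])  # identical rankings, so rho is ~1.0
```

With the 4 matched tissues, one such correlation per tissue could be computed and compared. A constant input (zero variance) leaves the correlation undefined; this sketch then raises ZeroDivisionError.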

Slide 7

Hind-III RE Sites on Annotated Hg19 Genome

Figure panels: Frequency of Hind-III fragments (per chromosome); Chromosome length / number of Hind-III fragments (per chromosome)

Chromosome   Frequency of Hind-III fragments
1            64395
2            71483
3            59636
4            58925
5            55089
6            51311
7            45517
8            43187
9            34643
10           38016
11           39326
12           38987
13           29412
14           25752
15           22933
16           19903
17           18939
18           22771
19           11542
20           15705
21           10089
22           7582
X            45074
Y            7430
M            4
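The per-chromosome counts above come from an in-silico Hind-III digest. A minimal sketch of that step, assuming a chromosome is available as a plain sequence string (FASTA parsing for hg19 is omitted) and that fragments are the pieces between consecutive cut sites plus the two chromosome ends; the function names are hypothetical, and the slides do not state how terminal pieces or N-runs were handled.

```python
import re

HINDIII_SITE = "AAGCTT"  # Hind-III recognition sequence; cuts A^AGCTT
CUT_OFFSET = 1           # the cut falls after the first base of the site


def hindiii_fragments(seq):
    """Half-open (start, end) coordinates of in-silico Hind-III fragments.

    AAGCTT is its own reverse complement, so scanning one strand finds every site.
    """
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(HINDIII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


def mean_fragment_length(seq):
    """Chromosome length divided by its number of Hind-III fragments."""
    return len(seq) / len(hindiii_fragments(seq))


toy_chrom = "GGAAGCTTCCCCAAGCTTGG"  # hypothetical sequence with two sites
fragments = hindiii_fragments(toy_chrom)  # [(0, 3), (3, 13), (13, 20)]
```

The lower panel's ratio (chromosome length / number of fragments) corresponds to `mean_fragment_length` applied to a whole chromosome sequence.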

Slide 8

Distribution of RE sites in cell line sample

No. of Hind-III fragments per chromosome:

Chromosome   Reference   hESC-1   hESC-2
1            64395       62899    62988
2            71483       70355    70476
3            59636       59015    59123
4            58925       58207    58369
5            55089       54124    54195
6            51311       50729    50804
7            45517       44519    44571
8            43187       42537    42609
9            34643       32456    32469
10           38016       37072    37161
11           39326       38765    38783
12           38987       38661    38673
13           29412       29160    29178
14           25752       25446    25459
15           22933       22125    22224
16           19903       19325    19349
17           18939       18567    18575
18           22771       22548    22594
19           11542       11381    11396
20           15705       15420    15538
21           10089       9970     9983
22           7582        7371     7357
X            45074       43571    43512
Y            7430        699      4986
M            4           4        4

Slide 9

Hi-C data processing

Sample FASTQ files → alignment (BWA) → BAM file → merged BAM file → sorted BAM file (Samtools) → de-duplicated file (Picard tool) → separated Hi-C interacting reads as a SAM file (Samtools) → normalize Hi-C reads → select significant interactions → annotate the interactions (HOMER tools) → gene-centric analysis and pathways analysis (in-house Python scripts)

Tissue ID   Tissue                Source      DNA
HEK293      Kidney                Cell line   Replicates 1 & 2
hESC        Embryonic stem cell   Cell line   Replicates 1 & 2
IMR90       Lung fibroblast       Cell line   Replicates 1 & 2
BT483       Mammary gland         Cell line   Replicates 1 & 2
GM06990     B-lymphocyte          Cell line   Replicates 1 & 2
RWPE1       Prostate epithelial   Cell line   Replicates 1 & 2
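The "Separate Hi-C interacting reads" step can be pictured as assigning each mapped read end to its Hind-III fragment and discarding pairs whose ends fall in the same fragment, a common proxy for the self-ligation bias named in the summary. This is a sketch under assumptions: read ends arrive as (chromosome, 0-based position) tuples, cut positions are precomputed per chromosome, and the function names are hypothetical. On the slide this work is done with Samtools and HOMER tools, whose exact filtering rules may differ.

```python
from bisect import bisect_right


def fragment_index(cuts, pos):
    """Index of the Hind-III fragment containing 0-based position pos.

    cuts: sorted cut positions on one chromosome. Fragment 0 starts at 0;
    fragment k (k >= 1) spans [cuts[k-1], cuts[k]).
    """
    return bisect_right(cuts, pos)


def classify_pairs(pairs, cuts_by_chrom):
    """Split mapped read pairs into intra-/inter-chromosomal contacts.

    pairs: iterable of ((chrom1, pos1), (chrom2, pos2)).
    Pairs with both ends in one fragment are counted and dropped, as an
    assumed stand-in for self-ligation / undigested products.
    """
    kept = {"intra": [], "inter": []}
    dropped = 0
    for end1, end2 in pairs:
        (chrom1, pos1), (chrom2, pos2) = end1, end2
        if chrom1 != chrom2:
            kept["inter"].append((end1, end2))
        elif fragment_index(cuts_by_chrom[chrom1], pos1) == fragment_index(cuts_by_chrom[chrom1], pos2):
            dropped += 1
        else:
            kept["intra"].append((end1, end2))
    return kept, dropped


# Toy example with hypothetical cut positions (e.g. from an in-silico digest).
cuts_by_chrom = {"chr1": [3, 13], "chr2": [5]}
example_pairs = [
    (("chr1", 1), ("chr1", 2)),   # same fragment -> dropped
    (("chr1", 1), ("chr1", 15)),  # different fragments -> intra-chromosomal
    (("chr1", 4), ("chr2", 7)),   # different chromosomes -> inter-chromosomal
]
kept, dropped = classify_pairs(example_pairs, cuts_by_chrom)
```

Using `bisect_right` keeps each lookup logarithmic in the number of cut sites, which matters with tens of thousands of fragments per chromosome.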

Slide 10

Normalization & Filtration of Hi-C interactions

N = estimated total number of reads
n = estimated total number of interaction reads at each region
f = expected frequency of Hi-C reads as a function of distance

Select Significant Intra/Inter chromosomal interactions

Figure panels: Random interactions; Interactions after the normalization and filtration process

Annotate the Interactions
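The slide defines N, n and f, but the background formula itself did not survive extraction. As an illustrative assumption only, the sketch below takes the expected reads between regions i and j as n_i * n_j / N scaled by a distance factor f(d), and scores an observed count with a Poisson upper-tail probability against the cut-offs from the summary (0.001 and 0.05). The Poisson tail is a simple stand-in, not necessarily the test HOMER applies, and the function names are hypothetical.

```python
import math


def expected_reads(n_i, n_j, total_reads, f_d):
    """Hypothetical background: n_i * n_j / N scaled by the distance factor f(d)."""
    return n_i * n_j / total_reads * f_d


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 - P(X <= observed - 1); adequate near cut-offs such as
    0.05 or 0.001, but an explicit upper-tail sum is better for tiny p-values.
    """
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for x in range(observed):
        cdf += term
        term *= expected / (x + 1)  # P(X = x + 1) from P(X = x)
    return max(0.0, 1.0 - cdf)


def is_significant(observed, expected, p_cutoff=0.001):
    """Return (observed / expected ratio, p-value, whether p < p_cutoff)."""
    p = poisson_upper_tail(observed, expected)
    return observed / expected, p, p < p_cutoff


# Toy numbers: region totals 100 and 200 out of 10,000 reads, f(d) = 0.5.
exp_ij = expected_reads(100, 200, 10_000, 0.5)  # 1.0 expected read
ratio, p_value, significant = is_significant(6, exp_ij)
```

Lowering `p_cutoff` from 0.001 to 0.05 is the kind of change the lower-stringency re-analysis describes.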

Slide 11

Annotation of Hi-C interactions on Genomic Structures (e.g., HEK293 tissue)

Counts of annotated Hi-C interactions by genomic feature and chromosome:

Chr    3' UTR  5' UTR  Intergenic  TTS   exon  intron  non-coding  promoter-TSS
chr1   236     22      9600        269   320   8961    62          312
chr2   134     13      7780        148   176   5978    52          155
chr3   172     16      8279        175   206   7014    44          176
chr4   103     10      8473        93    130   4877    23          105
chr5   148     8       7441        115   118   4922    35          143
chr6   149     23      9382        174   199   6225    64          198
chr7   98      6       5430        104   130   5081    33          108
chr8   76      13      4955        79    79    3645    34          77
chr9   70      11      4525        84    90    3696    32          124
chr10  103     10      4625        110   132   4828    40          106
chr11  117     10      3410        96    116   3341    35          145
chr12  131     19      5915        134   160   4821    30          156
chr13  62      9       7964        71    88    4356    42          96
chr14  93      14      3792        102   124   3423    23          99
chr15  78      12      3389        77    118   3755    28          101
chr16  50      8       1658        60    90    1729    20          81
chr17  112     13      2548        137   146   3368    36          166
chr18  49      2       2630        31    43    1841    6           34
chr19  40      3       838         63    62    793     13          60
chr20  44      11      1787        43    64    1918    24          55
chr21  41      5       1754        32    27    1296    10          37
chr22  55      6       1141        64    79    1714    18          57
chrX   43      3       2350        27    52    1408    6           33
chrY   0       0       23          0     0     2       1           0

Slide 12

Genes & Promoters in Hi-C interactions

Slide 13

Genomic features on end points of Hi-C interactions
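Slides 12 and 13 reduce annotated interactions to non-redundant genes, promoters and gene-level interactions. A sketch, assuming each significant interaction already carries a (feature, gene) annotation per end point, for example parsed from HOMER annotation output (the record layout and function name are hypothetical). Gene-gene pairs are stored as unordered sets so that A-B and B-A count once.

```python
# Feature labels follow the categories on Slide 11; treating all of them except
# "Intergenic" as gene-linked is an assumption made for this sketch.
GENIC_FEATURES = {"promoter-TSS", "5' UTR", "exon", "intron", "3' UTR", "TTS", "non-coding"}


def gene_level_summary(annotated_interactions):
    """Collapse annotated interactions to non-redundant genes and gene-gene pairs.

    annotated_interactions: iterable of ((feature1, gene1), (feature2, gene2)).
    Returns (genes, promoter_genes, gene_pairs); promoter_genes are genes whose
    promoter-TSS is an interaction end point.
    """
    genes, promoter_genes, gene_pairs = set(), set(), set()
    for end1, end2 in annotated_interactions:
        genic_ends = [(f, g) for f, g in (end1, end2) if f in GENIC_FEATURES and g]
        for feature, gene in genic_ends:
            genes.add(gene)
            if feature == "promoter-TSS":
                promoter_genes.add(gene)
        # Keep gene-gene interactions only; skip contacts within a single gene.
        if len(genic_ends) == 2 and genic_ends[0][1] != genic_ends[1][1]:
            gene_pairs.add(frozenset((genic_ends[0][1], genic_ends[1][1])))
    return genes, promoter_genes, gene_pairs


# Hypothetical annotated interactions (gene names are placeholders).
example = [
    (("promoter-TSS", "GENE1"), ("intron", "GENE2")),
    (("intron", "GENE2"), ("exon", "GENE1")),   # same pair, reversed
    (("Intergenic", None), ("exon", "GENE3")),  # only one genic end
    (("exon", "GENE3"), ("intron", "GENE3")),   # within one gene
]
genes, promoter_genes, gene_pairs = gene_level_summary(example)
```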

Slide 14

Inter-tissue Hi-C gene-gene interactions

Figure – A

Diagonal values represent interactions unique to a tissue; other values represent interactions shared between 2 specific tissues.

Tissues  HEK293  IMR90  hESC   BT483  GM06990  RWPE1
HEK293   19609
IMR90    23015   20608
hESC     17683   16965  6685
BT483    11371   11961  10498  3503
GM06990  8846    9036   9879   8958   1419
RWPE1    6555    6931   5771   4865   4025     1440

Figure – B
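The matrix above can be reproduced with set operations once each tissue's replicates are merged. This sketch assumes interactions are unordered gene pairs, that a diagonal cell counts interactions found in no other tissue, and that an off-diagonal cell is the plain intersection of two tissues; the slide does not say whether pairs also seen in a third tissue are excluded. The histogram function corresponds to Figure B (interactions found in exactly k tissues). Function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def merge_replicates(*replicates):
    """Union of one tissue's replicate gene-gene interaction sets."""
    merged = set()
    for rep in replicates:
        merged |= rep
    return merged


def comparison_matrix(by_tissue):
    """Diagonal: interactions found in no other tissue; off-diagonal: pairwise overlap."""
    matrix = {}
    for tissue, inter in by_tissue.items():
        others = set().union(*(s for t, s in by_tissue.items() if t != tissue))
        matrix[(tissue, tissue)] = len(inter - others)
    for a, b in combinations(by_tissue, 2):
        matrix[(a, b)] = matrix[(b, a)] = len(by_tissue[a] & by_tissue[b])
    return matrix


def tissue_count_histogram(by_tissue):
    """For each k, the number of interactions present in exactly k tissues."""
    presence = Counter()
    for inter in by_tissue.values():
        presence.update(inter)  # each interaction counted once per tissue
    return Counter(presence.values())


def pair(a, b):
    return frozenset((a, b))  # unordered gene-gene interaction


by_tissue = {
    "HEK293": merge_replicates({pair("A", "B"), pair("C", "D")},
                               {pair("A", "B"), pair("E", "F")}),
    "IMR90": {pair("A", "B"), pair("G", "H")},
}
matrix = comparison_matrix(by_tissue)
histogram = tissue_count_histogram(by_tissue)
```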

Slide 15

Pathways analysis

Evaluate the edge fraction as a statistical measure of the spatial proximity of pathway genes

E(f) = set of observed gene-gene interactions in a pathway

Ea(f) = set of all possible gene-gene interactions among the genes in a pathway

Z-score of the edge fraction, calculated against randomly selected, length-controlled gene sets
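Read together, the two definitions suggest an edge fraction of |E(f)| / |Ea(f)|: the observed gene-gene interactions among a pathway's genes divided by all n(n-1)/2 possible pairs for n genes. That ratio is an interpretation, not a formula shown on the slide. The sketch also applies the summary's rule that pathways with fewer than 5 annotated genes are excluded; the function name is hypothetical.

```python
from itertools import combinations

MIN_GENES = 5  # KEGG pathways with fewer than 5 annotated genes are excluded


def edge_fraction(pathway_genes, interactions):
    """|E(f)| / |Ea(f)| for one pathway, or None if it has too few genes.

    interactions: set of frozenset({gene_a, gene_b}) gene-gene Hi-C interactions.
    """
    genes = sorted(set(pathway_genes))
    if len(genes) < MIN_GENES:
        return None
    possible = list(combinations(genes, 2))  # Ea(f): all n(n-1)/2 pairs
    observed = sum(1 for a, b in possible if frozenset((a, b)) in interactions)  # |E(f)|
    return observed / len(possible)


# Hypothetical pathway and interactions; G1-X lies outside the pathway and is ignored.
pathway = ["G1", "G2", "G3", "G4", "G5"]
interactions = {frozenset(p) for p in [("G1", "G2"), ("G2", "G3"), ("G1", "G5"), ("G1", "X")]}
ef = edge_fraction(pathway, interactions)  # 3 observed / 10 possible
```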

Slide 16

Pathways analysis for Gene-Gene Interactions

49456 Interactions 14524 Interactions 30356 Interactions

Slide 17

Pathways analysis for Gene-Gene Interactions

20018 Interactions 50841 Interactions 10088 Interactions
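Slides 16 and 17 report edge-fraction z-scores against 500 length-controlled random gene sets. The sketch assumes "length-controlled" means replacing each pathway gene with a distinct gene from the same gene-length quantile bin; the binning scheme, bin count, seed and function names are all assumptions. The score is passed in as a function: in the pathway analysis it would compute a gene set's edge fraction, while the toy check below uses a simpler score.

```python
import random
import statistics


def length_bins(gene_lengths, n_bins=10):
    """Assign every gene to a gene-length quantile bin numbered 0 .. n_bins-1."""
    ordered = sorted(gene_lengths, key=gene_lengths.get)
    return {gene: i * n_bins // len(ordered) for i, gene in enumerate(ordered)}


def length_controlled_zscore(pathway_genes, gene_lengths, score_fn,
                             n_random=500, n_bins=10, seed=0):
    """Z-score of score_fn(pathway) against length-matched random gene sets.

    Each random set swaps every pathway gene for a distinct gene drawn from the
    same length bin; all pathway genes must appear in gene_lengths.
    """
    rng = random.Random(seed)
    pathway_genes = sorted(set(pathway_genes))
    bin_of = length_bins(gene_lengths, n_bins)
    members = {}
    for gene, b in bin_of.items():
        members.setdefault(b, []).append(gene)

    observed = score_fn(pathway_genes)
    null_scores = []
    for _ in range(n_random):
        chosen = set()
        for gene in pathway_genes:
            # Sample without replacement within one random set.
            pool = [g for g in members[bin_of[gene]] if g not in chosen]
            chosen.add(rng.choice(pool))
        null_scores.append(score_fn(sorted(chosen)))

    mu = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    return (observed - mu) / sd if sd > 0 else float("nan")


# Toy check: 100 hypothetical genes; the score is the share of pathway genes in a set.
gene_lengths = {f"g{i}": i + 1 for i in range(100)}
pathway = [f"g{i}" for i in range(90, 95)]
in_pathway = set(pathway)
z = length_controlled_zscore(pathway, gene_lengths,
                             lambda genes: sum(g in in_pathway for g in genes) / len(genes))
```

Fixing the seed makes each pathway's z-score reproducible across runs, which helps when comparing the same pathway across cell types.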

Slide 18

Inter-tissue Hi-C pathways interactions

Z-score >= 1

Tissues  HEK293  IMR90  hESC   BT483  GM06990  RWPE1
HEK293   2
IMR90    112     1
hESC     110     108    1
BT483    107     105    103    2
GM06990  109     109    114    104    1
RWPE1    72      73     73     75     73       0

Z-score >= 2

Tissues  HEK293  IMR90  hESC   BT483  GM06990  RWPE1
HEK293   3
IMR90    76      4
hESC     71      72     7
BT483    61      64     60     0
GM06990  73      72     75     62     1
RWPE1    30      32     32     29     32       3

Slide 19

Heat-map for the Pathways Hi-C analysis
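The summary says only that pathways are clustered by the Manhattan distance between their z-score vectors (one entry per tissue); the linkage rule is not given. The sketch assumes average linkage and a naive agglomerative loop, which is enough to order the rows of a small heatmap; function names and data are hypothetical. For real data, SciPy's `scipy.cluster.hierarchy.linkage` with `metric="cityblock"` would be the usual choice.

```python
def manhattan(u, v):
    """Manhattan (cityblock) distance between two z-score vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def average_linkage_order(zscores):
    """Agglomerate pathways by average-linkage Manhattan distance; return leaf order.

    zscores: dict pathway -> list of z-scores (same tissue order for every pathway).
    """
    names = list(zscores)
    dist = {(a, b): manhattan(zscores[a], zscores[b]) for a in names for b in names}
    clusters = [[n] for n in names]  # each cluster is an ordered list of leaves

    def cluster_distance(c1, c2):
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters (first pair wins on ties).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []


# Hypothetical z-scores for four pathways across three tissues.
example = {
    "P1": [2.0, 2.1, 1.9],
    "P2": [-1.0, -0.8, -1.2],
    "P3": [2.2, 1.8, 2.0],
    "P4": [-0.9, -1.0, -1.0],
}
row_order = average_linkage_order(example)  # similar pathways end up adjacent
```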