Genome‐wide targets of DNA‐binding molecules: from ChIP‐Seq to Chem‐Seq
Mario Nuvolone
Technical Journal Club
May 05th 2015
DNA‐binding proteins
DNA‐binding small molecules
ChIP, ChIP‐on‐chip, ChIP‐seq
Chem‐seq
DNA‐binding proteins
‐ Include histones, transcription factors, DNA polymerases, RNA polymerases, DNA nucleases etc.
‐ Play crucial roles in many cellular processes (transcription, splicing, replication, DNA repair etc.)
Furey Nat Rev Genet 2012; Schones Nat Rev Genet 2008
Mapping DNA‐binding proteins
‐ Bound genomic locations in a particular cell type cannot be predicted using DNA sequence features alone
‐ Functional assays are necessary
‐ Mapping binding sites is vital to study epigenome and regulatory networks underlying different biological processes
Furey Nat Rev Genet 2012; Park Nat Rev Genet 2009
Chromatin immunoprecipitation (ChIP)
Solomon et al. Cell 1988
Collas Methods Mol Biol 2009
Evolution of ChIP: ChIP‐on‐chip
"Chromatin immunoprecipitation"
[Figure: number of PubMed entries per year, 1998 to 2015; y‐axis 0 to 2000; annotation: ChIP‐on‐chip]
ChIP‐on‐chip
Park Nat Rev Genet 2009; Schones Nat Rev Genet 2008
Collas Methods Mol Biol 2009
Evolution of ChIP: ChIP‐seq
"Chromatin immunoprecipitation"
[Figure: number of PubMed entries per year, 1998 to 2015; y‐axis 0 to 2000; annotation: ChIP‐seq]
ChIP‐seq
Furey Nat Rev Genet 2012
Landt et al. Genome Res 2012
Park Nat Rev Genet 2009
ChIP‐seq: peak identification
‐ Fragments are sequenced at the 5′ end
‐ Locations of reads form two distributions (+ and – strand)
‐ Combined profiles are used to identify peaks
Park Nat Rev Genet 2009
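The combination of the + and ‐ strand distributions can be sketched as follows. This is a minimal illustration, not the procedure of any particular peak caller: it assumes reads are given as 5′‐end positions per strand, and the function names, the `max_shift` parameter and the toy data are all hypothetical.

```python
from collections import Counter

def estimate_shift(plus_starts, minus_starts, max_shift=300):
    """Return the offset d that maximises the number of + strand 5' ends
    at position p that are matched by a - strand 5' end at position p + d."""
    minus_counts = Counter(minus_starts)
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        # Counter returns 0 for positions without reads
        score = sum(minus_counts[p + d] for p in plus_starts)
        if score > best_score:
            best_shift, best_score = d, score
    return best_shift

def combined_profile(plus_starts, minus_starts, shift):
    """Move each strand by half the offset towards the fragment centre
    and count reads per position in the merged profile."""
    half = shift // 2
    profile = Counter(p + half for p in plus_starts)
    profile.update(m - half for m in minus_starts)
    return profile

# Toy data (hypothetical): + strand reads upstream, - strand reads downstream
plus = [100, 101, 102, 99, 100]
minus = [250, 251, 249, 250, 252]

shift = estimate_shift(plus, minus)
profile = combined_profile(plus, minus, shift)
summit = max(profile, key=profile.get)
print(shift, summit)
```

The estimated offset approximates the fragment length, and the summit of the merged profile is the candidate binding position.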
ChIP‐seq: peak identification
Park Nat Rev Genet 2009
ChIP‐seq: types of peaks
‐ Sharp binding sites: insulators
‐ Mixed peaks: transcribed regions
‐ Medium size broad peaks: transcription elongation
‐ Large size broad peaks: transcription repression
ChIP‐on‐chip vs ChIP‐seq
Park Nat Rev Genet 2009
* Now reduced
→ ChIP‐seq: higher resolution, fewer artefacts, greater coverage, larger dynamic range
ChIP‐seq: issues in experimental design
Park Nat Rev Genet 2009
‐ Antibody quality
‐ Sample quantity
‐ Control experiment
‐ Sequencing
‐ Data analysis
‐ Downstream analyses
ChIP‐seq: Antibody quality
Park Nat Rev Genet 2009
‐ Crucial determinant of any ChIP experiment
‐ Sensitivity and specificity of antibody influence enrichment and peak detection
‐ Batch‐to‐batch differences in commercial antibodies
‐ Need for rigorous validation
Landt et al. Genome Res 2012
ChIP‐seq: Antibody quality
ChIP‐seq: Sample quantity
Park Nat Rev Genet 2009
‐ Lower amount of DNA required (ChIP‐on‐chip: > 2 μg; ChIP‐seq: 10‐50 ng)
‐ Fewer rounds of amplification → less PCR bias
‐ Required amount depends on:
  ‐ Abundance of target protein
  ‐ Quality of antibody
ChIP‐seq: Control experiments I
Park Nat Rev Genet 2009
‐ Potential sources of artefacts:
  ‐ Euchromatin vs heterochromatin
  ‐ Repetitive regions
‐ Controls are required
‐ Commonly used controls:
  ‐ Input DNA
  ‐ Mock IP DNA
  ‐ DNA from non‐specific IP
ChIP‐seq: Control experiments II
Furey Nat Rev Genet 2012; Landt et al. Genome Res 2012
‐ Experimental replication:
‐ Minimum of two experimental replicates
‐ Biological rather than technical replicates
‐ For duplicates, either:
  ‐ 80% of the top 40% of targets identified in one replicate must also be identified in the other replicate, OR
  ‐ 75% of the target lists must be in common between both replicates
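The two replicate criteria above can be expressed as a small check. This is a sketch under stated assumptions: targets are matched by identifier rather than by genomic overlap, the lists are sorted best‐first, and the shared fraction is computed relative to the longer list. The function name and all parameter names are hypothetical.

```python
def replicates_concordant(ranked_a, ranked_b, top_fraction=0.40,
                          top_overlap=0.80, list_overlap=0.75):
    """ranked_a, ranked_b: target identifiers sorted from strongest to weakest.
    Returns True if either replicate criterion is met."""
    set_a, set_b = set(ranked_a), set(ranked_b)
    if not set_a or not set_b:
        return False

    # Criterion 1: share of the top 40% of replicate A also found in replicate B
    n_top = max(1, int(len(ranked_a) * top_fraction))
    top_a = ranked_a[:n_top]
    frac_top = sum(t in set_b for t in top_a) / n_top

    # Criterion 2: share of targets in common (assumption: relative to the longer list)
    frac_shared = len(set_a & set_b) / max(len(set_a), len(set_b))

    return frac_top >= top_overlap or frac_shared >= list_overlap

# Toy example (hypothetical identifiers)
rep1 = [f"g{i}" for i in range(10)]
rep2 = ["g0", "g1", "g2", "g3", "x1", "x2", "x3", "x4", "x5", "x6"]
print(replicates_concordant(rep1, rep2))  # top 4 of rep1 are all present in rep2
```

In practice peaks would be matched by coordinate overlap, which this identifier‐based sketch does not attempt.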
ChIP‐seq: Sequencing I
Park Nat Rev Genet 2009
‐ Greater sequencing depth allows detection of more sites with lower levels of enrichment over background
ChIP‐seq: Sequencing II
Park Nat Rev Genet 2009
‐ Multiplexing at the sequencing stage is possible
‐ Paired‐end sequencing for particular applications
ChIP‐seq: Data analysis I
Furey Nat Rev Genet 2012; Park Nat Rev Genet 2009
‐ Sequence alignment (some mismatches allowed)
‐ Identification of enriched regions (peaks)
‐ Data quality assessment
  ‐ Fraction of reads in peaks (FRiP) > 1%
  ‐ Correlation of sequence reads from + and ‐ strands
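The FRiP metric is a simple ratio and can be sketched as below. The sketch assumes a single chromosome, non‐overlapping peaks given as half‐open `(start, end)` intervals, and reads represented by one position each; the function name and toy values are hypothetical.

```python
from bisect import bisect_right

def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak.
    peaks: non-overlapping (start, end) intervals, end exclusive."""
    if not read_positions:
        return 0.0
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    in_peaks = 0
    for pos in read_positions:
        # Index of the last peak starting at or before this position
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions)

# Toy example (hypothetical): 3 of 5 reads lie within a peak
reads = [5, 15, 25, 105, 500]
peak_list = [(10, 30), (100, 110)]
value = frip(reads, peak_list)
print(value, value > 0.01)
```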
Furey Nat Rev Genet 2012
ChIP‐seq: Data analysis II
ChIP‐seq: Downstream analyses
Park Nat Rev Genet 2009
→ Introduction of ChIP‐seq
Johnson et al. Science 2007
‐ Jurkat human T lymphoblast cell line
‐ IP with anti‐human neuron‐restrictive silencer factor (NRSF/REST)
‐ Validated target genes
‐ Known DNA motif bound (NRSE)
‐ Additional predicted target genes
‐ High‐quality monoclonal antibody validated for ChIP
‐ Two replicate IPs (one with and one without PCR amplification)
‐ Illumina sequencing on two IPs and two DNA input controls
Study design
Johnson et al. Science 2007
‐ 2‐5 × 10^6 25‐bp sequences per sample
‐ Removal of reads mapping to multiple sites
‐ Alignment allowing up to 2 mismatches
‐ Development of the ChIPSeq peak locator for peak identification
‐ Peaks required to have ≥ 5‐fold enrichment vs input DNA control and ≥ 13 reads
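The two peak thresholds above can be illustrated with a small filter. This is a sketch only and not the ChIPSeq peak locator itself: scaling by library size and adding a pseudocount to the input are assumptions made here to keep the ratio defined, and the function and parameter names are hypothetical.

```python
def passes_thresholds(chip_reads, input_reads, chip_total, input_total,
                      min_fold=5.0, min_reads=13, pseudocount=1.0):
    """Apply a minimum read count and a minimum fold enrichment over input.
    Assumptions: counts are scaled by library size; a pseudocount avoids
    division by zero when the input has no reads in the region."""
    if chip_reads < min_reads:
        return False
    chip_rate = chip_reads / chip_total
    input_rate = (input_reads + pseudocount) / input_total
    return chip_rate / input_rate >= min_fold

# Toy regions (hypothetical counts, equal library sizes)
total = 1_000_000
print(passes_thresholds(50, 4, total, total))   # enough reads, about 10-fold
print(passes_thresholds(12, 0, total, total))   # too few reads
print(passes_thresholds(20, 9, total, total))   # enough reads, only about 2-fold
```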
Johnson et al. Science 2007
‐ #Reads/peak: from 13 to 6718
‐ Most peaks contain one canonical NRSE binding motif
‐ 83 NRSF‐binding sites previously identified (true positives)
‐ 130 known negatives (true negatives)
‐ 87% sensitivity
‐ 98% specificity
[Figure: ROC curve, sensitivity vs 1 ‐ specificity]
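For reference, sensitivity and specificity follow from the counts of true and false calls. The slide gives only the set sizes (83 positives, 130 negatives) and the resulting percentages, so the individual counts in the example below are hypothetical values chosen to be consistent with those rounded figures.

```python
def sensitivity(tp, fn):
    """Fraction of known positives that were detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of known negatives that were correctly not called."""
    return tn / (tn + fp)

# Hypothetical counts (not stated on the slide) that sum to 83 and 130
tp, fn = 72, 11
tn, fp = 127, 3

sens_pct = round(100 * sensitivity(tp, fn))
spec_pct = round(100 * specificity(tn, fp))
print(sens_pct, spec_pct)
```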
Johnson et al. Science 2007
‐ Based on 771 computationally identified NRSF binding sites that were also positive by ChIP‐seq
‐ Distance < 50 bp in 94% of cases
‐ Based on all computationally identified NRSF binding sites
‐ Virtually all strong canonical NRSF binding sites were detected
→ Introduction of Hi‐C to study the 3D architecture of whole genomes
Lieberman‐Aiden et al. Science 2009
Study design
‐ Karyotypically normal lymphoblastoid cell line (GM06990)
‐ Illumina paired‐end sequencing
Lieberman‐Aiden et al. Science 2009
‐ 8.4 × 10^6 read pairs mapped to human genome
‐ 6.7 × 10^6 read pairs correspond to long‐range contacts (> 20 kb apart or interchromosomal)
‐ Genome‐wide contact matrix M:
  ‐ 1 Mb regions (loci)
  ‐ Matrix entry m_ij: number of ligation products between locus i and locus j
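The construction of the contact matrix can be sketched as binning each read pair into 1 Mb loci and counting. The sketch assumes both read positions are already expressed on a single concatenated genome‐wide coordinate axis; the function name, `n_loci` and the toy read pairs are hypothetical.

```python
def build_contact_matrix(read_pairs, n_loci, bin_size=1_000_000):
    """Return a symmetric n_loci x n_loci matrix in which entry [i][j]
    counts the read pairs linking locus i and locus j."""
    m = [[0] * n_loci for _ in range(n_loci)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        m[i][j] += 1
        if i != j:
            m[j][i] += 1  # keep the matrix symmetric
    return m

# Toy example (hypothetical positions on a 3 Mb axis)
pairs = [(100, 2_500_000), (1_200_000, 2_700_000), (300, 900)]
matrix = build_contact_matrix(pairs, n_loci=3)
for row in matrix:
    print(row)
```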
Lieberman‐Aiden et al. Science 2009
‐ Contact probability: intrachromosomal >> interchromosomal (even at large distances)
‐ Small, gene‐rich chromosomes preferentially interact with each other
→ Chromosome territories
Lieberman‐Aiden et al. Science 2009
→ Introduction of Chem‐seq
Anders et al. Nat Biotechnol 2014
Study design
‐ MM1.S multiple myeloma cell line
‐ Bromodomain inhibitor JQ1, known to bind the BET bromodomain family members BRD2, BRD3 and BRD4
Anders et al. Nat Biotechnol 2014
Bio‐JQ1: biotinylated derivative of JQ1
Only slightly reduced bioactivity on MM1.S cells
Anders et al. Nat Biotechnol 2014
‐ Similar results with Chem‐seq with bio‐JQ1 in vivo and in vitro
‐ Similar profile with ChIP‐seq with BRD2, BRD3 and BRD4
Anders et al. Nat Biotechnol 2014
No significant binding with bio‐JQ1R
Anders et al. Nat Biotechnol 2014
JQ1 occupancy of chromatin is most strongly correlated with that of BRD4 in MM1.S cells
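A comparison of occupancy profiles of this kind can be sketched with a Pearson correlation over per‐region signal values. The sketch is illustrative only: the signal vectors are invented toy numbers, the factor names are neutral placeholders, and nothing here reproduces the analysis or data of Anders et al.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signal vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical signal per genomic region (toy values, not measured data)
small_molecule = [1.0, 2.0, 3.0, 4.0, 5.0]
factors = {
    "factor_A": [1.1, 2.1, 2.9, 4.2, 5.0],
    "factor_B": [2.0, 1.0, 4.0, 3.0, 5.0],
}

# Rank the factors by how closely their profile tracks the small molecule
best = max(factors, key=lambda name: pearson(small_molecule, factors[name]))
print(best)
```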
DNA‐binding proteins
DNA‐binding small molecules
ChIP, ChIP‐on‐chip, ChIP‐seq
Chem‐seq