analysis of massively parallel sequencing data – application of … · 2017. 1. 29. · gordon...
-
www.sourcebioscience.com
Analysis of Massively Parallel Sequencing Data – Application of Illumina Sequencing to
the Genetics of Human Cancers
Gordon Blackshields
Senior Bioinformatician
Source BioScience
-
Next Generation Sequencing Applications
To Cancer Genetics Studies
Introduction
“Next Generation Sequencing” (NGS) on the Illumina platform is suitable for clinical applications that require large amounts of information, accurate quantification and high-sensitivity detection:
– Mutation detection in tumours (from biopsies / circulating tumour cells (CTC))
– Pathogen detection, e.g. organism identification for epidemiological investigations
– Gut microbial flora genomics
– Detection of the presence of antibiotic resistance genes
– Comparison of novel sequences / genes to those in public databases
-
Applications of NGS to Cancer Genetics
Some Commonly Applied Techniques
“Sequencing The Genome”
• Reference alignment, targeted resequencing for polymorphism and mutation discovery
• De novo assembly for characterisation of novel genes, genomes
• Paired-end sequencing highlights larger structural variants (inherited/acquired)
“Sequencing The Transcriptome”
• RNA-Seq allows “absolute” quantification of gene expression across the transcriptome
• No prior knowledge of content needed – quantify expression of ‘unknown’ genes
• Profiling of mRNA, ncRNA, miRNA…
“Sequencing The Cistrome”
• ChIP-Seq allows profiling of cis-acting targets (DNA binding sites) of a trans-acting factor (transcription factor, restriction enzyme, etc.) on a genome scale
• Determine how proteins interact with DNA to regulate gene expression
• Determine how TFs and other proteins influence phenotype-affecting mechanisms
• Similar approach can be used to characterise genomic methylation patterns – “the methylome”
-
Applications of NGS to Cancer Genetics
Levels of information extraction, data integration
[Figure: level of information extraction, rising from Generate to Integrate]
Levels: Generate (10^8 to 10^9 short DNA fragments); Process (de novo assembly / reads mapped to (un)annotated reference sequence); Identify; Analyse; Integrate
Applications: Targeted Resequencing / Variant Detection; ChIP-Seq; RNA-Seq Quantification; RNA-Seq Discovery
Labels: Consensus Sequence; Variant Detection; Overlapping Genes; Enriched regions; Binding Sources; Motif finding; Associated Genes; Density on known exons; Expression levels; Differential Expression; Identify splice-crossing reads; Novel Transcripts; Novel Isoforms; Novel gene models
Associate observed variants with regulation/transcriptional changes; link to external databases
-
Next Generation Sequencing Applications
Human Resequencing and Variant Detection
Reference Assembly, Targeted Resequencing and Variant Detection
Search for alterations at nucleotide level to explain changes in regulation/transcription
Single-Ended (SE) sequencing
• ~85% of complex genome accessible
• Suitable for SNPs, small indels (DIPs)
Paired-Ended (PE) sequencing
• ~99% of complex genome accessible
• Find longer DIPs
• Find larger structural variations
• Span repeat regions
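One way paired-end reads can point to larger structural variations is through the mapped distance between the two mates of a pair. The sketch below is illustrative only and is not taken from any tool named in these slides: the function name, the input layout (pair id, mapped insert size) and the 3-standard-deviation cutoff are all assumptions.

```python
def flag_discordant_pairs(insert_sizes, expected_mean, expected_sd, n_sd=3.0):
    """Flag read pairs whose mapped insert size is far from the library's expectation.

    insert_sizes: iterable of (pair_id, mapped_insert_size_bp) tuples (hypothetical layout).
    Returns a list of (pair_id, mapped_insert_size_bp, label).
    """
    lower = expected_mean - n_sd * expected_sd
    upper = expected_mean + n_sd * expected_sd
    flagged = []
    for pair_id, size in insert_sizes:
        if size > upper:
            # Mates map further apart on the reference than the library allows,
            # which is consistent with sequence deleted in the sample.
            flagged.append((pair_id, size, "possible deletion"))
        elif size < lower:
            # Mates map closer together than expected, which is consistent
            # with extra sequence inserted in the sample.
            flagged.append((pair_id, size, "possible insertion"))
    return flagged


if __name__ == "__main__":
    # Made-up pairs from a library with a ~200 bp insert size.
    pairs = [("p1", 205), ("p2", 198), ("p3", 1450), ("p4", 40)]
    for hit in flag_discordant_pairs(pairs, expected_mean=200, expected_sd=20):
        print(hit)
```

In practice a caller would also require several independent pairs supporting the same breakpoint before reporting a variant; this sketch flags pairs one at a time.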
-
Next Generation Sequencing Applications
Human Resequencing and Variant Detection
2009 Nature Paper
Cytogenetically normal AML genome sequenced (32x)
Comparison with matched normal tissue (14x)
98 full runs on Illumina GA to achieve required depth
Alignment, variant discovery performed by MAQ
97.7% of variants in AML genome also in normal
Further restricted to annotated gene-coding regions
Across all tumour cells: found 10 genes with acquired mutations (8 novel), present in all cells at presentation and relapse
“Our study establishes whole genome sequencing as an unbiased method for discovering initiating mutations in cancer genomes, and for identifying novel genes that may respond to targeted therapies”
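The relationship between sequencing depth and the amount of data required can be sketched with the usual mean-coverage arithmetic, depth = (number of reads × read length) / genome size. The numbers in the example are assumptions for illustration: the slides do not state the read length used, and the 3.1 Gb genome size is an approximation.

```python
import math


def mean_depth(n_reads, read_length_bp, genome_size_bp):
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches target_depth on average."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    GENOME_SIZE = 3_100_000_000   # approximate human genome size (assumption)
    READ_LENGTH = 36              # hypothetical read length, not given in the slides
    for depth in (14, 32):
        print(depth, "x needs about", reads_needed(depth, READ_LENGTH, GENOME_SIZE), "reads")
```

This ignores unmapped, duplicate and low-quality reads, so the real number of reads required would be higher.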
-
Next Generation Sequencing Applications
Polymorphism detections within P53
P53 Variant detection study
• “Guardian of the genome” (Lane, 1992)
• Protects fidelity of DNA replication
• Directs cell arrest/apoptosis when stressed
• Mutated in more than half of human cancers
http://p53.free.fr/
-
Next Generation Sequencing Applications
Polymorphism detections within P53
P53 Variant detection study
• Human TP53 gene located on 17p13.1
• Region sometimes deleted in human cancer
-
Next Generation Sequencing Applications
Polymorphism detections within P53
[Figure labels: 17p13.1; PCR amplification]
Study
• Search for variants on P53 gene in matched tumour samples
• Use gene-specific PCR to amplify exons only, to maximise depth of coverage
-
Next Generation Sequencing Applications
Polymorphism detections within P53
[Figure: Coverage of p53 gene. x-axis: gene position (12000 to 19000); y-axis: coverage per base position (ticks from 5000 to 35000)]
Study
• Use MAQ for alignment, variant discovery against P53 reference gene
• Comparison with results from 454, Sanger
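Coverage per base position, as plotted for the p53 gene, can be computed from read alignments with a simple difference array. This is a minimal sketch under stated assumptions: alignments are given as hypothetical (start, length) pairs in 0-based coordinates with no gaps, which is not how MAQ reports them.

```python
def per_base_coverage(alignments, region_start, region_end):
    """Count how many reads cover each base of the half-open region [start, end).

    alignments: iterable of (start, read_length) tuples, 0-based (hypothetical layout).
    Returns a list with one coverage value per base of the region.
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, read_len in alignments:
        # Clip the read to the region of interest.
        lo = max(start, region_start)
        hi = min(start + read_len, region_end)
        if lo < hi:
            diff[lo - region_start] += 1   # coverage rises where the read begins
            diff[hi - region_start] -= 1   # and falls just past its last base
    coverage = []
    running = 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    # Three made-up reads over a 10-base region.
    print(per_base_coverage([(0, 3), (2, 3), (8, 5)], 0, 10))
```

The difference array keeps the cost proportional to the number of reads plus the region length, which matters at the depths shown in the figure.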
-
Next Generation Sequencing Applications
Polymorphism detections within BRCA1
BRCA1 Variant detection study
• Human tumour suppressor gene
• Primarily expressed in breast tissue
• Helps repair damaged DNA (if possible)
• Mutations to BRCA1 allow uncontrolled replication of damaged cells
-
Next Generation Sequencing Applications
Polymorphism detections within BRCA1
[Figure: analysis workflow]
CASAVA
– Demultiplex (11 samples)
– Map reads to ref (BRCA1)
SAMtools
– Conversion to SAM format
– Conversion to Pileup format
– Consensus/Indel calling
– Filter for variants
Comparison with known variants
Pilot Study
• Search for variants on BRCA1 gene
• Use gene-specific PCR to amplify exons only, to maximise depth of coverage
• Multiplexed – 11 samples loaded into one lane
• Use CASAVA for de-multiplexing, alignment
• Use SAMtools for consensus/indel calling, filtering
• Validation of results against known variants
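The last two workflow steps, filtering candidate variants and comparing them with known variants, can be sketched as below. This does not reproduce SAMtools' own filters: the record layout, the thresholds (minimum depth 20, minimum alternate-allele fraction 0.2) and the example positions are all hypothetical.

```python
def filter_variants(calls, min_depth=20, min_alt_fraction=0.2):
    """Keep calls with enough depth and enough reads supporting the alternate allele.

    calls: list of dicts with keys pos, ref, alt, depth, alt_count (hypothetical layout).
    """
    kept = []
    for call in calls:
        depth = call["depth"]
        if depth == 0 or depth < min_depth:
            continue  # too few reads to trust the call
        if call["alt_count"] / depth < min_alt_fraction:
            continue  # alternate allele seen in too small a share of reads
        kept.append(call)
    return kept


def split_known_novel(calls, known_variants):
    """Partition calls by whether (pos, ref, alt) appears in a set of known variants."""
    known, novel = [], []
    for call in calls:
        key = (call["pos"], call["ref"], call["alt"])
        (known if key in known_variants else novel).append(call)
    return known, novel


if __name__ == "__main__":
    # Made-up calls; positions are not real BRCA1 coordinates.
    calls = [
        {"pos": 100, "ref": "A", "alt": "G", "depth": 500, "alt_count": 240},
        {"pos": 150, "ref": "C", "alt": "T", "depth": 8, "alt_count": 4},
        {"pos": 200, "ref": "G", "alt": "A", "depth": 400, "alt_count": 12},
        {"pos": 300, "ref": "T", "alt": "C", "depth": 300, "alt_count": 150},
    ]
    known, novel = split_known_novel(filter_variants(calls), {(100, "A", "G")})
    print("known:", [c["pos"] for c in known])
    print("novel:", [c["pos"] for c in novel])
```

Keying known variants on (position, reference allele, alternate allele) means a different substitution at a known position is still reported as novel.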
-
Next Generation Sequencing Applications
RNA-Seq: Transcriptome Analysis
RNA-Seq
• Sequence RNA (reverse-transcribed to cDNA)
• Mapped to annotated reference genome (annotated genes, known variants)
• Expression levels deduced from total number of reads that map to exons of a gene
RNA-Seq versus Microarray
• More sensitive to low-abundance transcripts
• “Absolute” gene expression levels detectable – can detect single molecules
• No prior knowledge required of content
• Greater ability to distinguish isoforms
• Ability to determine allelic expression
• Less biased
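Deducing an expression level from the number of reads that map to a gene's exons usually involves normalising for gene length and sequencing depth. The slides do not say which normalisation was used; the sketch below assumes RPKM (reads per kilobase of exon per million mapped reads) as one common choice, and the example numbers are made up.

```python
def rpkm(reads_on_exons, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = 1e9 * reads_on_exons / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return reads_on_exons * 1e9 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical gene: 500 reads on 2 kb of exons in a 10-million-read library.
    print(rpkm(reads_on_exons=500, exon_length_bp=2000, total_mapped_reads=10_000_000))
```

Dividing by exon length stops long genes from looking more highly expressed simply because they offer more positions for reads to land on.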
-
Next Generation Sequencing Applications
RNA-Seq: Transcriptome Analysis
[Figure: analysis pipeline]
BOWTIE – Maps reads to reference genome (hg19)
TOPHAT – Identifies splice sites (known/novel)
CUFFLINKS – Transcript Assembly, Quantification
DESeq – Differential Gene Expression of RNA-Seq data
DAVID – Pathways Analysis of deregulated genes (http://david.abcc.ncifcrf.gov/)
RNA-Seq Study of ovarian cancer cell lines
• Identification of changes in gene expression in cell lines with acquired drug resistance
• Special interest in ncRNA expression data
• Use Bowtie and Tophat to map reads, identify splice sites
• Use Cufflinks to assemble transcripts, calculate abundances
• ~87% of reads mapped to genome
• Use DESeq to perform differential expression tests
• Use DAVID (Database for Annotation, Visualisation and Integrated Discovery, http://david.abcc.ncifcrf.gov/) for pathway analysis
– Found significant representation of cancer pathways and focal adhesion genes
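Before differential expression can be tested, raw counts from different samples have to be put on a common scale. The sketch below illustrates the median-of-ratios idea behind DESeq's size factors in plain Python. It is a simplified illustration, not DESeq's code: the statistical test itself (a negative binomial model) is not reproduced, and the gene names and counts are made up.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw read counts, one per sample.
    Genes with a zero count in any sample are skipped, since their geometric
    mean is zero. At least one gene must remain after this filter.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue
        # Log of the geometric mean of this gene's counts across samples.
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio of its counts to the geometric means.
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts, factors):
    """Divide every count by its sample's size factor."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


if __name__ == "__main__":
    # Sample 2 was sequenced twice as deeply as sample 1 (made-up counts).
    raw = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10]}
    factors = size_factors(raw)
    print(factors)
    print(normalise(raw, factors))
```

Using the median rather than the total read count keeps a handful of very highly expressed genes from dominating the scaling.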
-
Next Generation Sequencing Applications
ChIP-Seq: Genome-wide protein-DNA interactions
ChIP-Seq
• Chromatin immunoprecipitation (ChIP) isolates protein-bound DNA
• Followed by deep sequencing of DNA fragments (Seq)
• Facilitates genome-wide mapping of DNA-protein interactions
• How TFs, other chromatin-associated factors can affect phenotype
• Regulation/Structural Analysis
ChIP-Seq vs. ChIP-chip
• No prior knowledge of content required
• Similar approach can be used to map genomic methylation
-
Next Generation Sequencing Applications
ChIP-Seq: Genome-wide protein-DNA interactions
ChIP-Seq Study of Haematopoietic Stem Cells
• Interest in haematopoiesis and genetic circuitry of blood cell development
• Tal1 – T-cell acute lymphocytic leukaemia protein 1
• TF that controls development and differentiation of Haematopoietic Stem Cells (HSCs)
• Very few target genes had been validated
• ChIP-Seq approach taken to generate a genome-wide catalogue of Tal1 binding events in stem cell line
• Use Illumina BeadStudio ChIP-Seq module to identify peaks (potential chromatin binding sites)
• Followed by in vivo validation (foetal liver, transgenic mice)
• Allows construction of in vivo validated network of 17 factors and respective regulatory elements
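Peak identification, in its simplest form, means finding genomic windows where mapped reads pile up. The sketch below uses fixed windows and a plain count threshold. It is not the algorithm of the BeadStudio ChIP-Seq module, which is not described in these slides; the window size, threshold and read positions are assumptions for illustration.

```python
def call_enriched_windows(read_starts, chrom_length, window=200, min_reads=10):
    """Return (start, end, read_count) for fixed windows holding at least min_reads reads.

    read_starts: iterable of 0-based mapped read start positions on one chromosome.
    Windows are half-open intervals [start, end).
    """
    counts = {}
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            idx = pos // window
            counts[idx] = counts.get(idx, 0) + 1
    peaks = []
    for idx in sorted(counts):
        if counts[idx] >= min_reads:
            start = idx * window
            end = min(start + window, chrom_length)
            peaks.append((start, end, counts[idx]))
    return peaks


if __name__ == "__main__":
    # Made-up read starts on a 2 kb toy chromosome.
    starts = [5, 10, 15, 20, 250, 1000, 1001, 1002]
    print(call_enriched_windows(starts, chrom_length=2000, window=100, min_reads=3))
```

Real peak callers typically compare against a control sample or a background model rather than a fixed threshold, which is why candidate peaks still need validation such as the in vivo step listed above.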
-
Applications of NGS to Cancer Genetics
Levels of information extraction, data integration
[Figure: level of information extraction (Generate, Process, Identify, Analyse, Integrate) across Targeted Resequencing / Variant Detection, ChIP-Seq, RNA-Seq Quantification and RNA-Seq Discovery]
Associate observed variants with regulation/transcriptional changes; link to external databases