lugm-update of the illumina analysis pipeline
Posted on 14-Apr-2017
© 2011 Illumina, Inc. All rights reserved. Illumina, illuminaDx, BeadArray, BeadXpress, cBot, CSPro, DASL, Eco, Genetic Energy, GAIIx, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, Sentrix, Solexa, TruSeq, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Update of the Illumina Analysis Pipeline
顏海威 Henry Yen, Bioinformatics FAS
均泰生物科技有限公司, techsupport@gtbiotech.com.tw
Slide generated from Henry Yen
Course Objectives
By the end of this course, you will be familiar with:
Illumina Data Analysis Overview
The Workflow in MiSeq Reporter
Powerful Annotation Tool - VariantStudio
Illumina iCloud - BaseSpace
Illumina Data Analysis Overview
Data Analysis Workflow
Primary Analysis → Secondary Analysis → Data Visualization
Primary and Secondary Analysis Overview
Analysis Type | Outputs
Sequencing (MCS/NCS/HCS) | Images/TIFF files
Primary Analysis (RTA) | Intensities, base calling
Secondary Analysis (MSR / BaseSpace) | Alignments and variant detection
MiSeq Analysis Workflow: I'm an All-in-One Sequencer
Instrument Control Software (MCS): images and intensities
RTA: base calls & quality scores
MiSeq Reporter: alignment/FASTQ, variants, statistics
Application-specific additional analysis: Resequencing, Amplicon, Enrichment, Small RNA, De novo Assembly, 16S Metagenomics
Limited visualization via HTTP interface
Why We Use MiSeq Reporter
Automatic: starts automatically after sequencing
Simple: start-to-end workflow
Powerful: supports the different analyses required
Friendly: graphical user interface
The Workflow in MiSeq Reporter
Workflows from MiSeq Reporter
Reference-based
  Whole genome: Resequencing, Library QC
  Targeted-Seq, capture-based: Enrichment
  Targeted-Seq, PCR-based: Amplicon, Amplicon-DS, PCR Amplicon, mtDNA
  RNA: Small RNA, Targeted RNA
Non-reference
  Assembly: De novo Assembly
  Taxonomy: Metagenomics
Resequencing Workflows
Resequencing workflow:
• Reads are aligned to the reference genome.
• Variants are called and reported.
• Outputs FASTQ, BAM, VCF and gVCF files.
• Reports on-target rate, coverage and variant summary.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling → Report (with duplicate flagging)
Files: FASTQ, BAM, VCF, PDF
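The slide lists duplicate flagging without describing how MSR performs it. As a rough illustration only, the sketch below follows the general idea behind tools such as Picard MarkDuplicates: reads sharing chromosome, 5' position and strand are treated as PCR duplicates, and the highest-quality read in each group is kept. The `Read` record and `flag_duplicates` function are hypothetical names, not MSR's implementation.

```python
from collections import namedtuple

# Hypothetical minimal read record; a real pipeline takes these fields from BAM records.
Read = namedtuple("Read", ["name", "chrom", "pos5", "strand", "mean_qual"])


def flag_duplicates(reads):
    """Return the names of reads flagged as PCR duplicates.

    Reads that share chromosome, 5' position and strand form one group;
    the read with the highest mean base quality in each group stays unflagged.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos5, read.strand)
        if key not in best or read.mean_qual > best[key].mean_qual:
            best[key] = read
    keep = {read.name for read in best.values()}
    return {read.name for read in reads if read.name not in keep}


reads = [
    Read("r1", "chr1", 100, "+", 30),
    Read("r2", "chr1", 100, "+", 35),  # same start/strand as r1, higher quality
    Read("r3", "chr1", 100, "-", 20),  # opposite strand: not a duplicate of r1/r2
    Read("r4", "chr2", 100, "+", 25),
]
print(flag_duplicates(reads))  # {'r1'}
```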
Library QC Workflows
Library QC workflow:
• Data are analyzed with BWA.
• Reads are aligned to the reference genome.
• No variant calling.
• Outputs FASTQ and BAM files.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Alignment Statistics (with duplicate flagging)
Files: FASTQ, BAM
Enrichment Workflows
Enrichment workflow:
• Analyzes data from probe-captured libraries.
• Reads are aligned to the targeted regions.
• Outputs FASTQ, BAM, VCF and gVCF files.
• Reports aligned rate, on-target rate, coverage and variant summary.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling → Targeted Statistics (with duplicate flagging; statistics are reported over the targeted regions)
Files: FASTQ, BAM, VCF, CSV
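The on-target rate and coverage in the enrichment report can be illustrated with a small sketch. Here a read counts as on target if it overlaps any target interval, and mean coverage is the number of aligned bases inside targets divided by total target length. Both definitions and the function names are assumptions for illustration; MSR's exact definitions may differ. Intervals are half-open and targets are assumed not to overlap each other.

```python
def on_target_rate(reads, targets):
    """Fraction of aligned reads that overlap at least one target interval.

    reads and targets are (chrom, start, end) tuples with half-open coordinates.
    """
    if not reads:
        return 0.0
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = sum(
        1
        for chrom, start, end in reads
        if any(start < t_end and t_start < end for t_start, t_end in by_chrom.get(chrom, []))
    )
    return hits / len(reads)


def mean_target_coverage(reads, targets):
    """Aligned bases falling inside targets divided by total target length."""
    target_bases = sum(end - start for _, start, end in targets)
    if target_bases == 0:
        return 0.0
    covered = 0
    for t_chrom, t_start, t_end in targets:
        for r_chrom, r_start, r_end in reads:
            if r_chrom == t_chrom:
                # Length of the intersection; 0 when the read misses the target.
                covered += max(0, min(t_end, r_end) - max(t_start, r_start))
    return covered / target_bases


targets = [("chr1", 100, 200)]
reads = [("chr1", 150, 250), ("chr1", 0, 50), ("chr1", 100, 200)]
print(on_target_rate(reads, targets))        # 2 of 3 reads on target
print(mean_target_coverage(reads, targets))  # 1.5
```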
Amplicon Workflows
Amplicon workflow:
• Analyzes data from short-range PCR.
• Reads are aligned to the targeted regions.
• Custom targeted designs from Illumina (TruSeq Amplicon).
• Outputs FASTQ, BAM, VCF and gVCF files.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling (over the targeted regions)
Files: FASTQ, BAM, VCF, Excel; results can be viewed in Amplicon Viewer
Amplicon-DS Workflows
Amplicon-DS workflow:
• Analyzes data from TruSight Tumor.
• Variants are checked on both strands.
• Filters false-positive variants in FFPE samples.
• Outputs FASTQ, BAM, VCF and gVCF files.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling (somatic) → Variant Filtering (over the targeted regions)
Files: FASTQ, BAM, VCF
Two manifest files: 1. downstream locus-specific oligos (DLSO), 2. upstream locus-specific oligos (ULSO)
DNA Deamination Bias Correction
The Amplicon Double-Stranded (Amplicon-DS) workflow can remove the C -> T DNA deamination bias of FFPE samples.
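The double-strand check can be sketched as follows, assuming each of the two manifests/pools reads a different strand: a true variant should be supported on both strands, while a deamination artifact typically appears on only one. The data layout, the `min_af` threshold and the function name are hypothetical; this is not the MSR Amplicon-DS algorithm.

```python
def double_strand_filter(calls, min_af=0.05):
    """Split variant calls into (kept, filtered) lists of keys.

    calls maps a variant key to {"+": (alt_reads, depth), "-": (alt_reads, depth)},
    one entry per strand-specific amplicon pool. A variant is kept only if its
    allele frequency reaches min_af on BOTH strands; support on one strand only
    (the typical signature of a C>T deamination artifact) is filtered out.
    """
    kept, filtered = [], []
    for key, strands in calls.items():
        afs = []
        for strand in ("+", "-"):
            alt, depth = strands.get(strand, (0, 0))
            afs.append(alt / depth if depth else 0.0)
        if all(af >= min_af for af in afs):
            kept.append(key)
        else:
            filtered.append(key)
    return kept, filtered


calls = {
    "var1 A>T": {"+": (40, 200), "-": (38, 190)},  # ~20% on both strands
    "var2 C>T": {"+": (30, 200), "-": (1, 210)},   # supported on one strand only
}
print(double_strand_filter(calls))  # (['var1 A>T'], ['var2 C>T'])
```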
PCR Amplicon Workflows
PCR Amplicon workflow:
• Analyzes data from long-range PCR.
• Reads are aligned to the targeted regions.
• Targets are designed by the customer.
• Outputs FASTQ, BAM, VCF and gVCF files.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Indel Realignment → Bin / Sort → Variant Calling (with duplicate flagging, over the targeted regions)
Files: FASTQ, BAM, VCF
mtDNA Workflows
mtDNA workflow:
• Analyzes data for forensic applications.
• Reads are aligned to the rCRS (revised Cambridge Reference Sequence).
• Outputs FASTQ, BAM, a viewer file and an Excel file.
• Can be used to trace maternal lineage.
Steps: Adapter Masking → Reads Demultiplexing → Alignment with rCRS → Bin / Sort → Display in mtDNA Viewer
Files: FASTQ, BAM, Excel, viewer file
Small RNA Workflows
Small RNA workflow:
• Data are analyzed with Bowtie.
• Reads are aligned to miRBase.
• No variant calling.
• Outputs FASTQ, BAM, a pie chart and read counts per miRNA.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Bin / Sort → Read Counting
Files: FASTQ, BAM, TXT
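The per-miRNA read counts and the pie-chart percentages can be derived from read assignments in a few lines. This sketch assumes each read has already been assigned to one miRBase name (or `None` if unaligned); the function name is hypothetical and MSR's counting rules for multi-mapping reads are not described on the slide.

```python
from collections import Counter


def mirna_counts(assignments):
    """Count reads per miRNA and the percentage of each (e.g. for a pie chart).

    assignments holds one entry per read: the miRBase name the read aligned to,
    or None if the read did not align.
    """
    counts = Counter(name for name in assignments if name is not None)
    total = sum(counts.values())
    percent = {name: 100.0 * n / total for name, n in counts.items()} if total else {}
    return counts, percent


counts, percent = mirna_counts(["hsa-miR-21-5p", "hsa-let-7a-5p", "hsa-miR-21-5p", None])
print(counts["hsa-miR-21-5p"], round(percent["hsa-miR-21-5p"], 1))  # 2 66.7
```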
Targeted RNA Workflows
Targeted RNA workflow:
• Reads are aligned against the custom manifest file (banded Smith-Waterman).
• Reports relative expression of genes and isoforms across several samples.
• Outputs FASTQ, BAM and an HTML report.
Steps: Adapter Masking → Reads Demultiplexing → Alignment → Bin / Sort → Differential Expression Analysis
Files: FASTQ, BAM, HTML
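Banded Smith-Waterman restricts the local-alignment dynamic-programming matrix to a diagonal band of width `band`, which is cheap when a read is expected to align near a known offset in a short manifest target. The linear gap penalty and the default `band`, `match`, `mismatch` and `gap` values below are assumptions, not MSR's parameters, and the function returns only the best local score.

```python
def banded_smith_waterman(a, b, band=3, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a vs b, computing only cells with |i - j| <= band.

    Cells outside the band are left at 0, so the result is never higher than
    the unbanded Smith-Waterman score.
    """
    n, m = len(a), len(b)
    prev = [0] * (m + 1)  # row i - 1 of the score matrix
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(max(1, i - band), min(m, i + band) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = cur[j - 1] + gap
            cur[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, cur[j])
        prev = cur
    return best


print(banded_smith_waterman("ACGTTT", "GGGACG", band=0))  # 2: only the main diagonal
print(banded_smith_waterman("ACGTTT", "GGGACG", band=3))  # 6: "ACG" found at offset 3
```

The band width is the design trade-off: a narrow band is fast but misses alignments that sit far from the expected diagonal, as the two calls above show.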
De novo Assembly Workflows
De novo Assembly workflow:
• Data are assembled with Velvet.
• Assembles small (< 20 Mb) genomes from reads, without the use of a genomic reference.
• Outputs FASTQ, FASTA and a dot plot.
Steps: Adapter Masking → Reads Demultiplexing → Assembly → Indel Realignment → Dot Plot
Files: FASTQ, FASTA
Metagenomics Workflows
Metagenomics workflow:
• Bacterial population analysis based on 16S rRNA amplicons.
• Outputs FASTQ, FASTA and a pie chart.
Steps: Adapter Masking → Reads Demultiplexing → Read Classification → Current Taxonomy → Pie Chart
Files: FASTQ, FASTA
Uses the Greengenes database 13.5 (May 2013) to perform taxonomic classification:
• http://greengenes.lbl.gov/
• Illumina-curated version
• Entries with 16S length < 1250 bp are filtered out
• Entries with incomplete annotation are filtered out
Uses a Bayesian classification method to assign taxonomies:
• RDP Naïve Bayesian Classifier (http://dx.doi.org/10.1128%2FAEM.00062-07)
• Short sub-sequences are extracted from each read and compared to the database by the classifier
• Uses full-length Illumina paired-end reads
• Classification down to genus/species level
16S metagenomics in MiSeq Reporter 2.4
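The "short sub-sequences compared to the database" step can be sketched in the style of the RDP Naïve Bayesian Classifier: every k-mer ("word") of the read is scored against each genus with the word-probability estimate P(w|G) = (m(w,G) + P_w) / (M_G + 1), where P_w = (n(w) + 0.5) / (N + 1), n(w) is the number of reference sequences containing w, N is the number of reference sequences, m(w,G) is the number of sequences of genus G containing w and M_G is the number of sequences in G. This is a minimal sketch: the class name is hypothetical, the bootstrap confidence step of the real classifier is omitted, and the tiny reference set is invented for the example.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Set of distinct k-mers (words) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesTaxonomy:
    def __init__(self, references, k=8):
        """references: list of (sequence, genus) pairs."""
        self.k = k
        self.n_seqs = len(references)                              # N
        self.word_seqs = defaultdict(int)                          # n(w)
        self.genus_size = defaultdict(int)                         # M_G
        self.genus_word = defaultdict(lambda: defaultdict(int))    # m(w, G)
        for seq, genus in references:
            self.genus_size[genus] += 1
            for w in kmers(seq, k):
                self.word_seqs[w] += 1
                self.genus_word[genus][w] += 1

    def classify(self, read):
        """Return the genus with the highest summed log word probability."""
        words = kmers(read, self.k)
        best, best_score = None, -math.inf
        for genus, size in self.genus_size.items():
            score = 0.0
            for w in words:
                prior = (self.word_seqs.get(w, 0) + 0.5) / (self.n_seqs + 1)
                score += math.log((self.genus_word[genus].get(w, 0) + prior) / (size + 1))
            if score > best_score:
                best, best_score = genus, score
        return best


refs = [("ACGTACGTAAAA", "GenusA"), ("ACGTACGTAAAC", "GenusA"), ("TTGGCCTTGGCC", "GenusB")]
clf = NaiveBayesTaxonomy(refs, k=4)  # small k only because the toy sequences are short
print(clf.classify("ACGTACGTAA"), clf.classify("TTGGCCTTGG"))  # GenusA GenusB
```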
New HTML Output in Metagenomics Workflow
Top 20 classification results
Ordered by taxonomic level
Read Stitch in MiSeq Reporter
Figure: Read 1 and Read 2 overlapping by ≥ 10 bp are merged into one stitched read.
MiSeq Reporter can stitch overlapping paired-end (PE) reads into a single read.
• Read 1 and Read 2 must overlap by at least 10 bp, with a base match score ≥ 0.9 over the overlap.
• Base Match Score = 1 - (base mismatch rate)
• PE reads that cannot be stitched are written to the FASTQ file as two single reads.
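The stitching rule above can be sketched as: reverse-complement Read 2, try candidate overlaps from longest to shortest, and accept the first one that is at least 10 bp long with a base match score of at least 0.9. Keeping Read 1's base at mismatched positions and ignoring read-through (inserts shorter than the read length) are simplifications of this sketch, not documented MSR behaviour; the function names are hypothetical.

```python
COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]


def stitch(read1, read2, min_overlap=10, min_score=0.9):
    """Merge a read pair into one read, or return None if it cannot be stitched."""
    r2 = revcomp(read2)  # Read 2 is sequenced from the opposite strand
    # Try the longest overlap first: the suffix of read1 against the prefix of r2.
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-olen:], r2[:olen]
        mismatches = sum(x != y for x, y in zip(tail, head))
        if 1 - mismatches / olen >= min_score:  # Base Match Score threshold
            return read1 + r2[olen:]  # overlap bases are taken from read1
    return None  # not stitchable: keep as two single reads


fragment = "ACGTTGCAAGTCCATGGACTTAGCCGATAT"  # 30 bp insert
read1 = fragment[:20]
read2 = revcomp(fragment[10:])  # reads overlap by exactly 10 bp
print(stitch(read1, read2) == fragment)  # True
```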
Powerful Annotation Tool - VariantStudio
Illumina VariantStudio: Intuitive analysis and interpretation
Import Data → Annotate → Filter → Classify → Report → Insight
• Intuitive user interface
• Rich annotations
• Flexible and comprehensive set of filters
• Streamlined variant classification
• Easy and customizable report generation
Illumina VariantStudio Workflow: Data in, biological knowledge out
Import VCF or gVCF files → Illumina VariantStudio Desktop Client (with the VariantStudio Annotation Database) → Export report of interpreted variants
Annotation & Filtering: Leveraging a broad range of annotation sources (e.g., NHLBI Exome Variant Server) to enrich data with biological context
1,000,000s of detected variants → 10,000s of coding variants → 100s of deleterious variants → a few causal variants
From big data to a short list that is easy to validate
Clinical Panels and VariantStudio: Streamlined workflow from sample to report
Align + Call Variants → Annotate → Filter → Classify → Generate Report
Easy! Accurate! Rapid!
Illumina iCloud - BaseSpace
The Illumina Analysis iCloud: BaseSpace
BaseSpace Creates a Sequencing Ecosystem: Accelerates Analysis and Sharing of Genomic Data
Figure: Electronic Medical Record (Medical History, Drugs & Immunization, Patient Schedule, Reference Content, Lab Data, Genomic Data, Diagnostic Images, Scanned Charts), App Space, Public Databases
Runs and Projects
• Run data is automatically sent to Projects in BaseSpace.
• Runs and Projects have separate permissions.
• Core labs will be able to transfer ownership of a project.
Enrichment Apps Released on BaseSpace Now: Push-Button, Step-by-Step App Analysis
BWA Enrichment (ILLUMINA, INC.)
The core algorithms in the BWA Enrichment workflow are the BWA Genome Alignment Software and the GATK Variant Caller.
Isaac Enrichment (ILLUMINA, INC.)
The core algorithms in the Isaac Enrichment workflow are the Isaac Genome Alignment Software and the Isaac Variant Caller.
• Human hg19 only
• Read length of at least 32 bp
• Supports paired-end runs
Free
Resequencing Analysis Apps on BaseSpace: Push-Button, Step-by-Step App Analysis
BWA Whole Genome Sequencing (ILLUMINA, INC.), Free
BWA/GATK Whole Genome Sequencing processes whole-genome sequencing data using BWA for alignment and GATK for variant detection.
Isaac Whole Genome Sequencing v2 (ILLUMINA, INC.), Free
The Isaac Whole Genome Sequencing workflow performs read mapping using Isaac Genome Alignment Software and Isaac Variant Detection (SNVs, small indels, copy number anomalies and structural variations).
HiSeq Isaac Human WGS Workflow (ILLUMINA, INC.), Free
Isaac Genome Alignment Software and Isaac Variant Caller for human samples.
Whole Genome Analysis Apps on BaseSpace: Push-Button, Step-by-Step App Analysis
Isaac & BWA Whole Genome Sequencing (ILLUMINA, INC.)
• Reference genomes for about 12 species are available for alignment
• Read length 21-150 bp (Isaac: 35-150 bp)
• Supports paired-end runs
• Does not support mate-pair reads
• Reports detected CNVs and structural variants in VCF files
Tumor/Normal Paired Analysis Apps on BaseSpace: Push-Button, Step-by-Step App Analysis
Tumor Normal (ILLUMINA, INC.)
The Tumor/Normal Sequencing App is designed to detect somatic variants from a tumor and matched normal sample pair.
• Supports human hg19 only
• Read length 50-150 bp
• Supports paired-end runs
• Recommended coverage: 40X for the normal sample and 80X for the tumor
• Detects somatic mutations in the tumor
Free
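To see what the recommended 40X/80X depths imply for sequencing output, the standard relation depth = N × L / G (N reads of length L over a genome of size G) can be rearranged for N. The 3 Gb genome size is a rough round figure assumed for illustration, not the exact hg19 length, and the estimate ignores duplicates, unmapped reads and uneven coverage, so real runs need more reads than this.

```python
import math


def reads_needed(target_depth, genome_bp, read_len, paired=True):
    """Reads (pairs, if paired) needed for a mean depth, from depth = N * L / G."""
    bases_per_read = read_len * (2 if paired else 1)
    return math.ceil(target_depth * genome_bp / bases_per_read)


GENOME_BP = 3_000_000_000  # rough human genome size; an assumption, not the exact hg19 length
for sample, depth in (("normal", 40), ("tumor", 80)):
    print(sample, reads_needed(depth, GENOME_BP, 150))  # 2 x 150 bp paired-end reads
# normal 400000000
# tumor 800000000
```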
16S Metagenomics Apps Released on BaseSpace Now: Push-Button, Step-by-Step App Analysis
16S Metagenomics (ILLUMINA, INC.), Free
The 16S Metagenomics app performs taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the Greengenes taxonomic database.
De novo Assembly Apps in BaseSpace: Push-Button, Step-by-Step App Analysis
Align, assemble & analyze reads (DNASTAR, INC.)
DNASTAR software for comprehensive next-gen sequence assembly and analysis.
Assemble bacteria de novo, FREE (DNASTAR, INC.)
DNASTAR SeqMan NGen allows you to perform de novo assembly of bacterial genome sequences.
De novo Assembly Apps in BaseSpace: Push-Button, Step-by-Step App Analysis
SPAdes (ALGORITHMIC BIOLOGY LAB), Free
SPAdes 3.0 (St. Petersburg Genome Assembler) is intended for both standard isolates and single-cell MDA bacterial assemblies.
BayesHammer + SPAdes
• BayesHammer: read error correction tool, which works well on both single-cell and standard data sets.
• SPAdes: iterative short-read genome assembly module; by default it iterates consecutively through a set of k-mer length values chosen according to the read length.
• Supports MDA (multiple displacement amplification) single-cell bacterial assemblies.
• Supports paired-end reads, mate-pairs and unpaired reads.
The de Bruijn Graph Algorithm
You should set the k-mer length in your assemblies.
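A de Bruijn graph assembler breaks reads into k-mers, uses each (k-1)-mer as a node, and adds an edge from each k-mer's prefix to its suffix; contigs come from walking paths through the graph. The sketch below is a simplified illustration, not Velvet's or SPAdes' implementation: it only follows nodes with a single distinct successor and ignores in-degree, coverage and error correction. It shows why k matters: k must be long enough that repeats do not collapse into branching nodes, yet short enough that overlapping reads still share k-mers.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while the current node has one distinct successor."""
    contig, node, seen = start, start, {start}
    while len(set(graph.get(node, []))) == 1:
        node = graph[node][0]
        if node in seen:  # stop at cycles
            break
        contig += node[-1]  # each step adds one new base
        seen.add(node)
    return contig


graph = de_bruijn(["ACGTCA", "GTCAGG"], k=4)  # two overlapping reads
print(walk_unambiguous(graph, "ACG"))  # ACGTCAGG
```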
New RNA-seq End-to-End Analysis Apps in “BaseSpace”
Software: TopHat2 v2.0.7
Aligner: Bowtie 0.12.9
Assembly & Gene Expression: Cufflinks 2.1.1
Variant Caller: Isaac Variant Caller 2.0.5
Alignment Statistics: Picard tools 1.72
What can the apps do?
A. Alignment to the hg19 human genome
B. FPKM values for genes or transcripts
C. Splice junction & fusion gene detection
D. cSNP finding
E. Differentially expressed gene discovery
Apps: TopHat Alignment; Cufflinks Assembly & DE
Free
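The FPKM values reported by the Cufflinks app follow the definition Fragments Per Kilobase of transcript per Million mapped fragments: FPKM = fragments × 10^9 / (feature length in bp × total mapped fragments). The function below computes only this plain definition; Cufflinks itself estimates fragment counts across isoforms and uses effective transcript lengths, so its numbers will not match this sketch exactly.

```python
def fpkm(fragments, length_bp, total_mapped):
    """FPKM = fragments * 10^9 / (feature length in bp * total mapped fragments)."""
    return fragments * 1e9 / (length_bp * total_mapped)


# 500 fragments on a 2 kb transcript in a library of 10 million mapped fragments
print(fpkm(500, 2000, 10_000_000))  # 25.0
```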
New RNA-seq End-to-End Analysis Apps in “BaseSpace”
• Supports 3 species (human, mouse, rat)
• Can call gene fusions
• Can only trim TruSeq adapters
Biological Interpretation for RNA-seq Data in BaseSpace
iPathwayGuide (ADVAITA BIO), Free; supports human datasets only
An extension of the Cufflinks Assembly & DE workflow, iPathwayGuide performs the following analyses:
• DE gene analysis
• Gene Ontology analysis for Biological Processes, Molecular Functions, and Cellular Components
• Pathway analysis with Impact Analysis modeled on KEGG pathways
• Coherent cascade analysis on pathways
• Downstream gene perturbation analysis
• Drug interaction analysis
• Disease analysis based on enrichment
Overview the Core Apps for BaseSpace
BWA Enrichment
BWA Whole Genome Sequencing
Tumor Normal Paired
TopHat Alignment
Cufflinks Assembly & DE
BaseSpace Onsite System
• Easy to use, from sample to answer
• Secure, safe and local environment
• Push-button data processing
• Two 6-core CPUs with 128 GB RAM
• Currently supports LIMS for the NextSeq 500 only (HiSeq and MiSeq support planned)
Analyses: RNA-seq, exome-seq, whole-genome analysis, tumor/normal paired analysis
Summary
Workflow | MSR Local Version | BaseSpace Version
Amplicon-DS | 2.4 | N/A
Assembly | 2.4 | 2.2
Enrichment | 2.4 | 2.2
Generate FASTQ | 2.4 | 2.2
Library QC | 2.4 | 2.2
Metagenomics | 2.4 | 2.2
PCR Amplicon | 2.4 | 2.2
Resequencing | 2.4 | 2.2
Small RNA | 2.4 | 2.2
Targeted RNA | 2.4 | N/A
TruSeq Amplicon | 2.4 | 2.2
BaseSpace Dual Mode Replicates Analysis Locally on MiSeq
• Selectable option in MCS
• Allows customers to compare and evaluate MSR Local vs. BaseSpace
• Retains local copy of all files for customers reluctant to rely on 100% remote storage
Questions?
...or tired?