Introduction to Next-Generation Sequencing
Associate Prof. Sunchai Payungporn, Ph.D.
Department of Biochemistry
Faculty of Medicine
Chulalongkorn University
E-mail: [email protected]
Next-Generation Sequencing (NGS)
Highly parallel DNA sequencing technologies that produce millions of reads at low cost and in a short time
http://upload.wikimedia.org/wikipedia/commons/2/2e/Mapping_Reads.png
First generation sequencing VS Next-generation sequencing
http://cmr.asm.org/content/29/4/837/F5.expansion.html
Next-generation sequencing platforms based on company
https://www.youtube.com/watch?v=fCd6B5HRaZ8
https://www.youtube.com/watch?v=zjEQPGDx-J4
https://www.youtube.com/watch?v=DyijNS0LWBY
https://www.youtube.com/watch?v=v8p4ph2MAvI
https://www.youtube.com/watch?v=E9-Rm5AoZGw
http://en.annoroad.com/news/company/51.html
https://www.youtube.com/watch?v=fCd6B5HRaZ8
http://www.bgi-agro.com/en/technology/41
https://www.youtube.com/watch?v=zjEQPGDx-J4
http://www.medsantek.com.tr/index_en.php
Ion Torrent-PGM, Ion Torrent-Proton II, Ion S5 / S5 XL System
https://www.youtube.com/watch?v=DyijNS0LWBY
https://www.pacb.com/applications/whole-genome-sequencing/human/
https://www.youtube.com/watch?v=v8p4ph2MAvI
https://mms.businesswire.com/media/20180319006158/en/646842/5/nanopore-product-family.jpg?download=1
https://www.youtube.com/watch?v=E9-Rm5AoZGw
Comparisons among different NGS platforms
https://twitter.com/albertvilella/status/946101154005639170
Summary of different NGS platforms
Platform | Advantages | Disadvantages
Illumina | High data output; high accuracy | Short reads
BGI-SEQ | High data output; low cost per run | Short reads
Ion Torrent | Very fast run | Errors at repeated bases (homopolymers)
Pacific Biosciences | Very long reads | High error rate
Oxford Nanopore | Very long reads; portable | Fairly high error rate
NGS terms: types of sequencing
Terms | Description
Amplicon Sequencing | High-throughput sequencing of DNA fragments obtained by conventional PCR.
De Novo Sequencing | Sequencing of genetic material with no reference sequence available.
Re-Sequencing | Sequencing of genetic material for which a reference sequence is available.
Exome Sequencing | Sequencing of the parts of the genome made up of exons.
RNA-Seq | Sequencing of total RNA.
ChIP-Seq | Sequencing used to detect transcription factor binding sites and histone modifications.
NGS terms: sample processing
Terms | Description
Sample Enrichment | Preparation of a sample so that it contains the maximum amount of the genetic material in question.
Fragmentation | Splitting of genetic material into fragments of the desired size, either mechanically (nebulisation, sonication) or enzymatically.
Library | A set of nucleic acid fragments that has undergone all processing steps and is ready for sequencing.
Multiplex | A library containing several samples, each labelled with a barcode.
Barcode or index | A short unique sequence that identifies the individual samples pooled into a single library.
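As a rough illustration of how barcodes let pooled samples be separated again, the sketch below assigns each read to a sample by matching its first bases against a barcode table. The barcode sequences, sample names, and the `demultiplex` helper are hypothetical; it also assumes in-line barcodes and exact matching, whereas real instruments often read the index separately and tolerate mismatches.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table (illustrative sequences only).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by the barcode at their 5' end (exact match only)."""
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose tag is not in the table go to an "undetermined" bin.
        sample = barcodes.get(tag, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAAAAA", "NNNNGGGGGG"]
result = demultiplex(pooled, BARCODES)
for sample, inserts in result.items():
    print(sample, inserts)
```

The barcode is stripped before the read is stored, so downstream steps see only the sample-derived sequence.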
Library preparation
[Figure: library preparation workflows for whole genome sequencing, target amplicon sequencing, and RNA sequencing, each passing through size selection and adaptor ligation before DNA sequencing]
From: "Application of next-generation sequencing technologies in virology", Journal of General Virology (2012), 93, 1853–1868
Multiplexing and barcoding
https://www.illumina.com/content/dam/illumina-marketing/images/technology/multiplexing-overview-figure.gif
NGS terms: methods of reading
Terms | Description
Single-End Reads | A method of reading in which the fragment is read from one end only during sequencing.
Paired-End Reads | A method of reading in which the fragment is first read from one end and then from the other.
Mate Pair-End Reads | A sample preparation strategy in which a long fragment (thousands of bases) is circularized using labelled adapters; the circular molecule is then fragmented, and only the fragments containing the labelled adapters are sequenced.
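To make the paired-end idea concrete, here is a minimal sketch that simulates the two reads from a single fragment. It assumes the second read is taken from the opposite strand, so it appears as the reverse complement of the fragment's far end; the fragment sequence and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_len):
    """Read 1 starts at one end; read 2 starts at the other end, on the opposite strand."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment)[:read_len]
    return read1, read2


fragment = "AACCGGTTAGCA"  # hypothetical fragment
r1, r2 = paired_end_reads(fragment, 4)
print(r1, r2)
```

Reverse-complementing read 2 recovers the last bases of the fragment, which is why aligners can use the pair to bracket the unsequenced middle.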
Paired-End VS Mate Pair-End
Paired-End + Mate Pair-End
NGS primary analysis
FASTQ files
NGS terms: primary analysis results
Terms | Description
Output Capacity | The number of bases read in a sequencing run or experiment, typically measured in thousands to trillions of bases (kb, Mb, Gb, Tb).
Read | The data output from the analysis of a single fragment (a sequence).
Read Accuracy | Indicates the occurrence of errors (in %) after primary analysis.
Read Length | The number of bases read per fragment, or the maximum fragment length that can be sequenced in one read (given in bases).
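Output capacity and read length are linked by simple arithmetic: total bases is roughly the number of reads times the read length. The sketch below works this through for a made-up run; the fragment count, read length, and helper names are assumptions chosen only for illustration.

```python
def output_bases(n_fragments, read_length, paired_end=False):
    """Total bases read: fragments x reads per fragment x bases per read."""
    reads_per_fragment = 2 if paired_end else 1
    return n_fragments * reads_per_fragment * read_length


def format_bases(n):
    """Express a base count in the largest fitting unit (kb, Mb, Gb, Tb)."""
    for unit, scale in (("Tb", 10**12), ("Gb", 10**9), ("Mb", 10**6), ("kb", 10**3)):
        if n >= scale:
            return f"{n / scale:.2f} {unit}"
    return f"{n} bases"


# Hypothetical run: 25 million fragments, paired-end, 150 bases per read.
total = output_bases(25_000_000, 150, paired_end=True)
print(format_bases(total))
```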
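Secondary analysis: FASTQ format
Line 1: @ followed by the read description
Line 2: the read sequence (bases, as in FASTA format)
Line 3: + (separator)
Line 4: the quality string, one ASCII character per base
Phred quality score (Q) = ASCII value - 33 (for example, the character "?" has ASCII value 63, so Q = 30)
Description line: @Instrument:Run ID:Flowcell ID:Lane:Tile:X:Y ReadNum:FilterFlag:0:SampleNumber
ASCII: American Standard Code for Information Interchange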
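The relationship Q = ASCII value - 33 can be checked with a few lines of code. The sketch below parses one four-line FASTQ record and decodes its quality string; the record itself is made up, and the `error_probability` helper applies the standard Phred definition P = 10^(-Q/10), which is not stated on the slide.

```python
def parse_fastq_record(lines):
    """Split a four-line FASTQ record and decode Phred+33 quality scores."""
    header, sequence, separator, quality = lines
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(ch) - 33 for ch in quality]  # Q = ASCII value - 33
    return header[1:], sequence, scores


def error_probability(q):
    """Standard Phred definition: probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Hypothetical record, not taken from a real run.
record = ["@read1 example", "ACGT", "+", "?I5!"]
name, seq, scores = parse_fastq_record(record)
print(name, seq, scores)
print(error_probability(30))
```

Under this definition Q30 corresponds to an expected error rate of 1 in 1,000 base calls.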
NGS secondary analysis
FASTQ files
Quality trimming: trim bases with Q<30
Adaptor trimming: trim adaptor sequences
Analysis specified by applications
• Map reads to reference sequences (SAM, BAM files)
• Alignment -> variant calling (VCF file)
• De novo assembly (Fragment -> Contig -> Scaffold)
• Read identification -> taxonomic classification (OTUs)
• Count reads -> expression levels / relative abundance
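The two trimming steps can be sketched as follows. This is a simplified illustration, assuming Phred+33 quality encoding, trimming only from the 3' end, and exact adaptor matching; the adaptor sequence and read are placeholders, and dedicated trimming tools handle partial and mismatched adaptors that this sketch ignores.

```python
def adaptor_trim(sequence, quality, adaptor):
    """Cut the read at the first exact occurrence of the adaptor, if any."""
    pos = sequence.find(adaptor)
    if pos == -1:
        return sequence, quality
    return sequence[:pos], quality[:pos]


def quality_trim(sequence, quality, min_q=30, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


ADAPTOR = "AGATCGG"  # placeholder; use the adaptor sequence of your library kit
read_seq = "ACGTACGTAGATCGGTT"
read_qual = "IIIIII5#IIIIIIIII"  # I = Q40, 5 = Q20, # = Q2

trimmed_seq, trimmed_qual = adaptor_trim(read_seq, read_qual, ADAPTOR)
trimmed_seq, trimmed_qual = quality_trim(trimmed_seq, trimmed_qual)
print(trimmed_seq, trimmed_qual)
```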
NGS terms: depth and coverage
• Read Depth: for DNA, the number of times a nucleotide is read; for RNA, the total number of reads per sample.
• Coverage: the proportion of an analysed sequence that is covered by reads, relative to its length, usually expressed as a percentage; the term is sometimes also used for the read depth.
Example: average read depth = 2.80
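The difference between depth and coverage is easy to see on a toy example. The sketch below counts how many aligned reads overlap each reference position, then derives the average depth and the percentage of positions covered at least once. The reference length and read intervals are invented and do not reproduce the slide's example value.

```python
def depth_per_position(ref_len, alignments):
    """Count reads over each position; alignments are 0-based half-open (start, end)."""
    depth = [0] * ref_len
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, ref_len)):
            depth[pos] += 1
    return depth


# Hypothetical alignments on a 10-base reference.
alignments = [(0, 5), (2, 8), (4, 8)]
depth = depth_per_position(10, alignments)

average_depth = sum(depth) / len(depth)
coverage_pct = 100 * sum(1 for d in depth if d > 0) / len(depth)
print(depth, average_depth, coverage_pct)
```

Here the last two positions are never read, so coverage stays below 100% even though some positions are read three times.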
NGS terms: Map reads to reference sequences
Terms | Description
SAM File | A file containing the alignment of fragments together with quality indicators and possibly other information.
BAM File | The binary version of a SAM file, a typical output of the secondary phase of data analysis.
VCF File | A file containing information about the sequence variants identified, a typical output of the secondary phase of data analysis.
Variant Calling | The process of detecting sequence variants in the sequences obtained.
SNP Calling | The process of detecting SNPs in the sequences obtained.
SNP | Single-Nucleotide Polymorphism = a sequence difference at a single base position.
InDel | Insertion/deletion = a sequence difference that can cause a reading frame shift.
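The SNP and InDel definitions can be turned into a small classifier. The sketch below compares a reference allele with an alternative allele (the REF and ALT columns of a VCF record) and labels the variant; it assumes the variant lies in a coding region, where a length change that is not a multiple of three shifts the reading frame. The function name and labels are hypothetical.

```python
def classify_variant(ref, alt):
    """Label a variant from its reference and alternative alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP" if ref != alt else "no variant"
    if len(ref) != len(alt):
        # A length change not divisible by 3 disrupts the codon reading frame.
        shift = abs(len(ref) - len(alt))
        return "InDel (frameshift)" if shift % 3 else "InDel (in-frame)"
    return "other"  # e.g. a multi-nucleotide substitution


for ref, alt in [("A", "G"), ("A", "ATG"), ("A", "ATGC"), ("ATG", "A")]:
    print(ref, alt, classify_variant(ref, alt))
```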
NGS terms: de novo assembly
Terms | Description
Assembly | Assembly of fragment sequences into higher-order structures based on their overlap and, where appropriate, a reference sequence.
Fragment | A short stretch of nucleic acid that results from the fragmentation of longer stretches and is then sequenced. The required fragment size depends on the type of experiment and the capabilities of the sequencer.
Contig | The first level of association of fragment sequences into higher structures (Fragment -> Contig).
Scaffold | The second level of association of fragment sequences into higher structures (Fragment -> Contig -> Scaffold).
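The Fragment -> Contig step can be illustrated with a toy greedy merge: repeatedly join the two sequences with the longest suffix-prefix overlap. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats); production assemblers use overlap graphs or de Bruijn graphs instead, and the fragment sequences below are invented.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no overlap of min_len remains."""
    contigs = list(fragments)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


fragments = ["ATGGCC", "GCCTTA", "TTAGGA"]  # hypothetical overlapping fragments
print(greedy_assemble(fragments))
```

Fragments that share no overlap remain as separate contigs; linking such contigs in order is what the scaffold level adds.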
Applications of NGS in virology
• Viral genome characterization
    De novo sequencing (new emerging viruses)
    Resequencing (analysis of minor mutations)
• Deep sequencing of target gene
    Antigenic variations
    Drug-resistant mutants
    High-resolution genotyping
• Transcriptome (virus-host interaction)
    Alternative splicing
    RNA editing
    Differential gene expression profiling
    MicroRNA expression profiling
• Metagenome
    Virome
    Pathogen identification
From: “The next-generation sequencing technology and application” Protein Cell 2010, 1(6): 520–536