Sequencing technologies and Velvet assembly
Lecturer: Du Shengyang
September 29, 2012
The Advances of DNA Sequencing Technology
The first generation of sequencing technologies
Chemical degradation method
Sanger method
Automated fluorescent sequencing
The second generation of sequencing technologies
454
Solexa
SOLiD
The third generation of sequencing
1. Helicos BioSciences single-molecule sequencing
2. Pacific Biosciences SMRT technology
3. Oxford Nanopore Technologies nanopore single-molecule sequencing
Advantages of third-generation sequencing
High throughput, low cost, long read length, and short sequencing time
Avoids the PCR amplification step of second-generation sequencing
Reduces errors introduced during amplification and achieves true single-molecule sequencing
The keys to sequencing success
1. Sample preparation
2. Choosing the right sequencing platform
3. Downstream bioinformatics analysis
Bioinformatics analysis
Introduction
Some sequencing technologies are commercially available (e.g. 454 Sequencing, Solexa)
454 Sequencing: reads of ~100–200 bp
Solexa: reads of ~30 bp
Introduction
The EULER assembler (Pevzner 2001) uses k-mers as the nodes of a de Bruijn graph
Reads are mapped as paths through the de Bruijn graph
High redundancy does not affect the number of nodes
Velvet deals effectively with experimental errors and repeats by using de Bruijn graphs of k-mers
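The point about redundancy can be checked with a small sketch. The toy genome, read length, and variable names below are illustrative assumptions, not data from the lecture: sampling the same error-free reads five times leaves the set of distinct k-mers (the graph nodes) unchanged.

```python
def kmers(read, k):
    """Return the k-mers of a read in order of occurrence."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

# Hypothetical toy genome and error-free 6-base reads (assumptions for illustration).
genome = "ACGTTGCATGCA"
k = 4
reads_1x = [genome[i:i + 6] for i in range(len(genome) - 5)]
reads_5x = reads_1x * 5  # the same reads, sampled five times

# Distinct k-mers play the role of graph nodes in this simplified view.
nodes_1x = {km for read in reads_1x for km in kmers(read, k)}
nodes_5x = {km for read in reads_5x for km in kmers(read, k)}

print(len(reads_1x), len(reads_5x))   # number of reads grows
print(len(nodes_1x), len(nodes_5x))   # number of distinct k-mers does not
```

With real data, sequencing errors add spurious k-mers, which is what the error-removal steps later in the talk address.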
De Bruijn Graphs – structure
De Bruijn Graphs – construction
Adjacent k-mers overlap by k-1 nucleotides
Each node is attached to a twin node: the reverse series of reverse-complement k-mers
Twin nodes capture overlaps between reads from opposite strands
The union of a node and its twin node is called a “block”
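A minimal sketch of the node/twin relationship, assuming a node is stored simply as a list of overlapping k-mers (the function names and the example node are illustrative, not Velvet's internal representation):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]

def twin(kmer_series):
    """Twin of a node: the reverse complements of its k-mers, in reverse order."""
    return [reverse_complement(km) for km in reversed(kmer_series)]

def overlaps(a, b):
    """True if k-mer b follows k-mer a with an overlap of k-1 nucleotides."""
    return a[1:] == b[:-1]

# Hypothetical node with k = 4; it spells the sequence AACGTC.
node = ["AACG", "ACGT", "CGTC"]
twin_node = twin(node)  # spells GACGTT, the reverse complement of AACGTC
print(twin_node)
```

Both the node and its twin keep the k-1 overlap between adjacent k-mers, and taking the twin twice returns the original node.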
De Bruijn Graphs – construction
For each k-mer, a hash table records the ID of the first read in which it occurs and its position in that read
Each k-mer is recorded together with its reverse complement
Reads are then traced through the graph
A directed arc is created wherever one does not already exist
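The construction steps above can be sketched as follows. This is a simplified illustration, not Velvet's implementation: each k-mer is its own node, a Python dict stands in for the hash table, and the `"+"`/`"-"` strand tag and the function name `build_graph` are assumptions made for the example. An odd k is used so that no k-mer equals its own reverse complement.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def build_graph(reads, k):
    """Return (first_seen, arcs) for a list of reads.

    first_seen: k-mer -> (read ID, position, strand) of its first occurrence
    arcs:       (from k-mer, to k-mer) -> multiplicity
    """
    first_seen = {}
    arcs = defaultdict(int)
    for read_id, read in enumerate(reads):
        previous = None
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            if kmer not in first_seen:
                # Record the k-mer together with its reverse complement.
                first_seen[kmer] = (read_id, pos, "+")
                first_seen.setdefault(reverse_complement(kmer), (read_id, pos, "-"))
            if previous is not None:
                # Trace the read: add the arc and the matching arc between twins.
                arcs[(previous, kmer)] += 1
                arcs[(reverse_complement(kmer), reverse_complement(previous))] += 1
            previous = kmer
    return first_seen, arcs

# Two hypothetical overlapping reads.
first_seen, arcs = build_graph(["AAGGC", "AGGCT"], k=3)
print(first_seen["AGG"], arcs[("AGG", "GGC")])
```

The arc from `AGG` to `GGC` is traversed by both reads, so its multiplicity is 2; these multiplicities are what the later error-removal criteria rely on.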
De Bruijn Graphs – simplification
Chains of blocks are simplified without any loss of information
If node A has only one outgoing arc, which points to node B, and node B has only one incoming arc, the two nodes are merged
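The merge rule can be sketched on a plain successor map. This is an illustration under stated assumptions: nodes are identified by their sequences, every node appears as a key of `succ`, twin nodes are ignored, and the function name `simplify` is hypothetical.

```python
def simplify(succ, k):
    """Merge A and B whenever A's only outgoing arc goes to B and B's only
    incoming arc comes from A. succ maps node sequence -> set of successors."""
    succ = {node: set(outs) for node, outs in succ.items()}
    pred = {node: set() for node in succ}
    for a, outs in succ.items():
        for b in outs:
            pred[b].add(a)

    merged = True
    while merged:
        merged = False
        for a in list(succ):
            if len(succ[a]) != 1:
                continue
            (b,) = succ[a]
            # Skip self-loops, two-node cycles, and B with several incoming arcs.
            if b == a or a in succ[b] or len(pred[b]) != 1:
                continue
            ab = a + b[k - 1:]          # adjacent nodes overlap by k-1 bases
            succ[ab] = succ.pop(b)
            pred[ab] = pred.pop(a)
            del succ[a], pred[b]
            for p in pred[ab]:          # rewire arcs that pointed to A
                succ[p].discard(a)
                succ[p].add(ab)
            for s in succ[ab]:          # rewire arcs that left B
                pred[s].discard(b)
                pred[s].add(ab)
            merged = True
            break
    return succ

# Hypothetical graph with k = 3: a linear chain that ends in a fork.
graph = {"AAG": {"AGG"}, "AGG": {"GGC"}, "GGC": {"GCT", "GCA"},
         "GCT": set(), "GCA": set()}
simplified = simplify(graph, 3)
print(simplified)
```

The chain AAG, AGG, GGC collapses into one node AAGGC, while the fork after it is preserved, so the set of sequences spelled by paths through the graph is unchanged.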
De Bruijn Graphs – error removal
Velvet focuses on “topological features” of the graph
First step: remove tips
Tip: a chain of nodes that is disconnected at one end
Two criteria are used: (1) length and (2) minority count
Length: a tip is removed only if it is shorter than 2k bp, since two nearby errors can create a tip of up to 2k bp
[Figure: two nearby errors, each labeled k]
De Bruijn Graphs – error removal
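Minority count: the arc leading into the tip has multiplicity m, which is lower than the multiplicity n of another arc leaving the same node (m < n)
Starting from node B, going through the tip is an alternative to a more common path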
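The two tip criteria can be combined in a short sketch. It makes several simplifying assumptions: the graph has already been simplified so that a tip is a single dead-end node, only tips with no outgoing arcs are handled (the symmetric case is omitted), node length is given directly in bp, and the function name and example values are hypothetical.

```python
def find_removable_tips(length, arcs, k):
    """Return dead-end nodes that satisfy both tip criteria.

    length: node -> length in bp
    arcs:   (from node, to node) -> multiplicity
    """
    out_arcs = {}
    in_nodes = {}
    for (a, b), mult in arcs.items():
        out_arcs.setdefault(a, {})[b] = mult
        in_nodes.setdefault(b, []).append(a)

    removable = []
    for node in length:
        if out_arcs.get(node):
            continue                      # not a dead end
        sources = in_nodes.get(node, [])
        if len(sources) != 1:
            continue                      # isolated node or several entry points
        if length[node] >= 2 * k:
            continue                      # criterion 1: too long to be a tip
        junction = sources[0]
        m = out_arcs[junction][node]
        # Criterion 2: some other arc leaving the junction is more common.
        if any(n > m for other, n in out_arcs[junction].items() if other != node):
            removable.append(node)
    return removable

# Hypothetical graph with k = 5, so the length cutoff is 10 bp.
length = {"A": 40, "B": 35, "C": 50, "tip": 7, "strong": 6, "long_tip": 30}
arcs = {("A", "B"): 20, ("B", "C"): 19, ("B", "tip"): 1,
        ("B", "strong"): 19, ("A", "long_tip"): 2}
print(find_removable_tips(length, arcs, k=5))
```

Here only `tip` is removed: `long_tip` fails the length criterion, and `strong` is short but as well supported as the alternative path, so it fails the minority-count criterion.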
[Figure: nodes A, B, and C with a tip; arcs labeled with multiplicities m and n]
De Bruijn Graphs – error removal
Second step: remove bubbles using the Tour Bus algorithm
Bubbles are redundant paths that start and end at the same nodes
Bubbles are created by errors or by biological variants such as SNPs
De Bruijn Graphs – error removal
1. Detect redundant paths
2. Compare them using dynamic programming methods
3. If similar, merge them
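Steps 2 and 3 can be sketched with a standard edit-distance computation. This is only an illustration of the compare-and-merge idea: the detection of redundant paths is not shown, and the similarity threshold `max_diff`, the rule of keeping the higher-coverage path, and the function names are assumptions rather than Velvet's actual parameters.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]

def merge_bubble(path_a, path_b, max_diff=0.2):
    """Each path is (sequence, coverage). Return the path(s) that survive."""
    (seq_a, cov_a), (seq_b, cov_b) = path_a, path_b
    distance = edit_distance(seq_a, seq_b)
    if distance / max(len(seq_a), len(seq_b)) > max_diff:
        return [path_a, path_b]          # too different: keep both paths
    return [path_a if cov_a >= cov_b else path_b]

# Hypothetical bubble: two paths that differ by a single substitution.
main_path = ("ACGTACGTAC", 30)
error_path = ("ACGTTCGTAC", 2)
print(merge_bubble(main_path, error_path))
```

A bubble whose branches are nearly identical collapses onto the better-supported branch, while two genuinely different paths between the same nodes are both kept.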
De Bruijn Graphs – error removal
Third step: remove erroneous connections
Erroneous connections are removed after the Tour Bus algorithm has run
They are removed with a basic coverage cutoff
Genuine short nodes that cannot be simplified in the graph should have high coverage, so low-coverage nodes are treated as errors
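A coverage cutoff amounts to a simple filter. The sketch below assumes per-node coverage values are already known; the function name, the example graph, and the cutoff value are hypothetical.

```python
def apply_coverage_cutoff(coverage, arcs, cutoff):
    """Drop nodes whose coverage is below the cutoff, along with their arcs.

    coverage: node -> average coverage
    arcs:     (from node, to node) -> multiplicity
    """
    kept_nodes = {node for node, cov in coverage.items() if cov >= cutoff}
    kept_arcs = {(a, b): m for (a, b), m in arcs.items()
                 if a in kept_nodes and b in kept_nodes}
    return kept_nodes, kept_arcs

# Hypothetical graph in which node X is a low-coverage chimeric connection.
coverage = {"A": 28.0, "B": 31.5, "X": 1.2}
arcs = {("A", "B"): 27, ("A", "X"): 1, ("X", "B"): 1}
kept_nodes, kept_arcs = apply_coverage_cutoff(coverage, arcs, cutoff=4.0)
print(sorted(kept_nodes), kept_arcs)
```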
Breadcrumb: resolution of repeats
1. Using read pairs, pair up the long nodes
2. Flag paired reads using unambiguous long nodes
Breadcrumb: resolution of repeats
The long nodes are extended as far as possible by following the flagged paired reads
All nodes between A and B are paired up to either A or B
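The Breadcrumb idea can be sketched on a toy graph. This is a loose illustration, not Velvet's implementation: read pairs are assumed to be already mapped to nodes and given as `(node of read 1, node of read 2)` tuples, a long node counts as unambiguous when it is paired with exactly one other long node, and all names are hypothetical.

```python
from collections import defaultdict

def breadcrumb(succ, long_nodes, pair_links):
    """Return {(A, B): path} for unambiguous long-node pairs joined by a
    path of flagged nodes. long_nodes must be a set."""
    mates = defaultdict(set)              # node -> nodes holding its mate reads
    for x, y in pair_links:
        if x != y:
            mates[x].add(y)
            mates[y].add(x)

    paths = {}
    for a in sorted(long_nodes):
        partners = mates[a] & long_nodes
        if len(partners) != 1:
            continue                      # unpaired or ambiguous long node
        (b,) = partners
        if mates[b] & long_nodes != {a}:
            continue                      # B is paired with other long nodes too
        flagged = mates[a] | mates[b]     # nodes paired up to either A or B
        path, node = [a], a
        while node != b:
            steps = [n for n in succ.get(node, ())
                     if n in flagged and n not in path]
            if len(steps) != 1:
                break                     # dead end or still ambiguous
            node = steps[0]
            path.append(node)
        if node == b:
            paths[(a, b)] = path
    return paths

# Hypothetical repeat r shared by two genomic paths: A-r-p-B and C-r-q-D.
succ = {"A": ["r"], "C": ["r"], "r": ["p", "q"], "p": ["B"], "q": ["D"]}
long_nodes = {"A", "B", "C", "D"}
pair_links = [("A", "B"), ("A", "p"), ("A", "r"), ("p", "B"),
              ("C", "D"), ("C", "q"), ("r", "D")]
resolved = breadcrumb(succ, long_nodes, pair_links)
print(resolved)
```

At the repeat node `r` the walk from A has two possible successors, but only `p` is flagged for the pair (A, B), so the ambiguity is resolved; the same holds for `q` and the pair (C, D).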
Experimental Results
The error-removal pipeline was tested on simulated data
Reads were simulated from E. coli, S. cerevisiae, C. elegans, and H. sapiens
Experimental Results
The error-removal pipeline was tested on experimental data
A 173,428 bp human BAC was sequenced using Solexa machines
Reads were 35 bp long, and k = 31
Tour Bus increased sensitivity by correcting errors and preserved the integrity of the graph structure
Experimental Results (cont)
Conclusions
Velvet is a de Bruijn graph-based sequence assembly method for short reads
Errors are handled by tip removal and the Tour Bus algorithm
A large number of repeats are resolved by the Breadcrumb algorithm
Velvet was assessed using simulated and real datasets and it performed well