De novo assembly - Oregon State University (people.oregonstate.edu/~knausb/rna_seq/10_denovo.pdf)
De novo assembly
Brian J. Knaus
USDA Forest Service
Pacific Northwest Research Station
Kelly Vining and Katherine Pillman
Oregon State University
1
![Page 2: De novo assembly - Oregon State Universitypeople.oregonstate.edu/~knausb/rna_seq/10_denovo.pdfBasic Velvet de novo assembly Run velveth Velveth takes in a number of sequence files,](https://reader033.vdocuments.mx/reader033/viewer/2022041716/5e4b5d8b29b55d3648772e53/html5/thumbnails/2.jpg)
Calendar
December meeting cancelled.
January meeting organizational.
◦ Introductions.
◦ Projects we’re involved with.
◦ What do we want to get out of this group?
February meeting: Cufflinks (pending)
Overview
What are de Bruijn graphs?
Software:
◦ Velvet
◦ ABySS
◦ Trinity
Genomic vs. transcriptomic
Genomic expectations:
◦ Long contigs (i.e., chromosomes).
◦ Uniform coverage.
Transcriptomic expectations:
◦ Short contigs (i.e., genes or operons).
◦ Non-uniform coverage.
Two types of computer error
Errors which crash the program and generate an error message.
Errors where the software appears to exit successfully but has actually done something bad.
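The second type is the one that costs the most time, because nothing tells you to look. A minimal sketch of one defensive habit: check the exit status first, then check that the output you expected actually exists and is not empty. The helper name `run_and_check` and the "non-empty file" test are assumptions made for illustration; a real check would look at whatever a given program is supposed to produce, for example the contigs file an assembler is expected to write.

```python
import subprocess
import sys
from pathlib import Path


def run_and_check(cmd, expected_output):
    """Run cmd; fail loudly on a crash or on a missing/empty output file.

    Hypothetical helper for illustration, not part of any assembler.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Type 1: the program crashed and reported it through its exit status.
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed ({result.returncode}): {result.stderr.strip()}"
        )
    # Type 2: exit status 0, but the expected output is missing or empty.
    out = Path(expected_output)
    if not out.is_file() or out.stat().st_size == 0:
        raise RuntimeError(f"command exited 0 but {out} is missing or empty")
    return out
```

A non-empty file is only the weakest possible sanity check. Counting sequences or comparing total bases against expectations catches more.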
Essential bioinformatic software
Database/query applications
◦ Blast
◦ Blat
◦ MAQ
◦ Bowtie
◦ BWA
De novo assembly
◦ Velvet
◦ ABySS
◦ Soap
De novo assembly
Compiled code (usually C; Dennis Ritchie R.I.P.).
Require LOTS of memory!
Implement de Bruijn graphs.
ACTGATTG
k-mer = 5
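A small sketch of what "k-mer = 5" means for the sequence above: slide a window of 5 bases along the sequence one base at a time. Consecutive k-mers overlap by k - 1 bases, and under one common convention those overlaps are the links of a de Bruijn graph. The function names are made up for this example, and this is not how Velvet stores its graph internally.

```python
def kmers(seq, k):
    """Return every k-length substring of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_edges(seq, k):
    """Link each k-mer to the next one; the pair overlaps by k - 1 bases."""
    mers = kmers(seq, k)
    return list(zip(mers, mers[1:]))


if __name__ == "__main__":
    for mer in kmers("ACTGATTG", 5):
        print(mer)
    # Prints:
    # ACTGA
    # CTGAT
    # TGATT
    # GATTG
```

A sequence of length 8 yields 8 - 5 + 1 = 4 k-mers, joined by 3 edges.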
Schematic representation of our implementation of the de Bruijn graph.
Zerbino DR, Birney E. Genome Res. 2008;18:821-829.
©2008 by Cold Spring Harbor Laboratory Press
Example of Tour Bus error correction.
Zerbino DR, Birney E. Genome Res. 2008;18:821-829.
©2008 by Cold Spring Harbor Laboratory Press
Issues:
• Bubbles.
• Tips.
• Errors.
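To see why a sequencing error shows up as a bubble, here is a toy sketch. Two made-up reads cover the same region, one with a single wrong base. Building the k-mer graph from both and listing the k-mers that have more than one successor shows where the paths split. This only locates the branch; it is not Velvet's Tour Bus algorithm, and the reads, k value, and function names are invented for the example.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Map each k-mer to the set of k-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def branch_points(graph):
    """Return k-mers with more than one successor (where paths diverge)."""
    return sorted(node for node, nxt in graph.items() if len(nxt) > 1)


if __name__ == "__main__":
    correct = "ACTGATTGCAGT"  # made-up "true" sequence
    error = "ACTGAGTGCAGT"    # same read with one substitution (T -> G)
    graph = build_graph([correct, error], k=4)
    print(branch_points(graph))   # ['CTGA']
    print(sorted(graph["CTGA"]))  # ['TGAG', 'TGAT']
```

The two paths leave CTGA separately and rejoin at TGCA, which is the bubble. A tip has the same kind of branch, but one arm ends without rejoining.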
Velvet
Kelly Vining
Oregon State University
Forestry
Velvet
• Current version: 1.1.06 (1.1.05 at CGRB cluster)
• Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18: 821–829.
• Zerbino DR, McEwen GK, Margulies EH, Birney E (2009) Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler. PLoS ONE 4(12): e8407. doi:10.1371/journal.pone.0008407
Basic Velvet de novo assembly
Run velveth
Velveth takes in a number of sequence files, produces a hashtable, then outputs two files in an output directory (creating it if necessary), Sequences and Roadmaps, which are necessary to velvetg.
Run velvetg
Velvetg is the core of Velvet where the de Bruijn graph is built then manipulated.
Note that although velvetg saves some files during the process to avoid useless recalculations, the parameters are not saved from one run to the next.
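Because velvetg does not remember its parameters, it helps to write down the exact command for every run. A rough sketch, assuming `velveth` and `velvetg` are on the PATH: build each command as an argument list and append it to a log before running it. The helper names and the log file are hypothetical; the flags used in the comments (`-fastq`, `-shortPaired`, `-ins_length`, `-min_contig_lgth`) are the ones that appear in the sample commands elsewhere in these slides.

```python
import shlex


def velveth_cmd(out_dir, hash_length, file_format, read_type, filename):
    """Argument list for velveth: directory, hash length, format, category, file."""
    return ["velveth", out_dir, str(hash_length),
            f"-{file_format}", f"-{read_type}", filename]


def velvetg_cmd(out_dir, **params):
    """Argument list for velvetg; each keyword becomes a '-name value' pair."""
    cmd = ["velvetg", out_dir]
    for name, value in params.items():
        cmd += [f"-{name}", str(value)]
    return cmd


def record(cmd, log_path):
    """Append the exact command line to a log file (hypothetical convention)."""
    with open(log_path, "a") as log:
        log.write(shlex.join(cmd) + "\n")


# Example (not executed here):
#   cmd = velvetg_cmd("velvet_041411", ins_length=350, min_contig_lgth=500)
#   record(cmd, "velvet_041411.commands.log")
#   subprocess.run(cmd, check=True)
```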
On the command line…
/local/cluster/velvet
[viningk@needleman velvet]$ ll
total 32
drwxr-xr-x 2 root root 34 Sep 24 2007 VELVET/
drwxr-xr-x 8 root root 4096 Jul 7 2009 velvet_0.7.42/
drwxr-xr-x 8 root root 4096 Dec 23 2009 velvet_0.7.55/
drwxr-xr-x 8 root root 4096 Feb 16 2010 velvet_0.7.58/
drwxr-sr-x 8 4205 1304 4096 Oct 4 2010 velvet_1.0.12/
drwxr-xr-x 8 root root 4096 May 18 10:29 velvet_1.1.02/
drwxr-xr-x 8 boyda spatafojlab 4096 Sep 7 14:39 velvet_1.1.05/
lrwxrwxrwx 1 root root 13 Sep 22 14:15 velvet -> velvet_1.1.05/
drwxr-xr-x 9 root root 152 Sep 22 14:15 ./
drwxrwxr-x 123 root cgrb 4096 Nov 1 13:34 ../
/local/cluster/velvet/velvet_1.1.05
[viningk@needleman velvet_1.1.05]$ ll
total 884
drwxr-xr-x 3 boyda spatafojlab 23 Jul 1 2009 third-party/
drwxr-xr-x 3 boyda spatafojlab 23 Dec 9 2009 doc/
drwxr-xr-x 2 boyda spatafojlab 90 Dec 21 2010 data/
-rwxr-xr-x 1 boyda spatafojlab 26 Jan 11 2011 update_velvet.sh*
-rwxr-xr-x 1 boyda spatafojlab 479 Jan 11 2011 shuffleSequences_fastq.pl*
-rwxr-xr-x 1 boyda spatafojlab 876 Jan 11 2011 shuffleSequences_fasta.pl*
-rw-r--r-- 1 boyda spatafojlab 17988 Jan 11 2011 LICENSE.txt
-rw-r--r-- 1 boyda spatafojlab 1117 Mar 8 2011 README.txt
-rw-r--r-- 1 boyda spatafojlab 456 Mar 8 2011 For_MAC_or_SPARC_users.txt
-rw-r--r-- 1 boyda spatafojlab 1956 Mar 29 2011 CREDITS.txt
-rw-r--r-- 1 boyda spatafojlab 3834 Mar 29 2011 globals.h
drwxr-xr-x 15 boyda spatafojlab 4096 Mar 29 2011 contrib/
drwxr-xr-x 2 boyda spatafojlab 4096 Jul 27 14:35 src/
-rw-r--r-- 1 boyda spatafojlab 6118 Jul 28 13:38 Makefile
-rw-r--r-- 1 boyda spatafojlab 10871 Jul 28 13:38 ChangeLog
drwxr-xr-x 3 boyda spatafojlab 4096 Sep 7 14:39 obj/
-rwxr-xr-x 1 boyda spatafojlab 64819 Sep 7 14:39 velveth*
-rwxr-xr-x 1 boyda spatafojlab 246830 Sep 7 14:39 velvetg*
-rw-r--r-- 1 boyda spatafojlab 439712 Sep 7 14:39 Manual.pdf
-rw-r--r-- 1 boyda spatafojlab 61416 Sep 7 14:39 Columbus_manual.pdf
drwxr-xr-x 8 boyda spatafojlab 4096 Sep 7 14:39 ./
drwxr-xr-x 9 root root 152 Sep 22 14:15 ../
Velveth
Designate file type (fastq, fasta, etc.)
Designate read category
Designate hash size
Supported file formats are:
fasta (default)
fastq
fasta.gz
fastq.gz
sam
bam
eland
gerald
Read categories are:
short (default)
shortPaired**
short2 (same as short, but for a separate insert-size library)
shortPaired2** (see above)
long (for Sanger, 454 or even reference sequences)
longPaired**
**Paired-end reads must be interleaved in a single file, each read next to its mate (see shuffleSequences_fastq.pl and shuffleSequences_fasta.pl in the Velvet directory)
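What "a single file" means in practice: mate 1 of a pair, then mate 2 of the same pair, then the next pair, and so on. The sketch below shows that interleaving for FASTQ input. It is not the shuffleSequences script that ships with Velvet; the function names are made up, and it assumes each FASTQ record is exactly four lines.

```python
from itertools import islice, zip_longest


def fastq_records(handle):
    """Yield FASTQ records as lists of four lines (assumes unwrapped records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward, reverse, out):
    """Write mate 1, mate 2, mate 1, mate 2, ... and return the pair count."""
    pairs = 0
    for r1, r2 in zip_longest(fastq_records(forward), fastq_records(reverse)):
        # A missing mate means the two input files are out of sync.
        if r1 is None or r2 is None:
            raise ValueError("forward and reverse files have different read counts")
        out.writelines(r1 + r2)
        pairs += 1
    return pairs
```

Raising on unequal read counts matters here: an off-by-one between the two files would otherwise pair every read with the wrong mate and still exit cleanly.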
Sample commands: velveth
> ./velveth output_directory hash_length [[-file_format][-read_type] filename]
> /deepspace/strauss/viningk/velvet_1.1.02/velveth velvet_041411 29 -fastq -shortPaired s_5_seqsTrim6.txt
> /local/cluster/velvet/velvet_1.1.02/velveth velvet_091410b 31 -fastq -short s_1_seqsMergedTrim6.txt -fasta -long contigs_09.fa
Velvetg
Designate library insert size
Designate minimum contig length
Designate coverage cutoff
Need to optimize according to genome size, expected genome coverage depth, etc.
Examples:
> ./velvetg output_directory/ -min_contig_lgth 100
> /deepspace/strauss/viningk/velvet_1.1.02/velvetg velvet_041411/ -ins_length 350 -min_contig_lgth 500
> /deepspace/strauss/viningk/velvet_1.1.02/velvetg velvet_051811b -ins_length 350 -exp_cov 20 -cov_cutoff 7.9 -min_contig_lgth 500
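One thing to keep in mind when choosing coverage values: Velvet works with k-mer coverage, not nucleotide coverage. The Velvet manual gives the relation Ck = C * (L - k + 1) / L, where C is nucleotide coverage, L is read length and k is the hash length. A small sketch of that arithmetic follows; the function name and the example numbers are made up, not taken from the runs shown here.

```python
def kmer_coverage(nucleotide_coverage, read_length, k):
    """k-mer coverage from nucleotide coverage: Ck = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    return nucleotide_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical library: 40x nucleotide coverage, 80 bp reads, k = 31.
    print(kmer_coverage(40, 80, 31))  # 25.0
```

The longer k is relative to the read length, the further k-mer coverage drops below nucleotide coverage.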
Velvet Optimiser
Tries a range of k-mer lengths with velveth, then runs velvetg with a range of coverage cutoffs.
Lives here on the CGRB cluster:
/local/cluster/velvet/velvet_1.1.05/contrib/VelvetOptimiser-2.1.7
Sample command:
/deepspace/strauss/viningk/velvet_1.1.02/contrib/VelvetOptimiser-2.1.7/VelvetOptimiser.pl -s 27 -e 31 -f '-shortPaired -fastq s_5_seqsTrim6.txt s_1_seqsMergedTrim6.txt' > s_15_VOtest3.txt
Sample VelvetOptimiser output
********************************************************
Assembly id: 1
Velveth timestamp: May 20 2011 13:47:08
Velveth version: 1.0.12
Readfile(s): -shortPaired -fastq s_5_seqsTrim6.txt s_1_seqsMergedTrim6.txt
Velveth parameter string: auto_data_31 31 -shortPaired -fastq s_5_seqsTrim6.txt s_1_seqsMergedTrim6.txt
Assembly directory: /deepspace/strauss/viningk/auto_data_31
Velvet hash value: 31
Roadmap file size: 12143769440
Total number of sequences: 150746258
**********************************************************
May 20 13:47:08
Beginning vanilla velvetg runs.
********************************************************
Assembly id: 1
Assembly score: 140
Velveth timestamp: May 20 2011 13:47:08
Velvetg timestamp: May 20 2011 18:59:34
Velveth version: 1.0.12
Velvetg version: 1.0.12
Readfile(s): -shortPaired -fastq s_5_seqsTrim6.txt s_1_seqsMergedTrim6.txt
Velveth parameter string: auto_data_31 31 -shortPaired -fastq s_5_seqsTrim6.txt s_1_seqsMergedTrim6.txt
Velvetg parameter string: auto_data_31
Assembly directory: /deepspace/strauss/viningk/auto_data_31
Velvet hash value: 31
Roadmap file size: 12143769440
Total number of sequences: 150746258
Total number of contigs: 2635009
n50: 140
length of longest contig: 6491
Total bases in contigs: 343232524
Number of contigs > 1k: 11471
Total bases in contigs > 1k: 15400103
Sample VelvetOptimiser output (continued)
**********************************************************
May 21 06:11:49 Setting cov_cutoff to 5.195.
********************************************************
Assembly id: 1
Assembly score: 93102316
Velveth timestamp: May 20 2011 13:47:08
Velvetg timestamp: May 21 2011 18:22:55
Velveth version: 1.0.12
Velvetg version: 1.0.12
Readfile(s): -shortPaired -fastq s_5_seqsTrim6.txt s_1_seqsMergedTrim6.txt
Velveth parameter string: auto_data_31 31 -shortPaired -fastq s_5_seqsTrim6.txt s_1_seqsMergedTrim6.txt
Velvetg parameter string: auto_data_31 -exp_cov 17 -cov_cutoff 5.1952
Assembly directory: /deepspace/strauss/viningk/auto_data_31
Velvet hash value: 31
Roadmap file size: 12143769440
Total number of sequences: 150746258
Total number of contigs: 1080880
n50: 602
length of longest contig: 12710
Total bases in contigs: 261325501
Number of contigs > 1k: 52009
Total bases in contigs > 1k: 93102316
Paired Library insert stats:
Paired-end library 1 has length: 154, sample standard deviation: 57
Paired-end library 1 has length: 160, sample standard deviation: 63
**********************************************************
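The n50 values in this output (140 before optimisation, 602 after) summarise contig lengths: sort the contigs from longest to shortest and walk down the list until half of the total assembled length has been covered; the length of the contig you are standing on is the N50. A minimal sketch of that definition, with a made-up list of lengths; it may not match exactly how VelvetOptimiser computes or reports its own figure.

```python
def n50(lengths):
    """Length of the contig at which the running total reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    # Made-up contig lengths; total is 32, and 8 + 8 already covers half.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

N50 rewards a few long contigs, so it is worth reading it alongside the total bases and contig counts reported in the same block.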
ABySS
Katherine Pillman
Oregon State University
Horticulture
http://www.molecularevolution.org/molevolfiles/presentations/comparative-genomics-abyss-20110714.pdf