![Page 1: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/1.jpg)
Genome Assembly Preliminary Results
Jeri Dilts, Suzanna Kim, Hema Nagrajan, Deepak Purushotham, Ambily Sivadas, Amit Rupani, Leo Wu
02/01/2012
![Page 2: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/2.jpg)
Outline
• Data Pre-Processing
  – Formats and Conversion
  – PRINSEQ
  – Data statistics
• Error Correction
  – CoRAL
• Assembler results
  – Newbler 2.6
  – MIRA3
  – ABySS
  – Velvet
  – PCAP-454
• Results
• Lab Exercise
![Page 4: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/4.jpg)
What are sff files?
• Sff files are Roche's "Standard Flowgram Format" files, containing the sequence data produced from a 454 run.
• Each sff file contains a manifest header at the start describing the contents, followed by the flow intensity signal values for each base in each read.
• They are in binary format, so they need to be converted to a text format such as fastq/fasta, using programs such as sff2fastq, sff_extract, or sffinfo.
• The Sequence Read Archive requests that these .sff files be uploaded in order to obtain accession numbers for publications.
![Page 5: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/5.jpg)
Fastq = Fasta + Quality
• No standard file extension, but .fq, .fastq and .txt are commonly used
• 4 lines per sequence
• Line 1 begins with the @ character, followed by a sequence ID and an optional description
  @SEQ_ID
• Line 2 is the sequence letters
  GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAA
• Line 3 begins with the + character, optionally followed by the same sequence ID and another description
  +Optional Description
• Line 4 encodes the quality values for the sequence letters in line 2, one character per base
  !''*((((***+))%%%++(%%%%).1***+*''))**55CCF>>>>>
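The four-line layout above can be read with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `parse_fastq` and the sample record are made up for illustration, and it assumes the Sanger (Phred+33) quality encoding, where each quality score is the character's ASCII code minus 33.

```python
def parse_fastq(text):
    """Yield (seq_id, sequence, phred_scores) for each 4-line FASTQ record."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record starting at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in " + header)
        seq_id = header[1:].split()[0]
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        scores = [ord(c) - 33 for c in qual]
        yield seq_id, seq, scores


# Hypothetical record, shortened so that sequence and quality lengths match.
record = "@read1 example\nGATTTGGG\n+\n!''*((((\n"
for seq_id, seq, scores in parse_fastq(record):
    print(seq_id, seq, scores)
```

The length check matters in practice: a quality line that is longer or shorter than its sequence line usually means the file was truncated or edited by hand.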
![Page 6: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/6.jpg)
File Tools
• A large number of sff conversion tools are available now. We list a few here:
• sffinfo --> extracts fasta, quality and flowgrams as text from .sff files.
  o sffinfo -seq Axolotl_reads.sff > Axolotl.fna
  o sffinfo -qual Axolotl_reads.sff > Axolotl.qual
• sff2fastq --> extracts read information from sff and outputs the sequence and quality scores in fastq
  o sff2fastq -o Axolotl.fastq Axolotl_reads.sff
• sfffile --> joins sff files; extracts part of an sff file by MIDs, read names or random reads; or trims reads in user-defined ways.
![Page 7: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/7.jpg)
Quality Control
WHY?
• Saves time, effort and money
KEY CONCERNS
• Number and length of sequences
• Base qualities
• Ambiguous bases
• Sequence duplications
![Page 8: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/8.jpg)
Data pre-processing - PRINSEQ
http://prinseq.sourceforge.net/manual.html
![Page 9: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/9.jpg)
Generating trimmed reads in fastq
• $ prinseq-lite.pl -fastq M19107.fastq -out_good stdout -log M19107.txt -trim_qual_left 20 -trim_qual_right 20 -trim_tail_left 5 -trim_tail_right 5 -trim_ns_left 2 -trim_ns_right 2 -min_len 65 -min_qual_mean 20 -ns_max_n 4 | gzip -9 > M19107.fastq.gz
Reads a fastq file containing quality data and writes the reads that pass all filters to standard output, which is piped to gzip so that the trimmed sequences are compressed into a new file.
• -fastq: Fastq file containing sequence and quality data.
• -out_good stdout: Writes all data passing the filters to stdout.
• -log: Log file to keep track of parameters, errors, etc.
• -min_len: Filter out sequences shorter than this length.
• -min_qual_mean: Filter sequences with mean quality scores below the specified level. Most published thresholds varied between 15 and 25. We used 20.
• -ns_max_n: Filter sequences with more than the specified number of Ns. We tried with 2/3 and 1.
• -min_gc: Filter sequences with GC content less than min_gc.
• -max_gc: Filter sequences with GC content greater than max_gc.
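To make the three filtering criteria concrete, here is a rough sketch of what a length, N-count and mean-quality filter does to a single read. It is not PRINSEQ's implementation, it does not model the trimming options, and the function name `passes_filters` is hypothetical; the default thresholds simply mirror the values on the command line above.

```python
def passes_filters(seq, quals, min_len=65, min_qual_mean=20, ns_max_n=4):
    """Return True if a read survives the length, N-count and mean-quality filters."""
    if not seq or len(seq) < min_len:
        return False  # too short (or empty)
    if seq.upper().count("N") > ns_max_n:
        return False  # too many ambiguous bases
    if sum(quals) / len(quals) < min_qual_mean:
        return False  # mean base quality too low
    return True


# Hypothetical reads: 70 bases at Q30 passes, 60 bases is too short.
print(passes_filters("A" * 70, [30] * 70))
print(passes_filters("A" * 60, [30] * 60))
```

Applying the cheapest test (length) first is only a convenience here; the order does not change which reads pass.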
![Page 10: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/10.jpg)
Generating graphs report from trimmed reads
Reads a filtered fastq file and writes a graph data file, which is used to generate graphs showing the distribution of length, base quality, GC content, occurrence of N, poly-A/T tails, sequence duplication, sequence complexity and dinucleotide odds ratios.
prinseq-lite.pl -fastq M19107_filtered.fastq -graph_data M19107_filtered.gd -verbose -out_good null -out_bad null
• -fastq: Fastq file containing sequence and quality data.
• -graph_data: File containing the graph data used to generate the graphs report.
• -verbose: Prints status and info messages during processing.
• -out_bad null and -out_good null: No output file other than the graph data file is created.
![Page 11: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/11.jpg)
Length Distribution
![Page 12: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/12.jpg)
Base Quality Distribution
![Page 13: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/13.jpg)
Mean Base Quality Scores
![Page 15: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/15.jpg)
Error Correction
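Motivation
• Sequencing errors pose the biggest challenge to assembly
• Correcting reads first improves the computational efficiency of assemblers
• The data are highly redundant, and error correction takes advantage of that redundancy
• Ensures that more of the data can be used in the assembly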
![Page 16: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/16.jpg)
Coral v1.3
• CORrection with ALignment
• Corrects sequencing errors in short-read high throughput data
• Key strategy - Multiple Alignment using redundant read data
• Similar reads are all updated according to the alignment, based on a quality score for each aligned position: score = number of reads that align at a position / total number of reads at that position
![Page 17: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/17.jpg)
Coral Multiple Alignment
![Page 18: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/18.jpg)
Coral Command Line
• coral -f[q, s] input_file -o output_file
• Accepts input files in FASTA, FASTQ, and Solexa FASTA format
• -k (length of k-mer): should be >= log4(length of genome); default 21
• -e (maximum expected error rate): default 0.07
• -454 (sets optimal settings for the gap penalty, mismatch penalty, and reward for matching)
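The rule of thumb k >= log4(length of genome) says that there should be at least as many possible k-mers (4^k) as positions in the genome. A small sketch of that arithmetic follows; the helper name `min_kmer_length` is hypothetical and the calculation is independent of Coral itself. Integer arithmetic is used to avoid floating-point rounding at exact powers of 4.

```python
def min_kmer_length(genome_length):
    """Smallest k such that 4**k >= genome_length, i.e. k >= log4(genome_length)."""
    if genome_length < 1:
        raise ValueError("genome_length must be positive")
    k = 0
    while 4 ** k < genome_length:
        k += 1
    return k


# Hypothetical 2 Mbp bacterial genome.
print(min_kmer_length(2_000_000))
```

For bacterial-sized genomes the bound is far below the default of 21, so the default already satisfies it.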
![Page 20: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/20.jpg)
Newbler Algorithm
How does it work??
![Page 21: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/21.jpg)
What is Newbler?
• Roche's “GS De Novo Assembler” (where “GS” = “Genome Sequencer”)
• Designed to assemble reads from the Roche 454 sequencer.
• Accepts:
  o 454 FLX Standard reads
  o 454 Titanium reads
  o Single and paired-end reads
  o Optionally, Sanger reads
• Runs on Linux, and has 32-bit and 64-bit versions.
• Has command-line and Java-based GUI interfaces.
![Page 22: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/22.jpg)
OLC algorithm - A quick recap
• Overlap Layout Consensus (OLC) is a method used for de novo genome assemblies.
• OLC requires three steps: 1) overlap, 2) layout, and 3) consensus.
• The overlap stage computes and builds the basic assembly graph.
• The layout stage compresses the graph, and the consensus stage determines the genome sequence based on the graph generated in the previous two steps.
![Page 23: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/23.jpg)
Overlap
In the overlap step, the sequence of each read is compared to that of every other read, in both the forward and reverse complement orientations.
As such, the overlap computation step is a very time intensive step – especially if the set of reads is very large.
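The pairwise comparison described above can be sketched as follows. This is a simplified illustration, not Newbler's algorithm: it only finds exact suffix-prefix matches, whereas real assemblers tolerate mismatches and indels and use indexing to avoid comparing every pair. The function names and the `min_len` default are made up for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a that exactly equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def best_overlap(a, b, min_len=3):
    """Compare a against b in both orientations; return (orientation, length)."""
    fwd = suffix_prefix_overlap(a, b, min_len)
    rev = suffix_prefix_overlap(a, reverse_complement(b), min_len)
    return ("forward", fwd) if fwd >= rev else ("reverse", rev)


print(best_overlap("ACGTAC", "TACGGA"))
```

Running `best_overlap` on every ordered pair of n reads takes on the order of n² comparisons, which is why this step dominates the run time for large read sets.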
![Page 24: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/24.jpg)
Overlap criteria
• The OLC overlap criteria result in two types of overlaps: true overlaps (Figure 1a) and repeat overlaps (Figure 1b).
• For example, in Figure 1b, an overlap occurs between reads S and T, due to the orange repeat section, not because reads S and T truly overlap one locus in the genome, as in Figure 1a.
![Page 25: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/25.jpg)
How does it go?
• Although we would like to exclude repeat overlaps, we must construct the assembly graph using both types of overlaps, because true and repeat overlaps cannot be distinguished individually.
• In the assembly graph, the nodes represent actual reads, and edges represent OLC-quality overlaps between these reads (Figure 2).
• Thus, the genome assembly becomes equivalent to finding a path through the graph that visits each node exactly once (i.e., a Hamiltonian path).
![Page 26: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/26.jpg)
![Page 27: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/27.jpg)
Layout
• Finding a path through the OLC graph is not a trivial task. Imagine you have a graph of millions of nodes and edges.
• Identifying a path that visits every node exactly once would be extremely difficult, even for a powerful computer.
• In order to find such a path, you would have to start at some node and proceed to other connected nodes.
• If you find that you visited a node more than once, you must backtrack, adjust the path, and test this new path.
• So the larger the graph is, the more options you must test.
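The start, extend, backtrack procedure described above can be written as a short recursive search. This is a toy sketch on a hypothetical four-read overlap graph; the function name `hamiltonian_path` and the read names are made up. Its worst-case running time grows factorially with the number of nodes, which is exactly why assemblers simplify the graph instead of searching it directly.

```python
def hamiltonian_path(graph):
    """Return a path visiting every node exactly once, or None if none exists.

    graph maps each node to the list of nodes it has an overlap edge to.
    """
    nodes = list(graph)

    def extend(path, visited):
        if len(path) == len(nodes):
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                result = extend(path + [nxt], visited | {nxt})
                if result is not None:
                    return result
        return None  # dead end: the caller backtracks and tries another edge

    for start in nodes:
        result = extend([start], {start})
        if result is not None:
            return result
    return None


# Hypothetical overlap graph: R1 overlaps R2 and R3, and so on.
overlaps = {"R1": ["R2", "R3"], "R2": ["R3"], "R3": ["R4"], "R4": []}
print(hamiltonian_path(overlaps))
```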
![Page 28: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/28.jpg)
Contigs
• In order to decrease the size of the graph, the OLC assembly graph is simplified in the layout stage, where segments of the assembly graph are compressed into contigs, which are collections of reads that clearly overlap each other and refer to the same overall sequence.
• Thus in the overlap graph, a contig would be a subgraph, or a group of nodes, with many connections among each other, as they all overlap with each other and refer to the same sequence
![Page 29: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/29.jpg)
Graphical representation of a unique contig
![Page 30: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/30.jpg)
Unique and repeat contigs
• There are two classes of contigs, unique contigs and repeat contigs.
• Unique contigs are composed of reads that can be unambiguously assembled.
• Generally, these reads only overlap in one way and do not contain repeats within the genome. Essentially, unique contigs are contigs not flagged as repeat contigs.
![Page 31: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/31.jpg)
Contig Assignment
![Page 32: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/32.jpg)
Contig assignment
• Repeat contigs are contigs with abnormally high read coverage, or contigs connected to an abnormally large number of other contigs.
• This high coverage is what distinguishes a repeat contig from other contigs.
• The high coverage of repeat contigs allows algorithms to identify them through statistics that compare the coverage of each contig.
• Contigs with too much coverage are most likely due to over-collapsing of repeats and are flagged as repeat contigs, to be used later in the layout stage.
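One simple way to compare coverage across contigs is to flag anything well above the median. The sketch below is only an illustration of that idea under assumed values: the function name, the contig names, the coverage numbers and the 1.5x cutoff are all hypothetical, and real assemblers use more careful statistics.

```python
from statistics import median


def flag_repeat_contigs(coverage, factor=1.5):
    """Return the names of contigs whose coverage exceeds factor x the median coverage."""
    typical = median(coverage.values())
    return {name for name, cov in coverage.items() if cov > factor * typical}


# Hypothetical per-contig read coverage.
coverage = {"contig1": 20, "contig2": 22, "contig3": 19, "contig4": 61}
print(flag_repeat_contigs(coverage))
```

The median is used rather than the mean so that a few very high-coverage repeat contigs do not inflate the baseline they are compared against.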
![Page 33: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/33.jpg)
Consensus
• In the final stage of the OLC method, we derive the consensus sequence.
![Page 34: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/34.jpg)
![Page 35: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/35.jpg)
![Page 36: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/36.jpg)
Arrival Intervals
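Identifying Unique DNA Stretches
[Figure: arrival intervals of reads in a unique DNA unitig versus a repetitive DNA unitig, with the distributions of the statistic for unique and for repetitive unitigs. Regions of the axis are labeled Definitely Repetitive, Don't Know, and Definitely Unique; extracted axis ticks: -10, 0, +10.]
The discriminator statistic is the log-odds ratio of the probability that a unitig is unique DNA versus 2-copy DNA.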
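The slides do not give a formula for the discriminator statistic. One common way to obtain such a log-odds ratio, shown here purely as an assumption, is to model read starts as a Poisson process: a unique unitig of length L is expected to contain λ = N·L/G reads (N total reads, genome size G), while a collapsed 2-copy repeat is expected to contain 2λ. With k reads observed, the ratio of the two Poisson probabilities simplifies to ln(P_unique / P_2copy) = λ − k·ln 2. The function and parameter names below are hypothetical.

```python
import math


def log_odds_unique(unitig_length, reads_in_unitig, total_reads, genome_size):
    """Natural-log odds that a unitig is unique rather than a collapsed 2-copy repeat.

    Assumes read start positions follow a Poisson process (an assumption of this
    sketch, not something stated in the slides).
    """
    expected = total_reads * unitig_length / genome_size  # lambda for unique DNA
    return expected - reads_in_unitig * math.log(2)


# Hypothetical numbers: 10 reads expected in this unitig if it is unique.
print(log_odds_unique(1000, 10, 10_000, 1_000_000))  # observed = expected: positive
print(log_odds_unique(1000, 20, 10_000, 1_000_000))  # twice as many reads: negative
```

Positive values favor unique DNA and negative values favor a repeat; short unitigs give values near zero, which corresponds to the "Don't Know" region.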
![Page 37: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/37.jpg)
How to run Newbler?
COMMAND LINE INTERFACE
• The simplest command to run Newbler is:
• runAssembly [options] reads.sff
• This creates the assembly in an output directory called:
• P_yyyy_mm_dd_hh_min_sec_runAssembly
• where P_ = Project, followed by the date and time
• A large number of optional parameters are available for controlling and refining the assembly.
![Page 38: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/38.jpg)
Let's look at a Newbler run.
![Page 39: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/39.jpg)
Overlap between two sequences
…AGCCTAGACCTACAGGATGCGCGGACACGTAGCCAGGAC
CAGTACTTGGATGCGCTGACACGTAGCTTATCCGGT…
overlap (19 bases), overhang (6 bases)
• overlap - region of similarity between the sequences
• overhang - unaligned ends of the sequences
The assembler screens merges based on:
• length of overlap
• % identity in overlap region
• maximum overhang size
% identity = 18/19 = 94.7%
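The 18/19 figure can be checked directly from the two 19-base overlap regions in the example. The sketch below counts matching positions and then applies the three screening criteria; the function names and the threshold values in `accept_merge` are hypothetical and are not Newbler's defaults.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length aligned sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def accept_merge(overlap_a, overlap_b, overhang, min_overlap=15,
                 min_identity=90.0, max_overhang=10):
    """Screen a candidate merge on overlap length, % identity and overhang size."""
    return (len(overlap_a) >= min_overlap
            and percent_identity(overlap_a, overlap_b) >= min_identity
            and overhang <= max_overhang)


# Overlap regions taken from the two reads shown above.
top = "GGATGCGCGGACACGTAGC"
bottom = "GGATGCGCTGACACGTAGC"
print(round(percent_identity(top, bottom), 1))
print(accept_merge(top, bottom, overhang=6))
```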
![Page 40: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/40.jpg)
Newbler Command Line
• runAssembly [options] inputFile
• runMapping [options] referenceGenome inputFile
• Accepts input files in FASTA, FASTQ, and .SFF format
• -o: name of the output directory
• -qo: generate quick output for the assembly (can decrease accuracy)
![Page 41: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/41.jpg)
M19107 - Newbler Results
![Page 42: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/42.jpg)
M19501- Newbler Results
![Page 43: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/43.jpg)
MIRA3
Mimicking Intelligent Read Assembly v3.4.0
• Started in 1997 as a PhD project by Bastien Chevreux at the German Cancer Research Center; became open source in 2007
• MIRA 3 is able to perform true hybrid de-novo assemblies using reads gathered through Sanger, 454, Solexa, IonTorrent or PacBio sequencing technologies.
  o It can also perform regular (non-hybrid) de-novo assemblies using 454 data
• Overlap layout consensus algorithm
http://sourceforge.net/apps/mediawiki/mira-assembler/index.php?title=Main_Page
![Page 44: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/44.jpg)
MIRA3: Command Line Arguments (1/2) "Extracting SFF"
Convert the SFFs named M19107.sff, M19107.sff and M19107.sff
The parameters of sff_extract:
• -Q: extract to FASTQ
• -s: give the FASTQ file a name we chose
• -x: give the XML file with vector clipping information a name we chose
http://mira-assembler.sourceforge.net/docs/chap_454_part.html
![Page 45: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/45.jpg)
MIRA3: Command Line Arguments (2/2) "Begin Assembly"
Parameters:
• --project (for naming your assembly project)
• --job (perhaps to change the quality of the assembly to 'draft')
• >& creates/outputs a file named log_assembly.txt to observe assembly progress
http://mira-assembler.sourceforge.net/docs/chap_454_part.html
![Page 46: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/46.jpg)
MIRA3 Data Manipulation Integrity
Our objective is to produce the most accurate representation of a genome.
When software tools produce better results, it doesn't necessarily indicate that the genome's representation is more accurate. (and vice versa)
This makes it tricky to determine the proper tools to use.
Could be detrimental to scientific integrity.
Take precautions.
![Page 47: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/47.jpg)
MIRA3 Data Analysis
Data Quality (ideally) Increases
![Page 48: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/48.jpg)
MIRA3 Further Work.....
• Need to look at more assembly parameters in reference manual
• Finish scripts that calculate Min. Contig Length and Avg. Contig Length
• M19501 32bit error
Finish running MIRA3 on all genomes
![Page 49: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/49.jpg)
ABySS
• ABySS stands for Assembly By Short Sequences.
• It is a de novo sequence assembler designed for short reads and large genomes.
• Single-processor version: genomes up to 100 Mbp in size
• Parallel version: capable of assembling mammalian-sized genomes
• Capable of performing assemblies for both single-end and paired-end reads
• The output of ABySS is a set of contigs assembled from the short reads
![Page 50: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/50.jpg)
ABySS
![Page 51: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/51.jpg)
ABySS commands for assembly
• ABYSS -k[k-mer value] input.fastq -o output.fastq
• To run the assembly for multiple k-mer values in a loop:
for k in {20..40}; do ABYSS -k$k reads.fa -o contigs-k$k.fa; done
![Page 52: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/52.jpg)
ABySS output file
• The ABySS output file consists of the contigs generated by the assembly
• Each contig in the output file consists of 2 lines
• Line 1: >n iii jjj, where n = contig id, iii = contig length in nucleotides, jjj = absolute coverage value
• Line 2: AAAAACTAATTTCTGAAAT (contig sequence)
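The two-line record layout described above is easy to parse. The following is a minimal sketch that assumes exactly that layout (one header line with three whitespace-separated fields, then one sequence line); the function name `parse_abyss_contigs` and the coverage value in the sample record are made up for illustration.

```python
def parse_abyss_contigs(text):
    """Parse '>id length coverage' headers, each followed by one sequence line."""
    contigs = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            header = (fields[0], int(fields[1]), int(fields[2]))
        elif header is not None:
            contig_id, length, coverage = header
            contigs.append({"id": contig_id, "length": length,
                            "coverage": coverage, "sequence": line})
            header = None
    return contigs


# Hypothetical record using the example sequence from the slide.
sample = ">0 19 57\nAAAAACTAATTTCTGAAAT\n"
for contig in parse_abyss_contigs(sample):
    print(contig["id"], contig["length"], contig["coverage"])
```

Comparing the header's length field with `len(contig["sequence"])` is a cheap sanity check on the file.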
![Page 53: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/53.jpg)
ABySS output
![Page 54: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/54.jpg)
VELVET
• de Bruijn graph - based assembler
• best for high coverage very short read data sets
• leverages paired end information really well
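For readers unfamiliar with de Bruijn graphs, the core idea is that every k-mer in the reads becomes an edge from its first k-1 bases to its last k-1 bases. The toy construction below illustrates only that idea; it is not Velvet's implementation, which also handles reverse complements, coverage and error removal. The function name and the example reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two hypothetical overlapping reads, k = 3.
print(de_bruijn_graph(["ACGT", "CGTA"], 3))
```

Because identical k-mers from different reads collapse onto the same edge, the graph size depends on the genome rather than on the number of reads, which is why this approach suits high-coverage short-read data.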
![Page 55: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/55.jpg)
Commands
• velveth - performs hashing
• default k-mer value - 31
• specify input format (-fastq), read-type (-long)
![Page 56: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/56.jpg)
Commands
• velvetg - generates the graph and forms the assembly
• exp_cov (Expected coverage) - auto
![Page 57: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/57.jpg)
Results
![Page 58: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/58.jpg)
PCAP-454
• Beta version (not yet released)
• Overlap Layout Consensus assembler
• Designed for 454 paired-end data
• Computationally efficient
![Page 59: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/59.jpg)
PCAP-454 commands
• Input files: Fasta and Qual files, separately zipped
• Fofn file: Specify the name of the input file
• Run the automated script
• $ ./autopcap fofn > auto.log
![Page 60: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/60.jpg)
PCAP-454 Results
M19107

| Metrics | Raw data |
|---|---|
| No. of Contigs | 589 |
| N50 | 6574 |
| Max Contig length | 20652 |
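N50 is the contig length such that contigs of that length or longer contain at least half of the total assembled bases. The sketch below computes it, together with the other summary metrics, from a list of contig lengths; the function names and the example lengths are hypothetical and are not taken from the M19107 assembly.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def contig_summary(lengths):
    """Contig count, N50, and maximum, minimum and average contig length."""
    return {"contigs": len(lengths), "n50": n50(lengths),
            "max": max(lengths), "min": min(lengths),
            "mean": sum(lengths) / len(lengths)}


# Hypothetical contig lengths.
print(contig_summary([2, 3, 4, 5, 6]))
```

Unlike the average contig length, N50 is not dragged down by a large number of very short contigs, which makes it a more useful single number for comparing assemblers.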
![Page 61: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/61.jpg)
Results Summary
![Page 62: Genome Assembly Preliminary Results](https://reader036.vdocuments.mx/reader036/viewer/2022081505/56816140550346895dd0acbc/html5/thumbnails/62.jpg)
Work Cited
• M19107 original reads: http://edwards.sdsu.edu/cgi-bin/prinseq_beta/tmp/1327884761.html
• M19107 filtered reads: http://edwards.sdsu.edu/cgi-bin/prinseq_beta/tmp/1327950415.html
• Quality Control with PRINSEQ: prinseq.sourceforge.net/Quality_control_with_PRINSEQ.pdf (http://prinseq.sourceforge.net)