Practical Bioinformatics for Life Scientists Week 12, Lecture 24 István Albert Bioinformatics Consulting Center Penn State


Page 1

Practical Bioinformatics for Life Scientists

Week 12, Lecture 24

István Albert

Bioinformatics Consulting Center, Penn State

Page 2

Midterm project report: ReadSeq data format conversion

Note: not all conversions are valid!

Usually you can only go from a complex format to a simpler one.
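To make the one-way nature concrete, here is a minimal Python sketch (not ReadSeq itself) that converts FASTQ to FASTA. It assumes unwrapped FASTQ with exactly four lines per record. Going in this direction simply drops the quality line, so the original FASTQ cannot be rebuilt from the FASTA output.

```python
import io
from typing import Iterator, TextIO


def fastq_to_fasta(fastq: TextIO) -> Iterator[str]:
    """Yield one FASTA record per FASTQ record (assumes 4 lines per record)."""
    while True:
        header = fastq.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = fastq.readline().rstrip("\n")
        plus = fastq.readline().rstrip("\n")
        qual = fastq.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        # Only the name and the bases survive; the quality string is dropped,
        # so this FASTA output cannot be turned back into the original FASTQ.
        yield f">{header[1:]}\n{seq}\n"


if __name__ == "__main__":
    reads = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n#II5\n")
    print("".join(fastq_to_fasta(reads)), end="")
```

For a real file you would pass an open file handle instead of the in-memory example; wrapped (multi-line) FASTQ would need a proper parser.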

Page 3

BWA vs Bowtie 1 vs Bowtie 2

• Testing on simulated reads and realistic ones

• For both sensitivity (mapping rate) and specificity (correct alignments)

• We’ll keep a score along the way

• We all need to remember: both tools are triumphs of human ingenuity!

[Figure: panels labeled bowtie2 and bwa]
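Sensitivity in the sense used here (mapping rate) can be read straight from the SAM FLAG field, where bit 0x4 marks an unmapped read. The sketch below is a minimal Python version, assuming plain-text SAM input; it skips secondary (0x100) and supplementary (0x800) records so that each read is counted once. Specificity (correct alignments) additionally needs the true origin of each read, which is what simulated reads provide.

```python
def mapping_rate(sam_lines):
    """Fraction of reads that have a mapped primary alignment in SAM text."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        flag = int(line.split("\t")[1])
        if flag & (0x100 | 0x800):
            continue  # secondary/supplementary records: count each read once
        total += 1
        if not (flag & 0x4):  # 0x4 means "segment unmapped"
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    example = [
        "@SQ\tSN:chr1\tLN:5000",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(f"mapping rate: {mapping_rate(example):.0%}")  # 1 of 2 reads mapped
```

Since the function accepts any iterable of lines, an open SAM file handle can be passed to it directly.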

Page 4

Short Read Archive

Command-line toolkit (the SRA Toolkit) to convert from the SRA format to FASTQ, SFF, and csfasta formats

Page 5

Page 6

Get wgsim from GitHub

Clone the wgsim repository from GitHub

Compile wgsim

Page 7

Install the bowties

Cannot install on Cygwin! Mac and Linux only.

Page 8

Building indices for each tool
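Each aligner needs its own index. As a reminder of the three index-building commands, here is a small Python helper that only assembles and prints them (a dry run); the file names are hypothetical. bwa index writes its files next to the FASTA (for example genome.fa.bwt, genome.fa.sa), while bowtie-build writes .ebwt files and bowtie2-build writes .bt2 files under the given prefix, so both Bowtie indices can share one prefix.

```python
import shlex


def index_commands(reference, prefix):
    """Index-building command for each aligner, as argument lists."""
    return {
        "bwa": ["bwa", "index", reference],               # index files named after the FASTA
        "bowtie1": ["bowtie-build", reference, prefix],   # writes prefix.*.ebwt
        "bowtie2": ["bowtie2-build", reference, prefix],  # writes prefix.*.bt2
    }


if __name__ == "__main__":
    # Hypothetical paths; print the commands instead of running them.
    for tool, argv in index_commands("refs/genome.fa", "refs/genome").items():
        print(f"{tool}: {shlex.join(argv)}")
```

Passing one of these lists to subprocess.run would actually build the index, provided the tools are installed.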

Page 9

Running the aligners
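A matching dry-run sketch for the alignment step, assuming paired-end input (wgsim writes two FASTQ files) and hypothetical file names. We assume bwa mem here; older BWA workflows use bwa aln followed by bwa sampe instead. The point of interest is where each tool sends its SAM output.

```python
import shlex


def align_commands(reference, prefix, read1, read2):
    """Shell commands that align paired-end reads with each tool and write SAM.

    `reference` is the FASTA that bwa indexed; `prefix` is the Bowtie index prefix.
    """
    return {
        # bwa mem prints SAM to standard output, so redirect it to a file
        "bwa": shlex.join(["bwa", "mem", reference, read1, read2]) + " > bwa.sam",
        # bowtie 1 needs -S for SAM output; the output file comes last
        "bowtie1": shlex.join(["bowtie", "-S", prefix, "-1", read1, "-2", read2, "bowtie1.sam"]),
        # bowtie2 always writes SAM; -S names the output file
        "bowtie2": shlex.join(["bowtie2", "-x", prefix, "-1", read1, "-2", read2, "-S", "bowtie2.sam"]),
    }


if __name__ == "__main__":
    # Hypothetical file names
    cmds = align_commands("refs/genome.fa", "refs/genome", "reads_1.fq", "reads_2.fq")
    for tool, cmd in cmds.items():
        print(f"{tool}: {cmd}")
```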

Page 10

wgsim evaluation
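Simulated reads make correctness checkable because wgsim records where each read came from in its name. The Python sketch below assumes the name starts with contig_left_right_ (the source contig plus the 1-based outer coordinates of the simulated fragment). A forward-strand read should then start near left, and a reverse-strand read should end near right. The 20-base tolerance is an arbitrary choice, and this is a simplified illustration of the idea, not the code of wgsim_eval.pl.

```python
import re

# Assumed wgsim naming convention: contig_left_right_<error counts>_<index>
WGSIM_NAME = re.compile(r"^(\S+)_(\d+)_(\d+)_")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Reference bases covered by an alignment (M, D, N, =, X consume the reference)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def is_correct(sam_line, tolerance=20):
    """True/False for a mapped wgsim read, None if the read is unmapped."""
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, cigar = int(fields[3]), fields[5]
    if flag & 0x4 or rname == "*":
        return None
    match = WGSIM_NAME.match(qname)
    if match is None:
        raise ValueError(f"not a wgsim-style read name: {qname}")
    contig, left, right = match.group(1), int(match.group(2)), int(match.group(3))
    if rname != contig:
        return False  # mapped to the wrong sequence
    if flag & 0x10:
        # reverse strand: the last aligned base should sit near the fragment's right end
        end = pos + reference_span(cigar) - 1
        return abs(end - right) <= tolerance
    # forward strand: the first aligned base should sit near the fragment's left end
    return abs(pos - left) <= tolerance


if __name__ == "__main__":
    name = "chr1_1001_1500_0:0:0_0:0:0_0"  # hypothetical read name
    fwd = f"{name}\t99\tchr1\t1001\t60\t70M\t=\t1431\t500\t*\t*"
    rev = f"{name}\t147\tchr1\t1431\t60\t70M\t=\t1001\t-500\t*\t*"
    off = f"{name}\t99\tchr1\t4000\t3\t70M\t=\t4431\t500\t*\t*"
    print(is_correct(fwd), is_correct(rev), is_correct(off))  # True True False
```

Soft clips shift POS away from the true start, which is one reason a small tolerance is used rather than an exact match.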

Page 11

Quality reports

get alignment reports

Page 12

Redirect standard error to output

get alignment reports
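Aligners such as bowtie2 print their alignment summary on standard error, so the report has to be captured separately from the SAM records. The Python sketch below mirrors the shell form `cmd > out.sam 2> report.txt` and then pulls the percentage out of a bowtie2-style "overall alignment rate" line. The demo uses a tiny Python program as a stand-in aligner so it runs without bowtie2 installed; all file names are hypothetical.

```python
import os
import re
import subprocess
import sys
import tempfile


def run_with_report(cmd, sam_path, report_path):
    """Run a command like `cmd > sam_path 2> report_path` in the shell."""
    with open(sam_path, "w") as sam, open(report_path, "w") as report:
        # stdout (the SAM records) and stderr (the summary) go to separate files
        subprocess.run(cmd, stdout=sam, stderr=report, check=True)


def overall_alignment_rate(report_text):
    """Percentage from a bowtie2-style 'overall alignment rate' line, or None."""
    match = re.search(r"([\d.]+)% overall alignment rate", report_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Stand-in "aligner": writes one SAM header line to stdout and a summary to stderr
    fake_aligner = [
        sys.executable, "-c",
        "import sys; print('@HD'); "
        "print('90.00% overall alignment rate', file=sys.stderr)",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        sam_file = os.path.join(tmp, "out.sam")
        report_file = os.path.join(tmp, "report.txt")
        run_with_report(fake_aligner, sam_file, report_file)
        with open(report_file) as fh:
            print(overall_alignment_rate(fh.read()))  # 90.0
```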

Page 13

An R program to plot it
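The lecture does the plotting in R. As a language-neutral illustration of the kind of numbers such a plot can be drawn from, the Python sketch below tabulates, for a series of mapping quality (MAPQ) cutoffs, how many mapped reads are kept and how many of those are placed incorrectly. The cutoffs and the (mapq, is_correct) input pairs are assumptions; this is neither the lecture's R program nor necessarily the columns wgsim_eval.pl reports.

```python
def mapq_threshold_table(results, cutoffs=range(60, -1, -10)):
    """Rows of (cutoff, reads kept, reads kept but wrong) for each MAPQ cutoff."""
    results = list(results)  # accept any iterable of (mapq, is_correct) pairs
    table = []
    for cutoff in cutoffs:
        kept = [ok for mapq, ok in results if mapq >= cutoff]
        wrong = sum(1 for ok in kept if not ok)
        table.append((cutoff, len(kept), wrong))
    return table


if __name__ == "__main__":
    example = [(60, True), (60, True), (30, False), (0, False), (0, True)]  # made-up data
    for cutoff, kept, wrong in mapq_threshold_table(example):
        print(cutoff, kept, wrong)
```

Lowering the cutoff keeps more reads but typically admits more misplaced ones; plotting "wrong" against "kept" shows that trade-off for each aligner.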

Page 14

[Figure: plots for bwa, bowtie2 and bowtie1]

Page 15

Homework 24

• Install R (and optionally RStudio); we will be using them during the next two lectures

• Install and then run the wgsim_eval.pl script on one of your SAM files produced from reads simulated with wgsim

• Inspect the CIGAR strings of reads that were classified incorrectly and see if there is a pattern to them
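One way to start looking for a pattern is to tally which CIGAR operations (insertions I, deletions D, soft clips S, and so on) show up among the incorrectly placed reads. This is a minimal Python sketch; the helper names are hypothetical and the CIGAR strings in the demo are made up, so you would feed it the CIGAR column (field 6) of the reads your evaluation flagged as wrong.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar):
    """Total length of each operation in a CIGAR string, e.g. '5S65M' -> S:5, M:65."""
    ops = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        ops[op] += int(length)
    return ops


def op_frequency(cigars):
    """How many of the given reads contain each CIGAR operation at least once."""
    counts = Counter()
    for cigar in cigars:
        counts.update(cigar_ops(cigar).keys())
    return counts


if __name__ == "__main__":
    # Made-up CIGAR strings standing in for incorrectly placed reads
    wrong = ["70M", "30M2I38M", "5S65M", "40M3D30M", "12S58M"]
    for op, n in op_frequency(wrong).most_common():
        print(f"{op}: {n} of {len(wrong)} reads")
```

Running the same tally on the correctly placed reads gives a baseline, so an operation that is much more common among the wrong ones stands out.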