long read sequencing - lscc lab talk - fri 5 june 2015
![Page 1: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/1.jpg)
Long read sequencing
Torsten Seemann
VLSCI LSCC Lab Talk - Melbourne, AU - Fri 5 June 2015
The good, the bad, and the really cool.
![Page 2: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/2.jpg)
Why do we need long reads?
![Page 3: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/3.jpg)
Repeats!
![Page 4: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/4.jpg)
Long reads untangle graphs
![Page 5: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/5.jpg)
Completed genomes
![Page 6: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/6.jpg)
Phased haplotypes
![Page 7: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/7.jpg)
Structural variation
The missing heritability - not just SNPs & indels
![Page 8: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/8.jpg)
Long read instruments
![Page 9: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/9.jpg)
Pacific Biosciences RSII
2015 ARC LIEF w/ Tim Stinear
Installed this week.
Passed testing!
![Page 10: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/10.jpg)
Oxford Nanopore MinION MkI
Successor to Mk0
MinION Access Program Round 2
The up & comer!
![Page 11: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/11.jpg)
PacBio
It’s already here and it works.
![Page 12: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/12.jpg)
PacBio - the device
∷ It’s big!
∷ Three chunks
  : compute (left)
  : robotics (top)
  : sequencing (bottom)
∷ A cushion of N2 gas
![Page 13: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/13.jpg)
PacBio - technology
∷ Polymerase bound to bottom of ZMW μ-well
∷ Fluorescent nucleotide incorporation measured in real time
∷ 3 hour “movies”
![Page 14: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/14.jpg)
PacBio: read lengths
Needs careful library prep to ensure DNA is not overly fragmented!
![Page 15: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/15.jpg)
PacBio: error rate
Single read accuracy: 86%
30x consensus accuracy: 99.999%
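A rough way to see why noisy single reads can still give a very accurate consensus is a per-base majority vote. The sketch below is a toy model, not PacBio's actual consensus algorithm: it assumes errors are independent between reads and counts every tie or wrong-majority as a consensus error. The function name `consensus_accuracy` is made up for illustration.

```python
from math import comb


def consensus_accuracy(read_accuracy: float, depth: int) -> float:
    """Probability that a per-base majority vote over `depth` reads is correct.

    Assumptions (toy model): each read is wrong at this base independently
    with probability 1 - read_accuracy, and the consensus fails whenever
    the correct base does not hold a strict majority.
    """
    p_err = 1.0 - read_accuracy
    # Fewest wrong reads that deny the correct base a strict majority.
    threshold = (depth + 1) // 2
    p_fail = sum(
        comb(depth, k) * p_err**k * (1.0 - p_err) ** (depth - k)
        for k in range(threshold, depth + 1)
    )
    return 1.0 - p_fail


if __name__ == "__main__":
    print(f"{consensus_accuracy(0.86, 30):.6%}")
```

Under these assumptions, 86% reads at 30x depth come out a little above 99.999%, in line with the figures on the slide. Real errors are not perfectly independent, so treat this as intuition only.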
![Page 16: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/16.jpg)
PacBio: main applications
∷ Finished microbial genomes
∷ Full length cDNA (mRNA isoforms)
∷ Extreme GC sequence
∷ HLA / MHC / KIR haplotyping
∷ Base modifications (methylation)
![Page 17: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/17.jpg)
PacBio: bioinformatics
∷ All in GitHub
∷ SMRT Portal
  : Nice GUI
  : Cloud ready
  : Linux backend
  : Cluster ready
∷ Cmdline too!
![Page 18: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/18.jpg)
Oxford Nanopore
The new kid on the block.
![Page 19: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/19.jpg)
MinION - the device
![Page 20: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/20.jpg)
PromethION - large scale
∷ 48 separate flow cells
∷ On board ASIC
∷ Runs Python
![Page 21: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/21.jpg)
Nanopore - technology
![Page 22: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/22.jpg)
Nanopore - types of reads
“1D reads”
∷ Template 1D﹕ only fwd strand
∷ Complement 1D﹕ only rev strand
“2D reads”
∷ Normal 2D﹕ mostly fwd, some rev
∷ Full 2D﹕ most of fwd & rev﹕ these are high quality
![Page 23: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/23.jpg)
Nanopore - read lengths
Read length is not limited by technology but by library preparation.
Can get >100kbp reads.
(Plot axis: Read length)
![Page 24: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/24.jpg)
Nanopore - error rate
∷ 5-mer errors
∷ Not modelling base mods yet
∷ Basically where PacBio was a few years ago!
(Plot axis: Percent identity, aligned)
![Page 25: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/25.jpg)
MinION - applications
∷ Same as PacBio plus...
∷ Portable sequencing
  : in the field, e.g. Josh Quick in Guinea for Ebola
  : in hospitals - infection control
  : monitoring - water/food supply, production facilities
  : at the GP - pathogen test in 10 min from blood prick?
  : spit in a home device every morning?
![Page 26: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/26.jpg)
MinION - bioinformatics
∷ Event space -vs- base space
  : MinION MkI - base calling in cloud (Metrichor)
  : MinION MkII - on device?
  : PromethION - can choose on-device add-on
∷ Mostly 3rd-party tools - lots of activity
  : poretools, poRe
  : minoTour, nanoPolish
![Page 27: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/27.jpg)
Disruptive technology
Just another sequencer?
![Page 28: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/28.jpg)
“Run until”
Dynamically adjust sequencing yield
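The "run until" idea can be sketched as a loop that keeps accepting reads until a target yield is reached and then stops. This is an illustration only: `run_until`, its parameters, and the coverage-based stopping rule are assumptions, not the MinION control software.

```python
from typing import Iterable, Tuple


def run_until(
    read_lengths: Iterable[int], genome_size: int, target_coverage: float
) -> Tuple[int, int]:
    """Consume read lengths until the yield reaches the coverage target.

    Returns (reads_used, total_bases). Hypothetical helper: a real run
    would receive reads from the instrument, not from a list.
    """
    target_bases = genome_size * target_coverage
    reads_used = 0
    total_bases = 0
    for length in read_lengths:
        total_bases += length
        reads_used += 1
        if total_bases >= target_bases:
            break  # enough data: stop sequencing here
    return reads_used, total_bases


if __name__ == "__main__":
    # 100 simulated 10 kbp reads, 100 kbp genome, stop at 5x coverage.
    print(run_until([10_000] * 100, genome_size=100_000, target_coverage=5))
```

Because the input is any iterable, the same function works on a live stream, and reads that arrive after the target is reached are never consumed.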
![Page 29: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/29.jpg)
“Read until”
∷ Can access events/bases during reading
  : remember reads are long (40 kbp)
  : examine the first 100 bp, say
  : can decide to stop reading and eject molecule!
∷ This is a killer app!
  : only want pathogens? eject if human DNA
  : only want exome? eject if not exonic looking
  : controlled with Python code
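The kind of decision "read until" makes might look like the toy below: look at the start of a read and eject it if it resembles sequence you do not want. Everything here is hypothetical (`kmers`, `should_eject`, the k-mer matching rule and its thresholds). It is not the Oxford Nanopore API, and a real implementation may work on events rather than called bases.

```python
from typing import Set


def kmers(sequence: str, k: int) -> Set[str]:
    """All overlapping k-mers of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def should_eject(
    read_prefix: str, unwanted_kmers: Set[str], k: int = 8, min_hits: int = 3
) -> bool:
    """Return True if the read prefix shares at least `min_hits` k-mers
    with the unwanted set (e.g. a host genome index)."""
    hits = 0
    for i in range(len(read_prefix) - k + 1):
        if read_prefix[i:i + k] in unwanted_kmers:
            hits += 1
            if hits >= min_hits:
                return True  # looks like unwanted DNA: eject the molecule
    return False


if __name__ == "__main__":
    host = "ACGTACGTTTGACCAGTAGGCATCGATCGGATTACA"  # stand-in for host DNA
    index = kmers(host, 8)
    print(should_eject(host[5:30], index))  # prefix taken from the host
    print(should_eject("T" * 100, index))   # unrelated sequence
```

The early return matters: the decision has to be made while the molecule is still in the pore, so the check stops as soon as there is enough evidence.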
![Page 30: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/30.jpg)
VolTRAX - library prep
![Page 31: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/31.jpg)
A new business model
∷ No capital or reagent costs
  : Instrument will be free
  : Flow cells will be free
  : Only pay for what you want to sequence
  : Min. $20 and ~$1000 for a 100x human genome
∷ But I’ll scam the system!
  : Flowcell stats sent back to base
  : Won’t send you new flow cells if they look unused
![Page 32: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/32.jpg)
How will our job change?
![Page 33: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/33.jpg)
Some things never change
∷ Don’t worry!
  : 50% of our job will always be converting file formats ☺
∷ But things are improving
  : PacBio - HDF5
  : MinION - HDF5 / FAST5
∷ Can convert .h5/.hd5 to .fastq easily
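Once the base calls and quality values have been read out of the HDF5 container (for example with a library such as h5py; that step is omitted here because the internal dataset layout is not given in the talk), emitting FASTQ is just string formatting. A minimal sketch, with `to_fastq` as a made-up helper name and Phred+33 quality encoding assumed:

```python
from typing import Sequence


def to_fastq(name: str, sequence: str, phred_scores: Sequence[int]) -> str:
    """Format one read as a four-line FASTQ record (Phred+33 qualities)."""
    if len(sequence) != len(phred_scores):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33: each quality score q is stored as the character chr(q + 33).
    quality = "".join(chr(q + 33) for q in phred_scores)
    return f"@{name}\n{sequence}\n+\n{quality}\n"


if __name__ == "__main__":
    print(to_fastq("read1", "ACGT", [0, 10, 20, 30]), end="")
```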
![Page 34: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/34.jpg)
Read alignment
∷ PacBio
  : BLASR - Basic Local Alignment + Successive Refinement
  : BWA MEM - bwa mem -x pacbio
∷ MinION
  : MarginAlign - sum over possible alignments, HMMs
  : BWA MEM - bwa mem -x ont2d
∷ Need to modify variant caller parameters
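For scripting, the BWA MEM presets can be wrapped in a small helper that picks the `-x` value per platform. This is a sketch: `bwa_mem_command` and the file names are placeholders, and the nanopore preset in bwa is spelled `ont2d`.

```python
import shlex
from typing import List

# Platform name -> value passed to `bwa mem -x`.
BWA_PRESETS = {"pacbio": "pacbio", "minion": "ont2d"}


def bwa_mem_command(platform: str, reference: str, reads: str) -> List[str]:
    """Build the argv list for a long-read `bwa mem` run.

    Raises KeyError for platforms without a preset in BWA_PRESETS.
    """
    preset = BWA_PRESETS[platform.lower()]
    return ["bwa", "mem", "-x", preset, reference, reads]


if __name__ == "__main__":
    # Hypothetical file names; print the command rather than running it.
    print(shlex.join(bwa_mem_command("MinION", "ref.fa", "reads.fastq")))
```

The returned list can be passed directly to `subprocess.run`, with standard output redirected to a SAM file.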
![Page 35: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/35.jpg)
De novo assembly
∷ PacBio: HGAP, HGAP2, Falcon, SPAdes, Celera Assembler
∷ MinION: SPAdes, Celera Assembler, NanoPolish
∷ Lots of convergence
  : Similar error models (indels)
  : Long reads, lower coverage - back to the future!
![Page 36: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/36.jpg)
Streaming analysis
∷ We are not going to keep all this data
∷ Extract info we need and discard
∷ Cheaper to resequence?
∷ Need to think streaming analyses
∷ Lots of new applications
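"Extract what we need and discard" maps naturally onto generators: each read is parsed, folded into a running summary, and then dropped. A minimal sketch, assuming plain four-line FASTQ input; `fastq_records` and `streaming_summary` are illustrative names, and the chosen statistics are arbitrary.

```python
from typing import Dict, Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from four-line FASTQ records, lazily."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        sequence = next(it, "").strip()
        next(it, None)  # '+' separator line
        next(it, None)  # quality line, not needed for these statistics
        yield header.strip()[1:], sequence


def streaming_summary(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fold reads into running totals; no read is kept after it is counted."""
    reads = bases = gc = longest = 0
    for _, sequence in records:
        reads += 1
        bases += len(sequence)
        gc += sequence.count("G") + sequence.count("C")
        longest = max(longest, len(sequence))
    gc_fraction = gc / bases if bases else 0.0
    return {"reads": reads, "bases": bases, "gc_fraction": gc_fraction, "longest": longest}


if __name__ == "__main__":
    import sys
    # Usage: some_read_source | python this_script.py
    print(streaming_summary(fastq_records(sys.stdin)))
```

Memory use stays constant no matter how many reads pass through, which is the point of a streaming analysis.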
![Page 37: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/37.jpg)
Conclusion
![Page 38: Long read sequencing - LSCC lab talk - fri 5 june 2015](https://reader033.vdocuments.mx/reader033/viewer/2022052602/55b6e291bb61eb53268b477f/html5/thumbnails/38.jpg)
Exciting times!
∷ Genomics is changing all the time
  : new technologies
  : changing attributes/properties of current technology
∷ Bioinformaticians need to be able to adapt
  : focus on key skills not specific apps
∷ Pipelines are often short lived
  : except maybe clinical / accredited ones