(Shotgun) sequencing Titus Brown
Three basic problems
[Figure: three panels labelled A, B, and C, each showing reads with some positions marked X]
Resequencing, counting, and assembly.
1. Resequencing analysis
[Figure: panels A, B, and C, each showing reads with some positions marked X]
We know a reference genome, and want to find variants (blue) in a background of errors (red).
2. Counting
We have a reference genome (or gene set) and want to know how much of each sequence we have. Think gene expression/microarrays.
[Figure: panels A, B, and C, each showing reads with some positions marked X]
3. Assembly
We don’t have a genome or any reference, and we want to construct one. (This is how all new genomes are sequenced.)
[Figure: panels A, B, and C, each showing reads with some positions marked X]
Outline
• Shotgun sequencing
• The magic of polonies, and how Illumina sequencing works
• Sequencing depth, read length, and coverage
• Paired-end sequencing and insert sizes
• Coverage bias
• Long reads: PacBio and Nanopore sequencing
Shotgun sequencing
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness
It was the best of times, it was the wor
, it was the worst of times, it was the
isdom, it was the age of foolishness
mes, it was the age of wisdom, it was th
Two specific concepts:
• First, sequencing everything at random is very much easier than sequencing a specific gene region. (For example, it will soon be easier and cheaper to shotgun-sequence all of E. coli than it is to get a single good plasmid sequence.)
• Second, if you are sequencing on a 2-D substrate (wells, or surfaces, or whatnot), then any increase in linear density (smaller wells, or better imaging) leads to a squared increase in the number of sequences yielded.
Random sampling => deep sampling needed
Typically 10-100x coverage is needed for robust recovery (about 300 Gbp of sequence for human at 100x)
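One way to see why random sampling forces deep sampling is an idealized model that is not on the slide: if reads land uniformly at random, the number of reads covering a given base is roughly Poisson-distributed with mean equal to the coverage c, so the chance a base is missed entirely is about e^(-c). The sketch below assumes that model (no bias, no errors) and an approximate 3.1 Gbp human genome; both are assumptions for illustration.

```python
import math

HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size (assumption)

def fraction_uncovered(coverage):
    """Poisson estimate of the fraction of bases hit by zero reads."""
    return math.exp(-coverage)

def bases_needed(coverage, genome_bp=HUMAN_GENOME_BP):
    """Total sequenced bases required to reach a given average coverage."""
    return coverage * genome_bp

for c in (1, 10, 30, 100):
    missed = fraction_uncovered(c) * HUMAN_GENOME_BP
    print(f"{c:>3}x: {bases_needed(c) / 1e9:6.1f} Gbp sequenced, "
          f"~{missed:.3g} bases expected uncovered")
```

Under this model, 1x coverage still leaves roughly a third of the genome unseen, which is why the depth has to be well above 1x before errors and bias are even considered.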
“Coverage”
[Figure: the (unknown) genome drawn above a stack of reads; the reads are randomly chosen and have errors (marked X)]
“Coverage” is simply the average number of reads that overlap each true base in the genome.
Here, the coverage is ~10: just draw a line straight down from the top through all of the reads.
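The definition above can be sketched in a few lines: average coverage is the total number of sequenced bases divided by the genome length, and the "line straight down" is the per-base depth at one position. The genome length, read length, and read start positions below are made-up toy values, not data from the slide.

```python
def average_coverage(read_lengths, genome_length):
    """Average coverage = total bases sequenced / genome length."""
    return sum(read_lengths) / genome_length

def per_base_depth(read_starts, read_length, genome_length):
    """Count how many reads overlap each position (0-based read starts)."""
    depth = [0] * genome_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_length)):
            depth[pos] += 1
    return depth

# Hypothetical toy data: a 20 bp genome and five 8 bp reads.
genome_length = 20
read_length = 8
read_starts = [0, 3, 6, 10, 12]

depth = per_base_depth(read_starts, read_length, genome_length)
print(depth)
print(average_coverage([read_length] * len(read_starts), genome_length))
```

The per-base list shows that an average hides variation: some positions are covered three times and the ends only once, even though the average is 2x.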
Illumina yields the deepest sequencing available
• MiSeq
  • 30 million reads per run
  • 300 base paired-end reads
• HiSeq 2500 RR/X 10
  • 6 billion reads per run
  • 150 base paired-end reads
• PacBio
  • 44,000 reads per run
  • 8500 bp in length
http://flxlexblog.wordpress.com/2014/06/11/developments-in-next-generation-sequencing-june-2014-edition/
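To compare these platforms on one scale, the numbers above can be multiplied out into bases per run. This is a rough sketch: it assumes each listed read contributes its full listed length (the slide does not say whether paired-end reads are counted singly or as pairs), and the 3.1 Gbp human genome size is an approximate figure brought in for illustration.

```python
HUMAN_GENOME_BP = 3.1e9  # approximate; an assumption, not from the slide

# (reads per run, read length in bases), as listed on the slide
platforms = {
    "MiSeq": (30e6, 300),
    "HiSeq 2500 RR/X 10": (6e9, 150),
    "PacBio": (44_000, 8500),
}

def run_yield_bp(reads, read_length):
    """Total bases per run, assuming every read has the listed length."""
    return reads * read_length

for name, (reads, length) in platforms.items():
    total = run_yield_bp(reads, length)
    print(f"{name}: {total / 1e9:.2f} Gbp per run, "
          f"~{total / HUMAN_GENOME_BP:.2f}x of a human genome")
```

Read this way, the long-read platform yields far fewer total bases per run, which is the trade-off between depth and read length that the later slides return to.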
Illumina basics
http://ted.bti.cornell.edu/cgi-bin/epigenome/method-1.cgi
(See http://seqanswers.com/forums/showthread.php?t=21 for details)
A movie of Illumina sequencing:
https://www.youtube.com/watch?v=tuD-ST5B3QA#t=61
What goes wrong with basic assumptions?
• Not all sequence is as easily sequenced as other sequence, depending on your sequencing technology (e.g. GC/AT bias);
• Some RNA may not be as accessible as others (secondary structure);
FASTQ
@895:1:1:1246:14654/1
CAGGCGCCCACCACCGTGCCCTCCAACCTGATGGT
+
][aaX__aa[`ZUZ[NONNFNNNNNO_____^RQ_
@895:1:1:1246:14654/2
ACTGGGCGTAGACGGTGTCCTCATCGGCACCAGC
+
\UJUWSSV[JQQWNP]]SZ]ZWU^]ZX][^TXR`
@895:1:1:1252:19493/1
CCGGCGTGGTTGGTGAGGTCACTGAGCTTCATGTC
+
OOOKONNNNN__`R]O[TGTRSY[IUZ]]]__X__
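Each FASTQ record above is four lines: an "@" header, the sequence, a "+" separator, and one quality character per base. A minimal reader for that simple four-line layout might look like the sketch below. It does not handle wrapped multi-line records, and decoding the qualities with an offset of 64 is an assumption based on the characters that appear in this example.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual

# First record from the slide.
example = io.StringIO(
    "@895:1:1:1246:14654/1\n"
    "CAGGCGCCCACCACCGTGCCCTCCAACCTGATGGT\n"
    "+\n"
    "][aaX__aa[`ZUZ[NONNFNNNNNO_____^RQ_\n"
)

for name, seq, qual in read_fastq(example):
    # Assumption: Phred+64 qualities (an older Illumina encoding).
    scores = [ord(ch) - 64 for ch in qual]
    print(name, len(seq), min(scores), max(scores))
```

The trailing "/1" and "/2" in the read names mark the two mates of a pair, so a paired-end workflow would match records on the name with that suffix removed.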
Read length and reconstructability
Whiteford et al., Nucleic Acids Res., 2005
“Reconstructability”
• Assembling new genomes or transcriptomes…
• Haplotyping - think human genetics & viruses, both.
Repeats! (and shared exons)
[Figure: regions A, B, C, and D connected through copies of a repeat R]
Longer reads … OR … Paired-end/mate pair sequencing
[Figure: the same diagram of A, B, C, and D around repeat R, annotated with "longer reads" and "paired ends"]
Paired-end sequencing
http://vallandingham.me/RNA_seq_differential_expression.html
Mate-pair sequencing (long insert)
Longer reads
• PacBio
• Moleculo
• Nanopore
http://www.melanieswan.com/FOLS.html
Moleculo (Illumina)
http://nextgenseek.com/2013/07/illumina-announces-moleculo-long-read-technology-and-phasing-as-service/
http://labs.mcb.harvard.edu/branton/projects-NanoporeSequencing.htm
Actual yields
• MiSeq
  • 30 million reads per run
  • 300 base paired-end reads
• HiSeq 2500 RR/X 10
  • 6 billion reads per run
  • 150 base paired-end reads
• PacBio
  • 44,000 reads per run
  • 8500 bp in length
http://flxlexblog.wordpress.com/2014/06/11/developments-in-next-generation-sequencing-june-2014-edition/
Your basic data (FASTQ)
@895:1:1:1246:14654/1
CAGGCGCCCACCACCGTGCCCTCCAACCTGATGGT
+
][aaX__aa[`ZUZ[NONNFNNNNNO_____^RQ_
@895:1:1:1246:14654/2
ACTGGGCGTAGACGGTGTCCTCATCGGCACCAGC
+
\UJUWSSV[JQQWNP]]SZ]ZWU^]ZX][^TXR`
@895:1:1:1252:19493/1
CCGGCGTGGTTGGTGAGGTCACTGAGCTTCATGTC
+
OOOKONNNNN__`R]O[TGTRSY[IUZ]]]__X__
Mapping
• Many fast & efficient computational solutions exist.
• You have to figure out how to choose parameters to maximize sensitivity/specificity, and when to validate.
U. Colorado http://genomics-course.jasondk.org/?p=395
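The sensitivity/specificity trade-off can be seen even in a deliberately naive mapper. The sketch below slides a read along a reference and reports every position within a mismatch budget; it is not how real mappers work (they use indexes and scoring models), and the reference and read are made-up toy sequences.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def map_read(read, reference, max_mismatches=0):
    """Return all start positions where the read aligns with <= max_mismatches."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        if hamming(read, window) <= max_mismatches:
            hits.append(start)
    return hits

reference = "ACGTACGTTAGCCGATAGCTAGGCTAACGTACGATAGC"  # made-up toy reference
read = "CGATAGCTTGG"  # copy of reference[12:23] with one substitution

for budget in (0, 1, 6):
    print(budget, map_read(read, reference, max_mismatches=budget))
```

With a budget of 0 the read with a single error is lost (low sensitivity); raising the budget recovers it, and raising it too far starts admitting unrelated positions (low specificity).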
Assembly
Reassemble random fragments computationally.
UMD assembly primer (cbcb.umd.edu)
Shotgun sequencing
It was the best of times, it was the wor
, it was the worst of times, it was the
isdom, it was the age of foolishness
mes, it was the age of wisdom, it was th
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness
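As a toy illustration of "reassemble random fragments computationally", the four fragments above can be stitched back together by repeatedly merging the pair with the longest suffix/prefix overlap. This greedy strategy is a simplification chosen for the example, not the method real assemblers use, and the `min_overlap` threshold is an arbitrary assumption.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_overlap=5):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags

fragments = [
    "It was the best of times, it was the wor",
    ", it was the worst of times, it was the",
    "isdom, it was the age of foolishness",
    "mes, it was the age of wisdom, it was th",
]
assembled = greedy_assemble(fragments)
print(assembled)
```

It works here because the true overlaps are longer than the repeated phrase "it was the"; with shorter fragments the repeat would produce equally good false overlaps, which is the repeat problem from the earlier slide.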
Where does # of reads count?
[Figure: panels A, B, and C, each showing reads with some positions marked X]
Resequencing, counting, and assembly.
Where does reconstructability matter?
[Figure: panels A, B, and C, each showing reads with some positions marked X]
Resequencing, counting, and assembly.
Summary
• Coverage matters for SNP calls and assembly;
• # of reads matters for counting;
• Length of reads matters for reconstructability (assembly & haplotyping);
• Illumina is still “best” for high coverage;
• PacBio and Moleculo => genome assembly;
• Nanopore: still tricky but lots of progress being made.
Bad data
I asked: https://twitter.com/ctitusbrown/status/624721875252420608
I received:
• http://www.bioinfo-core.org/index.php/Interesting_NGS_failures
• http://bioinfo-core.org/index.php/9th_Discussion-28_October_2010
• https://biomickwatson.wordpress.com/2013/01/21/ten-things-to-consider-when-choosing-an-ngs-supplier/
Sequencing Bloopers
Simon Andrews
Tim Stevens
Technical sequencer problems
Manifold burst in cycle 26
Specific cycles lost
No priming/signal (wrong adapters used)
[Figure panels: Read 1; Read 2 (barcode)]
Tile problems - Overclustering
Tile problems - Consistent tile fail
Tile problems - Transient tile fail
Incorrect Phred Scores
Found LOTS of examples of this in the SRA
“the NCBI SRA makes all its data available as standard Sanger FASTQ files (even if originally from a Solexa/Illumina machine)”
Nucleic Acids Res. 2010 Apr; 38(6): 1767–1771.
[Figure: SRR619473, counts on a log axis (1 to 1,000,000) plotted against quality-character ASCII codes 33 to 126, with the Phred33 (Sanger) and Phred64 (Illumina) ranges marked]
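A quick sanity check for this problem is to look at the range of ASCII codes in the quality strings, since the two encodings occupy different parts of that range. The function below is a heuristic sketch, not a FastQC or SRA routine: the cut-offs (59 as the lowest code that offset-64 style data can contain, 74 as the usual ceiling for recent offset-33 Illumina data) are rules of thumb and can misfire on unusual data.

```python
def guess_phred_offset(quality_strings):
    """Guess the FASTQ quality offset from the range of ASCII codes seen.

    Heuristic only: codes below 59 (';') cannot occur in offset-64 data,
    while codes above 74 ('J') are unusual for offset-33 Illumina data.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    lo, hi = min(codes), max(codes)
    if lo < 59:
        return 33
    if hi > 74:
        return 64
    return None  # ambiguous: the observed range fits both encodings

# Quality string of the first FASTQ record shown earlier in these slides.
slide_quality = "][aaX__aa[`ZUZ[NONNFNNNNNO_____^RQ_"
print(guess_phred_offset([slide_quality]))
print(guess_phred_offset(["IIII#5?"]))      # invented offset-33 style string
print(guess_phred_offset(["@ABCDEFGHIJ"]))  # invented ambiguous string
```

Decoding with the wrong offset shifts every score by 31, so a check like this is worth running before any quality-based trimming or filtering.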
Data Extraction
Wrong barcode annotation
[Figure: bar chart (axis 0 to 40) for barcodes AGTTCCG, ATGTCAG, AGTCAAC, CAGATCA, AGTCAAA, GGCTACA, ATCACGA, TGACCAA, ATGTAAG, and AGTTACG, split into expected and not expected barcodes]
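The check behind a chart like this is simple: tally the barcodes actually observed in the reads and compare them with the sample sheet. In the sketch below the barcode sequences are borrowed from the chart's labels, but which ones are "expected" and all of the counts are invented for illustration.

```python
from collections import Counter

def barcode_report(observed, expected):
    """Split observed barcode counts into expected and unexpected groups."""
    counts = Counter(observed)
    known = {bc: n for bc, n in counts.items() if bc in expected}
    unknown = {bc: n for bc, n in counts.items() if bc not in expected}
    return known, unknown

# Hypothetical sample sheet and per-read barcodes (counts are made up).
expected = {"AGTTCCG", "ATGTCAG", "AGTCAAC"}
observed = ["AGTTCCG"] * 5 + ["ATGTCAG"] * 3 + ["CAGATCA"] * 4 + ["AGTCAAA"]

known, unknown = barcode_report(observed, expected)
total = len(observed)
for bc, n in sorted(unknown.items(), key=lambda kv: -kv[1]):
    print(f"unexpected barcode {bc}: {100 * n / total:.1f}% of reads")
```

A frequent unexpected barcode usually points to an annotation mistake, whereas rare ones that differ by a single base from an expected barcode are more likely sequencing errors.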
Contaminated Barcode Stocks
[Figure: bar chart of barcode frequency (axis 0 to 20) for GCCAAT, ACTTGA, CAGATC, TTAGGC, CTTGTA, ACAGTG, GATCAG, TTAAGC, and CGGATC, split into expected and not expected barcodes]
Odd sequence composition
Read through adapter
Adapter dimer overload
Sequence    %    Possible Source
CCTAAGGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAA    9.42    Illumina Single End PCR Primer 1
TCAATGAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAA    7.30    Illumina Single End PCR Primer 1
GAGACTCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAA    5.65    Illumina Single End PCR Primer 1
gi|372098977|ref|NT_039624.8| Mus musculus chr16 GRCm38
CTGGAAGGGAGAAAAGTCCAAACATTCTGGCTCTAACTTCT
||||||||||||||||||||||||||||||||| || ||||
CTGGAAGGGAGAAAAGTCCAAACATTCTGGCTCCAAGTTCT
gi|372098992|ref|NT_039500.8| Mus musculus chr10 GRCm38
CTTTCTCTATCTGAATTATAAACAAAAGCACACAGGCCCGCTTACATTTACATGATAAAATGTGCACTTTG
|||||||||| || ||||||||||||||||||||||||||||||||   ||||||||||||||| | ||||
CTTTCTCTATATGCATTATAAACAAAAGCACACAGGCCCGCTTACAGGGACATGATAAAATGTGAAATTTG
(Single-cell Hi-C)
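The three overrepresented sequences in the table share the same adapter after a short, variable prefix, which can be checked directly. The sketch below looks for the 13 bases common to all three (`AGATCGGAAGAGC`) using exact matching only; real trimming tools also tolerate mismatches and partial adapters at the read end, so treat this as an illustration rather than a trimmer.

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # first 13 adapter bases shared by the reads above

def adapter_start(read, adapter=ADAPTER_PREFIX):
    """Index where the adapter begins in the read, or None if absent."""
    pos = read.find(adapter)
    return pos if pos != -1 else None

def trim_adapter(read, adapter=ADAPTER_PREFIX):
    """Keep only the bases before the adapter (the real insert)."""
    pos = adapter_start(read, adapter)
    return read if pos is None else read[:pos]

# The first 30 bases of the three overrepresented sequences in the table.
reads = [
    "CCTAAGGAGATCGGAAGAGCGTCGTGTAGG",
    "TCAATGAAGATCGGAAGAGCGTCGTGTAGG",
    "GAGACTCAGATCGGAAGAGCGTCGTGTAGG",
]
for read in reads:
    insert = trim_adapter(read)
    print(insert, len(insert))
```

In each case only a handful of bases remain before the adapter starts, which is what an adapter dimer looks like: almost no insert between the adapters.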
Positional Sequence Bias Application Specific - BS-Seq
Positional Sequence Biases Expected - RRBS
Also reports of a ‘Chinese CRO’ whose RRBS libraries have the MspI sites missing due to their proprietary and unexplained pre-processing
Positional Sequence Biases Unavoidable - RNA-Seq
Positional Sequence Biases Unexpected - Doubled Adapters
Overrepresented Individual Sequences
• Adapter dimers
• rRNA
• Satellite sequences
My data doesn’t map well…
Contaminated with guessable sequence
www.bioinformatics.babraham.ac.uk/projects/fastq_screen
Contaminated with guessable sequence
CRUK Multi-genome alignment system (MGA)
Contamination with unguessable sequence
>AF431889 AF431889.1 Acinetobacter lwoffii type IIs modification
Query: 1    cggtgagcaggcattagaaattgattttttagaaggtgtgttgaagaaactgggccgctt 60
            |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Sbjct: 4661 cggtgagcaggcattagaaattgattttttagaaggtgtgttgaagaaactgggtcgctt 4720
>GQ352402 GQ352402.1 Acinetobacter baumannii strain AbSK-17 plasmid
Query: 1    ggtgagcagtggtttacatggttaattgaacaagacatcaacttctgcattcgtg 55
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 8213 ggtgagcagtggtttacatggttaattgaacaagacatcaacttctgcattcgtg 8159
>AF431889 AF431889.1 Acinetobacter lwoffii type IIs modification
Query: 1    acttgctgcgattaaagcagaaaaaacacttgctgaattgagtgct 46
            ||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4484 acttgctgcgattaaagcagaaaaaacacttgctgaattgagtgct 4529
TAGC Plots
https://github.com/blaxterlab/blobology
Assemble
Filter contigs
Plot %GC vs Coverage
Sample and blast
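The "Plot %GC vs Coverage" step needs two numbers per contig. The sketch below computes them for a few contigs; it is not code from the blobology repository, and the contig names, sequences, and coverage values are invented so that one contig stands out.

```python
def gc_percent(seq):
    """Percentage of G and C among unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

# Hypothetical contigs: (name, sequence, mean read coverage).
contigs = [
    ("contig_1", "ATATTTAAGCATTAAATTGA", 52.0),
    ("contig_2", "TTAAATGCATATTTAACGAT", 48.5),
    ("contig_3", "GCGGCCGCGGGCACGCCGGC", 6.2),
]

# Each (GC%, coverage) pair is one point on a TAGC-style plot.
points = [(name, gc_percent(seq), cov) for name, seq, cov in contigs]
for name, gc, cov in points:
    print(f"{name}\tGC={gc:.1f}%\tcoverage={cov}")
```

Contigs from the target genome tend to cluster together on such a plot, so a contig with very different GC content and coverage (like the third one here) is a candidate contaminant to sample and blast.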
Reagent contamination
Molecular-biology-grade water is not the same as DNA-free water: it is heat treated, but DNA survives.
Later this week
Many different approaches to evaluating quality/mismatches:
1. Quality-score based (FastQC etc)
2. Composition based (FastQC etc)
3. Reference based (“I know what the answer should look like”)
4. Assembly-graph / k-mer based
Reference & quality-score independent approaches (k-mers)
Zhang et al., https://peerj.com/preprints/890/
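The idea behind k-mer based checks is that, at reasonable coverage, true sequence is seen many times while a sequencing error creates k-mers that are seen only once or twice. The sketch below illustrates that idea on toy reads; it is a simplified illustration, not the method from the Zhang et al. preprint, and `k` and `min_count` are arbitrary choices.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def suspicious_positions(read, counts, k, min_count=2):
    """Start positions of k-mers in the read seen fewer than min_count times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]

# Toy data: four identical reads plus one read with a single substitution.
good = "ACGTTGCATGCCGATAG"
bad = "ACGTTGCATGACGATAG"  # position 10: C -> A
reads = [good] * 4 + [bad]

counts = count_kmers(reads, k=5)
print(suspicious_positions(good, counts, k=5))
print(suspicious_positions(bad, counts, k=5))
```

The rare k-mers in the erroneous read are exactly the ones that overlap the substituted base, so the error is located without a reference and without looking at quality scores.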
…from a well known data set…
Zhang et al., https://peerj.com/preprints/890/