Presented at the 8th European HIV Drug Resistance Workshop, March 17-19 2010, Sorrento, Italy
Future applications of full length virus genome sequencing
Paul Kellam, Virus Genomics
Revisiting early HIV resistance ideas
Nature 1993
AIDS 1991
Virus genome sequencing
Population or single genome
Whole genome sequencing
Primer-walking with M13 adaptors, capillary sequencing
Population biology
The consensus sequence
The treatment
The minority species
2nd generation: 454 sequencing
Throughput: 500 Mb/run (GS FLX)
1 M reads/run
Read length: ~500 bp
HIV 2nd generation sequencing
Drug resistance mutations
Drug resistance: population structure
~400 bp (including V3)
Fragmented (nebulised 454 library)
Direct indexing for 454
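Direct indexing lets several samples share one 454 run: each sample's amplicons carry a short multiplex identifier (MID) tag, and reads are sorted back to their samples by that tag after sequencing. A step named splitFastq_by_MID appears in the pipeline slide later in the talk, but its internals are not shown, so the following is only a minimal sketch of the idea, not that script. It assumes the tag sits at the very start of each read, that any adapter/key sequence has already been removed, and that only exact tag matches are accepted. The tag sequences and sample names are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical 10-nt MID tags mapped to hypothetical sample names (illustration only).
MIDS: Dict[str, str] = {
    "ACGAGTGCGT": "sample_01",
    "ACGCTCGACA": "sample_02",
}


def demultiplex(
    reads: Iterable[Tuple[str, str]], mids: Dict[str, str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Bin (name, sequence) reads by an exact MID match at the 5' end and trim the tag off."""
    bins: Dict[str, List[Tuple[str, str]]] = {sample: [] for sample in mids.values()}
    bins["unassigned"] = []
    for name, seq in reads:
        for tag, sample in mids.items():
            if seq.startswith(tag):
                bins[sample].append((name, seq[len(tag):]))  # keep only the insert
                break
        else:  # no tag matched this read
            bins["unassigned"].append((name, seq))
    return bins


reads = [
    ("r1", "ACGAGTGCGTTTGGA"),
    ("r2", "ACGCTCGACAGGCC"),
    ("r3", "GGGGGGGGGGAAAA"),
]
bins = demultiplex(reads, MIDS)
for sample, binned in bins.items():
    print(sample, len(binned))
```

Real demultiplexers usually tolerate one or two mismatches in the tag; exact matching keeps the sketch short at the cost of discarding reads with tag errors.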
Compound errors in 2nd generation sequencing
Process:
cDNA synthesis error rate
Library representation
PCR error rate for clusters
Sequencing error rates
Technical:
Sampling efficiency
Robustness of process
Cost-effectiveness
Multiplexing/logistics
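The process errors listed above compound: a base in the final read is correct only if cDNA synthesis, PCR and sequencing all got it right. Assuming each stage introduces errors independently (a simplification), the combined per-base error rate is 1 - prod(1 - e_i) over the stage rates e_i. The minimal sketch below computes that; the stage names and rates are hypothetical placeholders, not measured values from the talk.

```python
from math import prod
from typing import Iterable


def combined_error_rate(stage_error_rates: Iterable[float]) -> float:
    """P(at least one of several independent stages corrupts a given base)."""
    return 1.0 - prod(1.0 - e for e in stage_error_rates)


# Hypothetical per-base error rates, for illustration only.
stages = {
    "cDNA synthesis": 1e-4,
    "PCR amplification": 1e-4,
    "sequencing": 1e-3,
}
rate = combined_error_rate(stages.values())
print(f"combined per-base error rate: {rate:.6f}")
```

Because the sequencing term usually dominates, the combined rate sits only slightly above it; the point is that upstream cDNA and PCR errors are indistinguishable from true minority variants once they are in the read.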
Coverage and errors in 454 sequences
Wang et al., Genome Res. 2007, 17: 1195-1201
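One way to see why coverage and error rate must be considered together: at high depth, random sequencing errors alone can put a particular wrong base on enough reads to look like a minority variant. The sketch below uses a simple binomial model, not the method of Wang et al.; it assumes independent errors spread evenly over the three wrong bases, and the depth, error rate and read-count threshold in the usage line are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_upper_tail(n: int, p: float, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k_min <= 0:
        return 1.0
    if k_min > n:
        return 0.0
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for k in range(k_min, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += exp(log_pmf)  # underflows harmlessly to 0.0 far into the tail
    return min(total, 1.0)


def false_support_probability(depth: int, error_rate: float, min_reads: int) -> float:
    """Chance that errors alone yield >= min_reads reads carrying one specific wrong base.

    Assumption: each error is equally likely to be any of the 3 non-reference bases.
    """
    return binom_upper_tail(depth, error_rate / 3.0, min_reads)


# Hypothetical example: 1,000x depth, 1% per-base error, variant called at >= 10 reads (1%).
print(f"{false_support_probability(1000, 0.01, 10):.2e}")
```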
2nd generation data
@IL29_4275:7:1:1031:2292#7/1
ATCATCTTCCTCACGACGTTTGCCAATTTAGCCTTCTTCTCNTCCCCTCCGACT
+
BCCC@B?BCCBCCC@CCC:CCACCC;;>>>?C4CC>>@BB@&7=A>?<A2=BCC
@IL29_4275:7:1:1054:12506#7/1
ATAATGGATAAAACCATCATATTGAAAGCAAACTTCAGTGTGATTTTTGACCGG
+
CCCBC?BCCCCACCCCCCCCCCCCCCCACCC?BCCCCCC?CCCCCCCCC>CCCC
@IL29_4275:7:1:1060:16244#7/1
ATATTCTGGAGCAATGAAATTTCCATTACTCTCGAAGTTGATTGCATCATTCGG
+
BCBCCBCBCBCCCCCACCCCCCCBCC@CC=CBABCC;CCBBC@=;C?CBCCCC#
@IL29_4275:7:1:1061:2394#7/1
ATTTGGCGTCAAGCGAACAATGGAGAGGACGCAACTGCTGGTCTTACCCACCTG
+
=BBBBBBB@B>?B6BB=B>BBBBB?>A<A65:???8-4;;.:>BBABB/AABB5
@IL29_4275:7:1:1077:4877#7/1
CCTGATGTGTATTTCTTGGTTATGGCCATCTGGTCCACAGTGGTTTTTGTTAGT
+
ADA>2????:>BBAB>BBBA?.?<?BBBB1?????BAB9A89;0:??>B;;8B6
Etc., up to ~1-3 million reads. A "?" in the quality line means the base call is 99.9% accurate (Phred 30).
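Each FASTQ record above is four lines: "@" plus the read name, the bases, a "+" separator, and one quality character per base. A minimal reader is sketched below. It assumes strict four-line records (no wrapped sequence lines) and Phred+33 quality encoding, which is consistent with characters such as "#" and "&" appearing in these quality strings; it is an illustration, not the parser used in the talk's pipeline.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a stream of strict four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


example = """@IL29_4275:7:1:1031:2292#7/1
ATCATCTTCCTCACGACGTTTGCCAATTTAGCCTTCTTCTCNTCCCCTCCGACT
+
BCCC@B?BCCBCCC@CCC:CCACCC;;>>>?C4CC>>@BB@&7=A>?<A2=BCC
"""

for name, seq, qual in read_fastq(io.StringIO(example)):
    scores = [ord(ch) - 33 for ch in qual]  # Phred+33 decoding (assumed offset)
    print(name, len(seq), f"mean Q = {sum(scores) / len(scores):.1f}")
```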
Phred
$Q_{\mathrm{PHRED}} = -10 \log_{10}(P_e)$, where $P_e$ is the probability that the base is called wrongly
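The Phred formula converts an error probability into a score and can be inverted as P_e = 10^(-Q/10). The small sketch below implements both directions and prints the error probability and accuracy for the scores tabulated in the next slide; the function names are illustrative only.

```python
import math


def phred_from_error(p_error: float) -> float:
    """Q = -10 * log10(P_e)."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Inverse of the Phred formula: P_e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


for q in (10, 20, 30, 40, 50):
    p = error_from_phred(q)
    print(f"Q{q}: P_e = {p:g}, accuracy = {100 * (1 - p):.3f}%")
```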
Phred scores
Phred quality score   Probability of incorrect base call   Base-call accuracy   ASCII character (Phred+33)
10                    1 in 10                              90%                  +
20                    1 in 100                             99%                  5
30                    1 in 1,000                           99.9%                ?
40                    1 in 10,000                          99.99%               I
50                    1 in 100,000                         99.999%              S
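The ASCII column follows the Phred+33 convention: each score Q is stored as the character with code Q + 33, so "?" (ASCII 63) is Q30. Older Illumina pipeline versions wrote Phred+64 instead, so the offset is left as a parameter in this minimal sketch; the function names are illustrative.

```python
from typing import List


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string into Phred scores (Q = ASCII code - offset)."""
    return [ord(ch) - offset for ch in qual]


def encode_quality(scores: List[int], offset: int = 33) -> str:
    """Inverse of decode_quality: Phred scores back to quality characters."""
    return "".join(chr(q + offset) for q in scores)


print(decode_quality("+5?IS"))                # [10, 20, 30, 40, 50]
print(encode_quality([10, 20, 30, 40, 50]))   # +5?IS
```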
[Figure: analysis pipeline diagram. Components: splitFastq_by_MID, 454FastaFromSFF, assemblyPipeline_splitQA.sh, assemblyPipeline_QCQA.sh, Fastq_QC, Fastq_QA, SSAHA2 (assemblyPipeline_SSAHAmap.sh), SAMTools, pileupConsensus. Legend: Script, FASTQ, SFF, JPEG, SAM/BAM.]
1/48th of a 454 plate (1/12th of a ¼ plate)
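Combining the ~1 M reads/run and ~500 bp read length quoted for the 454 GS FLX with a 1/48th-plate allocation gives a rough per-sample yield. The sketch below assumes ideal conditions (every read usable, perfectly uniform coverage) and a ~9.7 kb HIV-1 genome; these are simplifying assumptions, so the result is an optimistic upper-bound estimate rather than an expected depth.

```python
READS_PER_PLATE = 1_000_000  # ~1 M reads/run, as quoted for 454
READ_LENGTH_BP = 500         # ~500 bp reads, as quoted for 454
HIV1_GENOME_BP = 9_700       # assumption: approximate HIV-1 genome length (~9.7 kb)


def expected_mean_coverage(plate_fraction: float, genome_bp: int,
                           reads_per_plate: int = READS_PER_PLATE,
                           read_len: int = READ_LENGTH_BP) -> float:
    """Idealised mean depth: (reads allocated x read length) / genome length."""
    reads = reads_per_plate * plate_fraction
    return reads * read_len / genome_bp


cov = expected_mean_coverage(1 / 48, HIV1_GENOME_BP)
print(f"~{READS_PER_PLATE / 48:,.0f} reads, ~{cov:,.0f}x mean coverage per sample")
```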
Phred 25
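The slide gives only the number Phred 25. If it is read as a quality cutoff (an assumption), one simple way to apply it is to cut each read at the first base scoring below Q25, as sketched below; sliding-window trimming or a mean-quality filter are common alternatives. Phred+33 encoding is assumed, and the example uses the first read from the FASTQ slide.

```python
from typing import Tuple


def trim_at_quality(seq: str, qual: str, min_q: int = 25,
                    offset: int = 33) -> Tuple[str, str]:
    """Keep bases up to (not including) the first base whose Phred score is below min_q."""
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            return seq[:i], qual[:i]
    return seq, qual


seq = "ATCATCTTCCTCACGACGTTTGCCAATTTAGCCTTCTTCTCNTCCCCTCCGACT"
qual = "BCCC@B?BCCBCCC@CCC:CCACCC;;>>>?C4CC>>@BB@&7=A>?<A2=BCC"
trimmed_seq, trimmed_qual = trim_at_quality(seq, qual)
print(len(seq), "->", len(trimmed_seq))  # 54 -> 32
```

Here the first failing base is the "4" (Q19) at position 33, so the read is cut to 32 bases even though the ":" (exactly Q25) earlier in the read passes.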
Sequencing whole virus genomes
Bluetongue Virus
Varicella Zoster Virus
Influenza Virus
2nd generation: Illumina (Solexa) sequencing
Images from
Fragment DNA and add adapters
Bind fragments to flow cell
Bridge amplification
Denature to return to single-stranded DNA
2nd generation: Illumina (Solexa) sequencing
Images from
Produces millions of DNA clusters
Add all 4 labelled terminators
Laser excitation causes fluorescence, which is photographed
Repeat these sequencing cycles
Illumina Influenza H5N1: Pre- and Post-Quality Filtering
454-Illumina Coverage Comparison
[Figure: coverage across the influenza genome segments PB2, PB1, PA, HA, NP, NA, M1/M2, NS1/NS2]
Dataset   Platform   Total reads   Mean read length (bp)   Ref. coverage (%)   Min. coverage   Max. coverage
557H5N1   454        15,214        425.81                  100                 259             1,928
557H5N1   Illumina   1,669,501     53.96                   100                 7,013           61,435
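The table does not say how its coverage columns were computed. As a hedged illustration, the sketch below derives reference coverage (%), minimum, maximum and mean depth from read alignment intervals using a difference array; it assumes 0-based, end-exclusive (start, end) reference coordinates per mapped read. In practice such numbers would more likely come from a tool such as samtools depth run on the SAM/BAM output.

```python
from typing import Dict, Iterable, Tuple


def coverage_stats(ref_len: int,
                   alignments: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Summarise per-position depth from (start, end) intervals, 0-based and end-exclusive."""
    if ref_len <= 0:
        raise ValueError("reference length must be positive")
    diff = [0] * (ref_len + 1)  # +1 at each read start, -1 just past each read end
    for start, end in alignments:
        start, end = max(0, start), min(ref_len, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(ref_len):
        running += diff[i]
        depth.append(running)
    covered = sum(1 for d in depth if d > 0)
    return {
        "ref_coverage_pct": 100.0 * covered / ref_len,
        "min_coverage": min(depth),
        "max_coverage": max(depth),
        "mean_coverage": sum(depth) / ref_len,
    }


print(coverage_stats(10, [(0, 5), (3, 10)]))
```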
Comparison of platforms' consensus sequences
Reference position   Called base (454)   Called base (Illumina)
5615                 A                   R (A or G)
5621                 C                   Y (C or T)
5624                 T                   W (A or T)
6900                 G                   S (C or G)
8472                 A                   W (A or T)
8477                 G                   R (A or G)
8715                 A                   G (difference)
8937                 T                   K (G or T)
9623                 G                   K (G or T)
12575                T                   K (G or T)
13111                T                   K (G or T)
13280                A                   W (A or T)
12 differences / 13,500 bp genome = 0.089%
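Most Illumina calls in this table are two-base IUPAC ambiguity codes, which is what a consensus caller emits when a second base is present at a substantial frequency. The talk does not say how pileupConsensus decides; the sketch below shows one simple rule under a hypothetical 20% minority-frequency threshold, using the standard IUPAC codes R, Y, W, S, K and M.

```python
from typing import Dict

# Standard IUPAC nucleotide codes for two-base mixtures.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(counts: Dict[str, int], min_minor_freq: float = 0.2) -> str:
    """Majority base, or a two-base IUPAC code if the runner-up reaches min_minor_freq.

    min_minor_freq = 0.2 is a hypothetical threshold, not a value from the talk.
    """
    counts = {b: c for b, c in counts.items() if b in {"A", "C", "G", "T"} and c > 0}
    total = sum(counts.values())
    if total == 0:
        return "N"
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top_base = ranked[0][0]
    if len(ranked) > 1:
        second_base, second_count = ranked[1]
        if second_count / total >= min_minor_freq:
            return IUPAC_TWO_BASE[frozenset(top_base + second_base)]
    return top_base


print(call_base({"A": 700, "G": 300}), call_base({"A": 950, "G": 50}))
```

With deep Illumina coverage, even a modest minority population crosses such a threshold, while the shallower 454 data at the same position may not; that is one plausible reason the two platforms' consensus calls diverge at these sites.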
Consensus and population structure
[Figure: panels for Patients 1-8; axis: Frequency]
The problem of linkage
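Linkage is a problem because two mutations can only be phased (shown to sit on the same genome) by reads that span both positions. The sketch below counts such reads from alignment intervals, assuming 0-based, end-exclusive (start, end) coordinates; the positions are hypothetical. For mutations ~520 nt apart (for instance, the distance between RT codons 41 and 215), neither ~54 bp Illumina reads nor ~400 bp 454 reads can span both.

```python
from typing import Iterable, Tuple


def linking_reads(alignments: Iterable[Tuple[int, int]], pos_a: int, pos_b: int) -> int:
    """Count reads whose aligned interval [start, end) covers both positions."""
    lo, hi = min(pos_a, pos_b), max(pos_a, pos_b)
    return sum(1 for start, end in alignments if start <= lo and hi < end)


# Hypothetical alignments and positions for illustration.
reads = [(0, 100), (50, 700), (10, 20)]
print(linking_reads(reads, 60, 600))
```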
Diversity and phenotypic potential
Kellam & Larder, J. Virol. 1995, 69(2): 669
Quasispecies: haplotype reconstruction
Eriksson et al., PLoS Comput. Biol. May 2008, 4(5): e1000074
Conclusions
• Move towards many more consensus whole genomes and abstraction of population structure
• Drive down/control/filter for sequencing errors
• 3rd-generation platforms (expected by the end of 2010) will produce longer reads
• Learn more from ecologists
• Consider genome-to-infectivity ratios
Virus Genomics Team http://www.sanger.ac.uk/Teams/Team146/
Rachael Chiam, Simon Watson, Greg Baillie, Anne Palser, Astrid Gall
HIV: Myra McClure, Deenan Pillay
Influenza: James Wood, Maria Zambon
BTV: Massimo Palmarini & Peter Mertens
VZV: Judy Breuer