Xin Zhou - Saturday Closing Plenary
TRANSCRIPT
![Page 1: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/1.jpg)
Taxon diversity analysis for bulk insect samples using Illumina Hi-seq platform
Xin ZHOU, Shanlin LIU, Yiyuan LI,
Qing YANG, and Xu SU
Department of Science and Technology
Environmental Genomics Research Group
BGI, China
Adelaide, Australia, 3 December 2011
![Page 2: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/2.jpg)
Opt.1: ......zzzzZZZZZ
Opt.2: morph sorting indiv. ID … Opt.1
Opt.3: morph sorting indiv. barcoding … Opt.1
Opt.4: grinding up NGS CLUSTERING/BLAST DIVERSITY!
Problem Solutions?
Zhou et al. 2011, 4th International Barcode of Life Conference
![Page 3: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/3.jpg)
Environmental barcoding of bulk insects
| Sample | Marker | Platform |
| --- | --- | --- |
| Aquatic insects | mini-barcode (130 bp) | 454 |
| Bat diet (insects) | COI fragment, 157 bp | 454 |
| Malaise trap (insects) | COI fragment, ~400 bp | 454 |

Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring, Yu D.W. et al., in review
![Page 4: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/4.jpg)
Major NGS platforms applicable in environmental barcoding

| NGS platform | Read length | Data/run (GB) | Run time | Library construction required |
| --- | --- | --- | --- | --- |
| 454 (GS FLX Titanium XL+) | ~400 bp | 0.7 | 23 hr | Yes |
| Illumina (Hi-Seq 2000) | 150 bp PE reads | 600 | 14 d | Yes |
| Illumina (Mi-Seq) | 150 bp PE reads | 2 | 27 hr | Yes |
| Ion Torrent | 200 bp | ~1 | 3.5 hr | No |

Illumina Hi-Seq:
• higher throughput
• less $ / bp
• increasing read length
• variety of bioinformatics tools available from genomic pipelines
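To compare the platforms on a common footing, the following sketch divides data per run by run time. It only re-enters the figures from the platform table on this slide; the function name `gb_per_day` is illustrative, and the Ion Torrent value uses 1.0 for the slide's "~1".

```python
# Figures copied from the platform table: (data per run in GB, run time in hours).
PLATFORMS = {
    "454 GS FLX Titanium XL+": (0.7, 23.0),
    "Illumina Hi-Seq 2000": (600.0, 14 * 24.0),
    "Illumina Mi-Seq": (2.0, 27.0),
    "Ion Torrent": (1.0, 3.5),  # the slide gives "~1"
}


def gb_per_day(data_gb: float, run_hours: float) -> float:
    """Average output per 24 hours of instrument time."""
    return data_gb / run_hours * 24.0


for name, (data_gb, run_hours) in PLATFORMS.items():
    print(f"{name}: {gb_per_day(data_gb, run_hours):.2f} GB/day")
```

This ignores library preparation time and per-run cost, so it is a rough throughput comparison only.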
![Page 5: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/5.jpg)
Sequencing capacity at BGI
• 28 Illumina GAIIx
• 137 Illumina Hi-Seq 2000
• 25 Life Tech SOLiD 4
• 16 ABI 3730XL
• 110 MegaBACEs
• 2 Illumina iScan
• 1 Roche 454
• 1 Ion Torrent
• 1 Illumina Mi-Seq
Data production:
• 100 Gb / day (2009)
• >5 Tb / day (end of 2010)
• >1500X human genome / day
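The ">1500X human genome / day" figure can be checked against the ">5 Tb / day" figure with one division. The sketch below assumes a haploid human genome of roughly 3.2 Gb; that constant is not on the slide.

```python
HUMAN_GENOME_GB = 3.2  # assumption: approximate haploid human genome size in Gb


def genome_equivalents(output_gb_per_day: float,
                       genome_gb: float = HUMAN_GENOME_GB) -> float:
    """How many genome-sized datasets the daily output corresponds to."""
    return output_gb_per_day / genome_gb


# 5 Tb/day expressed in Gb/day
print(f"{genome_equivalents(5000.0):.0f}X human genome per day")
```

Under that assumption, 5 Tb per day works out to somewhat more than 1500 genome equivalents, consistent with the slide.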
![Page 6: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/6.jpg)
What I am NOT going to talk about:
• Primer optimization
• Systematic comparisons of NGS platforms
• Quantitative diversity analysis
What I AM going to talk about:
• Can Illumina NGS be used in diversity analysis?
![Page 7: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/7.jpg)
Sequencing error rate
Read-length
Can Illumina NGS be used in diversity analysis?
![Page 8: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/8.jpg)
Recent improvement in sequencing quality using Illumina’s V3 chemistry
(even at 100 bp, only about 10% of base calls have an error rate >1%)
Sequencing error rate
• No indel issue in homopolymers
• Sequencing quality keeps increasing
• Rare nucleotide errors can be easily corrected by:
  • increasing sequencing depth
  • paired-end (PE) sequencing
  • setting stringent matching criteria in the overlapping fragment, allowing only >99% identity
PE sequencing enables forming sequence contigs (figure: two 150bp reads on an insert of 250nt)
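A minimal sketch of the overlap idea on this slide: two 150 bp reads from a 250 nt insert overlap by 150 + 150 - 250 = 50 bp, and the pair is merged only if that overlap matches at >99% identity. The function names and the `min_overlap` default are illustrative assumptions, not the pipeline used in the talk.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str,
               min_overlap: int = 20, min_identity: float = 0.99):
    """Merge a read pair into one contig, or return None if no overlap passes.

    read2 is given as sequenced (reverse strand), so it is reverse-complemented
    before the suffix of read1 is compared with its prefix.
    """
    r2 = reverse_complement(read2)
    # Try the longest candidate overlap first.
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], r2[:overlap]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / overlap > min_identity:
            return read1 + r2[overlap:]
    return None


# Simulated 250 nt insert sequenced as 150PE (toy data, not from the talk).
rng = random.Random(0)
fragment = "".join(rng.choice("ACGT") for _ in range(250))
read1 = fragment[:150]
read2 = reverse_complement(fragment[-150:])
contig = merge_pair(read1, read2)
print(len(contig), contig == fragment)
```

Note that with a 50 bp overlap, ">99% identity" leaves no room for a single mismatch (49/50 is 98%), so the criterion is effectively an exact-match requirement at this insert size.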
![Page 9: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/9.jpg)
Read length
• Read length keeps increasing
• Shotgun reads can be further assembled into longer fragments (“shotgun” assembly strategy used in genome sequencing projects)
• 150PE enables contig read of 250bp (figure: two 150bp reads on an insert of 250nt)
• Option of scaffold assembly
![Page 10: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/10.jpg)
Illumina environmental barcoding (Illumina e-barcoding)
• PCR based: full length COI
  • Lib1 (658bp, 150PE): full length COI barcode PE sequencing
  • Lib2 (200bp, 150PE): COI amplicons shotgun PE sequencing
• PCR free: full length COI without PCR bias
  • Mitochondrial shotgun PE sequencing
![Page 11: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/11.jpg)
Sample information

| | Mock | XSBN (provided by Yu et al.) |
| --- | --- | --- |
| # Specimens | 23 | 292 |
| # Haplotypes (2%) | 12 | 230 |
| PCR primers | LepF1/LepR1 | Customized |
| Sequence length | 658 bp | 700 bp |

Soup protocol: DNA extracted individually and mixed for PCR
Sequencing library details: Full length (658bp) + Shotgun library (~200bp)
Sequencing protocol: 150PE
Approach #1: PCR-based
![Page 12: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/12.jpg)
| Lib 1 | Mock | XSBN |
| --- | --- | --- |
| Raw data | 1.67G | 4.04G |
| Filtering adapter | 1.60G | 1.28G |
| High quality (Q20) | 0.35G | 0.50G |
| # Reads (primer removed) | 1,081,997 | 1,150,477 |
| # Unique reads (abundance > 1) | 36,618 | 45,444 |
Pre-analysis data filtering
Approach #1: PCR-based
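The last filtering step, "unique reads (abundance > 1)", amounts to collapsing identical reads and dropping singletons. A minimal sketch follows, assuming reads are already adapter-, quality- and primer-trimmed strings; the function name `dereplicate` and the toy reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads, min_abundance: int = 2) -> dict:
    """Collapse identical reads; keep sequences seen at least min_abundance times.

    min_abundance=2 corresponds to the slide's "abundance > 1".
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_abundance}


# Toy reads, not data from the talk.
reads = ["ACGT", "ACGT", "ACGA", "TTTT", "TTTT", "TTTT"]
unique = dereplicate(reads)
print(unique)
```

Dropping singletons removes many reads that carry a random sequencing error, at the cost of losing genuinely rare haplotypes.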
![Page 13: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/13.jpg)
OTU filtering workflow

| | Unique reads (abundance > 1) | OTU cluster (98%) | Remove chimera | Compared to reads of Lib 2 | Alignment |
| --- | --- | --- | --- | --- | --- |
| Mock | 36,618 | 784 | 490 | 119 | 44 |
| XSBN | 45,444 | 4,189 | 3,887 | 403 | 399 |
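The "OTU cluster (98%)" step can be illustrated with a greedy centroid scheme: sequences are visited from most to least abundant, and each one either joins the first existing centroid it matches at 98% identity or more, or founds a new OTU. This is a simplified sketch, not the clustering program used in the talk; it assumes sequences are already aligned to equal length, and all names are illustrative.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(abundances: dict, threshold: float = 0.98) -> dict:
    """Return {centroid sequence: summed abundance} for greedy clustering."""
    otus = {}
    # Most abundant sequences seed clusters first.
    for seq, n in sorted(abundances.items(), key=lambda kv: -kv[1]):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n  # no centroid close enough: start a new OTU
    return otus


# Toy sequences (100 bp), not data from the talk.
a = "A" * 100
b = "A" * 99 + "C"        # 99% identical to a
c = "A" * 90 + "C" * 10   # 90% identical to a
clusters = greedy_otus({a: 10, b: 3, c: 5})
print(len(clusters))
```

Seeding with abundant sequences reflects the common assumption that error-derived variants are rarer than the true haplotype they came from.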
![Page 14: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/14.jpg)
Results
Sanger reference vs. NGS OTUs (BLAST at 100% identity):
• Mock (LepF1/R1): 8 shared, 4 Sanger reference only, 36 NGS OTUs only
• XSBN (customized primers): 198 shared, 32 Sanger reference only, 197 NGS OTUs only
![Page 15: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/15.jpg)
Mock
Sanger reference vs. NGS OTUs: 8 shared, 4 Sanger reference only, 36 NGS OTUs only
• False negatives (4): not found in raw data (likely due to primer failure)
• “False positives”? (36): 31 can be found in our total sample, from which our mock samples were assembled; 5 are likely to be PCR errors
![Page 16: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/16.jpg)
XSBN
Sanger reference vs. NGS OTUs: 198 shared, 32 Sanger reference only, 197 NGS OTUs only
• Sanger reference only (32): 17 not found in raw data (primer failure); 15 were lost in data filtering
• NGS OTUs only (197): cross-sample contamination?
(Figure: mean + SE, group1 vs. group2)
![Page 17: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/17.jpg)
XSBN, Sanger reference vs. NGS OTUs:
• Before: 198 shared, 32 Sanger reference only, 197 NGS OTUs only
• After removal of sequences with abundance <10: 181 shared, 49 Sanger reference only, 84 NGS OTUs only
Significantly fewer false positives
Slight drop in true positives
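The trade-off on this slide (fewer false positives, slightly fewer true positives) is what an abundance cut-off does to the overlap between NGS OTUs and the Sanger reference. The sketch below shows the bookkeeping on hypothetical toy data; OTU names, counts, and the function name are all illustrative.

```python
def compare_to_reference(otus: dict, reference: set, min_abundance: int = 1) -> dict:
    """Count shared / reference-only / NGS-only OTUs after an abundance cut-off."""
    kept = {name for name, n in otus.items() if n >= min_abundance}
    return {
        "shared": len(kept & reference),          # true positives
        "reference_only": len(reference - kept),  # false negatives
        "ngs_only": len(kept - reference),        # candidate false positives
    }


# Hypothetical toy data: OTU name -> read abundance.
otus = {"sp1": 120, "sp2": 45, "sp3": 8, "artifact1": 3, "artifact2": 15}
reference = {"sp1", "sp2", "sp3", "sp4"}

before = compare_to_reference(otus, reference)
after = compare_to_reference(otus, reference, min_abundance=10)
print(before)
print(after)
```

In the toy example the cut-off removes one artifact but also the low-abundance true species `sp3`, mirroring the pattern reported on the slide.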
![Page 18: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/18.jpg)
What’s next?
Obtaining full-length barcodes via shotgun read assembly
(new program in development – “SOAPbarcode”)
New algorithm to filter out false positive OTUs
Approach #1: PCR-based
Illumina e-barcoding
![Page 19: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/19.jpg)
Approach #2: PCR-free method
Individual barcoding
Total MT isolation & DNA extraction
Shotgun sequencing
Reference-based method
Reference-independent method
![Page 20: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/20.jpg)
Building reference library: individual barcoding
1. 89 individuals;
2. 84 reference barcodes;
3. 39 OTUs (2%);

| Taxon group | # OTUs |
| --- | --- |
| Lepidoptera | 25 |
| Diptera | 7 |
| Hemiptera | 4 |
| Hymenoptera | 2 |
| Psocoptera | 1 |
| Total | 39 |
![Page 21: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/21.jpg)
Total MT isolation & DNA extraction
Sample mixture → Total MT isolation → MT DNA extraction
![Page 22: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/22.jpg)
Shotgun sequencing
Insert size: 200bp; Read length: 100bp PE;

| | Percentage of base pairs |
| --- | --- |
| Q20 (sequencing error rate < 1%) | 96.2% |
| Q30 (sequencing error rate < 0.1%) | 92.9% |
| GC content | 38.0% |
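The Q20 and Q30 thresholds follow from the standard Phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. A short sketch of the conversion (function names are illustrative):

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (20, 30):
    print(f"Q{q}: error probability {phred_to_error(q):.4f}")
```

So Q20 corresponds to a 1% error probability and Q30 to 0.1%, matching the thresholds quoted in the table.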
![Page 23: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/23.jpg)
Pre-analysis
Raw data: 2.45G
After filtering: 2.20G
Ratio of high quality reads: 89.91%
Data filtering:
1. Adaptor contamination removal;
2. Quality control: in each read, only allowing <10bp with seq. error rate >1%
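The quality-control rule can be sketched as a per-read check on the FASTQ quality string: count bases with Phred score below 20 (error rate above 1%) and keep the read only if there are fewer than 10. The quality offset of 33 is an assumption; older Illumina pipelines used 64, so it is exposed as a parameter. Function names are illustrative.

```python
def low_quality_bases(quality: str, min_q: int = 20, offset: int = 33) -> int:
    """Number of bases whose Phred score is below min_q (error rate > 1% for Q20)."""
    return sum((ord(ch) - offset) < min_q for ch in quality)


def passes_filter(quality: str, max_low: int = 10,
                  min_q: int = 20, offset: int = 33) -> bool:
    """Keep a read only if it has fewer than max_low low-quality bases."""
    return low_quality_bases(quality, min_q, offset) < max_low


# Toy quality strings for 100 bp reads: 'I' encodes Q40 and '#' encodes Q2 at offset 33.
good = "I" * 91 + "#" * 9    # 9 low-quality bases
bad = "I" * 90 + "#" * 10    # 10 low-quality bases
print(passes_filter(good), passes_filter(bad))
```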
![Page 24: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/24.jpg)
Method 1: Reference based
Blast reads to reference barcodes; confident identification is made only when:
1. Best BLAST hit >98% identity;
2. Reference coverage > 90%;

| Taxon group | # OTUs |
| --- | --- |
| Lepidoptera | 20 |
| Diptera | 2 |
| Hemiptera | 3 |
| Psocoptera | 1 |
| Total | 26 |
| Not found | 13 |

Figure: reads mapped to Reference 1 (correct mapping, coverage: 100%) and Reference 2 (incorrect mapping, coverage: 30%)
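The two acceptance criteria on this slide can be sketched as a filter over BLAST hits followed by a per-reference coverage calculation. The sketch assumes each hit is already the best hit for its read and is given as a tuple `(reference_id, percent_identity, ref_start, ref_end)` with 1-based inclusive coordinates; that tuple layout and the function names are assumptions for illustration.

```python
from collections import defaultdict


def covered_fraction(intervals, ref_length: int) -> float:
    """Fraction of reference positions covered by 1-based inclusive intervals."""
    covered, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / ref_length


def detected_references(hits, ref_lengths: dict,
                        min_identity: float = 98.0,
                        min_coverage: float = 0.90) -> set:
    """References supported by >98% identity hits covering >90% of their length."""
    by_ref = defaultdict(list)
    for ref_id, pct_identity, start, end in hits:
        if pct_identity > min_identity:
            by_ref[ref_id].append((min(start, end), max(start, end)))
    return {ref for ref, ivs in by_ref.items()
            if covered_fraction(ivs, ref_lengths[ref]) > min_coverage}


# Toy hits against two 658 bp reference barcodes (not data from the talk).
ref_lengths = {"ref1": 658, "ref2": 658}
hits = [
    ("ref1", 100.0, 1, 100), ("ref1", 99.0, 90, 400), ("ref1", 99.5, 380, 658),
    ("ref2", 99.0, 1, 200), ("ref2", 95.0, 200, 658),
]
print(detected_references(hits, ref_lengths))
```

In the toy data, `ref2` has high-identity hits over only about 30% of its length, so it is rejected, which is the situation the coverage criterion is meant to catch.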
Approach #2: PCR-free method
![Page 25: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/25.jpg)
Potential sources of failure in detecting taxa
Taxon specific, or biomass (size & number)?
![Page 26: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/26.jpg)
Taxon bias?
Failures in taxon detection

| Taxon group | # Total OTUs | # OTUs missing |
| --- | --- | --- |
| Lepidoptera | 25 | 5 |
| Diptera | 7 | 5 |
| Hymenoptera | 2 | 2 |
| Hemiptera | 4 | 1 |
| Psocoptera | 1 | 0 |
| Total | 39 | 13 |
![Page 27: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/27.jpg)
OR biomass (body size, # individuals)?
Failures in taxon detection
Readily detected: average length > 5mm
Missing: average length < 5mm
![Page 28: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/28.jpg)
Method 2: Reference independent
(Will we be able to identify diversity without reference MT genomes for the targeted species?)
Workflow:
1. Assembly of COI gene using genome assembly program (SOAPdenovo);
2. Annotation using ~240 MT genomes downloaded from GenBank;
Approach #2: PCR-free method
![Page 29: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/29.jpg)
PCR-free reference-independent: results
• 23/31 falling in standard COI barcode region (mostly >600 bp);
• 1 of 23 is not in our reference barcodes (Insecta; Lepidoptera; Pyralidae);
• Multiple genes obtained simultaneously;
• 1 nearly complete mitochondrial genome (~15k bp);
• 3 fragments >6000 bp;
![Page 30: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/30.jpg)
Reference independent
• 23/31 falling in standard COI barcode region (mostly >600 bp);
• 1 of 23 was not present in our reference barcodes (Insecta; Lepidoptera; Pyralidae);

Number of individuals we collected: 89 individuals (5 individuals failed in Sanger sequencing)
Barcode references: 39 OTUs (84 individuals)
Reference based: 26 OTUs
Reference independent: 23 OTUs

3 OTUs not detected in reference independent method because:
(1) sequencing depth is too low (<10X) to allow for reliable assembly
(2) relatively small body-size
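The "<10X" threshold refers to mean sequencing depth, which for a target region is roughly (number of reads × read length) / target length. A minimal sketch using the 100 bp read length from the shotgun library and a 658 bp barcode as the target; the read counts are hypothetical and the function names are illustrative.

```python
import math


def mean_depth(n_reads: int, read_length: int, target_length: int) -> float:
    """Average number of times each target base is covered."""
    return n_reads * read_length / target_length


def reads_for_depth(depth: float, read_length: int, target_length: int) -> int:
    """Smallest number of reads that reaches the requested mean depth."""
    return math.ceil(depth * target_length / read_length)


# Hypothetical read counts landing on one taxon's 658 bp COI barcode.
for n in (50, 100):
    print(f"{n} reads -> {mean_depth(n, 100, 658):.1f}X")
print(reads_for_depth(10, 100, 658), "reads needed for 10X")
```

Because reads are shared among all taxa in proportion to their mitochondrial DNA in the mixture, small-bodied taxa are the first to fall below such a threshold, which ties the two reasons on this slide together.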
![Page 31: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/31.jpg)
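PCR-free method
Multiple MT genes obtained simultaneously

| Gene | Number |
| --- | --- |
| ATP6 | 29 |
| ATP8 | 4 |
| COX1 | 31 |
| COX2 | 33 |
| COX3 | 31 |
| CYTB | 31 |
| ND1 | 35 |
| ND2 | 34 |
| ND3 | 24 |
| ND4 | 30 |
| ND4L | 16 |
| ND5 | 30 |
| ND6 | 24 |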
![Page 32: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/32.jpg)
PCR-free method
• 1 nearly complete mitochondrial genome (~15k bp);
• 3 fragments longer than 6k bp;
(Figure label: Barcode region)
![Page 33: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/33.jpg)
What’s next?
Approach #2: PCR-free method
Currently:
• MT DNA 5-10% after isolation;
• Non-target DNA affects MT assembly (e.g., bacteria & genomic DNA);
• Taxonomic/biomass bias
Potential solutions:
1. Wet-lab protocol optimization
  • Pre-sorting insects by body-size
  • Alternative MT isolation methods
2. Increase sequencing depth
![Page 34: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/34.jpg)
Conclusions
• Illumina Hi-Seq delivers performance comparable to other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost;
• Deep sequencing capacity enables a novel PCR-free approach, which may eventually solve biases caused by DNA amplification;
• It shares issues with other NGS platforms (non-quantitative, inflation of OTUs, etc.);
• Methodology optimization is much needed in many details of the pipeline;
• Collaborative and synergistic efforts by the community would greatly advance progress.
![Page 35: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/35.jpg)
Acknowledgements
Douglas W. Yu, Kunming Institute of Zoology, Chinese Academy of Sciences
Mehrdad Hajibabaei, Shadi Shokralla, University of Guelph
Owain Edwards, CSIRO Ecosystem Sciences
LU Jianliang, WU Qiong, AN Sainan, ZHOU Yizhuang, ZHAO Jing
Collaborators:
Funder:
![Page 36: Xin Zhou - Saturday Closing Plenary](https://reader035.vdocuments.mx/reader035/viewer/2022062704/5562550ed8b42a1b4b8b508c/html5/thumbnails/36.jpg)
Thanks for your attention!