Automated Barcoding Using the Characteristic Attribute Organization System
Indra Neil Sarkar, PhD
Divisions of Invertebrate Zoology & Library Services
American Museum of Natural History
Consortium for the Barcoding of Life
Data Analysis Working Group
Muséum National d’Histoire Naturelle
July 06, 2006
© 2006 Indra Neil Sarkar, PhD CBoL DAWG 2006
Barcoding
• Identify Species
  – Recall
  – Precision
• Speed
  – Simplicity
  – Consistency
Similarity Based Methods
• BLAST
  – Database Retrieval
• Clustering Algorithms
  – Phenetics
Phenetic vs Cladistic
• Tree Topologies Are Often Different!
  – Which Is Right?
  – Does It Matter?
• Similarity Methods (Phenetic)
  – Evolution of Complete Sequences
  – FAST
• Character Methods (Cladistic)
  – Evolution of Individual Characters
  – SLOW
A Character Mindset
MLAT
MLBT
MRBT
MLCT
MRCT
MRCA
A Character Mindset
MLAT
MLBT
MRBT
MLCT
MRCT
MRCA
Characters
A Character Mindset
MLAT
MLBT
MRBT
MLCT
MRCT
MRCA
Character States
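The distinction between characters and character states can be sketched in a few lines of Python. This is only an illustration: it assumes the six toy sequences on the slide form an alignment, so that each column is one character and the letters observed in that column are its states. The function name `character_states` is hypothetical.

```python
# Toy sequences copied from the slide; treating them as an alignment is an assumption.
sequences = ["MLAT", "MLBT", "MRBT", "MLCT", "MRCT", "MRCA"]

def character_states(aligned):
    """Return, for each character (column), the sorted list of states seen in it."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) walks the alignment column by column.
    return [sorted(set(column)) for column in zip(*aligned)]

states = character_states(sequences)
for position, observed in enumerate(states, start=1):
    print(f"character {position}: states {observed}")
```

Under this reading the first character is invariant (only M), while the second, third, and fourth each show more than one state.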
CAOS
• Characteristic
  – Character States
• Attribute
  – Characters
• Organization System
• Originally Designed as a Character-Based Heuristic for Phylogenetic Classification
CAOS
• Pure (Pu)
  – ALL Members of One Group Have the Same Character State
• Private (Pr)
  – SOME Members of One Group Have the Same Character State
• Simple (s)
  – CAs with a single position
• Compound (c)
  – CAs with multiple positions
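The pure/private distinction for simple (single-position) CAs can be sketched as follows. This is a minimal illustration, not the CAOS implementation: it assumes that a state only counts as a characteristic attribute of a group when it never appears at that position outside the group, which the slide does not state explicitly. The function name `simple_attributes` and the two-group split of the toy sequences are hypothetical. Compound CAs (multiple positions) are not covered.

```python
def simple_attributes(groups):
    """Classify single-position states as pure or private for each group.

    groups: dict mapping a group name to a list of aligned sequences.
    Returns a dict: group -> list of (position, state, kind), 1-based positions.
    """
    length = len(next(iter(groups.values()))[0])
    result = {name: [] for name in groups}
    for pos in range(length):
        for name, members in groups.items():
            # States seen at this position outside the group.
            # Assumption: such states cannot be diagnostic for this group.
            outside = {seq[pos]
                       for other, seqs in groups.items() if other != name
                       for seq in seqs}
            inside = [seq[pos] for seq in members]
            for state in sorted(set(inside) - outside):
                # Pure: every member carries the state. Private: only some do.
                kind = "pure" if inside.count(state) == len(inside) else "private"
                result[name].append((pos + 1, state, kind))
    return result

# Hypothetical grouping of the toy sequences from the "Character Mindset" slide.
toy_groups = {
    "g1": ["MLAT", "MLBT", "MLCT"],
    "g2": ["MRBT", "MRCT", "MRCA"],
}
attributes = simple_attributes(toy_groups)
for group, found in attributes.items():
    print(group, found)
```

With this grouping, position 2 yields a pure attribute for each group (L for g1, R for g2), while the A at position 3 in g1 and the A at position 4 in g2 are private because only one member carries each.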
Characters vs Vectors
• Characters = Diagnostic
  – Apomorphies
• Vectors ≠ Diagnostic
  – Similarity Score
Which approach provides a consistent phylogenetic representation of data?
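One way to see the contrast is to compare what each view returns for the same pair of sequences. The sketch below is illustrative only: `similarity` and `differing_characters` are hypothetical helpers, and the simple fraction of matching positions stands in for whatever similarity score a phenetic method would actually use.

```python
def similarity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def differing_characters(a, b):
    """1-based positions and states at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

# Two of the toy sequences from the "Character Mindset" slide.
score = similarity("MLAT", "MRCA")                    # one number for the pair
differences = differing_characters("MLAT", "MRCA")    # which characters differ
print(score, differences)
```

The score collapses the comparison into a single number, whereas the character view keeps the positions and states that could serve as diagnostics.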
Mopalia Test Case
• 569 bp COI
• 19 In-Group Species
• 116 Individuals (~6/Species)
• What Happens to Classification Accuracy with Limited Sampling (e.g., 50%)?
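A limited-sampling experiment of this kind could be set up roughly as below. This is a sketch under stated assumptions, not the procedure used in the talk: the per-species random split, the function names, and the toy data are all hypothetical, and `nearest_species` is a plain fewest-mismatches stand-in rather than either the phenetic method or CAOS.

```python
import random

def split_half(individuals, seed=0):
    """Randomly split each species' individuals into two halves, A and B.

    individuals: dict mapping species name -> list of aligned sequences.
    """
    rng = random.Random(seed)
    half_a, half_b = {}, {}
    for species, seqs in individuals.items():
        shuffled = seqs[:]
        rng.shuffle(shuffled)
        cut = len(shuffled) // 2
        half_a[species] = shuffled[:cut]
        half_b[species] = shuffled[cut:]
    return half_a, half_b

def nearest_species(query, reference):
    """Stand-in classifier: species of the reference sequence with fewest mismatches."""
    candidates = (
        (sum(x != y for x, y in zip(query, seq)), species)
        for species, seqs in reference.items()
        for seq in seqs
    )
    return min(candidates)[1]

def accuracy(reference, test, classify=nearest_species):
    """Fraction of test sequences assigned to their true species."""
    total = correct = 0
    for species, seqs in test.items():
        for seq in seqs:
            total += 1
            correct += classify(seq, reference) == species
    return correct / total if total else 0.0

# Hypothetical toy data: two species, four individuals each.
toy = {
    "sp1": ["AAAA", "AAAT", "AATA", "ATAA"],
    "sp2": ["CCCC", "CCCG", "CCGC", "CGCC"],
}
half_a, half_b = split_half(toy)
a_to_b = accuracy(half_a, half_b)  # build the reference from A, classify B
b_to_a = accuracy(half_b, half_a)  # build the reference from B, classify A
print(a_to_b, b_to_a)
```

Passing a different `classify` function lets the same harness compare methods on identical splits.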
Dataset         Phenetic  CAOS
Entire Dataset  100%      100%
AB              59%       96%
BA              69%       100%
[Figure: two aligned sequences sharing a repeated atcg background. Sequence 1 carries T and sequence 2 carries A at the positions where they differ. Masking the shared positions with dashes leaves only those states, so group 1 reduces to T and group 2 reduces to A.]
On Being Ambitious...
• Inter- vs. Intra- Species Classification
• Limited Sampling Strategies
• Accuracy at the Cost of Speed
On Being BOLD...
• Diagnostics
• Primers
  – Drop-Off
  – PCR
  – TAQ Assay
• Single Molecular Sequencing Oligos
• Diagnostic-Based Query Interface (In Addition to NJ Interface)
Will the Real DNA Barcode Please Stand Up?
Acknowledgments
Rob DeSalle
Ryan P Kelly
Paul J Planet
Mark Siddall
Al Phillips
MLA Donald A.B. Lindberg Research Fellowship
National Science Foundation (IIS-0241229)
Lewis B. & Dorothy Program for Molecular Systematics