
Automated Barcoding Using the Characteristic Attribute Organization System

Indra Neil Sarkar, PhD
Divisions of Invertebrate Zoology & Library Services

American Museum of Natural History

Consortium for the Barcoding of Life
Data Analysis Working Group

Muséum National d’Histoire Naturelle
July 06, 2006

© 2006 Indra Neil Sarkar, PhD CBoL DAWG 2006

Ambition & Being BOLD

http://www.jaestudio.com/

Barcoding

• Identify Species
  – Recall
  – Precision

• Speed
  – Simplicity
  – Consistency
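Recall and precision are listed without definitions. Under the usual per-species definitions (recall: the share of a species' specimens that get assigned to it; precision: the share of specimens assigned to a species that truly belong to it), they can be computed as in this minimal sketch. The function name and toy labels are hypothetical.

```python
from collections import Counter

def per_species_recall_precision(true_labels, predicted_labels):
    """Return {species: (recall, precision)} for paired species assignments."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    true_counts = Counter(true_labels)       # specimens truly in each species
    pred_counts = Counter(predicted_labels)  # specimens assigned to each species
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    scores = {}
    for species in set(true_counts) | set(pred_counts):
        recall = correct[species] / true_counts[species] if true_counts[species] else 0.0
        precision = correct[species] / pred_counts[species] if pred_counts[species] else 0.0
        scores[species] = (recall, precision)
    return scores

# Hypothetical toy data: four specimens of species "A", two of species "B".
truth = ["A", "A", "A", "A", "B", "B"]
calls = ["A", "A", "A", "B", "B", "B"]
scores = per_species_recall_precision(truth, calls)
print(scores["A"])  # (0.75, 1.0)
```

One "A" specimen misassigned to "B" lowers recall for "A" and precision for "B", which is why both measures are reported.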

Similarity-Based Methods

• BLAST
  – Database Retrieval

• Clustering Algorithms
  – Phenetics

Phenetic vs Cladistic

• Tree Topologies Are Often Different!
  – Which Is Right?
  – Does It Matter?

• Similarity Methods (Phenetic)
  – Evolution of Complete Sequences
  – FAST

• Character Methods (Cladistic)
  – Evolution of Individual Characters
  – SLOW

A Character Mindset

MLAT
MLBT
MRBT
MLCT
MRCT
MRCA

A Character Mindset

MLAT
MLBT
MRBT
MLCT
MRCT
MRCA

Characters

A Character Mindset

MLAT
MLBT
MRBT
MLCT
MRCT
MRCA

Character States
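Read as an alignment, each column of the six example strings is a character, and each letter found in that column is one of its character states. A minimal sketch (the helper name is hypothetical) that lists the states observed at each character:

```python
# The six example strings from the slide, treated as aligned "sequences".
sequences = ["MLAT", "MLBT", "MRBT", "MLCT", "MRCT", "MRCA"]

def characters_and_states(seqs):
    """Map each character (0-based column index) to the sorted states seen there."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return {pos: sorted({s[pos] for s in seqs}) for pos in range(length)}

for pos, states in characters_and_states(sequences).items():
    print(f"character {pos + 1}: states {states}")
# character 1: states ['M']
# character 2: states ['L', 'R']
# character 3: states ['A', 'B', 'C']
# character 4: states ['A', 'T']
```

Character 1 is invariant and carries no grouping information; characters 2 to 4 are where the strings can be told apart.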

CAOS

• Characteristic
  – Character States

• Attribute
  – Characters

• Organization System

• Originally Designed as a Character-Based Heuristic for Phylogenetic Classification

CAOS

Figure: four groups, labeled A, B, C and D

CAOS

Pure (Pu): ALL members of one group have the same character state, and no member of another group has it

Private (Pr): SOME members of one group have the same character state, and no member of another group has it

Simple (s): CAs at a single position

Compound (c): CAs spanning multiple positions
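The pure/private distinction can be sketched for simple (single-position) CAs. This is a minimal illustration, not the CAOS implementation: it assumes a state counts as a CA only if no sequence in any other group has it at that position, it does not search for compound CAs, and the function name and toy groups are hypothetical.

```python
from collections import Counter

def simple_cas(groups):
    """Return (group, position, state, kind) for every simple CA.

    groups maps a group name to a list of aligned, equal-length sequences.
    kind is "pure" if every member of the group has the state and "private"
    if only some do; in both cases no sequence outside the group has it.
    """
    length = len(next(iter(groups.values()))[0])
    results = []
    for pos in range(length):
        for name, seqs in groups.items():
            # States found at this position in all other groups.
            outside = {s[pos] for other, oseqs in groups.items()
                       if other != name for s in oseqs}
            for state, count in sorted(Counter(s[pos] for s in seqs).items()):
                if state in outside:
                    continue  # shared with another group, so not diagnostic
                kind = "pure" if count == len(seqs) else "private"
                results.append((name, pos + 1, state, kind))
    return results

# Hypothetical toy groups of aligned sequences.
groups = {"A": ["ACGT", "ACGA"], "B": ["TCGT", "TCCT"]}
for ca in simple_cas(groups):
    print(ca)
# ('A', 1, 'A', 'pure')
# ('B', 1, 'T', 'pure')
# ('B', 3, 'C', 'private')
# ('A', 4, 'A', 'private')
```

Positions are reported 1-based. Position 2 yields nothing because every sequence shares the state C.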

CAOS Classification

Figure: a rule set is applied to an unclassified sequence to assign it to a group
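How a rule set might classify a query can be sketched as follows, under stated assumptions: each group's rules are hypothetical (position, state) pairs, a query earns one point per satisfied rule normalized by the group's rule count, and a zero score or a tie leaves the query unclassified. This scoring is an illustration only, not the actual CAOS rule engine.

```python
def classify(query, rules):
    """Assign query to the group whose diagnostic rules it satisfies best.

    rules maps a group name to a list of (position, state) pairs, with
    1-based positions. Returns None when nothing matches or the best
    score is tied (an assumed policy for this sketch).
    """
    scores = {}
    for group, group_rules in rules.items():
        if not group_rules:
            continue
        hits = sum(1 for pos, state in group_rules if query[pos - 1] == state)
        scores[group] = hits / len(group_rules)
    if not scores:
        return None
    best = max(scores.values())
    winners = [g for g, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]

# Hypothetical rules for two groups.
rules = {"A": [(1, "A")], "B": [(1, "T"), (3, "C")]}
print(classify("ACGA", rules))  # A
print(classify("TCCT", rules))  # B
print(classify("GCGT", rules))  # None
```

Returning None for unmatched queries keeps "no diagnostic evidence" distinct from a forced nearest-group call.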

Characters vs Vectors

• Characters = Diagnostic
  – Apomorphies

• Vectors ≠ Diagnostic
  – Similarity Score

Which approach provides a consistent phylogenetic representation of data?
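A toy contrast between the two views: a similarity score ranks whole sequences, while a diagnostic character looks only at specific positions, so the two can disagree. The references, the query, and the assumption that position 1 is the only diagnostic character (other differences treated as within-group variation) are all hypothetical; the sketch shows only that the calls can differ, not which one is right.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical references and diagnostic rules (position 1: A vs T).
references = {"A": "ACGTACGTAC", "B": "TCGTACGTTT"}
diagnostic = {"A": (1, "A"), "B": (1, "T")}
query = "TCGTACGTAC"

# Vector view: nearest reference by overall p-distance (0.1 to A, 0.2 to B).
nearest = min(references, key=lambda g: p_distance(query, references[g]))
# Character view: groups whose diagnostic state the query carries.
by_character = [g for g, (pos, state) in diagnostic.items() if query[pos - 1] == state]
print(nearest, by_character)  # A ['B']
```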

Mopalia Test Case

• 569 bp of COI

• 19 In-Group Species

• 116 Individuals (~6/Species)

• What Happens to Classification Accuracy with Limited Sampling (e.g., 50%)?
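The slide does not say how the 50% subsample was drawn. One plausible scheme, assumed here, is to take half of each species' specimens (at least one) as the reference set and hold out the rest for classification. The function name, seed handling, and specimen IDs are hypothetical.

```python
import random
from collections import defaultdict

def split_per_species(specimens, fraction=0.5, seed=0):
    """Split (specimen_id, species) pairs into reference and held-out sets,
    sampling `fraction` of each species (at least one) as reference."""
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for specimen_id, species in specimens:
        by_species[species].append(specimen_id)
    reference, held_out = [], []
    for species, ids in by_species.items():
        ids = ids[:]          # copy before shuffling in place
        rng.shuffle(ids)
        k = max(1, round(len(ids) * fraction))
        reference += [(i, species) for i in ids[:k]]
        held_out += [(i, species) for i in ids[k:]]
    return reference, held_out

# Hypothetical: two species with six specimens each.
specimens = [(f"sp{s}_{i}", f"species{s}") for s in range(2) for i in range(6)]
reference_set, held_out_set = split_per_species(specimens)
print(len(reference_set), len(held_out_set))  # 6 6
```

Sampling within each species keeps every species represented in the reference set, which a global 50% draw would not guarantee.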

Classification accuracy

Sampling          Phenetic   CAOS
Entire dataset    100%       100%
AB                59%        96%
BA                69%        100%

Proceeding BOLDly

Figure: two aligned sequences, 1 and 2, identical (a repeating atcg pattern) except at a series of regularly spaced positions, where sequence 1 has T and sequence 2 has A; with the shared positions masked as dashes, only these differing states remain (1: T, 2: A).
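The masked view amounts to hiding every column on which all sequences agree, leaving only the positions that vary. A minimal sketch with shortened toy sequences; the function name is hypothetical, and the exact layout of the slide's dashed rows is not reproduced.

```python
def mask_invariant(seqs, mask="-"):
    """Replace every column where all aligned sequences agree with `mask`."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # True where at least two sequences differ at that column.
    variable = [len({s[i] for s in seqs}) > 1 for i in range(length)]
    return ["".join(c if v else mask for c, v in zip(s, variable)) for s in seqs]

seq1 = "atcgatcgTTatcgatcg"
seq2 = "atcgatcgAAatcgatcg"
for row in mask_invariant([seq1, seq2]):
    print(row)
# --------TT--------
# --------AA--------
```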

On Being Ambitious...

• Inter- vs. Intraspecies Classification

• Limited Sampling Strategies

• Accuracy at the Cost of Speed

On Being BOLD...

• Diagnostics

• Primers
  – Drop-Off
  – PCR
  – Taq Assay

• Single-Molecule Sequencing Oligos

• Diagnostic-Based Query Interface
  – In Addition to the Neighbor-Joining (NJ) Interface

Will the Real DNA Barcode Please Stand Up?

Acknowledgments

Rob DeSalle, Ryan P. Kelly, Paul J. Planet, Mark Siddall

Al Phillips

MLA Donald A.B. Lindberg Research Fellowship
National Science Foundation (IIS-0241229)

Lewis B. & Dorothy Program for Molecular Systematics

Indra Neil Sarkar, [email protected]