DNA Barcoding Statistics, Rasmus Nielsen, University of Copenhagen


Post on 03-Jan-2016


Page 1

DNA Barcoding Statistics

Rasmus Nielsen

University of Copenhagen

Page 2

Statistical Approaches

Hypothesis testing problem. Test membership of specific species.

Decision theoretic/Bayesian problem. Choose assignment by weighing how desirable/undesirable false positives and false negatives are.

Species assignment and higher taxonomic assignment without population genetics.

Page 3

Approach 1: Hypothesis testing

Test $H_0$: $X \in S_i$. In the divergence model, $X \in S_i$ corresponds to $T = 0$.

Likelihood ratio test based on

$$2 \log \frac{\max_{T} L(T)}{L(0)}$$
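A minimal sketch of how such a likelihood ratio statistic could be computed. It reads T as a divergence time, which the slide does not spell out, and it swaps in a toy likelihood that is not from the talk: the number of differences `d` between the query and a reference is treated as Poisson with mean `n_sites * mu * (theta + 2 * T)`. The names `log_lik`, `lr_statistic`, `mu` and `theta` are all hypothetical.

```python
import math

def log_lik(T, d, n_sites, mu, theta):
    """Toy log-likelihood of divergence T (assumed model, not from the talk).

    The number of differences d between the query and a reference is
    treated as Poisson with mean n_sites * mu * (theta + 2 * T).
    """
    mean = n_sites * mu * (theta + 2.0 * T)
    return d * math.log(mean) - mean - math.lgamma(d + 1)

def lr_statistic(d, n_sites, mu, theta):
    """Return (T_hat, 2 * log(max_T L(T) / L(0))) with T restricted to T >= 0."""
    # The unconstrained Poisson MLE sets the mean equal to d;
    # clip at the boundary T = 0 when that would give a negative T.
    t_hat = max(0.0, (d / (n_sites * mu) - theta) / 2.0)
    stat = 2.0 * (log_lik(t_hat, d, n_sites, mu, theta)
                  - log_lik(0.0, d, n_sites, mu, theta))
    return t_hat, stat

if __name__ == "__main__":
    for d in (3, 6, 30):
        t_hat, stat = lr_statistic(d, n_sites=600, mu=0.01, theta=1.0)
        print(d, round(t_hat, 3), round(stat, 3))
```

Whenever the observed number of differences is no larger than expected at T = 0, the constrained estimate is T = 0 and the statistic is exactly zero, so membership in the species is not rejected.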

Page 4

Distribution of LR
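One way to obtain a reference distribution for a likelihood ratio statistic is to simulate it under the null hypothesis. The sketch below does that for a toy Poisson model of sequence differences; the model, and the names `lr_stat`, `poisson_draw` and `null_quantile`, are assumptions made for illustration and are not the method shown on the slide. Because T = 0 sits on the boundary of the parameter space, the standard chi-square approximation need not hold, which is one reason to simulate.

```python
import math
import random

def lr_stat(d, mean0):
    """2 * log(max_T L(T) / L(0)) for a toy Poisson model with T >= 0.

    mean0 is the expected number of differences when T = 0. Larger T can
    only increase the mean, so the statistic is 0 whenever d <= mean0.
    """
    if d <= mean0:
        return 0.0
    return 2.0 * (d * math.log(d / mean0) - (d - mean0))

def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def null_quantile(mean0, level=0.95, n_sim=20000, seed=1):
    """Simulate the statistic under H0 (T = 0) and return its `level` quantile."""
    rng = random.Random(seed)
    stats = sorted(lr_stat(poisson_draw(mean0, rng), mean0) for _ in range(n_sim))
    return stats[int(level * n_sim)]
```

An observed statistic would then be compared with `null_quantile(mean0)` rather than with a tabulated chi-square critical value.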

Page 5

Statistical Approaches

Hypothesis testing problem. Test membership of specific species.

Decision theoretic/Bayesian problem. Choose assignment by weighing how desirable/undesirable false positives and false negatives are.

Species assignment and higher taxonomic assignment without population genetics.

Page 6

Approach 2: Classical (decision theoretic) assignment approach

Base assignment on $\Pr(X \in S_i \mid D, X)$

$X$: query sequence
$S_i$: set of (mostly unobserved) sequences from species $i$
$D$: all the available DNA sequence data
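A minimal sketch of how false positives and false negatives might be weighed once the posterior probabilities are available. The two-cost loss function and the names `assign`, `cost_false_pos` and `cost_false_neg` are assumptions for illustration; the talk does not specify a loss function.

```python
def assign(posteriors, cost_false_pos, cost_false_neg):
    """Pick the most probable species, or None (unassigned) if too risky.

    posteriors: dict mapping species name -> Pr(X in S_i | D, X).
    Assigning to a species when that is wrong costs cost_false_pos;
    declining to assign when the species was right costs cost_false_neg.
    """
    best = max(posteriors, key=posteriors.get)
    p = posteriors[best]
    # Expected loss of assigning:  (1 - p) * cost_false_pos.
    # Expected loss of abstaining: p * cost_false_neg.
    # Assigning is cheaper exactly when p exceeds this threshold.
    threshold = cost_false_pos / (cost_false_pos + cost_false_neg)
    return best if p > threshold else None
```

With equal costs the threshold is 0.5; making a false positive 19 times as costly as a false negative raises it to 0.95.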

Page 7

Computation

Use MCMC under coalescence model with divergence between species and other parameters.

Calculate $\Pr(X \in S_i \mid D, X)$ from MCMC output.

Currently only implemented for two species.
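A sketch of the last computational step, assuming the sampler records, at each iteration, the species in which the query sequence was placed. That output format and the function name `assignment_probabilities` are hypothetical; the posterior probability is then estimated as a simple sample frequency.

```python
from collections import Counter

def assignment_probabilities(samples, burn_in=0):
    """Estimate Pr(X in S_i | D, X) as the share of retained MCMC samples
    in which the query was placed in species i.

    samples: sequence of species labels, one per MCMC iteration
             (a hypothetical output format).
    """
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter(kept)
    return {species: n / len(kept) for species, n in counts.items()}
```

For example, `assignment_probabilities(["A", "A", "B", "A", "A"], burn_in=1)` keeps four samples, three of which place the query in species A.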

Page 8

Skipper butterfly Astraptes fulgerator

Page 9

Skipper butterfly Astraptes fulgerator

Page 10

Why not use assignment based on marginal probabilities?

What if we used

$$\Pr(X \in S_i \mid X, D) = \frac{\Pr(X \mid X \in S_i, D)\, p(X \in S_i)}{\sum_j \Pr(X \mid X \in S_j, D)\, p(X \in S_j)}$$

i.e. we can calculate posterior probabilities by assuming independence, i.e. ignoring phylogeny.
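A sketch of the marginal calculation, using Bayes' rule over species with no phylogeny. The per-species likelihood here is an assumption made for illustration (independent sites, base frequencies estimated from each species' aligned reference sequences with a pseudocount); the talk does not say which likelihood was used. The names `site_log_lik` and `marginal_posteriors` are hypothetical.

```python
import math

def site_log_lik(query, refs, pseudo=0.5, alphabet="ACGT"):
    """log Pr(X | X in S_i, D) under an assumed independent-sites model:
    each site's base frequency is estimated from the species' aligned
    reference sequences, smoothed with a pseudocount."""
    total = 0.0
    for pos, base in enumerate(query):
        column = [r[pos] for r in refs]
        freq = (column.count(base) + pseudo) / (len(column) + pseudo * len(alphabet))
        total += math.log(freq)
    return total

def marginal_posteriors(query, references, priors=None):
    """Bayes' rule over species, ignoring the phylogeny that links them.

    references: dict species -> list of aligned sequences (same length as query).
    priors: dict species -> p(X in S_i); uniform if omitted.
    """
    species = list(references)
    if priors is None:
        priors = {s: 1.0 / len(species) for s in species}
    log_post = {s: site_log_lik(query, references[s]) + math.log(priors[s])
                for s in species}
    # Log-sum-exp keeps the normalisation stable for long sequences.
    top = max(log_post.values())
    norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {s: math.exp(v - norm) for s, v in log_post.items()}
```

Each species is scored on its own data only, which is exactly the independence assumption the slide questions.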

Page 11

Assignment error

Page 12

Approach 3: Coalescence-Shmoalescence

Assign based on monophyly with other members of species (phylogenetic criterion).

Do not estimate phylogeny but only placement of query sequence in the phylogeny.

Calculate posterior probability of assignment.

Page 13

Algorithms

BLAST to identify candidate set of species.

Possible iteration to ensure a phylogenetically diverse sample.

Align and pipe to special version of MrBayes (by J. Huelsenbeck) which maintains phylogenetic constraints.

Calculate assignment probability based on MrBayes output.
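The final step can be sketched as counting, over the sampled trees, how often the query is monophyletic with the members of a species. The tree representation below (nested tuples of leaf names, rooted) is an assumption chosen to keep the example self-contained; it is not the MrBayes output format, and parsing that output is left out. The names `clades` and `assignment_probability` are hypothetical.

```python
def clades(tree):
    """Return (leaves of tree, list of leaf sets for every internal node).

    A tree is a nested tuple of leaf-name strings, e.g. (("a", "b"), "c");
    this representation is an assumption, not the MrBayes file format.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def assignment_probability(trees, query, species_members):
    """Share of sampled trees in which the query and all members of the
    species form one clade containing no other taxa (rooted trees assumed)."""
    target = frozenset(species_members) | {query}
    hits = sum(target in clades(t)[1] for t in trees)
    return hits / len(trees)
```

If the trees are draws from the posterior, this frequency is a Monte Carlo estimate of the posterior probability of assignment under the monophyly criterion.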

Page 14

Example taxonomy summary

Page 15

[Figure: fig2]

Page 16
Page 17

Page 18

Greenland Ice Cores Example

Page 19

Greenland Ice Cores Example

Page 20

Neanderthal Example

Page 21

Acknowledgments

Misha Matz (Coalescence based methods).

Wouter Boomsma and Kasper Munch (Phylogenetic methods).

John Huelsenbeck (MrBayes).

Eske Willerslev (Ice and DNA examples).

Jody Hey (discussion and inspiration).