Rasmus Group Presentation

Upload: coppuca

Post on 30-May-2018

  • 8/14/2019 Rasmus Group Presentation

    1/39

    A phylogeographic distance metric for

    Project leader: Dr Rasmus Hovmoller

    Group members:

    John Christensen, Shishi Luo, Jacob Porter,

    2/39

    Talk outline

    Project background and goals

    Theory and calculations

    Results

    Conclusions

    3/39

    Influenza

    H5N1: Endemic in bird population

    Bird-to-human transmission possible

    H3N2: Seasonal flu

    Human-to-human transmission only

    4/39

    Influenza

    Wild aquatic birds, reservoir for all influenza subtypes

    Domestic fowl, e.g. ...

    Humans, e.g. H3N2

    5/39

    Global distribution of H5N1 and H3N2

    6/39

    Goal

    To compare and contrast the patterns of geographic spread of two types of influenza.

    Calculate a statistic for the correlation between the distance between virus isolates in the phylogenetic tree (patristic distance) and their actual geographic distance.

    7/39

    Goal

    Create datasets

    Make trees

    Calculate patristic distances between all pairs of sequences

    Collect geographic metadata

    Calculate geographic distances between all pairs of sequences

    Calculate correlation coefficient and determine significance

    8/39

    Databases and software

    Genetic data from GenBank

    Phylogenetic trees generated in RAxML, TNT, MrBayes

    Interactive Tree of Life (iTOL) to visualize trees

    Excel, Unix to manipulate data

    Patristic distances calculated in Matlab

    Geographic data (longitude, latitude) from GenBank data

    9/39

    Types of phylogenetic trees

    Parsimony tree is the tree that requires the least evolutionary change to explain the given data

    Maximum likelihood tree is the tree which has the maximum likelihood over all possible topologies under the specified evolution model

    10/39

    Types of phylogenetic trees

    Parsimony

    Optimality criterion: search for the simplest tree

    Equal branch lengths

    Doesn't work for horizontal gene transfer

    Doesn't show genes lost during the evolution process
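One common way to count the "evolutionary change" a fixed tree requires is Fitch's small-parsimony algorithm. The sketch below is an illustration only: the slides do not say which scoring method was used, and the nested-tuple tree encoding and the function name are assumptions made for this example.

```python
def fitch_score(tree, states):
    """Return (candidate_states, changes) for one site on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (hypothetical encoding)
    states -- dict mapping each leaf name to its observed character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical site pattern for four isolates.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
grouped = (("A", "B"), ("C", "D"))   # like with like
mixed = (("A", "C"), ("B", "D"))     # unlike isolates paired
grouped_changes = fitch_score(grouped, site)[1]
mixed_changes = fitch_score(mixed, site)[1]
```

Under parsimony the tree with fewer changes is preferred, so the grouped topology would win for this site.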

    11/39

    Types of phylogenetic trees

    Maximum Likelihood

    - Evolution is characterized by a continuous-time Markov chain

    - Evolution model is a substitution rate matrix

    - Branch lengths show genetic distances

    - Doesn't work with big datasets
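As a concrete instance of a substitution rate matrix, the Jukes-Cantor (JC69) model gives every substitution the same rate, so the transition probabilities of the Markov chain have a closed form. JC69 is chosen here purely for illustration; the slides do not state which model was specified, and the function name and parameters are assumptions.

```python
import math


def jc69_transition(alpha, t):
    """Transition probabilities under JC69 after time t.

    alpha -- rate of each specific substitution (off-diagonal entry of the
             rate matrix; each diagonal entry is -3 * alpha)
    Returns (p_same, p_diff): the probability that a base is unchanged, and
    the probability that it has become one particular other base.
    """
    decay = math.exp(-4.0 * alpha * t)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


p_same, p_diff = jc69_transition(alpha=1.0, t=0.1)
row_sum = p_same + 3 * p_diff          # one row of the transition matrix
long_run = jc69_transition(alpha=1.0, t=50.0)
```

Each row of the transition matrix sums to 1, and for long branches every base becomes equally likely, which is why long branches carry little information.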

    12/39

    Types of phylogenetic trees

    Improving maximum likelihood (mixed model):

    - Applying maximum likelihood on reasonable randomized parsimony starting trees

    - Using loop-level parallelism in the likelihood functions

    13/39

    H3N2 ML vs. parsimony

    14/39

    H5N1 ML vs. parsimony

    15/39

    Calculating correlation

    [Figure: two panels titled "Patristic distance" and "Geographic distance", with labels B and C; a note begins "(Assume all branch ..."]
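The patristic distance between two isolates is the sum of branch lengths along the path that joins them in the tree. The sketch below computes it on a toy tree with every branch length set to 1; the edge-list encoding, the node names, and the unit lengths are assumptions made for this example (the group reports doing this step in Matlab).

```python
def patristic_distance(edges, start, goal):
    """Sum of branch lengths on the unique path between two nodes of a tree.

    edges -- list of (node_a, node_b, branch_length) tuples (hypothetical encoding)
    """
    neighbours = {}
    for a, b, length in edges:
        neighbours.setdefault(a, []).append((b, length))
        neighbours.setdefault(b, []).append((a, length))
    # Depth-first walk; a tree has no cycles, so skipping the parent is enough.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, length in neighbours.get(node, []):
            if nxt != parent:
                stack.append((nxt, node, dist + length))
    raise ValueError("nodes are not connected")


# Toy tree ((A,B),C) with every branch length set to 1.
toy_edges = [
    ("root", "x", 1.0),
    ("x", "A", 1.0),
    ("x", "B", 1.0),
    ("root", "C", 1.0),
]
d_ab = patristic_distance(toy_edges, "A", "B")
d_bc = patristic_distance(toy_edges, "B", "C")
```

Sister tips A and B are two branches apart, while B and C are separated by three branches because the path passes through the root.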

    16/39

    Geographic distance

    d = cos^-1( ... )

    d is the shortest distance between two points along the surface of the sphere.

    One angle corresponds to latitude; the other corresponds to longitude and is the angle between the prime meridian and a ...
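A standard inverse-cosine formula for the shortest distance along a sphere is the spherical law of cosines. The sketch below uses it as a plausible reading of the slide's d = cos^-1 expression; the exact form on the slide is not recoverable, and the Earth radius, function name, and degree-based inputs are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed, the slide gives no radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lon))
    # Guard against rounding pushing the value just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


# Two points on the equator, a quarter of the way around the globe.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

For very close points the haversine form is numerically more stable, but for distances between countries the version above is adequate.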

    17/39

    18/39

    Testing the significance of

    GCC does not have the same null-hypothesis distribution as the usual Pearson's correlation coefficient, r

    Use permutation distribution instead

    Since the data set is large, used a random sample of permutations

    2774! > 10^8000

    1646! > ...
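The size of n! for the two sample sizes on this slide can be checked without building the huge integers, by taking the log-gamma function. This is a small sanity check added for illustration; the helper name is an assumption.

```python
import math


def log10_factorial(n):
    """log10(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)


digits_2774 = log10_factorial(2774)   # exponent of 10 for 2774!
digits_1646 = log10_factorial(1646)   # exponent of 10 for 1646!
```

Both exponents run to thousands of digits, so enumerating every permutation is out of the question and sampling permutations is the practical option.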

    19/39

    Testing the significance of

    H0: no significant correlation

    H1: significant positive correlation

    Reject H0 for sufficiently small p-value.

    P-value: proportion of permutation GCC values > observed GCC
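The p-value rule on this slide amounts to a simple count. The sketch below assumes the permuted statistics have already been computed; the function name and the numbers are hypothetical and are not results from the project.

```python
def one_sided_p_value(observed, permuted):
    """Fraction of permuted statistics strictly greater than the observed one."""
    if not permuted:
        raise ValueError("need at least one permuted statistic")
    exceed = sum(1 for value in permuted if value > observed)
    return exceed / len(permuted)


# Hypothetical values, for illustration only.
example_p = one_sided_p_value(
    0.42, [0.10, 0.55, 0.30, 0.41, 0.47, 0.05, 0.20, 0.39]
)
```

With a random sample of permutations this is an estimate of the exact p-value, and its precision is limited by the number of permutations drawn.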

    20/39

    Small n: easy to calculate exact p-value

    Medium n: computationally intensive to ...

    Large n: p-value should be estimated

    [Figure: three panels, one per case, each plotting frequency f against GCC with the observed GCC marked]

    21/39

    P Value Computation:

    Non-standardized GCC

    Permuted non-standardized GCC

    Phillip Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses. 3rd Edition. Springer, 2005.

    Critchlow, et al. Some Statistical Methods for Phylogenetic Trees with Application to HIV Disease. Mathematical and Computer Modelling 32

    22/39

    P Value Computation: R

    # X, Y are the matrices to correlate
    # perm is a permutation of row indices computed with the R sample function
    # n is the number of rows of the matrix
    covPerm
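The comments above describe permuting the rows and columns of one matrix against the other. The Python sketch below illustrates that idea under a loud assumption: it takes the non-standardized statistic to be the sum of elementwise products of the two distance matrices (a Mantel-type cross-product), which the transcript does not confirm. The function names are hypothetical and this is not the group's R code.

```python
import random


def cross_product(X, Y, perm):
    """Sum of X[i][j] * Y[perm[i]][perm[j]] over all pairs i != j."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += X[i][j] * Y[perm[i]][perm[j]]
    return total


def permuted_statistics(X, Y, n_perm=999, seed=0):
    """Return the observed statistic and a list of n_perm permuted ones."""
    rng = random.Random(seed)
    n = len(X)
    identity = list(range(n))
    observed = cross_product(X, Y, identity)
    permuted = []
    for _ in range(n_perm):
        perm = rng.sample(identity, n)   # random permutation of the row indices
        permuted.append(cross_product(X, Y, perm))
    return observed, permuted


# Toy symmetric distance matrix correlated with itself.
D = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
observed, permuted = permuted_statistics(D, D, n_perm=20)
```

Permuting rows and columns together keeps each matrix a valid distance matrix, which is why the row indices rather than the individual entries are shuffled.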

    23/39

    P Value Computation:

    24/39

    Asia Scatter

    25/39

    Asia Scatter

    26/39

    H3N2 phylogeny colored by

    27/39

    H5N1 Phylogeny with Geographic Location

    28/39

    H5N1 Phylogeny with Geographic Location

    29/39

    H3N2 Phylogeny with Geographic Location

    30/39

    H3N2 Phylogeny with Geographic Location

    31/39

    Conclusions

    32/39

    1. The Asian subset of data

    H5N1

    Conclusion:

    There is no significant relationship between patristic distance and geographical distance.

    Explanation:

    (1) It's bird flu. It's much easier for birds to migrate among countries of Asia.

    (2) The H5N1 strain is fast-mutating.

    33/39

    H3N2

    Conclusion:

    There is some relationship between the two kinds of distance.

    Explanation:

    It's a human influenza virus. The migration of humans is not frequent among countries of Asia.

    34/39

    2. The global set of data

    Confusing

    H5N1

    Maximum Likelihood tree: no relationship.

    Parsimony tree: some relationship.

    H3N2

    Maximum Likelihood tree: no ...

    35/39

    3. The European subset of data

    H5N1: no significant relationship

    H3N2: some relationship

    The result of the European subset of data is ...

    36/39

    37/39

    Tree algorithms:

    1) Parsimony tree: set all branch lengths equal to 1.

    2) Maximum likelihood tree: It's computationally intensive, so we had to stop it before finding the best tree.

    Hypothesis Test: We sampled a very small proportion of the ...

    38/39

    Conclusion

    Our results are consistent in small data sets:

    H5N1: no significant relationship

    H3N2: some relationship

    Thus, they are persuasive.

    On the other hand, the result for the global data set is confusing; we need to do further research.

    39/39

    Questions?