Downstream Analysis of Metagenomic Data Using R, Bayesian Statistics, or Other Methods. Mitch Fernandez, July 1st, 2014

Post on 14-Aug-2015

Brainstorming the Downstream Analysis of Metagenomic Data Using R, Bayesian Statistics, or Other Methods

Mitch Fernandez

July 1st, 2014

Workflow

• Data Preparation

• Data Preprocessing

• Data Clustering

• Downstream Analysis

Alignment

http://genome.cshlp.org/content/17/2/127/F2.expansion.html

Preprocessing pt. 2

Pre.clustering

http://www.nature.com/nmeth/journal/v9/n5/full/nmeth.1990.html

Preprocessing pt. 3

Chimeras

http://genome.cshlp.org/content/21/3/494/F1.expansion.html

Preprocessing pt. 4

Contaminants

http://www.reids-workouts.com/the-reason-you-are-not-building-muscle-part-2; http://acceleratingscience.com/proteomics/proteomic-analysis-of-mitochondria-unravels-the-pathophysiology-of-pre-eclampsia/; http://vickgaza.deviantart.com/art/Mountain-Yeti-360756260; http://www.funchap.com/pictures-of-dogs/; http://en.wikipedia.org/wiki/Chloroplast

Operational Taxonomic Units

OTU Clustering

http://rosalind.info/glossary/distance-matrix/; https://peerj.com/articles/237/
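To make the distance-matrix idea behind OTU clustering concrete, here is a minimal sketch. It is an illustration only, not necessarily the method used in this workflow: the mismatch-fraction distance, the single-linkage rule, the 3% default cutoff, and the toy reads are all assumptions chosen for brevity.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences have a gap.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances, stored as nested dicts."""
    names = list(seqs)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        dist[n][m] = dist[m][n] = p_distance(seqs[n], seqs[m])
    return dist


def cluster_otus(seqs, cutoff=0.03):
    """Single-linkage clustering: merge reads whose distance is <= cutoff."""
    dist = distance_matrix(seqs)
    parent = {n: n for n in seqs}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for n, m in combinations(seqs, 2):
        if dist[n][m] <= cutoff:
            parent[find(n)] = find(m)

    otus = {}
    for n in seqs:
        otus.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in otus.values())


# Hypothetical aligned reads, far shorter than real amplicons.
toy_reads = {
    "read1": "ACGTACGTAC",
    "read2": "ACGTACGTAT",
    "read3": "TTTTACGGGG",
}
print(cluster_otus(toy_reads, cutoff=0.15))
```

With reads this short a single mismatch is already a 10% distance, so the example uses a loose cutoff; on full-length alignments a cutoff near 3% is a common choice.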

Classifying OTUs

Richness and Diversity

• Low Diversity

• High Diversity

Equal Richness
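The contrast on this slide (equal richness, different diversity) can be reproduced with a few lines of arithmetic. The sketch below assumes the Shannon and inverse Simpson indices as the diversity measures, which the slide does not name, and the two count vectors are made-up examples.

```python
import math


def richness(counts):
    """Number of OTUs observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in sample")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p^2)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads in sample")
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical communities: five OTUs each, 100 reads each.
low_diversity = [96, 1, 1, 1, 1]        # one OTU dominates
high_diversity = [20, 20, 20, 20, 20]   # reads spread evenly

for name, counts in [("low", low_diversity), ("high", high_diversity)]:
    print(name, richness(counts), round(shannon(counts), 3),
          round(inverse_simpson(counts), 3))
```

Both communities have the same richness, but the evenly spread one scores higher on both indices because diversity also rewards evenness.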

Text Parsing

• Automate Oligos file creation

• Produce read count tables

• Parse richness and diversity results

• Parse error logs

• Split data into groups
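As an illustration of the "produce read count tables" and "split data into groups" tasks, here is a rough sketch. The input layout (tab-separated read ID, sample, taxon) and the sample names are hypothetical; the actual files and scripts used in this workflow are not shown in the slides.

```python
import csv
import io
from collections import Counter, defaultdict


def read_count_table(lines):
    """Count reads per (taxon, sample) from tab-separated assignment lines."""
    counts = defaultdict(Counter)
    samples = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        _read_id, sample, taxon = row
        counts[taxon][sample] += 1
        samples.add(sample)
    samples = sorted(samples)
    header = ["taxon"] + samples
    # A Counter returns 0 for samples in which a taxon was never seen.
    rows = [[taxon] + [counts[taxon][s] for s in samples]
            for taxon in sorted(counts)]
    return header, rows


def write_table(header, rows):
    """Render the table as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical assignments: read ID, sample (group), taxon.
assignments = [
    "r1\tS1\tNeisseria",
    "r2\tS1\tNeisseria",
    "r3\tS2\tPrevotella",
    "r4\tS2\tNeisseria",
]
header, rows = read_count_table(assignments)
print(write_table(header, rows))
```

Keeping the parsing step separate from the writing step makes it easier to reuse the counts for later normalization or plotting.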

Normalization

Visualization

Metastats

Former (n=24) vs. Active (n=22)

| OTU | Former Reads | Former Relative Abundance | Former Variance | Former Std. Error | Active Reads | Active Relative Abundance | Active Variance | Active Std. Error | Difference in Relative Abundance | p-value | q-value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Selenomonas_003 | 79 | 8.71E-04 | 1.00E-06 | 1.85E-04 | 216 | 2.03E-03 | 4.00E-06 | 4.01E-04 | -1.16E-03 | 1.10E-02 | 2.33E-02 |
| Porphyromonas_001 | 607 | 8.30E-03 | 8.50E-05 | 1.89E-03 | 331 | 3.96E-03 | 7.00E-06 | 5.68E-04 | 4.34E-03 | 3.10E-02 | 6.38E-02 |
| Family_Burkholderiaceae_001 | 96 | 1.51E-03 | 5.00E-06 | 4.40E-04 | 67 | 5.04E-04 | 0.00E+00 | 1.48E-04 | 1.00E-03 | 3.50E-02 | 7.05E-02 |
| Neisseria_001 | 4,851 | 4.58E-02 | 4.68E-03 | 1.40E-02 | 1,353 | 2.00E-02 | 2.55E-04 | 3.41E-03 | 2.57E-02 | 3.80E-02 | 7.38E-02 |
| Campylobacter_001 | 121 | 1.10E-03 | 2.00E-06 | 2.76E-04 | 115 | 2.10E-03 | 4.00E-06 | 4.26E-04 | -1.00E-03 | 5.29E-02 | 8.54E-02 |
| Rhizobium_001 | 20 | 3.42E-04 | 0.00E+00 | 1.37E-04 | 76 | 7.55E-04 | 1.00E-06 | 1.74E-04 | -4.13E-04 | 5.99E-02 | 9.21E-02 |
| Class_Gammaproteobacteria_002 | 158 | 2.03E-03 | 6.00E-06 | 5.06E-04 | 454 | 4.29E-03 | 2.90E-05 | 1.14E-03 | -2.26E-03 | 7.09E-02 | 9.93E-02 |
| Catonella_001 | 274 | 2.13E-03 | 1.10E-05 | 6.81E-04 | 65 | 9.22E-04 | 1.00E-06 | 2.00E-04 | 1.21E-03 | 7.19E-02 | 1.00E-01 |
| Family_Carnobacteriaceae_001 | 621 | 6.83E-03 | 4.30E-05 | 1.34E-03 | 283 | 4.31E-03 | 1.20E-05 | 7.30E-04 | 2.52E-03 | 1.01E-01 | 1.25E-01 |
| Prevotella_001 | 3,064 | 3.58E-02 | 5.73E-04 | 4.89E-03 | 6,061 | 5.58E-02 | 2.50E-03 | 1.07E-02 | -2.00E-02 | 1.06E-01 | 1.30E-01 |
| Paracoccus_001 | 24 | 2.76E-04 | 0.00E+00 | 1.28E-04 | 138 | 1.44E-03 | 1.40E-05 | 8.07E-04 | -1.17E-03 | 1.06E-01 | 1.30E-01 |
| Actinomyces_001 | 903 | 7.32E-03 | 4.50E-05 | 1.37E-03 | 1,055 | 1.23E-02 | 1.76E-04 | 2.83E-03 | -5.01E-03 | 1.07E-01 | 1.31E-01 |
| Prevotella_005 | 46 | 6.95E-04 | 2.00E-06 | 2.86E-04 | 328 | 3.09E-03 | 6.80E-05 | 1.76E-03 | -2.40E-03 | 1.10E-01 | 1.34E-01 |
| Family_Rhodocyclaceae_001 | 14 | 1.92E-04 | 0.00E+00 | 1.36E-04 | 87 | 1.14E-03 | 8.00E-06 | 5.94E-04 | -9.47E-04 | 1.25E-01 | 1.47E-01 |
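The columns of this table appear to be related by simple arithmetic: the difference looks like the Former mean minus the Active mean, and each standard error looks like sqrt(variance / n). That relationship is inferred from the numbers, not stated on the slide. The sketch below recomputes both quantities for three rows copied from the table; small mismatches are expected because the published values are rounded.

```python
import math

N_FORMER, N_ACTIVE = 24, 22  # group sizes from the table header

# (OTU, Former mean rel. abundance, Former variance,
#  Active mean rel. abundance, Active variance), copied from the table.
ROWS = [
    ("Porphyromonas_001", 8.30e-03, 8.50e-05, 3.96e-03, 7.00e-06),
    ("Neisseria_001", 4.58e-02, 4.68e-03, 2.00e-02, 2.55e-04),
    ("Prevotella_001", 3.58e-02, 5.73e-04, 5.58e-02, 2.50e-03),
]


def standard_error(variance, n):
    """Standard error of a group mean (assumed formula: sqrt(variance / n))."""
    return math.sqrt(variance / n)


def summarize(row):
    """Recompute the derived columns for one table row."""
    name, mean_f, var_f, mean_a, var_a = row
    return {
        "otu": name,
        "difference": mean_f - mean_a,
        "se_former": standard_error(var_f, N_FORMER),
        "se_active": standard_error(var_a, N_ACTIVE),
    }


for row in ROWS:
    s = summarize(row)
    print(f"{s['otu']}: diff={s['difference']:.2e} "
          f"se_former={s['se_former']:.2e} se_active={s['se_active']:.2e}")
```

A positive difference means the OTU is relatively more abundant in the Former group, a negative one that it is more abundant in the Active group. The p-values and q-values are not recomputed here, since they depend on the per-sample data.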

Running the Workflow

1. Gather your data

2. Prepare an Oligos file

3. Zip everything up and copy to the “work” folder

4. Run mothur.sh

5. Come back in a few hours/days

6. Run ReadCountTable.py on the taxonomy output

7. Do additional downstream processing

8. Publish results
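Step 2 is a natural candidate for automation. The sketch below writes an oligos file from a sample-to-barcode mapping. It assumes the tab-separated layout of mothur's oligos format as I understand it (a `forward` primer line followed by one `barcode` line per sample), so check it against the mothur documentation before use. The primer, barcodes, and sample names are placeholders, not values from this project.

```python
def build_oligos(forward_primer, barcodes):
    """Return oligos-file text: one primer line, then one barcode line per sample.

    barcodes maps sample name -> barcode sequence.
    """
    lines = [f"forward\t{forward_primer.upper()}"]
    seen = {}
    for sample, barcode in sorted(barcodes.items()):
        barcode = barcode.upper()
        if not barcode or set(barcode) - set("ACGT"):
            raise ValueError(f"invalid barcode for {sample}: {barcode!r}")
        if barcode in seen:
            # Two samples sharing a barcode could not be told apart later.
            raise ValueError(
                f"barcode {barcode} used for both {seen[barcode]} and {sample}")
        seen[barcode] = sample
        lines.append(f"barcode\t{barcode}\t{sample}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
oligos_text = build_oligos("ACGTACGT", {"S2": "ttgg", "S1": "AACC"})
print(oligos_text)
```

Generating the file from a single mapping avoids hand-editing mistakes such as duplicated barcodes, which the function rejects outright.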

What we need help with

• Data management

• Post-hoc OTU naming

• Improved scripting

• Identifying new tools

• Other stuff I haven’t thought of

Thanks