ASaiM: An intuitive and adjustable pipeline to process metatranscriptomic data from
intestinal microbiota
Bérénice Batut, Clémence Defois, Céline Ribière, Cyrielle Gasc, Jean-François Brugère, Eric Peyretaillade, CPER consortium Environnement
Digestif, Pierre Peyret
ASaiM: An environment to analyze intestinal microbiota
Demo with analysis of gut metatranscriptomic sequences
Why ASaiM?
Gut metagenomic projects
NCBI 2318
ENA 1103
DDBJ 28
MG-Rast 46
Camera 3
Total 3508
But:
  Difficult to query those databases
  Information is not standardized
Available tools for metagenomic and metatranscriptomic sequence processing
But:
  Almost nothing for metatranscriptomic sequences
  Difficult to use
  Not adjustable
  Only one step in sequence processing and analysis
Check out the code!
Download the source code
Move to the demo directory
$ git clone https://github.com/ASaiM/ASaiM.git
$ cd ASaiM/demo
$ ls
config_file.json R2_sequences.fastq R1_sequences.fastq
(Really) easy pipeline execution
Install requirements: Docker, Docker-compose, make
Execute the pipeline
$ cd demo/
$ make -f ../Makefile run_pipeline
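Before running the pipeline, it can help to confirm that the three requirements are installed. The following is a minimal sketch, not part of ASaiM: the function name `missing_tools` is hypothetical, and it assumes the requirements are exposed on the PATH as the commands `docker`, `docker-compose`, and `make`.

```shell
#!/bin/sh
# Hypothetical helper (not part of ASaiM): print every command given as
# an argument that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Requirements named on the slide; the command names are an assumption.
missing=$(missing_tools docker docker-compose make)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose: one output line per missing tool.
    printf 'Missing requirement: %s\n' $missing
else
    echo "All requirements found"
fi
```

If nothing is reported missing, the `make -f ../Makefile run_pipeline` command above should at least be able to start.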
Generated outputs
2015-07-02_19-31/
  report.txt
  quality_estimation/
    FastQC/
  quality_treatments/
    Prinseq/
  paired_end_assembly/
    FastQ_Join/
  rna_sorting/
    SortMeRNA/
  non_rRNA_taxonomic_assignation/
    MetaPhlAn/
  protein_ncrna_db_search/
    search_against_cog/
      Blast/
MetaPhlAn/ Taxonomic profile of the non-rRNA sequences
k__Bacteria 100.0
k__Bacteria|p__Bacteroidetes 95.68413
k__Bacteria|p__Fusobacteria 4.31587
k__Bacteria|p__Bacteroidetes|c__Bacteroidia 92.62004
k__Bacteria|p__Fusobacteria|c__Fusobacteria 4.31587
k__Bacteria|p__Bacteroidetes|c__Flavobacteria 3.06409
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales 92.62004
k__Bacteria|p__Fusobacteria|c__Fusobacteria|o__Leptotrichales 4.31587
k__Bacteria|p__Bacteroidetes|c__Flavobacteria|o__Flavobacteriales 3.06409
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae 88.87376
k__Bacteria|p__Fusobacteria|c__Fusobacteria|o__Leptotrichales|f__Leptotrichales_unclassified 4.31587
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae 3.74628
k__Bacteria|p__Bacteroidetes|c__Flavobacteria|o__Flavobacteriales|f__Flavobacteriaceae 3.06409
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides 88.87376
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Parabacteroides 3.74628
k__Bacteria|p__Bacteroidetes|c__Flavobacteria|o__Flavobacteriales|f__Flavobacteriaceae|g__Cellulophaga 3.06409
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_unclassified 88.87376
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Parabacteroides|s__Parabacteroides_unclassified 3.74628
k__Bacteria|p__Bacteroidetes|c__Flavobacteria|o__Flavobacteriales|f__Flavobacteriaceae|g__Cellulophaga|s__Cellulophaga_unclassified 3.06409
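A MetaPhlAn profile lists every lineage at every depth, so reading it usually means keeping one taxonomic rank at a time. The sketch below shows one possible way to do that with awk. The function name `clades_at_rank` is hypothetical, and the sample file holds only the first four lines of the profile shown here; awk's default field splitting accepts tabs as well as spaces.

```shell
#!/bin/sh
# Hypothetical helper: print "name abundance" for every lineage whose
# deepest level starts with the given rank prefix (k__, p__, c__, ...).
# usage: clades_at_rank PREFIX PROFILE_FILE
clades_at_rank() {
    awk -v prefix="$1" '
        {
            n = split($1, levels, "|")      # lineage levels, kingdom first
            last = levels[n]                # deepest level of this lineage
            if (index(last, prefix) == 1) {
                print substr(last, length(prefix) + 1), $2
            }
        }' "$2"
}

# First four lines of the demo profile, written to a temporary file.
profile=$(mktemp)
cat > "$profile" <<'EOF'
k__Bacteria 100.0
k__Bacteria|p__Bacteroidetes 95.68413
k__Bacteria|p__Fusobacteria 4.31587
k__Bacteria|p__Bacteroidetes|c__Bacteroidia 92.62004
EOF

clades_at_rank p__ "$profile"
# prints:
# Bacteroidetes 95.68413
# Fusobacteria 4.31587

rm -f "$profile"
```

Matching on the deepest level only avoids counting a phylum again on each of its class, order, and family lines.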
report.txt: Executed treatments and some results
Pretreatments...
Quality control...
Quality estimation...
Run FastQC...
Quality treatment...
Run PRINSEQ...
60 bad sequences for R1
979 bad sequences for R2
6136 conserved sequences for R1
5217 conserved sequences for R2
Paired-end assembly...
Run FastQ_Join...
3777 joined sequences
2359 single sequences for R1
1440 single sequences for R2
RNA sorting...
Run SortMeRNA...
1465 rRNA sequences
2312 non rRNA sequences
Taxonomic assignation...
Non rRNA sequence taxonomic assignation...
Run MetaPhlAn...
Functional assignation...
Search against protein and ncRNA databases...
Search against COG database...
Run Blast...
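The counts in the report can be cross-checked with a little arithmetic, since each step's outputs should add up to its input. The sketch below copies the numbers from the report above; the variable names and the `percent` helper are hypothetical. That SortMeRNA received only the joined sequences is an inference from the totals (1465 + 2312 = 3777), not something the report states.

```shell
#!/bin/sh
# Counts copied from the demo report.txt.
r1_bad=60;   r1_kept=6136
r2_bad=979;  r2_kept=5217
joined=3777; r1_single=2359; r2_single=1440
rrna=1465;   non_rrna=2312

# Hypothetical helper: percentage with one decimal (awk does the
# floating-point division that shell arithmetic lacks).
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f", 100 * part / total }'
}

# Joined plus single sequences should equal the conserved sequences.
[ $((joined + r1_single)) -eq "$r1_kept" ] && echo "R1 assembly counts are consistent"
[ $((joined + r2_single)) -eq "$r2_kept" ] && echo "R2 assembly counts are consistent"
# rRNA plus non-rRNA should equal the sequences given to SortMeRNA.
[ $((rrna + non_rrna)) -eq "$joined" ] && echo "rRNA sorting counts are consistent"

echo "R2 sequences discarded by PRINSEQ: $(percent "$r2_bad" $((r2_bad + r2_kept)))%"
echo "rRNA share of joined sequences: $(percent "$rrna" "$joined")%"
# prints 15.8% and 38.8% respectively
```

All three consistency checks pass for the demo data, and both input files start from the same total of 6196 sequences (60 + 6136 for R1, 979 + 5217 for R2).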
Current status
Core structure Available
7 tools
1 sequence database (COG)
Open-source
Documentation: https://asaim.github.io/
What’s next?
Short term
  More detailed reports
  Addition of visualization tools
  Tests
Long term
  More tools and treatments
  Better web interface
  Expert database