Genboree Microbiome Workbench 16S Workshop Part I
March 11th, 2014
Julia Cope, Emily Hollister, Kevin Riehle
Genboree 16S Workshop
• Learning Objectives
– Students should be able to take .sff files and user-supplied information and produce:
  • Metadata file
  • PCoA
  • Classification distribution
• Expectations
– Apply topics learned today before the next meeting
– Be able to discuss where issues arise
– Be able to move knowledgeably through the whole Genboree Workflow
Genboree 16S Workshop Part II
• Learning Outcomes
– A newer database version of RDP is available: how can you take advantage of it?
– Students should take user .sff files and a user-created metadata file and produce (I can provide files if needed):
  • PCoA (QIIME)
  • Classification distribution (RDP)
• Expectations
– Apply topics learned in the tutorial
– Be able to discuss where in the process issues arose
– Have a hypothesis about your data issues if they happen
Workshop Outline
• 16S
• Metadata File
• Genboree Workbench Workflow
– Account
– Group
– Database
– Project
– Loading your files/samples/sequences (and linking)
– QIIME
– RDP
– How to get help
• Wrap Up and Preparation for 2nd Installment
Resources
• Genboree Home Screen
– http://genboree.org
• Tutorials are located in the Genboree Commons
– You must be signed in to open the following link
– http://genboree.org/theCommons/projects/mw-march-2014
– Tutorial 1 Data Set:
  • http://www.genboree.org/microbiome/include/data/tutorial_sequence_file.sff.gz
– Tutorial 2 Data Set:
  • http://genboree.org/theCommons/attachments/3545/Tutorial_2.zip
• Projects are accessed through the Genboree Workbench
16S
• What is it?
• What part is being sequenced?
– Here?
– Elsewhere?
• How is this accomplished?
– DNA to bead to light
– Intro to flow data and .sff file content
– OUTPUT is an .sff file
– Aside on zipping methods and large file transfers
(Figure sources: Allmetrics.net sales material; Tortoli E, Clin. Microbiol. Rev. 2003;16:319-354)
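• What is it? 16S refers to the 16S ribosomal RNA and the gene that encodes it; "S" stands for Svedberg, a unit of sedimentation coefficient. The 16S rRNA is part of the small (30S) subunit of the prokaryotic ribosome.
• What part is being sequenced?
– Here? TCMC sequences V5-V3 on the 454 platform.
– Elsewhere? V3-V5, V1-V3, V9, V7-V9, and many more.
– Know your variable regions.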
16S
• How is this accomplished?– DNA to bead to light
(Figure sources: http://cage.unl.edu/equipmentsoftware.shtml; 454 Life Sciences sales materials)
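A flowgram records, for each nucleotide flow, how much light a bead emitted; the signal is roughly proportional to how many identical bases were incorporated in a row. The sketch below turns flow values into bases by simple rounding. It assumes the TACG flow order and TCAG key used on 454 GS FLX runs, and the flow values are made up for illustration; real 454 base calling applies signal corrections that this naive version ignores.

```python
# Nucleotide flow order assumed here (the usual 454 GS FLX cycle).
FLOW_ORDER = "TACG"


def flowgram_to_bases(flow_values, flow_order=FLOW_ORDER):
    """Naive base calling: round each flow value to a homopolymer length."""
    bases = []
    for index, value in enumerate(flow_values):
        # Flows cycle through the nucleotides in a fixed order.
        nucleotide = flow_order[index % len(flow_order)]
        # A signal near 2.0 means two identical bases were incorporated, etc.
        homopolymer_length = max(0, round(value))
        bases.append(nucleotide * homopolymer_length)
    return "".join(bases)
```

With the made-up flows [1.02, 0.05, 0.98, 0.10, 0.03, 1.05, 0.02, 0.97, 2.04, 0.08, 1.10, 0.01] the function returns "TCAGTTC": the first eight flows spell the TCAG key, and the 2.04 signal becomes a TT homopolymer.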
16S
• How is this accomplished?
– DNA to bead to light
– Intro to flow data and .sff file content
– OUTPUT is an .sff (Standard Flowgram Format) file
• All reads are structured as linker-tag-primer
• The .sff file provides both identity (base calls) and quality information
(Figure sources: http://cage.unl.edu/equipmentsoftware.shtml; Allmetrics.net sales material)
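Because every read follows the linker-tag-primer layout, reads can be assigned to samples by matching the barcode (tag) and then the degenerate primer. The sketch below is a hypothetical helper, not Genboree's implementation. It assumes each read begins with the 454 key TCAG, that barcodes match exactly, and that primers are written in IUPAC codes (M, R, N, and so on).

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_pattern(primer):
    """Compile a primer written in IUPAC codes into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def assign_sample(read, barcodes, primer, key="TCAG"):
    """Hypothetical demultiplexer for key + barcode + primer reads.

    barcodes maps barcode sequence -> sample name (e.g. taken from the
    metadata file). Returns (sample_name, read_after_primer), or
    (None, read) when the read cannot be assigned.
    """
    if not read.startswith(key):
        return None, read
    rest = read[len(key):]
    pattern = primer_pattern(primer)
    # Check longer barcodes first so a shorter barcode cannot claim a read
    # whose true barcode merely starts with it.
    for barcode in sorted(barcodes, key=len, reverse=True):
        if rest.startswith(barcode):
            # Pattern.match with a start position anchors the primer
            # immediately after the barcode.
            match = pattern.match(rest, len(barcode))
            if match:
                return barcodes[barcode], rest[match.end():]
    return None, read
```

For example, with barcodes {"CCGTTCCTC": "S_700033665", "ACCGGCGTTC": "S_700035861"} and primer "CCGTCAATTCMTTTRAGT", the read "TCAGACCGGCGTTCCCGTCAATTCATTTGAGTTTTACG" is assigned to S_700035861 with "TTTACG" left after the primer.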
Genboree Workflow
• Take one step back from the Genboree Workflow and talk about input files.
• What do you do with your files?
(Figure from the Genboree.org help files: a metadata file and an .sff file as inputs)
Genboree Workflow
• What do you do with many files?
• Genboree takes .zip, gzip (.gz), .txt, and .sff files
– Compressed files are easier and faster to move
– Multiple files are easier to move when compressed together in an archive
• .sff file(s) should be archived and compressed.
• Metadata files are very small and do not need compression.
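As a scripted alternative to archiving by hand, Python's standard zipfile module can bundle several .sff files into one compressed archive. This is a minimal sketch; the function name and any file names you pass in are illustrative assumptions.

```python
import zipfile
from pathlib import Path


def archive_sff_files(sff_paths, archive_path):
    """Bundle several .sff files into one DEFLATE-compressed .zip archive."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for sff in map(Path, sff_paths):
            # Store each file under its bare name, without the local folder path.
            archive.write(sff, arcname=sff.name)
    return archive_path
```

A call such as archive_sff_files(["run1.sff", "run2.sff"], "my_runs.zip") would produce one upload-ready file; storing bare names keeps your local directory layout out of the archive.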
Metadata Files
• What data must you have?
• How should it be formatted for Genboree?
• What can you include?
• How to make it tab-delimited
• Include variable region or primer?
• Directional awareness on primers
Metadata Files
• What data must you have?
– name
– barcode
– region, or proximal & distal primers
– The first column header must begin with # (e.g. #name)
– No spaces are allowed in column names, e.g. #No_spaces_are_allowed_in_column_names_0123456789
• How should it be formatted for Genboree?
– Tab-delimited
• What can you include?
• How to make it tab-delimited?
• Include variable region or primer?
• Directional awareness on primers
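These requirements can be checked before uploading. The sketch below covers only the rules stated in this workshop (a #-prefixed first header, no spaces, a barcode column, region or both proximal and distal, equal column counts, and no "state" or "type" columns); it is an assumption-based helper, not Genboree's own validator.

```python
# Column titles the workshop advises against.
RESERVED_COLUMNS = {"state", "type"}


def validate_metadata(text):
    """Return a list of problems found in tab-delimited metadata text."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows:
        return ["file is empty"]
    header, samples = rows[0], rows[1:]
    problems = []
    # The first column holds the sample name; its header must start with #.
    if not header[0].startswith("#"):
        problems.append("first column header must begin with #")
    columns = [name.lstrip("#") for name in header]
    for name in header:
        if " " in name:
            problems.append(f"column name contains a space: {name!r}")
        if name.lstrip("#").lower() in RESERVED_COLUMNS:
            problems.append(f"avoid column name {name!r}")
    if "barcode" not in columns:
        problems.append("missing barcode column")
    if "region" not in columns and not {"proximal", "distal"} <= set(columns):
        problems.append("need a region column or both proximal and distal columns")
    for number, row in enumerate(samples, start=2):
        if len(row) != len(header):
            problems.append(f"row {number}: expected {len(header)} columns, found {len(row)}")
        for value in row:
            if " " in value:
                problems.append(f"row {number}: value contains a space: {value!r}")
    return problems
```

An empty list means the file passed these checks; splitlines() also copes with the Windows line endings a spreadsheet program may write.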
Metadata Files
• How to determine which to include: variable region or primers
• Directional awareness on primers
• Demo of making and saving as tab-delimited

#name        barcode      proximal            distal             region  body_site
S_700033665  CCGTTCCTC    CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Stool
S_700035861  ACCGGCGTTC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Stool
S_700095543  ACGAATTAAC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Stool
S_700095850  AACCGGATAC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Stool
S_700101600  AACGGAACGC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Stool
T_700016994  AATAACCGTC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Throat
T_700095565  TTAATGGAAC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Throat
T_700095872  CGGACCGGAAC  CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Throat
T_700101388  CCGAACGAC    CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Throat
T_700101622  TTCGTTCTTC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Throat
Metadata Files - Demo
• Select the data above and copy it.
• Paste it into Excel or an open-source spreadsheet program. Be sure all entries are free of spaces and special characters and that all samples have the same number of columns. Avoid the column titles "state" and "type".
• Choose Save As and select tab-delimited.
• Name your file in a clear and consistent manner.
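If you build the table in a script rather than a spreadsheet, Python's csv module can write the same tab-delimited format. This is a sketch with a hypothetical helper name; it also refuses entries containing spaces and rows with the wrong number of columns, following the demo's advice.

```python
import csv
import io


def to_tab_delimited(header, rows):
    """Return header and rows as tab-delimited text, one sample per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for index, row in enumerate(rows, start=1):
        # Same checks as the demo: equal column counts and no spaces.
        if len(row) != len(header):
            raise ValueError(f"sample row {index}: {len(row)} columns, expected {len(header)}")
        if any(" " in str(cell) for cell in row):
            raise ValueError(f"sample row {index}: entries must not contain spaces")
        writer.writerow(row)
    return buffer.getvalue()
```

The returned text can then be saved with, for example, pathlib's Path("metadata.txt").write_text(...), where the file name is only an example.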
Metadata Files
• How to determine variable region vs. primer inclusion
• Directional awareness of primers
• If you aren't sure, ask!
• What are these files often called? Mapping, metadata, oligos, or linker-primer file (many other names are possible).

#name        barcode      proximal            distal             region  body_site
S_700033665  CCGTTCCTC    CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Stool
S_700035861  ACCGGCGTTC   CCGTCAATTCMTTTRAGT  CTGCTGCCTCCCGTAGG  V3V5    Stool
(Figure source: Allmetrics.net sales material)
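Directional awareness matters because a primer may be listed in one orientation while the reads run in the other. A reverse complement that understands IUPAC degenerate codes lets you compare the two. This is a minimal sketch; for example, it turns the proximal primer in the table above, CCGTCAATTCMTTTRAGT, into ACTYAAAKGAATTGACGG.

```python
# Complement for every IUPAC nucleotide code, including degenerate ones
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(sequence):
    """Return the reverse complement of a primer written in IUPAC codes."""
    return sequence.upper().translate(COMPLEMENT)[::-1]
```

Applying the function twice returns the original primer, which is a quick sanity check on any primer you paste in.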
Metadata Files
• Another example: Tutorial Set 2 metadata
• What possible issues may arise with this metadata file?

sampleName  tag         proximal             distal              region  sample_period  type
Ferm_5      AGCTTCGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    5              Fermentation
Ferm_2      GCCATACATT  GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    2              Fermentation
Ferm_3      GCCAGCAAGT  GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    3              Fermentation
Ferm_4      CGTTAAGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    4              Fermentation
Ferm_1      CTAACAGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    1              Fermentation
Soil_1      ACGCAAAA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    1              Soil
Soil_2      CTAACTAA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    2              Soil
Soil_3      GCGACCTAGT  GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    3              Soil
Soil_4      AAGAATCA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    4              Soil
Soil_5      AGCGCAGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    5              Soil
Metadata Files
• Another example
• What possible issues may arise with this metadata file?
– Change sampleName => #name (the first column header must begin with #)
– Change tag => barcode
– Change type => sample_type (do not name columns "type" or "state")
• Demo: making and saving as tab-delimited

#name       barcode     proximal             distal              region  sample_period  sample_type
Ferm_5      AGCTTCGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    5              Fermentation
Ferm_2      GCCATACATT  GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    2              Fermentation
Ferm_3      GCCAGCAAGT  GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    3              Fermentation
Ferm_4      CGTTAAGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    4              Fermentation
Ferm_1      CTAACAGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    1              Fermentation
Soil_1      ACGCAAAA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    1              Soil
Soil_2      CTAACTAA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    2              Soil
Soil_3      GCGACCTAGT  GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    3              Soil
Soil_4      AAGAATCA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    4              Soil
Soil_5      AGCGCAGA    GAGTTTGATCNTGGCTCAG  CAGCMGCCGCNGTAANAC  V1V3    5              Soil
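The header fixes listed above can also be applied in a script. The sketch below hard-codes the three renames from the Tutorial Set 2 example (sampleName, tag, type) and only touches the header line; the function names are hypothetical.

```python
# Renames taken from the Tutorial Set 2 example.
RENAMES = {"sampleName": "#name", "tag": "barcode", "type": "sample_type"}


def fix_header(header_line):
    """Apply the renames and make sure the first column starts with #."""
    columns = [RENAMES.get(name, name) for name in header_line.split("\t")]
    if not columns[0].startswith("#"):
        columns[0] = "#" + columns[0]
    return "\t".join(columns)


def fix_metadata(text):
    """Rewrite only the header line of tab-delimited metadata text."""
    lines = text.splitlines()
    lines[0] = fix_header(lines[0])
    return "\n".join(lines) + "\n"
```

Sample rows pass through unchanged, so barcodes and primers are never altered by this step.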
7-Zip
• Zipping methods and large file transfers
• Compression and archiving of files
• Uncompressing in an easy-to-use format for PCs
• Demo compressing
– .sff(s)
– http://www.7-zip.org/
(Figure from 7-zip.org)
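7-Zip is a desktop tool, but the same .sff.gz format as the Tutorial 1 data set can be produced or unpacked with Python's standard gzip module. This is a sketch with hypothetical function names; shutil.copyfileobj streams the data in chunks, so large .sff files are not loaded into memory all at once.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source):
    """Compress e.g. reads.sff to reads.sff.gz, streaming in chunks."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as raw, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(raw, packed)
    return target


def gunzip_file(source):
    """Decompress e.g. reads.sff.gz back to reads.sff."""
    source = Path(source)
    if source.suffix != ".gz":
        raise ValueError(f"{source.name} does not end in .gz")
    target = source.with_suffix("")  # drops only the final .gz
    with gzip.open(source, "rb") as packed, open(target, "wb") as raw:
        shutil.copyfileobj(packed, raw)
    return target
```

Note that gzip compresses a single file; to send several .sff files together, bundle them into one archive first.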
Genboree Workflow
• Create Group
• Create Database
• Create Project
• Upload Files
• Create Samples (Sample Import using metadata file)
• Link Samples to Sequence Files (Sample File Linker)
• QC and Attach Sequences (Sequence Import)
• QIIME
• RDP